How to fix Active MQ Wildfly warning - Page file 000002707.page had incomplete records at position 373,401 at record number 9? - wildfly

How to fix the following WildFly warnings:
2019-10-09 15:15:04,179 WARN [org.apache.activemq.artemis.core.server] (Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2#216f6b3e-1182524264)) AMQ222033: Page file 000002707.page had incomplete records at position 373,401 at record number 9
2019-10-09 15:15:05,182 WARN [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-remoting-threads-ActiveMQServerImpl::serverUUID=3bea749a-88f7-11e7-b497-27b2839ef45c-1594512315-717665458)) AMQ222033: Page file 000002707.page had incomplete records at position 373,401 at record number 9
2019-10-09 15:15:05,185 WARN [org.apache.activemq.artemis.core.server] (Thread-11 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2#216f6b3e-1182524264)) AMQ222033: Page file 000002707.page had incomplete records at position 373,401 at record number 9
The only relevant link I can find on Google suggests a server crash, but I am not sure how to stop this: https://developer.jboss.org/thread/154232
The application contains an Apache Camel project that picks up around 20,000 messages from the queue; many of them are discarded and the rest processed. I am not sure whether that is related.

The linked forum post does a fine job of explaining the likely reason why the page file was corrupted. To "fix" the problem, I recommend you consume all the messages you can from the affected queue, stop the broker, and remove the corrupted page file.
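Concretely, once the broker is stopped, something like the sketch below removes the file. The paging location shown is the WildFly standalone-mode default and is an assumption; check the paging-directory setting of your messaging subsystem before running.

```shell
# Default paging location for WildFly standalone mode -- adjust JBOSS_HOME
# and the path to match your installation (this path is an assumption).
PAGING_DIR="${JBOSS_HOME:-/opt/wildfly}/standalone/data/activemq/paging"

# Delete the corrupted page file named in the AMQ222033 warning, wherever
# it sits under the per-address paging folders.
if [ -d "$PAGING_DIR" ]; then
  find "$PAGING_DIR" -name '000002707.page' -print -delete
fi
```

On restart the broker will simply page new messages into fresh files; any records that were in the deleted file are lost, which is why draining the queue first matters.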

Related

Kubernetes DSE Cassandra CommitLogReplayer$CommitLogReplayException

I have installed Cassandra on Kubernetes (9 pods). All the pods are up and running except for one, which shows the error below.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Caused by: org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:111)
... 12 more
ERROR [main] 2021-09-06 06:19:08,990 JVMStabilityInspector.java:251 - JVM state determined to be unstable. Exiting forcefully due to:
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Can someone help me out, please?
For whatever reason, one of the commit log segments got corrupted on the node.
You can workaround the issue by manually deleting this file on the pod:
/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log
Interestingly, that commit log segment was created on September 2 (1630582314923) but the log entry you posted was from September 6. This indicates something happened to the pod which resulted in the corrupted file.
You'll need to review the Cassandra logs on the pod (not the pod's container logs) to determine the root cause and address it. Cheers!
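A sketch of the workaround, assuming the broken pod is named cassandra-0 (the pod name is hypothetical; substitute the failing pod's actual name):

```shell
POD=cassandra-0   # hypothetical pod name -- use the failing pod's name

# Remove the corrupted commit log segment identified in the stack trace.
kubectl exec "$POD" -- rm /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log

# Restart the pod; the StatefulSet controller recreates it, and commit log
# replay proceeds without the deleted segment.
kubectl delete pod "$POD"
```

Note that any mutations only in that segment (and not yet flushed to SSTables) are lost, so run a repair on the node afterwards if consistency matters.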

Unable to start tomcat 9 with flowable war- PUBLIC.ACT_DE_DATABASECHANGELOGLOCK error

I have downloaded Flowable from flowable.com/open-source and placed flowable-ui.war and flowable-rest.war in the Tomcat 9.0.52 webapps folder.
When I start the server, after some time I see the lines below repeating in the console, and then the server stops.
SELECT LOCKED FROM PUBLIC.ACT_DE_DATABASECHANGELOGLOCK WHERE ID=1
2021-08-13 20:45:05.818 INFO 8316 --- [ main] l.lockservice.StandardLockService : Waiting for changelog lock.
Why is this issue occurring? I have not made any changes.
The message
l.lockservice.StandardLockService : Waiting for changelog lock.
occurs when Flowable is waiting for the lock on the DB changes to be released.
If that doesn't happen, it means some other node picked up the lock and did not release it properly. I would suggest manually deleting the lock row from that table (ACT_DE_DATABASECHANGELOGLOCK).
In addition to that, there is no need to run both flowable-ui.war and flowable-rest.war. flowable-rest.war is a subset of flowable-ui.war.
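The manual cleanup is a single statement against the table Liquibase uses for locking. The column names below are the standard Liquibase ones; this is a sketch, so verify them against your actual schema before running:

```sql
-- Release the stuck changelog lock so the next Flowable start can acquire it.
UPDATE PUBLIC.ACT_DE_DATABASECHANGELOGLOCK
   SET LOCKED = FALSE, LOCKGRANTED = NULL, LOCKEDBY = NULL
 WHERE ID = 1;
```

Run this only while Flowable is stopped, otherwise a node that legitimately holds the lock could be interrupted mid-migration.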

Too many empty chk-* directories with Flink checkpointing using RocksDb as state backend

Too many empty chk-* directories exist in the location where I have set up RocksDB as the state backend.
I am using FlinkKafkaConsumer to read from a Kafka topic, with RocksDB as the state backend, and I simply print the messages received from Kafka.
Following is the code I use to set up checkpointing and the state backend:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(100); // trigger a checkpoint every 100 ms
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(50);
env.getCheckpointConfig().setCheckpointTimeout(60); // value is in milliseconds
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
StateBackend rdb = new RocksDBStateBackend("file:///Users/user/Documents/telemetry/flinkbackends10", true);
env.setStateBackend(rdb);
env.execute("Flink kafka");
In flink-conf.yaml I have also set this property:
state.checkpoints.num-retained: 3
I am using a simple one-node Flink cluster (started via ./start-cluster.sh). I started the job, kept it running for 1 hour, and I see far too many chk-* directories created under the /Users/user/Documents/telemetry/flinkbackends10 location:
chk-10 chk-12667 chk-18263 chk-20998 chk-25790 chk-26348 chk-26408 chk-3 chk-3333 chk-38650 chk-4588 chk-8 chk-96
chk-10397 chk-13 chk-18472 chk-21754 chk-25861 chk-26351 chk-26409 chk-30592 chk-34872 chk-39405 chk-5 chk-8127 chk-97
chk-10649 chk-13172 chk-18479 chk-22259 chk-26216 chk-26357 chk-26411 chk-31097 chk-35123 chk-39656 chk-5093 chk-8379 chk-98
chk-1087 chk-14183 chk-18548 chk-22512 chk-26307 chk-26360 chk-27055 chk-31601 chk-35627 chk-4 chk-5348 chk-8883 chk-9892
chk-10902 chk-15444 chk-18576 chk-22764 chk-26315 chk-26377 chk-28064 chk-31853 chk-36382 chk-40412 chk-5687 chk-9 chk-99
chk-11153 chk-15696 chk-18978 chk-23016 chk-26317 chk-26380 chk-28491 chk-32356 chk-36885 chk-41168 chk-6 chk-9135 shared
chk-11658 chk-16201 chk-19736 chk-23521 chk-26320 chk-26396 chk-28571 chk-32607 chk-37389 chk-41666 chk-6611 chk-9388 taskowned
chk-11910 chk-17210 chk-2 chk-24277 chk-26325 chk-26405 chk-29076 chk-32859 chk-37642 chk-41667 chk-7 chk-94
chk-12162 chk-17462 chk-20746 chk-25538 chk-26337 chk-26407 chk-29581 chk-33111 chk-38398 chk-41668 chk-7116 chk-95
Out of these, only chk-41668, chk-41667, and chk-41666 contain data; the rest of the directories are empty.
Is this expected behavior? How can I delete those empty directories? Is there some configuration for deleting empty directories?
Answering my own question here:
In the UI I was seeing a 'checkpoint expired before completing' error in the checkpointing section, and found that the fix is to increase the checkpoint timeout. Note that setCheckpointTimeout takes milliseconds, so the original value of 60 gave each checkpoint only 60 ms to complete before expiring.
I increased the timeout from 60 to 500, and the empty chk-* directories started being cleaned up:
env.getCheckpointConfig().setCheckpointTimeout(500);
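Alternatively, in newer Flink releases the same timeout can be set cluster-wide in flink-conf.yaml instead of in code; the key below exists in roughly Flink 1.10 and later, so verify it against your version's documentation:

```yaml
# flink-conf.yaml -- equivalent to setCheckpointTimeout(500) in code
execution.checkpointing.timeout: 500 ms
```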

Lagom's embedded Kafka fails to start after killing Lagom process once

I've played around with the lagom-scala-word-count Activator template and was forced to kill the application process. Since then the embedded Kafka doesn't work: this project, and every new one I create, has become unusable. I've tried:
running sbt clean, to delete embedded Kafka data
creating brand new project (from other activator templates)
restarting my machine.
Despite this, I can't get Lagom to work. During the first launch I get the following lines in the log:
[warn] o.a.k.c.NetworkClient - Error while fetching metadata with correlation id 1 : {wordCount=LEADER_NOT_AVAILABLE}
[warn] o.a.k.c.NetworkClient - Error while fetching metadata with correlation id 2 : {wordCount=LEADER_NOT_AVAILABLE}
[warn] o.a.k.c.NetworkClient - Error while fetching metadata with correlation id 4 : {wordCount=LEADER_NOT_AVAILABLE}
[warn] a.k.KafkaConsumerActor - Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
[warn] a.k.KafkaConsumerActor - Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
Next launches result in:
[info] Starting Kafka
[info] Starting Cassandra
....Kafka Server closed unexpectedly.
....
[info] Cassandra server running at 127.0.0.1:4000
I've posted full server.log from lagom-internal-meta-project-kafka at https://gist.github.com/szymonbaranczyk/a93273537b42aafa45bf67446dd41adb.
Is it possible that some corrupted Embedded Kafka's data is stored globally on my pc and causes this?
For future reference, as James mentioned in the comments, you have to delete the folder lagom-internal-meta-project-kafka in target/lagom-dynamic-projects. I don't know why it doesn't get deleted automatically.
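The cleanup, run from the project root (the directory is recreated with fresh Kafka data the next time the Lagom dev environment starts):

```shell
# Remove the embedded Kafka's (possibly corrupted) state; Lagom recreates
# this directory on the next run, so deleting it is safe.
rm -rf target/lagom-dynamic-projects/lagom-internal-meta-project-kafka
```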

Where can I configure the timeout of JBoss cluster node drop?

We have a 4-node cluster. When one of the nodes has a long GC pause, the node is dropped from the cluster and the following log is generated:
2012-06-14 03:27:48,277 INFO [org.jboss.messaging.core.impl.postoffice.GroupMember] org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener#6225352b got new view [10.164.218.18:7910|10] [10.164.218.18:7910, 10.164.107.69:7910, 10.164.107.65:7910], old view is [10.164.218.14:7910|9] [10.164.218.14:7910, 10.164.218.18:7910, 10.164.107.69:7910, 10.164.107.65:7910]
2012-06-14 03:27:48,277 INFO [org.jboss.messaging.core.impl.postoffice.GroupMember] I am (10.164.218.18:7910)
2012-06-14 03:27:48,998 INFO [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] JBoss Messaging is failing over for failed node 52. If there are many messages to reload this may take some time...
I would like to configure the timeout of the node drop. It seems to be 2 minutes in my case and I would like to increase it, but I can't find where to configure it.
Where can I configure the timeout of JBoss cluster node drop?
I found it: it is in oracle-persistence-service.xml. You need to adjust the configuration in the
ControlChannelConfig section; I believe the relevant setting is the 'timeout' of the 'FD' (failure detection) protocol.
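As an illustration (the values are hypothetical; verify the attributes against the JGroups documentation for your JBoss version), the FD element in the ControlChannelConfig stack takes a per-heartbeat timeout and a retry count, and a node is only suspected after roughly timeout × max_tries milliseconds:

```xml
<!-- Failure detection: with these illustrative values a node is suspected
     only after ~10 missed heartbeats of 30 s each (~5 minutes), which
     tolerates long GC pauses at the cost of slower failover. -->
<FD timeout="30000" max_tries="10"/>
<VERIFY_SUSPECT timeout="1500"/>
```

Increasing these values delays legitimate failure detection too, so pick the smallest window that covers your observed GC pauses.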