sincedb_write_interval is not working in ELK - elastic-stack

I am doing offline analysis of logs from previous days. Elasticsearch and Logstash are both version 2.3.1.
My file input configuration:
path => "/opt/logs/test/*"
sincedb_path => "/opt/logs/.sincedb"
sincedb_write_interval => 10
start_position => "beginning"
I see that the sincedb file is created only when the last log line is reached. Whenever Logstash is stopped partway through, parsing restarts from the beginning instead of from the position where it stopped. This causes duplicate entries in Kibana.
I assumed sincedb would be written to the file every 10 seconds (as specified in my input), so that if Logstash is stopped for any reason and restarted, it continues from the previously stopped position. Is there more configuration to add, or is the sincedb file only created at the end of the file? Please suggest how to avoid duplicate parsing.

Related

How to resolve "Invalid Sequence Token" when using cloudwatch agent?

I'm seeing the following warning in the /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log:
2021-10-06T06:39:23Z W! [outputs.cloudwatchlogs] Invalid SequenceToken used, will use new token and retry: The given sequenceToken is invalid. The next expected sequenceToken is: 49619410836690261519535138406911035003981074860446093650
But nothing indicates which file is actually failing, not even when I add "debug": true to /opt/aws/amazon-cloudwatch-agent/bin/config.json.
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq .agent
{
"metrics_collection_interval": 60,
"debug": true,
"run_as_user": "root"
}
I have many (28) files in the .logs.logs_collected.files.collect_list section of config.json, so how can I find out exactly which file is causing the trouble?
As of 2021-11-29, a PR to improve the log messages has been merged into the cloudwatch-agent, but a new version has not been released yet. The next version after v1.247349.0 will likely include this fix.
The fix will change the log statements to say:
INFO: First time sending logs to %v/%v since startup so sequenceToken is nil, learned new token: xxxx: yyyy: This is an INFO message, as this behaviour is expected at startup for example.
WARN: Invalid SequenceToken used (%v) while sending logs to %v/%v, will use new token and retry: xxxxxv: This on the other hand is not expected and may mean that someone else is writing to the loggroup/logstream concurrently.
If those warnings come right after a restart of the CloudWatch agent (cwagent), you can safely ignore them; it's expected behaviour. The agent does not save the next sequence token in its persistent state, so on restart it "learns" the correct token by issuing a PutLogEvents call with no sequence token at all, which returns an InvalidSequenceTokenException containing the next sequence token to use. Since these warnings are expected at startup, I proposed a PR to amazon-cloudwatch-agent to improve the log messages.
If the "Invalid SequenceToken used" is seen long after the restart then you may have other issues.
The "Invalid SequenceToken used" error usually means that two entities/sources are trying to write to the same log group/log stream as mentioned in 2 (which is really for the old awslogs agent but still useful):
Caught exception: An error occurred (InvalidSequenceTokenException)
when calling the PutLogEvents operation: The given sequenceToken is
invalid[…] -or- Multiple agents might be sending log events to log
stream[…] – You can't push logs from multiple log files to a single
log stream. Update your configuration to push each log to a log
stream-log group combination.
It could be that the CloudWatch agent itself is trying to upload the same file twice because you have duplicates in your config.json.
So first print all your log group / log stream pairs in your config.json with:
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq -r '.logs.logs_collected.files.collect_list[]|"\(.log_group_name) \(.log_stream_name)"'|sort
which should give an output similar to:
/tableauserver/apigateway apigateway_node5-0.log
/tableauserver/apigateway control_apigateway_node5-0.log
/tableauserver/appzookeeper appzookeeper-discovery_node5-1.log
...
/tableauserver/vizqlserver vizqlserver_node5-3.log
Then you can use uniq -d to find the duplicates in that list with:
cat /opt/aws/amazon-cloudwatch-agent/bin/config.json|jq -r '.logs.logs_collected.files.collect_list[]|"\(.log_group_name) \(.log_stream_name)"'|sort|uniq -d
# The list should be empty otherwise you have duplicates
If that command produces any output it means that you have duplicates in your config.json collect_list.
I personally think that cwagent itself should print the "offending" log group/log stream in the logs, so I opened an issue on the amazon-cloudwatch-agent GitHub page.

Kubernetes DSE Cassandra CommitLogReplayer$CommitLogReplayException

I have installed Cassandra on Kubernetes (9 pods). All the pods are up and running except for one, which shows the error below.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Caused by: org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:111)
... 12 more
ERROR [main] 2021-09-06 06:19:08,990 JVMStabilityInspector.java:251 - JVM state determined to be unstable. Exiting forcefully due to:
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Can someone help me out, please?
For whatever reason, one of the commit log segments got corrupted on the node.
You can work around the issue by manually deleting this file on the pod:
/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log
Interestingly, that commit log segment was created on September 2 (1630582314923) but the log entry you posted was from September 6. This indicates something happened to the pod which resulted in the corrupted file.
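The date comparison above can be reproduced with a few lines of Java. This is only a sketch: it assumes, as the answer does, that the trailing number in the segment file name is an epoch timestamp in milliseconds, and the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitLogTimestamp {
    // Assumed file name layout: CommitLog-<version>-<epoch millis>.log
    private static final Pattern NAME = Pattern.compile("CommitLog-(\\d+)-(\\d+)\\.log$");

    // Extracts the trailing number from the file name and reads it as epoch milliseconds.
    static Instant segmentCreationTime(String path) {
        Matcher m = NAME.matcher(path);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a commit log segment name: " + path);
        }
        return Instant.ofEpochMilli(Long.parseLong(m.group(2)));
    }

    public static void main(String[] args) {
        String segment = "/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log";
        // Prints 2021-09-02T11:31:54.923Z (UTC)
        System.out.println(segmentCreationTime(segment));
    }
}
```

Comparing that instant with the timestamp of the ERROR line (2021-09-06 06:19:08) gives the gap of roughly four days mentioned above.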
You'll need to review the Cassandra logs on the pod (not the pod's own logs) to determine the root cause and address it. Cheers!

Starting Druid to consume data from kafka

I took the latest version of Druid, 0.16.0-incubating.
I have 2 questions.
1) The micro-quickstart configuration mentioned in the quickstart doesn't work: it complains that jvm.config and main.config are missing under /conf/druid/single-server/micro-quickstart/coordinator-overlord.
2) As micro-quickstart failed, I tried single-server-small instead.
I tried to import data from Kafka with single-server-small but couldn't, because it says the extension is not loaded, even though I can see in the logs that it does get loaded.
But I think my main problem is that whenever I land on the 'Load data' section of the Druid web page at localhost:8888, it keeps giving me the error below:
"Failed to get overlord modules : Unable to determine destination for [/proxy/overlord/status]; is your coordinator/overlord running ?"
I can see that my coordinator-overlord process is up.
Any suggestions?
Thanks
Delete your Kafka logs and Druid logs (the location set up in common.runtime.properties) and then try again.

Too many empty chk-* directories with Flink checkpointing using RocksDb as state backend

Too many empty chk-* directories exist in the location where I have set up RocksDB as the state backend.
I am using FlinkKafkaConsumer to get data from Kafka topic. And I am using RocksDb as state backend. I am just printing the messages received from Kafka.
Following are the properties I have to set up the state backend:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(100);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(50);
env.getCheckpointConfig().setCheckpointTimeout(60);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
StateBackend rdb = new RocksDBStateBackend("file:///Users/user/Documents/telemetry/flinkbackends10", true);
env.setStateBackend(rdb);
env.execute("Flink kafka");
In flink-conf.yaml I have also set this property:
state.checkpoints.num-retained: 3
I am using a simple 1-node Flink cluster (started with ./start-cluster.sh). I started the job, kept it running for 1 hour, and I see too many chk-* directories created under /Users/user/Documents/telemetry/flinkbackends10:
chk-10 chk-12667 chk-18263 chk-20998 chk-25790 chk-26348 chk-26408 chk-3 chk-3333 chk-38650 chk-4588 chk-8 chk-96
chk-10397 chk-13 chk-18472 chk-21754 chk-25861 chk-26351 chk-26409 chk-30592 chk-34872 chk-39405 chk-5 chk-8127 chk-97
chk-10649 chk-13172 chk-18479 chk-22259 chk-26216 chk-26357 chk-26411 chk-31097 chk-35123 chk-39656 chk-5093 chk-8379 chk-98
chk-1087 chk-14183 chk-18548 chk-22512 chk-26307 chk-26360 chk-27055 chk-31601 chk-35627 chk-4 chk-5348 chk-8883 chk-9892
chk-10902 chk-15444 chk-18576 chk-22764 chk-26315 chk-26377 chk-28064 chk-31853 chk-36382 chk-40412 chk-5687 chk-9 chk-99
chk-11153 chk-15696 chk-18978 chk-23016 chk-26317 chk-26380 chk-28491 chk-32356 chk-36885 chk-41168 chk-6 chk-9135 shared
chk-11658 chk-16201 chk-19736 chk-23521 chk-26320 chk-26396 chk-28571 chk-32607 chk-37389 chk-41666 chk-6611 chk-9388 taskowned
chk-11910 chk-17210 chk-2 chk-24277 chk-26325 chk-26405 chk-29076 chk-32859 chk-37642 chk-41667 chk-7 chk-94
chk-12162 chk-17462 chk-20746 chk-25538 chk-26337 chk-26407 chk-29581 chk-33111 chk-38398 chk-41668 chk-7116 chk-95
out of which only chk-41668, chk-41667, chk-41666 have data.
The rest of the directories are empty.
Is this expected behavior? How can I delete those empty directories? Is there some configuration for deleting empty directories?
Answering my own question here:
In the UI I was seeing a 'checkpoint expired before completing' error in the checkpointing section, and found that resolving it requires increasing the checkpoint timeout.
I increased the timeout from 60 to 500 and Flink started deleting the empty chk-* directories.
env.getCheckpointConfig().setCheckpointTimeout(500);
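Both enableCheckpointing(long) and setCheckpointTimeout(long) take milliseconds, so the original setCheckpointTimeout(60) gave each checkpoint only 60 ms to finish before it expired. One way to keep the unit visible is to build the value from java.time.Duration. The sketch below is plain Java with no Flink dependency; the class name, the helper name and the 60-second example are illustrative assumptions, not values taken from the job above.

```java
import java.time.Duration;

public class CheckpointTimeouts {
    // Converts a Duration to the millisecond value that Flink's
    // setCheckpointTimeout(long) expects.
    static long toFlinkMillis(Duration d) {
        return d.toMillis();
    }

    public static void main(String[] args) {
        // The value from the question: 60 is read as 60 ms, not 60 s.
        System.out.println(toFlinkMillis(Duration.ofMillis(60)));   // prints 60
        // An illustrative 60-second timeout expressed in milliseconds.
        System.out.println(toFlinkMillis(Duration.ofSeconds(60)));  // prints 60000
        // In the job this could be passed as, for example:
        // env.getCheckpointConfig().setCheckpointTimeout(toFlinkMillis(Duration.ofSeconds(60)));
    }
}
```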

Problems with Chronicle Map on Windows

I am trying to use ChronicleMap for my index structure, this seems to work fine on Linux but when I am running my JUnit test on Windows (which is my development environment), I keep getting an error: java.io.IOException: Unable to wait until the file is ready, likely the process which created the file crashed or hung for more than 1 minute.
Here's the code snippet that is problematic:
File file = new File(idxFullPath);
ChronicleMap<Integer, int[]> idx =
ChronicleMapBuilder.of(Integer.class, int[].class)
.averageValue(getSampleIdxList())
.entries(IDX_MAX_SIZE)
.createPersistedTo(file);
The following exception is thrown:
[2016-06-17 14:32:47.779] ERROR main com.mcm.op.persistence.Persistence ERR java.io.IOException: Unable to wait until the file is ready, likely the process which created the file crashed or hung for more than 1 minute
at net.openhft.chronicle.map.ChronicleMapBuilder.waitUntilReady(ChronicleMapBuilder.java:1520)
at net.openhft.chronicle.map.ChronicleMapBuilder.openWithExistingFile(ChronicleMapBuilder.java:1583)
at net.openhft.chronicle.map.ChronicleMapBuilder.createWithFile(ChronicleMapBuilder.java:1444)
at net.openhft.chronicle.map.ChronicleMapBuilder.createPersistedTo(ChronicleMapBuilder.java:1405)
at com.mcm.op.persistence.Persistence.initIdx(Persistence.java:131)
at com.mcm.op.persistence.Persistence.init(Persistence.java:177)
at com.mcm.op.persistence.PersistenceTest.initPersist(PersistenceTest.java:47)
at com.mcm.op.persistence.PersistenceTest.setUp(PersistenceTest.java:29)
Indeed, it is likely that the process which created the file crashed, or was terminated during debugging, or something like that.
If it's OK to have a fresh index from one unit test run to the next, I recommend either deleting the file at idxFullPath before creating the Chronicle Map, or randomizing the mapping file via something like File.createTempFile(). In either case File.deleteOnExit() could be helpful.
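The two options for a fresh index file can be sketched with the standard library alone. The class and method names here are hypothetical, and the Chronicle Map builder call from the question is shown only as a comment; this is one possible arrangement under those assumptions, not a verified fix.

```java
import java.io.File;
import java.io.IOException;

public class FreshIndexFile {
    // Option 1: keep the fixed path, but remove any stale file left by an earlier run.
    static File freshFileAt(String path) throws IOException {
        File file = new File(path);
        if (file.exists() && !file.delete()) {
            throw new IOException("Could not delete stale index file: " + file);
        }
        file.deleteOnExit(); // request cleanup when the JVM exits normally
        return file;
    }

    // Option 2: a uniquely named file per test run ("idx" and ".dat" are arbitrary).
    static File randomizedFile() throws IOException {
        File file = File.createTempFile("idx", ".dat");
        file.deleteOnExit();
        return file;
    }

    public static void main(String[] args) throws IOException {
        File file = randomizedFile();
        System.out.println("Index file for this run: " + file);
        // The builder from the question would then receive this file:
        // ChronicleMapBuilder.of(Integer.class, int[].class) ... .createPersistedTo(file);
    }
}
```

Note that deleteOnExit() only takes effect on normal JVM termination, so a crashed or killed test run can still leave a file behind, which is why option 1 deletes explicitly before use.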
If you want to keep the index between unit test runs and always use the same file at idxFullPath for persistence, you could try builder.createOrRecoverPersistedTo() instead of the plain createPersistedTo() map creation method. However, this might slow down map creation.