Kafka partition has lots of log segments

One topic has 20 partitions, and almost every one of them has more than 20,000 log segment files, most of which were created months ago. Even after I set retention.ms to a very short value, the segments are not deleted, while other topics recycle segments normally.
I am wondering what the underlying issue is and how to solve it, because I'm worried that the total number of segments will keep increasing until it exceeds the OS vm.max_map_count, which would damage the Kafka process itself. The attached image shows the describe output for the abnormal topic.

Not sure what the issue is exactly, but some things to consider:
Broker vs topic-specific configs. Check to make sure your topic actually has the configs you think it has, and is not inheriting them from the broker settings.
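For instance, a minimal sketch with the Java AdminClient (recent kafka-clients, 2.5+; broker address and topic name are placeholders, not from the question) shows which values are actual topic-level overrides and which are inherited from the broker:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ShowTopicConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder
            Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
            // source() tells you whether a value is a topic-level override (DYNAMIC_TOPIC_CONFIG)
            // or is inherited from the broker/default configuration.
            for (ConfigEntry entry : config.entries()) {
                System.out.printf("%s = %s (source: %s)%n", entry.name(), entry.value(), entry.source());
            }
        }
    }
}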
Configs related to retention. As mentioned by Giorgos Myrianthous, you can look at log.retention.check.interval.ms and log.cleanup.policy. I would also look at the roll-related settings, like log.roll.hours. I believe that in some cases, Kafka will not delete a segment until its partition rolls, even if the segment is old. And rolling follows this behavior:
The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically, if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms (http://kafka.apache.org/20/documentation.html)
So make sure to consider the record timestamps, not just the segment files' age.
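As a simplified sketch of that roll condition (illustration only; the real broker logic also rolls on segment size and other triggers):
// Simplified sketch of the timestamp-based roll rule quoted above:
// T is the timestamp of the first message in the active segment.
static boolean shouldRollByTime(long firstMessageTimestamp, long newMessageTimestamp, long logRollMs) {
    return newMessageTimestamp >= firstMessageTimestamp + logRollMs;
}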
Finally:
What version of Kafka are you using?
Have you looked carefully at the broker logs? Reading the broker logs is how I've solved every such problem I've encountered.

Related

Apache Kafka: large retention time vs. fast read of last value

Dear Apache Kafka friends,
I have a use case for which I am looking for an elegant solution:
Data is published to a Kafka topic at a relatively high rate. There are two competing requirements:
all records should be kept for 7 days (currently configured via min.compaction.lag.ms)
applications should read the "last status" from the topic during their initialization phase
Log compaction is enabled so that the "last state" is available in the topic.
Now comes the problem. If an application wants to initialize itself from the topic, it has to read a lot of records to get the last state for all keys (the entire topic content must be processed). But that is not feasible, performance-wise, with this volume of records.
Idea
A streaming process streams the data of the topic into a corresponding ShortTerm topic which has a much shorter min.compaction.lag time (1 hour). The applications initialize themselves from this topic.
Risk
The streaming process is a potential source of errors. If it temporarily fails, the applications will no longer receive the latest status.
My Question
Are there any other possible solutions to satisfy the two requirements? Did I maybe miss a Kafka concept that helps to handle these competing requirements?
Any contribution is welcome. Thank you all.
If you don't have a strict guarantee on how frequently each key will be updated, you cannot do much more than what you proposed.
To avoid the risk that the downstream app does not get new updates (because the data replication job stalls), I would recommend bootstrapping an app only from the short-term topic and letting it consume from the original topic afterwards. To avoid missing any updates, you can synchronize the switch-over as follows (a consumer sketch follows the list):
On app startup, get the replication job's committed offsets from the original topic.
Get the short term topic's current end-offsets (because the replication job will continue to write data, you just need a fixed stopping point).
Consume the short term topic from beginning to the captured end offsets.
Resume consuming from the original topic using the captured committed offsets (from step 1) as start point.
This way, you might read some messages twice, but you won't lose any updates.
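A rough sketch of that switch-over with the plain Java consumer (broker address, topic names, group id, and the way the replication job's committed offsets are obtained are assumptions for illustration, not details from the question):
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class BootstrapThenSwitch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");                 // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Step 1: capture the replication job's committed offsets on the original topic.
            // How these are obtained depends on your setup (e.g. published by the job itself);
            // here they are simply assumed to be available in `replicatedUpTo`.
            Map<TopicPartition, Long> replicatedUpTo = new HashMap<>();       // placeholder

            // Step 2: capture a fixed stopping point on the short-term topic.
            List<TopicPartition> shortTerm = partitionsOf(consumer, "short-term-topic"); // placeholder
            consumer.assign(shortTerm);
            consumer.seekToBeginning(shortTerm);
            Map<TopicPartition, Long> end = consumer.endOffsets(shortTerm);

            // Step 3: consume the short-term topic from the beginning up to the captured end offsets.
            // (Records past those offsets may also be read; they are just newer updates.)
            Set<TopicPartition> remaining = new HashSet<>(shortTerm);
            while (!remaining.isEmpty()) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    apply(rec); // rebuild the local "last state"
                }
                remaining.removeIf(tp -> consumer.position(tp) >= end.get(tp));
            }

            // Step 4: switch to the original topic, starting at the offsets captured in step 1.
            List<TopicPartition> original = partitionsOf(consumer, "original-topic");     // placeholder
            consumer.assign(original);
            original.forEach(tp -> consumer.seek(tp, replicatedUpTo.getOrDefault(tp, 0L)));
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(BootstrapThenSwitch::apply);
            }
        }
    }

    private static List<TopicPartition> partitionsOf(KafkaConsumer<?, ?> consumer, String topic) {
        List<TopicPartition> result = new ArrayList<>();
        consumer.partitionsFor(topic)
                .forEach(p -> result.add(new TopicPartition(topic, p.partition())));
        return result;
    }

    private static void apply(ConsumerRecord<String, String> record) {
        // update the application's local state with record.key() / record.value()
    }
}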
To me, the two requirements you have mentioned, together with the requirement for new consumers, are not competing. In fact, I do not see any reason why you should keep an outdated message for a key in your topic for 7 days, because:
New consumers are only interested in the latest message of a key.
Already existing consumers will have processed the message within 1 hour (as taken from your comments).
Therefore, my understanding is that your requirement "all records should be kept for 7 days" can be replaced by "each consumer should have enough time to consume the message & the latest message for each key should be kept for 7 days".
Please correct me if I am wrong and explain which consumer actually does need "all records for 7 days".
If that is the case you could do the following:
Enable log compaction as well as time-based retention of 7 days for this topic.
Fine-tune the compaction frequency to be very eager, meaning keep as few outdated messages per key as possible.
Set min.compaction.lag.ms to 1 hour so that all consumers have the chance to keep up.
That way, new consumers will read (almost) only the latest message for each key. If that is not performant enough, you can try increasing the partitions and consumer threads of your consumer groups.
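A hedged sketch of that combination with the Java AdminClient (topic name and broker address are placeholders, incrementalAlterConfigs needs brokers on 2.3+, and the exact values are only examples):
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TuneCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "status-topic"); // placeholder
            Collection<AlterConfigOp> ops = Arrays.asList(
                // compaction plus time-based deletion after 7 days
                new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact,delete"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET),
                // give every consumer one hour before an update can be compacted away
                new AlterConfigOp(new ConfigEntry("min.compaction.lag.ms", "3600000"), AlterConfigOp.OpType.SET),
                // eager compaction: clean as soon as a small fraction of the log is dirty
                new AlterConfigOp(new ConfigEntry("min.cleanable.dirty.ratio", "0.01"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}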

Kafka: Messages disappearing from topics, largestTime=0

We have messages disappearing from topics on Apache Kafka with versions 2.3, 2.4.0, 2.4.1 and 2.5.0. We notice this when we do a rolling deployment of our clusters, and unfortunately it doesn't happen every time, so it's very inconsistent.
Sometimes we lose all messages inside a topic, other times we lose all messages inside a partition. When this happens the following log is a constant:
[2020-04-27 10:36:40,386] INFO [Log partition=test-lost-messages-5, dir=/var/kafkadata/data01/data] Deleting segments List(LogSegment(baseOffset=6, size=728, lastModifiedTime=1587978859000, largestTime=0)) (kafka.log.Log)
There is also a preceding log line saying this segment hit the 24-hour retention time breach. In this example, the message was produced ~12 minutes before the deployment.
Notice that all messages that are wrongly deleted have largestTime=0, while the ones that are properly deleted have a valid timestamp there. From what we read in the documentation and code, it looks like largestTime is used to calculate whether a given segment has reached the time breach or not.
Since we can observe this in multiple versions of Kafka, we think it might be related to something external to Kafka, e.g. ZooKeeper.
Does anyone have any idea why this could be happening? We are using ZooKeeper 3.6.0.
We found out that the cause was not related to Kafka itself but to the volume where we stored the logs. Still, the following explanation might be useful for educational purposes:
In detail, it was a permission problem: Kafka was not able to read the .timeindex files when the log cleaner was triggered. This caused largestTime to be 0 and led to some messages being deleted long before the retention time.
Each topic partition is divided into several segments, and those segments are stored as separate .log files that contain the actual messages. For each .log file there is a .timeindex file containing a mapping between offsets and record timestamps.
When Kafka needs to check whether a segment is deletable, it looks up the most recent timestamp for the segment and stores it as largestTime. Then it checks whether the retention limit has been reached: currentTime - largestTime > retentionTime.
If so, it deletes the segment and the respective messages.
Since Kafka was not able to read the .timeindex file, largestTime was 0, the check reduced to currentTime > retentionTime, and that was always true for our 1-day retention.
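A simplified sketch of that check (not the actual broker code, just the condition described above):
// largestTime normally comes from the segment's .timeindex; when that file cannot be
// read and largestTime stays 0, (now - 0) exceeds any sane retention limit.
static boolean isSegmentDeletable(long nowMs, long largestTimeMs, long retentionMs) {
    return nowMs - largestTimeMs > retentionMs;
}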
Ensure date is synced between all Kafka brokers and ZooKeeper nodes.
Bash command: date.
Compare year, day, hour and minute.

Kafka compaction for de-duplication

I'm trying to understand how Kafka compaction works and have the following question: does Kafka guarantee uniqueness of keys for messages stored in a topic with compaction enabled?
Thanks!
The short answer is no.
Kafka doesn't guarantee uniqueness for keys stored in a topic with compaction enabled.
In Kafka you have two types of cleanup.policy:
delete - messages won't be available after the configured time. There are several properties that can be used for this: log.retention.hours, log.retention.minutes, log.retention.ms. By default log.retention.hours is set to 168, meaning that messages older than 7 days will be deleted.
compact - at least one message will be available for each key. In some situations it will be exactly one, but in most cases it will be more. The compaction process runs periodically in the background. It copies log parts, removing duplicates and leaving only the last value for each key.
If you want to read only one value for each key, you have to use the KTable<K,V> abstraction from Kafka Streams.
Related question regarding latest value for key and compaction:
Kafka only subscribe to latest message?
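A minimal Kafka Streams sketch of the KTable approach mentioned above (application id, broker address, and topic name are placeholders):
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class LatestValuePerKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "latest-value-per-key"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");       // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // The KTable keeps exactly one (latest) value per key, even though the
        // compacted source topic may still contain older duplicates for a key.
        KTable<String, String> latest = builder.table("compacted-topic");        // placeholder
        latest.toStream().foreach((key, value) -> System.out.println(key + " -> " + value));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}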
Looking at the 4 guarantees of Kafka compaction, number 4 states:
Any consumer progressing from the start of the log will see at least the final state of all records in the order they were written. Additionally, all delete markers for deleted records will be seen, provided the consumer reaches the head of the log in a time period less than the topic's delete.retention.ms setting (the default is 24 hours). In other words: since the removal of delete markers happens concurrently with reads, it is possible for a consumer to miss delete markers if it lags by more than delete.retention.ms.
So, you will have more than one value for the key if the head of the topic is not being retained by the delete.retention.ms policy.
As I understand it, if you set a 24h delete retention (delete.retention.ms=86400000), you'll have a unique value for a single key among all messages that are older than 24h. That's your "at least", but not "only", as many other messages for the same key may have arrived during the last 24 hours.
So, it is guaranteed that you'll see at least one value per key, but not only the last one, because retention hasn't acted on the recent messages yet.
Edit: as cricket's comment states, even if you set a delete retention property of 1 day, log.roll.ms is what defines when a log segment is closed, based on the messages' timestamps. As this last (active) segment is never eligible for compaction, it becomes the second factor that prevents you from having just the last value for your known key. If your topic starts at T0, then messages after T0 + log.roll.ms will be in the open log segment and thus not compacted.

Kafka offset topic compaction is not happening

Kafka 0.11.0.0 has been running in production. We see that log compaction of the consumer offsets topic is not happening. In the consumer offset partitions, we see log segments remaining there from the last 3 months. The log cleaner logs showed that it failed to build the map for compaction due to a CorruptRecordException.
Since there were a lot of segment files, each 100 MB in size, in the partitions, instead of running DumpLogSegments and finding the bad segment, we decided to go ahead and delete the old segment files and keep only the ones from the last 3 days. After this, we restarted Kafka and it seemed to work fine.
But within 2 days of doing this, we are seeing the logs build up again, just as before. We no longer see a CorruptRecordException in the logs, but the offsets are not getting compacted, and it's been 7 days now.
None of the default values for compaction or retention were changed. preallocate is also set to false. Can anybody give me any insight into what could be going on here?
Edit:
The CorruptRecordException that I was running into seems to originate from AbstractLegacyRecordBatch.java
// Read the offset and size of the next record from the log buffer.
long offset = offsetAndSizeBuffer.getLong(Records.OFFSET_OFFSET);
int size = offsetAndSizeBuffer.getInt(Records.SIZE_OFFSET);
// A record smaller than the minimum legacy record overhead indicates corrupt data.
if (size < LegacyRecord.RECORD_OVERHEAD_V0)
    throw new CorruptRecordException(String.format("Record size is less than the minimum record overhead (%d)", LegacyRecord.RECORD_OVERHEAD_V0));
Any idea about when this can occur, and why the compaction is not happening even after the old segments were deleted?

Kafka vm.max_map_count

We have a Kafka cluster for Kafka stream application.
After some hours our broker went down with an OutOfMemory exception.
We saw that vm.max_map_count was not sufficient and that the number of memory maps of the process was above 40K.
Can someone explain what the problem could be, or what influences that parameter?
The number always increases and never goes down.
Based on the pull request at https://github.com/apache/kafka/pull/4358/files (both the change being proposed and the comments reacting to it), it appears that each log segment (i.e. file) in each partition on each topic on the broker consumes two maps.
I would expect the value to rise until you reach a steady-state where all topics have logs that are old enough to start being deleted due to the retention interval. At that point, each new file would be expected to occur at around the same time as an older one is deleted (assuming roughly constant message rates). I would expect the value to drop if topics were deleted or if you changed the configuration of an existing topic or the full broker (e.g. reduce the log retention time or cause the logs to roll over less frequently), and to go up if you change the configuration in the opposite direction.
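As a back-of-the-envelope sketch of that relationship (all numbers below are hypothetical; the two-maps-per-segment figure comes from the pull request discussion above):
// Rough, hypothetical estimate of how many memory maps a broker needs for its
// log segments, assuming ~2 maps per segment (log data + index) per the PR above.
public class MapCountEstimate {
    public static void main(String[] args) {
        long partitionsOnBroker = 500;       // hypothetical
        long segmentsPerPartition = 40;      // roughly retention.ms / segment.ms, plus the active segment
        long mapsPerSegment = 2;

        long estimatedMaps = partitionsOnBroker * segmentsPerPartition * mapsPerSegment;
        System.out.println("Estimated maps: " + estimatedMaps); // 40,000 here
        // Compare against the OS limit, e.g. `sysctl vm.max_map_count` (often 65530 by default).
    }
}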