Kafka vm.max_map_count - apache-kafka

We run a Kafka cluster for a Kafka Streams application.
After some hours our broker went down with an OutOfMemory exception.
We saw that vm.max_map_count was too low: the process had more than 40K memory maps.
Can someone explain what the problem might be, or what influences that number?
It always increases and never goes down.

Based on the pull request at https://github.com/apache/kafka/pull/4358/files (both the change being proposed and the comments reacting to it), it appears that each log segment (i.e. file) in each partition on each topic on the broker consumes two maps.
I would expect the value to rise until you reach a steady-state where all topics have logs that are old enough to start being deleted due to the retention interval. At that point, each new file would be expected to occur at around the same time as an older one is deleted (assuming roughly constant message rates). I would expect the value to drop if topics were deleted or if you changed the configuration of an existing topic or the full broker (e.g. reduce the log retention time or cause the logs to roll over less frequently), and to go up if you change the configuration in the opposite direction.
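As a rough back-of-the-envelope check, the relationship above can be sketched in a few lines. This is only an estimate under stated assumptions: the factor of two maps per segment comes from the pull request discussion, 65530 is the common Linux default for vm.max_map_count (some distributions ship a higher value), and the function name and the partition and segment counts are hypothetical.

```python
# Common Linux default; check `sysctl vm.max_map_count` on your own hosts.
DEFAULT_MAX_MAP_COUNT = 65530


def estimate_segment_maps(partitions, segments_per_partition, maps_per_segment=2):
    """Estimate the memory maps a broker holds for log segments.

    Assumes every segment costs `maps_per_segment` maps (two, per the
    pull request discussion) and ignores maps used by the JVM itself.
    """
    return partitions * segments_per_partition * maps_per_segment


# Hypothetical broker: 200 partitions with 120 segments each.
estimated = estimate_segment_maps(partitions=200, segments_per_partition=120)
print(estimated, "maps estimated; default limit is", DEFAULT_MAX_MAP_COUNT)
print("within default limit:", estimated < DEFAULT_MAX_MAP_COUNT)
```

If the estimate approaches the limit, the options are the ones described above: fewer or larger segments, shorter retention, or a higher vm.max_map_count.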

Related

Apache Kafka: large retention time vs. fast read of last value

Dear Apache Kafka friends,
I have a use case for which I am looking for an elegant solution:
Data is published to a Kafka topic at a relatively high rate. There are two competing requirements:
all records should be kept for 7 days (which is configured by min.compaction.lag)
applications should read the "last status" from the topic during their initialization phase
Log compaction is enabled so that the "last state" is available in the topic.
Now comes the problem. If an application wants to initialize itself from the topic, it has to read a lot of records to get the last state for all keys (the entire topic content must be processed). With this many records, that cannot be done with acceptable performance.
Idea
A streaming process streams the data of the topic into a corresponding ShortTerm topic which has a much shorter min.compaction.lag time (1 hour). The applications initialize themselves from this topic.
Risk
The streaming process is a potential source of errors. If it temporarily fails, the applications will no longer receive the latest status.
My Question
Are there any other possible solutions that satisfy the two requirements? Did I maybe miss a Kafka concept that helps to handle these competing requirements?
Any contribution is welcome. Thank you all.
If you don't have a strict guarantee of how frequently each key is updated, there is nothing you can do other than what you proposed.
To avoid the risk that the downstream app stops receiving new updates (because the data replication job stalls), I would recommend bootstrapping an app only from the short-term topic and letting it consume from the original topic afterwards. To not miss any updates, you can sync the switch-over as follows:
1. On app startup, get the replication job's committed offsets from the original topic.
2. Get the short term topic's current end-offsets (because the replication job will continue to write data, you just need a fixed stopping point).
3. Consume the short term topic from beginning to the captured end offsets.
4. Resume consuming from the original topic using the captured committed offsets (from step 1) as start point.
This way, you might read some messages twice, but you won't lose any updates.
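The switch-over can be illustrated with a small in-memory simulation. This is a sketch, not Kafka client code: topics are modelled as plain lists of (key, value) records whose list index stands in for the offset, and the function name and sample data are hypothetical.

```python
def bootstrap_then_switch(short_term, original, replication_committed_offset):
    """Rebuild per-key state from the short-term topic, then catch up
    from the original topic. Both topics are lists of (key, value)."""
    # Step 1: capture the replication job's committed offset on the original topic.
    resume_from = replication_committed_offset
    # Step 2: capture the short-term topic's current end offset as a fixed stop point.
    stop_at = len(short_term)

    state = {}
    # Step 3: consume the short-term topic from the beginning up to the stop point.
    for key, value in short_term[:stop_at]:
        state[key] = value
    # Step 4: resume from the original topic at the captured committed offset.
    for key, value in original[resume_from:]:
        state[key] = value
    return state


original = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3)]
# The replication job has copied offsets 0..2; compaction already dropped ("a", 1).
short_term = [("b", 1), ("a", 2)]

print(bootstrap_then_switch(short_term, original, replication_committed_offset=3))
```

If the captured committed offset lags behind what the replication job has actually written (for example 2 instead of 3), the record at offset 2 is applied twice, which is harmless because the later value still wins.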
To me, the two requirements you have mentioned together with the requirement for new consumers are not competing. In fact, I do not see any reason why you should keep a message of an outdated key in your topic for 7 days, because
New consumers are only interested in the latest message of a key.
Already existing consumers will have processed the message within 1 hour (as taken from your comments).
Therefore, my understanding is that your requirement "all records should be kept for 7 days" can be replaced by "each consumer should have enough time to consume the message & the latest message for each key should be kept for 7 days".
Please correct me if I am wrong and explain which consumer actually does need "all records for 7 days".
If that is the case you could do the following:
Enable log compaction as well as time-based retention of 7 days for this topic.
Fine-tune the compaction frequency to be very eager, that is, to keep as few outdated messages per key as possible.
Set min.compaction.lag to 1 hour so that all consumers have the chance to keep up.
That way, new consumers will read (almost) only the latest message for each key. If that is not fast enough, you can try increasing the number of partitions and consumer threads of your consumer groups.
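As a sketch, the topic settings described above could look like the following. The keys are real topic-level configs (min.compaction.lag.ms is the full name of the lag setting), but the dirty-ratio value is only an illustrative guess at "very eager" and would need tuning against the cleaner's I/O cost.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta to whole milliseconds, as Kafka configs expect."""
    return int(delta.total_seconds() * 1000)


topic_config = {
    # Compact by key and also delete segments older than the retention time.
    "cleanup.policy": "compact,delete",
    "retention.ms": str(to_ms(timedelta(days=7))),
    # Records younger than this are never compacted away.
    "min.compaction.lag.ms": str(to_ms(timedelta(hours=1))),
    # Assumption: a low ratio makes the cleaner run more eagerly.
    "min.cleanable.dirty.ratio": "0.01",
}

for key, value in topic_config.items():
    print(f"{key}={value}")
```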

Kafka: Messages disappearing from topics, largestTime=0

We have messages disappearing from topics on Apache Kafka versions 2.3, 2.4.0, 2.4.1 and 2.5.0. We noticed this during rolling deployments of our clusters; unfortunately it doesn't happen every time, so it is hard to reproduce.
Sometimes we lose all messages inside a topic, other times we lose all messages inside a partition. Whenever this happens, the following log line appears:
[2020-04-27 10:36:40,386] INFO [Log partition=test-lost-messages-5, dir=/var/kafkadata/data01/data] Deleting segments List(LogSegment(baseOffset=6, size=728, lastModifiedTime=1587978859000, largestTime=0)) (kafka.log.Log)
An earlier log line also says that this segment breached the 24-hour retention time. In this example, the message was produced ~12 minutes before the deployment.
Notice that all messages that are wrongly deleted have largestTime=0, while the ones that are properly deleted have a valid timestamp there. From what we read in the documentation and code, largestTime appears to be used to decide whether a given segment has breached the retention time.
Since we can observe this in multiple versions of Kafka, we think it might be related to something external to Kafka, e.g. ZooKeeper.
Does anyone have any ideas of why this could be happening? We are using Zookeeper 3.6.0.
We found out that the cause was not related to Kafka itself but to the volume where we stored the logs. Still, the following explanation might be useful for educational purposes:
In detail, it was a permission problem: Kafka was not able to read the .timeindex files when the log cleaner was triggered. This caused largestTime to be 0 and led to some messages being deleted well before the retention time.
Each topic partition is divided into several segments, and each segment is stored in its own .log file containing the actual messages. For each .log file there is a .timeindex file containing a mapping between message timestamps and offsets.
When Kafka needs to check whether a segment is deletable, it looks up the most recent timestamp in that index and uses it as largestTime. It then checks whether the retention limit was reached: currentTime - largestTime > retentionTime.
If so, it deletes the segment and the respective messages.
Since Kafka was not able to read the file, largestTime was 0, so the check reduced to currentTime > retentionTime, which was always true for our 1-day retention.
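The effect is easy to reproduce with the check exactly as described above. This is a simplified model of the explanation, not the broker's actual code; the function name is hypothetical, and the timestamp is the lastModifiedTime from the log line in the question.

```python
RETENTION_MS = 24 * 60 * 60 * 1000  # the 1-day retention from the question


def retention_breached(current_time_ms, largest_time_ms, retention_ms):
    """The check described above: currentTime - largestTime > retentionTime."""
    return current_time_ms - largest_time_ms > retention_ms


produced_at_ms = 1587978859000              # lastModifiedTime from the log line
now_ms = produced_at_ms + 12 * 60 * 1000    # roughly 12 minutes later

# With a readable .timeindex the segment is only 12 minutes old: kept.
print(retention_breached(now_ms, produced_at_ms, RETENTION_MS))  # False
# With largestTime=0 the segment looks decades old: deleted.
print(retention_breached(now_ms, 0, RETENTION_MS))               # True
```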
Ensure the date and time are in sync across all Kafka brokers and ZooKeeper nodes.
Bash command: date.
Run it on each node and compare the year, day, hour and minute.

Is there a way to always retain last message in the topic in Kafka server?

The official Kafka documentation (https://kafka.apache.org/documentation/#gettingStarted) describes time- and size-based retention parameters. Is there a way to configure Kafka to always keep the last message per topic, regardless of how old it is?
Currently I am thinking of republishing it at the end of the expiration period, but that does not look like a good idea.
See the section on log compaction: a topic with cleanup.policy=compact retains messages indefinitely, but only the latest message for each unique key.
Note that all messages will be retained within an open "segment", which defaults to 1GB worth of data, while any closed, old segments will have uniquely keyed events. You can tune the segment size and "dirty ratio" of a topic to make the LogCleaner more aggressive, but this comes at a performance cost.
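A toy model may make the open-versus-closed segment behaviour clearer. It is only a sketch of the idea, assuming segments are lists of (key, value) records and ignoring tombstones, the dirty ratio and cleaner scheduling; the function name is hypothetical.

```python
def compact(closed_segments, active_segment):
    """Keep only the latest record per key across the closed segments;
    leave the open (active) segment untouched."""
    records = [record for segment in closed_segments for record in segment]

    # Remember the position of the last occurrence of each key.
    latest_position = {}
    for position, (key, _value) in enumerate(records):
        latest_position[key] = position

    cleaned = [
        record
        for position, record in enumerate(records)
        if latest_position[record[0]] == position
    ]
    return cleaned + list(active_segment)


closed = [[("k1", "a"), ("k2", "b")], [("k1", "c")]]
active = [("k1", "d"), ("k1", "e")]

print(compact(closed, active))
# [('k2', 'b'), ('k1', 'c'), ('k1', 'd'), ('k1', 'e')]
```

Note that the two k1 records in the active segment both survive, which is why a smaller segment size makes compaction take effect sooner.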

kafka partition has lots of log segments

One topic has 20 partitions, and almost every partition has more than 20,000 log segment files, most of them created months ago. Even after I set retention.ms to a very short value, the segments are not deleted, while other topics are cleaned up normally.
I am wondering what the underlying issue is and how to solve it. I am worried that the total number of segments will keep increasing until it exceeds the OS vm.max_map_count, which would bring down the Kafka process itself. The following image shows the describe output for the abnormal topic.
Not sure what the issue is exactly, but some things to consider:
Broker vs topic-specific configs. Check to make sure your topic actually has the configs you think it has, and is not inheriting them from the broker settings.
Configs related to retention. As mentioned by Giorgos Myrianthous, you can look at log.retention.check.interval.ms and log.cleanup.policy. I would also look at the roll-related settings, like log.roll.hours. I believe that in some cases, Kafka will not delete a segment until its partition rolls, even if the segment is old. And rolling behaves as follows:
The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically. if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms (http://kafka.apache.org/20/documentation.html)
So make sure to consider the record timestamps, not just the segment files' age.
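The quoted rule can be written down as a one-line predicate. This sketch only restates the documentation excerpt above; the function name and timestamps are hypothetical, and other roll triggers such as segment size are ignored.

```python
ROLL_MS = 60 * 60 * 1000  # hypothetical log.roll.ms of one hour


def should_roll(first_timestamp_ms, new_timestamp_ms, roll_ms):
    """Roll when a new message's timestamp is >= T + log.roll.ms, where T
    is the timestamp of the first message in the segment."""
    return new_timestamp_ms >= first_timestamp_ms + roll_ms


first_ts = 1_600_000_000_000  # hypothetical timestamp T of the first message

print(should_roll(first_ts, first_ts + ROLL_MS, ROLL_MS))      # True
print(should_roll(first_ts, first_ts + ROLL_MS - 1, ROLL_MS))  # False
# A producer that keeps sending old timestamps never triggers a time-based roll.
print(should_roll(first_ts, first_ts - 1000, ROLL_MS))         # False
```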
Finally:
What version of Kafka are you using?
Have you looked carefully at the broker logs? Broker logs are how I've solved all such problems that I've encountered.

Kafka retention AFTER initial consuming

I have a Kafka cluster with one consumer, which processes terabytes of data every day. Once a message is consumed and committed, it can be deleted immediately (or after a retention of a few minutes).
It looks like the log.retention.bytes and log.retention.hours configurations count from message creation, which does not work for me.
If the consumer is down for maintenance or an incident, I want to keep the data until it comes back online. If I happen to run out of space, I want to refuse new data from the producers and NOT delete data that hasn't been consumed yet (so log.retention.bytes doesn't help me).
Any ideas?
If you can ensure your messages have unique keys, you can configure your topic to use compaction instead of a time-based retention policy. Then have your consumer, after processing each message, send a message back to the same topic with the same key but a null value. Kafka will compact such messages away. You can tune the compaction parameters to your needs, as well as the log segment file size: since the head segment is never compacted, you may want to make it smaller if you want compaction to kick in sooner.
However, as I mentioned before, this would only work if messages have unique keys, otherwise you can't simply turn on compaction as that would cause loss of previous messages with the same key during periods when your consumer is down (or has fallen behind the head segment).
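Both points can be seen in a small simulation. It is a simplified model under stated assumptions: the log is a list of (key, value) records, None stands for a null value, the whole log is treated as compactable, and tombstones are dropped immediately (a real broker keeps them for delete.retention.ms first). The function name and data are hypothetical.

```python
def compact_with_tombstones(log):
    """Keep the latest value per key, then drop keys whose latest value
    is a tombstone (None)."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return [(key, value) for key, value in latest.items() if value is not None]


# Unique keys: the consumer processed m1 and wrote a tombstone for it.
unique_keys_log = [("m1", "payload-1"), ("m2", "payload-2"), ("m1", None)]
print(compact_with_tombstones(unique_keys_log))
# [('m2', 'payload-2')]

# Non-unique keys: the first message is lost although nobody consumed it.
shared_key_log = [("k", "first"), ("k", "second")]
print(compact_with_tombstones(shared_key_log))
# [('k', 'second')]
```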