Kafka retention policy on active consumer group - apache-kafka

Does Kafka clean up the logs only when no consumer is active in a consumer group?
When a partition has lag and an active consumer, I expected the current offset (and therefore the lag) to be adjusted once the time set in the retention policy has passed, but the lagging records are still consumable well after the retention period, as long as the consumer stays attached to the group.
I tested with log.retention.check.interval.ms set to 1 ms and log.cleanup.policy set to 'delete', along with the topic's retention.ms set to 1000 ms, but the lagging records were still consumable well past the 1000 ms.
When I remove the consumer and add a consumer again to the existing group, the offset gets adjusted as expected.
Does Kafka only adjust the offset when there is no active consumer?
If so, is there a way to update the current offset according to the retention policy other than removing and recreating the consumer?
Thanks in advance.

If there's an active consumer committing offsets back to Kafka's __consumer_offsets topic, then no, the offset information won't ever be removed, even though the original topic segments may be deleted so that those offsets no longer exist in the log. As the docs indicate, the group needs to become inactive first, and it also needs to remain inactive for the offset retention period.
offsets.retention.minutes
After a consumer group loses all its consumers (i.e. becomes empty) its offsets will be kept for this retention period before getting discarded.
(emphasis added)
You can call the seekToBeginning / seekToEnd functions if you want to explicitly control your group's position rather than rely on stored offsets.
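The interplay between stored offsets and an explicit seek can be sketched as a toy model (an illustration in Python, not the real client code; the function name and signature are my own):

```python
# Toy model of how a consumer's starting position is resolved: an explicit
# seek overrides any committed offset; otherwise the committed offset is
# used, but only while it still falls inside the log.
def resolve_start_position(committed, log_start, log_end, seek=None):
    """Return the offset the consumer would start fetching from.

    committed -- last committed offset for the group, or None
    log_start -- first offset still present in the log
    log_end   -- next offset to be written (log-end offset)
    seek      -- "beginning", "end", or None for "use committed offset"
    """
    if seek == "beginning":              # analogous to seekToBeginning
        return log_start
    if seek == "end":                    # analogous to seekToEnd
        return log_end
    if committed is not None and log_start <= committed <= log_end:
        return committed
    return None                          # auto.offset.reset would decide

# A committed offset that retention has already deleted is not usable:
print(resolve_start_position(committed=5, log_start=100, log_end=150))  # None
print(resolve_start_position(committed=5, log_start=100, log_end=150,
                             seek="beginning"))                         # 100
```

The point of the sketch: seeking makes the position independent of whatever the group last committed, which is why it "guarantees" the position even after retention has run.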

Related

Kafka consumer-group liveness empty topic partitions

Following up on this question, I would like to understand the semantics between consumer groups and offset expiry. In general, I'm curious how the Kafka protocol determines that some specific offset (for a consumer-group / topic / partition combination) has expired. Is it based on periodic commits from consumers that are part of the group protocol, or does the expiry clock only start once all consumers are deemed dead/closed? I'm thinking this could have repercussions when dealing with topic-partitions to which data isn't produced frequently. In my case, we have a consumer group reading from a fairly idle topic (not much data produced). Since the consumer group doesn't periodically commit any offsets, can we ever be in danger of losing previously committed offsets? For example, if some unforeseen rebalance happens, the topic-partitions could get reassigned with the offset commits lost, and this could cause the consumer to read data from the earliest (configured auto.offset.reset) point.
For user-topics, offset expiry / topic retention is completely decoupled from consumer-group offsets. Segments do not "reopen" when a consumer accesses them.
At a minimum, segment.bytes, retention.ms (or the broker-level log.retention.minutes/hours), and retention.bytes all determine when log segments get deleted.
For the internal __consumer_offsets topic, offsets.retention.minutes controls when committed offsets are discarded (also in coordination with that topic's segment.bytes).
The LogCleaner thread actively removes closed segments on a periodic basis; the consumers do not. If a consumer is lagging considerably and requests offsets from a segment that has been deleted, then auto.offset.reset gets applied.
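The retention decision above can be sketched as a simplified predicate (an illustration, not broker code; real brokers work on whole closed segments and re-check on each log.retention.check.interval.ms tick):

```python
import time

def deletable_segments(segments, retention_ms, retention_bytes, now_ms=None):
    """Return indices of closed segments eligible for deletion.

    segments -- list of (last_modified_ms, size_bytes), oldest first;
                the last entry is treated as the active segment and is
                never deleted.
    Exceeding either retention limit makes the oldest segments eligible.
    """
    now_ms = time.time() * 1000 if now_ms is None else now_ms
    total = sum(size for _, size in segments)
    eligible = []
    for i, (mtime, size) in enumerate(segments[:-1]):   # skip active segment
        too_old = retention_ms is not None and now_ms - mtime > retention_ms
        too_big = retention_bytes is not None and total > retention_bytes
        if too_old or too_big:
            eligible.append(i)
            total -= size      # size-based retention trims from the head
        else:
            break              # segments are ordered oldest-first; stop here
    return eligible

# Time-based: with retention_ms=1000, segments last touched at t=0 and
# t=500 are both stale by t=2000; the active segment survives.
print(deletable_segments([(0, 100), (500, 100), (2000, 100)],
                         retention_ms=1000, retention_bytes=None,
                         now_ms=2000))   # [0, 1]
```

Note that nothing in this decision looks at consumer positions, which is the decoupling the answer describes.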

Why did all the offsets disappear for consumers?

I have a service with Kafka consumers. Previously, I created and closed consumers after receiving records every time. I made a change and started using resume / pause without closing consumers (with ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG = false and consumer.commitSync(offsetAndMetadataMap)). The service worked great all week. After 7 days it was restarted. After the restart, all offsets had disappeared and the consumers began to receive all the old records again. How could this happen? Where did the offsets go?
I guess the consumers of that consumer group were not up during the 7 days before the restart?
The internal offsets topic, which contains the committed offsets of your groups, is defined with the compact,delete cleanup policy: it compacts the records to keep the last value per key, and it also deletes old records from the topic. The offsets topic retention defaults to 7 days (KAFKA-3806: Increase offsets retention default to 7 days, KIP-186, #4648), and it is configurable like any other topic configuration.
Offset expiration semantics has slightly changed in this version. According to the new semantics, offsets of partitions in a group will not be removed while the group is subscribed to the corresponding topic and is still active (has active consumers). If group becomes empty all its offsets will be removed after default offset retention period (or the one set by broker) has passed (unless the group becomes active again). Offsets associated with standalone (simple) consumers, that do not use Kafka group management, will be removed after default offset retention period (or the one set by broker) has passed since their last commit.
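The rule quoted above can be condensed into a toy predicate (an illustration only; the names are mine, not broker code):

```python
# Offsets of a consumer group survive while the group has active members;
# they are discarded only once the group has been empty for the full
# offsets.retention.minutes window.
def group_offsets_expired(now_ms, empty_since_ms, offsets_retention_ms):
    """empty_since_ms is None while the group still has active consumers,
    otherwise the timestamp at which the group became empty."""
    if empty_since_ms is None:           # group is active: offsets are safe
        return False
    return now_ms - empty_since_ms >= offsets_retention_ms

RETENTION_7_DAYS_MS = 7 * 24 * 60 * 60 * 1000

# An active group never expires, no matter how idle the topic is:
print(group_offsets_expired(now_ms=10**12, empty_since_ms=None,
                            offsets_retention_ms=RETENTION_7_DAYS_MS))  # False
```

This is why an idle topic alone is not dangerous: the clock only starts once the group becomes empty.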

CURRENT-OFFSET and LAG of kafka consumer group that has no active members

How are these two set? The behaviour I observe with kafka-consumer-groups.sh is that when a new message is appended to a certain partition, at first its LOG-END-OFFSET and LAG columns increment, and after some time the CURRENT-OFFSET column gets incremented and the LAG column decremented, although no offset was actually committed by any consumer, as there are no active consumers. Am I right? Does this always happen with consumer groups that have no active members, or is there a way to turn off the second stage, which simulates offsets being committed by non-existent consumers? This is actually confusing: you have to take into account that there are no active members in the consumer group in order to have the right perspective on what the CURRENT-OFFSET and LAG columns actually mean (not much, in that case).
OK, it seems that the consumer actually does continuously connect, poll the messages and commit the offsets, but in a volatile fashion (disconnecting each time), so that kafka-consumer-groups.sh always reports the group as having no active members.
This is a Flink job that acts this way. Is that possible?
If the retention policy kicks in and deletes old messages, the lag can decrease (if fewer messages are published than deleted), since the CURRENT-OFFSET positions itself at the earliest available record.
I'd check the retention policy for your topic, since this may be due to deleted messages: the lag doesn't account for purged messages, only available ones.
This has nothing to do with connecting to and disconnecting from the Kafka cluster; that would be far too slow and inefficient. It has to do with the way the Flink Kafka consumer is implemented, which is described here: Flink Kafka Connector
The committed offsets are only a means to expose the consumer's progress for monitoring purposes.
What it basically does: it does not subscribe to topics like standard consumers that use consumer groups with their coordinator and leader mechanisms; instead, it directly assigns partitions and only commits offsets to a consumer group for monitoring purposes (although it also has ways of using these offsets for continuation, see here). That is why these groups appear to Kafka as having no active members while still getting offsets committed.

Apache Kafka Cleanup while consuming messages

Playing around with Apache Kafka and its retention mechanism I'm thinking about following situation:
A consumer fetches first batch of messages with offsets 1-5
The cleaner deletes the first 10 messages, so the topic now has offsets 11-15
In the next poll, the consumer fetches the next batch with offsets 11-15
As you can see, the consumer lost offsets 6-10.
Question: is such a situation possible at all? In other words, will the cleaner execute while there is an active consumer? If yes, is the consumer able to somehow recognize that gap?
Yes such a scenario can happen. The exact steps will be a bit different:
Consumer fetches message 1-5
Messages 1-10 are deleted
Consumer tries to fetch message 6 but this offset is out of range
Consumer uses its offset reset policy auto.offset.reset to find a new valid offset.
If set to latest, the consumer moves to the end of the partition
If set to earliest the consumer moves to offset 11
If none or unset, the consumer throws an exception
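The steps above can be sketched as a toy resolution function (an illustration, not the real client; the function name is my own):

```python
# Resolve a requested fetch offset against a log whose head may have been
# deleted by retention, applying the auto.offset.reset policy on a miss.
def fetch(requested, log_start, log_end, auto_offset_reset):
    if log_start <= requested < log_end:
        return requested                 # offset still present in the log
    if auto_offset_reset == "earliest":
        return log_start                 # jump to first available offset
    if auto_offset_reset == "latest":
        return log_end                   # jump to the end of the partition
    # policy "none" (or unset): surface the gap instead of hiding it
    raise LookupError(f"offset {requested} out of range")

# The scenario from the question: consumer read 1-5, messages 1-10 were
# deleted, so the log now spans offsets 11-15 (log end = 16):
print(fetch(6, log_start=11, log_end=16, auto_offset_reset="earliest"))  # 11
print(fetch(6, log_start=11, log_end=16, auto_offset_reset="latest"))    # 16
```

With "earliest" the consumer silently skips the deleted range, which is exactly why "none" is the only setting that makes the gap visible.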
To avoid such scenarios, you should monitor the lead of your consumer group. It's similar to the lag, but the lead indicates how far the consumer is from the start of the partition. Being near the start carries the risk of messages being deleted before they are consumed.
If consumers are near the limits, you can dynamically add more consumers or increase the topic retention size/time if needed.
Setting auto.offset.reset to none makes the consumer throw an exception when this happens; the other values only log it.
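Lag and lead are just distances measured from opposite ends of the partition; a minimal illustration (the helper name is mine):

```python
# lag  = how far the consumer is behind the end of the log
# lead = how far the consumer is ahead of the start of the log
def lag_and_lead(current_offset, log_start_offset, log_end_offset):
    lag = log_end_offset - current_offset
    lead = current_offset - log_start_offset
    return lag, lead

# Consumer at offset 12 in a log spanning offsets [11, 16):
print(lag_and_lead(12, 11, 16))   # (4, 1) -- a small lead means retention
                                  # is close to deleting unread messages
```

A shrinking lead, rather than a growing lag, is the early warning that unread data is about to be deleted.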
Question: is such a situation possible at all? Will the cleaner execute while there is an active consumer?
Yes, if the messages have crossed their TTL (time to live) before they are consumed, this situation is possible.
Is the consumer able to somehow recognize that gap?
In cases where you suspect your configuration (high consumer lag, low TTL) might lead to this, the consumer should track its offsets. The kafka-consumer-groups.sh command gives you the position of all consumers in a consumer group as well as how far behind the end of the log they are.

Kafka topics beyond retention period

What happens to topics that are beyond their retention period? The messages will get wiped out, but will the topic still exist? And if so, will it write to offset 0 if there is only one partition on the topic?
Each offset within a partition is always assigned to a single message, and it won't be reassigned. From Log Compaction Basics documentation:
Note that the messages in the tail of the log retain the original offset assigned when they were first written—that never changes. Note also that all offsets remain valid positions in the log, even if the message with that offset has been compacted away ...
The brokers will hold no data for those topics, but the offsets will be set at their "high water mark" until new messages are produced.
The topic metadata will still exist, and the offsets always increase, never reset.
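The never-reused property can be illustrated with a toy single-partition log (a simplified model, not broker code):

```python
class ToyPartition:
    """Minimal single-partition log: offsets grow monotonically and are
    never reassigned, even after retention deletes every message."""

    def __init__(self):
        self.next_offset = 0     # log-end offset ("high water mark")
        self.log_start = 0       # first offset still present

    def append(self, _msg):
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def expire_everything(self):
        # Retention removed all messages: the log start catches up with
        # the end, but neither counter resets to 0.
        self.log_start = self.next_offset

p = ToyPartition()
for msg in ("a", "b", "c"):
    p.append(msg)                # offsets 0, 1, 2
p.expire_everything()            # topic still exists, but holds no data
print(p.append("d"))             # 3 -- not 0; offsets never restart
```

So after full retention the partition is empty but its offsets sit at the old high water mark, and the next produced message continues from there.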