LOG-END-OFFSET adds up to twice the number of messages in the topic - apache-kafka

According to this, the log-end-offset for the consumers in a consumer group on a topic should add up to the number of messages in that topic. I have a case where the log-end-offset adds up to twice the number of messages in the topic (the log-end-offsets add up to 28, whereas there are only 14 messages in the topic). What are some potential explanations for this?
The current issue I am facing with this JDBC sink connector is that there is a bad message at offset 0, i.e. if the connector tries to process it, it will fail due to violating a DB constraint. We have been able to work around this by manually moving the connector's consumer offset so that it skips over the bad message. Then, randomly, months later, it tried to go back and process it even though nobody manually asked it to. The two issues seem related: it seems like something is tricking the connector into thinking it needs to reprocess all of the messages in the topic, hence why the log-end-offsets add up to twice the number of messages in the topic.
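(For illustration, a manual offset move like this can be done with a standalone consumer while the connector is stopped; the group name below follows the usual connect-<connector name> convention for sink connectors, and the broker address, topic, partition, and target offset are placeholders.)

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.util.Collections;
import java.util.Properties;

public class SkipBadOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        // Sink connectors commit with the group "connect-<connector name>"; this name is a placeholder.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "connect-my-jdbc-sink");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // Partition that holds the bad record at offset 0 (placeholder topic/partition).
        TopicPartition tp = new TopicPartition("my-topic", 0);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            // Commit offset 1 so the connector's consumer resumes just past the bad record.
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(1L)));
        }
    }
}
```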
We are on Confluent 5.3.3.

Hi, the offset always moves forward, even when old messages get pruned/deleted from the topic after the configured retention time/size is reached. So in your case, having a log-end-offset of 28 while only 14 messages are actually in the topic at the moment is a valid situation.
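You can check this yourself by comparing each partition's beginning and end offsets: the end offset (the log-end-offset) never decreases, while the number of records currently retained is the difference between the two. A minimal sketch with the plain Java consumer, where the broker address and topic name are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class OffsetInspector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            String topic = "my-topic";                                           // placeholder topic
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(p -> new TopicPartition(topic, p.partition()))
                    .collect(Collectors.toList());

            Map<TopicPartition, Long> beginning = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> end = consumer.endOffsets(partitions);

            for (TopicPartition tp : partitions) {
                // The end offset (log-end-offset) never decreases, even after retention
                // deletes records; the actual record count is end - beginning.
                System.out.printf("%s beginning=%d end=%d records=%d%n",
                        tp, beginning.get(tp), end.get(tp), end.get(tp) - beginning.get(tp));
            }
        }
    }
}
```

If retention or compaction has removed records, the beginning offsets will be greater than 0 even though the end offsets keep counting every record ever written.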

Related

Kafka Consumer Offset Lag Causes Kafka Down

I've realized that one of my topics left messages inside the consumer offsets and I'm trying to track it down in Kafdrop, but I've seen that 3 of my __consumer_offsets partitions have a high last-offset value. I have also checked the messages, but some of them belong to September, which is 3 months ago. How can I get rid of these messages without harming the whole Kafka cluster?
Data in the offsets topic is automatically removed over time, and a high offset simply means that you have many consumers in your cluster, nothing more; it doesn't affect performance at all. You shouldn't manually remove data from it.

Missing events on previously empty partition when restarting kafka streams app

I have a strange issue that I cannot figure out how to resolve. I have a Kafka Streams app (2.1.0) that reads from a topic with around 40 partitions. The partitions use a range partition policy, so at the moment some of them can be completely empty.
My issue is that during downtime of the app one of those empty partitions was activated and a number of events were written to it. When the app was restored, though, it read all the events from the other partitions but ignored the events already stored in the previously empty partition (the app has OffsetResetPolicy LATEST for the specific topic). On top of that, when newer messages arrived at that partition it did consume them and somehow bypassed the previous ones.
My assumption is that __consumer_offsets does not have any entry for the specified partition when restoring, but how can I avoid this situation without losing events? I mean, the topic already exists with the specified number of partitions.
Does this sound familiar to anybody? Am I missing something? Do I need to set some parameter for Kafka? I cannot figure out why this is happening.
This is expected behaviour.
Your empty partition does not have a committed offset in __consumer_offsets. If there is no committed offset for a partition, the offset policy specified in auto.offset.reset is used to decide at which offset to start consuming the events.
If auto.offset.reset is set to LATEST, your Streams app will start consuming at the latest offset in the partition, i.e., it will skip the events that were added during the downtime and only consume events written to the partition afterwards.
If auto.offset.reset is set to EARLIEST, your Streams app will start from the earliest offset in the partition and also read the events written to the partition during the downtime.
As #mazaneica mentioned in a comment to your question, auto.offset.reset only affects partitions without a committed offset. So your non-empty partitions will be fine, i.e., the Streams app will consume events from where it stopped before the downtime.
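If you want the app to pick up events that were written during downtime to partitions without committed offsets, you can switch the reset policy to EARLIEST, either globally or only for that input topic. A minimal Kafka Streams sketch; the application id, broker address, topic name, and serdes are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;

import java.util.Properties;

public class ResetPolicyExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");        // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        // Global default, applied to partitions without a committed offset:
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "earliest");

        StreamsBuilder builder = new StreamsBuilder();
        // Or override the reset policy only for this input topic:
        builder.stream("my-input-topic",
                Consumed.with(Serdes.String(), Serdes.String())
                        .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST));
        // ... rest of the topology, then new KafkaStreams(builder.build(), props).start()
    }
}
```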

Apache Kafka Cleanup while consuming messages

Playing around with Apache Kafka and its retention mechanism, I'm thinking about the following situation:
A consumer fetches first batch of messages with offsets 1-5
The cleaner deletes the first 10 messages, so the topic now has offsets 11-15
In the next poll, the consumer fetches the next batch with offsets 11-15
As you can see, the consumer lost offsets 6-10.
Question: is such a situation possible at all? In other words, will the cleaner execute while there is an active consumer? If yes, is the consumer able to somehow recognize that gap?
Yes, such a scenario can happen. The exact steps will be a bit different:
Consumer fetches message 1-5
Messages 1-10 are deleted
Consumer tries to fetch message 6 but this offset is out of range
Consumer uses its offset reset policy auto.offset.reset to find a new valid offset.
If set to latest, the consumer moves to the end of the partition
If set to earliest the consumer moves to offset 11
If set to none, the consumer throws an exception (if unset, the default is latest)
To avoid such scenarios, you should monitor the lead of your consumer group. It's similar to the lag, but the lead indicates how far from the start of the partition the consumer is. Being near the start has the risk of messages being deleted before they are consumed.
If consumers get close to the start of the partition, you can dynamically add more consumers or increase the topic retention size/time if needed.
Setting auto.offset.reset to none will throw an exception if this happens; the other values only log it.
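For illustration, here is a sketch of a plain Java consumer running with auto.offset.reset set to none, so that a retention-induced gap surfaces as an exception instead of being skipped silently; the broker address, group id, topic, and the decision to jump to the beginning are all placeholders/assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.NoOffsetForPartitionException;
import org.apache.kafka.clients.consumer.OffsetOutOfRangeException;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GapAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                  // placeholder
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");             // fail loudly instead of silently skipping
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));          // placeholder
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                } catch (OffsetOutOfRangeException e) {
                    // The committed offset was deleted by retention: a gap exists.
                    // Decide explicitly what to do, e.g. alert and jump to the earliest available offset.
                    System.err.println("Records were deleted before consumption: " + e.partitions());
                    consumer.seekToBeginning(e.partitions());
                } catch (NoOffsetForPartitionException e) {
                    // No committed offset at all for these partitions.
                    consumer.seekToBeginning(e.partitions());
                }
            }
        }
    }
}
```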
Question: is such a situation possible at all? Will the cleaner execute while there is an active consumer?
Yes, if the messages have crossed the TTL (time-to-live) period before they are consumed, this situation is possible.
Is the consumer able to somehow recognize that gap?
In cases where you suspect your configuration (high consumer lag, low TTL) might lead to this, the consumer should track offsets. The kafka-consumer-groups.sh command gives you the position of all consumers in a consumer group as well as how far behind the end of the log they are.
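The same lag (distance to the log end) and lead (distance to the log start) can also be computed programmatically from the committed offsets and the partition boundaries. A sketch using the Java AdminClient together with a consumer; the broker address and group id are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.util.Map;
import java.util.Properties;

public class LagAndLead {
    public static void main(String[] args) throws Exception {
        String group = "my-group";                                               // placeholder
        Properties common = new Properties();
        common.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        Properties consumerProps = new Properties();
        consumerProps.putAll(common);
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (AdminClient admin = AdminClient.create(common);
             KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {

            // Committed offsets of the group, per partition.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets(group).partitionsToOffsetAndMetadata().get();

            Map<TopicPartition, Long> begin = consumer.beginningOffsets(committed.keySet());
            Map<TopicPartition, Long> end = consumer.endOffsets(committed.keySet());

            committed.forEach((tp, om) -> {
                long lag = end.get(tp) - om.offset();    // distance to the log end
                long lead = om.offset() - begin.get(tp); // distance to the log start; a small lead means deletion risk
                System.out.printf("%s committed=%d lag=%d lead=%d%n", tp, om.offset(), lag, lead);
            });
        }
    }
}
```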

Kafka retention AFTER initial consuming

I have a Kafka cluster with one consumer, which is processing TBs of data every day. Once a message is consumed and committed, it can be deleted immediately (or after a retention of a few minutes).
It looks like the log.retention.bytes and log.retention.hours configurations count from message creation, which is not good for me.
In case where the consumer is down for maintenance/incident, I want to keep the data until it comes back online. If I happen to run out of space, I want to refuse accepting new data from the producers, and NOT delete data that wasn't consumed yet (so the log.retention.bytes doesn't help me).
Any ideas?
If you can ensure your messages have unique keys, you can configure your topic to use compaction instead of a time-based retention policy. Then have your consumer, after it has processed each message, send a message back to the same topic with the same key but a null value; Kafka will compact away such messages. You can tune the compaction parameters to your needs (and the log segment file size: since the head segment is never compacted, you may want to set it to a smaller size if you want compaction to kick in sooner).
However, as mentioned before, this only works if messages have unique keys; otherwise you can't simply turn on compaction, as that would cause the loss of previous messages with the same key during periods when your consumer is down (or has fallen behind the head segment).
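For illustration, writing such a tombstone after processing a record is just a send with a null value. A minimal sketch; the broker address, topic, and key are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TombstoneAfterProcessing {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String processedKey = "order-42";                                    // key of the record just processed (placeholder)
            // Null value = tombstone; on a compacted topic this marks the key for removal.
            producer.send(new ProducerRecord<>("my-topic", processedKey, null));
            producer.flush();
        }
    }
}
```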

Apache Kafka and durable subscription

I'm considering using Apache Kafka and I could not find any information about durable subscriptions. Let's say I have an expiration of 5 seconds for messages in my partition. Now if a consumer fails and reconnects after 5 seconds, the message it missed will be gone. Even worse, it won't know that it missed a message. The durable subscription pattern solves this by saving the message for the consumer that failed or was disconnected. Is a similar feature implemented in Kafka?
This is not supported by Kafka. But you can of course always increase your retention time, and thus limit the probability that a consumer misses messages.
Furthermore, if you set auto.offset.reset to none, you will get an exception that informs you if a consumer misses any messages. Hence, it is possible to get informed if this happens.
Last but not least, it might be possible to use a compacted topic -- this would ensure that messages are not deleted until you explicitly write a so-called tombstone message. Note that records must have unique keys to use a compacted topic.
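For illustration, a compacted topic can be created up front with the Java AdminClient. A minimal sketch; the topic name, partition and replication counts, and the dirty-ratio tuning are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        Map<String, String> topicConfig = new HashMap<>();
        // Keep every key until a tombstone (null value) is written for it.
        topicConfig.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
        // Optional: compact more eagerly than the default.
        topicConfig.put(TopicConfig.MIN_CLEANABLE_DIRTY_RATIO_CONFIG, "0.1");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name, partition count, and replication factor are placeholders.
            NewTopic topic = new NewTopic("durable-events", 3, (short) 3).configs(topicConfig);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```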