What consumer offset will be set if auto.offset.reset=earliest but topic has no messages - apache-kafka

I have Kafka server version 2.4 and set log.retention.hours=168(so that messages in the topic will get deleted after 7 days) and auto.offset.reset=earliest(so that if the consumer doesn't get the last committed offset then it should be processed from the beginning). And since I am using Kafka 2.4 version so by default value offsets.retention.minutes=10080 (since I am not setting this property in my application).
My Topic data is : 1,2,3,4,5,6,7,8,9,10
current consumer offset before shutting down consumer: 10
End offset:10
last committed offset by consumer: 10
So let's say my consumer is not running for the past 7 days and I have started the consumer on the 8th day. So my last committed offset by the consumer will get expired(due to offsets.retention.minutes=10080 property) and topic messages also will get deleted(due to log.retention.hours=168 property).
So wanted to know what consumer offset will be set by auto.offset.reset=earliest property now?

Although no data is available in the Kafka topic, your brokers still know the "next" offset within that partition. In your case the first and last offset of this topic is 10 whereas it does not contain any data.
Therefore, your consumer which already has committed offset 10 will try to read 11 when started again, independent of the consumer configuration auto.offset.reset.
Your example will get even more interesting when your topic has had offsets, say, until 15 while the consumer was shut down after committing offset 10. Now, imagine all offsets were removed from the topic due to the retention policy. If you then start your consumer only then the consumer configuration auto.offset.reset comes into effect as stated in the documentation:
"What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted)"
As long as the Kafka topic is empty there is no offset "set" for the consumer. The consumer just tries to find the next available offset, either based on
the last committed offset or,
in case the last committed offset does not exist anymore, the configuration given through auto.offset.reset.
Just as an additional note: Even though the messages seem to get cleaned by the retention policy you may still see some data in the topic due to Data still remains in Kafka topic even after retention time/size

Once the consumer group gets deleted from log, auto.offset.reset will take the precedence and consumers will start consuming data from beginning.
My Topic data is : 1,2,3,4,5,6,7,8,9,10
If the topic has the above data, the consumer will start from beginning, and all 1 to 10 records will be consumed
My Topic data is : 11,12,13,14,15,16,17,18,19,20
In this case if old data is purged due to retention, the consumer will reset the offset to earliest (earliest offset available at that time) and start consuming from there, for example in this scenario it will consume all from 11 to 20 (since 1 to 10 are purged)

Related

Why did all the offsets disappear for consumers?

I have a service with kafka consumers. Previously, I created and closed consumers after receiving records every time. I made a change and started using resume / pause without closing consumers (with ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG = false and consumer.commitSync(offsetAndMetadataMap);). The service worked great all week. After 7 days it was restarted. After the restart, all offsets disappeared and consumers began to receive all old records (). How could this happen? Where did the offsets go?
I guess your consumers of that consumer group were not up for the 7 days before the restart?
The internal offset topic which contains data about the offsets of your groups is defined as compacted and delete topic policy,
it means it compact the records to save last value of a key
and also deletes old records from the topic,
the default is 7 days, offset topic retention ,
KAFKA-3806: Increase offsets retention default to 7 days (KIP-186) #4648
it is configurable as any other topic configuration
Offset expiration semantics has slightly changed in this version. According to the new semantics, offsets of partitions in a group will not be removed while the group is subscribed to the corresponding topic and is still active (has active consumers). If group becomes empty all its offsets will be removed after default offset retention period (or the one set by broker) has passed (unless the group becomes active again). Offsets associated with standalone (simple) consumers, that do not use Kafka group management, will be removed after default offset retention period (or the one set by broker) has passed since their last commit.

Kafka Consumer not consuming from last commited offset after restart

I have a consumer polling from subscribed topic. It consumes each message and does some processing (within seconds), pushes to different topic and commits offset.
There are totally 5000 messages,
before restart - consumed 2900 messages and committed offset
after restart - started consuming from offset 0.
Even though consumer is created with same consumer group, it started processing messages from offset 0.
kafka version (strimzi) > 2.0.0
kafka-python == 2.0.1
We don't know how many partitions you have in your topic but when consumers are created within a same consumer group, they will consume records from different partitions ( We can't have two consumers in a consumer group that consume from the same partition and If you add a consumer the group coordinator will execute the process of Re-balancing to reassign each consumer to a specific partition).
I think the offset 0 comes from the property auto.offset.reset which can be :
latest: Start at the latest offset in log
earliest: Start with the earliest record.
none: Throw an exception when there is no existing offset data.
But this property kicks in only if your consumer group doesn't have a valid offset committed.
N.B: Records in a topic have a retention period log.retention.ms property so your latest messages could be deleted when your are processing the first records in the log.
Questions: While you want to consume message from one topic and process data and write them to another topic why you didn't use Kafka Streaming ?

Is it possible to lose Consumer group Offsets by Kafka brokers?

I was consuming from a Kafka topic (with a retention of 21 days) in a Kafka Cluster as a consumer (earliest/from beginning) for 15 days continously with x consumer group and on 15th day producer's team stopped producing and I stopped consumer on my side conforming that no messages left over to consume. Then Kafka Cluster was also turned off. Then on 16th day Kafka Cluster turned on and Producer started his producer on 23rd day and I started my consumer on myside as well. But when I started, I was getting messages from beginning not from where I left out eventhough am consuming with same x consumer group. So my question is why this happened? Does Kafka Broker lost information about consumer group?
When a consumer group loses all its consumers its offsets will be kept for the period configured in the broker property offsets.retention.minutes. This property defaults to 10080 which is the equivalent to 7 days — roughly the time taken when your consumer stopped (the 16th day) and when it was resumed (the 23rd day).
You can tweak this property to increase the retention period of offsets. Alternatively, you can also tweak the property offsets.retention.check.interval.ms that dictates how often a check for stale offsets will occur.

How do i reset, update or clear the offset of each consumer group for a topic in Kafka?

I have a topic let's say as test001 and supposed there is 10000 messages in topic . I have two consumer group's lets say test-group1 and test-group2 for consuming the message from the above topic.
If test-group1 consumer's has consumed 4000 message and test-group2 consumer's has consumed 4500 message so how can i do:
Reset the offset to 0 of test-group1 consumer group?
update the test-group1 consumer groups offset to 4500?
delete the message from topic and reset offset of all consumer group to 0?
I don't think you can reset the offset at consumer group level. You can use the seek method (in the Java client API) to move to the start of the partition (offset 0), end or any other offset of your choice. Try exploring some of the CLI options such as kafka-consumer-groups.sh, kafka-topics.sh
This ticket shows that one can produce directly to the __consumer_offsets topic to overwrite the offsets, using the special "__admin_client" id:
https://issues.apache.org/jira/browse/KAFKA-5246
I'm not familiar with the format of the __consumer_offsets topic messages. This post may help a bit but you'd need to do more digging yourself:
http://dayooliyide.com/post/kafka-consumer-offsets-topic/
It may be simpler to write an app that given your group id does a seek to the given position and commits the offset.
Offsets are stored for each topic+partition+group.id, not overall for the entire topic. You cannot delete committed offsets, only commit newer ones, or wait for them to expire from the _consumer-offsets topic (default of 24 hours).
In 0.11 there will be an offset management tool so you can change offsets from the CLI independently from the consuming apps.

What determines Kafka consumer offset?

I am relatively new to Kafka. I have done a bit of experimenting with it, but a few things are unclear to me regarding consumer offset. From what I have understood so far, when a consumer starts, the offset it will start reading from is determined by the configuration setting auto.offset.reset (correct me if I am wrong).
Now say for example that there are 10 messages (offsets 0 to 9) in the topic, and a consumer happened to consume 5 of them before it went down (or before I killed the consumer). Then say I restart that consumer process. My questions are:
If the auto.offset.reset is set to earliest, is it always going to start consuming from offset 0?
If the auto.offset.reset is set to latest, is it going to start consuming from offset 5?
Is the behavior regarding this kind of scenario always deterministic?
Please don't hesitate to comment if anything in my question is unclear.
It is a bit more complex than you described.
The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (2 supported offset storages now are Kafka and Zookeeper), and it also depends on what sort of consumer you use.
If you use a high-level java consumer then imagine following scenarios:
You have a consumer in a consumer group group1 that has consumed 5 messages and died. Next time you start this consumer it won't even use that auto.offset.reset config and will continue from the place it died because it will just fetch the stored offset from the offset storage (Kafka or ZK as I mentioned).
You have messages in a topic (like you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere and this time the auto.offset.reset config will decide whether to start from the beginning of the topic (earliest) or from the end of the topic (latest)
One more thing that affects what offset value will correspond to earliest and latest configs is log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. The latest offset will still remain the same as in previous example but the earliest one won't be able to be 0 because Kafka will already remove these messages and thus the earliest available offset will be 5.
Everything mentioned above is not related to SimpleConsumer and every time you run it, it will decide where to start from using the auto.offset.reset config.
If you use Kafka version older than 0.9, you have to replace earliest, latest with smallest,largest.
Just an update: From Kafka 0.9 and forth, Kafka is using a new Java version of the consumer and the auto.offset.reset parameter names have changed; From the manual:
What to do when there is no initial offset in Kafka or if the current
offset does not exist any more on the server (e.g. because that data
has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found
for the consumer's group
anything else: throw exception to the consumer.
I spent some time to find this after checking the accepted answer, so I thought it might be useful for the community to post it.
Further more there's offsets.retention.minutes. If time since last commit is > offsets.retention.minutes, then auto.offset.reset also kicks in