I am consuming from a Kafka topic using the Assign strategy with explicit offsets:
KafkaUtils.createDirectStream(sparkStreamingContext, PreferConsistent, Assign[String, String](fromOffsets.keys, kafkaConsumerParams, fromOffsets))
The topic's message retention period is 4 days.
If I try to consume from an offset whose messages have already expired because of retention, it throws an offset out of range exception.
So I want to check whether the offset I am assigning still exists in the topic. If it does not, I'll set auto.offset.reset to earliest.
Can anyone suggest how to check whether an assigned offset is still present in Kafka?
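One way to approach this check, sketched under assumptions: look up the earliest and latest offsets currently available for each partition (in the Java client, KafkaConsumer.beginningOffsets and KafkaConsumer.endOffsets return these) and compare the requested offsets against that range before passing them to Assign. The sketch below shows only the comparison logic in Python with plain dictionaries; the function name validate_offsets, the topic name and the sample numbers are hypothetical.

```python
def validate_offsets(requested, earliest, latest):
    """Clamp each requested offset into the range the broker still holds.

    All three arguments map a (topic, partition) tuple to an offset.
    Returns (safe_offsets, adjusted), where adjusted lists the partitions
    whose requested offset was out of range.
    """
    safe_offsets = {}
    adjusted = []
    for tp, offset in requested.items():
        low, high = earliest[tp], latest[tp]
        if offset < low:
            # Data at this offset was removed by retention: fall back to earliest.
            safe_offsets[tp] = low
            adjusted.append(tp)
        elif offset > high:
            # Offset lies beyond the end of the log: fall back to the end.
            safe_offsets[tp] = high
            adjusted.append(tp)
        else:
            safe_offsets[tp] = offset
    return safe_offsets, adjusted


# Hypothetical values: partition 0 has lost everything below offset 500.
requested = {("events", 0): 120, ("events", 1): 300}
earliest = {("events", 0): 500, ("events", 1): 0}
latest = {("events", 0): 900, ("events", 1): 450}

safe, adjusted = validate_offsets(requested, earliest, latest)
print(safe)
print(adjusted)
```

The resulting map could then be used in place of fromOffsets, so an expired offset never reaches the consumer in the first place.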
I have Kafka server version 2.4 with log.retention.hours=168 (so messages in the topic are deleted after 7 days) and auto.offset.reset=earliest (so that if the consumer cannot find its last committed offset, it processes from the beginning). Since I am on Kafka 2.4 and do not set this property in my application, offsets.retention.minutes has its default value of 10080.
My Topic data is : 1,2,3,4,5,6,7,8,9,10
current consumer offset before shutting down consumer: 10
End offset:10
last committed offset by consumer: 10
Say my consumer has not been running for the past 7 days and I start it on the 8th day. The last offset committed by the consumer will have expired (due to offsets.retention.minutes=10080) and the topic's messages will also have been deleted (due to log.retention.hours=168).
Which consumer offset will auto.offset.reset=earliest select now?
Although no data is available in the Kafka topic, your brokers still know the "next" offset within that partition. In your case the first and last offsets of this topic are both 10, even though it does not contain any data.
Therefore, your consumer which already has committed offset 10 will try to read 11 when started again, independent of the consumer configuration auto.offset.reset.
Your example gets more interesting if the topic had offsets up to, say, 15 while the consumer was shut down after committing offset 10. Now imagine all offsets were removed from the topic due to the retention policy. Only when you start your consumer in that situation does the consumer configuration auto.offset.reset come into effect, as stated in the documentation:
"What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted)"
As long as the Kafka topic is empty there is no offset "set" for the consumer. The consumer just tries to find the next available offset, either based on
the last committed offset or,
in case the last committed offset does not exist anymore, the configuration given through auto.offset.reset.
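The two cases above can be sketched as a small decision function. This is a simplified model of the behaviour described, not Kafka's actual implementation; the function name resolve_start_offset and the numbers are hypothetical, and the committed offset is treated as the position of the next record to fetch.

```python
def resolve_start_offset(committed, log_start, log_end, auto_offset_reset):
    """Return the offset a consumer would start fetching from.

    committed: last committed offset for the group, or None if absent/expired.
    log_start: first offset still available in the partition.
    log_end:   offset the next produced record will receive.
    """
    if committed is not None and log_start <= committed <= log_end:
        # The committed position still exists on the broker: resume there.
        return committed
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # Any other policy (for example "none") is modelled as an error.
    raise ValueError("no valid offset and auto.offset.reset=" + repr(auto_offset_reset))


# Committed position still inside the log: the reset policy is ignored.
print(resolve_start_offset(12, 10, 20, "earliest"))
# Committed offset 10, but retention moved the log start to 15.
print(resolve_start_offset(10, 15, 15, "earliest"))
# No committed offset at all and policy "latest": start at the end.
print(resolve_start_offset(None, 11, 21, "latest"))
```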
Just as an additional note: even though the messages should have been cleaned up by the retention policy, you may still see some data in the topic, as discussed in "Data still remains in Kafka topic even after retention time/size".
Once the consumer group's committed offsets are deleted from the log, auto.offset.reset takes precedence and consumers start consuming data from the beginning.
My Topic data is : 1,2,3,4,5,6,7,8,9,10
If the topic has the above data, the consumer will start from the beginning and consume all records 1 to 10.
My Topic data is : 11,12,13,14,15,16,17,18,19,20
In this case, if old data has been purged due to retention, the consumer resets its offset to the earliest offset available at that time and starts consuming from there. In this scenario it consumes everything from 11 to 20, since 1 to 10 have been purged.
I know that every message in a Kafka topic partition has an offset number, and that the partition takes care of the sequence of offsets.
But if a Kafka consumer group (or a single Kafka consumer) is reading a particular topic partition, how does it keep track of the offset up to which messages have been read, and who maintains this offset counter?
If the consumer goes down, how will a new consumer start reading from the next unread (or unacknowledged) offset?
The information about consumer groups is all stored in the internal Kafka topic __consumer_offsets. Whenever a group starts reading data from a topic, it checks its offset position in that internal topic, which has its cleanup policy set to compact. The compaction keeps this topic small.
Kafka comes with a command line tool kafka-consumer-groups.sh that helps you understand which information is stored for each consumer group.
More information is given in the Kafka Documentation on offset tracking.
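To illustrate why compaction keeps the offsets topic small, here is a minimal sketch of last-value-wins compaction over a list of commit records. It assumes each record is keyed by (group, topic, partition) with the committed offset as its value; the real __consumer_offsets record format is binary and more involved, and the names compact and commit_log are hypothetical.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order."""
    latest_index = {}
    for index, (key, _value) in enumerate(log):
        # Later records overwrite the remembered position of earlier ones.
        latest_index[key] = index
    return [entry for index, entry in enumerate(log)
            if latest_index[entry[0]] == index]


# Hypothetical commit history: group-a committed three times, group-b once.
commit_log = [
    (("group-a", "orders", 0), 5),
    (("group-b", "orders", 0), 2),
    (("group-a", "orders", 0), 9),
    (("group-a", "orders", 0), 14),
]

compacted = compact(commit_log)
committed = dict(compacted)  # current position per (group, topic, partition)
print(compacted)
print(committed[("group-a", "orders", 0)])
```

However many times a group commits, only its latest position per partition needs to survive, which is what a restarting consumer looks up.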
When multiple consumers read from a topic with a single partition, is there any possibility that every consumer gets all the messages?
I have created two consumers with manual offset commit. I started the first consumer and, after 2 minutes, started the second. The second consumer reads from the message where the first consumer stopped. Is there any possibility that the second consumer will read all the messages from the beginning? I'm new to Kafka, please help me out.
In your consumer you are presumably using commitSync, which commits the offsets returned by the last poll. When you start your 2nd consumer, since it is in the same consumer group, it reads messages from the last committed offset.
The messages your consumer consumes depend on the consumer group it belongs to. Suppose you have 2 partitions and 2 consumers in a single consumer group: each consumer then reads from a different partition, which achieves parallelism.
So, if you want your 2nd consumer to read from the beginning, you can do one of 2 things:
a) Put the 2nd consumer in a different consumer group. For this consumer group there won't be any offset stored anywhere, so the auto.offset.reset config decides the starting offset. Set auto.offset.reset to earliest (reset to the earliest offset) or to latest (reset to the latest offset).
b) Seek to the start of all partitions assigned to your consumer by using: consumer.seekToBeginning(consumer.assignment())
Documentation: https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning-java.util.Collection-
https://kafka.apache.org/documentation/#consumerconfigs
Within a single consumer group, a partition is always assigned to exactly one consumer, no matter how many consumers there are. Only that consumer can read the data; the others won't consume from the partition until it is assigned to them. When the consumer goes down, a partition rebalance happens and the partition is assigned to another consumer. Since you are committing manually, the new consumer starts reading from the committed offset.
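The ownership and rebalance behaviour described above can be illustrated with a toy model. This is only a sketch: the class GroupModel, its round-robin assignment and the member names are hypothetical, and real Kafka rebalancing is coordinated by the broker with pluggable assignment strategies.

```python
class GroupModel:
    """Toy model of one consumer group: partition ownership plus committed offsets."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []
        self.committed = {p: 0 for p in self.partitions}
        self.assignment = {}

    def _rebalance(self):
        # Round-robin: every partition gets exactly one owner, if any member exists.
        self.assignment = {}
        for i, p in enumerate(self.partitions):
            if self.members:
                self.assignment[p] = self.members[i % len(self.members)]

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        self.members.remove(member)
        self._rebalance()

    def commit(self, member, partition, offset):
        # Only the current owner of a partition may move its committed offset.
        if self.assignment.get(partition) != member:
            raise RuntimeError(f"{member} does not own partition {partition}")
        self.committed[partition] = offset

    def start_position(self, partition):
        # A newly assigned owner resumes from the group's committed offset.
        return self.committed[partition]


group = GroupModel([0])          # a topic with a single partition
group.join("consumer-1")
group.join("consumer-2")         # joins, but partition 0 already has an owner
owner_before = group.assignment[0]
group.commit("consumer-1", 0, 42)
group.leave("consumer-1")        # triggers a rebalance
owner_after = group.assignment[0]
print(owner_before, owner_after, group.start_position(0))
```

With one partition and two members, the second consumer stays idle until the first one leaves, then picks up at the committed offset rather than at the beginning.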
I'm using sarama-cluster (a Kafka consumer client written in Go).
On the broker, my topic's partition offset was 11000 and my consumer group's partition offset was 10100.
Then I ran my cluster consumer, but it consumed nothing. (This was 1~2 days later.)
But when I produced a message to the topic's partition, it consumed! (In each partition.)
The number of messages was 901.
Why does my cluster consumer only seem to start consuming when a message is produced?
My consumer setting was auto.offset.reset = latest
This is because of your offset reset setting. auto.offset.reset = latest means that a consumer group without a valid committed offset starts at the end of the partition and waits for the newest records. If you want to consume from the beginning, use auto.offset.reset = earliest.
The official Kafka documentation: https://kafka.apache.org/0110/documentation.html
I have a topic, let's say test001, and suppose there are 10000 messages in it. I have two consumer groups, let's say test-group1 and test-group2, consuming messages from this topic.
If test-group1's consumers have consumed 4000 messages and test-group2's consumers have consumed 4500 messages, how can I:
Reset the offset of the test-group1 consumer group to 0?
Update the test-group1 consumer group's offset to 4500?
Delete the messages from the topic and reset the offset of all consumer groups to 0?
I don't think you can reset the offset at consumer group level. You can use the seek method (in the Java client API) to move to the start of the partition (offset 0), the end, or any other offset of your choice. Try exploring some of the CLI options such as kafka-consumer-groups.sh and kafka-topics.sh.
This ticket shows that one can produce directly to the __consumer_offsets topic to overwrite the offsets, using the special "__admin_client" id:
https://issues.apache.org/jira/browse/KAFKA-5246
I'm not familiar with the format of the __consumer_offsets topic messages. This post may help a bit but you'd need to do more digging yourself:
http://dayooliyide.com/post/kafka-consumer-offsets-topic/
It may be simpler to write an app that, given your group id, seeks to the given position and commits the offset.
Offsets are stored for each topic+partition+group.id, not overall for the entire topic. You cannot delete committed offsets, only commit newer ones, or wait for them to expire from the __consumer_offsets topic (default of 24 hours in Kafka versions before 2.0).
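Because offsets are keyed per group, moving one group's position leaves every other group untouched. The sketch below models that with a plain dictionary; it assumes test001 has a single partition 0, and the helper commit_offset is hypothetical. With a real client the equivalent step would be seeking to the wanted position and then committing it.

```python
def commit_offset(store, group, topic, partition, offset):
    """Overwrite the stored position for exactly one (group, topic, partition) key."""
    if offset < 0:
        raise ValueError("offsets are non-negative")
    store[(group, topic, partition)] = offset


# Hypothetical state from the question, assuming a single partition 0.
offsets = {
    ("test-group1", "test001", 0): 4000,
    ("test-group2", "test001", 0): 4500,
}

# Rewind test-group1 to the start; test-group2 is not affected.
commit_offset(offsets, "test-group1", "test001", 0, 0)
rewound = dict(offsets)

# Move test-group1 to the position test-group2 currently holds.
target = offsets[("test-group2", "test001", 0)]
commit_offset(offsets, "test-group1", "test001", 0, target)

print(rewound)
print(offsets)
```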
In 0.11 there will be an offset management tool so you can change offsets from the CLI independently from the consuming apps.