How do I reset, update or clear the offset of each consumer group for a topic in Kafka? - apache-kafka

I have a topic, say test001, and suppose it holds 10000 messages. I have two consumer groups, say test-group1 and test-group2, consuming messages from that topic.
If the consumers in test-group1 have consumed 4000 messages and the consumers in test-group2 have consumed 4500 messages, how can I:
Reset the offset of the test-group1 consumer group to 0?
Update the test-group1 consumer group's offset to 4500?
Delete the messages from the topic and reset the offset of all consumer groups to 0?

I don't think you can reset the offset at consumer group level. You can use the seek method (in the Java client API) to move to the start of the partition (offset 0), the end, or any other offset of your choice. Also try exploring the CLI tools such as kafka-consumer-groups.sh and kafka-topics.sh.

This ticket shows that one can produce directly to the __consumer_offsets topic to overwrite the offsets, using the special "__admin_client" id:
https://issues.apache.org/jira/browse/KAFKA-5246
I'm not familiar with the format of the __consumer_offsets topic messages. This post may help a bit but you'd need to do more digging yourself:
http://dayooliyide.com/post/kafka-consumer-offsets-topic/
It may be simpler to write an app that, given your group id, seeks to the desired position and commits that offset.

Offsets are stored per topic+partition+group.id, not for the topic as a whole. You cannot delete committed offsets; you can only overwrite them with new commits or wait for them to expire from the __consumer_offsets topic (24 hours by default at the time of writing; the default of offsets.retention.minutes became 7 days in Kafka 2.0).
In 0.11 there will be an offset management tool so you can change offsets from the CLI independently from the consuming apps.
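To make the per-group keying concrete, here is a minimal sketch in plain Java. It is a toy model, not the Kafka client or the real __consumer_offsets format: the class OffsetStoreSketch, its methods and the use of partition 0 are all assumptions made for illustration. It only shows why overwriting the committed offset of test-group1 leaves test-group2 untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: mimics how committed offsets are keyed, not a Kafka client.
class OffsetStoreSketch {
    // Key is "group|topic|partition"; value is the committed offset.
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "|" + topic + "|" + partition;
    }

    // A commit simply overwrites the previous value stored under that key.
    void commit(String group, String topic, int partition, long offset) {
        committed.put(key(group, topic, partition), offset);
    }

    // Returns -1 when the group has never committed for this partition.
    long committed(String group, String topic, int partition) {
        return committed.getOrDefault(key(group, topic, partition), -1L);
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.commit("test-group1", "test001", 0, 4000L);
        store.commit("test-group2", "test001", 0, 4500L);

        // "Reset" test-group1: commit offset 0 for that group only.
        store.commit("test-group1", "test001", 0, 0L);
        System.out.println(store.committed("test-group1", "test001", 0)); // prints 0
        System.out.println(store.committed("test-group2", "test001", 0)); // prints 4500

        // "Update" test-group1 to 4500: another overwrite of the same key.
        store.commit("test-group1", "test001", 0, 4500L);
        System.out.println(store.committed("test-group1", "test001", 0)); // prints 4500
    }
}
```

With a real cluster the same overwrite would be done per partition, through a consumer that seeks and commits or through the offset management tooling mentioned above.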

Related

Where is offset of consumer stored in Kafka [duplicate]

From what I understand a consumer reads messages off a particular topic, and the consumer client will periodically commit the offset.
So if for some reason the consumer fails to process a particular message, that offset won't be committed and you can then go back and reprocess the message.
Is there anything that tracks the offset you just consumed and the offset you then commit?
Does Kafka distinguish between the consumed offset and the committed offset?
Yes, there is a big difference. The consumed offset is tracked by the consumer itself so that it knows which messages to fetch next from a topic partition.
The consumer can (but does not have to) commit an offset, either automatically or by calling the commit API. The committed offset is stored in an internal Kafka topic called __consumer_offsets, keyed by consumer group, topic and partition. It is used when the client is restarted or when a consumer joins or leaves the consumer group.
Just keep in mind that if your client does not commit offset n but later commits offset n+1, for Kafka this makes no difference compared with committing both offsets.
Edit: More details on consumed and committed offsets can be found in the JavaDocs of KafkaConsumer on Offsets and Consumer Position:
Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition. For example, a consumer which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5. There are actually two notions of position relevant to the user of the consumer:
The position of the consumer gives the offset of the next record that will be given out. It will be one larger than the highest offset the consumer has seen in that partition. It automatically advances every time the consumer receives messages in a call to poll(Duration).
The committed position is the last offset that has been stored securely. Should the process fail and restart, this is the offset that the consumer will recover to. The consumer can either automatically commit offsets periodically; or it can choose to control this committed position manually by calling one of the commit APIs (e.g. commitSync and commitAsync).
This distinction gives the consumer control over when a record is considered consumed. It is discussed in further detail below.
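The two notions of position can be sketched with a small plain-Java model. This is not the KafkaConsumer API: PositionSketch, its fields and its poll(int) signature are hypothetical, and the "partition" is just an in-memory list whose index plays the role of the offset. It shows that polling advances the position, that only a commit moves the committed offset, and that a restarted consumer resumes from the committed offset and so sees uncommitted records again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one consumer reading one partition; not the KafkaConsumer API.
class PositionSketch {
    private final List<String> log; // the partition's records, list index == offset
    private long position;          // offset of the next record to hand out
    private long committed;         // last offset that was stored durably

    PositionSketch(List<String> log, long committed) {
        this.log = log;
        this.committed = committed;
        this.position = committed;  // a (re)started consumer resumes from the committed offset
    }

    // Returns up to maxRecords records and advances the position past them.
    List<String> poll(int maxRecords) {
        int from = (int) Math.min(position, log.size());
        int to = Math.min(log.size(), from + maxRecords);
        List<String> batch = new ArrayList<>(log.subList(from, to));
        position = to;
        return batch;
    }

    // Stores the current position as the committed offset.
    void commit() { committed = position; }

    long position() { return position; }
    long committed() { return committed; }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 10; i++) log.add("r" + i);

        PositionSketch c = new PositionSketch(log, 0L);
        c.poll(5);
        System.out.println(c.position() + " " + c.committed()); // prints "5 0"
        c.commit();
        c.poll(3);
        System.out.println(c.position() + " " + c.committed()); // prints "8 5"

        // Simulated crash: a new consumer starts from the committed offset 5,
        // so records r5..r7 are delivered a second time.
        PositionSketch restarted = new PositionSketch(log, c.committed());
        System.out.println(restarted.poll(3)); // prints "[r5, r6, r7]"
    }
}
```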

Kafka: Who keeps track of the offset up to which a consumer group has read messages?

I know that every message in a Kafka topic partition has an offset number, and that Kafka takes care of the sequence of offsets.
But if I have a Kafka consumer group (or a single Kafka consumer) reading a particular topic partition, how does it keep track of the offset up to which messages have been read, and who maintains this offset counter?
If the consumer goes down, how will a new consumer start reading from the next unread (or unacknowledged) offset?
The committed offsets of consumer groups are all stored in the internal Kafka topic __consumer_offsets. Whenever a consumer in a group starts reading from a topic, it looks up the group's offset position in that internal topic, which has its cleanup policy set to compact. The compaction keeps this topic small.
Kafka comes with a command line tool kafka-consumer-groups.sh that helps you understand which information is stored for each consumer group.
More information is given in the Kafka Documentation on offset tracking.
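Why compaction keeps the topic small can be illustrated with a short plain-Java sketch. It is only a model of the idea (keep the latest value per key), not Kafka's log cleaner or the real record format of __consumer_offsets; the Commit class and the "group|topic|partition" key strings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of log compaction: only the most recent value per key survives.
class CompactionSketch {
    // One offset commit, keyed by a hypothetical "group|topic|partition" string.
    static final class Commit {
        final String key;
        final long offset;

        Commit(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    // Replays the log in order; later commits overwrite earlier ones for the same key.
    static Map<String, Long> compact(List<Commit> log) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Commit c : log) {
            latest.put(c.key, c.offset);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Commit> log = new ArrayList<>();
        log.add(new Commit("CG1|T1|0", 10L));
        log.add(new Commit("CG2|T1|0", 5L));
        log.add(new Commit("CG1|T1|0", 24L)); // supersedes the first record
        log.add(new Commit("CG2|T1|0", 11L)); // supersedes the second record

        // Four records were appended, but only one entry per key remains.
        System.out.println(compact(log)); // prints "{CG1|T1|0=24, CG2|T1|0=11}"
    }
}
```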

kafka multiple consumer groups

I have multiple consumer groups, each with one consumer.
Say there is a topic T1 with partition P1, and two consumer groups, CG1 and CG2.
CG1 has one consumer, CG1#C1, and correspondingly CG2 has CG2#C1.
Now we are receiving the same message across multiple consumer groups, which is expected by design.
My question is how Kafka maintains the offsets across multiple consumer groups.
If one consumer group is down for a few minutes and then comes back up, how will it get messages from its last committed offset?
CG1 is at offset 24 and CG2 had committed up to offset 10 before going down. So when CG2 is back up, how will it start from offset 11 without CG1 being impacted?
By default the offsets are stored in a special topic called __consumer_offsets. This is a compacted topic in which each offset is stored as a message whose key contains the consumer group, the topic and the partition. Thanks to this, the offsets of different consumer groups are independent. Hope this answers your question. For more information about how exactly this is handled, have a look here: http://kafka.apache.org/documentation/#impl_offsettracking
Consumer groups are independent entities. If consumers in different consumer groups subscribe to the same Kafka topic, their offset management will not clash. Each consumer group maintains its own offsets in the __consumer_offsets topic (the compacted internal topic that holds consumer offsets).
Kafka stores all the offsets for a given consumer group on a designated broker for that group, called the offset manager (the group coordinator in newer versions). A consumer sends offset commit requests to the offset manager, either periodically through auto-commit or explicitly through a commit API. Fetching does not wait for commits: the consumer's position advances as it polls, and the committed offset only determines where the group resumes after a restart or rebalance.
If a synchronous commit fails with a retriable error, the consumer retries it. Offsets are committed per partition.
Here is the detailed explanation : http://kafka.apache.org/documentation/#impl_offsettracking

read kafka message starting from a specific offset using high level API

I hope I am not making a mistake, but I remember the Kafka documentation mentioning that with the high-level APIs you can't start reading messages from a specific offset, and that this would change.
Is it now possible to use the high-level APIs to read messages from a specific partition and a specific offset? Could you please give me an example of how to do it?
I am using kafka 0.8.1.1.
Thanks in advance.
You can do that with kafka 0.9:
http://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
public void seek(TopicPartition partition, long offset)
Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption, to reset the fetch offsets.
Kafka 0.8.1.1 can use ZooKeeper to store offsets for each consumer group. If you configure your consumer to commit offsets to ZooKeeper, then you just need to manually set the starting offset for the topic and partition under ZooKeeper for your consumer group.
You need to connect to ZooKeeper and use the set command:
set /consumers/[groupId]/offsets/[topic]/[partitionId] -> long (offset)
E.g. setting offset 10 for partition 0 of topicname for the spark-app consumer group:
set /consumers/spark-app/offsets/topicname/0 10
When a consumer starts to consume messages from Kafka, it always starts from the last committed offset. If this last committed offset is not valid for any reason, the consumer applies the logic given by the auto.offset.reset configuration property.
Hope this helps.
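The start-up rule described above (use the committed offset if it is valid, otherwise fall back to auto.offset.reset) can be sketched as a small decision function in plain Java. This is a simplified model, not the client's actual code: resolveStartOffset and its parameters are hypothetical, a negative committed value stands for "nothing committed", and the policy names earliest, latest and none are the ones used by the newer Java consumer (the old 0.8 consumer used smallest and largest instead).

```java
// Toy model of how a starting offset is chosen; not the Kafka client's implementation.
class StartOffsetSketch {
    // committed < 0 means the group has no committed offset for this partition.
    // logStart is the first offset still in the log, logEnd is the next offset to be written.
    static long resolveStartOffset(long committed, long logStart, long logEnd, String autoOffsetReset) {
        boolean valid = committed >= logStart && committed <= logEnd;
        if (valid) {
            return committed; // normal case: resume where the group left off
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStart; // re-read everything still retained
            case "latest":
                return logEnd;   // skip to new messages only
            default:
                // Models the "none" policy: fail instead of guessing a position.
                throw new IllegalStateException(
                        "no valid committed offset and auto.offset.reset=" + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveStartOffset(10L, 0L, 100L, "latest"));   // prints 10
        System.out.println(resolveStartOffset(-1L, 5L, 100L, "earliest")); // prints 5
        System.out.println(resolveStartOffset(3L, 5L, 100L, "latest"));    // prints 100
    }
}
```

The third call models a committed offset that has already been removed by retention: offset 3 is below the log start of 5, so the policy decides.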