How to get the group commit offset from Kafka (0.10.x) - apache-kafka

Offset information for a consumer group used to be stored in ZooKeeper. Now, in a Kafka cluster (0.10.x), it is stored in the topic named __consumer_offsets.
But how can I get the offset information for a group that I specify?

For active groups, run the command below to retrieve the offsets:
bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 --describe --group test-consumer-group
For inactive groups, first work out which partition of the offsets topic holds the group by calculating Math.abs(groupId.hashCode()) % 50 (50 is the default value of offsets.topic.num.partitions), then run:
bin/kafka-simple-consumer-shell.sh --topic __consumer_offsets --partition <calculated number> --broker-list broker1:9092 --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter"
to find the offsets for the group.
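The partition calculation above can be tried out with a few lines of Java. This is a minimal sketch, not Kafka's own code: the class and method names are made up for illustration, and the guard for Integer.MIN_VALUE is an addition (Math.abs returns a negative number for that one value). If your cluster changes offsets.topic.num.partitions, pass that number instead of 50.

```java
// Sketch: which __consumer_offsets partition holds the commits of a group.
// Class and method names are hypothetical, chosen for this example only.
final class OffsetsPartition {

    // 50 is the default of the broker setting offsets.topic.num.partitions.
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    // Same arithmetic as Math.abs(groupId.hashCode()) % 50, plus a guard:
    // Math.abs(Integer.MIN_VALUE) is still negative, so map it to 0.
    static int partitionFor(String groupId, int numPartitions) {
        int hash = groupId.hashCode();
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // Pass a group id as the first argument, or fall back to the one used above.
        String groupId = args.length > 0 ? args[0] : "test-consumer-group";
        System.out.println(groupId + " -> partition "
                + partitionFor(groupId, DEFAULT_OFFSETS_PARTITIONS));
    }
}
```

The printed number is what goes into --partition in the kafka-simple-consumer-shell.sh command.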

Related

How to create multiple kafka consumer groups in same topic

Below is required scenario.
Topic-1 has 6 partitions. I want to create 3 consumer groups, cg1, cg2 and cg3, and map them like this (cg1 - 0,1 ; cg2 - 2,3 ; cg3 - 4,5). How can I create them using kafka-console-consumer.sh or kafka-consumer-groups.sh?
The Kafka documentation explains this scenario, but nowhere mentions how to do it.
Any help is appreciated!
A Kafka consumer group is a collection of consumers that share the same group id. A consumer group distributes processing by sharing partitions across its consumers.
For example, take a single topic with three partitions and a consumer group with two members: each partition in the topic is assigned to exactly one member of the group.
Note: a topic with n partitions can be consumed by at most n consumers of a consumer group, with one partition per consumer.
In your case, if you use a consumer group on a topic, all partitions of the topic are assigned to that consumer group.
But if you do not need group-managed assignment, you can assign partitions directly to each consumer; in that case rebalancing does not come into the picture.
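To make the counting in the note above concrete, here is a small simulation. It is only an illustration under simplifying assumptions: it hands out partitions round-robin, which is not how Kafka's real assignors are implemented, and the class, method and member names are invented for the example. It does show the two points made above: every partition has exactly one owner inside a group, and members beyond the partition count stay idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: spreads partitions over group members round-robin.
// Kafka's actual partition assignors are more involved than this.
final class GroupAssignmentSketch {

    static Map<String, List<Integer>> assign(int partitionCount, List<String> members) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment; // nobody to assign to
        }
        // Each partition gets exactly one owner within the group.
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, two members: one member owns two partitions.
        System.out.println(assign(3, Arrays.asList("member-1", "member-2")));
        // Three partitions, four members: the fourth member gets nothing.
        System.out.println(assign(3,
                Arrays.asList("member-1", "member-2", "member-3", "member-4")));
    }
}
```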
I am using Kafka Confluent-kafka 2.6.0-5.1.2:
sh kafka-console-consumer --bootstrap-server localhost:9092 --partition 0 --topic abc --group cg1
sh kafka-console-consumer --bootstrap-server localhost:9092 --partition 1 --topic abc --group cg1
--partition <Integer: partition>: The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
Using kafka-consumer-groups you can describe the consumer group's details:
sh kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group a
TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
abc    0          123             678             0    -            -     -
abc    1          234             345             0    -            -     -
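You can also assign partitions manually in Java, as below:
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
List<TopicPartition> partitions = new ArrayList<>();
partitions.add(new TopicPartition("abc", 0));
partitions.add(new TopicPartition("abc", 1));
......
new KafkaConsumer<>(consumerProperties).assign(partitions);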
Note that it isn't possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).
Alternative approaches:
Use 3 separate topics to consume messages using a separate consumer group.
Programmatically filter partitions while consuming messages.
I can't try it out right now, but I think you need to pass it as a consumer property
kafka-console-consumer.sh --consumer-property group.id=${your_group_id}
or if you have a config file
kafka-console-consumer.sh --consumer.config ${your_config_file}

Command to check whether a Kafka offset was reset

Recently the name of my topic was changed, and it then seems that my consumer read all the messages from the topic, ignoring the offset. Does anyone know a command I can use to check whether my offset has been reset?
Thanks
Marcus
In Kafka version 2+, you can see the offset by describing your consumer group:
kafka-consumer-groups --describe --group <consumer group name> --bootstrap-server <kafka broker IP>:9092
To change the offset to the latest:
kafka-consumer-groups --bootstrap-server <kafka broker IP>:9092 --group <consumer group name> --topic <Topic name> --reset-offsets --to-latest --execute
Depending on how your Kafka consumer is configured, a consumer group can read messages from the beginning, from a specific offset, or from the latest offset.

Current offset behavior when set by kafka-consumer-groups to earliest?

I have a kafka topic with 25 partitions and the cluster has been running for 5 months.
As per my understanding, for each partition of a given topic the offsets start from 0 and increase (0, 1, 2, ...) without bound.
I see the log-end-offset at a very high value (right now -> 1230628032).
I created a new consumer group with the offset set to earliest, so I expected a client in that consumer group to start from offset 0.
The command I used to create the new consumer group with the offset set to earliest:
kafka-consumer-groups --bootstrap-server <IP_address>:9092 --reset-offsets --to-earliest --topic some-topic --group to-earliest-cons --execute
I can see that the consumer group was created. I expected the current offset to be 0; however, when I described the consumer group, the current offset was very high, at the moment --> 1143755193.
The record retention period is set to 7 days (the standard value).
My question is: why isn't the first offset that a consumer from this consumer group will read 0? Does it have something to do with data retention?
Can anyone help understand this?
It is exactly data retention. It is highly probable that Kafka already removed old messages with offset 0 from your partitions, so it doesn't make sense to start from 0. Instead, Kafka will set offset to the earliest available message on your partition. You can check those offsets using:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <IP_address>:9092 --topic some-topic --time -2
You will probably see values really close to what you're seeing as new consumer offset.
You can also try and set offset explicitly to 0:
./kafka-consumer-groups.sh --bootstrap-server <IP_address>:9092 --reset-offsets --to-offset 0 --topic some-topic --group to-earliest-cons --execute
However, you will see a warning that offset 0 does not exist and that a higher value (the aforementioned earliest available message) will be used instead:
New offset (0) is lower than earliest offset for topic partition some-topic. Value will be set to 1143755193
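The behaviour described in this answer boils down to clamping the requested offset to what the partition still holds. The sketch below only illustrates that arithmetic; it is not the tool's code, the class and method names are invented, and the upper bound (clamping to the log end offset) is an assumption added for symmetry. The numbers are the ones quoted in the question and in the warning above.

```java
// Sketch of the clamping described above; not the kafka-consumer-groups code.
final class ResetToOffsetSketch {

    // A requested offset below the earliest retained offset is raised to it.
    // Assumption: a request above the log end offset is lowered in the same way.
    static long effectiveOffset(long requested, long earliest, long logEnd) {
        return Math.max(earliest, Math.min(requested, logEnd));
    }

    public static void main(String[] args) {
        long earliest = 1143755193L; // earliest retained offset, from the warning above
        long logEnd = 1230628032L;   // log end offset quoted in the question
        // Asking for offset 0 after retention has removed it:
        System.out.println(effectiveOffset(0L, earliest, logEnd)); // prints 1143755193
    }
}
```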

How to get log end offset of all partitions for a given kafka topic using kafka command line?

When I describe a Kafka topic, it doesn't show the log end offset of any partition, but it shows all the other metadata such as ISR, Replicas and Leader.
How do I see the log end offset of each partition for a given topic?
I ran this: ./kafka-topics.sh --zookeeper zk-service:2181 --describe --topic "__consumer_offsets"
The output doesn't have an offset column.
Note: I only need the log end offset.
Since you're only looking for the log end offset for a topic, you can use kafka-run-class with the kafka.tools.GetOffsetShell class.
Assuming your topic is __consumer_offsets, you would get the end offset by running:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic __consumer_offsets
Change the --broker-list localhost:9092 to your desired Kafka address. This will list all of the log end offsets for each partition in the topic.
Install kafkacat; it's an easy-to-use Kafka tool:
sudo apt-get update
sudo apt-get install kafkacat
kafkacat -C -b <kafka-broker-ip-and-port> -t <topic> -o -1
This will not consume anything because the offset is incremented after a message is added. But it will give you the offsets for all the partitions. Note however that this isn't the current offset that you are consuming at... The above answers will help you more in terms of looking into partition lag.
Following is the command you would need to get the offset of all partitions for a given kafka topic for a given consumer group:
kafka-consumer-groups --bootstrap-server <kafka-broker-list-with-ports> --describe --group <consumer-group-name>
Please note that the <consumer-group-name> at the end is important as the offsets are committed by consumers that are typically a part of a consumer group.
The output of this command may look something like:
TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID    HOST    CLIENT-ID
<topic-name>  0          62              62              0    <consumer-id>  <host>  <client>
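As a quick aid for reading this output: the LAG column is the log end offset minus the group's current (committed) offset for that partition. The sketch below just spells out that subtraction; the class and method names are made up, and the second pair of numbers in main is hypothetical.

```java
// Sketch: how the LAG column relates to the two offset columns.
final class LagSketch {

    // Lag of one partition: messages written but not yet committed by the group.
    static long lag(long currentOffset, long logEndOffset) {
        return logEndOffset - currentOffset;
    }

    public static void main(String[] args) {
        // The row shown above: CURRENT-OFFSET 62, LOG-END-OFFSET 62.
        System.out.println(lag(62L, 62L));   // prints 0
        // Hypothetical partition whose consumer is 50 messages behind.
        System.out.println(lag(100L, 150L)); // prints 50
    }
}
```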
In your post however, you're trying to get this information for the internal topic __consumer_offsets so you would need a consumer group which would have consumers consuming from this internal topic. You could perhaps do the following:
kafka-console-consumer --bootstrap-server <kafka-broker-list-with-ports> --topic __consumer_offsets --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --max-messages 5
Output of the above command:
[<consumer-group-name>,<topic-name>,0]::[OffsetMetadata[481690879,NO_METADATA],CommitTime 1479708539051,ExpirationTime 1480313339051]
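If you want to pick the group name or the offset out of such lines programmatically, a regular expression is enough for the format shown above. This is a rough sketch that assumes exactly that layout (the formatter's output differs between Kafka versions), and it assumes group and topic names without commas. The class name, the names my-group and my-topic, and the field names are invented for the example.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parses one line in the layout shown above. Other Kafka versions
// print a different layout, so treat the pattern as an assumption.
final class OffsetCommitLineParser {

    private static final Pattern LINE = Pattern.compile(
            "^\\[([^,]+),([^,]+),(\\d+)\\]::\\[OffsetMetadata\\[(\\d+),([^\\]]*)\\],"
            + "CommitTime (\\d+),ExpirationTime (\\d+)\\]$");

    static final class Commit {
        final String group;
        final String topic;
        final int partition;
        final long offset;
        final long commitTimeMs;
        final long expirationTimeMs;

        Commit(String group, String topic, int partition, long offset,
               long commitTimeMs, long expirationTimeMs) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.commitTimeMs = commitTimeMs;
            this.expirationTimeMs = expirationTimeMs;
        }
    }

    static Commit parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised line: " + line);
        }
        // Group 5 is the metadata string (NO_METADATA above); it is not kept here.
        return new Commit(m.group(1), m.group(2), Integer.parseInt(m.group(3)),
                Long.parseLong(m.group(4)), Long.parseLong(m.group(6)),
                Long.parseLong(m.group(7)));
    }

    public static void main(String[] args) {
        // Same numbers as the sample line above; group and topic are placeholders.
        Commit c = parse("[my-group,my-topic,0]::[OffsetMetadata[481690879,NO_METADATA],"
                + "CommitTime 1479708539051,ExpirationTime 1480313339051]");
        System.out.println(c.group + " " + c.topic + "-" + c.partition + " @ " + c.offset);
        // Time between commit and expiration, in whole days.
        System.out.println(Duration.ofMillis(c.expirationTimeMs - c.commitTimeMs).toDays());
    }
}
```

For the sample line, the gap between CommitTime and ExpirationTime is 604800000 ms, which is 7 days.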
Just use the <consumer-group-name> from the output and put it in the kafka-consumer-groups command mentioned in the beginning and you'll get the offset details for all the 50 partitions for the given consumer group only.
I hope this helps.

Before consumers for new topic are attached, I create new topic and produce message in apache kafka

Before any consumers are attached to a new topic, I create the topic and produce a first message in Apache Kafka.
Then consumers are attached to the new topic, but the first message is not consumed.
Why?
In this case, the log-end offset is already 1, the committed offset is 1, and the lag is 0.
Doesn't "committed offset=1" mean the message has already been consumed?
My question is why it has already been consumed.
Let me know if I've got anything wrong.
This is my test case.
# create new topic
$ kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic NEW_TOPIC_NAME
# produce a first message
$ kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic NEW_TOPIC_NAME
> send a first message
# then execute consumer
$ kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic NEW_TOPIC_NAME
> # the first message is not consumed
But if I produce a second message after the consumers are attached to the new topic, it is consumed normally.
By default, the kafka-console-consumer starts from the end of the topic.
If you want to consume messages produced earlier, you can set --from-beginning, for example:
kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic NEW_TOPIC_NAME --from-beginning
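The rule behind this can be sketched in a few lines: a consumer with a committed offset resumes from it, and one without falls back to a reset policy, either the end of the log (the console consumer's default) or the beginning (what --from-beginning asks for). The code below is a simplified model of that decision, not Kafka client code; the class, enum and method names are invented for the example.

```java
import java.util.OptionalLong;

// Simplified model of where a consumer starts reading; not Kafka client code.
final class StartPositionSketch {

    enum ResetPolicy { EARLIEST, LATEST }

    static long startPosition(OptionalLong committedOffset, ResetPolicy policy,
                              long earliestOffset, long logEndOffset) {
        // A committed offset always wins over the reset policy.
        if (committedOffset.isPresent()) {
            return committedOffset.getAsLong();
        }
        return policy == ResetPolicy.EARLIEST ? earliestOffset : logEndOffset;
    }

    public static void main(String[] args) {
        // The test case above: one message in the topic (offsets 0..1), no commits yet.
        long earliest = 0L;
        long logEnd = 1L;
        // Default behaviour: start at the end, so the first message is skipped.
        System.out.println(startPosition(OptionalLong.empty(), ResetPolicy.LATEST, earliest, logEnd));   // prints 1
        // With --from-beginning: start at offset 0 and read the first message.
        System.out.println(startPosition(OptionalLong.empty(), ResetPolicy.EARLIEST, earliest, logEnd)); // prints 0
    }
}
```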