How kafka identifies consumers in a group uniquely - apache-kafka

I have multiple consumers in a group. How does Kafka tell the consumers apart and map each one to partitions?
In other words: what is the unique key used to identify a consumer in the group?

Kafka generates a random consumer id with a format like
<client.id>-<uuid>
You can see this by running a new console consumer in a group:
$ ./bin/kafka-console-consumer.sh --new-consumer --bootstrap-server kafka-1:9092 --consumer-property group.id=group1 --consumer-property client.id=myClient --topic topic1
and, while the consumer is running, executing the command line kafka-consumer-groups.sh to describe that group. Take a look at the CONSUMER-ID column.
$ ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server kafka-1:9092 --describe --group group1
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
topic1 0 0 0 0 myClient-e137f762-e550-4c8e-96d9-8f7f725e2c6d /127.0.0.1 myClient
Relevant Kafka code as of 0.10.2.1 looks like this:
val memberId = clientId + "-" + group.generateMemberIdSuffix
where
def generateMemberIdSuffix = UUID.randomUUID().toString
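To make the format concrete, here is a minimal Python sketch that mimics the same `<client.id>-<uuid>` construction. It is only an illustration of the format described above, not Kafka's code; the function name `generate_member_id` is made up for this example.

```python
import uuid


def generate_member_id(client_id: str) -> str:
    """Mimic the <client.id>-<uuid> format (illustrative, not Kafka's code)."""
    return f"{client_id}-{uuid.uuid4()}"


# Two consumers sharing the same client.id still get different member ids,
# because the suffix is a fresh random UUID each time.
first = generate_member_id("myClient")
second = generate_member_id("myClient")
print(first)
print(second)
```

This is why two consumers started with the same client.id still show up as separate rows in the CONSUMER-ID column.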

Related

How to create multiple kafka consumer groups in same topic

Below is required scenario.
Topic-1 has 6 partitions. I want to create 3 consumer groups, cg1, cg2 and cg3, and map them like this: cg1 - 0,1; cg2 - 2,3; cg3 - 4,5. How can I do that using kafka-console-consumer.sh or kafka-consumer-groups.sh?
The Kafka documentation describes this scenario, but nowhere does it say how to set it up.
Any help is appreciated !!!
A Kafka consumer group is a collection of consumers that share the same group id. The group distributes processing by sharing the topic's partitions across its consumers.
Consider a single topic with three partitions and a consumer group with two members: each partition in the topic is assigned to exactly one member of the group.
Note: a topic with n partitions can be consumed by at most n consumers of one consumer group, with 1 partition per consumer; any additional consumers in the group stay idle.
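As a rough illustration of the rule that each partition goes to exactly one member of a group, the sketch below spreads partitions over consumers round-robin. This is a simplified stand-in, not Kafka's actual assignor, and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Simplified illustration only; Kafka's real assignors are more involved.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions, two consumers: one consumer owns two partitions.
three_by_two = assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"])
# Three partitions, four consumers: the fourth consumer gets nothing.
three_by_four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(three_by_two)
print(three_by_four)
```

With more consumers than partitions, the extra consumer ends up with an empty list, which matches the note about at most n active consumers for n partitions.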
In your case, if you subscribe a consumer group to a topic, all of the topic's partitions get assigned to that consumer group.
If you do not need the group to manage assignment, you can instead assign specific partitions directly to each consumer; in that case rebalancing does not come into the picture.
I am using Kafka Confluent-kafka 2.6.0-5.1.2:
sh kafka-console-consumer --bootstrap-server localhost:9092 --partition 0 --topic abc --group cg1
sh kafka-console-consumer --bootstrap-server localhost:9092 --partition 1 --topic abc --group cg1
--partition <Integer: partition> : The partition to consume from. Consumption starts from the end of the partition unless '--offset' is specified.
Using kafka-consumer-groups you can describe the consumer group details:
sh kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group cg1
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
abc 0 123 678 0 - - -
abc 1 234 345 0 - - -
You can also assign partitions manually in Java, as below:
List<TopicPartition> partitions = new ArrayList<>();
partitions.add(new TopicPartition("abc", 0));
partitions.add(new TopicPartition("abc", 1));
......
new KafkaConsumer<>(consumerProperties).assign(partitions);
Note that it isn't possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).
There are also the following alternative approaches:
Use 3 separate topics to consume messages using a separate consumer group.
Programmatically filter partitions while consuming messages.
I can't try it out right now, but I think you need to pass it as a consumer property
kafka-console-consumer.sh --consumer-property group.id=${your_group_id}
or if you have a config file
kafka-console-consumer.sh --consumer.config ${your_config_file}

How to get log end offset of all partitions for a given kafka topic using kafka command line?

When I describe a Kafka topic, it doesn't show the log end offset of any partition, but it shows all the other metadata such as ISR, Replicas and Leader.
How do I see a log end offset of the partition for a given topic?
Ran this: ./kafka-topics.sh --zookeeper zk-service:2181 --describe --topic "__consumer_offsets"
The output doesn't have an offset column.
Note: Need Only the log end offset.
Since you're only looking for the log end offset for a topic, you can use kafka-run-class with the kafka.tools.GetOffsetShell class.
Assuming your topic is __consumer_offsets, you would get the end offset by running:
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic __consumer_offsets
Change the --broker-list localhost:9092 to your desired Kafka address. This will list all of the log end offsets for each partition in the topic.
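If you want to post-process that output, here is a small Python sketch. It assumes GetOffsetShell prints one `topic:partition:offset` line per partition; the sample offsets below are made up for illustration, and `parse_end_offsets` is a hypothetical helper name.

```python
def parse_end_offsets(output: str) -> dict:
    """Parse 'topic:partition:offset' lines into {(topic, partition): offset}."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so the last two fields are always
        # the partition and the offset.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


# Hypothetical sample output; real offsets will differ.
sample = """
__consumer_offsets:0:0
__consumer_offsets:1:481
__consumer_offsets:2:17
"""
end_offsets = parse_end_offsets(sample)
total = sum(end_offsets.values())
print(end_offsets)
print(total)
```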
Install kafkacat; it's an easy-to-use Kafka tool:
sudo apt-get update
sudo apt-get install kafkacat
kafkacat -C -b <kafka-broker-ip-and-port> -t <topic> -o -1
This will not consume anything because the offset is incremented after a message is added. But it will give you the offsets for all the partitions. Note however that this isn't the current offset that you are consuming at... The above answers will help you more in terms of looking into partition lag.
Following is the command you would need to get the offset of all partitions for a given kafka topic for a given consumer group:
kafka-consumer-groups --bootstrap-server <kafka-broker-list-with-ports> --describe --group <consumer-group-name>
Please note that the <consumer-group-name> at the end is important as the offsets are committed by consumers that are typically a part of a consumer group.
The output of this command may look something like:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
<topic-name> 0 62 62 0 <consumer-id> <host> <client>
In your post however, you're trying to get this information for the internal topic __consumer_offsets so you would need a consumer group which would have consumers consuming from this internal topic. You could perhaps do the following:
kafka-console-consumer --bootstrap-server <kafka-broker-list-with-ports> --topic __consumer_offsets --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --max-messages 5
Output of the above command:
[<consumer-group-name>,<topic-name>,0]::[OffsetMetadata[481690879,NO_METADATA],CommitTime 1479708539051,ExpirationTime 1480313339051]
Just use the <consumer-group-name> from the output and put it in the kafka-consumer-groups command mentioned in the beginning and you'll get the offset details for all the 50 partitions for the given consumer group only.
I hope this helps.

Reset kafka LAG (change offset) within consumer group in Kafka-python

I found "How to change start offset for topic?", where the LAG is reset with the kafka-consumer-groups.sh tool, but I need to reset it from within the application. I also found the example in "kafka-python read from last produced message after a consumer restart", but it doesn't seem to reset it.
consumer = KafkaConsumer("MyTopic", bootstrap_servers=self.kafka_server + ":" + str(self.kafka_port),
                         enable_auto_commit=False,
                         group_id="MyTopic.group")
consumer.poll()
consumer.seek_to_end()
consumer.commit()
# ... continue on with other code...
Running bin\windows\kafka-consumer-groups.bat --bootstrap-server localhost:9092 --group MyTopic.group --describe still shows that both partitions have a LAG. How can I get the current-offset to "fast-forward" to the end?
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
MyTopic 0 52110 66195 14085 kafka-python-1.4.2-6afb6901-c651-4534-a482-15358db42c22 /Host1 kafka-python-1.4.2
MyTopic 1 52297 66565 14268 kafka-python-1.4.2-c70e0a71-7d61-46a1-97bc-aa2726a8109b /Host2 kafka-python-1.4.2
You may want this:
from kafka import KafkaConsumer, TopicPartition

def consumer_from_offset(topic, group_id, offset):
    """Return a consumer positioned at the given offset of partition 0."""
    # broker_list is assumed to be defined elsewhere, e.g. ["localhost:9092"]
    consumer = KafkaConsumer(bootstrap_servers=broker_list, group_id=group_id)
    tp = TopicPartition(topic=topic, partition=0)
    consumer.assign([tp])
    consumer.seek(tp, offset)
    return consumer

consumer = consumer_from_offset('topic', 'group', 0)
for msg in consumer:
    # consumes messages starting from offset 0
    print(msg)
To "fast forward" the offset of a consumer group, i.e. to clear the LAG, you can start a new consumer that joins the same group.
The console command for that is:
kafka-console-consumer.sh --bootstrap-server <brokerIP>:9092 --topic <topicName> --consumer-property group.id=<groupName>
In parallel you can run the command to see the lags like you described, and you will see the lag wiped.
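For reference, the LAG column in the describe output is the difference between LOG-END-OFFSET and CURRENT-OFFSET. The short Python sketch below recomputes it from the numbers shown in the question; the helper name `lag` is just for illustration.

```python
def lag(current_offset: int, log_end_offset: int) -> int:
    """Lag is how far the committed offset trails the end of the log."""
    return max(log_end_offset - current_offset, 0)


# (topic, partition, CURRENT-OFFSET, LOG-END-OFFSET) from the question's output.
rows = [
    ("MyTopic", 0, 52110, 66195),
    ("MyTopic", 1, 52297, 66565),
]
lags = {(topic, partition): lag(current, end) for topic, partition, current, end in rows}
print(lags)

# Once the committed offset reaches the log end offset, the lag is zero.
print(lag(66195, 66195))
```

So "wiping the lag" simply means the group's committed offset has caught up with the log end offset on every partition.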

Kafka doesn't let me delete consumer groups

I'm trying to delete consumer groups in Kafka 0.10.2.1:
./kafka-consumer-groups.sh --zookeeper myzkhost --list
consumer-1
consumer-2
...
consumer-n
If I do:
./kafka-consumer-groups.sh --zookeeper myzkhost --delete --group consumer-1
Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API).
Error: Delete for group 'consumer-1' failed because group does not exist.
But when I go to my ZooKeeper host:
ls /consumers/consumer-1
[offset]
And that happens with all consumers. Is it safe to delete the ZooKeeper path for those consumers?
Thanks!

Why does a Kafka consumer (0.10.0.0) in a new consumer group not see old/previously published messages?

I have a producer that publishes messages on a topic called 'mytopic' just fine. I have 2 consumers in 2 different consumer groups listening for these messages. I started these 2 consumers and the producer in the following sequence.
1) Start consumer 1 in group 'group1'
2) Start the producer to publish several hundred messages
After some time I check the offset of consumer 1, which is as I expect:
/opt/kafka_2.11-0.10.0.0/bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group group1
Output:
Group Topic Pid Offset logSize Lag Owner
group1 mytopic 0 30230 36942 6712 none
3) Now I start consumer 2 in group 'group2' to listen to the same messages but it comes back with 0 messages on every poll() call.
The offset check for this consumer shows me that its offset is same as the logSize.
/opt/kafka_2.11-0.10.0.0/bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group group2
Output:
Group Topic Pid Offset logSize Lag Owner
group2 mytopic 0 36942 36942 0 none
The same problem occurs for any other consumer with a new consumer group. Why does a consumer that joins a new consumer group after messages have been published not see the old messages, even though the messages still exist on the topic (i.e., haven't been deleted)?
You need to set the parameter auto.offset.reset to "earliest" in your consumer configuration. The default value is "latest", which tells a new consumer to start consuming at the current end of the log.
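The decision can be sketched roughly as follows. This is a simplified Python model of the behavior described above, not the client's actual code; the function name and the log start offset of 0 are assumptions for illustration.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Pick where a consumer starts reading a partition (simplified model).

    committed is the group's committed offset, or None for a brand-new group.
    """
    if committed is not None:
        # A group with a committed offset resumes from it;
        # auto.offset.reset is not consulted.
        return committed
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # Other settings are outside the scope of this sketch.
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")


# New group 'group2' with the default setting starts at the end (logSize 36942).
print(starting_offset(None, 0, 36942))
# The same new group with "earliest" starts at the assumed log start of 0.
print(starting_offset(None, 0, 36942, auto_offset_reset="earliest"))
# 'group1' already has a committed offset and resumes from it.
print(starting_offset(30230, 0, 36942))
```

Note that the setting only matters when the group has no committed offset; a group that already committed, like group2 after its first run, keeps its position even if you change the setting later.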