kafka consumer reads the same message - apache-kafka

I have a single Topic with 5 partitions.
I have 5 threads, each creating a Consumer
All consumer are with the same consumer group using group.id.
I also gave each consumer a different and unique client.id
I see that 2 consumers are reading the same message to process
Should kafka handle this?
How do I troubleshoot it?

Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is if 1 consumer crashes before it's able to commit its offset. In that case, the new consumer that gets assigned the partition will re-consume from the last committed offset.
You can use the consumer group tool kafka-consumer-groups that comes with Kafka to check the partitions assigned to each consumer in your group.

Related

Is Consumer Offset managed at Consumer group level or at the individual consumer inside that consumer group?

I am trying to find out is there any offsets at consumer group level as well. Is Consumer Offset is at Consumer group level or at the individual consumer inside that consumer group in Kafka ?
Offsets are tracked at ConsumerGroup level.
Imagine you have 4 consumer threads in one ConsumerGroup consuming from one topic with 4 partitions. If you now stop all 4 threads and restart just a single one with the same group the one threads will know where all 4 threads left off consuming and continue from there.
"you are saying one offset (basically a shared int/long value) will be shared/updated by all the consumers in a consumer group?"
Yes, this is correct. Remember that a single partition of a topic can be read only by one consumer thread within a group. Two consumer threads of the same ConsumerGroup will never consume a single topic partition at the same time. The offsets of the consumers groups are stored in an internal Kafka topic called __consumer_offsets. In this topic you basically have a key/value pair, where your key is basically the concatenation of
ConsumerGroup
Topic
Partition within Topic
and your value is the offset. This internal __consumer_offsets topic is available to all consumers so the information is shared.

How multiple consumers from different consumer groups read from same partition?

I have a use case where i have 2 consumers in different consumer groups(cg1 and cg2) subscribing to same topic(Topic A) with 4 partitions.
What happens if both consumers are reading from same partition and one of them failed and other one commited the offset?
In Kafka the offset management is done by Consumer Group per Partition.
If you have two consumer groups reading the same topic and even partition a commit from one consumer group will not have any impact to the other consumer group. The consumer groups are completely discoupled.
One consumer of a consumer group can read data from a single topic partition. A single consumer can't read data from multiple partitions of a topic.
Example Consumer 1 of Consumer Group 1 can read data of only single topic partition.
Offset management is done by the zookeeper.
__consumer_offsets: Every consumer group maintains its offset per topic partitions. Since v0.9 the information of committed offsets for every consumer group is stored in this internal topic (prior to v0.9 this information was stored on Zookeeper).
When the offset manager receives an OffsetCommitRequest, it appends the request to a special compacted Kafka topic named __consumer_offsets. Finally, the offset manager will send a successful offset commit response to the consumer, only when all the replicas of the offsets topic receive the offsets.
simultaneously two consumers from two different consumer groups(cg1 and cg2) can read the data from same topic.
In kafka 1: Offset management is taken care by zookeeper.
In kafka 2: offsets of each consumer is stored at __Consumer_offsets topic
Offset used for keeping the track of consumers (how much records consumed by consumers), let say consumer-1 consume 10 records and consumer-2 consume-20 records and suddenly consumer-1 got died now whenever the consumer-1 will up then it will start reading from 11th record onward.

Kafka multiple consumer

When we have multiple consumer reading from the topic with single partition Is there any possibility that all the consumer will get all the message.
I have created the two consumers with manual offset commit.started the first consumer and after 2 mins started 2nd consumer . The second consumer is reading from the message from where the 1st consumer stopped reading. Is there any possibility that the 2nd consumer will read all the message from beginning.I'm new to kafka please help me out.
In your consumer, you would be using commitSync which commits offset returned on last poll. Now, when you start your 2nd consumer, since it is in same consumer group it will read messages from last committed offset.
Messages which your consumer will consumes depends on the ConsumerGroup it belongs to. Suppose you have 2 partitions and 2 consumers in single Consumer Group, then each consumer will read from different partitions which helps to achieve parallelism.
So, if you want your 2nd consumer to read from beginning, you can do one of 2 things:
a) Try putting 2nd consumer in different consumer group. For this consumer group, there won't be any offset stored anywhere. At this time, auto.offset.reset config will decide the starting offset. Set auto.offset.reset to earliest(reset the offset to earliest offset) or to latest(reset the offset to latest offset).
b) Seek to start of all partitions your consumer is assigned by using: consumer.seekToBeginning(consumer.assignment())
Documentation: https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning-java.util.Collection-
https://kafka.apache.org/documentation/#consumerconfigs
Partition is always assigned to unique consumer in single consumer group irrespective of multiplpe consumers. It means only that consumer can read the data and others won't consume data until the partition is assigned to them. When consumer goes down, partition rebalance happens and it will be assigned to another consumer. Since you are performing manual commit, new consumer will start reading from committed offset.

What Happens when there is only one partition in Kafka topic and multiple consumers?

I have a Kafka Topic with only one partition and I am not getting what will happen in following cases? How messages will be delivered to consumers?
If all consumers are in same group
If all consumers are in different group
I am not sure if consumers will receive unique messages or duplicate ones.
Each Consumer subscribes to a/more partition in a topic. And each consumer belongs to a consumer group. Below are two scenarios:
When all consumers belong to the same group : Each consumer will try to subscribe to a different partition. In case,if there is only one partition, only one consumer will get the messages, while other consumers will be idle.
When all consumers belong to the different consumer group: Each consumer will get the messages from all partitions. Partition subscription is based on the consumer groups.
It depends on the consumer groups. Consumers within the same consumer group don't read the data again from the same partitions once the read offsets have been committed.

Kafka Consumer partition mapping

I have 100 consumers in same group listening to same topic and 100 partition. So as per the documentation each consumer should only listen to one partition since there are 100 consumers and 100 partitions. I produce the message to kafka using a key. So some message with the same key should go in the same partition and should always be consumed by the same consumer in the group. But in my case multiple messages with the same key are consumed multiple consumers randomly. Any way to do that all messages from a partition are consumed by only one specific consumer in the group. I do not want to explicitly assign partition to consumers.
Verify that your message partitioning is working as expected from the producer side
If you have 100 consumers using same consumer group id for a 100 partitions topic , each consumer will get exactly 1 partition to consume from.