When can a Kafka consumer group miss a partition? - apache-kafka

We have hit this problem a few times: when a consumer group is created, it does not contain all of the topic's partitions, so the group does not consume all messages. For example, for a topic with 6 partitions, partition 1 was missing from the consumer group.
Recreating the consumer group fixes the problem, but we need to avoid this situation in the future. So when and why can a consumer group miss one or more partitions?

Related

Weird behavior of consumer group rebalance due to increase in partitions

I have two observations about consumer group rebalancing.
Observation 1 -
I have multiple consumers in the same group, each subscribed to a different topic. I expect the entire consumer group to rebalance when I increase the partition count of a topic, but I am not observing this. The newly created partitions are never automatically assigned to the existing consumers. To get them assigned, I have to recreate the consumers.
Observation 2 -
I have a single consumer in a consumer group and this consumer is only subscribed to 1 topic. When I increase partition count on that topic, I am observing a rebalance within a couple of minutes.
Can anyone help me identify the problem here? I'm guessing that having multiple consumers subscribed to different topics is the issue here. Is my understanding right? Does anyone have an explanation for this behavior?

Can I have all the consumers of a group consume messages from all the partitions of a Kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and I have 20 consumers of Consumer Group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Please note all messages are independent and can be processed independently.
I looked at the consumer configuration partition.assignment.strategy, but I'm not sure whether I can achieve dynamic assignment of consumers to partitions depending on message availability.
Within a group, each partition is assigned to exactly one consumer. In your case only 4 of the 20 consumers are doing any work; the other 16 sit idle. You have to increase the number of partitions if you want more consumers to be assigned.
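A minimal sketch of why the extra consumers stay idle, assuming a simple round-robin split. The real assignor is whatever partition.assignment.strategy selects and may place partitions differently, but none of the built-in strategies gives one partition to two members of the same group. The function name and consumer names here are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out one at a time; every partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]                           # topic 'A' has 4 partitions
consumers = [f"consumer-{i}" for i in range(20)]    # group 'AC' has 20 members
assignment = assign_round_robin(partitions, consumers)

active = [c for c, owned in assignment.items() if owned]
idle = [c for c, owned in assignment.items() if not owned]
print(len(active), "active,", len(idle), "idle")
```

Raising the partition count to 20 in this sketch would give every consumer exactly one partition.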

I am reading Kafka documentation and trying to understand the working of it

I am reading the Kafka documentation and trying to understand how it works. This is regarding consumers. In brief, a topic is divided into a number of partitions. There are a number of consumer groups, each having a number of consumer instances. Now, my question is: does each partition send the same message to each consumer group, which in turn hands it to a specific consumer instance within the group?
If so, how does Kafka ensure the message is processed by only one consumer?
Kindly guide me if I am missing something.
Well, to put it simply:
We have a topic divided into partitions.
We have consumers that consume data from those topics.
Consumers are part of a consumer group by sharing the same group.id.
Each partition of a topic is consumed by exactly one consumer within a consumer group.
Example:
Topic "test" with 3 partitions.
Consumer group A: with 3 consumers.
Consumer group B: with 2 consumers.
The two consumer groups A and B both consume data from the topic "test".
Within group A, each of the 3 consumers will consume one partition, whereas in group B (with 2 consumers), one consumer will read 2 partitions and the other will consume the remaining one.
If we have one more consumer group with only one consumer in it, that consumer will read all 3 partitions of the topic.
Hope that helps; let me know if anything is unclear.
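The example above can be sketched in a few lines, assuming a range-style split in which each consumer gets a contiguous block of partitions and the first few consumers absorb any remainder. This is an illustration rather than Kafka's actual assignor code; the function name, the member names, and the name "C" for the single-consumer group are hypothetical.

```python
def assign_range(num_partitions, consumers):
    """Give each consumer a contiguous block; the first (remainder) consumers get one extra."""
    base, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for position, consumer in enumerate(sorted(consumers)):
        count = base + (1 if position < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Topic "test" has 3 partitions; every group is assigned independently of the others.
groups = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1"],
}
assignments = {name: assign_range(3, members) for name, members in groups.items()}
for name, assignment in assignments.items():
    print(name, assignment)
```

Each group covers all 3 partitions on its own, which is why every group sees every message while a single message is handled by only one consumer per group.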

kafka consumer reads the same message

I have a single Topic with 5 partitions.
I have 5 threads, each creating a consumer.
All consumers are in the same consumer group, set via group.id.
I also gave each consumer a different, unique client.id.
I see that 2 consumers are reading the same message to process.
Should Kafka handle this?
How do I troubleshoot it?
Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is when a consumer crashes before it is able to commit its offset. In that case, the new consumer that gets assigned the partition will re-consume from the last committed offset, so any messages processed after that commit are delivered again.
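A small simulation of that crash case, with no Kafka client involved: the partition is a plain list, and the consumer names and commit point are made up for illustration.

```python
from collections import Counter

log = ["m0", "m1", "m2", "m3", "m4"]   # records in one partition; list index == offset
committed = 0                           # next offset to read, as last committed by the group
processed = []                          # (consumer, record) pairs in processing order

# consumer-1 processes three records but only commits after the first two.
position = committed
for _ in range(3):
    processed.append(("consumer-1", log[position]))
    position += 1
    if position == 2:
        committed = position            # commit says "resume at offset 2"
# consumer-1 crashes here: m2 was processed, but offset 3 was never committed.

# After the rebalance, consumer-2 takes over and resumes from the committed offset.
for offset in range(committed, len(log)):
    processed.append(("consumer-2", log[offset]))

counts = Counter(record for _, record in processed)
duplicates = sorted(record for record, n in counts.items() if n > 1)
print(duplicates)   # ['m2']
```

If the duplicate reads you see always fall right after a rebalance or a restart, this is a likely explanation, and making the processing idempotent is the usual way to live with it.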
You can use the consumer group tool kafka-consumer-groups that comes with Kafka to check the partitions assigned to each consumer in your group.

How do consumers from multiple consumer groups work across partitions of the same topic in Kafka?

I was reading this SO answer and many such blogs.
What I know:
Multiple consumers can read from a single partition when they belong to different consumer groups (different group ids), but within one consumer group only one consumer can consume from a given partition at a time.
My question is related to multiple consumers from multiple consumer groups consuming from the same topic:
What happens when multiple consumers (in different groups) consume a single topic (and eventually the same partition)?
Do they get the same data?
How are offsets managed? Are they separate for each consumer?
(Might be opinion based) What is the generally recommended way to handle overlapping data across two consumers from separate groups operating on a single partition?
Edit:
"Overlapping data" means two consumers of separate consumer groups operating on the same partition getting the same data.
Yes, they get the same data. Kafka stores only one copy of the data in the topic partitions' commit log. If consumers are not in the same group, each of them can get the same data through fetch requests issued by the client's consumer library. Which partitions each group member gets is decided by the lead consumer of each group. The entire process is documented in detailed steps here: https://community.hortonworks.com/articles/72378/understanding-kafka-consumer-partition-assignment.html
Offsets are "managed" by the consumers, but "stored" in a special __consumer_offsets topic on the Kafka brokers.
Offsets are stored for each (consumer group, topic, partition) tuple. This combination is also used as the key when publishing offsets to the __consumer_offsets topic so that log compaction can delete old unneeded offset commit messages and so that all offsets for the same (consumer group, topic, partition) tuple are stored in the same partition of the __consumer_offsets topic (which defaults to 50 partitions).
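A rough sketch of both ideas. The dictionary stands in for the compacted view of __consumer_offsets, where only the latest commit per key survives. The partition calculation is an assumption worth checking against your Kafka version: to the best of my knowledge the broker derives the __consumer_offsets partition from the group id alone, as the absolute value of its Java String.hashCode() modulo the partition count. All function names are hypothetical.

```python
def java_string_hash(text):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = text.encode("utf-16-be")                # Java hashes UTF-16 code units
    value = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        value = (31 * value + unit) & 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def offsets_partition_for(group_id, num_partitions=50):
    """Assumed rule: abs(hash(group id)) % partition count, with Integer.MIN_VALUE mapped to 0."""
    h = java_string_hash(group_id)
    h = 0 if h == -0x80000000 else abs(h)
    return h % num_partitions


offsets = {}   # compacted view: (group, topic, partition) -> latest committed offset


def commit(group, topic, partition, offset):
    """A later commit with the same key replaces the earlier one, as compaction would."""
    offsets[(group, topic, partition)] = offset


commit("cg", "test", 0, 10)
commit("cg", "test", 0, 25)      # supersedes the commit of 10
commit("other", "test", 0, 3)    # different group, separate key
print(offsets)
print(offsets_partition_for("cg"))
```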
Each consumer group gets every message from a subscribed topic.
Yes
Offsets are stored per partition. For example, let's say you have a topic with 2 partitions and a consumer group named cg made up of 2 consumers. In that case Kafka assigns each of the consumers one of the partitions. Each consumer then fetches from Kafka the committed offset for the partition it was assigned (e.g. one consumer 'asks' Kafka: "What is the offset for this topic for consumer group cg, partition 1?", and the other consumer asks the same for partition 2). After getting the correct offset, the consumer polls a Kafka broker for the next messages in that partition.
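The fetch-then-poll flow can be sketched with plain data structures. This is a simplification, not the consumer API: partitions are numbered from 0 here, the poll helper commits immediately after reading, and the second group name is hypothetical.

```python
topic_log = {0: ["a", "b", "c"], 1: ["x", "y"]}    # partition -> records, index == offset
committed = {}                                      # (group, partition) -> next offset to read


def poll(group, partition, max_records=1):
    """Look up the group's committed offset, read from there, then commit the new position."""
    start = committed.get((group, partition), 0)
    records = topic_log[partition][start:start + max_records]
    committed[(group, partition)] = start + len(records)
    return records


first = poll("cg", 1, max_records=2)             # group cg reads partition 1
second = poll("other-group", 1, max_records=2)   # a different group reads the same partition
again = poll("cg", 1, max_records=2)             # cg has already consumed everything
print(first, second, again)
```

Both groups receive the same records because each keeps its own committed offset, while a second poll by cg returns nothing new.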
I'm not entirely sure what you mean by overlapping data, can you clarify a bit or give an example?