Weird behavior of consumer group rebalance due to increase in partitions - apache-kafka

I have two observations about consumer group rebalancing.
Observation 1 -
I have multiple consumers in the same group, each subscribed to different topics. I expect the entire consumer group to rebalance when I increase the partition count for a topic, but I'm not observing this. The newly created partitions are never automatically assigned to the existing consumers. For them to be assigned, I have to recreate the consumers.
Observation 2 -
I have a single consumer in a consumer group, and this consumer is subscribed to only 1 topic. When I increase the partition count on that topic, I observe a rebalance within a couple of minutes.
Can anyone help me identify the problem here? I'm guessing that having multiple consumers subscribed to different topics is the issue here. Is my understanding right? Does anyone have an explanation for this behavior?

Related

When can a Kafka consumer group miss a partition?

We have faced this problem a few times. When a consumer group is created, it doesn't contain all of the topic's partitions, so the consumer group doesn't consume all messages. For example, for a topic with 6 partitions, partition 1 was missing from the consumer group.
The problem is fixed by recreating the consumer group, but we need to avoid this situation in the future. So, when and why does a consumer group miss one or more partitions?

Does a consumer consume from replica partitions if multiple consumers are running in the same consumer group?

I am writing a Kafka consumer application. I have a topic with 4 partitions - 1 is leader and 3 are followers. The producer uses a key to identify the partition to push a message to.
If I write a consumer and run it on different nodes, or start 4 instances of the same consumer, how will messages be consumed? Will all 4 instances get the same messages?
What happens in the case of multiple consumers (same group) consuming a single topic?
Do they get the same data?
How is the offset managed? Is it separate for each consumer?
I would suggest that you read at least the first few chapters of Confluent's definitive guide to Kafka to get a preliminary understanding of how Kafka works.
I've kept my answers brief. Please refer to the book for detailed explanation.
How is the offset managed? Is it separate for each consumer?
It depends on the group id. Offsets are tracked per consumer group (one committed offset for each partition the group reads), not per individual consumer.
What happens in the case of multiple consumers (same group) consuming a single topic?
There can be multiple consumers, and they can share one group id or use different ones.
If 2 consumers belong to the same group, neither will get all the messages; each receives only the messages from the partitions assigned to it.
Do they get the same data?
No. Once a message has been read and its offset committed, the committed offset for that group advances. So a different consumer in the same group will not receive that message again.
Hope that helps :)
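To make the "one offset per group" point concrete, here is a minimal sketch in plain Python. It is a simplified in-memory model, not Kafka's implementation, and the group names `g1`/`g2` and the topic name `orders` are made up for illustration. It only shows that the committed position belongs to the group and the partition, so whichever member takes over a partition resumes from the same place.

```python
# Simplified in-memory model of committed offsets; not Kafka's implementation.
# Offsets are keyed by (group, topic, partition), not by consumer instance.
committed = {}


def commit(group, topic, partition, next_offset):
    """Record the next offset the group should read from this partition."""
    committed[(group, topic, partition)] = next_offset


def resume_position(group, topic, partition):
    """Where any member of the group starts when it is assigned the partition."""
    # 0 stands in for "no committed offset"; a real consumer would apply
    # its auto.offset.reset policy here instead.
    return committed.get((group, topic, partition), 0)


# A consumer in the hypothetical group "g1" processes offsets 0..4 of
# partition 0 and commits 5 as the next offset to read.
commit("g1", "orders", 0, 5)

# After a rebalance, another member of "g1" takes over partition 0.
pos_same_group = resume_position("g1", "orders", 0)

# A different group has its own, independent position on the same partition.
pos_other_group = resume_position("g2", "orders", 0)

print(pos_same_group, pos_other_group)
```

In this model the second member of `g1` continues after the committed messages, while `g2` starts from its own position and sees every message.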
What happens in the case of multiple consumers (same group) consuming a single topic?
Answer: Producers send records to a particular partition based on the record’s key. The default partitioner for Java uses a hash of the record’s key to choose the partition. When there are multiple consumers in the same consumer group, each consumer gets a different partition. So, in this case, only a single consumer receives all the messages of a given partition. When the consumer that is receiving messages goes down, the group coordinator (one of the brokers in the cluster) triggers a rebalance, and that partition is then assigned to one of the remaining consumers.
Do they get the same data?
Answer: If a consumer commits the messages it has consumed from a partition and then goes down, a rebalance occurs as stated above. The consumer that takes over the partition will not get those messages. But if the consumer goes down before committing, the consumer that takes over the partition will get them again.
How is the offset managed? Is it separate for each consumer?
Answer: No, the offset is not separate for each consumer. A partition is never assigned to multiple consumers in the same consumer group at the same time. By default, the consumer that gets a partition assigned also picks up the group's committed offset for it.
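The key-to-partition mapping and the failover described in this answer can be sketched as below. This is only an illustration under stated assumptions: `zlib.crc32` is a stand-in hash (the Java client's default partitioner uses a murmur2 hash, so real partition numbers will differ), the `assign` helper is a hypothetical simplification of what happens during a rebalance, and the consumer names and the key `customer-42` are invented.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition by hashing it."""
    # Stand-in hash: CRC32. The Java default partitioner uses murmur2,
    # so the actual partition chosen by Kafka will differ.
    return zlib.crc32(key) % num_partitions


def assign(partitions, consumers):
    """Deal partitions out to live consumers, one owner per partition."""
    members = sorted(consumers)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}


partitions = [0, 1, 2, 3]
group = ["c1", "c2", "c3", "c4"]  # hypothetical consumer ids

# The same key always lands on the same partition.
target = choose_partition(b"customer-42", len(partitions))

# With 4 consumers and 4 partitions, exactly one consumer owns that partition.
owner_before = assign(partitions, group)[target]

# The owner goes down; the remaining members share all partitions again.
survivors = [c for c in group if c != owner_before]
owner_after = assign(partitions, survivors)[target]

print(target, owner_before, owner_after)
```

All records with the same key keep going to the same partition, and after the simulated failure that partition has a new single owner among the survivors.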

Can I have all the consumers of a group consume message from all the partitions of a kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and I have 20 consumers of Consumer Group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Please note all messages are independent and can be processed independently.
I looked at the consumer configuration partition.assignment.strategy, but I'm not sure whether I can achieve dynamic assignment of consumers to partitions depending on message availability.
One partition is assigned to exactly one consumer in the group. In your case, only 4 of the 20 consumers are actually working; the rest are idle. You have to increase the number of partitions if you want more consumers to be assigned.
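A small sketch of why 16 of the 20 consumers stay idle. It assumes a simple round-robin deal of partitions to consumers, which is a simplification and not Kafka's assignor code; the consumer names are invented.

```python
def assign_round_robin(num_partitions, consumers):
    """Hand each partition to exactly one consumer, cycling through the group."""
    assignment = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 20 hypothetical consumers in group 'AC'.
consumers = [f"consumer-{i:02d}" for i in range(20)]


def active_count(num_partitions):
    """Number of consumers that own at least one partition."""
    assignment = assign_round_robin(num_partitions, consumers)
    return sum(1 for parts in assignment.values() if parts)


print(active_count(4))   # topic 'A' with 4 partitions
print(active_count(20))  # the same topic after growing to 20 partitions
```

With 4 partitions only 4 consumers receive work; once the topic has at least as many partitions as the group has consumers, every consumer can be assigned one.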

Kafka rebalance at the group name level?

I have 3 topics A, B, C with the same number of partitions. I use the same group name for all the consumers of these topics.
My questions are:
If a consumer for one of the topics/partitions goes down, will a rebalance be triggered for the consumers of the other two topics?
Similarly, if a new partition is added to one topic, will a rebalance be triggered for the consumers of the other two topics?
More generally, does a consumer rebalance occur at the group level no matter which topics the consumers are reading from (given that they share the same group name)?
I look forward to your answers.
Regards,
Florin
Yes.
On rebalance, from a logical point of view, the previous (global) partitions assignment gets invalidated and each partition gets reassigned.
See FAQ "Can I predict the results of the consumer rebalance?"
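The group-wide nature of a rebalance can be illustrated with a toy model. This is a sketch under assumptions, not the real coordinator protocol: the `Group` class, its method names, and the consumer ids are hypothetical, and the model rebalances immediately, whereas a real group only reacts once it notices the change.

```python
class Group:
    """Toy model of one consumer group; not the real coordinator protocol."""

    def __init__(self, partition_counts):
        self.partition_counts = dict(partition_counts)  # topic -> partition count
        self.subscriptions = {}                         # member -> set of topics
        self.generation = 0                             # bumped on every rebalance
        self.assignment = {}                            # member -> [(topic, partition)]

    def rebalance(self):
        """Invalidate the whole assignment and recompute it for every member."""
        self.generation += 1
        assignment = {member: [] for member in self.subscriptions}
        for topic in sorted(self.partition_counts):
            subscribers = sorted(
                m for m, topics in self.subscriptions.items() if topic in topics
            )
            if not subscribers:
                continue
            for p in range(self.partition_counts[topic]):
                assignment[subscribers[p % len(subscribers)]].append((topic, p))
        self.assignment = assignment

    def join(self, member, topics):
        self.subscriptions[member] = set(topics)
        self.rebalance()

    def add_partitions(self, topic, new_count):
        # Modeled as immediate; a real group reacts only after it detects
        # the new partition count.
        self.partition_counts[topic] = new_count
        self.rebalance()


group = Group({"A": 2, "B": 2, "C": 2})
group.join("consumer-a", ["A"])
group.join("consumer-b", ["B"])
group.join("consumer-c", ["C"])

generation_before = group.generation
group.add_partitions("A", 3)

print(group.generation - generation_before)
print(group.assignment["consumer-a"])
print(group.assignment["consumer-b"])
```

In this model, adding a partition to topic A starts a new generation for the whole group: the consumers of B and C go through the rebalance too, even though they end up with the same partitions as before.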

Kafka: Which consumer will get the message if there are many consumers that subscribe to one topic with the same group-id?

Suppose we have 4 consumers with the same group-id and one topic with 3 partitions. If a producer posts a message to partition-1, which consumer will get this message?
I'll give a bit more detailed answer here.
So the first point is that there is no reason to have more consumer threads (and each consumer has at least 1 consumer thread) than the number of partitions being consumed. If you have more consumer threads than partitions, some consumer threads will end up idle and will just waste resources. So given the example you attached, there is no point in having 4 consumers for 3 partitions.
The second point - the partition assignment depends on the strategy chosen by consumers in the group. Currently there are 2 partition assignment strategies - Range and RoundRobin. If you are using the Range strategy you can predict what partitions will be consumed by each consumer after rebalance. With RoundRobin strategy though you can't predict beforehand the partition assignments for consumers after rebalance.
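The predictability of the Range strategy can be sketched for a single topic as follows. This is a simplified illustration, not the library's code: it assumes partitions are numbered from 0, that members are ordered by sorting their ids, and the ids `c1` to `c4` are invented.

```python
def range_assign(num_partitions, consumers):
    """Give each consumer a contiguous range of partitions of one topic."""
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        length = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + length))
        start += length
    return assignment


# The case from the question: 4 consumers, 3 partitions.
result = range_assign(3, ["c1", "c2", "c3", "c4"])
print(result)
```

Under these assumptions the fourth consumer is left without a partition, and a message written to partition 1 is read by the second consumer in sorted order.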
The detailed answer that explains how consumer rebalancing works and how partitions are assigned is here.
You can also view current partition assignments for your consumer group in Zookeeper at /consumers/[group_id]/owners/[topic]/[partition]