I am new to Kafka and have read a few tutorials. I couldn't understand the relationship between consumers and partitions.
Please address my queries below.
As per the documentation, only one consumer in a group can consume a given message. Why do we need to create more consumers in the same group? What is the benefit?
Are consumers assigned to individual partitions by ZK? If yes, and the producer sends a message to a different partition, how will the other partition's consumer consume the message?
I have one topic with 3 partitions. I post a message and it goes to P0. I have 5 consumers (in different consumer groups). Will all consumers read the message from P0? If I add many more consumers, will they all read the message from the same P0?
If all consumers read from the same P0, then how will performance be high?
How does rebalancing work? Is it triggered when you add a consumer group, or when you add a consumer to the same group?
Please clarify my questions and give some examples.
Yes, only one consumer in a consumer group can consume messages from a given partition; the other consumers in the same group are assigned the remaining partitions so they can process in parallel. The advantage is parallel processing.
Yes, partitions are assigned to consumers by ZK (this applies to the old ZooKeeper-based consumer; since Kafka 0.9 the new consumer gets its assignment through a group coordinator broker instead). The allocation is based on the partition count and the consumer count. Ex: Topic (Test) has 3 partitions (P1, P2, and P3). We have one consumer (C1). C1 will read messages from all partitions. If you add one more consumer to the same group (C2), P1 and P2 are assigned to C1 and P3 goes to C2. If you add one more consumer (C3), then P1=C1, P2=C2 and P3=C3. The number of consumers in a group should not be greater than the number of partitions for that topic, because the extra consumers would sit idle.
The point above answers this one too.
Rebalancing is triggered when a consumer joins (or leaves) the same consumer group. Adding a separate consumer group does not rebalance the existing groups.
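The allocation example above (3 partitions, a growing number of consumers in one group) can be sketched in plain Python. This is only a toy model of a range-style split, not Kafka's actual assignor code; the function name `assign_partitions` is made up for illustration.

```python
def assign_partitions(partitions, consumers):
    """Split partitions into contiguous chunks, one chunk per consumer.

    Toy model of a range-style assignment within ONE consumer group.
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    # Every consumer gets `base` partitions; the first `extra` get one more.
    base, extra = divmod(len(partitions), len(consumers))
    start = 0
    for index, consumer in enumerate(sorted(consumers)):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


partitions = ["P1", "P2", "P3"]
for group in (["C1"], ["C1", "C2"], ["C1", "C2", "C3"], ["C1", "C2", "C3", "C4"]):
    print(len(group), "consumer(s):", assign_partitions(partitions, group))
```

With four consumers, C4 ends up with an empty list, which is why adding more consumers than partitions to a group brings no extra throughput.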
Our cluster runs Kafka 0.11 and has strict restrictions on using consumer groups. We cannot use arbitrary consumer groups, so an admin has to create the required consumer groups.
We run Kafka Connect HDFS Sinks to read data from topics and write to HDFS. All the topics have only one partition.
I am considering the following two patterns for using consumer groups with the Kafka HDFS Sink.
As shown in the pictures:
Case 1: Each topic has its own Consumer Group
Case 2: All the topics have a common Consumer Group
I am aware that when a topic has multiple partitions and a consumer fails, another consumer in the same consumer group takes over that partition.
My question :
Does the same thing happen when multiple topics share the same consumer group? I.e., if a consumer (HDFS Sink) fails, will another consumer (HDFS Sink connector) take over the work and read from that topic?
Update: Each Kafka HDFS Sink Connector is subscribed to only one topic.
I'm surprised that all the answers saying "yes" are wrong. I just tested it: having the same group.id for consumers of different topics works fine and does NOT mean that they share messages, because for Kafka the key is (topic, group) rather than just (group). Here is what I did:
1. Created 2 different topics, T1 and T2, with 2 partitions in each topic.
2. Created 2 consumers with the same group xxx.
3. Assigned consumer C1 to T1 and consumer C2 to T2.
4. Produced messages to T1: only consumer C1, assigned to T1, processed them.
5. Produced messages to T2: only consumer C2, assigned to T2, processed them.
6. Killed consumer C1 and repeated steps 4-5. Only consumer C2 processed messages from T2; messages from T1 were not processed.
Conclusion: Consumers with the same group name subscribed to different topics will NOT consume messages from other topics, because the key is (topic, group)
Absolutely yes. The Kafka consumers should subscribe to both topics; Kafka will then assign the partitions (per topic) to the currently active members of the consumer group.
Regardless of whether each topic has one or multiple partitions, the remaining consumers in the group take over the partitions, per topic, whenever a consumer in the same group fails.
When a failure happens, Kafka always triggers a rebalance to distribute the partitions among the remaining active consumers of the group, and as a consequence the work continues on those topics.
Yes, as long as both consumers subscribe() to the same set of topics (topicA and topicB), the partitions of all topics will be distributed across all consumers.
In your case this would mean that if one of the consumers fails, both topics will be assigned to the surviving consumer.
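The two outcomes reported in this thread (T1 left unprocessed in the test, versus full failover when both consumers subscribe to both topics) can be reproduced with a small simulation. It is a simplified sketch that assumes partitions of a topic are only handed to group members subscribed to that topic; the `rebalance` helper and its round-robin split are illustrative, not Kafka's real assignor.

```python
def rebalance(topic_partitions, subscriptions):
    """Assign each (topic, partition) to a live member subscribed to that topic.

    topic_partitions: {topic: partition_count}
    subscriptions:    {consumer: set of topics} for the LIVE members of one group
    Returns (assignment, unassigned).
    """
    assignment = {consumer: [] for consumer in subscriptions}
    unassigned = []
    for topic in sorted(topic_partitions):
        # Only members that subscribed to this topic are eligible for it.
        members = sorted(c for c, topics in subscriptions.items() if topic in topics)
        for partition in range(topic_partitions[topic]):
            if members:
                owner = members[partition % len(members)]
                assignment[owner].append((topic, partition))
            else:
                unassigned.append((topic, partition))
    return assignment, unassigned


topics = {"T1": 2, "T2": 2}

# Test scenario: C1 only handled T1, C2 only handles T2, and C1 has died.
print(rebalance(topics, {"C2": {"T2"}}))

# Both consumers subscribed to both topics, and C1 has died.
print(rebalance(topics, {"C2": {"T1", "T2"}}))
```

In the first call T1's partitions have no eligible owner, so nothing reads them. In the second call C2 picks up everything. Under this model, sharing a group ID is not enough for failover; the surviving consumer must also be subscribed to the failed consumer's topic.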
The question is: when a consumer in a consumer group fails, will the other consumers in the same group pick up the subscribed topics and resume processing or not?
But the accepted answer covers the scenario where topics are explicitly assigned to consumers. With automatic assignment (i.e., subscribe), the idle consumers in the group should pick up the work of the failed consumer and start reading from the last committed offset. If they did not, that would break the consumer group parallelism architecture.
Just look at this answer: Kafka consumer for multiple topic
Say I have a topic T1 with 3 partitions, i.e. P1, P2 and P3, where P1 is the leader and the rest are followers.
Now there are 2 producers that want to push to the same topic T1. I believe P1 will be the leader for both of them? Also, will a single offset be maintained for both of them, or is the offset maintained per partition per producer?
Now I have a single consumer polling from T1. Will it get messages from both producers by default, or does it have to explicitly mention the producer name if it wants messages from a specific producer?
The leader does not depend on the producers or consumers, so P1 will always be returned as the leader. Offsets are not relevant to producers; consumed offsets are tracked per consumer group (for each partition). The committed offset determines which messages have been read and committed by a consumer group.
A consumer will always read all the messages; it does not matter which producer published them.
You may be mixing up replicas and partitions. When you say you have a topic with 3 partitions, it means your records will be dispatched among them according to the record key (or the partitioner algorithm).
There is no "leader partition". However, each partition has a leader broker that handles it. In your case you will have 3 partition leaders, each managing one of your 3 partitions.
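To make the key-based dispatch concrete, here is a minimal sketch of a partitioner. Note the assumption: Kafka's default partitioner hashes the key bytes with murmur2, whereas this example substitutes `zlib.crc32` purely so it runs with the standard library; `choose_partition` is a hypothetical name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Stand-in hash: the real default partitioner uses murmur2, not CRC32.
    """
    return zlib.crc32(key) % num_partitions


# Two producers sending records with the same key land on the same partition,
# because the choice depends only on the key and the partition count.
from_producer_1 = choose_partition(b"order-42", 3)
from_producer_2 = choose_partition(b"order-42", 3)
print(from_producer_1 == from_producer_2)
```

Nothing about the producer enters the computation, which is why a consumer of that partition sees records from every producer.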
An interesting post regarding Kafka partitions:
Understanding Kafka Topics and Partitions
Yannick
Say, there is a Consumer group. (Consumers with the same group ID).
The Consumer group is consuming Topic A from a Broker.
Topic A has 4 partitions, and there are 4 Consumers in that group.
Each Consumer consumes a different partition (Consumer 1 takes messages from partition 1, Consumer 2 takes messages from partition 2, and so on, because that's what a consumer group does in Kafka). Within the Consumer Group, each consumer has 1/4 of the topic.
My question : How do they share the message so that they all have Topic A?
How do they combine those bits and pieces? and where does this take place?
If my computer (consumer 1 of group A) consumes Topic A from a Broker, and my friend's computer (consumer 2 of group A) consumes other pieces of the same Topic, how do we combine the message in Topic A?
I thought of the term 'Consumer' as a computer or a server consuming a Topic from a Broker. That's why I got confused about Consumer groups.
A Consumer is a client or a program, and I can have many consumers on my computer or server. A Consumer Group is a set of consumer processes that share the same group ID; they can run on one machine or on several.
So I don't need to worry about Consumers in a group sharing bits of messages to complete a Topic. Previously, I thought of each consumer as a server or a computing resource, so they had to communicate somehow. That's how I got confused. They don't need to communicate with each other over the network or need a pool to share the partitions they consumed.
Consumer 1 can read from partition 1 and Consumer 2 can read from partition 2. If Consumers 1 and 2 share the same group ID (Consumer group), Consumer 1 doesn't need to read from partition 2, and Consumer 2 doesn't need to read from partition 1. Together, the group already has the whole Topic it needs. Boom!
I posted an answer to help someone who thought like me.
I am reading the Kafka documentation and trying to understand how it works. This is regarding consumers. In brief, a topic is divided into a number of partitions. There are a number of consumer groups, each having a number of consumer instances. Now, my question is: does each partition send the "same" message to each consumer group, which is in turn given to a specific consumer instance within the group?
If so, how does Kafka ensure the message is processed by only one consumer?
Kindly guide me if I am missing something.
Well, to put it simply:
We have a topic divided into partitions.
We have consumers that consume data from those topics.
Consumers are part of a consumer group by sharing the same group.id.
Every partition of a topic is consumed by exactly one consumer within a consumer group.
Example :
Topic "test" with 3 partitions.
Consumer group A : with 3 consumers
Consumer group B : with 2 consumers.
The two consumer groups A and B both consume data from the topic "test".
Within group A, each of the 3 consumers will consume one partition, whereas in group B (with 2 consumers), one consumer will read 2 partitions and the other will consume the last one.
If we have a last consumer group with only one consumer inside, it will read all 3 partitions of that topic.
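The point that every group sees every message, while each group tracks its own position, can be modelled with an in-memory log. This is a toy sketch: the `Topic` class and its `poll` method are invented for illustration and only mimic the idea of per-group committed offsets.

```python
from collections import defaultdict


class Topic:
    """In-memory stand-in for a partitioned topic with per-group offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # One committed offset per (group, partition); each group starts at 0.
        self.committed = defaultdict(lambda: [0] * num_partitions)

    def produce(self, partition, message):
        self.partitions[partition].append(message)

    def poll(self, group_id, partition):
        """Return unread records for this group and advance its offset."""
        offset = self.committed[group_id][partition]
        records = self.partitions[partition][offset:]
        self.committed[group_id][partition] = len(self.partitions[partition])
        return records


topic = Topic(3)
topic.produce(0, "m1")
topic.produce(1, "m2")
topic.produce(2, "m3")

# Whichever member of a group owns a partition polls it on the group's behalf.
group_a = [m for p in range(3) for m in topic.poll("A", p)]
group_b = [m for p in range(3) for m in topic.poll("B", p)]
group_a_again = [m for p in range(3) for m in topic.poll("A", p)]
print(group_a, group_b, group_a_again)
```

Groups A and B each receive all three messages, but a second poll by group A returns nothing, because the offsets belong to the group rather than to an individual consumer.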
Hope that helps; let me know if anything is unclear.
I have a single Topic with 5 partitions.
I have 5 threads, each creating a Consumer.
All consumers are in the same consumer group, set via group.id.
I also gave each consumer a different and unique client.id.
I see that 2 consumers are reading the same message to process.
Should kafka handle this?
How do I troubleshoot it?
Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is if 1 consumer crashes before it's able to commit its offset. In that case, the new consumer that gets assigned the partition will re-consume from the last committed offset.
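That exception can be illustrated with a tiny simulation. It is a sketch under the assumption that the consumer commits after processing each record; `consume` and `crash_before_commit_at` are hypothetical names, and no real Kafka client is involved.

```python
def consume(log, committed_offset, crash_before_commit_at=None):
    """Process records from committed_offset onward, committing after each one.

    Returns (processed_records, committed_offset). If the consumer crashes
    right after processing the record at `crash_before_commit_at`, that
    record's commit never happens.
    """
    processed = []
    for offset in range(committed_offset, len(log)):
        processed.append(log[offset])
        if offset == crash_before_commit_at:
            return processed, committed_offset  # crashed before committing
        committed_offset = offset + 1
    return processed, committed_offset


log = ["m0", "m1", "m2", "m3"]

# First consumer processes m2 but dies before committing it.
first, committed = consume(log, 0, crash_before_commit_at=2)
# After the rebalance, the new owner resumes from the last committed offset.
second, committed = consume(log, committed)
print(first, second)
```

Here "m2" appears in both lists, which is the duplicate a second consumer in the same group can legitimately see after a crash or rebalance.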
You can use the consumer group tool kafka-consumer-groups that comes with Kafka (run it with --describe --group followed by your group name) to check which partitions are assigned to each consumer in your group.