I don't understand the practical use case of the consumer group in Kafka.
A partition can be read by only one consumer in a consumer group, so each consumer reads only a subset of a topic's records.
Can someone help with any practical scenario where the consumer group helps?
It's for parallel processing of event messages from a given topic.
Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
Read more here:
https://docs.confluent.io/5.3.3/kafka/introduction.html#consumers
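For illustration, here is a minimal consumer sketch (not from the post above; the broker address, topic name "orders", and group name "order-processors" are placeholders). Running two copies of it with the same group.id splits the topic's partitions between them (load balancing); giving each copy its own group.id makes every copy receive every record (broadcast).

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "order-processors");          // same value on every instance => load balancing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```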
Our cluster runs Kafka 0.11 and has strict restrictions on using consumer groups. We cannot use arbitrary consumer groups, so an admin has to create the required consumer groups.
We run Kafka Connect HDFS Sinks to read data from topics and write to HDFS. All the topics have only one partition.
I can consider the following two patterns when using consumer groups with the Kafka HDFS Sink.
As shown in the pictures:
Case 1: Each topic has its own Consumer Group
Case 2: All the topics have a common Consumer Group
I am aware that when a topic has multiple partitions, if a consumer fails, another consumer in the same consumer group takes over its partitions.
My question:
Does the same thing happen when multiple topics share the same consumer group? That is, if a consumer (HDFS Sink) fails, will another consumer (HDFS Sink connector) take over the work and read from that topic?
Update: Each Kafka HDFS Sink connector is subscribed to only one topic.
I'm surprised that all the answers saying "yes" are wrong. I just tested it: using the same group.id for consumers of different topics works fine and does NOT mean that they share messages, because for Kafka the key is (topic, group) rather than just (group). Here is what I did:
created 2 different topics T1 and T2 with 2 partitions in each topic
created 2 consumers with the same group xxx
assigned consumer C1 to T1, consumer C2 to T2
produced messages to T1 - only consumer C1 assigned to T1 processed them
produced messages to T2 - only consumer C2 assigned to T2 processed them
killed consumer C1 and repeated steps 4-5. Only consumer C2 processed messages from T2
messages from T1 were not processed
Conclusion: Consumers with the same group name subscribed to different topics will NOT consume messages from other topics, because the key is (topic, group)
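A rough reconstruction of that experiment, for reference (the broker address is the only assumption beyond what the test describes): both consumers share group "xxx" but subscribe to different topics, so each one only ever receives records from its own topic.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SameGroupDifferentTopics {
    static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "xxx");                       // same group for both consumers
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> c1 = newConsumer();
        c1.subscribe(Collections.singletonList("T1"));      // C1 only ever sees records from T1
        KafkaConsumer<String, String> c2 = newConsumer();
        c2.subscribe(Collections.singletonList("T2"));      // C2 only ever sees records from T2
        // each consumer would then run its own poll() loop in a separate thread or process
    }
}
```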
Absolutely yes. The Kafka consumers should subscribe to both topics, and Kafka will then assign the partitions (per topic) to the currently active members of the consumer group.
Regardless of whether each topic has one or multiple partitions, the remaining consumers in the same group will take over those partitions whenever a consumer failure happens.
When a failure happens, Kafka will always trigger the rebalancing process in order to distribute the partitions among the remaining active consumers of the group, and as a consequence the work will continue running on those topics.
Yes, as long as both consumers subscribe() to the same set of topics (topicA and topicB), the partitions of all topics will be distributed across all consumers.
In your case this would mean that if one of the consumers fails, both topics will be assigned to the surviving consumer, as in the sketch below.
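A minimal sketch of that setup (group name "hdfs-sink-group", topic names, and broker address are placeholders): every instance subscribes to the same set of topics, so a rebalance after one instance dies hands the failed instance's partitions from both topics to the survivors.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiTopicSubscribe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "hdfs-sink-group");           // placeholder group name
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("topicA", "topicB"));  // the same set on every instance
        // a normal poll() loop would follow; if one instance fails, the remaining
        // instances are assigned the partitions of both topics during the rebalance
    }
}
```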
The question asked is: in the event that a consumer in a consumer group fails, will the consumers available in the same group pick up the subscribed topics and start processing again or not?
The accepted answer covers the scenario where the topics are assigned to consumers, but if it is automatic assignment (i.e., subscribe), then the consumers that are idle in the group should pick up the failed consumer's job and start reading from the last committed offset. If they don't, that breaks the consumer group's parallelism architecture.
Just look at this answer: Kafka consumer for multiple topic
I have a Kafka topic with only one partition and I am not sure what will happen in the following cases. How will messages be delivered to consumers?
If all consumers are in the same group
If all consumers are in different groups
I am not sure if consumers will receive unique messages or duplicate ones.
Each consumer is assigned one or more partitions of a topic, and each consumer belongs to a consumer group. Below are two scenarios:
When all consumers belong to the same group: each consumer is assigned a different partition. If there is only one partition, only one consumer will get the messages, while the other consumers will be idle.
When all consumers belong to different consumer groups: each consumer will get the messages from all partitions. Partition assignment is scoped per consumer group.
It depends on the consumer groups. Consumers within the same consumer group don't read the data again from the same partitions once the read offsets have been committed.
I am new to Kafka. I wanted to know whether multiple consumers can consume data from a topic at the same time and process it simultaneously, or whether records are consumed and processed in batches.
Kafka Consumers group themselves under a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
The processing of the data can be done simultaneously or in batches based on your requirement.
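As a hedged sketch of that point (topic "events", group "batch-demo", and the broker address are placeholders): each call to poll() returns a batch of records, each consumer instance in the group works through its own batches in parallel with the others, and here the offset is committed only after the whole batch has been processed.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "batch-demo");                 // placeholder group name
        props.put("enable.auto.commit", "false");            // commit manually, after the batch
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));   // placeholder topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : batch) {
                    process(record);                 // per-record work
                }
                if (!batch.isEmpty()) {
                    consumer.commitSync();           // commit once the whole batch is done
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```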
If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?
In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?
What if you have multiple consumers on a given topic#partition? I guess the consumer has to somehow keep track of what messages it has already processed in case of duplicates?
In fact, each consumer belongs to a consumer group. When the Kafka cluster sends data to a consumer group, all records of a partition will be sent to a single consumer in the group.
If there are more partitions than consumers in a group, some consumers will consume data from more than one partition. If there are more consumers in a group than partitions, some consumers will get no data. If you add new consumer instances to the group, they will take over some partitions from the old members. If you remove a consumer from the group (or the consumer dies), its partitions will be reassigned to another member.
Now let's take a look at your questions:
If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?
NO. Some consumers in the same consumer group will consume data from more than one partition.
In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?
Kafka will take care of it. If new consumers join the group, or old consumers die, Kafka will rebalance the partitions among the group members.
What if you have multiple consumers on a given topic#partition?
You CANNOT have multiple consumers in the same consumer group consuming data from a single partition. However, if there is more than one consumer group, the same partition can be consumed by one (and only one) consumer in each consumer group.
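A small sketch of how an instance can inspect its share of the work (all names are placeholders): after the first poll() joins the group, assignment() shows which partitions this consumer owns. With fewer consumers than partitions the set contains several partitions; with more consumers than partitions some instances see an empty set.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AssignmentDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "assignment-demo");            // placeholder group name
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));   // placeholder topic
            consumer.poll(Duration.ofSeconds(5));            // triggers the group join and assignment
            System.out.println("Partitions owned by this instance: " + consumer.assignment());
        }
    }
}
```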
1) No, that means you will have one consumer handling more than one partition.
2) Kafka never assigns the same partition to more than one consumer in a group, because that would violate the ordering guarantee within a partition.
3) You could implement a ConsumerRebalanceListener in your client code, which gets called whenever partitions are assigned to or revoked from a consumer.
You might want to take a look at this article, specifically the "Assigning partitions to consumers" part. In it, I have a sample where you create a topic with 3 partitions and then a consumer with a ConsumerRebalanceListener telling you which consumer is handling which partition. You could play around with it by starting one or more consumers and seeing what happens. The sample code is on GitHub:
http://www.javaworld.com/article/3066873/big-data/big-data-messaging-with-kafka-part-2.html
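For reference, here is a minimal ConsumerRebalanceListener sketch in the spirit of that article (the topic name, group id, and broker address are placeholders, not taken from the linked sample): the callbacks print which partitions this instance gains or loses as consumers start and stop.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "rebalance-demo");              // placeholder group name
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"),   // placeholder topic
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                        System.out.println("Revoked: " + partitions);
                    }
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        System.out.println("Assigned: " + partitions);
                    }
                });
            while (true) {
                consumer.poll(Duration.ofMillis(500));   // rebalance callbacks fire inside poll()
            }
        }
    }
}
```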
I have two consumer servers with the same group id subscribed to the same topic.
A Kafka server is running, and the topic has only one partition.
As far as I know, the messages should be consumed randomly by those two consumer servers.
But it seems that the same consumer server A always consumes the messages, while the other one does not consume anything. If I stop consumer server A, the other one works fine.
What I expect is that they can consume messages randomly.
To be able to use two consumer instances in parallel you need at least two partitions in the topic. A consumer will bind to one or more partitions of a topic and other consumers with the same groupId will not claim partitions which already have consumers bound to them. If a consumer fails/crashes, the partition will be released and then picked up by another consumer instance.
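As a sketch of one way to get there, assuming a broker new enough to support the AdminClient createPartitions API and placeholder names throughout: raise the topic's partition count to 2 so the second consumer in the group can be assigned a partition.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class AddPartition {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // increase "my-topic" (placeholder name) from 1 to 2 partitions;
            // note that adding partitions changes which partition a given key maps to
            admin.createPartitions(
                    Collections.singletonMap("my-topic", NewPartitions.increaseTo(2)))
                 .all().get();   // wait for the change to be applied
        }
    }
}
```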