Kafka Consumers from different groups consuming from different partitions of a topic

I have a scenario where I have deployed 4 instances of Kafka Consumer on different nodes. My topic has 4 partitions. Now, I want to configure the Consumers in such a way that they all fetch from different partitions of the topic.
I know for a fact that if the Consumers are from the same consumer group, they ensure that the partitions are split equally. But in my case, they are not in the same group.

To achieve what you want, you need the consumers to be in the same consumer group. Only in that case is a "competing consumers" pattern applied: each consumer is assigned 1 of the 4 partitions, so you have 4 consumers, each one reading from 1 partition and receiving only the messages for that partition.
When the consumers are part of different consumer groups, each consumer is assigned all 4 partitions and receives messages from all of them, in a publish/subscribe fashion.
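For reference, here is a minimal sketch of what each of the 4 instances could run with the Java client; the broker address and the topic name "my-topic" are placeholders. The only thing that matters for the competing-consumer behaviour is that every instance uses the same group.id:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PartitionedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // same group.id on all 4 nodes
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic with 4 partitions
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each instance sees only the records of the single partition assigned to it.
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }

With 4 such instances in group "my-group", the group coordinator gives each instance exactly one of the 4 partitions; if you instead give each instance its own group.id, every instance gets all 4 partitions.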

Related

Kafka Consumers subscribed to different topics in same consumer group

I am starting with Kafka and have a question on consumer groups. We have an application where we want different consumers from the same group subscribing to different topics. The grouping is done based on some business criteria. To be specific, consumer 1 from group A and consumer 2 from group A are subscribed to Topic 1 and Topic 2 respectively, each with 10 partitions. Does this mean that consumer 1 can scale to 10 instances and consumer 2 can also scale to 10 instances, since they are subscribed to different topics? Is this a correct design?
Yes, since within a topic Kafka tries to assign partitions to consumers as evenly as possible. The assignment key is effectively topic:consumer_group_id, so another_topic:same_consumer_group_id is a different key and doesn't interfere; the consumers for a given topic:consumer_group_id can be scaled up to the number of partitions of that topic.
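As a rough sketch of that design with the Java client (the broker address is a placeholder, and "topic1"/"topic2" stand in for the question's Topic 1 and Topic 2): both consumers share group "A" but subscribe to different topics, so each can be scaled independently up to 10 instances.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupAConsumers {
        static KafkaConsumer<String, String> newGroupAConsumer() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "A");                       // both consumers share group "A"
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            return new KafkaConsumer<>(props);
        }

        public static void main(String[] args) {
            // "Consumer 1": group A, topic1 (10 partitions) -- up to 10 such instances can receive work.
            KafkaConsumer<String, String> consumer1 = newGroupAConsumer();
            consumer1.subscribe(Collections.singletonList("topic1"));

            // "Consumer 2": group A, topic2 (10 partitions) -- can also scale to 10 instances,
            // independently of consumer 1, because partitions are assigned per topic.
            KafkaConsumer<String, String> consumer2 = newGroupAConsumer();
            consumer2.subscribe(Collections.singletonList("topic2"));

            // Each consumer would run its own poll() loop (on its own thread or process),
            // as in the example further up.
        }
    }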

Number of consumers in a Kafka consumer group

If a producer has 3 topics with 4 partitions each, should the consumer group contain 4 or 12 consumers?
I want to achieve ideal consumption.
There should be one consumer per partition for ideal consumption. So, for your case, 12 consumers would be ideal.
If you have N partitions, then you can have up to N consumers within the same consumer group, each of which reads from a single partition. When you have fewer consumers than partitions, some of the consumers will read from more than one partition. Also, if you have more consumers than partitions, then some of the consumers will be inactive and will receive no messages at all.
You cannot have multiple consumers -within the same consumer group- consuming data from a single partition. Therefore, in order to consume data from the same partition using N consumers, you'd need to create N distinct consumer groups too.
Note that partitioning enhances the parallelism within a Kafka cluster. If you create thousands of consumers to consume data from only one partition, I suspect that you will lose some level of parallelism.
If you have 3 topics with 4 partitions each, then for best optimisation you should have 4 consumers per consumer group.
Reason: if you have more than 4 consumers, the extra consumers will be left idle, because the 4 partitions will be assigned to 4 consumers, 1 partition per consumer. So, in short, more than 4 consumers per consumer group are not required.
If you have fewer consumers, say 2 consumers for 4 partitions, each consumer will consume messages from 2 partitions, which puts more load on it.
There is no limit on the number of consumer groups that can subscribe to a topic.
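Either way, you can verify how the group actually balanced out by printing each consumer's assignment after its first poll. A minimal sketch with the Java client; the broker address, group id and topic names are placeholders for your 3 topics with 4 partitions each:

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AssignmentCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");        // placeholder group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("topic-1", "topic-2", "topic-3")); // placeholder topic names
                consumer.poll(Duration.ofSeconds(5)); // the first poll joins the group and triggers assignment
                for (TopicPartition tp : consumer.assignment()) {
                    System.out.println("assigned: " + tp.topic() + "-" + tp.partition());
                }
            }
        }
    }

Run one such instance per consumer you plan to deploy; the printed assignments show which instances actually receive partitions, which also depends on the configured partition.assignment.strategy.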

Can I have all the consumers of a group consume messages from all the partitions of a Kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and I have 20 consumers of Consumer Group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Please note all messages are independent and can be processed independently.
I looked at the consumer configuration partition.assignment.strategy, but I'm not sure if I can achieve dynamic assignment of consumers to partitions depending on message availability.
One partition is assigned to exactly one consumer in the group. In your case, only 4 of your 20 consumers are actually doing any work. You have to increase the number of partitions if you want more consumers to get an assignment.
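If you do decide to raise the partition count so that more of the 20 consumers in group 'AC' get an assignment, this can be done programmatically with the Java AdminClient. A minimal sketch, assuming a placeholder broker address; keep in mind that partitions can only ever be increased, and that re-partitioning changes which partition a given key maps to (which is fine here since you don't need ordering):

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitions;

    public class IncreasePartitions {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

            try (AdminClient admin = AdminClient.create(props)) {
                // Grow topic "A" from 4 to 20 partitions so that all 20 consumers
                // in group "AC" can be assigned one partition each.
                admin.createPartitions(Collections.singletonMap("A", NewPartitions.increaseTo(20)))
                     .all()
                     .get();
            }
        }
    }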

Scaling up kafka consumer applications

Let's say I have one consumer group which is subscribed to 4 topics, and the partitions for each topic are:
First topic => 5 partitions
Second topic => 3 partitions
Third topic => 2 partitions
Fourth topic => 1 partition
The total number of partitions is 11. So, in total, how many application instances can I run:
5 (the maximum number of partitions among the input topics) or 11?
In Kafka, scaling consumers depends on the number of partitions.
Let's assume you have one topic with 3 partitions, and you have 2 different consumer applications (different consumer groups) which do different work.
You can scale each consumer group up to 3 consumers.
A single consumer (in consumer group A) can consume messages from all 3 partitions. Two consumers in the same consumer group cannot consume from the same partition.
Take a look at this image: https://hadoopabcd.files.wordpress.com/2015/04/consumer-group.png
Read more about consumer groups in this blog series: https://dzone.com/articles/understanding-kafka-consumer-groups-and-consumer-l
In the ideal situation, the number of consumers in the consumer group should be equal to the number of partitions. If that is not the case, you can have more than one consumer group; Kafka allows 2 consumers from different consumer groups to read from the same partition. Ultimately it depends on how many resources you have available for running the consumers.
Suppose you have an application that needs to read messages from a Kafka topic, run some validations against them, and write the results to another data store. In this case your application will create a consumer object, subscribe to the appropriate topic, and start receiving messages, validating them and writing the results. This may work well for a while, but what if the rate at which producers write messages to the topic exceeds the rate at which your application can validate them? If you are limited to a single consumer reading and processing the data, your application may fall farther and farther behind, unable to keep up with the rate of incoming messages. Obviously there is a need to scale consumption from topics. Just like multiple producers can write to the same topic, we need to allow multiple consumers to read from the same topic, splitting the data between them.
Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
Please refer to this https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html
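To tie this back to the original question, here is a rough sketch of one application instance in the Java client, subscribed to all four topics with a single group.id; the broker address, topic names and group id are placeholders. How many instances can usefully run depends on the assignor: with the default RangeAssignor, partitions are assigned per topic, so roughly at most 5 instances (the largest per-topic partition count) would receive work, whereas RoundRobinAssignor spreads all 11 partitions, letting up to 11 instances stay busy.

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ScaledConsumerApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");                  // every instance uses the same group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Spread all 11 partitions across the instances regardless of topic boundaries.
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                      "org.apache.kafka.clients.consumer.RoundRobinAssignor");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("first-topic", "second-topic", "third-topic", "fourth-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // validate / process record.value() here
                    }
                }
            }
        }
    }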

How does Kafka help to realize the abstraction of queuing as well as publish-subscribe?

The Kafka documentation states that:
Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.
I've a couple of doubts regarding this:
1) Why will the published message go to a single consumer instance of a consumer group? Isn't it the responsibility of consumers to read from the partitions? What does "go" even mean here?
2)The consumers which are interested in particular topics should just read from the partition they're interested in. What's the relevance of consumer groups?
3) And how does this help to realize the abstractions of queuing and publish-subscribe?
In Kafka a topic can have multiple partitions; if a consumer group has X consumers, the partitions for that topic will be split among them (i.e. if you have 1 topic with 2 partitions and a consumer group with 2 consumers, each consumer will consume from 1 partition; in the same scenario, if the consumer group has only 1 consumer, that consumer will read from both partitions).
The consumer group basically coordinates the different consumers over the topic(s) and partitions. If you have 4 consumers in the same CG and 1 crashes, the group will hand the partitions of the crashed consumer to the other consumers available in the same CG so that the information in those partitions is still processed (if the CG didn't redistribute the partitions, some of them would never be read after a consumer crash).
If the consumers are in the same CG then the information that is sent to the topic is distributed among them. If each of the consumers has a different CG then they will all get all the messages.
Hope it's clearer now; the Kafka documentation could use improvement here.
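To make the two abstractions concrete, here is a rough sketch with the Java client; the broker address, the topic name "events", and the group names "workers" and "auditor" are placeholders. Two consumers sharing the group "workers" behave like a load-balanced queue, while a consumer in its own group "auditor" receives every message, i.e. publish/subscribe:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class QueueVsPubSub {
        static KafkaConsumer<String, String> consumerInGroup(String groupId) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("events")); // placeholder topic
            return consumer;
        }

        public static void main(String[] args) {
            // Queue semantics: both consumers share group "workers",
            // so each message on "events" is delivered to only one of them.
            KafkaConsumer<String, String> worker1 = consumerInGroup("workers");
            KafkaConsumer<String, String> worker2 = consumerInGroup("workers");

            // Publish/subscribe semantics: this consumer has its own group "auditor",
            // so it receives every message, independently of what the workers consume.
            KafkaConsumer<String, String> auditor = consumerInGroup("auditor");

            // Each consumer would run its poll() loop on its own thread.
        }
    }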