Number of consumers in a Kafka consumer group - apache-kafka

If a producer writes to 3 topics with 4 partitions each, should the consumer group contain 4 or 12 consumers?
I want to achieve ideal consumption.

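There should be one consumer per partition for ideal consumption. So, in your case, 12 consumers would be ideal. Note that this only holds if the group's assignment strategy spreads the partitions of all three topics across the consumers (RoundRobinAssignor, for example); an assignor that works topic by topic, such as RangeAssignor, would give partitions to only 4 of the 12.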

If you have N partitions, then you can have up to N active consumers within the same consumer group, each reading from a single partition. When you have fewer consumers than partitions, some of the consumers will read from more than one partition. If you have more consumers than partitions, some of the consumers will be inactive and will receive no messages at all.
You cannot have multiple consumers within the same consumer group consuming data from a single partition. Therefore, in order to consume data from the same partition using N consumers, you'd need to create N distinct consumer groups too.
Note that partitioning enhances the parallelism within a Kafka cluster. If you create thousands of consumers to consume data from only one partition, I suspect that you will lose some level of parallelism.

Suppose you have 3 topics with 4 partitions each.
For best optimisation you should have 4 consumers per consumer group.
Reason: partitions are assigned topic by topic, so each of the 4 consumers is assigned 1 partition from each topic. If you have more than 4 consumers, the extra consumers will be left idle. So, in short, more than 4 consumers are not required per consumer group.
If you have fewer consumers, say 2 consumers for the 4 partitions of a topic, each consumer will consume messages from 2 partitions of every topic, which increases its load.
There is no limit on the number of consumer groups that can subscribe to a topic.
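The answers above disagree (12 vs. 4) because the result depends on the assignment strategy. Here is a minimal sketch of a per-topic, range-style assignment, written as a plain Java simulation with no Kafka dependency. The class name, the topic names t1..t3 and the "consumer-N" labels are made up for illustration, and the arithmetic is my understanding of how a range-style assignor behaves, not a copy of Kafka's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RangeAssignmentSketch {

    // Splits each topic's partitions into contiguous ranges over the consumers,
    // one topic at a time and independently of the other topics.
    static Map<String, List<String>> assign(Map<String, Integer> partitionsPerTopic, int consumerCount) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (int c = 0; c < consumerCount; c++) {
            assignment.put("consumer-" + c, new ArrayList<>());
        }
        for (Map.Entry<String, Integer> topic : partitionsPerTopic.entrySet()) {
            int partitions = topic.getValue();
            int perConsumer = partitions / consumerCount; // base share per consumer
            int withExtra = partitions % consumerCount;   // the first consumers get one more
            for (int c = 0; c < consumerCount; c++) {
                int start = perConsumer * c + Math.min(c, withExtra);
                int length = perConsumer + (c < withExtra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    assignment.get("consumer-" + c).add(topic.getKey() + "-" + p);
                }
            }
        }
        return assignment;
    }

    // Counts the consumers that received at least one partition.
    static long activeConsumers(Map<String, List<String>> assignment) {
        return assignment.values().stream().filter(parts -> !parts.isEmpty()).count();
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("t1", 4); // hypothetical topic names
        topics.put("t2", 4);
        topics.put("t3", 4);
        System.out.println(assign(topics, 4));
        System.out.println("active with 12 consumers: " + activeConsumers(assign(topics, 12)));
    }
}
```

Under this model, 4 consumers each end up with one partition of every topic, and adding more consumers to the group only adds idle ones. A strategy that spreads all 12 partitions over the group would keep 12 consumers busy instead.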

Related

Will consumers in a group subscribe to one topic each, if there is a single partition per topic?

I'm using Debezium to log changes in my database, and Debezium generates change events in a topic for each table that exists in my database. These change records are consumed to populate another database.
If I restrict each topic to only have 1 partition, and let's say I have 4 consumers running, when the consumers subscribe to topics, will the 4 consumers divide the topics among themselves? (they would distribute partitions among themselves, but here 1 topic = 1 partition)
Would the above setup mean that for each table, the generated events on the topic will always be executed in order because there's at most 1 consumer acting on the topic at the same time?
I'm currently trying to get Kafka to restrict 1 partition per topic, and have 2 consumers. But the 2 consumers seem to pick up different topics every now and then and not be 'sticky' to the topics.
Yes, if there are 4 topics, 1 partition in each topic and 4 consumers, the consumers will evenly distribute the partitions among themselves which will result in 1 topic per consumer.
However, to get a "sticky assignment" you would need to give each consumer a static member ID (the group.instance.id consumer setting), not just a fixed group ID. Otherwise, when there are failures and a rebalance is triggered, a consumer can be assigned a different partition (in this case a different topic).
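As a rough illustration of the static-membership idea, the consumer settings could look something like the sketch below. It only builds a java.util.Properties object, so it runs without the Kafka client on the classpath. The broker address, the group name "table-sync", the timeout value and the instance IDs are placeholders I made up; group.instance.id requires brokers and clients that support static membership (Kafka 2.3 or later, as far as I know).

```java
import java.util.Properties;

final class StaticMemberConfigSketch {

    // Builds consumer settings for one instance. The instanceId should be
    // stable across restarts and unique within the consumer group.
    static Properties consumerProps(String instanceId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "table-sync");              // hypothetical group name
        props.setProperty("group.instance.id", instanceId);       // static membership
        // A member that comes back within this window keeps its old assignment;
        // the value here is an arbitrary example.
        props.setProperty("session.timeout.ms", "60000");
        // Optional: an assignor that tries to preserve assignments when a rebalance does happen.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps("consumer-1"));
        System.out.println(consumerProps("consumer-2"));
    }
}
```

These properties would then be passed to the consumer (or to the Spring consumer factory) in place of the defaults.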

consume kafka topics with different partition numbers

Hi, I have a Kafka consumer (using the spring-kafka dependency) that listens to multiple topics. Let's say I have 3 topics: topicA, topicB and topicC. In my application I consume all three topics in one consumer, like below.
@KafkaListener(topics = "topicA,topicB,topicC", groupId = "myGroup", concurrency="3")
My topics have different numbers of partitions. Let's say topicA has 3 partitions, topicB has 6 partitions and topicC has 9 partitions. How should I determine a number for the "concurrency" option in @KafkaListener? (I'm confused since topicB and topicC contain 6 and 9 partitions respectively. Should I change the concurrency to 6 or 9? Or should I change it to 18, which is the total number of partitions across the 3 topics?)
I know that on the consumer side, Kafka always gives a single partition’s data to one consumer thread, and the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed.
My main goal is to consume in parallel by using the concurrency option in @KafkaListener.
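If you set the concurrency to 18 with the default partition assignor, you will have idle consumers. That assignor distributes each topic's partitions independently, so the 9 partitions of topicC can occupy at most 9 consumers, and the partitions of topicA and topicB go to the first of those same consumers rather than to the idle ones. The total number of partitions across topics has no bearing on how the partitions are distributed.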
You can use a custom partition assignor (in the consumer configuration) to distribute the partitions differently.
See https://kafka.apache.org/documentation/#consumerconfigs_partition.assignment.strategy
Also see the discussion about RoundRobinAssignor here https://docs.spring.io/spring-kafka/docs/current/reference/html/#using-ConcurrentMessageListenerContainer
Or, simply add 3 separate @KafkaListener annotations to the method, one for each topic, with different concurrencies.
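To see why a round-robin style assignor changes the picture, here is a small simulation in plain Java (no Kafka or Spring dependency). It lays out all partitions of all topics in one sorted list and deals them out to the consumers in turn, which is my simplified reading of what RoundRobinAssignor does when every consumer subscribes to the same topics. The class name and the integer consumer indexes are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RoundRobinAssignmentSketch {

    // Deals every partition of every topic out to the consumers in turn,
    // treating all topics as one combined, sorted list of partitions.
    static Map<Integer, List<String>> assign(Map<String, Integer> partitionsPerTopic, int consumerCount) {
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int c = 0; c < consumerCount; c++) {
            assignment.put(c, new ArrayList<>());
        }
        int next = 0; // index of the next partition in the combined list
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                assignment.get(next % consumerCount).add(topic.getKey() + "-" + p);
                next++;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new TreeMap<>();
        topics.put("topicA", 3);
        topics.put("topicB", 6);
        topics.put("topicC", 9);
        // With concurrency 18, each simulated consumer ends up with exactly one partition.
        System.out.println(assign(topics, 18));
    }
}
```

In this model a concurrency of 18 keeps every consumer busy, while a concurrency of 9 gives each consumer two partitions. With a per-topic assignor, anything above 9 would simply add idle consumers.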

Can I have all the consumers of a group consume message from all the partitions of a kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and I have 20 consumers of Consumer Group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Please note all messages are independent and can be processed independently.
I looked at a consumer configuration partition.assignment.strategy, but not sure if I can achieve dynamic assignment of consumer to partition, depending on the message availability.
Each partition is assigned to exactly one consumer in the group. In your case, only 4 of your 20 consumers are actually working. You have to increase the number of partitions if you want more consumers to be assigned.

Kafka Consumer from different group consuming from different partition of Topic

I have a scenario where I have deployed 4 instances of Kafka Consumer on different nodes. My topic has 4 partitions. Now, I want to configure the Consumers in such a way that they all fetch from different partitions of the topic.
I know for a fact that if the Consumers are from the same consumer group, they ensure that the partitions are split equally. But in my case, they are not in the same group.
In order to achieve what you want, the consumers need to be in the same consumer group. Only in this case is a "competing consumer" pattern applied: each consumer receives 1 of the 4 partitions, so you have 4 consumers, each one reading from 1 partition and receiving the messages for that partition.
When consumers are part of different consumer groups, each consumer will be assigned all 4 partitions and receive messages from all of them, in a publish/subscribe way.
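The difference between the two cases can be modelled in a few lines of plain Java. This is only a toy model of the semantics described above, not Kafka code: the consumer names, the group IDs and the simple modulo split inside a group are my own assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupDeliverySketch {

    // groupOfConsumer maps a consumer name to its group.id.
    // Returns, for every consumer, the partitions it would read.
    static Map<String, List<Integer>> partitionsSeen(Map<String, String> groupOfConsumer, int partitionCount) {
        // Collect the members of each group, keeping insertion order.
        Map<String, List<String>> members = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : groupOfConsumer.entrySet()) {
            members.computeIfAbsent(e.getValue(), g -> new ArrayList<>()).add(e.getKey());
        }
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (String consumer : groupOfConsumer.keySet()) {
            seen.put(consumer, new ArrayList<>());
        }
        // Every group gets the full set of partitions, split among its own members.
        for (List<String> group : members.values()) {
            for (int p = 0; p < partitionCount; p++) {
                seen.get(group.get(p % group.size())).add(p);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> sameGroup = new LinkedHashMap<>();
        Map<String, String> ownGroups = new LinkedHashMap<>();
        for (int i = 1; i <= 4; i++) {
            sameGroup.put("node-" + i, "shared-group");  // hypothetical names
            ownGroups.put("node-" + i, "group-" + i);
        }
        System.out.println(partitionsSeen(sameGroup, 4)); // one partition per node
        System.out.println(partitionsSeen(ownGroups, 4)); // all partitions for every node
    }
}
```

So if the four instances must each read a different partition, the simplest route is to give them the same group.id.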

How kafka partitions behave

Can you explain how Kafka partitions work in this scenario?
Suppose I produce 9 messages (1-9) round robin to 1 topic with 3 partitions.
Does it mean that:
Partition 1 contains: [1,4,7]
Partition 2 contains: [2,5,8]
Partition 3 contains: [3,6,9]
?
Also, how many consumers can get all the data? 3? Why?
Can you explain?
I guess a consumer group can also solve it, but I'm not sure why.
"Can you explain how Kafka partitions work in this scenario?"
Your understanding is correct.
"Also, how many consumers can get all the data? 3? Why?"
Depends on how many consumers you have in your consumer group.
If you only have 1 consumer in a group, it will get all the messages from all partitions.
If you have 2 consumers in a group, each will claim a subset of the partitions, e.g. 1st consumer will get all messages from partitions 1 and 2 and the 2nd consumer will get messages from partition 3.
If you have 3 consumers in a group, each will get one partition assigned.
If you have more than 3 consumers in a group, 3 consumers will get one partition each and the remaining consumers will not get any messages, just act as redundancy in case of failover.
The distribution of messages in the partitions is correct only if you publish messages without keys (and the producer really does spread them round robin). In Kafka it is common to publish messages as (Key, Value) pairs, and if you produce messages this way then the default partitioner will ensure that all messages with the same key are put in the same partition. It does this by applying a hash function to each key and mapping the result to one of the available partitions. In the extreme case where all your messages have the same key, they would all go to the same partition. If your messages all had either the string key "foo" or the string key "bar", then all the messages with key "foo" might go to partition 3 and all the messages with key "bar" might go to partition 1.
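A small sketch of both behaviours, in plain Java without the Kafka client. The keyed method uses String.hashCode() as a stand-in hash (Kafka's default partitioner hashes the serialized key bytes with murmur2, so real partition numbers will differ), and the keyless method is a strict round robin like the one in the question. Partition indexes here are 0-based, so index 0 corresponds to "Partition 1" above. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionerSketch {

    // Keyed messages: the same key always maps to the same partition.
    // String.hashCode() is only a stand-in for Kafka's murmur2 hash.
    static int partitionForKey(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Keyless messages numbered 1..messageCount, dealt out round robin.
    static List<List<Integer>> roundRobin(int messageCount, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int m = 1; m <= messageCount; m++) {
            partitions.get((m - 1) % numPartitions).add(m);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(roundRobin(9, 3)); // prints [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
        System.out.println("foo -> partition index " + partitionForKey("foo", 3));
        System.out.println("bar -> partition index " + partitionForKey("bar", 3));
    }
}
```

Whatever hash is used, the point is the same: the key alone decides the partition, so ordering is preserved per key but the load may be uneven.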
In terms of your question about consumers, you can have an unlimited number of consumers. If each consumer has a unique group.id then they are considered independent and they will each get their own full set of the messages from all partitions.
However, if you have consumers that share the same group.id, then they are said to be in a consumer group and each will get an exclusive and roughly equal subset of the partitions. If you had 3 consumers in the same group they would get 1 partition each. If you added any more than 3 consumers to the same group, then 3 of them would get 1 partition each and all the others would be standby consumers that only become active if one of the 3 active consumers leaves the group.
The distribution of the messages across the partitions is correct in principle. Partitions are the unit of parallelism in Kafka.
You can have 3 consumers which will each handle one partition, but you can also have only 1 consumer which will get the data from all 3 partitions. It depends on the throughput you can have/want for each consumer.
Concerning the consumer groups:
If all your consumers have the same consumer group, the messages will be load balanced over the consumers.
If your consumers have different consumer groups, then each message will be broadcast to all consumer processes.
FYI: message order is only kept within a partition, which is why messages coming from different partitions can arrive out of order.