Kafka Consumers subscribed to different topics in same consumer group - apache-kafka

I am starting out with Kafka and have a question about consumer groups. We have an application where we want different consumers from the same group to subscribe to different topics. The grouping is based on some business criteria. To be specific, consumer 1 and consumer 2 both belong to group A; consumer 1 is subscribed to Topic 1 and consumer 2 to Topic 2, and each topic has 10 partitions. Does this mean that consumer 1 can scale to 10 instances and consumer 2 can also scale to 10, since they are subscribed to different topics? Is this a correct design?

Yes. Within a topic, Kafka tries to assign partitions to the subscribed consumers as evenly as possible. Think of the key as topic:consumer_group_id. another_topic:same_consumer_group_id is a different key, so it doesn't matter that the group ID is shared. The consumers for each topic:consumer_group_id pair can be scaled up to the number of partitions in that topic.
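To make the answer above concrete, here is a minimal sketch in plain Python (no Kafka client involved) of a per-topic split in which only the members subscribed to a topic compete for its partitions. The function `assign_per_topic` and the member names are hypothetical, and the even split is a simplified model, not the actual assignor code.

```python
def assign_per_topic(subscriptions, partition_counts):
    """Split each topic's partitions among the group members subscribed to it.

    subscriptions: {member_id: set of topic names}
    partition_counts: {topic: number of partitions}
    Returns {member_id: [(topic, partition), ...]}.
    """
    assignment = {member: [] for member in subscriptions}
    for topic, count in sorted(partition_counts.items()):
        # Only members that subscribed to this topic compete for its partitions.
        members = sorted(m for m, topics in subscriptions.items() if topic in topics)
        if not members:
            continue
        base, extra = divmod(count, len(members))
        start = 0
        for i, member in enumerate(members):
            size = base + (1 if i < extra else 0)
            assignment[member] += [(topic, p) for p in range(start, start + size)]
            start += size
    return assignment


# Group A: ten instances of "consumer 1" read topic1, ten of "consumer 2" read topic2.
subscriptions = {f"c1-{i:02d}": {"topic1"} for i in range(10)}
subscriptions.update({f"c2-{i:02d}": {"topic2"} for i in range(10)})
result = assign_per_topic(subscriptions, {"topic1": 10, "topic2": 10})
for member, parts in sorted(result.items()):
    print(member, parts)
```

In this model all twenty members share one group ID, yet each ends up with exactly one partition of the topic it subscribed to.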

Related

Will consumers in a group subscribe to one topic each, if there is a single partition per topic?

I'm using Debezium to log changes in my database, and Debezium generates change events in a topic for each table that exists in my database. These change records are consumed to populate another database.
If I restrict each topic to only have 1 partition, and let's say I have 4 consumers running, when the consumers subscribe to topics, will the 4 consumers divide the topics among themselves? (they would distribute partitions among themselves, but here 1 topic = 1 partition)
Would the above setup mean that for each table, the generated events on the topic will always be executed in order because there's at most 1 consumer acting on the topic at the same time?
I'm currently trying to get Kafka to restrict 1 partition per topic, and have 2 consumers. But the 2 consumers seem to pick up different topics every now and then and not be 'sticky' to the topics.
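Yes, if there are 4 topics with 1 partition each and 4 consumers, the consumers can divide the partitions among themselves so that each consumer ends up with 1 topic, provided the assignment strategy balances across topics (for example RoundRobinAssignor). The RangeAssignor works topic by topic, so with single-partition topics it can hand every topic to the same consumer. Either way, each partition is read by at most one consumer of the group at a time, so the events for one table are consumed in order.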
However, to get a "sticky assignment" you would need static membership, that is, a fixed group.instance.id for each consumer (a static group ID alone is not enough). Otherwise, when there are failures and a rebalance is triggered, the consumer can be assigned a different partition (in this case a different topic).

Number of consumers in Kafka consumer-group

If a producer writes to 3 topics with 4 partitions each, should the consumer group contain 4 or 12 consumers?
I want to achieve ideal consumption.
There should be one consumer per partition for ideal consumption. So, for your case, 12 consumers should be ideal.
If you have N partitions, then you can have up to N consumers within the same consumer group, each of which reads from a single partition. When you have fewer consumers than partitions, some of the consumers will read from more than one partition. If you have more consumers than partitions, some of the consumers will be inactive and will receive no messages at all.
You cannot have multiple consumers -within the same consumer group- consuming data from a single partition. Therefore, in order to consume data from the same partition using N consumers, you'd need to create N distinct consumer groups too.
Note that partitioning enhances the parallelism within a Kafka cluster. If you create thousands of consumers to consume data from only one partition, I suspect that you will lose some level of parallelism.
If you have 3 topics with 4 partitions each:
For best utilisation you should have 4 consumers per consumer group (assuming the default per-topic assignment strategy).
Reason: if you have more than 4 consumers, the extra consumers will be left idle, because the 4 partitions of each topic are assigned to 4 consumers, 1 partition per consumer. So in short, more than 4 consumers per consumer group is not required.
If you have fewer consumers, say 2 consumers for 4 partitions, each consumer will consume messages from 2 partitions, which increases its load.
There is no limit on the number of consumer groups that can subscribe to a topic.
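The arithmetic in these answers can be checked with a small sketch. It assumes a single topic consumed by a single group with an even split; `consumer_load` is a hypothetical helper for illustration, not part of any Kafka API.

```python
def consumer_load(num_partitions, num_consumers):
    """Return (partitions held by each active consumer, number of idle consumers)
    for one topic consumed by one group, assuming an even split."""
    if num_partitions < 1 or num_consumers < 1:
        raise ValueError("need at least one partition and one consumer")
    # A partition has at most one reader per group, so extra consumers sit idle.
    active = min(num_partitions, num_consumers)
    base, extra = divmod(num_partitions, active)
    # The first `extra` active consumers carry one partition more than the rest.
    loads = [base + 1 if i < extra else base for i in range(active)]
    return loads, num_consumers - active


for consumers in (2, 3, 4, 6):
    print(consumers, consumer_load(4, consumers))
```

With 4 partitions, 2 consumers hold 2 partitions each, 4 consumers hold 1 each, and with 6 consumers 2 of them stay idle.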

Kafka Consumer distribution not working as expected

I have three topics, each with three partitions, on a Kafka cluster.
So there are 9 partitions in total. When I create 9 consumers, 6 of them sit idle and only three consumers are used.
The expectation is that each consumer picks up one partition, so 9 consumers should pick up messages from 9 partitions.
But what happens is:
one consumer picks up messages from three partitions, one from each topic.
E.g. I have three topics Topic_A, Topic_B and Topic_C with three partitions each, so the partitions are as below:
Topic_A_0, Topic_A_1, Topic_A_2, Topic_B_0, Topic_B_1, Topic_B_2,
Topic_C_0, Topic_C_1, Topic_C_2
When i create 9 consumers,
the distribution works as below:
Consumer1: Topic_A_0,Topic_B_0,Topic_C_0
Consumer2: Topic_A_1,Topic_B_1,Topic_C_1
Consumer3: Topic_A_2,Topic_B_2,Topic_C_2
Consumer4,Consumer5,Consumer6,Consumer7,Consumer8,Consumer9 are idle
It should be
Consumer1: Topic_A_0
Consumer2: Topic_A_1
Consumer3: Topic_A_2
Consumer4: Topic_B_0
Consumer5: Topic_B_1
Consumer6: Topic_B_2
Consumer7: Topic_C_0
Consumer8: Topic_C_1
Consumer9: Topic_C_2
Is there any configuration I need so that all 9 consumers pick up messages from 9 unique partitions?
Make sure all your consumers are subscribing to the same set of topics under the same consumer group ID. For the list of topics, you can pass a predefined list or a regular expression for the consumers to subscribe to. The consumer group ID is set using the group.id property of the consumer.
The default partition assignment strategy (RangeAssignor) assigns partitions topic by topic rather than across topics, so this is the expected behaviour. A similar question here : Kafka Consumers are balanced across topics
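The difference between a per-topic and a cross-topic strategy can be reproduced with a small simulation. This is a simplified model in plain Python, assuming every consumer subscribes to all three topics; `range_assign` and `round_robin_assign` are illustrative stand-ins, not the real assignor implementations.

```python
from itertools import cycle


def range_assign(consumers, topics):
    """Range-style: split each topic independently over the sorted consumers."""
    assignment = {c: [] for c in consumers}
    ordered = sorted(consumers)
    for topic, count in sorted(topics.items()):
        base, extra = divmod(count, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            size = base + (1 if i < extra else 0)
            assignment[consumer] += [f"{topic}_{p}" for p in range(start, start + size)]
            start += size
    return assignment


def round_robin_assign(consumers, topics):
    """Round-robin-style: deal the partitions of all topics out in one pass."""
    assignment = {c: [] for c in consumers}
    all_partitions = [f"{t}_{p}" for t, n in sorted(topics.items()) for p in range(n)]
    for partition, consumer in zip(all_partitions, cycle(sorted(consumers))):
        assignment[consumer].append(partition)
    return assignment


consumers = [f"Consumer{i}" for i in range(1, 10)]
topics = {"Topic_A": 3, "Topic_B": 3, "Topic_C": 3}
by_range = range_assign(consumers, topics)
by_round_robin = round_robin_assign(consumers, topics)
for c in consumers:
    print(c, by_range[c], by_round_robin[c])
```

In this model the range-style split leaves Consumer4 to Consumer9 idle, matching the behaviour in the question, while the round-robin-style split gives each of the 9 consumers one partition. In a real consumer the strategy is selected with the partition.assignment.strategy property, for example org.apache.kafka.clients.consumer.RoundRobinAssignor.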

Kafka Consumer from different group consuming from different partition of Topic

I have a scenario where I have deployed 4 instances of Kafka Consumer on different nodes. My topic has 4 partitions. Now, I want to configure the Consumers in such a way that they all fetch from different partitions of the topic.
I know for a fact that if the Consumers are from the same consumer group, they ensure that the partitions are split equally. But in my case, they are not in the same group.
To achieve what you want, the consumers need to be in the same consumer group. Only then is the "competing consumers" pattern applied: each consumer is assigned 1 of the 4 partitions, so you have 4 consumers, each reading from 1 partition and receiving the messages for that partition.
When the consumers are part of different consumer groups, each consumer is assigned all 4 partitions and receives the messages from all of them, in a publish/subscribe way.
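The two cases can be contrasted with a short sketch. It is a toy model in plain Python, assuming one topic with 4 partitions; `deliver` and the group and consumer names are hypothetical.

```python
def deliver(partitions, consumer_groups):
    """Map each (group, consumer) pair to the partitions it reads.

    partitions: list of partition ids for one topic
    consumer_groups: {group_id: [consumer names]}
    Within a group, partitions are dealt out so each has exactly one reader;
    across groups, every group sees every partition.
    """
    readers = {}
    for group_id, members in consumer_groups.items():
        for i, member in enumerate(sorted(members)):
            # Every len(members)-th partition, starting at offset i.
            readers[(group_id, member)] = partitions[i::len(members)]
    return readers


partitions = [0, 1, 2, 3]
same_group = deliver(partitions, {"g": ["c1", "c2", "c3", "c4"]})
separate_groups = deliver(partitions, {f"g{i}": [f"c{i}"] for i in range(1, 5)})
print(same_group)
print(separate_groups)
```

With one shared group each consumer reads a single partition; with four separate groups every consumer reads all four partitions.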

How kafka partitions behave

Can you explain how kafka partitions works for this scenario
If I produce 9 messages (1-9) round robin to 1 topic with 3 partitions,
does it mean that:
Partition 1 contains: [1,4,7]
Partition 2 contains: [2,5,8]
Partition 3 contains: [3,6,9]
?
Also, how many consumers can get all the data? 3? Why?
Can you explain?
I guess also that consumer group can solve it but not sure why
Can you explain how kafka partitions works for this scenario
Your understanding is correct.
Also, how many consumers can get all the data? 3? Why?
Depends on how many consumers you have in your consumer group.
If you only have 1 consumer in a group, it will get all the messages from all partitions.
If you have 2 consumers in a group, each will claim a subset of the partitions, e.g. 1st consumer will get all messages from partitions 1 and 2 and the 2nd consumer will get messages from partition 3.
If you have 3 consumers in a group, each will get one partition assigned.
If you have more than 3 consumers in a group, 3 consumers will get one partition each and the remaining consumers will not get any messages, just act as redundancy in case of failover.
The distribution of messages in the partitions is correct only if you publish messages without keys and the producer really does spread them round robin (since Kafka 2.4 the default partitioner instead sticks to one partition per batch for keyless messages). In Kafka it is common to publish messages as (Key, Value) pairs, and if you produce messages this way the default partitioner will ensure that all messages with the same key are put in the same partition. It does this by applying a hash function to each key and mapping the result to one of the available partitions. In the extreme case where all your messages have the same key, they would all go to the same partition. If your messages all had either the string key "foo" or the key "bar", then all the messages with key "foo" might go to partition 3 and all the messages with key "bar" might go to partition 1.
In terms of your question about consumers, you can have an unlimited number of consumers. If each consumer has a unique group.id then they are considered independent and they will each get their own full set of the messages from all partitions.
However, if you have consumers that share the same group.id then they are said to be in a consumer group and each will get an exclusive and roughly equal subset of the partitions. If you had 3 consumers in the same group they would get 1 partition each. If you added any more than 3 consumers in the same group, 3 of them would get 1 partition each and all the others would be standby consumers that only become active if one of the 3 active consumers leaves the group.
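Key-based partitioning as described in this answer can be sketched as follows. Note the assumption: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses CRC32 from the Python standard library as a stand-in, so the partition numbers it produces will not match a real cluster. `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from a key. Stand-in hash: CRC32, NOT Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


messages = [("foo", 1), ("bar", 2), ("foo", 3), ("bar", 4), ("foo", 5)]
partitions = {p: [] for p in range(3)}
for key, value in messages:
    # The same key always hashes to the same partition.
    partitions[partition_for(key, 3)].append((key, value))

for p, contents in partitions.items():
    print(p, contents)
```

Whatever hash is used, the property that matters is that it is deterministic: every message with key "foo" lands in one partition, in the order it was produced.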
The distribution of the messages across the partitions is correct in principle. Partitions are the unit of parallelism in Kafka.
You can have 3 consumers which will each handle one partition, but you can also have only 1 consumer which will get the data from the 3 partitions. It depends on the throughput you can have/want for each consumer.
Concerning the consumer groups :
If all your consumers have the same consumer group, the messages will be load balanced over the consumers
If your consumers have different consumer groups, then each message will be broadcast to all consumer processes
FYI : message order is only kept within a partition, which is why messages coming from different partitions can arrive out of order.
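The layout from the question and the ordering remark can be checked with a short simulation. It assumes strict round-robin production and 0-based partition numbers, and it models a single consumer that drains the partitions one after another; a real consumer interleaves fetches in no guaranteed order. `produce_round_robin` is a hypothetical helper.

```python
def produce_round_robin(messages, num_partitions):
    """Append message i to partition i % num_partitions (0-based partitions)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


partitions = produce_round_robin(range(1, 10), 3)
# One consumer draining the partitions one after another.
consumed = [m for partition in partitions for m in partition]
print(partitions)
print(consumed)
```

Each partition's messages stay in order, but the sequence the consumer sees across partitions is not the sequence in which the messages were produced.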