To achieve concurrency in a Kafka listener, how do defining consumer groups using groupId and setting concurrency in @KafkaListener differ? - apache-kafka

If a topic has more than one partition, we can have a consumer group, and the consumers in that group split the partitions they read from between them.
There is one more option where we don't use a consumer group but define concurrency = 2 (for example), so that 2 instances of the consumer run, each reading from a different partition.
How are these two different? Or are they the same under the hood?

Yes, we can treat the concurrency option of @KafkaListener as the number of consumer group members. From Kafka's perspective it is indeed the same as if we just started another instance of our application.
We don't use a consumer group only if we do a manual assignment. Otherwise an auto-generated id for @KafkaListener is used as a groupId. Or an exception is thrown:
Assert.state(hasGroupIdConsumerConfig || StringUtils.hasText(this.containerProperties.getGroupId()),
"No group.id found in consumer config, container properties, or @KafkaListener annotation; "
+ "a group.id is required when group management is used.");

Related

Two Kafka Consumer application having same group.id and same consumer.id

I have 2 different application instances consuming messages from a topic, and both applications have the same values for group.id and consumer.id.
Will a message be read by only one application or by both? Also, if we have the same consumer.id in 2 different applications, will they be considered 2 consumers in one group or only a single consumer in that group?
E.g. App1 instance, group.id = conGrp1, consumer.id = consumer
App2 instance, group.id = conGrp1, consumer.id = consumer1
Do we still have only one group with one consumer, even though 2 different application instances are running?
both applications have the same values for group.id and consumer.id
Then both are part of the same group, and the consumers do not overlap in the data they consume: any given message is seen by only one consumer of the group.
The group id is what determines this behavior. The consumer id is just a friendly name to find in the metrics or the consumer group command, I believe, so you still only have one group. I'm not sure it's a good idea to run separate instances of the application with the same consumer id, even if they are part of the same group.
difference between groupid and consumerid in Kafka consumer

@KafkaListener concurrency multiple topics

I want to create a concurrent @KafkaListener which can handle multiple topics, each with a different number of partitions.
I have noticed that Spring-Kafka only initializes one consumer per partition for the topic with the most partitions.
Example: I have set concurrency to 8. I have a @KafkaListener listening to the following topics. Topic A has the most partitions (5), so Spring-Kafka initializes 5 consumers. I expected Spring-Kafka to initialize 8 consumers, which is the maximum allowed according to my concurrency property.
Topic A has 5 partitions
Topic B has 3 partitions
Topic C has 1
What is the technical reason for not initializing more consumers?
How do I bypass this, such that I can initialize more consumers using the @KafkaListener annotation? (if possible at all)
When a listener is configured to listen to multiple topics, each consumer instance listens on all topics; Spring indeed starts 8 consumers (in this case), but the way those partitions are actually distributed across the consumers is controlled by Kafka's group management:
So you end up with 3 idle consumers in this case.
It might be possible to provide a custom partition.assignment.strategy to do the distribution the way you want, but I've never looked into that.
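To make the idle consumers concrete, here is a rough, dependency-free simulation of the numbers from the question (topics A, B and C with 5, 3 and 1 partitions, 8 consumers). It is only a sketch: the class AssignmentSketch and its methods are hypothetical names, not Kafka or Spring code, and it assumes a simplified range-style strategy that divides each topic separately in consumer order, compared with a round-robin-style strategy that deals out all partitions across all consumers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka: mimics how partitions could be spread.
public class AssignmentSketch {

    // Range-style: every topic is divided on its own, always starting at the first consumer.
    static Map<String, List<String>> rangeAssign(Map<String, Integer> topics, int consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        for (Map.Entry<String, Integer> topic : topics.entrySet()) {
            int perConsumer = topic.getValue() / consumers;
            int extra = topic.getValue() % consumers;
            int next = 0;
            for (int c = 0; c < consumers; c++) {
                int count = perConsumer + (c < extra ? 1 : 0);
                for (int i = 0; i < count; i++) {
                    result.get("consumer-" + c).add(topic.getKey() + "-" + next++);
                }
            }
        }
        return result;
    }

    // Round-robin-style: partitions of all topics are dealt out one at a time.
    static Map<String, List<String>> roundRobinAssign(Map<String, Integer> topics, int consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int turn = 0;
        for (Map.Entry<String, Integer> topic : topics.entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                result.get("consumer-" + (turn++ % consumers)).add(topic.getKey() + "-" + p);
            }
        }
        return result;
    }

    static Map<String, List<String>> emptyAssignment(int consumers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int c = 0; c < consumers; c++) {
            result.put("consumer-" + c, new ArrayList<>());
        }
        return result;
    }

    // Counts consumers that ended up with no partitions at all.
    static long idleCount(Map<String, List<String>> assignment) {
        return assignment.values().stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("A", 5);
        topics.put("B", 3);
        topics.put("C", 1);
        System.out.println("range idle: " + idleCount(rangeAssign(topics, 8)));
        System.out.println("round-robin idle: " + idleCount(roundRobinAssign(topics, 8)));
    }
}
```

Under these assumptions the range-style split leaves 3 of the 8 consumers without partitions, because no single topic has more than 5, while the round-robin-style split gives each of the 8 consumers at least one of the 9 partitions. The real assignors also take member ids and subscriptions into account, so treat this as an illustration only.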
EDIT
I just tested with the RoundRobinAssignor...
spring.kafka.consumer.properties.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
and...

Kafka Consumer default Group Id

I'm working with Apache Kafka and its Java client and I see that messages are load balanced across different Kafka Consumers belonging to the same group (i.e. sharing the same group id).
In my application I need all consumers to read all messages.
So I have several questions:
if I don't set any group id in the Consumer Properties, what group id will the Kafka Consumer be given?
Is there a single default value?
Does the client create a random value each time?
Do I need to create a different id for each consumer to be sure that each one receives all messages?
EDIT:
Thank you for your answers.
You are correct: if one doesn't set the consumer group id, Kafka should complain.
However, I have found out that if the group id is null, the Java client sets it to the empty string "" to avoid problems.
So apparently that is the default value I was looking for.
Surprisingly, all my consumers, even if I don't set their groupIds (so they all have groupId == ""), seem to receive all the messages the producer writes.
I still can't explain this: any suggestions?
if I don't set any group id in the Consumer Properties, what group id will the Kafka Consumer be given?
The Kafka consumer will not have any consumer group. Instead you will get this error: The configured groupId is invalid
Is there a single default value?
You can see the consumer.properties file shipped with Kafka for reference. It sets group.id=test-consumer-group, but note that this is a value in a sample config file rather than a default built into the Java client.
Does the client create a random value each time?
No, groupId seems to be mandatory for the Java client, starting with the Kafka 0.9.0.x consumers. You can refer to this JIRA: https://issues.apache.org/jira/browse/KAFKA-2648
Do I need to create a different id for each consumer to be sure that each one receives all messages?
Yes, if all consumers use the same group id, messages in a topic are distributed among those consumers. In other words, each consumer will get a non-overlapping subset of the messages. Having more consumers in the same group increases the degree of parallelism and the overall throughput of consumption. On the other hand, if each consumer is in its own group, each consumer will get a full copy of all messages.
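The difference between sharing a group id and using one group per consumer can be sketched without a broker. The snippet below is only an illustration under simple assumptions: GroupDeliverySketch and deliver are hypothetical names, and partitions are handed out round-robin inside each group, which is not necessarily what a real assignor would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka: shows which partitions each consumer would read.
public class GroupDeliverySketch {

    // consumerGroups maps consumer name -> group id; returns consumer name -> partitions read.
    static Map<String, List<Integer>> deliver(Map<String, String> consumerGroups, int partitions) {
        // Collect the members of every group, keeping insertion order.
        Map<String, List<String>> members = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : consumerGroups.entrySet()) {
            members.computeIfAbsent(e.getValue(), g -> new ArrayList<>()).add(e.getKey());
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (List<String> group : members.values()) {
            for (String name : group) {
                result.put(name, new ArrayList<>());
            }
            // Every group sees every partition once; members of a group share them.
            for (int p = 0; p < partitions; p++) {
                result.get(group.get(p % group.size())).add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sameGroup = new LinkedHashMap<>();
        sameGroup.put("c1", "g");
        sameGroup.put("c2", "g");
        System.out.println("same group: " + deliver(sameGroup, 4));

        Map<String, String> ownGroups = new LinkedHashMap<>();
        ownGroups.put("c1", "g1");
        ownGroups.put("c2", "g2");
        System.out.println("own groups: " + deliver(ownGroups, 4));
    }
}
```

With one shared group the two consumers read disjoint partitions; with a group each, both read all four partitions, which matches the "full copy" behavior described above.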
Don't want to repeat other answers, but just to point out something: You don't actually need a consumer group to consume all messages. The Kafka Consumer API (assuming we're dealing with the Java one) has both a subscribe() and an assign() method. If you want all consumers to receive all messages without load balancing (which is what essentially consumer groups are for), you can just invoke assign() on all consumers, passing it all the partitions for the topic, optionally followed by seek() to set the offsets; that way your consumers will get all messages.
This way Kafka will not manage partition assignment and will not persist offsets; the consumer is responsible for all that. Depending on your use case, it may be a better approach compared to having a consumer group per consumer.
I had the same problem and took some time to research this issue.
The spring-cloud-stream project checks whether you have set the group id for the consumer. If not, spring-cloud-stream creates a random value as the group id.
Please refer to the method createConsumerEndpoint in the class KafkaMessageChannelBinder.
Check the groupId set on the listener:
@KafkaListener(topics = "${kafka.topic}", groupId = "groupIdName")
Steps:
Go to Kafka folder
Open config folder
Open consumer.properties
Change the group id
group.id=groupIdName
If you don't set group.id, you will get an error when consuming topic data:
org.apache.kafka.common.errors.InvalidGroupIdException: The configured groupId is invalid
22:08:14.132 [testAuto-kafka-consumer-1] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - (Re-)joining group
22:08:14.132 [testAuto-kafka-consumer-1] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Sending JoinGroup ({group_id=,session_timeout=15000,rebalance_timeout=300000,member_id=,protocol_type=consumer,group_protocols=[{protocol_name=range,protocol_metadata=java.nio.HeapByteBuffer[pos=0 lim=18 cap=18]}]}) to coordinator bogon:9092 (id: 2147483647 rack: null)
22:08:14.132 [testAuto-kafka-consumer-1] ERROR org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Attempt to join group failed due to fatal error: The configured groupId is invalid
22:08:14.132 [testAuto-kafka-consumer-1] ERROR org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer - Container exception
According to KIP-289 the default group.id has been "improved" and the default group.id, since kafka clients version 2.2.0, is null.
KIP-289: Improve the default group id behavior in KafkaConsumer.
It seems to me that when using assign you can forgo the group.id, leaving it null, and there will be no offsets available.
Regarding this exception about the missing groupId:
The library says that the default is an empty string, but this did not work for me. What worked for me was a single space between the quotes, i.e. groupId = " " and not groupId = "".
If you are using Node.js like me, you can check whether clientId is empty:
const kafka = new Kafka({ clientId, brokers })
const consumer = kafka.consumer({ groupId: clientId })

Why kafka 0.8.2 say that each partition is consumed by exactly one consumer in a consumer group

In the official Apache Kafka 0.8.2 documentation, section 5.6 Distribution, in the Consumers and Consumer Groups subsection, it says that
The consumers in a group divide up the partitions as fairly as
possible, each partition is consumed by exactly one consumer in a
consumer group.
But I have found that in practice, multiple consumers in a consumer group can consume data from a single partition by sending a FetchRequest for the same topic-partition.
And the Consumer Id Registry subsection that follows says
In addition to the group_id which is shared by all consumers in a
group, each consumer is given a transient, unique consumer_id (of the
form hostname:uuid) for identification purposes. Consumer ids are
registered in the following directory.
/consumers/[group_id]/ids/[consumer_id] --> {"topic1": #streams, ...,
"topicN": #streams} (ephemeral node)
It says there is a unique id for each consumer. However, I could not find such a structure in ZooKeeper.
I do not know when a consumer starts to register. The client library I used is kafka-python 0.9.4.
This may help:
(1) For your second question.
https://github.com/dpkp/kafka-python/issues/472
And issue38
It said "Coordinated Consumer Group support is under development."
(2) For your first question.
It said "This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group." (statement A). This depends on the client implementation and may not hold for some Kafka clients; I only have experience with the Python and C++ ones. If group support is implemented, each message is consumed by exactly one consumer in the group, but how partitions are assigned among the consumers in a group can differ. When there are more partitions than consumers, statement A may be right. But partitions may also be re-assigned when consumers join or leave the existing group. In that case, a partition may be consumed first by consumer A and then by consumer B. In some clients you can choose the assignment algorithm, such as round-robin.

Multiple consumer groups in kafka

I am new to Kafka. My question is how to create multiple consumer groups with multiple consumer instances, and how to assign those consumer instances to consume from a specific broker or partition. For example, I have to implement the setup shown in this example image.
Consumer groups relate to the high level consumer API while the ability to choose broker or partition to consume from relates to the simple consumer API.
The high level API will do rebalancing among consumers in a group automatically for you but it will consume all partitions for a given topic.
If you want to consume only from specific partitions within a topic, you need to use the simple consumer API and you'll have to deal with partition assignment yourself. There is an example of how to do this in the Kafka wiki.