Kafka Consumer partition mapping - apache-kafka

I have 100 consumers in the same group listening to the same topic, which has 100 partitions. So, per the documentation, each consumer should listen to exactly one partition, since there are 100 consumers and 100 partitions. I produce messages to Kafka using a key, so messages with the same key should go to the same partition and should always be consumed by the same consumer in the group. But in my case, multiple messages with the same key are consumed by multiple consumers, seemingly at random. Is there any way to ensure that all messages from a partition are consumed by only one specific consumer in the group? I do not want to explicitly assign partitions to consumers.

Verify that your message partitioning is working as expected on the producer side.
If you have 100 consumers using the same consumer group id for a 100-partition topic, each consumer will get exactly 1 partition to consume from.
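A quick way to sanity-check the producer side is to remember that the default partitioner maps a key deterministically to a partition. Here is a minimal sketch of that idea; the hash below is a simple stand-in (Kafka's Java producer actually uses murmur2 over the serialized key bytes), so the exact partition numbers will differ, but the determinism is the point:

```python
def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping, as the default partitioner does.

    Illustrative stand-in hash only; Kafka really uses murmur2 on the key
    bytes. What matters is that the same key bytes always map to the same
    partition for a fixed partition count.
    """
    h = 0
    for b in key.encode("utf-8"):
        h = (h * 31 + b) & 0x7FFFFFFF  # simple deterministic hash, kept non-negative
    return h % num_partitions

# The same key always lands on the same partition:
assert partition_for("order-42", 100) == partition_for("order-42", 100)
```

If consumers in the same group still see one key on several partitions, check that all producers serialize the key identically and that the topic's partition count has not changed (resizing the topic remaps keys).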

Related

Kafka consumer in group skips the partitions

I have a single consumer which consumes a topic. The topic has 6 partitions, and the single consumer is assigned to the group.
I poll like below:
Consumer.poll(10000)
I exit the consumer fetch loop when no records are returned.
From the documentation I believe poll returns empty when there are no records to consume, and that a duration of 10000 ms is enough to rebalance and fetch records.
Most of the time poll consumes records from all partitions, but sometimes it fetches records from 3 partitions and returns empty results without consuming the other 3 partitions.
BTW, I use the 2.0.1 Kafka client, and the Kafka server version is 2.2.0 (Scala 2.11).
Does anyone have an idea why my consumer skips the other partitions and returns empty records? What should I do to consume all partitions?
The max.poll.records parameter defaults to 500. So it is sometimes not possible to get all messages from all partitions in the topic with a single poll().
max.poll.records: The maximum number of records returned in a single call to poll().
By the way, having just one consumer in the group is not an appropriate way to consume a topic with multiple partitions. As a best practice, the number of consumers in the consumer group should equal the number of partitions in the subscribed topic (Kafka assigns partitions to consumers evenly by default). Otherwise you cannot scale the load horizontally, and having multiple partitions is not very meaningful in that case.
Kafka always assigns partitions to consumers. It is not possible to have a partition that is unassigned while the topic is subscribed.
But in your case, because you exit the consumer, it takes some time (session.timeout.ms) for Kafka to consider that consumer dead. If you start the consumer again without waiting for session.timeout.ms to pass, Kafka sees two active consumers in the group and assigns partitions evenly between them (e.g., partitions 0, 1, 2 to consumer-1 and partitions 3, 4, 5 to consumer-2). But once Kafka realizes that one of the consumers is dead, a rebalance starts in the consumer group and all partitions are assigned to the one remaining active consumer.
session.timeout.ms: The timeout used to detect client failures when using Kafka's group management facility. The client sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this client from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.
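Both settings above are ordinary consumer configuration properties. A sketch of how they might appear in a consumer config (the property names are from the Kafka docs; the values here are illustrative examples, not recommendations):

```python
# Illustrative consumer settings; names as documented by Kafka, values are examples.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-group",
    "max.poll.records": 500,      # max records returned by a single poll()
    "session.timeout.ms": 10000,  # how long until a silent consumer is declared dead;
                                  # must lie between group.min.session.timeout.ms and
                                  # group.max.session.timeout.ms on the broker
}
```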
You can check current partition assignment for your consumer-group with this cli command in broker side:
./kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group yourConsumerGroup

How can I control requests/messages sent by a Kafka cluster?

Suppose I have 3 Kafka brokers, a ZooKeeper instance, 50 producers, 50 consumers, and 1 topic (testTopic1).
All the consumers are subscribed to testTopic1. Now I send 50 messages at the same time with the 50 producers to the same topic (testTopic1). I want the Kafka cluster to deliver no more than 40 messages at the same time to consumers; the remaining 10 should stay queued or be dropped.
Maybe this is a form of load balancing in Kafka.
I do not understand how to do this. I'm new to Kafka, please help.
Kafka brokers are dumb. They can't limit or remove messages published to Kafka.
If all 50 consumers are part of the same consumer group, then whether those 50 messages are received by 50 different consumers at the same time depends on the keys. If multiple messages share a key, they are all consumed by a single consumer, one by one. If all 50 messages have distinct keys, they may or may not (depending on the hash of each key) end up with the same or different consumers.
Can you explain your use case more, for better understanding?
A Kafka broker cannot drop messages randomly, but you can implement logic within the consumer to drop messages while processing.
If you have a single topic with a single partition, one of the consumers belonging to the same consumer group will process all your messages, since a partition guarantees processing order on the consumer end.
If you have 10 consumer groups, each with 5 consumers, and a single partition for the topic, then one consumer from each group (10 consumers in total) processes each message from the topic. If a consumer from consumer-group-1 fails to process a message, another consumer from the same consumer group will process it.
If you have a requirement to randomly drop 1 out of 10 messages while processing, you can achieve it with logic on the consumer end. But once the offsets are committed, the broker considers all that data processed (if the system is configured to maintain offsets on the broker side).
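The consumer-side dropping described above can be as simple as a filter in the processing loop. A minimal sketch, where handle and should_drop are hypothetical names and the "processing" is just collecting messages:

```python
import random

def should_drop(drop_rate=0.1, rng=random.random):
    """Decide whether to skip a message; its offset is still committed."""
    return rng() < drop_rate

def handle(messages, drop_rate=0.1, rng=random.random):
    """Process messages, dropping roughly drop_rate of them in the consumer.

    The broker never sees the drop: from its point of view the offsets
    advance normally, so the data counts as consumed either way.
    """
    processed = []
    for msg in messages:
        if should_drop(drop_rate, rng):
            continue            # dropped in the consumer, not by the broker
        processed.append(msg)   # stand-in for real processing
    return processed

# With a deterministic "rng" that always drops, nothing is processed:
assert handle([1, 2, 3], rng=lambda: 0.0) == []
# With one that never drops, everything is:
assert handle([1, 2, 3], rng=lambda: 1.0) == [1, 2, 3]
```

Injecting the random source (rng) keeps the drop decision testable; in production you would just use the default.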

Can I have all the consumers of a group consume message from all the partitions of a kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and I have 20 consumers of Consumer Group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Please note all messages are independent and can be processed independently.
I looked at the consumer configuration partition.assignment.strategy, but I am not sure whether I can achieve dynamic assignment of consumers to partitions depending on message availability.
One partition is assigned to exactly one consumer in the group. In your case, only 4 of your 20 consumers are actually doing work. You have to increase the number of partitions if you want more consumers to get assignments.
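The assignment rule can be sketched directly: with any group assignment strategy, only min(partitions, consumers) members get work. An illustrative round-robin assignment (Kafka's actual strategies, range, round-robin, and sticky, differ in detail, but all give each partition to exactly one group member):

```python
def assign(partitions, consumers):
    """Round-robin partitions over consumers; extra consumers stay idle.

    Illustrative only -- not Kafka's exact algorithm, but it shows why
    20 consumers on a 4-partition topic leaves 16 of them with nothing.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign(range(4), [f"c{i}" for i in range(20)])
busy = [c for c, ps in a.items() if ps]
assert len(busy) == 4          # only 4 of the 20 consumers get a partition
assert a["c0"] == [0]          # each busy consumer owns exactly one partition
```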

What Happens when there is only one partition in Kafka topic and multiple consumers?

I have a Kafka topic with only one partition, and I am not sure what will happen in the following cases. How will messages be delivered to consumers?
If all consumers are in same group
If all consumers are in different group
I am not sure if consumers will receive unique messages or duplicate ones.
Each consumer subscribes to one or more partitions of a topic, and each consumer belongs to a consumer group. Below are the two scenarios:
When all consumers belong to the same group: each partition is assigned to exactly one consumer. If there is only one partition, only one consumer will get the messages, while the other consumers will be idle.
When all consumers belong to different consumer groups: each consumer will get the messages from all partitions, since partition assignment is tracked per consumer group.
It depends on the consumer groups. Consumers within the same consumer group don't read the data again from the same partitions once the read offsets have been committed.
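The two scenarios above can be simulated in a few lines. This is a toy model, not the real group protocol: with one partition, each group gets its own full copy of the stream, and within a group only the single assigned member receives it:

```python
def deliver(messages, groups):
    """Toy model of single-partition delivery across consumer groups.

    groups maps group id -> list of member names. Each group receives its
    own full copy of the messages; within a group, only the one member
    assigned the partition actually gets them.
    """
    delivered = {}
    for group, members in groups.items():
        assignee = members[0]  # one partition -> one assignee per group (arbitrary pick here)
        delivered[group] = {m: (list(messages) if m == assignee else []) for m in members}
    return delivered

out = deliver(["m1", "m2"], {"g1": ["a", "b"], "g2": ["c"]})
assert out["g1"]["a"] == ["m1", "m2"] and out["g1"]["b"] == []  # same group: one reader, one idle
assert out["g2"]["c"] == ["m1", "m2"]                           # different group: full copy
```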

kafka consumer reads the same message

I have a single Topic with 5 partitions.
I have 5 threads, each creating a Consumer
All consumers use the same consumer group via group.id.
I also gave each consumer a different, unique client.id.
I see that 2 consumers are reading the same message to process.
Shouldn't Kafka prevent this?
How do I troubleshoot it?
Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is if a consumer crashes before it is able to commit its offsets. In that case, the consumer that gets newly assigned the partition will re-consume from the last committed offset.
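That crash-before-commit case is the classic at-least-once duplicate. A toy replay of it, where the processing and committing are plain Python stand-ins for the real consumer calls:

```python
def consume(log, committed, crash_before_commit_at=None):
    """Process records from offset `committed` onward, committing after each.

    Returns (processed, new_committed). If the consumer "crashes" right
    after processing a record but before committing it, a restart resumes
    from the last committed offset and processes that record again.
    """
    processed = []
    for offset in range(committed, len(log)):
        processed.append(log[offset])        # process the record
        if offset == crash_before_commit_at:
            return processed, committed      # crashed before this commit landed
        committed = offset + 1               # commit advances past this record
    return processed, committed

log = ["a", "b", "c"]
first, committed = consume(log, 0, crash_before_commit_at=1)  # crash after processing "b"
second, committed = consume(log, committed)                   # restart from last commit
assert first == ["a", "b"] and second == ["b", "c"]           # "b" is processed twice
```

This is exactly the duplicate the two-consumers question is likely seeing if rebalances or crashes happen between commits; the kafka-consumer-groups tool mentioned below helps confirm which consumer owns which partition.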
You can use the consumer group tool kafka-consumer-groups that comes with Kafka to check the partitions assigned to each consumer in your group.