I have a single consumer that consumes a topic with 6 partitions. It is the only consumer assigned to the group.
I poll in a loop with a timeout of 10000 ms.
I exit the consumer fetch loop when no records are returned.
From the documentation, I believe poll returns empty only when there are no records to consume, and a duration of 10000 should be enough to rebalance and fetch records.
Most of the time poll consumes records from all partitions, but sometimes it fetches records from 3 partitions and then returns empty without consuming the other 3 partitions.
BTW, I use the 2.0.1 Kafka client, and the Kafka server is 2.11-2.2.0 (that is, Kafka 2.2.0 built for Scala 2.11).
Does anyone know why my consumer skips the other partitions and returns empty records? What should I do to consume all partitions?
The max.poll.records parameter defaults to 500, so a single poll() may not return messages from every partition of the topic.
max.poll.records: The maximum number of records returned in a single
call to poll().
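To make the effect of the cap concrete, here is a minimal sketch using a toy in-memory consumer. FakeConsumer, drain and max_empty_polls are hypothetical names made up for this illustration; this is not the Kafka client API, and a real consumer does not necessarily drain partitions in order. It only shows that one capped poll can cover a subset of the partitions, and that stopping on the first empty poll is more fragile than tolerating a few.

```python
from collections import deque


class FakeConsumer:
    """Toy in-memory stand-in for a consumer; NOT the Kafka client API."""

    def __init__(self, partitions, max_poll_records=500):
        # partition id -> queue of pending records
        self.queues = {p: deque(recs) for p, recs in partitions.items()}
        self.max_poll_records = max_poll_records

    def poll(self):
        """Return at most max_poll_records records, taking partitions in order."""
        batch = []
        for p, q in self.queues.items():
            while q and len(batch) < self.max_poll_records:
                batch.append((p, q.popleft()))
        return batch


def drain(consumer, max_empty_polls=3):
    """Keep polling until several consecutive polls come back empty."""
    seen, empty = [], 0
    while empty < max_empty_polls:
        batch = consumer.poll()
        if batch:
            seen.extend(batch)
            empty = 0
        else:
            empty += 1
    return seen


# 6 partitions with 200 records each, and a cap of 500 records per poll
consumer = FakeConsumer({p: range(200) for p in range(6)}, max_poll_records=500)
first = consumer.poll()
partitions_in_first_poll = sorted({p for p, _ in first})
rest = drain(consumer)
print(len(first), partitions_in_first_poll, len(rest))
```

In this toy setup the first poll is filled entirely by the first three partitions, and the remaining records only arrive on later polls.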
By the way, a single consumer in the group is not the ideal way to consume a partitioned topic. As a best practice, the number of consumers in the consumer group should equal the number of partitions in the subscribed topic (Kafka assigns partitions to consumers evenly by default). Otherwise you cannot scale the load horizontally, and having partitions brings little benefit.
Kafka always assigns every partition of a subscribed topic to a consumer in the group. A partition cannot be left unassigned while the group has an active consumer.
In your case, though, because you exit the consumer, it takes some time (session.timeout.ms) for Kafka to consider that consumer dead. If you start the consumer again before session.timeout.ms has elapsed, Kafka sees two active consumers in the consumer group and splits the partitions evenly between them (for example, partitions 0, 1, 2 to consumer-1 and partitions 3, 4, 5 to consumer-2). Once Kafka detects that one of the consumers is dead, a rebalance starts in the consumer group and all partitions are assigned to the one active consumer.
session.timeout.ms: The timeout used to detect client failures when
using Kafka's group management facility. The client sends periodic
heartbeats to indicate its liveness to the broker. If no heartbeats
are received by the broker before the expiration of this session
timeout, then the broker will remove this client from the group and
initiate a rebalance. Note that the value must be in the allowable
range as configured in the broker configuration by
group.min.session.timeout.ms and group.max.session.timeout.ms
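The overlap scenario described above can be sketched with a simplified range-style assignment. range_assign and the member names are hypothetical and written only for this illustration; the real assignor runs inside the client and group coordinator and works on member ids, but the arithmetic for a single topic is roughly the same.

```python
def range_assign(num_partitions, members):
    """Split partitions 0..n-1 into contiguous ranges, one per sorted member."""
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # the first `extra` members get one additional partition
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# The old instance has not timed out yet, so the group briefly has two members.
during_overlap = range_assign(6, ["consumer-1-old", "consumer-1-new"])
# After session.timeout.ms the old member is removed and the group rebalances.
after_timeout = range_assign(6, ["consumer-1-new"])

print(during_overlap)
print(after_timeout)
```

During the overlap the restarted consumer only owns half of the partitions, which matches the symptom of receiving records from 3 of the 6 partitions.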
You can check the current partition assignment for your consumer group with this CLI command on the broker side:
./kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group yourConsumerGroup
I'm trying to wrap my head around Kafka, and the thing that confuses me is partitions. In most of the examples I have seen, the consumers/producers seem to have implicit knowledge of the partitions (which partition to write messages to, which partition to read messages from). Is this correct? I initially thought that partitions are internal to the system and that consumers/producers don't need to know partition information. If they do need to know it, aren't we exposing the inner structure of the topic to the outside world to a certain extent?
In Kafka, every partition of a topic is replicated on a set of brokers, with at most one leader broker per partition. Within a consumer group you should not have more consumers than partitions, because the extra consumers would be inactive. A single consumer can read multiple partitions, but a single partition cannot be read by multiple consumers of the same group. So the number of partitions must be chosen according to the throughput you expect. The number of partitions of a topic can be increased, but never decreased. When consumers read a partition, they actually connect to its leader broker to consume messages.
The partition leader can change, in which case the consumer gets an error and has to request fresh metadata from the cluster (any broker can answer a metadata request) to learn the new partition leader. At consumer startup, partitions are assigned according to the Kafka parameter partition.assignment.strategy. Of course, if consumers in the same consumer group start at different times, there will be a partition rebalance.
So, in the end, a client does need quite a lot of information about the Kafka cluster structure.
I have a use case with 2 consumers in different consumer groups (cg1 and cg2) subscribed to the same topic (Topic A), which has 4 partitions.
What happens if both consumers are reading from the same partition, one of them fails, and the other one commits its offset?
In Kafka, offsets are managed per consumer group and per partition.
If two consumer groups read the same topic, and even the same partition, a commit from one consumer group has no impact on the other consumer group. The consumer groups are completely decoupled.
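As a rough mental model of why the groups do not interfere, committed offsets can be pictured as a map keyed by (group, topic, partition). The commit and position helpers below are hypothetical and only illustrate the keying. Falling back to offset 0 for a group with no commit is an assumption that mimics auto.offset.reset=earliest.

```python
# (group, topic, partition) -> next offset that group will read
committed = {}


def commit(group, topic, partition, offset):
    """Record the next offset to read for this group on this partition."""
    committed[(group, topic, partition)] = offset


def position(group, topic, partition):
    """Where the group would resume; 0 if it has never committed (assumption)."""
    return committed.get((group, topic, partition), 0)


# cg2 commits on partition 0 of Topic A; cg1 fails before committing anything
commit("cg2", "TopicA", 0, 42)

cg1_pos = position("cg1", "TopicA", 0)
cg2_pos = position("cg2", "TopicA", 0)
print(cg1_pos, cg2_pos)
```

Because the group is part of the key, the commit from cg2 leaves the position of cg1 untouched.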
Within a consumer group, each topic partition is read by at most one consumer. A single consumer can, however, read from several partitions of a topic.
For example, partition 0 of the topic is read by only one consumer of Consumer Group 1 at a time, but that consumer may own other partitions as well.
In old Kafka versions, offset management was done by ZooKeeper.
__consumer_offsets: Every consumer group maintains its offset per topic partitions. Since v0.9 the information of committed offsets for every consumer group is stored in this internal topic (prior to v0.9 this information was stored on Zookeeper).
When the offset manager receives an OffsetCommitRequest, it appends the request to a special compacted Kafka topic named __consumer_offsets. Finally, the offset manager will send a successful offset commit response to the consumer, only when all the replicas of the offsets topic receive the offsets.
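The idea of a compacted offsets topic can be sketched as follows. This is only an illustration of compaction semantics (the latest value per key wins). The compact function and the tuple layout are assumptions made for the example, not the on-disk format of __consumer_offsets.

```python
def compact(log):
    """Keep only the most recent value for each key, as compaction eventually does."""
    latest = {}
    for key, value in log:
        latest[key] = value  # later entries overwrite earlier ones
    return latest


# Each commit is appended as (key, committed offset); key = (group, topic, partition)
offsets_log = [
    (("cg1", "TopicA", 0), 5),
    (("cg2", "TopicA", 0), 3),
    (("cg1", "TopicA", 0), 9),
]

state = compact(offsets_log)
print(state)
```

After compaction only the last commit of each group for a partition remains, which is all a restarting consumer needs in order to resume.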
Two consumers from two different consumer groups (cg1 and cg2) can read data from the same topic simultaneously.
Before Kafka 0.9: offset management is taken care of by ZooKeeper.
From Kafka 0.9 onward: the offsets of each consumer group are stored in the __consumer_offsets topic.
Offsets are used to keep track of how many records each consumer has consumed. Let's say consumer-1 has consumed 10 records and consumer-2 has consumed 20 records, and consumer-1 suddenly dies. Provided its offset was committed, when consumer-1 comes back up it will resume reading from the 11th record onward.
Let's say we have two consumers for a topic with one partition. At first, the first consumer reads messages from the topic and the second remains idle. If the first fails, the second takes over and starts consuming the messages.
When the first comes back to life, will it start consuming messages again and make the second idle?
How can this be achieved?
Consumers are a part of a consumer group, defined by the consumer's group.id. For n partitions, the max number of active consumers in a consumer group is n. You can have more, but they will be idle.
For example, imagine a topic with 6 partitions. If you have 6
consumers in a consumer group, each consumer will read from 1
partition. If you have 12, six of the consumers will remain idle while
the other six each consume from 1 partition. If you have 3 consumers,
each consumer will read from 2 partitions.
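The three cases in the example above can be checked with a small round-robin sketch. round_robin_assign, idle and the consumer names are hypothetical helpers for illustration only. Kafka's actual assignors differ in detail, but for a single topic they spread partitions just as evenly.

```python
def round_robin_assign(num_partitions, consumers):
    """Deal partitions out to consumers one at a time, like dealing cards."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def idle(assignment):
    """Consumers that ended up with no partition at all."""
    return [c for c, parts in assignment.items() if not parts]


results = {}
for n in (3, 6, 12):
    consumers = [f"c{i}" for i in range(n)]
    results[n] = round_robin_assign(6, consumers)
    print(n, "consumers:", len(idle(results[n])), "idle")
```

With 12 consumers and 6 partitions, six consumers receive nothing, which is the "idle" case described above.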
In your case, for a topic with 1 partition, only 1 consumer per consumer group can be actively consuming at a time. If you have 2 consumers in your consumer group, then one of them, say consumer-1, will consume all messages from the single partition. If that consumer fails, consumer-2 will start consuming from the last committed offset of consumer-1. If consumer-1 comes back online, it rejoins the group and a rebalance takes place; only one of the two consumers ends up owning the partition and the other stays idle. Kafka has no notion of a preferred consumer: all consumers are treated equally.
I have a single topic with 5 partitions.
I have 5 threads, each creating a consumer.
All consumers are in the same consumer group (same group.id).
I also gave each consumer a different, unique client.id.
I see that 2 consumers are reading and processing the same message.
Should Kafka handle this?
How do I troubleshoot it?
Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is if 1 consumer crashes before it's able to commit its offset. In that case, the new consumer that gets assigned the partition will re-consume from the last committed offset.
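The crash-before-commit case can be illustrated with a small simulation. The consume function, the commit_every interval and the record names are all hypothetical. The point is only that whatever was processed after the last commit gets processed again by the consumer that takes over.

```python
records = [f"msg-{i}" for i in range(10)]  # the log of a single partition


def consume(start, count, commit_every):
    """Process up to `count` records from `start`; return (processed, committed)."""
    processed, committed = [], start
    for offset in range(start, min(start + count, len(records))):
        processed.append(records[offset])
        if (offset + 1 - start) % commit_every == 0:
            committed = offset + 1  # committed offset = next record to read
    return processed, committed


# Consumer A processes 7 records, commits only every 5, then crashes.
a_processed, committed = consume(0, 7, commit_every=5)
# Consumer B is assigned the partition and resumes at the last committed offset.
b_processed, _ = consume(committed, 10, commit_every=5)

duplicates = sorted(set(a_processed) & set(b_processed))
print(committed, duplicates)
```

Here the two records processed after the last commit are seen by both consumers, so processing should be idempotent if such duplicates matter.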
You can use the consumer group tool kafka-consumer-groups that comes with Kafka to check the partitions assigned to each consumer in your group.
I have 100 consumers in the same group listening to the same topic, which has 100 partitions. According to the documentation, each consumer should therefore listen to exactly one partition. I produce messages to Kafka using a key, so messages with the same key should go to the same partition and should always be consumed by the same consumer in the group. But in my case, messages with the same key are consumed by multiple consumers, seemingly at random. Is there any way to make sure that all messages from a partition are consumed by only one specific consumer in the group? I do not want to assign partitions to consumers explicitly.
Verify that your message partitioning is working as expected on the producer side.
If you have 100 consumers using the same consumer group id for a 100-partition topic, each consumer will get exactly 1 partition to consume from.
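One way to reason about the producer-side check is to look at how a key maps to a partition. The sketch below uses CRC32 as a stand-in hash, which is an assumption made to keep the example self-contained: Kafka's default partitioner hashes the serialized key with murmur2, so the actual partition numbers will differ. The property being illustrated is the same, though: an identical key with an unchanged partition count always yields the same partition.

```python
import zlib


def partition_for(key, num_partitions):
    """Stand-in partitioner: CRC32 here, whereas Kafka's default uses murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["order-1", "order-2", "order-1", "order-3", "order-1"]
placements = [(k, partition_for(k, 100)) for k in keys]
partitions_for_order_1 = {p for k, p in placements if k == "order-1"}

print(placements)
print(len(partitions_for_order_1))
```

If the same key shows up in different partitions in practice, likely causes include a custom partitioner, producers that set the partition explicitly, differently serialized keys, or a change in the partition count. Note also that which consumer owns a given partition can change after a rebalance.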