Kafka consumer is taking time to recognize new partition

I was running a test where a Kafka consumer was reading data from multiple partitions of a topic. While the process was running, I added more partitions. It took around 5 minutes for the consumer thread to read data from the new partitions. I found the configuration "topic.metadata.refresh.interval.ms", but it applies only to the producer. Is there a similar config for the consumer?

When more partitions are added to an existing topic, a rebalance is initiated.
Every consumer in a consumer group is assigned one or more topic partitions exclusively, and a rebalance is the reassignment of partition ownership among consumers.
A rebalance happens when:
a consumer JOINS the group
a consumer SHUTS DOWN cleanly
a consumer is considered DEAD by the group coordinator. This may happen after a crash, or when the consumer is busy with long-running processing and therefore sends no heartbeats to the group coordinator within the configured session interval
new partitions are added
Two parameters can be tuned to reduce the time a rebalance takes:
request.timeout.ms
max.poll.interval.ms
More detailed information is available here:
https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2

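I changed the "metadata.max.age.ms" parameter value so that the consumer refreshes its metadata more often. Its default is 300000 ms (5 minutes), which matches the roughly 5-minute delay described in the question. https://kafka.apache.org/documentation/#consumerconfigs_metadata.max.age.ms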
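A minimal sketch of where this setting goes, assuming the standard Java client, which accepts its configuration as a java.util.Properties object. The 30000 ms value, the broker address and the group name are arbitrary examples, not recommendations; a lower value means more frequent metadata requests to the brokers. A real consumer also needs key.deserializer and value.deserializer, omitted here so the snippet compiles with the JDK alone; the resulting Properties would be passed to new KafkaConsumer<>(props).

```java
import java.util.Properties;

class MetadataRefreshConfig {
    // Default value of metadata.max.age.ms in the consumer: 300000 ms (5 minutes).
    static final long DEFAULT_METADATA_MAX_AGE_MS = 300_000L;

    // Builds consumer properties that force a metadata refresh at least every
    // maxAgeMs milliseconds, so newly added partitions are noticed sooner.
    static Properties consumerProps(String bootstrapServers, String groupId, long maxAgeMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("metadata.max.age.ms", Long.toString(maxAgeMs));
        return props;
    }
}
```

For example, consumerProps("localhost:9092", "test-group", 30_000L) would let the consumer notice new partitions within about 30 seconds instead of up to 5 minutes (plus the time the resulting rebalance takes).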

Related

All paused partitions assigned to a consumer are revoked when another consumer joins the same group

I have a consumer subscribed to a topic, polling 3 partitions. Because of my specific use case, this consumer invokes the pause(Collection<TopicPartition>) method, so all partitions are paused. It still continues polling in order to send heartbeats. Then a second consumer joins the same group and the coordinator rebalances the partitions. I expected the 3 partitions to be balanced between both consumers; however, all 3 partitions are revoked from the first consumer and assigned to the second one.
I could verify this in my first consumer at rebalance time (with all partitions paused): ConsumerRebalanceListener.onPartitionsRevoked is invoked, but ConsumerRebalanceListener.onPartitionsAssigned never is.
Why does the coordinator revoke all partitions from the first consumer? Is it related to the paused partitions? Has anyone experienced the same behavior when rebalancing paused partitions?

Does Kafka guard against consumers committing after a network partition?

Suppose:
1. A Kafka consumer reads a message M from its assigned partition P.
2. It gets network-partitioned away from the broker.
3. Kafka detects this and reassigns P to a consumer on another machine.
4. The network partition is healed, and the first consumer tries to commit the offset for message M.
Will an exception be thrown in step 4? Is there a check that detects that the first consumer is no longer assigned partition P and therefore shouldn't commit offsets for it?
In this case, the first consumer can no longer commit offsets. Its assigned partition will be revoked, and when it rejoins the consumer group, a rebalance will be triggered. So the answer is yes.
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
The offsets for a given consumer group are maintained by a specific broker called the group coordinator, i.e., a consumer needs to issue its offset commit and fetch requests to this specific broker.
Since a single group coordinator is responsible, it knows which consumers have a partition assigned and which consumers no longer do.
When they rejoin, they are treated the same way as a brand-new consumer joining the consumer group. They will get one or more partitions assigned by rebalancing (which may be completely different from the partitions they were reading last time), and the __consumer_offsets topic will tell them where to start reading.
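To illustrate why the late commit in step 4 is rejected, here is a toy model, not Kafka's actual broker code: the coordinator bumps a generation number on every rebalance and refuses commits stamped with an older generation or coming from a member that no longer owns the partition. ToyGroupCoordinator and its methods are hypothetical names. In the real Java client, a commitSync() for partitions no longer assigned to the consumer fails with a CommitFailedException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy coordinator that fences stale offset commits by generation and ownership.
class ToyGroupCoordinator {
    private int generation = 0;
    private final Map<Integer, String> owner = new HashMap<>();   // partition -> memberId
    private final Map<Integer, Long> committed = new HashMap<>(); // partition -> committed offset

    // A rebalance starts a new generation and records the new owner of each partition.
    int rebalance(Map<Integer, String> newAssignment) {
        generation++;
        owner.clear();
        owner.putAll(newAssignment);
        return generation;
    }

    // Accepts the commit only if the caller's generation is current and it owns the partition.
    boolean commit(String memberId, int memberGeneration, int partition, long offset) {
        if (memberGeneration != generation || !memberId.equals(owner.get(partition))) {
            return false; // stale member: the real Java client would surface CommitFailedException
        }
        committed.put(partition, offset);
        return true;
    }

    Long committedOffset(int partition) {
        return committed.get(partition);
    }
}
```

In this model, the first consumer still holds generation 1 after the partition moved in generation 2, so its commit is refused while the new owner's commit succeeds.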

Kafka behavior during partition re-balancing

Given the following scenario: there is a Kafka (2.1.1) topic with 2 partitions and one consumer. A producer sends a message with keyX to Kafka, which ends up on partition 2. The consumer starts processing this message. At the same time, a new consumer starts up and Kafka rebalances the topic. Consumer 1 is now responsible only for partition 1, and consumer 2 is responsible for partition 2. The producer sends a message with the same keyX again; this time consumer 2 processes the message.
Consumer 2 might be processing the message while consumer 1 has not finished yet.
My question is whether this is a realistic scenario, since it might be a problem for me if different consumers processed messages with the same key at the same time.
Any thought on this is welcome, thanks a lot!
Yes, it's a realistic scenario. Nevertheless, during a rebalance consumer 1 will close all of its existing connections. In your case, consumer 1 will close its connections to partitions 1 and 2, so it may not have committed the offset of the message it was processing. This depends on whether you have configured your consumer with enable.auto.commit set to true: with this property set to true, the consumer periodically commits its current offset, with the period defined by auto.commit.interval.ms.
You can also be notified when a rebalance occurs by using a ConsumerRebalanceListener, which tells you when partitions are revoked or assigned.
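The listener idea can be sketched without a broker. The toy below assumes a hypothetical RevokeListener interface whose two callbacks mirror the names of ConsumerRebalanceListener.onPartitionsRevoked and onPartitionsAssigned (the real interface receives Collection<TopicPartition>; plain partition numbers are used here). Committing the next offset to read in the revoke callback lets the new owner resume right after the last processed record instead of from an older auto-committed position. This narrows the reprocessing window; it only avoids overlapping work if in-flight processing has finished before the callback returns.

```java
import java.util.*;

// Hypothetical stand-in for ConsumerRebalanceListener, using partition numbers.
interface RevokeListener {
    void onPartitionsRevoked(Collection<Integer> partitions);
    void onPartitionsAssigned(Collection<Integer> partitions);
}

// Toy consumer that tracks processed offsets and commits them when partitions are revoked.
class CommitOnRevokeConsumer implements RevokeListener {
    final Map<Integer, Long> processed = new HashMap<>(); // partition -> next offset to read
    final Map<Integer, Long> committedStore;              // shared "committed offsets" store
    final Set<Integer> owned = new HashSet<>();

    CommitOnRevokeConsumer(Map<Integer, Long> committedStore) {
        this.committedStore = committedStore;
    }

    void process(int partition, long offset) {
        // Kafka convention: the committed offset is the NEXT offset to read.
        processed.put(partition, offset + 1);
    }

    @Override
    public void onPartitionsRevoked(Collection<Integer> partitions) {
        for (int p : partitions) {
            Long next = processed.remove(p);
            if (next != null) {
                committedStore.put(p, next); // commit before losing ownership
            }
            owned.remove(p);
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<Integer> partitions) {
        owned.addAll(partitions);
    }

    // Where this consumer would start reading an assigned partition.
    long startOffset(int partition) {
        return committedStore.getOrDefault(partition, 0L);
    }
}
```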

Kafka multiple consumer

When multiple consumers read from a topic with a single partition, is there any possibility that all the consumers will get all the messages?
I created two consumers with manual offset commit, started the first consumer, and after 2 minutes started the second one. The second consumer reads messages from where the first consumer stopped reading. Is there any possibility that the second consumer will read all the messages from the beginning? I'm new to Kafka, please help me out.
In your consumer you are probably using commitSync, which commits the offsets returned by the last poll. When you start your second consumer, since it is in the same consumer group, it will read messages from the last committed offset.
Which messages a consumer consumes depends on the consumer group it belongs to. Suppose you have 2 partitions and 2 consumers in a single consumer group; then each consumer will read from a different partition, which helps achieve parallelism.
So, if you want your second consumer to read from the beginning, you can do one of two things:
a) Put the second consumer in a different consumer group. For this group, no offsets are stored yet, so the auto.offset.reset config decides the starting offset. Set auto.offset.reset to earliest (reset to the earliest offset) or latest (reset to the latest offset).
b) Seek to the start of all partitions your consumer is assigned by calling consumer.seekToBeginning(consumer.assignment()). Note that with subscribe(), consumer.assignment() is empty until a poll() has completed the group join.
Documentation: https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning-java.util.Collection-
https://kafka.apache.org/documentation/#consumerconfigs
A partition is always assigned to a single consumer within a consumer group, regardless of how many consumers the group has. Only that consumer can read its data; the others won't consume from it until the partition is assigned to them. When a consumer goes down, a partition rebalance happens and the partition is assigned to another consumer. Since you are committing manually, the new consumer will start reading from the committed offset.
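The decision described in this answer can be modeled as a small function: an explicit seekToBeginning wins, otherwise a committed offset for the group is used, and only when the group has none does auto.offset.reset decide. This is a simplified sketch, not the client's code; the method, group names and offsets are hypothetical. The third valid auto.offset.reset value, none, makes the real client throw when no committed offset exists, which is modeled here as an IllegalStateException.

```java
import java.util.Map;

class StartOffsetSketch {
    // committedByGroup: groupId -> committed offset for one partition (absent if none yet).
    // logStart / logEnd: earliest and latest offsets currently in the partition.
    static long startOffset(Map<String, Long> committedByGroup, String groupId,
                            String autoOffsetReset, long logStart, long logEnd,
                            boolean seekToBeginning) {
        if (seekToBeginning) {
            return logStart; // explicit seek overrides the committed position
        }
        Long committed = committedByGroup.get(groupId);
        if (committed != null) {
            return committed; // same group: resume where the group left off
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStart;
            case "latest":
                return logEnd;
            default: // "none" (or an invalid value): no position can be chosen
                throw new IllegalStateException("no committed offset for group " + groupId);
        }
    }
}
```

With a committed offset of 120 for group-a, a second consumer in group-a starts at 120 (the behavior seen in the question), while a consumer in a fresh group-b starts at the log start with earliest or at the log end with latest.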

What does "Rebalancing" mean in Apache Kafka context?

I am a new user of Kafka and have been trialling it for about 2-3 weeks. I believe I have a good understanding of how Kafka works for the most part, but after attempting to fit the API for my own Kafka consumer (this is obscure, but I'm following the guidelines for the new KafkaConsumer that is supposed to be available in v0.9, which is currently on the 'trunk' repo), I've had latency issues consuming from a topic when I have multiple consumers with the same groupID.
In this setup, my console consistently logs issues regarding a 'rebalance triggering'. Do rebalances occur when I add new consumers to a consumer group, and are they triggered to figure out which consumer instance with the same groupID gets which partitions, or are rebalances used for something else entirely?
I also came across this passage from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design and I just can't seem to understand it, so if someone could help me make sense of it, that would be much appreciated:
Rebalancing is the process where a group of consumer instances (belonging to the same group) co-ordinate to own a mutually exclusive set of partitions of topics that the group is subscribed to. At the end of a successful rebalance operation for a consumer group, every partition for all subscribed topics will be owned by a single consumer instance within the group. The way rebalancing works is as follows. Every broker is elected as the coordinator for a subset of the consumer groups. The co-ordinator broker for a group is responsible for orchestrating a rebalance operation on consumer group membership changes or partition changes for the subscribed topics. It is also responsible for communicating the resulting partition ownership configuration to all consumers of the group undergoing a rebalance operation.
When a new consumer joins a consumer group, the set of consumers attempts to "rebalance" the load to assign partitions to each consumer. If the set of consumers changes while this assignment is taking place, the rebalance will fail and retry. The setting rebalance.max.retries controls the maximum number of attempts before giving up; it is set to 4 by default.
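rebalance.max.retries belongs to the old ZooKeeper-based consumer, the one the 0.9 KafkaConsumer replaced. The fail-and-retry behavior described above can be sketched as a bounded loop: an attempt counts as failed if the membership snapshot taken after computing the assignment differs from the one taken before. The membership supplier is a hypothetical stand-in for reading the group's member list; this is an illustration, not the old consumer's code.

```java
import java.util.List;
import java.util.function.Supplier;

class RebalanceRetrySketch {
    // Returns the attempt number that succeeded, or -1 after maxRetries failed attempts.
    static int rebalance(Supplier<List<String>> membership, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            List<String> before = membership.get();
            // ... compute an assignment for the members in 'before' here ...
            List<String> after = membership.get();
            if (before.equals(after)) {
                return attempt; // membership stayed stable during this attempt
            }
        }
        return -1; // gave up, like exhausting rebalance.max.retries
    }
}
```

Called with maxRetries = 4 (the default mentioned above), a group whose membership keeps changing gives up after four attempts.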
It might also be happening for the following reason:
ZooKeeper session timeout: if the consumer fails to send a heartbeat to ZooKeeper for this period of time, it is considered dead and a rebalance will occur.
Hope this helps!
Rebalance is the re-assignment of partition ownership among consumers within a given consumer group. Remember that every consumer in a consumer group is assigned one or more topic partitions exclusively.
A rebalance happens when:
a consumer JOINS the group
a consumer SHUTS DOWN cleanly
a consumer is considered DEAD by the group coordinator. This may happen after a crash, or when the consumer is busy with long-running processing, which means that no heartbeats have been sent by the consumer to the group coordinator within the configured session interval
new partitions are added
With a group coordinator (one of the brokers in the cluster) and a group leader (the first consumer that joins the group) designated for a consumer group, a rebalance can be roughly described as follows:
The leader receives a list of all consumers in the group from the group coordinator (this includes all consumers that sent a heartbeat recently and are therefore considered alive) and is responsible for assigning a subset of partitions to each consumer.
After deciding on the partition assignment (Kafka has a couple of built-in partition assignment policies), the group leader sends the list of assignments to the group coordinator, which sends this information to all the consumers.
This applies to Kafka 0.9, but I'm quite sure it is still valid for newer versions.
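The leader's assignment step can be sketched as a simplified range-style assignor for a single topic. It is modeled on the idea behind Kafka's built-in RangeAssignor but is not that class: members are sorted so every run gives the same result, each member gets a contiguous block of partitions, and the first (partitions % members) members get one extra. RangeAssignSketch is a hypothetical name.

```java
import java.util.*;

class RangeAssignSketch {
    // Assigns partitions 0..numPartitions-1 of one topic to the given members.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted); // deterministic order for a reproducible assignment
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = sorted.size();
        if (n == 0) {
            return result;
        }
        int perMember = numPartitions / n;
        int extra = numPartitions % n; // the first 'extra' members get one more partition
        int next = 0;
        for (int i = 0; i < n; i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int k = 0; k < count; k++) {
                parts.add(next++);
            }
            result.put(sorted.get(i), parts);
        }
        return result;
    }
}
```

For instance, 7 partitions over members b, a, c give a: [0, 1, 2], b: [3, 4], c: [5, 6]; 20 partitions over 10 members give each member 2 partitions. The leader would hand a map like this to the coordinator, which passes each member its own entry.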
A consumer rebalance decides which consumer is responsible for which subset of all available partitions of some topic(s).
For example, you might have a topic with 20 partitions and 10 consumers; at the end of a rebalance, you might expect each consumer to be reading from 2 partitions. If you then add 10 more consumers, you might expect each of the 20 consumers to have 1 partition after a rebalance has completed. A consumer rebalance is a dynamic partition assignment that Kafka handles automatically.
A group coordinator is one of the brokers, responsible for communicating with consumers to carry out rebalances between them. In earlier versions, ZooKeeper stored these metadata details, but in later versions they are stored on the brokers. The group coordinator receives heartbeats from all consumers of the consumer group, so it is aware of each consumer's liveness, and it manages their committed offsets on partitions.
Group Leader:
One consumer of the group works as the group leader. It is chosen by the group coordinator and is responsible for making the partition assignment decision on behalf of all consumers in the group.
Rebalance Scenario:
A consumer group subscribes to topics.
A consumer instance fails to send a heartbeat within the session timeout (session.timeout.ms).
A consumer's long processing exceeds the poll timeout (max.poll.interval.ms).
A consumer of the consumer group throws an exception.
A new partition is added.
Consumers are scaled up or down: a new consumer is added or an existing consumer is removed manually.
Consumer Rebalance
A consumer rebalance is initiated when a consumer requests to join or leave a group. The group leader receives a list of all active consumers from the group coordinator and decides which partition(s) are assigned to each consumer by using a PartitionAssignor.
Once the group leader finalizes the partition assignment, it sends the assignment list to the group coordinator, which sends this information back to the consumers. Each consumer receives only its own assigned partitions, not those of other consumers; only the group leader is aware of all consumers and their assigned partitions.
After the rebalance is complete, consumers start sending heartbeats to the group coordinator to signal that they are alive.
Consumers send an OffsetFetch request to the group coordinator to get the last committed offsets for their assigned partitions.
Consumers start consuming messages from the newly assigned partitions.
State Management
While rebalancing, the group coordinator sets its state to Rebalance and waits for all consumers to rejoin the group.
When the group starts rebalancing, the group coordinator first switches its state to rebalance so that all interacting consumers are notified to rejoin the group.
Once the join phase completes, the group coordinator creates a new generation ID and notifies all consumers. The group then proceeds to the sync stage, where consumers send a sync request and wait until the group leader finishes generating the new partition assignment. Once consumers receive their newly assigned partitions, they move to the stable stage.
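The state flow above can be sketched as a small state machine. The states are modeled after the broker-side group states PreparingRebalance, CompletingRebalance and Stable; the transition methods are a simplification for illustration, not the broker's code. The generation is bumped when all members have rejoined, and the group becomes stable once the leader's assignment has been distributed in the sync phase.

```java
class GroupStateSketch {
    enum State { STABLE, PREPARING_REBALANCE, COMPLETING_REBALANCE }

    State state = State.STABLE;
    int generationId = 0;

    // A membership or partition change asks every member to rejoin.
    void triggerRebalance() {
        state = State.PREPARING_REBALANCE;
    }

    // All expected members have rejoined: start a new generation and move to the sync stage.
    void onAllMembersJoined() {
        if (state != State.PREPARING_REBALANCE) {
            throw new IllegalStateException("not rebalancing");
        }
        generationId++;
        state = State.COMPLETING_REBALANCE;
    }

    // The leader's sync request carries the assignment; members receive it and the group is stable.
    void onLeaderSync() {
        if (state != State.COMPLETING_REBALANCE) {
            throw new IllegalStateException("not awaiting sync");
        }
        state = State.STABLE;
    }
}
```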
Static Membership
This rebalancing is quite a heavy operation, as it requires stopping all consumers and waiting for them to get their newly assigned partitions. Each rebalance creates a new generation ID, which means everything is refreshed. To reduce this overhead, Kafka 2.3+ introduced static membership to avoid unnecessary rebalances (KIP-345).
With static membership, the consumer's identity persists, and on a rebalance the same assignment is applied again. It uses a new group.instance.id config to persist member identity. So even in the worst case, where the member ID gets reshuffled, the consumer with the same instance ID will get the same partition assignment:
instanceId: A, memberId: 1, assignment: {0, 1, 2}
instanceId: B, memberId: 2, assignment: {3, 4, 5}
instanceId: C, memberId: 3, assignment: {6, 7, 8}
And after instance A restarts:
instanceId: A, memberId: 4, assignment: {0, 1, 2}
instanceId: B, memberId: 2, assignment: {3, 4, 5}
instanceId: C, memberId: 3, assignment: {6, 7, 8}
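A toy sketch of the example above: the assignment is remembered per group.instance.id, so a restarted instance that comes back with a new member ID is handed the same partitions. StaticMembershipSketch is hypothetical; with the real client you only set group.instance.id in the consumer configuration, and the group coordinator keeps track of the static member.

```java
import java.util.*;

class StaticMembershipSketch {
    private final Map<String, List<Integer>> assignmentByInstance = new HashMap<>();
    private final Map<String, Integer> memberIdByInstance = new HashMap<>();
    private int nextMemberId = 1;

    void initialAssignment(String instanceId, List<Integer> partitions) {
        memberIdByInstance.put(instanceId, nextMemberId++);
        assignmentByInstance.put(instanceId, partitions);
    }

    // A restarted instance gets a fresh member ID, but its assignment is looked up by instance ID.
    List<Integer> rejoin(String instanceId) {
        memberIdByInstance.put(instanceId, nextMemberId++);
        return assignmentByInstance.get(instanceId);
    }

    int memberId(String instanceId) {
        return memberIdByInstance.get(instanceId);
    }
}
```

Replaying the example, A, B and C get member IDs 1, 2 and 3; after A rejoins it has member ID 4 but still owns {0, 1, 2}.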
Ref:
https://www.confluent.io/blog/kafka-rebalance-protocol-static-membership
https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances
Consumer Group, Consumer and Partition Rebalance
A Kafka consumer can subscribe to multiple topics and start receiving messages. Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group receives messages from a different subset of the partitions in the topic.
So consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When we add a new consumer to the group, it starts consuming messages from partitions previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes: it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happens when the topics the consumer group is consuming are modified, for example when new partitions are added.
"Moving partition ownership from one consumer to another is called a rebalance." During a rebalance, consumers cannot consume messages, so a rebalance is a short window of unavailability for the entire consumer group. It also causes other work on the consumer side: when partitions are moved from one consumer to another, the consumer loses its current state. For example, if it caches any data, it needs to refresh its cache, slowing down the overall application until the consumer has set up its state again.
heartbeat.interval.ms
Consumers maintain membership in a consumer group, and ownership of the partitions assigned to them, by sending heartbeats to the Kafka broker designated as the group coordinator (which can be a different broker for different consumer groups). As long as a consumer sends heartbeats at regular intervals, it is considered alive and continues processing messages from its assigned partitions. Heartbeats are sent when the consumer calls the poll method (to retrieve records from partitions) and when it commits the records it has consumed; in clients since 0.10.1, heartbeats are also sent from a separate background thread.
If a consumer stops sending heartbeats for long enough, its session times out (controlled by session.timeout.ms), and the group coordinator considers it dead and triggers a rebalance. If a consumer crashes and stops processing messages, the group coordinator will take a few seconds without heartbeats to decide it is dead and trigger a rebalance. When a consumer is closed cleanly, it notifies the group coordinator that it is leaving the group, and the coordinator triggers a rebalance immediately, reducing the time during which messages are unavailable.
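The liveness rule described here can be sketched as a per-member check: a member whose last heartbeat is older than session.timeout.ms is treated as dead, while a clean close removes it at once. Timestamps are passed in explicitly so the logic is deterministic. SessionTimeoutSketch and its methods are hypothetical, and the real coordinator uses timers rather than scanning members like this.

```java
import java.util.*;

class SessionTimeoutSketch {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    SessionTimeoutSketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    void heartbeat(String memberId, long nowMs) {
        lastHeartbeatMs.put(memberId, nowMs);
    }

    // Clean shutdown: the member leaves immediately; true means a rebalance can start now.
    boolean leave(String memberId) {
        return lastHeartbeatMs.remove(memberId) != null;
    }

    // Removes members whose session expired; a non-empty result means a rebalance is needed.
    List<String> expireDeadMembers(long nowMs) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() > sessionTimeoutMs) {
                dead.add(e.getKey());
                it.remove();
            }
        }
        return dead;
    }
}
```

With a 10-second session timeout, a member last heard from at t = 0 is expired at t = 12 s, while one that sent a heartbeat at t = 9 s is still alive.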