Client rebalancing when leader election takes place - apache-kafka

I have a custom kafka setup, where my application and a kafka broker are placed in a single node.
To make sure that the app instance only consumes the partitions on that node (to reduce network overhead), I have a custom partition assignor configured for all members of the group.
However, if a broker fails and then rejoins the cluster, will that trigger a consumer rebalance? Similarly, if I add a new broker and run the partition reassignment script, would that also trigger a rebalance?

Typically, a consumer rebalancing will happen when :
A consumer joins or leaves the Consumer Group.
A consumer fails to send a heartbeat to the broker acting as coordinator of the group before the session timeout expires (see session.timeout.ms and heartbeat.interval.ms).
A consumer does not invoke the poll() method frequently enough (see max.poll.interval.ms).
A consumer subscription has changed.
Metadata for a topic matching the subscription has changed (e.g., the number of partitions has been increased).
A new topic matching the subscription has been created (when using pattern).
A topic matching the subscription has been deleted (when using pattern).
A rebalance is manually triggered using the Java Consumer API (see Consumer#enforceRebalance()).
When the broker acting as coordinator of the group fails.
So, to answer your question: adding a new broker will not, by itself, trigger a consumer rebalance.
Here is a blog post explaining how the rebalance protocol works: "Apache Kafka Rebalance Protocol, or the magic behind your streams applications".
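As a rough illustration of the liveness settings named in the list above, here is a minimal sketch that only builds the configuration with java.util.Properties. The class name, method names, and numeric values are hypothetical and chosen for illustration, not recommendations; the "no more than one third" check reflects the guidance in the Kafka consumer configuration docs as I understand it. Handing the resulting Properties to a KafkaConsumer is assumed and not shown.

```java
import java.util.Properties;

public class LivenessConfigSketch {

    // Collects the three liveness-related consumer settings into one Properties object.
    public static Properties livenessProps(long sessionTimeoutMs,
                                           long heartbeatIntervalMs,
                                           long maxPollIntervalMs) {
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Long.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Long.toString(heartbeatIntervalMs));
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        return props;
    }

    // Returns true when the heartbeat interval is at most one third of the session timeout.
    public static boolean heartbeatWithinThird(Properties props) {
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        return heartbeat * 3 <= session;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties props = livenessProps(30_000L, 10_000L, 300_000L);
        System.out.println(props.getProperty("session.timeout.ms"));
        System.out.println(heartbeatWithinThird(props));
    }
}
```

None of these settings relate to broker membership, which is consistent with the point that adding a broker is not in the list of triggers.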

Related

Kafka consumer is taking time to recognize new partition

I was running a test where kafka consumer was reading data from multiple partitions of a topic. While the process was running I added more partitions. It took around 5 minutes for consumer thread to read data from the new partition. I have found this configuration "topic.metadata.refresh.interval.ms", but this is for producer only. Is there a similar config for consumer too?
When we add more partitions to an existing topic then a rebalance process gets initiated.
Every consumer in a consumer group is assigned one or more topic partitions exclusively, and Rebalance is the re-assignment of partition ownership among consumers.
A Rebalance happens when:
consumer JOINS the group
consumer SHUTS DOWN cleanly
consumer is considered DEAD by the group coordinator. This may happen after a
crash or when the consumer is busy with long-running processing, which means
that no heartbeats have been sent in the meanwhile by the consumer to the
group coordinator within the configured session interval
new partitions are added
We need to provide two parameters to reduce the time to rebalance.
request.timeout.ms
max.poll.interval.ms
More detailed information is available at the following.
https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2
I changed the "metadata.max.age.ms" parameter value so that metadata is refreshed more often: https://kafka.apache.org/documentation/#consumerconfigs_metadata.max.age.ms
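A minimal sketch of that change, assuming the consumer is configured through java.util.Properties. The default for metadata.max.age.ms is 300000 ms (5 minutes), which lines up with the delay described in the question; the 30-second value and the class and method names below are hypothetical.

```java
import java.util.Properties;

public class MetadataRefreshSketch {

    // Builds consumer properties with a custom metadata refresh interval.
    public static Properties withMetadataMaxAge(long maxAgeMs) {
        Properties props = new Properties();
        props.setProperty("metadata.max.age.ms", Long.toString(maxAgeMs));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative value: force a metadata refresh at least every 30 seconds.
        Properties props = withMetadataMaxAge(30_000L);
        System.out.println(props.getProperty("metadata.max.age.ms")); // prints 30000
    }
}
```

A lower value means new partitions are noticed sooner, at the cost of more frequent metadata requests to the brokers.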

how to automatically delete unused kafka consumers

I am using Kafka for a messaging application. For this application, there is a producer putting messages into a topic, and consumers registered to this topic, and consuming these messages. These consumers are Dockerized applications. For autoscaling purposes, each consumer, upon its creation, is registered as a consumer with a unique ID.
Assume the following scenario:
Consumer1 is created as a docker container, and registers itself as a consumer with ID Consumer1
Consumer2 is created as a docker container, and registers itself as a consumer with ID Consumer2
Now for whatever reason Consumer1 fails, and gets replaced by Consumer3 which registers itself as a consumer to kafka with an ID of Consumer3.
The problem is, Consumer1 is no longer used. In the long term, there will be multiple unused consumers.
Is there a way to dynamically and automatically know which consumers are no longer used and delete them?
If consumer1 and consumer3 belong to the same consumer group, consumer3 will start reading messages from where consumer1 left off. This is because Kafka maintains offsets per consumer group. So if one of the consumers in a group fails, the others will use the committed offset and avoid reprocessing the data.
The Kafka broker does not keep a record of failed consumers anywhere, as your question assumes.
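To make the "offsets belong to the group, not to the consumer" point concrete, here is a toy in-memory model. It is not Kafka's implementation; the class, the method names, and the group and topic names are all hypothetical. Falling back to offset 0 when nothing has been committed is a simplification (in a real consumer that case is governed by auto.offset.reset).

```java
import java.util.HashMap;
import java.util.Map;

public class GroupOffsetSketch {

    // Committed offsets are keyed by group and partition; no consumer ID is involved.
    private final Map<String, Long> committed = new HashMap<>();

    // Records the next offset to read for a partition on behalf of a group.
    public void commit(String groupId, String topicPartition, long nextOffset) {
        committed.put(groupId + "/" + topicPartition, nextOffset);
    }

    // Returns where any member of the group would resume on that partition.
    public long resumePosition(String groupId, String topicPartition) {
        return committed.getOrDefault(groupId + "/" + topicPartition, 0L);
    }

    public static void main(String[] args) {
        GroupOffsetSketch store = new GroupOffsetSketch();
        // Consumer1 (group "chat-app") processed offsets 0..41 of partition messages-0, then failed.
        store.commit("chat-app", "messages-0", 42L);
        // Consumer3 joins the same group and is assigned the same partition.
        System.out.println(store.resumePosition("chat-app", "messages-0")); // prints 42
    }
}
```

Because nothing in the store mentions Consumer1, there is no per-consumer entry left behind to clean up in this model.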

Conditions in which Kafka Consumer (Group) triggers a rebalance

I was going through the Consumer Config for Kafka.
https://kafka.apache.org/documentation/#newconsumerconfigs
What are the parameters that will trigger a rebalance? For instance, will the following parameter? Are there any other parameters we need to change, or will the defaults suffice?
connections.max.idle.ms: Close idle connections after the number of milliseconds specified by this config. (Type: long; default: 540000; importance: medium)
Also, we have three different topics.
Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?
Assuming the above scenario is valid (not necessarily the best practice): if one of the topics has very light traffic, will it cause the Consumer Group to rebalance?
A follow-up question: what factors affect rebalancing and its performance?
These conditions will trigger a group rebalance:
Number of partitions change for any of the subscribed list of topics
Topic is created or deleted
An existing member of the consumer group dies
A new member is added to an existing consumer group via the join API
Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?
It is at least valid; whether it is good or bad depends on your specific case.
This is supported by the official Java client API; see this method definition:
public void subscribe(Collection<String> topics,
ConsumerRebalanceListener listener)
It accepts a collection of topics.
If one of the topics has very light traffic, will it cause the Consumer Group to rebalance?
No, because this is not one of the listed conditions. Considering only the topic side, a rebalance happens only when a topic is deleted or its partition count changes.
Update.
Thanks to @Hans Jespersen for the comment about session and heartbeat.
This is quoted from the KafkaConsumer javadoc:
After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the poll API sends periodic heartbeats to the server; when you stop calling poll (perhaps because an exception was thrown), then no heartbeats will be sent. If a period of the configured session timeout elapses before the server has received a heartbeat, then the consumer will be kicked out of the group and its partitions will be reassigned.
In your question, you ask which parameters will trigger a rebalance.
In this case, two configs are related to rebalancing: session.timeout.ms and max.poll.records. Their meaning is self-explanatory.
From this we can also learn that it is bad practice to do a lot of work between calls to poll: heavy work may block the heartbeat and cause a session timeout.
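The "too much work between polls" risk can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope check, not part of the Kafka API: the class and method names are hypothetical, the per-record processing time is an assumed measurement, and it assumes a client version in which the time allowed between poll() calls is bounded by max.poll.interval.ms (default 300000 ms) while max.poll.records (default 500) caps the batch size.

```java
public class PollBudgetSketch {

    // Worst-case time to process one full batch returned by poll().
    public static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    // True when a full batch can be processed before the next poll() is due.
    public static boolean fitsPollInterval(int maxPollRecords,
                                           long perRecordMillis,
                                           long maxPollIntervalMs) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) < maxPollIntervalMs;
    }

    public static void main(String[] args) {
        // 500 records at 1 second each is 500000 ms, which exceeds 300000 ms.
        System.out.println(fitsPollInterval(500, 1_000L, 300_000L)); // prints false
        // Lowering max.poll.records to 100 gives 100000 ms, which fits.
        System.out.println(fitsPollInterval(100, 1_000L, 300_000L)); // prints true
    }
}
```

When the check fails, the usual options are a smaller max.poll.records, a larger poll interval, or moving slow work off the polling thread.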

What is the difference in Kafka between a Consumer Group Coordinator and a Consumer Group Leader?

I see references to Kafka Consumer Group Coordinators and Consumer Group Leaders...
What is the difference?
What is the benefit from separating group management into two different sets of responsibilities?
1. What is the difference?
The consumer group coordinator is one of the brokers, while the group leader is one of the consumers in a consumer group.
The group coordinator is simply the broker that receives heartbeats (or polling for messages) from all consumers of a consumer group. Every consumer group has a group coordinator. If a consumer stops sending heartbeats, the coordinator will trigger a rebalance.
2. What is the benefit from separating group management into two different sets of responsibilities?
Short answer
It gives you more flexible/extensible assignment policies without rebooting the broker.
Long answer
The key point of this separation is that the group leader is responsible for computing the assignments for the whole group.
This means that the assignment strategy can be configured on the consumer (see the partition.assignment.strategy consumer config parameter).
If partition assignment were handled by the consumer group coordinator, it would be impossible to configure a custom assignment strategy without rebooting the broker.
For more details see Kafka Client-side Assignment Proposal.
Quotes from documentation
From the "Kafka The Definitive Guide" [Narkhede, Shapira & Palino, 2017]:
When a consumer wants to join a consumer group, it sends a JoinGroup
request to the group coordinator. The first consumer to join the group
becomes the group leader. The leader receives a list of all
consumers in the group from the group coordinator (this will include
all consumers that sent a heartbeat recently and are therefore
considered alive) and it is responsible for assigning a subset of
partitions to each consumer. It uses an implementation of the
PartitionAssignor interface to decide which partitions should be
handled by which consumer.
[...] After deciding on the partition assignment, the consumer leader sends
the list of assignments to the GroupCoordinator which sends this
information to all the consumers. Each consumer only sees his own
assignment - the leader is the only client process that has the full
list of consumers in the group and their assignments. This process
repeats every time a rebalance happens.
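The leader-side step described in the quote can be sketched as a plain function from group members and partitions to an assignment. This is a simplified round-robin illustration and does not implement Kafka's PartitionAssignor interface; the class name, method name, and member IDs are hypothetical, and member IDs are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeaderAssignmentSketch {

    // Spreads partitions 0..partitionCount-1 over the members in round-robin order.
    public static Map<String, List<Integer>> assignRoundRobin(List<String> members,
                                                              int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment; // nobody to assign to
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // The leader would receive the member list from the group coordinator.
        List<String> members = Arrays.asList("consumer-a", "consumer-b");
        System.out.println(assignRoundRobin(members, 5));
        // prints {consumer-a=[0, 2, 4], consumer-b=[1, 3]}
    }
}
```

Since this logic runs in a consumer, swapping in a different strategy only requires changing the clients, which is the flexibility the answer describes.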

Does Kafka have a visibility timeout?

Does Kafka have something analogous to an SQS visibility timeout?
What is the property called?
I can't seem to find anything like this in the docs.
Kafka works a little bit differently than SQS.
Every message resides on a single topic partition. Kafka consumers are organized into consumer groups, and a single partition can be assigned to only one consumer in a group. That means other consumers from the same consumer group won't get the same message, and the same message will only be re-sent if the same consumer pulls messages from the broker again without having committed the offset.
If the Kafka broker designated as group coordinator detects that a consumer has died, a rebalance happens and that partition can be assigned to another consumer. But again, this will be the only consumer that gets messages from that partition.
So, as you can see, since Kafka does not use the competing consumers pattern, there is no notion of a visibility timeout.