Conditions in which Kafka Consumer (Group) triggers a rebalance - apache-kafka

I was going through the Consumer Config for Kafka.
https://kafka.apache.org/documentation/#newconsumerconfigs
What are the parameters that will trigger a rebalance? For instance, will the following parameter? Are there any other parameters we need to change, or will the defaults suffice?
connections.max.idle.ms: Close idle connections after the number of milliseconds specified by this config. (type: long, default: 540000, importance: medium)
Also, we have three different topics.
Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?
Assuming the above scenario is valid (not necessarily best practice): if one of the topics has very light traffic, will it cause the consumer group to rebalance?
A follow-up question: what factors affect rebalancing and its performance?

These conditions will trigger a group rebalance:
The number of partitions changes for any of the subscribed topics
A subscribed topic is created or deleted
An existing member of the consumer group dies
A new member is added to an existing consumer group via the join API
Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?
It is at least valid; whether it is good or bad depends on your specific case.
This is supported by the official Java client API; see this method definition:
public void subscribe(Collection<String> topics,
                      ConsumerRebalanceListener listener)
It accepts a collection of topics.
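For illustration, a minimal sketch of one consumer group consuming three topics with a recent Java client (broker address, topic names, and group id are placeholders):
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "my-group");                 // one group id shared by every instance
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // One subscription covering all three topics; their partitions are
            // spread across the members of this single group.
            consumer.subscribe(Arrays.asList("topic-a", "topic-b", "topic-c"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}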
if one of the topics has very light traffic, will it cause the consumer group to rebalance?
No, because this is not among the conditions listed above. Considering only the topic aspect, a rebalance happens only when the topic is deleted or its partition count changes.
Update:
Thanks to Hans Jespersen's comment about the session timeout and heartbeat.
This is quoted from the Kafka Consumer javadoc:
After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the poll API sends periodic heartbeats to the server; when you stop calling poll (perhaps because an exception was thrown), then no heartbeats will be sent. If a period of the configured session timeout elapses before the server has received a heartbeat, then the consumer will be kicked out of the group and its partitions will be reassigned.
In your question you ask what parameters will trigger a rebalance.
In this case, two configs are related to rebalancing: session.timeout.ms and max.poll.records. If the consumer stops polling (and thus heartbeating) for longer than the session timeout, it is considered dead and a rebalance is triggered.
From this we can also learn that it is bad practice to do a lot of work between polls; heavy processing may block the heartbeat and cause the session to time out.
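For illustration, a minimal sketch of those two settings in the consumer configuration (the values are placeholders, not recommendations):
import java.util.Properties;

Properties props = new Properties();
// If the coordinator receives no heartbeat within this window, the consumer is
// removed from the group and a rebalance is triggered.
props.put("session.timeout.ms", "30000");
// Fewer records per poll means less work between polls, which makes it easier
// to call poll() again before the session times out.
props.put("max.poll.records", "100");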


Kafka: change in consumers number in a group

I understand that Kafka's semantics are that a consumer group must read a record only once. To achieve this, Kafka consumers maintain an offset, which is then conveyed to brokers with read requests so that brokers can send data accordingly and ensure that already-read data is not re-sent. But how do the broker and consumers react when there is a change in the consumer group, such as the addition of a new consumer or an existing consumer going down?
There are a few things which need to be considered here.
If a consumer goes down, how is its offset information taken into account while assigning its partitions to active consumers?
If a new consumer joins, how does the system ensure that it doesn't read data its consumer group has already read?
If consumers join/leave a group, there's a consumer group rebalance. All consumers in the group will temporarily be suspended, then new partitions will be assigned to consume from.
If those consumers were processing, then there's a good chance that they'll re-consume the same data.
If you use transactions, the chance of that happening could be reduced, as records will be consumed "exactly once". But this doesn't necessarily mean "successfully processed and offset committed" exactly once.
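As a sketch of one common way to reduce that re-consumption (not part of the answer above; the topic name is a placeholder, auto-commit is assumed to be disabled, and records are assumed to be processed before the next poll), you can commit offsets when partitions are revoked:
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// 'consumer' is an existing KafkaConsumer configured with enable.auto.commit=false.
consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Commit the current positions before the partitions move, so the next
        // owner resumes close to where this consumer stopped.
        consumer.commitSync();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Nothing special needed for this sketch.
    }
});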

Client rebalancing when leader election takes place

I have a custom Kafka setup, where my application and a Kafka broker are placed on a single node.
To make sure that the app instance only consumes the partitions on that node (to reduce network overhead), I have a custom partition assignor assigned to all members of the group.
However, if a broker fails and then rejoins the cluster, will that trigger a consumer rebalance? Similarly, if I add a new broker and trigger the partition re-assignment script, would that also trigger a rebalance?
Typically, a consumer rebalancing will happen when :
A consumer joins or leaves the Consumer Group.
A consumer fails to send a heartbeat request to the broker coordinator managing the group before reaching a timeout (see session.timeout.ms and heartbeat.interval.ms).
A consumer does not invoke the poll() method frequently enough (see max.poll.interval.ms).
A consumer subscription has changed.
Metadata for a topic matching the subscription has changed (i.e: the number of partitions has been increased).
A new topic matching the subscription has been created (when using pattern).
A topic matching the subscription has been deleted (when using pattern).
When a rebalance is manually triggered using the Java Consumer API (see Consumer#enforceRebalance()).
When the broker acting as coordinator of the group fails.
So, to answer your question: adding a new broker will not by itself trigger a partition reassignment, and therefore will not cause a consumer rebalance.
Here is a blog post explaining how the rebalance protocol works: "Apache Kafka Rebalance Protocol, or the magic behind your streams applications".
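To observe which of these events actually causes a rebalance in a given setup, a listener that simply logs assignments and revocations can help; a minimal sketch (the topic name is a placeholder):
import java.util.Collection;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

// 'consumer' is your existing KafkaConsumer instance.
consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        System.out.println("Rebalance: revoked " + partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        System.out.println("Rebalance: assigned " + partitions);
    }
});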

How to reinstate a Kafka Consumer which has been kicked out of the group?

I have a situation where I have a single Kafka consumer which retrieves records from Kafka using the poll mechanism. Sometimes this consumer gets kicked out of the consumer group due to a failure to call poll within the session.timeout period, which I have configured to 30s. My question is: if this happens, will a poll at some later point in time re-add the consumer to the group, or do I need to do something else?
I am using Kafka version 0.10.2.1.
Edit: Aug 14 2018
Some more info. After I do a poll I never process the records in the same thread. I simply add all the records to a separate queue (serviced by a separate thread pool) for processing.
Poll will initiate a "join group" request if the consumer is not yet a member of the group, and will result in the consumer joining the group (unless some error situation prevents it). Note that depending on the group status (other members in the group, subscribed topics in the group) the consumer may or may not get the same partitions it was consuming from before it was kicked out. This uncertainty does not apply if the consumer is the only consumer in the group.
A consumer gets kicked out if it fails to send a heartbeat within the designated time period. Every call to poll sends a heartbeat to the consumer group coordinator.
You need to look at how much time it takes to process a single record. Maybe it exceeds the session.timeout.ms value, which you have set to 30s; try increasing that. Also keep max.poll.records at a lower value: this setting determines how many records are fetched by a call to poll. If you fetch too many records, then even with a large session.timeout.ms your consumer might still get kicked out and the group will enter the rebalancing stage.
Vahid already mentioned what happens when a kicked-out consumer rejoins the group. You can also tune the below configuration so that consumer won't be kicked out of the group.
max.poll.records - the maximum number of records returned by a single call to poll (default: 500)
max.poll.interval.ms - the maximum amount of time allowed between calls to poll, i.e. the time you have to process the records returned by a poll (default: 5 min)
You can see the impact of these configurations in KIP-62.
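For illustration, a sketch of these settings in the consumer properties (values are placeholders; since KIP-62, max.poll.interval.ms bounds the time between polls independently of session.timeout.ms):
import java.util.Properties;

Properties props = new Properties();
props.put("max.poll.records", "100");         // fewer records returned per poll (default 500)
props.put("max.poll.interval.ms", "600000");  // allow up to 10 minutes between polls (default 5 minutes)
props.put("session.timeout.ms", "30000");     // heartbeat session timeout, as set in the question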
Alternatively, you can use KafkaConsumer#assign mode, since you've mentioned that you're using only one consumer. This mode doesn't do any rebalancing.
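A minimal sketch of manual assignment (topic name and partition numbers are placeholders):
import java.util.Arrays;
import org.apache.kafka.common.TopicPartition;

// With assign() there is no group membership and therefore no rebalancing,
// but also no automatic failover to other consumers.
consumer.assign(Arrays.asList(
        new TopicPartition("my-topic", 0),
        new TopicPartition("my-topic", 1)));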

KAFKA 0.9 new consumer group join and heart beat multi-thread problems

I am trying to update my Kafka client from 0.8.2 to 0.9.0.1 to reduce the pressure on the ZooKeeper cluster, and I'm running into the following questions:
The Kafka consumer protocol says that "The join group request will park at the coordinator until all expected members have sent their own join group request". I found that the join group request is triggered by poll() and that the method will not return before the group rebalance has finished. So does that mean I need as many threads as consumers, to make sure all the consumers can send out the join group request at the same time? If I have more than 10000 partitions and I want each partition to have its own consumer, does that mean I need more than 10000 consumer threads?
To trigger the heartbeat, I need to call poll(). But if I don't want to fetch new messages because the old messages are still being processed, could I do that with consumer.pause() -> consumer.poll() -> consumer.resume()? Is there a better way to do that?
Consumers can read multiple partitions. So in general, a single consumer is sufficient -- it can assign all partitions to itself. However, if you want "each partition to have its own consumer", you will of course need one consumer per partition...
About joining groups: if you have multiple consumers and you are in a rebalance, the rebalance will not block forever; there is a timeout applied. If a consumer does not send a join request within the timeout, it drops out of the group (for now) and the rebalance can finish. If this late consumer comes alive again and sends a join group request, a new rebalance will be triggered.
Pause, poll, resume would be the right thing to do. Heads-up: this is going to change via KIP-62, which introduces a background heartbeat thread in the consumer.
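A minimal sketch of that pattern (this assumes a 0.10+ client where pause()/resume() take a collection of partitions; in 0.9 they take varargs):
// Stop fetching new records but stay in the group.
consumer.pause(consumer.assignment());
// poll() still drives heartbeats and group membership; it returns no records
// for paused partitions.
consumer.poll(100);
// ...once the in-flight records have been processed:
consumer.resume(consumer.assignment());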

Making Kafka consumers consume existing messages before subscription

Having a publisher and N consumers, if the consumers use auto.offset.reset=latest then they miss all the messages that were published to a topic before they subscribed to it. It is a known fact that a consumer with auto.offset.reset=latest doesn't replay messages that existed in the topic before it subscribed.
So I would need to either:
Make the publisher wait until all subscribers start consuming messages and only then start publishing. I don't know how to do that without leveraging ZooKeeper, for instance. Does Kafka provide a means to do that?
Another way would be to keep auto.offset.reset=latest consumers and make them explicitly consume all existing messages first, in case they are about to subscribe to a topic that already has messages.
What is the best practice for this case?
I guess the consumer must check the topic for existing messages, consume them if there are any, and then initiate auto.offset.reset=latest consumption. That sounds like the best way to me.
If a high level consumer gets started, it does the following:
look for committed offsets for its consumer group
a. if valid offsets are found, resume from there
b. if no valid offsets are found, set offsets according to auto.offset.reset
Thus, auto.offset.reset only triggers if no valid offset was committed. This behavior is intentional and necessary to provide at-least-once processing guarantees in case of failure.
Thus, if you want to read a topic from its beginning, you can either use a new group.id and set auto.offset.reset=earliest, or explicitly modify the offsets on startup using seekToBeginning() before you start your poll() loop.
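For illustration, a sketch of the seekToBeginning() option (the topic name is a placeholder; the loop just waits until the group has assigned partitions before seeking):
import java.time.Duration;
import java.util.Collections;

consumer.subscribe(Collections.singletonList("my-topic"));
// poll() drives the group join; wait until partitions have actually been assigned.
while (consumer.assignment().isEmpty()) {
    consumer.poll(Duration.ofMillis(100));
}
// Rewind every assigned partition; any records returned by the warm-up polls
// above will simply be read again after the seek.
consumer.seekToBeginning(consumer.assignment());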
We implement option (1) using the service discovery feature provided by Eureka (any other service discovery tool would do the job), plus aliasing. Basically, a publisher does not register itself (and does not start processing requests or publishing notifications) until at least one subscriber is available.