Consumer timeout during rebalance - apache-kafka

When a consumer drops from a group and a rebalance is triggered, I understand no messages are consumed -
But does an in-flight request for messages stay queued past the max wait time?
Or does Kafka send any payload back during the rebalance?
UPDATE
For clarification, I'm referring specifically to the consumer polling process.
From my understanding, when one of the consumers drops from the consumer group, a rebalance of the partitions to consumers is performed.
During the rebalance, will an error be sent back to the consumer if it has already polled and is waiting for the max time to pass?
Or does Kafka wait the max time and send an empty payload?
Or does Kafka queue the request past the max wait time until the rebalance is complete?
Bottom line - I'm trying to explain periodic timeouts from consumers.
This may be in the docs, but I'm not sure where to find it.

Kafka producers don't directly send messages to their consumers; rather, they send them to the brokers.
In-flight requests correspond to the producer, not to the consumer.
Whether a consumer leaves the group and a rebalance is triggered is quite immaterial to the behaviour of the producer.
Producer messages are queued in the buffer, batched, optionally compressed and sent to the Kafka broker as per the configuration.
In-flight requests are the maximum number of unacknowledged requests the client will send on a single connection before blocking.
Note that when we say ack, it is acknowledgement by the broker and not by the consumer.
Does Kafka send any payload back during the rebalance?
The Kafka broker doesn't notify its producers of a rebalance.
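To make the distinction concrete, here is a minimal producer sketch (the broker address and topic name are placeholders). The batching, compression and in-flight settings below are standard producer configuration keys; none of them interacts with consumer group rebalances.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Maximum unacknowledged requests per connection before the client blocks.
        props.put("max.in.flight.requests.per.connection", "5");
        // Records are buffered, batched and optionally compressed before being
        // sent to the broker; none of this involves any consumer group.
        props.put("batch.size", "16384");
        props.put("linger.ms", "10");
        props.put("compression.type", "lz4");
        // "acks" is acknowledgement by the broker, not by a consumer.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
        }
    }
}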

Related

Time limit for pausing kafka consumer

I want to implement functionality which requires a Kafka queue to be paused and resumed; what I want to know is whether there is any time limit up to which it can be paused.
Kafka doesn't really have "queues"; all messages in a topic are there to be consumed by Consumers. Your Consumers can consume messages in the way they prefer: a Consumer can start consuming messages from the beginning or from any offset they want, and they can also stop and resume as they want.
When a Consumer consumes messages, it can commit the offsets back to Kafka; if the Consumer dies, when it comes back it will resume from the last committed offset.
If what you want is to poll a bunch of messages and do something with them for a long period of time, Kafka Consumers have a configuration max.poll.interval.ms that by default is 5 minutes. If you expect to consume a message and to be doing something with it for more than 5 minutes, you should increase that configuration, otherwise the consumer group will think your Consumer has died and will rebalance partitions.
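As an illustration, here is a minimal sketch of the pause/resume pattern (the broker address, group id, topic name and pause condition are placeholders). The key point is that pause() only stops fetching; as long as the loop keeps calling poll(), the consumer stays alive in the group, so there is no hard time limit on the pause itself.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PauseResumeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "my-group");                // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic

            while (true) {
                // Keep polling even while paused: poll() returns no records for
                // paused partitions but keeps the consumer alive in the group.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.println(r.value())); // placeholder processing

                if (shouldPause()) {
                    consumer.pause(consumer.assignment()); // stop fetching, stay in the group
                } else {
                    consumer.resume(consumer.paused());    // start fetching again
                }
            }
        }
    }

    static boolean shouldPause() {
        return false; // placeholder for the application's own pause condition
    }
}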

What is Kafka Partition Rebalancing expected time?

How long does Partition Rebalancing take when a new Consumer joins the group?
What factors can affect this?
In my understanding, these factors play a role:
Consumers need to finish processing the data they polled last time.
The Coordinator waits for Consumers to send a JoinGroup request - for how long?
Consumers send a SyncGroup request (is there a delay between receiving the JoinGroup response and sending the SyncGroup request?)
In a normal situation, assuming that Consumers process data instantly, how long should one expect Partition Rebalancing to take?
The Coordinator waits for at most the rebalance timeout (which the Java consumer sets to max.poll.interval.ms). If a Consumer does not re-join during this time, it is considered dead. But if all Consumers re-join before then, the Coordinator will not wait further (my assumption).
Ordinary Consumers send the SyncGroup request as soon as they receive the JoinGroup response. The group leader needs more time (to perform the partition assignment logic).
Source:
https://www.confluent.io/online-talks/everything-you-always-wanted-to-know-about-kafkas-rebalance-protocol-but-were-afraid-to-ask-on-demand/
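For reference, a sketch of the consumer settings that bound these waits (the values shown are the Java client defaults, listed purely for illustration):

import java.util.Properties;

public class RebalanceTimingSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        // A dead consumer is detected between polls via missed heartbeats
        // within the session timeout.
        props.put("session.timeout.ms", "10000");
        props.put("heartbeat.interval.ms", "3000");
        // The consumer sends this value to the coordinator as the rebalance
        // timeout: the longest the coordinator waits for members to re-join.
        props.put("max.poll.interval.ms", "300000");
        return props;
    }
}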

What happens to records/messages during consumption when the record processing took more than 'max.poll.interval.ms'?

I have the below consumer settings.
auto.offset.reset=earliest
enable.auto.commit=true (default value)
session.timeout.ms=10000 (default value)
max.poll.interval.ms= 300000 (default value)
With the above configuration, let's say I have five messages (m1, m2, m3, m4 and m5) in a topic A (with only one partition). Now I have a consumer subscribed to this topic; it was able to process the first two messages (m1 and m2) without any issues and committed their offsets.
Now, let us say the consumer got the third message m3 and, because of some network latency, took 300100 ms to process it. As per my understanding, the offset commit will not happen because the record processing took longer than max.poll.interval.ms, and hence the consumer would be considered dead and removed from the group.
Now I have two questions:
What happens to the message m3? I mean, would it be picked up in the next poll because its offset was not committed?
What happens to the other messages m4 and m5?
Exceeding max.poll.interval.ms without calling poll() is one of the causes of a rebalance. When a rebalance starts in a consumer group, all the consumers in the group have their partitions revoked (they are removed from the member list). During the rebalance, Kafka waits for all healthy consumers to send a JoinGroup request by calling poll(), up to the rebalance timeout (the rebalance timeout equals max.poll.interval.ms). Once the healthy consumers have sent their JoinGroup requests, or the rebalance timeout expires, Kafka assigns partitions to the consumers that sent JoinGroup requests.
In your case:
What happens to the message m3? I mean, would it be picked up in the next poll because its offset was not committed?
Answer: Its processing continues even after your consumer is revoked, unless you have logic to interrupt the processing thread on revocation. So all the messages returned from the previous poll are processed, but their offsets cannot be committed. If the partition is assigned to another consumer as a result of the rebalance, the new consumer will get the same messages starting from m3, so those messages will be processed twice. When the first consumer sends a poll request again, that counts as a JoinGroup request, and another rebalance will be triggered.
What happens to the other messages m4 and m5?
Answer: If these messages were returned from the same poll() as m3, the result will be the same. They will be processed, but their offsets cannot be committed by the old consumer. The new consumer will process the messages and commit the offsets.
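One way to shrink the duplicate window, sketched below, is to commit synchronously from a ConsumerRebalanceListener when partitions are revoked (broker address, group id and topic name are placeholders, and the sketch assumes manual commits rather than the question's auto-commit setup). This cannot help a consumer that has already been kicked out of the group, since its commit will be rejected, but it covers graceful revocations.

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RevokeAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "my-group");                // placeholder group id
        props.put("enable.auto.commit", "false");         // manual commits, unlike the question's setup
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("topic-A"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit what has been processed before the partitions are taken
                // away, so the next owner does not re-process those records.
                consumer.commitSync();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Nothing to do: consumption resumes from the last committed offset.
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                process(record);   // must finish well within max.poll.interval.ms
            }
            consumer.commitSync(); // commit only after the batch is fully processed
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // placeholder for the application's record handling
    }
}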

Issue of Kafka Balancing at high load

Using Kafka version 2.11-0.11.0.3 to publish 10,000 messages (total size of all messages is 10 MB), there are 2 consumers (with the same group id) consuming the messages in parallel.
While consuming, the same message was consumed by both consumers.
The errors/warnings below were thrown by Kafka:
WARN: This member will leave the group because consumer poll timeout
has expired. This means the time between subsequent calls to poll()
was longer than the configured max.poll.interval.ms, which typically
implies that the poll loop is spending too much time processing
messages. You can address this either by increasing
max.poll.interval.ms or by reducing the maximum size of batches
returned in poll() with max.poll.records.
INFO: Attempt to heartbeat failed since group is rebalancing
INFO: Sending LeaveGroup request to coordinator
WARN: Synchronous auto-commit of offsets
{ingest-data-1=OffsetAndMetadata{offset=5506, leaderEpoch=null,
metadata=''}} failed: Commit cannot be completed since the group has
already rebalanced and assigned the partitions to another member. This
means that the time between subsequent calls to poll() was longer than
the configured max.poll.interval.ms, which typically implies that the
poll loop is spending too much time message processing. You can
address this either by increasing max.poll.interval.ms or by reducing
the maximum size of batches returned in poll() with max.poll.records.
The configurations below were provided to Kafka:
server.properties
max.poll.interval.ms=30000
group.initial.rebalance.delay.ms=0
group.max.session.timeout.ms=120000
group.min.session.timeout.ms=6000
consumer.properties
session.timeout.ms=30000
request.timeout.ms=40000
What should be changed to resolve the duplicate consumption?
Are your consumers in the same group? If yes, you will have duplicate consumption whenever a consumer leaves/dies/times out without having committed some messages it has processed.
If all your messages are consumed by both consumers, you probably have not set the same group id for them.
More info:
So you have set the same group id for all consumers, good. You are in the situation where the cluster/broker thinks that a consumer died and therefore rebalances the load to another one. This other one will start consuming where the last commit was done.
So let's say consumer C_A read offsets up to 100 from partition P_1, processed them, and committed '100'; it then read offsets up to 200 and processed them, but could not commit because the broker considered C_A dead.
The broker reassigns partition P_1 to consumer C_B, which will start from the last commit for the group, which is 100, read up to 200, process, and commit 200.
So your question is how to avoid the consumer being considered dead (I assume it is not actually dead)?
The answer is already in the WARN message in your question: you can tell your consumer to fetch fewer messages (max.poll.records) in one poll to reduce the processing time between two polls to the broker, and/or you can increase max.poll.interval.ms, telling the broker to wait longer before considering your consumer dead.
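A sketch of those two knobs as consumer configuration (the broker address, group id and values are illustrative placeholders, not a recommendation for every workload):

import java.util.Properties;

public class PollTuningSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "ingest-group");            // placeholder group id
        // Fetch fewer records per poll so each batch is processed quickly...
        props.put("max.poll.records", "50");              // default is 500
        // ...and/or allow more time between polls before the consumer is
        // considered dead and its partitions are reassigned.
        props.put("max.poll.interval.ms", "600000");      // default is 300000 (5 minutes)
        return props;
    }
}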

Ensure that all messages are consumed in Kafka

I have a consumer server in Apache Kafka and it's consuming the messages.
However, in the poll loop I have a time-consuming process (DB access).
So my questions are:
How to make sure that I read all the messages in a given timeframe on the consumer?
Do I need to increase the number of consumer clients?