Kafka Consumer polling interval - apache-kafka

I have a Kafka topic with a single partition and one consumer attached to it.
For timeouts, I am using the default values (heartbeat: 3 sec, session timeout: 10 sec, poll timeout: 5 min).
As per the documentation, the poll timeout means the consumer has to finish processing a message within this time, or else the broker will remove it from the consumer group. Now suppose it takes the consumer only 1 minute to finish processing a message.
I have two questions:
a) Will it call poll() only after 5 minutes, or will it call poll() as soon as it finishes processing?
b) Also, suppose the consumer is sitting idle for some time; what would be the polling frequency, i.e. at what interval will the consumer poll the broker for messages? Will it be the poll timeout or something else?

I presume the 5 minute setting you are referring to is max.poll.interval.ms, and not the poll timeout.
Also, I presume you are calling Kafka from Java; the answer might be different if you are using a different language.
The poll timeout is the value you pass to the KafkaConsumer poll() method. This is the maximum time the poll() method will block for, after you call it.
The maximum polling interval of 5 minutes means that you must call poll() again within 5 minutes of your previous call to poll(). If you don't, your consumer is considered failed and is removed from the group, which triggers a rebalance.
So your questions:
a) Will it call poll() only after 5 minutes, or will it call poll() as soon as it finishes processing?
That is totally up to you. You are the one who is doing the calling, in your own code. You should have a loop in which you call poll().
b) Also, suppose the consumer is sitting idle for some time; what would be the polling frequency, i.e. at what interval will the consumer poll the broker for messages? Will it be the poll timeout or something else?
A consumer (i.e. your own application code) shouldn't be sitting idle but should be in a loop. In this loop, you call poll(), then you process the event (1 minute), and then you call poll() again. If no messages are available, poll() simply blocks for up to the timeout you passed, returns an empty result, and your loop calls it again.
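To make the loop concrete, here is a minimal sketch in plain Java. It deliberately does not use the Kafka client library: RecordSource is a hypothetical stand-in for KafkaConsumer#poll(Duration), and maxPolls exists only to keep the sketch finite (a real consumer would loop until shutdown).

```java
import java.time.Duration;
import java.util.List;

final class PollLoopSketch {

    /** Hypothetical stand-in for KafkaConsumer#poll(Duration); not a Kafka type. */
    interface RecordSource {
        // Blocks for at most `timeout`; returns an empty list if nothing arrived.
        List<String> poll(Duration timeout);
    }

    /**
     * Polls in a loop and processes each batch as soon as it is returned.
     * Returns the number of records processed.
     */
    static int run(RecordSource source, Duration pollTimeout, int maxPolls) {
        int processed = 0;
        for (int i = 0; i < maxPolls; i++) {
            List<String> batch = source.poll(pollTimeout);
            for (String record : batch) {
                process(record);
                processed++;
            }
            // No sleep here: the next poll() happens right after processing,
            // so the gap between polls is just the processing time.
        }
        return processed;
    }

    // Placeholder for application logic (the "1 minute" of work in the question).
    static void process(String record) {
        System.out.println("processed " + record);
    }
}
```

The point of the sketch is that the polling frequency is not a setting: it falls out of how long your own code takes between two calls to poll().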

Related

max poll interval and session timeout ms | kafka consumer alive

Scenario:
Committing offsets manually after processing the messages.
session.timeout.ms: 10 seconds
max.poll.interval.ms: 5 minutes
Processing of messages consumed in a "poll()" is taking 6 minutes
Timeline:
A (0 seconds): app calls poll(), receives the messages and starts processing them (will take 6 minutes)
B (3 seconds): a heartbeat is sent
C (6 seconds): another heartbeat is sent
D (5 minutes): another heartbeat is sent (5 * 60 % 3 = 0) BUT "max.poll.interval.ms" (5 minutes) is reached
At point "D", will the consumer:
1. send a "LeaveGroup request" so that it is considered "dead" and a rebalance is triggered?
2. continue sending heartbeats every 3 seconds?
If point "1" is the case, then
a. how will this consumer commit offsets after completing the 6 minutes of processing, considering that its partition(s) have been reassigned by the rebalance at point "D"?
b. should "max.poll.interval.ms" be set in advance according to the expected processing time?
If point "2" is the case, will we never know whether the processing is actually blocked?
Thank you.
Starting with Kafka version 0.10.1.0, consumer heartbeats are sent from a background thread, so the client's processing time can be longer than the session timeout without the consumer being considered dead.
However, max.poll.interval.ms still sets the maximum allowed time between a consumer's calls to poll().
In your case, with a processing time of 6 minutes, your consumer will be considered dead at point "D".
Your concerns are right: the consumer will not be able to commit its offsets after the 6 minutes of processing. It will get a CommitFailedException (as described in another answer on CommitFailedException).
To conclude, yes, you need to increase the max.poll.interval.ms time if you already know that your processing time will exceed the default time of 5 minutes.
Another option would be to limit the fetched records during a poll by decreasing the configuration max.poll.records which defaults to 500 and is described as: "The maximum number of records returned in a single call to poll()".
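As a rough illustration of both options, the sketch below builds the relevant settings with java.util.Properties. The config keys are the standard consumer names used in this thread; the values (8 minutes, 100 records) are assumptions chosen for the 6-minute scenario, and you would still need to add bootstrap.servers, group.id and the deserializers before passing the result to a KafkaConsumer.

```java
import java.util.Properties;

final class SlowProcessingConsumerConfig {

    /** Builds consumer settings for batches that may take about 6 minutes to process. */
    static Properties build() {
        Properties props = new Properties();
        // Assumed values for illustration; size them from your own measurements.
        props.setProperty("max.poll.interval.ms", "480000"); // 8 minutes, above the 6-minute processing time
        props.setProperty("max.poll.records", "100");        // fewer records per poll (default is 500)
        props.setProperty("session.timeout.ms", "10000");    // as in the question
        props.setProperty("heartbeat.interval.ms", "3000");  // as in the question
        props.setProperty("enable.auto.commit", "false");    // offsets are committed manually
        return props;
    }
}
```

Either change may be enough on its own; raising max.poll.interval.ms gives each batch more time, while lowering max.poll.records makes each batch smaller.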

Do Kafka consumers spin on poll() or are they woken up by a broadcast/signal from the broker?

If I poll() from a consumer in a while True: statement, I see that poll() is blocking. If the consumer is up to date with messages from the topic (offset = OFFSET_END), how is the consumer conducting its blocking poll()?
Does the consumer default adhere to a pub/sub mentality in which it sleeps and waits for a publish and a broadcast/signal from the broker?
Or is the consumer constantly spinning itself checking the topic?
I'm using the confluent python client, if that matters.
Thanks!
Kafka consumers are basically long-poll loops, driven (asynchronously) by the user thread calling poll().
The whole protocol is request-response and entirely client-driven. There is no form of broker-initiated "push".
fetch.max.wait.ms controls how long any single broker will wait before responding (if there is no data), while blocking of the user thread is controlled by the timeout argument to poll().
Yes, you are right: it is a while-true loop, and each poll() waits for messages until the timeout you pass expires.
If a message arrives, poll() returns immediately; otherwise it waits for the timeout to elapse and returns an empty set of records.
The Kafka broker uses the following parameters to decide when to send a response to the consumer:
fetch.min.bytes: The broker will wait until this much data is available before it sends the response to the consumer client.
fetch.wait.max.ms (fetch.max.wait.ms in the Java client): The maximum time the broker will wait before sending a response when less than fetch.min.bytes of data is available.
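The interaction of the two settings can be sketched as a small decision function. This is a simplified model for illustration only, not broker code: the class and method names are made up, and the wait-time parameter corresponds to fetch.wait.max.ms in librdkafka-based clients and fetch.max.wait.ms in the Java client.

```java
final class FetchLongPollModel {

    /**
     * Simplified model of when a broker answers a pending fetch request:
     * as soon as at least fetchMinBytes are available, or once the maximum
     * wait time has elapsed, whichever comes first.
     */
    static boolean shouldRespond(long availableBytes, long waitedMs,
                                 long fetchMinBytes, long fetchMaxWaitMs) {
        boolean enoughData = availableBytes >= fetchMinBytes;
        boolean waitedLongEnough = waitedMs >= fetchMaxWaitMs;
        return enoughData || waitedLongEnough;
    }
}
```

With the Java client's defaults of fetch.min.bytes=1 and fetch.max.wait.ms=500, an idle consumer is therefore not spinning: each fetch request is parked on the broker for up to half a second and answered as soon as a single byte of new data arrives.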
Processing the consumed messages can delay the next call to poll(). max.poll.interval.ms bounds that delay: the consumer must call poll() again within max.poll.interval.ms, otherwise it leaves the group and triggers a rebalance.
The documentation gives more detail:
max.poll.interval.ms: By increasing the interval between expected polls, you can give the consumer more time to handle a batch of
records returned from poll(long). The drawback is that increasing this
value may delay a group rebalance since the consumer will only join
the rebalance inside the call to poll. You can use this setting to
bound the time to finish a rebalance, but you risk slower progress if
the consumer cannot actually call poll often enough.
max.poll.records: Use this setting to limit the total records returned from a single call to a poll. This can make it easier to
predict the maximum that must be handled within each poll interval. By
tuning this value, you may be able to reduce the poll interval, which
will reduce the impact of group rebalancing.

What is the delay time between each poll

In the Kafka documentation, I'm trying to understand the property max.poll.interval.ms:
The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.
This means each poll must happen before the poll timeout, which is 5 minutes by default. So my question is: exactly how much time does the consumer thread take between two consecutive polls?
For example: Consumer Thread 1
First poll--> with 100 records
--> process 100 records (took 1 minute)
--> consumer submitted offset
Second poll--> with 100 records
--> process 100 records (took 1 minute)
--> consumer submitted offset
Does the consumer wait between the first and second poll? If yes, why, and how can we change that time (assume the topic has a huge amount of data)?
It's not clear what you mean by "take time between"; if you are talking about the spring-kafka listener container, there is no wait or sleep, if that's what you mean.
The consumer is polled immediately after the offsets are committed.
So, max.poll.interval.ms must be large enough for your listener to process max.poll.records (plus some extra, just in case).
But, no, there are no delays added between polls, just the time it takes the listener to handle the results of the poll.
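The sizing rule above (enough time for max.poll.records, plus some extra) can be written as simple arithmetic. The helper below is a hypothetical sketch, not part of spring-kafka or the Kafka client; the per-record time and the headroom factor are inputs you would have to measure or choose yourself.

```java
final class PollIntervalSizing {

    /**
     * Lower bound for max.poll.interval.ms: worst-case time to process one
     * full batch of maxPollRecords, multiplied by a headroom factor (e.g. 1.5).
     */
    static long requiredPollIntervalMs(int maxPollRecords, long worstCaseMsPerRecord,
                                       double headroomFactor) {
        long batchMs = Math.multiplyExact((long) maxPollRecords, worstCaseMsPerRecord);
        return (long) Math.ceil(batchMs * headroomFactor);
    }
}
```

For the example in the question, 100 records processed in 1 minute is 600 ms per record, so requiredPollIntervalMs(100, 600, 1.5) gives 90000 ms, well below the 5-minute default.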

Prevent kafka consumer from timing out for long process

I need to prevent the kafka consumer from timing out while the application waits for a particular process to complete. My approach is to pause the partitions and then resume them once the process is completed.
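// Requires java.util.ArrayList, java.util.List and org.apache.kafka.common.TopicPartition.
List<TopicPartition> partitionList = new ArrayList<>(kafkaConsumer.assignment());
kafkaConsumer.pause(partitionList); // stop fetching records, but stay in the group
while (!processCompleted()) { // processCompleted() is a placeholder for the application's own check
    Thread.sleep(10000);
    kafkaConsumer.poll(0); // returns no records while paused; use poll(Duration.ZERO) on clients 2.0+
}
kafkaConsumer.resume(partitionList);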
Questions
Does pause() send heartbeats to Kafka automatically, or do I still need to poll at regular intervals to send the heartbeat?
Is mine the best approach, or is there a better way of doing it?
Since Kafka 0.10.1, consumers do have a background thread for sending heartbeats: https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
Thus, you don't need to call poll() to send heartbeat to the brokers. However, there is a second timeout max.poll.interval.ms -- you must call poll() within this time to avoid this second timeout. Default value is 5 minutes. You can just increase this timeout if your wait is even longer than this. If you do so, you also don't need to pause any partitions etc.
If you are using an older version, your approach of pausing and calling poll() regularly is the only way to send regular heartbeats and avoid the timeout.

Difference between session.timeout.ms and max.poll.interval.ms for Kafka >= 0.10.1

I am unclear why we need both session.timeout.ms and max.poll.interval.ms and when would we use one or the other or both? It seems like both settings indicate the upper bound on the time the coordinator will wait to get the heartbeat from a consumer before assuming it's dead.
Also how does it behave for versions 0.10.1.0+ based on KIP-62?
Before KIP-62 (i.e., Kafka 0.10.0 and earlier), there was only session.timeout.ms. max.poll.interval.ms was introduced via KIP-62 (part of Kafka 0.10.1).
KIP-62 decouples heartbeats from calls to poll() via a background heartbeat thread, allowing for a longer processing time (i.e., time between two consecutive calls to poll()) than the heartbeat interval.
Assume processing a message takes 1 minute. If heartbeat and poll are coupled (i.e., before KIP-62), you need to set session.timeout.ms larger than 1 minute to prevent the consumer from timing out. However, if a consumer dies, it also takes longer than 1 minute to detect the failed consumer.
KIP-62 decouples polling and heartbeats, allowing heartbeats to be sent between two consecutive polls. Now you have two threads running, the heartbeat thread and the processing thread, and thus KIP-62 introduced a timeout for each: session.timeout.ms is for the heartbeat thread, while max.poll.interval.ms is for the processing thread.
Assume you set session.timeout.ms=30000; the consumer heartbeat thread must then send a heartbeat to the broker before this time expires. On the other hand, if processing a single message takes 1 minute, you can set max.poll.interval.ms larger than one minute to give the processing thread more time to process a message.
If the processing thread dies, it takes max.poll.interval.ms to detect this. However, if the whole consumer dies (and a dying processing thread most likely crashes the whole consumer including the heartbeat thread), it takes only session.timeout.ms to detect it.
The idea is, to allow for a quick detection of a failing consumer even if processing itself takes quite long.
Implementation Detail
The new timeout max.poll.interval.ms is mainly a client-side concept: if poll() is not called within max.poll.interval.ms, the heartbeat thread detects this and sends a leave-group request to the broker. max.poll.interval.ms is also relevant for consumer group rebalances: if a rebalance is triggered, consumers have max.poll.interval.ms to rejoin the group by calling poll() on the client side, which triggers a join-group request.
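The division of labour between the two timeouts can be summarised in a small model. This is only a sketch of the reasoning in this answer, with made-up names; it ignores details such as heartbeat timing and rebalance duration, so the returned values are approximate upper bounds rather than exact detection times.

```java
final class FailureDetectionModel {

    enum Failure { PROCESS_CRASHED, PROCESSING_THREAD_STUCK }

    /** Approximate upper bound on how long the group takes to notice each kind of failure. */
    static long detectionBoundMs(Failure failure, long sessionTimeoutMs, long maxPollIntervalMs) {
        switch (failure) {
            case PROCESS_CRASHED:
                // No heartbeats arrive at all, so the coordinator expires the session.
                return sessionTimeoutMs;
            case PROCESSING_THREAD_STUCK:
                // Heartbeats continue, but poll() is never called again, so the
                // heartbeat thread sends a leave-group request after the poll interval.
                return maxPollIntervalMs;
            default:
                throw new IllegalArgumentException("unknown failure: " + failure);
        }
    }
}
```

With session.timeout.ms=30000 and max.poll.interval.ms=120000, a crashed process is noticed within roughly 30 seconds, while a stuck processing thread is noticed only after roughly 2 minutes.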