Kafka Consumer Rebalancing: In-Flight Message Processing is Aborted - apache-kafka

When our application scales up or down, the Kafka consumer group rebalances.
For example, when the application scales down, one of the consumers is killed and the partitions that were assigned to it are distributed across the other consumers in the group. When this happens, I see errors in my application logs saying that processing of the in-flight message has been aborted.
I know the entire consumer group pauses (i.e. does not read any new messages) while it is rebalancing. But what happens to the messages that were read by the consumers before pausing? Can we gracefully handle the messages that are currently being processed?
Thanks in advance!

Messages that were read but not committed are not counted as consumed when a rebalance occurs. After the rebalance completes, the consumers resume from the last committed offset, so those messages are delivered again and you won't lose any message, provided offsets are committed only after processing.
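
To make the resume behaviour concrete, here is a minimal Python sketch that simulates a single partition and its committed offset. It does not use a Kafka client: `Partition`, `consume` and the `abort_at` crash point are hypothetical names for illustration, and it assumes offsets are committed only after a message has been fully processed.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Toy model of one partition plus the group's committed offset."""
    messages: list
    committed: int = 0  # offset of the next message the group will read


def consume(partition, processed, abort_at=None):
    """Process messages from the committed offset, committing after each one.

    If abort_at is given, processing of that offset is interrupted before the
    commit, which mimics a consumer losing the partition mid-message.
    """
    offset = partition.committed
    while offset < len(partition.messages):
        if offset == abort_at:
            return  # in-flight message abandoned, offset not committed
        processed.append(partition.messages[offset])
        offset += 1
        partition.committed = offset  # commit only after processing


partition = Partition(messages=["m0", "m1", "m2", "m3"])
first_owner, second_owner = [], []

consume(partition, first_owner, abort_at=2)  # rebalance hits while m2 is in flight
consume(partition, second_owner)             # new owner resumes from committed offset

print(first_owner)   # ['m0', 'm1']
print(second_owner)  # ['m2', 'm3']
```

The aborted message m2 is not lost: because its offset was never committed, the consumer that takes over the partition reads it again.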

Related

How to ensure that a consumer resumes from the last read message of a specific partition?

Below is our scenario:
The number of partitions is the same as the number of consumers.
If any one consumer (say consumer 2, reading messages from partition 2) crashes or is not operational,
what happens to the unread messages stored in partition 2? Does another consumer pick them up?
or
Do the messages in partition 2 stay unread until consumer 2 restarts from the point where it stopped reading (before the crash)?
Goal:
No message in any partition should be read more than once by the consumer group.
Messages in every partition should be read in sequence across consumer restarts.

Stop Kafka consumer from consuming messages

Is there any way to stop Kafka consumers from consuming messages for some time?
I want the consumer to stop for a while and later resume consuming from the last unconsumed message.
Most Kafka libraries have close or pause methods on the Consumer implementation. Or, you could throw some fatal exception during consumption.
Resuming from the last committed offset for a consumer group is the default behavior.

Kafka consumer is taking time to recognize new partition

I was running a test where a Kafka consumer was reading data from multiple partitions of a topic. While the process was running, I added more partitions. It took around 5 minutes for the consumer thread to read data from the new partitions. I found the configuration "topic.metadata.refresh.interval.ms", but this is for the producer only. Is there a similar config for the consumer too?
When we add more partitions to an existing topic, a rebalance is initiated.
Every consumer in a consumer group is assigned one or more topic partitions exclusively, and a rebalance is the reassignment of partition ownership among consumers.
A rebalance happens when:
a consumer JOINS the group
a consumer SHUTS DOWN cleanly
a consumer is considered DEAD by the group coordinator. This may happen after a crash, when no heartbeats reach the group coordinator within session.timeout.ms, or when the consumer is busy with long-running processing and does not call poll() within max.poll.interval.ms
new partitions are added
We can tune two parameters to reduce the time a rebalance takes:
request.timeout.ms
max.poll.interval.ms
More detailed information is available here:
https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2
I changed the "metadata.max.age.ms" parameter value to refresh metadata more often. Its default is 300000 ms (5 minutes), which matches the delay observed. https://kafka.apache.org/documentation/#consumerconfigs_metadata.max.age.ms
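
As a rough illustration, the settings mentioned in these answers could be collected as follows. The keys are the Java consumer property names; the values are arbitrary examples rather than recommendations, and `consumer_settings` and `worst_case_discovery_seconds` are hypothetical names in a plain Python mapping, not any particular client's configuration object.

```python
# Consumer settings discussed above, keyed by their Java client property names.
# Values are illustrative assumptions; tune them for your own workload.
consumer_settings = {
    "metadata.max.age.ms": 30_000,    # refresh topic metadata every 30 s (default is 300000)
    "max.poll.interval.ms": 300_000,  # max gap between poll() calls before the consumer is evicted
    "request.timeout.ms": 30_000,     # how long the client waits for a broker response
}


def worst_case_discovery_seconds(settings):
    """Approximate upper bound, in seconds, before a metadata refresh can notice new partitions."""
    return settings["metadata.max.age.ms"] / 1000


print(worst_case_discovery_seconds(consumer_settings))  # 30.0
```

Lowering the metadata age trades faster discovery of new partitions for more frequent metadata requests to the brokers.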

Kafka behavior during partition re-balancing

Given the following scenario: There is a Kafka (2.1.1) topic with 2 partitions and one consumer. A producer sends a message with keyX to Kafka which ends up on partition 2. The consumer starts processing this message. At the same time a new consumer is starting up and Kafka re-balances the topic. Consumer 1 is now responsible only for partition 1, consumer 2 is responsible for partition 2. The producer sends a message again with the same keyX, this time it will be consumer 2 which processes the message.
Consumer 2 might be processing the message, while consumer 1 has not finished yet.
My question is whether this is a realistic scenario or not, since it might be a problem for me if different consumers processed messages with the same key at the same time.
Any thoughts on this are welcome, thanks a lot!
Yes, it's a realistic scenario. During a rebalance, consumer 1 gives up all of its assigned partitions. In your case, consumer 1 loses partitions 1 and 2, so it may not have committed its offset before the partition is reassigned. This depends partly on whether you have set the property enable.auto.commit to true. With this property set to true, the consumer periodically commits its current offset; the period is defined by auto.commit.interval.ms.
You can also be notified when a rebalance occurs through a ConsumerRebalanceListener. It lets you know when a partition is revoked or assigned.
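
The listener idea can be sketched in plain Python as below. This is a toy stand-in, not the Kafka client API: `CommitOnRevoke`, `record` and the two callback names are hypothetical, and the `committed` dict merely stands in for the offsets that the brokers store for the group. It assumes the application commits its progress when partitions are revoked.

```python
class CommitOnRevoke:
    """Toy stand-in for a rebalance listener; not the Kafka client API."""

    def __init__(self):
        self.processed = {}  # partition -> next offset to commit
        self.committed = {}  # partition -> offset recorded as committed

    def record(self, partition, offset):
        """Remember that the message at `offset` has been fully processed."""
        self.processed[partition] = offset + 1

    def on_partitions_revoked(self, partitions):
        """Commit progress for partitions that are about to be taken away."""
        for p in partitions:
            if p in self.processed:
                self.committed[p] = self.processed.pop(p)

    def on_partitions_assigned(self, partitions):
        """Return the offset to resume from for each newly assigned partition."""
        return {p: self.committed.get(p, 0) for p in partitions}


listener = CommitOnRevoke()
listener.record(partition=2, offset=41)      # finished message 41 on partition 2
listener.on_partitions_revoked([1, 2])       # rebalance starts
print(listener.on_partitions_assigned([2]))  # {2: 42}
```

Committing in the revoke callback narrows the window in which the next owner of partition 2 would reprocess messages that consumer 1 had already finished.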

Kafka Message at-least-once mode at multi-consumer

Kafka uses at-least-once message delivery to ensure every message is processed, and uses a message offset to indicate which message to deliver next.
When there are multiple consumers, if some deadly message causes a consumer to crash during message processing, will this message be redelivered to other consumers and spread the death? If some slow message blocks a single consumer, can other consumers keep going and process subsequent messages?
Or even worse, if a slow and deadly message causes a consumer to crash, will it cause other consumers to start from its offset again?
There are a few things to consider here:
A Kafka topic partition can be consumed by only one consumer in a consumer group at a time. Two consumers that belong to different groups can, however, consume from the same partition simultaneously.
Stored offsets are per consumer group. So each topic partition has a stored offset for each active (or recently active) consumer group with consumer(s) subscribed to that partition.
Offsets can be auto-committed at certain intervals, or manually committed (by the consumer application).
So let's look at the scenarios you described.
Some deadly message causes a consumer crash during message processing:
If offsets are auto-committed, chances are by the time the processing of the message fails and crashes the consumer, the offset is already committed and the next consumer in the group that takes over would not see that message anymore.
If offsets are manually committed after processing is done, then the offset of that message will not be committed because of the consumer crash (for simplicity, I am assuming one message is read and processed at a time, but this generalizes easily). So whichever consumer in the group takes over that partition will read the message again, and it's possible that it will crash other consumers too. If offsets are committed before message processing, then the next consumers won't see the message because the offset was already committed when the first consumer crashed.
Some slow message blocks a single consumer: As long as the consumer is considered alive, no other consumer in the group will take over. If the slowness goes beyond max.poll.interval.ms (session.timeout.ms in clients older than 0.10.1), the consumer will be considered dead and removed from the group. Whether another consumer in the group then reads that message depends on how and when the offset is committed.
Slow and deadly message causes a consumer crash: This scenario should be similar to the previous ones in terms of how Kafka handles it. Either slowness is detected first or the crash occurs first. Again the main thing is how/when the offset is committed.
I hope that helps with your questions.
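
The crash scenario above can be sketched with a small pure-Python simulation. It does not talk to Kafka: `run_group`, the `poison` marker and the consumer count are hypothetical, and it assumes one message is read and processed at a time, with each crashed consumer replaced by the next one in the group.

```python
def run_group(messages, poison, commit_before_processing, consumers=3):
    """Simulate consumers taking over one partition in turn.

    Returns (crashed_consumers, processed_messages).
    """
    committed = 0  # the group's committed offset for the partition
    processed = []
    crashed = 0
    for _ in range(consumers):
        offset = committed
        died = False
        while offset < len(messages):
            if commit_before_processing:
                committed = offset + 1
            if messages[offset] == poison:
                crashed += 1
                died = True
                break  # consumer crashes; the next one takes over
            processed.append(messages[offset])
            if not commit_before_processing:
                committed = offset + 1
            offset += 1
        if not died:
            break  # partition drained without a crash
    return crashed, processed


msgs = ["a", "bad", "c"]
print(run_group(msgs, "bad", commit_before_processing=False))  # (3, ['a'])
print(run_group(msgs, "bad", commit_before_processing=True))   # (1, ['a', 'c'])
```

Committing after processing keeps the deadly message from being lost but lets it crash every consumer that takes over; committing before processing limits the damage to one consumer at the cost of skipping that message.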