Selective Kafka rebalancing on Kubernetes infrastructure

I am running a kafka cluster with a set of consumers on a dockerized Kubernetes infrastructure. The typical workflow is that when a certain consumer (of the consumer group) dies, a rebalancing process will be triggered, and a new assignment of the partitions to the set of the consumers (excluding the failed one) is performed.
After some time, the Kubernetes controller will recreate/restart the consumer instance that failed/died, and a rebalance is performed again.
Is there any way to control the first rebalance (when the consumer dies), e.g. to wait a few seconds without rebalancing until the failed consumer returns or until a timeout is triggered? And if the consumer returns in time, continue consuming based on the old assignment (i.e., without a new rebalance)?

There are three parameters on the basis of which the group coordinator decides whether a consumer is dead or alive:
session.timeout.ms
max.poll.interval.ms
heartbeat.interval.ms
You can avoid unwanted rebalancing by tuning these three parameters, plus one rule of thumb: use a separate thread for calling third-party APIs inside the poll loop.
Tuning these three parameters requires answering the questions below (a configuration sketch follows after the links):
What is the value of max.poll.records?
How much time, on average, does the application take to process one record (message)?
How much time, on average, does the application take to process a complete batch?
Please refer to the Kafka consumer configs:
https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
You can also explore incremental cooperative rebalancing:
https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/
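For illustration, here is a rough sketch of what such tuning could look like on a plain Java consumer; the broker address, group id, and all numeric values are placeholder assumptions, not recommendations, and should be derived from the answers to the questions above:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumerExample {
    public static KafkaConsumer<String, String> buildConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");   // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");     // placeholder group id

        // Smaller batches shorten each pass through the poll loop.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
        // Must be larger than the worst-case time to process one full batch.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
        // How long a consumer may stay silent (no heartbeats) before it is declared dead.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30_000);
        // Rule of thumb: roughly 1/3 of session.timeout.ms.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 10_000);

        // Optional (Kafka 2.4+): incremental cooperative rebalancing, see the blog post above.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        return new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
    }
}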

Related

Suspending Camel KafkaConsumer

My app has N instances running. The number of instances is always greater than the number of Kafka partitions. E.g. 6 instances of a consumer group consuming from 4 Kafka partitions... so only 4 of the instances are actually consuming at any point.
In this context, can I suspend a Kafka consumer Camel route without causing Kafka to attempt to rebalance to other potential consumers? My understanding is that the suspended route would stop polling, causing one of the others to pick up the load.
This is not a Camel but a Kafka question. Rebalancing is handled by Kafka and is triggered whenever a consumer explicitly leaves the consumer group or silently dies (stops sending heartbeats).
Kafka 2.3 introduced a new feature called "Static Membership" to avoid rebalancing just because of a consumer restart.
But in your case (another consumer must take the load of a leaving consumer) I think Kafka must trigger a rebalancing over all consumers due to the protocol used.
See also this article for a quite deep dive into rebalancing and its trade-offs between availability and fault-tolerance.
Edit due to comments
If you want to avoid rebalancing, I think you would have to increase both session.timeout.ms (the heartbeat session timeout) and max.poll.interval.ms (the processing timeout).
But even if you set them very high, I guess it would not work reliably, because route suspension could still happen just before a heartbeat (simply bad timing).
See this q&a for the difference between session.timeout.ms and max.poll.interval.ms.
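For the restart case specifically, here is a minimal sketch of enabling Static Membership (assuming Kafka 2.3+ on both brokers and clients); the stable instance id is assumed to come from something like a StatefulSet pod name, and the timeout value is a placeholder:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StaticMembershipExample {
    public static KafkaConsumer<String, String> buildConsumer(String podName) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");    // placeholder group id

        // Static membership: a stable id per instance (e.g. a StatefulSet pod name) lets a
        // restarted consumer rejoin with its previous assignment, as long as it comes back
        // within session.timeout.ms.
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, podName);
        // Give the instance enough time to restart before the broker evicts it (placeholder value).
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 120_000);

        return new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
    }
}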

Processing kafka messages taking long time

I have a Python process (or rather, a set of processes running in parallel within a consumer group) that processes data according to inputs coming in as Kafka messages from a certain topic. Usually each message is processed quickly, but sometimes, depending on the content of the message, it may take a long time (several minutes). In this case, the Kafka broker disconnects the client from the group and initiates a rebalance. I could set session_timeout_ms to a really large value, but it would have to be something like 10 minutes or more, which means that if a client dies, the cluster would not be properly rebalanced for 10 minutes. This seems like a bad idea. Also, most messages (about 98% of them) are fast, so paying such a penalty for just 1-2% of messages seems wasteful. OTOH, large messages are frequent enough to cause a lot of rebalances and cost a lot of performance (since while the group is rebalancing, nothing gets done, and then the "dead" client re-joins and causes another rebalance).
So I wonder, are there any other ways to handle messages that take a long time to process? Is there any way to initiate heartbeats manually to tell the broker "it's OK, I am alive, I'm just working on the message"? I thought the Python client (I use kafka-python 1.4.7) was supposed to do that for me, but it doesn't seem to happen. Also, the API doesn't even seem to have a separate "heartbeat" function at all. And as I understand it, calling poll() would actually get me the next messages while I am not even done with the current one, and it would also mess up the iterator API for the Kafka consumer, which is quite convenient to use in Python.
In case it's important, the Kafka cluster is Confluent, version 2.3 if I remember correctly.
In Kafka 0.10.1+, polling and the session heartbeat are decoupled from each other.
You can get an explanation here.
max.poll.interval.ms defines how much time a consumer instance is permitted to take to finish processing before it times out; if processing takes longer than max.poll.interval.ms, the group coordinator presumes the consumer is dead, removes it from the consumer group, and invokes a rebalance.
Increasing it will increase the interval between expected polls, which gives the consumer more time to handle a batch of records returned from poll(long).
But at the same time, it will also delay group rebalances, since the consumer will only join a rebalance inside the call to poll.
session.timeout.ms is the timeout used to identify whether the consumer is still alive and sending heartbeats at a defined interval (heartbeat.interval.ms). In general, the rule of thumb is that heartbeat.interval.ms should be 1/3 of the session timeout, so in case of a network failure a consumer can miss at most 3 heartbeats before the session times out.
session.timeout.ms: a low value is good for detecting failures more quickly.
max.poll.interval.ms: a large value reduces the risk of failure due to increased processing time, but it also increases the rebalancing time.
Note: a large number of partitions and topics consumed by the consumer group also affects the overall rebalance time.
The other approach, if you really want to get rid of rebalancing, is to assign partitions to each consumer instance manually using assign(). In that case, each consumer instance runs independently with its own assigned partitions, but you will not be able to leverage the rebalance feature to assign partitions automatically (see the sketch below).
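Here is a minimal sketch of that manual-assignment approach, assuming a topic called my-topic and a placeholder broker address; note that with assign() there is no group rebalancing, but also no automatic failover if this instance dies:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualAssignExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder address
        // No group-managed assignment is used; offsets can still be committed under this group id.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");    // placeholder group id

        try (KafkaConsumer<String, String> consumer =
                new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
            // This instance owns partitions 0 and 1 of "my-topic"; other instances own the rest.
            consumer.assign(Arrays.asList(
                    new TopicPartition("my-topic", 0),
                    new TopicPartition("my-topic", 1)));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Long processing here cannot trigger a group rebalance,
                    // because there is no group membership to lose.
                    System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
                }
            }
        }
    }
}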

Kafka is marking the coordinator dead when autoscaling is on

We run a Kubernetes cluster with Kafka 0.10.2. In the cluster we have a replica set of 10 replicas running one of our services, which consume from one topic as one consumer-group.
Lately we turned on the autoscaling feature for this replica-set, so it can increase or decrease the number of pods, based on its CPU consumption.
Soon after this feature was enabled we started to see lags in our Kafka queue. I looked at the log and saw that the consumer is marking the coordinator dead very often (almost every 5 minutes) and then reconnecting to the same coordinator a few seconds later.
I also saw frequently in the logs:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
It normally takes a few seconds to process a message, and we never had this kind of issue before. I assume the problem relates to a bad partition assignment, but I can't pinpoint it.
If we kill a pod that got "stuck", Kafka reassigns its partitions to another pod and that one gets stuck as well; but if we scale the replica set down to 0 and then back up, the messages are consumed quickly!
Relevant consumer configurations:
heartbeat.interval.ms = 3000
max.poll.interval.ms = 300000
max.poll.records = 500
session.timeout.ms = 10000
Any suggestions?
I am not saying this is the problem, but Spring Kafka 1.1.x had a very complicated threading model (required by the 0.9 clients). For long-running listeners we had to pause/resume the consumer thread; I saw some issues with early Kafka versions where the resume didn't always work.
KIP-62 allowed us to greatly simplify the threading model.
This was incorporated into the 1.3.x line.
I would suggest upgrading to at least cloud-stream Ditmars, which uses spring-kafka 1.3.x. The current 1.3.x version is 1.3.8.

Kafka consumer behavior when one consumer goes down/crash

My topic has 115 partitions and around 130 consumers. I expect 115 consumers to be in an active state (a 1-to-1 assignment) and the remaining 15 consumers to be idle.
A few times I observed high memory usage and the JVM in a hung state, due to which a rebalance was triggered. However, I am unsure whether this causes a full rebalance (i.e., do the healthy nodes' assignments also get changed?) or whether only the dead node's partitions get assigned to one of the idle nodes.
Also, in case of a restart of the application (mine is distributed, 1 thread/consumer per JVM), how does the rebalance behave? As the nodes start one by one (rolling restart), will the rebalance happen 115 times (i.e., every time a new consumer joins the group), or is some threshold/wait applied before kick-starting the rebalance (to ensure all healthy nodes have joined)?
Consumer rebalance is triggered anytime a Kafka consumer with the same group ID joins the group or leaves. Leaving the consumer group can be done explicitly by closing a consumer connection, or by timeout if the JVM or server crashed.
So in your case, yes, a rolling restart of the consumers would trigger 115 consumer rebalances. There is no "threshold" or "wait period" before starting a rebalance in Kafka.
By default RangeAssignor is used, which may cause even healthy consumers to get different partitions assigned to them over and over again when something happens to another node. It may also mean that partitions are taken away from healthy consumers. You can tweak this to use a different implementation of the PartitionAssignor interface, for example StickyAssignor (see the configuration sketch below):
"one advantage of the sticky assignor is that, in general, it reduces the number of partitions that actually move from one consumer to another during a reassignment".
I would also recommend reading https://medium.com/@anyili0928/what-i-have-learned-from-kafka-partition-assignment-strategy-799fdf15d3ab if you want a deep dive into how it works underneath.
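For reference, here is a minimal sketch of switching the assignment strategy (the broker address and group id are placeholders):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class StickyAssignorConfigExample {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");    // placeholder group id
        // StickyAssignor tries to preserve existing assignments across rebalances,
        // so fewer partitions change owner when a member leaves or joins.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }
}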

max.poll.interval.ms set to Integer.MAX_VALUE by default

Apache Kafka documentation states:
The internal Kafka Streams consumer max.poll.interval.ms default value
was changed from 300000 to Integer.MAX_VALUE
Since this value is used to detect when the processing time for a batch of records exceeds a given threshold, is there a reason for such an "unlimited" value?
Does it enable applications to become unresponsive? Or Kafka Streams has a different way to leave the consumer group when the processing is taking too long?
Does it enable applications to become unresponsive? Or Kafka Streams has a different way to leave the consumer group when the processing is taking too long?
Kafka Streams leverages a heartbeat functionality of the Kafka consumer client in this context, and thus decouples heartbeats ("Is this app instance still alive?") from calls to poll(). The two main parameters are session.timeout.ms (for the heartbeat thread) and max.poll.interval.ms (for the processing thread), and their difference is described in more detail at https://stackoverflow.com/a/39759329/1743580.
Heartbeating was introduced so that an application instance may be allowed to spend a lot of time processing a record without being considered "not making progress" and thus "dead". For example, your app can do a lot of crunching on a single record for a minute while still heartbeating to Kafka: "Hey, I'm still alive, and I am making progress. But I'm simply not done with the processing yet. Stay tuned."
Of course you can change max.poll.interval.ms from its default (Integer.MAX_VALUE) to a lower setting if, for example, you actually do want your app instance to be considered "dead" if it takes longer than X seconds in-between polling records, and thus if it takes longer than X seconds to process the latest round of records. It depends on your specific use case whether or not such a configuration makes sense -- in most cases, the default setting is a safe bet.
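If you do want such a bound, here is a sketch of what overriding it in a Kafka Streams configuration could look like (the application id, bootstrap servers, and the 5-minute value are assumptions):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsPollIntervalExample {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");  // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");   // placeholder address
        // Override the internal consumer's max.poll.interval.ms down to 5 minutes,
        // so an instance stuck between polls is eventually considered failed.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG),
                300_000);
        return props;
    }
}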
session.timeout.ms: The timeout used to detect consumer failures when using Kafka's group management facility. The consumer sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this consumer from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.
max.poll.interval.ms: The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.