KafkaStreams stop consuming partitions after partition leader rebalance - apache-kafka

We have experimented an issue that could be caused by the parameter auto.leader.rebalance.enable, which is set to true by default on brokers.
In detail, when the automatic rebalance occurs, for example after a broker restart, some partition leaders are moved to match the preferred leader.
After this event, some stateful Kafka Streams applications blocks on the source partitions whose leader has been moved and the consumer lag start to grow.
Is it a known issue? Why don't applications receive the information regarding the change of leader?
The tactical solution we found in case we need to execute a rolling restart of brokers is:
Stop stateful applications
Perform brokers rolling restarts.
Wait 5 minutes (default value) untile the automatic leader rebalance occurs
Start stateful applications.
We are using Confluent Platform Community 5.2.2, deployed on a 3 node on prem cluster.
We are trying to recreate what happened in the test environment but without success. is it possible that it is influenced by the load of the cluster, much lower in test?
Thanks in Advance!
Giorgio

Related

Does even number of Kafka brokers affect ensemble operation and leader election

we are little confuse about if is valid or not to use kafka cluster with even numbers
the story begin when we build kafka cluster with 6 machines and trying to use the kafka Reassignment partitions tool , because non balance of kafka brokers
unfortunately , kafka Reassignment partitions tool not succeeded to Reassignment partitions
so we suspect about the even number of our kafka machines
so is it true or false ? , to use kafka even number?
As commented, with the official Apache Users email thread
No, Kafka clusters have no such restrictions as that of a Zookeeper Quorum (see definition of quroum) ; in order to elect a leader, "more than half must agree" on a value. That's not possible with an even number.
kafka Reassignment partitions tool not succeeded
You'll need to be more specific. I assume the rest of the cluster works, so why should an even number be an issue?
Failure of reassignment is unlikely to be caused due to the number of brokers being 'even'.
It must have been caused due to follower brokers not being in sync with the leader, or due to a network issue, or due to the leader broker going offline at a critical time.
The Reassignment tool works well if the partitions are all in sync before running the tool.
The tool will usually fail if Partitions are out of sync or if
leader is down an election for the leader is in progress while the tool is running.
As a best practice, it is recommended to make sure the following conditions are met before running the tool:
At least one partition is in sync with the leader
Leader is healthy and no election is happening.
Out-of-sync partitions are not nominated as targets for reassignment.

Zookeeper failures in Kafka 0.9 and above

Based of the answer given in Is Zookeeper a must for Kafka?.
It is clear that what is the responsibility of Zookeeper in Kafka 0.9 and above
I just wanted to understand what will be the impact if zookeeper cluster goes down completely?
kafka uses ZK for membership (figure out what brokers exist and which of them are alive) and leader election (elect the one broker that is controller for the cluster at any moment).
simply put - if ZK fails kafka dies.
if ZK sneezes (say a particularly long GC pause or a short network connectivity issue) a kafka cluster may temporarily "lose" any number of members and/or the controller. by the time this settles you may have a new controller and new leader brokers for all partitions (which may or may not cause loss of acknowledged data, see "unclean leader election"). I'm not sure if all ongoing transactions would be rolled back - never tried.

What happens to consumer groups in Kafka if the entire cluster goes down?

We have a consumer service that is always trying to read data from a topic using a consumer group. Due to redeployments, our Kafka cluster periodically is brought down and recreated again.
Whenever the cluster comes back again, we observed that although the previous topics are picked up (probably from zookeeper), the previous consumer groups are not created. Because of this, our running consumer process which is created with a previous consumer group gets stuck and never comes out.
Is this how the behavior of the consumer groups should be or is there a configuration we need to enable somewhere?
Any help is greatly appreciated.
Kafka Brokers keep a cache of healthy consumers and consumer groups, if the entire cluster is destroyed/recreated it no longer has knowledge of those consumers and groups, including offsets. The consumers will have to reconnect and re-establish the group and offsets from the beginning of the topic.
Operationally it makes more sense to keep the Kafka cluster running long-term, and do version upgrades in a rolling fashion so you don't interrupt the service.

Fixing under replicated partitions in kafka

In our production environment, we often see that the partitions go under-replicated while consuming the messages from the topics. We are using Kafka 0.11. From the documentation what is understand is
Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since last fetch request from the replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms will be considered out of sync.
How do we fix this issue? What are the different reasons for replicas go out of sync? In our scenario, we have all the Kafka brokers in the single RACK of the blade servers and all are using the same network with 10GBPS Ethernet(Simplex). I do not see any reason for the replicas to go out of sync due to the network.
We faced the same issue:
Solution was:
Restart the Zookeeper leader.
Restart the broker\brokers that are not replicating some of the partitions.
No data lose.
The issue is due to a faulty state in ZK, there was an opened issue on ZK for this, don't remember the number.
I faced the same issue on Kafka 2.0,
On restart Kafka controller node everything caught-up on the replicas.
But still looking for the reasons why few partitions are under-replicated whereas the other partitions on the same nodes for the same topic works good, and this issue i see on a random partitions.
Do NOT run reassignment for all topics together, consider running it for small portions.
Find the topic that has under-replicated partitions and where reassignment process can't be completed.
Set unclean.leader.election.enable to true for this topic.
Find under-replicated partition that stuck for this topic. Check its leader ID.
Stop the broker (just the service, not the instance).
Execute Preferred Replica Election (in yahoo/kafka-manager or manually).
Start the broker back.
Repeat for the rest of topics that have the same problem.
Also I tried this advice, it didn't help me: https://stackoverflow.com/a/51063607/1929406

Maximum value for zookeeper.connection.timeout.ms

Right now we are running kafka in AWS EC2 servers and zookeeper is also running on separate EC2 instances.
We have created a service (system units ) for kafka and zookeeper to make sure that they are started in case the server gets rebooted.
The problem is sometimes zookeeper severs are little late in starting and kafka brokers by that time getting terminated.
So to deal with this issue we are planning to increase the zookeeper.connection.timeout.ms to some high number like 10 mins, at the broker side. Is this a good approach ?
Are there any size effect of increasing the zookeeper.connection.timeout.ms timeout in zookeeper ?
Increasing zookeeper.connection.timeout.ms may or may not handle your problem in hand but there is a possibility that it will take longer time to detect a broker soft failure.
Couple of things you can do:
1) You must alter the System to launch the kafka to delay by 10 mins (the time you wanted to put in zookeper timeout).
2) We are using HDP cluster which automatically takes care of such scenarios.
Here is an explanation from Kafka FAQs:
During a broker soft failure, e.g., a long GC, its session on ZooKeeper may timeout and hence be treated as failed. Upon detecting this situation, Kafka will migrate all the partition leaderships it currently hosts to other replicas. And once the broker resumes from the soft failure, it can only act as the follower replica of the partitions it originally leads.
To move the leadership back to the brokers, one can use the preferred-leader-election tool here. Also, in 0.8.2 a new feature will be added which periodically trigger this functionality (details here).
To reduce Zookeeper session expiration, either tune the GC or increase zookeeper.session.timeout.ms in the broker config.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ
Hope this helps