How do I measure the time taken for a Kafka rebalance? - apache-kafka

I am very new to Kafka, so this question might be very basic.
What I am trying to achieve is to measure the time it takes to rebalance when a broker fails and is then added back.
From my reading of the documentation (http://kafka.apache.org/documentation/#basic_ops_restarting), when a broker fails or is taken down for maintenance:
It will sync all its logs to disk to avoid needing to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery takes time so this speeds up intentional restarts.
It will migrate any partitions the server is the leader for to other replicas prior to shutting down. This will make the leadership transfer faster and minimize the time each partition is unavailable to a few milliseconds.
What I want to do is find out the time taken to migrate any partitions that the server is the leader for to other replicas.
My Kafka setup has 3 broker nodes and 3 ZooKeeper nodes.
Also, when I add this node back and auto.leader.rebalance.enable is true, the rebalance kicks in again and the preferred leader is re-elected.
How do I measure this time as well?

There is no "migration" in the sense of a data copy. When shutting down a broker cleanly, the controller will simply elect a new leader from the available replicas for all partitions the broker was the leader of, making the transition fast.
There are a few metrics you can use to monitor these leader elections.
Since 0.11.0.0, the broker exposes a number of Controller metrics including:
kafka.controller:type=ControllerStats,name=AutoLeaderBalanceRateAndTimeMs
This tracks the rate and duration of auto leader rebalance. The full list of controller metrics that were added in 0.11 is available in the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-143%3A+Controller+Health+Metrics#KIP-143:ControllerHealthMetrics-ControllerMetrics
If you are running an older version (< 0.11.0.0), you'll have to rely on metrics like:
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
This includes all leader elections, not only those triggered by the automatic leader rebalance.
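If you want to read these timers programmatically rather than through a monitoring agent, here is a minimal sketch using the standard javax.management client API. It assumes the broker was started with JMX enabled (for example JMX_PORT=9999); the host, port and the attributes read here (Count, Mean, Max) are illustrative and depend on your setup.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LeaderElectionTimer {
    public static void main(String[] args) throws Exception {
        // Assumes the broker exposes JMX on localhost:9999 (e.g. started with JMX_PORT=9999)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName timer = new ObjectName(
                    "kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs");
            // The timer behind this MBean tracks election counts and durations in milliseconds.
            System.out.println("Elections so far: " + mbsc.getAttribute(timer, "Count"));
            System.out.println("Mean (ms): " + mbsc.getAttribute(timer, "Mean"));
            System.out.println("Max (ms): " + mbsc.getAttribute(timer, "Max"));
        }
    }
}

The same sketch works for AutoLeaderBalanceRateAndTimeMs by swapping the MBean name.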

Related

Kafka - broker partitions not in-sync after restart

We run a 3-node Kafka cluster on 2.7.0 with quite a high number of topics and partitions. Almost all the topics have only 1 partition and a replication factor of 3, so that gives us roughly:
topics: 7325
partitions total in cluster (including replica): 22110
Brokers are relatively small, with:
6 vCPUs
16 GB memory
500 GB in /var/lib/kafka occupied by partition data
As you can imagine, because we have 3 brokers and replication factor 3, the data is spread very evenly across the brokers. Under normal circumstances each broker leads a very similar (the same) number of partitions, and the number of partitions per broker is equal.
Before doing a rolling restart yesterday everything was in-sync. We stopped the process and started it again after 1 minute. It took some 10 minutes to synchronize with ZooKeeper and start listening on its port.
After logging 'Kafka server started', nothing happens. There is no CPU, memory or disk activity. The partition data is visible on the data disk. There have been no messages in the log for more than 1 day now since the process booted up.
We've tried restarting the ZooKeeper cluster (one node at a time). We've tried restarting the broker again. It's now been 24 hours since the last restart and still no change.
The broker itself reports that it leads 0 partitions. Leadership for all the partitions has moved to other brokers, and they report that everything located on this broker is not in sync.
I'm aware the number of partitions per broker far exceeds the recommendation, but I'm still confused by the lack of any activity or log messages. Any ideas what should be checked further? It looks like something is stuck somewhere. I checked the Kafka ACLs and there is nothing blocking the broker's username.
I tried another restart in DEBUG mode and it seems there is some problem with the metadata. These two messages are constantly repeating:
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
With kcat it's also impossible to fetch metadata about topics when I specify this broker as the bootstrap server.
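For completeness, here is a minimal sketch of the kind of client-side check that can confirm what this broker reports as the controller (using the Java AdminClient; the bootstrap address is a placeholder for the stuck broker):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class ControllerCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder: point this at the broker that appears stuck
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            Node controller = cluster.controller().get();
            System.out.println("Controller according to this broker: " + controller);
            System.out.println("Known nodes: " + cluster.nodes().get());
        }
    }
}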

Will Kafka brokers share the same location to store data logs if they are in a cluster?

I am reading an article on Kafka basics. It says that if one of the Kafka brokers, brokerX, dies in a cluster, then brokerX's data copies will move to the other live brokers in the cluster.
If that is the case, will ZooKeeper/the Kafka controller copy the brokerX data folder and move it to the live brokers, like a copy-paste from one machine's hard disk to another (a physical copy)?
Or do the live brokers share a common location, so that ZooKeeper/the controller will just link/point to the brokerX locations (a logical copy)?
I'm having a little trouble understanding this. Could someone help me with it?
If a broker dies, it's dead. There's no background process that will copy data off of it.
The replication of topics only happens while the broker is running.
Plus, that image is wrong. Partitions = 2 means exactly that; a third partition doesn't just appear when a broker dies.
This all depends on whether the topic has a replication factor > 1. In that case, brokers holding a follower replica are constantly sending fetch requests to the leader replica (a specific broker), with the goal of being fully caught up with the leader (both the follower replica and the leader replica having the same records stored on disk).
So when a broker goes down, all it takes is for the controller to select and promote an in-sync replica (by default, though it can be configured to select non-in-sync replicas) to take over as leader of the partition. No copy/paste is required: all brokers holding a replica of a partition of that topic (whether as follower or leader) were storing the same information prior to the shutdown.
If a broker dies, the behaviour depends on the role of the dead broker. If it was not the leader for its partitions, it's not a problem: when the broker comes back online it will copy all the missing data from the leader replicas. If the dead broker was the leader for a partition, a new leader will be elected according to some rules. If the newly elected leader was in sync before the old leader died, there will be no message loss, and the follower brokers will sync their replicas from the new leader, as the broken broker will do once it is up again. If the newly elected leader was not in sync, you might have some message loss. Either way, you can drive the behaviour of your Kafka cluster by setting various parameters to balance speed, data integrity and reliability.
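To see this replication layout for yourself, you can inspect which broker currently leads each partition and which replicas are in sync. Here is a minimal sketch using the Java AdminClient; the topic name "my-topic" and the bootstrap address are placeholders.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("my-topic"))
                    .all().get().get("my-topic");
            // For each partition: current leader, full replica set, in-sync replicas.
            description.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}

After a broker goes down you should see the leader move to one of the remaining replicas, while the replica set itself stays the same.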

KafkaStreams stop consuming partitions after partition leader rebalance

We have experienced an issue that could be caused by the parameter auto.leader.rebalance.enable, which is set to true by default on brokers.
In detail, when the automatic rebalance occurs, for example after a broker restart, some partition leaders are moved to match the preferred leader.
After this event, some stateful Kafka Streams applications block on the source partitions whose leader has been moved, and the consumer lag starts to grow.
Is this a known issue? Why don't the applications receive the information about the leader change?
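For reference, the behaviour is driven by two broker settings: auto.leader.rebalance.enable (true by default) and leader.imbalance.check.interval.seconds (300 by default, which is the 5-minute wait mentioned below). Here is a minimal sketch of how these can be checked with the Java AdminClient; the bootstrap address and broker id are placeholders.

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class CheckRebalanceSettings {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1"); // broker id
            Config config = admin.describeConfigs(Collections.singleton(broker)).all().get().get(broker);
            for (String name : Arrays.asList(
                    "auto.leader.rebalance.enable",            // true by default
                    "leader.imbalance.check.interval.seconds"  // 300 by default
            )) {
                System.out.println(name + " = " + config.get(name).value());
            }
        }
    }
}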
The tactical solution we found in case we need to execute a rolling restart of brokers is:
Stop the stateful applications.
Perform the rolling restart of the brokers.
Wait 5 minutes (the default value) until the automatic leader rebalance occurs.
Start the stateful applications again.
We are using Confluent Platform Community 5.2.2, deployed on a 3-node on-prem cluster.
We are trying to recreate what happened in the test environment, but without success. Is it possible that it is influenced by the load on the cluster, which is much lower in test?
Thanks in Advance!
Giorgio

What are the impacts of a Kafka broker being inactive for a long duration and starting up after many days?

We are tackling a production issue which might take a few days to fix. The majority of Kafka nodes are active; one node is down. We will bring it up after the bugs are fixed. Our Kafka version is 2.1.x.
I was curious what the impacts are of starting an inactive broker after a few days.
Are there any issues we might observe? (Especially impacts on consumers after the replicas catch up on the restarted broker.)
What are the contingencies to roll out safely?
Whenever a broker is down, it's recommended to restore it as quickly as you can: consumer offsets expire, and log-end offsets also keep moving as logs are cleaned regularly in an active cluster.
We were able to restore the node after 4 days, but it wasn't an easy operation. We restored the Kafka cluster by enabling unclean leader election. We were running controlled shutdowns due to bad leader assignments. After the inactive node was restored, we disabled unclean leader election again.
Things to take into account:
In prod the clients usually can't have any downtime. Monitor consumer groups for any long rebalances or lagging commits beyond your SLAs.
Run a preferred replica election once the replicas on the restored node are live.
Reset offsets on the consumer group if needed; this does require a short downtime.
Rollback:
You can roll back topic partition assignments using the reassignment tool, but there is no easy rollback.
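As an illustration of the unclean-leader-election toggle described above, here is a rough sketch using the Java AdminClient to enable it at the topic level and turn it off again afterwards. The topic name and bootstrap address are placeholders; newer clients would prefer incrementalAlterConfigs, and setting it as a broker/cluster config is another option.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class ToggleUncleanElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Temporarily allow an out-of-sync replica to become leader for this topic.
            Config enable = new Config(Collections.singleton(
                    new ConfigEntry("unclean.leader.election.enable", "true")));
            admin.alterConfigs(Collections.singletonMap(topic, enable)).all().get();
            // ... once the cluster has recovered, revert to the safe default:
            Config disable = new Config(Collections.singleton(
                    new ConfigEntry("unclean.leader.election.enable", "false")));
            admin.alterConfigs(Collections.singletonMap(topic, disable)).all().get();
        }
    }
}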

Zookeeper failures in Kafka 0.9 and above

Based on the answer given in Is Zookeeper a must for Kafka?,
it is clear what the responsibility of ZooKeeper is in Kafka 0.9 and above.
I just wanted to understand: what will be the impact if the ZooKeeper cluster goes down completely?
Kafka uses ZK for membership (figuring out which brokers exist and which of them are alive) and leader election (electing the one broker that is the controller for the cluster at any given moment).
Simply put: if ZK fails, Kafka dies.
If ZK sneezes (say a particularly long GC pause or a short network connectivity issue), a Kafka cluster may temporarily "lose" any number of members and/or the controller. By the time this settles you may have a new controller and new leader brokers for all partitions (which may or may not cause loss of acknowledged data, see "unclean leader election"). I'm not sure if all ongoing transactions would be rolled back; I've never tried.