I have a setup of 4 Kafka brokers. Each partition of each topic in my setup has a replication factor of 2. All partitions are balanced: leaders and followers are uniformly distributed across the brokers.
This setup has been running for over 6 months.
While monitoring the setup via Kafka Manager I see that 8% of my partitions are under-replicated.
All of these partitions are assigned to the same set of replicas, and every partition assigned to that set of replicas is displayed as under-replicated.
Let's call this set of brokers [1,2], i.e. brokers 1 and 2. The ISR for all of these partitions is currently [1].
Both brokers 1 and 2 are up and running. All other partitions have the expected ISR count.
The script bin/kafka-topics.sh also reports 8% of the partitions as under-replicated.
But the Jolokia metric UnderReplicatedPartitions is 0.
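For reference, this is a sketch of how the two views can be cross-checked; zk-host:2181 is a placeholder for the ZooKeeper connection string (newer releases use --bootstrap-server instead of --zookeeper), and the MBean shown is the one that typically backs the Jolokia UnderReplicatedPartitions metric on each broker:

bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --under-replicated-partitions

kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions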
I need help answering:
Is there an issue?
Why is there an inconsistency between the Jolokia metric and the Kafka console tools?
How can I fix the issue?
I can't say anything about the Jolokia metric, but we experienced the same thing because we had a "slow" broker which was lagging behind with replicating the data.
"Slow" meaning that the replication requests sometimes breached the broker-wide configuration replica.lag.time.max.ms, which defaults to 10 seconds and is described as:
"If a follower hasn't sent any fetch requests or hasn't consumed up to the leaders log end offset for at least this time, the leader will remove the follower from isr"
Slightly increasing this configuration solved the problem for us.
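For illustration, this is the kind of broker-side change involved; it is only a sketch, the 30000 ms value is an example rather than a recommendation, and the setting is a static broker config that typically requires a broker restart to take effect:

# server.properties on each broker (default is 10000 ms)
replica.lag.time.max.ms=30000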
We run 3-node Kafka clusters on 2.7.0 with quite a high number of topics and partitions. Almost all topics have only 1 partition and a replication factor of 3, so that gives us roughly:
topics: 7325
partitions total in cluster (including replica): 22110
Brokers are relatively small with
6 vCPU
16 GB memory
500 GB in /var/lib/kafka occupied by partition data
As you can imagine, because we have 3 brokers and a replication factor of 3, the data is spread very evenly across brokers. Each broker leads roughly the same number of partitions, and the total number of partitions per broker is equal, under normal circumstances.
Before doing a rolling restart yesterday, everything was in sync. We stopped the process and started it again after 1 minute. It took some 10 minutes to synchronize with ZooKeeper and start listening on its port.
After logging 'Kafka server started', nothing is happening. There is no CPU, memory or disk activity. The partition data is visible on the data disk. There have been no messages in the log for more than a day now since the process booted up.
We've tried restarting the ZooKeeper cluster (one node at a time), and we've tried restarting the broker again. It's now been 24 hours since the last restart and still no change.
The broker itself reports that it leads 0 partitions. Leadership of all its partitions has moved to the other brokers, and they report that every replica located on this broker is not in sync.
I'm aware that the number of partitions per broker far exceeds the recommendation, but I'm still confused by the lack of any activity or log messages. Any ideas what should be checked further? It looks like something is stuck somewhere. I checked the Kafka ACLs and there is nothing blocking the broker's username.
I tried another restart with DEBUG logging enabled and it seems there is some problem with the metadata. These two messages are constantly repeating:
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
With kcat it's also impossible to fetch metadata about topics when I specify this broker as the bootstrap server.
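For what it's worth, the controller state that these messages refer to can be inspected directly in ZooKeeper; this is just a sketch, with zk-host:2181 standing in for the actual ZooKeeper connection string:

bin/zookeeper-shell.sh zk-host:2181 get /controller
bin/zookeeper-shell.sh zk-host:2181 ls /brokers/ids

The first command should return a small JSON blob naming the broker id of the active controller, and the second should list the ids of all registered brokers, including the one that appears stuck.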
I have a Kafka topic that somehow went from 3 ISRs to 1 ISR, in a Kafka cluster with 3 brokers. I changed the minimum ISR from 2 to 1 to allow it to keep functioning. Presumably the other brokers are trying to replicate the topic from the leader; how can I monitor their progress?
You can actually monitor this metric to see the replication lag:
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
Lag in number of messages per follower replica. This is useful to know if the replica is slow or has stopped replicating from the leader.
As stated in https://kafka.apache.org/documentation/
Yannick
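As an illustration, this MBean can be read from the command line with the JmxTool class shipped with Kafka, assuming the broker was started with remote JMX enabled (e.g. JMX_PORT=9999). The host, port, topic, partition and fetcher thread id below are placeholders; follower replication threads register a clientId of the form ReplicaFetcherThread-<fetcherId>-<sourceBrokerId>:

bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi \
  --object-name 'kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=ReplicaFetcherThread-0-1,topic=my-topic,partition=0'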
I have an issue with my Kafka cluster.
I have 3 brokers, so when I stop broker 1 (for example), each topic partition whose leader was broker 1 changes its leader to the second broker in its replica list.
So this is the expected behavior and it works fine.
But when I restart broker 1, I need to execute:
./kafka-preferred-replica-election.sh --zookeeper myHost
because the current leader is the other replica.
So my question is:
is there a way to configure Kafka to do this automatically?
Thanks
I'm assuming your default (when all brokers are running) assignment is balanced, and the preferred leaders are evenly spread.
Yes, Kafka can re-elect the preferred leaders for all the partitions automatically when a broker is restarted. This is actually enabled by default; see auto.leader.rebalance.enable.
Upon restarting a broker, Kafka can take up to leader.imbalance.check.interval.seconds to trigger the re-election. This defaults to 5 minutes. So maybe you just did not wait long enough!
There is also leader.imbalance.per.broker.percentage which defines the percentage of non-preferred leaders allowed. This defaults to 10%.
For the full details about these configurations, see the broker config section on Kafka's website.
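Put together, the relevant broker settings look roughly like this; the values shown are the defaults mentioned above, and they go in each broker's server.properties:

# defaults: automatic preferred leader election, checked every 5 minutes,
# triggered once more than 10% of a broker's leaders are non-preferred
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10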
I have the following setup:
3 Kafka brokers and a 3-node ZooKeeper ensemble
1 topic with 12 partitions and 3 replicas (each Kafka broker is thus the leader of 4 partitions)
I stop one of the brokers - it gets removed from the cluster, leadership of its partitions is moved to the two remaining brokers
I start the broker back - it reappears in the cluster, and eventually the leadership gets rebalanced so each broker is the leader of 4 partitions.
It works OK, except I find the time spent before the rebalancing too long (like minutes). This happens under no load - no messages are sent to the cluster, no messages are consumed.
Kafka version 0.9.0.0, ZooKeeper 3.4.6
zookeeper tickTime = 2000
kafka zookeeper.connection.timeout.ms = 6000
(basically the default config)
Does anyone know what config parameters in Kafka and/or ZooKeeper influence the time taken for the leader rebalancing?
As said in the official documentation, http://kafka.apache.org/documentation.html#configuration (more details about broker configuration can be found in the Scala class kafka.server.KafkaConfig),
there actually is a leader.imbalance.check.interval.seconds property which defaults to 300 (5 minutes); setting it to 30 seconds does what I need.
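Concretely, the change is a one-line broker setting; a sketch, which goes in each broker's server.properties and takes effect after a restart:

# auto.leader.rebalance.enable already defaults to true; only the check interval changes
leader.imbalance.check.interval.seconds=30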
The relatively scarce documentation for Kafka 0.8 does not mention the expected behaviour for balancing existing topics, partitions and replicas across brokers.
More specifically, what is the expected behaviour on arrival of a broker and on crash of a broker (leader or not)?
Thanks.
I tested those 2 cases a while ago, not under heavy load. I had one producer sending 10k messages (just a little string) synchronously to a topic with a replication factor of 2 and 2 partitions, on a cluster of 2 brokers. There were 2 consumers. Each component was deployed on a separate machine. What I observed is:
On normal operation: broker 1 is the leader for partition 1 and a replica for partition 2; broker 2 is the leader for partition 2 and a replica for partition 1. Bringing a broker 3 into the cluster does not trigger a partition rebalance automatically.
On broker revival (crash then reboot): rebalancing is transparent to the producer and consumers. The rebooting broker replicates the log first and then makes itself available.
On broker crash (leader or not): simulated by a kill -9 on any one broker. The producer and consumers freeze until the ephemeral node of the killed broker in ZooKeeper expires. After that, operations resume normally.
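That frozen window is essentially bounded by the broker's ZooKeeper session timeout, which controls how long the ephemeral node survives after a hard kill. A sketch, where 6000 ms is the 0.8-era default rather than a tuned value:

# server.properties (Kafka 0.8 broker)
# how long ZooKeeper keeps the broker's ephemeral node alive after the process dies
zookeeper.session.timeout.ms=6000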