Automatically change Kafka topic partition leader - apache-kafka

I have an issue with my Kafka cluster.
I have 3 brokers, so when I stop broker 1 (for example), each topic partition whose leader was broker 1 switches its leader to the second broker in its replica list.
This is the expected behaviour and it works fine.
But when I restart broker 1, I need to execute:
./kafka-preferred-replica-election.sh --zookeeper myHost
because the current leader is still the other replica.
So my question is: is there a way to configure Kafka to do this automatically?
Thanks.

I'm assuming your default (when all brokers are running) assignment is balanced, and the preferred leaders are evenly spread.
Yes, Kafka can re-elect the preferred leaders for all the partitions automatically when a broker is restarted. This is actually enabled by default; see auto.leader.rebalance.enable.
Upon restarting a broker, Kafka can take up to leader.imbalance.check.interval.seconds to trigger the re-election. This defaults to 5 minutes, so maybe you just did not wait long enough!
There is also leader.imbalance.per.broker.percentage, which defines the percentage of non-preferred leaders allowed. This defaults to 10%.
For the full details about these configurations, see the broker config section on Kafka's website.
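For illustration, the relevant lines in a broker's server.properties would look something like this (the values shown are just the defaults mentioned above, not a tuning recommendation):
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10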

Related

How failover works in kafka along with keeping up replication factor

I am trying to understand how failover and replication factors work in kafka.
Let's say my cluster has 3 brokers and the replication factor is also 3. In this case each broker will have one copy of each partition, and one of the brokers is the leader. If the leader broker fails, then one of the follower brokers will become leader, but now the replication factor is down to 2. At this point, if I add a new broker to the cluster, will Kafka make sure that the replication factor is 3, and will it copy the required data onto the new broker?
How will the above scenario work if my cluster already has an additional broker?
In your setup (3 brokers, 3 replicas), when 1 broker fails Kafka will automatically elect new leaders (on the remaining brokers) for all the partitions whose leaders were on the failing broker.
The replication factor does not change. The replication factor is a topic configuration that can only be changed by the user.
Similarly, the replica list does not change. It lists the brokers that should host each partition.
However, the In Sync Replicas (ISR) list will change and only contain the 2 remaining brokers.
If you add another broker to the cluster, what happens depends on its broker.id:
if the broker.id is the same as that of the broker that failed, the new broker will start replicating data and eventually join the ISR for all the existing partitions.
if it uses a different broker.id, nothing will happen. You will be able to create new topics with 3 replicas (which is not possible while there are only 2 brokers), but Kafka will not automatically replicate existing partitions onto it. You can manually trigger a reassignment if needed; see the docs. A rough sketch follows.
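As a sketch (the topic name, partition and broker ids here are made up), a manual reassignment consists of a JSON file describing the desired replica list for each partition, applied with the reassignment tool; here broker 4 would be the newly added broker taking over the slot of the failed one:
{"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[1,2,4]}]}
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassignment.json --execute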
Leaving aside partitions (which are another Kafka concept):
The replication factor does not say how many times a topic is replicated, but rather how many times it should be replicated. It is not affected by brokers shutting down.
Once a leader broker shuts down, the "leader" status goes over to another broker which is in sync, that is, a broker that has the current state replicated and is not lagging behind. Electing a broker that is not in sync as leader would obviously lead to data loss, so this will never happen (when using the right settings).
The replicas eligible for taking over the leader role are called in-sync replicas (ISR). This matters because there is a configuration called min.insync.replicas that specifies how many in-sync replicas (the leader included) must exist for a write to be acknowledged when the producer asks for full acknowledgement (acks=all). With the default of 1, a message is acknowledged as "successful" as soon as it reaches the leader broker; if this broker dies before the data is replicated, all data that was not replicated yet is lost. With min.insync.replicas=2, every message waits with the acknowledgement until at least one replica besides the leader has it, so if the leader dies now, there is a replica covering this data. If there are not enough in-sync replicas to cover the minimum, Kafka rejects those writes and producers will see errors.
So to answer your question: if you have 2 running brokers, min.insync.replicas=1 (the default) and a replication factor of 3, your cluster runs fine and will add the missing replica as soon as you start up another broker. If another of the 2 brokers dies before you launch the third one, you will run into problems.
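As an illustration (the topic name is a placeholder), min.insync.replicas can be set per topic and is enforced for producers that use acks=all:
./kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name my-topic --add-config min.insync.replicas=2
With that setting and acks=all, a write is only acknowledged once the leader plus at least one follower have it; with fewer than 2 in-sync replicas available, those writes are rejected.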

Doubts Regarding Kafka Cluster Setup

I have a use case where I want to set up a Kafka cluster. Initially, at the start, I have 1 Kafka broker (A) and 1 ZooKeeper node. Below are my queries:
On adding a new Kafka broker (B) to the cluster, will all the data present on broker A be distributed automatically? If not, what do I need to do to distribute the data?
Now let's suppose the case above is somehow solved and my data is distributed on both brokers. Due to some maintenance issue, I want to take down server B.
How do I transfer the data of broker B to the already existing broker A or to a new broker C?
How can I increase the replication factor of my topics at runtime?
How can I change the ZooKeeper IPs present in the Kafka broker config at runtime without restarting Kafka?
How can I dynamically change the Kafka configuration at runtime?
Regarding the Kafka client:
Do I need to specify all Kafka broker IPs to the Kafka client for the connection?
And every time a broker is added or removed, do I need to add or remove its IP from the Kafka client connection string? That would always require restarting my producers and consumers.
Note:
Kafka Version: 2.0.0
Zookeeper: 3.4.9
Broker Size : (2 core, 8 GB RAM) [4GB for Kafka and 4 GB for OS]
To run a topic on a single Kafka broker, you will have to set a replication factor of 1 when creating that topic (explicitly, or implicitly via default.replication.factor). This means that the topic's partitions will stay on a single broker, even after increasing the number of brokers.
You will have to increase the number of replicas as described in the Kafka documentation. You will also have to make sure that the internal __consumer_offsets topic has enough replicas. This will start the replication process, and eventually the original broker will be the leader of every topic partition, with the other broker as a fully caught-up follower. You can use kafka-topics.sh --describe to check that every partition has both brokers in the ISR (in-sync replicas).
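For example (topic name is a placeholder):
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic
Every partition should list both broker ids under Isr before you proceed.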
Once that is done you should be able to take the original broker offline and kafka will elect the new broker as the leader of every topic partition. Don't forget to update the clients so they are aware of the new broker as well, in case a client needs to restart when the original broker is down (otherwise it won't find the cluster).
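On the client side that simply means listing both brokers in the bootstrap configuration (host names here are placeholders), so the client can still find the cluster while one broker is down:
bootstrap.servers=broker-a:9092,broker-b:9092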
Here are the answers in brief:
Yes, the data present on broker A will also be distributed to Kafka broker B.
You can set up three brokers A, B and C, so that if A fails then B and C will take over, if B fails then C will take over, and so on.
You can increase the replication factor of your topics.
For example, you could create increase-replication-factor.json and put this content in it:
{"version":1,
"partitions":[
{"topic":"signals","partition":0,"replicas":[0,1,2]},
{"topic":"signals","partition":1,"replicas":[0,1,2]},
{"topic":"signals","partition":2,"replicas":[0,1,2]}
]}
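The JSON file on its own does nothing; it is then applied (and later verified) with the partition reassignment tool, roughly like this:
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --verify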
Separately, to increase the number of partitions (not replicas) for an existing topic, say from 2 to 3, use:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic-to-increase --partitions 3
There is a zoo.cfg file where you can add the IPs and configuration related to ZooKeeper.

How to configure the time it takes for a kafka cluster to re-elect partition leaders after stopping and restarting a broker?

I have the following setup:
3 Kafka brokers and a 3-node ZooKeeper ensemble
1 topic with 12 partitions and 3 replicas (each Kafka broker is thus the leader of 4 partitions)
I stop one of the brokers - it gets removed from the cluster, and leadership of its partitions moves to the two remaining brokers
I start the broker back up - it reappears in the cluster, and eventually the leadership gets rebalanced so each broker is again the leader of 4 partitions
It works OK, except I find the time spent before the rebalancing too long (on the order of minutes). This happens under no load - no messages are sent to the cluster and no messages are consumed.
Kafka version 0.9.0.0, zookeeper 3.4.6
zookeeper tickTime = 2000
kafka zookeeper.connection.timeout.ms = 6000
(basically the default config)
Does anyone know what config parameters in Kafka and/or ZooKeeper influence the time taken for the leader rebalancing?
As said in the official documentation (http://kafka.apache.org/documentation.html#configuration), more details about broker configuration can be found in the Scala class kafka.server.KafkaConfig.
There actually is a leader.imbalance.check.interval.seconds property which defaults to 300 (5 minutes); setting it to 30 seconds does what I need.
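In other words, a line like the following in each broker's server.properties (30 is simply the value that worked here, not a general recommendation):
leader.imbalance.check.interval.seconds=30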

How to load balance the Kafka Leadership?

My kafka version is kafka_2.9.2-0.8.1.1. I have two brokers in the cluster, 4 topics and each topic has 4 partitions.
When I run
sh kafka-topics.sh --describe --zookeeper rhost:2181
for all the topics/partitions, I see broker 1 as the leader.
How can I load balance the leaders?
For example, topic 1 and topic 2 would have broker 1 as leader, and topic 3 and topic 4 would have broker 2 as leader.
The partitions should be automatically rebalanced, since the default value of the broker configuration parameter auto.leader.rebalance.enable is true. (see documentation)
However, by default this rebalance occurs every 5 minutes, as defined by the leader.imbalance.check.interval.seconds parameter. If you wish this to occur more frequently, you will have to modify this parameter.
You can use the Preferred Replica Leader Election Tool:
sh kafka-preferred-replica-election.sh --zookeeper zklist
This guarantees that the leadership load across the brokers in a cluster is evenly balanced.
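If you only want to trigger the election for specific partitions, the tool also accepts a JSON file (topic name and partitions here are placeholders):
{"partitions": [{"topic": "topic1", "partition": 0}, {"topic": "topic1", "partition": 1}]}
sh kafka-preferred-replica-election.sh --zookeeper zklist --path-to-json-file partitions.json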
I know it is a bit late and maybe you already have the answer, but to balance leaders, you first need to make the brokers equally preferred across all partitions. For a broker to be the "preferred leader" of a partition it has to meet two criteria: first, it needs to be an in-sync replica; second, it has to be the first element in the partition's replica list. So if you have a small enough number of topics/partitions, you can do that manually, which would be easier; otherwise you need to reassign partitions so that the first element (the preferred replica) is distributed among all brokers, then kick off the preferred leader election tool, which will make sure that the preferred leader actually becomes the leader. A rough sketch of such a reassignment follows.
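As an illustration for one 4-partition topic on two brokers (the topic name is made up), note how the first broker in each replica list alternates so each broker is the preferred leader of two partitions:
{"version":1,
"partitions":[
{"topic":"topic1","partition":0,"replicas":[1,2]},
{"topic":"topic1","partition":1,"replicas":[2,1]},
{"topic":"topic1","partition":2,"replicas":[1,2]},
{"topic":"topic1","partition":3,"replicas":[2,1]}
]}
Apply it with kafka-reassign-partitions.sh and then run the preferred replica election tool so that leadership actually moves.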
Brokers have a property, which can be set in the server.properties file, that enables auto re-balancing of the leadership. By default it is not enabled. Add the following line to every broker's config and restart Kafka.
auto.leader.rebalance.enable=true

Partition re-balance on brokers in Kafka 0.8

The relatively scarce documentation for Kafka 0.8 does not mention what the expected behaviour for balancing existing topics, partitions and replicas on brokers is.
More specifically, what is the expected behaviour on arrival of a broker and on crash of a broker (leader or not) ?
Thanks.
I tested those 2 cases a while ago, not under heavy load. I had one producer sending 10k messages (just a little string) synchronously to a topic with a replication factor of 2 and 2 partitions, on a cluster of 2 brokers, plus 2 consumers. Each component was deployed on a separate machine. What I observed is:
On normal operation: broker 1 is the leader of partition 1 and a replica for partition 2; broker 2 is the leader of partition 2 and a replica for partition 1. Bringing a broker 3 into the cluster does not trigger a rebalance of the partitions automatically.
On broker revival (crashed then rebooted): rebalancing is transparent to the producer and consumers. The rebooting broker replicates the log first and then makes itself available.
On broker crash (leader or not): simulated by a kill -9 on any one broker. The producer and consumers freeze until the killed broker's ephemeral node in ZooKeeper expires. After that, operations resume normally.