Partition re-balance on brokers in Kafka 0.8 - apache-kafka

The relatively scarce documentation for Kafka 0.8 does not describe the expected behaviour for balancing existing topics, partitions and replicas across brokers.
More specifically, what is the expected behaviour when a broker joins the cluster, and when a broker (leader or not) crashes?
Thanks.

I tested those 2 cases a while ago, though not under heavy load. I had one producer sending 10k messages (each a small string) synchronously to a topic with a replication factor of 2 and 2 partitions, on a cluster of 2 brokers, with 2 consumers. Each component was deployed on a separate machine. What I observed:
On normal operation: broker 1 is the leader for partition 1 and a replica for partition 2; broker 2 is the leader for partition 2 and a replica for partition 1. Bringing a broker 3 into the cluster does not trigger a rebalance of existing partitions automatically.
On broker revival (crashed, then rebooted): the rebalancing is transparent to the producer and consumers. The rebooting broker replicates the log first and only then makes itself available.
On broker crash (leader or not): simulated by a kill -9 on any one broker. The producer and consumers freeze until the killed broker's ephemeral node in ZooKeeper expires. After that, operations resume normally.
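The leader and replica placement described above can be checked directly with the topic-describe tool (kafka-topics.sh, available from 0.8.1 onwards; the ZooKeeper address and topic name below are placeholders for your setup):

```shell
# Show the leader, replica list and ISR for each partition of the topic.
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic
```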

Related

How failover works in kafka along with keeping up replication factor

I am trying to understand how failover and replication factors work in kafka.
Let's say my cluster has 3 brokers and the replication factor is also 3. In this case each broker will have one copy of the partition, and one of the brokers is the leader. If the leader broker fails, then one of the follower brokers will become leader, but now the effective replication is down to 2. At this point, if I add a new broker to the cluster, will Kafka make sure that the replication factor is 3, and will it copy the required data to the new broker?
How will the above scenario work if my cluster already has an additional broker?
In your setup (3 brokers, 3 replicas), when 1 broker fails Kafka will automatically elect new leaders (on the remaining brokers) for all the partitions whose leader was on the failed broker.
The replication factor does not change. The replication factor is a topic configuration that can only be changed by the user.
Similarly, the replica list does not change: it lists the brokers that should host each partition.
However, the In Sync Replicas (ISR) list will change and only contain the 2 remaining brokers.
If you add another broker to the cluster, what happens depends on its broker.id:
if the broker.id is the same as that of the broker that failed, this new broker will start replicating data and eventually join the ISR for all the existing partitions.
if it uses a different broker.id, nothing will happen. You will be able to create new topics with 3 replicas (which is not possible while there are only 2 brokers), but Kafka will not automatically replicate existing partitions onto it. You can manually trigger a reassignment if needed; see the docs.
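As a sketch of such a manual reassignment (the topic name and broker ids, including the new broker id 4, are assumptions for illustration): write a JSON file describing the desired replica placement per partition, then execute and verify it with the reassignment tool:

```shell
# reassign.json describes where each partition's replicas should live.
cat > reassign.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my-topic","partition":0,"replicas":[1,2,4]},
  {"topic":"my-topic","partition":1,"replicas":[2,4,1]}
]}
EOF

# Start the reassignment...
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --execute

# ...then poll with --verify until every partition reports completion.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --verify
```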
Setting aside partitions (which are a separate Kafka concept):
The replication factor does not say how many times a topic is replicated, but rather how many times it should be replicated. It is not affected by brokers shutting down.
Once a leader broker shuts down, the "leader" status goes over to another broker which is in sync, meaning a broker that has the current state replicated and is not lagging behind. Electing a broker that is not in sync as leader would obviously lead to data loss, so this will never happen (when using the right settings).
The replicas eligible to take over the "leader" status are called in-sync replicas (ISR). This matters because there is a configuration called min.insync.replicas that specifies how many in-sync replicas must exist for a Kafka message to be acknowledged. If this is set to 0, every message is acknowledged as "successful" as soon as it reaches the leader broker; if that broker then dies, all data that was not yet replicated is lost. If min.insync.replicas is set to 1, every message waits for its acknowledgement until at least 1 replica exists, so if the leader dies now, a replica still covers that data. If there are not enough brokers to cover the minimum number of replicas, your cluster will eventually fail.
So to answer your question: with 2 running brokers, min.insync.replicas=1 (the default) and a replication factor of 3, your cluster runs fine and will add a replica as soon as you start up another broker. If one of the 2 remaining brokers dies before you launch the third, you will run into problems.
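For reference, the replication factor and min.insync.replicas can both be set when the topic is created (the topic name and counts below are illustrative). Note that min.insync.replicas is only enforced for producers using acks=all:

```shell
# A topic with 3 replicas that rejects acks=all writes whenever
# fewer than 2 replicas are in sync.
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic \
  --partitions 3 --replication-factor 3 --config min.insync.replicas=2
```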

How to monitor replication progress for a Kafka topic

I have a Kafka topic that somehow went from 3 ISRs to 1 ISR, in a Kafka cluster with 3 brokers. I changed the minimum ISR from 2 to 1 to allow it to keep functioning. Presumably the other brokers are trying to replicate the topic from the leader; how can I monitor their progress?
You can monitor this metric to see the replication lag:
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
This gives the lag in number of messages per follower replica, which is useful to know whether a replica is slow or has stopped replicating from the leader.
As stated in https://kafka.apache.org/documentation/
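Besides the JMX metric, the topic tool can list partitions whose ISR is still smaller than the replica list, which gives a quick view of replication progress (the ZooKeeper address is a placeholder):

```shell
# Partitions drop off this list once all their replicas rejoin the ISR.
bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
  --under-replicated-partitions
```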

Automatically change the Kafka topic partition leader

I have an issue with my Kafka cluster.
I have 3 brokers, so when I stop broker 1 (for example), each topic partition whose leader was broker 1 switches its leader to the second broker in its replica list.
This is the expected behaviour, and it works fine.
But when I restart broker 1, I need to execute:
./kafka-preferred-replica-election.sh --zookeeper myHost
because the current leader is still the other replica.
So my question is: is there a way to configure Kafka to do this automatically?
Thanks.
I'm assuming your default (when all brokers are running) assignment is balanced, and the preferred leaders are evenly spread.
Yes, Kafka can re-elect the preferred leaders for all the partitions automatically when a broker is restarted. This is actually enabled by default; see auto.leader.rebalance.enable.
After a broker restarts, Kafka can take up to leader.imbalance.check.interval.seconds to trigger the re-election. This defaults to 5 minutes. So maybe you just did not wait long enough!
There is also leader.imbalance.per.broker.percentage, which defines the percentage of non-preferred leaders allowed. This defaults to 10%.
For the full details about these configurations, see the broker config section on Kafka's website.
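As a reference sketch, these are the relevant server.properties entries with their documented defaults:

```properties
# server.properties -- preferred leader election (defaults shown)
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
```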

One Kafka broker connects to multiple zookeepers

I'm new to Kafka, zookeeper and Storm.
In our environment we have one Kafka broker connecting to multiple ZooKeepers. Is there an advantage to having the producer send messages for a specific topic and partition to one broker (with multiple ZooKeepers), versus multiple brokers with multiple ZooKeepers?
Yes there is. Kafka allows you to scale by adding brokers. When you use a Kafka cluster with a single broker, as you have, all partitions reside on that single broker. But when you have multiple brokers, Kafka will split the partitions between them. So, broker A may be elected leader for partitions 1 and 2 of your topic, and broker B leader for partition 3. So, when you publish messages to the topic, the client will split the messages between the various partitions on the two brokers.
Note that I also mentioned leader election. Adding brokers to your Kafka cluster gives you replication. Kafka uses ZooKeeper to elect a leader for each partition as I mentioned in my example. Once a leader is elected, the client splits messages among partitions and sends each message to the leader for the appropriate partition. Depending on the topic configuration, the leader may synchronously replicate messages to a backup. So, in my example, if the replication factor for the topic is 2 then broker A will synchronously replicate messages for partitions 1 and 2 to broker B and broker B will synchronously replicate messages for partition 3 to broker A.
So, that's all to say that adding brokers gives you both scalability and fault-tolerance.
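The two-broker example above would correspond to a topic created roughly like this (the topic name and counts are illustrative):

```shell
# 3 partitions spread across the two brokers, each with one extra replica,
# so every partition survives the loss of a single broker.
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic \
  --partitions 3 --replication-factor 2
```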

High Level Consumer Failure in Kafka

I have the following Kafka Setup
Number of producer : 1
Number of topics : 1
Number of partitions : 2
Number of consumers : 3 (with same group id)
Number of Kafka clusters : none (single Kafka server)
Zookeeper.session.timeout : 1000
Consumer Type : High Level Consumer
Producer produces messages without any specific partitioning logic(default partitioning logic).
Consumer 1 consumes messages continuously. I abruptly kill consumer 1, and I would expect consumer 2 or consumer 3 to consume the messages after the failure of consumer 1.
In some cases rebalance occurs and consumer 2 starts consuming messages. This is perfectly fine.
But in some cases neither consumer 2 nor consumer 3 consumes at all. I have to manually kill all the consumers and start all three consumers again.
Only after this restart does consumer 1 start consuming again.
In short, the rebalance is successful in some cases, but not in others.
Is there any configuration that I am missing?
Kafka uses Zookeeper to coordinate high level consumers.
From http://kafka.apache.org/documentation.html:
Partition Owner registry
Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming.
/consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id (ephemeral node)
There is a known quirk with ephemeral nodes: they can linger for up to 30 seconds after the ZK client suddenly goes down:
http://developers.blog.box.com/2012/04/10/a-gotcha-when-using-zookeeper-ephemeral-nodes/
So you may be running into this if you expect consumers 2 and 3 to start reading messages immediately after #1 is terminated.
You can also check that /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] contains correct data after rebalancing.
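The ownership nodes can be inspected with the ZooKeeper shell that ships with Kafka (the group and topic names below are placeholders):

```shell
# Lists one child node per partition; each node's data is the id of
# the consumer that currently owns that partition.
bin/zookeeper-shell.sh localhost:2181 ls /consumers/my-group/owners/my-topic
```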