What happens when one of the Kafka replicas is down

I have a cluster of 2 Kafka brokers and a topic with replication factor 2. If one of the brokers dies, will my producers be able to continue sending new messages to this degraded cluster of 1 node? Or does a replication factor of 2 require 2 alive nodes, and will messages be refused?

It depends on a few factors:
What is your producer configuration for acks? If you set it to "all", the leader broker won't answer with an ACK until the message has been replicated to all nodes in the ISR list. At that point it is up to your producer to decide whether it cares about ACKs or not.
What is your value for min.insync.replicas? If the number of in-sync replicas falls below this value, the leader broker won't accept any more messages from producers until more nodes are available.
So basically your producers may be paused for a while, until more nodes are up.
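As a minimal sketch of the two settings involved (the topic name test and the value 2 are assumptions for illustration):
# Producer side: wait for an ACK from all in-sync replicas
acks=all
# Topic side: require at least 2 in-sync replicas before accepting writes
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name test --add-config min.insync.replicas=2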

Messages will not be refused if the number of alive brokers is less than the configured replication factor. Whenever a new Kafka broker joins the cluster, the data gets replicated to that node.
You can reproduce this scenario by configuring the replication factor as 3 or more and starting only one broker.

Kafka will handle reassigning partitions for producers and consumers that were dealing with the lost partitions, but it will be problematic for new topics.
You can start one broker with a default replication factor of 2 or 3. That does work. However, you cannot create a topic with that replication factor until that many brokers are in the cluster. Whether the topic is auto-created on the first message or created manually, Kafka will throw an error:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic test
Error while executing topic command : Replication factor: 3 larger than available brokers: 1.
[2018-08-08 15:23:18,339] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.

As soon as a new node joins the Kafka cluster, data will be replicated. The replication factor does not affect whether the publisher's messages are accepted.

A replication factor of 2 doesn't require 2 live brokers; whether messages can still be published while one broker is down depends on these configurations:
- acks
- min.insync.replicas
Check those configurations as mentioned in @Javier's answer above.
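As a hedged example, you can inspect topic-level overrides such as min.insync.replicas like this (the topic name test is a placeholder):
kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name test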

Related

Kafka automatic replication to new node

I am a newbie to Kafka and wanted to understand the behaviour of Apache Kafka in the scenario below.
Consider I have a topic with:
partitions 3
brokers 3
replication factor 3
min ISR 2
producer acks = all
unclean leader election false
As per my understanding, if broker 1 goes down there is no harm and no data loss, as ISR = 2 and writes will still be successful.
If node 1 comes back up it will again follow the leader and catch up.
My question is: if node 1 never comes back up and it is removed from the ISR list, my desired replication factor is still 3. If I add a new node 4, how do I automatically make node 4 copy the partitions from the failed node 1, so that a replication factor of 3 is still maintained?
Topic replica assignments are stored in ZooKeeper, and if you add a new broker you should update the replica lists for your topic(s). (As far as I know there is no way to do this automatically.)
But you can do it manually by using the kafka-reassign-partitions.sh tool.
Steps:
Create a JSON file representing your desired replicas per partition. For example:
{"version":1,
"partitions":[
{"topic":"YourTopic","partition":0,"replicas":[3,2,4]},
{"topic":"YourTopic","partition":1,"replicas":[2,4,3]},
{"topic":"YourTopic","partition":2,"replicas":[4,3,2]}
]}
Execute this command to reassign partitions.
./kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file my_file.json --execute
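Optionally, the same tool can check the reassignment's progress with its --verify mode (same JSON file and ZooKeeper address assumed):
./kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file my_file.json --verify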

Doubts Regarding Kafka Cluster Setup

I have a use case where I want to set up a Kafka cluster. Initially I have 1 Kafka broker (A) and 1 ZooKeeper node. Below are my queries:
On adding a new Kafka broker (B) to the cluster, will all data present on broker A be distributed automatically? If not, what do I need to do to distribute the data?
Now let's suppose case 1 is somehow solved and my data is distributed on both brokers. Due to some maintenance issue, I want to take down server B.
How do I transfer the data of broker B to the already existing broker A, or to a new broker C?
How can I increase the replication factor of my topics at runtime?
How can I change the ZooKeeper IPs present in the Kafka broker config at runtime without restarting Kafka?
How can I dynamically change the Kafka configuration at runtime?
Regarding the Kafka client:
Do I need to specify all Kafka broker IPs to the Kafka client for the connection?
And every time a broker is added or removed, do I need to add or remove its IP from the Kafka client connection string? Will that always require restarting my producers and consumers?
Note:
Kafka Version: 2.0.0
Zookeeper: 3.4.9
Broker Size : (2 core, 8 GB RAM) [4GB for Kafka and 4 GB for OS]
To run a topic from a single Kafka broker you will have to set a replication factor of 1 when creating that topic (explicitly, or implicitly via default.replication.factor). This means that the topic's partitions will be on a single broker, even after increasing the number of brokers.
You will have to increase the number of replicas as described in the Kafka documentation. You will also have to pay attention that the internal __consumer_offsets topic has enough replicas. This will start the replication process, and eventually the original broker will be the leader of every topic partition, with the other broker as a fully caught-up follower. You can use kafka-topics.sh --describe to check that every partition has both brokers in the ISR (in-sync replicas).
Once that is done you should be able to take the original broker offline, and Kafka will elect the new broker as the leader of every topic partition. Don't forget to update the clients so they are aware of the new broker as well, in case a client needs to restart while the original broker is down (otherwise it won't find the cluster).
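For example, a quick check might look like this (the topic name my-topic and the local ZooKeeper address are assumptions):
kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic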
Here are the answers in brief:
Yes, the data present on broker A will also be distributed to Kafka broker B.
You can set up three brokers A, B and C, so that if A fails then B and C will take over, if B fails then C will take over, and so on.
You can increase the replication factor of your topic.
You could create increase-replication-factor.json and put this content in it:
{"version":1,
"partitions":[
{"topic":"signals","partition":0,"replicas":[0,1,2]},
{"topic":"signals","partition":1,"replicas":[0,1,2]},
{"topic":"signals","partition":2,"replicas":[0,1,2]}
]}
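That file would then be applied with the reassignment tool, along the lines of the earlier answer (the script path and ZooKeeper address are assumptions):
./kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute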
To increase the number of partitions for a given topic, you have to:
Specify the extra partitions for the existing topic with the command below (let us say the increase is from 2 to 3):
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic-to-increase --partitions 3
There is a zoo.cfg file where you can add the IPs and configuration related to ZooKeeper.

One Kafka broker connects to multiple zookeepers

I'm new to Kafka, ZooKeeper and Storm.
In our environment we have one Kafka broker connecting to multiple ZooKeepers. Is there an advantage to having the producer send messages to a specific topic and partition on one broker with multiple ZooKeepers, versus multiple brokers with multiple ZooKeepers?
Yes there is. Kafka allows you to scale by adding brokers. When you use a Kafka cluster with a single broker, as you have, all partitions reside on that single broker. But when you have multiple brokers, Kafka will split the partitions between them. So, broker A may be elected leader for partitions 1 and 2 of your topic, and broker B leader for partition 3. So, when you publish messages to the topic, the client will split the messages between the various partitions on the two brokers.
Note that I also mentioned leader election. Adding brokers to your Kafka cluster gives you replication. Kafka uses ZooKeeper to elect a leader for each partition as I mentioned in my example. Once a leader is elected, the client splits messages among partitions and sends each message to the leader for the appropriate partition. Depending on the topic configuration, the leader may synchronously replicate messages to a backup. So, in my example, if the replication factor for the topic is 2 then broker A will synchronously replicate messages for partitions 1 and 2 to broker B and broker B will synchronously replicate messages for partition 3 to broker A.
So, that's all to say that adding brokers gives you both scalability and fault-tolerance.
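As a concrete sketch, a replicated multi-partition topic like the one in this example could be created as follows (the topic name is a placeholder):
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic my-topic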

How to load balance the Kafka Leadership?

My Kafka version is kafka_2.9.2-0.8.1.1. I have two brokers in the cluster, 4 topics, and each topic has 4 partitions.
When I run
sh kafka-topics.sh --describe --zookeeper rhost:2181
for all the topics/partitions, I see broker 1 as Leader.
How can I load balance the leader?
For example, topics 1 and 2 would have broker 1 as leader, and
topics 3 and 4 would have broker 2 as leader.
The partitions should be automatically rebalanced, since the default value of the broker configuration parameter auto.leader.rebalance.enable is true. (see documentation)
However, by default this rebalance occurs every 5 minutes, as defined by the leader.imbalance.check.interval.seconds parameter. If you wish this to occur more frequently, you will have to modify this parameter.
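A hedged server.properties sketch of those two parameters (the 60-second interval is an example value, not the default):
# enable periodic preferred-leader rebalancing
auto.leader.rebalance.enable=true
# check for leadership imbalance every minute instead of the 300-second default
leader.imbalance.check.interval.seconds=60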
You can use the Preferred Replica Leader Election Tool:
sh kafka-preferred-replica-election.sh --zookeeper zklist
This guarantees that the leadership load across the brokers in a cluster is evenly balanced.
I know it is a bit late, and maybe you already have the answer, but to balance leaders you first need to make brokers equally preferred across all partitions. For a broker to be the "preferred leader" of a partition it has to meet two criteria: first, it needs to be an in-sync replica; second, it has to be the first element in the partition's replica list. So, if you have a small enough number of topics/partitions you can do that manually, which would be easier; otherwise you need to reassign partitions so that the first element (the preferred replica) is distributed among all brokers, then kick off the preferred replica leader election tool, which will make sure that the preferred leader actually becomes the leader. A sketch of both steps follows.
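As a sketch under stated assumptions (the topic name, the two partitions and the broker IDs 1/2 are illustrative), a reassignment JSON that rotates the first replica between brokers, followed by the election tool:
{"version":1,
"partitions":[
{"topic":"YourTopic","partition":0,"replicas":[1,2]},
{"topic":"YourTopic","partition":1,"replicas":[2,1]}
]}
./kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file balanced_leaders.json --execute
sh kafka-preferred-replica-election.sh --zookeeper localhost:2181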
Brokers have a property, which can be set in the server.properties file, that enables auto re-balancing of leadership. By default, it is not enabled. Add the following line to every broker's configuration and restart Kafka.
auto.leader.rebalance.enable=true

Partition re-balance on brokers in Kafka 0.8

The relatively scarce documentation for Kafka 0.8 does not mention what the expected behaviour for balancing existing topics, partitions and replicas on brokers is.
More specifically, what is the expected behaviour on arrival of a broker and on crash of a broker (leader or not) ?
Thanks.
I tested those 2 cases a while ago, not under heavy load: one producer sending 10k messages (each just a little string) synchronously to a topic with a replication factor of 2 and 2 partitions, on a cluster of 2 brokers. There were 2 consumers. Each component was deployed on a separate machine. What I observed is:
On normal operation: broker 1 is leader for partition 1 and replica for partition 2; broker 2 is leader for partition 2 and replica for partition 1. Bringing a broker 3 into the cluster doesn't trigger a rebalance of the partitions automatically.
On broker revival (crashed, then rebooted): rebalancing is transparent to the producer and consumers. The rebooting broker replicates the log first and then makes itself available.
On broker crash (leader or not), simulated by a kill -9 on any one broker: the producer and consumers freeze until the killed broker's ephemeral node in ZooKeeper expires. After that, operations resume normally.