Kafka Replication Factor for a 5-Node Broker Cluster

I am setting up a Confluent Kafka cluster (Community edition) with 3 ZooKeeper nodes and 5 Kafka broker nodes.
The requirement is that we should be able to keep operating in the live environment even if 2 broker nodes are down.
What should be the recommended
- replication factor,
- in-sync replicas (min.insync.replicas)
for topics with 50 partitions?
In most cases the suggested replication factor is 3. What would be the impact if we increased that to 5 in the cluster configuration described above?

Setting the replication factor to 5 would mean that all partitions exist on all brokers in the cluster. If two brokers are down, the replication factor requirement is no longer met and your topics will be under-replicated (you should see a warning).
min.insync.replicas should then be set to 3 (or less), otherwise producing a message with acks = all would fail. Producing with acks = 0 or acks = 1 would still succeed even with a higher min.insync.replicas, because that setting only applies when acks = all.
Also note that while two nodes are down, you can't create new topics with a replication factor of 5 (also see KIP-409).
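As a rough sketch of the above (the topic name and broker address are made up; adjust them to your cluster, and note that Confluent packages ship the same tools without the .sh suffix):

kafka-topics.sh --create --bootstrap-server broker1:9092 \
  --replication-factor 5 --partitions 50 --topic orders \
  --config min.insync.replicas=3

# Producing with acks=all then succeeds as long as at least 3 replicas are in sync,
# i.e. the cluster tolerates 2 broker failures:
kafka-console-producer.sh --bootstrap-server broker1:9092 \
  --topic orders --producer-property acks=all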

Related

How failover works in Kafka along with keeping up the replication factor

I am trying to understand how failover and replication factors work in Kafka.
Let's say my cluster has 3 brokers and the replication factor is also 3. In this case each broker will have one copy of each partition, and one of the brokers is the leader. If the leader broker fails, then one of the follower brokers will become leader, but now only 2 copies of the data remain. At this point, if I add a new broker to the cluster, will Kafka make sure the replication factor of 3 is met again and copy the required data onto the new broker?
How will the above scenario work if my cluster already has an additional broker?
In your setup (3 brokers, 3 replicas), when 1 broker fails Kafka will automatically elect new leaders (on the remaining brokers) for all the partitions whose leaders were on the failed broker.
The replication factor does not change. The replication factor is a topic configuration that can only be changed by the user.
Similarly the Replica list does not change. This lists the brokers that should host each partition.
However, the In Sync Replicas (ISR) list will change and only contain the 2 remaining brokers.
If you add another broker to the cluster, what happens depends on its broker.id:
if the broker.id is the same as the broker that failed, this new broker will start replicating data and eventually join the ISR for all the existing partitions.
if it uses a different broker.id, nothing will happen. You will be able to create new topics with 3 replicas (which is not possible while there are only 2 brokers), but Kafka will not automatically replicate existing partitions. You can manually trigger a reassignment if needed; see the docs and the sketch below.
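If you do need to move existing partitions onto the new broker, a minimal sketch with the reassignment tool could look like this (the topic name and broker ids are hypothetical, and older releases take --zookeeper instead of --bootstrap-server):

# topics-to-move.json: { "version": 1, "topics": [ { "topic": "my-topic" } ] }
# Generate a candidate assignment that includes the new broker (id 3 here):
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics-to-move.json --broker-list "1,2,3" --generate

# Review the proposed assignment, save it to reassignment.json, then execute and verify:
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json --execute
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json --verify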
Setting partitions aside (they are a separate Kafka concept):
The replication factor does not say how many times a topic is replicated, but rather how many times it should be replicated. It is not affected by brokers shutting down.
Once a leader broker shuts down, the "leader" status goes over to another broker which is in sync, meaning a broker that has the current state replicated and is not lagging behind. Giving "leader" status to a broker that is not in sync would obviously lead to data loss, so this will never happen (when using the right settings).
The replicas eligible for taking over "leader" status are called in-sync replicas (ISR). This matters because of the min.insync.replicas configuration, which specifies how many in-sync replicas must exist for a write with acks = all to be acknowledged. With min.insync.replicas = 1 (the minimum and the default), a message is acknowledged as "successful" as soon as the leader broker has it; if that broker dies before the followers catch up, any data that was not yet replicated is lost. With min.insync.replicas = 2, the acknowledgement waits until at least one follower besides the leader has the message, so if the leader dies at that point, there is still a replica covering the data. If there are not enough in-sync brokers left to satisfy the minimum, producers using acks = all will start failing.
So to answer your question: if you have 2 running brokers, min.insync.replicas = 1 (the default) and a replication factor of 3, your cluster runs fine and will add a replica as soon as you start up another broker. If another of the 2 brokers dies before you launch the third one, you will run into problems.
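To see how a cluster is coping after a broker failure, you can list the affected partitions. A small sketch (the broker address is illustrative, and --under-min-isr-partitions only exists in newer Kafka versions):

# Partitions whose ISR is smaller than the replication factor:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# Partitions whose ISR has fallen below min.insync.replicas:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-min-isr-partitions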

Kafka Brokers Leader Skewed

I have a 10-node Kafka cluster; both a Kafka broker and ZooKeeper are running on each node. Recently we added 3 new nodes (8, 9 and 10), and yesterday 2 nodes were down (2 and 4). I have a topic with 60 partitions and replication factor 3. In Kafka Manager, Brokers Skewed % is showing as 50 and Brokers Leader Skewed % as 70. I manually reassigned the partitions from the UI and Brokers Skewed % is 0 now, but it didn't change Brokers Leader Skewed %. I also ran the command:
$ kafka-preferred-replica-election.sh --zookeeper localhost:2181 --path-to-json-file test.json
Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
Use --bootstrap-server instead to specify a broker to connect to.
Created preferred replica election path with ...
Successfully started preferred replica election for partitions Set(..
but it didn't change anything. I can see in the UI that brokers 8 and 10 have no leaders. How can I rebalance leaders evenly across all brokers? I read that a rolling restart of all brokers can solve it, but I can't (in the normal case) restart my production Kafka cluster.
kafka version: 2.3.0
zookeeper version: 3.4.12
I added a Kafka Manager screenshot and highlighted the issue with a red circle.
I would appreciate any help.
I was able to solve this issue. If you look at the screenshots carefully, you will notice that none of the preferred replicas (the first entry of the Replicas column) is assigned to broker 8 or 10, and only partition 37 has broker 9 as its preferred replica, so Preferred Replica Election alone will not help.
I shuffled the preferred replicas using the Manual Partition Assignments option (moving brokers 8, 9 and 10 into replica position 0 for 17 partitions (6*3 - 2)), then ran Reassign Partitions, and then ran Preferred Replica Election.
I hope this will be helpful to others.
Note: nodes 8, 9 and 10 were added recently.
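For reference, the manual shuffle described above boils down to a reassignment where brokers 8, 9 and 10 appear first in the replica list of some partitions, followed by a preferred replica election. A sketch (the topic name, partition numbers and replica lists are illustrative, and the --zookeeper flags match the Kafka 2.3 tooling used in the question):

# reassignment.json: put the new brokers first so they become preferred leaders, e.g.
# { "version": 1, "partitions": [
#   { "topic": "my-topic", "partition": 0, "replicas": [8, 2, 3] },
#   { "topic": "my-topic", "partition": 1, "replicas": [9, 3, 5] },
#   { "topic": "my-topic", "partition": 2, "replicas": [10, 5, 6] } ] }
kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute

# Then trigger the preferred replica election again:
kafka-preferred-replica-election.sh --zookeeper localhost:2181 --path-to-json-file election.json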

Kafka cluster performance dropped after adding more Kafka brokers

Does anybody know of a possible reason for message processing slowing down when more Kafka brokers are added to the cluster?
The situation is the following:
Setup 1: in a Kafka cluster of 3 brokers I produce messages to 50 topics (replication factor = 2, 1 partition, acks = 1), each with a consumer assigned. I measure the average time to process 1 message (from producing to consuming).
Setup 2: I add 2 more Kafka brokers to the cluster. They are created by the same standard tool, so they have the same characteristics (CPU/RAM) and the same Kafka configs. I create 50 new topics (replication factor = 2, 1 partition, acks = 1), just to save time and avoid replica reassignment, so the replicas are spread over the 5 brokers. I produce messages only to the new 50 topics and measure the average processing time: it became slower by almost a third.
So I didn't change any settings of the producers, consumers or brokers (except for listing the 2 new brokers in the Kafka and ZooKeeper configs), and I can't explain the performance drop. Please point me to any config option, log file or useful article that would help explain this. Thank you so much in advance.
In a Kafka cluster of 3 brokers I produce some messages to 50 topics
In the first setup, you have 50 topics with 3 brokers.
I add 2 more Kafka brokers to the cluster. I create 50 new topics
In the second setup, you have 100 topics with 5 brokers.
Even assuming scaling is linear, 100 topics would call for 6 brokers, not 5.
So the replicas are spread over the 5 brokers
Here, how the replicas are spread also matters. One broker may be serving 10 partitions as leader, another may be serving 7, and so on. In that case a particular broker carries more load than the others, which could be the cause of the slowdown (see the sketch below for a quick way to check).
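One quick way to check whether leadership is spread evenly is to count partition leaders per broker. A sketch, assuming the usual "Leader: <id>" format of the describe output:

kafka-topics.sh --bootstrap-server localhost:9092 --describe \
  | grep -oE 'Leader: [0-9]+' | sort | uniq -c | sort -rn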
Also, when you have replication.factor=2, what matters here is whether acks=all or acks=1 or acks=0. If you have put acks=all, then all the replicas must acknowledge the write to the producer which could slow it down.
Next, consider the locality and configuration of the new brokers: the machines they run on, their CPU and RAM, the processor load, and the network between the old brokers, the new brokers and the clients are all worth looking at.
Moreover, if your application consumes a lot of topics, it necessarily has to make requests to many brokers, since the topic partitions are spread among different brokers. Utilizing one broker to the fullest (CPU, memory, etc.) versus utilizing multiple brokers is something you can benchmark.

In Kafka HA, why is the minimum number of brokers required 3 and not 2?

We are trying to implement Kafka HA using a Kafka cluster. While doing R&D, we found that the minimum number of nodes recommended for both ZooKeeper and Kafka brokers is 3.
We understand why ZooKeeper should have a minimum of 3 nodes: for leader election, a quorum of at least (n+1)/2 nodes must be up and running.
But it is not clear why a minimum of 3 Kafka brokers is required. Why can't we implement HA with 2 Kafka brokers and 3 ZooKeeper nodes?
The minimum number of ZooKeeper nodes is 3 because of the quorum requirement. The count should be odd because an extra even node adds nothing: e.g., a ZooKeeper ensemble of 8 nodes tolerates no more failures than one of 7. Very large ensembles are also not good, because of the cost of the consensus algorithm (e.g. Paxos-style protocols).
For the Kafka cluster, personally I think it is okay to run 2 brokers, but it is better with 3. The reason is maintaining the ISR (in-sync replicas).
Say your Kafka cluster has 2 brokers. To maintain high availability and data consistency, we set the replication factor to 2, so the ISR can contain 2 replicas. The interesting part is the min.insync.replicas setting. If you set it to 1, the leader keeps accepting writes even when the follower has dropped out of the ISR; if the leader then fails, you likely have no up-to-date replica left. If you set it to 2, then as soon as either the leader or the follower fails, producers using acks = all can no longer write.
If our Kafka cluster has 3 brokers, we set the replication factor to 3 and min.insync.replicas to 2. With this configuration, we accept the risk of losing 1 replica (either leader or follower) while continuing to work. For instance, if we lose the leader, there is at least one in-sync follower to switch to. If we lose one of the followers, we still have a remaining follower to keep the in-sync count at 2.
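A minimal sketch of that 3-broker configuration (the topic name and broker address are made up):

kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --replication-factor 3 --partitions 6 --topic events \
  --config min.insync.replicas=2

# Or set min.insync.replicas on an existing topic:
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name events --add-config min.insync.replicas=2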
In addition to @hqt's answer:
You can set up a Kafka HA cluster with only 2 brokers, but the recommended replication factor for production is 3, so you need 3 brokers in order to achieve that.
Also consider that Confluent is working on migrating leader election into Kafka itself, so in the future you will not need ZooKeeper anymore, which may imply running an odd number of Kafka brokers.

What happens when one of the Kafka replicas is down

I have a cluster of 2 Kafka brokers and a topic with replication factor 2. If one of the brokers dies, will my producers be able to continue sending new messages to this degraded cluster of 1 node? Or does replication factor 2 require 2 alive nodes, so messages will be refused?
It depends on a few factors:
What is your producer configuration for acks? If you set it to "all", the leader broker won't answer with an ACK until the message has been replicated to all nodes in the ISR list. At that point it is up to your producer to decide whether it cares about ACKs or not.
What is your value for min.insync.replicas? If the number of in-sync nodes falls below this config, the leader broker won't accept more messages from producers using acks = all until more nodes are available.
So basically your producers may pause for a while, until more nodes are up.
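To check which values actually apply, you can inspect the topic override and the broker default. A sketch with a made-up topic name and broker id (newer Kafka versions support --bootstrap-server for kafka-configs.sh):

# Topic-level overrides such as min.insync.replicas:
kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type topics --entity-name my-topic

# Broker defaults that apply when no topic override exists:
kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type brokers --entity-name 0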
Messages will not be refused if the number of alive brokers is less than the configured number of replicas. Whenever a new Kafka broker joins the cluster, the data gets replicated to that node.
You can reproduce this scenario by configuring a replication factor of 3 or more and starting only one broker.
Kafka will handle reassigning partitions for the producers and consumers that were dealing with the lost partitions, but it will be problematic for new topics.
You can start a single broker with a default replication factor of 2 or 3 configured; that does work. However, you cannot create a topic with that replication factor until that many brokers are in the cluster. Whether the topic is auto-created on the first message or created manually, Kafka will throw an error:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic test
Error while executing topic command : Replication factor: 3 larger than available brokers: 1.
[2018-08-08 15:23:18,339] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
As soon as a new node joins the Kafka cluster, the data will be replicated to it; the replication factor does not affect whether the publisher can send messages.
A replication factor of 2 doesn't require 2 live brokers; whether messages can still be published while one broker is down depends on these configurations:
- acks
- min.insync.replicas
Check those configurations as mentioned in @Javier's answer above.