Kafka rack awareness and locations of ISR

I want to build a HA Kafka cluster, where the cluster needs to span 2 availability zones.
I want to be able to continue to read and write to a topic, even if all the brokers in an AZ go down.
If I have at least 2 brokers in each AZ, a replication factor of 3, min.insync.replicas=2, and acks=all, then I think a producer write will be acknowledged once at least one broker other than the leader has also acknowledged the write. Does the rack-aware algorithm enforce that the ISR must be located in the other AZ? The docs just mention replicas, not the ISR.
Will this enable me to continue reading and writing in the event of the loss of an AZ? If not, what is needed to achieve this?

If you want a true HA Kafka cluster, you need to start with an HA Zookeeper ensemble, which typically means 3 availability zones, because (unlike Kafka brokers) Zookeeper nodes need a quorum (a majority of the original nodes) to operate, and you can't have a majority when half of your nodes are down.
The reason Zookeeper is important is that a proper HA Kafka cluster should not just allow reads and writes after a failure, but also allow new topic creation and new leader elections, both of which require Zookeeper to be operational.
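For concreteness, the setup described in the question could be sketched with the following configuration. This is only an illustration of the settings the question names: the host, zone name, and topic name are placeholders, and it says nothing about where the rack-aware assigner will place the ISR. Recent Kafka versions accept --bootstrap-server here; older ones use --zookeeper.

# server.properties on each broker: tag the broker with its AZ so the
# rack-aware assigner spreads replicas across zones (zone name is a placeholder)
broker.rack=az-1

# create a topic with replication factor 3 and a minimum ISR of 2
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --topic my-topic --partitions 3 --replication-factor 3 \
  --config min.insync.replicas=2

# producer configuration: wait for all in-sync replicas to acknowledge
acks=all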

Related

How failover works in Kafka along with keeping up the replication factor

I am trying to understand how failover and replication factors work in Kafka.
Let's say my cluster has 3 brokers and the replication factor is also 3. In this case each broker will have one copy of each partition, and one of the brokers is the leader. If the leader broker fails, then one of the follower brokers will become the leader, but now the replication factor is down to 2. At this point, if I add a new broker to the cluster, will Kafka make sure that the replication factor is 3, and will it copy the required data onto the new broker?
How will the above scenario work if my cluster already has an additional broker?
In your setup (3 brokers, 3 replicas), when 1 broker fails Kafka will automatically elect new leaders (on the remaining brokers) for all the partitions whose leaders were on the failed broker.
The replication factor does not change. The replication factor is a topic configuration that can only be changed by the user.
Similarly the Replica list does not change. This lists the brokers that should host each partition.
However, the In Sync Replicas (ISR) list will change and only contain the 2 remaining brokers.
If you add another broker to the cluster, what happens depends on its broker.id:
if the broker.id is the same as that of the broker that failed, the new broker will start replicating data and eventually join the ISR for all the existing partitions.
if it uses a different broker.id, nothing will happen automatically. You will be able to create new topics with 3 replicas (which is not possible while there are only 2 brokers), but Kafka will not automatically replicate existing partitions onto it. You can manually trigger a reassignment if needed; see the docs and the sketch below.
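For illustration, a manual reassignment could look like the following sketch. The topic name, broker ids, and file name are placeholders; recent Kafka versions accept --bootstrap-server for this tool, while older ones use --zookeeper.

# reassign.json (hypothetical): put my-topic partition 0 on brokers 1, 2 and the new broker 4
{"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[1,2,4]}]}

# start the reassignment
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute

# re-run with --verify until it reports the reassignment as complete
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --verify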
Leaving partitions aside (they are a separate Kafka concept):
The replication factor does not say how many times a topic is replicated, but rather how many times it should be replicated. It is not affected by brokers shutting down.
Once a leader broker shuts down, the "leader" status goes over to another broker which is in sync, meaning a broker that has the current state replicated and is not lagging behind. Electing a broker that is not in sync as leader would obviously lead to data loss, so this will never happen (when using the right settings).
The replicas eligible for taking over "leader" status are called in-sync replicas (ISR). This matters because of a configuration called min.insync.replicas, which specifies how many in-sync replicas (including the leader itself) must exist for a write with acks=all to be acknowledged. With min.insync.replicas=1, a message is acknowledged as "successful" as soon as it reaches the leader broker; if this broker dies, all data that was not replicated yet is lost. With min.insync.replicas=2, every message waits for acknowledgement until at least one replica besides the leader has it, so if the leader dies now, there is a replica covering this data. If there are not enough brokers left to cover the minimum number of in-sync replicas, writes to your cluster will eventually fail.
So to answer your question: if you had 2 running brokers, min.insync.replicas=1 (the default) and a replication factor of 3, your cluster runs fine and will add a replica as soon as you start up another broker. If another of the 2 brokers dies before you launch the third one, you will run into problems.
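As a side note, min.insync.replicas can be set per topic after creation with the stock tooling; a hedged example (topic name and host are placeholders, and recent Kafka versions accept --bootstrap-server for altering topic configs, while older ones use --zookeeper):

bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config min.insync.replicas=2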

Kafka setup strategy for replication?

I have two VM servers (say S1 and S2) and need to install Kafka in cluster mode, where there will be a topic with only one partition and two replicas (one the leader, the other a follower) for reliability.
I got a high-level idea from this cluster setup and want to confirm whether the strategy below is correct.
First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZooKeeper on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZooKeeper in the latest Kafka version as well? It looks like it is a must for older versions (see: Is Zookeeper a must for Kafka?).
Start the Kafka broker on both nodes. It can be on the same port, as the brokers are hosted on different nodes.
Create the topic on any node with 1 partition and 2 replicas.
ZooKeeper will select the broker on one node as the leader and the other as a follower.
The producer will connect to any broker and start publishing messages.
If the leader goes down, ZooKeeper will select another node as the leader automatically. I am not sure how a replica count of 2 will be maintained, as there is only one node live now.
Is the above strategy correct?
Useful resources
ISR
ISR vs replication factor
First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZooKeeper on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZooKeeper in the latest Kafka version as well? It looks like it is a must for older versions (see: Is Zookeeper a must for Kafka?).
Answer: Yes. ZooKeeper is still a must until KIP-500 is released. ZooKeeper is responsible for electing the controller, storing metadata about the Kafka cluster, and managing broker membership (link). Ideally the number of ZooKeeper nodes should be at least 3; this way you can tolerate one node failure (2 healthy ZooKeeper nodes, a majority of the cluster, are still capable of electing a controller). You should also consider setting up the ZooKeeper cluster on machines other than the ones Kafka is installed on, so that the failure of a server won't lead to the loss of both a ZooKeeper and a Kafka node.
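A minimal zoo.cfg for such a 3-node ensemble might look like this (hostnames and paths are placeholders); each node additionally needs a myid file matching its server.N entry:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888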
Start the Kafka broker on both nodes. It can be on the same port, as the brokers are hosted on different nodes.
Answer: You should first start the ZooKeeper cluster, then the Kafka cluster. The same port on different nodes is fine.
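With the scripts that ship with Kafka, the startup order could look like this (paths are the defaults from the distribution):

# on each ZooKeeper node first
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
# then on each Kafka broker
bin/kafka-server-start.sh -daemon config/server.properties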
Create the topic on any node with 1 partition and 2 replicas.
Answer: Partitions are used for horizontal scalability. If you don't need this, one partition is okay. With a replication factor of 2, one of the nodes will be the leader and the other the follower at any time. But this is not enough to completely avoid data loss while also providing HA. Ideally, to avoid data loss without compromising HA, you should have at least 3 Kafka brokers, a topic replication factor of 3, min.insync.replicas=2 as a broker config, and acks=all as a producer config. (You can check this for more information.)
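For the two-node setup in the question, the creation command might look like this (assumes a recent Kafka with --bootstrap-server; older versions use --zookeeper):

bin/kafka-topics.sh --create --bootstrap-server S1:9092 \
  --topic my-replicated-topic --partitions 1 --replication-factor 2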
ZooKeeper will select the broker on one node as the leader and the other as a follower.
Answer: The controller broker is responsible for maintaining the leader/follower relationship for all the partitions. One broker will be the partition leader and the other one will be a follower. You can check partition leaders/followers with this command:
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
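Illustrative output (broker ids are examples): Leader is the broker currently serving the partition, Replicas is the assigned replica list, and Isr is the subset of replicas currently in sync.

Topic: my-replicated-topic  PartitionCount: 1  ReplicationFactor: 2  Configs:
    Topic: my-replicated-topic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2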
The producer will connect to any broker and start publishing messages.
Answer: Yes. Setting only one broker in bootstrap.servers is enough to connect to the Kafka cluster, but for redundancy you should provide more than one broker in bootstrap.servers. From the documentation:
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
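For example, with the console producer that ships with Kafka (S1 and S2 stand for the question's two hosts; newer versions accept --bootstrap-server, older ones use --broker-list):

bin/kafka-console-producer.sh --bootstrap-server S1:9092,S2:9092 \
  --topic my-replicated-topic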
If the leader goes down, ZooKeeper will select another node as the leader automatically. I am not sure how a replica count of 2 will be maintained, as there is only one node live now.
Answer: If the controller broker goes down, ZooKeeper will select another broker as the new controller. If the broker that is the leader of your partition goes down, one of the in-sync replicas will become the new leader (the controller broker is responsible for this). But of course, if you have just two brokers and one fails, replication to a second node won't be possible. That's why you should have at least 3 brokers in your Kafka cluster.
Yes, ZooKeeper is still needed as of Kafka 2.4, but you can read about KIP-500, which plans to remove the dependency on ZooKeeper in the near future and use the Raft algorithm to form the quorum.
As you already understood, if you install ZooKeeper on a single node it will work in standalone mode and you won't have any resiliency. The classic ZooKeeper ensemble consists of 3 nodes and allows you to lose 1 of them.
After pointing your Kafka brokers to the right ZooKeeper cluster, you can start your brokers and the cluster will be up and running.
In your example, I would suggest you add another node in order to gain better resiliency and meet the replication factor you wanted, while still being able to lose one node without losing data.
Bear in mind that using a single partition means you are bound to a single consumer per consumer group; the rest of the consumers will be idle.
I suggest you read this blog post about Kafka best practices and how to choose the number of topics/partitions in a Kafka cluster.

In Kafka HA, why is the minimum number of brokers 3 and not 2?

We are trying to implement Kafka HA using a Kafka cluster. While doing R&D, we found that the recommended minimum number of nodes for both ZooKeeper and Kafka brokers is 3.
We understand why ZooKeeper should have a minimum of 3 nodes: for leader election, at least (n+1)/2 nodes (a majority) should be up and running.
But it's not clear why a minimum of 3 Kafka brokers is required. Why can't we implement HA with 2 Kafka brokers and 3 ZooKeeper nodes?
The minimum number of ZooKeeper nodes is 3 because of the quorum requirement. The count should be odd, because an even number of nodes adds no extra fault tolerance: a ZooKeeper ensemble of 8 nodes tolerates no more failures than one of 7. Having many ZooKeeper nodes also isn't good, because the consensus protocol (ZooKeeper's Zab, in the same family as Paxos) gets more expensive as the ensemble grows.
For the Kafka cluster, personally I think 2 brokers is okay, but 3 brokers is better. The reason is maintaining the ISR (in-sync replicas).
Say your Kafka cluster has 2 brokers. To maintain high availability and data consistency, we set the replication factor to 2, so both brokers hold a replica. The interesting part is the min.insync.replicas setting. If you set it to 1 and the leader then fails, you likely don't have any remaining replica with all the acknowledged data. If you set it to 2, then when either the leader or the follower fails, producers using acks=all can no longer write, so the partition is effectively unavailable for new data.
If our Kafka cluster has 3 brokers and we set the replication factor to 3 and min.insync.replicas to 2, we accept the risk of losing 1 replica (either the leader or a follower) while continuing to work. For instance, if we lose the leader, there is at least one in-sync follower ready to take over. If we lose one of the followers, we still have a remaining follower to keep the ISR at the minimum of 2.
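If you want to observe this situation on a live cluster, kafka-topics.sh can list the partitions whose ISR has shrunk below min.insync.replicas (the --under-min-isr-partitions flag exists in Kafka 2.3+; the host is a placeholder):

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 \
  --under-min-isr-partitions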
In addition to hqt's answer:
You can set up a Kafka HA cluster with only 2 brokers, but the recommended replication factor for production is 3, so you need 3 brokers in order to achieve this.
Also, you should consider that Confluent is working on moving leader election into Kafka itself, so in the future you will not need ZooKeeper anymore, which will possibly imply running an odd number of Kafka brokers.

Kafka leader election in multi-dc with an arbiter/witness/observer

I would like to deploy a Kafka cluster in two datacenters with the same number of nodes in each DC. The first DC is used in active mode while the second is in passive mode.
For example, let's say that both datacenters have 3 nodes, with 2 in-sync replicas (ISR) on the first DC and one ISR on the second DC.
Is it possible to have a third DC containing an arbiter/witness/observer node such that, in case of failure of one DC, a leader election can succeed with the correct outcome in terms of consistency? MongoDB has such a feature, called a replica set arbiter.
What about deploying ZooKeeper across the three datacenters? From my understanding, ZooKeeper does not hold the Kafka data and does not need to be contacted for each new record in a Kafka topic, i.e. you do not pay the latency to the third DC for each new record.
There is one presentation at the Kafka summit 2017 One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers speaking about this setup. There is also some interesting information inside a Confluent whitepaper Disaster Recovery for Multi-Datacenter Apache Kafka® Deployments.
It says it could work and they called it an observer node, but it also says no one has ever tried this.
ZooKeeper keeps track of the following metadata for Kafka (0.9.0+):
Electing a controller - The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. ZooKeeper is used to elect the controller, make sure there is only one, and elect a new one if it crashes.
Cluster membership - Which brokers are alive and part of the cluster? This is also managed through ZooKeeper.
Topic configuration - Which overrides exist for each topic, where the partitions are located, etc.
Quotas - How much data is each client allowed to read and write?
ACLs - Who is allowed to read from and write to which topic?
More detail on the dependency between Kafka and ZooKeeper can be found in the Kafka FAQ and in an answer on Quora from a Kafka committer working at Confluent.
From the resources I have read, a setup with two DCs (Kafka plus ZooKeeper) and an arbiter/witness/observer ZooKeeper node on a third, higher-latency DC could work, but I haven't found any resource describing an actual experiment with it.
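To make the ZooKeeper-across-three-DCs idea concrete, here is a minimal layout sketch, untested in the same spirit as the whitepaper's caveat (hostnames are hypothetical). Note that a ZooKeeper observer does not vote, so for tie-breaking the third-DC node must be a full voting participant:

# zoo.cfg server list: 2 voters in each main DC plus 1 tie-breaker in DC3;
# quorum is 3 of 5, so losing either main DC still leaves a majority
server.1=zk-dc1-a:2888:3888
server.2=zk-dc1-b:2888:3888
server.3=zk-dc2-a:2888:3888
server.4=zk-dc2-b:2888:3888
server.5=zk-dc3-arbiter:2888:3888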

Shutdown Kafka Cluster and then Start Kafka Cluster

I have a 2-broker Kafka cluster with a 3-node ZooKeeper cluster. When stopping and starting the Kafka cluster, what steps should I take?
Do I stop the 2 brokers individually first, and then stop the 3 ZooKeeper nodes individually?
And then start the ZooKeeper nodes individually, followed by the 2 Kafka brokers?
Assumptions
This is a production cluster and you don't want any data loss.
You have partition replicas spanning the brokers
For each partition you have at least one replica on each broker
All ZooKeeper nodes are accessible by each broker
This is how I would do it:
Take down one broker at a time.
While a broker is down, describe the topics to check that the only replicas missing from the ISR belong to the broker that was taken down.
Restart the broker and verify that all partitions are back in sync (see the check below) before moving on to the next broker.
Then stop and start each ZooKeeper node individually, again verifying each time that all replicas and partitions are in sync. That way a quorum of ZooKeeper nodes (2 of 3) remains available for the 2 brokers to maintain their metadata.
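The in-sync checks between steps can be done with the stock tooling; an empty result means every replica has caught up (the host is a placeholder, and older Kafka versions take --zookeeper instead of --bootstrap-server):

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 \
  --under-replicated-partitions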