Zookeeper failures in Kafka 0.9 and above - apache-kafka

Based on the answer given in Is Zookeeper a must for Kafka?, it is clear what the responsibility of Zookeeper is in Kafka 0.9 and above.
I just wanted to understand: what will be the impact if the Zookeeper cluster goes down completely?

Kafka uses ZK for membership (figuring out which brokers exist and which of them are alive) and leader election (electing the one broker that acts as controller for the cluster at any given moment).
Simply put: if ZK fails, Kafka dies.
If ZK sneezes (say, a particularly long GC pause or a short network connectivity issue), a Kafka cluster may temporarily "lose" any number of members and/or the controller. By the time this settles you may have a new controller and new leader brokers for all partitions (which may or may not cause loss of acknowledged data; see "unclean leader election"). I'm not sure whether all ongoing transactions would be rolled back - never tried.
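For reference, the membership and controller information mentioned above lives in well-known znodes and can be inspected with the shell that ships with Kafka. A quick sketch, assuming a ZooKeeper node reachable at localhost:2181 and a reasonably recent Kafka distribution:
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids    # ephemeral znodes, one per live broker
bin/zookeeper-shell.sh localhost:2181 get /controller    # which broker currently acts as controller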

Related

Will Kafka brokers share the same locations to store data logs if they are in a cluster?

I am reading an article on Kafka basics. It says that if one Kafka broker (brokerX) dies in a cluster, copies of brokerX's data will move to other live brokers in the cluster.
If that is the case, will Zookeeper/the Kafka controller copy the brokerX data folder and move it to the live brokers, like copy-pasting from one machine's hard disk to another (a physical copy)?
Or do the live brokers share a common location, so that Zookeeper/the controller will link/point to the brokerX locations (a logical copy)?
I am having a little trouble understanding this. Could someone help me with it?
If a broker dies, it's dead. There's no background process that will copy data off of it.
The replication of topics only happens while the broker is running.
Plus, that image is wrong: partitions = 2 means exactly that. A third partition doesn't just appear when a broker dies.
This all depends on whether the topic has a replication factor > 1. In that case, brokers holding a follower replica constantly send fetch requests to the leader replica (a specific broker), with the goal of staying fully caught up with the leader (both the follower replica and the leader replica having the same records stored on disk).
So when a broker goes down, all it takes is for the controller to select and promote an in-sync replica (by default; a non-in-sync replica can be selected if unclean leader election is allowed) to take over as leader of the partition. No copy/paste is required: all brokers holding a replica of a partition of that topic (whether follower or leader) were storing the same information before the broker shut down.
If a broker dies, the behaviour depends on the dead broker. If it was not the leader for its partition, it's not a problem: when the broker comes back online it will copy any missing data from the leader replica. If the dead broker was the leader for the partition, a new leader will be elected according to some rules. If the newly elected leader was in sync before the old leader died, there will be no message loss, and the follower brokers will sync their replicas from the new leader, as the broken leader will do when it is up again. If the newly elected leader was not in sync, you might have some message loss. Either way, you can drive the behaviour of your Kafka cluster by setting various parameters to balance speed, data integrity and reliability.
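As an illustration of the replication discussed in these answers, a topic can be created with a replication factor above 1 and its leader/ISR placement inspected with the standard CLI. A sketch, where the broker address localhost:9092 and the topic name demo are placeholders, and --bootstrap-server requires Kafka 2.2 or later:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic demo --partitions 2 --replication-factor 3
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic demo    # shows Leader, Replicas and Isr per partition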

Kafka setup strategy for replication?

I have two VM servers (say S1 and S2) and need to install Kafka in cluster mode, where there will be a topic with only one partition and two replicas (one the leader itself and the other a follower) for reliability.
I got a high-level idea from this cluster setup and want to confirm whether the strategy below is correct.
First set up Zookeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?
Start the Kafka broker on both nodes. It can be on the same port, as it is hosted on different nodes.
Create a topic on any node with partition 1 and replicas as two.
Zookeeper will select a broker on one node as leader and the other as follower.
The producer will connect to any broker and start publishing messages.
If the leader goes down, Zookeeper will select another node as leader automatically. Not sure how a replica count of 2 will be maintained now, as there is only one node live?
Is the above strategy correct?
Useful resources
ISR
ISR vs replication factor
First set up Zookeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?
Answer: Yes. Zookeeper is still a must until KIP-500 is released. Zookeeper is responsible for electing the controller, storing metadata about the Kafka cluster and managing broker membership (link). Ideally the number of Zookeeper nodes should be at least 3; this way you can tolerate one node failure (2 healthy Zookeeper nodes, a majority of the cluster, are still capable of electing a controller). You should also consider setting up the Zookeeper cluster on machines other than the ones Kafka is installed on, so that the failure of a server won't lead to the loss of both a Zookeeper and a Kafka node.
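As a sketch of what such a 3-node ensemble looks like, a minimal ZooKeeper zoo.cfg could contain the following (hostnames and the data directory are illustrative placeholders; each node also needs a myid file in dataDir matching its server.N entry):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888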
Start the Kafka broker on both nodes. It can be on the same port, as it is hosted on different nodes.
Answer: You should first start the Zookeeper cluster, then the Kafka cluster. Using the same port on different nodes is fine.
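On each node that would look roughly like this, assuming the stock scripts and config files shipped with Kafka (paths may differ in your installation):
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties    # start ZooKeeper first
bin/kafka-server-start.sh -daemon config/server.properties           # then the broker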
Create a topic on any node with partition 1 and replicas as two.
Answer: Partitions are used for horizontal scalability. If you don't need this, one partition is okay. With a replication factor of 2, one of the nodes will be the leader and the other will be the follower at any time, but that is not enough to avoid data loss completely or to provide HA. In the ideal configuration for avoiding data loss without compromising HA, you should have at least 3 Kafka brokers, a replication factor of 3 for topics, min.insync.replicas=2 as a broker config and acks=all as a producer config. (you can check this for more information)
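Put together, the settings mentioned above would look roughly like the following sketch (not a complete configuration; default.replication.factor only applies to topics created without an explicit replication factor):
# broker side (server.properties)
min.insync.replicas=2
default.replication.factor=3
# producer side
acks=all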
Zookeeper will select a broker on one node as leader and the other as follower.
Answer: The controller broker is responsible for maintaining the leader/follower relationship for all partitions. One broker will be the partition leader and the other will be the follower. You can check partition leaders/followers with this command:
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic
The producer will connect to any broker and start publishing messages.
Answer: Yes. Setting only one broker in bootstrap.servers is enough to connect to the Kafka cluster, but for redundancy you should provide more than one broker in bootstrap.servers.
bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
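For example, with the console producer that ships with Kafka (recent versions; the broker hostnames below are placeholders):
bin/kafka-console-producer.sh --bootstrap-server kafka1.example.com:9092,kafka2.example.com:9092 --topic my-replicated-topic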
If the leader goes down, Zookeeper will select another node as leader automatically. Not sure how a replica count of 2 will be maintained now, as there is only one node live?
Answer: If the controller broker goes down, Zookeeper will select another broker as the new controller. If the broker which is the leader of your partition goes down, one of the in-sync replicas will become the new leader (the controller broker is responsible for this). But of course, if you have just two brokers and one of them is down, a replication factor of 2 can no longer be satisfied. That's why you should have at least 3 brokers in your Kafka cluster.
Yes - ZooKeeper is still needed in Kafka 2.4, but you can read about KIP-500, which plans to remove the dependency on ZooKeeper in the near future and use the Raft algorithm to form the quorum.
As you already understood, if you install ZK on a single node it will work in standalone mode and you won't have any resiliency. The classic ZK ensemble consists of 3 nodes, which allows you to lose 1 ZK node.
After pointing your Kafka brokers to the right ZK cluster you can start your brokers and the cluster will be up and running.
In your example, I would suggest you create another node in order to gain better resiliency and meet the replication factor you wanted, while still being able to lose one node without losing data.
Bear in mind that using a single partition means you are limited to a single consumer per consumer group; the rest of the consumers will be idle.
I suggest you read this blog about Kafka Best Practices and how to choose the number of topics/partitions in a Kafka cluster.

Why does a Kafka cluster need a Controller when it already uses Zookeeper?

A Kafka cluster has a controller node and a Zookeeper cluster, each with its own set of responsibilities. What is the need for the controller when we already have Zookeeper?
For example, controller election is performed by Zookeeper, while partition leader election is done by the controller. Why doesn't Kafka also use Zookeeper for partition leader election, when Zookeeper already has the information about which partitions are on which nodes and which nodes are actually alive?
In short, I am struggling to understand the need for the controller despite Zookeeper being present. It would be really helpful if someone could explain the reason for this design choice and its advantages.
Kafka uses Zookeeper for a few things:
cluster membership - the live brokers of a cluster are those that have ephemeral ZK nodes
leader election - election of the Kafka broker that acts as the controller
state storage - some (mostly older) state is stored in ZK - the configuration for topics, for example. Some state that used to be in ZK has been migrated to special topics (consumer offsets), and some newer functionality was written to store state entirely in Kafka (transaction logs, for example).
The general trend is to stop keeping state in ZK and instead self-host it (although older parts of the code have never been migrated out).
As for why ZK is not used for partition leader election - one reason is that there is logic involved. When electing the cluster controller broker there is no preference - any broker will do. This fits well with how ZK-based leader election works (the first member to create and own an ephemeral znode wins).
When choosing a partition leader, however, you need a bit more logic. For example, you would like to elect the leader with the "highest watermark" (the most up-to-date data - remember replication is generally async). There is also logic around unclean leader election. ZK alone cannot do that, hence it is done by the controller.
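The unclean-leader-election logic mentioned above is controlled by a per-topic setting; as an illustrative sketch for recent Kafka versions (the topic name demo and the broker address are placeholders), it can be disabled explicitly with the config tool:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name demo --add-config unclean.leader.election.enable=false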
Zookeeper works as a coordination service, and Kafka uses Zookeeper for exactly that purpose.
Zookeeper is a must by design for Kafka, because Zookeeper is, in a sense, responsible for managing the Kafka cluster: it holds the list of all Kafka brokers, and the controller of the cluster is elected via Zookeeper and recorded there.
Kafka stores only minimal information in Zookeeper.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
Another reason is to protect Zookeeper: without a controller, Zookeeper would need to trigger too many listeners (one per broker), and most of these listeners would be useless, which is a potential risk for Zookeeper. With a controller, only the controller interacts with Zookeeper.

Kafka leader election in multi-dc with an arbiter/witness/observer

I would like to deploy a Kafka cluster in two datacenters with the same number of nodes on each DC. The first DC is used in active mode while the second is in passive mode.
For example, let's say that both datacenters have 3 nodes, with 2 in-sync replicas (ISR) in the first DC and one ISR in the second DC.
Is it possible to have a third DC containing an arbiter/witness/observer node, such that in case of the failure of one DC a leader election can still succeed with the correct outcome in terms of consistency? MongoDB has such a feature, named Replica Set Arbiter.
What about deploying ZooKeeper on the three datacenters? From my understanding ZooKeeper does not hold the Kafka data and it should not be contacted for each new record in the Kafka topic, i.e. you do not pay the latency to the third DC for each new record.
There is one presentation at the Kafka summit 2017 One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers speaking about this setup. There is also some interesting information inside a Confluent whitepaper Disaster Recovery for Multi-Datacenter Apache Kafka® Deployments.
It says it could work (they call it an observer node), but it also says no one has ever tried this.
Zookeeper keeps track of the following metadata for Kafka (0.9.0+):
Electing a controller - the controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. Zookeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.
Cluster membership - which brokers are alive and part of the cluster? This is also managed through ZooKeeper.
Topic configuration - which overrides exist for each topic, where the partitions are located, etc. (see the sketch after this list).
Quotas - how much data each client is allowed to read and write.
ACLs - who is allowed to read from and write to which topic.
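While Kafka still depends on ZooKeeper, the topic configuration mentioned in the list above can be inspected directly under its znodes. A sketch, assuming ZooKeeper at localhost:2181 and the topic my-replicated-topic:
bin/zookeeper-shell.sh localhost:2181 ls /config/topics                          # one znode per topic
bin/zookeeper-shell.sh localhost:2181 get /config/topics/my-replicated-topic     # config overrides for this topic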
More detail on the dependency between Kafka and Zookeeper can be found in the Kafka FAQ and in an answer on Quora from a Kafka committer working at Confluent.
From the resources I have read, a setup with two DCs (Kafka plus Zookeeper) and an arbiter/witness/observer Zookeeper node in a third, high-latency DC could work, but I haven't found any resource that has actually experimented with it.

How do I measure the time taken for a Kafka rebalance?

I am very new to Kafka, so this question might be very basic.
What I am trying to achieve is to find out the time it takes to rebalance when a broker fails and is then added back.
From my reading of the documentation (http://kafka.apache.org/documentation/#basic_ops_restarting), when a broker fails or is taken down for maintenance:
It will sync all its logs to disk to avoid needing to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery takes time so this speeds up intentional restarts.
It will migrate any partitions the server is the leader for to other replicas prior to shutting down. This will make the leadership transfer faster and minimize the time each partition is unavailable to a few milliseconds.
What I want to do is find out the time taken to migrate the partitions that the server is the leader for to other replicas.
My Kafka setup is 3 broker nodes and 3 ZK nodes.
Also, when I add this node back with auto.rebalance=true, the rebalance kicks in again and a leader is re-elected.
How do I measure this time as well?
There is no "migration" as in data copy. When shutting down a broker cleanly, the controller will simply elect a new leader from the available replicas for all partitions the broker was the leader, making the transition fast.
There are a few metrics you can use to monitor leader elections.
Since 0.11.0.0, the broker exposes a number of Controller metrics including:
kafka.controller:type=ControllerStats,name=AutoLeaderBalanceRateAndTimeMs
This tracks the rate and duration of automatic leader rebalances. The full list of controller metrics added in 0.11 is available in the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-143%3A+Controller+Health+Metrics#KIP-143:ControllerHealthMetrics-ControllerMetrics
If you are running an older version (< 0.11.0.0), you'll have to rely on metrics like:
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
This includes all leader elections.
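These are JMX metrics, so one illustrative way to read them from the command line is the JmxTool bundled with Kafka, assuming the broker was started with JMX enabled (e.g. JMX_PORT=9999; the port is a placeholder):
bin/kafka-run-class.sh kafka.tools.JmxTool \
  --object-name kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi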