Assign a server as leader in zookeeper ensemble - apache-zookeeper

We have a quorum of 4 servers which has zookeeper 3.4.6 installed in all of them.The leader election is currently managed automatically. However we would like to assign a particular server as a leader as this box is more robust and has high capabilities.
I am looking for a setting to assign a server as leader always.Is it possible?. I even tried the zookeeper 3.5.1-alpha version but even that doesnt seem to have any particular setting. I understand there are algorithms for implementing the election but a setting will be more advantageous for us.
Any thoughts?
Thanks,
Ram

There is no such setting. Leader election is automatic unless you decide to implement an algorithm but seems to me thats not a solution you are looking for.

Related

How to handle failure senario for kafka and zookeeper in kubernetes

What I have zookeeper setup which is running on server1, server2 and server3 and similarly kafka also running in server1, server2 and server3.
Setup are running in kubernetes.
Problem statement:
In case one zookeeper setup get down entire setup will get down, because kafka is depended to zookeeper. am i right?
If Q1 correct - Is there any way to make setup like if one zookeeper server will get down then kafka should run as it is?
How to expose kafka port in kubernetes setup ?
what is the recommended way to persist data in kubernetes for production server ?
I fail to see how Zookeeper questions are related to k8s... But you definitely should set affinity rules such that Zookeeper and Kafka are not on the same physical servers or sharing same disks
If one Zookeeper out of three goes down, you'll end up with a split brain event in that no single Zookeeper knows which should be responsible for leadership. This effectively can crash or corrupt Kafka, yes.
To mitigate that risk, you can choose to run 5 Zookeepers, in which case you can lose up to 3 servers to reach the same state. The Definitive Guide book covers these concepts in the first few chapters
Regarding the other questions - NodePorts and PVCs, generally speaking.
Use one of the popular Kafka Operators on Github and you'll not need to think too hard about setting those properties
You still must manually perform Kafka admin tasks in any installation... You can use extra services like Cruise Control if you want to reduce that workload, though

During rolling upgrade/restart, how to detect when a kafka broker is "done"?

I need to automate a rolling restart of a kafka cluster (3 kafka brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).
What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up-to-date and such? In my restart script, I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.
Another way would be to ask what a good "readyness" probe would be that does not simply check some TCP/IP port, but looks at the actual server...
I would suggest exposing JMX metrics and tracking the following for cluster health
the controller count (must be 1 over the whole cluster)
under replicated partitions (should be zero for healthy cluster)
unclean leader elections (if you don't disable this in server.properties make sure there are none in the metric counts)
ISR shrinks within a reasonable time period, like 10 minute window (should be none)
Also, Yelp has tooling for rolling restarts implemented in Python, which requires Jolokia JMX Agents installed on the brokers, and it polls the metrics to make sure some of the above conditions are true
Assuming your cluster was healthy at the beginning of the restart operation, at a minimum, after each broker restart, you should ensure that the under-replicated partition count returns to zero before restarting the next broker.
As the previous responders mentioned, there is existing code out there to automate this. I don’t use Jolikia, myself, but my solution (which I’m working on now) also uses JMX metrics.
Kakfa Utils by Yelp is one of the best tools that can be used to detect when a kafka broker is "done". Specifically, kafka_rolling_restart is the tool which gets broker details from zookeeper and URP (Under Replicated Partitions) metrics from each broker. When a broker is restarted, total URPs across Kafka cluster is periodically collected and when it goes to zero, it restarts another broker. The controller broker is restarted at the last.

Leader election with: Etcd vs Zookeeper vs Hazelcast

We are choosing the best option for implementing a leader election for our service (written in Java) comprised of multiple (e.g., 3) instances for high availability. Our goal is to have only a single instance active at any given time.
Would be great to hear your opinion about the following options:
1) Hazelcast. Using "quorum" and a lock we can implement a leader election. However, we can run into a split-brain problem where for some time two leaders may be present. Also, it seems that Hazelcast does not support SSL.
2) Zookeeper. We can implement leader election on top of a Zookeeper ensemble (where a ZK node is run on each instance of our service). Does Zookeeper provide better consistency guarantees than Hazelcast? Does it also suffer from the split-brain problem?
3) Etcd. We can use the Jetcd library which seems like the most modern and robust technology. Is it really better in terms of consistency than Zookeeper?
Thank you.
1) Hazelcast, by version 3.12, provides a CPSubsystem which is a CP system in terms of CAP and built using Raft consensus algorithm inside the Hazelcast cluster. CPSubsytem has a distributed lock implementation called FencedLock which can be used to implement a leader election.
For more information about CPSubsystem and FencedLock see;
CP Subsystem Reference manual
Riding the CP Subsystem
Distributed Locks are Dead; Long Live Distributed Locks!
Hazelcast versions before 3.12 are not suitable for leader election. As you already mentioned, it can choose availability during network splits, which can lead to election of multiple leaders.
2) Zookeeper doesn't suffer from the mentioned split-brain problem, you will not observe multiple leaders when network split happens. Zookeeper is built on ZAB atomic broadcast protocol.
3) Etcd is using Raft consensus protocol. Raft and ZAB have similar consistency guarantees, which both can be used to implement a leader election process.
Disclaimer: I work at Hazelcast.

Running zookeeper on a cluster of 2 nodes

I am currently working on trying to use zookeeper in a two node cluster. I have my own cluster formation algorithm running on the nodes based on configuration. We only need Zookeeper's distributed DB functionality.
Is it possible to use Zookeeper in a two node cluster ? Do you know of any solutions where this has been done ?
Can we still retain the zookeepers DB functionality without forming a quorum ?
Note: Fault tolerance is not the main concern in this project. If one of the nodes go down we have enough code logic to run without the zookeeper service. We use the zookeeper to share data when both the nodes are alive.
Would greatly appreciate any help.
Zookeeper is a coordination system which is basically used to coordinate among nodes. When writes are occurred to such a distributed system, in ordered to coordinate and agree upon values which are being stored, all the writes are gone through master (aka leader). Reads can occur through any node. Zookeeper requires a master/leader to be elected per a quorum in order to serve write requests consistently. Zookeeper make use of the ZAB protocol as the consensus algorithm.
In order to elect a leader, a quorum should ideally have an odd number of nodes (Otherwise, a node will not be able to win majority and become the leader). In your case, with two nodes, zookeeper will not possibly be able to elect a leader for a long time since both nodes will be candidates and wait for the other node to vote for it. Even though they elect a leader, your ensemble will not work properly in network patitioning situations.
As I said, zookeeper is not a distributed storage. If you need to use it in a distributed manner (more than one node), it need to form a quorum.
As I see, what you need is a distributed database. Not a distributed coordination system.

Does leader election need to be executed in separate step after setting up 3-machine ensemble?

I currently have 3 machines in a Zookeeper ensemble - I've already set up the configuration file, and the 3 machines in the ensemble can communicate with each other.
What exactly is leader election? Do I need to use a LeaderLatch to start leader election, or does it happen automatically after correctly setting up the configuration file?
It's a bit confusing because the Zookeeper server itself has the concept of a leader that it takes care of electing when you set up a ZK ensemble. But that leader is purely internal to ZK. See this post for more details.
If you want leader election for your application then you will need to implement a leader election recipe, either using something like Curator LeaderLatch or rolling your own based on the recipe on the ZK site.