Leader election with: Etcd vs Zookeeper vs Hazelcast

We are choosing the best option for implementing leader election for our service (written in Java), which comprises multiple (e.g., 3) instances for high availability. Our goal is to have only a single instance active at any given time.
Would be great to hear your opinion about the following options:
1) Hazelcast. Using "quorum" and a lock we can implement a leader election. However, we can run into a split-brain problem where for some time two leaders may be present. Also, it seems that Hazelcast does not support SSL.
2) Zookeeper. We can implement leader election on top of a Zookeeper ensemble (where a ZK node is run on each instance of our service). Does Zookeeper provide better consistency guarantees than Hazelcast? Does it also suffer from the split-brain problem?
3) Etcd. We can use the Jetcd library which seems like the most modern and robust technology. Is it really better in terms of consistency than Zookeeper?
Thank you.

1) Hazelcast, as of version 3.12, provides the CP Subsystem, which is CP in terms of the CAP theorem and is built on the Raft consensus algorithm inside the Hazelcast cluster. The CP Subsystem has a distributed lock implementation called FencedLock, which can be used to implement leader election (a minimal sketch is included below).
For more information about the CP Subsystem and FencedLock, see:
CP Subsystem Reference manual
Riding the CP Subsystem
Distributed Locks are Dead; Long Live Distributed Locks!
Hazelcast versions before 3.12 are not suitable for leader election. As you already mentioned, they can choose availability during network splits, which can lead to the election of multiple leaders.
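A minimal sketch of what that could look like with FencedLock, assuming a CP Subsystem enabled with at least 3 members (the class and lock names are hypothetical):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;

public class HazelcastLeaderElection {
    public static void main(String[] args) {
        // The CP Subsystem needs at least 3 members to form its Raft group.
        Config config = new Config();
        config.getCPSubsystemConfig().setCPMemberCount(3);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Every instance competes for the same CP lock; whoever holds it is the leader.
        FencedLock leaderLock = hz.getCPSubsystem().getLock("service-leader");
        leaderLock.lock();                       // non-leaders block here
        try {
            long fence = leaderLock.getFence();  // monotonic fencing token
            System.out.println("Acting as leader, fencing token = " + fence);
            // ... leader-only work goes here ...
        } finally {
            leaderLock.unlock();
        }
    }
}

If the leader crashes, its CP session eventually expires and the lock passes to one of the blocked instances, so only one instance is active at a time.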
2) Zookeeper doesn't suffer from the split-brain problem you mention; you will not observe multiple leaders when a network split happens. Zookeeper is built on the ZAB atomic broadcast protocol.
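For example, instead of hand-rolling the ephemeral sequential znode recipe, Apache Curator's LeaderLatch can do the election on top of a Zookeeper ensemble; a minimal sketch (connection string and ids are hypothetical):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLeaderElection {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",        // hypothetical 3-node ensemble
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All instances register on the same path; Zookeeper picks exactly one leader.
        LeaderLatch latch = new LeaderLatch(client, "/my-service/leader", "instance-1");
        latch.start();
        latch.await();                               // blocks until this instance is elected

        if (latch.hasLeadership()) {
            System.out.println("This instance is now the active one");
            // ... leader-only work; close() the latch to relinquish leadership ...
        }
    }
}

LeaderLatch also takes a listener, so instead of blocking on await() you can react to isLeader()/notLeader() callbacks.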
3) Etcd uses the Raft consensus protocol. Raft and ZAB provide similar consistency guarantees, and both can be used to implement a leader election process.
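If you go the etcd route, jetcd exposes the etcd v3 election API; a rough sketch, assuming your jetcd version provides the Election and Lease clients used here (endpoint and names are hypothetical):

import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.Election;
import io.etcd.jetcd.Lease;
import java.nio.charset.StandardCharsets;

public class EtcdLeaderElection {
    public static void main(String[] args) throws Exception {
        Client client = Client.builder()
                .endpoints("http://etcd1:2379")      // hypothetical endpoint
                .build();

        // Leadership is tied to a lease; it expires if this process stops renewing it.
        Lease lease = client.getLeaseClient();
        long leaseId = lease.grant(10).get().getID();   // 10-second TTL

        // campaign() completes only once this instance has won the election.
        Election election = client.getElectionClient();
        ByteSequence name = ByteSequence.from("service-leader", StandardCharsets.UTF_8);
        ByteSequence proposal = ByteSequence.from("instance-1", StandardCharsets.UTF_8);
        election.campaign(name, leaseId, proposal).get();

        System.out.println("This instance is now the leader");
        // Keep the lease alive while doing leader-only work; let it lapse to step down.
    }
}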
Disclaimer: I work at Hazelcast.

Related

Do you need multiple ZooKeeper instances to run a multi-broker Kafka?

I'm new to Kafka.
Kafka is supposed to be used as a distributed service, but the tutorials and blog posts I found online never mention whether there is one or several ZooKeeper nodes.
The tutorials just spin up one ZooKeeper instance and then multiple Kafka brokers.
Is that how it is supposed to be done?
ZooKeeper is a centralized coordination service for distributed systems; clusters use it for maintaining the distributed system. It provides distributed synchronization via metadata such as configuration information, naming, and so on.
In typical architectures, a Kafka cluster is served by 3 ZooKeeper nodes. For very large deployments this can be ramped up to 5 ZooKeeper nodes, but that in turn adds load on the nodes, because all of them must stay in sync and all metadata-related activity is handled by ZooKeeper.
It should also be noted that, as an improvement, newer Kafka releases reduce the dependency on ZooKeeper in order to improve metadata scalability, reduce the complexity of maintaining metadata in an external component, and speed up recovery from unexpected shutdowns. With the new approach, controller failover is almost instantaneous. This is achieved by the Kafka Raft Metadata mode, termed 'KRaft', which runs Kafka without ZooKeeper by folding all of the responsibilities previously handled by ZooKeeper into a service inside the Kafka cluster itself, operating on the event-based mechanism used in the KRaft protocol.
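As an illustration, a combined KRaft-mode node is configured roughly like this in server.properties; the host names here are hypothetical and the exact keys can vary by Kafka version:

# KRaft mode: no zookeeper.connect at all
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@kafka1.example.com:9093,2@kafka2.example.com:9093,3@kafka3.example.com:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER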
Tutorials generally keep things nice and simple, so one ZooKeeper (often one Kafka broker too). Useful for getting started; useless for any kind of resilience :)
In practice, you are going to need three ZooKeeper nodes minimum.
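For reference, a three-node ensemble is configured with a zoo.cfg along these lines on each node (host names are hypothetical):

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Each node also gets a myid file under dataDir containing its own server number.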
If it helps, here is an enterprise reference architecture whitepaper for the deployment of Apache Kafka
Disclaimer: I work for Confluent, who publish the above whitepaper.

Building a Kafka Cluster using two servers only

I'm planning to build a Kafka Cluster using two servers, and host Zookeeper on these two servers as well.
The question is: since Kafka requires ZooKeeper to run, what is the best ZooKeeper cluster layout for a Kafka cluster on two servers?
For example, I'm currently running a ZooKeeper and a Kafka broker on each of the two servers, and in the Kafka configuration they point to both ZooKeepers.
Is there a better way to do this?
First of all, you don't have to set up ZooKeeper and Kafka on the same servers. One of ZooKeeper's roles is electing the controller (the broker responsible for maintaining the leader/follower relationship for all partitions). For that election, a majority of ZooKeeper nodes must be alive. In your case, if even one ZooKeeper instance is down, you cannot elect a controller, so there is no difference between having one ZooKeeper or two. That's why it is recommended to have at least 3 nodes in a ZooKeeper cluster; that way you can tolerate the failure of one ZooKeeper node.
In addition to this, it is highly recommended to have at least three brokers in your Kafka cluster to maintain both consistency and high availability. (link1, link2)
UPDATE:
As long as you are limited to only two servers, you can consider sacrificing some high availability by setting min.insync.replicas=2 on the brokers and creating topics with replication.factor=2. If HA matters more to you than potential data loss, you can instead keep the default broker config min.insync.replicas=1, again with topic replication.factor=2. In this situation, those are your options IMHO. (Having one or two ZooKeepers doesn't matter, as I mentioned above.)
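To make that concrete, here is a sketch of the durability-leaning variant; broker ids, host names and the topic are hypothetical:

# server.properties on each of the two brokers
broker.id=1                      # 2 on the other server
zookeeper.connect=server1:2181,server2:2181
default.replication.factor=2
min.insync.replicas=2            # use 1 (the default) if availability matters more than data loss

# topics created with two replicas
kafka-topics.sh --create --topic orders --partitions 6 --replication-factor 2 \
  --bootstrap-server server1:9092

With min.insync.replicas=2 and acks=all, producers can no longer write as soon as one broker is down, which is exactly the consistency-over-availability trade-off described above.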
I am often faced with the same problem as you, #frisky5, where I would like to achieve a "suboptimal" HA system using only 2 nodes, and thus workarounds are always needed with cloud-native frameworks that rely on the assumption that clusters will have lots of nodes available.
That ain't always the case in real life, is it ;) ?
That being said, I see you as essentially having 2 options:
Externalize zookeeper configuration on a replicated storage system using 2 nodes (e.g. DRBD)
Replicate Kafka data volumes entirely on the second nodes and use 2 one-node Kafka clusters that you switch on and off depending on who is the current master node.
I would go for the first option. In that case you would have 2 Kafka servers and one ZooKeeper server whose IP needs to be static (a virtual IP). When the ZooKeeper node goes down, it is restarted on the second node with the same VIP, but it needs to access the synchronized data folder.
I am not too familiar with ZooKeeper's internals and I can't tell you whether it will run into conflicts when starting up on a data store that "wasn't its own", but I would guess it makes sense for you to test it using a simple rsync setup.
Another way to achieve consensus, if you are using a k3s-based Kubernetes cluster, would be to rely on the internal k8s distributed consensus mechanics to "tell Kafka" which node is the leader. This works for the postgres operator by CrunchyData because Patroni is cool ( https://patroni.readthedocs.io/en/latest/kubernetes.html ) 😎, but I am not sure whether Kafka/ZooKeeper are that flexible and can communicate via a REST API to set their locks ...
Once you have achieved that intermediate step, you can use a PostgreSQL db as the external source of truth for k3s, and then it is as simple as syncing the postgres data folder between the machines (easily done with rsync). The beauty of this approach is that it is much more generic and could be used for other systems too.
Let me know what you think about these two approaches and whether you manage to set up a test environment. If you do it on GitHub, I can help you out with the implementation.

Apache zookeeper Leader Election: can it work with only two nodes?

I have a two-node Red Hat system with an identical set of services on each. I am looking for a way to determine which service is "in charge" and which is a "running backup". So, for example, service-A exists and is running on both nodes, but only one should be processing data while the other sleeps until the first crashes. Same for the other services in the set.
Zookeeper's leader election capability looked like it would suffice; the whole ephemeral and sequential znode approach looked good on paper. I imagined that I would also need a zookeeper service running on each node for redundancy in the face of node failure, for example.
But the documentation points out issues with running multiple ZooKeepers, requiring at least 3 instances in order to guarantee a quorum for electing the lead ZooKeeper among all the others. As I only have two nodes, this looks like a deal-breaker.
So before I drop the ZooKeeper approach, I thought I'd ask whether there is some configuration option for ZooKeeper that would allow a two-node system to work. Otherwise I'm off to find the next best fit for my problem.
You can run ZooKeeper with just two instances. However, it gives you no fault-tolerance benefit, because the quorum is still 2 in that case: if either instance fails, the ZooKeeper ensemble rejects client requests. That's why the recommended minimum for an ensemble is 3 ZooKeeper instances; having 2 instances is no better than having 1, so why go through the trouble of creating 2? It actually creates more points of failure, because when either instance dies your ZooKeeper ensemble halts, and having either one of two fail is more likely than having a single one fail.
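The arithmetic behind that, as a tiny illustrative helper (not part of any ZooKeeper API):

public class QuorumMath {
    // ZooKeeper needs a strict majority of the voting members to make progress.
    static int quorum(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(1)); // 1 -> tolerates 0 failures
        System.out.println(quorum(2)); // 2 -> tolerates 0 failures: either node failing halts it
        System.out.println(quorum(3)); // 2 -> tolerates 1 failure
    }
}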

Running zookeeper on a cluster of 2 nodes

I am currently working on trying to use zookeeper in a two node cluster. I have my own cluster formation algorithm running on the nodes based on configuration. We only need Zookeeper's distributed DB functionality.
Is it possible to use Zookeeper in a two node cluster ? Do you know of any solutions where this has been done ?
Can we still retain ZooKeeper's DB functionality without forming a quorum?
Note: Fault tolerance is not the main concern in this project. If one of the nodes goes down, we have enough code logic to run without the ZooKeeper service. We use ZooKeeper to share data when both nodes are alive.
Would greatly appreciate any help.
ZooKeeper is a coordination system, which is basically used to coordinate among nodes. When writes occur in such a distributed system, in order to coordinate and agree upon the values being stored, all writes go through the master (aka the leader). Reads can be served by any node. ZooKeeper requires a master/leader to be elected per quorum in order to serve write requests consistently. ZooKeeper uses the ZAB protocol as its consensus algorithm.
In order to elect a leader, a quorum should ideally have an odd number of nodes (otherwise, a node may not be able to win a majority and become the leader). In your case, with two nodes, ZooKeeper may not be able to elect a leader for a long time, since both nodes will be candidates waiting for the other to vote for it. Even if they do elect a leader, your ensemble will not work properly in network partitioning situations.
As I said, ZooKeeper is not distributed storage. If you need to use it in a distributed manner (more than one node), it needs to form a quorum.
As I see it, what you need is a distributed database, not a distributed coordination system.

Assign a server as leader in zookeeper ensemble

We have a quorum of 4 servers, all of which have ZooKeeper 3.4.6 installed. Leader election is currently managed automatically. However, we would like to assign a particular server as the leader, as this box is more robust and has higher capacity.
I am looking for a setting to always assign one specific server as leader. Is it possible? I even tried the ZooKeeper 3.5.1-alpha version, but even that doesn't seem to have any such setting. I understand there are algorithms for implementing the election, but a setting would be more advantageous for us.
Any thoughts?
Thanks,
Ram
There is no such setting. Leader election is automatic, unless you decide to implement an algorithm yourself, but it seems to me that's not the solution you are looking for.