Leader Election: Consul vs ZooKeeper

We are choosing the best option for implementing leader election to achieve high availability. Our goal is to have only a single instance active at any given time. We are using Spring Boot to develop an application that is deployed on Tomcat by default. It would be great to hear your opinion on the following:
Does ZooKeeper provide better CP guarantees than Consul?
What is your view on maintenance and complexity?

ZooKeeper is based on ZAB and Consul is based on Raft. At a high level, both are very similar atomic broadcast algorithms. So, as far as the "Consistency" of CAP (which is actually linearizability, a very strong form of consistency) is concerned, both provide similar guarantees. Both have linearizable writes to a quorum (majority). The other nodes (not in the quorum) may lag behind on updates by default, resulting in stale reads. It is done this way because complete linearizability makes things slow, and many applications are fine with slightly stale reads. However, if that is not acceptable in a particular use case, it is always possible to use a sync call before a read in ZooKeeper, or consistent mode in Consul, to achieve complete linearizability.
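To make the stale-read point concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the real ZAB or Raft protocol: `Replica`, `ToyCluster` and the `sync` flag are hypothetical names invented for illustration. A write is acknowledged once a majority has applied it, and a lagging replica only catches up when the reader asks for a sync first.

```python
class Replica:
    """One node holding a copy of the data (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.log = []    # entries this replica has applied, in commit order
        self.data = {}

    def apply(self, entry):
        key, value = entry
        self.log.append(entry)
        self.data[key] = value


class ToyCluster:
    """Toy model of quorum writes; NOT how ZooKeeper or Consul are implemented."""

    def __init__(self, size=3):
        self.replicas = [Replica(f"node{i}") for i in range(size)]
        self.committed = []  # every entry acknowledged by a majority

    def write(self, key, value):
        entry = (key, value)
        majority = len(self.replicas) // 2 + 1
        self.committed.append(entry)
        # Only a majority applies the entry right away; the rest lag behind.
        for replica in self.replicas[:majority]:
            replica.apply(entry)

    def read(self, replica_index, key, sync=False):
        replica = self.replicas[replica_index]
        if sync:
            # Catch up on everything committed so far before answering.
            for entry in self.committed[len(replica.log):]:
                replica.apply(entry)
        return replica.data.get(key)


cluster = ToyCluster(size=3)
cluster.write("leader", "app-1")
stale = cluster.read(2, "leader")             # lagging replica, plain read
fresh = cluster.read(2, "leader", sync=True)  # same replica, sync first
print(stale, fresh)
```

In this model the plain read on the lagging replica misses the write, while the synced read sees it. That is the trade-off described above: the default read is cheaper, the synced read is up to date.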
For service discovery, however, Consul seems to provide higher level constructs that are not out-of-the-box in ZooKeeper.
For the leader election use case, both can be used.
But given that ZooKeeper is used by many top-level Apache projects and is older than Raft (and therefore Consul), I expect it to have better community support and documentation. The Apache documentation, which provides various recipes, is also great.
Finally, if you go with ZooKeeper, you may also want to use Apache Curator which provides higher level APIs on top of ZooKeeper.

Related

Consensus Service vs Lock Service?

Going through Google's Chubby paper:
"Like a lock service, a consensus service would allow clients to make progress safely even with only one active client process; a similar technique has been used to reduce the number of state machines needed for Byzantine fault tolerance [24]. However, assuming a consensus service is not used exclusively to provide locks (which reduces it to a lock service), this approach solves none of the other problems described above"
They mention that Chubby is not a consensus service but a lock service, and also that a consensus service could be used to achieve consensus amongst a set of peer nodes as well.
My understanding was that services like Chubby and ZooKeeper are used to offload distributed application problems (like leader election, cluster management, access to shared resources) to a different application (Chubby/ZooKeeper), and that these are lock-based services. Having locks on files/znodes is how consensus is achieved.
What are consensus services, and how do they differ from lock services?
When would one use either of them?
ZooKeeper is a coordination service, modeled after Google's Chubby.
The major features it provides are:
Linearizable atomic operations
Total ordering of operations
Failure detection
Change notifications
Of these, linearizable atomic operations require ZooKeeper to implement a consensus algorithm (Zab). Linearizability can therefore be used to achieve consensus among peers in a distributed system, using ZooKeeper locks.
Quoting from the book Designing Data-Intensive Applications:
"Coordination services like Apache ZooKeeper [15] and etcd [16] are often used to implement distributed locks and leader election. They use consensus algorithms to implement linearizable operations in a fault-tolerant way"
Based on my understanding, consensus services and coordination services both run on top of some consensus algorithm; lock services simply expose that consensus through a distributed lock.
Similar to what is also mentioned in the Chubby paper,
However, assuming a consensus service is not used exclusively to provide locks (which reduces it to a lock service)
I found chapter 9, "Consistency and Consensus", of Designing Data-Intensive Applications very helpful on this topic. If you want to dig further, I would definitely recommend reading it.
You can take a lock to propose your value, publish your value, and that's the consensus.
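As a rough illustration of "take a lock, publish your value", here is a single-process Python sketch. `LockBackedRegister` and `decide_concurrently` are hypothetical names. The lock is an ordinary `threading.Lock`, so this says nothing about fault tolerance. In a real coordination service the lock and the published value would themselves be replicated by a consensus algorithm such as Zab or Raft.

```python
import threading


class LockBackedRegister:
    """Write-once register: the first proposer to take the lock decides."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def propose(self, value):
        # Take the lock, publish the value if nobody has yet, and
        # return whatever value was decided.
        with self._lock:
            if not self._decided:
                self._value = value
                self._decided = True
            return self._value


def decide_concurrently(register, values):
    """Let one thread per value race to propose; collect what each one saw."""
    results = []
    results_lock = threading.Lock()

    def worker(value):
        decided = register.propose(value)
        with results_lock:
            results.append(decided)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcomes = decide_concurrently(LockBackedRegister(), ["a", "b", "c"])
print(set(outcomes))  # a single value, whichever proposer won the lock
```

Every proposer ends up with the same decided value, which is the agreement property that consensus asks for. Which value wins depends on thread scheduling.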

RabbitMQ cluster on IoT devices

I am designing an IoT system with single-board computers such as the Raspberry Pi.
In particular, I am designing an application messaging platform that enables pub-sub, ESB and so on.
To keep it easy and simple, I am considering using RabbitMQ.
Furthermore, I want to build a RabbitMQ cluster on those nodes to avoid a single point of failure (SPoF).
However, those devices will sometimes be turned off.
I think this means a node temporarily leaves the cluster.
I expect a RabbitMQ cluster tolerates this situation to a certain degree, but I cannot judge how much it can tolerate or what problems may occur.
To the RabbitMQ cluster experts:
Could you tell me about any concerns and any cases we should take care of?
Do you think it would work in production?
Please tell me about any cases similar to my scenario.
I really look forward to your reply.
Even small details would be helpful.
TL;DR: RabbitMQ doesn't work well in this scenario. Better to use something else.
RabbitMQ is intended to work with stable nodes. It uses the Raft algorithm (in its quorum queues) for distributed consensus and leader election (see http://thesecretlivesofdata.com/raft). As that visualization shows, electing a leader consists of several steps. If the network is partitioned or the leader fails, another leader must be elected. If this happens frequently, the entire cluster becomes unstable.
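To put numbers on "stable nodes", here is a small Python sketch of the majority arithmetic that Raft-style protocols rely on. The function names are hypothetical, and this is general quorum math rather than anything specific to a particular RabbitMQ version.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many nodes may be offline while a majority still exists."""
    return cluster_size - majority(cluster_size)


def can_elect_leader(cluster_size, nodes_online):
    """A leader can only be elected while a majority of nodes is reachable."""
    return nodes_online >= majority(cluster_size)


for size in (3, 4, 5):
    print(size, majority(size), tolerated_failures(size))
```

A three-node cluster needs two nodes online, so it survives one device being switched off but not two. Going from three to four nodes does not raise the number of tolerated failures.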
You may want to have a look at other technologies such as https://deepstream.io.

Mapping out a Kafka+Zookeeper cluster

Background
I inherited a Kafka/Zookeeper installation. I have a passing knowledge of those - I know the general architecture, how clients work, about topics, etc., have been involved in programming Java clients etc.
But the installation is somewhat dubious. There are three instances each of Kafka and ZooKeeper (in separate Docker containers). Supposedly they should work, but what I am seeing is that all processes spout an immense amount of log output with loads and loads of (diverse) warnings and errors. I have the impression that some of these are quite normal (or are being self-healed all the time), and I am having a very hard time figuring out whether everything works as intended and is set up correctly.
Some of these are - according to Google - related to unclean shutdowns of the brokers; corrupted individual topics and such. As this is a test environment, I can easily delete such files.
I know about some commands which help me check topics etc. (basic stuff, like listing them, displaying their individual configuration etc.).
However...
Question
Is there an online resource or documentation that can be used as a systematic walkthrough to check whether everything is basically set up OK? For example, to clear up these questions:
Do the three Zookeepers and the three Kafka instances correctly talk to each other for high-availability purposes? Do they have a correct "leader" etc.?
Are the servers generally "healthy", i.e., easily able to accept connections etc.?
How are the topics working (what's in there, how many messages, etc.)?
I am aware that one may very quickly dismiss this question as too generic; I am not asking you to solve my problems. I am looking for a resource to systematically walk through such an installation. It may or may not cover the examples I have given, but it definitely should give a systematic way to find out if things are fundamentally wrong.
Rather than looking solely at logs, you might want to familiarize yourself with JMX metrics and how you can gather them across the cluster.
If you want to actually collect and analyze logs, you'll likely need to separately use something like Elasticsearch.
You won't see "how many messages" in a topic, and you'll need even more monitoring to know if a port is actually open and the Kafka process is running, the disks are filling up, etc.
My point here is that Kafka needs to be fed and watered. If you plan to productionize it, you can't just set up a small cluster and forget about it. Even if you think it's set up correctly at the beginning, increasing the load on it will eventually cause it to fall into a bad state.
For a limited trial for your dev environment to get a full look at your cluster health, Confluent Control Center can assist with that.
To solve the "what's in there" problem, I suggest you set up a Schema Registry and convince Kafka producers to use it.
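As a very basic first check, before any JMX or log tooling is in place, one can probe whether a port accepts connections and ask each ZooKeeper server for its role. The Python sketch below uses ZooKeeper's four-letter-word commands (`ruok`, `srvr`). Newer ZooKeeper versions only answer commands listed in `4lw.commands.whitelist`. The host names `zk1`..`zk3` and the helper names are assumptions made for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def four_letter_word(host, port, command, timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the 'Mode:' value (leader, follower, standalone) from srvr output."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port=2181):
    """Map each reachable host to its reported mode; unreachable hosts map to None."""
    modes = {}
    for host in hosts:
        if port_is_open(host, port):
            modes[host] = parse_mode(four_letter_word(host, port, "srvr"))
        else:
            modes[host] = None
    return modes


# Hypothetical usage against a three-node ensemble:
# print(ensemble_modes(["zk1", "zk2", "zk3"]))
```

A healthy three-node ensemble would be expected to report exactly one leader and two followers. This only covers the ZooKeeper side and does not replace proper metrics collection.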
This Packt tutorial/training by Stéphane Maarek is a wonderful resource for setting up Kafka in cluster mode. However, he did it in the AWS cloud on Ubuntu VMs.
I followed the same steps and installed it in Vagrant VMs on CentOS. You can find the code here.
The VM has Yahoo Kafka Manager to monitor Kafka's internal details: the list of available brokers, their health, partitions, leaders, etc.
Kafka Manager can help you with high-level monitoring.
Please provide your comments.

Kafka instead of Zookeeper for cluster management

I am writing a clustered application sitting on top of Kafka -- it uses Kafka exclusively for interprocess communication and coordination. I could use ZooKeeper to manage my cluster -- but it would not be very difficult to use Kafka topics to manage the cluster. And the more I think about it, other than for historical reasons, it seems like Kafka could drop ZooKeeper and just use a topic-based solution.
For example, there could be a special topic or topics in Kafka where you publish all of the same data currently kept track of in ZooKeeper. Brokers, topics, partitions, leaders, etc. -- it seems like this is just as easily tracked via Kafka topics as via ZooKeeper.
I know that in Kafka 0.9.0 there's some movement away from ZooKeeper and towards this model. Keep in mind that my question is less about Kafka development and more about me trying to figure out which direction to go in my application.
I'm not asking for an opinion -- what I want to know is are there any specific functions provided by Zookeeper that are going to be difficult with a Kafka/topic-based approach to coordination. But I can't think of anything.
Even heartbeat monitoring -- which was the reason I started looking at Zookeeper in the first place -- you could have a client connection topic, and clients could publish to it when they join the cluster, publish heartbeats at a given interval, and publish as they leave it.
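The heartbeat idea can be sketched in Python as a replay of such a topic. This is a toy model: the "topic" is just a list of `(client_id, kind, timestamp)` tuples, and `live_clients` is a hypothetical helper. It also assumes all events sit in one ordered log, which in Kafka terms means a single partition.

```python
def live_clients(events, now, timeout):
    """Replay a client-connection log and return the clients considered alive.

    events:  iterable of (client_id, kind, timestamp) in log order, where
             kind is "join", "heartbeat" or "leave"
    now:     current time, in the same units as the timestamps
    timeout: a client is considered dead if silent for longer than this
    """
    last_seen = {}
    for client_id, kind, timestamp in events:
        if kind == "leave":
            last_seen.pop(client_id, None)
        else:  # "join" or "heartbeat"
            last_seen[client_id] = timestamp
    return {c for c, seen in last_seen.items() if now - seen <= timeout}


topic = [
    ("worker-1", "join", 0),
    ("worker-2", "join", 1),
    ("worker-1", "heartbeat", 10),
    ("worker-3", "join", 11),
    ("worker-3", "leave", 12),
]
print(sorted(live_clients(topic, now=15, timeout=10)))
```

Here `worker-2` never sent a heartbeat after joining, so it is reported as dead once the timeout passes, and `worker-3` left explicitly. Note that every consumer has to run this bookkeeping itself.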
Let us start with a view from space: you have two distributed systems which store data. ZooKeeper organizes its data in nodes in a directory-like structure. Kafka stores messages within topics.
From a bird's-eye view, Kafka is built for high throughput and scalability, while one of ZooKeeper's main design goals is consistency. ZooKeeper is meant to be a distributed coordination service for distributed applications, while Kafka can be thought of as a distributed commit log.
So the answer to your question is, surprisingly: 'It depends'. For coordinating a distributed system I would use ZooKeeper: that's what it was built for. You could also do this with Kafka, but there are a couple of things you would need to do manually which come out of the box with ZooKeeper.
Some examples:
Consistency: The ZK client can choose whether it needs strong or eventual consistency
Ephemeral nodes: Together with ZK watches, a great way to react to failing services
Sequential consistency: It's not guaranteed that you receive Kafka messages in the order you wrote them to the broker (it's only guaranteed that messages within a partition are ordered)
ACLs: Never used them, but at least they are something not offered out of the box by Kafka
Sequence nodes
A pretty nice overview of what you can do with ZooKeeper is the ZooKeeper recipes page: https://zookeeper.apache.org/doc/trunk/recipes.html
[EDIT]: Heartbeating an application using kafka is of course possible. But ephemeral nodes in zookeeper are in my eyes the easier option.
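To illustrate the ordering point from the list above, here is a small Python toy model of a partitioned topic. The partitioning function uses `zlib.crc32` purely for illustration, which is an assumption and not Kafka's actual default partitioner. `produce` and `round_robin_consume` are hypothetical helpers.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from the key (illustrative hash, not Kafka's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions):
    """Append (key, value) messages to partitions, tagging each with its send order."""
    partitions = [[] for _ in range(num_partitions)]
    for seq, (key, value) in enumerate(messages):
        partitions[partition_for(key, num_partitions)].append((seq, key, value))
    return partitions


def round_robin_consume(partitions):
    """Read one message from each partition in turn until all are drained."""
    consumed = []
    cursors = [0] * len(partitions)
    remaining = sum(len(p) for p in partitions)
    while remaining:
        for i, partition in enumerate(partitions):
            if cursors[i] < len(partition):
                consumed.append(partition[cursors[i]])
                cursors[i] += 1
                remaining -= 1
    return consumed


sent = [("broker-a", "up"), ("broker-b", "up"), ("broker-a", "down"), ("broker-c", "up")]
received = round_robin_consume(produce(sent, num_partitions=3))
print([seq for seq, _, _ in received])  # may differ from send order 0, 1, 2, 3
```

Messages with the same key always land in the same partition and keep their relative order, but the consumer's overall order across partitions is not guaranteed to match the order in which they were sent.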
Removing Kafka's dependency on ZooKeeper is currently being worked on in the scope of KIP-500.

Learning Zookeeper - Help me with example

I'm trying to wrap my head around ZooKeeper and what it does. To this point, my experience with ZooKeeper has been through other libraries that require ZooKeeper (Solr and Kafka) and so my basic understanding is the very vague "you better use ZooKeeper to keep your configuration straight".
So help me think through a simple example problem. Let's say that I build my own service that does "stuff". There are two things that I want to protect:
I want to have as little downtime as possible (gotta keep doing stuff).
I can not have more than one server doing stuff because bad things would happen.
So, how would I set this up in Zookeeper? Is Zookeeper responsible for starting another stuff server if one goes down? Or do I subscribe to a Zookeeper "stuff doer status" callback? If I erroneously start up two stuff servers, how does Zookeeper help me keep bad things from happening?
Zookeeper is a distributed lock manager. These systems provide features like coordinator election (aka "master election" or "leader election") for a distributed system, as well as provide a consistent, distributed access to small amounts of critical information which is frequently used for configuration (i.e., don't treat it like a database or a general file system).
Note that ZooKeeper does not manage your service. However, you can use ZooKeeper to keep a hot standby (or several): you run N replicas of your server, so that if the current master fails or becomes unavailable for any reason, one of the other working instances can take over immediately.
Using master election, you can choose to have two (or more) servers, but only one of them will be able to take the master lock, so only that one will be able to take action. As soon as it goes away, it will lose its claim to the lock, and your hot standby will pick up the lock and start doing work that you need it to do. Look at Zookeeper recipes for code samples. However, properly handing off work, checkpointing, and general service resilience is still up to you to design and implement.
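The hand-over described above can be sketched with a tiny in-memory model in Python. This is not the ZooKeeper client API. `ToyElection` is a hypothetical stand-in for a directory of ephemeral sequential znodes, where the candidate with the lowest sequence number holds the master role and a lost session removes its node.

```python
import itertools


class ToyElection:
    """In-memory stand-in for ephemeral sequential znodes under one election path."""

    def __init__(self):
        self._counter = itertools.count()
        self._candidates = {}  # sequence number -> server name

    def join(self, server):
        """Register a candidate; returns its sequence number."""
        seq = next(self._counter)
        self._candidates[seq] = server
        return seq

    def session_lost(self, seq):
        """Model ZooKeeper deleting an ephemeral node when its session ends."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number is the master."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
first = election.join("stuff-server-1")
second = election.join("stuff-server-2")
before = election.leader()    # stuff-server-1 does the work
election.session_lost(first)  # stuff-server-1 crashes or is partitioned away
after = election.leader()     # the hot standby takes over
print(before, after)
```

Even if two "stuff" servers are started by mistake, only one of them is the leader at any moment in this model. The real recipes add watches so that the standby is notified when the node ahead of it disappears.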
That said, Zookeeper and similar systems provide a solid foundation to enable you to build robust distributed systems.
Other systems similar to Zookeeper include (alphabetically):
Chubby
doozerd
etcd
Several of these have detailed comparisons written up on their respective websites to show how they differ from the others in the list.