how does storm leverage zookeeper for resilience? - apache-zookeeper

From the description of Storm, it is built on ZooKeeper, and whenever a worker node dies it can be recovered and get its state back from ZooKeeper.
Does anyone know how that is done? Specifically:
How does the failed worker node get recovered?
How does ZooKeeper keep its state? AFAIK, each znode can only store a small amount of data.

Are you talking about workers or supervisors? Each Storm worker node runs a Storm "supervisor" daemon which manages worker processes.
You need to set up supervision (something like daemontools or supervisord, which is unrelated to Storm supervisors) to monitor and restart the nimbus and supervisor daemons in case they hit an exception. Both nimbus and the supervisors are fail-fast and stateless: ZooKeeper is used for coordination between nimbus and the supervisors, and it also holds the state, either in ZooKeeper or on disk, so that nothing is lost when a daemon restarts.
The state data isn't large, and ZooKeeper itself should be run supervised too.
Check the Storm fault-tolerance docs for more details.
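If you want to see what state Storm actually keeps in ZooKeeper, you can just list the znodes under its root. A minimal sketch using the kazoo Python client, assuming a ZooKeeper server on localhost:2181 and Storm's default root path of /storm (both are assumptions about your setup):

```python
# Sketch: inspect the state Storm keeps in ZooKeeper.
# Assumes kazoo is installed, ZooKeeper is reachable on localhost:2181,
# and Storm uses its default root path /storm.
from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")
zk.start()

# Typical children include things like assignments, supervisors,
# workerbeats, storms and errors. Each payload is small (heartbeats,
# assignments), well under ZooKeeper's default ~1 MB znode size limit.
for child in zk.get_children("/storm"):
    data, stat = zk.get("/storm/" + child)
    print(child, stat.dataLength, "bytes")

zk.stop()
```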

Related

Is a Kafka Connect worker a machine/server or just a cpu core?

In the docs, Kafka Connect workers are described as processes, so in my understanding that would mean CPU cores.
But in the same docs they are meant to provide automatic fault tolerance (in their distributed mode), so in my understanding that would mean different machines, since fault tolerance at the process level is meaningless IMO.
Could somebody enlighten me please?
A Kafka Connect worker is a JVM process.
You can run multiple Kafka Connect workers in distributed mode configured as a cluster, and if one worker dies its work (tasks) is distributed amongst the remaining workers.
Typically you would deploy one Kafka Connect worker per machine. Running multiple Kafka Connect workers in distributed mode on one machine is not something that would generally make sense IMO.
I have not tested it but I don't believe that a Kafka Connect worker is tied to one CPU.
For more explanation see here: https://youtu.be/oNK3lB8Z-ZA?t=1337 (slides: https://rmoff.dev/bbuzz19-kafka-connect)
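A quick way to see this in practice is to query the Connect REST API and look at which worker each task is assigned to. A minimal sketch with Python's requests, where the worker address and the connector name my-connector are placeholders for your own setup:

```python
# Sketch: inspect task placement through the Kafka Connect REST API.
# Assumes a distributed-mode worker listening on localhost:8083 and an
# existing connector named "my-connector" -- both are placeholders.
import requests

base = "http://localhost:8083"

# List the connectors registered with the cluster.
print(requests.get(f"{base}/connectors").json())

# The status endpoint shows which worker each task currently runs on;
# if a worker dies, its tasks get rebalanced onto the remaining workers.
status = requests.get(f"{base}/connectors/my-connector/status").json()
for task in status["tasks"]:
    print(task["id"], task["state"], task["worker_id"])
```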

Kafka cluster with single broker

I'm looking to start using Kafka for a system and I'm trying to cover all use cases.
Normally it would be run as a cluster of brokers on virtual servers (replication factor 3-5), but some customers don't care about resilience: a broker failure that requires a manual reboot of the whole system is fine with them, they just care about hardware costs.
So my question is, are there any issues with using Kafka as a single broker system for small installations with low throughput?
Cheers
It's absolutely OK to use a single Kafka broker. Note, however, that with a single broker you won't have a highly available service, meaning that when the broker fails you will have downtime.
Your replication-factor will be limited to 1 and therefore all of the partitions of a topic will be stored on the same node.
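For example, on a single-broker installation every topic is created with a replication factor of 1. A minimal sketch with the kafka-python admin client (the broker address and topic name are just placeholders):

```python
# Sketch: creating a topic on a single-broker cluster with kafka-python.
# Assumes a broker on localhost:9092; the topic name is an example.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# With one broker, replication_factor cannot exceed 1; asking for more
# is rejected because there are not enough brokers to hold the replicas.
admin.create_topics([NewTopic(name="orders", num_partitions=3, replication_factor=1)])
admin.close()
```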
For a proof-of-concept or non-critical dev work, a single-node cluster works just fine. However, a multi-node cluster has several benefits. It's okay to go with a single node if the following are not important/relevant for you:
scalability [spreads load across multiple brokers to maintain certain throughput]
fail-over [guards against data loss in case one/more node(s) go down]
availability [system remains reachable and functioning even if one/more node(s) go down]

During rolling upgrade/restart, how to detect when a kafka broker is "done"?

I need to automate a rolling restart of a Kafka cluster (3 brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).
What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up to date and such? In my restart script I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.
Another way to put it: what would a good "readiness" probe be that does not simply check some TCP/IP port, but looks at the actual server...
I would suggest exposing JMX metrics and tracking the following for cluster health:
the controller count (must be 1 over the whole cluster)
under-replicated partitions (should be zero for a healthy cluster)
unclean leader elections (if you don't disable these in server.properties, make sure the metric counts stay at zero)
ISR shrinks over a reasonable time window, e.g. 10 minutes (should be none)
Also, Yelp has tooling for rolling restarts implemented in Python, which requires Jolokia JMX agents installed on the brokers; it polls the metrics to make sure some of the above conditions hold.
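If you go the Jolokia route, each broker exposes those MBeans over plain HTTP, so a health check boils down to a couple of GET requests. A minimal sketch in Python, where the broker hostnames and the Jolokia port 8778 are assumptions about your setup:

```python
# Sketch: poll broker health metrics through Jolokia's HTTP endpoint.
# Broker hostnames and the Jolokia agent port (8778) are placeholders.
import requests

BROKERS = ["kafka1", "kafka2", "kafka3"]
URP = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"

def read_metric(host, mbean):
    url = f"http://{host}:8778/jolokia/read/{mbean}"
    return requests.get(url, timeout=5).json()["value"]["Value"]

# A healthy cluster has exactly one active controller and zero URPs in total.
controllers = sum(read_metric(h, CONTROLLER) for h in BROKERS)
urp_total = sum(read_metric(h, URP) for h in BROKERS)
print(f"active controllers: {controllers} (want 1), URPs: {urp_total} (want 0)")
```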
Assuming your cluster was healthy at the beginning of the restart operation, at a minimum, after each broker restart, you should ensure that the under-replicated partition count returns to zero before restarting the next broker.
As the previous responders mentioned, there is existing code out there to automate this. I don't use Jolokia myself, but my solution (which I'm working on now) also uses JMX metrics.
Kafka-Utils by Yelp is one of the best tools for detecting when a Kafka broker is "done". Specifically, kafka_rolling_restart is the tool: it gets broker details from ZooKeeper and URP (under-replicated partition) metrics from each broker. After a broker is restarted, the total URP count across the Kafka cluster is polled periodically, and only when it drops back to zero is the next broker restarted. The controller broker is restarted last.
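The overall loop is simple enough to sketch. Below is a rough, self-contained Python outline of that approach; the broker names, the Jolokia port and the restart command are all placeholders, and a production script would also need to wait for a restarted broker to come back online before polling it:

```python
# Sketch: rolling restart that waits for under-replicated partitions (URPs)
# to clear before moving to the next broker, restarting the controller last.
# Broker names, the Jolokia port and the restart command are placeholders.
import subprocess
import time

import requests

BROKERS = ["kafka1", "kafka2", "kafka3"]
URP = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"

def read_metric(host, mbean):
    url = f"http://{host}:8778/jolokia/read/{mbean}"
    return requests.get(url, timeout=5).json()["value"]["Value"]

def restart_broker(host):
    # Placeholder: use ssh/systemctl/Ansible -- whatever restarts your service.
    subprocess.run(["ssh", host, "sudo", "systemctl", "restart", "kafka"], check=True)

def wait_for_urp_zero(poll_seconds=15):
    while sum(read_metric(h, URP) for h in BROKERS) > 0:
        time.sleep(poll_seconds)

# Sort so the broker currently reporting ActiveControllerCount == 1 goes last.
ordered = sorted(BROKERS, key=lambda h: read_metric(h, CONTROLLER))
for host in ordered:
    restart_broker(host)
    wait_for_urp_zero()
```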

Running zookeeper on a cluster of 2 nodes

I am currently trying to use ZooKeeper in a two-node cluster. I have my own cluster formation algorithm running on the nodes based on configuration. We only need ZooKeeper's distributed DB functionality.
Is it possible to use ZooKeeper in a two-node cluster? Do you know of any solutions where this has been done?
Can we still retain ZooKeeper's DB functionality without forming a quorum?
Note: fault tolerance is not the main concern in this project. If one of the nodes goes down, we have enough code logic to run without the ZooKeeper service. We use ZooKeeper to share data when both nodes are alive.
Would greatly appreciate any help.
ZooKeeper is a coordination system, used to coordinate among nodes. When writes occur in such a distributed system, all writes go through the master (aka the leader) so that the nodes can agree on the values being stored. Reads can be served by any node. ZooKeeper requires a master/leader to be elected by a quorum in order to serve write requests consistently, and it uses the ZAB protocol as its consensus algorithm.
An ensemble should ideally have an odd number of nodes: adding a second node to a single-node setup (or a fourth to a three-node setup) raises the quorum size without letting you tolerate any more failures. In your case, with two nodes, the quorum is both nodes, so the ensemble can only elect a leader and accept writes while both nodes are up, and under a network partition neither side has a majority, so neither side will work properly.
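To make the quorum arithmetic concrete, here is a tiny sketch (plain math, no ZooKeeper required):

```python
# Sketch: ZooKeeper quorum arithmetic. An ensemble of n servers needs a
# majority of floor(n/2) + 1 votes, so it tolerates n - quorum failures.
def quorum(n):
    return n // 2 + 1

for n in (1, 2, 3, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerated failures={n - quorum(n)}")

# 2 servers need both up (0 tolerated failures), the same as a single server,
# while 3 servers tolerate 1 failure -- which is why ensembles are sized odd.
```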
As I said, ZooKeeper is not distributed storage. If you need to use it in a distributed manner (with more than one node), it needs to form a quorum.
As I see it, what you need is a distributed database, not a distributed coordination system.
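That said, if all you want is to share a small piece of state while both nodes are alive, the client side is straightforward. A minimal sketch with the kazoo Python client; the hosts, znode path and payload are placeholders:

```python
# Sketch: using ZooKeeper as a small shared store via kazoo.
# Hosts, the znode path and the payload are placeholders.
from kazoo.client import KazooClient

zk = KazooClient(hosts="node1:2181,node2:2181")
zk.start()

# Write a small piece of shared state (znodes are meant for small payloads,
# roughly 1 MB by default, not bulk data).
zk.ensure_path("/myapp/config")
zk.set("/myapp/config", b"shared-value")

# Any other connected client can read it back.
data, stat = zk.get("/myapp/config")
print(data.decode())

zk.stop()
```

Keep in mind that with a two-server ensemble these calls stop working as soon as either server is down, which is the quorum limitation described above.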

Storm fault tolerance: Nimbus reassigns worker to a different machine?

How do I make Storm Nimbus restart a worker on the same machine?
To test fault tolerance, I do a kill -9 on a worker process, expecting the worker to be restarted on the same machine, but on one of the machines Nimbus launches the worker on another machine!
The Nimbus log does not show any retries, errors, or anything unusual!
Would appreciate any help, Thanks!
You shouldn't need to. Workers should be able to switch to an open slot on any supervisor. If you have a bolt that can't accommodate this because it reads data that only exists on a particular supervisor, that is a design problem.
Additionally, Storm's fault tolerance is intended to handle not only worker failures but also supervisor failures, in which case you won't be able to restart the worker on the same supervisor anyway. You shouldn't need to worry about where a worker runs: that's a feature of Storm.