Kafka won't start if a Zookeeper node is down - apache-kafka

I have Kafka and Zookeeper co-located on the same servers, with multiple nodes.
In Kafka's server.properties, I have a line like
zookeeper.connect=server1:2181,server2:2181...
The problem is that Kafka will not start until all of the Zookeeper nodes are available; otherwise, I get an error like "fatal error during Kafka startup" and "Timed out waiting for connection while in state: CONNECTING", even though the other Zookeeper nodes are up.
This makes it challenging to script startup of each node independently, since the startup scripts on one node are dependent on the state of other nodes.
First: is this expected behavior or am I doing something wrong? Suppose I have 3 nodes in my Zookeeper cluster; do all 3 have to be up for Kafka to start? That seems counterintuitive, since a larger cluster would actually increase the chance of a failure at startup rather than provide more resilience.
Second: What's a good solution for this? Is the only approach to make Kafka on each node wait until Zookeeper is fully up on all nodes?

As far as I know, this is a prerequisite for Kafka to start up correctly, and I don't think it's too much of a burden. If the Zookeeper cluster itself is already having problems at startup time, Kafka itself might run into problems, so ensuring that the Zookeeper cluster is healthy is a good initial check, IMHO.
A way to get around this limitation is to configure a single-node Zookeeper cluster and tell Kafka to use that cluster. After the fact, you can grow the Zookeeper cluster to 3 or more nodes, while Kafka is already up and running. More details can be found here:
Adding new ZooKeeper node in Kafka cluster?
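As a minimal sketch of that approach (the zk1..zk3 hostnames are placeholders, not from the original post), the broker-side change is just the zookeeper.connect line in server.properties:
# server.properties: start against a single-node ensemble
zookeeper.connect=zk1:2181
# later, after growing the ensemble to three nodes, a rolling broker
# restart picks up the full connection string
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181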
For the record, Kafka itself is completely fine if the Zookeeper cluster goes down once it's up and running. It just wouldn't be able to accept new producer/consumer connections or create topics, but the current ones that are active on the cluster continue to work just fine.

We ran into the same problem in our production environment.
It turned out to be a bug (ZOOKEEPER-2184) in the Zookeeper client library that Kafka uses to talk to Zookeeper.
Our Kafka version was 1.1.1, which ships with zookeeper-3.4.10.jar.
After we replaced it with zookeeper-3.4.13.jar, Kafka restarted successfully.
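For anyone attempting the same fix, a rough outline of the jar swap might look like this (the installation path and service name are assumptions, not from the original answer):
# stop the broker first (service name is an assumption)
systemctl stop kafka
cd /opt/kafka/libs                      # adjust for your install layout
mv zookeeper-3.4.10.jar /tmp/           # set the buggy client jar aside
cp /path/to/zookeeper-3.4.13.jar .      # drop in the patched version
systemctl start kafka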

Related

Kafka Cluster continues to run without Zookeeper

I have a five-node Kafka cluster (Confluent 5.5 Community Edition) with 3 Zookeeper nodes, each on different AWS instances.
While doing failover testing, I noticed that the Kafka cluster works fine even if all Zookeeper nodes are down.
I was able to produce, consume, and also create new consumers.
Why does the Kafka cluster not stop if it cannot connect to any Zookeeper nodes?
What would be the possible issues if we are unaware of such a failure scenario in production and the Kafka cluster continues to run without Zookeeper connectivity?
How do we handle such a scenario?
Broker leader election, topic creation, and simple ACLs (if you use them) still depend on Zookeeper. Other basic functions that rely on the Kafka bootstrap protocol might still work, sure. There should definitely be broker logs indicating the connection was lost.
Ideally you'd have basic process healthchecking and incident management software so that you don't miss critical services going down in prod.
How to handle? Restart Zookeeper...
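To confirm whether the brokers can actually reach Zookeeper, a couple of standard checks can help (hostnames are placeholders):
# a healthy Zookeeper node answers "imok" to the four-letter "ruok" command
# (on newer Zookeeper versions, ruok must be on the 4lw whitelist)
echo ruok | nc zk1 2181
# list the broker ids currently registered in Zookeeper
bin/zookeeper-shell.sh zk1:2181 ls /brokers/ids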

How to handle failure scenarios for Kafka and Zookeeper in Kubernetes

What I have: Zookeeper running on server1, server2, and server3, and similarly Kafka also running on server1, server2, and server3.
The setup runs in Kubernetes.
Problem statement:
If one Zookeeper node goes down, will the entire setup go down, because Kafka depends on Zookeeper? Am I right?
If Q1 is correct: is there any way to set things up so that if one Zookeeper server goes down, Kafka keeps running as-is?
How do I expose the Kafka port in a Kubernetes setup?
What is the recommended way to persist data in Kubernetes for a production server?
I fail to see how Zookeeper questions are related to k8s... But you definitely should set affinity rules such that Zookeeper and Kafka are not on the same physical servers or sharing the same disks.
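As a sketch of such a rule (the label names are illustrative, not from the original answer), a Kafka pod spec could declare anti-affinity against Zookeeper pods so the scheduler keeps them on different nodes:
# fragment of a Kafka pod spec
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: zookeeper        # assumes Zookeeper pods carry this label
      topologyKey: kubernetes.io/hostname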
A three-node Zookeeper ensemble tolerates the loss of one node, since the remaining two still form a majority quorum. If two of the three go down, the survivor cannot form a quorum, no leader can be elected, and the ensemble stops serving requests. This effectively can crash or corrupt Kafka, yes.
To mitigate that risk, you can choose to run 5 Zookeepers, in which case you can lose up to 2 servers before reaching the same state. The Definitive Guide book covers these concepts in the first few chapters.
Regarding the other questions - NodePorts and PVCs, generally speaking.
Use one of the popular Kafka Operators on GitHub and you won't need to think too hard about setting those properties.
You still must manually perform Kafka admin tasks in any installation... You can use extra services like Cruise Control if you want to reduce that workload, though

During rolling upgrade/restart, how to detect when a kafka broker is "done"?

I need to automate a rolling restart of a kafka cluster (3 kafka brokers). I can easily do it manually - restart one after the other, while checking the log to see when it's fine (e.g., when the new process has joined the cluster).
What is a good way to automate this check? How can I ask the broker whether it's up and running, connected to its peers, all topics up-to-date and such? In my restart script, I have access to the metrics, but to be frank, I did not really see one there which gives me a clear picture.
Another way would be to ask what a good "readiness" probe would be that does not simply check some TCP/IP port, but looks at the actual server...
I would suggest exposing JMX metrics and tracking the following for cluster health:
the controller count (must be 1 over the whole cluster)
under-replicated partitions (should be zero for a healthy cluster)
unclean leader elections (if you don't disable these in server.properties, make sure there are none in the metric counts)
ISR shrinks within a reasonable time period, like a 10-minute window (should be none)
Also, Yelp has tooling for rolling restarts implemented in Python, which requires Jolokia JMX agents installed on the brokers; it polls the metrics to make sure some of the above conditions hold.
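As a minimal sketch of such a check (the broker hostname and Jolokia port are assumptions), the under-replicated-partitions metric can be polled over HTTP, or queried without JMX at all using the stock CLI:
# via a Jolokia agent running on the broker
curl -s http://broker1:8778/jolokia/read/kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions/Value
# or via the standard Kafka CLI (newer versions; older ones take --zookeeper);
# empty output means no under-replicated partitions
bin/kafka-topics.sh --bootstrap-server broker1:9092 --describe --under-replicated-partitions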
Assuming your cluster was healthy at the beginning of the restart operation, at a minimum, after each broker restart, you should ensure that the under-replicated partition count returns to zero before restarting the next broker.
As the previous responders mentioned, there is existing code out there to automate this. I don’t use Jolokia myself, but my solution (which I’m working on now) also uses JMX metrics.
Kafka Utils by Yelp is one of the best tools for detecting when a Kafka broker is "done". Specifically, kafka_rolling_restart is the tool: it gets broker details from Zookeeper and URP (Under-Replicated Partition) metrics from each broker. When a broker is restarted, the total URP count across the Kafka cluster is polled periodically, and once it drops to zero, the next broker is restarted. The controller broker is restarted last.
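For reference, an invocation might look roughly like this (the flag names are from memory and may differ by version; check kafka-rolling-restart --help, and note the cluster name is a placeholder):
kafka-rolling-restart --cluster-type standard --cluster-name my-cluster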

Flink with zookeeper: Service temporarily unavailable due to an ongoing leader election. Please refresh

I want to run the Flink cluster in high-availability mode, so I have configured the settings per JobManager High Availability in the Flink configuration files. When I start the Zookeeper quorum using start-zookeeper-quorum.sh, I am able to start two Zookeeper servers (peers) on two machines, but when I start the Flink cluster with 2 JobManagers, I get the message "Service temporarily unavailable due to an ongoing leader election. Please refresh." on the Flink web UI.
What does this message mean? Is there a way to specify the leader in the configuration file?
The problem is with your Zookeeper installation: your ZK nodes cannot elect a leader. Two nodes is also not the best choice; you should have at least 3 instances, or some larger odd number.
You should check the Zookeeper admin docs, for instance here.
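For reference, a minimal three-node ensemble configuration looks something like this (hostnames and paths are placeholders; see the admin docs for details):
# zoo.cfg, identical on every node
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# each node also needs a myid file under dataDir containing its id (1-3)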

How to migrate Kafka from old Zookeeper cluster to new Zookeeper cluster with different znode parent path

I have a three-node Kafka cluster in service running on a separate three-node Zookeeper cluster. I intend to switch Kafka to use a new five-node Zookeeper cluster, and although I have found information about doing that, I have an extra wrinkle where Kafka will be using a custom znode parent path on the new cluster.
For instance, my current Kafka Zookeeper string looks something like this:
192.0.2.11:2181,192.0.2.12:2181,192.0.2.13:2181
I'm looking to switch it to this:
192.0.2.21:2181,192.0.2.22:2181,192.0.2.23:2181,192.0.2.24:2181,192.0.2.25:2181/kafka/uid1
The reason for this is that we intend to reuse the larger Zookeeper cluster for other Kafka clusters. Don't worry, this is for testing and not production. However, we still want to do this without losing any data on the stream coming into Kafka, so we want to avoid taking anything down.
Is this possible?
I have come across the following questions:
Copy/Migrate old zookeeper znode/data to new zookeeper
best way to copy data across 2 zookeeper cluster?
Unfortunately they appear to require some downtime, which I'm hoping to avoid.
This page (https://qgraph.io/blog/migrating-kafka-zookeeper-cluster/) was a little more helpful in the way of rollover, but not with znode migration.
I've been looking for 'znode symlinks' or 'specifying znode path per zookeeper server', but neither seems possible. Am I out of luck, requiring downtime and possibly lost data?
From what I can tell, there is no way to move Kafka's parent znode without restarting Kafka. There are no such things as hard or soft links for znodes: https://www.igvita.com/2010/04/30/distributed-coordination-with-zookeeper/
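Given that a restart is unavoidable, the config change itself is small: the chroot path is appended once, at the end of the whole zookeeper.connect string in server.properties (addresses copied from the question above):
zookeeper.connect=192.0.2.21:2181,192.0.2.22:2181,192.0.2.23:2181,192.0.2.24:2181,192.0.2.25:2181/kafka/uid1
# the /kafka/uid1 znode tree must already hold the cluster's metadata
# (copied over from the old ensemble) before the brokers come back up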