I am new to Mesos and I have built a Mesos cluster of three CentOS 7 nodes (all three nodes acting as both masters and slaves) in a local hypervisor.
The nodes are named mesos1, mesos2 and mesos3.
I have this running with ZooKeeper, Marathon and Chronos. I was wondering how I can check which node is the acting Mesos master at any given time, when I came across this post!
I also found out that I could find the leading Mesos master by appending /redirect to the endpoint.
So when I tried that, the Mesos UI on port 5050 redirected me to the node mesos2.
However, when I tried to find the ZooKeeper leader using this command:
/opt/zookeeper/bin/zkServer.sh status
I got the following response, showing that the leader was mesos3:
[root@mesos3 ~]# /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
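Each ZooKeeper server only reports its own role, so the status command has to be run on every node; exactly one of them should report `leader`. A small Python sketch that pulls the role out of captured output, assuming the `Mode:` line format shown above (the helper name `parse_zk_mode` is hypothetical):

```python
import re
from typing import Optional


def parse_zk_mode(status_output: str) -> Optional[str]:
    """Return the role from the 'Mode:' line of `zkServer.sh status` output.

    Typical values are 'leader', 'follower' or 'standalone'. Returns None
    when there is no Mode line (for example, the server is not running).
    """
    match = re.search(r"^Mode:[ \t]*(\w+)", status_output, re.MULTILINE)
    return match.group(1) if match else None


sample = """ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
"""
print(parse_zk_mode(sample))  # leader
```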
I am confused: shouldn't the Mesos master be the node that ZooKeeper indicates as leader?
Any help is greatly appreciated.
Mesos and ZooKeeper have distinct concepts of leadership, and it sounds like you are mixing the two up. You are running three instances of the ZooKeeper server process, which form one quorum, and three Mesos master processes, which form a separate quorum and happen to be on the same set of servers. Mesos uses ZooKeeper for electing its leading master, storing state and other critical functions, but ZooKeeper's own cluster leadership is in no way related to Mesos cluster leadership. They use very similar terminology, so it is easy to see how they can be mixed up.
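To see why the two can disagree, here is a deliberately simplified Python model of the two separate elections. The comparison rules are approximations (real ZooKeeper also compares election epochs, and the exact znodes Mesos uses are not shown), and the ids and sequence numbers are made up to reproduce the situation in the question:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ZkServer:
    host: str
    myid: int       # the server id from its myid file
    last_zxid: int  # id of the last transaction the server has logged


def zk_ensemble_leader(servers: list[ZkServer]) -> str:
    # Simplified election among the ZooKeeper servers themselves:
    # the most up-to-date server (highest zxid) wins, ties go to the highest myid.
    return max(servers, key=lambda s: (s.last_zxid, s.myid)).host


def mesos_leading_master(contenders: dict[str, int]) -> str:
    # Simplified ZooKeeper-based election among Mesos masters: each master
    # owns an ephemeral sequential znode and the lowest sequence number wins.
    return min(contenders, key=lambda host: contenders[host])


zk = [ZkServer("mesos1", 1, 42), ZkServer("mesos2", 2, 42), ZkServer("mesos3", 3, 42)]
# Hypothetical: mesos2's Mesos master registered first, so it holds the lowest number.
masters = {"mesos2": 0, "mesos1": 1, "mesos3": 2}

print(zk_ensemble_leader(zk))         # mesos3
print(mesos_leading_master(masters))  # mesos2
```

The two results come from different inputs, which is the point: the ZooKeeper leader depends on the ZooKeeper servers' state, while the Mesos leader depends on the order in which Mesos masters registered in ZooKeeper.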
Related
I have a five-node Kafka cluster (Confluent 5.5 Community Edition) with three ZooKeeper nodes, each on a different AWS instance.
While doing failover testing, I noticed that the Kafka cluster works fine even if all ZooKeeper nodes are down.
I was able to produce, consume and also create new consumers.
Why does the Kafka cluster not stop if it cannot connect to any ZooKeeper node?
What would be the possible issues if we are unaware of such a failure in production and the Kafka cluster continues to run without ZooKeeper connectivity?
How do we handle such a scenario?
Broker leader election, topic creation and simple ACLs (if you use them) still depend on ZooKeeper. Other basic functions that rely only on the Kafka bootstrap protocol might still work, sure. There should definitely be broker logs indicating that the connection was lost.
Ideally, you'd have basic process health checking and incident management software in place, so that you don't miss critical services going down in production.
How to handle it? Restart ZooKeeper...
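A basic health check can be as simple as asking each ZooKeeper server for its status over the client port. The sketch below sends the `srvr` four-letter command and reads the `Mode:` line. It assumes `srvr` is allowed by `4lw.commands.whitelist` (ZooKeeper 3.5 and later restrict four-letter commands; as far as I know `srvr` is enabled by default because `zkServer.sh status` relies on it). The function names are hypothetical; feed the result into whatever alerting you already use.

```python
import socket
from typing import Iterable, List, Optional, Tuple


def zk_mode(host: str, port: int = 2181, timeout: float = 2.0) -> Optional[str]:
    """Return the Mode reported by `srvr` ('leader', 'follower', ...), or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:  # refused, unreachable or timed out
        return None
    for line in b"".join(chunks).decode(errors="replace").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # e.g. the command is not whitelisted


def down_servers(servers: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """List the (host, port) pairs that did not report a Mode."""
    return [(h, p) for h, p in servers if zk_mode(h, p) is None]
```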
I have Kafka and ZooKeeper co-located on the same servers, with multiple nodes.
In Kafka's server.properties, I have a line like:
zookeeper.connect=server1:2181,server2:2181...
The problem is, Kafka will not start until all of the ZooKeeper nodes are available. Otherwise, I get errors like "fatal error during Kafka startup" and "Timed out waiting for connection while in state: CONNECTING", even though the other ZooKeeper nodes are up.
This makes it challenging to script startup of each node independently, since the startup scripts on one node are dependent on the state of other nodes.
First: is this expected behavior or am I doing something wrong? Suppose I have 3 nodes in the ZooKeeper cluster: do all 3 nodes have to be up for Kafka to start? That seems counterintuitive, since a larger cluster would actually increase the chance of failure on startup rather than provide more resiliency.
Second: what's a good solution for this? Is the only approach to make Kafka on each node wait until ZooKeeper is fully up on all nodes?
As far as I know, this is a prerequisite for Kafka to start up correctly, and I don't think it's too much of a burden. If the ZooKeeper cluster itself is already having problems at startup time, Kafka itself might run into problems, so ensuring that the ZooKeeper cluster is healthy is a good initial check, IMHO.
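One way to script that initial check is a small gate that runs before the broker starts and waits until a majority of the ensemble answers, rather than every node. This is a sketch under assumptions: `is_serving` is a probe you supply (for example, one that checks the role reported by the `srvr` command), and the function name and default timings are hypothetical. It only confirms that the ensemble can serve requests; it does not work around a ZooKeeper client that refuses to start when one host in `zookeeper.connect` is unreachable.

```python
import time
from typing import Callable, Sequence


def wait_for_zk_quorum(
    servers: Sequence[str],
    is_serving: Callable[[str], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Poll until a strict majority of the ensemble is serving; False on timeout."""
    quorum = len(servers) // 2 + 1
    deadline = time.monotonic() + timeout
    while True:
        serving = sum(1 for s in servers if is_serving(s))
        if serving >= quorum:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A startup script would launch the broker only when this returns True and exit with an error otherwise.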
A way to get around this limitation is to configure a single-node ZooKeeper cluster and tell Kafka to use that cluster. After the fact, you can grow the ZooKeeper cluster to 3 or more nodes while Kafka is already up and running. More details can be found here:
Adding new ZooKeeper node in Kafka cluster?
For the record, Kafka itself is completely fine if the ZooKeeper cluster goes down once Kafka is up and running. It just wouldn't be able to accept new producer/consumer connections or create topics, but the current ones that are active on the cluster continue to work just fine.
We have met the same problem in our production environment.
It turned out to be a bug (ZOOKEEPER-2184) in the ZooKeeper client library that Kafka uses to talk to ZooKeeper.
Our Kafka version is 1.1.1, which uses zookeeper-3.4.10.jar.
After we replaced it with zookeeper-3.4.13.jar, Kafka restarted successfully.
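If you suspect the same issue, a quick first step is to check which ZooKeeper client jar your Kafka installation ships. A Python sketch, assuming the usual `libs/` directory and the `zookeeper-X.Y.Z.jar` naming; the install path and function name are hypothetical, and the (3, 4, 13) threshold only reflects the 3.4.x upgrade described above, so check the fix versions of ZOOKEEPER-2184 for other release lines:

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches e.g. zookeeper-3.4.10.jar but not zookeeper-jute-3.5.7.jar
ZK_JAR = re.compile(r"^zookeeper-(\d+)\.(\d+)\.(\d+)\.jar$")


def zk_client_version(libs_dir: str) -> Optional[Tuple[int, ...]]:
    """Return the version of the ZooKeeper client jar in a Kafka libs directory, or None."""
    for jar in sorted(Path(libs_dir).iterdir()):
        match = ZK_JAR.match(jar.name)
        if match:
            return tuple(int(part) for part in match.groups())
    return None


libs = Path("/opt/kafka/libs")  # hypothetical install path
if libs.is_dir():
    version = zk_client_version(str(libs))
    if version is not None and version < (3, 4, 13):
        print(f"ZooKeeper client {version} is older than 3.4.13")
```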
I currently have a 3-node Kafka cluster which connects to the base (root) chroot path in my ZooKeeper ensemble:
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181
Now, I want to add a new 5-node Kafka cluster which will connect to a different chroot path in the same ZooKeeper ensemble:
zookeeper.connect=172.12.32.123:2181,172.11.43.211:2181,172.18.32.131:2181/cluster/2
Will these configurations work, with each cluster using its own chroot path? I understand that the original Kafka cluster should have been connected to some path other than the base chroot path for better isolation.
Also, is it good to have the same ZooKeeper ensemble across Kafka clusters? The documentation says that it is generally better to have isolated ZooKeeper ensembles for different clusters.
If you're limited to a single ZooKeeper cluster, then it should work out fine with a unique chroot that doesn't collide with the other cluster's znodes.
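As a sketch of both points: the chroot is appended once, after the last host:port, and it is worth checking that the new path doesn't sit under a znode the root-level cluster already uses. The list of top-level Kafka znodes below is partial and assumed from common Kafka versions; verify it with `ls /` in the ZooKeeper shell. Kafka keeps its cluster id at `/cluster/id`, for example, so `/cluster/2` would land inside the first cluster's namespace. That probably wouldn't break anything by itself, but it isn't the isolation a chroot is meant to give. The function names are hypothetical.

```python
from typing import Sequence

# Partial, assumed list of top-level znodes a Kafka cluster creates under its
# chroot; verify against your version with `ls /` in the ZooKeeper shell.
KAFKA_TOP_LEVEL_ZNODES = {
    "admin", "brokers", "cluster", "config", "consumers",
    "controller", "controller_epoch", "isr_change_notification",
}


def zk_connect_string(servers: Sequence[str], chroot: str = "") -> str:
    """Build a zookeeper.connect value; the chroot follows the last host:port only."""
    return ",".join(servers) + chroot


def overlaps_root_cluster(chroot: str) -> bool:
    """True if the chroot lives under a znode owned by a Kafka cluster at the root path."""
    first_component = chroot.strip("/").split("/")[0]
    return first_component in KAFKA_TOP_LEVEL_ZNODES


servers = ["172.12.32.123:2181", "172.11.43.211:2181", "172.18.32.131:2181"]
print(zk_connect_string(servers, "/cluster/2"))
print(overlaps_root_cluster("/cluster/2"))        # True
print(overlaps_root_cluster("/kafka-cluster-2"))  # False
```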
It is not "good" to share, no, because ZooKeeper losing quorum takes both clusters down, but again, if you're limited on hardware, it'll still work.
Note: you can only afford to lose one ZK server with 3 nodes in the cluster, which is why a cluster of 5 is recommended.
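The arithmetic behind that note: an ensemble of n voting servers needs a strict majority, n // 2 + 1, to stay available, so it tolerates n - (n // 2 + 1) failures. A short Python sketch that tabulates it:

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an ensemble with n voting servers."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still available."""
    return n - quorum_size(n)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The table shows that 3 servers tolerate 1 failure and 5 tolerate 2, while going from 3 to 4 (or 5 to 6) adds no fault tolerance, and a 2-server ensemble tolerates none.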
I am having difficulty understanding how the leader/follower mechanism works. Let's say I am building a distributed application with 2 master nodes, 6 slave nodes and 3 ZooKeeper nodes, with one ZooKeeper node being the leader, and of the 2 master nodes, 1 is active and connected to the ZooKeeper leader.
My questions are:
Are my master nodes called masters just because they are connected to the ZooKeeper leader (i.e., is my node called master because its znode is connected to the ZooKeeper leader)?
Does a leader election happen when the ZooKeeper leader node dies? How will it impact our master? Would our master be connected to the newly elected leader?
If our application's master node dies, would the standby master node be notified if it listens to the master's znode? If so, is it enough for our standby node to have an ephemeral sequential node, or is there anything else we need to do to make it the active master?
ZooKeeper's documentation says that writes happen only through the leader, which broadcasts them to the follower nodes, and that reads are serviced directly by follower nodes.
Does this have any relation to the read and write design of my application? That is, I intend to design it so that my writes go through my master and my reads go through my slaves. Does ZooKeeper's broadcasting ability have anything to do with that, or are ZooKeeper's writes completely different from the application's writes?
Sorry if I asked anything that doesn't make sense; please help me understand. Any resources that explain these would be very helpful.
Assuming that you are using Curator to elect the master, I will explain the master election process of the Curator recipe; then you may be able to figure out the answers to all your questions.
Master election uses two features of ZooKeeper: ephemeral nodes and sequential nodes. Each app node creates an ephemeral sequential znode under the same election path.
The app node that got the lowest sequence number is elected master, and its session is the ephemeral owner of that znode.
After your master app node dies, its session ends, ZooKeeper deletes its ephemeral znode and notifies the nodes that are watching that znode, and the contender with the next-lowest number becomes master.
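The recipe can be modeled in a few lines of Python. This is a toy in-memory stand-in for the election znode, not the Curator or ZooKeeper API; the class, the `n_` node prefix and the session names are hypothetical. It shows the two rules: the lowest sequence number leads, and every other contender watches only the znode just before its own, so one deletion wakes one contender.

```python
from __future__ import annotations

import itertools


class ToyElectionPath:
    """In-memory stand-in for the children of one ZooKeeper election znode."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self.children: dict[str, str] = {}  # znode name -> owning session

    def create_ephemeral_sequential(self, session: str) -> str:
        # ZooKeeper appends a zero-padded 10-digit sequence number.
        name = f"n_{next(self._counter):010d}"
        self.children[name] = session
        return name

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes disappear when their owning session ends.
        self.children = {n: s for n, s in self.children.items() if s != session}

    def leader(self) -> str | None:
        # Zero padding makes lexical order equal numeric order.
        return self.children[min(self.children)] if self.children else None

    def watch_target(self, name: str) -> str | None:
        # A contender watches its immediate predecessor; None means it leads.
        earlier = [n for n in self.children if n < name]
        return max(earlier) if earlier else None


path = ToyElectionPath()
a = path.create_ephemeral_sequential("master-1")
b = path.create_ephemeral_sequential("master-2")
print(path.leader())         # master-1
print(path.watch_target(b))  # n_0000000000
path.expire_session("master-1")
print(path.leader())         # master-2
print(path.watch_target(b))  # None
```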
I want to install a 2-node Kafka cluster on Amazon EC2.
I followed the steps from this link: https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
Also, I want to have ZooKeeper on both nodes, because if I have it on only one node and that node dies, my Kafka cluster dies.
In step 9 (Installing a multi-node cluster), they say that I need to modify zookeeper.connect in the Kafka server properties so that it has a comma-separated list of ip:port for each node where ZooKeeper is installed.
On the other hand, when I want to create a topic, I specify only one ZooKeeper node in the script!
1) Will the other ZooKeeper node know that the topic has been created?
2) If one ZooKeeper node fails, will the other one take over?
3) When the failed node comes back up, will it get the topic information again from the node that stayed alive?
Regards,
Srdjan
You should create a cluster with at least three nodes. Like Serejja mentioned, it should be odd-numbered for fault tolerance: 3, 5, 7, 9, etc.
For Kafka, you should specify a --replication-factor when creating the topic. In a three-node cluster, it's recommended to set it to two or three.
In this scenario, if one of the brokers goes down, leadership of its partitions moves to replicas on the available brokers, which already hold the data, and once the unavailable broker comes back online, it catches up by replicating the data it missed.
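A toy Python model of what happens to a single partition with replication factor 3 on brokers 1, 2 and 3. The class and broker ids are hypothetical, and real Kafka has more rules (for example around unclean leader election and preferred-leader rebalancing). The surviving in-sync replicas already hold the data, one of them takes over as leader, and the restarted broker rejoins the in-sync set once it has caught up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PartitionState:
    replicas: list[int]                         # brokers assigned a copy, in preference order
    isr: set[int] = field(default_factory=set)  # in-sync replicas
    leader: int | None = None

    def __post_init__(self) -> None:
        if not self.isr:
            self.isr = set(self.replicas)
        if self.leader is None:
            self.leader = self.replicas[0]

    def broker_down(self, broker: int) -> None:
        self.isr.discard(broker)
        if self.leader == broker:
            # Pick the first remaining in-sync replica in assignment order
            # (None means the partition is offline).
            self.leader = next((b for b in self.replicas if b in self.isr), None)

    def broker_caught_up(self, broker: int) -> None:
        # A restarted broker rejoins the ISR after fetching what it missed from the leader.
        if broker in self.replicas:
            self.isr.add(broker)


p = PartitionState(replicas=[1, 2, 3])
p.broker_down(1)
print(p.leader, sorted(p.isr))  # 2 [2, 3]
p.broker_caught_up(1)
print(p.leader, sorted(p.isr))  # 2 [1, 2, 3]
```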
The Kafka documentation is fantastic, and I recommend further reading of its Replication section.