How to validate zookeeper quorum - apache-zookeeper

How do I verify that all the nodes in a ZooKeeper ensemble are part of the quorum and healthy? The manual mentions "ruok", but that still doesn't tell you whether a node is part of the quorum and in sync with the rest.

You can use the srvr command, documented in The Four Letter Words, to get more detailed status information about each ZooKeeper server in the ensemble. Below is sample output from a 3-node cluster with hosts named ubuntu1, ubuntu2 and ubuntu3.
The Mode field tells you whether that particular server is the leader or a follower. The Zxid field is the ZooKeeper cluster's internal transaction ID, used for tracking state changes to the tree of znodes. In a healthy cluster you'll see one leader, multiple followers, and all nodes generally close to one another in zxid value.
> for x in ubuntu1 ubuntu2 ubuntu3; do echo $x; echo srvr|nc $x 2181; echo; done
ubuntu1
Zookeeper version: 3.4.7-1713338, built on 11/09/2015 04:32 GMT
Latency min/avg/max: 3/9/21
Received: 9
Sent: 8
Connections: 1
Outstanding: 0
Zxid: 0x100000004
Mode: follower
Node count: 6
ubuntu2
Zookeeper version: 3.4.7-1713338, built on 11/09/2015 04:32 GMT
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x100000004
Mode: leader
Node count: 6
ubuntu3
Zookeeper version: 3.4.7-1713338, built on 11/09/2015 04:32 GMT
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x100000004
Mode: follower
Node count: 6
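The loop above can be wrapped into a small health check. This is a sketch of my own, not from the original answer: the hostnames and port are placeholders for your ensemble, and `nc` is assumed to be available. It parses the Mode line from each server's srvr output and succeeds only if exactly one node reports leader.

```shell
# zk_mode reads srvr output on stdin and prints the Mode value
# (leader / follower / standalone).
zk_mode() {
  awk -F': ' '/^Mode:/ {print $2}'
}

# check_quorum probes each given host on 2181 and succeeds only if
# exactly one of them reports itself as leader.
check_quorum() {
  leaders=0
  for host in "$@"; do
    mode=$(echo srvr | nc -w 2 "$host" 2181 | zk_mode)
    echo "$host: ${mode:-unreachable}"
    [ "$mode" = "leader" ] && leaders=$((leaders + 1))
  done
  [ "$leaders" -eq 1 ]
}

# Usage: check_quorum ubuntu1 ubuntu2 ubuntu3
```

Unreachable hosts simply print "unreachable" and don't count as leaders, so the check also fails when the ensemble has lost its leader.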

Related

How to get more information about the Zookeeper status in Confluent

I am building a Zookeeper cluster, and I start Zookeeper separately in Confluent using:
./bin/zookeeper-server-start etc/kafka/zookeeper.properties
and I want to get the status of Zookeeper.
I searched online, and all the answers use:
./zkServer.sh status
But I can't find zkServer.sh in Confluent.
I know that I can use ./bin/confluent status to get the status, but I want more information about Zookeeper, like the following:
./zkServer.sh status
JMX enabled by default
Using config: /opt/../conf/zoo.cfg
Mode: follower
How can I do that?
You can use the Four Letter Words instead to get the same information, or more. The output from stat:
$ echo "stat" | nc <ZOOKEEPER-IP-ADDRESS> 2181
Zookeeper version: 3.4.10
Clients:
/192.168.1.2:49618[1](queued=0,recved=1304,sent=1304)
/192.168.1.3:53484[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/15
Received: 1330
Sent: 1329
Connections: 2
Outstanding: 0
Zxid: 0x1000001ee
Mode: leader
Node count: 435
The output from conf:
$ echo "conf" | nc <ZOOKEEPER-IP-ADDRESS> 2181
clientPort=2181
dataDir=/var/zookeeper/data
dataLogDir=/var/log/zookeeper
tickTime=2000
maxClientCnxns=0
minSessionTimeout=4000
maxSessionTimeout=40000
serverId=3
initLimit=20
syncLimit=5
electionAlg=3
electionPort=3888
quorumPort=2888
peerType=0
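Another option (my suggestion, not from the original answer; available in ZooKeeper 3.4+) is the mntr four-letter word, which emits tab-separated key/value pairs that are easier to script against. The server role appears as zk_server_state, the same information as the Mode line from stat. A minimal parsing sketch:

```shell
# zk_state reads mntr output on stdin and prints zk_server_state
# (leader / follower / standalone).
zk_state() {
  awk '$1 == "zk_server_state" {print $2}'
}

# Live usage (replace the placeholder address):
#   echo mntr | nc <ZOOKEEPER-IP-ADDRESS> 2181 | zk_state
```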

Haproxy backend stays down and is never brought up again

It works fine up until the moment the remote server becomes unavailable for some time, at which point the server goes down in the logs and is never brought up again. The config is quite simple:
defaults
retries 3
timeout connect 5000
timeout client 3600000
timeout server 3600000
log global
option log-health-checks
listen amazon_ses
bind 127.0.0.2:1234
mode tcp
no option http-server-close
default_backend bk_amazon_ses
backend bk_amazon_ses
mode tcp
no option http-server-close
server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 120 rise 1
Here are the logs when the problem occurs:
Jul 3 06:45:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30004ms, status: 119/120 UP.
Jul 3 06:46:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30003ms, status: 118/120 UP.
Jul 3 06:47:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30002ms, status: 117/120 UP.
...
Jul 3 08:44:36 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30000ms, status: 0/1 DOWN.
Jul 3 08:44:36 jupiter haproxy[40331]: Server bk_amazon_ses/amazon is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 3 08:44:36 jupiter haproxy[40331]: backend bk_amazon_ses has no server available!
And that's it. Nothing but operator intervention brings the server back to life. I also tried removing the check part and what follows it, but the same thing happens. Can HAProxy be configured to retry indefinitely and not mark a server DOWN? Thanks.
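No answer appears in this thread, but one direction commonly suggested for AWS endpoints (an assumption on my part, not from the original question): email-smtp.us-west-2.amazonaws.com resolves to changing IP addresses, and HAProxy 1.6+ can be told to keep re-resolving the hostname at runtime instead of sticking with a stale address, via a resolvers section plus init-addr on the server line. A hedged sketch; the nameserver address is a placeholder:

```
resolvers awsdns
    # Point this at your actual DNS resolver
    nameserver dns1 10.0.0.2:53
    resolve_retries 3
    timeout retry   1s
    hold valid      10s

backend bk_amazon_ses
    mode tcp
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s resolvers awsdns resolve-prefer ipv4 init-addr last,libc,none
```

With init-addr last,libc,none the server can start even when DNS is temporarily unavailable, and the resolvers section keeps updating the address as the record changes.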

Zookeeper status - telnet connections: 4

Could someone help me understand whether 4 connections are required for zookeeper?
My requirement is simple: I want to run Apache Kafka with Spark on my local machine. As per the Kafka documentation, I started the zookeeper bundled under the Kafka bin directory and wanted to confirm that it is up.
So I tried "telnet localhost 2181" from the command prompt.
And got the below output:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/127.0.0.1:34231[1](queued=0,recved=436,sent=436)
/127.0.0.1:34230[1](queued=0,recved=436,sent=436)
/127.0.0.1:37719[0](queued=0,recved=1,sent=0)
/127.0.0.1:34232[1](queued=0,recved=436,sent=436)
Latency min/avg/max: 0/0/42
Received: 2127
Sent: 2136
Connections: 4
Outstanding: 0
Zxid: 0x143
Mode: standalone
Node count: 51
Connection closed by foreign host.
I would like to know why it says 4 connections with 4 clients. What does that actually mean?
Thank you in advance for helping me understand whether 4 clients are required.
I would like to know why it says 4 connections with 4 clients. What does that actually mean?
It means there are currently four connections open to zookeeper. This connection:
/127.0.0.1:37719[0](queued=0,recved=1,sent=0)
is your telnet localhost 2181 connection.
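As a sketch of my own (not from the answer): the bracketed digit after each client address appears to distinguish established sessions ([1]) from connections that never established one, like the telnet/4lw probe ([0], as the answer notes). Under that assumption you can count real client sessions from stat output while excluding the probe itself:

```shell
# count_clients reads stat output on stdin and counts client lines that
# show an established session ([1]), skipping probe connections ([0]).
count_clients() {
  grep -c ':[0-9]*\[1\]'
}

# Usage: echo stat | nc localhost 2181 | count_clients
```

For the transcript above this yields 3: the three Kafka-side sessions, minus the telnet probe.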

Mesos-Marathon cluster issue: Could not determine the current leader

I am new to the Mesos and Marathon services. I have set up 3 master and 3 slave servers per the www.digitalocean.com guide, configured the same way on the master servers as well as the slaves. Finally I completed the setup of Mesos, Marathon, Zookeeper and Chronos. Mesos is listening on 5050, Marathon on 8080 and Chronos on 4400. After a few hours my Marathon instances show Error 503:
HTTP ERROR: 503
Problem accessing /. Reason:
Could not determine the current leader
Powered by Jetty:// 9.3.z-SNAPSHOT.
But Mesos is working fine. I face this problem every time, and if I restart the Marathon and Zookeeper services it works fine again.
Marathon
Jun 15 06:19:20 master3 marathon[1054]: INFO Waiting for consistent leadership state. Are we leader?: false, leader: Some(192.168.4.78:8080 (mesosphere.marathon.api.LeaderProxyFilter$:qtp522188921-35)
Jun 15 06:19:20 master3 marathon[1054]: INFO Waiting for consistent leadership state. Are we leader?: false, leader: Some(192.168.4.78:8080 (mesosphere.marathon.api.LeaderProxyFilter$:qtp522188921-35)
Zookeeper
2016-06-15 03:41:13,797 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#197] - Accepted socket connection from /192.168.4.78:38339
2016-06-15 03:41:13,798 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
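The "ZooKeeperServer not running" warning suggests checking each ZooKeeper node's health before restarting Marathon. A diagnostic sketch (the master1..master3 hostnames and the helper function are my assumptions, not from the question): ZooKeeper replies "imok" to the ruok probe when it is serving.

```shell
# zk_up classifies a ruok reply: ZooKeeper answers "imok" when serving.
zk_up() {
  [ "$1" = "imok" ] && echo up || echo down
}

# Probe each master (hostnames are placeholders):
#   for h in master1 master2 master3; do
#     echo "$h: $(zk_up "$(echo ruok | nc -w 2 "$h" 2181)")"
#   done
```

If any node reports down, Marathon cannot resolve a leader through ZooKeeper, which matches the 503 symptom.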

How to get zxid of a zookeeper server?

Zookeeper assigns a unique number to each transaction, called the zxid. It has two parts: an epoch and a counter. I could find the epoch value in zookeeper's data directory, but I can't find the counter. Does anyone know where I can find it?
In general, how do I get the zxid of a zookeeper server?
Turns out it's pretty easy:
echo srvr | nc localhost 2181
You can also see the zxid in the current status output of the zookeeper server, as answered in another post. First I ran telnet:
telnet localhost 2181
Then sent the following to the server:
stats
and received the following information:
Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT
Clients:
/127.0.0.1:54864[1](queued=0,recved=6030,sent=6033)
/192.168.80.1:55675[0](queued=0,recved=1,sent=0)
/192.168.80.1:54769[1](queued=0,recved=432,sent=432)
Latency min/avg/max: 0/0/35
Received: 7104
Sent: 7114
Connections: 3
Outstanding: 0
Zxid: 0xd0
Mode: standalone
Node count: 148
Connection to host lost.
As you can see, the zxid is currently 0xd0 on my zookeeper server.
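Since the question asks about the epoch and counter: the zxid is a 64-bit number whose high 32 bits are the epoch and low 32 bits are the counter, so you can split any reported zxid yourself. A small sketch using shell arithmetic:

```shell
# zxid_parts splits a zxid (e.g. 0x100000004) into its epoch
# (high 32 bits) and counter (low 32 bits).
zxid_parts() {
  z=$(( $1 ))
  printf 'epoch=%d counter=%d\n' $(( z >> 32 )) $(( z & 0xffffffff ))
}

# Example: zxid_parts 0x100000004  -> epoch=1 counter=4
```

Applied to the 0xd0 above: epoch 0, counter 208.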