How to get zxid of a zookeeper server? - apache-zookeeper

ZooKeeper assigns a unique number, called the zxid, to each transaction. It has two parts: an epoch and a counter. I could find the epoch value in ZooKeeper's data directory, but I can't find the counter. Does anyone know where I can find it?
In general, how do I get the zxid for ZooKeeper?

Turns out it's pretty easy:
echo srvr | nc localhost 2181

Looking at the current status of the ZooKeeper server can also show the zxid, as answered in another post. First I executed telnet:
telnet localhost 2181
Then sent the following command to the server:
stats
and then received the following information:
Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT
Clients:
/127.0.0.1:54864[1](queued=0,recved=6030,sent=6033)
/192.168.80.1:55675[0](queued=0,recved=1,sent=0)
/192.168.80.1:54769[1](queued=0,recved=432,sent=432)
Latency min/avg/max: 0/0/35
Received: 7104
Sent: 7114
Connections: 3
Outstanding: 0
Zxid: 0xd0
Mode: standalone
Node count: 148
Connection to host lost.
As you can see, the zxid is currently 0xd0 on my ZooKeeper server.
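The zxid is a single 64-bit number: the high 32 bits are the epoch and the low 32 bits are the counter, so you can split the value reported by the server yourself. A minimal sketch, assuming bash, nc and a server listening on localhost:2181:
ZXID=$(echo srvr | nc localhost 2181 | awk '/^Zxid:/ {print $2}')   # e.g. 0xd0
echo "epoch:   $(( ZXID >> 32 ))"          # high 32 bits of the zxid
echo "counter: $(( ZXID & 0xFFFFFFFF ))"   # low 32 bits of the zxid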

Related

Can a syslog PRI value be negative?

First, I will describe my architecture:
client--->haproxy--->syslog-ng--->kafka
The client is a Cisco ASA, haproxy is the server doing load-balancing, and syslog-ng receives, filters and sends the logs to Kafka (the destination).
The client sends logs to haproxy, and haproxy sends the logs to syslog-ng over TCP.
Whenever the TCP client-server connection times out and the client restores it, the PRI value of the resulting message is negative, which we can see in Wireshark. Because of this, the messages get mixed up.
A "Connection restored" message is normal, but a negative PRI value is not.
Here are the logs:
<-1>May 24 2021 17:40:28: %ASA--1-6414004: TCP Syslog Server private:xx.xx.xx.xx/1470 -
Connection restored\\nCAL\\\\John Mike/xxxxxxxxxxxxxxxxxx) to private:xx.xx.xx.xx/xx duration 0:00:00 bytes 142
(John Mike/xxxxxxxxxxxxxxxxxx)\\nxxxxxxx)\\n4 2021 17:40:28: %ASA-6-302016: Teardown UDP connection 1733810491
We've increased the client connection timeout from 1 minute to 12 hours, but the problem is not resolved.
Some versions of the Cisco ASA TCP syslog code are affected by bug CSCvz85683:
Symptom:
Wrong syslog message format, ex for 414004:
-1>Sep 08 2021 10:46:25: %ASA--1-6414004: TCP Syslog Server private:xx.xx.xx.xx/1470 - Connection restored\n (xx.xx.xx.xx/64437)
Conditions:
External logging to TCP server is enabled
Workaround:
NA
Further Problem Description:
ASA syslog messages have 6-digit ID
The valid range for message IDs is between 100000 and 999999.
Source: Cisco ASA Series Syslog Messages. About ASA Syslog Messages.
When logging via TCP, versions with the defective code shift the severity (6 in this case) into the message ID (414004 in this case) and use an illegal priority of -1.
According to the bug, this has been fixed in version 9.14.4.
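For context, the PRI value at the start of a syslog message is defined as facility * 8 + severity, so a well-formed message always carries a value between 0 and 191; a header like <-1> can only come from malformed output such as the one produced by this bug. A small sketch of the calculation (the facility value is just an assumption for illustration):
facility=20   # local4, assumed here as the ASA's logging facility
severity=6    # informational
echo "<$(( facility * 8 + severity ))>"   # prints <166>; the result can never be negative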

Trying to run Apache/NIFI on Zookeeper from Confluent

I am trying to run Apache NiFi on Confluent's ZooKeeper. NiFi 1.11.3 is installed in /opt/nifi by unpacking the tar archive; Confluent is the Community edition, version 5.3, installed from the Confluent repo https://packages.confluent.io/rpm/5.3.
NiFi works with its integrated ZooKeeper, and it works if I download ZooKeeper separately from the Apache ZooKeeper site. Confluent Kafka also works with a separate ZooKeeper and with the NiFi-integrated one. BUT I cannot make it work using the ZooKeeper from Confluent.
In the logs I see only one warning, which is:
WARN Received packet at server of unknown type 15 (org.apache.zookeeper.server.ZooKeeperServer)
My config file is the same for all three ZooKeeper servers:
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=myhost1:2888:3888
server.2=myhost2:2888:3888
server.3=myhost3:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
I do not think that Confluent really changed anything in their ZooKeeper. What could be the reason for this error?
As @BryanBende said:
NiFi 1.11.x (In our particular case, 1.11.4) requires ZK 3.5, please
confirm the version of ZK used in Confluent platform, IF ITS 3.4 THEN
ITS NOT GOING TO WORK – Bryan Bende Mar 27 at 14:35
The typical error you will see in the ZooKeeper log:
Oct 08 17:22:23 some-pro-zk03 zookeeper-server-start[14136]: [2020-10-08 17:22:23,275] INFO Accepted socket connection from /10.10.10.1:53794 (org.apache.zookeeper.server.NIOServerCnxnFactory)
Oct 08 17:22:23 some-pro-zk03 zookeeper-server-start[14136]: [2020-10-08 17:22:23,275] INFO Refusing session request for client /10.10.10.1:53794 as it has seen zxid 0x400000000 our last zxid is 0x300000004 client must try another server (org.apache.zookeeper.server.ZooKeeperServer)
Oct 08 17:22:23 some-pro-zk03 zookeeper-server-start[14136]: [2020-10-08 17:22:23,275] INFO Closed socket connection for client /10.159.164.93:53794 (no session established for client) (org.apache.zookeeper.server.NIOServerCnxn)
The typical error you will see in the client log:
2020-10-07 16:00:09,112 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:862)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:990)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
It took a full day of work (reading logs!) to find this error.
So, check your ZooKeeper version.
For ZooKeeper 3.5+:
echo srvr | nc localhost 2181
For ZooKeeper versions before 3.5:
echo stats | nc localhost 2181
You can also use telnet.
For ZooKeeper 3.5+:
telnet localhost 2181
srvr
For ZooKeeper versions before 3.5:
telnet localhost 2181
stats
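If you only want the version string, a minimal sketch (assuming nc is available and the server listens on localhost:2181) is to filter that output:
echo srvr | nc localhost 2181 | grep -i 'version'
# prints e.g. "Zookeeper version: 3.4.13-..."; anything below 3.5 will not
# work with NiFi 1.11.x, as described above (use stats instead of srvr on
# pre-3.5 servers, as shown in the commands just above).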

Too many connections on zookeeper server

Environment: HDP 2.6.4
Ambari – 2.6.1
3 ZooKeeper servers
23.1.35.185 is the IP of the first ZooKeeper server
Hi all,
On the first ZooKeeper server it seems that even after the client closes its connection, the connection to ZooKeeper is not actually getting closed,
which causes the maximum number of client connections from a host to be reached - we have maxClientCnxns set to 60 in the ZooKeeper config.
As a result, when a new application comes along and tries to create a connection, it fails.
For example, when the connections look like this:
echo stat | nc 23.1.35.185 2181
Latency min/avg/max: 0/71/399
Received: 3031
Sent: 2407
Connections: 67
Outstanding: 622
Zxid: 0x130000004d
Mode: follower
Node count: 3730
But after some time, when the connections reach ~70, we see:
echo stat | nc 23.1.35.185 2181
Ncat: Connection reset by peer.
We can also see many connections in CLOSE_WAIT:
java 58936 zookeeper 60u IPv6 381963738 0t0 TCP Zookeper_server.sys54.com:eforward->zookeper_server.sys54.com:44983 (CLOSE_WAIT)
From the ZooKeeper log:
2018-12-26 02:50:46,382 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#193]
- Too many connections from /23.1.35.185 - max is 60
In Ambari we can also see:
Connection failed: [Errno 104] Connection reset by peer to zookeper_server.sys54.com.:2181
I must say that this is not happening on ZooKeeper servers 2 and 3.
NOTE - increasing maxClientCnxns to 300 does not help, because after some time we get more than 300 connections (CLOSE_WAIT) and then we see in the log:
2018-12-26 02:50:49,375 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#193] - Too many connections from /23.1.35.187 - max is 300
So, any hint as to why the connections are stuck in CLOSE_WAIT?
CLOSE_WAIT means that the local end of the connection has received a FIN from the other end, but the OS is waiting for the program at the local end to actually close its connection.
The problem is your program running on the local machine is not closing the socket. It is not a TCP tuning issue. A connection can (and quite correctly) stay in CLOSE_WAIT forever while the program holds the connection open.
Once the local program closes the socket, the OS can send the FIN to the remote end which transitions you to LAST_ACK while you wait for the ACK of the FIN. Once that is received, the connection is finished and drops from the connection table (if your end is in CLOSE_WAIT you do not end up in the TIME_WAIT state).
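Since CLOSE_WAIT always points at a local program that never calls close(), the first step is to find out which process is holding those sockets. A minimal sketch, assuming ss (or netstat) is installed on the ZooKeeper host:
ss -tnp state close-wait '( sport = :2181 or dport = :2181 )'
# or, with netstat, count CLOSE_WAIT sockets per owning process:
netstat -tanp | grep 2181 | grep CLOSE_WAIT | awk '{print $7}' | sort | uniq -c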
There are kernel-level settings to reuse connections and reduce the time sockets spend in TIME_WAIT.
I suggest you follow this tutorial: http://www.linuxbrigade.com/reduce-time_wait-socket-connections/
This should probably solve your problem.
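For reference, the tuning such tutorials describe usually boils down to a couple of sysctls; a sketch is below, but note that these affect TIME_WAIT reuse and FIN timeouts and will not clear CLOSE_WAIT sockets held open by a client that never closes them:
sysctl -w net.ipv4.tcp_fin_timeout=30   # shorten how long FIN-WAIT-2 sockets linger
sysctl -w net.ipv4.tcp_tw_reuse=1       # allow reuse of TIME_WAIT sockets for new outbound connections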

Zookeeper refuses Kafka connection from an old client

I have a cluster configured with Kubernetes on GCE, with one pod for ZooKeeper and another for Kafka. It was working normally until ZooKeeper crashed and restarted, after which it started refusing connections from the Kafka pod:
Refusing session request for client /10.4.4.58:52260 as it has seen
zxid 0x1962630
The complete refusal log is here:
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /10.4.4.58:52260
2017-08-21 20:05:32,013 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#882] - Connection request from old client /10.4.4.58:52260; will be dropped if server is in r-o mode
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#901] - Refusing session request for client /10.4.4.58:52260 as it has seen zxid 0x1962630 our last zxid is 0xab client must try another server
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1008] - Closed socket connection for client /10.4.4.58:52260 (no session established for client)
Kafka maintains a ZooKeeper session which remembers the last zxid it has seen. When the ZooKeeper service goes down and comes back up, its zxid starts again from a smaller value, so the server thinks the Kafka client has seen a bigger zxid than its own and refuses the session.
Try restarting Kafka.
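To confirm this is what happened, compare the zxid the client reports in the log (0x1962630 above) with the restarted server's current zxid. A minimal sketch, assuming the ZooKeeper pod is reachable under the hostname zookeeper on port 2181 (both assumptions here):
echo srvr | nc zookeeper 2181 | awk '/^Zxid:/ {print $2}'
# If this prints something far smaller than 0x1962630 (e.g. 0xab as in the log),
# the server has lost its transaction history and stale clients such as Kafka or
# kafka-manager must be restarted so they open a fresh session.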
For the record, I had this problem and all my Kafka brokers were off.
But my kafka-manager was still up and listening on the ZooKeepers. Turning it off resolved the issue.
Related to the answer from @GuangshengZuo, the steps are:
Stop any residual ZooKeeper instances - zookeeper-server-stop.bat
Start a fresh ZooKeeper - zookeeper-server-start.bat .\config\zookeeper.properties
This will do it.

Zookeeper status - telnet connections: 4

Could someone help me understand whether it is required to have 4 connections for ZooKeeper?
My requirement is simple - I want to run Apache Kafka with Spark on my local machine. As per the Kafka documentation I started the ZooKeeper bundled under the Kafka bin directory and wanted to confirm that my ZooKeeper is up.
So I tried "telnet localhost 2181" from the command prompt.
And got the output below:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/127.0.0.1:34231[1](queued=0,recved=436,sent=436)
/127.0.0.1:34230[1](queued=0,recved=436,sent=436)
/127.0.0.1:37719[0](queued=0,recved=1,sent=0)
/127.0.0.1:34232[1](queued=0,recved=436,sent=436)
Latency min/avg/max: 0/0/42
Received: 2127
Sent: 2136
Connections: 4
Outstanding: 0
Zxid: 0x143
Mode: standalone
Node count: 51
Connection closed by foreign host.
I would like to know why the connection count says 4, with 4 clients. What does that actually mean?
Thank you in advance for helping me understand whether 4 clients are required.
I would like to know why the connection count says 4, with 4 clients. What does that actually mean?
It means there are currently four connections open to zookeeper. This connection:
/127.0.0.1:37719[0](queued=0,recved=1,sent=0)
is your telnet localhost 2181 connection.
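If you want to see exactly who those four connections belong to, the cons four-letter word lists every open connection with its per-connection details. A small sketch, assuming nc is available and the server listens on localhost:2181:
echo cons | nc localhost 2181
# The entry with [0] and recved=1,sent=0 is the plain TCP probe (telnet/nc)
# itself; the other three entries are established client sessions - in this
# setup most likely the Kafka broker's connections.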