Zookeeper not recovering after timeout - apache-zookeeper

ZooKeeper does not recover properly after a timeout: it falls into a non-working state and stays there until it is restarted.
What can cause this, and how can we resolve it?
config
ZOO_TICK_TIME: 2000
ZOO_INIT_LIMIT: 30000
ZOO_SYNC_LIMIT: 10
ZOO_MAX_CLIENT_CNXNS: 2000
ZOO_STANDALONE_ENABLED: 'false'
ZOO_AUTOPURGE_PURGEINTERVAL: 1
ZOO_AUTOPURGE_SNAPRETAINCOUNT: 10
ZOO_LOG4J_PROP: INFO,ROLLINGFILE
log
2020-01-10 20:58:57,473 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 20:59:57,484 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 21:00:57,495 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 21:01:07,510 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Follower#96] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2020-01-10 21:01:07,516 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Follower#201] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
2020-01-10 21:01:07,516 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):LearnerZooKeeperServer#165] - Shutting down
2020-01-10 21:01:07,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#558] - shutting down
2020-01-10 21:01:07,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FollowerRequestProcessor#139] - Shutting down
2020-01-10 21:01:07,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#362] - Shutting down
2020-01-10 21:01:07,518 [myid:1] - INFO [FollowerRequestProcessor:1:FollowerRequestProcessor#110] - FollowerRequestProcessor exited loop!
2020-01-10 21:01:07,518 [myid:1] - INFO [CommitProcessor:1:CommitProcessor#195] - CommitProcessor exited loop!
2020-01-10 21:01:07,518 [myid:1] - INFO [CommitProcessor:1:CommitProcessor#195] - CommitProcessor exited loop!
2020-01-10 21:01:07,526 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FinalRequestProcessor#514] - shutdown of request processor complete
2020-01-10 21:01:07,659 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):SyncRequestProcessor#191] - Shutting down
2020-01-10 21:01:07,660 [myid:1] - INFO [SyncThread:1:SyncRequestProcessor#169] - SyncRequestProcessor exited!
2020-01-10 21:01:07,661 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1318] - PeerState set to LOOKING
2020-01-10 21:01:07,661 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1193] - LOOKING
2020-01-10 21:01:07,664 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection#885] - New election. My id = 1, proposed zxid=0x23007a7d69
2020-01-10 21:01:07,666 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x23007a7d69 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,668 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x2200061493 (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,668 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x23007a7d69 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,870 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1281] - LEADING
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#66] - TCP NoDelay set to: true
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#86] - zookeeper.leader.maxConcurrentSnapshots = 10
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#88] - zookeeper.leader.maxConcurrentSnapshotTimeout = 5
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#938] - minSessionTimeout set to 4000
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#947] - maxSessionTimeout set to 40000
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#166] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /datalog/version-2 snapdir /data/version-2
2020-01-10 21:01:07,900 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#464] - LEADING - LEADER ELECTION TOOK - 29 MS
2020-01-10 21:01:07,918 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FileTxnSnapLog#384] - Snapshotting: 0x23007a7d69 to /data/version-2/snapshot.23007a7d69
2020-01-10 21:01:07,964 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,040 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,318 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,491 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,491 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,501 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,501 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,509 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,510 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,519 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x23007a7d72 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x23 (n.peerEPoch), LEADING (my state)0 (n.config version)
2020-01-10 21:01:08,524 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,524 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,535 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,535 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,545 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,545 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,572 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,574 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,582 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,583 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,594 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,594 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,595 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
...

For the typical symptom categories below, use the checklists to find out why healthy clients are being disconnected frequently.
Network connectivity problems:
- Use ifconfig to check the number of errors on the NICs. A high error count can account for increased latency.
I/O starvation:
- Install ZooKeeper on dedicated nodes when high performance is needed.
- Check your disk performance, e.g. using hdparm -tT.
- Make sure that the directory for the ZK transaction log (/opt/mapr/zkdata by default) is on a fast, dedicated drive. The property is dataLogDir in $ZK_DIR/conf/zoo.cfg.
- Make sure that the ZooKeeper heap size is not larger than the available RAM, to avoid swapping.
GC starvation:
- The symptom is frequent client disconnects and session expirations caused by starvation of the heartbeat thread.
- Use jstat -gc to see whether full garbage collections are frequent.
- If they are, try an alternative garbage collector such as ConcurrentMarkSweep, using the JVM options -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC.
Client-side timeouts:
- Check for swapping on client machines with free -m.
- Consider increasing maxSessionTimeout if it is set too low.
- If no obvious problem is found, run https://github.com/phunt/zk-smoketest to check whether the problem is in ZooKeeper at all.
Virtual environments:
- If the ZK cluster is on shared hosting, this can cause resource starvation and introduce latency.
Reference link: https://mapr.com/support/s/article/How-do-I-troubleshoot-ZK-connection-timeout-issues?language=en_US
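Before running a full smoke test, a quick per-node probe can show which servers answer at all. The following is a minimal sketch, not part of the referenced article: it sends a ZooKeeper four-letter-word command such as ruok over a plain socket, the same thing `echo ruok | nc host 2181` does. The function names and the server list are hypothetical, and it assumes the command is enabled on the server (newer releases require it to be listed in 4lw.commands.whitelist).

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter-word command and return the server's raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection after replying,
            # so read until recv() returns an empty result.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe(servers, timeout=5.0):
    """Return a dict mapping "host:port" to the ruok reply or an error text."""
    results = {}
    for host, port in servers:
        key = f"{host}:{port}"
        try:
            reply = four_letter_word(host, port, "ruok", timeout)
            results[key] = reply.strip() or "no reply"
        except OSError as exc:
            # Covers refused connections and socket timeouts alike.
            results[key] = f"error: {exc}"
    return results
```

Calling `probe([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)])` with your own host names returns one entry per node. A node that accepts the connection but reports errors or never replies deserves a closer look at its log.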

This log excerpt
2020-01-10 21:01:07,900 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#464] - LEADING - LEADER ELECTION TOOK - 29 MS
2020-01-10 21:01:07,918 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FileTxnSnapLog#384] - Snapshotting: 0x23007a7d69 to /data/version-2/snapshot.23007a7d69
tells me that the ZK node became leader and started writing a snapshot (after which it will probably send the snapshot to the followers). Only then will the node start serving queries. But if the snapshot is big (or the hardware underlying /data/version-2/ is slow), writing it will take a significant amount of time. If it takes longer than maxSessionTimeout (40000 ms), all active sessions and all ephemeral data will be lost.
If that is the case, the only good way to avoid it is to reduce the amount of data stored in ZooKeeper. For example, store heavy blobs somewhere else (file/nfs/hdfs/aws/sql/http/...) and write only timestamps (and perhaps the storage path) to ZK, so that the application can still receive data-change notifications quickly from ZK.
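The "store the blob elsewhere, keep only a pointer in ZK" idea above could look roughly like this. It is a sketch under stated assumptions: a local directory stands in for the external store, and the function names and record fields (location, size, updated_at) are made up for illustration. Only the small JSON payload would be written to the znode.

```python
import json
from pathlib import Path


def store_blob(store_dir, name, blob):
    """Write the heavy payload to the external store (a local directory here)."""
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / name
    target.write_bytes(blob)
    return target


def make_pointer(location, size, updated_at):
    """Build the small record that goes into the znode instead of the blob."""
    record = {"location": location, "size": size, "updated_at": updated_at}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def read_pointer(payload):
    """Decode a pointer record read back from the znode."""
    return json.loads(payload.decode("utf-8"))
```

A writer would call `store_blob(...)` first and then set the znode data to `make_pointer(...)` with whatever ZooKeeper client it uses. A watcher that fires on the change decodes the record with `read_pointer(...)` and fetches the blob from `location`. The znode stays at a few hundred bytes regardless of blob size, which keeps snapshots small.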

Related

Setting up Kafka cluster with 3 Zookeeper and 3 Broker node

When setting up a Kafka cluster with 3 ZooKeeper nodes and 3 broker nodes, only one of the brokers starts and the remaining two do not. The whole cluster runs on the same host. When I run the start command for the other two brokers, they start for a second and then shut down.
The error log is:
[2021-02-02 20:19:12,874] WARN Unable to read additional data from client sessionid 0x3004e189c730000, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
[2021-02-02 20:19:12,877] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1253)
[2021-02-02 20:19:12,966] INFO shutdown called (org.apache.zookeeper.server.quorum.Learner)
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1257)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:12,967] INFO shutting down (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:12,967] INFO FollowerRequestProcessor exited loop! (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-02 20:19:12,967] INFO CommitProcessor exited loop! (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:12,972] INFO shutdown of request processor complete (org.apache.zookeeper.server.FinalRequestProcessor)
[2021-02-02 20:19:12,983] INFO Shutting down (org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-02 20:19:12,983] INFO SyncRequestProcessor exited! (org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-02 20:19:13,040] WARN PeerState set to LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,040] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,041] INFO New election. My id = 3, proposed zxid=0xa00000037 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,041] INFO Notification: 2 (message format version), 1 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,042] INFO Notification: 2 (message format version), 3 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,043] INFO Notification: 2 (message format version), 2 (n.leader), 0x900000011 (n.zxid), 0x1 (n.round), LEADING (n.state), 2 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,044] INFO Notification: 2 (message format version), 3 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,245] INFO LEADING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,248] INFO TCP NoDelay set to: true (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,248] INFO zookeeper.leader.maxConcurrentSnapshots = 10 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,248] INFO zookeeper.leader.maxConcurrentSnapshotTimeout = 5 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,249] INFO minSessionTimeout set to 4000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,249] INFO maxSessionTimeout set to 40000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,249] INFO Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /u01/data05/zookeeper/version-2 snapdir /u01/data05/zookeeper/version-2 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,250] INFO LEADING - LEADER ELECTION TOOK - 5 MS (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,251] INFO Snapshotting: 0xa00000037 to /u01/data05/zookeeper/version-2/snapshot.a00000037 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2021-02-02 20:19:13,861] INFO Notification: 2 (message format version), 2 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0xa (n.peerEPoch), LEADING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,877] INFO Follower sid: 2 : info : server:12888:13888:participant (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,902] INFO On disk txn sync enabled with snapshotSizeFactor 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2021-02-02 20:19:13,902] INFO Synchronizing with Follower sid: 2 maxCommittedLog=0xa00000037 minCommittedLog=0x600000041 lastProcessedZxid=0xa00000037 peerLastZxid=0xa00000037 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,902] INFO Sending DIFF zxid=0xa00000037 for peer sid: 2 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,924] INFO Have quorum of supporters, sids: [ [2, 3],[2, 3] ]; starting up and setting last processed zxid: 0xb00000000 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,928] INFO Configuring CommitProcessor with 2 worker threads. (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:13,932] INFO Using checkIntervalMs=60000 maxPerMinute=10000 (org.apache.zookeeper.server.ContainerManager)
[2021-02-02 20:19:14,248] INFO Follower sid: 1 : info : server:2888:3888:participant (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO On disk txn sync enabled with snapshotSizeFactor 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2021-02-02 20:19:14,259] INFO Synchronizing with Follower sid: 1 maxCommittedLog=0xa00000037 minCommittedLog=0x600000041 lastProcessedZxid=0xb00000000 peerLastZxid=0xa00000037 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO Using committedLog for peer sid: 1 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO Sending DIFF zxid=0xa00000037 for peer sid: 1 (org.apache.zookeeper.server.quorum.LearnerHandler)
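A note for reading the zxid values in logs like the one above: a zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. The helper below is a small illustrative sketch (the function names are hypothetical) that splits the values so that an epoch change is easy to spot.

```python
def split_zxid(zxid):
    """Split a zxid into (epoch, counter): high 32 bits and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def describe_zxid(zxid):
    """Format a zxid as a readable epoch/counter pair."""
    epoch, counter = split_zxid(zxid)
    return f"epoch={epoch:#x} counter={counter:#x}"


# Values taken from the log above.
for value in (0xA00000037, 0xB00000000):
    print(hex(value), "->", describe_zxid(value))
```

Read this way, the proposed zxid 0xa00000037 belongs to epoch 0xa, and the "last processed zxid" 0xb00000000 set once the quorum forms is the first value of the new epoch 0xb.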

Kafka Zookeeper Random Restarts

We are running a Hyperledger Fabric network with Kafka and ZooKeeper in production, using Docker Swarm on Azure VMs (4 Kafka nodes, 3 ZooKeeper nodes). It was running fine, but two days ago ZooKeeper suddenly restarted, and since then it has kept restarting at intervals of 6-8 hours.
Logs on the Kafka node:
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Shutdown completed (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,499] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions xxxx-xxxxx-xxx-xxxxx.
zookeeper leader logs
2020-07-04 07:46:27,070 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x10101beb22c0000 type:create cxid:0x4 zxid:0x2e00000114 txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode = NodeExists for /brokers/ids
2020-07-04 07:48:43,084 [myid:3] - INFO [SessionTracker:ZooKeeperServer#355] - Expiring session 0x2010551ef290000, timeout of 6000ms exceeded
2020-07-04 07:48:43,085 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#487] - Processed session termination for sessionid: 0x2010551ef290000
2020-07-04 07:48:43,091 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn#1056] - Closed socket connection for client /100.0.20.80:60672 which had sessionid 0x2010551ef290000
2020-07-04 07:48:55,182 [myid:3] - ERROR [LearnerHandler-/100.0.20.80:58940:LearnerHandler#648] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2020-07-04 07:48:55,183 [myid:3] - WARN [LearnerHandler-/100.0.20.80:58940:LearnerHandler#661] - ******* GOODBYE /100.0.20.80:58940 ********
2020-07-04 07:49:57,623 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /100.0.20.80:37838
2020-07-04 07:49:57,637 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /100.0.20.80:37838
2020-07-04 07:49:57,641 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300ed4720900000 with negotiated timeout 12000 for client /100.0.20.80:37838
2020-07-04 07:49:57,670 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x300ed4720900000 type:setData cxid:0x1 zxid:0x2e000003b2 txntype:-1 reqpath:n/a Error Path:/brokers/topics/xxxxxxxxxxxx/partitions/0/state Error:KeeperErrorCode = BadVersion for /brokers/topics/xxxxxxxxxxxx/partitions/0/state
my zoo.cfg
clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=6000
initLimit=10
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=xxx.xxx.com:2888:3888
server.2=xxx.xxx.com:2888:3888
server.3=0.0.0.0:2888:3888
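When reading a zoo.cfg like this one, it helps to turn the tick-based settings into milliseconds. initLimit and syncLimit are counted in ticks, and unless overridden the session timeout bounds default to 2 x tickTime and 20 x tickTime. The sketch below only does that arithmetic. The function names are hypothetical, and the server.N and other unrelated lines are parsed but ignored.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def derived_timeouts_ms(settings):
    """Convert tick-based limits to milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "init_window": tick * int(settings["initLimit"]),
        "sync_window": tick * int(settings["syncLimit"]),
        # Defaults apply only when the keys are absent from the config.
        "min_session": int(settings.get("minSessionTimeout", 2 * tick)),
        "max_session": int(settings.get("maxSessionTimeout", 20 * tick)),
    }


ZOO_CFG = """
tickTime=6000
initLimit=10
syncLimit=2
"""

print(derived_timeouts_ms(parse_zoo_cfg(ZOO_CFG)))
```

For the values above this gives a 60 s init window, a 12 s sync window, and session timeout bounds of 12000 ms and 120000 ms. The lower bound matches the "negotiated timeout 12000" in the leader log. Whether those windows are appropriate for this cluster is a separate question that the arithmetic does not answer.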

Apache ZooKeeper Cluster loses connectivity after leader election

I am running a ZooKeeper Cluster with five nodes, each of which has the following configuration (plus the correct quorum information at the end):
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/usr/local/zookeeper/data
clientPort=2181
minSessionTimeout=4000
maxSessionTimeout=40000
4lw.commands.whitelist=*
The cluster works fine at first. For testing purposes, I then kill the ZooKeeper instance that is currently the leader (each ZooKeeper server runs in the foreground of a screen session). A leader election is then triggered, as expected, and another node is elected leader.
However, when I then try to send get/set/create requests to the cluster from a Java client (the process that was running fine initially), or interact with the cluster via zkCli.sh, the client just stays in the CONNECTING state forever.
So far, I have carried out the obvious troubleshooting steps, such as:
echo stat | nc localhost 2181 -- On any of the servers that are still running, this just indicates that all is well, for example:
Clients:
/10.0.0.1:35264[1](queued=0,recved=2,sent=1)
/10.0.0.2:49230[0](queued=0,recved=1,sent=0)
/127.0.0.1:34162[0](queued=0,recved=1,sent=0)
/10.0.0.3:49530[0](queued=0,recved=1,sent=0)
/10.0.0.1:35250[0](queued=0,recved=1,sent=0)
/10.0.0.4:35406[1](queued=0,recved=2,sent=1)
/10.0.0.2:49304[1](queued=0,recved=1,sent=1)
Latency min/avg/max: 0/0/0
Received: 484
Sent: 140
Connections: 7
Outstanding: 343
Zxid: 0x200000000
Mode: leader
Node count: 15
Proposal sizes last/min/max: 32/32/75
echo ruok | nc localhost 2181 just outputs imok
The following are the (abridged) contents of the log file on the server that is elected leader after I inject the failure:
2019-08-12 12:25:36,799 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#69] - FOLLOWING - LEADER ELECTION TOOK - 23 MS
2019-08-12 12:25:37,004 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#282] - Unexpected exception, tries=0, remaining init limit=9798, connecting to /10.0.0.10:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2019-08-12 12:25:39,148 [myid:4] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-08-12 12:25:39,174 [myid:4] - INFO [NIOWorkerThread-5:FourLetterCommands#234] - The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, -720899=telnet close, 2003003507=wchs, 2003003504=wchp, 1684632179=dirs, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
2019-08-12 12:25:39,175 [myid:4] - INFO [NIOWorkerThread-5:FourLetterCommands#235] - The list of enabled four letter word commands is : [[wchs, stat, wchp, dirs, stmk, conf, ruok, mntr, srvr, wchc, envi, srst, isro, dump, gtmk, telnet close, crst, cons]]
2019-08-12 12:25:39,175 [myid:4] - INFO [NIOWorkerThread-5:NIOServerCnxn#518] - Processing stat command from /127.0.0.1:59614
2019-08-12 12:25:39,858 [myid:4] - WARN [NIOWorkerThread-6:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-08-12 12:25:39,906 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#391] - Getting a diff from the leader 0x0
2019-08-12 12:25:39,913 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#546] - Learner received NEWLEADER message
2019-08-12 12:25:40,253 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#529] - Learner received UPTODATE message
2019-08-12 12:25:40,266 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#256] - Configuring CommitProcessor with 16 worker threads.
2019-08-12 12:25:40,536 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#125] - Got zxid 0x100000001 expected 0x1
2019-08-12 12:25:40,536 [myid:4] - INFO [SyncThread:4:FileTxnLog#216] - Creating new log file: log.100000001
2019-08-12 12:25:43,617 [myid:4] - INFO [NIOWorkerThread-7:NIOServerCnxn#518] - Processing stat command from /127.0.0.1:59638
2019-08-12 12:25:43,619 [myid:4] - INFO [NIOWorkerThread-7:StatCommand#53] - Stat command output
2019-08-12 12:26:06,907 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#96] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2019-08-12 12:26:06,908 [myid:4] - WARN [RecvWorker:5:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 5, my id = 4, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:26:06,909 [myid:4] - WARN [RecvWorker:5:QuorumCnxManager$RecvWorker#1179] - Interrupting SendWorker
2019-08-12 12:26:06,909 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#201] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
2019-08-12 12:26:06,910 [myid:4] - WARN [SendWorker:5:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
2019-08-12 12:26:06,910 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):LearnerZooKeeperServer#165] - Shutting down
2019-08-12 12:26:06,911 [myid:4] - WARN [SendWorker:5:QuorumCnxManager$SendWorker#1102] - Send worker leaving thread id 5 my id = 4
2019-08-12 12:26:06,911 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#558] - shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FollowerRequestProcessor#139] - Shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#362] - Shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [FollowerRequestProcessor:4:FollowerRequestProcessor#110] - FollowerRequestProcessor exited loop!
2019-08-12 12:26:06,914 [myid:4] - INFO [CommitProcessor:4:CommitProcessor#195] - CommitProcessor exited loop!
2019-08-12 12:26:06,917 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FinalRequestProcessor#514] - shutdown of request processor complete
2019-08-12 12:26:06,922 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):SyncRequestProcessor#191] - Shutting down
2019-08-12 12:26:06,923 [myid:4] - INFO [SyncThread:4:SyncRequestProcessor#169] - SyncRequestProcessor exited!
2019-08-12 12:26:06,923 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1318] - PeerState set to LOOKING
2019-08-12 12:26:06,924 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1193] - LOOKING
2019-08-12 12:26:06,925 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection#885] - New election. My id = 4, proposed zxid=0x100000024
2019-08-12 12:26:07,129 [myid:4] - WARN [WorkerSender[myid=4]:QuorumCnxManager#677] - Cannot open channel to 5 at election address /10.0.0.10:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
at java.lang.Thread.run(Thread.java:748)
2019-08-12 12:26:07,337 [myid:4] - INFO [WorkerSender[myid=4]:QuorumCnxManager#430] - Have smaller server identifier, so dropping the connection: (5, 4)
2019-08-12 12:26:07,337 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,342 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /172.17.0.6:46520
2019-08-12 12:26:07,345 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /172.17.0.6:46522
2019-08-12 12:26:07,345 [myid:4] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
2019-08-12 12:44:00,079 [myid:4] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 2, my id = 4, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:44:43,382 [myid:4] - INFO [SessionTracker:QuorumZooKeeperServer#157] - Submitting global closeSession request for session 0x4003ead876c0025
2019-08-12 12:44:45,381 [myid:4] - INFO [SessionTracker:ZooKeeperServer#398] - Expiring session 0x1003ead86e90025, timeout of 40000ms exceeded
2019-08-12 12:44:45,381 [myid:4] - INFO [SessionTracker:QuorumZooKeeperServer#157] - Submitting global closeSession request for session 0x1003ead86e90025
2019-08-12 12:45:00,485 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /10.0.0.7:42596
2019-08-12 12:45:00,485 [myid:4] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 4, my id = 4, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:45:00,485 [myid:4] - WARN [SendWorker:2:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
I would be extremely grateful for any help explaining why the cluster doesn't seem to interconnect (or at least won't serve requests anymore) after the leader election takes place, or for any advice on what to look at for debugging purposes.

zookeeper - [NIOServerCnxn#383] - Exception causing close of session 0x0: Len error 1195725856

I am trying to install ZooKeeper on Windows. I get the error below no matter which suggestion I follow from "zookeeper + Kafka - Unable to create data directory".
I am running it as Administrator and I have tried all these options:
#dataDir=/tmp/zookeeper
#dataDir=:\zookeeper-3.4.14\
#dataDir=C:\\_d\\WSs\\kafka\\zookeeper-3.4.14\\data
#dataDir=:\\\\zookeeper\\\\data
dataDir=C:\\_d\\WSs\\kafka\\zookeeper-3.4.14
I don't think it is relevant, but let me add it here: I have Java 11.
Any idea why this is happening would be appreciated.
Full logs
C:\Windows\system32>zkserver
C:\Windows\system32>call "C:\Program Files\Java\jdk-11.0.2"\bin\java "-Dzookeeper.log.dir=C:\_d\WSs\kafka\zookeeper-3.4.14\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\classes;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg"
2019-04-18 15:17:42,629 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2019-04-18 15:17:42,644 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2019-04-18 15:17:42,644 [myid:] - WARN [main:QuorumPeerMain#116] - Either no config or no quorum defined in config, running in standalone mode
2019-04-18 15:17:42,769 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2019-04-18 15:17:42,769 [myid:] - INFO [main:ZooKeeperServerMain#98] - Starting server
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:host.name=DESKTOP-AKCNE7F
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.version=11.0.2
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.vendor=Oracle Corporation
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.home=C:\Program Files\Java\jdk-11.0.2
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.class.path=C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\classes;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\build\lib\*;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\zookeeper-3.4.14.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\audience-annotations-0.5.0.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\jline-0.9.94.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\log4j-1.2.17.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\netty-3.10.6.Final.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\slf4j-api-1.7.25.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\lib\slf4j-log4j12-1.7.25.jar;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\..\conf
2019-04-18 15:17:47,344 [myid:] - INFO [main:Environment#100] - Server environment:java.library.path=C:\Program Files\Java\jdk-11.0.2\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\iCLS\;C:\Program Files\Intel\Intel(R) Management Engine Components\iCLS\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Intel\Intel(R) Management Engine Components\IPT;C:\Program Files\Java\jdk-11.0.2\bin;C:\Program Files\Git\cmd;C:\ProgramData\chocolatey\bin;C:\_d\tools\apache-maven-3.6.0\bin;C:\_d\WSs\kafka\zookeeper-3.4.14\bin\;C:\Users\jimis\AppData\Local\Programs\Python\Python37-32\Scripts\;C:\Users\jimis\AppData\Local\Programs\Python\Python37-32\;C:\Users\jimis\AppData\Local\Microsoft\WindowsApps;.
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:java.io.tmpdir=C:\Users\jimis\AppData\Local\Temp\
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:java.compiler=<NA>
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:os.name=Windows 10
2019-04-18 15:17:47,360 [myid:] - INFO [main:Environment#100] - Server environment:os.arch=amd64
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:os.version=10.0
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.name=jimis
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.home=C:\Users\jimis
2019-04-18 15:17:47,376 [myid:] - INFO [main:Environment#100] - Server environment:user.dir=C:\Windows\system32
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#836] - tickTime set to 2000
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#845] - minSessionTimeout set to -1
2019-04-18 15:17:47,391 [myid:] - INFO [main:ZooKeeperServer#854] - maxSessionTimeout set to -1
2019-04-18 15:17:47,782 [myid:] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-04-18 15:17:47,797 [myid:] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-04-18 15:18:00,365 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /127.0.0.1:54057
2019-04-18 15:18:00,375 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /127.0.0.1:54058
2019-04-18 15:18:00,378 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: Len error 1195725856
2019-04-18 15:18:00,379 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /127.0.0.1:54057 (no session established for client)
*** Edited: the answer to my question is "you can ignore the error you get when running curl against 127.0.0.1:port; Kafka is working anyway".
Are you trying to do an HTTP GET against the ZooKeeper client port?
The error comes from readLength in NIOServerCnxn.java, which expects either a four-letter command or a buffer whose first 4 bytes give its length.
The number 1195725856 is 0x47455420 in hex, which is "GET " in ASCII.
So the error message appears when you send an HTTP GET to port 2181.
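The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name decode_length_prefix is made up for illustration, and it assumes the server reads the first 4 bytes of the request as a big-endian 32-bit integer, which is what the "Len error" value suggests.

```python
import struct


def decode_length_prefix(value: int) -> str:
    """Show the 4 ASCII characters that a 32-bit length value was read from.

    Hypothetical helper for illustration; not part of ZooKeeper.
    """
    # Mask to 32 bits so that negative (signed) values from a log also work.
    unsigned = value & 0xFFFFFFFF
    # ">I" packs an unsigned 32-bit integer in big-endian byte order.
    raw = struct.pack(">I", unsigned)
    # Bytes outside the ASCII range are shown as a replacement character.
    return raw.decode("ascii", errors="replace")


print(hex(1195725856))                         # 0x47455420
print(repr(decode_length_prefix(1195725856)))  # 'GET '
```

The same trick works for other unexpected "Len error" values: if the decoded bytes look like the start of a text protocol, some non-ZooKeeper client is probably talking to the client port.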
$ curl http://0.0.0.0:2181/
curl: (52) Empty reply from server
$ sudo tail /var/log/zookeeper/zookeeper.out
...
2019-04-19 12:56:25,303 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted
2019-04-19 12:56:25,304 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: Len error 1195725856
2019-04-19 12:56:25,304 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:33011 (no session established for client)
This WARN message is safe to ignore: ZooKeeper simply closes that client connection, which is why curl reports an empty reply.
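If the goal is just to check that the server is listening, a four-letter command is something the client port does understand. Below is a minimal sketch using only the Python standard library. The function name four_letter_word is made up for illustration, and it assumes the command (here "ruok") is enabled on the server; depending on the ZooKeeper version, it may need to be allowed through the 4lw.commands.whitelist setting.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str = "ruok",
                     timeout: float = 5.0) -> str:
    """Send a four-letter command to a ZooKeeper client port and return the reply.

    Hypothetical helper for illustration; assumes the server answers the
    command and then closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The command is sent as 4 raw ASCII bytes, with no length prefix.
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (assumes a server on localhost:2181):
#   four_letter_word("127.0.0.1", 2181)
# A healthy server is expected to answer "imok" to "ruok".
```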

Kafka zookeeper session not established after creating consumer

This is my first time using Kafka. I followed this tutorial.
After starting ZooKeeper, I started the Kafka server. Next I created a topic and then started the consumer for that topic. This is when the ZooKeeper log says
Exception causing close of session 0x0: null
2019-01-04 14:11:58,160 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:50480 (no session established for client)
2019-01-04 14:11:59,073 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /127.0.0.1:50481
2019-01-04 14:11:59,074 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: null
2019-01-04 14:11:59,078 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:50481 (no session established for client)
2019-01-04 14:11:59,994 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /127.0.0.1:50482
2019-01-04 14:11:59,995 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: null
I am using Windows 10.
kafka_2.11-2.1.0
zookeeper-3.4.12
@itssajan
I tried a tutorial similar to the one you posted, https://dzone.com/articles/running-apache-kafka-on-windows-os, and there is a bug in that blog post.
I replaced the zookeeper option with bootstrap-server pointing at localhost:9092, and it worked for me.
Try this; it should work on Windows.
C:\dev\kafka_2.12-2.2.1\kafka_2.12-2.2.1\bin\windows>kafka-console-consumer.bat
--bootstrap-server localhost:9092 --topic test --from-beginning
I cleared
/tmp/zookeeper
/tmp/kafka-logs/
and killed all Kafka processes.
Then it worked!