Apache ZooKeeper Cluster loses connectivity after leader election - apache-zookeeper

I am running a ZooKeeper Cluster with five nodes, each of which has the following configuration (plus the correct quorum information at the end):
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/usr/local/zookeeper/data
clientPort=2181
minSessionTimeout=4000
maxSessionTimeout=40000
4lw.commands.whitelist=*
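For reference when reading the logs further down: initLimit and syncLimit are counted in ticks, so the actual timeouts depend on tickTime. The following is a minimal Python sketch that converts the settings above into milliseconds and computes the majority a five-node ensemble needs. The function name and dictionary keys are invented for illustration and are not part of ZooKeeper.

```python
def derive_timeouts(tick_time_ms, init_limit, sync_limit, ensemble_size):
    """Convert tick-based ZooKeeper settings to milliseconds and compute the quorum size."""
    return {
        # Time a follower has to connect to and sync with the leader.
        "init_timeout_ms": init_limit * tick_time_ms,
        # How far a follower may lag behind the leader before it is dropped.
        "sync_timeout_ms": sync_limit * tick_time_ms,
        # ZooKeeper's defaults when min/maxSessionTimeout are not set explicitly.
        "default_min_session_ms": 2 * tick_time_ms,
        "default_max_session_ms": 20 * tick_time_ms,
        # Strict majority of the ensemble.
        "quorum": ensemble_size // 2 + 1,
    }


# Values taken from the configuration above; five servers in the ensemble.
derived = derive_timeouts(tick_time_ms=2000, init_limit=5, sync_limit=2, ensemble_size=5)
for name, value in derived.items():
    print(f"{name} = {value}")
```

With these values the initial sync window is 10 seconds, the sync window is 4 seconds, and the explicit session timeouts equal the defaults. The quorum is 3, so the four servers that remain after the leader is killed should still be able to form a majority.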
The cluster works fine at first. For testing purposes, I then kill the instance of ZooKeeper that is currently the leader (each ZooKeeper server is running in the foreground of a screen session). Then, a leader election is triggered, as expected, and another node is elected leader.
However, any subsequent get/set/create requests to the cluster from a Java client (the same process that was running fine initially) hang, and zkCli.sh likewise reports the client state as CONNECTING forever.
So far, I have tried the obvious troubleshooting steps:
echo stat | nc localhost 2181 -- on any of the servers that are still running, this indicates that all is well, for example:
Clients:
/10.0.0.1:35264[1](queued=0,recved=2,sent=1)
/10.0.0.2:49230[0](queued=0,recved=1,sent=0)
/127.0.0.1:34162[0](queued=0,recved=1,sent=0)
/10.0.0.3:49530[0](queued=0,recved=1,sent=0)
/10.0.0.1:35250[0](queued=0,recved=1,sent=0)
/10.0.0.4:35406[1](queued=0,recved=2,sent=1)
/10.0.0.2:49304[1](queued=0,recved=1,sent=1)
Latency min/avg/max: 0/0/0
Received: 484
Sent: 140
Connections: 7
Outstanding: 343
Zxid: 0x200000000
Mode: leader
Node count: 15
Proposal sizes last/min/max: 32/32/75
echo ruok | nc localhost 2181 just outputs imok
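The same checks can be scripted instead of run by hand with nc. The sketch below is an assumption-laden illustration, not an official client: `four_letter_word` and `parse_stat` are hypothetical helper names, and only the Python standard library is used. It sends a four-letter-word command over a plain socket and pulls the "Key: value" summary lines out of a `stat` reply.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter-word command (e.g. 'stat', 'ruok') and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract 'Key: value' summary lines from `stat` output, skipping client entries."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("/") or ":" not in line:
            continue  # per-client lines start with '/ip:port'
        key, _, value = line.partition(":")
        if value.strip():
            summary[key.strip()] = value.strip()
    return summary


# Shortened copy of the `stat` output shown above.
SAMPLE = """Clients:
 /10.0.0.1:35264[1](queued=0,recved=2,sent=1)
 /127.0.0.1:34162[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 484
Sent: 140
Connections: 7
Outstanding: 343
Zxid: 0x200000000
Mode: leader
Node count: 15
"""

summary = parse_stat(SAMPLE)
print(summary["Mode"], summary["Outstanding"])
# Against a live server (not run here):
#   parse_stat(four_letter_word("localhost", 2181, "stat"))
```

In the output above, Mode: leader looks healthy, but Outstanding: 343 alongside Sent: 140 suggests that requests are queuing up without being answered. Polling this value over time may show whether the backlog ever drains.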
The following are the (abridged) contents of the log file on the server that is elected leader after I inject the failure:
2019-08-12 12:25:36,799 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#69] - FOLLOWING - LEADER ELECTION TOOK - 23 MS
2019-08-12 12:25:37,004 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#282] - Unexpected exception, tries=0, remaining init limit=9798, connecting to /10.0.0.10:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:233)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:262)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:77)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2019-08-12 12:25:39,148 [myid:4] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-08-12 12:25:39,174 [myid:4] - INFO [NIOWorkerThread-5:FourLetterCommands#234] - The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, -720899=telnet close, 2003003507=wchs, 2003003504=wchp, 1684632179=dirs, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
2019-08-12 12:25:39,175 [myid:4] - INFO [NIOWorkerThread-5:FourLetterCommands#235] - The list of enabled four letter word commands is : [[wchs, stat, wchp, dirs, stmk, conf, ruok, mntr, srvr, wchc, envi, srst, isro, dump, gtmk, telnet close, crst, cons]]
2019-08-12 12:25:39,175 [myid:4] - INFO [NIOWorkerThread-5:NIOServerCnxn#518] - Processing stat command from /127.0.0.1:59614
2019-08-12 12:25:39,858 [myid:4] - WARN [NIOWorkerThread-6:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-08-12 12:25:39,906 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#391] - Getting a diff from the leader 0x0
2019-08-12 12:25:39,913 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#546] - Learner received NEWLEADER message
2019-08-12 12:25:40,253 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Learner#529] - Learner received UPTODATE message
2019-08-12 12:25:40,266 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#256] - Configuring CommitProcessor with 16 worker threads.
2019-08-12 12:25:40,536 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#125] - Got zxid 0x100000001 expected 0x1
2019-08-12 12:25:40,536 [myid:4] - INFO [SyncThread:4:FileTxnLog#216] - Creating new log file: log.100000001
2019-08-12 12:25:43,617 [myid:4] - INFO [NIOWorkerThread-7:NIOServerCnxn#518] - Processing stat command from /127.0.0.1:59638
2019-08-12 12:25:43,619 [myid:4] - INFO [NIOWorkerThread-7:StatCommand#53] - Stat command output
2019-08-12 12:26:06,907 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#96] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2019-08-12 12:26:06,908 [myid:4] - WARN [RecvWorker:5:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 5, my id = 4, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:26:06,909 [myid:4] - WARN [RecvWorker:5:QuorumCnxManager$RecvWorker#1179] - Interrupting SendWorker
2019-08-12 12:26:06,909 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):Follower#201] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
2019-08-12 12:26:06,910 [myid:4] - WARN [SendWorker:5:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
2019-08-12 12:26:06,910 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):LearnerZooKeeperServer#165] - Shutting down
2019-08-12 12:26:06,911 [myid:4] - WARN [SendWorker:5:QuorumCnxManager$SendWorker#1102] - Send worker leaving thread id 5 my id = 4
2019-08-12 12:26:06,911 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#558] - shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FollowerRequestProcessor#139] - Shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#362] - Shutting down
2019-08-12 12:26:06,912 [myid:4] - INFO [FollowerRequestProcessor:4:FollowerRequestProcessor#110] - FollowerRequestProcessor exited loop!
2019-08-12 12:26:06,914 [myid:4] - INFO [CommitProcessor:4:CommitProcessor#195] - CommitProcessor exited loop!
2019-08-12 12:26:06,917 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FinalRequestProcessor#514] - shutdown of request processor complete
2019-08-12 12:26:06,922 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):SyncRequestProcessor#191] - Shutting down
2019-08-12 12:26:06,923 [myid:4] - INFO [SyncThread:4:SyncRequestProcessor#169] - SyncRequestProcessor exited!
2019-08-12 12:26:06,923 [myid:4] - WARN [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1318] - PeerState set to LOOKING
2019-08-12 12:26:06,924 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1193] - LOOKING
2019-08-12 12:26:06,925 [myid:4] - INFO [QuorumPeer[myid=4](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection#885] - New election. My id = 4, proposed zxid=0x100000024
2019-08-12 12:26:07,129 [myid:4] - WARN [WorkerSender[myid=4]:QuorumCnxManager#677] - Cannot open channel to 5 at election address /10.0.0.10:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:648)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:705)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
at java.lang.Thread.run(Thread.java:748)
2019-08-12 12:26:07,337 [myid:4] - INFO [WorkerSender[myid=4]:QuorumCnxManager#430] - Have smaller server identifier, so dropping the connection: (5, 4)
2019-08-12 12:26:07,337 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,338 [myid:4] - INFO [WorkerReceiver[myid=4]:FastLeaderElection#679] - Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2019-08-12 12:26:07,342 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /172.17.0.6:46520
2019-08-12 12:26:07,345 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /172.17.0.6:46522
2019-08-12 12:26:07,345 [myid:4] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1243)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:78)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1080)
2019-08-12 12:44:00,079 [myid:4] - WARN [RecvWorker:2:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 2, my id = 4, error =
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:44:43,382 [myid:4] - INFO [SessionTracker:QuorumZooKeeperServer#157] - Submitting global closeSession request for session 0x4003ead876c0025
2019-08-12 12:44:45,381 [myid:4] - INFO [SessionTracker:ZooKeeperServer#398] - Expiring session 0x1003ead86e90025, timeout of 40000ms exceeded
2019-08-12 12:44:45,381 [myid:4] - INFO [SessionTracker:QuorumZooKeeperServer#157] - Submitting global closeSession request for session 0x1003ead86e90025
2019-08-12 12:45:00,485 [myid:4] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#888] - Received connection request /10.0.0.7:42596
2019-08-12 12:45:00,485 [myid:4] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1176] - Connection broken for id 4, my id = 4, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1161)
2019-08-12 12:45:00,485 [myid:4] - WARN [SendWorker:2:QuorumCnxManager$SendWorker#1092] - Interrupted while waiting for message on queue
I would be extremely grateful for any help as to why the cluster doesn't seem to interconnect (or at least won't serve requests anymore) after the leader election takes place, or for any advice on what to look at for debugging purposes.
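As one possible debugging aid, the FastLeaderElection "Notification" lines in the log above can be tallied to see who is voting for whom. The Python sketch below is only an illustration: the regular expression is written against the line format shown in this log, and the helper names (`split_zxid`, `latest_votes`) are hypothetical. It also splits a zxid into its epoch (high 32 bits) and counter (low 32 bits).

```python
import re
from collections import Counter

# Matches the notification format seen in the log excerpt above.
NOTIFICATION = re.compile(
    r"Notification: \d+ \(message format version\), "
    r"(?P<leader>-?\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-fA-F]+) \(n\.round\), "
    r"(?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\)"
)


def split_zxid(zxid):
    """Split a zxid into (epoch, counter): high 32 bits and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def latest_votes(log_lines):
    """Map each sender id (n.sid) to the leader it most recently proposed."""
    votes = {}
    for line in log_lines:
        match = NOTIFICATION.search(line)
        if match:
            votes[int(match.group("sid"))] = int(match.group("leader"))
    return votes


# The four notifications from the excerpt, in log order (timestamps trimmed).
LINES = [
    "Notification: 2 (message format version), 3 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)",
    "Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)",
    "Notification: 2 (message format version), 1 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)",
    "Notification: 2 (message format version), 4 (n.leader), 0x100000024 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)",
]

QUORUM = 5 // 2 + 1  # majority of a five-node ensemble
votes = latest_votes(LINES)
tally = Counter(votes.values())
leaders_with_quorum = [leader for leader, count in tally.items() if count >= QUORUM]

print("latest votes by sid:", votes)
print("tally by proposed leader:", dict(tally))
print("leaders with quorum:", leaders_with_quorum)
print("zxid 0x100000024 ->", split_zxid(0x100000024))
```

On these four lines, only servers 1, 3 and 4 appear as senders, and server 4 collects two votes, one short of the three needed. Since the log is abridged, that may just reflect missing lines, but running the same tally on the full logs of all remaining servers could show whether a fourth voter (server 2) ever takes part in the round.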

Related

Kafka issue with adding SASL security

I'm using Confluent Community 6.0.1 with a three-node Kafka cluster:
devKafka04: Kafka Broker1, Zookeeper 1
devKafka05: Kafka Broker2, Zookeeper 2
devKafka06: Kafka Broker3, Zookeeper 3
The SSL encryption is already working well on the Kafka Brokers.
I'd like to add SASL to enable mutual authentication between Kafka and Zookeeper.
I was following the Confluent document:
https://docs.confluent.io/platform/current/kafka/incremental-security-upgrade.html#adding-security-to-a-running-zk-cluster
[Update] After I applied the changes, ZooKeeper could not start on the secureClientPort, which is why the Kafka broker couldn't start. Here are the error logs and docker compose configurations.
I'm wondering if there's something wrong with the Confluent ZooKeeper image.
Please help me out. Thanks.
$ sudo docker logs zookeeper
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
===> Running preflight checks ...
===> Check if /var/lib/zookeeper/data is writable ...
===> Check if /var/lib/zookeeper/log is writable ...
===> Launching ...
===> Printing /var/lib/zookeeper/data/myid
1===> Launching zookeeper ...
[2021-03-24 19:03:08,857] INFO Reading configuration from: /etc/kafka/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-03-24 19:03:08,862] INFO clientPortAddress is 0.0.0.0:2181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-03-24 19:03:08,862] INFO secureClientPort is not set (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2021-03-24 19:03:08,876] INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager)
[2021-03-24 19:03:08,876] INFO autopurge.purgeInterval set to 0 (org.apache.zookeeper.server.DatadirCleanupManager)
[2021-03-24 19:03:08,876] INFO Purge task is not scheduled. (org.apache.zookeeper.server.DatadirCleanupManager)
[2021-03-24 19:03:08,880] INFO Log4j 1.2 jmx support found and enabled. (org.apache.zookeeper.jmx.ManagedUtil)
[2021-03-24 19:03:08,904] INFO Starting quorum peer (org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2021-03-24 19:03:08,909] INFO Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory)
[2021-03-24 19:03:08,917] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-03-24 19:03:08,953] INFO Server successfully logged in. (org.apache.zookeeper.Login)
[2021-03-24 19:03:08,957] INFO Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 8 worker threads, and 64 kB direct buffers. (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2021-03-24 19:03:08,961] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2021-03-24 19:03:08,986] INFO Logging initialized #929ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2021-03-24 19:03:09,081] WARN o.e.j.s.ServletContextHandler#6c2c1385{/,null,UNAVAILABLE} contextPath ends with /* (org.eclipse.jetty.server.handler.ContextHandler)
[2021-03-24 19:03:09,082] WARN Empty contextPath (org.eclipse.jetty.server.handler.ContextHandler)
[2021-03-24 19:03:09,097] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2021-03-24 19:03:09,102] INFO Local sessions disabled (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,102] INFO Local session upgrading disabled (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,102] INFO tickTime set to 3000 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,102] INFO minSessionTimeout set to 6000 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,102] INFO maxSessionTimeout set to 60000 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,102] INFO initLimit set to 10 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,115] INFO zookeeper.snapshotSizeFactor = 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2021-03-24 19:03:09,116] INFO Using insecure (non-TLS) quorum communication (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,117] INFO Port unification disabled (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,117] INFO QuorumPeer communication is not secured! (SASL auth disabled) (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,117] INFO quorum.cnxn.threads.size set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,118] INFO Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.a00000000 (org.apache.zookeeper.server.persistence.FileSnap)
[2021-03-24 19:03:09,213] INFO jetty-9.4.24.v20191120; built: 2019-11-20T21:37:49.771Z; git: 363d5f2df3a8a28de40604320230664b9c793c16; jvm 11.0.9.1+1-LTS (org.eclipse.jetty.server.Server)
[2021-03-24 19:03:09,261] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
[2021-03-24 19:03:09,261] INFO No SessionScavenger set, using defaults (org.eclipse.jetty.server.session)
[2021-03-24 19:03:09,263] INFO node0 Scavenging every 660000ms (org.eclipse.jetty.server.session)
[2021-03-24 19:03:09,272] INFO Started o.e.j.s.ServletContextHandler#6c2c1385{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2021-03-24 19:03:09,281] INFO Started ServerConnector#6d07a63d{HTTP/1.1,[http/1.1]}{0.0.0.0:8080} (org.eclipse.jetty.server.AbstractConnector)
[2021-03-24 19:03:09,281] INFO Started #1224ms (org.eclipse.jetty.server.Server)
[2021-03-24 19:03:09,281] INFO Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands (org.apache.zookeeper.server.admin.JettyAdminServer)
[2021-03-24 19:03:09,288] INFO Election port bind maximum retries is 3 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,290] INFO 1 is accepting connections now, my election bind port: devkafka04/172.16.87.141:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,301] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,303] INFO New election. My id = 1, proposed zxid=0x1600000030 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,308] INFO Notification: 2 (message format version), 1 (n.leader), 0x1600000030 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x16 (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,310] INFO Have smaller server identifier, so dropping the connection: (myId:1 --> sid:3) (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,312] INFO Received connection request from /172.16.87.143:53340 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,315] INFO Have smaller server identifier, so dropping the connection: (myId:1 --> sid:2) (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,316] INFO Notification: 2 (message format version), 2 (n.leader), 0x150000002b (n.zxid), 0xa (n.round), FOLLOWING (n.state), 3 (n.sid), 0x16 (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,317] INFO Received connection request from /172.16.87.142:51704 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2021-03-24 19:03:09,319] INFO Notification: 2 (message format version), 2 (n.leader), 0x150000002b (n.zxid), 0xa (n.round), LEADING (n.state), 2 (n.sid), 0x16 (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,320] INFO Notification: 2 (message format version), 2 (n.leader), 0x150000002b (n.zxid), 0xa (n.round), FOLLOWING (n.state), 3 (n.sid), 0x16 (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,320] INFO FOLLOWING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-03-24 19:03:09,323] INFO Notification: 2 (message format version), 2 (n.leader), 0x150000002b (n.zxid), 0xa (n.round), LEADING (n.state), 2 (n.sid), 0x16 (n.peerEPoch), FOLLOWING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-03-24 19:03:09,330] INFO TCP NoDelay set to: true (org.apache.zookeeper.server.quorum.Learner)
[2021-03-24 19:03:09,336] INFO Server environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:host.name=devkafka04 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.version=11.0.9.1 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.home=/usr/lib/jvm/zulu11-ca (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.class.path=/usr/bin/../share/java/kafka/activation-1.1.1.jar:/usr/bin/../share/java/kafka/aopalliance-repackaged-2.6.1.jar:/usr/bin/../share/java/kafka/argparse4j-0.7.0.jar:/usr/bin/../share/java/kafka/audience-annotations-0.5.0.jar:/usr/bin/../share/java/kafka/commons-cli-1.4.jar:/usr/bin/../share/java/kafka/commons-lang3-3.8.1.jar:/usr/bin/../share/java/kafka/confluent-log4j-1.2.17-cp2.jar:/usr/bin/../share/java/kafka/connect-api-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-basic-auth-extension-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-file-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-json-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-mirror-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-mirror-client-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-runtime-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/connect-transforms-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/hk2-api-2.6.1.jar:/usr/bin/../share/java/kafka/hk2-locator-2.6.1.jar:/usr/bin/../share/java/kafka/hk2-utils-2.6.1.jar:/usr/bin/../share/java/kafka/jackson-annotations-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-core-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-databind-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-dataformat-csv-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-datatype-jdk8-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-base-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-jaxrs-json-provider-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-module-jaxb-annotations-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-module-paranamer-2.10.5.jar:/usr/bin/../share/java/kafka/jackson-module-scala_2.13-2.10.5.jar:/usr/bin/../share/java/kafka/jakarta.activation-api-1.2.1.jar:/usr/bin/../share/java/kafka/jakarta.annotation-api-1.3.5.jar:/usr/bin/../share/java/kafka/jakarta.inject-2.6.1.jar:/usr/bin/../share/java/kafka/jakarta.validation-api-2.0.2.jar:/usr/bin/../share/java/kafka/jakarta.ws.rs-api-2.1.6.ja
r:/usr/bin/../share/java/kafka/jakarta.xml.bind-api-2.3.2.jar:/usr/bin/../share/java/kafka/javassist-3.25.0-GA.jar:/usr/bin/../share/java/kafka/javassist-3.26.0-GA.jar:/usr/bin/../share/java/kafka/javax.servlet-api-3.1.0.jar:/usr/bin/../share/java/kafka/javax.ws.rs-api-2.1.1.jar:/usr/bin/../share/java/kafka/jaxb-api-2.3.0.jar:/usr/bin/../share/java/kafka/jersey-client-2.30.jar:/usr/bin/../share/java/kafka/jersey-common-2.30.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-2.30.jar:/usr/bin/../share/java/kafka/jersey-container-servlet-core-2.30.jar:/usr/bin/../share/java/kafka/jersey-hk2-2.30.jar:/usr/bin/../share/java/kafka/jersey-media-jaxb-2.30.jar:/usr/bin/../share/java/kafka/jersey-server-2.30.jar:/usr/bin/../share/java/kafka/jetty-client-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-continuation-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-http-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-io-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-security-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-server-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-servlet-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-servlets-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jetty-util-9.4.24.v20191120.jar:/usr/bin/../share/java/kafka/jopt-simple-5.0.4.jar:/usr/bin/../share/java/kafka/kafka-clients-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-log4j-appender-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-streams-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-streams-examples-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-streams-scala_2.13-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-streams-test-utils-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka-tools-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/kafka.jar:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs-javadoc.jar:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs-scaladoc.jar:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs-sources.ja
r:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs-test-sources.jar:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs-test.jar:/usr/bin/../share/java/kafka/kafka_2.13-6.0.1-ccs.jar:/usr/bin/../share/java/kafka/lz4-java-1.7.1.jar:/usr/bin/../share/java/kafka/maven-artifact-3.6.3.jar:/usr/bin/../share/java/kafka/metrics-core-2.2.0.jar:/usr/bin/../share/java/kafka/netty-buffer-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-codec-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-common-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-handler-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-resolver-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-transport-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-transport-native-epoll-4.1.50.Final.jar:/usr/bin/../share/java/kafka/netty-transport-native-unix-common-4.1.50.Final.jar:/usr/bin/../share/java/kafka/osgi-resource-locator-1.0.3.jar:/usr/bin/../share/java/kafka/paranamer-2.8.jar:/usr/bin/../share/java/kafka/plexus-utils-3.2.1.jar:/usr/bin/../share/java/kafka/reflections-0.9.12.jar:/usr/bin/../share/java/kafka/rocksdbjni-5.18.4.jar:/usr/bin/../share/java/kafka/scala-collection-compat_2.13-2.1.6.jar:/usr/bin/../share/java/kafka/scala-java8-compat_2.13-0.9.1.jar:/usr/bin/../share/java/kafka/scala-library-2.13.2.jar:/usr/bin/../share/java/kafka/slf4j-api-1.7.30.jar:/usr/bin/../share/java/kafka/scala-logging_2.13-3.9.2.jar:/usr/bin/../share/java/kafka/scala-reflect-2.13.2.jar:/usr/bin/../share/java/kafka/slf4j-log4j12-1.7.30.jar:/usr/bin/../share/java/kafka/snappy-java-1.1.7.3.jar:/usr/bin/../share/java/kafka/zookeeper-3.5.8.jar:/usr/bin/../share/java/kafka/zookeeper-jute-3.5.8.jar:/usr/bin/../share/java/kafka/zstd-jni-1.4.4-7.jar:/usr/bin/../share/java/confluent-telemetry/* (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.io.tmpdir=/tmp (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,336] INFO Server environment:java.compiler=<NA> (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.name=Linux (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.arch=amd64 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.version=3.10.0-1160.21.1.el7.x86_64 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:user.name=appuser (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:user.home=/home/appuser (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:user.dir=/home/appuser (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.memory.free=498MB (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.memory.max=512MB (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,337] INFO Server environment:os.memory.total=512MB (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,338] INFO minSessionTimeout set to 6000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,339] INFO maxSessionTimeout set to 60000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,339] INFO Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/log/version-2 snapdir /var/lib/zookeeper/data/version-2 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-03-24 19:03:09,339] INFO FOLLOWING - LEADER ELECTION TOOK - 18 MS (org.apache.zookeeper.server.quorum.Learner)
[2021-03-24 19:03:09,345] INFO Getting a diff from the leader 0x1600000030 (org.apache.zookeeper.server.quorum.Learner)
[2021-03-24 19:03:09,350] INFO Learner received NEWLEADER message (org.apache.zookeeper.server.quorum.Learner)
[2021-03-24 19:03:09,363] INFO Learner received UPTODATE message (org.apache.zookeeper.server.quorum.Learner)
[2021-03-24 19:03:09,367] INFO Configuring CommitProcessor with 4 worker threads. (org.apache.zookeeper.server.quorum.CommitProcessor)
$ sudo docker logs kafka
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
SSL is enabled.
SASL is enabled.
===> Running preflight checks ...
===> Check if /var/lib/kafka/data is writable ...
===> Skipping Zookeeper health check for SSL connections...
===> Launching ...
===> Launching kafka ...
[2021-03-23 21:43:43,453] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-03-23 21:43:43,838] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-03-23 21:43:43,900] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2021-03-23 21:43:43,904] INFO starting (kafka.server.KafkaServer)
[2021-03-23 21:43:43,905] INFO Connecting to zookeeper on devkafka04:2182,devkafka05:2182,devkafka06:2182 (kafka.server.KafkaServer)
[2021-03-23 21:43:43,927] INFO [ZooKeeperClient Kafka server] Initializing a new session to devkafka04:2182,devkafka05:2182,devkafka06:2182. (kafka.zookeeper.ZooKeeperClient)
[2021-03-23 21:43:43,934] INFO Client environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT (org.apache.zookeeper.ZooKeeper)
[2021-03-23 21:43:43,934] INFO Client environment:host.name=devkafka04 (org.apache.zookeeper.ZooKeeper)
[2021-03-23 21:43:43,934] INFO Client environment:java.version=11.0.9.1 (org.apache.zookeeper.ZooKeeper)
[2021-03-23 21:43:43,934] INFO Client environment:java.vendor=Azul Systems, Inc. (org.apache.zookeeper.ZooKeeper)
------ Repeating lines removed ---------
'Client' (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:43:59,947] INFO Socket error occurred: devkafka05/172.16.87.142:2182: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,048] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-03-23 21:44:01,048] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-03-23 21:44:01,048] INFO Opening socket connection to server devkafka04/172.16.87.141:2182. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,049] INFO Socket error occurred: devkafka04/172.16.87.141:2182: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,150] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-03-23 21:44:01,150] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-03-23 21:44:01,150] INFO Opening socket connection to server devkafka06/172.16.87.143:2182. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,153] INFO Socket error occurred: devkafka06/172.16.87.143:2182: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,254] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-03-23 21:44:01,254] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-03-23 21:44:01,254] INFO Opening socket connection to server devkafka05/172.16.87.142:2182. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,255] INFO Socket error occurred: devkafka05/172.16.87.142:2182: Connection refused (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:01,952] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-03-23 21:44:02,356] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-03-23 21:44:02,357] INFO Client will use DIGEST-MD5 as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-03-23 21:44:02,357] INFO Opening socket connection to server devkafka04/172.16.87.141:2182. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:02,462] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2021-03-23 21:44:02,463] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-03-23 21:44:02,465] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-03-23 21:44:02,469] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:262)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:119)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1865)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:419)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:444)
at kafka.server.KafkaServer.startup(KafkaServer.scala:222)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
at kafka.Kafka$.main(Kafka.scala:82)
at kafka.Kafka.main(Kafka.scala)
[2021-03-23 21:44:02,471] INFO shutting down (kafka.server.KafkaServer)
[2021-03-23 21:44:02,478] INFO shut down completed (kafka.server.KafkaServer)
[2021-03-23 21:44:02,478] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-03-23 21:44:02,479] INFO shutting down (kafka.server.KafkaServer)
$ sudo cat kafka-docker-compose.yml
version: '3'
services:
  kafka:
    image: confluentinc/cp-kafka:6.0.1
    container_name: kafka
    network_mode: host
    restart: always
    ports:
      - "9092:9092"
      - "9093:9093"
      - "9094:9094"
      - "49998:49998"
      - "49999:49999"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'devkafka04:2182,devkafka05:2182,devkafka06:2182'
      KAFKA_ZOOKEEPER_SSL_CLIENT_ENABLE: 'true'
      KAFKA_ZOOKEEPER_CLIENTCNXNSOCKET: org.apache.zookeeper.ClientCnxnSocketNetty
      KAFKA_ZOOKEEPER_SSL_TRUSTSTORE_FILENAME: kafka.server.truststore.jks
      KAFKA_ZOOKEEPER_SSL_TRUSTSTORE_CREDENTIALS: creds
      KAFKA_ZOOKEEPER_SET_ACL: 'true'
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://devkafka04:9092,SSL://devkafka04:9093,SASL_SSL://devkafka04:9094
      KAFKA_LISTENERS: PLAINTEXT://devkafka04:9092,SSL://devkafka04:9093,SASL_SSL://devkafka04:9094
      KAFKA_SASL_ENABLED_MECHANISMS: DIGEST-MD5
      KAFKA_SECURITY_INTER_BROKER_PROTOCOL: SSL
      KAFKA_SSL_CLIENT_AUTH: requested
      KAFKA_SSL_TRUSTSTORE_FILENAME: kafka.server.truststore.jks
      KAFKA_SSL_TRUSTSTORE_CREDENTIALS: creds
      KAFKA_SSL_KEYSTORE_FILENAME: devkafka04.server.keystore.jks
      KAFKA_SSL_KEYSTORE_CREDENTIALS: creds
      KAFKA_SSL_KEY_CREDENTIALS: creds
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "false"
      KAFKA_OPTS: -Djava.security.auth.login.config=/etc/kafka/jmx/kafka_server_jaas.conf -Djava.rmi.server.hostname=devkafka04 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=49998 -Dcom.sun.management.jmxremote.port=49998 -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/etc/kafka/jmx/jmx_prometheus_javaagent-0.14.0.jar=49999:/etc/kafka/jmx/kafka-2_0_0.yml
      CONFLUENT_SUPPORT_METRICS_ENABLE: "false"
    volumes:
      - /media/kafka/data:/var/lib/kafka/data
      - /media/kafka/secrets:/etc/kafka/secrets
      - /usr/local/src/kafka/jmx:/etc/kafka/jmx
$ sudo cat jmx/kafka_server_jaas.conf
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="kafkabroker"
password="kafkabroker-secret"
user_kafkabroker="kafkabroker-secret"
user_kafka-broker-metric-reporter="kafkabroker-metric-reporter-secret"
user_client="client-secret";
};
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="kafka"
password="kafka-secret";
};
$ sudo cat zookeeper-docker-compose.yml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.1
    container_name: zookeeper
    network_mode: host
    restart: always
    ports:
      - "2181:2181"
      - "2182:2182"
      - "2888:2888"
      - "3888:3888"
      - "39998:39998"
      - "39999:39999"
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_SERVERS: devkafka04:2888:3888;devkafka05:2888:3888;devkafka06:2888:3888
      ZOOKEEPER_AUTHPROVIDER_SASL: org.apache.zookeeper.server.auth.SASLAuthenticationProvider
      ZOOKEEPER_AUTHPROVIDER_x509: org.apache.zookeeper.server.auth.X509AuthenticationProvider
      ZOOKEEPER_SECURECLIENTPORT: 2182
      ZOOKEEPER_SERVERCNXNFACTORY: org.apache.zookeeper.server.NettyServerCnxnFactory
      ZOOKEEPER_SSL_TRUSTSTORE_FILENAME: kafka.server.truststore.jks
      ZOOKEEPER_SSL_TRUSTSTORE_CREDENTIALS: creds
      ZOOKEEPER_SSL_KEYSTORE_FILENAME: devkafka05.server.keystore.jks
      ZOOKEEPER_SSL_KEYSTORE_CREDENTIALS: creds
      ZOOKEEPER_SSL_KEY_CREDENTIALS: creds
      ZOOKEEPER_SSL_CLIENTAUTH: none
      KAFKA_OPTS: -Djava.security.auth.login.config=/etc/zookeeper/jmx/zookeeper_jaas.conf -Dzookeeper.4lw.commands.whitelist=* -Djava.rmi.server.hostname=devkafka04 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.rmi.port=39998 -Dcom.sun.management.jmxremote.port=39998 -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/etc/zookeeper/jmx/jmx_prometheus_javaagent-0.14.0.jar=39999:/etc/zookeeper/jmx/jmx-zookeeper-prometheus.yaml
    volumes:
      - /media/zookeeper/data:/var/lib/zookeeper/data
      - /media/zookeeper/log:/var/lib/zookeeper/log
      - /media/zookeeper/secrets:/etc/zookeeper/secrets
      - /usr/local/src/zookeeper/jmx:/etc/zookeeper/jmx
$ sudo cat jmx/zookeeper_jaas.conf
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_kafka="kafka-secret";
};
Try using KAFKA_ZOOKEEPER_CLIENT_CNXN_SOCKET instead of KAFKA_ZOOKEEPER_CLIENTCNXNSOCKET.
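Separately from that setting, the broker log above shows "Connection refused" for every ZooKeeper endpoint on port 2182, so it may be worth confirming that the port accepts TCP connections at all. Below is a minimal Python sketch, not part of the original setup. The helper names parse_connect_string and can_connect are hypothetical, and the probe only checks plain TCP reachability; it says nothing about TLS or SASL.

```python
import socket


def parse_connect_string(connect):
    """Split 'host:port,host:port[/chroot]' into a list of (host, port) tuples.

    Assumes every entry carries an explicit port, as in the compose file above.
    """
    hosts_part = connect.split("/", 1)[0]  # drop an optional chroot suffix
    endpoints = []
    for item in hosts_part.split(","):
        host, _, port = item.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on the broker host, a loop such as `for h, p in parse_connect_string("devkafka04:2182,devkafka05:2182,devkafka06:2182"): print(h, p, can_connect(h, p))` would show which endpoints are listening. If all three report False, the secure client port is probably not open on the ZooKeeper side.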

Setting up a Kafka cluster with 3 ZooKeeper nodes and 3 broker nodes

When setting up a Kafka cluster with 3 ZooKeeper nodes and 3 broker nodes, only one of the brokers starts and the remaining two do not. The whole cluster runs on a single host. When I run the start command for the other two brokers, they start for a second and then shut down.
The error log message is:
[2021-02-02 20:19:12,874] WARN Unable to read additional data from client sessionid 0x3004e189c730000, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
[2021-02-02 20:19:12,877] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1253)
[2021-02-02 20:19:12,966] INFO shutdown called (org.apache.zookeeper.server.quorum.Learner)
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1257)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:12,967] INFO shutting down (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-02 20:19:12,967] INFO Shutting down (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:12,967] INFO FollowerRequestProcessor exited loop! (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
[2021-02-02 20:19:12,967] INFO CommitProcessor exited loop! (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:12,972] INFO shutdown of request processor complete (org.apache.zookeeper.server.FinalRequestProcessor)
[2021-02-02 20:19:12,983] INFO Shutting down (org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-02 20:19:12,983] INFO SyncRequestProcessor exited! (org.apache.zookeeper.server.SyncRequestProcessor)
[2021-02-02 20:19:13,040] WARN PeerState set to LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,040] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,041] INFO New election. My id = 3, proposed zxid=0xa00000037 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,041] INFO Notification: 2 (message format version), 1 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,042] INFO Notification: 2 (message format version), 3 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,043] INFO Notification: 2 (message format version), 2 (n.leader), 0x900000011 (n.zxid), 0x1 (n.round), LEADING (n.state), 2 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,044] INFO Notification: 2 (message format version), 3 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,245] INFO LEADING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2021-02-02 20:19:13,248] INFO TCP NoDelay set to: true (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,248] INFO zookeeper.leader.maxConcurrentSnapshots = 10 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,248] INFO zookeeper.leader.maxConcurrentSnapshotTimeout = 5 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,249] INFO minSessionTimeout set to 4000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,249] INFO maxSessionTimeout set to 40000 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,249] INFO Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /u01/data05/zookeeper/version-2 snapdir /u01/data05/zookeeper/version-2 (org.apache.zookeeper.server.ZooKeeperServer)
[2021-02-02 20:19:13,250] INFO LEADING - LEADER ELECTION TOOK - 5 MS (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,251] INFO Snapshotting: 0xa00000037 to /u01/data05/zookeeper/version-2/snapshot.a00000037 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
[2021-02-02 20:19:13,861] INFO Notification: 2 (message format version), 2 (n.leader), 0xa00000037 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0xa (n.peerEPoch), LEADING (my state)0 (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-02 20:19:13,877] INFO Follower sid: 2 : info : server:12888:13888:participant (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,902] INFO On disk txn sync enabled with snapshotSizeFactor 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2021-02-02 20:19:13,902] INFO Synchronizing with Follower sid: 2 maxCommittedLog=0xa00000037 minCommittedLog=0x600000041 lastProcessedZxid=0xa00000037 peerLastZxid=0xa00000037 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,902] INFO Sending DIFF zxid=0xa00000037 for peer sid: 2 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:13,924] INFO Have quorum of supporters, sids: [ [2, 3],[2, 3] ]; starting up and setting last processed zxid: 0xb00000000 (org.apache.zookeeper.server.quorum.Leader)
[2021-02-02 20:19:13,928] INFO Configuring CommitProcessor with 2 worker threads. (org.apache.zookeeper.server.quorum.CommitProcessor)
[2021-02-02 20:19:13,932] INFO Using checkIntervalMs=60000 maxPerMinute=10000 (org.apache.zookeeper.server.ContainerManager)
[2021-02-02 20:19:14,248] INFO Follower sid: 1 : info : server:2888:3888:participant (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO On disk txn sync enabled with snapshotSizeFactor 0.33 (org.apache.zookeeper.server.ZKDatabase)
[2021-02-02 20:19:14,259] INFO Synchronizing with Follower sid: 1 maxCommittedLog=0xa00000037 minCommittedLog=0x600000041 lastProcessedZxid=0xb00000000 peerLastZxid=0xa00000037 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO Using committedLog for peer sid: 1 (org.apache.zookeeper.server.quorum.LearnerHandler)
[2021-02-02 20:19:14,259] INFO Sending DIFF zxid=0xa00000037 for peer sid: 1 (org.apache.zookeeper.server.quorum.LearnerHandler)
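To compare votes across servers, it can help to pull the labelled fields out of the "Notification:" lines in the log above. The following is a hedged Python sketch. The helper names parse_notification and split_zxid are hypothetical, and the regular expression assumes the field layout seen in this log. It relies on the ZooKeeper convention that the upper 32 bits of a zxid hold the epoch and the lower 32 bits hold a counter.

```python
import re

# Matches pairs such as "0xa00000037 (n.zxid)" or "LOOKING (my state)".
FIELD_RE = re.compile(r"(\w+) \(([^)]+)\)")


def parse_notification(line):
    """Return a dict mapping each label to its value, or None for other lines."""
    _, _, payload = line.partition("Notification: ")
    if not payload:
        return None
    return {label: value for value, label in FIELD_RE.findall(payload)}


def split_zxid(zxid_hex):
    """Split a hexadecimal zxid string into (epoch, counter)."""
    zxid = int(zxid_hex, 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


SAMPLE = (
    "[2021-02-02 20:19:13,043] INFO Notification: 2 (message format version), "
    "2 (n.leader), 0x900000011 (n.zxid), 0x1 (n.round), LEADING (n.state), "
    "2 (n.sid), 0xa (n.peerEPoch), LOOKING (my state)0 (n.config version) "
    "(org.apache.zookeeper.server.quorum.FastLeaderElection)"
)

if __name__ == "__main__":
    fields = parse_notification(SAMPLE)
    print(fields["n.sid"], fields["n.state"], split_zxid(fields["n.zxid"]))
```

In the sample line, server 2 still reports itself as LEADING with a zxid from epoch 9, while the other notifications carry zxids from epoch 0xa. This kind of mismatch is easier to spot once the fields are extracted.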

Kafka Zookeeper Random Restarts

We are running a Hyperledger Fabric network with Kafka and ZooKeeper in production, using Docker Swarm on Azure VMs (4 Kafka nodes, 3 ZooKeeper nodes). It was running fine, but two days ago ZooKeeper suddenly restarted, and since then it has kept restarting at intervals of 6-8 hours.
Logs on the Kafka node:
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Shutdown completed (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,499] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions xxxx-xxxxx-xxx-xxxxx.
ZooKeeper leader logs:
2020-07-04 07:46:27,070 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x10101beb22c0000 type:create cxid:0x4 zxid:0x2e00000114 txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode = NodeExists for /brokers/ids
2020-07-04 07:48:43,084 [myid:3] - INFO [SessionTracker:ZooKeeperServer#355] - Expiring session 0x2010551ef290000, timeout of 6000ms exceeded
2020-07-04 07:48:43,085 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#487] - Processed session termination for sessionid: 0x2010551ef290000
2020-07-04 07:48:43,091 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn#1056] - Closed socket connection for client /100.0.20.80:60672 which had sessionid 0x2010551ef290000
2020-07-04 07:48:55,182 [myid:3] - ERROR [LearnerHandler-/100.0.20.80:58940:LearnerHandler#648] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2020-07-04 07:48:55,183 [myid:3] - WARN [LearnerHandler-/100.0.20.80:58940:LearnerHandler#661] - ******* GOODBYE /100.0.20.80:58940 ********
2020-07-04 07:49:57,623 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /100.0.20.80:37838
2020-07-04 07:49:57,637 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /100.0.20.80:37838
2020-07-04 07:49:57,641 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300ed4720900000 with negotiated timeout 12000 for client /100.0.20.80:37838
2020-07-04 07:49:57,670 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x300ed4720900000 type:setData cxid:0x1 zxid:0x2e000003b2 txntype:-1 reqpath:n/a Error Path:/brokers/topics/xxxxxxxxxxxx/partitions/0/state Error:KeeperErrorCode = BadVersion for /brokers/topics/xxxxxxxxxxxx/partitions/0/state
My zoo.cfg:
clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=6000
initLimit=10
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=xxx.xxx.com:2888:3888
server.2=xxx.xxx.com:2888:3888
server.3=0.0.0.0:2888:3888
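The timeouts that matter in the logs above are all multiples of tickTime. The sketch below, in Python, derives them from the zoo.cfg values. It is an illustration only; parse_zoo_cfg and derived_timeouts_ms are hypothetical helper names. It assumes the documented ZooKeeper defaults: the minimum session timeout is 2 ticks and the maximum is 20 ticks unless minSessionTimeout or maxSessionTimeout is set explicitly, and initLimit and syncLimit are counted in ticks.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg-style text into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def derived_timeouts_ms(settings):
    """Compute the tick-based timeouts (in milliseconds) implied by the settings."""
    tick = int(settings["tickTime"])
    return {
        # Time a follower gets to connect to the leader and complete its initial sync.
        "init_timeout_ms": tick * int(settings["initLimit"]),
        # Time a follower may go without hearing from the leader after syncing.
        "sync_timeout_ms": tick * int(settings["syncLimit"]),
        "min_session_timeout_ms": int(settings.get("minSessionTimeout", 2 * tick)),
        "max_session_timeout_ms": int(settings.get("maxSessionTimeout", 20 * tick)),
    }


# Values copied from the zoo.cfg above (server entries omitted).
ZOO_CFG = """\
clientPort=2181
tickTime=6000
initLimit=10
syncLimit=2
"""

if __name__ == "__main__":
    for name, value in derived_timeouts_ms(parse_zoo_cfg(ZOO_CFG)).items():
        print(f"{name}: {value}")
```

With tickTime=6000 and syncLimit=2, the quorum sync timeout comes out at 12000 ms, and the minimum session timeout is also 12000 ms. The latter matches the "negotiated timeout 12000" line in the leader log.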

To achieve Kafka-Mule integration using Mule 4, I installed ZooKeeper and Kafka on my machine. Both servers' connections drop after some time

I started by bringing up a ZooKeeper server and then a Kafka server on my machine. Both connections are established, but they drop after a few minutes with the errors below.
ZOOKEEPER ERROR LOG:
2020-05-11 14:32:36,908 [myid:] - INFO [SyncThread:0:FileTxnLog#284] - Creating new log file: log.1
2020-05-11 14:36:05,170 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn#373] - Close of session 0x10000cc587b0003
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:324)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-05-11 14:36:10,589 [myid:] - INFO [SessionTracker:ZooKeeperServer#600] - Expiring session 0x10000cc587b0003, timeout of 6000ms exceeded
KAFKA ERROR LOG - Below are the Kafka server error logs:
[2020-05-11 14:33:41,077] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
[2020-05-11 14:36:04,762] ERROR Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint (kafka.server.LogDirFailureChannel)
java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
.....
[2020-05-11 14:36:04,776] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
[2020-05-11 14:36:04,776] ERROR [ReplicaManager broker=0] Error while writing to highwatermark file in directory C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
Caused by: java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)

Zookeeper not recovering after timeout

ZooKeeper does not recover properly after a timeout: it ends up in a non-working state without restarting.
What can cause this, and how can we resolve it?
Config:
ZOO_TICK_TIME: 2000
ZOO_INIT_LIMIT: 30000
ZOO_SYNC_LIMIT: 10
ZOO_MAX_CLIENT_CNXNS: 2000
ZOO_STANDALONE_ENABLED: 'false'
ZOO_AUTOPURGE_PURGEINTERVAL: 1
ZOO_AUTOPURGE_SNAPRETAINCOUNT: 10
ZOO_LOG4J_PROP: INFO,ROLLINGFILE
Log:
2020-01-10 20:58:57,473 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 20:59:57,484 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 21:00:57,495 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 2 (n.leader), 0x2200061493 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x22 (n.peerEPoch), FOLLOWING (my state)0 (n.config version)
2020-01-10 21:01:07,510 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Follower#96] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
2020-01-10 21:01:07,516 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Follower#201] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
2020-01-10 21:01:07,516 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):LearnerZooKeeperServer#165] - Shutting down
2020-01-10 21:01:07,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#558] - shutting down
2020-01-10 21:01:07,517 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FollowerRequestProcessor#139] - Shutting down
2020-01-10 21:01:07,518 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):CommitProcessor#362] - Shutting down
2020-01-10 21:01:07,518 [myid:1] - INFO [FollowerRequestProcessor:1:FollowerRequestProcessor#110] - FollowerRequestProcessor exited loop!
2020-01-10 21:01:07,518 [myid:1] - INFO [CommitProcessor:1:CommitProcessor#195] - CommitProcessor exited loop!
2020-01-10 21:01:07,518 [myid:1] - INFO [CommitProcessor:1:CommitProcessor#195] - CommitProcessor exited loop!
2020-01-10 21:01:07,526 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FinalRequestProcessor#514] - shutdown of request processor complete
2020-01-10 21:01:07,659 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):SyncRequestProcessor#191] - Shutting down
2020-01-10 21:01:07,660 [myid:1] - INFO [SyncThread:1:SyncRequestProcessor#169] - SyncRequestProcessor exited!
2020-01-10 21:01:07,661 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1318] - PeerState set to LOOKING
2020-01-10 21:01:07,661 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1193] - LOOKING
2020-01-10 21:01:07,664 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection#885] - New election. My id = 1, proposed zxid=0x23007a7d69
2020-01-10 21:01:07,666 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x23007a7d69 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,668 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x2200061493 (n.zxid), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,668 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 1 (n.leader), 0x23007a7d69 (n.zxid), 0x2 (n.round), LOOKING (n.state), 2 (n.sid), 0x23 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-01-10 21:01:07,870 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer#1281] - LEADING
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#66] - TCP NoDelay set to: true
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#86] - zookeeper.leader.maxConcurrentSnapshots = 10
2020-01-10 21:01:07,885 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#88] - zookeeper.leader.maxConcurrentSnapshotTimeout = 5
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#938] - minSessionTimeout set to 4000
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#947] - maxSessionTimeout set to 40000
2020-01-10 21:01:07,890 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperServer#166] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /datalog/version-2 snapdir /data/version-2
2020-01-10 21:01:07,900 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#464] - LEADING - LEADER ELECTION TOOK - 29 MS
2020-01-10 21:01:07,918 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FileTxnSnapLog#384] - Snapshotting: 0x23007a7d69 to /data/version-2/snapshot.23007a7d69
2020-01-10 21:01:07,964 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,040 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,318 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,491 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,491 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,501 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,501 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,509 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,510 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,519 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection#679] - Notification: 2 (message format version), 3 (n.leader), 0x23007a7d72 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x23 (n.peerEPoch), LEADING (my state)0 (n.config version)
2020-01-10 21:01:08,524 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,524 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,535 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,535 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,545 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,545 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,572 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,574 [myid:1] - WARN [NIOWorkerThread-4:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,582 [myid:1] - WARN [NIOWorkerThread-2:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,583 [myid:1] - WARN [NIOWorkerThread-3:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,594 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,594 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
2020-01-10 21:01:08,595 [myid:1] - WARN [NIOWorkerThread-1:NIOServerCnxn#370] - Exception causing close of session 0x0: ZooKeeperServer not running
...
For the typical symptom categories listed above, use the checklists below if you are seeing frequent disconnects of healthy clients.
Network connectivity problems:
- Use ifconfig to check the number of errors on the NICs. A high error count can account for increased latency.
I/O starvation:
- ZooKeeper is best installed on dedicated nodes when high performance is needed.
- Check your HDD performance, e.g. using hdparm -tT.
- Make sure that the directory for the ZK transaction log (/opt/mapr/zkdata by default) is on a fast dedicated drive. The property is dataLogDir in $ZK_DIR/conf/zoo.cfg.
- Make sure that the ZooKeeper heap size is not larger than the available RAM, to avoid swapping.
GC starvation:
- The symptom is frequent client disconnects and session expirations due to starvation of the heartbeat thread.
- Use jstat -gc to see if there are frequent full garbage collections.
- If there are, try an alternative garbage collector, for example ConcurrentMarkSweep, by adding -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC to the JVM options.
Client-side timeouts:
- Check for swapping on client machines with free -m.
- Possibly increase maxSessionTimeout if it is set too low.
If no obvious problems are detected, run https://github.com/phunt/zk-smoketest to check whether the problem is with ZooKeeper at all.
Virtual environments:
- If the ZK cluster is on shared hosting, this can cause resource starvation and introduce latency.
Reference link : https://mapr.com/support/s/article/How-do-I-troubleshoot-ZK-connection-timeout-issues?language=en_US
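A note on the maxSessionTimeout item in the checklist: the server bounds the session timeout a client asks for by minSessionTimeout and maxSessionTimeout, so raising only the client-side value has no effect beyond the server's maximum. Below is a minimal Java sketch of that bounding, using the values from the configuration in the question. The class and method names are made up for illustration and are not part of the ZooKeeper API.

```java
/** Illustrative only: mimics how a requested session timeout is bounded by the server limits. */
final class SessionTimeoutSketch {

    /** Returns the timeout the server would grant for a requested value, all in milliseconds. */
    static int negotiate(int requestedMs, int minSessionTimeoutMs, int maxSessionTimeoutMs) {
        if (requestedMs < minSessionTimeoutMs) {
            return minSessionTimeoutMs;
        }
        if (requestedMs > maxSessionTimeoutMs) {
            return maxSessionTimeoutMs;
        }
        return requestedMs;
    }

    public static void main(String[] args) {
        int min = 4000;   // minSessionTimeout from the question's zoo.cfg
        int max = 40000;  // maxSessionTimeout from the question's zoo.cfg
        for (int requested : new int[] {1000, 30000, 120000}) {
            // Granted values: 4000, 30000, 40000
            System.out.println(requested + " ms requested -> "
                    + negotiate(requested, min, max) + " ms granted");
        }
    }
}
```

With these limits, a client that asks for 120000 ms still gets 40000 ms, so a longer timeout has to be configured on the servers as well.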
This
2020-01-10 21:01:07,900 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):Leader#464] - LEADING - LEADER ELECTION TOOK - 29 MS
2020-01-10 21:01:07,918 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FileTxnSnapLog#384] - Snapshotting: 0x23007a7d69 to /data/version-2/snapshot.23007a7d69
is telling me that the ZooKeeper node became leader and started writing a snapshot (after that, it is probably going to send the snapshot to the followers). Only then will the node start serving queries. But if the snapshot is big (or the hardware underlying /data/version-2/ is slow), writing it will take a significant amount of time. If it takes longer than maxSessionTimeout (40000 ms here), all active sessions will expire and all ephemeral data will be lost.
If that is the case, then the only good way to avoid it is to reduce the amount of data stored in ZooKeeper. For example, store heavy blobs somewhere else (file/nfs/hdfs/aws/sql/http/...), and write only timestamps (and maybe the path in that storage) to ZooKeeper, so that the application can still receive data-change notifications quickly from ZooKeeper.
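The "timestamp plus path" idea could look roughly like the following Java sketch. BlobPointer, its field names, its text encoding and the example storage URL are all hypothetical, chosen only for illustration. The ZooKeeper call itself is left out so that the example stays self-contained; the byte array is what a client would pass to setData.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical small record stored in a znode instead of the blob itself. */
final class BlobPointer {
    final String location;       // where the heavy blob actually lives (assumed external storage)
    final long updatedAtMillis;  // changes on every update, so watchers get notified

    BlobPointer(String location, long updatedAtMillis) {
        this.location = location;
        this.updatedAtMillis = updatedAtMillis;
    }

    /** Encodes the pointer as "timestamp\nlocation" in UTF-8: a few dozen bytes. */
    byte[] toZnodeBytes() {
        return (updatedAtMillis + "\n" + location).getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes produced by toZnodeBytes(). */
    static BlobPointer fromZnodeBytes(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        int separator = text.indexOf('\n');
        if (separator < 0) {
            throw new IllegalArgumentException("not a BlobPointer payload");
        }
        long timestamp = Long.parseLong(text.substring(0, separator));
        return new BlobPointer(text.substring(separator + 1), timestamp);
    }

    public static void main(String[] args) {
        // The URL below is a made-up example location.
        BlobPointer pointer = new BlobPointer("hdfs://example/blobs/report-42.bin",
                System.currentTimeMillis());
        byte[] payload = pointer.toZnodeBytes();
        BlobPointer decoded = BlobPointer.fromZnodeBytes(payload);
        System.out.println(payload.length + " bytes in the znode, blob at " + decoded.location);
    }
}
```

A reader that is notified of a change on the znode decodes the pointer and then fetches the blob from the external storage, so the ZooKeeper snapshot only ever contains the small pointer records.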