zookeeper + expiring session + timeout of 8000ms exceeded - apache-zookeeper

We are using the Kafka high-level consumer and producer. We can consume and produce messages successfully, but the ZooKeeper connections keep expiring and being re-established.
Why do we get the following errors?
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x17f4d848a4d2613, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x37f4d848c941dea, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x27f4d8506b32638, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x37f4d848c941ded, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x17f4d848a4d260f, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x27f4d8506b3263c, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x17f4d848a4d2616, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [SessionTracker:ZooKeeperServer#347] - Expiring session 0x27f4d8506b32633, timeout of 8000ms exceeded
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x17f4d848a4d2613
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x37f4d848c941dea
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x27f4d8506b32638
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x37f4d848c941ded
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x17f4d848a4d260f
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x27f4d8506b3263c
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x17f4d848a4d2616
2022-03-04 03:04:16,000 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#492] - Processed session termination for sessionid: 0x27f4d8506b32633
2022-03-04 03:04:16,016 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#643] - Got user-level KeeperException when processing sessionid:0x27f4
I am not sure whether this post also describes our problem: https://www.ibm.com/docs/en/streams/4.3.0?topic=solutions-clients-disconnect-from-zookeeper-ensemble
It says that ZooKeeper session timeouts are often caused by "soft failures," most commonly a garbage collection pause, and it recommends turning on GC logging to see whether a long GC occurs at the time the connection times out, and reading about JVM tuning in Kafka. I am not sure whether that applies to our case.
Note: we set zookeeper.connection.timeout.ms to 600000.
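A note on the numbers: the 8000ms in the log is a session timeout, which is a different setting from zookeeper.connection.timeout.ms (the latter only bounds how long a client waits to establish a connection). The server also bounds whatever session timeout a client requests: by default the minimum is 2 x tickTime and the maximum is 20 x tickTime. The sketch below models that clamping. The function name is made up for illustration, and tickTime=2000 is an assumption because the question does not show zoo.cfg.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=-1,
                               max_session_timeout_ms=-1):
    """Clamp a requested session timeout to the server-side bounds.

    Hypothetical helper: -1 means "use the default", which is
    2 ticks for the minimum and 20 ticks for the maximum.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms != -1 else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms != -1 else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # tickTime=2000 is assumed here, not taken from the question.
    for requested in (1000, 8000, 600000):
        print(requested, "->", negotiated_session_timeout(requested))
```

Under these assumptions a requested 8000ms is accepted as-is, so the value to look at on the client side would be the session timeout (zookeeper.session.timeout.ms), not the connection timeout.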

Related

How to solve "Invalid value javax.net.ssl.SSLHandshakeException" for confluent-platform in Docker?

Below is my docker-compose file for adding broker security to Confluent Platform.
I have tried many approaches, but it is not working.
I created the keys locally and mount them into the container, which works fine, but I don't understand the exception.
---
version: "2.3"
services:
  zookeeper1:
    image: confluentinc/cp-zookeeper:5.5.1
    restart: always
    hostname: zookeeper1
    container_name: zookeeper1
    ports:
      - 2181:2181
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka1:
    image: confluentinc/cp-server:5.5.1
    hostname: kafka1
    container_name: kafka1
    depends_on:
      - zookeeper1
    volumes:
      - //d/tmp/keys/:/etc/kafka/secrets
    ports:
      - 8091:8091
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper1:2181
      KAFKA_BROKER_ID: 0
      KAFKA_BROKER_RACK: "r1"
      KAFKA_DELETE_TOPIC_ENABLE: "true"
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_DEFAULT_REPLICATION_FACTOR: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: SSL:SSL
      KAFKA_SECURITY_INTER_BROKER_PROTOCOL: SSL
      KAFKA_LISTENERS: SSL://kafka1:8091
      KAFKA_ADVERTISED_LISTENERS: SSL://kafka1:8091
      KAFKA_SSL_PROTOCOL: "TLSV1.2"
      KAFKA_SSL_KEYSTORE_FILENAME: kafka.broker.keystore.jks
      KAFKA_SSL_KEYSTORE_CREDENTIALS: password.txt
      KAFKA_SSL_KEY_CREDENTIALS: password.txt
      KAFKA_SSL_TRUSTSTORE_FILENAME: kafka.broker.truststore.jks
      KAFKA_SSL_TRUSTSTORE_CREDENTIALS: password.txt
      KAFKA_SSL_CLIENT_AUTH: "none"
      KAFKA_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: ""
But the log looks like this; the truststore is not detected:
ssl.key.password = [hidden]
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = /etc/kafka/secrets/kafka.broker.keystore.jks
ssl.keystore.password = [hidden]
ssl.keystore.type = JKS
ssl.principal.mapping.rules = DEFAULT
ssl.protocol = TLSV1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
And I am facing the error below:
[2021-03-10 07:45:20,758] INFO Awaiting socket connections on 0.0.0.0:8091. (kafka.network.Acceptor)
[2021-03-10 07:45:21,049] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: org.apache.kafka.common.config.ConfigException: Invalid value javax.net.ssl.SSLHandshakeException: General SSLEngine problem for configuration A client SSLEngine created with the provided settings can't connect to a server SSLEngine created with those settings.
at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:77)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
at org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:97)
at kafka.network.Processor.<init>(SocketServer.scala:769)
at kafka.network.SocketServer.newProcessor(SocketServer.scala:396)
at kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:281)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:280)
at kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:243)
at kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:240)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:240)
at kafka.network.SocketServer.startup(SocketServer.scala:123)
at kafka.server.KafkaServer.startup(KafkaServer.scala:406)
at io.confluent.support.metrics.SupportedServerStartable.startup(SupportedServerStartable.java:140)
at io.confluent.support.metrics.SupportedKafka.main(SupportedKafka.java:66)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value javax.net.ssl.SSLHandshakeException: General SSLEngine problem for configuration A client SSLEngine created with the provided settings can't connect to a server SSLEngine created with those settings.
at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:111)
at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:75)
... 17 more
[2021-03-10 07:45:21,051] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2021-03-10 07:45:21,052] INFO [SocketServer brokerId=0] Stopping socket server request processors (kafka.network.SocketServer)
[2021-03-10 07:45:21,053] INFO [SocketServer brokerId=0] Stopped socket server request processors (kafka.network.SocketServer)
[2021-03-10 07:45:21,059] INFO Shutting down. (kafka.log.LogManager)
[2021-03-10 07:45:21,060] INFO Shutting down the log cleaner. (kafka.log.LogCleaner)
[2021-03-10 07:45:21,061] INFO [kafka-log-cleaner-thread-0]: Shutting down (kafka.log.LogCleaner)
[2021-03-10 07:45:21,061] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
[2021-03-10 07:45:21,061] INFO [kafka-log-cleaner-thread-0]: Shutdown completed (kafka.log.LogCleaner)
[2021-03-10 07:45:21,066] INFO Shutdown complete. (kafka.log.LogManager)
[2021-03-10 07:45:21,067] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-03-10 07:45:21,177] INFO Session: 0x10004be2a200006 closed (org.apache.zookeeper.ZooKeeper)
[2021-03-10 07:45:21,177] INFO EventThread shut down for session: 0x10004be2a200006 (org.apache.zookeeper.ClientCnxn)
[2021-03-10 07:45:21,180] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-03-10 07:45:21,181] INFO [ThrottledChannelReaper-Fetch]: Shutting down (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:21,383] INFO [ThrottledChannelReaper-Fetch]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:21,383] INFO [ThrottledChannelReaper-Fetch]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:21,383] INFO [ThrottledChannelReaper-Produce]: Shutting down (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,383] INFO [ThrottledChannelReaper-Produce]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,383] INFO [ThrottledChannelReaper-Produce]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,384] INFO [ThrottledChannelReaper-Request]: Shutting down (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,385] INFO [ThrottledChannelReaper-Request]: Stopped (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,385] INFO [ThrottledChannelReaper-Request]: Shutdown completed (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-03-10 07:45:22,389] INFO [SocketServer brokerId=0] Shutting down socket server (kafka.network.SocketServer)
[2021-03-10 07:45:22,478] INFO [SocketServer brokerId=0] Shutdown completed (kafka.network.SocketServer)
[2021-03-10 07:45:22,485] INFO [KafkaServer id=0] shut down completed (kafka.server.KafkaServer)
[2021-03-10 07:45:22,487] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2021-03-10 07:45:22,487] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2021-03-10 07:45:22,487] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2021-03-10 07:45:22,487] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2021-03-10 07:45:22,487] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
Does anyone know why this exception occurs and how to fix it?
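One thing that may help when reading the config dump above: Confluent's Docker images document a convention for turning KAFKA_-prefixed environment variables into broker properties (lowercase, a single underscore becomes a dot, a double underscore becomes a dash, a triple underscore becomes an underscore). The sketch below is a simplified model of that convention, not the image's actual startup code, and the function name is invented for illustration. It shows which variable name would map directly onto ssl.truststore.location, the property that is printed as null.

```python
def env_to_property(env_name, prefix="KAFKA_"):
    """Map an environment variable name to a Kafka property name.

    Simplified model of the documented naming convention; the real
    image also treats some variables specially, which is ignored here.
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    body = env_name[len(prefix):].lower()
    # Handle the longest underscore runs first so they are not split up.
    placeholder = "\0"
    body = body.replace("___", placeholder)
    body = body.replace("__", "-")
    body = body.replace("_", ".")
    return body.replace(placeholder, "_")


if __name__ == "__main__":
    for name in ("KAFKA_SSL_TRUSTSTORE_LOCATION",
                 "KAFKA_SSL_TRUSTSTORE_FILENAME",
                 "KAFKA_SSL_CLIENT_AUTH"):
        print(name, "->", env_to_property(name))
```

Under this convention KAFKA_SSL_TRUSTSTORE_FILENAME does not map onto ssl.truststore.location by itself, so whether the image derives the location from the filename (and under which conditions) is worth checking in the image documentation for the version in use.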

Not able to install ZooKeeper on Windows

When I execute zkServer.cmd, it runs the command below:
D:\zookeeper\zookeeper-3.4.14\bin>call "C:\Program Files\Java\jdk-13.0.2"\bin\java "-Dzookeeper.log.dir=D:\zookeeper\zookeeper-3.4.14\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "D:\zookeeper\zookeeper-3.4.14\bin\..\build\classes;D:\zookeeper\zookeeper-3.4.14\bin\..\build\lib\*;D:\zookeeper\zookeeper-3.4.14\bin\..\*;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\*;D:\zookeeper\zookeeper-3.4.14\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg"
Output:
2020-02-22 22:07:42,378 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2020-02-22 22:07:42,384 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2020-02-22 22:07:42,384 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2020-02-22 22:07:42,385 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2020-02-22 22:07:42,387 [myid:] - WARN [main:QuorumPeerMain#116] - Either no config or no quorum defined in config, running in standalone mode
2020-02-22 22:07:42,487 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg
2020-02-22 22:07:42,487 [myid:] - INFO [main:ZooKeeperServerMain#98] - Starting server
2020-02-22 22:07:42,512 [myid:] - INFO [main:Environment#100] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2020-02-22 22:07:42,512 [myid:] - INFO [main:Environment#100] - Server environment:host.name=LAPTOP-78H8U6PB
2020-02-22 22:07:42,517 [myid:] - INFO [main:Environment#100] - Server environment:java.version=13.0.2
2020-02-22 22:07:42,518 [myid:] - INFO [main:Environment#100] - Server environment:java.vendor=Oracle Corporation
2020-02-22 22:07:42,519 [myid:] - INFO [main:Environment#100] - Server environment:java.home=C:\Program Files\Java\jdk-13.0.2
2020-02-22 22:07:42,519 [myid:] - INFO [main:Environment#100] - Server environment:java.class.path=D:\zookeeper\zookeeper-3.4.14\bin\..\build\classes;D:\zookeeper\zookeeper-3.4.14\bin\..\build\lib\*;D:\zookeeper\zookeeper-3.4.14\bin\..\zookeeper-3.4.14.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\audience-annotations-0.5.0.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\jline-0.9.94.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\log4j-1.2.17.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\netty-3.10.6.Final.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\slf4j-api-1.7.25.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\slf4j-log4j12-1.7.25.jar;D:\zookeeper\zookeeper-3.4.14\bin\..\conf
2020-02-22 22:07:42,520 [myid:] - INFO [main:Environment#100] - Server environment:java.library.path=C:\Program Files\Java\jdk-13.0.2\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program Files\Java\jdk-13.0.2\bin;.
2020-02-22 22:07:42,520 [myid:] - INFO [main:Environment#100] - Server environment:java.io.tmpdir=C:\Users\erdiv\AppData\Local\Temp\
2020-02-22 22:07:42,521 [myid:] - INFO [main:Environment#100] - Server environment:java.compiler=<NA>
2020-02-22 22:07:42,523 [myid:] - INFO [main:Environment#100] - Server environment:os.name=Windows 10
2020-02-22 22:07:42,524 [myid:] - INFO [main:Environment#100] - Server environment:os.arch=amd64
2020-02-22 22:07:42,525 [myid:] - INFO [main:Environment#100] - Server environment:os.version=10.0
2020-02-22 22:07:42,525 [myid:] - INFO [main:Environment#100] - Server environment:user.name=erdiv
2020-02-22 22:07:42,526 [myid:] - INFO [main:Environment#100] - Server environment:user.home=C:\Users\erdiv
2020-02-22 22:07:42,526 [myid:] - INFO [main:Environment#100] - Server environment:user.dir=D:\zookeeper\zookeeper-3.4.14\bin
2020-02-22 22:07:42,534 [myid:] - INFO [main:ZooKeeperServer#836] - tickTime set to 2000
2020-02-22 22:07:42,534 [myid:] - INFO [main:ZooKeeperServer#845] - minSessionTimeout set to -1
2020-02-22 22:07:42,534 [myid:] - INFO [main:ZooKeeperServer#854] - maxSessionTimeout set to -1
2020-02-22 22:07:42,962 [myid:] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2020-02-22 22:07:42,964 [myid:] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
The process gets stuck and ZooKeeper does not start.
Also, when I run ./zkServer.cmd start, the output is as follows:
D:\zookeeper\zookeeper-3.4.14\bin>call "C:\Program Files\Java\jdk-13.0.2"\bin\java "-Dzookeeper.log.dir=D:\zookeeper\zookeeper-3.4.14\bin\.." "-Dzookeeper.root.logger=INFO,CONSOLE" -cp "D:\zookeeper\zookeeper-3.4.14\bin\..\build\classes;D:\zookeeper\zookeeper-3.4.14\bin\..\build\lib\*;D:\zookeeper\zookeeper-3.4.14\bin\..\*;D:\zookeeper\zookeeper-3.4.14\bin\..\lib\*;D:\zookeeper\zookeeper-3.4.14\bin\..\conf" org.apache.zookeeper.server.quorum.QuorumPeerMain "D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg" start
2020-02-22 22:14:58,220 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2020-02-22 22:14:58,223 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2020-02-22 22:14:58,225 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2020-02-22 22:14:58,232 [myid:] - WARN [main:QuorumPeerMain#116] - Either no config or no quorum defined in config, running in standalone mode
2020-02-22 22:14:58,338 [myid:] - ERROR [main:ZooKeeperServerMain#57] - Invalid arguments, exiting abnormally
java.lang.NumberFormatException: For input string: "D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
at java.base/java.lang.Integer.parseInt(Integer.java:658)
at java.base/java.lang.Integer.parseInt(Integer.java:776)
at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:61)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:55)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:119)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2020-02-22 22:14:58,340 [myid:] - INFO [main:ZooKeeperServerMain#58] - Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
Again, the process gets stuck and ZooKeeper does not start.
It's not stuck. It started and is waiting for connections.
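One way to confirm this is to send the server the four-letter command ruok on the client port; a healthy server replies imok and closes the connection. The helper below is a minimal sketch (the function name is made up here). It assumes the server listens on the port shown in the log (2181) and that four-letter commands are enabled; on versions that restrict them through a whitelist, ruok has to be allowed first.

```python
import socket


def zk_four_letter(host, port, command="ruok", timeout=2.0):
    """Send a four-letter command to a ZooKeeper server and return the reply.

    Hypothetical helper: reads until the server closes the connection,
    which it does after answering a four-letter command.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling zk_four_letter("localhost", 2181) from a second terminal while the first one appears to hang should return imok if the server is up.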
By the way, Java 13 might not work with ZooKeeper 3.4.x.
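As for the second run: the echoed command line shows that start was appended after the zoo.cfg path, and the usage message says the server accepts either a single config file or "port datadir [ticktime] [maxcnxns]". The stack trace (Integer.parseInt on the config path) is consistent with the first of two arguments being read as a port. The following is a simplified model of that argument handling, written only to illustrate the failure; it is not ZooKeeper's code, and the function name is invented.

```python
def parse_server_args(args):
    """Model of 'configfile | port datadir [ticktime] [maxcnxns]'.

    One argument is treated as a config file path; two or more are
    treated as positional values, so the first must be an integer port.
    """
    if len(args) == 1:
        return {"mode": "configfile", "path": args[0]}
    if len(args) < 2:
        raise ValueError("expected: configfile | port datadir [ticktime] [maxcnxns]")
    parsed = {"mode": "positional", "port": int(args[0]), "datadir": args[1]}
    if len(args) >= 3:
        parsed["ticktime"] = int(args[2])
    if len(args) >= 4:
        parsed["maxcnxns"] = int(args[3])
    return parsed


if __name__ == "__main__":
    cfg = r"D:\zookeeper\zookeeper-3.4.14\bin\..\conf\zoo.cfg"
    print(parse_server_args([cfg]))
    try:
        parse_server_args([cfg, "start"])
    except ValueError as exc:
        # Python's counterpart of the NumberFormatException in the post.
        print("failed:", exc)
```

So on Windows, running zkServer.cmd without the extra start argument (as in the first run) is the invocation that matches the usage line.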

ZooKeeper issue: takes 15 minutes to recover if the leader is killed

I am trying to set up Kafka with ZooKeeper in my network, but I am facing a weird issue with ZooKeeper. I have searched around and found that many other users have reported this issue, but no one has posted a proper solution.
My current setup has 3 ZooKeeper nodes (dedicated boxes with 32 GB RAM).
The issue is that if I kill the ZooKeeper leader, both remaining follower nodes also go down and do not recover for at least the next 15-20 minutes.
All I get in the ZooKeeper logs is "Notification time out" messages without any explanation.
Here is my ZooKeeper config file:
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=100
maxSessionTimeout=50000
dataDir=/var/lib/zookeeper
clientPort=2181
autopurge.snapRetainCount=100
autopurge.purgeInterval=1
preAllocSize=131072
snapCount=3000000
server.1=zo1:2888:3888
server.2=zo2:2888:3888
server.3=zo3:2888:3888
In my /etc/hosts file I have mapped zo1, zo2 and zo3 to their IP addresses.
Note: I have also tested setting the current node's IP to 0.0.0.0; it makes no difference.
I tested it again just a few minutes ago, and again it failed to recover.
I have a three-node cluster: zo1, zo2 and zo3. zo3 was the leader, and zo1 and zo2 were followers. After I killed the zo3 node, it took approximately 13 minutes to recover automatically. I got the following logs on zo1 and zo2.
Log for zo1:
tail /var/lib/zookeeper/zookeeper.out -n 10000 | grep 'QuorumPeer'
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer#140] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#505] - shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor#107] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor#184] - Shutting down
2019-01-02 10:25:50,848 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor#402] - shutdown of request processor complete
2019-01-02 10:25:50,849 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor#208] - Shutting down
2019-01-02 10:25:50,849 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer#865] - LOOKING
2019-01-02 10:25:50,850 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#818] - New election. My id = 1, proposed zxid=0x2d00035c8e
2019-01-02 10:25:51,057 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 400
2019-01-02 10:25:51,458 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 800
2019-01-02 10:25:52,259 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 1600
2019-01-02 10:25:53,859 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 3200
2019-01-02 10:25:57,060 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 6400
2019-01-02 10:26:03,461 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 12800
2019-01-02 10:26:16,262 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 25600
2019-01-02 10:26:41,862 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 51200
2019-01-02 10:27:33,063 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:28:33,065 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:29:33,066 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:30:33,066 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:31:33,067 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:32:33,068 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:33:33,069 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:34:33,069 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:35:33,070 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:36:33,071 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:37:33,071 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:38:33,072 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:39:33,073 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:40:33,074 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:41:33,075 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:42:33,076 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:43:33,076 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:43:33,082 [myid:1] - INFO [WorkerSender[myid=1]:QuorumPeer$QuorumServer#167] - Resolved hostname: zo3 to address: zo3/144.76.xxx.xxx
2019-01-02 10:43:33,091 [myid:1] - INFO [WorkerSender[myid=1]:QuorumPeer$QuorumServer#167] - Resolved hostname: zo3 to address: zo3/144.76.xxx.xxx
2019-01-02 10:43:33,290 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer#935] - FOLLOWING
2019-01-02 10:43:33,290 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 50000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2019-01-02 10:43:33,291 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower#64] - FOLLOWING - LEADER ELECTION TOOK - 1062441
2019-01-02 10:43:33,291 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer#167] - Resolved hostname: zo2 to address: zo2/88.198.35.34
2019-01-02 10:43:33,294 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner#237] - Unexpected exception, tries=0, connecting to zo2/88.198.35.34:2888
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
2019-01-02 10:43:34,468 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner#332] - Getting a diff from the leader 0x2d00035c8e
2019-01-02 10:43:35,120 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#687] - Established session 0x2680a49e3dc0013 with negotiated timeout 6000 for client /5.9.xxx.xxx:36664
2019-01-02 10:43:35,244 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#687] - Established session 0x1680a49b6b90011 with negotiated timeout 30000 for client /5.9.xxx.xxx:36668
2019-01-02 10:43:35,625 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower#118] - Got zxid 0x2e00000001 expected 0x1
Logs from node zo2, which later became the leader:
2019-01-02 10:25:50,852 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn#1044] - Closed socket connection for client /5.9.xxx.xxx:21218 which had sessionid 0x2680a49e3dc0012
2019-01-02 10:25:50,852 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer#140] - Shutting down
2019-01-02 10:25:50,853 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#505] - shutting down
2019-01-02 10:25:50,853 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor#107] - Shutting down
2019-01-02 10:25:50,854 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:CommitProcessor#184] - Shutting down
2019-01-02 10:25:50,854 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor#402] - shutdown of request processor complete
2019-01-02 10:25:50,856 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor#208] - Shutting down
2019-01-02 10:25:50,857 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer#865] - LOOKING
2019-01-02 10:25:50,858 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#818] - New election. My id = 2, proposed zxid=0x2d00035c8e
2019-01-02 10:25:51,061 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 400
2019-01-02 10:25:51,462 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 800
2019-01-02 10:25:52,283 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 1600
2019-01-02 10:25:53,884 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 3200
2019-01-02 10:25:57,084 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 6400
2019-01-02 10:26:03,485 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 12800
2019-01-02 10:26:16,286 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 25600
2019-01-02 10:26:41,887 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 51200
2019-01-02 10:27:33,087 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:28:33,088 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:29:33,089 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:30:33,090 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:31:33,091 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:32:33,092 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:33:33,092 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:34:33,093 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:35:33,094 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:36:33,095 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:37:33,095 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:38:33,096 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:39:33,097 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:40:33,098 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:41:33,099 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:42:33,100 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#852] - Notification time out: 60000
2019-01-02 10:43:33,293 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer#947] - LEADING
2019-01-02 10:43:33,299 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader#62] - TCP NoDelay set to: true
2019-01-02 10:43:33,301 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer#173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 50000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2019-01-02 10:43:33,301 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader#371] - LEADING - LEADER ELECTION TOOK - 1062443
2019-01-02 10:43:34,307 [myid:2] - INFO [LearnerHandler-/144.76.120.143:64542:LearnerHandler#346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer#33d2c290
2019-01-02 10:43:34,509 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader#961] - Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x2e00000000
As you can see, all I get in the log is continuous timeouts without any explanation.
I have been testing this for more than a week and still can't find a solution.
I would be very grateful if somebody could point me in the right direction.
Thanks
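On the "Notification time out" lines in the logs above: the values follow a doubling pattern that is capped at 60000 ms, and the gaps between timestamps match the previously logged value. In other words, each node keeps waiting for election notifications from its peers, receives none, and backs off. The sketch below reproduces the sequence; the starting value of 200 ms and the 60000 ms cap are inferred from the log itself, not taken from documentation, and the function name is made up.

```python
def notification_timeouts(initial_ms=200, cap_ms=60000, rounds=12):
    """Return the successive wait times of a capped doubling backoff.

    initial_ms and cap_ms are assumptions read off the log above.
    """
    timeouts = []
    current = initial_ms
    for _ in range(rounds):
        current = min(current * 2, cap_ms)
        timeouts.append(current)
    return timeouts


if __name__ == "__main__":
    print(notification_timeouts())
```

Once the cap is reached, every further round costs a full minute, which is why the log shows one line per minute until the election finally completes.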
We faced the same issue and found that the ZooKeeper leader election happened only after 15 minutes.
We bypassed this 15-minute idle time by setting tcpKeepAlive=true in the ZooKeeper properties.
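A three-server ensemble needs a majority of two to elect a leader, so once the leader is killed the two survivors have no margin left for any further failure. If you intend to run an HA ZooKeeper cluster, increase your ZooKeeper count to 5. Also, ZooKeeper doesn't need 32 GB to run effectively.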
Check out:
https://docs.confluent.io/current/zookeeper/deployment.html#multi-node-setup for cluster information
and
https://docs.confluent.io/current/zookeeper/deployment.html#jvm for JVM sizing.
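The sizing advice comes down to majority arithmetic, which can be sketched in a few lines of Python. The function names here are made up for illustration. The only assumption is the usual majority rule: a quorum is more than half of the configured servers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 2 servers the quorum is 2, so no failure is tolerated; 3 servers tolerate 1 failure and 5 tolerate 2.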
TO ANYONE WHO FACES THE SAME ISSUE: the reason for this was ZooKeeper being idle. If the ZooKeeper nodes become idle because there is no activity (data transactions) between them, the nodes internally stop talking to each other. If any ZooKeeper node then goes down, the other nodes will not know until they do their periodic ping after 10-15 minutes.

Zookeeper EndOfStreamException happened in Heron Cluster

There is a problem that bothers me. EndOfStreamException always happens in ZooKeeper after topologies are submitted. Although it does not affect the normal operation of the cluster, I still hope to solve the problem because it may affect other parts of Heron's functionality.
The ZooKeeper version is 3.4.10 and it is deployed in standalone mode on one host of my cluster. The contents of zoo.cfg are as follows.
tickTime=10000
initLimit=100
syncLimit=50
dataDir=/home/yitian/zookeeper/data
dataLogDir=/home/yitian/zookeeper/logs
clientPort=2181
maxClientCnxns=100
server.1=heron01:2888:3888
Moreover, there is a myid file in dataDir, and its content is: 1. That is my ZooKeeper configuration.
The contents of the statemgr.yaml in heron are as follows:
com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager
heron.statemgr.connection.string: "heron01:2181"
heron.statemgr.root.path: "/heron"
heron.statemgr.zookeeper.is.initialize.tree: True
There are four hosts in my Heron cluster: one Mesos master host and three agent hosts. In the ZooKeeper log file, EndOfStreamException occurred for connections from the three agent hosts. The contents of zookeeper.log are as follows:
2018-07-10 10:30:06,565 [myid:] - INFO [main:QuorumPeerConfig#134] - Reading configuration from: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
2018-07-10 10:30:06,574 [myid:] - INFO [main:QuorumPeer$QuorumServer#167] - Resolved hostname: heron01 to address: heron01/218.195.228.24
2018-07-10 10:30:06,574 [myid:] - ERROR [main:QuorumPeerConfig#345] - Invalid configuration, only one server specified (ignoring)
2018-07-10 10:30:06,576 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2018-07-10 10:30:06,576 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2018-07-10 10:30:06,576 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2018-07-10 10:30:06,576 [myid:] - WARN [main:QuorumPeerMain#113] - Either no config or no quorum defined in config, running in standalone mode
2018-07-10 10:30:06,584 [myid:] - INFO [main:QuorumPeerConfig#134] - Reading configuration from: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
2018-07-10 10:30:06,584 [myid:] - INFO [main:QuorumPeer$QuorumServer#167] - Resolved hostname: heron01 to address: heron01/218.195.228.24
2018-07-10 10:30:06,584 [myid:] - ERROR [main:QuorumPeerConfig#345] - Invalid configuration, only one server specified (ignoring)
2018-07-10 10:30:06,585 [myid:] - INFO [main:ZooKeeperServerMain#96] - Starting server
2018-07-10 10:30:06,589 [myid:] - INFO [main:Environment#100] - Server environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2018-07-10 10:30:06,589 [myid:] - INFO [main:Environment#100] - Server environment:host.name=heron01
2018-07-10 10:30:06,589 [myid:] - INFO [main:Environment#100] - Server environment:java.version=1.8.0_151
2018-07-10 10:30:06,589 [myid:] - INFO [main:Environment#100] - Server environment:java.vendor=Oracle Corporation
2018-07-10 10:30:06,589 [myid:] - INFO [main:Environment#100] - Server environment:java.home=/usr/java/jdk1.8.0_151/jre
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:java.class.path=/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/classes:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/classes:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf:.:/usr/java/jdk1.8.0_151/lib:/usr/java/jdk1.8.0_151/jre/lib
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:java.io.tmpdir=/tmp
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:java.compiler=<NA>
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:os.name=Linux
2018-07-10 10:30:06,590 [myid:] - INFO [main:Environment#100] - Server environment:os.arch=amd64
2018-07-10 10:30:06,591 [myid:] - INFO [main:Environment#100] - Server environment:os.version=4.13.0-43-generic
2018-07-10 10:30:06,591 [myid:] - INFO [main:Environment#100] - Server environment:user.name=yitian
2018-07-10 10:30:06,591 [myid:] - INFO [main:Environment#100] - Server environment:user.home=/home/yitian
2018-07-10 10:30:06,591 [myid:] - INFO [main:Environment#100] - Server environment:user.dir=/home/yitian
2018-07-10 10:30:06,595 [myid:] - INFO [main:ZooKeeperServer#829] - tickTime set to 5000
2018-07-10 10:30:06,595 [myid:] - INFO [main:ZooKeeperServer#838] - minSessionTimeout set to -1
2018-07-10 10:30:06,595 [myid:] - INFO [main:ZooKeeperServer#847] - maxSessionTimeout set to -1
2018-07-10 10:30:06,601 [myid:] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2018-07-10 10:30:07,585 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39618
2018-07-10 10:30:07,600 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0004 at /218.195.228.24:39618
2018-07-10 10:30:07,608 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0004 with negotiated timeout 10000 for client /218.195.228.24:39618
2018-07-10 10:30:07,624 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#648] - Got user-level KeeperException when processing sessionid:0x16482052aec0004 type:create cxid:0x1c zxid:0x158f txntype:-1 reqpath:n/a Error Path:/aurora/scheduler/member_0000000454 Error:KeeperErrorCode = NodeExists for /aurora/scheduler/member_0000000454
2018-07-10 10:30:07,625 [myid:] - INFO [SyncThread:0:FileTxnLog#203] - Creating new log file: log.158f
2018-07-10 10:30:09,917 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39622
2018-07-10 10:30:09,917 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39620
2018-07-10 10:30:09,917 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39622; will be dropped if server is in r-o mode
2018-07-10 10:30:09,917 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0000 at /218.195.228.24:39622
2018-07-10 10:30:09,918 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0000 with negotiated timeout 10000 for client /218.195.228.24:39622
2018-07-10 10:30:09,918 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39626
2018-07-10 10:30:09,918 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39620; will be dropped if server is in r-o mode
2018-07-10 10:30:09,919 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0006 at /218.195.228.24:39620
2018-07-10 10:30:09,919 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0006 with negotiated timeout 10000 for client /218.195.228.24:39620
2018-07-10 10:30:09,919 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39628
2018-07-10 10:30:09,919 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39626; will be dropped if server is in r-o mode
2018-07-10 10:30:09,919 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0001 at /218.195.228.24:39626
2018-07-10 10:30:09,920 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0001 with negotiated timeout 10000 for client /218.195.228.24:39626
2018-07-10 10:30:09,920 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39630
2018-07-10 10:30:09,920 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39628; will be dropped if server is in r-o mode
2018-07-10 10:30:09,920 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0007 at /218.195.228.24:39628
2018-07-10 10:30:09,921 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0007 with negotiated timeout 10000 for client /218.195.228.24:39628
2018-07-10 10:30:09,921 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39630; will be dropped if server is in r-o mode
2018-07-10 10:30:09,921 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0005 at /218.195.228.24:39630
2018-07-10 10:30:09,921 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0005 with negotiated timeout 10000 for client /218.195.228.24:39630
2018-07-10 10:30:09,922 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39632
2018-07-10 10:30:09,922 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39632; will be dropped if server is in r-o mode
2018-07-10 10:30:09,922 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0002 at /218.195.228.24:39632
2018-07-10 10:30:09,922 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0002 with negotiated timeout 10000 for client /218.195.228.24:39632
2018-07-10 10:30:09,922 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39634
2018-07-10 10:30:09,923 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.24:39634; will be dropped if server is in r-o mode
2018-07-10 10:30:09,923 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#935] - Client attempting to renew session 0x16482052aec0003 at /218.195.228.24:39634
2018-07-10 10:30:09,925 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#687] - Established session 0x16482052aec0003 with negotiated timeout 10000 for client /218.195.228.24:39634
2018-07-10 10:56:28,040 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39852
2018-07-10 10:56:28,043 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.24:39852
2018-07-10 10:56:28,044 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770000 with negotiated timeout 30000 for client /218.195.228.24:39852
2018-07-10 10:56:37,924 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.24:39860
2018-07-10 10:56:37,925 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.24:39860
2018-07-10 10:56:37,926 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770001 with negotiated timeout 10000 for client /218.195.228.24:39860
2018-07-10 10:56:37,936 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#486] - Processed session termination for sessionid: 0x164820646770001
2018-07-10 10:56:37,941 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /218.195.228.24:39860 which had sessionid 0x164820646770001
2018-07-10 10:56:43,514 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.43:35560
2018-07-10 10:56:43,516 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.43:35560
2018-07-10 10:56:43,521 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770002 with negotiated timeout 10000 for client /218.195.228.43:35560
2018-07-10 10:56:44,022 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.19:50174
2018-07-10 10:56:44,026 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.19:50174
2018-07-10 10:56:44,031 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770003 with negotiated timeout 10000 for client /218.195.228.19:50174
2018-07-10 10:56:44,130 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.28:44134
2018-07-10 10:56:44,149 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.28:44134
2018-07-10 10:56:44,154 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770004 with negotiated timeout 10000 for client /218.195.228.28:44134
2018-07-10 10:56:45,730 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor#486] - Processed session termination for sessionid: 0x164820646770000
2018-07-10 10:56:45,736 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /218.195.228.24:39852 which had sessionid 0x164820646770000
2018-07-10 10:57:16,966 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.43:35572
2018-07-10 10:57:16,970 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-07-10 10:57:16,977 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /218.195.228.43:35572 (no session established for client)
2018-07-10 10:57:16,979 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.43:35574
2018-07-10 10:57:16,980 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.43:35574
2018-07-10 10:57:16,984 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770005 with negotiated timeout 10000 for client /218.195.228.43:35574
2018-07-10 10:57:17,095 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.43:35578
2018-07-10 10:57:17,097 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.43:35578; will be dropped if server is in r-o mode
2018-07-10 10:57:17,097 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.43:35578
2018-07-10 10:57:17,101 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770006 with negotiated timeout 30000 for client /218.195.228.43:35578
2018-07-10 10:57:17,747 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.19:50186
2018-07-10 10:57:17,749 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-07-10 10:57:17,751 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /218.195.228.19:50186 (no session established for client)
2018-07-10 10:57:17,752 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.19:50188
2018-07-10 10:57:17,752 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.19:50188
2018-07-10 10:57:17,756 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770007 with negotiated timeout 10000 for client /218.195.228.19:50188
2018-07-10 10:57:18,742 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.19:50190
2018-07-10 10:57:18,743 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.19:50190; will be dropped if server is in r-o mode
2018-07-10 10:57:18,744 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.19:50190
2018-07-10 10:57:18,751 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770008 with negotiated timeout 30000 for client /218.195.228.19:50190
2018-07-10 10:57:20,456 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.43:35586
2018-07-10 10:57:20,458 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.43:35586
2018-07-10 10:57:20,463 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x164820646770009 with negotiated timeout 30000 for client /218.195.228.43:35586
2018-07-10 10:57:36,004 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.28:44146
2018-07-10 10:57:36,004 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-07-10 10:57:36,004 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /218.195.228.28:44146 (no session established for client)
2018-07-10 10:57:36,006 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.28:44148
2018-07-10 10:57:36,006 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.28:44148
2018-07-10 10:57:36,008 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x16482064677000a with negotiated timeout 10000 for client /218.195.228.28:44148
2018-07-10 10:57:36,892 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /218.195.228.28:44152
2018-07-10 10:57:36,893 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#896] - Connection request from old client /218.195.228.28:44152; will be dropped if server is in r-o mode
2018-07-10 10:57:36,894 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /218.195.228.28:44152
2018-07-10 10:57:36,898 [myid:] - INFO [SyncThread:0:ZooKeeperServer#687] - Established session 0x16482064677000b with negotiated timeout 30000 for client /218.195.228.28:44152
In the zookeeper.log above, the host with IP 218.195.228.24 is the master node; the hosts with IPs ending in 28, 19, and 43 are the agent nodes.
I have tried a lot of methods, but none of them work. What is the cause of the problem, and how can I fix it? Thanks a lot for your help.
Yitian - If you are using Aurora, then during submission the client writes the logical plan directly into ZooKeeper. It is possible that the ZooKeeper client running in the Heron client is not terminating gracefully.
The issue seems to be on the client side, as per the ZooKeeper issue below:
https://issues.apache.org/jira/browse/ZOOKEEPER-1582
You may have to check the logs of the ZooKeeper client that is trying to connect.
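To decide which client logs to look at first, it may help to count which hosts opened a connection that was closed before any session was established. Below is a rough Python sketch that scans zookeeper.log lines for that message. The function name and the regular expression are my own, written against the log lines quoted in the question, so treat them as assumptions for other ZooKeeper versions.

```python
import re
from collections import Counter

# Matches the "closed before a session existed" message from the question's log.
NO_SESSION_CLOSE = re.compile(
    r"Closed socket connection for client /(?P<ip>[\d.]+):\d+ "
    r"\(no session established for client\)"
)


def count_sessionless_closes(log_lines):
    """Count, per client IP, connections closed with no session established."""
    counts = Counter()
    for line in log_lines:
        match = NO_SESSION_CLOSE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Message parts copied from the log in the question (prefixes shortened).
sample = [
    "Closed socket connection for client /218.195.228.24:39860 which had sessionid 0x164820646770001",
    "Closed socket connection for client /218.195.228.43:35572 (no session established for client)",
    "Closed socket connection for client /218.195.228.19:50186 (no session established for client)",
    "Closed socket connection for client /218.195.228.28:44146 (no session established for client)",
]
counts = count_sessionless_closes(sample)
print(dict(counts))
```

On a real file you would pass the open file object instead of `sample`. In the quoted log, only the three agent hosts show up this way, each once.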

Zookeeper running or not in relation to standard port 2181 usage?

I am using Cloudera QuickStart 5.13.
I am not sure whether the out-of-the-box ZooKeeper is running or not and, if so, whether it would work reliably. I got this when trying to run, in standalone mode, the ZooKeeper supplied with the Kafka version that I downloaded:
[2018-06-17 00:49:32,847] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-06-17 00:49:32,854] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
netstat on the vm reveals:
[cloudera#quickstart kafka_2.11-1.1.0]$ netstat -an | grep 2181
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:2181 127.0.0.1:49718 ESTABLISHED
tcp 0 0 127.0.0.1:49707 127.0.0.1:2181 ESTABLISHED
tcp 0 0 127.0.0.1:2181 127.0.0.1:49707 ESTABLISHED
tcp 0 0 127.0.0.1:49697 127.0.0.1:2181 ESTABLISHED
tcp 0 0 10.0.2.15:49065 10.0.2.15:2181 ESTABLISHED
tcp 0 0 127.0.0.1:49718 127.0.0.1:2181 ESTABLISHED
tcp 0 0 127.0.0.1:49706 127.0.0.1:2181 ESTABLISHED
tcp 0 0 127.0.0.1:49714 127.0.0.1:2181 ESTABLISHED
tcp 0 0 10.0.2.15:2181 10.0.2.15:49060 ESTABLISHED
tcp 0 0 10.0.2.15:2181 10.0.2.15:49065 ESTABLISHED
tcp 0 0 127.0.0.1:2181 127.0.0.1:49701 ESTABLISHED
tcp 0 0 127.0.0.1:2181 127.0.0.1:49714 ESTABLISHED
tcp 0 0 127.0.0.1:2181 127.0.0.1:49706 ESTABLISHED
tcp 0 0 10.0.2.15:49060 10.0.2.15:2181 ESTABLISHED
tcp 0 0 127.0.0.1:49701 127.0.0.1:2181 ESTABLISHED
tcp 0 0 127.0.0.1:2181 127.0.0.1:49697 ESTABLISHED
Running sudo jps shows QuorumPeerMain, which I think is ZooKeeper these days(?):
8196
5559 SecondaryNameNode
7116 HistoryServer
5831 NodeManager
5290 DataNode
10995 Jps
5216 QuorumPeerMain
6449 ThriftServer
6587 RunJar
7068 Bootstrap
5384 JournalNode
7879 Bootstrap
6317 RESTServer
7237 HRegionServer
5687 Bootstrap
6061 ResourceManager
8124 Bootstrap
8153
5479 NameNode
5745 JobHistoryServer
6699 RunJar
6158 HMaster
I am not sure what to make of it, as I got the output below when starting ZooKeeper from the Cloudera install. Do I have ZooKeeper working? What does "No such process" from kill mean?
[cloudera#quickstart kafka_2.11-1.1.0]$ sudo /usr/lib/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[cloudera#quickstart kafka_2.11-1.1.0]$ sudo /usr/lib/zookeeper/bin/zkServer.sh stop
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... /usr/lib/zookeeper/bin/zkServer.sh: line 162: kill: (11140) - No such process
STOPPED
Basically, I find that these observations do not add up with the standard descriptions.
As your error states, you have an
Address already in use
error, so you should figure out (as beny23 said) which process occupies that port:
lsof -i :2181
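If lsof is not available, a quick way to at least confirm that something is accepting connections on the port is a plain TCP connect. The Python sketch below does only that: unlike lsof, it cannot tell you which process owns the port. The function name `port_is_listening` is made up for this example.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


# Check the default ZooKeeper client port on the local machine.
print(port_is_listening("127.0.0.1", 2181))
```

If this prints True before you start your own ZooKeeper, another instance (for example the one from the Cloudera install) is most likely already bound to 2181.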
Once you have ZooKeeper running, you should see logs like:
Validating environment
ZK_REPLICAS=1
MY_ID=1
ZK_LOG_LEVEL=INFO
ZK_DATA_DIR=/var/lib/zookeeper/data
ZK_DATA_LOG_DIR=/var/lib/zookeeper/log
ZK_LOG_DIR=/var/log/zookeeper
ZK_CLIENT_PORT=2181
ZK_SERVER_PORT=2888
ZK_ELECTION_PORT=3888
ZK_TICK_TIME=2000
ZK_INIT_LIMIT=10
ZK_SYNC_LIMIT=2000
ZK_MAX_CLIENT_CNXNS=60
ZK_MIN_SESSION_TIMEOUT= 4000
ZK_MAX_SESSION_TIMEOUT= 40000
ZK_HEAP_SIZE=1G
ZK_SNAP_RETAIN_COUNT=3
ZK_PURGE_INTERVAL=0
ENSEMBLE
server.1=zookeeper-0.zookeeper.default.svc.cluster.local:2888:3888
Environment validation successful
Creating ZooKeeper configuration
Wrote ZooKeeper configuration file to /etc/zookeeper/zoo.cfg
Creating ZooKeeper log4j configuration
Wrote log4j configuration to /etc/zookeeper/log4j.properties
Creating ZooKeeper data directories and setting permissions
Created ZooKeeper data directories and set permissions in /var/lib/zookeeper/data
Creating JVM configuration file
Wrote JVM configuration to /etc/zookeeper/java.env
ZooKeeper JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
2018-06-18 07:52:43,747 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: /etc/zookeeper/zoo.cfg
2018-06-18 07:52:43,752 [myid:] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2018-06-18 07:52:43,752 [myid:] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2018-06-18 07:52:43,752 [myid:] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2018-06-18 07:52:43,752 [myid:] - WARN [main:QuorumPeerMain#116] - Either no config or no quorum defined in config, running in standalone mode
2018-06-18 07:52:43,764 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: /etc/zookeeper/zoo.cfg
2018-06-18 07:52:43,764 [myid:] - INFO [main:ZooKeeperServerMain#98] - Starting server
2018-06-18 07:52:43,771 [myid:] - INFO [main:Environment#100] - Server environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
2018-06-18 07:52:43,771 [myid:] - INFO [main:Environment#100] - Server environment:host.name=zookeeper-0.zookeeper.default.svc.cluster.local
2018-06-18 07:52:43,771 [myid:] - INFO [main:Environment#100] - Server environment:java.version=1.8.0_151
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.vendor=Oracle Corporation
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.class.path=/usr/bin/../build/classes:/usr/bin/../build/lib/*.jar:/usr/bin/../share/zookeeper/zookeeper-3.4.11.jar:/usr/bin/../share/zookeeper/slf4j-log4j12-1.6.1.jar:/usr/bin/../share/zookeeper/slf4j-api-1.6.1.jar:/usr/bin/../share/zookeeper/netty-3.10.5.Final.jar:/usr/bin/../share/zookeeper/log4j-1.2.16.jar:/usr/bin/../share/zookeeper/jline-0.9.94.jar:/usr/bin/../share/zookeeper/audience-annotations-0.5.0.jar:/usr/bin/../src/java/lib/*.jar:/etc/zookeeper:
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.io.tmpdir=/tmp
2018-06-18 07:52:43,772 [myid:] - INFO [main:Environment#100] - Server environment:java.compiler=<NA>
2018-06-18 07:52:43,773 [myid:] - INFO [main:Environment#100] - Server environment:os.name=Linux
2018-06-18 07:52:43,773 [myid:] - INFO [main:Environment#100] - Server environment:os.arch=amd64
2018-06-18 07:52:43,773 [myid:] - INFO [main:Environment#100] - Server environment:os.version=4.9.64
2018-06-18 07:52:43,773 [myid:] - INFO [main:Environment#100] - Server environment:user.name=zookeeper
2018-06-18 07:52:43,774 [myid:] - INFO [main:Environment#100] - Server environment:user.home=/home/zookeeper
2018-06-18 07:52:43,774 [myid:] - INFO [main:Environment#100] - Server environment:user.dir=/usr/bin
2018-06-18 07:52:43,778 [myid:] - INFO [main:ZooKeeperServer#825] - tickTime set to 2000
2018-06-18 07:52:43,778 [myid:] - INFO [main:ZooKeeperServer#834] - minSessionTimeout set to 4000
2018-06-18 07:52:43,778 [myid:] - INFO [main:ZooKeeperServer#843] - maxSessionTimeout set to 40000
2018-06-18 07:52:43,785 [myid:] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2018-06-18 07:52:43,788 [myid:] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2018-06-18 07:52:56,088 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /127.0.0.1:40634
2018-06-18 07:52:56,093 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ServerCnxn#324] - The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, 2003003507=wchs, 2003003504=wchp, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
In your config you should define the server name. I have this line (I deployed ZooKeeper in a Kubernetes cluster):
server.1=zookeeper-0.zookeeper.default.svc.cluster.local:2888:3888
In any case, you need to declare it in your zoo.cfg or pass it as a variable at startup.
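To double-check what a given zoo.cfg actually declares, here is a small Python sketch that pulls out the server.N entries. The parser is my own and only handles the plain `server.N=host:port:port` form shown above; the function name and the sample text are illustrative.

```python
import re

# server.<id>=<host>:<peer port>:<election port>
SERVER_LINE = re.compile(
    r"^server\.(?P<sid>\d+)=(?P<host>[^:\s]+):(?P<peer>\d+):(?P<election>\d+)"
)


def parse_server_entries(cfg_text: str) -> dict:
    """Map each server id to (host, peer_port, election_port)."""
    entries = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SERVER_LINE.match(line)
        if match:
            entries[int(match.group("sid"))] = (
                match.group("host"),
                int(match.group("peer")),
                int(match.group("election")),
            )
    return entries


sample_cfg = """\
tickTime=2000
clientPort=2181
server.1=zookeeper-0.zookeeper.default.svc.cluster.local:2888:3888
"""
entries = parse_server_entries(sample_cfg)
print(entries)
```

With a single entry like this one, the log above shows ZooKeeper falling back to standalone mode ("Either no config or no quorum defined in config, running in standalone mode").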