We are using the Kafka high-level consumer, and we are able to consume messages successfully, but the zookeeper connections keep expiring and being re-established.
I am wondering why there are no heartbeats to keep the connections alive:
Kafka Consumer Logs
====================
[localhost-startStop-1-SendThread(10.41.105.23:2181)] [ClientCnxn$SendThread] [line : 1096 ] - Client session timed out, have not heard from server in 2666ms for sessionid 0x153175bd3860159, closing socket connection and attempting reconnect
2016-03-08 18:00:06,750 INFO [localhost-startStop-1-SendThread(10.41.105.23:2181)] [ClientCnxn$SendThread] [line : 975 ] - Opening socket connection to server 10.41.105.23/10.41.105.23:2181. Will not attempt to authenticate using SASL (unknown error)
2016-03-08 18:00:06,823 INFO [localhost-startStop-1-SendThread(10.41.105.23:2181)] [ClientCnxn$SendThread] [line : 852 ] - Socket connection established to 10.41.105.23/10.41.105.23:2181, initiating session
2016-03-08 18:00:06,892 INFO [localhost-startStop-1-SendThread(10.41.105.23:2181)] [ClientCnxn$SendThread] [line : 1235 ] - Session establishment complete on server 10.41.105.23/10.41.105.23:2181, sessionid = 0x153175bd3860159, negotiated timeout = 4000
Zookeeper Logs
==================
[2016-03-08 17:44:37,722] INFO Accepted socket connection from /10.10.113.92:51333 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2016-03-08 17:44:37,742] INFO Client attempting to renew session 0x153175bd3860159 at /10.10.113.92:51333 (org.apache.zookeeper.server.ZooKeeperServer)
[2016-03-08 17:44:37,742] INFO Established session 0x153175bd3860159 with negotiated timeout 4000 for client /10.10.113.92:51333 (org.apache.zookeeper.server.ZooKeeperServer)
[2016-03-08 17:46:56,000] INFO Expiring session 0x153175bd3860151, timeout of 4000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2016-03-08 17:46:56,001] INFO Processed session termination for sessionid: 0x153175bd3860151 (org.apache.zookeeper.server.PrepRequestProcessor)
[2016-03-08 17:46:56,011] INFO Closed socket connection for client /10.10.114.183:38324 which had sessionid 0x153175bd3860151 (org.apache.zookeeper.server.NIOServerCnxn)
Often ZooKeeper session timeouts are caused by "soft failures," which are most commonly a garbage collection pause. Turn on GC logging and see if a long GC occurs at the time the connection times out. Also, read about JVM tuning in Kafka.
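For example, for the consumer's JVM you could add the standard HotSpot (Java 8) GC-logging flags below (the log path is only an illustration), then correlate any long pauses with the timestamps of the "Client session timed out" messages:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/consumer-gc.log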
[2016-03-08 17:46:56,000] INFO Expiring session 0x153175bd3860151, timeout of 4000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
What is Zookeeper's maxSessionTimeout? If it's just 4000ms (4 seconds), then it's way too small.
In the Cloudera distribution of Hadoop, ZK's maxSessionTimeout defaults to 40s (40000ms). As explained in the ZK configuration documentation - https://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html - it defaults to 20 ticks (and one tick by default is 2 seconds).
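If you control both sides, you can raise those limits. A minimal sketch (property names for ZooKeeper's zoo.cfg and for the old high-level consumer; the values are only illustrative):
# zoo.cfg
tickTime=2000
maxSessionTimeout=40000
# consumer configuration
zookeeper.session.timeout.ms=10000
zookeeper.connection.timeout.ms=10000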
Related
I had previously run zookeeper and kafka successfully many times, and I believe my installation and configurations are correct.
The only change I made was to the zookeeper config file:
dataDir=/Users/garynackenson/Downloads/kafka_2.12-2.0.0/data/zookeeper
which I have created the directory for.
Now when I run zookeeper, instead of getting the usual info message about binding to port 0.0.0.0/0.0.0.0:2181,
I get the error below, and kafka fails with a "port 9092 in use" error (I have restarted my machine and checked every way I know to confirm that port 9092 is not in use).
The last message from zookeeper is below, which does not look right:
INFO Established session 0x10000025c8a0001 with negotiated timeout 6000 for client /127.0.0.1:49977 (org.apache.zookeeper.server.ZooKeeperServer)
When zookeeper starts that way, kafka fails with a "9092 in use" error (see below). I restarted and checked that I am not using port 9092.
org.apache.kafka.common.KafkaException: Socket server failed to bind to 0.0.0.0:9092: Address already in use.
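As an aside, one quick way to confirm whether anything is actually listening on 9092 (this assumes macOS or Linux) is:
lsof -nP -iTCP:9092 -sTCP:LISTEN
netstat -an | grep 9092
A leftover Kafka process from an earlier run is the usual culprit for this error.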
A little while later, I saw that zookeeper had a different issue:
INFO Closed socket connection for client /0:0:0:0:0:0:0:1:49986 which had sessionid 0x100000b679d0000 (org.apache.zookeeper.server.NIOServerCnxn)
I ran zookeeper again and saw the more 'normal' binding to 2181 (the first line of the log below):
[2018-10-03 18:25:08,064] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-10-03 18:25:09,055] INFO Accepted socket connection from /127.0.0.1:50014 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-10-03 18:25:09,062] INFO Client attempting to renew session 0x10000025c8a0001 at /127.0.0.1:50014 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-10-03 18:25:09,066] INFO Established session 0x10000025c8a0001 with negotiated timeout 6000 for client /127.0.0.1:50014 (org.apache.zookeeper.server.ZooKeeperServer)
But kafka is still failing every time.
Also, sometimes I get the following message when I start zookeeper:
[2018-10-03 18:10:36,097] INFO Got user-level KeeperException when processing sessionid:0x10000025c8a0001 type:delete cxid:0x47 zxid:0x179 txntype:-1 reqpath:n/a Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election (org.apache.zookeeper.server.PrepRequestProcessor)
I have set up a 3-node Kafka cluster and a 3-node Zookeeper cluster on separate nodes. Using Kafka I can produce and consume messages successfully and run commands like kafka-topics.sh to get topic lists and their information from Zookeeper, but there are some errors in the Kafka server.log file. The following warning appears continuously:
[2018-02-18 21:50:01,241] WARN Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,242] INFO Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,343] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:50:01,989] INFO Opening socket connection to server zookeeper3/192.168.1.206:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,008] INFO Socket connection established to zookeeper3/192.168.1.206:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO Session establishment complete on server zookeeper3/192.168.1.206:2181, sessionid = 0x161a94b101f0001, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:59:31,570] INFO [Group Metadata Manager on Broker 102]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
It seems the Kafka sessions in zookeeper expire periodically!
The Zookeeper logs contain the following warnings, too:
2018-02-18 18:20:06,149 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0001, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,151 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.203:43162 which had sessionid 0x161a94b101f0001
2018-02-18 18:20:06,781 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,782 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.201:45330 which had sessionid 0x161a94b101f0002
2018-02-18 18:37:29,127 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /192.168.1.202:52480
2018-02-18 18:37:29,139 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /192.168.1.202:52480
2018-02-18 18:37:29,143 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer#687] - Established session 0x161a94b101f0003 with negotiated timeout 30000 for client /192.168.1.202:52480
2018-02-18 18:37:29,432 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.202:52480 which had sessionid 0x161a94b101f0003
I think it's because zookeeper can't get heartbeats from the Kafka nodes. The following is the Zookeeper zoo.cfg:
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
and these are the customized settings in the Kafka server.properties:
broker.id=1
listeners = PLAINTEXT://kafka1:9092
num.partitions=24
delete.topic.enable=true
default.replication.factor=2
log.dirs=/data/kafka/data
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
log.retention.hours=168
I use the same zookeeper cluster for Hadoop HA without any problem. I think there is something wrong with the Kafka properties listeners and advertised.listeners; I read the Kafka documentation but couldn't understand their meaning.
In the hosts file of every OS, hostnames such as zookeeper1 to zookeeper3 and kafka1 to kafka3 are defined and reachable with the ping command. I removed the following lines from the hosts files:
127.0.0.1 localhost
127.0.1.1 hostname
I don't think that could cause the problem.
Kafka version: 0.11
Zookeeper version: 3.4.10
Can anyone help?
We were facing a similar issue with Kafka. As @Soheil pointed out, it was due to a Major GC running.
When a Major GC runs, Kafka is sometimes not able to send a heartbeat to zookeeper. For us the Major GC was running almost once every 15 seconds. On taking a heap dump, we realized it was due to a metrics memory leak in Kafka.
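For anyone hitting the same thing, a sketch of how we checked (standard JDK tools; <kafka-pid> stands for the broker's process id):
jstat -gcutil <kafka-pid> 1000    # prints GC counts and old-gen occupancy once per second
jmap -dump:live,format=b,file=kafka-heap.hprof <kafka-pid>    # heap dump for offline leak analysis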
Zookeeper accepts a socket connection from port 50167 (in my case) and then closes it.
[2017-09-04 14:44:23,926] INFO Server environment:user.dir=C:\kafka_2.11-0.11.0.0 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:44:24,013] INFO tickTime set to 3000 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:44:24,029] INFO minSessionTimeout set to -1 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:44:24,045] INFO maxSessionTimeout set to -1 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:44:24,245] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-09-04 14:45:16,525] INFO Accepted socket connection from /0:0:0:0:0:0:0:1:50167 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-09-04 14:45:16,557] INFO Client attempting to establish new session at /0:0:0:0:0:0:0:1:50167 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:45:16,572] INFO Creating new log file: log.b1 (org.apache.zookeeper.server.persistence.FileTxnLog)
[2017-09-04 14:45:16,613] INFO Established session 0x15e4c2b5f7f0000 with negotiated timeout 6000 for client /0:0:0:0:0:0:0:1:50167 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-09-04 14:45:17,939] INFO Processed session termination for sessionid: 0x15e4c2b5f7f0000 (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-09-04 14:45:17,970] INFO Closed socket connection for client /0:0:0:0:0:0:0:1:50167 which had sessionid 0x15e4c2b5f7f0000 (org.apache.zookeeper.server.NIOServerCnxn)
Because of this, the Kafka server also fails to run. Any solution would be helpful.
After I deleted the old logs, i.e. the folders kafka_2.11-0.11.0.0\kafka-logs and kafka_2.11-0.11.0.0\zookeeper-data, the server started fine.
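In other words, something like this (Windows paths guessed from the user.dir in the log above; stop Kafka and ZooKeeper first, and note this wipes all topic data):
rmdir /s /q C:\kafka_2.11-0.11.0.0\kafka-logs
rmdir /s /q C:\kafka_2.11-0.11.0.0\zookeeper-data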
I installed fresh zookeeper and kafka and started them both. Then, when I try to see the list of topics with this command:
bin/kafka-topics.sh --list --zookeeper localhost 2181
it gives me a closed socket connection. Here is the console output:
darpanshah@darpan-ubuntu:/opt/Kafka$ bin/kafka-topics.sh --list --zookeeper localhost 2181
[2016-08-17 10:44:44,053] INFO Accepted socket connection from /127.0.0.1:48452 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2016-08-17 10:44:44,059] INFO Client attempting to establish new session at /127.0.0.1:48452 (org.apache.zookeeper.server.ZooKeeperServer)
[2016-08-17 10:44:44,069] INFO Established session 0x15698f5ac360001 with negotiated timeout 30000 for client /127.0.0.1:48452 (org.apache.zookeeper.server.ZooKeeperServer)
[2016-08-17 10:44:44,095] INFO Processed session termination for sessionid: 0x15698f5ac360001 (org.apache.zookeeper.server.PrepRequestProcessor)
[2016-08-17 10:44:44,105] INFO Closed socket connection for client /127.0.0.1:48452 which had sessionid 0x15698f5ac360001 (org.apache.zookeeper.server.NIOServerCnxn)
darpanshah@darpan-ubuntu:/opt/Kafka$
Thanks in advance. Got stuck.
That is not an error. The topic details are fetched from Zookeeper. Hence the client (invoked by kafka-topics.sh) first connects to Zookeeper, then establishes a session, gets the data and then disconnects at the end.
This is the expected behavior of any client that gets data from Zookeeper.
How to programmatically detect which server in a ZooKeeper ensemble a client is connected to?
I'm using the Apache Curator API and I am listening for state changes in the connection by registering a ConnectionStateListener. I would like to know which server in the ensemble the client is connected to after it reconnects, e.g. when the server it was connected to goes down.
You can see this in the logs produced by Curator. In the example output below, the CuratorFramework client has been given 4 different ZooKeeper instances in the connectionString that it can connect to. As can be seen in the log, it chooses the first:
21:13:45.384 [main] INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
21:13:45.386 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@2876f0c
21:13:45.388 [main-SendThread(127.0.0.1:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
21:13:45.388 [main-SendThread(127.0.0.1:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
21:13:45.392 [main-SendThread(127.0.0.1:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x14aac461eb70004, negotiated timeout = 40000
If the ZooKeeper server that the client is connected to crashes, you will also see in the logs the new server that the client connects to:
21:23:03.675 [main-SendThread(127.0.0.1:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
21:23:03.677 [main-SendThread(127.0.0.1:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 127.0.0.1/127.0.0.1:2182, initiating session
21:23:03.697 [main-SendThread(127.0.0.1:2182)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:2182, sessionid = 0x14aac461eb70004, negotiated timeout = 40000
21:23:03.697 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
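If you need the address programmatically rather than from the logs, one possible sketch (not a supported API) is to register a ConnectionStateListener and inspect the raw ZooKeeper handle that Curator wraps; this assumes, as in ZooKeeper 3.4.x, that the handle's toString() includes a remoteserver field:
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ConnectedServerExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184",
                new ExponentialBackoffRetry(1000, 3));

        client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
            @Override
            public void stateChanged(CuratorFramework c, ConnectionState newState) {
                if (newState == ConnectionState.CONNECTED || newState == ConnectionState.RECONNECTED) {
                    try {
                        // The raw ZooKeeper handle's toString() contains the peer address,
                        // e.g. "... remoteserver:127.0.0.1:2182 ..." in 3.4.x; parse out the
                        // "remoteserver:" token if you only want the host:port.
                        System.out.println(c.getZookeeperClient().getZooKeeper().toString());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        });

        client.start();
        Thread.sleep(Long.MAX_VALUE); // keep the demo process alive
    }
}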