Zookeeper refuses Kafka connection from an old client - apache-kafka

I have a cluster configured with Kubernetes on GCE, with one pod for ZooKeeper and another for Kafka. It was working normally until ZooKeeper crashed and restarted; since then it refuses connections from the Kafka pod:
Refusing session request for client /10.4.4.58:52260 as it has seen zxid 0x1962630
The complete refusal log is here:
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /10.4.4.58:52260
2017-08-21 20:05:32,013 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#882] - Connection request from old client /10.4.4.58:52260; will be dropped if server is in r-o mode
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#901] - Refusing session request for client /10.4.4.58:52260 as it has seen zxid 0x1962630 our last zxid is 0xab client must try another server
2017-08-21 20:05:32,013 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1008] - Closed socket connection for client /10.4.4.58:52260 (no session established for client)

This happens because Kafka keeps a ZooKeeper session that remembers the last zxid it has seen. When the ZooKeeper service goes down and comes back, its zxid starts again from a smaller value, so the ZooKeeper server thinks Kafka has seen a newer zxid than the server has, and it refuses the connection.
Try restarting Kafka.
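The check described above can be modelled in a few lines of Python. This is a simplified sketch of the comparison implied by the log message, not ZooKeeper's source code; `accept_session` is a hypothetical name, and the two zxids are the ones from the log in the question.

```python
def accept_session(client_last_zxid: int, server_last_zxid: int) -> bool:
    """Simplified model of the server-side check that produces
    "Refusing session request ... as it has seen zxid X our last zxid is Y".

    A client that has already seen a transaction newer than anything this
    server knows about is refused, so it cannot observe older state.
    """
    return client_last_zxid <= server_last_zxid


KAFKA_LAST_SEEN = 0x1962630  # zxid remembered by the Kafka client (from the log)
ZK_LAST = 0xAB               # last zxid of the restarted ZooKeeper (from the log)

print(accept_session(KAFKA_LAST_SEEN, ZK_LAST))  # False: the old session is refused
print(accept_session(0, ZK_LAST))                # True: a fresh client (zxid 0) is accepted
```

A restarted Kafka broker opens a brand-new ZooKeeper session whose last-seen zxid starts at 0, which is why restarting it gets past the check.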

For the record, I had this problem while all my Kafka brokers were off.
But my kafka-manager was still up and connected to ZooKeeper. Turning it off resolved the issue.

Building on the answer from @GuangshengZuo, the steps are:
1. Stop any residual ZooKeeper instances: zookeeper-server-stop.bat
2. Start a fresh ZooKeeper: zookeeper-server-start.bat .\config\zookeeper.properties
That should do it.

Related

Kafka ZooKeeper session not established after creating consumer

This is my first time using Kafka. I followed this tutorial.
After starting ZooKeeper, I started the Kafka server. Next, I created a topic and then started a consumer for it. At that point the ZooKeeper log says:
Exception causing close of session 0x0: null
2019-01-04 14:11:58,160 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:50480 (no session established for client)
2019-01-04 14:11:59,073 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /127.0.0.1:50481
2019-01-04 14:11:59,074 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: null
2019-01-04 14:11:59,078 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1040] - Closed socket connection for client /127.0.0.1:50481 (no session established for client)
2019-01-04 14:11:59,994 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#215] - Accepted socket connection from /127.0.0.1:50482
2019-01-04 14:11:59,995 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#383] - Exception causing close of session 0x0: null
I am using Windows 10.
kafka_2.11-2.1.0
zookeeper-3.4.12
@itssajan
I tried the same tutorial you linked (https://dzone.com/articles/running-apache-kafka-on-windows-os), and there is a bug in that blog post.
Replacing the zookeeper option with --bootstrap-server localhost:9092 worked for me.
Try this; it should work on Windows:
C:\dev\kafka_2.12-2.2.1\kafka_2.12-2.2.1\bin\windows>kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning
I cleared /tmp/zookeeper and /tmp/kafka-logs/, and killed all Kafka processes.
Then it worked!
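The same reset can be scripted. This is a minimal Python sketch, assuming the default locations from the stock Kafka configs (dataDir=/tmp/zookeeper in config/zookeeper.properties and log.dirs=/tmp/kafka-logs in config/server.properties); `clear_state` is a hypothetical helper. It deletes every topic and offset stored there, so stop ZooKeeper and Kafka first. It defaults to a dry run and does not kill any processes.

```python
import shutil
from pathlib import Path

# Defaults from the stock Kafka configs; change them if yours differ:
#   config/zookeeper.properties -> dataDir=/tmp/zookeeper
#   config/server.properties    -> log.dirs=/tmp/kafka-logs
DEFAULT_DIRS = (Path("/tmp/zookeeper"), Path("/tmp/kafka-logs"))


def clear_state(dirs=DEFAULT_DIRS, dry_run=True):
    """Delete ZooKeeper/Kafka data directories. Stop both services first.

    Returns the directories that were (or, with dry_run=True, would be)
    removed. Everything in them, including topics and offsets, is lost.
    """
    targets = [Path(d) for d in dirs if Path(d).is_dir()]
    if not dry_run:
        for d in targets:
            shutil.rmtree(d)
    return targets


if __name__ == "__main__":
    print("would remove:", clear_state())
    # clear_state(dry_run=False)  # uncomment to actually delete
```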

ZooKeeper does not start normally (Established session 0x10000025c8a0001 with negotiated timeout 6000) and Kafka fails

I had previously run zookeeper and kafka successfully many times, and I believe my installation and configurations are correct.
The only change I made was to the zookeeper config file:
dataDir=/Users/garynackenson/Downloads/kafka_2.12-2.0.0/data/zookeeper
and I created that directory.
Now when I run ZooKeeper, instead of the usual "binding to port 0.0.0.0/0.0.0.0:2181" message I get the output below, and Kafka fails with a "port 9092 in use" error (I have restarted my machine and checked every way I know that port 9092 is not in use).
The last message from ZooKeeper is below, and it does not look right:
INFO Established session 0x10000025c8a0001 with negotiated timeout 6000 for client /127.0.0.1:49977 (org.apache.zookeeper.server.ZooKeeperServer)
When ZooKeeper starts that way, Kafka fails with a "9092 in use" error (see below). I restarted and checked that nothing is using port 9092.
org.apache.kafka.common.KafkaException: Socket server failed to bind to 0.0.0.0:9092: Address already in use.
A little while later, I saw that ZooKeeper had a different issue:
INFO Closed socket connection for client /0:0:0:0:0:0:0:1:49986 which had sessionid 0x100000b679d0000 (org.apache.zookeeper.server.NIOServerCnxn)
I ran ZooKeeper again and saw the more ‘normal’ binding to port 2181 (the first of the four messages below):
[2018-10-03 18:25:08,064] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-10-03 18:25:09,055] INFO Accepted socket connection from /127.0.0.1:50014 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2018-10-03 18:25:09,062] INFO Client attempting to renew session 0x10000025c8a0001 at /127.0.0.1:50014 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-10-03 18:25:09,066] INFO Established session 0x10000025c8a0001 with negotiated timeout 6000 for client /127.0.0.1:50014 (org.apache.zookeeper.server.ZooKeeperServer)
But Kafka still fails every time.
Also, I sometimes get the following message when I start ZooKeeper:
[2018-10-03 18:10:36,097] INFO Got user-level KeeperException when processing sessionid:0x10000025c8a0001 type:delete cxid:0x47 zxid:0x179 txntype:-1 reqpath:n/a Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election (org.apache.zookeeper.server.PrepRequestProcessor)
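One way to double-check the "port 9092 is not in use" claim is to try binding the port yourself, which is roughly what the broker does when it opens its listener. A minimal Python sketch, assuming IPv4 and the 0.0.0.0:9092 address from the error message; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a TCP socket to host:port fails.

    This roughly mirrors what the broker does when it opens its listener;
    a failure (most commonly "Address already in use") means some other
    process holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    print("9092 in use:", port_in_use(9092))
```

If it reports the port as taken, `lsof -i :9092` on macOS or Linux shows which process is holding it.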

Kafka ZooKeeper connection drops continuously

I have set up a 3-node Kafka cluster and a 3-node ZooKeeper cluster on separate nodes. Using Kafka I can produce and consume messages successfully and run commands like kafka-topics.sh to list topics and get their details from ZooKeeper, but there are some errors in the Kafka server.log file. The following warning appears continuously:
[2018-02-18 21:50:01,241] WARN Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,242] INFO Client session timed out, have not heard from server in 320190154ms for sessionid 0x161a94b101f0001, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:01,343] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:50:01,989] INFO Opening socket connection to server zookeeper3/192.168.1.206:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,008] INFO Socket connection established to zookeeper3/192.168.1.206:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO Session establishment complete on server zookeeper3/192.168.1.206:2181, sessionid = 0x161a94b101f0001, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2018-02-18 21:50:02,042] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2018-02-18 21:59:31,570] INFO [Group Metadata Manager on Broker 102]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
It seems the Kafka sessions in ZooKeeper expire periodically!
The following warnings appear in the ZooKeeper logs, too:
2018-02-18 18:20:06,149 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0001, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,151 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.203:43162 which had sessionid 0x161a94b101f0001
2018-02-18 18:20:06,781 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x161a94b101f0002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 18:20:06,782 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.201:45330 which had sessionid 0x161a94b101f0002
2018-02-18 18:37:29,127 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#192] - Accepted socket connection from /192.168.1.202:52480
2018-02-18 18:37:29,139 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#942] - Client attempting to establish new session at /192.168.1.202:52480
2018-02-18 18:37:29,143 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer#687] - Established session 0x161a94b101f0003 with negotiated timeout 30000 for client /192.168.1.202:52480
2018-02-18 18:37:29,432 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1044] - Closed socket connection for client /192.168.1.202:52480 which had sessionid 0x161a94b101f0003
I think it's because ZooKeeper can't get heartbeats from the Kafka nodes. Here is the ZooKeeper zoo.cfg:
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
And the customized Kafka server.properties settings:
broker.id=1
listeners = PLAINTEXT://kafka1:9092
num.partitions=24
delete.topic.enable=true
default.replication.factor=2
log.dirs=/data/kafka/data
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
log.retention.hours=168
I use the same ZooKeeper cluster for Hadoop HA without any problem. I suspect something is wrong with the Kafka listeners and advertised.listeners properties; I read the Kafka documentation but couldn't understand what they mean.
In the hosts file of every machine, the hostnames zookeeper1 to zookeeper3 and kafka1 to kafka3 are defined and reachable with ping. I removed the following lines from the hosts file:
127.0.0.1 localhost
127.0.1.1 hostname
I don't think that could cause the problem.
Kafka version: 0.11
Zookeeper version: 3.4.10
Can anyone help?
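For reference, here is how the zoo.cfg above bounds client session timeouts. A small Python sketch, assuming ZooKeeper's documented defaults (minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime when they are not set) and Kafka 0.11's default zookeeper.session.timeout.ms of 6000; `parse_cfg` and `negotiated_timeout` are hypothetical helpers.

```python
ZOO_CFG = """\
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
"""


def parse_cfg(text: str) -> dict:
    """Parse key=value lines of a zoo.cfg, skipping blanks and # comments."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def negotiated_timeout(requested_ms: int, cfg: dict) -> int:
    """Clamp a client's requested session timeout into the server's bounds."""
    tick = int(cfg["tickTime"])  # set explicitly in the zoo.cfg above
    lo = int(cfg.get("minSessionTimeout", 2 * tick))
    hi = int(cfg.get("maxSessionTimeout", 20 * tick))
    return max(lo, min(requested_ms, hi))


cfg = parse_cfg(ZOO_CFG)
print(negotiated_timeout(6000, cfg))    # 6000 (Kafka 0.11's default request)
print(negotiated_timeout(30000, cfg))   # 30000
print(negotiated_timeout(1000, cfg))    # 4000, raised to 2 * tickTime
```

With tickTime=2000 the allowed range is 4000 to 40000 ms, so both the 6000 ms and 30000 ms sessions seen in the logs pass through unchanged.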
We were facing a similar issue with Kafka. As @Soheil pointed out, it was caused by major GCs.
When a major GC ran, Kafka was sometimes unable to send heartbeats to ZooKeeper. For us, a major GC was running almost once every 15 seconds. After taking a heap dump, we realized it was caused by a metrics memory leak in Kafka.
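To see why a long GC pause shows up as "Client session timed out, have not heard from server in ...ms", here is a rough Python model of the client-side timers. It assumes the 3.4-era ZooKeeper client behaviour as I understand it from ClientCnxn (read timeout = 2/3 of the negotiated session timeout, a ping after roughly half of that without sending); `ClientTimers` and `classify_pause` are hypothetical names, and the thresholds are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientTimers:
    """Approximate ZooKeeper client timers for a negotiated session timeout."""

    session_timeout_ms: int

    @property
    def read_timeout_ms(self) -> int:
        # Client drops the connection after this long without hearing from the server.
        return self.session_timeout_ms * 2 // 3

    @property
    def ping_interval_ms(self) -> int:
        # Client sends a heartbeat ping after roughly this long without sending anything.
        return self.read_timeout_ms // 2


def classify_pause(pause_ms: int, timers: ClientTimers) -> str:
    """Rough effect of a stop-the-world pause on the broker's ZooKeeper client.

    These are lower bounds: depending on when a pause starts, a somewhat
    shorter pause can also push the client past its read timeout.
    """
    if pause_ms >= timers.session_timeout_ms:
        return "session may expire on the server"
    if pause_ms >= timers.read_timeout_ms:
        return "client times out and reconnects"
    return "no visible effect"


t = ClientTimers(6000)  # negotiated timeout from the Kafka log in the question
print(t.read_timeout_ms, t.ping_interval_ms)  # 4000 2000
for pause in (1500, 5000, 9000):
    print(pause, classify_pause(pause, t))
```

With the 6000 ms timeout from the question, a pause of around 4 seconds is already enough to trigger the disconnect-and-reconnect cycle in the Kafka log.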

Zookeeper: Connection request from old client will be dropped if server is in r-o mode

storm version: 0.82
zookeeper version: 3.4.5.
We have a small Storm cluster (1 Nimbus and 3 supervisors), so we use a single ZooKeeper instance co-located with Storm Nimbus.
Infrequently we start getting the following errors in the zookeeper logs and our storm cluster comes to a standstill.
2014-04-05 13:27:32,885 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#197] - Accepted socket connection from /10.0.1.183:56121
2014-04-05 13:27:32,886 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#793] - Connection request from old client /10.0.1.183:56121; will be dropped if server is in r-o mode
2014-04-05 13:27:32,886 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#832] - Client attempting to renew session 0x1452dd02834002e at /10.0.1.183:56121
2014-04-05 13:27:32,886 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#595] - Established session 0x1452dd02834002e with negotiated timeout 40000 for client /10.0.1.183:56121
On the storm end we start seeing the following in supervisor and worker logs:
2014-04-05 11:37:29 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2014-04-05 11:37:29 cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2014-04-05 11:37:31 ClientCnxn [WARN] Session 0x1452dd028340015 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-04-05 11:37:42 CuratorFrameworkImpl [ERROR] Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:380)
at com.netflix.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:49)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:617)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
Do we need to downgrade zookeeper to 3.3.3 or is there a known issue/config that we're missing?
We also experienced several issues with Storm 0.9 and ZooKeeper 3.4.X, though not exactly the one you describe.
The Storm mailing list also reports such incompatibility issues:
https://mail.google.com/mail/u/0/#search/label%3Astorm+zookeeper+3.4/144313a45ba069b5
https://mail.google.com/mail/u/0/#search/label%3Astorm+zookeeper+3.4/1447d95d10ce7582
The latter points to this Storm pull request, which should hopefully let us use ZK 3.4.X with future versions of Storm once they are released:
https://github.com/apache/incubator-storm/pull/29
Until then, I would recommend downgrading ZK to 3.3.6 (you could install a separate ZK instance just for Storm if you absolutely need ZK 3.4.X for another system). You could also clone the Storm code and merge that pull request locally, or compile the latest version of the trunk, but that's a bit adventurous and more tiresome than just waiting for those nice folks to deliver a new release for us :)
A workaround for this situation is to clear Storm's data directory (configured via storm.local.dir in storm.yaml) and then restart the supervisor. In my test environment I cleared Storm's data directory and restarted Nimbus and the supervisor.
I think it's caused by a previous crash of the Storm cluster, from which the supervisor cannot recover.

testing kafka consumer and producer failed on connection

I have been trying to test a Kafka installation and, following the guide, created a producer and a consumer. When trying to retrieve a message I get the following error:
WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1146)
[2014-03-04 18:01:20,628] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2014-03-04 18:01:21,315] INFO Opening socket connection to server kafka-test/192.xxxxxx.110:2182 (org.apache.zookeeper.ClientCnxn)
[2014-03-04 18:01:21,418] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
Exception in thread "main" org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:151)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:112)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:123)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89)
at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:178)
at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)
[2014-03-04 18:01:21,419] INFO EventThread shut down (org.apache.zookeeper.ClientCnxn)
Kafka
Looks like you're not connecting to Zookeeper correctly. I'm not sure of your setup (multi-machine, VMs, containers) so it's hard to say what's wrong. From the debug output I see the following line hinting at your expected Zookeeper IP:
[2014-03-04 18:01:21,315] INFO Opening socket connection to server kafka-test/192.xxxxxx.110:2182 (org.apache.zookeeper.ClientCnxn)
Kafka looks for Zookeeper at the address specified by the zookeeper.connect configuration property in the $KAFKA_HOME/config/server.properties file. Be sure to edit that before starting Kafka. Also, try giving the actual public IP of your Zookeeper instance rather than 127.0.0.1, since that avoids a lot of confusion if you're running in containers. In your case it looks like it would be:
zookeeper.connect=192.xxxxxx.110:2182
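To see exactly which host:port pairs a zookeeper.connect value expands to (for example, to spot a 2182 vs 2181 mismatch), a small parser helps. This is a minimal Python sketch, assuming the usual host1:port1,host2:port2[/chroot] format with 2181 as the fallback port; `parse_zk_connect` is a hypothetical helper and does not handle IPv6 literals.

```python
def parse_zk_connect(connect: str, default_port: int = 2181):
    """Split a zookeeper.connect value into ([(host, port), ...], chroot).

    Format: host1:port1,host2:port2[/chroot]. A missing port falls back to
    default_port. IPv6 literals are not handled in this sketch.
    """
    hosts_part, slash, rest = connect.strip().partition("/")
    chroot = "/" + rest if slash else None
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:  # no ":" at all, so the entry is just a host name
            host, port = entry, str(default_port)
        servers.append((host, int(port)))
    return servers, chroot


print(parse_zk_connect("localhost:2182"))  # ([('localhost', 2182)], None)
```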
Also relevant to the Kafka config: if you're running on AWS or in a container, don't forget to update the following two configuration properties so that clients connecting to Kafka see the correct public IP:
advertised.host.name
advertised.port
and these two so that Kafka itself uses the correct internal IP:
host.name
port
Zookeeper
Zookeeper has some gotchas when setting it up as well. On your Zookeeper instance, don't forget to edit the server configuration property in the zoo.cfg (usually in /etc/zookeeper/conf) file to point to the correct IP for your Zookeeper instance. In your case probably the following:
server.1=192.xxxxxx.110:2888:3888
Those last two ports (2888 and 3888) are only needed if you're running a Zookeeper cluster: the first is for followers to connect to the leader, the second for leader election. If you have multiple Zookeeper servers, be sure to open both ports in any firewalls between them.
Check your Zookeeper connection with the telnet command:
telnet 192.xxxxxx.110 2181
If you get an error, check that the process is running:
ps -ef | grep "zookeeper.properties"
If it's not running, start it from the Kafka home directory:
bin/zookeeper-server-start.sh config/zookeeper.properties &
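If telnet isn't installed, the same check can be done from Python. A minimal sketch using only the standard library; `can_connect` and `ruok` are hypothetical helper names. `ruok` sends ZooKeeper's four-letter "ruok" command, which a healthy server answers with "imok" (on ZooKeeper 3.5 and later the command has to be allowed through 4lw.commands.whitelist).

```python
import socket


def can_connect(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Python stand-in for `telnet host port`: True if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host...
        return False


def ruok(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Send ZooKeeper's "ruok" command and return the reply ("imok" if healthy)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        s.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while chunk := s.recv(64):
            chunks.append(chunk)
    return b"".join(chunks).decode()
```

For example, run `can_connect(host, 2181)` with your ZooKeeper address: False points at the network or firewall, while True followed by anything other than "imok" from `ruok` points at ZooKeeper itself.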
Something is wrong with your Zookeeper configuration. Make sure Zookeeper is up and running; the default port it runs on is 2181.
A bit more info and some code would be useful, I believe.
I hit the same issue, and the problem was the max client connections property in the Zookeeper config.
If you see something like "maxClientCnxns = 20" in the config file in /etc/zookeeper/conf, comment it out and restart Zookeeper.
You may also check whether all available connections have already been used up. If you are using an API to connect to ZK, make sure you close the connection when you're done.
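To check whether one client IP is hitting the limit, you can tally open connections per IP from the ZooKeeper log. A rough Python sketch, assuming log lines shaped like the "Accepted socket connection from /IP:port" and "Closed socket connection for client /IP:port" messages quoted in the questions above; `open_connections_per_ip` and `over_limit` are hypothetical helpers. maxClientCnxns caps concurrent connections from a single IP, and connections opened before the log excerpt starts will not be counted.

```python
import re
from collections import Counter

# Patterns match the ZooKeeper log messages quoted in this thread.
ACCEPT = re.compile(r"Accepted socket connection from /(\S+):(\d+)")
CLOSE = re.compile(r"Closed socket connection for client /(\S+):(\d+)")


def open_connections_per_ip(log_lines):
    """Count connections that were accepted and not yet closed, per client IP."""
    open_conns = set()
    for line in log_lines:
        if m := ACCEPT.search(line):
            open_conns.add((m.group(1), int(m.group(2))))
        elif m := CLOSE.search(line):
            open_conns.discard((m.group(1), int(m.group(2))))
    return Counter(ip for ip, _ in open_conns)


def over_limit(counts, max_client_cnxns):
    """IPs whose open connection count has reached maxClientCnxns."""
    return {ip: n for ip, n in counts.items() if n >= max_client_cnxns}
```

Usage: read the ZooKeeper log with `open(path).readlines()`, pass the lines to `open_connections_per_ip`, then call `over_limit(counts, 20)` for the "maxClientCnxns = 20" case above.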
I also hit this problem. When I shut down the firewall on the ZK node, it worked.