Kafka Zookeeper Random Restarts

We are running a Hyperledger Fabric network with Kafka and ZooKeeper in production, using Docker Swarm on Azure VMs (4 Kafka nodes, 3 ZooKeeper nodes). It had been running fine, but two days ago ZooKeeper suddenly restarted, and since then it has kept restarting every 6-8 hours.
Logs on a Kafka node:
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,492] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Shutdown completed (kafka.server.ReplicaFetcherThread)
[2020-07-04 07:48:53,499] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions xxxx-xxxxx-xxx-xxxxx.
ZooKeeper leader logs:
2020-07-04 07:46:27,070 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x10101beb22c0000 type:create cxid:0x4 zxid:0x2e00000114 txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode = NodeExists for /brokers/ids
2020-07-04 07:48:43,084 [myid:3] - INFO [SessionTracker:ZooKeeperServer#355] - Expiring session 0x2010551ef290000, timeout of 6000ms exceeded
2020-07-04 07:48:43,085 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#487] - Processed session termination for sessionid: 0x2010551ef290000
2020-07-04 07:48:43,091 [myid:3] - INFO [CommitProcessor:3:NIOServerCnxn#1056] - Closed socket connection for client /100.0.20.80:60672 which had sessionid 0x2010551ef290000
2020-07-04 07:48:55,182 [myid:3] - ERROR [LearnerHandler-/100.0.20.80:58940:LearnerHandler#648] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2020-07-04 07:48:55,183 [myid:3] - WARN [LearnerHandler-/100.0.20.80:58940:LearnerHandler#661] - ******* GOODBYE /100.0.20.80:58940 ********
2020-07-04 07:49:57,623 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /100.0.20.80:37838
2020-07-04 07:49:57,637 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /100.0.20.80:37838
2020-07-04 07:49:57,641 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300ed4720900000 with negotiated timeout 12000 for client /100.0.20.80:37838
2020-07-04 07:49:57,670 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#653] - Got user-level KeeperException when processing sessionid:0x300ed4720900000 type:setData cxid:0x1 zxid:0x2e000003b2 txntype:-1 reqpath:n/a Error Path:/brokers/topics/xxxxxxxxxxxx/partitions/0/state Error:KeeperErrorCode = BadVersion for /brokers/topics/xxxxxxxxxxxx/partitions/0/state
My zoo.cfg:
clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=6000
initLimit=10
syncLimit=2
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=xxx.xxx.com:2888:3888
server.2=xxx.xxx.com:2888:3888
server.3=0.0.0.0:2888:3888
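As a rough aid for reading the logs above against this config, here is a small Python sketch that derives the millisecond values implied by zoo.cfg. It assumes ZooKeeper's documented defaults (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime, initLimit and syncLimit counted in ticks); the helper names are hypothetical. With tickTime=6000, a client asking for a 6000 ms session is clamped up to 12000 ms, which matches the "negotiated timeout 12000" line above. syncLimit=2 gives a follower roughly 12 s before the leader stops waiting on it, which, as far as we understand, is what the LearnerHandler "Read timed out" reflects.

```python
# Sketch: derive the millisecond timeouts implied by a zoo.cfg.
# Assumption: ZooKeeper defaults minSessionTimeout = 2 * tickTime and
# maxSessionTimeout = 20 * tickTime; initLimit/syncLimit are counted in ticks.


def parse_cfg(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def derived_timeouts(cfg: dict) -> dict:
    tick = int(cfg["tickTime"])
    return {
        # time allowed for followers to connect and sync to the leader
        "init_limit_ms": tick * int(cfg["initLimit"]),
        # how long a follower may fall behind / stay silent before being dropped
        "sync_limit_ms": tick * int(cfg["syncLimit"]),
        "min_session_ms": int(cfg.get("minSessionTimeout", 2 * tick)),
        "max_session_ms": int(cfg.get("maxSessionTimeout", 20 * tick)),
    }


def negotiate(requested_ms: int, t: dict) -> int:
    """Clamp a client's requested session timeout into [min, max]."""
    return max(t["min_session_ms"], min(requested_ms, t["max_session_ms"]))


ZOO_CFG = """
clientPort=2181
tickTime=6000
initLimit=10
syncLimit=2
"""

t = derived_timeouts(parse_cfg(ZOO_CFG))
print(t)
print(negotiate(6000, t))  # a 6000 ms request is raised to the 12000 ms minimum
```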

Related

Zookeeper + BadVersion for /brokers/topics/topic_name/partitions/6/state

In the ZooKeeper log we see a huge number of lines with the same error:
BadVersion for /brokers/topics/topic_name/partitions/6/state
Example from the ZooKeeper log:
2022-03-04 03:09:23,503 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor#643] - Got user-level KeeperException when processing sessionid:0x27f4d8506b21199 type:setData cxid:0x25483f7 zxid:0x280109b155 txntype:-1 reqpath:n/a Error Path:/brokers/topics/my_first_car/partitions/6/state Error:KeeperErrorCode = BadVersion for /brokers/topics/my_first_car/partitions/6/state
Any idea what this error means?
Other similar posts: https://zookeeper.apache.org/doc/r3.2.2/api/org/apache/zookeeper/ZooKeeper.html
Distributed state-machine's zookeeper ensemble fails while processing parallel regions with error KeeperErrorCode = BadVersion
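BadVersion is ZooKeeper's response to a conditional write: setData was called with an expected version that no longer matches the znode's current version, because another client updated the node in between (an expected version of -1 matches any version). The sketch below models that rule with a hypothetical in-memory store; ZnodeStore and BadVersionError are illustrative names, not the real client API. Kafka appears to update .../partitions/N/state this way using the version it last read, so a stale cached version would produce exactly this "user-level" INFO line; treat that as an assumption rather than a diagnosis.

```python
# Hypothetical in-memory model of ZooKeeper's conditional setData:
# every znode carries a data version; set_data succeeds only if the caller's
# expected version matches (or is -1, "any version"), and each write bumps it.


class BadVersionError(Exception):
    """Stand-in for KeeperErrorCode = BadVersion (not the real client class)."""


class ZnodeStore:
    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get_data(self, path):
        return self._nodes[path]  # (data, version)

    def set_data(self, path, data, expected_version):
        _, current = self._nodes[path]
        if expected_version != -1 and expected_version != current:
            raise BadVersionError(f"BadVersion for {path}")
        self._nodes[path] = (data, current + 1)
        return current + 1


store = ZnodeStore()
path = "/brokers/topics/my_first_car/partitions/6/state"
store.create(path, b'{"leader":1}')

_, seen = store.get_data(path)            # writer A reads version 0
store.set_data(path, b'{"leader":2}', 0)  # writer B writes first: version -> 1
try:
    store.set_data(path, b'{"leader":3}', seen)  # A writes with a stale version
except BadVersionError as e:
    print(e)
```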

To achieve Kafka-Mule integration using Mule 4, I installed ZooKeeper and Kafka on my machine. Both servers' connections drop after some time

I began by starting the ZooKeeper server and then the Kafka server on my machine. The connection is established for both servers but drops after a few minutes, throwing some errors.
ZooKeeper error log:
2020-05-11 14:32:36,908 [myid:] - INFO [SyncThread:0:FileTxnLog#284] - Creating new log file: log.1
2020-05-11 14:36:05,170 [myid:] - WARN [NIOWorkerThread-1:NIOServerCnxn#373] - Close of session 0x10000cc587b0003
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:324)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-05-11 14:36:10,589 [myid:] - INFO [SessionTracker:ZooKeeperServer#600] - Expiring session 0x10000cc587b0003, timeout of 6000ms exceeded
Kafka server error log:
[2020-05-11 14:33:41,077] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
[2020-05-11 14:36:04,762] ERROR Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint (kafka.server.LogDirFailureChannel)
java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
.....
[2020-05-11 14:36:04,776] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
[2020-05-11 14:36:04,776] ERROR [ReplicaManager broker=0] Error while writing to highwatermark file in directory C:\Kafka\Kafkakafka-logs (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to checkpoint file C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
Caused by: java.nio.file.FileAlreadyExistsException: C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint.tmp -> C:\Kafka\Kafkakafka-logs\replication-offset-checkpoint
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:81)
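The exception shows the broker writing replication-offset-checkpoint.tmp and then moving it over replication-offset-checkpoint, with the move failing on Windows because the destination already exists. The Python sketch below shows that write-to-temp-then-rename pattern only to illustrate what the broker is attempting; it is not Kafka's implementation, and write_checkpoint and the checkpoint contents are hypothetical. Python's os.replace overwrites an existing destination on both Windows and POSIX, but its docs only promise atomicity as a POSIX requirement.

```python
# Sketch of the write-to-temp-then-rename pattern suggested by the log
# ("...checkpoint.tmp -> ...checkpoint"). Not Kafka's code; names are hypothetical.
import os
import tempfile


def write_checkpoint(path: str, lines: list) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before swapping files
    # Replaces an existing destination; atomic on POSIX, not guaranteed on Windows.
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "replication-offset-checkpoint")
    write_checkpoint(target, ["illustrative", "contents v1"])
    write_checkpoint(target, ["illustrative", "contents v2"])  # overwrites the first
    with open(target, encoding="utf-8") as f:
        print(f.read().splitlines())
```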

Error when starting Kafka server in Ubuntu linux

I have been trying to set up Kafka on Ubuntu following the quickstart guide, but I am having an issue with the Kafka server. I can start ZooKeeper without problems; the problem occurs when starting the Kafka server with the following command:
bin/kafka-server-start.sh config/server.properties
In the Kafka server terminal I get the following error:
[2019-07-09 21:30:24,997] WARN [Controller id=0, targetBrokerId=0] Error connecting to node 0.0.0.18:9092 (id: 0 rack: null) (org.apache.kafka.clients.NetworkClient)
java.net.SocketException: Invalid argument
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:278)
at org.apache.kafka.common.network.Selector.connect(Selector.java:256)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:65)
at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:282)
at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:236)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-07-09 21:30:25,026] INFO [ExpirationReaper-0-ElectPreferredLeader]: Stopped (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-07-09 21:30:25,026] INFO [ExpirationReaper-0-ElectPreferredLeader]: Shutdown completed (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2019-07-09 21:30:25,029] INFO [ReplicaManager broker=0] Shut down completely (kafka.server.ReplicaManager)
In the ZooKeeper terminal I get the following:
INFO Got user-level KeeperException when processing sessionid:0x100004ba9750000 type:create cxid:0x2 zxid:0x3 txntype:-1 reqpath:n/a Error Path:/brokers Error:KeeperErrorCode = NoNode for /brokers (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-07-09 20:46:41,080] INFO Got user-level KeeperException when processing sessionid:0x100004ba9750000 type:create cxid:0x6 zxid:0x7 txntype:-1 reqpath:n/a Error Path:/config Error:KeeperErrorCode = NoNode for /config (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-07-09 20:46:41,087] INFO Got user-level KeeperException when processing sessionid:0x100004ba9750000 type:create cxid:0x9 zxid:0xa txntype:-1 reqpath:n/a Error Path:/admin Error:KeeperErrorCode = NoNode for /admin (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-07-09 20:46:41,233] INFO Got user-level KeeperException when processing sessionid:0x100004ba9750000 type:create cxid:0x15 zxid:0x15 txntype:-1 reqpath:n/a Error Path:/cluster Error:KeeperErrorCode = NoNode for /cluster (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-07-09 20:46:41,803] INFO Got user-level KeeperException when processing sessionid:0x100004ba9750000 type:multi cxid:0x32 zxid:0x1c txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-07-09 21:29:00,344] WARN Unable to read additional data from client sessionid 0x100004ba9750000, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
[2019-07-09 21:29:00,348] INFO Closed socket connection for client /127.0.0.1:39738 which had sessionid 0x100004ba9750000 (org.apache.zookeeper.server.NIOServerCnxn)
What could be the issue?
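The address in the controller's error, 0.0.0.18, lies in 0.0.0.0/8, the "this network" block that RFC 1122 says must not be sent as a destination, so a connect() failing with "Invalid argument" is not surprising. A hypothetical pre-flight check in Python for the addresses the broker might advertise is sketched below; it assumes (unverified) that the bad address came from what the machine's hostname resolves to, and both helper names are illustrative.

```python
# Hypothetical check: flag IPv4 addresses in 0.0.0.0/8 ("this network"),
# which RFC 1122 forbids as destinations.
import ipaddress
import socket

THIS_NETWORK = ipaddress.ip_network("0.0.0.0/8")


def is_unusable_destination(addr: str) -> bool:
    """True if addr is an IPv4 address inside 0.0.0.0/8."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in THIS_NETWORK


def resolve_ipv4(host: str) -> list:
    """IPv4 addresses host resolves to; empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})


print(is_unusable_destination("0.0.0.18"))  # address from the controller error: True
host = socket.gethostname()
for addr in resolve_ipv4(host):
    status = "UNUSABLE" if is_unusable_destination(addr) else "ok"
    print(f"{host} -> {addr}: {status}")
```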

Kafka 0.10.1.1 stops responding sporadically

We use Kafka 0.10.1.1. It runs fine for a few hours, sometimes a few days, and then all of a sudden it starts throwing the exception below and the brokers lose their connections to each other. The ZooKeeper and Kafka server processes keep running but do not accept any connections.
We run Kafka and ZooKeeper on the same node, in a 2-node cluster. This is just our dev environment.
The following exception was observed in the server log:
[2017-03-23 14:05:52,729] WARN [ReplicaFetcherThread-0-38], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest#1893e027 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 2 was disconnected before the response was read
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
at scala.Option.foreach(Option.scala:257)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
From zookeeper.log:
[2017-03-22 22:42:15,600] INFO Client attempting to establish new session at /10.141.202.141:59930 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-03-22 22:42:15,603] INFO Established session 0x25af82f142c0000 with negotiated timeout 6000 for client /11.121.102.441:59930 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-03-22 22:42:29,322] INFO Got user-level KeeperException when processing sessionid:0x25af82f142c0000 type:create cxid:0x3a8 zxid:0x1e0000001a txntype:-1 reqpath:n/a Error Path:/brokers Error:KeeperErrorCode = NodeExists for /brokers (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-03-22 22:42:29,329] INFO Got user-level KeeperException when processing sessionid:0x25af82f142c0000 type:create cxid:0x3a9 zxid:0x1e0000001b txntype:-1 reqpath:n/a Error Path:/brokers/ids Error:KeeperErrorCode = NodeExists for /brokers/ids (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-03-22 22:42:32,943] INFO Accepted socket connection from /17.150.218.7:58233 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-03-22 22:42:32,944] WARN Connection request from old client /17.150.218.7:58233; will be dropped if server is in r-o mode (org.apache.zookeeper.server.ZooKeeperServer)
[2017-03-22 22:42:32,944] INFO Client attempting to establish new session at /17.150.218.7:58233 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-03-22 22:42:32,947] INFO Established session 0x25af82f142c0001 with negotiated timeout 30000 for client /17.121.102.241:58233 (org.apache.zookeeper.server.ZooKeeperServer)
[2017-03-22 22:42:33,402] INFO Processed session termination for sessionid: 0x25af82f142c0001 (org.apache.zookeeper.server.PrepRequestProcessor)
[2017-03-22 22:42:33,405] INFO Closed socket connection for client /17.150.218.7:58233 which had sessionid 0x25af82f142c0001 (org.apache.zookeeper.server.NIOServerCnxn)
Thanks

Unable to start Kafka server/broker

When starting the Kafka broker I get some errors.
Here are the last few lines of the error log:
INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2016-05-12 01:07:01,759] INFO Log directory '/var/logs/service-bridge-logs' not found, creating it. (kafka.log.LogManager)
[2016-05-12 01:07:01,778] INFO Loading logs. (kafka.log.LogManager)
[2016-05-12 01:07:01,796] INFO Logs loading complete. (kafka.log.LogManager)
[2016-05-12 01:07:01,797] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2016-05-12 01:07:01,806] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2016-05-12 01:07:01,874] INFO Awaiting socket connections on gns3-d.cloudapp.net:9092. (kafka.network.Acceptor)
[2016-05-12 01:07:01,875] INFO [Socket Server on Broker 2], Started (kafka.network.SocketServer)
[2016-05-12 01:07:02,042] INFO Will not load MX4J, mx4j-tools.jar is not in the classpath (kafka.utils.Mx4jLoader$)
[2016-05-12 01:07:02,168] INFO 2 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2016-05-12 01:07:02,386] INFO Registered broker 2 at path /brokers/ids/2 with address 10.1.0.4:9092. (kafka.utils.ZkUtils$)
[2016-05-12 01:07:02,416] INFO [Kafka Server 2], started (kafka.server.KafkaServer)
[2016-05-12 01:07:02,529] INFO New leader is 2 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-05-12 01:07:25,798] ERROR Closing socket for /40.122.64.23 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at kafka.utils.Utils$.read(Utils.scala:380)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:444)
at kafka.network.Processor.run(SocketServer.scala:340)
at java.lang.Thread.run(Thread.java:745)
On the ZooKeeper side I am also getting a few errors:
INFO Established session 0x154a35b64f40000 with negotiated timeout 6000 for client /10.1.0.4:36673 (org.apache.zookeeper.server.ZooKeeperServer)
[2016-05-12 01:07:02,313] INFO Got user-level KeeperException when processing sessionid:0x154a35b64f40000 type:delete cxid:0x1d zxid:0x52 txntype:-1 reqpath:n/a Error Path:/admin/preferred_replica_election Error:KeeperErrorCode = NoNode for /admin/preferred_replica_election (org.apache.zookeeper.server.PrepRequestProcessor)
[2016-05-12 01:08:33,001] INFO Expiring session 0x154a35b64f40000, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2016-05-12 01:08:33,001] INFO Processed session termination for sessionid: 0x154a35b64f40000 (org.apache.zookeeper.server.PrepRequestProcessor)
[2016-05-12 01:08:33,017] INFO Closed socket connection for client /10.1.0.4:36673 which had sessionid 0x154a35b64f40000 (org.apache.zookeeper.server.NIOServerCnxn)
Any ideas?
Thanks in advance.
As user avr pointed out, this is a known bug in Kafka 0.8.2.x.
The "Closing socket ... because of error" messages are routine informational messages misclassified as ERROR.
As of Kafka 0.9.0.0, the log level has been corrected to WARN.