Spring Kafka does not shutdown properly

Spring Kafka does not shutdown properly - apache-kafka

I have configured ConcurrentMessageListenerContainer with concurrency of 3 to consume from 3 partitions, and also KafkaTemplate with producerFactory that produces messages to 3 partitions. Spring bean is configured with destroy-method to invoke stop() of the consumer listener container and the producer at application shutdown time. After shutdown, the log is shown below which looks like consumers has stopped, but there is no info if the producer is stopped or not.
[kafkaContainer-0-C-1] INFO org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer - Consumer stopped
[kafkaContainer-2-C-1] INFO org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer - Consumer stopped
[kafkaContainer-1-C-1] INFO org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer - Consumer stopped
but the application is not shutting down completely. Doing a jstack command shows 3 ThreadPoolTaskScheduler still running in the background. Output snippet of jstack command:
"ThreadPoolTaskScheduler-1" #74 prio=5 os_prio=0 tid=0x00007f8564f23800 nid=0x77e5 waiting on condition [0x00007f8525292000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000ec8f0808> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The above block is printed for each MessageListener, I mean if I set the concurrency to 5, then jstack output contains the above message 5 times. So I think even if consumer listener is logged as shutdown it is internally not shutting down completely.
I'm I missing something on shutting down the producer and consumer properly?

You don't say which version you are using.
This is fixed in 2.1.0, 2.0.2 and 1.3.2.
You don't need to stop() the container from a destroy() method; the context will stop the consumer when it is closed.

Related

Why does the message "The Critical Analyzer detected slow paths on the broker" mean in Artemis broker?

Setup: I have an artemis broker HA cluster with 3 brokers. The replication policy is replication. Each broker is running in its own VM.
Problem: When I leave my brokers running for long time, usually after 5-6 hours, I get the below error.
2022-11-21 21:32:37,902 WARN
[org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component
org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager
is expired on path 0 2022-11-21 21:32:37,902 INFO
[org.apache.activemq.artemis.core.server] AMQ224107: The Critical
Analyzer detected slow paths on the broker. It is recommended that
you enable trace logs on org.apache.activemq.artemis.utils.critical
while you troubleshoot this issue. You should disable the trace logs
when you have finished troubleshooting. 2022-11-21 21:32:37,902 ERROR
[org.apache.activemq.artemis.core.server] AMQ224079: The process for
the virtual machine will be killed, as component
org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager#46d59067
is not responsive 2022-11-21 21:32:37,969 WARN
[org.apache.activemq.artemis.core.server] AMQ222199: Thread dump:
******************************************************************************* Complete Thread dump "Thread-517
(ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7#437da279)"
Id=602 TIMED_WAITING on
java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.SynchronousQueue$TransferStack#75f49105
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
What does this really mean? I understand that the critical analyzer sees an error and it halts the broker but what is causing this error?

You may take a look at the documentation. Basically you are experiencing some issue tat the broker detects and it shuts down before it becomes too irresponsive. Setting the policy to LOG you might get more clues on the issue.

Kafka streams apps on same machine

I run into issues when starting two streams applications from the same machine. It has to do with the Cooperative rebalancing protocol. For some reason when the second one comes up the first one crashes with:
[streams-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [streams-stream-1-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[streams-StreamThread-1] INFO org.apache.kafka.streams.KafkaStreams - stream-client [streams-app.id-1] State transition from RUNNING to ERROR
[streams-StreamThread-1] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [streams-1] All stream threads have died. The instance will be in error state and should be closed.
[streams-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [streams-StreamThread-1] Shutdown complete
[streams-StreamThread-1] ERROR app.id - Uncaught exception in Streams thread (streams-StreamThread-1)
java.lang.IllegalStateException: Assignor supporting the COOPERATIVE protocol violates its requirements
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.validateCooperativeAssignment(ConsumerCoordinator.java:668)
I have tried using a separate state.dir on the two instances but that didn’t seem to work for me either.
Can be ignored, see first comment: Further, I notice the same behavior when trying to start two consumers with the CooperativeStickyAssignor configured. Is there something with regards to Cooperative Rebalancing configuration that I am missing?

Root cause for Connection broken for id 1, my id = 3, error =

I am using Confluent 4 for kafka and zookeeper installation.
On our Kafka Cluster environment (of 3 brokers and 3 zookeeper nodes running on 3 aws instances)
we are seeing a set of below warnings, repeatedly getting recorded in the broker's server.log file.
We have not observed any functionality issues due to this yet, but we are not able to find the root cause and there may be a chance in future it will affect the clients or other broker nodes. We are not sure yet about this. Below is the set of warnings
[2018-04-03 12:00:40,707] WARN Interrupted while waiting for message on queue (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1097)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:932)
[2018-04-03 12:00:40,707] WARN Connection broken for id 1, my id = 3, error = (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1013)
[2018-04-03 12:00:40,708] WARN Interrupting SendWorker (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2018-04-03 12:00:40,707] WARN Send worker leaving thread (org.apache.zookeeper.server.quorum.QuorumCnxManager)
This set of warnings get repeated and getting observed in all 3 kafka nodes.
If anyone has any idea about why this warning gets generate, then please let me know.
Thanks in advance.

This sounds like a known issue with newer version of Zk, Check out this JIRA https://issues.apache.org/jira/browse/ZOOKEEPER-2938

In my case, I was replacing a ZK node and the old one was still running which I didn't realize. So I had created 2x nodes with the same "myid".

If zookeeper leader process is killed, should all followers get exceptions and restart too?

I'm working on a project using Zookeeper 3.4.6, and am performing some failure mode testing. While doing so, I found (what I think is) unexpected behaviour.
Should followers restart if the leader Zookeeper process is killed?
Environment:
OS: Windows Server 2008 R2 (hosted in a Tanuki Java service wrapper)
Zookeeper: 3.4.6
Java JDK: 1.7.0.210
Tests:
The test is to kill Zookeeper processes and make sure the cluster recovers.
If I kill a non-leader process, it restarts and rejoins the cluster without affecting other nodes.
If I kill the leader process, the leader and followers restart. This doesn't seem right, as there's a period of time where clients can't connect to any Zookeeper node.
I've tried both TCP and UDP communication settings, but both exhibit the same behaviour. UDP is twice as quick to recover though.
Zookeeper settings
tickTime=2000
initLimit=5
syncLimit=2
minSessionTimeout=5000
maxSessionTimeout=120000
dataDir=C:\\ProgramData\\Saab OneView\\ZooKeeper\\zoo-data
clientPort=2181
leaderServes=yes
autopurge.purgeInterval=24
# IP addresses blanked out here
server.1=0.0.0.1:2888:3888
server.2=0.0.0.2:2888:3888
server.3=0.0.0.3:2888:3888
server.4=0.0.0.4:2888:3888
server.5=0.0.0.5:2888:3888
# This is for zookeeper->zookeeper communication
# I've tried both settings, UDP has faster recovery time
# 0 = UDP
# 3 = TCP (default)
electionAlg=3
Sample follower exception causing shutdown
20160309 05:35:51.958Z 20160309 05:35:51.958 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#780] - Connection broken for id 4, my id = 3, error =
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#783] - Interrupting SendWorker
20160309 05:35:51.959Z 20160309 05:35:51.959 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower#89] - Exception when following the leader
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
20160309 05:35:51.960Z 20160309 05:35:51.960 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower#166] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)

Based on ZOOKEEPER-3478 it is an expected behaviour:
It is normal behaviour that all the followers shutdown during a leader election. Since there is no leader after a leader crash, the servers that used to be followers are not followers anymore. So the followers shutdown and go back to LOOKING state in order to find the new leader.

Apache Kafka - Consumer not receiving messages from producer

I would appreciate your help on this.
I am building a Apache Kafka consumer to subscribe to another already running Kafka. Now, my problem is that when my producer pushes message to server...my consumer does not receive them .. and I get the below info in my logs printed::
13/08/30 18:00:58 INFO producer.SyncProducer: Connected to xx.xx.xx.xx:6667:false for producing
13/08/30 18:00:58 INFO producer.SyncProducer: Disconnecting from xx.xx.xx.xx:6667:false
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping leader finder thread
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] Stopping all fetchers
13/08/30 18:00:58 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager- 1377910855898] All connections stopped
I am not sure if I am missing any important configuration here...However, I can see some messages coming from my server using WireShark but they are not getting consumed by my consumer....
My code is the exact replica of the sample consumer example:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
UPDATE:
[2013-09-03 00:57:30,146] INFO Starting ZkClient event thread.
(org.I0Itec.zkclient.ZkEventThread)
[2013-09-03 00:57:30,146] INFO Opening socket connection to server /xx.xx.xx.xx:2181 (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,235] INFO Connected to xx.xx.xx:6667 for producing (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,299] INFO Socket connection established to 10.224.62.212/10.224.62.212:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2013-09-03 00:57:30,399] INFO Disconnecting from xx.xx.xx.net:6667 (kafka.producer.SyncProducer)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping leader finder thread (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] Stopping all fetchers (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [ConsumerFetcherManager-1378195030845] All connections stopped (kafka.consumer.ConsumerFetcherManager)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx-1378195030443-cce6fc51], Cleared all relevant queues for this fetcher (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.-1378195030443-cce6fc51], Cleared the data chunks in all the consumer message iterators (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,400] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Committing all offsets after clearing the fetcher queues (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] ERROR [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], zk client is null. Cannot commit offsets (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], Releasing partition ownership (kafka.consumer.ZookeeperConsumerConnector)
[2013-09-03 00:57:30,401] INFO [console-consumer-49997_xx.xx.xx.xx-1378195030443-cce6fc51], exception during rebalance (kafka.consumer.ZookeeperConsumerConnector)
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:185)
at scala.None$.get(Option.scala:183)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:434)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance$2.apply(ZookeeperConsumerConnector.scala:429)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(ZookeeperConsumerConnector.scala:429)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(ZookeeperConsumerConnector.scala:374)
at scala.collection.immutable.Range$ByOne$class.foreach$mVc$sp(Range.scala:282)
at scala.collection.immutable.Range$$anon$2.foreach$mVc$sp(Range.scala:265)
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:369)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:681)
at kafka.consumer.ZookeeperConsumerConnector$WildcardStreamsHandler.<init>(ZookeeperConsumerConnector.scala:715)
at kafka.consumer.ZookeeperConsumerConnector.createMessageStreamsByFilter(ZookeeperConsumerConnector.scala:140)
at kafka.consumer.ConsoleConsumer$.main(ConsoleConsumer.scala:196)
at kafka.consumer.ConsoleConsumer.main(ConsoleConsumer.scala)

Can you please provide your producer code sample?
Do you have the latest 0.8 version checked out? It appears that there has been some known issue with consumerFetched deadlock which has been patched and fixed in the current version
you can try to use the admin console script to consume messages making sure your producer is working fine.
If possible post some more logs and code snippet, should help debugging further
(it seems I need more reputation to make a comment so had to answer instead)