Artemis master service got shut down with IO error - activemq-artemis

Below is the scenario from our client environment, in the hope that it helps you provide the required inputs:
The master and slave were shut down.
After that, only the master service was started; the client missed (might have forgotten) starting the slave service.
The master shut down with the IO error message below after some time.
2020-12-07 10:45:40,717 ERROR [org.apache.activemq.artemis.journal] AMQ144002: Error pushing opened file: ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149000: failed to rename file activemq-data-495656.amq.tmp to activemq-data-495656.amq]
at org.apache.activemq.artemis.core.io.AbstractSequentialFile.renameTo(AbstractSequentialFile.java:160) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.createFile0(JournalFilesRepository.java:633) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.createFile(JournalFilesRepository.java:574) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.takeFile(JournalFilesRepository.java:535) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.pushOpenedFile(JournalFilesRepository.java:486) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository$1.run(JournalFilesRepository.java:92) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_271]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_271]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
2020-12-07 10:45:40,717 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149000: failed to rename file activemq-data-495656.amq.tmp to activemq-data-495656.amq]
at org.apache.activemq.artemis.core.io.AbstractSequentialFile.renameTo(AbstractSequentialFile.java:160) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.createFile0(JournalFilesRepository.java:633) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.createFile(JournalFilesRepository.java:574) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.takeFile(JournalFilesRepository.java:535) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.pushOpenedFile(JournalFilesRepository.java:486) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository$1.run(JournalFilesRepository.java:92) [artemis-journal-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_271]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_271]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
So we would like to know whether the file rename failures are due to file system issues, or whether the slave being stopped could also cause this. If an additional question needs to be raised, we will do so.

The slave being stopped would certainly not cause this problem. Even if it is running, the slave won't access any of the journal files while the master is running, because the master will hold a lock. To be clear, the master has no strict dependency on the slave; if the slave isn't running, the master will simply operate normally.
Therefore the only conclusion that seems at all plausible is that something went wrong with the file system.
There is also always the possibility of a bug, although I've never seen any bugs reported for this particular operation. If you are able to reproduce this failure (even intermittently) then please file a Jira.
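For intuition about the lock mentioned above: with shared-store HA the live broker holds an exclusive OS-level file lock, and any other process attempting the same lock simply fails to acquire it. A minimal sketch of that mechanism (the file name server.lock is illustrative, and this is not Artemis's actual locking code):

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class JournalLockSketch {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = new RandomAccessFile("server.lock", "rw").getChannel()) {
            // tryLock returns null when another process already holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                System.out.println("Lock held elsewhere (the live broker); not touching the journal.");
            } else {
                System.out.println("Lock acquired; safe to act as the live broker.");
                lock.release();
            }
        }
    }
}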

Related

ActiveMQ Artemis service is shutting down with AMQ144002: Error pushing opened file

Master and slave servers are configured in an Artemis cluster. The master service shut down with the following exceptions in the service-out logs:
2020-12-07 10:45:40,717 ERROR [org.apache.activemq.artemis.journal] AMQ144002: Error pushing opened file: ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149000: failed to rename file activemq-data-495656.amq.tmp to activemq-data-495656.amq]
2020-12-07 10:45:40,717 WARN [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149000: failed to rename file activemq-data-495656.amq.tmp to activemq-data-495656.amq]
(The stack traces accompanying both messages are identical to those shown above.)
ActiveMQ Artemis will shut itself down on any IO error deemed "critical." This is because the broker is essentially worthless if it can't work with the underlying persistent store. Therefore, it's better to shut down and fix the underlying issue than to keep attempting IO operations and failing.
In this case the broker is trying to rename one of the journal files, but the rename operation fails. The broker is calling java.io.File.renameTo(File), which returns false if the rename fails for any reason. Unfortunately, renameTo doesn't specify why the rename failed.
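To see why that is hard to diagnose, compare it with java.nio.file.Files.move, which throws an exception whose message usually names the cause. A minimal sketch (the file names are reused from the log above for illustration; this is a diagnostic probe, not what the broker itself runs):

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameProbe {
    public static void main(String[] args) throws IOException {
        File tmp = new File("activemq-data-495656.amq.tmp");
        File target = new File("activemq-data-495656.amq");
        // File.renameTo reports only success or failure, never the cause.
        if (!tmp.renameTo(target)) {
            // Files.move throws an IOException that usually names the
            // underlying cause (permissions, missing source, full disk, ...).
            Files.move(tmp.toPath(), target.toPath());
        }
    }
}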
Since you're using HA, all the clients should fail over to the backup and continue operating normally. My recommendation would be to inspect the filesystem for potential issues and restart the master broker when ready.

Getting so many exceptions while running the Kafka Connect worker

Getting so many exceptions while running the Kafka Connect worker.
I have set all the worker properties, and all the jar paths look fine.
The exceptions are below:
2020-07-23 18:41:58 WARN Reflections:104 - could not create Dir using jarFile from url file:/kafka/bin/../clients/build/libs/kafka-clients*.jar. skipping.
java.lang.NullPointerException
at java.util.zip.ZipFile.<init>(ZipFile.java:213)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.reflections.vfs.Vfs$DefaultUrlTypes$1.createDir(Vfs.java:216)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:99)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:240)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader$InternalReflections.scan(DelegatingClassLoader.java:373)
at org.reflections.Reflections$1.run(Reflections.java:198)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-07-23 18:41:58 WARN Reflections:377 - could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: Could not open url connection
at org.reflections.vfs.JarInputDir$1$1.<init>(JarInputDir.java:37)
at org.reflections.vfs.JarInputDir$1.iterator(JarInputDir.java:33)
at org.reflections.Reflections.scan(Reflections.java:243)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader$InternalReflections.scan(DelegatingClassLoader.java:373)
at org.reflections.Reflections$1.run(Reflections.java:198)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /kafka/bin/../clients/build/libs/kafka-clients*.jar (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
If you are seeing WARN Reflections, then this is a warning, not an error, and it is safe to ignore.
You can edit the log4j.properties file to silence the warnings if you want. With the Confluent Docker images, this is done via the CONNECT_LOG4J_LOGGERS variable.
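For example, raising the threshold for the Reflections logger might look like this (the logger name is taken from the stack trace above; treat the exact lines as an assumption to verify against your distribution's log4j.properties):
# In the worker's log4j.properties:
log4j.logger.org.reflections=ERROR
# Or, with the Confluent Docker images:
CONNECT_LOG4J_LOGGERS=org.reflections=ERROR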
Thanks, yes, these are all warnings, and with a newer version of the Kafka client library I am not seeing them.

Windows Kafka java.nio.file.FileSystemException

This error occurs very frequently on Windows Server 2012.
Kafka version 2.3.1
The error log:
[2019-12-05 03:57:51,567] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for MetadataLog-0 in dir D:\GpsPlatform\kafka\.\tmp\kafka-logs
Caused by: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: The process cannot access the file because it is being used by another program.
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:815)
at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:509)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1982)
at kafka.log.Log.deleteSegment(Log.scala:1967)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1493)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1493)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1493)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
at kafka.log.Log.deleteSegments(Log.scala:1484)
at kafka.log.Log.deleteOldSegments(Log.scala:1479)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1557)
at kafka.log.Log.deleteOldSegments(Log.scala:1547)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:914)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:911)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:911)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Suppressed: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: The process cannot access the file because it is being used by another program.
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:812)
... 29 more
After running for a period of time, a similar exception is reported, causing Kafka to crash. How can I completely resolve this exception?
If you have to use Kafka in a Windows environment, you have to disable log retention.
In Kafka's server.properties:
log.retention.hours=-1
log.cleaner.enable=false
# Remove any other lines starting with log.retention.*
To run Kafka on Windows it's recommended to do so using WSL2, as detailed here. Otherwise you will encounter the kind of problems described above.
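For context on the Suppressed entry in the log: the rename goes through Kafka's org.apache.kafka.common.utils.Utils.atomicMoveWithFallback, which tries an atomic move first and retries non-atomically on failure. A minimal sketch of that pattern (an illustration of the idea, not Kafka's exact code; the file names are taken from the log above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicMoveSketch {
    static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            // Preferred: an atomic rename within the same filesystem.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicFailure) {
            try {
                // Fallback: a plain move. On Windows this also fails while
                // another process (or an unreleased memory mapping) holds the file open.
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException fallbackFailure) {
                // Both attempts failed; the first failure surfaces as the
                // "Suppressed:" exception seen in the log above.
                fallbackFailure.addSuppressed(atomicFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        atomicMoveWithFallback(Paths.get("00000000000003368617.index"),
                Paths.get("00000000000003368617.index.deleted"));
    }
}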

Kafka 0.9.0.1 fails to start with fatal exception

I see it deleting and rebuilding some indexes; I found that this is expected in 0.9.0.1.
But after that it fails with an unsafe memory access error. Any hints on this?
[2016-03-16 22:14:01,113] WARN Found a corrupted index file, /kafka_data/kafkain-3655/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,137] WARN Found a corrupted index file, /kafka_data/kafkain-1172/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,151] WARN Found a corrupted index file, /kafka_data/kafkain-2362/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,152] ERROR There was an error in one of the threads during logs loading: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code (kafka.log.LogManager)
[2016-03-16 22:14:01,154] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-16 22:14:01,158] INFO shutting down (kafka.server.KafkaServer)
This error could be due to the node being out of space in one of the log.dirs directories. In itself, the removal and rebuilding of the index is not terrible, but if space is insufficient, the node cannot be started. If the replication factor allows it, you can simply remove part of the log; after the brokers run normally again, all the data will be re-replicated.
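A quick way to check the available space from the JVM's point of view (the path is taken from the log above; substitute your own log.dirs entries):

import java.io.File;

public class DiskSpaceCheck {
    public static void main(String[] args) {
        File logDir = new File("/kafka_data");
        // Usable space as seen by the JVM, in megabytes.
        System.out.printf("%s: %d MB usable of %d MB total%n",
                logDir, logDir.getUsableSpace() >> 20, logDir.getTotalSpace() >> 20);
    }
}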

KafkaSpout keeps throwing OutOfMemory error

KafkaSpout keeps throwing an OutOfMemory error whenever I try to deploy multiple topologies. I checked the file descriptors and memory, and checked the worker logs.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:237)
at storm.kafka.ZkState.<init>(ZkState.java:62)
at storm.kafka.KafkaSpout.open(KafkaSpout.java:85)
at backtype.storm.daemon.executor$fn__3373$fn__3388.invoke(executor.clj:522)
at backtype.storm.util$async_loop$fn__464.invoke(util.clj:461)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
Error from worker log file:
2015-02-07T05:43:48.657+0000 o.a.z.ClientCnxn [DEBUG] Reading reply sessionid:0x34afd2eb46d25ec, packet:: clientPath:null serverPath:null finished:false header:: 33,4 replyHeader:: 33,-1,0 request:: '/brokers/ids/2,F response:: #7b226a6d785f706f7274223a31313036312c2274696d657374616d70223a2231343137353737373732343937222c22686f7374223a2264616c2d6b61666b612d62726f6b657230312e6266642e77616c6d6172742e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,s{30064774364,30064774364,1417577772497,1417577772497,0,0,0,164959970529443843,114,0,30064774364}
2015-02-07T05:43:48.657+0000 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$fn__3812$fn__3813.invoke(worker.clj:456) [storm-core-0.9.3.jar:0.9.3]
at backtype.storm.daemon.executor$mk_executor_data$f...(executor.clj:240) [storm-core-0.9.3.jar:0.9.3]
at backtype.storm.daemon.executor$mk_executor$fn__3312.invoke(executor.clj:334) [storm-core-0.9.3.jar:0.9.3]
at backtype.storm.daemon.executor$mk_executor.invoke(executor.clj:334) [storm-core-0.9.3.jar:0.9.3]
at backtype.storm.daemon.worker$fn__3743$exec_fn__1108__auto____3744$iter__3749__3753$fn__3754.invoke(worker.clj:382) [storm-core-0.9.3.jar:0.9.3]
I found the solution. The reason for this was the limit on the maximum number of user processes, which can be seen using ulimit -a.
The solution to increase this number is discussed here:
How do I change the number of open files limit in Linux?
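For reference, checking and persistently raising the process limit might look like this (the user name "storm" and the values are hypothetical examples, not recommendations):
ulimit -u
# shows the current "max user processes" limit for this shell
# /etc/security/limits.conf — persistent per-user limits:
storm soft nproc 10240
storm hard nproc 10240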