Kafka Node crashes intermittently when data is pushed from producer - apache-kafka

We have a 3 node Kafka cluster (version 5.2.1, apache kafka version: 2.2.0) in our environment. For sometime we have been observing an exception which happens intermittently whenever we try to push data from a test producer. Following is the exception:
[Log partition=debug-topic-1, dir=/tmp/kafka-logs] Found deletable segments with base offsets [4] due to retention time 604800000ms breach (kafka.log.Log:66)
[2020-04-20 22:42:39,303] INFO [ProducerStateManager partition=debug-topic-1] Writing producer snapshot at offset 5 (kafka.log.ProducerStateManager:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 5 in 1 ms. (kafka.log.Log:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Scheduling log segment [baseOffset 4, size 84] for deletion. (kafka.log.Log:66)
[2020-04-20 22:42:39,310] ERROR Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
at kafka.log.Log.deleteSegment(Log.scala:1909)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
at kafka.log.Log.deleteSegments(Log.scala:1446)
at kafka.log.Log.deleteOldSegments(Log.scala:1441)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
at kafka.log.Log.deleteOldSegments(Log.scala:1509)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
... 30 more
[2020-04-20 22:42:39,311] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
at kafka.log.Log.deleteSegment(Log.scala:1909)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
at kafka.log.Log.deleteSegments(Log.scala:1446)
at kafka.log.Log.deleteOldSegments(Log.scala:1441)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
at kafka.log.Log.deleteOldSegments(Log.scala:1509)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
... 30 more
We have other producers which push data continuously to different topics in the cluster, but the aforementioned issue never happens.
I have tried to delete and recreate this topic debug-topic-1 several times to ensure that no corrupt or faulty state in present in zookeeper as well as in the kakfa logs. But still this problem occurs after sometime eventually.
If anyone has encountered similar problem and was able to get through it kindly let me know.

Seems like your machine might have rebooted or that /tmp was cleared in some other way.
You must change Kafka log.dirs (and Zookeeper dataDir) to not use /tmp

Related

Kafka crashing after upgrade

I upgraded my kafka from 2.2.1 to 3.1.0.
After upgrade one of the broker stopped having below exception.
2022-05-09 13:36:15,051 ERROR kafka.server.ReplicaManager: [ReplicaManager broker=344] Error processing append operation on partition grpc_flow_logs_prod-0
java.lang.NullPointerException
at kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$4(LogValidator.scala:428)
at scala.Option.orElse(Option.scala:447)
at kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$3(LogValidator.scala:426)
at scala.Option.orElse(Option.scala:447)
at kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$2(LogValidator.scala:426)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:422)
at java.lang.Iterable.forEach(Iterable.java:75)
at kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:407)
at kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:112)
at kafka.log.UnifiedLog.$anonfun$append$2(UnifiedLog.scala:802)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:1707)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:718)
at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1057)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1045)
at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:924)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:912)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:583)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:658)
at kafka.server.KafkaApis.handle(KafkaApis.scala:169)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(Thread.java:745)

Spark streaming, kafka broker error, "Failed to get records for spark-executor- .... after polling for 512"

We have a spark streaming application reading data from Kafka.
Data size: 15 Million
Below errors were seen:
java.lang.AssertionError: assertion failed: Failed to get records for spark-executor- after ...polling for 512 at scala.Predef$.assert(Predef.scala:170)
There were more errors seen pertaining to CachedKafkaConsumer
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The spark.streaming.kafka.consumer.poll.ms is set to default 512ms and other Kafka stream timeout settings are default.
"request.timeout.ms"
"heartbeat.interval.ms"
"session.timeout.ms"
"max.poll.interval.ms"
Also, the Kafka is being recently updated to 0.10 from 0.8. In 0.8, this behavior was not seen.
No resource issues are seen.
Any pointers?

windows kafka java.nio.file.FileSystemException

Very frequently error in windows server 2012
kafka verson 2.3.1
the error log
[2019-12-05 03:57:51,567] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for MetadataLog-0 in dir D:\GpsPlatform\kafka\.\tmp\kafka-logs
Caused by: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: 另一个程序正在使用此文件,进程无法访问。
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:815)
at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:509)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1982)
at kafka.log.Log.deleteSegment(Log.scala:1967)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1493)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1493)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1493)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
at kafka.log.Log.deleteSegments(Log.scala:1484)
at kafka.log.Log.deleteOldSegments(Log.scala:1479)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1557)
at kafka.log.Log.deleteOldSegments(Log.scala:1547)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:914)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:911)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:911)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Suppressed: java.nio.file.FileSystemException: D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index -> D:\GpsPlatform\kafka\.\tmp\kafka-logs\MetadataLog-0\00000000000003368617.index.deleted: 另一个程序正在使用此文件,进程无法访问。
at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1425)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:812)
... 29 more
after running for a period of time, a similar exception will be reported, causing Kafka to crash. How to completely resolve this exception?
If you have to use kafka in windows environment. You have to disable log retention.
In Kafka server.properties
log.retention.hours=-1
log.cleaner.enable=false
# Remove any other rows start from log.retention.*
To run Kafka on Windows it's recommended to do so using WSL2 as detailed here. Otherwise you encounter the kind of problems described above.

kafka.common.KafkaStorageException: I/O exception in append to log

I get some big problems of kafka,when I shutdown my consumer application then change a groupId and restart it,my kafka brokers will stop working, this is the stack trace I get
[2016-07-11 17:02:47,314] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,0] (kafka.coordinator.GroupMetadataManager)
[2016-07-11 17:02:47,955] FATAL [Replica Manager on Broker 0]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-38'
at kafka.log.Log.append(Log.scala:318)
at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
at scala.Option.foreach(Option.scala:236)
at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-38/00000000000000000000.index (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
Probably your /tmp is automatically cleaned up i.e. systemd-tmpfiles.
https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html

kafka 0.9.0.1 fails to start with fatal exception

I see deleting and rebuilding some index. Found that its expected in 0.9.0.1
but after that it fails saying unsafe memory access, any hints on this?
2016-03-16 22:14:01,113] WARN Found a corrupted index file, /kafka_data/kafkain-3655/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,137] WARN Found a corrupted index file, /kafka_data/kafkain-1172/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,151] WARN Found a corrupted index file, /kafka_data/kafkain-2362/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,152] ERROR There was an error in one of the threads during logs loading: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code (kafka.log.LogManager)
[2016-03-16 22:14:01,154] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-16 22:14:01,158] INFO shutting down (kafka.server.KafkaServer)
This error could be due to the fact that the node is out of space in log.dirs. In itself, the removal and rebuilding of the index - it's not terrible, but if space is insufficient, the node can not be started. If replication factor allows it, you can simply remove the part of the log, then after they run normally all data replicated