On startup I see Kafka deleting and rebuilding some index files. I found that this is expected in 0.9.0.1, but after that the broker fails with an unsafe memory access error. Any hints on this?
[2016-03-16 22:14:01,113] WARN Found a corrupted index file, /kafka_data/kafkain-3655/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,137] WARN Found a corrupted index file, /kafka_data/kafkain-1172/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,151] WARN Found a corrupted index file, /kafka_data/kafkain-2362/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-03-16 22:14:01,152] ERROR There was an error in one of the threads during logs loading: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code (kafka.log.LogManager)
[2016-03-16 22:14:01,154] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
at kafka.log.LogSegment.recover(LogSegment.scala:199)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
at kafka.log.Log.loadSegments(Log.scala:160)
at kafka.log.Log.<init>(Log.scala:90)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-03-16 22:14:01,158] INFO shutting down (kafka.server.KafkaServer)
This error could be due to the node running out of space in log.dirs. Deleting and rebuilding the index is not a problem in itself, but if there is not enough free space the broker cannot start. If the replication factor allows it, you can simply remove part of the logs; once the brokers are running normally again, the data will be re-replicated.
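For example, a minimal sketch assuming log.dirs points at /kafka_data as in the log above (the partition directory name is only illustrative):

# check free space on the filesystem holding the log directory
df -h /kafka_data
# with the broker stopped, and only if the partition has replicas on other brokers,
# an oversized partition directory can be removed and re-replicated after startup
rm -rf /kafka_data/kafkain-3655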
Related
We have a 3 node Kafka cluster (version 5.2.1, Apache Kafka version 2.2.0) in our environment. For some time we have been observing an exception that happens intermittently whenever we try to push data from a test producer. Following is the exception:
[Log partition=debug-topic-1, dir=/tmp/kafka-logs] Found deletable segments with base offsets [4] due to retention time 604800000ms breach (kafka.log.Log:66)
[2020-04-20 22:42:39,303] INFO [ProducerStateManager partition=debug-topic-1] Writing producer snapshot at offset 5 (kafka.log.ProducerStateManager:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 5 in 1 ms. (kafka.log.Log:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Scheduling log segment [baseOffset 4, size 84] for deletion. (kafka.log.Log:66)
[2020-04-20 22:42:39,310] ERROR Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
at kafka.log.Log.deleteSegment(Log.scala:1909)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
at kafka.log.Log.deleteSegments(Log.scala:1446)
at kafka.log.Log.deleteOldSegments(Log.scala:1441)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
at kafka.log.Log.deleteOldSegments(Log.scala:1509)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
... 30 more
[2020-04-20 22:42:39,311] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
at kafka.log.Log.deleteSegment(Log.scala:1909)
at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
at kafka.log.Log.deleteSegments(Log.scala:1446)
at kafka.log.Log.deleteOldSegments(Log.scala:1441)
at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
at kafka.log.Log.deleteOldSegments(Log.scala:1509)
at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
... 30 more
We have other producers that push data continuously to different topics in the cluster, and this issue never happens for them.
I have tried deleting and recreating the topic debug-topic-1 several times to ensure that no corrupt or faulty state is present in ZooKeeper or in the Kafka logs, but the problem still eventually occurs again after some time.
If anyone has encountered a similar problem and was able to get through it, kindly let me know.
Seems like your machine might have rebooted, or /tmp was cleared in some other way.
You must change Kafka log.dirs (and ZooKeeper dataDir) so that they do not use /tmp.
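For example, a sketch of the relevant settings (the paths are placeholders; any persistent directory will do):

# config/server.properties
log.dirs=/var/lib/kafka-logs
# config/zookeeper.properties (if you run the bundled ZooKeeper)
dataDir=/var/lib/zookeeper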
Thanks in advance. Please help me resolve the Kafka error mentioned below.
00000000000000.txnindex and rebuilding index... (kafka.log.Log)
[2018-09-25 12:48:05,462] ERROR There was an error in one of the threads during logs loading: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code (kafka.log.LogManager)
[2018-09-25 12:48:05,469] FATAL [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.loadBatchWithSize(FileLogInputStream.java:209)
at org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.loadFullBatch(FileLogInputStream.java:192)
at org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.ensureValid(FileLogInputStream.java:164)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:263)
at kafka.log.LogSegment$$anonfun$recover$1.apply(LogSegment.scala:262)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
My guess is that the disk holding your Kafka log directory (log.dirs in server.properties) is out of space. When Kafka attempts to rebuild the index, there is not enough free space, and therefore the Kafka broker cannot be started.
Assuming log.dirs=/var/lib/kafka
df -hT /var/lib/kafka
will display storage usage for your log directory.
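If the disk is indeed full, a rough way to see which partition directories take up the most space (assuming the same log.dirs) is:

du -sh /var/lib/kafka/* | sort -h | tail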
I have a 3 node Kafka cluster. One of the brokers is not starting, and I am getting the error below. I have tried deleting the index files, but the same error still occurs. Please help me understand what this issue is and how I can recover.
INFO [2018-09-05 11:58:49,585] kafka.log.Log:[Logging$class:info:66] - [pool-4-thread-1] - [Log partition=Topic3-15, dir=/var/lib/kafka/kafka-logs] Completed load of log with 1 segments, log start offset 11547004 and log end offset 11559178 in 1552 ms
INFO [2018-09-05 11:58:49,589] kafka.log.Log:[Logging$class:info:66] - [pool-4-thread-1] - [Log partition=Topic3-13, dir=/var/lib/kafka/kafka-logs] Recovering unflushed segment 12399433
ERROR [2018-09-05 11:58:49,591] kafka.log.LogManager:[Logging$class:error:74] - [main] - There was an error in one of the threads during logs loading: java.lang.IllegalArgumentException: inconsistent range
WARN [2018-09-05 11:58:49,591] kafka.log.Log:[Logging$class:warn:70] - [pool-4-thread-1] - [Log partition=Topic3-35, dir=/var/lib/kafka/kafka-logs] Found a corrupted index file corresponding to log file /var/lib/kafka/kafka-logs/Topic3-35/00000000000011110038.log due to Corrupt time index found, time index file (/var/lib/kafka/kafka-logs/Topic3-35/00000000000011110038.timeindex) has non-zero size but the last timestamp is 0 which is less than the first timestamp 1536129815049}, recovering segment and rebuilding index files...
INFO [2018-09-05 11:58:49,594] kafka.log.ProducerStateManager:[Logging$class:info:66] - [pool-4-thread-1] - [ProducerStateManager partition=Topic3-35] Loading producer state from snapshot file '/var/lib/kafka/kafka-logs/Topic3-35/00000000000011110038.snapshot'
ERROR [2018-09-05 11:58:49,599] kafka.server.KafkaServer:[MarkerIgnoringBase:error:159] - [main] - [KafkaServer id=2] Fatal error during KafkaServer startup. Prepare to shutdown
java.lang.IllegalArgumentException: inconsistent range
at java.util.concurrent.ConcurrentSkipListMap$SubMap.<init>(ConcurrentSkipListMap.java:2620)
at java.util.concurrent.ConcurrentSkipListMap.subMap(ConcurrentSkipListMap.java:2078)
at java.util.concurrent.ConcurrentSkipListMap.subMap(ConcurrentSkipListMap.java:2114)
at kafka.log.Log$$anonfun$12.apply(Log.scala:1561)
at kafka.log.Log$$anonfun$12.apply(Log.scala:1560)
at scala.Option.map(Option.scala:146)
at kafka.log.Log.logSegments(Log.scala:1560)
at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:358)
at kafka.log.Log.recoverLog(Log.scala:448)
at kafka.log.Log.loadSegments(Log.scala:421)
at kafka.log.Log.<init>(Log.scala:216)
at kafka.log.Log$.apply(Log.scala:1747)
at kafka.log.LogManager.kafka$log$LogManager$$loadLog(LogManager.scala:255)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$11$$anonfun$apply$15$$anonfun$apply$2.apply$mcV$sp(LogManager.scala:335)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO [2018-09-05 11:58:49,606] kafka.server.KafkaServer:[Logging$class:info:66] - [main] - [KafkaServer id=2] shutting down
I have a problem installing Kafka on Windows.
I installed a Kafka cluster of 3 instances on 3 different servers (each server contains one Kafka broker and one ZooKeeper).
Everything works well, but when a Kafka instance stops (or fails) and I try to restart it, I get this error message:
[2017-11-10 15:17:53,999] INFO [ThrottledRequestReaper-Fetch]: Starting
(kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2017-11-10 15:17:53,999] INFO [ThrottledRequestReaper-Produce]: Starting
(kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2017-11-10 15:17:53,999] INFO [ThrottledRequestReaper-Request]: Starting
(kafka.server.ClientQuotaManager$ThrottledRequestReaper)
[2017-11-10 15:17:54,109] INFO Loading logs. (kafka.log.LogManager)
[2017-11-10 15:17:54,171] WARN Found a corrupted index file due to
requirement failed: Corrupt index found, index file (C:\Tools\Kafka\kafka-
logs\hubone.dataCollect.orbiwise.ArchiveQueue-0\00000000000000000015.index)
has non-zero size but the last offset is 15 which is no larger than the base
offset 15.}. deleting C:\Tools\Kafka\kafka-
logs\hubone.dataCollect.orbiwise.ArchiveQueue-
0\00000000000000000015.timeindex, C:\Tools\Kafka\kafka-
logs\hubone.dataCollect.orbiwise.ArchiveQueue-0\00000000000000000015.index,
and C:\Tools\Kafka\kafka-logs\hubone.dataCollect.orbiwise.ArchiveQueue-
0\00000000000000000015.txnindex and rebuilding index... (kafka.log.Log)
[2017-11-10 15:17:54,171] ERROR There was an error in one of the threads
during logs loading: java.nio.file.FileSystemException:
C:\Tools\Kafka\kafka-logs\hubone.dataCollect.orbiwise.ArchiveQueue-
0\00000000000000000015.timeindex: The process cannot access the file because
it is being used by another process.
(kafka.log.LogManager)
[2017-11-10 15:17:54,171] FATAL [Kafka Server 1], Fatal error during
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.nio.file.FileSystemException: C:\Tools\Kafka\kafka-
logs\hubone.dataCollect.orbiwise.ArchiveQueue-
0\00000000000000000015.timeindex: The process cannot access the file because
it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
at java.nio.file.Files.deleteIfExists(Files.java:1165)
at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:318)
at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:279)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at kafka.log.Log.loadSegmentFiles(Log.scala:279)
at kafka.log.Log.loadSegments(Log.scala:383)
at kafka.log.Log.<init>(Log.scala:186)
at kafka.log.Log$.apply(Log.scala:1609)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$5$$anonfun$apply$12$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:172)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
When I delete kafka-logs, the offsets are corrupted.
Could you help me fix this problem?
I opened an issue in Kafka:
https://issues.apache.org/jira/browse/KAFKA-6200
I had a similar issue.
I simply deleted the files in the ..\tmp\kafka-logs\ directory.
Then I restarted the services and it worked like a charm.
In my case the issue was resolved by emptying the Recycle Bin on Windows after deleting the logs from the tmp and log directories.
I have a big problem with Kafka: when I shut down my consumer application, change its groupId, and restart it, my Kafka brokers stop working. This is the stack trace I get:
[2016-07-11 17:02:47,314] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,0] (kafka.coordinator.GroupMetadataManager)
[2016-07-11 17:02:47,955] FATAL [Replica Manager on Broker 0]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-38'
at kafka.log.Log.append(Log.scala:318)
at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
at scala.Option.foreach(Option.scala:236)
at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-38/00000000000000000000.index (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
Probably your /tmp is automatically cleaned up, e.g. by systemd-tmpfiles.
https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html
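As a rough check (file locations vary by distribution), you can see whether a tmpfiles rule covers /tmp, and then, as in the answers above, point log.dirs and ZooKeeper's dataDir at a persistent directory instead of /tmp:

# list tmpfiles rules that mention /tmp
grep -r /tmp /etc/tmpfiles.d/ /usr/lib/tmpfiles.d/ 2>/dev/null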