Shutdown broker because all log dirs have failed - apache-kafka

[2019-10-29 10:09:36,903] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-46,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-36,__consumer_offsets-42,topic-0,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-11,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-39,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-10 and stopped moving logs for partitions because they are in the failed log directory C:\tmp\kafka-logs. (kafka.server.ReplicaManager)
[2019-10-29 10:09:36,908] INFO Stopping serving logs in dir C:\tmp\kafka-logs (kafka.log.LogManager)
[2019-10-29 10:09:36,952] ERROR Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed (kafka.log.LogManager)
i have started zookeeper,Kafka and producer also. But when i tried to consume data immediately this error is coming in Windows
command: .\bin\windows\Kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic topic

I had the similar issue and had to do trial and error. But what I eventually did was to disable the other JRE versions and leave only one enabled. See image attached. This seems to have resolved my problem since my broker doesn't crash anymore.

Related

Kafka and Zookeeper not working give an error (Kafka shutting down and INFO ZooKeeper audit is disabled despite enabling it)

I am a beginner and I have to use Kafka for data transfer into/from Hadoop FS (or any other application, not just through put or copyFromLocal commands),kafka needs zookeeper as well, I enabled Zooekeeper audit logging but I still get errors.
For Kafka, when I want to start it:
JMX_PORT=8004 bin/kafka-server-start.sh config/server.properties
I get the error:
[2022-02-16 13:56:45,939] INFO shutting down (kafka.server.KafkaServer)
[2022-02-16 13:56:46,114] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2022-02-16 13:56:46,133] INFO shut down completed (kafka.server.KafkaServer)
[2022-02-16 13:56:46,133] ERROR Exiting Kafka. (kafka.Kafka$)
[2022-02-16 13:56:46,165] INFO shutting down (kafka.server.KafkaServer)
And when I want to start Zookeeper using the command:
bin/zookeeper-server-start.sh config/zookeeper.properties
I get the following (and it gets stuck on it):
[2022-02-16 14:03:13,954] INFO zookeeper.request_throttler.shutdownTimeout = 10000 (org.apache.zookeeper.server.RequestThrottler)
[2022-02-16 14:03:13,955] INFO PrepRequestProcessor (sid:0) started, reconfigEnabled=false (org.apache.zookeeper.server.PrepRequestProcessor)
[2022-02-16 14:03:14,136] INFO Using checkIntervalMs=60000 maxPerMinute=10000 maxNeverUsedIntervalMs=0 (org.apache.zookeeper.server.ContainerManager)
[2022-02-16 14:03:14,138] INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider)
Does anyone know how to work this out? I enabled audit logging but still. Same problem.
Zookeeper CLI isn't "stuck"; it's waiting for connections.
Open a new terminal and start Kafka
Alternatively, you could use Docker Compose / Kubernetes, if you think your host / local JVM is causing issues

Why Kafka stops with an error on log dirs?

I am trying to learn kafka and I had the below error:
[2021-01-21 13:46:43,247] WARN [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions
__consumer_offsets-22,first_topic-2,__consumer_offsets-37,first_topic-0,__consumer_offsets-
38,__consumer_offsets-13,twitter_tweets-5,__consumer_offsets-30,twitter_tweets-3,__consumer_offsets-
8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-
7,__consumer_offsets-9,__consumer_offsets-46,new_topic-0,__consumer_offsets-25,__consumer_offsets-
35,twitter_tweets-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-
23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-
32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log
directory C:\kafka_2.13-2.6.0\data\kafka. (kafka.server.ReplicaManager)
[2021-01-21 13:46:43,252] WARN Stopping serving logs in dir C:\kafka_2.13-2.6.0\data\kafka
(kafka.log.LogManager)
[2021-01-21 13:46:43,254] ERROR Shutdown broker because all log dirs in C:\kafka_2.13-2.6.0\data\kafka have failed (kafka.log.LogManager)
This happens every time I run a command for example:
bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-plaintext-input.
When I delete all the offsets in /data folder then everything run smoothly. Is this happening because of the 7 day existing period that Kafka has?
Main issue is that Kafka depends on POSIX filesystem semantics that don't work well on windows.
Kafka uses specific features of POSIX to achieve high performance, so emulations—which happen on WSL 1—are insufficient. For example, the broker will crash when it rolls a segment file
This appears to be the error you're mentioning about the segment retention
If you want to use Kafka on windows, WSL2 is the suggested solution.
https://www.confluent.io/blog/set-up-and-run-kafka-on-windows-linux-wsl-2/
Also note: --zookeeper flag is deprecated

Wrong Kafka consumer_offset

I am currently using the confluent platform community license. I started Zookeeper, Kafka and the schema-registry - all are used in local mode. However when starting the schema-registry for the first time, 50 messages are sent and stored inside the __consumer_offset topic (__consumer_offsets-0 to __consumer_offsets-49). Those messages are stored in the kafka-logs and when I am trying to start the services again, it fails. To be more precise: Zookeeper works but Kafka fails with the error:
"ERROR Shutdown broker because all log dirs have failed".
As suggested in some other posts I deleted the log.dirs directory referenced in the zookeeper.properties file and the log.dirs directory referenced in the server.properties file. After doing this I can start kafka again without any error - but the 50 messages are stored in __consumer_offset again when starting the schema-registry and after stopping kafka and trying to start kafka again it fails with the same error.
Any help is greatly appreciated. :)
EDIT:
Above that error theres another error saying:
"ERROR Failed to clean up log for _schemas-0 in dir /mnt/c/Users/Username/Desktop/Big_Data/confluent-6.0.0/kafka-logs due to IOException (kafka.server.LogDirFailureChannel) java.io.IOException: Invalid argument"
and also two warnings:
"WARN [ReplicaManager broker=0] Stopping serving replicas in dir /mnt/c/Users/Username/Desktop/Big_Data/confluent-6.0.0/kafka-logs (kafka.server.ReplicaManager)"
and
"WARN [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22, ... (all of the 50 offsets are then listed)"

Kafka broker shutdown while cleaning up log files

Kafka cluster with 3 brokers(version:1.1.0) and is well running for over 6 months.
Then we modified partitions from 3 to 48 for every topic after 2018/12/12, then the brokers shutdown every 5-10 days.
Then we upgraded the broker from 1.1.0 to 2.1.0, but the brokers still keep shutting down every 5-10 days.
Each time, one broker shut down after the following error log, then several minutes later, the other 2 brokers shut down too, with the same error but other partition log files.
[2019-01-11 17:16:36,572] INFO [ProducerStateManager partition=__transaction_state-11] Writing producer snapshot at offset 807760 (kafka.log.ProducerStateManager)
[2019-01-11 17:16:36,572] INFO [Log partition=__transaction_state-11, dir=/kafka/logs] Rolled new log segment at offset 807760 in 4 ms. (kafka.log.Log)
[2019-01-11 17:16:46,150] WARN Resetting first dirty offset of __transaction_state-35 to log start offset 194404 since the checkpointed offset 194345 is invalid. (kafka.log.LogCleanerManager$)
[2019-01-11 17:16:46,239] ERROR Failed to clean up log for __transaction_state-11 in dir /kafka/logs due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:809)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:222)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1838)
at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:1901)
at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:1896)
at scala.collection.immutable.List.foreach(List.scala:388)
at kafka.log.Log.replaceSegments(Log.scala:1896)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:583)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:515)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:514)
at scala.collection.immutable.List.foreach(List.scala:388)
at kafka.log.Cleaner.doClean(LogCleaner.scala:514)
at kafka.log.Cleaner.clean(LogCleaner.scala:492)
at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:353)
at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Suppressed: java.nio.file.NoSuchFileException: /kafka/logs/__transaction_state-11/00000000000000807727.log -> /kafka/logs/__transaction_state-11/00000000000000807727.log.deleted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
at java.nio.file.Files.move(Files.java:1395)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
... 17 more
[2019-01-11 17:16:46,245] INFO [ReplicaManager broker=2] Stopping serving replicas in dir /kafka/logs (kafka.server.ReplicaManager)
[2019-01-11 17:16:46,314] INFO Stopping serving logs in dir /kafka/logs (kafka.log.LogManager)
[2019-01-11 17:16:46,326] ERROR Shutdown broker because all log dirs in /kafka/logs have failed (kafka.log.LogManager)
if you have not changed log.retention.bytes or log.retention.hours or log.retention.minutes or log.retention.ms configs, Kafka tries to delete logs after 7 days. So based on the exception, Kafka wants to clean up file /kafka/logs/__transaction_state-11/00000000000000807727.log but, there is no such file in Kafka log directory and it throws an exception which causes broker shut down.
if you are able to shut down cluster and Zookeeper do it and clean up /kafka/logs/__transaction_state-11 manually.
Note: I don't know it is harmful or not but you can follow safely remove Kafka topic posts.

Kafka, Unable to produce and consume events

While trying to set kafka on 2 replica and 1 master boxes, got a weird condition where I was not able to consume or produce to a topic.
Using Mirror Maker to sync data between replica <--> Master. Getting following logs unending :
[2016-08-26 14:28:33,897] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient) [2016-08-26
14:28:43,515] WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient) [2016-08-26 14:28:45,118]
WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient) [2016-08-26 14:28:46,721]
WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient) [2016-08-26 14:28:48,324]
WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient) [2016-08-26 14:28:49,927]
WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient) [2016-08-26 14:28:53,029]
WARN Bootstrap broker localhost:9092 disconnected
(org.apache.kafka.clients.NetworkClient)**
Only way I could recover was by restarting Kafka which produced this kind of logs :
[2016-08-26 14:30:54,856] WARN Found a corrupted index file, /tmp/kafka-logs/__consumer_offsets-43/00000000000000000000.index,
deleting and rebuilding index... (kafka.log.Log) [2016-08-26
14:30:54,856] INFO Recovering unflushed segment 0 in log
__consumer_offsets-43. (kafka.log.Log) [2016-08-26 14:30:54,857] INFO Completed load of log __consumer_offsets-43 with log end offset 0
(kafka.log.Log) [2016-08-26 14:30:54,860] WARN Found a corrupted index
file,
/tmp/kafka-logs/__consumer_offsets-26/00000000000000000000.index,
deleting and rebuilding index... (kafka.log.Log) [2016-08-26
14:30:54,860] INFO Recovering unflushed segment 0 in log
__consumer_offsets-26. (kafka.log.Log) [2016-08-26 14:30:54,861] INFO Completed load of log __consumer_offsets-26 with log end offset 0
(kafka.log.Log) [2016-08-26 14:30:54,864] WARN Found a corrupted index
file,
/tmp/kafka-logs/__consumer_offsets-35/00000000000000000000.index,
deleting and rebuilding index... (kafka.log.Log)**
ERROR Error when sending message to topic dr_ubr_analytics_limits with key: null, value: 1 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update
metadata after 60000 ms.**
This is my test phase so I was able to restart and recover from the master box but I want know what caused this issue and how can it be avoided. Is there a way to debug this issue?
Trying to achieve following via Kafka