Why does Kafka stop with an error on log dirs?

I am trying to learn Kafka and got the error below:
[2021-01-21 13:46:43,247] WARN [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22,first_topic-2,__consumer_offsets-37,first_topic-0,__consumer_offsets-38,__consumer_offsets-13,twitter_tweets-5,__consumer_offsets-30,twitter_tweets-3,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,new_topic-0,__consumer_offsets-25,__consumer_offsets-35,twitter_tweets-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory C:\kafka_2.13-2.6.0\data\kafka. (kafka.server.ReplicaManager)
[2021-01-21 13:46:43,252] WARN Stopping serving logs in dir C:\kafka_2.13-2.6.0\data\kafka (kafka.log.LogManager)
[2021-01-21 13:46:43,254] ERROR Shutdown broker because all log dirs in C:\kafka_2.13-2.6.0\data\kafka have failed (kafka.log.LogManager)
This happens every time I run a command, for example:
bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-plaintext-input
When I delete all the offsets in the /data folder, everything runs smoothly again. Is this happening because of Kafka's 7-day retention period?

The main issue is that Kafka depends on POSIX filesystem semantics that don't work well on Windows.
Kafka uses specific POSIX features to achieve high performance, so emulations (such as the one in WSL 1) are insufficient. For example, the broker will crash when it rolls a segment file.
This appears to be the error you're describing, and it is tied to segment retention.
If you want to use Kafka on Windows, WSL 2 is the suggested solution:
https://www.confluent.io/blog/set-up-and-run-kafka-on-windows-linux-wsl-2/
Also note: the --zookeeper flag is deprecated; the topic tool accepts --bootstrap-server instead.
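The filesystem difference can be illustrated with a small Python sketch (Python is used only for illustration and is not part of Kafka). It renames a file while a handle to it is still open, which POSIX allows and Windows normally rejects. That Kafka's segment handling hits this same class of restriction is an assumption based on the answer above, and can_rename_open_file is a hypothetical helper name.

```python
import os
import tempfile


def can_rename_open_file() -> bool:
    """Return True if a file can be renamed while it is still open."""
    with tempfile.TemporaryDirectory() as tmp:
        # File names only mimic a Kafka segment; nothing here touches Kafka.
        src = os.path.join(tmp, "00000000000000000000.log")
        dst = src + ".deleted"
        handle = open(src, "wb")
        try:
            handle.write(b"segment data")
            handle.flush()
            try:
                # Allowed on POSIX; usually raises PermissionError on Windows.
                os.rename(src, dst)
                return True
            except OSError:
                return False
        finally:
            handle.close()


if __name__ == "__main__":
    print("rename while open allowed:", can_rename_open_file())
```

Running this natively on Windows and again inside WSL 2 should show the difference the answer describes.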

Related

Shutdown broker because all log dirs have failed

[2019-10-29 10:09:36,903] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-46,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-36,__consumer_offsets-42,topic-0,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-11,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-39,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-10 and stopped moving logs for partitions because they are in the failed log directory C:\tmp\kafka-logs. (kafka.server.ReplicaManager)
[2019-10-29 10:09:36,908] INFO Stopping serving logs in dir C:\tmp\kafka-logs (kafka.log.LogManager)
[2019-10-29 10:09:36,952] ERROR Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed (kafka.log.LogManager)
I have started ZooKeeper, Kafka, and a producer. But when I try to consume data, this error appears immediately on Windows.
command: .\bin\windows\Kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic topic

I had a similar issue and had to resort to trial and error. What eventually worked was disabling the other JRE versions and leaving only one enabled (see the attached image). This seems to have resolved my problem, since my broker doesn't crash anymore.

Issues with Apache Kafka Quickstart

I am new to Kafka and seem to be having several issues with the 'Quickstart' guide for Apache Kafka found here:
https://kafka.apache.org/quickstart#quickstart_kafkaconnect
Ultimately I am trying to learn how to load a kafka queue with many kafka messages and so the Step 7 part of this Quickstart guide seemed relevant.
I installed the binary download (Scala 2.11 - kafka_2.11-1.1.0.tgz ) found here:
https://kafka.apache.org/downloads
I had initially tried to jump straight to step 7, but after finding this question (Kafka Connect implementation errors) I realised I had to do the steps prior to that.
Therefore I followed the first step successfully:
tar -xzf kafka_2.11-1.1.0.tgz
cd kafka_2.11-1.1.0
Then I followed step 2:
bin/zookeeper-server-start.sh config/zookeeper.properties
But I get the error
ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:117)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:87)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
But when I run the next command in that same step:
bin/kafka-server-start.sh config/server.properties
The Kafka server seems to run successfully?
So then I tried to continue to step 3 to create a topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
But this produces the error:
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
[2018-04-09 14:13:26,908] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0.
(kafka.admin.TopicCommand$)
Then trying step 4:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This seems to work and I can write a message, but then I get a connection error (which is probably because the previous steps didn't succeed):
kafka_2.11-1.1.0 user$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>This is a message
[2018-04-09 14:17:52,631] WARN [Producer clientId=console-producer] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-04-09 14:17:52,687] WARN [Producer clientId=console-producer] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Does anyone know why these issues are occurring and how I can fix them? I can't find any more information about these problems in that tutorial.

As the error suggests, you have something running on the default port for ZK. Either close it or change the zookeeper properties file to use another port.
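A quick way to confirm that something is already listening on the ZooKeeper port is a plain TCP connection attempt. This is a minimal sketch using only the Python standard library; port_in_use is a hypothetical helper name, and 2181 is the port from the question.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 2181 in use:", port_in_use("localhost", 2181))
```

If this reports the port as in use before ZooKeeper has been started, another process owns it.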

Address localhost:2181 is already in use. Since ZooKeeper cannot start, the Kafka brokers won't start either. The replication factor must be less than or equal to the number of available brokers, and since no broker is available, the following error is reported (even if you are using --replication-factor 1):
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
[2018-04-09 14:13:26,908] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0.
(kafka.admin.TopicCommand$)
You need to either stop the process that is listening on port 2181 or change the ZooKeeper port to one that is not currently in use.
To see which process (PID) is using port 2181, run
lsof -i -n -P | grep 2181
If you want to kill that process, then run
kill -9 PID
where PID is the process ID which you can get from lsof command.
Otherwise, change the port in the zookeeper.properties file by modifying the clientPort=2181 parameter. Finally, change the zookeeper.connect=localhost:2181 parameter in the server.properties file to match.
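The two configuration changes have to stay in sync. As a rough sketch of that edit (set_property is a hypothetical helper, port 2182 is just an example of a free port, and the sample file contents are shortened stand-ins for the real files):

```python
def set_property(text: str, key: str, value: str) -> str:
    """Replace the value of `key` in .properties-style text, or append it."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and blank lines untouched.
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    new_port = 2182  # hypothetical free port
    zk_conf = set_property("clientPort=2181\n", "clientPort", str(new_port))
    broker_conf = set_property(
        "zookeeper.connect=localhost:2181\n",
        "zookeeper.connect",
        f"localhost:{new_port}",
    )
    print(zk_conf, broker_conf, sep="")
```

Both ZooKeeper and the broker need a restart after the files are changed.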

What are the options for restoring Kafka?

I have Kafka 0.9.
Today I needed to restart the Kafka server.
But after the restart I listed the topics and saw none of them except the standard one:
/opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
After startup, the Kafka server log contains this warning:
[2018-03-19 13:10:53,199] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/data/kafka/ae-result-from-0/00000000000000000000 index) has non-zero size but the last offset is 0 which is no larger than the base offset 0.}. deleting /data/kafka/ae-result-from-0/00000000000000000000.timeindex, /data/kafka/ae-result-from-0/00000000000000000000.index and rebuilding index... (kafka.log.Log)
How can I recover the topics correctly?

kafka.admin.TopicCommand Failing

I am using a single-node Kafka v0.10.2 (16 GB RAM, 8 cores) and a single-node ZooKeeper v3.4.9 (4 GB RAM, 1 core). I have 64 consumer groups and 500 topics, each with 250 partitions. Commands that require only the Kafka broker run fine,
e.g.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092
--describe --group
But when I execute an admin command such as create topic or alter topic, for example
./kafka-topics.sh --create --zookeeper :2181
--replication-factor 1 --partitions 1 --topic
the following exception is displayed:
Error while executing topic command : replication factor: 1 larger
than available brokers: 0 [2017-11-16 11:22:13,592] ERROR
org.apache.kafka.common.errors.InvalidReplicationFactorException:
replication factor: 1 larger than available brokers: 0
(kafka.admin.TopicCommand$)
I checked that my broker is up. The following warnings appear in server.log:
[2017-11-16 11:14:26,959] WARN Client session timed out, have not heard from server in 15843ms for sessionid 0x15aa7f586e1c061 (org.apache.zookeeper.ClientCnxn)
[2017-11-16 11:14:28,795] WARN Unable to reconnect to ZooKeeper service, session 0x15aa7f586e1c061 has expired (org.apache.zookeeper.ClientCnxn)
[2017-11-16 11:21:46,055] WARN Unable to reconnect to ZooKeeper service, session 0x15aa7f586e1c067 has expired (org.apache.zookeeper.ClientCnxn)
Below is my Kafka server configuration:
broker.id=1
delete.topic.enable=true
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/kafka/data/logs
num.partitions=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=<zookeeperIP>:2181
zookeeper.connection.timeout.ms=6000
The ZooKeeper configuration is:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
autopurge.snapRetainCount=20
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=48
I am not able to figure out which configuration to tune. What am I missing? Any help will be appreciated.

When you run the topic command with the --zookeeper argument, like
./kafka-topics.sh --create --zookeeper :2181 --replication-factor 1
--partitions 1 --topic
it means that the tool asks ZooKeeper for the broker details. If the broker details are available in ZooKeeper, it can connect to the broker.
In your scenario, I think ZooKeeper lost the broker details. ZooKeeper stores all of your configuration in a tree of paths.
To check whether ZooKeeper has the broker path, log into the ZooKeeper shell using /bin/zkCli.sh -server localhost:2181
After a successful connection, run ls / and you will see output like this:
[controller, controller_epoch, brokers, zookeeper, admin, isr_change_notification, consumers, config]
Then run ls /brokers; the output will be [ids, topics, seqid].
Then run ls /brokers/ids; the output will be [0], which is an array of broker IDs. If the array is empty ([]), no broker details are present in ZooKeeper.
In that case, you need to restart your broker and ZooKeeper.
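The link between an empty /brokers/ids listing and the "larger than available brokers: 0" error can be sketched in a few lines of Python. This only mirrors the rule described in the thread; it is not Kafka's actual implementation, and both function names are hypothetical.

```python
import re
from typing import List


def parse_zk_children(output: str) -> List[str]:
    """Parse a zkCli `ls` result such as '[0, 1]' into ['0', '1']."""
    match = re.search(r"\[(.*?)\]", output)
    if match is None:
        raise ValueError(f"no child list found in {output!r}")
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def check_replication_factor(replication_factor: int, broker_ids: List[str]) -> None:
    """Raise if the requested replication factor exceeds the registered brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError(
            f"Replication factor: {replication_factor} larger than "
            f"available brokers: {len(broker_ids)}."
        )


if __name__ == "__main__":
    brokers = parse_zk_children("[0]")  # sample output of: ls /brokers/ids
    check_replication_factor(1, brokers)
    print("registered brokers:", brokers)
```

With an empty listing ([]), even a replication factor of 1 fails the check, which matches the error in the question.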
Updated:
This problem does not usually happen. It occurs because your ZooKeeper server is being shut down (killed) or is losing the broker path.
To avoid this, it is better to run two more ZooKeeper instances, for a complete ensemble of three nodes.
If it is local, use localhost:2181, localhost:2182, localhost:2183.
If it is a cluster, use three instances: zookeeper1:2181, zookeeper2:2181, zookeeper3:2181.
With three nodes you can tolerate one failure, because ZooKeeper needs a majority of the ensemble (two of three) to stay available.
To create a topic, use the following command:
./kafka-topics.sh --create --zookeeper
localhost:2181,localhost:2182,localhost:2183 --replication-factor 1
--partitions 1 --topic
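A note on ensemble sizing: ZooKeeper stays available only while a majority of its nodes is up, so the number of failures an ensemble survives follows from simple arithmetic. A minimal sketch (the function names are hypothetical):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"nodes={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

For three nodes this gives a quorum of two and one tolerated failure; tolerating two failures takes five nodes.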

When does Kafka change the leader?

My services have been working with Kafka for a year, and no spontaneous leader changes happened during that time.
But over the last two weeks they have started to happen quite often.
The Kafka log shows:
[2015-09-27 15:35:14,826] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic] (kafka.server.ReplicaFetcherManager)
[2015-09-27 15:35:14,830] INFO Truncating log myTopic-0 to offset 11520979. (kafka.log.Log)
[2015-09-27 15:35:14,845] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 713276 from client ReplicaFetcherThread-0-2 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:14,857] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 256685 from client mirrormaker-1 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:20,171] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic,0] (kafka.server.ReplicaFetcherManager)
What can cause the leader to switch? If this is covered somewhere in the Kafka documentation, please just point me to the link. I have failed to find it.
System configuration
kafka version: kafka_2.10-0.8.2.1
os: Red Hat Enterprise Linux Server release 6.5 (Santiago)
server.properties (differs from default):
broker.id=001
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.bytes=-1
controlled.shutdown.enable=true
auto.create.topics.enable=false

It looks like the leader broker for that partition is down. It might be that the data directory (log.dirs) configured in server.properties is out of space and the broker cannot store more data.
Also, what is the replication factor of the topic, and how many brokers are in the cluster?
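To rule out the disk-space theory, the free space on the volume that holds log.dirs can be checked with the Python standard library. This is a minimal sketch; the 10% threshold and the helper names are arbitrary assumptions, and the path is whatever log.dirs points to on the broker.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free: float = 0.10) -> bool:
    """True if less than `min_free` (an assumed 10% threshold) is free."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    log_dir = "."  # replace with the broker's log.dirs value
    print(f"free: {free_fraction(log_dir):.1%}, low: {is_low_on_space(log_dir)}")
```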

I am assuming you have one topic with one partition and a replication factor of 2, which is not a good configuration for Kafka performance or for consumers.
Your logs are not clear enough to explain the leader switch. A major issue with your topic may be that it has only one leader, because it has only one partition. The single log for that partition grows day by day. Kafka internally does rebalancing at some level (I have not confirmed the details). That could be the reason for your leader switch, but I am not sure.
Also, your second log line says that the log was truncated. Can you go through the logs in detail and check whether the switch happens only after truncation?
You mentioned that you already checked your Kafka log directory files and their size. Please run the describe command when you hit this issue; the leader switch will be reflected there as well. If you can set up a dashboard that shows the leader over time, it will be easier to find the root cause.
bin/kafka-topics.sh --describe --zookeeper Zookeeperhost:Port --topic TopicName
Suggestion: create a new topic with more partitions (read the Kafka documentation to get a good idea of the optimum number of partitions) and start writing to it. Alternatively, check how to change the number of partitions for the current topic.
Last thing: is the leader switch causing issues in your clients, or are you only worried about the warnings?
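To keep a history of the leader, as suggested above, the describe output can be captured periodically and compared between runs. The sketch below assumes per-partition lines of the form "Topic: myTopic  Partition: 0  Leader: 2  Replicas: 1,2  Isr: 2,1"; the exact format may differ between Kafka versions, and the function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed per-partition line format of `kafka-topics.sh --describe`.
PARTITION_LINE = re.compile(r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)")


def parse_leaders(describe_output: str) -> Dict[str, int]:
    """Map 'topic-partition' to the leader broker id reported for it."""
    leaders = {}
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match:
            topic, partition, leader = match.groups()
            leaders[f"{topic}-{partition}"] = int(leader)
    return leaders


def changed_leaders(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return partitions whose leader differs between two snapshots."""
    return {
        tp: (before[tp], after[tp])
        for tp in before.keys() & after.keys()
        if before[tp] != after[tp]
    }


if __name__ == "__main__":
    earlier = parse_leaders("\tTopic: myTopic\tPartition: 0\tLeader: 2\tReplicas: 1,2\tIsr: 2,1")
    later = parse_leaders("\tTopic: myTopic\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2")
    print(changed_leaders(earlier, later))
```

Storing each snapshot with a timestamp makes it possible to line up leader changes with the broker log entries.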
