Kafka brokers shut down because log dirs have failed - MongoDB

I have a 3-broker Kafka cluster with the Kafka logs in the /tmp directory. I am running a Debezium MongoDB source connector that polls data from 4 collections.
However, within 5 minutes of starting the connector, the Kafka brokers shut down with the following error:
[2020-04-16 18:25:08,642] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs-1 have failed (kafka.log.LogManager)
I have tried the usual suggestions, viz. deleting the Kafka logs and cleaning out the ZooKeeper logs, but I ran into the same problem again.
I have also noticed that the Kafka logs occupy 100% of the /tmp directory when this happens, so I also changed the log retention policy to be size-based:
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=10000
This also turned out to be futile.
I would appreciate some assistance with this. Thanks in advance!

Your log files probably got corrupted because you ran out of storage.
I would suggest changing log.dirs in server.properties. Also make sure that you don't use a /tmp location, as it is purged when your machine shuts down. Once you have changed log.dirs you can restart Kafka.
Note that the older messages will be lost.
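For example, a minimal sketch of the change in server.properties (the path is illustrative; any persistent location outside /tmp works, with a unique directory per broker):

```properties
# server.properties (one file per broker)
# Illustrative path -- pick any persistent location outside /tmp
log.dirs=/var/lib/kafka/logs-1
```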

Related

After reboot KAFKA topic appears to be lost

Having installed Kafka and having looked at these posts:
kafka loses all topics on reboot
Kafka topic no longer exists after restart
and thus having moved kafka-logs to an /opt... location, I still note that when I reboot:
I can re-create the topic again.
The kafka-logs directory contains information on topics, offsets, etc., but it gets corrupted.
I am wondering how to rectify this.
Testing of new topics prior to reboot works fine.
There can be two potential problems:
If Kafka is running in Docker, a container restart always cleans up the previous state and creates a new cluster, so all topics are lost.
Check the log.dirs and ZooKeeper data paths. If either is set under the /tmp directory, it will be cleaned on each reboot, and you will lose all logs and topics.
In this VM I noted the ZooKeeper log was defined under /tmp. Changing that to /opt (though presumably it should be /var) fixed the clearing of Kafka data when the instance terminated. Not sure how to explain this completely.
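For the Docker case, the usual fix is to mount a named volume over the broker's data directory so it survives container restarts. A minimal docker-compose sketch, assuming the confluentinc/cp-kafka image and its conventional data path /var/lib/kafka/data (both are assumptions; adjust to your image):

```yaml
services:
  kafka:
    image: confluentinc/cp-kafka   # hypothetical image; use your own
    environment:
      KAFKA_LOG_DIRS: /var/lib/kafka/data
    volumes:
      - kafka-data:/var/lib/kafka/data   # named volume persists across restarts

volumes:
  kafka-data:
```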

how to start Kafka and zookeeper when the HDD is full

I have set up a Kafka cluster with 3 nodes. Now their hard disks are full, and Kafka and ZooKeeper are down.
What is the best solution to restart Kafka and ZooKeeper?
Can I delete the Kafka log directory (the log.dirs directory) and start Kafka again?
I am using Confluent version 4.0.0.
If you delete log.dirs you will lose all your data, so that should be your last option. If you are using Amazon EC2, you had better try to add more disk, and after starting the brokers, play with the retention policy or the replication factor, or remove topics.
Take a look at: https://kafka.apache.org/documentation/#basic_ops_add_topic
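For instance, tightening retention on a single topic can be sketched like this (the topic name is hypothetical; Confluent 4.0 ships Kafka 1.0, where topic configs are altered via --zookeeper, while newer Kafka releases use --bootstrap-server instead):

```shell
# Hypothetical topic; shrink retention to 1 minute so the broker can free segments.
TOPIC=mytopic
ARGS="--zookeeper localhost:2181 --alter --entity-type topics --entity-name $TOPIC --add-config retention.ms=60000"

# Show the full command (and keep a copy for reference):
echo "kafka-configs.sh $ARGS" | tee /tmp/retention-cmd.txt

# Only run it when a Kafka installation is actually available:
if [ -n "${KAFKA_HOME:-}" ]; then
  "$KAFKA_HOME/bin/kafka-configs.sh" $ARGS
fi
```

Remember to restore the original retention afterwards, or the topic will keep discarding data.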
I solved this problem with the following method, and I hope it is useful for you:
step 1:
I moved some of the log files from my servers to another server to free up a little disk space. You can temporarily change Kafka's port so nobody can produce to it.
step 2:
Once disk space was freed, I was able to run ZooKeeper and Kafka on the server.
step 3:
I then deleted messages (with kafka-delete-records) from some topics, freeing half of the hard drive.
step 4:
I stopped Kafka and ZooKeeper on all servers (3 nodes).
step 5:
I moved the log files back to their directory (the ones moved in step 1) and changed the Kafka port back to 9092.
step 6:
Finally, I started Kafka and ZooKeeper on all servers successfully.
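Step 3 above can be sketched with kafka-delete-records (topic, partition, and offset are hypothetical; the actual deletion needs a reachable broker):

```shell
# Offsets below 50000 on mytopic-0 will be deleted -- values are illustrative.
cat > /tmp/delete-records.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "mytopic", "partition": 0, "offset": 50000 }
  ]
}
EOF

# Run the deletion only when a Kafka installation is available:
if [ -n "${KAFKA_HOME:-}" ]; then
  "$KAFKA_HOME/bin/kafka-delete-records.sh" \
    --bootstrap-server localhost:9092 \
    --offset-json-file /tmp/delete-records.json
fi
```

Note that kafka-delete-records truncates from the beginning of the partition up to the given offset; it cannot delete arbitrary ranges.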

How to save a kafka topic at shutdown

I'm configuring my first kafka network. I can't seem to find any support on saving a configured topic. I know I can configure a topic from the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shutdown my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder, which gets wiped out during a machine reboot. You could change the broker config to pick another location for the Kafka logs.

Recovering Kafka Data from .log Files

I have a single-node Kafka broker that crashed recently. I was able to salvage the .log and .index files from /tmp/kafka-logs/mytopic-0/, and I have moved these files to a different server and installed Kafka on it.
Is there a way to have the new kafka server serve the data contained in these .log files?
Update:
I probably didn't do this the right way, but here is what I tried:
created a topic named recovermytopic on the new Kafka server
stopped Kafka
moved all the .log files into /tmp/kafka-logs/recovermytopic-0
restarted Kafka
It appeared that for each .log file Kafka generated a .index file, which looked promising, but after the index files were created I saw the messages below:
WARN Partition [recovermytopic,0] on broker 0: No checkpointed highwatermark is found for partition [recovermytopic,0] (kafka.cluster.Partition)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [recovermytopic,0] (kafka.server.ReplicaFetcherManager)
When I try to check the topic using kafka-console-consumer, the kafka server says:
INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)
No messages are being consumed.
Kafka comes packaged with a DumpLogSegments tool that will extract messages (along with offsets, etc.) from Kafka data log files:
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files mytopic-0/00000000000000132285.log > 00000000000000132285_messages.out
The output will vary a bit depending on which version of Kafka you're using, but it should be easy to extract the message keys and values with the use of sed or some other tool. The messages can then be replayed into your Kafka cluster using the kafka-console-producer.sh tool, or programmatically.
While this method is a bit roundabout, I think it's more transparent/reliable than trying to get a broker to start with data log files obtained from somewhere else. I've tested the DumpLogSegments tool with various versions of Kafka from 0.9 all the way up to 2.4.
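The extract-and-replay step can be sketched as follows (the sample dump lines are illustrative, since the exact DumpLogSegments output format varies by Kafka version):

```shell
# Illustrative sample of DumpLogSegments output -- real output varies by version:
cat > /tmp/dump-sample.out <<'EOF'
offset: 0 position: 0 CreateTime: 1555555555 keysize: -1 valuesize: 5 payload: hello
offset: 1 position: 80 CreateTime: 1555555556 keysize: -1 valuesize: 5 payload: world
EOF

# Keep only the message values, one per line:
sed -n 's/.*payload: //p' /tmp/dump-sample.out > /tmp/messages.txt

# Then replay into the target cluster (requires a running broker), e.g.:
# $KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 \
#   --topic recovermytopic < /tmp/messages.txt
```

Check a few lines of the real dump output first and adjust the sed pattern to match your version's field names.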

How to create a durable topic in kafka

I am new to Kafka and am still learning the basics. I want to create a durable topic that is preserved even after the ZooKeeper and/or Kafka server shut down.
What I notice is this: I have ZooKeeper and a Kafka server running on my local MacBook. When I shut down the ZooKeeper server and bring it up again quickly, I can see the previously created topics. But if I restart the system and then restart the ZooKeeper server, I don't see the topic that I had created earlier.
I am running kafka_2.9.2-0.8.1.1 on my local system.
It happens because /tmp is cleaned after a reboot, resulting in the loss of your data.
To fix this, change your ZooKeeper dataDir property (in config/zookeeper.properties) and Kafka log.dirs (in config/server.properties) to point somewhere NOT in /tmp.
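For example (the paths are illustrative; any location that survives a reboot works):

```properties
# config/zookeeper.properties
dataDir=/var/lib/zookeeper

# config/server.properties
log.dirs=/var/lib/kafka/logs
```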