We have set up a ZooKeeper quorum (3 nodes) and 3 Kafka brokers. The producers were unable to send records to Kafka, resulting in data loss. During the investigation we could still SSH to the affected broker and observed that its disk was full. We deleted topic logs to clear some disk space and the broker functioned as expected again.
Given that we could still SSH to that broker (we can't see the logs right now), I assume ZooKeeper was still receiving heartbeats from it and didn't consider it down? What is the best practice for handling such events?
The best practice is to prevent this from happening in the first place!
You need to monitor the disk usage of your brokers and have alerts that fire before available disk space runs low.
You need to put retention limits on your topics to ensure data is deleted regularly.
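For example, here is a minimal sketch of capping a topic's retention by both time and size with the Java Admin client (the bootstrap address, topic name, and the specific limits are placeholder assumptions, not recommendations):

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetTopicRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address -- replace with your own brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Example limits only: 7 days and ~5 GiB per partition. Pick values that fit your disks.
            Collection<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("retention.bytes", "5368709120"), AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}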
You can also use Topic Policies (see create.topic.policy.class.name) to control how much retention time/size is allowed when creating/updating topics to ensure topics can't fill your disk.
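As an illustration, a create-topic policy is a small Java class registered on the brokers via create.topic.policy.class.name. Here is a minimal sketch that rejects topics created without an explicit, bounded size cap; the class name and the 10 GiB limit are just assumptions for the example:

import java.util.Map;

import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

// Hypothetical policy class: reject topic creation unless a bounded retention.bytes is set.
public class RetentionCapPolicy implements CreateTopicPolicy {

    // Assumed cap of 10 GiB per partition; tune to your disks.
    private static final long MAX_RETENTION_BYTES = 10L * 1024 * 1024 * 1024;

    @Override
    public void configure(Map<String, ?> configs) {
        // The broker configuration is passed in here; nothing is needed for this sketch.
    }

    @Override
    public void validate(RequestMetadata requestMetadata) throws PolicyViolationException {
        String retentionBytes = requestMetadata.configs().get("retention.bytes");
        if (retentionBytes == null) {
            throw new PolicyViolationException(
                "Topic " + requestMetadata.topic() + " must set retention.bytes explicitly");
        }
        long bytes = Long.parseLong(retentionBytes);
        if (bytes <= 0 || bytes > MAX_RETENTION_BYTES) {
            throw new PolicyViolationException(
                "retention.bytes for " + requestMetadata.topic() + " must be between 1 and " + MAX_RETENTION_BYTES);
        }
    }

    @Override
    public void close() {
        // No resources to release.
    }
}

You would then point create.topic.policy.class.name in server.properties at the fully qualified name of the class, wherever you package it.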
The recovery steps you took are OK, but to keep your cluster availability high you really don't want to let the disks fill up in the first place.
Related
We run 3-node Kafka clusters on 2.7.0 with quite a high number of topics and partitions. Almost all topics have only 1 partition and a replication factor of 3, which gives us roughly:
topics: 7325
total partitions in the cluster (including replicas): 22110
Brokers are relatively small, with:
6 vCPUs
16 GB memory
500 GB in /var/lib/kafka occupied by partition data
As you can imagine, because we have 3 brokers and replication factor 3, the data is spread very evenly across the brokers. Under normal circumstances each broker leads a very similar (essentially the same) number of partitions, and the number of partitions per broker is equal.
Before doing a rolling restart yesterday, everything was in sync. We stopped the process and started it again after 1 minute. It took some 10 minutes to synchronize with ZooKeeper and start listening on its port.
After logging 'Kafka server started', nothing happens. There is no CPU, memory or disk activity. The partition data is visible on the data disk. There have been no messages in the log for more than a day now since the process booted up.
We've tried restarting the ZooKeeper cluster (one node at a time). We've tried restarting the broker again. It's now been 24 hours since the last restart and still no change.
The broker itself reports that it leads 0 partitions. Leadership for all its partitions has moved to the other brokers, and they report that every replica hosted on this broker is out of sync.
I'm aware that the number of partitions per broker far exceeds the recommendation, but I'm still confused by the lack of any activity or log messages. Any ideas what should be checked further? It looks like something is stuck somewhere. I checked the Kafka ACLs and there are no denial messages related to the broker's username.
I tried another restart in DEBUG mode and it seems there is some problem with metadata. These two messages repeat constantly:
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
With kcat it's also impossible to fetch metadata about topics (that is, when I specify this broker as the bootstrap server).
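For reference, this is roughly how I am querying the stuck broker with the Java Admin client to see what it reports as the controller (a minimal sketch; the bootstrap address and timeout are specific to our environment):

import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class CheckController {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Points at the stuck broker only, to see the metadata it serves.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "15000");

        try (Admin admin = Admin.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("Controller: " + cluster.controller().get());
            System.out.println("Nodes: " + cluster.nodes().get());
        }
    }
}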
I understand that most systems should have monitoring to make sure this doesn't happen (and that we should set the retention policies properly), but I'm just curious what happens if the Kafka Broker does indeed run out of disk space (for example, if we set the retention time to 30 days, but the Broker runs out of disk space by day 1)?
In a single-Broker scenario, does the Broker simply stop receiving any new messages and return an exception to the Producer? Or does it delete old messages to make space for the new ones?
In a multi-Broker scenario, assuming we have Broker A (leader of the partition, but with no more disk space) and Broker B (follower of the partition, still with disk space), will leadership move to Broker B? What happens when both Brokers run out of space? Is an exception also returned to the Producer?
Assuming the main data directory is not on a separate volume, the OS processes themselves will start locking up because there's no free space left on the device.
Otherwise, if the log directories are isolated to Kafka, you can expect producer acks to stop working. I'm not sure if a specific error message is returned to clients, though. From what I remember, the brokers just stop responding to Kafka client requests, and we had to SSH to them, stop the Kafka services, and manually clean up files rather than wait for the retention policies. No, the brokers don't evict old data in favor of new records.
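If you want to see how this surfaces on the client side, the producer send callback is where it shows up; in my experience the sends eventually fail with timeouts once the broker stops answering. A rough sketch, where the bootstrap address, topic name, and timeout are placeholder assumptions rather than anything the broker guarantees:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerWithErrorHandling {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Fail the send after 30 seconds instead of retrying indefinitely.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "30000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // A full-disk broker usually shows up here as a delivery timeout.
                    System.err.println("Send failed: " + exception);
                } else {
                    System.out.println("Written to " + metadata.topic() + "-" + metadata.partition());
                }
            });
            producer.flush();
        }
    }
}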
A Kafka cluster provides high availability, but does it also provide some disaster recovery protection?
Specifically, if say one of your topic files was somehow corrupted or deleted on one server, can you recover from this with the topic files on your other servers in the cluster?
Topic replication accounts for these scenarios, yes.
If topics have a replication factor higher than one and you have unclean leader election disabled, then it's highly unlikely for a topic or partition to become unrecoverable.
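For example, here is a minimal sketch of creating such a topic with the Java Admin client; the topic name, partition count, and bootstrap address are placeholders, and unclean.leader.election.enable already defaults to false on recent versions, so setting it here just makes the intent explicit:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3, unclean leader election explicitly off.
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3)
                .configs(Map.of("unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}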
I have a 6-node Kafka cluster where, due to unforeseen circumstances, the disk partition holding Kafka data on one of the brokers filled up completely.
Kafka understandably won't start.
We managed to process the data from topics on the other brokers.
We have a replication factor of 4 so all is good there.
Can I manually delete an index file from a topic so that Kafka can start and clear the data itself, or is there a risk of corruption if I do that?
Once the broker starts it should clear most of the space, as we have already set a low retention on the topics that have been processed.
What is the best approach?
The best way that I found, in this case, is removing log files and decreasing the retention or replication in Kafka!
Some comments mention tuning the retention. I mentioned that we had already done that. The problem was that the broker that had a full disk could not start until some space was cleared.
After testing in a dev environment, I was able to resolve this by deleting some .log and .index files from one Kafka log folder. This allowed the broker to start. It then automatically started clearing data based on retention, and the situation was resolved.
Right now we are running Kafka on AWS EC2 servers, and ZooKeeper is also running on separate EC2 instances.
We have created services (systemd units) for Kafka and ZooKeeper to make sure they are started if the server gets rebooted.
The problem is that sometimes the ZooKeeper servers are a little late in starting, and by that time the Kafka brokers have already terminated.
So to deal with this issue we are planning to increase zookeeper.connection.timeout.ms on the broker side to some high number, like 10 minutes. Is this a good approach?
Are there any side effects of increasing the zookeeper.connection.timeout.ms timeout?
Increasing zookeeper.connection.timeout.ms may or may not solve the problem at hand, but there is a possibility that it will take longer to detect a broker soft failure.
A couple of things you can do:
1) Alter the systemd unit to delay the Kafka launch by 10 minutes (the time you wanted to put into the ZooKeeper timeout).
2) We use an HDP cluster, which automatically takes care of such scenarios.
Here is an explanation from the Kafka FAQ:
During a broker soft failure, e.g., a long GC, its session on ZooKeeper may timeout and hence be treated as failed. Upon detecting this situation, Kafka will migrate all the partition leaderships it currently hosts to other replicas. And once the broker resumes from the soft failure, it can only act as the follower replica of the partitions it originally leads.
To move the leadership back to the brokers, one can use the preferred-leader-election tool here. Also, in 0.8.2 a new feature will be added which periodically trigger this functionality (details here).
To reduce Zookeeper session expiration, either tune the GC or increase zookeeper.session.timeout.ms in the broker config.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ
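On newer broker and client versions the same preferred leader election can also be triggered programmatically through the Java Admin client instead of the CLI tool; a rough sketch, with the topic, partition, and bootstrap address as placeholders:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;

public class TriggerPreferredElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Ask the controller to move leadership of my-topic-0 back to its preferred replica.
            admin.electLeaders(ElectionType.PREFERRED,
                    Collections.singleton(new TopicPartition("my-topic", 0)))
                 .partitions().get();
        }
    }
}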
Hope this helps