Why can it take so long to delete Kafka topics? - apache-kafka

On a 3-node cluster, I created a few topics with thousands of messages. I have noticed that it takes a long time to delete a topic. It took me more than 14 minutes to delete 500 topics.
Are there any best practices for topic deletion?
Is there any document that explains why it takes so much time to delete a topic?
When I create a topic, Kafka creates a folder under log.dirs. I had 10000 topics and ran a command to delete all of them. Kafka has deleted all 10000 folders from log.dirs, but kafka-topics.sh still shows topics that no longer exist on the file system as "marked for deletion".

I don't think there are any best practices for deleting a topic in Kafka. As long as delete.topic.enable=true is set in server.properties, you can simply delete the topic using:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic myTopic
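Note that delete.topic.enable is a broker-side setting and requires a broker restart to take effect. A minimal server.properties fragment could look like the following (the log.dirs path is just an example):
# server.properties on each broker
delete.topic.enable=true
# directory where partition data lives; path below is illustrative
log.dirs=/var/lib/kafka/data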
If your topics were large enough (and you may have had a high replication factor as well), then that's normal. Essentially, a topic's messages are stored in log files. If your topics are extremely large, it can take some time to get rid of all of those files. I think the proper metric here is the size of the topics you attempted to delete, not the number of topics (you can have 500 topics each holding one message, as opposed to 500 topics with e.g. 1 TB of messages each).
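If you want a rough idea of how much data is actually on disk before deleting, you can check the size of the topic's partition directories under log.dirs on each broker. A quick sketch (the data path and topic name are assumptions, adjust to your setup):
# size of every partition directory of myTopic on this broker (path is illustrative)
du -sh /var/lib/kafka/data/myTopic-*
# total size of the whole log directory
du -sh /var/lib/kafka/data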

Kafka topic deletion is not guaranteed to be instant. When you 'delete' a topic, you are actually marking it for deletion. When the TopicDeletionManager next runs, it will then start removing any topics that are marked for deletion. This may also take longer if the topic logs are larger.
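You can watch this happening by looking at the /admin/delete_topics znode, where topics sit while they are queued for deletion. A small sketch using the ZooKeeper shell that ships with Kafka (host and port are placeholders):
# topics currently marked for deletion
bin/zookeeper-shell.sh localhost:2181 ls /admin/delete_topics
# topics ZooKeeper still knows about
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics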

Related

Expiring the messages in Kafka Topic

We are using Apache Kafka to perform load tests in our dev environment.
The Linux box where we have installed Confluent Kafka has limited space, so to perform the load test we added the retention.ms property to the topic.
The idea is to remove messages from the topic after they have been consumed by the consumer.
I have tried
kafka-topics --zookeeper localhost:2181 --alter --topic myTopic --config retention.ms=10000
It didn't work, so we re-created the topic and tried the option below:
kafka-configs --alter --zookeeper localhost:2181 --entity-type topics --entity-name myTopic --add-config retention.ms=10000
After running the process for a few hours, the broker shuts down because of the space constraint.
What other options can I try, from a topic as well as a broker standpoint, to keep expiring messages reliably and reclaiming disk space for a long-running load test?
You can define the deletion policy based on size (in bytes) in addition to time.
The topic configuration is called retention.bytes and in the documentation it is described as:
This configuration controls the maximum size a partition (which consists of log segments) can grow to before we will discard old log segments to free up space if we are using the "delete" retention policy. By default there is no size limit only a time limit. Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes.
You can set it together with retention.ms, and whichever limit (bytes or time) is reached first triggers the cleanup.
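As a sketch (the values are only illustrative, not recommendations), setting both limits and then checking the overrides could look like this:
# cap each partition at ~1 GB and keep messages at most 10 seconds
kafka-configs --alter --zookeeper localhost:2181 --entity-type topics --entity-name myTopic --add-config retention.ms=10000,retention.bytes=1073741824
# verify which overrides are in effect for the topic
kafka-configs --describe --zookeeper localhost:2181 --entity-type topics --entity-name myTopic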
This might be because your log cleaner threads have not been triggered yet.
You did not provide much info about how much data accumulates on the topics, but it might not be in the GBs.
Log cleaner threads only clean completed log segments. The default segment size is 1 GB.
Modify the topic configuration segment.bytes to a smaller value if you are expecting a huge load,
or
modify the configuration segment.ms to 1 minute or 10 minutes, as per your requirement.
This will roll segments more frequently, and based on your log retention settings the cleaner threads will clean up the older completed segments.
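For example (values are illustrative only), rolling segments every 10 minutes or at 100 MB, whichever comes first, lets retention reclaim space much sooner:
# roll a new segment every 10 minutes or at 100 MB
kafka-configs --alter --zookeeper localhost:2181 --entity-type topics --entity-name myTopic --add-config segment.ms=600000,segment.bytes=104857600
Keep in mind that only closed segments are eligible for deletion, which is why smaller or shorter-lived segments help here.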

Delete __consumer_offsets topic from Kafka

I'm trying to delete the Kafka topic __consumer_offsets as it is causing a lot of confusion for my brokers.
When I do so, it says this topic can't be marked for deletion.
I'm using the ZooKeeper CLI to delete it, e.g. rmr /brokers/topics/__consumer_offsets, but it is not working!
__consumer_offsets is a Kafka internal topic and it is not allowed to be deleted through the delete topic command. It contains information about the committed offsets for each topic:partition for each group of consumers (groupId). If you want to wipe it out entirely you have to delete the ZooKeeper dataDir location, which means you lose all the cluster metadata.
Also, if you just want to get rid of the existing consumer groups, you can reset their offsets or consider deleting them, as shown below.
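If cleaning up consumer groups is enough, the consumer-groups tool can do that on reasonably recent brokers; the group name, topic, and bootstrap address below are placeholders, and --reset-offsets needs Kafka 0.11 or newer:
# delete an inactive consumer group
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group
# or rewind its offsets instead of deleting it (Kafka 0.11+)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic myTopic --reset-offsets --to-earliest --execute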
AFAIK you cannot delete that topic. It is an internal topic and should not be deleted manually.
If you must, then you will have to manually clean/remove your data directory. When you deploy Kafka brokers and ZooKeeper, they create data directories.
Note: by removing the data directories you will lose all topics and related data, so this is not a feasible option in production.
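If you really have to go this route, the destructive manual procedure is roughly the following; every path here is an assumption and must match your own server.properties and ZooKeeper configuration:
# stop all brokers and ZooKeeper nodes first, then on each machine:
rm -rf /var/lib/kafka/data        # Kafka log.dirs: removes ALL topics and data (path is illustrative)
rm -rf /var/lib/zookeeper/data    # ZooKeeper dataDir: removes ALL cluster metadata (path is illustrative)
# restart ZooKeeper, then the brokers; the cluster comes back empty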

Kafka partition directories not deleted in data dir

I am using bin/kafka-topics.sh --zookeeper --delete --topic and I see entries in the Kafka logs indicating that the partitions for that topic are marked for deletion. However, I am still seeing the directories for those partitions in the data dir.
Is this expected, and do I have to manually delete them?
The topics haven't been removed from ZooKeeper either; I still see the topics in ZooKeeper. Is this also expected?
Thanks!
There could be several reasons for topics not being deleted automatically.
In order to delete a topic, delete.topic.enable should be set to true.
If it is set to true, deletion should remove both the entries in ZooKeeper and the directories in the Kafka data dir. If it doesn't, you should check the logs to see whether there is any problem with the Kafka brokers or ZooKeeper, for example due to a leader election issue.
In that case, you have to clean up the directories manually, as sketched below.
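A sketch of that manual cleanup (topic name, ZooKeeper address, and data path are placeholders; only do this once you are sure the brokers will not finish the deletion themselves):
# check what ZooKeeper still knows about
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
# remove the stale topic znode and any pending deletion marker
bin/zookeeper-shell.sh localhost:2181 rmr /brokers/topics/myTopic
bin/zookeeper-shell.sh localhost:2181 rmr /admin/delete_topics/myTopic
# then remove the partition directories from log.dirs on each broker (path is illustrative)
rm -rf /var/lib/kafka/data/myTopic-*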

Delay in Kafka Operations

During testing I do some operations like deleting a topic.
However, I can still see the topic immediately after deleting it.
Using: bin/kafka-topics.sh --list --zookeeper localhost:2181
It takes some time for deletion to actually occur.
This confused me.
Similarly, when I produce data, I cannot consume it immediately; I have to wait for some time and re-run the consume command.
Is it because I am running a single-node Kafka setup and testing it too heavily?
If you want to delete a topic, you need to enable this via the broker setting delete.topic.enable.
Deleting a topic happens "asynchronously", i.e., after you issue the delete command, the topic gets marked as "to be deleted" and the broker will delete it at some point later on.
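You can see this directly: right after issuing the delete, the topic still appears in the listing with a deletion marker until the brokers have finished removing it (topic name is a placeholder):
bin/kafka-topics.sh --list --zookeeper localhost:2181
# myTopic - marked for deletion   <-- shown until the deletion has completed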
About producing/consuming: not sure. If the consumer is online and the producer writes data, it should show up at the consumer shortly after...

Kafka topic is marked for deletion but not getting deleted in kafka 0.9

I am trying to delete my Kafka topic with the following command:
bin/kafka-topics.sh --zookeeper <zkserver>:2181 --delete --topic test1
My Kafka version is 0.9 and I have also set the delete.topic.enable flag to true. Still, when I run the above command, my topic is only marked for deletion and not actually deleted.
A logical topic is composed of multiple partitions, and each partition may have multiple replicas. In other words, your topic is physically distributed across multiple broker instances.
If any of those instances is down, your topic deletion will not be able to finish.
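A quick way to check whether all replicas are reachable is to describe the topic; look for a Leader of -1 or replicas missing from the Isr column, which point to a broker being down (the ZooKeeper address is a placeholder):
bin/kafka-topics.sh --zookeeper <zkserver>:2181 --describe --topic test1
# in the output, Leader: -1 or a shrunken Isr list means a replica's broker is unavailable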
There was an orphan producer process running against that topic, which was spawned by my Java Kafka producer program. I eventually found out about it when I started a console consumer on the same topic. After manually killing that process, I was able to delete the topic.