I'm trying to delete the Kafka topic __consumer_offsets as it is causing a lot of confusion for my brokers.
When I do so, it says this topic can't be marked for deletion.
I'm using the ZooKeeper CLI to delete it, e.g. rmr /brokers/topics/__consumer_offsets, but it is not working!
__consumer_offsets is a Kafka internal topic and cannot be deleted through the delete-topic command. It contains the committed offsets for each topic:partition for each consumer group (group.id). If you want to wipe it out entirely, you have to delete the ZooKeeper dataDir location. That means you lose all the metadata.
Also, if you just want to get rid of the existing consumer groups, you can reset their offsets or delete the groups themselves.
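For example, a sketch using the stock kafka-consumer-groups.sh tool (group and topic names are placeholders; the exact options available depend on your Kafka version):
# Reset a group's offsets to the earliest position (preview first, then apply)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic my-topic --reset-offsets --to-earliest --dry-run
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic my-topic --reset-offsets --to-earliest --execute
# Or delete the (inactive) group entirely
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group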
AFAIK you cannot delete that topic. It is an internal topic and should not be deleted manually.
If you absolutely must, you will have to manually clean/remove your data directories. When you deploy Kafka brokers and ZooKeeper, each creates a data directory.
Note: by removing the data directories you will lose all topics and their data, so this is not a feasible option in production.
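As a rough sketch (the paths below are the defaults from the shipped server.properties and zookeeper.properties; check log.dirs and dataDir for your actual locations):
# Stop the services first
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
# Remove the data directories (defaults shown; adjust to your configs)
rm -rf /tmp/kafka-logs /tmp/zookeeper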
Related
On a 3-node cluster, I created a few topics with thousands of messages. I have noticed that it takes a long time to delete a topic; it took me more than 14 minutes to delete 500 topics.
Are there any best practices for topic deletion?
Is there any document that explains why it takes so much time to delete a topic?
When I create a topic, Kafka creates a folder under log.dirs. I had 10000 topics and ran a command to delete all of them. Kafka deleted all 10000 folders from log.dirs, but kafka-topics.sh still lists topics that no longer exist on the file system as "marked for deletion".
I don't think there are any best practices for deleting a topic in Kafka. As long as delete.topic.enable=true is set in server.properties, you can simply delete the topic using
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic myTopic
If your topics were large enough (and possibly had a high replication factor as well), that's normal. The messages of a topic are stored in log files, and if the topics are extremely large it can take some time to get rid of all of those files. The proper metric here is the total size of the topics you attempted to delete, not their number (500 topics with 1 message each is very different from 500 topics with, e.g., 1 TB of messages each).
Kafka topic deletion is not guaranteed to be instant. When you 'delete' a topic, you are actually marking it for deletion. When the TopicDeletionManager next runs, it will start removing the topics that are marked for deletion; this takes longer when the topic logs are large.
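A quick way to check what is still pending, as a sketch (host is a placeholder):
# Topics awaiting deletion are listed with a "marked for deletion" note
bin/kafka-topics.sh --zookeeper localhost:2181 --list
# The pending set is also visible directly in ZooKeeper
bin/zookeeper-shell.sh localhost:2181 ls /admin/delete_topics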
I am using bin/kafka-topics.sh --zookeeper <zookeeper_host>:2181 --delete --topic <topic_name>, and the Kafka logs indicate that the partitions for that topic are marked for deletion. However, I still see the directories for those partitions in the data dir.
Is this expected, and do I have to delete them manually?
The topics haven't been removed from ZooKeeper either; I still see the topics there. Is this also expected?
Thanks!
There could be several reasons for topics not being deleted automatically.
In order to delete a topic, delete.topic.enable should be set to true.
If it is set to true, the topic directories should be removed from ZooKeeper and from the Kafka log.dirs. If they aren't, check the logs for problems with the Kafka brokers or ZooKeeper, e.g. a leader election issue.
In that case, you have to clean up the directories manually.
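As a rough checklist (host and topic name are placeholders; depending on your ZooKeeper version, rmr may instead be called deleteall):
# Confirm deletion is enabled on every broker
grep delete.topic.enable config/server.properties
# Inspect what ZooKeeper still knows about the topic
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
bin/zookeeper-shell.sh localhost:2181 ls /admin/delete_topics
# Last resort, with the brokers stopped: remove the metadata and the data
bin/zookeeper-shell.sh localhost:2181 rmr /brokers/topics/<topic_name>
rm -rf /path/to/log.dirs/<topic_name>-*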
I was using a Kafka topic, and its metadata as well, in my application. I hard-deleted the topic from the ZooKeeper shell by deleting the directories corresponding to that topic. After creating the topic again, I described it and found that no leaders had been assigned to the newly created topic. In the consumer, I can see repeated logs printing LEADER_NOT_AVAILABLE. Any idea what I am doing wrong? Or is there some metadata related to the Kafka topic that I'm unaware of and should also delete? Thanks in advance!
Deleting topics in Kafka hasn't been straightforward until recently. In general, you shouldn't delete Kafka topics by removing metadata in ZooKeeper; always use the included command line utilities.
First you need to make sure that deleting topics is enabled in the server.properties file on all brokers, and do a rolling restart if needed:
delete.topic.enable=true
After you restart the brokers to enable topic deletion, you should issue the delete command using the command line utilities:
./kafka-topics.sh --zookeeper <zookeeper_host>:2181 --delete --topic <topic_name>
If at this point it's still stuck, run these two commands from the ZooKeeper shell to remove all metadata for that particular topic:
rmr /brokers/topics/<topic_name>
rmr /admin/delete_topics/<topic_name>
A few more details here:
https://medium.com/@contactsunny/manually-delete-apache-kafka-topics-424c7e016ff3
I am facing the issue below after changing some Kafka-related properties and restarting the cluster.
In Kafka Connect, there were 5 connector jobs running.
If we make an important property change and restart the cluster, some or all of the existing jobs are not able to start.
Ideally all of the jobs should start,
since they take their metadata from the system topics below:
config.storage.topic
offset.storage.topic
status.storage.topic
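For reference, these settings live in the Connect worker configuration. A hedged sketch of a distributed worker properties file (the topic names shown are the conventional defaults, not necessarily yours):
# connect-distributed.properties (excerpt, assumed names)
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status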
First, a bit of background. Kafka stores all of its data in topics, but those topics (or rather the partitions that make up a topic) are append-only logs that would grow forever unless something is done. To prevent this, Kafka has the ability to clean up topics in two ways: retention and compaction. Topics configured to use retention will retain data for a configurable length of time: the broker is free to remove any log messages that are older than this. Topics configured to use compaction require every message have a key, and the broker will always retain the last known message for every distinct key. Compaction is extremely handy when each message (i.e., key/value pair) represents the last known state for the key; since consumers are reading the topic to get the last known state for each key, they will eventually get to that last state a bit faster if older states are removed.
Which cleanup policy a broker will use for a topic depends on several things. Every topic created implicitly or explicitly will use retention by default, though you can change that in a couple of ways (see the sketch after this list):
change the global log.cleanup.policy broker setting, affecting only topics created after that point; or
specify the cleanup.policy topic-specific setting when you create or modify a topic
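For example, a sketch of both approaches (host, topic name, and sizing are placeholders):
# Option 1: broker-wide default for newly created topics, in server.properties
log.cleanup.policy=compact
# Option 2: per-topic override at creation time
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-compacted-topic \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact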
Now, Kafka Connect uses several internal topics to store connector configurations, offsets, and status information. These internal topics must be compacted topics so that (at least) the last configuration, offset, and status for each connector are always available. Since Kafka Connect never uses older configurations, offsets, and status, it's actually a good thing for the broker to remove them from the internal topics.
Before Kafka 0.11.0.0, the recommended process was to manually create these internal topics with the correct topic-specific settings. You could rely upon the broker to auto-create them, but that is problematic for several reasons, not least because the three internal topics should have different numbers of partitions.
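A sketch of that manual creation (the topic names, partition counts, and replication factor below are commonly recommended values, not requirements; match the names to your worker's *.storage.topic settings):
# All three must be compacted; the partition counts differ by purpose
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic connect-configs \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic connect-offsets \
  --partitions 25 --replication-factor 3 --config cleanup.policy=compact
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic connect-status \
  --partitions 5 --replication-factor 3 --config cleanup.policy=compact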
If these internal topics are not compacted, the configurations, offsets, and status info will be cleaned up and removed after the retention period has elapsed. By default this retention period is 24 hours! That means that if you restart Kafka Connect more than 24 hours after deploying / updating a connector configuration, that connector's configuration may have been purged and it will appear as if the connector configuration never existed.
So, if you didn't create these internal topics correctly, simply use the topic admin tool to update the topic's settings as described in the documentation.
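A sketch of that fix using kafka-configs.sh (the topic name is a placeholder; older brokers used kafka-topics.sh --alter --config instead):
# Switch an existing internal topic to compaction
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics \
  --entity-name connect-configs --alter --add-config cleanup.policy=compact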
BTW, not properly creating these internal topics is a very common problem, so much so that Kafka Connect 0.11.0.0 will be able to automatically create these internal topics using the correct settings without relying upon broker auto-creation of topics.
In 0.11.0 you will still have to rely upon manual creation or broker auto-creation for topics that source connectors write to. This is not ideal, and so there's a proposal to change Kafka Connect to automatically create the topics for the source connectors while giving the source connectors control over the settings. Hopefully that improvement makes it into 0.11.1.0 so that Kafka Connect is even easier to use.