With Kafka 0.8.1.1, how do I change the log retention time while it's running? The documentation says the property is log.retention.hours, but trying to change it using kafka-topics.sh returns this error:
$ bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config topic.log.retention.hours=24
Error while executing topic command requirement failed: Unknown configuration "topic.log.retention.hours".
java.lang.IllegalArgumentException: requirement failed: Unknown configuration "topic.log.retention.hours".
at scala.Predef$.require(Predef.scala:145)
at kafka.log.LogConfig$$anonfun$validateNames$1.apply(LogConfig.scala:138)
at kafka.log.LogConfig$$anonfun$validateNames$1.apply(LogConfig.scala:137)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.JavaConversions$JEnumerationWrapper.foreach(JavaConversions.scala:479)
at kafka.log.LogConfig$.validateNames(LogConfig.scala:137)
at kafka.log.LogConfig$.validate(LogConfig.scala:145)
at kafka.admin.TopicCommand$.parseTopicConfigsToBeAdded(TopicCommand.scala:171)
at kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:95)
at kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:93)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
at kafka.admin.TopicCommand$.alterTopic(TopicCommand.scala:93)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:52)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
log.retention.hours is a broker property, used as the default value when a topic is created. When you change the configuration of a currently running topic using kafka-topics.sh, you should specify a topic-level property.
A topic-level property for log retention time is retention.ms.
From the Topic-level configuration section of the Kafka 0.8.1 documentation:
Property: retention.ms
Default: 7 days
Server Default Property: log.retention.minutes
Description: This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data.
So the correct command depends on the version. Up to 0.8.2 (although the docs still show its use up to 0.10.1), use kafka-topics.sh --alter; from 0.10.2 onward (or perhaps from 0.9.0), use kafka-configs.sh --alter:
$ bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config retention.ms=86400000
You can check whether the configuration is properly applied with the following command.
$ bin/kafka-topics.sh --describe --zookeeper zk.yoursite.com --topic as-access
Then you will see something like below.
Topic:as-access PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=86400000
The following is the right way to alter topic config as of Kafka 0.10.2.0:
bin/kafka-configs.sh --zookeeper <zk_host> --alter --entity-type topics --entity-name test_topic --add-config retention.ms=86400000
Topic config alter operations have been deprecated for bin/kafka-topics.sh.
WARNING: Altering topic configuration from this script has been deprecated and may be removed in future releases.
Going forward, please use kafka-configs.sh for this functionality
The correct config key is retention.ms
$ bin/kafka-topics.sh --zookeeper zk.prod.yoursite.com --alter --topic as-access --config retention.ms=86400000
Updated config for topic "my-topic".
I tested and used this command with Confluent Kafka 4.0.0 and Apache Kafka 1.0.0 and 1.0.1:
/opt/kafka/confluent-4.0.0/bin/kafka-configs --zookeeper XX.XX.XX.XX:2181 --entity-type topics --entity-name test --alter --add-config retention.ms=55000
test is the topic name.
I think it works well in other versions too.
Is there a way to delete all the data from a topic or delete the topic before every run?
Can I modify the KafkaConfig.scala file to change the logRetentionHours property? Is there a way the messages get deleted as soon as the consumer reads them?
I am using producers to fetch data from somewhere and send it to a particular topic where a consumer consumes it. Can I delete all the data from that topic on every run? I want only new data in the topic each time. Is there a way to reinitialize the topic somehow?
As I mentioned here Purge Kafka Queue:
Tested in Kafka 0.8.2, for the quick-start example: First, add one line to the server.properties file under the config folder:
delete.topic.enable=true
then, you can run this command:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
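To confirm the deletion took effect (a quick sanity check, assuming the same localhost ZooKeeper), list the remaining topics; test should no longer appear:
bin/kafka-topics.sh --zookeeper localhost:2181 --list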
I don't think it is supported yet. Take a look at the JIRA issue "Add delete topic support".
To delete manually:
Shutdown the cluster
Clean the kafka log dir (specified by the log.dir attribute in the kafka config file) as well as the zookeeper data
Restart the cluster
For any given topic, what you can do is:
Stop kafka
Clean the kafka log specific to the partition. Kafka stores its log files in the format "logDir/topic-partition", so for a topic named "MyTopic" the log for partition id 0 will be stored in /tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is specified by the log.dir attribute
Restart kafka
This is NOT a good or recommended approach, but it should work.
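As a rough sketch of those steps in shell form (the stop/start scripts and the /tmp/kafka-logs path are assumptions; adjust to how your installation is managed):
# stop the broker
bin/kafka-server-stop.sh
# remove the partition directories for the topic (log.dir assumed to be /tmp/kafka-logs)
rm -rf /tmp/kafka-logs/MyTopic-*
# restart the broker
bin/kafka-server-start.sh config/server.properties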
In the Kafka broker config file, the log.retention.hours.per.topic attribute is used to define the number of hours to keep a log file before deleting it for a specific topic
Also, is there a way the messages gets deleted as soon as the consumer reads it?
From the Kafka Documentation:
The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the log retention is set to two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.
In fact the only metadata retained on a per-consumer basis is the position of the consumer in the log, called the "offset". This offset is controlled by the consumer: normally a consumer will advance its offset linearly as it reads messages, but in fact the position is controlled by the consumer and it can consume messages in any order it likes. For example a consumer can reset to an older offset to reprocess.
For finding the start offset to read from, the Kafka 0.8 Simple Consumer example says:
Kafka includes two constants to help, kafka.api.OffsetRequest.EarliestTime() finds the beginning of the data in the logs and starts streaming from there, kafka.api.OffsetRequest.LatestTime() will only stream new messages.
You can also find the example code there for managing the offset at your consumer end.
public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                 long whichTime, String clientName) {
    TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
    Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
    requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1));
    kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
    OffsetResponse response = consumer.getOffsetsBefore(request);
    if (response.hasError()) {
        System.out.println("Error fetching offset data from the broker. Reason: " + response.errorCode(topic, partition));
        return 0;
    }
    long[] offsets = response.offsets(topic, partition);
    return offsets[0];
}
Tested with Kafka 0.10:
1. Stop ZooKeeper and the Kafka server.
2. Go to the 'kafka-logs' folder; there you will see a list of kafka topic folders. Delete the folder with the topic name.
3. Go to the 'zookeeper-data' folder and delete the data inside it.
4. Start ZooKeeper and the Kafka server again.
Note: if you delete the topic folder(s) inside kafka-logs but not the data in the zookeeper-data folder, you will see that the topics are still there.
Below are scripts for emptying and deleting a Kafka topic assuming localhost as the zookeeper server and Kafka_Home is set to the install directory:
The script below will empty a topic by setting its retention time to 1 second and then removing the configuration:
#!/bin/bash
echo "Enter name of topic to empty:"
read topicName
$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --add-config retention.ms=1000
sleep 5
$Kafka_Home/bin/kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name $topicName --delete-config retention.ms
To fully delete topics you must stop any applicable kafka broker(s), remove the topic's directories from the kafka log dir (default: /tmp/kafka-logs), and then run this script to remove the topic from zookeeper. To verify it has been deleted from zookeeper, the output of ls /brokers/topics should no longer include the topic:
#!/bin/bash
echo "Enter name of topic to delete from zookeeper:"
read topicName
/$Kafka_Home/bin/zookeeper-shell localhost:2181 <<EOF
rmr /brokers/topics/$topicName
ls /brokers/topics
quit
EOF
As a dirty workaround, you can adjust per-topic runtime retention settings, e.g. bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config retention.bytes=1 (retention.bytes=0 might also work)
After a short while kafka should free the space. Not sure if this has any implications compared to re-creating the topic.
P.S. Better to bring the retention settings back once kafka is done with the cleaning.
You can also use retention.ms to control how long historical data is kept
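A minimal sketch of that set-then-revert cycle (topic name and ZooKeeper address are placeholders):
# shrink the topic so the cleaner discards almost everything
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config retention.bytes=1
# ...wait for kafka to free the space...
# then drop the override so the broker default applies again
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --delete-config retention.bytes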
We tried pretty much everything the other answers describe, with a moderate level of success.
What really worked for us (Apache Kafka 0.8.1) is the class command:
sh kafka-run-class.sh kafka.admin.DeleteTopicCommand --topic yourtopic --zookeeper localhost:2181
For brew users
If you're using brew like me and wasted a lot of time searching for the infamous kafka-logs folder, fear no more. (And please do let me know if this works for you across different versions of Homebrew, Kafka, etc. :) )
You're probably going to find it under:
Location:
/usr/local/var/lib/kafka-logs
How to actually find that path
(this is also helpful for basically every app you install through brew)
1) brew services list
kafka started matbhz
/Users/matbhz/Library/LaunchAgents/homebrew.mxcl.kafka.plist
2) Open and read that plist you found above
3) Find the line defining the server.properties location and open it; in my case:
/usr/local/etc/kafka/server.properties
4) Look for the log.dirs line:
log.dirs=/usr/local/var/lib/kafka-logs
5) Go to that location and delete the logs for the topics you wish
6) Restart Kafka with brew services restart kafka
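If you would rather script the lookup than read the plist by hand, this sketch (assuming the server.properties path brew used on my machine) prints the log directory directly:
grep '^log.dirs' /usr/local/etc/kafka/server.properties
# log.dirs=/usr/local/var/lib/kafka-logs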
All data about topics and their partitions is stored in /tmp/kafka-logs/. Moreover, it is stored in the format topic-partitionNumber, so if you want to delete a topic newTopic, you can:
stop kafka
delete the files rm -rf /tmp/kafka-logs/newTopic-*
As of Kafka version 2.3.0, there is an alternative way to soft-delete a topic's data (the old approaches are deprecated).
Update retention.ms to 1 second (1000 ms), then, after a minute or so, set it back to the default, i.e. 7 days (168 hours, 604,800,000 ms).
Soft deletion (retention.ms=1000, using kafka-configs.sh):
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=1000
Completed Updating config for entity: topic 'kafka_topic3p3r'.
Setting back to the default of 7 days (168 hours, retention.ms=604800000):
bin/kafka-configs.sh --zookeeper 192.168.1.10:2181 --alter --entity-name kafka_topic3p3r --entity-type topics --add-config retention.ms=604800000
Simplest way without restarting servers (I am using this with AWS MSK seamlessly):
cd kafka_2.12-2.6.2/bin
Topic Deletion:
Please replace $topic_name:
./kafka-topics.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--delete \
--topic $topic_name
Here is the client.properties file:
kafka_2.12-2.6.2/bin/client.properties
ssl.truststore.location=/usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
max.request.size=104857600
Topic Data Deletion:
Option A:
./kafka-delete-records.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--offset-json-file ./delete-records.json
This is the cleanest way to delete the data immediately rather than waiting for Kafka to do it as a background job, but it takes a one-time extra effort to specify all the partitions of the topic in the delete JSON file.
Here is the delete-records.json content (note that the topic name must appear as a quoted JSON string):
{
  "partitions": [
    {
      "topic": "$topic_name",
      "partition": 0,
      "offset": -1
    },
    {
      "topic": "$topic_name",
      "partition": 1,
      "offset": -1
    },
    {
      "topic": "$topic_name",
      "partition": 2,
      "offset": -1
    }
  ],
  "version": 1
}
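Writing that JSON by hand gets tedious for topics with many partitions. Here is a hedged sketch that generates it by counting partitions from a --describe call (it assumes the same client.properties works for kafka-topics.sh and that the describe output contains one "Partition: " line per partition):
partitions=$(./kafka-topics.sh --bootstrap-server $kafka_bootstrap_servers --command-config client.properties --describe --topic $topic_name | grep -c 'Partition: ')
{
  echo '{ "partitions": ['
  for ((p = 0; p < partitions; p++)); do
    [ $p -gt 0 ] && echo ','
    echo "  { \"topic\": \"$topic_name\", \"partition\": $p, \"offset\": -1 }"
  done
  echo '], "version": 1 }'
} > delete-records.json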
Option B:
Step 1:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--alter \
--entity-type topics \
--add-config retention.ms=1 \
--entity-name $topic_name
Now, wait a couple of minutes to let Kafka delete the data from the topic, then come back and revert to the default 7-day data retention.
Step 2:
./kafka-configs.sh \
--bootstrap-server $kafka_bootstrap_servers \
--command-config client.properties \
--alter \
--entity-type topics \
--add-config retention.ms=604800000 \
--entity-name $topic_name
Stop ZooKeeper and Kafka.
In server.properties, change the log.retention.hours value. You can comment out log.retention.hours and add log.retention.ms=1000, which keeps records on a Kafka topic for only one second.
Start ZooKeeper and Kafka.
Check on the consumer console. When I opened the console the first time, the record was there; when I opened it again, the record had been removed.
Later on, you can set log.retention.hours back to your desired figure.
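The relevant server.properties edit would look something like this (a sketch; when both are set, log.retention.ms takes precedence over log.retention.hours):
# config/server.properties
#log.retention.hours=168
log.retention.ms=1000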
I use the utility below to clean up after my integration test runs.
It uses the latest AdminZkClient API. The older API has been deprecated.
import java.util.Properties
import javax.inject.Inject
import kafka.zk.{AdminZkClient, KafkaZkClient}
import org.apache.kafka.common.utils.Time

class ZookeeperUtils @Inject() (config: AppConfig) {

  val testTopic = "users_1"
  val zkHost = config.KafkaConfig.zkHost
  val sessionTimeoutMs = 10 * 1000
  val connectionTimeoutMs = 60 * 1000
  val isSecure = false
  val maxInFlightRequests = 10
  val time: Time = Time.SYSTEM

  def cleanupTopic(config: AppConfig) = {
    val zkClient = KafkaZkClient.apply(zkHost, isSecure, sessionTimeoutMs, connectionTimeoutMs, maxInFlightRequests, time)
    val zkUtils = new AdminZkClient(zkClient)

    // expire deleted records almost immediately and delete segment files quickly
    val pp = new Properties()
    pp.setProperty("delete.retention.ms", "10")
    pp.setProperty("file.delete.delay.ms", "1000")
    zkUtils.changeTopicConfig(testTopic, pp)
    // zkUtils.deleteTopic(testTopic)

    println("Waiting for topic to be purged. Then reset to retain records for the run")
    Thread.sleep(60000L)

    // restore longer retention for the actual test run
    val resetProps = new Properties()
    resetProps.setProperty("delete.retention.ms", "3000000")
    resetProps.setProperty("file.delete.delay.ms", "4000000")
    zkUtils.changeTopicConfig(testTopic, resetProps)
  }
}
There is a delete topic option, but it only marks the topic for deletion; ZooKeeper deletes the topic later. Since this can take unpredictably long, I prefer the retention.ms approach.
do:
cd /path/to/kafkaInstallation/kafka-server
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic name_of_kafka_topic
then you can recreate it using:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic name_of_kafka_topic
For manually deleting a topic from a kafka cluster, you might check this out: https://github.com/darrenfu/bigdata/issues/6
A vital step that most solutions miss is deleting /config/topics/<topic_name> in ZK.
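A sketch of that cleanup in zkCli (the ZooKeeper address is a placeholder; on ZooKeeper 3.5+ use deleteall instead of rmr):
bin/zkCli.sh -server localhost:2181
rmr /config/topics/<topic_name>
rmr /brokers/topics/<topic_name>
rmr /admin/delete_topics/<topic_name>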
I use this script:
#!/bin/bash
topics=`kafka-topics --list --zookeeper zookeeper:2181`
for t in $topics; do
for p in retention.ms retention.bytes segment.ms segment.bytes; do
kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --config ${p}=100
done
done
sleep 60
for t in $topics; do
for p in retention.ms retention.bytes segment.ms segment.bytes; do
kafka-topics --zookeeper zookeeper:2181 --alter --topic $t --delete-config ${p}
done
done
There are two solutions to clean up topic data:
1. Change the zookeeper dataDir path ("dataDir=/dataPath") to some other value, delete the kafka logs folder, and restart the zookeeper and kafka servers.
2. Run zkCleanup.sh from the zookeeper server.
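A hedged sketch of the second option (zkCleanup.sh ships in ZooKeeper's bin directory and reads dataDir from zoo.cfg; -n gives the number of recent snapshots to keep, with a minimum of 3):
# purge old snapshots and transaction logs, keeping the 3 most recent
/path/to/zookeeper/bin/zkCleanup.sh -n 3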
I pushed a message that was too big into a kafka message topic on my local machine, and now I'm getting an error:
kafka.common.InvalidMessageSizeException: invalid message size
Increasing the fetch.size is not ideal here, because I don't actually want to accept messages that big.
Temporarily update the retention time on the topic to one second:
kafka-topics.sh \
--zookeeper <zkhost>:2181 \
--alter \
--topic <topic name> \
--config retention.ms=1000
And in newer Kafka releases, you can also do it with kafka-configs --entity-type topics
kafka-configs.sh \
--zookeeper <zkhost>:2181 \
--entity-type topics \
--alter \
--entity-name <topic name> \
--add-config retention.ms=1000
then wait for the purge to take effect (duration depends on size of the topic). Once purged, restore the previous retention.ms value.
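Restoring can be done either by setting the value you recorded beforehand, or by dropping the override entirely so the topic falls back to the broker default; a sketch of the latter:
kafka-configs.sh \
--zookeeper <zkhost>:2181 \
--entity-type topics \
--alter \
--entity-name <topic name> \
--delete-config retention.ms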
To purge the queue you can delete the topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
then re-create it:
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
--replication-factor 1 --partitions 1 --topic test
While the accepted answer is correct, that method has been deprecated. Topic configuration should now be done via kafka-configs.
kafka-configs --zookeeper localhost:2181 --entity-type topics --alter --add-config retention.ms=1000 --entity-name MyTopic
Configurations set via this method can be displayed with the command
kafka-configs --zookeeper localhost:2181 --entity-type topics --describe --entity-name MyTopic
Here are the steps to follow to delete a topic named MyTopic:
Describe the topic, and take note of the broker ids
Stop the Apache Kafka daemon for each broker ID listed.
Connect to each broker (from step 1), and delete the topic data folder, e.g. rm -rf /tmp/kafka-logs/MyTopic-0. Repeat for other partitions, and all replicas
Delete the topic metadata: run zkCli.sh, then rmr /brokers/topics/MyTopic
Start the Apache Kafka daemon for each stopped machine
If you miss step 3, Apache Kafka will continue to report the topic as present (for example, when you run kafka-list-topic.sh).
Tested with Apache Kafka 0.8.0.
Tested in Kafka 0.8.2, for the quick-start example:
First, add one line to the server.properties file under the config folder:
delete.topic.enable=true
then, you can run this command:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
Then recreate it, so clients can continue operations against an empty topic.
The following command can be used to delete all existing messages in a kafka topic:
kafka-delete-records --bootstrap-server <kafka_server:port> --offset-json-file delete.json
The structure of the delete.json file should be as follows:
{
  "partitions": [
    {
      "topic": "foo",
      "partition": 1,
      "offset": -1
    }
  ],
  "version": 1
}
where an offset of -1 will delete all the records.
(This command has been tested with kafka 2.0.1.)
From kafka 1.1
Purge a topic
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name tp_binance_kline --add-config retention.ms=100
wait at least 1 minute, to be sure that kafka purges the topic
then remove the configuration to fall back to the default value
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name tp_binance_kline --delete-config retention.ms
kafka doesn't have a direct method to purge/clean up a topic (queue), but you can do this by deleting the topic and recreating it.
First, make sure the server.properties file has delete.topic.enable=true, and add it if not.
Then, delete the topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic myTopic
Then create it again:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic myTopic --partitions 10 --replication-factor 2
Following Steven Appleyard's answer, I executed the following commands on Kafka 2.2.0 and they worked for me.
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name <topic-name> --describe
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name <topic-name> --alter --add-config retention.ms=1000
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name <topic-name> --alter --delete-config retention.ms
UPDATE: This answer is relevant for Kafka 0.6. For Kafka 0.8 and later, see the answer by Patrick.
Yes, stop kafka and manually delete all files from the corresponding subdirectory (it's easy to find in the kafka data directory). After the kafka restart, the topic will be empty.
A lot of great answers over here, but among them I didn't find one about docker. I spent some time figuring out that using the broker container is wrong for this case (obviously!!!).
## this is wrong!
docker exec broker1 kafka-topics --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000
Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:258)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:254)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:112)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1826)
at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:280)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
and I should have used zookeeper:2181 instead of --zookeeper localhost:2181 as per my compose file
## this might be an option, but as per comment below not all zookeeper images can have this script included
docker exec zookeper1 kafka-topics --zookeeper localhost:2181 --alter --topic mytopic --config retention.ms=1000
the correct command would be
docker exec broker1 kafka-configs --zookeeper zookeeper:2181 --alter --entity-type topics --entity-name dev_gdn_urls --add-config retention.ms=12800000
Hope it will save someone's time.
Also, be aware that messages won't be deleted immediately; deletion happens when the log segment is closed.
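If the purge never seems to finish because the active segment stays open, one trick that can help is temporarily lowering segment.ms as well, so the broker rolls the segment and retention can apply (a sketch in the same docker setup; remember to remove both overrides afterwards):
docker exec broker1 kafka-configs --zookeeper zookeeper:2181 --alter --entity-type topics --entity-name mytopic --add-config segment.ms=1000,retention.ms=1000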
Sometimes, if you have a saturated cluster (too many partitions, encrypted topic data, SSL in use, the controller on a bad node, or a flaky connection), it will take a long time to purge said topic.
I follow these steps, particularly if you're using TLS.
1: Run with kafka tools:
kafka-configs.sh --alter --entity-type topics --zookeeper zookeeper01.kafka.com --add-config retention.ms=1 --entity-name <topic-name>
2: Run:
kafka-console-consumer --consumer-property security.protocol=SSL --consumer-property ssl.truststore.location=/etc/schema-registry/secrets/trust.jks --consumer-property ssl.truststore.password=password --consumer-property ssl.keystore.location=/etc/schema-registry/secrets/identity.jks --consumer-property ssl.keystore.password=password --consumer-property ssl.key.password=password --bootstrap-server broker01.kafka.com:9092 --topic <topic-name> --new-consumer --from-beginning
3: Set topic retention back to the original setting once the topic is empty.
kafka-configs.sh --alter --entity-type topics --zookeeper zookeeper01.kafka.com --add-config retention.ms=604800000 --entity-name <topic-name>
Hope this helps someone, as it isn't easily advertised.
The simplest approach is to set the date of the individual log files to be older than the retention period. Then the broker should clean them up and remove them for you within a few seconds. This offers several advantages:
No need to bring down brokers, it's a runtime operation.
Avoids the possibility of invalid offset exceptions (more on that below).
In my experience with Kafka 0.7.x, removing the log files and restarting the broker could lead to invalid offset exceptions for certain consumers. This would happen because the broker restarts the offsets at zero (in the absence of any existing log files), and a consumer that was previously consuming from the topic would reconnect to request a specific [once valid] offset. If this offset happens to fall outside the bounds of the new topic logs, then no harm and the consumer resumes at either the beginning or the end. But, if the offset falls within the bounds of the new topic logs, the broker attempts to fetch the message set but fails because the offset doesn't align to an actual message.
This could be mitigated by also clearing the consumer offsets in zookeeper for that topic. But if you don't need a virgin topic and just want to remove the existing contents, then simply 'touch'-ing a few topic logs is far easier and more reliable, than stopping brokers, deleting topic logs, and clearing certain zookeeper nodes.
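A sketch of that 'touch' approach (the log path and retention window are assumptions, and this relies on the broker using file mtime for retention, as those older releases did):
# back-date every segment file of the topic so a 24h retention window sees them as expired
touch -d '2 days ago' /tmp/kafka-logs/MyTopic-*/*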
Thomas' advice is great, but unfortunately zkCli in old versions of Zookeeper (for example 3.3.6) does not seem to support rmr. For example, compare the command-line implementation in modern Zookeeper with version 3.3.
If you are faced with an old version of Zookeeper, one solution is to use a client library such as zc.zk for Python. For people not familiar with Python, you need to install it using pip or easy_install. Then start a Python shell (python) and you can do:
import zc.zk
zk = zc.zk.ZooKeeper('localhost:2181')
zk.delete_recursive('brokers/topics/MyTopic')
or even
zk.delete_recursive('brokers/topics')
if you want to remove all the topics from Kafka.
Besides updating retention.ms and retention.bytes, I noticed the topic cleanup policy should be "delete" (the default); if it is "compact", messages are held on to longer, and you have to specify delete.retention.ms as well.
$ ./bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-name test-topic-3-100 --entity-type topics
Configs for topics:test-topic-3-100 are retention.ms=1000,delete.retention.ms=10000,cleanup.policy=delete,retention.bytes=1
I also had to monitor that the earliest and latest offsets were the same to confirm this had successfully happened; you can also check with du -h /tmp/kafka-logs/test-topic-3-100-*
$ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "BROKER:9095" --topic test-topic-3-100 --time -1 | awk -F ":" '{sum += $3} END {print sum}'
26599762
$ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "BROKER:9095" --topic test-topic-3-100 --time -2 | awk -F ":" '{sum += $3} END {print sum}'
26599762
The other problem is that you have to fetch the current config first, so you remember what to revert to after the deletion is successful:
./bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-name test-topic-3-100 --entity-type topics
The workaround of temporarily reducing the retention time for a topic, suggested by user644265 in this answer, still works, but recent versions of kafka-configs will warn that the --zookeeper option has been deprecated:
Warning: --zookeeper is deprecated and will be removed in a future version of Kafka
Use --bootstrap-server instead; for example
kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my_topic --add-config retention.ms=100
and
kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my_topic --delete-config retention.ms
To clean up all the messages from a particular topic using your application group (the GroupName should be the same as the application's kafka group name):
./kafka-path/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicName --from-beginning --group application-group
Another, rather manual, approach for purging a topic is:
in the brokers:
stop kafka broker
sudo service kafka stop
delete all partition log files (should be done on all brokers)
sudo rm -R /kafka-storage/kafka-logs/<some_topic_name>-*
in zookeeper:
run zookeeper command line interface
sudo /usr/lib/zookeeper/bin/zkCli.sh
use zkCli to remove the topic metadata
rmr /brokers/topics/<some_topic_name>
in the brokers again:
restart broker service
sudo service kafka start
./kafka-topics.sh --describe --zookeeper zkHost:2181 --topic myTopic
This will show the configured retention.ms. Then you can use the alter command above to change it to 1 second (and later revert back to the default).
Topic:myTopic PartitionCount:6 ReplicationFactor:1 Configs:retention.ms=86400000
From Java, using the new AdminZkClient instead of the deprecated AdminUtils:
public void reset() {
    try (KafkaZkClient zkClient = KafkaZkClient.apply("localhost:2181", false, 200_000,
            5000, 10, Time.SYSTEM, "metricGroup", "metricType")) {
        for (Map.Entry<String, List<PartitionInfo>> entry : listTopics().entrySet()) {
            deleteTopic(entry.getKey(), zkClient);
        }
    }
}

private void deleteTopic(String topic, KafkaZkClient zkClient) {
    // skip Kafka internal topic
    if (topic.startsWith("__")) {
        return;
    }
    System.out.println("Resetting Topic: " + topic);
    AdminZkClient adminZkClient = new AdminZkClient(zkClient);
    adminZkClient.deleteTopic(topic);

    // deletions are not instantaneous
    boolean success = false;
    int maxMs = 5_000;
    while (maxMs > 0 && !success) {
        try {
            maxMs -= 100;
            adminZkClient.createTopic(topic, 1, 1, new Properties(), null);
            success = true;
        } catch (TopicExistsException ignored) {
        }
    }
    if (!success) {
        Assert.fail("failed to create " + topic);
    }
}

private Map<String, List<PartitionInfo>> listTopics() {
    Properties props = new Properties();
    props.put("bootstrap.servers", kafkaContainer.getBootstrapServers());
    props.put("group.id", "test-container-consumer-group");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    Map<String, List<PartitionInfo>> topics = consumer.listTopics();
    consumer.close();
    return topics;
}
If you want to do this programmatically within a Java application, you can use the AdminClient API's deleteRecords method. Using the AdminClient allows you to delete records at the partition and offset level.
According to the JavaDocs this operation is supported by brokers with version 0.11.0.0 or higher.
Here is a simple example:
String brokers = "localhost:9092";
String topicName = "test";
TopicPartition topicPartition = new TopicPartition(topicName, 0);
RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(5L);
Map<TopicPartition, RecordsToDelete> topicPartitionRecordToDelete = new HashMap<>();
topicPartitionRecordToDelete.put(topicPartition, recordsToDelete);

// Create AdminClient
final Properties properties = new Properties();
properties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
AdminClient adminClient = AdminClient.create(properties);

try {
    adminClient.deleteRecords(topicPartitionRecordToDelete).all().get();
} catch (InterruptedException e) {
    e.printStackTrace();
} catch (ExecutionException e) {
    e.printStackTrace();
} finally {
    adminClient.close();
}
You have to enable this in the config first:
echo "delete.topic.enable=true" >> /opt/kafka/config/server.properties
sudo systemctl stop kafka
sudo systemctl start kafka
purge the topic
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic flows
create the topic
# /opt/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic flows
read the topic
# /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic flows --from-beginning
If you are using confluentinc/cp-kafka containers, here is the command to delete the topic:
docker exec -it <kafka-container-id> kafka-topics --zookeeper zookeeper:2181 --delete --topic <topic-name>
Success response:
Topic <topic-name> is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
I'm using the Kafka 2.13 tools, where --zookeeper is an unrecognized option for kafka-topics.sh. To delete a topic:
bin/kafka-topics.sh --bootstrap-server [kafka broker]:9092 --delete --topic [topic name]
Just take into account that to create the same topic again, you may need to wait a while if you had a lot of data in the deleted topic. When you try to create the same topic, you may get this error:
ERROR org.apache.kafka.common.errors.TopicExistsException: Topic
'[topic name]' is marked for deletion.
Just in case someone is looking for an updated answer (in 2022), I found the following works for Kafka version 3.3.1. This changes the configuration for "your-topic" so that messages are retained for 1000 ms. Once the messages are purged, you can set the retention back to a different value.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name your-topic --alter --add-config retention.ms=1000
Have you considered having your app simply use a new, renamed topic (i.e. a topic named like the original but with a "1" appended at the end)?
That would also give your app a fresh, clean topic.