How to find the retention bytes per topic - apache-kafka

With the following kafka-configs command we can set the retention bytes to 1000000:
TOPIC_NAME=test
kafka-configs --alter --zookeeper localhost:2181 --entity-type topics --entity-name $TOPIC_NAME --add-config retention.bytes=1000000
But how do we go the opposite way and find the retention bytes per topic?

You can use the --describe option:
./kafka-configs --zookeeper localhost:2181 --describe --entity-type topics --entity-name test
Only properties that have been overridden are returned; everything else falls back to the broker defaults.
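On newer Kafka versions, kafka-configs can also talk to the broker directly instead of ZooKeeper. A sketch, assuming a broker at localhost:9092 and Kafka 2.5+ for the --all flag:

```shell
# Show only the overrides set on the topic (same as the ZooKeeper variant):
kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type topics --entity-name test

# On Kafka 2.5+, --all prints every effective config including broker
# defaults, so retention.bytes is listed even when it was never overridden:
kafka-configs.sh --bootstrap-server localhost:9092 --describe --all \
  --entity-type topics --entity-name test
```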

Related

How to change Kafka retention which is not working

I am using the dockerised wurstmeister/kafka-docker image. I created a topic using:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 27 --topic raw-sensor-data --config retention.ms=86400000
After a few days I tried changing the retention period with:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name raw-sensor-data --add-config retention.ms=3600000
I also tried
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic raw-sensor-data --config retention.ms=3600000
and
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic raw-sensor-data --config cleanup.policy=delete
The change is also reflected in the topic describe output:
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topics-with-overrides
Topic: raw-sensor-data PartitionCount: 27 ReplicationFactor: 1 Configs: cleanup.policy=delete,retention.ms=3600000
But I can still see old data; it is not being deleted after 1 hour.
In server.properties I have
log.retention.check.interval.ms=300000
Only closed log segments are deleted; the active segment is never eligible. The default segment size (segment.bytes) is 1 GB, so if the topic holds less data than that, the active segment never rolls and the old data remains regardless of how much time has passed.
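If you need the 1-hour retention to take effect on a low-volume topic, you can also make segments roll sooner by lowering segment.ms on the topic, so closed segments exist for the cleaner to delete. A sketch; the broker address and the 10-minute value are illustrative:

```shell
# Roll the active segment every 10 minutes so time-based retention has
# closed segments it is allowed to delete:
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name raw-sensor-data \
  --add-config segment.ms=600000
```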

Kafka: how to change the topic retention hours with the kafka-configs CLI

We want to change the Kafka retention to 1 hour:
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.hours=1
Error while executing config command Unknown topic config name: retention.hours
org.apache.kafka.common.errors.InvalidConfigurationException: Unknown topic config name: retention.hours
Then we try:
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.ms=3600000
Completed Updating config for entity: topic 'test_topic'.
My question is: is retention.ms the same thing as retention.hours, only expressed in milliseconds?
The goal is to purge the Kafka data files after 1 hour, but kafka-configs.sh does not accept retention.hours=1, so we changed it to retention.ms=3600000.
As I understand it, retention time is controlled cluster-wide by:
log.retention.ms
log.retention.minutes
log.retention.hours
These properties are listed from highest to lowest precedence. But as explained, the Kafka CLI can't use retention.hours on a topic, which is why I am using retention.ms=3600000.
Reference -
https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-log-cleanup-policies.html#log-retention
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/installing-configuring-kafka/content/log_settings.html
retention.hours does not exist as a topic config. The log.retention.{ms,minutes,hours} properties are broker-level settings; the only per-topic retention overrides are retention.ms and retention.bytes. And yes, retention.ms=3600000 is exactly 1 hour expressed in milliseconds, so your command achieves the intended 1-hour purge.
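The arithmetic behind the conversion can be checked directly:

```shell
# 1 hour = 60 minutes * 60 seconds * 1000 milliseconds
echo $((1 * 60 * 60 * 1000))   # prints 3600000
```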

How to purge or delete a topic in kafka 2.1.0 version

I would like to share different ways to purge or delete a Kafka topic in version 2.1.0. I've found a similar question here: Purge Kafka Topic. However, the accepted answer has been deprecated and only works on Kafka version 0.8 and below, hence I am creating this question with an answer.
This is not a duplicate question.
By default Kafka keeps messages for 168 hours, i.e. 7 days. If you want to force Kafka to purge a topic earlier, there are several ways to do it. Let's look at each in detail.
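That 7-day default comes from the broker setting log.retention.hours=168; expressed in milliseconds it is:

```shell
# 168 hours (7 days) in milliseconds, the broker-wide default retention:
echo $((168 * 60 * 60 * 1000))   # prints 604800000
```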
1. Using kafka-configs.sh command
Temporarily change the retention policy to 1 sec.
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --add-config retention.ms=1000 --entity-name text_topic
You can check the current value of the retention policy by running the command below:
kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --describe --entity-name text_topic
Configs for topic 'text_topic' are retention.ms=1000
Wait for the cleaner to delete the data (it runs every log.retention.check.interval.ms, 5 minutes by default), then remove the retention override, which restores the default:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --delete-config retention.ms --entity-name text_topic
2. Delete topic and re-create
Before deleting an existing topic, note its partition count and replication factor, as you will need them to re-create the topic. You can get this information by describing the topic:
kafka-topics.sh --zookeeper localhost:2181 --describe --topic text_topic
Topic:text_topic PartitionCount:3 ReplicationFactor:3 Configs:
Topic: text_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Delete the topic.
kafka-topics.sh --zookeeper localhost:2181 --delete --topic text_topic
Re-create the topic with replication and partition details.
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic text_topic
3. Manually delete the data from kafka logs.
Stop ZooKeeper and Kafka on all nodes.
Clean the Kafka logs on all nodes. Kafka stores its log files at /tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is the directory set by the log.dirs (or log.dir) broker property.
Restart ZooKeeper and Kafka.
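The manual steps above can be sketched as a script to run on every broker node. The service names and the data directory are assumptions; check log.dirs in your server.properties before deleting anything:

```shell
# Hypothetical service names; adapt to how Kafka/ZooKeeper run on your hosts.
systemctl stop kafka
systemctl stop zookeeper
# Delete only the target topic's partition directories under log.dirs:
rm -rf /tmp/kafka-logs/MyTopic-*
systemctl start zookeeper
systemctl start kafka
```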
Hope this helps !!

How to verify a Kafka topic is indeed purged after setting retention time to 1 second

I want to purge a topic in Kafka, so I set the retention time to 1 second using:
/opt/kafka/bin/kafka-configs.sh --zookeeper zk.com --entity-type topics --entity-name my_topic --alter --add-config retention.ms=1000
The broker has log.retention.check.interval.ms=300000, i.e. the cleaner runs every 5 minutes, so I wait 7 minutes and then remove the retention override:
/opt/kafka/bin/kafka-configs.sh --zookeeper zk.com --entity-type topics --entity-name my_topic --alter --delete-config retention.ms
How can I know for certain that the topic is indeed purged?
Another way to do this is to use the GetOffsetShell tool. To get the start offset of each partition in the topic, run:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic <topic> --time -2
and to get the end offsets run
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic <topic> --time -1
If reported offsets are equal, the topic is purged.
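That comparison can be scripted. The helper below is a sketch that only compares two GetOffsetShell outputs (topic:partition:offset lines); wiring it to a live broker is shown in the trailing comment, with hypothetical paths and host:

```shell
# Prints "purged" when every partition's earliest offset equals its latest
# offset (i.e. no messages remain), "not purged" otherwise.
offsets_equal() {
  start="$1"
  end="$2"
  if [ "$(printf '%s\n' "$start" | sort)" = "$(printf '%s\n' "$end" | sort)" ]; then
    echo "purged"
  else
    echo "not purged"
  fi
}

# Against a live broker (hypothetical paths/host):
#   start=$(bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
#             --broker-list localhost:9092 --topic my_topic --time -2)
#   end=$(bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
#             --broker-list localhost:9092 --topic my_topic --time -1)
#   offsets_equal "$start" "$end"
```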
I believe one of the ways to know for sure that the topic is indeed purged is to read it with the --from-beginning option. For example:
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server broker.com:9093 --topic my_topic --from-beginning
If it doesn't return any output, then we can be sure it's purged.

Kafka topic with no duplication of messages

How can I achieve such an outcome with messages in Kafka topics?
I.e. changelog-like functionality: multiple messages come into the topic, but I only care about the last one per key.
Also, what happens when the topic is partitioned?
Is this possible in Kafka?
To achieve this, you should set cleanup.policy for this topic to compact, as shown below:
CREATE TOPIC:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config cleanup.policy=compact
UPDATE TOPIC:
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config cleanup.policy=compact
With the compact policy set, you have to assign a key to every message; the Kafka producer partitions messages by that key, so all messages with the same key land on the same partition, and compaction eventually keeps only the latest value for each key.
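To try compaction out, kafka-console-producer can attach a key parsed from each input line. A sketch; the separator character is an assumption you can change:

```shell
# Each input line is "key:value"; same-key messages go to the same partition,
# and compaction eventually keeps only the latest value per key:
kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic \
  --property parse.key=true --property key.separator=:
```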