Changing the retention period of Kafka at run time - apache-kafka

Hi, I want to reduce the retention period of Kafka.
How can I reduce the retention at run time, so that the Kafka service does not need to be restarted?
Note: I want to change retention at the global (cluster) level of Kafka, not at the topic level.

As the Kafka documentation describes in the Broker Configs section, under the subtopic Updating Default Topic Configuration:
Default topic configuration options used by brokers may be updated without broker restart. The configs are applied to topics without a topic config override for the equivalent per-topic config. One or more of these configs may be overridden at cluster-default level used by all brokers.
You can dynamically change Kafka's default topic configs at the cluster level. For the retention period, the relevant configs are:
log.retention.ms
log.retention.minutes
log.retention.hours
You can find the full list of configs in the documentation.
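For reference, the static equivalents in server.properties look like this (a sketch; the value shown is the shipped default, and when more than one is set the more precise unit takes precedence, i.e. ms over minutes over hours):
# server.properties -- static retention defaults (changing these requires a broker restart)
log.retention.hours=168        # shipped default: 7 days
# log.retention.minutes=...    # takes precedence over hours if set
# log.retention.ms=...         # takes precedence over minutes if set; the only one updatable at run time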
But again, according to the documentation, the update mode of the log.retention.hours and log.retention.minutes configurations is read-only, while log.retention.ms is cluster-wide.
So, as stated in 3.1.1 Updating Broker Configs:
From Kafka version 1.1 onwards, some of the broker configs can be
updated without restarting the broker. See the Dynamic Update Mode
column in Broker Configs for the update mode of each broker config.
read-only: Requires a broker restart for update
per-broker: May be updated dynamically for each broker
cluster-wide: May be updated dynamically as a cluster-wide default. May also be updated as a per-broker value for testing.
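To check which value a specific broker is currently using, you can describe a single broker (a sketch, assuming broker id 0 and a local bootstrap server):
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --describe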
So you can only change log.retention.ms dynamically.
The command to update this config for all brokers:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.retention.ms=3600000
Output:
Completed updating default config for brokers in the cluster.
To verify that the config was updated at the cluster level, run the following describe command:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --describe
Output:
Default configs for brokers in the cluster are:
log.retention.ms=3600000 sensitive=false synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.retention.ms=3600000}
If you need to remove or reset the config, run:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --delete-config log.retention.ms

In recent Kafka versions using KRaft (currently 3.3.2), you can change the retention of a specific topic using the following command:
/bin/kafka-configs.sh --bootstrap-server <the-ip-of-the-broker>:9092 --topic <specified-topic> --alter --add-config retention.ms=<retention-ms>
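To confirm that the override took effect, you can describe the topic's configs (a sketch, using the same broker address and topic placeholders as above):
/bin/kafka-configs.sh --bootstrap-server <the-ip-of-the-broker>:9092 --entity-type topics --entity-name <specified-topic> --describe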

Related

Data still remains in Kafka topic even after retention time/size

We set the log retention to 1 hour (the previous setting was 72 hours).
Using the Kafka command line tools, we set retention.ms to 1 hour. Our aim is to purge data older than 1 hour from the topic topic_test, so we used the following command:
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config retention.ms=3600000
and also
kafka-topics.sh --zookeeper localhost:2181 --alter \
--topic topic_test \
--config retention.ms=3600000
Both commands ran without errors.
But the problem is that data older than 1 hour still remains in the topic!
Actually, no data was removed from the topic_test partitions. We have an HDP Kafka cluster, version 1.0x, with Ambari.
We do not understand why the data in topic_test still remains and has not decreased, even after running both CLI commands described above.
What is wrong with the following Kafka CLI commands?
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.ms=3600000
kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_test --config retention.ms=3600000
From the Kafka server.log we can see the following:
[2020-07-28 14:47:27,394] INFO Processing override for entityPath: topics/topic_test with config: Map(retention.bytes -> 2165441552, retention.ms -> 3600000) (kafka.server.DynamicConfigManager)
[2020-07-28 14:47:27,397] WARN retention.ms for topic topic_test is set to 3600000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.server.TopicConfigHandler)
reference - https://ronnieroller.com/kafka/cheat-sheet
The log cleaner will only work on inactive (sometimes also referred to as "old" or "clean") segments. As long as all data fits into the active ("dirty", "unclean") segment, whose size is defined by the segment.bytes limit, no cleaning will happen.
The configuration cleanup.policy is described as:
A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
In addition, the segment.bytes is:
This configuration controls the segment file size for the log. Retention and cleaning is always done a file at a time so a larger segment size means fewer files but less granular control over retention.
The configuration segment.ms can also be used to steer the deletion:
This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn't full to ensure that retention can delete or compact old data.
As it defaults to one week, you might want to reduce it to fit your needs.
Therefore, if you want to set the retention of a topic to e.g. one hour you could set:
cleanup.policy=delete
retention.ms=3600000
segment.ms=3600000
file.delete.delay.ms=1 (The time to wait before deleting a file from the filesystem)
segment.bytes=1024
Note: I am not referring to retention.bytes; segment.bytes is a very different thing, as described above. Also, be aware that log.retention.hours is a cluster-wide configuration, so if you plan to have different retention times for different topics, setting them per topic as above will solve it.
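Putting it together, a sketch of applying these settings to the topic topic_test from the question, using the same ZooKeeper-based kafka-configs.sh syntax as above (newer clusters would use --bootstrap-server instead):
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config cleanup.policy=delete,retention.ms=3600000,segment.ms=3600000,file.delete.delay.ms=1,segment.bytes=1024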

In Apache Kafka, how can we delete the content of the _schemas topic without deleting the topic?

In Apache Kafka, is there any option to delete the content of the _schemas topic without deleting the topic and without changing the retention period to 1 second?
Following up on the answer by cricket_007: the Kafka internal topics __consumer_offsets and _schemas have the compact cleanup policy by default.
If you want to alter the configuration to use just delete, you can use: ./kafka-topics.sh --zookeeper <host>:2181 --topic _schemas --alter --config cleanup.policy=delete
Although the combined compact,delete policy is recommended.
Well, the topic itself is set to cleanup.policy=compact, so retention does not apply.
If you want an empty topic, you should just restart the Schema Registry with a different kafkastore.topic to create a new one.
Otherwise, setting cleanup.policy=compact,delete together with something like retention.ms=100 will clean up the topic.
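A sketch combining both steps with kafka-configs.sh (note the square brackets that group the comma-separated policy value; host and port are placeholders):
kafka-configs.sh --zookeeper <host>:2181 --entity-type topics --entity-name _schemas --alter --add-config 'cleanup.policy=[compact,delete],retention.ms=100'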

Increasing Default number of partitions in Kafka cluster

When we create a new Kafka topic automatically, the default number of partitions for that topic will be 1, since the configuration is num.partitions=1.
Is there any way to increase this property using a command or script, without editing the server.properties file?
To update the property itself you will have to modify server.properties, but you can increase the partitions of an existing topic using the Kafka admin scripts, as below:
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --partitions <number_of_partitions>
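To verify the new partition count, describe the topic afterwards:
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --describe --topic my_topic_name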
You could make a script called create-topic.sh:
./bin/kafka-topics.sh --create --zookeeper <ZK_HOST> --topic $1 --partitions <DEFAULT_NUM_PARTITIONS> --replication-factor <REPLICATION_FACTOR>
and force everyone to only make topics via this script:
./create-topic.sh <TOPIC_NAME>
This isn't a fantastic solution, but you're severely limited if you really can't change server.properties.
Kafka version 1.1 added the dynamic broker configuration feature, but updating the num.partitions config dynamically is not supported.

Kafka setting default retention log and testing

I'm currently trying to test the Kafka retention log using a new environment with docker-kafka.
I used config/server.properties and set the following for log retention:
log.retention.ms=2000
log.retention.check.interval.ms=2000
After creating a topic and adding messages to it, I would test the size of the log by going to the location of the log output. In my case it was at /tmp/kafka-logs/<topic-name>. Then just ls -l to see the size in bytes.
Adding more messages increases the size of the log file in bytes. If there is a better way to check the logs, please let me know.
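For example (a sketch, assuming a single-partition topic named test-topic and the default log.dirs of /tmp/kafka-logs; each partition has its own directory, suffixed with the partition number):
$ ls -lh /tmp/kafka-logs/test-topic-0/    # individual .log segment files and their sizes
$ du -sh /tmp/kafka-logs/test-topic-0/    # total size of the partition on disk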
Running:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic <topic-name>
I would get no config output, and the log file is not deleted after 2000 ms.
On Kafka docs:
The number of milliseconds to keep a log file before deleting it (in milliseconds). If not set, the value in log.retention.minutes is used.
However, setting a topic-level configuration with:
$ bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic <topic-name> --config retention.ms=2000
And checking the configs on the topic:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic <topic-name>
I see that the retention is set for the topic. Also, checking the log size again, the logs may not be deleted exactly after 2000 ms, but they do clear within a short time.
How can I set default configurations for all topics created? Specifically related to log retention time?
Also, an additional add-on question, are there config files for each individual topic created? I'm mainly asking because I know how to set topic-level configurations via cli, but was curious to see if these topic-level configs were saved somewhere.
A global config log.retention.{hours|minutes|ms} controls the default behavior for all topics. The topic-level config retention.ms is for individual topics.
In your case, the log did not get deleted before you set the topic-level config since the default parameter value is 7 days.
After a server restart, global configs are used for existing topics that rely on the defaults (i.e. have no topic-level configs overriding them) and for any new topics created after the restart.
I tested again with another configuration (pertaining only to log retention time) and confirmed that changes made to the global configs were not applied to existing or new topics until after another server restart, at which point the "new" global configs took effect.
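On Kafka 1.1 and later you can also change the cluster-wide default at run time, without a restart (a sketch, assuming a local broker on port 9092):
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.retention.ms=2000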

kafka broker config change dynamically

I'm using kafka_2.9.2-0.8.1.1 with zookeeper 3.4.6.
Is there a way to change broker configuration settings dynamically? Specifically, I want to change controlled.shutdown.enable
bin/kafka-topics.sh --zookeeper zookeeper01.mysite.com --config controlled.shutdown.enable=true --alter
but I get the error
Missing required argument "[topic]"
No, you can't change broker configs dynamically.
There are two kinds of configurations related to the brokers: broker configs and per-topic configs.
Since per-topic configs are managed by a Zookeeper cluster, you can change those with kafka-topics.sh on the fly.
controlled.shutdown.enable is, however, a broker config which can only be set via the server.properties file and requires a broker restart for changes to take effect.
This issue was also discussed in Kafka JIRA:
[KAFKA-1229] Reload broker config without a restart
You can now, from Kafka 1.1 onwards: Dynamic Broker Config
In your case, something like:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type brokers --entity-name 0 --alter \
--add-config controlled.shutdown.enable=true
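To check that the dynamic value was applied to that broker, describe it afterwards:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --describe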