Kafka setting default retention log and testing - apache-kafka

I'm currently trying to test the kafka retention log using a new environment with docker-kafka.
I used config/server.properties and set the following for log retention:
log.retention.ms=2000
log.retention.check.interval.ms=2000
After creating a topic and adding messages to it, I would check the size of the log by going to the output log location; in my case that was /tmp/kafka-logs/<topic-name>. Then just ls -l to see the size in bytes.
Adding more messages increases the size in bytes of the log file. If there is a better way to check the logs, please let me know.
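One slightly more robust check than ls -l on a single file is summing the bytes of all *.log segment files per partition directory; a sketch below, run against a throwaway directory standing in for /tmp/kafka-logs/<topic-name>-<partition> (the topic name "mytopic" and the 2048-byte segment are illustrative). Newer Kafka versions also ship bin/kafka-log-dirs.sh --describe, which reports log sizes per partition from the broker itself.

```shell
# Throwaway directory standing in for /tmp/kafka-logs;
# "mytopic" and the 2048-byte segment file are illustrative only.
logdir=$(mktemp -d)
mkdir -p "$logdir/mytopic-0"
head -c 2048 /dev/zero > "$logdir/mytopic-0/00000000000000000000.log"

# Sum the bytes of every segment file in the partition directory
total=0
for f in "$logdir"/mytopic-0/*.log; do
  size=$(wc -c < "$f")
  total=$((total + size))
done
echo "$total"   # → 2048
```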
Running:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic <topic-name>
I would get no config output, and the log file was not deleted after 2000 ms.
On Kafka docs:
The number of milliseconds to keep a log file before deleting it (in milliseconds). If not set, the value in log.retention.minutes is used.
However, setting a topic-level configuration with:
$ bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic <topic-name> --config retention.ms=2000
And checking the configs on the topic:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic <topic-name>
I see that the retention is set for the topic. Also, checking the log size again, it may not delete the logs at exactly 2000 ms, but it does clear them within a short time.
How can I set default configurations for all topics created? Specifically related to log retention time?
Also, an additional add-on question, are there config files for each individual topic created? I'm mainly asking because I know how to set topic-level configurations via cli, but was curious to see if these topic-level configs were saved somewhere.

A global config log.retention.{hours|minutes|ms} controls the default behavior for all topics. The topic-level config retention.ms is for individual topics.
In your case, the log did not get deleted before you set the topic-level config because the default value (log.retention.hours=168) is 7 days.
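For context, the broker-wide defaults behind this behavior live in config/server.properties; the stock file ships with values along these lines (7-day retention, 5-minute check interval):

```properties
# Broker-wide defaults; apply to every topic without a per-topic override
log.retention.hours=168                  # 7 days; log.retention.minutes/ms take precedence if set
log.retention.check.interval.ms=300000   # how often the retention check runs (5 minutes)
```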

After a server restart, the global configs are applied to existing topics that have no topic-level overrides, and to any new topics created after the restart.
I tested again with another retention-related global config and confirmed that changes to the global configs did not take effect for existing or new topics until the server was restarted; after another restart, the "new" global configs were in effect.

Related

Changing the retention period of kafka during run time

Hi, I want to reduce the retention period of Kafka.
How can I reduce the retention period at runtime, so that a restart of the Kafka service is not required?
Note: I want to set retention at the global (cluster) level of Kafka, not the topic level.
As the Kafka documentation describes in the Broker Configs section, under the following sub-topic:
Updating Default Topic Configuration
Default topic configuration options used by brokers may be updated
without broker restart. The configs are applied to topics without a
topic config override for the equivalent per-topic config. One or more
of these configs may be overridden at cluster-default level used by
all brokers.
You can change Kafka's default topic configs dynamically at the cluster level. For the retention period you can change the configs below:
log.retention.ms
log.retention.minutes
log.retention.hours
You can see other config list in the documentation.
But again, according to the documentation, the update mode of the log.retention.hours and log.retention.minutes configurations is read-only, while log.retention.ms is cluster-wide.
So as stated in 3.1.1 Updating Broker Configs
From Kafka version 1.1 onwards, some of the broker configs can be
updated without restarting the broker. See the Dynamic Update Mode
column in Broker Configs for the update mode of each broker config.
read-only: Requires a broker restart for update
per-broker: May be updated dynamically for each broker
cluster-wide: May be updated dynamically as a cluster-wide default. May also be updated as a per-broker value for testing.
So only log.retention.ms can be changed dynamically.
Config-updating command for all brokers:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.retention.ms=3600000
output
Completed updating default config for brokers in the cluster.
To verify that the config was updated at the cluster level, run the following describe command:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --describe
output
Default configs for brokers in the cluster are:
log.retention.ms=3600000 sensitive=false synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.retention.ms=3600000}
If you need to remove or reset the config again, run
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --delete-config log.retention.ms
In the latest versions of Kafka using KRaft (currently 3.3.2), you can change the retention of a specific topic using the following command.
/bin/kafka-configs.sh --bootstrap-server <the-ip-of-the-broker>:9092 --topic <specified-topic> --alter --add-config retention.ms=<retention-ms>

Data still remains in Kafka topic even after retention time/size

We set the log retention to 1 hour (the previous setting was 72 hours).
Using the Kafka command-line tools, we set retention.ms to 1 hour. Our aim is to purge data older than 1 hour from the topic topic_test, so we used the following command:
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config retention.ms=3600000
and also
kafka-topics.sh --zookeeper localhost:2181 --alter \
--topic topic_test \
--config retention.ms=3600000
Both commands ran without errors.
But the problem is that Kafka data older than 1 hour still remains!
Actually, no data was removed from the topic_test partitions. We have an HDP Kafka cluster, version 1.0x, with Ambari.
We do not understand why the data in topic_test still remains and has not decreased, even after running both CLI commands described above.
What is wrong with the following Kafka CLI commands?
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.ms=3600000
kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_test --config retention.ms=3600000
From the Kafka server.log we can see the following:
[2020-07-28 14:47:27,394] INFO Processing override for entityPath: topics/topic_test with config: Map(retention.bytes -> 2165441552, retention.ms -> 3600000) (kafka.server.DynamicConfigManager)
[2020-07-28 14:47:27,397] WARN retention.ms for topic topic_test is set to 3600000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.server.TopicConfigHandler)
reference - https://ronnieroller.com/kafka/cheat-sheet
The log cleaner will only work on inactive (sometimes also referred to as "old" or "clean") segments. As long as all data fits into the active ("dirty", "unclean") segment, whose size is defined by the segment.bytes limit, no cleaning will happen.
The configuration cleanup.policy is described as:
A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
In addition, the segment.bytes is:
This configuration controls the segment file size for the log. Retention and cleaning is always done a file at a time so a larger segment size means fewer files but less granular control over retention.
The configuration segment.ms can also be used to steer the deletion:
This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn't full to ensure that retention can delete or compact old data.
As it defaults to one week, you might want to reduce it to fit your needs.
Therefore, if you want to set the retention of a topic to e.g. one hour you could set:
cleanup.policy=delete
retention.ms=3600000
segment.ms=3600000
file.delete.delay.ms=1 (The time to wait before deleting a file from the filesystem)
segment.bytes=1024
Note: I am not referring to retention.bytes; segment.bytes is a very different thing, as described above. Also, be aware that log.retention.hours is a cluster-wide configuration, so if you plan to have different retention times for different topics, the topic-level settings above will solve it.
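The millisecond values above are easy to misread, so a quick sanity check of the arithmetic (plain shell, no broker needed):

```shell
# retention.ms / segment.ms for one hour
echo $((60 * 60 * 1000))            # → 3600000
# segment.ms default of one week, in milliseconds
echo $((7 * 24 * 60 * 60 * 1000))   # → 604800000
```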

Kafka "num.partitions" setting in service.properties does not take effect

We use Kafka in Docker container. We create topics automatically if the topic does not exist when producing or consuming messages. We want 3 partitions for the topics, so set
num.partitions=3
in the file /etc/kafka/server.properties in the Kafka container. However, it does not take effect. After applying the setting and restarting the container, then trying to subscribe or publish on some nonexistent topics, the topics are created, but with only one partition.
We tried this on containers created from image confluentinc/cp-kafka:5.1.0 and also on containers created from image confluentinc/cp-enterprise-kafka:5.3.1, and the behaviors were the same.
We tested creating topics with command:
kafka-topics --create --topic my_topic --zookeeper zookeeper:2181 --replication-factor 1 --partitions 3
This correctly created the topic with three partitions. But we need Kafka to create multi-partition topics automatically.
What could cause the problem? Or how to make Kafka auto-create multi-partition topics?
We do not have any dynamic configs. This is verified by running the following commands:
kafka-configs --bootstrap-server kafka:9092 --entity-type brokers --entity-default --describe
kafka-configs --bootstrap-server kafka:9092 --entity-type brokers --entity-name 0 (or other ids) --describe
Those commands return empty results.
This answer comes kinda late, but I've been struggling with the same thing using the docker image: confluentinc/cp-enterprise-kafka:5.3.4.
The solution for me was adding a new environment variable in my docker-compose:
KAFKA_NUM_PARTITIONS: 3 (or the partitions you want)
This will automatically add the property num.partitions to the kafka.properties file under the /etc/kafka/ directory.
Modifying the property num.partitions in /etc/kafka/server.properties didn't work for me either.
Docker containers are ephemeral, meaning that once you stop them, all the changes you applied inside are lost.
If you want to overwrite the default settings you have to mount the property file:
Create a server.properties file on your machine
Fill it in with the properties that you need ( including the original ones )
Mount this file to replace the original one in the container:
docker run ... -v /path/to/custom/server.properties:/etc/kafka/server.properties ...
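Putting both approaches together, a docker-compose sketch (service and port details are illustrative; the KAFKA_* env-var-to-property mapping is the confluentinc image convention mentioned above):

```yaml
services:
  kafka:
    image: confluentinc/cp-kafka:5.3.1
    environment:
      KAFKA_NUM_PARTITIONS: 3                 # maps to num.partitions
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    # Alternative to env vars: mount your own properties file over the container's
    # volumes:
    #   - ./server.properties:/etc/kafka/server.properties
```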

Kafka 10.2 new consumer vs old consumer

I've spent some hours to figure out what was going on but didn't manage to find the solution.
Here is my set up on a single machine:
1 zookeeper running
3 broker running (on port 9092/9093/9094)
1 topic with 3 partitions and 3 replications (each partition are properly assigned between brokers)
I'm using the Kafka console producer to insert messages. If I check the replication offsets (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I don't see anything consumed. I tried deleting my logs folder (/tmp/kafka-logs-[1,2,3]) and creating new topics, but still nothing.
However when I use the old kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work?
Thanks in advance.
Check what setting the consumer is using for the auto.offset.reset property.
This affects what a consumer group without a previously committed offset will do when deciding where to start reading messages from a partition.
Check the Kafka docs for more on this.
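One way to pin that down is to put the setting in a consumer properties file and pass it to the console consumer with --consumer.config (the file name here is illustrative):

```properties
# consumer.properties -- pass via:
#   bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#     --topic testTopicPartitionned2 --consumer.config consumer.properties
auto.offset.reset=earliest   # start from the beginning when no committed offset exists
```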
Try providing all your brokers in the --bootstrap-server argument to see if you notice any difference:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.

Limiting log size for a particular topic in kafka

I am trying to limit the log size for a Kafka topic.
I followed the Kafka documentation (Kafka 0.10.0 Documentation) and set these two properties: cleanup.policy=delete and retention.bytes=1.
Result of the topic describe command:
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic kafkatest1
Topic:kafkatest1 PartitionCount:3 ReplicationFactor:1 Configs:cleanup.policy=delete,retention.bytes=1
I was expecting that whatever messages I write to topic 'kafkatest1' would get deleted automatically, since I have set retention.bytes to 1, but messages keep getting appended.
Is any additional configuration required to achieve this?
Please check your log.segment.bytes (the default is 1073741824 bytes). As far as I know, retention is only taken into account when a log segment is rolled.
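Following that segment-rolling logic, a topic-level sketch that forces segments to roll quickly so retention can actually fire (values are deliberately tiny, for testing only):

```properties
# Topic-level overrides (testing values, not for production)
cleanup.policy=delete
retention.bytes=1      # delete rolled (inactive) segments almost immediately
segment.bytes=1024     # roll a new segment after ~1 KB so old ones become eligible
```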
Could you explain what you are trying to achieve? Why send a message only to delete it?