Kafka Configuration Change - apache-kafka

I am trying to change two config params of a Kafka topic dynamically (without a restart), namely flush.messages and flush.ms, to restrict the number of writes to disk, as the disk seems to be the bottleneck in our case. But these configuration changes are not being applied to the topic: flush.ms has been set to 10000 ms, yet it still writes to disk every 5000 ms. Any idea why this config param isn't being picked up by the topic?
Kafka version used - 0.8.1.1
command(s) used -
bin/kafka-topics.sh --zookeeper zkHost:zkPort --topic topicName --alter --config flush.messages=200000
bin/kafka-topics.sh --zookeeper zkHost:zkPort --topic topicName --alter --config flush.ms=20000
How can I change the log flush interval if this doesn't work?

Related

Data still remains in Kafka topic even after retention time/size

We changed the log retention to 1 hour as follows (the previous setting was 72 hours).
Using the Kafka command line tools, we set retention.ms to 1 hour. Our aim is to purge data older than 1 hour from the topic topic_test, so we used the following command:
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config retention.ms=3600000
and also
kafka-topics.sh --zookeeper localhost:2181 --alter \
--topic topic_test \
--config retention.ms=3600000
Both commands ran without errors.
But the problem is that Kafka data older than 1 hour still remains!
In fact, no data was removed from the topic_test partitions. We have an HDP Kafka cluster, version 1.0x, and Ambari.
We do not understand why the data in topic_test is still there and has not decreased, even after running both commands described above.
What is wrong with the following Kafka commands?
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.ms=3600000
kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_test --config retention.ms=3600000
From the Kafka server.log we can see the following:
[2020-07-28 14:47:27,394] INFO Processing override for entityPath: topics/topic_test with config: Map(retention.bytes -> 2165441552, retention.ms -> 3600000) (kafka.server.DynamicConfigManager)
[2020-07-28 14:47:27,397] WARN retention.ms for topic topic_test is set to 3600000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.server.TopicConfigHandler)
reference - https://ronnieroller.com/kafka/cheat-sheet
The log cleaner only works on inactive (sometimes also referred to as "old" or "clean") segments. As long as all data fits into the active ("dirty", "unclean") segment, whose size is limited by segment.bytes, no cleaning will happen.
The configuration cleanup.policy is described as:
A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
In addition, the segment.bytes is:
This configuration controls the segment file size for the log. Retention and cleaning is always done a file at a time so a larger segment size means fewer files but less granular control over retention.
The configuration segment.ms can also be used to steer the deletion:
This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn't full to ensure that retention can delete or compact old data.
As it defaults to one week, you might want to reduce it to fit your needs.
Therefore, if you want to set the retention of a topic to e.g. one hour you could set:
cleanup.policy=delete
retention.ms=3600000
segment.ms=3600000
file.delete.delay.ms=1 (The time to wait before deleting a file from the filesystem)
segment.bytes=1024
Note: I am not referring to retention.bytes. segment.bytes is a very different thing, as described above. Also, be aware that log.retention.hours is a cluster-wide configuration. So, if you plan to have different retention times for different topics, the topic-level settings above are the way to do it.
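As a rough sketch of the one-hour setup above, the following shell snippet converts hours to milliseconds and prints (rather than runs) a matching kafka-configs.sh call. The function names hours_to_ms and print_retention_cmd are made up for illustration. The ZooKeeper address and topic name are taken from the question, and setting segment.ms to the same value as retention.ms simply follows the suggestion in the answer.

```shell
#!/bin/sh
# Convert a number of hours to milliseconds (integer arithmetic).
hours_to_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

# Print, but do not execute, the kafka-configs.sh call for a topic.
# $1 = topic name, $2 = retention in hours (both supplied by the caller).
print_retention_cmd() {
  ms=$(hours_to_ms "$2")
  echo "kafka-configs.sh --alter --zookeeper localhost:2181" \
       "--entity-type topics --entity-name $1" \
       "--add-config retention.ms=$ms,segment.ms=$ms"
}

# Dry run for the topic from the question.
print_retention_cmd topic_test 1
```

Printing the command first makes it easy to check the computed value before anything is changed on the cluster.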

how Compaction works in Apache Kafka

My input is 1:45$. After processing the message, I update it to 1:null.
I still see 1:45$ in the topic along with 1:null (both messages are visible).
I want the topic to contain only 1:null.
I have used this code:
kafka-topics --create --zookeeper zookeeper:2181 --topic latest-product-price --replication-factor 1 --partitions 1 --config "cleanup.policy=compact" --config "delete.retention.ms=100" --config "segment.ms=100" --config "min.cleanable.dirty.ratio=0.01"
kafka-console-producer --broker-list localhost:9092 --topic latest-product-price --property parse.key=true --property key.separator=::
1::45$
1::null
kafka-console-consumer --bootstrap-server localhost:9092 --topic latest-product-price --property print.key=true --property key.separator=:: --from-beginning
But I do not see any compaction in my case, and I need some input on how to end up with only 1::null.
Compaction in Kafka is not immediate. If you send two messages with the same key to a compacted topic, and you have a live consumer on that topic, that consumer will see both messages come through.
Periodically, there's a background cleaner thread that goes looking for duplicate keys in compacted topics, and removes the overwritten records, so that a consumer that pulls down the data after that log cleaner has run, will only see the last change/update for a particular key. So, topic compaction seems to be better suited for consumers that run periodically, not ones that are active 100% of the time.
One thing that you can tune is how often this background log cleaner thread runs, which may let you run those consumers more often. Look for the log.cleaner configuration parameters in the Kafka documentation: https://kafka.apache.org/documentation/#brokerconfigs
There's a good explanation on how Kafka log compaction works at this link:
https://medium.com/swlh/introduction-to-topic-log-compaction-in-apache-kafka-3e4d4afd2262
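To make the end state concrete, here is a small model of what a finished compaction pass leaves behind: for each key, only the record with the highest offset survives, and survivors keep their relative order. This is an awk simulation over key::value lines, not Kafka code, and the function name compact is invented for the example. The model treats the text null as an ordinary value.

```shell
#!/bin/sh
# Model of log compaction: read key::value lines (one per offset) and
# keep, for every key, only the line with the highest line number.
compact() {
  awk -F'::' '
    { line[NR] = $0; key[NR] = $1; last[$1] = NR }
    END {
      for (i = 1; i <= NR; i++)
        if (last[key[i]] == i) print line[i]
    }
  '
}

# The two records from the question, plus one unrelated key.
printf '1::45$\n2::10$\n1::null\n' | compact
```

A consumer reading before the cleaner has run corresponds to the input of compact; a consumer reading afterwards corresponds to its output.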

Delete topic level config

In order to delete all of the data in a topic I set the retention.ms config of it to 1000.
./bin/kafka-topics.sh --zookeeper $KAFKAZKHOSTS --alter --topic <topic> --config retention.ms=1000
This worked fine. All the data was deleted after a very short wait.
Before altering the config, the retention.ms was not set on the topic and so the server default property log.retention.hours=168 was the previous retention policy. (log.retention.minutes and log.retention.ms had not been set in the server properties).
Now I would like to remove the retention.ms config from this topic completely and go back to using the server level config.
Commands like
./bin/kafka-topics.sh --zookeeper $KAFKAZKHOSTS --alter --topic <topic> --config retention.ms=
or
./bin/kafka-topics.sh --zookeeper $KAFKAZKHOSTS --alter --topic <topic> --config retention.ms=null
throw an error.
I know that the delete option for kafka-topics.sh actually deletes the entire topic, so I'm not going to try play around with that.
Question: How do I completely remove a topic level config so that the topic reverts to using the server default?
To remove a topic configuration override, you can use the kafka-configs tool. For example:
./bin/kafka-configs.sh --zookeeper <zookeeper> --alter \
--entity-type topics --entity-name <topic> --delete-config retention.ms
Another, more traditional solution that I have used is to manually delete the data folder of the particular topic:
First, stop the Kafka server.
Go to /var/lib/kafka/data (or wherever your Kafka data is stored, as specified at installation time).
rm -rf /var/lib/kafka/data/yourTopicName-0

Increasing Default number of partitions in Kafka cluster

When a new Kafka topic is created automatically, the default number of partitions for that topic is 1, because of the configuration num.partitions=1.
Is there any way to increase this property using a command or script, without editing the server.properties file?
To update the property itself you will have to modify server.properties, but you can increase the partitions of an existing topic using the Kafka admin scripts, as below:
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name \
--partitions <number_of_partitions>
You could make a script called create-topic.sh:
./bin/kafka-topics.sh --create --zookeeper <ZK_HOST> --topic "$1" --partitions <DEFAULT_NUM_PARTITIONS>
and force everyone to only make topics via this script:
./create-topic.sh <TOPIC_NAME>
This isn't a fantastic solution, but you're severely limited if you really can't change server.properties.
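A slightly fuller sketch of such a wrapper might look like the following. The environment variables ZK_HOST, DEFAULT_PARTITIONS and DEFAULT_REPLICATION, their fallback values, and the function name build_create_cmd are all assumptions made for this example, not Kafka settings. The script only prints the command it would run, so it can be tried without a cluster.

```shell
#!/bin/sh
# create-topic.sh (sketch): build a kafka-topics.sh --create call with
# site-wide defaults. ZK_HOST, DEFAULT_PARTITIONS and DEFAULT_REPLICATION
# are hypothetical environment variables with illustrative fallbacks.
build_create_cmd() {
  topic=$1
  if [ -z "$topic" ]; then
    echo "usage: create-topic.sh <TOPIC_NAME>" >&2
    return 1
  fi
  echo "./bin/kafka-topics.sh --create" \
       "--zookeeper ${ZK_HOST:-localhost:2181}" \
       "--topic $topic" \
       "--partitions ${DEFAULT_PARTITIONS:-3}" \
       "--replication-factor ${DEFAULT_REPLICATION:-1}"
}

# Dry run: print the command instead of executing it.
build_create_cmd "my_topic_name"
```

Keeping the defaults in one place means the effective partition count can be changed later without touching server.properties, which is the point of the workaround.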
The dynamic broker configuration feature was added in Kafka 1.1, but updating the num.partitions config that way is not supported.

Delete topic in Kafka 0.8.1.1

I need to delete the topic test in Apache Kafka 0.8.1.1.
As expressed in the documentation here, I have executed:
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic test
However, this results in the following message:
Command must include exactly one action: --list, --describe, --create or --alter
How can I delete this topic?
Topic deletion doesn't always work in 0.8.1.1.
Deletion should work in the next release, 0.8.2:
kafka-topics.sh --delete --zookeeper localhost:2181 --topic your_topic_name
Topic your_topic_name is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
You may also pass in a bootstrap server instead of zookeeper:
kafka-topics.sh --bootstrap-server kafka:9092 --delete --topic your_topic_name
Is it possible to delete a topic?
Jira KAFKA-1397
It seems that the deletion command was not officially documented in Kafka 0.8.1.x because of a known bug (https://issues.apache.org/jira/browse/KAFKA-1397).
Nevertheless, the command was still shipped in the code and can be executed as:
bin/kafka-run-class.sh kafka.admin.DeleteTopicCommand --zookeeper localhost:2181 --topic test
In the meantime, the bug got fixed and the deletion command is now officially available from Kafka 0.8.2.0 as:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
Add below line in ${kafka_home}/config/server.properties
delete.topic.enable=true
Restart the kafka server with new config:
${kafka_home}/bin/kafka-server-start.sh ${kafka_home}/config/server.properties
Delete the topics you wish to:
${kafka_home}/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic daemon12
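Before restarting, it can help to confirm that the setting is really active in the file and not just present as a comment. The check below is a minimal sketch using grep. The function name delete_enabled is invented for the example, and the demo runs against a temporary file rather than a real server.properties. Note that a missing line only means the broker default applies, which differs between Kafka versions.

```shell
#!/bin/sh
# Return success if the given properties file enables topic deletion.
# Commented-out lines (starting with #) do not match.
delete_enabled() {
  grep -Eq '^[[:space:]]*delete\.topic\.enable[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

# Demo against a throwaway file instead of a real server.properties.
props=$(mktemp)
printf '#delete.topic.enable=true\nnum.partitions=1\n' > "$props"
if delete_enabled "$props"; then echo "before: enabled"; else echo "before: disabled"; fi

# Simulate uncommenting the setting by appending the active line.
echo 'delete.topic.enable=true' >> "$props"
if delete_enabled "$props"; then echo "after: enabled"; else echo "after: disabled"; fi
rm -f "$props"
```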
Andrea is correct. We can do it from the command line.
We can also do it programmatically:
ZkClient zkClient = new ZkClient("localhost:2181", 10000);
zkClient.deleteRecursive(ZkUtils.getTopicPath("test2"));
Actually, I do not recommend deleting topics on Kafka 0.8.1.1. I can delete a topic with this method, but if you check the ZooKeeper log, the deletion messes things up.
Steps to delete one or more topics in Kafka
#To delete topics in Kafka, the delete option needs to be enabled on the Kafka server.
1. Go to {kafka_home}/config/server.properties
2. Uncomment delete.topic.enable=true
#To delete one topic in Kafka, enter the following command
kafka-topics.sh --delete --zookeeper localhost:2181 --topic <your_topic_name>
#To delete more than one topic from Kafka
(good for testing purposes, where I created multiple topics and had to delete them for different scenarios)
1. Stop the Kafka server and ZooKeeper.
2. Go to the server folder where the logs are stored (defined in their config files) and delete the kafkalogs and zookeeper folders manually.
3. Restart ZooKeeper and the Kafka server and try to list topics:
bin/kafka-topics.sh --list --zookeeper localhost:2181
If no topics are listed, then all topics have been deleted successfully. If topics are listed, then the delete was not successful. Try the above steps again or restart your computer.
You can delete a specific Kafka topic (example: test) from the ZooKeeper shell (zookeeper-shell.sh). Use the command below to delete the topic:
rmr {path of the topic}
example:
rmr /brokers/topics/test
These steps will delete all topics and data:
Stop the Kafka server and the ZooKeeper server.
Remove the data directories of both services. By default on Windows they are
C:/tmp/kafka-logs and C:/tmp/zookeeper.
Then start the ZooKeeper server and the Kafka server.
The command:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
unfortunately only marks the topic for deletion.
The deletion itself does not happen.
This causes trouble when testing scripts that prepare the Kafka configuration.
Connected threads:
Purge Kafka Queue
Is there a way to delete all the data from a topic or delete the topic before every run?
As mentioned in doc here
Topic deletion option is disabled by default. To enable it set the server config
delete.topic.enable=true
Kafka does not currently support reducing the number of partitions for a topic or changing the replication factor.
Make sure delete.topic.enable=true
Adding to the above answers: one also has to delete the metadata associated with that topic under the consumer offset path in ZooKeeper.
bin/zookeeper-shell.sh zookeeperhost:port
rmr /consumers/<sample-consumer-1>/offsets/<deleted-topic>
Otherwise the lag will show as negative in ZooKeeper-based Kafka monitoring tools.
First, you run this command to delete your topic:
$ bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic <topic_name>
Then list the active topics to check that it was deleted completely:
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
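The check on the list output can be scripted. The sketch below matches the topic name as an exact line, so that a topic called test is not confused with test-archive. The function names topic_listed and sample_list are invented for the example, and sample_list stands in for the real --list command so the snippet runs without a broker.

```shell
#!/bin/sh
# Succeed if topic name $1 appears as an exact line on stdin.
# -F: fixed string, -x: match the whole line, -q: no output.
topic_listed() {
  grep -Fxq -- "$1"
}

# Stand-in for: bin/kafka-topics.sh --list --bootstrap-server localhost:9092
sample_list() {
  printf 'orders\npayments\ntest-archive\n'
}

if sample_list | topic_listed "test"; then
  echo "test still exists"
else
  echo "test is gone"
fi
```

In a real check, the output of the --list command would be piped into topic_listed in place of sample_list.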
If you have issues deleting the topics, try to delete the topic using this command:
$KAFKA_HOME/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic your_topic_name
Then, to verify the deletion, go to the Kafka logs directory, which is normally /tmp/kafka-logs/, and delete the your_topic_name directory with rm -rf your_topic_name.
Remember to monitor the whole process via a kafka management tool like Kafka Tool.
The mentioned process above will remove the topics without kafka server restart.
There is actually a solution without touching those bin/kafka-*.sh: If you have installed kafdrop, then simply do:
curl -XPOST http://your-kafdrop-domain/topic/THE-TOPIC-YOU-WANT-TO-DELETE/delete
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic <topic-name>
Step 1: Make sure ZooKeeper and Kafka are running and that you can connect to them.
Step 2: To delete the Kafka topic, run the kafka-topics script with the ZooKeeper host and port, --topic with the name of your topic, and --delete. On success the topic is deleted.
# Delete the Kafka topic
bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --topic name_of_topic --delete
This worked for me:
kafka-topics --delete --bootstrap-server localhost:9092 --topic user-commands
I am using the Confluent CLI and ran this from the directory where I installed it.
Recent Kafka versions are about to remove the ZooKeeper dependency. Therefore, you should reference the brokers instead (through --bootstrap-server):
kafka-topics \
--bootstrap-server localhost:9092,localhost:9093,localhost:9094 \
--delete \
--topic topic_for_deletion
For Confluent Cloud, use:
$ confluent kafka topic delete my_topic