Can different Kafka topics have different retention lengths? - apache-kafka

I'm looking to have a master topic (with log retention of 7 days) and several smaller topics holding a filtered corpus with a shorter log retention (2 days). Is this possible?
NOTE: I'm using Kafka v0.10.1.1.

The broker-level retention setting (log.retention.hours, 168 hours, i.e. 7 days, by default; log.retention.ms takes precedence if set) applies globally to all topics. You can override it with the topic-level config retention.ms when creating the topic, as below:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test \
  --partitions 1 --replication-factor 1 --config retention.ms=172800000

log.retention.hours is a broker-level property that serves as the default for topics. When you change the configuration of an existing topic using kafka-topics.sh, you should specify a topic-level property instead.
The topic-level property for log retention time is retention.ms.
From Topic-level configuration in Kafka 0.10.1 documentation:
Property: retention.ms
Default: 7 days
Server Default Property: log.retention.minutes
Description: This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data.
So the correct command is
$ bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic as-access --config retention.ms=172800000
You can check whether the configuration is properly applied with the following command.
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic as-access
Then you will see something like below.
Topic:as-access PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=172800000
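For reference, the value 172800000 used above is simply 2 days expressed in milliseconds. A minimal sketch of the conversion follows; the helper name days_to_ms is made up for illustration and is not part of Kafka.

```shell
# Convert a retention period in days to milliseconds for retention.ms.
# days_to_ms is a hypothetical helper, not a Kafka tool.
days_to_ms() {
  # 24 h * 60 min * 60 s * 1000 ms per day
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 2   # prints 172800000 (the filtered topics)
days_to_ms 7   # prints 604800000 (the master topic)
```

The printed number can then be passed as --config retention.ms=<value> in the commands shown above.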

Related

kafka consumer group id does not work as expected

I am new to Apache Kafka. While going through the quick start instructions at http://kafka.apache.org/quickstart with the latest version, kafka_2.12-2.2.0, I ran into a problem that I can't figure out by myself.
On my laptop, I created 3 brokers to simulate a cluster.
Each broker has its own server properties file. I made the changes below in each file and left all other values at their defaults.
broker.id=1 (server2: broker.id=2; server3: broker.id=3)
listeners=PLAINTEXT://127.0.0.1:9092 (server2: 127.0.0.1:9093; server3: 127.0.0.1:9094)
log.dirs=/tmp/kafka-logs (server2: /tmp/kafka-logs-2; server3: /tmp/kafka-logs-3)
num.partitions=3 (for all servers)
offsets.topic.replication.factor=3 (for all servers)
After starting ZooKeeper and the 3 brokers, I can create a topic 'TestTopic' with 3 partitions on any broker:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic TestTopic
Then I use the commands below to start 3 consumers in the same group 'rickygroup':
//consumer one
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer two
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9093 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
//consumer three
bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9094 --from-beginning --topic TestTopic —group.id rickygroup —group.name rickygroup
Now I use another terminal to publish some messages to the topic 'TestTopic'. The issue is that all 3 consumers above receive exactly the same messages. My understanding is that the 3 consumers should split the messages among themselves rather than each receiving all of them. Otherwise the consumer group is consuming repeatedly instead of balancing the load.
Am I misunderstanding the consumer group concept, or did I do something wrong here?
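The console consumer uses --group (with two dashes), not -group.id and/or -group.name, which are not parsed options. Without --group, each console consumer gets its own auto-generated group ID, so every consumer receives every message.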
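As a rough sketch of what the corrected invocations could look like, the snippet below only prints the three commands instead of running them. The helper name consumer_cmd is made up for illustration, and the ports are taken from the consumer commands in the question.

```shell
# Print the console-consumer command for one member of a consumer group.
# consumer_cmd is a hypothetical helper; it prints the command, it does not run it.
# $1 = bootstrap server, $2 = topic, $3 = consumer group
consumer_cmd() {
  printf 'bin/kafka-console-consumer.sh --bootstrap-server %s --topic %s --group %s --from-beginning\n' "$1" "$2" "$3"
}

# One command per broker port used in the question.
for port in 9092 9093 9094; do
  consumer_cmd "127.0.0.1:$port" TestTopic rickygroup
done
```

With all three consumers in the same group, the 3 partitions of TestTopic should be divided among them instead of each consumer reading everything.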

set config retention.ms=3600000 still data not delete from Kafka

I set retention.ms=3600000 with the command below, but there is still a lot of data on disk after 1 hour. My disk filled up because of the large volume of data coming into Kafka.
./bin/kafka-topics.sh --zookeeper zookeeper:2181 --alter --topic topic_1 --config retention.ms=3600000
Describe command
./bin/kafka-topics.sh --zookeeper zookeeper:2181 --describe --topics-with-overrides
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic:topic_1 PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=3600000
Topic:topic_2 PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=3600000
Topic:topic_3 PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=3600000,retention.bytes=104857600
Can anyone advise why Kafka does not delete the data after 1 hour?
If the topic's cleanup.policy is compact, Kafka compacts the log and keeps the latest record for each key instead of deleting old data. To delete all data older than the retention period, set the cleanup policy to delete:
./bin/kafka-topics.sh --zookeeper zookeeper:2181 --alter --topic topic_1 --config cleanup.policy=delete
Check the value of log.retention.check.interval.ms.
This broker setting controls how often Kafka checks whether any log segment is eligible for deletion (the default is 300000 ms, i.e. 5 minutes).
As the documentation suggests, retention.ms controls the maximum time kafka will retain a log before it will discard old log segments to free up space if we are using the "delete" retention policy.
Looks like your cleanup.policy is set to compact instead of delete:
bin/kafka-configs.sh --zookeeper zookeeper:2181 --entity-type topics \
  --entity-name topic_1 --alter --add-config cleanup.policy=delete
PS: Altering topic configuration from the kafka-topics.sh script (kafka.admin.TopicCommand) has been deprecated. Going forward, please use the kafka-configs.sh script (kafka.admin.ConfigCommand) for this functionality.
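Before changing anything, it may help to check which topics actually carry a cleanup.policy override in the describe output. The following is a minimal sketch that parses lines in the format shown in the question; the helper name cleanup_policy is made up for illustration.

```shell
# Print the cleanup.policy override found in one line of
# "kafka-topics.sh --describe" output, or "not overridden" if there is none.
# cleanup_policy is a hypothetical helper, not a Kafka tool.
cleanup_policy() {
  policy=$(printf '%s\n' "$1" | sed -n 's/.*cleanup\.policy=\([a-z]*\).*/\1/p')
  echo "${policy:-not overridden}"
}

cleanup_policy 'Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer'
# prints: compact
cleanup_policy 'Topic:topic_1 PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=3600000'
# prints: not overridden
```

A topic without an override falls back to the broker-level log.cleanup.policy setting.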

Kafka: Auto flush data for every 1GB

Which property do I have to set to automatically delete (flush out) a topic's data on a Kafka broker?
I tried to edit the following properties but it didn't make any difference.
log.retention.ms
log.retention.byte
log.retention.check.interval.ms
But the data is still not deleted when it reaches 1 GB.
So I also uncommented the properties below, along with the ones above:
log.flush.interval.messages
log.flush.interval.ms
However much I increase the values of these properties, data is deleted at around 180 MB at most.
How can I delete data automatically whenever the data for a particular topic reaches 1 GB?
log.retention.ms and log.retention.bytes are broker-level properties that serve as defaults for topics. When you change the configuration of an existing topic using kafka-topics.sh, you should specify a topic-level property instead.
The topic-level properties for log retention are retention.ms (time) and retention.bytes (size).
Use the command below to set retention by time:
bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config retention.ms=86400000
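Use the command below to set retention by size (the example value 1048576 is 1 MiB; 1 GiB would be 1073741824, and the limit applies per partition):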
bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config retention.bytes=1048576
To verify that the properties are set on the topic:
bin/kafka-topics.sh --describe --zookeeper zk.yoursite.com --topic as-access
Then you will see something like below.
Topic:as-access PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=86400000
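Since the question asks about a 1 GB limit, here is a small sketch of computing the byte value and printing the matching alter command. The helper name gib_to_bytes is made up for illustration, and the command is only printed, not executed.

```shell
# Convert gibibytes to bytes for retention.bytes.
# gib_to_bytes is a hypothetical helper, not a Kafka tool.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

limit=$(gib_to_bytes 1)   # 1073741824
# Print (rather than run) the alter command with the computed value.
echo "bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config retention.bytes=$limit"
```

Keep in mind that retention.bytes is enforced per partition, so a topic with 3 partitions could hold up to roughly three times this amount.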

Kafka group consumer is not created

I have started a consumer in a consumer group using the following command:
ldnpsr000001131$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rent_test --property group.id=rent_test auto.commit.enable=true auto.commit.interval.ms=100
As I understand it, the above command should create a consumer group named rent_test and commit offsets every 100 ms. However, when I tried to list all of the consumer groups, the group "rent_test" was not present.
ldnpsr000001131$ bin/kafka-consumer-groups.sh --list --zookeeper localhost:2181
console-consumer-68623
console-consumer-18287
console-consumer-45392
test
console-consumer-9009
KafkaMirror-test
console-consumer-25049
kafka-mirror
console-consumer-61946
console-consumer-940
console-consumer-11318
KafkaMirror
console-consumer-43035
console-consumer-99202
consumer-test
console-consumer-42642
console-consumer-19085
console-consumer-7142
KafkaMirror-test-1
console-consumer-82299
console-consumer-81448
console-consumer-26487
console-consumer-71474
flink
console-consumer-4692
Please advise?
If you are using the old consumer, do not pass group.id via --property (that option configures the message formatter, not the consumer). In 0.10.0.1, you have to specify it in a consumer config file and pass that file with --consumer.config:
bin/kafka-console-consumer.sh --zookeeper zkHost:2181 --topic test-topic --consumer.config <config file path>
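A minimal sketch of such a config file, using the settings from the question, might look like the following. The temporary file location is an arbitrary choice for illustration, and the consumer command is only printed, not run.

```shell
# Write the old-consumer settings from the question to a properties file.
# The temp file path is arbitrary; any readable file would do.
config_file=$(mktemp)
cat > "$config_file" <<'EOF'
group.id=rent_test
auto.commit.enable=true
auto.commit.interval.ms=100
EOF

# Print (rather than run) the command that would use the file.
echo "bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rent_test --consumer.config $config_file"
```

Once a consumer started this way is running, the group rent_test should show up in the kafka-consumer-groups.sh --list output.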

changing kafka retention period during runtime

With Kafka 0.8.1.1, how do I change the log retention time while it's running? The documentation says the property is log.retention.hours, but trying to change it using kafka-topics.sh returns this error
$ bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config topic.log.retention.hours=24
Error while executing topic command requirement failed: Unknown configuration "topic.log.retention.hours".
java.lang.IllegalArgumentException: requirement failed: Unknown configuration "topic.log.retention.hours".
at scala.Predef$.require(Predef.scala:145)
at kafka.log.LogConfig$$anonfun$validateNames$1.apply(LogConfig.scala:138)
at kafka.log.LogConfig$$anonfun$validateNames$1.apply(LogConfig.scala:137)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.JavaConversions$JEnumerationWrapper.foreach(JavaConversions.scala:479)
at kafka.log.LogConfig$.validateNames(LogConfig.scala:137)
at kafka.log.LogConfig$.validate(LogConfig.scala:145)
at kafka.admin.TopicCommand$.parseTopicConfigsToBeAdded(TopicCommand.scala:171)
at kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:95)
at kafka.admin.TopicCommand$$anonfun$alterTopic$1.apply(TopicCommand.scala:93)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
at kafka.admin.TopicCommand$.alterTopic(TopicCommand.scala:93)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:52)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
log.retention.hours is a broker-level property that serves as the default for topics. When you change the configuration of an existing topic using kafka-topics.sh, you should specify a topic-level property instead.
The topic-level property for log retention time is retention.ms.
From Topic-level configuration in Kafka 0.8.1 documentation:
Property: retention.ms
Default: 7 days
Server Default Property: log.retention.minutes
Description: This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data.
So the correct command depends on the version. Up to 0.8.2 (although the docs still show its use up to 0.10.1), use kafka-topics.sh --alter; after 0.10.2 (or perhaps from 0.9.0 onward), use kafka-configs.sh --alter. With kafka-topics.sh:
$ bin/kafka-topics.sh --zookeeper zk.yoursite.com --alter --topic as-access --config retention.ms=86400000
You can check whether the configuration is properly applied with the following command.
$ bin/kafka-topics.sh --describe --zookeeper zk.yoursite.com --topic as-access
Then you will see something like below.
Topic:as-access PartitionCount:3 ReplicationFactor:3 Configs:retention.ms=86400000
The following is the right way to alter topic config as of Kafka 0.10.2.0:
bin/kafka-configs.sh --zookeeper <zk_host> --alter --entity-type topics --entity-name test_topic --add-config retention.ms=86400000
Topic config alter operations have been deprecated for bin/kafka-topics.sh.
WARNING: Altering topic configuration from this script has been deprecated and may be removed in future releases.
Going forward, please use kafka-configs.sh for this functionality
The correct config key is retention.ms
$ bin/kafka-topics.sh --zookeeper zk.prod.yoursite.com --alter --topic as-access --config retention.ms=86400000
Updated config for topic "my-topic".
I tested and used this command on Confluent Platform 4.0.0 and Apache Kafka 1.0.0 and 1.0.1:
/opt/kafka/confluent-4.0.0/bin/kafka-configs --zookeeper XX.XX.XX.XX:2181 --entity-type topics --entity-name test --alter --add-config retention.ms=55000
Here, test is the topic name.
I think it works in other versions too.