Apache Kafka stops deleting topic as per retention policy

We run Apache Kafka with several topics, and because of the volume of data we keep messages for only 4 hours:
log.retention.hours=4
log.retention.check.interval.ms=300000
This works as expected, and because of the load the data does not stay around much longer than 4 hours, so that is fine.
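As a quick sanity check on the units, the sketch below converts the two settings above to milliseconds and minutes. The helper name is made up for illustration; only the two property values come from the configuration above.

```python
# Illustrative unit conversion only; hours_to_ms is a hypothetical helper,
# not part of Kafka.
def hours_to_ms(hours: int) -> int:
    return hours * 60 * 60 * 1000

retention_ms = hours_to_ms(4)        # log.retention.hours=4
check_interval_ms = 300_000          # log.retention.check.interval.ms=300000

# 14400000 ms is the figure the broker reports in its
# "retention time 14400000ms breach" log messages.
print(retention_ms)                  # 14400000
print(check_interval_ms / 60_000)    # 5.0 (minutes between retention checks)
```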
But now, for the second time, one topic has simply stopped being cleaned up. The other five topics (some of them bigger) are still being deleted fine, as can be seen on disk and in the log, but retention deletion no longer runs for this one topic. The only thing I can see in the log is that a lot of deletions were happening at the same time, but it seems strange that this would make retention stop altogether.
Any ideas are welcome. Last time we had to stop Kafka, manually delete all the files, start it again, and let it figure out what was lost and how to continue, which means losing a lot of the logs that normally flow through Kafka.
Here is a snippet from the log showing the last time the topic syslog was deleted, including the lines before and after. I grepped for just this topic to keep it short, since deletes for other topics were running at the same time:
[2022-05-12 15:30:24,596] INFO [Log partition=syslog-7, dir=/data/kafka/data] Incrementing log start offset to 7342768449 (kafka.log.Log)
[2022-05-12 15:32:49,507] INFO [Log partition=syslog-0, dir=/data/kafka/data] Found deletable segments with base offsets [117533146039] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,507] INFO [Log partition=syslog-0, dir=/data/kafka/data] Scheduling log segment [baseOffset 117533146039, size 610913752] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,508] INFO [Log partition=syslog-0, dir=/data/kafka/data] Incrementing log start offset to 117535494639 (kafka.log.Log)
[2022-05-12 15:32:49,509] INFO [Log partition=syslog-9, dir=/data/kafka/data] Found deletable segments with base offsets [7339264694] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,509] INFO [Log partition=syslog-9, dir=/data/kafka/data] Scheduling log segment [baseOffset 7339264694, size 913148471] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,509] INFO [Log partition=syslog-9, dir=/data/kafka/data] Incrementing log start offset to 7342767511 (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-2, dir=/data/kafka/data] Found deletable segments with base offsets [117535253776] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-2, dir=/data/kafka/data] Scheduling log segment [baseOffset 117535253776, size 58699640] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-2, dir=/data/kafka/data] Incrementing log start offset to 117535480847 (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-3, dir=/data/kafka/data] Found deletable segments with base offsets [7462911255] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-3, dir=/data/kafka/data] Scheduling log segment [baseOffset 7462911255, size 671165604] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,510] INFO [Log partition=syslog-3, dir=/data/kafka/data] Incrementing log start offset to 7465489528 (kafka.log.Log)
[2022-05-12 15:32:49,512] INFO [Log partition=syslog-11, dir=/data/kafka/data] Found deletable segments with base offsets [7341328180] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,512] INFO [Log partition=syslog-11, dir=/data/kafka/data] Scheduling log segment [baseOffset 7341328180, size 374034996] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,512] INFO [Log partition=syslog-11, dir=/data/kafka/data] Incrementing log start offset to 7342767027 (kafka.log.Log)
[2022-05-12 15:32:49,513] INFO [Log partition=syslog-12, dir=/data/kafka/data] Found deletable segments with base offsets [7339255632] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,513] INFO [Log partition=syslog-12, dir=/data/kafka/data] Scheduling log segment [baseOffset 7339255632, size 915537834] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,513] INFO [Log partition=syslog-12, dir=/data/kafka/data] Incrementing log start offset to 7342767514 (kafka.log.Log)
[2022-05-12 15:32:49,514] INFO [Log partition=syslog-5, dir=/data/kafka/data] Found deletable segments with base offsets [7341362593] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,514] INFO [Log partition=syslog-5, dir=/data/kafka/data] Scheduling log segment [baseOffset 7341362593, size 365599697] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,514] INFO [Log partition=syslog-5, dir=/data/kafka/data] Incrementing log start offset to 7342769250 (kafka.log.Log)
[2022-05-12 15:32:49,515] INFO [Log partition=syslog-13, dir=/data/kafka/data] Found deletable segments with base offsets [7340340358] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,515] INFO [Log partition=syslog-13, dir=/data/kafka/data] Scheduling log segment [baseOffset 7340340358, size 630202638] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,516] INFO [Log partition=syslog-6, dir=/data/kafka/data] Found deletable segments with base offsets [7339301315] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,516] INFO [Log partition=syslog-6, dir=/data/kafka/data] Scheduling log segment [baseOffset 7339301315, size 903744649] for deletion. (kafka.log.Log)
[2022-05-12 15:32:49,516] INFO [Log partition=syslog-6, dir=/data/kafka/data] Incrementing log start offset to 7342768603 (kafka.log.Log)
[2022-05-12 15:32:49,517] INFO [Log partition=syslog-7, dir=/data/kafka/data] Found deletable segments with base offsets [7339606433] due to retention time 14400000ms breach (kafka.log.Log)
[2022-05-12 15:32:49,517] INFO [Log partition=syslog-7, dir=/data/kafka/data] Scheduling log segment [baseOffset 7339606433, size 822575376] for deletion. (kafka.log.Log)
[2022-05-12 15:33:49,508] INFO [Log partition=syslog-0, dir=/data/kafka/data] Deleting segment 117533146039 (kafka.log.Log)
[2022-05-12 15:33:49,509] INFO [Log partition=syslog-9, dir=/data/kafka/data] Deleting segment 7339264694 (kafka.log.Log)
[2022-05-12 15:33:49,510] INFO [Log partition=syslog-2, dir=/data/kafka/data] Deleting segment 117535253776 (kafka.log.Log)
[2022-05-12 15:33:49,511] INFO [Log partition=syslog-3, dir=/data/kafka/data] Deleting segment 7462911255 (kafka.log.Log)
[2022-05-12 15:33:49,512] INFO [Log partition=syslog-11, dir=/data/kafka/data] Deleting segment 7341328180 (kafka.log.Log)
[2022-05-12 15:33:49,512] INFO Deleted log /data/kafka/data/syslog-2/00000000117535253776.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,512] INFO Deleted offset index /data/kafka/data/syslog-2/00000000117535253776.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,512] INFO Deleted time index /data/kafka/data/syslog-2/00000000117535253776.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,513] INFO [Log partition=syslog-12, dir=/data/kafka/data] Deleting segment 7339255632 (kafka.log.Log)
[2022-05-12 15:33:49,514] INFO [Log partition=syslog-5, dir=/data/kafka/data] Deleting segment 7341362593 (kafka.log.Log)
[2022-05-12 15:33:49,520] INFO [Log partition=syslog-13, dir=/data/kafka/data] Deleting segment 7340340358 (kafka.log.Log)
[2022-05-12 15:33:49,526] INFO Deleted log /data/kafka/data/syslog-11/00000000007341328180.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,526] INFO Deleted offset index /data/kafka/data/syslog-11/00000000007341328180.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,526] INFO Deleted time index /data/kafka/data/syslog-11/00000000007341328180.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,528] INFO Deleted log /data/kafka/data/syslog-5/00000000007341362593.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,528] INFO Deleted offset index /data/kafka/data/syslog-5/00000000007341362593.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,528] INFO Deleted time index /data/kafka/data/syslog-5/00000000007341362593.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,528] INFO [Log partition=syslog-6, dir=/data/kafka/data] Deleting segment 7339301315 (kafka.log.Log)
[2022-05-12 15:33:49,542] INFO Deleted log /data/kafka/data/syslog-13/00000000007340340358.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,542] INFO Deleted offset index /data/kafka/data/syslog-13/00000000007340340358.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,542] INFO Deleted time index /data/kafka/data/syslog-13/00000000007340340358.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,578] INFO [Log partition=syslog-7, dir=/data/kafka/data] Deleting segment 7339606433 (kafka.log.Log)
[2022-05-12 15:33:49,606] INFO Deleted log /data/kafka/data/syslog-7/00000000007339606433.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,606] INFO Deleted offset index /data/kafka/data/syslog-7/00000000007339606433.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,606] INFO Deleted time index /data/kafka/data/syslog-7/00000000007339606433.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,807] INFO Deleted log /data/kafka/data/syslog-0/00000000117533146039.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,808] INFO Deleted offset index /data/kafka/data/syslog-0/00000000117533146039.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,808] INFO Deleted time index /data/kafka/data/syslog-0/00000000117533146039.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,841] INFO Deleted log /data/kafka/data/syslog-3/00000000007462911255.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,841] INFO Deleted offset index /data/kafka/data/syslog-3/00000000007462911255.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,842] INFO Deleted time index /data/kafka/data/syslog-3/00000000007462911255.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,935] INFO Deleted log /data/kafka/data/syslog-12/00000000007339255632.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,935] INFO Deleted offset index /data/kafka/data/syslog-12/00000000007339255632.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,936] INFO Deleted time index /data/kafka/data/syslog-12/00000000007339255632.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,950] INFO Deleted log /data/kafka/data/syslog-9/00000000007339264694.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,950] INFO Deleted offset index /data/kafka/data/syslog-9/00000000007339264694.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,950] INFO Deleted time index /data/kafka/data/syslog-9/00000000007339264694.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,958] INFO Deleted log /data/kafka/data/syslog-6/00000000007339301315.log.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,958] INFO Deleted offset index /data/kafka/data/syslog-6/00000000007339301315.index.deleted. (kafka.log.LogSegment)
[2022-05-12 15:33:49,958] INFO Deleted time index /data/kafka/data/syslog-6/00000000007339301315.timeindex.deleted. (kafka.log.LogSegment)
[2022-05-12 17:45:21,218] INFO [ProducerStateManager partition=syslog-2] Writing producer snapshot at offset 117543798666 (kafka.log.ProducerStateManager)
[2022-05-12 17:45:21,218] INFO [Log partition=syslog-2, dir=/data/kafka/data] Rolled new log segment at offset 117543798666 in 2 ms. (kafka.log.Log)
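To compare topics, it may help to extract the last time each partition had a segment scheduled for deletion. The following is a minimal sketch that assumes the log line format shown above; the function name and regular expression are illustrative and may need adjusting for other Kafka versions.

```python
import re

# Matches lines of the form shown above, e.g.
# [ts] INFO [Log partition=syslog-0, dir=...] Scheduling log segment [baseOffset N, size M] for deletion.
SCHEDULED = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \[Log partition=(?P<partition>[^,]+), dir=[^\]]*\] "
    r"Scheduling log segment \[baseOffset (?P<offset>\d+), size (?P<size>\d+)\]"
)

def last_scheduled_deletions(lines):
    """Return {partition: (timestamp, base_offset)} for the latest scheduling line per partition.

    Assumes the lines are in chronological order, as they are in a broker log file.
    """
    latest = {}
    for line in lines:
        m = SCHEDULED.match(line)
        if m:
            latest[m["partition"]] = (m["ts"], int(m["offset"]))
    return latest

# Two lines copied from the snippet above, plus one that should be ignored.
sample_lines = [
    "[2022-05-12 15:32:49,507] INFO [Log partition=syslog-0, dir=/data/kafka/data] Scheduling log segment [baseOffset 117533146039, size 610913752] for deletion. (kafka.log.Log)",
    "[2022-05-12 15:32:49,509] INFO [Log partition=syslog-9, dir=/data/kafka/data] Scheduling log segment [baseOffset 7339264694, size 913148471] for deletion. (kafka.log.Log)",
    "[2022-05-12 15:33:49,508] INFO [Log partition=syslog-0, dir=/data/kafka/data] Deleting segment 117533146039 (kafka.log.Log)",
]

for partition, (ts, offset) in sorted(last_scheduled_deletions(sample_lines).items()):
    print(partition, ts, offset)
```

Running this over the full server log for every topic would show whether the stuck topic's last scheduling timestamp lags behind the others, which narrows down when retention stopped.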

Related

Kafka delete topics when auto.create.topics is enabled

We have a 2-node Kafka cluster with both auto.create.topics.enable and delete.topic.enable set to true. Our app reads from a common request topic and responds to a response topic provided by the client in the request payload.
auto.create.topics.enable is set to true because our client has an auto-scaling environment in which each new worker reads from a new response topic. Due to some implementation issues on the client side, a lot of topics have been created that were never used (end offset is 0), and we are attempting to clean them up.
The problem is that after a topic is deleted, it is recreated almost immediately. These topics don't have any consumers (the worker that was listening is already dead).
I have tried the following:
1. The Kafka CLI delete command:
kafka-topics.sh --zookeeper localhost:2181 --topic <topic-name> --delete
2. Creating a ZooKeeper node under:
/admin/delete_topics/<topic-name>
Neither seems to work. In the logs, I can see that a delete request was received and the corresponding logs/indexes were deleted, but within a few seconds or minutes the topic is auto-created again. Logs for reference:
INFO [Partition <topic-name>-0 broker=0] No checkpointed highwatermark is found for partition <topic-name>-0 (kafka.cluster.Partition)
INFO Replica loaded for partition <topic-name>-0 with initial high watermark 0 (kafka.cluster.Replica)
INFO Replica loaded for partition <topic-name>-0 with initial high watermark 0 (kafka.cluster.Replica)
INFO [Partition <topic-name>-0 broker=0] <topic-name>-0 starts at Leader Epoch 0 from offset 0. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-4.7a79dfc720624d228d5ee90c8d4c325e-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-4.7a79dfc720624d228d5ee90c8d4c325e-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-4.7a79dfc720624d228d5ee90c8d4c325e-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-4 in /home/ec2-user/data/kafka/0/<topic-name>-4.7a79dfc720624d228d5ee90c8d4c325e-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-2.d32a905f9ace459cb62a530b2c605347-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-2.d32a905f9ace459cb62a530b2c605347-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-2.d32a905f9ace459cb62a530b2c605347-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-2 in /home/ec2-user/data/kafka/0/<topic-name>-2.d32a905f9ace459cb62a530b2c605347-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-3.0670e8aefae5481682d53afcc09bab6a-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-3.0670e8aefae5481682d53afcc09bab6a-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-3.0670e8aefae5481682d53afcc09bab6a-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-3 in /home/ec2-user/data/kafka/0/<topic-name>-3.0670e8aefae5481682d53afcc09bab6a-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-7.ac76d42a39094955abfb9d37951f4fae-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-7.ac76d42a39094955abfb9d37951f4fae-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-7.ac76d42a39094955abfb9d37951f4fae-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-7 in /home/ec2-user/data/kafka/0/<topic-name>-7.ac76d42a39094955abfb9d37951f4fae-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-1.4872c74d579f4553a881114749e08141-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-1.4872c74d579f4553a881114749e08141-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-1.4872c74d579f4553a881114749e08141-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-1 in /home/ec2-user/data/kafka/0/<topic-name>-1.4872c74d579f4553a881114749e08141-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-0.489b7241226341f0a7ffa3d1b9a70e35-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-0.489b7241226341f0a7ffa3d1b9a70e35-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-0.489b7241226341f0a7ffa3d1b9a70e35-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-0 in /home/ec2-user/data/kafka/0/<topic-name>-0.489b7241226341f0a7ffa3d1b9a70e35-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-5.6d659cd119304e1f9a4077265364d05b-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-5.6d659cd119304e1f9a4077265364d05b-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-5.6d659cd119304e1f9a4077265364d05b-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-5 in /home/ec2-user/data/kafka/0/<topic-name>-5.6d659cd119304e1f9a4077265364d05b-delete. (kafka.log.LogManager)
INFO Deleted log /home/ec2-user/data/kafka/0/<topic-name>-6.652d1aec02014a3aa59bd3e14635bd3b-delete/00000000000000000000.log. (kafka.log.LogSegment)
INFO Deleted offset index /home/ec2-user/data/kafka/0/<topic-name>-6.652d1aec02014a3aa59bd3e14635bd3b-delete/00000000000000000000.index. (kafka.log.LogSegment)
INFO Deleted time index /home/ec2-user/data/kafka/0/<topic-name>-6.652d1aec02014a3aa59bd3e14635bd3b-delete/00000000000000000000.timeindex. (kafka.log.LogSegment)
INFO Deleted log for partition <topic-name>-6 in /home/ec2-user/data/kafka/0/<topic-name>-6.652d1aec02014a3aa59bd3e14635bd3b-delete. (kafka.log.LogManager)
INFO [GroupCoordinator 0]: Removed 0 offsets associated with deleted partitions: <topic-name>-0. (kafka.coordinator.group.GroupCoordinator)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(<topic-name>-0) (kafka.server.ReplicaFetcherManager)
INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions Set(<topic-name>-0) (kafka.server.ReplicaAlterLogDirsManager)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(<topic-name>-0) (kafka.server.ReplicaFetcherManager)
INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions Set(<topic-name>-0) (kafka.server.ReplicaAlterLogDirsManager)
INFO Log for partition <topic-name>-0 is renamed to /home/ec2-user/data/kafka/0/<topic-name>-0.185c7eda12b749a2999cd39b3f90c738-delete and is scheduled for deletion (kafka.log.LogManager)
INFO Creating topic <topic-name> with configuration {} and initial partition assignment Map(0 -> ArrayBuffer(0, 1)) (kafka.zk.AdminZkClient)
INFO [KafkaApi-0] Auto creation of topic <topic-name> with 1 partitions and replication factor 2 is successful (kafka.server.KafkaApis)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(<topic-name>-0) (kafka.server.ReplicaFetcherManager)
INFO [Log partition=<topic-name>-0, dir=/home/ec2-user/data/kafka/0] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
INFO [Log partition=<topic-name>-0, dir=/home/ec2-user/data/kafka/0] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 0 ms (kafka.log.Log)
INFO Created log for partition <topic-name>-0 in /home/ec2-user/data/kafka/0 with properties {compression.type -> producer, message.format.version -> 2.2-IV1, file.delete.delay.ms -> 60000, max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, message.downconversion.enable -> true, min.insync.replicas -> 1, segment.jitter.ms -> 0, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, cleanup.policy -> [delete], flush.ms -> 9223372036854775807, segment.ms -> 604800000, segment.bytes -> 1073741824, retention.ms -> 86400000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760, flush.messages -> 9223372036854775807}. (kafka.log.LogManager)
INFO [Partition <topic-name>-0 broker=0] No checkpointed highwatermark is found for partition <topic-name>-0 (kafka.cluster.Partition)
INFO Replica loaded for partition <topic-name>-0 with initial high watermark 0 (kafka.cluster.Replica)
INFO Replica loaded for partition <topic-name>-0 with initial high watermark 0 (kafka.cluster.Replica)
INFO [Partition <topic-name>-0 broker=0] <topic-name>-0 starts at Leader Epoch 0 from offset 0. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
Does anyone know the reason behind the topic being re-created when no consumers are listening and no producers are producing to the topic? Could replication be behind it (some race condition perhaps)? We are using Kafka 2.2.
Deleting the log directory for that topic directly seems to work; however, this is cumbersome when there are thousands of topics. We want a cleanup script that does this periodically because, due to the auto-scaling nature of the client environment, orphaned response topics may appear frequently.
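The selection step of such a cleanup script could look roughly like the sketch below. It assumes the per-partition end offsets have already been fetched with whatever client or tool is available; the function name, the input shape, and the topic names in the example are all hypothetical.

```python
from typing import Dict, List, Tuple

def find_unused_topics(end_offsets: Dict[str, Dict[int, int]],
                       protected_prefixes: Tuple[str, ...] = ("__",)) -> List[str]:
    """Return topics whose every partition has end offset 0 (never written to).

    end_offsets maps topic name -> {partition id: end offset}.
    Topics starting with a protected prefix (e.g. __consumer_offsets) are skipped.
    """
    unused = []
    for topic, partitions in end_offsets.items():
        if topic.startswith(protected_prefixes):
            continue
        if partitions and all(offset == 0 for offset in partitions.values()):
            unused.append(topic)
    return sorted(unused)

# Made-up example data, for illustration only.
example = {
    "response-worker-a": {0: 0},
    "response-worker-b": {0: 0, 1: 12},
    "requests": {0: 500},
    "__consumer_offsets": {0: 0},
}
print(find_unused_topics(example))
```

Checking for end offset 0 rather than an empty log matters here: a topic whose data was removed by retention keeps a non-zero end offset, so only topics that were never written to are selected.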
Update
I tried Giorgos' suggestion of disabling auto.create.topics.enable and then deleting the topic. This time the topic did get deleted, but none of my applications threw any errors (which leads me to the conclusion that there are no consumers/producers for the said topic).
Further, when auto.create.topics.enable is enabled and the topic is created with a replication-factor=1, the topic does not get re-created after deletion. This leads me to believe that perhaps replication is the culprit. Could this be a bug in Kafka?
Jumped the gun here; turns out something is listening/re-creating these topics from the customer environment.
Even though you've mentioned that no consumer or producer is using the topic, it sounds like something is. Do you perhaps have any connectors running on Kafka Connect that replicate data from/to Kafka?
If you still can't find what is causing the re-creation of the deleted Kafka topics, I would suggest setting auto.create.topics.enable to false (temporarily) so that topics cannot be automatically re-created. The process that is causing the re-creation should then fail, and the failure should be reported in your logs.
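Before changing the broker setting, it can also be useful to confirm from the broker log which topics come back after deletion. The sketch below assumes the Kafka 2.2 message wording shown in the question's log; the function name is illustrative and the sample lines are made up to follow that format.

```python
import re

# Patterns modelled on the broker log lines quoted in the question (Kafka 2.2 wording).
DELETE_RE = re.compile(
    r"Log for partition (?P<topic>.+)-\d+ is renamed to .* and is scheduled for deletion"
)
CREATE_RE = re.compile(r"Auto creation of topic (?P<topic>\S+) with \d+ partitions")

def recreated_topics(lines):
    """Return topics that were auto-created after being scheduled for deletion."""
    deleted = set()
    recreated = []
    for line in lines:
        m = DELETE_RE.search(line)
        if m:
            deleted.add(m["topic"])
            continue
        m = CREATE_RE.search(line)
        if m and m["topic"] in deleted and m["topic"] not in recreated:
            recreated.append(m["topic"])
    return recreated

# Made-up sample lines in the same format, for illustration only.
sample = [
    "INFO Log for partition resp-42-0 is renamed to /tmp/kafka/resp-42-0.abc-delete and is scheduled for deletion (kafka.log.LogManager)",
    "INFO [KafkaApi-0] Auto creation of topic resp-42 with 1 partitions and replication factor 2 is successful (kafka.server.KafkaApis)",
    "INFO [KafkaApi-0] Auto creation of topic other with 1 partitions and replication factor 2 is successful (kafka.server.KafkaApis)",
]
print(recreated_topics(sample))
```

An auto-creation entry in the broker log means some client requested the topic, so a topic showing up in this list is a strong hint that something is still asking for it.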

Kafka Streams 1.1.0: Consumer Group Reprocessing Entire Log

We have a Kafka Streams application (2.0) communicating with Kafka brokers (1.1.0). The Streams application has been reprocessing the entire log for no discernible reason: the application hadn't been restarted, wasn't rebalancing, and was just sitting around. In some cases it was processing messages; in others it was waiting to receive messages (having processed messages less than 6 hours earlier). We've done a fair amount of research and have ruled out one potential cause by setting offsets.retention.minutes to 1 week, the same as our message retention. Additionally, offset expiry wouldn't make sense as the root cause, because the consumer group offset was reset while the application was actively processing messages.
There is nothing interesting in the broker logs around the time of the events:
[2019-02-21 09:02:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:12:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:12:51,084] INFO [ProducerStateManager partition=MY_TOPIC-1] Writing producer snapshot at offset 422924 (kafka.log.ProducerStateManager)
[2019-02-21 09:12:51,085] INFO [Log partition=MY_TOPIC-1, dir=/data1/kafka] Rolled new log segment at offset 422924 in 1 ms. (kafka.log.Log)
[2019-02-21 09:14:56,384] INFO [ProducerStateManager partition=MY_TOPIC-12] Writing producer snapshot at offset 295610 (kafka.log.ProducerStateManager)
[2019-02-21 09:14:56,384] INFO [Log partition=MY_TOPIC-12, dir=/data1/kafka] Rolled new log segment at offset 295610 in 1 ms. (kafka.log.Log)
[2019-02-21 09:15:19,365] INFO [ProducerStateManager partition=__transaction_state-8] Writing producer snapshot at offset 3939084 (kafka.log.ProducerStateManager)
[2019-02-21 09:15:19,365] INFO [Log partition=__transaction_state-8, dir=/data1/kafka] Rolled new log segment at offset 3939084 in 0 ms. (kafka.log.Log)
[2019-02-21 09:21:26,755] INFO [ProducerStateManager partition=MY_TOPIC-9] Writing producer snapshot at offset 319799 (kafka.log.ProducerStateManager)
[2019-02-21 09:21:26,755] INFO [Log partition=MY_TOPIC-9, dir=/data1/kafka] Rolled new log segment at offset 319799 in 1 ms. (kafka.log.Log)
[2019-02-21 09:22:20,009] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-02-21 09:23:31,283] INFO [ProducerStateManager partition=__consumer_offsets-17] Writing producer snapshot at offset 47345110 (kafka.log.ProducerStateManager)
[2019-02-21 09:23:31,297] INFO [Log partition=__consumer_offsets-17, dir=/data1/kafka] Rolled new log segment at offset 47345110 in 28 ms. (kafka.log.Log)
And absolutely nothing in the application logs (even with the log level set to DEBUG).
Any ideas about what might be causing this issue?
Upgrading the Kafka brokers to 2.0.0 resolved this issue.

Kafka does not respect topic's retention period on startup?

We're running one ZooKeeper server and one Kafka broker on a test machine. We're using ZooKeeper 3.3.5 and Kafka 0.9.0.1.
The message retention period on the broker is set to 4 hours:
log.retention.ms = null
log.retention.bytes = -1
log.retention.hours = 4
log.retention.minutes = null
log.retention.check.interval.ms = 30000
We have a single topic whose retention period is essentially infinite (we set it to 50 years).
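To make the intended behaviour explicit, here is a small sketch of how the effective retention is expected to resolve: among the broker settings, log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours, and a topic-level retention.ms, when set, overrides the broker default. The function names are illustrative, and the 50-year value is computed here as an assumption about how it was expressed.

```python
from typing import Optional

def broker_retention_ms(ms: Optional[int], minutes: Optional[int], hours: int) -> int:
    # Broker-level precedence: log.retention.ms, then log.retention.minutes,
    # then log.retention.hours.
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    return hours * 60 * 60 * 1000

def effective_retention_ms(broker_default_ms: int,
                           topic_retention_ms: Optional[int]) -> int:
    # A topic-level retention.ms, when present, overrides the broker default.
    return topic_retention_ms if topic_retention_ms is not None else broker_default_ms

# Values from the broker settings listed above: ms and minutes unset, hours = 4.
broker_default = broker_retention_ms(ms=None, minutes=None, hours=4)
fifty_years_ms = 50 * 365 * 24 * 60 * 60 * 1000  # ignoring leap days

print(broker_default)                                          # 14400000
print(effective_retention_ms(broker_default, fifty_years_ms))  # 1576800000000
print(effective_retention_ms(broker_default, None))            # 14400000
```

The last line shows the case worth ruling out: if the topic-level override is missing or was not applied, the 4-hour broker default is what the cleanup thread uses.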
At some point we shut down then restarted ZooKeeper, then we shut down and restarted the Kafka broker.
Looking at the logs, it seems that the Kafka broker, upon startup, decided to delete old messages, even though they are well within the 50-year retention period.
First, here's Kafka starting up normally:
[2016-09-15 09:09:24,877] INFO starting (kafka.server.KafkaServer)
[2016-09-15 09:09:24,883] INFO Connecting to zookeeper on 10.0.4.83:2182 (kafka.server.KafkaServer)
[2016-09-15 09:09:25,056] INFO Loading logs. (kafka.log.LogManager)
[2016-09-15 09:09:25,096] INFO Completed load of log ecosystem_sharedmodel-0 with log end offset 102152 (kafka.log.Log)
[2016-09-15 09:09:25,105] INFO Logs loading complete. (kafka.log.LogManager)
[2016-09-15 09:09:25,106] INFO Starting log cleanup with a period of 30000 ms. (kafka.log.LogManager)
[2016-09-15 09:09:25,107] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2016-09-15 09:09:25,149] INFO Awaiting socket connections on ip-10-0-4-83:9092. (kafka.network.Acceptor)
[2016-09-15 09:09:25,151] INFO [Socket Server on Broker 1], Started 1 acceptor threads (kafka.network.SocketServer)
[2016-09-15 09:09:25,169] INFO [ExpirationReaper-1], Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2016-09-15 09:09:25,173] INFO [ExpirationReaper-1], Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2016-09-15 09:09:25,244] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2016-09-15 09:09:25,250] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2016-09-15 09:09:25,250] INFO 1 successfully elected as leader (kafka.server.ZookeeperLeaderElector)
[2016-09-15 09:09:25,321] INFO [GroupCoordinator 1]: Starting up. (kafka.coordinator.GroupCoordinator)
Then, 30 seconds later, the log cleanup kicks in and deletes all our messages:
[2016-09-15 09:09:55,114] INFO Rolled new log segment for 'ecosystem_sharedmodel-0' in 3 ms. (kafka.log.Log)
[2016-09-15 09:09:55,115] INFO Scheduling log segment 0 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,115] INFO Scheduling log segment 618 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,115] INFO Scheduling log segment 3052 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,116] INFO Scheduling log segment 6050 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,116] INFO Scheduling log segment 73419 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,116] INFO Scheduling log segment 73553 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,116] INFO Scheduling log segment 82663 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,117] INFO Scheduling log segment 85600 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,117] INFO Scheduling log segment 96316 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,117] INFO Scheduling log segment 99030 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:09:55,117] INFO Scheduling log segment 101756 for log ecosystem_sharedmodel-0 for deletion. (kafka.log.Log)
[2016-09-15 09:10:55,116] INFO Deleting segment 618 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,116] INFO Deleting segment 0 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,116] INFO Deleting segment 6050 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,117] INFO Deleting segment 73419 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,117] INFO Deleting segment 3052 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,117] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000073419.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,117] INFO Deleting segment 73553 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,117] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000003052.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,118] INFO Deleting segment 82663 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,118] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000073553.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,118] INFO Deleting segment 85600 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,118] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000082663.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,119] INFO Deleting segment 96316 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,119] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000085600.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,119] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000096316.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,119] INFO Deleting segment 99030 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,119] INFO Deleting segment 101756 from log ecosystem_sharedmodel-0. (kafka.log.Log)
[2016-09-15 09:10:55,120] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,121] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000099030.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,121] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000000618.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,121] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000101756.index.deleted (kafka.log.OffsetIndex)
[2016-09-15 09:10:55,123] INFO Deleting index /data/kafka/ecosystem_sharedmodel-0/00000000000000006050.index.deleted (kafka.log.OffsetIndex)
Is this the expected behavior of Kafka? Is there a misunderstanding on our side? A configuration problem?
EDIT: it turned out to be a user error, no bugs on Kafka's side.

How does an offset expire for an Apache Kafka consumer group?

I was making some tests on an old topic when I noticed some strange behaviours. Reading Kafka's log I noticed this "removed 8 expired offsets" message:
[GroupCoordinator 1001]: Stabilized group GROUP_NAME generation 37 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Assignment received from leader for group GROUP_NAME for generation 37 (kafka.coordinator.GroupCoordinator)
Deleting segment 0 from log __consumer_offsets-31. (kafka.log.Log)
Deleting segment 0 from log __consumer_offsets-45. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-45/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-31/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-13. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-13/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-11. (kafka.log.Log)
Deleting segment 4885 from log __consumer_offsets-11. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-11/00000000000000004885.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-11/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-26. (kafka.log.Log)
Deleting segment 12406 from log __consumer_offsets-26. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-26/00000000000000012406.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-26/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-22. (kafka.log.Log)
Deleting segment 8643 from log __consumer_offsets-22. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-22/00000000000000008643.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-22/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-6. (kafka.log.Log)
Deleting segment 9757 from log __consumer_offsets-6. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-6/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-6/00000000000000009757.index.deleted (kafka.log.OffsetIndex)
Deleting segment 0 from log __consumer_offsets-14. (kafka.log.Log)
Deleting segment 1 from log __consumer_offsets-14. (kafka.log.Log)
Deleting index /data/kafka-logs/__consumer_offsets-14/00000000000000000001.index.deleted (kafka.log.OffsetIndex)
Deleting index /data/kafka-logs/__consumer_offsets-14/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
[GroupCoordinator 1001]: Preparing to restabilize group GROUP_NAME with old generation 37 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Stabilized group GROUP_NAME generation 38 (kafka.coordinator.GroupCoordinator)
[GroupCoordinator 1001]: Assignment received from leader for group GROUP_NAME for generation 38 (kafka.coordinator.GroupCoordinator)
[Group Metadata Manager on Broker 1001]: Removed 8 expired offsets in 1 milliseconds. (kafka.coordinator.GroupMetadataManager)
In fact, I have 2 questions:
How does this offset expiration work for a consumer group?
Can these expired offsets explain why my consumer would not poll anything when it had auto.offset.reset = latest, but polled from the last committed offset when it had auto.offset.reset = earliest?
Update
Since Apache Kafka 2.1, offsets are not deleted as long as the consumer group is active, regardless of whether the consumers commit offsets or not. That is, the offsets.retention.minutes clock only starts to tick when the group becomes empty (in older releases, the clock started to tick as soon as the commit happened).
Cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets
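The change in expiration semantics described above can be illustrated with a small sketch. This is a simplified model, not broker code: the function name, its parameters, and the kip_211 flag are hypothetical, and it ignores details such as standalone consumers that do not use group management.

```python
from typing import Optional


def offset_is_expired(
    now_ms: int,
    retention_ms: int,
    last_commit_ms: int,
    group_empty_since_ms: Optional[int],
    kip_211: bool,
) -> bool:
    """Simplified model of when a committed offset becomes eligible for removal.

    retention_ms stands for offsets.retention.minutes converted to milliseconds.
    group_empty_since_ms is None while the group still has active members.
    kip_211 selects the semantics introduced in Kafka 2.1 (hypothetical flag).
    """
    if not kip_211:
        # Older releases: the clock starts at the time of the commit itself,
        # even if the consumer group is still running.
        return now_ms - last_commit_ms >= retention_ms
    if group_empty_since_ms is None:
        # Kafka 2.1+: an active group keeps its offsets.
        return False
    # Kafka 2.1+: the clock starts when the group becomes empty.
    return now_ms - group_empty_since_ms >= retention_ms
```

With the old semantics, a running consumer that stops committing for a partition (for example because the partition receives no new data) can still lose its offset; with the new semantics it cannot while the group has members.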
Original Answer
By default, Kafka deletes committed offsets after a configurable period of time; see the offsets.retention.minutes parameter. That is, if a consumer group is inactive (i.e., does not commit any offsets) for this amount of time, the offsets get deleted. Thus, even if the consumer is running, if it does not commit offsets for some partitions, those offsets are subject to offsets.retention.minutes.
If you start a consumer, the following happens:
look for a (valid) committed offset (for the consumer group)
if valid offset is found, resume from there
if no valid offset is found, reset offset according to auto.offset.reset parameter
Thus, if your offsets got deleted and auto.offset.reset = latest, your consumer will not poll anything until new data is added to the topic. If auto.offset.reset = earliest, it should consume the whole topic.
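The start-up steps above can be sketched as a small function. This is only an illustration under stated assumptions: resolve_start_position and its parameters are hypothetical names, and "valid" is modelled here as "the committed offset exists and lies within the partition's current log range".

```python
from typing import Optional


def resolve_start_position(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str,
) -> int:
    """Return the offset a consumer would start reading from (simplified).

    committed is None when no offset is stored for the group, for example
    because it expired and was removed.
    """
    # Steps 1 and 2: a valid committed offset wins.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Step 3: otherwise fall back to the reset policy.
    if auto_offset_reset == "earliest":
        return log_start  # re-read everything still retained
    if auto_offset_reset == "latest":
        return log_end  # only see records produced from now on
    # With auto.offset.reset = none the real consumer raises an exception.
    raise LookupError("no valid committed offset and no reset policy")


# Expired offsets on an idle topic holding offsets 100..500:
print(resolve_start_position(None, 100, 500, "latest"))
print(resolve_start_position(None, 100, 500, "earliest"))
```

In the "latest" case the consumer is positioned at the end of the log, which is why poll() returns nothing until new records arrive.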
See these JIRA tickets for a discussion about this: https://issues.apache.org/jira/browse/KAFKA-3806 and https://issues.apache.org/jira/browse/KAFKA-4682
Check my answer here. You should not forget about file rolling. It impacts offset files removal.

Cluster not responding due to replication

I found this in my server.log:
[2016-03-29 18:24:59,349] INFO Scheduling log segment 3773408933 for log g17-4 for deletion. (kafka.log.Log)
[2016-03-29 18:24:59,349] INFO Scheduling log segment 3778380412 for log g17-4 for deletion. (kafka.log.Log)
[2016-03-29 18:24:59,403] WARN [ReplicaFetcherThread-3-4], Replica 2 for partition [g17,4] reset its fetch offset from 3501121050 to current leader 4's start offset 3501121050 (kafka.server.ReplicaFetcherThread)
[2016-03-29 18:24:59,403] ERROR [ReplicaFetcherThread-3-4], Current offset 3781428103 for partition [g17,4] out of range; reset offset to 3501121050 (kafka.server.ReplicaFetcherThread)
[2016-03-29 18:25:27,816] INFO Rolled new log segment for 'g17-12' in 1 ms. (kafka.log.Log)
[2016-03-29 18:25:35,548] INFO Rolled new log segment for 'g18-10' in 2 ms. (kafka.log.Log)
[2016-03-29 18:25:35,707] INFO Partition [g18,10] on broker 2: Shrinking ISR for partition [g18,10] from 2,4 to 2 (kafka.cluster.Partition)
[2016-03-29 18:25:36,042] INFO Partition [g18,10] on broker 2: Expanding ISR for partition [g18,10] from 2 to 2,4 (kafka.cluster.Partition)
The replica's offset is larger than the leader's, so the replica's data is deleted and then copied again from the leader.
But while this copying is in progress, the cluster is very slow; some Storm topologies fail because they get no response from Kafka.
How do I prevent this problem from occurring?
How do I slow down the replication rate while the replica is copying data?