Why is Kafka not deleting data? - apache-kafka

I have a two-node Kafka cluster with 48 GB of disk allotted to each.
The server.properties is set to retain logs for up to 48 hours or log segments up to 1 GB. Here it is:
log.retention.hours=48
log.retention.bytes=1073741824
log.segment.bytes=1073741824
I have 30 partitions for a topic. Here are the disk usage stats for one of these partitions:
-rw-r--r-- 1 root root 1.9M Apr 14 00:06 00000000000000000000.index
-rw-r--r-- 1 root root 1.0G Apr 14 00:06 00000000000000000000.log
-rw-r--r-- 1 root root 0 Apr 14 00:06 00000000000000000000.timeindex
-rw-r--r-- 1 root root 10M Apr 14 12:43 00000000000001486744.index
-rw-r--r-- 1 root root 73M Apr 14 12:43 00000000000001486744.log
-rw-r--r-- 1 root root 10M Apr 14 00:06 00000000000001486744.timeindex
As you can clearly see, we have a log segment of 1 GB. But as per my understanding, it should have already been deleted. Also, it's been more than 48 hours since these logs were rolled by Kafka. Thoughts?

In your case, you set log.retention.bytes and log.segment.bytes to the same value, which means there is never a candidate segment eligible for deletion, so no deletion happens.
The algorithm is (a rough code sketch follows below):
First, calculate the difference between the partition's total size and log.retention.bytes. In your case, the difference is 73 MB (73 MB + 1 GB - 1 GB).
Iterate over all the non-active log segments and compare each one's size with the diff.
If diff > segment size, mark this segment deletable and decrement the diff by the segment's size.
Otherwise, mark this segment undeletable and try the next log segment.
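A minimal sketch of that size-based pass in Python, assuming the non-active segments are given oldest first as (name, size) pairs; this only illustrates the logic described above, not Kafka's actual implementation:
# Illustrative sketch only, not Kafka's real code.
# segments: non-active segments, oldest first, as (segment_name, size_in_bytes).
def deletable_segments(segments, total_partition_bytes, retention_bytes):
    diff = total_partition_bytes - retention_bytes  # bytes over the retention limit
    deletable = []
    for name, size in segments:
        if diff > size:            # the whole segment fits within the excess
            deletable.append(name)
            diff -= size
        # otherwise this segment is kept; try the next one
    return deletable

# With the numbers from the question: one 1 GB non-active segment,
# total = 1 GB + 73 MB, retention limit = 1 GB, so diff is only ~73 MB.
one_gb = 1073741824
print(deletable_segments([("00000000000000000000.log", one_gb)],
                         one_gb + 73 * 1024 * 1024, one_gb))  # -> []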

Answering my own question:
Let's say that log.retention.hours is 24 hours and log.retention.bytes and log.segment.bytes are both set to 1 GB. When the size of the log reaches 1 GB (call this the Old Log), a new log segment is created (call this the New Log). The Old Log is then deleted 24 hours after the New Log is created.
In my case, the New Log had been created about 25 hours before I posted this question. When I dynamically changed the retention.ms value for the topic (which is maintained by ZooKeeper rather than in the brokers' static configuration, and therefore does not require a Kafka restart) to 24 hours, my old logs were deleted immediately.
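For reference, here is a sketch of applying that per-topic override programmatically, assuming the third-party kafka-python package, a broker at localhost:9092 and a placeholder topic name; the same change can also be made with the kafka-configs.sh tool:
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

# Placeholder broker address and topic name; adjust for your cluster.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Override retention.ms for the topic to 24 hours (86400000 ms).
resource = ConfigResource(ConfigResourceType.TOPIC, "my-topic",
                          configs={"retention.ms": "86400000"})
admin.alter_configs([resource])
admin.close()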

Related

kafka + which files should be created under kafka-logs

Usually after a Kafka cluster scratch installation I see these files under /data/kafka-logs (the Kafka broker log directory, where all topics should be located):
ls -ltr
-rw-r--r-- 1 kafka hadoop 0 Jan 9 10:07 cleaner-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 57 Jan 9 10:07 meta.properties
drwxr-xr-x 2 kafka hadoop 4096 Jan 9 10:51 _schemas-0
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 recovery-point-offset-checkpoint
-rw-r--r-- 1 kafka hadoop 17 Jan 10 07:39 replication-offset-checkpoint
But on some other Kafka scratch installations we saw that the folder /data/kafka-logs is empty.
Does this indicate a problem?
Note: we have not created the topics yet.
I'm not sure when each checkpoint file is created (they track the log cleaner and replication offsets), but I assume that meta.properties is created at broker startup.
Otherwise, you would see one folder per topic-partition; for example, it looks like you had one topic created, _schemas.
If you only see that partition folder on one of multiple brokers, then the replication factor for that topic is set to 1.
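As a quick way to see that layout for yourself, a small sketch that groups the topic-partition folders under the log directory by topic (the /data/kafka-logs path is taken from the question):
import os
import re
from collections import defaultdict

LOG_DIR = "/data/kafka-logs"                 # path from the question
partition_dir = re.compile(r"^(.+)-(\d+)$")  # folders are named <topic>-<partition>

partitions = defaultdict(list)
for entry in os.listdir(LOG_DIR):
    match = partition_dir.match(entry)
    if match and os.path.isdir(os.path.join(LOG_DIR, entry)):
        partitions[match.group(1)].append(int(match.group(2)))

for topic, parts in sorted(partitions.items()):
    print(topic, sorted(parts))              # e.g. _schemas [0]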

Dynamically configuring retention.ms not working for a kafka topic

I have a Kafka topic called retention, and below is the server configuration related to retention:
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=3600000 (~ 1 hour)
log.cleaner.enable=true
And below is the topic specific config:
retention.ms=2592000000,retention.bytes=3298534883328
where retention.ms ~ 30 days and retention.bytes ~ 3.3 TB
I recently (on 14th Jan 2019) configured retention.ms and retention.bytes using the below command:
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic retentions --config retention.bytes=219902325555
Here, the configuration for retention.bytes seems to be working, while retention.ms does not seem to be working. Here is the evidence that I could collect:
cd log_dir/retentions-0/
ls -lrt 00000000000000000000.*
-rw-r--r-- 1 root root 294387381 Nov 26 22:37 00000000000000000000.log
-rw-r--r-- 1 root root 3912 Jan 14 18:06 00000000000000000000.index
-rw-r--r-- 1 root root 5868 Jan 14 18:06 00000000000000000000.timeindex
If we look at the older segments, they are nearly 2 months old.
Can anybody tell me which of these two configurations takes priority, or whether both apply, with whichever crosses its configured threshold first taking effect?
My assumption is that both configurations work in conjunction. Please let me know if this is not the case.
Both work in conjunction.
From the book Kafka: The Definitive Guide:
If you have specified a value for both log.retention.bytes and log.retention.ms ... messages may be removed when either criteria is met.
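Purely to illustrate that "either criterion" behaviour, a made-up helper (not Kafka's code) using the limits from the question:
import time

RETENTION_MS = 2592000000          # retention.ms from the topic config (~30 days)
RETENTION_BYTES = 3298534883328    # retention.bytes from the topic config (~3.3 TB)

def segment_deletable(largest_timestamp_ms, partition_size_bytes, now_ms=None):
    # A segment becomes a deletion candidate when EITHER limit is breached.
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    too_old = now_ms - largest_timestamp_ms > RETENTION_MS
    too_big = partition_size_bytes > RETENTION_BYTES
    return too_old or too_big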

What are the different logs under kafka data log dir

I am trying to understand the Kafka data logs. I can see the logs under the dir set in log.dirs as "Topicname_partitionnumber". However, I would like to know what the different logs captured under it are. Below is the screenshot for a sample log.
Inside the Kafka log directory, each partition has its own directory. Each partition is split into segments.
A segment is just a collection of messages. Instead of writing all messages into a single file, Kafka splits them into chunks called segments.
Whenever Kafka writes to a partition, it writes to the active segment. Each segment has a defined size limit. When the segment size limit is reached, Kafka closes that segment and opens a new one that becomes active. One partition can have one or more segments, depending on the configuration.
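To illustrate that rolling behaviour, a toy sketch (not Kafka's implementation) where a new segment is opened once the active one would exceed the configured size limit:
SEGMENT_BYTES = 1073741824                    # e.g. log.segment.bytes

segments = [{"base_offset": 0, "size": 0}]    # the last entry is the active segment

def append(message_size, next_offset):
    active = segments[-1]
    if active["size"] + message_size > SEGMENT_BYTES:
        # close the active segment and open a new one, named by the next offset
        segments.append({"base_offset": next_offset, "size": 0})
        active = segments[-1]
    active["size"] += message_size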
Each segment consists of three files: segment.log, segment.index and segment.timeindex.
There are three types of file for each Kafka topic partition:
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000000.timeindex
The 00000000000000000000 prefix of the log and index files is the name of the segment. It represents the offset of the first record written in that segment. For example, if there are 2 segments, i.e. Segment 1 containing message offsets 0 and 1 and Segment 2 containing message offsets 2 and 3, the files look like this:
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000000.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000000.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000000.timeindex
-rw-r--r-- 1 kafka hadoop 10485760 Dec 3 23:57 00000000000000000002.index
-rw-r--r-- 1 kafka hadoop 148814230 Oct 11 06:50 00000000000000000002.log
-rw-r--r-- 1 kafka hadoop 10485756 Dec 3 23:57 00000000000000000002.timeindex
The .log file stores the offset, the physical position of the message, and the timestamp along with the message content. When reading messages from Kafka at a particular offset, finding that offset in a huge log file would be an expensive task.
That's where the .index file becomes useful. It stores the offsets and the physical positions of the messages in the log file.
The .timeindex file is a similar index based on the timestamps of the messages.
The files without a suffix are the segment files, i.e. the files the data is actually written to, named by the earliest contained message offset. The latest of those is the active segment, meaning the one that messages are currently appended to.
The .index files are the corresponding mappings from offset to position in the segment file. The .timeindex files are mappings from timestamp to offset.
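As a rough illustration of how such a sparse offset index is used (made-up entries, not Kafka's actual on-disk format): find the greatest indexed offset not larger than the target, jump to its position in the .log file, and scan forward from there.
import bisect

# Made-up sparse index for a segment with base offset 0:
# pairs of (relative_offset, byte_position_in_the_.log_file)
index = [(0, 0), (40, 4096), (85, 8192), (130, 12288)]
offsets = [off for off, _ in index]

def start_position(target_offset, base_offset=0):
    rel = target_offset - base_offset
    i = bisect.bisect_right(offsets, rel) - 1   # greatest indexed offset <= target
    return index[max(i, 0)][1]

print(start_position(100))   # -> 8192: start scanning the .log file at byte 8192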
Below is the screenshot for a sample log
You should add your screenshot and sample log; then we could give you a specific answer for your case.
Before that, I can only give you some general knowledge.
For example, on my CentOS machine, for the folder:
/root/logs/kafka/kafka.log/storybook_add-0
storybook_add is the topic name (in code, the real topic name is storybook-add).
It contains:
[root@xxx storybook_add-0]# ll
total 8
-rw-r--r-- 1 root root 10485760 Aug 28 16:44 00000000000000000023.index
-rw-r--r-- 1 root root 700 Aug 28 16:45 00000000000000000023.log
-rw-r--r-- 1 root root 10485756 Aug 28 16:44 00000000000000000023.timeindex
-rw-r--r-- 1 root root 9 Aug 28 16:44 leader-epoch-checkpoint
00000000000000000023.log: the log file, which stores the real data, i.e. the Kafka messages.
00000000000000000023.index: the index file.
00000000000000000023.timeindex: the timeindex file.
00000000000000000023 is called the segment name.
Why 23? Because the first message stored in 00000000000000000023.log has offset 23, which means Kafka had previously received 23 messages (offsets 0 to 22) for this partition.
What does the message data look like? We can see by looking at its content:
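The original post showed a screenshot of the raw file content here. As another way to see what the stored messages look like, a sketch that simply consumes the topic and prints each record's fields, assuming the third-party kafka-python package and a broker at localhost:9092:
from kafka import KafkaConsumer

# Topic name taken from the folder name above; broker address is a placeholder.
consumer = KafkaConsumer("storybook_add",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)

for record in consumer:
    # offset, timestamp, key and value are the same fields the segment files persist
    print(record.offset, record.timestamp, record.key, record.value)

consumer.close()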
For further basic concepts and the internal logic of Kafka, I recommend reading this article:
A Practical Introduction to Kafka Storage Internals

Kafka too many open files, many tiny logs, with high ulimit and about 5k segments

Kafka keeps reporting "Too many open files". I just restarted clean, but after 10 minutes or so I end up with
lsof | grep cp-kafka | wc -l:
454225
process limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 96186 96186 processes
Max open files 800000 800000 files
Max locked memory 16777216 16777216 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 96186 96186 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
I have set retention.hours to -1, as I want to keep all logs from the past. In my server.properties I set segment files to 100 MB, but for some reason Kafka makes 10 MB logs. The strange thing is, I "only" have a relatively low number of files in the log directory.
find | wc -l
5884
I don't understand what I am doing wrong here.
I installed the confluent-kafka deb packages on Ubuntu 18.04.
Kafka 2.0
messages are about 500 bytes each
auto-create topic is true
Here is one directory; are my messages too small for the timeindex?
-rw-r--r-- 1 2.2K Sep 30 10:03 00000000000000000000.index
-rw-r--r-- 1 1.2M Sep 30 10:03 00000000000000000000.log
-rw-r--r-- 1 3.3K Sep 30 10:03 00000000000000000000.timeindex
-rw-r--r-- 1 560 Sep 30 10:03 00000000000000004308.index
-rw-r--r-- 1 293K Sep 30 10:03 00000000000000004308.log
-rw-r--r-- 1 10 Sep 30 10:03 00000000000000004308.snapshot
-rw-r--r-- 1 840 Sep 30 10:03 00000000000000004308.timeindex
-rw-r--r-- 1 10M Sep 30 10:03 00000000000000005502.index
-rw-r--r-- 1 97K Sep 30 10:04 00000000000000005502.log
-rw-r--r-- 1 10 Sep 30 10:03 00000000000000005502.snapshot
-rw-r--r-- 1 10M Sep 30 10:03 00000000000000005502.timeindex
I also added the following lines to the server config; the index files remain 10 MB max:
log.segment.bytes=1073741824
log.segment.index.bytes=1073741824
BTW, I am sending messages with timestamps in the past, with log retention of 1000 years.

Mongo rs.initiate() excessive disk space requirements

I have 2 mongod instances running with the following parameters
--noprealloc --smallfiles --replSet mongors1 --dbpath /data/db --nojournal
The goal of the exercise is to create a replicated environment with a minimal disk footprint for local development purposes.
At this point in time, all is good, with each respective data directory being around ~32 MB and containing the following:
ls -o data/db
total 32784
-rw------- 1 999 16777216 Sep 22 11:38 local.0
-rw------- 1 999 16777216 Sep 22 11:38 local.ns
-rwxr-xr-x 1 999 2 Sep 22 11:38 mongod.lock
-rw-r--r-- 1 999 69 Sep 22 11:38 storage.bson
drwxr-xr-x 2 999 4096 Sep 22 11:38 _tmp
After logging on to the first member and running rs.initiate(), an additional 1 GB of disk space is utilized.
ls -o data/db
total 1080856
-rw------- 1 999 16777216 Sep 22 11:39 local.0
-rw------- 1 999 536608768 Sep 22 11:39 local.1
-rw------- 1 999 536608768 Sep 22 11:39 local.2
-rw------- 1 999 16777216 Sep 22 11:39 local.ns
-rwxr-xr-x 1 999 2 Sep 22 11:38 mongod.lock
-rw-r--r-- 1 999 69 Sep 22 11:38 storage.bson
drwxr-xr-x 2 999 4096 Sep 22 11:39 _tmp
This seems excessive given the properties of the nodes being replicated and the configuration they are running.
Mongo 3.0.6 is the version in use.
Eventually this will be scaled up to replica sets with 3 members across 2+ shards. A minimal disk requirement of 6 GB to store zero data initially seems sub-optimal.
Is there a way to reduce this to something more representative of the nodes needs?
Any help is appreciated. Thanks in advance
The local database contains the oplog, and I'll leave you to research for yourself what size it should be for a given node. To address the question at hand, from the docs:
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the available free disk space, but will always allocate at least 1 gigabyte and never more than 50 gigabytes.
That's where your usage is coming from. To alter that allocation you will either need to resize the oplog or, if starting from scratch, look at the oplogSizeMB configuration option (the CLI equivalent is the --oplogSize parameter mentioned in the answer below).
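For intuition, the default sizing rule quoted above can be written as a quick sketch, which also shows why a small development disk still pays about 1 GB:
def default_oplog_size_gb(free_disk_gb):
    # 5% of free disk space, clamped to at least 1 GB and at most 50 GB,
    # per the defaults quoted above.
    return min(max(0.05 * free_disk_gb, 1.0), 50.0)

print(default_oplog_size_gb(10))    # -> 1.0  (small dev disk still allocates ~1 GB)
print(default_oplog_size_gb(2000))  # -> 50.0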
In addition to what Adam said, add the
--oplogSize X
to your parameters, replacing X with the size in MB you want the oplog to be.