Kafka: reduce the number of open files when it exceeds 1,000,000 - apache-kafka

I have a Kafka broker receiving 1 GB of data every minute from certain events, and as a result the number of open files is going above 1,000,000. I am not sure which setting needs to be changed to lower this number. Can anyone suggest a quick fix? Should I increase log.segment.bytes=1073741824 to 10 GB to reduce the number of files being created, or increase log.retention.check.interval.ms=300000 to 15 minutes so that fewer files get checked for deletion?

Increasing the segment size will reduce the number of files the broker maintains. The tradeoff is that only closed segments get cleaned or compacted, so larger segments mean data stays on disk longer before it becomes eligible for cleanup.
The other alternative is to reconsider what kind of data you're sending. For example, if you are sending files or other large binary blobs, consider sending filesystem URIs rather than pushing the whole payload through a topic.
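To get a feel for how segment size affects the file count, here is a rough sketch in Python. It assumes each segment contributes three files (the .log file plus the .index and .timeindex files) and one extra active segment per partition. The function name and the numbers are hypothetical, and sockets and other descriptors held by the broker are not counted.

```python
import math

def estimate_segment_files(retained_bytes_per_partition, segment_bytes,
                           partitions, files_per_segment=3):
    """Rough upper estimate of segment-related files on one broker.

    Assumption: every segment has a .log, .index and .timeindex file,
    and each partition has one active segment on top of the closed ones.
    """
    closed_segments = math.ceil(retained_bytes_per_partition / segment_bytes)
    segments_per_partition = closed_segments + 1  # plus the active segment
    return segments_per_partition * files_per_segment * partitions

GIB = 2**30

# Hypothetical broker: 100 partitions, 10 GiB retained per partition.
small = estimate_segment_files(10 * GIB, GIB // 2, partitions=100)
large = estimate_segment_files(10 * GIB, GIB, partitions=100)
print(small, large)
```

Doubling the segment size roughly halves the estimate. Note that log.segment.bytes is, to my knowledge, an int-typed setting, so a value as large as 10 GB may not be accepted; check the broker documentation for your version.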

Related

How can Kafka reads be constant irrespective of the datasize?

According to the Kafka documentation, the data structure used to store messages is a simple log where all writes are just appends to the log.
What I don't understand is that many claim Kafka's performance is constant irrespective of the amount of data it handles.
How can random reads be constant time in a linear data structure?
If I have a single-partition topic with 1 billion messages in it, how can the time taken to retrieve the first message be the same as the time taken to retrieve the last message if reads are always sequential?
In Kafka, the log for each partition is not a single file. It is split into segments, each with a fixed maximum size.
For each segment, Kafka knows the start and end offsets, so for a random read it is easy to find the correct segment.
Each segment also has a couple of indexes (time based and offset based). These are the files named *.index and *.timeindex. They enable jumping directly to a location near (or at) the desired read position.
So the total number of segments (and therefore the total size of the log) does not really affect the read logic.
Note also that the segment size, the index size, and the index interval are all configurable settings (even at the topic level).
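The lookup described above can be sketched in Python as two binary searches: one over the segments' base offsets and one over a sparse offset index. This is only an illustration of the idea, not Kafka's actual implementation; the function names and the in-memory lists are assumptions.

```python
from bisect import bisect_right

def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that contains target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before the first segment")
    return base_offsets[i]

def nearest_index_position(sparse_index, target_offset):
    """Return the file position to start scanning from.

    sparse_index is a sorted list of (offset, file_position) pairs that
    covers only some of the offsets in the segment. The result is the
    position of the last entry whose offset is <= target_offset, or 0
    (the start of the segment file) if there is no such entry.
    """
    i = bisect_right(sparse_index, (target_offset, float("inf"))) - 1
    return sparse_index[i][1] if i >= 0 else 0

# Hypothetical layout: three segments and a sparse index for the second one.
segments = [0, 1000, 2000]
index_for_1000 = [(1000, 0), (1400, 4096), (1800, 8192)]

base = find_segment(segments, 1500)
position = nearest_index_position(index_for_1000, 1500)
print(base, position)
```

Both steps are logarithmic in the number of segments or index entries, and the final scan from the returned position is bounded by the index interval, which is why the cost of a read does not grow with the total log size in any meaningful way.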

Kafka: how to avoid running out of disk storage

I want to describe the following case, which occurred on one of our production clusters.
We have an Ambari cluster with HDP version 2.6.4.
The cluster includes 3 Kafka machines, each with a 5 TB disk.
What we saw is that all the Kafka disks were at 100% usage. The disks were full, and this is why all the Kafka brokers failed.
df -h /kafka
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 5T 5T 23M 100% /var/kafka
After investigating, we saw that log.retention.hours was set to 7 days.
So it seems that purging only happens after 7 days, and maybe this is why the Kafka disks are 100% full even though they are huge (5 TB).
What we want to know now is how to avoid this situation in the future:
How can we avoid using the full capacity of the Kafka disks?
What do we need to set in the Kafka config in order to purge data according to the disk size? Is that possible?
And how do we choose the right value of log.retention.hours? According to the disk size, or something else?
In Kafka, there are two types of log retention: size-based and time-based. The former is controlled by log.retention.bytes and the latter by log.retention.hours.
In your case, you should pay attention to size retention, which can be tricky to configure. Assuming that you want a delete cleanup policy, you'd need to set the following parameters:
log.cleaner.enable=true
log.cleanup.policy=delete
Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take the following factors into consideration:
log.retention.bytes is a minimum guarantee for a single partition of a topic. If you set log.retention.bytes to 512MB, you will always have at least 512MB of data (per partition) on your disk once that much has been written.
Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (which is the default value), then at any given time you will have at least 512MB of data plus the data produced within the 5-minute window before the retention policy is triggered.
A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments that make up the retained data, plus a 3rd one, the active segment, where data is currently being written).
Finally, you should do the math: compute the maximum size that Kafka logs might occupy on your disk at any given time and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy and configuring log.retention.hours accordingly. If you no longer need your data after 2 days, then set log.retention.hours=48.
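"Doing the math" could look like the following Python sketch. It follows the reasoning above: per partition, the retained bytes plus one active segment plus whatever is produced between two retention checks. The function name, the partition count, and the produce rate are hypothetical, and index files and other overhead are ignored.

```python
def max_log_bytes_per_broker(retention_bytes, segment_bytes,
                             partition_replicas_on_broker,
                             produce_bytes_per_sec_per_partition=0,
                             check_interval_ms=300_000):
    """Rough upper bound on the disk space used by Kafka logs on one broker.

    Assumption: each partition replica holds at most retention_bytes of
    closed segments, one active segment of segment_bytes, and the data
    produced during one retention check interval.
    """
    produced_between_checks = (
        produce_bytes_per_sec_per_partition * check_interval_ms // 1000
    )
    per_partition = retention_bytes + segment_bytes + produced_between_checks
    return per_partition * partition_replicas_on_broker

GIB = 2**30
MIB = 2**20

# Hypothetical broker hosting 100 partition replicas, each receiving 1 MiB/s.
worst_case = max_log_bytes_per_broker(
    retention_bytes=1 * GIB,
    segment_bytes=512 * MIB,
    partition_replicas_on_broker=100,
    produce_bytes_per_sec_per_partition=1 * MIB,
)
print(f"{worst_case / GIB:.1f} GiB")
```

If the result is not comfortably below the disk size, lower log.retention.bytes or log.segment.bytes, or reduce the number of partitions hosted per broker.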
I think you have three options:
1) Increase the size of the disks until you have a comfortable amount of free space with your current retention policy of 7 days. For me, a comfortable amount of free space is around 40% (but that is personal preference).
2) Lower your retention period, for example to 3 days, and see whether your disks still fill up over time. The right retention period varies between use cases. If you don't need Kafka to hold a backup of the data when something goes wrong, just pick a very low retention period. If it is crucial that you keep those 7 days' worth of data, then you should change the disk sizes rather than the period.
3) A combination of the options 1 and 2.
More information about optimal retention policies: Kafka optimal retention and deletion policy

Avoiding small files from Kafka connect using HDFS connector sink in distributed mode

We have a topic with 3 partitions receiving messages at a rate of 1 message per second, and I am using the HDFS connector to write the data to HDFS in Avro format (the default). It generates files that are only kilobytes in size, so I tried altering the properties below in the HDFS connector configuration.
"flush.size":"5000",
"rotate.interval.ms":"7200000"
but the output is still small files, so I need clarity on the following points to solve this issue:
Is the flush.size property mandatory? If we do not set flush.size, how does the data get flushed?
If we set the flush size to 5000 and the rotate interval to 2 hours, it flushes the data every 2 hours for the first 3 intervals, but after that it flushes at irregular times. The file creation times were 19:14, 21:14, 23:15, 01:15, 06:59, 08:59, 12:40, 14:40; note the gaps that are not 2 hours apart. Is this because the properties override each other? That takes me to the third question.
What is the order of precedence for flushing if we set all of the following properties (flush.size, rotate.interval.ms, rotate.schedule.interval.ms)?
Increasing the message rate and reducing the number of partitions does increase the size of the files being flushed. Is that the only way to control small files? How should we set the properties if the rate of input events varies and is not stable?
It would be a great help if you could share documentation on handling small files in Kafka Connect with the HDFS connector. Thank you.
If you are using a TimeBasedPartitioner and the messages do not have consistently increasing timestamps, then a single writer task will end up dumping files whenever it sees a message with an earlier timestamp within the rotate.interval.ms window of any given record.
If you want consistent two-hour partition windows, then you should set rotate.interval.ms=-1 to disable it, and set rotate.schedule.interval.ms to some reasonable number that is within the partition duration window.
E.g. you have 7200 messages every 2 hours, and it's not clear how large each message is, but let's say 1MB. Then you'd be holding ~7GB of data in a buffer, and you need to adjust your Connect heap sizes to hold that much data.
The order of precedence is:
scheduled rotation, starting from the top of the hour
flush size or "message-based" time rotation, whichever occurs first, or when a record is seen as "before" the start of the current batch
And I believe flush size is mandatory for the storage connectors.
Overall, systems such as Uber's Hudi or the previous Kafka-HDFS tool of Camus Sweeper are more equipped to handle small files. Connect Sink Tasks only care about consuming from Kafka, and writing to downstream systems; the framework itself doesn't recognize Hadoop prefers larger files.

How does Kafka guarantee sequential disk access?

I'm new to Kafka. When I read the Kafka documentation, I saw that Kafka performs well because of sequential disk access.
But how is that possible? In Java (or any other language), if I use file I/O, the OS handles it as it sees fit. I can't know whether the OS stores my files in contiguous sectors or scatters them across multiple sectors. So, in my opinion, Kafka cannot claim that disk access is always sequential.
Am I right?
Kafka does not always access the disk sequentially, but it does several things that make sequential access much more likely.
All Kafka messages are stored in large segment files (1GB each by default), and since Kafka messages are not deleted when consumed (as in other message brokers), Kafka does not end up fragmenting the filesystem over time by continuously creating and deleting many variable-length files. Instead, it creates a segment file and appends to it until it reaches 1GB (a configurable limit). Only when all messages in the segment have expired does it delete the entire 1GB segment. This means that these 1GB sections of disk are often laid out as contiguous blocks.
It is a recommended best practice to keep the Kafka commit log files on a dedicated filesystem so that it does not get fragmented by other applications reading and writing variable-length files on the same filesystem.
More importantly, most reading and writing of these segment files is sequential and goes through the OS page cache, which reduces disk I/O even further by keeping the most frequently accessed pages in memory. This is why it is recommended to tune the kernel to set swappiness to 1, to reduce the likelihood that these cached pages get swapped out of memory.

Zookeeper Overall data size limit

I am new to Zookeeper and am trying to understand whether it fits my use case.
I have 10 million hierarchical records that I want to store in Zookeeper.
That is 10M key-value pairs, where the key and the value are 1KB each.
So the total data size is approximately 20GB (10M * 2KB) without replication.
I know the znode data size limit is 1MB (which can be changed).
Questions:
Will Zookeeper be able to support 20GB of data with no performance impact?
Is there a maximum size after which performance degrades?
Is there a limit on the total number of nodes?
Zookeeper is not suitable for this use case. Zookeeper periodically dumps (snapshots) the data tree, which means it would be dumping the whole 20GB of data every few minutes. Moreover, the Zookeeper nodes in a cluster (ensemble) are replicas of each other, so the whole data set is replicated to every Zookeeper node and there is no data partitioning either. Zookeeper is not a database.
For your use case, I think it would be much better to go with a database or a distributed cache (Redis, Hazelcast, etc.).
In any case, there are no limits on the total number of nodes in Zookeeper.
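The sizing from the question, combined with the point about full replication, can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function names and the ensemble size of 3 are assumptions, and per-znode overhead is ignored.

```python
def data_set_bytes(entries, key_bytes, value_bytes):
    """Raw size of the data set, ignoring any per-znode overhead."""
    return entries * (key_bytes + value_bytes)

def ensemble_bytes(per_server_bytes, servers):
    """Every server holds a full copy, so the total grows with ensemble size."""
    return per_server_bytes * servers

per_server = data_set_bytes(10_000_000, 1_000, 1_000)  # 10M entries, 1KB key + 1KB value
total = ensemble_bytes(per_server, servers=3)          # hypothetical 3-server ensemble
print(per_server / 1e9, "GB per server;", total / 1e9, "GB across the ensemble")
```

Since each server keeps the complete data set and writes it out in every snapshot, adding servers increases the total footprint without reducing the load on any single one.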