Can Kafka compaction overwrite messages with same partition key? - apache-kafka

I am using following code to write to Kafka:
String partitionKey = "" + System.currentTimeMillis();
KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, partitionKey, payload);
And we are using 0.8.1.1 version of Kafka.
Is it possible that, when multiple threads are writing, some of them (with different payloads) end up using the same partition key, and that Kafka then overwrites those messages because of the identical partitionKey?
The documentation that got us thinking in this direction is:
http://kafka.apache.org/documentation.html#compaction
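For context, here is a minimal sketch of how such a KeyedMessage send is typically wired up with the 0.8.x Java producer API; the broker list, topic and payload are placeholders, not taken from the question:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TimestampKeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");             // placeholder broker
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // plain string messages

        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));

        // The key is the current wall-clock time, so two threads sending within the
        // same millisecond produce records with the same key.
        String partitionKey = "" + System.currentTimeMillis();
        producer.send(new KeyedMessage<>("my-topic", partitionKey, "payload"));

        producer.close();
    }
}

Note that with the default "delete" cleanup policy, records with the same key simply coexist in the log until retention removes old segments; they are not overwritten at write time.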

I found some more material at https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction
Salient points:
Before 0.8, Kafka supported only a single retention mechanism: deleting old segments of the log.
Log compaction provides an alternative, such that it maintains the most recent entry for each unique key rather than maintaining only recent log entries.
There is a per-topic option to choose either "delete" or "compact".
Compaction guarantees that each key is unique in the tail of the log. It works by recopying the log from beginning to end, removing keys which have a later occurrence in the log.
Any consumer that stays within the head of the log (~1GB) will see all messages.
So whether we have log compaction or not, Kafka deletes older records, but the records in the head of the log are safe from that.
The missing-records problem will occur only when downstream clients are unable to drain the Kafka topics for a very long time (such that the per-topic size/time limit is hit).
I think this should be expected behavior, since we cannot keep records forever; they have to be deleted at some point.
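As an aside, the per-topic choice between "delete" and "compact" mentioned above is just the cleanup.policy topic configuration. A minimal sketch using the modern Java AdminClient (this API did not exist in 0.8.1.1; the broker address and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // "compact" keeps the latest record per key; "delete" (the default) drops old segments.
            NewTopic topic = new NewTopic("compacted-topic", 3, (short) 1)
                    .configs(Collections.singletonMap(TopicConfig.CLEANUP_POLICY_CONFIG,
                                                      TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}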

Sounds very possible. Compaction saves the last message for each key. If you have multiple messages sharing a key, only the last one will be saved after compaction. The normal use-case is database replication where only the latest state is interesting.

Related

How and why and what for do tombstone records appear in Kafka?

While studying Kafka, I came across the issue of deleting tombstone messages (e.g. during log compaction).
But my question is: how do tombstones even appear there in the first place? Who would want to use them, and for what?
They are written by the producing application (e.g. Producer API, or Kafka Connect source connector) by putting a null in the value part of the message. It denotes a logical deletion for the associated key in the message.
If you use log compaction then in time the previous values for that key are actually deleted too.
More info: https://medium.com/@damienthomlutz/deleting-records-in-kafka-aka-tombstones-651114655a16
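A minimal sketch of producing a tombstone with the modern Java producer, i.e. a record whose value is null (broker address, topic and key are placeholders):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TombstoneProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value is a tombstone: a logical delete for key "user-42".
            producer.send(new ProducerRecord<>("users-topic", "user-42", null));
            producer.flush();
        }
    }
}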

What does it really mean by "Kafka partitions are immutable"?

In all the Kafka tutorials I've read so far, they all mention that "Kafka partitions are immutable". However, I also read from this site https://towardsdatascience.com/log-compacted-topics-in-apache-kafka-b1aa1e4665a7 that from time to time Kafka will remove older messages in the partition (depending on the retention time you set in the log-compact command). You can see from the screenshot in that article that the data within the partition has clearly changed after the duplicate keys in the partition were removed.
So my question is what exactly does it mean to say "Kafka partitions are immutable"?
The Kafka partitions are defined as "immutable" in the sense that a producer can only append messages to a partition; it cannot change the value of an existing message (i.e. one with the same key). The partition itself is a commit log that works in append-only mode from the producer's point of view.
Of course, this means that without mechanisms like deletion (by retention time) and compaction, the partition size could grow endlessly.
At this point you could think, "so it's not immutable!", as you mentioned.
Well, as I said, the immutability is from the producer's point of view. Deletion and compaction are administrative operations.
For example, deleting records is also possible using the Admin Client API ... but we are always talking about administrative operations, not producer/consumer-related ones.
If you think about compaction and how it works: the producer initially sends, for example, a message with key = A and payload = "Hello". After a while, in order to "update" the value, it sends a new message with the same key = A and payload = "Hi" ... but actually it's a genuinely new message appended at the end of the partition log; it is the compaction thread in the broker that does the work of deleting the old message with the "Hello" payload, leaving just the new one.
In the same way, a producer can send a message with key = A and payload = null. This is how you actually delete a key (a null value is called a "tombstone"). The producer is still just appending a new message to the partition; it is again the compaction thread that removes the earlier messages with key = A once it sees the tombstone.
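To see the effect from the reading side, here is a hedged sketch of a consumer that materializes a topic into a map: a later value for the same key overwrites the earlier one, and a tombstone (null value) removes the key, which is exactly the state compaction eventually leaves on disk. The broker address, topic and group id are placeholders, and a real reader would poll until it reaches the end of the log rather than polling once:

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CompactedTopicSnapshot {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "snapshot-reader");         // placeholder
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> latestByKey = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("compacted-topic"));  // placeholder topic
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                if (record.value() == null) {
                    latestByKey.remove(record.key());              // tombstone: forget the key
                } else {
                    latestByKey.put(record.key(), record.value()); // later value wins
                }
            }
        }
        System.out.println(latestByKey);
    }
}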
Individual messages are immutable.
Compaction or retention will drop messages; it doesn't alter messages or offsets.
Data in Kafka is stored in topics; topics are partitioned; each partition is further divided into segments; and each segment has a log file to store the actual messages, an index file to store the positions of the messages in the log file, and a timeindex file. For example:
$ ls -l /mnt/data/kafka/*consumer*/00000000004618814867*
-rw-r--r-- 1 kafka kafka 10485760 Oct 3 23:41 /mnt/data/kafka/__consumer_offsets-7/00000000004618814867.index
-rw-r--r-- 1 kafka kafka 8189913 Oct 3 23:41 /mnt/data/kafka/__consumer_offsets-7/00000000004618814867.log
-rw-r--r-- 1 kafka kafka 10485756 Oct 3 23:41 /mnt/data/kafka/__consumer_offsets-7/00000000004618814867.timeindex
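The same layout can be listed programmatically; a small sketch, assuming a placeholder partition directory (<log.dirs>/<topic>-<partition>):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListSegmentFiles {
    public static void main(String[] args) throws IOException {
        Path partitionDir = Paths.get("/var/lib/kafka/data/my-topic-0"); // placeholder
        try (Stream<Path> files = Files.list(partitionDir)) {
            files.filter(p -> {
                     String name = p.getFileName().toString();
                     return name.endsWith(".log")        // record batches
                         || name.endsWith(".index")      // offset -> file position
                         || name.endsWith(".timeindex"); // timestamp -> offset
                 })
                 .sorted()
                 .forEach(System.out::println);
        }
    }
}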
In the scenario where log.cleanup.policy (or cleanup.policy on a particular topic) is set to delete, entire log segments (one or more) are completely deleted.
In the scenario where the parameter is set to compact, the compaction is done in the background by periodically recopying log segments: it recopies the log from beginning to end, removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately, so the additional disk space required is just one additional log segment (not a full copy of the log). In other words, the old segment is replaced by a new compacted segment.
See more about distributed logs:
https://kafka.apache.org/documentation.html#compaction
https://medium.com/@durgaswaroop/a-practical-introduction-to-kafka-storage-internals-d5b544f6925f
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
https://bookkeeper.apache.org/distributedlog/docs/0.5.0/user_guide/architecture/main
https://bravenewgeek.com/building-a-distributed-log-from-scratch-part-1-storage-mechanics/
Immutability is a property of the records stored within the partitions themselves. When the source (documentation or articles) states immutability within the context of topics or partitions, they are usually referring to either one of two things, both of which are correct in a limited context:
Records are immutable. Once a record is written, its contents can never be altered. A record can be deleted by the broker when either (a) the contents of the partition are pruned due to the retention limit, (b) a new record is added for the same key that supersedes the original record and compaction takes place, or (c) a record is added for the same key with a null value, which acts as a tombstone record, deleting the original without adding a replacement.
Partitions are append-only from a client's perspective, in that a client is not permitted to modify records or directly remove records from a partition, only append to the partition. This is somewhat debatable, because a client can induce the deletion of a record through the compaction feature, although this operation is asynchronous and the client cannot specify precisely which record should be deleted.

unique message check in kafka topic

We use Logstash and we want to read one table from an Oracle database and send these messages (as shown below) to Kafka:
Topic1: message1: {"name":"name-1", "id":"fbd89256-12gh-10og-etdgn1234njF", "site":"site-1", "time":"2019-07-30"}
message2: {"name":"name-2", "id":"fbd89256-12gh-10og-etdgn1234njG", "site":"site-1", "time":"2019-07-30"}
message3: {"name":"name-3", "id":"fbd89256-12gh-10og-etdgn1234njS", "site":"site-1", "time":"2019-07-30"}
message4: {"name":"name-4", "id":"fbd89256-12gh-10og-etdgn1234njF", "site":"site-1", "time":"2019-07-30"}
Please note that message1 and message4 are the duplicates with the same ID number.
Now, we want to make sure all messages are unique, so how can we filter topic1, keep only unique messages, and then send them to topic2?
The end result we want:
Topic2: message1: {"name":"name-1", "id":"fbd89256-12gh-10og-etdgn1234njF", "site":"site-1", "time":"2019-07-30"}
message2: {"name":"name-2", "id":"fbd89256-12gh-10og-etdgn1234njG", "site":"site-1", "time":"2019-07-30"}
message3: {"name":"name-3", "id":"fbd89256-12gh-10og-etdgn1234njS", "site":"site-1", "time":"2019-07-30"}
This is known as exactly-once processing.
You might be interested in the first part of Kafka FAQ that describes some approaches on how to avoid duplication on data production (i.e. from the producer side):
Exactly once semantics has two parts: avoiding duplication during data production and avoiding duplicates during data consumption.
There are two approaches to getting exactly once semantics during data production:
Use a single-writer per partition and every time you get a network error check the last message in that partition to see if your last write succeeded.
Include a primary key (UUID or something) in the message and deduplicate on the consumer.
If you do one of these things, the log that Kafka hosts will be duplicate-free. However, reading without duplicates depends on some co-operation from the consumer too. If the consumer is periodically checkpointing its position then if it fails and restarts it will restart from the checkpointed position. Thus if the data output and the checkpoint are not written atomically it will be possible to get duplicates here as well. This problem is particular to your storage system. For example, if you are using a database you could commit these together in a transaction. The HDFS loader Camus that LinkedIn wrote does something like this for Hadoop loads. The other alternative that doesn't require a transaction is to store the offset with the data loaded and deduplicate using the topic/partition/offset combination.
I think there are two improvements that would make this a lot easier:
Producer idempotence could be done automatically and much more cheaply by optionally integrating support for this on the server.
The existing high-level consumer doesn't expose a lot of the more fine grained control of offsets (e.g. to reset your position). We will be working on that soon.
Another option (which is not exactly what you are looking for) would be log compaction. Assuming that your duplicated messages have the same key, log compaction will eventually remove the duplicates once the compaction process runs.
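For the "include a primary key and deduplicate on the consumer" approach from the quote above, here is a rough sketch that reads topic1, keeps an in-memory set of ids it has already forwarded, and writes only unseen records to topic2. The broker address, topic names and the assumption that the id travels as the record key are all illustrative; a production version would need persistent, fault-tolerant state (e.g. a Kafka Streams state store):

import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeduplicateTopic {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        consumerProps.put("group.id", "dedup-filter");            // placeholder
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        Set<String> seenIds = new HashSet<>(); // in-memory only: lost on restart

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("topic1"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Assumes the "id" field is carried as the record key;
                    // otherwise parse it out of the JSON value first.
                    String id = record.key();
                    if (id != null && seenIds.add(id)) {
                        producer.send(new ProducerRecord<>("topic2", id, record.value()));
                    }
                }
            }
        }
    }
}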

Kafka compaction for de-duplication

I'm trying to understand how Kafka compaction works and have the following question: does Kafka guarantee uniqueness of keys for messages stored in a topic with compaction enabled?
Thanks!
The short answer is no.
Kafka doesn't guarantee uniqueness of keys stored in a topic, even with a cleanup policy enabled.
In Kafka you have two types of cleanup.policy:
delete - it means that after the configured time, messages won't be available anymore. There are several properties that can be used for that: log.retention.hours, log.retention.minutes, log.retention.ms. By default log.retention.hours is set to 168, which means that messages older than 7 days will be deleted.
compact - for each key, at least one message will be available. In some situations it will be exactly one, but in most cases it will be more. The compaction process runs in the background periodically. It copies log parts, removing duplicates and leaving only the last value for each key.
If you want to read only one value for each key, you have to use the KTable<K,V> abstraction from Kafka Streams (a sketch follows below).
Related question regarding latest value for key and compaction:
Kafka only subscribe to latest message?
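A minimal Kafka Streams sketch of the KTable suggestion above; the application id, broker address and topic name are placeholders:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class LatestValuePerKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "latest-per-key");    // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // A KTable is a changelog view: for each key it holds only the latest value.
        KTable<String, String> table = builder.table("compacted-topic");     // placeholder topic
        table.toStream().foreach((key, value) -> System.out.println(key + " -> " + value));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}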
Looking at the 4 guarantees of Kafka compaction, number 4 states:
Any consumer progressing from the start of the log will see at least the final state of all records in the order they were written. Additionally, all delete markers for deleted records will be seen, provided the consumer reaches the head of the log in a time period less than the topic's delete.retention.ms setting (the default is 24 hours). In other words: since the removal of delete markers happens concurrently with reads, it is possible for a consumer to miss delete markers if it lags by more than delete.retention.ms.
So, you will have more than one value for a key if the head of the topic has not yet been cleaned according to the delete.retention.ms policy.
As I understand it, if you set a 24h retention policy (delete.retention.ms=86400000), you'll have a single value per key only among the messages written more than 24 hours ago. That is your "at least", but not "only": many other messages for the same key may have arrived during the last 24 hours.
So it is guaranteed that you'll catch at least one message per key, but not just the last one, because retention and compaction have not yet acted on the recent messages.
Edit: as cricket's comment states, even if you set a delete retention property of one day, log.roll.ms is what defines when a log segment is closed, based on the message timestamps. Since this last (active) segment is never compacted, it becomes the second factor that prevents you from having just the last value for a given key. If your topic starts at T0, then messages after T0 + log.roll.ms will be in the open log segment and thus not compacted.

How does Kafka replay work in case of log compaction?

In Kafka, if log compaction is enabled, it will store only the most recent value for each key. If we try to replay these messages, will it just replay the latest messages? How exactly does Kafka replay work?
Yes. Offsets of earlier, duplicate keys are dropped and the newest offset for each key is kept. The consumer skips over the gaps in the broker offsets to read all available messages.
Also, log compaction happens on a schedule, so you might still see the same key within a partition for a certain amount of time, depending on the properties defined on the broker/topic.
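A hedged sketch of replaying a compacted partition from the beginning with the Java consumer; the offsets printed can have gaps where older duplicates were compacted away, and the broker address and topic are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayCompactedTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        TopicPartition partition = new TopicPartition("compacted-topic", 0); // placeholder
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(partition));
            consumer.seekToBeginning(Collections.singletonList(partition));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                // After compaction, consecutive records can have non-consecutive offsets.
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}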