Tombstones and key store cleanup - apache-kafka

I have a few Kafka consumers implemented in Java and I am implementing a standalone application to check records and tombstone them. The hope is that Kafka will delete state stores as it compacts topics.
Now... I am getting a bit confused on the different types of stores that are created by Kafka. For each type of store, I would like to know:
Is it deleted when Kafka deletes old records in the corresponding topic?
Is it deleted when you tombstone records in the corresponding topic?
Are we stuck with it?
The types of stores that I see are the following:
KSTREAM-AGGREGATE-STATE-STORE changelog
KSTREAM-AGGREGATE-STATE-STORE repartition
(KTABLE) STATE-STORE changelog
KSTREAM-KEY-SELECT repartition
For the stream topology using an Aggregate function, we already have a tombstone strategy which should cover stores of type #1. We send a null message to the stream application & it returns it as the aggregate result.
For stores of type #3, I will be running a tombstoning application on the corresponding ktable. I expect the changelog to shrink.
However, for stores of type #2 and #4, I have no idea how they are cleaned up. They corresponding to selectKey()+leftJoin() functions in my consumer's topology. However, they are tied to a kstream-centric topology so I don't know what to do to have them cleaned up. Any suggestions that don't involve stopping the broker?

#2 and #4 are not stores - they are internal topics created by Kafka Streams.
There is no way to explicitly cleanup these repartition topics but Kafka would automatically cleanup these based on the kafka server's retention.ms config (which is default of 7 days if not specified explicitly) as these topics (#2 and #4) are created by Kafka Streams with cleanup.policy = delete.

Related

Kafka - uncompacted topics Vs compacted topics

I come across the following two phrases from the book "Mastering Kafka Streams and ksqlDB" and author used two terms, what does they really mean "compacted topics" and "uncompacted topics"
Does they got anything to with respect to "log compaction" ?
Tables can be thought of as updates to a database. In this view of the logs, only the current state (either the latest record for a given key or some kind of aggregation) for each key is retained. Tables are usually built from compacted topics.
Streams can be thought of as inserts in database parlance. Each distinct record remains in this view of the log. Streams are usually built from uncompacted topics.
Yes, log compaction according to kafka docs
Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition
https://kafka.apache.org/documentation/#compaction
If log compaction is enabled on topic, Kafka removes any old records when there is a newer version of it with the same key in the partition log.
For more detailed explanation of log compaction refer - https://medium.com/swlh/introduction-to-topic-log-compaction-in-apache-kafka-3e4d4afd2262
Yes, these terms are synonymous.
Ref: Log Compaction
From this article:
The idea behind compacted topics is that no duplicate keys exist. Only the most recent value for a message key is maintained.
It is mostly used for the scenarios such as restoring to the previous state before the application crashed or system failed, or while reloading cache after application restarts.
As an example of the above kafka has the topic __consumer_offsets which can be used to to continue from the last message which was read after a crash or a restart. A schema registry is also often used to ensure compatible communication between producers and consumers. The schemas used are maintained in the __schemas topic.

Get latest values from a topic on consumer start, then continue normally

We have a Kafka producer that produces keyed messages in a very high frequency to topics whose retention time = 10 hours. These messages are real-time updates and the used key is the ID of the element whose value has changed. So the topic is acting as a changelog and will have many duplicate keys.
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal, keeping the minimum load on Kafka server and letting the consumer do most of the job. We tried many ways and none of them seems the best.
What we tried:
1 changelog topic + 1 compact topic:
The producer sends the same message to both topics wrapped in a transaction to assure successful send.
Consumer launches and requests the latest offset of the changelog topic.
Consumes the compacted topic from beginning to construct the table.
Continues consuming the changelog since the requested offset.
Cons:
Having duplicates in compacted topic is a very high possibility even with setting the log compaction frequency the highest possible.
x2 number of topics on Kakfa server.
KSQL:
With KSQL we either have to rewrite a KTable as a topic so that consumer can see it (Extra topics), or we will need consumers to execute KSQL SELECT using to KSQL Rest Server and query the table (Not as fast and performant as Kafka APIs).
Kafka Consumer API:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Kafka Streams:
By using KTables as following:
KTable<Integer, MarketData> tableFromTopic = streamsBuilder.table("topic_name", Consumed.with(Serdes.Integer(), customSerde));
KTable<Integer, MarketData> filteredTable = tableFromTopic.filter((key, value) -> keys.contains(value.getRiskFactorId()));
Kafka Streams will create 1 topic on Kafka server per KTable (named {consumer_app_id}-{topic_name}-STATE-STORE-0000000000-changelog), which will result in a huge number of topics since we a big number of consumers.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Thanks in advance.
By using KTables, Kafka Streams will create 1 topic on Kafka server per KTable, which will result in a huge number of topics since we a big number of consumers.
If you are just reading an existing topic into a KTable (via StreamsBuilder#table()), then no extra topics are being created by Kafka Streams. Same for KSQL.
It would help if you could clarify what exactly you want to do with the KTable(s). Apparently you are doing something that does result in additional topics being created?
1 changelog topic + 1 compact topic:
Why were you thinking about having two separate topics? Normally, changelog topics should always be compacted. And given your use case description, I don't see a reason why it should not be:
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal [...]
Hence compaction would be very useful for your use case. It would also prevent this problem you described:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Note that, to reconstruct the latest table values, all three of Kafka Streams, KSQL, and the Kafka Consumer must read the table's underlying topic completely (from beginning to end). If that topic is NOT compacted, this might indeed take a long time depending on the data volume, topic retention settings, etc.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Without knowing more about your use case, particularly what you want to do with the KTable(s) once they are populated, my answer would be:
Make sure the "changelog topic" is also compacted.
Try KSQL first. If this doesn't satisfy your needs, try Kafka Streams. If this doesn't satisfy your needs, try the Kafka Consumer.
For example, I wouldn't use the Kafka Consumer if it is supposed to do any stateful processing with the "table" data, because the Kafka Consumer lacks built-in functionality for fault-tolerant stateful processing.
Consumer starts and consumes the topic from beginning. This worked
perfectly, but the consumer has to consume the 10 hours change log to
construct the last values table.
During the first time your application starts up, what you said is correct.
To avoid this during every restart, store the key-value data in a file.
For example, you might want to use a persistent map (like MapDB).
Since you give the consumer group.id and you commit the offset either periodically or after each record is stored in the map, the next time your application restarts it will read it from the last comitted offset for that group.id.
So the problem of taking a lot of time occurs only initially (during first time). So long as you have the file, you don't need to consume from beginning.
In case, if the file is not there or is deleted, just seekToBeginning in the KafkaConsumer and build it again.
Somewhere, you need to store this key-values for retrieval and why cannot it be a persistent store?
In case if you want to use Kafka streams for whatever reason, then an alternative (not as simple as the above) is to use a persistent backed store.
For example, a persistent global store.
streamsBuilder.addGlobalStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(topic), keySerde, valueSerde), topic, Consumed.with(keySerde, valueSerde), this::updateValue);
P.S: There will be a file called .checkpoint in the directory which stores the offsets. In case if the topic is deleted in the middle you get OffsetOutOfRangeException. You may want to avoid this, perhaps by using UncaughtExceptionHandler
Refer to https://stackoverflow.com/a/57301986/2534090 for more.
Finally,
It is better to use Consumer with persistent file rather than Streams for this, because of simplicity it offers.

How to update internal changelog topic partitions when a source topic partition count is updated?

I have an application in which I'm using a Kstream-Kstream join and Ktream-Ktable join.
I have updated the input source topic partition count from 4 to 16 and the application stopped with below error.
Could not create internal topics: Existing internal topic application-test-processor-KSTREAM-JOINTHIS-0000000009-store-changelog has invalid partitions. Expected: 16 Actual: 4. Use 'kafka.tools.StreamsResetter' tool to clean up invalid topics before processing. Retry #3
How to update internal changelog topic partition count when a source topic partition count is updated ?
Note: We are using kafka version: 0.10.2.1
I looked at the application resetter tool from this link: https://docs.confluent.io/current/streams/developer-guide/app-reset-tool.html
but it doesn't say how to update the changelog partition.
Thanks in advance.
Using the reset tool is actually recommended.
The state of your application is sharded based on the number of input partitions. This was originally 4. Thus, changing it to 16 broke the application. If you would manually add partitions to the changelog topic (what would be possible and resolve the exception, but not really fix the issue), state would not be redistributed and thus would be corrupted.
If you use the reset tool, you delete all state and let your application reprocess all input data from scratch. This allows Kafka Streams to recreate the state correctly (now with 16 shards).

Kafka stream application not consume data after restart

After I did restart our Kafka cluster my application of Kafka streams didn't receive messages from input topic and I got an exception of "can׳t create internal topic". After some research, I did reset with the Kafka tool (to the input topic and the application) the tool is Kafka-streams-application-reset.sh.
Unfortunately, it didn't resolve the problem and I also got the exception again
From the error message, you can infer that the topic already exists and thus, cannot be created. The reason for the failure is, that the existing topic does not have the expected number of partitions (it has 1 instead of 150) -- if the number of partitions would match, Kafka Streams would just use the existing topic.
This can happen, if you have topic auto-create enabled at the brokers (and the topic was created with a wrong number of partitions), or if the number of partitions of your input topic changed. Kafka Streams does not automatically change the number of partitions for the repartition topic, because this might result in data corruption and thus lead to incorrect results.
One way to fix this, it to either manually delete this topic: note, that this might result in data loss and you should only do this, if you know that it is what you want.
Another (better way) would be, to reset the application cleanly using bin/kafka-streams-application-reste.sh in combination with KafkaStreams#cleanup().
Because you need to clean up the application and users should be aware of the implication, Kafka Streams fails to make user aware of the issue instead of "auto magically" take some actions that might be undesired from a user point of view.
Check out the docs for more details. There is also a blog post that explains application reset in details:
https://kafka.apache.org/11/documentation/streams/developer-guide/app-reset-tool.html
https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/

Kafka Streams state store backup topic partitioning strategy

Kafka guarantees that messages with same key will always go to the same
partition.
For instance, I have message with the string key: 2329. And two topics t1 and t2. When I perform write of this message it goes into partition 1 in both topics, as expected.
Now the problem itself: I'm using Kafka Streams 0.10.2.0 persistent state store, which automatically creates a backup topic. Now in case of this backup topic message with the key: 2329 goes into another partition (partition 0), which is strange to me.
Has anyone encountered this issue?
I've found where was the issue.
To increase performance I've skipped repartitioning before write data to the state store. And used another column from the value as the state store key. And it worked until additional topic with enrichment information has been added. So I just forgot to perform repartitioning.