Permanent Kafka Streams/KSQL retention policy - apache-kafka

I'm presently working on a use case in which user interaction with a platform is tracked, thus generating a stream of events that gets stored in Kafka and is subsequently processed in Kafka Streams/KSQL.
But I've run into an issue concerning the state store and changelog topic retention policies. User sessions could happen indefinitely far apart in time, therefore I must guarantee that the state will be persisted through that period and restored in case of node and cluster-wide failures. During our research, we came across the following information:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management
Kafka Streams allows for stateful stream processing, i.e. operators that have an internal state. (...). The default implementation used by Kafka Streams DSL is a fault-tolerant state store using 1. an internally created and compacted changelog topic (for fault-tolerance) and 2. one (or multiple) RocksDB instances (for cached key-value lookups). Thus, in case of starting/stopping applications and rewinding/reprocessing, this internal data needs to get managed correctly.
(...) Thus, RocksDB memory requirement does not grow infinitely (in contrast to changelog topic). (KAFKA-4015 was fixed in 0.10.1 release, and windowed changelog topics don't grow unbounded as they apply an additional retention time parameter).
Retention time in kafka local state store / changelog
"For windowed KTables there is a local retention time and there is the changelog retention time. You can set the local store retention time via Materialized.withRetentionTime(...) -- the default value is 24h.
If a new application is created, changelog topics are created with the same retention time as local store retention time."
https://docs.confluent.io/current/streams/developer-guide/config-streams.html
The windowstore.changelog.additional.retention.ms parameter states:
Added to a windows maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift.
It would seem that Kafka Streams maintains both a (replicated) local state store and a changelog topic for fault tolerance, with both having a finite, configurable retention period, and will apparently erase records once the retention time expires. This would lead to unacceptable data loss in our platform, thus raising the following questions:
Does Kafka Streams actually clean up the default state store over time or have I misunderstood something? Is there an actual risk of data loss?
In that case, is it advisable or even possible to set an infinite retention policy to the state store? Or perhaps there could be another way of making sure the state will be persisted, such as using a more traditional database as state store, if that makes sense?
Does the retention policy apply to standby replicas?
If it's impossible to persist the state permanently, could there be another stream processing framework that better suits our use case?
Any clarification would be appreciated.

It seems you're asking about two different things: session windows and changelog topics.
Compacted topics retain the latest value for each unique key indefinitely. Session windows, on the other hand, should probably be closed after some period of inactivity; a user session a week, month, or year after one today is arguably a new session. You should tie the individual session windows together as a collection keyed by userId, rather than storing only the most recent session (which implies removing previous sessions from the state store).

Related

Does rebuilding state stores in Kafka Streams propagate duplicate records to downstream topics?

I'm currently using Kafka Streams for a stateful application. The state is not stored in a Kafka state store though, but rather just in memory for the time being. This means whenever I restart the application, all state is lost and it has to be rebuilt by processing all records from the start.
After doing some research on Kafka state stores, this seems to be exactly the solution I'm looking for to persist state between application restarts (either in memory or on disk). However, I find the resources online lack some pretty important details, so I still have a couple of questions on how this would work exactly:
If the stream is set to start from offset latest, will the state still be (re)calculated from all the previous records?
If previously already processed records need to be reprocessed in order to rebuild the state, will this propagate records through the rest of the Streams topology (e.g. InputTopic -> stateful processor -> OutputTopic, will this result in duplicated records in the OutputTopic because of rebuilding state)?
State stores use their own changelog topics, and kafka-streams state stores take on responsibility for loading from them. If your state stores are uninitialised, your kafka-streams app will rehydrate its local state store from the changelog topic using EARLIEST, since it has to read every record.
This means the startup sequence for a brand new instance is roughly:
Observe that there is no local state-store cache
Load the local state store by consuming from the changelog topic for the state store (the changelog topic name is <application.id>-<state-store-name>-changelog)
Read each record and update a local RocksDB instance accordingly
Do not emit anything, since this is internal housekeeping, not your actual topology
Determine where to start consuming the input topics: resume from the consumer group's committed offsets or, if the group doesn't have any offsets yet, use EARLIEST or LATEST according to how you configured the topology
Process records, emitting output according to the topology
Whether you set your actual topology's auto.offset.reset to LATEST or EARLIEST is up to you. If the committed offsets are lost, or you create a new group, it's a balance between potentially skipping records (LATEST) and handling reprocessing of old records plus deduplication (EARLIEST).
Long story short: state restoration is different from processing, and is handled by Kafka Streams itself.
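To make the difference between restoration and processing concrete, here is a toy model in plain Java. It is only a sketch: ordinary collections stand in for RocksDB, the changelog topic and the output topic, and the names RestoreSketch, Change, restore and process are made up for illustration, not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain collections stand in for RocksDB, the changelog and the output topic.
final class RestoreSketch {
    // One changelog record: the latest count written for a key (hypothetical type).
    static final class Change {
        final String key;
        final long value;
        Change(String key, long value) { this.key = key; this.value = value; }
    }

    final Map<String, Long> store = new HashMap<>();  // local state store
    final List<String> output = new ArrayList<>();    // records emitted downstream

    // Restoration: replay the changelog into the store; nothing is emitted.
    void restore(List<Change> changelog) {
        for (Change c : changelog) {
            store.put(c.key, c.value);
        }
    }

    // Processing: update the store and emit the new count downstream.
    void process(String word) {
        long count = store.merge(word, 1L, Long::sum);
        output.add(word + "=" + count);
    }

    public static void main(String[] args) {
        RestoreSketch app = new RestoreSketch();
        app.restore(Arrays.asList(
                new Change("kafka", 2L), new Change("streams", 1L), new Change("kafka", 3L)));
        System.out.println("emitted during restore: " + app.output.size()); // 0
        app.process("kafka");
        System.out.println("emitted after processing: " + app.output);      // [kafka=4]
    }
}
```

Replaying three changelog records rebuilds the counts without writing anything downstream; only the record handled by process() produces output.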
If the stream is set to start from offset latest, will the state still be (re)calculated from all the previous records?
If you are re-launching the same application (e.g. after having stopped it before), then state will not be recalculated by reprocessing the original input data. Instead, the state will be restored from its "backup" (every state store or KTable is durably stored in a Kafka topic, the so-called "changelog topic" of that table/state store for such purposes) so that its data is exactly what it was when the application was stopped. This behavior enables you to seamlessly stop+restart your applications without skipping over records that arrived between "stop" and "restart".
But there is a different caveat that you need to be aware of: The configuration to set the offset start point (latest or earliest) is only used when you run your Kafka Streams application for the first time. Afterwards, whenever you stop+restart your application, it will always continue where it previously stopped. That's because, if the app has run at least once, it has stored its consumer offset information in Kafka, which allows it to know from where to automatically resume operations once it is being restarted.
If you need the different behavior of always (re)starting from e.g. the latest offsets (thus potentially skipping records that arrived in between when you stopped the application and when you restarted it), you must reset your Kafka Streams application. One of the steps the reset tool performs is removing the application's consumer offset information from Kafka, which makes the application think that it was never started before, so to speak.
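The rule described above (the reset policy only matters when no offsets have been committed yet) can be sketched as a small decision function. This is a simplified model, not the consumer's actual code; startOffset and its parameters are hypothetical names.

```java
// Simplified model of how a start position is chosen; not the actual consumer implementation.
final class StartOffsetSketch {
    enum Reset { EARLIEST, LATEST }

    // committedOffset == null models "this application has never committed an offset".
    static long startOffset(Long committedOffset, Reset policy, long logStart, long logEnd) {
        if (committedOffset != null) {
            return committedOffset;  // resume where the app stopped; the policy is ignored
        }
        return policy == Reset.EARLIEST ? logStart : logEnd;
    }

    public static void main(String[] args) {
        System.out.println(startOffset(null, Reset.LATEST, 0L, 100L));   // first run: 100
        System.out.println(startOffset(null, Reset.EARLIEST, 0L, 100L)); // first run: 0
        System.out.println(startOffset(42L, Reset.LATEST, 0L, 100L));    // restart: 42
    }
}
```

Resetting the application corresponds to going back to the committedOffset == null case, which is why the policy takes effect again afterwards.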
If previously already processed records need to be reprocessed in order to rebuild the state, will this propagate records through the rest of the Streams topology (e.g. InputTopic -> stateful processor -> OutputTopic, will this result in duplicated records in the OutputTopic because of rebuilding state)?
This reprocessing will not happen by default as explained above. State will be automatically reconstructed to its prior state (pun intended) at the point when the application was stopped.
Reprocessing would only happen if you manually reset your application (see above) and e.g. configure the application to re-read historical data (like setting auto.offset.reset to earliest after you did the reset).

Kafka Topic Retention and impact on the State store in Kafka streams

I have a state store (using Materialized.as()) in Kafka streams word-count application. Based on my understanding the state-store is maintained in Kafka internal topic.
My questions are:
Can state-stores have unlimited key-value pairs, or they are governed by the rules of kafka topics based on the log.retention policies or log.segment.bytes?
I set the log.retention.ms=60000 and expected the state store value to be reset to 0 after a minute. But I find that it is not happening, I can still see values from state store.
Does kafka completely wipe out the logs or keeps the SNAPSHOT in case log-compaction topic? What does it mean by "segment gets committed"?
What does it mean by "segment gets committed"?
Please post along with the sources for solution if available.
Can state-stores have unlimited key-value pairs, or they are governed by the rules of kafka topics based on the log.retention policies or log.segment.bytes?
Yes, state stores can have unlimited key-value pairs = events (or 'messages'). Well, local app storage space and remote storage space in Kafka permitting, of course (the latter for durably storing the data in your state stores).
Your application's state stores are persisted remotely in compacted internal Kafka topics. Compaction means that Kafka periodically purges older events for the same event key (e.g., Bob's old account balances) from storage. But compacted topics do not remove the most recent event per event key (e.g., Bob's current account balance). There is no upper limit for how many such 'unique' key-value pairs will be stored in a compacted topic.
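As a rough illustration of those semantics, the following plain-Java sketch compacts an in-memory list of events. It is a deliberate simplification: a real broker compacts segment files in the background and never touches the active segment, and the class and method names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of log compaction: keep only the most recent event per key.
final class CompactionSketch {
    static final class Entry {
        final String key;
        final Long value;
        Entry(String key, Long value) { this.key = key; this.value = value; }
    }

    static List<Entry> compact(List<Entry> log) {
        Map<String, Entry> latest = new LinkedHashMap<>();
        for (Entry e : log) {
            latest.remove(e.key);  // re-inserting keeps survivors in log order
            latest.put(e.key, e);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
                new Entry("bob", 100L), new Entry("alice", 50L), new Entry("bob", 70L));
        for (Entry e : compact(log)) {
            System.out.println(e.key + " -> " + e.value);
        }
        // prints:
        // alice -> 50
        // bob -> 70
    }
}
```

Bob's old balance is purged, but one entry per distinct key always survives, so the compacted log grows with the number of keys rather than with time.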
I set the log.retention.ms=60000 and expected the state store value to be reset to 0 after a minute. But I find that it is not happening, I can still see values from state store.
log.retention.ms is not used when a topic is configured to be compacted (log.cleanup.policy=compact). See the existing SO question Log compaction to keep exactly one message per key for details, including for why compaction doesn't happen immediately (in short that's because compaction operates on partition segment files, it will not touch the most current segment file, and there can be multiple events per event key in that file).
Note: You can nowadays set the configuration log.cleanup.policy to a combination of compaction and time/volume-based retention with log.cleanup.policy=compact,delete (see KIP-71 for details). But generally you shouldn't fiddle with this setting unless you really know what you are doing -- the defaults are what you need 99% of the time.
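For reference, log.cleanup.policy is the broker-wide default; on an individual topic the corresponding overrides are named cleanup.policy and retention.ms. A minimal sketch of such an override map follows. The seven-day retention is an arbitrary example value, the helper name is made up, and actually applying the map to a topic (for instance with the admin tooling) is not shown.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds topic-level overrides for a compacted topic that also expires old data.
final class ChangelogTopicConfigSketch {
    static Map<String, String> compactAndDelete(Duration retention) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("cleanup.policy", "compact,delete");                  // topic-level name
        cfg.put("retention.ms", Long.toString(retention.toMillis())); // time-based expiry
        return cfg;
    }

    public static void main(String[] args) {
        // Seven days is only an example value, not a recommendation.
        System.out.println(compactAndDelete(Duration.ofDays(7)));
        // prints {cleanup.policy=compact,delete, retention.ms=604800000}
    }
}
```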
Does kafka completely wipe out the logs or keeps the SNAPSHOT in case log-compaction topic? What does it mean by "segment gets committed"?
I don't understand this question, unfortunately. :-) Perhaps my previous answers and reference links already cover your needs. What I can say is that, no, Kafka doesn't wipe out the logs completely. Compaction operates on a topic-partition's segment files. You will probably need to read up on how compaction works, for which I would suggest an article like https://medium.com/@sunny_81705/kafka-log-retention-and-cleanup-policies-c8d9cb7e09f8, in case the Apache Kafka docs weren't sufficiently clear yet.
State stores are backed by compacted, internal topics. They therefore follow the same semantics as compacted topics, and have no finite retention duration.
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Internal+Data+Management

When does a Kafka Streams app clean its state store?

I have a kafka streams app which is currently just joining two KStreams with a 5-minute window and writing the join result to another topic.
Since I am joining two topics over a time window, my app will have state associated with it. I was under the impression that the state stores in my app would get pruned after every 5-minute window (because my app cares only about the 5-minute window of events for the join state).
I was expecting constant disk-space utilization, but that does not seem to be the case. It's been 12 hours and I do not see the state store getting cleaned up. It's consistently growing.
So I have multiple concerns about this now:
When does Kafka Streams app clean up its state?
If one of the instances in my Kafka Streams app cluster fails, and I boot another host and make it join the cluster, is there an orphaned state store sitting on disk after rebalancing for the partitions that got reassigned?
My understanding is that the events are joined only if they happen in the defined window, so, why does kafka need to hold on to data that is older than the defined window period in its state store?
Let me know if you need any other information from me regarding my streams app. I am currently running kafka-streams version 2.2.1 and my brokers are also on the same version.
When does Kafka Streams app clean up its state?
The size of the state depends on the retention period, which is 1 day by default.
At the moment, it's not possible to change the retention period for KStream-KStream joins; adding this feature is work in progress: https://issues.apache.org/jira/browse/KAFKA-8558
If one of the instances in my Kafka Streams app cluster fails, and I boot another host and make it join the cluster, is there an orphaned state store sitting on disk after rebalancing for the partitions that got reassigned?
Yes. However, this state will be cleaned (if you restart Kafka Streams on the recovered host) if the state is not reused after a configurable (state.cleanup.delay.ms) period of time.
My understanding is that the events are joined only if they happen in the defined window, so, why does kafka need to hold on to data that is older than the defined window period in its state store?
Having a higher retention period than your window size allows Kafka Streams to process out-of-order data. Note that Kafka Streams uses event-time semantics, not processing-time semantics.
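The effect can be illustrated with a deliberately simplified model in plain Java. It assumes a buffered record is dropped once its timestamp falls more than the retention period behind the highest timestamp seen so far (stream time); the real store expires data in coarser chunks, and all names below are hypothetical.

```java
// Simplified event-time model; not the actual Kafka Streams join implementation.
final class JoinRetentionSketch {
    // Two events join only if their timestamps are within the window of each other.
    static boolean withinWindow(long tsA, long tsB, long windowMs) {
        return Math.abs(tsA - tsB) <= windowMs;
    }

    // A buffered record is still available if it is not older than streamTime - retention.
    static boolean stillRetained(long recordTs, long streamTimeMs, long retentionMs) {
        return recordTs >= streamTimeMs - retentionMs;
    }

    // Can a late (out-of-order) record still find its join partner in the store?
    static boolean lateRecordJoins(long lateTs, long bufferedTs, long streamTimeMs,
                                   long windowMs, long retentionMs) {
        return withinWindow(lateTs, bufferedTs, windowMs)
                && stillRetained(bufferedTs, streamTimeMs, retentionMs);
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long window = 5 * minute;
        // Partner buffered at t=0; a record stamped t=2min arrives when stream time is 30min.
        System.out.println(lateRecordJoins(2 * minute, 0L, 30 * minute, window, 5 * minute));       // false
        System.out.println(lateRecordJoins(2 * minute, 0L, 30 * minute, window, 24 * 60 * minute)); // true
    }
}
```

With retention equal to the 5-minute window the partner is already gone when the late record shows up; with the 1-day default it is still there, which is why the store holds more than 5 minutes of data.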

Kafka stream - define a retention policy for a changelog

I use Kafka Streams for some aggregations of a TimeWindow.
I'm interested only in the final result of each window, so I use the .suppress() feature which creates a changelog topic for its state.
The retention policy configuration for this changelog topic is defined as "compact" which to my understanding will keep at least the last event for each key in the past.
The problem in my application is that keys often change. This means that the topic will grow indefinitely (each window will bring new keys which will never be deleted).
Since the aggregation is per window, after the aggregation was done, I don't really need the "old" keys.
Is there a way to tell Kafka Streams to remove keys from previous windows?
For that matter, I think configuring the changelog topic retention policy to "compact,delete" will do the job (which is available in Kafka according to KIP-71, KAFKA-4015).
But is it possible to change the retention policy this way using the Kafka Streams API?
The suppress() operator sends tombstone messages to the changelog topic when a record is evicted from its buffer and sent downstream. Thus, you don't need to worry about unbounded growth of the topic. Changing the cleanup policy might in fact break the guarantees that the operator provides, and you might lose data.

Kafka Streams: Is it possible to have "compact,delete" policy on state stores?

Kafka Streams state stores are "compact" by default. Is it possible to set "compact,delete" with a retention policy in a state store?
Yes, it is possible to configure topics with retention and compaction and Kafka Streams uses this setting for windowed KTables.
If you really want to set this, you can update the corresponding changelog topic config manually after it is created.
However, setting a topic retention time for changelog topics deletes the data only from the topic. Data is not deleted from the local state store. State stores don't offer a TTL, and RocksDB's TTL setting cannot be enabled (for technical reasons that we hope to resolve eventually).
If you want to delete data cleanly, you should use tombstone messages that will delete the data from the store as well as the changelog topic (instead of using retention time).
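A toy model of why tombstones clean up both places: a write with a null value removes the key from the local store and is appended to the changelog, where compaction later drops the key. The sketch below uses plain Java collections and invented names; it also ignores that a real broker keeps tombstones around for delete.retention.ms before removing them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: a store whose writes are mirrored to a changelog; value null is a tombstone.
final class TombstoneSketch {
    final Map<String, String> store = new HashMap<>();
    final List<String[]> changelog = new ArrayList<>(); // each element is {key, value}

    void put(String key, String value) {
        if (value == null) {
            store.remove(key);      // tombstone deletes from the local store
        } else {
            store.put(key, value);
        }
        changelog.add(new String[] {key, value}); // every write, tombstones included, is logged
    }

    // Simplified compaction: latest value per key survives; tombstoned keys disappear.
    Map<String, String> compactedChangelog() {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] e : changelog) {
            if (e[1] == null) {
                latest.remove(e[0]);
            } else {
                latest.put(e[0], e[1]);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        TombstoneSketch t = new TombstoneSketch();
        t.put("session-1", "open");
        t.put("session-2", "open");
        t.put("session-1", null); // tombstone
        System.out.println(t.store);                // {session-2=open}
        System.out.println(t.compactedChangelog()); // {session-2=open}
    }
}
```

Because the deletion travels through the changelog, a store restored from it ends up without the key as well, which plain topic retention cannot guarantee.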
If you are using the default RocksDBStore there is an option to set up CompactionStyle to FIFO:
FIFO compaction style is the simplest compaction strategy. It is suited for keeping event log data with very low overhead (query log for example). It periodically deletes the old data, so it's basically a TTL compaction style.
and then use the TTL:
A new option, compaction_options_fifo.ttl, has been introduced for this to delete SST files for which the TTL has expired. This feature enables users to drop files based on time rather than always based on size, say, drop all SST files older than a week or a month.
RocksDB FIFO doc
To actually enable FIFO compaction, you have to implement RocksDBConfigSetter and register your implementation via the rocksdb.config.setter configuration property.