Kafka Streams - Low-Level Processor API - RocksDB Time To Live (TTL)

I'm experimenting with the low-level Processor API. I'm aggregating incoming records with the Processor API and writing the aggregated records to RocksDB.
However, I want the records added to RocksDB to stay active only for a 24-hour period; after 24 hours a record should be deleted. This can be done by changing the TTL settings, but there isn't much documentation I can get help from.
How do I change the TTL value? Which Java API should I use to set the TTL to 24 hours, and what is the current default TTL?

I believe this is not currently exposed via the api or configuration.
RocksDBStore passes a hard-coded TTL when opening a RocksDB:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java#L158
The hard-coded value is simply TTL_SECONDS = TTL_NOT_USED (-1) (see line 79 of the same file), so TTL is not used by default.
There are currently two open tickets about exposing TTL support in the state stores: KAFKA-4212 and KAFKA-4273:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22rocksdb%20ttl%22
I suggest you comment on one of them describing your use case to get them moving forward.
In the interim, if you need TTL functionality right now: state stores are pluggable and the RocksDBStore source is readily available, so you can fork it and set your TTL value (or, as the pull request associated with KAFKA-4273 proposes, source it from the configs).
I know this is not ideal and sincerely hope someone comes up with a more satisfactory answer.
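If forking RocksDBStore is too heavy, one alternative (not something the answer above proposes, and not a built-in Kafka Streams feature) is to keep the write timestamp next to each aggregate and purge old entries yourself, for example from a punctuator registered with ProcessorContext#schedule. The minimal sketch below models only that purge logic, with a plain HashMap standing in for the state store; `TtlAggregateStore` and its methods are hypothetical names. As with RocksDB's TTL, an entry's age is counted from its last write.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical in-memory stand-in for a key-value state store with a TTL.
// In a real processor the map would be the KeyValueStore and purgeExpired()
// would be called from a punctuator; here everything is plain JDK code.
class TtlAggregateStore {
    // Aggregate plus the time of its last write; age is measured from that write.
    static final class Entry {
        final long aggregate;
        final long writtenAtMs;

        Entry(long aggregate, long writtenAtMs) {
            this.aggregate = aggregate;
            this.writtenAtMs = writtenAtMs;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final long ttlMs;

    TtlAggregateStore(Duration ttl) {
        this.ttlMs = ttl.toMillis();
    }

    // Adds the value to the key's running sum and refreshes its write time.
    void aggregate(String key, long value, long nowMs) {
        Entry previous = store.get(key);
        long sum = previous == null ? value : previous.aggregate + value;
        store.put(key, new Entry(sum, nowMs));
    }

    // Removes every entry last written more than ttlMs before nowMs; returns how many were removed.
    int purgeExpired(long nowMs) {
        int removed = 0;
        Iterator<Entry> it = store.values().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().writtenAtMs > ttlMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Current aggregate for the key, or null if it is absent or was purged.
    Long get(String key) {
        Entry e = store.get(key);
        return e == null ? null : e.aggregate;
    }

    public static void main(String[] args) {
        TtlAggregateStore s = new TtlAggregateStore(Duration.ofHours(24));
        s.aggregate("a", 5, 0L);
        s.aggregate("b", 7, Duration.ofHours(20).toMillis());
        // 25 h after start: "a" is 25 h old and expires, "b" is 5 h old and stays.
        int removed = s.purgeExpired(Duration.ofHours(25).toMillis());
        System.out.println(removed + " " + s.get("a") + " " + s.get("b")); // 1 null 7
    }
}
```

Purging from a punctuator means expired entries can still be read until the next run, so the punctuation interval bounds how late an entry disappears.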

Related

What is the best way to handle obsolete information in a database with a Spring Boot application

I am working on an application that tracks objects detected by multiple sensors. I receive my inputs by consuming Kafka messages and save the information in a PostgreSQL database.
An object is located in a specific location if its last scan was detected by the sensor in that exact location, for example:
Object last scanned by sensor in room 1 -> means last location known for the object is room 1
The scans happen continuously, at a configurable frequency of a few seconds to a few minutes. So an object that wasn't scanned in, for example, the last hour needs to be considered out of range.
So my question is: how can I design a system that generates some sort of notification when a device is out of range?
For example, if the last detection timestamp is more than 5 minutes old, it triggers a notification.
The only solution I can think of is a batch job that repeatedly checks for all objects whose last detection time is more than 5 minutes ago. But I am wondering whether that is the right approach, or whether there is a better way.
I use Kotlin and Spring Boot for my application.
Thank you for your help.
You would need some type of heartbeat mechanism, yes.
Query all detection events whose "last seen" timestamp is older than your threshold, and fire an alert when the returned result set is larger than some tolerable count (e.g. if you are willing to accept devices that are intermittently lost and expect them to be found in the next scan).
As far as "where/how" to alert - up to you. Slack webhooks are a popular example. Grafana can do alerting and query your database.
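The check itself is small. Below is a minimal sketch, assuming the last detection time per object has already been read from the database into a map (the PostgreSQL query is left out); `findOutOfRange` and `shouldAlert` are hypothetical names, and in a Spring Boot application the method calling them could be driven by @Scheduled. It is written in Java but translates directly to Kotlin.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical heartbeat check: an object whose last detection is older than
// now - threshold is considered out of range.
class OutOfRangeCheck {
    // Ids whose last detection is strictly before the cutoff, sorted for stable output.
    static List<String> findOutOfRange(Map<String, Instant> lastSeen, Instant now, Duration threshold) {
        Instant cutoff = now.minus(threshold);
        return lastSeen.entrySet().stream()
                .filter(e -> e.getValue().isBefore(cutoff))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Alert only when more objects are missing than we are willing to tolerate.
    static boolean shouldAlert(List<String> outOfRange, int tolerated) {
        return outOfRange.size() > tolerated;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T12:00:00Z");
        Map<String, Instant> lastSeen = Map.of(
                "obj-1", now.minus(Duration.ofMinutes(2)),
                "obj-2", now.minus(Duration.ofMinutes(7)),
                "obj-3", now.minus(Duration.ofHours(1)));
        List<String> missing = findOutOfRange(lastSeen, now, Duration.ofMinutes(5));
        System.out.println(missing + " alert=" + shouldAlert(missing, 1)); // [obj-2, obj-3] alert=true
    }
}
```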

How do I make sure that I process one message at a time at most?

I am wondering how to process one message at a time using Google's Pub/Sub in Go. I am using the official library for this, https://pkg.go.dev/cloud.google.com/go/pubsub#section-readme. The event is consumed by a service that runs with multiple instances, so any in-memory locking mechanism will not work.
I realise that this is an anti-pattern, so let me explain my use case. Using MongoDB, I store an array of objects as an embedded document for each entity. The published event modifies parts of this array and saves it. If I receive more than one event at a time and they start processing at exactly the same time, one of the saves will overwrite the other. So I was thinking a solution is to make sure that only one message is processed at a time, and it would be nice to use built-in functionality in Cloud Pub/Sub to do so. Otherwise I was thinking of implementing a locking mechanism in the DB, but I'd like to avoid that.
Any help would be appreciated.
You can consider two things:
You can use an ordering key in Pub/Sub. That way, all the messages related to the same object will be delivered in order, one by one.
You can use a PUSH subscription to push Pub/Sub messages to Cloud Run or Cloud Functions. With Cloud Run, set the concurrency to 1 (it's the default with Cloud Functions gen1), and also set the max instances to 1. That way you process only one message at a time; all other messages will be rejected (HTTP 429) and redelivered by Pub/Sub. The drawback is that you can't parallelize the processing, as you can with ordering keys.
A similar approach, and simpler to implement, is to use Cloud Tasks instead of Pub/Sub. With Cloud Tasks you can set a rate limit on a queue and set maxConcurrentDispatches to 1 (and you don't have to do the same with Cloud Functions max instances or Cloud Run max instances and concurrency).
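To make the ordering-key guarantee concrete, the sketch below simulates it locally with the JDK only; it does not use the Pub/Sub client library, where message ordering must also be enabled on the subscription and the publisher must set the ordering key. Every message with the same key is routed to the same single-threaded lane, so updates to one entity are applied one at a time and in submission order, while other keys can still run in parallel. `OrderedDispatcher` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Local, JDK-only simulation of ordering-key semantics (not the Pub/Sub client):
// every message with the same key goes to the same single-threaded "lane",
// so updates to one entity never overlap and run in submission order.
class OrderedDispatcher {
    private final ExecutorService[] lanes;

    OrderedDispatcher(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same key -> same lane; different keys may run in parallel on other lanes.
    void dispatch(String orderingKey, Runnable handler) {
        int lane = Math.floorMod(orderingKey.hashCode(), lanes.length);
        lanes[lane].execute(handler);
    }

    // Stops accepting work and waits for already queued handlers to finish.
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stands in for the embedded array of one MongoDB document; a plain ArrayList
        // is safe here only because all updates for "entity-42" go through one lane.
        List<String> entity42 = new ArrayList<>();
        OrderedDispatcher dispatcher = new OrderedDispatcher(4);
        for (int i = 0; i < 5; i++) {
            int n = i;
            dispatcher.dispatch("entity-42", () -> entity42.add("update-" + n));
        }
        dispatcher.shutdownAndWait();
        System.out.println(entity42); // [update-0, update-1, update-2, update-3, update-4]
    }
}
```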

KStreamWindowAggregate 2.0.1 vs 2.5.0: skipping records instead of processing

I've recently upgraded my kafka streams from 2.0.1 to 2.5.0. As a result I'm seeing a lot of warnings like the following:
org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate$KStreamWindowAggregateProcessor Skipping record for expired window. key=[325233] topic=[MY_TOPIC] partition=[20] offset=[661798621] timestamp=[1600041596350] window=[1600041570000,1600041600000) expiration=[1600059629913] streamTime=[1600145999913]
There seems to be new logic in the KStreamWindowAggregate class that checks whether a window has closed. If it has been closed, the messages are skipped. Compared to 2.0.1 these messages were still processed.
Question
Is there a way to get the same behavior as before? I'm seeing lots of gaps in my data since this upgrade and am not sure how to solve this, as previously these gaps were not seen.
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to this expiring windows?
Update
While exploring further, I see that it is indeed related to the grace period (in ms). In my custom TimestampExtractor (which uses the timestamp from the payload instead of the normal record timestamp), I can see that for the records that trigger the expired-window warnings, the incoming record timestamp is more than 24 hours later than the event time in the payload.
I assume this is caused by consumer lag of over 24 hours.
The timestamp extractor extract method has a partition time which according to the docs:
partitionTime - the highest extracted valid timestamp of the current record's partition (could be -1 if unknown)
So is this the create time of the record on the topic? And is there a way to influence it so that my records are no longer skipped?
Compared to 2.0.1 these messages were still processed.
That is a little bit surprising (even though I would need to double-check the code), at least for the default config. By default, the store retention time is set to 24h, and thus in 2.0.1 messages older than 24h should not be processed either, as the corresponding state has already been purged. If you did change the store retention time (via Materialized#withRetention) to a larger value, you would also need to increase the window grace period via the TimeWindows#grace() method accordingly.
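The skip decision can be reproduced from the warning in the question. The sketch below assumes the 2.5 check is roughly `windowEnd > streamTime - grace`, with the logged `expiration` being that close time. With the logged values the implied grace is 1600145999913 - 1600059629913 = 86,370,000 ms, i.e. 24 h minus the 30 s window size, which is consistent with the default 24 h retention mentioned above. `WindowCloseCheck` is a hypothetical name, not Kafka code.

```java
// Hypothetical model of the "Skipping record for expired window" decision; not the Kafka source.
class WindowCloseCheck {
    // A window still accepts records while its end is after streamTime - grace (the logged "expiration").
    static boolean isWindowOpen(long windowEndMs, long streamTimeMs, long graceMs) {
        long closeTimeMs = streamTimeMs - graceMs;
        return windowEndMs > closeTimeMs;
    }

    public static void main(String[] args) {
        long windowEnd = 1600041600000L;  // window=[1600041570000,1600041600000)
        long streamTime = 1600145999913L; // streamTime from the warning
        long expiration = 1600059629913L; // expiration from the warning
        long impliedGrace = streamTime - expiration;
        System.out.println(impliedGrace); // 86370000 = 24 h - 30 s
        System.out.println(isWindowOpen(windowEnd, streamTime, impliedGrace)); // false -> skipped
        long twoDays = 2L * 24 * 60 * 60 * 1000;
        System.out.println(isWindowOpen(windowEnd, streamTime, twoDays)); // true -> would be processed
    }
}
```

In the DSL, a 48 h grace would be configured along the lines of TimeWindows.of(Duration.ofSeconds(30)).grace(Duration.ofDays(2)), together with a correspondingly larger Materialized#withRetention, as the answer says; treat the exact values as placeholders.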
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to this expiring windows?
I am not sure what you mean by this, or how you actually do it. The old and new logic are similar with regard to how long a window is stored (the retention time config). The new part is the grace period, which you can increase to the same value as the retention time if you wish.
About "partition time": it is computed based on whatever the TimestampExtractor returns. In your case, it's the max of whatever you extracted from the message payloads.
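As a rough model of that answer: partition time is not the record's create time on the topic; it only moves forward, to the largest timestamp the extractor has returned so far, so a late record does not pull it back. The tracker below is a hypothetical illustration that starts at -1 ("unknown"), like the partitionTime argument.

```java
// Hypothetical model of "partition time": the largest valid timestamp the
// TimestampExtractor has returned so far for one partition (-1 while unknown).
class PartitionTimeTracker {
    private long partitionTime = -1L;

    // Returns the partition time as seen before this record (what extract() would be
    // handed), then advances it; negative extracted timestamps are treated as invalid.
    long onRecord(long extractedTimestampMs) {
        long before = partitionTime;
        if (extractedTimestampMs >= 0) {
            partitionTime = Math.max(partitionTime, extractedTimestampMs);
        }
        return before;
    }

    long current() {
        return partitionTime;
    }

    public static void main(String[] args) {
        PartitionTimeTracker tracker = new PartitionTimeTracker();
        // Payload timestamps in arrival order; the third record is late.
        long[] payloadTimestamps = {1_000L, 5_000L, 2_000L};
        for (long ts : payloadTimestamps) {
            System.out.println("partitionTime before " + ts + ": " + tracker.onRecord(ts));
        }
        System.out.println("current: " + tracker.current()); // 5000: the late record does not move it back
    }
}
```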

How to control retention over aggregate state store and changelog topic

My use case is the following:
Orders are flowing into an activation system via a topic. I have to identify changes for records with the same key. I compare the existing value with the new value using the aggregate function and output an event that indicates the type of change identified, e.g. a DueDate change.
The key is a randomly generated number and the number of unique keys is practically unbounded. The same key is reused when the ordering system pushes a revision to an existing order.
The code has been running in production for a couple of months, but the state store and changelog topic keep growing and there is a concern about disk usage. I would like records to expire after 90 days in the state store. I read about ways to apply time-based retention on a state store, and it looks like windowing the aggregation is a way to achieve that.
I understand that windowed aggregations are only available for tumbling and hopping windows; sliding windows are available for joins only.
Tumbling window wouldn't work in this case because I would have windows for 0-90, 90-180 and I wouldn't be able to identify an update on day 92 for a record that came in on day 89 (they wouldn't share the same window).
Now the only other option is hopping window.
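import java.util.concurrent.TimeUnit;
TimeWindows timeWindow = TimeWindows.of(TimeUnit.DAYS.toMillis(90)).advanceBy(TimeUnit.DAYS.toMillis(1)).until(TimeUnit.DAYS.toMillis(91)); // until() may not be smaller than the 90-day window size; 91 days keeps each window about 1 day past its end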
The problem is that I'll have to persist and update 90 windows. When the stream starts, 90 windows will be created 0-90, 1-91, 2-92, 3-93 etc. If I have a retention of 1 day on the windows, the window 0-90 will be cleaned up on day 91.
Now let's say I get an update on day 90. Correct me if I'm wrong, but my understanding is that I will have to update 90 windows, and my state store will be quite large by then because of all the duplicates. Maybe this is where I'm missing something. If a record is present in 90 windows, is it physically written to disk 90 times?
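On the duplication question: as far as I know, the DSL's windowed aggregation stores each (key, window start) pair as a separate entry, so one incoming record updates one aggregate per window that contains its timestamp. The hypothetical helper below just counts those windows for the proposed 90-day window advancing by 1 day; the start formula mirrors the usual alignment of hopping windows to multiples of the advance and is assumed here rather than copied from the Kafka source.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the hopping windows [start, start + size) that contain a timestamp.
// All values share one unit (days in the example); starts are multiples of advance.
class HoppingWindows {
    static List<Long> windowStartsFor(long timestamp, long size, long advance) {
        List<Long> starts = new ArrayList<>();
        // First aligned start whose window still covers the timestamp (clamped at 0).
        long start = Math.max(0, timestamp - size + advance) / advance * advance;
        for (; start <= timestamp; start += advance) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 90-day windows advancing by 1 day, record arriving on day 100.
        List<Long> starts = windowStartsFor(100, 90, 1);
        System.out.println(starts.size() + " windows, from day " + starts.get(0)
                + " to day " + starts.get(starts.size() - 1)); // 90 windows, from day 11 to day 100
    }
}
```

In general a timestamp falls into size / advance hopping windows once it is past the first window, which is where the 90 copies come from.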
In the end all I need is to prevent my state store and changelog topic from growing indefinitely. 90 days of historical data is sufficient to support my use case.
Would there be a better way to approach this?
It might be simpler to not use the DSL but the Processor API with a windowed state store. A windowed state store is just a key-value store with expiration. Hence, you can use it similar to a key-value store -- you only provide an additional timestamp that will be used to expire data eventually.
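As a rough illustration of "a key-value store with expiration", the model below groups entries into daily segments and drops a whole segment once stream time has moved more than 90 days past its end, which is roughly how Kafka's persistent window stores manage retention. It is a simplified, hypothetical stand-in written with JDK collections only; with the real Processor API you would create a window store through the `Stores` factory and call put/fetch with a timestamp. It is applied here to the order-revision use case: look up the previous value, compare, then write the new one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of a segmented window store used as an expiring
// key-value store. Entries are grouped into time segments; a segment is dropped once
// its whole time range ends at or before streamTime - retention.
class ExpiringChangeStore {
    private final long retentionMs;
    private final long segmentIntervalMs;
    // segment id -> (key -> latest value written in that segment)
    private final TreeMap<Long, Map<String, String>> segments = new TreeMap<>();
    private long streamTimeMs = 0L;

    ExpiringChangeStore(long retentionMs, long segmentIntervalMs) {
        this.retentionMs = retentionMs;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    // Writes the value under its timestamp (assumed >= 0), then drops fully expired segments.
    void put(String key, String value, long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        segments.computeIfAbsent(timestampMs / segmentIntervalMs, id -> new HashMap<>()).put(key, value);
        long firstLiveSegment = Math.max(0L, streamTimeMs - retentionMs) / segmentIntervalMs;
        segments.headMap(firstLiveSegment).clear();
    }

    // Most recent retained value for the key, or null if it never existed or has expired.
    String latest(String key) {
        for (Map<String, String> segment : segments.descendingMap().values()) {
            String value = segment.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        ExpiringChangeStore store = new ExpiringChangeStore(90 * day, day);
        store.put("order-1", "dueDate=2024-03-01", 0L);
        // Day 89: a revision arrives; compare against the retained previous value.
        String previous = store.latest("order-1");
        store.put("order-1", "dueDate=2024-03-05", 89 * day);
        System.out.println(previous + " -> " + store.latest("order-1"));
        // Day 200: other traffic advances stream time; order-1 is now older than 90 days.
        store.put("order-2", "dueDate=2024-06-01", 200 * day);
        System.out.println(store.latest("order-1")); // null
    }
}
```

Because expiry happens per segment, an entry lives for at least the retention period and at most one segment interval longer.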

Is CEP what I need (system state and event replaying)

I'm looking for a CEP engine, but I don't know if any engine meets my requirements.
My system has to process multiple streams of event data and generate complex events, which is exactly what almost any CEP engine (Esper, Drools) is built for.
I store all raw events in a database (this isn't part of the CEP, but I do it) and use rules (or continuous queries or similar) to trigger custom actions on complex events. But some of my rules depend on past events.
For instance: I could have a sensor sending an event every time my spouse comes or leaves home, and if both my car and my fancy woman's car are near the house, I get an SMS: 'Dangerous'.
The problem is that when the event processing service restarts, I lose all information about the state of the system (is my wife at home?), and to restore it I need to replay events for an unknown period of time. The system state can depend not only on raw events but on complex events as well.
The same problem arises when I need a report on complex events in the past. I have the raw event data stored in the database and could generate these complex events by replaying raw events, but I don't know exactly which period I have to replay.
At the same time, it's clear that for most rules it's possible to automatically determine the number of past events to process (or the period of time to load events from) in order to restore the system state.
If a given action depends on whether my wife is at home, the CEP system has to request the last status change. If a report on complex events is requested and a complex event depends on the average price within the previous period, all price change events for that period should be replayed. And so on...
Am I missing something?
The RuleCore CEP Server might solve your problems if I remember correctly. It does not lose state if you restart it and it contains a virtual logical clock so that you can replay events using any notion of time.
I'm not sure whether your question is if current CEP products offer joining historical data with live events, but if that's what you need, Esper allows you to pull data from JDBC sources (which connects your historical data with your live events) and reflect it in your EPL statements. I guess you already checked the Esper website; if not, you'll see that Esper has excellent documentation with lots of cookbook examples.
But even if you model your historical events after your live events, that does not solve your problem with choosing the correct timeframe, and as you wrote, this timeframe is use case dependent.
As previous people mentioned, I don't think your problem is really an engine problem, but more of a use case one. All engines I am familiar with, including Drools Fusion and Esper can join incoming events with historical data and/or state data queried on demand from an external source (like a database). It seems to me that what you need to do is persist state (or "timestamp check-points") when a relevant change happens and re-load the state on re-starts instead of replaying events for an unknown time frame.
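A minimal sketch of the "persist state and reload it on restart" idea, independent of any CEP engine: the derived state (who is at home) is saved together with the sequence number of the last raw event applied, and after a restart only the events after that checkpoint are replayed from the event database. All class names are hypothetical, the log is assumed to be ordered by sequence number, and actually persisting the Checkpoint is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: persist derived state together with the position of the
// last applied raw event, and on restart replay only the events after that position.
class CheckpointedPresence {
    // Raw event: position in the stored log, who it concerns, arrived (true) or left (false).
    static final class Event {
        final long seq;
        final String who;
        final boolean arrived;

        Event(long seq, String who, boolean arrived) {
            this.seq = seq;
            this.who = who;
            this.arrived = arrived;
        }
    }

    // What would be written to the database: a copy of the state plus the last applied seq.
    static final class Checkpoint {
        final long lastSeq;
        final Map<String, Boolean> atHome;

        Checkpoint(long lastSeq, Map<String, Boolean> atHome) {
            this.lastSeq = lastSeq;
            this.atHome = new HashMap<>(atHome);
        }
    }

    private final Map<String, Boolean> atHome = new HashMap<>();
    private long lastSeq = -1L;
    private int replayedOnRestore = 0;

    void apply(Event e) {
        atHome.put(e.who, e.arrived);
        lastSeq = e.seq;
    }

    Checkpoint checkpoint() {
        return new Checkpoint(lastSeq, atHome);
    }

    // Rebuilds state from the checkpoint, then replays only log entries newer than it.
    static CheckpointedPresence restore(Checkpoint cp, List<Event> log) {
        CheckpointedPresence p = new CheckpointedPresence();
        p.atHome.putAll(cp.atHome);
        p.lastSeq = cp.lastSeq;
        for (Event e : log) {
            if (e.seq > cp.lastSeq) {
                p.apply(e);
                p.replayedOnRestore++;
            }
        }
        return p;
    }

    boolean isAtHome(String who) {
        return atHome.getOrDefault(who, false);
    }

    int replayedOnRestore() {
        return replayedOnRestore;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
                new Event(1, "wife", true),
                new Event(2, "myCar", true),
                new Event(3, "wife", false),
                new Event(4, "wife", true));
        CheckpointedPresence live = new CheckpointedPresence();
        for (Event e : log.subList(0, 3)) {
            live.apply(e);
        }
        Checkpoint saved = live.checkpoint(); // persisted after seq 3
        // After a restart only event 4 is replayed, not the whole history.
        CheckpointedPresence restored = restore(saved, log);
        System.out.println(restored.isAtHome("wife") + " replayed=" + restored.replayedOnRestore()); // true replayed=1
    }
}
```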
Alternatively, if using Drools, you can inspect existing rules (kind of reflection on your rules/queries) to figure out which types of events your rules need and backtrack your event log until a point in time where all requirements are met and load/replay your events from there using the session clock.
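The backtracking idea can be sketched generically (this is not a Drools API): given the event types the rules need, scan the stored log from the newest event backwards until the latest event of each type has been found, and replay from there. `ReplayStartFinder` is a hypothetical name; as noted in the question, rules that depend on a time window (such as an average price over a period) need that window's start instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: walk the stored event log backwards until the most recent event
// of every required type has been seen; replaying from that index restores
// "latest status" style state. Time-window rules need the window start instead.
class ReplayStartFinder {
    // Returns the index to replay from, log size if nothing is required,
    // or -1 if some required type never occurs in the log.
    static int replayStart(List<String> eventTypes, Set<String> required) {
        Set<String> missing = new HashSet<>(required);
        if (missing.isEmpty()) {
            return eventTypes.size();
        }
        for (int i = eventTypes.size() - 1; i >= 0; i--) {
            missing.remove(eventTypes.get(i));
            if (missing.isEmpty()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> log = List.of("price", "wifeStatus", "price", "carNearHouse", "price");
        System.out.println(replayStart(log, Set.of("wifeStatus", "carNearHouse"))); // 1
    }
}
```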
Finally, you can use a cluster to reduce the restarts, but that does not solve the problem you describe.
Hope it helps.