How is the session window gap calculated? - apache-kafka

I'm trying to understand this schema of session windows:
If I understand it correctly, we have four events:
12:00:00 - an event started at this time
12:00:25 - another event ended
12:00:30 - an event started at this time
12:00:50 - another event ended
How do we get a gap of 15 seconds?
Could you explain what start/end means - is it one event or two different ones?

Events don't have a start or end time, but only a single scalar event-timestamp.
If you use session windows, events whose time difference to each other is smaller than the gap parameter fall into the same window.
Thus, the start and end of a session window always correspond to an event.
Note that session windows are not designed for the case when you have dedicated start/end events in your input stream. Think of session windows more as a "session detection" scenario, i.e., you don't have sessions in your input stream, and you want to sessionize your input data based on the record timestamps.
Check out the docs for more details: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#session-windows
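For illustration, here is a minimal sketch of a session-windowed count in the Kafka Streams DSL, using a 15-second gap as in the schema above. The topic name "events" and the String key/value types are assumptions for the example (SessionWindows.with takes the inactivity gap; newer Kafka versions use SessionWindows.ofInactivityGapWithNoGrace instead):

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.SessionWindows;

    public class SessionWindowExample {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Records with the same key whose timestamps are at most 15 seconds
            // apart fall into the same session; a time difference larger than
            // the gap starts a new session.
            builder.<String, String>stream("events")  // hypothetical input topic
                   .groupByKey()
                   .windowedBy(SessionWindows.with(Duration.ofSeconds(15)))
                   .count();
        }
    }

The start and end of each resulting session window are the timestamps of the first and last record that fell into it.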

Related

What is the best way to handle obsolete information in a database with a Spring Boot application

I am working on an application that tracks objects detected by multiple sensors. I receive my inputs by consuming Kafka messages and I save the information in a PostgreSQL database.
An object is located in a specific location if its last scan was detected by a sensor in that exact location. For example:
Object last scanned by a sensor in room 1 -> the last known location for the object is room 1
The scans happen continuously, and we can set a frequency of a few seconds to a few minutes. So an object that wasn't scanned in the last hour, for example, needs to be considered out of range.
My question is: how can I design a system that generates some sort of notification when a device is out of range?
For example, if the timestamp of the last detection is more than 5 minutes ago, it triggers a notification.
The only solution I can think of is to create a batch job that repeatedly checks for all objects whose last detection time is more than 5 minutes ago. But I am wondering if that is the right approach, and I would like to ask if there is a better way.
I use Kotlin and Spring Boot for my application.
Thank you for your help.
You would need some type of heartbeat mechanism, yes.
Query all detection events whose "last seen timestamp" is older than your threshold, and fire an alert when the returned result set is larger than some tolerable size (e.g., if you are willing to accept intermittently lost devices and expect them to be found in the next scan).
As far as "where/how" to alert - up to you. Slack webhooks are a popular example. Grafana can do alerting and query your database.
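As a sketch of that batch/heartbeat approach in Spring (the repository query method and the notifier are hypothetical placeholders, and @EnableScheduling must be active somewhere in the application):

    import java.time.Instant;
    import java.time.temporal.ChronoUnit;
    import java.util.List;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    // Hypothetical collaborators; replace with your actual repository/notifier.
    interface DetectionRepository {
        List<String> findObjectIdsLastSeenBefore(Instant cutoff);
    }

    interface AlertNotifier {
        void alert(String message);  // e.g., post to a Slack webhook
    }

    @Component
    class OutOfRangeChecker {

        private final DetectionRepository repository;
        private final AlertNotifier notifier;

        OutOfRangeChecker(DetectionRepository repository, AlertNotifier notifier) {
            this.repository = repository;
            this.notifier = notifier;
        }

        // Runs every minute and flags objects not detected in the last 5 minutes.
        @Scheduled(fixedRate = 60_000)
        void checkForStaleObjects() {
            Instant cutoff = Instant.now().minus(5, ChronoUnit.MINUTES);
            List<String> stale = repository.findObjectIdsLastSeenBefore(cutoff);
            if (!stale.isEmpty()) {
                notifier.alert("Out of range: " + stale);
            }
        }
    }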

Does Apache Beam stateful Processing consider window lateness constraints (withAllowedLateness) for resetting state?

I'm trying to implement a ValueState to filter records in my ParDo transformation. The high-level flow is this:
Fixed-Window of 1hr size, with allowedLateness (10min)
The first message (for a given key) that is processed in the ParDo shall set the valueState(boolean) to true. Subsequent messages for the same key shall be dropped if corresponding valueState is set to true. (Allow only first message for a given key in every window).
The messages (that are not dropped in step 2) will be written out as output.
While testing this, however, I see that after the fixed-window time period ends (1hr), the state is reset/lost. Ideally, the state should be available to process late records until the allowedLateness period (10 min) is complete.
These parts are right:
Each 1 hour window expires when the watermark reaches the end of the hour plus 10 minutes.
For a given window, the state is cleaned up after the window expires.
Here are the parts where I have corrections:
State is never reset.
Elements with timestamps in different windows are processed completely independently. Many windows may be receiving data at the same time. The hour windows occurred one after another when the data was generated, but they are not processed one after another.
Allowed lateness will not cause elements from a later window to be processed using the state from the prior window. It will simply allow the state to stay longer and the elements to not be dropped.
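For illustration, a minimal sketch of the "first message per key and window" filter from the question, written as a stateful DoFn with the Beam Java SDK. The ValueState is scoped per key and window and is garbage-collected only when the window expires (end of window + allowed lateness):

    import org.apache.beam.sdk.coders.BooleanCoder;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.ValueState;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    // Emits only the first element per key and window; all later elements
    // for that key in the same window are dropped.
    class FirstPerKeyAndWindowFn extends DoFn<KV<String, String>, KV<String, String>> {

        @StateId("seen")
        private final StateSpec<ValueState<Boolean>> seenSpec =
            StateSpecs.value(BooleanCoder.of());

        @ProcessElement
        public void process(ProcessContext ctx,
                            @StateId("seen") ValueState<Boolean> seen) {
            if (seen.read() == null) {   // first element for this key+window
                seen.write(true);        // state is kept until the window expires
                ctx.output(ctx.element());
            }
        }
    }

A late element that arrives within the allowed lateness still sees seen == true and is dropped, which is the behavior you want.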

Flink session window with onEventTime trigger?

I want to create an EventTime-based session window in Flink, such that it triggers when the event time of a new message is more than 180 seconds greater than the event time of the message that created the window.
For example:
t1(0 seconds) : msg1 <-- This is the first message, which causes the session window to be created
t2(13 seconds) : msg2
t3(39 seconds) : msg3
...
t7(190 seconds) : msg7 <-- The event time (t7) is more than 180 seconds after t1 (t7 - t1 = 190), so the window should be triggered and processed now.
t8(193 seconds) : msg8 <-- This message, and all subsequent messages, have to be ignored, as this window was processed at t7.
I want to create a trigger such that the above behavior is achieved through an appropriate watermark or an onEventTime trigger. Can anyone please provide some examples to achieve this?
The best way to approach this might be with a ProcessFunction, rather than with custom windowing. If, as shown in your example, the events will be processed in timestamp order, then this will be pretty straightforward. If, on the other hand, you have to handle out-of-order events (which is common when working with event-time data), it will be somewhat more complex. (Imagine that msg6, with timestamp 187, arrives after t8. If that's possible, and if that will affect the results you want to produce, then this has to be handled.)
If the events are in order, then the logic would look roughly like this:
Use an AscendingTimestampExtractor as the basis for watermarking.
Use Flink state (perhaps ListState) to store the window contents. When an event arrives, add it to the window and check to see if it has been more than 180 seconds since the first event. If so, process the window contents and clear the list.
If your events can be out-of-order, then use a BoundedOutOfOrdernessTimestampExtractor, and don't process the window's contents until currentWatermark indicates that event time has passed 180 seconds past the window's start time (you can use an event time timer for this). Don't completely clear the list when triggering a window, but just remove the elements that belong to the window that is closing.
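A minimal sketch of the in-order variant described above, assuming keyed input records of (key, eventTimeMillis) as Tuple2 and a simple count as the "processing" step; all names are illustrative:

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Buffers events per key; once an event arrives more than 180s (event time)
    // after the first buffered event, the window is evaluated and cleared.
    class MaxGapSessionFn
            extends KeyedProcessFunction<String, Tuple2<String, Long>, String> {

        private transient ValueState<Long> windowStart;
        private transient ListState<Tuple2<String, Long>> buffer;

        @Override
        public void open(Configuration conf) {
            windowStart = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("start", Long.class));
            buffer = getRuntimeContext().getListState(
                    new ListStateDescriptor<>("buffer",
                            Types.TUPLE(Types.STRING, Types.LONG)));
        }

        @Override
        public void processElement(Tuple2<String, Long> event, Context ctx,
                                   Collector<String> out) throws Exception {
            Long start = windowStart.value();
            if (start == null) {
                windowStart.update(event.f1);   // first event defines the start
                buffer.add(event);
            } else if (event.f1 - start > 180_000L) {
                int count = 0;                  // "process the window contents"
                for (Tuple2<String, Long> ignored : buffer.get()) {
                    count++;
                }
                out.collect(event.f0 + ": session with " + count + " events");
                buffer.clear();                 // clear the list; a later event
                windowStart.clear();            // would start a fresh window
            } else {
                buffer.add(event);
            }
        }
    }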

Kafka Streams session windows

Hello, I am working with Kafka session windows with an inactivity time of 5 mins. I want some kind of feedback when the inactivity time is reached and the session is dropped for the key.
Let's assume I have
(A,1)
a record where 'A' is the key. Now if I don't get any record with key 'A' within 5 mins, the session is dropped.
I want to do some operation at the end of the session, let's say (value)*2 for that session. Is there any way I can achieve this using the Kafka Streams API?
Kafka Streams does not drop a session after the gap time has passed. Instead, it will create a new session if another record with the same key arrives after the gap time has passed, and it maintains both sessions in parallel. This allows it to handle out-of-order data. It can even happen that two sessions get merged, if an out-of-order record falls into the gap and "connects" both sessions with each other.
Sessions are maintained for 1 day by default. You can change this via the SessionWindows#until() method. If a session expires, it will be dropped silently; there is no notification. You also need to consider the config parameter window.store.change.log.additional.retention.ms:
The default retention setting is Windows#maintainMs() + 1 day. You can override this setting by specifying StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG in the StreamsConfig.
Thus, if you want to react when time has passed, you should look into punctuations, which allow you to register regular callbacks (a kind of timer) based either on "event-time progress" or on wall-clock time. This allows you to react if a session has not been updated for a certain period of time and you consider it "completed".
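A sketch of the wall-clock variant, using punctuations in the Processor API. The store name "last-seen-store" is hypothetical and would have to be registered with the topology; here only the last-seen timestamp per key is tracked, but you could additionally store the session's value to compute something like (value)*2 when the session is considered complete:

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    class SessionEndTransformer
            implements Transformer<String, Integer, KeyValue<String, Integer>> {

        private ProcessorContext context;
        private KeyValueStore<String, Long> lastSeen;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            this.context = context;
            this.lastSeen = (KeyValueStore<String, Long>)
                    context.getStateStore("last-seen-store");
            // Check every 30s (wall-clock) for keys that have been silent for
            // more than 5 minutes and treat their session as completed.
            context.schedule(Duration.ofSeconds(30),
                    PunctuationType.WALL_CLOCK_TIME, now -> {
                try (KeyValueIterator<String, Long> it = lastSeen.all()) {
                    while (it.hasNext()) {
                        KeyValue<String, Long> entry = it.next();
                        if (now - entry.value > Duration.ofMinutes(5).toMillis()) {
                            // "Session completed" signal; forward whatever
                            // result you computed for the session instead.
                            context.forward(entry.key, null);
                            lastSeen.delete(entry.key);
                        }
                    }
                }
            });
        }

        @Override
        public KeyValue<String, Integer> transform(String key, Integer value) {
            lastSeen.put(key, context.timestamp());  // record activity per key
            return KeyValue.pair(key, value);        // pass the record through
        }

        @Override
        public void close() {}
    }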

Flink event-time session windows with max total time

I was wondering if it's possible to create a WindowAssigner that is similar to:
EventTimeSessionWindows.withGap(Time.seconds(1L))
Except I don't want the window to keep growing in event time with each element. I want the beginning of the window to be defined by the first element received (for that key), and the window to end exactly 1 second later, no matter how many elements arrive in that second.
So it would probably look like this hypothetically:
EventTimeSessionWindows.withMax(Time.seconds(1L))
Thanks!
There is no built-in window for this use case.
However, you can implement this with a GlobalWindow, which collects all incoming elements, and a Trigger that registers a timer when an element is received while the window is empty, i.e., for the first element, or for the first element after the window was purged. The window collects new elements until the timer fires. At that point, the window is evaluated and purged.
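A sketch of such a trigger over a GlobalWindow, with the 1-second maximum from the example (the class name is hypothetical):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

    // Fires and purges exactly 1 second (event time) after the first element
    // of a window; the next element then starts a new 1-second window.
    class MaxDurationTrigger extends Trigger<Object, GlobalWindow> {

        private static final long MAX_DURATION_MS = 1000L;

        private final ValueStateDescriptor<Long> deadlineDesc =
                new ValueStateDescriptor<>("deadline", Long.class);

        @Override
        public TriggerResult onElement(Object element, long timestamp,
                GlobalWindow window, TriggerContext ctx) throws Exception {
            ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
            if (deadline.value() == null) {  // first element, or first after purge
                long fireAt = timestamp + MAX_DURATION_MS;
                deadline.update(fireAt);
                ctx.registerEventTimeTimer(fireAt);
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, GlobalWindow window,
                TriggerContext ctx) throws Exception {
            ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
            if (deadline.value() != null && time == deadline.value()) {
                deadline.clear();            // allow a new window to start
                return TriggerResult.FIRE_AND_PURGE;
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, GlobalWindow window,
                TriggerContext ctx) {
            return TriggerResult.CONTINUE;   // event time only
        }

        @Override
        public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
            ValueState<Long> deadline = ctx.getPartitionedState(deadlineDesc);
            if (deadline.value() != null) {
                ctx.deleteEventTimeTimer(deadline.value());
                deadline.clear();
            }
        }
    }

You would attach it with stream.keyBy(...).window(GlobalWindows.create()).trigger(new MaxDurationTrigger()), followed by your window function.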