How to find last hopping window using Apache Kafka Streams - apache-kafka

I'm trying to get the average value over the last 30 seconds using hopping windows. Here is the windowing and suppression code:
.windowedBy(TimeWindows.of(Duration.ofSeconds(30)).advanceBy(Duration.ofSeconds(30)).grace(Duration.ZERO))
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
When I do that, I get a result for every 30-second window, but I'm only interested in the most recent 30 seconds. How do I catch the last hopping window? After that, I'm going to look for the top 5 average values in that window using a Java TreeSet.

If you only want the latest window, you can put the windowed results into a KTable. As long as they share the same key, each new window overwrites the previous one, so the table only ever holds the latest window.
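For the top-5 step mentioned in the question, here is a minimal sketch in plain Java, assuming the averages of the latest window have already been collected into a map. The class name `TopAverages`, the method `topN`, and the sample keys are hypothetical and not part of any Kafka API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: keeps the n largest averages of one window in a TreeSet.
class TopAverages {

    // Returns the keys of the n largest averages, highest first.
    static List<String> topN(Map<String, Double> averagesByKey, int n) {
        // Order by average descending; break ties by key so that two keys with
        // the same average are not collapsed into one element by the TreeSet.
        Comparator<Map.Entry<String, Double>> byAverageDesc = (a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };

        TreeSet<Map.Entry<String, Double>> top = new TreeSet<>(byAverageDesc);
        for (Map.Entry<String, Double> e : averagesByKey.entrySet()) {
            top.add(new SimpleEntry<>(e.getKey(), e.getValue()));
            if (top.size() > n) {
                top.pollLast(); // drop the smallest average currently held
            }
        }

        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Double> e : top) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up averages for one window.
        Map<String, Double> averages = new LinkedHashMap<>();
        averages.put("a", 1.0);
        averages.put("b", 5.0);
        averages.put("c", 3.0);
        averages.put("d", 4.0);
        averages.put("e", 2.0);
        averages.put("f", 6.0);
        System.out.println(topN(averages, 5));
    }
}
```

The tie-break on the key matters: a TreeSet treats elements that compare as equal as duplicates, so comparing on the average alone would silently drop keys with identical averages.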

Related

Is it possible to close a kafka streams window before its retention period?

First of all, I must say that I'm new to Kafka and streaming, but I'll try to explain the problem the best I can.
My team is currently developing an application to process data using Kafka Streams. We're using windowing to perform aggregation operations and we need to emit the results only when the window is closed.
Now here's the problem itself: our window retention period is three days (yes, that is correct). We must keep the window open for this long in case any record arrives late, but in the normal course of operations all the records arrive on time and we're able to identify when a batch is complete, so most of the time the windows don't need to stay open for so long.
So my question is: knowing that the last record of a batch has already arrived before the 3-day window retention period, is it possible to close the window and emit the aggregation results?

How to select the type of time semantics when working with tumbling windows?

I am working on Kafka Streams windowing, particularly tumbling windows, for my use case.
TimeWindowedKStream<String, Blob> windowedStreams = groupedStreams
.windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)));
This is a 5-minute tumbling window per record key, which advances by 5 minutes. For my use case, I don't want any old message to be dropped, and hence I want it to use processing time as the time semantic.
What is the default time semantic for tumbling windows, and how do I specify which one to use (event time, processing time, or ingestion time)?
The time semantics are not specified on the window definition, but depend on the configured TimestampExtractor. If you want to switch to processing-time semantics, you can set default.timestamp.extractor to WallclockTimestampExtractor.class in the Kafka Streams config.
Compare
https://docs.confluent.io/current/streams/concepts.html#time
https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-timestamp-extractor
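A minimal sketch of that setting, using only `java.util.Properties` so it compiles without Kafka on the classpath. The class name `ProcessingTimeConfig` and the `application.id` and `bootstrap.servers` values are placeholders. With kafka-streams available you would normally use the `StreamsConfig` constants and pass `WallclockTimestampExtractor.class` instead of the string literals.

```java
import java.util.Properties;

// Hypothetical holder for the Streams configuration of this example.
class ProcessingTimeConfig {

    static Properties build() {
        Properties props = new Properties();
        // Placeholder values; replace with your own application id and brokers.
        props.put("application.id", "windowing-example");
        props.put("bootstrap.servers", "localhost:9092");
        // Use wall-clock time instead of the timestamp embedded in each record,
        // which gives processing-time semantics for all windowed operations.
        props.put("default.timestamp.extractor",
                "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

The resulting `Properties` object is what gets passed to the `KafkaStreams` constructor together with the topology.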

How does suppress with emitEarlyWhenFull() work for tumbling windows?

I am using suppress on tumbling windows to get aggregated results. I am exploring both untilTimeLimit and untilWindowCloses for suppress.
I don't want my streams application to shut down when the buffer fills up. I have seen the emitEarlyWhenFull() option, but it cannot be applied on top of untilWindowCloses.
Hence, I am picking untilTimeLimit with emitEarlyWhenFull(); please see the code below:
groupedStreams.windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
.aggregate(() -> initialBlob, blobAggregator,someserde)
.suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(5), new StrictBufferConfigImpl().emitEarlyWhenFull()))
.toStream()
In my case, I am using 5-minute tumbling windows, so every 5 minutes a new window is opened for each record key. According to the documentation, the oldest records are emitted when the buffer fills up.
What happens to new records with the same key that arrive in the same tumbling window after the old result has been sent downstream?
For example, with this message flow:
(A,1)
(A,2)
(A,3) -> aggregation result: (A,6). Suppose the buffer is full at this point, so (A,6) is sent downstream. If (A,4) then arrives in the same tumbling window, what comes next: (A,10), or does it start fresh with (A,4)?
If suppress() emits, the state will be preserved. Thus, for your example, the aggregation will continue and eventually (A,10) will be emitted.
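The behaviour described above can be illustrated with a toy model in plain Java. This is not how Kafka Streams implements suppress(), only a sketch of the idea that the aggregation state and the suppression buffer are separate: evicting a result from a full buffer does not reset the running aggregate. The class `SuppressModel`, its methods, the one-record capacity and the extra key B are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: an aggregate store feeding a bounded buffer that emits early when full.
class SuppressModel {
    // Running sum per key; stands in for the upstream aggregation state.
    private final Map<String, Integer> aggregateStore = new HashMap<>();
    // Pending results, oldest first; stands in for the suppression buffer.
    private final LinkedHashMap<String, Integer> buffer = new LinkedHashMap<>();
    private final int maxRecords;
    final List<String> emitted = new ArrayList<>();

    SuppressModel(int maxRecords) {
        this.maxRecords = maxRecords;
    }

    void process(String key, int value) {
        // The aggregate keeps growing regardless of what the buffer has emitted.
        int sum = aggregateStore.merge(key, value, Integer::sum);
        buffer.put(key, sum);
        while (buffer.size() > maxRecords) {
            emitOldest(); // "emit early when full"
        }
    }

    // Stands in for the time limit expiring: everything still buffered is emitted.
    void flush() {
        while (!buffer.isEmpty()) {
            emitOldest();
        }
    }

    private void emitOldest() {
        Iterator<Map.Entry<String, Integer>> it = buffer.entrySet().iterator();
        Map.Entry<String, Integer> oldest = it.next();
        emitted.add(oldest.getKey() + "=" + oldest.getValue());
        it.remove();
    }

    public static void main(String[] args) {
        SuppressModel model = new SuppressModel(1);
        model.process("A", 1);
        model.process("A", 2);
        model.process("A", 3);
        model.process("B", 1); // buffer overflows, the oldest entry (A=6) is emitted
        model.process("A", 4); // aggregate for A continues from 6 to 10
        model.flush();
        System.out.println(model.emitted);
    }
}
```

In this model the early emission of A=6 is followed later by A=10, not A=4, because only the buffer entry was removed while the sum in the aggregate store was kept.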

Flink windows: when do they start?

I want to capture events from an Apache Flink DataStream, every "natural" hour. That is, I want to capture events in a window from 12:00:00 till 12:59:59, 13:00:00 till 13:59:59...
I have been using:
datastream.keyBy(0)
.timeWindow(Time.minutes(60))
But how do I know those 60 minutes start at every o'clock, and that the window is not, for instance, from 12:30:00 till 13:29:59?
Your answer is here. To summarize:
For tumbling and sliding windows, windows are aligned with the epoch (00:00:00 UTC, 1 January 1970). Therefore, if you don't change the offset parameter, a 60-minute tumbling window will start exactly on the hour.
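The alignment rule can be checked with a little arithmetic. The sketch below is plain Java, not Flink code; the class `WindowAlignment` and the method `windowStart` are hypothetical names, and the timestamps are made-up examples.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: computes the start of the epoch-aligned window
// that contains a given timestamp.
class WindowAlignment {

    // All arguments are in milliseconds. With offset 0 the windows start at
    // multiples of sizeMillis counted from the epoch.
    static long windowStart(long timestampMillis, long offsetMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis - offsetMillis, sizeMillis);
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        long ts = Instant.parse("2020-01-01T12:34:56Z").toEpochMilli();

        // Offset 0: the window containing 12:34:56 starts on the hour.
        System.out.println(Instant.ofEpochMilli(windowStart(ts, 0, hour)));

        // A 30-minute offset shifts every window start to half past the hour.
        long halfHour = Duration.ofMinutes(30).toMillis();
        System.out.println(Instant.ofEpochMilli(windowStart(ts, halfHour, hour)));
    }
}
```

Because the alignment is relative to the epoch in UTC, windows match local "o'clock" times only in time zones whose UTC offset is a whole number of hours; elsewhere the offset parameter has to compensate.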

How to write only final output of KStreams windowed operation?

Say I need to do word-count-like processing, but for every 5 minutes. So I am using tumbling windows, but in the output I also see the intermediate changelog counts. I want to see only the final counts for each window in the output.
Is there a way to achieve this?