How can I define an overlapping tumbling window as an EPL query in Esper? I'm looking for the equivalent of hopping windows such as these:
https://learn.microsoft.com/en-us/stream-analytics-query/hopping-window-azure-stream-analytics
For example: a 1-second hopping window with a 500 ms overlap.
Esper's reference manual describes tumbling windows and overlapping context, but how can I express it as a query?
Thank you.
The solution is the overlapping context. Things that fire regularly are done with a pattern.
create context Hopping
initiated by pattern[every timer:interval(500 milliseconds)]
terminated after 1 second;
context Hopping select sum(price) from StockTick output when terminated;
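To make the overlap concrete, here is a minimal plain-Java sketch (not Esper code) of which 1-second windows a given timestamp falls into when a new window opens every 500 ms. It assumes windows open at 0, 500, 1000, ... ms and are half-open intervals [start, start + size); in Esper the context partitions start relative to when the context is created, so the actual boundaries will differ. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HoppingWindowSketch {

    /**
     * Returns the start times (in ms) of every window [start, start + sizeMs)
     * that contains timestampMs, assuming windows open at 0, hopMs, 2 * hopMs, ...
     */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long hopMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest window start that can still contain the timestamp,
        // rounded down to a multiple of the hop.
        long first = Math.max(0, timestampMs - sizeMs + hopMs);
        first -= first % hopMs;
        for (long start = first; start <= timestampMs; start += hopMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at 1250 ms belongs to the windows starting at 500 and 1000.
        System.out.println(windowStartsFor(1250, 1000, 500)); // prints [500, 1000]
    }
}
```

With a 1-second size and a 500 ms hop, every event lands in two windows (except events in the very first 500 ms), which is why the context above yields two overlapping sums per event.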
I'm exploring the Azure Data Factory schedule and tumbling window triggers.
In both cases, if I choose a 1-hour recurrence interval, the pipeline runs every hour.
When should we use a schedule trigger versus a tumbling window trigger?
Tumbling window triggers have a self-dependency property which is not available with Schedule triggers. If consecutive pipeline runs depend on each other, the self-dependency property can be used. Other significant differences between these triggers, including the self-dependency property, are described in the following Microsoft Q&A link.
https://learn.microsoft.com/en-us/answers/questions/207405/when-to-use-tumbling-window-type-trigger.html
Similar thread:
https://stackoverflow.com/questions/72194846/what-is-the-behavior-of-adf-schedule-triggers-when-a-schedule-starts-before-the#:~:text=The%20trigger%20used%20has%20a,not%20available%20with%20Schedule%20triggers.
I'm trying to get the average value over the last 30 seconds using hopping windows. Here is the windowing and suppression code:
.windowedBy(TimeWindows.of(Duration.ofSeconds(30)).advanceBy(Duration.ofSeconds(30)).grace(Duration.ZERO))
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
When I do that, I get 30-second windows, but I'm only interested in the last 30 seconds. How do I get only the latest window? I then want to find the top 5 average values in that window using a Java TreeSet.
If you only want the latest, you can put the windows in a KTable; if they have the same key, the table will hold only the latest window.
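The idea can be modelled in plain Java (this is not Kafka Streams code): a table keyed by record key where each update overwrites the previous one, followed by the top-N selection with a TreeSet that the question mentions. It assumes updates arrive in window order, so the last update per key is the latest window; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class LatestWindowSketch {

    /** Applies updates in order; a later update for the same key overwrites the earlier one. */
    static Map<String, Double> latestPerKey(List<Map.Entry<String, Double>> updates) {
        Map<String, Double> table = new HashMap<>();
        for (Map.Entry<String, Double> update : updates) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    /** Returns the keys with the n highest averages, highest first (ties broken by key). */
    static List<String> topKeys(Map<String, Double> table, int n) {
        Comparator<Map.Entry<String, Double>> byAvgDesc = (a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        TreeSet<Map.Entry<String, Double>> sorted = new TreeSet<>(byAvgDesc);
        sorted.addAll(table.entrySet());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : sorted) {
            if (result.size() == n) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

In the real topology the overwrite happens in the KTable itself; the sketch only shows why a shared key leaves a single, most recent value to rank.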
I'm trying to process some events in a sliding window with Kafka Streams, but I think I don't understand some of its details, so I'm not able to do what I want.
What I have :
input topic of events with key/value like (Int, Person)
What I want :
read these events within a sliding window of 10 minutes
process each element in the sliding window
filter and count some elements, and fire events to another Kafka topic (e.g. if a wrong value is detected)
To be simple: get all the events in a sliding window of 10 minutes, do a foreach on them, compute some stats/events in the context of the window, go to the next window...
What I tried :
I tried to mix the Stream and the processor API like :
val streamBuilder = new StreamsBuilder()
streamBuilder.stream[Int, Person](topic)
  .groupBy((_, value) => PersonWrapper(value.id, value.name))
  .windowedBy(TimeWindows.of(10 * 60 * 1000L).advanceBy(1 * 60 * 1000L))
// now I have a window of (PersonWrapper, Person) right ?
streamBuilder.build().addProcessor(....)
Now I'd like to add a processor to this topology to process each event in the sliding window.
I don't understand what TimeWindowedKStream is, or why we need a KGroupedStream to apply a window to events. Could someone enlighten me about Kafka Streams and how to do what I'm trying to do?
Did you read the documentation? https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#windowing
Windowing is a special form of grouping (grouping based on time)
Grouping is always required to compute an aggregation in Kafka Streams
After you have a grouped and windowed stream, you call aggregate() for the actual processing (no need to attach a Processor manually; the call to aggregate() will implicitly add a Processor for you).
Btw: Kafka Streams does not really support "sliding windows" for aggregation. The window you define is called a hopping window.
KGroupedStream and TimeWindowedKStream are basically just helper classes and an intermediate representation that allows for a fluent API design.
The tutorial is also a good way to get started: https://docs.confluent.io/current/streams/quickstart.html
You should also check out the examples: https://github.com/confluentinc/kafka-streams-examples
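To illustrate "windowing is a special form of grouping", here is a small plain-Java model (not the Kafka Streams API) of a windowed count: each record is grouped by the pair (key, window start), and with a hopping window a record contributes to every window that overlaps its timestamp. The string key format "key@windowStart" and the class name are hypothetical, and windows are assumed to start at multiples of the advance interval.

```java
import java.util.Map;
import java.util.TreeMap;

final class WindowedCountSketch {

    /**
     * Counts records per (key, window start). keys[i] and timestampsMs[i]
     * describe record i; windows are [start, start + sizeMs) and open every advanceMs.
     */
    static Map<String, Long> countPerKeyAndWindow(
            String[] keys, long[] timestampsMs, long sizeMs, long advanceMs) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            long ts = timestampsMs[i];
            // First window start that still contains ts, aligned to advanceMs.
            long first = Math.max(0, ts - sizeMs + advanceMs);
            first -= first % advanceMs;
            for (long start = first; start <= ts; start += advanceMs) {
                // The grouping key combines the record key with the window start.
                counts.merge(keys[i] + "@" + start, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

The windowed aggregate() in Kafka Streams plays the role of the merge call here: it keeps one aggregate per key and window.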
Using Fixed Windows in Apache Beam. The watermark is set by the event time.
Some data may arrive out of order and cause the window to close.
How can a trigger be defined in Java to occur say 2 minutes after the last data was seen?
It's not entirely clear what behavior you expect. One question is what you expect to happen if data arrives within the two minutes: do you want to restart the two-minute interval or not, and re-emit the data or not?
Looks like the trigger you are trying to describe is something along these lines:
wait until the watermark passed the end of window, in event time;
wait for additional 2 minutes in processing time;
emit the data;
If in step 2 it was event time, i.e. you wanted to re-emit the window if a late element arrives that fits within window + 2min, then you could use withAllowedLateness(). Though it sounds different from what you want, because it can keep re-emitting the window contents every time a matching late element arrives.
With processing time in step 2, this is not possible in general with the basic triggers that are available in Beam. You can probably achieve the behavior you want if you manually manage state and timers in your own ParDo, e.g. you can watch for the incoming elements, keep track of them in state, and then emit what you want when the timer fires. This can become very complicated and might still not be flexible enough for your specific use case.
One of the major problems is that there is no good way to define processing time triggers in Beam in general. It would be complicated to define a general mechanism of working with timers in this manner. For example, when you want to express "wait for 2 minutes", the framework needs to understand in relation to what these two minutes are, when to start the timer, so you need a mechanism to express that as well. And with composition, continuation and other complications this doesn't seem easy to reason about. So it's not in the framework in this general form.
In order to implement only the "wait for 2 minutes after the last element was seen in the window", the framework has to watch for it and set the timer. Technically it is possible to do something like this, but it doesn't seem like anyone has done it yet.
There seems to be only one meaningful processing time trigger available in Beam but it's not generic enough and doesn't do what you want. You can look at composite triggers like AfterFirst or AfterAll but they likely won't help you without a better general processing time trigger.
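The manual state-and-timer idea can be sketched in plain Java. This is only a model of the logic, not Beam's state or timer API, and it assumes one particular answer to the question raised earlier: every new element restarts the two-minute interval. The class and method names are hypothetical, and times are processing-time milliseconds supplied by the caller.

```java
final class InactivityTimerSketch {

    private final long delayMs;
    private long lastSeenMs;   // state: processing time of the most recent element
    private int buffered;      // state: number of elements waiting to be emitted

    InactivityTimerSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    /** Called for each incoming element; buffers it and restarts the interval. */
    void onElement(long nowMs) {
        lastSeenMs = nowMs;
        buffered++;
    }

    /**
     * Called when the clock advances. Returns how many buffered elements are
     * emitted, which is 0 unless delayMs has passed since the last element.
     */
    int onTick(long nowMs) {
        if (buffered > 0 && nowMs - lastSeenMs >= delayMs) {
            int emitted = buffered;
            buffered = 0;
            return emitted;
        }
        return 0;
    }
}
```

For example, with a delay of 120000 ms and elements at 0 and 60000 ms, a tick at 120000 ms emits nothing, while a tick at 180000 ms emits both elements.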
I decided against using Beam and implemented the solution in Kafka Streams.
I basically grouped by key, then used fixed windows and then aggregated the result.
The "grace" on the window allows data to arrive late.
KGroupedStream<Long, OxyStreamItem> grouped = input.groupByKey();
TimeWindowedKStream<Long, OxyStreamItem> windowed =
    grouped.windowedBy(
        TimeWindows.of(WIN_SIZE)
            .advanceBy(WIN_SIZE)
            .grace(Duration.ofSeconds(5L)));
return windowed
    .aggregate(
        makeInitializer(),
        makeAggregator(),
        Materialized
            .<Long, Aggregate, WindowStore<Bytes, byte[]>>as("tmp")
            .withValueSerde(new AggregateSerde()))
    .suppress(
        Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map(calculateAvg());
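As a rough illustration of what the grace period does, the following plain-Java sketch (not Kafka Streams code) models whether a late record is still accepted into its window. It is a simplified model under the assumption that a window [start, start + size) stays open until stream time reaches its end plus the grace period; the exact boundary handling in Kafka Streams may differ. The class and method names are hypothetical.

```java
final class GraceSketch {

    /**
     * Simplified model: a record belonging to the window [windowStartMs, windowStartMs + sizeMs)
     * is accepted as long as the observed stream time has not reached window end + grace.
     */
    static boolean accepted(long windowStartMs, long sizeMs, long graceMs, long streamTimeMs) {
        long windowEndMs = windowStartMs + sizeMs;
        return streamTimeMs < windowEndMs + graceMs;
    }
}
```

With a 60-second window starting at 0 and a 5-second grace, this model accepts a late record while stream time is below 65000 ms and drops it afterwards; suppress(untilWindowCloses(...)) emits the final result only once the window has closed.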
I am working on Kafka Streams windowing, particularly tumbling windows, for my use case.
TimeWindowedKStream<String, Blob> windowedStreams = groupedStreams
.windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(5)));
This is a tumbling window of 5 minutes per record key that advances by 5 minutes. For my use case, I don't want any old messages to be dropped, so I want it to use processing time as the time semantics.
What is the default behaviour of a tumbling window regarding time semantics, and how do I specify which time semantics to use (event time, processing time, or ingestion time)?
The time semantics are not specified on the window definition, but depend on the configured TimestampExtractor. If you want to switch to processing-time semantics, you can set default.timestamp.extractor to WallclockTimestampExtractor.class in the Kafka Streams config.
Compare
https://docs.confluent.io/current/streams/concepts.html#time
https://docs.confluent.io/current/streams/developer-guide/config-streams.html#streams-developer-guide-timestamp-extractor
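A minimal sketch of that setting, using only java.util.Properties so that it runs without the Kafka dependency. The extractor is given by its fully qualified class name; other settings a real application needs (such as application.id and bootstrap.servers) are deliberately omitted, and the class and method names of the sketch itself are hypothetical.

```java
import java.util.Properties;

final class WallclockConfigSketch {

    /** Builds a config fragment that switches Kafka Streams to processing-time semantics. */
    static Properties processingTimeConfig() {
        Properties props = new Properties();
        // Use wall-clock time at processing instead of the timestamp embedded in the record.
        props.put("default.timestamp.extractor",
                  "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(processingTimeConfig().getProperty("default.timestamp.extractor"));
    }
}
```

These properties would then be merged with the rest of the application's configuration before being passed to the KafkaStreams constructor.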