flink Windows, when do they start - streaming

I want to capture events from an Apache Flink DataStream every "natural" hour. That is, I want to capture events in windows from 12:00:00 to 12:59:59, 13:00:00 to 13:59:59, and so on.
I have been using:
datastream.keyBy(0)
.timeWindow(Time.minutes(60))
But how do I know that those 60-minute windows start on the hour, and that a window does not run, for instance, from 12:30:00 to 13:29:59?

Your answer is here. To summarize:
For tumbling and sliding windows, window boundaries are aligned with the Unix epoch (00:00:00 UTC, 1 January 1970). Therefore, if you don't change the offset parameter, your 60-minute tumbling windows will start exactly on the hour.
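The alignment rule can be checked with a little arithmetic. The following is a minimal sketch in plain Java with no Flink dependency; the class and method names (WindowAlignment, windowStart) are made up for illustration, and Flink's own implementation may differ in detail. It rounds a timestamp down to the nearest multiple of the window size, shifted by an optional offset.

```java
import java.time.Instant;

class WindowAlignment {
    // Start of the epoch-aligned tumbling window that contains timestampMs.
    // offsetMs shifts the window boundaries; 0 means "aligned with the epoch".
    static long windowStart(long timestampMs, long offsetMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs - offsetMs, windowSizeMs);
    }

    public static void main(String[] args) {
        long hour = 60 * 60 * 1000L;
        long halfHour = 30 * 60 * 1000L;
        long event = Instant.parse("2020-09-20T12:34:56Z").toEpochMilli();

        // No offset: the window starts on the hour.
        System.out.println(Instant.ofEpochMilli(windowStart(event, 0, hour)));
        // 30-minute offset: the window starts at half past the hour.
        System.out.println(Instant.ofEpochMilli(windowStart(event, halfHour, hour)));
    }
}
```

With no offset, the 12:34:56 event lands in the window starting at 12:00:00; only a non-zero offset would produce a window such as 12:30:00 to 13:29:59.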

Related

How do I consume Kafka-Messages older than x minutes, but all messages on restart?

I need a grace period before consuming a Kafka message.
My approach is to use a hopping window.
For example, if I want to consume a message 5 minutes after it arrives, the hopping window would be 6 minutes long and advance by 1 minute.
Then I'll use a filter to keep only data older than 5 minutes (there's also a timestamp in the message itself). Hence I will process data from minute 0 to minute 1. Then the hopping window advances by 1 minute and I process data from minute 1 to minute 2, and so on.
However, I need to consume all messages when starting the application, not just the last 6 minutes.
I'm also open to other suggestions regarding the 5-minute grace period.
I made wrong assumptions here. All the data in the topic will be consumed, no matter how old it is.
For example, say it's 12:10 now and we start the Kafka Streams application.
The data we want to consume was pushed to the topic at 12:00, and we have a window of 6 minutes.
I was expecting everything from 12:04 to 12:10 (6 minutes) to be consumed and everything earlier to be lost.
But the 12:00 data is consumed anyway; it just falls into an older window.
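To see why the 12:00 record is not lost, here is a minimal sketch in plain Java (no Kafka dependency) that lists the hopping windows a record belongs to. It assumes windows are assigned by the record's own timestamp and aligned to the epoch; the class and method names are hypothetical, not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;

class HoppingWindows {
    // Start times (ms) of every hopping window of length sizeMs, advancing by
    // advanceMs, whose range [start, start + sizeMs) contains timestampMs.
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest epoch-aligned window that can still contain the timestamp.
        long first = Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (long start = first; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long noon = 12 * 60 * minute; // 12:00 on the first epoch day, for readability
        for (long start : windowStartsFor(noon, 6 * minute, minute)) {
            System.out.println("window starting at minute " + start / minute);
        }
    }
}
```

The window assignment depends only on the record timestamp, not on the wall-clock time at which the application starts, so a record stamped 12:00 is placed in the six windows starting between 11:55 and 12:00 even when it is read at 12:10.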

How to get Azure Data Factory Tumbling Window Trigger to work with Daylight Savings Time

I have some tumbling window triggers that are set to run at specific intervals 6 hours apart. They need to run at designated times (think 5 am, 11 am, and so on). I have them set up so that they are self-dependent and also dependent on a connection check trigger.
The problem arises when daylight saving time comes around. Tumbling window triggers only work in UTC, so when the clock changes in our time zone, the local times at which they fire shift by an hour (forward or back, depending on the time of year). This causes data to arrive late at its destination, and I am forced to manually deploy new triggers around each daylight saving time change.
I am wondering whether there is a better way to handle daylight saving time, since tumbling window triggers do not support any time zone other than UTC and deploying new triggers every time is not an effective solution.
I'm not sure I understand your requirement. Running every 6 hours is actually different from running at two designated times that happen to be 6 hours apart.
If you want to run the job every 6 hours, the time zone should have no impact, since the trigger always fires every 6 hours. This is the correct use case for a tumbling window trigger.
If you want to run at designated times, you should use a schedule trigger, which supports time zones. To handle daylight saving time, simply select the time zone you want; ADF will adjust automatically for daylight saving time, as indicated in the UI.
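The one-hour drift described in the question can be reproduced with java.time. This is only an illustrative sketch: the zone America/New_York, the dates, and the names DstShift and utcHourFor are assumptions chosen for the example, not anything taken from ADF.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class DstShift {
    // Hour of day in UTC at which the given local wall-clock time occurs.
    static int utcHourFor(LocalDate date, LocalTime localTime, ZoneId zone) {
        return ZonedDateTime.of(date, localTime, zone)
                .withZoneSameInstant(ZoneOffset.UTC)
                .getHour();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/New_York"); // assumed example zone
        LocalTime fiveAm = LocalTime.of(5, 0);

        // The same local 5 am corresponds to different UTC hours across the year.
        System.out.println(utcHourFor(LocalDate.of(2021, 1, 15), fiveAm, zone));
        System.out.println(utcHourFor(LocalDate.of(2021, 7, 15), fiveAm, zone));
    }
}
```

A trigger pinned to a fixed UTC hour therefore cannot stay at 5 am local time all year, which is why a time-zone-aware schedule trigger fits the "designated time" requirement better.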

Can interarrival time be used with anylogic schedule block?

I'm trying to model a production sequence in AnyLogic where orders should come in with an interarrival time of normal(8,105) seconds. These orders should come in every weekday between 11 am and 2 pm (a 3-hour window).
I tried to implement this with the Schedule block in AnyLogic, but it only allows me to define a rate per hour. Is there a way to do this with an interarrival time?
Also, agents that arrive at 1:59 pm should still be processed, even if that takes until after 2 pm. Is there a way to calculate the mean working time per day (the time from the generation of the first agent by the source block until the last generated agent enters the sink block)?
Thank you all in advance!
I would use the getHour() function and dispose of the agents if the hour is not between 11 and 14. Inside the source you don't need to do anything special. Even if an agent arrives at 1:59 pm, it will be processed.
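The hour check itself is a one-liner. The sketch below is plain Java rather than AnyLogic model code; the class and method names are hypothetical, and the hour is passed in as a parameter where a model would read it from its own clock. It assumes "between 11 and 14" means 11:00 inclusive to 14:00 exclusive.

```java
class ArrivalWindow {
    // True if an agent generated at this hour of day (0-23) should be kept.
    // Assumes the window is 11:00 inclusive to 14:00 exclusive.
    static boolean inArrivalWindow(int hourOfDay) {
        return hourOfDay >= 11 && hourOfDay < 14;
    }

    public static void main(String[] args) {
        System.out.println(inArrivalWindow(13)); // an arrival at 1:59 pm has hour 13
        System.out.println(inArrivalWindow(14)); // an arrival at 2:00 pm has hour 14
    }
}
```

Because the check is applied only at generation time, an agent created at 1:59 pm passes it and can finish processing after 2 pm.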

How to find last hopping window using Apache Kafka Streams

I'm trying to get the average value over the last 30 seconds using hopping windows. Here is the windowing and suppression code:
.windowedBy(TimeWindows.of(Duration.ofSeconds(30)).advanceBy(Duration.ofSeconds(30)).grace(Duration.ZERO))
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
When I do that, I get a result for every 30-second window. But I'm interested in just the last 30 seconds. How do I catch only the latest window? I then want to find the top 5 average values in that window using a Java TreeSet.
If you only want the latest, you can put the windows in a KTable; if they have the same key, the table will hold only the latest window for that key.
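The "latest value per key" behavior can be pictured as an upsert into a map. The following is a minimal sketch in plain Java that imitates that behavior without using the Kafka Streams API; LatestWindowTable, update, and topKeys are hypothetical names, and a sorted stream is used instead of a TreeSet so that equal averages are not dropped.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LatestWindowTable {
    private final Map<String, Double> latestAverage = new HashMap<>();

    // Upsert: a newer window result for the same key replaces the older one.
    void update(String key, double windowAverage) {
        latestAverage.put(key, windowAverage);
    }

    // Keys with the n highest averages among the latest windows, best first.
    List<String> topKeys(int n) {
        return latestAverage.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LatestWindowTable table = new LatestWindowTable();
        table.update("sensor-a", 1.0);
        table.update("sensor-b", 5.0);
        table.update("sensor-a", 9.0); // replaces the earlier window for sensor-a
        System.out.println(table.topKeys(5));
    }
}
```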

ADF Tumbling Window Trigger: Dependency trigger not working as expected

I have created a tumbling window trigger for my Azure Data Factory pipeline Test_Daily with a recurrence of 24 hours. For this pipeline, I have added a dependency on another trigger, say Test_Hourly (which runs every hour), with an offset of 1.00:00:00 (1 day).
The Test_Daily pipeline is not triggered even though the dependency trigger has run successfully. For example, if the daily pipeline's windowStartTime is 2020-09-20 00:00:00 and Test_Hourly with windowStartTime 2020-09-21 00:00:00 has run successfully, then the daily pipeline should be triggered. However, this is not the case: Test_Daily is triggered only when Test_Hourly has completed for 2020-09-22 00:00:00 (i.e. with a 2-day offset).
Please let me know how to resolve this issue. Thanks.
I think this is a settings problem.
Your Test_Daily will start at 2020-09-20 00:00:00 and end at 2020-09-21 00:00:00.
Your Test_Hourly will wait for Test_Daily to complete and then delay 1.00:00:00 (1 day), completing at 2020-09-22 00:00:00.
Test_Hourly will show "Waiting on dependency".
Yes, it is behaving as designed with the applied settings.
A tumbling window trigger fires at the window end time (for the daily pipeline, 2020-09-21 00:00:00), and the offset (1 day) is added to that, so the pipeline actually runs at 2020-09-22 00:00:00.
I reproduced the same behavior with hourly and 5-minute triggers. In that case, the hourly trigger depends on the 5-minute trigger with an offset of 5 minutes, and the hourly trigger fires 5 minutes after its window end time.