I am running some test jobs on Azure Stream Analytics with this query:
SELECT System.Timestamp AS ts, Collect()
INTO output-queue
FROM input-hub TIMESTAMP BY tapp
GROUP BY HoppingWindow(second, 4, 2)
and it turns out that, in some cases, the timestamp of the window end is a multiple of the window slide parameter, but in other cases it is not.
For example, with slide = 2 you get these window-closing timestamps:
2016-08-04T10:36:40.0000000Z
2016-08-04T10:36:42.0000000Z
2016-08-04T10:36:44.0000000Z
2016-08-04T10:36:46.0000000Z
2016-08-04T10:36:48.0000000Z
Or, in the case of slide = 5:
2016-08-04T14:55:15.0000000Z
2016-08-04T14:55:20.0000000Z
2016-08-04T14:55:25.0000000Z
2016-08-04T14:55:30.0000000Z
This is true for several different slide values (e.g. 2, 3, 4, 6, ...), and it is always the case, no matter when the job was started.
There are other values, however (such as 7 and 11), that do not follow this rule.
Can somebody explain why this happens?
I am wondering how Azure Stream Analytics decides when to open the first window.
Thank you so much!
There are different kinds of windows (see here for more details).
First, window starts/ends do not depend on the job start time.
Tumbling and hopping windows are best thought of, logically, as partitioning the timeline itself. For example, applying a tumbling window of 1 minute makes results appear only at time values that are multiples of 1 minute, i.e. 2:00pm, 2:01pm, etc.
Note that not every 1-minute boundary must produce a window result; that depends on the computation.
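To make the alignment concrete, here is a small standalone Java sketch (my own illustration, not Stream Analytics internals; it simply assumes window ends sit at multiples of the hop size counted from the Unix epoch, which matches the timestamps above):

import java.time.Instant;

public class HopAlignment {

    // Next window end at or after t, assuming ends sit at multiples of hopSeconds
    // counted from the Unix epoch (an assumption for illustration only).
    static Instant nextWindowEnd(Instant t, long hopSeconds) {
        long s = t.getEpochSecond();
        long aligned = ((s + hopSeconds - 1) / hopSeconds) * hopSeconds;
        return Instant.ofEpochSecond(aligned);
    }

    public static void main(String[] args) {
        Instant event = Instant.parse("2016-08-04T10:36:41Z");
        // hop = 2 divides 60, so ends land on "round" clock seconds: ...:42, ...:44, ...
        System.out.println(nextWindowEnd(event, 2));
        // hop = 7 does not divide 60, so ends are still multiples of 7 seconds from the
        // epoch, but they no longer look like multiples of 7 within the current minute.
        System.out.println(nextWindowEnd(event, 7));
    }
}

Under that assumption, slide values that divide 60 (2, 3, 4, 5, 6, ...) produce window ends that look like multiples of the slide on the clock face, while values such as 7 or 11 do not, regardless of when the job was started.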
A sliding window can produce output at any point on the timeline and, unlike tumbling and hopping windows, it does depend on the timestamps of the input events. The best way to think of a sliding window is that it may end at any input event and starts a slide amount of time before that event. In other words, for each event the window includes all events that occurred at that event's timestamp or up to slide time before it.
Hope this helps.
I am using fixed windows in my pipeline with this configuration
Window.<KV<String, DeviceData>>into(FixedWindows.of(Duration.standardSeconds(options.getWindowSize())))
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(10))))
    .withAllowedLateness(Duration.standardHours(3))
    .accumulatingFiredPanes();
Is my assumption correct that there will be one firing when the window closes, and then an additional firing every 10 minutes after the window has closed whenever late data arrives, until 3 hours have passed since the window closed?
How can I handle importing data for periods that lie further in the past (for example, months)? In that case I would receive no triggers at all.
It seems that you are missing Repeatedly.forever, which allows the trigger to fire multiple times; without it the behaviour will not match your description.
For data further in the past, you need to configure .withAllowedLateness() based on the lateness that you want to handle.
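To illustrate the construct referred to here, a minimal sketch (using the type names from the question; this shows the general Repeatedly.forever pattern rather than the exact trigger tree you may end up wanting):

// Sketch only: a trigger wrapped in Repeatedly.forever so it can fire more than once
// per window, instead of firing a single time and then finishing.
Window.<KV<String, DeviceData>>into(
        FixedWindows.of(Duration.standardSeconds(options.getWindowSize())))
    .triggering(Repeatedly.forever(
        AfterProcessingTime.pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(10))))
    .withAllowedLateness(Duration.standardHours(3))
    .accumulatingFiredPanes();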
I have a model, shown below, that I am running for two situations.
In the first run (for situation 1), I write the traceln call as "traceln(productDemand)" in the "event-generateDemand" placed in "Main". At the end of the simulation, I get the values in the first column below.
In the second run (for situation 2), I instead write the traceln call as "traceln(main.productDemand)" in the "event" placed in the "Producer" agent. At the end of the second simulation, I get the values in the second column below.
Normally these two values should always be the same; at every simulation time they are expected to be equal, but they are not, as shown in Fig. 1. What is the problem? Why is the "productDemand" variable different when we access it from another agent at the same time?
I hope I was able to explain my problem.
Fig. 1 - The obtained results in table format
Fig. 2 - Screenshot of the Event placed in Main
Fig. 3 - Screenshot of the Event placed in the Producer agent
Fig. 4 - The obtained results for both traceln calls during the run
Fig. 5 - Screenshot of the simulation experiment
There is no bug in the model; it is just a simple case of timing. Not all events occur at exactly the same "time", even though they all occur at the same timestep. One will always execute before the other.
See the simple example below:
I have eventA that increases the variable value and then traces the value (similar to your event on main)
Then I have another event that traces the variable as well, similar to yours in the agent.
Yet when I run the model, at the same model time the variable appears to be different depending on where it is traced.
If you click on the Events tab in the console, you will see that event B is scheduled to run before event A.
Even though both run at the same timestep in the model, they don't run "at the same time".
If you want to be in total control of what happens on a specific time step, it is advised to have one event that runs at the time interval you want, e.g. daily, have it sit on Main, and then call all the functions in the order you want them executed.
If you don't do this then AnyLogic will schedule the events as they get created, which most of the time is the order in which you placed them on the canvas.
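For example, the action code of a single daily event on Main could look something like this (the function and variable names other than productDemand and traceln are placeholders, not names from your model):

// Action of one "daily" event on Main: everything that must happen on the same
// timestep is called from here, in the order we want it to happen.
generateDemand();                        // update productDemand first
traceln("Main sees: " + productDemand);  // then trace it
producer.reactToDemand(productDemand);   // then let the Producer agent use that same value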
I'm working with code that uses a tumbling window of one day, and would like to send early results to a different DataStream on an hourly basis.
I understand that triggers are a way to go here, but don't really see how it would work.
The current code is as follows:
myStream
    .keyBy(..)
    .window(TumblingEventTimeWindows.of(Time.days(1)))
    .aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
In my understanding, I should register a trigger, and then in its onEventTime method get hold of a TriggerContext, from which I can send data to the labeled output. But how do I get the current state of MyAggregateFunction from there? Or would I need to do my own computation inside onEventTime()?
Also, the documentation states that "By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner." Would my one-day window then still fire correctly, or do I need to trigger it somehow differently?
Another way of doing this is creating two different operators - one that windows by 1 hour, and another that windows by 1 day. Would triggers be a preferred approach to that?
Rather than using a custom Trigger, it would be simpler to have two layers of windowing, where the hourly results are further aggregated into daily results. Something like this:
hourlyResults = myStream
    .keyBy(...)
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .aggregate(new MyAggregateFunction(), new MyProcessWindowFunction());

dailyResults = hourlyResults
    .keyBy(...)
    .window(TumblingEventTimeWindows.of(Time.days(1)))
    .aggregate(new MyAggregateFunction(), new MyProcessWindowFunction());

hourlyResults.addSink(...);
dailyResults.addSink(...);
Note that the result of a window is not a KeyedStream, so you will need to use keyBy again, unless you can arrange to leverage reinterpretAsKeyedStream (docs).
Normally when I get to more complex behavior like this, I use a KeyedProcessFunction. You can aggregate (and save in state) hourly and daily results, set timers as needed, and use a side output for the hourly results versus the regular output for the daily results.
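If you go the KeyedProcessFunction route, a rough sketch of that shape could look like the following (MyEvent, MyAgg, HourlyResult and DailyResult are placeholder types, not names from your code, and the boundary arithmetic is deliberately simplified):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Sketch: keep a running aggregate in keyed state, emit hourly snapshots to a side
// output, and emit the daily result on the regular output.
public class DailyWithHourlySnapshots
        extends KeyedProcessFunction<String, MyEvent, DailyResult> {

    // Side output for the hourly (early) results.
    public static final OutputTag<HourlyResult> HOURLY =
            new OutputTag<HourlyResult>("hourly") {};

    private transient ValueState<MyAgg> agg;

    @Override
    public void open(Configuration parameters) {
        agg = getRuntimeContext().getState(
                new ValueStateDescriptor<>("agg", MyAgg.class));
    }

    @Override
    public void processElement(MyEvent event, Context ctx, Collector<DailyResult> out)
            throws Exception {
        MyAgg current = agg.value();
        if (current == null) {
            current = new MyAgg();
            long ts = ctx.timestamp();
            // Register event-time timers for the next hour and day boundaries.
            ctx.timerService().registerEventTimeTimer(endOfHour(ts));
            ctx.timerService().registerEventTimeTimer(endOfDay(ts));
        }
        current.add(event);
        agg.update(current);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<DailyResult> out)
            throws Exception {
        MyAgg current = agg.value();
        if (current == null) {
            return;
        }
        if (timestamp % 86_400_000L == 0) {
            out.collect(current.toDailyResult());          // daily result, regular output
            agg.clear();
        } else {
            ctx.output(HOURLY, current.toHourlyResult());  // hourly snapshot, side output
            ctx.timerService().registerEventTimeTimer(timestamp + 3_600_000L);
        }
    }

    private static long endOfHour(long ts) { return ts - (ts % 3_600_000L) + 3_600_000L; }
    private static long endOfDay(long ts)  { return ts - (ts % 86_400_000L) + 86_400_000L; }
}

Downstream, the hourly stream would then be read with getSideOutput(DailyWithHourlySnapshots.HOURLY) on the resulting SingleOutputStreamOperator, while the daily results come out of the regular output.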
There are quite a few questions here; I will try to answer all of them. First of all, if you specify your own trigger using trigger(), you are effectively overriding the default trigger, and thus the window may not work the way it would by default. For example, if you create a 1-day event-time tumbling window but override the trigger so that it fires for every 20th element, it will never fire based on event time.
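For instance, a minimal sketch using Flink's built-in CountTrigger to stand in for "fires for every 20th element":

// Sketch: overriding the default event-time trigger with a count-based one.
// This window now fires every 20 elements per key and never fires just because
// the watermark passed the end of the day.
myStream
    .keyBy(...)
    .window(TumblingEventTimeWindows.of(Time.days(1)))
    .trigger(CountTrigger.of(20))
    .aggregate(new MyAggregateFunction(), new MyProcessWindowFunction());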
Now, after your custom trigger fires, the output of MyAggregateFunction is passed to MyProcessWindowFunction, so it works the same way as with the default trigger; you don't need to access MyAggregateFunction from inside the trigger.
Finally, while it may be technically possible to implement a trigger that emits partial results every hour, my personal opinion is that you should probably go with the two separate windows. While that solution may create slightly more overhead and may result in a larger state, it should be much clearer, easier to implement, and much more error resistant.
I'm working on a streaming Java Apache Beam (2.13.0) pipeline that is running in Google Cloud Dataflow. I have a long running PTransform (for a single input, it does a lot of work, outputs multiple outputs and can take >10 minutes).
I want to return early results from the processing to the user. I have a Window and Combine step afterwards. Early triggers do not seem to work with a long running PTransform. The Combine step outputs elements after the PTransform finishes processing the element (rather than returning early results).
I've tried many different early Window functions. E.g. I've tried doing forever element count triggers and it does not work. Ditto for forever processing time-based triggers (e.g. every 10 processing seconds). I've tried GlobalWindows, Fixed Windows, Session Windows, etc.
Here is rough pseudo code for what I'm doing.
p.apply(PubsubIO.readStrings().fromSubscription(options.getInput()))
    .apply(FlatMapElements.via(new LongRunningCalculation()))
    .apply(<I've tried a variety of window functions>)
    .apply(Combine.perKey(new SumMetrics()))
    .apply(DatastoreIO.v1().write().withProjectId(options.getProject()));
I've tried many different window functions to see if I can get anything to return early, but I can't.
Here is a basic one.
Window.into(new GlobalWindows())
    .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(10)))
    .withAllowedLateness(Duration.ZERO)
    .discardingFiredPanes();
Even for this one, even if the window has received well over 10 elements, the GroupBy in the Combine step does not output any rows.
Expected: If I have a long running PTransform, I'd still expect early triggers to still fire.
Actual: I can't seem to get early triggers to work.
Any advice?
I want to set a warm-up period in AnyLogic Personal Learning Edition. I searched for a warm-up setting in AnyLogic, but I couldn't find anything about it.
Is there a warm-up period in AnyLogic, or something like it?
There is no default warm-up setting as it would not make sense given the vast flexibility of the tool and user needs.
It is easy, however, to set it up yourself. As usual, there are many different options; here is one:
Create a variable v_WarmupDuration on Main and set it to however many time units you need.
For any data object you want to record only after the warm-up period, ensure it only captures data if time() > v_WarmupDuration.
Events can have a custom initial time, which you can set using v_WarmupDuration.
Functions that log data should only do so if time() > v_WarmupDuration, and so on.
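For example, a data-logging function on Main could be guarded like this (v_WarmupDuration as above; logValue and ds_output are placeholder names for your own function and dataset):

// Only record data once the warm-up period has passed.
void logValue(double value) {
    if (time() > v_WarmupDuration) {
        ds_output.add(time(), value);  // ds_output: a Data Set object on Main (placeholder)
    }
}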
Alternatively, log all your data as normal but add time stamps to it. Then you can filter out anything that was recorded before the warm-up period when you analyse the results.
Creating a warmup variable works fine for metrics you create yourself.
But if you want to use built-in functionality such as histograms created from timeMeasureStart and timeMeasureEnd blocks in AnyLogic, you will also need an extra select step. For example, assuming you set v_WarmupDuration to 60 minutes, you need a SelectOutput block whose false branch goes straight to a sink block (or to the next element after the timeMeasureEnd).
Condition if true: time(MINUTE) > v_WarmupDuration
That way, the warmup period will not accumulate into the dataset of the timeMeasureEnd.
If you want to set this as a parameter of an experiment, then:
Add a variable to the experiment page (off the screen), e.g. v_warmupMins.
Add a control such as a slider on the experiment page and link it to the variable v_warmupMins.
Add a parameter to hold the warm-up time on the Main canvas, e.g. p_warmupMins.
In the experiment properties, set the parameter p_warmupMins = v_warmupMins.
To programmatically add this time onto the stop time, add this to the experiment's "Before simulation runs" code:
getEngine().setStopTime( getEngine().getStopTime(MINUTE) + v_warmupMins );
Now when I run the experiment with the slider set to 60 minutes, it adds 60 minutes onto the stop time and runs the experiment without accumulating metrics until that time has passed.
Hope that helps.