Apache Beam: Handling of late data - apache-beam

I am using fixed windows in my pipeline with this configuration
Window.<KV<String, DeviceData>>into(FixedWindows.of(Duration.standardSeconds(options.getWindowSize())))
.triggering(
AfterWatermark.pastEndOfWindow()
.withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(10))))
.withAllowedLateness(Duration.standardHours(3))
.accumulatingFiredPanes();
Is my assumption correct that there will be one trigger after closing of the window and then additionally a trigger each 10 minutes after the window has closed if there is late data until 3 hours passed since the closing of the window?
How can I handle data import for periods in time that are further in past? (for example months). In this case I would receive no triggers at all.

It seems that you are missing Repeatedly.forever to allow triggering multiple times, in order for it to match your description.
For data further in the past, you need to configure .withAllowedLateness() based on the lateness that you want to handle.

Related

Anylogic ‘how to’ questions

I am using Anylogic for a simulation-modeling class, and I am not anylogic or coding smart. My last and only coding class was MatLab based about 16 yrs ago. I have a few questions about how to implement modeling concepts in a discrete model with anylogic.
How can I add/inject agents directly into a queue downstream from a source? I have tried adding an additional source to use the “Calls of inject() function,” but I am not sure how to implement it after selecting it ( example: what do I do after selecting the Calls of inject() function). I have the new source feeding directly into the queue where I want the inject.
How can I set the release of an agent to a defined schedule instead of a rate? Currently, I have my working model set to interarrival time. But I would like to set the agent release to a defined schedule. (example: agent-1 released at 120 seconds, agent-2 released at 150 seconds, agent-3 released at 270 seconds)
Any help would be greatly appreciated, especially if it can be written in a “explain to me like I am 5yrs old” format.
Question 1:
If you have a source connected directly to a queue, then when you call source.inject() an agent will be created at the source block and go to the queue. If you have 1 source with multiple possible destinations, then you will have to use select output blocks and some criteria to go from the source to the desired queue.
Since you mentioned not being a strong programmer, this probably wouldn't be for you, but I often find myself creating agents via add_population and then just adding them to an ArrayList until I am ready to pull them into the DES flow. Really, there are near infinite ways to control agent flow within AnyLogic.
Question 2:
Option a: Arrivals by "Arrival Table in Database" You can link an AnyLogic database table to Excel, and then the source block will just have an agent arrive based on that table.
Option b: Arrival Schedule - you could set this up manually within the development environment or load your schedule from a database. I prefer option a over option b given your brief description.
Option c: Read in data to variable and then write code to release based on next arrival time. 1,000s of ways to do this, but one example could be a list of doubles (your arrival times), set an event to delay until next arrival, call inject function, remove that arrival from the list. I think option a would be best for you, but given that AnyLogic allows you to add java code, there are no limits to how sophisticated you could make your arrival logic.
For 2) You could also use an event or a dynamic event. The action could be source.inject(1); and you can schedule them to your preferences with variables. Just be vigilant that you re-start the events if necessary.
There is a demo-model from AnyLogic for dynamic events.

How to stop timeout in service block

I am modeling ticket system with various SLA. The model must contain several service blocks with different reaction time ( from 2 to 32 hours). In the service block only working hours should be taken into account. So in the service block timeout should stop when non-workong hours and on the weekend. Could you please kindly tell me how i can realize it?
Thank you very much in advance!
I can think of two answers, one simplified but works in many cases, the other more advanced and probably more accurate:
Simplified approach: I would set the model in hours and keep everything running as is without any stop. So, at the end of the simulation, if the total time is 100 hours and you know that you have 8 hours/day with 5 days/week, then you'd know the total duration is 2.5 weeks. Of course, this might have limitations or might become more complex later on if you want day-specific actions (e.g. you want to differentiate between Monday, Tuesday, etc.)
Advanced more accurate approach: Create resources whose capacities are defined by schedule and assigned them to your services. Create a schedule and specify the working hours in that schedule. Check the below link to learn more about schedules. I call this the more advanced approach because you need to make sure the schedule is defined correctly and make sure all elements in the model are properly controlled (e.g. non-service blocks such as source, delays, etc.).
https://help.anylogic.com/topic/com.anylogic.help/html/data/schedule.html?resultof=%22%73%63%68%65%64%75%6c%65%73%22%20%22%73%63%68%65%64%75%6c%22%20
I personally would use the first approach if the model is rather simple and modeling working hours is enough for analysis. Otherwise, I'd go for option 2.
Finally, another option I'd like to highlight is the "suspend/resume" functions. I am only adding this because you asked "how to stop timeout". So these functions specifically stop and resume timeout. But you'll need to define the times at which they are executed (through an event for example).

Early firing in Flink - how to emit early window results to a different DataStream with a trigger

I'm working with code that uses a tumbling window of one day, and would like to send early results to a different DataStream on an hourly basis.
I understand that triggers are a way to go here, but don't really see how it would work.
The current code is as follows:
myStream
.keyBy(..)
.window(TumblingEventTimeWindows.of(Time.days(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
In my understanding, I should register a trigger, and then on its onEventTime method get a hold of a TriggerContext and I can send data to the labeled output from there. But how do I get the current state of MyAggregateFunction from there? Or would I need to my own computation here inside of onEventTime()?
Also, the documentation states that "By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner.". Would my one day window then still fire correctly, or do I need to trigger it somehow differently?
Another way of doing this is creating two different operators - one that windows by 1 hour, and another that windows by 1 day. Would triggers be a preferred approach to that?
Rather than using a custom Trigger, it would be simpler to have two layers of windowing, where the hourly results are further aggregated into daily results. Something like this:
hourlyResults = myStream
.keyBy(...)
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
dailyResults = hourlyResults
.keyBy(...)
.window(TumblingEventTimeWindows.of(Time.days(1)))
.aggregate(new MyAggregateFunction(), new MyProcessWindowFunction())
hourlyResults.addSink(...)
dailyResults.addSink(...)
Note that the result of a window is not a KeyedStream, so you will need to use keyBy again, unless you can arrange to leverage reinterpretAsKeyedStream (docs).
Normally when I get to more complex behavior like this, I use a KeyedProcessFunction. You can aggregate (and save in state) hourly and daily results, set timers as needed, and use a side output for the hourly results versus the regular output for the daily results.
There are quite a few questions here. I will try to ask all of them. First of all, if You specify Your own trigger using trigger() this means You are going to effectively override the default trigger and thus the window may not work the way it would by default. So, if You for example if You create the 1 day event time tumbling window, but override a trigger so that it fires for every 20th element, it will never fire based on event time.
Now, after Your custom trigger fires, the output from MyAggregateFunction will be passed to MyProcessWindowFunction, so It will work the same as for the default trigger, you don't need to access the MyAggregateFunction from inside the trigger.
Finally, while it may be technically possible to implement trigger to trigger partial results every hour, my personal opinion is that You should probably go with the two separate windows. While this solution may create a slightly larger overhead and may result in a larger state, it should be much clearer, easier to implement, and finally much more error resistant.

UiPath Orchestrator Triggers - Cron Expression For specific day of month or next working day if not a working day

I've currently got this Cron expression that I'm using to trigger a process in UiPath Orchestrator:
0 0 15 21W * ? *
Runs on the closest working day to the 21st of each month at 3pm.
However I need it to run on the next working day at 3pm if the 21st is a non working day.
Tried searching for an answer and nothing quite fit the brief.
I used this website to build my expression (which is a great tool) but it only had an option for 'nearest day' and not next working day given a specific day of month: https://www.freeformatter.com/cron-expression-generator-quartz.html
As you don't need the nearest day, you can't use the functionality of Orchestrator cronjob. I would recommend creating a wrapper process as follows:
Create a new process, let's call it StartJobByCheckingDate
Now create a trigger that starts StartJobByCheckingDate each day at 3pm
So that process is now your manager of your desired process
Now we need to check if it is the 21th day
Here you have different ways to solve it
You could create a DataTable or even a file in the StartJobByCheckingDate process, that contains all the different days where your desired process should be fired (but this is very manual, you might not want to update this every year, so this might not be the smartest but the easiest solution)
The other idea is to check if the current day is the 21th day. If so check if it is Saturday/Sunday (non-working day).
If true: you could now create a empty dummy file somewhere that tracks that the 21th was a non-working day, and the next day you check that file existing, if it exists you check the current day to be a working day, and if so you delete the file again and start your desired process
If false: just start your desired process directly
I think 2. idea would be that best. Sure you have 365 jobs runs/year. But when you keep that helper process smart this will just be seconds.
Another idea instead of using the dummy file, would be to use Entities. Smarter but need some more time to get familiar with.
We have (had) the exact same issue. Since UiPath doesn't offer a feasible solution out of the box, we will work around the restriction using the following strategy: We trigger the actual job daily, considering a custom-built, static NonWorkingDay-list that will just suppress the execution of the robot every day we don't want it to run.
These steps are needed:
Get a list with of all known bank holidays, saturdays and sundays until 2053 or so...
Build a the static exclusion-list using a script that does something like this (pseudocode. I will update the answer once we have actually implemented the solution):
1. get all valid execution dates
loop through every 28th of the month until end of 2053
if the date is in the bankHolidayList then
loop until the next bankDay is found
add it to the list of valid ExecutionDates
else
add the date to the validExecutionDate-list
2. build exclusion-list
loop through every day until end of 2053
if the date is not in the validExecutionDate-list
add it to the exclusionDate-list
Format the csv accordingly and upload it to the orchestrator tenant as a NonWorkingDay-List
Update your trigger to run daily at your desired time, using the uploaded NonWorkDay-Calendar
While the accepted answer will surely work as well, we prefered to go with this approach because having a separate robot that does nothing but executing a UiPath trigger just doesn't seem right to me. With this approach we have no additional code that we potentially need to maintain.
In my oppinion not having a solution for this concern out of the box is a lack of feature that UiPath will (hopefully) fix until end of 2053 ;-)
Cheers
You can configure your trigger to launch oftener, then manage dates at init of your process, but you must set up a list of "holydays" or check in some way.
Also you can use the calendar option of orchestrator (+info)

Need to Make a Parse Cloud Code Job to Reset all Objects at a Certain Time of Day

I currently have an app powered by parse that monitors the wait times for a certain amusement park. On parse, each ride has its own class file and in each class there is an object with a string entitled "waitTime" which has a string that has the most recent wait time submitted. I would like to use cloud code to reset all of the waitTime sections of each object to "0" at 1:00 AM each morning. I have no experience with Cloud Code or anything like it. How would I go about doing something like this? Thank you in advance for your help!
Have a job that runs every 5 minutes, comparing the current server time to any scheduled times (e.g. 1am, including timezone issues), and if it matches run that task (in your case resetting the "waitTime").
It is common to have this single master job trigger multiple different tasks (functions in your cloud code) using different rules such as time or a manual job queue, that's why I suggest this pattern.