How to create rolling windows with Apache Beam? Not sliding or fixed but a rolling window - apache-beam

Say I want to calculate the average of a certain metric over the last 10 minutes, recomputed every minute, and compare it to the average of the same metric over the last 20 minutes, also recomputed every minute. I need 2 windows, not 10-minute vs 20-minute sliding windows, and not 2 fixed-duration windows with early firing. I need 2 windows (of duration 10 min and 20 min each) which keep rolling forward by a minute, every minute. Alternatively, if I could discard all but the latest of the sliding windows, my problem would be solved; otherwise, multiple sliding windows are very costly.
Could you please help here? A custom WindowFn() function would be very helpful

Update: here is what I ended up doing.
I created a global window with an allowed lateness of 1 hour, triggering repeatedly every minute forever, with accumulating panes. From this global window, I applied a DoFn that filters elements by timestamp: those within the last 10 minutes (now minus 10 minutes) and those within the last 20 minutes (now minus 20 minutes), producing 2 distinct PCollections. I applied this time filtering twice: once to the trigger output of the global window, to add elements to the 10-minute and 20-minute PCollection(s), and again to each collection itself, to remove elements that are no longer within the time duration. For now, these 2 PCollection(s) are serving as the rolling windows, but I still need to audit the results to confirm this is actually working.
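For reference, here is a minimal Java sketch of that setup as I understand it. The element type Metric, the input PCollection named input, and the transform names are illustrative placeholders rather than parts of the actual pipeline; treat it as an unaudited sketch, not a verified solution.

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Keeps only elements whose event timestamp lies within the last `lookback` duration.
static class KeepLastN extends DoFn<Metric, Metric> {
  private final Duration lookback;
  KeepLastN(Duration lookback) { this.lookback = lookback; }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Instant cutoff = Instant.now().minus(lookback);
    if (!c.timestamp().isBefore(cutoff)) {
      c.output(c.element());
    }
  }
}

// Global window that fires roughly every minute and keeps prior panes (accumulating).
PCollection<Metric> globallyWindowed = input.apply("GlobalWindowFiringEveryMinute",
    Window.<Metric>into(new GlobalWindows())
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(1))))
        .withAllowedLateness(Duration.standardHours(1))
        .accumulatingFiredPanes());

// The two "rolling" views, rebuilt from the same triggered global window.
PCollection<Metric> last10Min =
    globallyWindowed.apply("Last10Min", ParDo.of(new KeepLastN(Duration.standardMinutes(10))));
PCollection<Metric> last20Min =
    globallyWindowed.apply("Last20Min", ParDo.of(new KeepLastN(Duration.standardMinutes(20))));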

Related

Show the best / MAX unit count over a 15 minute rolling interval - Tableau

I'm trying to show the best unit count over a 15-minute rolling interval at a specific level of detail (PLC & Point) that will act as a KPI. I think I'm on the right path, but I'm currently getting the "an aggregate function is already an aggregation" error, and I can't find either a better solution for the calculation or a workaround for the error.
I have created a calculated field called 'Rolling 15 mins' to work out the rolling 15-minute sum of the counts and display it alongside each minute of the test window (see the screenshot and 'sheet 1' of the Google Drive doc), using
WINDOW_SUM(SUM([Unit Count]),-15,0)
Rolling 15 min screenshot / sheet 1
With the calculation 'Rolling 15 Mins' I've tried to show the best or MAX rolling 15-minute count at the PLC & Point level, so that each point's best 15-minute count over a test period is clearly visible, using an LOD expression. But this is where I'm getting the error, which I now know is due to the hierarchy of Tableau calculations, and I can't figure out another workaround.
{ FIXED [PLC New],[PLC & Point (Test Windows)],DATEPART('hour', [Time]),DATEPART('minute', [Time]) : MAX([Rolling 15 Mins]) }
In the screenshot from 'sheet 2' below, 'Rolling 15 Mins' is currently displaying the sum of the unit counts of the last 15 PLC points, but this is the level at which I would like to display the best / MAX 15-minute unit count over the test period.
The level I'd like to display the MAX 15 mins at / sheet 2
Any assistance with this would be much appreciated. Thanks in advance.
Link to Example File (.twbx)

How do I set the number of arrivals?

I'm new to AnyLogic and I have a question: how do I model a supply of 30 materials every 10 minutes?
runtime: 5 hours.
30 materials every 10 minutes means 3 materials per minute. In AnyLogic, a material is defined as an agent.
You can generate agents in many different ways; the most typical is using the Source block from the Process Modeling Library, setting its rate to 3 per minute.
Note that this means your arrivals will follow a Poisson distribution, so you won't get exactly 3 materials every minute, but if you run for 5 hours the average will be close to that.
If you want exactly 30 materials every 10 minutes, you can instead use arrivals defined by interarrival time, with an interval of 20 seconds.
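To see the difference, here is a plain-Java sketch (not AnyLogic code; the class name, seed, and output format are arbitrary) comparing Poisson arrivals at a rate of 3 per minute with a fixed 20-second interarrival time over a 5-hour run: the first averages out to roughly 900 arrivals, the second gives exactly 900.

import java.util.Random;

public class ArrivalsSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        double ratePerMinute = 3.0;      // Source rate: 3 agents per minute
        double horizonMinutes = 5 * 60;  // 5-hour run

        // Poisson arrivals: exponentially distributed interarrival times with mean 1/rate.
        int poissonArrivals = 0;
        double t = 0;
        while (true) {
            t += -Math.log(1 - rnd.nextDouble()) / ratePerMinute; // exponential gap
            if (t > horizonMinutes) break;
            poissonArrivals++;
        }

        // Deterministic arrivals: one agent every 20 seconds (1/3 minute), exactly 3 per minute.
        int deterministicArrivals = (int) Math.floor(horizonMinutes / (1.0 / 3.0));

        System.out.printf("Poisson (rate 3/min): %d arrivals in 5 h (expected ~900)%n", poissonArrivals);
        System.out.printf("Fixed 20 s interarrival: %d arrivals in 5 h (exactly 900)%n", deterministicArrivals);
    }
}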

reset chart to 0 in Grafana

Below is a chart I have in grafana:
My problem is that if my chosen time range is, say, 5 minutes, the graph doesn't show only what happened in the last 5 minutes. So in the picture, nothing happened in the past 5 minutes, but it's still showing the last points it has. How can I change this so that it goes back to zero if nothing has changed? I'm using a Prometheus counter for this, if that is relevant.
As explained in the Prometheus documentation, a counter value in itself is not of much use. It depends on when your job was last restarted and everything that happened since.
What's interesting about a counter is how much it changed over some period of time. I.e. either the average rate of change per second (e.g. 3 queries per second) or the increase over some time range (e.g. 10K queries in the last hour).
So instead of graphing something like e.g. http_requests, you should graph rate(http_requests[1m]) (the average per-second rate of requests over the previous 1 minute) or increase(http_requests[1h]) (the total number of requests over the past hour). You can play with the range size until you get something which makes sense for your data. But make sure to use a range at least 2x your scrape interval (and ideally more, as Prometheus is somewhat daft in the way it computes rates/increases).

Will a Flink sliding window consider duplicate messages?

Suppose I am using event-time processing with sliding windows: a window size of 10 seconds and a sliding factor of 5 seconds.
Now, for example, a message arrives at event time 8 seconds, so it falls into the first window:
window 1: 1 to 10.
The window then slides by 5 seconds, so the next window is 5 to 15, and the same message that window 1 considered will also be considered by window 2:
window 2: 5 to 15.
So my question is: won't that duplicated message affect my calculation?
Please tell me whether my thinking is correct.
And if both windows consider the message, how can I treat it as unique?
A record is added to all sliding windows that overlap with its timestamp, e.g., in your example, the record at time 8 will be added to both windows 1 to 10 and 5 to 15.
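If it helps, here is a minimal Flink sketch of such a sliding window. The Tuple3 element layout, field meanings, and the 2-second out-of-orderness bound are my assumptions, not from your job. With a 10-second size and 5-second slide, the record at event time 8 s is assigned to two overlapping windows; note that Flink aligns windows to the epoch, so they are [0, 10) and [5, 15) rather than 1 to 10.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // (key, value, eventTimeMillis) -- a single record at event time 8 s
    DataStream<Tuple3<String, Integer, Long>> events =
        env.fromElements(Tuple3.of("sensor-1", 1, 8_000L));

    events
        .assignTimestampsAndWatermarks(
            WatermarkStrategy.<Tuple3<String, Integer, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                .withTimestampAssigner((e, ts) -> e.f2))
        .keyBy(e -> e.f0)
        // size 10 s, slide 5 s: each element is assigned to 2 overlapping windows
        .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
        .sum(1)    // the record at 8 s contributes to both [0, 10) and [5, 15)
        .print();  // prints one result per window, so the same value appears twice

    env.execute("sliding-window-sketch");
  }
}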

Detect an application that has not sent a message in the last 15 minutes using Kapacitor

We are writing a message count per application to InfluxDB every 10 seconds. I want to be able to generate an alert if that number has not changed in the last 15 minutes.
I tried derivative, but that gives the change for each data point. The unit parameter just scales the result. Derivative works well for our chattier apps where we can check if a message was sent every 10s, but the 15 minute window is not working.
I tried using spread with a batched query grouped by time, but that gives me the change in whole quarters of the hour (00 to 15, 15:01 to 30, 30:01 to 45...). I want to be able to check the last 15 minutes and check it every minute or so.
I tried using a windowed stream with spread, but it seems to be grabbing points outside the window since it is giving a non-zero answer.