Detect an application that has not sent a message in the last 15 minutes using Kapacitor - kapacitor

We write a message count per application to InfluxDB every 10 seconds. I want to generate an alert if that count has not changed in the last 15 minutes.
I tried derivative, but that gives the change for each data point. The unit parameter just scales the result. Derivative works well for our chattier apps where we can check if a message was sent every 10s, but the 15 minute window is not working.
I tried using spread with a batched query grouped by time, but that gives me the change in whole quarters of the hour (00 to 15, 15:01 to 30, 30:01 to 45...). I want to be able to check the last 15 minutes and check it every minute or so.
I tried using a windowed stream with spread, but it seems to be grabbing points outside the window since it is giving a non-zero answer.

Related

Google OR-Tools: Minimize Total Time

I am working on a VRPTW and want to minimize the total time (travel time + waiting time) cumulated for all vehicles. So if we have 2 vehicles one that starts at time 0 and returns at time 50 and one that starts at time 25 and returns at time 100, then the objective value would be 50+75=125.
Currently I have implemented the following code:
for i in range(data['num_vehicles']):
    routing.AddVariableMinimizedByFinalizer(
        time_dimension.CumulVar(routing.End(i)))
However, this seems like it is only minimizing the time we arrive back at the depot.
Also it results in very high waiting times.
How do I implement this correctly in Google OR-Tools?
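This is called the span: for each vehicle, the value of the time dimension at the end of the route minus its value at the start.
See the SetSpanCostCoefficientForVehicle method on the dimension to set a span cost for one vehicle.
You can also set it for all vehicles with SetSpanCostCoefficientForAllVehicles.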
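To make the span concrete, here is a minimal pure-Python sketch of the arithmetic using the two vehicles from the question. It does not call OR-Tools; the function name total_span and the (start, end) tuples are made up for illustration.

```python
def total_span(routes):
    """Sum of (end - start) over all vehicles, i.e. the total span."""
    return sum(end - start for start, end in routes)

# (start time, end time) per vehicle, taken from the question's example.
routes = [(0, 50), (25, 100)]

total = total_span(routes)                        # 50 + 75
end_times_only = sum(end for _, end in routes)    # 50 + 100
print(total, end_times_only)
```

Minimizing only the end times, as the finalizer loop in the question does, ignores when each vehicle starts, which is why the span is the quantity to put in the objective here.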

AnyLogic mean waiting time in queue

I would like to get the mean waiting time of the units in my queue for every hour (for example, 4 minutes between 7 and 8 am, 10 minutes between 8 and 9 am, and so on). That's my current queue with my time measure. Is there a way to do so?
Create a normal dataset and call it datasetHourly. Deactivate the option Use time as horizontal value. This is where we will store your hourly data.
Create an event and set it to cyclic, recurring once every hour.
This cyclic event will get the current mean of your time measurement (waiting time + service time in your example) and save this single value in the extra dataset.
We also have to clear the dataset that is built into timeMeasureEnd, in order to get clean statistics again for the next hourly interval.
datasetHourly.add(time(HOUR),timeMeasureEnd.dataset.getYMean());
timeMeasureEnd.dataset.reset();
You can now visualise the hourly development by adding datasetHourly to a normal plot.
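The record-then-reset logic of the cyclic event can be sketched outside AnyLogic like this. It is plain Python, not AnyLogic code: the class HourlyMeanRecorder and its methods are hypothetical stand-ins for timeMeasureEnd.dataset and datasetHourly, and returning None for an hour with no samples is a choice made for this sketch.

```python
class HourlyMeanRecorder:
    def __init__(self):
        self.samples = []   # stands in for timeMeasureEnd.dataset
        self.hourly = []    # stands in for datasetHourly

    def record(self, measured_time):
        """Called whenever a unit passes the time measurement."""
        self.samples.append(measured_time)

    def on_hour(self, hour):
        """Called by the hourly event: store the mean, then reset."""
        mean = sum(self.samples) / len(self.samples) if self.samples else None
        self.hourly.append((hour, mean))
        self.samples.clear()  # next hour starts from clean statistics


recorder = HourlyMeanRecorder()
for t in (2.0, 6.0):          # units measured between 7 and 8
    recorder.record(t)
recorder.on_hour(8)
recorder.record(10.0)         # unit measured between 8 and 9
recorder.on_hour(9)
print(recorder.hourly)
```

Without the reset, the second entry would be the mean over both hours rather than over the last hour only.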

prometheus: is it possible to use event number of gauge as a counter?

I use Prometheus to monitor an API service. Currently, I use a Counter to count the number of requests received and a Gauge for the response time in milliseconds.
I've tried to use something like count_over_time(response_time_ms[1m]) to count requests during a time range. However, every point in the result has the value 10.
Why doesn't this work?
count_over_time(response_time_ms[1m]) will tell you the number of samples, not the number of times your Gauge was updated within (what I assume to be) a Java process. Based on the value of 10 you're seeing, I'm assuming your scrape interval is 6 seconds.
For an explanation of why this doesn't work as you would expect it, a Gauge is simply a Java object wrapping a double value. Every time you set its value, that value changes, but nothing more. There's no count of how many times the value changed or any notification sent to Prometheus that this happened. Prometheus simply polls every 6 seconds and collects whatever value was there at the time (never the wiser that the value changed 15 times since the last time it was collected). This is why gauges are intended to measure single values that go up and down (such as memory utilization: it's now 645 MB, in 6 seconds it's 648 MB, in 12 seconds 543 MB): you know the value constantly changes, but the best you can do is sample it every now and then.
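A small simulation of that polling behaviour, as a sketch only: scrape_gauge is a made-up helper, and the 6 second interval is the assumed scrape interval from above. It shows that the number of samples in a minute is the same whether the gauge was set once or 120 times.

```python
def scrape_gauge(updates, scrape_interval, window):
    """updates: (time, value) pairs sorted by time.
    Returns the (time, value) pairs a poller would collect."""
    samples = []
    current = None
    i = 0
    t = scrape_interval
    while t <= window:
        # Apply every update that happened up to this scrape;
        # only the last one is still visible to the poller.
        while i < len(updates) and updates[i][0] <= t:
            current = updates[i][1]
            i += 1
        samples.append((t, current))
        t += scrape_interval
    return samples


busy = [(s * 0.5, 100 + s) for s in range(1, 121)]  # 120 updates in 60 s
quiet = [(1.0, 42)]                                 # a single update

busy_count = len(scrape_gauge(busy, 6, 60))
quiet_count = len(scrape_gauge(quiet, 6, 60))
print(busy_count, quiet_count)
```

Both counts equal the window divided by the scrape interval, which is what count_over_time reports.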
For something like request latency, you should use a Histogram: it's basically a counter for the number of observations (i.e. number of requests); a counter for the sum of all observations (i.e. how long all requests put together took); and separate counters for each bucket (i.e. how many requests took less than 1 ms; how many requests took less than 10 ms; etc.). From this you can get an accurate average over any multiple of your scrape interval (i.e. change in total time divided by change in number of requests) as well as estimates for any percentile (including the median). How precise said percentiles are depends on the bucket sizes you choose (and how well they actually match the actual measurements).
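The "change in total time divided by change in number of requests" calculation looks like this as a sketch. The function name and the numbers are invented for illustration, and counter resets are ignored to keep it short.

```python
def average_latency(sum_before, count_before, sum_after, count_after):
    """Average observation between two scrapes of a histogram's
    sum and count counters. Returns None if nothing was observed."""
    delta_count = count_after - count_before
    if delta_count == 0:
        return None
    return (sum_after - sum_before) / delta_count


# Hypothetical scrapes: total latency went from 1200 ms to 1500 ms
# while the request count went from 100 to 115.
avg_ms = average_latency(1200.0, 100, 1500.0, 115)
print(avg_ms)
```

Because both inputs are counters, the result is exact for the requests between the two scrapes, no matter how many requests arrived in between.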
Or, if all you're interested in is the number of requests, then a counter that's incremented on every request will be enough. To adjust for counter resets (e.g. job restarts), you should use increase() rather than a simple difference between two counter values:
increase(number_of_requests_total[1m])
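To see why resets matter, here is a rough sketch comparing a plain difference with a reset-aware sum over the same samples. It only illustrates the reset handling: the real increase() also extrapolates to the edges of the range, which this sketch leaves out, and both function names are made up.

```python
def simple_difference(samples):
    """Last counter value minus first, with no reset handling."""
    return samples[-1] - samples[0]


def reset_aware_increase(samples):
    """Sum the step-to-step increases; a drop is treated as a
    restart from zero, so the new value counts in full."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur
    return total


# Counter samples across a job restart (the drop from 120 to 5).
samples = [100, 110, 120, 5, 15]
naive = simple_difference(samples)
adjusted = reset_aware_increase(samples)
print(naive, adjusted)
```

The plain difference goes negative across the restart, while the reset-aware version still counts the requests seen on both sides of it.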
If you want to count the number of requests in some specific period up to now (the last 1m in this case), just use
number_of_requests_counter - number_of_requests_counter offset 1m
If you want something like requests per second, then use
rate(number_of_requests_counter[1m])
I can tell you why it's not working with your Gauge, but first of all, specify what you assign to this metric. Do you assign an average, the last response time, or something else?
For response time you should use a Summary or Histogram.

Reset chart to 0 in Grafana

Below is a chart I have in Grafana:
My problem is that if my chosen time range is, say, 5 minutes, the graph won't show only what happened in the last 5 minutes. In the picture, nothing happened in the past 5 minutes, so it's just showing the last points it has. How can I change this so that it goes back to zero if nothing has changed? I'm using a Prometheus counter for this, if that is relevant.
As explained in the Prometheus documentation, a counter value in itself is not of much use. It depends on when your job was last restarted and everything that happened since.
What's interesting about a counter is how much it changed over some period of time. I.e. either the average rate of change per second (e.g. 3 queries per second) or the increase over some time range (e.g. 10K queries in the last hour).
So instead of graphing something like http_requests, you should graph rate(http_requests[1m]) (the average number of requests per second over the previous 1 minute) or increase(http_requests[1h]) (the total number of requests over the past hour). You can play with the range size until you get something which makes sense for your data. But make sure to use a range at least 2x your scrape interval (and ideally more, as Prometheus is somewhat daft in the way it computes rates/increases).

How to create rolling windows with Apache Beam? Not sliding or fixed but a rolling window

Say I want to calculate the average of certain metric over the last 10 mins, after each minute and compare it to the average of the same metric over the last 20 mins, after each minute. I need 2 windows (Not 10 Sliding windows vs 20 Sliding windows) or 2 windows of Fixed Duration, with early firing. I need 2 windows which should keep rolling forward by a minute (of duration 10 min and 20 min each) every minute. Alternatively, if I could discard all but the latest of the sliding windows, my problem could be solved. Otherwise multiple sliding windows are very costly.
Could you please help here? A custom WindowFn() function would be very helpful
I must update with what I ended up doing finally.
I created a Global window with AllowedLateness of 1 hour, and triggering every minute repeatedly forever, with Accumulating Panes. From this global window, I applied DoFn filtering for elements with Timestamps in the last 10 mins (Present Instant.minus 10 mins), and events in the last 20 mins (Present Instant.minus 20 mins) to create 2 distinct PCollections. I applied this time filtering twice - once to the trigger output of the global window to add it to the PCollection(s) for 10 min, 20 min and then again to the collection itself to remove all those which are no longer part of the time duration. For now, these 2 PCollection(s) are serving as the rolling window, but I need to audit the results to confirm if this is indeed working.
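The time filtering itself can be sketched in plain Python, independent of Beam. This is only the filter-and-average step, not a pipeline: the function names, the (timestamp in seconds, value) tuples, and the sample data are all made up for illustration.

```python
def rolling_average(events, now, window_seconds):
    """Average of the values whose timestamp lies in (now - window, now]."""
    cutoff = now - window_seconds
    values = [value for ts, value in events if cutoff < ts <= now]
    return sum(values) / len(values) if values else None


def prune(events, now, max_window_seconds):
    """Drop events that no longer belong to the longest window."""
    cutoff = now - max_window_seconds
    return [(ts, value) for ts, value in events if ts > cutoff]


# (timestamp in seconds, metric value); "now" is the trigger time.
events = [(0, 10.0), (600, 20.0), (1100, 30.0), (1190, 40.0)]
now = 1200

avg_10_min = rolling_average(events, now, 10 * 60)
avg_20_min = rolling_average(events, now, 20 * 60)
kept = prune(events, now, 20 * 60)
print(avg_10_min, avg_20_min, kept)
```

Running the same two calls at every one-minute trigger gives the two rolling averages to compare, and pruning after each trigger keeps the accumulated state bounded by the 20 minute window.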