I'm selecting SUM(time-taken) over logs spanning a long period of time, and the result comes back negative. How do I handle this?
Negative totals like that are a sign the integer sum is overflowing. You could try doing a sum of TO_REAL(time-taken) instead, so the aggregation happens in floating point.
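For example, with Microsoft Log Parser over IIS logs (the FROM clause here is a placeholder; point it at your own log files):

SELECT SUM(TO_REAL(time-taken)) AS TotalTimeTaken
FROM ex*.log

Summing as REAL keeps the running total from wrapping around into negative territory the way an overflowing integer sum does.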
Instant.MAX.toEpochMilli() raises java.lang.ArithmeticException: long overflow.
What is the largest timestamp representable without hitting the limit of long?
And is there a constant somewhere for it?
The largest timestamp that won't raise an exception in toEpochMilli() is +292278994-08-17T07:12:55.807Z.
Instant.MAX is much larger: +1000000000-12-31T23:59:59.999999999Z.
Instant.ofEpochMilli(Long.MAX_VALUE); // +292278994-08-17T07:12:55.807Z
I'm not aware of any constant for this specific instant, but it's easy enough to compute with Instant.ofEpochMilli(Long.MAX_VALUE).
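A minimal sketch of wrapping that in your own constant and guarding against the overflow (the class and constant names here are mine, not part of java.time):

import java.time.Instant;

public class EpochMilliLimit {
    // Largest Instant that toEpochMilli() can represent without overflowing a long.
    static final Instant MAX_EPOCH_MILLI_INSTANT = Instant.ofEpochMilli(Long.MAX_VALUE);

    public static void main(String[] args) {
        System.out.println(MAX_EPOCH_MILLI_INSTANT); // +292278994-08-17T07:12:55.807Z

        // Clamp before converting, so arbitrary Instants never throw.
        Instant candidate = Instant.MAX;
        long millis = candidate.isAfter(MAX_EPOCH_MILLI_INSTANT)
                ? Long.MAX_VALUE
                : candidate.toEpochMilli();
        System.out.println(millis); // 9223372036854775807
    }
}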
I'm running Prometheus in a Kubernetes cluster. All is running fine, and my UI pods are counting visitors.
Please ignore the title; what you see here is the query at the bottom of the image. It's a counter. The gaps in the graph are due to pods restarting. I have two pods running simultaneously!
Now suppose I would like to count the total number of visitors, so I need to sum over all the pods.
This is what I expect considering the first image, right?
However, I don't want the graph to drop when a pod restarts. I would like something cumulative over a specified amount of time (somehow ignoring pod restarts). Hope this makes sense. Any suggestions?
UPDATE
An answer below suggests the following approach.
It's a bit hard to see because I've plotted everything there, but the suggested answer sum(rate(NumberOfVisitors[1h])) * 3600 is the continuous green line. What I don't understand now is why its value is 3. Also, why does the value increase after 21:55, when I can see some values before that?
The approach seems to be OK: I noticed that the actual increase is indeed 3, going from 1 to 4. In the graph below I've used just one time series to reduce noise.
Rate, then sum, then multiply by the time range in seconds. That will handle rollovers on counters too.
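With the metric from the question, that looks like:

sum(rate(NumberOfVisitors[1h])) * 3600

rate() computes the per-second increase of each pod's counter (and handles counter resets on restart), sum() adds the pods together, and multiplying by 3600 scales the per-second rate back to an hourly count.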
Prometheus doesn't provide the ability to sum counters that may be reset. Additionally, the increase() function in Prometheus has some issues that may prevent it from being used to query counter increase over a specified time range:
It may return fractional values over integer counters because of extrapolation. See this issue for details.
It may miss the counter increase between the raw sample just before the lookbehind window in square brackets and the first raw sample inside the lookbehind window. For example, increase(NumberOfVisitors[1m]) at timestamp t may miss the counter increase between the last raw sample just before t-1m and the first raw sample in the (t-1m ... t] time range. See more details here and here.
It may miss the increase for the first raw sample in a time series. For example, if the NumberOfVisitors counter is increased to 10 just before the first scrape of this counter by Prometheus, then increase() over the time range with the first sample would under-count the counter increase by 10.
Prometheus developers are going to fix these issues - see this design doc. In the meantime it is possible to use VictoriaMetrics - its increase() function is free from these issues.
Returning to the original question - the sum of multiple counters, which may be reset, can be returned with the following MetricsQL query in VictoriaMetrics:
running_sum(sum(increase(NumberOfVisitors)))
It uses the following functions:
increase() for calculating the increase of each counter between adjacent points on the graph.
sum() for summing the calculated increases at each point on the graph.
running_sum() for calculating the running sum of the per-point increases on the graph.
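To illustrate: if the summed per-point increases on the graph are 1, 0, 2, 1, then running_sum() plots 1, 1, 3, 4, a cumulative line that keeps climbing instead of dropping when a pod restarts.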
I have a counter that I am plotting on Grafana.
rate(processed_work_items_total{job="MainWorker"}[1m])
I am not getting the expected numbers in Grafana.
What I want is the # of Work Items Processed per minute.
Is my query wrong, or is it the unit of measure on my Y axis? I currently have it set to ops/min and it's giving me a very small number.
According to the documentation, rate(processed_work_items_total{job="MainWorker"}[1m]) will calculate the number of work items processed per second, measured over the last minute (that's the [1m] from your query).
If you want the number of items per minute, simply multiply the above expression by 60.
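For example:

rate(processed_work_items_total{job="MainWorker"}[1m]) * 60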
If you need to calculate per-minute increase rate for a counter metric, then use increase(...[1m]). For example, the following query returns the increase of processed_work_items_total{job="MainWorker"} time series over the last minute:
increase(processed_work_items_total{job="MainWorker"}[1m])
Note that the increase() function in Prometheus may return unexpected results due to the following issues:
It may return fractional results over integer metrics because of extrapolation. See this issue for details.
It may miss counter increase between the last raw sample just before the lookbehind window specified in square brackets and the first raw sample inside the lookbehind window.
It may miss the initial counter increase at the beginning of the time series.
These issues are going to be addressed in Prometheus according to this design doc. In the meantime it is possible to use MetricsQL, which is free from these issues.
A colleague of mine keeps telling me not to use a double precision type for a PostgreSQL column, because I will eventually have rounding issues.
I am only aware of one case where a value gets stored approximately: when a number with "too many" decimal digits gets saved.
For example if I try to store the result of 1/3, then I will get an approximation.
On the other hand, he claims that the above is not the only case. He says that sometimes, even if the user is trying to store a number with a well-defined number of digits such as 84.2 or 3.124, the value might get saved as 84.19 or 3.1239, respectively.
This sounds very strange to me.
Could anyone give me an example/proof that the above can actually happen?
Your colleague is right: stay away from float or double. Not so much because of rounding issues, but because those are approximate data types: what you put into such a column is not necessarily what you get out.
If you care for precision and accurate values, use numeric.
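A quick way to see the difference in psql (a sketch; the exact float display depends on your PostgreSQL version and the extra_float_digits setting):

SELECT 0.1::double precision + 0.2::double precision = 0.3::double precision; -- false
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric;                            -- true

None of 0.1, 0.2, or 0.3 has an exact binary representation, so the double precision sum lands on a nearby value rather than exactly 0.3, while numeric stores the decimal digits exactly.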
A more detailed explanation about the pitfalls of approximate data types can be found here:
https://floating-point-gui.de/
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
I'm calculating how long it takes to detect an issue in Tableau.
I have the following calculated field to work out mean time to detection:
DATEDIFF('hour',[DATE Reported], [DATE Responded])
When I use the meantimetodetect calculated field, it only shows me one result and the rest are 0s.
The one result shown is correct: there were 3 days between when it was reported and when it was responded to. The rest of the results were reported and responded to on the same day, so I don't know if that has anything to do with it.
Does anyone know why it is displaying like this? Perhaps there is a better way to calculate it?
Thanks.