Influxdb - ignore partial intervals in group by - grafana

I feel this is a problem all users of InfluxDB/Grafana would encounter. Any time I create a graph that shows aggregations by a time interval, the most recent and oldest intervals are cut short and the ends of the graph show incorrect values. For example, I have data coming in every 10 seconds, so I should get 360 values per hour. I wanted to create a graph showing the number of data points that come in per hour, so I have the query below, which does a count by hour, and I run it over a 24 hour period. The problem is that the most recent interval is almost always less than 360 because it's not complete, and the oldest interval is usually cut off, so it too shows too low a value. This is pretty much always an issue for any graph I create that is grouped by a time interval. Is there a way to just leave out incomplete intervals? I'm happy with a solution in either InfluxDB or Grafana.
SELECT count("wifiStrength") FROM "detailed_data"."water" WHERE $timeFilter GROUP BY time(1h) fill(null)
For anyone who is curious, the data is from a water meter and logs water usage.

Use smarter time ranges in Grafana, so that only full hours are selected. See the docs; the /h rounding suffix is the important part here.
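To make the idea concrete, here is a minimal sketch of what "only full hours" means for the query window: the end is rounded down to the last hour boundary and the start is a whole number of hours before that. The function name full_hour_range and the sample timestamp are made up for illustration. In Grafana's relative time syntax this would presumably correspond to something like From: now-24h/h and To: now-1h/h, by analogy with the documented now-1d/d "yesterday" range, but check the docs for your version.

```python
from datetime import datetime, timedelta, timezone


def full_hour_range(now, hours):
    """Return (start, end) covering the last `hours` complete hours before `now`.

    Hypothetical helper: `end` is `now` rounded down to the hour, so the
    partially filled current hour is excluded, and `start` sits on an hour
    boundary too, so the oldest bucket is not cut short either.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=hours)
    return start, end


# Example: "now" is 14:37:12, so the window is 14:00 yesterday to 14:00 today.
start, end = full_hour_range(
    datetime(2024, 1, 1, 14, 37, 12, tzinfo=timezone.utc), 24
)
print(start.isoformat(), end.isoformat())
```

With a window like this, every GROUP BY time(1h) bucket is complete, so each count should come out at the full 360 as long as no data is missing.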

Graph a counter from zero in prometheus/grafana

In prometheus, I have a monotonically increasing counter (ifHCInOctets from IF-MIB, in this case).
In Grafana, I can create a graph using the simple query ifHCInOctets{job='snmp',instance='$Device',ifDescr=~'eth0'} and see the counter graphed over different time ranges by selecting the desired range in the upper-right.
This is almost exactly what I want. However, I would like the graph to always start at zero and increase from there. The use-case is that I want to visualize my data usage over the course of a month to see how quickly I am approaching my data cap. (I already create a gauge object using increase(ifHCInOctets{...}[$__range]) function which shows me how much I have used in total over the given time range, but I'd like to be able to visualize that usage over time.)
Basically, I want ifHCInOctets{...} - X where X is the value of ifHCInOctets at the start of the range. My first thought was:
ifHCInOctets{...} - ifHCInOctets{...} offset $__range
But that seems to show me each data point minus the data point $__range time prior to it (rather than just subtracting the starting value from all points).
I then tried creating a query variable with the query query_result(ifHCInOctets{...} offset $__range) and setting it to update on time range change. This almost seemed to work, but the resulting graph always seemed to start slightly negative, depending on the time range selected, which made me think it wasn't doing what I thought it was.
I have also tried various forms of sum, sum_over_time, and increase, all to no avail.
You're probably looking for something like this
ifHCInOctets
-
min_over_time(
  (
    ifHCInOctets
    and
    (month(timestamp(ifHCInOctets)) == scalar(month(vector($__to / 1000))))
  )[31d:]
)
But it doesn't take counter resets into account, and it is ugly and inefficient as hell. It's basically the current value minus the min_over_time calculated over the samples from the previous 31 days that fall into the same month as Grafana's $__to timestamp.
You probably want to set up a recording rule based on this expression (that adds year, month and day labels to a metric) and then calculate the increase() over any given month (including the current month). That takes into account both counter resets and counters that did not exist at the beginning of the month.
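For what it's worth, here is a rough sketch of what "starting at zero and handling counter resets" means on plain numbers. It is not how Prometheus implements increase() (which also extrapolates to the window edges). The function name increase_from_start and the sample values are made up, and a drop in the raw value is assumed to mean the counter was reset to zero.

```python
def increase_from_start(samples):
    """Cumulative increase of a counter, relative to the first sample.

    Assumption: any drop in the raw value is a counter reset, after which
    the counter restarted from zero, so the whole new value is counted.
    """
    result = []
    total = 0
    prev = None
    for value in samples:
        if prev is not None:
            if value >= prev:
                total += value - prev  # normal growth between samples
            else:
                total += value  # reset: counter restarted from zero
        result.append(total)
        prev = value
    return result


# Raw counter values with a reset between 1100 and 20.
print(increase_from_start([1000, 1040, 1100, 20, 70]))
# [0, 40, 100, 120, 170]
```

Simply subtracting the first sample from every point would give negative values after the reset, which is why the reset handling matters.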

Can I calculate a moving sum on a field in InfluxDB?

I'm trying to understand if it's possible to calculate a 1 month sum of revenue data in one of my measurements. For each day, I would like the sum of the previous 30 days.
Is this possible in InfluxDB or through Grafana's query interface?
A moving average is a moving sum, divided by the number of samples. So if you want a moving sum of the past 30 values:
select 30*moving_average(field_name, 30) from measurement
Edited to add:
As Peter Halicky points out in the comments, this is not the past 30 days. It's the past 30 data points.
If you will always have data for every single day, it's not an issue.
If you're missing a day's data, you'll still get a 30-sample average, but it'll stretch over 31 days instead of 30.
If you don't actually care about the calendar, but want to know the past 30 days of activity, this is not a problem.
If it is a problem, there are a few work-arounds. One that's probably trickier than it sounds: ensure that there is always an entry for each day.
A more robust way is to have the reporting app do this in two steps. Something like this (haven't worked out all the details, but you get the idea):
Find the number of data points in the past 30 days, using a query like select count(field_name) from measurement where time > now() - 30d.
Use this number (call it n) to form the query: select n*moving_average(field_name, n) from measurement where time > now() - 30d.
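If the reporting app is doing the work anyway, the calendar-based sum can also be computed client-side. Below is a minimal sketch under the assumption that the app has already fetched (date, value) pairs sorted by date; the name moving_sum_by_days and the sample data are hypothetical. The window is defined in days rather than in data points, so a missing day simply contributes nothing.

```python
from collections import deque
from datetime import date, timedelta


def moving_sum_by_days(points, window_days=30):
    """For each (day, value), sum the values whose day lies in the
    `window_days` calendar days ending on (and including) that day.

    Assumes `points` is sorted by day in ascending order.
    """
    window = deque()
    running = 0
    out = []
    for day, value in points:
        window.append((day, value))
        running += value
        cutoff = day - timedelta(days=window_days)
        # Drop entries that fell out of the calendar window.
        while window[0][0] <= cutoff:
            _, old_value = window.popleft()
            running -= old_value
        out.append((day, running))
    return out


# A 3-day window with Jan 3 missing from the data.
data = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), 20),
    (date(2024, 1, 4), 30),
    (date(2024, 1, 5), 40),
]
print([total for _, total in moving_sum_by_days(data, window_days=3)])
# [10, 30, 50, 70]
```

A 3-sample moving sum would report 90 for Jan 5 (20 + 30 + 40), because it reaches back to Jan 2. The day-based window reports 70.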
Yes, definitely it's possible.
Just set this part of your query like this:
SELECT sum("value") FROM "YOUR_MEASUREMENT_NAME"
WHERE $timeFilter GROUP BY time(30d) fill(null)
Just make sure that your dashboard time range covers at least the last 30 days.

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers: if the value is 0 (zero) the server is stable, otherwise it is unstable. The graph we have is at minute level. I want to show an aggregated value that tells me how many hours the server was unstable in the selected time range.
Let's say I select "Last 7 days" as the time range; we should then get X hours of server instability.
One more thing: I have a line graph (time series graph) that shows the state of the server. When I select "Last 24 hours" or "Last 48 hours" I get the graph at minute level, but when I increase the range to a quarter I get a point every 5 minutes or so. I understand that it's aggregating the values, but does anybody know how Grafana does the aggregation?
I have tried "scaleToSeconds" function and "ConsolidateBy" functions and many more to first get the count of non zero value minutes, but no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this. There are 2 places where aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query. For example, if you have a panel that queries 3 days' worth of data and you store 2 days at 1-minute and 7 days at 5-minute intervals in whisper, then you'd get 72 * 60 / 5 = 864 points from the 5-minute archive in whisper. If your graph is only 500px wide, then at runtime that would be consolidated down to 10-minute intervals and 432 points would be returned.
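The arithmetic in that example can be sketched as follows. This only mirrors the numbers in the text; Graphite's actual rounding may differ in edge cases, and the function name consolidated_points is made up.

```python
import math


def consolidated_points(range_minutes, archive_step_minutes, max_data_points):
    """Return (points read from whisper, effective step in minutes,
    points returned after runtime consolidation)."""
    raw_points = range_minutes // archive_step_minutes
    # How many stored points get merged into one returned point so that
    # the result fits into max_data_points.
    per_bucket = max(1, math.ceil(raw_points / max_data_points))
    returned = math.ceil(raw_points / per_bucket)
    return raw_points, per_bucket * archive_step_minutes, returned


# 3 days from the 5-minute archive, rendered on a 500px wide graph.
print(consolidated_points(72 * 60, 5, 500))
# (864, 10, 432)
```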
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 10), 'sum')
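To see why multiplying the average back out recovers a total, here is a small sketch on made-up data. It assumes the series holds one value per minute, 1 for an unstable minute and 0 otherwise, and that the scale factor equals the number of minutes per bucket. The name unstable_minutes is hypothetical.

```python
def unstable_minutes(per_minute_flags, bucket_minutes=10):
    """Average each bucket, scale by the bucket length, then sum.

    Assumes len(per_minute_flags) is a multiple of bucket_minutes and
    that each flag is 1 (unstable) or 0 (stable).
    """
    total = 0.0
    for i in range(0, len(per_minute_flags), bucket_minutes):
        bucket = per_minute_flags[i:i + bucket_minutes]
        bucket_avg = sum(bucket) / len(bucket)  # like summarize(..., 'avg')
        total += bucket_avg * bucket_minutes  # like scale(...), then sum
    return total


# 30 minutes of data: 3 unstable minutes, then 10 stable, then 10 unstable.
flags = [1, 1, 1] + [0] * 7 + [0] * 10 + [1] * 10
print(round(unstable_minutes(flags)))
```

The result matches a direct count of the non-zero minutes (13 here). Dividing by 60 would turn it into hours.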
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
When you say the value is zero (0), the server is healthy - what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
Grafana does aggregate - or consolidate - data typically by using the average aggregation function. You can override this using the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.

ADF reprocess daily slice 3 times per day

I have a complex ADF pipeline with slice-based scheduling, where slice = day.
Currently it works like this:
Day1, Day2, Day3, ..., PreviousDay, CurrentDay
At 00:00 of CurrentDay it reprocesses PreviousDay. So today I only have calculated data for the previous day.
I need to change the schedule so that it works like this:
1) slice size should stay the same = day
2) reprocessing for CurrentDay should be triggered 4 times per day to emulate results refresh (kinda running total)
The reason I want to keep the slice size at 1 day is that it is the partition size of the underlying tables. I don't want to make the partitions as small as a few hours, because that is meaningless for the current volume of data.
I can't figure out how to achieve this without changing the slice size to a few hours. How can I force reprocessing of the current day? Any ideas would be helpful.
Thank you.
The way to do this is to make 2 changes:
Set the availability style to StartOfInterval, so that the CurrentDay slice is run instead of PreviousDay (see "Dataset availability and policies" in the docs).
Set the schedule of the activity to frequency Hour with interval 8, which runs it 3 times per day (an interval of 6 would give 4 runs per day). See data-factory-scheduling-and-execution#specify-schedule-for-an-activity for more info. The activity and output should have matching slices; this can be fixed as described below.
Since the slices of the input (Day:1) and the activity (Hour:8) are different, you need to set two extra parameters in the activity for the input, to change the slice from 8 hours to 1 day and thus match the input. The execution is based on the output slice. This is explained further here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#model-datasets-with-different-frequencies The activity and output also have different slices, which can be fixed with the same method.

Grafana does not show graphs for longer time periods

We are using Grafana to visualise some times measured with another application. I get a data point every 5 minutes.
I get a nice graph if I only visualise the last 24 or 48h.
For longer time ranges no graph is shown.
I researched a little and found that the database has a data point for every minute, which means I get one value and four NULLs every 5 minutes. For a time range longer than 48h Grafana starts to aggregate the values and ends up with only NULL values.
Here are two pictures which show my problem (images not reproduced here): one with a time range of 24h and one with a time range of 7 days.
Are there some settings I can make to avoid this behaviour?
Thank you for your help
Are you using graphite? If so, please make sure you configured xFilesFactor correctly.
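To illustrate why xFilesFactor matters here, below is a simplified sketch of the rule whisper applies when rolling data up into a coarser archive: a bucket only gets a value if the fraction of non-null points in it is at least xFilesFactor. The default of 0.5 and the averaging method are assumptions about a typical setup, and the function name aggregate is made up.

```python
def aggregate(values, x_files_factor=0.5):
    """Average the known values of one bucket, or return None if too
    few of its slots are filled (simplified model of the whisper rule)."""
    known = [v for v in values if v is not None]
    if not values or len(known) / len(values) < x_files_factor:
        return None
    return sum(known) / len(known)


# One real value and four NULLs per 5 minutes, as in the question.
bucket = [42, None, None, None, None]
print(aggregate(bucket))       # None: only 20% of the slots are filled
print(aggregate(bucket, 0.0))  # 42.0: any known value is enough
```

With one real value out of five slots, only 20% of each bucket is known, so a factor of 0.5 turns every aggregated point into NULL. Lowering xFilesFactor for those series (and recreating or resizing the whisper files so the new setting takes effect) should make the long-range graphs show data again.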