Behaviour when querying a timespan covering two retention periods - Grafana

I have a dashboard in Grafana showing some metrics from Graphite.
I'm using storage-schemas.conf:
[some_metrics]
pattern = ^some_metrics\..*
retentions = 24h:60d,5d:1y,30d:5y
and storage-aggregation.conf:
[some_metrics]
pattern = ^some_metrics\.
xFilesFactor = 0
aggregationMethod = max
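For reference, each entry in the retentions line is a resolution:duration pair in standard Graphite syntax:
24h:60d - one data point per 24 hours, kept for 60 days
5d:1y - one data point per 5 days, kept for 1 year
30d:5y - one data point per 30 days, kept for 5 years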
The problem is that I can't view data in a Grafana graph over a time range that spans multiple retention periods. If I check a now-2y time span, I only see data from 1y to 5y. If I check now-30d to now-90d, I only see the 60d-1y samples.
I expected that for a 0-1y time span the 0-60d data would be included and downsampled to 5d resolution. Is that right?


Don't see all points in Grafana on lower scales

At a lower scale I am obviously seeing several outliers, the maximum of which is 18211.
If I zoom in, I start to see additional outliers.
Is it possible to configure Grafana to show all points all the time or aggregate them differently?
Backend is Graphite.
No, this is not possible, due to space limitations.
For example:
Suppose you have 60 slots and you want to fill them with numbers.
If the time range is one hour, then each of these slots will display the metric stored for each minute.
But if you shrink that range to one minute, each of these slots will display the metric stored for each second.
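If the goal is to keep the spikes visible when points get merged, one option is to change how Graphite consolidates the data rather than trying to show every point. A minimal sketch using Graphite's consolidateBy function (my.series is a placeholder for your metric path), switching the runtime consolidation from the default average to max:
consolidateBy(my.series, 'max')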

What is the default Grafana setting for $__rate_interval

I understand that rate(xyz[5m]) * 60 is the rate of xyz per minute, averaged over 5 mins.
How then would $__rate_interval and $__interval be defined,
possibly in the same syntax?
In what unit is the rate being measured here, in my panel? Per minute, per second?
What is the interval = 30s in my panel here? My scrape interval is set to 5s.
How do I change the rate format?
See New in Grafana 7.2: $__rate_interval for Prometheus rate queries that just work.
Rate is always per second. See the Grafana documentation for the rate function.
Click on Query options, then click on the info symbol. An explanation will be displayed.
To get the rate per minute, just multiply the rate by 60.
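For example, combining the two (xyz is the metric name from the question):
rate(xyz[$__rate_interval]) * 60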
Edit: ($__rate_interval and $__interval)
Prometheus periodically fetches data from your application. Grafana periodically fetches data from Prometheus. Grafana does not know how often Prometheus polls your application for data. Grafana will estimate this time by looking at the data. The $__interval variable expands to the duration between two data points in the graph. (Note that this is only true for small time ranges and high resolution, as the intended use case for $__interval is reducing the number of data points when the time range is wide. See Approximate Calculation of $__interval.)
If the time-distance between every two data points in each series is 15 seconds, it does not make sense to use anything less than [15s] as the interval in the rate function. The rate function works best with at least 4 data points. Therefore [1m] would be much better than anything between [15s] and [1m]. This is what $__rate_interval tries to achieve: guessing a minimal sensible interval for the rate function.
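If I read the blog post referenced above correctly, the definition Grafana uses is roughly:
$__rate_interval = max($__interval + scrape interval, 4 * scrape interval)
where the scrape interval is taken from the query's Min step setting (or the data source's scrape interval setting).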
Personally, I think this does not always work if your application delivers sparse data. I prefer using fixed intervals like 10m or even 1h or 1d in these situations. The interval needs to be large enough to give the rate function enough data points to work with.
A different approach would be to use either $__rate_interval or $__interval but also set the Min step parameter for the query in the Grafana UI to be big enough.
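As a sketch of those two alternatives (again with xyz as the metric from the question): either hard-code a generous window, e.g.
rate(xyz[10m])
or keep rate(xyz[$__rate_interval]) and raise Min step (say to 1m) in the query options, so the computed interval never drops below that.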

How to do a distinct count of a metric using the Graphite datasource in Grafana?

I have a metric that shows the state of a server. The values are integers and if the value is 0 (zero) then the server is stable, else it is unstable. And the graph we have is at a minute level. So, I want to show an aggregated value to know how many hours the server is unstable in the selected time range.
Let's say I select "Last 7 days" as the time duration... we should get X hours of instability for the server.
And one more thing: I have a line graph (time series graph) that shows the state of the server. The thing is, when I select "Last 24 hours or 48 hours" I get the graph at a minute level; when I increase the duration to a quarter I get the graph at every 5 minutes or something like that. I understand it's aggregating the values, but does anybody know how Grafana is doing the aggregation?
I have tried the scaleToSeconds and consolidateBy functions and many more to first get the count of non-zero-value minutes, but with no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this, there are 2 places that aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query: if, for example, you have a panel that queries 3 days' worth of data and you store 2 days at 1-minute and 7 days at 5-minute intervals in whisper, then you'd have 72 * 60 / 5 = 864 points from the 5-minute archive in whisper, but if your graph is only 500px wide then at runtime that would be consolidated down to 10-minute intervals and return 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
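As a sketch, the storage-aggregation.conf entry could look like the following (the section name and pattern are placeholders; adjust them to whatever matches your series):
[server_state]
pattern = ^servers\.state\.
xFilesFactor = 0
aggregationMethod = sum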
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
When you say the value is zero (0), the server is healthy - what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
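If it really is just 0/1 values, a sketch of that sumSeries idea (the wildcard metric path is purely hypothetical):
sumSeries(servers.*.state)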
Grafana does aggregate - or consolidate - data typically by using the average aggregation function. You can override this using the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.
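For example (a sketch; my.series is a placeholder and $summarize_interval would be a template variable you define yourself):
summarize(my.series, '$summarize_interval', 'sum')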

Grafana / InfluxDB: How many minutes was the metric value = 1 during the selected period?

Grafana / InfluxDB: So I have a boolean (0 or 1). I want to know how many minutes the value was 1 during a period I wish to select (several days). Can it be done? Thanks.
It will be tricky to calculate this in InfluxDB; however, Grafana has a third-party Discrete panel, where you can calculate it at the app/panel level. The panel calculation runs in the browser, so it can be slow if you have many data points in the selected time period.
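If the boolean is written at a regular one-minute interval, one rough InfluxQL sketch is to count the points equal to 1, since each such point then represents one minute (the measurement and field names are placeholders; $timeFilter is Grafana's time-range macro):
SELECT count("value") FROM "server_state" WHERE "value" = 1 AND $timeFilter
The assumption of regular sampling is exactly what makes this fragile, which is why the panel-level approach above may be easier.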

ADF reprocess daily slice 3 times per day

I have a complex ADF pipeline with slice-based scheduling, where slice = day.
Now it works like this:
Day1, Day2, Day3, ..., PreviousDay, CurrentDay
At 00:00 of CurrentDay it reprocesses PreviousDay. So for today I have calculated data for the previous day only.
I need to change the schedule so that it works like this:
1) the slice size should stay the same = day
2) reprocessing of CurrentDay should be triggered 4 times per day to emulate a results refresh (a kind of running total)
The reason why I want to keep the slice size at 1 day is that it is the partition size of the underlying tables. I don't want to make them as small as a few hours because that is meaningless for the current volume of data.
I cannot figure out how to achieve this without shrinking the slice size to a few hours. How can I force reprocessing of the current day? Any ideas would be helpful.
Thank you.
The way to do this is to make 2 changes:
Set the availability to StartOfInterval, thus running the CurrentDay instead of the PreviousDay. See Dataset availability and policies.
Set the schedule of the activity to frequency Hour with interval 8, thereby running it every 8 hours, i.e. three times per day (see data-factory-scheduling-and-execution#specify-schedule-for-an-activity for more info). The activity and output should have matching slices; this can be fixed as described below.
Since the slices of the input (Day: 1) and the activity (Hour: 8) are different, you need to set two extra parameters on the activity's input so that the 8-hour activity slice maps onto the 1-day input slice. The execution is based on the output slice. This is explained further here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#model-datasets-with-different-frequencies. The activity and output also have different slices, which can be fixed with the same method.
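A rough sketch of the two pieces in ADF v1 JSON (names and exact placement are illustrative only; adjust to your pipeline): on the daily dataset,
"availability": {
    "frequency": "Day",
    "interval": 1,
    "style": "StartOfInterval"
}
and on the activity,
"scheduler": {
    "frequency": "Hour",
    "interval": 8
}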