How to eliminate the last residual data point in an InfluxDB aggregateWindow query

I have a question about an InfluxDB query.
I am trying to aggregate the data per day and get the daily medians.
The timestamps of the results are truncated to the start of the day (00:00:00.000).
However, the query returns one extra data point at the end whose timestamp is not truncated to the start of the day.
How can I truncate the last data point's time to the start of the day, or ignore the last value?
My query:
from(bucket: "metric")
|> range(start: -30d, stop: 0d)
|> filter(fn: (r) => r["_measurement"] == "metric")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["metric"] == "SOME_METRIC")
|> aggregateWindow(every: 1d, fn: median, createEmpty: true)
|> yield(name: "median")
I added the query results, and the text below explains my situation.
What I am trying to get is these points
(let's say today is 17.02.2022):
15.02.2022 00:00:00:000 - 16.02.2022 00:00:00:000 - 17.02.2022 00:00:00:000
But I got:
15.02.2022 00:00:00:000 - 16.02.2022 00:00:00:000 - 17.02.2022 00:00:00:000 - 17.02.2022 05:30:27:437
Thanks in advance.

OK, I figured out that I must give exact dates in the time range instead of the relative -30d notation.
from(bucket: "metric")
|> range(start: 2022-01-16T00:00:00Z, stop: 2022-02-17T00:00:00Z)
|> filter(fn: (r) => r["_measurement"] == "metric")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["metric"] == "SOME_METRIC")
|> aggregateWindow(every: 1d, fn: median, createEmpty: true)
|> yield(name: "median")
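If hard-coding dates is inconvenient, the same effect can probably be had by truncating the relative bounds to day boundaries. By default aggregateWindow stamps each output row with the _stop of its window, and the last window of a range that ends at now() is a partial one whose _stop is now(), which is where the stray point comes from. A minimal sketch, assuming the bucket and filters from the question and a Flux version in which date.truncate accepts a relative duration (older versions may need an absolute time):

```flux
import "date"

// Align both range bounds to midnight (UTC) so every window is a whole day.
// The names start and stop are chosen for this sketch only.
start = date.truncate(t: -30d, unit: 1d)
stop = date.truncate(t: now(), unit: 1d)

from(bucket: "metric")
    |> range(start: start, stop: stop)
    |> filter(fn: (r) => r["_measurement"] == "metric")
    |> filter(fn: (r) => r["_field"] == "value")
    |> filter(fn: (r) => r["metric"] == "SOME_METRIC")
    |> aggregateWindow(every: 1d, fn: median, createEmpty: true)
    |> yield(name: "median")
```

This leaves out today's partial day. To keep it and still get midnight timestamps, one option is to use now() as the stop and pass timeSrc: "_start" to aggregateWindow, which stamps each row with the start of its window instead of the end.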

I ran into this issue too, and I ended up not using aggregateWindow. Instead, try using the window function followed by the aggregate function, like so:
from(bucket: "metric")
|> range(start: -30d, stop: 0d)
|> filter(fn: (r) => r["_measurement"] == "metric")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["metric"] == "SOME_METRIC")
|> window(every: 1d)
|> median()
|> group()
|> yield(name: "median")
This seemed to resolve my issue...

I will show this on example data.
In my case I filter the data with this range:
|> range(start: 2022-11-11T10:41:10.589Z, stop: 2022-11-11T17:47:05.518Z)
So from this we can see what _start and _stop are.
When we add the window function
|> window(every: 1m)
_start and _stop clearly behave oddly in the last row: the last window is cut off at the stop of the range, so its _stop is the range stop rather than a whole minute.
If we use the solution with the window and median functions, we lose the _time column. You can read more about that here
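One way to keep a _time column with the window-plus-aggregate approach is to copy a window boundary into _time and then un-window, which is roughly what aggregateWindow does internally. A sketch, assuming the bucket and filters from the original question; copying _start rather than _stop is a choice made here so that each timestamp marks the beginning of its window:

```flux
from(bucket: "metric")
    |> range(start: -30d)
    |> filter(fn: (r) => r["_measurement"] == "metric")
    |> filter(fn: (r) => r["_field"] == "value")
    |> filter(fn: (r) => r["metric"] == "SOME_METRIC")
    |> window(every: 1d)
    |> median()
    // median() drops _time; rebuild it from the window's start boundary.
    |> duplicate(column: "_start", as: "_time")
    // Merge the per-day tables back into one table per series.
    |> window(every: inf)
    |> yield(name: "median")
```

The first window still begins at the start of the range, so its timestamp is only a midnight value if the range start is aligned as well.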
Solution #1 (round the time range)
Round the time range you filter on.
Normally, in Grafana or a similar tool, we would use this form of range:
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
but for the test I will use:
import "date"
start = date.time(t: 2022-11-11T10:41:10.589Z)
end = date.time(t: 2022-11-11T17:47:05.518Z)
start_rounded = date.truncate(t: start, unit: 1m) // we can pass v.windowPeriod here
end_rounded = date.truncate(t: end, unit: 1m)
my_data = from(bucket: "gen")
|> range(start: start_rounded, stop: end_rounded)
Thanks to this we get rid of the last row
(from the range documentation):
Results exclude rows with _time values that match the specified stop time.
Solution #2 (override _start, _stop)
|> drop(columns: ["_start", "_stop"])
|> duplicate(column: "_time", as: "_start")
|> duplicate(column: "_time", as: "_stop")
but now we depend on how _time is rounded, so this is not a perfect solution if your data isn't aggregated (not recommended).
Screenshots are taken from the InfluxDB Data Explorer.

Related

influxdb2 / grafana - top x of time series values

I am looking for a way to get the top X values from a time series in InfluxDB;
the granularity is one minute.
Imported CSV:
#datatype measurement,tag,tag,double,double,double,dateTime:number
HDD,server,hdd_id,Read_IOPS,Write_IOPS,Read_ms,time
HDD,srv1,hdd1,35,33,1,1671233940
HDD,srv1,hdd1,24,69,1,1671234000
HDD,srv1,hdd1,97,57,2,1671234060
HDD,srv1,hdd1,30,78,2,1671234120
HDD,srv1,hdd1,53,83,2,1671234180
HDD,srv1,hdd1,56,85,2,1671234240
HDD,srv1,hdd1,32,25,22,1671234300
HDD,srv1,hdd1,29,89,6,1671234360
HDD,srv1,hdd1,33,41,1,1671234420
HDD,srv1,hdd1,22,15,8,1671234480
HDD,srv1,hdd1,24,95,4,1671234540
…
HDD,srv1,hdd2,35,33,1,1671233940
HDD,srv1,hdd2,24,69,1,1671234000
HDD,srv1,hdd2,97,57,2,1671234060
HDD,srv1,hdd2,30,78,2,1671234120
HDD,srv1,hdd2,53,83,2,1671234180
HDD,srv1,hdd2,56,85,2,1671234240
HDD,srv1,hdd2,32,25,22,1671234300
HDD,srv1,hdd2,29,89,6,1671234360
HDD,srv1,hdd2,33,41,1,1671234420
HDD,srv1,hdd2,22,15,8,1671234480
HDD,srv1,hdd2,24,95,4,1671234540
…
HDD,srv1,hdd3,35,33,1,1671233940
HDD,srv1,hdd3,24,69,1,1671234000
HDD,srv1,hdd3,97,57,2,1671234060
HDD,srv1,hdd3,30,78,2,1671234120
HDD,srv1,hdd3,53,83,2,1671234180
HDD,srv1,hdd3,56,85,2,1671234240
HDD,srv1,hdd3,32,25,22,1671234300
HDD,srv1,hdd3,29,89,6,1671234360
HDD,srv1,hdd3,33,41,1,1671234420
HDD,srv1,hdd3,22,15,8,1671234480
HDD,srv1,hdd3,24,95,4,1671234540
…
I tried this query (Flux, InfluxDB 2), but only 2 points are displayed on the Grafana dashboard:
-> I would like to display, for example, the top 2 busiest HDDs by Read_IOPS
import "strings"
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "HDD")
|> drop(columns: ["server"])
|> filter(fn: (r) => r._field == "HDD_Read_IOPS")
|> map(fn: (r) => ({r with newhddid: strings.substring(v: r.hdd_id, start: 0, end: 4)}))
|> group(columns: ["newhddid"])
|> highestMax(n: 2, groupColumns: ["newhddid"])
|> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
Many thanks for any help.
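Part of what happens in the query above is that highestMax returns only the n highest rows, so the aggregateWindow that follows has just two points to work with. One possible approach, sketched here rather than taken from the thread, is to first find the hdd_id values of the top 2 disks and then filter the full series by them. The field name is copied from the question; the names data and topIds are made up for this sketch.

```flux
// Full series, one table per disk.
data = from(bucket: v.bucket)
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "HDD")
    |> filter(fn: (r) => r._field == "HDD_Read_IOPS")
    |> group(columns: ["hdd_id"])

// IDs of the 2 disks with the highest maximum value in the range.
topIds = data
    |> highestMax(n: 2, groupColumns: ["hdd_id"])
    |> findColumn(fn: (key) => true, column: "hdd_id")

// Keep every point of those disks, then downsample for the panel.
data
    |> filter(fn: (r) => contains(value: r.hdd_id, set: topIds))
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
```

Ranking by the maximum follows the highestMax call in the question; highestAverage would rank by the mean instead, if "busiest" is meant that way.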

Moving average in Grafana with Flux

I want to show 2 variables in a time series panel: one as the plain _value, and the other as a moving average over, let's say, 6 hours. I cannot figure out the syntax to apply the moving average to just one of those 2 series.
from(bucket: "ns")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "openaps")
|> filter(fn: (r) => r["_field"] == "tdd" or r["_field"] == "isf")
|> filter(fn: (r) => r._value > 0)
|> timedMovingAverage(every: 5m, period: 300m)
I only got this far; all my attempts with if statements failed.
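One way this might be done is to split the query into two streams and apply timedMovingAverage to only one of them. A sketch, assuming that isf is the field to be smoothed (the question does not say which one) and using a 6h period as described in the text rather than the 300m in the query; the name base is made up here:

```flux
// Shared part of the query.
base = from(bucket: "ns")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "openaps")
    |> filter(fn: (r) => r._value > 0)

// First series: raw values.
base
    |> filter(fn: (r) => r["_field"] == "tdd")
    |> yield(name: "tdd_raw")

// Second series: mean over the previous 6 hours, emitted every 5 minutes.
base
    |> filter(fn: (r) => r["_field"] == "isf")
    |> timedMovingAverage(every: 5m, period: 6h)
    |> yield(name: "isf_moving_average")
```

Each yield shows up as its own result, so the panel can display both.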

Combine two query results in grafana

I have what I had thought was a simple use case, but it's turning out to be quite difficult.
I have 2 InfluxDB buckets: one that logs my electricity meter price and the day vs. night rate, and another that logs the energy being imported.
What I would like to do is combine these to generate graphs of the amount of energy used on the day rate and on the night rate.
I can query the data with the following Flux commands:
Get night rate (boolean)
import "strings"
from(bucket: "home")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r["friendly_name"] == "Is NightRate")
|> map(fn: (r) => ({r with _value: strings.toLower(v: r._value)}))
|> toBool()
|> toFloat()
Get Energy Imported
from(bucket: "PV")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "PV")
|> filter(fn: (r) => r["_field"] == "Total_Energy_Purchased")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> difference()
These return a different number of rows: 2 in the first query and 24 in the second (for one day).
I basically want to multiply one by the other so that it shows the usage only when the day rate is 1. Any ideas how this can be done?
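A possible way to line the two results up is to resample the night-rate flag to the same 1h grid, carry the last known flag forward, and join on _time. This is only a sketch under several assumptions: the flag is a single series, both queries end up with timestamps on the same hourly boundaries, and the names night and energy are made up here.

```flux
import "strings"

// 1.0 while the night rate applies, 0.0 otherwise, one row per hour.
night = from(bucket: "home")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["friendly_name"] == "Is NightRate")
    |> map(fn: (r) => ({r with _value: strings.toLower(v: r._value)}))
    |> toBool()
    |> toFloat()
    |> aggregateWindow(every: 1h, fn: last, createEmpty: true)
    |> fill(usePrevious: true)
    // Hours before the first flag in the range have nothing to carry forward.
    |> filter(fn: (r) => exists r._value)
    |> keep(columns: ["_time", "_value"])

// Energy imported per hour, as in the question.
energy = from(bucket: "PV")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "PV")
    |> filter(fn: (r) => r["_field"] == "Total_Energy_Purchased")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> difference()
    |> keep(columns: ["_time", "_value"])

// Day-rate usage: energy counted only in hours where the night flag is 0.
join(tables: {energy: energy, night: night}, on: ["_time"])
    |> map(fn: (r) => ({_time: r._time, _value: r._value_energy * (1.0 - r._value_night)}))
    |> yield(name: "day_rate_usage")
```

Multiplying by r._value_night instead of (1.0 - r._value_night) would give the night-rate usage.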

Multiply two irregular series (Influx/flux)

After reading a lot of forums I still can’t figure this out…
I have two irregular series. The first one reads data from my power cabinet ([kW], one sample roughly every 2 seconds). The second one is the price ([local currency / kWh]), 2-3 samples every hour.
My intention is to calculate the total cost for a given time range (multiplying consumption and price, and probably integrating under that curve?).
from(bucket: "amsdata")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "consumption")
|> filter(fn: (r) => r["_field"] == "kW")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
from(bucket: "amsdata")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "spotprice")
|> filter(fn: (r) => r["_field"] == "price")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
Power consumption and price
I will appreciate your help :)
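One rough way to approach this is to bring both series onto a common hourly grid first: the mean power in kW over one hour equals the energy in kWh for that hour, which can then be multiplied by the mean price of the same hour and summed. A sketch, assuming both fields are floats and that an hourly resolution is accurate enough; the names power and price are made up here, and partial hours at the edges of the range are treated as full hours.

```flux
// Mean power per hour in kW, numerically equal to kWh used in that hour.
power = from(bucket: "amsdata")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "consumption")
    |> filter(fn: (r) => r["_field"] == "kW")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> keep(columns: ["_time", "_value"])

// Mean price per hour in local currency per kWh.
price = from(bucket: "amsdata")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "spotprice")
    |> filter(fn: (r) => r["_field"] == "price")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> keep(columns: ["_time", "_value"])

// Cost per hour, then the total over the range.
join(tables: {power: power, price: price}, on: ["_time"])
    |> map(fn: (r) => ({_time: r._time, _value: r._value_power * r._value_price}))
    |> sum()
    |> yield(name: "total_cost")
```

Dropping the sum() step leaves the cost per hour, which can be graphed directly.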

Query delta between two days

I have energy consumption data that is shown in Grafana in 1h blocks per day. The data gets written every 5 seconds and needs to be summed up.
[image: histogram]
That's the query:
[image: query]
I want to have another chart that shows the difference between the current consumption and yesterday's consumption in the same style.
The problem is that I can't figure out how to use the InfluxDB difference function correctly.
Any ideas?

I couldn't find any solution with the regular InfluxDB query language (InfluxQL), but there is a solution using Flux:
today = from(bucket: "piMeter")
|> range(start: -31d)
|> filter(fn: (r) => r._measurement == "downsampled_energy" and r._field == "sum_Gesamt")
|> fill(value: 0.0)
|> aggregateWindow(every: 1d, fn:sum)
yesterday = from(bucket: "piMeter")
|> range(start: -62d, stop: -31d)
|> filter(fn: (r) => r._measurement == "downsampled_energy" and r._field == "sum_Gesamt")
|> fill(value: 0.0)
|> aggregateWindow(every: 1d, fn:sum)
join(tables:{today:today, yesterday:yesterday}, on:["_field"])
|> map(fn:(r) => ({
_time: r._time_today,
_value: r._value_today - r._value_yesterday,
}))
|> fill(value: 0.0)
|> aggregateWindow(every:1d , fn:mean)
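A different technique that may fit the "same hour yesterday" comparison more directly is timeShift: query the window that lies one day earlier, shift its timestamps forward by one day, and join on _time. This is a sketch and not the code above; the measurement and field names are copied from it, while the 7-day range and the 1h window are assumptions.

```flux
// Hourly sums for the period shown in the panel.
today = from(bucket: "piMeter")
    |> range(start: -7d)
    |> filter(fn: (r) => r._measurement == "downsampled_energy" and r._field == "sum_Gesamt")
    |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
    |> keep(columns: ["_time", "_value"])

// The same period one day earlier, moved forward so the timestamps line up.
yesterday = from(bucket: "piMeter")
    |> range(start: -8d, stop: -1d)
    |> filter(fn: (r) => r._measurement == "downsampled_energy" and r._field == "sum_Gesamt")
    |> aggregateWindow(every: 1h, fn: sum, createEmpty: false)
    |> timeShift(duration: 1d)
    |> keep(columns: ["_time", "_value"])

// Positive values mean more consumption than in the same hour the day before.
join(tables: {today: today, yesterday: yesterday}, on: ["_time"])
    |> map(fn: (r) => ({_time: r._time, _value: r._value_today - r._value_yesterday}))
    |> yield(name: "delta")
```

Hours that exist in only one of the two streams are dropped by the join, so gaps in the data show up as gaps in the delta.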