Grafana PostgreSQL distinct on() with time series - postgresql

I'm quite new to Grafana and Postgres and could use some help with this. I have a dataset in PostgreSQL with temperature forecasts. Multiple forecasts are published at various points throughout the day (indicated by dump_date) for the same reference date. Say: at 06:00 today and at 12:00 today a forecast is published for tomorrow (where the time is indicated by start_time). Now I want to visualize the temperature forecast as a time series using Grafana. However, I only want to visualize the latest published forecast (12:00) and not both forecasts. I thought I would use DISTINCT ON() to select only the latest published forecast from this dataset, but this does not work in Grafana. My code in Grafana is as follows:
SELECT
$__time(distinct on(t_ID.start_time)),
concat('Forecast')::text as metric,
t_ID.value
FROM
forecast_table t_ID
WHERE
$__timeFilter(t_ID.start_time)
and t_ID.start_time >= (current_timestamp - interval '30 minute')
and t_ID.dump_date >= (current_timestamp - interval '30 minute')
ORDER BY
t_ID.start_time asc,
t_ID.dump_date desc
This is not working however since I get the message: 'syntax error at or near AS'. What should I do?

You are using Grafana macro $__time, so your query in the editor:
SELECT
$__time(distinct on(t_ID.start_time)),
generates SQL:
SELECT
distinct on(t_ID.start_time AS "time"),
which is incorrect SQL syntax.
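I wouldn't use the macro here. I would write correct SQL directly: DISTINCT ON is a clause that follows SELECT, not a function, and the time column is then selected and aliased separately, e.g.
SELECT
DISTINCT ON (t_ID.start_time) t_ID.start_time AS "time",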
Also use Grafana's Generated SQL and Query inspector features for debugging and query development, and make sure that Grafana generates correct SQL for Postgres.
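Putting this together, the whole query from the question might look roughly like the sketch below. It keeps the table, columns and filters from the question and assumes that only the $__time macro needs replacing; the $__timeFilter macro is left in place because it expands to a plain WHERE condition.

```sql
-- Sketch only: one row per start_time, taken from the most recent dump_date.
SELECT DISTINCT ON (t_ID.start_time)
  t_ID.start_time AS "time",      -- written by hand instead of $__time(...)
  'Forecast'::text AS metric,
  t_ID.value
FROM forecast_table t_ID
WHERE
  $__timeFilter(t_ID.start_time)
  AND t_ID.start_time >= (current_timestamp - interval '30 minute')
  AND t_ID.dump_date >= (current_timestamp - interval '30 minute')
ORDER BY
  t_ID.start_time ASC,            -- must start with the DISTINCT ON expression
  t_ID.dump_date DESC;            -- latest published forecast wins
```

The ORDER BY has to begin with the DISTINCT ON expression; the second sort key then decides which row of each group is kept.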

Related

Reorganize the data by months in grafana using psql

I'm trying to compute an AVG by month, but Grafana won't let me do it because I'm using two SELECTs, where the outer one reads from the inner one, and I can't specify my time column.
I want to get a "time series" (a Grafana panel type) that shows the average by month, but I don't know how to do it with psql and the code I have.
This is the code im using:
SELECT AVG(lol), NOW() as time FROM
(SELECT COUNT(DISTINCT(ticket_id)), SUM(time_spent_minutes) AS "lol"
FROM ticket_messages
WHERE admin_id IN ('20439', '20457', '20291', '20371', '20357', '20235','20449','20355','20488')
GROUP BY ticket_id) as media
Where is the temporal column coming from? As written, your query does not reference any time column, so there is nothing to group by month.
I would think that you probably have a ticket_date column available somewhere, in which case the query could become
SELECT
EXTRACT(MONTH FROM ticket_date) ticket_month,
SUM(time_spent_minutes)/COUNT(DISTINCT ticket_id) avg_time
FROM ticket_messages
WHERE
admin_id IN ('20439', '20457', '20291', '20371', '20357', '20235','20449','20355','20488')
GROUP BY EXTRACT(MONTH FROM ticket_date)
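A possible variant for a Grafana time series panel is sketched below. It still assumes the hypothetical ticket_date column from the answer above. It uses date_trunc so that the same month in different years is not merged and the result is a real timestamp aliased as "time", and it casts the sum to numeric in case time_spent_minutes is an integer column (otherwise the division would be integer division).

```sql
-- Sketch only: ticket_date is an assumed column, as in the answer above.
SELECT
  date_trunc('month', ticket_date) AS "time",   -- first instant of each month
  SUM(time_spent_minutes)::numeric
    / COUNT(DISTINCT ticket_id) AS avg_time     -- average minutes per ticket
FROM ticket_messages
WHERE
  admin_id IN ('20439', '20457', '20291', '20371', '20357', '20235','20449','20355','20488')
  AND $__timeFilter(ticket_date)                -- Grafana dashboard time range
GROUP BY 1
ORDER BY 1;
```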

Continuous aggregates in postgres/timescaledb requires time_bucket-function?

I have a SELECT query which gives me the aggregated sum(minutes_per_hour_used) of some stuff, grouped by id, weekday and observed hour.
SELECT id,
extract(dow from observed_date) AS weekday, -- observed_date is type date
observed_hour, -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
The result looks nice, but now I would like to store that in a self-maintained view, which only considers/aggregates the last 8 weeks. I thought continuous aggregates were the right way, but I can't make it work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to somehow use the time_bucket function, but I don't know how. Any ideas/hints?
I am using postgres with timescaledb.
EDIT: This gives me the desired output, but I can't put it in a continuous aggregate
SELECT id,
extract(dow from observed_date) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
EDIT: Prepending this with
CREATE VIEW my_view
WITH (timescaledb.continuous) AS
gives me [0A000] ERROR: invalid SELECT query for continuous aggregate
Continuous aggregates require grouping by time_bucket:
SELECT <grouping_exprs>, <aggregate_functions>
FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
[ <optional grouping exprs> ]
[HAVING ...]
It should be applied to a partitioned column, which is usually the time dimension column used in the hypertable creation. Also ORDER BY is not supported.
In the case of the aggregate query in the question, no time column is used for grouping. Neither weekday nor observed_hour is a valid time column, since they don't increase with time; instead their values repeat regularly. weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks the requirements for continuous aggregates.
Since there is no ready solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:
CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
time_bucket('1 day', observed_date) AS day,
observed_hour,
sum(minutes_per_hour_used) AS minutes_per_hour_used -- aliased so the query on top can refer to it by name
FROM base_table
GROUP BY 1, 2, 3;
Then execute the targeted aggregate query on top of it:
SELECT id,
extract(dow from day) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
Another approach is to use a PostgreSQL materialized view and refresh it on a regular basis with the help of a custom job, which is run by the job scheduling framework of TimescaleDB. Note that the refresh will recalculate the entire view, which in the example case covers 8 weeks of data. The materialized view can be written in terms of the original table base_table or in terms of the continuous aggregate suggested above.
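A rough sketch of this second approach follows. It assumes TimescaleDB 2.x, where custom jobs are registered with add_job. The names weekly_usage and refresh_weekly_usage, the minutes_used alias and the one-hour schedule are made up for illustration.

```sql
-- Plain PostgreSQL materialized view over the last 8 weeks (hypothetical name).
CREATE MATERIALIZED VIEW weekly_usage AS
SELECT id,
       extract(dow from observed_date) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used) AS minutes_used
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour;

-- Custom job body: procedures run by the scheduler take (job_id, config).
CREATE OR REPLACE PROCEDURE refresh_weekly_usage(job_id int, config jsonb)
LANGUAGE plpgsql AS
$$
BEGIN
  -- Recomputes the whole view, i.e. all 8 weeks, on every run.
  REFRESH MATERIALIZED VIEW weekly_usage;
END
$$;

-- Register the procedure to run once per hour (interval chosen arbitrarily).
SELECT add_job('refresh_weekly_usage', '1 hour');
```

Because the 8-week window is evaluated at refresh time, the view is only as current as its last refresh.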

How to use multiple distinct on statements in Postgres while retaining the correct order

I have a table with weather forecasts in Postgres that looks like this
Here, a wind and a solar forecast is published every 15 minutes for the same time. I wish to select the latest wind and solar forecast from this table using a distinct on() statement. However, when I use this only on the time column, it deletes the wind forecast since that forecast is dumped one minute before the solar forecast. I have tried using distinct on(time, forecast) but then the order somehow is messed up and I no longer take the latest dump_date (see below)
How can I use a distinct on() statement on multiple columns while still retaining the order? The query I'm using now is
select
distinct on ("time", "forecast") *
from table
order by "time"
It is important that this query stays dynamic, so hardcoding the dump_date is not an option for me.
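I'd add dump_date DESC to the ORDER BY clause. The ORDER BY must start with the DISTINCT ON expressions, and the remaining sort keys decide which row of each (time, forecast) group is kept: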
SELECT DISTINCT ON (time, forecast)
*
FROM t
ORDER BY time, forecast, dump_date DESC
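To see the effect in isolation, here is a small self-contained sketch with made-up sample rows (the value column and all timestamps are invented for illustration). Two dumps exist per forecast type for the same target time.

```sql
-- Sketch with invented sample data; only the latest dump per
-- ("time", forecast) pair should survive.
WITH t ("time", forecast, dump_date, value) AS (
  VALUES
    (timestamp '2021-06-01 12:00', 'wind',  timestamp '2021-06-01 09:44', 10.0),
    (timestamp '2021-06-01 12:00', 'solar', timestamp '2021-06-01 09:45', 20.0),
    (timestamp '2021-06-01 12:00', 'wind',  timestamp '2021-06-01 09:59', 11.0),
    (timestamp '2021-06-01 12:00', 'solar', timestamp '2021-06-01 10:00', 21.0)
)
SELECT DISTINCT ON ("time", forecast) *
FROM t
ORDER BY "time", forecast, dump_date DESC;
-- Keeps the solar row dumped at 10:00 (value 21.0)
-- and the wind row dumped at 09:59 (value 11.0).
```

Nothing is hardcoded here: the newest dump_date is picked per group by the sort order alone.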

Postgres SQL query runs very slow when using current_date

We have a SQL query using a where clause to filter the data set by date. Currently the where clause in the query is set as follows -
WHERE date_field BETWEEN current_date - integer #interval AND current_date (where the interval is the last 90 days)
This query has suddenly started to slow down for the past couple of days. It has started to take upwards of 10 mins. If we remove the current_date from this query and hard code dates in this query it runs like it used to previously in less than 10 seconds.
Hard-coded version
WHERE date_field BETWEEN '03-12-2020' AND '06-12-2020'
This query is run against the Postgres engine in Amazon Aurora. The same where clause is used in other queries that filter on the same field, and those are not impacted by this issue.
I am trying to figure out how we can determine what caused this issue suddenly and how we can resolve this?
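One way to start investigating, sketched here with a hypothetical table name my_table, is to compare the execution plans of the two variants and to refresh the planner statistics. This is a diagnostic starting point under those assumptions, not a confirmed fix.

```sql
-- Note: EXPLAIN ANALYZE actually executes the query, so the slow variant
-- will take as long as it normally does.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table   -- hypothetical table name
WHERE date_field BETWEEN current_date - 90 AND current_date;

-- Same query with literal dates, for comparison of the chosen plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table
WHERE date_field BETWEEN '03-12-2020' AND '06-12-2020';

-- Refresh planner statistics in case they are stale after recent data growth.
ANALYZE my_table;
```

If the two plans differ (for example, an index scan versus a sequential scan, or very different row estimates), that difference is the place to look next.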

How to execute SELECT DISTINCT ON query using SQLAlchemy

I have a requirement to display spend estimation for last 30 days. SpendEstimation is calculated multiple times a day. This can be achieved using simple SQL query:
SELECT DISTINCT ON (date) date(time) AS date, resource_id , time
FROM spend_estimation
WHERE
resource_id = '<id>'
and time > now() - interval '30 days'
ORDER BY date DESC, time DESC;
Unfortunately I can't seem to be able to do the same using SQLAlchemy. It always creates select distinct on all columns. Generated query does not contain distinct on.
query = session.query(
func.date(SpendEstimation.time).label('date'),
SpendEstimation.resource_id,
SpendEstimation.time
).distinct(
'date'
).order_by(
'date',
SpendEstimation.time
)
SELECT DISTINCT
date(time) AS date,
resource_id,
time
FROM spend
ORDER BY date, time
It is missing the ON (date) bit. If I use query.group_by, then SQLAlchemy adds DISTINCT ON, though I can't think of a solution to the given problem using group by.
Tried using function in distinct part and order by part as well.
query = session.query(
func.date(SpendEstimation.time).label('date'),
SpendEstimation.resource_id,
SpendEstimation.time
).distinct(
func.date(SpendEstimation.time).label('date')
).order_by(
func.date(SpendEstimation.time).label('date'),
SpendEstimation.time
)
Which resulted in this SQL:
SELECT DISTINCT
date(time) AS date,
resource_id,
time,
date(time) AS date -- only difference
FROM spend
ORDER BY date, time
Which is still missing DISTINCT ON.
Your SQLAlchemy version might be the culprit.
The question "Sqlalchemy with postgres. Try to get 'DISTINCT ON' instead of 'DISTINCT'" links to this bug report:
https://bitbucket.org/zzzeek/sqlalchemy/issues/2142
The fix wasn't backported to 0.6; it looks like it was fixed in 0.7.
Stupid question: have you tried distinct on SpendEstimation.date instead of 'date'?
EDIT: It just struck me that you're trying to use the named column from the SELECT. SQLAlchemy is not that smart. Try passing in the func expression into the distinct() call.