Adding a column with time calculations in PostgreSQL

Suppose I have data as shown in the image.
I want to create a third column that lists the distinct alarm names, taken from the Alarm name column, that occurred within a 20-minute window, so I can understand which alarms are related.

I am not sure I fully understand the output you want, but you can collect the error messages using array_agg() as a window function whose frame spans from 10 minutes before to 10 minutes after the "current" timestamp. A RANGE frame with an interval offset requires PostgreSQL 11 or later.
Something along these lines:
select created_at,
       error_message,
       array_agg(error_message) over (
           order by created_at
           range between interval '10 minutes' preceding
                     and interval '10 minutes' following
       ) as nearby_errors
from error_log
order by created_at;
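The question asks for the different alarm names, while the window version above repeats a name once per occurrence. PostgreSQL does not implement DISTINCT inside window function calls, so one possible workaround is a correlated subquery. This is only a sketch that reuses the table and column names from the answer above (error_log, created_at, error_message), which are themselves assumptions about the real schema:

```sql
-- Sketch: distinct messages within +/- 10 minutes of each row.
-- The subquery runs once per outer row, so an index on created_at helps.
select e.created_at,
       e.error_message,
       (select array_agg(distinct n.error_message)
        from error_log n
        where n.created_at between e.created_at - interval '10 minutes'
                               and e.created_at + interval '10 minutes') as nearby_errors
from error_log e
order by e.created_at;
```

Unlike the RANGE frame, this form does not depend on PostgreSQL 11, at the cost of one extra lookup per row.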

Related

How to select data between two dates using only the start date?

I have a problem selecting data between two dates when only a start_date is available.
For example, I want to see which discount_nr was active between 2020-07-01 and 2020-07-15, or on a single day such as 2020-07-14. I tried different solutions (date ranges, generate_series, and so on) but was not able to get it to work.
The table only has start dates, no end dates.
Example:
discount_nr, start_date
1, 2020-06-30
2, 2020-07-03
3, 2020-07-10
4, 2020-07-15
You can get the end dates by looking at the start date of the next row. This is done with lead. lead(start_date) over(order by start_date asc) will get you the start_date of the next row. If we take 1 day from that we'll get the inclusive end date.
Rather than separate start/end columns, a single daterange column is easier to work with. You can use that as a CTE or create a view.
create view discount_durations as
select
    discount_nr,
    daterange(
        start_date,
        lead(start_date) over (order by start_date asc)
    ) as duration
from discounts;
Now querying it is easy using range operators. Use @> to check if the range contains a date.
select *
from discount_durations
where duration @> '2020-07-14'::date;
And use && to see if they have any overlap.
select *
from discount_durations
where duration && daterange('2020-07-01', '2020-07-15');
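To see how the derived ranges behave, here is a self-contained sketch that inlines the sample rows from the question as a CTE instead of relying on a real table. One detail worth noting: daterange() uses an inclusive lower and exclusive upper bound by default, so the third argument '[]' is passed here on the assumption that 2020-07-15 itself should count as part of the requested period.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
with discounts(discount_nr, start_date) as (
    values (1, date '2020-06-30'),
           (2, date '2020-07-03'),
           (3, date '2020-07-10'),
           (4, date '2020-07-15')
),
discount_durations as (
    select discount_nr,
           -- The last row has no successor, so its range is open-ended.
           daterange(start_date,
                     lead(start_date) over (order by start_date)) as duration
    from discounts
)
select discount_nr, duration
from discount_durations
where duration && daterange('2020-07-01', '2020-07-15', '[]');
```

With the inclusive upper bound, all four discounts overlap the period. With the default '[)' bounds, discount 4, which starts on 2020-07-15, would be left out.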

Do continuous aggregates in Postgres/TimescaleDB require the time_bucket function?

I have a SELECT query which gives me the aggregated sum(minutes_per_hour_used) of some stuff, grouped by id, weekday and observed hour.
SELECT id,
       extract(dow from observed_date) AS weekday, -- observed_date is type date
       observed_hour, -- type timestamp without time zone, every full hour 00:00:00, 01:00:00, ...
       sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
The result looks nice, but now I would like to store it in a self-maintained view that only considers/aggregates the last 8 weeks. I thought continuous aggregates were the right way, but I can't make it work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to somehow use the time_bucket function, but I don't know how. Any ideas/hints?
I am using Postgres with TimescaleDB.
EDIT: This gives me the desired output, but I can't put it in a continuous aggregate:
SELECT id,
extract(dow from observed_date) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
EDIT: Prepending the query with
CREATE VIEW my_view
WITH (timescaledb.continuous) AS
gives me [0A000] ERROR: invalid SELECT query for continuous aggregate
Continuous aggregates require grouping by time_bucket:
SELECT <grouping_exprs>, <aggregate_functions>
FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
[ <optional grouping exprs> ]
[HAVING ...]
time_bucket should be applied to the partitioning column, which is usually the time dimension column used when the hypertable was created. Also, ORDER BY is not supported.
In the aggregate query from the question, no time column is used for grouping. Neither weekday nor observed_hour is a valid time column, since their values do not increase with time but repeat regularly: weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks the requirements for continuous aggregates.
Since there is no ready-made solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:
CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
       time_bucket('1 day', observed_date) AS day,
       observed_hour,
       sum(minutes_per_hour_used) AS minutes_per_hour_used -- aliased so the query on top can reference it
FROM base_table
GROUP BY 1, 2, 3;
Then execute the targeted aggregate query on top of it:
SELECT id,
extract(dow from day) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
Another approach is to use a PostgreSQL materialized view and refresh it on a regular basis with a custom job run by TimescaleDB's job scheduling framework. Note that each refresh recalculates the entire view, which in this example covers 8 weeks of data. The materialized view can be defined on the original table base_table or on the continuous aggregate suggested above.
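The materialized-view approach could look roughly like the sketch below. It assumes TimescaleDB 2.x, where a custom job is a procedure taking (job_id, config) that is registered with add_job. The names usage_last_8_weeks and refresh_usage_last_8_weeks and the one-hour schedule are made up for illustration, and the view here is defined directly on base_table:

```sql
-- Plain PostgreSQL materialized view; now() is evaluated at each refresh.
CREATE MATERIALIZED VIEW usage_last_8_weeks AS
SELECT id,
       extract(dow from observed_date) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used) AS minutes_used
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, extract(dow from observed_date), observed_hour;

-- Custom job body: recomputes the whole view on every run.
CREATE OR REPLACE PROCEDURE refresh_usage_last_8_weeks(job_id int, config jsonb)
LANGUAGE plpgsql
AS $$
BEGIN
    REFRESH MATERIALIZED VIEW usage_last_8_weeks;
END
$$;

-- Register the procedure with the TimescaleDB job scheduler (hourly here).
SELECT add_job('refresh_usage_last_8_weeks', '1 hour');
```

How often to refresh is a trade-off between freshness and the cost of recomputing 8 weeks of data each time. Defining the view on the daily continuous aggregate instead of base_table would reduce that cost.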

How to get the count of timestamps with an interval bigger than xx seconds to the next row in PostgreSQL

I have a table with 3 columns (Postgres 9.6): serial, timestamp, clock_name.
Usually there is a 1 second difference between consecutive rows, but sometimes the interval is bigger.
I'm trying to get the number of occasions where the timestamp interval between 2 rows was bigger than 10 seconds (let's say I limit this to 1000 rows).
I would like to do this in one query (probably a select from a select), but I have no idea how to write such a query; my SQL knowledge is very basic.
Any help will be appreciated.
You can use window functions to retrieve the next record given the current record.
Using ORDER BY in the window definition to ensure rows are in timestamp order, and PARTITION BY to keep the clocks separate, you can find for each row the row that follows it.
WITH links AS
(
    SELECT
        id, ts, clock, LEAD(ts) OVER (PARTITION BY clock ORDER BY ts) AS next_ts
    FROM myTable
)
SELECT * FROM links
WHERE
    EXTRACT(EPOCH FROM (next_ts - ts)) > 10;
You can then just compare the time stamps.
Window functions https://www.postgresql.org/docs/current/static/functions-window.html
Or, if you prefer a derived table instead of a WITH clause:
SELECT * FROM (
    SELECT
        id, ts, clock, LEAD(ts) OVER (PARTITION BY clock ORDER BY ts) AS next_ts
    FROM myTable
) links
WHERE
    EXTRACT(EPOCH FROM (next_ts - ts)) > 10;
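The queries above return the rows that are followed by a gap, while the question asks for the number of such gaps within a limited set of rows. A minimal sketch of that variant follows. It keeps the placeholder names from the answer (myTable, ts, clock) and assumes that "limit this to 1000 rows" means the 1000 most recent rows by timestamp:

```sql
-- Count gaps longer than 10 seconds among the 1000 most recent rows.
SELECT count(*) AS gaps_over_10s
FROM (
    SELECT ts,
           LEAD(ts) OVER (PARTITION BY clock ORDER BY ts) AS next_ts
    FROM (
        -- Restrict to the newest 1000 rows before looking at neighbours.
        SELECT ts, clock
        FROM myTable
        ORDER BY ts DESC
        LIMIT 1000
    ) recent
) links
WHERE next_ts - ts > interval '10 seconds';
```

The last row of each clock has no successor, so its next_ts is NULL and it is not counted. Comparing the difference against an interval is equivalent to the EXTRACT(EPOCH ...) form used above.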

How can I get a floating average over timestamps in PostgreSQL?

Let's say I have a table for a time-sheet like this:
CREATE TABLE foo (
    spent_on TIMESTAMP, -- PostgreSQL has no DATETIME type
    hours FLOAT
);
Assuming spent_on is the timestamp the value was logged, and hours is a floating point value representing the amount of hours spent on a task.
How can I get a floating average of hours over the past 7 days?
I came up with the following, but it doesn't work:
select spent_on, hours, avg(hours)
over RANGE BETWEEN spent_on - INTERVAL '7 days' AND CURRENT ROW from daily;
I get the following error:
ERROR: syntax error at or near "ROW"
LINE 1: ... BETWEEN spent_on - INTERVAL '7 days' AND CURRENT ROW from d...
I've tried to understand the docs for window functions, but I have real trouble grasping the difference between partitions, windows and frames. As a result, I can't come up with a query.
I'm not sure about the RANGE syntax, so let me offer a solution with a correlated subquery (fine if performance is not an issue, e.g. with small tables):
SELECT t.spent_on, t.hours,
       COALESCE((SELECT AVG(s.hours)
                 FROM foo s
                 -- rows in the 7 days up to and including the current row
                 WHERE s.spent_on > t.spent_on - INTERVAL '7 days'
                   AND s.spent_on <= t.spent_on), 0) AS float_avg
FROM foo t;
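For reference, the window-function form the question was reaching for can be written as below on PostgreSQL 11 or later, where RANGE frames accept an interval offset. This is a sketch against the foo table from the question. The frame belongs inside the OVER (...) parentheses, and the offset is written as an interval followed by PRECEDING rather than as an expression on spent_on:

```sql
-- 7-day trailing average: the frame covers rows whose spent_on lies
-- within the 7 days up to and including the current row.
SELECT spent_on,
       hours,
       avg(hours) OVER (
           ORDER BY spent_on
           RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
       ) AS float_avg
FROM foo;
```

On older PostgreSQL versions this frame is not available, which leaves the subquery approach above as the fallback.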

Select Data over time period

I'm a bit of a newbie when it comes to Postgres, so bear with me and I'll see if I can put up enough information.
I insert weather data into a table every 10 minutes; I have a time column that is stamped with an epoch date.
I have a column holding the last hour's rainfall, which of course changes during each hour as a running total (for that hour).
What I would like to do is skim through the rows to the end of each hour and get that row, but do it over the last 4 hours, so I would only be returning 4 rows.
Is this possible in 1 query? Or should I do multiple queries?
I would like to do this in 1 query, but I'm not fussed...
Thanks
Thanks guys for your answers. I was/am a bit confused by yours, Gavin, sorry :) That comes from not knowing this terribly well.
I'm still a bit unsure about this, so I'll try to explain it a bit better.
I have a C program that inserts data into the database every 10 minutes; it reads the data from a device that keeps the last hour's rainfall, so every 10 minutes it could go up by x amount.
So I guess I have 6 rows of data per hour.
My plan was to go back (in my PHP page) every 7, which would be the last entry for every hour, and just grab that value. Hence why I would only ever need 4 rows, just spaced out a bit!
My table (readings) has data like this
index | time (text) | last hrs rain fall (text)
1 | 1316069402 | 1.2
All ears to better ways of storing it too :) I very much appreciate your help too guys thanks.
You should be able to do it in one query...
Would something along the lines of:
SELECT various_columns,
       the_hour,
       SUM ( column_to_be_summed )
FROM ( SELECT various_columns,
              column_to_be_summed,
              extract ( hour FROM TIME ) AS the_hour
       FROM readings
       WHERE TIME > ( NOW() - INTERVAL '4 hour' ) ) a
GROUP BY various_columns,
         the_hour ;
do what you need?
SELECT SUM(rainfall) FROM weatherdata WHERE time > (NOW() - INTERVAL '4 hour' );
I don't know your column names, but that should do it; the words in caps are PostgreSQL keywords and functions. Is that what you are after?
I am not sure if this is exactly what you are looking for but perhaps it may serve as a basis for adaptation.
I often have a requirement to produce summary data over time periods, though I don't use epoch time, so there may be better ways of manipulating the values than I have come up with.
Create and populate a test table:
create table epoch_t(etime numeric);
insert into epoch_t
select extract(epoch from generate_series(now(),now() - interval '6 hours',interval '-10 minutes'));
To divide up time into period buckets:
select generate_series(to_char(now(),'yyyy-mm-dd hh24:00:00')::timestamptz,
to_char(now(),'yyyy-mm-dd hh24:00:00')::timestamptz - interval '4 hours',
interval '-1 hour');
Convert epoch time to postgres timestamp:
select timestamptz 'epoch' + etime * '1 second'::interval from epoch_t;
Then truncate to the hour:
select to_char(timestamptz 'epoch' + etime * '1 second'::interval,
               'yyyy-mm-dd hh24:00:00')::timestamptz from epoch_t;
To provide summary information by hour:
select to_char(timestamptz 'epoch' + etime * '1 second'::interval,
'yyyy-mm-dd hh24:00:00')::timestamptz,
count(*)
from epoch_t
group by 1
order by 1 desc;
If you might have gaps in the data but need to report zero results, use generate_series to create period buckets and left join them to the data table.
In this case I create sample hour buckets reaching back before the data populated above (9 hours instead of 6) and join on the epoch time converted to a timestamp and truncated to the hour.
select per.sample_hour,
       sum(case etime is null when true then 0 else 1 end) as etcount
from (select generate_series(to_char(now(), 'yyyy-mm-dd hh24:00:00')::timestamptz,
                             to_char(now(), 'yyyy-mm-dd hh24:00:00')::timestamptz - interval '9 hours',
                             interval '-1 hour') as sample_hour) as per
left join epoch_t on to_char(timestamptz 'epoch' + etime * '1 second'::interval,
                             'yyyy-mm-dd hh24:00:00')::timestamptz = per.sample_hour
group by per.sample_hour
order by per.sample_hour desc;
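Coming back to what the question literally asks for (the last reading of each hour over the last 4 hours), one possible sketch uses DISTINCT ON together with date_trunc. It assumes the readings table has a text column named "time" holding epoch seconds and a text column for the rainfall. The name rain_last_hour is invented here, since the real column name is not given:

```sql
-- For each clock hour, keep only the latest reading in that hour.
SELECT DISTINCT ON (reading_hour)
       reading_hour,
       read_at,
       rain_last_hour
FROM (
    SELECT to_timestamp("time"::double precision)                     AS read_at,
           date_trunc('hour', to_timestamp("time"::double precision)) AS reading_hour,
           rain_last_hour::numeric                                     AS rain_last_hour
    FROM readings
) r
WHERE read_at > now() - interval '4 hours'
-- DISTINCT ON keeps the first row per reading_hour in this ordering,
-- i.e. the most recent reading within each hour.
ORDER BY reading_hour DESC, read_at DESC;
```

Because the 4-hour window rarely starts exactly on the hour, this can return 5 rows (a partial hour at each end). Storing the time as timestamptz and the rainfall as numeric instead of text would remove the casts.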