Calculating monthly hourly averages from a PostgreSQL database

I have timestamps and values in the database. For each month, I need to calculate the average for each hour of the day, i.e.:
YYYY-mm-dd (the day can be omitted)
2021-01-01 00:00:00 value=avg(values from 00:00:00 until 00:59:59 for every day of this month at this hour interval)
2021-01-01 01:00:00 value=avg(values from 01:00:00 until 01:59:59 idem as above)
...
2021-01-01 23:00:00 value=avg(values from 23:00:00 until 23:59:59)
2021-02-01 00:00:00 value=avg(values from 00:00:00 until 00:59:59)
2021-02-01 01:00:00 value=avg(values from 01:00:00 until 01:59:59)
...
2021-02-01 23:00:00 value=avg(values from 23:00:00 until 23:59:59)
...

You can use date_trunc('hour', datestamp) in a GROUP BY clause, something like this. Note that this gives one row for each hour of each day in the range.
SELECT DATE_TRUNC('hour', datestamp) hour_beginning, AVG(value) average_value
FROM mytable
WHERE datestamp >= '2021-01-01'
AND datestamp < '2021-02-01'
GROUP BY DATE_TRUNC('hour', datestamp)
ORDER BY DATE_TRUNC('hour', datestamp)
To generalize, in place of DATE_TRUNC you can use any function that maps each timestamp to the bucket it should be grouped into.
You could use
to_char(datestamp, 'YYYY-MM-01 HH24:00:00')
to get one result row per hour of the day for every month in your date range.
SELECT to_char(datestamp, 'YYYY-MM-01 HH24:00:00') hour,
AVG(value) average_value
FROM mytable
GROUP BY to_char(datestamp, 'YYYY-MM-01 HH24:00:00')
ORDER BY to_char(datestamp, 'YYYY-MM-01 HH24:00:00')
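To check the numbers outside the database, the same bucketing can be sketched in Python. This is only an illustration of the grouping key, not a replacement for the query; the function name monthly_hourly_average and the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def monthly_hourly_average(rows):
    """Average values per (year, month, hour-of-day) bucket.

    The bucket key mirrors to_char(ts, 'YYYY-MM-01 HH24:00:00'):
    the day is forced to 1 and minutes/seconds are zeroed.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, row count]
    for ts, value in rows:
        key = ts.replace(day=1, minute=0, second=0, microsecond=0)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical sample data: (timestamp, value) pairs.
rows = [
    (datetime(2021, 1, 1, 0, 15), 2.0),
    (datetime(2021, 1, 2, 0, 45), 4.0),
    (datetime(2021, 1, 2, 1, 5), 10.0),
    (datetime(2021, 2, 3, 0, 30), 6.0),
]
result = monthly_hourly_average(rows)
```

Both January rows in the 00:00 hour land in the same bucket even though they fall on different days, which is the behaviour the to_char version gives and the date_trunc('hour', ...) version does not.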

Related

PostgreSQL creating timestamp ranges

In PostgreSQL 11, I am trying to get a weekend time range. From 17:00 Friday to Sunday 17:00.
So far I am able to get a working day by doing
select * from generate_series(date '2021-01-01',date '2021-12-31',interval '1' day) as t(dt) where extract (dow from dt) between 1 and 5;
However, I am having trouble creating two columns, from start (17:00 Friday) to finish (17:00 Sunday).
Expected output should be something like this:
start stop
2022-10-07 17:00 2022-10-09 17:00
2022-10-14 17:00 2022-10-16 17:00
2022-10-21 17:00 2022-10-23 17:00
To get a series of all hours between 17:00 on Friday and 17:00 on Sunday:
SELECT
*
FROM
generate_series(timestamp '2021-01-01', timestamp '2021-12-31', interval '1' hour) AS t (dt)
WHERE
extract(dow FROM dt) IN (5, 6, 0)
AND CASE WHEN extract(dow FROM dt) = 5 THEN
extract(hour FROM dt) >= 17
WHEN extract(dow FROM dt) = 0 THEN
extract(hour FROM dt) <= 17
ELSE
true -- Saturday: keep every hour
END;
UPDATE
Get two timestamps that represent start and stop of each period Friday 17:00 to Sunday 17:00 over a range of dates.
SELECT
dt + '17:00'::time as start, (dt + '17:00'::time) + '2 days'::interval as stop
FROM
generate_series(date '2022-01-01', date '2022-12-31', interval '1' day) AS t (dt)
WHERE
extract(dow FROM dt) = 5
;
start | stop
-------------------------+-------------------------
01/07/2022 17:00:00 PST | 01/09/2022 17:00:00 PST
01/14/2022 17:00:00 PST | 01/16/2022 17:00:00 PST
01/21/2022 17:00:00 PST | 01/23/2022 17:00:00 PST
01/28/2022 17:00:00 PST | 01/30/2022 17:00:00 PST
02/04/2022 17:00:00 PST | 02/06/2022 17:00:00 PST
02/11/2022 17:00:00 PST | 02/13/2022 17:00:00 PST
02/18/2022 17:00:00 PST | 02/20/2022 17:00:00 PST
02/25/2022 17:00:00 PST | 02/27/2022 17:00:00 PST
03/04/2022 17:00:00 PST | 03/06/2022 17:00:00 PST
03/11/2022 17:00:00 PST | 03/13/2022 17:00:00 PDT
03/18/2022 17:00:00 PDT | 03/20/2022 17:00:00 PDT
03/25/2022 17:00:00 PDT | 03/27/2022 17:00:00 PDT
...
--timestamptz type.
SELECT
(day + interval '17:00') AS start,
(day + interval '17:00' + interval '2 days') AS "end"
FROM
generate_series(date '2022-10-01', date '2022-12-31', interval '1' day) _ (day)
WHERE
EXTRACT(ISODOW FROM day) = 5;
--timestamp type.
SELECT
(day + interval '17:00')::timestamp AS start,
(day + interval '17:00' + interval '2 days')::timestamp AS "end"
FROM
generate_series(date '2022-10-01', date '2022-12-31', interval '1' day) _ (day)
WHERE
EXTRACT(ISODOW FROM day) = 5;
I checked the results against a calendar, and they are correct.
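If you want to sanity-check the Friday-to-Sunday ranges without a database, here is a rough Python sketch of the same idea. The helper name weekend_ranges is hypothetical, and it uses naive datetimes, so it assumes you do not care about time zones or DST shifts (unlike the timestamptz query above).

```python
from datetime import date, datetime, time, timedelta


def weekend_ranges(first_day, last_day):
    """Return (start, stop) pairs: Friday 17:00 to Sunday 17:00."""
    ranges = []
    day = first_day
    while day <= last_day:
        if day.weekday() == 4:  # Monday is 0, so Friday is 4
            start = datetime.combine(day, time(17, 0))
            ranges.append((start, start + timedelta(days=2)))
        day += timedelta(days=1)
    return ranges


# Fridays in October 2022, matching the expected output in the question.
ranges = weekend_ranges(date(2022, 10, 1), date(2022, 10, 31))
for start, stop in ranges:
    print(start, stop)
```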

How to average hourly values over multiple days with SQL

I have a SQL table (PostgreSQL/TimescaleDB) with hourly values, e.g.:
Timestamp Value
...
2021-02-17 13:00:00 2
2021-02-17 14:00:00 4
...
2021-02-18 13:00:00 3
2021-02-18 14:00:00 3
...
I want to get the average values for each hour mapped to today's date in a specific timespan, so something like that:
select avg(value)
from table
where Timestamp between '2021-02-10' and '2021-02-20'
group by *hourpart of timestamp*
result today (2021-10-08) should be:
...
Timestamp Value
2021-10-08 13:00:00 2.5
2021-10-08 14:00:00 3.5
...
If I do the same select tomorrow (2021-10-09) result should change to:
...
Timestamp Value
2021-10-09 13:00:00 2.5
2021-10-09 14:00:00 3.5
...
I solved the problem myself:
Solution:
SELECT EXTRACT(HOUR FROM table."Timestamp") as hour,
avg(table."Value") as average
from table
where table."Timestamp" between '2021-02-10' and '2021-02-20'
group by hour
order by hour;
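Another option is to group by the time-of-day part of the timestamp's text form only (grouping by the date part as well would give one group per row). This assumes the default ISO DateStyle, where the time starts at character 12:
select substring(Timestamp::text, 12, 8) as time_of_day, avg(value)
from table
where Timestamp between '2021-02-10' and '2021-02-20'
group by substring(Timestamp::text, 12, 8)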
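The part the queries above leave out is mapping each hour onto today's date. A small Python sketch of the whole calculation, using the sample rows from the question, might look like this. The function name hourly_profile is made up, and the half-open range (start <= ts < end) is my assumption; BETWEEN in SQL would also include a row at exactly the end timestamp.

```python
from collections import defaultdict
from datetime import date, datetime, time


def hourly_profile(rows, start, end, target_day):
    """Average value per hour of day, re-dated onto target_day."""
    buckets = defaultdict(list)
    for ts, value in rows:
        if start <= ts < end:  # half-open range, an assumption of this sketch
            buckets[ts.hour].append(value)
    return {
        datetime.combine(target_day, time(hour)): sum(vals) / len(vals)
        for hour, vals in sorted(buckets.items())
    }


rows = [
    (datetime(2021, 2, 17, 13), 2),
    (datetime(2021, 2, 17, 14), 4),
    (datetime(2021, 2, 18, 13), 3),
    (datetime(2021, 2, 18, 14), 3),
]
profile = hourly_profile(
    rows, datetime(2021, 2, 10), datetime(2021, 2, 20), date(2021, 10, 8)
)
```

Passing date.today() as target_day gives the "result changes tomorrow" behaviour the question asks for.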

Group by Date and sum of total duration for that day

I am using SQL Workbench/J with a Postgres DB for my query, which is as follows:
Input
ID |utc_tune_start_time |utc_tune_end_time
----------------------------------------------
A |04-03-2019 19:00:00 |04-03-2019 20:00:00
----------------------------------------------
A |04-03-2019 23:00:00 |05-03-2019 01:00:00
-----------------------------------------------
A |05-03-2019 10:00:00 |05-03-2019 10:30:00
-----------------------------------------------
Output
ID |Day |Duration in Minutes
----------------------------------------
A |04-03-2019 |120
-----------------------------------
A |05-03-2019 |90
-----------------------------------
I require the duration elapsed from the utc_tune_start_time till the end of the day and similarly, the time elapsed for utc_tune_end_time since the start of the day.
Thanks for your clarifications. This is possible with some case statements. Basically, if utc_tune_start_time and utc_tune_end_time are on the same day, just use the difference, otherwise calculate the difference from the end or start of the day.
WITH all_activity as (
select date_trunc('day', utc_tune_start_time) as day,
case when date_trunc('day', utc_tune_start_time) =
date_trunc('day', utc_tune_end_time)
then utc_tune_end_time - utc_tune_start_time
else date_trunc('day', utc_tune_start_time) +
interval '1 day' - utc_tune_start_time
end as time_spent
from test
UNION ALL
select date_trunc('day', utc_tune_end_time),
case when date_trunc('day', utc_tune_start_time) =
date_trunc('day', utc_tune_end_time)
then null -- we already calculated this earlier
else utc_tune_end_time - date_trunc('day', utc_tune_end_time)
end
FROM test
)
select day, sum(time_spent)
FROM all_activity
GROUP BY day;
day | sum
---------------------+----------
2019-03-04 00:00:00 | 02:00:00
2019-03-05 00:00:00 | 01:30:00
(2 rows)
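For comparison, here is a minimal Python sketch of the same splitting logic, using the three sample rows from the question. The name minutes_per_day is hypothetical. Unlike the CASE-based query, this version walks midnight by midnight, so it also covers a session that spans more than two calendar days; that goes beyond what the question asked for.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta


def minutes_per_day(sessions):
    """Split each (start, end) session at midnight and sum minutes per day."""
    totals = defaultdict(float)
    for start, end in sessions:
        cursor = start
        while cursor < end:
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1), time.min
            )
            piece_end = min(end, next_midnight)
            totals[cursor.date()] += (piece_end - cursor).total_seconds() / 60
            cursor = piece_end
    return dict(totals)


sessions = [
    (datetime(2019, 3, 4, 19, 0), datetime(2019, 3, 4, 20, 0)),
    (datetime(2019, 3, 4, 23, 0), datetime(2019, 3, 5, 1, 0)),
    (datetime(2019, 3, 5, 10, 0), datetime(2019, 3, 5, 10, 30)),
]
totals = minutes_per_day(sessions)
```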

How to add one month to a date column and subtract one day in PostgreSQL

I have a column of type date. It holds values such as '2016-05-06'. I want to add one full month to each value, but get back the day before the resulting date.
So when I execute a query like:
select date,(date + interval '1 month') as new_column
from batchproduct_info;
it gives me this result:
date new_column
2016-05-06 2016-06-06 00:00:00
2016-05-07 2016-06-07 00:00:00
But I want result in this format:
date new_column
2016-05-06 2016-06-05 00:00:00
2016-05-07 2016-06-06 00:00:00
i.e. it should add one month and then subtract one day.
This is a solution to your problem:
select date, (date + '1 month'::interval - '1 day'::interval) as new_column
from batchproduct_info;
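One thing to be aware of is what happens at month ends: PostgreSQL clamps the day when the target month is shorter, so '2016-01-31' plus one month is '2016-02-29', and subtracting a day then gives '2016-02-28'. The Python sketch below imitates that arithmetic so you can see the edge cases; add_month_minus_day is a made-up helper, not a library function.

```python
import calendar
from datetime import date, timedelta


def add_month_minus_day(d):
    """Add one calendar month (clamping the day), then subtract one day."""
    year = d.year + d.month // 12   # roll over to the next year after December
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # days in the target month
    return date(year, month, min(d.day, last_day)) - timedelta(days=1)


print(add_month_minus_day(date(2016, 5, 6)))   # 2016-06-05
print(add_month_minus_day(date(2016, 1, 31)))  # 2016-02-28
```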

Grouping by date, with 0 when count() yields no lines

I'm using Postgresql 9 and I'm fighting with counting and grouping when no lines are counted.
Let's assume the following schema :
create table views (
date_event timestamp with time zone,
event_id integer
);
Let's imagine the following content :
2012-01-01 00:00:05 2
2012-01-01 01:00:05 5
2012-01-01 03:00:05 8
2012-01-01 03:00:15 20
I want to group by hour, and count the number of lines. I wish I could retrieve the following :
2012-01-01 00:00:00 1
2012-01-01 01:00:00 1
2012-01-01 02:00:00 0
2012-01-01 03:00:00 2
2012-01-01 04:00:00 0
2012-01-01 05:00:00 0
.
.
2012-01-07 23:00:00 0
I mean that for each time range slot, I count the number of lines in my table whose date correspond, otherwise, I return a line with a count at zero.
The following will definitely not work (it will yield only rows with a count > 0).
SELECT extract ( hour from date_event ),count(*)
FROM views
where date_event > '2012-01-01' and date_event <'2012-01-07'
GROUP BY extract ( hour from date_event );
Please note I might also need to group by minute, or by hour, or by day, or by month, or by year (multiple queries is possible of course).
I can only use plain old sql, and since my views table can be very big (>100M records), I try to keep performance in mind.
How can this be achieved ?
Thank you !
Given that you don't have the dates in the table, you need a way to generate them. You can use the generate_series function:
SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts;
This will produce results like this:
ts
---------------------
2012-01-01 00:00:00
2012-01-01 01:00:00
2012-01-01 02:00:00
2012-01-01 03:00:00
...
2012-01-07 21:00:00
2012-01-07 22:00:00
2012-01-07 23:00:00
(168 rows)
The remaining task is to join the two selects using an outer join like this :
select extract ( day from ts ) as day, extract ( hour from ts ) as hour,coalesce(count,0) as count from
(
SELECT extract ( day from date_event ) as day , extract ( hour from date_event ) as hr ,count(*)
FROM views
where date_event >= '2012-01-01' and date_event < '2012-01-08'
GROUP BY extract ( day from date_event ) , extract ( hour from date_event )
) AS cnt
right outer join ( SELECT * FROM generate_series ( '2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts ) as dtetable on extract ( hour from ts ) = cnt.hr and extract ( day from ts ) = cnt.day
order by day,hour asc;
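The idea behind the outer join (generate every slot first, then look up a count for it and fall back to 0) can be sketched in a few lines of Python, using the sample rows from the question. The function name hourly_counts is hypothetical, and this is only meant to show the shape of the result, not to replace the query on a 100M-row table.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(events, start, end):
    """Count events per hour, emitting 0 for hours with no events."""
    # Truncate each event to the start of its hour, like date_trunc('hour', ...).
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in events
    )
    slots = []
    slot = start
    while slot <= end:  # inclusive end, like generate_series
        slots.append((slot, counts.get(slot, 0)))
        slot += timedelta(hours=1)
    return slots


events = [
    datetime(2012, 1, 1, 0, 0, 5),
    datetime(2012, 1, 1, 1, 0, 5),
    datetime(2012, 1, 1, 3, 0, 5),
    datetime(2012, 1, 1, 3, 0, 15),
]
slots = hourly_counts(events, datetime(2012, 1, 1), datetime(2012, 1, 7, 23))
```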
This query gives you the hourly counts, but only for hours that have at least one row; empty hours are not filled with zero:
select to_char(date_event, 'YYYY-MM-DD HH24:00') as time, count (to_char(date_event, 'HH24:00')) as count from views where date(date_event) >= '2012-01-01' and date(date_event) <= '2012-01-07' group by time order by time;