postgres, group by date, and bucketize per hour - postgresql

I would like to create a result set that can be used with Grafana for a heatmap. To display the data correctly, I need the output to look like this:
| date | 00:00 | 01:00 | 02:00 | 03:00 | ...etc |
| 2023-01-01 | 1 | 2 | 0 | 1 | ... |
| 2023-01-02 | 0 | 0 | 1 | 1 | ... |
| 2023-01-03 | 4 | 0 | 2 | 0 | ... |
My data table structure:
So far, I know that I need to use generate_series and an interval to return the hours. I also need my query to show these hours as columns, but I haven't been able to do that, because it's getting a bit too advanced for me.
So far I have the following query:
FROM trades
GROUP BY closed_at
ORDER BY closed_at
It currently shows the number of rows grouped by day. I want to aggregate the data further so that it outputs the count per hour, as shown above.
Thanks for your help!

You can add more columns; here I only add 0:00 to 5:00.
References: the aggregate FILTER clause and the date_trunc function in the PostgreSQL documentation.
CREATE TEMP TABLE trades (
    id        serial PRIMARY KEY,
    closed_at timestamp,
    asset     text
);

-- Random sample rows, up to roughly 17 hours into each day
INSERT INTO trades (closed_at)
SELECT date '2023-01-01' + interval '10 min' * (random() * i * 10)::int
FROM   generate_series(1, 10) g (i);

INSERT INTO trades (closed_at)
SELECT date '2023-01-02' + interval '10 min' * (random() * i * 10)::int
FROM   generate_series(1, 10) g (i);

-- One row per day; each FILTER counts only the rows that fall into that hour
SELECT closed_at::date AS date
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date) AS "0:00"
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '1 hour') AS "1:00"
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '2 hour') AS "2:00"
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '3 hour') AS "3:00"
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '4 hour') AS "4:00"
     , COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '5 hour') AS "5:00"
FROM   trades
GROUP  BY 1
ORDER  BY 1;
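You may not want to write one FILTER column per hour. The following is a minimal sketch of the generate_series approach mentioned in the question. It cross joins every day with the hours 0 to 23, LEFT JOINs the trades, and counts, so empty hours come out as 0. The result has one row per (date, hour) rather than one column per hour, so check whether your Grafana heatmap panel accepts this long format. The table `trade_log`, its rows, and the fixed two-day range are hypothetical.

```sql
-- Hypothetical sample data
CREATE TEMP TABLE trade_log (
    id        serial PRIMARY KEY,
    closed_at timestamp NOT NULL
);

INSERT INTO trade_log (closed_at) VALUES
    ('2023-01-01 00:15'), ('2023-01-01 00:45'),
    ('2023-01-01 02:10'), ('2023-01-02 01:05');

-- Every (day, hour) pair in the range; hours without trades get a count of 0
CREATE TEMP VIEW hourly_counts AS
SELECT d::date     AS date
     , h           AS hour
     , count(t.id) AS trades
FROM   generate_series(timestamp '2023-01-01',
                       timestamp '2023-01-02',
                       interval '1 day') AS d
CROSS  JOIN generate_series(0, 23) AS h
LEFT   JOIN trade_log t
       ON  t.closed_at >= d + h * interval '1 hour'
       AND t.closed_at <  d + (h + 1) * interval '1 hour'
GROUP  BY d, h;

SELECT * FROM hourly_counts ORDER BY date, hour;
```

The half-open range (`>=` start, `<` start + 1 hour) puts a trade at exactly 01:00 into the 1:00 bucket only.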


Fixed range of timestamps for every uuid in SQL

I would like to generate a table with the timestamps of the last n weeks (in this case, n = 3) for every uuid, including all the data even where it is null.
I am using the following code:
with raw_weekly_data as (SELECT
distinct d.uuid,
date_trunc('week',a.start_timestamp) as tstamp,
avg(price) as price
from a
join d on a.uuid = d.uuid
where start_timestamp between date_trunc('week',now()) - interval '3 week' and date_trunc('week',now())
group by 1,2
order by 1)
,tstamp as (SELECT
distinct tstamp
from raw_weekly_data r right join tstamp t on r.tstamp = t.tstamp
order by uuid
I would like to get something like this:
week | uuid | price
w1 | 1 | 10
w2 | 1 | 2
w3 | 1 |
w1 | 2 | 20
w2 | 2 |
w3 | 2 |
w1 | 3 | 10
w2 | 3 | 10
w3 | 3 | 20
Instead, the rows with null prices are missing from the result, as shown below. What is the best approach here?
week | uuid | price
w1 | 1 | 10
w2 | 1 | 2
w1 | 2 | 20
w1 | 3 | 10
w2 | 3 | 10
w3 | 3 | 20
Form the Cartesian product of all weeks and UUIDs, then LEFT JOIN to the actual average prices per (week, uuid), like this:
SELECT *
FROM   generate_series (date_trunc('week', now() - interval '3 week')
                      , now() - interval '1 week'
                      , interval '1 week') tstamp
CROSS  JOIN (SELECT DISTINCT uuid FROM d) d   -- all UUIDs
LEFT   JOIN (
   SELECT d.uuid
        , date_trunc('week', a.start_timestamp) AS tstamp
        , avg(price) AS price  -- d.price?
   FROM   a
   JOIN   d USING (uuid)
   WHERE  a.start_timestamp >= date_trunc('week',now()) - interval '3 week'
   AND    a.start_timestamp <  date_trunc('week',now())
   GROUP  BY 1, 2
   ) ad USING (uuid, tstamp)
ORDER  BY uuid, tstamp;
This way you get all combinations of the last three weeks and UUIDs, extended by the average price if one exists for the combination.
This is based on some educated guesses to fill in missing information.
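The following is a self-contained sketch of the same Cartesian-product-plus-LEFT-JOIN idea with made-up data. The tables `assets(uuid)` and `asset_prices(uuid, start_timestamp, price)` are hypothetical stand-ins for `d` and `a`. The three weeks are fixed literals rather than being derived from `now()`, so the result is reproducible.

```sql
CREATE TEMP TABLE assets (uuid int PRIMARY KEY);
CREATE TEMP TABLE asset_prices (uuid int, start_timestamp timestamp, price numeric);

INSERT INTO assets VALUES (1), (2), (3);
INSERT INTO asset_prices VALUES
    (1, '2023-01-02 10:00', 10), (1, '2023-01-10 10:00', 2),
    (2, '2023-01-03 12:00', 20);

-- Every (week, uuid) pair first, then the weekly average where one exists
CREATE TEMP VIEW weekly_prices AS
SELECT w.tstamp::date AS week, s.uuid, p.price
FROM   generate_series(timestamp '2023-01-02', timestamp '2023-01-16',
                       interval '1 week') AS w(tstamp)
CROSS  JOIN assets s
LEFT   JOIN (
    SELECT uuid, date_trunc('week', start_timestamp) AS tstamp, avg(price) AS price
    FROM   asset_prices
    GROUP  BY 1, 2
) p ON p.uuid = s.uuid AND p.tstamp = w.tstamp;

SELECT * FROM weekly_prices ORDER BY uuid, week;
```

All 9 (week, uuid) rows appear. The 6 combinations without prices carry a NULL price instead of being dropped.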

Combine generate series and count into one query

Postgres version 9.4.18, PostGIS Version 2.2.
I removed some of the details about the tables from this question because I doubt it's needed to answer the question. I can add those details back if necessary.
Desired result:
I want a total count for each week of year and hour of day (0100 to 5223). I'm able to generate a series of 0100 to 5223 (actually up to 5300), and I'm able to get a total count for each week of year and hour of day on its own. However, I can't combine the queries so that week-of-year/hour-of-day values with a zero count still show up. I want to combine the count result with the generate_series (and ideally divide that result by 30) to get something like the following:
MM-DD | count_not_zero | count_not_zero_divided_by_30
0100 | 10 | 33.3
0101 | 0 | 0
0102 | 0 | 0
0123 | 0 | 0
0200 | 3 | 10
0201 | 10 | 33.3
5223 | 20 | 66.6
Here are the two individual queries that work and that I want to combine:
SELECT DISTINCT f_woyhh(d::timestamp) as woyhh
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
GROUP BY woyhh
ORDER by woyhh asc;
SELECT dt, count(*) FROM
(SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago')
AS dt,
EXTRACT(YEAR FROM time) AS ctYear, count(*)
AS ct
FROM counties c
INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE countyname = 'Milwaukee' AND state = 'WI' AND EXTRACT(YEAR from time) > '1987' GROUP BY dt, EXTRACT(YEAR from time))
AS count group by dt;
The second query above returns the following (it skips dt values with a zero count, which I don't want):
dt | count
0100 | 10
0104 | 5
0108 | 4
I'm trying to combine the two working queries above into a single query that returns a three-column result: woyhh, count, and count divided by 30. I also want to include the woyhh values that have a zero count, so that I have a complete set of woyhh.
Thanks for any help!!
I found the answer. I'll be posting it tomorrow, but I wanted to put this on today so no one unnecessarily works on this question. I apologize for the formatting.
WITH CTE_Dates AS (
    SELECT DISTINCT f_woyhh(d::timestamp) AS dt
    FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
),
CTE_WeeklyHourlyCounts AS (
    SELECT dt, count(*) AS ct
    FROM (SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago') AS dt,
                 EXTRACT(YEAR FROM time) AS ctYear, count(*) AS ct
          FROM counties c
          INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
          WHERE countyname = 'Milwaukee' AND state = 'WI' AND EXTRACT(YEAR FROM time) > '1987'
          GROUP BY dt, EXTRACT(YEAR FROM time)) AS count
    GROUP BY dt
),
CTE_FullStats AS (
    -- LEFT JOIN keeps every dt from CTE_Dates; ct is NULL where there is no data
    SELECT CTE_Dates.dt AS dt, CAST(CTE_WeeklyHourlyCounts.ct AS decimal) AS ct
    FROM CTE_Dates
    LEFT JOIN CTE_WeeklyHourlyCounts ON CTE_WeeklyHourlyCounts.dt = CTE_Dates.dt
    GROUP BY CTE_Dates.dt, CTE_WeeklyHourlyCounts.ct, CTE_WeeklyHourlyCounts.dt
)
SELECT dt, COALESCE(ct, 0) AS count,
       round(((COALESCE(ct, 0) * 100) / 30), 0) AS percent
FROM CTE_FullStats
GROUP BY dt, ct
ORDER BY dt;
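Without the PostGIS join, the pattern in this answer has three steps. First, build the complete key set with generate_series. Second, LEFT JOIN the real counts onto it. Third, COALESCE the missing counts to 0. The following is a minimal sketch with a hypothetical `strikes` table. Because `f_woyhh()` is not shown, `to_char(ts, 'WWHH24')` (two-digit week of year followed by two-digit hour) serves as an assumed stand-in for it.

```sql
CREATE TEMP TABLE strikes (strike_time timestamp NOT NULL);

INSERT INTO strikes VALUES
    ('2018-01-01 00:10'), ('2018-01-01 00:40'), ('2018-01-01 02:05');

CREATE TEMP VIEW woyhh_counts AS
WITH all_keys AS (
    -- Complete key set: every week-of-year/hour combination in 2018
    SELECT DISTINCT to_char(ts, 'WWHH24') AS woyhh
    FROM   generate_series(timestamp '2018-01-01', timestamp '2018-12-31',
                           interval '1 hour') AS ts
), counts AS (
    -- Real counts, which exist only for keys that have data
    SELECT to_char(strike_time, 'WWHH24') AS woyhh, count(*) AS ct
    FROM   strikes
    GROUP  BY 1
)
SELECT k.woyhh
     , COALESCE(c.ct, 0)                  AS strike_count
     , round(COALESCE(c.ct, 0) / 30.0, 1) AS count_div_30
FROM   all_keys k
LEFT   JOIN counts c ON c.woyhh = k.woyhh;

SELECT * FROM woyhh_counts ORDER BY woyhh;
```

Dividing by `30.0` rather than `30` keeps the division in numeric, so small counts are not truncated to 0 by integer division.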

Grouping Events in Postgres

I've got an events table that is generated by user activity on a site:
timestamp | name
7:00 AM | ...
7:01 AM | ...
7:02 AM | ...
7:30 AM | ...
7:31 AM | ...
7:32 AM | ...
8:01 AM | ...
8:03 AM | ...
8:05 AM | ...
8:08 AM | ...
8:09 AM | ...
I'd like to aggregate the events to get a view of when a user is active. I define "active" as a period in which each event is within 2 minutes of the previous one. For the data above, that would mean:
from | till
7:00 AM | 7:02 AM
7:30 AM | 7:32 AM
8:01 AM | 8:05 AM
8:08 AM | 8:09 AM
What's the best way to write a query that aggregates in this way? Is it possible with a window function or a self join, or is PL/pgSQL required?
Use two window functions: one to calculate the intervals between consecutive events (gaps), and another to find runs of gaps less than or equal to 2 minutes:
select arr[1] as "from", arr[cardinality(arr)] as "till"
from (
select array_agg(timestamp order by timestamp) arr
from (
select timestamp, sum((gap > '2m' )::int) over w
from (
select timestamp, coalesce(timestamp - lag(timestamp) over w, '3m') gap
from events
window w as (order by timestamp)
) s
window w as (order by timestamp)
) s
group by sum
) s
from | till
07:00:00 | 07:02:00
07:30:00 | 07:32:00
08:01:00 | 08:05:00
(3 rows)
Group the events by their time floored to the half hour, and take the min and max values:
WITH x(t) AS (VALUES
('7:02 AM'::TIME),('7:01 AM'::TIME),('7:00 AM'::TIME),
('7:30 AM'::TIME),('7:31 AM'::TIME),('7:32 AM'::TIME),
('8:01 AM'::TIME),('8:03 AM'::TIME),('8:05 AM'::TIME)
)
SELECT MIN(t) "from", MAX(t) "till"
FROM (select t, date_trunc('hour', t) +
      CASE WHEN (t-date_trunc('hour', t)) >= '30 minutes'::interval
      THEN '30 minutes'::interval ELSE '0'::interval END t1 FROM x ) y
GROUP BY t1
ORDER BY 1;
You can apply the same recipe to timestamp values, like this:
WITH x(t) AS (
SELECT '2017-01-01'::TIMESTAMP + (RANDOM()*1440*'1 minute'::INTERVAL) t
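For the 2-minute definition in the question, here is a self-contained sketch of the window-function approach from the first answer. It is the classic "gaps and islands" technique: `lag()` gives the gap to the previous event, and a running `sum()` over "gap > 2 minutes" flags numbers each active period. It uses min/max instead of array_agg. The table name `user_events` is hypothetical, and the rows are the eleven times from the question.

```sql
CREATE TEMP TABLE user_events (event_time time NOT NULL);

INSERT INTO user_events VALUES
    ('07:00'), ('07:01'), ('07:02'), ('07:30'), ('07:31'), ('07:32'),
    ('08:01'), ('08:03'), ('08:05'), ('08:08'), ('08:09');

CREATE TEMP VIEW active_periods AS
SELECT min(event_time) AS "from", max(event_time) AS till
FROM (
    SELECT event_time,
           -- running count of breaks: the first event, or a gap over 2 minutes,
           -- starts a new period number
           sum(CASE WHEN gap IS NULL OR gap > interval '2 minutes' THEN 1 ELSE 0 END)
               OVER (ORDER BY event_time) AS period_no
    FROM (
        -- gap to the previous event (NULL for the very first event)
        SELECT event_time,
               event_time - lag(event_time) OVER (ORDER BY event_time) AS gap
        FROM   user_events
    ) g
) s
GROUP BY period_no;

SELECT * FROM active_periods ORDER BY "from";
```

Unlike half-hour flooring, this splits 8:01 to 8:05 and 8:08 to 8:09 into separate periods, because the 8:05 to 8:08 gap is 3 minutes.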

Postgresql running totals with groups missing data and outer joins

I've written a SQL query that pulls data from a user table and produces a running total and cumulative total of when users were created. The data is grouped by week (using Postgres window functions). I'm using a left outer join to include the weeks when no users were created. Here is the query:
WITH reporting_period AS (
SELECT generate_series(date_trunc('week', date '2015-04-02'), date_trunc('week', date '2015-10-02'), interval '1 week') AS interval
)
SELECT
date(interval) AS interval
, count(users.created_at) as interval_count
, sum(count( users.created_at) ) OVER (order by date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT OUTER JOIN users
ON interval=date(date_trunc('week', users.created_at) )
GROUP BY interval, date_trunc('week', users.created_at) ORDER BY interval
It works almost perfectly. The cumulative value is calculated correctly for weeks in which a user was created. For weeks in which no user was created, it is set to the grand total instead of the cumulative total up to that point.
In the rows marked **, the Week Tot column (interval_count) is 0 as expected, but the Run Tot column (cumulative_count) is 1053, which equals the grand total.
Week | Week Tot | Run Tot
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
2015-06-08 | 996 | 1031
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
This is what I would like:
Week | Week Tot | Run Tot
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
It seems to me that if the outer join can somehow apply the grand total to the last column, it should also be possible to apply the current running total, but I'm at a loss as to how to do it.
Is this possible?
This is not guaranteed to work out of the box, as I haven't tested it on actual tables, but the key here is to join users on created_at over a range of dates.
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
select interval_end,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at > rp.interval_begin
and u.created_at <= rp.interval_end
group by interval_end
) q
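The following is a self-contained sketch of the range-join idea for the arbitrary monthly windows asked about in the question (15th of one month to the 14th of the next). It assumes a hypothetical `members(created_at)` table and a start date of 2015-03-15. It uses half-open ranges (`>=` period start, `<` period start + 1 month), so every row lands in exactly one period. The running sum orders by the generated period start, which is never NULL.

```sql
CREATE TEMP TABLE members (created_at timestamp NOT NULL);

INSERT INTO members VALUES
    ('2015-03-20'), ('2015-04-14 23:00'), ('2015-04-15 08:00'), ('2015-06-01');

CREATE TEMP VIEW period_totals AS
SELECT p.period_start::date                                          AS period_start
     , (p.period_start + interval '1 month' - interval '1 day')::date AS period_end
     , count(m.created_at)                                            AS interval_count
     , sum(count(m.created_at)) OVER (ORDER BY p.period_start)        AS running_sum
FROM   generate_series(timestamp '2015-03-15', timestamp '2015-05-15',
                       interval '1 month') AS p(period_start)
LEFT   JOIN members m
       ON  m.created_at >= p.period_start
       AND m.created_at <  p.period_start + interval '1 month'
GROUP  BY p.period_start;

SELECT * FROM period_totals ORDER BY period_start;
```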
I figured it out. The trick was subqueries. Here's my approach:
1. Add a count column with a default value of 0 to the generate_series call.
2. Select the interval and count(users.created_at) from the users data.
3. UNION the generate_series result and the result of the select in step #2. (At this point the result has duplicates for each interval.)
4. Use the results in a subquery to get the interval and max(interval_count), which eliminates the duplicates.
5. Use the window aggregation as before to get the running total.
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
SELECT interval, MAX(interval_count) AS interval_count FROM
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
) sub1
GROUP BY interval
) grouped_data
I'm not sure if there are any serious performance issues with this approach but it seems to work. If anyone has a better, more elegant or performant approach I would love the feedback.
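The UNION/MAX step may not be necessary. In the original query, the window is ordered by `date_trunc('week', users.created_at)`, which is NULL for weeks without users. NULLs sort last by default, and the default window frame includes all peer rows. As a result, every empty week sees the whole partition and gets the grand total. Ordering the window by the generated week column avoids this. The following is a minimal sketch with a hypothetical `signup_users` table.

```sql
CREATE TEMP TABLE signup_users (created_at timestamp NOT NULL);

INSERT INTO signup_users VALUES
    ('2015-03-31 10:00'), ('2015-04-01 11:00'), ('2015-04-21 09:00');

CREATE TEMP VIEW weekly_signups AS
SELECT w.week_start::date  AS week
     , count(u.created_at) AS interval_count
       -- order the window by the generated week, which is never NULL
     , sum(count(u.created_at)) OVER (ORDER BY w.week_start) AS cumulative_count
FROM   generate_series(timestamp '2015-03-30', timestamp '2015-04-20',
                       interval '1 week') AS w(week_start)
LEFT   JOIN signup_users u
       ON date_trunc('week', u.created_at) = w.week_start
GROUP  BY w.week_start;

SELECT * FROM weekly_signups ORDER BY week;
```

The empty weeks 2015-04-06 and 2015-04-13 now carry the running total of 2 instead of the grand total.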
Edit: My solution doesn't work when trying to group by arbitrary time windows.
I just tried this solution with the following changes:
/* generate series using DATE_TRUNC('day'...)*/
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
For example, is it possible to produce similar results but with the data grouped into intervals like these:
3/15/15 - 4/14/15,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15

Getting data from postgres weekly (according to date)

user | timespent (sec) | date (timestamp)
u1 | 10 | t1 (2015-08-15)
u1 | 20 | t2 (2015-08-19)
u1 | 15 | t3 (2015-08-28)
u1 | 16 | t4 (2015-09-06)
Above is the format of my table. It records the time a user spent on a course and is ordered by timestamp. I want the weekly sum of timespent for a particular user, say u1, in this format:
start_date | end_date | sum
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
The difficulty is that the seven-day periods you want are not regular weeks starting on Monday.
Therefore you cannot use the standard functions that derive a week number from a date; you have to generate your own weeks with generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrary chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
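The nested subquery above may not be strictly necessary. The following sketch of an equivalent query uses the same rows in a table renamed to `course_sessions`. It moves the user filter into the LEFT JOIN condition, so that empty weeks are kept. It also uses a half-open range instead of `BETWEEN start_date AND end_date`, because with a date upper bound, BETWEEN would miss sessions later in the day on end_date.

```sql
CREATE TEMP TABLE course_sessions (user_name text, time_spent int, session_date timestamp);

INSERT INTO course_sessions VALUES
    ('u1', 10, '2015-08-15'), ('u1', 20, '2015-08-19'),
    ('u1', 15, '2015-08-28'), ('u1', 16, '2015-09-06');

CREATE TEMP VIEW weekly_time AS
SELECT w.start_date::date              AS start_date
     , w.start_date::date + 6          AS end_date
     , coalesce(sum(s.time_spent), 0)  AS total
FROM   generate_series(timestamp '2015-08-15', timestamp '2015-09-06',
                       interval '7 days') AS w(start_date)
LEFT   JOIN course_sessions s
       ON  s.user_name = 'u1'            -- filter here, not in WHERE, to keep empty weeks
       AND s.session_date >= w.start_date
       AND s.session_date <  w.start_date + interval '7 days'
GROUP  BY w.start_date;

SELECT * FROM weekly_time ORDER BY start_date;
```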
select "user",
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
To make Sunday the first day of the week, I added an extra day to "date" in the GROUP BY.
select (start_date - date_part('dow', start_date) * interval '1 day')::date start_date,
       (start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date end_date,
       total_time_spent
from (
    select min("date") start_date, sum(timespent) total_time_spent
    from mytable
    where "user" = 'u1'   -- unquoted user would mean current_user
    group by date_part('year', "date"), date_part('week', "date" + interval '1 day')) "tmp"
order by start_date
This is a more generic approach, for any date interval.