Aggregate data to 30 minute intervals - postgresql

I have a table where each row is 300 seconds (5 minutes) apart. I need to aggregate the data on every hour and half hour, summing everything before and including the hour or half hour.
I've tried this code:
SELECT
to_timestamp(floor(a / 1800) * 1800)
AT TIME ZONE 'UTC' as interval_alias, SUM(b) as b_sum
FROM TABLE_NAME GROUP BY interval_alias
...and it aggregates the data on every hour and half hour, but it sums the values after the hour and half hour instead of the values leading up to it.
The table looks something like this:
a | b
-------------------------
1533045600 | 3
1533045900 | 5
1533046200 | 6
1533046500 | 3
1533046800 | 5
1533047100 | 2
1533047400 | 3
1533047700 | 8
1533048000 | 5
1533048300 | 5
1533048600 | 6
The actual result with the above code is:
a | b
-------------------------
1533045600 | 24
1533047400 | 27
The desired output is:
a | b
-------------------------
1533045600 | 3
1533047400 | 24

I used a simpler calculation for interval_alias. Note that with a GROUP BY you can only select aggregates or columns that are part of the GROUP BY. (The SELECT * you posted in the question didn't look correct...)
SELECT
FLOOR(a/1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
See example code on SQL Fiddle
update:
This is close to your desired output but will include a third result as your test data spans over more than a half hour.
SELECT
FLOOR(a/1800)*1800 + SIGN(a%1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
ORDER BY interval_alias
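The arithmetic in the updated query can be checked outside the database. This is a minimal Python sketch, assuming `a` is a non-negative integer epoch in seconds; the function name `bucket_end` is made up for illustration. Each timestamp is rounded up to the next multiple of 1800 unless it already sits on a boundary.

```python
from collections import defaultdict

# Sample rows (a, b) copied from the question.
rows = [
    (1533045600, 3), (1533045900, 5), (1533046200, 6), (1533046500, 3),
    (1533046800, 5), (1533047100, 2), (1533047400, 3), (1533047700, 8),
    (1533048000, 5), (1533048300, 5), (1533048600, 6),
]

def bucket_end(a, width=1800):
    """Round a up to the next multiple of width; exact multiples stay put.

    Mirrors FLOOR(a/1800)*1800 + SIGN(a%1800)*1800 for non-negative integers.
    """
    return (a // width) * width + (width if a % width else 0)

# Sum b per bucket, like SUM(b) ... GROUP BY interval_alias.
sums = defaultdict(int)
for a, b in rows:
    sums[bucket_end(a)] += b

for interval_alias in sorted(sums):
    print(interval_alias, sums[interval_alias])
```

With the question's data this gives the two desired rows plus a third bucket, 1533049200, for the readings after 1533047400.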

Related

Group by day of the month (series) as well as a location

I have come across the question "How to get list day of month data per month in postgresql", which shows a simple use of generate_series to list dates, and I am building on it for my own query. I have a table with dates, number of users and a location, which I would like to report on monthly, showing zero on the days that have no data. I think the issue is with the grouping by location, as that is where my results currently differ from what I expect.
My data (table = reserve)
Date | Users | Location
-----------------------
2021-05-02 | 3 | 1100
2021-05-24 | 4 | 1000
2021-05-26 | 6 | 1000
2021-05-28 | 7 | 1100
2021-05-29 | 4 | 1100
2021-05-27 | 3 | 1000
etc.
If I use generate_series for the entire period (generate_series('2021-06-01', '2021-10-31', '1 day'::interval)) and then join to the reserve table for each of the locations, the group by excludes the blank days from the join.
I am hoping to achieve:
Date | Users | Location
2021-05-01 | 0 | 1000
2021-05-02 | 0 | 1000
....
2021-05-24 | 4 | 1000
Until end of month
2021-05-01 | 0 | 1100
2021-05-02 | 3 | 1100
....
Until end of month
Thank you in advance.
It's hard to tell exactly what you're after from the example data, but if you don't want to eliminate rows where there is no match, this is exactly what a LEFT JOIN is for.
For example:
SELECT * FROM generate_series('2021-06-01', '2021-10-31', '1 day'::interval) as d
LEFT JOIN reserve r ON r.the_date = d;
This will keep all the days in the sequence and just return null for the columns where there was no match for that day. You can see this in action with this SQL fiddle example.
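The answer above does not cover the per-location part of the question. One possible way to think about it, sketched in Python rather than SQL: build every (day, location) pair first and only then look up the reservations, so days without data survive as zeros for each location. The variable names are illustrative, and the month is assumed to be May 2021 to match the sample data.

```python
from datetime import date, timedelta

# Rows (date, users, location) copied from the question's reserve table.
reserve = [
    (date(2021, 5, 2), 3, 1100), (date(2021, 5, 24), 4, 1000),
    (date(2021, 5, 26), 6, 1000), (date(2021, 5, 28), 7, 1100),
    (date(2021, 5, 29), 4, 1100), (date(2021, 5, 27), 3, 1000),
]

# Sum users per (date, location); plays the role of the joined table.
users_by_key = {}
for day, users, location in reserve:
    users_by_key[(day, location)] = users_by_key.get((day, location), 0) + users

# Every day of May 2021, the counterpart of generate_series.
days = [date(2021, 5, 1) + timedelta(days=i) for i in range(31)]
locations = sorted({location for _, _, location in reserve})

# Build every (location, day) pair first, then look up the users with a
# default of 0, which is what a LEFT JOIN plus COALESCE would give.
report = [
    (day, users_by_key.get((day, location), 0), location)
    for location in locations
    for day in days
]

for day, users, location in report:
    print(day, users, location)
```

In SQL the same idea would usually be a CROSS JOIN between the date series and the distinct locations, followed by the LEFT JOIN to reserve.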

Problem with gaps in data with generate_series

I need a query that returns the cumulative sum of all paid bills per day in the current month.
I've tried a few queries, including this one:
SELECT DISTINCT
month.day,
sum(bills.value) OVER (ORDER BY month.day)
FROM generate_series(1,31) month(day)
LEFT JOIN bills ON date_part('day',bills.payment_date) = month.day
WHERE
(
(date_part('year',bills.payment_date)=date_part('year',CURRENT_DATE)) AND
(date_part('month',bills.payment_date)=date_part('month',CURRENT_DATE))
)
GROUP BY month.day, bills.value, bills.payment_date
ORDER BY month.day
I'm getting:
day | value
1 | 1000
4 | 3000
5 | 5000
The sum is correct, but I'm not getting all 31 days from the generate_series function. Also, when I remove the DISTINCT keyword, the query just repeats the days, like:
day | value
1 | 1000
4 | 3000
4 | 3000
4 | 3000
4 | 3000
5 | 5000
5 | 5000
What I want is:
day | value
1 | 1000
2 | 1000
3 | 1000
4 | 3000
5 | 5000
6 | 5000
... | 5000
31 | 5000
Any ideas?
Remove bills.value, bills.payment_date from the group by, then you also don't need the distinct any more. You can also simplify the WHERE clause. But you need to move that condition into the JOIN condition, otherwise it will turn the outer join back into an inner join.
SELECT month.day,
sum(sum(bills.value)) over (order by month.day) as total_value
FROM generate_series(1,31) month(day)
LEFT JOIN bills
ON date_part('day',bills.payment_date) = month.day
AND to_char(bills.payment_date, 'yyyymm') = to_char(current_date, 'yyyymm')
GROUP BY month.day
ORDER BY month.day
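The two-level sum(sum(...)) can be read as two steps: total the bills per day, then accumulate across the ordered days. This is a rough Python sketch of that idea; the individual bill values are hypothetical and were only chosen so the running total matches the 1000 / 3000 / 5000 figures in the question.

```python
from itertools import accumulate

# Hypothetical bills as (day_of_month, value); chosen so the running total
# matches the 1000 / 3000 / 5000 figures shown in the question.
bills = [(1, 1000), (4, 500), (4, 500), (4, 500), (4, 500), (5, 1500), (5, 500)]

# Step 1: total per day, defaulting to 0 for days without bills
# (the LEFT JOIN plus the inner sum()).
per_day = {day: 0 for day in range(1, 32)}
for day, value in bills:
    per_day[day] += value

# Step 2: running total over the ordered days (the outer sum() OVER ...).
days = sorted(per_day)
running = dict(zip(days, accumulate(per_day[d] for d in days)))

for day in days:
    print(day, running[day])
```

Days without bills contribute zero, so they repeat the previous running total instead of disappearing.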

Select query for fetching data on the interval of 4 hours

I am using a Postgres 8.1 database and I want to write a query that selects data in intervals of 4 hours.
The image shows subscriber_id with a date; this is how the data is currently stored in the database, and
I want data like
No. of Subscriber | Interval
0 0-4
0 4-8
7 8-12
1 12-16
0 16-20
0 20-24
Each day has 24 hours, so dividing by 4 (24/4=6) gives 6 intervals per day:
0-4
4-8
8-12
12-16
16-20
20-24
So I need the count of subscribers within these intervals. Is there a date function in Postgres that solves my problem, or how can I write a query for this?
NOTE : Please write your solution according to postgres 8.1 version
Use generate_series() to generate periods and left join date_time with appropriate periods, e.g.:
with my_table(date_time) as (
values
('2016-10-24 11:10:00'::timestamp),
('2016-10-24 11:20:00'),
('2016-10-24 15:10:00'),
('2016-10-24 21:10:00')
)
select
format('%s-%s', p, p+4) as "interval",
sum((date_time notnull)::int) as "no of subscriber"
from generate_series(0, 20, 4) p
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4
group by p
order by p;
interval | no of subscriber
----------+------------------
0-4 | 0
4-8 | 0
8-12 | 2
12-16 | 1
16-20 | 0
20-24 | 1
(6 rows)
I doubt anyone still remembers version 8.1 in detail. You can try:
create table periods(p integer);
insert into periods values (0),(4),(8),(12),(16),(20);
select
p as "from", p+4 as "to",
sum((date_time notnull)::int) as "no of subscriber"
from periods
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4
group by p
order by p;
from | to | no of subscriber
------+----+------------------
0 | 4 | 0
4 | 8 | 0
8 | 12 | 2
12 | 16 | 1
16 | 20 | 0
20 | 24 | 1
(6 rows)
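The bucketing in the queries above can be checked outside the database. This is a small Python sketch, assuming the buckets are half-open, [p, p+4), so that an hour on a boundary such as 4 or 8 lands in exactly one bucket:

```python
from datetime import datetime

# Timestamps copied from the answer's sample data.
date_times = [
    datetime(2016, 10, 24, 11, 10), datetime(2016, 10, 24, 11, 20),
    datetime(2016, 10, 24, 15, 10), datetime(2016, 10, 24, 21, 10),
]

# One counter per bucket start: 0, 4, 8, 12, 16, 20.
counts = {p: 0 for p in range(0, 24, 4)}
for ts in date_times:
    # Integer division maps hours 0-3 to 0, 4-7 to 4, and so on,
    # so every hour belongs to exactly one bucket.
    counts[(ts.hour // 4) * 4] += 1

for p, n in counts.items():
    print(f"{p}-{p + 4}", n)
```

Pre-filling the dictionary with all six bucket starts plays the role of the left join: empty intervals still show up with a count of 0.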
In Postgres, you can do this by generating all the intervals for your time periods. This is a little tricky, because you have to pick out the dates in your data. However, generate_series() is really helpful.
The rest is just a left join and aggregation:
select dt.dt, count(t.t)
from (select generate_series(min(d.dte), max(d.dte) + interval '23 hour', interval '4 hour') as dt
from (select distinct date_trunc('day', t.t)::date as dte from t) d
) dt left join
t
on t.t >= dt.dt and t.t < dt.dt + interval '4 hour'
group by dt.dt
order by dt.dt;
Note that this keeps the period as the date/time of the beginning of the period. You can readily convert this to a date and an interval number, if that is more helpful.
The previous solution also works.
Adding one more option:
instead of creating a table for the periods, we can also use an array and the unnest function in this query.
My code is:
select
p as "from", p+4 as "to",
sum((date_time is not null)::int) as "no of subscriber"
from unnest(ARRAY[0,4,8,12,16,20]) as p
left join my_table
on extract(hour from date_time) >= p and extract(hour from date_time) < p + 4
group by p
order by p;
Since you know the lower and upper limit of each interval, I think running six separate queries would be better.

Postgresql running totals with groups missing data and outer joins

I've written a SQL query that pulls data from a user table and produces a running total and cumulative total of when users were created. The data is grouped by week (using the windowing feature of Postgres). I'm using a left outer join to include the weeks when no users were created. Here is the query...
WITH reporting_period AS (
SELECT generate_series(date_trunc('week', date '2015-04-02'), date_trunc('week', date '2015-10-02'), interval '1 week') AS interval
)
SELECT
date(interval) AS interval
, count(users.created_at) as interval_count
, sum(count( users.created_at) ) OVER (order by date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT JOIN users
ON interval=date(date_trunc('week', users.created_at) )
GROUP BY interval, date_trunc('week', users.created_at) ORDER BY interval
It works almost perfectly. The cumulative value is calculated properly for weeks in which a user was created. For weeks when no user was created, it is set to the grand total and not the cumulative total up to that point.
Notice that in the rows marked with ** the Week Tot column (interval_count) is 0 as expected, but the Run Tot (cumulative_count) is 1053, which equals the grand total.
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
...
2015-06-08 | 996 | 1031
...
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
This is what I would like
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
...
It seems to me that if the outer join can somehow apply the grand total to the last column it should be possible to apply the current running total but I'm at a loss on how to do it.
Is this possible?
This is not guaranteed to work out of the box, as I haven't tested it on actual tables, but the key here is to join users on created_at over a range of dates.
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
)
select interval_end,
interval_count,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at > rp.interval_begin
and u.created_at <= rp.interval_end
group by interval_end
) q
I figured it out. The trick was subqueries. Here's my approach:
1. Add a count column to the generate_series call with a default value of 0.
2. Select interval and count(users.created_at) from the users data.
3. Union the generate_series rows and the result from the select in step #2. (At this point the result will have duplicates for each interval.)
4. Use the results in a subquery to get interval and max(interval_count), which eliminates the duplicates.
5. Use the window aggregation as before to get the running total.
SELECT
interval
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
FROM
(
SELECT interval, MAX(interval_count) AS interval_count FROM
(
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
UNION
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
) sub1
GROUP BY interval
) grouped_data
I'm not sure if there are any serious performance issues with this approach but it seems to work. If anyone has a better, more elegant or performant approach I would love the feedback.
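The steps above can be traced on a few rows outside the database. This is a rough Python sketch, using only the first six weeks from the question's table; the variable names are illustrative.

```python
from datetime import date, timedelta
from itertools import accumulate

# Weekly counts for weeks that have users, taken from the question's table.
counted = {
    date(2015, 3, 30): 4, date(2015, 4, 6): 13,
    date(2015, 4, 20): 9, date(2015, 4, 27): 3,
}

# Steps 1-3: a zero row for every week in the period, unioned with the counts.
weeks = [date(2015, 3, 30) + timedelta(weeks=i) for i in range(6)]
rows = [(w, 0) for w in weeks] + list(counted.items())

# Step 4: keep the largest count per week, which removes the duplicate zeros.
interval_count = {}
for week, n in rows:
    interval_count[week] = max(interval_count.get(week, 0), n)

# Step 5: running total over the ordered weeks.
ordered = sorted(interval_count)
cumulative = list(accumulate(interval_count[w] for w in ordered))

for week, total in zip(ordered, cumulative):
    print(week, interval_count[week], total)
```

The empty weeks (2015-04-13 and 2015-05-04) carry the previous running total forward, 17 and 29, which is the behaviour the question asked for.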
Edit: My solution doesn't work when trying to group by arbitrary time windows
Just tried this solution with the following changes
/* generate series using DATE_TRUNC('day'...)*/
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
For example, is it possible to produce similar results but have the data grouped by intervals like these:
3/15/15 - 4/14/15,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15
etc.

Grouping based on every N days in postgresql

I have a table that includes ID, date, values (temperature) and some other stuff. My table looks like this:
+-----+--------------+------------+
| ID | temperature | Date |
+-----+--------------+------------+
| 1 | 26.3 | 2012-02-05 |
| 2 | 27.8 | 2012-02-06 |
| 3 | 24.6 | 2012-02-07 |
| 4 | 29.6 | 2012-02-08 |
+-----+--------------+------------+
I want to perform aggregation queries like sum and mean for every 10 days.
I was wondering if it is possible in psql or not?
SQL Fiddle
select
"date",
temperature,
avg(temperature) over(order by "date" rows 10 preceding) mean
from t
order by "date"
select id,
temperature,
sum(temperature) over (order by "date" rows between 10 preceding and current row)
from the_table;
It might not be exactly what you want, as it computes a moving sum over the current row and the 10 rows before it, which is not necessarily the same as the last 10 days.
Since Postgres 11, you can use a range based on an interval:
select id,
temperature,
avg(temperature) over (order by "date"
range between interval '10 days' preceding and current row)
from the_table;
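The difference between a row-based and a range-based frame only shows up when dates are missing. This is a minimal Python sketch of the range-based average, assuming the frame covers every row dated from 10 days before the current row up to and including it. The helper name `range_avg` and the last reading are made up for illustration.

```python
from datetime import date, timedelta

# First four rows come from the question; the last row is a made-up
# reading after a gap, added to show where rows and days differ.
readings = [
    (date(2012, 2, 5), 26.3), (date(2012, 2, 6), 27.8),
    (date(2012, 2, 7), 24.6), (date(2012, 2, 8), 29.6),
    (date(2012, 2, 20), 30.0),
]

def range_avg(rows, current, days=10):
    """Average of all temperatures dated within [current - days, current]."""
    start = current - timedelta(days=days)
    window = [t for d, t in rows if start <= d <= current]
    return sum(window) / len(window)

for d, t in readings:
    print(d, t, round(range_avg(readings, d), 3))
```

For the reading on 2012-02-20 the window contains only that row, whereas a frame of 10 preceding rows would still average in the four February readings from before the gap.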