Select query for fetching data on the interval of 4 hours - postgresql

I am using postgres 8.1 database and I want to write a query which select data on the interval of 4 hours.
So as image show subscriber_id with date, this how currently data available in the database and
I want data like
No. of Subscriber | Interval
0 0-4
0 4-8
7 8-12
1 12-16
0 16-20
0 20-24
basically in each day we have 24 hours, if I divide 24/4=6 means I have total 6 intervals for each day
0-4
4-8
8-12
12-16
16-20
20-24
So I need count of subscribers within these intervals. Is there any data function in postgres which solve my problem or how can I write a query for this problem ?
NOTE : Please write your solution according to postgres 8.1 version

Use generate_series() to generate periods and left join date_time with appropriate periods, e.g.:
with my_table(date_time) as (
values
('2016-10-24 11:10:00'::timestamp),
('2016-10-24 11:20:00'),
('2016-10-24 15:10:00'),
('2016-10-24 21:10:00')
)
select
format('%s-%s', p, p+4) as "interval",
sum((date_time notnull)::int) as "no of subscriber"
from generate_series(0, 20, 4) p
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;
interval | no of subscriber
----------+------------------
0-4 | 0
4-8 | 0
8-12 | 2
12-16 | 1
16-20 | 0
20-24 | 1
(6 rows)
I wouldn't suppose that there is a live guy who remembers version 8.1. You can try:
create table periods(p integer);
insert into periods values (0),(4),(8),(12),(16),(20);
select
p as "from", p+4 as "to",
sum((date_time notnull)::int) as "no of subscriber"
from periods
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;
from | to | no of subscriber
------+----+------------------
0 | 4 | 0
4 | 8 | 0
8 | 12 | 2
12 | 16 | 1
16 | 20 | 0
20 | 24 | 1
(6 rows)

In Postgres, you can do this by generating all the intervals for your time periods. This is a little tricky, because you have to pick out the dates in your data. However, generate_series() is really helpful.
The rest is just a left join and aggregation:
select dt.dt, count(t.t)
from (select generate_series(min(d.dte), max(d.dte) + interval '23 hour', interval '4 hour') as dt
from (select distinct date_trunc('day', t.t)::date as dte from t) d
) dt left join
t
on t.t >= dt.dt and t.t < dt.dt + interval '4 hour'
group by dt.dt
order by dt.dt;
Note that this keeps the period as the date/time of the beginning of the period. You can readily convert this to a date and an interval number, if that is more helpful.

The Previous Solution is also working...
Adding one more option,
Instead of creating a table for periods we can also use array and unnest function of arrays in this query
My code is
select
p as "from", p+4 as "to",
sum((date_time not null)::int) as "no of subscriber"
from unnest(ARRAY[0,4,8,12,16,20]) as p
left join my_table
on extract(hour from date_time) between p and p+ 4
group by p
order by p;

I think if you run six different queries since you know the time intervals (lower and upper limit) will be better.

Related

How to Shorten Execution Time for A View

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.
user admin cust
user_id user_id | admin_id user_id | cust_id
--------- ---------|---------- ---------|---------
1 1 | a 2 | dd
2 4 | b 3 | ff
3
4
Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app
login_history
user_id | login_on
---------|---------------------
1 | 2022-01-01 13:22:43
1 | 2022-01-02 16:16:27
3 | 2022-01-05 21:17:52
2 | 2022-01-11 11:12:26
3 | 2022-01-12 03:34:47
I would like to create a view that would contain all dates for the first day of each week in the year starting from jan 1st, and a count column that contains the count of unique admin users that logged in that week and a count of unique cust users that logged in that week. So the resulting view should contain the following 53 records, one for each week.
login_counts_view
week_start_date | admin_count | cust_count
-----------------|-------------|------------
2022-01-01 | 1 | 1
2022-01-08 | 0 | 2
2022-01-15 | 0 | 0
.
.
.
2022-12-31 | 0 | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
week_start_dates.week_start_date::text AS week_start_date,
count(distinct a.user_id) AS admin_count,
count(distinct c.user_id) AS cust_count
FROM (
SELECT
to_char(i::date, 'YYYY-MM-DD') AS week_start_date
FROM
generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then operate the joins on integers representing the pseudo-weeks instead of date values and comparisons.
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE) + 7 * n.i week_start_date
, count(distinct lw.admin_id) admin_count
, count(distinct lw.cust_id) cust_count
FROM (
SELECT i FROM Numbers
) n
LEFT JOIN (
SELECT admin_id
, cust_id
, base
, pit
, pit-base delta
, (pit-base) / (3600 * 24 * 7) week
FROM (
SELECT a.user_id admin_id
, c.user_id cust_id
, CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
, CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
FROM login_history l
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
) le
) lw
ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1/1/1970 0h00).
CASTS are necessary to convert doubles to integers and timestamps to dates as mandated by the signatures of postgresql date functions and in order to enforce integer arithmetics.
The recursive subquery is a generator of consecutive integers. It could possibly be replaced by a generate_series call (untested)
Evaluation
See it in action in this db fiddle
The query plan indicates savings of 50-70% in execution time.

Combine generate series and count into one query

Postgres version 9.4.18, PostGIS Version 2.2.
I removed some of the details about the tables from this question because I doubt it's needed to answer the question. I can add those details back if necessary.
Desired result:
I want a total count for each week of year and hour of day (0100 to 5223). I'm able to successfully generate a series of 0100 to 5223 (actually up to 5300), and I'm able to get a total count for each week of year and hour of day individually, but i'm unable to combine the queries so that weeks of year/hours of day with a zero county still show up. I want to combine the count result with the generate_series (and ideally divide that result by 30) to get something like below.
MM-DD | count_not_zero | count_not_zero_divided_by_30
-------+----------------+----------------------------
0100 | 10 | 33.3
0101 | 0 | 0
0102 | 0 | 0
...
0123 | 0 | 0
0200 | 3 | 10
0201 | 10 | 33.3
...
5223 | 20 | 66.6
Here are my individual queries that work...that I want to combine:
SELECT DISTINCT f_woyhh(d::timestamp) as woyhh
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
GROUP BY woyhh
ORDER by woyhh asc;
SELECT dt, count(*) FROM
(SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago')
AS dt,
EXTRACT(YEAR FROM time) AS ctYear, count(*)
AS ct
FROM counties c
INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE countyname = 'Milwaukee' AND state = 'WI' AND EXTRACT(YEAR from time) > '1987' GROUP BY dt, EXTRACT(YEAR from time))
AS count group by dt;
The result from the second query above is (and skips zero count dt, which I don't want):
dt | count
-------+-------
0100 | 10
0104 | 5
0108 | 4
...
Conclusion:
I'm trying to combine the above working individual queries into a single query that provides a three a three column result--woyhh, count, and count divided by 30. And I want to include woyhh that have zero in the county, so that I have a complete set of woyhh.
Thanks for any help!!
I found the answer. I'll be posting it tomorrow, but I wanted to put this on today so no one unnecessarily works on this question. I apologize for the formatting.
WITH CTE_Dates AS (SELECT DISTINCT f_woyhh(d::timestamp) as dt
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d),
CTE_WeeklyHourlyCounts AS (SELECT dt, count(*) as ct
FROM (SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago') as dt,
EXTRACT(YEAR FROM time) as ctYear, count(*) as ct
FROM counties c
INNER JOIN ltg_data d on ST_contains(c.the_geom, d.ltg_geom)
WHERE countyname = 'Milwaukee' AND state = 'WI' AND EXTRACT(YEAR from time) > '1987'
GROUP BY dt,
EXTRACT(YEAR from time)) as count group by dt),
CTE_FullSTats AS (SELECT CTE_Dates.dt as dt, CAST(CTE_WeeklyHourlyCounts.ct as decimal) as ct
FROM CTE_Dates LEFT JOIN CTE_WeeklyHourlyCounts ON CTE_WeeklyHourlyCounts.dt = CTE_Dates.dt
GROUP BY CTE_Dates.dt, CTE_WeeklyHourlyCounts.ct, CTE_WeeklyHourlyCounts.dt) SELECT dt, COALESCE(ct, 0)
AS count, round(((COALESCE(ct,0) * 100) / 30),0) as percent FROM CTE_FullStats
GROUP BY dt, ct ORDER BY dt;

Aggregate data to 30 minute intervals

A have a table where each row is 300 seconds (or 5 minutes) apart. I need to aggregate the data on every hour and half hour, aggregating everything before and including the hour or half hour.
I've tried this code:
SELECT
to_timestamp(floor(a / 1800 )) *
1800)
AT TIME ZONE 'UTC' as interval_alias, SUM(b) as b_sum
FROM TABLE_NAME GROUP BY interval_alias
...and it aggregates the data on every hour and half hour, but it sum the values post the hour and half hour.
The table looks something like this:
a | b
-------------------------
1533045600 | 3
1533045900 | 5
1533046200 | 6
1533046500 | 3
1533046800 | 5
1533047100 | 2
1533047400 | 3
1533047700 | 8
1533048000 | 5
1533048300 | 5
1533048600 | 6
The actual result with the above code is:
a | b
-------------------------
1533045600 | 24
1533047400 | 27
The desired output is:
a | b
-------------------------
1533045600 | 3
1533047400 | 24
I used a simpler calculation for the interval_alias and with a GROUP BY you can only select aggregations or the columns that are part of the GROUP BY. (The SELECT * you posted in the question didn't look correct...)
SELECT
FLOOR(a/1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
See example code on SQL Fiddle
update:
This is close to your desired output but will include a third result as your test data spans over more than a half hour.
SELECT
FLOOR(a/1800)*1800 + SIGN(a%1800)*1800 AS interval_alias,
SUM(b) AS sum_b
FROM TABLE_NAME
GROUP BY interval_alias
ORDER BY interval_alias

Postgresql running totals with groups missing data and outer joins

I've written a sql query that pulls data from a user table and produces a running total and cumulative total of when users were created. The data is grouped by week (using the windowing feature of postgres). I'm using a left outer join to include the weeks when no users where created. Here is the query...
<!-- language: lang-sql -->
WITH reporting_period AS (
SELECT generate_series(date_trunc('week', date '2015-04-02'), date_trunc('week', date '2015-10-02'), interval '1 week') AS interval
)
SELECT
date(interval) AS interval
, count(users.created_at) as interval_count
, sum(count( users.created_at) ) OVER (order by date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT JOIN users
ON interval=date(date_trunc('week', users.created_at) )
GROUP BY interval, date_trunc('week', users.created_at) ORDER BY interval
It works almost perfectly. The cumulative value is calculated properly for weeks week a user was created. For weeks when no user was create it is set to grand total and not the cumulative total up to that point.
Notice the rows with ** the Week Tot column (interval_count) is 0 as expected but the Run Tot (cumulative_total) is 1053 which equals the grand total.
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
...
2015-06-08 | 996 | 1031
...
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
This is what I would like
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
...
It seems to me that if the outer join can somehow apply the grand total to the last column it should be possible to apply the current running total but I'm at a loss on how to do it.
Is this possible?
This is not guaranteed to work out of the box as I havent tested on acutal tables, but the key here is to join users on created_at over a range of dates.
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
)
select interval_end,
interval_count,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at > rp.interval_begin
and u.created_at <= rp.interval_end
group by interval_end
) q
I figured it out. The trick was subqueries. Here's my approach
Add a count column to the generate_series call with default value of 0
Select interval and count(users.created_at) from the users data
Union the the generate_series and the result from the select in step #2
(At this point the result will have duplicates for each interval)
Use the results in a subquery to get interval and max(interval_count) which eliminates duplicates
Use the window aggregation as before to get the running total
SELECT
interval
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
FROM
(
SELECT interval, MAX(interval_count) AS interval_count FROM
(
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
UNION
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
) sub1
GROUP BY interval
) grouped_data
I'm not sure if there are any serious performance issues with this approach but it seems to work. If anyone has a better, more elegant or performant approach I would love the feedback.
Edit: My solution doesn't work when trying to group by arbitrary time windows
Just tried this solution with the following changes
/* generate series using DATE_TRUNC('day'...)*/
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
For example is is possible to produce these similar results but have the data grouped by intervals as so
3/15/15 - 4/14/15,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15
etc.

Column of counts for time intervals

I want to get a table that constructs a column that tracks how many times an id appears in a given week. If the id appears once it is given a 1, if it appears twice it is given a 2, but if it appears more than two times it is given a 0.
id date
a 2015-11-10
a 2015-11-25
a 2015-11-09
b 2015-11-10
b 2015-11-09
a 2015-11-05
b 2015-11-23
b 2015-11-28
b 2015-12-04
a 2015-11-10
b 2015-12-04
a 2015-12-07
a 2015-12-09
c 2015-11-30
a 2015-12-06
c 2015-10-31
c 2015-11-04
b 2015-12-01
a 2015-10-30
a 2015-12-14
the one week intervals are given as follows
1 - 2015-10-30 to 2015-11-05
2 - 2015-11-06 to 2015-11-12
3 - 2015-11-13 to 2015-11-19
4 - 2015-11-20 to 2015-11-26
5 - 2015-11-27 to 2015-12-03
6 - 2015-12-04 to 2015-12-10
7 - 2015-12-11 to 2015-12-17
The table should look like this.
id interval count
a 1 2
b 1 0
c 1 2
a 2 0
b 2 2
c 2 0
a 3 0
b 3 0
c 3 0
a 4 1
b 4 1
c 4 0
a 5 0
b 5 2
c 5 1
a 6 0
b 6 2
c 6 0
a 7 1
b 7 0
c 7 0
The interval column doesn't have to be there, I simply added it for clarity.
I am new to sql and am unsure how to break the dates into intervals. The only thing I have is grouping by date and counting.
Select id ,date, count (*) as frequency
from data_1
group by id, date having frequency <= 2;
Looking at just the data you provided, this does the trick:
SELECT v.id,
i.interval,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM (VALUES('a'), ('b'), ('c')) v(id)
CROSS JOIN generate_series(1, 7) i(interval)
LEFT JOIN (
SELECT id, ((date - '2015-10-30')/7 + 1)::int AS interval, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, interval)
ORDER BY 2, 1;
A few words of explanation:
You have three id values which are here recreated with a VALUES clause. If you have many more or don't know beforehand which id's to enumerate, you can always replace the VALUES clause with a sub-query.
You provide a specific date range over 7 weeks. Since you might have weeks where a certain id is not present you need to generate a series of the interval values and CROSS JOIN that to the id values above. This yields the 21 rows you are looking for.
Then you calculate the occurrences of ids in intervals. You can subtract a date from another date which will give you the number of days in between. So subtract the date of the row from the earliest date, divide that by 7 to get the interval period, add 1 to make the interval 1-based and convert to integer. You can then convert counts of > 2 to 0 and NULL to 0 with a combination of CASE and coalesce().
The query outputs the interval too, otherwise you will have no clue what the data refers to. Optionally, you can turn this into a column which shows the date range of the interval.
More flexible solution
If you have more ids and a larger date range, you can use the below version which first determines the distinct ids and the date range. Note that the interval is now 0-based to make calculations easier. Not that it matters much because instead of the interval number, the corresponding date range is displayed.
WITH mi AS (
SELECT min(date) AS min, ((max(date) - min(date))/7)::int AS intv FROM my_table)
SELECT v.id,
to_char((mi.min + i.intv * 7)::timestamp, 'YYYY-mm-dd') || ' - ' ||
to_char((mi.min + i.intv * 7 + 6)::timestamp, 'YYYY-mm-dd') AS period,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM mi,
(SELECT DISTINCT id FROM my_table) v
CROSS JOIN LATERAL generate_series(0, mi.intv) i(intv)
LEFT JOIN LATERAL (
SELECT id, ((date - mi.min)/7)::int AS intv, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, intv)
ORDER BY 2, 1;
SQLFiddle with both solutions.
Assuming you have a table of all users, this will do the trick.
select
users.id,
interval_table.id,
CASE
WHEN count(log_table.user_id)>2 THEN 0
ELSE count(log_table.user_id)
END
from users
cross join interval_table
left outer join log_table
on users.id = log_table.user_id
and log_table.event_date >= interval_table.start_interval
and log_table.event_date < interval_table.stop_interval
group by users.id, interval_table.id
order by interval_table.id, users.id
Check it out: http://sqlfiddle.com/#!15/1a822/21