Combine generate_series and count into one query - PostgreSQL

Postgres version 9.4.18, PostGIS Version 2.2.
I removed some of the details about the tables from this question because I doubt they're needed to answer it. I can add those details back if necessary.
Desired result:
I want a total count for each week of year and hour of day (0100 to 5223). I'm able to successfully generate a series from 0100 to 5223 (actually up to 5300), and I'm able to get a total count for each week of year and hour of day individually, but I'm unable to combine the queries so that weeks of year/hours of day with a zero count still show up. I want to combine the count result with the generate_series (and ideally divide that result by 30) to get something like the table below.
MM-DD | count_not_zero | count_not_zero_divided_by_30
-------+----------------+----------------------------
0100 | 10 | 33.3
0101 | 0 | 0
0102 | 0 | 0
...
0123 | 0 | 0
0200 | 3 | 10
0201 | 10 | 33.3
...
5223 | 20 | 66.6
Here are my individual queries that work...that I want to combine:
SELECT DISTINCT f_woyhh(d::timestamp) as woyhh
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
GROUP BY woyhh
ORDER by woyhh asc;
SELECT dt, count(*)
FROM (
    SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago') AS dt,
           EXTRACT(YEAR FROM time) AS ctYear,
           count(*) AS ct
    FROM counties c
    INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
    WHERE countyname = 'Milwaukee'
      AND state = 'WI'
      AND EXTRACT(YEAR FROM time) > '1987'
    GROUP BY dt, EXTRACT(YEAR FROM time)
) AS count
GROUP BY dt;
The result from the second query above is as follows (it skips zero-count dt values, which I don't want):
dt | count
-------+-------
0100 | 10
0104 | 5
0108 | 4
...
Conclusion:
I'm trying to combine the above working individual queries into a single query that provides a three-column result: woyhh, count, and count divided by 30. And I want to include woyhh values that have a zero count in the county, so that I have a complete set of woyhh.
Thanks for any help!!

I found the answer; it is posted below so that no one works on this question unnecessarily.
WITH CTE_Dates AS (
    SELECT DISTINCT f_woyhh(d::timestamp) AS dt
    FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
),
CTE_WeeklyHourlyCounts AS (
    SELECT dt, count(*) AS ct
    FROM (
        SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago') AS dt,
               EXTRACT(YEAR FROM time) AS ctYear,
               count(*) AS ct
        FROM counties c
        INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
        WHERE countyname = 'Milwaukee'
          AND state = 'WI'
          AND EXTRACT(YEAR FROM time) > '1987'
        GROUP BY dt, EXTRACT(YEAR FROM time)
    ) AS count
    GROUP BY dt
),
CTE_FullStats AS (
    SELECT CTE_Dates.dt AS dt,
           CAST(CTE_WeeklyHourlyCounts.ct AS decimal) AS ct
    FROM CTE_Dates
    LEFT JOIN CTE_WeeklyHourlyCounts ON CTE_WeeklyHourlyCounts.dt = CTE_Dates.dt
    GROUP BY CTE_Dates.dt, CTE_WeeklyHourlyCounts.ct, CTE_WeeklyHourlyCounts.dt
)
SELECT dt,
       COALESCE(ct, 0) AS count,
       round(((COALESCE(ct, 0) * 100) / 30), 0) AS percent
FROM CTE_FullStats
GROUP BY dt, ct
ORDER BY dt;
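For reference, here is a leaner sketch of the same idea (assuming the same f_woyhh function and the counties/ltg_data tables). The LEFT JOIN plus COALESCE already fills the zero-count slots, so the extra GROUP BY steps aren't strictly needed, and counting distinct years per slot replaces the nested count:
WITH all_slots AS (
    SELECT DISTINCT f_woyhh(d::timestamp) AS dt
    FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 hour') d
),
slot_counts AS (
    SELECT f_woyhh((time)::timestamp at time zone 'utc' at time zone 'america/chicago') AS dt,
           count(DISTINCT EXTRACT(YEAR FROM time)) AS ct   -- years with at least one row in this slot
    FROM counties c
    JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
    WHERE countyname = 'Milwaukee'
      AND state = 'WI'
      AND EXTRACT(YEAR FROM time) > 1987
    GROUP BY 1
)
SELECT s.dt,
       COALESCE(c.ct, 0) AS count,
       round(COALESCE(c.ct, 0) * 100.0 / 30, 0) AS percent
FROM all_slots s
LEFT JOIN slot_counts c ON c.dt = s.dt
ORDER BY s.dt;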

Related

Getting attendance of an employee with a date series in a particular range in Postgres

I have an attendance table with employee_id, date and punch-in time.
Emp_Id PunchTime
101 10/10/2016 07:15
101 10/10/2016 12:20
101 10/10/2016 12:50
101 10/10/2016 16:31
102 10/10/2016 07:15
Here I have dates only for the working days. I want to get the attendance list of an employee over a series of dates in a given period. I need the day name as well. The result should look as follows:
date | day |employee_id | Intime | outtime |
2016-10-09 | sunday | 101 | | |
2016-10-10 | monday | 101 | 2016-10-10 7:15AM |2016-10-10 4:31 PM |
You can generate a list of dates and then do an outer join on them:
The following displays all days in October:
select d.date, a.emp_id,
       min(punchtime) as intime,
       max(punchtime) as outtime
from generate_series(date '2016-10-01', date '2016-11-01' - 1, interval '1' day) as d (date)
left join attendance a on d.date = a.punchtime::date
group by d.date, a.emp_id
order by d.date, a.emp_id;
As you want the first and last timestamp of each day, this can be done using a simple group by query.
This will, however, not repeat the emp_id for the non-existing days.
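If you do need the emp_id repeated for the days without punches, one sketch (assuming the attendance table above with emp_id and punchtime columns) is to cross join the date series with the distinct employee ids and left join the punches onto that grid:
select d.date::date as date,
       to_char(d.date, 'FMday') as day,          -- day name, e.g. 'sunday'
       e.emp_id,
       min(a.punchtime) as intime,
       max(a.punchtime) as outtime
from generate_series(date '2016-10-01', date '2016-10-31', interval '1 day') as d (date)
cross join (select distinct emp_id from attendance) e
left join attendance a
       on a.emp_id = e.emp_id
      and a.punchtime::date = d.date::date
group by d.date, e.emp_id
order by d.date, e.emp_id;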
Something like the following will generate a list of the range of dates (starting and ending with whatever range is found in your punchtime table), with employees and intime, outtime for each. Check the SQL fiddle here:
http://sqlfiddle.com/#!15/d93bd/1
WITH RECURSIVE minmax AS
(
    -- earliest and latest punch dates in the table
    SELECT MIN(CAST(time AS DATE)) AS min, MAX(CAST(time AS DATE)) AS max
    FROM emp_time
),
dates AS
(
    -- recursively generate every date between min and max
    SELECT m.min AS datepart
    FROM minmax m
    UNION ALL
    SELECT d.datepart + 1
    FROM dates d, minmax mm
    WHERE d.datepart + 1 <= mm.max
)
SELECT d.datepart AS date, e.emp, MIN(e.time) AS intime, MAX(e.time) AS outtime
FROM dates d
LEFT JOIN emp_time e ON d.datepart = CAST(e.time AS DATE)
GROUP BY d.datepart, e.emp
ORDER BY d.datepart;

Postgresql running totals with groups missing data and outer joins

I've written a SQL query that pulls data from a user table and produces a running total and cumulative total of when users were created. The data is grouped by week (using the window functions of Postgres). I'm using a left outer join to include the weeks when no users were created. Here is the query...
WITH reporting_period AS (
    SELECT generate_series(date_trunc('week', date '2015-04-02'),
                           date_trunc('week', date '2015-10-02'),
                           interval '1 week') AS interval
)
SELECT date(interval) AS interval
     , count(users.created_at) AS interval_count
     , sum(count(users.created_at)) OVER (ORDER BY date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT JOIN users
       ON interval = date(date_trunc('week', users.created_at))
GROUP BY interval, date_trunc('week', users.created_at)
ORDER BY interval
It works almost perfectly. The cumulative value is calculated properly for weeks when a user was created. For weeks when no user was created, it is set to the grand total rather than the cumulative total up to that point.
Notice the rows marked with **: the Week Tot column (interval_count) is 0 as expected, but the Run Tot (cumulative_count) is 1053, which equals the grand total.
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
...
2015-06-08 | 996 | 1031
...
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
This is what I would like
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
...
It seems to me that if the outer join can somehow apply the grand total to the last column, it should be possible to apply the current running total instead, but I'm at a loss on how to do it.
Is this possible?
This is not guaranteed to work out of the box as I haven't tested it on actual tables, but the key here is to join users on created_at over a range of dates.
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
)
select interval_end,
interval_count,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at > rp.interval_begin
and u.created_at <= rp.interval_end
group by interval_end
) q
I figured it out. The trick was subqueries. Here's my approach:
1) Add a count column to the generate_series call with a default value of 0.
2) Select interval and count(users.created_at) from the users data.
3) Union the generate_series and the result from the select in step 2 (at this point the result will have duplicates for each interval).
4) Use the results in a subquery to get interval and max(interval_count), which eliminates the duplicates.
5) Use the window aggregation as before to get the running total.
SELECT
interval
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
FROM
(
SELECT interval, MAX(interval_count) AS interval_count FROM
(
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
UNION
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
) sub1
GROUP BY interval
) grouped_data
I'm not sure if there are any serious performance issues with this approach but it seems to work. If anyone has a better, more elegant or performant approach I would love the feedback.
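For what it's worth, here is a leaner sketch that avoids the UNION/MAX step (assuming the same users table with a created_at timestamp column). Keying the window's ORDER BY to the generated week rather than to users.created_at is what keeps the running total correct for the empty weeks:
WITH reporting_period AS (
    SELECT generate_series(date_trunc('week', date '2015-04-02'),
                           date_trunc('week', date '2015-10-02'),
                           interval '1 week')::date AS week
)
SELECT rp.week AS interval,
       count(u.created_at) AS interval_count,
       sum(count(u.created_at)) OVER (ORDER BY rp.week) AS cumulative_count
FROM reporting_period rp
LEFT JOIN users u
       ON date_trunc('week', u.created_at)::date = rp.week
      AND u.created_at < date '2015-10-02'
GROUP BY rp.week
ORDER BY rp.week;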
Edit: My solution doesn't work when trying to group by arbitrary time windows
Just tried this solution with the following changes
/* generate series using DATE_TRUNC('day'...)*/
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
For example, is it possible to produce similar results but have the data grouped by intervals like so:
3/15/15 - 4/14/15,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15
etc.
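One way to handle such mid-month-to-mid-month windows (a sketch, reusing the range-join idea from the answer above and the same users table) is to generate the window start dates directly and join on a half-open range:
WITH reporting_period AS (
    SELECT intervaldate AS interval_begin,
           intervaldate + interval '1 month' AS interval_end
    FROM generate_series(date '2015-03-15', date '2015-10-15', interval '1 month') AS intervaldate
)
SELECT rp.interval_begin,
       rp.interval_end - interval '1 day' AS interval_last_day,
       count(u.created_at) AS interval_count,
       sum(count(u.created_at)) OVER (ORDER BY rp.interval_begin) AS cumulative_count
FROM reporting_period rp
LEFT JOIN users u
       ON u.created_at >= rp.interval_begin
      AND u.created_at <  rp.interval_end
GROUP BY rp.interval_begin, rp.interval_end
ORDER BY rp.interval_begin;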

How to select records in date order that total to an arbitrary amount?

I have a table of fuel deliveries as follows:
Date Time Qty
20160101 0800 4500
20160203 0900 6000
20160301 0810 3400
20160328 1710 5300
20160402 1201 6000
I know that on April 1st I had 10,000 litres in the tank so now I want to select just the deliveries that make up the total. This means I want the records for 20160328,20160301 and 20160203. I am using Postgres and I want to know how to structure a select statement that would accomplish this task.
I understand how to use the where clause to filter records whose date is less than or equal to April 1st, but I do not know how to instruct Postgres to select the records in reverse date order until the quantity selected is greater than or equal to 10,000.
with d as (
select *, sum(qty) over (order by date desc, time desc) as total
from delivery
where date between '20160101' and '20160401'
)
select *
from d
where total < 10000
union
(
select *
from d
where total >= 10000
order by date desc, time desc
limit 1
)
order by date desc, time desc
;
date | time | qty | total
------------+----------+------+-------
2016-03-28 | 17:10:00 | 5300 | 5300
2016-03-01 | 08:10:00 | 3400 | 8700
2016-02-03 | 09:00:00 | 6000 | 14700
The data:
create table delivery (date date, time time, qty int);
insert into delivery (date, time, qty) values
('20160101','0800',4500),
('20160203','0900',6000),
('20160301','0810',3400),
('20160328','1710',5300),
('20160402','1201',6000);
You can create a running total using a window function based on descending order of date and time, like so:
SELECT
Date,
Time,
Qty
FROM
(
SELECT
Date,
Time,
Qty,
SUM(Qty) OVER (ORDER BY Date DESC, Time DESC) AS Running_Total
FROM
fuel_deliveries
WHERE
Date < '20160402'
) rt
WHERE
Running_Total <= 10000;
The inner/sub query gets you the running total, but you then want to filter on it where the value is less than or equal to 10000.
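Note that filtering on Running_Total <= 10000 stops just short of the target: with the sample data it returns the 5300 and 8700 rows but not the 14700 row that actually reaches 10,000. If the crossing delivery should be included, as in the first answer above, one sketch is to filter on the running total before each row instead:
SELECT Date, Time, Qty, Running_Total
FROM (
    SELECT Date, Time, Qty,
           SUM(Qty) OVER (ORDER BY Date DESC, Time DESC) AS Running_Total
    FROM fuel_deliveries
    WHERE Date < '20160402'
) rt
-- keep a row while the total accumulated before it is still below the 10,000 target
WHERE Running_Total - Qty < 10000
ORDER BY Date DESC, Time DESC;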

Getting data from postgres weekly (according to date)

user timespent(in sec) date(in timestamp)
u1 10 t1(2015-08-15)
u1 20 t2(2015-08-19)
u1 15 t3(2015-08-28)
u1 16 t4(2015-09-06)
Above is the format of my table, which represents the time spent by a user on a course, ordered by timestamp. I want to get the sum of time spent by a particular user, say u1, per week, in the format:
start_date end_date sum
2015-08-15 2015-08-21 30
2015-08-22 2015-08-28 15
2015-08-29 2015-09-04 0
2015-09-05 2015-09-11 16
The difficulty lies in the fact that the seven-day periods you want are not regular weeks starting with Monday.
You therefore cannot use the standard functions to get the week number from the date, and have to build your own weeks generator using generate_series().
Example data:
create table sessions (user_name text, time_spent int, session_date timestamp);
insert into sessions values
('u1', 10, '2015-08-15'),
('u1', 20, '2015-08-19'),
('u1', 15, '2015-08-28'),
('u1', 16, '2015-09-06');
The query for an arbitrarily chosen period from 2015-08-15 to 2015-09-06:
with weeks as (
select d::date start_date, d::date+ 6 end_date
from generate_series('2015-08-15', '2015-09-06', '7d'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(time_spent), 0) total
from weeks w
left join (
select start_date, end_date, coalesce(time_spent, 0) time_spent
from weeks
join sessions
on session_date between start_date and end_date
where user_name = 'u1'
) s
on w.start_date = s.start_date and w.end_date = s.end_date
group by 1, 2
order by 1;
start_date | end_date | total
------------+------------+-------
2015-08-15 | 2015-08-21 | 30
2015-08-22 | 2015-08-28 | 15
2015-08-29 | 2015-09-04 | 0
2015-09-05 | 2015-09-11 | 16
(4 rows)
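For reference, the inner subquery is not strictly necessary; here is a sketch of the same result with a single left join (using the same sessions table as above):
with weeks as (
    select d::date as start_date, d::date + 6 as end_date
    from generate_series('2015-08-15', '2015-09-06', '7 days'::interval) d
)
select w.start_date, w.end_date, coalesce(sum(s.time_spent), 0) as total
from weeks w
left join sessions s
       on s.session_date between w.start_date and w.end_date
      and s.user_name = 'u1'
group by w.start_date, w.end_date
order by w.start_date;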
select
ui,
date_trunc('week', the_date)::date as start_date,
date_trunc('week', the_date)::date + 6 as end_date,
sum(timespent) as "sum"
from t
group by 1, 2, 3
order by 1,2
Something like this (assuming that by timestamp you mean the data type timestamp).
In order to make the first day of the week Sunday, I added an extra day to "date" in the group by.
select (start_date - date_part('dow', start_date) * interval '1 day')::date as start_date,
       (start_date + (6 - date_part('dow', start_date)) * interval '1 day')::date as end_date,
       total_time_spent
from (
    select min("date") as start_date, sum(timespent) as total_time_spent
    from mytable
    where "user" = 'u1'
    group by date_part('year', "date"), date_part('week', "date" + interval '1 day')
) "tmp"
order by start_date
This is a more generic approach, for any date interval.

Compare interval date by row

I am trying to group dates within a 1-year interval for a given identifier, labeling which is the earliest date and which is the latest date. If there are no other dates within a 1-year interval from a date, then it records its own date as both the first and last date. For example, the original data is:
id | date
____________
a | 1/1/2000
a | 1/2/2001
a | 1/6/2000
b | 1/3/2001
b | 1/3/2000
b | 1/3/1999
c | 1/1/2000
c | 1/1/2002
c | 1/1/2003
And the output I want is:
id | first_date | last_date
___________________________
a | 1/1/2000 | 1/2/2001
b | 1/3/1999 | 1/3/2001
c | 1/1/2000 | 1/1/2000
c | 1/1/2002 | 1/1/2003
I have been trying to figure this out all day and can't. I can do it for ids with only 2 duplicates, but not for more. Any help would be great.
SELECT id
, min(min_date) AS min_date
, max(max_date) AS max_date
, sum(row_ct) AS row_ct
FROM (
SELECT id, year, min_date, max_date, row_ct
, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM (
SELECT id
, extract(year FROM the_date)::int AS year
, min(the_date) AS min_date
, max(the_date) AS max_date
, count(*) AS row_ct
FROM tbl
GROUP BY id, year
) sub1
) sub2
GROUP BY id, grp
ORDER BY id, grp;
1) Group all rows per (id, year) in subquery sub1. Record the min and max of the date. I added a count of rows (row_ct) for demonstration.
2) Subtract the row_number() from the year in the second subquery sub2. Thus, all rows in succession end up in the same group (grp). A gap in the years starts a new group.
3) In the final SELECT, group a second time, this time by (id, grp), and record min, max and row count again. Voilà. This produces exactly the result you are looking for.
-> SQLfiddle demo.
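As a quick illustration of step 2 using id 'c' from the sample data: the years 2000, 2002 and 2003 get row_number() 1, 2 and 3, so grp becomes 1999, 2000 and 2000. The gap between 2000 and 2002 starts a new group, which is why 'c' ends up with two rows in the output.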
Related answers:
Return array of years as year ranges
Group by repeating attribute
select id, min("date") as first_date, max("date") as last_date
from <yourTbl> group by id
Use this (SQLFiddle Demo):
SELECT id,
min(date) AS first_date,
max(date) AS last_date
FROM mytable
GROUP BY 1
ORDER BY 1