generate series using break time - postgresql

I have a table that stores the opening and closing hours:
CREATE TABLE public.open_hours
(
id bigint NOT NULL,
open_hour character varying(255),
end_hour character varying(255),
day character varying(255),
CONSTRAINT pk_open_hour_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.open_hours
OWNER TO postgres;
I have another table that stores the break times:
CREATE TABLE public.break_hours
(
id bigint ,
start_time character varying(255),
end_time character varying(255),
open_hour_id bigint ,
CONSTRAINT break_hours_pkey PRIMARY KEY (id),
CONSTRAINT fkinhl5x01pnn54nv15ol5ntxr5 FOREIGN KEY (open_hour_id )
REFERENCES public.open_hours(id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.break_hours
OWNER TO postgres;
I need to generate a time series at 30-minute intervals that skips the break times.
For example, if the opening hour is 08:00 AM, the closing hour is 06:00 PM, and there are breaks from 11:00 AM to 11:30 AM and from 03:00 PM to 03:15 PM, then I need to generate series from 08:00 AM to 11:00 AM, from 11:30 AM to 03:00 PM, and from 03:15 PM to 06:00 PM.
Sample data:
open_hours
-----------
id | open_hour | end_hour | day
1 | 08:00 AM | 06:00 PM | Monday
break_hours
-----------
id | start_time | end_time | open_hour_id
1 | 11:00 AM | 11:30 AM | 1
2 | 03:00 PM | 03:15 PM | 1
Sample output
--------------
08:00 AM
08:30 AM
09:00 AM
09:30 AM
10:00 AM
10:30 AM
11:30 AM
12:00 PM
12:30 PM
01:00 PM
01:30 PM
02:00 PM
02:30 PM
03:15 PM
03:45 PM
04:15 PM
04:45 PM
05:15 PM
The query I use to generate the series between the opening hours is:
SELECT DISTINCT gs AS start_time,gs + interval '30min' as end_time
FROM generate_series( timestamp '2018-11-09 08:00 AM', timestamp '2018-11-09 06:00 PM', interval '30min' )gs
ORDER BY start_time

Your table modelling could be cleaned up. For example, you should not store times as text but as time without time zone. The query below assumes the columns have that type, because adding a date to a text value does not work.
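A minimal sketch of that clean-up, assuming every stored value parses as a time of day (for example '08:00 AM'). The statements are my own illustration and are not taken from the demo; a value that does not parse makes the ALTER TABLE fail.

```sql
-- Convert the text columns to time without time zone in place.
-- Assumption: all existing values are valid time literals such as '08:00 AM'.
ALTER TABLE public.open_hours
    ALTER COLUMN open_hour TYPE time USING open_hour::time,
    ALTER COLUMN end_hour  TYPE time USING end_hour::time;

ALTER TABLE public.break_hours
    ALTER COLUMN start_time TYPE time USING start_time::time,
    ALTER COLUMN end_time   TYPE time USING end_time::time;
```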
demo: db<>fiddle
WITH hours AS (
SELECT
oh.open_hour + '1970-01-01'::date as open_hour,
oh.end_hour + '1970-01-01'::date as end_hour,
bh.start_time + '1970-01-01'::date as break_start,
bh.end_time + '1970-01-01'::date as break_end,
lead(start_time + '1970-01-01'::date) OVER (ORDER BY start_time) as next_start_time
FROM open_hours oh
LEFT JOIN break_hours bh
ON oh.id = bh.open_hour_id
)
SELECT generate_series(open_hour, break_start, interval '30 minutes')::time as time_slot
FROM (
SELECT
open_hour, break_start
FROM hours
ORDER BY break_start
LIMIT 1
)s
UNION
SELECT
generate_series(break_end, next_start_time, interval '30 minutes')::time
FROM (
SELECT
break_end, next_start_time
FROM
hours
WHERE next_start_time IS NOT NULL
) s
UNION
SELECT generate_series(break_end, end_hour, interval '30 minutes')::time
FROM (
SELECT
break_end, end_hour
FROM hours
ORDER BY break_start DESC
LIMIT 1
) s
Explanation:
WITH clause (CTE):
Merging both tables. I am adding an arbitrary date because this turns the time into a timestamp: generate_series, which is used later, works on timestamps but not on type time. The date part is cut away after the generation with the ::time cast.
The result of the CTE is:
open_hour           | end_hour            | break_start         | break_end           | next_start_time
1970-01-01 08:00:00 | 1970-01-01 18:00:00 | 1970-01-01 09:30:00 | 1970-01-01 09:45:00 | 1970-01-01 11:00:00
1970-01-01 08:00:00 | 1970-01-01 18:00:00 | 1970-01-01 11:00:00 | 1970-01-01 11:30:00 | 1970-01-01 15:00:00
1970-01-01 08:00:00 | 1970-01-01 18:00:00 | 1970-01-01 15:00:00 | 1970-01-01 15:15:00 | (NULL)
UNION part:
This part contains three subparts because I have to merge the time series from both tables:
1. Taking the opening hour. Generate a time series to the first break beginning.
For this I only need the first row from the CTE above. That's why LIMIT 1 is used.
2. For all breaks: Generate a time series from current break ending to the next break beginning.
The CTE contains a window function lead() which shifts the start_time of the next row into the current one (have a look at the last column of the CTE result). So now I am able to get all break times, no matter how many there are. In my example I added a third break from 9:30 to 9:45 to demonstrate it. So the next time series can be generated from all these columns (current break_end to next_start_time). Only the last row does not contain a next_start_time because there is none.
3. Last step: Generate a time series from the last break ending to the closing hour.
This is quite similar to (1). After iterating over all break times I have to add the last time series, from the end of the last break to the closing time. This could be achieved either by filtering for the row without next_start_time or by sorting DESC and using LIMIT 1, as I did.
More complex case with more day types:
demo: db<>fiddle
WITH hours AS (
SELECT
oh.id as day_id,
oh.open_hour + '1970-01-01'::date as open_hour,
oh.end_hour + '1970-01-01'::date as end_hour,
bh.start_time + '1970-01-01'::date as break_start,
bh.end_time + '1970-01-01'::date as break_end,
lead(start_time + '1970-01-01'::date) OVER (PARTITION BY oh.id ORDER BY start_time) as next_start_time
FROM open_hours oh
LEFT JOIN break_hours bh
ON oh.id = bh.open_hour_id
)
SELECT day_id, generate_series(open_hour, break_start, interval '30 minutes')::time as time_slot
FROM (
SELECT DISTINCT ON (day_id)
day_id, open_hour, break_start
FROM hours
ORDER BY day_id, break_start
)s
UNION
SELECT
day_id, generate_series(break_end, next_start_time, interval '30 minutes')::time
FROM (
SELECT
day_id, break_end, next_start_time
FROM
hours
WHERE next_start_time IS NOT NULL
) s
UNION
SELECT day_id, generate_series(break_end, end_hour, interval '30 minutes')::time
FROM (
SELECT DISTINCT ON (day_id)
day_id, break_end, end_hour
FROM hours
ORDER BY day_id, break_start DESC
) s
ORDER BY day_id, time_slot
The main idea stays the same as in the example for only one day. The difference is that we have to consider the different day types. I expanded the example above and added a second day with different opening hours and break times.
Changes:
The window function in the CTE got a PARTITION BY part. This ensures that only start_times belonging to the same day are shifted.
LIMIT 1 will not work anymore because it limits the whole table to one row. This has been changed to DISTINCT ON (day_id) which limits the table to the first row of each day.
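As an alternative sketch (not the approach from the demos above), the three UNION branches can be folded into one query by treating every opening hour and every break end as the start of a working segment. The sample data is inlined through CTEs that I named like the real tables, so the block runs on its own; the names segment_starts, segments, seg_start and seg_end are made up for illustration, and the times are assumed to be of type time. The upper bound is the segment end minus 30 minutes, so a slot is only emitted if it finishes before the next break or the closing hour. For the sample data this ends at 05:15 PM, as in the question.

```sql
-- Sample data inlined as CTEs; they shadow the real tables of the same name.
WITH open_hours(id, open_hour, end_hour) AS (
    VALUES (1, time '08:00', time '18:00')
),
break_hours(open_hour_id, start_time, end_time) AS (
    VALUES (1, time '11:00', time '11:30'),
           (1, time '15:00', time '15:15')
),
-- Every working segment starts at the opening hour or at the end of a break ...
segment_starts AS (
    SELECT id AS open_hour_id, open_hour AS seg_start FROM open_hours
    UNION ALL
    SELECT open_hour_id, end_time FROM break_hours
),
-- ... and ends at the next break start, or at closing time if no break follows.
segments AS (
    SELECT s.open_hour_id,
           s.seg_start,
           COALESCE(
               (SELECT min(b.start_time)
                FROM break_hours b
                WHERE b.open_hour_id = s.open_hour_id
                  AND b.start_time >= s.seg_start),
               (SELECT o.end_hour FROM open_hours o WHERE o.id = s.open_hour_id)
           ) AS seg_end
    FROM segment_starts s
)
SELECT seg.open_hour_id,
       to_char(gs, 'HH12:MI AM') AS time_slot
FROM segments seg
CROSS JOIN LATERAL generate_series(
    date '1970-01-01' + seg.seg_start,
    -- last slot must still fit completely into the segment
    date '1970-01-01' + seg.seg_end - interval '30 minutes',
    interval '30 minutes'
) AS gs
ORDER BY seg.open_hour_id, gs;
```

Because each segment is handled by the same LATERAL call, the number of breaks per day does not matter, and a day without breaks yields a single segment from opening to closing.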

Related

Postgresql generate series with interval '15 minutes' longer than 29092 items

Sut:
create table meter.materialized_quarters
(
id int4 not null generated by default as identity,
tm timestamp without time zone
,constraint pk_materialized_quarters primary key (id)
--,constraint uq_materialized_quarters unique (tm)
);
Then setup data:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute');
And check data:
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
Some results:
count|tm |
-----+-----------------------+
2|1999-10-31 02:00:00.000|
2|1999-10-31 02:15:00.000|
2|1999-10-31 02:30:00.000|
2|1999-10-31 02:45:00.000|
2|2000-10-29 02:00:00.000|
2|2000-10-29 02:15:00.000|
2|2000-10-29 02:30:00.000|
2|2000-10-29 02:45:00.000|
2|2001-10-28 02:00:00.000|
2|2001-10-28 02:15:00.000|
2|2001-10-28 02:30:00.000|
....
Details:
select * from meter.materialized_quarters where tm = '1999-10-31 01:45:00';
Result:
id |tm |
-----+-----------------------+
29092|1999-10-31 01:45:00.000|
As far as I can see, 29092 is the maximum number of non-duplicated rows generated by GENERATE_SERIES with a 15-minute interval.
How can I fill the table (meter.materialized_quarters) from 1999 to 2030?
One solution is:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '1999-10-31 01:45:00', interval '15 minute');
then:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-10-31 02:00:00.000', '2000-10-29 00:00:00.000', interval '15 minute');
and again, and again.
Or
with bad as (
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
)
, ids as (
select mq1.id, mq2.id as iddel
from meter.materialized_quarters mq1 inner join bad on bad.tm = mq1.tm inner join meter.materialized_quarters mq2 on bad.tm = mq2.tm
where mq1.id<mq2.id
)
delete from meter.materialized_quarters
where id in (select iddel from ids);
Is there a more elegant way?
EDIT.
I see the problem.
xxxx-10-29 02:00:00 - summer time becomes winter time.
select GENERATE_SERIES ('1999-10-31 01:45:00', '1999-10-31 02:00:00', interval '15 minute');
Your problem is the conversion between timestamp WITH time zone, which is what generate_series() returns here, and your column, which is defined as timestamp WITHOUT time zone.
1999-10-31 is the day on which daylight saving time changes (at least in some countries).
If you change your column to timestamp WITH time zone your code works without any modification.
If you want to stick with timestamp WITHOUT time zone, you need to convert the value returned by generate_series():
insert into meter.materialized_quarters (tm)
select g.tm at time zone 'UTC' --<< change to the time zone you need
from GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute') as g(tm);
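Another option, sketched here as my own assumption rather than something taken from the answer above, is to cast the series bounds to timestamp. With timestamp arguments generate_series() returns timestamp WITHOUT time zone and does plain wall-clock arithmetic, so no daylight saving transition is involved. The time zone 'Europe/Warsaw' below is only an illustrative choice of a zone that changed its clocks on 1999-10-31.

```sql
-- Illustrative session setting; any zone with a DST change on that day behaves alike.
SET TIME ZONE 'Europe/Warsaw';

-- timestamptz series cast to timestamp: the wall-clock hour 02:00-02:45 shows up twice.
SELECT g::timestamp AS tm
FROM generate_series(timestamptz '1999-10-31 01:30',
                     timestamptz '1999-10-31 03:00',
                     interval '15 minutes') AS g;

-- timestamp series: every wall-clock value appears exactly once.
SELECT g AS tm
FROM generate_series(timestamp '1999-10-31 01:30',
                     timestamp '1999-10-31 03:00',
                     interval '15 minutes') AS g;

-- Filling the table with the timestamp variant.
INSERT INTO meter.materialized_quarters (tm)
SELECT g.tm
FROM generate_series(timestamp '1999-01-01',
                     timestamp '2030-10-30',
                     interval '15 minutes') AS g(tm);
```

With this variant the commented-out unique constraint on tm could be enabled, since no value is generated twice.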

postgresql list of time slots from 'Monday' | 09:00:00 | 11:00:00

I'm building a booking system where a user sets their availability (e.g. "I'm available Mondays from 9am to 11am, Tuesdays from 9am to 5pm", etc.), and I need to generate a list of time slots 15 minutes apart from that availability.
I have the following table (but am flexible to changing this):
availabilities(day_of_week text, start_time: time, end_time: time)
which returns records like:
‘Monday’ | 09:00:00 | 11:00:00
‘Monday’ | 13:00:00 | 17:00:00
‘Tuesday’ | 08:00:00 | 17:00:00
So I'm trying to build a stored procedure to generate a list of time slots. So far I've got this:
create or replace function timeslots ()
returns setof timeslots as $$
declare
rec record;
begin
for rec in select * from availabilities loop
/*
convert 'Monday' | 09:00:00 | 11:00:00 into:
2020-02-03 09:00:00
2020-02-03 09:15:00
2020-02-03 09:30:00
2020-02-03 09:45:00
2020-02-03 10:00:00
and so on...
*/
return next
end loop
$$ language plpgsql stable;
I return a setof instead of a table as I'm using Hasura and it needs to return a setof so I just create a blank table.
I think I'm on the right track but am currently stuck on:
how do I create a timestamp from 'Monday' 09:00:00 for the next monday as I only care about timeslots from today onwards?
how do I convert 'Monday' | 09:00:00 | 11:00:00 into a list of time slots 15 mins apart?
how do I create a timestamp from 'Monday' 09:00:00 for the next monday
as I only care about timeslots from today onwards?
You can use date_trunc for this (see this question for more info):
SELECT date_trunc('week', current_date) + interval '1 week';
From the docs re week:
The number of the ISO 8601 week-numbering week of the year. By
definition, ISO weeks start on Mondays
So taking this value and adding a week gives next Monday (you may need to amend this behaviour depending on what you want to happen if today is Monday!).
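If "next Monday" should mean "today when today is already a Monday", one possible variant (my own sketch, not part of the answer above) computes the next occurrence of each ISO weekday on or after the current date. The column names isodow and next_occurrence are made up for the example.

```sql
-- For each ISO weekday n (1 = Monday ... 7 = Sunday), find the first date
-- on or after today that falls on that weekday.
-- date + integer yields a date, so no timestamp arithmetic is needed.
SELECT n AS isodow,
       current_date
         + ((n - extract(isodow FROM current_date)::int + 7) % 7) AS next_occurrence
FROM generate_series(1, 7) AS n;
```

The "+ 7" keeps the left operand of the modulo non-negative, so the offset is always between 0 and 6 days.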
how do I convert 'Monday' | 09:00:00 | 11:00:00 into a list of time
slots 15 mins apart?
This is a little trickier; generate_series will give you the time slots, but the trick is getting them into a result set. The following should do the job (I have included your sample data; change the values part to refer to your table) - dbfiddle:
with avail_times as (
select
date_trunc('week', current_date) + interval '1 week' + case day_of_week when 'Monday' then interval '0 day' when 'Tuesday' then interval '1 day' end + start_time as start_time,
date_trunc('week', current_date) + interval '1 week' + case day_of_week when 'Monday' then interval '0 day' when 'Tuesday' then interval '1 day' end + end_time as end_time
from
(
values
('Monday','09:00:00'::time,'11:00:00'::time),
('Monday','13:00:00'::time,'17:00:00'::time),
('Tuesday','08:00:00'::time,'17:00:00'::time)
) as availabilities (day_of_week,
start_time,
end_time) )
select
g.ts
from
(
select
start_time,
end_time
from
avail_times) avail,
generate_series(avail.start_time, avail.end_time - interval '1ms', '15 minutes') g(ts);
A few notes:
The CTE avail_times is used to simplify things; it generates two columns (start_time and end_time) which are the full timestamps (so including the date). In this example the first row is "2020-02-03 09:00:00, 2020-02-03 11:00:00" (I'm running this on 2020-02-02 so 2020-02-03 is next Monday).
The way I'm converting 'Monday' etc. to a day of the week is a bit of a hack (and I have not bothered to do the full week); there is probably a better way, but storing the day of the week as an integer would make this simpler.
I subtract 1ms from the end time because I'm assuming you don't want the end time itself in the result set.
The main query is using a LATERAL Subquery. See this question for more info.
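One way to cover the full week without a seven-branch CASE is to look the day name up in an array. This is only a sketch under the assumption that day_of_week holds the English day names spelled exactly as in the array; array_position() returns NULL for anything else, which makes start_ts NULL for that row. The sample rows and the column name start_ts are illustrative.

```sql
WITH availabilities(day_of_week, start_time, end_time) AS (
    VALUES ('Monday',  time '09:00', time '11:00'),
           ('Tuesday', time '08:00', time '17:00'),
           ('Sunday',  time '10:00', time '12:00')
)
SELECT day_of_week,
       -- Monday of next week as a date, plus 0..6 days, plus the time of day.
       -- date + integer = date, and date + time = timestamp.
       date_trunc('week', current_date)::date + 7
         + (array_position(ARRAY['Monday','Tuesday','Wednesday','Thursday',
                                 'Friday','Saturday','Sunday']::text[],
                           day_of_week) - 1)
         + start_time AS start_ts
FROM availabilities;
```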
Additional question
how to adjust this so I can pass in a start and end date so I can get
time slots for a particular period
You could do something like the following (just adjust the dates CTE to return whatever days you want to include; you could convert to a function or just pass the dates in as parameters).
Note that, as @Belayer mentions, my original solution did not cater for shifts over midnight, so this addresses that too.
with dates as (
select
day
from
generate_series('2020-02-20'::date, '2020-03-10'::date, '1 day') as day ),
availabilities as (
select
*
from
(
values (1,'09:00:00'::time,'11:00:00'::time),
(1,'13:00:00'::time,'17:00:00'::time),
(2,'08:00:00'::time,'17:00:00'::time),
(3,'23:00:00'::time,'01:00:00'::time)
) as availabilities
(day_of_week, -- 1 = monday
start_time,
end_time) ) ,
avail_times as (
select
d.day + start_time as start_time,
case
end_time > start_time
when true then d.day
else d.day + interval '1 day' end + end_time as end_time
from
availabilities a
inner join dates d on extract(ISODOW from d.day) = a.day_of_week )
select
g.ts
from
(
select
start_time,
end_time
from
avail_times) avail,
generate_series(avail.start_time, avail.end_time - interval '1ms', '15 minutes') g(ts)
order by
g.ts;
The following uses many of the techniques mentioned by @Brits. They present some very good information, so I'll not repeat it but suggest you review it (and the links).
I do however take a slightly different approach. First, a couple of table changes. I use the ISO day of week 1-7 (Monday-Sunday) rather than the day name. The day name is easily extracted from the date later.
Also, I use interval instead of time for the start and end times. (A time data type works for most scenarios, but there is one where it doesn't; more on that later.)
One thing your description does not make clear is whether the ending time is included in the available time or not. If included, the last interval would be 11:00-11:15. If excluded, the last interval is 10:45-11:00. I have assumed it is excluded. In the final results the end time is to be read as "up to but not including".
-- setup
create table availabilities (weekday integer, start_time interval, end_time interval);
insert into availabilities (weekday , start_time , end_time )
select wkday
, start_time
, end_time
from (select *
from (values (1, '09:00'::interval, '11:00'::interval)
, (1, '13:00'::interval, '17:00'::interval)
, (2, '08:00'::interval, '17:00'::interval)
, (3, '08:30'::interval, '10:45'::interval)
, (4, '10:30'::interval, '12:45'::interval)
) as v(wkday,start_time,end_time)
) r ;
select * from availabilities;
The Query
It begins with a CTE (next_week) that generates an entry for each day of the week beginning Monday, along with the appropriate ISO day number. The main query joins these with the availabilities table to pick up the times for matching days. Finally, that result is cross joined with a generated series of timestamps to get the 15-minute intervals.
-- Main
with next_week (wkday,tm) as
(SELECT n+1, date_trunc('week', current_date) + interval '1 week' + n*interval '1 day'
from generate_series (0, 6) n
)
select to_char(gdtm,'Day'), gdtm start_time, gdtm+interval '15 min' end_time
from ( select wkday, tm, start_time, end_time
from next_week nw
join availabilities av
on (av.weekday = nw.wkday)
) s
cross join lateral
generate_series(start_time+tm, end_time+tm- interval '1 sec', interval '15 min') gdtm ;
The outlier
As mentioned, there is one scenario where a time data type does not work satisfactorily, though you may not need it. What happens when a shift worker says their available time is 23:00-01:30? Believe me, when a shift worker goes to work at 22:00 on Friday, 01:30 is still Friday night, even though the calendar might not agree. (I worked that shift for many years.) Using interval, as in the following, handles that issue. It loads the same data as before, with an additional row for this case.
insert into availabilities (weekday, start_time, end_time )
select wkday
, start_time
, end_time + case when end_time < start_time
then interval '1 day'
else interval '0 day'
end
from (select *
from (values (1, '09:00'::interval, '11:00'::interval)
, (1, '13:00'::interval, '17:00'::interval)
, (2, '08:00'::interval, '17:00'::interval)
, (3, '08:30'::interval, '10:45'::interval)
, (5, '23:30'::interval, '02:30'::interval) -- Friday Night - Saturday Morning
) as v(wkday,start_time,end_time)
) r
;
select * from availabilities;
Hope this helps.

Group by Date and sum of total duration for that day

I am using SQL Workbench/J with a Postgres DB. My input and the output I need are as follows:
Input
ID | utc_tune_start_time | utc_tune_end_time
---+---------------------+--------------------
A  | 04-03-2019 19:00:00 | 04-03-2019 20:00:00
A  | 04-03-2019 23:00:00 | 05-03-2019 01:00:00
A  | 05-03-2019 10:00:00 | 05-03-2019 10:30:00
Output
ID | Day        | Duration in Minutes
---+------------+--------------------
A  | 04-03-2019 | 120
A  | 05-03-2019 | 90
I require the duration elapsed from the utc_tune_start_time till the end of the day and similarly, the time elapsed for utc_tune_end_time since the start of the day.
Thanks for your clarifications. This is possible with some case statements. Basically, if utc_tune_start_time and utc_tune_end_time are on the same day, just use the difference, otherwise calculate the difference from the end or start of the day.
WITH all_activity as (
select date_trunc('day', utc_tune_start_time) as day,
case when date_trunc('day', utc_tune_start_time) =
date_trunc('day', utc_tune_end_time)
then utc_tune_end_time - utc_tune_start_time
else date_trunc('day', utc_tune_start_time) +
interval '1 day' - utc_tune_start_time
end as time_spent
from test
UNION ALL
select date_trunc('day', utc_tune_end_time),
case when date_trunc('day', utc_tune_start_time) =
date_trunc('day', utc_tune_end_time)
then null -- we already calculated this earlier
else utc_tune_end_time - date_trunc('day', utc_tune_end_time)
end
FROM test
)
select day, sum(time_spent)
FROM all_activity
GROUP BY day;
day | sum
---------------------+----------
2019-03-04 00:00:00 | 02:00:00
2019-03-05 00:00:00 | 01:30:00
(2 rows)
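The CASE approach above covers sessions that cross at most one midnight. A possible generalisation, given here only as a sketch with the sample rows inlined in a CTE named test, splits each session into one piece per calendar day with generate_series and clips the piece to that day. The column name duration_minutes is my own. For the three sample rows it should give 120 minutes for 2019-03-04 and 90 minutes for 2019-03-05.

```sql
WITH test(id, utc_tune_start_time, utc_tune_end_time) AS (
    VALUES ('A', timestamp '2019-03-04 19:00', timestamp '2019-03-04 20:00'),
           ('A', timestamp '2019-03-04 23:00', timestamp '2019-03-05 01:00'),
           ('A', timestamp '2019-03-05 10:00', timestamp '2019-03-05 10:30')
)
SELECT t.id,
       d::date AS day,
       -- Clip the session to [d, d + 1 day) and convert the overlap to minutes.
       sum(extract(epoch FROM
               least(t.utc_tune_end_time, d + interval '1 day')
             - greatest(t.utc_tune_start_time, d)) / 60) AS duration_minutes
FROM test t
-- One row per calendar day touched by the session.
CROSS JOIN LATERAL generate_series(
    date_trunc('day', t.utc_tune_start_time),
    date_trunc('day', t.utc_tune_end_time),
    interval '1 day'
) AS d
GROUP BY t.id, d::date
ORDER BY t.id, day;
```

A session that ends exactly at midnight contributes a zero-length piece to the following day, so that day can appear with a duration of 0.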

How do I generate months between start date and now() in postgresql

I also have the question of how to get code blocks to work on Stack Overflow, but that's a side issue.
I have this quasi-code that works:
select
*
from
unnest('{2018-6-1,2018-7-1,2018-8-1,2018-9-1}'::date[],
'{2018-6-30,2018-7-31,2018-8-31,2018-9-30}'::date[]
) zdate(start_date, end_date)
left join lateral pipe_f(zdate...
But now I want it to work from 6/1/2018 until now(). What's the best way to do this?
Oh, postgresql 10. yay!!
Your query gives a list of the first and last days of the months between "2018-06-01" and now. So I am assuming that you want to do this in a more dynamic way:
demo: db<>fiddle
SELECT
start_date,
(start_date + interval '1 month -1 day')::date as end_date
FROM (
SELECT generate_series('2018-6-1', now(), interval '1 month')::date as start_date
)s
Result:
start_date end_date
2018-06-01 2018-06-30
2018-07-01 2018-07-31
2018-08-01 2018-08-31
2018-09-01 2018-09-30
2018-10-01 2018-10-31
generate_series(timestamp, timestamp, interval) generates a list of timestamps. Starting with "2018-06-01" until now() with the 1 month interval gives this:
start_date
2018-06-01 00:00:00+01
2018-07-01 00:00:00+01
2018-08-01 00:00:00+01
2018-09-01 00:00:00+01
2018-10-01 00:00:00+01
These timestamps are converted into dates with ::date cast.
Then I add 1 month to get the next month. But as we are interested in the last day of the previous month I subtract one day again (+ interval '1 month -1 day')
Another option that's more ANSI-compliant is to use a recursive CTE:
WITH RECURSIVE
dates(d) AS
(
SELECT '2018-06-01'::TIMESTAMP
UNION ALL
SELECT d + INTERVAL '1 month'
FROM dates
WHERE d + INTERVAL '1 month' <= LOCALTIMESTAMP
)
SELECT
d AS start_date,
-- add 1 month, then subtract 1 day, to get end of current month
(d + interval '1 month') - interval '1 day' AS end_date
FROM dates

Grouping by date, with 0 when count() yields no lines

I'm using Postgresql 9 and I'm fighting with counting and grouping when no lines are counted.
Let's assume the following schema :
create table views (
date_event timestamp with time zone,
event_id integer
);
Let's imagine the following content :
2012-01-01 00:00:05 2
2012-01-01 01:00:05 5
2012-01-01 03:00:05 8
2012-01-01 03:00:15 20
I want to group by hour, and count the number of lines. I wish I could retrieve the following :
2012-01-01 00:00:00 1
2012-01-01 01:00:00 1
2012-01-01 02:00:00 0
2012-01-01 03:00:00 2
2012-01-01 04:00:00 0
2012-01-01 05:00:00 0
.
.
2012-01-07 23:00:00 0
I mean that for each time range slot, I count the number of lines in my table whose date correspond, otherwise, I return a line with a count at zero.
The following will definitely not work (it will yield only lines with a count > 0).
SELECT extract ( hour from date_event ),count(*)
FROM views
where date_event > '2012-01-01' and date_event <'2012-01-07'
GROUP BY extract ( hour from date_event );
Please note I might also need to group by minute, or by hour, or by day, or by month, or by year (multiple queries is possible of course).
I can only use plain old sql, and since my views table can be very big (>100M records), I try to keep performance in mind.
How can this be achieved ?
Thank you !
Given that you don't have the dates in the table, you need a way to generate them. You can use the generate_series function:
SELECT * FROM generate_series('2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts;
This will produce results like this:
ts
---------------------
2012-01-01 00:00:00
2012-01-01 01:00:00
2012-01-01 02:00:00
2012-01-01 03:00:00
...
2012-01-07 21:00:00
2012-01-07 22:00:00
2012-01-07 23:00:00
(168 rows)
The remaining task is to join the two selects using an outer join like this :
select extract ( day from ts ) as day, extract ( hour from ts ) as hour,coalesce(count,0) as count from
(
SELECT extract ( day from date_event ) as day , extract ( hour from date_event ) as hr ,count(*)
FROM views
where date_event>'2012-01-01' and date_event <'2012-01-07'
GROUP BY extract ( day from date_event ) , extract ( hour from date_event )
) AS cnt
right outer join ( SELECT * FROM generate_series ( '2012-01-01'::timestamp, '2012-01-07 23:00', '1 hour') AS ts ) as dtetable on extract ( hour from ts ) = cnt.hr and extract ( day from ts ) = cnt.day
order by day,hour asc;
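The same outer-join idea can also be written by joining each generated slot to the rows that fall inside it, which avoids matching on day and hour numbers and therefore also works across month boundaries. This is a sketch with the question's sample rows inlined in a CTE named views; the alias slots is my own. Changing the interval (and the step in the join condition) to '1 minute' or '1 day' gives the other granularities.

```sql
WITH views(date_event, event_id) AS (
    VALUES (timestamptz '2012-01-01 00:00:05', 2),
           (timestamptz '2012-01-01 01:00:05', 5),
           (timestamptz '2012-01-01 03:00:05', 8),
           (timestamptz '2012-01-01 03:00:15', 20)
)
SELECT slots.ts,
       -- count() of a joined column ignores NULLs, so empty slots yield 0.
       count(v.date_event) AS count
FROM generate_series(timestamptz '2012-01-01 00:00',
                     timestamptz '2012-01-07 23:00',
                     interval '1 hour') AS slots(ts)
LEFT JOIN views v
       ON v.date_event >= slots.ts
      AND v.date_event <  slots.ts + interval '1 hour'
GROUP BY slots.ts
ORDER BY slots.ts;
```

Since the join condition is a plain range on date_event, an index on that column can be used on a large table.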
This query gives you the hourly counts you are looking for, but note that it only returns hours that contain at least one row:
select to_char(date_event, 'YYYY-MM-DD HH24:00') as time, count (to_char(date_event, 'HH24:00')) as count from views where date(date_event) >= '2012-01-01' and date(date_event) <= '2012-01-07' group by time order by time;