Postgresql generate series with interval '15 minutes' longer than 29092 items - postgresql

Sut:
create table meter.materialized_quarters
(
id int4 not null generated by default as identity,
tm timestamp without time zone
,constraint pk_materialized_quarters primary key (id)
--,constraint uq_materialized_quarters unique (tm)
);
Then setup data:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute');
And check data:
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
Some results:
count|tm |
-----+-----------------------+
2|1999-10-31 02:00:00.000|
2|1999-10-31 02:15:00.000|
2|1999-10-31 02:30:00.000|
2|1999-10-31 02:45:00.000|
2|2000-10-29 02:00:00.000|
2|2000-10-29 02:15:00.000|
2|2000-10-29 02:30:00.000|
2|2000-10-29 02:45:00.000|
2|2001-10-28 02:00:00.000|
2|2001-10-28 02:15:00.000|
2|2001-10-28 02:30:00.000|
....
Details:
select * from meter.materialized_quarters where tm = '1999-10-31 01:45:00';
Result:
id |tm |
-----+-----------------------+
29092|1999-10-31 01:45:00.000|
As I see, 29092 is maximum series of nonduplicated data generated by: GENERATE_SERIES with 15 minutes interval.
How to fill table (meter.materialized_quarters) from 1999 to 2030?
One solution is:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '1999-10-31 01:45:00', interval '15 minute');
then:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-10-31 02:00:00.000', '2000-10-29 00:00:00.000', interval '15 minute');
and again, and again.
Or
with bad as (
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
)
, ids as (
select mq1.id, mq2.id as iddel
from meter.materialized_quarters mq1 inner join bad on bad.tm = mq1.tm inner join meter.materialized_quarters mq2 on bad.tm = mq2.tm
where mq1.id<mq2.id
)
delete from meter.materialized_quarters
where id in (select iddel from ids);
Is there more 'elegant' way?
EDIT.
I see the problem.
xxxx-10-29 02:00:00 - summer time become winter time.
select GENERATE_SERIES ('1999-10-31 01:45:00', '1999-10-31 02:00:00', interval '15 minute');

Your problem is the conversion from timestamp WITH time zone which is returned by generate_series() and your column which is defined as timestamp WITHOUT time zone.
1999-10-31 is the day where daylight savings time changes (at least in some countries)
If you change your column to timestamp WITH time zone your code works without any modification.
Example
If you want to stick with timestamp WITHOUT timestamp you need to convert the value returned by generate_series()
insert into materialized_quarters (tm)
select g.tm at time zone 'UTC' --<< change to the time zone you need
from GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute') as g(tm)
Example

Related

Dynamic value passing in Postgres

Here is a complex query where i need to pass some dates as dynamic to this, As of now i have hardcoded this '2021-08-01' AND '2022-07-31' these 2 dates.
But i have to pass this dates dynamically in such a way that next dates ie, 2022-06 month , thew dates passed will be '2021-07-01' and '2022-06-30' , basically 12 months behind data.
if we take 2022-05 then the passed date should be '2021-06-01' and '2022-05-31'.
How can we achieve this ? Any suggestions or help will be much appreciated.
below is the query for reference
WITH base as
(
SELECT created_at as period ,order_number, TRIM(email) as email ,is_first_order
FROM orders
WHERE created_at::DATE BETWEEN '2021-08-01' AND '2022-07-31'
)
,base_agg as
(
select TO_CHAR(period,'YYYY-MM') as period
,COUNT(DISTINCT email)FILTER(WHERE is_first_order IS TRUE) as new_users
,COUNT(DISTINCT order_number)FILTER(WHERE is_first_order IS FALSE) as returning_orders
FROM base
GROUP BY 1
)
,base_cumulative as
(
SELECT ROW_NUMBER() OVER(ORDER BY PERIOD DESC ) as rno
,period
,new_users
,returning_orders
,sum("new_users")over (order by "period" asc rows between unbounded preceding and current row) as "cumulative_total"
from base_agg
)
SELECT
(SELECT period FROM base_cumulative WHERE rno=1) period
,(SELECT cumulative_total FROM base_cumulative WHERE rno=1) as cumulated_customers
,SUM(returning_orders) as returning_orders
,SUM(returning_orders)/NULLIF((SELECT cumulative_total FROM base_cumulative WHERE rno=1),0) as rate
FROM base_cumulative
You can calculate the end of current month based on NOW() and some logic, the same can be applied with the rest of the calculation
select date_trunc('month', now())::date + interval '1 month - 1 day' end_of_this_month,
date_trunc('month', now())::date + interval '1 month - 1 day'::interval - '1 year'::interval + '1 day'::interval first_day_of_prev_year_month
;
Result
end_of_this_month | first_day_of_prev_year_month
---------------------+------------------------------
2022-08-31 00:00:00 | 2021-09-01 00:00:00
(1 row)

Generate missing data and fill it down - postgresql

I have the dataset:
The problem is that the records are added only if an event happened, e.g. for the row with id 13897, the record was updated on 4/18/2020 and then on 5/1/2020 - the status was changed. What I need is the status of each record at the end of every month.
I was thinking about the below logic:
generate the series of dates from the min(date) till now - T1
get distinct id from the dataset - T2
do cross join between two above tables so that we get a new row for every row in the second table - T3
extract the dataset with all required fields - T4
merge T3 and T4 by concatenate(date and id) - T5
sort T5 by id and d asc - T5
fill-down all the fields grouped by id - T5
generate the series of dates from min(date) till now with the interval of one month and get the last day of each month - T6
merge T5 and T6 by date - right join so that we get only rows with the date = end of month
I am on step 6.
SELECT *
FROM (SELECT d, Concat(dt, t2.id) AS cnct
FROM (SELECT d,d::date AS dt
FROM generate_series(
( SELECT min(created_at::date)
FROM new_table), CURRENT_DATE , interval '1 day') d) t1
CROSS JOIN
(SELECT DISTINCT id FROM new_table )) t2)t3
--in case if a record with the same id was updated several times throughout the day
LEFT JOIN (WITH cte AS
( SELECT id, status, created_at at time zone 'eat' at time zone 'utc' AS "created_at", updated_at::date AS date, updated_at::date, row_number() OVER (partition BY id, updated_at::date ORDER BY updated_at DESC) rnFROM new_table ))SELECT cte.*, Concat(updated_at::date, id) AS cnct
FROM cte
WHERE rn = 1) t4
ON t3.cnct = t4.cnct
I am stuck on step 7. I found fill column with last value from partition in postgresql but it is not what I need. I envision that I need to sort by a date block i.e. dates from min date to now for one id - 13894 are to be considered block 1, dates from min date to now for another id - 13897 are to be considered block 2. The next step I thought is to fill-down all fields per a block.
And another question, how do you deal with the event-based data to adapt it for the time-series?
Tried:
You can use Postgresql's DISTINCT ON feature to do this. We'll generate a series with the start of every month (you'll need to supply start and end dates here) and put the ID and the date into the DISTINCT ON so that we get only one row of new_table for each distinct ID and month pair. Then we simply filter and order to ensure that the row we're getting for each ID and month is the latest row for which the date is before the new month.
SELECT DISTINCT ON (new_table.id, month_start) *
FROM new_table, generate_series(start_date, end_date, interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
(If you need your results to have the last day of the month and not the first day of the next month, you can just subtract 1 day from month_start in your select clause.)
EDIT: Running on the data you supplied, I get this:
SELECT DISTINCT ON (new_table.id, month_start) new_table.id, month_start - interval '1 day' as month_end, new_table.status
FROM new_table, generate_series('2020-05-01', '2020-06-01', interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
id | month_end | status
-------+------------------------+--------
13894 | 2020-04-30 00:00:00-07 | 5
13894 | 2020-05-31 00:00:00-07 | 5
13897 | 2020-04-30 00:00:00-07 | 2
13897 | 2020-05-31 00:00:00-07 | 5
(4 rows)

postgresql list of time slots from 'Monday' | 09:00:00 | 11:00:00

I’m building a booking system where a user will set their availability eg: I’m available Monday’s from 9am to 11am, Tuesdays from 9am to 5pm etc… and need to generate a list of time slots 15mins apart from their availability.
I have the following table (but am flexible to changing this):
availabilities(day_of_week text, start_time: time, end_time: time)
which returns records like:
‘Monday’ | 09:00:00 | 11:00:00
‘Monday’ | 13:00:00 | 17:00:00
‘Tuesday’ | 08:00:00 | 17:00:00
So I’m trying to build a stored procedure to generate a list of time slots so far I've got this:
create or replace function timeslots ()
return setof timeslots as $$
declare
rec record;
begin
for rec in select * from availabilities loop
/*
convert 'Monday' | 09:00:00 | 11:00:00 into:
2020-02-03 09:00:00
2020-02-03 09:15:00
2020-02-03 09:30:00
2020-02-03 09:45:00
2020-02-03 10:00:00
and so on...
*/
return next
end loop
$$ language plpgsql stable;
I return a setof instead of a table as I'm using Hasura and it needs to return a setof so I just create a blank table.
I think I'm on the right track but am currently stuck on:
how do I create a timestamp from 'Monday' 09:00:00 for the next monday as I only care about timeslots from today onwards?
how do I convert 'Monday' | 09:00:00 | 11:00:00 into a list of time slots 15 mins apart?
how do I create a timestamp from 'Monday' 09:00:00 for the next monday
as I only care about timeslots from today onwards?
You can use date_trunc for this (see this question for more info):
SELECT date_trunc('week', current_date) + interval '1 week';
From the docs re week:
The number of the ISO 8601 week-numbering week of the year. By
definition, ISO weeks start on Mondays
So taking this value and adding a week gives next Monday (you may need to ammend this behaviour based upon what you want to do if today is monday!).
how do I convert 'Monday' | 09:00:00 | 11:00:00 into a list of time
slots 15 mins apart?
This is a little tricker; generate_series will give you the timeslots but the trick is getting it into a result set. The following should do the job (I have included your sample data; change the values bit to refer to your table) - dbfiddle :
with avail_times as (
select
date_trunc('week', current_date) + interval '1 week' + case day_of_week when 'Monday' then interval '0 day' when 'Tuesday' then interval '1 day' end + start_time as start_time,
date_trunc('week', current_date) + interval '1 week' + case day_of_week when 'Monday' then interval '0 day' when 'Tuesday' then interval '1 day' end + end_time as end_time
from
(
values
('Monday','09:00:00'::time,'11:00:00'::time),
('Monday','13:00:00'::time,'17:00:00'::time),
('Tuesday','08:00:00'::time,'17:00:00'::time)
) as availabilities (day_of_week,
start_time,
end_time) )
select
g.ts
from
(
select
start_time,
end_time
from
avail_times) avail,
generate_series(avail.start_time, avail.end_time - interval '1ms', '15 minutes') g(ts);
A few notes:
The CTE avail_times is used to simplify things; it generates two columns (start_time and end_time) which are the full timestamps (so including the date). In this example the first row is "2020-02-03 09:00:00, 2020-02-03 11:00:00" (I'm running this on 2020-02-02 so 2020-02-03 is next Monday).
The way I'm converting 'monday' etc to a day of the week is a bit of a hack (and I have not bothered to do the full week); there is probably a better way but storing the day of week as an integer would make this simpler.
I subtract 1ms from the end time because I'm assuming you dont want this in the result set.
The main query is using a LATERAL Subquery. See this question for more info.
Aditional Question
how to adjust this so I can pass in a start and end date so I can get
time slots for a particular period
You could do something like the following (just adjust the dates CTE to return whatever days you want to include; you could convert to a function or just pass the dates in as parameters).
Note that as #Belayer mentions my original solution did not cater for shifts over midnight so this addresses that too.
with dates as (
select
day
from
generate_series('2020-02-20'::date, '2020-03-10'::date, '1 day') as day ),
availabilities as (
select
*
from
(
values (1,'09:00:00'::time,'11:00:00'::time),
(1,'13:00:00'::time,'17:00:00'::time),
(2,'08:00:00'::time,'17:00:00'::time),
(3,'23:00:00'::time,'01:00:00'::time)
) as availabilities
(day_of_week, -- 1 = monday
start_time,
end_time) ) ,
avail_times as (
select
d.day + start_time as start_time,
case
end_time > start_time
when true then d.day
else d.day + interval '1 day' end + end_time as end_time
from
availabilities a
inner join dates d on extract(ISODOW from d.day) = a.day_of_week )
select
g.ts
from
(
select
start_time,
end_time
from
avail_times) avail,
generate_series(avail.start_time, avail.end_time - interval '1ms', '15 minutes') g(ts)
order by
g.ts;
The following uses much of the techniques mentioned by #Brits. They present some very good information, so I'll not repeat but suggest you review it (and the links).
I do however take a slightly different approach. First a couple table changes. I use the ISO day of week 1-7 (Monday-Sunday) rather than the day name. The day name is easily extracted for the dater later.
Also I use interval instead to time for start and end times. ( A time data type works for most scenarios but there is one it doesn't (more later).
One thing your description does not make clear is whether the ending time is included it the available time or not. If included the last interval would be 11:00-11:15. If excluded the last interval is 10:45-11:00. I have assumed to excluded it. In the final results the end time is to be read as "up to but not including".
-- setup
create table availabilities (weekday integer, start_time interval, end_time interval);
insert into availabilities (weekday , start_time , end_time )
select wkday
, start_time
, end_time
from (select *
from (values (1, '09:00'::interval, '11:00'::interval)
, (1, '13:00'::interval, '17:00'::interval)
, (2, '08:00'::interval, '17:00'::interval)
, (3, '08:30'::interval, '10:45'::interval)
, (4, '10:30'::interval, '12:45'::interval)
) as v(wkday,start_time,end_time)
) r ;
select * from availabilities;
The Query
It begins with a CTE (next_week) generates a entry for each day of the week beginning Monday and the appropriate ISO day number for it. The main query joins these with the availabilities table to pick up times for matching days. Finally that result is cross joined with a generated timestamp to get the 15 minute intervals.
-- Main
with next_week (wkday,tm) as
(SELECT n+1, date_trunc('week', current_date) + interval '1 week' + n*interval '1 day'
from generate_series (0, 6) n
)
select to_char(gdtm,'Day'), gdtm start_time, gdtm+interval '15 min' end_time
from ( select wkday, tm, start_time, end_time
from next_week nw
join availabilities av
on (av.weekday = nw.wkday)
) s
cross join lateral
generate_series(start_time+tm, end_time+tm- interval '1 sec', interval '15 min') gdtm ;
The outlier
As mentioned there is one scenario where a time data type does not work satisfactory, but you may not nee it. What happens when a shift worker says they available time is 23:00-01:30. Believe me when a shift worker goes to work at 22:00 of Friday, 01:30 is still Friday night, even though the calendar might not agree. (I worked that shift for many years.) The following using interval handles that issue. Loading the same data as prior with an addition for the this case.
insert into availabilities (weekday, start_time, end_time )
select wkday
, start_time
, end_time + case when end_time < start_time
then interval '1 day'
else interval '0 day'
end
from (select *
from (values (1, '09:00'::interval, '11:00'::interval)
, (1, '13:00'::interval, '17:00'::interval)
, (2, '08:00'::interval, '17:00'::interval)
, (3, '08:30'::interval, '10:45'::interval)
, (5, '23:30'::interval, '02:30'::interval) -- Friday Night - Saturday Morning
) as v(wkday,start_time,end_time)
) r
;
select * from availabilities;
Hope this helps.

Trouble joining generate_series timestamp without time zone on a field that's timestamp without timezone

I am trying to figure out a way to report how many people are in a location at the same time, down to the second.
I have a table with the id for the person, the date they entered, the time they entered, the date they left and the time they left.
example:
select unique_id, start_date, start_time, end_date, end_time
from My_Table
where start_date between '09/01/2019' and '09/02/2019'
limit 3
"unique_id" "start_date" "start_time" "end_date" "end_time"
989179 "2019-09-01" "06:03:13" "2019-09-01" "06:03:55"
995203 "2019-09-01" "11:29:27" "2019-09-01" "11:30:13"
917637 "2019-09-01" "11:06:46" "2019-09-01" "11:06:59"
i've concatenated the start_date & start_time as well as end_date & end_time so they are 2 fields
select unique_id, ((start_date + start_time)::timestamp without time zone) as start_date,
((end_date + end_time)::timestamp without time zone) as end_date
result example:
"start_date"
"2019-09-01 09:28:54"
so i'm making that a CTE, then using a second CTE that uses generate_series between dates down to the second.
The goal being, the generate series will have a row for every second between the two dates. Then when i join my data sets, i can count how many records exist in my_table where the start_date(plus time) is equal or greater than the generate_series date_time field, and the end_date(plus time) is less than or equal to the generate_series date_time field.
i feel that was harder to explain than it needed to be.
in theory, if a person was in the room from 2019-09-01 00:01:01 and left at 2019-09-01 00:01:03, i would count that record in the generate_series rows 2019-09-01 00:01:01, 2019-09-01 00:01:02 & 2019-09-01 00:01:03.
When i look at the data i can see that i should be returning hundreds of people in the room at specific peak periods. but the query returns all 0's.
is this possibly a field formatting issue i need to adjust?
Here is the query:
with CTE as (
select unique_id, ((start_date+start_time)::timestamp without time zone) as start_date,
((end_date+end_time)::timestamp without time zone) as end_date
from My_table
where start_date between '09/01/2019' and '09/02/2019'
),
time_series as (
select generate_series( (date '2019-09-01')::timestamp, (date '2019-09-02')::timestamp, interval '1 second') as date_time
)
/*FINAL SELECT*/
select date_time, count(B.unique_id) as NumPpl
FROM (
select A.date_time
FROM time_series a
)x
left join CTE b on b.start_date >= x.date_time AND b.end_date <= x.date_time
GROUP BY 1
ORDER BY 1
(partial) result screenshot
Thank you in advance
i should also add i have read only access to this database so i'm not able to create functions.
Simple version: b.start_date >= x.date_time AND b.end_date <= x.date_time will never be true assuming end_date is always after start_date.
Longer version: You also do not need a CTE for the generate_series() and there is no reason for selecting all columns and all rows of this CTE as a subquery. I would also drop the CTE for your original data and just join it to the seconds (NOTE: this does somehow change the query, since you might now take those entries into account, where start_date is earlier than 2019-09-01. If you do not want this, you can add your condition again to the join condition. But I guess this is what you really wanted). I also removed some casts which were not needed. Try this:
SELECT gs.second, COUNT(my.unique_id)
FROM generate_series('2019-09-01'::timestamp, '2019-09-02'::timestamp, interval '1 second') gs (second)
LEFT JOIN my_table my ON (my.start_date + my.start_time) <= gs.second
AND (my.end_date + my.end_time) >= gs.second
GROUP BY 1
ORDER BY 1

Postgres search available time slots with generate_series

I have a table in my postgres database which has a column of dates. I want to search which of those dates is missing - for example:
date
2016-11-09 18:30:00
2016-11-09 19:00:00
2016-11-09 20:15:00
2016-11-09 22:20:00
2016-11-09 23:00:00
Here, |2016-11-09 21:00:00| is missing. After sorting my generated series if my table has an entry between two slots (slot of 1 hr interval) i need to remove that.
I want to make a query with generate_series that returns me the date which is missing. Is this possible?.
sample query that i used to generate series.
SELECT t
FROM generate_series(
TIMESTAMP WITH TIME ZONE '2016-11-09 18:00:00',
TIMESTAMP WITH TIME ZONE '2016-11-09 23:00:00',
INTERVAL '1 hour'
) t
EXCEPT
SELECT tscol
FROM mytable;
But this query is not removing 2016-11-09 18:30:00,2016-11-09 20:15:00 etc. cuz i used except.
This is not a gaps-and-island problem. You just want to find the 1 hour intervals for which no record exist in the table.
EXCEPT does not work here because it does equality comparison, while you want to check if a record exists or not within a range.
A typical solution for this is to use a left join antipattern:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
left join mytable t
on t.tscol >= dt and t.tscol < dt + interval '1 hour'
where t.tscol is null
You can also use not exists:
select dt
from generate_series(
timestamp with time zone '2016-11-09 18:00:00',
timestamp with time zone '2016-11-09 23:00:00',
interval '1 hour'
) d(dt)
where not exists (
select 1
from mytable t
where t.tscol >= dt and t.tscol < dt + interval '1 hour'
)
In this demo on DB Fiddle, both queries return:
| dt |
| :--------------------- |
| 2016-11-09 21:00:00+00 |