Simple temporal view? - postgresql

How can I write a query to compute the end date per ID in postgres? I am able to do this in-memory with Python, but I would rather keep it simple and just create a view.
My table appends any new combination of system_1_id and system_2_id along with the date of the file the data was from (I am reading a snapshot mapping file which is sent a few times per week). It looks like this:
system_1_id | system_2_id | start_date | is_current
------------|-------------|------------|-----------
123456      | 05236       | 2016-06-01 | False
123456      | 98899       | 2017-01-03 | False
123456      | 05236       | 2017-04-15 | True
To:
system_1_id | system_2_id | start_date | end_date
------------|-------------|------------|-----------
123456      | 05236       | 2016-06-01 | 2017-01-02
123456      | 98899       | 2017-01-03 | 2017-04-14
123456      | 05236       | 2017-04-15 |
Note that there can only be one system_2_id assigned to a system_1_id at a time, but they can be recycled and even reassigned at a later date.
The end date is simply one day before the start date of the next row for the same ID.
My goal is eventually to be able to join other tables to the data and pull the accurate ids per date:
where t1.system_2_id = t2.system_2_id and t1.report_date >= t2.start_date and t1.report_date <= t2.end_date
I want a simple temporal table without worrying about triggers, rules or extensions.

The lead() window function will do this for you, with your example data:
select
system_1_id,
system_2_id,
start_date,
cast(lead(start_date, 1, Null) over(partition by system_1_id order by start_date) - interval '1 day' as date) as end_date
from
the_table;
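Here is a minimal sketch of wrapping that query in a view and running it against the question's sample rows. The table name `id_mapping` and the view name `id_mapping_history` are hypothetical, and `system_2_id` is assumed to be text so that the leading zero in `05236` survives. Because subtracting an integer from a `date` yields a `date`, `- 1` can replace the interval arithmetic and the cast.

```sql
-- Hypothetical names: id_mapping (source table), id_mapping_history (view).
CREATE TEMP TABLE id_mapping (
    system_1_id integer,
    system_2_id text,      -- text keeps the leading zero of '05236'
    start_date  date,
    is_current  boolean
);
INSERT INTO id_mapping VALUES
    (123456, '05236', '2016-06-01', false),
    (123456, '98899', '2017-01-03', false),
    (123456, '05236', '2017-04-15', true);

CREATE TEMP VIEW id_mapping_history AS
SELECT system_1_id,
       system_2_id,
       start_date,
       -- next start date for the same system_1_id, minus one day;
       -- date - integer is a date, so no cast is needed
       lead(start_date) OVER (PARTITION BY system_1_id ORDER BY start_date) - 1 AS end_date
FROM id_mapping;

SELECT * FROM id_mapping_history ORDER BY start_date;
```

The current row gets a NULL `end_date`, so the join condition from the question (`t1.report_date <= t2.end_date`) would silently drop it. A condition such as `t1.report_date BETWEEN t2.start_date AND COALESCE(t2.end_date, 'infinity')` keeps the current mappings.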


Postgres check that repeatable event overlap with time slots

In general, I have something similar to a calendar.
In my database, I have repeatable events. To simplify working with them, I generate time slots during which the booked room is reserved.
Table event
id long
room_uuid varchar
start_date timestamp
end_date timestamp
repeat_every_min long
duration_min long
And another table:
Table event_time_slot
id long
event_id long (fk)
start_date timestamp
end_date timestamp
Here is how it looks with mock data:
Table event mock data
id 1
room_uuid 267cb70a-6911-488c-aa9e-9deb506f785b
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:57:00"
repeat_every_min 15
duration_min 10
As a result, the event_time_slot table will contain the following records:
id 1
event_id 1
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:10:00"
____________________________________
id 2
event_id 1
start_date "2023-01-05 10:15:00"
end_date "2023-01-05 10:25:00"
____________________________________
id 3
event_id 1
start_date "2023-01-05 10:30:00"
end_date "2023-01-05 10:40:00"
____________________________________
id 4
event_id 1
start_date "2023-01-05 10:45:00"
end_date "2023-01-05 10:55:00"
Basically, I generate time slots while
(startTime + N * repeatEveryMin) + duration < endTime
My current flow for checking whether two repeatable events conflict is quite simple:
I generate the time slots for the event, and then run:
select * from event_time_slot ts
join event_time_slot its on its.event_id = ts_id
where
-- condition that any of the saved slots overlaps with the first generated slot
(its.start_date < :endTime AND its.start_date > :startTime)
or
-- condition that any of the saved slots is equal to the first generated slot
(its.start_date = :startTime AND its.end_date = :endTime)
The problem is that it forces me to generate a lot of time slots to execute this query.
Moreover, if I have an event with 100 time slots, I need to check that none of the previously saved time slots overlaps any of the 100 I am about to save.
My question is:
Does Postgres have any functionality that can simplify working with repeatable events?
Is there any other technology that solves this problem?
What I have tried:
Generating time slots for the event. The problem is that the query becomes too complex: if a single event has more than 5000 time slots, I need to split the check into multiple queries to the DB, because otherwise my app runs out of memory.
I am hoping for feedback, or a pointer to a technology, on how Postgres can simplify the current flow.
My primary question is: does Postgres have any functionality that removes the need to work with time slots at all?
For example, I pass startDate, endDate and repeatInterval to the query, and it returns the overlapping events.
I want to avoid building a condition for every time slot of the event I want to check.
This query generates 4 time slots:
SELECT
tsrange(ts, ts + INTERVAL '10 MINUTE', '[)')
FROM generate_series(
'2023-01-05 10:00:00'::timestamp
, '2023-01-05 10:57:00'::timestamp
, INTERVAL '15 MINUTE') g(ts)
WHERE ts::time BETWEEN '09:00' AND '17:55' -- business hours
AND EXTRACT(DOW FROM ts) BETWEEN 1 AND 5 -- Monday to Friday
-- other conditions
Check the manual for all the options ranges give you, including the very powerful exclusion constraints that prevent overlapping events.
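Here is a sketch of how ranges could replace the stored slots, assuming the `event` columns from the question. The candidate event's values and the view name `candidate_conflicts` are hypothetical. Each event is still expanded with generate_series(), but only inside the query, so nothing has to be written to event_time_slot or built in the application, and the overlap test is the range operator `&&` instead of hand-written comparisons.

```sql
-- Columns follow the question; candidate values and view name are hypothetical.
CREATE TEMP TABLE event (
    id               bigint PRIMARY KEY,
    room_uuid        text,
    start_date       timestamp,
    end_date         timestamp,
    repeat_every_min integer,
    duration_min     integer
);
INSERT INTO event VALUES
    (1, '267cb70a-6911-488c-aa9e-9deb506f785b',
     '2023-01-05 10:00', '2023-01-05 10:57', 15, 10);

CREATE TEMP VIEW candidate_conflicts AS
WITH candidate AS (
    -- in the application these would be bind parameters
    SELECT text '267cb70a-6911-488c-aa9e-9deb506f785b' AS room_uuid,
           timestamp '2023-01-05 10:05' AS start_date,
           timestamp '2023-01-05 11:00' AS end_date,
           30 AS repeat_every_min,
           5  AS duration_min
), candidate_slots AS (
    SELECT c.room_uuid,
           tsrange(g, g + c.duration_min * interval '1 minute', '[)') AS slot
    FROM candidate c,
         generate_series(c.start_date,
                         -- the last slot must end no later than end_date
                         c.end_date - c.duration_min * interval '1 minute',
                         c.repeat_every_min * interval '1 minute') g
), stored_slots AS (
    SELECT e.id AS event_id, e.room_uuid,
           tsrange(g, g + e.duration_min * interval '1 minute', '[)') AS slot
    FROM event e,
         generate_series(e.start_date,
                         e.end_date - e.duration_min * interval '1 minute',
                         e.repeat_every_min * interval '1 minute') g
)
SELECT DISTINCT s.event_id
FROM stored_slots s
JOIN candidate_slots c
  ON c.room_uuid = s.room_uuid
 AND c.slot && s.slot;   -- && is true when the two ranges share any instant

SELECT * FROM candidate_conflicts;
```

If the slots are stored after all, an exclusion constraint such as `EXCLUDE USING gist (room_uuid WITH =, slot WITH &&)` makes the database itself reject overlapping slots. Note that the `=` part on a text column needs the btree_gist extension.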

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH clause:
Generate all days of each date range in the first table and subtract the expanded day list of the second table. All dates that do not occur in the second table are kept. The infinite end is a little tricky, so I replace 'infinity' with the maximum end date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is re-aggregating this list into intervals, which happens outside the WITH clause:
The lag() window function fetches the previous date. If the previous date in the list is not the day before (i.e. there is a gap), the comparison yields true (1); otherwise it yields 0. (A time-change issue showed up here, which is why I check for a difference of at least 2 days rather than exactly 1 day: between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time.)
These 0 and 1 values are summed cumulatively, so the running sum increases by one at every gap larger than one day, and every interval of missing days gets its own number (the days in between are covered by the plan).
This results in a groupable column that can be used to find the min and max date of each interval.
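The same gaps-and-islands idea can be sketched with the series cast to `date`, which sidesteps the daylight saving issue. generate_series() on dates resolves, as far as I can tell, to its `timestamptz` variant, which is why day lengths vary. Integer arithmetic on dates, by contrast, is exact; with the interval comparison, a one-day gap across a DST switch would likely measure 47 hours and be merged with its neighbour. Consecutive days share the same value of `date - row_number()`, so that value serves as the group key. The column name `item` follows the answer, and the view name is hypothetical.

```sql
-- Sample data from the question; view name is hypothetical.
CREATE TEMP VIEW missing_plan_days AS
WITH items (item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-31'),
           ('a', DATE '2019-02-01', DATE '2019-02-28'),
           ('b', DATE '2019-01-01', DATE '2019-06-30')
), plan (item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-10'),
           ('a', DATE '2019-01-15', DATE 'infinity')
), item_days AS (
    -- one row per covered day; ::date drops the time-of-day part
    SELECT i.item, d::date AS dt
    FROM items i,
         generate_series(i.start_date, i.end_date, interval '1 day') d
), plan_days AS (
    -- cap 'infinity' at the last item date so the series stays finite
    SELECT p.item, d::date AS dt
    FROM plan p,
         generate_series(p.start_date,
                         LEAST(p.end_date, (SELECT max(end_date) FROM items)),
                         interval '1 day') d
), missing AS (
    SELECT item, dt FROM item_days
    EXCEPT
    SELECT item, dt FROM plan_days
)
SELECT item, min(dt) AS from_date, max(dt) AS to_date
FROM (
    SELECT item, dt,
           -- consecutive days share the same (date - position) value
           dt - (row_number() OVER (PARTITION BY item ORDER BY dt))::int AS grp
    FROM missing
) s
GROUP BY item, grp;

SELECT * FROM missing_plan_days ORDER BY item, from_date;
```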
I tried something with date ranges, which seems to be a better way, especially for avoiding the expansion of long date lists, but I didn't come up with a proper solution. Maybe someone else can?
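On PostgreSQL 14 or later, multiranges make the range-based approach fairly direct. You aggregate each item's coverage with range_agg(), subtract the plan's coverage with the multirange `-` operator, and unnest() the remaining gaps. This is a sketch that assumes PG14+ and the same sample data. `'infinity'` is mapped to an unbounded upper bound with NULLIF, so no date list is expanded at all. The view name is hypothetical.

```sql
-- Sketch, PostgreSQL 14+ only (multirange types, range_agg, unnest on multiranges).
CREATE TEMP VIEW missing_plan_ranges AS
WITH items (item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-31'),
           ('a', DATE '2019-02-01', DATE '2019-02-28'),
           ('b', DATE '2019-01-01', DATE '2019-06-30')
), plan (item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-10'),
           ('a', DATE '2019-01-15', DATE 'infinity')
), item_cover AS (
    -- '[]' makes end_date inclusive; adjacent ranges are merged
    SELECT item, range_agg(daterange(start_date, end_date, '[]')) AS cover
    FROM items
    GROUP BY item
), plan_cover AS (
    -- a NULL upper bound means unbounded, so 'infinity' is never expanded
    SELECT item,
           range_agg(daterange(start_date, NULLIF(end_date, 'infinity'), '[]')) AS cover
    FROM plan
    GROUP BY item
)
SELECT i.item,
       lower(gap)     AS from_date,
       upper(gap) - 1 AS to_date   -- daterange upper bounds are exclusive
FROM item_cover i
LEFT JOIN plan_cover p USING (item)
CROSS JOIN LATERAL unnest(i.cover - COALESCE(p.cover, '{}'::datemultirange)) AS gap;

SELECT * FROM missing_plan_ranges ORDER BY item, from_date;
```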

Retrieve Records for month,datewise group by day in postgres

I have a table time_slot with columns like date, start_time, end_time and user.
Given a month, a year and a user, I want to retrieve the slots available for that user, day by day, for the whole month. For example, a user can have 3 slots on one day and 0 on another.
I am using Postgres; my date column is of type date and my time columns are of type time. I am doing this in a Java web application where the date is picked with a jQuery datepicker, from which I send the month, year and user.
Sample data of the table:
Date start-time end-time user
2019-09-01 12:21:34 13:21:34 user1
2019-09-01 14:21:34 15:21:34 user1
2019-09-01 17:21:34 17:21:34 user1
2019-09-03 12:21:34 13:21:34 user1
2019-09-03 12:21:34 13:21:34 user1
I would like to create a query that returns the user's time slots by concatenating the start-time and end-time columns, and groups the results by date for a month, as follows:
Date count_of_slots
2019-09-01 3
2019-09-02 0
2019-09-03 2
I have tried the below Query.
select distinct kt.start_time,kt.end_time,DATE(kt.slot_date),count(kt.slot_date)
from time_slot as kt
WHERE date_trunc('month',to_timestamp(kt.start_time, 'yy-mm-dd HH24:MI:SS.MS') + interval '1 day')
= date_trunc('month',to_timestamp(:startdate, 'yy-mm-dd HH24:MI:SS.MS') + interval '1 day' )
group by DATE(kt.slot_date) order by cb.start_time.
After getting the result in the format above, I need to loop through the dates to get the time slots for each day and store them as JSON, as below:
{
"Date" : "2019-09-01",
"count" : "3",
"time-slot" : [
"12:21:34 - 13:21:34","14:21:34 - 15:21:34","17:21:34 - 17:21:34"]
}
Any suggestions and leads are welcome.
Disclaimer: You should really upgrade your Postgres version!
You need to join a date series against your data set. This can be done using the generate_series() function.
SELECT
gs::date,
COUNT(the_date)
FROM
time_slot ts
RIGHT JOIN
generate_series('2019-09-01', '2019-09-05', interval '1 day') gs ON ts.the_date = gs
GROUP BY gs
If you want to get the time_slots as well, simply add:
ARRAY_AGG(start_time || ' - ' || end_time) AS time_slot
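The query above neither filters by user nor derives the month from the picker's input. A minimal sketch that adds both might look like this. The user filter sits in the ON clause, so days without slots still appear with a count of 0, and the slots are aggregated straight into a JSON array. The column names `slot_date` and `username` are assumptions (`user` is a reserved word and would need quoting), as are the hard-coded inputs in `params`. json_agg() with FILTER needs Postgres 9.4 or later.

```sql
-- Assumed columns: slot_date, start_time, end_time, username; view name hypothetical.
CREATE TEMP TABLE time_slot (slot_date date, start_time time, end_time time, username text);
INSERT INTO time_slot VALUES
    ('2019-09-01', '12:21:34', '13:21:34', 'user1'),
    ('2019-09-01', '14:21:34', '15:21:34', 'user1'),
    ('2019-09-01', '17:21:34', '17:21:34', 'user1'),
    ('2019-09-03', '12:21:34', '13:21:34', 'user1'),
    ('2019-09-03', '12:21:34', '13:21:34', 'user1');

CREATE TEMP VIEW monthly_slots AS
WITH params AS (
    -- hypothetical inputs: year, month and user sent from the date picker
    SELECT make_date(2019, 9, 1) AS month_start, 'user1'::text AS who
), days AS (
    -- every day of the requested month
    SELECT d::date AS slot_day, p.who
    FROM params p,
         generate_series(p.month_start,
                         p.month_start + interval '1 month' - interval '1 day',
                         interval '1 day') d
)
SELECT days.slot_day,
       count(ts.slot_date) AS count_of_slots,      -- 0 for days without slots
       coalesce(json_agg(ts.start_time || ' - ' || ts.end_time ORDER BY ts.start_time)
                    FILTER (WHERE ts.slot_date IS NOT NULL),
                '[]') AS time_slot
FROM days
LEFT JOIN time_slot ts
       ON ts.slot_date = days.slot_day
      AND ts.username = days.who                    -- filter here, not in WHERE
GROUP BY days.slot_day;

SELECT * FROM monthly_slots ORDER BY slot_day;
```

Wrapping each row in `json_build_object('Date', slot_day, 'count', count_of_slots, 'time-slot', time_slot)` would produce objects shaped like the one in the question, with the count as a number rather than a string.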

Getting maximum sequential streak with events

I’m having trouble getting my head around this.
I’m looking for a single query, if possible, running PostgreSQL 9.6.6 under pgAdmin3 v1.22.1
I have a table with a date and a row for each event on the date:
Date Events
2018-12-10 1
2018-12-10 1
2018-12-10 0
2018-12-09 1
2018-12-08 0
2018-12-07 1
2018-12-06 1
2018-12-06 1
2018-12-06 1
2018-12-05 1
2018-12-04 1
2018-12-03 0
I’m looking for the longest sequence of dates without a break. In this case, 2018-12-08 and 2018-12-03 are the only dates with no events; there are two dates with events between 2018-12-08 and today, and four between 2018-12-03 and 2018-12-08, so I would like the answer to be 4.
I know I can group them together with something like:
Select Date, count(Date) from Table group by Date order by Date Desc
To get just the most recent sequence, I’ve got something like this: the subquery returns the most recent date with no events, and the outer query counts the dates after that date:
select count(distinct date) from Table
where date>
( select date from Table
group by date
having count (case when Events is not null then 1 else null end) = 0
order by date desc
fetch first row only)
But now I need the longest streak, not just the most recent streak.
Thank you!
Your instinct is a good one in looking at the rows with zero events and working off them. We can use a subquery with a window function to get the "gaps" between zero event days, and then in a query outside it take the record we want, like so:
select *
from (
select date as day_after_streak
, lag(date) over(order by date asc) as previous_zero_date
, date - lag(date) over(order by date asc) as difference
, date::date - lag(date::date) over(order by date asc) - 1 as streak_in_days
from dates
group by date
having sum(events) = 0 ) t
where t.streak_in_days is not null
order by t.streak_in_days desc
limit 1
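The lag() query above only measures runs that have zero-event dates on both sides. A run after the last zero day (2018-12-09 to 2018-12-10 here) or before the first one is never counted. A gaps-and-islands sketch avoids that by working on the event days directly: consecutive dates share the same value of `date - row_number()`, so grouping by that value yields every run. It assumes `date` is of type `date` and treats a date missing from the table as a break, just like a zero-event date. The view name `streaks` is hypothetical.

```sql
-- Sample rows from the question; view name is hypothetical.
CREATE TEMP TABLE dates (date date, events integer);
INSERT INTO dates VALUES
    ('2018-12-10', 1), ('2018-12-10', 1), ('2018-12-10', 0),
    ('2018-12-09', 1), ('2018-12-08', 0), ('2018-12-07', 1),
    ('2018-12-06', 1), ('2018-12-06', 1), ('2018-12-06', 1),
    ('2018-12-05', 1), ('2018-12-04', 1), ('2018-12-03', 0);

CREATE TEMP VIEW streaks AS
WITH event_days AS (
    -- one row per date that had at least one event
    SELECT date AS d
    FROM dates
    GROUP BY date
    HAVING sum(events) > 0
), islands AS (
    SELECT d,
           -- consecutive dates get the same value: date minus its position
           d - (row_number() OVER (ORDER BY d))::int AS grp
    FROM event_days
)
SELECT min(d)   AS streak_start,
       max(d)   AS streak_end,
       count(*) AS streak_days
FROM islands
GROUP BY grp;

SELECT * FROM streaks ORDER BY streak_days DESC LIMIT 1;
-- → 2018-12-04 | 2018-12-07 | 4
```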

SQL query to convert date ranges to per day records

Requirements
I have data table that saves data in date ranges.
Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and can therefore overlap several older records.
Each new overlapping record overrides settings of older records that it overlaps.
Result set
What I need is per-day data for any date range, taking the record overlapping into account. It should return one record per day with the corresponding data for that particular day.
To convert ranges to days, I was thinking of a numbers/dates table and a user-defined function (UDF) to get the data for each day in the range, but I wonder whether there's another (better, or even faster) way of doing this, since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (obvious rows omitted):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date
Declare @EndDate date
;With Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
Select ...
From Calendar
Left Join MyTable
On Calendar.[Date] Between MyTable.RangeFrom And MyTable.RangeTo
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';
-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
Select 1 As Id
, Cast('20110101' As date) As RangeFrom
, Cast('20110331' As date) As RangeTo
, Cast('07:00' As time) As Starts
, Cast('15:00' As time) As Ends
, CURRENT_TIMESTAMP As CreatedOn
Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
)
, Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
, RankedData As
(
Select C.[Date]
, S.Id
, S.RangeFrom, S.RangeTo, S.Starts, S.Ends
, Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
From Calendar As C
Join SampleData As S
On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data preferring the newer rows that overlap the same date.
Why do it all in DB when you can do it better in memory
This is the solution I eventually used; it seemed the most reasonable in terms of data transferred, speed and resources:
get the actual range definitions from the DB to the mid tier (a smaller amount of data)
generate an in-memory calendar for the requested date range (faster than in the DB)
apply those DB definitions to the calendar (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have executable in-memory code that can do the same manipulation faster and more efficiently.