Postgres: check whether repeatable events overlap with time slots - postgresql

In general, I have something similar to a calendar.
In my database, I have repeatable events. To simplify working with them, I generate time slots during which a booking room will be reserved.
Table event
id long
room_uuid varchar
start_date timestamp
end_date timestamp
repeat_every_min long
duration_min long
And another table:
Table event_time_slot
id long
event_id long (fk)
start_date timestamp
end_date timestamp
Here is how it looks with mock data:
Table event mock data
id 1
room_uuid 267cb70a-6911-488c-aa9e-9deb506f785b
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:57:00"
repeat_every_min 15
duration_min 10
As a result, the table event_time_slot will contain the following records:
id 1
event_id 1
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:10:00"
____________________________________
id 2
event_id 1
start_date "2023-01-05 10:15:00"
end_date "2023-01-05 10:20:00"
____________________________________
id 3
event_id 1
start_date "2023-01-05 10:30:00"
end_date "2023-01-05 10:35:00"
____________________________________
id 4
event_id 1
start_date "2023-01-05 10:45:00"
end_date "2023-01-05 10:55:00"
Basically, I will generate time slots while
(startTime + N * repeatEveryMin) + durationMin <= endTime
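With the mock data above, N = 0..3 satisfies this: the last slot starts at 10:45 and ends at 10:55, which still fits before the 10:57 end_date, while a fifth slot starting at 11:00 would not. Hence exactly four slots are generated.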
My current flow to check whether two repeatable events conflict is quite simple:
I generate the time slots for the event, and for each generated slot I run
select *
from event_time_slot its
where
    -- condition that a saved slot overlaps the generated slot
    (its.start_date < (*endTime*) AND its.end_date > (*startTime*))
    or
    -- condition that a saved slot matches the generated slot exactly
    (its.start_date = (*startTime*) AND its.end_date = (*endTime*))
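As an aside, the same overlap test can be written with Postgres range types, which makes the boundary conditions harder to get wrong. A minimal sketch, keeping the same (*startTime*)/(*endTime*) placeholders for one generated slot:

select its.id
from event_time_slot its
-- && is the range "overlaps" operator; '[)' treats end_date as exclusive,
-- so back-to-back slots do not count as overlapping
where tsrange(its.start_date, its.end_date, '[)')
   && tsrange((*startTime*), (*endTime*), '[)');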
The problem is that this forces me to generate a lot of time slots just to execute this query.
Moreover, if I have an event with 100 time slots, I will need to check that none of the previously saved event time slots overlap with the 100 I am going to save.
My question is:
Does Postgres have any functionality which can simplify working with repeatable events?
Is there any other technology which solves this problem?
What I have tried:
Generating time slots for the event. The problem is that the query is too complex, and if I have more than 5000 time slots for one event, I need to split the work into multiple queries to the DB, because otherwise I get a memory error in my app.
I am expecting feedback or a technology with which Postgres can simplify the current flow.
My primary question is: does Postgres have any functionality that removes the work with time slots entirely?
For example, I pass startDate + endDate + repeatInterval to the query and SQL shows me the overlapping events.
I want to avoid creating a condition for every time_slot of the event I want to check.

This query generates 4 time slots:
SELECT tsrange(ts, ts + INTERVAL '10 MINUTE', '[)')
FROM generate_series(
       '2023-01-05 10:00:00'::timestamp
     , '2023-01-05 10:57:00'::timestamp
     , INTERVAL '15 MINUTE') g(ts)
WHERE ts::time BETWEEN '09:00' AND '17:55'   -- business hours
  AND EXTRACT(DOW FROM ts) BETWEEN 1 AND 5   -- Monday to Friday
-- other conditions
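Building on that, the generation can stay entirely inside the query, so overlapping events can be found without materializing event_time_slot rows at all. A hedged sketch against the event table above, using the mock event's values as the candidate parameters (the ::int casts are needed because the minute columns are declared as long):

-- candidate event passed as parameters: startDate, endDate, repeatEveryMin, durationMin
WITH candidate AS (
    SELECT tsrange(ts, ts + INTERVAL '10 MINUTE', '[)') AS slot   -- durationMin = 10
    FROM generate_series(
           '2023-01-05 10:00:00'::timestamp                      -- startDate
         , '2023-01-05 10:57:00'::timestamp                      -- endDate
         , INTERVAL '15 MINUTE') g(ts)                           -- repeatEveryMin = 15
)
SELECT DISTINCT e.id
FROM event e
CROSS JOIN LATERAL generate_series(
       e.start_date
     , e.end_date - make_interval(mins => e.duration_min::int)   -- last slot must still fit
     , make_interval(mins => e.repeat_every_min::int)) es(ts)
JOIN candidate c
  ON tsrange(es.ts, es.ts + make_interval(mins => e.duration_min::int), '[)') && c.slot;

A filter on e.room_uuid would normally be added; the point is that nothing has to be generated application-side.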
Check the manual for all the options you have with ranges, including the very powerful exclusion constraints to avoid overlapping events.
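For example, an exclusion constraint can make the database itself reject overlapping slots for the same room. A minimal sketch, assuming slots are stored as a single tsrange column instead of start_date/end_date, with room_uuid denormalized onto the slot table (btree_gist is needed to mix = and && in one constraint):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE event_time_slot (
    id        bigserial PRIMARY KEY,
    event_id  bigint  NOT NULL,
    room_uuid uuid    NOT NULL,   -- denormalized from event for the constraint
    slot      tsrange NOT NULL,
    -- reject any two rows for the same room whose slot ranges overlap
    EXCLUDE USING gist (room_uuid WITH =, slot WITH &&)
);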

Related

Getting maximum sequential streak with events

I’m having trouble getting my head around this.
I’m looking for a single query, if possible, running PostgreSQL 9.6.6 under pgAdmin3 v1.22.1
I have a table with a date and a row for each event on the date:
Date Events
2018-12-10 1
2018-12-10 1
2018-12-10 0
2018-12-09 1
2018-12-08 0
2018-12-07 1
2018-12-06 1
2018-12-06 1
2018-12-06 1
2018-12-05 1
2018-12-04 1
2018-12-03 0
I’m looking for the longest sequence of dates without a break. In this case, 2018-12-08 and 2018-12-03 are the only dates with no events; there are two dates with events between 2018-12-08 and today, and four between 2018-12-08 and 2018-12-03 - so I would like the answer to be 4.
I know I can group them together with something like:
Select Date, count(Date) from Table group by Date order by Date Desc
To get just the most recent sequence, I’ve got something like this- the subquery returns the most recent date with no events, and the outer query counts the dates after that date:
select count(distinct date)
from Table
where date > (
    select date
    from Table
    group by date
    having count(case when Events is not null then 1 else null end) = 0
    order by date desc
    fetch first row only
)
But now I need the longest streak, not just the most recent streak.
Thank you!
Your instinct is a good one in looking at the rows with zero events and working off them. We can use a subquery with a window function to get the "gaps" between zero event days, and then in a query outside it take the record we want, like so:
select *
from (
    select date as day_after_streak
         , lag(date) over (order by date asc) as previous_zero_date
         , date - lag(date) over (order by date asc) as difference
         , date_part('days', date - lag(date) over (order by date asc)) - 1 as streak_in_days
    from dates
    group by date
    having sum(events) = 0
) t
where t.streak_in_days is not null
order by t.streak_in_days desc
limit 1
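A quick way to sanity-check this against the sample data is an inline VALUES list; timestamps are used here so the subtraction yields an interval, which is what date_part expects. This returns day_after_streak 2018-12-08 with streak_in_days 4, as hoped:

with dates (date, events) as (
    values
        (timestamp '2018-12-10', 1), (timestamp '2018-12-10', 1), (timestamp '2018-12-10', 0),
        (timestamp '2018-12-09', 1), (timestamp '2018-12-08', 0), (timestamp '2018-12-07', 1),
        (timestamp '2018-12-06', 1), (timestamp '2018-12-06', 1), (timestamp '2018-12-06', 1),
        (timestamp '2018-12-05', 1), (timestamp '2018-12-04', 1), (timestamp '2018-12-03', 0)
)
select *
from (
    select date as day_after_streak
         , lag(date) over (order by date asc) as previous_zero_date
         , date_part('days', date - lag(date) over (order by date asc)) - 1 as streak_in_days
    from dates
    group by date
    having sum(events) = 0
) t
where t.streak_in_days is not null
order by t.streak_in_days desc
limit 1;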

Simple temporal view?

How can I write a query to compute the end date per ID in postgres? I am able to do this in-memory with Python, but I would rather keep it simple and just create a view.
My table appends any new combination of system_1_id and system_2_id along with the date of the file the data was from (I am reading a snapshot mapping file which is sent a few times per week). It looks like this:
system_1_id system_2_id start_date is_current
123456 05236 2016-06-01 False
123456 98899 2017-01-03 False
123456 05236 2017-04-15 True
To:
system_1_id system_2_id start_date end_date
123456 05236 2016-06-01 2017-01-02
123456 98899 2017-01-03 2017-04-14
123456 05236 2017-04-15
Note that there can only be one system_2_id assigned to a system_1_id at a time, but they can be recycled and even reassigned at a later date.
The end date is simply 1 day less than the start date of the next row for the same ID.
My goal is eventually to be able to join other tables to the data and pull the accurate ids per date:
where t1.system_2_id = t2.system_2_id and t1.report_date >= t2.start_date and t1.report_date <= t2.end_date
A simple temporal table without worrying about triggers or rules or using an extension.
The lead() window function will do this for you, with your example data:
select system_1_id
     , system_2_id
     , start_date
     , cast(lead(start_date, 1, null) over (partition by system_1_id order by start_date)
            - interval '1 day' as date) as end_date
from the_table;
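For the date-scoped join mentioned in the question, one option is to wrap this in a view and treat the NULL end_date of the current row as open-ended via coalesce. A sketch with hypothetical names (current_mapping, other_table, report_date) for the joining side:

create view current_mapping as
select system_1_id
     , system_2_id
     , start_date
     , cast(lead(start_date, 1, null) over (partition by system_1_id order by start_date)
            - interval '1 day' as date) as end_date
from the_table;

select t1.*
from other_table t1
join current_mapping t2
  on t1.system_2_id = t2.system_2_id
 and t1.report_date >= t2.start_date
 -- coalesce lets the still-current row (NULL end_date) match any later report_date
 and t1.report_date <= coalesce(t2.end_date, t1.report_date);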

Count data per day in a specific month in Postgresql

I have a table with a creation date called created_at and a deletion date called deleted_at for each record. If the record was deleted, that field stores the date; it's a logical delete.
I need to count the active records in a specific month. To understand what an active record means here, let's see an example:
For this example we'll use this hypothetical record:
id | created_at | deleted_at
1 | 23-01-2014 | 05-06-2014
This record is active on every day between its creation date and its deletion date, including the latter. So if I need to count the active records for March, in this case, this record must be counted on every day of that month.
I have a query (really easy to do) that shows the active records for a specific month, but my main problem is how to count those active records for each day in that month.
SELECT
date_trunc('day', created_at) AS dia_creacion,
date_trunc('day', deleted_at) AS dia_eliminacion
FROM
myTable
WHERE
created_at < TO_DATE('01-04-2014', 'DD-MM-YYYY')
AND (deleted_at IS NULL OR deleted_at >= TO_DATE('01-03-2014', 'DD-MM-YYYY'))
Here you are:
select TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i,
       count(case (TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
                  between created_at and coalesce(deleted_at, TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
             when true then 1
             else null
             end)
from generate_series(0, TO_DATE('01-04-2014', 'DD-MM-YYYY') - TO_DATE('01-03-2014', 'DD-MM-YYYY')) as g(i)
left join myTable on true
group by 1
order by 1;
You can add a more specific condition for joining only the relevant records from myTable, but even without it this gives you an idea of how to achieve the desired counting.
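The same idea reads a little more directly when generate_series produces the days themselves. A sketch under the same assumptions (logical delete; deleted_at NULL means still active):

select g.d::date   as day
     , count(t.id) as active_records
from generate_series(
       to_date('01-03-2014', 'DD-MM-YYYY')
     , to_date('31-03-2014', 'DD-MM-YYYY')
     , interval '1 day') g(d)
left join myTable t
       -- between is inclusive on both sides, matching "including the deletion date"
       on g.d::date between t.created_at
                        and coalesce(t.deleted_at, g.d::date)
group by 1
order by 1;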

Postgresql: using 'with clause' to iterate over a range of dates

I have a database table that contains a start visdate and an end visdate. If a date is within this range the asset is marked available. Assets belong to a user. My query takes in a date range (start and end date), and I need to return, for each day in that range, a count of the assets that are available on that day.
I know there are a few examples; I was wondering if it's possible to execute this as a single query/common table expression rather than using a function or a temporary table. I'm also finding it quite complicated because the assets table does not contain a single date on which an asset is available; I'm querying a range of dates against a visibility window. What is the best way to do this? Should I just run a separate query for each day in the date range I'm given?
Asset Table
StartvisDate Timestamp
EndvisDate Timestamp
ID int
User Table
ID
User & Asset Join table
UserID
AssetID
Date     | Number of Assets Available | User
11/11/14 | 5                          | UK
12/11/14 | 6                          | Greece
13/11/14 | 4                          | America
14/11/14 | 0                          | Italy
You need to use a set returning function to generate the needed rows. See this related question:
SQL/Postgres datetime division / normalizing
Example query to get you started:
with data as (
    select id, start_date, end_date
    from (values
        (1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
        (2, '2014-12-05 07:29:00+00'::timestamptz, '2014-12-05 15:25:00+00'::timestamptz)
    ) as rows (id, start_date, end_date)
)
select data.id,
       count(data.id)
from data
join generate_series(
         date_trunc('day', data.start_date),
         date_trunc('day', data.end_date),
         '1 day'
     ) as days (d)
  on days.d >= date_trunc('day', data.start_date)
 and days.d <= date_trunc('day', data.end_date)
group by data.id
id | count
----+-------
1 | 2
2 | 1
(2 rows)
You'll want to convert it to using ranges instead, and adapt it to your own schema and data, but it's basically the same kind of query as the one you want.
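Adapted to the Asset table from the question, a range-based version might look like the sketch below; the left join keeps days on which zero assets are available, as in the expected output (grouping per user via the join table is a straightforward extension):

select g.d::date   as day
     , count(a.id) as assets_available
from generate_series(
       timestamp '2014-11-11'
     , timestamp '2014-11-14'
     , interval '1 day') g(d)
left join asset a
       -- '[]' makes both visibility bounds inclusive; @> tests "range contains this day"
       on tsrange(a.startvisdate, a.endvisdate, '[]') @> g.d
group by 1
order by 1;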

SQL query to convert date ranges to per day records

Requirements
I have a data table that saves data in date ranges.
Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and hence can overlap several older records.
Each new overlapping record overrides settings of older records that it overlaps.
Result set
What I need is per-day data for any date range, applying the record overlapping rules. It should return a record per day with the corresponding data for that particular day.
To convert ranges to days I was thinking of a numbers/dates table and a user defined function (UDF) to get data for each day in the range, but I wonder whether there's any other (as in better, or even faster) way of doing this since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (omitted obvious rows):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date
Declare @EndDate date

;With Calendar As
(
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d, 1, [Date])
    From Calendar
    Where [Date] < @EndDate
)
Select ...
From Calendar
Left Join MyTable
    On Calendar.[Date] Between MyTable.Start And MyTable.End
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';

-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
    Select 1 As Id
        , Cast('20110101' As date) As RangeFrom
        , Cast('20110331' As date) As RangeTo
        , Cast('07:00' As time) As Starts
        , Cast('15:00' As time) As Ends
        , CURRENT_TIMESTAMP As CreatedOn
    Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s, 1, CURRENT_TIMESTAMP)
    Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s, 2, CURRENT_TIMESTAMP)
)
, Calendar As
(
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d, 1, [Date])
    From Calendar
    Where [Date] < @EndDate
)
, RankedData As
(
    Select C.[Date]
        , S.Id
        , S.RangeFrom, S.RangeTo, S.Starts, S.Ends
        , Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
    From Calendar As C
        Join SampleData As S
            On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data preferring the newer rows that overlap the same date.
Why do it all in DB when you can do it better in memory
This is the solution I eventually used; it seemed the most reasonable in terms of data transferred, speed and resources:
- get the actual range definitions from the DB to the mid tier (a smaller amount of data)
- generate an in-memory calendar for the requested date range (faster than in the DB)
- apply those DB definitions to it (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have executable in-memory code that can do the same manipulation faster and more efficiently.