Getting maximum sequential streak with events - postgresql

I’m having trouble getting my head around this.
I’m looking for a single query, if possible, running PostgreSQL 9.6.6 under pgAdmin3 v1.22.1
I have a table with a date and a row for each event on the date:
Date Events
2018-12-10 1
2018-12-10 1
2018-12-10 0
2018-12-09 1
2018-12-08 0
2018-12-07 1
2018-12-06 1
2018-12-06 1
2018-12-06 1
2018-12-05 1
2018-12-04 1
2018-12-03 0
I’m looking for the longest sequence of dates without a break. In this case, 2018-12-08 and 2018-12-03 are the only dates with no events; there are two dates with events between 2018-12-08 and today, and four between 2018-12-08 and 2018-12-03 - so I would like the answer 4.
I know I can group them together with something like:
Select Date, count(Date) from Table group by Date order by Date Desc
To get just the most recent sequence, I’ve got something like this: the subquery returns the most recent date with no events, and the outer query counts the dates after it:
select count(distinct date)
from Table
where date >
    ( select date
      from Table
      group by date
      having count(case when Events is not null then 1 else null end) = 0
      order by date desc
      fetch first row only )
But now I need the longest streak, not just the most recent streak.
Thank you!

Your instinct to look at the rows with zero events and work off them is a good one. We can use a subquery with a window function to get the "gaps" between zero-event days, and then take the record we want in an outer query, like so:
select *
from (
    select date as day_after_streak
         , lag(date) over (order by date asc) as previous_zero_date
         , date - lag(date) over (order by date asc) as difference
         -- subtracting two date values yields an integer number of days,
         -- so no date_part() call is needed here
         , date - lag(date) over (order by date asc) - 1 as streak_in_days
    from dates
    group by date
    having sum(events) = 0
) t
where t.streak_in_days is not null
order by t.streak_in_days desc
limit 1;
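If it helps to see the mechanics outside the database, here is a minimal sketch of the same gap logic in plain Python, using only the sample data from the question. It reproduces the GROUP BY/HAVING step and the LAG arithmetic directly:

```python
from datetime import date

# (date, events) rows from the question, one row per event entry
rows = [
    (date(2018, 12, 10), 1), (date(2018, 12, 10), 1), (date(2018, 12, 10), 0),
    (date(2018, 12, 9), 1), (date(2018, 12, 8), 0), (date(2018, 12, 7), 1),
    (date(2018, 12, 6), 1), (date(2018, 12, 6), 1), (date(2018, 12, 6), 1),
    (date(2018, 12, 5), 1), (date(2018, 12, 4), 1), (date(2018, 12, 3), 0),
]

# total events per day (the GROUP BY date ... HAVING sum(events) = 0 step)
totals = {}
for d, e in rows:
    totals[d] = totals.get(d, 0) + e

# days whose event total is zero, ascending
zero_days = sorted(d for d, total in totals.items() if total == 0)

# streak between consecutive zero-event days = gap in days minus 1 (the LAG step)
streaks = [(b - a).days - 1 for a, b in zip(zero_days, zero_days[1:])]
longest = max(streaks, default=0)
print(longest)  # 4 for the sample data
```

Note this, like the SQL above, only measures streaks *between* zero-event days; a streak running up to today has no zero day after it.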

Related

Postgres check that repeatable event overlap with time slots

In general, I have something similar to a calendar.
My database stores repeatable events, and to simplify working with them I generate the time slots during which the booked room is reserved.
Table event
id long
room_uuid varchar
start_date timestamp
end_date timestamp
repeat_every_min long
duration_min long
And another table:
Table event_time_slot
id long
event_id long (fk)
start_date timestamp
end_date timestamp
Here is how it looks with mock data:
Table event mock data
id 1
room_uuid 267cb70a-6911-488c-aa9e-9deb506f785b
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:57:00"
repeat_every_min 15
duration_min 10
As a result, the table event_time_slot will contain the following records:
id 1
event_id 1
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:10:00"
____________________________________
id 2
event_id 1
start_date "2023-01-05 10:15:00"
end_date "2023-01-05 10:20:00"
____________________________________
id 3
event_id 1
start_date "2023-01-05 10:30:00"
end_date "2023-01-05 10:35:00"
____________________________________
id 4
event_id 1
start_date "2023-01-05 10:45:00"
end_date "2023-01-05 10:55:00"
Basically, I will generate time slots while
((startTime + N * duration) + repeatEveryMin) < endTime
My current flow for checking whether two repeatable events conflict is quite simple:
I generate the time slots for the event, and I run
select *
from event_time_slot ts
join event_time_slot its on its.event_id = ts.event_id
where
  -- condition that any of the saved slots overlaps with a generated slot
  (its.start_date < (*endTime*) and its.start_date > (*startTime*))
  or
  -- condition that a saved slot starts exactly on a generated slot boundary
  (its.start_date = (*endTime*) or its.start_date = (*startTime*))
The problem is that this forces me to generate a lot of time slots just to execute the query.
Moreover, if an event has 100 time slots, I need to check that none of the previously saved event time slots overlap with the 100 I am about to save.
My questions are:
Is there any functionality in Postgres that can simplify working with repeatable events?
Is there any other technology that solves this problem?
What I have tried:
Generating the time slots for the event. The problem is that the query is too complex, and if I have more than 5000 time slots for one event I have to split the work into multiple queries, because otherwise my app hits a memory error.
I am expecting feedback, or a technique by which Postgres can simplify the current flow.
My primary question is: does Postgres have any functionality that removes the need to work with time slots at all?
For example, I pass startDate + endDate + repeatInterval to the query and SQL shows me the overlapping events.
I want to avoid creating a condition for every time_slot of the event I am checking.
This query generates 4 time slots:
SELECT
tsrange(ts, ts + INTERVAL '10 MINUTE', '[)')
FROM generate_series(
'2023-01-05 10:00:00'::timestamp
, '2023-01-05 10:57:00'::timestamp
, INTERVAL '15 MINUTE') g(ts)
WHERE ts::time BETWEEN '09:00' AND '17:55' -- business hours
AND EXTRACT(DOW FROM ts) BETWEEN 1 AND 5 -- Monday to Friday
-- other conditions
Check the manual for all the options you have with ranges, including exclusion constraints, which are a very powerful way to prevent overlapping events.
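As a cross-check of the half-open '[)' semantics: the overlap test Postgres performs with the && operator on such ranges is simply start1 < end2 AND start2 < end1. A small Python sketch using only the mock event from the question (the generator mirrors the generate_series query above):

```python
from datetime import datetime, timedelta

def slots(start, end, every, duration):
    """Yield half-open (slot_start, slot_end) pairs, like tsrange(ts, ts + duration, '[)')."""
    ts = start
    while ts <= end:
        yield (ts, ts + duration)
        ts += every

def overlaps(a, b):
    # half-open interval overlap, the same test tsrange's && operator performs
    return a[0] < b[1] and b[0] < a[1]

event = list(slots(datetime(2023, 1, 5, 10, 0), datetime(2023, 1, 5, 10, 57),
                   timedelta(minutes=15), timedelta(minutes=10)))
print(len(event))                    # 4 slots, matching the generate_series query
print(overlaps(event[0], event[1]))  # False: [10:00, 10:10) and [10:15, 10:25) do not touch
```

The point of ranges is that this two-comparison test (or an exclusion constraint) replaces the pairwise slot-by-slot conditions from the question.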

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
step-by-step demo:db<>fiddle
WITH excepts AS (
    SELECT
        item,
        generate_series(start_date, end_date, interval '1 day') gs
    FROM items

    EXCEPT

    SELECT
        item,
        generate_series(
            start_date,
            CASE WHEN end_date = 'infinity'
                 THEN (SELECT MAX(end_date) AS max_date FROM items)
                 ELSE end_date
            END,
            interval '1 day')
    FROM plan
)
SELECT
    item,
    MIN(gs::date) AS start_date,
    MAX(gs::date) AS end_date
FROM (
    SELECT
        *,
        SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
    FROM (
        SELECT
            item,
            gs,
            COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) AS same_day
        FROM excepts
    ) s
) s
GROUP BY item, sum
ORDER BY 1, 2
Finding the missing days is quite simple and happens within the WITH clause:
Generate all days of each item's date range and subtract the expanded day list of the second table. All dates that do not occur in the second table are kept. The infinity end is a little tricky, so I replaced the infinity occurrence with the max end date of the first table; this avoids expanding an infinite list of dates.
The more interesting part is re-aggregating this list, which is the part outside the WITH clause:
The lag() window function takes the previous date. If the gap to the previous date is more than one day, it marks the start of a new interval. (A time-change issue occurred here: between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time, which is why I test for a 2-day difference instead of exactly one day.)
These 0 and 1 values are summed cumulatively; every gap greater than one day starts a new interval (the days in between are covered).
This yields a groupable column that can be used to find the min and max date of each interval.
I tried something with date ranges, which seems to be the better way, especially for avoiding the expansion of long date lists, but didn't come up with a proper solution. Maybe someone else?
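For anyone who wants to trace the expand-subtract-regroup idea step by step, here is the same logic sketched in plain Python on the question's sample rows (pure dates, so the daylight-saving caveat above does not apply):

```python
from datetime import date, timedelta

def days(start, end):
    # expand a date range into its individual days (the generate_series step)
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

items = [('a', date(2019, 1, 1), date(2019, 1, 31)),
         ('a', date(2019, 2, 1), date(2019, 2, 28)),
         ('b', date(2019, 1, 1), date(2019, 6, 30))]
# 'infinity' replaced by the max end_date of items, as in the query
max_end = max(e for _, _, e in items)
plans = [('a', date(2019, 1, 1), date(2019, 1, 10)),
         ('a', date(2019, 1, 15), max_end)]

# the EXCEPT step: item-days not covered by any plan
covered = {(i, d) for i, s, e in plans for d in days(s, e)}
missing = sorted((i, d) for i, s, e in items for d in days(s, e)
                 if (i, d) not in covered)

# regroup consecutive days into (item, from, to) intervals (the LAG/SUM step)
gaps = []
for i, d in missing:
    if gaps and gaps[-1][0] == i and gaps[-1][2] == d - timedelta(days=1):
        gaps[-1][2] = d
    else:
        gaps.append([i, d, d])
for item, frm, to in gaps:
    print(item, frm, to)
```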

BigQuery - DATE_TRUNC error

I'm trying to get monthly aggregated data from a legacy table, meaning the date columns are strings:
amount date_create
100 2018-01-05
200 2018-02-03
300 2018-01-22
However, the command
Select DATE_TRUNC(DATE date_create, MONTH) as month,
sum(amount) as amount_m
from table
group by 1
Returns the following error:
Error: Syntax error: Expected ")" but got identifier "date_create"
Why does this query not run and what can be done to avoid the issue?
Thanks
It looks like you meant to cast date_create instead of using the DATE keyword (which is how you construct a literal value). Try this instead:
Select DATE_TRUNC(DATE(date_create), MONTH) as month,
sum(amount) as amount_m
from table
GROUP BY 1
I figured it out:
date_trunc(cast(date_create as date), MONTH) as Month
Another option for BigQuery Standard SQL - using PARSE_DATE function
#standardSQL
WITH `project.dataset.table` AS (
SELECT 100 amount, '2018-01-05' date_create UNION ALL
SELECT 200, '2018-02-03' UNION ALL
SELECT 300, '2018-01-22'
)
SELECT
DATE_TRUNC(PARSE_DATE('%Y-%m-%d', date_create), MONTH) AS month,
SUM(amount) AS amount_m
FROM `project.dataset.table`
GROUP BY 1
with result as
Row month amount_m
1 2018-01-01 400
2 2018-02-01 200
In practice I prefer PARSE_DATE over CAST, as the former documents the expected data format.
Try adding quotes around date_create:
Select DATE_TRUNC('date_create', MONTH) as month,
sum(amount) as amount_m
from table
group by 1

Count data per day in a specific month in Postgresql

I have a table with a creation date called created_at and a deletion date called deleted_at for each record. If the record was deleted, the field stores that date; it's a logical delete.
I need to count the active records for each day of a specific month. To explain what an active record is, let's see an example:
For this example we'll use this hypothetical record:
id | created_at | deleted_at
1 | 23-01-2014 | 05-06-2014
This record is active for every day between its creation date and its deletion date, including the latter. So if I need to count the active records for March, this record must be counted on every day of that month.
I have a query (really easy to write) that shows the active records for a specific month, but my main problem is how to count the active records for each day of that month.
SELECT
date_trunc('day', created_at) AS dia_creacion,
date_trunc('day', deleted_at) AS dia_eliminacion
FROM
myTable
WHERE
created_at < TO_DATE('01-04-2014', 'DD-MM-YYYY')
AND (deleted_at IS NULL OR deleted_at >= TO_DATE('01-03-2014', 'DD-MM-YYYY'))
Here you are:
select
TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i,
count( case (TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i) between created_at and coalesce(deleted_at, TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
when true then 1
else null
end)
from generate_series(0, TO_DATE('01-04-2014', 'DD-MM-YYYY') - TO_DATE('01-03-2014', 'DD-MM-YYYY')) as g(i)
left join myTable on true
group by 1
order by 1;
You can add a more specific condition to join only the relevant records from myTable, but even without it this gives you an idea of how to achieve the desired count.
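To see why the generate_series + CASE counting works, here is the same day-by-day membership test in a short Python sketch, with the single sample record from the question (dates written year-first):

```python
from datetime import date, timedelta

# (id, created_at, deleted_at); deleted_at may be None for still-active records
records = [(1, date(2014, 1, 23), date(2014, 6, 5))]

first, last = date(2014, 3, 1), date(2014, 3, 31)  # the target month
counts = {}
day = first
while day <= last:
    # a record is active on `day` if it was created on or before that day
    # and not yet deleted (the deletion day itself still counts as active)
    counts[day] = sum(1 for _id, created, deleted in records
                      if created <= day and (deleted is None or day <= deleted))
    day += timedelta(days=1)
print(counts[date(2014, 3, 15)])  # 1: the sample record is active all of March
```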

Dynamically change StartDate on a query depending on ClientId

I want to return the sum of daily spend since the beginning of the current insertion order (invoice) for a number of clients. Unfortunately, each client has a different start date for the current insertion order.
I have no problem pulling the start date for each client, but I don't see how to create a sort of lookup into a table with the start date associated with each client.
Let's say I have a table IO:
ClientId StartDate
1 2014-10-01
2 2014-10-04
3 2014-09-17
...
And another table with the DailySpend for each Client:
Date Client Spend
2014-10-01 1 2325
2014-10-01 2 195
2014-10-01 3 434
2014-10-02 1 43624
...
Now, I would simply like to check, for each client, how much we spent from the start date of the current insertion order until yesterday.
Maybe something like this:
SELECT b.client,
       Sum(b.spend)
FROM [IO] a
JOIN DailySpend b
  ON a.clientid = b.client
 AND b.date >= a.startdate
WHERE b.date <= Dateadd(dd, -1, Cast(Getdate() AS DATE))
GROUP BY b.client
select *
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date <= IO.StartDate
and datediff(dd, getdate(), DailySpend.Date) <= 1
select DailySpend.Client, sum(DailySpend.Spend)
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, getdate(), DailySpend.Date) <= 1
group by DailySpend.Client
you may need to flip the date order in the datediff
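The per-client lookup the answers implement is just a join on client plus a date filter; here is a compact Python sketch with the question's sample rows (yesterday is pinned to 2014-10-02 here, standing in for GETDATE() minus one day):

```python
from datetime import date

# IO table: client -> start date of the current insertion order
io = {1: date(2014, 10, 1), 2: date(2014, 10, 4), 3: date(2014, 9, 17)}
# DailySpend table: (date, client, spend)
daily = [(date(2014, 10, 1), 1, 2325), (date(2014, 10, 1), 2, 195),
         (date(2014, 10, 1), 3, 434), (date(2014, 10, 2), 1, 43624)]
yesterday = date(2014, 10, 2)  # stand-in for GETDATE() - 1 day

totals = {}
for d, client, spend in daily:
    # keep only spend between the client's own start date and yesterday
    if io[client] <= d <= yesterday:
        totals[client] = totals.get(client, 0) + spend
print(totals)  # client 2 is excluded: its order starts after the sample days
```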