Count data per day in a specific month in Postgresql - postgresql

I have a table with a creation date called created_at and a deletion date called deleted_at for each record. If the record was deleted, the field stores that date; it's a logical delete.
I need to count the active records in a specific month. To show what I mean by an active record, let's look at an example:
For this example we'll use this hypothetical record:
id | created_at | deleted_at
1 | 23-01-2014 | 05-06-2014
This record is active on every day between its creation date and its deletion date, including the latter. So if I need to count the active records for March, this record must be counted on every day of that month.
I have a query (really easy to write) that shows the active records for a specific month, but my main problem is how to count the active records for each day in that month.
SELECT
date_trunc('day', created_at) AS dia_creacion,
date_trunc('day', deleted_at) AS dia_eliminacion
FROM
myTable
WHERE
created_at < TO_DATE('01-04-2014', 'DD-MM-YYYY')
AND (deleted_at IS NULL OR deleted_at >= TO_DATE('01-03-2014', 'DD-MM-YYYY'))

Here you are:
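select
    TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i as active_day,
    -- count() ignores NULLs, so only records active on that day are counted;
    -- the ::date casts assume created_at/deleted_at may be timestamps and compare whole days
    count( case (TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i) between created_at::date and coalesce(deleted_at::date, TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
        when true then 1
        else null
    end) as active_records
-- date - date yields a number of days; subtract 1 so the series ends on the last day of March
from generate_series(0, TO_DATE('01-04-2014', 'DD-MM-YYYY') - TO_DATE('01-03-2014', 'DD-MM-YYYY') - 1) as g(i)
left join myTable on true
group by 1
order by 1;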
You can add a more specific join condition so that only the relevant records from myTable are joined, but even without it this gives you an idea of how to achieve the desired counting.
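As a minimal sketch of that more specific join condition, the "active on this day" test can be moved into the LEFT JOIN itself. The inline sample rows below are made up for illustration (only record 1 comes from the question), and the columns are assumed to be plain dates:

```sql
-- Hypothetical sample data standing in for myTable
WITH my_table(id, created_at, deleted_at) AS (
    VALUES (1, DATE '2014-01-23', DATE '2014-06-05'),
           (2, DATE '2014-03-10', NULL::date),
           (3, DATE '2014-03-15', DATE '2014-03-20')
)
SELECT g.d::date   AS active_day,
       count(t.id) AS active_records   -- count() skips the NULLs produced by the LEFT JOIN
FROM generate_series(DATE '2014-03-01', DATE '2014-03-31', interval '1 day') AS g(d)
LEFT JOIN my_table t
       ON t.created_at <= g.d::date
      AND (t.deleted_at IS NULL OR t.deleted_at >= g.d::date)  -- deletion day still counts as active
GROUP BY 1
ORDER BY 1;
```

With this sample data the count is 1 for 1 to 9 March, 2 for 10 to 14 March, 3 for 15 to 20 March, and 2 for the rest of the month. Days with no active records still appear with a count of 0 because the day series is on the left side of the join.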

Related

Postgres check that repeatable event overlap with time slots

In general, I have something similar to the calendar.
In my database, I have repeatable events. To simplify working with them, I generate the time slots during which the room will be reserved.
Table event
id long
room_uuid varchar
start_date timestamp
end_date timestamp
repeat_every_min long
duration_min long
And another table:
Table event_time_slot
id long
event_id long (fk)
start_date timestamp
end_date timestamp
What it looks like with mock data:
Table event mock data
id 1
room_uuid 267cb70a-6911-488c-aa9e-9deb506f785b
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:57:00"
repeat_every_min 15
duration_min 10
As a result, the table event_time_slot will contain the following records:
id 1
event_id 1
start_date "2023-01-05 10:00:00"
end_date "2023-01-05 10:10:00"
____________________________________
id 2
event_id 1
start_date "2023-01-05 10:15:00"
end_date "2023-01-05 10:25:00"
____________________________________
id 3
event_id 1
start_date "2023-01-05 10:30:00"
end_date "2023-01-05 10:40:00"
____________________________________
id 4
event_id 1
start_date "2023-01-05 10:45:00"
end_date "2023-01-05 10:55:00"
Basically, I generate time slots while
((startTime + N * repeatEveryMin) + duration) < endTime
My current flow for checking whether two repeatable events conflict is quite simple:
I generate the time slots for the event, and then I run
select from event_time_slot ts
join event_time_slot its on its.event_id = ts_id
where
-- a saved slot overlaps a generated slot when it starts before the generated slot ends and ends after it starts
(its.start_date < (*endTime*) AND its.end_date > (*startTime*))
or
-- a saved slot is identical to a generated slot
(its.start_date = (*startTime*) AND its.end_date = (*endTime*))
The problem is that this forces me to generate a lot of time slots just to execute the query.
Moreover, if I have an event with 100 time slots, I need to check that none of the previously saved event time slots overlaps with any of the 100 that I am going to save.
My questions are:
Does Postgres have any functionality that simplifies working with repeatable events?
Is there any other technology that solves this problem?
What I have tried:
Generating time slots for the event. The problem is that the query is too complex, and if I have more than 5000 time slots for one event, I need to run multiple queries against the DB because I get a memory error in my app.
I am hoping for feedback or a technique showing how Postgres can simplify the current flow.
My primary question is: does Postgres have any functionality that removes the need to work with time slots at all?
For example, I pass startDate + endDate + repeatInterval to the query and SQL shows me the overlapping events.
I want to avoid creating a condition for every time_slot of the event that I want to check.
This query generates 4 time slots:
SELECT
tsrange(ts, ts + INTERVAL '10 MINUTE', '[)')
FROM generate_series(
'2023-01-05 10:00:00'::timestamp
, '2023-01-05 10:57:00'::timestamp
, INTERVAL '15 MINUTE') g(ts)
WHERE ts::time BETWEEN '09:00' AND '17:55' -- business hours
AND EXTRACT(DOW FROM ts) BETWEEN 1 AND 5 -- Monday to Friday
-- other conditions
Check the manual for all options you have with ranges, including the very powerful constraints to avoid overlapping events.
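A rough sketch of what those range features could look like here, not a drop-in solution. The table booking_slot and its columns are hypothetical, room_uuid is assumed to be stored as text, and the btree_gist extension is assumed to be installable:

```sql
-- Hypothetical schema for illustration; table and column names are assumptions.
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets "=" on a scalar column take part in a GiST index

CREATE TABLE booking_slot (
    id        bigserial PRIMARY KEY,
    room_uuid text    NOT NULL,
    slot      tsrange NOT NULL,
    -- reject any new row whose slot overlaps an existing slot for the same room
    EXCLUDE USING gist (room_uuid WITH =, slot WITH &&)
);

-- Find saved slots that collide with a candidate repeatable event,
-- generating the candidate's slots on the fly instead of storing them first.
SELECT s.id, s.slot
FROM generate_series(
         TIMESTAMP '2023-01-05 10:00:00',
         TIMESTAMP '2023-01-05 10:57:00' - INTERVAL '10 minutes',  -- last start that still fits the duration
         INTERVAL '15 minutes') AS g(ts)
JOIN booking_slot s
  ON s.slot && tsrange(g.ts, g.ts + INTERVAL '10 minutes', '[)')
WHERE s.room_uuid = '267cb70a-6911-488c-aa9e-9deb506f785b';
```

The exclusion constraint makes the database itself refuse overlapping slots for a room, while the second query checks a candidate event against saved slots in a single statement, so the application never has to hold the generated slots in memory.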

Generate missing data and fill it down - postgresql

I have the dataset:
The problem is that the records are added only if an event happened, e.g. for the row with id 13897, the record was updated on 4/18/2020 and then on 5/1/2020 - the status was changed. What I need is the status of each record at the end of every month.
I was thinking about the below logic:
1. generate the series of dates from min(date) until now - T1
2. get the distinct ids from the dataset - T2
3. cross join the two tables above so that we get a new row for every row in the second table - T3
4. extract the dataset with all required fields - T4
5. merge T3 and T4 on concatenate(date and id) - T5
6. sort T5 by id and d asc - T5
7. fill down all the fields grouped by id - T5
8. generate the series of dates from min(date) until now with an interval of one month and get the last day of each month - T6
9. merge T5 and T6 by date with a right join so that we get only the rows with date = end of month
I am on step 6.
SELECT *
FROM (SELECT d, Concat(dt, t2.id) AS cnct
      FROM (SELECT d, d::date AS dt
            FROM generate_series(
                   (SELECT min(created_at::date) FROM new_table),
                   CURRENT_DATE,
                   interval '1 day') d) t1
      CROSS JOIN
           (SELECT DISTINCT id FROM new_table) t2) t3
-- in case a record with the same id was updated several times throughout the day
LEFT JOIN (WITH cte AS (
             SELECT id, status,
                    created_at at time zone 'eat' at time zone 'utc' AS "created_at",
                    updated_at::date AS date,
                    updated_at::date,
                    row_number() OVER (PARTITION BY id, updated_at::date ORDER BY updated_at DESC) rn
             FROM new_table)
           SELECT cte.*, Concat(updated_at::date, id) AS cnct
           FROM cte
           WHERE rn = 1) t4
ON t3.cnct = t4.cnct
I am stuck on step 7. I found "fill column with last value from partition in postgresql", but it is not what I need. I think I need to sort by date block, i.e. the dates from the min date to now for one id (13894) are block 1, and the dates from the min date to now for another id (13897) are block 2. The next step, I think, is to fill down all fields within each block.
Another question: how do you adapt event-based data for time-series use?
Tried:
You can use PostgreSQL's DISTINCT ON feature to do this. We'll generate a series with the start of every month (you'll need to supply start and end dates here) and put the ID and the date into the DISTINCT ON so that we get only one row of new_table for each distinct ID and month pair. Then we simply filter and order to ensure that the row we get for each ID and month is the latest row whose date is before the new month.
SELECT DISTINCT ON (new_table.id, month_start) *
FROM new_table, generate_series(start_date, end_date, interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
(If you need your results to have the last day of the month and not the first day of the next month, you can just subtract 1 day from month_start in your select clause.)
EDIT: Running on the data you supplied, I get this:
SELECT DISTINCT ON (new_table.id, month_start) new_table.id, month_start - interval '1 day' as month_end, new_table.status
FROM new_table, generate_series('2020-05-01', '2020-06-01', interval '1 month') month_start
WHERE new_table.date < month_start
ORDER BY new_table.id, month_start ASC, new_table.date DESC;
id | month_end | status
-------+------------------------+--------
13894 | 2020-04-30 00:00:00-07 | 5
13894 | 2020-05-31 00:00:00-07 | 5
13897 | 2020-04-30 00:00:00-07 | 2
13897 | 2020-05-31 00:00:00-07 | 5
(4 rows)

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
step-by-step demo:db<>fiddle
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH clause:
Generate all days of each item's date range and subtract from them the expanded day list of the second table. All dates that do not occur in the second table are kept. The infinity end is a little tricky, so I replaced the infinity occurrence with the max date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is to reaggregate this list, which is the part outside the WITH clause:
The lag() window function takes the previous date. If the previous date in the list is at least two days earlier, the expression yields 1, otherwise 0. (A time change issue occurred here, which is why I am asking for a 2-day difference rather than a difference of more than one day: between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time.)
These 0 and 1 values are summed cumulatively. Whenever there is a gap greater than one day, a new interval starts (the days in between are covered).
This results in a groupable column which can be used to aggregate and find the min and max date of each interval.
I tried something with date ranges, which seems to be a better way, especially to avoid expanding long date lists, but I didn't come up with a proper solution. Maybe someone else?
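One possible range-based sketch, assuming PostgreSQL 14 or later (multirange types, range_agg and unnest on multiranges). The CTEs below merely inline the sample rows from the question under the names items and plan; this is an illustration of the idea, not the answerer's own solution:

```sql
WITH items(item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-31'),
           ('a', DATE '2019-02-01', DATE '2019-02-28'),
           ('b', DATE '2019-01-01', DATE '2019-06-30')
),
plan(item, start_date, end_date) AS (
    VALUES ('a', DATE '2019-01-01', DATE '2019-01-10'),
           ('a', DATE '2019-01-15', DATE 'infinity')
),
covered AS (   -- one multirange per item holding every day the item exists
    SELECT item, range_agg(daterange(start_date, end_date, '[]')) AS mr
    FROM items GROUP BY item
),
planned AS (   -- one multirange per item holding every planned day
    SELECT item, range_agg(daterange(start_date, end_date, '[]')) AS mr
    FROM plan GROUP BY item
)
SELECT c.item,
       lower(gap)     AS start_date,
       upper(gap) - 1 AS end_date      -- dateranges are stored with an exclusive upper bound
FROM covered c
LEFT JOIN planned p USING (item)
CROSS JOIN LATERAL unnest(c.mr - COALESCE(p.mr, '{}'::datemultirange)) AS gap
ORDER BY 1, 2;
```

Subtracting the planned multirange from the covered one leaves only the uncovered stretches, so no day-by-day expansion is needed and the infinity end date requires no special handling.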

Count and records from yesterday and add datecolumn next to it with yesterday's date in Bigquery, standardSQL

I've been able to get a SQL query running that grabs the count of all records from the day before.
SELECT count(*)
FROM mytable
WHERE date(ingest_time) >= (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
date(ingest_time) < (CURRENT_DATE());
Building on the SQL above in BigQuery, how do I generate a date column next to the count that shows that these records are from yesterday?
Something like this:
1) 3000390 | 2019-11-13
Instead of SELECT count(*) use SELECT count(*), DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
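Put together, the full statement might look like the sketch below (BigQuery Standard SQL). The column aliases record_count and ingest_date are illustrative names, and the two range comparisons from the question are collapsed into a single equality, which selects the same rows:

```sql
SELECT
  COUNT(*) AS record_count,
  -- constant expression, so it can sit next to the aggregate without a GROUP BY
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS ingest_date
FROM mytable
WHERE DATE(ingest_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
```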

Postgresql: using 'with clause' to iterate over a range of dates

I have a database table that contains a start visdate and an end visdate. If a date is within this range the asset is marked available. Assets belong to a user. My query takes in a date range (start and end date). I need to return data so that for a date range it will query the database and return a count of assets for each day in the date range that assets are available.
I know there are a few examples, I was wondering if it's possible to just execute this as a query/common table expression rather than using a function or a temporary table. I'm also finding it quite complicated because the assets table does not contain one date which an asset is available on. I'm querying a range of dates against a visibility window. What is the best way to do this? Should I just do a separate query for each day in the date range I'm given?
Asset Table
StartvisDate Timestamp
EndvisDate Timestamp
ID int
User Table
ID
User & Asset Join table
UserID
AssetID
Date | Number of Assets Available | User
11/11/14 5 UK
12/11/14 6 Greece
13/11/14 4 America
14/11/14 0 Italy
You need to use a set returning function to generate the needed rows. See this related question:
SQL/Postgres datetime division / normalizing
Example query to get you started:
with data as (
select id, start_date, end_date
from (values
(1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
(2, '2014-12-05 15:25:00+00'::timestamptz, '2014-12-05 07:29:00+00'::timestamptz)
) as rows (id, start_date, end_date)
)
select data.id,
count(data.id)
from data
join generate_series(
date_trunc('day', data.start_date),
date_trunc('day', data.end_date),
'1 day'
) as days (d)
on days.d >= date_trunc('day', data.start_date)
and days.d <= date_trunc('day', data.end_date)
group by data.id
id | count
----+-------
1 | 2
2 | 1
(2 rows)
You'll want to convert it to using ranges instead, and adapt it to your own schema and data, but it's basically the same kind of query as the one you want.
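A minimal sketch of such a range-based version, counting for each day in the requested window how many assets are visible at some point during that day. The sample rows and snake_case column names are made up, and the per-user breakdown is left out to keep it short:

```sql
-- Hypothetical sample data standing in for the Asset table
WITH asset(id, start_vis_date, end_vis_date) AS (
    VALUES (1, TIMESTAMP '2014-11-10 08:00', TIMESTAMP '2014-11-12 18:00'),
           (2, TIMESTAMP '2014-11-12 00:00', TIMESTAMP '2014-11-13 12:00')
)
SELECT g.d::date   AS day,
       count(a.id) AS assets_available
FROM generate_series(TIMESTAMP '2014-11-11', TIMESTAMP '2014-11-14', interval '1 day') AS g(d)
LEFT JOIN asset a
       -- the asset's visibility window overlaps the 24-hour range of this day
       ON tsrange(a.start_vis_date, a.end_vis_date, '[]')
          && tsrange(g.d, g.d + interval '1 day', '[)')
GROUP BY 1
ORDER BY 1;
```

With this sample data the counts are 1, 2, 1 and 0 for 11 to 14 November. Joining through the user/asset table and adding the user to the GROUP BY would give the per-user breakdown shown in the question.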