PostgreSQL: Query to find number of free days?

I have a table that shows which vehicle will be free (available) to users from a start date to an end date.
For example:
Vehicle id = 1 is available to users from 2018-01-15 to 2020-02-28 (yyyy-mm-dd format). During this period any user can rent the vehicle.
What I want:
I want to calculate the number of free days in a particular period.
Here a period such as Jan-2018 means 1-Jan-2018 to 31-Jan-2018.
Calculation criteria for free days:
For vehicle Id = 1 --> start date = 2018-01-15 and end Date = 2020-02-28
For Jan-2018 = 16 days
(January has 31 days, but for vehicle id = 1 the start date is 2018-01-15)
For Feb-2018 = 28 days (the whole of February 2018 lies between 2018-01-15 and 2020-02-28)
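The criteria above boil down to intersecting the availability range with the month and counting the days in the overlap. A minimal Python sketch of that arithmetic, assuming both endpoints are inclusive (month_bounds and free_days are hypothetical helper names). Counting both ends gives 17 days for January 2018; the 16 quoted above corresponds to leaving one endpoint out, so the convention is worth settling before writing the SQL.

```python
import calendar
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def free_days(start: date, end: date, year: int, month: int) -> int:
    """Count the days of [start, end] (both ends inclusive) that fall in the month."""
    month_start, month_end = month_bounds(year, month)
    overlap_start = max(start, month_start)
    overlap_end = min(end, month_end)
    if overlap_start > overlap_end:
        return 0  # the availability range does not touch this month
    return (overlap_end - overlap_start).days + 1


# Vehicle id = 1 from the question
print(free_days(date(2018, 1, 15), date(2020, 2, 28), 2018, 1))  # 17
print(free_days(date(2018, 1, 15), date(2020, 2, 28), 2018, 2))  # 28
```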

-- Days of January 2017 covered by [startdate, enddate], counting both ends
SELECT enddate, startdate, CASE
WHEN startdate <= '2017-01-01' AND enddate >= '2017-01-31' THEN ('2017-01-31'::date - '2017-01-01'::date) + 1
WHEN startdate <= '2017-01-01' AND enddate < '2017-01-31' AND enddate >= '2017-01-01' THEN (enddate::date - '2017-01-01'::date) + 1
WHEN startdate > '2017-01-01' AND enddate >= '2017-01-31' AND startdate <= '2017-01-31' THEN (('2017-01-31'::date - '2017-01-01'::date) + 1) - EXTRACT(DAY FROM startdate::date) + 1
WHEN startdate > '2017-01-01' AND enddate < '2017-01-31' AND startdate <= '2017-01-31' AND enddate >= '2017-01-01' THEN (enddate::date - startdate::date) + 1
END AS td
FROM table1

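You can intersect the two periods with the daterange type: the * operator returns the overlap of two ranges, and subtracting the lower bound of that overlap from its upper bound gives the number of days.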
select *, upper(intersection_range) - lower(intersection_range) as available_days
from (select *,
-- '[]' makes end_date inclusive; the month is the half-open range [2018-01-01, 2018-02-01)
daterange(start_date, end_date, '[]') * daterange('2018-01-01', '2018-02-01') as intersection_range
from vehicle) a;
vehicle_id | start_date | end_date | intersection_range | available_days
------------+------------+------------+-------------------------+----------------
1 | 2017-12-31 | 2020-02-28 | [2018-01-01,2018-02-01) | 31
2 | 2017-11-30 | 2018-02-28 | [2018-01-01,2018-02-01) | 31
3 | 2017-07-31 | 2019-02-28 | [2018-01-01,2018-02-01) | 31
(3 rows)
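PostgreSQL's daterange is half-open by default (lower bound included, upper bound excluded), so upper minus lower counts the days up to but not including the upper bound; a whole month therefore needs the first day of the following month as its upper bound. A small Python sketch of the same intersect-then-subtract idea, with ranges modelled as hypothetical (lower, upper) tuples and None standing in for an empty range:

```python
from datetime import date
from typing import Optional, Tuple

# A half-open date range: lower bound included, upper bound excluded
Range = Tuple[date, date]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Overlap of two half-open ranges, or None when they do not overlap."""
    lower = max(a[0], b[0])
    upper = min(a[1], b[1])
    return (lower, upper) if lower < upper else None


def length_in_days(r: Optional[Range]) -> int:
    """upper - lower, treating an empty overlap as 0 days."""
    return 0 if r is None else (r[1] - r[0]).days


vehicle = (date(2017, 12, 31), date(2020, 2, 28))
print(length_in_days(intersect(vehicle, (date(2018, 1, 1), date(2018, 1, 31)))))  # 30
print(length_in_days(intersect(vehicle, (date(2018, 1, 1), date(2018, 2, 1)))))   # 31
```

In SQL, an empty intersection has NULL bounds, so wrapping the subtraction in coalesce(..., 0) reports 0 free days instead of NULL.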

Related

Postgres generate_series joined onto result set to fill empty dates within a range

I have a result set that sometimes has missing dates (because no data is present within that week), and I need to fill those with zeros. For simplicity, I've reduced the query and table to the following:
Table: generated_data
id | data | date_key
1 | 3 | 2021-12-13 03:00:00.000
2 | 1 | 2021-12-22 05:00:00.000
3 | 4 | 2021-12-24 07:00:00.000
4 | 7 | 2022-01-03 01:00:00.000
5 | 2 | 2022-01-05 02:00:00.000
Query:
Select
sum(data)::numeric / count(data) AS avg_data, -- ::numeric avoids integer division
DATE_TRUNC('week', date_key AT TIME ZONE 'America/New_York') as date_key
from generated_data
group by DATE_TRUNC('week', date_key AT TIME ZONE 'America/New_York')
would produce the following result set:
3 | 2021-12-13 00:00:00.000
2.5 | 2021-12-20 00:00:00.000
5.5 | 2022-01-03 00:00:00.000
But as you can see, the week of 12/27 is missing, and I'd like it returned in the result set as a zero. I've looked into using generate_series and joining it onto the simplified query above, but haven't found a good solution.
The idea would be doing something like
SELECT GENERATE_SERIES('2021-11-08T00:00:00+00:00'::date, '2022-01-17T04:59:59.999000+00:00'::date, '1 week'::interval) as date_key
but I'm not sure how to join that back to the result query so that only the missing dates are added. What would an ON clause look like for something like that?
The final result set would look like:
3 | 2021-12-13 00:00:00.000
2.5 | 2021-12-20 00:00:00.000
0 | 2021-12-27 00:00:00.000
5.5 | 2022-01-03 00:00:00.000
First, find the minimum and maximum dates and generate the weekly series from them. Then LEFT JOIN the data table onto the generated series:
WITH data_range AS (
SELECT
min(date_key) AT TIME ZONE 'America/New_York' AS min_date,
max(date_key) AT TIME ZONE 'America/New_York' AS max_date
FROM generated_data
),
generated_range AS (
-- Truncate both ends first: stepping from the raw minimum can skip the last week
SELECT GENERATE_SERIES(
DATE_TRUNC('week', min_date),
DATE_TRUNC('week', max_date),
'1 week'::interval
) AS date FROM data_range
)
SELECT
coalesce(sum(gd.data)::numeric / count(gd.data), 0) AS avg_data,
gr.date
FROM
generated_range gr
LEFT JOIN generated_data gd ON
DATE_TRUNC('week', gd.date_key AT TIME ZONE 'America/New_York') = gr.date
GROUP BY gr.date
ORDER BY gr.date
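The same gap-filling logic, sketched in Python to check the expected numbers. Time zones are ignored and the timestamps are treated as local, which is an assumption of this sketch; the function names are hypothetical. Weeks are generated from the truncated minimum to the truncated maximum, and weeks with no rows get 0. With the sample rows from the question, the last week averages (7 + 2) / 2 = 4.5.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def week_start(ts: datetime) -> datetime:
    """Truncate to Monday 00:00, like DATE_TRUNC('week', ts)."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())


def weekly_averages(rows):
    """rows: (data, date_key) pairs. Returns (average, week) for every week
    between the first and last week present, using 0 for empty weeks."""
    buckets = defaultdict(list)
    for value, ts in rows:
        buckets[week_start(ts)].append(value)
    if not buckets:
        return []
    current, last = min(buckets), max(buckets)
    result = []
    while current <= last:
        values = buckets.get(current)  # .get() so empty weeks are not inserted
        result.append((sum(values) / len(values) if values else 0, current))
        current += timedelta(weeks=1)
    return result


rows = [
    (3, datetime(2021, 12, 13, 3)),
    (1, datetime(2021, 12, 22, 5)),
    (4, datetime(2021, 12, 24, 7)),
    (7, datetime(2022, 1, 3, 1)),
    (2, datetime(2022, 1, 5, 2)),
]
for avg, week in weekly_averages(rows):
    print(avg, week.date())
```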

How can I find the status in each month using a start and end date?

[ Title was: "Find out the facts: How to find the month wise active members in an healthcare organization per each year and also find the growth percentage" ]
I have 5 years of historical data and would like to do some analytics on it. The data contains both active and inactive members. The goal is to find the active members for each month of each year.
What I am doing now is extracting the month and year from the effective date and grouping by month and year, filtering on Status = 'Active'.
But this way I lose the historical records.
For example, if a person had a membership from 01-01-2015 to 31-12-2016, this member is now shown as inactive, even though the same person was active during that period. So if I filter on the status, I lose these old records.
I need to go to a given month, say Jan 2015, and find everyone who was active at that time. So I tried a different approach.
I extracted the month of the expiry date and checked whether it is equal to or greater than the month of the effective date, as shown below. Here I am not relying on the member status field coming from the source; instead I derive a field that identifies the member's status during the period in question. This is only meant to identify the active members in each month of the year, but I am not sure it gives the right answer. Please suggest a better approach.
SELECT extract(YEAR FROM member_effective_date) AS year
, extract(MONTH FROM member_expiry_date) AS month
, CASE WHEN extract(MONTH FROM member_expiry_date)
= extract(MONTH FROM member_effective_date)
OR extract(MONTH FROM member_expiry_date)
> extract(MONTH FROM member_effective_date)
THEN 'Yes'
ELSE 'No' END AS active_status
FROM table_name
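You need to cross join the members table with a table of months to get the status in each period. The cross join "inflates" the member table so you can evaluate the status for each (member, month) pair.
Here is an example (it uses Amazon Redshift's SYSDATE and the stl_scan system table, which serve here only to generate a list of numbers; in PostgreSQL, generate_series can play the same role):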
CREATE TEMP TABLE table_name AS
SELECT 'member1' AS member
, '2020-01-01'::DATE AS member_effective_date
, '2020-04-27'::DATE AS member_expiry_date
;
WITH month_list
-- Month start and end for previous 12 months
AS (SELECT DATE_TRUNC('month',dt) AS month_start
, MAX(dt) AS month_end
FROM
-- List of the previous 365 dates
(SELECT DATE_TRUNC('day',SYSDATE) - (n * INTERVAL '1 day') AS dt
FROM
-- List of numbers from 1 to 365
(SELECT ROW_NUMBER() OVER () AS n FROM stl_scan LIMIT 365) )
GROUP BY month_start
)
SELECT extract(YEAR FROM b.month_start) AS year
, extract(MONTH FROM b.month_start) AS month
, CASE WHEN -- Effective on or before the month's end and expiring after its start
(a.member_effective_date <= b.month_end
AND a.member_expiry_date > b.month_start)
THEN 'Yes'
ELSE 'No' END AS active
FROM table_name a
CROSS JOIN month_list b -- Explicit cartesian product
ORDER BY 1,2
;
| year | month | active|
|------|-------|-------|
| 2019 | 8 | No |
| 2019 | 9 | No |
| 2019 | 10 | No |
| 2019 | 11 | No |
| 2019 | 12 | No |
| 2020 | 1 | Yes |
| 2020 | 2 | Yes |
| 2020 | 3 | Yes |
| 2020 | 4 | Yes |
| 2020 | 5 | No |
| 2020 | 6 | No |
| 2020 | 7 | No |
| 2020 | 8 | No |
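What the cross join evaluates can be sketched in Python for a single member: every month in the reporting window is paired with the member's dates and the same overlap test is applied. The helper names and the fixed window (August 2019 to August 2020, matching the output above) are assumptions of this sketch.

```python
import calendar
from datetime import date


def months_between(first: date, last: date):
    """Yield (year, month) pairs from first's month through last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def active_by_month(effective: date, expiry: date, first: date, last: date):
    """One (year, month, 'Yes'/'No') row per month, like the CROSS JOIN query."""
    rows = []
    for year, month in months_between(first, last):
        month_start = date(year, month, 1)
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        # Same test as the CASE: effective by the month's end, expiring after its start
        active = effective <= month_end and expiry > month_start
        rows.append((year, month, "Yes" if active else "No"))
    return rows


for row in active_by_month(date(2020, 1, 1), date(2020, 4, 27), date(2019, 8, 1), date(2020, 8, 31)):
    print(row)
```

Counting active members per month then amounts to summing the "Yes" rows across all members for each (year, month).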

selecting records without value

I am having trouble getting the desired result. The task looks simple: make a daily count of event occurrences for the top countries.
The main table looks like this:
id | date | country | col1 | col2 | ...
1 | 2018-01-01 21:21:21 | US | value 1 | value 2 | ...
2 | 2018-01-01 22:32:54 | UK | value 1 | value 2 | ...
From this table, I want to get daily event counts by the country, which is achieved by
SELECT date::DATE AT TIME ZONE 'UTC', country, COALESCE(count(id),0) FROM tab1
GROUP BY 1, 2
The problem arises when no event was made by a UK user on 2 January 2018:
country_events
date | country | count
2018-01-01 | US | 23
2018-01-01 | UK | 5
2018-01-02 | US | 30
2018-01-02 | UK | 0 <- desired result, but the row is missing
I've tried generating a date series and a series of the countries I'm looking for, then CROSS JOINing the two. I then LEFT JOINed this helper (columns date and country) with my result table, like this:
SELECT * FROM helper h
LEFT JOIN country_events c ON c.date::DATE = h.date::DATE AND c.country = h.country
I'm using PostgreSQL.
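You need an outer join from a generated calendar, and the grouping must use the generated day (and country) rather than tab1's columns; otherwise all days without events collapse into a single NULL row. Count a tab1 column instead of * so that days without events count as 0:
SELECT ts.d::date AS day, c.country, count(tab1.id) AS events
FROM generate_series(TIMESTAMP '2018-01-01 00:00:00',
TIMESTAMP '2018-01-31 00:00:00',
INTERVAL '1 day') AS ts(d)
CROSS JOIN (VALUES ('US'), ('UK')) AS c(country) -- countries of interest (example list)
LEFT JOIN tab1 ON tab1.country = c.country
AND tab1.date >= ts.d AND tab1.date < ts.d + INTERVAL '1 day'
GROUP BY ts.d, c.country
ORDER BY ts.d, c.country;
This will give the desired list for January 2018.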
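The shape of the result (every day paired with every country, zero when nothing matched) can be sketched in Python. The third event below is made-up sample data, the country list stands in for the "top countries", and daily_counts is a hypothetical name.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(events, countries, first: date, last: date):
    """events: (timestamp, country) pairs. Returns (day, country, count) for
    every day in [first, last] and every listed country, with 0 for no events."""
    counts = Counter((ts.date(), country) for ts, country in events)
    grid = []
    day = first
    while day <= last:
        for country in countries:
            # Counter returns 0 for missing keys, playing the role of LEFT JOIN + count
            grid.append((day, country, counts[(day, country)]))
        day += timedelta(days=1)
    return grid


events = [
    (datetime(2018, 1, 1, 21, 21, 21), "US"),
    (datetime(2018, 1, 1, 22, 32, 54), "UK"),
    (datetime(2018, 1, 2, 9, 0, 0), "US"),  # hypothetical extra event
]
for row in daily_counts(events, ["US", "UK"], date(2018, 1, 1), date(2018, 1, 2)):
    print(row)
```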

Break into multiple rows based on date range of a single row

I have a table which captures appointments, some are single day appointments and some are multi day appointments, so the data looks like
AppointmentId StartDate EndDate
9 2017-04-12 2017-04-12
10 2017-05-01 2017-05-03
11 2017-06-01 2017-06-01
I want to split the multi day appointment as single days, so the result I am trying to achieve is like
AppointmentId StartDate EndDate
9 2017-04-12 2017-04-12
10 2017-05-01 2017-05-01
10 2017-05-02 2017-05-02
10 2017-05-03 2017-05-03
11 2017-06-01 2017-06-01
So appointment id 10 has been split into multiple rows. I checked a few other questions, such as this one, but those only split a single, fixed start date and end date, not ranges taken from table data.
You can use a Calendar or dates table for this sort of thing.
For only 152kb in memory, you can have 30 years of dates in a table with this:
/* dates table */
declare @fromdate date = '20000101';
declare @years int = 30;
/* 30 years, 19 used data pages ~152kb in memory, ~264kb on disk */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
select top (datediff(day, @fromdate,dateadd(year,@years,@fromdate)))
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
into dbo.Dates
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date];
create unique clustered index ix_dbo_Dates_date
on dbo.Dates([Date]);
Without taking the actual step of creating a table, you can use it inside a common table expression with just this:
declare @fromdate date = '20161229';
declare @thrudate date = '20170103';
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date]
)
select [Date]
from dates;
Use either like so:
select
t.AppointmentId
, StartDate = d.date
, EndDate = d.date
from dates d
inner join appointments t
on d.date >= t.StartDate
and d.date <= t.EndDate
rextester demo: http://rextester.com/TNWQ64342
returns:
+---------------+------------+------------+
| AppointmentId | StartDate | EndDate |
+---------------+------------+------------+
| 9 | 2017-04-12 | 2017-04-12 |
| 10 | 2017-05-01 | 2017-05-01 |
| 10 | 2017-05-02 | 2017-05-02 |
| 10 | 2017-05-03 | 2017-05-03 |
| 11 | 2017-06-01 | 2017-06-01 |
+---------------+------------+------------+
Number and Calendar table reference:
Generate a set or sequence without loops - 1 - Aaron Bertrand
Generate a set or sequence without loops - 2 - Aaron Bertrand
Generate a set or sequence without loops - 3 - Aaron Bertrand
The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
Creating a Date Table/Dimension in sql Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in sql Server - Aaron Bertrand
tsql Function to Determine Holidays in sql Server - Aaron Bertrand
F_table_date - Michael Valentine Jones
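The numbers trick used in the dates-table script (cross-joining a ten-row digit table five times gives 10^5 candidate rows; numbering them and keeping the first N turns them into consecutive dates) can be sketched in Python with itertools.product. The function name is hypothetical, and the sketch assumes the start date is not 29 February.

```python
from datetime import date, timedelta
from itertools import islice, product


def dates_table(from_date: date, years: int) -> list[date]:
    """Consecutive dates starting at from_date and spanning the given number of years."""
    # Row count, like DATEDIFF(day, @fromdate, DATEADD(year, @years, @fromdate))
    n_days = (from_date.replace(year=from_date.year + years) - from_date).days
    # product(...) stands in for deka x hecto x kilo x tenK x hundredK (10**5 rows)
    tally = product(range(10), repeat=5)
    # islice acts as TOP (n); each row's position becomes a day offset from from_date
    return [from_date + timedelta(days=i) for i, _ in enumerate(islice(tally, n_days))]


dates = dates_table(date(2000, 1, 1), 30)
print(len(dates), dates[0], dates[-1])
```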
Clearly a Calendar/Tally table would be the way to go, as SqlZim illustrated (+1); however, you can also use an ad-hoc tally table with a CROSS APPLY.
Example
Select A.AppointmentId
,StartDate = B.D
,EndDate = B.D
From YourTable A
Cross Apply (
Select Top (DateDiff(DD,A.StartDate,A.EndDate)+1) D=DateAdd(DD,-1+Row_Number() Over (Order By Number),A.StartDate)
From master..spt_values
) B
Returns
AppointmentId StartDate EndDate
9 2017-04-12 2017-04-12
10 2017-05-01 2017-05-01
10 2017-05-02 2017-05-02
10 2017-05-03 2017-05-03
11 2017-06-01 2017-06-01
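Both answers expand each appointment into one row per calendar day from StartDate to EndDate inclusive. The same expansion as a short Python generator, a sketch for checking the expected rows rather than a replacement for the set-based SQL (split_by_day is a hypothetical name):

```python
from datetime import date, timedelta


def split_by_day(appointments):
    """appointments: (AppointmentId, StartDate, EndDate) rows with EndDate inclusive.
    Yields one (AppointmentId, day, day) row per day, like joining to a dates table."""
    for appointment_id, start, end in appointments:
        day = start
        while day <= end:
            yield appointment_id, day, day
            day += timedelta(days=1)


appointments = [
    (9, date(2017, 4, 12), date(2017, 4, 12)),
    (10, date(2017, 5, 1), date(2017, 5, 3)),
    (11, date(2017, 6, 1), date(2017, 6, 1)),
]
for row in split_by_day(appointments):
    print(row)
```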

Postgresql running totals with groups missing data and outer joins

I've written a SQL query that pulls data from a user table and produces a weekly count and a cumulative (running) total of when users were created. The data is grouped by week (using the windowing feature of Postgres). I'm using a left outer join to include the weeks when no users were created. Here is the query:
WITH reporting_period AS (
SELECT generate_series(date_trunc('week', date '2015-04-02'), date_trunc('week', date '2015-10-02'), interval '1 week') AS interval
)
SELECT
date(interval) AS interval
, count(users.created_at) as interval_count
, sum(count( users.created_at) ) OVER (order by date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT JOIN users
ON interval=date(date_trunc('week', users.created_at) )
GROUP BY interval, date_trunc('week', users.created_at) ORDER BY interval
It works almost perfectly. The cumulative value is calculated correctly for weeks in which a user was created. For weeks when no user was created, it is set to the grand total rather than the cumulative total up to that point.
In the rows marked **, the Week Tot column (interval_count) is 0 as expected, but the Run Tot column (cumulative_count) is 1053, which equals the grand total.
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
...
2015-06-08 | 996 | 1031
...
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
This is what I would like
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
...
It seems to me that if the outer join can somehow apply the grand total to the last column, it should be possible to apply the current running total instead, but I'm at a loss as to how to do it.
Is this possible?
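A likely cause is the window's ORDER BY: for weeks with no users, date_trunc('week', users.created_at) is NULL; ascending sorts put NULLs last in PostgreSQL; and under the default frame (RANGE UNBOUNDED PRECEDING to CURRENT ROW) all those NULL rows are peers whose frame reaches every row, hence the grand total. Ordering the window by the generated interval column instead should avoid this. The Python sketch below imitates both orderings on made-up weekly counts; the function names are hypothetical.

```python
from datetime import date


def sort_key(key):
    """Ascending order with NULLs (None) last, as PostgreSQL sorts by default."""
    return (key is None, key if key is not None else date.min)


def window_sum(rows):
    """rows: (week, order_key, count). Imitates SUM(count) OVER (ORDER BY order_key)
    with the default RANGE frame: each row sums every row whose key sorts at or
    before its own, so rows with equal keys (peers), including NULLs, share a total."""
    return [
        (week, count, sum(c for _, k, c in rows if sort_key(k) <= sort_key(key)))
        for week, key, count in rows
    ]


weeks = [date(2015, 3, 30), date(2015, 4, 6), date(2015, 4, 13), date(2015, 4, 20)]
counts = [4, 13, 0, 9]
# Ordering by date_trunc('week', users.created_at): NULL for the empty week
by_user_week = [(w, None if c == 0 else w, c) for w, c in zip(weeks, counts)]
# Ordering by the generated interval: never NULL
by_interval = [(w, w, c) for w, c in zip(weeks, counts)]

print(window_sum(by_user_week))  # empty week gets the grand total, 26
print(window_sum(by_interval))   # empty week gets the running total, 17
```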
This isn't guaranteed to work out of the box, as I haven't tested it on actual tables, but the key here is to join users on created_at over a range of dates.
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
)
select interval_end,
interval_count,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at >= rp.interval_begin -- half-open window [begin, end)
and u.created_at < rp.interval_end
group by interval_end
) q
I figured it out. The trick was subqueries. Here's my approach:
1. Add a count column with a default value of 0 to the generate_series call.
2. Select the interval and count(users.created_at) from the users data.
3. UNION the generate_series output with the result of the select in step 2. (At this point the result has duplicates for each interval.)
4. Use these results in a subquery to get the interval and max(interval_count), which eliminates the duplicates.
5. Use the window aggregation as before to get the running total.
SELECT
interval
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
FROM
(
SELECT interval, MAX(interval_count) AS interval_count FROM
(
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
UNION
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
) sub1
GROUP BY interval
) grouped_data
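The five steps, sketched in Python on made-up weekly counts (the function name and data are hypothetical). Taking MAX per interval works as a de-duplication step only because real counts are never negative, so a real count always beats the placeholder 0.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import accumulate


def fill_and_accumulate(series, counted):
    """series: generated interval starts; counted: (interval, count) pairs from the data.
    Returns (interval, interval_count, cumulative_count) ordered by interval."""
    best = defaultdict(int)
    for interval in series:           # step 1: every generated interval with count 0
        best[interval] = max(best[interval], 0)
    for interval, count in counted:   # steps 2-3: union in the real counts
        best[interval] = max(best[interval], count)
    ordered = sorted(best.items())    # step 4: one row per interval (MAX per group)
    totals = accumulate(count for _, count in ordered)  # step 5: running SUM
    return [(i, c, t) for (i, c), t in zip(ordered, totals)]


start = date(2015, 3, 30)
series = [start + timedelta(weeks=k) for k in range(4)]
counted = [(series[0], 4), (series[1], 13), (series[3], 9)]
for row in fill_and_accumulate(series, counted):
    print(row)
```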
I'm not sure if there are any serious performance issues with this approach but it seems to work. If anyone has a better, more elegant or performant approach I would love the feedback.
Edit: My solution doesn't work when trying to group by arbitrary time windows
I just tried this solution with the following changes:
/* generate series using DATE_TRUNC('day'...)*/
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
For example, is it possible to produce similar results but with the data grouped into intervals like these:
3/15/15 - 4/14/15,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15
etc.
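Windows such as 3/15/15 - 4/14/15 are easiest to handle as half-open ranges [15th, next month's 15th); the reporting-period answer also steps from the 15th with interval '1 month'. A Python sketch of counting timestamps per window plus the running total, where the helper names and sample timestamps are hypothetical. Each row's end is exclusive, so the inclusive label is that date minus one day (4/15 becomes 4/14).

```python
from datetime import date, datetime
from itertools import accumulate


def add_months(d: date, n: int) -> date:
    """Same day of the month, n months later (assumes that day exists, e.g. the 15th)."""
    years, month0 = divmod(d.month - 1 + n, 12)
    return d.replace(year=d.year + years, month=month0 + 1)


def window_counts(created, first_begin: date, n_windows: int):
    """Count timestamps per one-month half-open window [begin, end)
    and return (begin, end, count, running_total) rows."""
    bounds = [add_months(first_begin, k) for k in range(n_windows + 1)]
    windows = list(zip(bounds, bounds[1:]))
    counts = [sum(1 for ts in created if begin <= ts.date() < end) for begin, end in windows]
    return [(b, e, c, t) for (b, e), c, t in zip(windows, counts, accumulate(counts))]


created = [
    datetime(2015, 3, 20, 10, 0),
    datetime(2015, 4, 14, 23, 59),
    datetime(2015, 4, 15, 0, 0),
    datetime(2015, 5, 1, 8, 30),
]
for row in window_counts(created, date(2015, 3, 15), 3):
    print(row)
```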