DB2: How to convert the following "with temp as" into a normal select statement

I'm trying to do an insert with this "with" statement, but it seems that "with" only supports a select statement, so I want to convert it into a plain select statement. I'm also amazed that it works at all. I found a similar example on Stack Overflow and adapted it to my needs.
with temp (startdate, enddate, maxdate) as (
select min(salesdate) startdate, min(salesdate)+3 months enddate, max(salesdate) maxdate
from SALES
union all
select startdate + 3 months + 1 days, enddate + 3 months + 1 days, maxdate from temp
where enddate <= maxdate
)
select startdate, min(enddate, maxdate) from temp;
Thanks in advance.
Edit: It seems my query was misunderstood. Here is pseudo code for what the query is supposed to do. The query returns the expected result, which is pretty amazing to me: I don't understand why the recursion doesn't overlap after I added 1 day. After writing the pseudo code, I see that select startdate + 3 months + 1 days should have been written as select enddate + 1 days, which states the intent directly instead of working by magic:
rows = []
startdate = min(salesdate)
enddate = startdate + 3 months
maxdate = max(salesdate)
i = 0;
do {
rows[i++] = [startdate, min(enddate, maxdate)] // min for final iteration where enddate > maxdate.
startdate = enddate + 1 days
enddate = enddate + 1 days + 3 months // aka: startdate + 3 months
} while (enddate <= maxdate)
return rows
Hence, I've broken a huge date range into smaller chunks of 3 months ranges. Whether it is exactly 90 days or 91 days is not important, as long as I get every single date without gap and without overlap.
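For anyone who finds the recursion hard to follow, here is a minimal Python sketch of the same chunking idea. It is an illustration only, not the DB2 query: add_months and chunk_range are hypothetical helper names, and clamping the day to the length of the target month is an assumption about month arithmetic that may differ from DB2 in edge cases. The loop runs while the next start date is still inside the range, which is one way to guarantee no gaps and no overlaps.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift d by whole months, clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def chunk_range(first, last, months=3):
    """Split [first, last] into consecutive chunks of roughly `months` months."""
    chunks = []
    start = first
    while start <= last:
        # The final chunk is cut off at `last`, like min(enddate, maxdate).
        end = min(add_months(start, months), last)
        chunks.append((start, end))
        # The next chunk starts the day after this one ends.
        start = end + timedelta(days=1)
    return chunks


chunks = chunk_range(date(2011, 1, 1), date(2011, 12, 31))
for start, end in chunks:
    print(start, end)
```

Each chunk starts exactly one day after the previous one ends, so every date in the range is covered once.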

I'm curious about your decision that a query with a recursive common table expression (RCTE) is "not normal". IBM calls it a 'select-statement' and considers it normal. If this is an educational question and you don't want to use an RCTE for some reason, consider the following example.
select s + (3*(x.i-1)) month start, s + (3*x.i) month - 1 day end
from table(values (date('2011-01-01'), date('2012-01-01'))) d(s, e)
, xmltable('for $id in (1 to $e) return <i>{number($id)}</i>'
passing ((year(e)-year(s))*12 + (month(e)-month(s)))/3 as "e"
columns i int path '.'
) x;
START END
---------- ----------
2011-01-01 2011-03-31
2011-04-01 2011-06-30
2011-07-01 2011-09-30
2011-10-01 2011-12-31
It's a little complicated, since you must pass the desired number of rows to the xmltable table function, which returns a single column with the values 1 to N. In other words, you must compute the number of 3-month intervals yourself and pass it to the function.
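The "compute N first, then generate rows 1 to N" idea can be sketched in Python as follows. This is only an illustration of the arithmetic in the query above; quarter_ranges and shift_month_start are hypothetical names, and the sketch assumes the start date is the first day of a month, as in the example.

```python
from datetime import date, timedelta


def shift_month_start(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def quarter_ranges(s, e):
    """Return (start, end) pairs for each full 3-month interval between s and e."""
    # Same expression as in the query: month difference divided by 3.
    n = ((e.year - s.year) * 12 + (e.month - s.month)) // 3
    return [
        (
            shift_month_start(s, 3 * (i - 1)),
            shift_month_start(s, 3 * i) - timedelta(days=1),
        )
        for i in range(1, n + 1)
    ]


ranges = quarter_ranges(date(2011, 1, 1), date(2012, 1, 1))
for start, end in ranges:
    print(start, end)
```

For the dates used in the query, this yields the same four rows as the output shown above.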
An (R)CTE can't be used in UPDATE/DELETE statements, which accept only so-called fullselects (these don't allow CTEs). If you really need a CTE for an UPDATE/DELETE, as in this case, you can do one of the following:
If you ARE ABLE to compute a temporary result set for the whole delete/update statement, you can do something like this (for simplicity I use a plain CTE here rather than an RCTE):
with a (id) as (values 1)
select count(1)
from old table(
delete from test t
where exists (select 1 from a where a.id=t.id)
);
If you ARE NOT ABLE to compute a temporary result set for the whole delete/update statement, you can create a table or scalar function with the corresponding parameters and use your RCTE inside it. The function can then be used in the outer statement.

I was able to do the insert by moving the insert statement in front of the "with" statement. I'm copying the answer here so I can find it next time. I'm still interested in seeing how to convert it to a pure select statement, though, and will accept that answer as the correct one.
insert into my_temp_table
with temp (startdate, enddate, maxdate) as (
select min(salesdate) startdate, min(salesdate)+3 months enddate, max(salesdate) maxdate
from SALES
union all
select startdate + 3 months + 1 days, enddate + 3 months + 1 days, maxdate from temp
where enddate <= maxdate
)
select startdate, min(enddate, maxdate) from temp;


How can I, in T-SQL, examine date intervals to remove overlapping intervals before adding totals together

I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90 day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant but not like it, and so be given another in the same class.
My original strategy was just to total up number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if they overlap, days during an overlap period shouldn’t count twice (which they would in a simple sum).
For instance in the data table below, patient 1 in row 1 should be included as they are over 60 days. Patient 2 should also get in (rows 2 and 3) because the non-overlapping total (57+8) within the same med class gets them to over 60 days. However, patient 3 should NOT get in, even though the total of 32 + 32 is over 60 because the intervals overlap. This means that they were really on the medication class for only 32 days – this is an instance where someone might be on two different antidepressants simultaneously.
It’s not sufficient to just sum the days in the interval, but I also have to include some way to examine whether the intervals are overlapping and only add days if an interval for a given medication class falls outside another interval for that same class.
Row num  Patid  Med class  Start date  End date    Interval
1        1      A          2022-04-28  2022-09-12  63
2        2      B          2022-05-03  2022-06-29  57
3        2      B          2022-04-21  2022-04-29  8
4        3      A          2022-01-19  2022-05-03  32
5        3      A          2022-01-19  2022-05-03  32
I’m having a hard time figuring out how to do this. Note, I'm limited to just using SQL for this.
Code that produced the above data. I would embed this in another query to generate a total interval but need to deal with the overlap issue.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt='4/1/2022'
SET @enddt='6/30/2022'
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
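To see why counting distinct days avoids double counting, here is a small Python sketch of the same idea, run on the sample rows from the question. It is not a replacement for the SQL above; days_medicated is a hypothetical helper name, and the sketch assumes both the start and the end date of a prescription count as medicated days (as the BETWEEN join does), so its totals are one day higher per prescription than the DATEDIFF values in the question.

```python
from datetime import date, timedelta


def days_medicated(intervals, quarter_start, quarter_end):
    """Count distinct days covered by any interval, clipped to the quarter."""
    covered = set()
    for start, end in intervals:
        day = max(start, quarter_start)
        last = min(end, quarter_end)
        while day <= last:
            covered.add(day)  # a set keeps each calendar day only once
            day += timedelta(days=1)
    return len(covered)


q_start, q_end = date(2022, 4, 1), date(2022, 6, 30)

# Patient 2, class B: two prescriptions that do not overlap.
patient2 = [(date(2022, 5, 3), date(2022, 6, 29)),
            (date(2022, 4, 21), date(2022, 4, 29))]
# Patient 3, class A: two prescriptions covering the same days.
patient3 = [(date(2022, 1, 19), date(2022, 5, 3)),
            (date(2022, 1, 19), date(2022, 5, 3))]

print(days_medicated(patient2, q_start, q_end))  # 67
print(days_medicated(patient3, q_start, q_end))  # 33
```

Patient 2 clears the 60-day threshold, while patient 3 does not, because the duplicated interval contributes each day only once.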
I would do this using a date/calendar table, then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily ( https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/ )
Here's the script from this link (in case the link dies)
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table, you can join to it to get the list of dates that fall between the start and end date, something like
SELECT COUNT(DISTINCT dt.TheDate)
FROM rx
INNER JOIN MyDateTable dt ON dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
Obviously this is a simple example, but you could easily extend it to include all the details you need. The point is that you now have a list of dates, or a distinct list of dates, which you can work with easily.
You could also simplify the date range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1/202301 etc.).

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time, I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count  Year  Month
100    2021  1
50     2021  2
75     2021  3
While this is okay on its own, I want it to display the running total over time, not just the number of events in that month, so the desired output should be:
Count  Year  Month
100    2021  1
150    2021  2
225    2021  3
I read in various places that I should use a window function via SqlAlchemy's over function, but I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, as adding event.id to the group by would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single select statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly
but I'm not sure how to do that in SqlAlchemy.
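The two-step logic described above (count per month, then a cumulative sum over the ordered months) can be illustrated in plain Python. This is only a sketch of what the window function computes, not SqlAlchemy code; running_monthly_totals is a hypothetical name and the sample dates are made up to match the counts in the question.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def running_monthly_totals(dates):
    """Return (year, month, running_total) rows, skipping missing dates."""
    # Step 1: events per (year, month), like COUNT(...) with GROUP BY.
    per_month = Counter((d.year, d.month) for d in dates if d is not None)
    # Step 2: cumulative sum in month order, like SUM(...) OVER (ORDER BY ...).
    months = sorted(per_month)
    totals = accumulate(per_month[m] for m in months)
    return [(year, month, total) for (year, month), total in zip(months, totals)]


events = (
    [date(2021, 1, 15)] * 100
    + [date(2021, 2, 15)] * 50
    + [date(2021, 3, 15)] * 75
    + [None]
)
rows = running_monthly_totals(events)
print(rows)  # [(2021, 1, 100), (2021, 2, 150), (2021, 3, 225)]
```

The window is applied to the per-month counts, not to the raw rows, which is why the SQL wraps COUNT inside SUM ... OVER.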

How to select a certain future date based on integer and integer[] in Postgresql?

I am trying to create a query that will return information about a series of future dates. So for example, today is Monday, and I want to get three days worth of information in advance: Tuesday, Wednesday, and Thursday. I understand how to use something like generate_series with a starting and end date to get the rows.
The problem I'm having is, I am selecting an integer for the number of days in advance I want from one table from a second table. But the particular dates will change if one or more of the potential future dates is one where the business is not open. So if the starting date were Thursday, and the business is closed on Sunday, I'd want to get rows for Friday, Saturday, and Monday.
So from the first table with the specifics on which days to get, I'd be selecting an integer (e.g. 3) and an integer[] (e.g. {1,2,3,4,5,6}). My thought was to somehow start with the day of the week of tomorrow (e.g. 2 from SELECT EXTRACT(DOW FROM CURRENT_DATE + '1 days'::interval) if tomorrow is Tuesday) and then check if that DOW is inside the array. I'd have a separate counter with the number of extra days I'd need to add to my series, and after looping through until I get three days that aren't skipped, I'd add it to my days ahead number. So starting on Thursday, I'd check Friday (5), it's in the array, increment loop variable and continue. Saturday (6), it's in the array, increment loop variable and continue. Sunday (0), not in the array, add one to the extra days counter and continue. Monday (1), in the array, increment loop variable and continue. That's three, so I'm done. Then add my second counter (1) to the original days ahead (3) and get 4 days worth of information. Days that the business isn't open will be excluded through WHERE conditions, so the total number of days displayed will be consistent.
The problem is, I can conceptualize this solution, but I can't figure out how to put it together syntactically. Here's an approximation of what I think would work:
DO $$
BEGIN
DECLARE
counter integer := 0;
increment_days integer := 1;
WITH future_data AS
(SELECT days_ahead, open_days FROM Stores);
WHILE counter < (SELECT days_ahead FROM future_data) loop
CASE WHEN (SELECT EXTRACT(DOW FROM CURRENT_DATE + (days::text || ' days'::interval))
= ANY(SELECT unnest(open_days) FROM future_data)) THEN
counter := counter + 1;
ELSE counter := counter END;
increment_days := increment_days + 1;
END LOOP;
increment_days := increment_days + days_ahead;
--[...main SELECT query...]
END$$;
I keep getting complaints about the way I'm putting this all together. Currently it's a syntax error at WHILE. It seems like I can't do anything but a SELECT statement there.
Rather than trying to figure out how many days to look ahead, just build a function that takes a start_date and the number of days you want, and let the function determine the actual dates returned (i.e. it bypasses Sunday). The following SQL function does that using a recursive CTE rather than attempting to calculate the number of days to look forward. See fiddle
create or replace
function business_day(start_date_in date, num_days_in integer default 3)
returns setof date
language sql
immutable strict
as $$
with recursive get_days (bus_date, num_selected) as
( select case when extract(dow from start_date_in::timestamp + interval '1 day') > 0
then start_date_in::timestamp + interval '1 day'
else start_date_in::timestamp + interval '2 day'
end
, 1
union all
select case when extract(dow from bus_date + interval '1 day')>0
then bus_date + interval '1 day'
else bus_date + interval '2 day'
end
, num_selected + 1
from get_days
where num_selected<num_days_in
)
select bus_date::date from get_days ;
$$;
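The function above hard-codes Sunday as the only closed day. As a rough sketch of the more general loop the question describes (a days_ahead count plus an open_days array), here is the same walk in Python. next_open_days is a hypothetical name, and the sketch assumes open_days uses the same numbering as EXTRACT(DOW ...), where 0 is Sunday and 6 is Saturday.

```python
from datetime import date, timedelta


def next_open_days(start, days_ahead, open_days):
    """Return the next `days_ahead` dates after `start` whose DOW is open."""
    if not open_days:
        raise ValueError("open_days must contain at least one day of the week")
    result = []
    day = start
    while len(result) < days_ahead:
        day += timedelta(days=1)
        # isoweekday() is 1..7 with Sunday = 7; modulo 7 gives 0 = Sunday.
        if day.isoweekday() % 7 in open_days:
            result.append(day)
    return result


# 2023-06-01 is a Thursday; the business is closed on Sundays.
days = next_open_days(date(2023, 6, 1), 3, {1, 2, 3, 4, 5, 6})
print(days)  # Friday, Saturday and Monday
```

Counting accepted days instead of computing an offset up front removes the need for the second "extra days" counter.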

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start is from the aliased main query and the dates are hard-coded for now, this correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
However subqueries can't return multiple values. The date range for the query is dynamic, so I would either need to somehow return the query as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery as JSON, but that doesn't seem to be working either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Doing a pivot from rows to columns in SQL is usually a bad idea, so the query produces json for you.
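To check the per-month counts independently of the database, here is a minimal Python sketch that produces the same kind of month-to-count mapping for one project. work_days_by_month is a hypothetical name, and the sketch assumes a work day is simply Monday to Friday with no holidays, as in the queries above.

```python
from collections import Counter
from datetime import date, timedelta


def work_days_by_month(start, end):
    """Map each month start ('YYYY-MM-01') to its number of weekdays in [start, end]."""
    counts = Counter()
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday = 0 ... Friday = 4
            counts[day.strftime("%Y-%m-01")] += 1
        day += timedelta(days=1)
    return dict(counts)


# Project 1 from the question: 2020-08-01 to 2020-09-10.
project1 = work_days_by_month(date(2020, 8, 1), date(2020, 9, 10))
print(project1)  # {'2020-08-01': 21, '2020-09-01': 8}
```

A dictionary keyed by month start has the same shape as the JSON object built by the query, so the set of months can vary per project without changing the column list.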

Postgres find where dates are NOT overlapping between two tables

I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
step-by-step demo:db<>fiddle
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH clause:
Generate all days of each date range in the first table, then subtract the expanded list of days from the second table. All dates that do not occur in the second table are kept. The infinity end is a little bit tricky, so I replaced the infinity value with the max date of the first table. This avoids expanding an infinite list of dates.
The more interesting part is reaggregating this list into intervals, which is the part outside the WITH clause:
The lag() window function fetches the previous date. If the difference to the previous date is at least two days, the row is flagged with 1, otherwise with 0. (A time change issue came up here, which is why I test for a 2-day difference rather than a 1-day difference: between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time.)
These 0 and 1 values are summed cumulatively. Whenever there is a gap greater than one day, a new interval starts (the days in between are covered).
This results in a groupable column which can be used to aggregate and find the min and max date of each interval.
I tried something with date ranges, which seems to be a better way, especially to avoid expanding long date lists, but I didn't come up with a proper solution. Maybe someone else can?
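As a rough illustration of the range-based idea (no expansion into single days), here is a Python sketch that subtracts the plan ranges from the item ranges for one item. It is not a PostgreSQL solution and not the method used in the query above; missing_ranges is a hypothetical name, ranges are assumed to be inclusive on both ends, and date.max stands in for 'infinity'.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def missing_ranges(needed, covered):
    """Return the parts of the `needed` ranges not covered by any `covered` range."""
    gaps = []
    for start, end in sorted(needed):
        cursor = start  # first day not yet known to be covered
        for c_start, c_end in sorted(covered):
            if c_end < cursor:
                continue  # this coverage ends before the open part begins
            if c_start > end:
                break  # sorted by start, so nothing later can overlap
            if c_start > cursor:
                gaps.append((cursor, c_start - ONE_DAY))
            if c_end >= end:
                cursor = None  # covered through the end of this range
                break
            cursor = c_end + ONE_DAY
        if cursor is not None:
            gaps.append((cursor, end))
    return gaps


items_a = [(date(2019, 1, 1), date(2019, 1, 31)), (date(2019, 2, 1), date(2019, 2, 28))]
plan_a = [(date(2019, 1, 1), date(2019, 1, 10)), (date(2019, 1, 15), date.max)]
items_b = [(date(2019, 1, 1), date(2019, 6, 30))]

print(missing_ranges(items_a, plan_a))  # gap from 2019-01-11 to 2019-01-14
print(missing_ranges(items_b, []))      # the whole range for item b
```

Gaps are reported per item range, so two adjacent item ranges that are both uncovered would come back as two rows; merging those would need an extra pass.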