Dynamically change StartDate on a query depending on ClientId - tsql

I want to return the sum of daily spent since the beginning of the current insertion order (invoice) for a number of clients. Each client unfortunately has a different start date for the current insertion order.
I don't have any problem pulling the start date for each client, but I don't see how to create a sort of lookup against a table that holds each client's start date.
Let's say I have a table IO:
ClientId StartDate
1 2014-10-01
2 2014-10-04
3 2014-09-17
...
And another table with the DailySpend for each Client:
Date Client Spend
2014-10-01 1 2325
2014-10-01 2 195
2014-10-01 3 434
2014-10-02 1 43624
...
Now, for each client, I simply want to check how much we have spent from the start date of the current insertion order until yesterday.

Maybe something like this:
SELECT a.ClientId,
Sum(b.Spend) AS TotalSpend
FROM [IO] a
JOIN DailySpend b
ON a.ClientId = b.Client
and b.[Date] >= a.StartDate
WHERE b.[Date] <= Dateadd(dd, -1, Cast(Getdate() AS DATE))
GROUP BY a.ClientId

select *
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, DailySpend.Date, getdate()) >= 1
select DailySpend.Client, sum(DailySpend.Spend)
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, DailySpend.Date, getdate()) >= 1
group by DailySpend.Client
DATEDIFF(dd, startdate, enddate) counts from its second argument to its third, so datediff(dd, DailySpend.Date, getdate()) >= 1 keeps only days up to and including yesterday.
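The join logic can be checked end to end with a minimal sketch that runs the same query shape against an in-memory SQLite database from Python, using the sample rows above. This is SQLite, not T-SQL: dates are assumed to be stored as ISO-8601 text (so plain comparisons order them correctly), and "today" is passed in as a parameter in place of GETDATE(). The function name spend_since_io_start is hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def spend_since_io_start(conn, today):
    """Sum each client's spend from its insertion-order start date up to yesterday."""
    yesterday = (today - timedelta(days=1)).isoformat()
    return conn.execute(
        """
        SELECT i.ClientId, SUM(ds.Spend) AS TotalSpend
        FROM IO AS i
        JOIN DailySpend AS ds
          ON ds.Client = i.ClientId    -- match spend rows to their client
         AND ds.Date >= i.StartDate    -- only days on or after that client's start date
        WHERE ds.Date <= ?             -- up to and including yesterday
        GROUP BY i.ClientId
        ORDER BY i.ClientId
        """,
        (yesterday,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE IO (ClientId INTEGER, StartDate TEXT);
    CREATE TABLE DailySpend (Date TEXT, Client INTEGER, Spend INTEGER);
    """
)
conn.executemany(
    "INSERT INTO IO VALUES (?, ?)",
    [(1, "2014-10-01"), (2, "2014-10-04"), (3, "2014-09-17")],
)
conn.executemany(
    "INSERT INTO DailySpend VALUES (?, ?, ?)",
    [("2014-10-01", 1, 2325), ("2014-10-01", 2, 195),
     ("2014-10-01", 3, 434), ("2014-10-02", 1, 43624)],
)
print(spend_since_io_start(conn, date(2014, 10, 3)))  # [(1, 45949), (3, 434)]
```

Client 2 does not appear because its only sample spend row predates its start date.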


How can I, in T-SQL, examine date intervals to remove overlapping intervals before adding totals together

I am running an analysis on medication prescribing practices. We want to identify whether someone has been on a class of medications for 60 days out of a 90 day quarter. We have a start and end date for each prescription, and the bounds of the quarter (e.g., 4/1/2022 – 6/30/2022). For each prescription I’ve calculated the number of days between the start and end date (only including days that fall within the bounds of the quarter). There are many instances in which multiple drugs within the same class are prescribed: someone might try one antidepressant but not like it, so be given another in the same class.
My original strategy was just to total up number of days for each class of medication and see if it’s 60 or over. The days don’t have to be consecutive, but if they overlap, days during an overlap period shouldn’t count twice (which they would in a simple sum).
For instance in the data table below, patient 1 in row 1 should be included as they are over 60 days. Patient 2 should also get in (rows 2 and 3) because the non-overlapping total (57+8) within the same med class gets them to over 60 days. However, patient 3 should NOT get in, even though the total of 32 + 32 is over 60 because the intervals overlap. This means that they were really on the medication class for only 32 days – this is an instance where someone might be on two different antidepressants simultaneously.
It’s not sufficient to just sum the days in the interval, but I also have to include some way to examine whether the intervals are overlapping and only add days if an interval for a given medication class falls outside another interval for that same class.
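Row num | Patid | Med class | Start date | End date   | Interval
--------|-------|-----------|------------|------------|---------
1       | 1     | A         | 2022-04-28 | 2022-09-12 | 63
2       | 2     | B         | 2022-05-03 | 2022-06-29 | 57
3       | 2     | B         | 2022-04-21 | 2022-04-29 | 8
4       | 3     | A         | 2022-01-19 | 2022-05-03 | 32
5       | 3     | A         | 2022-01-19 | 2022-05-03 | 32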
I’m having a hard time figuring out how to do this. Note, I'm limited to just using SQL for this.
Code that produced the above data. I would embed this in another query to generate a total interval but need to deal with the overlap issue.
DECLARE @startdt DATE;
DECLARE @enddt DATE;
SET @startdt='4/1/2022'
SET @enddt='6/30/2022'
--for q4 fy2022-23 (4/1/2022-6/30/2022)
SELECT DISTINCT
rx.patid, d.medication_category as medcat, start_date, end_date,
-- case statement to capture days within quarter only
CASE WHEN start_date<@startdt and end_date>@enddt then 90
WHEN start_date<@startdt and end_date>=@startdt then datediff(d,@startdt,end_date)
WHEN start_date>=@startdt and end_date>@enddt then datediff(d,start_date,@enddt)
ELSE datediff(d,start_date,end_date)
END as interval
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
You can accomplish what you want by generating a calendar table (using a Common Table Expression) of individual days within the test range, joining those days with the prescriptions with overlapping days, and then counting distinct days for each patient and medication category combination.
Something like:
DECLARE @startdt DATE = '2022-04-01';
DECLARE @enddt DATE = '2022-06-30';
DECLARE @threshold INT = 60;
WITH Days AS (
SELECT @startdt AS Day
UNION ALL
SELECT DATEADD(day, 1, Day)
FROM Days
WHERE Day < @enddt
)
SELECT
rx.patid, d.medication_category as medcat,
COUNT(DISTINCT DD.Day) AS days_medicated,
MIN(DD.Day) AS start_date,
MAX(DD.Day) AS end_date
FROM rx
INNER JOIN Drug_names_categories d
ON rx.drugname = d.drugname
INNER JOIN Days DD
ON DD.Day BETWEEN rx.start_date AND rx.end_date
WHERE rx.start_date <= @enddt AND @startdt <= rx.end_date
GROUP BY rx.patid, d.medication_category
HAVING COUNT(DISTINCT DD.Day) >= @threshold
ORDER BY rx.patid, start_date;
If using SQL Server 2022 or later, the Days generator can be simplified by using the new GENERATE_SERIES() function:
WITH Days AS (
SELECT DATEADD(day, S.value, @startdt) AS Day
FROM GENERATE_SERIES(0, DATEDIFF(day, @startdt, @enddt)) S
)
See this db<>fiddle for an example with some sample data.
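The calendar-CTE idea can be tried without SQL Server. The sketch below is an SQLite translation run from Python, under some assumptions: WITH RECURSIVE and date(Day, '+1 day') stand in for the T-SQL recursive CTE and DATEADD, and the medication class is stored directly on a simplified rx table instead of being joined from Drug_names_categories. The rows are the five sample prescriptions from the question; the function name patients_over_threshold is hypothetical.

```python
import sqlite3


def patients_over_threshold(conn, start, end, threshold):
    """Distinct covered days per (patient, class) within [start, end], kept if >= threshold."""
    return conn.execute(
        """
        WITH RECURSIVE Days(Day) AS (      -- one row per calendar day in the quarter
            SELECT date(:start)
            UNION ALL
            SELECT date(Day, '+1 day') FROM Days WHERE Day < date(:end)
        )
        SELECT rx.patid, rx.medcat, COUNT(DISTINCT Days.Day) AS days_medicated
        FROM rx
        JOIN Days ON Days.Day BETWEEN rx.start_date AND rx.end_date
        GROUP BY rx.patid, rx.medcat
        HAVING COUNT(DISTINCT Days.Day) >= :threshold
        ORDER BY rx.patid, rx.medcat
        """,
        {"start": start, "end": end, "threshold": threshold},
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Simplified rx table: the medication class is stored directly on each row.
conn.execute("CREATE TABLE rx (patid INTEGER, medcat TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO rx VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2022-04-28", "2022-09-12"),
        (2, "B", "2022-05-03", "2022-06-29"),
        (2, "B", "2022-04-21", "2022-04-29"),
        (3, "A", "2022-01-19", "2022-05-03"),
        (3, "A", "2022-01-19", "2022-05-03"),
    ],
)
print(patients_over_threshold(conn, "2022-04-01", "2022-06-30", 60))
# [(1, 'A', 64), (2, 'B', 67)]
```

Patient 3's two identical prescriptions cover the same 33 days of the quarter, which COUNT(DISTINCT ...) counts once, so that patient is filtered out. The counts include both endpoints, so each prescription contributes one more day than the DATEDIFF-based Interval column in the question.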
I would do this using a date/calendar table, then it's pretty easy.
If you don't already have a date table, this link is one of many that describe how to create one easily ( https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/ )
Here's the script from this link (in case the link dies)
DECLARE @StartDate date = '20100101';
DECLARE @CutoffDate date = DATEADD(DAY, -1, DATEADD(YEAR, 30, @StartDate));
;WITH seq(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, @StartDate, @CutoffDate)
),
d(d) AS
(
SELECT DATEADD(DAY, n, @StartDate) FROM seq
),
src AS
(
SELECT
TheDate = CONVERT(date, d),
TheDay = DATEPART(DAY, d),
TheDayName = DATENAME(WEEKDAY, d),
TheWeek = DATEPART(WEEK, d),
TheISOWeek = DATEPART(ISO_WEEK, d),
TheDayOfWeek = DATEPART(WEEKDAY, d),
TheMonth = DATEPART(MONTH, d),
TheMonthName = DATENAME(MONTH, d),
TheQuarter = DATEPART(Quarter, d),
TheYear = DATEPART(YEAR, d),
TheFirstOfMonth = DATEFROMPARTS(YEAR(d), MONTH(d), 1),
TheLastOfYear = DATEFROMPARTS(YEAR(d), 12, 31),
TheDayOfYear = DATEPART(DAYOFYEAR, d)
FROM d
)
SELECT *
INTO MyDateTable
FROM src
ORDER BY TheDate
OPTION (MAXRECURSION 0);
Now that you have your new date table, you can join to it to get the list of dates that fall within each prescription's start and end dates, something like:
SELECT rx.patid, d.medication_category, COUNT(DISTINCT dt.TheDate) AS days_medicated
FROM rx
INNER JOIN MyDateTable dt ON dt.TheDate BETWEEN rx.start_date AND rx.end_date
INNER JOIN Drug_names_categories d ON rx.drugname=d.drugname
WHERE start_date<'7/1/2022' and end_date>'3/30/2022'
AND dt.TheDate BETWEEN '20220401' AND '20220630'
AND rx.patid IS NOT NULL
AND d.medication_category IS NOT NULL
AND d.medication_category <>''
GROUP BY rx.patid, d.medication_category
Obviously this is a simple example, but you could easily extend it to include all the details you need; the point is that you now have a list of dates (or a distinct list of dates) that you can work with easily.
You could also simplify the date range filter by referencing the TheQuarter and TheYear columns. If this is a common task, consider extending the date table with a compound YearQuarter column (e.g. 2023Q1 or 202301).

How to Shorten Execution Time for A View

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.
user
user_id
---------
1
2
3
4

admin
user_id | admin_id
--------|---------
1       | a
4       | b

cust
user_id | cust_id
--------|--------
2       | dd
3       | ff
Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app
login_history
user_id | login_on
---------|---------------------
1 | 2022-01-01 13:22:43
1 | 2022-01-02 16:16:27
3 | 2022-01-05 21:17:52
2 | 2022-01-11 11:12:26
3 | 2022-01-12 03:34:47
I would like to create a view that would contain all dates for the first day of each week in the year starting from jan 1st, and a count column that contains the count of unique admin users that logged in that week and a count of unique cust users that logged in that week. So the resulting view should contain the following 53 records, one for each week.
login_counts_view
week_start_date | admin_count | cust_count
-----------------|-------------|------------
2022-01-01 | 1 | 1
2022-01-08 | 0 | 2
2022-01-15 | 0 | 0
.
.
.
2022-12-31 | 0 | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
week_start_dates.week_start_date::text AS week_start_date,
count(distinct a.user_id) AS admin_count,
count(distinct c.user_id) AS cust_count
FROM (
SELECT
to_char(i::date, 'YYYY-MM-DD') AS week_start_date
FROM
generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then operate the joins on integers representing the pseudo-weeks instead of date values and comparisons.
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE) + 7 * n.i week_start_date
, count(distinct lw.admin_id) admin_count
, count(distinct lw.cust_id) cust_count
FROM (
SELECT i FROM Numbers
) n
LEFT JOIN (
SELECT admin_id
, cust_id
, base
, pit
, pit-base delta
, (pit-base) / (3600 * 24 * 7) week
FROM (
SELECT a.user_id admin_id
, c.user_id cust_id
, CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
, CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
FROM login_history l
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
) le
) lw
ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1/1/1970 0h00).
Casts are necessary to convert doubles to integers and timestamps to dates, as mandated by the signatures of the PostgreSQL date functions, and to enforce integer arithmetic.
The recursive subquery is a generator of consecutive integers. It could probably be replaced by a generate_series(0, 52) call (untested).
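The pseudo-week bucketing is easy to check outside the database. Below is a Python model of the same arithmetic, not the view itself: each login is mapped to (seconds since Jan 1) // (7 * 24 * 3600), and distinct admin and cust user_ids are collected per bucket. The function name weekly_counts is hypothetical, and timestamps are assumed to be naive and in the same time zone as the start of the year.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SECONDS_PER_WEEK = 7 * 24 * 3600


def weekly_counts(logins, admin_ids, cust_ids, year):
    """Return (week_start, admin_count, cust_count) for the 53 pseudo-weeks of `year`."""
    base = datetime(year, 1, 1)
    admins_by_week = defaultdict(set)  # pseudo-week index -> admin user_ids seen
    custs_by_week = defaultdict(set)   # pseudo-week index -> cust user_ids seen
    for user_id, login_on in logins:
        # Integer division puts each login into a 7-day slice counted from Jan 1.
        week = int((login_on - base).total_seconds()) // SECONDS_PER_WEEK
        if user_id in admin_ids:
            admins_by_week[week].add(user_id)
        if user_id in cust_ids:
            custs_by_week[week].add(user_id)
    # Sets make repeated logins by the same user in one week count once.
    return [
        ((base + timedelta(weeks=i)).date().isoformat(),
         len(admins_by_week[i]),
         len(custs_by_week[i]))
        for i in range(53)
    ]


logins = [
    (1, datetime(2022, 1, 1, 13, 22, 43)),
    (1, datetime(2022, 1, 2, 16, 16, 27)),
    (3, datetime(2022, 1, 5, 21, 17, 52)),
    (2, datetime(2022, 1, 11, 11, 12, 26)),
    (3, datetime(2022, 1, 12, 3, 34, 47)),
]
rows = weekly_counts(logins, admin_ids={1, 4}, cust_ids={2, 3}, year=2022)
print(rows[:3])
# [('2022-01-01', 1, 1), ('2022-01-08', 0, 2), ('2022-01-15', 0, 0)]
```

Comparing one integer per login against a week index is what lets the view avoid the per-row date range comparisons of the original LEFT JOIN.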
Evaluation
See it in action in this db fiddle
The query plan indicates savings of 50-70% in execution time.

Getting maximum sequential streak with events

I’m having trouble getting my head around this.
I’m looking for a single query, if possible, running PostgreSQL 9.6.6 under pgAdmin3 v1.22.1
I have a table with a date and a row for each event on the date:
Date Events
2018-12-10 1
2018-12-10 1
2018-12-10 0
2018-12-09 1
2018-12-08 0
2018-12-07 1
2018-12-06 1
2018-12-06 1
2018-12-06 1
2018-12-05 1
2018-12-04 1
2018-12-03 0
I’m looking for the longest sequence of dates without a break. In this case, 2018-12-08 and 2018-12-03 are the only dates with no events; there are two dates with events between 2018-12-08 and today, and four between 2018-12-03 and 2018-12-08, so I would like the answer of 4.
I know I can group them together with something like:
Select Date, count(Date) from Table group by Date order by Date Desc
To get just the most recent sequence, I’ve got something like this: the subquery returns the most recent date with no events, and the outer query counts the dates after that date:
select count(distinct date) from Table
where date>
( select date from Table
group by date
having count (case when Events is not null then 1 else null end) = 0
order by date desc
fetch first row only)
But now I need the longest streak, not just the most recent streak.
Thank you!
Your instinct is a good one in looking at the rows with zero events and working off them. We can use a subquery with a window function to get the "gaps" between zero event days, and then in a query outside it take the record we want, like so:
select *
from (
select date as day_after_streak
, lag(date) over(order by date asc) as previous_zero_date
, date - lag(date) over(order by date asc) as difference
, date_part('days', date - lag(date) over(order by date asc) ) - 1 as streak_in_days
from dates
group by date
having sum(events) = 0 ) t
where t.streak_in_days is not null
order by t.streak_in_days desc
limit 1
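A different technique, named plainly: the lag-over-zero-days query only measures streaks that end at a zero-event day, so runs before the first or after the last zero day (2018-12-09 to 2018-12-10 here) are not considered. The classic gaps-and-islands technique avoids that: for consecutive days, the day number minus ROW_NUMBER() is constant, so grouping on that difference yields each run. Below is a minimal sketch in SQLite via Python, assuming a hypothetical table daily_events(date, events) holding the question's rows; julianday() stands in for PostgreSQL date arithmetic.

```python
import sqlite3


def longest_streak(conn):
    """Length of the longest run of consecutive days whose summed events are > 0."""
    (streak,) = conn.execute(
        """
        WITH active AS (            -- days with at least one event
            SELECT date FROM daily_events
            GROUP BY date
            HAVING SUM(events) > 0
        ),
        islands AS (                -- consecutive days share day number minus row number
            SELECT julianday(date) - ROW_NUMBER() OVER (ORDER BY date) AS grp
            FROM active
        )
        SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM islands GROUP BY grp) AS runs
        """
    ).fetchone()
    return streak or 0  # MAX over no rows is NULL, i.e. no active days at all


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_events (date TEXT, events INTEGER)")
conn.executemany(
    "INSERT INTO daily_events VALUES (?, ?)",
    [("2018-12-10", 1), ("2018-12-10", 1), ("2018-12-10", 0), ("2018-12-09", 1),
     ("2018-12-08", 0), ("2018-12-07", 1), ("2018-12-06", 1), ("2018-12-06", 1),
     ("2018-12-06", 1), ("2018-12-05", 1), ("2018-12-04", 1), ("2018-12-03", 0)],
)
print(longest_streak(conn))  # 4
```

A calendar day that is missing from the table entirely also breaks a run, because its julianday gap shifts the group key.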

Group events by sequence, defining the minimum period between sequences t-SQL

I have a table of events, called tbl_events that looks something like this:
PersonID Date
1 30/03/2015
1 22/04/2015
1 30/06/2015
2 18/07/2016
2 09/12/2016
2 28/04/2017
3 01/10/2014
3 28/11/2016
3 28/11/2016
3 16/01/2017
4 13/04/2017
4 09/05/2017
I want to be able to group these events up by the start date of each 'sequence', with a sequence being defined as a run of events from the first identified to the last identified for each PersonID. The last event in a sequence is defined as the event where thereafter there are no subsequent events for that PersonID for a year.
I would expect the result to look like this:
PersonID FirstDate Sequence Events
1 30/03/2015 1 3
2 18/07/2016 1 3
3 01/10/2014 1 1
3 28/11/2016 2 3
4 13/04/2017 1 2
I am able to identify the sequences in Excel and pivot the data, but I need to be able to do this in SQL.
Here is the formula I have used in Excel to generate the sequence number (I am populating cell C3, with column A being PersonID and B being Date):
=+IF(A2<>A3,1,IF((B3-B2)<365,C2,C2+1))
I have joined the table back on itself using ROW_NUMBER to get the difference between the Date and the previous event date for that ID, but I'm not really sure where to go from there.
Any help is much appreciated.
My solution is based on the sample data you've provided along with your excel formula.
-- easily consumable sample data
DECLARE @tbl_events TABLE (PersonId int, [date] date)
INSERT @tbl_events VALUES
(1,'20150330'),(1,'20150422'),(1,'20150630'),(2,'20160718'),(2,'20161209'),(2,'20170428'),
(3,'20141001'),(3,'20161128'),(3,'20161128'),(3,'20170116'),(4,'20170413'),(4,'20170509');
-- Solution
WITH groupings AS
(
SELECT
PersonId,
FirstDate = MIN([date]) OVER (PARTITION BY personId ORDER BY [date]),
NextDate = LAG([date],1,[date]) OVER (PARTITION BY personId ORDER BY [date]),
[date],
grouper =
DATEDIFF(DAY, MIN([date]) OVER (PARTITION BY personId ORDER BY [date]), [date]) / 365
FROM @tbl_events
),
Prep AS
(
SELECT
PersonId,
firstDate = IIF(grouper = 0, FirstDate, IIF(FirstDate = NextDate, [date],NextDate))
FROM groupings
)
SELECT
PersonId,
FirstDate,
[Sequence] = ROW_NUMBER() OVER (PARTITION BY personId ORDER BY FirstDate),
[Events] = COUNT(*)
FROM prep
GROUP BY personId, FirstDate;
Results
PersonId FirstDate Sequence Events
----------- ---------- -------------------- -----------
1 2015-03-30 1 3
2 2016-07-18 1 3
3 2014-10-01 1 1
3 2016-11-28 2 3
4 2017-04-13 1 2
First, note that not all years have 365 days; nonetheless, I'm using 365 to emulate your Excel logic, and this would need to be updated to account for leap years. Next, this will only be correct when there are at most two sequences; it would not work when, say, a personId has a date of Jan 1 2015, then Jan 10 2016, then Feb 1 2017. Let us know if you need logic to accommodate the aforementioned scenarios.
Lastly, this solution uses LAG, which requires SQL Server 2012+; if you're working with an earlier version of SQL Server, the query will have to be updated accordingly.
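The Excel formula translates into a "flag and running sum" pattern: flag a row as the start of a new sequence when there is no previous event or the gap to it is 365 days or more, then take a running SUM of the flags per person as the sequence number. This handles any number of sequences per person. It is sketched here in SQLite from Python (SQL Server 2012+ has the same LAG and windowed SUM); julianday() stands in for DATEDIFF, the dates from the question are converted to ISO format, and the function name sequences is hypothetical.

```python
import sqlite3


def sequences(conn, gap_days=365):
    """Number each person's event runs; a gap of gap_days or more starts a new run."""
    return conn.execute(
        """
        WITH flagged AS (
            SELECT PersonId, event_date,
                   -- 1 = first event for the person, or >= gap_days after the previous one
                   CASE WHEN julianday(event_date) - julianday(
                                 LAG(event_date) OVER (PARTITION BY PersonId ORDER BY event_date)
                             ) < :gap
                        THEN 0 ELSE 1 END AS is_start
            FROM tbl_events
        ),
        numbered AS (
            -- Running total of flags = sequence number. The default RANGE frame treats
            -- same-day rows as peers, so duplicate dates land in the same sequence.
            SELECT PersonId, event_date,
                   SUM(is_start) OVER (PARTITION BY PersonId ORDER BY event_date) AS seq_no
            FROM flagged
        )
        SELECT PersonId, MIN(event_date) AS first_date, seq_no, COUNT(*) AS events
        FROM numbered
        GROUP BY PersonId, seq_no
        ORDER BY PersonId, seq_no
        """,
        {"gap": gap_days},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_events (PersonId INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO tbl_events VALUES (?, ?)",
    [(1, "2015-03-30"), (1, "2015-04-22"), (1, "2015-06-30"),
     (2, "2016-07-18"), (2, "2016-12-09"), (2, "2017-04-28"),
     (3, "2014-10-01"), (3, "2016-11-28"), (3, "2016-11-28"), (3, "2017-01-16"),
     (4, "2017-04-13"), (4, "2017-05-09")],
)
for row in sequences(conn):
    print(row)
```

With the sample data this reproduces the expected table from the question, including the two sequences for PersonID 3.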

SQL query to convert date ranges to per day records

Requirements
I have a data table that saves data in date ranges.
Each record is allowed to overlap previous record(s) (each record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and hence can overlap several older records.
Each new overlapping record overrides the settings of the older records it overlaps.
Result set
What I need is per-day data for any date range, with the overlapping rule applied. It should return one record per day with the corresponding data for that particular day.
To convert ranges to days I was thinking of a numbers/dates table and a user-defined function (UDF) to get data for each day in the range, but I wonder whether there's any other (as in better or even faster) way of doing this, since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (obvious rows omitted):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110531';
;With Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
Select Calendar.[Date], MyTable.Starts, MyTable.Ends
From Calendar
Left Join MyTable
On Calendar.[Date] Between MyTable.RangeFrom And MyTable.RangeTo
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110531';
-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
Select 1 As Id
, Cast('20110101' As date) As RangeFrom
, Cast('20110331' As date) As RangeTo
, Cast('07:00' As time) As Starts
, Cast('15:00' As time) As Ends
, CURRENT_TIMESTAMP As CreatedOn
Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
)
, Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
, RankedData As
(
Select C.[Date]
, S.Id
, S.RangeFrom, S.RangeTo, S.Starts, S.Ends
, Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
From Calendar As C
Join SampleData As S
On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data preferring the newer rows that overlap the same date.
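The ranking idea (for each calendar day, keep the covering row with the latest CreatedOn) can be reproduced as a sketch in SQLite from Python. Assumptions: the table name schedule_ranges and its snake_case columns are hypothetical, and created_on is an integer where larger means newer, in place of the CURRENT_TIMESTAMP offsets used above; the function name effective_schedule is also hypothetical.

```python
import sqlite3


def effective_schedule(conn, start, end):
    """One row per day in [start, end] from the newest range that covers that day."""
    return conn.execute(
        """
        WITH RECURSIVE calendar(day) AS (
            SELECT date(:start)
            UNION ALL
            SELECT date(day, '+1 day') FROM calendar WHERE day < date(:end)
        ),
        ranked AS (
            -- Rank every range covering a day; the newest one gets num = 1.
            SELECT c.day, s.id, s.starts, s.ends,
                   ROW_NUMBER() OVER (PARTITION BY c.day ORDER BY s.created_on DESC) AS num
            FROM calendar AS c
            JOIN schedule_ranges AS s ON c.day BETWEEN s.range_from AND s.range_to
        )
        SELECT day, id, starts, ends FROM ranked WHERE num = 1 ORDER BY day
        """,
        {"start": start, "end": end},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule_ranges (id INTEGER, range_from TEXT, range_to TEXT, "
    "starts TEXT, ends TEXT, created_on INTEGER)"
)
conn.executemany(
    "INSERT INTO schedule_ranges VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "2011-01-01", "2011-03-31", "07:00", "15:00", 1),
     (2, "2011-04-01", "2011-05-31", "08:00", "16:00", 2),
     (3, "2011-03-01", "2011-04-30", "06:00", "14:00", 3)],
)
rows = effective_schedule(conn, "2011-01-01", "2011-05-31")
print(rows[0], rows[-1])
# ('2011-01-01', 1, '07:00', '15:00') ('2011-05-31', 2, '08:00', '16:00')
```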
Why do it all in DB when you can do it better in memory
This is the solution (I eventually used) that seemed most reasonable in terms of data transferred, speed and resources.
get actual range definitions from DB to mid tier (smaller amount of data)
generate in memory calendar of a certain date range (faster than in DB)
apply those DB definitions to the calendar in CreatedOn order, so newer definitions overwrite older ones (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have in-memory code that can do the same manipulation faster and more efficiently.
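The three in-memory steps listed above can be sketched in a few lines of Python, assuming the range definitions have already been fetched from the database. Definitions are applied oldest first, so each newer one overwrites the days it overlaps, which implements the override rule from the question. The tuple layout (range_from, range_to, starts, ends, created_on) and the function name per_day are assumptions for illustration.

```python
from datetime import date, timedelta


def per_day(definitions, start, end):
    """Expand overlapping range definitions into one (day, starts, ends) row per day."""
    calendar = {}  # day -> (starts, ends) from the newest definition applied so far
    # Oldest first, so a newer definition overwrites every day it overlaps.
    for range_from, range_to, starts, ends, _created_on in sorted(definitions, key=lambda d: d[4]):
        day = max(range_from, start)  # clip each definition to the requested range
        last = min(range_to, end)
        while day <= last:
            calendar[day] = (starts, ends)
            day += timedelta(days=1)
    return [(day, *calendar[day]) for day in sorted(calendar)]


definitions = [
    # (range_from, range_to, starts, ends, created_on); created_on is a stand-in ordering key
    (date(2011, 1, 1), date(2011, 3, 31), "07:00", "15:00", 1),
    (date(2011, 4, 1), date(2011, 5, 31), "08:00", "16:00", 2),
    (date(2011, 3, 1), date(2011, 4, 30), "06:00", "14:00", 3),
]
rows = per_day(definitions, date(2011, 1, 1), date(2011, 5, 31))
for row in rows[58:61]:  # end of February into the start of March
    print(row)
```

Only the small set of definitions crosses the wire; the day-by-day expansion happens in the mid tier.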