DateDiff Rows Where UserID is a match - tsql

On SQL Server 2012 (T-SQL), I would like to analyse the date difference between the end date and the start date of consecutive contracts for the same UserID, to see whether there is a gap of twelve months or more between them.
In other words: for which ContractID is the start date 12 months or more after the previous end date?
ContractID UserID StartDate EndDate 12m Lapse
1 779 01/01/2000 01/01/2010 False
2 779 01/01/2010 01/01/2015 False
3 779 01/01/2016 NULL True
4 1021 09/03/2008 NULL False
One thing to note: in the real table the UserIDs are not in order; only the ContractIDs are.

Using a CTE and the LAG() window function it's quite easy:
Create sample data:
CREATE TABLE #T
(
ContractID int,
UserID int,
StartDate date,
EndDate date
)
INSERT INTO #T VALUES
(1, 779, '01/01/2000', '01/01/2010'),
(2, 779, '01/01/2010', '01/01/2015'),
(3, 779, '01/01/2016', NULL),
(4, 1021, '09/03/2008', NULL)
The query:
;WITH CTE AS
(
SELECT ContractID,
UserID,
StartDate,
EndDate,
LAG(EndDate) OVER(PARTITION BY UserId ORDER BY StartDate) As PreviousEndDate
FROM #T
)
SELECT ContractID,
UserID,
StartDate,
EndDate,
CASE WHEN DATEDIFF(MONTH, ISNULL(PreviousEndDate, StartDate), StartDate) >= 12 THEN
'True'
ELSE
'False'
END As '12m Lapse'
FROM CTE
Results:
ContractID UserID StartDate EndDate 12m Lapse
----------- ----------- ---------- ---------- ---------
1 779 2000-01-01 2010-01-01 False
2 779 2010-01-01 2015-01-01 False
3 779 2016-01-01 NULL True
4 1021 2008-09-03 NULL False
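As a hedged way to check the logic without a SQL Server instance, here is a minimal sketch that runs the same LAG() query with Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, which added window functions. SQLite has no DATEDIFF, so the month difference is computed from year*12 + month values; this mirrors how DATEDIFF(MONTH, ...) counts month boundaries. The table name T and the alias Lapse12m are illustrative.

```python
import sqlite3

# SQLite stands in for SQL Server here; dates are stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ContractID INT, UserID INT, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, 779, "2000-01-01", "2010-01-01"),
        (2, 779, "2010-01-01", "2015-01-01"),
        (3, 779, "2016-01-01", None),
        (4, 1021, "2008-09-03", None),
    ],
)

QUERY = """
WITH cte AS (
    SELECT ContractID, UserID, StartDate, EndDate,
           LAG(EndDate) OVER (PARTITION BY UserID ORDER BY StartDate) AS PreviousEndDate
    FROM T
)
SELECT ContractID,
       -- month-boundary difference, like DATEDIFF(MONTH, PreviousEndDate, StartDate)
       CASE WHEN PreviousEndDate IS NOT NULL
             AND (CAST(strftime('%Y', StartDate) AS INT) * 12 + CAST(strftime('%m', StartDate) AS INT))
               - (CAST(strftime('%Y', PreviousEndDate) AS INT) * 12 + CAST(strftime('%m', PreviousEndDate) AS INT)) >= 12
            THEN 'True' ELSE 'False' END AS Lapse12m
FROM cte
ORDER BY ContractID
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first contract of each user has no previous end date. That case is treated as 'False', which matches the ISNULL(PreviousEndDate, StartDate) trick in the answer.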

SELECT * FROM [Table] WHERE DATEDIFF(M,StartDate,EndDate) >=12

Starting with SQL Server 2012, there is a function called LAG that will help you get what you need.
The PARTITION BY of the window function makes sure the rows are separated by UserID, and the ORDER BY makes sure they are in ContractID order.
with prevEndDate as
(
select t.contractID
, t.userID
, t.startDate
, t.endDate
, lag(t.endDate,1,NULL) over (partition by t.userID order by t.contractID asc) as prevEndDate
from db_name.dbo.myTable as t
)
select p.contractID
, p.userID
, p.startDate
, p.endDate
, case when datediff(m,p.prevEndDate, p.startDate) >= 12 then 'True' else 'False' end as [12m Lapse]
from prevEndDate as p
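Both LAG() answers test DATEDIFF(MONTH, ...) >= 12. DATEDIFF counts the calendar month boundaries crossed, not the time elapsed. The hypothetical Python function below illustrates that rule (it is not SQL Server code) and shows where the >= 12 test can surprise you. If "twelve months" should mean elapsed time, a comparison such as DATEADD(MONTH, 12, PreviousEndDate) <= StartDate may be closer to the intent.

```python
from datetime import date


def datediff_month(start: date, end: date) -> int:
    """Count calendar-month boundaries crossed from start to end.

    Mirrors the documented rule for T-SQL DATEDIFF(MONTH, start, end):
    the day of the month is ignored entirely.
    """
    return (end.year * 12 + end.month) - (start.year * 12 + start.month)


# One day elapsed, but a month boundary was crossed.
print(datediff_month(date(2015, 1, 31), date(2015, 2, 1)))
# Almost a full year elapsed, yet only 11 boundaries were crossed.
print(datediff_month(date(2015, 1, 1), date(2015, 12, 31)))
# Only 335 days elapsed, yet 12 boundaries were crossed, so ">= 12" is true.
print(datediff_month(date(2015, 1, 31), date(2016, 1, 1)))
```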

Related

BigQuery SQL: Group rows with shared ID that occur within 7 days of each other, and return values from most recent occurrence

I have a table of datestamped events that I need to bundle into 7-day groups, starting with the earliest occurrence of each event_id.
The final output should return each bundle's start and end date and 'value' column of the most recent event from each bundle.
There is no predetermined start date, and the '7-day' windows are arbitrary, not 'week of the year'.
I've tried a ton of examples from other posts, but none quite fit my needs, or they use things I'm not sure how to refactor for BigQuery.
Sample data:
Event_Id Event_Date Value
1 2022-01-01 010203
1 2022-01-02 040506
1 2022-01-03 070809
1 2022-01-20 101112
1 2022-01-23 131415
2 2022-01-02 161718
2 2022-01-08 192021
3 2022-02-12 212223
Expected output:
Event_Id Start_Date End_Date Value
1 2022-01-01 2022-01-03 070809
1 2022-01-20 2022-01-23 131415
2 2022-01-02 2022-01-08 192021
3 2022-02-12 2022-02-12 212223
You might consider the following:
CREATE TEMP FUNCTION cumsumbin(a ARRAY<INT64>) RETURNS INT64
LANGUAGE js AS """
let bin = 0;
a.reduce((c, v) => {
if (c + Number(v) > 6) { bin += 1; return 0; }
else return c += Number(v);
}, 0);
return bin;
""";
WITH sample_data AS (
select 1 event_id, DATE '2022-01-01' event_date, '010203' value union all
select 1 event_id, '2022-01-02' event_date, '040506' value union all
select 1 event_id, '2022-01-03' event_date, '070809' value union all
select 1 event_id, '2022-01-20' event_date, '101112' value union all
select 1 event_id, '2022-01-23' event_date, '131415' value union all
select 2 event_id, '2022-01-02' event_date, '161718' value union all
select 2 event_id, '2022-01-08' event_date, '192021' value union all
select 3 event_id, '2022-02-12' event_date, '212223' value
),
binning AS (
SELECT *, cumsumbin(ARRAY_AGG(diff) OVER w1) bin
FROM (
SELECT *, DATE_DIFF(event_date, LAG(event_date) OVER w0, DAY) AS diff
FROM sample_data
WINDOW w0 AS (PARTITION BY event_id ORDER BY event_date)
) WINDOW w1 AS (PARTITION BY event_id ORDER BY event_date)
)
SELECT event_id,
MIN(event_date) start_date,
ARRAY_AGG(
STRUCT(event_date AS end_date, value) ORDER BY event_date DESC LIMIT 1
)[OFFSET(0)].*
FROM binning GROUP BY event_id, bin;
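To make the binning rule easier to follow, here is a minimal Python port of the answer's cumsumbin UDF, applied to the sample data. It illustrates the same logic and is not BigQuery code. The function names cumsumbin and bundle and the tuple layout are assumptions made for the sketch. A new bin starts when the running total of day gaps since the bin's first event would exceed 6, that is, when an event falls outside the 7-day window opened by that bin's first event.

```python
from datetime import date
from itertools import groupby


def cumsumbin(diffs):
    """Python port of the answer's JS UDF.

    Walks the day gaps in order and bumps the bin number whenever adding the
    next gap would push the running total past 6 days; the total then restarts.
    """
    bin_no, running = 0, 0
    for d in diffs:
        v = d or 0  # the first gap is NULL (no previous row); JS Number(null) is 0
        if running + v > 6:
            bin_no += 1
            running = 0
        else:
            running += v
    return bin_no


def bundle(rows):
    """Return (event_id, start_date, end_date, value of latest event) per bin."""
    out = []
    for event_id, grp in groupby(sorted(rows), key=lambda r: r[0]):
        grp = list(grp)
        # Same as DATE_DIFF(event_date, LAG(event_date) OVER w0, DAY).
        diffs = [None] + [(b[1] - a[1]).days for a, b in zip(grp, grp[1:])]
        # Same as cumsumbin(ARRAY_AGG(diff) OVER w1): the window ends at each row.
        bins = [cumsumbin(diffs[: i + 1]) for i in range(len(grp))]
        for _, members in groupby(zip(bins, grp), key=lambda x: x[0]):
            members = [row for _, row in members]
            out.append((event_id, members[0][1], members[-1][1], members[-1][2]))
    return out


events = [
    (1, date(2022, 1, 1), "010203"),
    (1, date(2022, 1, 2), "040506"),
    (1, date(2022, 1, 3), "070809"),
    (1, date(2022, 1, 20), "101112"),
    (1, date(2022, 1, 23), "131415"),
    (2, date(2022, 1, 2), "161718"),
    (2, date(2022, 1, 8), "192021"),
    (3, date(2022, 2, 12), "212223"),
]

for row in bundle(events):
    print(row)
```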

Order By Date desc evaluates 01-01-1970 higher than everything else

I have the following simplified query
SELECT
id,
to_char(execution_date, 'YYYY-MM-DD') as execution_date
FROM schema.values
ORDER BY execution_date DESC, id DESC
execution_date can be null.
If no value is present in execution_date, it will be set to 1970-01-01 as a default. My problem is that the following table values lead to a result where 1970-01-01 is treated as the newest date.
Table:
id execution_date
1 NULL
2 2020-01-01
3 2022-01-02
4 NULL
Result I would expect:
id execution_date
3 2022-01-02
2 2020-01-01
4 1970-01-01
1 1970-01-01
What I get:
id execution_date
4 1970-01-01
1 1970-01-01
3 2022-01-02
2 2020-01-01
How can I get the correct order and is it possible to easily return an empty varchar if the date is empty?
If your table has NULL values rather than empty values, you can use NULLS LAST:
with t as (select 1 as id, NULL::date as dt
union select
2, '2020-01-01'::date
union select
3, '2020-01-02'::date
union select
4, NULL::date)
select *
from t
order by t.dt desc nulls last, id desc;
It should also work for empty text values:
with t as (select 1 as id, ''::text as dt
union select
2, '2020-01-01'::text
union select
3, '2020-01-02'::text
union select
4, NULL::text)
select *
from t
order by t.dt desc nulls last, id desc
And if you need to show your NULL dates as 1970-01-01, just use COALESCE():
with t as (select 1 as id, NULL::date as dt
union select
2, '2020-01-01'::date
union select
3, '2020-01-02'::date
union select
4, NULL::date)
select coalesce(t.dt, '1970-01-01'::date) as dt
from t
order by t.dt desc nulls last, id desc
Here's the dbfiddle
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=5d1fc31a3cf2d3121092f2446cce87e5
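The answer above relies on how PostgreSQL orders NULLs: it treats NULL as larger than any non-NULL value, so a plain DESC puts NULLs first unless NULLS LAST is added. Here is a minimal, database-free Python illustration of the intended ORDER BY execution_date DESC NULLS LAST, id DESC. It also renders NULL as an empty string, as the question asks. The rows come from the question; the function name is hypothetical.

```python
from datetime import date

# (id, execution_date) rows from the question; None stands for NULL.
rows = [(1, None), (2, date(2020, 1, 1)), (3, date(2022, 1, 2)), (4, None)]


def order_desc_nulls_last(rows):
    """Emulate ORDER BY execution_date DESC NULLS LAST, id DESC."""
    # Python's sort is stable (also with reverse=True), so sort by the
    # secondary key first and by the primary key second.
    by_id = sorted(rows, key=lambda r: r[0], reverse=True)
    dated = sorted((r for r in by_id if r[1] is not None), key=lambda r: r[1], reverse=True)
    undated = [r for r in by_id if r[1] is None]
    return dated + undated


# Render NULL as an empty string, like a CASE ... THEN '' expression would.
result = [(i, d.isoformat() if d is not None else "") for i, d in order_desc_nulls_last(rows)]
print(result)
```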
SELECT
id,
to_char(coalesce( execution_date, '1970-01-01'::date), 'YYYY-MM-DD') as execution_date
FROM values1
ORDER BY execution_date DESC, id DESC;
I couldn't see the forest for the trees...
Here is the simple solution:
SELECT
id,
CASE WHEN execution_date IS NULL THEN ''
ELSE to_char(execution_date, 'YYYY-MM-DD') END
AS execution_date
FROM schema.values
ORDER BY execution_date DESC, id DESC

Repeating value of previous row in a join

I have one table containing accounts and their balances. I would like to report the balance for each day, and for days with no entry, report the balance of the last day that has one.
Table accounts:
AccountName Date Balance
thomas 2008-10-09 1000
thomas 2008-10-20 5000
david 2008-02-18 2000
david 2008-03-10 200000
Let's say we want the report for 2008-10; I need to get something like this:
thomas 2008-10-01 0
...
thomas 2008-10-09 1000
thomas 2008-10-10 1000
...
thomas 2008-10-20 5000
...
thomas 2008-10-31 5000
I went this far:
DECLARE @StartDate datetime = '2008/10/9';
DECLARE @EndDate datetime = '2008/10/20';
WITH theDates AS
(
SELECT @StartDate as theDate
UNION ALL
SELECT DATEADD(day, 1, theDate)
FROM theDates
WHERE DATEADD(day, 1, theDate) <= @EndDate
)
select * from accounts a
right outer join thedates d on a.date=d.theDate
order by thedate
Results:
AccountNo Date Balance theDate
----------- ---------- -------- ----------
thomas 2008-10-09 1000 2008-10-09
NULL NULL NULL 2008-10-10
NULL NULL NULL 2008-10-11
NULL NULL NULL 2008-10-12
NULL NULL NULL 2008-10-13
NULL NULL NULL 2008-10-14
NULL NULL NULL 2008-10-15
NULL NULL NULL 2008-10-16
NULL NULL NULL 2008-10-17
NULL NULL NULL 2008-10-18
NULL NULL NULL 2008-10-19
thomas 2008-10-20 5000 2008-10-20
Any idea?
Update:
I ended up using a cursor. This version works perfectly, including the situation where an account has no entry.
DECLARE @Date datetime
declare @result table (accountname nvarchar(50), balance int, date datetime)
DECLARE @StartDate datetime = '2008/10/1';
DECLARE @EndDate datetime = '2008/10/29';
declare cur cursor for
WITH theDates AS
(
SELECT @StartDate as theDate
UNION ALL
SELECT DATEADD(day, 1, theDate)
FROM theDates
WHERE DATEADD(day, 1, theDate) <= @EndDate
)
select * from theDates
open cur
fetch next from cur into @Date
while @@FETCH_STATUS=0
begin
insert into @result
select b.accountName, isnull(balance,
(select isnull((select top 1 balance from accounts where date<@Date and accountName=b.accountName order by date desc),0))
), @Date from
(select * from accounts where date = @Date) a
right outer join (select distinct(accountname) from accounts ) b on a.accountname = b.accountname
fetch next from cur into @Date
end
close cur
deallocate cur
select * from @result
Try this:
DECLARE @StartDate datetime = '2008/10/9';
DECLARE @EndDate datetime = '2008/10/20';
WITH theDates AS
(
SELECT @StartDate as theDate
UNION ALL
SELECT DATEADD(day, 1, theDate)
FROM theDates
WHERE DATEADD(day, 1, theDate) <= @EndDate
),
acc AS(
SELECT a.AccountName,
a.Balance,
a.Date,
isnull(c.CloseDate, cast(GETDATE()as date)) as CloseDate
FROM accounts a
CROSS APPLY(SELECT MIN(b.Date) as CloseDate
FROM accounts b
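WHERE b.AccountName = a.AccountName -- next entry of the same account only
AND b.Date > a.Date) c
)
SELECT a.AccountName, a.Balance, a.Date, d.theDate
FROM acc a, theDates d
WHERE a.Date <= d.theDate
AND a.CloseDate > d.theDate
option (maxrecursion 0)
Results (only the thomas rows are shown):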
AccountName Balance Date theDate
----------- ----------- ------------------- -----------------------
thomas 1000 2008-10-09 00:00:00 2008-10-09 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-10 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-11 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-12 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-13 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-14 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-15 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-16 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-17 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-18 00:00:00.000
thomas 1000 2008-10-09 00:00:00 2008-10-19 00:00:00.000
thomas 5000 2008-10-20 00:00:00 2008-10-20 00:00:00.000
You can use the aggregate functions MIN and MAX to build a per-account calendar table, then OUTER JOIN it to the accounts:
WITH theDates AS
(
SELECT AccountName, MIN(Date) as StartDt,MAX(Date) EndDt
FROM accounts
GROUP BY AccountName
UNION ALL
SELECT AccountName,DATEADD(day, 1, StartDt),EndDt
FROM theDates
WHERE DATEADD(day, 1, StartDt) <= EndDt
)
select d.AccountName,
d.StartDt [date],
ISNULL(a.Balance,0) Balance
from thedates d
LEFT join accounts a on a.AccountName = d.AccountName and a.date=d.StartDt
order by d.AccountName, d.StartDt
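For comparison with the cursor and the calendar joins above, here is a hedged, set-based sketch of carrying the balance forward. It builds the calendar, cross joins it with the distinct account names, and for each day picks the latest balance on or before that day, defaulting to 0. It runs on Python's sqlite3 (WITH RECURSIVE needs SQLite 3.8.3+) only so that it can be executed. In T-SQL, the correlated subquery would typically become an OUTER APPLY (SELECT TOP 1 ... ORDER BY Date DESC). The CTE name names is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (AccountName TEXT, Date TEXT, Balance INT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("thomas", "2008-10-09", 1000),
        ("thomas", "2008-10-20", 5000),
        ("david", "2008-02-18", 2000),
        ("david", "2008-03-10", 200000),
    ],
)

QUERY = """
WITH RECURSIVE theDates(theDate) AS (
    SELECT :start
    UNION ALL
    SELECT date(theDate, '+1 day') FROM theDates WHERE theDate < :end
),
names AS (SELECT DISTINCT AccountName FROM accounts)
SELECT n.AccountName,
       d.theDate,
       -- latest balance on or before this day, 0 if the account had none yet
       COALESCE((SELECT a.Balance
                 FROM accounts a
                 WHERE a.AccountName = n.AccountName
                   AND a.Date <= d.theDate
                 ORDER BY a.Date DESC
                 LIMIT 1), 0) AS Balance
FROM names n CROSS JOIN theDates d
ORDER BY n.AccountName, d.theDate
"""
rows = conn.execute(QUERY, {"start": "2008-10-01", "end": "2008-10-31"}).fetchall()
for row in rows:
    if row[0] == "thomas":
        print(row)
```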

t-sql select max value between two columns, or col one when col two is null

This is not easy for me to describe in the title (please forgive me), but here is my problem:
Suppose you have the following table:
CREATE TABLE Subscriptions (product char(3), start_date datetime, end_date datetime);
INSERT INTO Subscriptions
VALUES('ABC', '2015-01-28 00:00:00', '2016-02-15 00:00:00'),
('ABC', '2016-02-04 12:08:00', NULL),
('DEF', '2013-04-15 00:00:00', '2013-06-10 00:00:00'),
('GHI', '2013-01-11 00:00:00', '2013-04-08 00:00:00');
Now I want to find out for how long a subscription has been either active or passive. I thus need to select the newest end_dates grouped by product, BUT if end_date is null, then I want start_date.
So - I have:
product start_date end_date
ABC 28-01-2015 00:00 15-02-2016 00:00
ABC 04-02-2016 12:08 NULL
DEF 15-04-2013 00:00 10-06-2013 00:00
GHI 11-01-2013 00:00 08-04-2013 00:00
What I want to find in my query:
product relevant_date
ABC 04-02-2016 12:08
DEF 10-06-2013 00:00
GHI 08-04-2013 00:00
I have tried using a UNION, and that seems to work, but it is very slow. Is there a more efficient way to solve this? (I am using MS SQL Server 2012.)
SELECT [product]
,MAX([start_date]) AS start_date
,NULL AS [end_date]
,MAX([start_date]) AS relevant_date
FROM Subscriptions
where end_date IS NULL
GROUP BY product
UNION
SELECT [product]
,NULL
,MAX([end_date])
,MAX([end_date])
FROM Subscriptions
where end_date IS not NULL and product not in (SELECT product FROM Subscriptions
where end_date IS NULL)
GROUP BY product
(If you have a suggestion for another title for my question, I am also all ears!)
For version 2012 or higher you can use a combination of distinct, first_value and isnull, like this:
SELECT DISTINCT
product,
FIRST_VALUE(ISNULL(end_date,start_date))
OVER(PARTITION BY product
ORDER BY ISNULL(end_date, '9999-12-31') DESC) AS EndDate
FROM Subscriptions
Results:
product EndDate
ABC 04.02.2016 12:08:00
DEF 10.06.2013 00:00:00
GHI 08.04.2013 00:00:00
For versions before 2012 (ROW_NUMBER() is available from SQL Server 2005 onward), you can use a CTE with ROW_NUMBER() to get the same effect:
;WITH CTE AS
(
SELECT product,
ISNULL(end_date,start_date) As relevant_date,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY ISNULL(end_date, '9999-12-31') DESC) As rn
FROM Subscriptions
)
SELECT product,
relevant_date
FROM CTE
WHERE rn = 1
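Here is a minimal sketch of the ROW_NUMBER() variant, run on Python's sqlite3 (SQLite 3.25+ for window functions) to confirm it returns the expected rows for the question's sample data. COALESCE stands in for ISNULL, and the datetimes are stored as ISO-8601 text, which is an adaptation for SQLite. The '9999-12-31' sentinel sorts the open subscription (NULL end_date) first within each product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no dedicated datetime storage class; ISO-8601 text sorts correctly.
conn.execute("CREATE TABLE Subscriptions (product TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO Subscriptions VALUES (?, ?, ?)",
    [
        ("ABC", "2015-01-28 00:00:00", "2016-02-15 00:00:00"),
        ("ABC", "2016-02-04 12:08:00", None),
        ("DEF", "2013-04-15 00:00:00", "2013-06-10 00:00:00"),
        ("GHI", "2013-01-11 00:00:00", "2013-04-08 00:00:00"),
    ],
)

rows = conn.execute("""
    WITH cte AS (
        SELECT product,
               COALESCE(end_date, start_date) AS relevant_date,
               ROW_NUMBER() OVER (PARTITION BY product
                                  ORDER BY COALESCE(end_date, '9999-12-31') DESC) AS rn
        FROM Subscriptions
    )
    SELECT product, relevant_date FROM cte WHERE rn = 1 ORDER BY product
""").fetchall()
for row in rows:
    print(row)
```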
See a live demo on rextester.
If the start_date shown on the second ABC row is incorrect (i.e. the real value is later than the first row's end_date), this simpler query should work:
SELECT S.product
, relevant_date = MAX(ISNULL(S.end_date,S.start_date))
FROM dbo.Subscriptions S
GROUP BY S.product
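The condition in that answer matters. On the sample data as posted, MAX(ISNULL(end_date, start_date)) picks ABC's 2016-02-15 end_date, because the open subscription's start_date (2016-02-04 12:08) is earlier than that end_date. The sketch below shows this with Python's sqlite3, where COALESCE stands in for ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Subscriptions (product TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO Subscriptions VALUES (?, ?, ?)",
    [
        ("ABC", "2015-01-28 00:00:00", "2016-02-15 00:00:00"),
        ("ABC", "2016-02-04 12:08:00", None),
        ("DEF", "2013-04-15 00:00:00", "2013-06-10 00:00:00"),
        ("GHI", "2013-01-11 00:00:00", "2013-04-08 00:00:00"),
    ],
)

# Same shape as the answer's query: the latest of end_date (or start_date if open).
rows = conn.execute("""
    SELECT product, MAX(COALESCE(end_date, start_date)) AS relevant_date
    FROM Subscriptions
    GROUP BY product
    ORDER BY product
""").fetchall()
for row in rows:
    print(row)
```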
This should do it:
select s1.product, MAX(case when useStartDate=1 then s1.start_date else s1.end_date end) 'SubscriptionDate'
from Subscriptions s1
join (select s2s1.product, max(case when s2s1.end_date is null then 1 else 0 end) 'useStartDate' from Subscriptions s2s1 group by s2s1.product) s2 on s1.product=s2.product
group by s1.product

Derive EndDate of Current Row From StartDate of Next Row

Can someone please help me derive the end date from the start date?
Products are referred to a company for testing. While a product is with the company, they carry out multiple tests on different dates and record each test date to establish the product's condition (i.e. the OutcomeID).
I need to establish the StartDate, which is the TestDate, and the EndDate, which is the start date of the next row. But if multiple consecutive tests resulted in the same OutcomeID, I need to return only one row, with the StartDate of the first test and the EndDate of the last test. In other words, one row per run of consecutive tests in which the OutcomeID did not change.
Here is my data set:
CREATE TABLE #ProductTests
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you can see from the data set, the first three tests (22, 42 and 45) all resulted in OutcomeID 10, so in my result I only need the start date of test 22 and the end date of test 45, which is the start date of test 325. As you can also see, in test 636 the OutcomeID has gone back to 10 from 15, so it needs to be returned too.
This is what I have managed to achieve so far, using the following script:
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
Nov 7 and still not answered, so here is my solution.
It's not so pretty, but it works.
My hint: read about windowing, ranking and aggregate functions like ROW_NUMBER, RANK, AVG, SUM etc.
Those are essential when you want to write reports, and have become quite powerful in SQL Server 2012.
I have also used a CTE (common table expression), but it can be written as a subquery or a temporary table.
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate
Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
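To see why the double ranking groups the rows correctly, here is a minimal sketch that prints the intermediate grp value for each test. It is run on Python's sqlite3 (SQLite 3.25+), with the table named ProductTests instead of #ProductTests as an adaptation for SQLite. Within a run of consecutive tests with the same OutcomeID, both row numbers advance together, so grp stays constant. It is only unique in combination with OutcomeID: tests 325 (OutcomeID 13) and 636 (OutcomeID 10) both get 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductTests (RequestID INT, ProductID INT, TestID INT, TestDate TEXT, OutcomeID INT)"
)
conn.executemany(
    "INSERT INTO ProductTests VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 22, "2005-01-21", 10),
        (1, 2, 42, "2007-03-17", 10),
        (1, 2, 45, "2010-12-25", 10),
        (1, 2, 325, "2011-01-14", 13),
        (1, 2, 895, "2011-08-10", 15),
        (1, 2, 111, "2011-12-23", 15),
        (1, 2, 636, "2012-05-02", 10),
        (1, 2, 554, "2012-11-08", 17),
    ],
)

# grp = overall position minus position within the same OutcomeID.
rows = conn.execute("""
    SELECT TestID, OutcomeID,
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
         - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate) AS grp
    FROM ProductTests
    ORDER BY TestDate
""").fetchall()
for row in rows:
    print(row)
```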
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID,
rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT
g1.RequestID,
g1.ProductID,
g1.StartDate,
g2.StartDate AS EndDate,
g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
ON g1.RequestID = g2.RequestID
AND g1.ProductID = g2.ProductID
AND g1.rnk = g2.rnk - 1
;
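Since the question targets SQL Server 2012, LEAD() (available from that version) could replace the rnk self-join in the final step. Here is a hedged sketch of that variant, run on Python's sqlite3 (SQLite 3.25+) so its output can be checked against the expected result in the question. The table is named ProductTests without the # prefix as an adaptation for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductTests (RequestID INT, ProductID INT, TestID INT, TestDate TEXT, OutcomeID INT)"
)
conn.executemany(
    "INSERT INTO ProductTests VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 22, "2005-01-21", 10),
        (1, 2, 42, "2007-03-17", 10),
        (1, 2, 45, "2010-12-25", 10),
        (1, 2, 325, "2011-01-14", 13),
        (1, 2, 895, "2011-08-10", 15),
        (1, 2, 111, "2011-12-23", 15),
        (1, 2, 636, "2012-05-02", 10),
        (1, 2, 554, "2012-11-08", 17),
    ],
)

rows = conn.execute("""
    WITH partitioned AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
             - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate) AS grp
        FROM ProductTests
    ),
    grouped AS (
        SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
        FROM partitioned
        GROUP BY RequestID, ProductID, OutcomeID, grp
    )
    -- the next group's StartDate becomes this group's EndDate (NULL for the last one)
    SELECT RequestID, ProductID, StartDate,
           LEAD(StartDate) OVER (PARTITION BY RequestID, ProductID ORDER BY StartDate) AS EndDate,
           OutcomeID
    FROM grouped
    ORDER BY RequestID, ProductID, StartDate
""").fetchall()
for row in rows:
    print(row)
```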
You can try this query at SQL Fiddle to verify that it returns the output you are after.