I need to split date ranges that overlap. I have a primary table (I've called it Employment for this example), and I need to return all Begin-End date ranges for a person from this table. I also have multiple sub tables (represented by Car and Food), and I want to return the value that was active in the sub tables during the times given in the main tables. This will involve splitting the main table date ranges when a sub table item changes.
I don't want to return sub table information for dates not in the main tables.
DECLARE #Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
For the above data, the results should be:
Person_ID Employment Food Car Begin_Date End_Date
123 ACME 1986-01-01 1986-04-30
123 ACME Red Car 1986-05-01 1990-12-31
123 Office Corp Red Car 1995-05-15 1996-12-31
123 Office Corp Eggs Red Car 1997-01-01 1997-03-09
123 Office Corp Red Car 1997-03-10 1997-06-23
123 Office Corp 1997-06-24 1997-07-02
123 Office Corp Blue Car 1997-07-03 1998-10-03
123 Job 3 Blue Car 1998-10-04 2001-02-22
123 Job 3 Donuts Blue Car 2001-02-23 2001-02-25
123 Job 3 Blue Car 2001-02-26 2999-12-31
The first row is his time working for ACME, where he didn't have a car or a weird food obsession. In the second row, he purchased a car, and still worked at ACME. In the third row, he changed jobs to Office Corp, but still has the Red Car. Note how we're not returning data during his unemployment gap, even though he had the Red Car. We only want to know what was in the Car and Food tables during the times there are values in the Employment table.
I found a solution for SQL Server 2012 that uses the LEAD/LAG functions to accomplish this, but I'm stuck with 2008 R2.
To change the 2012 solution from that blog to work with 2008, you need to replace the LEAD in the following
with
ValidDates as …
,
ValidDateRanges1 as
(
select EmployeeNo, Date as ValidFrom, lead(Date,1) over (partition by EmployeeNo order by Date) ValidTo
from ValidDates
)
There are a number of ways to do this, but one example is a self join to the same table + 1 row (which is effectively what a lead does). One way to do this is to put a rownumber on the previous table (so it is easy to find the next row) by adding another intermediate CTE (eg ValidDatesWithRowno). Then do a left outer join to that table where EmployeeNo is the same and rowno = rowno + 1, and use that value to replace the lead. If you wanted a lead 2, you would join to rowno + 2, etc. So the 2008 version would look something like
with
ValidDates as …
,
ValidDatesWithRowno as --This is the ValidDates + a RowNo for easy self joining below
(
select EmployeeNo, Date, ROW_NUMBER() OVER (ORDER BY EmployeeNo, Date) as RowNo from ValidDates
)
,
ValidDateRanges1 as
(
select VD.EmployeeNo, VD.Date as ValidFrom, VDLead1.Date as ValidTo
from ValidDatesWithRowno VD
left outer join ValidDatesWithRowno VDLead1 on VDLead1.EmployeeNo = VD.EmployeeNo
and VDLead1.RowNo = VD.RowNo + 1
)
The rest of the solution described looks like it will work like you want on 2008.
Here is the answer I came up with. It works, but it's not very pretty.
It goes it two waves, first splitting any overlapping Employment/Car dates, then running the same SQL a second time add the Food dates and split any overlaps again.
DECLARE #Employment TABLE
( Person_ID INT, Employment VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Car TABLE
( Person_ID INT, Car VARCHAR(50), Begin_Date DATE, End_Date DATE )
DECLARE #Food TABLE
( Person_ID INT, Food VARCHAR(50), Begin_Date DATE, End_Date DATE )
INSERT INTO #Employment ( [Person_ID], [Employment], [Begin_Date], [End_Date] )
VALUES ( 123 , 'ACME' , '1986-01-01' , '1990-12-31' )
, ( 123 , 'Office Corp' , '1995-05-15' , '1998-10-03' )
, ( 123 , 'Job 3' , '1998-10-04' , '2999-12-31' )
INSERT INTO #Car ( [Person_ID] , [Car] , [Begin_Date] , [End_Date] )
VALUES ( 123, 'Red Car', '1986-05-01', '1997-06-23' )
, ( 123, 'Blue Car', '1997-07-03', '2999-12-31' )
INSERT INTO #Food ( [Person_ID], [Food], [Begin_Date], [End_Date] )
VALUES ( 123, 'Eggs', '1997-01-01', '1997-03-09' )
, ( 123, 'Donuts', '2001-02-23', '2001-02-25' )
DECLARE #Person_ID INT = 123;
--A table to hold date ranges that need to be merged together
DECLARE #DatesToMerge TABLE
(
ID INT,
Person_ID INT,
Date_Type VARCHAR(10),
Begin_Date DATETIME,
End_Date DATETIME
)
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Car])
, Person_ID
, 'Car'
, Begin_Date
, End_Date
FROM #Car
WHERE Person_ID = #Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Employment])
, Person_ID
, 'Employment'
, Begin_Date
, End_Date
FROM #Employment
WHERE Person_ID = #Person_ID;
--A table to hold the merged #Employment and Car records
DECLARE #EmploymentAndCar TABLE
(
RowNumber INT,
Person_ID INT,
Begin_Date DATETIME,
End_Date DATETIME
)
;
WITH CarCTE AS
(--This CTE grabs just the Car rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Car'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Car dates for each #Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Car and #Employment end on the same date, there is split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Car'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
INSERT INTO #EmploymentAndCar ( RowNumber, Person_ID, Begin_Date, End_Date )
SELECT F.RowNumber
, F.Person_ID
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = #Person_ID
ORDER BY F.Begin_Date
--------------------------------------------------------------------------------------------------
--Now that the Employment and Car dates have been merged, empty the DatesToMerge table
DELETE FROM #DatesToMerge;
--Reload the DatesToMerge table with the newly-merged Employment and Car records,
--and the Food records that still need to be merged
INSERT INTO #DatesToMerge
SELECT RowNumber
, Person_ID
, 'PtBCar'
, Begin_Date
, End_Date
FROM #EmploymentAndCar
WHERE Person_ID = #Person_ID
INSERT INTO #DatesToMerge
SELECT ROW_NUMBER() OVER(ORDER BY [Food])
, Person_ID
, 'Food'
, Begin_Date
, End_Date
FROM #Food
WHERE Person_ID = #Person_ID
;
WITH CarCTE AS
(--This CTE grabs just the Food rows so we can compare and split dates from them
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
WHERE Date_Type = 'Food'
),
NewRowsCTE AS
( --This CTE creates just new rows starting after the Food dates for each Employment date range
SELECT a.ID,
a.Person_ID,
a.Date_Type,
DATEADD(DAY,1,b.End_Date) AS Begin_Date,
a.End_Date
FROM #DatesToMerge a
INNER JOIN CarCTE b
ON a.Begin_Date <= b.Begin_Date
AND a.End_Date > b.Begin_Date
AND a.End_Date > b.End_Date -- This is needed because if both the Food and Car/Employment end on the same date, there is split row after
),
UnionCTE AS
( -- This CTE merges the new rows with the existing ones
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM #DatesToMerge
UNION ALL
SELECT ID,
Person_ID,
Date_Type,
Begin_Date,
End_Date
FROM NewRowsCTE
),
FixEndDateCTE AS
(
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date) AS FixID,
MIN(d.Begin_Date) AS Begin_Date
FROM UnionCTE c
LEFT OUTER JOIN CarCTE d
ON c.Begin_Date < d.Begin_Date
AND c.End_Date >= d.Begin_Date
WHERE c.Date_Type <> 'Food'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.Begin_Date)
),
Finalize AS
(
SELECT ROW_NUMBER() OVER (ORDER BY e.Begin_Date) AS RowNumber,
e.Person_ID,
e.Begin_Date,
CASE WHEN f.Begin_Date IS NULL THEN e.End_Date
ELSE DATEADD (DAY,-1,f.Begin_Date)
END AS EndDate
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.Begin_Date)) = f.FixID
)
SELECT DISTINCT
F.Person_ID
, Employment
, Car
, Food
, F.Begin_Date
, F.EndDate
FROM Finalize F
INNER JOIN #Employment Employment
ON F.Begin_Date BETWEEN Employment.Begin_Date AND Employment.End_Date AND Employment.Person_ID = #Person_ID
LEFT JOIN #Car Car
ON Car.[Begin_Date] <= F.Begin_Date
AND Car.[End_Date] >= F.[EndDate]
AND Car.Person_ID = #Person_ID
LEFT JOIN #Food Food
ON Food.[Begin_Date] <= F.[Begin_Date]
AND Food.[End_Date] >= F.[EndDate]
AND Food.Person_ID = #Person_ID
ORDER BY F.Begin_Date
If anyone has a more elegant solution, I will be happy to accept their answer.
I am getting an error on this create function code in Postgresql. The error says it is happening around Line 2 at DELETE, but it happens at WITH if I remove that line so I think it is a problem with the format of my Creat Function
create or replace function retention_data(shopId integer) returns void as $$
delete from retention where shop_id = shopId;
WITH ret_grid_step1 as (
select * from (
SELECT
order_id as order_name,
cust_name as cust_name,
email as email,
date(order_date) as created_at,
count(*) as num_items_in_order,
sum(total_price) as sales ,
rank() over (partition BY order_id ORDER BY cust_name ASC) as rnk_shipping_name,
rank() over (partition BY order_id ORDER BY email ASC) as rnk_email
FROM orders
WHERE shop_id = shopId
and order_date is not null and order_date > now()::date - 365 and order_date < now()::date + 1
group by 1,2,3,4
) x
where rnk_shipping_name = 1 and rnk_email = 1
)
insert into retention(shop_id, cust_name, email, last_purchase_dt, total_sales, num_orders, days_since_last_order)
select
shopId as shop_id,
coalesce(b.cust_name,'null') as cust_name,
a.email,
a.last_purchase_dt,
total_sales,
num_orders,
current_date - last_purchase_dt as days_since_last_order
from (
select
email,
max(created_at) as last_purchase_dt,
count(*) as num_orders,
sum(sales) as total_sales
from ret_grid_step1
group by 1
) as a
left join (
select
email,
cust_name,
rank() over (partition BY email ORDER BY created_at DESC) as rnk
from ret_grid_step1
--where cust_name is not null
group by 1,2,created_at
) as b
on a.email = b.email
where b.rnk = 1
and a.email <> '';
$$ language plpgsql;
I need to write a T-SQL group by query for a table with multiple dates and seq columns:
DROP TABLE #temp
CREATE TABLE #temp(
id char(1),
dt DateTime,
seq int)
Insert into #temp values('A','2015-03-31 10:00:00',1)
Insert into #temp values('A','2015-08-31 10:00:00',2)
Insert into #temp values('A','2015-03-31 10:00:00',5)
Insert into #temp values('B','2015-09-01 10:00:00',1)
Insert into #temp values('B','2015-09-01 10:00:00',2)
I want the results to contains only the items A,B with their latest date and the corresponding seq number, like:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 2
B 2015-09-01 10:00:00.000 2
I am trying with (the obviously wrong!):
select id, max(dt) as MaxDate, max(seq) as CorrespondentSeq
from #temp
group by id
which returns:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 5 <-- 5 is wrong
B 2015-09-01 10:00:00.000 2
How can I achieve that?
EDIT
The dt datetime column has duplicated values (exactly same date!)
I am using SQL Server 2005
You can use a ranking subselect to get only the highest ranked entries for an id:
select id, dt, seq
from (
select id, dt, seq, rank() over (partition by id order by dt desc, seq desc) as r
from #temp
) ranked
where r=1;
SELECT ID, DT, SEQ
FROM (
SELECT ID, DT, SEQ, Row_Number()
OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS row_number
FROM temp
) cte
WHERE row_number = 1;
Demo : http://www.sqlfiddle.com/#!3/3e3d5/5
With trial and errors maybe I have found a solution, but I'm not completely sure this is correct:
select A.id, B.dt, max(B.seq)
from (select id, max(dt) as maxDt
from #temp
group by id) as A
inner join #temp as B on A.id = B.id AND A.maxDt = B.dt
group by A.id, B.dt
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
If there are duplicate rows, then you also need to specify what the query processor should use to determine which of the duplicates to return. Say you want the lowest value of seq,
Then you could write:
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
and seq = (Select Min(Seq) from #temp
where id = t.Id
and dt = t.dt)