SUM OVER PARTITION to calculate running total - tsql

I am trying to modify my query to include a running total for each county in my report. Below is my working query with an attempt to use SUM OVER PARTITION commented out:
SELECT DATEPART(MONTH, r.received_date) AS [MonthID] ,
DATENAME(MONTH, r.received_date) AS [Month] ,
o.name AS [CountyName] ,
rsc.description AS [Filing] ,
COUNT(r.id) AS [Request_Total] ,
CAST (AVG(CAST (DATEDIFF(HOUR, received_date, completion_date) AS DECIMAL(8,2))) / 24 AS DECIMAL(8,2)) AS [Total_Time_Days]
--SUM(r.id) OVER (PARTITION BY o.name) AS [TotalFilings]
FROM dbo.requests AS [r]
INNER JOIN dbo.organizations AS [o] ON o.id = r.submitted_to_organiztion_id
INNER JOIN dbo.request_status_codes AS [rsc] ON rsc.code = r.request_status_code
WHERE r.submitted_to_organiztion_id < 68
AND r.request_type_code = 1
AND CAST(r.received_date AS DATE) >= '01/01/2016'
AND CAST(r.received_date AS DATE) <= '06/30/2016'
AND o.name = 'Alachua'
GROUP BY DATENAME(MONTH, r.received_date) ,
DATEPART(MONTH, r.received_date) ,
o.name ,
rsc.description
ORDER BY DATEPART(MONTH, r.received_date) ,
CountyName ,
Filing;
And the results look correct:
Perhaps I am misusing SUM ... OVER (PARTITION BY ...), but my end goal is to add an additional column that sums the filing types for each county by month.
For example, the additional column for the month of January should be 13,654 while February should be 14,238 and so on.
Could I get some advice on how to get this query working correctly? Thanks,

I'm not sure this is the best or most efficient way, but I was able to get the results I wanted with a subquery. I believe a CTE or a window function would be better, but I haven't been able to get either to work. Here is my query:
SELECT X.[MonthID] ,
X.[Month] ,
X.[CountyName] ,
X.[Filing] ,
X.[Avg_Time_Days] ,
SUM(X.Request_Total) AS [Total_Requests]
FROM ( SELECT DATEPART(MONTH, r.received_date) AS [MonthID] ,
DATENAME(MONTH, r.received_date) AS [Month] ,
o.name AS [CountyName] ,
rsc.description AS [Filing] ,
COUNT(r.id) AS [Request_Total] ,
CAST (AVG(CAST (DATEDIFF(HOUR, received_date,
completion_date) AS DECIMAL(8, 2)))
/ 24 AS DECIMAL(8, 2)) AS [Avg_Time_Days]
--, SUM(r.id) OVER (PARTITION BY o.name, rsc.description) AS [TotalFilings]
FROM dbo.requests AS [r]
INNER JOIN dbo.organizations AS [o] ON o.id = r.submitted_to_organiztion_id
INNER JOIN dbo.request_status_codes AS [rsc] ON rsc.code = r.request_status_code
WHERE r.submitted_to_organiztion_id < 68
AND r.request_type_code = 1
AND CAST(r.received_date AS DATE) >= '01/01/2016'
AND CAST(r.received_date AS DATE) <= '06/30/2016'
--AND o.name = 'Alachua'
GROUP BY DATENAME(MONTH, r.received_date) ,
DATEPART(MONTH, r.received_date) ,
o.name ,
rsc.description
--, r.id
--ORDER BY DATEPART(MONTH, r.received_date) ,
-- CountyName ,
-- Filing
) AS X
GROUP BY X.[MonthID] ,
X.[Month] ,
X.[CountyName] ,
X.[Filing] ,
X.[Avg_Time_Days]
ORDER BY X.[MonthID] ,
X.[Month] ,
X.[CountyName] ,
X.[Filing];
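The goal stated in the question (one extra column holding the total of all filing types per county per month) can probably also be reached without a subquery, by nesting the aggregate inside a window function. The following is a minimal sketch, not the original schema: the table variable @requests, its columns, and its rows are hypothetical stand-ins for the joined requests, organizations, and status tables.

```sql
-- Hypothetical sample data standing in for the joined tables in the question.
DECLARE @requests TABLE (
    id            INT IDENTITY(1,1) PRIMARY KEY,
    received_date DATE        NOT NULL,
    county_name   VARCHAR(50) NOT NULL,
    filing        VARCHAR(50) NOT NULL
);

INSERT INTO @requests (received_date, county_name, filing)
VALUES ('2016-01-05', 'Alachua', 'Accepted'),
       ('2016-01-09', 'Alachua', 'Accepted'),
       ('2016-01-12', 'Alachua', 'Rejected'),
       ('2016-02-03', 'Alachua', 'Accepted'),
       ('2016-02-17', 'Alachua', 'Rejected'),
       ('2016-02-20', 'Alachua', 'Rejected'),
       ('2016-02-25', 'Baker',   'Accepted');

SELECT DATEPART(MONTH, r.received_date) AS [MonthID],
       DATENAME(MONTH, r.received_date) AS [Month],
       r.county_name                    AS [CountyName],
       r.filing                         AS [Filing],
       COUNT(r.id)                      AS [Request_Total],
       -- Window functions are evaluated after GROUP BY, so this sums the
       -- per-filing counts within each county and month.
       SUM(COUNT(r.id)) OVER (PARTITION BY r.county_name,
                                           DATEPART(MONTH, r.received_date)) AS [TotalFilings]
FROM @requests AS r
GROUP BY DATEPART(MONTH, r.received_date),
         DATENAME(MONTH, r.received_date),
         r.county_name,
         r.filing
ORDER BY [MonthID], [CountyName], [Filing];
```

Note that the commented-out SUM(r.id) OVER (...) in the queries above would add up id values rather than count rows, and r.id is not part of the GROUP BY, which is why wrapping the COUNT is used here instead.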

Related

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'
I think you are really close. Your question is very similar to "TSQL - creating from-to date table while ignoring in-between steps with conditions", with a logic difference in what you want to consider to be the same group.
My basic approach is to use the LAG() function to figure out the previous values for MemID and TermDate and combine that with your 45 day rule to define a group. And finally get the first and last values of each group.
Here is my response to that question modified to your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID, EffDate) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.

TSQL - Replace Cursor

I found in our database a cursor statement and I would like to replace it.
Declare @max_date datetime
Select @max_date = max(finished) From Payments
Declare @begin_date datetime = '2015-02-01'
Declare @end_of_last_month datetime
While @begin_date <= @max_date
Begin
SELECT @end_of_last_month = CAST(DATEADD(DAY, -1 , DATEFROMPARTS(YEAR(@begin_date),MONTH(@begin_date),1)) AS DATE) --AS end_of_last_month
Insert Into #table(Customer, ArticleTypeID, ArticleType, end_of_month, month, year)
Select Count(distinct (customerId)), prod.ArticleTypeID, at.ArticleType, @end_of_last_month, datepart(month, @end_of_last_month), datepart(year, @end_of_last_month)
From Customer cust
Inner join Payments pay ON pay.member_id = m.member_id
Inner Join Products prod ON prod.product_id = pay.product_id
Inner Join ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
Where @end_of_last_month between begin_date and expire_date
and completed = 1
Group by prod.ArticleTypeID, at.ArticleType
order by prod.ArticleTypeID, at.ArticleType
Set @begin_date = DATEADD(month, 1, @begin_date)
End
It counts the distinct customers per month and article type whose begin and expire dates enclose the month-end date of the current loop iteration.
Notes:
A user can have different payment types, e.g. 1 month, 6 months, and so on.
Is it possible to rewrite this code? My only problem is the condition in the WHERE clause (@end_of_last_month between begin_date and expire_date).
How can I handle this with joins or CTEs?
What you need first, if you don't already have one, is a numbers table.
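If no such table exists yet, one common way to build it is sketched below. The name dbo.Numbers and the column N are chosen to match the queries in this answer; the row count of 10,000 and the use of sys.all_objects as a row source are assumptions made for the example, not something the answer specifies.

```sql
-- One-time setup: a table of sequential integers 1..10000.
-- The upper bound is an arbitrary choice; size it to the longest range you expect.
IF OBJECT_ID(N'dbo.Numbers', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Numbers (N INT NOT NULL PRIMARY KEY);

    INSERT INTO dbo.Numbers (N)
    SELECT TOP (10000)
           -- ORDER BY (SELECT NULL) means "no particular order"; only the numbering matters.
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;   -- cross join just to get enough rows
END;
```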
Using that Numbers table, you can create a dynamic list of dates for end_of_last_month like so:
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, @begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, @begin_date, @max_date) + 1
)
select * FROM ctexAllDates
Then combine with your query like so
;WITH ctexAllDates
AS (
SELECT end_of_last_month = DATEADD(DAY, -1, DATEADD(MONTH, N.N -1, @begin_date))
FROM
dbo.Numbers N
WHERE
N.N <= DATEDIFF(MONTH, @begin_date, @max_date) + 1
)
INSERT INTO #table
(
Customer
, ArticleTypeID
, ArticleType
, end_of_month
, month
, year
)
SELECT
COUNT(DISTINCT (customerId))
, prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
, DATEPART(MONTH, A.end_of_last_month)
, DATEPART(YEAR, A.end_of_last_month)
FROM
Customer cust
INNER JOIN Payments pay ON pay.member_id = m.member_id
INNER JOIN Products prod ON prod.product_id = pay.product_id
INNER JOIN ArticleType at ON at.ArticleTypeID = prod.ArticleTypeID
INNER JOIN ctexAllDates A ON A.end_of_last_month BETWEEN begin_date AND expire_date
WHERE completed = 1
GROUP BY
prod.ArticleTypeID
, at.ArticleType
, A.end_of_last_month
ORDER BY
prod.ArticleTypeID
, at.ArticleType;

GROUP BY Month, Display Month Name in SELECT List

I am trying to group by month to separate my result set into monthly breakdowns, but I want to display the month name instead of the month number.
SELECT
DATEPART(MONTH, R.received_date) AS [Month],
--DATENAME( MONTH, DATEPART( MONTH, R.received_date)) AS [Month] ,
o.name AS [CountyName] ,
rsc.description AS [Filing],
COUNT(*) AS [Total]
FROM dbo.requests AS [r]
INNER JOIN dbo.organizations AS [o] ON o.id = r.submitted_to_organiztion_id
INNER JOIN dbo.request_status_codes AS [rsc] ON rsc.code = r.request_status_code
WHERE r.submitted_to_organiztion_id < 68
AND r.request_type_code = 1
AND CAST(r.received_date AS DATE) >= '01/01/2016'
AND CAST(r.received_date AS DATE) <= '06/30/2016'
AND o.name = 'Alachua'
GROUP BY
DATEPART(MONTH, R.received_date) ,
--DATENAME( MONTH, DATEPART( MONTH, R.received_date)) ,
o.name ,
rsc.description;
And the results are as expected:
But if I group by DATENAME, as in the query below, the results are not what I would expect:
SELECT
--DATEPART(MONTH, R.received_date) AS [Month],
DATENAME( MONTH, DATEPART( MONTH, R.received_date)) AS [Month] ,
o.name AS [CountyName] ,
rsc.description AS [Filing],
COUNT(*) AS [Total]
FROM dbo.requests AS [r]
INNER JOIN dbo.organizations AS [o] ON o.id = r.submitted_to_organiztion_id
INNER JOIN dbo.request_status_codes AS [rsc] ON rsc.code = r.request_status_code
WHERE r.submitted_to_organiztion_id < 68
AND r.request_type_code = 1
AND CAST(r.received_date AS DATE) >= '01/01/2016'
AND CAST(r.received_date AS DATE) <= '06/30/2016'
AND o.name = 'Alachua'
GROUP BY
--DATEPART(MONTH, R.received_date) ,
DATENAME( MONTH, DATEPART( MONTH, R.received_date)) ,
o.name ,
rsc.description
ORDER BY
[Month] ,
CountyName ,
Filing
All six months end up grouped into a single set, January:
What can I do to fix the query to display the first result set but with the month name instead of numeric value?
Thanks,
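A likely cause, offered as a sketch rather than a confirmed diagnosis: DATEPART returns an integer, and DATENAME implicitly converts that integer to a datetime counted in days from 1900-01-01, so the month numbers 1 through 6 all land in January 1900. Passing the date column itself to DATENAME, and grouping by both expressions so the numeric month stays available for sorting, should avoid this. The table variable and rows below are hypothetical, and the month names assume an English language setting.

```sql
-- Hypothetical sample data; only the date column matters for this illustration.
DECLARE @requests TABLE (received_date DATE NOT NULL);

INSERT INTO @requests (received_date)
VALUES ('2016-01-15'), ('2016-02-10'), ('2016-02-20'), ('2016-06-30');

-- Problem: the integer month (1..6) is implicitly converted to a datetime
-- a few days after 1900-01-01, so every row is named after that January.
SELECT DATENAME(MONTH, DATEPART(MONTH, r.received_date)) AS [Month],
       COUNT(*)                                          AS [Total]
FROM @requests AS r
GROUP BY DATENAME(MONTH, DATEPART(MONTH, r.received_date));

-- Fix: name the month from the date itself, and keep the numeric month
-- in the GROUP BY so the rows can be sorted chronologically.
SELECT DATENAME(MONTH, r.received_date) AS [Month],
       COUNT(*)                         AS [Total]
FROM @requests AS r
GROUP BY DATEPART(MONTH, r.received_date),
         DATENAME(MONTH, r.received_date)
ORDER BY DATEPART(MONTH, r.received_date);
```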

TSQL get overlapping periods from datetime ranges

I have a table with date ranges and I need the sum of the overlapping periods (in hours) between its rows.
This is a schema example:
create table period (
id int,
starttime datetime,
endtime datetime,
type varchar(64)
);
insert into period values (1,'2013-04-07 8:00','2013-04-07 13:00','Work');
insert into period values (2,'2013-04-07 14:00','2013-04-07 17:00','Work');
insert into period values (3,'2013-04-08 8:00','2013-04-08 13:00','Work');
insert into period values (4,'2013-04-08 14:00','2013-04-08 17:00','Work');
insert into period values (5,'2013-04-07 10:00','2013-04-07 11:00','Holyday'); /* 1h overlapping with 1*/
insert into period values (6,'2013-04-08 10:00','2013-04-08 20:00','Transfer'); /* 6h overlapping with 3 and 4*/
insert into period values (7,'2013-04-08 11:00','2013-04-08 12:00','Test'); /* 1h overlapping with 3 and 6*/
And its fiddle: http://sqlfiddle.com/#!6/9ca31/10
I expect a total of 8 overlapping hours:
1h (id 5 over id 1)
6h (id 6 over id 3 and 4)
1h (id 7 over id 3 and 6)
I checked this question: "select overlapping datetime events with SQL", but it does not seem to do what I need.
Thank you.
select sum(datediff(hh, case when t2.starttime > t1.starttime then t2.starttime else t1.starttime end,
case when t2.endtime > t1.endtime then t1.endtime else t2.endtime end))
from period t1
join period t2 on t1.id < t2.id
where t2.endtime > t1.starttime and t2.starttime < t1.endtime;
Updated to handle several overlaps:
select sum(datediff(hh, start, fin))
from (select distinct
case when t2.starttime > t1.starttime then t2.starttime else t1.starttime end as start,
case when t2.endtime > t1.endtime then t1.endtime else t2.endtime end as fin
from period t1
join period t2 on t1.id < t2.id
where t2.endtime > t1.starttime and t2.starttime < t1.endtime
) as overlaps;
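To see why the DISTINCT matters, it can help to list the pairwise overlaps before summing them. The sketch below reuses the question's sample rows in a table variable (@period is a stand-in name so the block runs on its own) and clamps each overlapping pair to the later start and the earlier end.

```sql
-- Same rows as the question's sample data, loaded into a table variable.
DECLARE @period TABLE (id INT, starttime DATETIME, endtime DATETIME, type VARCHAR(64));

INSERT INTO @period (id, starttime, endtime, type)
VALUES (1, '2013-04-07T08:00:00', '2013-04-07T13:00:00', 'Work'),
       (2, '2013-04-07T14:00:00', '2013-04-07T17:00:00', 'Work'),
       (3, '2013-04-08T08:00:00', '2013-04-08T13:00:00', 'Work'),
       (4, '2013-04-08T14:00:00', '2013-04-08T17:00:00', 'Work'),
       (5, '2013-04-07T10:00:00', '2013-04-07T11:00:00', 'Holyday'),
       (6, '2013-04-08T10:00:00', '2013-04-08T20:00:00', 'Transfer'),
       (7, '2013-04-08T11:00:00', '2013-04-08T12:00:00', 'Test');

-- One row per overlapping pair: the overlap runs from the later of the two
-- start times to the earlier of the two end times.
SELECT t1.id AS first_id,
       t2.id AS second_id,
       CASE WHEN t2.starttime > t1.starttime THEN t2.starttime ELSE t1.starttime END AS overlap_start,
       CASE WHEN t2.endtime   > t1.endtime   THEN t1.endtime   ELSE t2.endtime   END AS overlap_end
FROM @period AS t1
JOIN @period AS t2 ON t1.id < t2.id          -- each pair once, never a row with itself
WHERE t2.endtime > t1.starttime
  AND t2.starttime < t1.endtime
ORDER BY t1.id, t2.id;
```

With these rows, the pairs (3, 7) and (6, 7) produce the same 11:00 to 12:00 interval. The first query counts it twice and sums to 9 hours, while the DISTINCT version counts it once and gives the expected 8.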
I have some "dirty" solution. Hope this helps :)
with src as (
select
convert(varchar, starttime, 112) [start_date]
, cast(left(convert(varchar, starttime, 108), 2) as int) [start_time]
, convert(varchar, endtime, 112) [end_date]
, cast(left(convert(varchar, endtime, 108), 2) as int) [end_time]
, id
from [period]),
[gr] as (
select
row_number() over(order by s1.[start_date], s1.[start_time], s1.[end_time], s2.[start_time], s2.[end_time]) [no]
, s1.[start_date] [date]
, s1.[start_time] [t1]
, s1.[end_time] [t2]
, s2.[start_time] [t3]
, s2.[end_time] [t4]
from src s1
join src s2 on s1.[start_date] = s2.[start_date]
and s1.[end_date] = s2.[end_date]
and (s1.[start_time] between s2.[start_time] and s2.[end_time] or s1.[end_time] between s2.[start_time] and s2.[end_time])
and s1.id != s2.id),
[raw] as (
select [no], [date], [t1] [h] from [gr] union all
select [no], [date], [t2] from [gr] union all
select [no], [date], [t3] from [gr] union all
select [no], [date], [t4] from [gr]),
[max_min] as (
select [no], [date], max(h) [max_h], min(h) [min_h]
from [raw]
group by [no], [date]
),
[result] as (
select [raw].*
from [raw]
left join [max_min] on [raw].[no] = [max_min].[no]
and ([raw].h = [max_min].[max_h] or [raw].h = [max_min].[min_h])
where [max_min].[no] is null),
[final] as (
select distinct r1.[date], r1.h [start_h], r2.h [end_h], abs(r1.h - r2.h) [dif]
from [result] r1
join [result] r2 on r1.[no] = r2.[no]
where abs(r1.h - r2.h) > 0
and r1.h > r2.h)
select sum(dif) [overlapping hours] from [final]
SQLFiddle

How to display rollup data in new column?

I have the following query, which returns the number of android questions asked per day on StackOverflow in 2011. I want to get the sum of all the questions asked during 2011. For this I am using ROLLUP.
select
year(p.CreationDate) as [Year],
month(p.CreationDate) as [Month],
day(p.CreationDate) as [Day],
count(*) as [QuestionsAskedToday]
from Posts p
inner join PostTags pt on p.id = pt.postid
inner join Tags t on t.id = pt.tagid
where
t.tagname = 'android' and
p.CreationDate > '2011-01-01 00:00:00'
group by year(p.CreationDate), month(p.CreationDate),day(p.CreationDate)
with rollup
order by year(p.CreationDate), month(p.CreationDate) desc,day(p.CreationDate) desc
This is the output:
The sum of all questions asked on each day in 2011 is being displayed in the QuestionsAskedToday column itself.
Is there a way to display the rollup in a new column with an alias?
Link to the query
To show this as a column rather than a row you can use SUM(COUNT(*)) OVER () instead of ROLLUP. (Online Demo)
SELECT YEAR(p.CreationDate) AS [Year],
MONTH(p.CreationDate) AS [Month],
DAY(p.CreationDate) AS [Day],
COUNT(*) AS [QuestionsAskedToday],
SUM(COUNT(*)) OVER () AS [Total]
FROM Posts p
INNER JOIN PostTags pt
ON p.id = pt.postid
INNER JOIN Tags t
ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate > '2011-01-01 00:00:00'
GROUP BY YEAR(p.CreationDate),
MONTH(p.CreationDate),
DAY(p.CreationDate)
ORDER BY YEAR(p.CreationDate),
MONTH(p.CreationDate) DESC,
DAY(p.CreationDate) DESC
You could take an approach like this: Example
SELECT
YEAR(p.CreationDate) AS 'Year'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
THEN CAST(MONTH(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS 'Month'
, CASE
WHEN GROUPING(DAY(p.CreationDate)) = 0
THEN CAST(DAY(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS [DAY]
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
AND GROUPING(DAY(p.CreationDate)) = 0
THEN COUNT(1)
END AS 'QuestionsAskedToday'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 1
OR GROUPING(DAY(p.CreationDate)) = 1
THEN COUNT(1)
END AS 'Totals'
FROM Posts AS p
INNER JOIN PostTags AS pt ON p.id = pt.postid
INNER JOIN Tags AS t ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate >= '2011-01-01'
GROUP BY ROLLUP(YEAR(p.CreationDate)
, MONTH(p.CreationDate)
, DAY(p.CreationDate))
ORDER BY YEAR(p.CreationDate)
, MONTH(p.CreationDate) DESC
, DAY(p.CreationDate) DESC
If this is what you wanted, the same technique can be applied to Years as well to total them in the new column, or their own column, if you want to query for multiple years and aggregate them.
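As a rough illustration of extending the technique to years, the sketch below rolls up by year and month and labels the grand-total row in the Year column. The table variable @posts and its rows are hypothetical, and the grouping is reduced to year and month to keep the example short.

```sql
-- Hypothetical sample data spanning two years.
DECLARE @posts TABLE (CreationDate DATETIME NOT NULL);

INSERT INTO @posts (CreationDate)
VALUES ('2011-01-01T10:00:00'),
       ('2011-01-01T12:00:00'),
       ('2011-02-03T09:00:00'),
       ('2012-05-05T08:00:00');

SELECT CASE
           WHEN GROUPING(YEAR(p.CreationDate)) = 0
               THEN CAST(YEAR(p.CreationDate) AS VARCHAR(7))
           ELSE 'Totals:'                      -- grand-total row across all years
       END AS [Year],
       MONTH(p.CreationDate) AS [Month],       -- NULL on subtotal and total rows
       CASE
           WHEN GROUPING(MONTH(p.CreationDate)) = 0
               THEN COUNT(*)                   -- detail rows: one per year and month
       END AS [QuestionsAskedThisMonth],
       CASE
           WHEN GROUPING(MONTH(p.CreationDate)) = 1
               THEN COUNT(*)                   -- per-year subtotals and the grand total
       END AS [Totals]
FROM @posts AS p
GROUP BY ROLLUP(YEAR(p.CreationDate), MONTH(p.CreationDate))
ORDER BY GROUPING(YEAR(p.CreationDate)),       -- grand total last
         YEAR(p.CreationDate),
         GROUPING(MONTH(p.CreationDate)),      -- each year's subtotal after its months
         MONTH(p.CreationDate);
```

Sorting on the GROUPING values rather than on the raw columns keeps each subtotal directly after the rows it summarizes, regardless of how NULLs sort.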