tsql PIVOT function - tsql

Need help with the following query:
Current Data format:
StudentID EnrolledStartTime EnrolledEndTime
1 7/18/2011 1.00 AM 7/18/2011 1.05 AM
2 7/18/2011 1.00 AM 7/18/2011 1.09 AM
3 7/18/2011 1.20 AM 7/18/2011 1.40 AM
4 7/18/2011 1.50 AM 7/18/2011 1.59 AM
5 7/19/2011 1.00 AM 7/19/2011 1.05 AM
6 7/19/2011 1.00 AM 7/19/2011 1.09 AM
7 7/19/2011 1.20 AM 7/19/2011 1.40 AM
8 7/19/2011 1.10 AM 7/18/2011 1.59 AM
I would like to calculate the time difference between EnrolledEndTime and EnrolledStartTime and group it with 15 minutes difference and the count of students that enrolled in the time.
Expected Result :
Count(StudentID) Date 0-15Mins 16-30Mins 31-45Mins 46-60Mins
4 7/18/2011 3 1 0 0
4 7/19/2011 2 1 0 1
Can I use a combination of the PIVOT function to acheive the required result. Any pointers would be helpful.

Create a table variable/temp table that includes all the columns from the original table, plus one column that marks the row as 0, 16, 31 or 46. Then
SELECT * FROM temp table name PIVOT (Count(StudentID) FOR new column name in (0, 16, 31, 46).
That should put you pretty close.

It's possible (just see the basic pivot instructions here: http://msdn.microsoft.com/en-us/library/ms177410.aspx), but one problem you'll have using pivot is that you need to know ahead of time which columns you want to pivot into.
E.g., you mention 0-15, 16-30, etc. but actually, you have no idea how long some students might take -- some might take 24-hours, or your full session timeout, or what have you.
So to alleviate this problem, I'd suggesting having a final column as a catch-all, labeled something like '>60'.
Other than that, just do a select on this table, selecting the student ID, the date, and a CASE statement, and you'll have everything you need to work the pivot on.
CASE WHEN date2 - date1 < 15 THEN '0-15' WHEN date2-date1 < 30 THEN '16-30'...ELSE '>60' END.

I have an old version of ms sql server that doesn't support pivot. I wrote the sql for getting the data. I cant test the pivot, so I tried my best, couldn't test the pivot part. The rest of the sql will give you the exact data for the pivot table. If you accept null instead of 0, it can be written alot more simple, you can skip the "a subselect" part defined in "with a...".
declare #t table (EnrolledStartTime datetime,EnrolledEndTime datetime)
insert #t values('2011/7/18 01:00', '2011/7/18 01:05')
insert #t values('2011/7/18 01:00', '2011/7/18 01:09')
insert #t values('2011/7/18 01:20', '2011/7/18 01:40')
insert #t values('2011/7/18 01:50', '2011/7/18 01:59')
insert #t values('2011/7/19 01:00', '2011/7/19 01:05')
insert #t values('2011/7/19 01:00', '2011/7/19 01:09')
insert #t values('2011/7/19 01:20', '2011/7/19 01:40')
insert #t values('2011/7/19 01:10', '2011/7/19 01:59')
;with a
as
(select * from
(select distinct dateadd(day, cast(EnrolledStartTime as int), 0) date from #t) dates
cross join
(select '0-15Mins' t, 0 group1 union select '16-30Mins', 1 union select '31-45Mins', 2 union select '46-60Mins', 3) i)
, b as
(select (datediff(minute, EnrolledStartTime, EnrolledEndTime )-1)/15 group1, dateadd(day, cast(EnrolledStartTime as int), 0) date
from #t)
select count(b.date) count, a.date, a.t, a.group1 from a
left join b
on a.group1 = b.group1
and a.date = b.date
group by a.date, a.t, a.group1
-- PIVOT(max(date)
-- FOR group1
-- in(['0-15Mins'], ['16-30Mins'], ['31-45Mins'], ['46-60Mins'])AS p

Related

T-SQL vlookup with fake calendar table?

I am rather new in T-SQL and I have to create a view, where the output will be as shown below:
enter image description here
But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?
This is great question about a very important topic that, even many experienced developers need to touch up on. Being "relatively new at SQL" I wont just offer a solution, I'll explain the key concepts involved.
The Auxiliary Table Numbers
First lets learn about what a tally table, aka numbers table is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructer which allows for a list of values to be treated like a view. This returns:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code into a CTE named e. Think of CTEs as inline unnamed views that you can reference multiple times. Now lets CROSS JOIN e to itself. This returns for 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy values.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using as an auxiliary numbers table
#OneToTen is a table with random numbers 1 to 10. I need to count how many there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE #OneToTen TABLE (N INT);
INSERT #OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from #OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN #OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN #OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show you the wrong and right way to do this. Note how COUNT(*) is wrong for this, you need COUNT(whatever you are counting).
Auxiliary table of Dates (AKA calendar table)
My we use our numbers table to create a calendar table.
;--== 3. Auxilliary Month/Year Calendar Table
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
The Auxiliary Customer table
Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.
SAMPLE DATA:
;--== Sample Data
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
Distinct list of customers for an "Auxilliary Table of Customers"
SELECT DISTINCT s.Customer FROM #sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from #sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxilliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxilliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxilliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM #sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN #sale AS s -- 6.5. Join #sale to Junk Dimension on Year,Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00

TSQL Query Calculating number of days and number of occurrences

I'm having some issues writing a query summarizing some stocking information from a query.
I've been trying a few CTE (Common table Expressions ) grouping and subqueries and LAG function to try to summarize the data. Oddly I've gotten stuck on this problem the last few days.
Here is example of the data that I'm dealing with.
--Optional create table
--Drop table testdata
--Create table testdata ( Part int, StockDate date, OutOfStock bit);
with inventorydata(Part, StockDate, OutOfStock) as
(
select 1000, '1/1/2019',1
union
select 1000,'1/2/2019',1
union
select 1000, '1/3/2019',1
union
select 1000, '1/4/2019',0
union
select 1000, '1/5/2019',1
union
select 1005, '1/1/2019',0
union
select 1005,'1/2/2019',1
union
select 1005, '1/3/2019',1
union
select 1005, '1/4/2019',1
union
select 1005, '1/5/2019',0
)
--Insert into testdata ( Part,StockDate,OutOfStock)
Select Part,StockDate,OutOfStock from inventorydata
--Select * from testdata
Output
Part StockDate OutOfStock
----------- --------- -----------
1000 1/1/2019 1
1000 1/2/2019 1
1000 1/3/2019 1
1000 1/4/2019 0
1000 1/5/2019 1
1005 1/1/2019 0
1005 1/2/2019 1
1005 1/3/2019 1
1005 1/4/2019 1
1005 1/5/2019 0
I'm trying get the desired output.
Part StockDate BackInStock Occurance
----------- --------- ----------- -----------
1000 1/1/2019 1/3/2019 1
1000 1/5/2019 1/5/2019 1
1005 1/2/2019 1/4/2019 1
Any help is much appreciated.
Thank you.
Without more detail, it's hard to be sure of exactly what you are going for. Your desired output is missing some detail that would be helpful in giving a better answer. With that said, hopefully this will help you out.
My assumptions:
You are looking to find the periods where parts are out of stock, and note the date they first were out of stock and the date they came back into stock
I'm interpreting "back in stock" date to be the date that an item first had a record noting it was not out of stock. For example, part 1000 went out of stock on 1/1/2019 and would be back in stock starting 1/4/2019, per the data, rather than 1/3/2019 as in the example
If the item is not back in stock, then we don't want a "back in stock" date
I'm going to put your example data into a temp table to make life a little easier for demonstrating:
with inventorydata(Part, StockDate, OutOfStock) as
(
select 1000, '1/1/2019',1
union
select 1000,'1/2/2019',1
union
select 1000, '1/3/2019',1
union
select 1000, '1/4/2019',0
union
select 1000, '1/5/2019',1
union
select 1005, '1/1/2019',0
union
select 1005,'1/2/2019',1
union
select 1005, '1/3/2019',1
union
select 1005, '1/4/2019',1
union
select 1005, '1/5/2019',0
)
Select Part,StockDate,OutOfStock
INTO #Test
from inventorydata
Consider the following:
SELECT
Part,
StockDate,
BackInStockDate
FROM
(
SELECT
Part,
StockDate,
OutOfStock,
LEAD(StockDate, 1) OVER (PARTITION BY Part ORDER BY StockDate) AS BackInStockDate
FROM
(
SELECT
Part,
StockDate,
OutOfStock,
LAG(OutOfStock, 1) OVER (PARTITION BY Part ORDER BY StockDate) AS PrevOutOfStock
FROM #Test
) AS InnerData
WHERE
OutOfStock <> PrevOutOfStock
OR PrevOutOfStock IS NULL
) AS OuterData
WHERE
OutOfStock = 1
The InnerDataquery pulls each record along with the OutOfStock value for the previous row. We use that to find only those rows that represent a date where the status of the stock changed from in-stock to out-of-stock or visa-versa. This is determine by OutOfStock <> PrevOutOfStock and PrevOutOfStock IS NULL.
Once we have just the rows that represent changes we look at the next row to get the date we saw the state of the part change from the state represented in the current row.'
Finally, we filter out rows where OutOfStock = 0, as those would represent in-stock periods, which we are ignoring. This gives us the following result:
Part StockDate BackInStockDate
------- ----------- ----------------
1000 1/1/2019 1/4/2019
1000 1/5/2019 NULL
1005 1/2/2019 1/5/2019
You can modify the structure to get a different BackInStockDate if this isn't the value you are wanting, and add some counting to get whatever Occurances is supposed to be.

Calculate past 3 month average for every past 3rd month

I am using SQL Server 2014. I have a table like this
create table revenue (id varchar(2), trasdate date, revenue int);
insert into revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
;
I am preparing a SQL query to create a SSRS report. I want to calculate a past 3 month average for current and every past 3rd month with result like below. As we are in month of September right now. The result should show something like this:
**id Period Revenue_3Mon**
aa March-May 233
aa June-Aug 516
ab March-May 233
ab June-Aug 516
Though I can figure out about the Period column. I was mainly focussing on getting the Revenue_3Mon. So I initially tried with the below query after some googling. But this query throws an error as incorrect syntax near 'rows' and if I remove rows from the query then it throws an error as Incorrect syntax near the keyword 'between'. And incorrect syntax near i.
select i.id,i.mon,
avg([i.mon_revenue]) over (partition by i.id, i.mon order by [i.id],
[i.mon] rows between 3 preceding and 1 preceding row) as revenue_3mon --
-- using 3 preceding and 1 preceding row you exclude the current row
from (select a.id, month(a.trasdate) as mon,
sum(a.revenue) as mon_revenue
from revenue a
group by a.id, month(a.trasdate)) i
group by i.id, i.mon
order by i.id,i.mon;
After few efforts, I gave up on this query and came up with new solution which was a bit close to my expectation (after lots of trial and errors).
Declare #count as int;
declare #max as int;
set #count = 4
declare #temp as table (id varchar(2), monthoftrasdate int, revenue int,
[3monavg] int);
SET #MAX = (SELECT distinct MAX(a.ROWNUM) FROM (SELECT id, month(trasdate)
as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
) A GROUP BY A.ID);
while (#count <= #max )
begin
WITH CTE AS (
SELECT id, month(trasdate) as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS
ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
)
insert into #temp
SELECT A.ID,A.MON, a.TotalRevenue
,( SELECT avg(b.TotalRevenue) as avgrev
FROM CTE B
WHERE B.ROWNUM BETWEEN A.ROWNUM-3 AND A.ROWNUM-1
AND A.ID = B.ID --AND A.mon = B.mon
--and b.ROWNUM < a.ROWNUM
and (a.mon > 3 and a.ROWNUM > 3)
GROUP BY B.id
) AS REVENUE_3MON
FROM CTE A
set #count = #count + 1
end
select distinct a.* from #temp a
The reason I had to use 'distinct' is because the query was showing duplicate records for every id and every month. So far the result shows like below
id MonthofTrasdate Revenue 3MonAvg
aa 1 400 NULL
aa 2 700 NULL
aa 3 350 NULL
aa 4 150 483
aa 5 200 400
aa 6 600 233
aa 7 500 316
aa 8 450 433
aa 9 1234 516
ab 1 400 NULL
ab 2 700 NULL
ab 3 350 NULL
ab 4 150 483
ab 5 200 400
ab 6 600 233
ab 7 500 316
ab 8 450 433
ab 9 1234 516
This pulls out past 3 month average for every month. But i will just manipulate the rest on SSRS the way i want it.
As currently my table has no data for previous year. This works for me showing the appropriate result for next couple of months for now. But my concern is when I have to show my boss for next year Jan, Feb and March then it should be able to pull also for these months as well like Oct-Dec (Previous year), Nov-Jan and Dec - Feb. I am struggling to figure out the proper way to put this in my query.
Can you please help me out with this query? And also let me know what is wrong with my former query.
Problems with your first attempt:
You enclosed some of the aliases and column names in square brackets like [i.mon_revenue]. There is no need for square brackets, but if you want to use them, you have to break them up at the dot: [i].[mon_revenue].
In your window function expression, there is one row too many (in the end).
Window functions are applied at the very end (after the rest of the respective query), so you also have to include i.mon_revenue in your GROUP BY clause of the outer query.
Knowing that the inner query will produce one row per id and mon, there will never be preceding rows in an id-mon partition. Therefore, you must not partition by both, but only by id.
To simplify the query after resolving the issues: ordering by a partition column generally makes no sense, and since - as already mentioned - the inner query returns unique id-mon combinations, you don't have to group by these in the outer query. Looking at that query, we see that the outer query just directly selects and uses the values from the inner query, which makes a separation in two queries unneccessary. So, in fact, you wanted to perform the following query, which will produce the rolling 3-month average (I added the monthly TotalRevenue as well):
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, MONTH(trasdate)
ORDER BY id, MONTH(trasdate);
Suggestions on your second attempt:
When calculating the #MAX value, you rely on the fact that each id has revenues for the same number of months. Are you sure?
The code inside the WHILE loop does not depend on #count, so it will add the same data into the #temp table multiple times, which is probably the reason why you thought you needed a DISTINCT. Therfore: No need for the variables, no need for a loop and a #temp, no need for DISTINCT.
The conditions A.mon > 3 and A.rownum > 3 are redundant with your current data. In general, I guess, you don't want to explicitly excluse the months from January to March, so A.mon > 3 should be removed. A.rownum > 3 could be removed, too, unless you really don't want to see a 3-month average when there are only 2 preceding months or less.
As the subquery for the average is restricted to only one id, there's no need for a GROUP BY.
Since the ROW_NUMBER function doesn't care about gaps in the months, I suggest to use a different numbering function, for example DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum. Of course, the comparison in the WHERE clause of the subquery then has to be changed to B.mnum BETWEEN A.mnum+1 AND A.mnum+3.
So, your second attempt can be reduced to this, which will produce the same result as the above, at least with your sample data, where no gaps in the months exist:
WITH CTE AS (
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum
FROM revenue
GROUP BY id, MONTH(trasdate)
)
SELECT id, mon, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Now, guess what, an expression like my mnum using DATEDIFF increases by one every month as we move to the past, regardless of a change of years, so this might be useful for grouping as well, whether you want to (or can?) use Window functions or not:
With OVER()
SELECT id, MONTH(MIN(trasdate)) AS mon, YEAR(MIN(trasdate)) AS yr, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MIN(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
ORDER BY id, DATEDIFF(month, trasdate, GETDATE()) DESC;
Without OVER()
WITH CTE AS (
SELECT id, MIN(trasdate) AS min_dt, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, trasdate, GETDATE()) AS mnum
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
)
SELECT id, MONTH(min_dt) AS mon, YEAR(min_dt) AS yr, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Both queries allow for retrieving the minimum and maximum date for each period (including month and year).
If you instead wanted what you originally posted under The result should show something like this (just grouping by previous 3-months intervals), you just would have to group your original revenue table by id and (DATEDIFF(month, trasdate, GETDATE())-1)/3 (filtering WHERE DATEDIFF(month, trasdate, GETDATE()) > 0). If so, this kind of grouping and aggregation could, of course, be done also by the Report Server.
I think this should do what you want:
select r.*,
avg(r.mon_revenue) over (partition by r.id
order by r.mon_min
rows between 3 preceding and 1 preceding row
) as revenue_3mon
-- using 3 preceding and 1 preceding row you exclude the current row
from (select r.id, month(r.trasdate) as mon,
min(r.trasdate) as mon_min,
sum(r.revenue) as mon_revenue
from revenue r
group by r.id, year(r.trasdate), month(r.trasdate)
) 4
order by r.id, r.mon, r.mon_min;
Notes:
I fixed the code so it recognizes years as well as dates.
The expression [i.mon_revenue] is not a valid column reference (in your case). You have no column with the name "i.mon_revenue" (with the . in the name).
I changed the column alias to r to match the table.
I added a date column for each month to make it easier to express the ordering.
The outer group by is not necessary.
There are several syntax errors in your code. This should give you what you need. The inner query is the important bit but hopefully this will be enough to get you on your way.
I switch our the temp table for variable and changed the revenue column to not be INT as you have decimal values in there but other than that your original sample table is unchanged
DECLARE #revenue table (id varchar(2), trasdate date, revenue float)
insert into #revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
SELECT
*
FROM
(
SELECT
*
, MONTH(trasdate) as MonthNumber
, AVG(revenue) OVER (PARTITION BY id
ORDER BY
id
, MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) as ThreeMonthAvg
FROM #revenue
) a
WHERE MONTH(GETDATE()) - MonthNumber IN (0, 3, 6, 9)
This gives the following results
aa 2018-06-04 600 6 400
aa 2018-09-01 1234.5 9 516.666666666667
ab 2018-03-01 350 3 700
ab 2018-06-04 600 6 233.333333333333
ab 2018-09-01 1234.5 9 516.666666666667

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have query which is genertating results as shown in the example below but I'm left with islands of data (where a transaction existed in that month) and gaps in between... My question is still similar but in some ways a little more generic - What is the best way to fill gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
A similar approach but using a cartesian to obtain all the combinations of report ids/months.
in the first step.
A second step adds to that cartesian the maximum timestamp from the source table where the month is less or equal to the month in the current row.
Finally it joins the source table to the temp table by report id/timestamp to obtain the latest source table row for every report id/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth

How to find the average of certain records T-SQL

I have a table variable that I am dumping data into:
DECLARE #TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
Some records don't have a purchase order date or last received date, so the days difference is null as well. I have done a lot of inner joins on itself, but data seems to take too long, or comes out incorrect most of the time.
Is it possible to get the average per SKU days difference? how would I check if there is only 1 record of that SKU? I need the data, if there is only 1 record, then I have to find it at a champvendor level the average.
Here is the structure:
Vendor has many Numbers and Numbers has many SKUs
Any help would be great, I can't seem to crack this one, nor can I find anything related to this online. Thanks in advance.
Here is some sample data:
Vendor Number SKU PurchaseOrderDate LastReceivedDate DaysDifference
OTHER PMDD 1111 OP1111 2009-08-21 00:00:00.000 2009-09-02 00:00:00.000 12
OTHER PMDD 1111 OP1112 2009-12-09 00:00:00.000 2009-12-17 00:00:00.000 8
MANTOR 3333 MA1111 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
MANTOR 3333 MA1112 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
I'm sorry I may have written this wrong. If there is only 1 SKU for a record, then I want to return the DaysDifference (if it's not null), if it has more than 1 record and they are not null, then return the average days difference. If it is all nulls, then at a vendor level check for the average of the skus that are not null, otherwise it should just return 7. This is what I have tried:
SELECT t1.SKU, ISNULL
(
AVG(t1.DaysDifference),
(
SELECT ISNULL(AVG(t2.DaysDifference), 7)
FROM #TmpTbl_SKUs t2
WHERE t2.SKU=t1.SKU
GROUP BY t2.ChampVendor, t2.VendorNumber, t2.SKU
)
)
FROM #TmpTbl_SKUs t1
GROUP BY t1.SKU
Keep playing with this. I somewhat have what I got, but just don't understand how I would check if it has multiple records, and how to check at a vendor level.
Try this:
EDITED: added NULLIF(..., 0) to treat 0s as NULLs.
SELECT
t1.SKU,
COALESCE(
NULLIF(AVG(t1.DaysDifference), 0),
NULLIF(t2.AvgDifferenceVendor, 0),
7
) AS AvgDiff
FROM #TmpTbl_SKUs t1
INNER JOIN (
SELECT Vendor, AVG(DaysDifference) AS AvgDifferenceVendor
FROM #TmpTbl_SKUs
GROUP BY Vendor
) t2 ON t1.Vendor = t2.Vendor
GROUP BY t1.SKU, t2.AvgDifferenceVendor
EDIT 2: how I tested the script.
For testing I'm using the sample data posted with the question.
DECLARE #TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
INSERT INTO #TmpTbl_SKUs
(Vendor, Number, SKU, PurchaseOrderDate, LastReceivedDate, DaysDifference)
SELECT 'OTHER PMDD', '1111', 'OP1111', '2009-08-21 00:00:00.000', '2009-09-02 00:00:00.000', 12
UNION ALL
SELECT 'OTHER PMDD', '1111', 'OP1112', '2009-12-09 00:00:00.000', '2009-12-17 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1111', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1112', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8;
First I'm running the script on the unmodified data. Here's the result:
SKU AvgDiff
-------------------- -----------
MA1111 8
MA1112 8
OP1111 12
OP1112 8
AvgDiff for every SKU is identical to the original DaysDifference for every SKU, because there's only one row per each one.
Now I'm changing DaysDifference for SKU='MA1111' to 0 and running the script again. Ther result is:
SKU AvgDiff
-------------------- -----------
MA1111 4
MA1112 8
OP1111 12
OP1112 8
Now AvgDiff for MA1111 is 4. Why? Because the average for the SKU is 0, and so the average by Vendor is taken, which has been calculated as (0 + 8) / 2 = 4.
Next step is to set DaysDifference to 0 for all the SKUs of the same Vendor. In this case I'm setting it for SKUs MA1111 and MA1112. Here's the result of the script for this change:
SKU AvgDiff
-------------------- -----------
MA1111 7
MA1112 7
OP1111 12
OP1112 8
So now AvgDiff is 7 for both MA1111 and MA1112. How has it become so? Both have DaysDifference = 0. That means that the average by Vendor should be taken for each one. But Vendor average is 0 too in this case. According to the requirement, the average here should default to 7, which is what the script has returned.
So the script seems to be working correctly. I understand that it's either me having missed something or you having forgotten to mention some details. In any case, I would be glad to see where this script fails to solve your problem.