DAX Equivalent to T-SQL AVG OVER(PARTITION BY) - tsql

Apologies if this is a simple thing to achieve but after reading several similar posts, I cannot seem to find the right answer.
What I am basically trying to do is replicate the functionality of calculating an average over a group of records.
Below is a quick bit of SQL to demonstrate what I want to get to.
DECLARE @T TABLE(CountryID int, CategoryID int, ProductID int, Price float)
INSERT INTO @T VALUES
(1,20, 300, 10),
(1,20, 301, 11),
(1,20, 302, 12),
(1,20, 303, 13),
(1,30, 300, 21),
(1,30, 300, 22),
(1,30, 300, 23),
(1,30, 300, 24),
(2,20, 300, 5),
(2,20, 301, 6),
(2,20, 302, 7),
(2,20, 303, 8),
(2,30, 300, 9),
(2,30, 300, 8),
(2,30, 300, 7),
(2,30, 300, 6)
SELECT
*
, AVG(Price) OVER(PARTITION BY CountryID, CategoryID) AS AvgPerCountryCategory
FROM @T
Which gives me the results I require ...
CountryID CategoryID ProductID Price AvgPerCountryCategory
1 20 300 10 11.5
1 20 301 11 11.5
1 20 302 12 11.5
1 20 303 13 11.5
1 30 300 21 22.5
1 30 300 22 22.5
1 30 300 23 22.5
1 30 300 24 22.5
2 20 300 5 6.5
2 20 301 6 6.5
2 20 302 7 6.5
2 20 303 8 6.5
2 30 300 9 7.5
2 30 300 8 7.5
2 30 300 7 7.5
2 30 300 6 7.5
As you can see, each row now shows the average Price for the respective Country/Category. At a later stage this will be used to calculate a variance from this average, but for now I'd just like to get to this point and try to work out the next steps myself.
So what would be the equivalent of AVG(Price) OVER(PARTITION BY CountryID, CategoryID) in DAX?
The plan is that the result will also take into account any filters that are applied to the data in Power BI. I'm not sure if this is important at this stage. However, it does mean that doing this work in SQL is probably not an option.
I'm very new to DAX, so an explanation of any suggested expression would also be very welcome.

You can create a new calculated column that gives you this as follows:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALLEXCEPT('#T', '#T'[CountryID], '#T'[CategoryID]))
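This takes the average over all rows whose CountryID and CategoryID match the values in the current row. In a calculated column, CALCULATE converts the current row into a filter context (context transition), and ALLEXCEPT then removes every filter on the table except those on CountryID and CategoryID.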
This is equivalent to this version:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALL('#T'[ProductID], '#T'[Price]))
This time we're telling it which column filters to remove rather than which to keep. The two versions match here because ProductID and Price are the only other columns in the table.
Another way is to remove all filters and then add back the ones you want explicitly:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALL('#T'),
'#T'[CountryID] = EARLIER('#T'[CountryID]),
'#T'[CategoryID] = EARLIER('#T'[CategoryID]))
The EARLIER function returns the column value from the outer row context, which here is the current row of the calculated column.
Edit:
The code above is written for calculated columns. For a measure, I'd recommend:
AvgPerCountryCategory =
CALCULATE (
AVERAGE ( '#T'[Price] ),
ALLSELECTED ( '#T' ),
SUMMARIZE (
'#T',
'#T'[CategoryID],
'#T'[CountryID]
)
)
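As a cross-check of what the partitioned average should return, here is a minimal Python sketch (not DAX, and not part of the original answer) that groups the question's sample rows by CountryID and CategoryID and attaches the group average to every row. The names `rows` and `avg_per_partition` are made up for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (CountryID, CategoryID, ProductID, Price)
rows = [
    (1, 20, 300, 10), (1, 20, 301, 11), (1, 20, 302, 12), (1, 20, 303, 13),
    (1, 30, 300, 21), (1, 30, 300, 22), (1, 30, 300, 23), (1, 30, 300, 24),
    (2, 20, 300, 5), (2, 20, 301, 6), (2, 20, 302, 7), (2, 20, 303, 8),
    (2, 30, 300, 9), (2, 30, 300, 8), (2, 30, 300, 7), (2, 30, 300, 6),
]


def avg_per_partition(rows):
    """Return each row extended with the average Price of its (CountryID, CategoryID) group."""
    prices = defaultdict(list)
    for country, category, _product, price in rows:
        prices[(country, category)].append(price)
    # One average per partition, like AVG(...) OVER (PARTITION BY ...)
    averages = {key: sum(vals) / len(vals) for key, vals in prices.items()}
    # Every row keeps its own columns and gains the average of its partition
    return [row + (averages[(row[0], row[1])],) for row in rows]


result = avg_per_partition(rows)
for row in result:
    print(row)
```

The row count is unchanged (16 rows in, 16 rows out), which is the property that distinguishes a windowed average from a GROUP BY.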

Related

forward rolling sum with different stopping points by row

First, some sample data so the business problem can be explained -
select
ItemID = 276,
Quantity,
Bucket,
DaysInMonth = day(eomonth(Bucket)),
DailyQuantity = cast(Quantity * 1.0 / day(eomonth(Bucket)) as decimal(4, 0)),
DaysFactor
into #data
from
(
values
('1/1/2021', 95, 5500),
('2/1/2021', 75, 6000),
('3/1/2021', 80, 5000),
('4/1/2021', 82, 5300),
('5/1/2021', 90, 5200),
('6/1/2021', 80, 6500),
('7/1/2021', 85, 6100),
('8/1/2021', 90, 5100),
('9/1/2021', null, 5800),
('10/1/2021', null, 5900)
) d (Bucket, DaysFactor, Quantity);
select * from #data;
Now, the business problem -
The first row has a DaysFactor of 95.
The forward rolling sum for this row is calculated as
(31 x 177) + (28 x 214) + (31 x 161) + (5 x 177) = 17,355
That is...
the daily quantity for all 31 days of the 1/1/2021 bucket plus
the daily quantity for all 28 days of the 2/1/2021 bucket plus
the daily quantity for all 31 days of the 3/1/2021 bucket plus
the daily quantity for 5 days of the 4/1/2021 bucket.
This results in 95 days of forward looking quantity.
95 days = 31 + 28 + 31 + 5
For the second row, with a DaysFactor of 75, it would start with daily quantity for the 28 days in the 2/1/2021 bucket and go out until a total of 75 days' worth of quantity were summed, like so:
(28 x 214) + (31 x 161) + (16 x 177) = 13,815
75 days = 28 + 31 + 16
One approach to this is building a calendar of daily demand and then summing quantity over the specified days. However, I'm stuck on how to do the summing. Here is the code that builds the calendar with daily quantities:
with
dates as
(
select
FirstDay = min(cast(Bucket as date)),
LastDay = eomonth(max(cast(Bucket as date)))
from #data
),
tally as (
select top (select datediff(d, FirstDay, LastDay) + 1 from dates) --restrict to number of rows equal to number of days between first and last days
n = row_number() over(order by (select null)) - 1
from sys.messages
),
calendar as (
select
Bucket = dateadd(d, t.n, d.FirstDay)
from tally t
cross join dates d
)
select
c.Bucket,
d.DailyQuantity
from #data d
inner join calendar c
on year(d.Bucket) = year(c.Bucket)
and month(d.Bucket) = month(c.Bucket);
Here's a screenshot of a subset of rows from this query:
I was hoping to use T-SQL's LEAD() to do this but don't see a way to put the DaysFactor into the ROWS clause within OVER(). Is there a way to do that? If not, is there a set based approach to calculating the rolling forward sum?
Expected result set:
Figured it out using an approach different than LEAD(). This column was added to #data:
BucketEnd = cast(dateadd(d, DaysFactor - 1, Bucket) as date)
Then the code that builds the calendar with daily quantities, shown in the original question, was put into a temp table called #calendar.
Then this query performs the calculations:
select
d.ItemID,
d.Bucket,
RollingForwardQuantitySum = sum(iif(c.Bucket between d.Bucket and d.BucketEnd, c.DailyQuantity, null))
from #data d
cross join #calendar c
group by
d.ItemID,
d.Bucket
order by
d.ItemID,
cast(d.Bucket as date);
The output from this query matches the expected result set screenshot in the original post.
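The two worked figures in the question (17,355 for the 95-day row and 13,815 for the 75-day row) can be reproduced with a small Python sketch of the same calendar idea: expand each bucket into days, then sum the daily quantity over the next DaysFactor days. This is only an illustration. The function names are invented here, and `round()` stands in for the `cast(... as decimal(4, 0))` rounding, which is an assumption that holds for this sample data.

```python
import calendar
from datetime import date, timedelta

# (Bucket, DaysFactor, Quantity) from the question's sample data
buckets = [
    (date(2021, 1, 1), 95, 5500),
    (date(2021, 2, 1), 75, 6000),
    (date(2021, 3, 1), 80, 5000),
    (date(2021, 4, 1), 82, 5300),
    (date(2021, 5, 1), 90, 5200),
    (date(2021, 6, 1), 80, 6500),
    (date(2021, 7, 1), 85, 6100),
    (date(2021, 8, 1), 90, 5100),
    (date(2021, 9, 1), None, 5800),
    (date(2021, 10, 1), None, 5900),
]


def build_daily_calendar(buckets):
    """Map every day of each bucket month to that month's rounded daily quantity."""
    daily = {}
    for start, _days_factor, quantity in buckets:
        days_in_month = calendar.monthrange(start.year, start.month)[1]
        per_day = round(quantity / days_in_month)
        for offset in range(days_in_month):
            daily[start + timedelta(days=offset)] = per_day
    return daily


def rolling_forward_sum(start, days_factor, daily):
    """Sum daily quantities for days_factor days beginning at start (None if no factor)."""
    if days_factor is None:
        return None
    # Days outside the calendar contribute nothing, as in the join-based query
    return sum(daily.get(start + timedelta(days=i), 0) for i in range(days_factor))


daily = build_daily_calendar(buckets)
sums = {start: rolling_forward_sum(start, days, daily) for start, days, _qty in buckets}
print(sums[date(2021, 1, 1)], sums[date(2021, 2, 1)])
```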

t-sql function like "filter" for sum(x) filter(condition) over (partition by

I'm trying to sum a window with a filter. I saw something similar to
sum(x) filter(condition) over (partition by...)
but it does not seem to work in t-sql, SQL Server 2017.
Essentially, I want to sum the last 5 rows that have a condition on another column.
I've tried
sum(case when condition...) over (partition...)
and sum(cast(nullif(x))) over (partition...).
I've tried left joining the table with a where condition to filter out the condition.
All of the above take the last 5 rows counting back from the current row and only then apply the condition, so rows that fail the condition still use up slots in the window.
What I want is, starting from the current row, to add up the last 5 values that meet the condition.
Date  Value  Condition  Result
1-1   10     1
1-2   11     1
1-3   12     1
1-4   13     1
1-5   14     0
1-6   15     1
1-7   16     0
1-8   17     0          sum(15+13+12+11+10)
1-9   18     1          sum(18+15+13+12+11)
1-10  19     1          sum(19+18+15+13+12)
In the above example the condition I would want would be 1, ignoring the 0 but still having the "window" size be 5 non-0 values.
This can easily be achieved using a correlated sub query:
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
[Date] Date,
[Value] int,
Condition bit
)
INSERT INTO @T ([Date], [Value], Condition) VALUES
('2019-01-01', 10, 1),
('2019-01-02', 11, 1),
('2019-01-03', 12, 1),
('2019-01-04', 13, 1),
('2019-01-05', 14, 0),
('2019-01-06', 15, 1),
('2019-01-07', 16, 0),
('2019-01-08', 17, 0),
('2019-01-09', 18, 1),
('2019-01-10', 19, 1)
The query:
SELECT [Date], [Value], Condition,
(
SELECT Sum([Value])
FROM
(
SELECT TOP 5 [Value]
FROM @T AS t1
WHERE Condition = 1
AND t1.[Date] <= t0.[Date]
-- If you want the sum to appear starting from a specific date, uncomment the next row
--AND t0.[Date] > '2019-01-07'
ORDER BY [Date] DESC
) As t2
HAVING COUNT(*) = 5 -- return a sum only when 5 rows meet the condition
) As Result
FROM @T As t0
Results:
Date Value Condition Result
2019-01-01 10 1
2019-01-02 11 1
2019-01-03 12 1
2019-01-04 13 1
2019-01-05 14 0
2019-01-06 15 1 61
2019-01-07 16 0 61
2019-01-08 17 0 61
2019-01-09 18 1 69
2019-01-10 19 1 77
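The logic of the correlated subquery can be cross-checked outside SQL. The following is a minimal Python sketch, with an invented function name, that keeps the values of qualifying rows in date order and sums the 5 most recent ones for every row, returning None until 5 qualifying rows exist (the role played by HAVING COUNT(*) = 5).

```python
# (Date, Value, Condition) rows from the sample table
rows = [
    ("2019-01-01", 10, 1),
    ("2019-01-02", 11, 1),
    ("2019-01-03", 12, 1),
    ("2019-01-04", 13, 1),
    ("2019-01-05", 14, 0),
    ("2019-01-06", 15, 1),
    ("2019-01-07", 16, 0),
    ("2019-01-08", 17, 0),
    ("2019-01-09", 18, 1),
    ("2019-01-10", 19, 1),
]


def sum_last_n_matching(rows, n=5):
    """For each row, sum the n most recent values (up to and including it) whose condition is 1."""
    results = []
    matching = []  # values of qualifying rows seen so far, in date order
    for _date, value, condition in sorted(rows):  # ISO date strings sort chronologically
        if condition == 1:
            matching.append(value)
        window = matching[-n:]
        # Only report a sum once n qualifying rows are available
        results.append(sum(window) if len(window) == n else None)
    return results


result = sum_last_n_matching(rows)
print(result)
```

Rows that fail the condition still receive a result; they simply do not contribute a value to the window.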

While loop to add data for pivot

Currently I have a requirement that needs a table to look like this:
Instrument Long Short 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 ....
Fixed 41 41 35 35 35 35 35 35 35 53 25 25
Index 16 16 22 22 22 32 12 12 12 12 12 12
Credits 29 29 41 16 16 16 16 16 16 16 16 16
Short term 12 12 5 5 5 5 5 5 5 5 5 17
My worktable looks like the following:
Instrument Long Short Annual Coupon Maturity Date Instrument ID
Fixed 10 10 10 01/01/2025 1
Index 5 5 10 10/05/2016 2
Credits 15 15 16 25/06/2020 3
Short term 12 12 5 31/10/2022 4
Fixed 13 13 15 31/03/2030 5
Fixed 18 18 10 31/01/2019 6
Credits 14 14 11 31/12/2013 7
Index 11 11 12 31/10/2040 8
..... etc
So basically the Long and the Short in the pivot should be the sum over each distinct instrument ID. Then, for each year, I need to take the sum of each Annual Coupon until the maturity date year, where the Long and the coupon rate are added together.
My thinking was that I had to create a while loop that would populate a table with a record for each year for each instrument until the maturity date, so that I could then pivot it using a SQL PIVOT somehow. Does this seem feasible? Are there any other ideas on the best way of doing this? In particular, I might need help with the while loop.
The following solution uses a numbers table to unfold ranges in your table, performs some special processing on some of the data columns in the unfolded set, and finally pivots the results:
WITH unfolded AS (
SELECT
t.Instrument,
Long = SUM(z.Long ) OVER (PARTITION BY Instrument),
Short = SUM(z.Short) OVER (PARTITION BY Instrument),
Year = y.Number,
YearValue = t.AnnualCoupon + z.Long + z.Short
FROM YourTable t
CROSS APPLY (SELECT YEAR(t.MaturityDate)) x (Year)
INNER JOIN numbers y ON y.Number BETWEEN YEAR(GETDATE()) AND x.Year
CROSS APPLY (
SELECT
Long = CASE y.Number WHEN x.Year THEN t.Long ELSE 0 END,
Short = CASE y.Number WHEN x.Year THEN t.Short ELSE 0 END
) z (Long, Short)
),
pivoted AS (
SELECT *
FROM unfolded
PIVOT (
SUM(YearValue) FOR Year IN ([2013], [2014], [2015], [2016], [2017], [2018], [2019], [2020],
[2021], [2022], [2023], [2024], [2025], [2026], [2027], [2028], [2029], [2030],
[2031], [2032], [2033], [2034], [2035], [2036], [2037], [2038], [2039], [2040])
) p
)
SELECT *
FROM pivoted
;
It returns results for a static range of years. To use it for a dynamically calculated year range, you'll first need to prepare the list of years as a CSV string, something like this:
DECLARE @columnlist nvarchar(max), @sql nvarchar(max);
SET @columnlist = STUFF(
(
SELECT ', [' + CAST(Number AS varchar(4)) + ']'
FROM numbers
WHERE Number BETWEEN YEAR(GETDATE())
AND (SELECT YEAR(MAX(MaturityDate)) FROM YourTable)
ORDER BY Number
FOR XML PATH ('')
),
1, 2, ''
);
then put it into the dynamic SQL version of the query:
SET @sql = N'
WITH unfolded AS (
...
PIVOT (
SUM(YearValue) FOR Year IN (' + @columnlist + ')
) p
)
SELECT *
FROM pivoted;
';
and execute the result:
EXECUTE(@sql);
You can try this solution at SQL Fiddle.
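To make the "unfold, then pivot" idea concrete without a while loop, here is a minimal Python sketch of the same steps. It is an illustration only: it uses three rows from the worktable, a hypothetical `start_year` parameter in place of YEAR(GETDATE()), and it follows the query above in adding Long and Short to the coupon in the maturity year, which may or may not match the intended business rule.

```python
from collections import defaultdict

# (Instrument, Long, Short, AnnualCoupon, maturity year): three rows taken from the worktable
worktable = [
    ("Fixed", 10, 10, 10, 2025),
    ("Index", 5, 5, 10, 2016),
    ("Fixed", 18, 18, 10, 2019),
]


def unfold_and_pivot(worktable, start_year):
    """Expand each row into one entry per year up to maturity, then total by instrument and year."""
    totals = defaultdict(lambda: defaultdict(int))
    for instrument, long_, short, coupon, maturity_year in worktable:
        # The range plays the role of the numbers table join
        for year in range(start_year, maturity_year + 1):
            value = coupon
            if year == maturity_year:
                # Long and Short are added only in the maturity year,
                # mirroring the CASE expressions in the query
                value += long_ + short
            totals[instrument][year] += value
    return totals


pivot = unfold_and_pivot(worktable, start_year=2013)
for instrument, by_year in pivot.items():
    print(instrument, dict(sorted(by_year.items())))
```

Each instrument ends up with one entry per year, which is the shape the PIVOT turns into columns.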

TSQL cumulative column from previous row [duplicate]

Possible Duplicate:
Calculate a Running Total in SqlServer
I need to use values from the previous row in order to generate a cumulative value as shown below. For each Code, the starting Base for the year 2000 is always 100.
I need to achieve this using T-SQL code.
id Code Yr Rate Base
1 4 2000 5 100
2 4 2001 7 107 (100+7)
3 4 2002 4 111 (107+4)
4 4 2003 8 119 (111+8)
5 4 2004 10 129 (119+10)
6 5 2000 2 100
7 5 2001 3 103 (100+3)
8 5 2002 8 111 (103+8)
9 5 2003 5 116 (111+5)
10 5 2004 4 120 (116+4)
Suppose we have a table like this:
CREATE Table MyTbl(id INT PRIMARY KEY IDENTITY(1,1), Code INT, Yr INT, Rate INT)
and we would like to calculate a cumulative value by Code.
We can use queries like these:
1) Recursion (requires more resources, but outputs the result exactly as in the example):
with cte as
(SELECT *, ROW_NUMBER()OVER(PARTITION BY Code ORDER BY Yr ASC) rn
FROM MyTbl),
recursion as
(SELECT id,Code,Yr,Rate,rn, CAST(NULL as int) as Tmp_base, CAST('100' as varchar(25)) AS Base FROM cte
WHERE rn=1
UNION ALL
SELECT cte.id,cte.Code,cte.Yr,cte.Rate,cte.rn,
CAST(recursion.Base as int),
CAST(recursion.Base+cte.Rate as varchar(25))
FROM recursion JOIN cte ON recursion.Code=cte.Code AND recursion.rn+1=cte.rn
)
SELECT id,Code,Yr,Rate,
CAST(Base as varchar(10))+ISNULL(' ('+ CAST(Tmp_base as varchar(10))+'+'+CAST(Rate as varchar(10))+')','') AS Base
FROM recursion
ORDER BY 1
OPTION(MAXRECURSION 0)
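2) Or we can use a faster query without recursion. It cannot produce strings like '107 (100+7)', only the value itself, such as 107. The first year of each Code (2000) starts at 100 and its own Rate is not added, so the subquery sums the rates of the later years up to and including the current one:
SELECT *,
100 +
(SELECT ISNULL(SUM(a.Rate),0) /*we only need the sum in the subquery*/
FROM MyTbl AS a
WHERE
a.Code=b.Code /*the code in the subquery equals the code in the main query*/
AND a.Yr<=b.Yr /*include rates up to and including the current year*/
AND a.Yr>2000 /*the year 2000 row is the starting Base, so its rate is skipped*/
) AS Base
FROM MyTbl AS b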
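The expected Base values listed in the question can be cross-checked with a minimal Python sketch of the running total: within each Code, ordered by year, the first row gets the fixed starting value and every later row adds its own Rate to the previous Base. The function name and the `start` parameter are invented for this illustration.

```python
# (id, Code, Yr, Rate) rows from the question
rows = [
    (1, 4, 2000, 5), (2, 4, 2001, 7), (3, 4, 2002, 4), (4, 4, 2003, 8), (5, 4, 2004, 10),
    (6, 5, 2000, 2), (7, 5, 2001, 3), (8, 5, 2002, 8), (9, 5, 2003, 5), (10, 5, 2004, 4),
]


def cumulative_base(rows, start=100):
    """Return {id: Base}, where Base restarts at `start` for the first year of each Code."""
    bases = {}
    running = {}  # current Base per Code
    for row_id, code, _yr, rate in sorted(rows, key=lambda r: (r[1], r[2])):
        if code not in running:
            running[code] = start  # first year: fixed starting Base, its Rate is not added
        else:
            running[code] += rate  # later years: previous Base plus this row's Rate
        bases[row_id] = running[code]
    return bases


bases = cumulative_base(rows)
print(bases)
```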

SQL Pivot Tables - Copy Excel Functionality

Let's say I have a table like this:
Task Type Variable Hours Duration
One A X 10 5
One A Y 40 15
One B X 100 29
Two A X 5 2
Two B X 15 9
Two A Y 60 17
Three A Y 18 5
Where the combination of task-type-variable makes each row unique.
How can I get a pivot table like the following:
X Y
One A Hours 10 40
Duration 5 15
One B Hours 100 0
Duration 29 0
Two A Hours 5 60
Duration 2 17
Two B Hours 15 0
Duration 9 0
Three A Hours 0 18
Duration 0 5
Is this even possible in SQL? I know Excel can do this.
This is really an UNPIVOT followed by a PIVOT. The following code achieves the desired results in a single query.
DECLARE @t TABLE (
Task varchar(5),
Type char(1),
Variable char(1),
Hours int,
Duration int
)
INSERT INTO @t
VALUES
('One', 'A', 'X', 10, 5),
('One', 'A', 'Y', 40, 15),
('One', 'B', 'X', 100, 29),
('Two', 'A', 'X', 5, 2),
('Two', 'B', 'X', 15, 9),
('Two', 'A', 'Y', 60, 17),
('Three', 'A', 'Y', 18, 5)
SELECT
P.Task,
P.Type,
CAST(P.Property AS varchar(8)) AS Property,
COALESCE(P.X, 0) AS X,
COALESCE(P.Y, 0) AS Y
FROM @t AS T
UNPIVOT (
Value FOR Property IN (
Hours,
Duration
)
) AS U
PIVOT (
SUM(Value) FOR Variable IN (
X,
Y
)
) AS P
This yields the following results.
Task Type Property X Y
----- ---- -------- ----------- -----------
One A Duration 5 15
One A Hours 10 40
One B Duration 29 0
One B Hours 100 0
Three A Duration 0 5
Three A Hours 0 18
Two A Duration 2 17
Two A Hours 5 60
Two B Duration 9 0
Two B Hours 15 0
As you can see, the Hours and Duration are flipped. I don't think there is any way to force an order using PIVOT alone. This could easily be remedied by joining to another table with the Property value with an associated sort order, as long as you had some other way to ensure the other columns sorted correctly first.
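For readers who want to see the two steps separately, here is a minimal Python sketch of the same reshaping: each row is first unpivoted into (Property, Value) pairs, then pivoted on Variable with missing cells defaulting to 0. The sort at the end illustrates the sort-order idea with two lookup dictionaries, `task_order` and `property_order`, which are hypothetical stand-ins for the sort-order table mentioned above.

```python
# (Task, Type, Variable, Hours, Duration) rows from the question
rows = [
    ("One", "A", "X", 10, 5),
    ("One", "A", "Y", 40, 15),
    ("One", "B", "X", 100, 29),
    ("Two", "A", "X", 5, 2),
    ("Two", "B", "X", 15, 9),
    ("Two", "A", "Y", 60, 17),
    ("Three", "A", "Y", 18, 5),
]


def unpivot_then_pivot(rows):
    """Turn Hours/Duration into rows, then spread Variable (X, Y) into columns."""
    pivoted = {}
    for task, type_, variable, hours, duration in rows:
        # UNPIVOT step: one (Property, Value) pair per measure column
        for prop, value in (("Hours", hours), ("Duration", duration)):
            # PIVOT step: sum values into the X or Y cell, defaulting to 0
            cell = pivoted.setdefault((task, type_, prop), {"X": 0, "Y": 0})
            cell[variable] += value
    return pivoted


pivoted = unpivot_then_pivot(rows)

# Hypothetical sort-order lookups, standing in for a joined sort-order table
task_order = {"One": 0, "Two": 1, "Three": 2}
property_order = {"Hours": 0, "Duration": 1}
ordered = sorted(
    pivoted.items(),
    key=lambda kv: (task_order[kv[0][0]], kv[0][1], property_order[kv[0][2]]),
)
for key, cells in ordered:
    print(key, cells)
```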