TSQL Distribute one bucket over several other buckets - tsql

I've been pounding my head on this one for two days now.
Here's my issue:
I have 18 buckets. One bucket has a negative value. I need to distribute the bucket with the negative value across the other 17 buckets as a percent of total for the 17 buckets. I can do this in Excel, but I need to do it in T-SQL without any hard coding because this is going to be used in a stored procedure.
Here's my data (Bucket and Amount) and my results from Excel (Pct of Total, Distribution and New Amount):
BUCKET AMOUNT [Pct of Total] Distribution [New Amount]
1 $174,130.91 9.5384% $(281.49) $173,849.41
2 $54,274.13 2.9730% $(87.74) $54,186.39
3 $150,637.86 8.2515% $(243.51) $150,394.34
4 $389,910.65 21.3581% $(630.31) $389,280.34
5 $379,177.75 20.7702% $(612.96) $378,564.79
6 $79,230.40 4.3400% $(128.08) $79,102.32
7 $47,008.64 2.5750% $(75.99) $46,932.64
8 $47,224.95 2.5868% $(76.34) $47,148.60
9 $102,731.42 5.6273% $(166.07) $102,565.35
10 $8,955.93 0.4906% $(14.48) $8,941.45
11 $43,749.52 2.3965% $(70.72) $43,678.80
12 $16,140.85 0.8841% $(26.09) $16,114.76
13 $72,165.14 3.9530% $(116.66) $72,048.48
14 $23,542.26 1.2896% $(38.06) $23,504.21
15 $874.82 0.0479% $(1.41) $873.41
16 $65,665.10 3.5969% $(106.15) $65,558.95
17 $170,162.38 9.3210% $(275.08) $169,887.30
18 $(2,951.15)
Total $1,822,631.55 100.0000% $(2,951.15) $1,822,631.55

Here is an example of how you can manage this:
;with neg as(select sum(amount) as amount from t where amount < 0),
pos as(select * from t where amount >= 0)
select *,
p.amount * 100 / sum(p.amount) over() as pct,
neg.amount * p.amount / sum(p.amount) over() as dist,
p.amount + neg.amount * p.amount / sum(p.amount) over() as new
from pos p
cross join neg
Fiddle here: http://sqlfiddle.com/#!3/0cdd7/4
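The arithmetic behind the query can also be checked outside the database. The following is a minimal Python sketch, not part of the original answer, that applies the same rule to the amounts listed in the question. The variable names (neg_total, pos_total, dist, new_amount) are illustrative only.

```python
# Amounts from the question, keyed by bucket number.
amounts = {
    1: 174130.91, 2: 54274.13, 3: 150637.86, 4: 389910.65,
    5: 379177.75, 6: 79230.40, 7: 47008.64, 8: 47224.95,
    9: 102731.42, 10: 8955.93, 11: 43749.52, 12: 16140.85,
    13: 72165.14, 14: 23542.26, 15: 874.82, 16: 65665.10,
    17: 170162.38, 18: -2951.15,
}

# Same split as the two CTEs: the total of the negative buckets,
# and the rows with non-negative amounts.
neg_total = sum(a for a in amounts.values() if a < 0)
pos = {b: a for b, a in amounts.items() if a >= 0}
pos_total = sum(pos.values())

# Each non-negative bucket absorbs its share of the negative total,
# in proportion to its share of pos_total.
dist = {b: neg_total * a / pos_total for b, a in pos.items()}
new_amount = {b: a + dist[b] for b, a in pos.items()}

for b in sorted(pos):
    print(b, round(100 * pos[b] / pos_total, 4), round(dist[b], 2), round(new_amount[b], 2))
```

Because the shares sum to 1, the distributed amounts add up to the negative total, so the grand total is unchanged after redistribution.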


Taking N-samples from each group in PostgreSQL

I have a table containing data that has a column named id that looks like below:
id   value 1   value 2   value 3
1    244       550       1000
1    251       551       700
1    540       60        1200
...  ...       ...       ...
2    19        744       2000
2    10        903       100
2    44        231       600
2    120       910       1100
...  ...       ...       ...
I want to take 50 sample rows per id that exists, but if a group has fewer than 50 rows, I simply want to take all of its data points.
For example, I would like a maximum of 50 data points randomly selected from id = 1, id = 2, etc.
I cannot find any previous questions similar to this, but I have tried to at least work through the logic of a solution where I could iterate over the ids, limit each query to 50 rows, and union all the results:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that you cannot use this type of solution because UNION ALL requires aggregating outputs from one id to the next and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and union all results or is there a more optimal way this could be done?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
select *, row_number() over (partition by id order by random()) as u
from schema.table
) as a
where u <= 50;
Example (limiting to 3, and some row number for each id so you can see the selection randomness):
setup
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection
select * from (
select *, row_number() over (partition by id order by random()) as u
from foo
) as a
where u <= 3;
Output:
id   value1   idrow   u
1    542      6       1
1    24       86      2
1    155      74      3
2    505      95      1
2    100      46      2
2    422      33      3
3    966      88      1
3    747      89      2
3    664      19      3
If you want 50 (or fewer) rows from each group of ids, you can use a window function.
From the question: "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query:
with data as (
select row_number() over (partition by id order by random()) rn,
* from table_name)
select * from data where rn<=50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.* from uniq cross join lateral
(select * from test where test.id=uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.
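Both SQL approaches implement the same rule: within each id, shuffle the rows and keep at most N of them. As a rough illustration of that rule only (not of either query plan), here is a small Python sketch. The function name sample_per_group and the sample rows are hypothetical.

```python
import random
from collections import defaultdict

def sample_per_group(rows, key, n, rng=random):
    """Return up to n randomly chosen rows for each distinct value of key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sampled = []
    for group_rows in groups.values():
        # min() keeps the whole group when it has fewer than n rows.
        sampled.extend(rng.sample(group_rows, min(n, len(group_rows))))
    return sampled

# Hypothetical data: id 1 has 100 rows, id 2 has only 3.
rows = [{"id": 1, "value1": v} for v in range(100)]
rows += [{"id": 2, "value1": v} for v in range(3)]

picked = sample_per_group(rows, "id", 50)
counts = defaultdict(int)
for row in picked:
    counts[row["id"]] += 1
print(dict(counts))
```

The min() call plays the role of the `u <= 50` filter and of `limit`: small groups are returned whole instead of raising an error.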

Calculate lift for every row and compare it with the average lift of region and year

I have a table that looks like the one below:
Shop | Year | Region | Waste | Avg Waste (Year,Region) | Lift | Column_I_want_To_Calculate (apply case when statements) CASE WHEN Lift > Avg(Lift) OVER (PARTITION BY YEAR, REGION) THEN 1 ELSE 0 END
a | 2021 | CA | 10 | 15 => (10+20)/2 | 0.67 => 10/15 | 0.67 < (0.67+1.34)/2 = 1.005 THEN 0
b | 2021 | CA | 20 | 15 => (10+20)/2 | 1.34 => 20/15 | 1.34 > (0.67+1.34)/2 = 1.005 THEN 1
c | 2021 | FL | 8 | 8 => 8/1 | 8/8 | 8 = 8 THEN 0
d | 2020 | LA | 25 | 22 => (25+19)/2 | 0.88 => 25/22 | 0.88 > (0.88+0.87)/2 = 0.875 THEN 1
e | 2020 | LA | 19 | 22 => (25+19)/2 | 0.87 => 19/22 | 0.87 < (0.88+0.87)/2 = 0.875 THEN 0
f | 2019 | NY | 35 | 35 | 35/35 | 35 = 35 THEN 0
So far I have calculated the columns Shop, Year, Region, Waste, Avg Waste (Year, Region), Lift. I want to calculate the one marked as Column_I_want_To_Calculate.
Briefly, it computes the average lift per Region and Year and compares Shops' Lift with the Average Lift of all shops in the same Region and Year. Then assigns the value 1 or 0 in case of a greater than statement.
So far I have tried (PostgreSQL),
SELECT shop
,year
,region
,waste
,AVG(waste) over (partition by year, region) as "Avg Waste (Year,Region)"
,waste/avg(waste) over (partition by year, region) AS Lift,
,CASE WHEN waste/avg(waste) over (partition by year, region) >
(SELECT tab2.avg_lift
FROM (
SELECT tab1.year, tab1.region, AVG(tab1.lift) OVER (PARTITION BY tab1.year, tab1.region) avg_lift
FROM (
SELECT year, region, waste/ avg(waste) over (partition by year, region) AS lift
FROM main_table
GROUP BY year,region,waste
ORDER BY lift DESC
) tab1
GROUP BY tab1.year, tab1.region, tab1.lift
) tab2
) THEN 1 ELSE 0 END AS "Column_I_want_To_Calculate"
FROM main_table
GROUP BY shop,
year,
nomos,
waste
;
However, the code above throws the exception
postgresql error: more than one row returned by a subquery used as an expression
This one returns the required output based on your input:
SELECT
region
, shop
, waste
, round(AVG(waste) OVER w,2) AS avg_waste
, round(waste / AVG(waste) OVER w,2) AS lift
, CASE
WHEN waste > AVG(waste) OVER w THEN 1
ELSE 0
END AS above_average
FROM i
WINDOW w AS (PARTITION BY year, region)
ORDER BY
1,2,3;
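Note that the CASE expression compares waste with the average waste, not lift with the average lift. Assuming the average waste of a group is positive, the two comparisons give the same flag: every lift in a group is waste divided by the same average, so the lifts in that group average to 1. The Python sketch below is only a check of that reasoning on the shops from the question; the variable names are illustrative.

```python
from collections import defaultdict

# (shop, year, region, waste) rows from the question.
rows = [
    ("a", 2021, "CA", 10), ("b", 2021, "CA", 20), ("c", 2021, "FL", 8),
    ("d", 2020, "LA", 25), ("e", 2020, "LA", 19), ("f", 2019, "NY", 35),
]

# Collect the waste values of each (year, region) partition.
groups = defaultdict(list)
for shop, year, region, waste in rows:
    groups[(year, region)].append(waste)

flag_by_waste = {}
flag_by_lift = {}
for shop, year, region, waste in rows:
    wastes = groups[(year, region)]
    avg_waste = sum(wastes) / len(wastes)
    lift = waste / avg_waste
    # Average lift of the partition; equals 1 up to floating-point rounding.
    avg_lift = sum(w / avg_waste for w in wastes) / len(wastes)
    flag_by_waste[shop] = 1 if waste > avg_waste else 0
    flag_by_lift[shop] = 1 if lift > avg_lift else 0

print(flag_by_waste)
```

With real data, ties that differ only by floating-point rounding could land on either side of the lift comparison, which is one more reason to compare the raw waste values.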

forward rolling sum with different stopping points by row

First, some sample data so the business problem can be explained -
select
ItemID = 276,
Quantity,
Bucket,
DaysInMonth = day(eomonth(Bucket)),
DailyQuantity = cast(Quantity * 1.0 / day(eomonth(Bucket)) as decimal(4, 0)),
DaysFactor
into #data
from
(
values
('1/1/2021', 95, 5500),
('2/1/2021', 75, 6000),
('3/1/2021', 80, 5000),
('4/1/2021', 82, 5300),
('5/1/2021', 90, 5200),
('6/1/2021', 80, 6500),
('7/1/2021', 85, 6100),
('8/1/2021', 90, 5100),
('9/1/2021', null, 5800),
('10/1/2021', null, 5900)
) d (Bucket, DaysFactor, Quantity);
select * from #data;
Now, the business problem -
The first row has a DaysFactor of 95.
The forward rolling sum for this row is calculated as
(31 x 177) + (28 x 214) + (31 x 161) + (5 x 177) = 17,355
That is...
the daily quantity for all 31 days of the 1/1/2021 bucket plus
the daily quantity for all 28 days of the 2/1/2021 bucket plus
the daily quantity for all 31 days of the 3/1/2021 bucket plus
the daily quantity for 5 days of the 4/1/2021 bucket.
This results in 95 days of forward looking quantity.
95 days = 31 + 28 + 31 + 5
For the second row, with a DaysFactor of 75, it would start with daily quantity for the 28 days in the 2/1/2021 bucket and go out until a total of 75 days' worth of quantity were summed, like so:
(28 x 214) + (31 x 161) + (16 x 177) = 13,815
75 days = 28 + 31 + 16
One approach to this is building a calendar of daily demand and then summing quantity over the specified days. However, I'm stuck on how to do the summing. Here is the code that builds the calendar with daily quantities:
with
dates as
(
select
FirstDay = min(cast(Bucket as date)),
LastDay = eomonth(max(cast(Bucket as date)))
from #data
),
tally as (
select top (select datediff(d, FirstDay, LastDay) + 1 from dates) --restrict to number of rows equal to number of days between first and last days
n = row_number() over(order by (select null)) - 1
from sys.messages
),
calendar as (
select
Bucket = dateadd(d, t.n, d.FirstDay)
from tally t
cross join dates d
)
select
c.Bucket,
d.DailyQuantity
from #data d
inner join calendar c
on year(d.Bucket) = year(c.Bucket)
and month(d.Bucket) = month(c.Bucket);
Here's a screenshot of a subset of rows from this query:
I was hoping to use T-SQL's LEAD() to do this but don't see a way to put the DaysFactor into the ROWS clause within OVER(). Is there a way to do that? If not, is there a set based approach to calculating the rolling forward sum?
Expected result set:
Figured it out using an approach different than LEAD(). This column was added to #data:
BucketEnd = cast(dateadd(d, DaysFactor - 1, Bucket) as date)
Then code that builds the calendar with daily quantities shown in original question was put into a temp table called #calendar.
Then this query performs the calculations:
select
d.ItemID,
d.Bucket,
RollingForwardQuantitySum = sum(iif(c.Bucket between d.Bucket and d.BucketEnd, c.DailyQuantity, null))
from #data d
cross join #calendar c
group by
d.ItemID,
d.Bucket
order by
d.ItemID,
cast(d.Bucket as date);
The output from this query matches the expected result set screen shot in the original post.
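The calendar approach can be cross-checked against the two worked examples in the question (17,355 and 13,815). The following Python sketch mirrors the same idea under a few assumptions: the names daily and rolling_forward_sum are made up for illustration, and Python's round() is used where the T-SQL casts to decimal(4, 0) (the two agree for this data because no daily quantity falls exactly on a half).

```python
import calendar
from datetime import date, timedelta

# (bucket start, DaysFactor, Quantity) rows from the question.
data = [
    (date(2021, 1, 1), 95, 5500), (date(2021, 2, 1), 75, 6000),
    (date(2021, 3, 1), 80, 5000), (date(2021, 4, 1), 82, 5300),
    (date(2021, 5, 1), 90, 5200), (date(2021, 6, 1), 80, 6500),
    (date(2021, 7, 1), 85, 6100), (date(2021, 8, 1), 90, 5100),
    (date(2021, 9, 1), None, 5800), (date(2021, 10, 1), None, 5900),
]

# Daily calendar: every day of a bucket's month carries that
# bucket's rounded daily quantity.
daily = {}
for bucket, _, qty in data:
    days_in_month = calendar.monthrange(bucket.year, bucket.month)[1]
    per_day = round(qty / days_in_month)
    for offset in range(days_in_month):
        daily[bucket + timedelta(days=offset)] = per_day

def rolling_forward_sum(bucket, days_factor):
    """Sum daily quantities from bucket through bucket + days_factor - 1 days."""
    if days_factor is None:
        return None  # mirrors the NULL BucketEnd in the T-SQL version
    bucket_end = bucket + timedelta(days=days_factor - 1)
    return sum(q for day, q in daily.items() if bucket <= day <= bucket_end)

results = {bucket: rolling_forward_sum(bucket, factor) for bucket, factor, _ in data}
for bucket, total in results.items():
    print(bucket, total)
```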

While loop to add data for pivot

I currently have a requirement for a table that looks like this:
Instrument Long Short 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 ....
Fixed 41 41 35 35 35 35 35 35 35 53 25 25
Index 16 16 22 22 22 32 12 12 12 12 12 12
Credits 29 29 41 16 16 16 16 16 16 16 16 16
Short term 12 12 5 5 5 5 5 5 5 5 5 17
My worktable looks like the following:
Instrument Long Short Annual Coupon Maturity Date Instrument ID
Fixed 10 10 10 01/01/2025 1
Index 5 5 10 10/05/2016 2
Credits 15 15 16 25/06/2020 3
Short term 12 12 5 31/10/2022 4
Fixed 13 13 15 31/03/2030 5
Fixed 18 18 10 31/01/2019 6
Credits 14 14 11 31/12/2013 7
Index 11 11 12 31/10/2040 8
..... etc
So basically the Long and the Short in the pivot should be the sum over each distinct Instrument ID. Then, for each year, I need to take the sum of each Annual Coupon up to the year of the maturity date, where the Long and the coupon rate are added together.
My thinking was that I would have to create a while loop that populates a table with one record per year for each instrument until the maturity date, so that I could then pivot it using a SQL PIVOT somehow. Does this seem feasible? Are there any other ideas on the best way of doing this? In particular, I might need help with the while loop.
The following solution uses a numbers table to unfold ranges in your table, performs some special processing on some of the data columns in the unfolded set, and finally pivots the results:
WITH unfolded AS (
SELECT
t.Instrument,
Long = SUM(z.Long ) OVER (PARTITION BY Instrument),
Short = SUM(z.Short) OVER (PARTITION BY Instrument),
Year = y.Number,
YearValue = t.AnnualCoupon + z.Long + z.Short
FROM YourTable t
CROSS APPLY (SELECT YEAR(t.MaturityDate)) x (Year)
INNER JOIN numbers y ON y.Number BETWEEN YEAR(GETDATE()) AND x.Year
CROSS APPLY (
SELECT
Long = CASE y.Number WHEN x.Year THEN t.Long ELSE 0 END,
Short = CASE y.Number WHEN x.Year THEN t.Short ELSE 0 END
) z (Long, Short)
),
pivoted AS (
SELECT *
FROM unfolded
PIVOT (
SUM(YearValue) FOR Year IN ([2013], [2014], [2015], [2016], [2017], [2018], [2019], [2020],
[2021], [2022], [2023], [2024], [2025], [2026], [2027], [2028], [2029], [2030],
[2031], [2032], [2033], [2034], [2035], [2036], [2037], [2038], [2039], [2040])
) p
)
SELECT *
FROM pivoted
;
It returns results for a static range of years. To use it for a dynamically calculated year range, you'll first need to prepare the list of years as a CSV string, something like this:
DECLARE @columnlist nvarchar(max), @sql nvarchar(max);
SET @columnlist = STUFF(
(
SELECT ', [' + CAST(Number AS varchar(10)) + ']'
FROM numbers
WHERE Number BETWEEN YEAR(GETDATE())
AND (SELECT YEAR(MAX(MaturityDate)) FROM YourTable)
ORDER BY Number
FOR XML PATH ('')
),
1, 2, ''
);
then put it into the dynamic SQL version of the query:
SET @sql = N'
WITH unfolded AS (
...
PIVOT (
SUM(YearValue) FOR Year IN (' + @columnlist + ')
) p
)
SELECT *
FROM pivoted;
';
and execute the result:
EXECUTE(@sql);
You can try this solution at SQL Fiddle.

TSQL cumulative column from previous row [duplicate]

I need to use values from the previous row in order to generate a cumulative value, as shown below. For each Code, the starting Base for the year 2000 is always 100.
I need to achieve this using T-SQL code.
id Code Yr Rate Base
1 4 2000 5 100
2 4 2001 7 107 (100+7)
3 4 2002 4 111 (107+4)
4 4 2003 8 119 (111+8)
5 4 2004 10 129 (119+10)
6 5 2000 2 100
7 5 2001 3 103 (100+3)
8 5 2002 8 111 (103+8)
9 5 2003 5 116 (111+5)
10 5 2004 4 120 (116+4)
OK. We have a table like this:
CREATE Table MyTbl(id INT PRIMARY KEY IDENTITY(1,1), Code INT, Yr INT, Rate INT)
And we would like to calculate a cumulative value by Code.
So we can use a query like one of these:
1) Recursion (requires more resources, but outputs the result as in the example)
with cte as
(SELECT *, ROW_NUMBER()OVER(PARTITION BY Code ORDER BY Yr ASC) rn
FROM MyTbl),
recursion as
(SELECT id,Code,Yr,Rate,rn, CAST(NULL as int) as Tmp_base, CAST('100' as varchar(25)) AS Base FROM cte
WHERE rn=1
UNION ALL
SELECT cte.id,cte.Code,cte.Yr,cte.Rate,cte.rn,
CAST(recursion.Base as int),
CAST(recursion.Base+cte.Rate as varchar(25))
FROM recursion JOIN cte ON recursion.Code=cte.Code AND recursion.rn+1=cte.rn
)
SELECT id,Code,Yr,Rate,
CAST(Base as varchar(10))+ISNULL(' ('+ CAST(Tmp_base as varchar(10))+'+'+CAST(Rate as varchar(10))+')','') AS Base
FROM recursion
ORDER BY 1
OPTION(MAXRECURSION 0)
2) Or we can use a faster query without recursion. It cannot generate strings like '107 (100+7)', only values like '107'.
SELECT *,
100 +
(SELECT ISNULL(SUM(a.Rate),0) /*sum of the rates added so far*/
FROM MyTbl AS a
WHERE
a.Code=b.Code /*same Code as the row in the main query*/
AND a.Yr<=b.Yr /*up to and including the current year*/
AND a.Yr>2000 /*the Rate for 2000 is not added, because the Base for 2000 is fixed at 100*/
) AS base
FROM MyTbl AS b
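The expected Base column from the question can be reproduced with a short loop, which is a handy reference when checking either query. This is only a Python sketch of the rule as stated in the question (the first year of each Code starts at 100, every later year adds its own Rate); the names base_by_id and current are illustrative.

```python
# (id, Code, Yr, Rate) rows from the question.
rows = [
    (1, 4, 2000, 5), (2, 4, 2001, 7), (3, 4, 2002, 4), (4, 4, 2003, 8), (5, 4, 2004, 10),
    (6, 5, 2000, 2), (7, 5, 2001, 3), (8, 5, 2002, 8), (9, 5, 2003, 5), (10, 5, 2004, 4),
]

base_by_id = {}
current = {}  # last Base seen for each Code

# Process each Code in year order, like PARTITION BY Code ORDER BY Yr.
for row_id, code, yr, rate in sorted(rows, key=lambda r: (r[1], r[2])):
    if code not in current:
        current[code] = 100      # first year of each Code starts at 100
    else:
        current[code] += rate    # later years add their own Rate
    base_by_id[row_id] = current[code]

for row_id in sorted(base_by_id):
    print(row_id, base_by_id[row_id])
```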