Can this be done without a loop or cursor - tsql

I have a table that lists individual items and the amount we billed for each. We receive a payment that may be less than the total amount billed. I want to allocate that payment to each item in proportion to the original billed amount.
Here's the tricky part:
Each individual paid amount cannot have fractional cents.
The sum of the individual paid amounts must still add up to the TotalPaid amount.
Setting up the data:
declare @t table
(
id varchar(4) primary key,
Billed money not null
)
insert into @t
(
id,
billed
)
values
( 'A', 5),
( 'B', 3),
( 'C', 2)
declare @TotalPaid money
Set @TotalPaid = 3.33
This way doesn't work
SELECT
ID,
Round(@TotalPaid * Billed / (Select Sum(Billed) from @t), 2)
From
@t
It will return:
A 1.67
B 1.00
C 0.67
-----
3.34 <--- Note the sum doesn't equal the Total Paid
I know I can accomplish this via a cursor or a loop, keeping track of the unallocated amount at each step and ensuring that after the last item the entire TotalPaid amount is allocated.
However, I was hoping there was a way to do this without a loop or cursor.
This is a greatly simplified version of the problem I'm trying to address. The actual data has over 100K rows, and the cursor approach is really slow.

I think this is a viable approach...
(Pass 1 as the third parameter to ROUND so that it always truncates instead of rounding, then distribute the leftover 0.01s that make up the balance to the rows where the difference between the ideal amount and the truncated amount is greatest.)
WITH t1
AS (SELECT *,
billed_adj = @TotalPaid * Billed / Sum(Billed) OVER(),
billed_adj_trunc = ROUND(@TotalPaid * Billed / Sum(Billed) OVER(), 2, 1)
FROM @t)
SELECT id,
billed,
billed_adj_trunc + CASE
WHEN ROW_NUMBER() OVER (ORDER BY billed_adj - billed_adj_trunc DESC)
<= 100 * ( @TotalPaid - SUM(billed_adj_trunc) OVER() )
THEN 0.01
ELSE 0
END
FROM t1
ORDER BY id
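For anyone who wants to check the arithmetic outside SQL, here is a minimal Python sketch of the same truncate-then-distribute idea (often called the largest remainder method). It is not the query above, just an illustration: it assumes amounts are passed as integer cents, and the name `allocate_largest_remainder` is made up for this example.

```python
def allocate_largest_remainder(billed_cents, total_paid_cents):
    """Split total_paid_cents across items in proportion to billed_cents."""
    total_billed = sum(billed_cents.values())
    shares, fractions = {}, {}
    for item_id, amount in billed_cents.items():
        # The quotient is the truncated share in whole cents; the remainder
        # measures the fraction of a cent that truncation threw away.
        shares[item_id], fractions[item_id] = divmod(
            total_paid_cents * amount, total_billed
        )
    leftover = total_paid_cents - sum(shares.values())
    # Give one extra cent to the items that lost the largest fraction.
    by_fraction = sorted(fractions, key=fractions.get, reverse=True)
    for item_id in by_fraction[:leftover]:
        shares[item_id] += 1
    return shares


result = allocate_largest_remainder({"A": 500, "B": 300, "C": 200}, 333)
print(result)  # {'A': 166, 'B': 100, 'C': 67}
```

Working in integer cents avoids any floating-point surprises when comparing the fractional parts.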

Here is a (somewhat complicated) solution using a recursive common table expression
;with cte as (
select
id
, Paid = round(@TotalPaid * Billed / (Select Sum(Billed) from @t), 2,1)
, Remainder = @TotalPaid * Billed / (Select Sum(Billed) from @t)
- round(@TotalPaid * Billed / (Select Sum(Billed) from @t), 2,1)
, x.next_id
from @t t
outer apply (
select top 1 next_id = i.id
from @t as i
where i.id > t.id
order by i.id asc
) x
)
, r_cte as (
--anchor row(s) / starting row(s)
select
id
, Paid
, Remainder
, next_id
from cte t
where not exists (
select 1
from cte as i
where i.id < t.id
)
union all
--recursion starts here
select
c.id
, c.Paid + round(c.Remainder + p.Remainder,2,1)
, Remainder = c.Remainder + p.Remainder - round(c.Remainder + p.Remainder,2,1)
, c.next_id
from cte c
inner join r_cte p
on c.id = p.next_id
)
select id, paid
from r_cte
rextester demo: http://rextester.com/MKLDX88496
returns:
+----+------+
| id | paid |
+----+------+
| A | 1.66 |
| B | 1.00 |
| C | 0.67 |
+----+------+
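The carry-the-remainder idea behind the recursive CTE can be sketched in a few lines of Python. This is only an illustration under assumptions: amounts are integer cents, rows are processed in id order, and `allocate_carry_forward` is a hypothetical name.

```python
def allocate_carry_forward(billed_cents, total_paid_cents):
    """billed_cents is a list of (id, amount) pairs, already in id order."""
    total_billed = sum(amount for _, amount in billed_cents)
    carry = 0  # fraction of a cent carried forward, in units of 1/total_billed
    allocation = []
    for item_id, amount in billed_cents:
        # Add the carried fraction to this row's exact share, keep the whole
        # cents, and pass the new fraction on to the next row.
        paid, carry = divmod(total_paid_cents * amount + carry, total_billed)
        allocation.append((item_id, paid))
    return allocation


rows = [("A", 500), ("B", 300), ("C", 200)]
print(allocate_carry_forward(rows, 333))  # [('A', 166), ('B', 100), ('C', 67)]
```

Because the exact shares sum to the payment, the carry is always zero after the last row, so nothing is left unallocated.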

For something like this you are not going to be able to apply an exact distribution; as you have already shown, the rounding results in the total exceeding the payment received.
You will therefore need to distribute "whatever is left" to the final [Billed] row, so you'll need to do two things:
Determine if the current row is the final row in that group.
Determine how much of the payment has already been distributed.
You don't give much data to work with here, so the following is not ideal, but it is along the lines of what you want:
SELECT
ID,
CASE WHEN lead(billed,1) OVER(ORDER BY (SELECT 1)) IS NULL THEN @TotalPaid - (sum(round(@TotalPaid * Billed / (Select Sum(Billed) from @t),2)) OVER(ORDER BY (SELECT 1) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING))
ELSE round(@TotalPaid * Billed / (Select Sum(Billed) from @t),2)
END AS solution
FROM
@t;
Note that if A, B, C belong to a higher-level key, that key defines the "group", so you would partition the window functions by it. If you could supply some more sample data with additional columns etc., I could maybe come up with a more elegant solution.
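As a rough Python sketch of the "round every row, give whatever is left to the last row" idea (assumptions: integer cents, rows in a fixed order, round-half-up, and a made-up function name):

```python
def allocate_residual_to_last(billed_cents, total_paid_cents):
    """billed_cents is a list of (id, amount) pairs in a fixed order."""
    total_billed = sum(amount for _, amount in billed_cents)
    allocation = []
    allocated = 0
    for item_id, amount in billed_cents[:-1]:
        # Round half up to the nearest cent using integer arithmetic.
        paid = (2 * total_paid_cents * amount + total_billed) // (2 * total_billed)
        allocation.append((item_id, paid))
        allocated += paid
    # The final row absorbs whatever has not been distributed yet.
    last_id, _ = billed_cents[-1]
    allocation.append((last_id, total_paid_cents - allocated))
    return allocation


rows = [("A", 500), ("B", 300), ("C", 200)]
print(allocate_residual_to_last(rows, 333))  # [('A', 167), ('B', 100), ('C', 66)]
```

One thing to keep in mind with this approach: the last row absorbs the accumulated rounding error of all the other rows, which can be up to half a cent per row, so with many rows it can end up noticeably off its proportional share.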

Related

How to Return Records Equal to a Specific Percentage of an Aggregate in Transact-SQL?

My requirement is to provide a random sample of claims that comprise 2.5% of the total amount paid and also comprise 2.5% of total claims for a given population. The goal is to deliver records in a report that meet both criteria. My staging table is defined as follows:
[RecordId] UniqueIdentifier NOT NULL PRIMARY KEY DEFAULT NEWID()
,ClaimNO varchar(50)
,Company_ID varchar(10)
,HPCode varchar(10)
,FinancialResponsibility varchar(30)
,ProviderType varchar(50)
,DateOfService date
,DatePaid date
,ClaimType varchar(50)
,TotalBilled numeric(11,2)
,TotalPaid numeric(11,2)
,ProcessorType varchar(100)
I've already built the logic to return 2.5% of the total number of claims but need guidance on how best to ensure both criteria are met.
Here's what I've tried thus far:
with cteTotals as (
Select Count(*) as TotalClaims, sum(TotalPaid) as TotalPaid, sum(TotalPaid) * .025 as PaidSampleAmount
from [Z_Monthly_Quality_Review]
),
ctePopulation as (
Select *
from [Z_Monthly_Quality_Review]
),
cteSampleRows as (
select TOP 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review]
order by NEWID()
),
cteSamplePaid as (
Select Top 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review] mqr
inner join ctePopulation cte on mqr.ClaimNo = cte.ClaimNO
order by NEWID()
)
Since both criteria must be satisfied, how should I structure both CTEs to ensure this? In my cteSamplePaid, how do I ensure that the sum of total paid equals 2.5% of the total population? Would this be accomplished with a Having clause? The end result will be displayed to my business users via SQL Server Reporting Services. Ideally, I would want to provide them with 1 sample that meets both criteria. If that's not possible, how do I randomly sample claims for each criterion?
I don't think there is a guaranteed way to make a random sample add up to exactly 2.5% of the total. Even trying would perform very poorly, because you would essentially have to brute-force every possible combination of rows. A way to get very close to your goal is to return rows whose sum falls within an acceptable margin of error of the target.
Since no sample data was provided, I just used AdventureWorks2017.
USE AdventureWorks2017
GO
DROP TABLE IF EXISTS #SalesData
SELECT SalesOrderID AS ID,TotalDue
INTO #SalesData
FROM Sales.SalesOrderHeader
Declare @DesiredPercentage Numeric(10,3) = .025 /*Desired sum percentage of total rows*/
,@AcceptableMargin Numeric(10,3) = .01 /*Random row total can be plus or minus this percentage of the desired sum*/
DECLARE @DesiredSum Numeric(16,2) = @DesiredPercentage *(SELECT SUM(TotalDue) FROM #SalesData)
/*For loop*/
DECLARE @RowNum INT
,@LoopCounter INT = 1
WHILE (1=1)
BEGIN
DROP TABLE IF EXISTS #RandomData
SELECT RowNum = ROW_NUMBER() OVER (ORDER BY B.RandID),A.*,RunningTotal = SUM(TotalDue) OVER (ORDER BY B.RandID)
INTO #RandomData
FROM #SalesData AS A
CROSS APPLY (SELECT RandID = NEWID()) AS B
WHERE TotalDue < @DesiredSum /*If single row bigger than desired sum, then filter it out*/
ORDER BY B.RandID
SELECT Top(1) @RowNum = RowNum
FROM #RandomData AS A
CROSS APPLY (SELECT DeltaFromDesiredSum = ABS(RunningTotal-@DesiredSum)) AS B
WHERE RunningTotal BETWEEN @DesiredSum *(1-@AcceptableMargin) AND @DesiredSum *(1+@AcceptableMargin)
ORDER BY DeltaFromDesiredSum
IF (@RowNum IS NOT NULL)
BREAK;
IF (@LoopCounter >=100) /*Prevents infinite loops*/
THROW 59194,'Result unable to be generated in 100 tries. Recommend expanding acceptable margin',1;
SET @LoopCounter +=1;
END
SELECT *
FROM #RandomData
WHERE RowNum <= @RowNum
SELECT RandomRowTotal = SUM(TotalDue)
,DesiredSum = @DesiredSum
,PercentageFromDesiredSum = Concat(Cast(Round(100*(1-SUM(TotalDue)/@DesiredSum),2) as Float),'%')
FROM #RandomData
WHERE RowNum <= @RowNum
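The shuffle-and-retry logic of that loop can be sketched in Python as follows. This is only an illustration of the idea, not a port of the script: `sample_to_target` and its parameters are hypothetical names, and amounts are assumed to be non-negative.

```python
import random
from itertools import accumulate


def sample_to_target(amounts, fraction=0.025, margin=0.01, max_tries=100, seed=None):
    """Return (rows, total) for a random prefix whose total is near the target."""
    rng = random.Random(seed)
    target = fraction * sum(amounts.values())
    low, high = target * (1 - margin), target * (1 + margin)
    # A single row at or above the target can never be part of a valid sample.
    candidates = [(key, value) for key, value in amounts.items() if value < target]
    for _ in range(max_tries):
        rng.shuffle(candidates)
        best = None  # (row count, running total) closest to the target so far
        running_totals = accumulate(value for _, value in candidates)
        for count, running in enumerate(running_totals, start=1):
            if low <= running <= high:
                if best is None or abs(running - target) < abs(best[1] - target):
                    best = (count, running)
        if best is not None:
            return candidates[: best[0]], best[1]
    raise RuntimeError("No sample within the margin; consider widening it")


amounts = {order_id: 10 for order_id in range(1000)}
rows, total = sample_to_target(amounts, seed=1)
print(len(rows), total)
```

As in the T-SQL version, widening `margin` makes it more likely that some shuffled running total lands inside the accepted band.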

SQL Server - Select with Group By together Raw_Number

I'm using SQL Server 2000 (80), so it's not possible to use the LAG function.
I have a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. Every Sale_id is 100% sequential, so there should not be any gaps in the order.
This code works for a specific date and store if specified, but I need it to work on the entire data set, looping through every Facility_no and every Seller_id for every Purchase_date.
declare @MAXCOUNT int
set @MAXCOUNT =
(
select MAX(Sale_Id)
from #table
where
Facility_no in (124) and
Purchase_date = '2/7/2020'
and Seller_id = 1
)
;WITH TRX_COUNT AS
(
SELECT 1 AS Number
union all
select Number + 1 from TRX_COUNT
where Number < @MAXCOUNT
)
select * from TRX_COUNT
where
Number NOT IN
(
select Sale_Id
from #table
where
Facility_no in (124)
and Purchase_Date = '2/7/2020'
and seller_id = 1
)
order by Number
OPTION (maxrecursion 0)
My Dataset
This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id order by Sale_Id)
then 'OK' else 'Previous Missing' end
will tell you which Seller_Ids have some sale missing. If you want to go a step further and have exactly your desired output, then filter out and distinct the 'Previous Missing' ones, and join with a tally table on not exists.
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that has the max(sale_id) grouped by facility/seller_id.
Then you can get your missing results with this pseudocode query:
Select ...
from temptable t
inner join tally N on N.num <= t.maxsale
where not exists( select ... from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
This works because the only way to "construct" nonexisting combinations is to construct them all and then remove the existing ones.
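To illustrate "construct them all and remove the existing ones" outside SQL, here is a small Python sketch. It assumes numbering starts at 1 (as in the recursive CTE in the question) and that rows arrive as (facility, purchase_date, seller, sale_id) tuples; `missing_sale_ids` is a made-up name.

```python
def missing_sale_ids(rows):
    """Map each (facility, purchase_date, seller) group to its missing sale ids."""
    seen = {}
    for facility, purchase_date, seller, sale_id in rows:
        seen.setdefault((facility, purchase_date, seller), set()).add(sale_id)
    missing = {}
    for group, ids in seen.items():
        # Build every id from 1 up to the group's maximum, then drop the
        # ids that actually exist; what remains are the gaps.
        gaps = sorted(set(range(1, max(ids) + 1)) - ids)
        if gaps:
            missing[group] = gaps
    return missing


sales = [
    (124, "2020-02-07", 1, 1),
    (124, "2020-02-07", 1, 2),
    (124, "2020-02-07", 1, 5),
    (124, "2020-02-07", 2, 1),
]
print(missing_sale_ids(sales))  # {(124, '2020-02-07', 1): [3, 4]}
```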
This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Sale_id) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
[Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC

How do I create a basic looping calculation in sql to avoid doing 200+ Joins

All -
I have a basic issue, which is becoming a major nightmare.
I am creating a payment schedule table for future dates. In order to calculate the future balances, I need to continuously reduce the starting balance in Table A on each date, based upon the future payment amount in table B.
The problem is that I have to left join Table B based upon what the balance is in Table A, and do that for every single row because the ending balance in Row 1 is the starting balance in Row 2, and the ending balance in Row 2, is the starting balance in Row 3. This is a cumulative / looping calculation.
Here is a depiction of what I am trying to do:
(Image: Table Examples)
Here is the actual SQL:
with dates_rns as (
select
a.loan_id
,as_of_date as payment_date
,upb_usd as starting_balance
,principal_amount as payment
,new_upb as new_balance
,row_number() over (partition by loan_id order by a.as_of_date) as rn
from scratchpad.iit1 a
where row_value < 100
), payment_sched as (
select loan_id, payment_date, starting_balance,
payment, new_balance, rn
from dates_rns a
where rn = 1
union all
select n.loan_id, n.payment_date,
p.new_balance as starting_balance,
least(b.principal_amount, p.new_balance),
greatest(p.new_balance - b.principal_amount, 0.00) as new_balance,
n.rn
from dates_rns n -- 'n' is for this payment
join payment_sched p -- 'p' is for previous payment
on n.rn = p.rn + 1
join scratchpad.collectability_1_Princ b -- payment lookup
on b.loan_id = p.loan_id
and round(b.previous_upb) > round(p.new_balance)
and round(b.remaining_upb) <= round(p.new_balance)
)
select *
from payment_sched
You are looking for a recursive query here. These queries can go haywire depending on your data, so I will restrict it to one loan_id, passed in the params CTE.
Change the 12345 to a valid loan_id, and see if this works for you. Please let me know in comments if it gives you trouble.
You cannot use an outer join inside of the recursive CTE, so table_b has to cover the full range.
with recursive params as (
select 12345 as loan_id
), dates_rns as (
select a.loan_id, a.payment_date, a.starting_balance, a.payment, a.new_balance,
row_number() over (order by a.payment_date) as rn
from params p
join table_a a on a.loan_id = p.loan_id
), payment_sched as (
select loan_id, payment_date, starting_balance,
payment, new_balance, rn
from dates_rns a
where rn = 1
union all
select n.loan_id, n.payment_date,
p.new_balance as starting_balance,
least(b.payment, p.new_balance),
greatest(p.new_balance - b.payment, 0.00) as new_balance,
n.rn
from dates_rns n -- 'n' is for this payment
join payment_sched p -- 'p' is for previous payment
on n.rn = p.rn + 1
join table_b b -- payment lookup
on b.loan_id = p.loan_id
and round(b.previous_balance) > round(p.new_balance)
and round(b.remaining_balance) <= round(p.new_balance)
)
select *
from payment_sched;
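If it helps to see the shape of the calculation without SQL, the recursion boils down to a running balance where each row's ending balance feeds the next row. The Python sketch below is a simplification under assumptions: `lookup_payment` is a hypothetical stand-in for the table_b lookup, and it is applied to every row, whereas the query takes the first row's payment straight from table_a.

```python
def payment_schedule(starting_balance, payment_dates, lookup_payment):
    """Return (date, starting_balance, payment, new_balance) tuples."""
    balance = starting_balance
    schedule = []
    for payment_date in payment_dates:
        # Never pay more than what is still owed (LEAST in the query).
        payment = min(lookup_payment(balance), balance)
        # The balance cannot drop below zero (GREATEST in the query).
        new_balance = max(balance - payment, 0)
        schedule.append((payment_date, balance, payment, new_balance))
        balance = new_balance  # ending balance becomes the next starting balance
    return schedule


plan = payment_schedule(1000, ["2024-01", "2024-02", "2024-03"], lambda balance: 400)
for row in plan:
    print(row)
```

With a flat 400 payment, the third row pays only the remaining 200 and ends at a zero balance.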

T-SQL if value exists use it other wise use the value before

I have the following table
Account#   Period   Balance
12345      200901   $11554
12345      200902   $4353
12345      201004   $34
12345      201005   $44
12345      201006   $1454
45677      200901   $14454
45677      200902   $1478
45677      201004   $116776
45677      201005   $996
56789      201006   $1567
56789      200901   $7894
56789      200902   $123
56789      201003   $543345
56789      201005   $114
56789      201006   $54
I want to select the account#s that have a period of 201005.
This is fairly easy using the code below. The problem is that if a user enters 201003, which doesn't exist for every account, I want the query to select the previous value. NOTE that there is an account# that has a 201003 period, and I still want to select it too.
I tried CASE, IF ELSE, and IN, but I was unsuccessful.
PS: I cannot create temp tables due to a system limitation of 5000 rows.
Thank you.
DECLARE @INPUTPERIOD INT
SET @INPUTPERIOD = 201005
SELECT ACCOUNT#, PERIOD , BALANCE
FROM TABLE1
WHERE PERIOD = @INPUTPERIOD
SELECT t.ACCOUNT#, t.PERIOD, t.BALANCE
FROM (SELECT ACCOUNT#, MAX(PERIOD) AS MaxPeriod
FROM TABLE1
WHERE PERIOD <= @INPUTPERIOD
GROUP BY ACCOUNT#) q
INNER JOIN TABLE1 t
ON q.ACCOUNT# = t.ACCOUNT#
AND q.MaxPeriod = t.PERIOD
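The "latest period on or before the input, per account" logic of the query above can be sketched in Python like this. It is just an illustration: rows are assumed to be (account, period, balance) tuples, and `latest_on_or_before` is a made-up name.

```python
def latest_on_or_before(rows, input_period):
    """Map each account to its (period, balance) with the greatest period <= input."""
    best = {}
    for account, period, balance in rows:
        if period > input_period:
            continue  # later than what the user asked for
        if account not in best or period > best[account][0]:
            best[account] = (period, balance)
    return best


balances = [
    (12345, 200901, 11554), (12345, 200902, 4353), (12345, 201004, 34),
    (45677, 200902, 1478), (45677, 201004, 116776),
    (56789, 200902, 123), (56789, 201003, 543345), (56789, 201005, 114),
]
print(latest_on_or_before(balances, 201003))
```

For an input of 201003, account 56789 keeps its exact 201003 row, while the other two accounts fall back to 200902.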
select top 1 account#, period, balance
from table1
where period >= @inputperiod
; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE Period <= 201003
)
SELECT * FROM Base WHERE RN = 1
This uses a CTE and ROW_NUMBER(): we take all the rows with Period <= the selected period, then keep the top one (the row whose generated ROW_NUMBER() is 1).
; WITH Base AS
(
SELECT *, 1 AS RN FROM #MyTable WHERE Period = 201003
)
, Alternative AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE NOT EXISTS(SELECT 1 FROM Base) AND Period < 201003
)
, Final AS
(
SELECT * FROM Base
UNION ALL
SELECT * FROM Alternative WHERE RN = 1
)
SELECT * FROM Final
This one is a lot more complex but does nearly the same thing. It is more "imperative-like". It first tries to find a row with the exact Period, and if that doesn't exist it does the same thing as before. At the end it unites the two result sets (one of the two is always empty). I would always use the first one, unless profiling showed me that the SQL engine wasn't able to comprehend what I'm trying to do. Then I would try the second one.

Percent to total in PostgreSQL without subquery

I have a table with users. Each user has a country. What I want is to get the list of all countries with the numbers of users and the percent/total. What I have so far is:
SELECT
country_id,
COUNT(*) AS total,
((COUNT(*) * 100) / (SELECT COUNT(*) FROM users WHERE cond1 = true AND cond2 = true AND cond3 = true)::decimal) AS percent
FROM users
WHERE cond1 = true AND cond2 = true AND cond3 = true
GROUP BY country_id
The conditions in both queries are the same. I tried to do this without a subquery, but then I can't get the total number of users, only the total per country. Is there a way to do this without a subquery? I'm using PostgreSQL. Any help is highly appreciated.
Thanks in advance
I guess the reason you want to eliminate the subquery is to avoid scanning the users table twice. Remember the total is the sum of the counts for each country.
WITH c AS (
SELECT
country_id,
count(*) AS cnt
FROM users
WHERE cond1=...
GROUP BY country_id
)
SELECT
*,
100.0 * cnt / (SELECT sum(cnt) FROM c) AS percent
FROM c;
This query builds a small CTE with the per-country statistics. It will only scan the users table once, and generate a small result set (only one row per country).
The total (SELECT sum(cnt) FROM c) is calculated only once on this small result set, so it uses negligible time.
You could also use a window function :
SELECT
country_id,
cnt,
100.0 * cnt / (sum(cnt) OVER ()) AS percent
FROM (
SELECT country_id, count(*) as cnt from users group by country_id
) foo;
(which is the same as nightwolf's query with the errors removed lol )
Both queries take about the same time.
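The same "count per country once, then divide by the sum of those counts" idea looks like this in plain Python. This is only a sketch to make the arithmetic concrete; `country_percentages` is a hypothetical name, and the input is assumed to be the country_id of each user that already passed the filter conditions.

```python
from collections import Counter


def country_percentages(country_ids):
    """Return {country_id: (count, percent of all users)}."""
    counts = Counter(country_ids)  # single pass over the user rows
    total = sum(counts.values())   # total derived from the small per-country result
    return {
        country: (count, round(100.0 * count / total, 2))
        for country, count in sorted(counts.items())
    }


print(country_percentages([1, 1, 1, 2, 2, 3]))
# {1: (3, 50.0), 2: (2, 33.33), 3: (1, 16.67)}
```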
This is really old, but both of the select examples above either don't work, or are overly complex.
SELECT
country_id,
COUNT(*),
(COUNT(*) / (SUM(COUNT(*)) OVER() )) * 100
FROM
users
WHERE
cond1 = true AND cond2 = true AND cond3 = true
GROUP BY
country_id
The second count is not necessary, it's just for debugging to ensure you're getting the right results. The trick is the SUM on top of the COUNT over the recordset.
Hope this helps someone.
Also, if anyone wants to do this in Django, just hack up an aggregate:
from django.db.models import Aggregate, DecimalField

class PercentageOverRecordCount(Aggregate):
    function = 'OVER'
    template = '(COUNT(*) / (SUM(COUNT(*)) OVER() )) * 100'

    def __init__(self, expression, **extra):
        super().__init__(
            expression,
            output_field=DecimalField(),
            **extra
        )
Now it can be used in annotate.
I am not a PostgreSQL user but, the general solution would be to use window functions.
Read up on how to use this at http://developer.postgresql.org/pgdocs/postgres/tutorial-window.html
The best explanation I can give is that it basically lets you aggregate over one field without a GROUP BY clause.
I believe this might do the trick:
SELECT DISTINCT
country_id,
COUNT(*) OVER (PARTITION BY country_id) AS total,
((COUNT(*) OVER (PARTITION BY country_id)) * 100)::decimal / COUNT(*) OVER () AS percent
FROM
users
WHERE
cond1 = true AND cond2 = true AND cond3 = true
With a recent PostgreSQL version, the query can be written as follows:
CREATE TABLE users (
id serial,
country_id int
);
INSERT INTO users (country_id) VALUES (1),(1),(1),(2),(2),(3);
select distinct
country_id,
round(
((COUNT(*) OVER (partition by country_id )) * 100)::numeric
/ COUNT(*) OVER ()
, 2) as percent
from users
order by country_id
;
Result on SQLize.online
+============+=========+
| country_id | percent |
+============+=========+
| 1 | 50.00 |
+------------+---------+
| 2 | 33.33 |
+------------+---------+
| 3 | 16.67 |
+------------+---------+