How do I find the sum of all transactions since an event? - tsql

So, let's say that I have a group of donors, and they make donations on an irregular basis. I can put the donor name, the donation amount, and the donation date into a table, but then I want to do a report that shows all of that information PLUS the value of all donations after that amount.
I know that I can parse through this using a loop, but is there a better way?
I'm cheating here by not bothering with the code that would go through and assign a transaction number by donor and ensure that everything is the right order. That's easy enough.
DECLARE #Donors TABLE (
ID INT IDENTITY
, Name NVARCHAR(30)
, NID INT
, Amount DECIMAL(7,2)
, DonationDate DATE
, AmountAfter DECIMAL(7,2)
)
INSERT INTO #Donors VALUES
('Adam Zephyr',1,100.00,'2017-01-14',NULL)
, ('Adam Zephyr',2,200.00,'2017-01-17',NULL)
, ('Adam Zephyr',3,150.00,'2017-01-20',NULL)
, ('Braden Yu',1,50.00,'2017-01-11',NULL)
, ('Braden Yu',2,75.00,'2017-01-19',NULL)
DECLARE #Counter1 INT = 0
, #Name NVARCHAR(30)
WHILE #Counter1 < (SELECT MAX(ID) FROM #Donors)
BEGIN
SET #Counter1 += 1
SET #Name = (SELECT Name FROM #Donors WHERE ID = #Counter1)
UPDATE d1
SET AmountAfter = (SELECT ISNULL(SUM(Amount),0) FROM #Donors d2 WHERE ID > #Counter1 AND Name = #Name)
FROM #Donors d1
WHERE d1.ID = #Counter1
END
SELECT * FROM #Donors
It seems like there ought to be a way to do this recursively, but I just can't wrap my head around it.

This would show the latest donation per Name which I presume is the donor and the total of all amounts donated by that person. Perhaps it's more appropriate to use NID for the partitions.
;with MostRecentDonations as (
select *,
row_number() over (partition by Name order by DonationDate desc) as rn,
sum(Amount) over (partition by Name) as TotalDonations
from #Donors
)
select * from MostRecentDonations
where rn = 1;
There's certainly no need to store a running total anywhere unless you have some kind of performance issue.
EDIT:
I've thought about your question and now I'm thinking that you just want a running total with all the transactions included. That's easy too:
select *,
sum(Amount) over (partition by Name order by DonationDate) as DonationsToDate
from #Donors
order by Name, DonationDate;

Related

How to Return Records Equal to a Specific Percentage of an Aggregate in Transact-SQL?

My requirement is to provide a random sample of claims that comprise 2.5% of the total amount paid and also comprise 2.5% of total claims for a given population. The goal is to deliver records in a report that meet both criteria. My staging table is defined as follows:
[RecordId] UniqueIdentifier NOT NULL PRIMARY KEY DEFAULT NEWID()
,ClaimNO varchar(50)
,Company_ID varchar(10)
,HPCode varchar(10)
,FinancialResponsibility varchar(30)
,ProviderType varchar(50)
,DateOfService date
,DatePaid date
,ClaimType varchar(50)
,TotalBilled numeric(11,2)
,TotalPaid numeric(11,2)
,ProcessorType varchar(100)
I've already built the logic to return 2.5% of the total number of claims but need guidance in how best to ensure both criterion are met.
Here's what I've tried thus far:
with cteTotals as (
Select Count(*) as TotalClaims, sum(TotalPaid) as TotalPaid, sum(TotalPaid) * .025 as PaidSampleAmount
from [Z_Monthly_Quality_Review]
),
ctePopulation as (
Select *
from [Z_Monthly_Quality_Review]
),
cteSampleRows as (
select TOP 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review]
order by NEWID()
),
cteSamplePaid as (
Select Top 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review] mqr
inner join ctePopulation cte on mqr.ClaimNo = cte.ClaimNO
order by NEWID()
)
Since both criterion must be satisfied, how should I structure both CTEs to ensure this? In my cteSamplePaid, how do I ensure that the sum of total paid equals 2.5% of the total population? Would this be accomplished with a Having clause? The end result will be displayed to my business users via SQL Server Reporting Services. Ideally, I would want to provide them with 1 sample that meets both criteria. If that's not possible, how do I randomly sample claims from both criterion?
Don't think there is a guaranteed way it will add up to 2.5% of the total. There's no guarantee results and the performance would be very poor as it you would essentially have to brute force every possible combination of rows. A way to get very close to your goal would be to use return rows that add up to an acceptable margin of error.
Since no sample data was provided, I just used AdventureWorks2017 (downloaded from here)
USE AdventureWorks2017
GO
DROP TABLE IF EXISTS #SalesData
SELECT SalesOrderID AS ID,TotalDue
INTO #SalesData
FROM Sales.SalesOrderHeader
Declare #DesiredPercentage Numeric(10,3) = .025 /*Desired sum percentage of total rows*/
,#AcceptableMargin Numeric(10,3) = .01 /*Random row total can be plus or minus this percentage of the desired sum*/
DECLARE #DesiredSum Numeric(16,2) = #DesiredPercentage *(SELECT SUM(TotalDue) FROM #SalesData)
/*For loop*/
DECLARE #RowNum INT
,#LoopCounter INT = 1
WHILE (1=1)
BEGIN
DROP TABLE IF EXISTS #RandomData
SELECT RowNum = ROW_NUMBER() OVER (ORDER BY B.RandID),A.*,RunningTotal = SUM(TotalDue) OVER (ORDER BY B.RandID)
INTO #RandomData
FROM #SalesData AS A
CROSS APPLY (SELECT RandID = NEWID()) AS B
WHERE TotalDue < #DesiredSum /*If single row bigger than desired sum, then filter it out*/
ORDER BY B.RandID
SELECT Top(1) #RowNum = RowNum
FROM #RandomData AS A
CROSS APPLY (SELECT DeltaFromDesiredSum = ABS(RunningTotal-#DesiredSum)) AS B
WHERE RunningTotal BETWEEN #DesiredSum *(1-#AcceptableMargin) AND #DesiredSum *(1+#AcceptableMargin)
ORDER BY DeltaFromDesiredSum
IF (#RowNum IS NOT NULL)
BREAK;
IF (#LoopCounter >=100) /*Prevents infinite loops*/
THROW 59194,'Result unable to be generated in 100 tries. Recommend expanding acceptable margin',1;
SET #LoopCounter +=1;
END
SELECT *
FROM #RandomData
WHERE RowNum <= #RowNum
SELECT RandomRowTotal = SUM(TotalDue)
,DesiredSum = #DesiredSum
,PercentageFromDesiredSum = Concat(Cast(Round(100*(1-SUM(TotalDue)/#DesiredSum),2) as Float),'%')
FROM #RandomData
WHERE RowNum <= #RowNum

SQL Server - Select with Group By together Raw_Number

I'm using SQL Server 2000 (80). So, it's not possible to use the LAG function.
I have a code a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. So every sale_id is a 100% sequential, so the should not be any gaps in order.
This code works for a specific date and store if specified. But i need to work on entire data set looping looping through every facility_id and every seller_id for ever purchase_date
declare #MAXCOUNT int
set #MAXCOUNT =
(
select MAX(Sale_Id)
from #table
where
Facility_no in (124) and
Purchase_date = '2/7/2020'
and Seller_id = 1
)
;WITH TRX_COUNT AS
(
SELECT 1 AS Number
union all
select Number + 1 from TRX_COUNT
where Number < #MAXCOUNT
)
select * from TRX_COUNT
where
Number NOT IN
(
select Sale_Id
from #table
where
Facility_no in (124)
and Purchase_Date = '2/7/2020'
and seller_id = 1
)
order by Number
OPTION (maxrecursion 0)
My Dataset
This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id)
then 'OK' else 'Previous Missing' end
will tell you which Seller_Ids have some sale missing. If you want to go a step further and have exactly your desired output, then filter out and distinct the 'Previous Missing' ones, and join with a tally table on not exists.
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that that has the max(sale_id) group by facility/seller_id
Then you can get your missing results by this pseudocode query:
Select ...
from temptable t
inner join tally N on t.maxsale <=N.num
where not exists( select ... from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
> because the only way to "construct" nonexisting combinations is to construct them all and just remove the existing ones.
This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Purchase_Date) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
, [Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC

postgres - get top category purchased by customer

I have a denormalized table with the columns:
buyer_id
order_id
item_id
item_price
item_category
I would like to return something that returns 1 row per buyer_id
buyer_id, sum(item_price), item_category
-- but ONLY for the category with the highest rank of sales along that specific buyer_id.
I can't get row_number() or partition to work because I need to order by the sum of item_price relative to item_category relative to buyer. Am I overlooking anything obvious?
You need a few layers of fudging here:
SELECT buyer_id, item_sum, item_category
FROM (
SELECT buyer_id,
rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk,
item_sum, item_category
FROM (
SELECT buyer_id, sum(item_price) AS item_sum, item_category
FROM my_table
GROUP BY 1, 3) AS sub2) AS sub
WHERE rnk = 1;
In sub2 you calculate the sum of 'item_price' for each 'item_category' for each 'buyer_id'. In sub you rank these with a window function by 'buyer_id', ordering by 'item_sum' in descending order (so the highest 'item_sum' comes first). In the main query you select those rows where rnk = 1.

T-SQL if value exists use it other wise use the value before

I have the following table
-----Account#----Period-----Balance
12345---------200901-----$11554
12345---------200902-----$4353
12345 --------201004-----$34
12345 --------201005-----$44
12345---------201006-----$1454
45677---------200901-----$14454
45677---------200902-----$1478
45677 --------201004-----$116776
45677 --------201005-----$996
56789---------201006-----$1567
56789---------200901-----$7894
56789---------200902-----$123
56789 --------201003-----$543345
56789 --------201005-----$114
56789---------201006-----$54
I want to select the account# that have a period of 201005.
This is fairly easy using the code below. The problem is that if a user enters 201003-which doesnt exist- I want the query to select the previous value.*NOTE that there is an account# that has a 201003 period and I still want to select it too.*
I tried CASE, IF ELSE, IN but I was unsuccessfull.
PS:I cannot create temp tables due to system limitations of 5000 rows.
Thank you.
DECLARE #INPUTPERIOD INT
#INPUTPERIOD ='201005'
SELECT ACCOUNT#, PERIOD , BALANCE
FROM TABLE1
WHERE PERIOD =#INPUTPERIOD
SELECT t.ACCOUNT#, t.PERIOD, t.BALANCE
FROM (SELECT ACCOUNT#, MAX(PERIOD) AS MaxPeriod
FROM TABLE1
WHERE PERIOD <= #INPUTPERIOD
GROUP BY ACCOUNT#) q
INNER JOIN TABLE1 t
ON q.ACCOUNT# = t.ACCOUNT#
AND q.MaxPeriod = t.PERIOD
select top 1 account#, period, balance
from table1
where period >= #inputperiod
; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE Period <= 201003
)
SELECT * FROM Base WHERE RN = 1
Using CTE and ROW_NUMBER() (we take all the rows with Period <= the selected date and we take the top one (the one with auto-generated ROW_NUMBER() = 1)
; WITH Base AS
(
SELECT *, 1 AS RN FROM #MyTable WHERE Period = 201003
)
, Alternative AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE NOT EXISTS(SELECT 1 FROM Base) AND Period < 201003
)
, Final AS
(
SELECT * FROM Base
UNION ALL
SELECT * FROM Alternative WHERE RN = 1
)
SELECT * FROM Final
This one is a lot more complex but does nearly the same thing. It is more "imperative like". It first tries to find a row with the exact Period, and if it doesn't exists does the same thing as before. At the end it unite the two result sets (one of the two is always empty). I would always use the first one, unless profiling showed me the SQL wasn't able to comprehend what I'm trying to do. Then I would try the second one.

Speeding up TSQL

Hi all i wondering if there's a more efficient way of executing this TSQl script. It basically goes and gets the very latest activity ordering by account name and then join this to the accounts table. So you get the very latest activity for a account. The problem is there are currently about 22,000 latest activities, so obviously it has to go through alot of data, just wondering if theres a more efficient way of doing what i'm doing?
DECLARE #pastAppointments TABLE (objectid NVARCHAR(100), account NVARCHAR(500), startdate DATETIME, tasktype NVARCHAR(100), ownerid UNIQUEIDENTIFIER, owneridname NVARCHAR(100), RN NVARCHAR(100))
INSERT INTO #pastAppointments (objectid, account, startdate, tasktype, ownerid, owneridname, RN)
SELECT * FROM (
SELECT fap.regardingobjectid, fap.regardingobjectidname, fap.actualend, fap.activitytypecodename, fap.ownerid, fap.owneridname,
ROW_NUMBER() OVER (PARTITION BY fap.regardingobjectidname ORDER BY fap.actualend DESC) AS RN
FROM FilteredActivityPointer fap
WHERE fap.actualend < getdate()
AND fap.activitytypecode NOT LIKE 4201
) tmp WHERE RN = 1
ORDER BY regardingobjectidname
SELECT fa.name, fa.owneridname, fa.new_technicalaccountmanagername, fa.new_customerid, fa.new_riskstatusname, fa.new_numberofopencases,
fa.new_numberofurgentopencases, app.startdate, app.tasktype, app.ownerid, app.owneridname
FROM FilteredAccount fa LEFT JOIN #pastAppointments app on fa.accountid = app.objectid and fa.ownerid = app.ownerid
WHERE fa.statecodename = 'Active'
AND fa.ownerid LIKE #owner_search
ORDER BY fa.name
You can remove ORDER BY regardingobjectidname from the first INSERT query - the only (narrow) purpose such a sort would have on an INSERT query is if there was an identity column on the table being inserted into. And there isn't in this case, so if the optimizer isn't smart enough, it'll perform a pointless sort.