How to check if the sum of some records equals the difference between two other records in T-SQL?

I have a view that contains bank account activity.
ACCOUNT  BALANCE_ROW  AMOUNT   SORT_ORDER
111      1              0.00   1
111      0             10.00   2
111      0             -2.50   3
111      1              7.50   4
222      1            100.00   5
222      0             25.00   6
222      1            125.00   7
ACCOUNT = account number
BALANCE_ROW = 1 if the row is a starting or ending balance, otherwise 0
AMOUNT = the amount
SORT_ORDER = simple ordering to return the records in the order of start balance, activity, and end balance
I need to figure out a way to see if the sum of the non-balance rows equals the difference between the ending balance and the starting balance. The result for each account (1 for yes, 0 for no) would simply be added to the result set.
Example:
Account 111 had a starting balance of 0.00. There were two activity records, 10.00 and -2.50, resulting in an ending balance of 7.50.
I've been playing around with temp tables, but I was not sure if there is a more efficient way of accomplishing this.
Thanks for any input you may have!
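
If you want to experiment outside the view, the sample rows can be loaded into a throwaway table. This is just a sketch: the name data matches what the first answer below queries, and the column types are my assumptions.

-- Stand-in for the account activity view (name and types are assumptions)
CREATE TABLE data (
    ACCOUNT     int,
    BALANCE_ROW tinyint,
    AMOUNT      decimal(10, 2),
    SORT_ORDER  int
);

INSERT INTO data (ACCOUNT, BALANCE_ROW, AMOUNT, SORT_ORDER)
VALUES (111, 1,   0.00, 1),
       (111, 0,  10.00, 2),
       (111, 0,  -2.50, 3),
       (111, 1,   7.50, 4),
       (222, 1, 100.00, 5),
       (222, 0,  25.00, 6),
       (222, 1, 125.00, 7);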

I would use ranking, then group rows by ACCOUNT calculating totals along the way:
;WITH ranked AS (
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY SORT_ORDER)
    FROM data
),
grouped AS (
    SELECT
        ACCOUNT,
        BALANCE_DIFF = SUM(CASE BALANCE_ROW WHEN 1 THEN AMOUNT END
                         * CASE rnk WHEN 1 THEN -1 ELSE 1 END),
        ACTIVITY_SUM = SUM(CASE BALANCE_ROW WHEN 0 THEN AMOUNT ELSE 0 END)
    FROM ranked
    GROUP BY
        ACCOUNT
)
SELECT *
FROM grouped
WHERE BALANCE_DIFF <> ACTIVITY_SUM
Ranking is only used here to make it easier to calculate the starting/ending balance difference. If starting and ending balance rows had, for instance, different BALANCE_ROW codes (like 1 for the starting balance, 2 for the ending one), it would be possible to avoid ranking.
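
Since the question asks for a 1/0 flag per account rather than only the mismatching rows, here is a small variation on the same idea (a sketch reusing the ranked CTE above):

;WITH ranked AS (
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY SORT_ORDER)
    FROM data
)
SELECT
    ACCOUNT,
    -- 1 when the activity rows sum to (ending balance - starting balance), else 0
    BALANCES_OK = CASE
                      WHEN SUM(CASE BALANCE_ROW WHEN 0 THEN AMOUNT ELSE 0 END)
                         = SUM(CASE BALANCE_ROW WHEN 1 THEN AMOUNT END
                             * CASE rnk WHEN 1 THEN -1 ELSE 1 END)
                      THEN 1 ELSE 0
                  END
FROM ranked
GROUP BY ACCOUNT;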

Untested code, but it should be really close for comparing the summed activity with the balance rows as you've defined them in your question.
SELECT
    A.Account,                           /* account number */
    (SELECT SUM(B.amount)
     FROM yourview B
     WHERE B.balance_row = 0
       AND B.account = A.account
       AND B.sort_order BETWEEN
               (SELECT MAX(C.sort_order) /* previous balance row's sort order on the account */
                FROM yourview C
                WHERE C.balance_row = 1
                  AND C.account = A.account
                  AND C.sort_order < A.sort_order)
               AND A.sort_order
    ) AS Test_Balance,                   /* sum of the amounts since the last balance row */
    A.Balance_Row                        /* value of the balance row */
FROM yourview A
WHERE A.Balance_Row = 1

Related

Cumulative Sum When Order Was Placed in postgresql

I have an orders table with datetime when an order was placed, and when it was completed:
orderid  userid  price  status     createdat         doneat
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23
I would like to have a new column that is the cumulative total (sum price) per user when the order was created:
orderid  userid  price  status     createdat         doneat            cumulative total when placed (per user)
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46  0
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23  0
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35  250
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35  0
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17  100
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20  100
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23  100
The logic is sum the price for each user for all orders that were completed before the current row's created at date. For orderid=2, although it's the user's 2nd order, there are no orders that were completed before its createdat datetime of 2/21/21 05:27:29, so the cumulative total when placed is 0.
The same for orderid in [5,6,7]. For those orders and that userid, the only order that was completed before their createdat dates is order 4, so their cumulative total when placed is 100.
In PowerBI the logic is like this:
SUMX (
filter(
orders,
earlier orders.userid = orders.userid && orders.doneat < orders.createdat && order.status = 'completed'),
orders.price)
Would anyone have any hints on how to achieve this in PostgreSQL?
I tried something like this and it didn't work.
select (case when o.doneat < o.createdat over (partition by o.userid, o.status order by o.createdat)
then sum(o.price) over (partition by o.userid, o.status ORDER BY o.doneat asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
end) as cumulativetotal_whenplaced
from order o
Thank you
You can duplicate each row into:
an "original" (which we'll decorate with a flag keep = true), that has an accounting value val = 0 (so far), and a time t = createdat;
a "duplicate" (keep = false), that has the price to account for (if status is 'completed') as val and a time t = doneat.
Then it's just a matter of accounting for the right bits:
select orderid, userid, price, status, createdat, doneat, cumtot
from (
    select *, sum(val) over (partition by userid order by t, keep desc) as cumtot
    from (
        select *, createdat as t, 0 as val, true as keep from foo
        union all
        select *, doneat as t,
               case when status = 'completed' then price else 0 end as val,
               false as keep
        from foo
    ) as a
) as a
where keep
order by orderid;
Example: DB Fiddle.
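If you would rather try it locally than on the fiddle, here is a minimal setup sketch (the table name foo matches the query above; the column types are assumptions):

-- Minimal stand-in for the orders data used above (types are assumptions)
create table foo (
    orderid   int,
    userid    int,
    price     numeric,
    status    text,
    createdat timestamp,
    doneat    timestamp
);

insert into foo values
    (1, 128, 100, 'completed', '2021-02-16 18:40:45', '2021-02-21 07:59:46'),
    (2, 128, 150, 'completed', '2021-02-21 05:27:29', '2021-02-23 11:58:23'),
    (3, 128, 100, 'completed', '2021-09-03 08:38:14', '2021-09-10 14:24:35'),
    (4,   5, 100, 'completed', '2022-05-28 23:28:07', '2022-06-26 06:10:35'),
    (5,   5, 100, 'canceled',  '2022-07-08 22:28:57', '2022-08-10 06:55:17'),
    (6,   5, 100, 'completed', '2022-07-25 13:46:38', '2022-08-10 06:57:20'),
    (7,   5,   5, 'completed', '2022-08-07 18:07:07', '2022-08-12 06:56:23');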
Note for RedShift: the window expression above needs to be replaced by:
...
select *, sum(val) over (
partition by userid order by t, keep desc
rows unbounded preceding) as cumtot
...
Result for your data:
orderid  userid  price  status     createdat                 doneat                    cumtot
1        128     100    completed  2021-02-16T18:40:45.000Z  2021-02-21T07:59:46.000Z  0
2        128     150    completed  2021-02-21T05:27:29.000Z  2021-02-23T11:58:23.000Z  0
3        128     100    completed  2021-09-03T08:38:14.000Z  2021-09-10T14:24:35.000Z  250
4        5       100    completed  2022-05-28T23:28:07.000Z  2022-06-26T06:10:35.000Z  0
5        5       100    canceled   2022-07-08T22:28:57.000Z  2022-08-10T06:55:17.000Z  100
6        5       100    completed  2022-07-25T13:46:38.000Z  2022-08-10T06:57:20.000Z  100
7        5       5      completed  2022-08-07T18:07:07.000Z  2022-08-12T06:56:23.000Z  100
Note: this type of accounting across time is actually robust to many corner cases (various orders overlapping, some starting and finishing while others are still in process, etc.). It is the basis for a fast interval compaction algorithm that I should describe someday on SO.
Bonus: try to figure out why the partitioning window is ordered by t (fairly obvious) and also by keep desc (less obvious).
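For comparison, the stated rule ("sum the completed orders whose doneat falls before the current row's createdat") can also be written directly as a correlated subquery. This is only a sketch, assuming the table is called orders as in the question, and it will likely be slower than the window approach above on large tables:

-- Direct restatement of the rule as a correlated subquery (sketch)
select o.*,
       coalesce((select sum(p.price)
                 from orders p
                 where p.userid = o.userid
                   and p.status = 'completed'
                   and p.doneat < o.createdat), 0) as cumulative_total_when_placed
from orders o
order by o.orderid;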

(psql) How to sum up on a column based on conditions?

I have the following PostgreSQL table.
user   amount  type
danny  2       deposit
danny  3       withdraw
kathy  4       deposit
kathy  5       deposit
kathy  6       withdraw
Now I am trying to get every user's remaining wallet balance. The calculation works like this: deposits count as positive values in the sum and withdrawals count as negative values. E.g. for danny, the remaining balance after a deposit of 2 and a withdrawal of 3 is 2 - 3 = -1. For kathy, the remaining balance is 4 + 5 - 6 = 3.
What is the easiest way to calculate this in one Postgresql query?
Thanks.
Convert the type from text to the numeric factor 1 or -1 as appropriate. Then just do sum(amount * factor):
with test (usr, amount, type) as
( values ( 'danny', 2, 'deposit')
, ( 'danny', 3, 'withdraw')
, ( 'kathy', 4, 'deposit')
, ( 'kathy', 5, 'deposit')
, ( 'kathy', 6, 'withdraw')
)
-- your query starts here
select usr "User"
, sum (amount * factor) "Balance"
from ( select usr
, amount
, case when type = 'deposit' then 1
when type = 'withdraw' then -1
else null
end factor
from test
) sq
group by usr
order by usr;
NOTE: It is poor practice to use user as an identifier (i.e. column name, etc) since it is both a Postgres and SQL standard reserved word.
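
A more compact variant of the same idea (a sketch that assumes type only ever holds 'deposit' or 'withdraw') folds the sign straight into the aggregate:

with test (usr, amount, type) as
( values ( 'danny', 2, 'deposit')
       , ( 'danny', 3, 'withdraw')
       , ( 'kathy', 4, 'deposit')
       , ( 'kathy', 5, 'deposit')
       , ( 'kathy', 6, 'withdraw')
)
select usr "User"
       -- deposits add, anything else subtracts (hence the assumption above)
     , sum(case when type = 'deposit' then amount else -amount end) "Balance"
from test
group by usr
order by usr;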

Getting the top 2 amount with the most recent dates

My original code takes all transactions in the last 12 months and compares the two highest single transactions.
If the highest single gift within that time period is more than two times greater than the 2nd largest single transaction, take the 2nd highest single gift. If the #1 single highest gift is not two times greater, it is used.
I found I need to use the most recent dates with the top 2 amounts under the above rules. If I use the last 12 months, I'm not getting all the amount values I need.
How do I change the WHERE statement to get the most recent dates instead of the last 12 months from the current date?
Input Values
account number, date, and transaction amount.
7428, 01262018, 2
7428, 12302018, 5
16988, 02142016, 100
16988, 01152016, 25
22450, 04191971, 8
22450, 08291971, 10
Results
AccountNumber Number Amount
------------------------------
7428 2 5.00
16988 2 25.00
22450 2 10.00
26997 2 10.00
27316 2 25.00
27365 2 25.00
28620 2 10.00
28951 2 10.00
29905 2 5.00
Code:
DECLARE @start_date date
DECLARE @end_date date

SET @start_date = DATEADD(YEAR, -1, GETDATE())
SET @end_date = GETDATE()

SELECT
    AccountNumber,
    COUNT(amount) AS Number,
    CASE
        WHEN MAX(CASE WHEN row_num = 1 THEN amount END) > MAX(CASE WHEN row_num = 2 THEN amount END) * 2
            THEN MAX(CASE WHEN row_num = 2 THEN amount END)
        ELSE MAX(CASE WHEN row_num = 1 THEN amount END)
    END AS Amount
FROM
    (SELECT
         *,
         ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY amount DESC) AS row_num
     FROM
         dbo.[T01_TransactionMaster]
     WHERE
         date >= @start_date AND date < @end_date) AS tt
WHERE
    row_num IN (1, 2)
    AND amount > 0
    -- AND AccountNumber = 301692
GROUP BY
    AccountNumber
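
There is no single obvious rewrite without knowing exactly what "most recent dates" should mean, but one hedged reading is: drop the fixed 12-month window and instead keep each account's N most recent transactions, then apply the same comparison. A sketch along those lines (the @recent_count parameter is hypothetical):

DECLARE @recent_count int = 10;  -- hypothetical: how many recent transactions to consider per account

SELECT
    AccountNumber,
    COUNT(amount) AS Number,
    CASE
        WHEN MAX(CASE WHEN amt_rank = 1 THEN amount END) > MAX(CASE WHEN amt_rank = 2 THEN amount END) * 2
            THEN MAX(CASE WHEN amt_rank = 2 THEN amount END)
        ELSE MAX(CASE WHEN amt_rank = 1 THEN amount END)
    END AS Amount
FROM
    (SELECT
         *,
         ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY amount DESC) AS amt_rank
     FROM
         (SELECT
              *,
              ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY date DESC) AS date_rank
          FROM
              dbo.[T01_TransactionMaster]) AS recent
     WHERE
         date_rank <= @recent_count) AS tt
WHERE
    amt_rank IN (1, 2)
    AND amount > 0
GROUP BY
    AccountNumber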

How to create buckets and groups within those buckets using PostgresQL

How do I find the distribution of credit cards by year opened and completed transactions, and group the credit cards into three buckets: fewer than 10 transactions, between 10 and 30 transactions, and more than 30 transactions?
The first method I tried was the width_bucket function in PostgreSQL, but the documentation says it only creates equidistant buckets, which is not what I want in this case. Because of that, I turned to CASE expressions. However, I'm not sure how to use a CASE expression together with GROUP BY.
This is the data I am working with:
table 1 - credit_cards table
credit_card_id
year_opened
table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")
This is what I have gotten so far:
SELECT
CASE WHEN transaction_count < 10 THEN “Less than 10”
WHEN transaction_count >= 10 and transaction_count < 30 THEN “10 <= transaction count < 30”
ELSE transaction_count>=30 THEN “Greater than or equal to 30”
END as buckets
count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = “completed”
GROUP BY v.year_opened
GROUP BY buckets
ORDER BY buckets
Expected output
credit card count | year opened | transaction count bucket
23421 | 2002 | Less than 10
etc
You can specify the bin boundaries in width_bucket by supplying a sorted array of the lower bound of each bin.
In your case, it would be array[10,30]: anything less than 10 gets bucket 0, values between 10 and 29 get bucket 1, and 30 or more gets bucket 2.
WITH a AS (select generate_series(5,35) cnt)
SELECT cnt, width_bucket(cnt, array[10,30])
FROM a;
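Applied to the question's tables, that could look like the following sketch (table and column names are taken from the question; the mapping of bucket numbers to labels is an assumption):

-- Count completed transactions per card, bucket the counts with width_bucket,
-- then count cards per year and bucket.
with per_card as (
    select c.credit_card_id
         , c.year_opened
         , width_bucket(count(*)::int, array[10, 30]) as bucket_no
    from credit_cards c
    join transactions t on t.credit_card_id = c.credit_card_id
    where t.transaction_status = 'complete'
    group by c.credit_card_id, c.year_opened
)
select count(*) as credit_card_count
     , year_opened
     , case bucket_no
           when 0 then 'Less than 10'
           when 1 then 'Between [10 and 30)'
           else 'Greater than or equal to 30'
       end as buckets
from per_card
group by year_opened, bucket_no;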
To solve this you need to count the transactions per credit card in order to determine the right bucket, then count the credit cards per bucket per year. There are a couple of different ways to get the final result. One way is to first join up all your data and compute the first level of aggregate values, then compute the final level of aggregate values:
with t1 as (
select year_opened
, c.credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from credit_cards c
join transactions t
on t.credit_card_id = c.credit_card_id
where t.transaction_status = 'complete'
group by year_opened
, c.credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from t1
group by year_opened
, buckets;
However, it may be more performant to first calculate the first level of aggregate data on the transactions table before joining it to the credit_cards table:
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join (select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id) t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
If you prefer to unroll the above query and use Common Table Expressions, you can do that too (I find this easier to read and follow along):
with bkt as (
select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join bkt t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
Not sure if this is what you are looking for.
WITH cte AS (
    SELECT c.year_opened
         , c.credit_card_id
         , count(*) AS transaction_count
    FROM credit_cards c
    INNER JOIN transactions t ON c.credit_card_id = t.credit_card_id
    WHERE t.transaction_status = 'complete'
    GROUP BY c.year_opened
           , c.credit_card_id
)
SELECT cte.year_opened AS "year opened"
     , SUM(CASE WHEN transaction_count < 10 THEN 1 ELSE 0 END) AS "Less than 10"
     , SUM(CASE WHEN transaction_count >= 10 AND transaction_count < 30 THEN 1 ELSE 0 END) AS "10 <= transaction count < 30"
     , SUM(CASE WHEN transaction_count >= 30 THEN 1 ELSE 0 END) AS "Greater than or equal to 30"
FROM cte
GROUP BY cte.year_opened
and the output would be as below.
year opened | Less than 10 | 10 <= transaction count < 30 | Greater than or equal to 30
2002 | 23421 | |

Show the count based on some condition

I've asked a question some days back. Here is that link.
Count() corresponding to max()
Now with the same set of tables (SQL Fiddle) I would like to check a different condition
If the first question was about a count related to the max of a status, this question is about showing the count based on the next status of every project.
Explanation
As you can see in the table user_approval, appr_prjt_id = 1 has 3 different statuses, namely 10, 20, 30, and the next status will be 40 (with every approval the status is increased by 10), and so on. So is it possible to show that there is a project whose status is waiting to become 40? Its count must only be shown under the column for status 40 in the output (not under 10, 20, 30, etc.).
Desired Output:
10 | 20 | 30 | 40
location1 0 | 0 | 0 | 1
Not sure what "the next status will be 40" means. But assuming that the status is increased by 10 with every approval, the following should work:
SELECT *
FROM user_projects pr
WHERE EXISTS (
SELECT * FROM user_approval ex
WHERE ex.appr_prjt_id = pr.proj_id
AND ex.appr_status = 30
)
AND NOT EXISTS (
SELECT * FROM user_approval nx
WHERE nx.appr_prjt_id = pr.proj_id
AND nx.appr_status >= 40
);
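To turn that into the single count shown under the 40 column of the desired output, the same predicates can be wrapped in an aggregate (a sketch based on the query above):

-- Count of projects whose latest approval status is 30, i.e. waiting for 40
SELECT COUNT(*) AS waiting_for_40
FROM user_projects pr
WHERE EXISTS (
        SELECT 1 FROM user_approval ex
        WHERE ex.appr_prjt_id = pr.proj_id
          AND ex.appr_status = 30
      )
  AND NOT EXISTS (
        SELECT 1 FROM user_approval nx
        WHERE nx.appr_prjt_id = pr.proj_id
          AND nx.appr_status >= 40
      );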
You can get the counts for each of the next status requirements with a query that looks more like:
select
sum(case when ua.appr_status = 10 then 1 else 0 end) as app_waiting_20,
sum(case when ua.appr_status = 20 then 1 else 0 end) as app_waiting_30,
sum(case when ua.appr_status = 30 then 1 else 0 end) as app_waiting_40
from
user_approval ua;
The nice thing about this solution is that it needs only one table scan, and you can add all kinds of other counts/sums to the query result as well.
select * from user_approval where appr_status
= (select max(appr_status) from user_approval where appr_status < 40);
SQL Fiddle: http://www.sqlfiddle.com/#!11/f5243/10