(psql) How to sum up on a column based on conditions? - postgresql

I have a following postgresql table.
user
amount
type
danny
2
deposit
danny
3
withdraw
kathy
4
deposit
kathy
5
deposit
kathy
6
withdraw
Now, I am trying to get every user's remaining wallet balance. The sum up calculation works like this: for deposits they are positive in the sum function and for withdraw they are negative values. e.g. For danny, the remaining balance after 2 deposit and 3 withdraw is 2 - 3 = -1. For kathy, the remaining balance is 4 + 5 - 6 = 3.
What is the easiest way to calculate this in one Postgresql query?
Thanks.

Convert the type from text to the numeric factor 1 or -1 as appropriate. Then just do sum(amount * factor):
with test (usr, amount, type) as
( values ( 'danny', 2, 'deposit')
, ( 'danny', 3, 'withdraw')
, ( 'kathy', 4, 'deposit')
, ( 'kathy', 5, 'deposit')
, ( 'kathy', 6, 'withdraw')
)
-- your query starts here
select usr "User"
, sum (amount * factor) "Balance"
from ( select usr
, amount
, case when type = 'deposit' then 01
when type = 'withdraw' then -1
else null
end factor
from test
) sq
group by usr
order by usr;
NOTE: It is poor practice to use user as an identifier (i.e. column name, etc) since it is both a Postgres and SQL standard reserved word.

Related

Replace content in 'order' column with sequential numbers

I have an 'order' column in a table in a postgres database that has a lot of missing numbers in the sequence. I am having a problem figuring out how to replace the numbers currently in the column, with new ones that are incremental (see examples).
What I have:
id order name
---------------
1 50 Anna
2 13 John
3 2 Bruce
4 5 David
What I want:
id order name
---------------
1 4 Anna
2 3 John
3 1 Bruce
4 2 David
The row containing the lowest order number in the old version of the column should get the new order number '1', the next after that should get '2' etc.
You can use the window function row_number() to calculate the new numbers. The result of that can be used in an update statement:
update the_table
set "order" = t.rn
from (
select id, row_number() over (order by "order") as rn
from the_table
) t
where t.id = the_table.id;
This assumes that id is the primary key of that table.

Several top numbers in a column T-SQL

I have a table called _Invoice in SQL Server 2016 - like this:
Company InvoiceNo
-----------------
10 1
10 2
10 3
20 1
20 2
20 3
20 4
I want to get the highest value from all companies.
Like this:
Company InvoiceNo
-----------------
10 3
20 3
I want this data to then update another table that is called InvoiceSeries
where the InvoiceNo is higher than the NextNo in InvoiceSeries table
I am stuck with getting the highest data from InvoiceNo:
UPDATE InvoiceSeries
SET NextNo = -- Highest number from each company--
FROM InvoiceSeries ise
JOIN _Invoice i ON ise.InvoiceSeries = i.InvoiceSeries
WHERE i.InvoiceNo > ise.NextNo
Some example data:
Columns in InvoiceSeries Columns in _Invoices
Company NextNo Company InvoiceNo
10 9007 10 9008
20 1001 10 9009
10 9010
10 9011
10 9012
20 1002
20 1003
20 1004
If I understand correctly, you are looking for the HIGHEST common invoice number
Example
Select A.*
From YourTable A
Join (
Select Top 1 with ties
InvoiceNo
From YourTable
Group By InvoiceNo
Having count(Distinct Company) = (Select count(Distinct Company) From YourTable)
Order By InvoiceNo Desc
) B on A.InvoiceNo=B.InvoiceNo
Returns
Company InvoiceNo
10 3
20 3
EDIT - Updated for comment
Select company
,Invoice=max(invoiceno)
From YourTable
Group By company
This answer assumes there will be a record in the Invoice Series table.
--Insert Sample Data
CREATE TABLE #_Invoice (Company INT, InvoiceNo INT)
INSERT INTO #_Invoice(Company, InvoiceNo)
VALUES
(10 , 1),
(10 , 2),
(10 , 3),
(20 , 1),
(20 , 2),
(20 , 3),
(20 , 4)
CREATE TABLE #InvoiceSeries(Company INT, NextNo INT)
INSERT INTO #InvoiceSeries(Company, NextNo)
VALUES
(10, 1),
(20 ,1)
UPDATE s
SET NextNo = MaxInvoiceNo
FROM #InvoiceSeries s
INNER JOIN (
--Get the Max invoice number per company
SELECT Company, MAX(InvoiceNo) as MaxInvoiceNo
FROM #_Invoice
GROUP BY Company
) i on i.Company = s.Company
AND s.NextNo < i.MaxInvoiceNo --Only join to records where the 'nextno' is less than the max
--Confirm results
SELECT * FROM #InvoiceSeries
DROP TABLE #InvoiceSeries
DROP TABLE #_Invoice

How to calculate average number and give subquery label

I have two table "book" and "authorCollection". Because a book may have multi-authors, I hope to get the average number of authors in table "book" which published after year 2000(inclusive).
For example:
Table Book:
key year
1 2000
2 2001
3 2002
4 1999
Table authorCollection:
key author
1 Tom
1 John
1 Alex
1 Mary
2 Alex
3 Tony
4 Mary
The result should be (4 + 1 + 1) / 3 = 2;(key 4 publish before year 2000).
I write the following query statement, but not right, I need to get the number of result in subquery, but cannot give it a label "b", How can i solve this problem? And get the average number of author? I still confused about "COUNT(*) as count" meaning....Thanks.
SELECT COUNT(*) as count, b.COUNT(*) AS total
FROM A
WHERE key IN (SELECT key
FROM Book
WHERE year >= 2000
) b
GROUP BY key;
First, count number of authors for a key in a subquery. Next, aggregate needed values:
select avg(coalesce(ct, 0))
from book b
left join (
select key, count(*) ct
from authorcollection
group by 1
) a
using (key)
where year >= 2000;
A sample as well as handling 'divide by zero' error:
select case when count(distinct book.key)=0
then null
else count(authorCollection.key is not null)/count(distinct book.key)
end as avg_after_2000
from book
left join authorCollection on(book.key=authorCollection.key)
where book.year >= 2000

How to check if the sum of some records equals the difference between two other records in t-sql?

I have a view that contains bank account activity.
ACCOUNT BALANCE_ROW AMOUNT SORT_ORDER
111 1 0.00 1
111 0 10.00 2
111 0 -2.50 3
111 1 7.50 4
222 1 100.00 5
222 0 25.00 6
222 1 125.00 7
ACCOUNT = account number
BALANCE_ROW = either starting or ending
balance would be 1, otherwise 0
AMOUNT = the amount
SORT_ORDER =
simple order to return the records in the order of start balance,
activity, and end balance
I need to figure out a way to see if the sum of the non balance_row rows equal the difference between the ending balance and the starting balance. The result for each account (1 for yes, 0 for no) would be simply added to the resulting result set.
Example:
Account 111 had a starting balance of 0.00. There were two account activity records of 10.00 and -2.5. That resulted in the ending balance of 7.50.
I've been playing around with temp tables, but I was not sure if there is a more efficient way of accomplishing this.
Thanks for any input you may have!
I would use ranking, then group rows by ACCOUNT calculating totals along the way:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY SORT_ORDER)
FROM data
),
grouped AS (
SELECT
ACCOUNT,
BALANCE_DIFF = SUM(CASE BALANCE_ROW WHEN 1 THEN AMOUNT END
* CASE rnk WHEN 1 THEN -1 ELSE 1 END),
ACTIVITY_SUM = SUM(CASE BALANCE_ROW WHEN 0 THEN AMOUNT ELSE 0 END)
FROM data
GROUP BY
ACCOUNT
)
SELECT *
FROM grouped
WHERE BALANCE_DIFF <> ACTIVITY_SUM
Ranking is only used here to make it easier to calculate the starting/ending balance difference. If starting and ending balance rows had, for instance, different BALANCE_ROW codes (like 1 for the starting balance, 2 for the ending one), it would be possible to avoid ranking.
Untested code, but should be really close for comparing the summed balance with the balance_row as you've defined in your question.
SELECT
Account, /* Account Number */
(select sum(B.amount) from yourview B
where B.balance_row = 0 and
B.account = A.account and
B.sort_order BETWEEN A.sort_order and
(select max(sort_order) /* previous sort order value on account */
from yourview C where
C.balance_row = 1 and
C.account = A.account and
C.sort_order < A.sort_order)
) AS Test_Balance, /* Test_Balance = sum of amounts since last balance row */
Balance_Row /* Value of balance row */
FROM yourview A
WHERE A.Balance_Row = 1

How to find the average of certain records T-SQL

I have a table variable that I am dumping data into:
DECLARE #TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
Some records don't have a purchase order date or last received date, so the days difference is null as well. I have done a lot of inner joins on itself, but data seems to take too long, or comes out incorrect most of the time.
Is it possible to get the average per SKU days difference? how would I check if there is only 1 record of that SKU? I need the data, if there is only 1 record, then I have to find it at a champvendor level the average.
Here is the structure:
Vendor has many Numbers and Numbers has many SKUs
Any help would be great, I can't seem to crack this one, nor can I find anything related to this online. Thanks in advance.
Here is some sample data:
Vendor Number SKU PurchaseOrderDate LastReceivedDate DaysDifference
OTHER PMDD 1111 OP1111 2009-08-21 00:00:00.000 2009-09-02 00:00:00.000 12
OTHER PMDD 1111 OP1112 2009-12-09 00:00:00.000 2009-12-17 00:00:00.000 8
MANTOR 3333 MA1111 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
MANTOR 3333 MA1112 2006-02-15 00:00:00.000 2006-02-23 00:00:00.000 8
I'm sorry I may have written this wrong. If there is only 1 SKU for a record, then I want to return the DaysDifference (if it's not null), if it has more than 1 record and they are not null, then return the average days difference. If it is all nulls, then at a vendor level check for the average of the skus that are not null, otherwise it should just return 7. This is what I have tried:
SELECT t1.SKU, ISNULL
(
AVG(t1.DaysDifference),
(
SELECT ISNULL(AVG(t2.DaysDifference), 7)
FROM #TmpTbl_SKUs t2
WHERE t2.SKU=t1.SKU
GROUP BY t2.ChampVendor, t2.VendorNumber, t2.SKU
)
)
FROM #TmpTbl_SKUs t1
GROUP BY t1.SKU
Keep playing with this. I somewhat have what I got, but just don't understand how I would check if it has multiple records, and how to check at a vendor level.
Try this:
EDITED: added NULLIF(..., 0) to treat 0s as NULLs.
SELECT
t1.SKU,
COALESCE(
NULLIF(AVG(t1.DaysDifference), 0),
NULLIF(t2.AvgDifferenceVendor, 0),
7
) AS AvgDiff
FROM #TmpTbl_SKUs t1
INNER JOIN (
SELECT Vendor, AVG(DaysDifference) AS AvgDifferenceVendor
FROM #TmpTbl_SKUs
GROUP BY Vendor
) t2 ON t1.Vendor = t2.Vendor
GROUP BY t1.SKU, t2.AvgDifferenceVendor
EDIT 2: how I tested the script.
For testing I'm using the sample data posted with the question.
DECLARE #TmpTbl_SKUs AS TABLE
(
Vendor VARCHAR (255),
Number VARCHAR(4),
SKU VARCHAR(20),
PurchaseOrderDate DATETIME,
LastReceivedDate DATETIME,
DaysDifference INT
)
INSERT INTO #TmpTbl_SKUs
(Vendor, Number, SKU, PurchaseOrderDate, LastReceivedDate, DaysDifference)
SELECT 'OTHER PMDD', '1111', 'OP1111', '2009-08-21 00:00:00.000', '2009-09-02 00:00:00.000', 12
UNION ALL
SELECT 'OTHER PMDD', '1111', 'OP1112', '2009-12-09 00:00:00.000', '2009-12-17 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1111', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8
UNION ALL
SELECT 'MANTOR', '3333', 'MA1112', '2006-02-15 00:00:00.000', '2006-02-23 00:00:00.000', 8;
First I'm running the script on the unmodified data. Here's the result:
SKU AvgDiff
-------------------- -----------
MA1111 8
MA1112 8
OP1111 12
OP1112 8
AvgDiff for every SKU is identical to the original DaysDifference for every SKU, because there's only one row per each one.
Now I'm changing DaysDifference for SKU='MA1111' to 0 and running the script again. Ther result is:
SKU AvgDiff
-------------------- -----------
MA1111 4
MA1112 8
OP1111 12
OP1112 8
Now AvgDiff for MA1111 is 4. Why? Because the average for the SKU is 0, and so the average by Vendor is taken, which has been calculated as (0 + 8) / 2 = 4.
Next step is to set DaysDifference to 0 for all the SKUs of the same Vendor. In this case I'm setting it for SKUs MA1111 and MA1112. Here's the result of the script for this change:
SKU AvgDiff
-------------------- -----------
MA1111 7
MA1112 7
OP1111 12
OP1112 8
So now AvgDiff is 7 for both MA1111 and MA1112. How has it become so? Both have DaysDifference = 0. That means that the average by Vendor should be taken for each one. But Vendor average is 0 too in this case. According to the requirement, the average here should default to 7, which is what the script has returned.
So the script seems to be working correctly. I understand that it's either me having missed something or you having forgotten to mention some details. In any case, I would be glad to see where this script fails to solve your problem.