INNER JOIN changes SUM result - PostgreSQL

CREATE TABLE beauty.customer_payments
(
customer_id integer,
date date,
amount numeric(10,2),
CONSTRAINT customer_payments_customer_id_fkey FOREIGN KEY (customer_id)
REFERENCES beauty.customers (customer_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE beauty.sales
(
product_id integer,
customer_id integer,
sell_date date NOT NULL,
qty integer NOT NULL,
sell_price numeric(10,2) NOT NULL,
expiry_date date NOT NULL,
CONSTRAINT sales_customer_id_fkey FOREIGN KEY (customer_id)
REFERENCES beauty.customers (customer_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id)
REFERENCES beauty.products (product_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
The balance of payments in beauty.customer_payments for customer_id=6 is 0:
SELECT * FROM beauty.customer_payments
WHERE customer_id=6;
customer_id date       amount
----------- ---------- ------
6           2020-11-14 75.00
6           2020-11-14 -75.00
SELECT * FROM beauty.sales
WHERE customer_id=6;
product_id customer_id sell_date  qty sell_price expiry_date
---------- ----------- ---------- --- ---------- -----------
76         6           2020-11-14 1   75.00      2022-03-03
83         6           2020-11-14 1   10.00      2022-06-23
85         6           2020-11-14 1   10.00      2022-06-23
44         6           2020-11-14 1   12.00      2022-06-23
41         6           2020-11-14 1   15.00      2022-03-26
96         6           2020-11-14 1   75.00      2022-03-15
28         6           2020-11-14 1   4.00       2022-01-22
33         6           2020-11-14 1   4.00       2023-01-23
37         6           2020-11-14 1   4.00       2023-01-23
40         6           2020-11-14 1   4.00       2023-08-13
(10 rows)
SELECT customer_id, SUM(qty * sell_price) AS purchased
FROM beauty.sales
WHERE customer_id=6
GROUP BY customer_id;
customer_id purchased
----------- ---------
6           213.00
SELECT s.customer_id,
SUM(qty * sell_price) AS purchased,
SUM(cp.amount) AS paid,
SUM(qty * sell_price - cp.amount) AS balance
FROM beauty.sales s
INNER JOIN beauty.customer_payments cp
ON cp.customer_id = s.customer_id
WHERE s.customer_id=6
GROUP BY s.customer_id;
customer_id purchased paid balance
----------- --------- ---- -------
6           1065.00   0.00 1065.00
Please advise why the calculation goes wrong after adding a JOIN (INNER, LEFT, or RIGHT), and how to solve this multi-table calculation issue.
The similar questions I have found do not involve cross-table calculations like SUM(qty * sell_price - cp.amount).
Delete payments
DELETE FROM beauty.customer_payments
WHERE customer_id=6;
Add new ZERO payment
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', 0);
customer_id purchased paid balance
----------- --------- ---- -------
6           213.00    0.00 213.00
Add payment 10
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', 10);
SELECT * FROM beauty.customer_payments WHERE customer_id=6;
customer_id date       amount
----------- ---------- ------
6           2020-11-17 0.00
6           2020-11-17 10.00
SELECT s.customer_id,
.......
INNER JOIN beauty.customer_payments cp
.......
customer_id purchased paid   balance
----------- --------- ------ -------
6           426.00    100.00 326.00
Correct payment with negative amount
INSERT INTO beauty.customer_payments(
customer_id, date, amount)
VALUES (6, '2020-11-17', -10);
SELECT * FROM beauty.customer_payments WHERE customer_id=6;
customer_id date       amount
----------- ---------- ------
6           2020-11-17 0.00
6           2020-11-17 10.00
6           2020-11-17 -10.00
SELECT s.customer_id,
.......
INNER JOIN beauty.customer_payments cp
.......
customer_id purchased paid balance
----------- --------- ---- -------
6           639.00    0.00 639.00
What does this INNER JOIN calculate?

In general, you probably wouldn't want to match each payment to each sale (unless there's an additional identifier matching specific payments to specific sales, not just matching each to a customer).
If a customer has 2 sales for $10 and $15, and two payments for $9 and $14, your join is going to match each payment to each sale for that customer, creating something like
Sale Payment
---- -------
$10  $9
$10  $14
$15  $9
$15  $14
So the sum of the sales after the join will be $50, not $25 (as you might be expecting). I think the above answers your question about why the join doesn't do what you expect.
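The fan-out described above can be reproduced in a few lines. This is a minimal sketch, assuming Python's built-in sqlite3 module instead of PostgreSQL and two hypothetical tables (sales, payments) holding the $10/$15 and $9/$14 figures from the example.

```python
import sqlite3

# In-memory database; the table names here are illustrative, not the asker's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    CREATE TABLE payments (customer_id INTEGER, amount REAL);
    INSERT INTO sales VALUES (1, 10), (1, 15);
    INSERT INTO payments VALUES (1, 9), (1, 14);
""")

# Joining on customer_id alone pairs every sale with every payment.
joined_rows = conn.execute("""
    SELECT s.amount, p.amount
    FROM sales s
    JOIN payments p ON p.customer_id = s.customer_id
    ORDER BY s.amount, p.amount
""").fetchall()

# Each sale is counted once per payment, and each payment once per sale.
sales_total, payments_total = conn.execute("""
    SELECT SUM(s.amount), SUM(p.amount)
    FROM sales s
    JOIN payments p ON p.customer_id = s.customer_id
""").fetchone()

print(joined_rows)                  # four rows: 2 sales x 2 payments
print(sales_total, payments_total)  # 50.0 46.0
```

Both sums are inflated: the sales total doubles because there are two payments, and the payments total doubles because there are two sales.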
The exact query you want might be a little different (do you want all customers even if they have no sales? is it possible for customers to have payments if they don't have a sale?), but in general I'd expect something like the following to work. There are multiple ways of doing this, but I think the following is easy to understand since it aggregates the data into one payment row per customer and one sales row per customer before joining them.
SELECT
s.customer_id,
s.purchased,
cp.amount as paid,
s.purchased - cp.amount as balance
FROM
(SELECT s.customer_id,
SUM(s.qty * s.sell_price) AS purchased
FROM beauty.sales s
GROUP BY s.customer_id) s
LEFT OUTER JOIN
(SELECT cp.customer_id,
SUM(cp.amount) AS amount
FROM beauty.customer_payments cp
GROUP BY cp.customer_id) cp
ON s.customer_id = cp.customer_id
WHERE s.customer_id = 6
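To see the aggregate-then-join idea on the question's own figures, here is a sketch that again assumes Python's sqlite3 rather than PostgreSQL, with the schema prefix and unused columns dropped. The COALESCE calls are an addition of mine (not part of the query above) so that a customer with no payments shows 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, qty INTEGER, sell_price REAL);
    CREATE TABLE customer_payments (customer_id INTEGER, amount REAL);
    -- The ten sales rows and two payment rows shown for customer 6.
    INSERT INTO sales VALUES
        (6, 1, 75), (6, 1, 10), (6, 1, 10), (6, 1, 12), (6, 1, 15),
        (6, 1, 75), (6, 1, 4), (6, 1, 4), (6, 1, 4), (6, 1, 4);
    INSERT INTO customer_payments VALUES (6, 75), (6, -75);
""")

# Collapse each table to one row per customer before joining,
# so neither side can multiply the other's rows.
row = conn.execute("""
    SELECT s.customer_id,
           s.purchased,
           COALESCE(cp.paid, 0) AS paid,
           s.purchased - COALESCE(cp.paid, 0) AS balance
    FROM (SELECT customer_id, SUM(qty * sell_price) AS purchased
          FROM sales
          GROUP BY customer_id) s
    LEFT JOIN (SELECT customer_id, SUM(amount) AS paid
               FROM customer_payments
               GROUP BY customer_id) cp
           ON cp.customer_id = s.customer_id
    WHERE s.customer_id = 6
""").fetchone()

print(row)  # (6, 213.0, 0.0, 213.0)
```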

Related

Find Accounts with X Number of Transactions within Y Days of Each Other in a Larger Date Range

I am trying to write a SQL statement that will find the accounts that have had 3 or more transactions within 3 days whose absolute value is greater than $10.00 over the course of a week and then return those transactions.
Consider this data...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
1 0123 2020-09-01 45.75
2 0123 2020-09-02 5.23
3 0123 2020-09-03 9.94
4 0123 2020-09-05 8.35
5 0123 2020-09-06 -16.23
6 0123 2020-09-07 14.71
7 0123 2020-09-08 15.03
8 0123 2020-09-08 23.10
9 0123 2020-09-09 94.20
10 0123 2020-09-09 5.01
11 0123 2020-09-10 3.02
12 0123 2020-09-11 4.37
13 0123 2020-09-12 4.54
14 9876 2020-09-01 -45.75
15 9876 2020-09-02 5.27
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
If the week under review is 2020-09-01 to 2020-09-07 I would expect only AccountNumber 9876 to fit the criteria with TransactionIDs 16, 17, and 18 being the 3 transactions within 3 days with an absolute value greater than $10.00.
It seems like I should be able to use window functions (and perhaps framing), but I can't figure out how to start.
I have attempted without the use of window functions based on the answers to this question...
multiple transactions within a certain time period, limited by date range
DECLARE
@BeginDate DATE
, @EndDate DATE
, @ThresholdAmount DECIMAL(10, 2)
, @ThresholdCount INT
, @NumberOfDays INT;
SET @BeginDate = '09/01/2020';
SET @EndDate = '09/07/2020';
SET @ThresholdAmount = 10.00;
SET @ThresholdCount = 3;
SET @NumberOfDays = 3;
SELECT t.*
FROM (
SELECT
t1.*
, (
SELECT COUNT(*)
FROM Transactions t2
WHERE t2.AccountNumber = t1.AccountNumber
AND t2.TransactionID <> t1.TransactionID
AND t2.TransactionDate >= t1.TransactionDate
AND t2.TransactionDate < DATEADD(DAY, @NumberOfDays, t1.TransactionDate)
AND ABS(t2.TransactionAmount) > @ThresholdAmount
) AS NumberWithinXDays
FROM Transactions t1
WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t1.TransactionAmount) > @ThresholdAmount
) t
WHERE t.NumberWithinXDays >= @ThresholdCount;
SELECT *
FROM Transactions t
WHERE EXISTS (
SELECT *
FROM (
SELECT t1.AccountNumber
FROM Transactions t1
INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
AND t1.TransactionID <> t2.TransactionID
AND DATEDIFF(DAY, t1.TransactionDate, t2.TransactionDate) BETWEEN 0 AND (@NumberOfDays-1)
WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
AND t2.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t1.TransactionAmount) > @ThresholdAmount
AND ABS(t2.TransactionAmount) > @ThresholdAmount
GROUP BY t1.AccountNumber
HAVING COUNT(t1.TransactionID) >= @ThresholdCount
) x
WHERE x.AccountNumber = t.AccountNumber
)
AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t.TransactionAmount) > @ThresholdAmount;
My first query comes back with...
TransactionID AccountNumber TransactionDate TransactionAmount NumberWithinXDays
------------- ------------- --------------- ----------------- -----------------
5 0123 2020-09-06 -16.23 3
6 0123 2020-09-07 14.71 3
Not even close. And the second query returns...
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
14 9876 2020-09-01 -45.75
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Closer, but not restricted to just transactions within 3 days of each other. This is the result I want.
TransactionID AccountNumber TransactionDate TransactionAmount
------------- ------------- --------------- -----------------
16 9876 2020-09-05 19.79
17 9876 2020-09-05 -11.64
18 9876 2020-09-06 12.42
Now it is certainly possible I have not implemented these suggested queries correctly. Or maybe there is some subtle difference I am missing and they just don't fit my situation.
Any suggestions on fixing either of my attempted queries or something completely different with or without window functions?
Here is full dbfiddle of my code.
I was not able to come up with a solution using window functions. As I thought about it more I thought I might be able to use a CTE, but I could not figure that out either.
I solved it using a couple of subqueries. I was concerned about performance given that my transaction table has 86 million rows. However, it runs in less than 30 seconds, and that is good enough for me.
-- DISTINCT is needed because a particular transaction may fit into more than
-- one transaction window but we only want to see it once in the results
SELECT DISTINCT
t.TransactionID
, t.AccountNumber
, t.TransactionDate
, t.TransactionAmount
FROM (
SELECT
t1.AccountNumber
, t1.TransactionDateWindowBegin
, t1.TransactionDateWindowEnd
, COUNT(DISTINCT t2.TransactionID) AS Count
FROM (
-- establish the transaction window for each transaction within the
-- larger date range and an absolute value above the threshold
SELECT
TransactionID
, AccountNumber
, TransactionDate AS [TransactionDateWindowBegin]
, DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS [TransactionDateWindowEnd]
, TransactionAmount
FROM Transactions
WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(TransactionAmount) > @ThresholdAmount
) t1
-- join back to the transaction table to find transactions within the transaction window for
-- each transaction, count them, and only keep those that are above the threshold count
INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
AND t1.TransactionDateWindowBegin <= t2.TransactionDate
AND t1.TransactionDateWindowEnd >= t2.TransactionDate
WHERE t2.TransactionDate BETWEEN @BeginDate AND @EndDate
AND ABS(t2.TransactionAmount) > @ThresholdAmount
GROUP BY t1.AccountNumber
, t1.TransactionDateWindowBegin
, t1.TransactionDateWindowEnd
HAVING COUNT(DISTINCT t2.TransactionID) >= @ThresholdCount
) x
-- join back to the transaction table again to get the details for the
-- transactions that meet the threshold amount and count criteria
INNER JOIN Transactions t ON x.AccountNumber = t.AccountNumber
AND x.TransactionDateWindowBegin <= t.TransactionDate
AND x.TransactionDateWindowEnd >= t.TransactionDate
AND ABS(t.TransactionAmount) > @ThresholdAmount;
Here is the full demo.
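For readers without SQL Server at hand, the subquery approach can be checked against the question's sample data with a rough port to SQLite, run through Python's sqlite3 module. The port is an assumption on my part, not the original demo: SQLite's date() with a '+N days' modifier stands in for DATEADD, the T-SQL variables become named parameters, and the window column names are shortened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Transactions (
    TransactionID INTEGER, AccountNumber TEXT,
    TransactionDate TEXT, TransactionAmount REAL)""")
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?)", [
    (1, "0123", "2020-09-01", 45.75), (2, "0123", "2020-09-02", 5.23),
    (3, "0123", "2020-09-03", 9.94), (4, "0123", "2020-09-05", 8.35),
    (5, "0123", "2020-09-06", -16.23), (6, "0123", "2020-09-07", 14.71),
    (7, "0123", "2020-09-08", 15.03), (8, "0123", "2020-09-08", 23.10),
    (9, "0123", "2020-09-09", 94.20), (10, "0123", "2020-09-09", 5.01),
    (11, "0123", "2020-09-10", 3.02), (12, "0123", "2020-09-11", 4.37),
    (13, "0123", "2020-09-12", 4.54), (14, "9876", "2020-09-01", -45.75),
    (15, "9876", "2020-09-02", 5.27), (16, "9876", "2020-09-05", 19.79),
    (17, "9876", "2020-09-05", -11.64), (18, "9876", "2020-09-06", 12.42),
])

number_of_days = 3
params = {
    "begin_date": "2020-09-01",
    "end_date": "2020-09-07",
    "threshold_amount": 10.00,
    "threshold_count": 3,
    # A 3-day window is the start date plus 2 more days.
    "window_offset": f"+{number_of_days - 1} days",
}

query = """
SELECT DISTINCT t.TransactionID
FROM (
    -- one candidate window per qualifying transaction, kept only if it
    -- contains enough qualifying transactions for the same account
    SELECT t1.AccountNumber, t1.WindowBegin, t1.WindowEnd
    FROM (
        SELECT AccountNumber,
               TransactionDate AS WindowBegin,
               date(TransactionDate, :window_offset) AS WindowEnd
        FROM Transactions
        WHERE TransactionDate BETWEEN :begin_date AND :end_date
          AND ABS(TransactionAmount) > :threshold_amount
    ) t1
    JOIN Transactions t2
      ON t2.AccountNumber = t1.AccountNumber
     AND t2.TransactionDate BETWEEN t1.WindowBegin AND t1.WindowEnd
    WHERE t2.TransactionDate BETWEEN :begin_date AND :end_date
      AND ABS(t2.TransactionAmount) > :threshold_amount
    GROUP BY t1.AccountNumber, t1.WindowBegin, t1.WindowEnd
    HAVING COUNT(DISTINCT t2.TransactionID) >= :threshold_count
) x
-- join back to fetch the transactions inside each surviving window
JOIN Transactions t
  ON t.AccountNumber = x.AccountNumber
 AND t.TransactionDate BETWEEN x.WindowBegin AND x.WindowEnd
 AND ABS(t.TransactionAmount) > :threshold_amount
ORDER BY t.TransactionID
"""
matching_ids = [r[0] for r in conn.execute(query, params)]
print(matching_ids)  # [16, 17, 18]
```

On the sample data only the window starting 2020-09-05 for account 9876 reaches three qualifying transactions, which matches the expected result in the question.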

Several top numbers in a column T-SQL

I have a table called _Invoice in SQL Server 2016 - like this:
Company InvoiceNo
-----------------
10 1
10 2
10 3
20 1
20 2
20 3
20 4
I want to get the highest value from all companies.
Like this:
Company InvoiceNo
-----------------
10 3
20 3
I want to use this data to update another table, called InvoiceSeries, wherever the InvoiceNo is higher than the NextNo in the InvoiceSeries table.
I am stuck on getting the highest InvoiceNo for each company:
UPDATE InvoiceSeries
SET NextNo = -- Highest number from each company--
FROM InvoiceSeries ise
JOIN _Invoice i ON ise.InvoiceSeries = i.InvoiceSeries
WHERE i.InvoiceNo > ise.NextNo
Some example data:
Columns in InvoiceSeries    Columns in _Invoices
Company NextNo              Company InvoiceNo
10      9007                10      9008
20      1001                10      9009
                            10      9010
                            10      9011
                            10      9012
                            20      1002
                            20      1003
                            20      1004
If I understand correctly, you are looking for the HIGHEST common invoice number
Example
Select A.*
From YourTable A
Join (
Select Top 1 with ties
InvoiceNo
From YourTable
Group By InvoiceNo
Having count(Distinct Company) = (Select count(Distinct Company) From YourTable)
Order By InvoiceNo Desc
) B on A.InvoiceNo=B.InvoiceNo
Returns
Company InvoiceNo
10 3
20 3
EDIT - Updated for comment
Select company
,Invoice=max(invoiceno)
From YourTable
Group By company
This answer assumes there will be a record in the Invoice Series table.
--Insert Sample Data
CREATE TABLE #_Invoice (Company INT, InvoiceNo INT)
INSERT INTO #_Invoice(Company, InvoiceNo)
VALUES
(10 , 1),
(10 , 2),
(10 , 3),
(20 , 1),
(20 , 2),
(20 , 3),
(20 , 4)
CREATE TABLE #InvoiceSeries(Company INT, NextNo INT)
INSERT INTO #InvoiceSeries(Company, NextNo)
VALUES
(10, 1),
(20 ,1)
UPDATE s
SET NextNo = MaxInvoiceNo
FROM #InvoiceSeries s
INNER JOIN (
--Get the Max invoice number per company
SELECT Company, MAX(InvoiceNo) as MaxInvoiceNo
FROM #_Invoice
GROUP BY Company
) i on i.Company = s.Company
AND s.NextNo < i.MaxInvoiceNo --Only join to records where the 'nextno' is less than the max
--Confirm results
SELECT * FROM #InvoiceSeries
DROP TABLE #InvoiceSeries
DROP TABLE #_Invoice
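The same update can be tried on the example data from the question (NextNo 9007 and 1001). The sketch below assumes Python's sqlite3 rather than SQL Server, so the UPDATE ... FROM join is replaced by correlated subqueries, which is my substitution and not the syntax used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InvoiceSeries (Company INTEGER, NextNo INTEGER);
    CREATE TABLE _Invoice (Company INTEGER, InvoiceNo INTEGER);
    INSERT INTO InvoiceSeries VALUES (10, 9007), (20, 1001);
    INSERT INTO _Invoice VALUES
        (10, 9008), (10, 9009), (10, 9010), (10, 9011), (10, 9012),
        (20, 1002), (20, 1003), (20, 1004);
""")

# Raise NextNo to the company's highest invoice number,
# but only when that number is actually higher than the current NextNo.
conn.execute("""
    UPDATE InvoiceSeries
    SET NextNo = (SELECT MAX(i.InvoiceNo)
                  FROM _Invoice i
                  WHERE i.Company = InvoiceSeries.Company)
    WHERE NextNo < (SELECT MAX(i.InvoiceNo)
                    FROM _Invoice i
                    WHERE i.Company = InvoiceSeries.Company)
""")

series = conn.execute(
    "SELECT Company, NextNo FROM InvoiceSeries ORDER BY Company"
).fetchall()
print(series)  # [(10, 9012), (20, 1004)]
```

A company with no invoices is left untouched, because comparing NextNo with a NULL maximum is never true.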

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have a query which is generating results as shown in the example below, but I'm left with islands of data (where a transaction existed in that month) and gaps in between. My question is still similar but in some ways a little more generic: what is the best way to fill gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
A similar approach, but using a Cartesian product in the first step to obtain all the combinations of report IDs and months.
A second step adds to that Cartesian product the maximum timestamp from the source table where the month is less than or equal to the month in the current row.
Finally, it joins the source table to the maxdates CTE by report ID and timestamp to obtain the latest source table row for every report ID and month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth
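The carry-forward idea can be checked on the question's source data with a small sketch. It assumes Python's sqlite3 instead of SQL Server, so strftime('%Y-%m', ...) replaces the DATEPART arithmetic and a correlated subquery with LIMIT 1 replaces the MAX-then-join step. Like the query above, it only knows about months that appear somewhere in tblSource; a date table would be needed for months with no transactions at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSource (
        ReportId INTEGER,
        TransactionTimestamp TEXT,
        CurrentProductionHours REAL);
    INSERT INTO tblSource VALUES
        (1, '2014-01-05 13:37:00', 14.50),
        (1, '2014-01-20 09:15:00', 15.00),
        (1, '2014-01-21 10:20:00', 10.00),
        (2, '2014-01-22 09:43:00', 22.00),
        (1, '2014-02-02 08:50:00', 12.00);
""")

query = """
WITH reports AS (SELECT DISTINCT ReportId FROM tblSource),
     months AS (SELECT DISTINCT strftime('%Y-%m', TransactionTimestamp) AS ym
                FROM tblSource)
SELECT r.ReportId,
       m.ym,
       -- latest value recorded in this month or any earlier month
       (SELECT s.CurrentProductionHours
        FROM tblSource s
        WHERE s.ReportId = r.ReportId
          AND strftime('%Y-%m', s.TransactionTimestamp) <= m.ym
        ORDER BY s.TransactionTimestamp DESC
        LIMIT 1) AS ProductionHours
FROM reports r
CROSS JOIN months m
ORDER BY m.ym, r.ReportId
"""
filled = conn.execute(query).fetchall()
for report_id, year_month, hours in filled:
    print(report_id, year_month, hours)
```

Report 2 has no February transaction, so its January value of 22.00 is carried forward, matching the target results in the question.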

Calculated balance of purchased lots

I have a list of purchases by date. EG:
ItemCode, Purchase Date, Purchase Qty
XXX, 01 Jan 2012, 10
XXX, 10 Jan 2012, 5
For the item I have a corresponding Sales transactions:
Item, Sales Date, Sales Qty
XXX, 02 Jan 2012, -5
XXX, 09 Jan 2012, -3
XXX, 11 JAN 2012, -3
I am looking to get a SQL query (Without a cursor), to get the balance on each purchase order quantity. I.e Run each purchase (First in first out) to 0. (For the purposes of aging inventory )
How can you join the Purchases to the Sales to get this balance remaining each purchased Inventory Lot? Is this possible without a cursor?
Yes.
You union the two tables together, and run a running total on the resulting set.
;with cte as
(
select itemcode, purchasedate as tdate, purchaseqty as qty from purchases
union all
select itemcode, salesdate, salesqty from sales
)
select
t1.itemcode, t1.tdate, t1.qty,
SUM(t2.qty) as runningtotal
from cte t1
left join cte t2
on t1.tdate>=t2.tdate
and t1.itemcode = t2.itemcode
group by t1.itemcode, t1.tdate, t1.qty
To get the stock remaining at any particular time the same principle applies.
select p1.*,
case when (select SUM(abs(qty)) from sales) > SUM(p2.qty) then 0
else SUM(p2.qty) - (select SUM(abs(qty)) from sales) end as stockremaining
from purchases p1
left join purchases p2 on p1.item = p2.item
and p2.purchasedate <= p1.purchasedate
group by p1.purchasedate, p1.item, p1.qty
gives
1 2012-01-01 10 0
1 2012-01-10 5 4
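As a cross-check of the figures above, here is a sketch of a per-lot FIFO balance on the question's data, assuming Python's sqlite3 and the column names item, purchasedate, salesdate and qty (which are assumptions). It clamps each lot's remainder between 0 and the lot's own quantity, which is a small variation on the query above, and it assumes purchase dates are distinct per item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (item TEXT, purchasedate TEXT, qty INTEGER);
    CREATE TABLE sales (item TEXT, salesdate TEXT, qty INTEGER);
    INSERT INTO purchases VALUES
        ('XXX', '2012-01-01', 10), ('XXX', '2012-01-10', 5);
    INSERT INTO sales VALUES
        ('XXX', '2012-01-02', -5), ('XXX', '2012-01-09', -3),
        ('XXX', '2012-01-11', -3);
""")

query = """
SELECT item, purchasedate, qty,
       -- what is left of this lot once all sales are consumed oldest-first
       MAX(0, MIN(qty, cumulative - sold)) AS remaining
FROM (
    SELECT p1.item, p1.purchasedate, p1.qty,
           SUM(p2.qty) AS cumulative,   -- purchases up to and including this lot
           COALESCE((SELECT SUM(ABS(s.qty))
                     FROM sales s
                     WHERE s.item = p1.item), 0) AS sold
    FROM purchases p1
    JOIN purchases p2
      ON p2.item = p1.item
     AND p2.purchasedate <= p1.purchasedate
    GROUP BY p1.item, p1.purchasedate, p1.qty
)
ORDER BY purchasedate
"""
lots = conn.execute(query).fetchall()
print(lots)  # [('XXX', '2012-01-01', 10, 0), ('XXX', '2012-01-10', 5, 4)]
```

Eleven units were sold in total, so the first lot of 10 is exhausted and one unit comes out of the second lot, leaving 4.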

How to sum totals across different tables when one table could have no rows

I am trying to subtract one column in table1 from another and then add the total from table2, if data exists in that table. There will not always be data in table2, so when there is nothing there I need to use 0. Here is what I have so far, which returns the wrong amount when there IS data in table2.
SELECT
CONVERT(CHAR(10),table1.PostingDate, 120) AS business_date,
table1.Location AS store_number,
(CASE WHEN COUNT(table2.Document) > 0 THEN
SUM(table1.Total - table1.TipAmount + table2.Total)
Else
SUM(table1.Total-table1.TipAmount)
END) AS net_sales_ttl
FROM table1
left join table2
on CONVERT(CHAR(10),table1.PostingDate, 120) = CONVERT(CHAR(10),table2.PostingDate, 120)
WHERE table1.PostingDate between '2012-09-09' and '2012-09-16'
GROUP BY CONVERT(CHAR(10),table1.PostingDate, 120), table1.Location
Here is the results:
business_date store_number net_sales_ttl
2012-09-09 xxx 1699.61
2012-09-10 xxx 923.56
2012-09-11 xxx 1230.93 <--This should be 1399.93
2012-09-12 xxx 874.98
2012-09-13 xxx 1342.21
2012-09-14 xxx 1609.6
2012-09-15 xxx 2324.31
For some reason the query is not doing the math correctly and returning the wrong values. The only day that table2 has a value is 09-11-12 and that amount is only -1.00. It is giving me 1230.93 which is -169 from the correct value. I don't know where the -169 is coming from when it should be -1.00. Original amount is 1400.93 in table1 and table2 should be subtracted from that which is -1.00 giving a result of 1399.93.
Sample data:
table1 has date, location, and sales
09-09 1111 5.00
09-10 1111 3.00
09-11 1111 7.00
09-12 1111 10.00
table2 has refunds
09-11 1111 -1.00
Return set would look like this:
09-09 1111 5.00
09-10 1111 3.00
09-11 1111 6.00 <--Reflecting the refund from table2
09-12 1111 10.00
try
SELECT
CONVERT(CHAR(10),table1.PostingDate, 120) AS business_date,
table1.Location AS store_number,
SUM (table1.Total - table1.TipAmount + isnull(table2.Total,0))
FROM table1
left join table2
on CONVERT(CHAR(10),table1.PostingDate, 120) = CONVERT(CHAR(10),table2.PostingDate, 120)
WHERE table1.PostingDate between '2012-09-09' and '2012-09-16'
GROUP BY CONVERT(CHAR(10),table1.PostingDate, 120), table1.Location
Also, some sample data would help; the totals alone are not enough to diagnose this.
Based on your sample data, try this.
Select date, location, sum(sales)
from
(
select date, location, sales from table1
union all
select date, location, refunds from table2
) v
group by date, location
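The stack-then-sum approach can be tried on the sample data from the question. This sketch assumes Python's sqlite3 and the simplified columns from the sample (date, location, sales, refunds). It uses UNION ALL, since a plain UNION would silently drop identical rows before they are summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (date TEXT, location TEXT, sales REAL);
    CREATE TABLE table2 (date TEXT, location TEXT, refunds REAL);
    INSERT INTO table1 VALUES
        ('09-09', '1111', 5.00), ('09-10', '1111', 3.00),
        ('09-11', '1111', 7.00), ('09-12', '1111', 10.00);
    INSERT INTO table2 VALUES ('09-11', '1111', -1.00);
""")

# Stack sales and refunds into one column, then sum per day and location.
# No join is involved, so no row can be counted more than once.
totals = conn.execute("""
    SELECT date, location, SUM(amount) AS net
    FROM (
        SELECT date, location, sales AS amount FROM table1
        UNION ALL
        SELECT date, location, refunds FROM table2
    ) v
    GROUP BY date, location
    ORDER BY date
""").fetchall()

for row in totals:
    print(row)
```

Days without a refund keep their sales total unchanged, and 09-11 comes out as 6.00, as in the expected return set.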