SELECT 0 for multiple rows of join

I have an invoices table and a payments table.
Each invoice record (which contains an OriginalInvoiceValue field) can have multiple associated payments in the payments table. In a SELECT that joins the two tables, each invoice record is, of course, repeated once for every associated payment record.
What I would like, though, is to have the OriginalInvoiceValue field returned only once per invoice, and 0 (or NULL) for each additional associated payment record. That way, if I export the data to Excel and sum the OriginalInvoiceValue column, I get the real total of all invoices instead of a total inflated by every additional payment.
Is this possible in T-SQL?

If you are on SQL Server 2005 or newer, you can assign row numbers to the individual payments of each invoice and left join to invoices, picking up the invoice record for the first payment only. Put whatever ordering you need in the row_number() clause; I've chosen PaymentID, but payment date is probably more appropriate.
; with p as (
    select *,
           row_number() over (partition by InvoiceID
                              order by PaymentID) rn
    from payments
)
select *
from p
left join invoices i
    on p.InvoiceID = i.InvoiceID
   and p.rn = 1
order by p.InvoiceID, rn
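To check the idea outside SQL Server, here is a rough sketch that runs the same CTE in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or newer (for window functions), and the tables and sample rows are hypothetical. Invoice 1 has two payments, so its value should appear only on the first one:

```python
import sqlite3

# Hypothetical sample data: invoice 1 has two payments, invoice 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (InvoiceID INTEGER PRIMARY KEY, OriginalInvoiceValue REAL);
    CREATE TABLE payments (PaymentID INTEGER PRIMARY KEY, InvoiceID INTEGER, Amount REAL);
    INSERT INTO invoices VALUES (1, 100.0), (2, 50.0);
    INSERT INTO payments VALUES (10, 1, 40.0), (11, 1, 60.0), (12, 2, 50.0);
""")

# Same shape as the T-SQL answer: number each invoice's payments, then join the
# invoice columns only onto the first payment (rn = 1); later payments get NULL.
rows = conn.execute("""
    WITH p AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY InvoiceID ORDER BY PaymentID) AS rn
        FROM payments
    )
    SELECT p.InvoiceID, p.PaymentID, p.rn, i.OriginalInvoiceValue
    FROM p
    LEFT JOIN invoices i
      ON p.InvoiceID = i.InvoiceID
     AND p.rn = 1
    ORDER BY p.InvoiceID, p.rn
""").fetchall()

for row in rows:
    print(row)

# Summing the column now counts each invoice once (the NULL rows are skipped).
total = sum(r[3] for r in rows if r[3] is not None)
print(total)
```

If you would rather see a literal 0 than NULL, an inner join with CASE WHEN p.rn = 1 THEN i.OriginalInvoiceValue ELSE 0 END in the select list gives the same total.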

Related

Postgres query filter by non column in table

I have a challenge: I need to filter a query not by a value stored in a table column, but by a value returned by a function.
Consider a table that contains all sales in the database:
id, description, category, price, col1 , ..... col n
I have a function that, given one sale, returns its similar sales (based on rules and business logic). The function queries all records in the sales table again and matches on some of the fields.
similar_sales(sale_id integer) -> returns integer[]
Now I need to list all similar sales for each sale in the sales table.
select s.id, similar_sales (s.id)
from sales s
But similar_sales can return null, and I am only interested in sales that have at least one similar sale.
select id, similar
from (
select s.id, similar_sales (s.id) as similar
from sales s
) q
where #similar > 1 (Pseudocode)
limit x
I can't put the LIMIT in the subquery, because I don't know in advance which sales have similar sales and which don't.
I want the function to run only for a small set of rows rather than the entire table, to get better query performance (a pagination strategy).

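You can try this. Note that isempty() is defined for ranges, not arrays, so cardinality() is used to test the array, and because similar is a reserved word in Postgres, the function result gets a different alias:
select s.id, sim.similar_ids
from sales s
cross join lateral similar_sales(s.id) as sim(similar_ids)
-- cardinality() is 0 for an empty array and NULL for a NULL one; both are filtered out
where cardinality(sim.similar_ids) > 0
order by s.id  -- a stable order, so LIMIT pages are repeatable
limit x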
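Filtering before LIMIT helps with pagination because the filter is evaluated row by row, so an engine can usually stop once it has x matching rows instead of calling the function for the whole table (in Postgres this depends on the plan; a sort over the full result would defeat it). Here is a rough sketch of that behavior using SQLite through Python's sqlite3 module. SQLite has no LATERAL or array types, so the hypothetical similar_sales stand-in below returns a comma-separated string and counts its own calls:

```python
import sqlite3

calls = 0  # how many times the stand-in function has been evaluated

def similar_sales(sale_id):
    """Hypothetical stand-in for the Postgres function: returns a
    comma-separated list of similar sale ids ('' when there are none).
    In this sketch only even ids have a similar sale."""
    global calls
    calls += 1
    return str(sale_id + 100) if sale_id % 2 == 0 else ""

conn = sqlite3.connect(":memory:")
conn.create_function("similar_sales", 1, similar_sales)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO sales (id, description) VALUES (?, ?)",
                 [(i, f"sale {i}") for i in range(1, 1001)])

# Filter on the function's result, then LIMIT. Rows come out in primary-key
# order, so no sort is needed and the scan can stop after the first 2 matches.
page = conn.execute("""
    SELECT id, similar_sales(id) AS similar_ids
    FROM sales
    WHERE similar_sales(id) <> ''
    ORDER BY id
    LIMIT 2
""").fetchall()

print(page)   # [(2, '102'), (4, '104')]
print(calls)  # far fewer than the 1000 rows in the table
```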

Postgresql finding max transaction_id for each type giving duplicates (when it's not supposed to for PK)

As the title says: I have the query below to find, for each card type, the transaction ID with the highest amount.
SELECT tr.identifier, cc.type, tr.amount as max_amount
FROM credit_cards cc, transactions tr
WHERE (tr.amount, cc.type) IN (SELECT MAX(tr.amount), cc.type
FROM credit_cards cc, transactions tr
WHERE cc.number = tr.number
GROUP BY cc.type)
GROUP BY tr.identifier, cc.type;
When I run the code, I get duplicate transaction identifiers, which shouldn't happen since identifier is the PK of the transactions table. The output of the query above is:
ID     Card type                    Max amount
2196   diners-club-carte-blanche    1000.62
2196   visa                         1000.62
11141  mastercard                   1000.54
2378   mastercard                   1000.54
For example, 2196 is a diners-club-carte-blanche transaction, not a visa one.
The two mastercard rows are correct: two different IDs can share the same maximum amount for a type, and the query should allow that.
Does anyone know how to prevent the duplicates?
Is this caused by the WHERE ... IN clause matching the max amount or the card type separately? (The duplicated rows are for Visa and Diners Carte Blanche, which both have the same maximum of 1000.62, so I think that's where the wrong matches happen.)

TL;DR: add WHERE cc.number = tr.number to the outer query.
Long version
When you query FROM table_1, table_2 in the outer query and don't connect the tables (via a join or a WHERE clause), the result is a Cartesian product: EVERY row from table_1 is joined to EVERY row from table_2. This is the same as a CROSS JOIN.
So while your inner query has a WHERE clause and correctly returns the max for each credit card type, your outer query does not, so every possible combination of credit card and transaction is compared to the maximums, not just the valid ones.
For example, if cc has three rows (mastercard, visa, amex) and tr has three rows (1, 2, 3), selecting from cc, tr results in nine rows:
mastercard,1
mastercard,2
mastercard,3
visa,1
visa,2
visa,3
amex,1
amex,2
amex,3
whereas what you want is only the matching pairs, for example:
mastercard,1
visa,3
amex,2
Each row in the first table is repeated for each row in the second. The WHERE (...) IN (...) then restricts this set of rows to only those that match a row in the inner query. As you can imagine, this easily leads to duplicate results. Some of those duplicates are being removed by the outer GROUP BY, which should not be necessary once this issue is fixed.
As a general rule, I never write FROM table_1, table_2; I prefer to always be explicit about doing an inner or outer join (or, in some situations, a cross join), to help avoid this kind of issue and make the intent clearer to the reader.
SELECT tr.identifier, cc.type, tr.amount AS max_amount
FROM credit_cards cc
INNER JOIN transactions tr ON (cc.number = tr.number)
WHERE (tr.amount, cc.type) IN (
    SELECT MAX(tr.amount), cc.type
    FROM credit_cards cc
    INNER JOIN transactions tr ON (cc.number = tr.number)
    GROUP BY cc.type
);
NOTE: In the case of a tie, this will give you every transaction for each credit card type that is tied for the maximum amount.
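The difference is easy to reproduce. Below is a rough sketch in SQLite via Python's sqlite3 module (SQLite 3.15+ supports the (a, b) IN (SELECT ...) row-value form). The sample rows are hypothetical, chosen so that the diners and visa maxima are both 1000.62 and two mastercard transactions tie at 1000.54. The first query keeps the question's unjoined FROM, and the second uses the explicit join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE credit_cards (number INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE transactions (identifier INTEGER PRIMARY KEY, number INTEGER, amount REAL);
    INSERT INTO credit_cards VALUES
        (1, 'diners-club-carte-blanche'), (2, 'visa'), (3, 'mastercard'), (4, 'mastercard');
    INSERT INTO transactions VALUES
        (2196, 1, 1000.62), (3000, 2, 1000.62), (11141, 3, 1000.54),
        (2378, 4, 1000.54), (5, 2, 10.0);
""")

# Max amount per card type (this part was already correct in the question).
MAX_PER_TYPE = """
    SELECT MAX(tr2.amount), cc2.type
    FROM credit_cards cc2
    INNER JOIN transactions tr2 ON cc2.number = tr2.number
    GROUP BY cc2.type
"""

# Question's shape: the outer FROM is a Cartesian product (no join condition),
# so transaction 2196 also pairs with the 'visa' card row.
unjoined = conn.execute(f"""
    SELECT tr.identifier, cc.type, tr.amount
    FROM credit_cards cc, transactions tr
    WHERE (tr.amount, cc.type) IN ({MAX_PER_TYPE})
""").fetchall()

# Fixed shape: each transaction is joined only to its own card.
joined = conn.execute(f"""
    SELECT tr.identifier, cc.type, tr.amount AS max_amount
    FROM credit_cards cc
    INNER JOIN transactions tr ON cc.number = tr.number
    WHERE (tr.amount, cc.type) IN ({MAX_PER_TYPE})
    ORDER BY tr.identifier
""").fetchall()

print(len(unjoined))  # 8 rows, with duplicated identifiers
for row in joined:
    print(row)        # one row per transaction, ties on mastercard kept
```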

Condition and max reference in redshift window function

I have a list of dates, accounts, and data sources. I'm taking the latest (max) date for each account and using it in my window function.
In my window function, I'm using ROW_NUMBER() to number the rows for each account and data source we receive, ordered by the max date for each account and data source. The end result should list one row for each unique account + data source combination, with the max date available for that combination; the record with the highest date gets 1.
I'm trying to add a condition on my window function so that only rows numbered 1 are listed in the query, and the others are not shown at all. This is what I have so far, and where I get stuck:
SELECT
date,
account,
data source,
MAX(date) max_date,
ROW_NUMBER () OVER (PARTITION BY account ORDER BY max_date) ROWNUM
FROM table
GROUP BY
date,
account,
data source
Any help is greatly appreciated. I can elaborate on anything if necessary.

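If I understood your question correctly, this SQL should do the trick. A window function's result can't be filtered in the WHERE clause of the same SELECT, so number the rows in a subquery and filter on the row number outside it. ORDER BY date DESC puts the latest date first; partition by account, data_source instead if you want one row per combination:
SELECT
    date,
    account,
    data_source
FROM (
    SELECT
        date,
        account,
        data_source,
        ROW_NUMBER() OVER (PARTITION BY account ORDER BY date DESC) AS rownum
    FROM table
) t
WHERE rownum = 1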
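A runnable sketch of the subquery-plus-filter pattern, using SQLite via Python's sqlite3 module rather than Redshift (assumes SQLite 3.25+; the events table is a hypothetical stand-in for the question's table, and the rows are invented). It runs the pattern once per account and once per account + data source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO date strings sort chronologically, so ORDER BY date DESC puts the latest first.
conn.executescript("""
    CREATE TABLE events (date TEXT, account TEXT, data_source TEXT);
    INSERT INTO events VALUES
        ('2024-01-01', 'A', 'src1'), ('2024-03-01', 'A', 'src1'),
        ('2024-02-01', 'A', 'src2'), ('2024-01-15', 'B', 'src1');
""")

def latest(partition_by):
    # The window result can't be used in the same SELECT's WHERE clause,
    # so number the rows in a subquery and keep rownum = 1 outside it.
    return conn.execute(f"""
        SELECT account, data_source, date
        FROM (
            SELECT date, account, data_source,
                   ROW_NUMBER() OVER (PARTITION BY {partition_by}
                                      ORDER BY date DESC) AS rownum
            FROM events
        ) t
        WHERE rownum = 1
        ORDER BY account, data_source
    """).fetchall()

per_account = latest("account")
per_account_source = latest("account, data_source")
print(per_account)         # [('A', 'src1', '2024-03-01'), ('B', 'src1', '2024-01-15')]
print(per_account_source)  # also includes ('A', 'src2', '2024-02-01')
```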

If you do not need the row number for anything other than uniqueness, then a query like this should work:
select distinct t.account, data_source, date
from table t
join (select account, max(date) max_date from table group by account) m
on t.account=m.account and t.date=m.max_date
This can still generate two records for one account if records for two different data sources have the same (maximum) date. If that is a possibility, then mdem7's approach is probably best.
It's a bit unclear from the question, but if you want each combination of account and data_source with its max date and no duplicates, then GROUP BY alone is enough:
select account, data_source, max(date) max_date
from table t
group by account, data_source
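Both points can be checked with a small sketch in SQLite via Python's sqlite3 module. The events table is a hypothetical stand-in for table and the rows are invented so that account A's two sources share the latest date. The join-on-max query returns both of A's sources, while GROUP BY account, data_source returns one row per combination:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (date TEXT, account TEXT, data_source TEXT);
    INSERT INTO events VALUES
        ('2024-01-01', 'A', 'src1'), ('2024-03-01', 'A', 'src1'),
        ('2024-03-01', 'A', 'src2'), ('2024-01-15', 'B', 'src1'),
        ('2024-02-01', 'B', 'src2');
""")

# Join back to each account's max date: a tie on that date yields one row per source.
joined = conn.execute("""
    SELECT DISTINCT t.account, t.data_source, t.date
    FROM events t
    JOIN (SELECT account, MAX(date) AS max_date FROM events GROUP BY account) m
      ON t.account = m.account AND t.date = m.max_date
    ORDER BY t.account, t.data_source
""").fetchall()

# Latest date per (account, data_source): GROUP BY already makes rows unique.
grouped = conn.execute("""
    SELECT account, data_source, MAX(date) AS max_date
    FROM events
    GROUP BY account, data_source
    ORDER BY account, data_source
""").fetchall()

print(joined)   # account A appears twice because of the tie on 2024-03-01
print(grouped)  # four rows, one per account/source combination
```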

Show field in MS Access query without including it in the group by clause

I'm working on a query that will eventually be used as the record source for a report.
I have a customers table and an orders table. I want to show customer_id, order_id, and order_date in a query, but only for the earliest order date of each customer. Basically, I need to show the order_id field without including it in the GROUP BY clause; if I include it, I get many more records than I want. Based on my research, the code below would work in MySQL, but not in MS Access.
Select customer.customer_id, order.order_id, min(order.order_dt)
From customer inner join order on customer.customer_id = order.customer_id
Group by customer.customer_id
I've tried grouping by order_id in a subquery, ordering by customer and then date, and using the First function in the outer query. Unfortunately, First doesn't work as advertised.
Any help is greatly appreciated!

Does this work for you? It should bring up the earliest orders by order date for each customer. Keep in mind that if a customer has more than one order on their earliest order date, all of those orders will be shown.
SELECT c.customer_id, o.order_id, o.order_dt
FROM customers AS c
INNER JOIN (orders AS o
    INNER JOIN (SELECT customer_ID, MIN([order_dt]) AS MinOrder_dt
                FROM Orders
                GROUP BY customer_id) AS d
    ON (o.Customer_ID = d.customer_id) AND (o.[order_dt] = d.MinOrder_dt))
ON c.customer_id = o.customer_id;
I am deriving a table with just the customer_id and the min order_dt and joining customers and orders to that to only bring up the oldest orders.
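The same derived-table idea can be tried outside Access. This is a sketch in SQLite via Python's sqlite3 module, with hypothetical sample rows in which customer 1 placed two orders on its earliest date. SQLite, unlike Access, does not need the nested parentheses around multiple joins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_dt TEXT);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES
        (100, 1, '2024-01-05'), (101, 1, '2024-01-02'), (102, 1, '2024-01-02'),
        (200, 2, '2024-02-01');
""")

# Derive (customer_id, earliest order date), then join orders back on both columns.
rows = conn.execute("""
    SELECT c.customer_id, o.order_id, o.order_dt
    FROM customers AS c
    INNER JOIN orders AS o ON c.customer_id = o.customer_id
    INNER JOIN (SELECT customer_id, MIN(order_dt) AS MinOrder_dt
                FROM orders
                GROUP BY customer_id) AS d
        ON o.customer_id = d.customer_id AND o.order_dt = d.MinOrder_dt
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(rows)  # customer 1 has two rows because of the tie on 2024-01-02
```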

PostgreSQL UPDATE - query with left join problem

UPDATE user
SET balance = balance + p.amount
FROM payments p WHERE user.id = p.user_id AND p.id IN (36,38,40)
But it adds only the amount of the first payment (1936) to the balance.
How can I fix this? I don't want to loop in application code and run lots of queries.

In a multiple-table UPDATE, each row in the target table is updated only once, even if it's returned more than once by the join.
From the docs:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the fromlist, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
Use this instead:
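UPDATE "user" u  -- user is a reserved word in Postgres, so it must be quoted
SET balance = balance + p.amount
FROM (
    -- one row per user, so each target row joins to exactly one source row
    SELECT user_id, SUM(amount) AS amount
    FROM payments
    WHERE id IN (36, 38, 40)
    GROUP BY user_id
) p
WHERE u.id = p.user_id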
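A sketch of the aggregate-then-update approach in SQLite via Python's sqlite3 module. It assumes SQLite 3.33 or newer, which added UPDATE ... FROM, and the sample balances and payments are hypothetical. Payments 36 and 38 belong to user 1, so both should be added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    INSERT INTO "user" VALUES (1, 0), (2, 100);
    INSERT INTO payments VALUES (36, 1, 1936), (38, 1, 500), (40, 2, 25), (41, 1, 999);
""")

# Sum the selected payments per user first, so every target row matches
# exactly one row of p; payment 41 is not in the list and is ignored.
conn.execute("""
    UPDATE "user"
    SET balance = balance + p.amount
    FROM (
        SELECT user_id, SUM(amount) AS amount
        FROM payments
        WHERE id IN (36, 38, 40)
        GROUP BY user_id
    ) AS p
    WHERE "user".id = p.user_id
""")

balances = conn.execute('SELECT id, balance FROM "user" ORDER BY id').fetchall()
print(balances)  # [(1, 2436), (2, 125)]
```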
