Distinct Join to find data that does NOT match - Teradata


I'm really struggling with this. I have written the following code, which seems to work: it identifies the row IDs of the 40,000 addresses that match where FrontDoorColour is RED.
SELECT DISTINCT ID
FROM Database.table1
WHERE table1.address = table2.address
AND table1.FrontDoorColour = 'RED'
The problem I have is when I want to reverse this and identify the 10,000 addresses where FrontDoorColour is RED but where the address does NOT match.
I run the same query but swap
WHERE table1.address = table2.address
for
WHERE table1.address <> table2.address
Instead of generating the 10,000 non-matching rows, I get a spool space error (2646).
Any suggestions would be greatly appreciated!
Thanks

Running EXPLAIN on the second query should show a PRODUCT JOIN, which is the likely cause of the spool error: with <>, each row in Table1 pairs with every row in Table2 that has a different address, which is nearly all of them. The first query may also use a product join, but one that fits within your spool allocation. The following SQL should find the ids in Table1 where the address is not found in Table2 and the door in Table1 is RED.
SELECT DISTINCT t1.id
FROM Database.Table1 t1
WHERE NOT EXISTS (SELECT 1
FROM Database.Table2 t2
WHERE t1.address = t2.address)
AND t1.FrontDoorColour = 'RED';
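
To see why the inequality join and NOT EXISTS behave so differently, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. The three sample rows and the schema-less table names are made up for illustration; only the shape of the two queries comes from the thread.

```python
import sqlite3

# Toy data (hypothetical): two RED addresses, only one of which is in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, address TEXT, FrontDoorColour TEXT);
    CREATE TABLE table2 (address TEXT);
    INSERT INTO table1 VALUES
        (1, '1 High St', 'RED'),
        (2, '2 High St', 'RED'),
        (3, '3 High St', 'BLUE');
    INSERT INTO table2 VALUES ('1 High St'), ('9 Low Rd');
""")

# Inequality join: every RED row pairs with every table2 row whose address differs.
pairs = conn.execute("""
    SELECT t1.id, t2.address
    FROM table1 t1
    JOIN table2 t2 ON t1.address <> t2.address
    WHERE t1.FrontDoorColour = 'RED'
    ORDER BY t1.id, t2.address
""").fetchall()

# Anti-join: keep a RED row only if no table2 row has the same address.
unmatched = conn.execute("""
    SELECT DISTINCT t1.id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t1.address = t2.address)
      AND t1.FrontDoorColour = 'RED'
    ORDER BY t1.id
""").fetchall()

print(pairs)      # id 1 shows up even though its address IS in table2
print(unmatched)  # only id 2 is genuinely unmatched
```

Note that the inequality join is not just expensive, it also answers a different question: id 1 is returned because it differs from some other address in table2, not because it is missing from table2.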

Related

Postgresql finding max transaction_id for each type giving duplicates (when it's not supposed to for PK)

Question as in the title. I have the code shown below to find the ID with the highest amount transacted for each type of card:
SELECT tr.identifier, cc.type, tr.amount as max_amount
FROM credit_cards cc, transactions tr
WHERE (tr.amount, cc.type) IN (SELECT MAX(tr.amount), cc.type
FROM credit_cards cc, transactions tr
WHERE cc.number = tr.number
GROUP BY cc.type)
GROUP BY tr.identifier, cc.type;
When I run the code, I get duplicate transaction identifiers, which shouldn't happen since identifier is the PK of the transactions table. The output is shown below:
ID     Card type                    Max amount
2196   "diners-club-carte-blanche"  1000.62
2196   "visa"                       1000.62
11141  "mastercard"                 1000.54
2378   "mastercard"                 1000.54
For example, 2196 above exists for diners-club-carte-blanche, not visa.
The 'mastercard' rows are correct, since two different IDs can have the same max transaction.
The query still has to allow for that, because two different IDs can legitimately share the max amount for a type.
Does anyone know how to prevent the duplicates from occurring?
Is this due to the WHERE ... IN clause matching either the max amount or the card type? (The duplicated rows are visa and diners-club-carte-blanche, which both have the same max value of 1000.62, so I think that's where the match goes wrong.)

TL;DR: add WHERE cc.number = tr.number to the outer query.
Long version
When you query FROM table_1, table_2 in the outer query and don't connect the tables (via a join or where clause) the result is a cartesian product, meaning EVERY row from table_1 is joined to EVERY row from table_2. This is the same as a CROSS JOIN.
So while your inner query has a where clause and (correctly) returns the max for each credit card type... your outer query does not, and so all possible combinations of credit card and transaction are being compared to the maximums, not just the valid ones.
For example, if cc has three rows (mastercard, visa, amex) and tr has three rows (1, 2, 3), selecting "from cc, tr" results in nine rows:
mastercard,1
mastercard,2
mastercard,3
visa,1
visa,2
visa,3
amex,1
amex,2
amex,3
where what you want is:
mastercard,1
visa,3
amex,2
Each row in the first table will be repeated for each row in the second. Then the WHERE (...) IN (...) restricts this set of rows to only those that match a row in the inner query. As you can imagine, this can easily lead to duplicate results. Some of those duplicates are being removed by the outer GROUP BY, which should not be necessary once this issue is fixed.
As a general rule, I never use FROM table_1, table_2 and prefer to ALWAYS be explicit about doing an inner or outer join (or, in some situations, a cross join) to help avoid this kind of issue and make it clearer to the reader.
SELECT tr.identifier, cc.type, tr.amount as max_amount
FROM credit_cards cc INNER JOIN transactions tr ON (cc.number = tr.number)
WHERE (tr.amount, cc.type) IN (
SELECT MAX(tr.amount), cc.type
FROM credit_cards cc
INNER JOIN transactions tr ON (cc.number = tr.number)
GROUP BY cc.type
)
NOTE: In the case of a tie, this will give you every transaction for each credit card type that is tied for the maximum amount.
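
The effect can be reproduced on a tiny data set. This is a sketch using Python's sqlite3 module instead of PostgreSQL (it assumes a SQLite build with row-value support, 3.15 or later); the card numbers, identifiers and amounts are invented, chosen only so that two card types share the same maximum, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE credit_cards (number INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE transactions (identifier INTEGER PRIMARY KEY, number INTEGER, amount REAL);
    INSERT INTO credit_cards VALUES (111, 'visa'), (222, 'diners-club');
    -- Hypothetical data: both card types happen to have the same maximum amount.
    INSERT INTO transactions VALUES (1, 111, 1000.62), (2, 222, 1000.62), (3, 111, 50.0);
""")

max_per_type = """
    SELECT MAX(tr.amount), cc.type
    FROM credit_cards cc
    INNER JOIN transactions tr ON cc.number = tr.number
    GROUP BY cc.type
"""

# Outer query without a join condition: a cartesian product is compared to the maximums.
cross = conn.execute(f"""
    SELECT tr.identifier, cc.type
    FROM credit_cards cc, transactions tr
    WHERE (tr.amount, cc.type) IN ({max_per_type})
    ORDER BY tr.identifier, cc.type
""").fetchall()

# Outer query with the join condition: only real card/transaction pairs are compared.
fixed = conn.execute(f"""
    SELECT tr.identifier, cc.type
    FROM credit_cards cc
    INNER JOIN transactions tr ON cc.number = tr.number
    WHERE (tr.amount, cc.type) IN ({max_per_type})
    ORDER BY tr.identifier, cc.type
""").fetchall()

print(cross)  # each transaction is reported under both card types
print(fixed)  # each transaction is reported under its own card type only
```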

Is this a JOIN, Lookup or how to select only records matching a col from two tables

I have two postgres tables where one column listing a city name matches. I'm trying to create a view of some records which I'm displaying on a map via WMS on my GeoServer.
I need to select only the records from table1 (100k records) whose city name matches one of the cities listed in table2 (20 records).
Listing everything I've tried would be a waste of your time. I've tried every join tutorial and example, but I'm perplexed as to why I can't get it to work. I would really appreciate some direction.
Here's my latest query, but if this is the wrong approach just ignore it, since I have about 50 similar attempts.
SELECT t1.id,
t1.dba,
t1.prem_city,
t1.geom,
t2.city_label
FROM schema1.table1 AS t1
LEFT JOIN schema2.table2 AS t2
ON t2.city_label = t1.prem_city;
Thanks for any help!

Your query seems correct and needs just a minor change. LEFT JOIN keeps all the records from the left table and only the matching records from the right one. If you want only those that appear in both, an INNER JOIN is required.
SELECT t1.id,
t1.dba,
t1.prem_city,
t1.geom,
t2.city_label
FROM schema1.table1 t1
JOIN schema2.table2 t2
ON t2.city_label = t1.prem_city;
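
A small sketch of the difference, run through Python's sqlite3 module rather than PostgreSQL. The schema prefixes are dropped and the three cities are made-up sample data; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, prem_city TEXT);
    CREATE TABLE table2 (city_label TEXT);
    INSERT INTO table1 VALUES (1, 'Austin'), (2, 'Dallas'), (3, 'Waco');
    INSERT INTO table2 VALUES ('Austin'), ('Waco');
""")

# LEFT JOIN: every table1 row survives; city_label is NULL where nothing matched.
left_rows = conn.execute("""
    SELECT t1.id, t1.prem_city, t2.city_label
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.city_label = t1.prem_city
    ORDER BY t1.id
""").fetchall()

# INNER JOIN: only table1 rows whose city appears in table2 survive.
inner_rows = conn.execute("""
    SELECT t1.id, t1.prem_city, t2.city_label
    FROM table1 t1
    JOIN table2 t2 ON t2.city_label = t1.prem_city
    ORDER BY t1.id
""").fetchall()

print(left_rows)
print(inner_rows)
```

If table2 could list the same city twice, the INNER JOIN would return the matching table1 rows twice as well, so it is worth checking that city_label is unique.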

INNER JOIN, LEFT/RIGHT OUTER JOIN

Apologies in advance for the long question; I'm asking for the sake of learning.
I'm new to SQL and currently researching JOINs. I'm getting two different behaviors when using INNER and OUTER JOIN. What I know is that INNER JOIN gives an intersection-like result, returning only the rows common to both tables, while (LEFT/RIGHT) OUTER JOIN outputs the common rows plus the remaining rows of the LEFT or RIGHT table, depending on the LEFT/RIGHT keyword.
While working with the MS Training Kit I tried to solve this practice: "Practice 2: In this practice, you identify rows that appear in one table but have no matches in another. You are given a task to return the IDs of employees from the HR.Employees table who did not handle orders (in the Sales.Orders table) on February 12, 2008. Write three different solutions using the following: joins, subqueries, and set operators. To verify the validity of your solution, you are supposed to return employeeIDs: 1, 2, 3, 5, 7, and 9."
I succeeded with subqueries and set operators, but my JOIN version returns something unexpected. I've written the following query:
USE TSQL2012;
SELECT
E.empid
FROM
HR.Employees AS H
JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
JOIN HR.Employees AS E
ON E.empid <> H.empid
ORDER BY
E.empid
;
I'm expecting the result: 1, 2, 3, 5, 7, and 9 (6 rows)
But what I'm getting is: 1,1,1,2,2,2,3,3,3,4,4,5,5,5,6,6,7,7,7,8,8,9,9,9 (24 rows)
I tried some videos but could not understand this side of INNER/OUTER JOIN. I'd be grateful if someone could explain why this happens and what I should keep in mind when working with JOINs.

You can also use a LEFT OUTER JOIN to find the rows that do not match.
The LEFT JOIN keyword returns all rows from the left table (table1), along with the matching rows in the right table (table2). The right-side columns are NULL when there is no match.
SELECT
H.empid
FROM
HR.Employees AS H
LEFT OUTER JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
WHERE O.empid IS NULL
The script above returns the IDs of the employees who did not handle orders on the specified date.
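
Here is a runnable sketch of that pattern using Python's sqlite3 module instead of SQL Server. The tables are stripped-down stand-ins for HR.Employees and Sales.Orders, and the order rows are assumptions: employees 4, 6 and 8 handle an order on the date (the complement of the expected answer), and employee 1 gets an order on a different day to show why the date belongs in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (empid INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, empid INTEGER, orderdate TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?)", [(i,) for i in range(1, 10)])
conn.executemany(
    "INSERT INTO Orders (empid, orderdate) VALUES (?, ?)",
    [(4, "20080212"), (6, "20080212"), (8, "20080212"), (1, "20080211")],  # assumed data
)

# Date filter in ON: employees with no order on that date get a NULL right side.
anti = [r[0] for r in conn.execute("""
    SELECT H.empid
    FROM Employees AS H
    LEFT OUTER JOIN Orders AS O
      ON H.empid = O.empid AND O.orderdate = '20080212'
    WHERE O.empid IS NULL
    ORDER BY H.empid
""")]

# Date filter in WHERE: the two WHERE conditions can never both hold, so nothing survives.
wrong = [r[0] for r in conn.execute("""
    SELECT H.empid
    FROM Employees AS H
    LEFT OUTER JOIN Orders AS O ON H.empid = O.empid
    WHERE O.orderdate = '20080212' AND O.empid IS NULL
    ORDER BY H.empid
""")]

print(anti)
print(wrong)
```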

Here you can see all the kinds of join.
Diagram taken from: http://dsin.wordpress.com/2013/03/16/sql-join-cheat-sheet/
Adjust your query to look like this:
USE TSQL2012;
SELECT
H.empid
FROM
HR.Employees AS H
LEFT JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
WHERE O.empid IS NULL
ORDER BY
H.empid
;

USE TSQL2012;
SELECT
distinct E.empid
FROM
HR.Employees AS H
JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
JOIN HR.Employees AS E
ON E.empid <> H.empid
ORDER BY
E.empid
;

Things to always keep in mind when working with SQL JOINs:
INNER JOINs require a match in the join in order for result set rows produced prior to the INNER JOIN to remain in the result set. When no match is found for a row, the row is discarded from the result set.
For a row fed to an INNER JOIN that matches ONLY one row, exactly one copy of that row is delivered to the result set.
For a row fed to an INNER JOIN that matches multiple rows, the row will be delivered multiple times, once for each matching row from the INNER JOIN table.
OUTER JOINs will not discard rows fed to them, whether or not the OUTER JOIN finds a match.
Just like INNER JOINs, if an OUTER JOIN matches more than one row, it will increase the number of rows in the result set, duplicating the row once for each row matched from the OUTER JOIN table.
Ask yourself "if I get NO match on the JOIN, do I want the row discarded or not?" If the answer is NO, use an OUTER JOIN. If the answer is YES, use an INNER JOIN.
If you don't need to reference any of the columns from a JOIN table, don't perform a JOIN at all. Instead, use WHERE EXISTS, WHERE NOT EXISTS, WHERE IN, WHERE NOT IN, or similar, depending on your database engine. Don't rely on the database engine to be smart enough to discard unreferenced columns resulting from JOINs from the result set. Some databases may be smart enough to do that, some not. There's no reason to pull columns into a result set only to not reference them. Doing so risks reduced performance.
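
The rules above can be checked on a toy example. This sketch uses Python's sqlite3 module; the customers and orders tables are hypothetical and exist only to give one row with two matches, one row with a single match, and one row with none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    -- customer 1 has two orders, customer 2 has one, customer 3 has none
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: unmatched rows are discarded, multi-match rows are repeated.
inner = [r[0] for r in conn.execute("""
    SELECT c.id FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.order_id
""")]

# LEFT OUTER JOIN: unmatched rows are kept (with NULLs), multi-match rows still repeat.
outer = conn.execute("""
    SELECT c.id, o.order_id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.order_id
""").fetchall()

# EXISTS: no columns from orders are needed, so no join and no repetition.
has_orders = [r[0] for r in conn.execute("""
    SELECT c.id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")]

print(inner)
print(outer)
print(has_orders)
```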
Your JOIN of:
JOIN HR.Employees AS E
ON E.empid <> H.empid
...matches every row fed to that join to all Employees rows with a DIFFERENT empid. Using NOT EQUAL on an INNER JOIN is a very rare thing to do or need, especially if the JOIN predicate tests only ONE condition. That is why you're getting duplicate rows in the result set.
On DB2, we could perform an EXCEPTION JOIN to accomplish that using a JOIN alone. Normally, on DB2, I would use a WHERE NOT EXISTS for that. On SQL Server you could do a JOIN to a query where the query set is all employees without orders in SALES.ORDERS on the specified date, but I don't know if that violates the rules of your tutorial.
Naveen posted the solution it appears your tutorial is looking for!
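
The 24 rows from the question can be reproduced with a sketch like the following, using Python's sqlite3 module instead of SQL Server. It assumes nine employees and exactly one order each for employees 4, 6 and 8 on the date, which is consistent with the counts in the question but is not taken from the real TSQL2012 data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (empid INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, empid INTEGER, orderdate TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?)", [(i,) for i in range(1, 10)])
conn.executemany(
    "INSERT INTO Orders (empid, orderdate) VALUES (?, ?)",
    [(4, "20080212"), (6, "20080212"), (8, "20080212")],  # assumed data
)

# The first join leaves 3 rows (employees 4, 6, 8). The <> join then pairs each
# of those 3 rows with the 8 employees that have a different empid: 3 * 8 = 24.
ids = [r[0] for r in conn.execute("""
    SELECT E.empid
    FROM Employees AS H
    JOIN Orders AS O
      ON H.empid = O.empid AND O.orderdate = '20080212'
    JOIN Employees AS E
      ON E.empid <> H.empid
    ORDER BY E.empid
""")]

print(len(ids))
print(ids)  # non-handlers appear 3 times each, handlers 2 times each
```

Employees 4, 6 and 8 still appear (twice each, once for each of the other two handlers), so adding DISTINCT to this query would return all nine employees rather than the six expected.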

SQLPLUS Table Trouble

I've been using SQLPLUS lately and one of my tasks was to display a set of values from two tables (stocks, orderitems). I have done this part, but I am stuck on the last part of the question which states: "including the stocks that no order has been placed on them so far".
Here is the statement:
`select Stocks.StockNo, Stocks.Description, OrderItems.QtyOrd
from Stocks INNER JOIN OrderItems
ON Stocks.StockNo = OrderItems.StockNo;`
and I have gotten the correct results for this part, but the second part is eluding me, as the current statement doesn't display the 0 values for QtyOrd.
Any help would be appreciated.

You likely want a LEFT OUTER JOIN; the INNER JOIN excludes Stocks that don't have any OrderItems. You might also consider grouping by stock in order to SUM the overall quantities for each stock:
SELECT Stocks.StockNo, Stocks.Description, SUM(OrderItems.QtyOrd) AS QtyOrd
FROM Stocks
LEFT OUTER JOIN OrderItems
ON Stocks.StockNo = OrderItems.StockNo
GROUP BY Stocks.StockNo, Stocks.Description;
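
One thing to watch: for a stock with no order items, SUM over the unmatched row yields NULL rather than 0. If a literal 0 is wanted, wrapping the sum in COALESCE is one option. The sketch below shows both variants using Python's sqlite3 module rather than Oracle; the two stock rows and quantities are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stocks (StockNo INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE OrderItems (StockNo INTEGER, QtyOrd INTEGER);
    INSERT INTO Stocks VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO OrderItems VALUES (1, 5), (1, 7);  -- stock 2 has never been ordered
""")

# Plain SUM: the never-ordered stock is listed, but its total is NULL.
plain = conn.execute("""
    SELECT Stocks.StockNo, Stocks.Description, SUM(OrderItems.QtyOrd) AS QtyOrd
    FROM Stocks
    LEFT OUTER JOIN OrderItems ON Stocks.StockNo = OrderItems.StockNo
    GROUP BY Stocks.StockNo, Stocks.Description
    ORDER BY Stocks.StockNo
""").fetchall()

# COALESCE turns the NULL total into 0.
with_zero = conn.execute("""
    SELECT Stocks.StockNo, Stocks.Description,
           COALESCE(SUM(OrderItems.QtyOrd), 0) AS QtyOrd
    FROM Stocks
    LEFT OUTER JOIN OrderItems ON Stocks.StockNo = OrderItems.StockNo
    GROUP BY Stocks.StockNo, Stocks.Description
    ORDER BY Stocks.StockNo
""").fetchall()

print(plain)
print(with_zero)
```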

Need help for Filtering the data in sql

I have a table that contains s.no, id, Amount and accCode.
s.no  id  Amount  accCode
1     2   20      2.1
2     1   30      2.1
3     5   20      3.1
4     1   30      2.1
5     3   40      3.1
6     2   20      2.1
I need all the records that share the same Amount, accCode and id. In this case I need to show the rows with s.no 2 and 4, and also 1 and 6, since they have the same values. If possible, it would be better for the matching rows to come out together, in order. Is this possible in SQL? Please give me some hints; I am stuck on this one. Thanks in advance.

One solution could be the following, assuming that your table is named "test":
select t1.*, t2.[s.no] as MatchSNo
from test t1, test t2
where t1.id = t2.id and t1.amount = t2.amount
and t1.acccode = t2.acccode and t1.[s.no] <> t2.[s.no]
order by t1.id, t1.Amount, t1.accCode, t1.[s.no]
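
The self-join can be tried against the sample rows from the question. This sketch uses Python's sqlite3 module, so the s.no column is renamed to sno (an assumption made to avoid bracket quoting) and accCode is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (sno INTEGER PRIMARY KEY, id INTEGER, amount INTEGER, acccode TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(1, 2, 20, "2.1"), (2, 1, 30, "2.1"), (3, 5, 20, "3.1"),
     (4, 1, 30, "2.1"), (5, 3, 40, "3.1"), (6, 2, 20, "2.1")],
)

# Pair each row with every OTHER row that has the same id, amount and acccode.
matches = conn.execute("""
    SELECT t1.sno, t1.id, t1.amount, t1.acccode, t2.sno AS MatchSNo
    FROM test t1
    JOIN test t2
      ON t1.id = t2.id
     AND t1.amount = t2.amount
     AND t1.acccode = t2.acccode
     AND t1.sno <> t2.sno
    ORDER BY t1.id, t1.amount, t1.acccode, t1.sno
""").fetchall()

for row in matches:
    print(row)
```

Rows 3 and 5 have no partner and drop out. If three or more rows shared the same values, each would appear once per partner, so a GROUP BY ... HAVING COUNT(*) > 1 variant may be easier to read in that case.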