JOIN vs. IN vs. EXISTS - tsql

I was reading an article that explained the differences between JOIN, IN and EXISTS, but I got confused by its explanation of why NOT IN and NOT EXISTS return different results. Can someone clarify why the output of the NOT EXISTS query differs from the output of the NOT IN query? I tried deleting the NULL row (t2.id = 8) from table t2 and still got the same result.
Here's the SQL script from the article:
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)
GO
CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))
GO
INSERT INTO t1
SELECT 1, 'title 1', 5 UNION ALL
SELECT 2, 'title 2', 5 UNION ALL
SELECT 3, 'title 3', 5 UNION ALL
SELECT 4, 'title 4', 5 UNION ALL
SELECT null, 'title 5', 5 UNION ALL
SELECT null, 'title 6', 5
INSERT INTO t2
SELECT 1, 1, 'data 1' UNION ALL
SELECT 2, 1, 'data 2' UNION ALL
SELECT 3, 2, 'data 3' UNION ALL
SELECT 4, 3, 'data 4' UNION ALL
SELECT 5, 3, 'data 5' UNION ALL
SELECT 6, 3, 'data 6' UNION ALL
SELECT 7, 4, 'data 7' UNION ALL
SELECT 8, null, 'data 8' UNION ALL
SELECT 9, 6, 'data 9' UNION ALL
SELECT 10, 6, 'data 10' UNION ALL
SELECT 11, 8, 'data 11'
And here are the SQL queries and their explanation:
-- NOT IN doesn't get correct results.
-- That's because of how IN treats NULLs and the Three-valued logic
-- NULL is treated as an unknown, so if there's a null in the t2.t1id
-- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE.
-- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set.
SELECT t1.*
FROM t1
WHERE t1.id NOT IN (SELECT t1id FROM t2)
-- NOT EXISTS gets correct results
SELECT t1.*
FROM t1
WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id)
GO
DROP TABLE t2
DROP TABLE t1
Here's the link to the article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
Thank you!

In many cases you can use them interchangeably, but you can't ignore the details behind them.
NOT IN and NOT EXISTS often return the same results, but they differ when NULL values are involved: of the two, only NOT EXISTS returns the rows whose compared value is NULL.
You can see it better in this example:
update cars set c_owner = NULL where c_id = 'BMW03444'
Now let's check whether we have any car in stock that has not been sold yet:
select count(*) from cars where c_owner not in (select customer_id from customers);
Output:
COUNT(*): 0
Where's the failure? The unsold car has c_owner = NULL. Comparing NULL with anything yields UNKNOWN, so c_owner NOT IN (...) is UNKNOWN rather than TRUE for that car, and the WHERE clause discards it. The query can never count a car that has no owner at all. The correct form is:
select count(*)
from cars c1
where not exists (
select 1
from customers c2
where c1.c_owner=c2.customer_id
);
COUNT(*): 1
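This is because NOT IN compares the value against every value in the list. Any comparison involving NULL yields UNKNOWN (not TRUE or FALSE), and WHERE keeps only rows for which the condition is TRUE, so the car with a NULL owner is dropped. Likewise, a single NULL in the subquery's result means NOT IN can never be TRUE, which is the article's empty-set case.
NOT EXISTS only asks whether the correlated subquery returns any row. c1.c_owner = c2.customer_id is never TRUE when c_owner is NULL, so the subquery returns nothing, NOT EXISTS is TRUE, and the row is included.
The same thing happens in the article's script: t1 itself has two rows with a NULL id ('title 5' and 'title 6'). NOT IN drops them and NOT EXISTS keeps them, which is why deleting the NULL row from t2 does not make the two queries agree.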
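To see why deleting t2 row 8 does not change anything, here is a minimal sketch that replays the article's script in an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is an assumption standing in for SQL Server here; it applies the same three-valued logic to NOT IN and NOT EXISTS. The helper name not_in_vs_not_exists is hypothetical.

```python
import sqlite3


# Hypothetical helper: rebuilds the article's t1/t2 tables in an in-memory
# SQLite database (standing in for SQL Server) and runs both queries.
def not_in_vs_not_exists(delete_null_row=False):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT)")
    con.execute("CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20))")
    con.executemany(
        "INSERT INTO t1 VALUES (?, ?, 5)",
        [(1, "title 1"), (2, "title 2"), (3, "title 3"),
         (4, "title 4"), (None, "title 5"), (None, "title 6")],
    )
    # t1Id values for t2 rows 1..11, in the article's order (row 8 is NULL).
    t1ids = [1, 1, 2, 3, 3, 3, 4, None, 6, 6, 8]
    con.executemany(
        "INSERT INTO t2 VALUES (?, ?, ?)",
        [(i, t1id, f"data {i}") for i, t1id in enumerate(t1ids, start=1)],
    )
    if delete_null_row:
        con.execute("DELETE FROM t2 WHERE id = 8")

    not_in = [row[0] for row in con.execute(
        "SELECT title FROM t1 "
        "WHERE t1.id NOT IN (SELECT t1id FROM t2) ORDER BY title")]
    not_exists = [row[0] for row in con.execute(
        "SELECT title FROM t1 "
        "WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id) "
        "ORDER BY title")]
    con.close()
    return not_in, not_exists


print(not_in_vs_not_exists())                      # ([], ['title 5', 'title 6'])
print(not_in_vs_not_exists(delete_null_row=True))  # ([], ['title 5', 'title 6'])
```

Both calls return the same pair: the two rows NOT EXISTS keeps are exactly the t1 rows whose id is NULL, which no change to t2 can bring into the NOT IN result.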

Related

Learning Pivoting in TSQL

I feel that this should be simple, but all the pivots I find seem to be more complicated than what I am looking for, so any help or re-direction would be much appreciated.
I have ‘ID_code’ and ‘product_name’ columns, and I am looking for mismatched product names. I want them put next to each other in a row rather than in a column, which is how this query returns them:
Select distinct ID_Code, product_name
From table
Where ID_Code in
(Select ID_Code from table
Group by ID_Code
Having count(distinct product_name) <> 1)
I would like a table set out as
ID_Code Product_name1 Product_name2 Product_name3
Thanks very much, and have a Happy New Year!
This should remove the duplicates, but it still returns one result for an ID_Code whose product names all match.
;with testdata as(
SELECT '1' as ID_Code, 'bike' as product_name
UNION ALL SELECT '1', 'biker'
UNION ALL SELECT '1', 'bike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motorbike'
UNION ALL SELECT '2', 'motrbike'
UNION ALL SELECT '2', 'motorbiker'
)
--added this section to return distinct products
,cte as(
SELECT * FROM testdata d1
INTERSECT
SELECT * FROM testdata d2
)
SELECT --DISTINCT --Use DISTINCT here if you need to return just one line per ID_Code
ID_Code
,product_name = STUFF((SELECT ', ' +
--Added this to track product_names for each ID_Code
t2.product_name + '_' + cast(ROW_NUMBER() OVER (PARTITION BY ID_Code ORDER BY product_name) as varchar(100))
FROM cte t2
WHERE t2.ID_Code = cte.ID_Code
ORDER BY t2.product_name --Keeps the list in the same order as the row numbers
FOR XML PATH('')), 1, 2, '')
FROM cte
Example here: db<>fiddle
More info about INTERSECT, in case it is not what you need in this scenario.
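If you do want separate Product_name1, Product_name2, ... columns as in the question, a common pattern is to number the distinct names with ROW_NUMBER and then spread them out with conditional aggregation (MAX(CASE ...)). Below is a minimal sketch run in SQLite through Python's sqlite3 as a stand-in for SQL Server; it assumes SQLite 3.25 or later (which added window functions), and the table name products is hypothetical.

```python
import sqlite3

# Sample rows copied from the testdata CTE above.
TESTDATA = [("1", "bike"), ("1", "biker"), ("1", "bike"),
            ("2", "motorbike"), ("2", "motorbike"), ("2", "motorbike"),
            ("2", "motrbike"), ("2", "motorbiker")]

PIVOT_SQL = """
WITH distinct_names AS (
    SELECT DISTINCT ID_Code, product_name FROM products
),
numbered AS (
    -- number each distinct name within its ID_Code: 1, 2, 3, ...
    SELECT ID_Code, product_name,
           ROW_NUMBER() OVER (PARTITION BY ID_Code ORDER BY product_name) AS rn
    FROM distinct_names
)
SELECT ID_Code,
       MAX(CASE WHEN rn = 1 THEN product_name END) AS Product_name1,
       MAX(CASE WHEN rn = 2 THEN product_name END) AS Product_name2,
       MAX(CASE WHEN rn = 3 THEN product_name END) AS Product_name3
FROM numbered
GROUP BY ID_Code
HAVING COUNT(*) > 1  -- only IDs with more than one distinct name
ORDER BY ID_Code
"""


def pivot_products(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (ID_Code TEXT, product_name TEXT)")
    con.executemany("INSERT INTO products VALUES (?, ?)", rows)
    result = con.execute(PIVOT_SQL).fetchall()
    con.close()
    return result


print(pivot_products(TESTDATA))
# [('1', 'bike', 'biker', None), ('2', 'motorbike', 'motorbiker', 'motrbike')]
```

The column count is fixed at three here, so a fourth distinct name would be silently dropped; handling an unknown number of names needs dynamic SQL or a single concatenated column.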
Your expected output appears to be somewhat inflexible, because we may not know exactly how many columns/products would be needed. Instead, I recommend rolling up the mismatched products into a CSV string for output.
SELECT
ID_Code,
STUFF((SELECT ',' + t2.product_name
FROM your_table t2
WHERE t1.ID_Code = t2.ID_Code
FOR XML PATH('')), 1, 1, '') products
FROM your_table t1
GROUP BY
ID_Code
HAVING
MIN(product_name) <> MAX(product_name); -- index friendly
Demo
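The same roll-up idea can be sketched in SQLite through Python's sqlite3, with SQLite's group_concat standing in for the STUFF/FOR XML PATH concatenation (an assumption for illustration; in SQL Server 2017 and later, STRING_AGG is the built-in alternative). The HAVING filter keeps only IDs whose smallest and largest names differ, that is, IDs with at least two distinct names. The function name and the extra ID 3 rows are hypothetical.

```python
import sqlite3


def mismatched_products(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (ID_Code TEXT, product_name TEXT)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    query = """
        SELECT ID_Code, group_concat(DISTINCT product_name) AS products
        FROM your_table
        GROUP BY ID_Code
        HAVING MIN(product_name) <> MAX(product_name)
        ORDER BY ID_Code
    """
    # group_concat does not promise an order, so sort each list for stable output.
    result = {code: sorted(products.split(","))
              for code, products in con.execute(query)}
    con.close()
    return result


rows = [("1", "bike"), ("1", "biker"), ("1", "bike"),
        ("2", "motorbike"), ("2", "motrbike"),
        ("3", "scooter"), ("3", "scooter")]  # hypothetical ID 3: no mismatch
print(mismatched_products(rows))
# {'1': ['bike', 'biker'], '2': ['motorbike', 'motrbike']}
```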

Sybase Subgroup Order By

I'm trying to join a subquery, but the table contains no unique row_id and I need the most recent record of a specific type. So I want to ORDER BY the date DESC and take the TOP 1 from that, but it doesn't seem to be allowed in Sybase ASE.
I put together a small sample to show roughly what I'm trying to do.
CREATE TABLE #test_users (USER_ID CHAR(9), USER_CK INT)
INSERT INTO #test_users (USER_ID, USER_CK)
SELECT 'QA0000001', 123000010
UNION ALL SELECT 'QA0000002', 123000020
UNION ALL SELECT 'QA0000003', 123000030
UNION ALL SELECT 'QA0000004', 123000040
UNION ALL SELECT 'QA0000005', 123000050
CREATE TABLE #test_records (STAT_TYPE CHAR(3), USER_CK INT, MOD_DT DATE, PLAN_ID CHAR(1))
INSERT INTO #test_records (STAT_TYPE, USER_CK, MOD_DT, PLAN_ID)
SELECT 'ADD', 123000010, '8/1/2017', 'A'
UNION ALL SELECT 'TRM', 123000010, '6/1/2018', 'A'
UNION ALL SELECT 'ADD', 123000010, '6/1/2018', 'B'
UNION ALL SELECT 'ADD', 123000020, '5/1/2017', 'A'
UNION ALL SELECT 'TRM', 123000020, '9/1/2018', 'A'
UNION ALL SELECT 'ADD', 123000030, '3/1/2018', 'A'
UNION ALL SELECT 'ADD', 123000040, '4/1/2018', 'A'
UNION ALL SELECT 'ADD', 123000050, '1/1/2018', 'B'
UNION ALL SELECT 'TRM', 123000050, '7/1/2018', 'B'
UNION ALL SELECT 'ADD', 123000050, '7/1/2018', 'A'
--Want to know everyone that has ever been in Plan A, and what their last status type was.
--Should return
-- QA0000001, A, TRM, 6/1/2018
-- QA0000002, A, TRM, 9/1/2018
-- QA0000003, A, ADD, 3/1/2018
-- QA0000004, A, ADD, 4/1/2018
-- QA0000005, A, ADD, 7/1/2018
SELECT u.USER_ID, r.PLAN_ID, r.STAT_TYPE, r.MOD_DT FROM #test_users u
LEFT JOIN #test_records r
ON PLAN_ID = 'A'
AND u.USER_CK = r.USER_CK
Thanks for any help you could provide.
Mike
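One portable way around TOP/ORDER BY inside the subquery is to keep only the row whose MOD_DT equals the correlated MAX(MOD_DT) for that user and plan. Below is a minimal sketch of that approach, run in SQLite through Python's sqlite3 rather than on Sybase ASE (untested there). Assumptions: the # temp-table prefix is dropped because SQLite does not use it, and the M/D/YYYY literals are rewritten as ISO strings, reading them as month-first, which matches the expected output above.

```python
import sqlite3

USERS = [("QA0000001", 123000010), ("QA0000002", 123000020),
         ("QA0000003", 123000030), ("QA0000004", 123000040),
         ("QA0000005", 123000050)]
# Dates rewritten as ISO 'YYYY-MM-DD' strings so they compare correctly as text.
RECORDS = [
    ("ADD", 123000010, "2017-08-01", "A"), ("TRM", 123000010, "2018-06-01", "A"),
    ("ADD", 123000010, "2018-06-01", "B"), ("ADD", 123000020, "2017-05-01", "A"),
    ("TRM", 123000020, "2018-09-01", "A"), ("ADD", 123000030, "2018-03-01", "A"),
    ("ADD", 123000040, "2018-04-01", "A"), ("ADD", 123000050, "2018-01-01", "B"),
    ("TRM", 123000050, "2018-07-01", "B"), ("ADD", 123000050, "2018-07-01", "A"),
]

LATEST_PLAN_A = """
SELECT u.USER_ID, r.PLAN_ID, r.STAT_TYPE, r.MOD_DT
FROM test_users u
JOIN test_records r
  ON r.USER_CK = u.USER_CK
 AND r.PLAN_ID = 'A'
WHERE r.MOD_DT = (SELECT MAX(r2.MOD_DT)      -- latest plan A date for this user
                  FROM test_records r2
                  WHERE r2.USER_CK = r.USER_CK
                    AND r2.PLAN_ID = 'A')
ORDER BY u.USER_ID
"""


def latest_plan_a():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_users (USER_ID CHAR(9), USER_CK INT)")
    con.execute("CREATE TABLE test_records "
                "(STAT_TYPE CHAR(3), USER_CK INT, MOD_DT TEXT, PLAN_ID CHAR(1))")
    con.executemany("INSERT INTO test_users VALUES (?, ?)", USERS)
    con.executemany("INSERT INTO test_records VALUES (?, ?, ?, ?)", RECORDS)
    result = con.execute(LATEST_PLAN_A).fetchall()
    con.close()
    return result


for row in latest_plan_a():
    print(row)
```

An inner JOIN is used because the goal is everyone who has ever been in plan A. If one user had two plan A records on the same latest date, this would return both rows.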

Reason alias works in ORDER BY but not in WHERE [duplicate]

This question already has answers here: Cannot use Alias name in WHERE clause but can in ORDER BY
I understand that I can't reference an alias in the WHERE clause, but why is that? Is it interpreted differently?
Something like this generates an error:
declare @myTable table
(
num numeric(5,2),
den numeric(5,2)
)
insert into @myTable
select 1, 2
union
select 1, 3
union
select 2, 3
union
select 2, 4
union
select 2, 5
union
select null, 1
-- Fails with: Invalid column name 'calc'.
select num/den as 'calc' from @myTable
where calc is not null
order by calc
But this returns rows:
declare @myTable table
(
num numeric(5,2),
den numeric(5,2)
)
insert into @myTable
select 1, 2
union
select 1, 3
union
select 2, 3
union
select 2, 4
union
select 2, 5
union
select null, 1
select num/den as 'calc' from @myTable
--where calc is not null
order by calc
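As mentioned in Cannot use Alias name in WHERE clause but can in ORDER BY, it's due to the logical query processing order. WHERE is evaluated before SELECT, so the alias calc does not exist yet when the filter runs; ORDER BY is evaluated after SELECT, so it can see the alias. The order is: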
FROM
ON
OUTER
WHERE
GROUP BY
CUBE | ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
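A common workaround (not taken from the linked answers) is to compute the alias in a derived table or CTE, so that by the time the outer WHERE runs, calc is an ordinary column. Below is a minimal sketch run in SQLite through Python's sqlite3. SQLite is more lenient than SQL Server about aliases in WHERE, so this only demonstrates the portable workaround, not the error; REAL columns are an assumption standing in for numeric(5,2), because SQLite would otherwise do integer division on 1/2.

```python
import sqlite3


def calc_rows():
    con = sqlite3.connect(":memory:")
    # REAL columns stand in for numeric(5,2).
    con.execute("CREATE TABLE myTable (num REAL, den REAL)")
    con.executemany("INSERT INTO myTable VALUES (?, ?)",
                    [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (None, 1)])
    # The derived table d is evaluated first, so the outer WHERE and ORDER BY
    # both see calc as a plain column.
    rows = con.execute("""
        SELECT calc
        FROM (SELECT num / den AS calc FROM myTable) AS d
        WHERE calc IS NOT NULL
        ORDER BY calc
    """).fetchall()
    con.close()
    return [r[0] for r in rows]


print([round(x, 3) for x in calc_rows()])  # [0.333, 0.4, 0.5, 0.5, 0.667]
```

Repeating the expression instead (WHERE num/den IS NOT NULL) also works; the derived table just avoids writing it twice.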

tsql - Self reference

I have a CTE that returns the records below. How should I write the new query so that every record with GID = NULL gets the last non-NULL GID before it?
ID GID VALUE
1 1 Some Value
2 NULL Some Value
3 2 Some Value
4 3 Some Value
5 NULL Some Value
6 NULL Some Value
E.g., records with ID 5 and 6 should get GID = 3.
with C(ID, GID, VALUE) as
(
select 1, 1, 'Some Value' union all
select 2, NULL, 'Some Value' union all
select 3, 2, 'Some Value' union all
select 4, 3, 'Some Value' union all
select 5, NULL, 'Some Value' union all
select 6, NULL, 'Some Value'
)
select C1.ID,
C3.GID,
C1.VALUE
from C as C1
cross apply
(select top 1 C2.ID, C2.GID
from C as C2
where C2.ID <= C1.ID and
C2.GID is not null
order by C2.ID desc) as C3
;WITH C(ID, GID, VALUE) as
(
select 1, 1, 'Some Value' union all
select 2, NULL, 'Some Value' union all
select 3, 2, 'Some Value' union all
select 4, 3, 'Some Value' union all
select 5, NULL, 'Some Value' union all
select 6, NULL, 'Some Value'
)
select c.id, d.GID, c.value from c
cross apply
-- MAX(GID) equals the last non-NULL GID only if GID never decreases as ID grows
(select max(d.GID) GID from c d where d.id <= c.id) d
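The first answer's CROSS APPLY (SELECT TOP 1 ... ORDER BY C2.ID DESC) can be checked with an equivalent correlated scalar subquery. SQLite has no CROSS APPLY or TOP, so this sketch, run through Python's sqlite3, uses ORDER BY ... LIMIT 1 instead; that swap and the function name fill_gid are assumptions for illustration.

```python
import sqlite3

ROWS = [(1, 1), (2, None), (3, 2), (4, 3), (5, None), (6, None)]


def fill_gid(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE C (ID INTEGER, GID INTEGER, VALUE TEXT)")
    con.executemany("INSERT INTO C VALUES (?, ?, 'Some Value')", rows)
    # For each row, take the GID of the nearest row at or before it whose GID
    # is not NULL (same idea as CROSS APPLY ... TOP 1 ... ORDER BY C2.ID DESC).
    result = con.execute("""
        SELECT C1.ID,
               (SELECT C2.GID
                FROM C AS C2
                WHERE C2.ID <= C1.ID AND C2.GID IS NOT NULL
                ORDER BY C2.ID DESC
                LIMIT 1) AS GID
        FROM C AS C1
        ORDER BY C1.ID
    """).fetchall()
    con.close()
    return result


print(fill_gid(ROWS))  # [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 3)]
```

Unlike the MAX(GID) version, this does not rely on GID increasing: for rows (1, 5), (2, NULL), (3, 2), (4, NULL) it fills ID 4 with 2, whereas MAX would give 5.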

Select count from second table based on initial select

Table 1:
AccountId, ReferenceId, Name, (lots of other columns)
Table 2:
AccountId, ReferenceId, (other columns)
How can I do a select to get the following:
AccountId, ReferenceId, [Count(*) in Table2 where accountId and reference ID match.]
1, AB, 1
1, AC, 0
2, AD, 4
2, EF, 0
etc
Guessing a join, but that gives me values, not a count?
Tried adding a count, but get errors?
SELECT T1.AccountId,
T1.ReferenceId,
COUNT(T2.ReferenceId) AS Cnt
FROM Table1 T1
LEFT JOIN Table2 T2
ON T1.AccountId = T2.AccountId
AND T1.ReferenceId = T2.ReferenceId
GROUP BY T1.AccountId,
T1.ReferenceId
Something like:
SELECT t1.AccountId, t1.ReferenceId, COUNT(t2.AccountId)
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.AccountId = t2.AccountId AND
t1.ReferenceId = t2.ReferenceId
GROUP BY t1.AccountId, t1.ReferenceId
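should work. The trick is to group by both key values so you can aggregate over the other columns. Here you simply count the matching rows from Table2 (you could also sum or average their values). Counting t2.AccountId rather than * matters: for a Table1 row with no match, the LEFT JOIN fills the t2 columns with NULL, COUNT(column) skips NULLs, and the count comes out as 0 instead of 1.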
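The difference between COUNT(t2.AccountId) and COUNT(*) can be checked quickly in SQLite through Python's sqlite3, used here as a stand-in for SQL Server and loaded with the sample rows from the next answer's script. The function name reference_counts is hypothetical.

```python
import sqlite3


def reference_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (AccountId INT, ReferenceId INT, Name TEXT)")
    con.execute("CREATE TABLE Table2 (AccountId INT, ReferenceId INT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)",
                    [(1, 10, "White"), (2, 20, "Green"),
                     (3, 30, "Black"), (3, 40, "Red")])
    con.executemany("INSERT INTO Table2 VALUES (?, ?)",
                    [(1, 10), (1, 10), (2, 20), (3, 30)])
    rows = con.execute("""
        SELECT t1.AccountId, t1.ReferenceId,
               COUNT(t2.AccountId) AS matched,   -- NULLs from unmatched rows are skipped
               COUNT(*)            AS all_rows   -- also counts the NULL-extended row
        FROM Table1 t1
        LEFT JOIN Table2 t2
               ON t1.AccountId = t2.AccountId
              AND t1.ReferenceId = t2.ReferenceId
        GROUP BY t1.AccountId, t1.ReferenceId
        ORDER BY t1.AccountId, t1.ReferenceId
    """).fetchall()
    con.close()
    return rows


print(reference_counts())
# [(1, 10, 2, 2), (2, 20, 1, 1), (3, 30, 1, 1), (3, 40, 0, 1)]
```

Only the unmatched row (3, 40) shows the difference: 0 with COUNT(t2.AccountId), 1 with COUNT(*).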
sample data
declare @tbl1 table (AccountId INT, ReferenceId int, Name varchar(20))
declare @tbl2 table (AccountId INT, ReferenceId int)
insert into @tbl1 select 1, 10, 'White'
insert into @tbl1 select 2, 20, 'Green'
insert into @tbl1 select 3, 30, 'Black'
insert into @tbl1 select 3, 40, 'Red'
insert into @tbl2 select 1, 10
insert into @tbl2 select 1, 10
insert into @tbl2 select 2, 20
insert into @tbl2 select 3, 30
Query
select t.AccountId, t.ReferenceId, t.Name
,(select COUNT(*) from @tbl2 t2
where t.AccountId = t2.AccountId
and t.ReferenceId = t2.ReferenceId) as countt
from @tbl1 t
SELECT t1.AccountId, t1.ReferenceId, COUNT(t2.AccountId)
FROM Table1 t1 LEFT JOIN Table2 t2
ON (t1.AccountId=t2.AccountId AND t1.ReferenceId=t2.ReferenceId)
GROUP BY t1.AccountId, t1.ReferenceId