SELECT WHERE statement in a JOIN - tsql

I'm trying to make a query with a SELECT statement in a JOIN but couldn't get it to work.
The tables I have are below :
CREATE TABLE check_result
(id int,
check_result_id int,
id_relation int);
INSERT INTO check_result
values(1, 12, 1), (2,9, 1),(3,13, 3);
CREATE TABLE relation
(id int,
name VARCHAR(20),
id_group int);
INSERT INTO relation
values(1, 'pietje', 1), (2,'klaasje', 1),(3,'Harry', 3);
CREATE TABLE groups
(id int,
name VARCHAR(20),
id_sub int);
INSERT INTO groups
values(1, 'support_worker 1',2),(2, 'support_worker 2',2),(3, 'support_worker 2',3);
The query I have thus far is something like :
SELECT R.name , G.name
FROM check_result CR
LEFT JOIN relation R ON R.id = CR.id_relation
LEFT JOIN groups G ON R.id_group = (SELECT id_sub
FROM groups
WHERE name = 'support_worker 2'
AND id_sub = R.id_group )
In the end I was hoping for 3 records in the results but instead there are 6, with the correct results from groups.
Is there somebody who can show me what I'm doing wrong?

With that dataset and without your expected results it is hard to give you a solid answer.
SELECT R.name , G.name
FROM check_result CR
LEFT JOIN relation R ON R.id = CR.id_relation
LEFT JOIN groups G ON G.id_sub = R.id_group and G.name = 'support_worker 2'
You mentioned wanting all 3 results, but your sub select was causing duplicate records to appear.
Is it not a case as the above of not needing to rely on the sub select and simply adding more conditions onto your left join?
One additional thing worth mentioning - as I have little knowledge on what you database structure is but if Groups has an Id that is being references in R.id_group then you should join that and not Id_sub which would change your code to be:
SELECT R.name , G.name
FROM check_result CR
LEFT JOIN relation R ON R.id = CR.id_relation
LEFT JOIN groups G ON G.id = R.id_group and G.name = 'support_worker 2'
Giving the same result in the limited data.
SQL Fiddle

Related

Finding an exact match over several rows

I have a table GroupsTable that can be described through a MWE as follows.
GroupID MemberID
1 42
2 42
2 43
3 42
3 43
3 44
I am then given another table MemberTable that contain some rows with MemberID. For example:
MemberID
42
43
The query I need is one that finds the matching GroupID that has exactly the MemberID from the second table, no more no less. I believe the following code is working, but it is terribly slow, so is there a better way to find the answer?
select c.GroupID from (
select g.GroupID
from GroupTable g
join MemberTable m on m.MemberID = g.MemberID
group by g.GroupID
having count(*) = (select count(*) from MemberTable)
) c
left join GroupTable x on x.GroupID = c.GroupID
and x.MemberID not in (select MemberID from MemberTable)
where x.GroupID is null
Sample data:
create table MemberTable (
MemberID int
)
insert into MemberTable
values
(42),
(43);
create table GroupTable (
GroupID int,
MemberID int
);
insert into GroupTable
values
(1, 42), -- only one member
(2, 42), -- both members
(2, 43),
(3, 42), -- one member too many
(3, 43),
(3, 44),
(4, 40), -- two irrelevant members
(4, 41);
There is a fiddle available here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=ad818a306600286433634fa83c3628a0
You can use a combination of left join, group by, having, and count distinct, like this:
DECLARE #Count int;
SELECT #Count = COUNT(*) FROM MemberTable;
SELECT GroupID
FROM GroupTable As G
LEFT JOIN MemberTable As M
ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(DISTINCT G.MemberID) = #Count
AND COUNT(DISTINCT M.MemberID) = #Count
The left join ensures you'll get all the records for each group id.
The count of distinct of the groups member ids ensures you will only get the group ids that does not have have member ids that doesn't appear in the members table.
The count distinct of the members table ensures you will not get groups that happen to have the same number of member ids as the members table.
Update
After testing with temp tables on my own environment, I've confirmed my suspicion that the performance killer is the count(distinct). I've changed my query to get rid of it and now it seems to be almost twice as fast as the query in the question:
DECLARE #Count int;
SELECT #Count = COUNT(*) FROM MemberTable;
SELECT GroupID
FROM
(
SELECT DISTINCT GroupID, MemberID
FROM GroupTable
) As G
LEFT JOIN
(
SELECT DISTINCT MemberID
FROM MemberTable
) As M
ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(G.MemberID) = #Count
AND COUNT(M.MemberID) = #Count;
Note that if the MemberTable is known to always have distinct MemberID values you can get rid of the second derived table and simply left join the first derived table to the MemberTable directly.

How to optimize SELECT DISTINCT when using multiple Joins?

I have read that using cte's you can speed up a select distinct up to 100 times. Link to the website . They have this following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example i am trying to run this query found below, however it takes roughly two and a half minutes. How would I optimize this accordingly?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every column worked faster for me.
2) If you have duplicates in some of the tables then you can also pre select that and join from that as an inner query.
3) Generally you can nest join if you expect that this join will limit data.
SQL join format - nested inner joins

Can I Choose Different Table for inner join operation?

This is my T-SQL
select Id,Profile,Type ,
case Profile
when 'Soft' then 'SID'
when 'Hard' then 'HID'
end as [Profile]
from ProductDetail p1
inner join [tableA or tableB] on xxxxxxxx
I want join tableA when Profile = Soft and join tableB when Profile = Hard, how can I do just only using T-SQL in one batch?
Thanks
You can't directly do it, but could achieve the same effect with outer joins
select Id,Profile,Type ,
case Profile
when 'Soft' then 'SID'
when 'Hard' then 'HID'
end as [Profile]
from ProductDetail p1
left outer join tableA ON tableA.x = p1.x AND p1.Profile = 'Soft'
left outer join tableB ON tableB.x = p1.x AND p1.Profile = 'Hard'
where
where
(tableA.x IS NOT NULL and p1.Profile = 'Soft')
or (tableB.x IS NOT NULL and p1.Profile = 'Hard')
Of course, you can choose different tables for inner join operation, but it must be based on some condition or variable.
For Example:
select Id,Profile,Type ,
case Profile
when 'Soft' then 'SID'
when 'Hard' then 'HID'
end as [Profile]
from ProductDetail p1
inner join tableA A
on Profile='Soft'
AND <any other Condition>
UNION
select Id,Profile,Type ,
case Profile
when 'Soft' then 'SID'
when 'Hard' then 'HID'
end as [Profile]
from ProductDetail p1
inner join tableB B
on Profile='Hard'
AND <any other Condition>
You can do this in a single statement with the same or similar case statement in your join. Below is sample code using temp tables that joins to 2 different reference tables merged into a single result set using a UNION
DECLARE #ProductDetail TABLE (Id INT, sProfile VARCHAR(100), StID INT, HdID INT)
DECLARE #TableA TABLE (StId INT, Field1 VARCHAR(100))
DECLARE #TableB TABLE (HdId INT, Field1 VARCHAR(100))
INSERT INTO #ProductDetail (Id, sProfile, StID , HdID ) VALUES (1,'Soft',1,1)
INSERT INTO #ProductDetail (Id, sProfile, StID , HdID ) VALUES (2,'Hard',2,2)
INSERT INTO #TableA (StId,Field1) VALUES (1,'Soft 1')
INSERT INTO #TableA (StId,Field1) VALUES (2,'Soft 2')
INSERT INTO #TableB (HdId,Field1) VALUES (1,'Hard 1')
INSERT INTO #TableB (HdId,Field1) VALUES (2,'Hard 2')
SELECT
p1.Id,p1.sProfile,
CASE
WHEN p1.sProfile = 'Soft' THEN StID
WHEN p1.sProfile = 'Hard' THEN HdId
END AS [Profile]
,ReferenceTable.FieldName
FROM
#ProductDetail p1
INNER JOIN
(
SELECT StID AS id, 'Soft' AS sProfile, Field1 AS FieldName
FROM #TableA AS tableA
UNION ALL
SELECT HdID AS id, 'Hard' AS sProfile, Field1 AS FieldName
FROM #TableB AS tableB
)
AS ReferenceTable
ON
CASE
WHEN p1.sProfile = 'Soft' THEN StID
WHEN p1.sProfile = 'Hard' THEN HdID
END = ReferenceTable.Id
AND p1.sProfile = ReferenceTable.sProfile
This will return the following result set:
Id sProfile Profile FieldName
1 Soft 1 Soft 1
2 Hard 2 Hard 2

Syntax for SQL Not In List?

I am trying to develop a T-SQL query to exclude all rows from another table "B". This other table "B" has 3 columns comprising its PK for a total of 136 rows. So I want to select all columns from table "A" minus those from table "B". How do I do this? I don't think this query is correct because I am still getting a duplicate record error:
CREATE TABLE #B (STUDENTID VARCHAR(50), MEASUREDATE SMALLDATETIME, MEASUREID VARCHAR(50))
INSERT #B
SELECT studentid, measuredate, measureid
from [J5C_Measures_Sys]
GROUP BY studentid, measuredate, measureid
HAVING COUNT(*) > 1
insert into J5C_MasterMeasures (studentid, measuredate, measureid, rit)
select A.studentid, A.measuredate, B.measurename+' ' +B.LabelName, A.score_14
from [J5C_Measures_Sys] A
join [J5C_ListBoxMeasures_Sys] B on A.MeasureID = B.MeasureID
join sysobjects so on so.name = 'J5C_Measures_Sys' AND so.type = 'u'
join syscolumns sc on so.id = sc.id and sc.name = 'score_14'
join [J5C_MeasureNamesV2_Sys] v on v.Score_field_id = sc.name
where a.score_14 is not null AND B.MEASURENAME IS NOT NULL
and (A.studentid NOT IN (SELECT studentid from #B)
and a.measuredate NOT IN (SELECT measuredate from #B)
and a.measureid NOT IN (SELECT measureid from #B))
use NOT EXISTS...NOT IN doesn't filter out NULLS
insert into J5C_MasterMeasures (studentid, measuredate, measureid, rit)
select A.studentid, A.measuredate, B.measurename+' ' +B.LabelName, A.score_14
from [J5C_Measures_Sys] A
join [J5C_ListBoxMeasures_Sys] B on A.MeasureID = B.MeasureID
join sysobjects so on so.name = 'J5C_Measures_Sys' AND so.type = 'u'
join syscolumns sc on so.id = sc.id and sc.name = 'score_14'
join [J5C_MeasureNamesV2_Sys] v on v.Score_field_id = sc.name
where a.score_14 is not null AND B.MEASURENAME IS NOT NULL
AND NOT EXISTS (select 1 from #B where #b.studentid = A.studentid
and a.measuredate = #B.measuredate
and a.measureid = #B.measureid)
and not exists (select 1 from J5C_MasterMeasures z
where z.studentid = A.studentid)
Just so you know, take a look at Select all rows from one table that don't exist in another table
Basically there are at least 5 ways to select all rows from onr table that are not in another table
NOT IN
NOT EXISTS
LEFT and RIGHT JOIN
OUTER APLY (2005+)
EXCEPT (2005+)
Here is a general solution for the difference operation using left join:
select * from FirstTable
left join SecondTable on FirstTable.ID = SecondTable.ID
where SecondTable.ID is null
Of course yours would have a more complicated join on clause, but the basic operation is the same.
I think you can use "NOT IN" with a subquery, but you say you have a multi-field key?
I'd be thinking about using a left outer join and then testing for null on the right...
Martin.

T-SQL: Selecting rows to delete via joins

Scenario:
Let's say I have two tables, TableA and TableB. TableB's primary key is a single column (BId), and is a foreign key column in TableA.
In my situation, I want to remove all rows in TableA that are linked with specific rows in TableB: Can I do that through joins? Delete all rows that are pulled in from the joins?
DELETE FROM TableA
FROM
TableA a
INNER JOIN TableB b
ON b.BId = a.BId
AND [my filter condition]
Or am I forced to do this:
DELETE FROM TableA
WHERE
BId IN (SELECT BId FROM TableB WHERE [my filter condition])
The reason I ask is it seems to me that the first option would be much more effecient when dealing with larger tables.
Thanks!
DELETE TableA
FROM TableA a
INNER JOIN TableB b
ON b.Bid = a.Bid
AND [my filter condition]
should work
I would use this syntax
Delete a
from TableA a
Inner Join TableB b
on a.BId = b.BId
WHERE [filter condition]
Yes you can. Example :
DELETE TableA
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]
Was trying to do this with an access database and found I needed to use a.* right after the delete.
DELETE a.*
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]
It's almost the same in MySQL, but you have to use the table alias right after the word "DELETE":
DELETE a
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]
The syntax above doesn't work in Interbase 2007. Instead, I had to use something like:
DELETE FROM TableA a WHERE [filter condition on TableA]
AND (a.BId IN (SELECT a.BId FROM TableB b JOIN TableA a
ON a.BId = b.BId
WHERE [filter condition on TableB]))
(Note Interbase doesn't support the AS keyword for aliases)
I'm using this
DELETE TableA
FROM TableA a
INNER JOIN
TableB b on b.Bid = a.Bid
AND [condition]
and #TheTXI way is good as enough but I read answers and comments and I found one things must be answered is using condition in WHERE clause or as join condition. So I decided to test it and write an snippet but didn't find a meaningful difference between them. You can see sql script here and important point is that I preferred to write it as commnet because of this is not exact answer but it is large and can't be put in comments, please pardon me.
Declare #TableA Table
(
aId INT,
aName VARCHAR(50),
bId INT
)
Declare #TableB Table
(
bId INT,
bName VARCHAR(50)
)
Declare #TableC Table
(
cId INT,
cName VARCHAR(50),
dId INT
)
Declare #TableD Table
(
dId INT,
dName VARCHAR(50)
)
DECLARE #StartTime DATETIME;
SELECT #startTime = GETDATE();
DECLARE #i INT;
SET #i = 1;
WHILE #i < 1000000
BEGIN
INSERT INTO #TableB VALUES(#i, 'nameB:' + CONVERT(VARCHAR, #i))
INSERT INTO #TableA VALUES(#i+5, 'nameA:' + CONVERT(VARCHAR, #i+5), #i)
SET #i = #i + 1;
END
SELECT #startTime = GETDATE()
DELETE a
--SELECT *
FROM #TableA a
Inner Join #TableB b
ON a.BId = b.BId
WHERE a.aName LIKE '%5'
SELECT Duration = DATEDIFF(ms,#StartTime,GETDATE())
SET #i = 1;
WHILE #i < 1000000
BEGIN
INSERT INTO #TableD VALUES(#i, 'nameB:' + CONVERT(VARCHAR, #i))
INSERT INTO #TableC VALUES(#i+5, 'nameA:' + CONVERT(VARCHAR, #i+5), #i)
SET #i = #i + 1;
END
SELECT #startTime = GETDATE()
DELETE c
--SELECT *
FROM #TableC c
Inner Join #TableD d
ON c.DId = d.DId
AND c.cName LIKE '%5'
SELECT Duration = DATEDIFF(ms,#StartTime,GETDATE())
If you could get good reason from this script or write another useful, please share. Thanks and hope this help.
Let's say you have 2 tables, one with a Master set (eg. Employees) and one with a child set (eg. Dependents) and you're wanting to get rid of all the rows of data in the Dependents table that cannot key up with any rows in the Master table.
delete from Dependents where EmpID in (
select d.EmpID from Employees e
right join Dependents d on e.EmpID = d.EmpID
where e.EmpID is null)
The point to notice here is that you're just collecting an 'array' of EmpIDs from the join first, the using that set of EmpIDs to do a Deletion operation on the Dependents table.
In SQLite, the only thing that work is something similar to beauXjames' answer.
It seems to come down to this
DELETE FROM table1 WHERE table1.col1 IN (SOME TEMPORARY TABLE);
and that some temporary table can be crated by SELECT and JOIN your two table which you can filter this temporary table based on the condition that you want to delete the records in Table1.
The simpler way is:
DELETE TableA
FROM TableB
WHERE TableA.ID = TableB.ID
DELETE FROM table1
where id IN
(SELECT id FROM table2..INNER JOIN..INNER JOIN WHERE etc)
Minimize use of DML queries with Joins. You should be able to do most of all DML queries with subqueries like above.
In general, joins should only be used when you need to SELECT or GROUP by columns in 2 or more tables. If you're only touching multiple tables to define a population, use subqueries. For DELETE queries, use correlated subquery.
You can run this query:
DELETE FROM TableA
FROM
TableA a, TableB b
WHERE
a.Bid=b.Bid
AND
[my filter condition]