T-SQL: Selecting rows to delete via joins

Scenario:
Let's say I have two tables, TableA and TableB. TableB's primary key is a single column (BId), and is a foreign key column in TableA.
In my situation, I want to remove all rows in TableA that are linked with specific rows in TableB: Can I do that through joins? Delete all rows that are pulled in from the joins?
DELETE FROM TableA
FROM
TableA a
INNER JOIN TableB b
ON b.BId = a.BId
AND [my filter condition]
Or am I forced to do this:
DELETE FROM TableA
WHERE
BId IN (SELECT BId FROM TableB WHERE [my filter condition])
The reason I ask is that it seems to me the first option would be much more efficient when dealing with larger tables.
Thanks!

DELETE TableA
FROM TableA a
INNER JOIN TableB b
ON b.Bid = a.Bid
AND [my filter condition]
should work

I would use this syntax
Delete a
from TableA a
Inner Join TableB b
on a.BId = b.BId
WHERE [filter condition]

Yes you can. Example :
DELETE TableA
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]

I was trying to do this with an Access database and found I needed to use a.* right after the DELETE.
DELETE a.*
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]

It's almost the same in MySQL, but you have to use the table alias right after the word "DELETE":
DELETE a
FROM TableA AS a
INNER JOIN TableB AS b
ON a.BId = b.BId
WHERE [filter condition]

The syntax above doesn't work in Interbase 2007. Instead, I had to use something like:
DELETE FROM TableA a WHERE [filter condition on TableA]
AND (a.BId IN (SELECT a.BId FROM TableB b JOIN TableA a
ON a.BId = b.BId
WHERE [filter condition on TableB]))
(Note Interbase doesn't support the AS keyword for aliases)

I'm using this
DELETE TableA
FROM TableA a
INNER JOIN
TableB b on b.Bid = a.Bid
AND [condition]
and @TheTXI's way is good enough, but after reading the answers and comments I found one thing that still needed answering: whether to put the condition in the WHERE clause or in the join condition. So I decided to test it with a script, and I didn't find a meaningful difference between the two. The script is below. I would have preferred to post this as a comment, because it is not an exact answer, but it is too large to fit in one, so please pardon me.
Declare @TableA Table
(
aId INT,
aName VARCHAR(50),
bId INT
)
Declare @TableB Table
(
bId INT,
bName VARCHAR(50)
)
Declare @TableC Table
(
cId INT,
cName VARCHAR(50),
dId INT
)
Declare @TableD Table
(
dId INT,
dName VARCHAR(50)
)
DECLARE @StartTime DATETIME;
SELECT @StartTime = GETDATE();
DECLARE @i INT;
SET @i = 1;
WHILE @i < 1000000
BEGIN
INSERT INTO @TableB VALUES(@i, 'nameB:' + CONVERT(VARCHAR, @i))
INSERT INTO @TableA VALUES(@i+5, 'nameA:' + CONVERT(VARCHAR, @i+5), @i)
SET @i = @i + 1;
END
-- Test 1: filter condition in the WHERE clause
SELECT @StartTime = GETDATE()
DELETE a
--SELECT *
FROM @TableA a
Inner Join @TableB b
ON a.BId = b.BId
WHERE a.aName LIKE '%5'
SELECT Duration = DATEDIFF(ms,@StartTime,GETDATE())
SET @i = 1;
WHILE @i < 1000000
BEGIN
INSERT INTO @TableD VALUES(@i, 'nameB:' + CONVERT(VARCHAR, @i))
INSERT INTO @TableC VALUES(@i+5, 'nameA:' + CONVERT(VARCHAR, @i+5), @i)
SET @i = @i + 1;
END
-- Test 2: filter condition in the join condition
SELECT @StartTime = GETDATE()
DELETE c
--SELECT *
FROM @TableC c
Inner Join @TableD d
ON c.DId = d.DId
AND c.cName LIKE '%5'
SELECT Duration = DATEDIFF(ms,@StartTime,GETDATE())
If you can draw a solid conclusion from this script, or write a more useful one, please share. Thanks, and I hope this helps.

Let's say you have 2 tables, one with a Master set (eg. Employees) and one with a child set (eg. Dependents) and you're wanting to get rid of all the rows of data in the Dependents table that cannot key up with any rows in the Master table.
delete from Dependents where EmpID in (
select d.EmpID from Employees e
right join Dependents d on e.EmpID = d.EmpID
where e.EmpID is null)
The point to notice here is that you're just collecting an 'array' of EmpIDs from the join first, then using that set of EmpIDs to do a deletion on the Dependents table.
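To see that orphan cleanup run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The table contents and the DepID and Name columns are invented for illustration, and the join is written as Dependents LEFT JOIN Employees (the mirror image of the RIGHT JOIN above) because older SQLite builds have no RIGHT JOIN.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees  (EmpID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Dependents (DepID INTEGER PRIMARY KEY, EmpID INTEGER, Name TEXT);
    INSERT INTO Employees  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Dependents VALUES (10, 1, 'Ann jr'), (11, 2, 'Bob jr'),
                                  (12, 99, 'orphan 1'), (13, 98, 'orphan 2');
""")

# Dependents with no matching employee come out of the join with a NULL
# e.EmpID; their EmpIDs form the set handed to the outer DELETE.
deleted = conn.execute("""
    DELETE FROM Dependents
    WHERE EmpID IN (
        SELECT d.EmpID
        FROM Dependents d
        LEFT JOIN Employees e ON e.EmpID = d.EmpID
        WHERE e.EmpID IS NULL)
""").rowcount

remaining = [row[0] for row in
             conn.execute("SELECT DepID FROM Dependents ORDER BY DepID")]
print(deleted, remaining)
```

Only the two dependents whose EmpID has no employee are removed; the rows for employees 1 and 2 stay.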

In SQLite, the only thing that works is something similar to beauXjames' answer.
It seems to come down to this:
DELETE FROM table1 WHERE table1.col1 IN (SOME TEMPORARY TABLE);
That temporary table can be created by a SELECT that JOINs your two tables, filtered by the condition that identifies the records you want to delete from table1.
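Applied to the TableA/TableB scenario from the question, that pattern might look like the sketch below, run through Python's sqlite3 module. The AId and Status columns and the 'retired' filter are assumptions made up for the example; they stand in for [my filter condition].

```python
import sqlite3

# Hypothetical columns (AId, Status) and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (BId INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE TableA (AId INTEGER PRIMARY KEY,
                         BId INTEGER REFERENCES TableB(BId));
    INSERT INTO TableB VALUES (1, 'active'), (2, 'retired'), (3, 'retired');
    INSERT INTO TableA VALUES (100, 1), (101, 2), (102, 2), (103, 3);
""")

# The inner SELECT does the join and the filtering; the outer DELETE
# only needs the resulting list of TableA keys.
conn.execute("""
    DELETE FROM TableA
    WHERE AId IN (
        SELECT a.AId
        FROM TableA a
        INNER JOIN TableB b ON b.BId = a.BId
        WHERE b.Status = 'retired')
""")

remaining_a = conn.execute("SELECT AId, BId FROM TableA").fetchall()
print(remaining_a)
```

Selecting TableA's own key in the subquery, rather than BId, keeps the DELETE tied to exactly the rows the join produced.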

The simpler way is:
DELETE TableA
FROM TableB
WHERE TableA.ID = TableB.ID

DELETE FROM table1
where id IN
(SELECT id FROM table2..INNER JOIN..INNER JOIN WHERE etc)
Minimize the use of DML queries with joins. You should be able to write most DML queries with subqueries like the one above.
In general, joins should only be used when you need to SELECT or GROUP BY columns from 2 or more tables. If you're only touching multiple tables to define a population, use subqueries. For DELETE queries, use a correlated subquery.
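As a rough illustration of the two subquery styles mentioned here, the sketch below runs an IN subquery and a correlated EXISTS subquery against identical fresh copies of some made-up data (Python's sqlite3 is used only as a convenient engine; the flag and payload columns are hypothetical).

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
SETUP = """
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, flag INTEGER);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO table2 VALUES (1, 0), (2, 1), (3, 1);
    INSERT INTO table1 VALUES (1, 'keep'), (2, 'drop'), (3, 'drop'), (4, 'no match');
"""

# Uncorrelated form: build the id list once, then delete by membership.
IN_SUBQUERY = """
    DELETE FROM table1
    WHERE id IN (SELECT id FROM table2 WHERE flag = 1)
"""

# Correlated form: the inner query refers to the row being considered.
CORRELATED_EXISTS = """
    DELETE FROM table1
    WHERE EXISTS (SELECT 1 FROM table2
                  WHERE table2.id = table1.id AND table2.flag = 1)
"""

def surviving_ids(delete_sql):
    """Run one DELETE against a fresh in-memory copy and return the ids left."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(delete_sql)
    return [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]

in_result = surviving_ids(IN_SUBQUERY)
exists_result = surviving_ids(CORRELATED_EXISTS)
print(in_result, exists_result)
```

Both statements leave the same rows behind on this data. The correlated form extends more naturally when the match involves more than one column.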

You can run this query:
DELETE FROM TableA
FROM
TableA a, TableB b
WHERE
a.Bid=b.Bid
AND
[my filter condition]

Related

Postgres join involving tables having join condition defined on an text array

I have two tables in postgresql
One table is of the form
Create table table1(
ID serial PRIMARY KEY,
type text[]
);
Create table table2(
type text,
sellerID int
);
Now I want to get all the rows from table1 whose type matches a type in table2, but the problem is that in table1 the type is an array.
If the type values are stored as a delimited string (with an identifiable delimiter such as ',' or ';'), you can split them into rows with regexp_split_to_table(type, ','); in versions later than 9.5 the unnest function can be used too.
For example:
select *
from (select id, regexp_split_to_table(type, ',') as type from table1) table1
inner join table2
on trim(table1.type) = trim(table2.type)
Another good example can be found - https://www.dbrnd.com/2017/03/postgresql-regexp_split_to_array-to-split-string-using-different-delimiters/
SELECT
a[1] AS DiskInfo
,a[2] AS DiskNumber
,a[3] AS MessageKeyword
FROM (
SELECT regexp_split_to_array('Postgres Disk information , disk 2 , failed', ',')
) AS dt(a)
You can use the ANY operator in the JOIN condition:
select *
from table1 t1
join table2 t2 on t2.type = any (t1.type);
Note that if the types in the table1 match multiple rows in table2, you would get duplicates (from table1) because that's how a join works. Maybe you want an EXISTS condition instead:
select *
from table1 t1
where exists (select *
from table2 t2
where t2.type = any(t1.type));

How to change update statements into insert statements while maintaining SET command from UPDATE command?

As the title suggests, I need to change my update and join statements into insert statements. How would I do this while incorporating the SET from the UPDATE statements?
Update statement:
UPDATE tableA
SET A = tableB.A
FROM tableB
JOIN tableC ON tableB.C = tableC.C
WHERE tableC.D = tableA.D
Your tables are empty, so tableC.D = tableA.D wouldn't match anything.
Maybe you are looking for something like this:
INSERT INTO tableA (A)
SELECT DISTINCT tb.A
FROM tableB tb
JOIN tableC tc ON tb.C = tc.C
Or perhaps including the D column:
INSERT INTO tableA (A, D)
SELECT DISTINCT tb.A, tc.D
FROM tableB tb
JOIN tableC tc ON tb.C = tc.C
Note the use of DISTINCT here to eliminate duplicate records.
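A small runnable sketch of that INSERT ... SELECT DISTINCT, using Python's sqlite3 module as a stand-in engine. The column types and sample rows are assumptions; tableB deliberately contains a duplicate row so the effect of DISTINCT is visible.

```python
import sqlite3

# Hypothetical column types and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (A TEXT, D INTEGER);
    CREATE TABLE tableB (A TEXT, C INTEGER);
    CREATE TABLE tableC (C INTEGER, D INTEGER);
    INSERT INTO tableB VALUES ('x', 1), ('x', 1), ('y', 2);
    INSERT INTO tableC VALUES (1, 10), (2, 20), (3, 30);
""")

# The join yields ('x', 10) twice; DISTINCT collapses it before the insert.
conn.execute("""
    INSERT INTO tableA (A, D)
    SELECT DISTINCT tb.A, tc.D
    FROM tableB tb
    JOIN tableC tc ON tb.C = tc.C
""")

inserted = conn.execute("SELECT A, D FROM tableA ORDER BY A").fetchall()
print(inserted)
```

Without DISTINCT the same statement would insert three rows here instead of two.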
Generally, it is good practice to include all the column names in the INSERT statement. You can also remove the WHERE clause by JOINing tableA to tableC. But the simple answer to your question would be:
INSERT INTO tableA (<col name you want to update>)
VALUES ((SELECT <one value> FROM tableB JOIN tableC ON tableB.C = tableC.C JOIN tableA ON tableC.D = tableA.D
));

How to optimize SELECT DISTINCT when using multiple Joins?

I have read that using CTEs you can speed up a SELECT DISTINCT by up to 100 times (link to the website). They have the following example:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
How would one apply this to a query that has multiple joins? For example, I am trying to run the query below, but it takes roughly two and a half minutes. How would I optimize it accordingly?
SELECT DISTINCT x.code
From jpa
INNER JOIN jp ON jpa.ID=jp.ID
INNER JOIN jd ON (jd.ID=jp.ID And jd.JID=3)
INNER JOIN l ON jpa.ID=l.ID AND l.CID=3
INNER JOIN fa ON fa.ID=jpa.ID
INNER JOIN x ON fa.ID=x.ID
1) GROUP BY on every column worked faster for me.
2) If some of the tables contain duplicates, you can pre-select the distinct rows in an inner query and join from that.
3) Generally, you can nest a join if you expect that join to limit the data.
See: SQL join format - nested inner joins

set TableA boolean based on TableB record

My data looks like this:
TableA
- id INT
- is_in_table_b BOOL
TableB
- id INT
- table_a_id INT
I accidentally wiped out the 'is_in_table_b' BOOL on my dev machine while reorganizing the data structures, and I forgot how I created it. It's just a shortcut for some dev benchmarks.
All the "UPDATE ... FROM ..." variations I tried set everything to "true" based on the join. I can't remember if I originally had a CAST in this.
Does anyone know of a simple, elegant way to accomplish this? I just want to set is_in_table_b to True if the TableA.id appears in TableB.table_a_id. I know some non-elegant ways with inner queries, but I want to remember the more correct way to do this. I'm positive I originally did this with an "Update From".
This one should be simple enough:
UPDATE tableA SET
is_in_table_b = exists (select 1 FROM tableB WHERE table_a_id=tableA.id);
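Here is a quick way to check that statement's behaviour, sketched with Python's sqlite3 module as a stand-in engine (the sample rows are invented). Because the UPDATE has no WHERE clause and no join, every row of TableA gets a value, so unmatched rows are set to false rather than being left untouched.

```python
import sqlite3

# Hypothetical sample data; the flag starts out wiped (NULL) on every row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, is_in_table_b BOOLEAN);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, table_a_id INTEGER);
    INSERT INTO TableA (id, is_in_table_b) VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO TableB (id, table_a_id) VALUES (10, 1), (11, 1), (12, 3);
""")

# EXISTS is evaluated once per TableA row and yields true or false
# (stored by SQLite as 1 or 0).
conn.execute("""
    UPDATE TableA
    SET is_in_table_b = EXISTS (SELECT 1 FROM TableB
                                WHERE table_a_id = TableA.id)
""")

flags = conn.execute(
    "SELECT id, is_in_table_b FROM TableA ORDER BY id").fetchall()
print(flags)
```

Row 1 is matched twice in TableB but is still updated only once, and row 2 ends up false instead of staying NULL.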
yeah, do a JOIN between the tables for an UPDATE.
the setup:
CREATE TABLE table_a (
id int not null auto_increment primary key,
is_in_b boolean
);
CREATE TABLE table_b (
table_a_id int
);
-- create some test data in table_a;
INSERT INTO table_a (is_in_b) VALUES (FALSE), (FALSE), (FALSE);
INSERT INTO table_a (is_in_b) SELECT FALSE
FROM table_a a1
JOIN table_a a2
JOIN table_a a3;
-- and create a subset of matching data in table_b;
INSERT INTO table_b (table_a_id)
SELECT id FROM table_a ORDER BY RAND() limit 5;
now the answer:
UPDATE table_a
JOIN table_b ON table_a_id = table_a.id
SET is_in_b = TRUE;
See the results with
SELECT * from table_b;
SELECT * FROM table_a WHERE is_in_b;
Works on http://sqlfiddle.com/#!2/8afc0/1 - should work in Postgres too I think.
Consider dropping that redundant column altogether and using a view or a "generated column" instead (with the EXISTS expression provided by @Daniel). Details under this related question:
Store common query as column?
Just be sure to have an index on TableB.table_a_id.

T-SQL A question about inner join table variable

In my stored procedure I have a table variable that contains row IDs. There are 2 scenarios: the table variable is either empty or not.
declare @IDTable as table
(
number NUMERIC(18,0)
)
In the main query, I join that table:
inner join @IDTable tab on (tab.number = csr.id)
BUT:
given how an inner join works, I need my query to:
still return rows when @IDTable is empty
OR
return ONLY the rows that exist in
@IDTable
I also tried a LEFT join, but it doesn't work. Any ideas how to solve it?
If @IDTable is empty, then what rows do you return? Do you just ignore the join on to the table?
I'm not sure I get what you're trying to do, but this might be easier.
if (Select Count(*) From @IDTable) = 0
begin
-- do a SELECT that doesn't join on to @IDTable
end
else
begin
-- do a SELECT that joins on to @IDTable
end
It is not optimal, but it works:
declare @z table
(
id int
)
--insert @z values(2)
select * from somTable n
left join @z z on (z.id = n.id)
where NOT exists(select 1 from @z) or (z.id is not null)
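To check both scenarios of that last query, here is a sketch in Python's sqlite3 module. An ordinary table named z stands in for the table variable, since SQLite has no table variables; the select_ids helper and the three sample rows are hypothetical.

```python
import sqlite3

def select_ids(filter_ids):
    """Return somTable ids, restricted to filter_ids unless that list is empty."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE somTable (id INTEGER PRIMARY KEY);
        CREATE TABLE z (id INTEGER);   -- stand-in for the table variable
        INSERT INTO somTable VALUES (1), (2), (3);
    """)
    conn.executemany("INSERT INTO z VALUES (?)", [(i,) for i in filter_ids])
    rows = conn.execute("""
        SELECT n.id
        FROM somTable n
        LEFT JOIN z ON z.id = n.id
        WHERE NOT EXISTS (SELECT 1 FROM z)  -- empty filter: keep every row
           OR z.id IS NOT NULL              -- otherwise: keep matched rows only
        ORDER BY n.id
    """)
    return [row[0] for row in rows]

unfiltered = select_ids([])
filtered = select_ids([2])
print(unfiltered, filtered)
```

With an empty filter table every row comes back; with the value 2 inserted, only row 2 does.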