How to delete rows using an outer join

I'm having trouble deleting records from a PostgreSQL table using a LEFT JOIN.
I'd like to delete the rows returned by the following query:
SELECT * FROM url
LEFT JOIN link_type ON url.link_type = link_type.id
WHERE link_type.id IS NULL
To do so, here is what I did:
DELETE FROM url
USING link_type
WHERE url.link_type = link_type.id AND link_type.id IS NULL
The query runs but doesn't delete anything, although that seems to be exactly what's explained in the docs: http://www.postgresql.org/docs/current/static/sql-delete.html.
Is my problem due to IS NULL in the query, or am I missing something?

Good work, sun. Minor suggestion: when using EXISTS/NOT EXISTS you don't need to SELECT *. A common convention (docs) is to just write SELECT 1, like this:
DELETE FROM url WHERE NOT EXISTS (
SELECT 1 FROM link_type WHERE url.link_type = link_type.id
);
Functionally, both ways work.

I still don't understand why my previous query doesn't work (it would be nice if someone could explain), but here is how I solved it:
DELETE FROM url WHERE NOT EXISTS (SELECT * FROM link_type WHERE url.link_type = link_type.id );
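On the "why": DELETE ... USING with url.link_type = link_type.id in the WHERE clause behaves like an inner join, so only rows that found a match survive the equality test, and for those rows link_type.id is never NULL. Here is a minimal sketch of that reasoning, run through Python's sqlite3 module rather than PostgreSQL (SQLite has no DELETE ... USING, so the inner-join form is shown as a SELECT; the sample rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link_type (id INTEGER PRIMARY KEY);
    CREATE TABLE url (id INTEGER PRIMARY KEY, link_type INTEGER);
    INSERT INTO link_type (id) VALUES (1), (2);
    -- url 3 points at a link_type that does not exist (hypothetical orphan row)
    INSERT INTO url (id, link_type) VALUES (1, 1), (2, 2), (3, 99);
""")

# Inner-join form: the equality only keeps matched rows, so IS NULL never holds.
inner_rows = conn.execute("""
    SELECT url.id FROM url, link_type
    WHERE url.link_type = link_type.id AND link_type.id IS NULL
""").fetchall()

# Outer-join form: unmatched url rows are kept with link_type.id set to NULL.
outer_rows = conn.execute("""
    SELECT url.id FROM url
    LEFT JOIN link_type ON url.link_type = link_type.id
    WHERE link_type.id IS NULL
""").fetchall()

# The NOT EXISTS delete removes exactly the rows the outer join found.
conn.execute("""
    DELETE FROM url WHERE NOT EXISTS (
        SELECT 1 FROM link_type WHERE url.link_type = link_type.id
    )
""")
remaining = [row[0] for row in conn.execute("SELECT id FROM url ORDER BY id")]

print(inner_rows, outer_rows, remaining)
```

The inner-join query selects nothing, which is why the original DELETE removed nothing, while the NOT EXISTS form targets the same rows as the LEFT JOIN query.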

Related

Looking for equivalent in postgresql for merge update implemented in mssql

I have to migrate the MSSQL code below to PostgreSQL.
I'm looking for PostgreSQL code that does the following.
MSSQL code:
MERGE TO_SubFamy AS TARGET
USING
(
SELECT SF.Id as SubFamyId,
CASE WHEN COUNT(A.Id)>0 THEN 'Young' ELSE 'OLD' END as ActiveYn
FROM
TO_SubFamy SF
RIGHT JOIN TO_SubFamyAutomationLink SFA ON SF.Id=SFA.SubFamyId
RIGHT JOIN TO_AUTOMATiON A ON A.Id = SFA.AutomationId and A.State='Active'
WHERE SF.Status <> 'not exits'
GROUP BY SF.Id
EXCEPT
SELECT Id, status from TO_SubFamy WHERE Status <> 'not exits'
) as SOURCE
ON TARGET.Id = SOURCE.SubFamyId
WHEN MATCHED THEN
UPDATE SET TARGET.status = SOURCE.ActiveYn;
Please help.

My experience with MSSQL is rather limited, but the statement looks suspiciously like the same statement in Oracle, and a little research seems to confirm that.
The MERGE statement is typically used to either insert or update based on the result of the ON clause. Another semi-common use is strictly to update, as the syntax is somewhat easier in many cases because you do not need to essentially repeat the SELECT statement. Since your statement uses only the WHEN MATCHED clause, it is the latter use being employed, so the translation is just an UPDATE. I think the following satisfies your need.
with source (subfamyid,activeyn) as
(select sf.id,
case when count(a.id)>0 then 'Young' else 'OLD' end
from to_subfamy sf
right join to_subfamyautomationlink sfa on sf.id=sfa.subfamyid
right join to_automation a on a.id = sfa.automationid and a.state='Active'
where sf.status <> 'not exits'
group by sf.id
except
select id, status from to_subfamy where status <> 'not exits'
)
update to_subfamy as target
set status = source.activeyn
from source
where target.id = source.subfamyid;
Disclaimer: as you did not provide sample data or expected results, the query has not been tested. However, the right joins look suspicious: it appears that all rows from table "to_automation" will be kept even if they do not exist in either of the others.

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword, but I'm not using LATERAL here. Also, this query succeeds even in versions of Postgres before 9.3 (when the LATERAL keyword was introduced).
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.

Although this feature might be confusing, without it, several types of queries would be more difficult, slower, or impossible to write in sql. This feature is called a "correlated subquery" and the correlation can serve a similar function as a join.
For example, consider this statement:
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
This query gets the names of all the users who have ever placed an order. You can get the same information using a join to the orders table, but you'd also have to use a "distinct", which would internally require de-duplication (typically a sort or hash) and would likely perform a tad worse than this query. You could also produce a similar query with a group by.
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field and the users table does. The unqualified name then silently resolves to the outer table, so the query returns the wrong data. Add the alias and the query will fail because no "last_updated_by" column exists in product.
I hope this has given you some examples that show you how to use this feature. I use them all the time in update and delete statements (as well as in selects, but I often find an absolute need for them in updates and deletes).
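To see the name-resolution rule in action, here is a small sketch that replays the temp_a/temp_b case from the question through Python's sqlite3 module. This assumes SQLite, which resolves unqualified names in a subquery the same way for this case (inner scope first, then outer); it is not a PostgreSQL session, and the aliases a and b are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_a (col_a INTEGER);
    CREATE TABLE temp_b (col_b INTEGER);
    INSERT INTO temp_a VALUES (1);
    INSERT INTO temp_b VALUES (2);
""")

# Unqualified: temp_b has no col_a, so the name resolves to the outer temp_a.col_a
# and the subquery becomes a correlated subquery instead of failing.
unqualified = conn.execute(
    "SELECT col_a FROM temp_a WHERE col_a IN (SELECT col_a FROM temp_b)"
).fetchall()

# Qualified with aliases: b.col_a cannot fall back to the outer table, so it fails.
try:
    conn.execute(
        "SELECT a.col_a FROM temp_a a "
        "WHERE a.col_a IN (SELECT b.col_a FROM temp_b b)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)
print(qualified_error)
```

The unqualified query quietly returns the row from temp_a, while the aliased version is rejected, which is the safety net the advice about aliases is meant to give you.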

A two-variable NOT IN statement?

I need to look at two tables that share two variables and get a list of the data from one table that does not have matching data in the other table. Example:
Table A
xName
Date
Place
xAmount
Table B
yName
Date
Place
yAmount
I need to be able to write a query that will check Table A and find entries that have no corresponding entry in Table B. If it were a one-variable issue I could use a NOT IN statement, but I can't think of a way to do that with two variables. A left join does not appear to work either, and looking at it by a specific date or place name would not work, since we are talking about thousands of dates and hundreds of place names.
Thanks in advance to anyone who can help out.

SELECT TableA.Date,
TableA.Place,
TableA.xName,
TableA.xAmount,
TableB.yName,
TableB.yAmount
FROM TableA
LEFT OUTER JOIN TableB
ON TableA.Date = TableB.Date
AND TableA.Place = TableB.Place
WHERE TableB.yName IS NULL
OR TableB.yAmount IS NULL
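A quick way to convince yourself that a two-column anti-join works is to run it on a few rows. The sketch below uses Python's sqlite3 module with made-up sample data; the table and column names follow the question, and it tests the join column B.Date for NULL (rather than yName) so that it does not depend on yName never being NULL in TableB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (xName TEXT, Date TEXT, Place TEXT, xAmount REAL);
    CREATE TABLE TableB (yName TEXT, Date TEXT, Place TEXT, yAmount REAL);
    -- hypothetical sample rows
    INSERT INTO TableA VALUES
        ('a1', '2020-01-01', 'Paris', 10),
        ('a2', '2020-01-01', 'Rome', 20),
        ('a3', '2020-01-02', 'Paris', 30);
    INSERT INTO TableB VALUES
        ('b1', '2020-01-01', 'Paris', 5),
        ('b2', '2020-01-02', 'Rome', 7);
""")

# Anti-join: keep TableA rows for which no TableB row shares both Date and Place.
left_join_names = [row[0] for row in conn.execute("""
    SELECT A.xName
    FROM TableA A
    LEFT OUTER JOIN TableB B
      ON A.Date = B.Date AND A.Place = B.Place
    WHERE B.Date IS NULL
    ORDER BY A.xName
""")]

# The same question asked with NOT EXISTS on the two shared columns.
not_exists_names = [row[0] for row in conn.execute("""
    SELECT A.xName
    FROM TableA A
    WHERE NOT EXISTS (
        SELECT 1 FROM TableB B
        WHERE A.Date = B.Date AND A.Place = B.Place
    )
    ORDER BY A.xName
""")]

print(left_join_names, not_exists_names)
```

Both forms return the same unmatched rows here; only 'a1' has a partner with the same Date and Place.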

SELECT * FROM A WHERE NOT EXISTS
(SELECT 1 FROM B
WHERE A.xName = B.yName AND A.Date = B.Date AND A.Place = B.Place AND A.xAmount = B.yAmount)

in ORACLE:
select xName , xAmount from tableA
MINUS
select yName , yAmount from tableB

Postgresql Faulty Syntax on select/join/group

What about the following is not proper syntax for Postgresql?
select p.*, SUM(vote) as votes_count
FROM votes v, posts p
where p.id = v.`voteable_id`
AND v.`voteable_type` = 'Post'
group by v.voteable_id
order by votes_count DESC limit 20
I am in the process of installing postgresql locally but wanted to get this out sooner :)
Thank you

MySQL is a lot looser in its interpretation of standard SQL than PostgreSQL is. There are two issues with your query:
Backtick quoting is a MySQL thing.
Your GROUP BY is invalid.
The first one can be fixed by simply removing the offending quotes. The second one requires more work; from the fine manual:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
This means that every column mentioned in your SELECT either has to appear in an aggregate function or in the GROUP BY clause. So you have to expand your p.* and make sure that all those columns are in the GROUP BY. You should end up with something like this, but with real columns in place of p.column...:
select p.id, p.column..., sum(v.vote) as votes_count
from votes v, posts p
where p.id = v.voteable_id
and v.voteable_type = 'Post'
group by p.id, p.column...
order by votes_count desc
limit 20
This is a pretty common problem when moving from MySQL to anything else.
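For what it's worth, here is the corrected shape of the query run against a couple of made-up rows using Python's sqlite3 module. Note the assumptions: the title column and the sample data are hypothetical, and SQLite is as lenient as MySQL about ungrouped columns, so this only checks that the portable form (no backticks, every selected column grouped or aggregated) gives the expected totals. It does not reproduce the PostgreSQL error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (voteable_id INTEGER, voteable_type TEXT, vote INTEGER);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second');
    INSERT INTO votes VALUES
        (1, 'Post', 1), (1, 'Post', 1),
        (2, 'Post', 1), (2, 'Post', -1),
        (1, 'Comment', 1);  -- ignored by the voteable_type filter
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
rows = conn.execute("""
    SELECT p.id, p.title, SUM(v.vote) AS votes_count
    FROM votes v, posts p
    WHERE p.id = v.voteable_id
      AND v.voteable_type = 'Post'
    GROUP BY p.id, p.title
    ORDER BY votes_count DESC
    LIMIT 20
""").fetchall()

print(rows)
```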

What is the syntax for performing a parameterised delete using joins in SSIS 2008?

I'm trying to use an OLE DB Command to perform a delete using data from each row of my input file. The actual query works fine when running manually in sql server (given tableB.otherID is compared to an int), but I'm having issues parameterising it.
delete tableA from tableA
where tableA.ID = ?
The above query runs, and allows me to assign one of my input columns to tableA.ID. This is what I would expect.
Trying
delete tableA from tableA
INNER JOIN tableB ON tableB.ID = tableA.ID
where tableB.OtherID = ?
throws an error, however ("The multi-part identifier tableB.OtherID could not be bound"). Hardcoding a value in place of the '?' stops this error from appearing.
It seems like this would be the correct syntax; is there anything wrong with the above?

This seems to be a bug/limitation in SSIS; I've found myself unable to perform similar parameterised update statements using a join.
The solution I ended up using was to create a temporary stored procedure with the delete statement I wanted and pass the parameter to it directly.

I think the TSQL syntax you want is:
DELETE FROM tableA
FROM tableA INNER JOIN tableB ON tableB.ID = tableA.ID
WHERE tableB.OtherID = ?

DELETE FROM tableA
FROM tableA INNER JOIN tableB ON tableB.ID = tableA.ID
WHERE tableB.OtherID = #OrderId
Where #OrderId should be your variable in SSIS.

Depending on how many rows you need to delete, using an Execute SQL task becomes slow quite quickly.
If that happens, the solution that worked for me is to put the keys of the rows that need to be deleted into a staging table and then, once they're all in there, issue one statement that deletes all those rows and purges the staging table. It's much quicker that way, and an added benefit is that you don't have to use the quirky ? syntax. I never liked that; it's much too easy to mix things up when the SQL becomes a little more complicated.
Regards Gert-Jan
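The staging-table idea can be sketched outside SSIS as follows. This is only an illustration using Python's sqlite3 module: the staging_keys table, the payload column and the sample rows are hypothetical, and in SSIS the staging table would be filled by a data flow destination rather than by executemany.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE tableB (ID INTEGER PRIMARY KEY, OtherID INTEGER);
    CREATE TABLE staging_keys (OtherID INTEGER);  -- hypothetical staging table
    INSERT INTO tableA VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tableB VALUES (1, 100), (2, 200), (3, 300);
""")

# Step 1: load every key from the input into the staging table.
incoming_keys = [100, 300]
conn.executemany(
    "INSERT INTO staging_keys (OtherID) VALUES (?)",
    [(key,) for key in incoming_keys],
)

# Step 2: one set-based delete instead of one statement per input row.
conn.execute("""
    DELETE FROM tableA
    WHERE ID IN (
        SELECT b.ID
        FROM tableB b
        JOIN staging_keys s ON s.OtherID = b.OtherID
    )
""")

# Step 3: purge the staging table so it is empty for the next run.
conn.execute("DELETE FROM staging_keys")
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM tableA ORDER BY ID")]
staged_count = conn.execute("SELECT COUNT(*) FROM staging_keys").fetchone()[0]
print(remaining_ids, staged_count)
```

The point of the design is that the database does the join once over the whole batch, instead of being called once per input row.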
