Check whether there exists at least one b for every a where each b has a foreign key reference to a - postgresql

Say that you have a table of a's and a table of b's where each b has a foreign key reference to the table of a's. How would you write a SQL statement answering the question of whether there exists at least one b for every a?
To reify: Say that you have a table of users:
create table users (
id bigserial primary key,
name text
);
and a table of hats that these users wear:
create table hats (
id bigserial primary key,
user_id bigint references users,
description text
);
How would you write a query answering whether or not each user has at least one hat, or to rephrase: Is there any user without a hat?

You could use a LEFT JOIN to find users without a hat:
SELECT u.*
FROM users u
LEFT JOIN hats h
ON u.id = h.user_id
WHERE h.user_id IS NULL;

RhodiumToad on #postgresql (freenode) answered:
Do you want a result like (user_id, has_hat)? Or just a list of users with hats, or users without hats? Or a single true/false result for "does any user not have a hat?"
The most efficient of these queries is the one that answers the last question, a single true/false result:
select exists(
select 1
from users u
where not exists(
select 1
from hats h
where h.user_id=u.id));
This is because it's (a) plannable as an anti-join and (b) stops on the first match. There's also the added benefit of it literally saying "does a user exist such that no hat exists for that user", so it should be easy to understand for future readers.
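A minimal sketch of how that single true/false query behaves, assuming an in-memory SQLite database as a stand-in for PostgreSQL (the sample rows and the helper name `any_user_without_hat` are made up for illustration; the table and column names follow the question):

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hats (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        description TEXT
    );
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO hats (user_id, description) VALUES (1, 'fedora');
""")

# "Does a user exist such that no hat exists for that user?"
HATLESS_USER_EXISTS = """
    SELECT EXISTS (
        SELECT 1 FROM users u
        WHERE NOT EXISTS (SELECT 1 FROM hats h WHERE h.user_id = u.id)
    )
"""

def any_user_without_hat(conn):
    # SQLite returns 0 or 1 for EXISTS; convert it to a Python bool.
    return bool(conn.execute(HATLESS_USER_EXISTS).fetchone()[0])

before = any_user_without_hat(conn)   # bob has no hat yet
conn.execute("INSERT INTO hats (user_id, description) VALUES (2, 'beanie')")
after = any_user_without_hat(conn)    # every user now has a hat
print(before, after)
```

The query returns true while bob is hatless and false once every user has at least one hat. Negate the result if the question you want to ask is "does every user have a hat?".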
The next best if you want more detail is the middle option (users with/without hats) like so:
-- shows all users with at least one hat
select *
from users u
where exists(
select 1
from hats h
where h.user_id=u.id);
-- shows all users with no hat
select *
from users u
where not exists(
select 1
from hats h
where h.user_id=u.id);
The first option doesn't plan as efficiently so it should usually be avoided:
-- shows all users, with a flag for whether they have a hat
select u.id, exists(
select 1
from hats h
where h.user_id=u.id) as has_hat
from users u;

Related

Storing array of IDs and how to correctly unpack them on a select out

This might be partly a design question (new to PostgreSQL) as well.
I have three tables - Users, Groups and User_Group. User_Group represents a combination of 1 user_id being linked to 0..X Group IDs.
The tables are as simple as you think (for now, building out this thing):
User: ID, Name, ....
Group: ID, Name, ...
User_Group: UserID, GroupID int[], ...
So right now, the GroupID field in User_Group is an Integer array. UserID 1 has a value of {1,2,10,19,28} for example.
Goal:
In my UI, I need to represent that list as the group names (ie: {Group1, Group2, Group10, Group19, Group28}).
So, because I am new to PostgreSQL, I'm researching and a couple ideas pop into my mind - unnest, ANY and array replacement. All scream performance issues to me, but I might be wrong (this is the design question, is it smart to store array?)
My query right now:
select
u.*,
g.group_ids
from users u
left join user_group g
on u.id = g.user_id
Piece I'm trying to figure out how to push into:
select ug.group_id
from (select unnest(group_ids) group_id FROM user_group) as ug
left join groups g
on g.id = ug.group_id
This will just result in (obviously) an additional row for each group ID the person is associated with.
Which is the best way to do this?
(Personally I would have a Groups column (int array) on the Users table, but your choice is fine too.)
It would look like this (I wrote the table and field names off the top of my head, so they differ slightly from yours):
select u.*, g.Name as GroupName
from users u
left join usergroups ug on ug.UserId = u.UserId
left join groups g on g.groupId = ANY( ug.groups );
Update: I might have misunderstood your need. Maybe you meant this:
select u.*,
(select string_agg(g.name, ',')
from usergroups ug
inner join groups g on g.groupId = ANY( ug.groups )
where ug.UserId = u.UserId) as Groups
from users u;
There, you have a one-to-many relationship:
User (1)->(*) Groups
This kind of relation doesn't need an intermediary table to define the link. A one-to-many relation usually has a foreign key in the child table (in this case Groups).
The result will be:
User: id, name
Group: id, name, user_id
And you can add a constraint to the database as: ALTER TABLE "group" ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES "user";
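For comparison with the array column, here is a rough sketch of another common layout: one row per (user, group) pair in User_Group, with the names aggregated at query time. This is an illustration rather than something proposed in the answers above. It assumes an in-memory SQLite database standing in for PostgreSQL, invented sample rows, and SQLite's `group_concat`, which plays the role that `string_agg` plays in PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per membership instead of an int[] column.
    CREATE TABLE user_group (
        user_id INTEGER REFERENCES users(id),
        group_id INTEGER REFERENCES "groups"(id),
        PRIMARY KEY (user_id, group_id)
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO "groups" VALUES (1, 'Group1'), (2, 'Group2'), (10, 'Group10');
    INSERT INTO user_group VALUES (1, 1), (1, 2), (1, 10);
""")

rows = conn.execute("""
    SELECT u.id, u.name, group_concat(g.name, ',') AS group_names
    FROM users u
    LEFT JOIN user_group ug ON ug.user_id = u.id
    LEFT JOIN "groups" g ON g.id = ug.group_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

# The order inside the aggregated string is not guaranteed, so sort in Python.
names_by_user = {
    name: sorted(joined.split(",")) if joined else []
    for _, name, joined in rows
}
print(names_by_user)
```

Each user comes back as a single row, and a user with no memberships still appears (with an empty list) because of the LEFT JOINs.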

How to query "has no linked records in this table"

I have two simple tables: one with primary key id, and two with primary key id and a foreign key oneId.
I want to get all rows from one with no references in two.oneId.
I could do
SELECT ... FROM one LEFT JOIN two ON two.oneId = one.id WHERE two.id IS NULL
SELECT ... FROM one WHERE NOT exists(SELECT 1 FROM two WHERE oneId = one.id)
SELECT ... FROM one WHERE id NOT IN (SELECT oneId FROM two)
probably other options exist
Which option is better, and why?
The second choice is the best – it will be translated to an antijoin.
Number one looks pretty good too, it might have the same execution plan.
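One difference between the three options is easy to miss: if two.oneId can be NULL, the NOT IN version returns no rows at all, because a comparison with NULL is neither true nor false. The sketch below shows this, assuming an in-memory SQLite database as a stand-in (its three-valued logic for NOT IN matches standard SQL here) and invented sample rows.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (id INTEGER PRIMARY KEY);
    CREATE TABLE two (id INTEGER PRIMARY KEY, oneId INTEGER REFERENCES one(id));
    INSERT INTO one (id) VALUES (1), (2), (3);
    -- Row 11 has a NULL foreign key.
    INSERT INTO two (id, oneId) VALUES (10, 1), (11, NULL);
""")

def ids(sql):
    # Run a query and return its first column as a sorted list.
    return sorted(row[0] for row in conn.execute(sql))

left_join = ids(
    "SELECT one.id FROM one LEFT JOIN two ON two.oneId = one.id "
    "WHERE two.id IS NULL"
)
not_exists = ids(
    "SELECT id FROM one "
    "WHERE NOT EXISTS (SELECT 1 FROM two WHERE two.oneId = one.id)"
)
not_in = ids("SELECT id FROM one WHERE id NOT IN (SELECT oneId FROM two)")

print(left_join, not_exists, not_in)
```

The LEFT JOIN and NOT EXISTS forms both report rows 2 and 3, while NOT IN reports nothing. If the column is declared NOT NULL the three agree; otherwise NOT IN needs an explicit `WHERE oneId IS NOT NULL` in the subquery.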

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword, but I'm not using LATERAL here. Also, this query succeeds even in versions of Postgres before 9.3, which is when the LATERAL keyword was introduced.
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.
Although this feature might be confusing, without it, several types of queries would be more difficult, slower, or impossible to write in sql. This feature is called a "correlated subquery" and the correlation can serve a similar function as a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
Now this query will get the names of all the users who have ever placed an order. Now, I know, you can get that info using a join to the orders table, but you'd also have to use a "distinct", which would internally require a sort and would likely perform a tad worse than this query. You could also produce a similar query with a group by.
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field and the users table does, so the unqualified name silently resolves to the outer table and the query returns the wrong data. Add aliases and the query will fail, because no "last_updated_by" column exists in product.
I hope this has given you some examples that show you how to use this feature. I use them all the time in update and delete statements (as well as in selects-- but I find an absolute need for them in updates and deletes often)
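Applied to the tables from the question, the same name resolution explains why the query does not fail. The sketch below assumes an in-memory SQLite database as a stand-in for Postgres (it resolves unqualified column names outward in the same way); the aliases `a` and `b` are added for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_a (col_a INTEGER);
    CREATE TABLE temp_b (col_b INTEGER);
    INSERT INTO temp_a VALUES (1);
    INSERT INTO temp_b VALUES (2);
""")

# Unqualified: temp_b has no col_a, so the name inside the subquery resolves
# to temp_a.col_a. The subquery yields the outer row's own col_a once per
# row of temp_b, so the IN test succeeds whenever temp_b is not empty.
unqualified = conn.execute(
    "SELECT col_a FROM temp_a WHERE col_a IN (SELECT col_a FROM temp_b)"
).fetchall()

# Qualified: b.col_a can only mean a column of temp_b, which does not exist,
# so the statement is rejected instead of silently returning rows.
try:
    conn.execute(
        "SELECT a.col_a FROM temp_a a "
        "WHERE a.col_a IN (SELECT b.col_a FROM temp_b b)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified, qualified_error)
```

The unqualified query returns the row from temp_a even though the value 1 never appears in temp_b, which is exactly the "silently works" trap; the aliased version raises an error that names the missing column.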

DB2 Lookup table using two columns of same table

I have a lookup table for institution id, name, address and another table for course details.
In each course record there will be two columns pointing primary and secondary institution ids.
My select query should look like ->
Select course_id,
name,
primary_Institution_id,
Primary_Institution_name,
primary_Institution_address,
Secondary_Institution_id,
Secondary_Institution_name,
Secondary_Institution_address
from [JOIN MAY BE]
where course_id in ('1223','34234','43432')
How to achieve this? I have no control over the tables and I can only select from them and cannot modify their structure.
If you are trying to ask how to do the join, it might look something like this
Select c.course_id,
c.name,
c.primary_Institution_id,
i.name as primary_Institution_name,
i.address as primary_Institution_address,
c.secondary_Institution_id,
k.name as secondary_Institution_name,
k.address as secondary_Institution_address
from courses as c
join institutions as i
on i.id = c.primary_Institution_id
left join institutions as k
on k.id = c.secondary_Institution_id
where c.course_id in ('1223','34234','43432')
This assumes that the first institution id is mandatory (never null) so the join is implied as an inner join, but that perhaps the second might be optional (null allowed) so it uses a left join, in case there is nothing to match to.
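A small runnable sketch of the same idea, joining the lookup table twice under two aliases. It assumes an in-memory SQLite database as a stand-in for DB2, and the table names, column names and sample rows are invented for illustration since the real schema is not shown.

```python
import sqlite3

# Assumption: SQLite stands in for DB2; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE institutions (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE courses (
        course_id TEXT PRIMARY KEY,
        name TEXT,
        primary_institution_id INTEGER NOT NULL,
        secondary_institution_id INTEGER          -- optional
    );
    INSERT INTO institutions VALUES (1, 'North College', '1 North St'),
                                    (2, 'South College', '2 South St');
    INSERT INTO courses VALUES ('1223', 'Algebra', 1, 2),
                               ('34234', 'Biology', 2, NULL);
""")

rows = conn.execute("""
    SELECT c.course_id, p.name, s.name
    FROM courses c
    JOIN institutions p ON p.id = c.primary_institution_id         -- mandatory
    LEFT JOIN institutions s ON s.id = c.secondary_institution_id  -- optional
    WHERE c.course_id IN ('1223', '34234', '43432')
    ORDER BY c.course_id
""").fetchall()

for row in rows:
    print(row)
```

The course without a secondary institution still appears, with NULL in the secondary columns, because of the left join. Each alias must be joined on its own id; joining the second alias on the first alias's id is an easy slip to make.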

Finding duplicates between two tables

I've got two SQL2008 tables, one is an "Import" table containing new data and the other a "Destination" table with the live data. Both tables are similar but not identical (there are more columns in the Destination table, updated by a CRM system), but both tables have three "phone number" fields - Tel1, Tel2 and Tel3. I need to remove all records from the Import table where any of the phone numbers already exist in the Destination table.
I've tried knocking together a simple query (just a SELECT to test with just now):
select t2.account_id
from ImportData t2, Destination t1
where
(t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
... but I'm aware this is almost certainly Not The Way To Do Things, especially as it's very slow. Can anyone point me in the right direction?
This query requires a little more information than this. If you want to write it efficiently, we need to know whether each load contains more duplicates or more new records. I assume that account_id is the primary key and has a clustered index.
I would use the temporary table approach: create a normalized table #tmp with an index on the phone number and the account id, like
-- Phone holds the number; TelColumn holds the source column name (Tel1, Tel2 or Tel3)
SELECT account_id, Phone INTO #tmp
FROM
(SELECT account_id, Tel1, Tel2, Tel3
FROM Destination) p
UNPIVOT
(Phone FOR TelColumn IN
(Tel1, Tel2, Tel3)
) AS unpvt;
Create a nonclustered index on this table, with the phone number as the first column and the account number as the second. You can't escape one full table scan, so I assume you can scan the import table (probably the smaller one). Then just join with this table and use the NOT EXISTS qualifier as explained. Then of course drop the table after processing.
luke
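The normalize-then-probe approach can be sketched end to end as follows. This assumes an in-memory SQLite database as a stand-in for SQL Server, so UNION ALL replaces UNPIVOT and a TEMP table replaces #tmp; the name `dest_phones` and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Destination (account_id INTEGER PRIMARY KEY,
                              Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
    CREATE TABLE ImportData (account_id INTEGER PRIMARY KEY,
                             Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
    INSERT INTO Destination VALUES (1, '555-0100', '', '555-0101');
    INSERT INTO ImportData VALUES (10, '555-0101', '', ''),  -- already known
                                  (11, '555-0199', '', ''),  -- new number
                                  (12, '', '', '');          -- no numbers

    -- One row per non-empty destination phone number.
    CREATE TEMP TABLE dest_phones AS
        SELECT account_id, Tel1 AS phone FROM Destination WHERE Tel1 <> ''
        UNION ALL
        SELECT account_id, Tel2 FROM Destination WHERE Tel2 <> ''
        UNION ALL
        SELECT account_id, Tel3 FROM Destination WHERE Tel3 <> '';
    CREATE INDEX ix_dest_phones ON dest_phones (phone, account_id);
""")

# Import rows where any phone number already exists in Destination.
# Empty strings never match because dest_phones contains none.
duplicates = [row[0] for row in conn.execute("""
    SELECT i.account_id
    FROM ImportData i
    WHERE EXISTS (
        SELECT 1 FROM dest_phones d
        WHERE d.phone IN (i.Tel1, i.Tel2, i.Tel3)
    )
    ORDER BY i.account_id
""")]
print(duplicates)

conn.execute("DROP TABLE dest_phones")  # clean up after processing
```

Switching EXISTS to NOT EXISTS gives the rows to keep instead of the rows to remove. Filtering out empty strings while building the normalized table means the probe no longer needs the `!= ''` checks from the original query.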
I am not sure about the performance of this query, but since I made the effort of writing it I will post it anyway...
;with aaa(tel)
as
(
select Tel1
from Destination
union
select Tel2
from Destination
union
select Tel3
from Destination
)
,bbb(tel, id)
as
(
select Tel1, account_id
from ImportData
union
select Tel2, account_id
from ImportData
union
select Tel3, account_id
from ImportData
)
select distinct b.id
from bbb b
where b.tel in
(
select a.tel
from aaa a
intersect
select b2.tel
from bbb b2
)
Exists will short-circuit the query and not do a full traversal of the table like a join. You could refactor the where clause as well, if this still doesn't perform the way you want.
SELECT *
FROM ImportData t2
WHERE NOT EXISTS (
select 1
from Destination t1
where (t2.Tel1!='' AND (t2.Tel1 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel2!='' AND (t2.Tel2 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
or
(t2.Tel3!='' AND (t2.Tel3 IN (t1.Tel1,t1.Tel2,t1.Tel3)))
)