What's the difference between "NOT EXISTS" and "NOT IN" - postgresql

For school I needed to get all players who never played a match for the team with number 1. So I thought I would look for all players who played a match for the team with number 1 in a subquery. This is my subquery:
select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1)
Now to extract the players who never played a match for team 1 I thought I could use the "NOT EXISTS" operator. My query then looked like this:
select spelersnr, naam
from spelers
where not exists (select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1))
order by naam, spelersnr
But this query didn't return the result I needed (in fact it didn't return anything). Then I tried this query:
select spelersnr, naam
from spelers
where (spelersnr, naam) not in (select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1))
order by naam, spelersnr
This query returned the result I needed, but now I don't really understand the difference between "NOT EXISTS" and "NOT IN".

You were close. Your NOT EXISTS subquery is not correlated with the outer query: it returns the same rows for every player, so as soon as anyone has played for team 1, NOT EXISTS is false for every row and nothing comes back. NOT IN works differently: it compares each outer row against every value the subquery returns, which can be slower (how much depends on the size of the data sets involved).
In contrast, EXISTS or NOT EXISTS does a lookup for the single outer row, based on a correlated WHERE clause.
All that's missing in your example is that WHERE clause inside the NOT EXISTS:
select spelersnr, naam
from spelers as sp
where not exists (select 1
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1)
where sp.spelersnr = s.spelersnr and sp.naam = s.naam)
order by naam, spelersnr
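To see why the correlation matters, here is a minimal sketch using Python's built-in sqlite3 module in place of PostgreSQL (an assumption, made only so the example is self-contained). The rows and the wedstrijdnr column are invented for illustration; only spelersnr, naam and teamnr come from the question. The last part shows one more difference that is easy to trip over: a NULL in the subquery result makes NOT IN return nothing, while a correlated NOT EXISTS is unaffected.

```python
import sqlite3

# Hypothetical miniature of the spelers / wedstrijden tables (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spelers (spelersnr INTEGER PRIMARY KEY, naam TEXT);
    CREATE TABLE wedstrijden (wedstrijdnr INTEGER PRIMARY KEY,
                              spelersnr INTEGER, teamnr INTEGER);
    INSERT INTO spelers VALUES (1, 'Anna'), (2, 'Bram'), (3, 'Cas');
    INSERT INTO wedstrijden VALUES (10, 1, 1), (11, 2, 2);
""")

# Uncorrelated: the subquery never mentions the outer row, so it is either
# non-empty for every outer row or empty for every outer row.
UNCORRELATED_SQL = """
    SELECT spelersnr, naam FROM spelers
    WHERE NOT EXISTS (SELECT 1 FROM wedstrijden w WHERE w.teamnr = 1)
    ORDER BY naam, spelersnr
"""

# Correlated: the subquery is tied to the outer row through sp.spelersnr.
CORRELATED_SQL = """
    SELECT sp.spelersnr, sp.naam FROM spelers AS sp
    WHERE NOT EXISTS (SELECT 1 FROM wedstrijden w
                      WHERE w.spelersnr = sp.spelersnr AND w.teamnr = 1)
    ORDER BY sp.naam, sp.spelersnr
"""

# NOT IN compares the outer value against the list the subquery returns.
NOT_IN_SQL = """
    SELECT spelersnr, naam FROM spelers
    WHERE spelersnr NOT IN (SELECT spelersnr FROM wedstrijden WHERE teamnr = 1)
    ORDER BY naam, spelersnr
"""

uncorrelated = conn.execute(UNCORRELATED_SQL).fetchall()
correlated = conn.execute(CORRELATED_SQL).fetchall()
not_in = conn.execute(NOT_IN_SQL).fetchall()

print(uncorrelated)  # []
print(correlated)    # [(2, 'Bram'), (3, 'Cas')]
print(not_in)        # [(2, 'Bram'), (3, 'Cas')]

# A NULL in the subquery result changes NOT IN but not NOT EXISTS:
# "x NOT IN (1, NULL)" is never true, so no row qualifies.
conn.execute("INSERT INTO wedstrijden VALUES (12, NULL, 1)")
not_in_with_null = conn.execute(NOT_IN_SQL).fetchall()
not_exists_with_null = conn.execute(CORRELATED_SQL).fetchall()

print(not_in_with_null)      # []
print(not_exists_with_null)  # [(2, 'Bram'), (3, 'Cas')]
```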
Hope this helps :)

Related

GROUP BY one column, then by another column

SELECT lkey, max(votecount) FROM VOTES
WHERE ekey = (SELECT ekey FROM Elections where electionid='NR2019')
GROUP BY lkey
ORDER BY lkey ASC
Is there an easy way to get the pkey in this Statement?
Solution should look like this
Use DISTINCT ON:
SELECT DISTINCT ON (v.ikey) v.*
FROM VOTES v
INNER JOIN Elections e ON e.ekey = v.ekey
WHERE e.electionid = 'NR2019'
ORDER BY v.ikey, v.votecount DESC;
In plain English, the above query says to return the single record for each ikey value having the highest vote count.
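The "first row per group after sorting" behaviour of DISTINCT ON can be mimicked in a few lines of plain Python. This is only an emulation of the semantics, not how PostgreSQL executes the query, and the rows below (pkey, ikey, votecount) are made-up values.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical vote rows: (pkey, ikey, votecount). Values are invented.
votes = [
    (1, "A", 10),
    (2, "A", 25),
    (3, "B", 7),
    (4, "B", 3),
]


def distinct_on_ikey(rows):
    """Keep the first row per ikey after sorting by ikey, votecount DESC."""
    # Same role as: ORDER BY ikey, votecount DESC
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    # Same role as: DISTINCT ON (ikey), i.e. first row of each ikey group
    return [next(group) for _, group in groupby(ordered, key=itemgetter(1))]


top_rows = distinct_on_ikey(votes)
print(top_rows)  # [(2, 'A', 25), (3, 'B', 7)]
```

Because the whole row survives, the pkey of the winning row is available alongside the maximum vote count.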

Ordering data after an intersection Postgresql

I'm working on a db homework question. It asks that the data be in descending order. However, I'm using an intersection in my query because of the many to many relationship.
The schema for Genre is
CREATE TABLE Genre (
movie_id integer REFERENCES Movie(id),
genre GenreType,
primary key (movie_id,genre)
);
My code is currently
$genres = tokenise($argv[1], "&");
$i = 0;
$qry = "
(select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on (m.id = r.movie_id)
join Genre g on (m.id = g.movie_id)
where m.YEAR >= ".$startYear."
and m.YEAR <= ".$endYear."
and g.genre = '".$genres[$i]."')
";
$i++;
while ($i < count($genres)){
$qry = $qry."
intersect
(select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on (m.id = r.movie_id)
join Genre g on (m.id = g.movie_id)
where m.YEAR >= ".$startYear."
and m.YEAR <= ".$endYear."
and g.genre = '".$genres[$i]."')
";
$i++;
}
I'd like to order the final result with the statement
order by r.imdb_score desc, r.num_voted_users desc
However, tagging it onto the end of each select statement doesn't work (the output is still scrambled).
An intersect (or union or except) can only have a single ORDER BY at the end. Even if it "looks like" it belongs to the final query, it applies to the whole result, e.g.:
select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on m.id = r.movie_id
join Genre g on m.id = g.movie_id
where ...
intersect
select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on m.id = r.movie_id
join Genre g on m.id = g.movie_id
where ...
order by imdb_score desc, num_voted_users desc
This sorts the complete result of the intersect. Note that you can't use a table alias when referencing the columns in that ORDER BY, and that the column names are the ones from the first query.
Putting the individual queries between parentheses is not needed.
But the use of intersect seems strange to begin with.
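It seems you are simulating an IN condition with that. As far as I can tell, you could replace it with a single query that uses where ... and genre in ('genre1', 'genre2', ....). Be aware that IN matches a movie that has any of the listed genres, whereas the intersect keeps only movies that have all of them; to keep that behaviour in a single query, also group by the movie columns and add having count(distinct genre) equal to the number of genres.
It will be easier to understand and is likely to be faster.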
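Both points can be tried out in a small sketch. It uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made for self-containment), a trimmed-down version of the schema, and invented rows. It shows a single trailing ORDER BY sorting the whole intersect, then compares a plain IN (any genre) with a GROUP BY / HAVING variant (all genres).

```python
import sqlite3

# Trimmed, hypothetical version of the Movie / Rating / Genre schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Rating (movie_id INTEGER, imdb_score REAL);
    CREATE TABLE Genre (movie_id INTEGER, genre TEXT,
                        PRIMARY KEY (movie_id, genre));
    INSERT INTO Movie VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Rating VALUES (1, 6.5), (2, 8.1), (3, 7.2);
    INSERT INTO Genre VALUES (1, 'Action'), (1, 'Comedy'),
                             (2, 'Action'), (2, 'Comedy'),
                             (3, 'Action');
""")

branch = """
    SELECT m.title AS title, r.imdb_score AS imdb_score
    FROM Movie m
    JOIN Rating r ON m.id = r.movie_id
    JOIN Genre g ON m.id = g.movie_id
    WHERE g.genre = ?
"""

# One ORDER BY after the last branch sorts the whole intersect result.
# It names the output column (imdb_score), not the table alias r.
intersected = conn.execute(
    branch + " INTERSECT " + branch + " ORDER BY imdb_score DESC",
    ("Action", "Comedy"),
).fetchall()

# IN alone matches movies that have ANY of the genres.
any_genre = conn.execute("""
    SELECT DISTINCT m.title
    FROM Movie m JOIN Genre g ON m.id = g.movie_id
    WHERE g.genre IN ('Action', 'Comedy')
    ORDER BY m.title
""").fetchall()

# Requiring ALL genres in one query: count the distinct matches per movie.
all_genres = conn.execute("""
    SELECT m.title, r.imdb_score
    FROM Movie m
    JOIN Rating r ON m.id = r.movie_id
    JOIN Genre g ON m.id = g.movie_id
    WHERE g.genre IN ('Action', 'Comedy')
    GROUP BY m.id, m.title, r.imdb_score
    HAVING COUNT(DISTINCT g.genre) = 2
    ORDER BY r.imdb_score DESC
""").fetchall()

print(intersected)  # [('Beta', 8.1), ('Alpha', 6.5)]
print(any_genre)    # [('Alpha',), ('Beta',), ('Gamma',)]
print(all_genres)   # [('Beta', 8.1), ('Alpha', 6.5)]
```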
You can still do something like this:
SELECT *
FROM
(
[Your_Entire_Query_With_All_Your_Intersects]
) T
ORDER BY [Your_Conditions];
But I don't know exactly what you want to do. Your query seems quite odd to me. Why the intersect in the first place?

Postgres string_agg function not recognized as aggregate function

I am attempting to run this query
SELECT u.*, string_agg(CAST(uar.roleid AS VARCHAR(100)), ',') AS roleids, string_agg(CAST(r.role AS VARCHAR(100)), ',') AS systemroles
FROM idpro.users AS u
INNER JOIN idpro.userapplicationroles AS uar ON u.id = uar.userid
INNER JOIN idpro.roles AS r ON r.id = uar.roleid
GROUP BY u.id, uar.applicationid
HAVING u.organizationid = '77777777-f892-4f4a-8328-c31df32bd6ba'
AND uar.applicationid = 'd88fbf05-c048-4697-8bf3-036f39897183'
AND (u.statusid = '7f9f0b75-44b7-4216-bf2a-03abc47dcff8')
AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','de087148-4788-43da-89e2-dd7dff097735');
However, I'm getting an error stating that
ERROR: column "uar.roleid" must appear in the GROUP BY clause or be used in an aggregate function
LINE 9: AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','...
string_agg() IS an aggregate function, is it not? My intent, if it isn't obvious, is to return each user record with the roleids and rolenames in comma-delimited lists. If I am doing everything wrong, could you please point me in the right direction?
You are filtering the data, so a WHERE clause would be needed. This tutorial is worth reading.
SELECT u.*,
string_agg(CAST(uar.roleid AS VARCHAR(100)), ',') AS roleids,
string_agg(CAST(r.role AS VARCHAR(100)), ',') AS systemroles
FROM idpro.users AS u
INNER JOIN idpro.userapplicationroles AS uar ON u.id = uar.userid
INNER JOIN idpro.roles AS r ON r.id = uar.roleid
WHERE u.organizationid = '77777777-f892-4f4a-8328-c31df32bd6ba'
AND uar.applicationid = 'd88fbf05-c048-4697-8bf3-036f39897183'
AND (u.statusid = '7f9f0b75-44b7-4216-bf2a-03abc47dcff8')
AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','de087148-4788-43da-89e2-dd7dff097735')
GROUP BY u.id, uar.applicationid;
The HAVING clause is helpful for filtering the aggregated values or the groups.
Since you are grouping by u.id, the primary key of the table, you have access to every column of the u table, and you can filter on them in either a WHERE clause or a HAVING clause.
For uar.applicationid, it is part of the group by so you can also use either a where or a having.
uar.roleid is not part of the group by clause, so to be usable in the having clause, you would have to consider the aggregated value.
The following example keeps only the groups whose aggregated string is longer than 10 characters.
HAVING length(string_agg(CAST(uar.roleid AS VARCHAR(100)), ',')) > 10
A more common usage, on numeric fields, is to keep only the groups that have more than a threshold number of rows (having count(*) > 2) or whose sum exceeds some value (having sum(vacation_days) > 21).
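The difference between the two clauses shows up clearly on a toy data set. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL (an assumption), with simplified, hypothetical users / userroles tables and made-up role names: WHERE discards individual rows before grouping, while HAVING discards whole groups after aggregation.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the users / userapplicationroles tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE userroles (userid INTEGER, roleid TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO userroles VALUES (1, 'admin'), (1, 'editor'), (1, 'guest'),
                                 (2, 'admin');
""")

# WHERE filters rows first: the 'guest' row is gone before counting.
where_filtered = conn.execute("""
    SELECT u.id, COUNT(*) AS role_count
    FROM users u
    JOIN userroles ur ON ur.userid = u.id
    WHERE ur.roleid IN ('admin', 'editor')
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# HAVING filters groups afterwards: only users with more than 2 roles remain.
having_filtered = conn.execute("""
    SELECT u.id, COUNT(*) AS role_count
    FROM users u
    JOIN userroles ur ON ur.userid = u.id
    GROUP BY u.id
    HAVING COUNT(*) > 2
    ORDER BY u.id
""").fetchall()

print(where_filtered)   # [(1, 2), (2, 1)]
print(having_filtered)  # [(1, 3)]
```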

TSQL, join to multiple fields of which one could be NULL

I have a simple query:
SELECT * FROM Products p
LEFT JOIN SomeTable st ON st.SomeId = p.SomeId AND st.SomeOtherId = p.SomeOtherId
So far so good.
But SomeId, the first join column, can be NULL. In that case the check should be IS NULL, and that's where the join fails. I tried to use a CASE, but can't get that to work either.
Am I missing something simple here?
From Undocumented Query Plans: Equality Comparisons.
SELECT *
FROM Products p
LEFT JOIN SomeTable st
ON st.SomeOtherId = p.SomeOtherId
AND EXISTS (SELECT st.SomeId INTERSECT SELECT p.SomeId)
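The trick works because INTERSECT treats two NULLs as the same value, whereas = does not. Here is a minimal sketch of that behaviour, run against Python's built-in sqlite3 module rather than SQL Server (an assumption; SQLite follows the same NULL rule for INTERSECT). The Name and Label columns and all rows are invented for the example.

```python
import sqlite3

# Hypothetical rows; the second pair matches only if NULL is treated as equal to NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (Name TEXT, SomeId INTEGER, SomeOtherId INTEGER);
    CREATE TABLE SomeTable (Label TEXT, SomeId INTEGER, SomeOtherId INTEGER);
    INSERT INTO Products VALUES ('p1', 1, 10), ('p2', NULL, 20);
    INSERT INTO SomeTable VALUES ('s1', 1, 10), ('s2', NULL, 20);
""")

# Plain equality: NULL = NULL is not true, so p2 finds no partner.
plain_join = conn.execute("""
    SELECT p.Name, st.Label
    FROM Products p
    LEFT JOIN SomeTable st
           ON st.SomeId = p.SomeId
          AND st.SomeOtherId = p.SomeOtherId
    ORDER BY p.Name
""").fetchall()

# INTERSECT-based comparison: two NULLs count as the same value.
null_safe_join = conn.execute("""
    SELECT p.Name, st.Label
    FROM Products p
    LEFT JOIN SomeTable st
           ON st.SomeOtherId = p.SomeOtherId
          AND EXISTS (SELECT st.SomeId INTERSECT SELECT p.SomeId)
    ORDER BY p.Name
""").fetchall()

print(plain_join)      # [('p1', 's1'), ('p2', None)]
print(null_safe_join)  # [('p1', 's1'), ('p2', 's2')]
```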

T-SQL query one table, get presence or absence of other table value

I'm not sure what this type of query is called so I've been unable to search for it properly. I've got two tables, Table A has about 10,000 rows. Table B has a variable amount of rows.
I want to write a query that gets all of Table A's results but with an added column, the value of that column is a boolean that says whether the result also appears in Table B.
I've written this query which works but is slow, it doesn't use a boolean but rather a count that will be either zero or one. Any suggested improvements are gratefully accepted:
SELECT u.number,u.name,u.deliveryaddress,
(SELECT COUNT(productUserid)
FROM ProductUser
WHERE number = u.number and productid = @ProductId)
AS IsInPromo
FROM Users u
UPDATE
I've run the query with actual execution plan enabled, I'm not sure how to show the results but various costs are:
Nested Loops (left semi join): 29%
Clustered Index scan (User Table): 41%
Clustered Index Scan (ProductUser table): 29%
NUMBERS
There are 7366 users in the users table and currently 18 rows in the productUser table (although this will change and could be in the thousands)
You can use EXISTS to short circuit after the first row is found rather than COUNT-ing all matching rows.
SQL Server does not have a boolean datatype. The closest equivalent is BIT.
SELECT u.number,
u.name,
u.deliveryaddress,
CASE
WHEN EXISTS (SELECT *
FROM ProductUser
WHERE number = u.number
AND productid = @ProductId) THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END AS IsInPromo
FROM Users u
RE: "I'm not sure what this type of query is called". This will give a plan with a semi join. See Subqueries in CASE Expressions for more about this.
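The CASE WHEN EXISTS pattern can be tried without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module (an assumption; SQLite has no BIT type, so the flag is a plain 0 or 1), a named parameter in place of the T-SQL variable, and made-up rows. The helper name promo_flags is hypothetical.

```python
import sqlite3

# Hypothetical data: users 1 and 3 are enrolled for product 7, user 2 is not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ProductUser (productUserid INTEGER PRIMARY KEY,
                              number INTEGER, productid INTEGER);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO ProductUser VALUES (100, 1, 7), (101, 3, 7), (102, 2, 8);
""")


def promo_flags(connection, product_id):
    """Return every user with a 0/1 flag for membership in ProductUser."""
    return connection.execute("""
        SELECT u.number,
               u.name,
               CASE
                   WHEN EXISTS (SELECT 1
                                FROM ProductUser pu
                                WHERE pu.number = u.number
                                  AND pu.productid = :product_id) THEN 1
                   ELSE 0
               END AS IsInPromo
        FROM Users u
        ORDER BY u.number
    """, {"product_id": product_id}).fetchall()


flags = promo_flags(conn, 7)
print(flags)  # [(1, 'ann', 1), (2, 'bob', 0), (3, 'cy', 1)]
```

Every user appears exactly once, and the subquery only has to find one matching row per user to decide the flag.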
Which management system are you using?
Try this:
SELECT u.number,u.name,u.deliveryaddress,
case when COUNT(p.productUserid) > 0 then 1 else 0 end
FROM Users u
left join ProductUser p on p.number = u.number and productid = @ProductId
group by u.number,u.name,u.deliveryaddress
UPD: this could be faster using mssql
;with fff as
(
select distinct p.number from ProductUser p where p.productid = @ProductId
)
select u.number,u.name,u.deliveryaddress,
case when isnull(f.number, 0) = 0 then 0 else 1 end
from Users u left join fff f on f.number = u.number
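Since you seem concerned about performance, this query may perform faster, as it can use an index seek on both tables rather than an index scan. The product filter belongs in the ON clause; in the WHERE clause it would discard every user without a matching ProductUser row:
SELECT u.number,
u.name,
u.deliveryaddress,
CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
FROM Users u
LEFT JOIN ProductUser p ON p.number = u.number
AND p.productid = @ProductId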
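Where the product filter sits makes a visible difference on a LEFT JOIN. A minimal sketch, again with Python's built-in sqlite3 module standing in for SQL Server (an assumption) and made-up rows: a condition on the right-hand table in WHERE drops the unmatched users, while the same condition in ON keeps them.

```python
import sqlite3

# Hypothetical data: users 1 and 3 are enrolled for product 7, user 2 is not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ProductUser (productUserid INTEGER PRIMARY KEY,
                              number INTEGER, productid INTEGER);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO ProductUser VALUES (100, 1, 7), (101, 3, 7), (102, 2, 8);
""")

# Filter in WHERE: rows where p.productid is NULL or different are removed,
# so the LEFT JOIN behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT u.number
    FROM Users u
    LEFT JOIN ProductUser p ON p.number = u.number
    WHERE p.productid = 7
    ORDER BY u.number
""").fetchall()

# Filter in ON: unmatched users survive with NULLs on the right-hand side.
filter_in_on = conn.execute("""
    SELECT u.number,
           CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
    FROM Users u
    LEFT JOIN ProductUser p ON p.number = u.number AND p.productid = 7
    ORDER BY u.number
""").fetchall()

print(filter_in_where)  # [(1,), (3,)]
print(filter_in_on)     # [(1, 1), (2, 0), (3, 1)]
```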