Ordering data after an intersection Postgresql


I'm working on a db homework question. It asks that the data be in descending order. However, I'm using an intersection in my query because of the many-to-many relationship.
The schema for Genre is:
CREATE TABLE Genre (
movie_id integer REFERENCES Movie(id),
genre GenreType,
primary key (movie_id,genre)
);
My code is currently:
$genres = tokenise($argv[1], "&");
$i = 0;
$qry = "
(select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on (m.id = r.movie_id)
join Genre g on (m.id = g.movie_id)
where m.YEAR >= ".$startYear."
and m.YEAR <= ".$endYear."
and g.genre = '".$genres[$i]."')
";
$i++;
while ($i < count($genres)){
$qry = $qry."
intersect
(select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on (m.id = r.movie_id)
join Genre g on (m.id = g.movie_id)
where m.YEAR >= ".$startYear."
and m.YEAR <= ".$endYear."
and g.genre = '".$genres[$i]."')
";
$i++;
}
I'd like to order the final result with the statement
order by r.imdb_score desc, r.num_voted_users desc
However, tacking it onto the end of each select statement doesn't work (the output is still scrambled).

An intersect (or union or except) can only have a single ORDER BY at the end. Even if it "looks like" it belongs to the final query, it applies to the whole result, e.g.:
select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on m.id = r.movie_id
join Genre g on m.id = g.movie_id
where ...
intersect
select m.title, m.YEAR, m.content_rating, m.lang, r.imdb_score, r.num_voted_users
from Movie m
join Rating r on m.id = r.movie_id
join Genre g on m.id = g.movie_id
where ...
order by imdb_score desc, num_voted_users desc
This sorts the complete result of the intersect. Note that you can't use a table alias when referencing the columns, and that the column names are the ones from the first query.
Putting the individual queries between parentheses is not needed.
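A minimal sketch of the trailing ORDER BY, runnable with Python's built-in sqlite3 module instead of PostgreSQL and PHP (an assumption for illustration; the sample rows are hypothetical). It builds one SELECT per genre, joins them with INTERSECT, and appends a single ORDER BY after the last branch. Values are passed as bound parameters rather than concatenated into the SQL string.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the Movie/Rating/Genre schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Movie  (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE Rating (movie_id INTEGER REFERENCES Movie(id),
                             imdb_score REAL, num_voted_users INTEGER);
        CREATE TABLE Genre  (movie_id INTEGER REFERENCES Movie(id), genre TEXT,
                             PRIMARY KEY (movie_id, genre));
        INSERT INTO Movie  VALUES (1, 'Alpha', 2001), (2, 'Beta', 2002),
                                  (3, 'Gamma', 2003), (4, 'Delta', 2004);
        INSERT INTO Rating VALUES (1, 7.5, 100), (2, 8.1, 50),
                                  (3, 7.5, 900), (4, 9.0, 10);
        INSERT INTO Genre  VALUES (1, 'Action'), (1, 'Comedy'), (2, 'Action'),
                                  (2, 'Comedy'), (3, 'Action'), (3, 'Comedy'),
                                  (4, 'Action');
    """)
    return conn

# One branch of the INTERSECT; the same text is repeated once per genre.
BRANCH = """
    SELECT m.title, r.imdb_score, r.num_voted_users
    FROM Movie m
    JOIN Rating r ON m.id = r.movie_id
    JOIN Genre g ON m.id = g.movie_id
    WHERE m.year >= ? AND m.year <= ? AND g.genre = ?
"""

def movies_with_all_genres(conn, start_year, end_year, genres):
    # Branches are glued together with INTERSECT; no parentheses are needed.
    sql = "INTERSECT".join([BRANCH] * len(genres))
    # A single ORDER BY after the last branch sorts the whole INTERSECT result.
    # It refers to the output column names, not to the r. alias.
    sql += "ORDER BY imdb_score DESC, num_voted_users DESC"
    params = []
    for genre in genres:
        params += [start_year, end_year, genre]
    return [title for title, _, _ in conn.execute(sql, params)]
```

With the sample data, movies_with_all_genres(build_demo_db(), 2000, 2010, ["Action", "Comedy"]) returns ["Beta", "Gamma", "Alpha"]: Beta has the highest score, and the 7.5 tie is broken by num_voted_users.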
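But the use of intersect seems strange to begin with.
It seems you are using it to require several genres at once. As far as I can tell, you could replace that with a single query that uses where ... and genre in ('genre1', 'genre2', ....) together with group by ... having count(distinct g.genre) = <number of genres>; without the having, a plain IN would also return movies that have only one of the genres.
It will be easier to understand and will probably also be a lot faster.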
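A sketch of the single-query alternative, again with Python's sqlite3 module and hypothetical sample data. On its own, genre IN (...) matches movies that have any of the genres, whereas the INTERSECT keeps only movies that have all of them; the sketch preserves the "all" semantics with GROUP BY and HAVING COUNT(DISTINCT g.genre) equal to the number of requested genres.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the Movie/Rating/Genre schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Movie  (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE Rating (movie_id INTEGER REFERENCES Movie(id),
                             imdb_score REAL, num_voted_users INTEGER);
        CREATE TABLE Genre  (movie_id INTEGER REFERENCES Movie(id), genre TEXT,
                             PRIMARY KEY (movie_id, genre));
        INSERT INTO Movie  VALUES (1, 'Alpha', 2001), (2, 'Beta', 2002),
                                  (3, 'Gamma', 2003), (4, 'Delta', 2004);
        INSERT INTO Rating VALUES (1, 7.5, 100), (2, 8.1, 50),
                                  (3, 7.5, 900), (4, 9.0, 10);
        INSERT INTO Genre  VALUES (1, 'Action'), (1, 'Comedy'), (2, 'Action'),
                                  (2, 'Comedy'), (3, 'Action'), (3, 'Comedy'),
                                  (4, 'Action');
    """)
    return conn

def movies_with_all_genres(conn, start_year, end_year, genres):
    wanted = sorted(set(genres))  # ignore duplicate genre names in the input
    placeholders = ", ".join("?" for _ in wanted)  # one "?" per genre
    sql = f"""
        SELECT m.title, r.imdb_score, r.num_voted_users
        FROM Movie m
        JOIN Rating r ON m.id = r.movie_id
        JOIN Genre g ON m.id = g.movie_id
        WHERE m.year >= ? AND m.year <= ?
          AND g.genre IN ({placeholders})
        GROUP BY m.id, m.title, r.imdb_score, r.num_voted_users
        -- keep only movies that matched every requested genre
        HAVING COUNT(DISTINCT g.genre) = ?
        ORDER BY r.imdb_score DESC, r.num_voted_users DESC
    """
    params = [start_year, end_year, *wanted, len(wanted)]
    return [title for title, _, _ in conn.execute(sql, params)]
```

Because this is an ordinary SELECT rather than a compound query, the ORDER BY can use the r. alias directly.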

You can still do something like this:
SELECT *
FROM
(
[Your_Entire_Query_With_All_Your_Intersects]
) T
ORDER BY [Your_Conditions];
But I don't know exactly what you want to do. Your query seems quite odd to me. Why the intersect in the first place?
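A small sketch of the wrapping technique using Python's sqlite3 module; tables a and b and their rows are hypothetical. Because the INTERSECT becomes a derived table, the outer query can order by its columns through the alias.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Two hypothetical tables whose rows partly overlap."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (title TEXT, score REAL);
        CREATE TABLE b (title TEXT, score REAL);
        INSERT INTO a VALUES ('x', 1.0), ('y', 3.0), ('z', 2.0);
        INSERT INTO b VALUES ('y', 3.0), ('z', 2.0), ('w', 5.0);
    """)
    return conn

def ordered_intersection(conn):
    sql = """
        SELECT *
        FROM (
            SELECT title, score FROM a
            INTERSECT
            SELECT title, score FROM b
        ) AS t  -- alias for the wrapped result (T in the answer)
        ORDER BY t.score DESC
    """
    return conn.execute(sql).fetchall()
```

Only y and z appear in both tables, so the result is [("y", 3.0), ("z", 2.0)].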

Related

Find a difference between 2 tables

I want to check that the poi_equipement table (relationship table) corresponds to the data in the data table (i.e. a two-way check).
https://dbfiddle.uk/gFMjbIpX
I want to detect that wc (in poi_equipement) is extra, because it is not present in the data table, and that hotel is missing from poi_equipement compared to the data table.
I don't understand why my EXCEPT query only returns hotel.
I want it to return both hotel and wc.
select object from data where subject = 'url1'
except
select subject from poi_equipement inner join equipement on poi_equipement.equipement_id = equipement.id;
Ideally, I want to know whether a difference is in poi_equipement, in data, or in both tables.

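EXCEPT only works in one direction: it returns the rows of the first query that are missing from the second, which is why you only get hotel. To check both directions, a full outer join will do: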
with params as (
select 'url1' as subject),
data_object as (
select d.object
from data d
join params prm
on d.subject = prm.subject),
equipment_subject as (
select e.subject
from poi_equipement pe
join poi p
on pe.poi_id = p.id
join equipement e
on pe.equipement_id = e.id
join params prm
on p.id_url = prm.subject)
select d.object as data,
e.subject as poi_equipment
from data_object d
full outer
join equipment_subject e
on d.object = e.subject
where d.object is null
or e.subject is null;
Result:
data |poi_equipment|
-----+-------------+
hotel| |
|wc |
Remove the where clause if you also want to see the items that are present in both places.
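The same list of mismatches can also be produced without FULL OUTER JOIN (which SQLite only gained in version 3.39.0) by running EXCEPT in both directions and combining the two results with UNION ALL. A minimal sketch with Python's sqlite3 module; the table layouts and rows are hypothetical reconstructions, not the contents of the linked fiddle.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical tables shaped after the question's description."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data (subject TEXT, object TEXT);
        CREATE TABLE poi (id INTEGER PRIMARY KEY, id_url TEXT);
        CREATE TABLE equipement (id INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE poi_equipement (poi_id INTEGER, equipement_id INTEGER);
        INSERT INTO data VALUES ('url1', 'hotel'), ('url1', 'parking');
        INSERT INTO poi VALUES (1, 'url1');
        INSERT INTO equipement VALUES (10, 'parking'), (11, 'wc');
        INSERT INTO poi_equipement VALUES (1, 10), (1, 11);
    """)
    return conn

def find_mismatches(conn, subject):
    sql = """
        WITH data_object AS (
            SELECT object AS item FROM data WHERE subject = :subject
        ),
        equipment_subject AS (
            SELECT e.subject AS item
            FROM poi_equipement pe
            JOIN poi p ON pe.poi_id = p.id
            JOIN equipement e ON pe.equipement_id = e.id
            WHERE p.id_url = :subject
        )
        -- direction 1: in data but not in poi_equipement
        SELECT item AS missing_item, 'poi_equipement' AS missing_from
        FROM (SELECT item FROM data_object
              EXCEPT SELECT item FROM equipment_subject) AS only_in_data
        UNION ALL
        -- direction 2: in poi_equipement but not in data
        SELECT item, 'data'
        FROM (SELECT item FROM equipment_subject
              EXCEPT SELECT item FROM data_object) AS only_in_poi
        ORDER BY missing_item
    """
    return conn.execute(sql, {"subject": subject}).fetchall()
```

With these rows, find_mismatches(build_demo_db(), "url1") reports hotel as missing from poi_equipement and wc as missing from data, while parking (present in both) is not listed.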

How to aggregate multiple values in this postgresql query?

This is my query:
SELECT "vehicle"."id",
"vehicle"."description",
"tag"."id" AS "tag_id",
"tag"."name" AS "tag_name"
FROM "vehicle"
INNER JOIN "vehicle_tag_pivot" ON "vehicle"."id" = "vehicle_tag_pivot"."vehicle_id"
INNER JOIN "tag" ON "vehicle_tag_pivot"."tag_id" = "tag"."id"
WHERE "tag"."name" IN ('car', 'busses')
AND "vehicle"."category_id" = '1E4FD2C5-C32E-4E3F-91B3-45478BCF0185'
I only have one vehicle in my database. It has two tags -> car and busses (this is test data).
So when I run the query, it returns the same vehicle twice, once for each of its 2 tags.
How do I get it to return the vehicle once? I don't really want to return the tag_name. I only want to filter and return all the vehicles that have both tags, car and busses. If a vehicle has both tags, the query should return that vehicle once. Instead, it returns the same vehicle twice, showing its tags.

This should work for you.
SELECT v.*
FROM "vehicle" as "v"
where v.id in (
select vt.vehicle_id
from vehicle_tag_pivot vt
join tag t on vt.tag_id = t.id
where t.name in ('car', 'busses')
group by vt.vehicle_id
having count (*) = 2
)
and v.category_id = '1E4FD2C5-C32E-4E3F-91B3-45478BCF0185'
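A minimal check of this GROUP BY/HAVING filter with Python's sqlite3 module; the three vehicles and their tags are hypothetical test data. COUNT(*) = 2 relies on the pivot table never holding the same tag twice for one vehicle, which the sketch enforces with a primary key.

```python
import sqlite3

CATEGORY = "1E4FD2C5-C32E-4E3F-91B3-45478BCF0185"

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical vehicles: v1 and v3 have both tags, v2 only has 'car'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE vehicle (id TEXT PRIMARY KEY, description TEXT, category_id TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE vehicle_tag_pivot (vehicle_id TEXT, tag_id INTEGER,
                                        PRIMARY KEY (vehicle_id, tag_id));
        INSERT INTO tag VALUES (1, 'car'), (2, 'busses'), (3, 'truck');
        INSERT INTO vehicle VALUES ('v1', 'both tags', '{CATEGORY}'),
                                   ('v2', 'car only', '{CATEGORY}'),
                                   ('v3', 'both tags plus truck', '{CATEGORY}');
        INSERT INTO vehicle_tag_pivot VALUES ('v1', 1), ('v1', 2), ('v2', 1),
                                             ('v3', 1), ('v3', 2), ('v3', 3);
    """)
    return conn

def vehicles_with_all_tags(conn, category_id):
    sql = """
        SELECT v.id, v.description
        FROM vehicle AS v
        WHERE v.id IN (
            SELECT vt.vehicle_id
            FROM vehicle_tag_pivot vt
            JOIN tag t ON vt.tag_id = t.id
            WHERE t.name IN ('car', 'busses')
            GROUP BY vt.vehicle_id
            HAVING COUNT(*) = 2  -- both required tags matched
        )
        AND v.category_id = ?
        ORDER BY v.id
    """
    return conn.execute(sql, (category_id,)).fetchall()
```

Each qualifying vehicle comes back exactly once, because the outer query never joins to the tag tables.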

Do not JOIN the tags in the outer query: the join produces one row per matching tag, which is where the duplicates come from. Put all the tag logic into WHERE EXISTS(...) or something similar.
Here is a comparison of two scalar subqueries in the WHERE clause; try this (important: it assumes that a vehicle cannot have the same tag twice, so the counts can be compared):
WITH required_tags(val) AS (
VALUES ('car'),
('busses')
)
SELECT "vehicle"."id",
"vehicle"."description"
FROM "vehicle"
WHERE "vehicle"."category_id" = '1E4FD2C5-C32E-4E3F-91B3-45478BCF0185'
AND (
-- count matching tags...
SELECT count(1)
FROM "vehicle_tag_pivot"
INNER JOIN "tag" ON "vehicle_tag_pivot"."tag_id" = "tag"."id"
WHERE "vehicle"."id" = "vehicle_tag_pivot"."vehicle_id"
AND "tag"."name" IN (SELECT val FROM required_tags)
) = (
-- ...equals to count required tags
SELECT count(1)
FROM required_tags
)
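A sketch of the count comparison with Python's sqlite3 module, using hypothetical test data. The tag columns are left out of the SELECT because the outer query has no tag table to read them from; only the correlated subquery touches the pivot and tag tables.

```python
import sqlite3

CATEGORY = "1E4FD2C5-C32E-4E3F-91B3-45478BCF0185"

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical vehicles: v1 and v3 have both tags, v2 only has 'car'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE vehicle (id TEXT PRIMARY KEY, description TEXT, category_id TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE vehicle_tag_pivot (vehicle_id TEXT, tag_id INTEGER,
                                        PRIMARY KEY (vehicle_id, tag_id));
        INSERT INTO tag VALUES (1, 'car'), (2, 'busses'), (3, 'truck');
        INSERT INTO vehicle VALUES ('v1', 'both tags', '{CATEGORY}'),
                                   ('v2', 'car only', '{CATEGORY}'),
                                   ('v3', 'both tags plus truck', '{CATEGORY}');
        INSERT INTO vehicle_tag_pivot VALUES ('v1', 1), ('v1', 2), ('v2', 1),
                                             ('v3', 1), ('v3', 2), ('v3', 3);
    """)
    return conn

def vehicles_with_all_tags(conn, category_id):
    sql = """
        WITH required_tags(val) AS (VALUES ('car'), ('busses'))
        SELECT vehicle.id, vehicle.description
        FROM vehicle
        WHERE vehicle.category_id = ?
          AND (
              -- number of required tags this vehicle has...
              SELECT COUNT(1)
              FROM vehicle_tag_pivot
              JOIN tag ON vehicle_tag_pivot.tag_id = tag.id
              WHERE vehicle.id = vehicle_tag_pivot.vehicle_id
                AND tag.name IN (SELECT val FROM required_tags)
          ) = (
              -- ...must equal the number of required tags
              SELECT COUNT(1) FROM required_tags
          )
        ORDER BY vehicle.id
    """
    return conn.execute(sql, (category_id,)).fetchall()
```

Adding a tag to the required list only means adding a row to the VALUES list; the count on the right-hand side follows automatically.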

What's the difference between "NOT EXISTS" and "NOT IN"

For school I needed to get all players who never played a match for the team with number 1. So I thought I would look for all players who played a match for the team with number 1 in a subquery. This is my subquery:
select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1)
Now to extract the players who never played a match for team 1 I thought I could use the "NOT EXISTS" operator. My query then looked like this:
select spelersnr, naam
from spelers
where not exists (select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1))
order by naam, spelersnr
But this query didn't return the result I needed (in fact it didn't return anything). Then I tried this query:
select spelersnr, naam
from spelers
where (spelersnr, naam) not in (select distinct s.spelersnr, naam
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1))
order by naam, spelersnr
This query returned the result I needed, but now I don't really understand the difference between "NOT EXISTS" and "NOT IN".

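You were close. One difference between NOT EXISTS and NOT IN is how the SQL may be executed: NOT IN can be much slower (depending on the optimizer and on the size of the data sets involved) because it compares each row to each of the items returned by the subquery.
In contrast, EXISTS or NOT EXISTS only has to find a single matching row, based on a WHERE clause that refers to the outer query.
All that's missing in your example is that WHERE clause in the EXISTS. Without it, the subquery does not depend on the current player: it returns rows as soon as anyone has played for team 1, so NOT EXISTS is false for every player and the query returns nothing.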
select spelersnr, naam
from spelers as sp
where not exists (select 1
from spelers s inner join wedstrijden w on (s.spelersnr = w.spelersnr and teamnr = 1)
where sp.spelersnr = s.spelersnr and sp.naam = s.naam)
order by naam, spelersnr
Hope this helps :)
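A small reproduction with Python's sqlite3 module. The three players and two matches are hypothetical, and the subqueries compare only spelersnr (the primary key here) rather than (spelersnr, naam). It shows the uncorrelated NOT EXISTS returning nothing, while the correlated NOT EXISTS and NOT IN return the same players.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical data: Ann played for team 1, Bob for team 2, Cor never played."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE spelers (spelersnr INTEGER PRIMARY KEY, naam TEXT);
        CREATE TABLE wedstrijden (wedstrijdnr INTEGER PRIMARY KEY,
                                  spelersnr INTEGER, teamnr INTEGER);
        INSERT INTO spelers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cor');
        INSERT INTO wedstrijden VALUES (10, 1, 1), (11, 2, 2);
    """)
    return conn

# Players who played at least one match for team 1.
SUBQUERY = """
    SELECT s.spelersnr
    FROM spelers s JOIN wedstrijden w
      ON s.spelersnr = w.spelersnr AND w.teamnr = 1
"""

def uncorrelated_not_exists(conn):
    # The subquery does not reference the outer row, so it returns the same
    # non-empty result for every player and NOT EXISTS is false for all of them.
    return conn.execute(f"""
        SELECT spelersnr, naam FROM spelers
        WHERE NOT EXISTS ({SUBQUERY})
        ORDER BY naam, spelersnr""").fetchall()

def correlated_not_exists(conn):
    # The extra WHERE ties the subquery to the current outer player.
    return conn.execute(f"""
        SELECT spelersnr, naam FROM spelers AS sp
        WHERE NOT EXISTS ({SUBQUERY} WHERE sp.spelersnr = s.spelersnr)
        ORDER BY naam, spelersnr""").fetchall()

def not_in(conn):
    return conn.execute(f"""
        SELECT spelersnr, naam FROM spelers
        WHERE spelersnr NOT IN ({SUBQUERY})
        ORDER BY naam, spelersnr""").fetchall()
```

Both correct variants return Bob and Cor; the uncorrelated one returns an empty list, matching what the question describes.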

Why using COUNT with SELF JOIN gives different result value

Can somebody explain to me why using a SELF JOIN with COUNT gives me a different result than just using COUNT?
Both queries use the same table, which has a ControlNo column whose values are NOT unique.
This query gives me counts that add up to 15586.
select (Select COUNT(ControlNo)
from tblQuotes Q1
where Q1.ControlNo = a.ControlNo
) QuotedTotal
FROM tblQuotes a
inner join lstlines l on a.LineGUID = l.LineGUID
where l.LineName = 'EARTHQUAKE' AND YEAR(EffectiveDate) = 2016
But if I run this query, it gives me a total count of 15095.
select COUNT(ControlNo) as QuotedTotal
from tblQuotes a
inner join lstlines l on a.LineGUID = l.LineGUID
where l.LineName = 'EARTHQUAKE' AND YEAR(EffectiveDate) = 2016
What exactly is changing the total, and why?
And why would I use the first scenario?
And is there any way to modify the first query to get the total of 15586 without breaking it down by row?
Thank you

It seems to be because the field ControlNo is not unique: some records share the same value, but not all of them join against the lstlines table with that condition. So basically your last query does:
SELECT COUNT(a.ControlNo)
FROM lstlines l
INNER JOIN tblQuotes a ON a.LineGUID = l.LineGUID
WHERE l.LineName = 'EARTHQUAKE' AND YEAR(EffectiveDate) = 2016
While the first one basically does:
SELECT COUNT(b.ControlNo)
FROM lstlines l
INNER JOIN tblQuotes a ON a.LineGUID = l.LineGUID
INNER JOIN tblQuotes b ON a.ControlNo = b.ControlNo
WHERE l.LineName = 'EARTHQUAKE' AND YEAR(a.EffectiveDate) = 2016
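As you can see, in this second query you are counting not only the rows that match your lstlines table, but also all the rows in tblQuotes that have the same ControlNo as those that match against lstlines. This second form should also answer your last question: it returns the total (15586 in your case) as a single number instead of one count per row.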
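A small reproduction with Python's sqlite3 module and hypothetical rows. SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it. Quote C1 exists on two lines, so it is counted once by the plain COUNT but twice by the correlated subquery and by the self join.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical quotes: C1 appears on two lines, C3 is from the wrong year."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lstlines (LineGUID TEXT PRIMARY KEY, LineName TEXT);
        CREATE TABLE tblQuotes (ControlNo TEXT, LineGUID TEXT, EffectiveDate TEXT);
        INSERT INTO lstlines VALUES ('G1', 'EARTHQUAKE'), ('G2', 'FIRE');
        INSERT INTO tblQuotes VALUES
            ('C1', 'G1', '2016-01-10'),
            ('C1', 'G2', '2016-02-10'),  -- same ControlNo, other line
            ('C2', 'G1', '2016-03-10'),
            ('C3', 'G1', '2015-05-10');  -- wrong year
    """)
    return conn

FILTER = "l.LineName = 'EARTHQUAKE' AND strftime('%Y', a.EffectiveDate) = '2016'"

def plain_count(conn):
    sql = f"""SELECT COUNT(a.ControlNo) FROM tblQuotes a
              JOIN lstlines l ON a.LineGUID = l.LineGUID WHERE {FILTER}"""
    return conn.execute(sql).fetchone()[0]

def per_row_counts(conn):
    # The question's first query: one correlated count per matching quote.
    sql = f"""SELECT (SELECT COUNT(Q1.ControlNo) FROM tblQuotes Q1
                      WHERE Q1.ControlNo = a.ControlNo)
              FROM tblQuotes a JOIN lstlines l ON a.LineGUID = l.LineGUID
              WHERE {FILTER}"""
    return [n for (n,) in conn.execute(sql)]

def self_join_count(conn):
    # The answer's rewrite: every quote sharing a ControlNo with a match.
    sql = f"""SELECT COUNT(b.ControlNo) FROM lstlines l
              JOIN tblQuotes a ON a.LineGUID = l.LineGUID
              JOIN tblQuotes b ON a.ControlNo = b.ControlNo
              WHERE {FILTER}"""
    return conn.execute(sql).fetchone()[0]
```

Here plain_count returns 2, while the per-row counts sum to 3, which is exactly what self_join_count returns in one row.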

TSQL efficiency - INNER JOIN replaced by EXISTS

Can the following be rewritten to be more efficient?
I would use EXISTS if I didn't need fields from country but I do need those fields, and am not sure how to write this to make it more efficient.
SELECT distinct
p.ProvinceID,
p.Abbv as RegionCode,
p.name as RegionName,
cn.Code as CountryCode,
cn.Name as CountryName
FROM dbo.provinces AS p
INNER JOIN dbo.Countries AS cn ON p.CountryID = cn.CountryID
INNER JOIN dbo.Cities c on c.ProvinceID = p.ProvinceID
INNER JOIN dbo.Listings AS l ON l.CityID = c.CityID
WHERE l.IsActive = 1 AND l.IsApproved = 1

There are two things to note:
You're joining to dbo.Listings, which produces many rows per province, so you need DISTINCT (usually an expensive operator)
Any table whose columns are not in the select list can be moved into an EXISTS (though the query planner may effectively do this for you anyway)
So try this:
SELECT
p.ProvinceID,
p.Abbv as RegionCode,
p.name as RegionName,
cn.Code as CountryCode,
cn.Name as CountryName
FROM dbo.provinces AS p
INNER JOIN
dbo.Countries AS cn
ON p.CountryID = cn.CountryID
WHERE EXISTS (SELECT 1 FROM
dbo.Listings l
INNER JOIN dbo.Cities c
on l.CityID = c.CityID
WHERE c.ProvinceID = p.ProvinceID
AND l.IsActive = 1 AND l.IsApproved = 1
)
Check the query plans before and after: the query planner might be smart enough to do this anyway, but at least you have removed your DISTINCT.
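A quick way to confirm that the EXISTS rewrite returns the same rows as the original DISTINCT query, sketched with Python's sqlite3 module. The dbo. prefix is dropped because SQLite has no such schema, and the tables and rows are hypothetical. This checks results only; comparing plans still has to happen on the real SQL Server database.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical data: Ontario has 3 live listings, Quebec none, BC one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY, Code TEXT, Name TEXT);
        CREATE TABLE provinces (ProvinceID INTEGER PRIMARY KEY, CountryID INTEGER,
                                Abbv TEXT, name TEXT);
        CREATE TABLE Cities (CityID INTEGER PRIMARY KEY, ProvinceID INTEGER);
        CREATE TABLE Listings (ListingID INTEGER PRIMARY KEY, CityID INTEGER,
                               IsActive INTEGER, IsApproved INTEGER);
        INSERT INTO Countries VALUES (1, 'CA', 'Canada');
        INSERT INTO provinces VALUES (1, 1, 'ON', 'Ontario'), (2, 1, 'QC', 'Quebec'),
                                     (3, 1, 'BC', 'British Columbia');
        INSERT INTO Cities VALUES (10, 1), (11, 1), (20, 2), (30, 3);
        INSERT INTO Listings VALUES (100, 10, 1, 1), (101, 10, 1, 1),
                                    (102, 11, 1, 1), (103, 20, 1, 0),
                                    (104, 30, 1, 1);
    """)
    return conn

COLUMNS = """p.ProvinceID, p.Abbv AS RegionCode, p.name AS RegionName,
             cn.Code AS CountryCode, cn.Name AS CountryName"""

def with_distinct(conn):
    # Original shape: joining Listings multiplies rows, DISTINCT collapses them.
    sql = f"""SELECT DISTINCT {COLUMNS}
              FROM provinces AS p
              JOIN Countries AS cn ON p.CountryID = cn.CountryID
              JOIN Cities c ON c.ProvinceID = p.ProvinceID
              JOIN Listings AS l ON l.CityID = c.CityID
              WHERE l.IsActive = 1 AND l.IsApproved = 1
              ORDER BY p.ProvinceID"""
    return conn.execute(sql).fetchall()

def with_exists(conn):
    # Rewrite: Listings/Cities only filter, so they move into EXISTS.
    sql = f"""SELECT {COLUMNS}
              FROM provinces AS p
              JOIN Countries AS cn ON p.CountryID = cn.CountryID
              WHERE EXISTS (SELECT 1 FROM Listings l
                            JOIN Cities c ON l.CityID = c.CityID
                            WHERE c.ProvinceID = p.ProvinceID
                              AND l.IsActive = 1 AND l.IsApproved = 1)
              ORDER BY p.ProvinceID"""
    return conn.execute(sql).fetchall()
```

Both functions return Ontario and British Columbia once each; Quebec drops out because its only listing is not approved.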

The following will often perform even better by providing the optimizer with more useful information:
SELECT
p.ProvinceID,
p.Abbv as RegionCode,
p.name as RegionName,
cn.Code as CountryCode,
cn.Name as CountryName
FROM dbo.provinces AS p
INNER JOIN
dbo.Countries AS cn
ON p.CountryID = cn.CountryID
INNER JOIN (
SELECT
c.ProvinceID
FROM
dbo.Listings l
INNER JOIN dbo.Cities c
on l.CityID = c.CityID
WHERE l.IsActive = 1 AND l.IsApproved = 1
GROUP BY
c.ProvinceID
) list
on list.ProvinceID = p.ProvinceID
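A result check for the grouped derived table, sketched with Python's sqlite3 module against the original DISTINCT query. The dbo. prefix is dropped and the data is hypothetical. Inside the derived table the province column has to come from Cities (c.ProvinceID), because the outer alias p is not visible there.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Hypothetical data: Ontario has 3 live listings, Quebec none, BC one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY, Code TEXT, Name TEXT);
        CREATE TABLE provinces (ProvinceID INTEGER PRIMARY KEY, CountryID INTEGER,
                                Abbv TEXT, name TEXT);
        CREATE TABLE Cities (CityID INTEGER PRIMARY KEY, ProvinceID INTEGER);
        CREATE TABLE Listings (ListingID INTEGER PRIMARY KEY, CityID INTEGER,
                               IsActive INTEGER, IsApproved INTEGER);
        INSERT INTO Countries VALUES (1, 'CA', 'Canada');
        INSERT INTO provinces VALUES (1, 1, 'ON', 'Ontario'), (2, 1, 'QC', 'Quebec'),
                                     (3, 1, 'BC', 'British Columbia');
        INSERT INTO Cities VALUES (10, 1), (11, 1), (20, 2), (30, 3);
        INSERT INTO Listings VALUES (100, 10, 1, 1), (101, 10, 1, 1),
                                    (102, 11, 1, 1), (103, 20, 1, 0),
                                    (104, 30, 1, 1);
    """)
    return conn

COLUMNS = """p.ProvinceID, p.Abbv AS RegionCode, p.name AS RegionName,
             cn.Code AS CountryCode, cn.Name AS CountryName"""

def with_distinct(conn):
    sql = f"""SELECT DISTINCT {COLUMNS}
              FROM provinces AS p
              JOIN Countries AS cn ON p.CountryID = cn.CountryID
              JOIN Cities c ON c.ProvinceID = p.ProvinceID
              JOIN Listings AS l ON l.CityID = c.CityID
              WHERE l.IsActive = 1 AND l.IsApproved = 1
              ORDER BY p.ProvinceID"""
    return conn.execute(sql).fetchall()

def with_grouped_derived_table(conn):
    # GROUP BY yields one row per province with live listings, so the outer
    # join cannot multiply rows and no DISTINCT is needed.
    sql = f"""SELECT {COLUMNS}
              FROM provinces AS p
              JOIN Countries AS cn ON p.CountryID = cn.CountryID
              JOIN (SELECT c.ProvinceID
                    FROM Listings l
                    JOIN Cities c ON l.CityID = c.CityID
                    WHERE l.IsActive = 1 AND l.IsApproved = 1
                    GROUP BY c.ProvinceID) AS list
                ON list.ProvinceID = p.ProvinceID
              ORDER BY p.ProvinceID"""
    return conn.execute(sql).fetchall()
```

Whether this variant is actually faster than the EXISTS version depends on the engine and data, so the plans are still worth comparing.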