Avoid nested loop in PostgreSQL - postgresql

See query below
Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3
ON a2.TeacherID=a3.TeacherID
The subqueries must be used for deduping purpose.
This query run fine in PostgreSQL. However, if I add a condition between the student and teacher table, according to the execution plan, Postgres will wrongly nested loop join the student and teach tables which have no direct relationship. For example:
Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3 ON
a2.TeacherID=a3.TeacherID
WHERE Teacher_Age>=Student_Age
This query will take forever to run. However, if I replace the subqueries with the tables, it'll run very fast. Without using temp tables to store the deduping result, is there a way to to avoid the nested loop in this situation?
Thank you for your help.

You're making the database perform a lot of unnecessary work to accomplish your goal. Instead of doing 3 different SELECT DISTINCT sub-queries all joined together, try joining the base tables directly to each other and let it handle the DISTINCT part only once. If your tables have proper indexes on the ID fields, this should run rather quick.
SELECT COUNT(1)
FROM (
SELECT DISTINCT s.Student_ID, c.CourseID, t.TeacherID
FROM student s
JOIN courses c ON s.CourseID = c.CourseID
JOIN teachers t ON c.TeacherID = t.TeacherID
WHERE t.Teacher_Age >= s.StudentAge
) a

Related

postgresql left join but dont fetch if matching condition found

I have a bit of a complicated scenario. I have two tables, employee and agency. An employee may or may not have an agency, but if an employee has an agency I want the select clause to check another condition on the agency, but if the employee does not have an agency its fine I want to fetch the employee. I'm not sure how to write the select statement for this. This is what I have come up with so far
select * from employee e left join
agency a on a.id = e.agencyID and a.valid = true;
However the problem with this is that it fetches both employees without agencies which is fine, but it also fetches employees with agencies where a.valid = false. The only option I can think of is to do an union but I'm looking for something more simpler.
A UNION could actually be the solution that performs best, but you can write the query without UNION like this:
select *
from employee e
left join agency a
on a.id = e.agencyID
where coalesce(a.valid, true);
That will accept agencies where valid IS NULL, that is, result rows where the agency part was substituted with NULLs by the outer join.
You want except the condition that both table match(agency.id = employee.agencyID) and also agency.id is false. The following query will express the condition.
SELECT
e.*,
a.*
FROM
employee e
LEFT JOIN agency a ON a.id = e.agencyID
WHERE
NOT EXISTS (
SELECT
1
FROM
agency
WHERE
a.id = e.agencyID
AND a.valid IS FALSE)
ORDER BY
e.id;

How to find in a many to many relation all the identical values in a column and join the table with other three tables?

I have a many to many relation with three columns, (owner_id,property_id,ownership_perc) and for this table applies (many owners have many properties).
So I would like to find all the owner_id who has many properties (property_id) and connect them with other three tables (Table 1,3,4) in order to get further information for the requested result.
All the tables that I'm using are
Table 1: owner (id_owner,name)
Table 2: owner_property (owner_id,property_id,ownership_perc)
Table 3: property(id_property,building_id)
Table 4: building(id_building,address,region)
So, when I'm trying it like this, the query runs but it returns empty.
SELECT address,region,name
FROM owner_property
JOIN property ON owner_property.property_id = property.id_property
JOIN owner ON owner.id_owner = owner_property.owner_id
JOIN building ON property.building_id=building.id_building
GROUP BY owner_id,address,region,name
HAVING count(owner_id) > 1
ORDER BY owner_id;
Only when I'm trying the code below, it returns the owner_id who has many properties (see image below) but without joining it with the other three tables:
SELECT a.*
FROM owner_property a
JOIN (SELECT owner_id, COUNT(owner_id)
FROM owner_property
GROUP BY owner_id
HAVING COUNT(owner_id)>1) b
ON a.owner_id = b.owner_id
ORDER BY a.owner_id,property_id ASC;
So, is there any suggestion on what I'm doing wrong when I'm joining the tables? Thank you!
This query:
SELECT owner_id
FROM owner_property
GROUP BY owner_id
HAVING COUNT(property_id) > 1
returns all the owner_ids with more than 1 property_ids.
If there is a case of duplicates in the combination of owner_id and property_id then instead of COUNT(property_id) use COUNT(DISTINCT property_id) in the HAVING clause.
So join it to the other tables:
SELECT b.address, b.region, o.name
FROM (
SELECT owner_id
FROM owner_property
GROUP BY owner_id
HAVING COUNT(property_id) > 1
) t
INNER JOIN owner_property op ON op.owner_id = t.owner_id
INNER JOIN property p ON op.property_id = p.id_property
INNER JOIN owner o ON o.id_owner = op.owner_id
INNER JOIN building b ON p.building_id = b.id_building
ORDER BY op.owner_id, op.property_id ASC;
Always qualify the column names with the table name/alias.
You can try to use a correlated subquery that counts the ownerships with EXISTS in the WHERE clause.
SELECT b1.address,
b1.region,
o1.name
FROM owner_property op1
INNER JOIN owner o1
ON o1.id_owner = op1.owner_id
INNER JOIN property p1
ON p1.id_property = op1.property_id
INNER JOIN building b1
ON b1.id_building = p1.building_id
WHERE EXISTS (SELECT ''
FROM owner_property op2
WHERE op2.owner_id = op1.owner_id
HAVING count(*) > 1);

How to count amount of results in Postgresql ignoring one to many join

I have a pretty big query with multiple joins, and I need to make also a query with same joins but without limit/offset functionality just to take a total amount of rows for pagination.
I'm going to simplify query here to single join and omit all where clauses.
I have 5 offers and each offer has 2 events.
Main query:
SELECT o.id AS "id",
o.display_name AS "displayName",
o.offer_hidden AS "offerHidden",
o.offer_type AS "offerType",
array_to_json(o.countries) AS "countries",
string_agg(distinct concat(COALESCE(e.event_id::text, ''), ',, ', COALESCE(e.event_tag::text, ''), ',, ',
COALESCE(e.payout::text, '')), ';;') AS "events"
FROM offer o
INNER JOIN (SELECT e.event_id, e.event_tag, e.payout, e.offer_id FROM event e ORDER BY e.offer_id ASC) e
ON e.offer_id = o.id
GROUP BY o.id
As you can see here I'm making concat and string_add for events to get a single record for each offer
Count query:
SELECT count(1) AS count
FROM offer o
INNER JOIN (SELECT e.event_id, e.event_tag, e.payout, e.offer_id FROM event e ORDER BY e.offer_id ASC) e
ON e.offer_id = o.id
Here I'm trying to make a query lightweight as possible omitting all selects and using the only count, but I'm getting count 10 as each offer has 2 events (5*2 = 10).
Question is it possible to make count only by the main table, but still using data from joins for filtering/ordering?
Updated: I know I can add same concat and string_agg to the count query, but this query should be lightweight as it going to query all records with limit/offset
Updated: looks like I found a possible solution using distinct
SELECT count(distinct o.id) AS count
FROM offer o
INNER JOIN (SELECT e.event_id, e.event_tag, e.payout, e.offer_id FROM event e ORDER BY e.offer_id ASC) e
ON e.offer_id = o.id
but not sure is the best way to do it
You should use a semi-join:
SELECT count(*) AS count
FROM offer o
WHERE EXISTS (SELECT 1 FROM event e WHERE e.offer_id = o.id);
That will be faster than using DISTINCT.

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The query says: Show the customers that have done at least an order that contains products from 3 different categories. The result will be 2 columns, CustomerID, and the amount of orders. I have written this code but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
Is never going to give you a result as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
GROUP BY soh.CustomerID
Try this :
select CustomerID,count(*) as amount_of_order from
SalesOrder join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- At least on CategoryCount>=3

MS Access INNER JOIN most recent entry

I'm having some trouble trying to get Microsoft Access 2007 to accept my SQL query but it keeps throwing syntax errors at me that don't help me correct the problem.
I have two tables, let's call them Customers and Orders for ease.
I need some customer details, but also a few details from the most recent order. I currently have a query like this:
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
AND o.ID = (SELECT TOP 1 ID FROM Orders WHERE CustomerID = c.ID ORDER BY Date DESC)
To me, it appears valid, but Access keeps throwing 'syntax error's at me and when I hit OK, it selects a piece of the SQL text that doesn't even relate to it.
If I take the extra SELECT clause out it works but is obviously not what I need.
Any ideas?
You cannot use AND in that way in MS Access, change it to WHERE. In addition, you have two reserved words in your column (field) names - Name, Date. These should be enclosed in square brackets when not prefixed by a table name or alias, or better, renamed.
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
WHERE o.ID = (
SELECT TOP 1 ID FROM Orders
WHERE CustomerID = c.ID ORDER BY [Date] DESC)
I worked out how to do it in Microsoft Access. You INNER JOIN on a pre-sorted sub-query. That way you don't have to do multiple ON conditions which aren't supported.
SELECT c.ID, c.Name, c.Address, o.OrderNo, o.OrderDate, o.TotalPrice
FROM Customers c
INNER JOIN (SELECT * FROM Orders ORDER BY OrderDate DESC) o
ON c.ID = o.CustomerID
How efficient this is another story, but it works...