PostgreSQL left join, but don't fetch if a matching condition is found - postgresql

I have a slightly complicated scenario. I have two tables, employee and agency. An employee may or may not have an agency. If an employee has an agency, I want the query to check another condition on the agency; if the employee does not have an agency, that's fine and I still want to fetch the employee. I'm not sure how to write the select statement for this. This is what I have come up with so far:
select * from employee e left join
agency a on a.id = e.agencyID and a.valid = true;
However, the problem is that while it fetches employees without agencies, which is fine, it also fetches employees with agencies where a.valid = false. The only option I can think of is a union, but I'm looking for something simpler.

A UNION could actually be the solution that performs best, but you can write the query without UNION like this:
select *
from employee e
left join agency a
on a.id = e.agencyID
where coalesce(a.valid, true);
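COALESCE treats a NULL a.valid as true, so the query keeps result rows where the agency part was substituted with NULLs by the outer join, that is, employees without an agency. Note that it also keeps agencies whose valid column is itself NULL.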
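To see the behaviour on concrete rows, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module rather than PostgreSQL. The sample rows and the name column are made up for illustration, and valid is stored as 1/0 because SQLite has no dedicated boolean type.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agency (id INTEGER PRIMARY KEY, valid INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, agencyID INTEGER);
    INSERT INTO agency VALUES (1, 1), (2, 0);
    INSERT INTO employee VALUES
        (10, 'no agency', NULL),
        (11, 'valid agency', 1),
        (12, 'invalid agency', 2);
""")

# Same shape as the answer's query: join first, then filter in WHERE.
# COALESCE turns the NULL produced by the outer join into "true" (1 here).
rows = conn.execute("""
    SELECT e.name
    FROM employee e
    LEFT JOIN agency a ON a.id = e.agencyID
    WHERE COALESCE(a.valid, 1)
    ORDER BY e.id
""").fetchall()

names = [r[0] for r in rows]
print(names)
```

The employee pointing at the invalid agency is dropped, while the employee with no agency survives because the filter runs after the outer join has already produced its NULL-padded row.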

You want to exclude the rows where both tables match (agency.id = employee.agencyID) and agency.valid is false. The following query expresses that condition.
SELECT
e.*,
a.*
FROM
employee e
LEFT JOIN agency a ON a.id = e.agencyID
WHERE
NOT EXISTS (
SELECT
1
FROM
-- alias the inner table so the subquery tests its own rows
agency a2
WHERE
a2.id = e.agencyID
AND a2.valid IS FALSE)
ORDER BY
e.id;

Related

COALESCE TSQL with a join tsql

I need to pick up data that may be in more than one place, and I have had some success using the COALESCE function. Basically I am looking to coalesce the join itself, but from what I can find online it seems I can only do this on fields.
So we have Products and Suppliers tables, and each also has a temp table, so four tables in total (products, tempproducts, suppliers, tempsuppliers). The suppliers and products tables store our existing suppliers and products, and their temp tables store any new suppliers/products. We also have a tempsupplierproduct table, which joins new suppliers to new products. However, we can end up in a situation where a new supplier has an existing product: the new supplier will be in the tempsuppliers table and its product will be in the products table, NOT in tempproducts, as it is not new. We will still have a new tempsupplierproduct row to join the two up.
So I want a query that looks in the tempsupplierproduct table and then gets basic information about the supplier and products. To do this I am using COALESCE.
SELECT DISTINCT SP.*, COALESCE(P.Product, PD.Product) 'Product', COALESCE(S.Supplier, SU.Supplier) 'Supplier'
FROM tempsupplierproduct SP
LEFT JOIN tempProduct P ON SP.ProductCode = P.Code
LEFT JOIN Products PD ON SP.ProductCode = PD.Code
LEFT JOIN tempSupplier S ON SP.SupplierCode = S.Code
LEFT JOIN Suppliers SU ON SP.SupplierCode = SU.Code
Now while this works, something at the back of my head tells me it is not entirely right. Ideally, if the data is not in table A, I want to join to table B instead. I have seen suggestions of coalescing inside the join itself, but I am unsure how to do this.
LEFT JOIN Suppliers Su ON SP.SupplierCode = COALESCE(S.Code, SU.Code)
may be a way, but I am confused by it. All it says is: use the code in the temp table, and if it is not there, use the supplier code. So what happens if we do have a code in the temp table? Will it try to join on that? If so, then this is incorrect as well.
Any help is appreciated
You can union the two suppliers tables together and then join them in one go like this. I'm assuming that there are no duplicates between the two tables in this case but with a bit of extra work that could be resolved as well.
WITH AllSuppliers AS
(
SELECT Code, Supplier FROM Suppliers
UNION ALL
SELECT Code, Supplier FROM tempSupplier
)
SELECT DISTINCT SP.*, COALESCE(P.Product, PD.Product) 'Product', S.Supplier
FROM tempsupplierproduct SP
LEFT JOIN tempProduct P ON SP.ProductCode = P.Code
LEFT JOIN Products PD ON SP.ProductCode = PD.Code
LEFT JOIN AllSuppliers S ON SP.SupplierCode = S.Code
If you need to handle duplicates across the two suppliers tables, then an approach like this should work: rank the duplicates, then pick the highest-ranked result. For two tables you could use a full outer join between the two, but this approach scales to any number of tables.
WITH AllSuppliers AS
(
SELECT Code, Supplier, 1 AS TablePriority FROM Suppliers
UNION ALL
SELECT Code, Supplier, 2 AS TablePriority FROM tempSupplier
),
SuppliersRanked AS
(
SELECT Code, Supplier,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY TablePriority) AS RowPriority
FROM AllSuppliers
)
SELECT DISTINCT SP.*, COALESCE(P.Product, PD.Product) 'Product', S.Supplier
FROM tempsupplierproduct SP
LEFT JOIN tempProduct P ON SP.ProductCode = P.Code
LEFT JOIN Products PD ON SP.ProductCode = PD.Code
LEFT JOIN SuppliersRanked S ON SP.SupplierCode = S.Code
AND RowPriority = 1
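As a rough check of the ranking approach, the sketch below runs a trimmed version of the query through Python's sqlite3 module instead of SQL Server. The sample rows are invented, the product joins are left out to keep it short, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical data: supplier S1 exists in both tables, S2 only in the temp table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (Code TEXT, Supplier TEXT);
    CREATE TABLE tempSupplier (Code TEXT, Supplier TEXT);
    CREATE TABLE tempsupplierproduct (SupplierCode TEXT, ProductCode TEXT);
    INSERT INTO Suppliers VALUES ('S1', 'Acme');
    INSERT INTO tempSupplier VALUES ('S1', 'Acme (temp copy)'), ('S2', 'Bolt');
    INSERT INTO tempsupplierproduct VALUES ('S1', 'P1'), ('S2', 'P1');
""")

rows = conn.execute("""
    WITH AllSuppliers AS (
        SELECT Code, Supplier, 1 AS TablePriority FROM Suppliers
        UNION ALL
        SELECT Code, Supplier, 2 AS TablePriority FROM tempSupplier
    ),
    SuppliersRanked AS (
        -- Rank rows per Code; the lowest TablePriority gets RowPriority 1.
        SELECT Code, Supplier,
               ROW_NUMBER() OVER (PARTITION BY Code ORDER BY TablePriority) AS RowPriority
        FROM AllSuppliers
    )
    SELECT SP.SupplierCode, S.Supplier
    FROM tempsupplierproduct SP
    LEFT JOIN SuppliersRanked S
           ON SP.SupplierCode = S.Code AND S.RowPriority = 1
    ORDER BY SP.SupplierCode
""").fetchall()

print(rows)
```

Each supplier code joins to exactly one row: the permanent table wins for S1, and the temp table supplies S2 because nothing outranks it.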
You can absolutely join on a coalesced field. Here is a snippet from one of my production views:
LEFT JOIN [Portal].tblHelpdeskresource supplier ON PO.fld_str_SupplierID = supplier.fld_str_SupplierID
-- Job type a
LEFT JOIN [Portal].tblHelpDeskFault HDF ON PO.fld_int_HelpdeskFaultID = HDF.fld_int_ID
-- Job Type b
LEFT JOIN [Portal].tblProjectHeader PH ON PO.fld_int_ProjectHeaderID = PH.fld_int_ID
LEFT JOIN [Portal].tblPPMScheduleLine PSL ON PH.fld_int_PPMScheduleRef = PSL.fld_int_ID
-- Managers (used to be separate for a & b type, now converged)
LEFT JOIN [Portal].uvw_HelpDeskSiteManagers PSM ON COALESCE(PSL.fld_int_StoreID,HDF.fld_int_StoreID) = PSM.PortalSiteId
LEFT JOIN [Portal].tblHelpdeskResource PHDR ON PSM.PortalResourceId = PHDR.fld_int_ID

Get distinct row by primary key, but use value from another column

I'm trying to get the sum of the total time that was spent sending all emails within a campaign.
Because of the joins in my query I end up with the 'processing_time' column duplicated over many rows. So running sum(s.processing_time) as send_time will always overstate how long it took to run.
select
c.id,
c.sender,
c.subject,
count(*) as total_items,
count(distinct s.id) as sends,
sum(s.processing_time) as send_time
from campaigns c
left join sends s on c.id = s.campaigns_id
left join opens o on s.id = o.sends_id
group by c.id;
I'd ideally like to do something like sum(s.processing_time when distinct s.id) but I can't quite work out how to achieve that.
I have made other attempts using case but I always run into the same issue, I need to get the distinct rows based on the ID column, but work with another column.
Since you want statistics related to distinct s.id as well as c.id, group by both columns. Collect the (intermediate) data that you need,
and use this table as the inner table in a nested sub-select query.
In the outer select, group by c.id alone.
Since the inner select groups by s.id, values which are unique per s.id will not get double-counted when you sum/group by c.id.
SELECT id
, sender
, subject
, sum(total_items) as total_items
, sum(sends) as sends
, sum(processing_time) as send_time
FROM (
SELECT
c.id
, s.id as sid
, count(*) as total_items
, 1 as sends
, s.processing_time
, c.sender
, c.subject
FROM campaigns c
LEFT JOIN sends s on c.id = s.campaigns_id
LEFT JOIN opens o on s.id = o.sends_id
GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id) t
GROUP BY id, sender, subject
ORDER BY id
Since the final table includes sender and subject, you'll need to group by these columns as well to avoid an error such as:
ERROR: column "c.sender" must appear in the GROUP BY clause or be used in an aggregate function
LINE 14: , c.sender
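The double counting and the two-level fix can be compared side by side. This is a small sketch using Python's sqlite3 module with invented sample data (one campaign, two sends, three opens on the first send); the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
    CREATE TABLE sends (id INTEGER PRIMARY KEY, campaigns_id INTEGER, processing_time INTEGER);
    CREATE TABLE opens (id INTEGER PRIMARY KEY, sends_id INTEGER);
    INSERT INTO campaigns VALUES (1, 'a@example.com', 'Hi');
    INSERT INTO sends VALUES (1, 1, 5), (2, 1, 7);
    INSERT INTO opens VALUES (1, 1), (2, 1), (3, 1);
""")

# Naive version: send 1 appears three times (once per open), so its
# processing_time is summed three times.
naive = conn.execute("""
    SELECT c.id, sum(s.processing_time)
    FROM campaigns c
    LEFT JOIN sends s ON c.id = s.campaigns_id
    LEFT JOIN opens o ON s.id = o.sends_id
    GROUP BY c.id
""").fetchone()

# Two-level version: the inner query collapses to one row per send,
# the outer query then sums per campaign.
fixed = conn.execute("""
    SELECT id, sum(total_items), sum(sends), sum(processing_time)
    FROM (
        SELECT c.id, s.id AS sid, count(*) AS total_items,
               1 AS sends, s.processing_time
        FROM campaigns c
        LEFT JOIN sends s ON c.id = s.campaigns_id
        LEFT JOIN opens o ON s.id = o.sends_id
        GROUP BY c.id, s.id, s.processing_time
    ) t
    GROUP BY id
""").fetchone()

print(naive, fixed)
```

With this data the naive sum counts the 5-unit send three times, while the nested query counts each send once.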

Avoid nested loop in PostgreSQL

See query below
Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3
ON a2.TeacherID=a3.TeacherID
The subqueries are needed for deduplication.
This query runs fine in PostgreSQL. However, if I add a condition between the student and teacher tables, then according to the execution plan Postgres will wrongly use a nested loop join between the student and teacher tables, which have no direct relationship. For example:
Select count(*) FROM
(Select distinct Student_ID, Name, Student_Age, CourseID from student) a1
JOIN
(Select distinct CourseID, CourseName, TeacherID from courses) a2
ON a1.CourseID=a2.CourseID
JOIN
(Select distinct TeacherID, TeacherName, Teacher_Age from teachers) a3 ON
a2.TeacherID=a3.TeacherID
WHERE Teacher_Age>=Student_Age
This query will take forever to run. However, if I replace the subqueries with the tables, it runs very fast. Without using temp tables to store the deduplicated results, is there a way to avoid the nested loop in this situation?
Thank you for your help.
You're making the database perform a lot of unnecessary work to accomplish your goal. Instead of running three separate SELECT DISTINCT sub-queries and joining them together, try joining the base tables directly to each other and let the database handle the DISTINCT part only once. If your tables have proper indexes on the ID fields, this should run fairly quickly.
SELECT COUNT(1)
FROM (
SELECT DISTINCT s.Student_ID, c.CourseID, t.TeacherID
FROM student s
JOIN courses c ON s.CourseID = c.CourseID
JOIN teachers t ON c.TeacherID = t.TeacherID
WHERE t.Teacher_Age >= s.Student_Age
) a

TSQL, join to multiple fields of which one could be NULL

I have a simple query:
SELECT * FROM Products p
LEFT JOIN SomeTable st ON st.SomeId = p.SomeId AND st.SomeOtherId = p.SomeOtherId
So far so good.
But SomeId, the first join column, can be NULL. In that case the check should be IS NULL, and that's where the join fails. I tried to use a CASE, but couldn't get that to work either.
Am I missing something simple here?
From Undocumented Query Plans: Equality Comparisons.
SELECT *
FROM Products p
LEFT JOIN SomeTable st
ON st.SomeOtherId = p.SomeOtherId
AND EXISTS (SELECT st.SomeId INTERSECT SELECT p.SomeId)
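The trick works because set operators such as INTERSECT treat two NULLs as equal, whereas the = operator yields NULL when either side is NULL. The sketch below compares the two join conditions using Python's sqlite3 module rather than SQL Server; the sample rows and the Name and Label columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (Name TEXT, SomeId INTEGER, SomeOtherId INTEGER);
    CREATE TABLE SomeTable (Label TEXT, SomeId INTEGER, SomeOtherId INTEGER);
    INSERT INTO Products VALUES ('with id', 1, 100), ('null id', NULL, 200);
    INSERT INTO SomeTable VALUES ('match 1', 1, 100), ('match null', NULL, 200);
""")

# Plain equality: NULL = NULL is not true, so the second product finds no match.
equals_join = conn.execute("""
    SELECT p.Name, st.Label
    FROM Products p
    LEFT JOIN SomeTable st
           ON st.SomeId = p.SomeId AND st.SomeOtherId = p.SomeOtherId
    ORDER BY p.SomeOtherId
""").fetchall()

# INTERSECT-based comparison: NULLs on both sides count as a match.
intersect_join = conn.execute("""
    SELECT p.Name, st.Label
    FROM Products p
    LEFT JOIN SomeTable st
           ON st.SomeOtherId = p.SomeOtherId
          AND EXISTS (SELECT st.SomeId INTERSECT SELECT p.SomeId)
    ORDER BY p.SomeOtherId
""").fetchall()

print(equals_join)
print(intersect_join)
```

Only the second form pairs the product whose SomeId is NULL with the SomeTable row whose SomeId is also NULL.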

MS Access INNER JOIN most recent entry

I'm having some trouble trying to get Microsoft Access 2007 to accept my SQL query but it keeps throwing syntax errors at me that don't help me correct the problem.
I have two tables, let's call them Customers and Orders for ease.
I need some customer details, but also a few details from the most recent order. I currently have a query like this:
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
AND o.ID = (SELECT TOP 1 ID FROM Orders WHERE CustomerID = c.ID ORDER BY Date DESC)
To me, it appears valid, but Access keeps throwing 'syntax error's at me and when I hit OK, it selects a piece of the SQL text that doesn't even relate to it.
If I take the extra SELECT clause out it works but is obviously not what I need.
Any ideas?
You cannot use AND in that way in MS Access, change it to WHERE. In addition, you have two reserved words in your column (field) names - Name, Date. These should be enclosed in square brackets when not prefixed by a table name or alias, or better, renamed.
SELECT c.ID, c.Name, c.Address, o.ID, o.Date, o.TotalPrice
FROM Customers c
INNER JOIN Orders o
ON c.ID = o.CustomerID
WHERE o.ID = (
SELECT TOP 1 ID FROM Orders
WHERE CustomerID = c.ID ORDER BY [Date] DESC)
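The correlated-subquery pattern itself can be tried outside Access. The following is a sketch using Python's sqlite3 module, so it is not Access syntax: SQLite has no TOP, so LIMIT 1 stands in for TOP 1, and the OrderDate column name and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         OrderDate TEXT, TotalPrice INTEGER);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES
        (1, 1, '2020-01-01', 10),
        (2, 1, '2021-06-01', 20),
        (3, 2, '2019-03-05', 5);
""")

# The join matches every order of a customer; the WHERE clause then keeps
# only the order whose ID is that customer's most recent one.
latest = conn.execute("""
    SELECT c.Name, o.ID, o.TotalPrice
    FROM Customers c
    INNER JOIN Orders o ON c.ID = o.CustomerID
    WHERE o.ID = (
        SELECT ID FROM Orders
        WHERE CustomerID = c.ID
        ORDER BY OrderDate DESC
        LIMIT 1)
    ORDER BY c.ID
""").fetchall()

print(latest)
```

Each customer appears once, paired with the newest order; ISO-formatted date strings sort correctly as text, which is why a TEXT column is enough here.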
I worked out how to do it in Microsoft Access. You INNER JOIN on a pre-sorted sub-query. That way you don't have to do multiple ON conditions which aren't supported.
SELECT c.ID, c.Name, c.Address, o.OrderNo, o.OrderDate, o.TotalPrice
FROM Customers c
INNER JOIN (SELECT * FROM Orders ORDER BY OrderDate DESC) o
ON c.ID = o.CustomerID
How efficient this is is another story, but it works...