I am writing an SQL query that joins multiple tables in a 1-to-N relation, and it needs to support pagination.
To get the first 10 parents, I tried:
SELECT * from parent p
LEFT JOIN child c
ON c.parent_id = p.id
LIMIT 10
This does not work if any parent has more than one child, because LIMIT counts the joined rows rather than the parents.
One alternative is:
SELECT * INTO TEMP temp_p FROM parent LIMIT 10;
SELECT * from temp_p p
LEFT JOIN child c
ON c.parent_id = p.id
This is pretty clumsy. What I would like to write is:
SELECT * from parent p LIMIT 10
LEFT JOIN child c
ON c.parent_id = p.id
but of course that syntax is wrong. I am wondering whether PostgreSQL has some way to support what I want to do.
Use a common table expression:
WITH ten_parents AS (
SELECT * from parent LIMIT 10)
SELECT *
FROM ten_parents p
LEFT JOIN child c
ON c.parent_id = p.id
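To see why the CTE helps, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained). The sample rows, the page size of 2, and the added ORDER BY are illustrative and not part of the original question; without an ORDER BY, which parents land on a page is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id));
    INSERT INTO parent (id) VALUES (1), (2), (3);
    -- parent 1 has three children, parent 2 has one, parent 3 has none
    INSERT INTO child (id, parent_id) VALUES (10, 1), (11, 1), (12, 1), (20, 2);
""")

# LIMIT after the join counts joined rows, so both rows belong to parent 1.
naive = conn.execute("""
    SELECT p.id, c.id
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id
    ORDER BY p.id, c.id
    LIMIT 2
""").fetchall()

# LIMIT inside the CTE picks two parents first, then joins all of their children.
paged = conn.execute("""
    WITH two_parents AS (
        SELECT * FROM parent ORDER BY id LIMIT 2
    )
    SELECT p.id, c.id
    FROM two_parents p
    LEFT JOIN child c ON c.parent_id = p.id
    ORDER BY p.id, c.id
""").fetchall()

naive_parents = sorted({row[0] for row in naive})
paged_parents = sorted({row[0] for row in paged})
print(naive_parents, paged_parents)
```

The naive query returns a page containing a single parent, while the CTE version returns both parents on the page together with every child row they have.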
Related
Help me. I want a parent->child result.
WITH RECURSIVE subordinates AS (
SELECT id, title, parent, level
FROM mst.locations
UNION
SELECT l.id, l.title, l.parent, l.level
FROM mst.locations l INNER JOIN subordinates s ON s.id = l.parent)
SELECT *
FROM subordinates;
How can I take a child per parent?
Assume I have a query like this:
SELECT *
FROM clients c
INNER JOIN clients_balances cb ON cb.id_clients = c.id
LEFT JOIN clients com ON com.id = c.id_companies
LEFT JOIN clients com_real ON com_real.id = c.id_companies_real
LEFT JOIN rate_tables rt_orig ON rt_orig.id = c.orig_rate_table
LEFT JOIN rate_tables rt_term ON rt_term.id = c.term_rate_table
LEFT JOIN payment_terms pt ON pt.id = c.id_payment_terms
LEFT JOIN paygw_clients_profiles cpgw ON (cpgw.id_clients = c.id AND cpgw.id_companies = c.id_companies_real)
WHERE
EXISTS (SELECT * FROM accounts WHERE (name LIKE 'x' OR accname LIKE 'x' OR ani LIKE 'x') AND id_clients = c.id)
AND c."type" = '0'
AND c."id" > 0
ORDER BY c."name";
This query takes around 35 seconds to run in the production environment ("clients" has about 1 million records). However, if I remove any one of the joins, the query takes only about 300 ms to execute.
I've played around with the query planner settings, but to no avail.
Here are a few explain analyze outputs:
http://explain.depesz.com/s/hzy (slow - 48049.574 ms)
http://explain.depesz.com/s/FWCd (fast - 286.234 ms, rate_tables JOIN removed)
http://explain.depesz.com/s/MyRf (fast - 539.733 ms, paygw_clients_profiles JOIN removed)
It looks like in the fast case the planner starts from the EXISTS subquery and has to perform the joins for only two rows in total. In the slow case, however, it first joins all the tables and only then filters by EXISTS.
What I need is to make this query run in a reasonable time with all seven joins in place.
Postgres version is 9.3.10 on CentOS 6.3.
Thanks.
UPDATE
Rewriting the query like this:
SELECT *
FROM clients c
INNER JOIN clients_balances cb ON cb.id_clients = c.id
INNER JOIN accounts a ON a.id_clients = c.id AND (a.name = 'x' OR a.accname = 'x' OR a.ani = 'x')
LEFT JOIN clients com ON com.id = c.id_companies
LEFT JOIN clients com_real ON com_real.id = c.id_companies_real
LEFT JOIN rate_tables rt_orig ON rt_orig.id = c.orig_rate_table
LEFT JOIN rate_tables rt_term ON rt_term.id = c.term_rate_table
LEFT JOIN payment_terms pt ON pt.id = c.id_payment_terms
LEFT JOIN paygw_clients_profiles cpgw ON (cpgw.id_clients = c.id AND cpgw.id_companies = c.id_companies_real)
WHERE
c."type" = '0' AND c.id > 0
ORDER BY c."name";
makes it run fast. However, this is not acceptable: the account filter parameters are optional, and I still need the result if there are no matches in that table. Using "LEFT JOIN accounts" instead of "INNER JOIN accounts" kills the performance again.
As suggested by Tom Lane, I raised two parameters, join_collapse_limit and from_collapse_limit, from the default of 8 to 10, and this solved the issue.
I have the following script to get the total units, but it gives me this error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Do I need to calculate SUM(ta.Qty) outside the main table?
SELECT
ta.ProductName
, SUM(ta.Total)
, SUM(SUM(ta.Qty) * ta.Unit)
FROM
tableA tA
INNER JOIN
tableB tB on tA.ID = tb.TableAID
INNER JOIN
tableC tc on ta.ID = tc.TableAID
INNER JOIN
tableD td on td.ID = tb.TableBID
GROUP BY
ta.ProductName
Here is a query in the AdventureWorks database that produces the same error (but might make some sense):
SELECT v.Name AS Vendor, SUM(SUM(p.ListPrice*d.OrderQty)+h.Freight)
FROM Production.Product p
INNER JOIN Purchasing.PurchaseOrderDetail d ON p.ProductID = d.ProductID
INNER JOIN Purchasing.PurchaseOrderHeader h ON h.PurchaseOrderID = d.PurchaseOrderID
INNER JOIN Purchasing.Vendor v ON v.BusinessEntityID = h.VendorID
GROUP BY v.Name
And here are two ways that I could rewrite that query to avoid the error:
SELECT v.Name AS Vendor, SUM(x.TotalAmount+h.Freight)
FROM (
SELECT PurchaseOrderID, SUM(p.ListPrice*d.OrderQty) AS TotalAmount
FROM Production.Product p
INNER JOIN Purchasing.PurchaseOrderDetail d ON p.ProductID = d.ProductID
GROUP BY PurchaseOrderID
) x
INNER JOIN Purchasing.PurchaseOrderHeader h ON h.PurchaseOrderID = x.PurchaseOrderID
INNER JOIN Purchasing.Vendor v ON v.BusinessEntityID = h.VendorID
GROUP BY v.Name;
SELECT v.Name AS Vendor, SUM(x.TotalAmount+h.Freight)
FROM Purchasing.PurchaseOrderHeader h
INNER JOIN Purchasing.Vendor v ON v.BusinessEntityID = h.VendorID
CROSS APPLY (
SELECT SUM(p.ListPrice*d.OrderQty) AS TotalAmount
FROM Production.Product p
INNER JOIN Purchasing.PurchaseOrderDetail d ON p.ProductID = d.ProductID
WHERE d.PurchaseOrderID=h.PurchaseOrderID
) x
GROUP BY v.Name
The first query uses a derived table and the second one uses CROSS APPLY.
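The derived-table idea can be tried out on a toy schema. The sketch below uses Python's built-in sqlite3 module rather than SQL Server (an assumption for the sake of a self-contained example; SQLite has no CROSS APPLY, so only the derived-table variant is shown). The orders and details tables and their rows are hypothetical stand-ins for the AdventureWorks tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, vendor TEXT, freight INTEGER);
    CREATE TABLE details (order_id INTEGER, price INTEGER, qty INTEGER);
    INSERT INTO orders  VALUES (1, 'A', 5), (2, 'A', 3), (3, 'B', 1);
    INSERT INTO details VALUES (1, 10, 2), (1, 5, 1), (2, 4, 3), (3, 7, 1);
""")

# Nesting one aggregate inside another in a single SELECT is rejected here too.
nested_failed = False
try:
    conn.execute("""
        SELECT o.vendor, SUM(SUM(d.price * d.qty) + o.freight)
        FROM orders o
        JOIN details d ON d.order_id = o.order_id
        GROUP BY o.vendor
    """).fetchall()
except sqlite3.Error:
    nested_failed = True

# Derived table: aggregate per order first, then aggregate per vendor.
totals = conn.execute("""
    SELECT o.vendor, SUM(x.total_amount + o.freight)
    FROM (
        SELECT order_id, SUM(price * qty) AS total_amount
        FROM details
        GROUP BY order_id
    ) x
    JOIN orders o ON o.order_id = x.order_id
    GROUP BY o.vendor
    ORDER BY o.vendor
""").fetchall()
print(nested_failed, totals)
```

The inner query reduces the detail rows to one row per order, so the outer SUM only ever sees plain columns and adds the freight once per order.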
I am totally confused and have been working on this for 2 hours.
I thought a restriction on the left side of the join would be honored.
With this query I am getting rows where [docSVsys].[visibility] is 1 and rows where it is <> 1.
I thought the ON condition would restrict [docSVsys].[visibility] to 1:
select top 1000
[docSVsys].[sID], [docSVsys].[visibility]
,[Table].[sID],[Table].[enumID],[Table].[valueID]
from [docSVsys] with (nolock)
left Join [DocMVenum1] as [Table] with (nolock)
on [docSVsys].[visibility] in (1)
and [Table].[sID] = [docSVsys].[sID]
and [Table].[enumID] = '140'
and [Table].[valueID] in (1,7)
This works:
select top 1000
[docSVsys].[sID], [docSVsys].[visibility]
,[Table].[sID],[Table].[enumID],[Table].[valueID]
from [docSVsys] with (nolock)
left Join [DocMVenum1] as [Table] with (nolock)
on [Table].[sID] = [docSVsys].[sID]
and [Table].[enumID] = '140'
and [Table].[valueID] in (1,7)
where [docSVsys].[visibility] in (1)
I am just having a really off day; I had it in my mind that the join honored a condition on the left side.
SELECT *
FROM A
LEFT JOIN B ON Condition
is equivalent to
SELECT *
FROM A
CROSS JOIN B
WHERE Condition
UNION ALL
SELECT A.*, NULL AS B
FROM A
WHERE NOT EXISTS (SELECT * FROM B WHERE Condition)
The above is rough pseudo-code.
Note that all rows from A get through. It's just that the columns from B can be NULL if the join fails for some particular row of A.
Put the filter on docSVsys into the WHERE clause.
LEFT JOINs preserve all rows from the left (first) table, no matter what. The condition in the ON clause is only for matching which rows from the right/second table should be paired with rows from the left/first table.
If you want to exclude some rows from the first table, use the WHERE clause:
select top 1000
[docSVsys].[sID], [docSVsys].[visibility]
,[Table].[sID],[Table].[enumID],[Table].[valueID]
from [docSVsys] with (nolock)
left Join [DocMVenum1] as [Table] with (nolock)
on [Table].[sID] = [docSVsys].[sID]
and [Table].[enumID] = '140'
and [Table].[valueID] in (1,7)
where [docSVsys].[visibility] in (1)
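The difference between filtering in ON and filtering in WHERE is easy to check on a tiny data set. This is a sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption for self-containment); the doc and tags tables and their rows are hypothetical stand-ins for docSVsys and DocMVenum1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc  (sid INTEGER PRIMARY KEY, visibility INTEGER);
    CREATE TABLE tags (sid INTEGER, value_id INTEGER);
    INSERT INTO doc  VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO tags VALUES (1, 7), (2, 7);
""")

# Left-table condition in ON: it only decides whether a tags row is paired,
# so doc 2 (visibility 0) still comes back, just with a NULL value_id.
on_filter = conn.execute("""
    SELECT d.sid, d.visibility, t.value_id
    FROM doc d
    LEFT JOIN tags t ON d.visibility = 1 AND t.sid = d.sid
    ORDER BY d.sid
""").fetchall()

# Left-table condition in WHERE: rows with visibility <> 1 are removed.
where_filter = conn.execute("""
    SELECT d.sid, d.visibility, t.value_id
    FROM doc d
    LEFT JOIN tags t ON t.sid = d.sid
    WHERE d.visibility = 1
    ORDER BY d.sid
""").fetchall()

print(on_filter)
print(where_filter)
```

With the condition in ON, all three doc rows are returned; with the condition in WHERE, only the two rows whose visibility is 1 remain.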
Is it possible to join a nested select statement with itself (without writing it out twice and running it twice)?
Something like this would be ideal:
SELECT P.Child, P.Parent, Q.Parent AS GrandParent
FROM (SELECT Child, Parent FROM something-complex) AS P
LEFT JOIN P AS Q ON Q.Child = P.Parent
50% possible. You can use a CTE to avoid writing it twice, but it will still execute twice.
;WITH p
AS (SELECT child,
parent
FROM something-complex)
SELECT p.child,
p.parent,
q.parent AS grandparent
FROM p
LEFT JOIN p AS q
ON q.child = p.parent
If the query is expensive you would need to materialize it into a table variable or #temp table to avoid the self join causing two invocations of the underlying query.
You could use a common table expression:
WITH P AS (SELECT Child, Parent FROM something-complex)
SELECT P.Child, P.Parent, Q.Parent as GrandParent
FROM P
LEFT JOIN P AS Q ON Q.Child = P.Parent
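For anyone who wants to try the CTE self-join, here is a small sketch using Python's built-in sqlite3 module (an assumption for self-containment; the original question is about SQL Server). The family table and its rows are hypothetical and stand in for something-complex. It only demonstrates the syntax and the result; whether the CTE is evaluated once or twice depends on the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family (child TEXT, parent TEXT);
    INSERT INTO family VALUES ('carol', 'bob'), ('bob', 'alice'), ('alice', NULL);
""")

# The CTE is written once and referenced twice, as p and as q.
rows = conn.execute("""
    WITH p AS (
        SELECT child, parent FROM family
    )
    SELECT p.child, p.parent, q.parent AS grandparent
    FROM p
    LEFT JOIN p AS q ON q.child = p.parent
    ORDER BY p.child
""").fetchall()

for row in rows:
    print(row)
```

Only carol gets a grandparent here; the other two rows keep NULL in that column because the LEFT JOIN finds no matching row or the matching row has no parent.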