How to use aggregate functions in a recursive query in PostgreSQL

I run the following recursive query in PostgreSQL:
WITH recursive report AS (
select a.name, a.id, a.parentid, sum(b.id)
from table1 a
INNER JOIN table2 b on a.id=b.table1id
GROUP by a.name, a.id, a.parentid
), report2 AS (
SELECT *, 0 as lvl
FROM report
WHERE parentid IS NULL
UNION ALL
SELECT child.*, parent.lvl + 1
FROM report child
JOIN report2 parent
ON parent.id = child.parentid
)
select * from report2
I want to sum the count column up to the topmost level, so that each top-level row shows the total for its whole subtree.
What is the best way to do this?

You can calculate a path during the recursion, like so:
WITH recursive report AS (
select a.name, a.id, a.parentid, sum(b.id) as count -- Is summing b.id the right thing here?
from table1 a
INNER JOIN table2 b on a.id=b.table1id
GROUP by a.name, a.id, a.parentid
), report2 AS (
SELECT report.*, 0 as lvl, array[report.id] as path_array
FROM report
WHERE parentid IS NULL
UNION ALL
SELECT child.*, parent.lvl + 1, parent.path_array || child.id
FROM report child
JOIN report2 parent
ON parent.id = child.parentid
)
select * from report2;
Do you really mean sum(b.id) and not count(*) in the report CTE? Whichever you use, give it a column alias; the query below assumes it is named count.
To get the summed count per top-level row, use this as the main query after the recursion:
select t.name, sum(r.count) as total_count
from report2 r
join table1 t
on t.id = r.path_array[1]
group by t.name;
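The roll-up can be tried end to end with a small sketch like the one below. The sample rows, the cnt alias and the root_id column are assumptions made for illustration, and it uses count(*) rather than sum(b.id). Instead of a path array it carries the id of the root row through the recursion, which is a variation on the approach above rather than the same query.

```sql
-- Hypothetical sample data; table and column names follow the question.
CREATE TEMP TABLE table1 (id int PRIMARY KEY, name text, parentid int);
CREATE TEMP TABLE table2 (id int PRIMARY KEY, table1id int);

INSERT INTO table1 VALUES
  (1, 'root_a', NULL), (2, 'child_a1', 1), (3, 'grandchild_a1', 2),
  (4, 'root_b', NULL);
INSERT INTO table2 VALUES (10, 1), (11, 2), (12, 2), (13, 3), (14, 4);

WITH RECURSIVE report AS (
  -- one row per table1 node with its own number of table2 rows
  SELECT a.name, a.id, a.parentid, count(*) AS cnt
  FROM table1 a
  JOIN table2 b ON a.id = b.table1id
  GROUP BY a.name, a.id, a.parentid
), report2 AS (
  -- anchor: top-level rows are their own root
  SELECT report.*, report.id AS root_id
  FROM report
  WHERE parentid IS NULL
  UNION ALL
  -- recursive step: children inherit the root id of their parent
  SELECT child.*, parent.root_id
  FROM report child
  JOIN report2 parent ON parent.id = child.parentid
)
SELECT t.name, sum(r.cnt) AS total_count
FROM report2 r
JOIN table1 t ON t.id = r.root_id
GROUP BY t.name
ORDER BY t.name;
-- root_a | 4
-- root_b | 1
```

Because report uses an inner join, a table1 row with no table2 rows disappears from report and cuts off its descendants; a LEFT JOIN with count(b.id) would keep the chain intact if that matters for your data.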

Related

PostgreSQL Join with special condition

Let's assume we have the following table1:
1 2 3
a x m
a y m
b z m
I want to do an inner join on the table, something like this:
INNER JOIN table2 ON table1."2" = table2."2"
but with the additional condition that the value of table1."1" is not unique. In this example, no join should occur for the rows where table1."1" = b.
What is the best way to achieve this?
Using an aggregate in a subquery is how I would do it:
SELECT *
FROM table1
JOIN table2
ON table1."2" = table2."2"
JOIN (
SELECT "1"
FROM table1
GROUP BY "1"
HAVING COUNT(*) > 1
) AS sub_q
ON sub_q."1" = table1."1";
Another option might be a CTE or a temporary table to hold the rows you're joining on:
WITH _cte AS
(
SELECT "1"
FROM table1
GROUP BY "1"
HAVING COUNT(*) > 1
)
SELECT *
FROM table1
JOIN table2
ON table1."2" = table2."2"
JOIN _cte AS cte
ON cte."1" = table1."1";
temp table:
CREATE TEMPORARY TABLE _tab
(
"1" varchar
);
INSERT INTO _tab
SELECT "1"
FROM table1
GROUP BY "1"
HAVING COUNT(*) > 1;
SELECT *
FROM table1
JOIN table2
ON table1."2" = table2."2"
JOIN _tab AS tab
ON tab."1" = table1."1";
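For comparison, here is a sketch of the same filter written with a window function instead of a separate GROUP BY subquery. This is a different technique from the three variants above. The contents of table2 (including its note column) are made up for illustration; table1 holds the rows from the question.

```sql
-- table1 rows are taken from the question; table2 is hypothetical.
CREATE TEMP TABLE table1 ("1" text, "2" text, "3" text);
CREATE TEMP TABLE table2 ("2" text, note text);

INSERT INTO table1 VALUES ('a', 'x', 'm'), ('a', 'y', 'm'), ('b', 'z', 'm');
INSERT INTO table2 VALUES ('x', 'match x'), ('y', 'match y'), ('z', 'match z');

SELECT t1."1", t1."2", t2.note
FROM (
  -- n = how many table1 rows share the same value in column "1"
  SELECT *, count(*) OVER (PARTITION BY "1") AS n
  FROM table1
) t1
JOIN table2 t2 ON t1."2" = t2."2"
WHERE t1.n > 1          -- keep only non-unique values of "1"
ORDER BY t1."2";
-- a | x | match x
-- a | y | match y
```

The row with "1" = b is dropped because its group contains a single row.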

Correlated subquery in order by clause in DB2

I had a query similar to the following and was surprised that DB2 complained about the correlated reference in the ORDER BY clause. It failed with an error like:
[42703][-206] "A.ID" is not valid in the context where it is used..
SQLCODE=-206, SQLSTATE=42703
I was able to rewrite the query to avoid the correlated reference, but I couldn't find anything in the documentation about this. Is this a bug, or am I just unable to find details on the expected behavior?
SELECT a.id
FROM A a
ORDER BY (
SELECT COUNT(*)
FROM B b
WHERE b.id = a.id
)
You can't use a correlated subquery in the ORDER BY clause. However, there are many ways to get the same result, for example (note that this inner join drops rows of A that have no match in B):
select count(*) as count_num ,a.ID
from
a join b on a.ID=b.ID
GROUP BY a.ID
order by 1 DESC
Solution 1:
SELECT a.id, (select count(*) from B where B.id=a.id) nbOFB
FROM A
order by 2
Solution 2:
select * from (
SELECT a.id, (select count(*) from B where B.id=a.id) nbOFB
FROM A
) tmp
order by nbOFB
Solution 3:
SELECT a.id, c.nb
FROM A
inner join lateral
(
select count(*) nb from B where B.id=a.id
) c on 1=1
order by c.nb
Solution 4:
SELECT a.id, ifnull(c.nb, 0) nb
FROM A
left outer join
(
select b.id, count(*) nb from B group by b.id
) c on a.id=c.id
order by ifnull(c.nb, 0)
Solution 5:
with c as (
select b.id, count(*) nb from B group by b.id
)
SELECT a.id, ifnull(c.nb, 0) nb
FROM A left outer join c on a.id=c.id
order by ifnull(c.nb, 0)

Avoiding Order By in T-SQL

The sample query below is part of my main query. The SORT operator in it accounts for 30% of the cost.
Avoiding the SORT would require creating indexes. Is there any other way to optimize this code?
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
It looks like you can use NOT EXISTS rather than the sorts. I think you'll probably get a better performance boost by using a CTE or derived table instead of a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3 and t2.t_date > t1.t_date
where t1.status = 3
group by t1.ID
having count(t2.ID) = 0
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID
WHERE a1.ID = r.ID AND a1.Status = 3
AND a1.TableA_ID > COALESCE(a2.MaxAID, 0)
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concerns me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a subquery (unless this is for an APPLY operation).

Sorting rows by children?

I have this table:
CREATE TABLE items (
id SERIAL PRIMARY KEY,
data TEXT,
parent INT,
posted INT
);
Each item has a piece of data, a timestamp, and a parent. I'd like to select the top 10 root items (parent = 0), sorted by the timestamp of the most recent child.
If item #1 has a child #2 that has a child #3, #3 is considered a child of #1.
How can I do this?
EDIT:
The query has been rewritten to:
1. sort the child items
2. get the root parent id and the rank for each item
3. select the top 10 parents
4. select the details for the top 10 parents
Common table expressions are used to select the data incrementally, following the steps above.
WITH recursive c AS
(
SELECT *
FROM seeds
UNION ALL
SELECT
T.id,
T.parent,
c.topParentID,
(c.child_level + 1),
c.child_rank
FROM items AS T
INNER JOIN c ON T.parent = c.id
WHERE T.id <> T.parent
)
, seeds AS
(
SELECT
id,
parent,
parent AS topParentID,
0 AS child_level,
rank() OVER (ORDER BY posted DESC) child_rank
FROM items
WHERE parent <> 0
ORDER BY posted DESC
)
, rank_level AS
(
SELECT DISTINCT
c2.id id,
c_ranks.min_child_rank child_rank,
c_roots.max_child_level root_level
FROM
(
SELECT
id,
MAX(child_level) max_child_level
FROM c
GROUP BY id
)
c_roots
INNER JOIN c c2 ON c_roots.id = c2.id
INNER JOIN
(
SELECT
id,
MIN(child_rank) min_child_rank
FROM c
GROUP BY id
)
c_ranks
ON c2.id = c_ranks.id
)
, top_10_parents AS
(
SELECT
c.topParentID id,
MIN(rl.child_rank) id_rank
FROM rank_level rl
INNER JOIN c ON rl.id = c.id AND c.child_level = rl.root_level
GROUP BY c.topParentID
ORDER BY MIN(rl.child_rank)
limit 10
)
SELECT
i.*
FROM
items i
INNER JOIN top_10_parents tp ON tp.id = i.id
ORDER BY tp.id_rank;
Reference:
WITH Queries (Common Table Expressions) on PostgreSQL Manual
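A shorter way to express the same idea, offered as a sketch and not as the query above: start the recursion at the root items, carry each root's id down the tree, and take the newest posted value among the descendants. The sample rows and the names tree, root_id and latest_child are assumptions for illustration.

```sql
-- Table definition from the question; the rows are hypothetical.
CREATE TEMP TABLE items (
  id SERIAL PRIMARY KEY,
  data TEXT,
  parent INT,
  posted INT
);

INSERT INTO items (id, data, parent, posted) VALUES
  (1, 'root one', 0, 100),
  (2, 'child of 1', 1, 150),
  (3, 'grandchild of 1', 2, 300),
  (4, 'root two', 0, 200),
  (5, 'child of 4', 4, 250);

WITH RECURSIVE tree AS (
  -- anchor: every root item is its own root
  SELECT id, id AS root_id, posted
  FROM items
  WHERE parent = 0
  UNION ALL
  -- recursive step: descendants inherit the root id
  SELECT i.id, t.root_id, i.posted
  FROM items i
  JOIN tree t ON i.parent = t.id
)
SELECT r.*, d.latest_child
FROM items r
JOIN (
  SELECT root_id, max(posted) AS latest_child
  FROM tree
  WHERE id <> root_id        -- look at descendants only
  GROUP BY root_id
) d ON d.root_id = r.id
ORDER BY d.latest_child DESC
LIMIT 10;
-- id 1 comes first (latest_child = 300), then id 4 (latest_child = 250)
```

Root items without any descendants are left out by the inner join; switch to a LEFT JOIN and decide how to order NULLs if they should appear too.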

How to get the top most parent in PostgreSQL

I have a tree structure table with columns:
id,parent,name.
Given a tree A->B->C,
how can I get the ID of the topmost parent A, given C's ID?
In particular, how would I write this with "WITH RECURSIVE"?
Thanks!
WITH RECURSIVE q AS
(
SELECT m
FROM mytable m
WHERE id = 'C'
UNION ALL
SELECT m
FROM q
JOIN mytable m
ON m.id = (q.m).parent
)
SELECT (m).*
FROM q
WHERE (m).parent IS NULL
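The same upward walk can be written with plain columns instead of a whole-row value, which some readers may find easier to follow. This is a minimal sketch; the table definition and the three sample rows are assumptions based on the A->B->C tree in the question.

```sql
-- Hypothetical table matching the question's id, parent, name columns.
CREATE TEMP TABLE mytable (id text PRIMARY KEY, parent text, name text);

INSERT INTO mytable VALUES
  ('A', NULL, 'top'),
  ('B', 'A', 'middle'),
  ('C', 'B', 'leaf');

WITH RECURSIVE q AS (
  -- anchor: start at the node we know
  SELECT id, parent, name
  FROM mytable
  WHERE id = 'C'
  UNION ALL
  -- recursive step: move from the current row to its parent
  SELECT m.id, m.parent, m.name
  FROM q
  JOIN mytable m ON m.id = q.parent
)
SELECT id, name
FROM q
WHERE parent IS NULL;
-- A | top
```

The recursion stops on its own once it reaches the row whose parent is NULL, because no row joins to a NULL parent.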
To implement recursive queries, you need a Common Table Expression (CTE).
This query walks down from every root node and carries the root's id along in root_id. To get the top-level parent of C, select the row whose id is C.
WITH RECURSIVE Ancestors AS
(
SELECT id, parent, id AS root_id, 0 AS level FROM YourTable WHERE parent IS NULL
UNION ALL
SELECT child.id, child.parent, p.root_id, p.level + 1 FROM YourTable child INNER JOIN
Ancestors p ON p.id = child.parent
)
SELECT root_id FROM Ancestors WHERE id = 'C'
If you want to fetch the whole top-level row, replace the final SELECT with an inner join on the id, e.g.
SELECT YourTable.* FROM Ancestors a
INNER JOIN YourTable ON YourTable.id = a.root_id
WHERE a.id = 'C'
Assuming a table named "organization" with columns id, name, and parent_organization_id, here is what worked for me to get a list that includes the top-level and parent org IDs for each level.
WITH RECURSIVE orgs AS (
SELECT
o.id as top_org_id
,null::bigint as parent_org_id
,o.id as org_id
,o.name
,0 AS relative_depth
FROM organization o
UNION
SELECT
allorgs.top_org_id
,childorg.parent_organization_id
,childorg.id
,childorg.name
,allorgs.relative_depth + 1
FROM organization childorg
INNER JOIN orgs allorgs ON allorgs.org_id = childorg.parent_organization_id
) SELECT
*
FROM
orgs order by 1,5;