Postgresql recursive CTE results ordering - postgresql

I'm working on a query to pull data out of a hierarchy
e.g.
CREATE table org (
id INT PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT);
INSERT INTO org (id, name) VALUES (0, 'top');
INSERT INTO org (id, name, parent_id) VALUES (1, 'middle1', 0);
INSERT INTO org (id, name, parent_id) VALUES (2, 'middle2', 0);
INSERT INTO org (id, name, parent_id) VALUES (3, 'bottom3', 1);
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org;
It works as expected.
3 1 "bottom3"
1 0 "middle1"
0 "top"
It's also returning the data in the order that I expect, and it makes sense to me that it would do this because of the way that the results would be discovered.
The question is, can I count on the order being like this?

Yes, there is a defined order. In the Postgres WITH doc, they give the following example:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ROW(g.f1, g.f2),
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
About which they say in a Tip box (formatting mine):
The recursive query evaluation algorithm produces its output in
breadth-first search order. You can display the results in depth-first
search order by making the outer query ORDER BY a "path" column
constructed in this way.
You do appear to be getting breadth-first output in your case above based on the INSERT statements, so I would say you could, if you wanted, modify your outer SELECT to order it in another fashion.
I believe the analog for depth-first in your case would probably be this:
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY id;
As I would expect (running things through in my head) that to yield this:
0 "top"
1 0 "middle1"
3 1 "bottom3"

Related

Postgresql - select query with aggregated decisions column as json

I have table which contains specified columns:
id - bigint
decision - varchar(80)
type - varchar(258)
I want to make a select query which in result returns something like this(id, decisionsValues with counts as json, type):
id decisions type
1 {"firstDecisionsValue":countOfThisValue, "secondDecisionsValue": countOfThisValue} entryType
I heard that I can try play with json_agg but it does not allow COUNT method, tried to use json_agg with query:
SELECT ac.id,
json_agg(ac.decision),
ac.type
FROM myTable ac
GROUP BY ac.id, ac.type;
but ends with this(for entry with id 1 there are two occurences of firstDecisionsValue, one occurence of secondDecisionsValue):
id decisions type
1 {"firstDecisionsValue", "firstDecisionsValue", "secondDecisionsValue"} entryType
minimal reproducible example
CREATE TABLE myTable
(
id bigint,
decisions varchar(80),
type varchar(258)
);
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'secondDecisionsValue', 'myType');
Can you provide me any tips how to make it as expected?
1, {"fistDecisionsValue":2, "secondDecisionsValue":1}, entryType
You can try this
SELECT a.id, jsonb_object_agg(a.decisions, a.count), a.type
FROM
( SELECT id, type, decisions, count(*) AS count
FROM myTable
GROUP BY id, type, decisions
) AS a
GROUP BY a.id, a.type
see the result in dbfiddle.
First, you should calculate the count of id, type, decisions for each decisions after that, you should use jsonb_object_agg to create JSON.
Demo
with data as (
select
ac.id,
ac.type,
ac.decisions,
count(*)
from
myTable ac
group by
ac.id,
ac.type,
ac.decisions
)
select
d.id,
d.type,
json_object_agg(d.decisions, d.count)
from
data d
group by
d.id,
d.type

SQL SELECT Parent-Child with SORT (...again)

UPDATED:
I have a simple, one level parent child relation table, with following columns:
ID_Asset| Parent_ID_Asset | ProductTitle
I need output grouped by Parent followed by children, and also sorted by Parent and Children Name. My attempts in the fiddle.
See here for details: https://rextester.com/PPCHG20007
Desired order:
9 8 NULL ADONIS Server
7 16 8 ADONIS Designer
8 20 8 ADONIS Portal Module “Control & Release” Package XS
Parent first, than children, while ProductTitle ordered alphabetically.
Thanx all for hints so far.
I would do conditional ordering instead :
select t.*
from table t
order by (case when parent_id is null then id else parent_id end), ProductTitle;
I am assuming you need to sort the data based on parent-child relation.
As far as I understand, you need to sort on the name of the root products, and in between them to show the sub-products, ordered by name, and in between them to show their sub-products, etc.
I guess you are using a recursive cte. You can define a "hierarchy sorting" helper which is a padded number in the current level, and for each level deep, add a suffix with the padded number in the current level, etc.
Something like that:
declare #Products table(ID int, Parent_ID int, ProductTitle varchar(100))
insert into #Products values
(1, NULL, 'ADONIS'),
(2, NULL, 'BACARAT'),
(3, 1, 'Portal Module'),
(4, 1, 'Alhambra'),
(5, NULL, 'ZULU'),
(6, 2, 'Omega')
; with cte as (
select ID, Parent_ID, ProductTitle, FORMAT(ROW_NUMBER() over(order by ProductTitle), '0000') as SortingHelper
from #Products
where Parent_ID is null
union all
select p.ID, p.Parent_ID, p.ProductTitle, cte.SortingHelper + '.' + FORMAT(ROW_NUMBER() over(order by p.ProductTitle), '0000') as SortingHelper
from #Products p
inner join cte on cte.ID = p.Parent_ID
)
select ID, Parent_ID, ProductTitle
from cte
order by SortingHelper
I think this is the ordering you are looking for. This joins each row to its parent (if it exists). Then if there is a TP (parent) record then that is the parent title, ID etc, otherwise the current record must be the parent. I've shown the parent name in the query results for clarity. Then it sorts by
Parent name (so parents and children are together, but in parent name order)
Parent ID (in case two or more parents have the same name/title, this will keep children with the correct parent)
A flag which is 0 for parents, 1 for children, so the parent comes first
The current record name, which will sort children by name/title order
Code is
Select T.*,
isnull(TP.ProductTitle, T.ProductTitle) as ParentName -- This is the parent name, shown for reference
from test T
left outer join Test TP on TP.ID_Asset = T.Parent_ID_Asset --This is the parent record, if it exists
ORDER BY isnull(TP.ProductTitle, T.ProductTitle), --ParentName sort
isnull(TP.ID_Asset, T.ID_Asset), --if two parents have the same title, this makes sure they group with their correct children
Case when T.Parent_ID_Asset is null then 0 else 1 end, --this makes sure the parent comes before the child
T.ProductTitle --Child Sort

PostgreSQL: Order by name after a multiple-level hierarchy sort

I'm currently exporting queries from Oracle to PostgreSQL, and I am stuck on this one which is used to sort directories:
WITH RECURSIVE R AS (
SELECT ARRAY[ID] AS H
,ID
,PARENTID
,NAME
,1 AS level
FROM REPERTORIES
WHERE ID= (SELECT Min(ID) FROM REPERTORIES)
UNION ALL
SELECT R.H || A.ID
,A.ID
,A.PARENTID
,A.NAME
,R.level + 1
FROM REPERTORIES A
JOIN R ON A.PARENTID = R.ID
)
SELECT NAME
,ID
,PARENTID
, level
FROM R
ORDER BY H
It's partially working, each subdirectory is placed after his parent directory or a directory sharing the same parent directory (A directory can have subdirectories which also have subdirectories and so on)
But I need to also sort the directories that are at the same level by their NAME (while, of course, still having their subdirectories right next to them)
How can I achieve this?
Thanks in advance (and sorry if my English is hard to understand)
EDIT: Here is the orignial Oracle query:
SELECT NAME, ID, PARENTID, level
FROM REPERTORIES
CONNECT BY PRIOR ID = PARENTID
START WITH ID = (SELECT Min(ID) FROM REPERTORIES)
ORDER SIBLINGS BY NAME
Similar to the way you construct h, construct an array that contains the path names and order by that.
The "order by H" is throwing me for a loop, however..
In general you need to do your sorting in the recursive part of the query to keep them "grouped" properly.
WITH RECURSIVE R AS (
SELECT ARRAY[ID] AS H
,ID
,PARENTID
,NAME
,1 AS level
FROM REPERTORIES
WHERE ID= (SELECT Min(ID) FROM REPERTORIES)
UNION ALL
SELECT R.H || A.ID
,A.ID
,A.PARENTID
,A.NAME
,R.level + 1
FROM REPERTORIES A
JOIN R ON A.PARENTID = R.ID
order by name
)
SELECT NAME
,ID
,PARENTID
, level
FROM R

How to merge JSONB field in a tree structure?

I have a table in Postgres which stores a tree structure. Each node has a jsonb field: params_diff:
CREATE TABLE tree (id INT, parent_id INT, params_diff JSONB);
INSERT INTO tree VALUES
(1, NULL, '{ "some_key": "some value" }'::jsonb)
, (2, 1, '{ "some_key": "other value", "other_key": "smth" }'::jsonb)
, (3, 2, '{ "other_key": "smth else" }'::jsonb);
The thing I need is to select a node by id with additional generated params field which contains the result of merging all params_diff from the whole parents chain:
SELECT tree.*, /* some magic here */ AS params FROM tree WHERE id = 3;
id | parent_id | params_diff | params
----+-----------+----------------------------+-------------------------------------------------------
3 | 2 | {"other_key": "smth else"} | {"some_key": "other value", "other_key": "smth else"}
Generally, a recursive CTE can do the job. Example:
Use table alias in another query to traverse a tree
We just need a more magic to decompose, process and re-assemble the JSON result. I am assuming from your example, that you want each key once only, with the first value in the search path (bottom-up):
WITH RECURSIVE cte AS (
SELECT id, parent_id, params_diff, 1 AS lvl
FROM tree
WHERE id = 3
UNION ALL
SELECT t.id, t.parent_id, t.params_diff, c.lvl + 1
FROM cte c
JOIN tree t ON t.id = c.parent_id
)
SELECT id, parent_id, params_diff
, (SELECT json_object(array_agg(key ORDER BY lvl)
, array_agg(value ORDER BY lvl))::jsonb
FROM (
SELECT key, value
FROM (
SELECT DISTINCT ON (key)
p.key, p.value, c.lvl
FROM cte c, jsonb_each_text(c.params_diff) p
ORDER BY p.key, c.lvl
) sub1
ORDER BY lvl
) sub2
) AS params
FROM cte
WHERE id = 3;
How?
Walk the tree with a classic recursive CTE.
Create a derived table with all keys and values with jsonb_each_text() in a LATERAL JOIN, remember the level in the search path (lvl).
Use DISTINCT ON to get the "first" (lowest lvl) value for each key. Details:
Select first row in each GROUP BY group?
Sort and aggregate resulting keys and values and feed the arrays to json_object() to build the final params value.
SQL Fiddle (only as far as pg 9.3 can go with json instead of jsonb).

TSQL Group By with an "OR"?

This query for creating a list of Candidate duplicates is easy enough:
SELECT Count(*), Can_FName, Can_HPhone, Can_EMail
FROM Can
GROUP BY Can_FName, Can_HPhone, Can_EMail
HAVING Count(*) > 1
But if the actual rule I want to check against is FName and (HPhone OR Email) - how can I adjust the GROUP BY to work with this?
I'm fairly certain I'm going to end up with a UNION SELECT here (i.e. do FName, HPhone on one and FName, EMail on the other and combine the results) - but I'd love to know if anyone knows an easier way to do it.
Thank you in advance for any help.
Scott in Maine
Before I can advise anything, I need to know the answer to this question:
name phone email
John 555-00-00 john#example.com
John 555-00-01 john#example.com
John 555-00-01 john-other#example.com
What COUNT(*) you want for this data?
Update:
If you just want to know that a record has any duplicates, use this:
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
)
SELECT *
FROM q qo
WHERE EXISTS
(
SELECT NULL
FROM q qi
WHERE qi.id <> qo.id
AND qi.name = qo.name
AND (qi.phone = qo.phone OR qi.email = qo.email)
)
It's more efficient, but doesn't tell you where the duplicate chain started.
This query select all entries along with the special field, chainid, that indicates where the duplicate chain started.
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john#example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john#example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other#example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james#example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other#example.com'
),
dup AS (
SELECT id AS chainid, id, name, phone, email, 1 as d
FROM q
UNION ALL
SELECT chainid, qo.id, qo.name, qo.phone, qo.email, d + 1
FROM dup
JOIN q qo
ON qo.name = dup.name
AND (qo.phone = dup.phone OR qo.email = dup.email)
AND qo.id > dup.id
),
chains AS
(
SELECT *
FROM dup do
WHERE chainid NOT IN
(
SELECT id
FROM dup di
WHERE di.chainid < do.chainid
)
)
SELECT *
FROM chains
ORDER BY
chainid
None of these answers is correct. Quassnoi's is a decent approach, but you will notice one fatal flaw in the expressions "qo.id > dup.id" and "di.chainid < do.chainid": comparisons made by ID! This is ALWAYS bad practice because it depends on some inherent ordering in the IDs. IDs should NEVER be given any implicit meaning and should ONLY participate in equality or null testing. You can easily break Quassnoi's solution in this example by simply reordering the IDs in the data.
The essential problem is a disjunctive condition with a grouping, which leads to the possibility of two records being related through an intermediate, though they are not directly relatable.
e.g., you stated these records should all be grouped:
(1) John 555-00-00 john#example.com
(2) John 555-00-01 john#example.com
(3) John 555-00-01 john-other#example.com
You can see that #1 and #2 are relatable, as are #2 and #3, but clearly #1 and #3 are not directly relatable as a group.
This establishes that a recursive or iterative solution is the ONLY possible solution.
So, recursion is not viable since you can easily end up in a looping situation. This is what Quassnoi was trying to avoid with his ID comparisons, but in doing so he broke the algorithm. You could try limiting the levels of recursion, but you may not then complete all relations, and you will still potentially be following loops back upon yourself, leading to excessive data size and prohibitive inefficiency.
The best solution is ITERATIVE: Start a result set by tagging each ID as a unique group ID, and then spin through the result set and update it, combining IDs into the same unique group ID as they match on the disjunctive condition. Repeat the process on the updated set each time until no further updates can be made.
I will create example code for this soon.
GROUP BY doesn't support OR - it's implicitly AND and must include every non-aggregator in the select list.
I assume you also have a unique ID integer as the primary key on this table. If you don't, it's a good idea to have one, for this purpose and many others.
Find those duplicates by a self-join:
select
c1.ID
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
, c2.ID
, c2.Can_FName
, c2.Can_HPhone
, c2.Can_Email
from
(
select
min(ID),
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
order by
c1.ID
The query gives you N-1 rows for each N duplicate combinations - if you want just a count along with each unique combination, count the rows grouped by the "left" side:
select count(1) + 1,
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
from
(
select
min(ID),
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
group by
c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
Granted, this is more involved than a union - but I think it illustrates a good way of thinking about duplicates.
Project the desired transformation first from a derived table, then do the aggregation:
SELECT COUNT(*)
, CAN_FName
, Can_HPhoneOrEMail
FROM (
SELECT Can_FName
, ISNULL(Can_HPhone,'') + ISNULL(Can_EMail,'') AS Can_HPhoneOrEMail
FROM Can) AS Can_Transformed
GROUP BY Can_FName, Can_HPhoneOrEMail
HAVING Count(*) > 1
Adjust your 'OR' operation as needed in the derived table project list.
I know this answer will be criticised for the use of the temp table, but it will work anyway:
-- create temp table to give the table a unique key
create table #tmp(
ID int identity,
can_Fname varchar(200) null, -- real type and len here
can_HPhone varchar(200) null, -- real type and len here
can_Email varchar(200) null, -- real type and len here
)
-- just copy the rows where a duplicate fname exits
-- (better performance specially for a big table)
insert into #tmp
select can_fname,can_hphone,can_email
from Can
where can_fname exists in (select can_fname from Can
group by can_fname having count(*)>1)
-- select the rows that have the same fname and
-- at least the same phone or email
select can_Fname, can_Hphone, can_Email
from #tmp a where exists
(select * from #tmp b where
a.ID<>b.ID and A.can_fname = b.can_fname
and (isnull(a.can_HPhone,'')=isnull(b.can_HPhone,'')
or (isnull(a.can_email,'')=isnull(b.can_email,'') )
Try this:
SELECT Can_FName, COUNT(*)
FROM (
SELECT
rank() over(partition by Can_FName order by Can_FName,Can_HPhone) rnk_p,
rank() over(partition by Can_FName order by Can_FName,Can_EMail) rnk_m,
Can_FName
FROM Can
) X
WHERE rnk_p=1 or rnk_m =1
GROUP BY Can_FName
HAVING COUNT(*)>1