How can I get all ids starting from a given id recursively in a PostgreSQL table that references itself?

The title may not be very clear, so consider this example (this is not my code; I am just using it to model my request).
I have a table that references itself (like a filesystem):
id | parent | name
----+----------+-------
1 | null | /
2 | 1 | home
3 | 2 | user
4 | 3 | bin
5 | 1 | usr
6 | 5 | local
Is it possible to write an SQL query so that if I choose:
1, I get a table containing 2, 3, 4, 5, 6 (because this is the root), matching:
/home
/home/user
/home/user/bin
/usr
etc...
2, I get a table containing 3, 4, matching:
/home/user
/home/user/bin
and so on?

Use a recursive common table expression. Always start from the root, and collect an array of ancestor ids for each row so that the WHERE clause can select the paths that pass through a given id.
For id = 1:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 1 = any(ids) and id <> 1
id | name
----+-----------------------
2 | /home/
5 | /usr/
6 | /usr/local/
3 | /home/user/
4 | /home/user/bin/
(5 rows)
For id = 2:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 2 = any(ids) and id <> 2
id | name
----+-----------------------
3 | /home/user/
4 | /home/user/bin/
(2 rows)
Bidirectional query
The question is really interesting. The above query works well but is inefficient, because it visits every node in the tree even when we ask about a leaf. A more efficient solution is a bidirectional recursive query: the inner query walks from the given node up to the root, while the outer one walks from the node down to the leaves.
with recursive outer_query(id, parent, name) as (
with recursive inner_query(qid, id, parent, name) as (
select id, id, parent, name
from my_table
where id = 2 -- parameter
union all
select qid, t.id, t.parent, concat(t.name, '/', q.name)
from inner_query q
join my_table t on q.parent = t.id
)
select qid, null::int, right(name, -1)
from inner_query
where parent is null
union all
select t.id, t.parent, concat(q.name, '/', t.name)
from outer_query q
join my_table t on q.id = t.parent
)
select id, name
from outer_query
where id <> 2; -- parameter
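If only the descendant ids are needed and not the full paths, a simpler variant is to seed the recursion with the children of the given id. The sketch below is not the query from the answer above: it runs the idea against an in-memory SQLite database through Python's sqlite3 module so that it can be tried without a server, and the helper name descendant_ids is made up for illustration. The same WITH RECURSIVE statement should also be valid in PostgreSQL.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO my_table VALUES
        (1, NULL, '/'), (2, 1, 'home'), (3, 2, 'user'),
        (4, 3, 'bin'), (5, 1, 'usr'), (6, 5, 'local');
""")

def descendant_ids(conn, start_id):
    """Hypothetical helper: ids of all descendants of start_id (start_id excluded)."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM my_table WHERE parent = ?      -- direct children
            UNION ALL
            SELECT t.id                                   -- children of rows found so far
            FROM my_table t
            JOIN sub s ON t.parent = s.id
        )
        SELECT id FROM sub ORDER BY id
        """,
        (start_id,),
    ).fetchall()
    return [r[0] for r in rows]

print(descendant_ids(conn, 1))  # [2, 3, 4, 5, 6]
print(descendant_ids(conn, 2))  # [3, 4]
```

Because the recursion starts at the requested node, only its subtree is visited. The trade-off is that the path from the root is not available, which is what the bidirectional query adds.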

Related

Postgresql recursive query

I have a table with a self-referencing foreign key and cannot work out how to get the first child or descendant that meets a condition. The structure of my_table is:
id | parent_id | type
----+-----------+--------
1 | null | union
2 | 1 | group
3 | 2 | group
4 | 3 | depart
5 | 1 | depart
6 | 5 | unit
7 | 1 | unit
For id 1 (union) I need to receive every direct child or first descendant, skipping all groups between the union and that first descendant. So in this example the result should be:
id | type
----+--------
4 | depart
5 | depart
7 | unit
id 4 is included because it is connected to the union through the groups with id 3 and id 2, and id 5 because it is connected directly to the union.
I've tried to write a recursive query with this condition in the recursive part: parent_id = 1 or parent_type = 'depart', but it doesn't produce the expected result:
with recursive cte AS (
select b.id, p.type_id
from my_table b
join my_table p on p.id = b.parent_id
where b.id = 1
union
select c.id, cte.type_id
from my_table c
join cte on cte.id = c.parent_id
where c.parent_id = 1 or cte.type_id = 'group'
)
Here's my interpretation:
if type = 'group', then id and parent_id are considered to be in the same group
id#1 and id#2 are in the same group, so they are equivalent
id#2 and id#3 are in the same group, so they are equivalent
id#1, id#2 and id#3 are all in the same group
If the above is correct, you want all the first descendants of id#1's group. The way to do that:
Get all the ids in the same group as id#1
Get all the first descendants of that group (type not in ('union', 'group'))
with recursive cte_group as (
select 1 as id
union all
select m.id
from my_table m
join cte_group g
on m.parent_id = g.id
and m.type = 'group')
select mt.id,
mt.type
from my_table mt
join cte_group cg
on mt.parent_id = cg.id
and mt.type not in ('union','group');
Result:
id|type |
--+------+
4|depart|
5|depart|
7|unit |
Sounds like you want to start with the row with id 1, then get its children, and continue recursively only through rows of type group. To do that, use:
WITH RECURSIVE tree AS (
SELECT b.id, b.type, TRUE AS skip
FROM my_table b
WHERE id = 1
UNION ALL
SELECT c.id, c.type, (c.type = 'group') AS skip
FROM my_table c
JOIN tree p ON c.parent_id = p.id AND p.skip
)
SELECT id, type
FROM tree
WHERE NOT skip
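To check the skip-flag idea against the sample data, here is a minimal sketch that runs an equivalent query in an in-memory SQLite database via Python's sqlite3 module. The function name first_non_group_descendants is hypothetical, and the flag is stored as 1/0 because this is SQLite rather than PostgreSQL.

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent_id INTEGER, type TEXT);
    INSERT INTO my_table VALUES
        (1, NULL, 'union'), (2, 1, 'group'), (3, 2, 'group'),
        (4, 3, 'depart'), (5, 1, 'depart'), (6, 5, 'unit'), (7, 1, 'unit');
""")

def first_non_group_descendants(conn, root_id):
    """Hypothetical helper: walk down from root_id, descending only through groups."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, type, skip) AS (
            SELECT id, type, 1 FROM my_table WHERE id = ?   -- the start row is skipped
            UNION ALL
            SELECT c.id, c.type, c.type = 'group'           -- groups are walked through
            FROM my_table c
            JOIN tree p ON c.parent_id = p.id AND p.skip    -- stop below non-group rows
        )
        SELECT id, type FROM tree WHERE NOT skip ORDER BY id
        """,
        (root_id,),
    ).fetchall()

print(first_non_group_descendants(conn, 1))
# [(4, 'depart'), (5, 'depart'), (7, 'unit')]
```

Row 6 is not returned because its parent (id 5) is not a group, so the recursion never descends below it.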

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres but I can't wrap my head around it. Performance shouldn't be an issue, since there are only 50 items in TABLE 1.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id, name, amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expenses.id,
name,
parent_id,
expense_amount.total
FROM expenses
WHERE expenses.parent_id IS NULL
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
UNION ALL
SELECT
expenses.id,
expenses.name,
expenses.parent_id,
expense_amount.total
FROM cte
JOIN expenses ON expenses.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first and then joining the related tables to the result of the recursive query, but you could also do the join to expense_amount inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big)
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
where t.path = e.path or t.path like e.path||'/%') as sub_tree_amount,
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure, which helps me debug the statement and understand the result better. If you need the real name of an element in your program, you would obviously use just e.name instead of prepending blanks to it.
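An alternative to matching on a path string is to let the recursion produce (ancestor, descendant) pairs and then aggregate per ancestor. This is a different technique from the path-based query above, sketched here under the assumption that the tables look like the samples in the question. It runs against in-memory SQLite through Python's sqlite3 module, and the CTE name pairs is chosen for illustration.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expense (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE expense_amount (ref_id INTEGER, amount INTEGER);
    INSERT INTO expense VALUES (1, NULL, 'A'), (2, NULL, 'B'), (3, 1, 'C'), (4, 1, 'D');
    INSERT INTO expense_amount VALUES (3, 500), (4, 200);
""")

SUBTREE_TOTALS = """
WITH RECURSIVE pairs(ancestor, id) AS (
    SELECT id, id FROM expense              -- every row belongs to its own subtree
    UNION ALL
    SELECT p.ancestor, c.id                 -- extend each subtree by one level
    FROM pairs p
    JOIN expense c ON c.parent_id = p.id
)
SELECT e.id, e.name, COALESCE(SUM(ea.amount), 0) AS amount
FROM expense e
JOIN pairs p ON p.ancestor = e.id
LEFT JOIN expense_amount ea ON ea.ref_id = p.id
GROUP BY e.id, e.name
ORDER BY e.id
"""

totals = conn.execute(SUBTREE_TOTALS).fetchall()
print(totals)  # [(1, 'A', 700), (2, 'B', 0), (3, 'C', 500), (4, 'D', 200)]
```

The pairs set grows with the depth of the tree (one row per node per ancestor), so like the path-based version this is only a reasonable choice for small hierarchies such as the 50 rows mentioned in the question.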
I could not get your query to work for some reason. Here's my attempt, which works without recursion for the particular table you provided (parent-child, no grandchild).
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

removing duplicate rows and dependencies without cursor

I have a table that has a long list of duplicated items. I am working on a stored procedure to consolidate them all into one record. Each one of the duplicated items has a number of child tables that should either be deleted, or rekeyed to point at the resulting record. My table has an Id, but the ReadableIdentifier is the column I need to deduplicate.
Id | ReadableIdentifier | Name | UpdatedOn
1 | ABC1234 | Product X | 2014-04-25 16:00:08.000
2 | ABC1234 | Product X | 2014-04-28 16:00:08.000
3 | ABC1234 | Product X | 2014-04-21 16:00:08.000
4 | ABDD9945 | Widget R | 2014-04-25 16:00:08.000
5 | ABDD9945 | Widget R | 2014-04-25 18:45:08.000
As you can see, records 1-3 are duplicates with different Id's and UpdatedOn dates. Same for 4-5. I need to consolidate these into one record, preferring the one with the most recent UpdatedOn date.
End Goal (not showing children tables):
Id | ReadableIdentifier | Name | UpdatedOn
2 | ABC1234 | Product X | 2014-04-28 16:00:08.000
5 | ABDD9945 | Widget R | 2014-04-25 18:45:08.000
I am using a CURSOR to do this, but am wondering if there is a better solution.
DECLARE dupeCursor CURSOR
FAST_FORWARD
FOR
WITH Counts AS (
SELECT
COUNT(1) Count,
ReadableIdentifier
FROM dbo.Item WITH (NOLOCK)
WHERE ReadableIdentifier IS NOT NULL
GROUP BY ReadableIdentifier)
SELECT
Counts.Count,
Counts.ReadableIdentifier,
Counts.CompanyId
FROM
Counts
WHERE Counts.Count > 1;
OPEN dupeCursor;
DECLARE @readableId VARCHAR(50);
DECLARE @itemToPersistId INT, @itemToDeleteId INT;
FETCH NEXT FROM dupeCursor INTO @readableId;
WHILE @@FETCH_STATUS = 0
BEGIN
WITH V AS (
SELECT Id, ROW_NUMBER() OVER (PARTITION BY ReadableId ORDER BY UpdatedOn DESC) as Row
FROM dbo.Item WITH (NOLOCK) WHERE ReadableId = @readableId
)
SELECT @itemToPersistId = Id
FROM V
WHERE V.Row = 1
CREATE TABLE #itemsToDelete (Id UNIQUEIDENTIFIER)
INSERT INTO #itemsToDelete
SELECT Id
FROM dbo.Item WITH (NOLOCK)
WHERE ReadableId = @readableId AND Id != @itemToPersistId;
--UPDATE CHILDREN TABLES
DELETE FROM dbo.ItemDetails WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
UPDATE dbo.ItemPurchases SET ItemId = @itemToPersistId
WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
UPDATE dbo.PurchaseOrders SET ItemId = @itemToPersistId
WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
DELETE FROM dbo.ItemMetadata WHERE ItemId IN (SELECT Id FROM #itemsToDelete);
--delete Duplicated Items
DELETE FROM dbo.Item WHERE Id IN (SELECT Id FROM #itemsToDelete);
DROP TABLE #itemsToDelete
FETCH NEXT FROM dupeCursor INTO @readableId;
END
CLOSE dupeCursor;
DEALLOCATE dupeCursor;
I realize the cursor is most likely the issue, but I'm not sure how to go about updating all of the child tables without using one.
OK, I don't have data to test this against the child tables, but it should work:
WITH V
AS (SELECT *,
ROW_NUMBER() OVER(PARTITION BY ReadableIdentifier ORDER BY UpdatedOn DESC) AS Row
FROM dbo.Item WITH (NOLOCK))
SELECT *
INTO #itemsToDelete
FROM V;
--UPDATE CHILDREN TABLES
DELETE FROM dbo.ItemDetails
WHERE ItemId IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
UPDATE IP
SET
IP.ItemId = itk.ID
FROM dbo.ItemPurchases AS IP
INNER JOIN #itemsToDelete AS itd ON IP.ItemId = itd.ID
AND itd.Row > 1
INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
AND itk.Row = 1
AND itd.Row > 1;
UPDATE po
SET
po.ItemId = itk.ID
FROM dbo.PurchaseOrders AS po
INNER JOIN #itemsToDelete AS itd ON po.ItemId = itd.ID
AND itd.Row > 1
INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
AND itk.Row = 1
AND itd.Row > 1;
DELETE FROM dbo.ItemMetadata
WHERE ItemId IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
--delete Duplicated Items
DELETE FROM dbo.Item
WHERE Id IN
(
SELECT Id
FROM #itemsToDelete
WHERE Row > 1
);
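The set-based approach can be tried end to end on a small copy of the data. The following is only a sketch in SQLite (run through Python's sqlite3 module), not T-SQL: the snake_case table and column names, the single child table item_purchases, and the helper tables ranked and dup_map are all assumptions made for illustration. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of dbo.Item and dbo.ItemPurchases.
    CREATE TABLE item (id INTEGER PRIMARY KEY, readable_identifier TEXT, updated_on TEXT);
    CREATE TABLE item_purchases (purchase_id INTEGER PRIMARY KEY, item_id INTEGER);
    INSERT INTO item VALUES
        (1, 'ABC1234',  '2014-04-25 16:00:08'),
        (2, 'ABC1234',  '2014-04-28 16:00:08'),
        (3, 'ABC1234',  '2014-04-21 16:00:08'),
        (4, 'ABDD9945', '2014-04-25 16:00:08'),
        (5, 'ABDD9945', '2014-04-25 18:45:08');
    INSERT INTO item_purchases VALUES (10, 1), (11, 3), (12, 4), (13, 5);

    -- Rank the rows of each identifier, newest first.
    CREATE TEMP TABLE ranked AS
    SELECT id, readable_identifier,
           ROW_NUMBER() OVER (PARTITION BY readable_identifier
                              ORDER BY updated_on DESC) AS rn
    FROM item;

    -- Map every duplicate (rn > 1) to the row that is kept (rn = 1).
    CREATE TEMP TABLE dup_map AS
    SELECT d.id AS dup_id, k.id AS keep_id
    FROM ranked d
    JOIN ranked k ON k.readable_identifier = d.readable_identifier AND k.rn = 1
    WHERE d.rn > 1;

    -- Rekey the child rows first, then delete the duplicates.
    UPDATE item_purchases
    SET item_id = (SELECT keep_id FROM dup_map WHERE dup_id = item_purchases.item_id)
    WHERE item_id IN (SELECT dup_id FROM dup_map);

    DELETE FROM item WHERE id IN (SELECT dup_id FROM dup_map);
""")

remaining_items = [r[0] for r in conn.execute("SELECT id FROM item ORDER BY id")]
purchases = conn.execute(
    "SELECT purchase_id, item_id FROM item_purchases ORDER BY purchase_id"
).fetchall()
print(remaining_items)  # [2, 5]
print(purchases)        # [(10, 2), (11, 2), (12, 5), (13, 5)]
```

Building the duplicate-to-keeper mapping once means each child table needs only one statement, which is what makes the cursor unnecessary.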

Postgres: Best way to query hierarchy structures by name

Suppose I have a hierarchy of categories as follows:
id | name | parent_id
---+------------+-----------
1 | Computers |
2 | Laptops | 1
3 | Desktops | 1
4 | Big | 2
5 | Small | 2
4 | Big | 3
5 | Small | 3
Now, suppose someone gives me the input ['Computers', 'Laptops', 'Small']. What is the best way in Postgres to query the hierarchy and arrive at the correct end category (e.g. id 5)?
I know you can use recursive CTEs to traverse the tree, but what is the best way to parameterize the input array into the query?
The following more or less works, but feels really sub-par because you have to split up the parameter array:
WITH RECURSIVE path(n, id, name, parent_id) AS (
SELECT
1, c.id, c.name, c.parent_id
FROM
categories c
WHERE c.name = 'Computers' AND parent_id IS NULL
UNION ALL
SELECT n+1, c.id, c.name, c.parent_id
FROM categories c,
(SELECT * FROM unnest(ARRAY['Laptops', 'Small']) WITH ORDINALITY np(name, m)) np,
path p
WHERE c.parent_id = p.id AND np.m = n AND np.name = c.name
)
SELECT * FROM path;
The CTE should look like this:
WITH RECURSIVE search AS (
SELECT ARRAY['Computers', 'Laptops', 'Small'] AS terms
), path (n, id, name, parent_id) AS (
SELECT 1, id, name, parent_id
FROM categories, search
WHERE name = terms[1]
UNION
SELECT p.n+1, c.id, c.name, c.parent_id
FROM categories c, path p, search s
WHERE c.parent_id = p.id
AND c.name = (s.terms)[p.n+1]
)
SELECT * FROM path;
The neat thing is that you specify the array just once, and the recursive term of the CTE then simply walks through the array, no matter how long the path is. No unnesting required. Note that this also works for partial trees: ['Desktops', 'Big'] will nicely produce the right path (excluding, obviously, 'Computers').
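For engines without array types, the same walk can be driven by a small table of (position, name) terms instead of an array. The sketch below is a variant for SQLite via Python's sqlite3 module, not the PostgreSQL query above: the temp table terms and the helper resolve_path are hypothetical, the sample rows are given distinct ids (6 and 7 for the Desktops children) as an assumption, and unlike the query above it insists that the first name is a root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample hierarchy; ids 6 and 7 are assumed so that every id is unique.
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO categories VALUES
        (1, 'Computers', NULL), (2, 'Laptops', 1), (3, 'Desktops', 1),
        (4, 'Big', 2), (5, 'Small', 2), (6, 'Big', 3), (7, 'Small', 3);
""")

def resolve_path(conn, names):
    """Hypothetical helper: id reached by following names from a root, or None."""
    # Load the search terms with their 1-based position in the path.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS terms (n INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("DELETE FROM terms")
    conn.executemany("INSERT INTO terms VALUES (?, ?)", enumerate(names, start=1))
    row = conn.execute(
        """
        WITH RECURSIVE path(n, id) AS (
            SELECT 1, c.id                                  -- first term must be a root
            FROM categories c
            JOIN terms t ON t.n = 1 AND t.name = c.name
            WHERE c.parent_id IS NULL
            UNION ALL
            SELECT p.n + 1, c.id                            -- next term must be a child
            FROM path p
            JOIN categories c ON c.parent_id = p.id
            JOIN terms t ON t.n = p.n + 1 AND t.name = c.name
        )
        SELECT id FROM path WHERE n = (SELECT COUNT(*) FROM terms)
        """
    ).fetchone()
    return row[0] if row else None

print(resolve_path(conn, ["Computers", "Laptops", "Small"]))  # 5
print(resolve_path(conn, ["Computers", "Desktops", "Big"]))   # 6
print(resolve_path(conn, ["Computers", "Tablets"]))           # None
```

Only the row whose depth equals the number of terms is returned, so a path that matches only partially yields None.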

SQL to remove rows with duplicated value while keeping one

Say I have this table
id | data | value
-----------------
1 | a | A
2 | a | A
3 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
8 | c | C
I want to remove those rows with duplicated value for each data while keeping the one with the min id, e.g. the result will be
id | data | value
-----------------
1 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
I know a way to do it is to do a union like:
SELECT 1 [id], 'a' [data], 'A' [value] INTO #test UNION SELECT 2, 'a', 'A'
UNION SELECT 3, 'a', 'A' UNION SELECT 4, 'a', 'B'
UNION SELECT 5, 'b', 'C' UNION SELECT 6, 'c', 'A'
UNION SELECT 7, 'c', 'C' UNION SELECT 8, 'c', 'C'
SELECT * FROM #test WHERE id NOT IN (
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) > 1
UNION
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) <= 1
)
but this solution has to repeat the same group by twice (consider the real case is a massive group by with > 20 columns)
I would prefer a simpler answer with less code as opposed to a complex one. Is there a more concise way to code this?
Thank you
You can use one of the methods below:
Using WITH CTE:
WITH CTE AS
(SELECT *,RN=ROW_NUMBER() OVER(PARTITION BY data,value ORDER BY id)
FROM TableName)
DELETE FROM CTE WHERE RN>1
Explanation:
This query will select the contents of the table along with a row number RN. And then delete the records with RN >1 (which would be the duplicates).
This Fiddle shows the records which are going to be deleted using this method.
Using NOT IN:
DELETE FROM TableName
WHERE id NOT IN
(SELECT MIN(id) as id
FROM TableName
GROUP BY data,value)
Explanation:
With the given example, inner query will return ids (1,6,4,5,7). The outer query will delete records from table whose id NOT IN (1,6,4,5,7).
This fiddle shows the records which are going to be deleted using this method.
Suggestion: Use the first method since it is faster than the latter. Also, it manages to keep only one record if id field is also duplicated for the same data and value.
I want to add a MySQL solution for this question.
Method 1 (WITH CTE): MySQL prior to version 8.0 doesn't support the WITH clause.
Method 2 (NOT IN): it throws this error: You can't specify target table 'TableName' for update in FROM clause.
So the solution is to read the table through a derived table:
DELETE FROM TableName WHERE id NOT IN
(SELECT MIN(id) as id
FROM (select * from TableName) as t1
GROUP BY data,value);
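The MIN(id) approach is easy to verify on the sample data from the question. Here is a minimal sketch using an in-memory SQLite database through Python's sqlite3 module; the table name test mirrors the #test temp table in the question, and SQLite is only a stand-in for the SQL Server and MySQL engines discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT, value TEXT);
    INSERT INTO test VALUES
        (1, 'a', 'A'), (2, 'a', 'A'), (3, 'a', 'A'), (4, 'a', 'B'),
        (5, 'b', 'C'), (6, 'c', 'A'), (7, 'c', 'C'), (8, 'c', 'C');

    -- Keep the smallest id of every (data, value) pair and delete the rest.
    DELETE FROM test
    WHERE id NOT IN (SELECT MIN(id) FROM test GROUP BY data, value);
""")

remaining = conn.execute("SELECT id, data, value FROM test ORDER BY id").fetchall()
print(remaining)
# [(1, 'a', 'A'), (4, 'a', 'B'), (5, 'b', 'C'), (6, 'c', 'A'), (7, 'c', 'C')]
```

A single GROUP BY is enough because MIN(id) is defined for groups of any size, so the HAVING COUNT split from the question is not needed.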