Postgres: Best way to query hierarchy structures by name

Suppose I have a hierarchy of categories as follows:
id | name      | parent_id
---+-----------+-----------
 1 | Computers |
 2 | Laptops   | 1
 3 | Desktops  | 1
 4 | Big       | 2
 5 | Small     | 2
 6 | Big       | 3
 7 | Small     | 3
Now, suppose someone gives me the input ['Computers', 'Laptops', 'Small']. What is the best way in Postgres to query the hierarchy and arrive at the correct end category (e.g. id 5)?
I know you can use recursive CTEs to traverse the tree, but what is the best way to parameterize the input array into the query?
The following more or less works, but feels really sub-par because you have to split up the parameter array:
WITH RECURSIVE path(n, id, name, parent_id) AS (
SELECT
1, c.id, c.name, c.parent_id
FROM
categories c
WHERE c.name = 'Computers' AND parent_id IS NULL
UNION ALL
SELECT n+1, c.id, c.name, c.parent_id
FROM categories c,
(SELECT * FROM unnest(ARRAY['Laptops', 'Small']) WITH ORDINALITY np(name, m)) np,
path p
WHERE c.parent_id = p.id AND np.m = n AND np.name = c.name
)
SELECT * FROM path;

The CTE should look like this:
WITH RECURSIVE search AS (
SELECT ARRAY['Computers', 'Laptops', 'Small'] AS terms
), path (n, id, name, parent_id) AS (
SELECT 1, id, name, parent_id
FROM categories, search
WHERE name = terms[1]
UNION
SELECT p.n+1, c.id, c.name, c.parent_id
FROM categories c, path p, search s
WHERE c.parent_id = p.id
AND c.name = (s.terms)[p.n+1]
)
SELECT * FROM path;
The neat thing is that you specify the array just once and the other terms of the CTE then simply traverse the array, no matter how long the path. No unnesting required. Note that this also works for partial trees: ['Desktops', 'Big'] will nicely produce the right path (excluding, obviously, 'Computers').
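Here is a sketch of the same depth-indexed walk that runs outside Postgres. Python's sqlite3 module stands in for the database (an assumption made only to keep the example self-contained), and because SQLite has no array type, a temporary terms(n, name) table plays the role of terms[n]. The function and helper names are illustrative. Unlike the query above, this version anchors at the root and returns an id only when every path element matched.

```python
import sqlite3

def find_category(conn, names):
    """Follow `names` from a root category; return the final id or None."""
    # One row per path element; n plays the role of the array subscript.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS terms (n INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("DELETE FROM terms")
    conn.executemany("INSERT INTO terms VALUES (?, ?)", list(enumerate(names, start=1)))
    row = conn.execute("""
        WITH RECURSIVE path(n, id) AS (
            SELECT 1, c.id
            FROM categories c
            JOIN terms t ON t.n = 1 AND t.name = c.name
            WHERE c.parent_id IS NULL
            UNION ALL
            SELECT p.n + 1, c.id
            FROM path p
            JOIN categories c ON c.parent_id = p.id
            JOIN terms t ON t.n = p.n + 1 AND t.name = c.name
        )
        -- keep only a row that consumed the whole input path
        SELECT id FROM path WHERE n = (SELECT COUNT(*) FROM terms)
    """).fetchone()
    return row[0] if row else None

def make_categories_db():
    """In-memory sample data; ids are assumed to be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
        (1, "Computers", None), (2, "Laptops", 1), (3, "Desktops", 1),
        (4, "Big", 2), (5, "Small", 2), (6, "Big", 3), (7, "Small", 3),
    ])
    return conn

conn = make_categories_db()
print(find_category(conn, ["Computers", "Laptops", "Small"]))  # 5
```

In Postgres itself the temporary table is unnecessary: the array can be passed as a single bind parameter and subscripted as in the answer.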

Related

Partial match of ingredients (more than one) that returns all the foods that contain all of the requested ingredients

I have a table called food with the following:
id name
1 lazania
2 pizza
3 toast
I have the table called ingredients with the following:
id name
1 milk
2 yellow cheese
3 bread
4 ketchup
then I have food_ingredients table with the following:
id food_id ingredient_id
1 1 1
2 1 2
3 2 3
4 2 4
5 3 2
6 3 4
Of course, the ingredients don't really belong to these foods; this is just to show what I'm trying to solve.
Now I want the user to be able to search by partial ingredient names and get back the ids of all foods that contain all of the searched ingredients.
So if the user searches for yellow c, ket, the result is food id 3, because it contains both yellow cheese and ketchup.
If the user searches only for ketchup, the result is food ids 2 and 3, because both contain ketchup.
If the user searches for milk, bread, ketchup, nothing is returned, because no food contains all 3 ingredients.
I'm really lost on how to implement such a query, any information regarding this issue would be greatly appreciated.
I use PostgreSQL version 12.2
thanks
This is a kind of aggregation problem: we count how many of the input patterns match ingredients of each food. If all N patterns match, that food is included in the result.
Complete example of the solution, with data
WITH inputs (input) AS (
SELECT 'ye' UNION
SELECT 'ket'
)
SELECT f.id, f.name, COUNT(DISTINCT inp.input) AS n
FROM inputs AS inp
JOIN ingredients AS ing
ON ing.name LIKE inp.input||'%'
JOIN food_ingredients AS fi
ON fi.ingredient_id = ing.id
JOIN food AS f
ON fi.food_id = f.id
GROUP BY f.id
HAVING COUNT(DISTINCT inp.input) = (SELECT COUNT(*) FROM inputs)
;
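The counting approach above can be tried end to end with the sample data from the question. This is a minimal sketch using Python's sqlite3 as a stand-in for Postgres (an assumption for runnability); the function name foods_matching_all and the temporary inputs table are illustrative. It does not escape % or _ in the user's search terms, which a real implementation would need to do.

```python
import sqlite3

def foods_matching_all(conn, prefixes):
    """Return ids of foods having, for every prefix, an ingredient starting with it."""
    prefixes = sorted(set(prefixes))
    if not prefixes:
        return []
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS inputs (input TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM inputs")
    conn.executemany("INSERT INTO inputs VALUES (?)", [(p,) for p in prefixes])
    rows = conn.execute("""
        SELECT f.id
        FROM inputs inp
        JOIN ingredients ing ON ing.name LIKE inp.input || '%'
        JOIN food_ingredients fi ON fi.ingredient_id = ing.id
        JOIN food f ON f.id = fi.food_id
        GROUP BY f.id
        -- a food qualifies only if every input matched at least one ingredient
        HAVING COUNT(DISTINCT inp.input) = (SELECT COUNT(*) FROM inputs)
        ORDER BY f.id
    """).fetchall()
    return [r[0] for r in rows]

def make_food_db():
    """In-memory copy of the question's three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE food_ingredients (id INTEGER PRIMARY KEY, food_id INTEGER, ingredient_id INTEGER)")
    conn.executemany("INSERT INTO food VALUES (?, ?)",
                     [(1, "lazania"), (2, "pizza"), (3, "toast")])
    conn.executemany("INSERT INTO ingredients VALUES (?, ?)",
                     [(1, "milk"), (2, "yellow cheese"), (3, "bread"), (4, "ketchup")])
    conn.executemany("INSERT INTO food_ingredients VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 2, 4), (5, 3, 2), (6, 3, 4)])
    return conn

conn = make_food_db()
print(foods_matching_all(conn, ["yellow c", "ket"]))  # [3]
```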
Let's first start with selecting the ingredients from your table that match the search. We split the string using string_to_array and use pattern matching to compare to their names:
SELECT i.*
FROM ingredients i
RIGHT JOIN UNNEST(string_to_array($1, ', ')) AS term ON(i.name LIKE term || '%');
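Notice that with the RIGHT JOIN we get NULLs for search terms that match nothing in your ingredients table. Such a NULL survives the EXCEPT in the query below, so a search containing an unknown term returns no food at all.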
Now, to find the foods that have all these ingredients, we need to do some set logic and can use the EXISTS and EXCEPT operations:
SELECT *
FROM food
WHERE NOT EXISTS (
SELECT i.id
FROM ingredients i
RIGHT JOIN UNNEST(string_to_array($1, ', ')) AS term ON(i.name LIKE term || '%')
EXCEPT
SELECT ingredient_id
FROM food_ingredients
WHERE food_id = food.id
);
Other approaches to find foods that include all the ingredients would be WHERE NOT EXISTS(SELECT … food_ingredients WHERE ingredient_id NOT IN (SELECT …)) or WHERE ARRAY(SELECT … food_ingredients) @> ARRAY(…).
To cover your requirement to include all search terms:
with invars (search_terms) as (
values ('yellow c, ket'),
('ketchup'),
('milk, bread, ketchup')
),
Match search terms to existing ingredients and carry an array of ingredient_id matching the search terms.
search_match as (
select v.search_terms, i.id as ingredient_id,
array_agg(i.id) over (partition by v.search_terms) as all_ingredients
from invars v
cross join lateral regexp_split_to_table(v.search_terms, ', ') as m(term)
join ingredients i on i.name ~ m.term
),
Find food_id for foods containing those ingredients
match_foods as (
select distinct s.search_terms, fi.food_id, s.all_ingredients
from search_match s
join food_ingredients fi on fi.ingredient_id = s.ingredient_id
)
Join back to the food_ingredient and food tables to find your result. Keep food_id only if it contains all of the all_ingredients.
select m.search_terms, m.food_id, f.name
from match_foods m
join food_ingredients fi on fi.food_id = m.food_id
join food f on f.id = m.food_id
group by m.search_terms, m.food_id, f.name, m.all_ingredients
having array_agg(fi.ingredient_id) @> m.all_ingredients;
Results:
| search_terms | food_id | name |
| ------------- | ------- | ----- |
| ketchup | 2 | pizza |
| ketchup | 3 | toast |
| yellow c, ket | 3 | toast |

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres but I can't wrap my head around it. Performance shouldn't be an issue, since TABLE 1 has only 50 rows.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id, name, amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expenses.id,
name,
parent_id,
expense_amount.total
FROM expenses
WHERE expenses.parent_id IS NULL
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
UNION ALL
SELECT
expenses.id,
expenses.name,
expenses.parent_id,
expense_amount.total
FROM cte
JOIN expenses ON expenses.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.expense_id = expenses.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first and then joining the related tables to the result of the recursive query, but you could also do the join to expense_amount inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big)
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
where t.path = e.path
or t.path like e.path||'/%') as sub_tree_amount, -- the '/' keeps path /1 from also matching /10
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure. This helps me debug the statement and understand the result better. If you need the real name of an element in your program, you would obviously use only e.name instead of prepending it with blanks.
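An alternative to matching on path strings is to let the recursive CTE emit (ancestor, descendant) pairs and aggregate over those. This is a different technique from the path-based query above, sketched here with Python's sqlite3 standing in for Postgres (an assumption for runnability); the CTE name closure and the helper names are illustrative. It returns the subtree total for every node, with 0 for nodes that have no amounts.

```python
import sqlite3

SUBTREE_TOTALS_SQL = """
WITH RECURSIVE closure(ancestor, descendant) AS (
    SELECT id, id FROM expense            -- every node belongs to its own subtree
    UNION ALL
    SELECT c.ancestor, e.id               -- extend each subtree by one level
    FROM closure c
    JOIN expense e ON e.parent_id = c.descendant
)
SELECT c.ancestor, COALESCE(SUM(ea.amount), 0)
FROM closure c
LEFT JOIN expense_amount ea ON ea.ref_id = c.descendant
GROUP BY c.ancestor
ORDER BY c.ancestor
"""

def subtree_totals(conn):
    """Map each expense id to the sum of amounts in its subtree."""
    return dict(conn.execute(SUBTREE_TOTALS_SQL).fetchall())

def make_expense_db():
    """In-memory copy of the question's two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expense (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE expense_amount (ref_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO expense VALUES (?, ?, ?)",
                     [(1, None, "A"), (2, None, "B"), (3, 1, "C"), (4, 1, "D")])
    conn.executemany("INSERT INTO expense_amount VALUES (?, ?)", [(3, 500), (4, 200)])
    return conn

print(subtree_totals(make_expense_db()))  # {1: 700, 2: 0, 3: 500, 4: 200}
```

The pair set grows with the depth of the tree (one row per node per ancestor), so like the path approach it is best suited to small hierarchies.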
I could not get your query to work for some reason. Here's my attempt, without recursion, that works for the particular tables you provided (parent-child only, no grandchildren).
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

how can I get all ids starting from a given id recursively in a postgresql table that references itself?

the title may not be very clear so let's consider this example (this is not my code, just taking this example to model my request)
I have a table that references itself (like a filesystem)
id | parent | name
----+----------+-------
1 | null | /
2 | 1 | home
3 | 2 | user
4 | 3 | bin
5 | 1 | usr
6 | 5 | local
Is it possible to make a sql request so if I choose :
1 I will get a table containing 2,3,4,5,6 (because this is the root) so matching :
/home
/home/user
/home/user/bin
/usr
etc...
2 I will get a table containing 3,4 so matching :
/home/user
/home/user/bin
and so on
Use a recursive common table expression. Always starting from the root, collect the ids along each path in an array, then filter on that array in the WHERE clause to get the rows below a given id.
For id = 1:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 1 = any(ids) and id <> 1
id | name
----+-----------------------
2 | /home/
5 | /usr/
6 | /usr/local/
3 | /home/user/
4 | /home/user/bin/
(5 rows)
For id = 2:
with recursive cte(id, parent, name, ids) as (
select id, parent, name, array[id]
from my_table
where parent is null
union all
select t.id, t.parent, concat(c.name, t.name, '/'), ids || t.id
from cte c
join my_table t on c.id = t.parent
)
select id, name
from cte
where 2 = any(ids) and id <> 2
id | name
----+-----------------------
3 | /home/user/
4 | /home/user/bin/
(2 rows)
Bidirectional query
The question is really interesting. The above query works well but is inefficient, as it visits every node of the tree even when we're asking for a leaf. A more powerful solution is a bidirectional recursive query: the inner query walks from the given node up to the root, while the outer one walks from the node down to its descendants.
with recursive outer_query(id, parent, name) as (
with recursive inner_query(qid, id, parent, name) as (
select id, id, parent, name
from my_table
where id = 2 -- parameter
union all
select qid, t.id, t.parent, concat(t.name, '/', q.name)
from inner_query q
join my_table t on q.parent = t.id
)
select qid, null::int, right(name, -1)
from inner_query
where parent is null
union all
select t.id, t.parent, concat(q.name, '/', t.name)
from outer_query q
join my_table t on q.id = t.parent
)
select id, name
from outer_query
where id <> 2; -- parameter
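If only the ids are needed and not the full path names, the upward walk can be dropped: the recursion can start at the children of the given node and go down from there. Below is a minimal sketch, with Python's sqlite3 standing in for Postgres (an assumption for runnability) and illustrative function names.

```python
import sqlite3

def descendants(conn, node_id):
    """Return the ids of all rows below `node_id`, excluding the node itself."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM my_table WHERE parent = ?   -- direct children
            UNION ALL
            SELECT t.id                                -- children of rows found so far
            FROM sub s
            JOIN my_table t ON t.parent = s.id
        )
        SELECT id FROM sub ORDER BY id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]

def make_fs_db():
    """In-memory copy of the question's filesystem-like table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
        (1, None, "/"), (2, 1, "home"), (3, 2, "user"),
        (4, 3, "bin"), (5, 1, "usr"), (6, 5, "local"),
    ])
    return conn

conn = make_fs_db()
print(descendants(conn, 1))  # [2, 3, 4, 5, 6]
print(descendants(conn, 2))  # [3, 4]
```

This touches only the subtree of the chosen node, which is the property the bidirectional query is after.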

Select query for selecting columns from the records returned by the inner query, where inner query and outer query have different columns

I have a group by query which fetches some records. What if I wish to find other column details for those records?
Suppose I have a query as follows: Select id,max(date) from records group by id;
to fetch the most recent entry for each id in the table.
I wish to fetch another column for those records.
I want to do something like this (this incorrect query is just an example):
Select type from (Select id,max(date) from records group by id) but here type doesn't exist in the inner query.
I am not able to define the question in a simpler manner. I apologise for that.
Any help is appreciated.
EDIT :
Column | Type | Modifiers
--------+-----------------------+-----------
id | integer |
rdate | date |
type | character varying(20) |
Sample Data :
id | rdate | type
----+------------+------
1 | 2013-11-03 | E1
1 | 2013-12-12 | E1
2 | 2013-12-12 | A3
3 | 2014-01-11 | B2
1 | 2014-01-15 | A1
4 | 2013-12-23 | C1
5 | 2014-01-05 | C
7 | 2013-12-20 | D
8 | 2013-12-20 | D
9 | 2013-12-23 | A1
While I was trying something like this (I'm no good at sql) : select type from records as r1 inner join (Select id,max(rdate) from records group by id) r2 on r1.rdate = r2.rdate ;
or
select type from records as r1 ,(Select id,max(rdate) from records group by id) r2 inner join r1 on r1.rdate = r2.rdate ;
You can easily do this with a window function:
SELECT id, rdate, type
FROM (
SELECT id, rdate, type, rank() OVER (PARTITION BY id ORDER BY rdate DESC) rnk
FROM records
) foo
WHERE rnk = 1 -- a window function result can only be filtered in an outer query
ORDER BY id;
The window definition OVER (PARTITION BY id ORDER BY rdate DESC) takes all records with the same id value, then sorts them from most recent to least recent rdate and assigns a rank to each row. The rank of 1 is the most recent, so it corresponds to the row with max(rdate).
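The rank-then-filter pattern can be checked against the sample data from the question. This sketch uses Python's sqlite3 in place of Postgres and assumes an SQLite build of 3.25 or later, which is when window functions were added. It uses ROW_NUMBER() rather than rank(), so exactly one row per id comes back even if two rows share the latest rdate; which of the tied rows wins is arbitrary. The helper names are illustrative.

```python
import sqlite3

LATEST_SQL = """
SELECT id, rdate, type
FROM (
    SELECT id, rdate, type,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY rdate DESC) AS rn
    FROM records
) ranked
WHERE rn = 1          -- the filter sits outside the subquery that computes rn
ORDER BY id
"""

def latest_per_id(conn):
    """Return one (id, rdate, type) tuple per id: the row with the latest rdate."""
    return conn.execute(LATEST_SQL).fetchall()

def make_records_db():
    """In-memory copy of the question's sample data; dates are ISO strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, rdate TEXT, type TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
        (1, "2013-11-03", "E1"), (1, "2013-12-12", "E1"), (2, "2013-12-12", "A3"),
        (3, "2014-01-11", "B2"), (1, "2014-01-15", "A1"), (4, "2013-12-23", "C1"),
        (5, "2014-01-05", "C"), (7, "2013-12-20", "D"), (8, "2013-12-20", "D"),
        (9, "2013-12-23", "A1"),
    ])
    return conn

rows = latest_per_id(make_records_db())
print(rows[0])  # (1, '2014-01-15', 'A1')
```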
If I've understood the question right, then this should work (or at least get you something you can work with):
SELECT
b.id, b.maxdate, a.type
FROM
records a -- this is the records table, where you'll get the type
INNER JOIN -- now join it to the group by query
(select id, max(rdate) as maxdate FROM records GROUP BY id) b
ON -- join on both rdate and id, otherwise you'll get lots of duplicates
b.id = a.id
AND b.maxdate = a.rdate
Note that if you have records with different types for the same id and rdate combination you'll get duplicates.

SQL to remove rows with duplicated value while keeping one

Say I have this table
id | data | value
-----------------
1 | a | A
2 | a | A
3 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
8 | c | C
I want to remove those rows with duplicated value for each data while keeping the one with the min id, e.g. the result will be
id | data | value
-----------------
1 | a | A
4 | a | B
5 | b | C
6 | c | A
7 | c | C
I know a way to do it is to do a union like:
SELECT 1 [id], 'a' [data], 'A' [value] INTO #test UNION SELECT 2, 'a', 'A'
UNION SELECT 3, 'a', 'A' UNION SELECT 4, 'a', 'B'
UNION SELECT 5, 'b', 'C' UNION SELECT 6, 'c', 'A'
UNION SELECT 7, 'c', 'C' UNION SELECT 8, 'c', 'C'
SELECT * FROM #test WHERE id NOT IN (
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) > 1
UNION
SELECT MIN(id) FROM #test
GROUP BY [data], [value]
HAVING COUNT(1) <= 1
)
but this solution has to repeat the same group by twice (consider that the real case is a massive group by with > 20 columns).
I would prefer a simpler answer with less code as opposed to complex ones. Is there any more concise way to code this?
Thank you
You can use one of the methods below:
Using WITH CTE:
WITH CTE AS
(SELECT *,RN=ROW_NUMBER() OVER(PARTITION BY data,value ORDER BY id)
FROM TableName)
DELETE FROM CTE WHERE RN>1
Explanation:
This query will select the contents of the table along with a row number RN. And then delete the records with RN >1 (which would be the duplicates).
This Fiddle shows the records which are going to be deleted using this method.
Using NOT IN:
DELETE FROM TableName
WHERE id NOT IN
(SELECT MIN(id) as id
FROM TableName
GROUP BY data,value)
Explanation:
With the given example, inner query will return ids (1,6,4,5,7). The outer query will delete records from table whose id NOT IN (1,6,4,5,7).
This fiddle shows the records which are going to be deleted using this method.
Suggestion: Use the first method since it is faster than the latter. Also, it manages to keep only one record if id field is also duplicated for the same data and value.
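The NOT IN variant is portable enough to try on the question's sample rows. The sketch below runs it with Python's sqlite3 (an assumption for runnability; the question itself targets SQL Server) and then lists the ids that remain. The helper names are illustrative.

```python
import sqlite3

def delete_duplicates(conn):
    """Delete every row except the lowest id within each (data, value) group."""
    cur = conn.execute("""
        DELETE FROM TableName
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM TableName
            GROUP BY data, value
        )
    """)
    return cur.rowcount  # number of rows removed

def make_dup_db():
    """In-memory copy of the question's sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (id INTEGER PRIMARY KEY, data TEXT, value TEXT)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", [
        (1, "a", "A"), (2, "a", "A"), (3, "a", "A"), (4, "a", "B"),
        (5, "b", "C"), (6, "c", "A"), (7, "c", "C"), (8, "c", "C"),
    ])
    return conn

conn = make_dup_db()
delete_duplicates(conn)
print([r[0] for r in conn.execute("SELECT id FROM TableName ORDER BY id")])  # [1, 4, 5, 6, 7]
```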
I want to add a MySQL solution for this question.
Suggestion 1 (the CTE) does not work on MySQL prior to version 8.0, which doesn't support the WITH clause.
Suggestion 2 (NOT IN) throws this error: You can't specify target table 'TableName' for update in FROM clause.
So the solution is to wrap the table in a derived table:
DELETE FROM TableName WHERE id NOT IN
(SELECT MIN(id) as id
FROM (select * from TableName) as t1
GROUP BY data,value);