Multi-table recursive sql statement - postgresql

I have been struggling to optimize a recursive call done purely in ruby. I have moved the data onto a postgresql database, and I would like to make use of the WITH RECURSIVE function that postgresql offers.
The examples that I could find all seems to use a single table, such as a menu or a categories table.
My situation is slightly different. I have a questions and an answers table.
+----------------------+ +------------------+
| questions | | answers |
+----------------------+ +------------------+
| id | | source_id | <- from question ID
| start_node (boolean) | | target_id | <- to question ID
| end_node (boolean) | +------------------+
+----------------------+
I would like to fetch all questions that's connected together by the related answers.
I would also like to be able to go the other way in the tree, e.g from any given node to the root node in the tree.
To give another example of a question-answer tree in a graphical way:
Q1
|-- A1
| '-- Q2
| |-- A2
| | '-- Q3
| '-- A3
| '-- Q4
'-- A4
'-- Q5
As you can see, a question can have multiple outgoing questions, but they can also have multiple incoming answers -- any-to-many.
I hope that someone has a good idea, or can point me to some examples, articles or guides.
Thanks in advance, everybody.
Regards,
Emil

This is far, far from ideal but I would play around recursive query over joins, like that:
WITH RECURSIVE questions_with_answers AS (
SELECT
q.*, a.*
FROM
questions q
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
UNION ALL
SELECT
q.*, a.*
FROM
questions_with_answers qa
JOIN
questions q ON (qa.target_id = q.id)
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
)
SELECT * FROM questions_with_answers WHERE source_id IS NOT NULL AND target_id IS NOT NULL;
Which gives me result:
id | name | start_node | end_node | source_id | target_id
----+------+------------+----------+-----------+-----------
1 | Q1 | | | 1 | 2
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 4
3 | Q2 | | | 3 | 6
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
1 | Q1 | | | 1 | 8
8 | A4 | | | 8 | 9
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
8 | A4 | | | 8 | 9
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
(20 rows)

In fact you do not need two tables.
I would like to encourage you to analyse this example.
Maintaining one table instead of two will save you a lot of trouble, especially when it comes to recursive queries.
This minimal structure contains all the necessary information:
create table the_table (id int primary key, parent_id int);
insert into the_table values
(1, 0), -- root question
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 1),
(7, 3),
(8, 0), -- root question
(9, 8);
Whether the node is a question or an answer depends on its position in the tree. Of course, you can add a column with information about the type of node to the table.
Use this query to get answer for both your requests (uncomment adequate where condition):
with recursive cte(id, parent_id, depth, type, root) as (
select id, parent_id, 1, 'Q', id
from the_table
where parent_id = 0
-- and id = 1 <-- looking for list of a&q for root question #1
union all
select
t.id, t.parent_id, depth+ 1,
case when (depth & 1)::boolean then 'A' else 'Q' end, c.root
from cte c
join the_table t on t.parent_id = c.id
)
select *
from cte
-- where id = 9 <-- looking for root question for answer #9
order by id;
id | parent_id | depth | type | root
----+-----------+-------+------+------
1 | 0 | 1 | Q | 1
2 | 1 | 2 | A | 1
3 | 1 | 2 | A | 1
4 | 2 | 3 | Q | 1
5 | 2 | 3 | Q | 1
6 | 1 | 2 | A | 1
7 | 3 | 3 | Q | 1
8 | 0 | 1 | Q | 8
9 | 8 | 2 | A | 8
(9 rows)
The relationship child - parent is unambiguous and applies to both sides. There is no need to store this information twice. In other words, if we store information about parents, the information about children is redundant (and vice versa). It is one of the fundamental properties of the data structure called tree. See the examples:
-- find parent of node #6
select parent_id
from the_table
where id = 6;
-- find children of node #6
select id
from the_table
where parent_id = 6;

Related

Find rows in relation with at least n rows in a different table without joins

I have a table as such (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
| 3 | faso | 11 |
+----+------+-----+
And another table in n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
| 7 | 2 |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select
t1.pk,
t1.attr,
t1.val
from
tbl t1
join
tbl2 t2 on t1.pk = t2.rel
group by
t1.pk,
t1.attr,
t1.val
having(count(1)>=2) order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Or just join once and use CTE(with clause), as below:
with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?

postgres sql : getting unified rows

I have one table where I dump all records from different sources (x, y, z) like below
+----+------+--------+
| id | source |
+----+--------+
| 1 | x |
| 2 | y |
| 3 | x |
| 4 | x |
| 5 | y |
| 6 | z |
| 7 | z |
| 8 | x |
| 9 | z |
| 10 | z |
+----+--------+
Then I have one mapping table where I map values between sources based on my usecase like below
+----+-----------+
| id | mapped_id |
+----+-----------+
| 1 | 2 |
| 1 | 9 |
| 3 | 7 |
| 4 | 10 |
| 5 | 1 |
+----+-----------+
I want merged results where I can see only unique results like
+-----+------------+
| id | mapped_ids |
+-----+------------+
| 1 | 2,9,5 |
| 3 | 7 |
| 4 | 10 |
| 6 | null |
| 8 | null |
+-----+------------+
I am trying different options but could not figure this out, is there way I can write joins to do this. I have to use the mapping table where associations are stored and identify unique records along with records which are not mapped anywhere.
My understanding is, you want to see all dump_table IDs that do not appear in the mapping_id column and then aggregate the mapped_ids for those that are left:
select d1.id,
array_agg(m1.mapped_id order by m1.mapped_id) filter (where m1.mapped_id is not null) as mapped_ids
from dump_table d1
left join mapping_table m1 using (id)
where not exists (select *
from mapping_table m2
where m2.mapped_id = d1.id)
group by d1.id;
Online example: https://rextester.com/JQZ17650
Try something like this:
SELECT id, name, ARRAY_AGG(mapped_id) AS mapped_ids
FROM table1 AS t1
LEFT JOIN table2 AS t2 USING (id)
GROUP BY id, name

How to change the query to remain only leaf nodes

I have table with the following data:
id | parent_id | short_name
----+-----------+----------------
6 | 5 | cpu
7 | 5 | ram
14 | 9 | tier-a
15 | 9 | rfc1918
16 | 9 | tolerant
17 | 9 | nononymous
13 | 12 | cloudstack
5 | 13 | virtualmachine
8 | 13 | volume
9 | 13 | ipv4
3 | | domain
4 | | account
12 | | vdc
(13 rows)
with recursive query it looks like this:
with recursive tree ( id, parent_id, short_name, deep_name ) as (
select resource_type_id, parent_resource_type_id, short_name, short_name::text
from resource_type
where parent_resource_type_id is null
union all
select rt.resource_type_id as id, rt.parent_resource_type_id, rt.short_name,
tree.deep_name || '.' || rt.short_name
from tree, resource_type rt
where tree.id = rt.parent_resource_type_id
)
select * from tree;
id | parent_id | short_name | deep_name
----+-----------+----------------+-----------------------------------
4 | | account | account
3 | | domain | domain
12 | | vdc | vdc
13 | 12 | cloudstack | vdc.cloudstack
9 | 13 | ipv4 | vdc.cloudstack.ipv4
5 | 13 | virtualmachine | vdc.cloudstack.virtualmachine
8 | 13 | volume | vdc.cloudstack.volume
6 | 5 | cpu | vdc.cloudstack.virtualmachine.cpu
15 | 9 | rfc1918 | vdc.cloudstack.ipv4.rfc1918
17 | 9 | nononymous | vdc.cloudstack.ipv4.nononymous
16 | 9 | tolerant | vdc.cloudstack.ipv4.tolerant
14 | 9 | tier-a | vdc.cloudstack.ipv4.tier-a
7 | 5 | ram | vdc.cloudstack.virtualmachine.ram
(13 rows)
How to fix the query so in result I get only leafs? eg. vdc.cloudstack.volume row and no vdc, vdc.cloudstack rows
UPD
rows with no children
Exclude the rows where deep_name has a superstring somewhere else in the table:
WITH RECURSIVE tree AS (...)
SELECT * FROM tree AS t1
WHERE NOT EXISTS (
SELECT 1 FROM tree AS t2
WHERE t2.deep_name
LIKE t1.deep_name || '.%'
);
Laurenz Albe's answer give me an idea. I think it would be more efficient to count childs than working with strings.
My solution is:
WITH RECURSIVE tree AS (...)
SELECT * FROM tree t1
WHERE not EXISTS ( SELECT 1 FROM tree t2 WHERE t1.id = t2.parent_id );
A leaf node is a child which is not itself a parent.
If all you want is a list of leaf notes you don't need the recursive CTE, you just need an anti-join in your preferred format.
If (as I imagine you do) you need the deep_name, I would anti-join the result of the recursive CTE to the raw source table on id = parent_id.
WITH RECURSIVE tree AS (...)
SELECT * FROM tree AS t1
WHERE NOT EXISTS (SELECT 1 FROM resource_type AS t2
WHERE t2.parent_resource_type_id = t1.id);

Finding value difference in column pairs

I'm using SQL server 2008R2 and I have a view which returns the following:
+----+-------+-------+-------+-------+-------+-------+
| ID | col1A | col1B | col2A | col2B | col3A | col3B |
+----+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 3 | 5 | 4 | 4 |
| 2 | 1 | 1 | 5 | 5 | 5 | 4 |
| 3 | 3 | 4 | 5 | 5 | 4 | 4 |
| 4 | 1 | 2 | 5 | 5 | 4 | 3 |
| 5 | 1 | 1 | 2 | 2 | 3 | 3 |
+----+-------+-------+-------+-------+-------+-------+
As you can see this view contains column pairs (col1A and col1B), (col2A and col2B), (col3A and col3B).
I need to query this view and find rows where the column pairs contain different values.
So I would be looking to return:
+----+------------+---+-----+
| ID | ColumnType | A | B |
+----+------------+---+-----+
| 1 | Col2 | 3 | 5 |
| 2 | Col3 | 5 | 4 |
| 3 | Col1 | 3 | 4 |
| 4 | Col1 | 1 | 2 |
| 4 | Col3 | 4 | 3 |
+----+------------+---+-----+
I think I need to use UNPIVOT but not sure how – appreciate any suggestions?
Since you are using SQL Server 2008+ you can use CROSS APPLY to unpivot the pair of columns and then you can easily compare the values in the A and B to return the rows that don't match:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', Col1A, Col1B),
('Col2', Col2A, Col2B),
('Col3', Col3A, Col3B)
) c (ColumnType, A, B)
where c.A <> c.B;
If you have different datatypes in your columns, then you'll need to convert the data to the same type. You can do this conversion within the VALUES clause:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', cast(Col1A as varchar(50)), Col1B),
('Col2', cast(Col2A as varchar(50)), Col2B),
('Col3', cast(Col3A as varchar(50)), Col3B)
) c (ColumnType, A, B)
where c.A <> c.B

PostgreSQL Query?

DB
| ID| VALUE | Parent | Position | lft | rgt |
|---|:------:|:-------:|:--------------:|:--------:|:--------:|
| 1 | A | | | 1 | 12 |
| 2 | B | 1 | L | 2 | 9 |
| 3 | C | 1 | R | 10 | 11 |
| 4 | D | 2 | L | 3 | 6 |
| 5 | F | 2 | R | 7 | 8 |
| 6 | G | 4 | L | 4 | 5 |
Get All Nodes directly under current Node in left side
SELECT "categories".* FROM "categories" WHERE ("categories"."position" = 'L') AND ("categories"."lft" >= 1 AND "categories"."lft" < 12) ORDER BY "categories"."lft"
output { B,D,G } incoorect!
Question !
how have Nodes directly under current Node in left and right side?
output-lft {B,D,F,G}
output-rgt {C}
It sounds like you're after something analogous to Oracle's CONNECT_BY statement, which is used to connect hierarchical data stored in a flat table.
It just so happens there's a way to do this with Postgres, using a recursive CTE.
here is the statement I came up with.
WITH RECURSIVE sub_categories AS
(
-- non-recursive term
SELECT * FROM categories WHERE position IS NOT NULL
UNION ALL
-- recursive term
SELECT c.*
FROM
categories AS c
JOIN
sub_categories AS sc
ON (c.parent = sc.id)
)
SELECT DISTINCT categories.value
FROM categories,
sub_categories
WHERE ( categories.parent = sub_categories.id
AND sub_categories.position = 'L' )
OR ( categories.parent = 1
AND categories.position = 'L' )
here is a SQL Fiddle with a working example.