Parent/child table with include/exclude items - postgresql

I have a table with parent-child relations. The relations can go n-level deep.
There is also a table with elements that belong to a group.
CREATE TABLE group_children(
id serial PRIMARY KEY,
parent_id integer,
children_id integer,
contains boolean
);
CREATE TABLE group_item(
id serial PRIMARY KEY,
group_id integer,
name text
);
INSERT INTO group_children(parent_id, children_id, contains) VALUES
(1, 2, true),
(1, 3, false),
(2, 4, true),
(2, 5, false),
(3, 6, true),
(3, 7, false);
INSERT INTO group_item(group_id, name) VALUES
(4, 'aaa'),
(4, 'bbb'),
(5, 'bbb'),
(5, 'ccc'),
(6, 'aaa'),
(6, 'bbb'),
(7, 'aaa'),
(7, 'ccc');
So, we can represent this data as a tree: group 1 includes group 2 and excludes group 3, group 2 includes group 4 and excludes group 5, and group 3 includes group 6 and excludes group 7.
It does not have to be a binary tree; this is just a simple case. A group can contain m children.
The tree has to be evaluated from the leaves up. Group 4 contains ['aaa', 'bbb'] and group 5 contains ['bbb', 'ccc']. Group 2 includes all items from group 4 and excludes the items of group 5, so group 2 contains ['aaa']. And so on. After all the computation, group 1 will contain ['aaa'].
The question is: how do I build a SQL query that returns all items belonging to group 1?
All I could come up with is:
WITH RECURSIVE r AS (
SELECT group_children.parent_id, group_children.children_id, group_children.contains, group_item.name
FROM group_children
LEFT JOIN group_item ON group_children.children_id = group_item.group_id
WHERE parent_id = 1
UNION ALL
SELECT group_children.parent_id, group_children.children_id, group_children.contains, group_item.name
FROM group_children
LEFT JOIN group_item ON group_children.children_id = group_item.group_id
JOIN r ON group_children.parent_id = r.children_id
)
SELECT * FROM r;

WITH RECURSIVE items AS (
SELECT -- 1
group_id,
array_agg(name)
FROM
group_item
GROUP BY group_id
UNION
SELECT DISTINCT
parent_id,
array_agg(unnest) FILTER (WHERE bool_and) OVER (PARTITION BY parent_id) -- 5
FROM (
SELECT
parent_id,
unnest,
bool_and(contains) OVER (PARTITION BY parent_id, unnest) -- 4
FROM items i
JOIN group_children gc -- 2
ON i.group_id = gc.children_id,
unnest(array_agg) -- 3
) s
)
SELECT * FROM items
1. The non-recursive part aggregates all names per group_id.
2. Recursive part: joining the children against their parents.
3. Expanding the name arrays into one element per row.
This results in:
| group_id | array_agg | id | parent_id | children_id | contains | unnest |
|----------|-----------|----|-----------|-------------|----------|--------|
| 4 | {aaa,bbb} | 3 | 2 | 4 | true | aaa |
| 4 | {aaa,bbb} | 3 | 2 | 4 | true | bbb |
| 5 | {bbb,ccc} | 4 | 2 | 5 | false | bbb |
| 5 | {bbb,ccc} | 4 | 2 | 5 | false | ccc |
| 6 | {aaa,bbb} | 5 | 3 | 6 | true | aaa |
| 6 | {aaa,bbb} | 5 | 3 | 6 | true | bbb |
| 7 | {aaa,ccc} | 6 | 3 | 7 | false | aaa |
| 7 | {aaa,ccc} | 6 | 3 | 7 | false | ccc |
4. Now you have the unnested names, and you need to find the ones that have to be excluded. Take bbb for parent_id = 2: there is one row with contains = true and one with contains = false, so bbb has to be excluded. Therefore, all names have to be grouped per parent_id, and their contains values can be combined with a boolean aggregate. bool_and returns true only if all values are true, so bbb gets false. The aggregation has to be done as a window function because PostgreSQL does not allow aggregate functions in the recursive term of a recursive query:
Result:
| parent_id | unnest | bool_and |
|-----------|--------|----------|
| 2 | aaa | true |
| 2 | bbb | false |
| 2 | bbb | false |
| 2 | ccc | false |
| 3 | aaa | false |
| 3 | aaa | false |
| 3 | bbb | true |
| 3 | ccc | false |
5. After that, the unnested names can be aggregated per parent_id. The FILTER clause makes array_agg collect only the elements where bool_and is true. For the same reason as above, this has to be a window function again. Because a window function returns one row per input row, this creates duplicate records, which the DISTINCT clause removes.
Final result (which could, of course, be filtered with WHERE group_id = 1):
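Steps 4 and 5 can be tried in isolation on the unnested rows from the first table above. This is a minimal sketch that assumes SQLite (via Python's built-in `sqlite3`) as a stand-in for PostgreSQL: SQLite has no `bool_and()`, so `contains` is stored as 1/0 and `min()` is used instead, which is 1 only if every value is 1. Step 5 runs outside a recursive CTE here, so a plain `GROUP BY` with `group_concat()` replaces the filtered `array_agg` window.

```python
import sqlite3

# (parent_id, contains, unnest) rows after steps 2 and 3, taken from the table above.
# contains is 1/0 because SQLite has no boolean type.
ROWS = [
    (2, 1, "aaa"), (2, 1, "bbb"),
    (2, 0, "bbb"), (2, 0, "ccc"),
    (3, 1, "aaa"), (3, 1, "bbb"),
    (3, 0, "aaa"), (3, 0, "ccc"),
]

# Step 4: min() over 1/0 acts like bool_and(), computed per (parent_id, name) window.
STEP_4 = """
SELECT parent_id, unnest,
       min(contains) OVER (PARTITION BY parent_id, unnest) AS bool_and
FROM s
"""

# Step 5 (non-recursive context, so GROUP BY is fine): keep names whose flag is 1.
STEP_5 = """
SELECT parent_id, group_concat(unnest) AS names
FROM (
    SELECT DISTINCT parent_id, unnest,
           min(contains) OVER (PARTITION BY parent_id, unnest) AS bool_and
    FROM s
) AS w
WHERE bool_and = 1
GROUP BY parent_id
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE s (parent_id INTEGER, contains INTEGER, unnest TEXT)")
    con.executemany("INSERT INTO s VALUES (?, ?, ?)", rows)
    # The window returns one row per input row; the dict collapses the duplicates.
    flags = {(p, name): flag for p, name, flag in con.execute(STEP_4)}
    kept = dict(con.execute(STEP_5).fetchall())
    con.close()
    return flags, kept


flags, kept = run(ROWS)
```

The flags reproduce the bool_and column of the second table, and `kept` maps parent 2 to aaa and parent 3 to bbb, matching the final result.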
| group_id | array_agg |
|----------|-----------|
| 5 | {bbb,ccc} |
| 4 | {aaa,bbb} |
| 6 | {aaa,bbb} |
| 7 | {aaa,ccc} |
| 2 | {aaa} |
| 3 | {bbb} |
| 1 | {aaa} |
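To cross-check the final result outside the database, the rule can be restated with sets: a name is kept in a group when at least one included child has it and no excluded child has it, which is the same test that bool_and performs. This plain-Python sketch (not SQL) uses the data from the question and assumes, as in that data, that only leaf groups have rows in group_item.

```python
from functools import lru_cache

# (parent_id, children_id, contains) and (group_id, name) rows from the question.
GROUP_CHILDREN = [(1, 2, True), (1, 3, False), (2, 4, True), (2, 5, False), (3, 6, True), (3, 7, False)]
GROUP_ITEM = [(4, "aaa"), (4, "bbb"), (5, "bbb"), (5, "ccc"), (6, "aaa"), (6, "bbb"), (7, "aaa"), (7, "ccc")]


def resolve_groups(children, items):
    leaf_items = {}
    for group_id, name in items:
        leaf_items.setdefault(group_id, set()).add(name)
    kids = {}
    for parent, child, contains in children:
        kids.setdefault(parent, []).append((child, contains))

    @lru_cache(maxsize=None)
    def resolve(group_id):
        # Assumption: a group without children gets its items directly from group_item.
        if group_id not in kids:
            return frozenset(leaf_items.get(group_id, set()))
        included, excluded = set(), set()
        for child, contains in kids[group_id]:
            (included if contains else excluded).update(resolve(child))
        # Union of included children minus union of excluded children.
        return frozenset(included - excluded)

    all_ids = set(leaf_items) | set(kids) | {child for _, child, _ in children}
    return {g: set(resolve(g)) for g in all_ids}


result = resolve_groups(GROUP_CHILDREN, GROUP_ITEM)
```

The dictionary should agree row for row with the final result table above.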

Related

Finding root node for every element recursively in SQL

Consider the following sample from the table category:
 category_id | category_name                            | super_category
-------------+------------------------------------------+----------------
 1           | Features                                 |
 2           | Alle SACDs                               | 1
 3           | Formate                                  |
 4           | Box-Sets                                 | 3
 5           | Action, Thriller & Horror                | 4
 6           | Alternative                              | 4
 7           | Blues                                    | 4
 8           | Country                                  | 4
 9           | Alternative Country & Americana& Country | 8
 10          | Bestseller& Country                      | 8
 11          | Bluegrass& Country                       | 8
...
The column super_category lists the immediate parent category for any given category. When super_category is NULL, that category is a main category.
Here are all of the main categories:
SELECT * FROM category WHERE super_category IS NULL;
 category_id | category_name  | super_category
-------------+----------------+----------------
 1           | Features       |
 3           | Formate        |
 497         | Formats        |
 544         | Genres         |
 923         | Interpreten    |
 941         | Kategorien     |
 19208       | Schauspieler   |
 19211       | Shop-überblick |
 19502       | Shops          |
 21350       | Subjects       |
 21513       | Unter 10 EUR   |
 21520       | Unter 15 EUR   |
I need to write a recursive query that outputs a table listing all the categories and the main category each of them belongs to.
So far I only have the following:
WITH RECURSIVE query AS (
SELECT cat.category_id, cat.super_category
FROM category cat
WHERE cat.super_category IS NULL
UNION ALL
SELECT query.category_id, cat.super_category
FROM query JOIN category cat on cat.super_category = query.category_id
)
SELECT * FROM query;
The logic is as follows:
We define the base case where the category is a main category (super_category IS NULL)
Then we define the recursive case, but I am not sure how I should define it.
Any suggestions?
You are on the right track using a recursive CTE. You can do:
with recursive
n as (
select category_id, category_name, super_category, 0 as level
from category
union all
-- climb one level; stop before a main category's NULL super_category replaces the result
select n.category_id, n.category_name, c.super_category, n.level + 1
from n
join category c on c.category_id = n.super_category
where c.super_category is not null
)
-- the deepest row per category holds its main category; a main category maps to itself
select category_id, category_name, coalesce(super_category, category_id) as main_category
from n
where (category_id, level) in (
select distinct on (category_id) category_id, level
from n
order by category_id, level desc
)
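The asker's top-down attempt also works once each row carries its root along: anchor on the main categories, where the root is the category itself, and let every child inherit the root of its parent. This is a sketch run against SQLite through Python's `sqlite3` (an assumption; the CTE itself uses only standard `WITH RECURSIVE` syntax, so it should carry over to PostgreSQL), using the sample rows from the question without the trailing newlines in the names.

```python
import sqlite3

# Sample rows from the question: (category_id, category_name, super_category).
CATEGORIES = [
    (1, "Features", None), (2, "Alle SACDs", 1), (3, "Formate", None),
    (4, "Box-Sets", 3), (5, "Action, Thriller & Horror", 4), (6, "Alternative", 4),
    (7, "Blues", 4), (8, "Country", 4),
    (9, "Alternative Country & Americana& Country", 8),
    (10, "Bestseller& Country", 8), (11, "Bluegrass& Country", 8),
]

QUERY = """
WITH RECURSIVE tree AS (
    -- Anchor: every main category is its own root.
    SELECT category_id, category_name, category_id AS main_category
    FROM category
    WHERE super_category IS NULL
    UNION ALL
    -- Recursive step: a child inherits the root of its parent.
    SELECT c.category_id, c.category_name, t.main_category
    FROM category c
    JOIN tree t ON c.super_category = t.category_id
)
SELECT category_id, main_category FROM tree ORDER BY category_id
"""


def main_categories(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE category (category_id INTEGER PRIMARY KEY, "
        "category_name TEXT, super_category INTEGER)"
    )
    con.executemany("INSERT INTO category VALUES (?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


roots = main_categories(CATEGORIES)
```

Going top-down avoids picking the deepest row per category afterwards, since every row already knows its root when it is produced.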

how to select specific values from table

Well, I thought this would be an easy query, but it turned out not to be.
Straight to the point.
Let's say I have the following table named MyTable:
| ID | Val1 | Val2 | GroupName
--------------------------------------------------
| 1 | 1 | null | GroupA
| 2 | 2 | 1 | GroupA
| 3 | 3 | 2 | GroupA
| 4 | 4 | 3 | GroupA
| 5 | 1 | | GroupB
| 6 | 2 | 1 | GroupB
| 7 | 3 | 2 | GroupB
| 8 | 2 | 1 | GroupC
| 9 | 3 | 2 | GroupC
| 10 | 4 | 3 | GroupC
| 11 | 5 | 4 | GroupC
Unfortunately, Val1, Val2 and GroupName are strings.
What I'd like to achieve is the result of a query like:
SELECT T.GroupName FROM Mytable T WHERE T.GroupName NOT IN
(
SELECT T2.GroupName FROM Mytable T2
WHERE T2.Val2 IS NULL OR LEN(T2.Val2)=0
)
GROUP BY T.GroupName
So basically, I'd like to get the groups (by GroupName) that have no row with a null or empty Val2, as is the case for GroupC. Both null and empty values have to be caught.
Val1 and Val2 are related within the same GroupName.
For example:
Val2 of the row with ID=3 is taken from Val1 of the row with ID=2 in GroupA.
So my final result would be:
|GroupName
------------
|GroupC
How to query that correctly?
JNavil, you were right: it was a data issue. In a few thousand records, one had a NULL value in GroupName. Thank you, bro!
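Why a single NULL breaks the query: `x NOT IN (list)` is UNKNOWN rather than true when x matches nothing and the list contains a NULL, so no row passes the WHERE clause. The sketch below reproduces this in SQLite through Python's `sqlite3` (an assumption: SQLite stands in for SQL Server, so `LEN` becomes `length()`). Row 12 is a hypothetical bad record with both GroupName and Val2 NULL. Excluding NULLs inside the subquery restores the expected result.

```python
import sqlite3

# MyTable from the question, plus a hypothetical bad record (ID 12).
ROWS = [
    (1, "1", None, "GroupA"), (2, "2", "1", "GroupA"), (3, "3", "2", "GroupA"), (4, "4", "3", "GroupA"),
    (5, "1", "", "GroupB"), (6, "2", "1", "GroupB"), (7, "3", "2", "GroupB"),
    (8, "2", "1", "GroupC"), (9, "3", "2", "GroupC"), (10, "4", "3", "GroupC"), (11, "5", "4", "GroupC"),
    (12, "1", None, None),  # hypothetical: NULL GroupName and NULL Val2
]

ORIGINAL = """
SELECT T.GroupName FROM MyTable T
WHERE T.GroupName NOT IN (
    SELECT T2.GroupName FROM MyTable T2
    WHERE T2.Val2 IS NULL OR length(T2.Val2) = 0
)
GROUP BY T.GroupName
"""

# Same query, but NULL group names are kept out of the NOT IN list.
FIXED = """
SELECT T.GroupName FROM MyTable T
WHERE T.GroupName NOT IN (
    SELECT T2.GroupName FROM MyTable T2
    WHERE (T2.Val2 IS NULL OR length(T2.Val2) = 0)
      AND T2.GroupName IS NOT NULL
)
GROUP BY T.GroupName
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (ID INTEGER, Val1 TEXT, Val2 TEXT, GroupName TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)
    names = [row[0] for row in con.execute(sql)]
    con.close()
    return names


original_result = run(ORIGINAL)
fixed_result = run(FIXED)
```

With the bad record present, the original query returns no rows at all, while the fixed one returns only GroupC. Rewriting the filter as NOT EXISTS is a common alternative, but an outer row with a NULL GroupName would then pass, so it needs its own IS NOT NULL check.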

Find rows in relation with at least n rows in a different table without joins

I have a table like this (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
| 3 | faso | 11 |
+----+------+-----+
And another table in an n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
| 7 | 2 |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select
t1.pk,
t1.attr,
t1.val
from
tbl t1
join
tbl2 t2 on t1.pk = t2.rel
group by
t1.pk,
t1.attr,
t1.val
having(count(1)>=2) order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Or join once and use a CTE (WITH clause), as below:
with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?
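Both the single-join approach and a join-free variant can be checked against the sample data. The join-free variant uses a correlated scalar subquery to count the related tbl2 rows; it is not one of the answers above, just a plain standard-SQL option that matches the question's title. This sketch runs both in SQLite through Python's `sqlite3` (an assumption standing in for Postgres 10), with n passed as a query parameter.

```python
import sqlite3

TBL = [(0, "ohif", 4), (1, "foha", 56), (2, "slns", 2), (3, "faso", 11)]
TBL2 = [(0, 0), (1, 1), (2, 0), (3, 2), (4, 2), (5, 3), (6, 1), (7, 2)]  # (pk, rel)

# One join plus GROUP BY / HAVING, as in the first answer.
JOIN_ONCE = """
SELECT t1.pk, t1.attr, t1.val
FROM tbl t1
JOIN tbl2 t2 ON t1.pk = t2.rel
GROUP BY t1.pk, t1.attr, t1.val
HAVING count(*) >= ?
ORDER BY t1.pk
"""

# No join at all: a correlated subquery counts the related rows per tbl row.
NO_JOIN = """
SELECT pk, attr, val
FROM tbl
WHERE (SELECT count(*) FROM tbl2 WHERE tbl2.rel = tbl.pk) >= ?
ORDER BY pk
"""


def rows_with_at_least(sql, n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (pk INTEGER PRIMARY KEY, attr TEXT, val INTEGER)")
    con.execute("CREATE TABLE tbl2 (pk INTEGER PRIMARY KEY, rel INTEGER REFERENCES tbl(pk))")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", TBL)
    con.executemany("INSERT INTO tbl2 VALUES (?, ?)", TBL2)
    rows = con.execute(sql, (n,)).fetchall()
    con.close()
    return rows
```

For n = 2 both queries return rows 0, 1 and 2. Which one performs better depends on the planner and indexes (an index on tbl2.rel helps the correlated form), so it is worth comparing with EXPLAIN on real data.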

Multi-table recursive sql statement

I have been struggling to optimize a recursive call done purely in Ruby. I have moved the data into a PostgreSQL database, and I would like to make use of the WITH RECURSIVE feature that PostgreSQL offers.
All the examples I could find seem to use a single table, such as a menu or a categories table.
My situation is slightly different. I have a questions and an answers table.
+----------------------+ +------------------+
| questions | | answers |
+----------------------+ +------------------+
| id | | source_id | <- from question ID
| start_node (boolean) | | target_id | <- to question ID
| end_node (boolean) | +------------------+
+----------------------+
I would like to fetch all questions that are connected together by the related answers.
I would also like to be able to go the other way in the tree, e.g. from any given node to the root node.
To give another example of a question-answer tree in a graphical way:
Q1
|-- A1
| '-- Q2
| |-- A2
| | '-- Q3
| '-- A3
| '-- Q4
'-- A4
'-- Q5
As you can see, a question can have multiple outgoing answers, and it can also have multiple incoming answers: many-to-many.
I hope that someone has a good idea, or can point me to some examples, articles or guides.
Thanks in advance, everybody.
Regards,
Emil
This is far, far from ideal, but I would experiment with a recursive query over joins, like this:
WITH RECURSIVE questions_with_answers AS (
SELECT
q.*, a.*
FROM
questions q
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
UNION ALL
SELECT
q.*, a.*
FROM
questions_with_answers qa
JOIN
questions q ON (qa.target_id = q.id)
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
)
SELECT * FROM questions_with_answers WHERE source_id IS NOT NULL AND target_id IS NOT NULL;
Which gives me this result:
id | name | start_node | end_node | source_id | target_id
----+------+------------+----------+-----------+-----------
1 | Q1 | | | 1 | 2
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 4
3 | Q2 | | | 3 | 6
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
1 | Q1 | | | 1 | 8
8 | A4 | | | 8 | 9
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
8 | A4 | | | 8 | 9
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
(20 rows)
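The duplicated rows above come from anchoring on every question and from UNION ALL. A tighter variant of the same two-table idea starts from one question and follows the answers table only, using UNION so that repeated nodes are dropped, which also stops the recursion if the graph contains a cycle. Reversing the join direction walks from any node back to the root. This sketch runs in SQLite through Python's `sqlite3` (an assumption); the edges are read off the Q1..Q5 diagram in the question, with each answer stored as a (source_id, target_id) pair.

```python
import sqlite3

# A1: Q1->Q2, A2: Q2->Q3, A3: Q2->Q4, A4: Q1->Q5 (from the question's diagram).
ANSWERS = [(1, 2), (2, 3), (2, 4), (1, 5)]

DOWN = """
WITH RECURSIVE reachable(id) AS (
    SELECT ?
    UNION  -- UNION, not UNION ALL: repeated ids are discarded, so cycles terminate
    SELECT a.target_id
    FROM answers a
    JOIN reachable r ON a.source_id = r.id
)
SELECT id FROM reachable ORDER BY id
"""

UP = """
WITH RECURSIVE ancestors(id) AS (
    SELECT ?
    UNION
    SELECT a.source_id
    FROM answers a
    JOIN ancestors r ON a.target_id = r.id
)
SELECT id FROM ancestors ORDER BY id
"""


def walk(sql, start):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE answers (source_id INTEGER, target_id INTEGER)")
    con.executemany("INSERT INTO answers VALUES (?, ?)", ANSWERS)
    ids = [row[0] for row in con.execute(sql, (start,))]
    con.close()
    return ids
```

For example, `walk(DOWN, 2)` lists Q2 and everything below it, and `walk(UP, 4)` lists Q4 and its ancestors up to Q1.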
In fact you do not need two tables.
I would like to encourage you to analyse this example.
Maintaining one table instead of two will save you a lot of trouble, especially when it comes to recursive queries.
This minimal structure contains all the necessary information:
create table the_table (id int primary key, parent_id int);
insert into the_table values
(1, 0), -- root question
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 1),
(7, 3),
(8, 0), -- root question
(9, 8);
Whether the node is a question or an answer depends on its position in the tree. Of course, you can add a column with information about the type of node to the table.
Use this query for both of your requests (uncomment the appropriate WHERE condition):
with recursive cte(id, parent_id, depth, type, root) as (
select id, parent_id, 1, 'Q', id
from the_table
where parent_id = 0
-- and id = 1 <-- looking for list of a&q for root question #1
union all
select
t.id, t.parent_id, depth+ 1,
case when (depth & 1)::boolean then 'A' else 'Q' end, c.root
from cte c
join the_table t on t.parent_id = c.id
)
select *
from cte
-- where id = 9 <-- looking for root question for answer #9
order by id;
id | parent_id | depth | type | root
----+-----------+-------+------+------
1 | 0 | 1 | Q | 1
2 | 1 | 2 | A | 1
3 | 1 | 2 | A | 1
4 | 2 | 3 | Q | 1
5 | 2 | 3 | Q | 1
6 | 1 | 2 | A | 1
7 | 3 | 3 | Q | 1
8 | 0 | 1 | Q | 8
9 | 8 | 2 | A | 8
(9 rows)
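The query can be reproduced outside PostgreSQL to check the output above. This sketch assumes SQLite through Python's `sqlite3`: SQLite has no `::boolean` cast, but it treats a nonzero integer as true, so `CASE WHEN c.depth & 1` keeps the same odd-depth-means-answer logic.

```python
import sqlite3

THE_TABLE = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1), (7, 3), (8, 0), (9, 8)]

QUERY = """
WITH RECURSIVE cte(id, parent_id, depth, type, root) AS (
    SELECT id, parent_id, 1, 'Q', id
    FROM the_table
    WHERE parent_id = 0
    UNION ALL
    SELECT t.id, t.parent_id, c.depth + 1,
           CASE WHEN c.depth & 1 THEN 'A' ELSE 'Q' END,  -- child of an odd-depth node is an answer
           c.root
    FROM cte c
    JOIN the_table t ON t.parent_id = c.id
)
SELECT * FROM cte ORDER BY id
"""


def classify_nodes():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    con.executemany("INSERT INTO the_table VALUES (?, ?)", THE_TABLE)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


rows = classify_nodes()
```

Each tuple is (id, parent_id, depth, type, root), in the same column order as the output shown above.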
The child-parent relationship is unambiguous and can be read in both directions. There is no need to store this information twice: if we store the parents, the information about children is redundant (and vice versa). This is one of the fundamental properties of the data structure called a tree. See the examples:
-- find parent of node #6
select parent_id
from the_table
where id = 6;
-- find children of node #6
select id
from the_table
where parent_id = 6;
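The same parent_id column also answers the "from any given node to the root" request directly: start at the node and repeatedly join to its parent until parent_id = 0 matches nothing. A small steps counter keeps the path in order. This is a sketch in SQLite through Python's `sqlite3` (an assumption), using the_table from above.

```python
import sqlite3

THE_TABLE = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1), (7, 3), (8, 0), (9, 8)]

PATH_TO_ROOT = """
WITH RECURSIVE path(id, parent_id, steps) AS (
    SELECT id, parent_id, 0 FROM the_table WHERE id = ?
    UNION ALL
    -- move to the parent; stops when parent_id = 0 has no matching row
    SELECT t.id, t.parent_id, p.steps + 1
    FROM the_table t
    JOIN path p ON t.id = p.parent_id
)
SELECT id FROM path ORDER BY steps
"""


def path_to_root(node_id):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    con.executemany("INSERT INTO the_table VALUES (?, ?)", THE_TABLE)
    ids = [row[0] for row in con.execute(PATH_TO_ROOT, (node_id,))]
    con.close()
    return ids
```

The last id in the returned list is the root question, so `path_to_root(7)` walks 7, 3, 1.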

Finding value difference in column pairs

I'm using SQL Server 2008 R2 and I have a view that returns the following:
+----+-------+-------+-------+-------+-------+-------+
| ID | col1A | col1B | col2A | col2B | col3A | col3B |
+----+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 3 | 5 | 4 | 4 |
| 2 | 1 | 1 | 5 | 5 | 5 | 4 |
| 3 | 3 | 4 | 5 | 5 | 4 | 4 |
| 4 | 1 | 2 | 5 | 5 | 4 | 3 |
| 5 | 1 | 1 | 2 | 2 | 3 | 3 |
+----+-------+-------+-------+-------+-------+-------+
As you can see this view contains column pairs (col1A and col1B), (col2A and col2B), (col3A and col3B).
I need to query this view and find rows where the column pairs contain different values.
So I would be looking to return:
+----+------------+---+-----+
| ID | ColumnType | A | B |
+----+------------+---+-----+
| 1 | Col2 | 3 | 5 |
| 2 | Col3 | 5 | 4 |
| 3 | Col1 | 3 | 4 |
| 4 | Col1 | 1 | 2 |
| 4 | Col3 | 4 | 3 |
+----+------------+---+-----+
I think I need to use UNPIVOT, but I'm not sure how. I'd appreciate any suggestions.
Since you are using SQL Server 2008+ you can use CROSS APPLY to unpivot the pair of columns and then you can easily compare the values in the A and B to return the rows that don't match:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', Col1A, Col1B),
('Col2', Col2A, Col2B),
('Col3', Col3A, Col3B)
) c (ColumnType, A, B)
where c.A <> c.B;
If you have different data types in your columns, you'll need to convert the data to the same type. You can do this conversion within the VALUES clause; convert both columns of each pair so that A and B each end up with a single type across all rows:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', cast(Col1A as varchar(50)), cast(Col1B as varchar(50))),
('Col2', cast(Col2A as varchar(50)), cast(Col2B as varchar(50))),
('Col3', cast(Col3A as varchar(50)), cast(Col3B as varchar(50)))
) c (ColumnType, A, B)
where c.A <> c.B;
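Where CROSS APPLY is not available, the same unpivot can be written with a cross join to a small derived table of pair names plus CASE expressions that pick the matching columns. This is a swapped-in, portable technique rather than the answer's method; the sketch runs it in SQLite through Python's `sqlite3` (an assumption), with the view's sample rows loaded into a table named yourview.

```python
import sqlite3

# (ID, col1A, col1B, col2A, col2B, col3A, col3B) from the question.
VIEW_ROWS = [
    (1, 1, 1, 3, 5, 4, 4),
    (2, 1, 1, 5, 5, 5, 4),
    (3, 3, 4, 5, 5, 4, 4),
    (4, 1, 2, 5, 5, 4, 3),
    (5, 1, 1, 2, 2, 3, 3),
]

QUERY = """
SELECT ID, ColumnType, A, B
FROM (
    SELECT t.ID, p.ColumnType,
           CASE p.ColumnType WHEN 'Col1' THEN t.col1A WHEN 'Col2' THEN t.col2A ELSE t.col3A END AS A,
           CASE p.ColumnType WHEN 'Col1' THEN t.col1B WHEN 'Col2' THEN t.col2B ELSE t.col3B END AS B
    FROM yourview t
    -- one output row per (ID, pair): this is the unpivot
    CROSS JOIN (SELECT 'Col1' AS ColumnType UNION ALL SELECT 'Col2' UNION ALL SELECT 'Col3') p
) u
WHERE A <> B
ORDER BY ID, ColumnType
"""


def mismatches():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourview (ID INTEGER, col1A INTEGER, col1B INTEGER, "
        "col2A INTEGER, col2B INTEGER, col3A INTEGER, col3B INTEGER)"
    )
    con.executemany("INSERT INTO yourview VALUES (?, ?, ?, ?, ?, ?, ?)", VIEW_ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

The rows returned match the expected output in the question: ID 1 on Col2, ID 2 on Col3, ID 3 on Col1, and ID 4 on Col1 and Col3.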