Finding root node for every element recursively in SQL - postgresql

Consider the following sample from the table category:
category_id | category_name | super_category
-------------+----------------------------------------------------------------+----------------
1 | Features +|
| |
2 | Alle SACDs +| 1
| |
3 | Formate +|
| |
4 | Box-Sets +| 3
| |
5 | Action, Thriller & Horror +| 4
| |
6 | Alternative +| 4
| |
7 | Blues +| 4
| |
8 | Country +| 4
| |
9 | Alternative Country & Americana& Country | 8
10 | Bestseller& Country | 8
11 | Bluegrass& Country | 8
...
The column super_category lists the immediate parent category for any given category. When super_category is NULL, that category is a main category.
Table with all of the main categories:
SELECT * FROM category WHERE super_category IS NULL;
category_id | category_name | super_category
-------------+----------------+----------------
1 | Features +|
| |
3 | Formate +|
| |
497 | Formats +|
| |
544 | Genres +|
| |
923 | Interpreten +|
| |
941 | Kategorien +|
| |
19208 | Schauspieler +|
| |
19211 | Shop-überblick+|
| |
19502 | Shops +|
| |
21350 | Subjects +|
| |
21513 | Unter 10 EUR +|
| |
21520 | Unter 15 EUR +|
I need to write a recursive query that outputs a table listing every category together with the main category it belongs to.
So far I only have the following:
WITH RECURSIVE query AS (
SELECT cat.category_id, cat.super_category
FROM category cat
WHERE cat.super_category IS NULL
UNION ALL
SELECT query.category_id, cat.super_category
FROM query JOIN category cat on cat.super_category = query.category_id
)
SELECT * FROM query;
The logic is as follows:
We define the base case where the category is a main category (super_category IS NULL)
Then we define the recursive case, but I am not sure how I should define it.
Any suggestions?

You are on the right track with a recursive CTE. You can do:
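with recursive
n as (
  -- start from every category, paired with its immediate parent
  select category_id, category_name, super_category, 0 as level
  from category
  union all
  -- climb one level: swap the current ancestor for that ancestor's parent.
  -- Stop once the ancestor is a main category; without this filter the
  -- deepest row of every category would end with super_category = NULL.
  select n.category_id, n.category_name, c.super_category, n.level + 1
  from n
  join category c on c.category_id = n.super_category
  where c.super_category is not null
)
-- per category, keep the row from the highest level reached;
-- its super_category is the main category (NULL for main categories themselves)
select category_id, category_name, super_category
from n
where (category_id, level) in (
  select distinct on (category_id) category_id, level
  from n
  order by category_id, level desc
)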
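As an alternative, the top-down approach from the question can be completed by carrying the main category's id down the tree instead of climbing up from each category. This is only a sketch against the category table shown above; the CTE name tree and the output column main_category are illustrative names, not part of the original schema. Main categories are reported as belonging to themselves.

```sql
WITH RECURSIVE tree AS (
    -- base case: every main category is its own root
    SELECT category_id, category_name, category_id AS main_category
    FROM category
    WHERE super_category IS NULL
  UNION ALL
    -- recursive case: a child inherits the root of its parent
    SELECT c.category_id, c.category_name, t.main_category
    FROM tree t
    JOIN category c ON c.super_category = t.category_id
)
SELECT category_id, category_name, main_category
FROM tree
ORDER BY category_id;
```

Because each row is produced exactly once, no post-filtering by level is needed. Categories whose parent chain never reaches a main category would simply not appear in the result.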

Related

Create a PostgreSQL function that becomes a formula field of a table retrieving related data from other table

The following example works on SQL Server. It is a function that takes the Id field of the current table's row, runs a calculation against another table, and returns a single value.
Question: how can I do exactly the same thing with PostgreSQL?
SELECT TOP(5) * FROM Artists;
+------------+------------------+--------------+-------------+
| ArtistId | ArtistName | ActiveFrom | CountryId |
|------------+------------------+--------------+-------------|
| 1 | Iron Maiden | 1975-12-25 | 3 |
| 2 | AC/DC | 1973-01-11 | 2 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 |
| 4 | Buddy Rich | 1919-01-01 | 6 |
| 5 | Devin Townsend | 1993-01-01 | 8 |
+------------+------------------+--------------+-------------+
SELECT TOP(5) * FROM Albums;
+-----------+------------------------+---------------+------------+-----------+
| AlbumId | AlbumName | ReleaseDate | ArtistId | GenreId |
|-----------+------------------------+---------------+------------+-----------|
| 1 | Powerslave | 1984-09-03 | 1 | 1 |
| 2 | Powerage | 1978-05-05 | 2 | 1 |
| 3 | Singing Down the Lane | 1956-01-01 | 6 | 3 |
| 4 | Ziltoid the Omniscient | 2007-05-21 | 5 | 1 |
| 5 | Casualties of Cool | 2014-05-14 | 5 | 1 |
+-----------+------------------------+---------------+------------+-----------+
The function
CREATE FUNCTION [dbo].[ufn_AlbumCount] (@ArtistId int)
RETURNS smallint
AS
BEGIN
    DECLARE @AlbumCount int;
    SELECT @AlbumCount = COUNT(AlbumId)
    FROM Albums
    WHERE ArtistId = @ArtistId;
    RETURN @AlbumCount;
END;
GO
Now, on SQL Server, after adding a computed column to the first table with ALTER TABLE Artists ADD AlbumCount AS dbo.ufn_AlbumCount(ArtistId); we can list the table and get the following result:
+------------+------------------+--------------+-------------+--------------+
| ArtistId | ArtistName | ActiveFrom | CountryId | AlbumCount |
|------------+------------------+--------------+-------------+--------------|
| 1 | Iron Maiden | 1975-12-25 | 3 | 5 |
| 2 | AC/DC | 1973-01-11 | 2 | 3 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 | 2 |
| 4 | Buddy Rich | 1919-01-01 | 6 | 1 |
| 5 | Devin Townsend | 1993-01-01 | 8 | 3 |
| 6 | Jim Reeves | 1948-01-01 | 6 | 1 |
| 7 | Tom Jones | 1963-01-01 | 4 | 3 |
| 8 | Maroon 5 | 1994-01-01 | 6 | 0 |
| 9 | The Script | 2001-01-01 | 5 | 1 |
| 10 | Lit | 1988-06-26 | 6 | 0 |
+------------+------------------+--------------+-------------+--------------+
But how can I achieve this in PostgreSQL?
Postgres doesn't support "virtual" computed columns (i.e. computed columns that are generated at runtime), so there is no exact equivalent. The most efficient solution is a view that counts this:
create view artists_with_counts
as
select a.*,
coalesce(t.album_count, 0) as album_count
from artists a
left join (
select artist_id, count(*) as album_count
from albums
group by artist_id
) t on a.artist_id = t.artist_id;
Another option is to create a function that can be used as a "virtual column" in a select - but as this is done row-by-row, this will be substantially slower than the view.
create function album_count(p_artist artists)
returns bigint
as
$$
select count(*)
from albums a
where a.artist_id = p_artist.artist_id;
$$
language sql
stable;
Then you can include this as a column:
select a.*, a.album_count
from artists a;
Using the function like that requires prefixing the function reference with the table alias (alternatively, you can use album_count(a)).
Online example

Select common values when using group by [Postgres]

I have three main tables meetings, persons, hobbies with two relational tables.
Table meetings
+---------------+
| id | subject |
+----+----------+
| 1 | Kickoff |
| 2 | Relaunch |
| 3 | Party |
+----+----------+
Table persons
+------------+
| id | name |
+----+-------+
| 1 | John |
| 2 | Anna |
| 3 | Linda |
+----+-------+
Table hobbies
+---------------+
| id | name |
+----+----------+
| 1 | Soccer |
| 2 | Tennis |
| 3 | Swimming |
+----+----------+
Relation Table meeting_person
+-----------------+-----------+
| id | meeting_id | person_id |
+----+------------+-----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 3 | 1 |
+----+------------+-----------+
Relation Table person_hobby
+----------------+----------+
| id | person_id | hobby_id |
+----+-----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 3 | 1 |
+----+-----------+----------+
Now I want to find the hobbies common to all persons attending each meeting.
So the desired result would be:
+------------+-----------------+------------------------+
| meeting_id | persons | common_hobbies |
| | (Aggregated) | (Aggregated) |
+------------+-----------------+------------------------+
| 1 | John,Anna,Linda | Soccer |
| 2 | John,Anna | Soccer,Tennis |
| 3 | John | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
My current work in progress is:
select
m.id as "meeting_id",
(
select string_agg(distinct p.name, ',')
from meeting_person mp
inner join persons p on mp.person_id = p.id
where m.id = mp.meeting_id
) as "persons",
string_agg(distinct h2.name , ',') as "common_hobbies"
from meetings m
inner join meeting_person mp2 on m.id = mp2.meeting_id
inner join persons p2 on mp2.person_id = p2.id
inner join person_hobby ph2 on p2.id = ph2.person_id
inner join hobbies h2 on ph2.hobby_id = h2.id
group by m.id
But this query does not list the common hobbies; it lists every hobby that is mentioned at least once.
+------------+-----------------+------------------------+
| meeting_id | persons | common_hobbies |
+------------+-----------------+------------------------+
| 1 | John,Anna,Linda | Soccer,Tennis,Swimming |
| 2 | John,Anna | Soccer,Tennis,Swimming |
| 3 | John | Soccer,Tennis,Swimming |
+------------+-----------------+------------------------+
Does anyone have any hints for me, on how I could solve this problem?
Cheers
This problem can be solved by implementing a custom aggregate function (found it here):
create or replace function array_intersect(anyarray, anyarray)
returns anyarray language sql
as $$
select
case
when $1 is null then $2
when $2 is null then $1
else
array(
select unnest($1)
intersect
select unnest($2))
end;
$$;
create aggregate array_intersect_agg (anyarray)
(
sfunc = array_intersect,
stype = anyarray
);
The solution can then be written as:
select
meeting_id,
array_agg(ph.name) persons,
array_intersect_agg(hobby) common_hobbies
from meeting_person mp
join (
select p.id, p.name, array_agg(h.name) hobby
from person_hobby ph
join persons p on ph.person_id = p.id
join hobbies h on h.id = ph.hobby_id
group by p.id, p.name
) ph on ph.id = mp.person_id
group by meeting_id;
See the example fiddle
Result:
meeting_id | persons | common_hobbies
-----------+-----------------------+--------------------------
1 | {John,Anna,Linda} | {Soccer}
3 | {John} | {Soccer,Tennis,Swimming}
2 | {John,Anna} | {Soccer,Tennis}
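The same result can also be sketched without a custom aggregate, by counting: a hobby is common to a meeting when the number of attendees who have it equals the total number of attendees. The CTE names attendees and hobby_counts are illustrative, and the sketch assumes that each (meeting_id, person_id) pair and each (person_id, hobby_id) pair occurs at most once, as in the sample data.

```sql
WITH attendees AS (
    -- one row per meeting: head count and the aggregated names
    SELECT mp.meeting_id,
           count(*) AS n_persons,
           string_agg(p.name, ',' ORDER BY p.id) AS persons
    FROM meeting_person mp
    JOIN persons p ON p.id = mp.person_id
    GROUP BY mp.meeting_id
),
hobby_counts AS (
    -- one row per (meeting, hobby): how many attendees have that hobby
    SELECT mp.meeting_id,
           h.id AS hobby_id,
           h.name,
           count(*) AS n_with_hobby
    FROM meeting_person mp
    JOIN person_hobby ph ON ph.person_id = mp.person_id
    JOIN hobbies h ON h.id = ph.hobby_id
    GROUP BY mp.meeting_id, h.id, h.name
)
SELECT a.meeting_id,
       a.persons,
       string_agg(hc.name, ',' ORDER BY hc.hobby_id) AS common_hobbies
FROM attendees a
LEFT JOIN hobby_counts hc
       ON hc.meeting_id = a.meeting_id
      AND hc.n_with_hobby = a.n_persons   -- shared by every attendee
GROUP BY a.meeting_id, a.persons
ORDER BY a.meeting_id;
```

This returns comma-separated strings as in the desired output rather than arrays. A meeting whose attendees share no hobby gets NULL in common_hobbies because of the LEFT JOIN.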

What is the proper approach to insert into multiple tables at once?

For example I have a table called product_list, which holds a list of products:
+----+-------+-----------+-------------+
| id | name  | weight(g) | type        |
+----+-------+-----------+-------------+
| 1  | Shirt | 157       | Clothes     |
| 2  | Ring  | 53        | Accessories |
| 3  | Pants | 202       | Clothes     |
+----+-------+-----------+-------------+
and a table called product_price:
+----------+----+-------+--------+
| price_id | id | name  | price  |
+----------+----+-------+--------+
| 1        | 1  | Shirt | 99.00  |
| 2        | 2  | Ring  | 149.00 |
| 3        | 3  | Pants | 119.00 |
+----------+----+-------+--------+
If I insert 1 row of data into product_list, part of the data (such as product_id & product name) should also be inserted in another table like product_price which holds the price for all products (new products would have 0 or NULL values for their price). Eg:
product_list:
+----+--------+-----------+-------------+
| id | name   | weight(g) | type        |
+----+--------+-----------+-------------+
| 1  | Shirt  | 157       | Clothes     |
| 2  | Ring   | 53        | Accessories |
| 3  | Pants  | 202       | Clothes     |
| 4  | Shirt2 | 175       | Clothes     |
+----+--------+-----------+-------------+
product_price:
+----------+----+--------+--------+
| price_id | id | name   | price  |
+----------+----+--------+--------+
| 1        | 1  | Shirt  | 99.00  |
| 2        | 2  | Ring   | 149.00 |
| 3        | 3  | Pants  | 119.00 |
| 4        | 4  | Shirt2 | 0.00   |
+----------+----+--------+--------+
My question is about how to approach this. How would an experienced person handle it in a professional manner?
These are 2 approaches I have in mind:
1 - Using triggers to insert into the other tables like product_price,etc whenever I insert a product data into product_list
2 - Using a function (stored procedure) like product_add to add a new product into each tables.
Which method is better? Or if there is a better suggestion, I'd like to know about it. Thanks in advance.
TLDR: Should I use triggers or stored procedures, which is better? Or do you have a better suggestion?
In Postgres, you can use CTEs:
with pl as (
insert into product_list(name, weight, type)
select . . .
returning *
)
insert into product_price(id, price)
select id, NULL
from pl;
Note: You shouldn't repeat the name column in both product_list and product_price. It should only be in the list table.
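A concrete instance of this pattern, using the Shirt2 row from the question, might look like the sketch below. It rests on several assumptions that the question does not confirm: product_list.id and product_price.price_id are generated automatically (for example serial columns), the weight column is actually named weight, and product_price no longer requires a name value.

```sql
WITH pl AS (
    -- insert the new product and hand its generated id to the next statement
    INSERT INTO product_list (name, weight, type)
    VALUES ('Shirt2', 175, 'Clothes')
    RETURNING id
)
-- create the matching price row with a placeholder price of 0.00
INSERT INTO product_price (id, price)
SELECT id, 0.00
FROM pl;
```

Both inserts run as a single statement, so either both rows are written or neither is.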

Postgres values as columns

I am working with PostgreSQL 9.3, and I have this:
PARENT_TABLE
ID | NAME
1 | N_A
2 | N_B
3 | N_C
CHILD_TABLE
ID | PARENT_TABLE_ID | KEY | VALUE
1 | 1 | K_A | V_A
2 | 1 | K_B | V_B
3 | 1 | K_C | V_C
5 | 2 | K_A | V_D
6 | 2 | K_C | V_E
7 | 3 | K_A | V_F
8 | 3 | K_B | V_G
9 | 3 | K_C | V_H
Note that I might add K_D to the KEY values; the set of keys is completely dynamic.
What I want is a query that returns me the following:
QUERY_TABLE
ID | NAME | K_A | K_B | K_C | others K_...
1 | N_A | V_A | V_B | V_C | ...
2 | N_B | V_D | | V_E | ...
3 | N_C | V_F | V_G | V_H | ...
Is this possible? If so, how?
Since there can be values missing, you need the "safe" form of crosstab() with the column names as second parameter:
SELECT * FROM crosstab(
'SELECT p.id, p.name, c.key, c."value"
FROM parent_table p
LEFT JOIN child_table c ON c.parent_table_id = p.id
ORDER BY 1'
,$$VALUES ('K_A'::text), ('K_B'), ('K_C')$$)
AS t (id int, name text, k_a text, k_b text, k_c text); -- use actual data types
Details in this related answer:
PostgreSQL Crosstab Query
About adding "extra" columns:
Pivot on Multiple Columns using Tablefunc
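For comparison, the same pivot can be sketched with plain conditional aggregation, which needs no extension (crosstab() is provided by the tablefunc extension, which has to be installed in the database first). The sketch assumes the value column is text and that each parent has at most one row per key.

```sql
-- crosstab() requires the extension to be installed once per database:
-- CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot without crosstab(), for a known list of keys.
SELECT p.id,
       p.name,
       max(CASE WHEN c.key = 'K_A' THEN c."value" END) AS k_a,
       max(CASE WHEN c.key = 'K_B' THEN c."value" END) AS k_b,
       max(CASE WHEN c.key = 'K_C' THEN c."value" END) AS k_c
FROM parent_table p
LEFT JOIN child_table c ON c.parent_table_id = p.id
GROUP BY p.id, p.name
ORDER BY p.id;
```

With either approach the output columns have to be spelled out in the query text, so a new key such as K_D means adding one more column expression (or building the statement dynamically in application code).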

Recursive SQL PostgreSQL Empty Result Set

The categories table:
=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+-------
public | categories | table | pgsql
public | products | table | pgsql
public | ticketlines | table | pgsql
(3 rows)
Contents of categories:
=# select * from categories;
id | name | parentid
----+--------+----------
1 | Rack |
2 | Women | 1
3 | Shorts | 2
4 | Wares |
5 | Toys | 4
6 | Trucks | 5
(6 rows)
Running the following query:
WITH RECURSIVE nodes_cte(name, id, parentid, depth, path) AS (
-- Base case?
SELECT c.name,
c.id,
c.parentid,
1::INT AS depth,
c.id::TEXT AS path
FROM categories c
WHERE c.parentid = ''
UNION ALL
-- nth case
SELECT c.name,
c.id,
c.parentid,
n.depth + 1 AS depth,
(n.path || '->' || c.id::TEXT)
FROM nodes_cte n
JOIN categories c on n.id = c.parentid
)
SELECT * FROM nodes_cte AS n GROUP BY n.name, n.id, n.parentid, n.depth, n.path ORDER BY n.id ASC
;
yields these results:
name | id | parentid | depth | path
--------+----+----------+-------+---------
Rack | 1 | | 1 | 1
Women | 2 | 1 | 2 | 1->2
Shorts | 3 | 2 | 3 | 1->2->3
Wares | 4 | | 1 | 4
Toys | 5 | 4 | 2 | 4->5
Trucks | 6 | 5 | 3 | 4->5->6
(6 rows)
Great!
But given a similar table (categories):
=# \d categories
Table "public.categories"
Column | Type | Modifiers
----------+-------------------+-----------
id | character varying | not null
name | character varying | not null
parentid | character varying |
image | bytea |
Indexes:
"categories_pkey" PRIMARY KEY, btree (id)
"categories_name_inx" UNIQUE, btree (name)
Referenced by:
TABLE "products" CONSTRAINT "products_fk_1" FOREIGN KEY (category) REFERENCES categories(id)
=# select * from categories;
id | name | parentid | image
--------------------------------------+-------+--------------------------------------+-------
611572c9-326d-4cf9-ae4a-af5269fc788e | Rack | |
22d15300-40b5-4f43-a8d1-902b8d4c5409 | Women | 611572c9-326d-4cf9-ae4a-af5269fc788e |
6b061073-96f4-49a1-9205-bab7c878f0cf | Wares | |
3f018dfb-e6ee-40d1-9dbc-31e6201e7625 | Toys | 6b061073-96f4-49a1-9205-bab7c878f0cf |
(4 rows)
the same query produces zero rows.
Why?
Is it something to do with primary / foreign keys?
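The root rows apparently store NULL, not an empty string, in parentid. NULL = '' evaluates to NULL rather than true, so the base case matches no rows. Changing the condition to
WHERE COALESCE(parentid, '') = ''
worked. Thank you.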
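The effect can be checked in isolation with the short sketch below. The second query is a hypothetical variant of the base-case filter that treats both NULL and an empty string as "no parent"; it uses the column names of the categories table from the question.

```sql
-- Comparing NULL with '' yields NULL (unknown), which WHERE treats as not true.
SELECT NULL::varchar = ''               AS plain_comparison,  -- NULL
       COALESCE(NULL::varchar, '') = '' AS with_coalesce,     -- true
       NULL::varchar IS NULL            AS is_null_test;      -- true

-- Base-case filter that accepts NULL as well as '' for root categories.
SELECT c.id, c.name
FROM categories c
WHERE c.parentid IS NULL
   OR c.parentid = '';
```

Primary and foreign keys play no part in this; the difference is only in how the missing parent is stored.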