Optimize stored procedure query - firebird

I have 2 tables in a Firebird 2.5 db:
SL_SPORS
-----------------
ID_PERS ID_SUMSP
10 2
10 3
11 2
SL_BRUTS
----------------------------------------
ID_PERS S_SPORS_1 S_SPORS_2 S_SPORS_3 ...
10 0 50 0
11 0 0 0
I need to get ID_PERS and ID_SUMSP from the first table where S_SPORS_[ID_SUMSP] is 0 in the second table. ID_SUMSP ranges from 1 to 7, and SL_BRUTS has exactly one record per person.
I have a stored procedure:
CREATE OR ALTER PROCEDURE SP_VALIDSUM()
RETURNS (RES VARCHAR(500)) AS
....
FOR SELECT ID_PERS, ID_SUMSP FROM SL_SPORS INTO :ID_PERS, :ID_SUMSP DO
BEGIN
  SUMA = 0;
  EXECUTE STATEMENT 'SELECT FIRST 1 S_SPORS_' || :ID_SUMSP || ' FROM SL_BRUTS WHERE ID_PERS=' || :ID_PERS INTO :SUMA;
  IF (SUMA = 0) THEN
  BEGIN
    RES = 'SUM 0 FOR ID_PERS=' || :ID_PERS || ' AND ID_SUMSP=' || :ID_SUMSP;
    SUSPEND;
  END
END
It works, but I want to know if there is a better solution. For example, maybe I can load all ID_SUMSP values from the first table in a single query using LIST(..), or check all seven S_SPORS_1 to S_SPORS_7 fields of the second table in a single query.

You have to change the structure (schema) of the second table. Read Martin Gruber's "Essential SQL" or any other tutorial on database normalization. The SL_BRUTS table should be reshaped to have three columns: ID_PERS, ID_SUMSP and S_SPORS.
https://en.wikipedia.org/wiki/Database_normalization
Do a one-time data conversion, then run your queries over the normalized tables. The structure you chose for SL_BRUTS will never allow efficient queries, except for the most trivial ones.
Check the QUERY EXECUTION PLAN in the last two queries below:
select rdb$get_context('SYSTEM', 'ENGINE_VERSION') as version
, rdb$character_set_name
from rdb$database;
VERSION | RDB$CHARACTER_SET_NAME
:------ | :---------------------------------------------------------------------------------------------------------------------------
3.0.5 | UTF8
create table SL_SPORS (
ID_PERS integer,
ID_SPORS integer,
Primary key (ID_PERS, ID_SPORS)
)
✓
insert into SL_SPORS
select 10, 2 from rdb$database union all
select 10, 3 from rdb$database union all
select 11, 2 from rdb$database union all
select 20, 3 from rdb$database
4 rows affected
select * from sl_spors
ID_PERS | ID_SPORS
------: | -------:
10 | 2
10 | 3
11 | 2
20 | 3
-- CAN this sl_spors_NNN data be NULL ? or not ? what is semantics, what is meaning of it???
create table SL_BROKEN (
ID_PERS integer primary key,
s_SPORS_1 integer NOT NULL,
s_SPORS_2 integer NOT NULL,
s_SPORS_3 integer NOT NULL
)
✓
create table SL_RAWS (
ID_PERS integer,
ID_SPORS integer,
S_SPOR integer NOT NULL,
Primary key (ID_PERS, ID_SPORS),
constraint impossible_over_SL_BROKEN
FOREIGN KEY(ID_PERS, ID_SPORS)
REFERENCES SL_SPORS(ID_PERS, ID_SPORS)
)
✓
-- should throw error over non-existing PERSON - but would it ???
insert into SL_BROKEN
values (-100, 20, 30, 40)
1 rows affected
-- should throw error over non-existing PERSON - and it would!
insert into SL_RAWS
values (-100, 20, 30)
violation of FOREIGN KEY constraint "IMPOSSIBLE_OVER_SL_BROKEN" on table "SL_RAWS" Foreign key reference target does not exist Problematic key value is ("ID_PERS" = -100, "ID_SPORS" = 20)
insert into SL_BROKEN
select 10, 0, 50, 0 from rdb$database union all
select 11, 0, 0, 0 from rdb$database
2 rows affected
select * from SL_BROKEN
ID_PERS | S_SPORS_1 | S_SPORS_2 | S_SPORS_3
------: | --------: | --------: | --------:
-100 | 20 | 30 | 40
10 | 0 | 50 | 0
11 | 0 | 0 | 0
delete from SL_BROKEN where ID_PERS < 0
1 rows affected
create view SL_TRANSPOSE as
SELECT 1 as ID_SPORS, ID_PERS, S_SPORS_1 as S_SPOR from SL_BROKEN
union all
SELECT 2 as ID_SPORS, ID_PERS, S_SPORS_2 as S_SPOR from SL_BROKEN
union all
SELECT 3 as ID_SPORS, ID_PERS, S_SPORS_3 as S_SPOR from SL_BROKEN
✓
select * from SL_TRANSPOSE
ID_SPORS | ID_PERS | S_SPOR
-------: | ------: | -----:
1 | 10 | 0
1 | 11 | 0
2 | 10 | 50
2 | 11 | 0
3 | 10 | 0
3 | 11 | 0
insert into SL_RAWS (ID_PERS, ID_SPORS, S_SPOR)
select ID_PERS, ID_SPORS, S_SPOR from SL_TRANSPOSE
violation of FOREIGN KEY constraint "IMPOSSIBLE_OVER_SL_BROKEN" on table "SL_RAWS" Foreign key reference target does not exist Problematic key value is ("ID_PERS" = 10, "ID_SPORS" = 1)
insert into SL_SPORS
select 10, 1 from rdb$database union all
select 11, 3 from rdb$database union all
select 11, 1 from rdb$database
3 rows affected
insert into SL_RAWS (ID_PERS, ID_SPORS, S_SPOR)
select ID_PERS, ID_SPORS, S_SPOR from SL_TRANSPOSE
6 rows affected
select * from SL_RAWS
ID_PERS | ID_SPORS | S_SPOR
------: | -------: | -----:
10 | 1 | 0
11 | 1 | 0
10 | 2 | 50
11 | 2 | 0
10 | 3 | 0
11 | 3 | 0
select * from SL_RAWS where S_SPOR = 0 -- where S_SPOR IS NULL
ID_PERS | ID_SPORS | S_SPOR
------: | -------: | -----:
10 | 1 | 0
11 | 1 | 0
11 | 2 | 0
10 | 3 | 0
11 | 3 | 0
select * from SL_TRANSPOSE where S_SPOR = 0 -- where S_SPOR IS NULL
ID_SPORS | ID_PERS | S_SPOR
-------: | ------: | -----:
1 | 10 | 0
1 | 11 | 0
2 | 11 | 0
3 | 10 | 0
3 | 11 | 0
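With either the SL_TRANSPOSE view or the normalized SL_RAWS table in place, the original question becomes a plain join, and the loop with EXECUTE STATEMENT is no longer needed. A minimal sketch, using the table and column names from the demo above (ID_SPORS here plays the role of ID_SUMSP in the question). The second query is an alternative that leaves the wide table as it is and picks the column with a CASE expression. Neither query has had its plan checked here, so treat both as a starting point:

```sql
-- Pairs listed in SL_SPORS whose matching S_SPORS_n value is 0,
-- read through the unpivoting view.
select s.ID_PERS, s.ID_SPORS
from SL_SPORS s
join SL_TRANSPOSE t
  on t.ID_PERS = s.ID_PERS
 and t.ID_SPORS = s.ID_SPORS
where t.S_SPOR = 0;

-- Same idea without the view: choose the wide-table column per row.
-- Extend the CASE with S_SPORS_4 .. S_SPORS_7 for the real table.
select s.ID_PERS, s.ID_SPORS
from SL_SPORS s
join SL_BROKEN b
  on b.ID_PERS = s.ID_PERS
where case s.ID_SPORS
        when 1 then b.S_SPORS_1
        when 2 then b.S_SPORS_2
        when 3 then b.S_SPORS_3
      end = 0;
```

Both forms use only static SQL, so they should also work on Firebird 2.5; the RES message string from the procedure can be built in the select list if it is still wanted.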

Related

historical aggregation of a column up until a specified time in each row in another column

I have two tables login_attempts and checkouts in Amazon RedShift. A user can have multiple (un)successful login attempts and multiple (un)successful checkouts as shown in this example:
login_attempts
login_id | user_id | login | success
-------------------------------------------------------
1 | 1 | 2021-07-01 14:00:00 | 0
2 | 1 | 2021-07-01 16:00:00 | 1
3 | 2 | 2021-07-02 05:01:01 | 1
4 | 1 | 2021-07-04 03:25:34 | 0
5 | 2 | 2021-07-05 11:20:50 | 0
6 | 2 | 2021-07-07 12:34:56 | 1
and
checkouts
checkout_id | checkout_time | user_id | success
------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 0
2 | 2021-07-02 06:54:32 | 2 | 1
3 | 2021-07-04 13:00:01 | 1 | 1
4 | 2021-07-08 09:05:00 | 2 | 1
Given this information, how can I get the following table with historical performance included for each checkout AS OF THAT TIME?
checkout_id | checkout | user_id | lastGoodLogin | lastFailedLogin | lastGoodCheckout | lastFailedCheckout |
---------------------------------------------------------------------------------------------------------------------------------------
1 | 2021-07-01 18:00:00 | 1 | 2021-07-01 16:00:00 | 2021-07-01 14:00:00 | NULL | NULL
2 | 2021-07-02 06:54:32 | 2 | 2021-07-02 05:01:01 | NULL | NULL | NULL
3 | 2021-07-04 13:00:01 | 1 | 2021-07-01 16:00:00 | 2021-07-04 03:25:34 | NULL | 2021-07-01 18:00:00
4 | 2021-07-08 09:05:00 | 2 | 2021-07-07 12:34:56 | 2021-07-05 11:20:50 | 2021-07-02 06:54:32 | NULL
Update: I was able to get lastFailedCheckout & lastGoodCheckout because that's doing window operations on the same table (checkouts) but I am failing to understand how to best join it with login_attempts table to get last[Good|Failed]Login fields. (sqlfiddle)
P.S.: I am open to PostgreSQL suggestions as well.
Good start! A couple of things in your SQL: 1) You should really try to avoid inequality joins, as these can lead to data explosions and aren't needed in this case. Just put a CASE expression inside your window function so that it only sees the type of checkout (or login) you want. 2) You can use the frame clause to avoid selecting the current row itself when finding previous checkouts.
Once you have this pattern you can use it to find the other 2 columns of data you are looking for. The first step is to UNION the tables together, not JOIN them. This means adding a few more columns so the data can live together, but that is easy. Now you have the userid and the time the "thing" happened all in the same data. You just need to WINDOW 2 more times to pull the info you want. Lastly, you need to strip out the non-checkout rows with an outer SELECT that has a WHERE clause.
Like this:
create table login_attempts(
loginid smallint,
userid smallint,
login timestamp,
success smallint
);
create table checkouts(
checkoutid smallint,
userid smallint,
checkout_time timestamp,
success smallint
);
insert into login_attempts values
(1, 1, '2021-07-01 14:00:00', 0),
(2, 1, '2021-07-01 16:00:00', 1),
(3, 2, '2021-07-02 05:01:01', 1),
(4, 1, '2021-07-04 03:25:34', 0),
(5, 2, '2021-07-05 11:20:50', 0),
(6, 2, '2021-07-07 12:34:56', 1)
;
insert into checkouts values
(1, 1, '2021-07-01 18:00:00', 0),
(2, 2, '2021-07-02 06:54:32', 1),
(3, 1, '2021-07-04 13:00:01', 1),
(4, 2, '2021-07-08 09:05:00', 1)
;
SQL:
select *
from (
    select
        c.checkoutid,
        c.userid,
        c.checkout_time,
        max(case success when 0 then checkout_time end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastFailedCheckout,
        max(case success when 1 then checkout_time end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastGoodCheckout,
        max(case lsuccess when 0 then login end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastFailedLogin,
        max(case lsuccess when 1 then login end) over (
            partition by userid
            order by event_time
            rows between unbounded preceding and 1 preceding
        ) as lastGoodLogin
    from (
        select checkout_time as event_time, checkoutid, userid,
               checkout_time, success,
               NULL as login, NULL as lsuccess
        from checkouts
        UNION ALL
        select login as event_time, NULL as checkoutid, userid,
               NULL as checkout_time, NULL as success,
               login, success as lsuccess
        from login_attempts
    ) c
) o
where o.checkoutid is not null
order by o.checkoutid

Oracle SQL Percent Difference Same Column

Given the following auction data, how would you find the percent difference between a person's most recent and previous bid for a product using Oracle SQL?
The duplicate sequence (SEQ) for person A and B is representative of data I am working with.
An example of your SQL would be very appreciated.
TXN_TIME | SEQ | PERSON | PRODUCT | TRANSACTION | BID |
2017-11-22 15:41:10:0 | 20 | A | 1 | BID | 12 |
2017-11-22 15:35:10:0 | 10C | A | 1 | CXLBID | NULL |
2017-11-22 15:34:25:0 | 10 | A | 1 | BID | 10 |
2017-11-22 15:35:40:0 | 6 | A | 2 | BID | 4 |
2017-11-22 15:34:50:0 | 1C | A | 2 | CXLBID | NULL |
2017-11-22 15:34:20:0 | 1 | A | 2 | BID | 5 |
2017-11-22 15:35:45:0 | 6 | B | 2 | BID | 2 |
2017-11-22 15:34:55:0 | 1C | B | 2 | CXLBID | NULL |
2017-11-22 15:34:25:0 | 1 | B | 2 | BID | 1 |
We could use the LEAD/LAG analytic functions if they are available. But one approach here is to use a CTE to identify just the most recent and the immediately prior bid for each person and product, and then compare these two values.
WITH cte AS (
    SELECT PERSON, PRODUCT, BID,
           ROW_NUMBER() OVER (PARTITION BY PERSON, PRODUCT ORDER BY TXN_TIME DESC) rn
    FROM yourTable
    WHERE TRANSACTION = 'BID'
)
SELECT
    t1.PERSON,
    t1.PRODUCT,
    100*(t1.BID - t2.BID) / t2.BID AS BID_PCT_DIFF
FROM cte t1
INNER JOIN cte t2
    ON t1.PERSON = t2.PERSON AND
       t1.PRODUCT = t2.PRODUCT AND
       t1.rn = 1 AND t2.rn = 2;
This output looks correct: for product 1, person A went from a bid of 10 to 12, an increase of 2, or 20%; for product 2, person A went from 5 to 4, a decrease of 20%, and person B went from 1 to 2, a 100% increase.
I created a demo below in SQL Server, because I always have difficulties getting Oracle demos to work. But my query is just ANSI SQL and should run the same on either SQL Server or Oracle.
Demo
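The LEAD/LAG route mentioned above could look roughly like this. It is a sketch, not the query from the demo: yourTable is the same placeholder name as in the CTE version, and it assumes the comparison is wanted per person and product and that cancelled bids (CXLBID rows) are simply skipped.

```sql
-- LAG fetches the previous bid within each person/product group;
-- ROW_NUMBER marks the most recent bid so only that row is kept.
SELECT person, product, txn_time, bid,
       100 * (bid - prev_bid) / prev_bid AS bid_pct_diff
FROM (
    SELECT person, product, txn_time, bid,
           LAG(bid) OVER (PARTITION BY person, product
                          ORDER BY txn_time) AS prev_bid,
           ROW_NUMBER() OVER (PARTITION BY person, product
                              ORDER BY txn_time DESC) AS rn
    FROM yourTable
    WHERE transaction = 'BID'
)
WHERE rn = 1;
```

Compared with the self-join on the CTE, this reads the table once. A person with only one bid for a product still gets a row, with a NULL difference, which the join version would drop.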
Good thing you are using Oracle 12. This way you can use the MATCH_RECOGNIZE clause, which is perfect for your problem.
I calculate the CHANGE column in the MATCH_RECOGNIZE clause, using the LAST() function with the optional second argument, which is a logical offset within the set of rows mapped to a specific pattern variable. I format the CHANGE column in the SELECT clause - I use a favorite hack, using the "currency" symbol to attach the percent sign... you can modify the formatting any way you want, without affecting the calculation (which is hidden in the MATCH_RECOGNIZE clause).
with auction_data ( txn_time, seq, person, product, transaction, bid ) as (
select timestamp '2017-11-22 15:41:10', '20' , 'A', 1, 'BID' , 12 from dual union all
select timestamp '2017-11-22 15:35:10', '10C', 'A', 1, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:25', '10' , 'A', 1, 'BID' , 10 from dual union all
select timestamp '2017-11-22 15:35:40', '6' , 'A', 2, 'BID' , 4 from dual union all
select timestamp '2017-11-22 15:34:50', '1C' , 'A', 2, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:20', '1' , 'A', 2, 'BID' , 5 from dual union all
select timestamp '2017-11-22 15:35:45', '6' , 'B', 2, 'BID' , 2 from dual union all
select timestamp '2017-11-22 15:34:55', '1C' , 'B', 2, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:25', '1' , 'B', 2, 'BID' , 1 from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
select txn_time, seq, person, product, transaction, bid,
to_char( 100 * (change - 1), '999D0L', 'nls_currency=''%''') as change
from auction_data
match_recognize(
partition by person, product
order by txn_time
measures case when classifier() = 'B' then bid / last(B.bid, 1) end as change
all rows per match
pattern ( (B|A)* )
define B as B.transaction = 'BID'
);
TXN_TIME SEQ PERSON PRODUCT TRANSACTION BID CHANGE
------------------- --- ------ ---------- ----------- ---------- ----------------
2017-11-22 15:34:25 10 A 1 BID 10
2017-11-22 15:35:10 10C A 1 CXLBID
2017-11-22 15:41:10 20 A 1 BID 12 20.0%
2017-11-22 15:34:20 1 A 2 BID 5
2017-11-22 15:34:50 1C A 2 CXLBID
2017-11-22 15:35:40 6 A 2 BID 4 -20.0%
2017-11-22 15:34:25 1 B 2 BID 1
2017-11-22 15:34:55 1C B 2 CXLBID
2017-11-22 15:35:45 6 B 2 BID 2 100.0%

DB2 SQL to aggregate value for months with no gaps

I have 2 tables which I need to join against, along with a table that is generated inline using WITH. The WITH is a daterange, and I need to display all rows from 1 table for all months, even where no data exists in the 2nd table.
This is the data within the tables :
Table referral_grouper
referral_group
--------------
VER
FRD
FCC
Table test_data
referral_group | task_date | task_id | over_threshold
---------------+------------+---------+---------------
VER | 2015-10-01 | 10 | 0
FRD | 2015-11-04 | 20 | 1
The date range will need to select 3 months :
Oct-2015
Nov-2015
Dec-2015
The data I expect to end up with will be :
MonthYear | referral_group | count_of_group | total_over_threshold
----------+----------------+----------------+---------------------
Oct-2015 | VER | 1 | 0
Oct-2015 | FRD | 0 | 0
Oct-2015 | FCC | 0 | 0
Nov-2015 | VER | 0 | 0
Nov-2015 | FRD | 1 | 1
Nov-2015 | FCC | 0 | 0
Dec-2015 | VER | 0 | 0
Dec-2015 | FRD | 0 | 0
Dec-2015 | FCC | 0 | 0
DDL to create the 2 tables and populate with data is as below..
CREATE TABLE test_data (
referral_group char(3),
task_date date,
task_id integer,
over_threshold integer);
insert into test_data values
('VER','2015-10-01',10,0),
('FRD','2015-11-04',20,1);
CREATE TABLE referral_grouper (
referral_group char(3));
insert into referral_grouper values
('FRD'),
('VER'),
('FCC');
This is a very cut-down example which uses the minimal tables/columns for this example, which is why I have no primary keys/indexes.
I can get this running under LUW no problem, by using NOT EXISTS in the joins as per this SQL.
WITH DATERANGE(FROM_DTE,yyyymm, TO_DTE) AS
(
SELECT DATE('2015-10-01'), YEAR('2015-10-01')*100+MONTH('2015-10-01'), '2015-12-31'
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT FROM_DTE + 1 DAY, YEAR(FROM_DTE+1 DAY)*100+MONTH(FROM_DTE+1 DAY), TO_DTE
FROM DATERANGE
WHERE FROM_DTE < TO_DTE
)
select
referral_grouper.referral_group,
daterange.yyyymm,
count(test_data.task_id) AS total_count,
COALESCE(SUM(over_threshold),0) AS total_over_threshold
FROM
test_data
RIGHT OUTER JOIN daterange ON (daterange.from_dte=test_data.task_date OR NOT EXISTS (SELECT 1 FROM daterange d2 WHERE d2.from_dte=test_data.task_date))
RIGHT OUTER JOIN referral_grouper ON (referral_grouper.referral_group=test_data.referral_group OR NOT EXISTS (SELECT 1 FROM referral_grouper g2 WHERE g2.referral_group=test_data.referral_group))
GROUP BY
referral_grouper.referral_group,
daterange.yyyymm
However, this needs to work on z/OS, and under z/OS you cannot use subqueries with EXISTS in a join. Removing the NOT EXISTS means the non-existing rows no longer show up.
There must be a way to write the SQL to return all rows from the 2 linking tables without using NOT EXISTS, but I just cannot seem to find it. Any help with this would be very appreciated, as it has me stumped.
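One possible direction, sketched here under assumptions rather than tested on z/OS: first build every combination of date and referral group, then LEFT OUTER JOIN the data onto that grid. The join condition is then a plain pair of equalities and no subquery is needed. The DATERANGE definition is copied from the query above; the aliases d, g, x and t are made up for this sketch.

```sql
WITH DATERANGE(FROM_DTE, yyyymm, TO_DTE) AS
(
    SELECT DATE('2015-10-01'), YEAR('2015-10-01')*100+MONTH('2015-10-01'), '2015-12-31'
    FROM SYSIBM.SYSDUMMY1
    UNION ALL
    SELECT FROM_DTE + 1 DAY, YEAR(FROM_DTE+1 DAY)*100+MONTH(FROM_DTE+1 DAY), TO_DTE
    FROM DATERANGE
    WHERE FROM_DTE < TO_DTE
)
SELECT
    x.referral_group,
    x.yyyymm,
    COUNT(t.task_id) AS total_count,
    COALESCE(SUM(t.over_threshold), 0) AS total_over_threshold
FROM
    -- every (day, group) pair, whether or not data exists for it
    (SELECT d.from_dte, d.yyyymm, g.referral_group
     FROM daterange d, referral_grouper g) AS x
    LEFT OUTER JOIN test_data t
        ON  t.task_date = x.from_dte
        AND t.referral_group = x.referral_group
GROUP BY
    x.referral_group,
    x.yyyymm
ORDER BY
    x.yyyymm,
    x.referral_group
```

COUNT(t.task_id) only counts rows that actually matched, so month and group combinations without data come out as 0 rather than disappearing.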

PostgreSQL XOR - How to check if only 1 column is filled in?

How can I simulate a XOR function in PostgreSQL? Or, at least, I think this is a XOR-kind-of situation.
Let's say the data is as follows:
id | col1 | col2 | col3
---+------+------+------
1 | 1 | | 4
2 | | 5 | 4
3 | | 8 |
4 | 12 | 5 | 4
5 | | | 4
6 | 1 | |
7 | | 12 |
I want to return one column for those rows where only one of the columns is filled in (ignore col3 for now).
Let's start with this example of 2 columns:
SELECT
id, COALESCE(col1, col2) AS col
FROM
my_table
WHERE
COALESCE(col1, col2) IS NOT NULL -- at least 1 is filled in
AND
(col1 IS NULL OR col2 IS NULL) -- at least 1 is empty
;
This works nicely and should result in:
id | col
---+----
1 | 1
3 | 8
6 | 1
7 | 12
But now, I would like to include col3 in a similar way. Like this:
id | col
---+----
1 | 1
3 | 8
5 | 4
6 | 1
7 | 12
How can this be done in a more generic way? Does Postgres support such a method?
I'm not able to find anything like it.
Rows with exactly 1 column filled in:
select * from my_table where
(col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
=1
Rows with 1 or 2 columns filled in:
select * from my_table where
(col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
between 1 and 2
A CASE expression might be your friend here. The MIN aggregate doesn't affect the result as long as id is unique, since each group then holds a single row.
select id, min(coalesce(col1,col2,col3))
from my_table
group by 1
having sum(case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
[Edit]
Well, I found a better answer that doesn't use aggregate functions. It is still based on CASE, but I think it is simpler.
select id, coalesce(col1,col2,col3)
from my_table
where (case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
How about
select coalesce(col1, col2, col3)
from my_table
where array_length(array_remove(array[col1, col2, col3], null), 1) = 1
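For the generic version asked about, PostgreSQL 9.6 and later ship a built-in num_nonnulls() function that counts its non-NULL arguments, which removes the casts and CASE expressions. A short sketch against the same my_table; the column alias col is just for illustration:

```sql
-- Keep rows where exactly one of the three columns is filled in,
-- and return that single value.
select id, coalesce(col1, col2, col3) as col
from my_table
where num_nonnulls(col1, col2, col3) = 1;
```

Adding a fourth column only means adding it to both argument lists. On versions older than 9.6 the function does not exist, so one of the CASE or cast-based forms above is still needed there.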

Find Parent Recursively using Query

I am using PostgreSQL. I have a table like the one below:
parent_id child_id
----------------------
101 102
103 104
104 105
105 106
I want to write a SQL query that returns the topmost parent of a given input.
For example, if I pass 106 as input, the output should be 103.
(106 --> 105 --> 104 --> 103)
Here's a complete example. First the DDL:
test=> CREATE TABLE node (
test(> id SERIAL,
test(> label TEXT NOT NULL, -- name of the node
test(> parent_id INT,
test(> PRIMARY KEY(id)
test(> );
NOTICE: CREATE TABLE will create implicit sequence "node_id_seq" for serial column "node.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "node_pkey" for table "node"
CREATE TABLE
...and some data...
test=> INSERT INTO node (label, parent_id) VALUES ('n1',NULL),('n2',1),('n3',2),('n4',3);
INSERT 0 4
test=> INSERT INTO node (label) VALUES ('garbage1'),('garbage2'), ('garbage3');
INSERT 0 3
test=> INSERT INTO node (label,parent_id) VALUES ('garbage4',6);
INSERT 0 1
test=> SELECT * FROM node;
id | label | parent_id
----+----------+-----------
1 | n1 |
2 | n2 | 1
3 | n3 | 2
4 | n4 | 3
5 | garbage1 |
6 | garbage2 |
7 | garbage3 |
8 | garbage4 | 6
(8 rows)
This performs a recursive query on every id in node:
test=> WITH RECURSIVE nodes_cte(id, label, parent_id, depth, path) AS (
SELECT tn.id, tn.label, tn.parent_id, 1::INT AS depth, tn.id::TEXT AS path
FROM node AS tn
WHERE tn.parent_id IS NULL
UNION ALL
SELECT c.id, c.label, c.parent_id, p.depth + 1 AS depth,
(p.path || '->' || c.id::TEXT)
FROM nodes_cte AS p, node AS c
WHERE c.parent_id = p.id
)
SELECT * FROM nodes_cte AS n ORDER BY n.id ASC;
id | label | parent_id | depth | path
----+----------+-----------+-------+------------
1 | n1 | | 1 | 1
2 | n2 | 1 | 2 | 1->2
3 | n3 | 2 | 3 | 1->2->3
4 | n4 | 3 | 4 | 1->2->3->4
5 | garbage1 | | 1 | 5
6 | garbage2 | | 1 | 6
7 | garbage3 | | 1 | 7
8 | garbage4 | 6 | 2 | 6->8
(8 rows)
This gets all of the descendants WHERE node.id = 1:
test=> WITH RECURSIVE nodes_cte(id, label, parent_id, depth, path) AS (
SELECT tn.id, tn.label, tn.parent_id, 1::INT AS depth, tn.id::TEXT AS path FROM node AS tn WHERE tn.id = 1
UNION ALL
SELECT c.id, c.label, c.parent_id, p.depth + 1 AS depth, (p.path || '->' || c.id::TEXT) FROM nodes_cte AS p, node AS c WHERE c.parent_id = p.id
)
SELECT * FROM nodes_cte AS n;
id | label | parent_id | depth | path
----+-------+-----------+-------+------------
1 | n1 | | 1 | 1
2 | n2 | 1 | 2 | 1->2
3 | n3 | 2 | 3 | 1->2->3
4 | n4 | 3 | 4 | 1->2->3->4
(4 rows)
The following will get the path of the node with id 4:
test=> WITH RECURSIVE nodes_cte(id, label, parent_id, depth, path) AS (
SELECT tn.id, tn.label, tn.parent_id, 1::INT AS depth, tn.id::TEXT AS path
FROM node AS tn
WHERE tn.parent_id IS NULL
UNION ALL
SELECT c.id, c.label, c.parent_id, p.depth + 1 AS depth,
(p.path || '->' || c.id::TEXT)
FROM nodes_cte AS p, node AS c
WHERE c.parent_id = p.id
)
SELECT * FROM nodes_cte AS n WHERE n.id = 4;
id | label | parent_id | depth | path
----+-------+-----------+-------+------------
4 | n4 | 3 | 4 | 1->2->3->4
(1 row)
And let's assume you want to limit your search to descendants with a depth less than three (note that depth hasn't been incremented yet):
test=> WITH RECURSIVE nodes_cte(id, label, parent_id, depth, path) AS (
SELECT tn.id, tn.label, tn.parent_id, 1::INT AS depth, tn.id::TEXT AS path
FROM node AS tn WHERE tn.id = 1
UNION ALL
SELECT c.id, c.label, c.parent_id, p.depth + 1 AS depth,
(p.path || '->' || c.id::TEXT)
FROM nodes_cte AS p, node AS c
WHERE c.parent_id = p.id AND p.depth < 2
)
SELECT * FROM nodes_cte AS n;
id | label | parent_id | depth | path
----+-------+-----------+-------+------
1 | n1 | | 1 | 1
2 | n2 | 1 | 2 | 1->2
(2 rows)
I'd recommend using an ARRAY data type instead of a string for demonstrating the "path", but the arrow is more illustrative of the parent<=>child relationship.
Use WITH RECURSIVE to create a Common Table Expression (CTE). For the non-recursive term, get the rows whose parent is a root, that is, a parent that never appears as a child itself:
SELECT
    c.child_id,
    c.parent_id
FROM
    mytable c
LEFT JOIN
    mytable p ON c.parent_id = p.child_id
WHERE
    p.child_id IS NULL
child_id | parent_id
----------+-----------
102 | 101
104 | 103
For the recursive term, you want the children of these children.
WITH RECURSIVE tree(child, root) AS (
    SELECT
        c.child_id,
        c.parent_id
    FROM
        mytable c
    LEFT JOIN
        mytable p ON c.parent_id = p.child_id
    WHERE
        p.child_id IS NULL
    UNION
    SELECT
        child_id,
        root
    FROM
        tree
    INNER JOIN
        mytable on tree.child = mytable.parent_id
)
SELECT * FROM tree;
child | root
-------+------
102 | 101
104 | 103
105 | 103
106 | 103
You can filter the children when querying the CTE:
WITH RECURSIVE tree(child, root) AS (...) SELECT root FROM tree WHERE child = 106;
root
------
103
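When only one input is of interest, an alternative is to walk upwards from that node instead of building the whole tree first. A minimal sketch, assuming the same mytable(parent_id, child_id) as above, at most one parent per child, and no cycles in the data; the CTE name ancestors and its columns node_id and depth are made up for illustration:

```sql
WITH RECURSIVE ancestors(node_id, depth) AS (
    -- start at the input node
    SELECT 106, 0
    UNION ALL
    -- step from the current node to its parent, if it has one
    SELECT m.parent_id, a.depth + 1
    FROM ancestors a
    JOIN mytable m ON m.child_id = a.node_id
)
SELECT node_id AS root
FROM ancestors
ORDER BY depth DESC
LIMIT 1;
```

For 106 the recursion visits 106, 105, 104 and 103 and stops there because 103 never appears as a child_id, so the deepest row is the root, 103. If the input is itself a root, the query returns the input unchanged.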