Oracle SQL strategy for slow parameterized/filtered queries due to static optimizer strategy - oracle10g

More simply put than the example below: if one has one or more query parameters, e.g. x_id (or report/table function parameters), that are performance-crucial (e.g. some primary key index could be used) and that may be, depending on the use case or the report filters applied, one of
null
an exact match (e.g. some unique id)
a like expression
or even a regexp expression
then, if all these possibilities are coded in a single query, as far as I can tell the optimizer will
generate a single static plan, independent of the actual runtime value of the parameter
and thus cannot be expected to use an index on x_id even when the filter is e.g. an exact match.
Are there other ways to handle this than to
let some PL/SQL code choose from n predefined, use-case-optimized queries/views?
whose number can grow quickly the more such flexible parameters one has
or to use some manually string-constructed and dynamically compiled query?
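The second alternative could look roughly like this minimal sketch: a PL/SQL function that appends only the predicate matching the runtime filter type and opens a ref cursor, so each variant gets its own optimized plan. All names (my_table, x_id, my_data_cur, ...) are illustrative only, not from the real schema:

```sql
-- Minimal sketch of alternative 2 (dynamically constructed query).
create or replace function my_data_cur(
  p_filter_type varchar2,   -- 'all' | 'eq' | 'like' | 'regexp'
  p_x_id        varchar2
) return sys_refcursor is
  c sys_refcursor;
  q varchar2(4000) := 'select * from my_table where 1=1';
begin
  -- append only the predicate that is actually needed;
  -- each resulting SQL text gets its own cursor and plan
  case p_filter_type
    when 'eq'     then q := q || ' and x_id = :v';
    when 'like'   then q := q || ' and x_id like :v';
    when 'regexp' then q := q || ' and regexp_like(x_id, :v)';
    else null; -- 'all': no extra predicate, no bind
  end case;
  if p_filter_type = 'all' then
    open c for q;
  else
    open c for q using p_x_id;
  end if;
  return c;
end;
/
```

The maintenance pain the question mentions is real, though: every additional flexible parameter multiplies the number of possible SQL texts.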
Basically I have two slightly different use cases/questions as documented and executable below:
A - select * from tf_sel
B - select * from data_union
which could potentially be solved via SQL hints or using some other trick.
To speed these queries up, I am currently separating the "merged queries" at a certain implementation level (table function), which is quite cumbersome and harder to maintain, but it ensures the queries run fast thanks to their better execution plans.
As I see it, the main problem is the static nature of the optimizer's SQL plan: it is always the same, although it could be much more efficient if it considered some "query-time-constant" filter parameters.
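Regarding the "SQL hints" idea: one avenue that might be worth testing (no guarantee it fires for this predicate shape, and likely not for the regexp branch) is requesting OR-expansion with the USE_CONCAT hint, so each OR branch can get its own access path. The table and bind names below are illustrative:

```sql
-- Hedged sketch only: USE_CONCAT asks the optimizer to rewrite the
-- OR chain as a concatenation (UNION ALL-like plan) of branches,
-- each of which may then use its own index. Whether it actually
-- fires here must be verified against the real plan.
select /*+ use_concat */ *
from my_table x
where :p_filter_type = 'all'
   or (:p_filter_type = 'eq'     and x.indexed_val = :p_val)
   or (:p_filter_type = 'like'   and x.indexed_val like :p_val)
   or (:p_filter_type = 'regexp' and regexp_like(x.indexed_val, :p_val));
```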
with
-- Question A: What would be a good strategy to make tf_sel with tf_params nearly as fast as query_use_case_1_eq
-- which actually provides the same result?
--
-- - a complex query should be used in various reports with filters
-- - we want to keep as much as possible filter functionality on the db side (not the report engine side)
-- to be able to utilize the fast and efficient db engine and for loosely coupled software design
complex_query as ( -- just some imaginable complex query with a lot of table/view joins, aggregation/analytical functions etc.
select 1 as id, 'ab12' as indexed_val, 'asdfasdf' x from dual
union all select 2, 'ab34', 'a uiop345' from dual
union all select 3, 'xy34', 'asdf 0u0duaf' from dual
union all select 4, 'xy55', ' asdja´sf asd' from dual
)
-- <<< comment the following lines in to test it with the above
-- , query_use_case_1_eq as ( -- quite fast and maybe the 95% use case
-- select * from complex_query where indexed_val = 'ab12'
-- )
--select * from query_use_case_1_eq
-- >>>
-- ID INDEXED_VAL X
-- -- ----------- --------
-- 1 ab12 asdfasdf
-- <<< comment the following lines in to test it with the above
-- , query_use_case_2_all as ( -- significantly slower due to a lot of underlying calculations
-- select * from complex_query
-- )
--select * from query_use_case_2_all
-- >>>
-- ID INDEXED_VAL X
-- -- ----------- -------------
-- 1 ab12 asdfasdf
-- 2 ab34 a uiop345
-- 3 xy34 asdf 0u0duaf
-- 4 xy55 asdja´sf asd
-- <<< comment the following lines in to test it with the above
-- , query_use_case_3_like as (
-- select * from complex_query where indexed_val like 'ab%'
-- )
--select * from query_use_case_3_like
-- >>>
-- ID INDEXED_VAL X
-- -- ----------- ---------
-- 1 ab12 asdfasdf
-- 2 ab34 a uiop345
-- <<< comment the following lines to simulate the table function
, tf_params as ( -- table function params: imagine we have a table function where these are passed depending on the report
select 'ab12' p_indexed_val, 'eq' p_filter_type from dual
)
, tf_sel as ( -- table function select: nicely integrating all query possibilities, but being very slow :-(
select q.*
from
tf_params p -- just here so this example works without the need for the actual function
join complex_query q on (1=1)
where
p_filter_type = 'all'
or (p_filter_type = 'eq' and indexed_val = p_indexed_val)
or (p_filter_type = 'like' and indexed_val like p_indexed_val)
or (p_filter_type = 'regexp' and regexp_like(indexed_val, p_indexed_val))
)
-- actually we would pass the tf_params above if it were a real table function
select * from tf_sel
-- >>>
-- ID INDEXED_VAL X
-- -- ----------- --------
-- 1 ab12 asdfasdf
-- Question B: How can we speed up data_union with dg_filter to be as fast as the data_group1 query which
-- actually provides the same result?
--
-- A very similar approach is considered in other scenarios where we would like to join the results of
-- different queries (>5) returning joinable data and being filtered based on the same parameters.
-- <<< comment the following lines to simulate the union problem
-- , data_group1 as ( -- may run quite fast
-- select 'dg1' dg_id, q.* from complex_query q where x < 'a' -- just an example returning some special rows that should be filtered later on!
-- )
--
-- , data_group2 as ( -- may run quite fast
-- select 'dg2' dg_id, q.* from complex_query q where instr(x,'p') >= 0 -- just an example returning some special rows that should be filtered later on!
-- )
--
--
-- , dg_filter as ( -- may be set by a report or indirectly by user filters
-- select 'dg1' dg_id from dual
-- )
--
-- , data_union as ( -- runs much slower due to another execution plan
-- select * from (
-- select * from data_group1
-- union all select * from data_group2
-- )
-- where dg_id in (select dg_id from dg_filter)
-- )
--
--select * from data_union
-- >>>
-- DG_ID ID INDEXED_VAL X
-- ----- -- ----------- -------------
-- dg1 4 xy55 asdja´sf asd
This is a comment on the sample code and answer provided by jonearles.
Actually, your answer mixed up my use cases A and B (which are unrelated, although they occur together in certain scenarios). Nevertheless, it is essential that you mentioned the optimizer's dynamic FILTER operation and possibly other dynamic capabilities.
use case B ("data partition/group union")
Actually, use case B (based on your sample table) looks more like the query below, but I still have to check for the performance issue in the real scenario. Maybe you can already see some problems with it?
select * from (
select 'dg1' data_group, x.* from sample_table x
where mod(to_number(some_other_column1), 100000) = 0 -- just some example restriction
--and indexed_val = '3635' -- commenting this in and executing this standalone returns:
----------------------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 23 | 2 (0)|
--| 1 | TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE | 1 | 23 | 2 (0)|
--| 2 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 1 | | 1 (0)|
----------------------------------------------------------------------------------------
union all
select 'dg2', x.* from sample_table x
where mod(to_number(some_other_column2), 9999) = 0 -- just some example restriction
union all
select 'dg3', x.* from sample_table x
where mod(to_number(some_other_column3), 3635) = 0 -- just some example restriction
)
where data_group in ('dg1') and indexed_val = '35'
-------------------------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 3 | 639 | 2 (0)|
--| 1 | VIEW | | 3 | 639 | 2 (0)|
--| 2 | UNION-ALL | | | | |
--| 3 | TABLE ACCESS BY INDEX ROWID | SAMPLE_TABLE | 1 | 23 | 2 (0)|
--| 4 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 1 | | 1 (0)|
--| 5 | FILTER | | | | |
--| 6 | TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE | 1 | 23 | 2 (0)|
--| 7 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 1 | | 1 (0)|
--| 8 | FILTER | | | | |
--| 9 | TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE | 1 | 23 | 2 (0)|
--| 10 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 1 | | 1 (0)|
-------------------------------------------------------------------------------------------
use case A (filtering by column query type)
Based on your sample table, this is more like what I want to do.
As you can see, the query with just the fast where p.ft_id = 'eq' and x.indexed_val = p.val shows the index usage, but having all the different filter options in the where clause causes the plan to switch to always using a full table scan :-/
(Even if I use the :p_filter_type and :p_indexed_val_filter bind variables everywhere in the SQL rather than just in the one spot I put them, it doesn't change.)
with
filter_type as (
select 'all' as id from dual
union all select 'eq' as id from dual
union all select 'like' as id from dual
union all select 'regexp' as id from dual
)
, params as (
select
(select * from filter_type where id = :p_filter_type) as ft_id,
:p_indexed_val_filter as val
from dual
)
select *
from params p
join sample_table x on (1=1)
-- the following with the above would show the 'eq' use case with a fast index scan (plan id 14/15)
--where p.ft_id = 'eq' and x.indexed_val = p.val
------------------------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1 | 23 | 12 (0)|
--| 1 | VIEW | | 4 | 20 | 8 (0)|
--| 2 | UNION-ALL | | | | |
--| 3 | FILTER | | | | |
--| 4 | FAST DUAL | | 1 | | 2 (0)|
--| 5 | FILTER | | | | |
--| 6 | FAST DUAL | | 1 | | 2 (0)|
--| 7 | FILTER | | | | |
--| 8 | FAST DUAL | | 1 | | 2 (0)|
--| 9 | FILTER | | | | |
--| 10 | FAST DUAL | | 1 | | 2 (0)|
--| 11 | FILTER | | | | |
--| 12 | NESTED LOOPS | | 1 | 23 | 4 (0)|
--| 13 | FAST DUAL | | 1 | | 2 (0)|
--| 14 | TABLE ACCESS BY INDEX ROWID| SAMPLE_TABLE | 1 | 23 | 2 (0)|
--| 15 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 1 | | 1 (0)|
--| 16 | VIEW | | 4 | 20 | 8 (0)|
--| 17 | UNION-ALL | | | | |
--| 18 | FILTER | | | | |
--| 19 | FAST DUAL | | 1 | | 2 (0)|
--| 20 | FILTER | | | | |
--| 21 | FAST DUAL | | 1 | | 2 (0)|
--| 22 | FILTER | | | | |
--| 23 | FAST DUAL | | 1 | | 2 (0)|
--| 24 | FILTER | | | | |
--| 25 | FAST DUAL | | 1 | | 2 (0)|
------------------------------------------------------------------------------------------
where
--mod(to_number(some_other_column1), 3000) = 0 and -- just some example restriction
(
p.ft_id = 'all'
or
p.ft_id = 'eq' and x.indexed_val = p.val
or
p.ft_id = 'like' and x.indexed_val like p.val
or
p.ft_id = 'regexp' and regexp_like(x.indexed_val, p.val)
)
-- with the full flexibility of the filter the plan shows a full table scan (plan id 13) :-(
--------------------------------------------------------------------------
--| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
--| 0 | SELECT STATEMENT | | 1099 | 25277 | 115 (3)|
--| 1 | VIEW | | 4 | 20 | 8 (0)|
--| 2 | UNION-ALL | | | | |
--| 3 | FILTER | | | | |
--| 4 | FAST DUAL | | 1 | | 2 (0)|
--| 5 | FILTER | | | | |
--| 6 | FAST DUAL | | 1 | | 2 (0)|
--| 7 | FILTER | | | | |
--| 8 | FAST DUAL | | 1 | | 2 (0)|
--| 9 | FILTER | | | | |
--| 10 | FAST DUAL | | 1 | | 2 (0)|
--| 11 | NESTED LOOPS | | 1099 | 25277 | 115 (3)|
--| 12 | FAST DUAL | | 1 | | 2 (0)|
--| 13 | TABLE ACCESS FULL| SAMPLE_TABLE | 1099 | 25277 | 113 (3)|
--| 14 | VIEW | | 4 | 20 | 8 (0)|
--| 15 | UNION-ALL | | | | |
--| 16 | FILTER | | | | |
--| 17 | FAST DUAL | | 1 | | 2 (0)|
--| 18 | FILTER | | | | |
--| 19 | FAST DUAL | | 1 | | 2 (0)|
--| 20 | FILTER | | | | |
--| 21 | FAST DUAL | | 1 | | 2 (0)|
--| 22 | FILTER | | | | |
--| 23 | FAST DUAL | | 1 | | 2 (0)|
--------------------------------------------------------------------------

Several features enable the optimizer to produce dynamic plans. The most common one is the FILTER operation, which should not be confused with filter predicates. A FILTER operation allows Oracle to enable or disable part of the plan at runtime based on a dynamic value. This feature normally works with bind variables; other types of dynamic queries may not use it.
Sample schema
create table sample_table
(
indexed_val varchar2(100),
some_other_column1 varchar2(100),
some_other_column2 varchar2(100),
some_other_column3 varchar2(100)
);
insert into sample_table
select level, level, level, level
from dual
connect by level <= 100000;
create index sample_table_idx1 on sample_table(indexed_val);
begin
dbms_stats.gather_table_stats(user, 'sample_table');
end;
/
Sample query using bind variables
explain plan for
select * from sample_table where :p_filter_type = 'all'
union all
select * from sample_table where :p_filter_type = 'eq' and indexed_val = :p_indexed_val
union all
select * from sample_table where :p_filter_type = 'like' and indexed_val like :p_indexed_val
union all
select * from sample_table where :p_filter_type = 'regexp' and regexp_like(indexed_val, :p_indexed_val);
select * from table(dbms_xplan.display(format => '-cost -bytes -rows'));
Sample plan
This demonstrates vastly different plans being used depending on input. A single = will use an INDEX RANGE SCAN, while no predicate will use a TABLE ACCESS FULL. The regular expression also uses a full table scan, since there is no way to index regular expressions. However, depending on the exact type of expression, it may be possible to enable useful indexing through function-based indexes or Oracle Text indexes.
Plan hash value: 100704550
------------------------------------------------------------------------------
| Id | Operation | Name | Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 00:00:01 |
| 1 | UNION-ALL | | |
|* 2 | FILTER | | |
| 3 | TABLE ACCESS FULL | SAMPLE_TABLE | 00:00:01 |
|* 4 | FILTER | | |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| SAMPLE_TABLE | 00:00:01 |
|* 6 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 00:00:01 |
|* 7 | FILTER | | |
| 8 | TABLE ACCESS BY INDEX ROWID BATCHED| SAMPLE_TABLE | 00:00:01 |
|* 9 | INDEX RANGE SCAN | SAMPLE_TABLE_IDX1 | 00:00:01 |
|* 10 | FILTER | | |
|* 11 | TABLE ACCESS FULL | SAMPLE_TABLE | 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:P_FILTER_TYPE='all')
4 - filter(:P_FILTER_TYPE='eq')
6 - access("INDEXED_VAL"=:P_INDEXED_VAL)
7 - filter(:P_FILTER_TYPE='like')
9 - access("INDEXED_VAL" LIKE :P_INDEXED_VAL)
filter("INDEXED_VAL" LIKE :P_INDEXED_VAL)
10 - filter(:P_FILTER_TYPE='regexp')
11 - filter( REGEXP_LIKE ("INDEXED_VAL",:P_INDEXED_VAL))
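As a sketch of that last point about function-based and Oracle Text indexes: for patterns anchored to a fixed prefix, a function-based index might help, and for word/token searches an Oracle Text CONTEXT index is an option. Index names are illustrative, and whether either applies depends entirely on the actual expressions used:

```sql
-- Function-based index: can help filters that only examine a fixed
-- prefix of the column, e.g. the first two characters.
create index sample_table_fbi1
  on sample_table (substr(indexed_val, 1, 2));

-- A query written against the same expression may then use the index:
select *
from sample_table
where substr(indexed_val, 1, 2) = 'ab';

-- Oracle Text (CONTEXT) index: an option for token searches that
-- would otherwise need LIKE '%...%' or regexp_like.
create index sample_table_txt1
  on sample_table (indexed_val)
  indextype is ctxsys.context;

select *
from sample_table
where contains(indexed_val, 'ab12') > 0;
```

Note that Text indexes bring their own maintenance and synchronization considerations, so they are only worthwhile if the search patterns really fit.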

(more for situation A, but also applicable to B in this way ...)
I am now using a hybrid approach (a combination of points 1 and 2 in my question) and actually quite like it, because it also provides good debugging and encapsulation possibilities. The optimizer does not have to find the best strategy for what are basically logically separate queries merged into one bigger query (e.g. via internal FILTER rules), which may work out well or, at worst, be vastly less efficient:
using this in the report
select *
from table(my_report_data_func_sql(
:val1,
:val1_filter_type,
:val2
))
where the table function is defined like this
create or replace function my_report_data_func_sql(
p_val1 integer default 1234,
p_val1_filter_type varchar2 default 'eq',
p_val2 varchar2 default null
) return varchar2 is
query varchar2(4000) := '
with params as ( -- *: default param
select
'||p_val1||' p_val1, -- 1234*
'''||p_val1_filter_type||''' p_val1_filter_type, -- [eq, all*, like, regexp]
'''||p_val2||''' p_val2 -- null*
from dual
)
select x.*
from
params p -- workaround for standalone-sql-debugging using "with" statement above
join my_report_data_base_view x on (1=1)
where 1=1 -- ease of filter expression adding below
'
-- #### FILTER CRITERIAS are appended here ####
-- val1-filter
||case p_val1_filter_type
when 'eq' then '
and val1 = p_val1
' when 'like' then '
and val1 like p_val1
' when 'regexp' then '
and regexp_like(val1, p_val1)
' else '' end -- all
;
begin
return query;
end;
/
and would, for example, produce the following:
select *
from table(my_report_data_func_sql(
1234,
'eq',
'someval2'
))
/*
with params as ( -- *: default param
select
1234 p_val1, -- 1234*
'eq' p_val1_filter_type, -- [eq, all*, like, regexp]
'someval2' p_val2 -- null*
from dual
)
select x.*
from
params p -- workaround for standalone-sql-debugging using "with" statement above
join my_report_data_base_view x on (1=1)
where 1=1 -- ease of filter expression adding below
and val1 = p_val1
*/

Related

How can I write a function with two tables inputs and one table output in PostgreSQL?

I want to create a function that can create a table, in which part of the columns is derived from the other two tables.
input table1:
This is a static table for each loan. Each loan has only one row with information related to that loan. For example, original unpaid balance, original interest rate...
| id | loan_age | ori_upb | ori_rate | ltv |
| --- | -------- | ------- | -------- | --- |
| 1 | 360 | 1500 | 4.5 | 0.6 |
| 2 | 360 | 2000 | 3.8 | 0.5 |
input table2:
This is a dynamic table for each loan. Each loan has several rows showing the loan's performance in each month, for example current unpaid balance, current interest rate, delinquency status...
| id | month| cur_upb | cur_rate |status|
| ---| --- | ------- | -------- | --- |
| 1 | 01 | 1400 | 4.5 | 0 |
| 1 | 02 | 1300 | 4.5 | 0 |
| 1 | 03 | 1200 | 4.5 | 1 |
| 2 | 01 | 2000 | 3.8 | 0 |
| 2 | 02 | 1900 | 3.8 | 0 |
| 2 | 03 | 1900 | 3.8 | 1 |
| 2 | 04 | 1900 | 3.8 | 2 |
output table:
The output table contains information from table1 and table2. Payoffupb is the last record of cur_upb in table2. This table is built for model development.
| id | loan_age | ori_upb | ori_rate | ltv | payoffmonth| payoffupb | payoffrate |lastStatus | modification |
| ---| -------- | ------- | -------- | --- | ---------- | --------- | ---------- |---------- | ------------ |
| 1 | 360 | 1500 | 4.5 | 0.6 | 03 | 1200 | 4.5 | 1 | null |
| 2 | 360 | 2000 | 3.8 | 0.5 | 04 | 1900 | 3.8 | 2 | null |
Most columns in the output table can be taken directly or derived from columns in the two input tables, but some columns cannot be derived and are left blank.
My main question is how to write a function to take two tables as inputs and output another table?
I already wrote the feature transformation part for data files in 2018, but I need to do the same thing again for data files in some other years. That's why I want to create a function to make things easier.
As you want to insert the latest entry of table2 against each entry of table1, try this:
insert into table3 (id, loan_age, ori_upb, ori_rate, ltv,
payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id)
t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv, t2.month, t2.cur_upb,
t2.cur_rate, t2.status
from
table1 t1
inner join
table2 t2 on t1.id=t2.id
order by t1.id , t2.month desc
DEMO1
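As a sketch of an alternative (assuming PostgreSQL 9.3 or later for LATERAL), the same "latest row per loan" lookup can be written with a LATERAL subquery instead of DISTINCT ON:

```sql
-- Alternative to DISTINCT ON: pick the latest table2 row per loan
-- with a LATERAL subquery (table/column names as in the question).
select t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv,
       t2.month, t2.cur_upb, t2.cur_rate, t2.status
from table1 t1
cross join lateral (
    select *
    from table2
    where table2.id = t1.id
    order by table2.month desc
    limit 1
) t2;
```

With an index on table2 (id, month) this form can stop after one row per loan; which variant is faster depends on the data distribution.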
EDIT for your updated question:
A function to do the above, assuming the table1, table2, and table3 structures will always be identical:
create or replace function insert_values(table1 varchar, table2 varchar, table3 varchar)
returns int as $$
declare
count_ int;
begin
execute format('insert into %I (id, loan_age, ori_upb, ori_rate, ltv, payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id) t1.id, t1.loan_age, t1.ori_upb,
t1.ori_rate,t1.ltv,t2.month,t2.cur_upb, t2.cur_rate, t2.status
from %I t1 inner join %I t2 on t1.id=t2.id order by t1.id , t2.month desc',table3,table1,table2);
GET DIAGNOSTICS count_ = ROW_COUNT;
return count_;
end;
$$
language plpgsql;
and call above function like below which will return the number of inserted rows:
select * from insert_values('table1','table2','table3');
DEMO2

Find rows in relation with at least n rows in a different table without joins

I have a table as such (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
| 3 | faso | 11 |
+----+------+-----+
And another table in n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
| 7 | 2 |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select
t1.pk,
t1.attr,
t1.val
from
tbl t1
join
tbl2 t2 on t1.pk = t2.rel
group by
t1.pk,
t1.attr,
t1.val
having(count(1)>=2) order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Or just join once and use CTE(with clause), as below:
with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?
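Since the question literally asks for a version without joins, a hedged alternative is a correlated count subquery. It avoids the join and the re-join, though on large tables it will likely be slower than the grouped-join variants because the count runs per row of tbl:

```sql
-- "At least n related rows" without any join: correlated subquery,
-- n = 2 as in the example.
select *
from tbl
where (select count(*) from tbl2 where tbl2.rel = tbl.pk) >= 2
order by tbl.pk;
```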

How to change the query to remain only leaf nodes

I have table with the following data:
id | parent_id | short_name
----+-----------+----------------
6 | 5 | cpu
7 | 5 | ram
14 | 9 | tier-a
15 | 9 | rfc1918
16 | 9 | tolerant
17 | 9 | nononymous
13 | 12 | cloudstack
5 | 13 | virtualmachine
8 | 13 | volume
9 | 13 | ipv4
3 | | domain
4 | | account
12 | | vdc
(13 rows)
with recursive query it looks like this:
with recursive tree ( id, parent_id, short_name, deep_name ) as (
select resource_type_id, parent_resource_type_id, short_name, short_name::text
from resource_type
where parent_resource_type_id is null
union all
select rt.resource_type_id as id, rt.parent_resource_type_id, rt.short_name,
tree.deep_name || '.' || rt.short_name
from tree, resource_type rt
where tree.id = rt.parent_resource_type_id
)
select * from tree;
id | parent_id | short_name | deep_name
----+-----------+----------------+-----------------------------------
4 | | account | account
3 | | domain | domain
12 | | vdc | vdc
13 | 12 | cloudstack | vdc.cloudstack
9 | 13 | ipv4 | vdc.cloudstack.ipv4
5 | 13 | virtualmachine | vdc.cloudstack.virtualmachine
8 | 13 | volume | vdc.cloudstack.volume
6 | 5 | cpu | vdc.cloudstack.virtualmachine.cpu
15 | 9 | rfc1918 | vdc.cloudstack.ipv4.rfc1918
17 | 9 | nononymous | vdc.cloudstack.ipv4.nononymous
16 | 9 | tolerant | vdc.cloudstack.ipv4.tolerant
14 | 9 | tier-a | vdc.cloudstack.ipv4.tier-a
7 | 5 | ram | vdc.cloudstack.virtualmachine.ram
(13 rows)
How can I fix the query so that the result contains only leaves, e.g. the vdc.cloudstack.volume row, and no vdc or vdc.cloudstack rows?
UPD
rows with no children
Exclude the rows where deep_name has a superstring somewhere else in the table:
WITH RECURSIVE tree AS (...)
SELECT * FROM tree AS t1
WHERE NOT EXISTS (
SELECT 1 FROM tree AS t2
WHERE t2.deep_name
LIKE t1.deep_name || '.%'
);
Laurenz Albe's answer gave me an idea. I think it would be more efficient to count children than to work with strings.
My solution is:
WITH RECURSIVE tree AS (...)
SELECT * FROM tree t1
WHERE not EXISTS ( SELECT 1 FROM tree t2 WHERE t1.id = t2.parent_id );
A leaf node is a child which is not itself a parent.
If all you want is a list of leaf nodes, you don't need the recursive CTE; you just need an anti-join in your preferred format.
If (as I imagine you do) you need the deep_name, I would anti-join the result of the recursive CTE to the raw source table on id = parent_id.
WITH RECURSIVE tree AS (...)
SELECT * FROM tree AS t1
WHERE NOT EXISTS (SELECT 1 FROM resource_type AS t2
WHERE t2.parent_resource_type_id = t1.id);

Multi-table recursive sql statement

I have been struggling to optimize a recursive call done purely in Ruby. I have moved the data into a PostgreSQL database, and I would like to make use of the WITH RECURSIVE feature that PostgreSQL offers.
The examples that I could find all seem to use a single table, such as a menu or categories table.
My situation is slightly different. I have a questions and an answers table.
+----------------------+ +------------------+
| questions | | answers |
+----------------------+ +------------------+
| id | | source_id | <- from question ID
| start_node (boolean) | | target_id | <- to question ID
| end_node (boolean) | +------------------+
+----------------------+
I would like to fetch all questions that are connected together by the related answers.
I would also like to be able to go the other way in the tree, e.g. from any given node to the root node of the tree.
To give another example of a question-answer tree in a graphical way:
Q1
|-- A1
| '-- Q2
| |-- A2
| | '-- Q3
| '-- A3
| '-- Q4
'-- A4
'-- Q5
As you can see, a question can have multiple outgoing answers, but it can also have multiple incoming answers -- many-to-many.
I hope that someone has a good idea, or can point me to some examples, articles or guides.
Thanks in advance, everybody.
Regards,
Emil
This is far, far from ideal, but I would play around with a recursive query over joins, like this:
WITH RECURSIVE questions_with_answers AS (
SELECT
q.*, a.*
FROM
questions q
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
UNION ALL
SELECT
q.*, a.*
FROM
questions_with_answers qa
JOIN
questions q ON (qa.target_id = q.id)
LEFT OUTER JOIN
answers a ON (q.id = a.source_id)
)
SELECT * FROM questions_with_answers WHERE source_id IS NOT NULL AND target_id IS NOT NULL;
Which gives me result:
id | name | start_node | end_node | source_id | target_id
----+------+------------+----------+-----------+-----------
1 | Q1 | | | 1 | 2
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 4
3 | Q2 | | | 3 | 6
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
1 | Q1 | | | 1 | 8
8 | A4 | | | 8 | 9
2 | A1 | | | 2 | 3
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
8 | A4 | | | 8 | 9
3 | Q2 | | | 3 | 6
3 | Q2 | | | 3 | 4
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
6 | A3 | | | 6 | 7
4 | A2 | | | 4 | 5
(20 rows)
In fact you do not need two tables.
I would like to encourage you to analyse this example.
Maintaining one table instead of two will save you a lot of trouble, especially when it comes to recursive queries.
This minimal structure contains all the necessary information:
create table the_table (id int primary key, parent_id int);
insert into the_table values
(1, 0), -- root question
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 1),
(7, 3),
(8, 0), -- root question
(9, 8);
Whether the node is a question or an answer depends on its position in the tree. Of course, you can add a column with information about the type of node to the table.
Use this query to get the answer for both your requests (uncomment the adequate where condition):
with recursive cte(id, parent_id, depth, type, root) as (
select id, parent_id, 1, 'Q', id
from the_table
where parent_id = 0
-- and id = 1 <-- looking for list of a&q for root question #1
union all
select
t.id, t.parent_id, depth+ 1,
case when (depth & 1)::boolean then 'A' else 'Q' end, c.root
from cte c
join the_table t on t.parent_id = c.id
)
select *
from cte
-- where id = 9 <-- looking for root question for answer #9
order by id;
id | parent_id | depth | type | root
----+-----------+-------+------+------
1 | 0 | 1 | Q | 1
2 | 1 | 2 | A | 1
3 | 1 | 2 | A | 1
4 | 2 | 3 | Q | 1
5 | 2 | 3 | Q | 1
6 | 1 | 2 | A | 1
7 | 3 | 3 | Q | 1
8 | 0 | 1 | Q | 8
9 | 8 | 2 | A | 8
(9 rows)
The child-parent relationship is unambiguous and applies in both directions. There is no need to store this information twice. In other words, if we store information about parents, then information about children is redundant (and vice versa). This is one of the fundamental properties of the data structure called a tree. See the examples:
-- find parent of node #6
select parent_id
from the_table
where id = 6;
-- find children of node #6
select id
from the_table
where parent_id = 6;

PostgreSQL Query?

DB
| ID| VALUE | Parent | Position | lft | rgt |
|---|:------:|:-------:|:--------------:|:--------:|:--------:|
| 1 | A | | | 1 | 12 |
| 2 | B | 1 | L | 2 | 9 |
| 3 | C | 1 | R | 10 | 11 |
| 4 | D | 2 | L | 3 | 6 |
| 5 | F | 2 | R | 7 | 8 |
| 6 | G | 4 | L | 4 | 5 |
Get All Nodes directly under current Node in left side
SELECT "categories".* FROM "categories" WHERE ("categories"."position" = 'L') AND ("categories"."lft" >= 1 AND "categories"."lft" < 12) ORDER BY "categories"."lft"
output { B, D, G } is incorrect!
Question!
How can I get the nodes directly under the current node on the left and the right side?
output-lft {B,D,F,G}
output-rgt {C}
It sounds like you're after something analogous to Oracle's CONNECT BY clause, which is used to connect hierarchical data stored in a flat table.
It just so happens there's a way to do this with Postgres, using a recursive CTE.
Here is the statement I came up with:
WITH RECURSIVE sub_categories AS
(
-- non-recursive term
SELECT * FROM categories WHERE position IS NOT NULL
UNION ALL
-- recursive term
SELECT c.*
FROM
categories AS c
JOIN
sub_categories AS sc
ON (c.parent = sc.id)
)
SELECT DISTINCT categories.value
FROM categories,
sub_categories
WHERE ( categories.parent = sub_categories.id
AND sub_categories.position = 'L' )
OR ( categories.parent = 1
AND categories.position = 'L' )
Here is a SQL Fiddle with a working example.