How to work around the "Recursive CTE member can refer itself only in FROM clause" requirement? - firebird

I'm trying to run a graph search to find all nodes accessible from a starting point, like so:
with recursive
nodes_traversed as (
select START_NODE ID
from START_POSITION
union all
select ed.DST_NODE
from EDGES ed
join nodes_traversed NT
on (NT.ID = ed.START_NODE)
and (ed.DST_NODE not in (select ID from nodes_traversed))
)
select distinct * from nodes_traversed
Unfortunately, when I try to run that, I get an error:
Recursive CTE member (nodes_traversed) can refer itself only in FROM clause.
That "not in select" clause is important to the recursive expression, though, as it provides the ending point. (Without it, you get infinite recursion.) Using generation counting, like in the accepted answer to this question, would not help, since this is a highly cyclic graph.
Is there any way to work around this without having to create a stored proc that does it iteratively?

Here is my solution, which uses a global temporary table. I have limited the recursion by level and by the nodes already stored in the temporary table.
I am not sure how well it will perform on a large data set.
create procedure get_nodes (
START_NODE integer)
returns (
NODE_ID integer)
as
declare variable C1 integer;
declare variable C2 integer;
begin
/**
create global temporary table id_list(
id integer
);
create index id_list_idx1 ON id_list (id);
*/
delete from id_list;
while ( 1 = 1 ) do
begin
select count(distinct id) from id_list into :c1;
insert into id_list
select id from
(
with recursive nodes_traversed as (
select :START_NODE AS ID , 0 as Lv
from RDB$DATABASE
union all
select ed.DST_NODE , Lv+1
from edges ed
join nodes_traversed NT
on
(NT.ID = ed.START_NODE)
and nt.Lv < 5 -- Max recursion level
and nt.id not in (select id from id_list)
)
select distinct id from nodes_traversed);
select count(distinct id) from id_list into :c2;
if (c1 = c2) then break;
end
for select distinct id from id_list into :node_id do
begin
suspend ;
end
end
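The idea underneath this does not depend on Firebird: keep a set of visited nodes, expand only the newly found ones, and stop when a pass adds nothing. Here is a minimal Python sketch of that idea, assuming the EDGES table is available as a list of (START_NODE, DST_NODE) pairs. The function name and the sample graph are made up for illustration.

```python
from collections import defaultdict


def reachable_nodes(edges, start_node):
    """Return every node reachable from start_node, including start_node.

    edges is an iterable of (start_node, dst_node) pairs, mirroring the
    EDGES table in the question. The visited set plays the role of the
    id_list temporary table.
    """
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    visited = {start_node}
    frontier = [start_node]
    while frontier:  # stops when a pass discovers no new node
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return visited


# Hypothetical, highly cyclic sample graph (not from the question).
sample_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 2), (5, 1)]
print(sorted(reachable_nodes(sample_edges, 1)))
```

Each node is expanded at most once, so the loop terminates on cyclic graphs without needing a level cap.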

Related

PgSQL function returning table and extra data computed in process

In PostgreSQL I run a huge select, and then I want to count its size and apply some extra filters.
Executing it twice seems wasteful,
so I wrapped it in a function,
"cached" its result in a CTE, and return the union of the filtered rows plus an extra row at the end whose "id" column stores the size:
with q as (select * from myFunc())
select * from q
where q.distance < 400
union all
select count(*) as id, null,null,null
from q
But this doesn't look like a proper solution either.
So the question is: does PostgreSQL have something like a "generator function", or anything else that can solve this properly?
PostgreSQL 13.
myFunc, aka "selectItemsByRootTag":
CREATE OR REPLACE FUNCTION selectItemsByRootTag(
in tag_name VARCHAR(50)
)
RETURNS table(
id BIGINT,
name VARCHAR(50),
description TEXT,
/*info JSON,*/
distance INTEGER
)
AS $$
BEGIN
RETURN QUERY(
WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name = tag_name
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
/*points.info,*/
points.distance
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
);
END;
$$ LANGUAGE plpgsql;
As a result I would like to see something like a set of two tables, or a generator function that yields some intermediate result; I'm not sure exactly how it should look.
demo
CREATE OR REPLACE FUNCTION pg_temp.selectitemsbyroottag(tag_name text, _distance numeric)
RETURNS TABLE(id bigint, name text, description text, distance numeric, count bigint)
LANGUAGE plpgsql
AS $function$
DECLARE _sql text;
BEGIN
-- tag_name and _distance are passed as $1 and $2 through USING,
-- so they are never spliced into the SQL text.
_sql := $p1$WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name ilike '%' || $1 || '%'
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
points.distance,
count(*) over ()
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
and points.distance > $2
$p1$;
raise notice '_sql: %', _sql;
return query execute _sql using tag_name, _distance;
END;
$function$
You can call it in either of the following ways:
select * from pg_temp.selectItemsByRootTag('test',20);
select * from pg_temp.selectItemsByRootTag('test_8',20) with ORDINALITY;
With the first call, every returned row carries the total number of rows in its count column. The second call returns the same columns plus a sequential row number.
I also turned the where q.distance < 400 filter into a function input argument.
selectItemsByRootTag('test',20); means q.distance > 20 and tags.name ilike '%test%'.
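To see what count(*) over () does in isolation, here is a small sketch using Python's built-in sqlite3 module rather than PostgreSQL. It assumes the bundled SQLite is 3.25 or newer, since window functions were added in that release; the points table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT, distance INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO points (id, name, distance) VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 25), (3, "c", 300), (4, "d", 450)],
)

# count(*) OVER () is evaluated over the rows that survive the WHERE clause,
# so each returned row also carries the size of the filtered result.
# Assumption: SQLite >= 3.25 (window function support).
rows = conn.execute(
    """
    SELECT id, name, distance, count(*) OVER () AS total
    FROM points
    WHERE distance > ?
    ORDER BY id
    """,
    (20,),
).fetchall()

for row in rows:
    print(row)
```

Note that the count reflects the filtered rows, not the size of the unfiltered set, so it answers a slightly different question than the union all approach in the question.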

How to make my query for one id work for many ids

Say I have a table:
CREATE TABLE nodes (
id SERIAL PRIMARY KEY,
parent_id INTEGER REFERENCES nodes(id),
trashed_at timestamptz
)
I have this query nodes_trash_node(node_id INTEGER):
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = node_id
OR nodes.id IN (SELECT id FROM nodes_descendants(node_id))
RETURNING *
The nodes_descendants function operates on an adjacency list structure and looks like this:
CREATE OR REPLACE FUNCTION nodes_descendants(node_id INTEGER, depth INTEGER)
RETURNS TABLE (id INTEGER) AS $$
WITH RECURSIVE tree AS (
SELECT id, array[node_id]::integer[] as ancestors
FROM nodes
WHERE parent_id = node_id
UNION ALL
SELECT nodes.id, tree.ancestors || nodes.parent_id
FROM nodes, tree
WHERE nodes.parent_id = tree.id
AND (depth = 0 OR cardinality(tree.ancestors) < depth)
)
SELECT id FROM tree;
$$ LANGUAGE sql;
(taken from here).
However I'd now like to convert my query to take a list of node_ids, but I'm struggling to find the correct syntax. Something like:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
OR nodes.id IN (???)
RETURNING *
EDIT
Just to clarify, I'd like to now select many 'root' node_ids and all their descendants. For the example use case: select many files and folders and move to the trash at the same time.
Thanks.
It is actually straightforward if you do not use a function.
BTW, I have changed it to a proper INNER JOIN.
Please do not use Cartesian products (i.e. cross joins) followed by WHERE, as you will mistakenly leave out the WHERE someday.
WITH RECURSIVE tree AS (
SELECT id
FROM nodes
WHERE <Type your condition here>
UNION ALL
SELECT nodes.id
FROM nodes
JOIN tree ON nodes.parent_id = tree.id
)
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id IN (SELECT id from Tree)
RETURNING *
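For a quick way to try the multi-root version, here is a sketch against SQLite through Python's sqlite3 module. It is not PostgreSQL: the anchor uses an IN list with bound placeholders where PostgreSQL could use id = ANY(node_ids), RETURNING is left out, and trashed_at is stored as text. The trash_nodes helper and the sample tree are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES nodes(id),"
    " trashed_at TEXT)"
)
# Hypothetical tree: 1 -> 2 -> 3, 4 -> 5, and 6 on its own.
conn.executemany(
    "INSERT INTO nodes (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, None), (5, 4), (6, None)],
)


def trash_nodes(conn, root_ids):
    """Set trashed_at on every root in root_ids and on all their descendants."""
    if not root_ids:
        return
    # One bound placeholder per root id; only "?" marks are interpolated.
    placeholders = ", ".join("?" for _ in root_ids)
    conn.execute(
        f"""
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM nodes WHERE id IN ({placeholders})
            UNION ALL
            SELECT nodes.id FROM nodes JOIN tree ON nodes.parent_id = tree.id
        )
        UPDATE nodes SET trashed_at = datetime('now')
        WHERE id IN (SELECT id FROM tree)
        """,
        list(root_ids),
    )


trash_nodes(conn, [2, 4])
trashed = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM nodes WHERE trashed_at IS NOT NULL ORDER BY id"
    )
]
print(trashed)
```

The anchor member selects all the roots at once, so a single recursive pass covers every subtree.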
It's a little hard to know if this is right or not without the larger context of where these updates are running. Presumably it's within a procedure/function or through an application?
Either way, I think your final syntax was fine -- it's just you need to ensure the datatype you pass is an array:
update nodes
set trashed_at = now()
where id in (1, 2, 3);
Is essentially the same functionally as:
update nodes
set trashed_at = now()
where id = any(array[1, 2, 3]);
So, back to your original statement:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
OR nodes.id IN (???)
RETURNING *
If node_ids already contains every id you want to trash, descendants included, I think you can simplify this to:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
RETURNING *
Just be sure node_ids is an integer array (integer[] matches the SERIAL id column; bigint[] also works).
So, assuming this was within a procedure, these are some examples:
DECLARE
node_ids bigint[];
BEGIN
node_ids := array[1, 2, 3, 4];
-- or perhaps
select array_agg (bar)
into node_ids
from foo
where baz = x;
UPDATE nodes
SET trashed_at = now()
WHERE nodes.id = ANY(node_ids)
RETURNING *;
END;
IMO it's always been a struggle to pass an in-list as a parameter, but with PostgreSQL arrays it's not only possible but quite straightforward.

Get all instances of primary keys of a table

This is a simple example of what I need: for any given table, I need to get all the instances of the primary keys. The example is small, but I need a generic way to do it.
create table foo
(
a numeric
,b text
,c numeric
,constraint pk_foo primary key (a,b)
);
insert into foo(a,b,c) values (1,'a',1),(2,'b',2),(3,'c',3);
select <the magical thing>
result
  | a | b
1 | 1 | a
2 | 2 | b
3 | 3 | c
.. ...
I need to check whether the primary key values are changed by the user, but I don't want to repeat code across many tables. I need a generic way to do it: I will put <the magical thing>
in a function and call it from a before-update trigger, and so on.
In PostgreSQL you must always provide a resulting type for a query. However, you can obtain the code of the query you need, and then execute the query from the client:
create or replace function get_key_only_sql(regclass) returns text as $$
select 'select '|| (
select string_agg(quote_ident(att.attname), ', ' order by col)
from pg_index i
join lateral unnest(indkey) col on (true)
join pg_attribute att on (att.attrelid = i.indrelid and att.attnum = col)
where i.indrelid = $1 and i.indisprimary
group by i.indexrelid
limit 1) || ' from '||$1::text;
$$ language sql;
Here's some client pseudocode using the function above:
sql = pgexecscalar("select get_key_only_sql('mytable'::regclass)");
rs = pgopen(sql);
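The same two-step pattern (ask the catalog for the key columns, build the query text, then run it) can be tried out with Python's sqlite3 module. This is only an analogue: it uses SQLite's pragma_table_info instead of the PostgreSQL catalogs, and key_only_sql is a hypothetical helper written for this sketch.

```python
import sqlite3


def key_only_sql(conn, table):
    """Build a SELECT that returns only the primary-key columns of table."""
    # pragma_table_info reports pk > 0 for key columns, numbered in key order.
    cols = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM pragma_table_info(?) WHERE pk > 0 ORDER BY pk",
            (table,),
        )
    ]
    if not cols:
        raise ValueError(f"{table} has no primary key")

    def quote(identifier):
        # Double any embedded quote, then wrap the identifier in double quotes.
        return '"' + identifier.replace('"', '""') + '"'

    return f"SELECT {', '.join(quote(c) for c in cols)} FROM {quote(table)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a NUMERIC, b TEXT, c NUMERIC, PRIMARY KEY (a, b))")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)", [(1, "a", 1), (2, "b", 2), (3, "c", 3)]
)

sql = key_only_sql(conn, "foo")
print(sql)
print(conn.execute(sql).fetchall())
```

As in the answer above, the generated statement is executed by the client as a second query.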

TSQL CTE: How to avoid circular traversal?

I have written a very simple CTE expression that retrieves a list of all groups of which a user is a member.
The rules go like this: a user can be in multiple groups, and groups can be nested so that a group can be a member of another group. Furthermore, groups can be mutual members of each other, so Group A is a member of Group B and Group B is also a member of Group A.
My CTE goes like this and obviously it yields infinite recursion:
;WITH GetMembershipInfo(entityId) AS( -- entity can be a user or group
SELECT k.ID as entityId FROM entities k WHERE k.id = @userId
UNION ALL
SELECT k.id FROM entities k
JOIN Xrelationships kc on kc.entityId = k.entityId
JOIN GetMembershipInfo m on m.entityId = kc.ChildID
)
I can't find an easy solution to back-track those groups that I have already recorded.
I was thinking of using an additional varchar parameter in the CTE to record a list of all groups that I have visited, but using varchar is just too crude, isn't it?
Is there a better way?
You need to accumulate a sentinel string within your recursion. In the following example I have a circular relationship from A,B,C,D, and then back to A, and I avoid a loop with the sentinel string:
DECLARE @MyTable TABLE(Parent CHAR(1), Child CHAR(1));
INSERT @MyTable VALUES('A', 'B');
INSERT @MyTable VALUES('B', 'C');
INSERT @MyTable VALUES('C', 'D');
INSERT @MyTable VALUES('D', 'A');
; WITH CTE (Parent, Child, Sentinel) AS (
SELECT Parent, Child, Sentinel = CAST(Parent AS VARCHAR(MAX))
FROM @MyTable
WHERE Parent = 'A'
UNION ALL
SELECT CTE.Child, t.Child, Sentinel + '|' + CTE.Child
FROM CTE
JOIN @MyTable t ON t.Parent = CTE.Child
WHERE CHARINDEX(CTE.Child,Sentinel)=0
)
SELECT * FROM CTE;
Result:
Parent Child Sentinel
------ ----- --------
A B A
B C A|B
C D A|B|C
D A A|B|C|D
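To make the recursion easier to follow, here is a rough Python re-creation of the same walk. It is a sketch, not a translation of how SQL Server evaluates the CTE: the sentinel is kept as a tuple of visited parents (so keys are compared whole rather than by substring) and only joined with '|' for display. The walk function is a made-up name.

```python
def walk(edges, start):
    """Mimic the recursive CTE: yield (parent, child, sentinel) rows.

    edges is a list of (parent, child) pairs. The sentinel is a tuple of the
    parents visited so far on this path.
    """
    # Anchor member: rows whose parent is the start node.
    level = [(p, c, (p,)) for p, c in edges if p == start]
    while level:
        next_level = []
        for parent, child, sentinel in level:
            yield parent, child, "|".join(sentinel)
            # Recursive member: follow child only if it is not in the sentinel.
            if child not in sentinel:
                next_level.extend(
                    (p, c, sentinel + (child,)) for p, c in edges if p == child
                )
        level = next_level


my_table = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
rows = list(walk(my_table, "A"))
for row in rows:
    print(*row)
```

On this data it produces the same four rows as the result table above and stops once the path returns to A.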
Instead of a sentinel string, use a sentinel table variable. The function below catches a circular reference no matter how many hops the circle spans, has no issues with the maximum length of nvarchar(max), is easily modified for different data types or even multipart keys, and can be used in a check constraint.
CREATE FUNCTION [dbo].[AccountsCircular] (@AccountID UNIQUEIDENTIFIER)
RETURNS BIT
AS
BEGIN
DECLARE @NextAccountID UNIQUEIDENTIFIER = NULL;
DECLARE @Sentinel TABLE
(
ID UNIQUEIDENTIFIER
)
INSERT INTO @Sentinel
( [ID] )
VALUES ( @AccountID )
SET @NextAccountID = @AccountID;
WHILE @NextAccountID IS NOT NULL
BEGIN
SELECT @NextAccountID = [ParentAccountID]
FROM [dbo].[Accounts]
WHERE [AccountID] = @NextAccountID;
IF EXISTS(SELECT 1 FROM @Sentinel WHERE ID = @NextAccountID)
RETURN 1;
INSERT INTO @Sentinel
( [ID] )
VALUES ( @NextAccountID )
END
RETURN 0;
END
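The logic of that loop, stripped of T-SQL, is a walk up the parent chain with a set of already-seen ids. Here is a minimal Python sketch, assuming the Accounts table is represented as a dict from account id to parent id; the function name and the sample data are invented.

```python
def is_circular(parent_of, account_id):
    """Return True if following parent links from account_id revisits an account.

    parent_of maps an account id to its parent id (None or missing = no parent).
    The seen set plays the role of the sentinel table variable.
    """
    seen = {account_id}
    current = parent_of.get(account_id)
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


# Hypothetical data: b -> a -> root is a clean chain, x <-> y is a cycle.
accounts = {"root": None, "a": "root", "b": "a", "x": "y", "y": "x"}
print(is_circular(accounts, "b"))  # False
print(is_circular(accounts, "x"))  # True
```

Unlike the T-SQL version, an id that is missing from the mapping simply ends the walk here.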

tsql - using internal stored procedure as parameter in where clause

I'm trying to build a stored procedure that makes use of another stored procedure, taking its result and using it as part of its where clause. For some reason I receive an error:
Invalid object name 'dbo.GetSuitableCategories'.
Here is a copy of the code:
select distinct top 6 * from
(
SELECT TOP 100 *
FROM [dbo].[products] products
where products.categoryId in
(select top 10 categories.categoryid from
[dbo].[GetSuitableCategories]
(
-- @Age
-- ,@Sex
-- ,@Event
1,
1,
1
) categories
ORDER BY NEWID()
)
--and products.Price <=@priceRange
ORDER BY NEWID()
)as d
union
select * from
(
select TOP 1 * FROM [dbo].[products] competingproducts
where competingproducts.categoryId =-2
--and competingproducts.Price <=@priceRange
ORDER BY NEWID()
) as d
and here is [dbo].[GetSuitableCategories] :
if (@gender =0)
begin
select * from categoryTable categories
where categories.gender =3
end
else
begin
select * from categoryTable categories
where categories.gender = @gender
or categories.gender =3
end
I would use an inline table-valued user-defined function. Or simply code it inline if no re-use is required.
CREATE FUNCTION dbo.GetSuitableCategories
(
--parameters
)
RETURNS TABLE
AS
RETURN (
select * from categoryTable categories
where categories.gender IN (3, @gender)
)
Some points though:
I assume categoryTable has no gender = 0
Do you have 3 genders in your categoryTable? :-)
Why do you pass in 3 parameters but only use 1? See below please
Does @sex map to @gender?
If you have extra processing on the 3 parameters, then you'll need a multi-statement table-valued function, but beware that these can be slow
You can't use the results of a stored procedure directly in a select statement.
You'll either have to output the results into a temp table, or turn the sproc into a table-valued function, to do what you're doing.
I think this is valid, but I'm doing this from memory:
create table #tmp (blah, blah)
Insert into #tmp
exec dbo.sprocName