A few days ago I asked a question about deleting using WITH RECURSIVE from PostgreSQL. There is:
DELETE recursive PostgreSQL
That works fine: the intention, initially, was to delete parent folders recursively as long as the final child was deleted. The following image describes it better:
Files tree view
By deleting the file 5.jpg, all parent folders, in this situation, would be deleted as well.
But now I have to delete the parent folders only if they get empty, i.e. by losing its only child. I tried the following:
WITH RECURSIVE all_uploads (codigo, parent, ext, uploader) AS (
SELECT ut1.codigo, ut1.codigo_upload_temp_pai AS parent, ut1.codigo_extensao AS ext, ut1.codigo_usuario_inclusao AS uploader
FROM upload_temp ut1
WHERE ut1.codigo = 576
UNION ALL
SELECT ut2.codigo, ut2.codigo_upload_temp_pai AS parent, ut2.codigo_extensao AS ext, ut2.codigo_upload_temp_pai AS uploader
FROM upload_temp ut2
JOIN all_uploads au ON au.parent = ut2.codigo
WHERE (SELECT ut3.codigo FROM upload_temp ut3 WHERE ut3.codigo_upload_temp_pai = ut2.codigo LIMIT 1) IS NULL
AND ext IS NULL
AND uploader = 1535
)
DELETE FROM upload_temp WHERE codigo IN (SELECT codigo FROM all_uploads);
I thought the only way to check if a folder is empty is to perform a sub-select considering the self relationship. If SELECT ut3.codigo FROM upload_temp ut3 WHERE ut3.codigo_upload_temp_pai = ut2.codigo LIMIT 1) IS NULL returns true, so the folder is empty. And by using the self referencing feature (same DB table for folders and files), I know it's a folder by checking codigo_extensao field (only files have extensions).
Well, it's not working, it removes only my 5.jpg. Any hint? Thanks in advance!
You can't DELETE recursively like you want. The logic here is to create a query that deletes everything you want, and run it recursively until there is anything more to delete.
Here is a function that does exactly what you need:
CREATE OR REPLACE FUNCTION p_remove_empty_folders(_codigo_usuario_ integer) RETURNS integer AS $$
DECLARE
AFFECTEDROWS integer;
BEGIN
WITH a AS (
DELETE FROM upload_temp WHERE codigo IN (SELECT ut1.codigo FROM upload_temp ut1 WHERE ut1.codigo_usuario_inclusao = _codigo_usuario_ AND ut1.codigo_extensao IS NULL AND NOT EXISTS (SELECT * FROM upload_temp ut2 WHERE ut2.codigo_upload_temp_pai = ut1.codigo))
RETURNING 1
)
SELECT count(*) INTO AFFECTEDROWS FROM a;
WHILE AFFECTEDROWS > 0 LOOP
WITH a AS (
DELETE FROM upload_temp WHERE codigo IN (SELECT ut1.codigo FROM upload_temp ut1 WHERE ut1.codigo_usuario_inclusao = _codigo_usuario_ AND ut1.codigo_extensao IS NULL AND NOT EXISTS (SELECT * FROM upload_temp ut2 WHERE ut2.codigo_upload_temp_pai = ut1.codigo))
RETURNING 1
)
SELECT count(*) INTO AFFECTEDROWS FROM a;
END LOOP;
RETURN 0;
END;
$$ LANGUAGE plpgsql;
Cya!
Related
In PgSQL I make huge select, and then I want count it's size and apply some extra filters.
execute it twice sound dumm,
so I wrapped it in function
and then "cache" it and return union of filtered table and extra row at the end where in "id" column store size
with q as (select * from myFunc())
select * from q
where q.distance < 400
union all
select count(*) as id, null,null,null
from q
but it also doesn't look like proper solution...
and so the question: is in pg something like "generator function" or any other stuff that can properly solve this ?
postgreSQL 13
myFunc aka "selectItemsByRootTag"
CREATE OR REPLACE FUNCTION selectItemsByRootTag(
in tag_name VARCHAR(50)
)
RETURNS table(
id BIGINT,
name VARCHAR(50),
description TEXT,
/*info JSON,*/
distance INTEGER
)
AS $$
BEGIN
RETURN QUERY(
WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name = (tags_name)
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
/*points.info,*/
points.distance
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
);
END;
$$ LANGUAGE plpgsql;
as a result i want see maybe set of 2 table or generator function that yield some intermediate result not shure how exacltly it should look
demo
CREATE OR REPLACE FUNCTION pg_temp.selectitemsbyroottag(tag_name text, _distance numeric)
RETURNS TABLE(id bigint, name text, description text, distance numeric, count bigint)
LANGUAGE plpgsql
AS $function$
DECLARE _sql text;
BEGIN
_sql := $p1$WITH RECURSIVE prod AS (
SELECT
tags.name, tags.id, tags.parent_tags
FROM
tags
WHERE tags.name ilike '%$p1$ || tag_name || $p2$%'
UNION
SELECT c.name, c.id , c.parent_tags
FROM
tags as c
INNER JOIN prod as p
ON c.parent_tags = p.id
)
SELECT
points.id,
points.name,
points.description,
points.distance,
count(*) over ()
from points
left join tags on points.tag_id = tags.id
where tags.name in (select prod.name from prod)
and points.distance > $p2$ || _distance
;
raise notice '_sql: %', _sql;
return query execute _sql;
END;
$function$
You can call it throug following way
select * from pg_temp.selectItemsByRootTag('test',20);
select * from pg_temp.selectItemsByRootTag('test_8',20) with ORDINALITY;
The 1 way to call the function, will have a row of total count total number of rows. Second way call have number of rows plus a serial incremental number.
I also make where q.distance < 400 into function input argument.
selectItemsByRootTag('test',20); means that q.distance > 20 and tags.name ilike '%test%'.
The table has many columns, but for the problematic part let's assume only two, [ID] and [dependency].
[Dependency] means which [ID]s should be listed before this [ID].
Each row has its unique [ID] but it might have none, one or more dependencies in the [Dependency] column
ID
Dependency
1
4
2
null
3
1,2
4
2
5
1,2,4
Expected Result
ID
Dependency
2
null
4
2
1
4
3
1,2
5
1,2,4
I have no prior experience in Postgres, I found this very useful:
SELECT aa.dep_id::integer FROM unnest(string_to_array(ss.dependency, ',')) aa(dep_id)
But still, I can't make it right.
EDIT: Added one row with 3 dependencies'
http://sqlfiddle.com/#!15/894c3/4
Use a recursive CTE:
WITH RECURSIVE o AS (
SELECT ss.id, ss.dependency,
1 AS level
FROM ss
WHERE dependency IS NULL
UNION ALL
SELECT ss.id, ss.dependency,
o.level + 1
FROM ss
JOIN o
ON o.id IN (SELECT x
FROM unnest(ss.dependency) AS x(x)
)
)
SELECT o.id, o.dependency
FROM o
GROUP BY o.id, o.dependency
ORDER BY max(o.level);
My working solution, I hope it can be improved
DO $do$
DECLARE v_RowCountInt Int;
--copy all values to temp table
begin
--This is the origin table, all rows to sort
drop table if exists _tempAll;
create temp table _tempAll as select id, dependency from XXXXX;
-- create temp table for results
drop table if exists _tempSort;
create temp table _tempSort (
id integer
,dependency varchar
,pos serial primary key);
-- move all IDs with no depencencies
WITH tmp AS (DELETE FROM _tempAll where dependency is null RETURNING id, dependency)
INSERT INTO _tempSort ( id, dependency)
SELECT id, dependency FROM tmp;
GET DIAGNOSTICS v_RowCountInt = ROW_COUNT;
while v_RowCountInt>0 loop
-- move rows with all dependencies met
WITH tmp AS (DELETE FROM _tempAll a
where a.id in(SELECT a.id FROM _tempSort s inner join _tempAll a on
s.id in (select split.dep_sid::integer from unnest(string_to_array(a.dependency, ',')) split(dep_sid))
group by a.id, a.dependency
-- verify all dependencies are already sorted
having count(*)=(CHAR_LENGTH(a.dependency) - CHAR_LENGTH(REPLACE(a.dependency, ',', ''))+1))
RETURNING a.id, a.dependency)
INSERT INTO _tempSort (id, dependency)
SELECT id, dependency FROM tmp;
GET DIAGNOSTICS v_RowCountInt = ROW_COUNT;
end loop;
end;
$do$
Say I have a table:
CREATE TABLE nodes (
id SERIAL PRIMARY KEY,
parent_id INTEGER REFERENCES nodes(id),
trashed_at timestamptz
)
I have this query nodes_trash_node(node_id INTEGER):
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = node_id
OR nodes.id IN (SELECT id FROM nodes_descendants(node_id))
RETURNING *
The nodes_descendants function operates on an adjacency list structure and looks like this:
CREATE OR REPLACE FUNCTION nodes_descendants(node_id INTEGER, depth INTEGER)
RETURNS TABLE (id INTEGER) AS $$
WITH RECURSIVE tree AS (
SELECT id, array[node_id]::integer[] as ancestors
FROM nodes
WHERE parent_id = node_id
UNION ALL
SELECT nodes.id, tree.ancestors || nodes.parent_id
FROM nodes, tree
WHERE nodes.parent_id = tree.id
AND (depth = 0 OR cardinality(tree.ancestors) < depth)
)
SELECT id FROM tree;
$$ LANGUAGE sql;
(taken from here).
However I'd now like to convert my query to take a list of node_ids, but I'm struggling to find the correct syntax. Something like:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
OR nodes.id IN (???)
RETURNING *
EDIT
Just to clarify, I'd like to now select many 'root' node_ids and all their descendants. For the example use case: select many files and folders and move to the trash at the same time.
Thanks.
It is straight-forward if you do not use a function actually.
BTW, I have changed it to a proper INNER JOIN.
Please do not use tables products (i.e. cross joins) followed by WHERE as you will mistakenly skip it someday.
WITH RECURSIVE tree AS (
SELECT id
FROM nodes
WHERE <Type your condition here>
UNION ALL
SELECT nodes.id
FROM nodes
JOIN tree ON nodes.parent_id = tree.id
)
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id IN (SELECT id from Tree)
RETURNING *
It's a little hard to know if this is right or not without the larger context of where these updates are running. Presumably it's within a procedure/function or through an application?
Either way, I think your final syntax was fine -- it's just you need to ensure the datatype you pass is an array:
update nodes
set trashed_at = now()
where id in (1, 2, 3);
Is essentially the same functionally as:
update nodes
set trashed_at = now()
where id = any(array[1, 2, 3]);
So, back to your original statement:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
OR nodes.id IN (???)
RETURNING *
I think you can simplify this to:
UPDATE nodes SET
trashed_at = now()
WHERE nodes.id = ANY(node_ids)
RETURNING *
Just be sure node_ids is an array of 64-bit integers.
So, assuming this was within a procedure, these are some examples:
DECLARE
node_ids bigint[];
BEGIN
node_ids := array[1, 2, 3, 4];
-- or perhaps
select array_agg (bar)
into node_ids
from foo
where baz = x;
UPDATE nodes
SET trashed_at = now()
WHERE nodes.id = ANY(node_ids)
RETURNING *;
END;
IMO it's always been a struggle to pass an in-list as a parameters, but with PostgreSQL arrays it's not only possible but quite straight-forward.
I want to create a function like below which inserts data as per the input given. But I keep on getting an error about undetermined dollar sign.
CREATE OR REPLACE FUNCTION test_generate
(
ref REFCURSOR,
_id INTEGER
)
RETURNS refcursor AS $$
DECLARE
BEGIN
DROP TABLE IF EXISTS test_1;
CREATE TEMP TABLE test_1
(
id int,
request_id int,
code text
);
IF _id IS NULL THEN
INSERT INTO test_1
SELECT
rd.id,
r.id,
rd.code
FROM
test_2 r
INNER JOIN
raw_table rd
ON
rd.test_2_id = r.id
LEFT JOIN
observe_test o
ON
o.raw_table_id = rd.id
WHERE o.id IS NULL
AND COALESCE(rd.processed, 0) = 0;
ELSE
INSERT INTO test_1
SELECT
rd.id,
r.id,
rd.code
FROM
test_2 r
INNER JOIN
raw_table rd
ON rd.test_2_id = r.id
WHERE r.id = _id;
END IF;
DROP TABLE IF EXISTS tmp_test_2_error;
CREATE TEMP TABLE tmp_test_2_error
(
raw_table_id int,
test_2_id int,
error text,
record_num int
);
INSERT INTO tmp_test_2_error
(
raw_table_id,
test_2_id,
error,
record_num
)
SELECT DISTINCT
test_1.id,
test_1.test_2_id,
'Error found ' || test_1.code,
0
FROM
test_1
WHERE 1 = 1
AND data_origin.id IS NULL;
INSERT INTO tmp_test_2_error
SELECT DISTINCT
test_1.id,
test_1.test_2_id,
'Error found ' || test_1.code,
0
FROM
test_1
INNER JOIN
data_origin
ON
data_origin.code = test_1.code
WHERE dop.id IS NULL;
DROP table IF EXISTS test_latest;
CREATE TEMP TABLE test_latest AS SELECT * FROM observe_test WHERE 1 = 2;
INSERT INTO test_latest
(
raw_table_id,
series_id,
timestamp
)
SELECT
test_1.id,
ds.id AS series_id,
now()
FROM
test_1
INNER JOIN data_origin ON data_origin.code = test_1.code
LEFT JOIN
observe_test o ON o.raw_table_id = test_1.id
WHERE o.id IS NULL;
CREATE TABLE latest_observe_test as Select * from test_latest where 1=0;
INSERT INTO latest_observe_test
(
raw_table_id,
series_id,
timestamp,
time
)
SELECT
t.id,
ds.id AS series_id,
now(),
t.time
FROM
test_latest t
WHERE t.series_id IS DISTINCT FROM observe_test.series_id;
DELETE FROM test_2_error re
USING t
WHERE t.test_2_id = re.test_2_id;
INSERT INTO test_2_error (test_2_id, error, record_num)
SELECT DISTINCT test_2_id, error, record_num FROM tmp_test_2_error ORDER BY error;
UPDATE raw_table AS rd1
SET processed = case WHEN tre.raw_table_id IS null THEN 2 ELSE 1 END
FROM test_1 tr
LEFT JOIN
tmp_test_2_error tre ON tre.raw_table_id = tr.id
WHERE rd1.id = tr.id;
OPEN ref FOR
SELECT 1;
RETURN ref;
OPEN ref for
SELECT o.* from observe_test o
;
RETURN ref;
OPEN ref FOR
SELECT
rd.id,
ds.id AS series_id,
now() AS timestamp,
rd.time
FROM test_2 r
INNER JOIN raw_table rd ON rd.test_2_id = r.id
INNER JOIN data_origin ON data_origin.code = rd.code
WHERE o.id IS NULL AND r.id = _id;
RETURN ref;
END;
$$ LANGUAGE plpgsql VOLATILE COST 100;
I am not able to run this procedure.
Can you please help me where I have done wrong?
I am using squirrel and face the same question as you.
until I found that:
-- Note that if you want to create the function under Squirrel SQL,
-- you must go to Sessions->Session Properties
-- then SQL tab and change the Statement Separator from ';' to something else
-- (for intance //). Otherwise Squirrel SQL sends one piece to the server
-- that stops at the first encountered ';', and the server cannot make
-- sense of it. With the separator changed as suggested, you type everything
-- as above and end with
-- ...
-- end;
-- $$ language plpgsql
-- //
--
-- You can then restore the default separator, or use the new one for
-- all queries ...
--
I'm trying to run a graph search to find all nodes accessible from a starting point, like so:
with recursive
nodes_traversed as (
select START_NODE ID
from START_POSITION
union all
select ed.DST_NODE
from EDGES ed
join nodes_traversed NT
on (NT.ID = ed.START_NODE)
and (ed.DST_NODE not in (select ID from nodes_traversed))
)
select distinct * from nodes_traversed
Unfortunately, when I try to run that, I get an error:
Recursive CTE member (nodes_traversed) can refer itself only in FROM clause.
That "not in select" clause is important to the recursive expression, though, as it provides the ending point. (Without it, you get infinite recursion.) Using generation counting, like in the accepted answer to this question, would not help, since this is a highly cyclic graph.
Is there any way to work around this without having to create a stored proc that does it iteratively?
Here is my solution that use global temporary table, I have limited recursion by level and nodes from temporary table.
I am not sure how it will work on large set of data.
create procedure get_nodes (
START_NODE integer)
returns (
NODE_ID integer)
as
declare variable C1 integer;
declare variable C2 integer;
begin
/**
create global temporary table id_list(
id integer
);
create index id_list_idx1 ON id_list (id);
*/
delete from id_list;
while ( 1 = 1 ) do
begin
select count(distinct id) from id_list into :c1;
insert into id_list
select id from
(
with recursive nodes_traversed as (
select :START_NODE AS ID , 0 as Lv
from RDB$DATABASE
union all
select ed.DST_NODE , Lv+1
from edges ed
join nodes_traversed NT
on
(NT.ID = ed.START_NODE)
and nt.Lv < 5 -- Max recursion level
and nt.id not in (select id from id_list)
)
select distinct id from nodes_traversed);
select count(distinct id) from id_list into :c2;
if (c1 = c2) then break;
end
for select distinct id from id_list into :node_id do
begin
suspend ;
end
end