Postgres plpgsql with PERFORM data-modifying CTE queries


I tried to simulate my problem in the code example below. There, I delete from test2 with a plain statement inside a trigger function, and this works fine.
However, in my case, this delete is part of a rather complex CTE with several updates and inserts (there are no selects, so I add a dummy select 1 as the main query). Let's simulate it like this:
with my_cte as(delete from test2) select 1
Now, as we know, we have to use the perform keyword to execute this:
perform (with my_cte as(delete from test2) select 1);
I am getting the following error:
ERROR: WITH clause containing a data-modifying statement must be at the top level
Is this a limitation of plpgsql?
(Please note that this is just an example to explain my problem. I know the queries do not really make any sense.)
create table test
(
key int primary key
);
create table test2
(
key int primary key
);
create function test() returns trigger as
$$
begin
raise notice 'hello there';
-- this does work
delete from test2;
-- this doesn't work
perform (with my_cte as(delete from test2) select 1);
return new;
end;
$$
language plpgsql;
create trigger test after insert on test for each row execute procedure test();
insert into test(key) select 1;

You can use a CTE to combine several DELETE, INSERT and UPDATE ... RETURNING queries, and you don't need PERFORM for it, e.g.:
t=# begin; do $$ begin with d as (delete from s133 returning *) insert into s133 select * from d; raise info '%',(select count(1) from s133);
end; $$; commit;
BEGIN
Time: 0.135 ms
INFO: 4
DO
Time: 0.469 ms
COMMIT
Time: 0.887 ms
t=# select count(1) from s133;
count
-------
4
(1 row)
Here I delete four rows in the CTE and insert them back in the same statement.

As you found out, you can neither nest such a WITH clause in a subselect, nor can you do
WITH cte AS (...)
PERFORM 1;
One solution would be to use SELECT ... INTO dummy instead of PERFORM and ignore the result.
But I don't see why you cannot code the DELETEs, UPDATEs and INSERTs in your function with several SQL statements rather than bundling them into CTEs.
If you try to protect yourself from concurrent data modification, use a REPEATABLE READ transaction so that all your statements operate on the same snapshot of the database.
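The SELECT ... INTO workaround could look roughly like this for the trigger function from the question. This is a minimal sketch: the variable name dummy is hypothetical, and it assumes the test and test2 tables and the trigger from the question already exist. The WITH clause now sits at the top level of the statement, which is what the error message asks for.

```sql
create or replace function test() returns trigger as
$$
declare
    dummy integer;  -- hypothetical throwaway variable, its value is ignored
begin
    -- the data-modifying CTE is at the top level of the statement,
    -- and the result of the dummy main query is stored and discarded
    with my_cte as (delete from test2)
    select 1 into dummy;
    return new;
end;
$$
language plpgsql;
```

After this, an insert into test fires the trigger and empties test2 without raising the error.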

Related

Query has no result in destination data when calling colpivot inside pgsql stored procedure

I have created a procedure to generate temp table using colpivot https://github.com/hnsl/colpivot
and saving the result into a physical table as below in PGSQL
create or replace procedure create_report_table()
language plpgsql
as $$
begin
drop table if exists reports;
select colpivot('_report',
'select
u.username,
c.shortname as course_short_name,
to_timestamp(cp.timecompleted)::date as completed
FROM mdl_course_completions AS cp
JOIN mdl_course AS c ON cp.course = c.id
JOIN mdl_user AS u ON cp.userid = u.id
WHERE c.enablecompletion = 1
ORDER BY u.username' ,array['username'], array['course_short_name'], '#.completed', null);
create table reports as (SELECT * FROM _report);
commit;
end; $$
The colpivot function, drop table and delete table all work fine in isolation, but when I create the procedure as above and call it, it throws the error "Query has no result in destination data".
Is there any way I can use colpivot together with several other queries, as I am currently trying to do?

Use PERFORM instead of SELECT. That will execute the statement, without the need to keep the result somewhere. This is what the manual says:
"Sometimes it is useful to evaluate an expression or SELECT query but discard the result, for example when calling a function that has side-effects but no useful result value. To do this in PL/pgSQL, use the PERFORM statement."
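Applied to the procedure in the question, the change might look like the sketch below. It only swaps SELECT for PERFORM on the colpivot call. The query text and the colpivot arguments are copied from the question, and the sketch assumes that colpivot is installed and that the mdl_* tables exist.

```sql
create or replace procedure create_report_table()
language plpgsql
as $$
begin
    drop table if exists reports;
    -- PERFORM runs the function and discards its result,
    -- so no INTO target is needed
    perform colpivot('_report',
        'select u.username,
                c.shortname as course_short_name,
                to_timestamp(cp.timecompleted)::date as completed
         from mdl_course_completions as cp
         join mdl_course as c on cp.course = c.id
         join mdl_user as u on cp.userid = u.id
         where c.enablecompletion = 1
         order by u.username',
        array['username'], array['course_short_name'], '#.completed', null);
    create table reports as select * from _report;
    commit;
end; $$;
```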

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite some databases and different schemas and tables I wanted to start using variables to reduce chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the times my variables are perceived as columns or not recognized at all. :(

You need to create and then call a plpgsql procedure with input parameters :
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .
See the test result in dbfiddle.

As has been commented, you need some kind of dynamic SQL, in a FUNCTION, a PROCEDURE or a DO statement, to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
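To illustrate the quoting behaviour, here is a small sketch with format() and the %I specifier, which double-quotes an identifier only when that is necessary. The table name My Table is a made-up example.

```sql
-- %I leaves simple lower-case names alone and double-quotes everything else,
-- so a passed name is always treated as one identifier, never as SQL code
SELECT format('SELECT * FROM %I.%I', 's1', 't1');        -- SELECT * FROM s1.t1
SELECT format('SELECT * FROM %I.%I', 's1', 'My Table');  -- SELECT * FROM s1."My Table"
```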
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing. RAISE NOTICE '%', _sql; serves that purpose.
I suggest commenting out the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
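A client-side variant with psql could look like the sketch below. It is only an illustration: the schema s1, the table t1 and the key list are placeholder values, and \gexec is a psql meta-command, so this does not work from other clients.

```sql
-- psql only: the query returns the statement as text,
-- and \gexec then executes each returned value as an SQL statement
SELECT format('DELETE FROM %I.%I WHERE id = ANY (%L)', 's1', 't1', '{1,3}'::int[])
\gexec
```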

Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
select quote_ident(_source_table) ||'_'|| array_to_string(_unique_key,'_') into _target_table from quote_ident(_source_table);
raise notice 'the deleted table records will store in %.%',_target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows influenced, %',_row_count;
end
$$;
--
If your _unique_key array is short, this solution also creates the target table for you. You still need to create the target schema yourself.
If the array is long, adapt the code so that the dumped table gets a more suitable name.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is source table, {1,2} is the unique key you want to extract to the new table. s2 is the target schema

Execute a SELECT with dynamic ORDER BY expression inside a function

I'm trying to EXECUTE some SELECTs to use inside a function, my code is something like this:
DECLARE
result_one record;
BEGIN
EXECUTE 'WITH Q1 AS
(
SELECT id
FROM table_two
INNER JOINs, WHERE, etc, ORDER BY... DESC
)
SELECT Q1.id
FROM Q1
WHERE, ORDER BY...DESC';
RETURN final_result;
END;
I know how to do it in MySQL, but in PostgreSQL I'm failing. What should I change or how should I do it?

For a function to be able to return multiple rows it has to be declared as returns table() (or returns setof)
And to actually return a result from within a PL/pgSQL function you need to use return query (as documented in the manual)
To build dynamic SQL in Postgres it is highly recommended to use the format() function to properly deal with identifiers (and to make the source easier to read).
So you need something like:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
'with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc', p_sort_column);
end;
$$
language plpgsql;
Note that the order by inside the CTE is pretty much useless if you are sorting the final query unless you use a LIMIT or distinct on () inside the query.
You can make your life even easier if you use another level of dollar quoting for the dynamic SQL:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
$query$
with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc
$query$, p_sort_column);
end;
$$
language plpgsql;

What a_horse said. And:
How to return result of a SELECT inside a function in PostgreSQL?
Plus, to pick a column for ORDER BY dynamically, you have to add that column to the SELECT list of your CTE, which leads to complications if the column can be duplicated (like with passing 'id') ...
Better yet, remove the CTE entirely. There is nothing in your question to warrant its use anyway. (Only use CTEs when needed in Postgres, they are typically slower than equivalent subqueries or simple queries.)
CREATE OR REPLACE FUNCTION get_data(p_sort_column text)
RETURNS TABLE (id integer) AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$
SELECT t2.id -- assuming you meant t2?
FROM table_two t2
JOIN table_three t3 on ...
ORDER BY t2.%I DESC NULLS LAST -- see below!
$q$, $1);
END
$func$ LANGUAGE plpgsql;
I appended NULLS LAST - you'll probably want that, too:
PostgreSQL sort by datetime asc, null first?
If p_sort_column is from the same table all the time, hard-code that table name / alias in the ORDER BY clause. Else, pass the table name / alias separately and auto-quote them separately to be safe:
Define table and column names as arguments in a plpgsql function?
I suggest to table-qualify all column names in a bigger query with multiple joins (t2.id not just id). Avoids various kinds of surprising results / confusion / abuse.
And you may want to schema-qualify your table names (myschema.table_two) to avoid similar troubles when calling the function with a different search_path:
How does the search_path influence identifier resolution and the "current schema"

How to get the last PostgreSQL serial ID such that all rows before it are committed?

I am using PostgreSQL 9.0.5 and I have a cron job that periodically reads newly created rows from a table and accumulates their values into a summary table that holds hourly data.
I need to get the latest ID (serial) that is committed and for which all earlier rows are committed as well.
The currval function will not give a correct value in this case, because the transaction that inserted currval may commit earlier than others. When I run a SELECT at a given moment, I can see that the id column is not continuous because some rows are still not committed.
Here is some sample code I have used to test:
--test race condition
create table mydata(id serial,val int);
--run in thread 1
create or replace function insert_delay() returns void as $$
begin
insert into mydata(val) values (1);
perform pg_sleep(60);
end;
$$ language 'plpgsql';
--run in thread 2
create or replace function insert_ok() returns void as $$
begin
insert into mydata(val) values (2);
end;
$$ language 'plpgsql';
--run in thread 3
mytest=# select * from mydata; --SHOULD HAVE SEEN id = 1 and 2;
id | val
----+-----
2 | 2
(1 row)
I even tried a statement like the one below:
select max(id) from mydata where age(xmin) >= age(txid_snapshot_xmin(txid_current_snapshot())::text::xid);
But in production (running a high volume of transactions), the returned max(id) does not move forward, even after all the busy transactions have finished. So this does not work either.

There isn't a really good way to do this directly. I think the best option really is to create a temporary table which truncates on transaction commit, and a trigger that inserts the new IDs into that table. Then you can look up the values from the temp table.

How can I insert the return of DELETE into INSERT in postgresql?

I am trying to delete a row from one table and insert it with some additional data into another. I know this can be done in two separate commands, one to delete and another to insert into the new table. However I am trying to combine them and it is not working, this is my query so far:
insert into b (one,two,num) values delete from a where id = 1 returning one, two, 5;
When running that I get the following error:
ERROR: syntax error at or near "delete"
Can anyone point out how to accomplish this, or is there a better way? or is it not possible?

You cannot do this before PostgreSQL 9.1, which is not yet released. And then the syntax would be
WITH foo AS (DELETE FROM a WHERE id = 1 RETURNING one, two, 5)
INSERT INTO b (one, two, num) SELECT * FROM foo;

Before PostgreSQL 9.1 you can create a volatile function like this (untested):
create function move_from_a_to_b(_id integer, _num integer)
returns void language plpgsql volatile as
$$
declare
_one integer;
_two integer;
begin
delete from a where id = _id returning one, two into strict _one, _two;
insert into b (one,two,num) values (_one, _two, _num);
end;
$$;
And then just use select move_from_a_to_b(1, 5). A function has the advantage over two statements that it will always run in single transaction — there's no need to explicitly start and commit transaction in client code.

In any version of PostgreSQL, you can create a trigger function that inserts rows into another table as they are deleted. It seems slower, though, than the bulk approach with a data-modifying CTE that becomes available in PostgreSQL 9.1. You just need to copy the old data into the other table before it gets deleted. This is done with the OLD record:
CREATE FUNCTION moveDeleted() RETURNS trigger AS $$
BEGIN
INSERT INTO another_table VALUES(OLD.column1, OLD.column2,...);
RETURN OLD;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER moveDeleted
BEFORE DELETE ON table
FOR EACH ROW
EXECUTE PROCEDURE moveDeleted();
As the answer above shows, from PostgreSQL 9.1 on you can do this:
WITH tmp AS (DELETE FROM table RETURNING column1, column2, ...)
INSERT INTO another_table (column1, column2, ...) SELECT * FROM tmp;

That syntax you have there isn't valid. 2 statements is the best way to do this. The most intuitive way to do it would be to do the insert first and the delete second.
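Spelled out for the tables in the question, the two-statement variant could look like this sketch. It assumes a and b have the columns used in the question (id, one and two in a; one, two and num in b), and it wraps both statements in an explicit transaction so that they succeed or fail together.

```sql
begin;
-- copy the row first, adding the extra value 5 for num
insert into b (one, two, num)
select one, two, 5 from a where id = 1;
-- then remove it from the source table
delete from a where id = 1;
commit;
```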

As "AI W" said, two statements are certainly the best option for you, but you could also consider writing a trigger for that. Each time a row is deleted from your first table, a row is inserted into the other.
