PL/pgSQL perform vs execute - postgresql

What is the difference between PERFORM and EXECUTE in PL/pgSQL?
From the manual:
Sometimes it is useful to evaluate an expression or SELECT query but discard the result, for example when calling a function that has side-effects but no useful result value. To do this in PL/pgSQL, use the PERFORM statement.
But when I try something like:
perform 'create table foo as (select 1)';
nothing happens, although this query should have a side effect (creating a table) and its result can be discarded.
I think I got one thing right: to run functions I can use PERFORM:
perform pg_temp.addInheritance(foo);

PERFORM is a PL/pgSQL command used for calling void functions. PL/pgSQL is careful about useless SELECT statements: a SELECT without an INTO clause is not allowed. But sometimes you need to call a function and you don't need to store the result (or the function has no result). In SQL, a function is called with a SELECT statement. That is not possible in PL/pgSQL, so the PERFORM command was introduced.
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS $$
BEGIN
RAISE NOTICE 'Hello from void function';
END;
$$ LANGUAGE plpgsql;
-- direct call from SQL
SELECT foo();
-- in PLpgSQL
DO $$
BEGIN
SELECT foo(); -- is not allowed
PERFORM foo(); -- is ok
END;
$$;
The PERFORM statement evaluates its argument and discards the result.
Your example, perform 'create table foo as (select 1)';,
is the same as SELECT 'create table foo as (select 1)'. It returns the string "create table foo as (select 1)", and this string is discarded.
The EXECUTE statement evaluates an expression to get a string. In the next step, this string is executed as an SQL command.
So EXECUTE 'create table ' || some_var || '(a int)'; has two steps:
1. evaluate the expression 'create table ' || some_var || '(a int)'
2. if some_var is mytab, for example, then execute the command create table mytab(a int)
The PERFORM statement is used for function calls when the function is not used in an assignment statement. EXECUTE is used for evaluating dynamic SQL, when the form of the SQL command is known only at run time.
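To make the difference concrete, here is a minimal sketch you could run in psql. The temp table name perform_demo is made up for the illustration, and to_regclass() (available in PostgreSQL 9.4 and later) is only used to check whether the table exists.

```sql
DO $$
BEGIN
   -- PERFORM evaluates the string literal, as SELECT '...' would, and discards it.
   -- No CREATE TABLE command is ever run here.
   PERFORM 'CREATE TEMP TABLE perform_demo AS SELECT 1 AS x';
   RAISE NOTICE 'after PERFORM: %', to_regclass('perform_demo');

   -- EXECUTE evaluates the expression to a string, then runs that string as SQL.
   EXECUTE 'CREATE TEMP TABLE perform_demo AS SELECT 1 AS x';
   RAISE NOTICE 'after EXECUTE: %', to_regclass('perform_demo');
END
$$;
```

The first notice should report that no such table exists (a NULL regclass), and the second should show the table name, assuming perform_demo did not already exist in the session.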

The very next lines in the docs you quote say:
This executes query and discards the result. Write the query the same
way you would write an SQL SELECT command, but replace the initial
keyword SELECT with PERFORM.
Emphasis mine
EXECUTE, in turn, executes a dynamic query (see the same docs as above).

I recently had a case where I needed to set specific constraints to DEFERRED and found something interesting.
- EXECUTE 'SET CONSTRAINTS fk_a DEFERRED';
- PERFORM 'SET CONSTRAINTS fk_a DEFERRED';
Both EXECUTE and PERFORM ran without error, but only EXECUTE persisted the action for the rest of the code.
At first it looks as if PERFORM runs in its own transaction "bubble", but PERFORM 'SET CONSTRAINTS ...' only evaluates the string literal and discards it, so the command is never run.
In my case I had two tables a and b and a FK (the real situation is much more complex). We needed to insert the data out of parent/child order, and for that we needed the constraint to be DEFERRED.
Using PERFORM we had a foreign key violation; with EXECUTE we didn't.

Related

Execute prepare statement in a sql function

CREATE OR REPLACE FUNCTION my_function2()
RETURNS VOID
LANGUAGE plpgsql AS
$func$
BEGIN
PREPARE "my-statement7" (text, text) AS
INSERT INTO "table" (key, a_id)
VALUES ($1, $2);
EXECUTE format('EXECUTE "my-statement7" (%1, %2)', 'value1', 'value2');
END;
$func$;
SELECT my_function2();
However, this returns an error.
Executing the prepared statement works on the command line, but inside a function it raises an error. Somebody said the EXECUTE statement is ambiguous inside a function, so it needs to be wrapped in EXECUTE format(...). However, that doesn't work either.
I want to execute the prepared statement for the insert because I have a lot of data to insert in the same format, but the data may need to go into different tables (and the data differ too), so I cannot just do a multi-row INSERT or COPY. I think a prepared statement is the solution for me, but it doesn't work in a function. This will be a problem if I want to execute many prepared statements in a function.
In PL/pgSQL, queries are already prepared and cached, so you don't have to.
As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter parses and analyzes the command to create a prepared statement, using the SPI manager's SPI_prepare function. Subsequent visits to that expression or command reuse the prepared statement.
Here's how you'd write it if you tried. Note the use of _ instead of - in the name to avoid quoting issues; no need to format.
CREATE OR REPLACE FUNCTION my_function2()
RETURNS VOID
LANGUAGE plpgsql AS
$func$
BEGIN
PREPARE my_statement7 (text, text) AS
INSERT INTO example (key, a_id)
VALUES ($1, $2);
EXECUTE my_statement7( 'value1', 'value2');
END;
$func$;
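However, the first time it will fail: ERROR: function my_statement7(unknown, unknown) does not exist. Inside PL/pgSQL, EXECUTE is the dynamic-SQL statement rather than SQL's EXECUTE for prepared statements, so my_statement7(...) is evaluated as an expression, that is, as a function call. The second time it is called it will also fail: ERROR: prepared statement "my_statement7" already exists.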
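For the "same insert, different tables" case from the question, a sketch without any explicit PREPARE might look like the following. The names insert_fixed, insert_pair, example_a and example_b are hypothetical, and the tables are temp tables only so the snippet is self-contained.

```sql
-- Hypothetical demo tables; any tables with these two columns would do.
CREATE TEMP TABLE example_a (key text, a_id text);
CREATE TEMP TABLE example_b (key text, a_id text);

-- Fixed target table: plain static SQL, which PL/pgSQL prepares and caches itself.
CREATE OR REPLACE FUNCTION insert_fixed(p_key text, p_a_id text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO example_a (key, a_id) VALUES (p_key, p_a_id);
END
$func$;

-- Variable target table: dynamic SQL. %I quotes the table name as an identifier,
-- while the values are passed separately through USING.
CREATE OR REPLACE FUNCTION insert_pair(p_table text, p_key text, p_a_id text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE format('INSERT INTO %I (key, a_id) VALUES ($1, $2)', p_table)
   USING p_key, p_a_id;
END
$func$;

SELECT insert_fixed('value1', 'value2');
SELECT insert_pair('example_b', 'value3', 'value4');
```

Note the trade-off: the dynamic variant is planned on every call, so it gives up the plan caching that the static variant gets for free.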

Is it possible to write polymorphic Postgres functions using RECORD parameters?

I want to write a PL/pgSQL function that can take records of different types, check the type of record provided, then do something with the record. Example:
CREATE FUNCTION polymorphic_input(arg_rec RECORD) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
BEGIN
IF pg_typeof(arg_rec)::text = 'information_schema.tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF pg_typeof(arg_rec)::text = 'information_schema.columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
When you call the function with a row from the information_schema.tables table, it should return the name of the table and it does so when you call it like this:
-- this returns table name "pg_type"
SELECT polymorphic_input((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1));
When you call the function with a row from the information_schema.columns table, it should return the name of the column and it does so when you call it like this:
-- this returns column name "objsubid"
SELECT polymorphic_input((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1));
The problem is you CAN'T call the function twice in a row with different row types. For example, if you call it with a row from information_schema.columns it works, but when you then call it with a row from information_schema.tables, you get an error like this:
type of parameter 1 (information_schema.tables) does not match that when preparing the plan (information_schema.columns)
The words "when preparing the plan" gave me a hint that Postgres is caching the plans, so I figured running DISCARD PLANS; before each call to the function would work, and indeed it does when you run this entire query:
DISCARD PLANS; SELECT polymorphic_input((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1));
DISCARD PLANS; SELECT polymorphic_input((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1));
Running DISCARD PLANS; seems like the nuclear option and would no doubt affect performance in a real-world scenario. After some experimentation, I saw that using the pg_typeof function is what forces the plans to be cached. We can rewrite the function to avoid pg_typeof by adding a parameter that specifies what record type to expect:
CREATE FUNCTION polymorphic_input2(arg_rec RECORD, arg_type text) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
BEGIN
IF arg_type = 'tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF arg_type = 'columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
You can then call polymorphic_input2 multiple times in a row with different row types without error as follows:
-- no need for DISCARD PLANS here...these calls work fine.
SELECT polymorphic_input2((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1), 'tables');
SELECT polymorphic_input2((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1), 'columns');
The problem with polymorphic_input2 is that you have to manually give it a hint as to the type of the record to expect. My question: is it possible to implement a polymorphic function that can figure out the type of record passed to it, without the cached plan errors?
The docs mention the plan_cache_mode setting:
Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed using custom or generic plans. Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions....The allowed values are auto (the default), force_custom_plan and force_generic_plan...
I tried removing the error by running SET plan_cache_mode = force_custom_plan; but that didn't help (which is probably a bug because the docs imply it should force a custom plan in each call, but Postgres is still caching the plan and causing errors). Only DISCARD PLANS worked.
The docs on plan caching seem to recognize this issue and say:
The mutable nature of record variables presents another problem in this connection. When fields of a record variable are used in expressions or statements, the data types of the fields must not change from one call of the function to the next, since each expression will be analyzed using the data type that is present when the expression is first reached. EXECUTE can be used to get around this problem when necessary.
...and a little further down the docs indicate this shouldn't be happening:
Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.
This is further confirmed by the docs on EXECUTE, which say:
Also, there is no plan caching for commands executed via EXECUTE. Instead, the command is always planned each time the statement is run. Thus the command string can be dynamically created within the function to perform actions on different tables and columns.
So I tried another variant that tries to run pg_typeof via EXECUTE:
CREATE FUNCTION polymorphic_input3(arg_rec RECORD) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
DECLARE
rec_type text;
BEGIN
EXECUTE 'SELECT pg_typeof($1)' INTO rec_type USING arg_rec;
IF rec_type = 'information_schema.tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF rec_type = 'information_schema.columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
...but that still produces the same error as the variant which calls pg_typeof directly.
My question once again: is it possible (in Postgres 14) to implement a polymorphic function that can figure out the type of record passed to it, without the cached plan errors?

Capture number of rows affected by dynamic sql?

I am trying to get the return from a RETURN QUERY EXECUTE in a plpgsql function to be able to check how many rows were affected by a dynamic update query. My use case is adding an event (with a custom payload) to a separate table on insert or update to a dynamically set table. Because my event has a custom payload, I have not been able to use a database trigger (e.g. trigger before insert). As a simplified example, assume I have this table:
CREATE TABLE users (user_id text primary key, name text)
Here is my simplified events table:
CREATE TABLE events(event_id text primary key, payload json)
Here is my simplified function:
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement AS
$func$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
execute format('insert into event(event_id, payload) values ($1, $2)') using event_id, payload;
RETURN QUERY EXECUTE format('%s', q);
END
$func$ LANGUAGE plpgsql;
The goal is to have this work exactly the same as if someone had run these statements in a transaction. In pseudocode for insert:
BEGIN
insert into events(id, payload) values($1, $2)
insert into users(columns) values(<any values>)
COMMIT
and similarly for update:
BEGIN
insert into events(id, payload) values($1, $2)
result, error := query(`update users set name = 'hello' where id = 'Not Exists Thus No Rows Modified'`);
if result.rowsAffected() == 0 {
ROLLBACK
}
COMMIT
The function my_function almost works except for one edge case: when an update actually doesn't affect any rows.
For example, this works:
select * from my_function(NULL::users,
'insert into users(user_id,name) values(''u1'', ''a2'') returning *',
payload => '{"id": "e1", "custom": "s1", "field": "2019-10-12T07:20:50.52Z"}')
As expected, after this is done, a row is created in both the users table and the events table.
What fails is the following:
select * from my_function(NULL::users,
'update users set name = ''hello'' where user_id = ''NotExists'' returning *',
payload => '{"id": "e2", "custom": "s3", "field": "2019-10-12T07:20:50.52Z"}')
Here, a row is created in the events table (my goal is that it should not be created).
I know this approach is not elegant, and I know this is vulnerable to SQL injection. I'd love suggestions on better ways to solve this (including scrapping what we're doing now). But to answer the question directly, I'm looking to store the result of RETURN QUERY EXECUTE, check if any rows were affected, and raise an error so that there is never a case where a row in the events table is created when there is no real corresponding change in the users table. The users table is just an example; in general, it could be any dynamically set table.
A RETURN QUERY doesn't need to go to the end of the function, it only says: "the results of this query are part of the resulting set".
So you can use the RETURN QUERY, ask for FOUND and act accordingly. Here is your function modified for working this way:
CREATE OR REPLACE FUNCTION public.my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql
AS $function$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
RETURN QUERY EXECUTE format('%s', q);
IF FOUND THEN
execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
END IF;
RETURN;
END
$function$
PS: Maybe you can also solve your problem with triggers FOR EACH STATEMENT using the transition tables OLD and NEW (which are available since v10, https://www.postgresql.org/docs/10/sql-createtrigger.html)
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE q;
IF NOT FOUND THEN
RETURN; -- nothing happened yet, we can exit silently.
-- Or you WANT an error for this case. Then do this instead:
-- RAISE EXCEPTION 'Query passed in parameter "q" did not affect any rows. Doing nothing!';
END IF;
INSERT INTO events(event_id, payload)
VALUES (payload->>'id', payload);
END
$func$;
As has been commented, RETURN QUERY does not return from the function. The manual:
RETURN NEXT and RETURN QUERY do not actually return from the
function — they simply append zero or more rows to the function's
result set. Execution then continues with the next statement in the
PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY
commands are executed, the result set is built up. A final RETURN,
which should have no argument, causes control to exit the function (or
you can just let control reach the end of the function).
There's a code example for your case exactly at the bottom of that chapter in the manual. From me, actually. Originating here:
FUNCTION syntax error
It was suggested to use GET DIAGNOSTICS instead of the simpler FOUND. It's true that EXECUTE does not set the state of FOUND. But RETURN QUERY does. So keep using the simpler FOUND. Related:
Dynamic SQL (EXECUTE) as condition for IF statement
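If you want the actual row count rather than just a yes/no from FOUND, GET DIAGNOSTICS works after a plain EXECUTE as well. A minimal sketch, using a made-up temp table users_demo that stands in for the users table:

```sql
-- Hypothetical stand-in for the users table from the question.
CREATE TEMP TABLE users_demo (user_id text PRIMARY KEY, name text);

DO $$
DECLARE
   n_rows bigint;
BEGIN
   -- EXECUTE does not set FOUND, but it does update ROW_COUNT.
   EXECUTE 'UPDATE users_demo SET name = $1 WHERE user_id = $2'
   USING 'hello'::text, 'NotExists'::text;

   GET DIAGNOSTICS n_rows = ROW_COUNT;
   RAISE NOTICE 'rows affected: %', n_rows;

   IF n_rows = 0 THEN
      RAISE NOTICE 'nothing was updated, so the event insert would be skipped';
   END IF;
END
$$;
```

With the empty demo table, this reports 0 rows affected. For the function in this answer, FOUND after RETURN QUERY is still the simpler check.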
You have format() in your original twice. And while that's typically very useful for dynamic SQL, it's useless in your case. EXECUTE format('%s', q) is exactly the same as just EXECUTE q, with added cost. Both are open doors for SQL injection when passing user input.
While there is a good chance that the transaction might be rolled back, start with the critical step, and do the rest later. Avoid wasting the work. So I moved executing q to the top. Assuming it does not depend on the "payload" row, now inserted later.
Also, INSERT INTO events can be plain SQL. Nothing dynamic there. No need for format() or EXECUTE.
Finally, assuming your jsonb_object_field_text (payload, 'id')::text is just a fancy way of saying payload->>'id'. No need for an additional variable and another SELECT INTO.
Warning against SQL injection
Converting user input (parameter q in the example) to code to execute dynamically is the most direct SQL injection vulnerability of all. I wouldn't want to be caught in my underwear doing that.

Why does PostgreSQL treat my query differently in a function?

I have a very simple query that is not much more complicated than:
select *
from table_name
where id = 1234
...it takes less than 50 milliseconds to run.
Took that query and put it into a function:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY SELECT *
FROM table_name
where id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
This function when executed select * from pie(123); takes 22 seconds.
If I hard code an integer in place of id_param, the function executes in under 50 milliseconds.
Why does the fact that I am using a parameter in the where statement cause my function to run slow?
Edit to add concrete example:
CREATE TYPE test_type AS (gid integer, geocode character varying(9))
CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
RETURNS SETOF test_type AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = $1
AND geo_type = 1
GROUP BY geography, gid, geocode' USING geocode_param;
END;
$BODY$
LANGUAGE plpgsql STABLE;
ALTER FUNCTION geocode_route_by_geocode(character)
OWNER TO root;
--Runs in 20 seconds
select * from geocode_route_by_geocode('999xyz');
--Runs in 10 milliseconds
SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = '9999xyz'
AND geo_type = 1
GROUP BY geography, gid, geocode
Update in PostgreSQL 9.2
There was a major improvement, I quote the release notes here:
Allow the planner to generate custom plans for specific parameter
values even when using prepared statements (Tom Lane)
In the past, a prepared statement always had a single "generic" plan
that was used for all parameter values, which was frequently much
inferior to the plans used for non-prepared statements containing
explicit constant values. Now, the planner attempts to generate custom
plans for specific parameter values. A generic plan will only be used
after custom plans have repeatedly proven to provide no benefit. This
change should eliminate the performance penalties formerly seen from
use of prepared statements (including non-dynamic statements in
PL/pgSQL).
Original answer for PostgreSQL 9.1 or older
A PL/pgSQL function has an effect similar to the PREPARE statement: queries are parsed and the query plan is cached.
The advantage is that some overhead is saved for every call.
The disadvantage is that the query plan is not optimized for the particular parameter values it is called with.
For queries on tables with even data distribution, this will generally be no problem and PL/pgSQL functions will perform somewhat faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, choose a better query plan for the particular values, you may end up with a sub-optimal query plan. Try an SQL function or use dynamic SQL with EXECUTE to force the query to be re-planned for every call. It could look like this:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM table_name
where id = $1'
USING id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
Edit after comment:
If this variant does not change the execution time, there must be other factors at play that you may have missed or did not mention. Different database? Different parameter values? You would have to post more details.
I add a quote from the manual to back up my above statements:
An EXECUTE with a simple constant command string and some USING
parameters, as in the first example above, is functionally equivalent
to just writing the command directly in PL/pgSQL and allowing
replacement of PL/pgSQL variables to happen automatically. The
important difference is that EXECUTE will re-plan the command on each
execution, generating a plan that is specific to the current parameter
values; whereas PL/pgSQL normally creates a generic plan and caches it
for re-use. In situations where the best plan depends strongly on the
parameter values, EXECUTE can be significantly faster; while when the
plan is not sensitive to parameter values, re-planning will be a
waste.
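One way to see which kind of plan a parameterized statement gets is to prepare it by hand and run EXPLAIN EXECUTE. This is only a sketch under assumptions: plan_demo and by_id are made-up names, and the demo table is far too small to reproduce the 22-second case.

```sql
-- Hypothetical demo table with an index on id.
CREATE TEMP TABLE plan_demo AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 10000) AS g;
CREATE INDEX ON plan_demo (id);
ANALYZE plan_demo;

-- The same shape of query as in the function body, prepared explicitly.
PREPARE by_id(integer) AS
SELECT * FROM plan_demo WHERE id = $1;

-- Shows the plan chosen for this particular parameter value.
EXPLAIN EXECUTE by_id(1234);

DEALLOCATE by_id;
```

On PostgreSQL 9.2 and later, the first executions should show a custom plan with the literal 1234 in the condition, whereas a generic plan would show $1 there instead.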

PLPGSQL: Passing an argument into a function breaks my quotation marks

Without functions, I can do:
DELETE FROM table1
WHERE something='hello'
And my rows with the value of something='hello' get deleted, but as soon I implement functions, I begin to have problems with quotation marks.
CREATE OR REPLACE FUNCTION somefunc(varchar)
RETURNS varchar AS $$
BEGIN
DELETE FROM table1
WHERE something='$1';
DELETE FROM table2
WHERE something='$1';
RETURN $1;
END;
$$ LANGUAGE plpgsql;
Nothing seems to work. I have tried (all variations that I saw on SO or elsewhere):
something=$1 <-- says column "hello" doesn't exist (because no quotes are given)
something=''$1''
something='''$1'''
something=''''$1''''
something='''||$1||'''
something=$Q$$1$Q1$ <--- gives syntax error
something=$Q1$ $1 $Q1$
something=$$ $1 $$
something=quote_literal($1)
And many other variations. How do I get around this??
Btw, I am using a python script to run the function. Here's the line that runs it. I've also tried adding quotes into this line as well to no avail:
cur.execute("SELECT somefunc(%s);" % (sys.argv[2]))
Thank you!
This behavior is based on the implicit use of prepared statements. When prepared statements are used, the query and its parameters are passed to the database server separately. Do not quote values in that scenario.
PL/pgSQL uses prepared statements, and psycopg2 likewise handles the quoting for you when the values are passed as a separate argument to cur.execute() instead of being formatted into the string:
...
DECLARE myvar int;
BEGIN
DELETE FROM mytab WHERE column = myvar; -- quietly using prepared statement
versus
DECLARE myvar int;
BEGIN
-- using dynamic SQL is similar to classic languages, quoting is necessary
-- but use the quote_literal() function to protect against SQL injection
EXECUTE 'DELETE FROM mytab WHERE column = ' || quote_literal(myvar);
-- or dynamic SQL with "USING" clause
EXECUTE 'DELETE FROM mytab WHERE column = $1' USING myvar;
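Applied to the function from the question, a sketch could look like this. The temp tables only stand in for the real table1 and table2, and the parameter name p_value is an assumption added for readability.

```sql
-- Stand-ins for the tables from the question.
CREATE TEMP TABLE table1 (something varchar);
CREATE TEMP TABLE table2 (something varchar);

CREATE OR REPLACE FUNCTION somefunc(p_value varchar)
RETURNS varchar
LANGUAGE plpgsql AS
$$
BEGIN
   -- No quotes around the parameter: it is passed as a value, not spliced in as text.
   DELETE FROM table1 WHERE something = p_value;
   DELETE FROM table2 WHERE something = p_value;
   RETURN p_value;
END;
$$;

-- The value is quoted only here, where it appears as a literal in SQL text.
SELECT somefunc('hello');
```

The "column hello doesn't exist" error most likely came from the Python side: formatting the argument into the string with % produces SELECT somefunc(hello);. Passing it as a parameter, as in cur.execute("SELECT somefunc(%s);", (sys.argv[2],)), lets psycopg2 do the quoting.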