PostgreSQL insert or update trigger function volatility category - postgresql

Assume I have two tables in my database (PostgreSQL 9.x):
CREATE TABLE FOLDER (
KEY BIGSERIAL PRIMARY KEY,
PATH TEXT,
NAME TEXT
);
CREATE TABLE FOLDERFILE (
FILEID BIGINT,
PATH TEXT,
PATHKEY BIGINT
);
I automatically update FOLDERFILE.PATHKEY from FOLDER.KEY whenever I insert into or update FOLDERFILE:
CREATE OR REPLACE FUNCTION folderfile_fill_pathkey() RETURNS trigger AS $$
DECLARE
pathkey bigint;
changed boolean;
BEGIN
IF tg_op = 'INSERT' THEN
changed := TRUE;
ELSE IF old.FILEID != new.FILEID THEN
changed := TRUE;
END IF;
END IF;
IF changed THEN
SELECT INTO pathkey key FROM FOLDER WHERE PATH = new.path;
IF FOUND THEN
new.pathkey = pathkey;
ELSE
new.pathkey = NULL;
END IF;
END IF;
RETURN new;
END
$$ LANGUAGE plpgsql VOLATILE;
-- BEFORE rather than AFTER: assignments to NEW are ignored in an AFTER trigger
CREATE TRIGGER folderfile_fill_pathkey_trigger BEFORE INSERT OR UPDATE
ON FOLDERFILE FOR EACH ROW EXECUTE PROCEDURE folderfile_fill_pathkey();
So the question is about the volatility of the function folderfile_fill_pathkey(). The documentation says:
Any function with side-effects must be labeled VOLATILE
But as far as I understand, this function does not change any data in the tables it relies on, so I can mark it as IMMUTABLE. Is that correct?
Would there be any problem with an IMMUTABLE trigger function if I bulk-insert many rows into FOLDERFILE within the same transaction, like:
BEGIN;
INSERT INTO FOLDERFILE ( ... );
...
INSERT INTO FOLDERFILE ( ... );
COMMIT;

Firstly, as @pozs already pointed out, the function you have provided is most definitely STABLE rather than IMMUTABLE, since it performs database look-ups. This means that the result is not derived from the input parameters alone (as IMMUTABLE would suggest), but also from the data stored in your FOLDER table (which is bound to change). As per the documentation:
STABLE indicates that the function cannot modify the database, and
that within a single table scan it will consistently return the same
result for the same argument values, but that its result could change
across SQL statements. This is the appropriate selection for functions
whose results depend on database lookups, parameter variables (such as
the current time zone), etc.
Secondly, adding stability modifiers (IMMUTABLE/STABLE/VOLATILE) to your trigger functions serves an illustrative purpose at best, since AFAIK PostgreSQL doesn't actually perform any planning that would warrant their use. The following post from the pgsql-hackers mailing list seems to support my claim:
Volatility is a complete no-op for a trigger function anyway, as are
other planner parameters such as cost/rows, because there is no
planning involved in trigger calls.
To sum up: you're probably better off avoiding the stability keywords in your trigger(!) procedures for now, since including them seems to add little to no benefit but entails several unexpected caveats/pitfalls (see the end of @pozs's first comment).
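As a rough sketch of that advice, the trigger function could be declared with no volatility keyword at all, which leaves it at the default (VOLATILE). Two things below are assumptions and not part of the question: the trigger is declared BEFORE so that the assignment to NEW actually takes effect, and the lookup is redone when path changes (the original compares FILEID).

```sql
-- Sketch only: no volatility keyword, so the default (VOLATILE) applies.
CREATE OR REPLACE FUNCTION folderfile_fill_pathkey() RETURNS trigger AS $$
DECLARE
    changed boolean := false;
BEGIN
    IF TG_OP = 'INSERT' THEN
        changed := true;
    ELSIF OLD.path IS DISTINCT FROM NEW.path THEN  -- assumption: path, not fileid
        changed := true;
    END IF;

    IF changed THEN
        -- The scalar subquery yields NULL when no folder matches.
        NEW.pathkey := (SELECT f.key FROM folder f WHERE f.path = NEW.path LIMIT 1);
    END IF;

    RETURN NEW;
END
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS folderfile_fill_pathkey_trigger ON folderfile;
-- Assumption: BEFORE, because changes to NEW are discarded in AFTER triggers.
CREATE TRIGGER folderfile_fill_pathkey_trigger BEFORE INSERT OR UPDATE
ON folderfile FOR EACH ROW EXECUTE PROCEDURE folderfile_fill_pathkey();
```

Because each INSERT is its own statement, rows inserted later in the same transaction see any FOLDER rows added earlier in that transaction.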

Related

Is there any way to check whether a PostgreSQL record- or row-type variable contains a specific field from inside a function?

I have a trigger defined on several tables to fire after all INSERT, UPDATE, or DELETE, all using the same trigger function. The trigger function performs an expensive check, but I can speed it up significantly by filtering some of the intermediate steps of that check using either a WHERE machine_serial = NEW.machine_serial or WHERE machine_serial = OLD.machine_serial clause, depending on what type of statement fired the trigger. However, not all the tables actually have a machine_serial column, so I can't perform this filtering when the trigger is fired on one of those tables. I am currently trying to find a good solution to making the decision of whether to filter or not from within the trigger function, and I believe that simply checking whether NEW or OLD has the machine_serial field would be easiest, clearest, and fastest. I can't find any way to do that in the documentation though, but checking whether a RECORD contains a certain field seems like such a basic, commonplace operation for anyone that has to work with RECORDs that I assume that I've just got to be missing it somewhere - I can't imagine that it's just not possible.
For completeness, I'll go over the alternatives I've considered to the hypothetical does-RECORD-have-field check:
I could create two trigger functions, do_expensive_check_with_machine_serial() and do_expensive_check_without_machine_serial(), and use one or the other depending on whether the table has the machine_serial column. But if I or anyone after me needs to alter the logic in either one of these functions, they'll need to remember to alter the logic in the other one, too.
I could stick with the one trigger function I currently have, and figure out whether the firing table has machine_serial by just trying to access NEW.machine_serial or OLD.machine_serial. If that raises an exception, I can catch it and then I'll know the field isn't present. But the manual explicitly suggests avoiding using exception blocks unless absolutely necessary, due to performance impacts.
I could stick with the one trigger function I currently have, and just add a check like this: IF (TG_TABLE_SCHEMA = x AND TG_TABLE_NAME = y) OR (TG_TABLE_SCHEMA = w AND TG_TABLE_NAME = z) OR ..., and just maintain that list of every table that has a machine_serial column. But then I and anyone that comes after me would need to alter that check in the trigger function any time the trigger is added to a new table, which is less than ideal.
Of course, the above three alternatives would all function, but they all feel like bad design choices to me. Maybe it's because I'm used to the dynamicness offered by Python, but if I used any of these alternatives, I would feel like I'm doing something wrong. And PostgreSQL is pretty good about offering lots of operators on all sorts of data types, so I just can't imagine that something as basic as checking whether a RECORD or ROW-type variable contains a certain field is impossible.
Before I show the solution, I have to say that this requirement can be a sign of an unfortunate design. Maybe you are trying to implement functionality that should not live in triggers. Triggers are good, but triggers that are too smart, too generic, or too rich can be very slow and very hard to maintain and debug (although, as with everything in life, there are exceptions to the rule).
First, you can look in the system catalog:
CREATE FUNCTION public.foo_trg() RETURNS trigger
LANGUAGE plpgsql
AS $$
begin
raise notice 'a exists %', exists(select * from pg_attribute where attrelid = new.tableoid and attname = 'a');
raise notice 'd exists %', exists(select * from pg_attribute where attrelid = new.tableoid and attname = 'd');
return new;
end;
$$;
CREATE TABLE public.foo (
a integer,
b integer
);
CREATE TRIGGER foo_trg_insert
AFTER INSERT ON public.foo
FOR EACH ROW EXECUTE FUNCTION public.foo_trg();
(2022-09-02 06:18:41) postgres=# insert into foo values(1,2);
NOTICE: a exists t
NOTICE: d exists f
INSERT 0 1
The second solution is based on converting the record to jsonb:
CREATE OR REPLACE FUNCTION public.foo_trg()
RETURNS trigger
LANGUAGE plpgsql
AS $$
declare j jsonb;
begin
j := to_jsonb(new);
raise notice 'a exists %', j ? 'a';
raise notice 'd exists %', j ? 'd';
return new;
end;
$$;
(2022-09-02 06:24:54) postgres=# insert into foo values(1,2);
NOTICE: a exists t
NOTICE: d exists f
INSERT 0 1
The second solution can be faster because it does not query the system catalog tables; it only hits the system catalog cache. However, it does not work on legacy PostgreSQL releases that lack to_jsonb() (added in PostgreSQL 9.5).
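Applied to the machine_serial case from the question, the jsonb approach might look roughly like the sketch below. The function name do_expensive_check and the placeholder comments for the check itself are hypothetical; only the field test is the point.

```sql
-- Sketch only: one trigger function shared by tables with and without
-- a machine_serial column.
CREATE OR REPLACE FUNCTION do_expensive_check() RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
    r jsonb;
    serial_txt text;
BEGIN
    -- Pick whichever row variable is populated for this operation.
    IF TG_OP = 'DELETE' THEN
        r := to_jsonb(OLD);
    ELSE
        r := to_jsonb(NEW);
    END IF;

    IF r ? 'machine_serial' THEN
        serial_txt := r ->> 'machine_serial';
        -- hypothetical: run the check filtered on serial_txt
        RAISE NOTICE 'filtering on machine_serial = %', serial_txt;
    ELSE
        -- hypothetical: run the unfiltered check
        RAISE NOTICE 'table % has no machine_serial column', TG_TABLE_NAME;
    END IF;

    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$;
```

The value comes back as text from ->>, so it would need a cast if machine_serial is not a text column.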

Capture number of rows affected by dynamic sql?

I am trying to capture the result of a RETURN QUERY EXECUTE in a PL/pgSQL function so that I can check how many rows were affected by a dynamic update query. My use case is adding an event (with a custom payload) to a separate table on insert or update to a dynamically set table. Because my event has a custom payload, I have not been able to use a database trigger (e.g. a BEFORE INSERT trigger). As a simplified example, assume I have this table:
CREATE TABLE users (user_id text primary key, name text)
Here is my simplified events table:
CREATE TABLE events(event_id text primary key, payload json)
Here is my simplified function:
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement AS
$func$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
RETURN QUERY EXECUTE format('%s', q);
END
$func$ LANGUAGE plpgsql;
The goal is to have this work exactly the same as if someone had run these statements in a transaction. In pseudocode for insert:
BEGIN
insert into events(id, payload) values($1, $2)
insert into users(columns) values(<any values>)
COMMIT
and similarly for update:
BEGIN
insert into events(id, payload) values($1, $2)
result, error := query(`update users set name = 'hello' where id = 'Not Exists Thus No Rows Modified'`);
if result.rowsAffected() == 0 {
ROLLBACK
}
COMMIT
The function my_function almost works except for one edge case: when an update actually doesn't affect any rows.
For example, this works:
select * from my_function(NULL::users,
'insert into users(user_id, name) values(''u1'', ''a2'') returning *',
payload => '{"id": "e1", "custom": "s1", "field": "2019-10-12T07:20:50.52Z"}')
As expected, after this runs a row is created in both the users table and the events table.
What fails is the following:
select * from my_function(NULL::users,
'update users set name = ''hello'' where user_id = ''NotExists'' returning *',
payload => '{"id": "e2", "custom": "s3", "field": "2019-10-12T07:20:50.52Z"}')
Here, a row is created in the events table (my goal is that it should not be created).
I know this approach is not elegant, and I know it is vulnerable to SQL injection. I'd love suggestions on better ways to solve this (including scrapping what we're doing now). But to answer the question directly, I'm looking to store the result of RETURN QUERY EXECUTE, check whether any rows were affected, and raise an error so that there is never a case where a row in the events table is created when there is no real corresponding change in the users table. The users table is just an example; in general, it could be any dynamically set table.
A RETURN QUERY doesn't need to go to the end of the function, it only says: "the results of this query are part of the resulting set".
So you can use the RETURN QUERY, ask for FOUND and act accordingly. Here is your function modified for working this way:
CREATE OR REPLACE FUNCTION public.my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql
AS $function$
DECLARE
event_id text;
BEGIN
SELECT jsonb_object_field_text (payload, 'id')::text INTO STRICT event_id;
RETURN QUERY EXECUTE format('%s', q);
IF FOUND THEN
execute format('insert into events(event_id, payload) values ($1, $2)') using event_id, payload;
END IF;
RETURN;
END
$function$
PS: Maybe you can also solve your problem with FOR EACH STATEMENT triggers using the OLD and NEW transition tables (available since v10, https://www.postgresql.org/docs/10/sql-createtrigger.html)
CREATE OR REPLACE FUNCTION my_function(_rowtype anyelement, q text, payload jsonb)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE q;
IF NOT FOUND THEN
RETURN; -- nothing happened yet, we can exit silently.
-- Or you WANT an error for this case. Then do this instead:
-- RAISE EXCEPTION 'Query passed in parameter "q" did not affect any rows. Doing nothing!';
END IF;
INSERT INTO events(event_id, payload)
VALUES (payload->>'id', payload);
END
$func$;
As has been commented, RETURN QUERY does not return from the function. The manual:
RETURN NEXT and RETURN QUERY do not actually return from the
function — they simply append zero or more rows to the function's
result set. Execution then continues with the next statement in the
PL/pgSQL function. As successive RETURN NEXT or RETURN QUERY
commands are executed, the result set is built up. A final RETURN,
which should have no argument, causes control to exit the function (or
you can just let control reach the end of the function).
There's a code example for your case exactly at the bottom of that chapter in the manual. From me, actually. Originating here:
FUNCTION syntax error
It was suggested to use GET DIAGNOSTICS instead of the simpler FOUND. It's true that EXECUTE does not set the state of FOUND. But RETURN QUERY does. So keep using the simpler FOUND. Related:
Dynamic SQL (EXECUTE) as condition for IF statement
You have format() in your original twice. And while that's typically very useful for dynamic SQL, it's useless in your case. EXECUTE format('%s', q) is exactly the same as just EXECUTE q, with added cost. Both are open doors for SQL injection when passing user input.
While there is a good chance that the transaction might be rolled back, start with the critical step, and do the rest later. Avoid wasting the work. So I moved executing q to the top. Assuming it does not depend on the "payload" row, now inserted later.
Also, INSERT INTO events can be plain SQL. Nothing dynamic there. No need for format() or EXECUTE.
Finally, assuming your jsonb_object_field_text (payload, 'id')::text is just a fancy way of saying payload->>'id'. No need for an additional variable and another SELECT INTO.
Warning against SQL injection
Converting user input (parameter q in the example) to code to execute dynamically is the most direct SQL injection vulnerability of all. I wouldn't want to be caught in my underwear doing that.
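One way to narrow that attack surface, sketched here under assumptions, is to pass identifiers and values separately and not a whole SQL string. The function name update_name_with_event, its parameters, and the assumption that the target table has a name column are all hypothetical. The sketch also shows the row-count check from the question: unlike FOUND, GET DIAGNOSTICS ... ROW_COUNT is set by a plain EXECUTE.

```sql
-- Sketch only: identifiers are quoted by regclass / %I, values go through USING.
CREATE OR REPLACE FUNCTION update_name_with_event(
    _tbl regclass,   -- target table; regclass renders as a safely quoted name
    _id_col text,    -- name of the key column
    _id text,
    _name text,
    payload jsonb)
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
    n bigint;
BEGIN
    EXECUTE format('UPDATE %s SET name = $1 WHERE %I = $2', _tbl, _id_col)
    USING _name, _id;

    -- EXECUTE does not set FOUND, but it does set ROW_COUNT.
    GET DIAGNOSTICS n = ROW_COUNT;
    IF n = 0 THEN
        RAISE EXCEPTION 'No rows affected in %; no event recorded', _tbl;
    END IF;

    INSERT INTO events(event_id, payload)
    VALUES (payload->>'id', payload);
END
$func$;
```

A call might look like SELECT update_name_with_event('users', 'user_id', 'u1', 'hello', '{"id": "e3"}'); when no row matches, the exception aborts the statement and nothing is written to events.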

Recursive function postgres to get the latest ID

I want to use a function in PostgreSQL to get the latest ID related to a history:
CREATE TABLE "tbl_ids" (
"ID" oid,
"Name" text,
"newID" oid
);
After creating this simple table, I have no idea where to start my function. And before you ask: I know about the COALESCE() function, but I'm going to have more than one parent ID in the future.
CREATE FUNCTION get_lastes_id(ID oid, newID oid) RETURNS oid AS $$
BEGIN
IF new IS NOT NULL THEN
--USE old--
END
IF new IS NULL THEN
get_latest_id(new, "newID")
END
END;
I gotta say it because you'd find out anyway: I'm really new in functions with PostgreSQL and I'm not even sure if this is possible. But assuming COALESCE()-Function also exists it has to be a server-side function I guess.
First, it is not clear what you are asking. oid's are probably not the best type to use primarily because they are an internal type designed for the system libraries and therefore you cannot guarantee they will act the way you expect.
Secondly, recursion seems to me a poor choice of tool if you just want to get the latest ID. If you want things to perform well, try to think in set operations rather than imperative algorithms.
If you want a trigger to get the latest (maximum) oid for a name and assign it to "newID" then:
CREATE OR REPLACE FUNCTION set_newID() RETURNS TRIGGER LANGUAGE PLPGSQL AS
$$
DECLARE maxid oid;
BEGIN
IF new."newID" IS NOT NULL THEN
RETURN new; -- do nothing
END IF;
SELECT max("ID") INTO maxid FROM tbl_ids WHERE "Name" = new."Name";
new."newID" = maxid;
RETURN new;
END;
$$;
That works with oids and ints. However it has to select a row from the db on each row modified by the trigger so you will have performance problems with bulk inserts for example.
Oh, and far better to use all lower case so you don't have to quote every identifier.
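If the intent is instead to follow a chain of rows, a set-based way to do that is a recursive CTE. The sketch below assumes that "newID" holds the "ID" of the row that supersedes the current one and is NULL on the latest row; that reading of the table is a guess, and the starting ID 1 is a placeholder.

```sql
-- Sketch only: walk from a starting row along "newID" until it is NULL.
WITH RECURSIVE chain AS (
    SELECT "ID", "newID", 1 AS depth
    FROM tbl_ids
    WHERE "ID" = 1                      -- placeholder starting ID
    UNION ALL
    SELECT t."ID", t."newID", c.depth + 1
    FROM chain c
    JOIN tbl_ids t ON t."ID" = c."newID"
)
SELECT "ID" AS latest_id
FROM chain
ORDER BY depth DESC
LIMIT 1;
```

This does not guard against cycles: if the data ever contains a loop of "newID" references, the query will not terminate on its own.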

Is this generic MERGE/UPSERT function for PostgreSQL safe?

I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.
The plan is to call the function like this:
-- hypothetical "settings" table, with a primary key of (user_id, setting):
SELECT merge(
$$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
$$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);
Here's the full code of the merge() function:
CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
max_iterations INTEGER := 10;
i INTEGER := 0;
num_updated INTEGER;
BEGIN
-- usually returns before re-entering the loop
LOOP
-- first try the update
EXECUTE update_sql;
GET DIAGNOSTICS num_updated = ROW_COUNT;
IF num_updated > 0 THEN
RETURN 'UPDATE';
END IF;
-- nothing was updated: try the insert, watching out for concurrent inserts
BEGIN
EXECUTE insert_sql;
RETURN 'INSERT';
EXCEPTION WHEN unique_violation THEN
-- nop; just loop and try again from the top
END;
-- emergency brake
i := i + 1;
IF i >= max_iterations THEN
RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
EXIT;
END IF;
END LOOP;
END;
$func$
LANGUAGE plpgsql;
It appears to work well enough in my tests, but I'm not certain if I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?
Among the resources I consulted for this function are:
UPDATE/INSERT example 40.2 in the PostgreSQL manual
Why is UPSERT so complicated?
SO: Insert, on duplicate update (postgresql)
(Edit: one of the goals was to avoid locking the target table.)
The answer to your question depends on the context in which your application(s) will access the database. There are many ways to solve this, as nicely discussed in the depesz post you cited yourself ("Why is UPSERT so complicated?"). You might also want to consider writeable CTEs. The question "Insert, on duplicate update in PostgreSQL?" also has some interesting discussion that may help your decision.
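For reference, the writeable-CTE variant for the settings example from the question could be sketched like this. It is a single statement, but it is not a complete answer to the concurrency concern: two sessions can still both find nothing to update and both attempt the insert, in which case one of them gets a unique_violation that the caller has to handle or retry.

```sql
-- Sketch only: try the UPDATE first, INSERT only if it touched no row.
WITH upd AS (
    UPDATE settings
    SET value = 'x'
    WHERE user_id = 42 AND setting = 'foo'
    RETURNING user_id
)
INSERT INTO settings (user_id, setting, value)
SELECT 42, 'foo', 'x'
WHERE NOT EXISTS (SELECT 1 FROM upd);
```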

using a cursor in a trigger

I am trying to write a trigger function that executes when an insert is performed. The condition is, if the id already exists then the time_create should now be the time_dead.
CREATE function archive_temp() returns trigger as '
begin
insert into temporary_archive
values
(
OLD.id,
OLD.time_create,
OLD.time_dead,
OLD.fname,
current_user,
now(),
now(),
TG_OP
);
return null;
end;
' LANGUAGE 'plpgsql';
-----------------------------
CREATE TRIGGER archive_temps
AFTER DELETE OR UPDATE
on temporary_object
FOR EACH ROW
DECLARE
temporary_archive temporary_object.id%type;
begin
if inserting then
select id into temporary_archive
from temporary_object
where id = :old.id;
if temporary_archive is not null then
EXECUTE PROCEDURE archive_temps();
end if;
end if;
It looks to me from your example like you are trying to do things the Oracle way. PostgreSQL is different. I also do not see any uses of cursors anywhere in your code so the question may be misleading.
In PostgreSQL, triggers can only call procedures and the procedures must be largely self-contained. In essence you are going to have to refactor your code a bit. The only accepted syntax after FOR EACH ROW is EXECUTE PROCEDURE and so your other logic will have to be moved into user defined functions. You could, if you want, apply the function in a WHEN condition by calling the function (if your function returns bool). Or you could build it into your trigger logic.
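As a minimal sketch of that shape, the trigger definition itself carries no procedural code and only names the function defined in the question (archive_temp). The WHEN condition shown is a hypothetical placeholder, since the intended condition is not clear from the question.

```sql
-- Sketch only: all logic lives in archive_temp(); the trigger just calls it.
CREATE TRIGGER archive_temps
AFTER UPDATE OR DELETE ON temporary_object
FOR EACH ROW
WHEN (OLD.time_dead IS NULL)          -- hypothetical condition; omit if not needed
EXECUTE PROCEDURE archive_temp();
```

OLD is available in the WHEN clause here because the trigger fires only on UPDATE and DELETE; a trigger that also fires on INSERT could not reference OLD there.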