PostgreSQL generic handler for serialization failure - postgresql

This is a follow-up to this question, so I know I can use (blocking) LOCKs, but I want to use predicate locks and serializable transaction isolation.
What I'd like to have is a generic handler of serialization failures that would retry the function/query X number of times.
As example, I have this:
CREATE SEQUENCE account_id_seq;
CREATE TABLE account
(
id integer NOT NULL DEFAULT nextval('account_id_seq'),
title character varying(40) NOT NULL,
balance integer NOT NULL DEFAULT 0,
CONSTRAINT account_pkey PRIMARY KEY (id)
);
INSERT INTO account (title) VALUES ('Test Account');
CREATE OR REPLACE FUNCTION mytest() RETURNS integer AS $$
DECLARE
cc integer;
BEGIN
cc := balance from account where id=1;
RAISE NOTICE 'Balance: %', cc;
perform pg_sleep(3);
update account set balance = cc+10 where id=1 RETURNING balance INTO cc;
return cc;
END
$$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION myretest() RETURNS integer AS $$
DECLARE
tries integer := 5;
BEGIN
WHILE TRUE LOOP
BEGIN -- nested block for exception
RETURN mytest();
EXCEPTION
WHEN SQLSTATE '40001' THEN
IF tries > 0 THEN
tries := tries - 1;
RAISE NOTICE 'Restart! % left', tries;
ELSE
RAISE EXCEPTION 'NO RESTARTS LEFT';
END IF;
END;
END LOOP;
END
$$
LANGUAGE plpgsql;
So if I call mytest() directly and concurrently, I get a serialization failure on the last commit:
4SO$ psql -c "select mytest()" & PIDA=$! && psql -c "select mytest()" && wait $PIDA
[1] 4909
NOTICE: Balance: 0
NOTICE: Balance: 0
mytest
--------
10
(1 row)
ERROR: could not serialize access due to concurrent update
CONTEXT: SQL statement "update account set balance = cc+10 where id=1 RETURNING balance"
PL/pgSQL function mytest() line 10 at SQL statement
If I call myretest() it should try to execute mytest() up until the 5th try where it would raise the exception.
So I have two points here (where maybe point 2 also invalidates point 1):
myretest() does not work as expected: every iteration results in a serialization_failure exception, even after the concurrent thread finishes. Is there something I should add to "reset" the transaction?
how could I make this (myretest() logic) generic so that it would apply to every called function in the system without the need for "wrapper" functions as such?

Serializable transactions provide exactly what you are looking for as long as you use some framework that starts the transaction over when it receives an error with a SQLSTATE of 40001 or 40P01.
In PostgreSQL a function always runs in the context of a transaction. You can't start a new transaction within the context of a "wrapper" function. That would require a slightly different feature, which is commonly called a "stored procedure" -- something PostgreSQL did not have when this was written (PostgreSQL 11 later added CREATE PROCEDURE with transaction control). Therefore, you need to put the logic to manage the restart into the code that submits the transaction to the database. Fortunately, there are many connectors for that -- Java, perl, python, tcl, ODBC, etc. There is even a connector for making a separate connection to a PostgreSQL database from within a PostgreSQL procedural language, which might allow you to do something like what you want:
http://www.postgresql.org/docs/current/static/dblink.html
I have seen this done in various "client" frameworks. Clearly it is a bad idea to spread this around to all locations where the application is logically dealing with the database, but there are many good reasons to route all database requests through one "accessor" method (or at least a very small number of them), and most frameworks provide a way to deal with this at that layer. (For example, in Spring you would want to create a transaction manager using dependency injection.) That probably belongs in some language you are using for your application logic, but if you really wanted to you could probably use plpgsql and dblink; that's probably not going to be your easiest path, though.
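To make the unit of retry concrete: under SERIALIZABLE the snapshot is taken once per transaction (at its first query) and kept until it ends, so retrying only helps if it starts a brand-new transaction. That is also why myretest() keeps failing: each iteration runs inside the same outer transaction and sees the same stale snapshot. Below is a minimal sketch of the block a client-side retry loop would resubmit as a whole; the setup lines assume a table shaped like the question's account table and exist only so the snippet runs on its own (ON CONFLICT there needs 9.5 or later).

```sql
-- Setup so the sketch is self-contained; same shape as the question's table.
CREATE TABLE IF NOT EXISTS account (
  id      integer PRIMARY KEY,
  title   varchar(40) NOT NULL,
  balance integer NOT NULL DEFAULT 0
);
INSERT INTO account (id, title) VALUES (1, 'Test Account')
ON CONFLICT (id) DO NOTHING;

-- The whole block is the unit of retry. If any statement fails with
-- SQLSTATE 40001 (serialization_failure) or 40P01 (deadlock_detected),
-- the client sends ROLLBACK and resubmits everything from BEGIN.
-- Only the new transaction gets a fresh snapshot.
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE account SET balance = balance + 10 WHERE id = 1 RETURNING balance;
COMMIT;
```

The loop itself (catch the error, check the SQLSTATE, count attempts, resubmit) lives in whichever connector or framework layer routes your database calls.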

Related

PostgreSQL BEFORE INSERT trigger locking behavior in a concurrent environment

I have a general function that can manipulate the sequence of any table (why is irrelevant to my question). It reads the current value, works out the new value, sets it, and returns its calculation, which is what's inserted. This is obviously a multi-step process.
I call it from a BEFORE INSERT trigger on tables where I need it.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
Specifically, does the BEFORE INSERT trigger have to complete before it is called again by another caller?
Logically, I would assume yes, but one never knows what may be going on under the hood.
If the answer is no, what minimal locking would I need on the function to guarantee I can read and write the sequence in a "thread-safe" manner?
I'm using PG 10.
EDIT
Here is the function updated with a lock:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
sv int8;
seq text := format('%I.%I_uts_seq', tg_table_schema, tg_table_name);
BEGIN
EXECUTE format('LOCK TABLE %I IN ROW EXCLUSIVE MODE;', tg_table_name);
EXECUTE 'SELECT last_value+1 FROM ' || seq INTO sv; -- currval(seq) isn't useable
PERFORM setval(seq, GREATEST(sv, (EXTRACT(epoch FROM localtimestamp) * 1000000)::int8), false);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
However, the INSERT that fires the trigger already acquires ROW EXCLUSIVE (a plain SELECT only takes ACCESS SHARE), so this statement may be redundant and a stronger lock may be needed. Or, conversely, it may mean no lock is needed.
UPDATE
If I am reading this SO question correctly, my original version without the LOCK should work since the trigger acquires the same lock my updated function is redundantly taking.
All I need to know is am I guaranteed that the function will be called by only one caller at a time in a multi-user environment?
No. Calling a function doesn't serialize anything by itself, but you can achieve this behaviour with the SERIALIZABLE transaction isolation level:
This level emulates serial transaction execution for all committed
transactions; as if transactions had been executed one after another,
serially, rather than concurrently
But this approach introduces several tradeoffs, such as preparing your application to retry transactions that fail with a serialization failure.
Maybe I missed something, but I really believe that you just need NEXTVAL, something like below:
CREATE OR REPLACE FUNCTION public.uts_set()
RETURNS TRIGGER AS
$$
DECLARE
sv int8;
-- First, use the %I placeholder for identifiers instead of %s
seq text := format('%I.%I', tg_table_schema, tg_table_name || '_uts_seq');
BEGIN
-- Second, you can't call CURRVAL in a session
-- that hasn't called NEXTVAL on that sequence yet
sv := NEXTVAL(seq);
-- Do your logic here...
-- The result is ignored since this is a statement-level trigger
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Remember that CURRVAL is session-local while NEXTVAL works across all sessions, so you have a reliable, thread-safe mechanism in hand.
The sequence itself handles thread safety with concurrent sessions. So it really comes down to the code that is interacting with the sequence. The following code is thread safe:
SELECT nextval('myseq');
If the code is doing much fancier things with the sequence, like setval and currval, I would be more worried about that in a high-transaction, multi-user environment. Each individual call such as nextval or setval is atomic; what is not protected is a read-then-setval sequence made of several separate calls.
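If the read-then-setval logic from the question has to stay, one way to make it safe is to serialize it explicitly. Note that ROW EXCLUSIVE does not conflict with itself, so the LOCK TABLE in the question's edit serializes nothing. The sketch below swaps in a transaction-level advisory lock (pg_advisory_xact_lock) keyed on the sequence's OID; neither answer proposed this. It assumes a statement-level BEFORE INSERT trigger and a sequence named <table>_uts_seq used as the column default; uts_demo and uts_set_locked() are hypothetical names.

```sql
-- Hypothetical demo table whose id default draws from <table>_uts_seq.
CREATE TABLE uts_demo (id int8, note text);
CREATE SEQUENCE uts_demo_uts_seq;
ALTER TABLE uts_demo ALTER COLUMN id SET DEFAULT nextval('uts_demo_uts_seq');

CREATE OR REPLACE FUNCTION uts_set_locked()
RETURNS trigger
LANGUAGE plpgsql AS
$$
DECLARE
  seq regclass := format('%I.%I', tg_table_schema, tg_table_name || '_uts_seq')::regclass;
  sv  int8;
BEGIN
  -- Block other callers using this sequence until our transaction ends,
  -- so the read and the setval below cannot interleave with theirs.
  PERFORM pg_advisory_xact_lock(seq::oid::int8);
  EXECUTE format('SELECT last_value + 1 FROM %s', seq) INTO sv;
  PERFORM setval(seq, GREATEST(sv, (EXTRACT(epoch FROM localtimestamp) * 1000000)::int8), false);
  RETURN NULL;  -- ignored for statement-level triggers
END
$$;

-- EXECUTE PROCEDURE works on PG 10; PG 11+ also accepts EXECUTE FUNCTION.
CREATE TRIGGER uts_demo_uts_set
BEFORE INSERT ON uts_demo
FOR EACH STATEMENT EXECUTE PROCEDURE uts_set_locked();

INSERT INTO uts_demo (note) VALUES ('a'), ('b');
```

Because the lock is held until commit, concurrent inserts into the same table are fully serialized, which is the price of keeping the multi-step logic.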

Persisting some changes before raising an exception in PL/pgSQL function

I'm trying to write a PL/pgSQL function in which I first validate parameters (mostly check whether the supplied ids exist).
When one of these validations fails, I raise an exception stating the reason so the client code can try again.
The problem I'm facing is that, for safety reasons (I can provide more context if needed, but basically I want to leave the app in a non-functional state until specialized intervention), I'd like to write some values to a table before raising the exception and rolling back the changes. It's only some of these changes that I'd like persisted (not rolled back).
I understand transactions cannot be used inside the function because there's no context for them, and I found that I could probably do what I want using dblink (which I only just found out about).
The thing is it really feels hackish, so I'd like to ask if this is a reasonable idea or not.
Here's some pseudocode to illustrate:
CREATE FUNCTION func(x_id INT) RETURNS INT AS $$
DECLARE
BEGIN
PERFORM * FROM x_table WHERE id = x_id;
IF NOT FOUND THEN
-- write persisting values that will prevent further
-- use, probably using dblink
RAISE EXCEPTION 'Invalid x_id: %', x_id;
END IF;
-- function logic
END;
$$ LANGUAGE plpgsql;
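A minimal sketch of the dblink idea, assuming the dblink extension is available and that the function may open a second connection to the same database with 'dbname=' || current_database() (non-superusers generally need a password in that connection string). The app_lockout table and the placeholder RETURN are hypothetical. The INSERT runs on its own connection and commits there, so it survives the rollback triggered by the RAISE.

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- Hypothetical tables for the sketch.
CREATE TABLE IF NOT EXISTS x_table (id int PRIMARY KEY);
CREATE TABLE IF NOT EXISTS app_lockout (
  id        serial PRIMARY KEY,
  reason    text NOT NULL,
  logged_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION func(x_id int) RETURNS int
LANGUAGE plpgsql AS
$$
BEGIN
  PERFORM 1 FROM x_table WHERE id = x_id;
  IF NOT FOUND THEN
    -- Runs over a separate connection and commits there, so this row
    -- is kept even though the RAISE below rolls back our transaction.
    PERFORM dblink_exec(
      'dbname=' || current_database(),
      format('INSERT INTO app_lockout (reason) VALUES (%L)',
             'Invalid x_id: ' || x_id));
    RAISE EXCEPTION 'Invalid x_id: %', x_id;
  END IF;
  RETURN x_id;  -- placeholder for the real function logic
END
$$;
```

The cost is an extra connection per failure and a dependency on connection credentials inside the database, which is part of why this pattern feels hackish.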

Is this generic MERGE/UPSERT function for PostgreSQL safe?

I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.
The plan is to call the function like this:
-- hypothetical "settings" table, with a primary key of (user_id, setting):
SELECT merge(
$$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
$$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);
Here's the full code of the merge() function:
CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
max_iterations INTEGER := 10;
i INTEGER := 0;
num_updated INTEGER;
BEGIN
-- usually returns before re-entering the loop
LOOP
-- first try the update
EXECUTE update_sql;
GET DIAGNOSTICS num_updated = ROW_COUNT;
IF num_updated > 0 THEN
RETURN 'UPDATE';
END IF;
-- nothing was updated: try the insert, watching out for concurrent inserts
BEGIN
EXECUTE insert_sql;
RETURN 'INSERT';
EXCEPTION WHEN unique_violation THEN
-- nop; just loop and try again from the top
END;
-- emergency brake
i := i + 1;
IF i >= max_iterations THEN
RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
EXIT;
END IF;
END LOOP;
END;
$func$
LANGUAGE plpgsql;
It appears to work well enough in my tests, but I'm not certain I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?
Among the resources I consulted for this function are:
UPDATE/INSERT example 40.2 in the PostgreSQL manual
Why is UPSERT so complicated?
SO: Insert, on duplicate update (postgresql)
(Edit: one of the goals was to avoid locking the target table.)
The answer to your question depends on the context in which your application(s) will access the database. There are many ways to solve this, as nicely discussed in depesz's post that you cited yourself. In addition, you might also want to consider using writable CTEs; see here. The question "Insert, on duplicate update in PostgreSQL?" also has some interesting discussion for your decision-making process.
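For comparison: PostgreSQL 9.5 and later have a built-in upsert, INSERT ... ON CONFLICT, which gives an atomic insert-or-update outcome under concurrency without a retry loop or a table lock. A sketch for the question's hypothetical settings table (the table definition is assumed from the comment in the question); it requires a unique index or constraint matching the conflict target.

```sql
-- Hypothetical "settings" table with the primary key from the question.
CREATE TABLE settings (
  user_id integer NOT NULL,
  setting text    NOT NULL,
  value   text,
  PRIMARY KEY (user_id, setting)
);

-- Inserts the row, or updates value if (user_id, setting) already exists.
INSERT INTO settings (user_id, setting, value)
VALUES (42, 'foo', 'x')
ON CONFLICT (user_id, setting) DO UPDATE
SET value = EXCLUDED.value;
```

Unlike merge(), this takes one statement rather than two SQL strings, so the generator would emit the ON CONFLICT form directly.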

PostgreSQL cannot begin/end transactions in PL/pgSQL

I am seeking clarification of how to ensure an atomic transaction in a plpgsql function, and where the isolation level is set for this particular change to the database.
In the plpgsql function shown below, I want to make sure that BOTH the deletion AND the insertion succeed. I am getting an error when I try to wrap them in a single transaction:
ERROR: cannot begin/end transactions in PL/pgSQL
What happens during execution of the function below if another user has added a default behavior for circumstances ('RAIN', 'NIGHT', '45MPH') after this function has deleted the custom row but before it has had a chance to insert the custom row? Is there an implicit transaction wrapping the insert and delete so that both are rolled back if another user has changed either of the rows referenced by this function? Can I set the isolation level for this function?
create function foo(v_weather varchar(10), v_timeofday varchar(10), v_speed varchar(10),
v_behavior varchar(10))
returns setof CUSTOMBEHAVIOR
as $body$
begin
-- run-time error if either of these lines is un-commented
-- start transaction ISOLATION LEVEL READ COMMITTED;
-- or, alternatively, set transaction ISOLATION LEVEL READ COMMITTED;
delete from CUSTOMBEHAVIOR
where weather = 'RAIN' and timeofday = 'NIGHT' and speed= '45MPH' ;
-- if there is no default behavior insert a custom behavior
if not exists
(select id from DEFAULTBEHAVIOR where a = 'RAIN' and b = 'NIGHT' and c= '45MPH') then
insert into CUSTOMBEHAVIOR
(weather, timeofday, speed, behavior)
values
(v_weather, v_timeofday, v_speed, v_behavior);
end if;
return QUERY
select * from CUSTOMBEHAVIOR where ... ;
-- commit;
end
$body$ LANGUAGE plpgsql;
A plpgsql function automatically runs inside a transaction. It all succeeds or it all fails. The manual:
Functions and trigger procedures are always executed within a
transaction established by an outer query — they cannot start or
commit that transaction, since there would be no context for them to
execute in. However, a block containing an EXCEPTION clause
effectively forms a subtransaction that can be rolled back without
affecting the outer transaction. For more about that see Section 42.6.6.
So, if you need to, you can catch an exception that theoretically might occur (but is very unlikely).
Details on trapping errors in the manual.
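A small demonstration of the subtransaction behaviour quoted above, using a hypothetical table subtx_demo: the row inserted inside the block with the EXCEPTION clause is rolled back when the error is caught, while the row inserted before that block survives as part of the outer transaction.

```sql
CREATE TABLE subtx_demo (id int PRIMARY KEY);

DO $$
BEGIN
  INSERT INTO subtx_demo VALUES (1);      -- part of the outer transaction
  BEGIN
    INSERT INTO subtx_demo VALUES (2);    -- part of the subtransaction
    INSERT INTO subtx_demo VALUES (1);    -- duplicate key: unique_violation
  EXCEPTION WHEN unique_violation THEN
    RAISE NOTICE 'inner block rolled back: %', SQLERRM;
  END;
END
$$;

SELECT id FROM subtx_demo;  -- only 1: row 2 was undone with the subtransaction
```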
Your function reviewed and simplified:
CREATE FUNCTION foo(v_weather text
, v_timeofday text
, v_speed text
, v_behavior text)
RETURNS SETOF custombehavior
LANGUAGE plpgsql AS
$func$
BEGIN
DELETE FROM custombehavior
WHERE weather = 'RAIN'
AND timeofday = 'NIGHT'
AND speed = '45MPH';
INSERT INTO custombehavior (weather, timeofday, speed, behavior)
SELECT v_weather, v_timeofday, v_speed, v_behavior
WHERE NOT EXISTS (
SELECT FROM defaultbehavior
WHERE a = 'RAIN'
AND b = 'NIGHT'
AND c = '45MPH'
);
RETURN QUERY
SELECT * FROM custombehavior WHERE ... ;
END
$func$;
If you actually need to begin/end transactions like indicated in the title look to SQL procedures in Postgres 11 or later (CREATE PROCEDURE). See:
In PostgreSQL, what is the difference between a “Stored Procedure” and other types of functions?
Update: since PostgreSQL 11, you can control transactions inside a stored procedure (CREATE PROCEDURE).
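A minimal sketch of transaction control in a procedure (PostgreSQL 11 or later). It assumes CALL is issued at the top level, outside an explicit transaction block and not from inside a function, since COMMIT is rejected there (and inside a block with an EXCEPTION clause). The table proc_demo and the procedure fill_in_batches are hypothetical.

```sql
CREATE TABLE proc_demo (batch int, n int);

CREATE OR REPLACE PROCEDURE fill_in_batches(batches int)
LANGUAGE plpgsql AS
$$
BEGIN
  FOR b IN 1..batches LOOP
    INSERT INTO proc_demo (batch, n) SELECT b, g FROM generate_series(1, 3) AS g;
    COMMIT;  -- this batch stays committed even if a later one fails
  END LOOP;
END
$$;

CALL fill_in_batches(2);
```

Each COMMIT ends the current transaction and starts a new one, which a function cannot do.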
=====
Before version 11:
START TRANSACTION;
select foo() ;
COMMIT;
"Unfortunately Postgres has no stored procedures, so you always need to manage the transaction in the calling code" – a_horse_with_no_name
Transaction in an exception block - how?

postgresql privileges Ensuring inserts are only done through functions

Let's say I have a table persons, which contains only a name (varchar), and a user client.
I'd like that the only way for client to insert to persons is through the function:
CREATE OR REPLACE FUNCTION add_a_person(a_name character varying)
RETURNS void AS
$BODY$
BEGIN
INSERT INTO persons VALUES(a_name);
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
So, I don't want to grant client insert privileges on persons and only give execute privilege for add_a_person.
But if I don't, I get a permission denied error because of the INSERT inside the function.
I haven't found a way to do this in the Postgres documentation about granting privileges.
Is there a way to do this?
You can define the function with SECURITY DEFINER. This will allow the function to run for the restricted user as if they had the higher privileges of the function's creator (which needs to be able to insert into the table).
The last line of the definition would look like this:
LANGUAGE plpgsql VOLATILE COST 100 SECURITY DEFINER;
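A fuller sketch of the SECURITY DEFINER approach, following the hardening the PostgreSQL documentation recommends for such functions: pin search_path with pg_temp last, and revoke the default EXECUTE privilege from PUBLIC before granting it to the intended role. It assumes the function owner can insert into persons, that persons lives in the public schema, and that the role client exists (the CREATE ROLE line is only for a fresh database).

```sql
CREATE ROLE client LOGIN;  -- omit if the role already exists
CREATE TABLE persons (name varchar NOT NULL);

CREATE OR REPLACE FUNCTION add_a_person(a_name varchar)
RETURNS void
LANGUAGE plpgsql
VOLATILE
SECURITY DEFINER
-- Trusted schema first, pg_temp last, so callers cannot shadow "persons".
SET search_path = public, pg_temp
AS $BODY$
BEGIN
  INSERT INTO persons (name) VALUES (a_name);
END;
$BODY$;

-- New functions are executable by PUBLIC by default; remove that first.
REVOKE EXECUTE ON FUNCTION add_a_person(varchar) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION add_a_person(varchar) TO client;
-- Deliberately no INSERT privilege on persons for client.
```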
This is a bit simplistic, but assuming you are running 9.2 or later, this is an example of how to check for a single permitted function doing an insert:
CREATE TABLE my_table (col1 text, col2 integer, col3 timestamp);
CREATE FUNCTION my_table_insert_function(col1 text, col2 integer) RETURNS integer AS $$
BEGIN
INSERT INTO my_table VALUES (col1, col2, current_timestamp);
RETURN 1;
END $$ LANGUAGE plpgsql;
CREATE FUNCTION my_table_insert_trigger_function() RETURNS trigger AS $$
DECLARE
stack text;
fn integer;
BEGIN
RAISE EXCEPTION 'secured';
EXCEPTION WHEN OTHERS THEN
BEGIN
GET STACKED DIAGNOSTICS stack = PG_EXCEPTION_CONTEXT;
fn := position('my_table_insert_function' in stack);
IF (fn <= 0) THEN
RAISE EXCEPTION 'Expecting insert from my_table_insert_function'
USING HINT = 'Use function to insert data';
END IF;
RETURN new;
END;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER my_table_insert_trigger BEFORE INSERT ON my_table
FOR EACH ROW EXECUTE PROCEDURE my_table_insert_trigger_function();
And a quick example of usage:
INSERT INTO my_table VALUES ('test one', 1, current_timestamp); -- FAILS
SELECT my_table_insert_function('test one', 1); -- SUCCEEDS
You'll want to peek into the stack in more detail if you want your code to be more robust, secure, etc. Checks for multiple functions are possible, of course, but involve more work. Splitting the stack into multiple lines and parsing it can be fairly involved, so you'll probably want some helper functions if things get more complex.
This is just a proof of concept, but it does what it claims. I would expect this code to be fairly slow given the use of exception handling and stack inspection, so don't use it in performance-critical parts of your application. It's not likely to be suitable for cases where DML statements are frequent, but if security is more important than performance, go for it.
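On PostgreSQL 9.4 or later, GET DIAGNOSTICS ... = PG_CONTEXT returns the current call stack directly, so the trigger no longer has to raise and trap an exception just to read it, which removes the subtransaction overhead. A sketch under that assumption, reusing the names from the answer above; the check is as naive as the original (it only looks for the function name anywhere in the context text), so it remains a proof of concept.

```sql
CREATE TABLE my_table (col1 text, col2 integer, col3 timestamp);

CREATE FUNCTION my_table_insert_function(col1 text, col2 integer) RETURNS integer AS $$
BEGIN
  INSERT INTO my_table VALUES (col1, col2, current_timestamp);
  RETURN 1;
END $$ LANGUAGE plpgsql;

CREATE FUNCTION my_table_insert_trigger_function() RETURNS trigger AS $$
DECLARE
  stack text;
BEGIN
  -- Call stack of the current statement, no exception needed (9.4+).
  GET DIAGNOSTICS stack = PG_CONTEXT;
  IF position('my_table_insert_function' IN stack) = 0 THEN
    RAISE EXCEPTION 'Expecting insert from my_table_insert_function'
      USING HINT = 'Use function to insert data';
  END IF;
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_insert_trigger BEFORE INSERT ON my_table
FOR EACH ROW EXECUTE PROCEDURE my_table_insert_trigger_function();
```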
Matthew's answer is correct in that a SECURITY DEFINER will allow the function to run with the privileges of a different user. Documentation for this is at http://www.postgresql.org/docs/9.1/static/sql-createfunction.html
Why are you trying to implement security this way? If you want to enforce some logic on the inserts, then I would strongly recommend doing it with constraints. http://www.postgresql.org/docs/9.1/static/ddl-constraints.html
If you want substantially higher levels of logic than can be reasonably implemented in constraints, I would suggest looking into building a business logic layer between your presentation layer and the data storage layer. You will find that scalability demands this pretty much instantly.
If your goal is to defend against SQL injection then you have found a way that might work, but that will create a heck of a lot of work for you. Worse, it leads to huge volumes of really mindless code that all has to be kept in sync across schema changes. This is pretty rough if you're trying to do anything agile. Consider instead using a programming framework that takes advantage of PREPARE / EXECUTE, which is pretty much all of them at this point.
http://www.postgresql.org/docs/9.0/static/sql-prepare.html