How bad are savepoints in PostgreSQL?

We are in the process of migrating an existing Java application from Oracle to PostgreSQL, and we have recently noticed the following "unexpected" behaviour in PostgreSQL:
session#1:
db=> begin;
BEGIN
db=*> select r_object_id from dm_sysobject_s where r_object_id='08000000800027d6' for update;
r_object_id
------------------
08000000800027d6
session#2:
db=> begin;
BEGIN
db=*> select r_object_id from dm_sysobject_s where r_object_id='08000000800027d6' for update nowait;
ERROR: could not obtain lock on row in relation "dm_sysobject_s"
db=!> select r_object_id from dm_sysobject_s where r_object_id='08000000800027d6' for update nowait;
ERROR: current transaction is aborted, commands ignored until end of transaction block
while in Oracle everything works as expected:
session#1:
SQL> set autocommit off;
SQL> select r_object_id from dm_sysobject_s where r_object_id='0800012d80000122' for update;
R_OBJECT_ID
----------------
0800012d80000122
session#2:
SQL> set autocommit off;
SQL> select r_object_id from dm_sysobject_s where r_object_id='0800012d80000122' for update nowait;
select r_object_id from dm_sysobject_s where r_object_id='0800012d80000122' for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
SQL> select r_object_id from dm_sysobject_s where r_object_id='0800012d80000122' for update nowait;
select r_object_id from dm_sysobject_s where r_object_id='0800012d80000122' for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
SQL> commit;
I have already read the PSQLException: current transaction is aborted, commands ignored until end of transaction block topic, and my understanding is that the only way to preserve the previous Oracle behaviour (i.e. keeping the transaction active after an unsuccessful attempt to lock a row) in the DB layer is to use savepoints. OK, so far so good, and the code below does what is expected, at least when the Supplier<T> supplier performs DB operations only (I do understand the risk of discrepancies between the DB and the persistence context when I perform more sophisticated operations backed by a savepoint):
@Override
public <T> T withSavepoint(SessionImplementor session, Supplier<T> supplier) {
    return session.doReturningWork(connection -> {
        DatabaseMetaData metaData = connection.getMetaData();
        if (!metaData.supportsSavepoints()) {
            // no savepoint support: just run the work unprotected
            return supplier.get();
        }
        boolean success = false;
        Savepoint savepoint = null;
        try {
            savepoint = connection.setSavepoint();
            T result = supplier.get();
            success = true;
            return result;
        } finally {
            if (savepoint != null) {
                if (!success) {
                    // undo the failed work but keep the surrounding transaction alive
                    connection.rollback(savepoint);
                }
                connection.releaseSavepoint(savepoint);
            }
        }
    });
}
After some research I discovered that the implementation of savepoints in PostgreSQL may cause severe performance issues; for example:
Why we spent the last month eliminating PostgreSQL subtransactions
PostgreSQL Subtransactions and performance
PostgreSQL Subtransactions Considered Harmful
However, none of those blog posts actually explains which savepoint patterns are safe and which aren't, so my question is the following:
Is it safe to use the following savepoint pattern in PostgreSQL or not:
savepoint s1;
select id from tbl where id=? for update nowait;
rollback to/release s1;
I do see that it is not possible to avoid XID growth; however, I'm not sure about its performance impact. What about other pitfalls?

It is safe to use that pattern in that it will not produce wrong results or break the database, but it is guaranteed to lead to terrible performance. Do not set savepoints for each individual statement. You have to think more carefully and set them only where you really need them, that is, before statements that are expected to sometimes fail, but that shouldn't abort the transaction when they fail.
To answer your question, and to quote from my article: the process array, which is stored in shared memory and contains information about all currently running backends, has room for at most 64 non-aborted subtransactions per session. After that, the subtransaction IDs spill to disk, which is bad for performance. So don't set more than 64 savepoints per transaction, and remember that the PL/pgSQL construct BEGIN ... EXCEPTION ... END is implemented with a savepoint.
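To make that concrete, here is a minimal sketch of the recommended pattern, reusing the table and column from the pattern in the question:
BEGIN;
-- set the savepoint only before the statement that is expected to fail
SAVEPOINT lock_attempt;
SELECT id FROM tbl WHERE id = 1 FOR UPDATE NOWAIT;
-- on "could not obtain lock": ROLLBACK TO SAVEPOINT lock_attempt;
-- the surrounding transaction stays usable either way
RELEASE SAVEPOINT lock_attempt;
COMMIT;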

Related

When will a select query acquire ExclusiveLock and RowExclusiveLock in PostgreSQL?

According to the official documentation, a SELECT query only needs an ACCESS SHARE lock, but I found that my SELECT query acquired an exclusive lock. How did that happen? Here is my SELECT query:
select ga.id
from group_access_strategy ga
left outer join person_group pg on ga.person_group_id=pg.id
where pg.id=3
What is different from the official documentation is that I added a left join.
Most likely you ran another command, such as ALTER TABLE person_group ... (ACCESS EXCLUSIVE) or an UPDATE/INSERT/DELETE (ROW EXCLUSIVE), in the same transaction. Locks persist until the transaction is committed or aborted.
So if you ran:
BEGIN; -- BEGIN starts the transaction
UPDATE group_access_strategy SET some_column = 'some data' WHERE id = 1;
SELECT
    ga.id
FROM
    group_access_strategy ga
    LEFT OUTER JOIN person_group pg ON (ga.person_group_id = pg.id)
WHERE
    pg.id = 3;
The UPDATE statement would have created a Row Exclusive Lock that will not be released until you end the transaction by:
Saving all of the changes made since BEGIN:
COMMIT;
OR
nullifying any of the effects of statements since BEGIN with
ROLLBACK;
If you're new to Postgres and typically run your queries in an IDE like pgAdmin or DataGrip, the BEGIN / COMMIT / ROLLBACK commands are issued behind the scenes for you when you click the corresponding UI buttons.
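If you want to check which locks your session is actually holding, here is a quick sketch against the pg_locks system view (run it inside the still-open transaction):
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid();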

How to prevent or avoid running update and delete statements without where clauses in PostgreSQL

How to prevent or avoid running update or delete statements without where clauses in PostgreSQL?
Something like the SQL_SAFE_UPDATES setting in MySQL is needed for PostgreSQL.
For example:
UPDATE table_name SET active=1; -- Prevent this statement or throw error message.
UPDATE table_name SET active=1 WHERE id=1; -- This is allowed
My company's database has many users with INSERT and UPDATE privileges, and any one of them could run such an unsafe update.
How can this scenario be handled?
Is there any way, for example a trigger or an extension, to catch such unsafe updates in PostgreSQL?
I have switched off autocommit to avoid these errors, so I always have a transaction that I can roll back. All you have to do is modify .psqlrc:
\set AUTOCOMMIT off
\echo AUTOCOMMIT = :AUTOCOMMIT
\set PROMPT1 '%[%033[32m%]%/%[%033[0m%]%R%[%033[1;32;40m%]%x%[%033[0m%]%# '
\set PROMPT2 '%[%033[32m%]%/%[%033[0m%]%R%[%033[1;32;40m%]%x%[%033[0m%]%# '
\set PROMPT3 '>> '
You don't have to include the PROMPT settings, but they are helpful because they change the psql prompt to show the transaction status.
Another advantage of this approach is that it gives you a chance to prevent any erroneous changes.
Example (psql):
database=# SELECT * FROM my_table; -- implicit start transaction; see prompt
-- output result
database*# UPDATE my_table SET my_column = 1; -- missed where clause
UPDATE 525125 -- Oh, no!
database*# ROLLBACK; -- Phew! revert the wrong changes
ROLLBACK
database=# -- I'm completely operational and all of my circuits working perfectly
There actually was a discussion on the hackers list about this very feature. It had a mixed reception, but might have been accepted if the author had persisted.
As it is, the best you can do is a statement-level trigger that bleats if you modify too many rows:
CREATE TABLE deleteme
AS SELECT i FROM generate_series(1, 1000) AS i;

CREATE FUNCTION stop_mass_deletes() RETURNS trigger
    LANGUAGE plpgsql AS
$$BEGIN
    IF (SELECT count(*) FROM OLD) > TG_ARGV[0]::bigint THEN
        RAISE EXCEPTION 'must not modify more than % rows', TG_ARGV[0];
    END IF;
    RETURN NULL;
END;$$;

CREATE TRIGGER stop_mass_deletes AFTER DELETE ON deleteme
    REFERENCING OLD TABLE AS old FOR EACH STATEMENT
    EXECUTE FUNCTION stop_mass_deletes(10);

DELETE FROM deleteme WHERE i < 100;
ERROR: must not modify more than 10 rows
CONTEXT: PL/pgSQL function stop_mass_deletes() line 1 at RAISE

DELETE FROM deleteme WHERE i < 10;
DELETE 9
This will have a certain performance impact on deletes.
This works from v10 on, when transition tables were introduced (on v10 itself, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION).
If you can afford to make it a little less convenient for your users, you might try revoking the UPDATE privilege from all "standard" users and creating a stored procedure like this:
CREATE FUNCTION safe_update(table_name text, col_name text, new_value text, condition text)
RETURNS void
LANGUAGE plpgsql SECURITY DEFINER AS
$$BEGIN
    -- check that the condition is acceptable, then create and run the UPDATE statement
    IF coalesce(trim(condition), '') = '' THEN
        RAISE EXCEPTION 'refusing to UPDATE without a WHERE condition';
    END IF;
    EXECUTE format('UPDATE %I SET %I = %L WHERE %s', table_name, col_name, new_value, condition);
END;$$;
Because of SECURITY DEFINER, your users will be able to UPDATE this way despite not having the UPDATE privilege.
I'm not sure if this is a good approach, but this way you can enforce UPDATE (or anything else) requirements as strict as you wish.
Of course, the more complicated the required UPDATEs are, the more complicated your procedure has to be, but if this is mostly about updating a single row by ID (as in your example), this might be worth a try.
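A hypothetical call of the sketch above, mirroring the allowed UPDATE from the question:
SELECT safe_update('table_name', 'active', '1', 'id = 1');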

What is a good way of rolling back a transaction in Postgres

I want to insert data into a table from a staging table but keep the data unchanged if an error happens.
What I have is a working happy path
Begin transaction;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging ;
Commit transaction;
If the INSERT statement fails, how can I roll back the transaction?
PostgreSQL transactions roll back on error automatically; see this.
Atomicity − Ensures that all operations within the work unit are
completed successfully; otherwise, the transaction is aborted at the
point of failure and previous operations are rolled back to their
former state.
Consistency − Ensures that the database properly changes states upon a
successfully committed transaction.
Isolation − Enables transactions to operate independently of and
transparent to each other.
Durability − Ensures that the result or effect of a committed
transaction persists in case of a system failure.
You can rollback a Postgres transaction using the ROLLBACK [WORK | TRANSACTION] statement:
Begin transaction;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging ;
Rollback transaction;
All the SQL commands are case-insensitive and the transaction part of the statement is optional, but I like to include it for clarity.
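To see the automatic rollback in action, here is a minimal psql sketch (assuming the INSERT fails, e.g. because of a constraint violation):
BEGIN;
DELETE FROM mytable;
INSERT INTO mytable SELECT * FROM mytable_staging; -- suppose this raises an ERROR
COMMIT; -- psql reports ROLLBACK: mytable is left unchanged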

Locking in Postgres function

Let's say I have a transactions table and a transaction_summary table. I have created the following trigger to update the transaction_summary table.
CREATE OR REPLACE FUNCTION doSomeThing() RETURNS TRIGGER AS
$BODY$
DECLARE
    rec_cnt bigint;
BEGIN
    -- lock the rows which have to be updated
    SELECT count(1) FROM (SELECT 1 FROM transaction_summary WHERE receiver = new.receiver FOR UPDATE) r INTO rec_cnt;
    IF rec_cnt = 0 THEN
        -- if there are no rows, create a new entry in the summary table;
        -- lock the whole table
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE receiver = new.receiver;
    END IF;

    SELECT count(1) FROM (SELECT 1 FROM transaction_summary WHERE sender = new.sender FOR UPDATE) r INTO rec_cnt;
    IF rec_cnt = 0 THEN
        LOCK TABLE "transaction_summary" IN ACCESS EXCLUSIVE MODE;
        INSERT INTO transaction_summary( ... ) VALUES ( ... );
    ELSE
        UPDATE transaction_summary SET ... WHERE sender = new.sender;
    END IF;

    RETURN new;
END;
$BODY$
LANGUAGE plpgsql;
Question: will there be a deadlock? According to my understanding, a deadlock might happen like this:
_________
|__table__| <- executor #1 waits on executor #2 to be able to lock the whole table AND
|_________| executor #2 waits on executor #1 to be able to lock the whole table
|_________|
|_________| <- row is locked by executor #1
|_________|
|_________| <- row is locked by executor #2
It seems that the only option is to lock the whole table at the beginning of every transaction.
Are your 'SELECT 1 FROM transactions WHERE ...' queries meant to access 'transaction_summary' instead? Also, notice that those two queries can at least theoretically deadlock each other if two DB transactions insert two 'transactions' rows with new.sender1=new.receiver2 and new.receiver1=new.sender2.
You can't, in general, guarantee that you won't get a deadlock from a database. Even if you try to prevent them by writing your queries carefully (e.g., ordering updates), you can still get caught out because you can't control the order of INSERT/UPDATE, or of constraint checks. In any case, comparing every transaction against every other to check for deadlocks doesn't scale as your application grows.
So, your code should always be prepared to re-run transactions when you get 'deadlock detected' errors. If you do that and you think that conflicting transactions will be uncommon then you might as well let your deadlock handling code deal with it.
If you think deadlocks will be common then it might cause you a performance problem - although contending on a big table lock could be, too. Here are some options:
If new.receiver and new.sender are, for example, the IDs of rows in a MyUsers table, you could require all code which inserts into 'transaction_summary' to first do 'SELECT 1 FROM MyUsers WHERE id IN (user1, user2) FOR UPDATE'. It'll break if someone forgets, but so will your table locking. By doing it that way you'll swap one big table lock for many separate row locks.
Add UNIQUE constraints to transaction_summary and look for the error when it's violated. You should probably add constraints anyway, even if you handle this another way. It'll detect bugs.
You could allow duplicate transaction_summary rows, and require users of that table to add them up. Messy, and easy for developers who don't know to create bugs (though you could add a view which does the adding). But if you really can't take the performance hit of locking and deadlocks you could do it.
You could try the SERIALIZABLE transaction isolation level and take out the table locks. By my reading, the SELECT ... FOR UPDATE should create a predicate lock (and so should a plain SELECT). That'd stop any other transaction that does a conflicting insert from committing successfully. However, using SERIALIZABLE throughout your application will cost you performance and give you a lot more transactions to retry.
Here's how SERIALIZABLE transaction isolation level works:
create table test (id serial, x integer, total integer); ...
Transaction 1:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 100 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
id | x | total
----+---+-------
1 | 3 | 100
(1 row)
DB=# commit;
COMMIT
Transaction 2, interleaved line for line with the first:
DB=# begin transaction isolation level serializable;
BEGIN
DB=# insert into test (x, total) select 3, 200 where not exists (select true from test where x=3);
INSERT 0 1
DB=# select * from test;
id | x | total
----+---+-------
2 | 3 | 200
(1 row)
DB=# commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.

Are PostgreSQL functions transactional?

Is a PostgreSQL function such as the following automatically transactional?
CREATE OR REPLACE FUNCTION refresh_materialized_view(name)
  RETURNS integer AS
$BODY$
DECLARE
    _table_name ALIAS FOR $1;
    _entry materialized_views%ROWTYPE;
    _result INT;
BEGIN
    EXECUTE 'TRUNCATE TABLE ' || _table_name;

    UPDATE materialized_views
    SET last_refresh = CURRENT_TIMESTAMP
    WHERE table_name = _table_name;

    RETURN 1;
END
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
In other words, if an error occurs during the execution of the function, will any changes be rolled back? If this isn't the default behavior, how can I make the function transactional?
PostgreSQL 11 update: there is limited support for top-level PROCEDUREs that can do transaction control. You still cannot manage transactions in regular SQL-callable functions, so the below remains true except when using the new top-level procedures.
Functions are part of the transaction they're called from. Their effects are rolled back if the transaction rolls back. Their work commits if the transaction commits. Any BEGIN ... EXCEPTION blocks within the function operate like (and under the hood use) savepoints, just like the SAVEPOINT and ROLLBACK TO SAVEPOINT SQL statements.
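A small sketch of that BEGIN ... EXCEPTION behaviour (the function name is made up for illustration):
CREATE OR REPLACE FUNCTION divide_or_null(a int, b int) RETURNS int
LANGUAGE plpgsql AS
$$BEGIN
    RETURN a / b;
EXCEPTION WHEN division_by_zero THEN
    -- the block's implicit savepoint is rolled back; the outer transaction survives
    RETURN NULL;
END;$$;
SELECT divide_or_null(1, 0); -- returns NULL instead of aborting the transaction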
The function either succeeds in its entirety or fails in its entirety, barring BEGIN ... EXCEPTION error handling. If an error is raised within the function and not handled, the transaction calling the function is aborted. Aborted transactions cannot commit, and if they try to commit the COMMIT is treated as ROLLBACK, the same as for any other transaction in error. Observe:
regress=# BEGIN;
BEGIN
regress=# SELECT 1/0;
ERROR: division by zero
regress=# COMMIT;
ROLLBACK
See how the transaction, which is in the error state due to the zero division, rolls back on COMMIT?
If you call a function without an explicit surrounding transaction, the rules are exactly the same as for any other Pg statement:
BEGIN;
SELECT refresh_materialized_view(name);
COMMIT;
(where COMMIT will fail if the SELECT raised an error).
PostgreSQL does not (yet) support autonomous transactions in functions, where the procedure/function could commit/rollback independently of the calling transaction. This can be simulated using a new session via dblink.
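A rough sketch of that dblink workaround (audit_log is a hypothetical table, and the dblink extension must be installed):
CREATE EXTENSION IF NOT EXISTS dblink;
BEGIN;
-- this INSERT runs in its own connection and commits immediately,
-- even if the surrounding transaction rolls back afterwards
SELECT dblink_exec('dbname=' || current_database(),
                   'INSERT INTO audit_log(msg) VALUES (''kept anyway'')');
ROLLBACK; -- audit_log still contains the row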
BUT, things that aren't transactional or are imperfectly transactional exist in PostgreSQL. If it has non-transactional behaviour in a normal BEGIN; do stuff; COMMIT; block, it has non-transactional behaviour in a function too. For example, nextval and setval, TRUNCATE, etc.
As my knowledge of PostgreSQL is less deep than Craig Ringer's, I will try to give a shorter answer: yes.
If you execute a function that has an error in it, none of its steps will affect the database.
Also, the same happens if you execute a query in pgAdmin.
For example, if you execute in a query:
update your_table yt set column1 = 10 where yt.id=20;
select anything_that_do_not_exists;
The update to the row with id = 20 of your_table will not be saved in the database.
Update, Sep 2018
To clarify the concept, I have made a little example with the non-transactional function nextval.
First, let's create a sequence:
create sequence test_sequence start 100;
Then, let's execute:
update your_table yt set column1 = 10 where yt.id=20;
select nextval('test_sequence');
select anything_that_do_not_exists;
Now, if we open another query and execute
select nextval('test_sequence');
We will get 101, because the first value (100) was consumed by the earlier query (sequences are not transactional), even though the update was never committed.
https://www.postgresql.org/docs/current/static/plpgsql-structure.html
It is important not to confuse the use of BEGIN/END for grouping statements in PL/pgSQL with the similarly-named SQL commands for transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do not start or end a transaction. Functions and trigger procedures are always executed within a transaction established by an outer query — they cannot start or commit that transaction, since there would be no context for them to execute in. However, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without affecting the outer transaction. For more about that see Section 39.6.6.
At the function level, it is not transactional. In other words, each statement in the function belongs to a single transaction, following the default DB autocommit setting. Autocommit is true by default. But in any case, you have to call the function using
select schemaName.functionName()
The statement 'select schemaName.functionName()' is a single transaction; let's name the transaction T1. So all the statements in the function belong to the transaction T1. In this way, the function runs in a single transaction.
Postgres 14 update: all statements written between the BEGIN and END block of a procedure/function are executed in a single transaction. Thus, any error arising during the execution of this block will cause an automatic rollback of the transaction.
Additionally, this atomic behaviour applies to triggers as well.
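If the "ATOMIC" above refers to Postgres 14's SQL-standard function bodies, a minimal sketch looks like this (the procedure and table names are assumptions):
CREATE PROCEDURE insert_pair(a int, b int)
LANGUAGE SQL
BEGIN ATOMIC
    INSERT INTO tbl VALUES (a);
    INSERT INTO tbl VALUES (b);
END;
-- CALL insert_pair(1, 2); either both rows are inserted or, on error, neither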