How to tear down a transaction with serializable isolation level after a transaction collision? (PostgreSQL)

What is the right way to handle a collision with another transaction when the serializable isolation level is used with PostgreSQL?
Assume that the application has already executed:
1. begin transaction
2. ...
3. savepoint p1
4. insert ...
If the application gets an error during step 4,
ERROR: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001)
DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
HINT: The transaction might succeed if retried.
what is the correct teardown before retrying the transaction? Should the application execute both statements
rollback to savepoint p1
rollback
or just rollback? If just rollback, can the name p1 be reused for the savepoint during the retry? Also, what if step 4 above succeeded but the transaction then executed
release savepoint p1
and that statement was interrupted by ERROR 40001? Should the application try to execute rollback to savepoint p1, or would that result in an error? Or does release savepoint p1 always succeed, so there is no need to consider a collision while executing it?
If there is documentation about which statements cannot possibly cause a serialization failure when executed at the serializable isolation level, I would be glad to find it.
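One plausible teardown, sketched in Python with psycopg2 (the DSN and table t are placeholders): the documented guidance for SQLSTATE 40001 is to roll back the complete transaction and retry it from the start, and after a full rollback the savepoint name p1 is free to reuse.

import psycopg2
from psycopg2.errors import SerializationFailure

conn = psycopg2.connect("dbname=app")  # placeholder DSN
conn.set_session(isolation_level="SERIALIZABLE")

for attempt in range(5):
    try:
        with conn.cursor() as cur:
            cur.execute("SAVEPOINT p1")              # fresh transaction, so p1 is free again
            cur.execute("INSERT INTO t VALUES (1)")  # placeholder for the real insert
            cur.execute("RELEASE SAVEPOINT p1")
        conn.commit()
        break
    except SerializationFailure:
        # After ERROR 40001 the transaction is doomed; ROLLBACK TO SAVEPOINT
        # cannot rescue it. Roll back everything and start over.
        conn.rollback()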

Related

Distributed transaction on PostgreSQL

Can someone tell me how to perform a distributed transaction in PostgreSQL?
I need to start a transaction from a node x to a node y (this node has a database),
but I cannot find information on the internet about how to do it.
All I can do is a distributed query with:
select dblink_connect('conn',
    'dbname=ConsultaRemota host=192.168.3.9 user=remoto password=12345 port=5432');
select * from dblink('conn', 'select * from tablaremota')
    as temp (id_remoto int, nombre_remoto text, descripcion text);
Using dblink is not a true distributed transaction, because it is possible that the remote transaction succeeds while the local transaction fails.
To perform a distributed transaction:
1. Start a normal transaction with BEGIN or START TRANSACTION on both databases.
2. Perform work on both databases.
3. Once you are done, prepare the transaction on both databases:
   PREPARE TRANSACTION 'some_name';
   This step performs everything that could potentially fail during COMMIT and persists the transaction, but it does not yet commit it. If this step fails somewhere, use ROLLBACK or ROLLBACK PREPARED to abort the transaction on all databases.
4. Commit the transaction on all databases:
   COMMIT PREPARED 'some_name';
   This is guaranteed to succeed.
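A sketch of these four steps in Python with psycopg2 (the DSNs, the table t and the global transaction id tx_1 are placeholders; note that PREPARE TRANSACTION requires max_prepared_transactions to be set above zero on both servers):

import psycopg2

conn_x = psycopg2.connect("host=node_x dbname=db1")  # placeholder DSNs
conn_y = psycopg2.connect("host=node_y dbname=db2")
for c in (conn_x, conn_y):
    c.autocommit = True  # transaction control is issued explicitly below

def run(conn, sql):
    with conn.cursor() as cur:
        cur.execute(sql)

try:
    run(conn_x, "BEGIN")
    run(conn_y, "BEGIN")
    run(conn_x, "UPDATE t SET n = n + 1")  # placeholder work on each node
    run(conn_y, "UPDATE t SET n = n + 1")
    run(conn_x, "PREPARE TRANSACTION 'tx_1'")
    run(conn_y, "PREPARE TRANSACTION 'tx_1'")
except psycopg2.Error:
    # Abort whatever state each node is in: first try to roll back a
    # prepared transaction, then a plain open one.
    for c in (conn_x, conn_y):
        for sql in ("ROLLBACK PREPARED 'tx_1'", "ROLLBACK"):
            try:
                run(c, sql)
            except psycopg2.Error:
                pass
    raise
# Both PREPAREs succeeded, so committing cannot fail.
run(conn_x, "COMMIT PREPARED 'tx_1'")
run(conn_y, "COMMIT PREPARED 'tx_1'")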
To reliably perform a distributed transaction, you need a transaction manager: a piece of software that keeps track of all distributed transactions. This component has to persist its information so that it can survive a crash. The job of the transaction manager is to commit or roll back any transaction that was left in an incomplete state after a crash.
This is necessary, because prepared transactions will stay around even if you restart the database, and they will hold locks and block VACUUM progress. Such orphaned prepared transactions can break your database.
Never use distributed transactions without a transaction manager!
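For illustration, a crash-recovery pass over orphaned prepared transactions could be sketched like this (psycopg2; the DSN is a placeholder, and a real transaction manager would consult its own durable log instead of blindly rolling everything back):

import psycopg2

conn = psycopg2.connect("dbname=db1")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    # pg_prepared_xacts lists all prepared transactions on this server.
    cur.execute("SELECT gid, prepared, owner, database FROM pg_prepared_xacts")
    for gid, prepared, owner, database in cur.fetchall():
        # The transaction manager's log decides commit vs. rollback for each
        # global transaction; this sketch simply aborts every orphan.
        cur.execute("ROLLBACK PREPARED %s", (gid,))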

Postgresql - restore to savepoint across transaction boundaries

Is there a way to roll back to a "committed savepoint"?
AFAIK, the savepoints supported by PostgreSQL are actually subtransactions, and they lose their meaning when the enclosing transaction commits or aborts. Are there "savepoints" that work across transaction boundaries?
Basically, what I want is to execute these three transactions in order:
Transaction A:
BEGIN TRANSACTION;
COMMIT SAVEPOINT 'before_a'; -- hypothetical syntax
DO SOMETHING;
COMMIT TRANSACTION;
Transaction B:
BEGIN TRANSACTION;
DO SOMETHING_ELSE;
COMMIT TRANSACTION;
Transaction C:
BEGIN TRANSACTION;
ROLLBACK TO COMMITTED SAVEPOINT 'before_a'; -- discards the work done in A and B
COMMIT TRANSACTION;
The reason is that I am writing a (Java) regression test.
Under faulty circumstances, DO SOMETHING_ELSE will trigger a transaction commit exception when trying to commit B (some foreign key constraint violations upon a DELETE, I believe), but ONLY if the work done in transaction A has been committed.
As the issue is now resolved, Transaction B will commit.
But in doing so, both A and B will have left some by-products in the database.
These need now be purged from the database if the next test is supposed to have any chance at succeeding.
It is exceedingly difficult to track these by-products down manually, so transaction C should remove them.
As has been mentioned in the comments, that is impossible in PostgreSQL.
Your only hope is to use subtransactions inside an enveloping transaction:
BEGIN;
SAVEPOINT before_a;
/* perform the work for A */
/* perform the work for B */
If B fails, do the following:
ROLLBACK TO SAVEPOINT before_a;
Then, whether or not B failed:
/* perform the work for C */
COMMIT;
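Driven from a test, the same pattern might look like this in Python with psycopg2 (the do_work_* helpers are hypothetical stand-ins for the real work of A, B and C):

import psycopg2

# Hypothetical stand-ins for the real work done in A, B and C.
def do_work_a(cur): cur.execute("SELECT 1")
def do_work_b(cur): cur.execute("SELECT 1")
def do_work_c(cur): cur.execute("SELECT 1")

conn = psycopg2.connect("dbname=testdb")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("SAVEPOINT before_a")
    try:
        do_work_a(cur)
        do_work_b(cur)
    except psycopg2.Error:
        # Discards the work of A and B but keeps the transaction alive.
        cur.execute("ROLLBACK TO SAVEPOINT before_a")
    do_work_c(cur)
conn.commit()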

Is it possible that both transactions rollback during a deadlock or serialization error?

In PostgreSQL (and other MVCC databases), transactions can be rolled back due to a deadlock or serialization error. Assume two transactions are currently running: is it ever possible that both, instead of just one, will fail due to this kind of error?
The reason why I am asking is that I am writing a retry implementation. If both transactions can fail, we might end up in a never-ending loop of retries if both retry immediately. If only one transaction can fail, I don't see any harm in retrying as soon as possible.
Yes. A deadlock can involve more than two transactions, in which case more than one of them may be terminated. But that is an extremely rare condition.
If just two transactions deadlock, one survives. The manual:
PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete.
Serialization failures only happen in REPEATABLE READ or SERIALIZABLE transaction isolation. I wouldn't know of any particular limit to how many serialization failures can happen concurrently. But I also never heard of any necessity to delay retrying.
I would retry as soon as possible either way.
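An immediate-retry wrapper in that spirit, sketched with psycopg2 (the work function is a placeholder for whatever the transaction does):

import psycopg2
from psycopg2.errors import DeadlockDetected, SerializationFailure

def run_with_retry(conn, work, max_attempts=10):
    """Run work(conn) in a transaction, retrying immediately on 40P01/40001."""
    for attempt in range(max_attempts):
        try:
            work(conn)
            conn.commit()
            return
        except (DeadlockDetected, SerializationFailure):
            conn.rollback()  # the transaction is aborted; retry right away
    raise RuntimeError("no success after %d attempts" % max_attempts)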

handle locks in redshift

I have a Python script that executes multiple SQL scripts (one after another) in Redshift. Some of the tables in these SQL scripts can be queried multiple times. For example, table t1 can be SELECTed in one script and dropped/recreated in another. This whole process runs in one transaction. Now, sometimes I am getting a "deadlock detected" error and the whole transaction is rolled back. If there is a deadlock on a table, I would like to wait for the table to be released and then retry the SQL execution. For other types of errors, I would like to roll back the transaction. From the documentation, it looks like a table lock is not released until the end of the transaction. I would like to achieve all-or-nothing data changes (which is what the transaction accomplishes) but also handle deadlocks. Any suggestion on how this can be accomplished?
I would execute all of the SQL you are referring to in one transaction with a retry loop. Below is the logic I use to handle concurrency issues and retry, sketched in Python with psycopg2-style error codes. I do not have the system wait indefinitely for the lock to be released; instead I handle it in the application by retrying over time.
import random
import time
import psycopg2

def execute_with_retry(conn, sql, max_attempts=5):
    # conn is an open psycopg2 connection to Redshift.
    attempt = 0
    while attempt < max_attempts:
        try:
            with conn.cursor() as cur:
                cur.execute(sql)  # psycopg2 opens the transaction implicitly
            conn.commit()
            return
        except psycopg2.Error as e:
            if e.pgcode in ('40P01', '55P03'):
                # Deadlock detected or lock not available. The failed
                # transaction must be rolled back before retrying.
                conn.rollback()
                time.sleep(random.uniform(0.2, 1.0) * (attempt + 1))
            elif e.pgcode in ('40001', '25P02'):
                # Serialization failure or "in failed SQL transaction".
                conn.rollback()
                time.sleep(random.uniform(0.2, 1.0) * (attempt + 1))
            elif 'There is no active transaction' in str(e):
                time.sleep(random.uniform(0.2, 1.0) * (attempt + 1))
            else:
                conn.rollback()
                raise
            attempt += 1
    raise RuntimeError('transaction failed after %d attempts' % max_attempts)
The key components are catching every type of error, knowing which cases require a rollback, and backing off with increasing, randomized delays between retries.

Can multiple SELECT FOR UPDATES in a single transaction cause a race condition (Postgres)?

I'm using Postgres 9.1. I'm wondering if using multiple SELECT ... FOR UPDATE statements in the same transaction could potentially cause a race condition.
2 concurrent transactions:
1. transaction 1: select for update on table 1 -- successfully acquires lock
2. transaction 2: select for update on table 2 -- successfully acquires lock
3. transaction 2: select for update on table 1 -- waiting for lock release from transaction 1
4. transaction 1: select for update on table 2 -- waiting for lock release from transaction 2
What happens in this situation? Does one of the waiting transactions eventually time out? If so, is there a way to configure the timeout duration?
edit: is deadlock_timeout the configuration I am looking for?
Yes, deadlock_timeout is the setting you are looking for; see the docs.
But your scenario does not necessarily mean there will be a deadlock, because PostgreSQL uses row-level locks, and it is not clear whether your transactions compete for the same rows.
Another option is to use an isolation level higher than the default READ COMMITTED. But in that case your application should be prepared to receive exceptions with SQLSTATE 40001:
ERROR: could not serialize access due to concurrent update
This is expected; you should just retry the transaction as-is.
You can find a very good overview of the Serializable isolation level on the PostgreSQL wiki.
PostgreSQL will detect the deadlock at step 4 and fail the transaction. Here is what happened when I tried it in psql (only showing step 4):
template1=# SELECT * FROM table2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 17536 waits for ShareLock on transaction 166946; blocked by process 18880.
Process 18880 waits for ShareLock on transaction 166944; blocked by process 17536.
HINT: See server log for query details.
template1=#
This happens after 1 s, which is the default deadlock_timeout. The other answer has more information about this.
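For completeness, a small sketch (psycopg2; the DSN and table names are placeholders) showing both remedies: shortening deadlock_timeout, which only speeds up detection, and taking the locks in the same order in every transaction, which prevents this deadlock entirely. Note that changing deadlock_timeout requires superuser rights (or, on newer versions, SET privilege on the parameter).

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
with conn.cursor() as cur:
    # Detect deadlocks after 200 ms instead of the default 1 s. This does
    # not prevent deadlocks; it only shortens the wait before detection.
    cur.execute("SET deadlock_timeout = '200ms'")
    # Locking the tables in the same order in every transaction removes the
    # circular wait, so this deadlock cannot occur.
    cur.execute("SELECT * FROM table1 FOR UPDATE")
    cur.execute("SELECT * FROM table2 FOR UPDATE")
conn.commit()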