Is there a way to roll back to a "committed savepoint"?
As far as I know, the actual savepoints supported by PostgreSQL are subtransactions, and they lose their meaning when the enclosing transaction commits or aborts. Are there "savepoints" that work across transaction boundaries?
Basically, what I want is to execute these three transactions in order:
Transaction A
BEGIN TRANSACTION;
COMMIT SAVEPOINT 'before_a';
DO SOMETHING;
COMMIT TRANSACTION;
Transaction B
BEGIN TRANSACTION;
DO SOMETHING_ELSE;
COMMIT TRANSACTION;
Transaction C
BEGIN TRANSACTION;
ROLLBACK TO COMMITTED SAVEPOINT 'before_a'; -- discards work done in A and B
COMMIT TRANSACTION;
The reason is that I am writing a (Java) regression test.
Under faulty circumstances, DO SOMETHING_ELSE will trigger a transaction commit exception when trying to commit B (some foreign key constraint violations upon a DELETE, I believe), but ONLY if the work done in transaction A has been committed.
As the issue is now resolved, Transaction B will commit.
But in doing so, both A and B will have left some by-products in the database.
These now need to be purged from the database if the next test is to have any chance of succeeding.
It's exceedingly difficult to track these by-products down manually, so transaction C should remove them.
As has been mentioned in the comments, that is impossible in PostgreSQL.
Your only hope is to use subtransactions inside an enveloping transaction:
BEGIN;
SAVEPOINT before_a;
/* perform the work for A */
/* perform the work for B */
If B fails, do the following:
ROLLBACK TO SAVEPOINT before_a;
Then, no matter if B failed or not:
/* perform the work for C */
COMMIT;
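For the case where B's work does fail, the statements issued inside the single enveloping transaction would look roughly like this (just a sketch; the table t and its values are invented):
BEGIN;
SAVEPOINT before_a;
INSERT INTO t (val) VALUES ('work for A');  -- work for A
INSERT INTO t (val) VALUES ('work for B');  -- work for B, assume this fails
ROLLBACK TO SAVEPOINT before_a;             -- undoes A and B, the transaction stays usable
INSERT INTO t (val) VALUES ('work for C');  -- work for C
COMMIT;                                     -- only C's work is persisted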
What's the right way to handle collision with another transaction when serializable isolation level is used with PostgreSQL?
Assuming that the application has already executed:
1. begin transaction
2. ...
3. savepoint p1
4. insert ...
If the application gets an error during step 4:
ERROR 40001 could not serialize access due to read/write dependencies among transactions
Reason code: Canceled on identification as a pivot, during conflict in checking.
The transaction might succeed if retried.
what is the correct teardown before retrying the transaction? Should the application execute both queries
rollback to savepoint p1
rollback
or just rollback? If just rollback, can the name p1 be used for a savepoint during the retry? Also, what if step 4 above was successful but the application then executed
release savepoint p1?
and that query was interrupted by ERROR 40001, should the application try to execute rollback to savepoint p1, or would that result in an error? Or would release savepoint p1 succeed every time, so there's no need to think about what would happen if there's a collision while executing it?
If there is documentation about which queries cannot possibly result in a transaction collision when executed at the serializable isolation level, that would be great to know about.
Can someone tell me about how I can perform a distributed transaction on PostgreSQL?
I need to start a transaction from node x to node y (this node has a database).
But I can't find information on the internet about how to do this.
All I can do is a distributed query with:
select dblink_connect('conn', 'dbname=ConsultaRemota host=192.168.3.9 user=remoto password=12345 port=5432');

select * from dblink('conn', 'select * from tablaremota')
  as temp (id_remoto int, nombre_remoto text, descripcion text);
Using dblink does not give you a true distributed transaction, because it is possible that the remote transaction succeeds while the local transaction fails.
To perform a distributed transaction:
Create a normal transaction with BEGIN or START TRANSACTION on both databases.
Perform work on both databases.
Once you are done, prepare the transaction on both databases:
PREPARE TRANSACTION 'some_name';
This step will perform everything that could potentially fail during COMMIT and persist the transaction, but it will not yet commit it.
If that step fails somewhere, use ROLLBACK or ROLLBACK PREPARED to abort the transaction on all databases.
Commit the transaction on all databases:
COMMIT PREPARED 'some_name';
This is guaranteed to succeed.
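Put together, the statements issued on each of the two databases might look like this (a sketch; the table t is a placeholder, and max_prepared_transactions must be set to a non-zero value on both servers):
-- on each participating database
BEGIN;
INSERT INTO t (id) VALUES (1);     -- the actual work on this node
PREPARE TRANSACTION 'some_name';   -- phase 1: persist the transaction without committing it

-- once PREPARE TRANSACTION has succeeded on every node
COMMIT PREPARED 'some_name';       -- phase 2
-- if any PREPARE failed, instead run ROLLBACK PREPARED 'some_name' on the nodes that prepared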
To reliably perform a distributed transaction, you need a transaction manager: that is a piece of software that keeps track of all distributed transactions. This component has to persist its information, so that it can survive a crash. The job of the transaction manager is to commit or rollback any transaction that was left in an incomplete state after a crash.
This is necessary, because prepared transactions will stay around even if you restart the database, and they will hold locks and block VACUUM progress. Such orphaned prepared transactions can break your database.
Never use distributed transactions without a transaction manager!
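If you ever have to clean up by hand, prepared transactions that were left behind are visible in the pg_prepared_xacts system view, for example:
-- list prepared transactions still pending on this server
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- resolve an orphaned one by its name
ROLLBACK PREPARED 'some_name';   -- or COMMIT PREPARED 'some_name'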
I have a system in which I'll dispatch events as part of transactions when inserting rows into the database.
This is to ensure that events only ever get dispatched for a successful insert, and that inserts are rolled back in case event dispatching fails.
The problem is that I'm seeing a race condition when dispatching an event after a successful insert. Sometimes the corresponding event listeners will receive the event before the transaction has been committed, which results in the row not being available at that time.
What I want to achieve is that when inserting row A, any process trying to read row A must wait until row A's transaction has been committed.
Is this a sound approach and how is it best achieved?
For two processes A and B:
How it currently works:
1. A: Start transaction
2. A: Attempt insert
3. A: Dispatch event
4. B: Receives event and attempts to read the inserted row
5. B: Exception is raised as the record is not visible yet
6. A: Commits transaction
How I'd want it to work:
1. A: Start transaction
2. A: Attempt insert
3. A: Dispatch event
4. B: Receives event and attempts to read the inserted row
5. B: The row is currently locked by the transaction, so it waits until released
6. A: Commits transaction
7. B: Lock is released and the newly inserted row is returned
Based on the question and the comments, I think the main issue here is that you want "A" to dispatch an event based on the content of an uncommitted row, and you want "B" to read and act on that uncommitted row. But in relational theory and in SQL databases, you don't know whether a commit will succeed. A commit can fail for lots of reasons--not only because some constraint fails, but because of things like insufficient permissions or disk space.
To borrow words from one of your comments, the dbms manages a database's transition from one consistent state to another. Uncommitted rows aren't known to be part of a consistent database state. Only successfully committed rows are part of a consistent database state.
So I think, in general, you need "A" to commit before "B" tries to read the new row. And I think this is true even if you switch to a dbms that supports the "read uncommitted" transaction isolation level. (PostgreSQL does not, at least not in the way you might think.)
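To illustrate with a sketch (the table test is made up; both sessions use the default READ COMMITTED level):
-- session A
BEGIN;
INSERT INTO test (id) VALUES (1);

-- session B, concurrently
SELECT * FROM test WHERE id = 1;   -- returns no row: A has not committed yet

-- session A
COMMIT;

-- session B
SELECT * FROM test WHERE id = 1;   -- the row is now visible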
That means "B" will be responsible for deleting the new row (or for telling "A" to delete it) if "dispatching" fails. Neither "A" nor "B" can roll back a committed transaction.
It might make more sense for another process, "C", to oversee the actions of both "A" and "B".
I'm using Postgres 9.1. I'm wondering if using multiple SELECT FOR UPDATES in the same transaction could potentially cause a race condition.
2 concurrent transactions:
1. transaction 1: select for update on table 1 -- successfully acquires lock
2. transaction 2: select for update on table 2 -- successfully acquires lock
3. transaction 2: select for update on table 1 -- waiting for lock release from transaction 1
4. transaction 1: select for update on table 2 -- waiting for lock release from transaction 2
What happens in this situation? Does one of the waiting transactions eventually time out? If so, is there a way to configure the timeout duration?
edit: is deadlock_timeout the configuration I am looking for?
Yes, deadlock_timeout is the setting you are looking for; see the docs.
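For example (changing it per session requires superuser privilege, so treat this as a sketch):
SHOW deadlock_timeout;            -- how long to wait on a lock before checking for a deadlock; 1s by default
SET deadlock_timeout = '200ms';   -- check for deadlocks sooner in this session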
But your scenario doesn't mean that there will be a deadlock, because PostgreSQL uses row-level locks and it is not clear whether your transactions are contending for the same rows.
Another option is to use an isolation level higher than the default READ COMMITTED. But in this case your application should be ready to receive exceptions with SQLSTATE 40001:
ERROR: could not serialize access due to concurrent update
This is expected; you should just retry the transaction as is.
You can find a very good overview of the Serializable isolation level on the wiki.
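For example, a sketch of running a transaction at a stricter level (the accounts table is made up; on SQLSTATE 40001 the whole block should simply be rerun):
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- may fail with SQLSTATE 40001; if so, retry the whole transaction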
PostgreSQL will detect the deadlock on step 4 and will fail the transaction. Here's what happened when I tried it in psql (only showing step 4):
template1=# SELECT * FROM table2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 17536 waits for ShareLock on transaction 166946; blocked by process 18880.
Process 18880 waits for ShareLock on transaction 166944; blocked by process 17536.
HINT: See server log for query details.
template1=#
This happens after 1s, which is the default deadlock_timeout. The other answer has more information about this.
The documentation for set local states:
"Note that SET LOCAL will appear to have no effect if it is executed outside a BEGIN block, since the transaction will end immediately."
If I'm using SET LOCAL in the context of read only transactions do I need to indicate the end of the transaction with a COMMIT statement? Is there any difference if I do this or not?
If your connection is closed without a COMMIT, PostgreSQL will automatically issue a ROLLBACK. In the context of a read only transaction, this has no consequence.
If your connection stays open after your transaction, you might want to issue a ROLLBACK (or a COMMIT, but generally a ROLLBACK is less costly) in order for your next transaction to execute in a clean state.
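For example, in a read-only transaction (statement_timeout here is just an arbitrary parameter for illustration):
BEGIN READ ONLY;
SET LOCAL statement_timeout = '5s';   -- in effect only until the transaction ends
SELECT count(*) FROM pg_class;
ROLLBACK;   -- or COMMIT; either ends the transaction and discards the SET LOCAL setting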