Can committing a transaction in PostgreSQL fail?

If I execute some SQL inside a transaction successfully, can it happen that the commit fails? And what are the possible causes? Can failure be related to the executed queries, or only to DB-side issues?
The question comes up because I need to judge whether it makes sense to commit transactions inside tests, or whether it is "safe enough" to just roll back after each test case.

If I execute some SQL inside a transaction successfully, can it happen that the commit fails?
Yes.
And what are possible causes?
DEFERRABLE constraints checked with SET CONSTRAINTS DEFERRED, or in a one-statement autocommit transaction: deferred checks run at commit time, so a violation surfaces as a commit failure; see the sketch after this list. (Can't happen unless you use DEFERRABLE constraints)
SERIALIZABLE transaction with serialization failure detected at commit time. (Can't happen unless you use SERIALIZABLE transactions)
Asynchronous commit where the DB crashes or is shut down. (Can't happen if synchronous_commit = on, the default)
Disk I/O error, filesystem error, etc.
Out-of-memory error
Network error leading to session disconnect after you send the commit but before you get confirmation of success. In this case you don't know for sure if it committed or not.
... probably more
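For illustration, here is a minimal JDBC sketch of the first cause, using hypothetical table names and connection details: with a DEFERRABLE INITIALLY DEFERRED foreign key, the violation is only checked at commit time, so it is commit() itself that throws.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class DeferredCommitFailure {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "postgres", "secret")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE parent (id int PRIMARY KEY)");
                st.execute("CREATE TABLE child (id int PRIMARY KEY, "
                        + "parent_id int REFERENCES parent "
                        + "DEFERRABLE INITIALLY DEFERRED)");
            }
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // No parent row 1 exists, but the FK check is deferred,
                // so the INSERT itself succeeds...
                st.execute("INSERT INTO child VALUES (1, 1)");
            }
            try {
                conn.commit();          // ...and the failure surfaces here
            } catch (SQLException e) {
                // SQLState 23503 (foreign_key_violation), raised at COMMIT
                System.out.println(e.getSQLState() + ": " + e.getMessage());
            }
        }
    }
}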
Can failure be related to the executed queries, or only to DB-side issues?
Either. A serialization failure, for example, is definitely related to the queries run.
If you're using READ COMMITTED isolation with no deferred constraints then commits are only likely to fail due to underlying system errors.
The question comes up because I need to judge whether it makes sense to commit transactions inside tests, or whether it is "safe enough" to just roll back after each test case.
Any sensible test suite has to cover multiple concurrent transactions interacting, committing in different orders, etc.
If all you test is single standalone transactions you're not testing the real system.
So the question is IMO moot, because a decent suite of tests has to commit anyway.

Related

How to avoid long delay before finally getting "40001 could not serialize access due to concurrent update"

We have a Postgres 12 system running one master and two async hot-standby replica servers, and we use SERIALIZABLE transactions. All the database servers have very fast SSD storage for Postgres and 64 GB of RAM. Clients connect directly to the master server if they cannot accept delayed data for a transaction. Read-only clients that accept data up to 5 seconds old use the replica servers for querying. Read-only clients use REPEATABLE READ transactions.
I'm aware that because we use SERIALIZABLE transactions, Postgres might give us false positive serialization failures and force us to repeat transactions. This is fine and expected.
However, the problem I'm seeing is that, at random, a single-row INSERT or UPDATE query stalls for a very long time. As an example, one error case was as follows (speaking directly to the master to allow modifying table data):
A simple single-row insert
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
stalled for 74.62 seconds before finally emitting the error
ERROR 40001 could not serialize access due to concurrent update
with error context
SQL statement "SELECT 1 FROM ONLY "public"."restservices" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
We log all queries exceeding 40 ms, so I know this kind of stall is rare: maybe a couple of queries a day. We average around 200-400 transactions per second during normal load, with 5-40 queries per transaction.
After finally getting the above error, the client code automatically released two savepoints, rolled back the transaction and disconnected from the database (this cleanup took 2 ms total). It then reconnected 2 ms later, replayed the whole transaction from the start and finished in 66 ms, including the time to connect to the database. So I think this is not about the performance of the client or of the master server as a whole. The expected transaction time is 5-90 ms depending on the transaction.
Is there some PostgreSQL connection or master configuration setting that I can use to make PostgreSQL return error 40001 faster, even if it causes more transactions to be rolled back? Does anybody know if setting
set local statement_timeout='250'
within the transaction has dangerous side-effects? According to the documentation at https://www.postgresql.org/docs/12/runtime-config-client.html, "Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions", but I could set the timeout only for transactions issued by this client, which is able to automatically retry the transaction very fast.
Is there anything else to try?
It looks like some other session held a lock on the parent row of the one you were trying to insert. PostgreSQL doesn't know what to do about that until the lock is released, so it blocks. If it failed rather than blocked, and on failure you retried the exact same thing, the same parent row would most likely still be locked, so the retry would just fail again and you would busy-wait. Busy-waiting is not good, so blocking rather than failing is generally the right thing here. It blocks and then unblocks only to fail, but once it does fail, a retry should succeed.
An obvious exception to blocking-being-better-than-failing is when, on retry, you can pick a different parent row, if that makes sense in your context. In that case, maybe the best thing to do is to explicitly lock the parent row with NOWAIT before attempting the insert (a sketch follows). That way you can perhaps deal with failures in a more nuanced way.
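A hedged JDBC sketch of that idea, reusing the table name from the question (the method name and the id type are made up): take the parent row's key-share lock with NOWAIT first, so a held lock fails immediately with SQLState 55P03 instead of blocking for seconds.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ParentLock {
    // Try to take the same key-share lock the FK check would take,
    // but fail fast instead of blocking on a held lock.
    static boolean tryLockParent(Connection conn, String parentId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT 1 FROM restservices WHERE id = ? FOR KEY SHARE NOWAIT")) {
            ps.setString(1, parentId);
            ps.executeQuery();
            return true;              // lock acquired; safe to run the INSERT now
        } catch (SQLException e) {
            if ("55P03".equals(e.getSQLState())) {   // lock_not_available
                return false;         // parent row is locked: back off, retry,
                                      // or pick a different parent row
            }
            throw e;
        }
    }
}

Note that inside a larger open transaction you would want a SAVEPOINT around the locking statement, since the 55P03 error otherwise aborts the whole transaction.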
If you must retry with the same parent_id, then I think the only real solution is to figure out who is holding the parent row lock for so long, and fix that. I don't think that setting statement_timeout would be hazardous, but it also wouldn't solve your problem, as you would probably just keep retrying until the lock on the offending row is released. (Setting it on the other session, the one holding the lock, might be helpful, depending on what that session is doing while the lock is held.)

What to do after a query when auto_commit is disabled

In some scenarios we should call setAutoCommit(false) before a query; see https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor and "When does the PostgreSQL JDBC driver fetch rows after executing a query?".
But none of these topics mentions what to do after the query, when the ResultSet and Statement are closed but the Connection is not (it may be recycled by a connection pool or DataSource).
I have these choices:
Do nothing (keep autoCommit = false for next query)
set autoCommit = true
commit
rollback
Which one is the best practice?
Even plain queries are executed in a transaction. If you started a transaction (which implicitly happened when you executed the query), then you should also end it. Generally, doing nothing would, with a well-behaved connection pool, result in a rollback when your connection is returned to the pool. However, it is best not to rely on such implicit behaviour, because not all connection pools or drivers will adhere to it. For example, the Oracle JDBC driver will commit on connection close (or at least it did so in the past; I'm not sure if it still does), and that might not be the correct behaviour for your program. Explicitly calling commit() or rollback() will clearly document the boundaries and expectations of your program.
Though committing or rolling back a transaction that only executed a query (and thus did not modify the database) will have the same end result, I would recommend using commit() rather than rollback(), to clearly indicate that the result was successful. For some databases committing might be cheaper than rollback (or vice versa), but such systems usually have heuristics that will convert a commit to a rollback (or vice versa, whatever is cheaper) if the results would be equivalent.
You generally don't need to switch auto-commit mode when you're done. A well-behaved connection pool should do that for you (though not all do, or sometimes you need to explicitly configure this). Double check the behaviour and options of your connection pool to be sure.
If you want to continue using a connection yourself (without returning to the pool), then switching back to auto-commit mode is sufficient: calling setAutoCommit(true) with an active transaction will automatically commit that transaction.
It depends on what you want to do afterwards. If you want to return to autocommit mode after the operation:
conn.setAutoCommit(true);
This will automatically commit the open transaction.
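Putting the pieces together, here is a hedged JDBC sketch of the pattern both answers describe (the query and fetch size are made up): run the cursor-based query inside an explicit transaction, commit on success, roll back on error, and restore auto-commit before the connection goes back to the pool.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class QueryHelper {
    static void fetchAll(Connection conn) throws SQLException {
        conn.setAutoCommit(false);              // begin the explicit transaction
        try (PreparedStatement ps = conn.prepareStatement("SELECT id FROM items")) {
            ps.setFetchSize(50);                // lets pgjdbc use a cursor
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process rs.getLong("id") ...
                }
            }
            conn.commit();                      // end the transaction explicitly
        } catch (SQLException e) {
            conn.rollback();                    // don't leave the tx open on error
            throw e;
        } finally {
            conn.setAutoCommit(true);           // restore the state the pool expects
        }
    }
}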

Is it possible that both transactions roll back during a deadlock or serialization error?

In PostgreSQL (and other MVCC databases), transactions can roll back due to a deadlock or serialization error. Assume two transactions are currently running: is it ever possible that both transactions, instead of just one, will fail due to this kind of error?
The reason why I am asking is that I am writing a retry implementation. If both transactions can fail, we might end up in a never-ending loop of retries if both retry immediately. If only one transaction can fail, I don't see any harm in retrying as soon as possible.
Yes. A deadlock can involve more than two transactions, in which case more than one may be terminated. But this is an extremely rare condition. Normally, if just two transactions deadlock, one survives. The manual:
PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete.
Serialization failures only happen in REPEATABLE READ or SERIALIZABLE transaction isolation. I wouldn't know of any particular limit to how many serialization failures can happen concurrently. But I have also never heard of any necessity to delay retrying.
I would retry as soon as possible either way.
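As a hedged illustration of such a retry implementation (names are made up; a real version might cap attempts or add backoff), here is a JDBC sketch that replays the transaction on SQLState 40001 (serialization_failure) or 40P01 (deadlock_detected) and rethrows anything else:

import java.sql.Connection;
import java.sql.SQLException;

class Retry {
    @FunctionalInterface
    interface TxBody { void run(Connection conn) throws SQLException; }

    static void runWithRetry(Connection conn, TxBody body) throws SQLException {
        while (true) {
            try {
                conn.setAutoCommit(false);
                body.run(conn);
                conn.commit();              // may itself raise 40001
                return;
            } catch (SQLException e) {
                conn.rollback();
                String state = e.getSQLState();
                boolean retryable = "40001".equals(state)   // serialization_failure
                                 || "40P01".equals(state);  // deadlock_detected
                if (!retryable) throw e;
                // retryable: loop and replay the whole transaction immediately
            } finally {
                conn.setAutoCommit(true);
            }
        }
    }
}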

Bitronix transaction appears to be committing prematurely

We have a spring-batch process that uses the Bitronix transaction manager. On the first pass of a particular step, we see the expected commit behavior: data is only committed to the target database when the transaction boundary is reached.
However, on the second and subsequent passes, rows are committed as soon as they are written. That is, they do not wait for the commit point.
We have confirmed that the Bitronix commit is only called at the expected points.
Has anyone experienced this behavior before? What kind of bug am I looking for?
Java XA is designed in such a way that connections cannot be reused across transactions. Once the transaction is committed, the connection property is changed to autocommit=true, and the connection cannot be used in another transaction until it is returned to the connection pool and retrieved by the XA code again.
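For reference, a hedged sketch of the usage pattern that implies (the data source and SQL are made up; "ds" is assumed to be an already-configured Bitronix PoolingDataSource wrapping an XADataSource): acquire a fresh connection from the pool inside each transaction, rather than holding one connection across commits.

import java.sql.Connection;
import java.sql.Statement;
import javax.sql.DataSource;
import bitronix.tm.BitronixTransactionManager;
import bitronix.tm.TransactionManagerServices;

class XaDemo {
    static void runTwoPasses(DataSource ds) throws Exception {
        BitronixTransactionManager btm = TransactionManagerServices.getTransactionManager();
        for (int pass = 0; pass < 2; pass++) {
            btm.begin();                               // start a new XA transaction
            try (Connection conn = ds.getConnection(); // fresh, enlisted connection
                 Statement st = conn.createStatement()) {
                st.execute("INSERT INTO audit_log DEFAULT VALUES");
            }
            btm.commit();   // rows become visible to others only at this point
        }
    }
}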

How does TransactionScope guarantee data integrity across multiple databases?

Can someone tell me the principle of how TransactionScope guarantees data integrity across multiple databases? I imagine it first sends the commands to the databases and then waits for the databases to respond before sending them a message to apply the command sent earlier. However, if execution stops abruptly while those apply messages are being sent, we could still end up with one database that has applied the command and one that has not. Can anyone shed some light on this?
Edit:
I guess what I'm asking is: can I rely on TransactionScope to guarantee data integrity when writing to multiple databases, in case of a power outage or a sudden shutdown?
Example:
using (var scope = new TransactionScope())
{
    using (var context = new FirstEntities())
    {
        context.AddToSomethingSet(new Something());
        context.SaveChanges();
    }
    using (var context = new SecondEntities())
    {
        context.AddToSomethingElseSet(new SomethingElse());
        context.SaveChanges();
    }
    scope.Complete();
}
It promotes the transaction to the Distributed Transaction Coordinator (MSDTC) if it detects multiple databases, and each enlisted database takes part in a two-phase commit. Each participant votes to commit, and hence we get the ACID properties, but distributed across databases. It can also be integrated with TxF and TxR. You should be able to use it the way you describe.
The two databases are kept consistent because the distributed COM+ transaction running under the DTC has a database transaction attached to each of them.
When a database votes to commit (e.g. as a result of scope.Complete()), it tells the DTC that it votes to commit. When all databases have done this, each has a change list. As long as the database transactions don't deadlock or conflict with other transactions at that point (e.g. by means of a fairness algorithm that pre-empts one transaction), all operations for each database are in the transaction log. If the system loses power when the commit has finished for one database but not yet for another, the transaction log has recorded that all resources voted to commit, so there is no logical reason for the commit to fail. Hence, the next time the database that wasn't able to commit boots up, it will finish the transactions left in this indeterminate state.
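To make the two phases concrete, here is an illustrative coordinator loop for two-phase commit, expressed with Java's XAResource interface for concreteness; this is a sketch of the protocol's shape, not TransactionScope's or MSDTC's actual code, and error handling is simplified.

import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

class TwoPhase {
    static void commitAll(List<XAResource> participants, Xid xid) throws XAException {
        // Phase 1 (voting): every participant must prepare successfully.
        for (XAResource r : participants) {
            try {
                r.prepare(xid);              // durably records a "yes" vote
            } catch (XAException noVote) {
                for (XAResource p : participants) {
                    try { p.rollback(xid); } catch (XAException ignored) { }
                }
                throw noVote;
            }
        }
        // Phase 2 (completion): the commit decision is now logged, so every
        // participant must commit, finishing after crash recovery if needed.
        for (XAResource r : participants) {
            r.commit(xid, false);            // false = part of a two-phase commit
        }
    }
}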
With distributed transactions it can in fact happen that the databases become inconsistent. You said:
At some point both databases have to be told to apply their changes. Let's say there is a power outage after telling the first db to apply; then the databases are out of sync. Or am I missing something?
You aren't. I think this is known as the Two Generals problem. It provably cannot be prevented. The window of failure is, however, quite small.