When concurrent updates are run in several transactions at the same time, is PostgreSQL still able to maintain ACID?
Say for example if I do
BEGIN;
UPDATE post SET "like" = "like" + 1;
UPDATE post SET "like" = "like" + 1;
END;
multiple times concurrently, will the increments still be ACID-compliant?
I am using the REPEATABLE READ isolation level.
Yes, ACID will be maintained:
Either both statements will succeed or none of them (atomicity).
No constraint will be violated (consistency).
The transactions will lock each other out; serialization errors will be reported and deadlocks resolved (isolation).
After COMMIT, the transaction will survive a system crash as long as the transaction logs are preserved (durability).
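Under REPEATABLE READ the increments are never lost, but one of the concurrent transactions can be rolled back with a serialization error and must simply be rerun. A rough two-session psql sketch, reusing the post table and the quoted "like" column from the question:
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE post SET "like" = "like" + 1;

-- Session 2, at the same time
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE post SET "like" = "like" + 1;   -- blocks, waiting for session 1's row lock

-- Session 1
COMMIT;

-- Session 2 now fails:
-- ERROR: could not serialize access due to concurrent update
-- Session 2 issues ROLLBACK and reruns its transaction; no increment is lost or applied twice.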
I'm writing a web app with Postgres 13 as the backend. Most requests are wrapped in a transaction, using the SERIALIZABLE isolation level.
For most things, this works great. However, there are some cases where I'd like some reads in the transaction to have less strict isolation.
For example, I'm introducing a global_flags table for infrequently-changed settings that any request might make use of:
await sqlAsync(`BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE`);
const batchSize = await sqlAsync(
`SELECT value FROM global_flags WHERE name = 'batch_size'`);
// ... a bunch more work ...
await sqlAsync('COMMIT');
I'm a bit worried that when we manually make changes to global_flags entries, we might cause an increase in "serialization failure" errors for in-flight transactions. Is there a way to tell Postgres that I don't need as strong of a consistency guarantee for reads of the global_flags table?
You needn't worry a bit.
If one transaction does nothing except change that flag, and the other only reads the table (and doesn't try to write to it!), the two transactions will have at most a simple rw- or wr-dependency on each other, and a single dependency between two transactions cannot form the cycle that causes a serialization error.
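You can convince yourself with two psql sessions on the table from the question (the new value '500' is just an example, and the flag-changing session runs at the default READ COMMITTED level, like a manual fix-up would):
-- Session 1: a web request
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT value FROM global_flags WHERE name = 'batch_size';
-- ... a bunch more work on other tables ...

-- Session 2: a manual change, while session 1 is still open
UPDATE global_flags SET value = '500' WHERE name = 'batch_size';

-- Session 1
COMMIT;   -- succeeds: a single rw-dependency between two transactions cannot form a cycle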
I have a simple table counter which is as below.
CREATE TABLE counter
(
id text NOT NULL,
total integer NOT NULL,
CONSTRAINT pk_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
The counter.id has a fixed set of 20 values, and I have manually set the initial total to 0 for all 20 values of counter.id. In one of the stored procedures I added the following line:
UPDATE counter SET total = total + 1 WHERE id = 'SomeID';
and now I see a large number of "could not serialize access due to read/write dependencies among transactions" Postgres exceptions. If I comment that line out, the problem goes away. The table counter is not being updated or read anywhere else concurrently but in this line.
I am using an ISOLATION level SERIALIZABLE in my transactions. The data access layer consists of Java Spring JDBC. I tried the following two approaches to resolve this issue.
Use a LOCK counter in ACCESS EXCLUSIVE MODE; before calling UPDATE counter.
Use PERFORM pg_advisory_xact_lock(1); right before calling UPDATE counter.
I am astonished that neither approach solved the problem. From the documentation, LOCK should give one transaction exclusive access to the table counter, which should have prevented the serialization exceptions. But it does not appear to work.
Any suggestions as to what I am doing wrong here?
UPDATE: Here is my attempt to reproduce the problem in a somewhat simplified setting. I have a single stored procedure, shown below.
CREATE OR REPLACE FUNCTION public.testinsert() RETURNS void AS
$BODY$
BEGIN
LOCK counter IN ACCESS EXCLUSIVE MODE;
INSERT INTO counter("id", "total")
VALUES('SomeID', 0) ON CONFLICT(id)
DO UPDATE SET total = counter.total + 1 where counter.id = 'SomeID';
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.testinsert()
Now I attempt the following in two separate psql consoles.
Console1: begin transaction isolation level serializable;
Console2: begin transaction isolation level serializable;
Console1: select testInsert();
Console2: select testInsert();
Console1: commit;
At this point Console2 throws the exception "could not serialize access due to concurrent update". This clearly tells me that the LOCK counter does not work when placed inside the stored procedure. Any ideas why?
If I try a variation of this where Console1 and Console2 both issue LOCK counter right after begin transaction, followed by calling the stored procedure, the code works just fine, as Console2 now waits on the lock.
I have tried replacing LOCK counter with PERFORM pg_advisory_xact_lock(1) and encountered similar problems.
Consider this sentence from the documentation at https://www.postgresql.org/docs/current/static/transaction-iso.html, about SERIALIZABLE:
Applications using this level must be prepared to retry transactions due to serialization failures.
It looks like you're ignoring that part, but you shouldn't.
See also https://wiki.postgresql.org/wiki/SSI for various examples of serialization failures that a session can have to deal with.
That these failures occur is the point of this isolation level. If you don't want them at all, you should use a less strict isolation level, or avoid concurrency by explicitly locking out other sessions, for example with your pg_advisory_xact_lock(1), but taken at the very beginning of the whole transaction.
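In practice that means the caller (here, the Spring JDBC layer) wraps each transaction in a retry loop. At the psql level the pattern is simply the following sketch, which reuses the testinsert() function from the question; the second block shows the variation the question already found to work, where the table lock is the very first statement of the transaction:
-- Retry: when a transaction fails with a serialization error, roll back and rerun it
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT testinsert();
-- ERROR: could not serialize access due to concurrent update
ROLLBACK;
BEGIN ISOLATION LEVEL SERIALIZABLE;   -- simply run the same transaction again
SELECT testinsert();
COMMIT;

-- Or lock out concurrency up front: the lock comes before anything reads counter
BEGIN ISOLATION LEVEL SERIALIZABLE;
LOCK TABLE counter IN ACCESS EXCLUSIVE MODE;
SELECT testinsert();
COMMIT;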
The UPDATE you added might just change the timing of execution of the concurrent transactions, because of the lock it creates (some transactions now stop where they wouldn't before). That's enough to trigger a serialization error. The root cause is that your concurrent transactions read and write data simultaneously at about the same locations (simultaneously meaning that the transactions overlap, not necessarily that the writes occur at exactly the same wall-clock time).
These hints at the bottom of the above-linked page may also help to reduce the probability of serialization failures (the first hint and the max_pred_locks_per_transaction suggestion are sketched in SQL after the quote):
For optimal performance when relying on Serializable transactions for concurrency control, these issues should be considered:
Declare transactions as READ ONLY when possible.
Control the number of active connections, using a connection pool if needed. This is always an important performance consideration, but it can be particularly important in a busy system using Serializable transactions.
Don't put more into a single transaction than needed for integrity purposes.
Don't leave connections dangling "idle in transaction" longer than necessary.
Eliminate explicit locks, SELECT FOR UPDATE, and SELECT FOR SHARE where no longer needed due to the protections automatically provided by Serializable transactions.
When the system is forced to combine multiple page-level predicate locks into a single relation-level predicate lock because the predicate lock table is short of memory, an increase in the rate of serialization failures may occur. You can avoid this by increasing max_pred_locks_per_transaction.
A sequential scan will always necessitate a relation-level predicate lock. This can result in an increased rate of serialization failures. It may be helpful to encourage the use of index scans by reducing random_page_cost and/or increasing cpu_tuple_cost. Be sure to weigh any decrease in transaction rollbacks and restarts against any overall change in query execution time.
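For instance, the first hint and the predicate-lock suggestion translate roughly to the following (the SELECT is only an illustration against the question's counter table, the value 256 is arbitrary, and max_pred_locks_per_transaction only takes effect after a server restart):
-- A pure reporting transaction can be declared read-only up front; with DEFERRABLE it
-- waits for a safe snapshot and can then neither cause nor suffer a serialization failure
BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
SELECT sum(total) FROM counter;
COMMIT;

-- Give the predicate-lock table more room (needs a restart to take effect)
ALTER SYSTEM SET max_pred_locks_per_transaction = 256;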
I'm a rookie in this topic; all I ever did was make a connection to a database for one user, so I'm not familiar with multiple users accessing a database.
My case is: 10 facilities will use my program for recording when workers come and leave. The database will be on the main server, and all I made was one database user while I was programming/testing the program. My question is: can multiple remote locations connect with that one database user (there should be no collisions, because they are all writing different rows, but to the same tables)? And if that's not the case, what should I do?
Good relational databases handle this quite well. It is the “I” in the so-called ACID properties of transactions in relational databases; it stands for isolation.
Concurrent processes are protected from simultaneously writing the same table row by locks that block other transactions until one transaction is done writing.
Readers are protected from concurrent writing by means of multiversion concurrency control (MVCC), which keeps old versions of the data around to serve readers without blocking anybody.
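A quick two-session illustration of both points (the attendance table and its columns are made up for the example):
-- Session 1
BEGIN;
UPDATE attendance SET left_at = now() WHERE worker_id = 7;   -- the row is now locked

-- Session 2, at the same time
SELECT * FROM attendance WHERE worker_id = 7;                -- returns immediately with the old, committed version
UPDATE attendance SET left_at = now() WHERE worker_id = 7;   -- blocks until session 1 commits or rolls back

-- Session 1
COMMIT;   -- session 2's UPDATE now proceeds against the new row version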
If you have enclosed all data modifications that belong together into a transaction, so that they happen atomically (the “A” in ACID), and your transactions are simple and short, your application will probably work just fine.
Problems may arise if these conditions are not satisfied:
If your data modifications are not protected by transactions, a concurrent session may see intermediate, incomplete results of a different session and thus work with inconsistent data.
If your transactions are complicated, later statements inside a transaction may rely on the results of previous statements in indirect ways. Such implicit assumptions can be broken by concurrent activity that modifies the data. There are three approaches to that:
Pessimistic locking: lock all data the first time you use them with something like SELECT ... FOR UPDATE so that nobody can modify them until your transaction is done.
Optimistic locking: don't lock, but whenever you access the data a second time, check that nobody else has modified them in the meantime. If that has been the case, roll the transaction back and try it again.
Use high transaction isolation levels like REPEATABLE READ and SERIALIZABLE which give better guarantees that the data you are using don't get modified concurrently. You have to be prepared to receive serialization errors if the database cannot keep the guarantees, in which case you have to roll the transaction back and retry it.
These techniques achieve the same goal in different ways; the discussion of when to use which one exceeds the scope of this answer. A minimal sketch of the first two approaches follows.
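Both patterns in rough SQL (the account table, its columns, and the version counter are invented for the example):
-- Pessimistic locking: lock the row the first time you read it
BEGIN;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;   -- blocks concurrent writers until COMMIT
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Optimistic locking: read without locking, detect concurrent changes when writing
BEGIN;
SELECT balance, version FROM account WHERE id = 1;     -- remember the version, say 5
UPDATE account SET balance = balance - 100, version = version + 1
 WHERE id = 1 AND version = 5;                          -- updates 0 rows if someone else changed it first
-- if the UPDATE reports 0 rows, ROLLBACK and start over
COMMIT;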
If your transactions are complicated and/or take a long time (long transactions are to be avoided as much as possible, because they cause all kinds of problems in a database), you may encounter a deadlock, which is two transactions locking each other in a kind of “deadly embrace”.
The database will detect this condition and interrupt one of the transactions with an error.
There are two ways to deal with that:
Avoid deadlocks by always locking resources in a fixed order (e.g., always update the account with the lower account number first); the sketch below shows the classic case.
When you encounter a deadlock, your code has to retry the transaction.
Contrary to common belief, a deadlock is not necessarily a bug.
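Here is the classic deadlock and its fix by lock ordering, again using an invented account table:
-- Session 1
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;

-- Session 2
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 2;
UPDATE account SET balance = balance + 100 WHERE id = 1;   -- blocks on session 1's row lock

-- Session 1
UPDATE account SET balance = balance + 100 WHERE id = 2;   -- blocks on session 2's row lock
-- ERROR: deadlock detected (PostgreSQL aborts one of the two transactions)

-- Fix: both sessions touch id = 1 before id = 2, so one simply waits for the other and no cycle forms.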
I recommend that you read the chapter about concurrency control in the PostgreSQL documentation.
I have a table that could have two threads reading data from it. If the data is in a certain state (let's say state 1) then the process will do something (not relevant to this question) and then update the state to 2.
It seems to me that there could be a case where thread 1 and thread 2 both perform a SELECT within microseconds of one another, both see that the row is in state 1, and then both do the same thing, so two updates occur after the locks have been released.
The question is: is there a way in Postgres to prevent the second thread from modifying this data, i.e. force it to do another SELECT after the first thread's lock is released, so that it knows to bail out and duplicates are prevented?
I looked into row locking, but it says you cannot prevent select statements which sounds like it won't work for my condition here. Is my only option to use advisory locks?
Your question, referencing an unknown source:
I looked into row locking, but it says you cannot prevent select statements which sounds like it won't work for my condition here. Is my only option to use advisory locks?
The official documentation on the matter:
Row-level locks do not affect data querying; they block only writers and lockers to the same row.
Concurrent attempts will not just select but try to take out the same row-level lock with SELECT ... FOR UPDATE - which causes them to wait for any previous transaction holding a lock on the same row to either commit or roll back. Just what you wanted.
However, many use cases are better solved with advisory locks - in versions before 9.5. You can still lock rows being processed with FOR UPDATE additionally to be safe. But if the next transaction just wants to process "the next free row" it's often much more efficient not to wait for the same row, which is almost certainly unavailable after the lock is released, but skip to the "next free" immediately.
In Postgres 9.5+ consider FOR UPDATE SKIP LOCKED for this. As @Craig commented, this can largely replace advisory locks.
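A sketch of the queue-style usage (the task table, its columns, and the returned id 42 are assumptions for the example):
-- Claim the next unprocessed row without waiting for rows other workers hold
BEGIN;
SELECT id FROM task
WHERE  state = 1
ORDER  BY id
LIMIT  1
FOR    UPDATE SKIP LOCKED;    -- rows locked by other transactions are skipped, not waited for
-- ... do the work for the returned id (say, 42) ...
UPDATE task SET state = 2 WHERE id = 42;
COMMIT;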
Related question stumbling over the same performance hog:
Function taking forever to run for large number of records
Explanation and code example for advisory locks or FOR UPDATE SKIP LOCKED in Postgres 9.5+:
Postgres UPDATE ... LIMIT 1
To lock many rows at once:
How to mark certain nr of rows in table on concurrent access
What you want is the fairly-common SQL SELECT ... FOR UPDATE. The Postgres-specific docs are here.
Using SELECT ... FOR UPDATE locks the selected rows for the rest of the transaction, giving you time to update them before another thread can lock or modify them (plain SELECTs are not blocked).
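Applied to the state machine from the question, under the default READ COMMITTED level (the item table and its columns are assumptions):
-- Thread 1
BEGIN;
SELECT * FROM item WHERE id = 42 AND state = 1 FOR UPDATE;   -- locks the row
UPDATE item SET state = 2 WHERE id = 42;
COMMIT;

-- Thread 2, started at almost the same time
BEGIN;
SELECT * FROM item WHERE id = 42 AND state = 1 FOR UPDATE;   -- blocks on thread 1's lock;
-- after thread 1 commits, the WHERE clause is re-checked against the new row version,
-- so no row is returned and thread 2 knows to bail out
COMMIT;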
We have a transaction that modifies a record. The transaction must call a web service and roll back if the service fails (so it can't commit beforehand). Because the record is modified, the client app holds a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without taking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to read_uncommitted isolation level because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting, and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not an isolation level, maybe some modifier I can tack onto my SQL? Anything?
I assume you are talking about JDBC isolation levels, not DB2 ones. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock that satisfied the predicates, while read_committed only keeps the lock until another row that matches the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are there any escalations occurring?
Have you tried CURRENTLY_COMMITTED (assuming you are on 9.7+)?
Before currently_committed there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and DB2_SKIPDELETED.
The lowest isolation level that returns only committed rows is read committed.
Usually, you process rows in a DB2 database like this:
1. Read the database row with no read locks (an ordinary SELECT with read committed).
2. Process the data so you have a row with changed values.
3. Read the database row again, this time with an update lock (SELECT ... FOR UPDATE).
4. Check that the database row from 1. matches the database row from 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to 2.
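The same pattern in rough SQL; the stock table, its columns, and the values are invented, and the exact FOR UPDATE / locking syntax varies slightly between DB2 and other databases:
-- 1. and 2.: read without locking, compute the new values in the application
SELECT qty, price FROM stock WHERE item_id = 42;

-- 3.: read again, this time holding an update lock
SELECT qty, price FROM stock WHERE item_id = 42 FOR UPDATE;

-- 4. and 5.: if both reads returned the same values, apply the change and commit
UPDATE stock SET qty = 17 WHERE item_id = 42;
COMMIT;

-- 6.: if the values differed, release the lock and redo the processing
ROLLBACK;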