I have a table that could have two threads reading data from it. If the data is in a certain state (let's say state 1) then the process will do something (not relevant to this question) and then update the state to 2.
It seems to me that there could be a case where thread 1 and thread 2 both perform a select within microseconds of one another and both see that the row is in state 1, and then both do the same thing and 2 updates occur after locks have been released.
Question is: Is there a way to prevent the second thread from being able to modify this data in Postgres - AKA it is forced to do another SELECT after the lock from the first one is released for its update so it knows to bail in order to prevent dupes?
I looked into row locking, but it says you cannot prevent select statements which sounds like it won't work for my condition here. Is my only option to use advisory locks?
Your question, referencing an unknown source:
I looked into row locking, but it says you cannot prevent select
statements which sounds like it won't work for my condition here. Is
my only option to use advisory locks?
The official documentation on the matter:
Row-level locks do not affect data querying; they block only writers
and lockers to the same row.
Concurrent attempts will not just select but try to take out the same row-level lock with SELECT ... FOR UPDATE - which causes them to wait for any previous transaction holding a lock on the same row to either commit or roll back. Just what you wanted.
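A minimal sketch of that pattern, assuming a hypothetical table jobs with columns id and state (these names are not from the question):

```sql
BEGIN;

-- Take the row lock while reading. A second transaction running the same
-- statement blocks here until this transaction commits or rolls back.
SELECT state FROM jobs WHERE id = 42 FOR UPDATE;

-- The application checks the returned state. If it is 1, do the work, then:
UPDATE jobs SET state = 2 WHERE id = 42 AND state = 1;

COMMIT;
```

In the default READ COMMITTED isolation level, the waiting transaction should get the committed version of the row once the lock is released, so it sees state 2 and can bail out.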
However, many use cases are better solved with advisory locks - in versions before 9.5. You can still lock rows being processed with FOR UPDATE additionally to be safe. But if the next transaction just wants to process "the next free row" it's often much more efficient not to wait for the same row, which is almost certainly unavailable after the lock is released, but skip to the "next free" immediately.
In Postgres 9.5+ consider FOR UPDATE SKIP LOCKED for this. Like @Craig commented, this can largely replace advisory locks.
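A rough sketch of claiming "the next free row" with SKIP LOCKED, assuming a hypothetical table jobs with columns id and state (Postgres 9.5 or later):

```sql
BEGIN;

-- Claim the next unlocked row in state 1. Rows locked by other
-- transactions are skipped instead of waited on.
SELECT id
FROM   jobs
WHERE  state = 1
ORDER  BY id
LIMIT  1
FOR    UPDATE SKIP LOCKED;

-- Process the returned id (if any), then mark it done.
-- 42 stands in for the id the SELECT returned.
UPDATE jobs SET state = 2 WHERE id = 42;

COMMIT;
```

If the SELECT returns no row, every candidate is either processed or currently locked by another worker, and the transaction can simply end.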
Related question stumbling over the same performance hog:
Function taking forever to run for large number of records
Explanation and code example for advisory locks or FOR UPDATE SKIP LOCKED in Postgres 9.5+:
Postgres UPDATE ... LIMIT 1
To lock many rows at once:
How to mark certain nr of rows in table on concurrent access
What you want is the fairly-common SQL SELECT ... FOR UPDATE. The Postgres-specific docs are here.
Using SELECT FOR UPDATE will lock the selected records for the span of the transaction, allowing you time to update them before another thread can select.
I have a question concerning the FOR UPDATE Clause in CURSORs for IBM DB2 for z/OS.
Assume Isolation Level Cursor Stability (standard parameter in BIND command).
DB2 Version is 11.
My first question is: can a CURSOR that is coded with the FOR UPDATE clause prevent concurrent transactions from reading the row on which the CURSOR is currently positioned?
My second question is: does the UPDATE ... WHERE CURRENT OF ... statement detect when the updated row has been changed after the CURSOR has been opened and before it has been fetched from the CURSORs resultset?
I have read some contradictory statements on the web regarding these questions.
As of my (current) understanding, the FETCH operation only acquires an update lock on the fetched row, so concurrent transactions can at least read the same row. The U-Lock is only promoted to an X-Lock in case the UPDATE WHERE CURRENT OF CURSOR is actually done (dependent on application logic). But this confuses me, because it then would not prevent a lost update phenomenon (when the concurrent process is allowed to read the value before the update in the first process is done it continues its processing with the old value and overwrites the update of the first process which has updated via CURRENT OF CURSOR).
Can a cursor that is coded with the FOR UPDATE clause prevent concurrent transactions from reading the row on which the cursor is currently positioned?
No - with isolation level CS, Db2 will hold a U lock on the current row, which is compatible with the S locks potentially required by readers (see later comments about the CURRENTDATA bind parameter and its impact on avoidance of the S lock for readers).
Does the UPDATE ... WHERE CURRENT OF statement detect when the updated row has been changed after the cursor has been opened and before it has been fetched from the CURSORs result set?
No - with isolation level CS, Db2 will not acquire a lock until the row is read. If you require the data to remain unchanged after OPEN CURSOR you need a different isolation level.
But this confuses me, because it then would not prevent a lost update phenomenon (when the concurrent process is allowed to read the value before the update in the first process is done it continues its processing with the old value and overwrites the update of the first process which has updated via CURRENT OF CURSOR).
Assuming both transactions are using FOR UPDATE and UPDATE ... WHERE CURRENT OF this scenario cannot happen. Each read would attempt to acquire a U lock. Since U locks are incompatible with each other the second read would wait on the first U lock to be released. (https://www.ibm.com/docs/en/db2-for-zos/12?topic=locks-lock-modes-compatibility)
For the more complex case where one (or both) of the transactions are not using FOR UPDATE and UPDATE ... WHERE CURRENT OF there are opportunities for the lost update phenomenon to occur.
Long ago, Db2 introduced bind parameter CURRENTDATA to help control this behavior.
CURRENTDATA(NO) (default as of Db2 10) - Attempt lock avoidance where possible but with an increased risk of obtaining non-current data
CURRENTDATA(YES) - Acquire S locks to reduce the risk of obtaining non-current data. It's important to note that CURRENTDATA(YES) does not completely eliminate the risk of non-current data.
Db2 manual - Choosing CURRENTDATA Option
Gareth has some great articles on this with much more detail - Db2 for z/OS Locking for Application Developers Part 8
To completely guard against the risk of losing an update, a good approach is to add predicates to ensure the update only occurs against the expected data. Gareth provides three options for this in Part 9 of his blog on locking. The general idea is to have something like an update timestamp that is always updated when any part of the row is updated. Then include a predicate in the WHERE clause of the UPDATE statement to ensure that the update will only occur if the update timestamp is the same as when the row was originally read. The ROW CHANGE TIMESTAMP feature in Db2 9 makes this approach easier.
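As a hedged sketch of that idea in embedded SQL: the table account, its columns, and the host variables below are hypothetical, and the table is assumed to have a column defined with GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP.

```sql
-- Read the row and remember its change timestamp.
SELECT balance, ROW CHANGE TIMESTAMP FOR account
  INTO :balance, :read_ts
  FROM account
 WHERE id = :id;

-- Update only if nobody has changed the row since it was read.
UPDATE account
   SET balance = :new_balance
 WHERE id = :id
   AND ROW CHANGE TIMESTAMP FOR account = :read_ts;

-- SQLCODE +100 (no row updated) means the row changed in the meantime:
-- re-read it and decide whether to retry.
```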
Can I write an UPDATE statement that will simply not bother executing if there's a deadlock?
I have a small, but frequently updated table.
This statement is run quite frequently on it....
UPDATE table_a SET lastChangedTime = 'blah' WHERE pk = 1234;
Where pk is the primary key.
Every now and again this statement gets blocked. That's not in itself a big deal; the issue is that each time there's a lock it seems to take a minute or two for Postgres to sort itself out, and I can lose a lot of data.
table_a is very volatile, and lastChangedTime gets altered all the time, so rather than occasionally having to wait two minutes for the UPDATE to get executed, I'd rather it just didn't bother. Ok, my data might not be as up-to-date as I'd like for this one record, but at least I wouldn't have locked the whole table for 2 minutes.
Update following comments:
the application interacts very simply with the database, it only issues simple, one line UPDATE and INSERT statements, and commits each one immediately. One of the issues causing me a lot of head scratching is how can something work a million times without problem, and then just fail on another record that appears to be identical to all the others.
Final suggestion/question.....
The UPDATE statement is being invoked from a C# application. If I change the 'command timeout' to a very short value - say 1 millisecond would that have the desired effect? or might it end up clogging up the database with lots of broken transactions?
To avoid waiting for locks, first run
SELECT 1 FROM table_a WHERE pk = 1234 FOR UPDATE NOWAIT;
If there is a lock on the row, the statement will fail immediately, and you can go on working on something different.
Mind that the SELECT ... FOR UPDATE statement has to be in the same database transaction as your UPDATE statement.
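Put together, the transaction might look like this sketch (the UPDATE is the one from the question; the error handling is left to the application):

```sql
BEGIN;

-- Fails at once with SQLSTATE 55P03 (lock_not_available) if another
-- transaction holds a lock on this row.
SELECT 1 FROM table_a WHERE pk = 1234 FOR UPDATE NOWAIT;

-- Only reached when the lock was obtained, so this does not wait on the row.
UPDATE table_a SET lastChangedTime = 'blah' WHERE pk = 1234;

COMMIT;

-- If the SELECT raised the error, issue ROLLBACK instead and move on.
```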
As general advice, you should use shorter transactions, which will reduce the length of lock waits and the risk of deadlock.
I have a table called deposits
When a deposit is made, the table is locked, so the query looks something like:
SELECT * FROM deposits WHERE id=123 FOR UPDATE
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
The problem occurs though, when other deposits are trying to get the lock for the table. What happens is, somewhere in between locking the table and calling psql_commit() something is failing and keeping the lock for a stupidly long amount of time. There are a couple of things I need help addressing:
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
Thanks for your help guys, I will keep poking around but I haven't had much luck. Is this a non-existing function of psql, because I found this: http://www.postgresql.org/message-id/40286F1F.8050703#optusnet.com.au
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
Nope. FOR UPDATE locks only those rows, so that another transaction that attempts to lock them (with FOR SHARE, FOR UPDATE, UPDATE or DELETE) blocks until your transaction commits or rolls back.
If you want a whole table lock that blocks inserts/updates/deletes you probably want LOCK TABLE ... IN EXCLUSIVE MODE.
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
See the lock_timeout setting. This was added in 9.3 and is not available in older versions.
Crude approximations for older versions can be achieved with statement_timeout, but that can lead to statements being cancelled unnecessarily. If statement_timeout is 1s and a statement waits 950ms on a lock, it might then get the lock and proceed, only to be immediately cancelled by a timeout. Not what you want.
There's no query-level way to set lock_timeout, but you can and should just:
SET LOCAL lock_timeout = '1s';
after you BEGIN a transaction.
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
There is a statement timeout, but locks are held at transaction level. There's no transaction timeout feature.
If you're running single-statement transactions you can just set a statement_timeout before running the statement to limit how long it can run for. This isn't quite the same thing as limiting how long it can hold a lock, though, because it might wait 900ms of an allowed 1s for the lock, only actually hold the lock for 100ms, then get cancelled by the timeout.
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
No. You must:
BEGIN;
SET LOCAL lock_timeout = '4s';
SELECT ....;
COMMIT;
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
SET LOCAL is suitable, and preferred, for this.
There's no way to do it in the text of the query, it must be a separate statement.
The mailing list post you linked to is a proposal for an imaginary syntax that was never implemented (at least in a public PostgreSQL release) and does not exist.
In a situation like this you may want to consider "optimistic concurrency control", often called "optimistic locking". It gives you greater control over locking behaviour at the cost of increased rates of query repetition and the need for more application logic.
I want a user to be able to lock a row for an indefinite time while working with it, and to unlock it when done, so that no other user can lock that row for themselves in the meantime. Is it possible to do this at the database level?
You can do it with a long-lived transaction, but there'll be performance issues with that. This sounds like more of a job for optimistic concurrency control.
You can just open a transaction and do a SELECT 1 FROM mytable WHERE clause to match row FOR UPDATE;. Then keep the transaction open until you're done. The problem with this is that it can cause issues with vacuum that result in table and index bloat, where tables get filled with deleted data and indexes fill up with entries pointing to obsolete blocks.
It'd be much better to use an advisory lock. You still have to hold the connection that holds the lock open, but it doesn't have to keep an open idle transaction, so it's much lower impact. Transactions that wish to update the row must explicitly check for a conflicting advisory lock, though, otherwise they can just proceed as if it wasn't locked. This approach also scales poorly to lots of tables (due to limited advisory lock namespace) or lots of concurrent locks (due to number of connections).
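A small sketch of the advisory-lock approach using the two-key form of the built-in functions. The key values are an assumption for illustration: 1 as an application-chosen tag for the table, 42 as the row's ID.

```sql
-- Try to take a session-level advisory lock. Returns true or false
-- immediately instead of waiting.
SELECT pg_try_advisory_lock(1, 42);

-- ... the user works with the row; no transaction needs to stay open ...

-- Release it explicitly when done (it is also released when the session ends).
SELECT pg_advisory_unlock(1, 42);
```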
You can use a trigger to check for the advisory lock and wait for it if you can't make sure your client apps will always get the advisory lock explicitly. However, this can create deadlock issues.
For that reason, the best approach is probably to have a locked_by field that records a user ID, and a locked_time field that records when it was locked. Do it at the application level and/or with triggers. To deal with concurrent attempts to obtain the lock you can use optimistic concurrency control techniques, where the WHERE clause on the UPDATE that sets locked_by and locked_time will not match if someone else gets there first, so the rowcount will be zero and you'll know you lost the race for the lock and have to re-check. That WHERE clause usually tests locked_by and locked_time. So you'd write something like:
UPDATE t
SET locked_by = 'me', locked_time = current_timestamp
WHERE locked_by IS NULL AND locked_time IS NULL
AND id = [ID of row to update];
(This is a simplified optimistic locking mode for grabbing a lock, where you don't mind if someone else jumped in and did an entire transaction. If you want stricter ordering, you use a row-version column or you check that a last_modified column hasn't changed.)
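For the stricter row-version variant, a sketch might look like the following, assuming hypothetical columns payload and row_version on t:

```sql
-- Read the row together with its version.
SELECT id, payload, row_version FROM t WHERE id = 42;

-- Write back only if the version is still the one that was read (7 here).
UPDATE t
SET    payload = 'new value',
       row_version = row_version + 1
WHERE  id = 42
AND    row_version = 7;

-- Zero rows updated means someone else changed the row first:
-- re-read it and retry or report a conflict.
```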
In postgres if I run the following statement
update table set col = 1 where col = 2
In the default READ COMMITTED isolation level, from multiple concurrent sessions, am I guaranteed that:
In a case of a single match only 1 thread will get a ROWCOUNT of 1 (meaning only one thread writes)
In a case of a multi match that only 1 thread will get a ROWCOUNT > 0 (meaning only one thread writes the batch)
Your stated guarantees apply in this simple case, but not necessarily in slightly more complex queries. See the end of the answer for examples.
The simple case
Assuming that col is unique, that exactly one row has the value 2, or that the ordering is stable so every UPDATE matches the same rows in the same order:
What'll happen for this query is that the threads will find the row with col=2 and all try to grab a write lock on that tuple. Exactly one of them will succeed. The others will block waiting for the first thread's transaction to commit.
That first tx will write, commit, and return a rowcount of 1. The commit will release the lock.
The other tx's will again try to grab the lock. One by one they'll succeed. Each transaction will in turn go through the following process:
Obtain the write lock on the contested tuple.
Re-check the WHERE col=2 condition after getting the lock.
The re-check will show that the condition no longer matches so the UPDATE will skip that row.
The UPDATE has no other rows so it will report zero rows updated.
Commit, releasing the lock for the next tx trying to get hold of it.
In this simple case the row-level locking and the condition re-check effectively serializes the updates. In more complex cases, not so much.
You can easily demonstrate this. Open say four psql sessions. In the first, lock the table with BEGIN; LOCK TABLE test;*. In the rest of the sessions run identical UPDATEs - they'll block on the table level lock. Now release the lock by COMMITting your first session. Watch them race. Only one will report a row count of 1, the others will report 0. This is easily automated and scripted for repetition and scaling up to more connections/threads.
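The demonstration could be set up roughly like this. The one-column definition of test is an assumption, since the question does not show the table:

```sql
-- Setup, run once:
CREATE TABLE test (col integer);
INSERT INTO test VALUES (2);

-- Session 1: hold a table-level lock.
BEGIN;
LOCK TABLE test;

-- Sessions 2 to 4: each runs this and blocks.
UPDATE test SET col = 1 WHERE col = 2;

-- Session 1: release the lock and let the others race.
COMMIT;
```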
To learn more, read rules for concurrent writing, page 11 of PostgreSQL concurrency issues - and then read the rest of that presentation.
And if col is non-unique?
As Kevin noted in the comments, if col isn't unique, so that you might match multiple rows, then different executions of the UPDATE could get different orderings. This can happen if they choose different plans (say one is via a PREPARE and EXECUTE and another is direct, or you're messing with the enable_ GUCs) or if the plan they all use uses an unstable sort of equal values. If they get the rows in a different order then tx1 will lock one tuple, tx2 will lock another, then they'll each try to get locks on each other's already-locked tuples. PostgreSQL will abort one of them with a deadlock exception. This is yet another good reason why all your database code should always be prepared to retry transactions.
If you're careful to make sure concurrent UPDATEs always get the same rows in the same order you can still rely on the behaviour described in the first part of the answer.
Frustratingly, PostgreSQL doesn't offer UPDATE ... ORDER BY so ensuring that your updates always select the same rows in the same order isn't as simple as you might wish. A SELECT ... FOR UPDATE ... ORDER BY followed by a separate UPDATE is often safest.
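A sketch of that two-step approach, assuming test also has a unique id column to order by (not shown in the question):

```sql
BEGIN;

-- Lock the matching rows in a fixed order so concurrent transactions
-- queue up behind each other instead of deadlocking.
SELECT id FROM test WHERE col = 2 ORDER BY id FOR UPDATE;

-- Those rows are now locked by this transaction, so the UPDATE does not
-- wait on them.
UPDATE test SET col = 1 WHERE col = 2;

COMMIT;
```

Note that a row with col = 2 committed by another session between the two statements would be matched by the UPDATE without having been locked by the SELECT, so this is a starting point rather than a complete guarantee.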
More complex queries, queuing systems
If you're doing queries with multiple phases, involving multiple tuples, or conditions other than equality you can get surprising results that differ from the results of a serial execution. In particular, concurrent runs of anything like:
UPDATE test SET col = 1 WHERE col = (SELECT t.col FROM test t ORDER BY t.col LIMIT 1);
or other efforts to build a simple "queue" system will *fail* to work how you expect. See the PostgreSQL docs on concurrency and this presentation for more info.
If you want a work queue backed by a database there are well-tested solutions that handle all the surprisingly complicated corner cases. One of the most popular is PgQ. There's a useful PgCon paper on the topic, and a Google search for 'postgresql queue' is full of useful results.
* BTW, instead of a LOCK TABLE you can use SELECT 1 FROM test WHERE col = 2 FOR UPDATE; to obtain a write lock on just that one tuple. That'll block updates against it but not block writes to other tuples or block any reads. That allows you to simulate different kinds of concurrency issues.