Is SELECT FOR UPDATE over multiple rows atomic in PostgreSQL?

Let's say that there are multiple parallel transactions that all run the same query:
SELECT * FROM table1 FOR UPDATE;
Can this result in a deadlock?
To put it another way: is the "lock all rows" operation in the above statement atomic, or are the locks acquired one by one as the records are processed?

Yes, it can result in a deadlock.
This is pretty easy to demonstrate. Set up a test table:
CREATE TABLE t AS SELECT i FROM generate_series(1,1000000) s(i);
... and then run these two queries in parallel:
SELECT i FROM t ORDER BY i FOR UPDATE;
SELECT i FROM t ORDER BY i DESC FOR UPDATE;
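You can prevent deadlocks by ensuring that all processes acquire their locks in the same order. Alternatively, if you want to lock every record in the table, you can do it atomically with a table lock. The mode must conflict with the ROW SHARE lock that SELECT ... FOR UPDATE takes on the table, which means EXCLUSIVE or ACCESS EXCLUSIVE:
LOCK TABLE t IN EXCLUSIVE MODE;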
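To illustrate the "same order" rule outside the database, here is a minimal Python sketch. It is only an analogy: the `row_locks` dictionary and the `lock_all_rows` helper are made up for this example, with one mutex standing in for each row lock. Both workers ask for the rows in opposite orders, like the two queries above, but each sorts the ids before locking, so neither can end up holding a lock the other is waiting for.

```python
import threading

# Hypothetical stand-in for row locks: one mutex per "row" id.
row_locks = {i: threading.Lock() for i in range(1, 6)}


def lock_all_rows(row_ids, finished, name):
    """Acquire every requested row lock in ascending id order, then release."""
    ordered = sorted(row_ids)  # the order every worker agrees on
    for i in ordered:
        row_locks[i].acquire()
    try:
        finished.append(name)  # the "work" done while holding all locks
    finally:
        for i in reversed(ordered):
            row_locks[i].release()


def run_demo():
    finished = []
    # The callers pass the ids in opposite orders; sorting removes the conflict.
    t1 = threading.Thread(target=lock_all_rows, args=([1, 2, 3, 4, 5], finished, "asc"))
    t2 = threading.Thread(target=lock_all_rows, args=([5, 4, 3, 2, 1], finished, "desc"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(finished)
```

If you drop the `sorted` call, the two threads can each grab one end of the range and wait on the other forever, which is the situation PostgreSQL detects and reports as a deadlock.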

Related

Does Postgres lock all rows in a query atomically, even across different tables via JOIN?

I am getting a deadlock error in my code. The odd part is that the deadlock happens on the very first query of the transaction. This query joins two tables, TableA and TableB, and should lock a single row in TableA with id==table_a_id, plus all the rows in TableB whose foreign key references table_a_id.
The query looks as follows (I am using SQLAlchemy; this is the output of printing the query, and the SQLAlchemy code is shown below as well):
SELECT TableB.id AS TableB_id
FROM TableA JOIN TableB ON TableA.id = TableB.table_a_id
WHERE TableB.id = %(id_1)s FOR UPDATE
The query looks as follows in SQLAlchemy syntax:
query = (
    database.query(TableB.id)
    .select_from(TableA)
    .filter_by(id=table_a_id)
    .join((TableB, TableA.id == TableB.table_a_id))
    .with_for_update()
)
return query.all()
My question is, will this query atomically lock all those rows from both tables? If so, why would I get a deadlock already exactly on this query, given it's the first query of the transaction?
The query will lock the rows one after the other as they are selected. The exact order will depend on the execution plan. Perhaps you can add FOR UPDATE OF table_name to lock rows only in the table where you need them locked.
I have two more ideas:
Rewrite the query so that it locks the rows in a certain order:
WITH b AS MATERIALIZED (
   SELECT id, table_a_id
   FROM tableb
   WHERE id = 42
   FOR NO KEY UPDATE
)
SELECT b.id
FROM tablea
   JOIN b ON tablea.id = b.table_a_id
ORDER BY tablea.id
FOR NO KEY UPDATE OF tablea;
Performance may not be as good, but if everybody selects like that, you won't get a deadlock.
Lock the tables:
LOCK TABLE tablea, tableb IN EXCLUSIVE MODE;
That lock will prevent concurrent row locks and data modifications, so you will be safe from a deadlock.
Only do that as a last-ditch effort, and don't do it too often. If you frequently take high table locks like that, you keep autovacuum from running and endanger the health of your database.

REPEATABLE READ isolation level - successive SELECTs return different values

I'm trying to check the REPEATABLE READ isolation level in PostgreSQL, specifically the statement:
successive SELECT commands within a single transaction see the same
data, i.e., they do not see changes made by other transactions that
committed after their own transaction started.
To check that, I run two scripts in two Query Editors in pgAdmin.
The 1st one:
begin ISOLATION LEVEL REPEATABLE READ;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
select * into t1 from topic where id=1;
select pg_sleep(5);
select * into t2 from topic where id=1 ;
commit;
Here we have two successive SELECTs with a pause of 5 seconds between them, just to have time to run the 2nd (UPDATE) script. t1 and t2 are tables used to save the results, so that I can check the selected data after the scripts have run.
I run the 2nd script immediately after the first:
begin ISOLATION LEVEL REPEATABLE READ;
update topic set title='new value' where id=1;
commit;
It must commit after the 1st select but before the 2nd one.
The problem is that the two successive SELECTs return different values: the results in t1 and t2 differ. I expected them to be the same. Could you explain why this is happening?
Maybe pg_sleep starts transaction implicitly?

How can you use 'For update skip locked' in postgres without locking rows in all tables used in the query?

I want to use PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED so that two different users reading from a table and claiming tasks do not block each other and do not get tasks already being read by another user.
The query uses a join to retrieve the tasks. We do not want row-level locks on any table except the one that contains the main info. In the sample query below, only the rows in the table 'task' should be locked:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update skip locked;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: As per the comment below, the query will be changed to:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo) ORDER BY v.id limit 200 for update of v skip locked;
Where is the best place for ORDER BY so that the rows come back ordered? The order will be affected if multiple users invoke this command at once, but the rows each user gets back should still be in a consistent order.
Also, does this ensure that multiple threads invoking the same select query retrieve different sets of rows, or does the locking only apply to update commands?
I just experimented with this a little: multiple select queries do end up retrieving different sets of rows. Also, ORDER BY ensures the order of the final result.
Yes,
FOR UPDATE OF "TABLE_NAME" SKIP LOCKED
will lock only TABLE_NAME
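For intuition about why concurrent callers get disjoint batches, here is a small in-memory Python sketch of the ORDER BY ... LIMIT ... FOR UPDATE SKIP LOCKED behaviour. It is a toy model, not PostgreSQL: the `Task` class and the `claim_tasks` and `release` helpers are invented for this example, and a non-reentrant mutex stands in for the row lock held by another transaction.

```python
import threading


class Task:
    def __init__(self, task_id):
        self.id = task_id
        self.lock = threading.Lock()  # stand-in for the row-level lock


def claim_tasks(tasks, limit):
    """Walk the tasks in id order and claim up to `limit` unlocked ones."""
    claimed = []
    for task in sorted(tasks, key=lambda t: t.id):
        if len(claimed) == limit:
            break
        # Non-blocking acquire: a task someone else holds is skipped, not waited for.
        if task.lock.acquire(blocking=False):
            claimed.append(task)
    return claimed


def release(claimed):
    """Release the claims, like COMMIT or ROLLBACK ending the transaction."""
    for task in claimed:
        task.lock.release()


tasks = [Task(i) for i in range(1, 11)]
batch_a = claim_tasks(tasks, 3)  # "user A"
batch_b = claim_tasks(tasks, 3)  # "user B", while A still holds its claims
ids_a = [t.id for t in batch_a]
ids_b = [t.id for t in batch_b]
release(batch_a)
release(batch_b)
```

Each batch is ordered by id, and the second caller simply starts where the first one's locks end. Only the task objects carry a lock here, which mirrors restricting the row locks to one table with FOR UPDATE OF.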

select from final table(update table) concurrent select i.e select from two threads

I have a concurrency issue in my project: two threads come in together and run a select at the same time, and both receive the same value, which ideally should not happen.
After selecting a value, the first thread should perform an update, and only then should the second thread select the updated value.
I am using DB2.
I thought of using this approach:
select number from final table(update tablename set columnname=""
where )
My question is: would this approach lock the row when the other thread comes in to select the value, since there is an update within the select, and would that solve my concurrency issue?
OR
I was browsing and found another approach
update table (.....) select col from table where wait for
outcome
Would this select wait until the first thread finishes the select?
One thing you can certainly do to avoid multiple reads of the same value before it gets updated by one of the readers:
LOCK TABLE tablename IN EXCLUSIVE MODE;
SELECT id, ... FROM tablename WHERE ...;
UPDATE tablename SET id=newval WHERE ...;
COMMIT;
This will of course block the full table, which is maybe not what you want!
Alternative approach (relatively standard, but with somewhat more involved programming logic):
1) SELECT id, ... FROM tablename WHERE ...
2) SELECT count(1) FROM FINAL TABLE (UPDATE tablename SET id=newval
WHERE ... AND id=oldval);
where oldval is the id that was read in step 1)
3) While this count(1) is zero (meaning someone else updated the row in the meantime), repeat from 1)
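The read, conditional update, and retry loop can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library's SQLite driver as a stand-in for DB2, the `counter` table and `claim_next_number` function are hypothetical, and the cursor's `rowcount` plays the role of the count(1) over FINAL TABLE.

```python
import sqlite3


def claim_next_number(conn, max_attempts=10):
    """Read the counter, then advance it only if nobody changed it meanwhile."""
    for _ in range(max_attempts):
        (old,) = conn.execute(
            "SELECT value FROM counter WHERE name = 'seq'"
        ).fetchone()
        cur = conn.execute(
            "UPDATE counter SET value = ? WHERE name = 'seq' AND value = ?",
            (old + 1, old),  # only matches if the value is still what we read
        )
        conn.commit()
        if cur.rowcount == 1:  # our conditional update won
            return old
        # rowcount == 0: another writer got there first, so re-read and retry
    raise RuntimeError("too much contention, giving up")


# Hypothetical one-row counter table, created in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counter VALUES ('seq', 100)")
conn.commit()

first = claim_next_number(conn)
second = claim_next_number(conn)
```

The point of the design is that no lock is held between the read and the write; correctness comes from the extra condition on the old value, at the price of an occasional retry.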
--Peter Vanroose,
ABIS Training & Consulting,
Leuven, Belgium.

How to list all locked rows of a table?

My application uses pessimistic locking. When a user opens the form to update a record, the application executes this query (table names are examples):
begin;
select *
from master m
natural join detail d
where m.master_id = 123456
for update nowait;
The query locks one master row and several (up to several dozen) detail rows. The transaction stays open until the user confirms or cancels the updates.
I need to know which rows (at least the master rows) are locked. I have dug through the documentation and the Postgres wiki without success.
Is it possible to list all locked rows?
PostgreSQL 9.5 added a new option to FOR UPDATE that provides a straightforward way to do this.
SELECT master_id
FROM master
WHERE master_id NOT IN (
SELECT master_id
FROM master
FOR UPDATE SKIP LOCKED);
This acquires locks on all the not-currently-locked rows, so think through whether that's a problem for you, especially if your table is large. If nothing else, you'll want to avoid doing this in an open transaction. If your table is huge you can apply additional WHERE conditions and step through it in chunks to avoid locking everything at once.
Is it possible? Probably yes, but it is the Greatest Mystery of Postgres. I think you would need to write your own extension for it (*).
However, there is an easy way to work around the problem. You can use a very nice Postgres feature: advisory locks. You can interpret the two arguments of the function pg_try_advisory_lock(key1 int, key2 int) as the table oid (key1) and the row id (key2). Then
select pg_try_advisory_lock(('master'::regclass)::integer, 123456)
takes an advisory lock for row 123456 of table master, if it was not locked earlier. The function returns a boolean.
After the update, the lock has to be released:
select pg_advisory_unlock(('master'::regclass)::integer, 123456)
And the nicest thing is that you can list the locked rows:
select classid::regclass, objid
from pg_locks
where locktype = 'advisory'
Advisory locks can complement regular locks, or you can use them on their own. The second option is very tempting, as it can significantly simplify the code. But apply it with caution, because you have to make sure that all updates (and deletes) on the table, in all applications, are performed with this locking.
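To make the protocol concrete, here is a toy in-memory Python model of two-key advisory locks. It does not talk to PostgreSQL: the `AdvisoryLocks` class, the session names, and the `MASTER_OID` value are all invented for illustration. It only mirrors the behaviour described above, where a try-lock returns a boolean, an unlock frees the key, and the held keys can be listed.

```python
import threading


class AdvisoryLocks:
    """Toy model of (key1, key2) advisory locks; not PostgreSQL itself."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = {}  # (key1, key2) -> [owner, hold_count]

    def try_lock(self, key1, key2, owner):
        """Return True if `owner` now holds the lock, False if someone else does."""
        with self._mutex:
            entry = self._held.get((key1, key2))
            if entry is None:
                self._held[(key1, key2)] = [owner, 1]
                return True
            if entry[0] == owner:
                entry[1] += 1  # repeated requests by the same owner stack
                return True
            return False

    def unlock(self, key1, key2, owner):
        """Release one hold; return False if `owner` does not hold the lock."""
        with self._mutex:
            entry = self._held.get((key1, key2))
            if entry is None or entry[0] != owner:
                return False
            entry[1] -= 1
            if entry[1] == 0:
                del self._held[(key1, key2)]
            return True

    def held(self):
        """List the currently locked (key1, key2) pairs."""
        with self._mutex:
            return sorted(self._held)


MASTER_OID = 16384  # hypothetical oid standing in for 'master'::regclass
locks = AdvisoryLocks()
got_first = locks.try_lock(MASTER_OID, 123456, owner="session-1")
got_second = locks.try_lock(MASTER_OID, 123456, owner="session-2")
listed = locks.held()
released = locks.unlock(MASTER_OID, 123456, owner="session-1")
```

As with the real feature, nothing stops a writer that never asks for the lock, which is why every code path that updates the table has to go through it.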
(*) Mr. Tatsuo Ishii did it (I did not know about it and have only just found it).