PostgreSQL FOR UPDATE statement

PostgreSQL's default isolation level is READ COMMITTED. Now I have a transaction that consists of a single DELETE statement, and this DELETE has a subquery consisting of a SELECT statement that selects the rows to delete.
Is it true that I have to use FOR UPDATE in the SELECT statement to avoid conflicts with other transactions?
My thinking is the following: first the matching rows are read from the table, and in a second step those rows are deleted, so another transaction could interfere in between.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
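For concreteness, the statement in question would look something like this (the table name is taken from the question; the condition is hypothetical):

DELETE FROM myTable
WHERE id IN (SELECT id
             FROM myTable
             WHERE some_condition);  -- does this SELECT need FOR UPDATE?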

Is it true that I have to use FOR UPDATE in the SELECT statement to
avoid conflicts with other transactions?
What does "no conflicts with other transactions" mean to you? You can test this by opening two terminals and executing statements in each of them. Interleaved correctly, the DELETE statement will make the "other transaction" (the one that has its isolation level set to READ COMMITTED) wait until the deleting transaction commits or rolls back.
(First terminal:)
sandbox=# set transaction isolation level read committed;
SET
sandbox=# select * from customer;
 date_of_birth
---------------
 1996-09-29
 1996-09-28
(2 rows)

(Second terminal:)
sandbox=# begin transaction;
BEGIN
sandbox=# delete from customer
sandbox-# where date_of_birth = '1996-09-28';
DELETE 1

(First terminal:)
sandbox=# update customer
sandbox-# set date_of_birth = '1900-01-01'
sandbox-# where date_of_birth = '1996-09-28';
(Execution pauses here, waiting for the transaction in the other terminal.)

(Second terminal:)
sandbox=# commit;
COMMIT

(First terminal, once the commit has released the lock:)
UPDATE 0
sandbox=#
See below for the documentation.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do
I also have to use FOR UPDATE?
There's no such statement as DELETE ... FOR UPDATE.
You need to be sensitive to context when you're reading about database updates. Update can mean any change to a database; it can include inserting, deleting, and updating rows. In the docs cited below, "locked as though for update" is explicitly talking about UPDATE and DELETE statements, among others.
Current docs
FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being modified or
deleted by other transactions until the current transaction ends. That
is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE,
SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of
these rows will be blocked until the current transaction ends. The FOR
UPDATE lock mode is also acquired by any DELETE on a row, and also by
an UPDATE that modifies the values of certain columns. Currently, the
set of columns considered for the UPDATE case are those that have a
unique index on them that can be used in a foreign key (so partial
indexes and expressional indexes are not considered), but this may
change in the future. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE
from another transaction has already locked a selected row or rows,
SELECT FOR UPDATE will wait for the other transaction to complete, and
will then lock and return the updated row (or no row, if the row was
deleted).

Short version: the FOR UPDATE in a sub-select is not necessary because the DELETE implementation already does the necessary locking. It would be redundant.
Ideally you should read and digest Concurrency Control to learn how the concurrency issues are dealt with by the SQL engine.
Specifically for the case you're mentioning, I think these couple of excerpts are the most relevant, in Read Committed Isolation Level:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time.
However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress).
So one of your two concurrent DELETEs will be put to wait as soon as it tries to delete a row that the other one has already processed. That wait ends only when the other one commits or rolls back. In a way, that means the engine "detected the conflict" and serialized the two DELETEs in order to deal with it.
If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row.
In your scenario, after the first DELETE has committed and the second one is woken up, the second one will be unable to delete the row it was waiting for, because that row is no longer current: it is gone. That is not an error at this isolation level. Execution just goes on with the other rows, some of which may also have disappeared. Eventually the statement reports the actual number of rows it deleted, which may differ from the number the sub-select initially found before the statement was put to wait.
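As a concrete sketch of that interleaving (table name and row count are made up; run each session in its own psql terminal):

-- Session A:
BEGIN;
DELETE FROM myTable WHERE id < 100;
-- e.g. DELETE 42; the matching rows are now locked by session A

-- Session B, concurrently:
DELETE FROM myTable WHERE id < 100;
-- blocks, waiting on the rows locked by session A

-- Session A:
COMMIT;
-- Session B wakes up, re-checks each row, finds them already gone,
-- and reports DELETE 0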

Related

Postgres cursor update each row in separate transaction

I have a million-row table in Postgres 13 that needs a one-time update of each row: the (golang) script will read the current column value for each row, transform it, then update the row with the new value, for example:
DECLARE c1 CURSOR FOR SELECT v FROM users;
FETCH c1;
-- read and transform v
UPDATE users SET v = ? WHERE CURRENT OF c1;
-- transaction committed
FETCH c1;
...
I'm familiar with cursors for reading, but have a few requirements for writing that I'm struggling to find the right settings for:
I don't want it all to run in a single huge transaction, which is the default with cursors, since the change set will be large and it will take a while. I'd rather each update be its own transaction, and I can re-run the idempotent script again if it fails for any reason. I'm aware of DECLARE WITH HOLD to have the cursor span transactions, but...
By default the data read by the cursor is "insensitive" (a snapshot from when the cursor was first created), but I would like the latest data for each row when I FETCH, in case there has been a subsequent update. The solution to that is to use FOR UPDATE in the cursor query to make it "sensitive", but that is not allowed together with WITH HOLD. I would prefer the row lock you get with FOR UPDATE, to prevent the read-then-write race condition between FETCH and UPDATE, but it's not mandatory.
How can I iterate all rows and update them one at a time without having to read everything into memory first?
Make the cursor WITH HOLD, but select the pk rather than v. Then, in the loop, select the now-current v from the table by its pk (rather than with WHERE CURRENT OF), and update the row by its pk as well.
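A minimal sketch of that approach, assuming the table is users(id PRIMARY KEY, v) and that $1 and $2 stand for values bound by the (golang) application:

-- Declare once; WITH HOLD lets the cursor outlive the declaring transaction.
DECLARE c1 CURSOR WITH HOLD FOR SELECT id FROM users ORDER BY id;

-- Then, for each row, one small transaction:
BEGIN;
FETCH c1;                                      -- yields the next id ($1)
SELECT v FROM users WHERE id = $1 FOR UPDATE;  -- re-read the current v, locked
-- transform v in the application, then write it back:
UPDATE users SET v = $2 WHERE id = $1;
COMMIT;

-- When all rows have been processed:
CLOSE c1;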

Lock row, release later

I'm trying to understand how to lock a row, and only release that lock later.
I have a table like this:
create table testTable (Name varchar(100));
Some test data
insert into testTable (name) select 'Bob';
insert into testTable (name) select 'John';
insert into testTable (name) select 'Steve';
Now, I want to select one of those rows, and prevent other queries from seeing this row. I achieve that like this:
begin transaction;
select * from testTable where name = 'Bob' for update;
In another window, I do this:
select * from testTable for update skip locked;
Great, I don't see 'Bob' in that result set. Now, I want to do something with the retrieved row (Bob), and after I have done my work, I want to release that row again. The simple answer would be to do:
commit transaction
However, I am running multiple transactions on the same connection, so I can't just begin and commit transactions all over the show. Ideally I would like to have a "named" transaction, something like:
begin transaction 'myTransaction';
select * from testTable where name = 'Bob' for update;
-- do stuff with the data outside SQL, then later call...
commit transaction 'myTransaction';
But Postgres doesn't support that. I have found PREPARE TRANSACTION, but that seems to be a pear-shaped path I don't want to go down, especially as those transactions even persist through restarts.
Is there anyway I can have a reference to commit/rollback for a specific transaction?
You can have only one open transaction at a time in a database session, so the question as such is moot.
But I assume that you do not really want to run a transaction, you want to block access to a certain row for a while.
It is usually not a good idea to use regular database locks for such a purpose (the exception is advisory locks, which serve exactly that purpose, but are not tied to table rows). The problem is that long database transactions keep autovacuum from doing its job.
I recommend that you add a status column to the table and change the status rather than locking the row. That would serve the same purpose in a more natural fashion and make your problem go away.
If you are concerned that the status flag might not get cleared due to application logic problems, replace it with a visible_from column of type timestamp with time zone that initially contains -infinity. Instead of locking the row, set the value to current_timestamp + INTERVAL '5 minutes'. Only select rows that fulfill WHERE visible_from < current_timestamp. That way the “lock” will automatically expire after 5 minutes.
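A sketch of that variant, reusing testTable from the question (the column name visible_from comes from the suggestion above):

ALTER TABLE testTable
    ADD COLUMN visible_from timestamp with time zone NOT NULL DEFAULT '-infinity';

-- "Lock" Bob for five minutes instead of holding a row lock open:
UPDATE testTable
SET visible_from = current_timestamp + INTERVAL '5 minutes'
WHERE name = 'Bob';

-- Readers skip rows whose "lock" has not yet expired:
SELECT * FROM testTable WHERE visible_from < current_timestamp;

-- To release the row early, reset the column:
UPDATE testTable SET visible_from = '-infinity' WHERE name = 'Bob';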

Postgres: SELECT FOR UPDATE does not see new rows after lock release

While adding PostgreSQL support to my application, I found this strange behaviour.
Preparation:
CREATE TABLE test(id INTEGER, flag BOOLEAN);
INSERT INTO test(id, flag) VALUES (1, true);
Assume two concurrent transactions (Autocommit=false, READ_COMMITTED) TX1 and TX2:
TX1:
UPDATE test SET flag = FALSE WHERE id = 1;
INSERT INTO test(id, flag) VALUES (2, TRUE);
-- (wait, no COMMIT yet)
TX2:
SELECT id FROM test WHERE flag=true FOR UPDATE;
-- waits for TX1 to release lock
Now, if I COMMIT in TX1, the SELECT in TX2 returns an empty cursor.
That is strange to me, because the same experiment in Oracle and MariaDB results in selecting the newly created row (id=2).
I could not find anything about this behaviour in the PG documentation.
Am I missing something?
Is there any way to force PG server to "refresh" statement visibility after acquiring lock?
PS: PostgreSQL version 11.1
TX2 scans the table and tries to lock the results.
The scan sees the snapshot of the database from the start of the query, so it cannot see any rows that were inserted (or made eligible in some other way) by concurrent modifications that started after that snapshot was taken.
That is why you cannot see the row with the id 2.
For id 1, that is also true, so the scan finds that row. But the query has to wait until the lock is released. When that finally happens, it fetches the latest committed version of the row and performs the check again, so that row is excluded as well.
This “EvalPlanQual” recheck (to use PostgreSQL jargon) is only performed for rows that were found during the scan, but were locked. The second row isn't even found during the scan, so no such processing happens there.
This is a bit odd, admittedly. But it is not a bug, it is just the way PostgreSQL works.
If you want to avoid such anomalies, use the REPEATABLE READ isolation level. Then you will get a serialization error in such a case and can retry the transaction, thus avoiding inconsistencies like that.
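A sketch of what that looks like with the same test table; instead of an empty result, TX2 gets an error and can retry:

-- TX2, this time at REPEATABLE READ:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT id FROM test WHERE flag = true FOR UPDATE;
-- waits for TX1; once TX1 commits, this fails with
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;  -- retry the whole transaction from the start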

apparent transaction isolation violation in postgresql

I am using PostgreSQL 9.3.12 on CentOS Linux.
I have two processes connecting to the same database, using a default transaction isolation level of "read committed". According to the postgres docs, one process in a transaction should not "see" changes made by another process in a transaction until they are committed.
A sequence I am seeing is:
process A starts its transaction
process A deletes everything from table T
process B starts its transaction
process B attempts a select for update on one row in table T
process B comes up empty (0 rows) and calls rollback
process A repopulates table T from incoming data
process A commits its transaction
Now, table T should have been populated before both transactions began, and process B's query should have turned up one row. And it does if these processes do not run concurrently.
My understanding is that process B should see the old copy of the desired row in table T, make its changes, and then have those changes clobbered by process A's deletion and repopulation of table T. I can't figure out why process B is coming up empty.
Short of a complete misunderstanding of these preconditions on my part, can anyone think of another reason why I would see this behaviour?
Worry not about the lousy architecture, it is going away. I'm just trying to understand why this situation seems to violate the "read committed" transaction isolation as I understand it.
Thanks.
According to the postgres docs, one process in a transaction should
not "see" changes made by another process in a transaction until they
are committed.
Yes and no - as usual, it depends. The documentation says:
Read Committed is the default isolation level in PostgreSQL.
When a transaction uses this isolation level, a SELECT query (without
a FOR UPDATE/SHARE clause) sees only data committed before the query
began; it never sees either uncommitted data or changes committed
during query execution by concurrent transactions. In effect, a SELECT
query sees a snapshot of the database as of the instant the query
begins to run. However, SELECT does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed. Also note that two successive SELECT commands can see
different data, even though they are within a single transaction, if
other transactions commit changes after the first SELECT starts and
before the second SELECT starts.
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time. However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress). If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row. The
search condition of the command (the WHERE clause) is re-evaluated to
see if the updated version of the row still matches the search
condition. If so, the second updater proceeds with its operation using
the updated version of the row. In the case of SELECT FOR UPDATE and
SELECT FOR SHARE, this means it is the updated version of the row that
is locked and returned to the client.
In other words, a plain SELECT differs from SELECT FOR UPDATE, DELETE, and UPDATE.
You can create a simple test case to observe that behaviour:
Session 1
test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)
test=> DELETE FROM test;
DELETE 10
test=>
Now, in another session (Session 2):
test=> START TRANSACTION;
START TRANSACTION
test=> SELECT * FROM test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)
test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;
After the last command, SELECT ... FOR UPDATE, session 2 "hangs" and waits for something...
Back in session 1
test=> insert into test select * from generate_series(1,10);
INSERT 0 10
test=> commit;
COMMIT
And now when you go back to session 2 you will see this:
test=> SELECT * FROM test WHERE x = 5 FOR UPDATE;
 x
---
(0 rows)
test=> select * from test;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)
That is: a plain SELECT still doesn't see any changes, while SELECT ... FOR UPDATE does see that the rows have been deleted. But it doesn't see the new rows inserted by session 1.
In fact, the sequence you are seeing is:
process A starts its transaction
process A deletes everything from table T
process B starts its transaction
process B attempts a select for update on one row in table T
process B "hangs" and is waiting until session A does a commit or rollback
process A repopulates table T from incoming data
process A commits its transaction
process B comes up empty (0 rows, after process A's commit) and calls rollback

some questions about "select for update"?

Here is pseudo-code using libpq, but it does not behave the way I expect.
transaction begin
re1 = [select ics_time from table1 where c1=c11 and c2=c22 and c3=c33 and c4=c44 for update];
if (re1 satisfies the condition)
{
    re2 = [select id from table1 where c1=c11 and c2=c22 and c3=c33 and c4=c44 for update];
    delete from table1 where id = re2;
    delete from table2 where id = re2;
    delete from table3 where id = re2;
    insert a new record into table1, table2, table3 with c1, c2, c3, c4 as the primary key;
}
commit or rollback
Note that c1, c2, c3, c4 together form the primary key in the database, so there is only one row with these key values.
What confuses me is as follows:
There are two "select for update" statements which will lock the same row. In this code, does the second SQL statement wait for the exclusive lock taken by the first statement? But the actual situation is that it does not happen.
Something occurs beyond my expectation. In the log, I see a large number of duplicate-insert errors. My understanding was that "select for update" locks the row identified by the unique keys, so two processes proceed serially, and the insert operation comes after a delete. How can these duplicate insertions occur? Doesn't "select for update" take an exclusive lock on the row, blocking all other processes that want to lock the same row?
Regarding your first point: locks are not held by the statement, locks are held by the surrounding transaction. Your pseudo-code seems to use one connection with one transaction, which in turn issues several statements. So the second SELECT FOR UPDATE is not blocked by the first. Read the docs about locking on this:
[...] An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted. The lock is held until the transaction commits or rolls back, just like table-level locks. Row-level locks do not affect data querying; they block only writers to the same row.
Otherwise it would be very funny if a transaction could block itself so easily.
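A minimal illustration of that point, using table1 from the pseudo-code (:c11 and so on stand for the bound key values):

BEGIN;
SELECT ics_time FROM table1
WHERE c1 = :c11 AND c2 = :c22 AND c3 = :c33 AND c4 = :c44
FOR UPDATE;    -- acquires the exclusive row-level lock

SELECT id FROM table1
WHERE c1 = :c11 AND c2 = :c22 AND c3 = :c33 AND c4 = :c44
FOR UPDATE;    -- the same transaction already holds the lock:
               -- returns immediately, no self-blocking
COMMIT;        -- only now is the row lock released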
Regarding your second point: I cannot answer it, because (a) your pseudo-code is too pseudo for this problem, and (b) I don't understand what you mean by "processes" and the exact use case.