Some questions about "SELECT FOR UPDATE" (PostgreSQL)

Here is pseudocode using libpq, but it does not behave the way I expect:
transaction begin
re1 = [select ics_time from table1 where c1=c11 and c2=c22 and c3=c33 and c4=c44 for update];
if (re1 satisfies the condition)
{
    re2 = [select id from table1 where c1=c11 and c2=c22 and c3=c33 and c4=c44 for update];
    delete from table1 where id = re2;
    delete from table2 where id = re2;
    delete from table3 where id = re2;
    insert a new record into table1, table2, table3 with c1, c2, c3, c4 as primary keys;
}
commit or rollback
Note that c1, c2, c3, c4 together form the primary key, so there is at most one row with these keys in the database.
What confuses me is the following:
There are two "select for update" statements that lock the same row. In this code, does the second statement have to wait for the exclusive lock taken by the first one? In practice, it does not.
Something happens that I did not expect: in the log I see a large number of duplicate-key insert errors. My understanding was that "select for update" locks the row identified by the unique keys, so two processes would proceed serially, and the insert would only run after a delete. How can these duplicate insertions occur? Doesn't "select for update" take an exclusive lock on the row, blocking all other processes that want to lock the same row?

Regarding your first point: locks are not held by the statement, locks are held by the surrounding transaction. Your pseudo-code seems to use one connection with one transaction, which in turn runs several statements. So the second SELECT FOR UPDATE is not blocked by the first. Read the docs about locking:
[...]An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted. The lock is held until the transaction commits or rolls back, just like table-level locks. Row-level locks do not affect data querying; they block only writers to the same row.
Otherwise it would be very funny if a transaction could block itself so easily.
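For illustration, a minimal sketch of this in a single session (table and column names follow the pseudocode above; the literal value is a placeholder):

BEGIN;
SELECT id FROM table1 WHERE c1 = 'c11' FOR UPDATE;  -- acquires the row lock
SELECT id FROM table1 WHERE c1 = 'c11' FOR UPDATE;  -- same transaction: returns immediately, no self-blocking
COMMIT;

A second session running the same SELECT ... FOR UPDATE on that row would block until this transaction commits or rolls back.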
Regarding your second point: I cannot answer this because a) your pseudo-code is too pseudo for this problem and b) I don't understand what you mean by "processes" and what the exact use case is.

Related

Should I lock a PostgreSQL table when invoking setval for sequence with the max ID function?

I have the following SQL script which sets the sequence value corresponding to the maximum value of the ID column:
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
Should I lock 'mytable' in this case in order to prevent the ID from changing in a parallel request, such as in the example below?
request #1            request #2
MAX(id)=5
                      inserted id 6
SETVAL=5
Or is setval(max(id)) an atomic operation?
Your suspicion is right, this approach is subject to race conditions.
But locking the table won't help, because it won't keep a concurrent transaction from fetching new sequence values. This transaction will block while the table is locked, but will happily continue inserting once the lock is gone, using a sequence value it got while the table was locked.
If it were possible to lock sequences, that might be a solution, but sequences cannot be locked.
I can think of two solutions:
Remove all privileges on the sequence while you modify it, so that concurrent requests to the sequence will fail. That causes errors, of course.
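A sketch of that first approach (app_user is a placeholder for whatever role performs the inserts):

REVOKE ALL ON SEQUENCE mytable_id_seq FROM app_user;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
GRANT USAGE ON SEQUENCE mytable_id_seq TO app_user;

While the privilege is revoked, concurrent nextval() calls by app_user fail with a permission error instead of handing out a value that is about to become stale.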
The pragmatic way: use
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1) + 100000) FROM mytable;
Here 100000 is a value that is safely bigger than the number of rows that might get inserted while your operation is running.
You can use two requests in the same transaction:
ALTER SEQUENCE mytable_id_seq RESTART;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
Note: the first command will lock the sequence for other transactions.
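For illustration, the two statements wrapped in one transaction (sequence and table names as in the question):

BEGIN;
ALTER SEQUENCE mytable_id_seq RESTART;
SELECT SETVAL('mytable_id_seq', COALESCE(MAX(id), 1)) FROM mytable;
COMMIT;

The lock taken by ALTER SEQUENCE is held until the COMMIT, so concurrent nextval() calls wait for the transaction to finish.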

Lock row, release later

I'm trying to understand how to lock a row, and only release that lock later.
I have a table like this:
create table testTable (Name varchar(100));
Some test data
insert into testTable (name) select 'Bob';
insert into testTable (name) select 'John';
insert into testTable (name) select 'Steve';
Now, I want to select one of those rows and prevent other queries from seeing this row. I achieve that like this:
begin transaction;
select * from testTable where name = 'Bob' for update;
In another window, I do this:
select * from testTable for update skip locked;
Great, I don't see 'Bob' in that result set. Now I want to do something with the previously retrieved row (Bob), and after I have done my work, release that row again. The simple answer would be to do:
commit transaction
However, I am running multiple transactions on the same connection, so I can't just begin and commit transactions all over the show. Ideally I would like to have a "named" transaction, something like:
begin transaction 'myTransaction';
select * from testTable where name = 'Bob' for update;
//do stuff with the data, outside sql then later call ...
commit transaction 'myTransaction';
But Postgres doesn't support that. I have found "prepare transaction", but that seems to be a pear-shaped path I don't want to go down, especially as these transactions seem to persist through restarts.
Is there any way I can have a reference to commit/rollback a specific transaction?
You can have only one open transaction at a time in a database session, so the question as such is moot.
But I assume that you do not really want to run a transaction, you want to block access to a certain row for a while.
It is usually not a good idea to use regular database locks for such a purpose (the exception is advisory locks, which serve exactly that purpose but are not tied to table rows; see the sketch below). The problem is that long database transactions keep autovacuum from doing its job.
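A minimal sketch of session-level advisory locks (the key 42 is an arbitrary application-chosen number identifying the row):

SELECT pg_advisory_lock(42);    -- take the lock; blocks if another session already holds it
-- ... do the work, spanning as many transactions as needed ...
SELECT pg_advisory_unlock(42);  -- release it later, independent of transaction boundaries

Because these locks belong to the session rather than to a transaction, they can be released at any later point, which is exactly the "release later" requirement.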
I recommend that you add a status column to the table and change the status rather than locking the row. That would serve the same purpose in a more natural fashion and make your problem go away.
If you are concerned that the status flag might not get cleared due to application logic problems, replace it with a visible_from column of type timestamp with time zone that initially contains -infinity. Instead of locking the row, set the value to current_timestamp + INTERVAL '5 minutes'. Only select rows that fulfill WHERE visible_from < current_timestamp. That way the “lock” will automatically expire after 5 minutes.
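A sketch of that pattern against the example table (the 5-minute interval is the one from the suggestion above):

ALTER TABLE testTable
    ADD COLUMN visible_from timestamp with time zone NOT NULL DEFAULT '-infinity';

-- "lock" the row: hide it from other workers for 5 minutes
UPDATE testTable
SET visible_from = current_timestamp + INTERVAL '5 minutes'
WHERE name = 'Bob';

-- workers only ever consider rows whose hide period has expired
SELECT * FROM testTable WHERE visible_from < current_timestamp;

Releasing the "lock" early is just another UPDATE that sets visible_from back to '-infinity'.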

how to lock a table for writing

I would like to lock a table for writing during a period of time, while leaving it available for reading.
Is that possible ?
Ideally I would like to lock the table with a predicate (for example prevent writing rows "where country = france").
If you really want to lock against such inserts, i.e. the query should hang and only continue when you allow it, you would have to place a SHARE lock on the table and keep the transaction open, as sketched below.
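For concreteness (user_table is the table name used in the example further down):

BEGIN;
LOCK TABLE user_table IN SHARE MODE;  -- allows concurrent reads, blocks INSERT/UPDATE/DELETE
-- ... keep the transaction open for as long as writes should hang ...
COMMIT;

Note that a table-level lock cannot express a predicate: it blocks all writes, not just rows with country = 'france'.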
This is usually not a good idea.
If you want to prevent any such inserts, i.e. throw an error when such an insert is attempted, create a BEFORE INSERT trigger that throws an exception if the NEW row satisfies the condition.
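A sketch of such a trigger (the function and trigger names are invented for illustration; user_table matches the example below):

CREATE FUNCTION forbid_france_inserts() RETURNS trigger AS $$
BEGIN
    IF NEW.country = 'france' THEN
        RAISE EXCEPTION 'inserts with country = france are currently not allowed';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER forbid_france_inserts
    BEFORE INSERT ON user_table
    FOR EACH ROW EXECUTE FUNCTION forbid_france_inserts();

Dropping the trigger re-enables the inserts.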
You can use a FOR SHARE lock, which blocks other transactions from performing operations like UPDATE and DELETE, while still allowing SELECT FOR SHARE. (Read the docs for details: https://www.postgresql.org/docs/9.4/explicit-locking.html [13.3.2])
For example, suppose two processes access the table user_table in the following sequence:
Process A: BEGIN;
Process A: SELECT username FROM user_table WHERE country = 'france' FOR SHARE;
Process B: SELECT * FROM user_table FOR SHARE; (process B can still read all the rows of the table)
Process B: UPDATE user_table SET username = 'test' WHERE country = 'france'; (process B is blocked and waits for process A to finish its transaction)

PostgreSQL for update statement

PostgreSQL uses the READ COMMITTED isolation level by default. Now I have a transaction which consists of a single DELETE statement, and this delete statement has a subquery consisting of a SELECT statement for selecting the rows to delete.
Is it true that I have to use FOR UPDATE in the select statement to get no conflicts with other transaction?
My thinking is the following: first the matching rows are read from the table, and in a second step those rows are deleted, so another transaction could interfere between the two steps.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
Is it true that I have to use FOR UPDATE in the select statement to get no conflicts with other transaction?
What does "no conflicts with other transaction" mean to you? You can test this by opening two terminals, and executing statements in each of them. Interleaved correctly, the DELETE statement will make the "other transaction" (the one that has its isolation level set to READ COMMITTED) wait until it commits or rolls back.
(Terminal 1, the READ COMMITTED transaction:)
sandbox=# set transaction isolation level read committed;
SET
sandbox=# select * from customer;
 date_of_birth
---------------
 1996-09-29
 1996-09-28
(2 rows)

(Terminal 2:)
sandbox=# begin transaction;
BEGIN
sandbox=# delete from customer
sandbox-# where date_of_birth = '1996-09-28';
DELETE 1

(Terminal 1:)
sandbox=# update customer
sandbox-# set date_of_birth = '1900-01-01'
sandbox-# where date_of_birth = '1996-09-28';
(Execution pauses here, waiting for the transaction in the other terminal.)

(Terminal 2:)
sandbox=# commit;
COMMIT
sandbox=#

(Terminal 1 resumes:)
UPDATE 0
sandbox=#
See below for the documentation.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
There's no such statement as DELETE ... FOR UPDATE.
You need to be sensitive to context when you're reading about database updates. Update can mean any change to a database; it can include inserting, deleting, and updating rows. In the docs cited below, "locked as though for update" is explicitly talking about UPDATE and DELETE statements, among others.
Current docs
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends. The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted).
Short version: the FOR UPDATE in a sub-select is not necessary because the DELETE implementation already does the necessary locking. It would be redundant.
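For instance, a delete with a sub-select like the one from the question needs no FOR UPDATE (some_condition stands in for whatever the real filter is):

DELETE FROM myTable
WHERE id IN (SELECT id FROM myTable WHERE some_condition);

The DELETE itself locks every row it removes, exactly as if the row had been selected FOR UPDATE.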
Ideally you should read and digest Concurrency Control to learn how the concurrency issues are dealt with by the SQL engine.
Specifically for the case you're mentioning, I think these couple of excerpts are the most relevant, in Read Committed Isolation Level:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress).
So one of your two concurrent DELETEs will be made to wait as soon as it tries to delete a row that the other one has just processed. The wait only ends when the other transaction commits or rolls back. In a way, that means the engine "detected the conflict" and serialized the two DELETEs in order to deal with it.
If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row.
In your scenario, after the first DELETE has committed and the second one wakes up, the second one will be unable to delete the row it was waiting for, because that row version is no longer current; it is gone. That is not an error at this isolation level. Execution simply goes on with the other rows, some of which may also have disappeared. Eventually the statement reports the actual number of rows it deleted, which may differ from the number the sub-select initially found before the statement was made to wait.
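A minimal illustration of that outcome under READ COMMITTED, using the table and id from the question:

-- Session 1:
BEGIN;
DELETE FROM myTable WHERE id = 4;   -- reports DELETE 1

-- Session 2:
DELETE FROM myTable WHERE id = 4;   -- blocks, waiting for session 1

-- Session 1:
COMMIT;

-- Session 2 resumes and reports DELETE 0: the row is gone, but no error is raised.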

SQL Isolation levels or locks in large procedures

I have big stored procedures that handle user actions.
They consist of multiple select statements. These are filtered, most of the time returning only one row. The selects are copied into temp tables or otherwise evaluated.
Finally, a MERGE statement makes the needed changes in the DB.
All is encapsulated in a transaction.
I have concurrent input from users, and the rows returned by the select statements should be locked to preserve data integrity.
How can I lock the selected rows of all select statements so that they aren't updated by other transactions while the current transaction is in progress?
Does a table hint combination of ROWLOCK and HOLDLOCK work in a way that only the selected rows are locked, or are the whole tables locked because of the HOLDLOCK?
SELECT *
FROM dbo.Test WITH (ROWLOCK, HOLDLOCK)
WHERE id = @testId
Can I instead use
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
right after the start of the transaction? Or does this lock the whole tables?
I am using SQL2008 R2, but would also be interested if things work differently in SQL2012.
PS: I just read about the table hints UPDLOCK and SERIALIZABLE. UPDLOCK seems to be a solution to lock only one row, and it seems as if UPDLOCK always takes locks, whereas ROWLOCK only specifies that any locks taken are row-based. I am still confused about the best way to solve this...
Changing the isolation level fixed the problem (and locked on row level):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Here is how I tested it.
I created a statement in a blank query window of SQL Server Management Studio:
begin tran

select *
into #message
from dbo.MessageBody
where MessageBody.headerId = 28

WAITFOR DELAY '0:00:05'

update dbo.MessageBody
set [message] = 'message1'
where headerId = (select headerId from #message)

select * from dbo.MessageBody where headerId = (select headerId from #message)

drop table #message
commit tran
While this statement was executing (it takes at least 5 seconds due to the delay), I ran the second query in another window:
begin tran

select *
into #message
from dbo.MessageBody
where MessageBody.headerId = 28

update dbo.MessageBody
set [message] = 'message2'
where headerId = (select headerId from #message)

select * from dbo.MessageBody where headerId = (select headerId from #message)

drop table #message
commit tran
and I was rather surprised that it executed instantaneously. This was due to SQL Server's default isolation level "Read Committed" (http://technet.microsoft.com/en-us/library/ms173763.aspx). Since the update in the first script happens after the delay, there are no uncommitted changes yet while the second script runs, so row 28 is read and updated.
Changing the isolation level to SERIALIZABLE prevented this, but it also prevented concurrency: both scripts were executed consecutively. That was OK here, since both scripts read and changed the same row (via headerId = 28). After changing headerId to another value in the second script, the statements executed in parallel, so the lock taken under SERIALIZABLE appears to be at row level.
Adding the table hint WITH (SERIALIZABLE) in the first select of the first statement also prevents further reads of the selected row.
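For reference, that hint applied to the first select might look like this (same table and filter as in the scripts above):

select *
into #message
from dbo.MessageBody WITH (SERIALIZABLE)
where MessageBody.headerId = 28

The hint gives just this one statement SERIALIZABLE locking behavior without changing the isolation level for the rest of the session.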