PostgreSQL - Does row locking depend on update syntax in a transaction?

I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference in the locks applied to the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then release both locks.
But what does transaction 2 do?
Does it do the same thing, or does it lock user 0 and user 1 together and then release both locks?
This matters because, with two concurrent transactions, writing my queries like the first transaction might lead to deadlock issues. But if I write my transactions like the second one, can I still run into deadlock issues?
If it does the same thing, is there a way to write this transaction so that, at the beginning, before doing anything else, it checks every row it needs to update, waits until none of those rows are locked, and then locks them all at the same time?
links:
the syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL

Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a known order (id 0, then id 1), while the locking order of the second depends on the execution plan.
If your objective is to avoid deadlocks, the first transaction is better: if you make it a rule that every transaction must update the user with the lower id first, then no two such transactions can deadlock with each other (but they can still deadlock with other transactions that do not follow that rule).
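If you want the second, multi-row form but with a deterministic lock order, one option is to take the row locks explicitly before the update. A minimal sketch (with a simple query like this, the rows are normally locked in the order they are fetched, i.e. ascending id):
BEGIN;
-- take the row locks explicitly, in ascending id order
SELECT id FROM user_table WHERE id IN (0, 1) ORDER BY id FOR UPDATE;
-- the multi-row update then finds the rows already locked by this transaction
UPDATE user_table AS user_with_old_value
SET data_value = user_with_new_value.data_value
FROM (VALUES (0, 100), (1, 200)) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
As noted above, this only helps if every transaction that touches these rows follows the same ordering rule.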


In certain cases, can UPDATE transactions fail rather than block waiting for "FOR UPDATE lock"?

I'm using Postgres 9.6.5.
The docs say:
13.3. Explicit Locking
13.3.2. Row-level Locks -> Row-level Lock Modes -> FOR UPDATE:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). ...
The mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
Regarding UPDATEs on rows that are locked via "SELECT FOR UPDATE" in another transaction, I read the above as follows: other transactions that attempt an UPDATE of these rows will be blocked until the current transaction (the one that did "SELECT FOR UPDATE" on those rows) ends, unless the columns being UPDATEd are ones that don't have a unique index on them that can be used in a foreign key.
Is this correct? If so: I have a table "program" with a text column "stage" (this column doesn't fit "have a unique index on them that can be used in a foreign key"), and a transaction that does "SELECT FOR UPDATE" for some rows and then UPDATEs "stage" in those rows. Is it correct that other concurrent transactions UPDATEing "stage" on these rows can fail, rather than block until the former transaction ends?
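(For concreteness, the scenario can be sketched as two psql sessions; "program" and "stage" are taken from the question, the id column is a hypothetical key.)
-- Session 1
BEGIN;
SELECT * FROM program WHERE id = 1 FOR UPDATE;
UPDATE program SET stage = 'running' WHERE id = 1;
-- no COMMIT yet

-- Session 2, default READ COMMITTED
BEGIN;
UPDATE program SET stage = 'queued' WHERE id = 1;
-- blocks here until session 1 commits or rolls back; on its own it does not
-- fail -- only a deadlock, NOWAIT / lock_timeout, or a higher isolation
-- level's serialization check turns the wait into an error
COMMIT;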
Your transactions can fail if a deadlock is detected in one of the UPDATE or SELECT ... FOR UPDATE statements, for example:
Transaction 1
BEGIN;
SELECT * FROM T1 FOR UPDATE;
SELECT * FROM T2 FOR UPDATE;
-- Intentionally, no rollback nor commit yet
Transaction 2 (running concurrently; it takes its lock on T2 before transaction 1 gets there)
BEGIN;
SELECT * FROM T2 FOR UPDATE; -- locks the rows of T2 first
SELECT * FROM T1 FOR UPDATE; -- the deadlock exception happens here
The moment the second transaction tries to lock T1 you'll get:
ERROR: 40P01: deadlock detected
DETAIL: Process 15981 waits for ShareLock on transaction 3538264; blocked by process 15942.
Process 15942 waits for ShareLock on transaction 3538265; blocked by process 15981.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,1) in relation "t1"
LOCATION: DeadLockReport, deadlock.c:956
Time: 1001.728 ms

Why does a PostgreSQL serializable transaction treat this as a conflict?

In my understanding, PostgreSQL uses some kind of monitoring to detect conflicts at the serializable isolation level. Many examples are about modifying the same resource in concurrent transactions, and serializable transactions work great there. But I wanted to test a concurrency issue in another way.
I decided to test 2 users modifying their own account balances, hoping PostgreSQL would be smart enough not to detect a conflict, but the result is not what I wanted.
Below is my table. There are 4 accounts belonging to 2 users; each user has a checking account and a saving account.
create table accounts (
    id serial primary key,
    user_id int,
    type varchar,
    balance numeric
);
insert into accounts (user_id, type, balance) values
(1, 'checking', 1000),
(1, 'saving', 1000),
(2, 'checking', 1000),
(2, 'saving', 1000);
The table data is like this:
 id | user_id |   type   | balance
----+---------+----------+---------
  1 |       1 | checking |    1000
  2 |       1 | saving   |    1000
  3 |       2 | checking |    1000
  4 |       2 | saving   |    1000
Now I run 2 concurrent transactions (both at the serializable isolation level), one per user. In each transaction, I reduce the checking account by some amount and check that user's total balance. If it's greater than 1000 I commit, otherwise I roll back.
User 1's example:
begin;
-- Reduce checking account for user 1
update accounts set balance = balance - 200 where user_id = 1 and type = 'checking';
-- Make sure user 1's total balance > 1000, then commit
select sum(balance) from accounts where user_id = 1;
commit;
User 2's transaction is the same, except with user_id = 2 in the WHERE clause:
begin;
update accounts set balance = balance - 200 where user_id = 2 and type = 'checking';
select sum(balance) from accounts where user_id = 2;
commit;
I commit user 1's transaction first, and it succeeds as expected. When I commit user 2's transaction, it fails:
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
My questions are:
Why does PostgreSQL think these 2 transactions conflict? I added a user_id condition to every statement and never modify user_id, but that has no effect.
Does that mean serializable transactions don't allow concurrent transactions on the same table, even if their reads and writes have no conflict?
Doing something per user is very common. Should I avoid serializable transactions for operations that happen very frequently?
You can fix this problem with the following index:
CREATE INDEX accounts_user_idx ON accounts(user_id);
Since there is so little data in your example table, you will have to tell PostgreSQL to use an index scan:
SET enable_seqscan=off;
Now your example will work!
If that seems like black magic, take a look at the query execution plans of your SELECT and UPDATE statements.
Without the index both will use a sequential scan on the table, thereby reading all rows in the table. So both transactions will end up with a SIReadLock on the whole table.
This triggers the serialization failure.
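If you want to verify that, a quick check (the plan output will of course vary with your data and settings):
EXPLAIN SELECT sum(balance) FROM accounts WHERE user_id = 1;
EXPLAIN UPDATE accounts SET balance = balance - 200
    WHERE user_id = 1 AND type = 'checking';
-- Without accounts_user_idx both plans show "Seq Scan on accounts", i.e. every
-- row is read and the predicate lock covers the whole table; after the
-- CREATE INDEX and SET enable_seqscan = off above they switch to index scans,
-- and each transaction's predicate locks cover only its own user's rows.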
To my knowledge, serializable is the highest isolation level and therefore offers the lowest concurrency: the transactions must behave as if they had run one after the other.

Postgres 9.4 detects Deadlock when read-modify-write on single table

We have an application with a simple table
given_entity {
    UUID id;
    TimeStamp due_time;
    TimeStamp process_time;
}
This is a Spring Boot (1.2.5.RELEASE) application that uses spring-data-jpa 1.2.5.RELEASE with hibernate-4.3.10.FINAL as the JPA provider.
We have 5 instances of this application, each with a scheduler that runs every 2 seconds and queries the database for rows whose due_time is within the last 2 minutes and that have not yet been processed:
SELECT * FROM given_entity
WHERE process_time IS NULL AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
FOR UPDATE
The requirement is that each row of the above table gets processed successfully by exactly one application instance.
The application instance then processes these rows and updates their process_time field, all in one transaction.
This may or may not take more than 2 seconds, which is the scheduler interval.
Also, we don't have any index on this table other than the primary key index.
A second point worth noting is that these instances might also insert rows into this table; those inserts are invoked separately by clients.
Problem: in the logs I see this message from PostgreSQL (rarely, but it happens):
ERROR: deadlock detected
Detail: Process 10625 waits for ShareLock on transaction 25382449; blocked by process 10012.
Process 10012 waits for ShareLock on transaction 25382448; blocked by process 12238.
Process 12238 waits for AccessExclusiveLock on tuple (1371,45) of relation 19118 of database 19113; blocked by process 10625.
Hint: See server log for query details.
Where: while locking tuple (1371,45) in relation "given_entity"
Question:
How does this happen?
I checked the PostgreSQL documentation on locks and searched the internet; I didn't find anything saying that a deadlock is possible on just one simple table.
I also couldn't reproduce this error in a test.
Process A tries to lock row 1 followed by row 2. Meanwhile, process B tries to lock row 2 then row 1. That's all it takes to trigger a deadlock.
The problem is that the row locks are acquired in an indeterminate order, because the SELECT returns its rows in an indeterminate order. And avoiding this is just a matter of ensuring that all processes agree on an order when locking rows, i.e.:
SELECT * FROM given_entity
WHERE process_time IS NULL AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
ORDER BY id
FOR UPDATE
In Postgres 9.5+, you can simply ignore any row which is locked by another process using FOR UPDATE SKIP LOCKED.
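A minimal sketch of that variant (PostgreSQL 9.5 or later, column names as in the question):
SELECT * FROM given_entity
WHERE process_time IS NULL
  AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
ORDER BY id
FOR UPDATE SKIP LOCKED;
-- rows already locked by another instance are skipped instead of waited for,
-- so concurrent schedulers neither block each other here nor deadlock
Each instance then processes only the rows it actually received.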
This can easily happen.
There are probably several rows that satisfy the condition
due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
so it can easily happen that the SELECT ... FOR UPDATE finds and locks one row and then is blocked locking the next row. Remember – for a deadlock it is not necessary that more than one table is involved, it is enough that more than one lockable resource is involved. In your case, those are two different rows in the given_entity table.
It may even be that the deadlock happens between two of your SELECT ... FOR UPDATE statements.
Since you say that there is none but the primary key index on the table, the query has to perform a sequential scan. In PostgreSQL, there is no fixed order for rows returned from a sequential scan. Rather, if two sequential scans run concurrently, the second one will “piggy-back” on the first and will start scanning the table at the current location of the first sequential scan.
You can check whether that is the case by setting the parameter synchronize_seqscans to off and seeing if the deadlocks vanish. Another option would be to take a SHARE ROW EXCLUSIVE lock on the table before you run the statement.
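Both suggestions are short to try (the parameter and lock-mode names are standard PostgreSQL; the table lock is only sketched here, not tested against your workload):
SET synchronize_seqscans = off;  -- test the synchronized-seqscan theory

BEGIN;
-- SHARE ROW EXCLUSIVE conflicts with itself, so only one scheduler instance
-- runs its SELECT ... FOR UPDATE at a time
LOCK TABLE given_entity IN SHARE ROW EXCLUSIVE MODE;
-- ... run the SELECT ... FOR UPDATE and the updates here ...
COMMIT;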
Switch on hibernate batch updates in your application.properties
hibernate.batch.size=100
hibernate.order_updates=true
hibernate.order_inserts=true
hibernate.jdbc.fetch_size = 400

Enforce Atomic Operations x Locks

I have a database model that works similarly to a banking account (one table for operations, and a trigger to update the balance). I'm currently using SQL Server 2008 R2.
TABLE OPERATIONS
----------------
VL_CREDIT decimal(10,2)
VL_DEBIT decimal(10,2)
TABLE BALANCE
-------------
DT_OPERATION datetime
VL_CURRENT decimal(10,2)
PROCEDURE INSERT_OPERATION
--------------------------
GET LAST BALANCE BY DATE
CHECK IF VALUE OF OPERATION > BALANCE
IF > RETURN ERROR
ELSE INSERT INTO OPERATION(...,....)
The issue I have is the following:
The procedure to insert the operation has to check the balance to see if there's money available before inserting the operation, so the balance never gets negative. If there's no balance, I return some code to tell the user the balance is not enough.
My concern is: If this procedure gets called multiple times in a row, how can I guarantee that it's atomic?
I have some ideas, but I am not sure which of them would guarantee it:
BEGIN TRANSACTION on the OPERATION PROCEDURE
Some sort of lock on selecting the BALANCE table, but it must hold until the end of procedure execution
Can you suggest some approach to guarantee that? Thanks in advance.
UPDATE
I read on MSDN (http://technet.microsoft.com/en-us/library/ms187373.aspx) that if my procedure has BEGIN/END TRANSACTION and the SELECT on the BALANCE table uses WITH (TABLOCKX), the table stays locked until the end of the transaction. So if a subsequent call to this procedure is made while the first is still executing, it will wait, which guarantees that the value read is always the latest one. Will this work? And if so, is it best practice?
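(For concreteness, the pattern described above might look like the sketch below; the procedure, table, and column names follow the pseudo-schema from the question, the return codes are illustrative, and the trigger from the question is assumed to maintain BALANCE.)
CREATE PROCEDURE INSERT_OPERATION
    @VL_DEBIT decimal(10,2)
AS
BEGIN
    BEGIN TRANSACTION;

    DECLARE @Balance decimal(10,2);

    -- TABLOCKX takes an exclusive table lock that is held until the end of
    -- the transaction, so concurrent calls serialize behind this read
    SELECT TOP (1) @Balance = VL_CURRENT
    FROM BALANCE WITH (TABLOCKX)
    ORDER BY DT_OPERATION DESC;

    IF ISNULL(@Balance, 0) < @VL_DEBIT
    BEGIN
        ROLLBACK TRANSACTION;
        RETURN 1;  -- balance not enough
    END;

    INSERT INTO OPERATIONS (VL_CREDIT, VL_DEBIT) VALUES (0, @VL_DEBIT);
    -- the trigger mentioned in the question updates BALANCE here

    COMMIT TRANSACTION;
    RETURN 0;
END;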
If you're amenable to changing your table structures, I'd build it this way:
create table Transactions (
    SequenceNo int not null,
    OpeningBalance decimal(38,4) not null,
    Amount decimal(38,4) not null,
    ClosingBalance as CONVERT(decimal(38,4), OpeningBalance + Amount) persisted,
    PrevSequenceNo as CASE WHEN SequenceNo > 1 THEN SequenceNo - 1 END persisted,
    constraint CK_Transaction_Sequence CHECK (SequenceNo > 0),
    constraint PK_Transaction_Sequence PRIMARY KEY (SequenceNo),
    constraint CK_Transaction_NotNegative CHECK (OpeningBalance + Amount >= 0),
    constraint UQ_Transaction_BalanceCheck UNIQUE (SequenceNo, ClosingBalance),
    constraint FK_Transaction_BalanceCheck FOREIGN KEY
        (PrevSequenceNo, OpeningBalance)
        references Transactions
        (SequenceNo, ClosingBalance)
    /* Optional - another check that Transaction 1 has 0 amount and
       0 opening balance, if required */
)
Where you just apply credits and debits as positive or negative values for Amount. The above structure is enough to enforce the "not going negative" requirement (via CK_Transaction_NotNegative), and it also lets you find the current balance (take the row with the highest SequenceNo and read its ClosingBalance). Together, UQ_Transaction_BalanceCheck and FK_Transaction_BalanceCheck (and the computed columns) ensure that the entire sequence of transactions is valid, and PK_Transaction_Sequence keeps everything building in order.
So, if we populate it with some data:
insert into Transactions (SequenceNo,OpeningBalance,Amount) values
(1,0.0,10.0),
(2,10.0,-5.50),
(3,4.50,2.75)
And now we can attempt an insert (this could be INSERT_PROCEDURE with @NewAmount passed as a parameter):
declare @NewAmount decimal(38,4)
set @NewAmount = -15.50
;With LastTransaction as (
    select SequenceNo, ClosingBalance,
        ROW_NUMBER() OVER (ORDER BY SequenceNo desc) as rn
    from Transactions
)
insert into Transactions (SequenceNo, OpeningBalance, Amount)
select SequenceNo + 1, ClosingBalance, @NewAmount
from LastTransaction
where rn = 1
This insert fails because it would have caused the balance to go negative. But if @NewAmount was small enough, it would have succeeded. And if two inserts are attempted at "the same time", then either a) they're just far enough apart in reality that they both succeed, and the balances are all kept correct, or b) one of them will receive a PK violation error.

Postgres deadlock with read committed isolation

We have noticed a rare occurrence of a deadlock on a PostgreSQL 9.2 server in the following situation:
T1 starts the batch operation:
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
For select:
SELECT * FROM (SELECT *, RANK() OVER (ORDER BY user_id) AS rno FROM BB
  WHERE status = 'PROCESSING' AND bulk_id = 1) sub WHERE rno = $1
And Update:
UPDATE BB set datetime=$1, status='DONE', message_id=$2 WHERE bulk_id=1 AND user_id=$3
(user_id, bulk_id have a UNIQUE constraint).
Due to a problem external to this situation, another transaction T2 executes the same query as T1 almost immediately after T1 has committed (the initial batch operation where items are marked as 'PROCESSING'):
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
However, although these items are marked as 'PROCESSING', this query deadlocks with some of the updates (which are done in batches, as I said) from the worker threads. To my understanding this should not happen with the READ COMMITTED isolation level (the default) that we use. I am sure that T1 has committed, because the worker threads execute after it has done so.
Edit: one thing I should clear up is that T2 starts after T1 but before T1 commits. However, due to an exclusive tuple lock that we acquire with a SELECT ... FOR UPDATE on the same row (one that is not affected by any of the above queries), it waits for T1 to commit before it runs the batch update query.
When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
This strikes me as a concurrency problem. I think you are far better off having one transaction read the rows and hand them off to worker processes, and then update them in batches when they come back. Your fundamental problem is that these rows are effectively in an uncertain state while transactions hold them, and so on. You have to handle rollbacks and the like separately, and consequently the locking is a real problem.
Now, if that solution is not possible, I would have a separate locking table. In this case, each thread spins up separately, locks the locking table, claims a bunch of rows, inserts records into the locking table, and commits. In this way each thread has claimed specific records. Then the threads can work on their record sets, update them, etc. You may want a process that periodically clears out stale locks.
In essence your problem is that rows go from state A -> processing -> state B and may be rolled back. Since the other threads have no way of knowing which rows are being processed, and by which threads, you can't safely allocate records. One option is to change the model to:
state A -> claimed state -> processing -> state B. However, you have to have some way of ensuring that rows are effectively allocated and that the threads know which rows have been allocated to which thread.
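A rough sketch of the claimed-state option, reusing the columns from the question (the 'CLAIMED' value, the claim id stored in chunk_id, and the helper table BB_claim_lock are illustrative assumptions; nothing here needs features newer than 9.2):
BEGIN;
-- serialize the claim step itself so two workers never interleave their row
-- locks; BB_claim_lock is a dedicated, otherwise unused helper table
LOCK TABLE BB_claim_lock IN EXCLUSIVE MODE;

UPDATE BB
SET status = 'CLAIMED', chunk_id = $1       -- $1: an id unique to this worker
WHERE bulk_id = 1 AND status = 'PENDING'
  AND user_id IN (SELECT user_id FROM BB
                  WHERE bulk_id = 1 AND status = 'PENDING' AND chunk_id IS NULL
                  LIMIT 50)
RETURNING user_id;
COMMIT;
Each worker then processes only the user_ids it got back, marking them 'PROCESSING' and finally 'DONE' (or back to 'PENDING' on failure), and a periodic job can reset stale 'CLAIMED' rows.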