Postgres deadlock with read_committed isolation - postgresql

We have noticed a rare occurrence of a deadlock on a PostgreSQL 9.2 server in the following situation:
T1 starts the batch operation:
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
For select:
SELECT * FROM (SELECT *, RANK() OVER (ORDER BY user_id) AS rno FROM BB
WHERE status = 'PROCESSING' AND bulk_id = 1) ranked WHERE rno = $1
And Update:
UPDATE BB set datetime=$1, status='DONE', message_id=$2 WHERE bulk_id=1 AND user_id=$3
(user_id, bulk_id have a UNIQUE constraint).
Due to an unrelated external problem, another transaction T2 executes the same query as T1 almost immediately after T1 has committed (the initial batch operation where items are marked as 'PROCESSING').
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING'
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)
However, although these items are marked as 'PROCESSING', this query deadlocks with some of the updates (which are done in batches, as I said) of the worker threads. To my understanding this should not happen with the READ COMMITTED isolation level (the default) that we use. I am sure that T1 has committed because the worker threads execute after it has done so.
Edit: One thing I should clear up is that T2 starts after T1 but before T1 commits. However, due to a write-exclusive tuple lock that we acquire with a SELECT ... FOR UPDATE on the same row (one that is not affected by any of the above queries), it waits for T1 to commit before it runs the batch update query.

When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new Transactions (one transaction per thread) that read items from BB, do some processing and update them in batches of 50 or so with the queries:
This strikes me as a concurrency problem. I think you are far better off having one transaction read the rows and hand them off to worker processes, and then update them in batches when they come back. Your fundamental problem is that the workers are effectively operating on uncertain state, holding the rows during transactions, and the like. You have to handle rollbacks and so forth separately, and consequently the locking is a real problem.
Now, if that solution is not possible, I would have a separate locking table. In this case, each thread spins up separately, locks the locking table, claims a bunch of rows, inserts records into the locking table, and commits. In this way each thread has claimed its own records. Then the threads can work on their record sets, update them, etc. You may want to have a process which periodically clears out stale locks.
In essence your problem is that rows go from state A -> processing -> state B and may be rolled back. Since the other threads have no way of knowing what rows are processing and by which threads, you can't safely allocate records. One option is to change the model to:
state A -> claimed state -> processing -> state B. However, you have to have some way of ensuring that rows are effectively allocated and that the threads know which rows have been allocated to each other.
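The claiming step can be sketched in memory as follows. This is only an illustration in Python, not database code: the names rows, claims, claims_lock, claim_batch and run_worker are hypothetical, with claims standing in for the separate locking table and claims_lock for the lock taken on it.

```python
import threading

# Hypothetical in-memory stand-ins: `rows` plays the role of BB and
# `claims` the role of the separate locking table.
rows = {user_id: "PENDING" for user_id in range(200)}
claims = {}                      # user_id -> name of the worker that claimed it
claims_lock = threading.Lock()   # stands in for locking the locking table

def claim_batch(worker, size):
    """Claim up to `size` unclaimed pending rows in one short critical section."""
    with claims_lock:
        batch = [uid for uid in sorted(rows)
                 if rows[uid] == "PENDING" and uid not in claims][:size]
        for uid in batch:
            claims[uid] = worker
    return batch

def run_worker(worker):
    while True:
        batch = claim_batch(worker, 50)
        if not batch:
            return
        for uid in batch:        # only the claiming worker touches these rows
            rows[uid] = "DONE"

threads = [threading.Thread(target=run_worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a row is recorded in claims before any processing starts, no two workers ever operate on the same row, and the processing itself runs outside the critical section.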

Related

PostgreSQL - Does row locking depend on update syntax in transaction?

I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference in the locks applied to the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then free both locks.
But what does transaction 2 do?
Does it do the same thing, or does it lock user 0 and user 1 together, then free both locks?
There is a difference because if I have two concurrent transactions and I write my queries as in the first transaction, I might encounter deadlock issues. But if I write my transactions like the second one, can I still run into deadlock issues?
If it does the same thing, is there a way to write this transaction so that at the beginning, before doing anything, it checks each row it needs to update, waits until none of these rows are locked, then locks all rows at the same time?
links:
the syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL
Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a certain order.
If your objective is to avoid deadlocks the first transaction is better: if you make a rule that any transaction must update the user with the lower id first, then no such two transactions can deadlock with each other (but they can still deadlock with other transactions that do not obey that rule).
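The lock-ordering rule can be illustrated outside the database with ordinary thread locks. This is a minimal Python sketch, not PostgreSQL behaviour: data_value, row_locks and update_rows are hypothetical stand-ins for the two rows of user_table and their row locks.

```python
import threading

# Hypothetical stand-ins for two rows of user_table and their row locks.
data_value = {0: None, 1: None}
row_locks = {0: threading.Lock(), 1: threading.Lock()}

def update_rows(new_values):
    """Lock the affected rows in ascending id order, then apply all updates."""
    ids = sorted(new_values)            # the agreed-upon global order
    for row_id in ids:
        row_locks[row_id].acquire()
    try:
        for row_id, value in new_values.items():
            data_value[row_id] = value
    finally:
        for row_id in reversed(ids):
            row_locks[row_id].release()

def worker(new_values):
    for _ in range(1000):
        update_rows(new_values)

# The two callers list the rows in opposite orders, yet both lock id 0 first.
a = threading.Thread(target=worker, args=({0: 100, 1: 200},))
b = threading.Thread(target=worker, args=({1: 200, 0: 100},))
a.start()
b.start()
a.join(timeout=10)
b.join(timeout=10)
finished = not a.is_alive() and not b.is_alive()
```

If update_rows acquired the locks in the order the caller happened to list them, the two threads could each hold one lock while waiting for the other, which is the deadlock the rule prevents.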

What concurrency issues can this PostgreSQL code create?

Ok so here's the schema, which is pretty self-explanatory:
STORE(storeID, name, city)
PRODUCT(productID, name, brand)
PRODUCT_FOR_SALE(productID, storeID, price)
I have 2 transactions: T1 and T2.
T1 raises by 5% the price for any product sold in any store in 'London'.
T2 lowers by 10% the price for any product whose cost is >= $1050
What I am asked is to tell what kind of concurrency anomaly they may result in, and what isolation level I should apply to which transaction to make it safe.
The code for the transactions is not given, but I suppose it would be something on the lines of:
-- T1:
BEGIN;
UPDATE product_for_sale
SET price = price + ((price/100) *5)
WHERE storeID IN (SELECT storeID FROM store WHERE city='London');
COMMIT;
-- T2:
BEGIN;
UPDATE product_for_sale
SET price = price - (price/10)
WHERE price >= 1050;
COMMIT;
My "guess" to what might happen with READ COMMITTED (default) is:
Considering a product P, sold in 'London' for $1049
both transactions begin
they both consider their row sets: T1 will consider all products sold in London (which includes P), T2 will consider products whose price is $1050 or more (which excludes P)
T1 commits and sets the price of P to $1101 but, since P wasn't in T2's row set to begin with, the change goes unnoticed, and T2 commits without considering it
Which, if I'm not messing up definitions, should be a case of phantom read, which would be fixed if I set T2 to ISOLATION LEVEL REPEATABLE READ
First, it is not quite clear what you mean by a concurrency issue. It could be:
1. something that could conceivably be a problem, but is handled by PostgreSQL automatically so that no problems arise.
2. something that can generate an unexpected error or an undesirable result.
For 1., this is handled by locks that serialize transactions that are trying to modify the same rows concurrently.
I assume you are more interested in 2.
What you describe can happen, but it is not a concurrency problem. It just means that T2 logically takes place before T1, since T2 evaluated P at its old price of $1049 and skipped it. This will work just fine on all isolation levels.
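To see which serial order the described outcome corresponds to, the two orders can be worked through for product P. This is a small Python sketch of the arithmetic only; t1 and t2 are hypothetical functions mirroring the two UPDATE statements, and Decimal is used to keep the prices exact.

```python
from decimal import Decimal

def t1(price):
    """T1: raise a London price by 5%."""
    return price * Decimal("1.05")

def t2(price):
    """T2: lower the price by 10% only if it is at least 1050."""
    return price * Decimal("0.9") if price >= Decimal("1050") else price

p = Decimal("1049")
t1_then_t2 = t2(t1(p))   # T1 lifts P over the threshold, so T2 then lowers it
t2_then_t1 = t1(t2(p))   # T2 skips P, then T1 raises it
print(t1_then_t2, t2_then_t1)
```

The interleaving in the question leaves P at about $1101 without T2 touching it, which is the t2_then_t1 value. The result is therefore one that a purely serial execution could also have produced.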
I might be missing something, but the only potential problem I see here is a deadlock between the two statements:
They both can update several rows, so it could happen that one of them updates row X first and then tries to update row Y, which has already been updated by the other statement. The first statement then is blocked. Now the second statement wants to update row X and is blocked too.
Such a deadlock is broken by the deadlock detector after one second (the default deadlock_timeout) by killing off one of the statements with an error.
Note that deadlocks are not real problems either. All your code has to do is repeat the failed transaction.
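Repeating the failed transaction can be sketched as a small retry loop. This is an illustration in Python under stated assumptions: DeadlockDetected is a hypothetical exception standing in for whatever error your driver raises for SQLSTATE 40P01, and flaky_transaction only simulates a transaction that is picked as the deadlock victim twice.

```python
import time

class DeadlockDetected(Exception):
    """Hypothetical stand-in for a driver error with SQLSTATE 40P01."""

def run_with_retry(transaction, max_attempts=5, base_delay=0.01):
    """Run `transaction` and repeat it when it is chosen as the deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * attempt)   # brief backoff before retrying

attempts = []

def flaky_transaction():
    # Simulated transaction: fails twice as a deadlock victim, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockDetected()
    return "committed"

result = run_with_retry(flaky_transaction)
```

The whole transaction is repeated, not just the failing statement, because the error rolls back everything the transaction did so far.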

Try to understand MVCC

I'm trying to understand MVCC and can't get it. For example:
Transaction 1 (T1) tries to read a row. At the same time, T2 updates the same row.
The flow of transactions is
T1 begin -> T2 begin -> T2 commit -> T1 commit
So the first transaction gets its snapshot of the database and returns a result to the user, who is going to base further calculations on it. But as I understand it, the user gets the old value? As I understand MVCC, after T1 begins, it doesn't know that some other transaction has changed the data. So if the user now does some calculation (without the DB involved), is it done on wrong data? Or am I wrong, and the first transaction has some mechanism to know that the row was changed?
Let's now change the transaction flow.
T2 begin -> T1 begin -> T2 commit -> T1 commit
With 2PL the user gets the newer version of the data because of locks (T1 must wait until the exclusive lock is released). But with MVCC it will still be the old data, as I understand the PostgreSQL MVCC model. So I can get stale data in exchange for speed. Am I right, or am I missing something?
Thank you
Yes, it can happen that you read some old data from the database (that a concurrent transaction has modified), perform a calculation based on this and store “outdated” data in the database.
This is not a problem, because rather than the actual order in which transactions happen, the logical order is more relevant:
If T1 reads some data, then T2 modifies the data, then T1 modifies the database based on what it read, you can say that T1 logically took place before T2, and there is no inconsistency.
You only get an anomaly if T1 and T2 modify the same data: T1 reads data, T2 modifies the data, then T1 modifies the same data based on what it read. This anomaly is called a “lost update”.
Lost updates can only occur with the weakest (which is the default) isolation level, READ COMMITTED.
If you want better isolation from concurrent activity, you need to use at least REPEATABLE READ isolation. Then T1 would receive a serialization error when it tries to update the data, and you would have to repeat the transaction. On the second attempt, it would read the new data, and everything will be consistent.
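The retry-on-serialization-error pattern can be sketched with a toy model. This Python snippet is not how PostgreSQL implements REPEATABLE READ; Row, update_if_unchanged and SerializationFailure are hypothetical names, and a simple version counter stands in for the check that a row was modified after the transaction read it.

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for SQLSTATE 40001 (could not serialize access)."""

class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0          # bumped on every committed update

def update_if_unchanged(row, seen_version, new_value):
    """Toy model of the update: fail if the row changed since it was read."""
    if row.version != seen_version:
        raise SerializationFailure()
    row.value = new_value
    row.version += 1

balance = Row(100)

# T1 reads, then T2 commits an update before T1 writes.
t1_seen_value, t1_seen_version = balance.value, balance.version
update_if_unchanged(balance, balance.version, balance.value + 50)   # T2: +50

try:
    update_if_unchanged(balance, t1_seen_version, t1_seen_value - 30)
    retried = False
except SerializationFailure:
    # Repeat T1 from the start: it re-reads and now sees T2's change.
    retried = True
    update_if_unchanged(balance, balance.version, balance.value - 30)
```

Without the check, T1 would have written 70 and silently discarded T2's change. With the retry, both changes survive and the final value is 120.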
The flow of transactions is T1 begin -> T2 begin -> T2 commit -> T1 commit. So the first transaction gets its snapshot of the database and returns a result to the user, who is going to base further calculations on it. But as I understand it, the user gets the old value?
Unless T1 and T2 try to update the same row, this could usually be reordered to be the same as: T1 begin; T1 commit; T2 begin; T2 commit;
In other words, any undesirable business outcome that could be achieved by T2 changing the data while T1 was making a decision, could have also occurred by T2 changing the data immediately after T1 made the same decision.
Each transaction can only see data that is younger or equal to that transaction's ID.
When transaction 1 reads data, it marks the read time stamp of that data to transaction 1.
If transaction 2 tries to read the same data, it checks the read timestamp of the data, if the read timestamp for the data is less than transaction 2, then transaction 2 is aborted because 1 < 2 -- 1 got there before us and they must finish before us.
At commit time, we also check if the read timestamp of the data is less than the committing transaction. If it is, we abort the transaction and restart with a new transaction ID.
We also check if write timestamps are less than our transaction. Younger transaction wins.
There is an edge case where a younger transaction can abort if an older transaction gets ahead of the younger transaction.
I have actually implemented MVCC in Java. (see transaction, runner and mvcc code)

Postgres 9.4 detects Deadlock when read-modify-write on single table

We have an application with a simple table
given_entity{
UUID id;
TimeStamp due_time;
TimeStamp process_time;
}
This is a Spring Boot (1.2.5.RELEASE) application that uses spring-data-jpa.1.2.5.RELEASE with hibernate-4.3.10.FINAL as the JPA provider.
We have 5 instances of this application, each with a scheduler that runs every 2 seconds and queries the database for unprocessed rows whose due_time falls within the last 2 minutes:
SELECT * FROM given_entity
WHERE process_time IS NULL AND due_time BETWEEN NOW() - INTERVAL '2 minutes' AND NOW()
FOR UPDATE
Requirement is each row of above table gets successfully processed by exactly one of application instances.
Then the application instance processes these rows and update its process_time field in one transaction.
This may or may not take more than 2 seconds, which is scheduler interval.
Also we don't have any index but PK index on this table.
A second point worth noting is that these instances might also insert rows into this table, which is triggered separately by clients.
Problem: in the logs I see this message from postgresql (rarely but it happens)
ERROR: deadlock detected
Detail: Process 10625 waits for ShareLock on transaction 25382449; blocked by process 10012.
Process 10012 waits for ShareLock on transaction 25382448; blocked by process 12238.
Process 12238 waits for AccessExclusiveLock on tuple (1371,45) of relation 19118 of database 19113; blocked by process 10625.
Hint: See server log for query details.
Where: while locking tuple (1371,45) in relation "given_entity"
Question:
How does this happen?
I checked the PostgreSQL lock documentation and searched the internet. I didn't find anything that says a deadlock is possible on only one simple table.
I also couldn't reproduce this error in a test.
Process A tries to lock row 1 followed by row 2. Meanwhile, process B tries to lock row 2 then row 1. That's all it takes to trigger a deadlock.
The problem is that the row locks are acquired in an indeterminate order, because the SELECT returns its rows in an indeterminate order. And avoiding this is just a matter of ensuring that all processes agree on an order when locking rows, i.e.:
SELECT * FROM given_entity
WHERE process_time IS NULL AND due_time BETWEEN NOW() - INTERVAL '2 minutes' AND NOW()
ORDER BY id
FOR UPDATE
In Postgres 9.5+, you can simply ignore any row which is locked by another process using FOR UPDATE SKIP LOCKED.
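The idea behind skipping locked rows can be illustrated with try-locks. This is a rough in-memory analogy in Python, not the database feature itself: row_locks, processed_by and worker are hypothetical names, and a non-blocking acquire plays the role of skipping a row that another process holds.

```python
import threading

# Hypothetical stand-ins: one lock per row of given_entity, keyed by id.
row_locks = {row_id: threading.Lock() for row_id in range(100)}
processed_by = {}                 # row id -> worker that processed it
processed_guard = threading.Lock()

def worker(name):
    for row_id in sorted(row_locks):
        lock = row_locks[row_id]
        # Try-lock instead of waiting: a row held by someone else is skipped.
        if not lock.acquire(blocking=False):
            continue
        try:
            with processed_guard:
                if row_id in processed_by:   # like "process_time IS NULL"
                    continue
                processed_by[row_id] = name
        finally:
            lock.release()

threads = [threading.Thread(target=worker, args=(f"instance-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since no worker ever waits for a row lock, there is no cycle of waiters and hence no deadlock, and each row is still handled by exactly one instance.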
This can easily happen.
There are probably several rows that satisfy the condition
due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
so it can easily happen that the SELECT ... FOR UPDATE finds and locks one row and then is blocked locking the next row. Remember – for a deadlock it is not necessary that more than one table is involved, it is enough that more than one lockable resource is involved. In your case, those are two different rows in the given_entity table.
It may even be that the deadlock happens between two of your SELECT ... FOR UPDATE statements.
Since you say that there is none but the primary key index on the table, the query has to perform a sequential scan. In PostgreSQL, there is no fixed order for rows returned from a sequential scan. Rather, if two sequential scans run concurrently, the second one will “piggy-back” on the first and will start scanning the table at the current location of the first sequential scan.
You can check if that is the case by setting the parameter synchronize_seqscans to off and see if the deadlocks vanish. Another option would be to take a SHARE ROW EXCLUSIVE lock on the table before you run the statement.
Switch on Hibernate batch updates in your application.properties:
hibernate.jdbc.batch_size=100
hibernate.order_updates=true
hibernate.order_inserts=true
hibernate.jdbc.fetch_size = 400

PostgreSql 9.3 -> several consumers + locking

There are 2 tables:
CREATE TABLE "job"
(
"id" SERIAL,
"processed" BOOLEAN NOT NULL,
PRIMARY KEY("id")
);
CREATE TABLE "job_result"
(
"id" SERIAL,
"job_id" INT NOT NULL,
PRIMARY KEY("id")
);
There are several consumers, that do the following (sequentially):
1) start transaction
2) search for job not processed yet
3) process it
4) save result ( set processed field to true and insert into job_result )
5) commit
Questions:
1) Is the following sql code correct, so no job could be processed more than one time?
2) If it is correct, can it be rewritten in a cleaner way? (I am confused about "UPDATE job SET id = id")
UPDATE job
SET id = id
WHERE id =
(
SELECT MIN(id)
FROM job
WHERE processed = false AND pg_try_advisory_lock(id) = true
)
AND processed = false
RETURNING *
Thanks.
with job_update as (
update job
set processed = true
where id = (
select id
from (
select min(id) as id
from job
where processed = false
) s
for update
)
returning id
)
insert into job_result (job_id)
select id
from job_update
Question 1
To answer your first question, the processing can be done twice if the database crashes between step 3 and step 5. When the server/service recovers, it will be processed again.
If the processing step only computes results which are sent to the database in the same connection as the queuing queries, then no one will be able to see that it was processed twice, as the results of the first time were never visible.
However if the processing step talks to the outside world, such as sending an email or charging a credit card, that action will be taken twice and both will be visible. The only way to avoid that is to use two-phase commits for all dealings with the outside world. Also, if the worker keeps two connections to the database and is not disciplined about their use, then that can also lead to visible double-processing.
Question 2
For your second question, there are several ways it can be made cleaner.
Most importantly, you will want to change the advisory lock from session-duration to transaction-duration. If you leave it at session-duration, long-lived workers will become slower and slower and will use more and more memory as time goes on. This is safe to do, because in the query as written you are checking the processed flag in both the sub-select and in the update itself.
You could make the table structure itself cleaner. You could have one table with both the processed flag and the results field, instead of two tables. Or if you want two tables, you could remove the processed flag from the job table and signify completion simply by deleting the completed record from the table, rather than updating the processed flag.
Assuming you don't want to make such changes, you could still clean up the SQL without changing the table structure or semantics. You do need to lock the tuple to avoid a race condition with the release of the advisory lock. But rather than using the degenerate id=id construct (which some future maintainer is likely to remove, because it is not intuitively obvious why it is even there), you might as well just set the tuple to its final state by setting processed=true, and then removing that second update step from your step 4. This is safe to do because you do not issue an intermediate commit, so no one can see the tuple in this intermediate state of having processed=true but not yet really being processed.
UPDATE job
SET processed = true
WHERE id =
(
SELECT MIN(id)
FROM job
WHERE processed = false AND pg_try_advisory_xact_lock(id) = true
)
AND processed = false
RETURNING id
However, this query still has the unwanted feature that often someone looking for the next job to process will find no rows. That is because it suffered a race condition which was then filtered out by the outer processed=false condition. This is OK as long as your workers are prepared to retry, but it leads to needless contention in the database. This can be improved by making the inner select lock the tuple when it first encounters it by switching from a min(id) to a LIMIT 1 query:
UPDATE job
SET processed=true
WHERE id =
(
SELECT id
FROM job
WHERE processed = false AND pg_try_advisory_xact_lock(id) = true
order by id limit 1 for update
)
RETURNING id
If PostgreSQL allowed ORDER BY and LIMIT on UPDATEs, then you could avoid the subselect altogether, but that is not currently implemented (maybe it will be in 9.5).
For good performance (or even to avoid memory errors), you will need an index like:
create index on job (id) where processed = false;