We're getting deadlocks in a situation where I thought they wouldn't happen due to sorting.
2019-09-11T20:21:59.505804531Z 2019-09-11 20:21:59.505 UTC [67] ERROR: deadlock detected
2019-09-11T20:21:59.505824424Z 2019-09-11 20:21:59.505 UTC [67] DETAIL: Process 67 waits for ShareLock on transaction 1277067; blocked by process 35.
2019-09-11T20:21:59.505829400Z Process 35 waits for ShareLock on transaction 1277065; blocked by process 67.
2019-09-11T20:21:59.505833648Z Process 67: UPDATE "records" SET "last_data_at" = '2019-09-11 20:21:58.493184' WHERE "records"."id" IN (SELECT "records"."id" FROM "records" WHERE "records"."id" IN ($1, $2) ORDER BY id asc)
2019-09-11T20:21:59.505843428Z Process 35: UPDATE "records" SET "last_data_at" = '2019-09-11 20:21:58.496318' WHERE "records"."id" IN (SELECT "records"."id" FROM "records" WHERE "records"."id" IN ($1, $2) ORDER BY id asc)
Here, since the ids from the (admittedly unnecessary) subquery will be sorted, I'd think a deadlock shouldn't be possible. Does IN not follow the ordering of the passed array? If not, how can I fix this?
(The subquery is coming from our ORM.)
What's the ORM you're using?
You could use advisory locking to mitigate the deadlocks:
UPDATE
"records"
SET
"last_data_at" = '2019-09-11 20:21:58.496318'
WHERE
"records"."id" IN ($1, $2)
-- pg_try_advisory_xact_lock() returns TRUE if the lock
-- can be acquired for the current transaction
AND pg_try_advisory_xact_lock("records"."id")
Honestly, IMHO, relying on an ORDER BY clause to avoid deadlocks is a fragile solution.
More info about advisory locking functions can be found in the PostgreSQL documentation.
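If you would rather stay with regular row locks, a commonly suggested alternative (a sketch, not verified against your ORM's generated SQL) is to take the locks inside the subquery with FOR UPDATE, so the ORDER BY governs the order in which rows are actually locked:
UPDATE "records"
SET "last_data_at" = '2019-09-11 20:21:58.496318'
FROM (
    -- lock the target rows here, in id order
    SELECT "id"
    FROM "records"
    WHERE "id" IN ($1, $2)
    ORDER BY "id"
    FOR UPDATE
) locked
WHERE "records"."id" = locked."id";
Without FOR UPDATE, the inner ORDER BY only sorts the returned ids; the outer UPDATE is still free to lock rows in whatever order its plan visits them, which is why the original statements can deadlock.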
Related
I have a table called user_table with 2 columns: id (integer) and data_value (integer).
Here are two transactions that end up with the same results:
-- TRANSACTION 1
BEGIN;
UPDATE user_table
SET data_value = 100
WHERE id = 0;
UPDATE user_table
SET data_value = 200
WHERE id = 1;
COMMIT;
-- TRANSACTION 2
BEGIN;
UPDATE user_table AS user_with_old_value SET
data_value = user_with_new_value.data_value
FROM (VALUES
(0, 100),
(1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;
I would like to know if there is a difference on the lock applied on the rows.
If I understand it correctly, transaction 1 will first lock user 0, then lock user 1, then free both locks.
But what does transaction 2 do?
Does it do the same thing, or does it lock user 0 and user 1 together, then free both locks?
This matters because if I have two concurrent transactions written like the first one, I might run into deadlock issues. But if I write my transactions like the second one, can I still run into deadlocks?
If it does the same thing, is there a way to write this transaction so that, before doing anything else, it waits until none of the rows it needs to update are locked, then locks all of those rows at the same time?
links:
the syntax of the second transaction comes from: Update multiple rows in same query using PostgreSQL
Both transactions lock the same two rows, and both can run into a deadlock. The difference is that the first transaction always locks the two rows in a certain order.
If your objective is to avoid deadlocks the first transaction is better: if you make a rule that any transaction must update the user with the lower id first, then no such two transactions can deadlock with each other (but they can still deadlock with other transactions that do not obey that rule).
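To get something close to the "lock all rows up front" behavior from the question, you can take the row locks in a deterministic order with SELECT ... FOR UPDATE before the multi-row UPDATE (a sketch; it assumes FOR UPDATE with ORDER BY locks rows in sort order, which holds for simple plans):
BEGIN;
-- take the row locks first, in a deterministic order
SELECT id FROM user_table WHERE id IN (0, 1) ORDER BY id FOR UPDATE;
-- the rows are already locked, so this multi-row UPDATE cannot
-- deadlock with another transaction that follows the same rule
UPDATE user_table AS user_with_old_value SET
    data_value = user_with_new_value.data_value
FROM (VALUES
    (0, 100),
    (1, 200)
) AS user_with_new_value(id, data_value)
WHERE user_with_new_value.id = user_with_old_value.id;
COMMIT;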
I see this from pgBadger: error waiting for AccessExclusiveLock on tuple
My query:
with old_values as (
    select * from info where id = $1 for update
),
new_values as (
    insert into info (id, data) values ($1, $2)
    ON CONFLICT ON CONSTRAINT info_pkey DO UPDATE
    set data = $2
)
select * from old_values
It seems like we should only be getting 1 query at a time per primary key.
We have over 2k records / pk's with almost no new inserts.
We have lots of queries updating the data fields in the table.
Is there a way to improve this and/or avoid locks?
It's fine if it skips the update due to another query updating the same row at the same time.
But I can't avoid needing the old values back.
Thanks
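Since it's acceptable to skip rows that another query is updating at the same moment, one option (a sketch, assuming PostgreSQL 9.5+ where SKIP LOCKED is available) is to skip already-locked rows instead of queuing on the tuple lock:
with old_values as (
    select * from info where id = $1 for update skip locked
)
update info
set data = $2
from old_values
where info.id = old_values.id
returning old_values.*;
This only covers the update path; since there are almost no new inserts, the insert can remain a separate statement.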
I'm using Postgres 9.6.5.
The docs say (13.3. Explicit Locking -> 13.3.2. Row-level Locks -> Row-level Lock Modes -> FOR UPDATE):
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). ...
The mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
Regarding UPDATEs on rows that are locked via SELECT FOR UPDATE in another transaction, I read the above as follows: other transactions that attempt to UPDATE these rows will be blocked until the current transaction (the one that did SELECT FOR UPDATE on those rows) ends, unless the columns being UPDATEd are ones without a unique index usable in a foreign key.
Is this correct? If so, suppose I have a table "program" with a text column "stage" (this column doesn't "have a unique index on [it] that can be used in a foreign key"), and a transaction that does SELECT FOR UPDATE on some rows and then UPDATEs "stage" in those rows. Is it correct that other concurrent transactions UPDATEing "stage" on these rows can fail, rather than block until the former transaction ends?
Your transactions can fail if a deadlock is detected in one of the UPDATE or SELECT ... FOR UPDATE statements, for example:
Transaction 1
BEGIN;
SELECT * FROM T1 FOR UPDATE;
SELECT * FROM T2 FOR UPDATE;
-- Intentionally, no rollback nor commit yet
Transaction 2
BEGIN;
SELECT * FROM T2 FOR UPDATE; -- locks T2 first (interleaved with transaction 1)
SELECT * FROM T1 FOR UPDATE; -- exception will happen here
The moment the second transaction tries to lock T1 you'll get:
ERROR: 40P01: deadlock detected
DETAIL: Process 15981 waits for ShareLock on transaction 3538264; blocked by process 15942.
Process 15942 waits for ShareLock on transaction 3538265; blocked by process 15981.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,1) in relation "t1"
LOCATION: DeadLockReport, deadlock.c:956
Time: 1001.728 ms
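Absent a deadlock, though, the quoted documentation means a plain UPDATE simply blocks until the locking transaction ends; it does not fail on its own. A minimal two-session sketch, reusing the "program"/"stage" names from the question (the id column is assumed):
-- Session A
BEGIN;
SELECT * FROM program WHERE id = 1 FOR UPDATE;

-- Session B: this UPDATE blocks until session A commits or
-- rolls back; it does not raise an error by itself
UPDATE program SET stage = 'done' WHERE id = 1;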
I've implemented a simple update/insert query like this:
-- NOTE: :time is replaced in real code, ids are placed statically for example purposes
-- set status_id=1 to existing rows, update others
UPDATE account_addresses
SET status_id = 1, updated_at = :time
WHERE account_id = 1
AND address_id IN (1,2,3)
AND status_id IN (2);
-- filter values according to what that update query returns, i.e. construct query like this to insert remaining new records:
INSERT INTO account_addresses (account_id, address_id, status_id, created_at, updated_at)
SELECT account_id, address_id, status_id, created_at::timestamptz, updated_at::timestamptz
FROM (VALUES (1,1,1,:time,:time),(1,2,1,:time,:time)) AS sub(account_id, address_id, status_id, created_at, updated_at)
WHERE NOT EXISTS (
SELECT 1 FROM account_addresses AS aa2
WHERE aa2.account_id = sub.account_id AND aa2.address_id = sub.address_id
)
RETURNING id;
-- throws:
-- PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "..."
-- DETAIL: Key (account_id, address_id)=(1, 1) already exists.
The reason why I'm doing it this way is: the record MAY exist with status_id=2. If so, set status_id=1.
Then insert new records. If a record already exists but was not affected by the first UPDATE query, ignore it (i.e. rows with status_id=3).
This works nicely, but when run concurrently it crashes on a duplicate key because of a race condition.
But why does the race condition occur, if the "insert-where-not-exists" is supposed to be atomic?
Ah. I just searched a little more and insert where not exists is not atomic.
Quote from http://www.postgresql.org/message-id/26970.1296761016@sss.pgh.pa.us :
Mage writes:
The main question is that isn't "insert into ... select ... where not
exists" atomic?
No, it isn't: it will fail in the presence of other transactions
doing the same thing, because the EXISTS test will only see rows that
committed before the command started. You might care to read the
manual's chapter about concurrency:
http://www.postgresql.org/docs/9.0/static/mvcc.html
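On PostgreSQL 9.5+ the race-free way to express this is a single INSERT ... ON CONFLICT DO UPDATE. A sketch against the schema above, assuming the unique constraint is on (account_id, address_id) as the error message indicates; the conditional WHERE preserves the "ignore rows with status_id=3" behavior:
INSERT INTO account_addresses (account_id, address_id, status_id, created_at, updated_at)
VALUES (1, 1, 1, :time, :time), (1, 2, 1, :time, :time)
ON CONFLICT (account_id, address_id) DO UPDATE
SET status_id = 1, updated_at = EXCLUDED.updated_at
WHERE account_addresses.status_id = 2  -- only flip rows that were status 2
RETURNING id;
Unlike the two-statement version, the conflict check and the insert happen atomically, so concurrent runs cannot raise the unique-violation error.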
I have two queries that deadlock with each other.
PERFORM id
FROM stack
WHERE id IN (SELECT tmp.stkid FROM tmp_push_bulk tmp WHERE tmp.stkid > 0)
ORDER BY id
FOR UPDATE OF stack
And
PERFORM stk.id
FROM stack stk
WHERE stk.referer IN (
SELECT tmp.id
FROM tmp_renew_stk tmp
)
ORDER BY stk.id
FOR UPDATE OF stk
The error is:
- PG (20:46:37) [14786]: Execute command failed: ERROR: deadlock detected
DETAIL: Process 14797 waits for ShareLock on transaction 183495696; blocked by process 24303.
Process 24303 waits for ShareLock on transaction 183495704; blocked by process 14797.
HINT: See server log for query details.
I also thought that every process locks its rows in id-column order, so a deadlock should be impossible. Can anyone tell me why it happens?
It might be that the IN expression locks rows in no particular order (untested). I would generally replace IN on bigger sets with a JOIN where possible. That is faster to begin with, thereby minimizing the chance for deadlocks.
Update: According to your comment you have many duplicates. Assuming dupes in the temp table, I suggest a subquery for three purposes:
Fold duplicates.
With DISTINCT it's particularly cheap to sort the rows in the subquery at the same time.
Apply the WHERE condition tmp.stkid > 0 early.
I would try:
PERFORM s.id
FROM stack s
JOIN (
SELECT DISTINCT stkid
FROM tmp_push_bulk
WHERE stkid > 0
) tmp ON tmp.stkid = s.id
ORDER BY 1
FOR UPDATE OF s;
And:
PERFORM s.id
FROM stack s
JOIN (
SELECT DISTINCT id
FROM tmp_renew_stk
) tmp ON tmp.id = s.referer
ORDER BY 1
FOR UPDATE OF s;