PostgreSQL deadlocks on two table locks - postgresql

I am locking two different tables that have no interconnected columns, but I still get a deadlock.
Here are the server logs:
2013-10-22 15:16:19 EDT ERROR: deadlock detected
2013-10-22 15:16:19 EDT DETAIL: Process 26762 waits for AccessExclusiveLock on relation 39913 of database 39693; blocked by process 26761.
Process 26761 waits for RowExclusiveLock on relation 40113 of database 39693; blocked by process 26762.
Process 26762: lock table par_times in access exclusive mode
Process 26761: INSERT INTO cached_float (entry_id, figure_type, value) VALUES (33225, 1, 54.759402056277075) RETURNING cached_float.id
Any ideas why?

You can debug this by checking the numbers displayed here:
Process 26762 waits for AccessExclusiveLock on relation 39913 of
database 39693; blocked by process 26761.
Process 26761 waits for RowExclusiveLock on relation 40113 of database
39693; blocked by process 26762.
Run in your database:
SELECT 39913::regclass AS tbl1, 40113::regclass AS tbl2
to see the involved tables. Also consider any triggers and possibly foreign key constraints on the involved tables.
Generally: locking tables manually does not necessarily prevent deadlocks. It may be the cause of the deadlock to begin with.
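If you have to do this lookup often, the OIDs can be pulled out of the DETAIL text automatically. Below is a minimal sketch in Python, assuming the message has the same shape as the log above. The helper names relation_oids and regclass_query are made up for illustration; the code only builds the query string and does not connect to a database.

```python
import re

# Matches the "relation <oid> of database <oid>" fragment of a deadlock DETAIL line.
RELATION_RE = re.compile(r"relation (\d+) of database (\d+)")


def relation_oids(detail_text):
    """Return the distinct relation OIDs mentioned in a DETAIL message, in order."""
    seen = []
    for rel_oid, _db_oid in RELATION_RE.findall(detail_text):
        oid = int(rel_oid)
        if oid not in seen:
            seen.append(oid)
    return seen


def regclass_query(oids):
    """Build a SELECT that resolves each OID to a table name via a regclass cast."""
    columns = ", ".join(
        f"{oid}::regclass AS tbl{i}" for i, oid in enumerate(oids, start=1)
    )
    return f"SELECT {columns}"


detail = (
    "Process 26762 waits for AccessExclusiveLock on relation 39913 of database 39693; "
    "blocked by process 26761.\n"
    "Process 26761 waits for RowExclusiveLock on relation 40113 of database 39693; "
    "blocked by process 26762."
)
print(regclass_query(relation_oids(detail)))
# prints: SELECT 39913::regclass AS tbl1, 40113::regclass AS tbl2
```

Run the printed statement while connected to the database named in the log, since a regclass cast resolves OIDs only within the current database.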

Related

How to avoid long delay before finally getting "40001 could not serialize access due to concurrent update"

We have a Postgres 12 system running one master and two async hot-standby replica servers, and we use SERIALIZABLE transactions. All the database servers have very fast SSD storage for Postgres and 64 GB of RAM. Clients connect directly to the master server if they cannot accept delayed data for a transaction. Read-only clients that accept data up to 5 seconds old use the replica servers for querying data. Read-only clients use REPEATABLE READ transactions.
I'm aware that because we use SERIALIZABLE transactions Postgres might give us false positive matches and force us to repeat transactions. This is fine and expected.
However, the problem I'm seeing is that randomly a single line INSERT or UPDATE query stalls for a very long time. As an example, one error case was as follows (speaking directly to master to allow modifying table data):
A simple single row insert
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
stalled for 74.62 seconds before finally emitting error
ERROR 40001 could not serialize access due to concurrent update
with error context
SQL statement "SELECT 1 FROM ONLY "public"."restservices" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
We log all queries exceeding 40 ms so I know this kind of stall is rare. Like maybe a couple of queries a day. We average around 200-400 transactions per second during normal load with 5-40 queries per transaction.
After finally getting the above error, the client code automatically released two savepoints, rolled back the transaction and disconnected from database (this cleanup took 2 ms total). It then reconnected to database 2 ms later and replayed the whole transaction from the start and finished in 66 ms including the time to connect to the database. So I think this is not about performance of the client or the master server as a whole. The expected transaction time is between 5-90 ms depending on transaction.
Is there some PostgreSQL connection or master configuration setting that I can use to make PostgreSQL return the error 40001 faster, even if it causes more transactions to be rolled back? Does anybody know if setting
set local statement_timeout='250'
within the transaction has dangerous side-effects? According to the documentation https://www.postgresql.org/docs/12/runtime-config-client.html "Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions" but I could set the timeout only for transactions by this client that's able to automatically retry the transaction very fast.
Is there anything else to try?
It looks like someone had the parent row to the one you were trying to insert locked. PostgreSQL doesn't know what to do about that until the lock is released, so it blocks. If you failed rather than blocking, and upon failure retried the exact same thing, the same parent row would (most likely) still be locked and so would just fail again, and you would busy-wait. Busy-waiting is not good, so blocking rather than failing is generally a good thing here. It blocks and then unblocks only to fail, but once it does fail a retry should succeed.
An obvious exception to "blocking is better than failing" is when you can pick a different parent row for the retry, if that makes sense in your context. In that case, maybe the best thing to do is to explicitly lock the parent row with NOWAIT before attempting the insert. That way you can perhaps deal with failures in a more nuanced way.
If you must retry with the same parent_id, then I think the only real solution is to figure out who is holding the parent row lock for so long, and fix that. I don't think that setting statement_timeout would be hazardous, but it also wouldn't solve your problem, as you would probably just keep retrying until the lock on the offending row is released. (Setting it on the other session, the one holding the lock, might be helpful, depending on what that session is doing while the lock is held.)
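To illustrate the retry side of this, here is a minimal sketch in Python of a client loop that retries only on SQLSTATE 40001 and backs off between attempts so it does not busy-wait. FakeDbError is a hypothetical stand-in for a driver exception that carries the SQLSTATE in a pgcode attribute, and flaky_transaction simulates a transaction that fails twice. Neither comes from the question, and no database is involved.

```python
import time

SERIALIZATION_FAILURE = "40001"


class FakeDbError(Exception):
    """Hypothetical stand-in for a driver exception exposing the SQLSTATE as `pgcode`."""

    def __init__(self, pgcode):
        super().__init__(f"SQLSTATE {pgcode}")
        self.pgcode = pgcode


def run_with_retry(transaction, max_attempts=5, base_delay=0.01):
    """Call `transaction()` until it succeeds, retrying only on SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except FakeDbError as exc:
            if exc.pgcode != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            # Back off a little so the retry does not degenerate into busy-waiting.
            time.sleep(base_delay * 2 ** (attempt - 1))


attempts = []


def flaky_transaction():
    """Simulated transaction: fails with 40001 twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeDbError(SERIALIZATION_FAILURE)
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, "after", len(attempts), "attempts")
# prints: committed after 3 attempts
```

Any other SQLSTATE is re-raised immediately, so real errors such as constraint violations are not hidden by the retry loop.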

PostgreSQL detected deadlock error when create table partition

I am using the new table partitioning feature in PostgreSQL 10. I have a partitioned parent table that looks like this:
CREATE TABLE names (
id integer,
name varchar(64)
) PARTITION BY RANGE (id)
I have many nodes creating partitions in parallel like this while loading data into the database:
CREATE TABLE names__139__119230558__120050888 PARTITION OF names
FOR VALUES FROM ('119230558') TO ('120050888')
Then, somehow, a "deadlock detected" error is raised like this:
ERROR: deadlock detected
DETAIL: Process 7802 waits for AccessExclusiveLock on relation 16401 of database 16390; blocked by process 7803.
Process 7803 waits for AccessExclusiveLock on relation 16401 of database 16390; blocked by process 7802.
Process 7802:
CREATE TABLE names__139__119230558__120050888 PARTITION OF names
FOR VALUES FROM ('119230558') TO ('120050888')
Process 7803:
CREATE TABLE names__94__80601867__81503664 PARTITION OF names
FOR VALUES FROM ('80601867') TO ('81503664')
Here's the odd part. Unsurprisingly, a deadlock means two processes are each waiting for a resource held by the other. However, look at the deadlock error closely:
Process 7802 waits for AccessExclusiveLock on relation 16401 of database 16390; blocked by process 7803.
Process 7803 waits for AccessExclusiveLock on relation 16401 of database 16390; blocked by process 7802.
Both processes are actually waiting for the same resource, relation 16401 of database 16390, which is the partitioned parent table names in this case. So the question is: how can a deadlock be detected when two processes are acquiring the same resource? I thought the second process should simply wait for the first one to finish.

How to debug ShareLock in Postgres

I am seeing quite a few occurrences of the following in my Postgres server log:
LOG: process x still waiting for ShareLock on transaction y after 1000.109 ms
DETAIL: Process holding the lock: z. Wait queue: x.
CONTEXT: while inserting index tuple (a,b) in relation "my_test_table"
SQL function "my_test_function" statement 1
...
LOG: process x acquired ShareLock on transaction y after 1013.664 ms
CONTEXT: while inserting index tuple (a,b) in relation "my_test_table"
I am running Postgres 9.5.3. In addition I am running on Heroku so I don't have access to the fine grained superuser-only debugging tools.
I am wondering how best to debug such an issue given these constraints and the fact each individual lock is relatively transient (generally 1000-2000ms).
Things I have tried:
Monitoring pg_locks (and joining to pg_class for context).
Investigating pageinspect.
Replicating locally both by hand and with pgbench where I do have superuser perms. I have so far been unable to replicate the issue locally (I suspect due to having a much smaller data set but I can't be sure).
It is worth noting that CPU utilisation appears high (load average of >1) when I see these issues so it's possible there is nothing wrong with the above per se and that I'm seeing it as a consequence of insufficient system resources being available. I would still like to understand how best to debug it though so I can understand what exactly is happening.
The key thing is that it's a ShareLock on the transaction.
This means that one transaction is waiting for another to commit/rollback before it can proceed. It's only loosely a "lock". What's happening here is that a PostgreSQL transaction takes an ExclusiveLock on its own transaction ID when it starts. Other transactions that want to wait for it to finish can try to acquire a ShareLock on the transaction, which will block until the ExclusiveLock is released on commit/abort. It's basically using the locking mechanism as a convenience to implement inter-transaction completion signalling.
This usually happens when the waiting transaction(s) are trying to INSERT a UNIQUE or PRIMARY KEY value for a row that's recently inserted/modified by the waited-on transaction. The waiting transactions cannot proceed until they know the outcome of the waited-on transaction - whether it committed or rolled back, and if it committed, whether the target row got deleted/inserted/whatever.
That's consistent with what's in your log message. Process "x" is trying to insert into "my_test_table" and has to wait until process "z" commits or rolls back transaction "y" to find out whether to raise a unique violation or whether it can proceed.
Most likely you have contention in some kind of upsert or queue processing system. This can also happen if you have some function/transaction pattern that tries to insert into a heavily contended table, then does a lot of other time consuming work before it commits.
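The signalling described above can be modelled with a few lines of Python. This is only a toy analogy under my own assumptions, not how PostgreSQL is implemented: a threading.Event plays the role of the lock on the transaction ID, and ToyTransaction and try_insert are invented names.

```python
import threading


class ToyTransaction:
    """Toy model: the 'lock' on a transaction ID is just a completion signal."""

    def __init__(self, xid):
        self.xid = xid
        self.outcome = None
        # Unset while the transaction is running, i.e. while its xid lock is 'held'.
        self._finished = threading.Event()

    def finish(self, outcome):
        """Commit or roll back; this 'releases' the xid lock and wakes all waiters."""
        self.outcome = outcome
        self._finished.set()

    def wait_for_outcome(self, timeout=None):
        """Analogue of taking a ShareLock on the xid: block until it finishes."""
        if not self._finished.wait(timeout):
            return None
        return self.outcome


def try_insert(key, in_flight):
    """Decide the fate of an insert whose key was already inserted by `in_flight`."""
    outcome = in_flight.wait_for_outcome(timeout=5)
    if outcome == "commit":
        return f"unique violation on {key}"
    if outcome == "rollback":
        return f"inserted {key}"
    return "still waiting"


first = ToyTransaction(xid=100)
results = []
waiter = threading.Thread(target=lambda: results.append(try_insert("k1", first)))
waiter.start()
first.finish("rollback")  # the waited-on transaction aborts, so the key is free
waiter.join()
print(results[0])
# prints: inserted k1
```

The longer the first transaction does other work before calling finish, the longer every waiter stays blocked, which is the pattern behind the "still waiting for ShareLock on transaction" log lines.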

postmortem deadlock debugging in PostgreSQL

I want to collect post-mortem debugging information about both the "winner" transaction and the "loser" transaction(s) in a PostgreSQL deadlock.
I found this wiki page which includes some good live views that would give hints about what is currently going wrong, but if I understand correctly, by the time the losing transaction is being rolled back, most of the useful information will already have been removed from these live views.
I saw options such as deadlock_timeout and log_lock_waits which log information about the losing transaction, but notably not the winning transaction. There doesn't appear to be any way to customize the log output produced to include more detailed information than this (notably, none of these integers mean anything when I'm debugging based on logs after the fact):
LOG: process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms
DETAIL: Process holding the lock: 11366. Wait queue: 11367.
CONTEXT: while updating tuple (0,2) in relation "foo"
STATEMENT: UPDATE foo SET value = 3;
Is there a better data source I can use to collect this information?
First, the trace pasted into the question is not a deadlock trace. It is a log message about a lock that the process has been waiting on for longer than deadlock_timeout. It's not an error and it does not abort a transaction, whereas a deadlock is fatal to a transaction.
I want to collect post-mortem debugging information about both the
"winner" transaction and the "loser" transaction(s) in a PostgreSQL
deadlock.
They are in the server log, along with the query that gets terminated.
As an example, here's a deadlock trace with log_line_prefix = '%t [%p] ' for the case mentioned in this question: postgres deadlock without explicit locking
2015-04-09 15:16:42 CEST [21689] ERROR: deadlock detected
2015-04-09 15:16:42 CEST [21689] DETAIL: Process 21689 waits for ShareLock on transaction 1866436; blocked by process 21028.
Process 21028 waits for ShareLock on transaction 1866435; blocked by process 21689.
Process 21689: insert into b values(1);
Process 21028: insert into a values(1);
2015-04-09 15:16:42 CEST [21689] HINT: See server log for query details.
2015-04-09 15:16:42 CEST [21689] STATEMENT: insert into b values(1);
The "loser" is PID 21689, as the producer of the error. The "winner" is PID 21028 by virtue of just being the other one.
Looking at it from the point of view of the client, it gets this message:
ERROR: deadlock detected
DETAIL: Process 21689 waits for ShareLock on transaction 1866436; blocked by process 21028.
Process 21028 waits for ShareLock on transaction 1866435; blocked by process 21689.
HINT: See server log for query details.
There is no mention of the query, but it's the one that the client just sent. There is no mention of the loser, but it's the one that gets this error; the other one doesn't have to notice anything.
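For after-the-fact analysis, the server log trace can be turned into a small summary. Here is a rough sketch in Python that assumes log_line_prefix = '%t [%p] ' as in the example above. The function name summarize_deadlock and the shape of the returned dictionary are my own invention.

```python
import re

# PID of the backend that reported the error, taken from the "[%p]" prefix.
PREFIX_PID_RE = re.compile(r"\[(\d+)\] ERROR:\s+deadlock detected")
# One "Process N waits for <mode> on <target>; blocked by process M." entry.
WAITS_RE = re.compile(
    r"Process (\d+) waits for (\w+) on (.+?); blocked by process (\d+)\."
)


def summarize_deadlock(log_text):
    """Return the loser PID, the other PIDs, and each wait edge from a deadlock trace."""
    match = PREFIX_PID_RE.search(log_text)
    if match is None:
        raise ValueError("no 'deadlock detected' line found")
    loser = int(match.group(1))
    waits = [
        {"pid": int(pid), "mode": mode, "target": target, "blocked_by": int(blocker)}
        for pid, mode, target, blocker in WAITS_RE.findall(log_text)
    ]
    survivors = sorted({w["pid"] for w in waits} - {loser})
    return {"loser": loser, "survivors": survivors, "waits": waits}


trace = """\
2015-04-09 15:16:42 CEST [21689] ERROR: deadlock detected
2015-04-09 15:16:42 CEST [21689] DETAIL: Process 21689 waits for ShareLock on transaction 1866436; blocked by process 21028.
Process 21028 waits for ShareLock on transaction 1866435; blocked by process 21689.
"""
summary = summarize_deadlock(trace)
print(summary["loser"], summary["survivors"])
# prints: 21689 [21028]
```

The loser is identified purely from the PID in the line prefix, so a different log_line_prefix would need a different first regular expression.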

Can multiple SELECT FOR UPDATES in a single transaction cause a race condition (Postgres)?

I'm using Postgres 9.1. I'm wondering if using multiple SELECT FOR UPDATES in the same transaction could potentially cause a race condition.
2 concurrent transactions:
transaction 1: select for update on table 1 -- successfully acquires lock
transaction 2: select for update on table 2 -- successfully acquires lock
transaction 2: select for update on table 1 -- waiting for lock release from transaction 1
transaction 1: select for update on table 2 -- waiting for lock release from transaction 2
What happens in this situation? Does one of the waiting transactions eventually time out? If so, is there a way to configure the timeout duration?
edit: is deadlock_timeout the configuration I am looking for?
Yes, you should look up deadlock_timeout in the docs.
But your scenario doesn't necessarily mean that there will be a deadlock, because PostgreSQL uses row-level locks and it is not clear whether your transactions are competing for the same rows.
Another option is to use an isolation level higher than the default READ COMMITTED. But in this case your application should be ready to receive exceptions with SQLSTATE 40001:
ERROR: could not serialize access due to concurrent update
This is expected; you should just retry the transaction as is.
You can find a very good overview of the Serializable isolation level on the wiki.
PostgreSQL will detect the deadlock on step 4 and will fail the transaction. Here's what happened when I tried it in psql (only showing step 4):
template1=# SELECT * FROM table2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 17536 waits for ShareLock on transaction 166946; blocked by process 18880.
Process 18880 waits for ShareLock on transaction 166944; blocked by process 17536.
HINT: See server log for query details.
template1=#
This happens after 1 second, which is the default value of deadlock_timeout. The other answer has more information about this.
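The four steps from the question form a cycle in a "who waits for whom" graph, and that cycle is what gets detected. A toy sketch in Python follows, under the simplifying assumption that each blocked transaction waits for exactly one other. It is an illustration of the idea, not PostgreSQL's actual detector, and the names find_cycle, tx1 and tx2 are made up.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a wait cycle, or None if there is none.

    `waits_for` maps each waiting transaction to the one it is blocked by.
    """
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current is not None:
            # `current` reappeared: everything from its first occurrence is the cycle.
            return path[path.index(current):]
    return None


# Steps 1 and 2: each transaction acquires its lock; nobody waits yet.
waits_for = {}
# Step 3: transaction 2 wants what transaction 1 holds.
waits_for["tx2"] = "tx1"
print(find_cycle(waits_for))
# prints: None
# Step 4: transaction 1 wants what transaction 2 holds, closing the cycle.
waits_for["tx1"] = "tx2"
print(find_cycle(waits_for))
# prints: ['tx2', 'tx1']
```

Taking the locks in the same order in both transactions (always table 1 before table 2) would keep step 4 from ever creating the second edge, so no cycle could form.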