How do I interpret a PostgreSQL deadlock message?

I'm running a PostgreSQL 9.5.2 server, and I'm occasionally seeing a message like:
ERROR: deadlock detected
Detail: Process 1234 waits for ShareLock on transaction 3042999324; blocked by process 5678.
Process 5678 waits for ShareLock on transaction 3042999328; blocked by process 1234.
Hint: See server log for query details.
Where: while locking tuple (5389,30) in relation "asset"
If the message contains any information about the row or column that's causing the deadlock, it will help me debug the big ugly common table expression that's causing the error in the first place.

I figured it out while looking up the correct terminology to use for my question: tuple refers to the row's ctid, a system column on every row indicating the physical location of the version of the row in question. (When a row is updated, PostgreSQL keeps the old version around for a while so that concurrent transactions can still see it under MVCC.)
You can select the data simply with:
SELECT * from "asset" where ctid = '(5389,30)';
However, if you wait too long (like I did), an autovacuum job might clean up that version of the row if it's no longer in use.
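As a side note, selecting the row's other system columns shows which transactions created and (if it was later deleted or updated) invalidated that version; a minimal sketch against the same table:
SELECT ctid, xmin, xmax, *
FROM "asset"
WHERE ctid = '(5389,30)';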

Related

PSQL: VACUUM ANALYZE is showing incorrect oldest xmin

When I run vacuum verbose on a table, the result is showing an oldest xmin value of 9696975, as shown below:
table_xxx: found 0 removable, 41472710 nonremovable row versions in 482550 out of 482550 pages
DETAIL: 41331110 dead row versions cannot be removed yet, oldest xmin: 9696975
There were 0 unused item identifiers.
But when I check in pg_stat_activity, there are no entries with the backend_xmin value that matches this oldest xmin value.
Below is the response I get when I run the query:
SELECT backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
Response:
backend_xmin
------------
10134695
10134696
10134696
10134696
10134696
The issue I am facing is that the vacuum is not removing any dead tuples from the table.
I tried the methods mentioned in this post, but they didn't help.
Edit: The PostgreSQL version is 13.6, running in an Aurora cluster.
A row is only completely dead when no live transaction can see it anymore, i.e. when no transaction that was started before the row was updated / deleted is still running. That does not necessarily involve any locks at all. The mere existence of a long-running transaction can block VACUUM from cleaning up.
So the system view to consult is pg_stat_activity. Look for zombie transactions that you can kill, for example with the queries sketched below; then VACUUM can proceed.
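A minimal sketch for hunting them down (the pid in the last statement is hypothetical; substitute whatever offender the first query turns up):
-- backends holding back the xmin horizon, oldest first
SELECT pid, state, backend_xmin, age(backend_xmin) AS xmin_age, xact_start, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
-- terminate an offending backend
SELECT pg_terminate_backend(12345);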
Old prepared transactions can also block for the same reason. You can check pg_prepared_xacts for those.
Of course, VACUUM only runs on the primary server, not on replica (standby) instances, in case streaming replication has been set up.
Related:
Long running function locking the database?
What are the consequences of not ending a database transaction?
What does backend_xmin and backend_xid represent in pg_stat_activity?
Do postgres autovacuum properties persist for DB replications?
Apart from old transactions, there are some other things that can hold the “xmin horizon” back (queries to check both are sketched after this list):
stale replication slots (see pg_replication_slots)
abandoned prepared transactions (see pg_prepared_xacts)
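A minimal sketch for inspecting both of these:
-- replication slots pinning the xmin horizon (suspect if inactive with an old xmin)
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;
-- abandoned prepared transactions
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;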

Postgres add column on existing table takes very long

I have a table with 500k rows. Now I want to add a new column
(type boolean, nullable = false) without a default value.
The query to do so has been running practically forever.
I'm using PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit on my Windows 2012 Server
In pgAdmin I can see the query is blocked by PID 0. But when I execute this query, I can't see any query with pid = 0:
SELECT *
FROM pg_stat_activity
Can someone help me here? Why is the query blocked, and how can I fix this so I can add a new column to my table?
UPDATE attempt:
SELECT *
FROM pg_prepared_xacts
Update: It works after rolling back all prepared transactions.
ROLLBACK PREPARED 'gid goes here';
You have got stale prepared transactions. I say that as in "you have got the measles", because it is a disease for a database.
Such prepared transactions keep holding locks and block autovacuum progress, so they will bring your database to its knees if you don't take action. In addition, such transactions are persisted, so even a restart of the database won't get rid of them.
Remove them with
ROLLBACK PREPARED 'gid goes here'; /* use the transaction names shown in the view */
If you use prepared transactions, you need a distributed transaction manager. That is a piece of software that keeps track of all prepared transactions and their state and persists that information, so that no distributed transaction can become stale. Even if there is a crash, the distributed transaction manager will resolve in-doubt transactions in all involved databases.
If you don't have that, don't use prepared transactions. You now know why. Best is to set max_prepared_transactions to 0 in that case.
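If many of them have piled up, a hedged one-liner can generate the cleanup statements straight from the view (run the generated statements afterwards):
-- emit one ROLLBACK PREPARED statement per stale prepared transaction
SELECT format('ROLLBACK PREPARED %L;', gid)
FROM pg_prepared_xacts;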

Question about deadlocks in terms of Postgresql

Can someone please explain to me the following situation:
I am using PostgreSQL 12 as the main RDBMS in my project. There are several background jobs accessing and writing to the database in parallel, and there are also some user interactions (which of course produce updates and inserts from the front end of the application).
Periodically I am getting exceptions like this one:
SQLSTATE[40P01]: Deadlock detected: 7 ERROR: deadlock detected
DETAIL: Process 18046 waits for ShareLock on transaction 212488; blocked by process 31036.
Process 31036 waits for ShareLock on transaction 212489; blocked by process 18046.
HINT: See server log for query details.
CONTEXT: while updating tuple (1637,16) in relation "my_table"
Inside my application I don't manually lock any rows or tables during my transactions, but I frequently have 'large' transactions that can modify a lot of rows in a single operation. So the questions are:
Do ordinary transactions produce table-wide locks or row-wide locks? (I assume yes, unless this whole situation is magic.)
Shouldn't the RDBMS automatically resolve this kind of problem when two queries wrapped in transactions try to modify the same resource?
If the answer to the second question is "no", then how should I handle that kind of situation?
re 1) DML statements only lock the rows that are modified. There is no lock escalation in Postgres where the whole table is locked for writes. There is a "table lock" but that is only there to prevent concurrent DDL - a SELECT will also acquire that. Those share locks don't prevent DML on the table.
re 2) no, the DBMS cannot resolve this, because a deadlock means tx1 is waiting for a lock held by tx2 while tx2 is waiting for a lock held by tx1. How would the DBMS know what to do? The only way it can resolve the situation is by choosing one of the two sessions as a victim and killing its transaction (which is the error you see).
re 3) the usual approach to avoiding deadlocks is to always update rows in the same order. Which usually turns the deadlock into a simple "lock wait" for the second transaction.
Assume the following UPDATE sequence
tx1                        tx2
------------------------------------------
update id = 1               |
                            |  update id = 2
update id = 2               |
(tx1 waits)                 |
                            |  update id = 1
(now we have a deadlock)
If you always update the rows in e.g. ascending order this changes to:
tx1                        tx2
------------------------------------------
update id = 1               |
                            |  update id = 1
                            |  (waits)
update id = 2               |
                            |
commit;                     |
(locks released)            |
                            |
                            |  update id = 2
                            |  commit;
So you don't get a deadlock, just a wait for the second transaction.
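A common way to enforce that ordering is to take the row locks explicitly with SELECT ... FOR UPDATE, sorted by key, before modifying anything. A minimal sketch; my_table comes from the error above, and the val column is hypothetical:
BEGIN;
-- take the row locks in a deterministic (ascending) order first
SELECT id FROM my_table WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- with the locks held, the updates cannot deadlock against another
-- transaction that follows the same locking discipline
UPDATE my_table SET val = 'x' WHERE id IN (1, 2);
COMMIT;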
All SQL statements that affect tables will take a lock on the table (albeit not necessarily a strong one). But that doesn't seem to be your problem here.
All SQL statements that modify a row (or SELECT ... FOR UPDATE) will lock the affected rows. Your two transactions probably blocked on a row-level lock.
Yes; that is what the error message shows. PostgreSQL has resolved the deadlock by killing one of the involved transactions.
If transaction 1 holds a lock that transaction 2 is waiting for and vice versa, there is no other way to resolve the situation. The only way to release a lock is to end the transaction that holds it.
You should catch the error in your application code and retry the database transaction. A deadlock is a transient error.
If you get a lot of deadlocks, you should try to reduce them. What helps is to keep your transactions short and small. If that is not an option, make sure that all transactions that lock several rows lock them in the same order.
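As for catching and retrying: that logic normally lives in the application, but staying in SQL for illustration, a PL/pgSQL block can catch the deadlock_detected condition (SQLSTATE 40P01) and try again. A rough sketch; my_table and its val column are hypothetical, and note that locks taken earlier in the surrounding transaction stay held across retries:
DO $$
DECLARE
    attempts int := 0;
BEGIN
    LOOP
        BEGIN
            UPDATE my_table SET val = 'x' WHERE id = 1;
            EXIT;                                 -- success, leave the loop
        EXCEPTION WHEN deadlock_detected THEN
            attempts := attempts + 1;
            IF attempts >= 3 THEN
                RAISE;                            -- give up after three tries
            END IF;
            PERFORM pg_sleep(0.1 * attempts);     -- back off before retrying
        END;
    END LOOP;
END;
$$;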

Can not execute select queries while making a long lasting insert transaction

I'm pretty new to PostgreSQL and I'm sure I'm missing something here.
The scenario, on version 11: executing a big DROP TABLE and INSERT transaction on a given table with the Node.js driver, which may take 30 minutes.
While doing that, if I try to query that table with a SELECT using the JDBC driver, the query execution waits for the transaction to finish. If I close the transaction (by finishing it or by forcing it to exit), the JDBC query becomes responsive.
I thought I could read a table with one connection while performing a transaction on another one.
What am I missing here?
Should I keep the table (without dropping it at the beginning of the transaction) ?
DROP TABLE takes an ACCESS EXCLUSIVE lock on the table, which is there precisely to prevent it from taking place concurrently with any other operation on the table. After all, DROP TABLE physically removes the table.
Since all locks are held until the end of the database transaction, all access to the dropped table is blocked until the transaction ends.
Of course the files are only removed when the transaction commits, so you might wonder why PostgreSQL doesn't let concurrent transactions read in the meantime. But that would mean that a COMMIT could be blocked by a concurrent reader, or that a SELECT could hit a system error in the middle of reading, neither of which sounds appealing.
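To watch this happen, you can ask which sessions are blocked and by whom; a minimal diagnostic sketch (pg_blocking_pids is available from PostgreSQL 9.6 on):
-- sessions currently waiting on a lock, and the pids holding it
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;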

truncate on one table blocked by select of another

Postgres 9.4, Ubuntu 10
I have been unable to find this exact problem here, so here it goes:
For each table t in my database, I have a table t_audit. Each delete, insert, and update on table t triggers a function that inserts a record to table t_audit.
Each night, a process truncates each t_audit table.
Last night, a select on table t prevented the truncate on t_audit from proceeding. I did not save what was in pg_stat_activity at the time, but I did save the output from blocking_locks().
Blocking pid: RowExclusiveLock, t, select * from t where ...
Waiting pid: AccessExclusiveLock, t_audit, truncate table t_audit
I am uncertain as to why a select on t would block the truncate on t_audit. As I did not save pg_stat_activity, the best that I can assume is that the select was "idle in transaction". I asked the person who was running the query at the time, and he said he was not running the update as part of a transaction. He did update table t just prior to the select. He did not close his connection as the pid was still active until I ran pg_terminate_backend on the pid.
Has anyone experienced this issue before? Is there a recommended procedure for this other than running pg_terminate_backend on any pids which are "idle in transaction" just prior to calling truncates?
Thank you for reading and taking time to respond.
Are there any triggers in place that might cause even something as innocuous as a SELECT on the audit table at the same time as the TRUNCATE? (Although the fact that it's a RowExclusive lock indicates that whatever is being triggered is something like an UPDATE instead.) Per the PG 9.4 locking documentation, SELECT and TRUNCATE would indeed block each other; that is expected behavior. The relevant tidbits are these:
ACCESS SHARE
Conflicts with the ACCESS EXCLUSIVE lock mode only.
The SELECT command acquires a lock of this mode on referenced tables. In general, any query that only reads a table and does not modify it will acquire this lock mode.
ACCESS EXCLUSIVE
Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.
Acquired by the DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL commands. Many forms of ALTER TABLE also acquire a lock at this level.
And even more specifically telling is this explicit tip on that page:
Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement.
As for what to do in this scenario: if your use case is tolerant of unceremonious terminations of (possibly idle) connections, that is certainly a straightforward way of ensuring that the TRUNCATE succeeds.
A more flexible alternative may be to clear out the table with DELETE instead and follow up with some variation of VACUUM afterwards (DELETE and SELECT will not block each other, though DELETE will block concurrent UPDATEs of the same rows). The suitability of this approach depends a lot on things like the table's day-to-day growth pattern (plain VACUUM may be enough if its maximum size doesn't change much from day to day) and how badly you need the space reclaimed in the short term. For a huge table where you need the space back quickly, you may have to VACUUM FULL it during a quiet window, but VACUUM FULL is not a gentle hammer to swing by any means.
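A minimal sketch of that alternative for the nightly job, using the table name from the question (plain VACUUM does not take the ACCESS EXCLUSIVE lock that TRUNCATE needs):
-- clear the audit table without an ACCESS EXCLUSIVE lock
DELETE FROM t_audit;
-- mark the dead rows reusable; space is recycled rather than returned to the OS
VACUUM t_audit;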