When I run vacuum verbose on a table, the result is showing an oldest xmin value of 9696975, as shown below:
table_xxx: found 0 removable, 41472710 nonremovable row versions in 482550 out of 482550 pages
DETAIL: 41331110 dead row versions cannot be removed yet, oldest xmin: 9696975
There were 0 unused item identifiers.
But when I check in pg_stat_activity, there are no entries with the backend_xmin value that matches this oldest xmin value.
Below is the response I get when I run the query:
SELECT backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
Response:
backend_xmin
------------
10134695
10134696
10134696
10134696
10134696
The issue I am facing is that the vacuum is not removing any dead tuples from the table.
I tried the methods mentioned in this post, but they didn't help.
Edit:
The PostgreSQL version is 13.6, running in an Aurora cluster.
A row is only completely dead when no live transaction can see it anymore. I.e. no transaction that has been started before the row was updated / deleted is still running. That does not necessarily involve any locks at all. The mere existence of a long-running transaction can block VACUUM from cleaning up.
So the system view to consult is pg_stat_activity. Look for zombie transactions that you can kill. Then VACUUM can proceed.
Old prepared transactions can also block for the same reason. You can check pg_prepared_xacts for those.
Of course, VACUUM only runs on the primary server, not on replica (standby) instances, in case streaming replication has been set up.
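As a rough sketch of how such long-running transactions might be spotted, the query below lists sessions by how long their current transaction has been open. The five-minute cutoff is an arbitrary assumption for illustration; pick whatever fits your workload.

```sql
-- List sessions whose current transaction has been open for a while.
-- The 5-minute cutoff is an illustrative assumption, not a recommendation.
SELECT pid,
       usename,
       state,                               -- 'idle in transaction' is a common culprit
       xact_start,
       now() - xact_start AS xact_duration,
       left(query, 60)    AS last_query     -- most recent statement, truncated
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```

Sessions at the top of the result have the oldest open transactions and are the first candidates to investigate.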
Related:
Long running function locking the database?
What are the consequences of not ending a database transaction?
What does backend_xmin and backend_xid represent in pg_stat_activity?
Do postgres autovacuum properties persist for DB replications?
Apart from old transactions, there are some other things that can hold the “xmin horizon” back:
stale replication slots (see pg_replication_slots)
abandoned prepared transactions (see pg_prepared_xacts)
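When the "oldest xmin" reported by VACUUM does not show up in pg_stat_activity, it can help to look at all three sources side by side. The following is a minimal sketch that combines the views named above into one result; the output column names (source, identifier, horizon_xid, xid_age) are made up for this example.

```sql
-- Collect every xmin holder from the three views into one list,
-- oldest first. Output column names are illustrative only.
SELECT 'session'              AS source,
       pid::text              AS identifier,
       backend_xmin           AS horizon_xid,
       age(backend_xmin)      AS xid_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
UNION ALL
SELECT 'prepared transaction',
       gid,
       transaction,
       age(transaction)
FROM pg_prepared_xacts
UNION ALL
SELECT 'replication slot',
       slot_name::text,
       xmin,
       age(xmin)
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY xid_age DESC;
```

If the first row's horizon_xid matches the "oldest xmin" from VACUUM (VERBOSE), that row is likely what is holding cleanup back.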
As per my understanding, I can see that a transaction is holding a snapshot by either of the columns backend_xid or backend_xmin not being NULL in pg_stat_activity.
I am currently investigating cases where backend_xid is not null for sessions from DBeaver, and I don't understand why the transaction requires a snapshot. This is of interest because long-running transactions that hold a snapshot can cause problems, for autovacuum for instance.
My question is: Can I (serverside) find the reason why a transaction is holding a snapshot? Is there a table where I can see why the transaction is holding a snapshot?
backend_xid is the transaction ID of the session and does not mean that the session has an active snapshot. The documentation says:
Top-level transaction identifier of this backend, if any.
backend_xmin is described as
The current backend's xmin horizon.
“xmin horizon” is PostgreSQL jargon and refers to the lowest transaction ID that was active when the snapshot was taken. It is an upper limit of what VACUUM is allowed to remove.
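To see the difference between the two columns in practice, a query along these lines may help. It is only a sketch; the comments reflect the general rule that a transaction ID is assigned lazily, typically when the transaction first modifies data.

```sql
-- Show both columns next to each other for sessions that have either one set.
SELECT pid,
       application_name,
       state,
       xact_start,
       backend_xid,                    -- set once the transaction has been assigned an XID
       backend_xmin,                   -- set while the session holds a snapshot
       age(backend_xmin) AS xmin_age   -- NULL when no snapshot is held
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL
   OR backend_xmin IS NOT NULL
ORDER BY xact_start;
```

A row with backend_xid set but backend_xmin NULL has a transaction ID without currently holding a snapshot, so by itself it does not show up as an xmin holder in this view.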
Platform Heroku
PG version 13
I have a very busy database and it is getting close to transaction ID wraparound.
At this point I really want to run the vacuum manually.
My question: if I manually vacuum individual tables, I can see that the txid resets to its minimum value for those tables, but the global txid does not change.
Is vacuuming the individual tables enough?
Do I still have to vacuum the whole database?
Yes, a manual VACUUM of individual tables will do the trick.
Look at the relfrozenxid and relminmxid columns in the pg_class entries for that database. Find the oldest ones. One or more of these should be equal to datfrozenxid and datminmxid in pg_database. If you VACUUM those tables, the values for the database should advance.
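A minimal sketch of that lookup follows. The LIMIT of 20 is an arbitrary choice, and the first query has to be run while connected to the database in question, since pg_class is per database.

```sql
-- Tables with the oldest relfrozenxid / relminmxid in the current database.
SELECT c.oid::regclass        AS table_name,
       age(c.relfrozenxid)    AS xid_age,
       mxid_age(c.relminmxid) AS multixact_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')   -- ordinary tables, TOAST tables, materialized views
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;

-- The per-database values that should advance after vacuuming those tables.
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```

The tables at the top of the first result are the ones whose VACUUM should move the database-level values in the second result.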
We have a Postgres database in Amazon RDS. Initially, we needed to load a large amount of data quickly, so autovacuum was turned off according to the best practice suggestion from Amazon. Recently I noticed some performance issues when running queries. Then I realized it has not been vacuumed for a long time. As it turns out, many tables have lots of dead tuples.
Surprisingly, even after I manually ran vacuum commands on some of the tables, it did not seem to remove these dead tuples at all. vacuum full takes too long to finish and usually ends up timing out after a whole night.
Why does vacuum command not work? What are my other options, restart the instance?
Use VACUUM (VERBOSE) to get detailed statistics of what it is doing and why.
There are three reasons why dead tuples cannot be removed:
There is a long running transaction that has not been closed. You can find the bad boys with
SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().
There are prepared transactions which have not been committed. You can find them with
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
Use COMMIT PREPARED or ROLLBACK PREPARED to close them.
There are replication slots which are not used. Find them with
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
Use pg_drop_replication_slot() to delete an unused replication slot.
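The three cleanup actions above could look roughly like this. The pid, gid, and slot name are placeholders, not values from this thread; substitute what the queries above return.

```sql
-- 1. Long-running session (12345 is a placeholder pid from pg_stat_activity).
SELECT pg_cancel_backend(12345);      -- cancels only the currently running query
SELECT pg_terminate_backend(12345);   -- ends the session and rolls back its open transaction

-- 2. Orphaned prepared transaction ('example_gid' is a placeholder gid).
ROLLBACK PREPARED 'example_gid';

-- 3. Unused replication slot ('example_slot' is a placeholder slot name).
SELECT pg_drop_replication_slot('example_slot');
```

For a session that is idle in a transaction there is no running query to cancel, so pg_terminate_backend() is usually the call that actually releases the snapshot.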
https://dba.stackexchange.com/a/77587/30035 explains why not all dead tuples are removed.
To keep vacuum full from timing out, set statement_timeout = 0.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL recommends disabling autovacuum for the duration of a database restore; otherwise they definitely recommend using it:
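A short sketch of that setting applied for a single session; example_table is a placeholder name. Keep in mind that VACUUM FULL holds an ACCESS EXCLUSIVE lock on the table while it rewrites it, and that any client-side or proxy timeout is separate from this server setting.

```sql
-- Disable the statement timeout for this session only.
SET statement_timeout = 0;

-- Rewrite the table; example_table is a placeholder.
VACUUM (FULL, VERBOSE) example_table;

-- Restore the session's default timeout afterwards.
RESET statement_timeout;
```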
Important
Not running autovacuum can result in an eventual required outage to
perform a much more intrusive vacuum operation.
Canceling all sessions and then vacuuming the table should help with the existing dead tuples (regarding your suggestion to restart the cluster). But what I suggest you do in the first place is switch autovacuum on. It is probably also better to control vacuum per table rather than for the whole cluster, using autovacuum_vacuum_threshold (via ALTER TABLE); reference here: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
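As a hedged illustration of the per-table approach, the statements below check the current setting, tune one table, and then watch the dead tuple counts. The table name and the numeric values are placeholders chosen for the example, not tuned recommendations. On RDS the instance-wide autovacuum setting is normally changed through the DB parameter group rather than from SQL.

```sql
-- Is autovacuum enabled for the instance?
SHOW autovacuum;

-- Per-table storage parameters; example_table and the values are placeholders.
ALTER TABLE example_table SET (
    autovacuum_enabled = true,
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.01
);

-- Check whether dead tuples are actually going down over time.
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```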