I have been given a Postgres 9.2 database around 20 GB in size.
I looked through the database and saw that VACUUM and/or ANALYZE have never been run on any of its tables.
Autovacuum is on and the transaction ID wraparound limit is still far away (only about 1% of it has been used).
I know nothing about the data activity (number of deletes, inserts, updates), but I can see that it uses a lot of indexes and sequences.
My questions are:
Does the lack of VACUUM and/or ANALYZE affect data integrity (for example, could a SELECT fail to return all the rows that match it, whether it reads the table directly or goes through an index)? The speed of queries and writes does not matter.
Is it possible that, after VACUUM and/or ANALYZE, the same query gives a different answer than it would have given before the VACUUM/ANALYZE command?
I'm fairly new to PG; thank you for your help!
Regards,
Figaro88
Running VACUUM and/or ANALYZE will not change the result set produced by any SELECT (unless there is a bug in PostgreSQL). They may affect the order of the results if you do not supply an ORDER BY clause.
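If the order matters, make it explicit with ORDER BY; a minimal illustration with a hypothetical table t:

-- Without ORDER BY the row order is an implementation detail and may change
-- after a VACUUM; with ORDER BY it is deterministic.
SELECT id, name FROM t ORDER BY id;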
Related
We have a Postgres database in Amazon RDS. Initially we needed to load a large amount of data quickly, so autovacuum was turned off, following the best-practice suggestion from Amazon. Recently I noticed some performance issues when running queries, and then realized the database had not been vacuumed for a long time. As it turns out, many tables have lots of dead tuples.
Surprisingly, even after I manually ran VACUUM on some of the tables, it did not seem to remove these dead tuples at all. VACUUM FULL takes too long to finish and usually ends up timing out after a whole night.
Why does the VACUUM command not work? What are my other options, restarting the instance?
Use VACUUM (VERBOSE) to get detailed statistics of what it is doing and why.
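For example (the table name is just a placeholder):

-- The DETAIL lines in the output report, among other things, how many dead
-- row versions could not yet be removed.
VACUUM (VERBOSE) my_table;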
There are three reasons why dead tuples cannot be removed:
There is a long-running transaction that has not been closed. You can find the bad boys with
SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().
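Both functions take the pid from the query above; for example (the pid is hypothetical):

-- Terminates the whole session holding the old snapshot
SELECT pg_terminate_backend(12345);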
There are prepared transactions which have not been committed. You can find them with
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
Use COMMIT PREPARED or ROLLBACK PREPARED to close them.
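For example, to roll one back (the gid is a placeholder taken from the query above):

ROLLBACK PREPARED 'some_gid';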
There are replication slots which are not used. Find them with
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
Use pg_drop_replication_slot() to delete an unused replication slot.
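For example (the slot name is a placeholder taken from the query above):

SELECT pg_drop_replication_slot('unused_slot');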
https://dba.stackexchange.com/a/77587/30035 explains why not all dead tuples are removed.
To keep VACUUM (FULL) from timing out, set statement_timeout = 0.
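For example, in the session that will run the vacuum (the table name is just a placeholder):

SET statement_timeout = 0;
VACUUM (FULL, VERBOSE) my_table;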
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL recommends disabling autovacuum only for the duration of a database restore; beyond that, they definitely recommend using it:
Important
Not running autovacuum can result in an eventual required outage to perform a much more intrusive vacuum operation.
Canceling all sessions and vacuuming the tables should take care of the existing dead tuples (regarding your suggestion to restart the instance). But the first thing I suggest is to switch autovacuum back on. It is probably better to control vacuuming per table rather than for the whole cluster, e.g. via the autovacuum_vacuum_threshold storage parameter set with ALTER TABLE; reference: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
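A sketch of such a per-table setting (the table name and values are only examples to adapt):

ALTER TABLE my_busy_table SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.01
);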
How can I find out, in PostgreSQL 9.5, what is causing a deadlock error/failure when doing a full vacuumdb over a database with the --jobs option, i.e. running the full vacuum in parallel?
I just get some process numbers and table names... How can I prevent this so that I can successfully do a full vacuum over the database in parallel?
Completing a VACUUM FULL under load is a pretty hard task. The problem is that Postgres is compacting the space taken by the table, so any concurrent data manipulation interferes with that.
To achieve a full vacuum you have these options:
Lock access to the vacuumed table. Not sure if acquiring some exclusive lock will help, though. You may need to prevent access to the table at the application level.
Use a create-new-table / swap (rename tables) / move-data / drop-original technique (a rough sketch follows this list). This way you do not compact the space under the original table; you free it by simply dropping the old table. Of course you will be rebuilding all indexes, redirecting FKs, etc.
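A rough sketch of that approach with placeholder names (this variant copies the data before swapping the names; block writes to the table first, otherwise rows written during the copy are lost):

BEGIN;
-- INCLUDING ALL copies column definitions, defaults, constraints and indexes,
-- but foreign keys pointing at the table, views and grants must be redone.
CREATE TABLE my_table_new (LIKE my_table INCLUDING ALL);
INSERT INTO my_table_new SELECT * FROM my_table;
ALTER TABLE my_table RENAME TO my_table_old;
ALTER TABLE my_table_new RENAME TO my_table;
COMMIT;
-- Dropping the old table is what actually returns the space to the OS
DROP TABLE my_table_old;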
Another question is: do you need VACUUM FULL at all? The only thing it does that VACUUM ANALYZE does not is shrinking the table on the file system and giving the space back to the operating system. If you are not severely limited by disk space, you do not need to do a full vacuum that often.
Hope that helps.
I am a newbie in PostgreSQL (version 9.2) database development. While looking at one of my tables I saw an option called autovacuum.
Many of my tables contain 20,000+ rows. For testing purposes I altered one of those tables like below:
ALTER TABLE theTable SET (
autovacuum_enabled = true
);
So I would like to know the benefits/advantages/disadvantages (if any) of autovacuuming a table.
Autovacuum is enabled by default in current versions of Postgres (and has been for a while). It's generally a good thing to have enabled for performance and other reasons.
Prior to autovacuuming, you would need to explicitly vacuum tables yourself (via cronjobs which executed psql commands to vacuum them, or similar) in order to get rid of dead tuples, etc. Postgres has for a while now managed this for you via autovacuum.
In some cases, with tables that have immense churn (i.e. very high rates of insertions and deletions), I have found it necessary to still vacuum explicitly via cron in order to keep the dead-tuple count low and performance high, because autovacuum doesn't kick in fast enough; but this is something of a niche case.
More info: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
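To see whether autovacuum is keeping up on your busiest tables, the statistics views are a quick check; a small sketch:

-- Tables with the most dead tuples, and when autovacuum last processed them
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;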
I've got PostgreSQL 9.2 and a tiny database with just a bit of seed data for a website that I'm working on.
The following query seems to run forever:
ALTER TABLE diagnose_bodypart ADD description text NOT NULL;
diagnose_bodypart is a table with less than 10 rows. I've let the query run for over a minute with no results. What could be the problem? Any recommendations for debugging this?
Adding a column does not require rewriting a table (unless you specify a DEFAULT). It is a quick operation absent any locks. pg_locks is the place to check, as Craig pointed out.
In general the most likely cause is a long-running transaction. I would look at which workflows are hitting these tables and how long their transactions stay open. Locks of this sort are typically transactional, so committing the transactions will usually fix the problem.
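To see who is blocking the ALTER TABLE, something like the following sketch works on 9.2 (which does not yet have pg_blocking_pids()); it pairs each waiting lock request with a granted lock on the same relation:

SELECT waiting.pid       AS blocked_pid,
       waiting_act.query AS blocked_query,
       holder.pid        AS blocking_pid,
       holder_act.query  AS blocking_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.locktype = waiting.locktype
 AND holder.relation = waiting.relation
 AND holder.granted
 AND NOT waiting.granted
 AND holder.pid <> waiting.pid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_stat_activity holder_act  ON holder_act.pid  = holder.pid;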