Am newbie in PostgreSQL(Version 9.2) Database development. While looking one of my table a saw an option called autovaccum.
many of my table contains 20000+ rows.For testing purpose I've altered one of that table like below
ALTER TABLE theTable SET (
autovacuum_enabled = true
);
So,I wish to know the benefits/advantages/disadvantages(if any) autovacuuming a table ?
Autovacuum is enabled by default in current versions of Postgres (and has been for a while). It's generally a good thing to have enabled for performance and other reasons.
Prior to autovacuuming, you would need to explicitly vacuum tables yourself (via cronjobs which executed psql commands to vacuum them, or similar) in order to get rid of dead tuples, etc. Postgres has for a while now managed this for you via autovacuum.
I have in some cases, with tables that have immense churn (i.e. very high rates of insertions and deletions) found it necessary to still explicitly vacuum via a cron in order to keep the dead tuple count low and performance high, because the autovacuum doesn't kick in fast enough, but this is something of a niche case.
More info: http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
Related
We have a postgres database in Amazon RDS. Initially, we needed to load large amount of data quickly, so autovacuum was turned off according to the best practice suggestion from Amazon. Recently I noticed some performance issue when running queries. Then I realized it has not been vacuumed for a long time. As it turns out many tables have lots of dead tuples.
Surprisingly, even after I manually ran vacuum commands on some of the tables, it did not seem to remove these dead tuples at all. vacuum full takes too long to finish which usually ends up timed out after a whole night.
Why does vacuum command not work? What are my other options, restart the instance?
Use VACUUM (VERBOSE) to get detailed statistics of what it is doing and why.
There are three reasons why dead tuples cannot be removed:
There is a long running transaction that has not been closed. You can find the bad boys with
SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().
There are prepared transactions which have not been commited. You can find them with
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
User COMMIT PREPARED or ROLLBACK PREPARED to close them.
There are replication slots which are not used. Find them with
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
Use pg_drop_replication_slot() to delete an unused replication slot.
https://dba.stackexchange.com/a/77587/30035 explains why not all dead tuples are removed.
for vacuum full not to time out, set statement_timeout = 0
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL recommends disabling autovacuum for the time of database restore, further they definetely recommend using it:
Important
Not running autovacuum can result in an eventual required outage to
perform a much more intrusive vacuum operation.
Canceling all sessions and vacuuming table should help with previous dead tuples (regarding your suggestion to restart cluster). But what I suggest you to do in first place - switch autovacuum on. And better probably control vacuum on table, not on the whole cluster with autovacuum_vacuum_threshold, (ALTER TABLE) reference here: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
We have a postgres database in Amazon RDS. Initially, we needed to load large amount of data quickly, so autovacuum was turned off according to the best practice suggestion from Amazon. Recently I noticed some performance issue when running queries. Then I realized it has not been vacuumed for a long time. As it turns out many tables have lots of dead tuples.
Surprisingly, even after I manually ran vacuum commands on some of the tables, it did not seem to remove these dead tuples at all. vacuum full takes too long to finish which usually ends up timed out after a whole night.
Why does vacuum command not work? What are my other options, restart the instance?
Use VACUUM (VERBOSE) to get detailed statistics of what it is doing and why.
There are three reasons why dead tuples cannot be removed:
There is a long running transaction that has not been closed. You can find the bad boys with
SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
You can get rid of a transaction with pg_cancel_backend() or pg_terminate_backend().
There are prepared transactions which have not been commited. You can find them with
SELECT gid, prepared, owner, database, transaction
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
User COMMIT PREPARED or ROLLBACK PREPARED to close them.
There are replication slots which are not used. Find them with
SELECT slot_name, slot_type, database, xmin
FROM pg_replication_slots
ORDER BY age(xmin) DESC;
Use pg_drop_replication_slot() to delete an unused replication slot.
https://dba.stackexchange.com/a/77587/30035 explains why not all dead tuples are removed.
for vacuum full not to time out, set statement_timeout = 0
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL recommends disabling autovacuum for the time of database restore, further they definetely recommend using it:
Important
Not running autovacuum can result in an eventual required outage to
perform a much more intrusive vacuum operation.
Canceling all sessions and vacuuming table should help with previous dead tuples (regarding your suggestion to restart cluster). But what I suggest you to do in first place - switch autovacuum on. And better probably control vacuum on table, not on the whole cluster with autovacuum_vacuum_threshold, (ALTER TABLE) reference here: https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
I have given a postgres 9.2 DB around 20GB of size.
I looked through the database and saw that it has been never run vacuum and/or analyze on any tables.
Autovacuum is on and the transaction wraparound limit is very far (only 1% of it).
I know nothing about the data activity (number of deletes,inserts, updates), but I see, it uses a lot of index and sequence.
My question is:
does the lack of vacuum and/or analyze affect data integrity (for example a select doesn't show all the rows matches the select from a table or from an index)? The speed of querys and writes doesn't matter.
is it possible that after the vacuum and/or analyze the same query gives a different answer than it would executed before the vacuum/analyze command?
I'm fairly new to PG, thank you for your help!!
Regards,
Figaro88
Running vacuum and/or analyze will not change the result set produced by any select operation (unless there was a bug in PostgreSQL). They may effect the order of results if you do not supply an ORDER BY clause.
I have seen this answer, How to apply PostgreSQL UNLOGGED feature to an existing table?, which basically suggests that the way to convert a table to unlogged is to run:
CREATE UNLOGGED TABLE your_table_unlogged AS SELECT * FROM your_table;
Is this still the case, because while this is an of obvious working solution, for a large table there are potential time and disk space factors that could come into play. And, if yes, could someone please explain briefly how the architecture of Postgres means you need to rewrite an entire table in order to make it unlogged?
Update: In PostgreSQL 9.5+ there is ALTER TABLE ... SET LOGGED and ... SET UNLOGGED
Converting from UNLOGGED to LOGGED requires that the whole table's data be written to xlogs if wal_level is > minimal so replicas get a copy. So it's not free, but it can still be worth creating a table unlogged, populating it then setting it logged if you have a bunch of cleanup and deletion and merging work to do on the table after initial load.
Yes, that's still the case in 9.4.
Converting from logged to UNLOGGED isn't theoretically hard AFAIK, but nobody's done the work to do it. The main thing is that all constraints and types etc referring to it must be re-checked to make sure there's no reference from another logged table to this table. Most attention has been paid to the other case, so if this feature is important to you, consider funding its development or getting involved in development yourself.
Converting UNLOGGED to logged may become possible for nodes that aren't involved in streaming replication or using an archive_command. It's not simple otherwise because of the need to cope with the fact that the data for the table wasn't sent, but suddenly changes to it are - the replication protocol would need further enhancement to allow the table to be base-copied before continuing.
Apparently this (alter table ... set logged | unlogged) has been implemented in (the upcoming) postgresql 9.5.
I've got PostgreSQL 9.2 and a tiny database with just a bit of seed data for a website that I'm working on.
The following query seems to run forever:
ALTER TABLE diagnose_bodypart ADD description text NOT NULL;
diagnose_bodypart is a table with less than 10 rows. I've let the query run for over a minute with no results. What could be the problem? Any recommendations for debugging this?
Adding a column does not require rewriting a table (unless you specify a DEFAULT). It is a quick operation absent any locks. pg_locks is the place to check, as Craig pointed out.
In general the most likely cause are long-running transactions. I would be looking at what work-flows are hitting these tables and how long the transactions are staying open for. Locks of this sort are typically transactional and so committing transactions will usually fix the problem.