why do we use vaccum command in PostgreSQL exactly? - postgresql

VACUUM reclaims storage occupied by deleted tuples. In normal
PostgreSQL operation, tuples that are deleted or obsoleted by an update are
not physically removed from their table; they remain present until a
VACUUM is done. Therefore it's necessary to do VACUUM periodically,
especially on frequently-updated tables.
This is what I got from man page. I want to know some cases where it is really required.

If you DELETE rows or UPDATE them, then VACUUM is required to free the space for re-use. PostgreSQL usually does this automatically with autovacuum, so it is not common for you to need to manually run VACUUM.
You might manually run VACUUM after updating a large proportion of a table, especially if you're then going to do another big update.

Related

Is it possible to run VACUUM FULL for a short while and get some benefit?

Is it possible to run PostgreSQL 11's VACUUM FULL for a short while and then get some benefit? Or does cancelling it midway cause all of its progress to be lost?
I've read about pg_repack (https://aws.amazon.com/blogs/database/remove-bloat-from-amazon-aurora-and-rds-for-postgresql-with-pg_repack/) but the way it works (creating new tables, copying data, etc.) sounds risky to me. Is that my paranoia or is it safe to use on a production database?
Backstory: I am working with a very large production database on AWS Aurora PostgreSQL 11. Many of the tables had tens of millions of records but have been pruned down significantly. The problem is that the table sizes on disk (and in the snapshots) have not decreased because DELETE and VACUUM (without FULL) do not shrink the files. These tables are in the hundreds of gigabytes range and I'm afraid running VACUUM FULL will take forever.
No. VACUUM FULL writes a new physical file for the table. Stopping it before it finishes voids the work done so far.
The manual:
VACUUM FULL rewrites the entire contents of the table into a new
disk file with no extra space, allowing unused space to be returned to
the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.
This is the main reason why community tools like pg_repack or pg_squeeze were created, which are more flexible, less blocking, and often faster, too. (I don't think pg_squeeze is available for Aurora, yet).
pg_repack might be a bit of overkill. You can instead just delete tuples from the end of the table and reinsert them towards the front of the table (reusing space already marked as free by an earlier VACUUM), at which point another ordinary VACUUM can truncate away the free space at the end of the table.
with d as (delete from mytable where ctid>='(50000,1)' returning *)
insert into mytable select * from d;
You can use pg_freespacemap to figure out where would be a good place to start the ctid criterion at.
This might not behave well if you have triggers or FK constraints, and it might bloat indexes such they would need to be rebuilt (but they probably do anyway). It will also lock a large number rows at a time, for the duration it takes for the re-insert to run and commit.
Improvements made since v11 will make the ctid scan more efficient than it will be in v11.

When should one vacuum a database, and when analyze?

I just want to check that my understanding of these two things is correct. If it's relevant, I am using Postgres 9.4.
I believe that one should vacuum a database when looking to reclaim space from the filesystem, e.g. periodically after deleting tables or large numbers of rows.
I believe that one should analyse a database after creating new indexes, or (periodically) after adding or deleting large numbers of rows from a table, so that the query planner can make good calls.
Does that sound right?
vacuum analyze;
collects statistics and should be run as often as much data is dynamic (especially bulk inserts). It does not lock objects exclusive. It loads the system, but is worth of. It does not reduce the size of table, but marks scattered freed up place (Eg. deleted rows) for reuse.
vacuum full;
reorganises the table by creating a copy of it and switching to it. This vacuum requires additional space to run, but reclaims all not used space of the object. Therefore it requires exclusive lock on the object (other sessions shall wait it to complete). Should be run as often as data is changed (deletes, updates) and when you can afford others to wait.
Both are very important on dynamic database
Correct.
I would add that you can change the value of the default_statistics_target parameter (default to 100) in the postgresql.conf file to a higher number, after which, you should restart your server and run analyze to obtain more accurate statistics.

How to Remove Dead Row Version in Postgres 9.2

I ran a vaccum on the tables of my database and it appears it is not helping me. For example I have a huge table and when I run vacuum on it, it returns there are 87887889 dead row versions that can not be deleted
My question is how to get rid of these dead rows
You have two basic options if a routine vacuum is insufficient. Both require a full table lock.
VACUUM FULL. This requires no additional disk space but takes a while to complete.
CLUSTER. This rewrites a table in a physical order optimized for a given index. It requires additional space to do the rewrite, but is much faster.
In general I would recommend using CLUSTER during a maintenance window if disk space allows.

PostgreSQL: DELETE won't free memory

I use PostgreSQL on an embedded system with limited drive space. Now the DB-drive is full. When I delete data, it doesn't seem to free up any space. I tried to VACUUM FULL, but that requires space. So does deleting the last remaining index.
Any ideas on how to free up space without randomly deleting stuff? I can afford to lose some of the data from back when, but I can't seem to actually do it, since there isn't enough space to VACUUM FULL.
PostgreSQL uses MVCC model which means that deleted records mark their space as free (after the transaction which deleted them had been committed) but it is still reserved by the table.
Prior to PostgreSQL 9.0, VACUUM FULL used to move the data inside the table without need for additional space.
In PostgreSQL 9.0, behavior of VACUUM FULL had changed and now it requires additional space for the full copy of the table.
You may try to drop the indexes from the tables and vacuum them one by one, starting from the least one.
The easiest answer at this point would be to dump the database to a different drive/computer (for instance, using pg_dump, or pg_dumpall if you have more than one db, and keeping in mind things like Large Objects that need special backup/restore processes) then drop and recreate the database.
If there's a tiny bit of space left, you might try vacuum full smallesttable, which might be able to finish and free up some space to vacuum the next smallest table, and so on.
If you end up filling the drive completely, the database server will probably refuse to start and you won't be able to do either of those. In that case, you could move the entire data directory to another computer with the same CPU architecture and more disk space, then start postgresql there to perform the vacuum.
In certain situations VACUUM (not full) can reclaim some disk space. (I think it will return pages that are totally dead to the OS.) That might free up enough space to begin with the VACUUM FULL. But it's not a good idea to let one table grow to more than the amount of disk free space.

Free space after massive postgres delete

I have a 9 million row table. I figured out that a large amount of it (around 90%) can be freed up. What actions are needed after the cleanup? Vacuum, reindex etc.
If you want to free up space on the file system, either VACUUM FULL or CLUSTER can help you. You may also want to run ANALYZE after these, to make sure the planner has up-to-date statistics but this is not specifically required.
It is important to note using VACUUM FULL places an ACCESS EXCLUSIVE lock on your table(s) (blocking any operation, writes & reads), so you probably want to take your application offline for the duration.
In PostgreSQL 8.2 and earlier, VACUUM FULL is probably your best bet.
In PostgreSQL 8.3 and 8.4, the CLUSTER command was significantly improved, so VACUUM FULL is not recommended -- it's slow and it will bloat your indexes. `CLUSTER will re-create indexes from scratch and without the bloat. In my experience, it's usually much faster too. CLUSTER will also sort the whole physical table using an index, so you must pick an index. If you don't know which, the primary key will work fine.
In PostgreSQL 9.0, VACUUM FULL was changed to work like CLUSTER, so both are good.
It's hard to make predictions, but on a properly tuned server with commodity hardware, 9 million rows shouldn't take longer than 20 minutes.
See the documentation for CLUSTER.
PostgreSQL wiki about VACUUM FULL and recovering dead space
You definitely want to run a VACUUM, to free up that space for future inserts. If you want to actually reclaim that space on disk, making it available to the OS, you'll need to run VACUUM FULL. Keep in mind that VACUUM can run concurrently, but VACUUM FULL requires an exclusive lock on the table.
You will also want to REINDEX, since the indexes will remain bloated even after the VACUUM runs. If possible, a much faster way to do this is to drop the index and create it again from scratch.
You'll also want to ANALYZE, which you can just combine with the VACUUM.
See the documentation for more info.
Hi
Don't it be more optimal to create a temporary table with 10% of needed records. Then drop original table and rename temporary to original ...
I'm relatively new to the world of Postgres, but I understand VACUUM ANALYZE is recommended. I think there's also a sub-option which just frees up space. I found reindex useful as well when doing batch inserts or deletes. Yes I've been working with tables with a similar number of rows, and the speed increase is very noticeable (UBuntu, Core 2 Quad)