Our PostgreSQL database has a no-usage window from 2 am to 6 am.
One of the daily cron jobs already does a VACUUM FULL during this period. I am seeing no real performance hit with the ~200-odd users who use the web site, but the database load is what I would classify as 'light' at this time.
However, a data surge is forecast for the upcoming months due to some process changes in the org. My specific question:
Is there a performance gain to be expected if I dump the entire database to a text file (this already happens as part of the database backup), drop the database, recreate it and reload the dump? If the answer is 'yes', how significant is the gain?
Or will VACUUM FULL do the job, so that no action is needed?
VACUUM FULL will do it for you; there is no need to manually dump and reload the data.
https://www.postgresql.org/docs/current/static/sql-vacuum.html
VACUUM FULL
can reclaim more space, but takes much longer and exclusively locks the table. This method also requires extra disk space, since it writes a new copy of the table and doesn't release the old copy until the operation is complete.
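For example, the nightly cron job could be as simple as the following sketch (the VERBOSE and ANALYZE options are optional extras, not something the question requires):
-- rewrite every table in the current database and refresh planner statistics
VACUUM (FULL, VERBOSE, ANALYZE);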
Is it possible to run PostgreSQL 11's VACUUM FULL for a short while and then get some benefit? Or does cancelling it midway cause all of its progress to be lost?
I've read about pg_repack (https://aws.amazon.com/blogs/database/remove-bloat-from-amazon-aurora-and-rds-for-postgresql-with-pg_repack/) but the way it works (creating new tables, copying data, etc.) sounds risky to me. Is that my paranoia or is it safe to use on a production database?
Backstory: I am working with a very large production database on AWS Aurora PostgreSQL 11. Many of the tables had tens of millions of records but have been pruned down significantly. The problem is that the table sizes on disk (and in the snapshots) have not decreased because DELETE and VACUUM (without FULL) do not shrink the files. These tables are in the hundreds of gigabytes range and I'm afraid running VACUUM FULL will take forever.
No. VACUUM FULL writes a new physical file for the table. Stopping it before it finishes voids the work done so far.
The manual:
VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.
This is the main reason community tools like pg_repack or pg_squeeze were created: they are more flexible, less blocking, and often faster, too. (I don't think pg_squeeze is available for Aurora yet.)
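If you do go the pg_repack route, the broad strokes look like this (a sketch only; 'mytable' and 'mydb' are placeholder names, and the AWS post linked above covers the Aurora-specific details):
-- install the extension in the target database first
CREATE EXTENSION pg_repack;
-- the rewrite itself is driven by the pg_repack client from a shell, e.g.:
--   pg_repack --table mytable mydb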
pg_repack might be a bit of overkill. You can instead just delete tuples from the end of the table and reinsert them towards the front of the table (reusing space already marked as free by an earlier VACUUM), at which point another ordinary VACUUM can truncate away the free space at the end of the table.
-- move rows living in block 50000 and beyond into free space earlier in the table
with d as (delete from mytable where ctid >= '(50000,1)' returning *)
insert into mytable select * from d;
You can use the pg_freespacemap extension to figure out a good block number at which to start the ctid criterion.
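For example, a rough way to eyeball the tail of the table (a sketch; 'mytable' is a placeholder):
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
-- free space per block; large "avail" values over the last blocks suggest a reasonable ctid cutoff
SELECT blkno, avail FROM pg_freespace('mytable') ORDER BY blkno DESC LIMIT 20;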
This might not behave well if you have triggers or FK constraints, and it might bloat the indexes such that they would need to be rebuilt (but they probably need that anyway). It will also lock a large number of rows at a time, for the duration it takes the re-insert to run and commit.
Improvements made in releases after v11 make this kind of ctid scan more efficient than it is in v11.
I just want to check that my understanding of these two things is correct. If it's relevant, I am using Postgres 9.4.
I believe that one should vacuum a database when looking to reclaim space from the filesystem, e.g. periodically after deleting tables or large numbers of rows.
I believe that one should analyse a database after creating new indexes, or (periodically) after adding or deleting large numbers of rows from a table, so that the query planner can make good calls.
Does that sound right?
vacuum analyze;
collects statistics and should be run as often as the data changes (especially after bulk inserts). It does not take exclusive locks on objects. It puts some load on the system, but it is worth it. It does not reduce the size of the table, but it marks scattered freed-up space (e.g. from deleted rows) for reuse.
vacuum full;
reorganises the table by creating a copy of it and switching over to it. It requires additional disk space to run, but it reclaims all of the unused space in the object. Because of that, it takes an exclusive lock on the object (other sessions have to wait for it to complete). It should be run when data changes heavily (deletes, updates) and when you can afford to make others wait.
Both are very important on a dynamic database.
Correct.
I would add that you can change the value of the default_statistics_target parameter (default 100) in the postgresql.conf file to a higher number, after which you should reload the server configuration (a full restart is not needed) and run ANALYZE to obtain more accurate statistics.
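As a sketch (on 9.4 you can also use ALTER SYSTEM instead of editing postgresql.conf; 'mytable', 'mycol' and the value 500 are placeholders):
-- raise the global default and reload the configuration
ALTER SYSTEM SET default_statistics_target = 500;
SELECT pg_reload_conf();
-- or raise it for a single column only
ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 500;
ANALYZE mytable;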
I have a 9-million-row table. I figured out that a large amount of it (around 90%) can be freed up. What actions are needed after the cleanup? VACUUM, REINDEX, etc.?
If you want to free up space on the file system, either VACUUM FULL or CLUSTER can help you. You may also want to run ANALYZE after these, to make sure the planner has up-to-date statistics, but this is not strictly required.
It is important to note that using VACUUM FULL places an ACCESS EXCLUSIVE lock on your table(s) (blocking any operation, writes and reads alike), so you probably want to take your application offline for the duration.
In PostgreSQL 8.2 and earlier, VACUUM FULL is probably your best bet.
In PostgreSQL 8.3 and 8.4, the CLUSTER command was significantly improved, so VACUUM FULL is not recommended -- it's slow and it will bloat your indexes. CLUSTER re-creates the indexes from scratch, without the bloat. In my experience, it's usually much faster too. CLUSTER also sorts the whole physical table using an index, so you must pick one; if you don't know which, the primary key will work fine.
In PostgreSQL 9.0, VACUUM FULL was changed to work like CLUSTER, so both are good.
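For example (a sketch; 'mytable' and 'mytable_pkey' are placeholder names for your table and its primary key index):
-- either: rewrite the table in primary-key order, rebuilding its indexes as a side effect
CLUSTER mytable USING mytable_pkey;
-- or, on 9.0 and later: rewrite without re-sorting
-- VACUUM FULL mytable;
-- refresh planner statistics afterwards
ANALYZE mytable;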
It's hard to make predictions, but on a properly tuned server with commodity hardware, 9 million rows shouldn't take longer than 20 minutes.
See the documentation for CLUSTER.
PostgreSQL wiki about VACUUM FULL and recovering dead space
You definitely want to run a VACUUM, to free up that space for future inserts. If you want to actually reclaim that space on disk, making it available to the OS, you'll need to run VACUUM FULL. Keep in mind that VACUUM can run concurrently, but VACUUM FULL requires an exclusive lock on the table.
You will also want to REINDEX, since the indexes will remain bloated even after the VACUUM runs. If possible, a much faster way to do this is to drop the index and create it again from scratch.
You'll also want to ANALYZE, which you can just combine with the VACUUM.
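Put together, that could look like this (a sketch; 'mytable', 'mytable_someidx' and 'somecol' are placeholder names):
-- reclaim the space on disk and refresh statistics (takes an exclusive lock)
VACUUM FULL ANALYZE mytable;
-- rebuild the bloated indexes
REINDEX TABLE mytable;
-- or, often faster: drop and recreate a given index from scratch
-- DROP INDEX mytable_someidx;
-- CREATE INDEX mytable_someidx ON mytable (somecol);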
See the documentation for more info.
Hi
Wouldn't it be more optimal to create a temporary table with the 10% of records you need, then drop the original table and rename the temporary one to the original name ...
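Roughly like this (a sketch only; 'mytable' and the keep-condition are placeholders, and constraints, indexes, grants and anything else referencing the old table have to be handled separately):
BEGIN;
-- copy only the rows worth keeping into a new table with the same column definitions
CREATE TABLE mytable_new (LIKE mytable INCLUDING DEFAULTS);
INSERT INTO mytable_new SELECT * FROM mytable WHERE keep_this_row;
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;
COMMIT;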
I'm relatively new to the world of Postgres, but I understand VACUUM ANALYZE is recommended. I think there's also a sub-option which just frees up space. I found REINDEX useful as well when doing batch inserts or deletes. Yes, I've been working with tables with a similar number of rows, and the speed increase is very noticeable (Ubuntu, Core 2 Quad).
How can I compact a Firebird 2.1 database, like we do in MS Access (discarding erased data, rebuilding indexes, etc.)?
Is there a way to do it?
Thanks!
Usually there is no need to compact a Firebird database: see the Firebird release notes about garbage collection and the automatic (per-database configurable) operation named "sweep".
In a few words, Firebird reuses the space in pages when records are deleted or the oldest record versions are freed, asking for new chunks of disk space only when the free space becomes too small (i.e. falls under a defined percentage).
Sweep is performed by default after a predefined number of committed transactions, but it's an expensive task.
Backup and restore should be seen as a last resort to optimize and shrink the database, since it also rebuilds and optimizes the indexes, but that is usually not needed because there are commands and tools to rebuild indexes.
The only way to do it is to make a backup and a restore.
From the official FAQ:
Many users wonder why they don't get their disk space back when they delete a lot of records from a database.
The reason is that it is an expensive operation, it would require a lot of disk writes and memory - just like doing defragmentation of a hard disk partition. The parts of the database (pages) that were used by such data are marked as empty and Firebird will reuse them next time it needs to write new data.
If disk space is critical for you, you can get the space back by doing a backup and then a restore. Since you're doing the backup to restore right away, it's wise to use the "inhibit garbage collection" or "don't use garbage collection" switch (-G in gbak), which will make the backup go A LOT FASTER. Garbage collection is used to clean up your database, and as it is a maintenance task, it's often done together with backup (as the backup has to go through the entire database anyway). However, you're soon going to ditch that database file, and there's no need to clean it up.
In PostgreSQL it is necessary to vacuum periodically to prevent data loss of very old data due to transaction ID wraparound. I am concerned that data loss might be an issue with SQLite3 databases as well if they are not vacuumed routinely.
Additionally, does the workload that the SQLite3 database experiences matter? I am currently thinking of using SQLite3 in a few scenarios including:
as a file format for a program where people might share files and use them across different machines
to store application settings
to store logs for an application which might log multiple times per second (queries on recent data might be performed every hour)
Also, would the frequency of updates and deletes matter?
VACUUM
removes fragmentation, so it helps when you have both lots of deletions and inserts, and many read-only queries that scan entire tables, and
frees unused pages, so it helps when you have deleted lots of data and have very few insertions afterwards.
But these are merely optimizations.
Fragmentation typically matters only on rotating disks, and freeing space is not necessary unless you're running out of space.
SQLite uses a different transaction locking mechanism (which is much simpler and faster, but not scalable) and does not require maintenance.
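If you do decide to reclaim space, a minimal sketch in SQLite's own SQL (the page count of 100 is an arbitrary example):
-- rebuild the database file, removing fragmentation and free pages
VACUUM;
-- alternatively, enable incremental vacuuming so free pages can be released gradually
PRAGMA auto_vacuum = INCREMENTAL;  -- applies to a new database, or after a VACUUM on an existing one
PRAGMA incremental_vacuum(100);    -- release up to 100 free pages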