Executing VACUUM FULL on a PostgreSQL DB which is stopped/down

I am attempting to restart a PostgreSQL database which has stopped/is down and requires a VACUUM.
http://suwala.eu/blog/2010/10/09/how-to-vacuum-postgresql/
Following the above sequence of commands, I can't seem to get the last line to execute right.
$ postgres -D /var/lib/pgsql/data YOUR_DATABASE_NAME < /tmp/fix.sql
This gives me an error that says
postgres: invalid argument: "YOUR_DATABASE_NAME"
Try "postgres --help" for more information.
Any idea why?
CLARIFICATION
The 'YOUR_DATABASE_NAME' and the data directory I used on my server are the correct ones.

The referenced "how-to-vacuum-postgresql" page referenced in the question gives some very bad advice when it recommends VACUUM FULL. All that is needed is a full-database vacuum, which is simply a VACUUM run as the database superuser against the entire database (i.e., you don't specify any table name).
A VACUUM FULL works differently depending on the version, but it eliminates all of the free space inside the heap files that the database would otherwise hold for quick re-use, and releases it to the OS. This can be much slower than the minimum needed to get back to a usable database, by orders of magnitude. And since any inserts or updates after the VACUUM FULL require OS calls to re-allocate space to the database, it can cause slower execution afterward, unless your database had a lot of bloat. (Although, if you turned off autovacuum, it might be in horrible shape, but you probably want to get back on your feet first, and sort that out later.)
Another issue with VACUUM FULL before version 9.0 is that while it eliminates bloat in a table's heap files, it tends to increase bloat in its index files, sometimes dramatically. If you issue a VACUUM FULL, you should normally follow it with a REINDEX to get the indexes back into good shape.
The page referenced in the question also fails to heed the advice given in the PostgreSQL docs at http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND to use single-user mode:
since the system will not execute commands once it has gone into the
safety shutdown mode, the only way to do this is to stop the server
and use a single-user backend to execute VACUUM. The shutdown mode is
not enforced by a single-user backend. See the postgres reference page
for details about using a single-user backend.
As others have mentioned -- there is almost no use case where turning off autovacuum is beneficial. It may be useful to supplement the autovacuum activity with explicit vacuums on large tables, or you may want to adjust autovacuum configuration, but really -- don't turn it off or you will see bloat which saps performance and you'll run into transaction ID wraparound problems periodically. People who notice a performance hit when autovacuum is performing maintenance sometimes have an instinct to make it less aggressive in triggering, but that is usually counter-productive. It is generally better to adjust the autovacuum cost limitation parameters to pace the work, rather than have it neglect tables which need maintenance.
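To make that concrete, here is a minimal sketch of the kind of postgresql.conf adjustments meant here; the values are purely illustrative, not recommendations for any particular workload:
# postgresql.conf -- pace autovacuum instead of turning it off
autovacuum = on                         # never disable this
autovacuum_vacuum_cost_limit = 1000     # allow more work per cost cycle (default -1, i.e. fall back to vacuum_cost_limit)
autovacuum_vacuum_cost_delay = 10ms     # sleep less between chunks of work (default 20ms in these versions)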

This appears to be an issue in PostgreSQL itself: according to the documentation for 9.0 and 8.3, the command should work as written for those versions, but it doesn't.
However, using the --single switch makes it work:
postgres --single -D [path-to-data-dir] [db-name] < /tmp/fix.sql

Related

When should one vacuum a database, and when analyze?

I just want to check that my understanding of these two things is correct. If it's relevant, I am using Postgres 9.4.
I believe that one should vacuum a database when looking to reclaim space from the filesystem, e.g. periodically after deleting tables or large numbers of rows.
I believe that one should analyse a database after creating new indexes, or (periodically) after adding or deleting large numbers of rows from a table, so that the query planner can make good calls.
Does that sound right?
vacuum analyze;
collects statistics and should be run about as often as your data changes (especially after bulk inserts). It does not take an exclusive lock on objects. It puts some load on the system, but it is worth it. It does not reduce the size of the table, but it marks scattered freed-up space (e.g. from deleted rows) for reuse.
vacuum full;
reorganises the table by creating a copy of it and switching over to it. This vacuum requires additional space to run, but it reclaims all unused space in the object. It therefore takes an exclusive lock on the object (other sessions must wait for it to complete). It should be run when the data has changed heavily (deletes, updates) and when you can afford to make others wait.
Both are very important on a dynamic database.
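For illustration, a minimal sketch, assuming a hypothetical table named orders; pg_stat_user_tables shows how much reclaimable ("dead") space has built up, and the two commands behave as described above:
-- how many dead row versions a plain VACUUM could mark for reuse
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- routine maintenance: no exclusive lock, keeps planner statistics fresh
VACUUM ANALYZE orders;
-- heavyweight rewrite: exclusive lock, shrinks the table on disk
VACUUM FULL orders;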
Correct.
I would add that you can change the value of the default_statistics_target parameter (default 100) in the postgresql.conf file to a higher number, after which you should reload the server configuration and run ANALYZE to obtain more accurate statistics.
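As a sketch (the table and column names are made up for illustration), the target can also be raised for a single column that needs finer-grained statistics, followed by a fresh ANALYZE:
-- check the current global value
SHOW default_statistics_target;
-- per-column override instead of (or in addition to) the global setting
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;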

How to Remove Dead Row Version in Postgres 9.2

I ran a vacuum on the tables of my database and it appears it is not helping me. For example, I have a huge table, and when I run VACUUM on it, it reports that there are 87887889 dead row versions that cannot yet be removed.
My question is: how do I get rid of these dead rows?
You have two basic options if a routine vacuum is insufficient. Both require a full table lock.
VACUUM FULL. On 9.0 and later (including 9.2) this rewrites the table into new files, so it needs temporary disk space for the copy and takes a while to complete.
CLUSTER. This also rewrites the table, in a physical order optimized for a given index, and likewise requires additional space for the rewrite; it has the side benefit of leaving the table ordered by that index.
In general I would recommend using CLUSTER during a maintenance window if disk space allows.
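A minimal sketch of that maintenance-window run, assuming a hypothetical table big_table with a primary-key index big_table_pkey (CLUSTER holds an ACCESS EXCLUSIVE lock for its duration):
-- rewrite the table in primary-key order, discarding the dead row versions
CLUSTER big_table USING big_table_pkey;
-- refresh planner statistics afterwards
ANALYZE big_table;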

Postgresql Truncation speed

We're using PostgreSQL 9.1.4 as our db server. I've been trying to speed up my test suite so I've started profiling the db a bit to see exactly what's going on. We are using database_cleaner to truncate tables at the end of tests. YES, I know transactions are faster; I can't use them in certain circumstances so I'm not concerned with that.
What I AM concerned with, is why TRUNCATION takes so long (longer than using DELETE) and why it takes EVEN LONGER on my CI server.
Right now, locally (on a Macbook Air) a full test suite takes 28 minutes. Tailing the logs, each time we truncate tables... ie:
TRUNCATE TABLE table1, table2 -- ... etc
it takes over 1 second to perform the truncation. Tailing the logs on our CI server (Ubuntu 10.04 LTS), it takes a full 8 seconds to truncate the tables and a build takes 84 minutes.
When I switched over to the :deletion strategy, my local build took 20 minutes and the CI server went down to 44 minutes. This is a significant difference and I'm really blown away as to why this might be. I've tuned the DB on the CI server, it has 16gb system ram, 4gb shared_buffers... and an SSD. All the good stuff. How is it possible:
a. that it's SO much slower than my MacBook Air, which has only 2GB of RAM
b. that TRUNCATION is so much slower than DELETE when the PostgreSQL docs state explicitly that it should be much faster.
Any thoughts?
This has come up a few times recently, both on SO and on the PostgreSQL mailing lists.
The TL;DR for your last two points:
(a) The bigger shared_buffers may be why TRUNCATE is slower on the CI server. Different fsync configuration or the use of rotational media instead of SSDs could also be at fault.
(b) TRUNCATE has a fixed cost per table, which is not necessarily lower than DELETE's, plus it does more work. See the detailed explanation that follows.
UPDATE: A significant discussion on pgsql-performance arose from this post. See this thread.
UPDATE 2: Improvements have been added to 9.2beta3 that should help with this, see this post.
Detailed explanation of TRUNCATE vs DELETE FROM:
While not an expert on the topic, my understanding is that TRUNCATE has a nearly fixed cost per table, while DELETE is at least O(n) for n rows; worse if there are any foreign keys referencing the table being deleted.
I always assumed that the fixed cost of a TRUNCATE was lower than the cost of a DELETE on a near-empty table, but this isn't true at all.
TRUNCATE table; does more than DELETE FROM table;
The state of the database after a TRUNCATE table is much the same as if you'd instead run:
DELETE FROM table;
VACUUM (FULL, ANALYZE) table; (9.0+ only, see footnote)
... though of course TRUNCATE doesn't actually achieve its effects with a DELETE and a VACUUM.
The point is that DELETE and TRUNCATE do different things, so you're not just comparing two commands with identical outcomes.
A DELETE FROM table; allows dead rows and bloat to remain, allows the indexes to carry dead entries, doesn't update the table statistics used by the query planner, etc.
A TRUNCATE gives you a completely new table and indexes as if they had just been CREATEd. It's like you deleted all the records, reindexed the table and did a VACUUM FULL.
If you don't care if there's crud left in the table because you're about to go and fill it up again, you may be better off using DELETE FROM table;.
Because you aren't running VACUUM you will find that dead rows and index entries accumulate as bloat that must be scanned then ignored; this slows all your queries down. If your tests don't actually create and delete all that much data you may not notice or care, and you can always do a VACUUM or two part-way through your test run if you do. Better, let aggressive autovacuum settings ensure that autovacuum does it for you in the background.
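If you go the autovacuum route, per-table storage parameters are one way to make it more aggressive for the tables your tests churn. A sketch only; the table name and thresholds are illustrative, not tuned values:
-- consider this (hypothetical) table for vacuum/analyze after far fewer row changes
ALTER TABLE test_data SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- default 0.2
    autovacuum_analyze_scale_factor = 0.01,  -- default 0.1
    autovacuum_vacuum_threshold = 50
);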
You can still TRUNCATE all your tables after the whole test suite runs to make sure no effects build up across many runs. On 9.0 and newer, a database-wide VACUUM (FULL, ANALYZE); is at least as good if not better, and it's a whole lot easier.
IIRC Pg has a few optimisations that mean it might notice when your transaction is the only one that can see the table and immediately mark the blocks as free anyway. In testing, when I've wanted to create bloat I've had to have more than one concurrent connection to do it. I wouldn't rely on this, though.
DELETE FROM table; is very cheap for small tables with no f/k refs
To DELETE all records from a table with no foreign key references to it, all Pg has to do is a sequential table scan that sets the xmax of the tuples encountered. This is a very cheap operation - basically a linear read and a semi-linear write. AFAIK it doesn't have to touch the indexes; they continue to point to the dead tuples until they're cleaned up by a later VACUUM that also marks blocks in the table containing only dead tuples as free.
DELETE only gets expensive if there are lots of records, if there are lots of foreign key references that must be checked, or if you count the subsequent VACUUM (FULL, ANALYZE) table; needed to match TRUNCATE's effects as part of the cost of your DELETE.
In my tests here, a DELETE FROM table; was typically 4x faster than TRUNCATE at 0.5ms vs 2ms. That's a test DB on an SSD, running with fsync=off because I don't care if I lose all this data. Of course, DELETE FROM table; isn't doing all the same work, and if I follow up with a VACUUM (FULL, ANALYZE) table; it's a much more expensive 21ms, so the DELETE is only a win if I don't actually need the table pristine.
TRUNCATE table; does a lot more fixed-cost work and housekeeping than DELETE
By contrast, a TRUNCATE has to do a lot of work. It must allocate new files for the table, its TOAST table if any, and every index the table has. Headers must be written into those files and the system catalogs may need updating too (not sure on that point, haven't checked). It then has to replace the old files with the new ones or remove the old ones, and has to ensure the file system has caught up with the changes with a synchronization operation - fsync() or similar - that usually flushes all buffers to the disk. I'm not sure whether the sync is skipped if you're running with the (data-eating) option fsync=off.
I learned recently that TRUNCATE must also flush all PostgreSQL's buffers related to the old table. This can take a non-trivial amount of time with huge shared_buffers. I suspect this is why it's slower on your CI server.
The balance
Anyway, you can see that a TRUNCATE of a table that has an associated TOAST table (most do) and several indexes could take a few moments. Not long, but longer than a DELETE from a near-empty table.
Consequently, you might be better off doing a DELETE FROM table;.
--
Note: on DBs before 9.0, CLUSTER table_id_seq ON table; ANALYZE table; or VACUUM FULL ANALYZE table; REINDEX table; would be a closer equivalent to TRUNCATE. The VACUUM FULL implementation changed to a much better one in 9.0.
Brad, just to let you know. I've looked fairly deeply into a very similar question.
Related question: 30 tables with few rows - TRUNCATE the fastest way to empty them and reset attached sequences?
Please also look at this issue and this pull request:
https://github.com/bmabey/database_cleaner/issues/126
https://github.com/bmabey/database_cleaner/pull/127
Also this thread: http://archives.postgresql.org/pgsql-performance/2012-07/msg00047.php
I am sorry for writing this as an answer, but I didn't find any comment links, maybe because there are already too many comments there.
I've encountered similar issue lately, i.e.:
The time to run a test suite which used DatabaseCleaner varied widely between systems with comparable hardware,
Changing DatabaseCleaner strategy to :deletion provided ~10x improvement.
The root cause of the slowness was a filesystem with journaling (ext4) used for database storage. During TRUNCATE operation the journaling daemon (jbd2) was using ~90% of disk IO capacity. I am not sure if this is a bug, an edge case or actually normal behaviour in these circumstances. This explains however why TRUNCATE was a lot slower than DELETE - it generated a lot more disk writes. As I did not want to actually use DELETE I resorted to setting fsync=off and it was enough to mitigate this issue (data safety was not important in this case).
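For completeness, the mitigation itself is just a configuration change; a sketch of the relevant postgresql.conf lines, suitable only for throwaway test databases because fsync=off can corrupt data after a crash:
# postgresql.conf on the CI/test server only -- never on anything holding real data
fsync = off
# a gentler alternative: cannot corrupt data, but may lose the latest commits on a crash
# synchronous_commit = off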
A couple of alternate approaches to consider:
Create an empty database with static "fixture" data in it, and run the tests in that. When you are done, just drop the database, which should be fast.
Create a new table called "test_ids_to_delete" that contains columns for table names and primary key ids (see the sketch below). Update your deletion logic to insert the ids/table names in this table instead, which will be much faster than running deletes. Then, write a script to run "offline" to actually delete the data, either after an entire test run has finished, or overnight.
The former is a "clean room" approach, while latter means there will be some test data will persist in database for longer. The "dirty" approach with offline deletes is what I'm using for a test suite with about 20,000 tests. Yes, there are sometimes problems due to having "extra" test data in the dev database but at times. But sometimes this "dirtiness" has helped us find and fixed bug because the "messiness" better simulated a real-world situation, in a way that clean-room approach never will.

Free space after massive postgres delete

I have a 9 million row table. I figured out that a large amount of it (around 90%) can be freed up. What actions are needed after the cleanup? Vacuum, reindex etc.
If you want to free up space on the file system, either VACUUM FULL or CLUSTER can help you. You may also want to run ANALYZE after these, to make sure the planner has up-to-date statistics but this is not specifically required.
It is important to note using VACUUM FULL places an ACCESS EXCLUSIVE lock on your table(s) (blocking any operation, writes & reads), so you probably want to take your application offline for the duration.
In PostgreSQL 8.2 and earlier, VACUUM FULL is probably your best bet.
In PostgreSQL 8.3 and 8.4, the CLUSTER command was significantly improved, so VACUUM FULL is not recommended -- it's slow and it will bloat your indexes. CLUSTER will re-create indexes from scratch and without the bloat. In my experience, it's usually much faster too. CLUSTER will also sort the whole physical table using an index, so you must pick an index. If you don't know which, the primary key will work fine.
In PostgreSQL 9.0, VACUUM FULL was changed to work like CLUSTER, so both are good.
It's hard to make predictions, but on a properly tuned server with commodity hardware, 9 million rows shouldn't take longer than 20 minutes.
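As a sketch of what such a maintenance run might look like (the table name mytable and its primary-key index are placeholders), with a size check before and after:
-- current on-disk size including indexes and TOAST
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
-- 8.3/8.4: rewrite via CLUSTER (needs an index and extra disk space for the copy)
CLUSTER mytable USING mytable_pkey;
-- 9.0+: VACUUM FULL now performs an equivalent rewrite
-- VACUUM FULL mytable;
ANALYZE mytable;
SELECT pg_size_pretty(pg_total_relation_size('mytable'));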
See the documentation for CLUSTER.
PostgreSQL wiki about VACUUM FULL and recovering dead space
You definitely want to run a VACUUM, to free up that space for future inserts. If you want to actually reclaim that space on disk, making it available to the OS, you'll need to run VACUUM FULL. Keep in mind that VACUUM can run concurrently, but VACUUM FULL requires an exclusive lock on the table.
You will also want to REINDEX, since the indexes will remain bloated even after the VACUUM runs. If possible, a much faster way to do this is to drop the index and create it again from scratch.
You'll also want to ANALYZE, which you can just combine with the VACUUM.
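Putting those pieces together, a minimal sketch against a hypothetical table mytable with an index mytable_idx on some_column:
-- plain vacuum plus statistics refresh; runs concurrently with normal traffic
VACUUM ANALYZE mytable;
-- shrink the bloated index: either rebuild it in place (locks writes to the table) ...
REINDEX INDEX mytable_idx;
-- ... or, often faster for big tables, drop it and build it again from scratch
-- DROP INDEX mytable_idx;
-- CREATE INDEX mytable_idx ON mytable (some_column);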
See the documentation for more info.
Wouldn't it be more optimal to create a temporary table with the 10% of records you need, then drop the original table and rename the temporary one to the original name?
I'm relatively new to the world of Postgres, but I understand VACUUM ANALYZE is recommended. I think there's also a sub-option which just frees up space. I found REINDEX useful as well when doing batch inserts or deletes. Yes, I've been working with tables with a similar number of rows, and the speed increase is very noticeable (Ubuntu, Core 2 Quad).

PostgreSQL database size increasing

I have a strange problem: the size of my PostgreSQL (8.3) database keeps increasing. So I made a dump, cleaned up the database, and then re-imported the dump. The database size was reduced by roughly 50%.
Some information:
(1) AUTOVACUUM and REINDEX are running regularly in background.
(2) Database encoding is ASCII.
(3) Database location: /database/pgsql/data
(4) System: Suse-Ent. 10.
Any hints are appreciated
If the dead tuples have stacked up beyond what can be accounted for in max_fsm_pages, a regular VACUUM will not be able to free everything. The end result is that the database will grow larger and larger over time as dead space continues to accumulate. Running a VACUUM FULL should fix this problem. Unfortunately it can take a very long time on a large database.
If you're running into this problem frequently, you either need to vacuum more often (autovacuum can help here) or increase the max_fsm_pages setting. When running VACUUM VERBOSE it will tell you how many pages were freed and give you a warning if max_fsm_pages was exceeded, this can help you determine what this value should be. See the manual for more information. http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM
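A sketch of that check on 8.3 (max_fsm_pages no longer exists from 8.4 onward); the value below is illustrative only:
-- run database-wide as a superuser; the summary at the end reports free-space-map
-- usage and warns if max_fsm_pages is too low
VACUUM VERBOSE;
If the hint says the limit was exceeded, raise it in postgresql.conf and restart the server:
# postgresql.conf (8.3); requires a restart to take effect
max_fsm_pages = 2000000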
Fortunately, 8.4's visibility map resolves this issue. Depesz has a great story on the subject, as usual: http://www.depesz.com/index.php/2008/12/08/waiting-for-84-visibility-maps/
Without knowing more specifics about your particular setup, a couple of things come to mind. When AUTOVACUUM runs, is it trying to reclaim disk space, and can you verify that it is through server logs?
Secondly, especially if the previous answer was no, your AUTOVACUUM values may be incorrect. I would highly recommend reading the following on the subject: http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#AUTOVACUUM
Running REINDEX shouldn't be necessary.
Run a database-wide VACUUM VERBOSE and check the last lines of the output for the FSM settings hint; maybe that is what is wrong.
Did you try a VACUUM FULL, too? (Warning, it locks your database for a long time.) I am not sure that AUTOVACUUM is so eager...
If you haven't already, check your system for long-running idle transactions. They will prevent VACUUM (both manual and auto) from clearing out space.
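One way to spot those on 8.3 (the pg_stat_activity columns changed in later releases, e.g. procpid and current_query were later renamed); a sketch only:
-- sessions sitting idle inside an open transaction, oldest first
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;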