PostgreSQL: VACUUM FULL or CLUSTER?

I'm nearly out of disk space because of a query that tried to update every row in a huge table. I don't have enough space for CLUSTER (though it would barely fit if I dropped indexes first and recreated them afterwards).
How can I estimate how long VACUUM will take? How about VACUUM FULL? How do the three (with CLUSTER) compare in terms of running time and disk usage?
It's PostgreSQL 8.3.

Use CLUSTER; until 8.4, VACUUM FULL is broken. If it takes too long, you might as well dump and reload the table.
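For concreteness, here is a rough sketch of both options on 8.3, with bigtable and bigtable_pkey as placeholder names:

    -- Option 1: CLUSTER rewrites the table in index order and needs roughly
    -- the table's size (plus indexes) in free space while it runs.
    CLUSTER bigtable USING bigtable_pkey;
    ANALYZE bigtable;

    -- Option 2: dump and reload the table (run from the shell):
    --   pg_dump -t bigtable mydb > bigtable.sql
    --   psql mydb -c 'DROP TABLE bigtable'
    --   psql mydb -f bigtable.sql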

Related

Postgres 11 vacuum taking days

I have set maintenance_work_mem to 50GB and am manually running VACUUM on a 200GB non-partitioned table with 10 indexes. It has been running for 3 days and still has not completed.
I know that this many indexes is bad, but I don't understand why the vacuum hasn't finished.
Note: no blocking observed.
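On Postgres 11 a running plain VACUUM can at least be watched through the pg_stat_progress_vacuum view to see whether it is scanning the heap or cycling through index passes; a minimal sketch (the join only resolves the table name):

    -- One row per backend currently running a plain VACUUM (not VACUUM FULL)
    SELECT p.pid, c.relname, p.phase,
           p.heap_blks_scanned, p.heap_blks_total,
           p.index_vacuum_count, p.num_dead_tuples
    FROM pg_stat_progress_vacuum p
    JOIN pg_class c ON c.oid = p.relid;

If index_vacuum_count keeps climbing, the 10 indexes are being scanned over and over, which is typically what makes a vacuum of a table this size take days.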

PostgreSQL suddenly takes all the disk space

I am facing a very strange issue on my server; my configuration is very straightforward:
Small VPS, 500 MiB RAM, 40 GiB disk
Debian stable at install time, now probably oldstable
PostgreSQL v11.11
The data is very small, the use of a database for my purpose is probably overkill, but handy:
7 tables
7 views, one of which is a little bit scary
The biggest table has a few hundred records
The full dump of the database gives me a file of 93 KiB
Everything was very fast for 1.5 years. Yesterday, the database suddenly became very slow. My investigations showed that the size of the data on disk was 34 GiB and I had no disk space available anymore.
After more investigation, I tried the command VACUUM FULL, which freed the useless 34 GiB. Disk usage went from 100% to 10% and performance came back immediately. One day later, the system is slow again, and disk usage is now around 50%.
I have no clue about what is going on, any suggestion?
I'd recommend reading Optimize and Improve PostgreSQL Performance with VACUUM, ANALYZE, and REINDEX and Routine Vacuuming. Here are some relevant bits.
In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table
You must have done a lot of deletes and updates, so Postgres consumed a lot of disk space. VACUUM recovers this space. VACUUM FULL isn't normally necessary and takes an exclusive lock on the table while it runs.
Normally there is an autovacuum daemon running which will vacuum periodically. It probably isn't running. Check with SHOW autovacuum and SHOW track_counts. Both need to be on for autovacuum to work.
You can see what is "bloating" your database with the check_postgres tool.
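As a quick sketch of those checks plus a rough bloat indicator (the statistics view is only an approximation of what check_postgres reports):

    -- Both of these should return 'on' for autovacuum to work
    SHOW autovacuum;
    SHOW track_counts;

    -- Tables with the most dead tuples, i.e. the likely sources of bloat
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;

    -- Plain VACUUM frees the space for reuse inside the table files,
    -- without the exclusive lock that VACUUM FULL takes
    VACUUM ANALYZE;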

Redshift vacuum is not reclaiming space

I have a Redshift cluster that consists of 2 nodes with 160 Gb disks.
I'm randomly getting a "Disk full" error when running VACUUM or any other query. My disk usage is at 92%. I deleted more than half of the old rows in a table that is 10515 MB in size, but even after rebooting the cluster there's no effect and the table is still the same size, though a count shows the new number of rows. I should have seen at least a small decrease in disk usage, but there's nothing.
Does anyone have any clue what it might be? Is deleting the table the only option in this case?
There are a few possibilities here, but first let me check the facts. You have a 2-node dc2.large cluster and it is 92% disk full. This is too full and needs to be lowered to provide temp space for query execution. You have a table that is 10515 blocks in size. To address the disk space concern you deleted half of the rows in the table in question and then vacuumed the table. Once complete you didn't see any change to the cluster space or the size of the table, not even one block of difference. Do I have this correct?
The first possibility is that the vacuum did not complete correctly. You mention that you are getting disk-full messages even when vacuuming, so could it be that the vacuum you tried is not completing? You see, vacuum needs temp space to sort the table data, and if the cluster has gotten too full then the vacuum can fail. In this case you can run a delete-only vacuum that will not attempt to sort the table, just reclaim disk space. This has a higher likelihood of success in a disk-full situation.
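As a sketch, assuming the table is called big_table, the delete-only pass looks like this; the sort can be done later once space is available:

    -- Reclaims space from deleted rows without sorting, so it needs far less temp space
    VACUUM DELETE ONLY big_table;

    -- Once disk pressure is relieved, a full vacuum re-sorts the table as well
    VACUUM FULL big_table;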
Another possibility is that the delete of rows didn't complete correctly or wasn't committed before the vacuum was run. This will cause the vacuum to run on the full set of rows.
It is also possible that the table in question is very wide (many columns). This matters because of how Redshift stores data: each block is 1 MB in size and each column needs at least one block per slice for its data. This cluster has 4 slices, so if this table is 1,500 columns wide (yes, that is silly wide) the table will take up 6,000 blocks just to store the first 4 rows. It then takes no additional disk space to add rows until these blocks start to fill up. The table size moves in very large chunks, and when removing rows the size may not change except in large chunks. This is unlikely to be what is happening if you are seeing EXACTLY the same number of blocks, but if the changes in block count are just smaller than you expect, this could be in play.
There could be some other misunderstanding happening: a sort-only vacuum won't free up space, the node type may not be what I think it is, or the table could live in S3 and be accessed through Spectrum. But based on the description these don't seem likely.
UNSOLICITED ADVICE: You are on the right track by freeing up disk space, but you need to take more action than reducing this one table. (I expect you realize this and this is just a start.) You should be operating below 70% disk full in most cases; this varies by workload and table sizes but is a good general rule. This means removing a great deal of data from your disks or increasing your node count (and cost). Migrating some data to S3 and using Spectrum to access it could be an option. If you need more storage without more compute you can look at the storage-optimized nodes, but since you are at the very smallest end of Redshift these likely aren't a win for you. You need to 1) remove unneeded data, 2) move some data to S3 and use Spectrum, or 3) add a node to your cluster.
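If it helps to confirm what the cluster itself reports, the SVV_TABLE_INFO system view shows the size in 1 MB blocks, the row count, and the unsorted percentage; compare it before and after the vacuum (big_table is a placeholder):

    SELECT "table", size AS size_in_1mb_blocks, tbl_rows, unsorted, pct_used
    FROM svv_table_info
    WHERE "table" = 'big_table';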

PostgreSQL VACUUM(FULL, ANALYZE, VERBOSE) Duration 1.6T Table

We have a large table (1.6T) and deleted 60% of the records, and want to reclaim that space for the OS and file system. We're running PostgreSQL 9.4 (we're stuck on that pending a major software upgrade).
We need that space, as we're down to 100GB and when materialized views are refreshed we're running out of space on the server.
I tried running VACUUM(FULL, ANALYZE, VERBOSE) schema.tablename and let it run for 24 hours last weekend, but had to cancel it to get the server back online.
I'm running it again this weekend, after deleting the indexes (I'm hoping that will speed it up so it will finish). So far there is no output or indication of progress. I created a tablespace on another SSD array and set it up as temp space using temp_tablespaces = 'name_of_other_tablespaces', but du -chs shows it is still empty.
The query shows as active, but since disk usage isn't increasing it feels like it's just sitting there, making no noise and pretending it's not there.
This is on a server with 512GB of RAM and a RAID 10 array of very fast enterprise SSDs. Is there any way to get progress and know that something is actually happening and that it's working? Any guesses as to duration, or other suggestions?
I found out what was happening, by finally noticing that it was waiting for an autovacuum process to finish, which never happened (autovacuum: VACUUM pg_toast.pg_toast_nnnnn (to prevent wraparound)). Once I killed that the VACUUM ran quite quickly and cleared up over 1TB of space. Time to celebrate!
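For anyone hitting the same thing on 9.4, the blocking autovacuum worker can be spotted and cancelled roughly like this (the pid is of course just an example; a cancelled anti-wraparound autovacuum will simply be retried later):

    -- On 9.4, pg_stat_activity has a boolean 'waiting' column instead of wait_event
    SELECT pid, waiting, state, query
    FROM pg_stat_activity
    WHERE query LIKE 'autovacuum:%' OR waiting;

    -- Cancel the autovacuum worker so the manual VACUUM FULL can take its lock
    SELECT pg_cancel_backend(12345);  -- replace 12345 with the pid found above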

Does PostgreSQL VACUUM execute when performing a JDBC transaction?

From the PostgreSQL documentation:
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
So, I had a performance optimization issue in a Java application that uses JDBC. My question is: does VACUUM get executed somewhere within a JDBC transaction, or does it need to be set up explicitly?
While that quotation is true, it omits the fact that for a decade or so PostgreSQL has had the autovacuum daemon, which does this job for you automatically.
So normally you don't have to concern yourself with that. Only on tables with very high write activity do you have to tune autovacuum to be more aggressive, and you may need the occasional VACUUM (FULL) if you bulk-delete a large percentage of a table.
Performance issues are normally not connected with VACUUM (except that sequential scans take longer on bloated tables), so the connection is not clear to me.
You can control when autovacuum runs with the settings described in
https://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE
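As a hedged example of making autovacuum more aggressive on a single busy table (the table name and thresholds are illustrative only):

    -- Vacuum/analyze after roughly 1% of the rows change instead of the default 20%/10%
    ALTER TABLE orders SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.01
    );

    -- After a bulk delete of a large fraction of the table, an explicit
    -- VACUUM (FULL, ANALYZE) returns the space to the OS, but it takes an
    -- exclusive lock on the table while it rewrites it.
    VACUUM (FULL, ANALYZE) orders;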