I've got a "little" problem. A week ago my database was reaching full disk capacity. I deleted many rows in different tables trying to free up disk space. After which I tried running a full vacuum which did not complete.
What I want to know is. When I stopped the vacuum from fully completing does it leave any temp files on the disk that I have to delete manually?
I now have a database which is at a 100% disk capacity, which needlessly to say is a big problem.
Any tips to free disk space?
I'm running SUSE with a postgres 8.1.4 database.
First of all:
UPGRADE
Even if you can't to 8.2, 8.3 or 8.4 - at least upgrade to newest 8.1 (which is 8.1.17 at the moment, but will be 8.1.18 in 1-2 days).
Second: diagnose what is the problem.
Use du tool to diagnose where exactly did the space go. What directory is occupying too much space?
Check with df what is total used space, and then check how much of it is PostgreSQL directory.
The best option is to:
cd YOUR_PGDATA_DIR
du -sk *
cd base
du -sk *
cd LARGEST DIR FROM PREVIOUS COMMAND
du -sk * | sort -nr | head
Now, that you know which directory in PGDATA is using space you can do something about it.
if it's logs or pg_temp - restart pg or remove logs (pg_clog and pg_xlog are not logs in common meaning of the word, never delete anything from there!).
If it's something in your base directory, then:
numerical directories in base directory relate to databases. You can check it with:
select oid, datname from pg_database;
When you know the database that is using most of the space, connect to it, and check which files are using most of the space.
File names will be numerical with optional ".digits" suffix - this suffix is (for now) irrelevant, and you can check what exactly the file represents by issuing:
select relname from pg_class where relfilenode = <NUMBER_FROM_FILE_NAME>;
Once you know which tables/indexes use most of the space - you can VACUUM FULL it, or (much better) issue CLUSTER command on them.
On the new tangent to your problem, you can find out what in the database is using lots of space using a query. That can help you locate candidates to TRUNCATE to reclaim enough working space to clean up the ones with deleted info.
Note that deleting lots of rows but not VACUUMing frequently enough to keep disk space in check will often lead to a condition called index bloat, which VACUUM FULL doesn't help with at all. You'll know you're there when the query I suggested shows most of your space is taken up by indexes rather than regular tables. You'll need CLUSTER, which needs as much free disk space as the table itself to rebuild everything, to recover from that problem.
Related
I'm running a looooong pg_restore process of a database with 70 tables and 800Gb. The process is taking 5 days now. I'm monitoring some aspects of the process to evaluate how long will it take but I've some things missing and this is why I'm asking.
I run pg_dump with parameters -F d -j 10 the dump took about 12 hours. I noticed each one of the 10 threads took responsibility of a single table from start to end. After ending of processing a single table, the same process (pid) started with another table not taken by another process.
Running pg_restore is taking much longer (5 days and still working). The main reason is that I'm restoring to a NAS external drive mounted using nfs and that drive is very slow compared to a local hard drive. This is NOT a problem, I'll migrate the information back from the NAS to the original hard drive once I format the hard drive again and install the new operating system.
I'm doing two things to monitor progress:
In a separate terminal I launch du -sh /var/lib/pgsql and evaluate the disk space consumed in the new installation. It has to reach, more or less, the same space the original database was using.
In a separate terminal I launch ps -fu postgress and I see several pg_restore processes running. Each one of then linked with another process with this shape postgress: postress {dbname} [local] {command} where {dbname} is the database name, and {command} varies. Initially, there was the COPY command I think that was used to restore the table content. I also saw some CREATE INDEX commands for re-creating the indexes of that table, and now I see ALTER TABLE commands, don't know exactly for what.
At this time, all processes are just doing ALTER TABLE and the overall used space almost matches the initial space, but the process does not ends (and it is taking 5 days now).
So I'm asking if someone has more experience and can tell me what pg_restore is doing with the ALTER_TABLE command and if there is any other mechanism to estimate how long will it take.
Thanks!
Ignacio
The ALTER TABLE statements at the end of a pg_restore create primary and unique keys as well as foreign key constraints. They could also be attaching partitions, but that is normally very fast.
Look into pg_stat_progress_create_index if you have a recent enough PostgreSQL version (you didn't say), then you can monitor the progress of primary and unique key indexes being created.
I'm running a query that duplicates a very large table (92 million rows) on PostgreSQL. After a 3 iterations I got this error message:
The query was:
CREATE TABLE table_name
AS SELECT * FROM big_table
The issue isn't due to lack of space in the database cluster: at 0.3% of max possible storage at the time of running the query, table size is about 0.01% of max storage including all replicas. I also checked temporary files and it's not that.
You are definitely running out of file system resources.
Make sure you got the size right:
SELECT pg_table_size('big_table');
Don't forget that the files backing the new table are deleted after the error, so it is no surprise that you have lots of free space after the statement has failed.
One possibility is that you are not running out of disk space, but of free i-nodes. How to examine the free resources differs from file system to file system; for ext4 on Linux it would be
sudo dumpe2fs -h /dev/... | grep Free
I run df -h and it says 36Gb used, 9Gb available on my server. I can see that my PostgreSQL db file is 26Gb.
I delete millions of rows from my database, roughly half of the data in there.
I run df -h and it says the exact same numbers: 36Gb used, 9Gb available on disk.
I googled and found something about the VACUUM command, which said the deleted rows are still taking disk space until you run vacuum, I did this in verbose mode and it looks good but still no disk space released.
The system is FC21 with Postgresql 9.3.9. The cluster has 6 databases and uses 38 GB of storage in the pgsql directory. Recently over 20GB of redundant data has been removed. Each db has been vacuumed with a 'vacuum all' command twice, additionally the entire cluster has been vacuumed twice with a vacuumdb -a command. All ran successfully. Postgresql has been stopped and restarted.
For verification a pg_dumpall command creates an 12GB file.
All the tables from one db were removed:
select pg_size_pretty(pg_database_size('db'));
Shows over 6GB remaining.
How can the space be recovered? It seems unreasonable to have to do a pg_restore to recover the space. I have read and re-read the 'recovering disk space' document.
A VACUUM command will only reclaim space that is at the end of table-files. You will want VACUUM FULL or vacuumdb -f.
You might also want to consider reindexdb since all this row rewriting might leave your indexes a little bloated.
I am currently moving some data around and I am running into an interesting issue.
I have a CentOS server (6.3) up and running with Postgres 9.2 on a server with limited built in disk space; however, I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage devise for my database and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been goggling how to set the tablespace of pg_largeobject and I did find some results, but they are horribly out dated.
I did find one article that looks promising, but I'm hesitant because the thread also references that things will/should have changed.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?
As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage uses a reliable file network block device like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc) to the remote storage
Adjust the permissions on the parent directories for the datadir's new location to make sure the postgres user, postgres, group or others permissions block has at least execute on each parent directory so it can traverse the tree.
Adjust your system startup scripts to set the new location as the PostgreSQL datadir or symlink the old datadir location (output by SHOW data_directory) to the new location.
Start PostgreSQL
Unfortunately, different systems and packages find the datadir different ways. Debian/Ubuntu use pg_wrapper, for example.