Postgres orphaned files in base directory

The Postgres file system is out of sync with the database.
Some tables were dropped in Postgres, but their files under /base still exist.
These orphaned files now take up too much space. How do I reclaim the disk space?
I'd rather not mess with deleting files from the PG data directory unless I am certain they can be deleted.
I tried locating the files' OIDs in pg_class, but they don't exist there.
When I calculate database sizes using SELECT pg_size_pretty(pg_database_size(db)); the master is 100 GB larger than the slave, so there is definitely space that needs to be reclaimed, but how?

Related

PostgreSQL how to safely remove files inside pg_wal directory

I have a legacy PostgreSQL DB and its pg_wal directory is very large.
How can I safely remove the oldest files inside the pg_wal directory to reduce its size without interrupting the running database?
Thank you
There is no safe way to manually remove files in pg_wal. Don't do it.
You have to figure out the reason that keeps PostgreSQL from deleting the files. A stale replication slot? Is the archiver stuck? Is wal_keep_size (wal_keep_segments in older releases) large?
Once you have fixed the problem, the situation will gradually improve. WAL segments are automatically deleted during checkpoints.
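The three possible causes named above can each be checked from SQL. Here is a minimal Python sketch, assuming a DB-API 2.0 cursor (the driver is your choice and is not something this answer prescribes) and PostgreSQL 13 or later, where the wal_keep_size setting exists alongside the pg_replication_slots and pg_stat_archiver views. The names CHECKS and wal_retention_report are hypothetical.

```python
# Read-only checks for the usual reasons WAL segments pile up.
# Assumption: PostgreSQL 13+ and any DB-API 2.0 cursor passed in by the caller.

CHECKS = {
    # A slot that is not active still holds back WAL from restart_lsn onward.
    "replication_slots":
        "SELECT slot_name, active, restart_lsn FROM pg_replication_slots",
    # A growing failed_count suggests that archiving keeps failing.
    "archiver":
        "SELECT archived_count, failed_count, last_failed_wal FROM pg_stat_archiver",
    # Amount of WAL that is kept around for standby servers.
    "wal_keep_size": "SHOW wal_keep_size",
}


def wal_retention_report(cur):
    """Run every check and return {check name: list of result rows}."""
    report = {}
    for name, sql in CHECKS.items():
        cur.execute(sql)
        report[name] = cur.fetchall()
    return report
```

The function only reads catalog views and settings; it never touches the files in pg_wal.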

Postgresql | No space left on device

I am getting a disk space error while running a batch process on a PostgreSQL database.
However, df -h shows that the machine has enough space.
Below is the exact error:
org.springframework.dao.DataAccessResourceFailureException: PreparedStatementCallback; SQL [INSERT into BATCH_JOB_INSTANCE(JOB_INSTANCE_ID, JOB_NAME, JOB_KEY, VERSION) values (?, ?, ?, ?)]; ERROR: could not extend file "base/16388/16452": No space left on device
Hint: Check free disk space.
What is causing this issue?
EDIT
postgres data directory is /var/opt/rh/rh-postgresql96/lib/pgsql/data
df -h /var/opt/rh/rh-postgresql96/lib/pgsql/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2      100G   63G   38G  63% /
Most likely there are some queries that create large temporary files which fill up your hard disk temporarily. These files will be deleted as soon as the query is done (or has failed), so the file system has enough free space when you look.
Set log_temp_files = 10240 in postgresql.conf (and reload) to log all temporary files exceeding 10 MB, then you can check the log file to see if this is indeed the reason.
Try to identify the bad queries and fix them.
If temporary files are not the problem, maybe temporary tables are. They are dropped automatically when the database session ends. Does your application use temporary tables?
Another possibility is files created by something other than the database.
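Once log_temp_files is set, the server writes one "temporary file" line per file to its log, and the numbers can be added up to see which backend is responsible. Here is a minimal Python sketch, assuming the plain message form shown in the comment; the function name temp_bytes_by_pid is hypothetical, and files inside shared filesets (used by parallel queries in newer releases) are not matched.

```python
import re
from collections import defaultdict

# Matches the message emitted when log_temp_files is enabled, for example:
#   LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4683.0", size 1073741824
# The number after "pgsql_tmp" is the PID of the backend that wrote the file.
TEMP_FILE = re.compile(r'temporary file: path "[^"]*pgsql_tmp(\d+)\.\d+", size (\d+)')


def temp_bytes_by_pid(lines):
    """Sum the logged temporary-file bytes per backend PID."""
    totals = defaultdict(int)
    for line in lines:
        match = TEMP_FILE.search(line)
        if match:
            totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)
```

Feeding it the lines of the server log (for example an open file object) gives a per-PID total; if log_line_prefix includes the PID, the offending statements can then be found in the same log.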

Can Google Cloud Local SSD be used for PostgreSQL Temp Tablespace?

We have a PostgreSQL instance running in a VM in the Google Cloud. The nature of the queries that we run involves lots of PostgreSQL temporary table space. (5 or 6 or more TB of disk I/O every day)
This I/O continues to be a major bottleneck in our database. Currently I have it all happening on an SSD persistent disk, not because we need to save any of the data in the event of a reboot, but because PostgreSQL lays out a file structure on the disk that it then uses for the temporary tables, and if that file structure is missing when the database starts up, things do not go well.
What I'd like to do is configure the temporary tablespace on the local SSDs because of their much higher I/O throughput. Unfortunately, they get wiped on every reboot. I'd like a simple way to lay out the disk again after a reboot and before PostgreSQL starts back up.
I could tar up the empty file structure and then write a script that untars it after every boot. Does that make sense? Is there a better way/best practice for doing this?
What would be awesome is if there was a PostgreSQL extension out there that did this magically.
Ideas?
I dug a bit into my previous tests; here is a summary:
A PostgreSQL tablespace is just a directory, no big deal. Plus, if you use it only as a temporary tablespace, no persistent files are left when you shut down the database.
You can create a tablespace for temp tables in any location you want and then inspect that location to see what directory structure PG created. You have to do this at the OS level, because PG only shows you the tablespace's main directory: both \db+ in psql and select oid, spcname, pg_tablespace_location(oid) from pg_tablespace; behave the same way.
My example:
(I used /tempspace/pgtemp as the presumed mount point) CREATE TABLESPACE p_temp OWNER xxxxxx LOCATION '/tempspace/pgtemp'; created, in my case, the structure /tempspace/pgtemp/PG_10_201707211
I set temp_tablespaces = 'p_temp' in postgresql.conf and reloaded the configuration.
When I ran create temp table .... PG added another subdirectory, /tempspace/pgtemp/PG_10_201707211/16393 (16393 being the OID of the database), but this does not matter for a temp tablespace because PG will create this subdirectory if it is missing.
PG created the files for the temp table in this subdirectory.
When I closed the session, the files for the temp table were gone.
Then I stopped PG and tested what would happen if the directories were missing:
I deleted PG_10_201707211 along with its subdirectory
I started PG; the log showed the message LOG: could not open tablespace directory "pg_tblspc/166827/PG_10_201707211": No such file or directory, but PG started
I tried to create a temp table and got the error message ERROR: could not create directory "pg_tblspc/166827/PG_10_201707211/16393": No such file or directory SQL state: 58P01
Then (with PG running) I issued these commands at the OS level:
sudo mkdir -p /tempspace/pgtemp/PG_10_201707211
sudo chown postgres:postgres -R /tempspace/pgtemp
sudo chmod 700 -R /tempspace/pgtemp
I tried to create the temp table again, inserted and selected values, and everything worked OK.
So the conclusion is: since a PG tablespace is no "big magic", just directories, you can simply create a bash script that runs at Linux startup, checks (and mounts if necessary) the local SSD, and creates the necessary directories for the PG temp tablespace.
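The directory part of such a startup step could look roughly like the following. It is written in Python rather than bash purely for illustration, it does not handle mounting the SSD, and the function name ensure_temp_tablespace is hypothetical. It mirrors the mkdir, chown and chmod commands used in the test above.

```python
import os
import shutil
from pathlib import Path


def ensure_temp_tablespace(location, version_dir, owner=None):
    """Recreate <location>/<version_dir> with mode 0700 if it is missing.

    location:    the tablespace LOCATION, e.g. /tempspace/pgtemp
    version_dir: e.g. PG_10_201707211 (differs between PostgreSQL releases)
    owner:       OS user and group to chown to (needs root);
                 None leaves ownership unchanged
    """
    target = Path(location) / version_dir
    target.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    for path in (Path(location), target):
        os.chmod(path, 0o700)
        if owner is not None:
            shutil.chown(path, user=owner, group=owner)
    return target
```

A boot script would call something like ensure_temp_tablespace("/tempspace/pgtemp", "PG_10_201707211", owner="postgres") after the SSD is mounted; the version directory name has to be taken from your own installation.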

Postgres 9.2 pg_largeobject tablespace

I am currently moving some data around and I am running into an interesting issue.
I have a CentOS server (6.3) up and running with Postgres 9.2 on a server with limited built in disk space; however, I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage device for my database and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been googling how to set the tablespace of pg_largeobject and I did find some results, but they are horribly outdated.
I did find one article that looks promising, but I'm hesitant because the thread also mentions that things will/should have changed.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?
As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage is exposed through a reliable network block device like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc.) to the remote storage
Adjust the permissions on the parent directories of the datadir's new location so that the postgres user has at least execute permission on each of them (through the owner, group, or others permission bits) and can traverse the tree.
Adjust your system startup scripts to use the new location as the PostgreSQL datadir, or symlink the old datadir location (as reported by SHOW data_directory) to the new location.
Start PostgreSQL
Unfortunately, different systems and packages find the datadir different ways. Debian/Ubuntu use pg_wrapper, for example.
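The permissions step in the list above is easy to get wrong, so it may help to check it before starting PostgreSQL. Here is a minimal Python sketch, meant to be run as the postgres OS user; the function name untraversable_parents is hypothetical.

```python
import os
from pathlib import Path


def untraversable_parents(datadir):
    """List parent directories of datadir that the current user cannot traverse.

    os.access() tests with the real uid/gid of the running process, so run
    this as the postgres user for a meaningful answer.
    """
    parents = Path(datadir).resolve().parents
    return [str(p) for p in parents if not os.access(p, os.X_OK)]
```

An empty list means every parent directory grants execute permission to the user running the check.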

postgreSQL vacuum temp files?

I've got a "little" problem. A week ago my database was approaching full disk capacity. I deleted many rows in different tables to free up disk space, then tried running a full vacuum, which did not complete.
What I want to know is: when I stopped the vacuum before it completed, did it leave any temp files on the disk that I have to delete manually?
I now have a database at 100% disk capacity, which, needless to say, is a big problem.
Any tips to free disk space?
I'm running SUSE with a postgres 8.1.4 database.
First of all:
UPGRADE
Even if you can't upgrade to 8.2, 8.3 or 8.4, at least upgrade to the newest 8.1 (which is 8.1.17 at the moment, but will be 8.1.18 in 1-2 days).
Second: diagnose what the problem is.
Use the du tool to find out exactly where the space went. Which directory is occupying too much space?
Check with df how much space is used in total, and then check how much of it is the PostgreSQL directory.
The best option is to:
cd YOUR_PGDATA_DIR
du -sk *
cd base
du -sk *
cd <LARGEST_DIR_FROM_PREVIOUS_COMMAND>
du -sk * | sort -nr | head
Now that you know which directory in PGDATA is using the space, you can do something about it.
If it's logs or pg_temp, restart pg or remove the logs (pg_clog and pg_xlog are not logs in the common meaning of the word; never delete anything from there!).
If it's something in your base directory, then:
The numerical directories in the base directory correspond to databases. You can check that with:
select oid, datname from pg_database;
When you know the database that is using most of the space, connect to it, and check which files are using most of the space.
File names will be numerical with an optional ".digits" suffix. The suffix is (for now) irrelevant; you can check what exactly the file represents by issuing:
select relname from pg_class where relfilenode = <NUMBER_FROM_FILE_NAME>;
Once you know which tables/indexes use most of the space, you can run VACUUM FULL on them, or (much better) issue the CLUSTER command on them.
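The last du step and the handling of the ".digits" suffix can be combined so that all segments of one relation are counted together. Here is a minimal Python sketch, assuming it is pointed at one database directory under base; the function name largest_filenodes is hypothetical. The fork suffixes mentioned in the comment only exist in releases newer than 8.1.

```python
import os
import re
from collections import defaultdict

# Relation files are named <relfilenode>, optionally followed by a fork
# suffix (_fsm, _vm, _init; newer releases only) and/or a segment
# number (.1, .2, ...). Everything else in the directory is skipped.
RELFILE = re.compile(r"^(\d+)(?:_[a-z]+)?(?:\.\d+)?$")


def largest_filenodes(db_dir, top=10):
    """Return [(relfilenode, total_bytes)] for the biggest relations in db_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(db_dir):
        match = RELFILE.match(entry.name)
        if match and entry.is_file():
            totals[int(match.group(1))] += entry.stat().st_size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Each number it returns can be looked up with the pg_class query above while connected to that database; a number with no matching row deserves a closer look, not an immediate delete.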
On a new tangent to your problem: you can find out what in the database is using lots of space with a query. That can help you locate candidates to TRUNCATE to reclaim enough working space to clean up the ones with deleted info.
Note that deleting lots of rows but not VACUUMing frequently enough to keep disk space in check will often lead to a condition called index bloat, which VACUUM FULL doesn't help with at all. You'll know you're there when the query I suggested shows most of your space is taken up by indexes rather than regular tables. You'll need CLUSTER, which needs as much free disk space as the table itself to rebuild everything, to recover from that problem.