Can Google Cloud Local SSD be used for PostgreSQL Temp Tablespace? - postgresql

We have a PostgreSQL instance running in a VM in Google Cloud. The nature of the queries we run involves heavy use of PostgreSQL temporary tablespace (5-6 TB or more of disk I/O every day).
This I/O continues to be a major bottleneck in our database. Currently I have it all happening on an SSD persistent disk - not because we need to save any of the data in the event of a reboot, but because PostgreSQL lays out a file structure on the disk that it then uses for the temporary tables, and if that file structure is missing when the database starts up, things go badly.
What I'd like to do is configure the temporary tablespace on the local SSDs because of their much higher I/O throughput. Unfortunately, they get wiped on every reboot. I'd like a simple way to re-lay-out the disk after a reboot and before PostgreSQL starts back up.
I could tar up the empty file structure and then write a script that untars it after every boot. Does that make sense? Is there a better way/best practice for doing this?
What would be awesome is if there was a PostgreSQL extension out there that did this magically.
Ideas?

I dug a bit into my previous tests and here is a summary:
A PostgreSQL tablespace is just a directory - no big deal. Plus, if you use it only as a temporary tablespace, there will be no persistent files left when you shut down the database.
You can create a tablespace for temp tables in any location you want, then go to that location and check the directory structure to see what PG created. But you must do this at the OS level, because PG will only show you the tablespace's main directory - both \db+ in psql and select oid, spcname, pg_tablespace_location(oid) from pg_tablespace; work the same way.
My example (I used /tempspace/pgtemp as the presumed mount point):
CREATE TABLESPACE p_temp OWNER xxxxxx LOCATION '/tempspace/pgtemp';
This created, in my case, the structure /tempspace/pgtemp/PG_10_201707211.
I set temp_tablespaces = 'p_temp' in postgresql.conf and reloaded the configuration.
When I used create temp table ..., PG added another subdirectory - /tempspace/pgtemp/PG_10_201707211/16393, where 16393 is the OID of the database - but this does not matter for a temp tablespace, because if this subdirectory is missing, PG will create it.
PG created the files for the temp table in this subdirectory.
When I closed the session, the files for the temp table were gone.
Then I stopped PG and tested what would happen if the directories were missing:
I deleted PG_10_201707211 along with its subdirectory.
I started PG, and the log showed the message LOG: could not open tablespace directory "pg_tblspc/166827/PG_10_201707211": No such file or directory - but PG started.
I tried to create a temp table and got the error message ERROR: could not create directory "pg_tblspc/166827/PG_10_201707211/16393": No such file or directory (SQL state: 58P01).
Now (with PG running) I issued these commands in the OS:
sudo mkdir -p /tempspace/pgtemp/PG_10_201707211
sudo chown postgres:postgres -R /tempspace/pgtemp
sudo chmod 700 -R /tempspace/pgtemp
I tried to create a temp table again, inserted and selected values, and everything worked OK.
So the conclusion is: since a PG tablespace is no "big magic", just directories, you can simply create a bash script, run at Linux startup, which checks (and mounts if necessary) the local SSD and creates the necessary directories for the PG temp tablespace.
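A minimal sketch of such a boot script, assuming the local SSD shows up as /dev/nvme0n1, the mount point is /tempspace, and the version directory is the PG_10_201707211 one from the example above (all three names are assumptions to adapt):

#!/bin/bash
set -e
# Local SSD comes up blank after a reboot - format it if it has no filesystem yet
if ! blkid /dev/nvme0n1 >/dev/null 2>&1; then
    mkfs.ext4 -F /dev/nvme0n1
fi
# Mount it unless it is already mounted
mkdir -p /tempspace
mountpoint -q /tempspace || mount /dev/nvme0n1 /tempspace
# Recreate the directory PG expects; PG recreates the deeper levels itself
mkdir -p /tempspace/pgtemp/PG_10_201707211
chown -R postgres:postgres /tempspace/pgtemp
chmod -R 700 /tempspace/pgtemp

Run it before PostgreSQL starts, e.g. from a systemd unit with Before=postgresql.service.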

Related

Postgresql - restore SQL dump with tablespaces

I'm planning to move some tables to different tablespaces (folders) on my PROD Linux box.
Overnight DB backups are done using pg_dumpall.
I also have a DEV environment running under Windows, where I usually restore the SQL dump (made on Linux).
I'm wondering now how to restore such SQL dumps, since they contain pointers to a Linux partition, in Linux notation.
I have read on various webpages that the same folder structure has to be created in order to restore non-standard tablespaces. But folder paths in Windows and Linux look totally different (C:\... vs /opt/...).
Is there any command-line switch that allows remapping a tablespace to another (Windows-style) location during restore? If not, how do you manage that scenario?
I guess I should be able to achieve that by editing the SQL dump file - but it's huge, a few hundred gigs, and a bit problematic to automate.
You can retrieve the actual tablespace definitions with a separate pg_dumpall command. You still need to do some editing, but the output is not that large. (The same applies to users.)
pg_dumpall --tablespaces-only > stuff.out
There is no option to remap tablespace names during import, so you will need to create them in your Windows installation with the same names - the actual physical location ("folder structure") is irrelevant, as the SQL dump only references them by name.
If the script contains a create tablespace command, you need to change that command to use a directory/path that exists on your system before you can run the SQL script. But that is the only thing you need to change; all other places refer to the tablespace by name, not by folder path.
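One way to automate that edit, assuming the dump's Linux location is /opt/tablespaces/tbs1 and you want it under C:/pgdata on Windows (both paths are made up for illustration; PostgreSQL on Windows accepts forward slashes in tablespace locations):

sed "s|LOCATION '/opt/tablespaces/tbs1'|LOCATION 'C:/pgdata/tbs1'|" stuff.out > stuff_windows.out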
Typically pg_dump is easier than pg_dumpall for moving databases around (e.g. because of tablespaces).
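If the DEV box does not need the tablespace layout at all, another option is to skip tablespaces entirely on restore: pg_dump, pg_restore and pg_dumpall all accept a --no-tablespaces option, which puts everything in the default tablespace (mydb below is a placeholder name):

pg_dump -Fc mydb > mydb.dump
pg_restore --no-tablespaces -d mydb mydb.dump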

Restoring Postgres database without pg_dump?

I have a Postgres database DATA1 with a tablespace located at D:\tbl_DATA1. We used an OS backup/restore tool to copy D:\tbl_DATA1 to C:\tbl_DATA1 on a target machine. Is it possible to recreate the database from this folder on the second machine?
https://www.postgresql.org/docs/current/static/backup-file.html
An alternative backup strategy is to directly copy the files that PostgreSQL uses to store the data in the database
and later, two restrictions are mentioned:
The database server must be shut down in order to get a usable backup.
You should restore the whole PGDATA directory, not certain individual tables or databases from their respective files or directories.
So yes - it is common practice to shut down PostgreSQL, copy the PGDATA directory to the other machine, and start Postgres there in order to get a copy of the cluster. But it is done at the cluster level - not at the tablespace level as you mention, nor the database level - the whole data_directory should be copied.
So no - copying just the tablespace directory and trying to hack the db to add a tablespace will fail.
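A minimal sketch of that cluster-level copy, assuming systemd and a data_directory of /var/lib/postgresql/9.6/main (both are assumptions; check SHOW data_directory; for the real path):

sudo systemctl stop postgresql
# copy the entire cluster directory, preserving ownership and permissions
sudo rsync -a /var/lib/postgresql/9.6/main/ target-host:/var/lib/postgresql/9.6/main/
sudo systemctl start postgresql

The target server must also be stopped while the copy is dropped into place, and must run the same PostgreSQL major version.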

Restoring Database PostgreSQL

One of my servers has a virus, the Postgres service on Windows is not running, and there is no backup. I'm using Odoo 8, and even the Odoo service is not running.
Is it possible to restore a database using only an OID directory, which from what I know is where Postgres stores the database files?
I assume you mean the /data/base/<oid> directory. Unfortunately, it's not enough. There are settings stored outside the database OID directory, as you called it. For example:
/data/global/ - cluster users' settings (passwords, roles, etc.)
/data/pg_xlog/ - WAL entries, possibly with transaction changes not yet "transferred" to the database files
/data/pg_tblspc/ - tablespaces
You need the whole /data directory. Read more about PHYSICAL BACKUP.
Edit:
So, if the whole /data directory is available to you, you can restore the database to another server. There's one thing you should remember: the destination Postgres cluster must be at the same version, e.g. 9.4.1. When the first and second numbers match (e.g. 9.2.10 and 9.2.16), this should also work most of the time. Keeping that in mind, you just need to replace the /data/ directory on the destination server with your source /data directory (the destination server must be stopped during that operation).
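The major version is recorded in the PG_VERSION file inside the data directory, so a quick check before swapping might look like this (paths are assumptions, and systemd is assumed for service control):

cat /path/to/source/data/PG_VERSION       # e.g. 9.4
cat /path/to/destination/data/PG_VERSION  # must match
# if they match: stop the destination server, swap the directory in, fix ownership
sudo systemctl stop postgresql
sudo mv /path/to/destination/data /path/to/destination/data.old
sudo cp -a /path/to/source/data /path/to/destination/data
sudo chown -R postgres:postgres /path/to/destination/data
sudo systemctl start postgresql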

Postgres 9.2 pg_largeobject tablespace

I am currently moving some data around and I am running into an interesting issue.
I have a CentOS server (6.3) up and running with Postgres 9.2, on a machine with limited built-in disk space; however, I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage device for my database, and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been googling how to set the tablespace of pg_largeobject, and I did find some results, but they are horribly outdated.
I did find one article that looks promising, but I'm hesitant because the thread also indicates that things will/should have changed.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?
As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage is a reliable network block device like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc) to the remote storage
Adjust the permissions on the parent directories of the datadir's new location to make sure the postgres user has at least execute permission on each parent directory (via the user, group, or others permission bits), so it can traverse the tree.
Adjust your system startup scripts to set the new location as the PostgreSQL datadir, or symlink the old datadir location (output by SHOW data_directory) to the new location - see the sketch after this list.
Start PostgreSQL
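A minimal sketch of those steps, assuming the datadir is /var/lib/pgsql/data, the new location is /mnt/netstore/pgdata, and systemd manages the service (all three are assumptions to adapt):

sudo systemctl stop postgresql
sudo mv /var/lib/pgsql/data /mnt/netstore/pgdata
# every parent directory needs at least execute (x) so postgres can traverse it
sudo chmod o+x /mnt /mnt/netstore
# symlinking the old location keeps existing startup scripts working unchanged
sudo ln -s /mnt/netstore/pgdata /var/lib/pgsql/data
sudo systemctl start postgresql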
Unfortunately, different systems and packages find the datadir in different ways. Debian/Ubuntu use pg_wrapper, for example.

Creating a tablespace in postgresql

I'm trying to create a tablespace in postgres, but I'm getting ownership problems. The command I'm using is:
CREATE TABLESPACE magdat OWNER maggie LOCATION '/home/john/BSTablespace'
I get the error:
ERROR: could not set permissions on directory "/home/john/BSTablespace": Operation not permitted
The folder belongs to postgres:postgres. I've tried changing it to maggie, but if I run:
chown maggie:postgres /home/john/BSTablespace
I get:
chown: invalid user: `maggie:postgres'
How come the user does not exist? If I list the users inside Postgres, it does come up. Any ideas what I could be doing wrong?
I would hazard a guess that the problem lies in the permissions of the parent directory /home/john. Your home directory is probably set up so that only your user has access to it (i.e. chmod 700). (It's a good thing for your home directory to be chmod 700; don't change it.)
Doing something like:
mkdir /BSTablespace
chown postgres:postgres /BSTablespace
and then
CREATE TABLESPACE magdat OWNER maggie LOCATION '/BSTablespace';
should work fine.
Regarding the user maggie: database users are not the same as OS users. That isn't to say that you couldn't have a user named maggie in both places - but you would need to create the user in both the database and the OS for that to happen.
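For illustration, creating the two separate "maggie" users could look like this (assuming local postgres logins are trusted in this sketch):

sudo useradd maggie                             # the OS user
sudo -u postgres psql -c 'CREATE USER maggie;'  # the database role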
When you install Postgres on a Mac and are trying to use PgAdmin to create your databases, tablespaces, etc., you need to know that the PgAdmin utility runs under the postgres account that was created when you installed the Postgres database and its utilities.
The postgres account is part of the _postgres group.
(The command dscacheutil -q group | grep -i postgres will list the group associated with the postgres account.)
The best practice would be to create a new directory under root (/) for housing the tablespaces (let us call it /postgresdata), then make postgres:_postgres the owner of that directory, using the command below:
sudo chown postgres:_postgres /postgresdata
This should do it for you.
You could then create a subdirectory under /postgresdata for each unique tablespace.
There is a problem with this solution. Think about it: why do you want to create a new tablespace? Most people do it for either space limitations or performance. In both cases, that means placing each tablespace on a different drive. So archive data goes on the slower hard drive, while actively used data goes on the SSD.
Assume your OS is on the SSD and you have mounted your slower spin-up hard drive as /media/slowdrive. The same dilemma occurs in reverse, where the spin-up drive holds the OS and the SSD is the mounted one.
Your solution would place the new tablespace at /newtablespace.
Do you see the problem? /newtablespace is on the SSD, which does not have the capacity to hold both the archival and the active data. If it did, we would not be creating a new tablespace in the first place.
So, how do we solve this issue when our new tablespace is mounted at /media/slowdrive/newtablespace? In my case, the slow drive (a spin-up HD) is mounted as root:root for security purposes, although I am not entirely sure why. What you are suggesting is that I have to change the mount point of my secondary drive to postgres:postgres, in addition to having the newtablespace directory owned by postgres:postgres. That makes no sense, especially since I use this drive for many other things than just a Postgres tablespace.
Joe
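For what it's worth, the mount point does not need to be re-owned: as noted above, postgres only needs execute permission on the parent directories to traverse down, so /media/slowdrive can stay root:root while postgres owns just a subdirectory. A sketch under those assumptions (directory and tablespace names are made up for illustration):

sudo mkdir /media/slowdrive/pg_archive
sudo chown postgres:postgres /media/slowdrive/pg_archive
sudo chmod 700 /media/slowdrive/pg_archive

and then in psql:

CREATE TABLESPACE archive OWNER maggie LOCATION '/media/slowdrive/pg_archive';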