Postgres 9.2 pg_largeobject tablespace - postgresql

I am currently moving some data around and I am running into an interesting issue.
I have a CentOS 6.3 server up and running with Postgres 9.2; the machine has limited built-in disk space, but I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage device for my database and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been googling how to set the tablespace of pg_largeobject and I did find some results, but they are horribly outdated.
I did find one article that looks promising, but I'm hesitant because the thread also suggests that things will/should have changed since then.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?

As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage uses a reliable network block device protocol like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc) to the remote storage
Adjust the permissions on the parent directories of the datadir's new location so that the postgres user (via the user, group or others permission bits) has at least execute permission on each parent directory and can traverse the tree.
Adjust your system startup scripts to set the new location as the PostgreSQL datadir or symlink the old datadir location (output by SHOW data_directory) to the new location.
Start PostgreSQL
Unfortunately, different systems and packages find the datadir in different ways; Debian/Ubuntu use pg_wrapper, for example.
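For what it's worth, on a typical CentOS box the move boils down to something like this rough sketch (the service name, datadir and mount point below are examples, not necessarily yours; get the real datadir from SHOW data_directory):

# as the postgres user: note the current data directory
psql -c 'SHOW data_directory;'
# as root, with PostgreSQL stopped (init script name varies by packaging)
service postgresql-9.2 stop
mv /var/lib/pgsql/9.2/data /mnt/netstore/pgdata
# postgres must be able to traverse every parent directory of the new location
chmod o+x /mnt /mnt/netstore
# either point the startup script at the new datadir, or symlink the old path to it
ln -s /mnt/netstore/pgdata /var/lib/pgsql/9.2/data
service postgresql-9.2 start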

Readonly postgres database on Dropbox

This is on several MacBook Pros and an iMac, all running macOS 12.3 (Monterey).
I would like to store a read-only postgres database on my Dropbox, so that I can access it from multiple computers. I realize storing a writeable database there is unworkable, but I'm not trying to do that.
I created a database under /Users/me/Dropbox/dbs on my MacBook Pro. But the other machines don't see the contents of the directory.
I figured it was because postgres has a different UID on the various machines, so I tried doing chown -R me:staff on the database directory on the MacBook Pro. That made the directories and their contents visible in Dropbox on the other machines, but now I can't start postgres on the MacBook Pro (I haven't tried elsewhere yet) because of file permissions, even after setting go=rx on the file it complains about. I'm thinking that since the file is owned by group staff and not by postgres, that's not going to work.
Any ideas? For example: can I create the database as me:staff rather than postgres:postgres?
Like Laurenz says, this is never going to work properly. The closest you could get, I suspect, is some foreign data wrapper pointing at the read-only shared directory, though that's not going to provide the experience you're looking for.
The SQLite wrapper might be your best bet. Store the shared data in SQLite and access it through PG. https://github.com/pgspider/sqlite_fdw
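A minimal sketch of what that could look like (the file, server and table names here are made up; check the sqlite_fdw README for the exact options):

CREATE EXTENSION sqlite_fdw;
CREATE SERVER shared_sqlite FOREIGN DATA WRAPPER sqlite_fdw
    OPTIONS (database '/Users/me/Dropbox/dbs/shared.sqlite3');
CREATE FOREIGN TABLE lookup_data (id integer, label text)
    SERVER shared_sqlite OPTIONS (table 'lookup_data');
SELECT * FROM lookup_data LIMIT 10;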
PostgreSQL is not Microsoft Access, it is a relational database with a client-server architecture. It works by starting the database server on the data directory and then connecting to the server process with a client. You can only run a single instance of the database server on a single data directory, and that is all you need, since many clients can connect to the server at the same time. The database directory has to be writable by the server and indeed is modified, even if you only perform read operations in the database.

Postgresql - restore SQL dump with tablespaces

I'm planning to move some tables to different tablespaces (folders) on my PROD Linux box.
Overnight DB backups are done using pg_dumpall
I also have a DEV environment running under Windows, where I usually restore the SQL dump made on Linux.
I'm now worried about how to restore such SQL dumps, which contain pointers to a Linux partition in Linux notation.
I read on various web pages that the same folder structure has to be created in order to restore non-standard tablespaces. But folder paths in Windows and Linux look totally different (c:\... vs /opt/...)
Is there any command line switch that allows remapping a tablespace to another (Windows-style) location during restore? If not, how do you manage that scenario?
I guess I should be able to achieve that by editing the SQL dump file, but it's huge (a few hundred gigs) and a bit problematic to automate.
You can retrieve the actual tablespace definitions with a separate pg_dumpall command. You still need to do some editing, but the output is not that large. (similar for users)
pg_dumpall --tablespaces-only mydatabasename >stuff.out
There is no option to remap tablespace names during import, so you will need to create them in your Windows installation with the same name - the actual physical location ("folder structure") is irrelevant, as the SQL dump only references them by name.
If the script contains a CREATE TABLESPACE command, you need to change that command to use a directory path that exists on your system before you can run the SQL script. But that is the only change you need; all other places refer to the tablespace by name, not by folder path.
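As a hypothetical example (tablespace and path names are made up), the edit only touches the LOCATION part; the tablespace name stays the same:

-- line as produced by the dump on the Linux box
CREATE TABLESPACE archive_space OWNER postgres LOCATION '/opt/pgdata/archive';
-- same line edited for the Windows DEV box (the directory must already exist and be empty)
CREATE TABLESPACE archive_space OWNER postgres LOCATION 'C:/pgdata/archive';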
Typically pg_dump is easier than pg_dumpall for moving databases around (e.g. because of tablespaces).

Persisting a single, static, large Postgres database beyond removal of the db cluster?

I have an application which, for local development, has multiple Docker containers (organized under Docker Compose). One of those containers is a Postgres 10 instance, based on the official postgres:10 image. That instance has its data directory mounted as a Docker volume, which persists data across container runs. All fine so far.
As part of testing the creation and initialization of the postgres cluster, it is frequently the case that I need to remove the Docker volume that holds the data. (The official postgres image runs cluster init if-and-only-if the data directory is found to be empty at container start.) This is also fine.
However! I now have a situation where in order to test and use a third party Postgres extension, I need to load around 6GB of (entirely static) geocoding lookup data into a database on the cluster, from Postgres backup dump files. It's certainly possible to load the data from a local mount point at container start, and the resulting (very large) tables would persist across container restarts in the volume that holds the entire cluster.
Unfortunately, they won't survive the removal of the docker volume which, again, needs to happen with some frequency. I am looking for a way to speed up or avoid the rebuilding of the single database which holds the geocoding data.
Approaches I have been or currently am considering:
Using a separate Docker volume on the same container to create persistent storage for a separate Postgres tablespace that holds only the geocoder database. This appears to be unworkable because while I can definitely set it up, the official PG docs say that tablespaces and clusters are inextricably linked such that the loss of the rest of the cluster would render the additional tablespace unusable. I would love to be wrong about this, since it seems like the simplest solution.
Creating an entirely separate container running Postgres, which mounts a volume to hold a separate cluster containing only the geocoding data. Presumably I would then need to do something kludgy with foreign data wrappers (or some more arcane postgres admin trickery that I don't know of at this point) to make the data seamlessly accessible from the application code.
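Roughly, I imagine the glue for that second option would look something like this with postgres_fdw (the container host name, credentials and schema names are invented):

CREATE EXTENSION postgres_fdw;
CREATE SERVER geocoder_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'geocoder-db', port '5432', dbname 'geocoder');
CREATE USER MAPPING FOR CURRENT_USER SERVER geocoder_srv
    OPTIONS (user 'postgres', password 'secret');
CREATE SCHEMA geocoder_remote;
IMPORT FOREIGN SCHEMA public FROM SERVER geocoder_srv INTO geocoder_remote;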
So, my question: Does anyone know of a way to persist a single database from a dockerized Postgres cluster, without resorting to a dump and reload strategy?
If you want to speed things up, you could convert your database dump into a ready-made data directory: import your dump into a clean postgres container, stop it and create a tarball of the data directory, then upload it somewhere. When you need to create a new postgres container, use an init script to stop the database, download and unpack your tarball into the data directory, and start the database again; this way you skip the whole db restore process.
Note: The data tarball has to match the postgres major version so that the container has no problem starting from it.
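A rough sketch of preparing such a tarball (the image tag, dump file and paths are examples):

# start a throwaway container with a bind-mounted, empty data directory
docker run -d --name pg-seed -e POSTGRES_PASSWORD=secret \
    -v "$PWD/seed-data":/var/lib/postgresql/data postgres:10
# wait until the server is ready, then create and load the geocoder database
until docker exec pg-seed pg_isready -U postgres; do sleep 1; done
docker exec pg-seed createdb -U postgres geocoder
docker exec -i pg-seed pg_restore -U postgres -d geocoder < geocoder.dump
# stop the container and package the resulting data directory
# (on Linux the files are owned by the image's postgres uid, so tar may need sudo)
docker stop pg-seed
tar -C "$PWD/seed-data" -czf pg10-geocoder-data.tar.gz .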
If you want to speed things up even more, create a custom postgres image with the tarball and init script bundled, so that every time it starts it wipes the empty cluster and copies in your own data.
You could even change the entrypoint to use your custom script to load the database data and then call docker-entrypoint.sh, so there is no need to delete a possibly empty cluster first.
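Such an entrypoint could look roughly like this (the tarball path and script name are made up; PGDATA and docker-entrypoint.sh come from the stock postgres image):

#!/bin/bash
# custom-entrypoint.sh: unpack the prepared cluster if the data directory is empty,
# then hand control back to the official image's entrypoint
set -e
if [ -z "$(ls -A "$PGDATA" 2>/dev/null)" ]; then
    tar -xzf /pg10-geocoder-data.tar.gz -C "$PGDATA"
fi
exec docker-entrypoint.sh "$@"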
This will only work if you are OK with replacing the whole cluster every time you want to run your tests; otherwise you are stuck with importing the database dump.

Database restore from a hacked system

A Linux VM with Postgres 9.4 was hacked into. (Two processes taking 100% CPU, weird files in /tmp; it did not recur after kill(s) and a restart.) It was decided to install the system from scratch on a new machine (with Postgres 9.6). The only data needed is in one of the Postgres databases. A pg_dump of the database was made after the attack.
Regardless of whether the data - the tables/rows/etc. - were modified during the attack: is it safe to restore the database in the new system?
I am considering using pg_restore with the -O option (which skips restoring object ownership).
The two dangers are:
important data could have been modified
back doors could have been installed in your database
With the first, you're on your own as to how to verify that your data are OK. The safest thing would be to use a backup from before the machine was compromised, but that would mean data loss.
For the second, I would run a pg_dumpall -s and spend a day reading it carefully. Compare it with a dump from a backup made before the breach. Watch out for weird object and column names and functions with SECURITY DEFINER.
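A minimal sketch of that check, assuming you can restore a pre-breach backup into a scratch cluster to compare against (host and file names are examples):

# schema-only dump of the possibly compromised cluster
pg_dumpall -s -h suspect-host > schema_after.sql
# schema-only dump of a scratch cluster restored from a pre-breach backup
pg_dumpall -s -h scratch-host > schema_before.sql
diff -u schema_before.sql schema_after.sql | less
# functions that run with their owner's privileges deserve extra scrutiny
grep -n "SECURITY DEFINER" schema_after.sql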

Backing up the DB vs. backing up the VM

We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our back up strategy and have two probable options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I'm leaning towards the latter, as I think I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this, though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
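For reference, a minimal pg_dump routine might look like this (database and file names are made up):

# logical backup in PostgreSQL's compressed custom format
pg_dump -Fc -f /backups/site.dump site_db
# restore into a freshly created database
createdb site_db_restored
pg_restore -d site_db_restored /backups/site.dump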
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.
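The moving parts of that approach are WAL archiving in postgresql.conf plus a base backup; roughly (the archive destination and options are examples):

# postgresql.conf
wal_level = archive            # 'replica' on newer releases
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'

# then take the base backup itself, e.g. with pg_basebackup
pg_basebackup -D /mnt/backup/base -Ft -z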