Can I just back up Postgres's data directory while it's running?

I need to back up my database but I don't have enough disk space to dump it. Can I just use duplicity to perform incremental backups on the data directory? Would that corrupt the backup somehow? I don't mind a few of the latest rows missing, but I would like my backup to not be destroyed.
Does anyone know whether that's the case?
Thanks!

See this page. I believe duplicity uses an rsync-type mechanism, so you cannot simply grab the directory and go - see the part of that page about rsync. If you need to do a file-system-level backup while the database is online, then you'll need some sort of atomic snapshot.
Most likely, the backup simply wouldn't work.
Postgres has lots of backup options though, like PITR. I suggest a read through the fine manual.
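If local disk space is what rules out a dump, note that pg_dump writes to stdout by default, so a compressed dump can be streamed straight to another machine without ever landing on the local disk. A rough sketch (database name, remote host, and paths are placeholders):
pg_dump -U postgres mydb | gzip | ssh backup@backuphost 'cat > /backups/mydb.sql.gz'   # nothing is written locally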

No, you can't back up the data directory while the database service is running. You could back up the WAL segments for a point-in-time recovery when you want to restore. Make sure you test your recovery as well; it's a little more complicated than a pg_restore of an ordinary dump.
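For reference, the restore side of a WAL-based recovery looks roughly like this on pre-v12 servers, where the settings live in recovery.conf (the archive path and target time are placeholders; on v12 and later the same settings go in postgresql.conf together with a recovery.signal file):
restore_command = 'cp /mnt/wal_archive/%f %p'     # fetch archived WAL segments during recovery
recovery_target_time = '2021-01-01 12:00:00'      # optional: stop recovery at a point in time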

Related

Could not open file "pg_clog/0000": No such file or directory

I am getting the following error while accessing a Postgres database:
ERROR: could not access status of transaction 69675
DETAIL: Could not open file "pg_clog/0000": No such file or directory.
I didn't do anything with the pg_clog folder but the 0000 file is not there.
Is there any way to recover that file or in any way to fix this issue?
Any help would be appreciated.
You are experiencing database corruption, and you should restore from a backup. You should try to figure out what happened to the database so you can prevent it in the future.
Is your storage reliable?
Are you using dangerous settings like fsync = off?
Were there any crashes recently?
Are you really running 9.1? If yes, you shouldn't do that, as it is out of support.
Are there any files in the pg_clog directory? There should be.
Did you have an out-of-space problem recently that may have led someone to remove files from a “log” directory?
As stated in the previous response, you're better off restoring from a backup. However, when we restored the data on a test server (we were doing some testing with VACUUM FULL and needed to restore the database to an earlier state before the vacuum), I discovered that the metadata for those transaction files is not stored in the same location as the data. In cases where data integrity isn't critical, like a test database, you can get away with creating empty files for the missing transaction logs like this:
dd if=/dev/zero of=/path/to/db/pg_clog/xxxx bs=256k count=1   # create a zero-filled 256 kB segment in place of the missing file
chown postgres:postgres /path/to/db/pg_clog/xxxx              # same owner as the other clog segments
chmod go-rwX /path/to/db/pg_clog/xxxx                         # restrict access to the postgres user
There may be multiple missing files, but if it's just a few files this is an alternative to consider.

Can I copy the postgresql /base directory as a DB backup?

Don't shoot me, I'm only the OP!
When we need to back up our DB, we are always able to shut down PostgreSQL completely. Once it is down, I found I could copy the "/base" directory with the binary data in it to another location, checksum it for accuracy, and later restore it if/when necessary. This has even worked when upgrading to a later version of PostgreSQL. Integrity of the various 'conf' files is not an issue, as that is handled elsewhere (i.e. by other processes/procedures!) in the system.
Is there any risk to this approach that I am missing?
The "File System Level Backup" link in Abelisto's comment is what JoeG is talking about doing. https://www.postgresql.org/docs/current/static/backup-file.html
To be safe, I would go up one more level - to "main" on our Ubuntu systems - to take the snapshot, and thoroughly go through the caveats of doing file-level backups. I was tempted to post the caveats here, but I'd end up quoting the entire page.
The thing to be most aware of (in a 'simple' Postgres environment) is the relationship between the postgres database, a user database, and the pg_clog and pg_xlog files. If you only grab "base", you lose the transaction status and WAL information, and in more complex installations, other 'necessary' information.
If the caveat conditions listed there do not exist in your environment, and you can do a full shutdown, this is a valid backup strategy, and it can be much faster than a pg_dump.
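A minimal sketch of that cold-copy approach, assuming a Debian/Ubuntu-style layout (version number, paths, and service name are placeholders):
sudo systemctl stop postgresql                                              # full shutdown, as described above
sudo tar -czf /backups/pgdata-main.tar.gz -C /var/lib/postgresql/9.6 main   # copy the whole cluster, not just base
sha256sum /backups/pgdata-main.tar.gz > /backups/pgdata-main.tar.gz.sha256  # checksum for later verification
sudo systemctl start postgresql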

How can I force drop a broken Postgres database?

I have a database that seems to be broken for some reason. It's a development db for rails so I don't have a backup but I do need to continue development. I tried to just drop it but that's not working.
$ dropdb "database-name"
dropdb: database removal failed: ERROR: could not open file "global/2964": No such file or directory
Thanks in advance for any help!
There's more wrong here than a "broken" database. Something is badly wrong with your PostgreSQL data directory.
global/2964 looks like it's pg_catalog.pg_db_role_setting, which stores ALTER DATABASE ... SET ... and ALTER ROLE ... SET ... settings. It is not database-specific; it's a global table.
If you have missing files in your data directory your whole PostgreSQL data directory is probably damaged. You should back up what you can, if there's anything you care about, then rename or delete the damaged data directory and initdb a new blank one.
You won't be able to DROP this database (or do much else) because PostgreSQL can't load the files for the pg_db_role_setting table, but it needs to delete entries referring to the dropped database from there.
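A rough sketch of that rename-and-initdb path (the data directory path and service name are assumptions; on Debian/Ubuntu installs, initdb lives under /usr/lib/postgresql/<version>/bin):
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main.damaged   # keep the damaged directory for salvage attempts
sudo -u postgres initdb -D /var/lib/postgresql/9.6/main                     # create a fresh, empty cluster
sudo systemctl start postgresql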
As for how this happened:
Have you ever run with fsync = off in postgresql.conf?
Do you have SSD storage? If so, have you had any recent sudden power loss?
Have you ever done any direct modifications of any kind inside the PostgreSQL data directory?
Is the PostgreSQL data directory on external storage that might have been suddenly removed?
Have you ever deleted postmaster.pid?
See also https://wiki.postgresql.org/wiki/Corruption

Backing up the DB vs. backing up the VM

We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our backup strategy and have two likely options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I lean toward the latter because, I think, I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this, though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
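For reference, a minimal pg_dump/pg_restore round trip (database names and paths are placeholders; the custom format is compressed and restorable with pg_restore):
pg_dump -Fc -U postgres mydb > /backups/mydb.dump            # compressed, custom-format dump
pg_restore -U postgres -d mydb_restored /backups/mydb.dump   # restore into an existing, empty database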
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.
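A rough sketch of that procedure, assuming an LVM-backed data volume and a pre-v15 server, where the functions are called pg_start_backup/pg_stop_backup (newer releases rename them to pg_backup_start/pg_backup_stop); the label, volume, and size are placeholders:
psql -U postgres -c "SELECT pg_start_backup('vm_backup', true);"   # put the cluster into backup mode
lvcreate --snapshot --size 5G --name pgsnap /dev/vg0/pgdata        # take the filesystem/VM snapshot here
psql -U postgres -c "SELECT pg_stop_backup();"                     # leave backup mode
The snapshot, together with the archived WAL described in the linked documentation, is what actually gets copied off the host.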

Best method for PostgreSQL incremental backup

I am currently using pg_dump piped to gzip piped to split. The problem with this is that all of the output files change every time, so a checksum-based backup always copies all of the data.
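For reference, the pipeline is roughly this (database name, chunk size, and output path are just examples):
pg_dump mydb | gzip | split -b 1G - /backups/mydb.sql.gz.part_   # compressed dump split into 1 GB chunks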
Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?
For instance, it would help if pg_dump could make everything absolutely ordered, so that all changes appear only at the end of the dump, or something similar.
Update: Check out Barman for an easier way to set up WAL archiving for backup.
You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level=archive, then do a full filesystem-level backup (between issuing pg_start_backup() and pg_stop_backup() commands) and then just copy over newer WAL files by configuring the archive_command option.
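A minimal sketch of the settings and commands involved (archive and data directory paths are placeholders; the archive_command shown is the example from the PostgreSQL documentation):
# postgresql.conf
wal_level = archive          # 'replica' on 9.6 and later, which covers the old archive level
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
# one-time base backup, taken between the start/stop calls
psql -c "SELECT pg_start_backup('base', true);"
rsync -a /path/to/data_directory/ /mnt/backup/base/
psql -c "SELECT pg_stop_backup();"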
Advantages:
Incremental: the WAL archives include everything necessary to restore the current state of the database
Almost no overhead, copying WAL files is cheap
You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)
Disadvantages:
More complicated to set up than pg_dump
The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
Does not work well for write-heavy databases, since recovery will take a long time.
There are some tools such as pitrtools and omnipitr that can simplify setting up and restoring these configurations. But I haven't used them myself.
Also check out http://www.pgbackrest.org
pgBackRest is another backup tool for PostgreSQL which you should be evaluating, as it supports:
parallel backup (tested to scale almost linearly up to 32 cores, but can probably go much further)
compressed-at-rest backups
incremental and differential (compressed!) backups
streaming compression (data is compressed only once at the source and then transferred across the network and stored)
parallel, delta restore (ability to update an older copy to the latest)
Fully supports tablespaces
Backup rotation and archive expiration
Ability to resume backups which failed for some reason
etc, etc..
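For a flavor of how it is driven, a rough sketch (the stanza name is a placeholder, and it assumes pgbackrest.conf and the stanza have already been set up):
pgbackrest --stanza=main --type=full backup    # initial full backup
pgbackrest --stanza=main --type=incr backup    # later incremental backups
pgbackrest --stanza=main restore               # restore into the configured data directory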
Another method is to back up to plain text and use rdiff to create incremental diffs.
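A rough sketch of that approach with librsync's rdiff (file names are placeholders):
pg_dump mydb > dump_new.sql                              # plain-text dump
rdiff signature dump_old.sql dump_old.sig                # signature of the previous dump
rdiff delta dump_old.sig dump_new.sql dump.delta         # small delta to store as the increment
rdiff patch dump_old.sql dump.delta dump_restored.sql    # rebuild the new dump from the old one plus the delta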