Can I copy the postgresql /base directory as a DB backup? - postgresql

Don't shoot me, I'm only the OP!
When needing to backup our DB, we always are able to shutdown postgresql completely. After it is down, I found I could copy the "/base" directory with the binary data in it to another location. Checksum the accuracy and am later able to restore that if/when necessary. This has even worked when upgrading to a later version of postgresql. Integrity of the various 'conf' files is not an issue as that is done elsewhere (ie. by other processes/procedures!) in the system.
Is there any risk to this approach that I am missing?

The "File System Level Backup" link in Abelisto's comment is what JoeG is talking about doing. https://www.postgresql.org/docs/current/static/backup-file.html
To be safe I would go up one more level, to "main" on our ubuntu systems to take the snapshot, and thoroughly go through the caveats of doing file-level backups. I was tempted to post the caveats here, but I'd end up quoting the entire page.
The thing to be most aware of (in a 'simple' postgres environment ) is the relationship between the postgres database, a user database and the pg_clog and pg_xlog files. If you only get the "base" you lose the transaction and WAL information, and in more complex installations, other 'necessary' information.
If those caveat conditions listed do not exist in your environment, and you can do a full shutdown, this is a valid backup strategy, which can be much faster than a pg_dump.

Related

Change path of Firebird Secondary database files

I have created a Firebird multi-file database
Main Database file D:\Database\MainDB.fdb
Secondary files (240 Files) located under D:\Database\DBFiles\Data001.fdb to D:\Database\DBFiles\Data240.fdb
When copy database to another location and trying to open it Firebird doesn't locate the files if they are not in D:\ partition
I want Firebird to locate the secondary files under Database\DBFiels folder at the new path.
So if I copy the database to C:\Database\MainDB.fdb
Firebird would open Data001.fdb in new path like C:\Database\DBFiles instead of old path in D:\Database\DBFiles where they were initially created
Can that be done with Firebird? if not, then how it should be done?
Update:
Finally I found out it's not possiable to change Firebird database secondary files usign Firebird.
but I found this Firebird FAQ mention GLINK tool but It doesn't support Firebird 3.x so I didn't test it, and It's not recommended to use it even with supported versions of Firebird.
Done what exactly?
UPD. I edited the very vague original question to make clear WHAT the topic starter wants.
You can not reliably "copy files with Firebird" - Firebird is not files copying tool. You can to a degree use EXTERNAL TABLE for raw files access, but very limited and not upon the database itself.
It is dangerous practice to "copy databases" while Firebird is working, because you would only copy part of the data. The recently updated data that is in memory cache but did not yet made it on disk would be lost. The database file would be inconsistent with some data updated and some not yet. When you "copy database files" you have first to shutdown either those databases or even the whole Firebird server.
Firebird has it's own tools for moving databases around - and those are called backup/restore tools. Maybe what you need is nbackup tool, if gbak is too slow for you.
Finally, you can list files that comprise the database. You can do it via gstat utility or via "Services API" it uses. You also can select from RDB$FILES system table. However what would you do after you did it? The very access to the database makes it badly suited for consequent copying (#2). You would perhaps need to shutdown database, turn it to read-only AND single-user state, and only then attach to it and read RDB$FILES. And after copying done - you would have to de-shutdown the database. Kinda much more complex than nbackup.
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gstat-example-header.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gfix-dbstartstop.html
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref-appx04-files.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gbak.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/nbackup.html

Database restore from a hacked system

A linux VM with postgres 9.4 was hacked into. (Two processes taking 100% cpu, weird files in /tmp, did not reoccur after kill(s) and restart.) It was decided to install the system from scratch on a new machine (with postgres 9.6). The only data needed was in one of postgres databases. A pg_dump of the database was made after the attack.
Regardless of whether the data - the tables/rows/etc. - were modified during the attack: is it safe to restore the database in the new system?
I consider using pg_restore with the -O option (ignores the user permissions)
The two dangers are:
important data could have been modified
back doors could have been installed in your database
With the first, you're on your own how to verify that your data are ok. The safest thing would be to use a backup from before the machine was compromized, but this would mean data loss.
For the second, I would run a pg_dumpall -s and spend a day reading it carefully. Compare it with a dump from a backup made before the breach. Watch out for weird object and column names and functions with SECURITY DEFINER.

Backing up the DB vs. backing up the VM

We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our back up strategy and have two probable options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I'm with the latter as I think, I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.

Best method for PostgreSQL incremental backup

I am currently using pg_dump piped to gzip piped to split. But the problem with this is that all output files are always changed. So checksum-based backup always copies all data.
Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?
For instance, if pg_dump could make everything absolutely ordered, so all changes are applied only at the end of the dump, or similar.
Update: Check out Barman for an easier way to set up WAL archiving for backup.
You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level=archive, then do a full filesystem-level backup (between issuing pg_start_backup() and pg_stop_backup() commands) and then just copy over newer WAL files by configuring the archive_command option.
Advantages:
Incremental, the WAL archives include everything necessary to restore the current state of the database
Almost no overhead, copying WAL files is cheap
You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)
Disadvantages:
More complicated to set up than pg_dump
The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
Does not work well for write-heavy databases, since recovery will take a long time.
There are some tools such as pitrtools and omnipitr that can simplify setting up and restoring these configurations. But I haven't used them myself.
Also check out http://www.pgbackrest.org
pgBackrest is another backup tool for PostgreSQL which you should be evaluating as it supports:
parallel backup (tested to scale almost linearly up to 32 cores but can probably go much farther..)
compressed-at-rest backups
incremental and differential (compressed!) backups
streaming compression (data is compressed only once at the source and then transferred across the network and stored)
parallel, delta restore (ability to update an older copy to the latest)
Fully supports tablespaces
Backup rotation and archive expiration
Ability to resume backups which failed for some reason
etc, etc..
Another method is to backup to plain text and use rdiff to create incremental diffs.

Can I just backup postgres's directory while it's running?

I need to back up my database but I don't have enough disk space to dump it. Can I just use duplicity to perform incremental backups on the data directory? Would that corrupt the backup somehow? I don't mind a few of the latest rows missing, but I would like my backup to not be destroyed.
Does anyone know what the case is?
Thanks!
See this page. I believe duplicity uses rsync type mechanism, so you cannot simply grab the directory and go - see the page of that page about rsync. If you need to do a file system level backup, while online, then you'll need some sort of atomicity like snapshots.
Most likely, the backup simply wouldn't work.
Postgres has lots of backup options though, like PITR. I suggest a read through the fine manual.
No, you can't backup the data directory while the database service is running. You could backup the WAL-segments for a point in time recovery when you want to restore. You have to make sure you test your recovery as well, it's a litte more complicated then pgrestore of an ordinary dump.