We have a custom backup solution utilizing the IBX controls in Delphi to perform nightly automatic backups. As part of our current validation for a successful backup, we read the output logs generated by the backup, looking for the "closing file, committing and finishing" verbiage that appears last in the log file. Additionally, we perform a full restore to a separate area to ensure the ibk file is valid. That's turning out to be problematic in terms of available drive space, so we're looking for other ideas to make sure the backup is successful.
How else might we ensure that our ibk file is valid?
Jeff,
Not sure what your database size or backup file size is, and if they are too big for the remaining disk space. Can you share the database and backup size details?
Older InterBase (2017 and earlier) had a way for the command line tool, gbak, to pipe the output from the backup to another gbak process restoring from the backup. This would allow you to save the disk space on the backup file. But, since you are using the IBX backup/restore service, this is not possible. Also, InterBase 2020 has a different backup format which requires random (not sequential) write access to the backup file, thereby not allowing any pipe output even via the 'gbak' command line tool.
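For reference, the old command-line pipe trick looked roughly like the sketch below; the exact stdin/stdout keywords can differ between versions, and as noted it is not available through the IBX service API or with the InterBase 2020 backup format (user, password and paths are placeholders):

    # backup streamed straight into a restore, so no intermediate .ibk file is written
    gbak -b -user SYSDBA -password masterkey /data/mydb.ib stdout \
      | gbak -c -user SYSDBA -password masterkey stdin /validate/mydb_check.ib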
Here are a couple of ways to "reduce" the disk storage requirements that may work for you.
** Backup file **
You can have the InterBase backup service (from your application) store the target backup file in an external storage medium (HDD, USB stick etc., or a SAN disk/network file share). The backup/restore service can read/write backup file(s) from network shares/external medium.
** Restored database **
When restoring the database you can use the service parameter option UseAllSpace (http://docwiki.embarcadero.com/Libraries/Sydney/en/IBX.IBServices.TRestoreOptions), equivalent to gbak option "-use_all_space". This will save you about 20% space on restored data pages.
Turn off index creation, thereby reducing page consumption (possibly quite a bit, depending on your index definitions). But you will lose index validation because of this. This is the "DeactivateIndexes" option (gbak option "-inactive") documented on the same page linked above.
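For reference, the gbak equivalents of those two restore options look roughly like this (database names, user and password are placeholders; with IBX you would set the corresponding TRestoreOptions flags instead):

    # restore without reserving space for back versions and without rebuilding indices
    gbak -c -use_all_space -inactive \
         -user SYSDBA -password masterkey /backups/nightly.ibk /validate/check.ib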
Restore the database to a remote InterBase server with its own storage medium, or, to an attached USB stick or SAN disk. Since you are using the restored database only for validating the backup file, you can have this restored database on a slower I/O medium or a slower server over the network.
Don't shoot me, I'm only the OP!
When we need to back up our DB, we are always able to shut down PostgreSQL completely. After it is down, I found I could copy the "/base" directory with the binary data in it to another location, checksum it for accuracy, and later restore it if/when necessary. This has even worked when upgrading to a later version of PostgreSQL. Integrity of the various 'conf' files is not an issue, as that is handled elsewhere (i.e. by other processes/procedures) in the system.
Is there any risk to this approach that I am missing?
The "File System Level Backup" link in Abelisto's comment is what JoeG is talking about doing. https://www.postgresql.org/docs/current/static/backup-file.html
To be safe I would go up one more level, to "main" on our Ubuntu systems, to take the snapshot, and thoroughly go through the caveats of doing file-level backups. I was tempted to post the caveats here, but I'd end up quoting the entire page.
The thing to be most aware of (in a 'simple' Postgres environment) is the relationship between the postgres database, a user database, and the pg_clog and pg_xlog files. If you only copy "base" you lose the transaction and WAL information, and in more complex installations, other 'necessary' information as well.
If the caveats listed there do not apply to your environment, and you can do a full shutdown, this is a valid backup strategy, which can be much faster than a pg_dump.
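A minimal sketch of that strategy, assuming an Ubuntu-style layout with the cluster under /var/lib/postgresql/9.4/main (the version and paths are placeholders):

    # stop the cluster completely so the files are quiescent
    sudo service postgresql stop
    # archive the whole data directory ("main"), not just base/
    STAMP=$(date +%F)
    sudo tar -czf /backups/pg_main_$STAMP.tar.gz -C /var/lib/postgresql/9.4 main
    # record a checksum so the copy can be verified before any restore
    sha256sum /backups/pg_main_$STAMP.tar.gz > /backups/pg_main_$STAMP.sha256
    sudo service postgresql start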
I am currently moving some data around and I am running into an interesting issue.
I have a CentOS server (6.3) up and running with Postgres 9.2 on a server with limited built-in disk space; however, I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage device for my database and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been googling how to set the tablespace of pg_largeobject and I did find some results, but they are horribly outdated.
I did find one article that looks promising, but I'm hesitant because the thread also references that things will/should have changed.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?
As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage uses a reliable network block device like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc) to the remote storage
Adjust the permissions on the parent directories of the datadir's new location so that the postgres user (via the user, group, or others permission bits) has at least execute permission on each parent directory and can traverse the tree.
Adjust your system startup scripts to set the new location as the PostgreSQL datadir or symlink the old datadir location (output by SHOW data_directory) to the new location.
Start PostgreSQL
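A rough sketch of those steps for a PGDG-style install on CentOS (the service name and paths are guesses for your setup; adjust them to match the output of SHOW data_directory):

    psql -U postgres -c "SHOW data_directory;"        # note the current datadir
    sudo service postgresql-9.2 stop
    sudo rsync -a /var/lib/pgsql/9.2/data/ /mnt/netstore/pgdata/
    sudo chown -R postgres:postgres /mnt/netstore/pgdata
    sudo chmod o+x /mnt /mnt/netstore                 # postgres must be able to traverse every parent dir
    sudo mv /var/lib/pgsql/9.2/data /var/lib/pgsql/9.2/data.old
    sudo ln -s /mnt/netstore/pgdata /var/lib/pgsql/9.2/data
    sudo service postgresql-9.2 start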
Unfortunately, different systems and packages find the datadir in different ways. Debian/Ubuntu use pg_wrapper, for example.
I have read a lot of documentation for DB2 restore, but I could not find how to perform an online restore from the last database backup without rolling forward the logs.
I would appreciate a command example.
For example, my last online backup was made on 1st February. I want to do an ONLINE RESTORE of that backup but without the logs after 1st February (similar to the offline restore option WITHOUT ROLL FORWARD).
I am using DB2 9.7
Thank you in advance
The database backup contains a snapshot of the tablespaces, and they may not be in a stable state. Roll-forward is always required (unless you want to take insane risks by forcing DB2 to start using potentially corrupt data) to reach the nearest stable state.
If you are asking your question because you want manageable database backup dumps without having to worry about shipping logs etc., use the INCLUDE LOGS option when taking the backup. It will include in the backup file the minimum set of transaction logs required for reaching a stable state. When restoring, you can then extract those logs (via the LOGTARGET option) and ROLLFORWARD DATABASE for the typical 0-x seconds required (depending on your database transactions).
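For illustration, the INCLUDE LOGS round trip looks roughly like this on 9.7 (database name, paths and timestamp are placeholders):

    # online backup that embeds the logs needed to reach a stable state
    db2 "BACKUP DATABASE sample ONLINE TO /backups INCLUDE LOGS"
    # restore, extracting the embedded logs to a separate directory
    db2 "RESTORE DATABASE sample FROM /backups TAKEN AT 20130201000000 LOGTARGET /restore_logs"
    # roll forward only to the end of the backup, then stop
    db2 "ROLLFORWARD DATABASE sample TO END OF BACKUP AND STOP OVERFLOW LOG PATH (/restore_logs)"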
A lazy DBA would probably just use RECOVER DB SAMPLE TO 2013-02-01-00.00.00 and let DB2 worry about all the details. It will automatically fetch the required database backup and transaction log files (even from the backup tapes etc. if you set them up correctly) and do everything for you - as long as you don't attempt to manage them manually.
We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our backup strategy and have two probable options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I'm leaning toward the latter, as I think I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this, though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
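For what it's worth, a typical pg_dump/pg_restore pair looks something like this (database names and paths are placeholders):

    # custom-format dump: compressed, and restorable selectively with pg_restore
    pg_dump -Fc -U postgres -f /backups/mysite.dump mysite
    # restoring later into a new database
    createdb -U postgres mysite_restored
    pg_restore -U postgres -d mysite_restored /backups/mysite.dump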
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.
I am currently using pg_dump piped to gzip piped to split. But the problem with this is that all the output files always change, so a checksum-based backup always copies all the data.
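For context, the current pipeline is roughly the following (names and the chunk size are just examples):

    pg_dump mydb | gzip | split -b 1G - /backups/mydb.sql.gz.part_
    # restore: cat /backups/mydb.sql.gz.part_* | gunzip | psql mydb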
Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?
For instance, it would help if pg_dump could make everything absolutely ordered, so that all changes appear only at the end of the dump, or something similar.
Update: Check out Barman for an easier way to set up WAL archiving for backup.
You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level=archive, then do a full filesystem-level backup (between issuing pg_start_backup() and pg_stop_backup() commands) and then just copy over newer WAL files by configuring the archive_command option.
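A minimal sketch of that setup (paths are placeholders; on PostgreSQL 9.6 and later the wal_level value is 'replica' rather than 'archive'):

    # in postgresql.conf:
    #   wal_level = archive
    #   archive_mode = on
    #   archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'

    # one-off base backup taken between the start/stop markers
    psql -U postgres -c "SELECT pg_start_backup('base_backup', true);"
    rsync -a --exclude pg_xlog /var/lib/postgresql/9.3/main/ /backups/base/
    psql -U postgres -c "SELECT pg_stop_backup();"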
Advantages:
Incremental, the WAL archives include everything necessary to restore the current state of the database
Almost no overhead, copying WAL files is cheap
You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)
Disadvantages:
More complicated to set up than pg_dump
The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
Does not work well for write-heavy databases, since recovery will take a long time.
There are some tools such as pitrtools and omnipitr that can simplify setting up and restoring these configurations. But I haven't used them myself.
Also check out http://www.pgbackrest.org
pgBackrest is another backup tool for PostgreSQL which you should be evaluating as it supports:
parallel backup (tested to scale almost linearly up to 32 cores but can probably go much farther..)
compressed-at-rest backups
incremental and differential (compressed!) backups
streaming compression (data is compressed only once at the source and then transferred across the network and stored)
parallel, delta restore (ability to update an older copy to the latest)
Fully supports tablespaces
Backup rotation and archive expiration
Ability to resume backups which failed for some reason
etc, etc..
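For a feel of what that looks like in practice, a minimal configuration and the main commands are roughly as follows (the stanza name and paths are placeholders, and option names vary a little between pgBackRest versions):

    # /etc/pgbackrest.conf
    #   [global]
    #   repo1-path=/backups/pgbackrest
    #   [main]
    #   pg1-path=/var/lib/postgresql/9.6/main
    #
    # postgresql.conf must archive WAL through pgBackRest:
    #   archive_mode = on
    #   archive_command = 'pgbackrest --stanza=main archive-push %p'

    pgbackrest --stanza=main stanza-create
    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main --type=incr backup     # later, incremental backups
    pgbackrest --stanza=main --delta restore        # update an existing copy in place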
Another method is to back up to plain text and use rdiff to create incremental diffs.
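A rough sketch of that approach with librsync's rdiff (filenames are placeholders):

    pg_dump mydb > dump_new.sql
    rdiff signature dump_old.sql dump_old.sig                # signature of the previous dump
    rdiff delta dump_old.sig dump_new.sql dump_new.delta     # small incremental diff to keep
    # later, rebuild the new dump from the old one plus the delta:
    rdiff patch dump_old.sql dump_new.delta dump_rebuilt.sql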