PostgreSQL backup with smallest output files - postgresql-8.1

We have a PostgreSQL database that is over 732 GB when backed up as a file system backup. With pg_dump we can get it down to 585 GB. If I combine pg_dump with the PITR method, will that give me the best backup with the smallest backup file size? My plan was to run pg_start_backup, then pg_dump, then pg_stop_backup. I know the documentation says to run a file system backup, but I want a smaller backup data set. I would then copy off the WAL files and back them up at night.

To truly get the smallest file, you'll have to try compressing your pg_dump -Fc dump file with one of the many compression tools and settings available. Using gzip or xz at maximum compression would be a start. This will, of course, require a fast CPU and a lot of CPU time.
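For instance, a minimal sketch of the two obvious routes (the database name mydb is a placeholder):
pg_dump -Fc -f mydb.dump mydb          # custom format, already compressed at a moderate level
xz -9 mydb.dump                        # recompress with xz; produces mydb.dump.xz
pg_dump -Fc -Z 0 mydb | xz -9 > mydb.dump.xz     # alternative: disable the internal compression and let xz see the raw data
Note that pg_restore cannot read an xz archive directly, so decompress the file first (or pipe it through xz -dc) before restoring.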

Related

Is it possible to restore a MongoDB database from a .bson file quicker than mongorestore?

I have a huge database from an old backup. It's about 500 GB in total, stored as a single .bson file. At the current rate of my hard drive and CPU, I'll probably be done in 10-20 hours. EDIT: about 9 hours.
I simply ran:
mongorestore -d database -c collection C:\very_large_backup.bson
Is it possible for MongoDB to simply access the .bson file directly, or is mongorestore the only option I have?
I plan on moving this data to Microsoft SQL Server (discarding the extra bits of information that might overlap). Maybe there's a faster route that way?

Restore a PostgreSQL backup

I am new to PostgreSQL, and I have a large dataset that is a PostgreSQL backup. I am having trouble importing this dataset into my PostgreSQL installation.
It is in "pgdata" format and consists of several files and folders. One of these folders (the base folder) holds all the main files (2000 files, each of which is 1 GB), but all of these files are plain files with no extension.
I would be grateful if you could give me some advice on this issue and help me restore this backup.
Best,
From your description I guess that you have a physical backup, a copy of the data files like pg_basebackup creates.
If there is a backup_label file in the backup, and all the required WAL files are in the pg_xlog (or pg_wal) directory, then all you have to do is start the server on the data directory (pg_ctl start -D <directory here>) and wait until recovery has completed.
Then you can use pg_dump and pg_restore to extract the data from this new PostgreSQL cluster and import it into the destination.
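A minimal sketch of that sequence, with /restore/data, mydb and destination-host all standing in for your actual paths and names:
pg_ctl start -D /restore/data                        # start a server on the unpacked backup and let recovery finish
pg_dump -Fc -f mydb.dump mydb                        # extract the data from the recovered cluster
pg_restore -h destination-host -d mydb mydb.dump     # load it into the destination server
The target database has to exist on the destination before pg_restore runs (or use the -C option).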

Backup taken from pgAdmin is smaller than backup taken from pg_dump

Hello experts, I am using PostgreSQL 9.5. When I take a backup from pgAdmin it is 950 MB, but when I take a backup of the same database with the pg_dump.exe command the backup is 7.5 GB. I am confused about which backup file is safe for me to use for a restore. The restore process is also slow in PostgreSQL. Please help me.
When you back something up in pgAdmin, it just calls pg_dump with the appropriate options, so both of your backups are made by the same pg_dump utility.
I guess you're comparing dumps in two different formats.
The default format for pg_dump is plain, which is basically an enormous uncompressed SQL file.
As for pgAdmin, it uses the custom format by default, which is a highly compressed binary format.
Also note that pgAdmin always displays the actual pg_dump command used to create your dump in the log window, along with its full output.
You should be able to run this command in your command prompt to generate an identical backup file.
You can read more about the different output formats and other pg_dump options in the PostgreSQL docs.
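To see the difference yourself, the two invocations below (the database name is a placeholder) correspond to the large plain-text dump and the small compressed one:
pg_dump -Fp -f backup.sql mydb       # plain format: a big uncompressed SQL script, restored with psql
pg_dump -Fc -f backup.dump mydb      # custom format: compressed, restored with pg_restore (this is what pgAdmin runs for you)
Both contain the same data, so either one is safe to restore from.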

A faster way to copy a PostgreSQL database (or the best way)

I did a pg_dump of a database and am now trying to load the resulting .sql file onto another server.
I'm using the following command.
psql -f databasedump.sql
I started loading the database earlier today, and now, 7 hours later, the database is still being populated. I don't know if this is how long it is supposed to take, but I continue to monitor it; so far I've seen over 12 million inserts and counting. I suspect there's a faster way to do this.
Create your dumps with
pg_dump -Fc -Z 9 --file=file.dump myDb
-Fc
Output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.
-Z 9: --compress=0..9
Specify the compression level to use. Zero means no compression. For the custom archive format, this specifies compression of individual table-data segments, and the default is to compress at a moderate level. For plain text output, setting a nonzero compression level causes the entire output file to be compressed, as though it had been fed through gzip; but the default is not to compress. The tar archive format currently does not support compression at all.
and restore it with
pg_restore -Fc -j 8 -d myDb file.dump
-j: --jobs=number-of-jobs
Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.
Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.
The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.
Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
Links:
pg_dump
pg_restore
Improve pg_dump & restore
PG_DUMP | always use the directory format with the -j option
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf when restoring the directory format with the -j option
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
For more info
https://gitlab.com/yanar/Tuning/wikis/improve-pg-dump&restore
Why are you producing a raw .sql dump? The opening description of pg_dump recommends the "custom" format -Fc.
Then you can use pg_restore which will restore your data (or selected parts of it). There is a "number of jobs" option -j which can use multiple cores (assuming your disks aren't already the limiting factor). In most cases, on a modern machine you can expect at least some gains from this.
Now you say "I don't know how long this is supposed to take". Well, until you've done a few restores you won't know. Do monitor what your system is doing and whether you are limited by CPU or disk I/O.
Finally, the configuration settings you want for restoring a database are not those you want to run it. A couple of useful starters:
Increase maintenance_work_mem so you can build indexes in larger chunks
Turn off fsync during the restore. If your machine crashes, you'll start from scratch again anyway.
Do remember to reset them after the restore though.
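As an illustration only (the numbers are assumptions, not recommendations), the postgresql.conf overrides for the duration of a restore could look like this; put the old values back and restart once the restore is done:
maintenance_work_mem = 2GB     # build indexes in larger chunks
fsync = off                    # unsafe for normal operation; acceptable only while restoring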
pg_dump is generally recommended to be paired with pg_restore rather than psql. The restore can be split across cores to speed up the loading process by passing the --jobs flag, like so:
$ pg_dump -Fc db > db.Fc.dump
$ pg_restore -d db --jobs=8 db.Fc.dump
Postgres themselves have a guide on bulk loading of data.
I would also recommend heavily tuning your postgresql.conf file and setting appropriately high values for maintenance_work_mem and checkpoint_segments; higher values for these can dramatically increase your write performance.
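Purely as a hedged starting point (the right numbers depend entirely on your hardware), that might mean something like:
maintenance_work_mem = 1GB
checkpoint_segments = 64       # pre-9.5 setting; replaced by max_wal_size in later releases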

Best method for PostgreSQL incremental backup

I am currently using pg_dump piped to gzip piped to split. The problem with this is that all of the output files change every time, so a checksum-based backup always copies all of the data.
Are there any other good ways to perform an incremental backup of a PostgreSQL database, where a full database can be restored from the backup data?
For instance, if pg_dump could make everything absolutely ordered, so that all changes are applied only at the end of the dump, or something similar.
Update: Check out Barman for an easier way to set up WAL archiving for backup.
You can use PostgreSQL's continuous WAL archiving method. First you need to set wal_level=archive, then do a full filesystem-level backup (between issuing pg_start_backup() and pg_stop_backup() commands) and then just copy over newer WAL files by configuring the archive_command option.
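A bare-bones sketch of that setup (the archive directory and backup label are placeholders; the first three lines go in postgresql.conf, the two SELECTs are run from psql around the filesystem copy):
wal_level = archive
archive_mode = on
archive_command = 'cp %p /mnt/backup/wal/%f'
SELECT pg_start_backup('base');    -- before copying the data directory
SELECT pg_stop_backup();           -- after the copy has finished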
Advantages:
Incremental, the WAL archives include everything necessary to restore the current state of the database
Almost no overhead, copying WAL files is cheap
You can restore the database at any point in time (this feature is called PITR, or point-in-time recovery)
Disadvantages:
More complicated to set up than pg_dump
The full backup will be much larger than a pg_dump because all internal table structures and indexes are included
Does not work well for write-heavy databases, since recovery will take a long time.
There are some tools such as pitrtools and omnipitr that can simplify setting up and restoring these configurations. But I haven't used them myself.
Also check out http://www.pgbackrest.org
pgBackRest is another backup tool for PostgreSQL which you should evaluate, as it supports:
parallel backup (tested to scale almost linearly up to 32 cores, but can probably go much farther)
compressed-at-rest backups
incremental and differential (compressed!) backups
streaming compression (data is compressed only once at the source and then transferred across the network and stored)
parallel, delta restore (the ability to update an older copy to the latest)
full tablespace support
backup rotation and archive expiration
the ability to resume backups which failed for some reason
and more.
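For a flavour of the workflow (the stanza name is a placeholder, and this sketch assumes pgbackrest.conf is already configured and archive_command calls pgbackrest archive-push):
pgbackrest --stanza=main stanza-create          # one-time setup for the cluster
pgbackrest --stanza=main --type=full backup     # periodic full backup
pgbackrest --stanza=main --type=incr backup     # cheap incremental backup in between
pgbackrest --stanza=main restore                # rebuild the data directory from the repository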
Another method is to back up to plain text and use rdiff to create incremental diffs.
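A rough sketch of that approach with librsync's rdiff (all file names are placeholders):
pg_dump mydb > monday.sql                                   # one full plain-text dump kept as the base
pg_dump mydb > tuesday.sql                                  # today's dump
rdiff signature monday.sql monday.sig                       # signature of the base file
rdiff delta monday.sig tuesday.sql tuesday.delta            # the small incremental piece you actually store
rdiff patch monday.sql tuesday.delta tuesday_restored.sql   # rebuild today's dump from base + delta
This only pays off when successive dumps are mostly identical byte-for-byte, which is exactly the ordering issue raised in the question.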