pg_dump with -j option and -Z - postgresql

I am about to back up a 120 GB database. I kept failing when using the pgAdmin backup (a VPN disconnection after 7 hours of running) or SQL Maestro (an out-of-memory issue after 3 hours of running).
So I want to run it on the server using pg_dump. The command I want to use is (I want to measure the time as well, so I prefixed it with time):
time pg_dump -j 5 -Fc -Z 1 db_profile_20210714 -f /var/lib/postgresql/backup2/
After that I will run pg_dumpall -g.
I have a 30-core server and a backup drive mounted on NFS, with PostgreSQL 12 running on Ubuntu 12.
Questions:
If I use -Z 0, will it undo the default compression of -Fc? (-Fc is compressed by default.)
Is the usage of -j 5 and -Z 1 counterproductive? I read in an article that to throttle the pg_dump process so that it won't cause an I/O spike, one can use -Z between 3 and 5. But what if someone wants to utilize the cores and compress at the same time; is that effective/efficient?
Thanks

Yes, if you use -Z 0, the custom-format dump will be uncompressed. -j and -Z are independent of each other, and you cannot use -j with the custom format. Whether compression speeds up the dump depends on your bottleneck: if that is the network, compression can help; otherwise, compression usually makes pg_dump slower.
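If the goal is to use the cores, the directory format (-Fd) is the output format that accepts -j; with it, -Z sets the gzip level of the per-table data files, so -Z 1 keeps the compression cost low. A minimal sketch of the command from the question rewritten that way (the output directory name and globals file name are illustrative, and the target directory must not already exist):
time pg_dump -Fd -j 5 -Z 1 -f /var/lib/postgresql/backup2/db_profile_20210714.dir db_profile_20210714
pg_dumpall -g > /var/lib/postgresql/backup2/globals.sql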

Related

How to improve pg_basebackup speed

How can I improve pg_basebackup speed? I cannot find a parallelism option for pg_basebackup. Is there really no parallelism option for pg_basebackup? Thank you. I just want to create the slave database quickly. The database is 5 TB and creating the slave database takes a long time. If there is no parallel option, how can I avoid this time problem?
Command for creating the slave:
pg_basebackup -Xs -h 172.31.34.215 -U repuser --checkpoint=fast -D /var/lib/postgresql/14/ter -R --slot=replication_slot -C

trickle does not limit the bandwidth of gsutil

I have tried to copy a .mp4 file from my local directory to my Google Cloud bucket,
using:
gsutil cp my_filefile.mp4 gs://my_bucket
This part works as expected, but when I try to limit the bandwidth using:
trickle -d 10 -u 10 gsutil cp my_filefile.mp4 gs://my_bucket
the upload happens at the same rate, not at 10 kB/s. I have read that trickle does not handle statically linked executables, which the .mp4 appears to be, since running ldd my_file.mp4 in the terminal returns "not a dynamic executable".
Has anyone experienced the same issue? If so, how was the problem handled, or am I approaching this issue the wrong way?
UPDATE 1:
It turns out it does not matter what file I use; gsutil still bypasses trickle somehow. I have tested whether trickle works with other programs, and it performed as expected, with bandwidth control.
I have also tested gsutil mv and gsutil rsync, with the same results as with cp. I have also tested the bandwidth throttling on an arm64 system, with the same results.
You should limit the number of threads and processes, as described in the documentation. trickle can't be applied when there are multiple processes:
trickle -d 10 -u 10 gsutil -o "GSUtil:parallel_process_count=1" \
-o "GSUtil:parallel_thread_count=1" cp my_filefile.mp4 gs://my_bucket

Postgresql "pg_restore.exe" taking over a day to complete

I'm on a Windows Server 2016 machine. I have run pg_dump.exe on a 3 GB PostgreSQL 9.4 database using the -Fc format.
When I run pg_restore to a local database (9.6):
pg_restore.exe -O -x -C -v -f c:/myfilename
The command has been running for over 24 hours (and is still running).
Similar to this issue: Postgres Restore taking ages (days)
I am using the verbose CLI option, which looks to be spitting out a lot of JSON; I'm assuming that's getting inserted into tables. Task Manager shows the CPU at 0% and 0.06 MB of memory in use. It looks like I should add more jobs next time, but this still seems pretty ridiculous.
I prefer using a Linux machine, but this is what the client provided. Any suggestions?
pg_restore.exe -d {db_name} -O -x c:/myfilename
Did the trick.
I got rid of -C and manually created the database prior to running the command. I also realized that connection options should come before the other options:
pg_restore [connection-option...] [option...] [filename]
See the PostgreSQL documentation for more.
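Since the question mentions adding more jobs next time: with a custom-format archive, pg_restore can also restore in parallel via -j. A sketch of the working command with that added (the job count is illustrative):
pg_restore.exe -d {db_name} -O -x -j 4 c:/myfilename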

MongoDB backup & restore

I have a DB with 150 GB of data. I am using the mongodump and mongorestore method to back up and restore.
My production server is running Mongo 2.2 and my test server is running 2.6.1.
When I take a backup from the production server (Mongo 2.2), it takes a long time to complete the backup of 150 GB of data, and restoration takes 6-8 hours. It doesn't complete without errors; sometimes the restore is dropped automatically and we need to run the restore again or restore the missed collections.
Is there a better backup and restore method where we can save time and run it without errors?
Regards,
Rishi
You have a couple of options for native backup and restore functionality, and these are listed very well in the documentation at http://docs.mongodb.org/manual/administration/backup/.
Just to summarize: as your data grows, mongodump/mongorestore becomes less ideal for backup/restore purposes, and you should start looking at other options like:
File-system snapshots or LVM snapshots (since you are on EC2, this should be fairly straightforward)
MMS Backup
The best method is to back up and restore using LVM on a Linux system.
Creating a Snapshot:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
Archive a Snapshot:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
Restore a Snapshot:
lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Restore Directly from a Snapshot:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Remote Backup Storage:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com gzip > /opt/backup/mdb-snap01.gz
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb

After tuning PostgreSQL, pgbench results are worse

I am testing PostgreSQL on an 8 GB RAM / 4 CPU / 80 GB SSD cloud server from DigitalOcean. I originally ran pgbench with the default settings in postgresql.conf, and then altered some common settings (shared_buffers, work_mem, maintenance_work_mem, effective_cache_size) to reflect the 8 GB of RAM. After running the second set of tests, I noticed that some of my results were actually worse. Any suggestions on why this might be? I am rather new to pgbench and to tuning PostgreSQL in general.
Settings:
shared_buffers = 2048MB
work_mem = 68MB
maintenance_work_mem = 1024MB
effective_cache_size = 4096MB
Tests:
pgbench -i -s 100
pgbench -c 16 -j 2 -T 60 -U postgres postgres
pgbench -S -c 16 -j 2 -T 60 -U postgres postgres
pgbench -c 16 -j 4 -T 60 -U postgres postgres
pgbench -S -c 16 -j 4 -T 60 -U postgres postgres
pgbench -c 16 -j 8 -T 60 -U postgres postgres
pgbench -S -c 16 -j 8 -T 60 -U postgres postgres
How effective are these tests? Is this an effective way to employ pgbench? How should I customize the tests to properly reflect my data and server instance?
What do you mean by "worse"? How long did you run pgbench? This test should be executed for at least about two hours to get realistic values. What version of PostgreSQL do you have?
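For example, one of the runs from the question stretched to the suggested two hours (-T 7200) instead of 60 seconds:
pgbench -c 16 -j 4 -T 7200 -U postgres postgres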
Attention: you should be very careful about interpreting pgbench results. You probably want to optimize the execution of your application, not pgbench. pgbench is good for checking hardware or software, but it is a bad tool for optimizing the PostgreSQL configuration.
The mentioned configuration variables are the basics of configuration, and you probably can't go wrong there (the server must never actively use swap, and these variables ensure that).
A formula that I use:
-- Dedicated server with 8GB RAM
shared_buffers = 1/3 .. 1/4 of dedicated RAM
effective_cache_size = 2/3 of dedicated RAM
maintenance_work_mem = higher than the biggest table (if possible),
                       else 1/10 of RAM,
                       else max_connections * 1/4 * work_mem
work_mem = precise setting is based on slow-query analysis
           (start with about 100MB)
-- must be true:
max_connections * work_mem * 2 + shared_buffers
  + 1GB (O.S.) + 1GB (filesystem cache) <= RAM size
The default values for the WAL buffer size and checkpoint segments are usually too low as well, and you can increase them.
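Putting the formula and the WAL note together for the 8 GB server in the question, a rough sketch might look like the following (illustrative values only, assuming max_connections = 100 and an older release that still uses checkpoint_segments; refine work_mem later from slow-query analysis):
shared_buffers = 2GB                # about 1/4 of RAM
effective_cache_size = 5GB          # about 2/3 of RAM
maintenance_work_mem = 800MB        # about 1/10 of RAM
work_mem = 16MB                     # 100 * 16MB * 2 + 2GB shared_buffers + 2GB (OS + FS cache) fits in 8GB
wal_buffers = 16MB
checkpoint_segments = 32            # pre-9.5 setting; newer versions use max_wal_size instead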