Migrating 200GB of Postgres data from 9.0 to 9.6

We have a simple database with just 5 tables. But 1 table is huge, around 100GB of data by itself, and the indices together are nearly double that size. The server is an old CentOS 5 server with PG 9.0. I'm moving to a more modern setup with SSD hard disks, CentOS 7, and PG 9.6.
Question: what's the simplest reliable way to migrate the data? pg_dump it on the old server, move it via rsync or something to the new server, and pg_restore? I could do the pg_dump with the -Fc option so that we can pg_restore it easily (otherwise it's a text format and we have to use psql -f instead). But a trial run suggested that while the pg_dump is OK, the pg_restore on the destination server, which is much faster hardware, goes on and on. We ran pg_restore --verbose, but it printed nothing at all. Perhaps the server was stuck doing IO?
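For reference, the trial run was essentially this (the database name mydb, the paths, and the newserver hostname are placeholders):
# on the old server (PG 9.0)
pg_dump -Fc mydb -f /backup/mydb.dump
# copy the dump to the new server
rsync -av --progress /backup/mydb.dump newserver:/backup/
# on the new server (PG 9.6); adding -j4 would parallelise the restore of a custom-format dump
createdb mydb
pg_restore --verbose -d mydb /backup/mydb.dump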
Our postgresql.conf settings on the new server for the pg_restore are as follows:
maintenance_work_mem = 1500MB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
What should we do to ensure that the pg_restore works? Right now both servers are offline, so I can do pretty much anything needed -- any settings can be changed.
Some more background info--
Old server: CentOS 5, SCSI RAID 1 disks, 4GB RAM (not much), PG 9.0
New server: CentOS 7 (latest), SSD disk, 16GB RAM, PG 9.6
Thank you for any pointers on moving large tables in the best way possible. The usual PG documentation doesn't seem to be helping. We've tried both the text dump way and the -Fc way.

I strongly suggest you use pg_upgrade:
Install PostgreSQL 9.0.23 on the new server, from source if necessary.
Set up a streaming replica of the 9.0 master on the new server using a base backup and a suitable recovery.conf; note that pg_basebackup only appeared in 9.1, so on 9.0 the base backup is taken manually with pg_start_backup()/rsync/pg_stop_backup() (a sketch follows these steps). Enable WAL archiving and a restore_command too, in case the replica becomes desynchronised for any reason.
Also install 9.6 on the new server.
Do an upgrade test by stopping the replica and attempting a pg_upgrade to 9.6. Restart the replica, fix any issues and repeat until you succeed.
When you're confident pg_upgrade will succeed, plan a cut-over time. Stop the 9.0 master and stop the replica. pg_upgrade the replica. Start the new 9.6 server.
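A rough sketch of the 9.0 replica setup and upgrade test, with hostnames, the replica_user role, and paths as placeholders (adapt to your layout; this is an outline, not a drop-in script):
# on the old (9.0) master: enable streaming replication
# postgresql.conf:
#   wal_level = hot_standby
#   max_wal_senders = 3
#   wal_keep_segments = 128
# pg_hba.conf:
#   host  replication  replica_user  NEW_SERVER_IP/32  md5
# (restart the master after changing wal_level / max_wal_senders)

# take a manual base backup onto the new server (9.0 has no pg_basebackup)
psql -U postgres -c "SELECT pg_start_backup('migration', true);"
rsync -a --exclude pg_xlog --exclude postmaster.pid \
    /var/lib/pgsql/9.0/data/ newserver:/var/lib/pgsql/9.0/data/
psql -U postgres -c "SELECT pg_stop_backup();"

# on the new server: /var/lib/pgsql/9.0/data/recovery.conf
#   standby_mode = 'on'
#   primary_conninfo = 'host=OLD_SERVER_IP user=replica_user password=...'
#   restore_command = 'cp /path/to/wal_archive/%f %p'

# once the replica has caught up: stop both clusters and test the upgrade on the new server
/usr/pgsql-9.6/bin/pg_upgrade \
    --old-bindir=/usr/pgsql-9.0/bin --new-bindir=/usr/pgsql-9.6/bin \
    --old-datadir=/var/lib/pgsql/9.0/data --new-datadir=/var/lib/pgsql/9.6/data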
See the pg_upgrade documentation for more info.
Remember: KEEP BACKUPS.
If you want simple, just pg_dumpall and pipe it to psql. But that'll be slow, and it'll cause problems if the restore fails partway through and you then try to resume, etc.
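In its simplest form that is (host names are placeholders; run it from the new server so the newer 9.6 client tools do the dumping):
# one long, serial stream: no parallelism and no easy way to resume if it fails
pg_dumpall -h oldserver -U postgres | psql -h localhost -U postgres -d postgres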
Better:
If you don't want to use replication, then use parallel-mode pg_dump and pg_restore with directory format input/output if you want to get things done quickly.
Configure your 9.0 database to accept connections from the 9.6 host and make sure there's a high-performance network connection (gigabit or better).
On the 9.6 host, using the 9.6 versions of pg_dump and pg_dumpall (a consolidated sketch follows these steps):
Dump your global objects with pg_dumpall --globals-only -f globals.sql
Dump your database(s) with pg_dump -Fd -j4 -d dbname -f dbname.dumpdir or similar. -j is the number of parallel jobs. You'll need to dump each database separately if there are multiple ones.
Cleanly initdb a new PostgreSQL 9.6 install, removing whatever attempts you have previously made (since I don't know what is/isn't there). Alternatively, DROP any created roles, databases, etc., returning it to a clean state.
Use psql to run the globals script: psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
Use pg_restore to load the database dumps: pg_restore --create -d template1 -j4 dbname.dumpdir, repeating for each dumped DB. You can restore multiple DBs concurrently.
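Putting those steps together, the whole run looks roughly like this (the hostname oldserver and dbname are placeholders; add -h/-U connection options to match your setup):
# on the 9.6 host, with the 9.6 client tools
pg_dumpall -h oldserver -U postgres --globals-only -f globals.sql
pg_dump -h oldserver -U postgres -Fd -j4 -f dbname.dumpdir dbname
#   (dumping a pre-9.2 server in parallel may need --no-synchronized-snapshots,
#    which is only safe while nothing else is writing to the database)

# into the freshly initdb'd local 9.6 cluster
psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
pg_restore --create -d template1 -j4 dbname.dumpdir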
Yes, I know the handling of global objects sucks. And yes, it'd be nice if all this were wrapped up in a simple command. But it isn't. Designs and well thought out patches are welcome if you want to try to improve this. So far nobody's wanted to enough to do the work.

Related

Update Postgresql in Ubuntu - pg_upgrade vs pg_upgradecluster

I would like to switch from Postgres 9.6 to version 14 on Ubuntu 21.04. I have a cluster with 3 databases.
I would like to know what is the difference between upgrading with pg_upgrade and pg_upgradecluster? Which one is faster and safer?
pg_upgrade is a tool from PostgreSQL itself that operates on a single database cluster (data directory).
pg_upgradecluster, however, is a wrapper provided by your operating system (here, Ubuntu) around pg_upgrade or pg_dump/pg_restore. In addition to very conveniently upgrading your database, it also does some housekeeping, like moving the config files to the correct folder under /etc/postgresql/.
So, if you have set up your database by pg_createcluster and it is hence listed by pg_lsclusters, I'd strongly recommend using pg_upgradecluster to upgrade it.
In terms of "faster vs. safer", be sure to read about the various options on the manpage.
If you can take a reliable backup (e.g. snapshot), you can safely use the -m upgrade --link option which will be fastest and allow for a very short downtime (depending on database size and resources, but I've recently upgraded a 700GB database in ~25 seconds).
The safest option, of course, is not using pg_upgrade but the default pg_dump/pg_restore method, which will shut down your original database and copy the data to a new database in a new location (i.e. it will use approximately twice the space, at least temporarily, until you decide to delete the original folder).
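On Ubuntu that boils down to something like the following (the cluster name main and the version numbers are assumptions; check pg_lsclusters for yours, and read the pg_upgradecluster manpage before relying on the exact flags):
pg_lsclusters                                   # list clusters managed by the Debian/Ubuntu wrappers
sudo pg_upgradecluster -v 14 -m upgrade --link 9.6 main
# after verifying the new 14/main cluster, drop the old one to reclaim space:
sudo pg_dropcluster 9.6 main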

Why is pg_restore that slow and PostgreSQL almost not even using the CPU?

I just had to use pg_restore with a small dump of 30MB and it took on average 5 minutes! On my colleagues' computers, it is ultra fast, like a dozen seconds. The difference between the two is the CPU usage: while for the others the database uses quite a bunch of CPU (60-70%) during the restore operation, on my machine it stays at only a few percent (0-3%), as if it was not active at all.
The exact command was: pg_restore -h 127.0.0.1 --username XXX --dbname test --no-comments test_dump.sql
The originating command to produce this dump was: pg_dump --dbname=XXX --user=XXX --no-owner --no-privileges --verbose --format=custom --file=/sql/test_dump.sql
The screenshot taken in the middle of the restore operation and the corresponding vmstat 1 output while running the command are not reproduced here.
I've searched the web for a solution for a few hours, but this under-usage of the CPU remains quite mysterious. Any ideas will be appreciated.
For the stack: I am on Ubuntu 20.04 and Postgres 13.6 is running in a Docker container. I have decent hardware, neither bad nor great.
EDIT: This very same command worked in the past on my machine with the same ordinary HDD, but now it is terribly slow. The only difference I see compared to the others (for whom it is blazing fast) is really the CPU usage (even though they have SSDs, which shouldn't be the limiting factor at all, especially with a 30 MB dump).
EDIT 2: For those who suggested the problem was I/O-boundness and maybe a slow disk, I tried, without much conviction, to run my command on an SSD partition I just created, and nothing changed.
The vmstat output shows that you are I/O bound. Get faster storage, and performance will improve.
PostgreSQL is tuned for data durability by default. Transactions are normally flushed to disk at each and every commit, forcing a write-through of any disk write cache, so the workload ends up I/O-bound rather than CPU-bound.
When restoring a database from a dump file, it can make sense to lower these durability settings, especially if the restore is done while your application is offline, and even more so in non-production environments.
I temporarily run postgres with these options: -c fsync=off -c synchronous_commit=off -c full_page_writes=off -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
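Since the question's server runs inside Docker, one way to apply those options temporarily is to pass them straight to the postgres command when starting the container (the image tag, container name, port and password here are placeholders):
docker run -d --name pg-restore-test -e POSTGRES_PASSWORD=secret -p 5432:5432 \
    postgres:13.6 \
    -c fsync=off -c synchronous_commit=off -c full_page_writes=off \
    -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
# remember to go back to the default, durable settings once the restore is done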
Refer to these documentation sections for more information:
14.4.9. Some Notes about pg_dump
14.5. Non-Durable Settings.
Also this article:
Settings for a fast pg_restore

Restoring PostgreSQL 11 backup to 12 hangs. How can I debug it?

I'm attempting to upgrade Heroku PostgreSQL instances from pg11 to pg12 using the copy method, as my testing environments are on hobby instances. At the end of the process it appears to hang for a long time (it does not exit after more than 30 minutes for a 120MB database). The datastore view suggests everything is fine and I have the same number of rows, but there are issues.
It appears to be the fault of a materialized view. If I connect to the database and look through the tables and views, only one appears to be empty. Using Postico, it waits and waits for the view's structure, but doesn't give the usual warning for an unpopulated view.
I can recreate the stalling behaviour by creating a local pg12 database and attempting to use pg_restore with a recent backup. Along the same lines, I appear to be able to get it working by creating an empty local database, running all the db migrations, truncating all tables and sequences, and then doing a --data-only --disable-triggers load from the same backup. Not a particularly smooth or inspiring migration plan. Using --verbose doesn't show any obvious errors; the last thing I get is that it's creating the problematic materialized view.
I've also set log_statement to all, and the last one I get is that it's refreshing the problematic view. At this point, the postgres command starts using ~100% CPU.
Locally, I'm using this command to restore:
pg_restore --verbose --clean --no-acl --no-owner -h localhost -d database_name database_backup.dump
This is the command we use regularly to restore production backups for local development.
Are there any known gotchas with upgrading from 11 to 12, or ways that I might be able to extract more information about what's going on?
It has probably chosen an appalling plan for doing the materialized view query, due to lack of statistics at the time the refresh was launched.
You could kill the process, then restart the refresh once stats are gathered (which they might already be).
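A sketch of that, with the database and view names as placeholders:
# find and terminate the backend stuck in the refresh
psql -d database_name -c "SELECT pid, state, query FROM pg_stat_activity WHERE query ILIKE '%REFRESH MATERIALIZED VIEW%';"
psql -d database_name -c "SELECT pg_terminate_backend(12345);"   # use the pid reported above
# gather statistics, then redo the refresh with a sensible plan
psql -d database_name -c "ANALYZE;"
psql -d database_name -c "REFRESH MATERIALIZED VIEW problematic_view;"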
If starting from scratch, you could run pg_restore with --section of pre-data and data, then do an ANALYZE, then do post-data.
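For the from-scratch route, using the same dump and flags as the restore command shown above, that might look like:
pg_restore --verbose --no-acl --no-owner --section=pre-data --section=data \
    -h localhost -d database_name database_backup.dump
psql -h localhost -d database_name -c "ANALYZE;"
# the REFRESH MATERIALIZED VIEW entries live in the post-data section,
# so they now run against analyzed tables
pg_restore --verbose --no-acl --no-owner --section=post-data \
    -h localhost -d database_name database_backup.dump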

Database restore from a hacked system

A Linux VM with Postgres 9.4 was hacked into. (Two processes were taking 100% CPU and there were weird files in /tmp; this did not recur after kill(s) and a restart.) It was decided to install the system from scratch on a new machine (with Postgres 9.6). The only data needed was in one of the Postgres databases. A pg_dump of the database was made after the attack.
Regardless of whether the data - the tables/rows/etc. - were modified during the attack: is it safe to restore the database in the new system?
I am considering using pg_restore with the -O option (--no-owner, which skips restoring object ownership).
The two dangers are:
important data could have been modified
back doors could have been installed in your database
With the first, you're on your own in verifying that your data are OK. The safest thing would be to use a backup from before the machine was compromised, but this would mean data loss.
For the second, I would run a pg_dumpall -s and spend a day reading it carefully. Compare it with a dump from a backup made before the breach. Watch out for weird object and column names and functions with SECURITY DEFINER.
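A concrete way to do that comparison (host and file names are placeholders; the "before" dump should come from a restore of a pre-breach backup):
pg_dumpall -s -h new_server -U postgres -f schema_after.sql
pg_dumpall -s -h pre_breach_restore -U postgres -f schema_before.sql
diff -u schema_before.sql schema_after.sql | less
# obvious red flags to search for
grep -n "SECURITY DEFINER" schema_after.sql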

How to migrate data to remote server with PostgreSQL?

How can I dump my database schema and data in such a way that the usernames, database names, and schema names of the dumped data match those on the servers I deploy to?
My current process entails moving the data in two steps. First I dump the schema of the database (pg_dump --schema-only -C -c), then I dump the data with pg_dump --data-only -C, and I restore both on the remote server using psql. But there has to be a better way than this.
We use the following to replicate databases.
pg_basebackup -x -P -D /var/lib/pgsql/9.2/data -h OTHER_DB_IP_ADDR -U postgres
It requires the "master" server at OTHER_DB_IP_ADDR to be running the replication service, and pg_hba.conf must allow replication connections. You do not have to run the "slave" service as a hot/warm standby in order to replicate. One downside of this method compared with a dump/restore is that the restore operation effectively vacuums, re-indexes, and resets EVERYTHING, while replication doesn't, so replicating can use a bit more disk space if your database has been heavily edited. On the other hand, replicating is MUCH faster (15 minutes vs 3 hours in our case) since indexes do not have to be rebuilt.
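The master-side prerequisites amount to roughly this (9.2-era settings; the destination IP address is a placeholder):
# postgresql.conf on the master
#   wal_level = hot_standby
#   max_wal_senders = 3
#   wal_keep_segments = 64
# pg_hba.conf on the master
#   host  replication  postgres  DEST_DB_IP_ADDR/32  md5
# wal_level and max_wal_senders need a restart; a pg_hba.conf change alone only needs a reload
pg_ctl restart -D /var/lib/pgsql/9.2/data
# then run the pg_basebackup command above on the destination server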
Some useful references:
http://opensourcedbms.com/dbms/setup-replication-with-postgres-9-2-on-centos-6redhat-el6fedora/
http://www.postgresql.org/docs/9.2/static/high-availability.html
http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/