How to migrate data to a remote server with PostgreSQL?

How can I dump my database schema and data in such a way that the usernames, database names and the schema names of the dumped data match these variables on the servers I deploy to?
My current process entails moving the data in two steps. First, I dump the schema of the database (pg_dump --schema-only -C -c), then I dump out the data with pg_dump --data-only -C, and I restore these on the remote server in tandem using the psql command. But there has to be a better way than this.
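For reference, the current two-step flow looks roughly like this (host, database and file names are placeholders):
pg_dump --schema-only -C -c mydb > schema.sql
pg_dump --data-only mydb > data.sql
psql -h remote.example.com -d postgres -f schema.sql
psql -h remote.example.com -d mydb -f data.sql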

We use the following to replicate databases.
pg_basebackup -x -P -D /var/lib/pgsql/9.2/data -h OTHER_DB_IP_ADDR -U postgres
It requires the "master" server at OTHER_DB_IP_ADDR to be running the replication service, and pg_hba.conf must allow replication connections. You do not have to run the "slave" as a hot/warm standby in order to replicate. One downside of this method compared with a dump/restore is that a restore effectively vacuums, re-indexes and resets everything, while replication doesn't, so replicating can use a bit more disk space if your database has been heavily edited. On the other hand, replicating is MUCH faster (15 minutes vs 3 hours in our case) since indexes do not have to be rebuilt.
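For context, the master-side pieces that make this possible look roughly like the following (the slave IP address and auth method are placeholders, and the values are just typical 9.2-era settings):
# postgresql.conf on the master
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 64
# pg_hba.conf on the master: allow replication connections from the slave
host  replication  postgres  SLAVE_IP_ADDR/32  md5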
Some useful references:
http://opensourcedbms.com/dbms/setup-replication-with-postgres-9-2-on-centos-6redhat-el6fedora/
http://www.postgresql.org/docs/9.2/static/high-availability.html
http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/

Related

I have loaded the wrong psql dump into my database, any way to revert?

Ok, I screwed up.
I dumped one of my psql (9.6.18) staging databases with the following command
pg_dump -U postgres -d <dbname> > db.out
And after doing some testing, I "restored" the data using the following command.
psql -f db.out postgres
Notice the absence of the -d option? Yup. That trailing "postgres" was supposed to be the username, but psql took it as the database name.
Annnd as the database happened to have the same name as its user, it overwrote the 'default' database (postgres), which had data that other QAs are using.
I cancelled the operation as soon as I realised my mistake, but the damage was still done. Around 1/3 ~ 1/2 of the database is roughly identical to the staging database - at least in terms of the schema.
Is there any way to revert this? I am still checking whether any of these guys made another dump, but I don't think there are any from the past two to three months. Seems like I've got no choice but to own up and apologise to them in the morning.
Without a recent dump or some sort of PITR or replication setup, you can't easily undo this. The only option is to manually go through the log of what was restored and remove/alter it in the postgres database. This will work for the schema; the data is another matter. FYI, the postgres database should not really be used as a 'working' database. It is there to be a database to connect to for doing other operations, such as CREATE DATABASE, or to bootstrap your way into a cluster. If it had been left empty, then the above would not have been a problem: you could have done, from another database, DROP DATABASE postgres; and then CREATE DATABASE postgres.
Do you have a capture of the output of the psql -f db.out postgres run?
Since the pg_dump didn't specify --clean or -c, it should not have overwritten anything, just appended. And if your tables have unique or primary keys, most of the data copy operations should have failed with unique key violations and rolled back. Even one overlapping row (per table) would roll back the entire dataset for that table.
Without having the output, it will be hard to figure out what damage has actually been done.
You should also immediately copy the pg_xlog data someplace safe. If it comes down to it, you might be able to use pg_xlogdump to figure out what changes committed and what did not.
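For example, something along these lines (the data directory path and segment name are placeholders):
# copy the WAL somewhere safe before anything recycles it
cp -a /var/lib/pgsql/9.6/data/pg_xlog /safe/place/pg_xlog_copy
# later, inspect which transactions committed in a given segment
pg_xlogdump -p /safe/place/pg_xlog_copy 000000010000000000000042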

Backing up a very large PostgreSQL DB "by parts"

I have a PostgreSQL DB that is about 6TB. I want to transfer this database to another server using, for example, pg_dumpall. The problem I have is that I only have a 1TB HD. How can I copy this database to the new server, which has enough space? Let's suppose I cannot get another HD. Is it possible to create partial backup files, upload them to the new server, erase them from the HD and create another batch of backup files until the transfer is complete?
This works here (proof of concept):
a shell command issued from the receiving side
the remote side dumps through the network connection
the local psql just accepts the commands from this connection
the data is never stored in a physical file
(for brevity, I only sent the table definitions, not the actual data: --schema-only)
You could have some problems with users and tablespaces (these are global for an installation in Postgres); pg_dumpall will dump and restore these too, IIRC.
#!/bin/bash
# dump the schema from the remote database and replay it
# straight into the local default database; nothing touches disk
remote=10.224.60.103
dbname=myremotedbname
pg_dump -h ${remote} --schema-only -c -C ${dbname} | psql
#eof
As suggested above, if you have a fast network connection between source and destination you can do it without any extra disk.
However, for a 6 TB DB (which includes indexes, I assume), using the custom archive format (-Fc) could yield a database dump of less than 1 TB.
Regarding the "by parts" question: yes, it is possible using the table pattern (-t, --table):
pg_dump -t TABLE_NAME ...
You can also exclude tables using -T, --exclude-table:
pg_dump -T TABLE_NAME ...
The above options (-t, -T) can be specified multiple times and can even be combined.
They also support patterns for specifying the tables:
pg_dump -t 'employee_*' ...
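Putting those pieces together, a rough sketch of a "by parts" transfer under the 1 TB disk limit (table patterns, host name and paths are invented, and the target database is assumed to already exist on the new server):
# dump one batch of tables in compressed custom format
pg_dump -Fc -t 'log_2016_*' -f batch1.dump mydb
# ship it, restore it remotely, then free the local disk
scp batch1.dump newserver:/tmp/
ssh newserver 'pg_restore -d mydb /tmp/batch1.dump'
rm batch1.dump
# repeat with the next -t / -T batch until every table has been moved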

Best way to make PostgreSQL backups

I have a site that uses PostgreSQL. All the content that I provide on my site is created in a development environment (this is because it's webcrawler content). The only information created in the production environment is information about the users.
I need to find a good way to update the data stored in production. Can I restore to production only the tables updated in the development environment, and will PostgreSQL update those records in production? Or would the better approach be to back up the users' information from production, insert it into development and restore the whole database to production?
Thank you
You can use pg_dump to export the data just from the non-user tables in the development environment and pg_restore to bring that into prod.
The -t switch will let you pick specific tables.
pg_dump -d <database_name> -t <table_name>
https://www.postgresql.org/docs/current/static/app-pgdump.html
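A sketch of the full round trip, using placeholder names throughout:
pg_dump -Fc -d <database_name> -t <table_name> -f tables.dump
pg_restore -h <production_host> -d <database_name> --clean tables.dump
--clean makes pg_restore drop the existing tables before recreating them; leave it off if the tables don't already exist in production.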
There are many tips around this subject here and here.
I'd suggest you take a look at these links before anything else.
If your data is discarded at each update process, then a plain dump will be enough. You can redirect the pg_dump output directly to psql connected to production to avoid the pg_restore step, something like below:
# Of course you must drop the tables to load them again,
# so it'd be reasonable to make a full backup before this
pg_dump -Fp -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db
You might be asking yourself, "Why is he saying to drop my tables?"
Bulk loading data on a fresh table is faster than deleting old data and inserting again. A quote from the docs:
Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
Ps¹: If you can't connect to both environments at the same time, then you will need to do the dump and restore manually.
Ps²: I don't recommend it, but you can append the --clean option to pg_dump to generate DROP statements automatically (see the sketch below). Be extremely careful with this option to avoid dropping unexpected objects.
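A sketch of that --clean variant (same made-up user, host and database names as above):
pg_dump -Fp --clean -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db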

Migrating 200GB of Postgres data from 9.0 to 9.6

We have a simple database with just 5 tables. But 1 table is huge, around 100GB of data by itself, and the indices together are nearly double that size. The server is an old CentOS 5 server with PG 9.0. I'm moving to a more modern setup with SSD hard disks, CentOS 7, and PG 9.6.
Question: what's the best way to migrate the data in a simple way? pg_dump it on the old server, move it via rsync or something to the new server, and pg_restore? I could do the pg_dump with the -Fc option, so that we can pg_restore it easily (otherwise it's a text format and we have to use psql -f instead). But a trial run suggested that while the pg_dump is OK, the pg_restore on the destination server, which is much faster, goes on and on. We did a pg_restore --verbose, but there was no verbosity at all. Perhaps the server was stuck doing IO?
Our postgresql.conf settings for the pg_restore are as follows:
maintenance_work_mem = 1500MB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
What should we do to ensure that the pg_restore works? Right now both servers are offline, so I can do pretty much anything needed -- any settings can be changed.
Some more background info--
Old server: CentOS 5, SCSI RAID 1 disks, 4GB RAM (not much), PG 9.0
New server: CentOS 7 (latest), SSD disk, 16GB RAM, PG 9.6
Thank you for any pointers on moving large tables in the best way possible. The usual PG documentation doesn't seem to be helping. We've tried both the text dump way and the -Fc way.
I strongly suggest you use pg_upgrade:
Install 9.0.23 on the new server. From source if necessary.
Set up a streaming replica on the new server using pg_basebackup and a suitable recovery.conf (a minimal sketch follows this list). Enable WAL archiving and a restore_command too, in case it becomes desynchronised for any reason.
Also install 9.6 on the new server.
Do an upgrade test by stopping the replica and attempting a pg_upgrade to 9.6. Restart the replica, fix any issues and repeat until you succeed.
When you're confident pg_upgrade will succeed, plan a cut-over time. Stop the 9.0 master and stop the replica. pg_upgrade the replica. Start the new 9.6 server.
See the pg_upgrade documentation for more info.
Remember: KEEP BACKUPS.
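For the streaming-replica step above, a minimal recovery.conf sketch on the 9.0 replica (the IP address, user and archive path are placeholders):
standby_mode = 'on'
primary_conninfo = 'host=OLD_SERVER_IP port=5432 user=replication_user'
restore_command = 'cp /path/to/wal_archive/%f "%p"'
The 9.0 master also needs wal_level = hot_standby, max_wal_senders > 0 and a pg_hba.conf replication entry, much like the pg_basebackup setup in the first answer above.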
If you want simple, just pg_dumpall and then pipe to psql. But that'll be slow, and it'll cause problems if your restore fails partway through and you then try to resume, etc.
Better:
If you don't want to use replication, then use parallel-mode pg_dump and pg_restore with directory format input/output if you want to get things done quickly.
Configure your 9.0 database to accept connections from the 9.6 host and make sure there's a high-performance network connection (gigabit or better).
Using the 9.6 host, running the 9.6 versions of pg_dump and pg_dumpall:
Dump your global objects with pg_dumpall --globals-only -f globals.sql
Dump your database(s) with pg_dump -Fd -j4 -d dbname -f dbname.dumpdir or similar. -j is the number of parallel jobs. You'll need to dump each database separately if there are multiple ones.
Cleanly initdb a new PostgreSQL 9.6 install, removing whatever attempts you have previously made (since I don't know what is/isn't there). Alternatively, DROP any created roles, databases, etc., returning it to a clean state.
Use psql to run the globals script: psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
Use pg_restore to load the database dumps: pg_restore --create -d template1 -j4 dbname.dumpdir, repeating for each dumped DB. You can restore multiple DBs concurrently.
Yes, I know the handling of global objects sucks. And yes, it'd be nice if all this were wrapped up in a simple command. But it isn't. Designs and well thought out patches are welcome if you want to try to improve this. So far nobody's wanted to enough to do the work.
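Pulled together into one rough sequence, run from the 9.6 host (the host and database names are invented; repeat the dump/restore pair per database):
pg_dumpall -h old90host -U postgres --globals-only -f globals.sql
pg_dump -h old90host -U postgres -Fd -j4 -d mydb -f mydb.dumpdir
psql -v ON_ERROR_STOP=1 --single-transaction -f globals.sql -d postgres
pg_restore --create -d template1 -j4 mydb.dumpdir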

Where does pg_dump do its compression?

Here is the command I am using
pg_dump -h localhost -p 54321 -U example_user --format custom
which dumps a database on a remote server which I have connected to with a port forward on port 54321.
I know that the custom format does some compression by default.
Does this compression happen on the database server, or does everything get sent across to my local machine, where the compression happens?
Compression is done on the client side, so everything gets sent to your computer. What pg_dump does to the database is just execute ordinary queries to get the data.
PostgreSQL Documentation: 24.1. SQL Dump:
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one).
PostgreSQL Documentation - II. PostgreSQL Client Applications - pg_dump:
pg_dump internally executes SELECT statements. If you have problems running pg_dump, make sure you are able to select information from the database using, for example, psql.
If you need more information about the inner workings of pg_dump, I would suggest asking on the PostgreSQL mailing list or looking at the source code.
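If the point of asking is to avoid pushing uncompressed data through the port forward, one workaround (assuming shell access to the server; the user, host and database names here are invented) is to run pg_dump on the server itself so the compression happens there, and ship only the compressed archive:
ssh example_user@dbserver "pg_dump --format custom example_db" > example_db.dump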