Where does pg_dump do its compression? - postgresql

Here is the command I am using
pg_dump -h localhost -p 54321 -U example_user --format custom
which dumps a database on a remote server that I have connected to via a port forward on port 54321.
I know that the custom format does some compression by default.
Does this compression happen on the database server, or does everything get sent across to my local machine, where the compression then happens?

Compression is done on the client side, so everything gets sent uncompressed to your computer. What pg_dump does to the database is just execute ordinary queries to get the data.
PostgreSQL Documentation: 24.1. SQL Dump:
pg_dump is a regular PostgreSQL client application (albeit a particularly clever one).
PostgreSQL Documentation - II. PostgreSQL Client Applications - pg_dump:
pg_dump internally executes SELECT statements. If you have problems running pg_dump, make sure you are able to select information from the database using, for example, psql.
If you need more information about the inner workings of pg_dump, I would suggest asking on the PostgreSQL mailing lists or looking at the source code.
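If the goal is to avoid shipping the uncompressed data over the port forward, one workaround is to run pg_dump on the server itself over SSH, so the custom-format compression happens before anything crosses the network. A minimal sketch, assuming SSH access to the database host (the hostname and database name are placeholders):
# pg_dump runs on the server; only the compressed custom-format
# archive travels over the network
ssh example_user@dbserver "pg_dump -U example_user --format custom mydb" > mydb.dump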

Related

Copy an entire data table (not entire database) from local machine to heroku postgres

I have a relatively large data table (~4m rows) that has been imported into a locally hosted PostgreSQL database. (As it happens, it's a Ruby on Rails app database, but that shouldn't be important for the purposes of the question - unless it helps.)
I want to take that table and add it into an identical table in a heroku postgresql database (the table is currently empty).
How would I do that quickly and efficiently?
I found this Copy a table from one database to another in Postgres
but I'm struggling with the syntax for the Heroku end, i.e. how do I connect to both at the same time? Which database am I connecting to originally?
In that answer, you are originally connected to the database "source_db" or "my_db" (depending on which line in the answer you are looking at). Presumably that database is on the instance running locally on port 5432, unless unshown environment variables (or non-default compilation) have changed that. And the destination database is named "target_db", running in the same instance.
pg_dump and psql are independent commands, and each takes all the connection options that it would take if run in isolation. So you would probably want something like:
pg_dump -t table_to_copy source_db | psql target_db -h you.heroku.hostname_or_ip
One potential problem: if both commands prompt for a password, it might make a mess. Which password do you need to enter first? And whichever the order, will they be read correctly? If both need passwords, it is best to arrange for at least one of them to be supplied by ~/.pgpass.
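A minimal sketch of supplying the Heroku-side password via ~/.pgpass (the hostname, database, user, and password are placeholders):
# append a credentials line; the format is hostname:port:database:username:password
echo 'you.heroku.hostname_or_ip:5432:target_db:heroku_user:secret' >> ~/.pgpass
# libpq ignores the file unless its permissions are restrictive
chmod 0600 ~/.pgpass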

Issues when upgrading and dockerising a Postgres v9.2 legacy database using pg_dumpall and pg_dump

I am using an official postgres v12 docker image that I want to initialise with two SQL dump files that are gathered from a remote legacy v9.2 postgres server during the docker build phase:
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" >> dump/a_globals.sql
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dump -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --create $REMOTE_DB_NAME" >> dump/b_db.sql
By placing both a_globals.sql and b_db.sql into the docker image folder docker-entrypoint-initdb.d, the database is initialised with the legacy SQL files when the v12 container starts (as described here). Docker is working correctly and the dump files are retrieved successfully. However, I am running into problems initialising the container's database and require guidance:
When the container starts to initialise its DB, it stops with ERROR: role $someDBRole does not exist. This is because the v9.2 dump SQL files DROP roles before reinstating them, and the container DB does not like this. Unfortunately, it was not until v9.4 that pg_dumpall and pg_dump gained the --if-exists option (see the pg_dumpall v9.2 documentation). What would you suggest I do to remedy this? I could manually edit the SQL dump files, but this would be impractical, as the snapshots of the legacy DB need to be automated. Is there a way to suppress this error during container startup?
If I want to convert from ASCII to UTF-8, is it adequate to simply set the encoding option for pg_dumpall and pg_dump? Or do I need to take other issues into consideration when upgrading?
Is there a way to suppress the removal and re-adding of the postgres superuser, which is in the dump SQL?
In general, are there any other gotchas when containerising and/or updating a postgres DB?
I'm not familiar with Docker, so I don't know how straightforward it'll be to do these things, but in general, pg_dump/pg_dumpall output, when it's in SQL format, will work just fine after having gone through some ugly string manipulation.
Pipe it through sed -e 's/DROP ROLE/DROP ROLE IF EXISTS/', ideally when writing the .sqls, but it's fine to just run sed -i -e <...> to munge the files in place after they're created if you don't have a full shell available. Make it sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' (anchored to the start of the line) if you're worried about the string DROP ROLE appearing in your data; the ^ anchor is plain POSIX sed, so there is no portability cost.
Yes. It's worth checking the data in pg12 to make sure it got imported correctly, but in the general case, pg_dump has been aware of encoding considerations since time immemorial, and a dump->load is absolutely the best way to change your DB encoding.
Sure. Find the lines that do it in your .sql, copy enough of each to be unique, and pipe the dump through grep -v <what you copied> (see the sketch below). :D
I can't speak to the containerising aspect of things, but - and this is more of a general practice, not even really PG-specific - if you're dealing with a large DB that's getting migrated, prepare a small one, as similar as possible to the real one but omitting any bulky data, to test with until everything works, so that doing the real migration is just a matter of changing some vars (I guess $REMOTE_HOST and $REMOTE_DB_PORT in your case). If it's not large, then just be comfortable blowing away any pg12 containers that failed partway through the import, figure out and fix whatever caused the failure, and start from the top again until it works end-to-end.
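Putting the sed and grep suggestions together, a hypothetical rewrite of the globals dump line that writes container-safe SQL as it is fetched (variable names as in the question; the grep pattern is a placeholder for whichever superuser lines your dump actually contains):
# make role drops conditional and strip the lines touching the postgres superuser
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" \
    | sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' \
    | grep -v 'ROLE postgres' \
    >> dump/a_globals.sql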

Backing a very large PostgreSQL DB "by parts"

I have a PostgreSQL DB that is about 6TB. I want to transfer this database to another server using, for example, pg_dumpall. The problem I have is that I only have a 1TB HD. How can I copy this database to the other new server, which has enough space? Let's suppose I cannot get another HD. Is it possible to make partial backup files, upload them to the new server, erase the HD, and fetch another batch of backup files until the transfer is complete?
This works here (proof of concept):
shell-command issued from the receiving side
remote side dumps through the network-connection
local side psql just accepts the commands from this connection
the data is never stored in a physical file
(for brevity, I only sent the table definitions, not the actual data: --schema-only)
you could have some problems with users and tablespaces (these are global to an installation in Postgres); pg_dumpall will dump+restore these too, IIRC
#!/bin/bash
remote=10.224.60.103
dbname=myremotedbname
pg_dump -h ${remote} --schema-only -c -C ${dbname} | psql
#eof
As suggested above, if you have a fast network connection between source and destination, you can do it without any extra disk.
However, for a 6 TB DB (which I assume includes indexes), using the archive dump format (-Fc) could yield a database dump of less than 1 TB.
Regarding the "by parts" question: yes, it is possible using the table pattern (-t, --table):
pg_dump -t TABLE_NAME ...
You can also exclude tables using -T, --exclude-table:
pg_dump -T TABLE_NAME ...
The above options (-t, -T) can be specified multiple times and can even be combined.
They also support patterns for specifying the tables:
pg_dump -t 'employee_*' ...
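A hypothetical way to combine the archive format with per-table dumps, streaming each piece straight to the new server so nothing large ever lands on the small local disk (host, database, and table names are placeholders; the target database must already exist):
# dump each table in compressed archive format and restore it remotely;
# pg_restore reads the archive from stdin when no file is given
for t in 'employee_*' orders invoices; do
  pg_dump -Fc -t "$t" mydb | ssh newserver "pg_restore -d mydb"
done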

How to migrate data to remote server with PostgreSQL?

How can I dump my database schema and data in such a way that the usernames, database names and schema names of the dumped data match these variables on the servers I deploy to?
My current process entails moving the data in two steps. First, I dump the schema of the database (pg_dump --schema-only -C -c), then I dump out the data with pg_dump --data-only -C, and restore these on the remote server in tandem using the psql command. But there has to be a better way than this.
We use the following to replicate databases.
pg_basebackup -x -P -D /var/lib/pgsql/9.2/data -h OTHER_DB_IP_ADDR -U postgres
It requires the "master" server at OTHER_DB_IP_ADDR to be running the replication service, and pg_hba.conf must allow replication connections. You do not have to run the "slave" service as a hot/warm standby in order to replicate. One downside of this method compared with a dump/restore: the restore operation effectively vacuums, re-indexes, and resets EVERYTHING, while replication doesn't, so replicating can use a bit more disk space if your database has been heavily edited. On the other hand, replicating is MUCH faster (15 minutes vs 3 hours in our case), since indexes do not have to be rebuilt.
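For reference, a hypothetical pg_hba.conf entry on the master that allows such replication connections (the IP address and auth method are placeholders; pick whatever suits your network):
# TYPE  DATABASE     USER      ADDRESS       METHOD
host    replication  postgres  10.0.0.5/32   md5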
Some useful references:
http://opensourcedbms.com/dbms/setup-replication-with-postgres-9-2-on-centos-6redhat-el6fedora/
http://www.postgresql.org/docs/9.2/static/high-availability.html
http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/

Use pg_restore to restore from a newer version of PostgreSQL

I have a (production) DB server running PostgreSQL v9.0 and a development machine running PostgreSQL v8.4. I would like to take a dump of the production DB and use it on the development machine. I cannot upgrade the postgres on the dev machine.
On the production machine, I run:
pg_dump -f nvdls.db -F p -U nvdladmin nvdlstats
On the development machine, I run:
pg_restore -d nvdlstats -U nvdladmin nvdls.db
And I got this error:
pg_restore: [archiver] unsupported version (1.12) in file header
This occurs regardless of whether I choose the custom, tar, or plain_text format when dumping.
I found one discussion online which suggests that I should use a newer version of pg_restore on the dev machine. I tried this by simply copying the 9.0 binary to the dev machine, but this fails (not unexpectedly) due to linking problems.
I thought that the point of using a plain_text dump was that it would be raw, portable SQL. Apparently not.
How can I get the 9.0 DB into my 8.4 install?
pg_restore is only for restoring dumps taken in one of the archive formats (custom or tar).
If you do a "plain text" dump, you have to use psql to run the generated SQL script:
psql -f nvdls.db dbname username
Using pg_dump/pg_restore to move from 9.0 to 8.4 is not supported - only moving forward is supported.
However, you can usually get the data across (in a data-only dump), and in some cases you can get the schema - but that's mostly luck, it depends on which features you're using.
You should normally use the target version of pg_dump and pg_restore - meaning in this case you should use the binaries from 8.4. But you should use the same version of pg_dump and pg_restore. Both tools will work fine across the network, so there should be no need to copy the binaries around.
And as a_horse_with_no_name says, you may be better off using pg_dump in plain-text mode - that will allow you to hand-edit the dump if necessary. In particular, you can make one schema-only dump (with -s) and one data-only dump - only the schema dump is likely to require any editing.
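A sketch of that split, reusing the names from the question (-s selects schema only, -a data only):
pg_dump -s -U nvdladmin -f schema.sql nvdlstats   # schema only - the part you may need to hand-edit
pg_dump -a -U nvdladmin -f data.sql nvdlstats     # data only - usually restores as-is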
If the 9.0 database contains any bytea columns, then bigger problems await.
These columns will be exported by pg_dump using the "hex" representation and appear in your dump file like:
SELECT pg_catalog.lowrite(0, '\x0a2')
Any version of the postgres backend below 9.0 can't grok the hex representation of bytea, and I can't find an option to tell pg_dump on the 9.0 side not to use it. Setting the "bytea_output" parameter to escape, for either the database or the whole server, is seemingly ignored by pg_dump.
I suppose it would be possible to post-process the dump file and change every hex-encoded bytea value to an escaped one, but the risk of untraceably corrupting the kind of things normally stored in a bytea (images, PDFs, etc.) does not excite me.
I solved this by upgrading PostgreSQL from 8.x to 9.2.4. If you're using Homebrew on Mac OS X, use -
brew upgrade postgresql
Once this is done, just make sure your new Postgres installation is at the top of your PATH. It'll look something like this (depending on the version and installation path) -
export PATH=/usr/local/Cellar/postgresql/9.2.4/bin:$PATH
I had the same issue. I used pg_dump and psql to export/import the DB.
1. Set PGPASSWORD:
export PGPASSWORD='h0ld1tn0w';
2. Export the DB with pg_dump:
pg_dump -h <<host>> -U <<username>> <<dbname>> > /opt/db.out
/opt/db.out is the dump path; you can specify your own.
3. Then set PGPASSWORD again for your other host. If the host or the password is the same, this is not required.
4. Import the DB on your other host:
psql -h <<host>> -U <<username>> -d <<dbname>> -f /opt/db.out
If the username is different, find and replace it with your local username in the db.out file, and make sure only the username is replaced and not data.
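If both servers are reachable at the same time, a hypothetical shortcut is to skip the intermediate file entirely and pipe the dump straight from one host to the other (hosts, username, and dbname are placeholders; the target database must already exist):
pg_dump -h <<old_host>> -U <<username>> <<dbname>> | psql -h <<new_host>> -U <<username>> -d <<dbname>>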