Issues when upgrading and dockerising a Postgres v9.2 legacy database using pg_dumpall and pg_dump - postgresql

I am using an official postgres v12 docker image that I want to initialise with two SQL dump files that are gathered from a remote legacy v9.2 postgres server during the docker build phase:
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" >> dump/a_globals.sql
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dump -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --create $REMOTE_DB_NAME" >> dump/b_db.sql
By placing both the a_globals.sql and b_db.sql files into the docker image folder docker-entrypoint-initdb.d, the database is initialised with the legacy SQL files when the v12 container starts (as described here). Docker is working correctly and the dump files are retrieved successfully. However, I am running into problems initialising the container's database and require guidance:
When the container starts to initialise its DB, it stops with ERROR: role $someDBRole does not exist. This is because the v9.2 dump SQL files DROP roles before reinstating them; the container DB does not like this. Unfortunately, it is not until PostgreSQL v9.4 that pg_dumpall and pg_dump gained the --if-exists option (see the pg_dumpall v9.2 documentation). What would you suggest I do to remedy this? I could manually edit the SQL dump files, but this would be impractical because the snapshots of the legacy DB need to be automated. Is there a way to suppress this error during container startup?
If I want to convert from ASCII to UTF-8, is it adequate to simply set the encoding option for pg_dumpall and pg_dump? Or do I need to take into consideration other issues when upgrading?
Is there a way to suppress the removal and adding of the postgres superuser which is in the dump SQL?
In general, are there any other gotchas when containerising and/or updating a postgres DB?

I'm not familiar with Docker so I don't know how straightforward it'll be to do these things, but in general, pg_dump/pg_dumpall output, when it's in SQL format, will work just fine after having gone through some ugly string manipulation.
Pipe it through sed -e 's/DROP ROLE/DROP ROLE IF EXISTS/', ideally when writing the .sqls, but it's fine to just run sed -i -e <...> to munge the files in-place after they're created if you don't have a full shell available at dump time. Make it sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' (anchored to the start of the line) if you're worried about strings containing DROP ROLE in your data.
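A minimal sketch of doing it at dump time, reusing the build step from the question (so the $REMOTE_* build args are the same assumptions as above):
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" \
    | sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' >> dump/a_globals.sql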
Yes. It's worth checking the data in pg12 to make sure it got imported correctly, but in the general case, pg_dump has been aware of encoding considerations since time immemorial, and a dump->load is absolutely the best way to change your DB encoding.
Sure. Find the lines that do it in your .sql, copy enough of it to be unique, and pipe it through grep -v <what you copied> :D
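For example, if the lines in question are the role statements for the postgres superuser in the globals dump, a sketch (file names as in the question) would be:
grep -v -E '^(CREATE|ALTER|DROP) ROLE postgres' dump/a_globals.sql > dump/a_globals.filtered.sql
or fold that grep -v into the same dump pipeline as the sed above.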
I can't speak to the containerizing aspect of things, but - and this is more of a general practice, not even really PG-specific - if you're dealing with a large DB that's getting migrated, prepare a small one, as similar as possible to the real one but omitting any bulky data, to test with until everything works, so that doing the real migration is just a matter of changing some vars (I guess $REMOTE_HOST and $REMOTE_PORT in your case). If it's not large, then just be comfortable blowing away any pg12 containers that failed partway through the import, figure out and apply whatever fixes the failure, and start from the top again until it works end-to-end.

Related

pg_dump to copy schema to remote server

I need to copy a schema in Postgres to another database on a remote server, but I keep ending up with a failure like:
pg_dump: too many command-line arguments (first is "--n")
My code:
pg_dump postgres -n my_local_shema | psql -h 11.22.33.44 -U my_user_on_remote_server-d postgres
I have tried for hours with different commands, but I keep getting the "too many command-line arguments" error.
Try with a reversed order like this:
pg_dump -n my_local_shema postgres
OK. This command structure works like a charm:
pg_dump -n my_local_shema_name -d my_local_database -U my_local_username | psql -h 111.222.333.444 -U my_user_name_on_remote_Server my_Database_name_on_remote_server
Step-by-step-guide
I copied a schema with all tables and indexes to another database on another server.
111.222.333.444 is the IP of the remote server.
In preparation (I don't know if it is actually needed), I first created a schema on the remote server with a name identical to the one I wanted to copy. I also checked that the firewall was open for data transfer from the old server to the new one.
Then I opened a command prompt (I use Windows) and changed to the folder where the pg_dump.exe file was. There I typed the command.
Last, it asked me to type in a password. First it prompted, then it went silent - nothing happened, and I did not know what to expect. In the end I typed the password twice (I use the same password on both the old server and the new, upgraded one). Then things started to work and it wrote a lot of ALTER TABLE output, etc.
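If you want to avoid the password prompts (the pipe opens one connection for pg_dump and one for psql, so you can be asked twice), a sketch of a pgpass file - the ports and passwords below are placeholders - would be:
# %APPDATA%\postgresql\pgpass.conf on Windows, ~/.pgpass (chmod 600) on Linux
# format: hostname:port:database:username:password
localhost:5432:my_local_database:my_local_username:local_password
111.222.333.444:5432:my_Database_name_on_remote_server:my_user_name_on_remote_Server:remote_password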
Hope others can use it. :-)

Best way to make PostgreSQL backups

I have a site that uses PostgreSQL. All content that I provide in my site is created at a development environment (this happens because it's webcrawler content). The only information created at the production environment is information about the users.
I need to find a good way to update the data stored at production. Can I restore to production only the tables updated in the development environment, and will PostgreSQL update those records at production? Or would the best way be to back up the user information at production, insert it at development, and restore the whole database to production?
Thank you
You can use pg_dump to export the data just from the non-user tables in the development environment and pg_restore to bring that into prod.
The -t switch will let you pick specific tables.
pg_dump -d <database_name> -t <table_name>
https://www.postgresql.org/docs/current/static/app-pgdump.html
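A sketch of that, with hypothetical database and table names (articles and pages standing in for the crawler-content tables), dumping in custom format and restoring into production:
pg_dump -Fc -d dev_db -t articles -t pages -f content.dump
pg_restore --clean -d prod_db content.dump
With --clean, pg_restore drops only the tables contained in the dump before recreating them, so the user tables in production are left alone.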
There are many tips around this subject here and here.
I'd suggest you take a look at those links before anything else.
If your data is discarded at each update then a plain dump will be enough. You can redirect pg_dump output directly into psql connected to production to avoid the pg_restore step, something like below:
# Of course you must drop the tables to load them again,
# so it'd be reasonable to make a full backup before this
pg_dump -Fp -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db
You might be asking yourself, "Why is he saying to drop my tables?"
Bulk loading data into a fresh table is faster than deleting the old data and inserting it again. A quote from the docs:
Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
PS¹: If you can't connect to both environments at the same time, then you need to do the restore step manually.
PS²: I don't recommend it, but you can append the --clean option to pg_dump to generate DROP statements automatically. Be extremely careful with this option to avoid dropping unexpected objects.
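If both servers are on 9.4 or newer, a sketch combining that with --if-exists so the generated DROP statements don't fail on a fresh database (same placeholder hosts and names as above):
pg_dump -Fp --clean --if-exists -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db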

Postgresql transfer table data from different databases

I have two apps with the same tables. One app collects data from the web. I want to send that data to my second (web) app's database.
With the command below, I have created a file with the data:
pg_dump -U username -t public."table_name" -d database name --inserts > table_name.sql
The problem is that I just want to insert the data which does not exist in the second database.
If I try the command below, I get a lot of "already exists" errors:
psql -U username second_database_name < table_name.sql
One of the errors:
multiple primary keys for table "table_name" are not allowed
Another one:
relation "table_name_attribute_442....c74_uniq" already exists
--clean , --if-exists ... What should I do?
The way I did it was to do a pg_dump that creates a compressed archive suitable for use with pg_restore, which has the flags needed to allow the data to be imported without throwing errors.
For example:
pg_dump -Fc -h 127.0.0.1 {db_name_here} > {dump_file_name_here}
The "-Fc" gets you the file-type that pg_restore wants; it will reject a dump made without those magic letters.
Now you can restore the file with:
pg_restore -O -h 127.0.0.1 --clean --disable-triggers -d {target_db_name} {dump_file_name_here}
Voila - the data is now in the target_db.
If the psql command has equivalent flags, I don't know what they are, and I did not find them while searching SO. But hopefully this provides a DB dump/restore that 'just works' for those who need to get back to using the DB, instead of fiddling around trying to get a simple dump/restore to behave as expected.
Also note that if you do not specify ...
-h 127.0.0.1
... it will go looking for a Unix socket, which may or may not be configured correctly. Chances are your normal use of the DB connects over TCP, so you never configured the Unix socket to match whatever default path the command goes looking for (of course, because things "just working" so you can "just use them" is just not possible).
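If you do want to use the socket, libpq also accepts a directory path as the host - a sketch, assuming the common Debian/Ubuntu socket directory /var/run/postgresql:
pg_restore -O -h /var/run/postgresql --clean --disable-triggers -d {target_db_name} {dump_file_name_here}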

Custom pg:dump options with Heroku pg:backups capture?

When developing, I need to pull the latest database so I know I'm working with the latest data. However, we keep a table full of Archives that I don't need to bother downloading because it's a very large table.
I know pg_dump allows for custom parameters that will let you exclude a certain table from being dumped.
Without doing anything crazy like having 2 databases, 1 for data and 1 for archives, is there any way to download everything BUT the archives table from Heroku?
I still need it to keep backups of the archives table, but I don't want to be downloading it. Can I just do a pg_dump when needed that is separate from the backups?
I know it's a long shot, but any suggestions would be greatly appreciated.
You can't add any custom pg_dump options when using heroku pg:backups capture. This command actually calls an undocumented Heroku Postgres API and it doesn't pass any parameters (see here for the code if you are curious).
What you can do is run your own pg_dump dump command that points to the Heroku Postgres instance.
Get the connection info with pg:credentials, where DATABASE_URL can also be the database color if you have more than one database attached to the app:
> heroku pg:credentials DATABASE_URL --app app_name
Connection info string:
"dbname=zzxcasdqwe host=ec2-1-1-1-1.compute-1.amazonaws.com port=1111 user=asdfasdf password=qwertyqwerty sslmode=require"
Connection URL:
postgres://asdfasdf:qwertyqwerty@ec2-1-1-1-1.compute-1.amazonaws.com:1111/zzxcasdqwe
Take either the connection info string or the connection URL, include it as the first argument to pg_dump, and add your custom options:
pg_dump "dbname=zzxcasdqwe host=ec2-1-1-1-1.compute-1.amazonaws.com port=1111 user=asdfasdf password=qwertyqwerty sslmode=require" \
-n schema -t table -O -x -Fc -f dump.out
# OR
pg_dump postgres://asdfasdf:qwertyqwerty@ec2-1-1-1-1.compute-1.amazonaws.com:1111/zzxcasdqwe \
-n schema -t table -O -x -Fc -f dump.out
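For the original use case - keeping everything except the archives table - the same idea with --exclude-table (assuming the table is literally named archives) would look like:
pg_dump postgres://asdfasdf:qwertyqwerty@ec2-1-1-1-1.compute-1.amazonaws.com:1111/zzxcasdqwe \
--exclude-table=archives -O -x -Fc -f dump.out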
I also co-wrote a Heroku plugin (parse_db_url) that will parse DATABASE_URLs into other formats like pg_dump, pg_restore, pgpass etc. I find it useful when dealing with several different Heroku databases.

Use pg_restore to restore from a newer version of PostgreSQL

I have a (production) DB server running PostgreSQL v9.0 and a development machine running PostgreSQL v8.4. I would like to take a dump of the production DB and use it on the development machine. I cannot upgrade the postgres on the dev machine.
On the production machine, I run:
pg_dump -f nvdls.db -F p -U nvdladmin nvdlstats
On the development machine, I run:
pg_restore -d nvdlstats -U nvdladmin nvdls.db
And I got this error:
pg_restore: [archiver] unsupported version (1.12) in file header
This occurs regardless of whether I choose the custom, tar, or plain_text format when dumping.
I found one discussion online which suggests that I should use a newer version of pg_restore on the dev machine. I tried this by simply copying the 9.0 binary to the dev machine, but this fails (not unexpectedly) due to linking problems.
I thought that the point of using a plain_text dump was that it would be raw, portable SQL. Apparently not.
How can I get the 9.0 DB into my 8.4 install?
pg_restore is only for restoring dumps taken in a non-plain-text format (such as "custom" or tar).
If you do a "plain text" dump you have to use psql to run the generated SQL script:
psql -f nvdls.db dbname username
Using pg_dump/pg_restore to move from 9.0 to 8.4 is not supported - only moving forward is supported.
However, you can usually get the data across (in a data-only dump), and in some cases you can get the schema - but that's mostly luck, it depends on which features you're using.
You should normally use the target version of pg_dump and pg_restore - meaning in this case you should use the binaries from 8.4. But you should use the same version of pg_dump and pg_restore. Both tools will work fine across the network, so there should be no need to copy the binaries around.
And as a_horse_with_no_name says, you may be better off using pg_dump in plain-text mode - that will allow you to hand-edit the dump if necessary. In particular, you can make one schema-only dump (with -s) and one data-only dump - only the schema dump is likely to require any editing.
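A sketch of that split, using the database and user names from the question (production_host and dev_host are placeholders):
pg_dump -h production_host -U nvdladmin -s -f nvdls_schema.sql nvdlstats
pg_dump -h production_host -U nvdladmin -a -f nvdls_data.sql nvdlstats
# hand-edit nvdls_schema.sql for any 9.0-only features, then load both with psql:
psql -h dev_host -U nvdladmin -d nvdlstats -f nvdls_schema.sql
psql -h dev_host -U nvdladmin -d nvdlstats -f nvdls_data.sql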
If the 9.0 database contains any bytea columns, then bigger problems await.
These columns will be exported by pg_dump using the "hex" representation and appear in your dump file like:
SELECT pg_catalog.lowrite(0, '\x0a2')
Any version of the postgres backend below 9.0 can't grok the hex representation of bytea, and I can't find an option to tell pg_dump on the 9.0 side to not use it. Setting the default "bytea_output" setting to ESCAPE for either the database or the whole server is seemingly ignored by pg_dump.
I suppose it would be possible to post-process the dump file and actually change every hex-encoded bytea value to an escaped one, but the risk of untraceably corrupting the kind of things normally stored in a bytea (images, PDFs etc) does not excite me.
I solved this by upgrading postgresql from 8.X to 9.2.4. If you're using brew on Mac OS-X, use -
brew upgrade postgresql
Once this is done, just make sure your new postgres installation is at the top of your path. It'll look something like (depending on the version installation path) -
export PATH=/usr/local/Cellar/postgresql/9.2.4/bin:$PATH
I had the same issue. I used pg_dump and psql to export/import the DB.
1. Set PGPASSWORD
export PGPASSWORD='h0ld1tn0w';
2. Export the DB with pg_dump
pg_dump -h <<host>> -U <<username>> <<dbname>> > /opt/db.out
/opt/db.out is the dump path. You can specify your own.
3. Then set PGPASSWORD again for your other host. If the host or the password is the same, this is not required.
4. Import the DB at your other host
psql -h <<host>> -U <<username>> -d <<dbname>> -f /opt/db.out
If the username is different, then find and replace it with your local username in the db.out file. Make sure only the username is replaced, not data.
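A sketch of doing that replacement safely - olduser and newuser are placeholders for the remote and local role names - limiting it to ownership statements so table data is left untouched:
# GNU sed; on macOS/BSD use sed -i '' instead of sed -i
sed -i 's/OWNER TO olduser;/OWNER TO newuser;/g' /opt/db.out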