Custom pg_dump options with Heroku pg:backups capture? - postgresql

When developing, I need to pull the latest database so I know I'm working with the latest data. However, we keep a table full of Archives that I don't need to bother downloading because it's a very large table.
I know pg_dump allows for custom parameters that will let you exclude a certain table from being dumped.
Without doing anything crazy like having 2 databases, 1 for data and 1 for archives, is there any way to download everything BUT the archives table from Heroku?
I still need Heroku to keep backing up the archives table; I just don't want to download it. Can I run a pg_dump when needed that is separate from the backups?
I know it's a long shot, but any suggestions would be greatly appreciated.

You can't add any custom pg_dump options when using heroku pg:backups capture. This command actually calls an undocumented Heroku Postgres API and it doesn't pass any parameters (see here for the code if you are curious).
What you can do is run your own pg_dump command that points to the Heroku Postgres instance.
Get the connection info with pg:credentials, where DATABASE_URL can also be the database color if you have more than one database attached to the app:
> heroku pg:credentials DATABASE_URL --app app_name
Connection info string:
"dbname=zzxcasdqwe host=ec2-1-1-1-1.compute-1.amazonaws.com port=1111 user=asdfasdf password=qwertyqwerty sslmode=require"
Connection URL:
postgres://asdfasdf:qwertyqwerty@ec2-1-1-1-1.compute-1.amazonaws.com:1111/zzxcasdqwe
Take either the connection info string or the connection URL, pass it as the first argument to pg_dump, and add your custom options:
pg_dump "dbname=zzxcasdqwe host=ec2-1-1-1-1.compute-1.amazonaws.com port=1111 user=asdfasdf password=qwertyqwerty sslmode=require" \
-n schema -t table -O -x -Fc -f dump.out
# OR
pg_dump postgres://asdfasdf:qwertyqwerty@ec2-1-1-1-1.compute-1.amazonaws.com:1111/zzxcasdqwe \
-n schema -t table -O -x -Fc -f dump.out
I also co-wrote a Heroku plugin (parse_db_url) that parses DATABASE_URLs into other formats for pg_dump, pg_restore, pgpass, etc. I find it useful when dealing with several different Heroku databases.
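Since the original goal is to skip the archives table, here is a minimal sketch of such a dump. It assumes the table is literally named archives and that the Heroku CLI is logged in; it reads the URL with heroku config:get DATABASE_URL instead of copying credentials by hand. pg_dump's --exclude-table-data keeps the table definition but skips its rows, while -T (--exclude-table) would leave the table out entirely.

```shell
#!/usr/bin/env bash
# Sketch: dump a Heroku Postgres database without the rows of one big table.
# Assumptions: the table is named "archives" and the Heroku CLI is logged in.
set -euo pipefail

dump_without_archives() {
  local app="$1"               # Heroku app name, e.g. app_name
  local out="${2:-dump.out}"   # custom-format file for pg_restore
  local url
  # Read the connection URL from the app's config vars.
  url="$(heroku config:get DATABASE_URL --app "$app")"
  # Keep the archives table definition but skip its data.
  # Use -T archives instead to leave the table out completely.
  pg_dump "$url" --exclude-table-data=archives -O -x -Fc -f "$out"
}
```

Usage: dump_without_archives app_name dump.out, then load it locally with pg_restore -O -d local_db_name dump.out (local_db_name is a placeholder). Heroku's own pg:backups keep running as before and still include the archives table.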

Related

Issues when upgrading and dockerising a Postgres v9.2 legacy database using pg_dumpall and pg_dump

I am using an official postgres v12 docker image that I want to initialise with two SQL dump files that are gathered from a remote legacy v9.2 postgres server during the docker build phase:
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" >> dump/a_globals.sql
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dump -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --create $REMOTE_DB_NAME" >> dump/b_db.sql
By placing both a_globals.sql and b_db.sql into the image's docker-entrypoint-initdb.d folder, the database is initialised from the legacy SQL files when the v12 container starts (as described here). Docker is working correctly and the dump files are retrieved successfully. However, I am running into problems initialising the container's database and need guidance:
1. When the container starts to initialise its DB, it stops with ERROR: role $someDBRole does not exist. This is because the v9.2 dump SQL files DROP roles before recreating them, and the fresh container DB has no such roles to drop. Unfortunately, pg_dumpall and pg_dump only gained the --if-exists option in v9.4 (see the pg_dumpall v9.2 documentation). What would you suggest I do to remedy this? I could edit the SQL dump files manually, but that would be impractical because the snapshots of the legacy DB need to be automated. Is there a way to suppress this error during container startup?
2. If I want to convert from ASCII to UTF-8, is it adequate to simply set the encoding option for pg_dumpall and pg_dump? Or do I need to take other issues into consideration when upgrading?
3. Is there a way to suppress the removal and re-creation of the postgres superuser in the dump SQL?
4. In general, are there any other gotchas when containerising and/or upgrading a Postgres DB?
I'm not familiar with Docker, so I don't know how straightforward these things will be to do, but in general pg_dump/pg_dumpall output in SQL format will work just fine after going through some ugly string manipulation.
1. Pipe it through sed -e 's/DROP ROLE/DROP ROLE IF EXISTS/', ideally when writing the .sql files; if you don't have a full shell available at that point, it's fine to run sed -i -e <...> afterwards to munge the files in place. If you're worried about data containing the string DROP ROLE, anchor the pattern to the start of the line: sed -e 's/^DROP ROLE /DROP ROLE IF EXISTS /'. The ^ anchor is part of basic regular expressions, so no GNU-only flag such as -r is needed (note that -i itself behaves differently in BSD sed).
2. Yes. It's worth checking the data in pg12 to make sure it was imported correctly, but in the general case pg_dump has been aware of encoding considerations since time immemorial, and a dump and reload is absolutely the best way to change your DB encoding.
3. Sure. Find the lines that do it in your .sql, copy enough of each to be unique, and pipe the file through grep -v <what you copied> :D
4. I can't speak to the containerizing aspect of things, but here is a general practice (not even really PG-specific): if you're migrating a large DB, prepare a small one that is as similar as possible to the real one but omits any bulky data, and use it to get everything working, so that the real migration is just a matter of changing some variables ($REMOTE_HOST and $REMOTE_DB_PORT in your case, I guess). If the DB isn't large, just be comfortable blowing away any pg12 containers that failed partway through the import, figure out and fix the cause of the failure, and start again from the top until it works end to end.
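Suggestions 1 and 3 can be combined into a single filter applied while the dump is written. This is a sketch that assumes pg_dumpall emits each role statement on its own line starting with DROP ROLE, CREATE ROLE or ALTER ROLE, as plain-SQL dumps normally do; the function name prepare_globals is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: make a plain-SQL globals dump loadable into a fresh container.
# Reads SQL on stdin and writes the filtered SQL to stdout.
prepare_globals() {
  # 1. Rewrite DROP ROLE only at the start of a line, so data is untouched.
  # 2. Remove the statements that drop/recreate the built-in postgres role,
  #    which already exists in the container.
  sed -e 's/^DROP ROLE /DROP ROLE IF EXISTS /' |
    grep -v -e '^DROP ROLE IF EXISTS postgres;' \
            -e '^CREATE ROLE postgres;' \
            -e '^ALTER ROLE postgres ' || true  # grep exits 1 if no lines remain
}
```

For example, from a script that defines the function: ssh "$REMOTE_USER@$REMOTE_HOST" "pg_dumpall ..." | prepare_globals > dump/a_globals.sql. The trailing space in '^ALTER ROLE postgres ' keeps roles such as postgres_app from being removed.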

database backup in jelastic can't be done from the app node

My goal is to have automatic database backups sent to my S3 bucket.
Jelastic has good documentation on how to run pg_dump inside the database node/container, but to get the backup file out you have to fetch it manually using an FTP add-on!
As I said, my goal is to send the backup file to my S3 bucket automatically, so I tried running pg_dump from my app node instead of the PostgreSQL node (hoping to have more control from the app side). The command I ran basically looks like this:
PGPASSWORD="my_database_password" pg_dump --host "nodeXXXX-XXXXX.jelastic.XXXXX.net" \
-U my_db_username -p "5432" -f sql_backup.sql "database_name" 2> "$LOG_FILE"
The output in my log file is:
pg_dump: server version: 10.3; pg_dump version: 9.4.10
pg_dump: aborting because of server version mismatch
The issue is that the database node runs a newer PostgreSQL server (10.3) than the pg_dump installed on the nginx/app node (9.4.10), so the backup can't be performed. I looked around but couldn't find an easy way to solve this. I'm open to any alternative that helps me achieve my original goal.
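pg_dump refuses to dump a server whose major version is newer than its own, so the app node needs a pg_dump client at least as new as the 10.x server (for example, a version-10 client package installed there). The sketch below only makes that check explicit before a scheduled backup runs; the function names are hypothetical, and the rule it encodes follows PostgreSQL's versioning scheme (the major version is X.Y before 10 and X from 10 on).

```shell
#!/usr/bin/env bash
# Sketch: check that the local pg_dump can dump a given server before a backup.

# Turn a version string into a comparable major number:
# "9.4.10" -> 904 (major is X.Y before v10), "10.3" -> 1000 (major is X).
pg_major() {
  local v="${1%% *}"   # drop suffixes such as " (Debian 10.3-1)"
  local x y rest
  IFS=. read -r x y rest <<< "$v"
  if [ "$x" -ge 10 ]; then
    echo $(( x * 100 ))
  else
    echo $(( x * 100 + ${y:-0} ))
  fi
}

# Succeeds when a pg_dump of version $1 can dump a server of version $2.
pg_dump_can_handle() {
  [ "$(pg_major "$1")" -ge "$(pg_major "$2")" ]
}
```

The client version can be read with pg_dump --version | awk '{print $3}' and the server version with psql -h <host> -Atc 'SHOW server_version'. For the versions in the log above, pg_dump_can_handle 9.4.10 10.3 fails, which matches the abort.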

Best way to make PostgreSQL backups

I have a site that uses PostgreSQL. All the content on my site is created in a development environment (because it's web-crawler content). The only information created in the production environment is information about the users.
I need to find a good way to update the data stored in production. Can I restore to production only the tables that were updated in the development environment, and will PostgreSQL update those records in production? Or would the best way be to back up the users' information in production, insert it into development, and then restore the whole database to production?
Thank you
You can use pg_dump to export the data just from the non-user tables in the development environment and pg_restore to bring that into prod.
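The -t switch lets you pick specific tables (repeat it once per table); add -Fc so the output is a custom-format archive that pg_restore can read:
pg_dump -Fc -d <database_name> -t <table_name> -f <dump_file>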
https://www.postgresql.org/docs/current/static/app-pgdump.html
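Applied to this question, that could look like the sketch below: dump only the named content tables from development, then restore them into production with --clean --if-exists so the existing copies of those tables are dropped and recreated. The table and database names are placeholders, the function name is hypothetical, and it assumes no foreign keys point from the users table to the content tables (pg_restore could not drop those tables otherwise). Taking a full backup of production first is still advisable.

```shell
#!/usr/bin/env bash
# Sketch: copy selected content tables from development to production,
# leaving every other production table (e.g. users) alone.
set -euo pipefail

publish_tables() {
  local dev_db="$1" prod_db="$2"
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "usage: publish_tables DEV_DB PROD_DB TABLE..." >&2
    return 2
  fi
  local args=() t dump_file
  for t in "$@"; do args+=(-t "$t"); done   # one -t per table
  dump_file="$(mktemp)"
  # Custom format (-Fc) is the input pg_restore expects.
  pg_dump -Fc "${args[@]}" -f "$dump_file" "$dev_db"
  # --clean --if-exists drops each table before recreating it (pg_restore 9.4+).
  pg_restore --clean --if-exists -d "$prod_db" "$dump_file"
  rm -f "$dump_file"
}
```

For example: publish_tables dev_db prod_db pages links, where pages and links stand in for the crawler tables.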
There are many tips around this subject here and here.
I'd suggest taking a look at these links before anything else.
If your data is discarded at each update, a plain dump will be enough. You can pipe pg_dump output directly into psql connected to production to avoid the pg_restore step, something like this:
# Of course you must drop the tables before loading them again,
# so it's reasonable to make a full backup first
pg_dump -Fp -U user -h host_to_dev -T user your_db | psql -U user -h host_to_production your_db
You might be asking yourself, "Why is he telling me to drop my tables?"
Bulk loading data on a fresh table is faster than deleting old data and inserting again. A quote from the docs:
Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
Ps¹: If you can't connect to both environments at the same time, you need to dump to a file and restore it manually (with psql for a plain dump, or pg_restore for a custom-format one).
Ps²: I don't recommend it, but you can add the --clean option to pg_dump to generate DROP statements automatically. Be extremely careful with this option to avoid dropping unexpected objects.
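One more safeguard for the piped approach: by default psql keeps going after a failed statement and still exits successfully. The sketch below makes psql stop at the first error (ON_ERROR_STOP=1) and run the whole script as one transaction (--single-transaction with -f - reading stdin), so a failure rolls back what was already loaded. It assumes the target tables were dropped beforehand; hosts, the database name, the role user and the excluded table user are the placeholders from the command above, and the function name is made up.

```shell
#!/usr/bin/env bash
# Sketch: the dev -> production pipeline with error handling.
set -o pipefail   # the pipeline fails if pg_dump fails, not only psql

sync_dev_to_prod() {
  local dev_host="$1" prod_host="$2" db="$3"
  # ON_ERROR_STOP=1: psql exits non-zero at the first failing statement.
  # --single-transaction -f -: run stdin as one transaction, so an error
  # rolls back everything loaded so far.
  pg_dump -Fp -U user -h "$dev_host" -T user "$db" |
    psql -U user -h "$prod_host" -v ON_ERROR_STOP=1 \
      --single-transaction -f - "$db"
}
```

For example: sync_dev_to_prod host_to_dev host_to_production your_db || echo "sync failed" >&2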

Postgresql transfer table data from different databases

I have two apps with the same tables. One app collects data from the web, and I want to send that data to the database of my second app (a web app).
With the command below, I created a file containing the data:
pg_dump -U username -t public."table_name" -d database_name --inserts > table_name.sql
The problem is that I only want to insert data that does not already exist in the second database.
If I try the command below, I get a lot of "already exists" errors:
psql -U username second_database_name < table_name.sql
One of the errors:
multiple primary keys for table "table_name" are not allowed
Another one:
relation "table_name_attribute_442....c74_uniq" already exists
--clean , --if-exists ... What should I do?
The way I did it was to run a pg_dump that creates a compressed archive suitable for use with pg_restore, which has the flags needed to import the data without throwing errors.
For example:
pg_dump -Fc -h 127.0.0.1 {db_name_here} > {dump_file_name_here}
The "-Fc" gets you the custom-format archive that pg_restore expects; pg_restore rejects a plain-SQL dump made without those magic letters.
Now you can restore the file with:
pg_restore -O -h 127.0.0.1 --clean --disable-triggers -d {target_db_name} {dump_file_name_here}
Voila - the data is now in the target_db.
If the 'psql' command has equivalent flags, I don't know what they are, and I did not find them searching SO. But hopefully this provides a DB dump/restore that 'just works' for those who need to get back to using the DB, instead of fiddling with trying to get a simple dump/restore to behave as expected.
Also note, if you do not specify ...
-h 127.0.0.1
... the client will try to connect through a Unix-domain socket, which may or may not be configured correctly. Chances are your application reaches the DB over TCP that way, so you never configured the Unix socket to match the location the command-line tools look for by default, which can differ from where the server setup actually puts it (of course, because things "just working" so you can "just use it" is apparently not possible).
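If the goal really is to add only the rows missing from the second database, rather than replacing its contents, pg_dump 12 and later has --on-conflict-do-nothing. It requires --inserts or --column-inserts and appends ON CONFLICT DO NOTHING to every INSERT, and adding --data-only leaves out the CREATE statements behind the "already exists" errors. This sketch assumes a 9.5+ target (for ON CONFLICT), keeps the username placeholder from the question, and uses a made-up function name; rows are skipped only when they collide with a primary key or unique constraint.

```shell
#!/usr/bin/env bash
# Sketch: copy only the rows of one table that the target doesn't have yet.
# Requires pg_dump 12+ (for --on-conflict-do-nothing) and a 9.5+ target.
set -o pipefail

copy_missing_rows() {
  local src_db="$1" dst_db="$2" table="$3"
  # --data-only: no CREATE TABLE / ADD CONSTRAINT, just the rows.
  # --column-inserts --on-conflict-do-nothing: one
  #   INSERT INTO ... (cols) VALUES (...) ON CONFLICT DO NOTHING; per row.
  pg_dump -U username --data-only --column-inserts --on-conflict-do-nothing \
    -t "$table" "$src_db" |
    psql -U username -v ON_ERROR_STOP=1 "$dst_db"
}
```

For example: copy_missing_rows database_name second_database_name public.table_name. --column-inserts is used instead of --inserts so the statements still work if the two apps declare the columns in a different order.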

Heroku: Storing local MongoDB to MongoLab

It might be a dead-simple question, but I still wanted to ask. I've created a Node.js application and deployed it on Heroku. I've also set up the database connection without any trouble.
However, I cannot load the local data in my MongoDB into the MongoLab database I use on Heroku. I've searched Google and could not find a useful solution, so I ended up trying these commands:
mongodump
And:
mongorestore -h mydburl:mydbport -d mydbname -u myusername -p mypassword --db Collect.1
Now, when I run mongorestore, I receive this error:
ERROR: multiple occurrences
Import BSON files into MongoDB.
When I look at the DB files for the MongoDB I used during local development, I see Collect.0, Collect.1 and Collect.ns. I know that my db name is 'Collect', since when I use the shell I always type `use Collect`. So I specified the db as Collect.1 on the command line, but I still receive the same error. Should I remove all the other Collect files, or is there another way around this?
You can't use 'mongorestore' against the raw database files. 'mongorestore' is meant to work off a dump generated by 'mongodump'. First use 'mongodump' to dump your local database, and then use 'mongorestore' to restore that dump.
If you go to the Tools tab in the MongoLab UI for your database, and click 'Import / Export' you can see an example of each command with the correct params for your database.
Email us at support@mongolab.com if you continue to have trouble.
-will
This can be done in two steps.
1. Dump the database
mongodump -d mylocal_db_name -o dump/
2. Restore the database
mongorestore -h xyz.mongolab.com:12345 -d remote_db_name -u username -p password dump/mylocal_db_name/
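The two steps can be wrapped in a small script. This sketch assumes the legacy mongodump/mongorestore flags used above (-d, -o, -h, -u, -p); the host, port, names and credentials are placeholders, the function name is made up, and the dump goes to a temporary directory that is removed afterwards.

```shell
#!/usr/bin/env bash
# Sketch: copy a local MongoDB database to a hosted (e.g. MongoLab) one.
copy_local_to_remote() {
  local local_db="$1" remote_host="$2" remote_db="$3" user="$4" pass="$5"
  local dump_dir status=0
  dump_dir="$(mktemp -d)"
  # Step 1 writes BSON files to $dump_dir/$local_db/;
  # step 2 restores that directory, not the raw Collect.0/Collect.ns files.
  mongodump -d "$local_db" -o "$dump_dir" &&
    mongorestore -h "$remote_host" -d "$remote_db" -u "$user" -p "$pass" \
      "$dump_dir/$local_db/" || status=$?
  rm -rf "$dump_dir"
  return "$status"
}
```

For example: copy_local_to_remote Collect xyz.mongolab.com:12345 remote_db_name username password. Passing the password with -p makes it visible in the process list, which may matter on shared machines.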