How to copy a PostgreSQL RDS database within an RDS instance

I had so much trouble doing this - I thought I would make a Q/A on StackOverflow to explain the process.
The question is about copying an RDS Postgres database for development use - especially for testing database migration scripts, etc. That's why the focus is on a "single schema" within a "single database".
In my case, I want to create a test database that's as isolated as possible, while remaining within a single RDS instance (because spinning up entire RDS instances takes anywhere from 5 - 15 minutes and because I'm cheap).

Here is an answer using only the command line.
Prerequisites:
you must have the Postgres client tools installed (you don't need the actual server)
the client version must be the same as or higher than your Postgres server version
network access to the RDS instance
credentials for accessing the relevant database accounts
Example context:
I have an RDS instance at rds.example.com which has a master user named rds_master.
I have an "application user" named db_dev_user, a database named dev_db that contains the schema app_schema.
note that "user" and "role" in postgres are synonymous
Note: this guide was written in 2017, for postgres version 9.6.
If you find that some steps are no longer working on a recent version of postgres - please do post any fixes to this post as comments or alternative answers.
pg_dump prints out the schema and data of the original database and will work even while there are active connections to the database. Of course, performance for those connections is likely to be affected, but the resulting copy of the DB is transactionally consistent - a snapshot of the database as it was when the dump started.
pg_dump --host=rds.example.com --port=5432 \
--format=custom \
--username=db_dev_user --dbname=dev_db \
> pgdumped
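As a quick optional sanity check before restoring, pg_restore can print the table of contents of the custom-format archive without connecting to any database, which confirms the dump is readable and shows what it contains:
pg_restore --list ./pgdumped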
The createuser command creates the user that your test application/processes should connect with (for better isolation); note that the created user is not a superuser and cannot create databases or roles.
createuser --host=rds.example.com --port=5432 \
--username=rds_master \
--no-createdb --no-createrole --no-superuser \
--login --pwprompt \
db_test_user
Without this next grant command, the createdb below will fail - on RDS the master user is not a true superuser, and a role can only create a database owned by another role if it is a member of that role:
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=postgres \
--command="grant db_test_user TO rds_master"
createdb does what it says on the tin; note that the db_test_user role "owns" the DB.
createdb --host=rds.example.com --port=5432 \
--username=rds_master --owner=db_test_user test_db
The create schema command is next. db_test_user cannot create the schema itself, but it must be made the schema's authorized owner, or the later pg_restore would fail by ending up trying to restore into the pg_catalog schema (so note that user=rds_master, but dbname=test_db).
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=test_db \
--command="create schema app_schema authorization db_test_user"
Finally, we issue the pg_restore command, to actually create the schema objects (tables, etc.) and load the data into them:
pg_restore --host=rds.example.com --port=5432 \
--verbose --exit-on-error --single-transaction \
--username=db_test_user --schema=app_schema \
--dbname=test_db --no-owner \
./pgdumped
exit-on-error - because otherwise finding out what went wrong involves too much scrolling and scanning (also it's implied by single-transaction anyway)
single-transaction - avoids having to drop or recreate the DB if things go pear-shaped
schema - only do the schema we care about (can also supply this to the original pg_dump command)
dbname - to ensure use of the DB we created
no-owner - we're connecting as db_test_user anyway, so everything should be owned by the right user
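Once the restore finishes, a quick smoke test confirms that db_test_user can read its data (some_table here is a hypothetical name - substitute one of your own tables):
psql --host=rds.example.com --port=5432 \
--username=db_test_user --dbname=test_db \
--command="select count(*) from app_schema.some_table"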

For production, you'd be better off just taking an RDS snapshot of your instance and restoring that, which will create an entirely new RDS instance.
On a mostly empty database - it takes a few minutes to create the snapshot and another 5 minutes or so to create the new RDS instance (that's part of why it's a pain during development).
You will be charged for the new RDS instance only while it is running. Staying within the free tier is one of the reasons I wanted to create this DB within the same instance for development purposes, plus not having to deal with a second DNS name; and that effect is multiplied as you start to have multiple small development environments.
Running a second RDS instance is the better option for production because you nearly completely eliminate any risk to your original DB. Also, when you're dealing with real amounts of data, snapshot/DB creation times will be dwarfed by the amount of time spent reading/writing the data. For large amounts of data, it's likely the Amazon RDS snapshot creation/restore process is going to have far better parallelisation than a set of scripts running on a single server somewhere. Additionally, the RDS console gives you visibility into the progress of the restore - which becomes invaluable as the dataset grows larger and more people become involved.

Related

AWS postgres copying data from one database to the other daily?

I want to create an automated job that copies the entire database to a different one; both are in AWS RDS Postgres. How can I do that?
Thanks.
You can use RDS snapshot create/restore.
Here is an example using the command line:
aws rds create-db-snapshot \
--db-instance-identifier mydbinstance \
--db-snapshot-identifier mydbsnapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mynewdbinstance \
--db-snapshot-identifier mydbsnapshot
The same APIs, such as CreateDBSnapshot, are available for multiple languages via the AWS SDKs.
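To run this on a schedule, a minimal cron-able sketch (reusing the identifiers above; the date-stamped snapshot name is just a suggestion) could look like:
SNAP="mydbinstance-$(date +%Y%m%d)"
aws rds create-db-snapshot \
--db-instance-identifier mydbinstance \
--db-snapshot-identifier "$SNAP"
aws rds wait db-snapshot-available \
--db-snapshot-identifier "$SNAP"
The wait subcommand blocks until the snapshot becomes available, so a follow-up restore step can run safely after it.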
I have had success in the past running a script that dumps data from one Postgres server and pipes it into another server. It was basically like this pseudo-code:
psql target-database -c "truncate foo"
pg_dump source-database --data-only --table=foo | psql target-database
The pg_dump command outputs normal SQL commands that can be piped into a receiving psql command, which then inserts the data.
To understand how this works, run pg_dump on one table and then take a look at the output. You'll need to tweak the command to get exactly what you want (e.g. using --no-owner to avoid sending access configurations).
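As a hedged starting point (source_db and target_db are placeholder names, assuming both are reachable with your default connection settings), a whole-database copy without ownership or privilege statements can be as short as:
pg_dump --no-owner --no-privileges source_db | psql target_db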

Why did SQL syntax change after restoring local Postgres onto AWS RDS?

Problem
A simple 2x2 table of data explains my problem. Both databases can be made to work, but they behave differently and I need them to be the same.
PostgreSQL Query          | Local DB | Amazon-RDS
--------------------------+----------+----------------------------------------
SELECT * from mydb.users; | Success  | Success
SELECT * from users;      | Success  | ERROR: relation "users" does not exist
Details
The databases should be identical. Amazon-RDS is literally pg_restore'd from a pg_dump of the local database. Exact commands:
$ pg_dump --format=c --no-privileges --no-owner --verbose \
--host=localhost --port=5432 --username=gary mydb
$ pg_restore --no-owner --no-tablespaces --dbname=mydb --verbose \
--host=127.0.0.1 --port=47737 \
--username=XXXXXX --format=c
(port 47737 is an SSH tunnel to the RDS instance)
The problem is not with the data dump itself. I've wiped the local database, restored it from the dump, and it still behaves the way it's supposed to.
The problem doesn't manifest just with my raw SQL queries; there's a sizable Node/Express app that is supposed to front-end the database, and it generates queries without the database prefix in front of the tables too. The app uses Sequelize for an ORM and has been running with MySQL on Amazon-RDS in production for years. The issue I'm seeing now has only appeared while migrating from MySQL to PostgreSQL.
I have no experience with Postgres.
I don't think it should matter, but in full disclosure, I'm using DBeaver to handle all my database connections, and do the db dump and restore.
Questions
Why does one database successfully infer the database from the name of the table alone, while the other cannot?
Is there a configuration setting somewhere to make them both work? mydb is the only database in the RDS instance.
mydb is not a database, it's a schema. And it appears that it is not in the schema search_path on RDS.
It could be configured in the cluster settings, the settings of your database, the settings of your (login) user, or locally on the DBeaver connection.
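For example (a sketch - adjust names and connection flags to your setup), you can inspect the effective path and then persist a fix at the database level so every new connection sees the mydb schema:
psql --dbname=mydb --command="SHOW search_path"
psql --dbname=mydb --command="ALTER DATABASE mydb SET search_path = mydb, public"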

Restore database from production to Development

We have a database named 'itreport' on the production server and a database named 'itreport_dev' on the development server.
1) On the production server, 52 users are present in the database 'itreport'.
2) On the development server, 60 users are present in the database 'itreport_dev'.
3) I have taken a dump of the production database 'itreport'. The dump file name is backup_12082017.sql.
My questions are:
If I restore the above dump (backup) file to the development server database 'itreport_dev', will the 60 users present still be present in the development database?
If not, what option do we have to give in the restore process?
What are the pre-steps and post-steps to be performed on the development server?
Short answer: No, roles are not part of a single-database backup.
If you dump only the database using pg_dump, the restore will only recreate tables and data, not any roles. Any objects owned by missing roles will end up owned by the user performing the restore (this user should be a superuser).
If you do pg_dumpall, roles and all databases will be backed up.
Roles can be backed up separately using pg_dumpall -r.
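For example (a sketch of the roles-only round trip; connection flags are omitted and the file name is a placeholder):
pg_dumpall --roles-only > roles.sql
psql --dbname=postgres --file=roles.sql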
If you do pg_dumpall --clean, the restore will destroy and replace any databases and roles on the dev server that also exist in the dump. Any names that are not in both will be unaffected (the special role "postgres" and template databases are also untouched).
pg_dumpall backups are plain SQL backups and should be restored using psql:
su postgres -c psql < all-database-backupfile.sql
or
zcat all-database-backupfile.sql.gz | su postgres -c psql
etc.
(for Windows use runas instead of su; I'm not sure of the exact syntax needed)

Importing data into two postgres servers works on one, not on other

I dumped my production db from an Amazon RDS postgresql instance and on occasion, I restore production to our staging and development databases.
Currently the staging and development databases reside on an RDS instance, and the import works great. I am now attempting to restore the database to a Postgres installation that isn't an RDS instance, and I continuously get the error invalid command \N; before that, I get ERROR: relation "locations" does not exist. I have been trying everything to get this to work. I have recreated the database several times, ensuring all of the settings match what I can see of the RDS instance, and am having no luck.
I am attempting to use psql -h {host} -U {user} -d {db} < production.sql

Copying postgresql data without roles

I have a snapshot of a Postgres database named srcDB in a file named srcDB-cachedump.gz, obtained using pg_dump. The dump is from a production database, so there are a bunch of different roles. What I would like to do is make a development database called devDB, which has only one role, a user by the name of 'development', but with the same schema as the original database.
The way I initially tried to do this was:
gunzip -c srcDB-cachedump.gz | psql -d devDB -U development -W
When I ran this, however, I got many errors that boiled down to, essentially, "role does not exist." I would like to bypass the recreation of the roles in the production database if at all possible, as I have other programmers on my team and I would like the dev environment to be as portable as possible. I am relatively new at Postgres administration, though, so I am at a loss.
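A hedged sketch in line with the --no-owner advice above, assuming you can regenerate the dump from srcDB: produce it without ownership and privilege statements, so the restore never references the production roles:
pg_dump --no-owner --no-privileges srcDB | gzip > srcDB-cachedump.gz
gunzip -c srcDB-cachedump.gz | psql -d devDB -U development -W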