I want to create an automated job that can copy the entire database to a different one, both are in AWS RDS Postgres, how can I do that?
Thanks.
You can use RDS DB snapshots: create a snapshot of the source instance, then restore it to a new instance.
Here is an example using the AWS CLI:
aws rds create-db-snapshot \
--db-instance-identifier mydbinstance \
--db-snapshot-identifier mydbsnapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mynewdbinstance \
--db-snapshot-identifier mydbsnapshot
The same APIs, such as CreateDBSnapshot, are available in multiple languages via the AWS SDKs.
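If you want this to run unattended, the same CLI calls can be scripted with the built-in AWS CLI waiters; a rough sketch, reusing the identifiers from the example above (snapshot identifiers must be unique, so a real job would timestamp them):
#!/bin/bash
set -euo pipefail
# create the snapshot and wait for it to become available
aws rds create-db-snapshot \
  --db-instance-identifier mydbinstance \
  --db-snapshot-identifier mydbsnapshot
aws rds wait db-snapshot-available \
  --db-snapshot-identifier mydbsnapshot
# restore the snapshot into a new instance and wait for it to come up
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier mynewdbinstance \
  --db-snapshot-identifier mydbsnapshot
aws rds wait db-instance-available \
  --db-instance-identifier mynewdbinstance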
I have had success in the past running a script that dumps data from one Postgres server and pipes it into another server. It was basically like this pseudo-code:
psql target-database -c "truncate foo"
pg_dump source-database --data-only --table=foo | psql target-database
The pg_dump command outputs normal SQL commands that can be piped into a receiving psql command, which then inserts the data.
To understand how this works, run pg_dump on one table and take a look at the output. You'll need to tweak the command to get exactly what you want (e.g. using --no-owner to avoid sending ownership/access configuration).
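A slightly more complete version of that pipeline, with explicit connection flags for two remote servers, might look like the following (a sketch only; the host, user and database names are placeholders):
psql --host=target.example.com --username=app_user --dbname=target_db --command="TRUNCATE foo"
pg_dump --host=source.example.com --username=app_user --dbname=source_db --data-only --table=foo | psql --host=target.example.com --username=app_user --dbname=target_db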
I have an out-of-date local Postgres server, running in a Docker container, and now I want to update it with all new records from the production DB,
which runs on Azure Postgres DB.
I'm aware of pg_dump, as in this Answer,
but I'm not clear where I should run it - the Azure DB doesn't know about the local one and vice versa.
There are multiple methods you can try.
The most common and simplest approach is to use the pg_dump and pg_restore commands in bash.
In this method, you first create a dump from the source server using pg_dump, then restore that dump file to the target server using pg_restore.
To back up an existing PostgreSQL database, run the following command:
pg_dump -Fc -v --host=<host> --username=<name> --dbname=<database name> -f <database>.dump
Once the file has been created, download it to your local environment.
After you've created the target database, you can use the pg_restore command and the --dbname parameter to restore the data into the target database from the dump file.
pg_restore -v --no-owner --host=<server name> --port=<port> --username=<user-name> --dbname=<target database name> <database>.dump
To find more upgrade methods, you can refer to https://learn.microsoft.com/en-us/azure/postgresql/how-to-upgrade-using-dump-and-restore#method-1-using-pg_dump-and-psql.
For more details on the pg_dump and pg_restore method, please refer to the Microsoft official document Migrate your PostgreSQL database by using dump and restore.
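In your situation you would run both commands from your local machine, which can reach the Azure server over the network and the Docker container through its published port; a minimal sketch, with placeholder host, user and database names, assuming the container publishes Postgres on localhost:5432:
pg_dump -Fc -v --host=myserver.postgres.database.azure.com --username=azureuser --dbname=prod_db -f prod_db.dump
pg_restore -v --no-owner --host=localhost --port=5432 --username=postgres --dbname=local_db prod_db.dump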
I'm having difficulty restoring a DB to an AWS RDS PostgreSQL instance. The context is that I am backing up from one RDS instance and restoring to another RDS instance. They both have the same version of PostgreSQL, 9.6.5.
I was able to take a dump using the following command:
./pg_dump.exe -U dbuser -W -h prod-pgsql-rds.3ft5coqxjdnq.eu-west-2.rds.amazonaws.com -d devdb > c:\tmp\backup.sql
From the resulting .sql file, I then attempted a restore to another RDS instance, which is also using PostgreSQL 9.6.5, using the command below:
./pg_restore.exe -U dbuser -d testdevdb -h dev-pgsql-rds.cym8coqx52lq.eu-west-2.rds.amazonaws.com "c:\tmp\backup.sql"
(I also tried the -f switch in the above restore command instead of the quotes before/after the file name.)
But when I try to restore it to a newly created database I get the following error:
pg_restore: [archiver] input file does not appear to be a valid archive
Can anyone help? FYI, I am using pgAdmin 4 via Windows PowerShell. I have had to edit some of the values in the strings above due to data sensitivity.
pg_restore is only used for the other, non-plain-text output formats that pg_dump can output. For .sql dumps, you just use psql. See the docs on restoring from backups.
In a Unix environment, you'd do psql [yourflags] < /tmp/backup.sql, but I'm unfamiliar with PowerShell and don't know if it supports < for input redirection; hopefully either it's present or you know the equivalent PowerShell syntax.
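Alternatively, psql's -f flag reads the file directly and sidesteps shell redirection entirely; a small sketch reusing the host, user and file names from the question:
./psql.exe -U dbuser -h dev-pgsql-rds.cym8coqx52lq.eu-west-2.rds.amazonaws.com -d testdevdb -f "c:\tmp\backup.sql"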
So I couldn't get psql or pg_restore to work, so I opted to import the .sql file via the SQL query tool in pgAdmin. This threw up some errors, so I had to make several changes to the .sql file and perform the steps below:
Commented out a couple of lines that were causing errors
Elevated permissions for the user and made them the owner of the schema and DB properties by right-clicking on these via pgAdmin
The .sql file made several references to the user from the source RDS DB, so I had to do a find and replace with a user account created for the destination RDS DB (see the sketch after this list). Alternatively, I could have just created a new user on the destination DB with the same username and password as the source DB and then made it the owner, as in step 2.
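The find and replace in step 3 could also be done from the command line; a sketch assuming GNU sed (e.g. via Git Bash) and hypothetical usernames source_user and dest_user:
# source_user and dest_user are stand-ins for the real account names
sed -i 's/source_user/dest_user/g' backup.sql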
I am trying to take a backup of a schema from a remote PostgreSQL server (version 11.5). The command below, used to take the backup, works:
pg_dump -CFc -h host -U user -d database -n schema -f "/path/data.backup"
The verbose output shows that the table contents are also dumped.
I am using the DBeaver tool to restore the backup into the PostgreSQL server (version 11.5) installed on my local machine. The restore works but the tables are empty.
Is there any other option that needs to be added to export the data into the backup file?
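One way to confirm that the table data really is in the dump file is to list the archive contents without restoring anything; a small sketch using the path from the question:
# each table with data in the archive shows up as a "TABLE DATA" entry
pg_restore --list /path/data.backup | grep "TABLE DATA"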
I've dumped my database as described in Exporting data from an externally-managed database server:
pg_dump -U [USERNAME] --format=plain --no-owner \
--no-acl [DATABASE_NAME] \
| sed -E 's/(DROP|CREATE|COMMENT ON) EXTENSION/-- \1 EXTENSION/g' > [SQL_FILE].sql
The database I'm dumping from is running PostgreSQL 9.6.6. Google Cloud SQL also uses 9.6.
Then I have copied the db-dump to a bucket and tried to restore it as described here.
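For reference, the import step described there boils down to something like this when done with the gcloud CLI (the instance and bucket names are placeholders):
gcloud sql import sql my-instance gs://my-bucket/[SQL_FILE].sql --database=[DATABASE_NAME]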
That yields an error message in the web interface at cloud.google.com.
Any idea how I fix that?
You are missing the pgcrypto extension. The sed post-processing comments out all extension statements in the SQL dump file. You need to un-comment the necessary, Cloud SQL-supported extensions like pgcrypto, and leave only the unsupported ones commented out. You can find info about supported extensions at https://cloud.google.com/sql/docs/postgres/extensions.
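For example, after running the sed from the question you could un-comment just the pgcrypto statement again; a sketch assuming GNU sed and the [SQL_FILE].sql name used above:
# re-enable the CREATE EXTENSION line for pgcrypto that the blanket sed commented out
sed -E -i 's/^-- (CREATE EXTENSION (IF NOT EXISTS )?pgcrypto)/\1/' [SQL_FILE].sql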
I had so much trouble doing this - I thought I would make a Q/A on StackOverflow to explain the process.
The question is about copying an RDS Postgres database for development usage - especially for testing database migration scripts, etc. That's why the focus is on a "single schema" within a "single database".
In my case, I want to create a test database that's as isolated as possible, while remaining within a single RDS instance (because spinning up entire RDS instances takes anywhere from 5 - 15 minutes and because I'm cheap).
Here is an answer using only the command line.
Pre-requisites:
you must have the Postgres client tools installed (you don't need the actual server)
the client version must be the same as or higher than your Postgres server version
network access to the RDS instance
credentials for accessing the relevant database accounts
Example context:
I have an RDS instance at rds.example.com which has a master user named rds_master.
I have an "application user" named db_dev_user, a database named dev_db that contains the schema app_schema.
note that "user" and "role" in postgres are synonymous
Note: this guide was written in 2017, for postgres version 9.6.
If you find that some steps are no longer working on a recent version of postgres - please do post any fixes to this post as comments or alternative answers.
pg_dump prints out the schema and data of the original database and will work even while there are active connections to the database. Of course, performance for those connections is likely to be affected, but the resulting copy of the DB is transactionally consistent.
pg_dump --host=rds.example.com --port=5432 \
--format=custom \
--username=db_dev_user --dbname=dev_db \
> pgdumped
The createuser command creates the user that your test application/processes should connect with (for better isolation); note that the created user is not a superuser and cannot create databases or roles.
createuser --host=rds.example.com --port=5432 \
--username=rds_master \
--no-createdb --no-createrole --no-superuser \
--login --pwprompt \
db_test_user
Without this next grant command the following createdb will fail:
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=postgres \
--command="grant db_test_user TO rds_master"
createdb does what it says on the tin; note that the db_test_user role "owns" the DB.
createdb --host=rds.example.com --port=5432 \
--username=rds_master --owner=db_test_user test_db
The create schema command is next. The db_test_user cannot create the schema, but it must be authorized for the schema or the pg_restore would fail because it would end up trying to restore into the pg_catalog schema (so note that user=rds_master, but dbname=test_db).
psql --host=rds.example.com --port=5432 \
--username=rds_master --dbname=test_db \
--command="create schema app_schema authorization db_test_user"
Finally, we issue the pg_restore command, to actually create the schema objects (tables, etc.) and load the data into them:
pg_restore --host=rds.example.com --port=5432 \
--verbose --exit-on-error --single-transaction \
--username=db_test_user --schema=app_schema \
--dbname=test_db --no-owner \
./pgdumped
exit-on-error - because otherwise finding out what went wrong involves too much scrolling and scanning (also it's implied by single-transaction anyway)
single-transaction - avoids having to drop or recreate the DB if things go pear-shaped
schema - only do the schema we care about (can also supply this to the original pg_dump command)
dbname - to ensure use of the DB we created
no-owner - we're connecting as db_test_user anyway, so everything should be owned by the right user
For production, you'd be better off just taking an RDS snapshot of your instance and restoring that, which will create an entirely new RDS instance.
On a mostly empty database, it takes a few minutes to create the snapshot and another 5 minutes or so to create the new RDS instance (that's part of why it's a pain during development).
You will be charged for the new RDS instance only while it is running. Staying within the free tier is one of the reasons I wanted to create this DB within the same instance for development purposes, plus not having to deal with a second DNS name; and that effect is multiplied as you start to have multiple small development environments.
Running a second RDS instance is the better option for production because you almost completely eliminate any risk to your original DB. Also, when you're dealing with real amounts of data, snapshot/DB creation times will be dwarfed by the time spent reading/writing the data. For large amounts of data, the Amazon RDS snapshot creation/restore process is likely to have far better parallelisation than a set of scripts running on a single server somewhere. Additionally, the RDS console gives you visibility into the progress of the restore, which becomes invaluable as the dataset grows larger and more people become involved.