Can you use AWS DMS to move Aurora DB from one account to another?

I am trying to migrate an Aurora cluster from one of our accounts to another. We do not actually have a lot of write requests and the database itself is quite small, but we still want to minimize downtime.
I have looked into several options:
Use a snapshot: stop writes to the source DB, take a snapshot, share it with the other account and restore it there. This would definitely introduce some downtime.
Use Aurora cloning: stop writes to the source DB, clone the cluster into the target account and switch over to the target DB. According to AWS, cloning is much faster than taking and restoring a snapshot, so the downtime should be shorter (a clone sketch follows below).
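For reference, a clone is essentially a copy-on-write point-in-time restore. A minimal sketch of the clone step, assuming a cluster named source-aurora-cluster (a placeholder); note that cross-account cloning additionally requires sharing the cluster with the target account first, e.g. via AWS RAM:
# clone the cluster at its latest restorable time
aws rds restore-db-cluster-to-point-in-time \
--source-db-cluster-identifier source-aurora-cluster \
--db-cluster-identifier aurora-clone \
--restore-type copy-on-write \
--use-latest-restorable-time
The clone starts out as a cluster without instances, so you still have to add one with create-db-instance before it can serve traffic.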
I am not sure whether I can use DMS for this, as I did not find any useful docs/tutorials about moving Aurora across accounts. I am also not sure whether DMS will sync write requests to the target DB during the migration.
If DMS cannot sync live, then I should probably use Bucardo for a live migration.

Looking at the docs, Aurora with PostgreSQL compatibility is supported as both a source and a target endpoint. So, answering your question: yes, it's possible.
Obviously, your source Aurora DB must be accessible from the target account. Check that the DB endpoint is reachable and that traffic is not blocked by network ACL or security group rules.
Also, if you want to enable ongoing replication, you need to grant the rds_replication (or rds_superuser) role to the source database user. Link to the docs.
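For example, the grant might look like this (the endpoint and dms_user are placeholders for whatever account the DMS source endpoint connects as):
# grant the replication role to the DMS source user
psql -h source-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com -U postgres -d appdb \
-c "GRANT rds_replication TO dms_user;"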

We actually ended up using DMS for this migration. What we did (the key commands are sketched after this list):
Take a snapshot of the source DB in the original account.
Share the snapshot with the target account and restore it over there. (You have to use a snapshot to migrate things like triggers, custom types, sequences, etc.)
Set up connectivity (e.g., VPC peering and security groups) between the two accounts.
Set up DMS in the source account (endpoints, replication instance, task).
Write SQL to temporarily disable/drop constraints, triggers, etc. that may cause errors when the source data is loaded.
Use DMS to load the source data and enable ongoing replication.
Add the constraints, triggers, etc. back and re-enable them.
Run post-migration tests.
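A rough sketch of the key commands for steps 1-2 and 5; all identifiers, hosts and account IDs below are placeholders:
# Steps 1-2: snapshot the source cluster, then share the snapshot with the
# target account (222222222222)
aws rds create-db-cluster-snapshot \
--db-cluster-identifier source-cluster \
--db-cluster-snapshot-identifier pre-migration-snap
aws rds modify-db-cluster-snapshot-attribute \
--db-cluster-snapshot-identifier pre-migration-snap \
--attribute-name restore \
--values-to-add 222222222222

# In the target account: restore from the shared snapshot ARN, then add an
# instance to the restored cluster with create-db-instance
aws rds restore-db-cluster-from-snapshot \
--db-cluster-identifier target-cluster \
--snapshot-identifier arn:aws:rds:us-east-1:111111111111:cluster-snapshot:pre-migration-snap \
--engine aurora-postgresql

# Step 5: disable triggers per table before the full load; on RDS you may need
# SET session_replication_role = replica instead, since disabling system
# triggers (foreign keys) requires superuser
psql -h target-cluster.cluster-yyyy.us-east-1.rds.amazonaws.com -U postgres -d appdb \
-c "ALTER TABLE my_table DISABLE TRIGGER ALL;"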

Related

How to set up a replication instance in on-premises Postgres for a master database in AWS RDS Postgres?

I have a requirement to check whether an exact copy of a master database in AWS RDS can be created on-premises.
I have already established connectivity between on-prem and AWS, and have checked data migration using pg_dump. But I cannot figure out how to create the replica without using DMS; for security reasons we are not allowed to use DMS. Is there any other way to implement this?
Any help will be much appreciated.
It appears that your goal is disaster recovery.
Amazon RDS offers a few options for this:
Amazon RDS snapshots are backups of the database, stored in a region. If your database is in an Availability Zone that fails, a snapshot can be restored as a new database in another AZ. All AZs are physically separate data centers, much like your own data center is physically separate from an AWS data center.
Snapshots can also be copied to other Regions, which would guarantee a separation distance between data centers.
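For example, a cross-region copy from the CLI might look like this (identifiers are placeholders; encrypted snapshots additionally need --kms-key-id with a key in the destination region):
# run in the destination region; the source snapshot is referenced by ARN
aws rds copy-db-snapshot \
--source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snap \
--target-db-snapshot-identifier mydb-snap-copy \
--region us-west-2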
Multi-AZ Amazon RDS databases keep a second copy of the data in another AZ and can switch over to the alternate AZ without losing any data. This is faster than restoring a snapshot, but costs twice as much, since two separate database servers are deployed.
These options would be easier to manage than replicating your data to an on-premises system. A Multi-AZ deployment will automatically start the secondary instance, so your app can continue operating with only a short delay and no data loss. This is much better than what you could offer by failing over to an on-premises system.

Create an RDS/Postgres Replica in another AWS account?

I have an AWS account with a Postgres RDS database that represents the production environment for an app. We have another team that is building an analytics infrastructure in a different AWS account. They need to be able to pull data from our production database to hydrate their reports.
From my research so far, it seems there are a couple of options:
Create a bash script that runs on a cron schedule, uses pg_dump and pg_restore, and stash it on an EC2 instance in one of the accounts (sketched after this list).
Automate creating a snapshot on a schedule and then ship it to the other account's S3 bucket. Then create a Lambda (or other script) that triggers when the snapshot lands in the S3 bucket and restores it. The downside is that we'd have to create a new RDS instance with each restore (since you can't restore a snapshot to an existing instance), which changes the FQDN of the database (we could mitigate that with Route53 and a CNAME that gets updated, but it is complicated).
Create a read replica in the origin AWS account and open up security for that instance so they can access it directly (but then my account is responsible for all the costs associated with hosting and accessing it).
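For what it's worth, the first option can be as small as a cron entry plus a script along these lines (hosts, users and database names are placeholders; credentials would live in ~/.pgpass):
#!/bin/bash
# nightly_sync.sh -- dump production, restore into the analytics account
# crontab entry: 0 3 * * * /opt/scripts/nightly_sync.sh
set -euo pipefail
pg_dump -Fc -h prod-db.xxxx.us-east-1.rds.amazonaws.com -U readonly_user appdb > /tmp/appdb.dump
# --clean drops and recreates objects; --no-owner avoids role mismatches across accounts
pg_restore --clean --no-owner -h analytics-db.yyyy.us-east-1.rds.amazonaws.com -U analytics_admin -d appdb /tmp/appdb.dump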
None of these seem like good options. Is there some other way to accomplish this?
I would suggest using AWS Database Migration Service (DMS). It can listen to changes on your source database and stream them to a target (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html).
There is also a third-party blog post explaining how to set this up:
https://medium.com/tensult/cross-account-and-cross-region-rds-mysql-db-replication-part-1-55d307c7ae65
Pricing is per hour, depending on the size of the replication EC2 instance. It runs in the target account, so it will not be on your cost center.
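To make that concrete, here is a sketch of the task creation once the endpoints and the replication instance exist in the target account (all ARNs and names below are placeholders):
aws dms create-replication-task \
--replication-task-identifier prod-to-analytics \
--source-endpoint-arn arn:aws:dms:us-east-1:222222222222:endpoint:SRCENDPOINT \
--target-endpoint-arn arn:aws:dms:us-east-1:222222222222:endpoint:TGTENDPOINT \
--replication-instance-arn arn:aws:dms:us-east-1:222222222222:rep:REPLINSTANCE \
--migration-type full-load-and-cdc \
--table-mappings file://mappings.json

# mappings.json picks the schemas/tables to replicate, e.g.:
# {"rules": [{"rule-type": "selection", "rule-id": "1", "rule-name": "all-public",
#             "object-locator": {"schema-name": "public", "table-name": "%"},
#             "rule-action": "include"}]}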

Gradual PostgreSQL database migration from an AWS EC2 instance to Amazon's RDS

We have a quite large (at least to us) database with over 20,000 tables running on an AWS EC2 instance, but for several reasons we'd like to move it into an AWS RDS instance. We've tried a few different approaches to migrate it to RDS, but given the data volume involved (2TB), RDS restrictions (users and permissions), and compatibility issues, we haven't been able to accomplish it.
Given the above, I was wondering if PostgreSQL actually supports something like mapping a remote schema into a database. If that were possible, we could tinker with individual per-schema migrations rather than migrating the whole database at once, which would make the process less painful.
I've read about the IMPORT FOREIGN SCHEMA feature, which seems to be supported from version 9.5 and seems to do the trick, but is there something like that for 9.4.9?
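For context, IMPORT FOREIGN SCHEMA works on top of postgres_fdw, and the same wrapper exists on 9.4; you just have to declare each foreign table by hand there. A sketch with placeholder connection details:
psql -d localdb <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER rds_server FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'mydb.xxxx.us-east-1.rds.amazonaws.com', dbname 'appdb', port '5432');
CREATE USER MAPPING FOR CURRENT_USER SERVER rds_server
    OPTIONS (user 'migrator', password 'secret');
CREATE SCHEMA IF NOT EXISTS staging;

-- 9.5+ only: map a whole remote schema in one statement
IMPORT FOREIGN SCHEMA public FROM SERVER rds_server INTO staging;

-- 9.4 equivalent: declare each table explicitly instead
CREATE FOREIGN TABLE staging.accounts (id integer, name text)
    SERVER rds_server OPTIONS (schema_name 'public', table_name 'accounts');
SQL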
You might want to look at AWS Database Migration Service (DMS) and the associated AWS Schema Conversion Tool (SCT).
These can move data from an existing database into RDS and convert the schema and associated objects, or at least report on what would need to be changed.
You can run this in AWS, pointing it at your existing EC2-based database as the source and a new RDS instance as the destination.
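A sketch of the two DMS endpoint definitions for that layout (server names and credentials are placeholders):
# source: the self-managed PostgreSQL on EC2
aws dms create-endpoint \
--endpoint-identifier ec2-postgres-source \
--endpoint-type source --engine-name postgres \
--server-name ec2-203-0-113-10.compute-1.amazonaws.com \
--port 5432 --username dms_user --password 'secret' --database-name appdb

# target: the new RDS instance
aws dms create-endpoint \
--endpoint-identifier rds-postgres-target \
--endpoint-type target --engine-name postgres \
--server-name appdb.xxxx.us-east-1.rds.amazonaws.com \
--port 5432 --username dms_user --password 'secret' --database-name appdb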

How to setup cross region replica of AWS RDS for PostgreSQL

I have an RDS for PostgreSQL setup in Asia and would like to have a read copy in the US.
But unfortunately I just found on the official site that only RDS for MySQL has cross-region replicas, not RDS for PostgreSQL.
I also saw this page introduce other ways to migrate data into and out of RDS for PostgreSQL.
Short of buying an EC2 instance and installing PostgreSQL myself in the US, is there any way to synchronize data from the Asia RDS to a US RDS?
It all depends on the purpose of your replication. Is it to provide a local data source and avoid network latency?
Assuming that your goal is to have cross-region replication, you have a couple of options.
Custom EC2 Instances
You can create your own EC2 instances and install PostgreSQL so you can customize replication behavior.
I've documented configuring master-slave replication with PostgreSQL on my blog: http://thedulinreport.com/2015/01/31/configuring-master-slave-replication-with-postgresql/
Of course, you lose some of the benefits of AWS RDS, namely automated multi-AZ redundancy, etc., and now all of a sudden you have to be responsible for maintaining your configuration. This is far from perfect.
Two-Phase Commit
An alternative option is to build replication into your application. One approach is to use a database driver that can do this, or to implement your own two-phase commit. If you are using Java, some ideas are described here: JDBC - Connect Multiple Databases
Use SQS to decouple database writes
OK, so this is the one I would personally prefer. For all of your database writes you should use SQS and have background writer processes that take messages off the queue.
You will need to have a writer in Asia and a writer in the US regions. To publish on SQS across regions you can utilize SNS configuration that publishes messages onto multiple queues: http://docs.aws.amazon.com/sns/latest/dg/SendMessageToSQS.html
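A sketch of that fan-out wiring (topic/queue names, regions and the account ID are placeholders; each queue also needs an access policy that lets the topic send to it):
# one topic, plus one queue per region
aws sns create-topic --name db-writes --region us-east-1
aws sqs create-queue --queue-name db-writes-us --region us-east-1
aws sqs create-queue --queue-name db-writes-asia --region ap-southeast-1

# subscribe both queues to the topic; cross-region SNS-to-SQS subscriptions work
aws sns subscribe --region us-east-1 \
--topic-arn arn:aws:sns:us-east-1:123456789012:db-writes \
--protocol sqs \
--notification-endpoint arn:aws:sqs:us-east-1:123456789012:db-writes-us
aws sns subscribe --region us-east-1 \
--topic-arn arn:aws:sns:us-east-1:123456789012:db-writes \
--protocol sqs \
--notification-endpoint arn:aws:sqs:ap-southeast-1:123456789012:db-writes-asia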
Of course, unlike a two-phase commit, this approach is subject to bugs, and it is possible for your US database to get out of sync. You will need to implement a reconciliation process; a simple one could be a weekly pg_dump from the Asia database and pg_restore into the US one to re-sync, for instance. Another approach could work like a Cassandra read repair: for every 10 reads from your US database, spin up a background process to run the same query against the Asia database, and if they return different results, kick off a process to replay some messages.
This approach is actually quite common; I've seen it used on Wall St.
So, pick your battle: either create your own EC2 instances and take ownership of configuration and devops (yuck), implement a two-phase commit that guarantees consistency, or relax the consistency requirements and use SQS with asynchronous writers.
This is now directly supported by RDS.
Example of creating a cross-region replica using the CLI:
aws rds create-db-instance-read-replica \
--db-instance-identifier DBInstanceIdentifier \
--region us-west-2 \
--source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance

Sandbox version for AWS RedShift

I have been using Redshift for a few months and I like it. But I need to add some tests around it, and I am not sure what the most cost-effective way of doing that is. All I can think of is using a single-node Redshift cluster as a sandbox, but that seems too costly, even if I only use it during testing.
Databases in Redshift cannot 'see' each other and cross-database queries are not supported. Therefore we simply have 'development', 'test' and 'production' databases on the same cluster.
When we're ready to push to production we do the following (sketched just after this list):
take a snapshot
drop production
rename test to production
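In command form, that might look like this (cluster, host and database names are placeholders; you have to be connected to a third database such as dev, since you cannot drop or rename the database you are connected to):
# safety snapshot first
aws redshift create-cluster-snapshot \
--cluster-identifier analytics-cluster \
--snapshot-identifier pre-release-snap

# swap test into production's place
psql -h analytics-cluster.xxxx.us-east-1.redshift.amazonaws.com -p 5439 -d dev <<'SQL'
DROP DATABASE production;
ALTER DATABASE test RENAME TO production;
SQL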
This generally works fine for us because we find Redshift to be over-provisioned on storage, i.e., filling our nodes to their max storage capacity does not provide acceptable performance, so there is always spare space for the extra databases.
NOTE: You cannot drop the "master" database defined when the cluster was created. If you are using that as your primary database, you will have to unload your data and recreate the cluster for this approach to be viable.
I got the answer from the AWS Redshift forum: "There is no way of creating a sandbox version of Redshift. We'll add this to our backlog of feature requests."