Replicate data from one RDS server to another - rds

Can we replicate data from one RDS server to another? Or can we set up a master-slave relationship between two RDS servers?
Should we replicate data from a non-RDS instance to an RDS instance?

RDS can replicate from an external MySQL server and can also act as the master of an external slave. Whether you "should" do it depends on your use case.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.External.Repl.html
While I guess you could set up replication between two RDS instances yourself, I don't see why you would, since starting an RDS read replica takes just a few clicks in the AWS console or a single API call.
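For the first case (an RDS MySQL instance replicating from an external MySQL master), here is a minimal sketch of the statements you would run on the RDS instance. It assumes the RDS stored procedures mysql.rds_set_external_master and mysql.rds_start_replication described on the linked documentation page; the host name, user, password and binary log coordinates below are placeholders, not values from this thread.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a MySQL string literal (escape backslashes and quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def external_master_statements(host, port, user, password,
                               log_file, log_pos, use_ssl=False):
    """Build the statements that point an RDS MySQL instance at an external master.

    All arguments are placeholders supplied by the caller; log_file and log_pos
    are the binary log coordinates recorded when the initial dump was taken.
    """
    set_master = (
        "CALL mysql.rds_set_external_master("
        f"{sql_literal(host)}, {int(port)}, "
        f"{sql_literal(user)}, {sql_literal(password)}, "
        f"{sql_literal(log_file)}, {int(log_pos)}, "
        f"{1 if use_ssl else 0});"
    )
    # Replication does not begin until it is started explicitly.
    return [set_master, "CALL mysql.rds_start_replication;"]


# Hypothetical values, for illustration only.
statements = external_master_statements(
    "source.example.com", 3306, "repl_user", "s3cret",
    "mysql-bin-changelog.000031", 107,
)
```

You would run the resulting statements with any MySQL client connected to the RDS instance, after loading an initial dump of the external database.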

It is possible to replicate data from RDS to RDS. It is also possible to replicate data from RDS to some other MySQL server.
Steps:
Create your own EC2 server and install MySQL.
Change the configuration to replicate data.
This requires additional work to manage the EC2 instance: if your data grows beyond the server's limits, you have to repeat the manual work to resize the storage or move the replica to a larger server.
RDS provides an easy mechanism to create a read replica with a few clicks. (Note: a read replica is a comparatively costly option.)
But going that route saves the manual work, and the salary, of a person who would otherwise manage the database and redo these setups regularly.

If you are using a PostgreSQL database on RDS, you can use Bucardo for asynchronous replication. You need to run it on an EC2 instance; you can also use a local system, but it will not be fast enough.
Use the following tutorial if you want to use Bucardo.
https://www.installvirtual.com/how-to-install-bucardo-for-postgres-replication/

I think you can use a snapshot to clone an RDS database into another RDS instance.

Related

Multi cloud PostgreSQL replication

I have an Azure-managed PostgreSQL database.
I want to create a logical replica of it on GCP (Google-managed, if possible).
On Azure, I've set the replication support to Logical. However, this only seems to let me create replicas inside Azure. What I want is to create a replica in GCP.
If this were self-managed rather than Azure-managed, I could create a tunnel from Azure to GCP and then do WAL copy replication.
One might wonder: why? Because I don't want to be locked in to one vendor.
If cross-cloud replication is not possible, what's the easiest way to pull the entire database off (possibly not just the data with pg_dump, but all its internals too)?
While this question is Azure -> GCP, it seems other alternatives like GCP -> AWS or other vendors are also not supported. Or what am I missing?
Cross-cloud replication from an Azure source PostgreSQL database to a GCP Cloud SQL destination is possible through conventional native logical replication; I've tested it and it works. I'm sure it would work for a self-managed database too.
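To make "native logical replication" concrete, here is a minimal sketch that builds the two statements involved: CREATE PUBLICATION on the source and CREATE SUBSCRIPTION on the destination. The publication, subscription, table, host, database and user names are hypothetical, and the sketch assumes the destination can reach the source over the network and that the connection values contain no spaces or quotes.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def quote_table(name: str) -> str:
    """Quote an optionally schema-qualified table name such as public.orders."""
    return ".".join(quote_ident(part) for part in name.split("."))


def check_conninfo_value(value) -> str:
    """Accept only simple values so the connection string needs no escaping."""
    text = str(value)
    if not text or any(ch.isspace() or ch in "'\\" for ch in text):
        raise ValueError(f"unsupported connection value: {text!r}")
    return text


def publication_sql(publication, tables):
    """Statement to run on the source (publisher) database."""
    table_list = ", ".join(quote_table(t) for t in tables)
    return f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {table_list};"


def subscription_sql(subscription, publication, host, dbname, user, password,
                     port=5432):
    """Statement to run on the destination (subscriber) database."""
    parts = {"host": host, "port": port, "dbname": dbname,
             "user": user, "password": password, "sslmode": "require"}
    conninfo = " ".join(f"{k}={check_conninfo_value(v)}" for k, v in parts.items())
    return (f"CREATE SUBSCRIPTION {quote_ident(subscription)} "
            f"CONNECTION '{conninfo}' "
            f"PUBLICATION {quote_ident(publication)};")


# Hypothetical names, for illustration only.
pub = publication_sql("gcp_pub", ["public.accounts", "public.orders"])
sub = subscription_sql("gcp_sub", "gcp_pub", "source.example.com",
                       "appdb", "replicator", "s3cret")
```

Logical replication copies row changes but not DDL, so the table definitions have to exist on the destination before the subscription is created (for example, restored from a schema-only pg_dump).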

How to set up a replica instance in on-premises Postgres for a master database in AWS RDS Postgres?

I need to check whether an exact copy of the master database in AWS RDS can be created on premises.
I have already established connectivity between on-premises and AWS, and I have also tested data migration using pg_dump. But I can't work out how to create the replica without using DMS. For security reasons we are not allowed to use DMS. Is there any other way to implement this?
Any help will be much appreciated.
It appears that your goal is disaster recovery.
Amazon RDS offers a few options for this:
Amazon RDS Snapshots are a backup of the database, stored in a region. If your database is in an Availability Zone that fails, the snapshot can be restored as a new database in another AZ. All AZs are physically separate data centers, much like your own data center is physically separate from an AWS data center.
Snapshots can also be copied to other Regions, which would guarantee a separation distance between data centers.
Multi-AZ Amazon RDS Databases keep a second copy of the data in another AZ and can switch-over to the alternate AZ without losing any data. This is faster than restoring a snapshot, but costs twice as much since two separate database servers are deployed.
These options would be easier to manage than replicating your data to an on-premises system. A Multi-AZ deployment will automatically start the secondary instance, so your app can continue operating with only a short delay and no data loss. This is much better than what you could offer by failing over to an on-premises system.

Best way to set up jupyter notebook project in AWS

My current project has the following structure:
It starts with a script in a Jupyter notebook that downloads data from a CRM API and puts it in a local PostgreSQL database that I manage with pgAdmin. After that, it runs a cluster analysis, returns some scoring values, creates a table in the database with the results, and updates these values in the CRM with another API call. This process takes between 10 and 20 hours (the API only allows 400 requests per minute).
The second notebook reads the database, detects the last update, makes API calls to update the database since the last call, runs a k-means analysis to cluster the data, compares the results with the previous run, updates the new ones, and updates the CRM via the API. By my estimate this second process takes less than 2 hours, and I want this script to run every 24 hours.
After testing, this works fine. Now I'm evaluating how to put this into production on AWS. I understand that for the notebooks I need SageMaker, and from what I have seen that is not too complicated; my only doubt is whether I can call the API without writing additional code, or whether it needs some configuration. My second problem is the database. I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3. My goal is to write as little code as possible, but I have tried some RDS tutorials like this one: https://www.youtube.com/watch?v=6fDTre5gikg&t=10s. I understand that this connects my local Postgres to AWS, but I can't find the data on the Amazon page; does it only create an instance? And how do I connect to it to analyze this data from SageMaker? My final goal is to run the notebooks in the cloud and connect to my Postgres in the cloud. Some orientation on how to use these tools would be appreciated.
I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3
RDS and Aurora are relational databases fully managed by AWS. "Regular" RDS lets you launch existing popular databases such as MySQL, PostgreSQL and others, which you could also run at home or at work.
Aurora is an in-house, cloud-native database implementation compatible with MySQL and PostgreSQL. It can store the same data as RDS MySQL or PostgreSQL, but provides a number of features not available in RDS, such as more read replicas, distributed storage, global databases and more.
S3 is not a database but an object store, where you can keep files such as images, CSV files and Excel spreadsheets, much as you would store them on your computer.
I understand that this connects my local Postgres to AWS, but I can't find the data on the Amazon page; does it only create an instance?
You can migrate your data from your local Postgres to RDS or Aurora if you wish. But neither RDS nor Aurora will connect to your existing local database, as they are databases themselves.
My final goal is to run the notebooks in the cloud and connect to my postgres in the cloud.
I don't see a reason why you wouldn't be able to connect to the database. You can try to make it work, and if you run into difficulties you can post a new question on SO with your RDS/Aurora setup details.
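As a starting point for that connection, here is a minimal sketch of building a PostgreSQL connection URL for an RDS or Aurora endpoint from a notebook. The endpoint, database name, user and password are hypothetical, and the sketch assumes the instance's security group allows inbound traffic on the database port from wherever the notebook runs.

```python
from urllib.parse import quote


def postgres_url(host, database, user, password, port=5432):
    """Build a postgresql:// URL, percent-encoding characters that would break it."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Hypothetical endpoint and credentials, for illustration only.
url = postgres_url(
    "mydb.abc123.us-east-1.rds.amazonaws.com", "crm", "analyst", "p@ss/word"
)
```

A URL in this form is accepted by common PostgreSQL clients, so the notebook code that currently talks to the local database should mostly need only the new host and credentials.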

real-time sync between local Postgres instance and Azure Cloud Postgres instance

I need to set up a real-time sync process between an on-premises PostgreSQL instance and a cloud PostgreSQL instance. Please let me know what options are available to achieve this.
Do I have to use a specific tool, or can it be managed through replication?
Please advise.
Use PgPool
http://www.pgpool.net/mediawiki/index.php/Main_Page
from their web page:
pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.

How to setup cross region replica of AWS RDS for PostgreSQL

I have an RDS for PostgreSQL setup in Asia and would like to have a read copy in the US.
Unfortunately, I just found on the official site that only RDS for MySQL supports cross-region replicas, not RDS for PostgreSQL.
I also saw this page, which introduces other ways to migrate data into and out of RDS for PostgreSQL.
Short of buying an EC2 instance and installing PostgreSQL myself in the US, is there any way to synchronize data from the Asia RDS instance to a US RDS instance?
It all depends on the purpose of your replication. Is it to provide a local data source and avoid network latency?
Assuming that your goal is to have cross-region replication, you have a couple of options.
Custom EC2 Instances
You can create your own EC2 instances and install PostgreSQL so you can customize replication behavior.
I've documented configuring master-slave replication with PostgreSQL on my blog: http://thedulinreport.com/2015/01/31/configuring-master-slave-replication-with-postgresql/
Of course, you lose some of the benefits of AWS RDS, namely automated multi-AZ redundancy, etc., and now all of a sudden you have to be responsible for maintaining your configuration. This is far from perfect.
Two-Phase Commit
An alternative is to build replication into your application. One approach is to use a database driver that can do this, or to implement your own two-phase commit. If you are using Java, some ideas are described here: JDBC - Connect Multiple Databases
Use SQS to uncouple database writes
Ok, so this one is the one I would personally prefer. For all of your database writes you should use SQS and have background writer processes that take messages off the queue.
You will need a writer in the Asia region and a writer in the US region. To publish to SQS across regions, you can use an SNS configuration that publishes messages onto multiple queues: http://docs.aws.amazon.com/sns/latest/dg/SendMessageToSQS.html
Of course, unlike a two-phase commit, this approach is subject to bugs, and it is possible for your US database to get out of sync. You will need to implement a reconciliation process: a simple one could be a weekly pg_dump from the Asian database and pg_restore into the US one to re-sync it. Another approach is something like a Cassandra read repair: on every tenth read from your US database, spin up a background process that runs the same query against the Asian database, and if the results differ, kick off a process to replay some messages.
This approach is common, actually, and I've seen it used on Wall St.
So, pick your battle: either you create your own EC2 instances and take ownership of configuration and devops (yuck), implement a two-phase commit that guarantees consistency, or relax consistency requirements and use SQS and asynchronous writers.
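The queue-based option can be sketched in memory to show its moving parts. This is only an illustration under stated assumptions: FanOutTopic stands in for an SNS topic, the deques stand in for the two regional SQS queues, and plain dicts stand in for the Asian and US databases. None of these names come from AWS.

```python
from collections import deque

_MISSING = object()  # marks a key that is absent from one side


class FanOutTopic:
    """Stand-in for an SNS topic that copies each message onto every queue."""

    def __init__(self):
        self._queues = []

    def subscribe(self, queue):
        self._queues.append(queue)

    def publish(self, message):
        for queue in self._queues:
            queue.append(message)


def drain(queue, database):
    """Background writer: apply queued (key, value) writes to one database."""
    applied = 0
    while queue:
        key, value = queue.popleft()
        database[key] = value
        applied += 1
    return applied


def reconcile(source, replica):
    """Return the keys whose values differ between the two databases."""
    keys = source.keys() | replica.keys()
    return sorted(
        k for k in keys
        if source.get(k, _MISSING) != replica.get(k, _MISSING)
    )


topic = FanOutTopic()
asia_queue, us_queue = deque(), deque()
topic.subscribe(asia_queue)
topic.subscribe(us_queue)

topic.publish(("user:1", "alice"))
topic.publish(("user:2", "bob"))
us_queue.pop()  # simulate the US queue losing the most recent message

asia_db, us_db = {}, {}
drain(asia_queue, asia_db)
drain(us_queue, us_db)
out_of_sync = reconcile(asia_db, us_db)  # keys that would need replaying
```

The point of the sketch is that the two writers are independent, so any lost or failed message leaves the databases different until a reconciliation pass finds the affected keys.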
Cross-region read replicas for PostgreSQL are now directly supported by RDS.
Example of creating a cross region replica using the CLI:
aws rds create-db-instance-read-replica \
--db-instance-identifier DBInstanceIdentifier \
--region us-west-2 \
--source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance
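If you want to script this, here is a minimal sketch that assembles the same CLI call as an argument list. It assumes the AWS CLI is installed and configured; the replica name my-postgres-replica is a hypothetical placeholder, and the sketch only builds the command rather than running it.

```python
import shlex


def source_region_of(arn: str) -> str:
    """Extract the region field from an ARN (arn:partition:service:region:...)."""
    return arn.split(":")[3]


def read_replica_command(replica_id, source_arn, destination_region):
    """Build the argument list for `aws rds create-db-instance-read-replica`.

    The command is issued in the destination region, and the source instance
    is identified by its full ARN because it lives in another region.
    """
    return [
        "aws", "rds", "create-db-instance-read-replica",
        "--db-instance-identifier", replica_id,
        "--region", destination_region,
        "--source-db-instance-identifier", source_arn,
    ]


source_arn = "arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance"
command = read_replica_command("my-postgres-replica", source_arn, "us-west-2")
is_cross_region = source_region_of(source_arn) != "us-west-2"
printable = shlex.join(command)  # shell-safe form for logging or copy-paste
```

To actually create the replica you could pass the list to subprocess.run(command, check=True).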