How do I move data from RDS in one AWS account to another account - PostgreSQL

We set up our web services and database on AWS a while back, and the application is now in production. For certain reasons, we need to terminate the old AWS account and move everything to a newly created one. The application and the rest of the infrastructure are straightforward to move; the data is trickier. The current database is still receiving a lot of data daily, so it is best to migrate the data after we turn off the old application and switch on the new platform.
Both the source and target RDS instances run Postgres, and we have about 40 GB of data to transfer. I can think of three approaches, all of which have drawbacks.
1. Take a snapshot of the first RDS instance and restore it in the second one. The problem is that I don't need to transfer all the data from the source to the destination; records after 10/01 are probably enough. Also, a snapshot works best when restored into a freshly created, empty RDS instance, whereas in our case the new RDS instance will already be receiving data after the cutoff, and only after that point will the data be transferred from the old account to the new one; otherwise we will lose data.
2. Dump data from the tables in the old RDS instance and restore it in the new one. This has the same problem as #1. Also, if I dump the data to a local machine and then restore from there, the network speed is the bottleneck.
3. Export table data to CSV files and import them into the new RDS instance. The advantage is that this method lets me pick and choose tables and do some data cleaning as well, but it takes forever to export a big fact table to a local CSV file. Another problem is that some tables use surrogate row IDs that are serial (auto-incrementing), so the row IDs in the exported CSV may conflict with data already in the new RDS tables.
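A rough sketch of approach #3 using psycopg2 and COPY, streaming the rows between the two instances instead of going through a local CSV file; the table name, column names, cutoff date, and connection strings below are placeholders:

```python
# Sketch only: stream rows newer than the cutoff from the old RDS instance to
# the new one with COPY, then bump the serial sequence so future inserts in the
# new database don't collide. Table/column names, cutoff, and DSNs are placeholders.
import io
import psycopg2

SOURCE_DSN = "host=old-rds.example.com dbname=app user=app password=..."
TARGET_DSN = "host=new-rds.example.com dbname=app user=app password=..."

def copy_after_cutoff(table, cutoff="2018-10-01"):
    buf = io.StringIO()
    with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(TARGET_DSN) as dst:
        with src.cursor() as scur, dst.cursor() as dcur:
            # Export only the rows after the cutoff date.
            scur.copy_expert(
                f"COPY (SELECT * FROM {table} WHERE created_at >= '{cutoff}') "
                "TO STDOUT WITH CSV",
                buf,
            )
            buf.seek(0)
            # Import into the target table (assumes an identical column layout).
            dcur.copy_expert(f"COPY {table} FROM STDIN WITH CSV", buf)
            # Move the serial sequence past the highest id so new rows don't clash.
            # If the migrated ids themselves can overlap rows already written in
            # the new database, they would need to be offset/remapped first (not shown).
            dcur.execute(
                f"SELECT setval(pg_get_serial_sequence('{table}', 'id'), "
                f"(SELECT max(id) FROM {table}))"
            )

copy_after_cutoff("big_fact_table")
```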
I wonder if there is a better way to do this, maybe some ETL tool from AWS that does point-to-point transfer directly, without using a local computer as an intermediate hop.

In 2022, the simplest way to achieve this is with AWS Database Migration Service (AWS DMS).
You can create a migration task with the original database as the source endpoint and the new database as the target endpoint.
Then create the task with the "Full load, ongoing replication" setting.
More details here: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
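If you prefer to script the setup rather than click through the console, a rough boto3 sketch looks something like this; endpoint names, hosts, credentials, and the replication instance ARN are placeholders, and a replication instance is assumed to already exist:

```python
# Rough boto3 sketch of the DMS setup described above; identifiers, hosts,
# credentials, and ARNs are placeholders.
import json
import boto3

dms = boto3.client("dms")

source = dms.create_endpoint(
    EndpointIdentifier="old-account-postgres",
    EndpointType="source",
    EngineName="postgres",
    ServerName="old-rds.example.com",
    Port=5432,
    DatabaseName="app",
    Username="migrator",
    Password="...",
)

target = dms.create_endpoint(
    EndpointIdentifier="new-account-postgres",
    EndpointType="target",
    EngineName="postgres",
    ServerName="new-rds.example.com",
    Port=5432,
    DatabaseName="app",
    Username="migrator",
    Password="...",
)

# "full-load-and-cdc" is the API name for full load plus ongoing replication (CDC).
dms.create_replication_task(
    ReplicationTaskIdentifier="old-to-new-account",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:...:rep:...",  # existing replication instance
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```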

I recently moved RDS data from one account to another using Bucardo (https://bucardo.org/). Please refer to the following posts:
https://www.compose.com/articles/using-bucardo-5-3-to-migrate-a-live-postgresql-database/
https://bucardo.org/pipermail/bucardo-general/2017-February/002875.html
Although these don't cover migration between two RDS accounts exactly, they should help with the setup. You still need an intermediate point, such as an EC2 instance, where Bucardo is configured and the data is migrated between accounts. If you are looking for more information, I am happy to help.
In short, take a manual snapshot of the source DB and restore it in the other account (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ShareSnapshot.html); then, with Bucardo set up on the EC2 instance, you can start syncing the data using triggers, which will keep updating the destination DB as new data comes into the source DB.
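A rough boto3 sketch of the snapshot-sharing step; the instance and snapshot identifiers, account ID, region, and profile name are placeholders:

```python
# Rough sketch of sharing a manual RDS snapshot with another account and
# restoring it there; all identifiers below are placeholders.
import boto3

rds = boto3.client("rds")

# Take a manual snapshot in the source (old) account and wait for it.
rds.create_db_snapshot(
    DBInstanceIdentifier="old-prod-db",
    DBSnapshotIdentifier="old-prod-db-cutover",
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier="old-prod-db-cutover")

# Share the snapshot with the new account.
rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier="old-prod-db-cutover",
    AttributeName="restore",
    ValuesToAdd=["123456789012"],  # target AWS account ID (placeholder)
)

# Then, with credentials for the *new* account, copy and restore it.
# (An unencrypted shared snapshot can also be restored directly without copying.)
rds_new = boto3.Session(profile_name="new-account").client("rds")
rds_new.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:111122223333:snapshot:old-prod-db-cutover",
    TargetDBSnapshotIdentifier="old-prod-db-cutover-copy",
)
rds_new.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="new-prod-db",
    DBSnapshotIdentifier="old-prod-db-cutover-copy",
    DBInstanceClass="db.m5.large",
)
```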

Related

Connect the Data Connect and DB2 Warehouse on Cloud (formerly dashDB) services

I am working on the migration of a DB2 process that connects to several remote servers, exports data into our local DB, and then manipulates it (inserting computed data, calculated times, etc.). I have created some activities in Data Connect to replicate the export of data from the different data sources and load it into local tables. The scripts that handle the data have to run in DB2 Warehouse on Cloud (formerly dashDB).
Currently, these scripts run automatically, triggered by the first (manual) task. However, with the new process split across two services, I cannot automate it. Furthermore, we have many activities in Data Connect, so it keeps switching between Data Connect and DB2, and you have to go from one console to the other.
Does anyone know of a Bluemix service that allows scheduling or triggering jobs or events across services? Is there a way to do this programmatically through an API?
Thanks
Well,
Bluemix offers Workload Scheduler, and Data Connect allows you to schedule activities.
Juan,
A couple of things come to mind for automation here. Do the databases you are speaking of have IP line of sight to the Warehouse DB? If so, remote tables may help, depending on the source database. Once the data is visible, you should be able to write a SQL process that manages everything from the Warehouse DB side.
The other possibility is external tables, as long as the data is visible on the head node. There are some other choices, like S3 storage: the idea is that if you can push your data into S3, you can pull it into the Warehouse DB. You should be able to coordinate all of this from the Warehouse DB side as long as the data is visible through remote tables and/or external tables.
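For the S3 route, the "push" half can be as simple as uploading the extract with boto3; this is only a sketch with placeholder bucket, key, and file names, and the warehouse-side load (for example via an external table or a load job) is not shown:

```python
# Sketch of the "push into S3" half only; bucket, key, and local file names are
# placeholders, and the CSV extract is assumed to already exist. Pulling the
# object into DB2 Warehouse on Cloud then happens on the warehouse side.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/tmp/exports/sales_extract.csv",   # extract produced by the export step
    Bucket="my-staging-bucket",
    Key="warehouse-loads/sales_extract.csv",
)
```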

Realtime Content and Database Replication for Alfresco

I'm currently researching how to replicate Alfresco's physical files (store contents) and metadata (database). This is of course a safety measure in case of server failure or the like.
Currently I am running Alfresco's database on the PostgreSQL engine, and so far I have learned about PostgreSQL's WAL and streaming replication, which I believe I can use to replicate Alfresco's metadata (database) in real time.
The next problem I face is how to replicate Alfresco's repository/physical files (store contents) in real time.
I am currently looking at Alfresco's built-in replication jobs, but as far as I have read they are scheduled rather than real time, and they need another instance of Alfresco running on the target ("slave") server.
So my questions are:
Does Alfresco's built-in replication job cover both the physical/repository files (store contents) and the metadata (database) contents?
Or:
What are viable ways to replicate Alfresco's physical/repository files (store contents) and metadata (database) contents in real time?
The replication service can be used to replicate objects from one Alfresco server to another at the object level, not at the file system and database level. Of course there are files and database records created when an object is replicated, but those are by-products of the object being created in the replication target.
The replication service is really used to make it easier for objects in a particular path to be read by people in another office. When they read the object they get it locally. When they click "Edit" in Share they will be redirected back to the source Alfresco server.
Long story short, the replication service is in no way, shape, or form meant to be used to replicate data for backup or disaster recovery.
If you are running on EC2 or a local filer that supports it, it should be enough to take volume snapshots.
Otherwise, you could use something like rsync scheduled with cron (a rough sketch is below).
But this approach sounds risky. I'm not sure how you will ensure that your database is kept in sync with your file system, which is a requirement for your Alfresco repo to remain consistent.
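A rough sketch of the rsync-plus-cron idea as a small Python wrapper meant to be run from cron; the paths and target host are placeholders, and rsync over SSH with key-based authentication is assumed to be set up already:

```python
#!/usr/bin/env python3
# Rough sketch: mirror the content store to another host with rsync, meant to
# be scheduled from cron (e.g. "*/5 * * * * /usr/local/bin/sync_contentstore.py").
# Paths and the target host are placeholders.
import subprocess

SOURCE = "/opt/alfresco/alf_data/contentstore/"            # trailing slash: copy contents
TARGET = "backup-host:/var/backups/alfresco/contentstore/"

subprocess.run(
    [
        "rsync",
        "-az",        # archive mode, compress during transfer
        "--delete",   # remove files on the target that no longer exist on the source
        "-e", "ssh",
        SOURCE,
        TARGET,
    ],
    check=True,
)
```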

PostgreSQL - Periodically copying data from one database to another

I'm trying to set up an architecture with 2 databases, say preview and live, that have the exact same schemas. The use case is that edits can be made to the preview database and then pushed to the live database after they are vetted and approved. The production application would read from the live database.
What would be the most appropriate way to push all data from the preview database to the live database without bringing the live database down? Ideally the copy from preview to live would be an atomic transaction.
I've worked with this type of setup in MSSQL, but I'm fairly new to Postgres. So I'm open to hearing other ways to architect this (with Schemas perhaps?).
EDIT: The main reason to use separate databases is that I may need more than 1 target database (not just a single "live" database). I also may need to switch target databases on the fly without altering the source database schema.
I think what you're looking for is a "hot standby". This would be a separate instance of PostgreSQL, possibly on the same server but usually not, which is a near-real-time replica of the primary server.
In broad strokes, this is done by shipping the binary transaction logs from the primary server to the backup server, and then "replaying" them there. The exact mechanism for transmitting the logs may vary depending on your requirements.
Fortunately, the docs on this are excellent:
https://www.postgresql.org/docs/9.3/static/warm-standby.html
https://www.postgresql.org/docs/9.0/static/hot-standby.html
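Separately from the standby setup, the "push preview to live" step from the question can also be handled client-side: a minimal sketch in which all writes to the live database happen inside one transaction, so readers see either the old or the new data, never a mix. Connection strings and table names below are placeholders.

```python
# Minimal client-side sketch of the "push preview to live" step; all writes to
# the live database happen in one transaction. Placeholder DSNs and table names.
import io
import psycopg2

PREVIEW_DSN = "host=preview-db dbname=app user=app password=..."
LIVE_DSN = "host=live-db dbname=app user=app password=..."
TABLES = ["products", "pages"]  # placeholder list, ordered so FK parents come first

preview = psycopg2.connect(PREVIEW_DSN)
live = psycopg2.connect(LIVE_DSN)
try:
    with live:  # single transaction on the live side, committed on success
        with live.cursor() as lcur, preview.cursor() as pcur:
            for table in reversed(TABLES):
                # DELETE rather than TRUNCATE so foreign keys are checked normally;
                # TRUNCATE would need CASCADE or truncating referencing tables together.
                lcur.execute(f"DELETE FROM {table}")
            for table in TABLES:
                buf = io.StringIO()
                pcur.copy_expert(f"COPY {table} TO STDOUT WITH CSV", buf)
                buf.seek(0)
                lcur.copy_expert(f"COPY {table} FROM STDIN WITH CSV", buf)
finally:
    preview.close()
    live.close()
```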

Point-in-time recovery for just one schema or database

I'm going to develop a multi-tenant application, where each tenant lives in its own database or schema (I've not decided this yet).
In this scenario, if I wanted to use point in time recovery (PITR), I also want to have it per-tenant. If a tenant has a problem, I want to be able to roll back only his database or schema and not the whole server.
While I found information how to do backup/restore in such situations with pg_dump and pg_restore, I haven't found any information for PITR.
Is this even possible? If yes, only per database or even per schema?
I can imagine that Postgres may keep a single log for the whole server, which could be the reason why this is not possible. But I may be wrong.

syncing postgres dbs across different accounts and regions

Environment
We have 2 AWS accounts A and B.
I have a postgres DB on RDS in Account B.
This DB gets updated periodically.
A few services in Account A need to see the data from the DB in Account B.
We tried using VPN tunnels; however, they were not very reliable and kept going down.
As an alternative, we thought of creating a new RDS instance in Account A and having the services connect to the DB in Account A.
We need to ensure that the data in the DB in Account A is updated when the Account B data gets updated. This does not have to happen right away; it can be a nightly task or something like that.
Questions
Is it possible to sync two Postgres databases running on RDS across different zones and in different accounts?
I read up a bit on read replicas; however, I am not sure whether they work across different accounts.
The other solution I thought of is to write a script; however, I am not sure whether I can get only the changes (in B) written to the DB in A (a rough sketch of this idea is below).
Any other solutions to the problem are also welcome.
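A rough sketch of the script idea, assuming (and this is only an assumption) that each synced table has an "id" primary key and an "updated_at" timestamp so only rows changed since the last run are copied from Account B to Account A; connection strings and table names are placeholders:

```python
# Rough sketch of a nightly incremental sync from Account B to Account A.
# Assumes each table has an "id" primary key and an "updated_at" timestamp.
# ON CONFLICT needs PostgreSQL 9.5+; deletes in B are not propagated here.
import psycopg2
from psycopg2.extras import execute_values

SOURCE_DSN = "host=account-b-rds.example.com dbname=app user=reader password=..."
TARGET_DSN = "host=account-a-rds.example.com dbname=app user=writer password=..."

def sync_table(table, columns):
    with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(TARGET_DSN) as dst:
        with src.cursor() as scur, dst.cursor() as dcur:
            # High-water mark: the newest row Account A already has.
            dcur.execute(f"SELECT coalesce(max(updated_at), 'epoch') FROM {table}")
            watermark = dcur.fetchone()[0]

            scur.execute(
                f"SELECT {', '.join(columns)} FROM {table} WHERE updated_at > %s",
                (watermark,),
            )
            rows = scur.fetchall()
            if not rows:
                return

            updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != "id")
            execute_values(
                dcur,
                f"INSERT INTO {table} ({', '.join(columns)}) VALUES %s "
                f"ON CONFLICT (id) DO UPDATE SET {updates}",
                rows,
            )

# Run from cron every night, one call per table to keep in sync.
sync_table("orders", ["id", "status", "total", "updated_at"])
```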
Thanks
Tanmay