What's a good way to back up a Postgres DB running on Amazon RDS?
The built-in snapshotting in RDS runs daily by default, and you cannot export the snapshots. Besides that, restoring from a snapshot can take quite a long time.
Is there a good service that takes dumps on a regular schedule and stores them on, e.g., S3? We don't want to spin up and maintain an EC2 instance just for that.
Thank you!
I want the backups to be automated, so I would prefer to have a dedicated service for that.
Your choices:
Run pg_dump from an EC2 instance on a schedule; see the sketch after this list. This is a great use case for Spot Instances.
Restore a snapshot to a new RDS instance, then run pg_dump against it as above. This keeps the dump load off your production database.
Want an RDS snapshot more often than daily? Kick one off manually.
These are all automatable. For "free" (low effort on your part) you get daily snapshots. I agree; I wish they could be exported to S3.
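For the first option, a minimal sketch of what the scheduled job might look like; the host, user, database, and bucket names are all placeholders:

```bash
#!/usr/bin/env bash
# Nightly pg_dump of an RDS Postgres database, uploaded to S3.
# Run from cron, e.g.: 0 3 * * * /opt/backup/pg_dump_to_s3.sh
# All hosts, names, and paths below are placeholders.
set -euo pipefail

export PGPASSWORD="$DB_PASSWORD"   # or rely on a ~/.pgpass file
STAMP=$(date +%Y-%m-%d)
DUMP="/tmp/mydb-${STAMP}.dump"

# Custom format is compressed and restorable with pg_restore
pg_dump \
  --host=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com \
  --username=backup_user \
  --format=custom \
  --file="$DUMP" \
  mydb

aws s3 cp "$DUMP" "s3://my-backup-bucket/postgres/$(basename "$DUMP")"
rm -f "$DUMP"
```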
SOLUTION: You can now run pg_dumpall and dump all Postgres databases on a single AWS RDS instance.
It has caveats, so it's better to read the post before going ahead and compiling your own version of pg_dumpall for this. Details here.
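For context, on PostgreSQL 10 and later pg_dumpall has a --no-role-passwords option, which avoids the pg_authid permission problem on RDS and usually removes the need for a patched build; a rough sketch, with the host and user as placeholders:

```bash
# Dump role/tablespace definitions (without passwords) plus every database on the instance.
# --no-role-passwords requires PostgreSQL 10+ client tools; host and user are placeholders.
pg_dumpall \
  --host=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com \
  --username=master_user \
  --no-role-passwords \
  --file=cluster.sql
```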
Related
Is there a way to back up all of the databases on a hyperscaler-managed Postgres server as of a certain point in time, in order to maintain data consistency between the databases, using either pg_dumpall, pg_dump, or something else?
Background:
With the use of micro-services, an application may have many databases associated with it on a single hyperscaler-managed Postgres server. The hyperscalers do perform functional snapshot backups; however, when a hyperscaler-managed Postgres server is accidentally deleted, its backups are lost as well. The hyperscalers provide locks to prevent accidental deletion of a Postgres server and say their support teams can be contacted to restore a deleted server, yet we still had a Postgres server get deleted. We were able to recover by contacting the hyperscaler's support team, but we would like a second way of backing up a hyperscaler-managed Postgres server.
I realize that the micro-services should be able to auto-recover to a data-consistent point, but the reality is that many of them have not been designed or written to that requirement. I really do not want to get into micro-service design here and want to keep this a DBA backup question.
pg_dumpall will not take a consistent backup across all databases. Each database backup will be consistent, but the snapshots for the backups of the different databases will be taken at different times.
If you need a consistent backup across several databases in a single cluster, use an online file system backup with pg_basebackup.
You can use pg_basebackup, which most PostgreSQL DBAs end up using on a daily basis, whether via scripts or manually. It creates a base backup of the whole cluster, which can help you recover in multiple situations. It takes an online backup of the database and hence is very useful in production.
You can review this for more details.
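For reference, a typical invocation looks something like the sketch below; it needs a replication connection, which a managed service may not expose, and the host, user, and target directory are placeholders (the --wal-method spelling is PostgreSQL 10+):

```bash
# Online base backup of the whole cluster: tar format, gzipped,
# with WAL streamed alongside so the backup is self-contained.
# Host, user, and target directory are placeholders.
pg_basebackup \
  --host=db.example.com \
  --username=replication_user \
  --pgdata=/backups/base_$(date +%Y%m%d) \
  --format=tar \
  --gzip \
  --wal-method=stream \
  --progress
```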
I'm trying to set up my local database with some mock data to work with. We have a development AWS account with a Postgres database. I would like to create a backup of it, export it to my local computer, and restore it to my local Postgres database.
I've been trying to find out how to do this online, but everything I'm finding is about how to back up to AWS and restore back to AWS. I tried creating a snapshot and exporting it via S3, but the export doesn't produce a SQL file to restore from like I was expecting.
If anyone can point me in the right direction I would very much appreciate it :)
I am afraid that the only option you have is pg_dump/pg_restore.
Even if Amazon let you get your hands on its file system backups, which I doubt, they might be of little use to you, since Amazon runs modified versions of PostgreSQL and you cannot be sure that the physical file format is identical to that of community PostgreSQL.
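A minimal sketch of that workflow, with the hostname, user, and database names as placeholders:

```bash
# 1. Dump the dev database from RDS in custom format (compressed, restorable with pg_restore).
#    Host, user, and database names are placeholders.
pg_dump \
  --host=dev-db.xxxxxxxx.us-east-1.rds.amazonaws.com \
  --username=dev_user \
  --format=custom \
  --file=devdb.dump \
  devdb

# 2. Restore into a fresh local database, skipping ownership so RDS roles aren't required locally.
createdb devdb_local
pg_restore --dbname=devdb_local --no-owner --jobs=4 devdb.dump
```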
My problem is getting a big (250 GB) Postgres dump onto my local machine.
It's on AWS RDS. I tried to dump it straight to my local machine, but it takes too long, something like 3+ days.
I'm trying to find a way to dump it into S3 and download it from there safely. Maybe you could suggest a more effective way to do that. I will appreciate any kind of help.
Thanks!
As far as I know, AWS does not provide a direct way to back up a database into S3.
You can take a look at this question and its answers:
Export huge database from amazon RDS to local mysql
Here is one answer:
If the data is that big, I would suggest copying the RDS snapshot to S3, as explained here.
Link to the documentation on copying a snapshot to S3.
This topic is covered in the Stack Overflow thread "Exporting a AWS Postgres RDS Table to AWS S3".
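For what it's worth, RDS also has a managed snapshot-export-to-S3 feature in many regions; a rough sketch of the CLI call is below. All identifiers and ARNs are placeholders, and note the export is written as Parquet files rather than a SQL dump:

```bash
# Export an existing RDS snapshot to S3 (Parquet output).
# Requires an IAM role with write access to the bucket and a KMS key; all values are placeholders.
aws rds start-export-task \
  --export-task-identifier mydb-export-2024-01-01 \
  --source-arn arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snapshot \
  --s3-bucket-name my-backup-bucket \
  --iam-role-arn arn:aws:iam::123456789012:role/rds-s3-export-role \
  --kms-key-id arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID
```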
Another solution would be to spin up an EC2 instance and dump the database to a local EBS volume that is large enough for the following steps. Then choose one of the following:
Compress the DB dump into multiple files and copy them to S3 for download; see the sketch after this list. I would use a smart S3 download manager given the size of the database dump.
Export the S3 data using Snowball (Export S3 Data). If your Internet connection is not fast or reliable enough, Snowball will get you the data.
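A rough sketch of the first option, run on the EC2 instance; the host, credentials, mount point, and bucket are placeholders:

```bash
# Dump in custom format (already compressed) and split into 1 GB chunks
# so individual S3 downloads can be retried. Paths and names are placeholders.
pg_dump \
  --host=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com \
  --username=master_user \
  --format=custom \
  mydb \
  | split --bytes=1G - /mnt/ebs/mydb.dump.part_

# Upload the chunks; on the local machine, download them and reassemble with
#   cat mydb.dump.part_* > mydb.dump
aws s3 cp /mnt/ebs/ s3://my-backup-bucket/mydb/ \
  --recursive --exclude "*" --include "mydb.dump.part_*"
```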
I want to migrate our Postgres DB from Heroku to our own Postgres on AWS.
I have tried using pg_dump and pg_restore to do the migration, and it works, but it takes a really long time. Our database size is around 20 GB.
What's the best way to do the migration with minimal downtime?
If you mean AWS RDS PostgreSQL:
pg_dump and pg_restore
I know you don't like it, but you don't really have other options. With a lot of hoop jumping you might be able to do it with Londiste or Slony-I on a nearby EC2 instance, but it'd be ... interesting. That's not the most friendly way to do an upgrade, to say the least.
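One way to cut the pg_dump/pg_restore time considerably is to run both in parallel; a sketch under assumed hostnames, users, and job counts (tune --jobs to your hardware):

```bash
# Parallel dump from the Heroku database into directory format
# (--jobs requires directory format), then parallel restore into the AWS instance.
# Hosts, users, names, and job counts are placeholders.
pg_dump \
  --host=heroku-db.example.com \
  --username=heroku_user \
  --format=directory \
  --jobs=4 \
  --file=/tmp/herokudb_dump \
  herokudb

pg_restore \
  --host=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com \
  --username=master_user \
  --dbname=mydb \
  --no-owner \
  --jobs=4 \
  /tmp/herokudb_dump
```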
What you should be able to do is ship WAL into RDS PostgreSQL and/or stream the replication log. However, Amazon doesn't support this.
Hopefully Amazon will adopt some part of 9.4's logical replication and logical changeset extraction features, or better yet the BDR project - but I wouldn't hold my breath.
If you mean AWS EC2:
If you're running your own EC2 instance with Pg, use replication, then promote the standby into the new master; see the sketch below.
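Very roughly, and assuming PostgreSQL 12 or later (where pg_basebackup --write-recovery-conf writes the standby configuration for you); hosts, users, and paths are placeholders:

```bash
# On the new server: seed a streaming-replication standby from the current primary.
# --write-recovery-conf writes primary_conninfo and standby.signal on PostgreSQL 12+.
# Hosts, users, and data directory are placeholders.
pg_basebackup \
  --host=old-primary.example.com \
  --username=replication_user \
  --pgdata=/var/lib/postgresql/data \
  --wal-method=stream \
  --write-recovery-conf \
  --progress

# Start the standby, let it catch up, stop writes on the old primary, then promote it:
pg_ctl promote -D /var/lib/postgresql/data
```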
I'm confused about what I need to do here. I am new to Mongo. I have set up a small Mongo server on Amazon EC2 with EBS volumes, one for data and one for logs. I need to do a backup. It's okay to take the DB down in the middle of the night, at least currently.
Using the boto library, EBS snapshots, and Python, I built a simple backup script that does the following:
sudo service mongodb stop
run backup of data
run backup of logs
sudo service mongodb start
The script ran through and restarted Mongo, but I noticed in the AWS console that the snapshots were still being created even though the boto calls had returned and Mongo had restarted. Certainly not ideal.
I checked the Mongo docs, and found this explanation on what to do for backups:
http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/#ec2-backup-database-files
This is good info, but a bit unclear. If you are using journaling, which we are, it says:
If the dbpath is mapped to a single EBS volume then proceed to Backup the Database Files.
We have a single volume for data, so I'm assuming that means we can bypass the steps on flushing and locking. But at the end of Backup the Database Files, it discusses removing the locks.
So, I'm a bit confused. As I read it, I don't actually need to do anything special: I can just run the backup and not worry about flushing/locking at all. I probably don't need to take the DB down. But the paranoid part of me says no, that sounds suspicious.
Any thoughts from anyone on this, or experience, or good old fashioned knowledge?
Since you are using journaling, you can just run the snapshot without taking the DB down. This will be fine as long as the journal files are on the same EBS volume, which they would be unless you symlink them elsewhere.
We run a lot of MongoDB servers on Amazon, and this is how we do it too.
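As for the snapshots still showing as "pending" after the script finishes: an EBS snapshot captures the volume as of the moment it is initiated and only completes the upload in the background, so that is expected. If you still want the script to block until completion, a CLI-flavoured sketch (the volume ID is a placeholder):

```bash
# Snapshot the data volume and wait for it to finish. The point-in-time copy is taken
# when create-snapshot returns, so the database can resume immediately if you stop it at all.
# The volume ID is a placeholder.
SNAP_ID=$(aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "mongodb data $(date +%Y-%m-%d)" \
  --query SnapshotId --output text)

aws ec2 wait snapshot-completed --snapshot-ids "$SNAP_ID"
echo "Snapshot $SNAP_ID completed"
```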