How to reduce (scale down) the storage of my RDS instance? - postgresql

I have an RDS (Postgres) instance with 1000GB of SSD storage, but the data is only 100GB in size.
How can I easily scale down the storage allocated to this RDS instance?

RDS does not allow you to reduce the amount of storage allocated to a database instance, only increase it.
To move your database to less storage you would have to create a new RDS instance with your desired storage space, then use something like pg_dump/pg_restore to move the data from the old database to the new one.
Also be aware that an RDS instance with 1,000GB of SSD storage has a base IOPS of 3,000. An RDS instance with 100GB of SSD storage has a base IOPS of 300, with occasional bursts of up to 3,000.
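If you want to see what is currently allocated before planning the move, the AWS CLI can report it. A minimal sketch, assuming the AWS CLI is configured and the instance identifier is mydb (a placeholder):
aws rds describe-db-instances --db-instance-identifier mydb --query 'DBInstances[0].[AllocatedStorage,StorageType,Iops]' --output table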

Based on AWS's help here, this is the full process that worked for me:
1) Dump the database to a file: run this on a machine that has network access to the database:
pg_dump -Fc -v -h your-rds-endpoint.us-west-2.rds.amazonaws.com -U your-username your-databasename > your-databasename.dump
2) In the AWS console, create a new RDS instance with smaller storage. (You probably want to set it up with the same username, password, and database name.)
3) Restore the database on the new RDS instance: run this command (obviously on the same machine as the previous command):
pg_restore -v -h the-new-rds-endpoint.us-west-2.rds.amazonaws.com -U your-username -d your-databasename your-databasename.dump
(Note, in step 3, that I'm using the endpoint of the new RDS instance. Also note there's no :5432 at the end of the endpoint addresses.)
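Before creating the smaller instance in step 2, it's worth checking how much space the data actually uses so you can size the new storage with some headroom. A quick check, assuming you can reach the database with psql and reusing the placeholders from the commands above:
psql -h your-rds-endpoint.us-west-2.rds.amazonaws.com -U your-username -d your-databasename -c "SELECT pg_size_pretty(pg_database_size(current_database()));"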

Amazon doesn't allow you to reduce the storage size of an RDS instance; you have two options to move to smaller storage.
1) If you can afford downtime, a dump of the old instance (pg_dump for Postgres, mysqldump for MySQL) can be restored to a new instance with less storage.
2) You can use the AWS Database Migration Service (DMS) to move data from one instance to another without any downtime.
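For option 1, the new, smaller instance can be created from the console or the CLI. A rough sketch with the AWS CLI, where every identifier, class, size, and credential below is a placeholder to adjust for your own setup:
aws rds create-db-instance \
    --db-instance-identifier mydb-small \
    --db-instance-class db.t3.medium \
    --engine postgres \
    --allocated-storage 120 \
    --master-username your-username \
    --master-user-password 'your-password'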

When using RDS, instead of doing typical hardware "capacity planning", you just provision enough disk space for the short or medium term (it depends) and expand it when needed.
As @Mark B mentioned, you need to watch out for the IOPS as well. You can use "provisioned IOPS" if you need a high-performance DB.
You should make your cost vs. performance adjustment before jumping into the disk space part.
E.g. if you reduce 1000GB to 120GB in US West, you will save 0.125 x 880GB = $110/month, but the max baseline IOPS will be 120 x 3 = 360 IOPS.
It costs about $0.10 per IOPS to provision additional IOPS for more performance. Say you actually need 800 IOPS for faster online user response:
(800 - 360) x 0.10 = $44, so the actual saving may eventually be less. You will not save any money if your RDS needs a constant 1,100 IOPS, and other discount factors may also come into play.
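To make that trade-off concrete for your own numbers, you can plug them into the same formula. A tiny shell sketch using the per-GB and per-IOPS prices quoted in this answer (check current AWS pricing for your region before relying on them):
OLD_GB=1000; NEW_GB=120; NEEDED_IOPS=800
echo "storage saving per month: \$$(echo "($OLD_GB - $NEW_GB) * 0.125" | bc)"
echo "extra IOPS cost per month: \$$(echo "($NEEDED_IOPS - $NEW_GB * 3) * 0.10" | bc)"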

You can do this by migrating the DB to Aurora.
If you don't want Aurora, the Database Migration Service is the best option in my opinion. We're moving production to Aurora, so this didn't matter, and we can always get the data back out of Aurora using pg_dump or DMS. (I assume this will apply to MySQL as well, but I haven't tested it.)
My specific goal was to reduce RDS Postgres final snapshot sizes after decommissioning some instances that were initially created with 1TB+ storage each.
1) Create the normal snapshot. The full provisioned storage size is allocated to the snapshot.
2) Upgrade the snapshot to an engine version supported by Aurora, if not already supported. I chose 10.7.
3) Migrate the snapshot to Aurora. This creates a new Aurora DB.
4) Snapshot the new Aurora DB. The snapshot storage size starts as the full provisioned size, but drops to the actual used storage after completion.
5) Remove the new Aurora DB.
6) Confirm your Aurora snapshot is good by restoring it again and poking around in the new new DB until you're satisfied that the original snapshots can be deleted.
7) Remove the new new Aurora DB and the original snapshot.
You can stop at step 3 if you want and just use the Aurora DB going forward.
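For reference, steps 3 and 4 can also be done with the AWS CLI. A hedged sketch, where the snapshot and cluster identifiers and the engine version are placeholders for your own values:
# Step 3: restore the RDS Postgres snapshot into a new Aurora PostgreSQL cluster
aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier my-aurora-cluster \
    --snapshot-identifier my-final-snapshot \
    --engine aurora-postgresql \
    --engine-version 10.7
# Step 4: snapshot the new Aurora cluster once it is available
aws rds create-db-cluster-snapshot \
    --db-cluster-identifier my-aurora-cluster \
    --db-cluster-snapshot-identifier my-aurora-snapshot
(To actually connect to the restored cluster you would also add an instance to it with aws rds create-db-instance --db-cluster-identifier ... --engine aurora-postgresql.)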

The #2 answer does not work on Windows 10 because, per this DBA Stack Exchange question, the shell re-encodes the output when the > operator is used. pg_dump will generate a file, but pg_restore gives the cryptic error:
pg_restore: [archiver] did not find magic string in file header
Add -E UTF8 and use -f instead of >:
pg_dump -Fc -v -E UTF8 -h your-rds-endpoint.us-west-2.rds.amazonaws.com -U your-username your-databasename -f your-databasename.dump

What about creating a read replica with smaller disk space, promoting it to standalone, and then switching that to be primary? I think that's what I will do.

Related

Backup and Restore AWS RDS Aurora cluster

I would like to back up every single PostgreSQL database of my AWS RDS cluster (Aurora DB engine). Are there any managed tools (like Veeam or N2WS) or best practices for how to back up and restore a single database or schema from AWS S3?
Many thanks
You can use automatic backups combined with manual snapshots for an Aurora PostgreSQL database. For automatic backups the maximum retention period is 35 days, and they support restore and recovery to any point in time. However, if you need a backup beyond the backup retention period (35 days), you can also take a manual snapshot of the data in your cluster volume.
If you use a third-party tool such as Veeam, it will also invoke the AWS RDS snapshot API to take the backup, so the underlying mechanism is the same.
You can also use the pg_dump utility for backing up an RDS for PostgreSQL database, and run pg_dump against a read replica to minimize the performance impact on the primary database.
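For example, a manual cluster snapshot (kept until you delete it, so it outlives the 35-day window) and a pg_dump against a reader endpoint can both be scripted. A sketch with placeholder identifiers and endpoints:
aws rds create-db-cluster-snapshot \
    --db-cluster-identifier my-aurora-cluster \
    --db-cluster-snapshot-identifier my-aurora-cluster-$(date +%Y%m%d)
pg_dump -Fc -h my-aurora-cluster.cluster-ro-xxxxxxxx.us-east-1.rds.amazonaws.com -U your-username -d your-databasename -f your-databasename.dump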

performance issue while mongodump is running

We operate a server for our customer with a single mongo instance, gradle, postgres and nginx running on it. The problem is that we have massive performance problems while mongodump is running: the mongo queue grows and no data can be queried. The next problem is that the customer does not want to invest in a replica set or a software update (mongod 3.x).
Does anybody have an idea how I could improve the performance?
command to create the dump:
mongodump -u ${MONGO_USER} -p ${MONGO_PASSWORD} -o ${MONGO_DUMP_DIR} -d ${MONGO_DATABASE} --authenticationDatabase ${MONGO_DATABASE} > /backup/logs/mongobackup.log
tar cjf ${ZIPPED_FILENAME} ${MONGO_DUMP_DIR}
System:
6 Cores
36 GB RAM
1TB SATA HDD
+ 2TB (backup NAS)
MongoDB 2.6.7
Thanks
Best regards,
Markus
As you have heavy load, adding a replica set is a good solution, as the backup could be taken on a secondary node. Be aware that a replica set needs at least three servers (you can have a master/slave/arbiter setup, where the arbiter needs only a small amount of resources).
mongodump issues queries that take read locks, which will have an impact if there are a lot of writes to the dumped database.
Hint: try to make the backup when there is light load on the system.
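If you do eventually get a replica set, the dump can be pointed directly at a secondary so the primary keeps serving traffic. A sketch reusing the variables from the command above, with a hypothetical secondary hostname (and the usual caveat that the secondary may lag slightly behind the primary):
mongodump -u ${MONGO_USER} -p ${MONGO_PASSWORD} --authenticationDatabase ${MONGO_DATABASE} -d ${MONGO_DATABASE} -o ${MONGO_DUMP_DIR} --host secondary1.example.com --port 27017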
Try volume snapshots. Check with your cloud provider what options are available for taking snapshots. It is super fast and cheaper if you compare the actual cost of taking a backup (the RAM and CPU used, and on HDD the I/O cost, even if it is small).
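On AWS, for example, a volume snapshot is a single CLI call. A minimal sketch, assuming the data directory lives on an EBS volume whose ID you substitute for the placeholder below:
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "mongodb data backup $(date +%F)"
Note that for the snapshot to be usable, the data files need to be in a consistent state, e.g. journaling enabled on the same volume, or db.fsyncLock() held while the snapshot is initiated.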

What's a good way to back up an (AWS) Postgres DB

What's a good way to back up a Postgres DB (running on Amazon RDS)?
The built-in snapshotting from RDS is daily by default and you cannot export the snapshots. Besides that, it can take quite a long time to restore a snapshot.
Is there a good service that takes dumps on a regular basis and stores them on e.g. S3? We don't want to spin up and maintain an EC2 instance which does that.
Thank you!
I want the backups to be automated, so I would prefer to have a dedicated service for that.
Your choices:
run pg_dump from an EC2 instance on a schedule (see the sketch below). This is a great use case for Spot instances.
restore a snapshot to a new RDS instance, then run pg_dump as above. This reduces database load.
Want to run an RDS snapshot more often than daily? Kick it off manually.
These are all automatable. For "free" (low effort on your part) you get daily snapshots. I agree, I wish they could be sent to S3.
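A minimal sketch of the first option, assuming an EC2 instance with the PostgreSQL client tools and the AWS CLI installed, and a hypothetical bucket named my-db-backups (endpoint, user, and database name are placeholders; the password would come from ~/.pgpass or PGPASSWORD):
# dump straight to S3 without touching local disk
pg_dump -Fc -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -U myuser mydb | aws s3 cp - s3://my-db-backups/mydb-$(date +%F).dump
# example crontab entry: run a script containing the line above every night at 03:00
# 0 3 * * * /usr/local/bin/backup-db.sh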
SOLUTION: Now you can do a pg_dumpall and dump all Postgres databases on a single AWS RDS Instance.
It has caveats, so it's better to read the post before going ahead and compiling your own version of pg_dumpall for this. Details here.
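For completeness, a hedged example of what such a pg_dumpall invocation can look like against RDS; one common caveat there is that the RDS master user cannot read pg_authid, which the --no-role-passwords flag (available since pg_dumpall 10) works around. Endpoint and user are placeholders:
pg_dumpall --no-role-passwords -h your-rds-endpoint.us-east-1.rds.amazonaws.com -U your-master-username -f all-databases.sql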

mongodump a db on archlinux

I'm trying to back up my local MongoDB. I use Arch Linux and installed mongodb-tools in order to use mongodump.
I tried:
mongodump --host localhost --port 27017
mongodump --host localhost --port 27017 --db mydb
Every time I get the same response:
Failed: error connecting to db server: no reachable servers
I'm however able to connect to the database using
mongo --host localhost --port 27017
or just
mongo
My mongodb version is 3.0.7.
I did not set any username/password
How can I properly use mongodump to back up my local database?
This appears to be a bug in the mongodump tool; see this JIRA ticket for more detail. You should be able to use mongodump if you explicitly specify the IP address:
mongodump --host 127.0.0.1 --port 27017
"Properly" is a highly subjective term in this context. To give you an impression:
mongodump and mongorestore aren't incredibly fast. In sharded environments, they can take days (note the plural!) for reasonably sized databases, which in turn means that in a worst-case scenario you can lose days' worth of data. Furthermore, during the backup the data may change quite a bit, so the state of your backup may be inconsistent. It is better to think of mongodump as "mongodumb" in this aspect.
Your application has to be able to deal with the lack of consistency gracefully, which can be quite a pain in the neck to develop. Furthermore, long restore times cost money and (sometimes even more important) reputation.
I personally use mongodump only in two scenarios: for backing up a sharded cluster's metadata (which is only a couple of MB in size) and for (relatively) cheap data which is easy to re-obtain by other means.
For doing a MongoDB backup properly, imho, there are only three choices:
MongoDB Inc's cloud backup,
MongoDB Ops Manager
Filesystem snapshots
Cloud backup
It has several advantages. You can do point-in-time recoveries, guaranteeing that the database is restored to the consistent state it was in at the chosen point in time. It is extremely easy to set up and maintain.
However, you guessed it, it comes with a price tag based on data volatility and overall size, which, imho, is reasonable for small to medium sized data with low to moderate volatility.
MongoDB Ops Manager
Being an on-premises version of the cloud backup (it has quite a few other features out of the scope of this answer, too), it offers the same benefits. It is more suited to upper-medium-sized to large databases, or to databases with disproportionately high volatility (as indicated by a high "OplogGb/h" value in comparison to the data size).
Filesystem snapshots
Well, it is sort of cheap. Just make a snapshot, mount it, copy it to some backup space, unmount and destroy the snapshot, optionally compress the copied data and you are done. There are some caveats, though.
Synchronization
To get a backup of consistent data, you need to synchronize your snapshots on a sharded cluster, especially since the sharded cluster's metadata needs to be consistent with the backups, too, if you want a halfway fast recovery. That can become a bit tricky. To make sure your data is consistent, you'd need to disconnect all mongos, stop the balancer, fsync the data to the files on each node, make the snapshot, start the balancer again and restart all mongos. To have this properly synced, you need a maintenance window of a few minutes every time you make a backup.
Note that for a simple replica set, synchronization is not required and backups work flawlessly.
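For a single replica-set member, a rough outline of such a snapshot using LVM and the fsync lock could look like the following. The volume group and logical volume names are made up, and the lock is only strictly required if the journal does not live on the same volume as the data files:
mongo --eval "db.fsyncLock()"        # flush pending writes and block new ones
lvcreate --size 10G --snapshot --name mongo-snap /dev/vg0/mongodb
mongo --eval "db.fsyncUnlock()"      # resume writes as soon as the snapshot exists
mount /dev/vg0/mongo-snap /mnt/mongo-snap
tar czf /backup/mongo-$(date +%F).tar.gz -C /mnt/mongo-snap .
umount /mnt/mongo-snap && lvremove -f /dev/vg0/mongo-snap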
Overprovisioning
Filesystem snapshots work with what is called "Copy-On-Write" (CoW). A bit simplified: When you make a snapshot and a file is modified, it is instead copied and the changes are applied to the newly copied file. The snapshot however, points to the old file. It is obvious that in order to be able to make a snapshot, as per CoW, you need some additional disk space so that MongoDB can work while you deal with the snapshot. Let us assume a worst case scenario in which all the data is changed – you'd need to overprovision your partition for MongoDB by at least 100% of your data size or, to put it in other terms, your critical disk utilization would be 50% minus some threshold for the time you need to scale up. Of course, this is a bit exaggerated, but you get the picture.
Conclusion
IMHO, proper backups should be done this way:
mongodump/mongorestore for cheap data and little concern for consistency
Filesystem snapshots for replica sets
Cloud Backups for small to medium sized sharded databases with low to moderate volatility
Ops Manager Backups for large databases or small to medium ones with disproportionately high volatility
As said: "properly" is a highly subjective term when it comes to backups. ;)

Migrate database from Heroku to AWS

I want to migrate our Postgres DB from Heroku to our own Postgres on AWS.
I have tried using pg_dump and pg_restore to do the migration and it works, but it takes a really long time. Our database size is around 20GB.
What's the best way to do the migration with minimal downtime?
If you mean AWS RDS PostgreSQL:
pg_dump and pg_restore
I know you don't like it, but you don't really have other options. With a lot of hoop jumping you might be able to do it with Londiste or Slony-I on a nearby EC2 instance, but it'd be ... interesting. That's not the most friendly way to do an upgrade, to say the least.
What you should be able to do is ship WAL into RDS PostgreSQL and/or stream replication logs. However, Amazon doesn't support this.
Hopefully Amazon will adopt some part of 9.4's logical replication and logical changeset extraction features, or better yet the BDR project - but I wouldn't hold my breath.
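For the pg_dump/pg_restore route, one thing that can cut the window considerably is the directory format with parallel jobs. A sketch with placeholder hosts and a job count you would tune to your hardware; it doesn't remove the downtime, but it can shrink it noticeably for a 20GB database:
# dump from the source in directory format with 4 parallel workers
pg_dump -Fd -j 4 -h your-heroku-host -U your-username -f /tmp/mydb.dir your-databasename
# restore into the AWS instance, also in parallel
pg_restore -j 4 -h your-aws-endpoint -U your-username -d your-databasename /tmp/mydb.dir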
If you mean AWS EC2
If you're running your own EC2 instance with Pg, use replication, then promote the standby into the new master.