why is AWS RDS backup taking so long? - postgresql

I am using RDS to run a postgresql server (9.6.3) and this morning, a backup was automatically kicked off. It is still going 6 hours later which seems absurd. The database is not that big (~ 600 GB), and as far as I can tell, this is the first time i've experienced this problem. The machine is relatively beefy (db.m4.2xlarge), so it seems like these backups should take a lot less than 6 hours.
I am also surprised by the fact that a backup would be kicked off at 5:30 AM, which seems awfully close to standard biz hours.
Any ideas?

You scheduled the 5:30 AM backup window. Amazon didn't randomly kick it off at that time. Look at your RDS instance's settings and you will see a backup window that was defined when you created the instance.
An RDS backup is like an EBS snapshot, and it shouldn't be reliant on the CPU of the server at all. It should also not affect server performance at all.
You should look into migrating to Amazon Aurora now that the PostreSQL version is out of beta. Among other benefits, you will get extremely fast snapshot creation with Aurora.
Sometimes things like this become "stuck" due to an issue behind the scenes. If that happens all you can do is open a support ticket with AWS to get it fixed.

Related

Expected downtime to reboot a multi az rds instance without failover to apply some static parameter changes

I would like to know how much downtime should I expect to reboot an multi az rds instance without failover. I want to apply some static parameter changes. The instance is t2 medium with 100 GiB storage. It's running postgres 9.6. I am not looking for exact number but an estimate.
I recently rebooted db.t3.micro RDS postgres 10.6 instance without failover, and it took about 1min when the status became available again, DB size 18GB.
In my experience that would take about 5 to 10 minutes, but I'm not sure if the DB would be down and inaccessible the whole time. Please don't make any critical business decisions based on my estimate though. If this is a critical issue, then you should create a copy of the database from a snapshot and test out the parameter changes on the copy.
If you are concerned with limiting the downtime, just add a replica, make the changes and let it failover, then once the changes are done remove the replica.

MongoDB database restoration takes too much time

I encountered with the following issue. I was tasked to restore Mongo DB from a backup and I am using mongorestore.exe (on Windows OS) for it. Restoration process takes about 1,5h, a backup file size is about 25G (contains 25M documents).
I tried to restore both on an AWS Document DB cluster (instance type: db.r5.large) and on locally installed MongoDB (EC2-instance, r5n.large). I got almost the same time of the process (about 1.5h)
My question: Is it reasonable time for this operation and how can I optimize/reduce time that needs for this?
All advice is very appreciated.
Agree with Ayoub#, parallelizing with --numParallelCollections has helped many folks speed up the restore process. Also, you can additionally speed up the restore by scaling up the Amazon DocumentDB instance to a larger size for the duration of the restore and scale it back down to an r5.large when the restore is complete. Amazon DocumentDB charges by the second for instance costs to help minimize costs for these scenarios.

What to do when a Google Cloud SQL postgres server upgrade (to a bigger machine) takes considerably longer than "a few minutes"?

We upgraded our Google Cloud SQL postgres server to a bigger machine and the upgrade is not terminating. In our experience, this usually takes less than 5 minutes, but we'ven been waiting for about 1.5 hours now and nothing is happening. There are no logs after the server shut down(except for failed connection attempts). We cannot switch to the failover, because there is already an operation in progress (namely the upgrade that's causing the problem in the first place). Restarting is disabled because the operation is in progress. It seems like there's nothing we can do right now, except maybe apply the last backup, though we're not sure if that's even possible while an operation is in progress.
Is there anything we can do to restart the DB or fix the problem?
When you upgrade a CloudSQL server, the instance is rebooted. It can happen occasionally that rebooting takes more than expected, which seems to be what happened to your server, but this is not unexpected behaviour.
This being said, be sure to check the status of the CloudSQL service. And if upgrades get stuck too often or never finish, contact support.
To reduce the chances of having this issue again:
Configure High Availability for your instance, so it has failover capability.
Make sure that the maintenance window of failover replicas is different from that of the master instance. To change the maintenance schedule, on the GCP console, go to SQL, click on an instance, and "Edit maintenance schedule"->"Set maintenance schedule". Then choose a window.

RDS snapshot restore taking too long

As part of our blue-green deployment strategy we are snapshoting the prod RDS instance and then restoring this snapshot into a new instance applying db migrations after it and linking the newly Green application to it.
Our RDS instance has a 100 GB space, but our DB uses only 10 MB at the moment.
Taking a snapshot takes roughly < 2 minutes
Restoring from the Snapshot takes 25 minutes!
25 minutes for the restore is too long considering users are forced to stay in read only mode for all this period and that our DB size is less than 10 mb at the moment.
I am wondering if this restore time is the usual time for Amazon RDS or if we are doing something wrong.
Amazon RDS Postgres.
Multi AZ: Yes
Instance Class: Medium
General Purpose (SSD)
IOPS: disabled.
After some experimentation we were able to reduce the restoring time from 25 minutes to 5 minutes. This was due to the fact, that RDS first tries to restores the snapshot. (In our case this took 5 minutes). And afterwards it applied the Multi Az change to the new instance. (this was taking like 20 minutes)
Previously we were waiting for the DB to finish the MULTI AZ change, and status="available" to continue with our Deployment, but after contacting AWS, they have confirmed that is safe to start using the new instance even when the instance is being modified to apply the MULTI AZ change. So we continue our deployment process as soon as the restored instance status change from "creating" to "modifying"
This solution as correctly said, might not scale very well but at the moment this is not a concern as we are not expecting this DB to grow significantly.
We consider this approach to be very safe, as any DB schema changes wont affect the live DB and we can safely test the whole GREEN stack before switching to PROD. The only caveat here is that the application need to be in read-only mode, so as not to loose information between the blue and green environments

How to scale MongoDB?

I know that MongoDB can scale vertically. What about if I am running out of disk?
I am currently using EC2 with EBS. As you know, I have to assign EBS for a fixed size.
What if the MongoDB growth bigger than the EBS size? Do I have to create a larger EBS and Copy & Paste the files?
Or shall we start more MongoDB instance and each connect to different EBS disk? In such case, I could connect to a different instance for different databases.
If you're running out of disk, you obviously need to get a bigger disk.
There are several ways to migrate your data, it really depends on the type of up-time you need. First steps of course involve bundling the machine and creating the new volume.
These tips go from easiest to hardest.
Can you take the database completely off-line for several minutes?
If so, do this (migration by copy):
Mount new EBS on the server.
Stop your app from connecting to Mongo.
Shut down mongod and wait for everything to write (check the logs)
Copy all of the data files (and probably the logs) to the new EBS volume.
While the copy is happening, update your mongod start script (or config file) to point to the new volume.
Start mongod and check connection
Restart your app.
Can you take the database off-line for just a few minutes?
If so, do this (slaving and switch):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a --slave pointing at the current database. (you may need to re-start the current as --master)
The slave will do a fresh synchronization. Once the slave is up-to-date, you'll do a "switch" (next steps).
Turn off writes from the system.
Shut down the original mongod process.
Re-start the "new" mongod as a master instead of the slave.
Re-activate system writes pointing at the new master.
Done correctly those last three steps can happen in minutes or even seconds.
Can you not afford any down-time?
If so, do this (master-master):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a master and a slave against the current database. (may need to re-start current as master, minimal down-time?)
The new computer should do a fresh synchronization.
Once the new computer is up-to-date, switch the system to point at the new server.
I know it seems like this last version is actually the best, but it can be a little dicey (as of this writing). The reason is simply that I've honestly had a lot of issues with "Master-Master" replication, especially if you don't start with both active.
If you plan on using this method, I highly suggest a smaller practice run first. If something bombs here, Mongo might simply wipe all of your data files which will have the effect of taking more stuff down.
If you get a good version of this please post the commands, I'd like to see it in action.
Doesn't the E in EBS stand for elastic meaning something like resizing on the fly?
Currently the MongoDB team is working on finishining sharding which will allow you horizontal scaling by partitioning data separately on different servers. Give it a month or two and it will work fine. The developers are quite good at keeping their promises.
http://api.mongodb.org/wiki/current/Sharding%20Introduction.html
http://api.mongodb.org/wiki/current/Sharding%20Limits.html
You could slave the bigger disk off the smaller until it's caught up
or
fsync+lock and take a file system snapshot and copy it onto the bigger disk.
well, I am using Mongo DB now. I am pretty amazed the performance it generated, especially on some simple sorting.
I believe it's a good tool for simple web application logic. The remaining concern for is how to scale and backup. I will continue to explore.
The only disadvantage I have is that I didn't have any good tools to reveal the data stored inside. For example, I want to put my logging from MYSQL into Mongo as well. However, it's pretty difficult for me to view the log. Previously, i can use MYSQL query to fetch what I want easily.
Anyway, it's a good tool and I will continue to use it.