MongoDB database restoration takes too much time

I ran into the following issue. I was tasked with restoring a MongoDB database from a backup, and I am using mongorestore.exe (on Windows) for it. The restore takes about 1.5 hours; the backup is about 25 GB (25M documents).
I tried restoring both to an AWS DocumentDB cluster (instance type: db.r5.large) and to a locally installed MongoDB (EC2 instance, r5n.large). Both took about the same time (about 1.5 hours).
My question: is this a reasonable time for this operation, and how can I reduce it?
Any advice is much appreciated.

I agree with Ayoub: parallelizing with --numParallelCollections has helped many folks speed up the restore process. You can also speed up the restore by scaling the Amazon DocumentDB instance up to a larger size for the duration of the restore and scaling it back down to an r5.large when the restore is complete. Amazon DocumentDB charges instance costs by the second, which helps keep costs down in scenarios like this.
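
For illustration, here is a minimal sketch of how such a command line could be assembled. The helper name, the URI, and the dump directory are placeholders I made up; the flags themselves (--uri, --numParallelCollections, --numInsertionWorkersPerCollection, --gzip) are standard mongorestore options. Note that --numParallelCollections only helps when the dump contains several collections; for one large collection, --numInsertionWorkersPerCollection is the option that adds concurrency. The values below are examples, not recommendations.

```python
from typing import List, Optional


def build_mongorestore_args(
    uri: str,
    dump_dir: str,
    parallel_collections: int = 4,
    insertion_workers: Optional[int] = None,
    gzip: bool = False,
) -> List[str]:
    """Return an argv list for mongorestore. Nothing is executed here."""
    if parallel_collections < 1:
        raise ValueError("parallel_collections must be >= 1")
    args = [
        "mongorestore",
        f"--uri={uri}",
        # Number of collections restored concurrently.
        f"--numParallelCollections={parallel_collections}",
    ]
    if insertion_workers is not None:
        # Concurrent insert workers within each collection.
        args.append(f"--numInsertionWorkersPerCollection={insertion_workers}")
    if gzip:
        # Only needed when the dump was created with --gzip.
        args.append("--gzip")
    args.append(dump_dir)  # positional argument: the dump directory
    return args


if __name__ == "__main__":
    # Hypothetical values; pass the list to subprocess.run() to actually restore.
    print(" ".join(build_mongorestore_args(
        "mongodb://localhost:27017", "dump", parallel_collections=8,
        insertion_workers=4)))
```

Building the arguments as a list (rather than one shell string) avoids quoting problems with connection strings that contain special characters.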

Related

mongorestore is very slow on AWS DocumentDB

Is there any way to make this operation faster?
I'm trying to restore my DB to AWS DocumentDB, and at this rate it will probably take some weeks to finish, even though my overall data is less than 400 MB.
The dump is gzipped.

To resolve this, it was suggested to run the command from an EC2 instance rather than from a remote host.
This made the import fast.
The likely reason is that the restore involves a large number of network operations; across the internet, each interaction has a higher latency than it does against a resource on the local network.
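
To make the latency argument concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified model that is not how mongorestore is documented to behave: one sequential round trip per insert batch, with no parallelism and no server processing time. The function name and all numbers are hypothetical.

```python
import math


def estimated_network_wait_seconds(documents: int, batch_size: int,
                                   rtt_ms: float) -> float:
    """Seconds spent purely waiting on round trips.

    Simplified model (an assumption): one sequential round trip per batch.
    """
    if documents < 0 or batch_size < 1 or rtt_ms < 0:
        raise ValueError("invalid input")
    batches = math.ceil(documents / batch_size)
    return batches * rtt_ms / 1000.0


if __name__ == "__main__":
    # Same workload, two hypothetical round-trip times.
    for label, rtt in (("remote host", 100.0), ("same-region EC2", 1.0)):
        wait = estimated_network_wait_seconds(1_000_000, 1000, rtt)
        print(f"{label}: {wait:.0f} s waiting on the network")
```

In this model the waiting time scales linearly with the round-trip time, so anything that adds round trips (small batches, index builds acknowledged one by one) makes a distant client disproportionately slow.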

Why is an AWS RDS backup taking so long?

I am using RDS to run a PostgreSQL server (9.6.3), and this morning a backup was automatically kicked off. It is still going 6 hours later, which seems absurd. The database is not that big (~600 GB), and as far as I can tell, this is the first time I've experienced this problem. The machine is relatively beefy (db.m4.2xlarge), so it seems like these backups should take a lot less than 6 hours.
I am also surprised that a backup would be kicked off at 5:30 AM, which seems awfully close to standard business hours.
Any ideas?

You scheduled the 5:30 AM backup window. Amazon didn't randomly kick it off at that time. Look at your RDS instance's settings and you will see a backup window that was defined when you created the instance.
An RDS backup is like an EBS snapshot, and it shouldn't be reliant on the CPU of the server at all. It should also not affect server performance at all.
You should look into migrating to Amazon Aurora now that the PostgreSQL version is out of beta. Among other benefits, you will get extremely fast snapshot creation with Aurora.
Sometimes things like this become "stuck" due to an issue behind the scenes. If that happens all you can do is open a support ticket with AWS to get it fixed.

Mongo backup - performance

I'm going to schedule backups for my MongoDB server based on this open-source script.
My data is about 10 GB (and growing), and backing it up takes time. My question: does the backup operation actually block the database, or slow the server down so much that it cannot serve the applications using it? Or is the Mongo backup smart enough to run in the background without harming the service?
Does anyone have experience with this?
Thank you.

MongoDB partial backups

We have a 5-node replica set on our development server. We are looking for a way to let developers back up a subset of the data in a MongoDB database and restore it to their local development environments.
We have looked into clonedb and the mongodump utility, but both only allow a backup/dump of the complete database. Due to the possible size of the database, we need an option that lets us limit the data being backed up or restored.
Does anyone know of a utility or a way to achieve this?

I just stumbled upon this question again and decided to add a description of the backup strategy we opted for:
Our current backup strategy for this MongoDB server consists of two setups: a backup via a delayed, passive secondary node, and a daily backup using mongodump (which takes journaling and the oplog into account).
Besides our normal production nodes, we have set up another secondary node with a priority of 0 (this can either be on its own server or piggyback on another mongo server using a separate port), hidden set to true, and a delay of 7200 seconds (2 hours). This slave is there for "butter fingers": when someone accidentally drops a database or clears a collection, we have 2 hours before those changes replicate to this passive secondary. The passive secondary can NOT be used for READING or WRITING. Its role is simply that of a backup node. We also use this node for the nightly backup to avoid unnecessary overhead on any of the other nodes.
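
As a rough sketch, the member entry for such a node in the replica set configuration could look like the dictionary built below. The helper name, the member id, and the host are hypothetical. The delay field is called slaveDelay in MongoDB versions before 5.0 and secondaryDelaySecs from 5.0 on, so it is a parameter here; which one applies to the setup described above is an assumption on my part.

```python
from typing import Any, Dict

# Field name depends on the server version (see the note above).
_DELAY_FIELDS = ("secondaryDelaySecs", "slaveDelay")


def delayed_hidden_member(member_id: int, host: str,
                          delay_seconds: int = 7200,
                          delay_field: str = "secondaryDelaySecs") -> Dict[str, Any]:
    """Build a replica set member document for a delayed backup node."""
    if delay_field not in _DELAY_FIELDS:
        raise ValueError(f"delay_field must be one of {_DELAY_FIELDS}")
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be >= 0")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,               # can never be elected primary
        "hidden": True,              # invisible to client applications
        delay_field: delay_seconds,  # replicate this many seconds behind
    }


if __name__ == "__main__":
    # Hypothetical host and id; the result would be appended to the
    # "members" array of the config passed to replSetReconfig.
    print(delayed_hidden_member(5, "backup.example.internal:27018"))
```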
The nightly backup is set to run every night at 23:00 via a crontab. The command simply executes a script set up in /opt/auto-mongo-backup. This script can be found at https://github.com/jaconel/automongobackup (I originally found it at https://github.com/micahwedemeyer/automongobackup). It allows a single nightly cron job to cover weekly and monthly backups as well. Backups are saved to /var/backups/mongodb.
Hope this helps someone out.

How to scale MongoDB?

I know that MongoDB can scale vertically. But what if I am running out of disk?
I am currently using EC2 with EBS. As you know, an EBS volume has to be assigned a fixed size.
What if the MongoDB data grows bigger than the EBS volume? Do I have to create a larger EBS volume and copy the files over?
Or should we start more MongoDB instances, each connected to a different EBS disk? In that case, I could connect to a different instance for different databases.

If you're running out of disk, you obviously need to get a bigger disk.
There are several ways to migrate your data; which one fits depends on how much up-time you need. The first steps, of course, involve bundling the machine and creating the new volume.
These tips go from easiest to hardest.
Can you take the database completely off-line for several minutes?
If so, do this (migration by copy):
Mount new EBS on the server.
Stop your app from connecting to Mongo.
Shut down mongod and wait for everything to write (check the logs)
Copy all of the data files (and probably the logs) to the new EBS volume.
While the copy is happening, update your mongod start script (or config file) to point to the new volume.
Start mongod and check connection
Restart your app.
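
The copy step in the list above can be sketched roughly as follows. This is only an illustration: the function name and paths are made up, it assumes mongod has already been shut down, and the mongod.lock check is just a heuristic (a non-empty lock file usually means mongod is still running or did not shut down cleanly). It needs Python 3.8+ for dirs_exist_ok.

```python
import shutil
from pathlib import Path
from typing import Dict


def _file_sizes(root: Path) -> Dict[str, int]:
    """Map each file's path (relative to root) to its size in bytes."""
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def copy_data_dir(old_dbpath: str, new_dbpath: str) -> int:
    """Copy the data files to the new volume and verify them by size.

    Returns the number of files copied.
    """
    src, dst = Path(old_dbpath), Path(new_dbpath)
    lock = src / "mongod.lock"
    # Heuristic only: refuse to copy if the lock file is not empty.
    if lock.exists() and lock.stat().st_size > 0:
        raise RuntimeError("mongod.lock is not empty; is mongod still running?")
    # dirs_exist_ok lets the target be an already-mounted, empty volume.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    before, after = _file_sizes(src), _file_sizes(dst)
    bad = sorted(name for name, size in before.items() if after.get(name) != size)
    if bad:
        raise RuntimeError(f"copy incomplete for: {bad}")
    return len(before)
```

A size comparison is a cheap sanity check, not proof of an identical copy; a checksum comparison would be stricter at the cost of reading everything twice.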
Can you take the database off-line for just a few minutes?
If so, do this (slaving and switch):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a --slave pointing at the current database. (you may need to re-start the current as --master)
The slave will do a fresh synchronization. Once the slave is up-to-date, you'll do a "switch" (next steps).
Turn off writes from the system.
Shut down the original mongod process.
Re-start the "new" mongod as a master instead of the slave.
Re-activate system writes pointing at the new master.
Done correctly those last three steps can happen in minutes or even seconds.
Can you not afford any down-time?
If so, do this (master-master):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a master and a slave against the current database. (may need to re-start current as master, minimal down-time?)
The new computer should do a fresh synchronization.
Once the new computer is up-to-date, switch the system to point at the new server.
I know it seems like this last version is actually the best, but it can be a little dicey (as of this writing). The reason is simply that I've had a lot of issues with "Master-Master" replication, especially if you don't start with both active.
If you plan on using this method, I highly suggest a smaller practice run first. If something bombs here, Mongo might simply wipe all of your data files, which would take even more down.
If you get a good version of this working, please post the commands; I'd like to see it in action.

Doesn't the E in EBS stand for elastic, meaning something like resizing on the fly?
Currently the MongoDB team is working on finishing sharding, which will allow horizontal scaling by partitioning data across different servers. Give it a month or two and it will work fine. The developers are quite good at keeping their promises.
http://api.mongodb.org/wiki/current/Sharding%20Introduction.html
http://api.mongodb.org/wiki/current/Sharding%20Limits.html

You could slave the bigger disk off the smaller until it's caught up
or
fsync+lock and take a file system snapshot and copy it onto the bigger disk.
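
A minimal sketch of the fsync+lock variant, assuming a pymongo-style client (anything exposing client.admin.command). The fsync command with lock=True and the fsyncUnlock command are real server commands; the context manager name and the snapshot step are placeholders.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def fsync_locked(client: Any) -> Iterator[None]:
    """Flush pending writes to disk and block new writes while inside."""
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step fails.
        client.admin.command("fsyncUnlock")


# Usage, assuming pymongo is installed and take_snapshot() is your own
# (hypothetical) function that triggers the file system snapshot:
#
#   from pymongo import MongoClient
#   with fsync_locked(MongoClient("mongodb://localhost:27017")):
#       take_snapshot()
```

Putting the unlock in a finally block matters here: a node left locked keeps rejecting writes until someone unlocks it manually.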

Well, I am using MongoDB now. I am pretty amazed by its performance, especially on some simple sorting.
I believe it's a good tool for simple web application logic. My remaining concerns are how to scale and how to back up. I will continue to explore.
The only disadvantage I see is that I don't have any good tools for inspecting the data stored inside. For example, I want to move my logging from MySQL into Mongo as well. However, it's pretty difficult for me to view the log. Previously, I could use a MySQL query to fetch what I wanted easily.
Anyway, it's a good tool and I will continue to use it.
