There is any way to make this operation faster?
I'm trying to restore my DB to the AWS DocumentDB, and probably it will take some weeks to finish... my overall data is less than 400MB.
dump is Gzipped
To resolve this it was suggested to run the command from an EC2 instance rather than a remote host.
This enabled a speedy import.
The likely reason is the number of network based operations across the internet rather than a local network resource which has shorter latency between each interaction.
Related
The idea here is, I have mongo cluster deployed in managed cloud service atlas. I have enabled Continuous Backup.
Now what I want to do is :
1) I want to use existing backup.
2) Using this existing backup I want to create similar cluster
(having same data form backup)
3) Automate this process so that every day my new cluster gets upto date from original cluster.
Note: The idea here for cloning cluster is, The original cluster is production data. I want to create a db which has similar data on which I can plug and play using any analytic tools and perform diffrent operations without affecting production data and load.
So far what I have found is to use mongorestore and mongodump.But here mongodump is putting load on production db even though my backup is enabled. I want to use same backup to clone this to another db cluster.
Deployed on Atlas, your server must have replica set.
Here are 2 solutions :
You need only reading data : connect your tools to a secondary server (ideally dedicated with priority 0 for becoming primary)
You need to read/write data : on the same server than above, play your mongodump command with --oplog option. By this way, you're dumping your data from a read-only server, preventing slowing performances of your main servers.
In this last case, what you need will find its solution in backup strategies, take a look at the doc to know more.
There's an offering for this purpose in ATLAS called analytic node.Link.
Analytic node is read replica of your database. Plus it will not interfere with your production traffic which makes it safer.
Also, you can connect BI connectors to this node and create your analytic platform.
We used redash.
i am working on my next project currently which works 100% on mongo,
my past projects worked on SQL + Mongo on which i used AWS RDS + AWS EC2 and could connect them both in AWS internal IP which result me with much faster connection.
Now in mongo there is alot of fancy cloud servers like MLab and MongoDB Atlas which is actually cheaper then AWS.
My concern is that moving back to external DB connection will be slower and more network consuming then the internal connection in RDS
Have anyone experienced in such issue? maybe the different isn't that big as i make it but i need it to be optimized
This depends on your setup. Many of the "fancy" services also host stuff on AWS, so latency is minimal. Some even offer "private environments" or such, so you can hide your databases from public view.
The only thing left to care about is the amount of network traffic. But this will be your problem regardless of your database host. You can test this relatively easily (e.g. get a trial from one of the providers and test for throughput, or raise your own MongoDB docker cluster to use as a test etc) just to get an idea of the performance range you'll be in.
One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.
Can I deploy large database by copying its files (eg. testing database with files: testing.0,testing.1,testing.ns found on mongodb dbpath) from another server to the target servers (replica set) to avoid usage of communication bandwidth for replication (in case it is only deployed to the primary)? So basically I want to avoid the slow process of replication.
If journaling is enabled, what is the effect on the process?
Yes you can, this is a perfectly valid way of solving having to do tedious and time consuming replication between members of a distanced or latenced network.
If journaling is enabled nothing really happens, copying via the file system goes around MongoDB.
If I populate a MongoDB instance on my local machine, can I wholesale transfer that database to a server and have it work without too much effort?
The reason I ask is that my server is currently an Amazon EC2 Micro instance and I need to put LOTS of data into a MongoDB and don't think I can spare the transactions and bandwidth on the EC2 instance.
There is copy database command which I guess should be good fit for your need.
Alternatively, you can just stop MongoDb, copy the database files to another server and run an instance of MongoDb there.