Mongodb clone to another cluster - mongodb

The idea here is, I have mongo cluster deployed in managed cloud service atlas. I have enabled Continuous Backup.
Now what I want to do is :
1) I want to use existing backup.
2) Using this existing backup I want to create similar cluster
(having same data form backup)
3) Automate this process so that every day my new cluster gets upto date from original cluster.
Note: The idea here for cloning cluster is, The original cluster is production data. I want to create a db which has similar data on which I can plug and play using any analytic tools and perform diffrent operations without affecting production data and load.
So far what I have found is to use mongorestore and mongodump.But here mongodump is putting load on production db even though my backup is enabled. I want to use same backup to clone this to another db cluster.

Deployed on Atlas, your server must have replica set.
Here are 2 solutions :
You need only reading data : connect your tools to a secondary server (ideally dedicated with priority 0 for becoming primary)
You need to read/write data : on the same server than above, play your mongodump command with --oplog option. By this way, you're dumping your data from a read-only server, preventing slowing performances of your main servers.
In this last case, what you need will find its solution in backup strategies, take a look at the doc to know more.

There's an offering for this purpose in ATLAS called analytic node.Link.
Analytic node is read replica of your database. Plus it will not interfere with your production traffic which makes it safer.
Also, you can connect BI connectors to this node and create your analytic platform.
We used redash.

Related

can multiple server access the same mongodb?

I am going to create a load balancer in Azure. I have a VM that already running and going to take a backup of the existing VM and will create another VM using that backup. So two servers will have the same configuration and will use the same credentials.
In the already existing server, I have MongoDB configured, and if I create the same VM that will also have the same configuration as the old VM. Now what I want to know is can I use the same MongoDB which will be accessed by two servers that have the same configurations?
Will it create any mess or any give any error?
can I use like above mentioned?
Do I need to configure another MongoDB for the second server?
can anyone please clarify my questions? it would be great to have some clear explanation. thank you
MongoDB has build in support for horizontal scalability and high availability meaning that you dont need to create a 3th party load balancer , the mongos service part of mongoDB sharding cluster is the load balancer itself. Check the official documentation for mongoDB replication and sharding ...
On your questions:
Will it create any mess or any give any error?
If you just copy data to another VM it will be fine , as far as you dont write to one of the VMs you can loadbalance reads between this independent VMs , but this is strange approch when you have build in mongoDB replication mechanism and you can just add the second VM as a SECONDARY member from replicaSet.
can I use like above mentioned?
Sure , you can use also this approach but why you will need to do it?
Do I need to configure another MongoDB for the second server?
Depends on the use case , but in general you would prefer to create 3x members replicaSet or if your database is large and write performance is strong requirement you may need to distribute the database between multiple servers ( shards ) so you will need more then just 3x servers ...

How to quickly mirror a Mongo database?

What's a quick and efficient way to transfer a large Mongo database?
I want to transfer a 10GB production Mongo 3.4 database to a staging environment for testing. I used the mongodump/mongorestore tools to test this transfer to my localhost, but it took over 8 hours and consumed a massive amount of CPU and memory, which is something I'd like to avoid in the future. The database doesn't have any indexes, so the mongodump option to exclude indexes doesn't increase performance.
My staging environment will mostly be read-only, but it will still need to write occasionally, so it can't be setup as a permanent read replica of production.
I've read about [replication sets][1], but they seem very complicated to setup and designed for permanent mirroring of a primary to two or more secondaries. I've read some posts about people hacking this to be temporary, so they can do a one-time mirroring, but I can't find any reliable documentation since this isn't the intended usage of the feature. All the guides I've read also say you need at least 3 servers, which seems unintuitive since I only have 2 (production and staging) and don't want to create a third.
Several options exist today (2020-05-06).
Copy Data Directory
If you can take the system offline you can copy the data directory from one host to another then set the configuration to point to this directory and start up the new mongod.
Mongomirror
Mongomirror (https://docs.atlas.mongodb.com/import/mongomirror/) is intended to be a tool to migrate from on-premises to Atlas, but this tool can be leveraged to copy data to another on-premises host. Beware, this connection requires SSL configurations on source and target to transfer.
Replicaset
MongoDB has built-in High Availability features using a replica set model (https://docs.mongodb.com/manual/tutorial/deploy-replica-set/). It is not overly complicated and works very well. This option allows the original system to stay online while replication does its magic. Once the replication completes reconfigure the replica set to be a single node replica set referring only to the new host and shut down the original host. This configuration is referred to as a single-node replica set. Having a single node replica set offers benefits over a stand-alone installation in that the replica set underpinnings (oplog) are the basis for other features such as change streams (https://docs.mongodb.com/manual/changeStreams/)
Backup and Restore
As you mentioned you can use mongodump/mongorestore. There is a point in time where the backup must be restored. During this time it is expected the original system is offline and not accepting any additional writes. This method is robust but has downtime associated with it. You could use mongoexport/mongoimport to use a JSON file as an intermediate step but this is not recommended as BSON data types could be lost in translation.
Per Mongo documentation, you should be able to cp/rsync files for creating a backup (if you are able to halt write ops temporarily on your production setup - or if you do this during a maintenance window)
https://docs.mongodb.com/manual/core/backups/#back-up-by-copying-underlying-data-files
Back Up with cp or rsync
If your storage system does not support snapshots, you can copy the files >directly using cp, rsync, or a similar tool. Since copying multiple files is not >an atomic operation, you must stop all writes to the mongod before copying the >files. Otherwise, you will copy the files in an invalid state.
Backups produced by copying the underlying data do not support point in time >recovery for replica sets and are difficult to manage for larger sharded >clusters. Additionally, these backups are larger because they include the >indexes and duplicate underlying storage padding and fragmentation. mongodump, >by contrast, creates smaller backups.
FYI - for replica sets, the third "server" is an arbiter which exists to break the tie when electing a new primary. It does not consume as many resources as the primary/secondaries. Since you are looking to creating a staging environment, i would not recommend creating a replica set that includes production and staging env. Your primary instance could switch over to the staging instance and clients who are meant to access production instance will end up reading/writing from staging instance.

Local MongoDB instance with index in remote server

One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.

How to replicate MySQL database to Cloud SQL Database

I have read that you can replicate a Cloud SQL database to MySQL. Instead, I want to replicate from a MySQL database (that the business uses to keep inventory) to Cloud SQL so it can have up-to-date inventory levels for use on a web site.
Is it possible to replicate MySQL to Cloud SQL. If so, how do I configure that?
This is something that is not yet possible in CloudSQL.
I'm using DBSync to do it, and working fine.
http://dbconvert.com/mysql.php
The Sync version do the service that you want.
It work well with App Engine and Cloud SQL. You must authorize external conections first.
This is a rather old question, but it might be worth noting that this seems now possible by Configuring External Masters.
The high level steps are:
Create a dump of the data from the master and upload the file to a storage bucket
Create a master instance in CloudSQL
Setup a replica of that instance, using the external master IP, username and password. Also provide the dump file location
Setup additional replicas if needed
Voilà!

Can I keep two mongo databases synced?

I have an app that can run in offline mode. If offline it uses a local mongo database, if it has a data connection it will use a remote mongo database.
Is there an easy way to sync these two databases and make sure they both have the union of their collections and documents?
EDIT: Effectively there are two databases that could both have insertions and deletions happening on them that aren't happening on the other. At fixed points in time I would like to have both databases show the union of them both.
For example over a period of time.
DB1.insert(A)
DB1.insert(B)
DB2.insert(C)
DB1.remove(A)
RUN SYNC
DB1 = DB2 = {B, C}
EDIT2: Been doing some reading. It's not the intended purpose but could they be set up as slaves replica sets of the remote and used that way? Problem is that I think replicas need to have a replica hosts must be accessible by way of resolvable DNS. Not sure how the remote could access local host.
You could use replica set but MongoDB doesn’t support master-master replication. Let's assume if you have setup like this:
two nodes with priority 1 which will be used as remote servers
single arbiter to ensure majority if one of remotes dies
5 local dbs with priority set as 0
When your application goes offline, it will stay secondary so you won't be able to perform writes. When you go online it will sync changes from remote dbs but you still need some way of syncing local changes. One of dealing with could be using local fallback db which will be used for writes when you are offline. When you go online, you push all new records to master. A little bit trickier could be dealing with updates but it is doable.
Another problem is that it won't scale up if you'll need to add more applications. If I remember correctly, there is a 12 nodes per replica set limit. For small cluster DNS resolution could be solved by using ssh tunnels.
Another way of dealing with a problem could be using small restful service and document timestamps. Whenever app is online it can periodically push local inserts to remote and pull data from remote db.