Can I keep two mongo databases synced? - mongodb

I have an app that can run in offline mode. If offline it uses a local mongo database, if it has a data connection it will use a remote mongo database.
Is there an easy way to sync these two databases and make sure they both have the union of their collections and documents?
EDIT: Effectively there are two databases that could both have insertions and deletions happening on them that aren't happening on the other. At fixed points in time I would like to have both databases show the union of them both.
For example over a period of time.
DB1.insert(A)
DB1.insert(B)
DB2.insert(C)
DB1.remove(A)
RUN SYNC
DB1 = DB2 = {B, C}
EDIT2: Been doing some reading. It's not the intended purpose but could they be set up as slaves replica sets of the remote and used that way? Problem is that I think replicas need to have a replica hosts must be accessible by way of resolvable DNS. Not sure how the remote could access local host.

You could use replica set but MongoDB doesn’t support master-master replication. Let's assume if you have setup like this:
two nodes with priority 1 which will be used as remote servers
single arbiter to ensure majority if one of remotes dies
5 local dbs with priority set as 0
When your application goes offline, it will stay secondary so you won't be able to perform writes. When you go online it will sync changes from remote dbs but you still need some way of syncing local changes. One of dealing with could be using local fallback db which will be used for writes when you are offline. When you go online, you push all new records to master. A little bit trickier could be dealing with updates but it is doable.
Another problem is that it won't scale up if you'll need to add more applications. If I remember correctly, there is a 12 nodes per replica set limit. For small cluster DNS resolution could be solved by using ssh tunnels.
Another way of dealing with a problem could be using small restful service and document timestamps. Whenever app is online it can periodically push local inserts to remote and pull data from remote db.

Related

Mongodb clone to another cluster

The idea here is, I have mongo cluster deployed in managed cloud service atlas. I have enabled Continuous Backup.
Now what I want to do is :
1) I want to use existing backup.
2) Using this existing backup I want to create similar cluster
(having same data form backup)
3) Automate this process so that every day my new cluster gets upto date from original cluster.
Note: The idea here for cloning cluster is, The original cluster is production data. I want to create a db which has similar data on which I can plug and play using any analytic tools and perform diffrent operations without affecting production data and load.
So far what I have found is to use mongorestore and mongodump.But here mongodump is putting load on production db even though my backup is enabled. I want to use same backup to clone this to another db cluster.
Deployed on Atlas, your server must have replica set.
Here are 2 solutions :
You need only reading data : connect your tools to a secondary server (ideally dedicated with priority 0 for becoming primary)
You need to read/write data : on the same server than above, play your mongodump command with --oplog option. By this way, you're dumping your data from a read-only server, preventing slowing performances of your main servers.
In this last case, what you need will find its solution in backup strategies, take a look at the doc to know more.
There's an offering for this purpose in ATLAS called analytic node.Link.
Analytic node is read replica of your database. Plus it will not interfere with your production traffic which makes it safer.
Also, you can connect BI connectors to this node and create your analytic platform.
We used redash.

Local MongoDB instance with index in remote server

One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.

Life-copies for devel-team in MongoDb

Q: Which is the best architecture for life-copies for testing and development?
Current setup:
We have two amazon/EC2 mongod servers like this:
Machine A:
A production database (on an amazon/EC2 server) (name it ‘PROD’)
Other databases (‘OTHER’)
Machine B:
a pre-production database (name it ‘PRE’)
a copy for developer 1 own tests (call it ‘DEVEL-1’)
a copy for developer 2 (DEVEL-2)
…DEVEL-n
The PRE database is for integration-tests before deploying into production.
The DEVEL-n is for each developer trashing its own data without annoying the other developers.
From time to time we want to “restore” fresh data from PROD into the PRE and DEVEL-n bases.
Currently we pass from PROD to PRE via the .copyDatabase() command.
Then we issue .copyDatabase() “n” times to make copies from PRE into DEVEL-n.
The trouble:
A copy takes soooo long (1hour per copy, DBsize over 10GB) and also normally it saturates the mongod so we have to restart the service.
We have found about:
Dump/restore system (saturates as .copyDatabase() does)
Replica sets
Master/Slave (seem deprecated)
Replica-sets seem the winners, but we have serious doubts:
Suppose we want a replica-set to sync live A/PROD into B/PRE (and have A likely as a primary and B likely as secondary):
a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?
b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?
c) Can I “stop to replicate” so we can deploy to PRE, test the soft with fresh-data, trash the data with the testing and after tests have been complete “re-link” the replica so changes in PRE are deleted and changes in PROD are transported into PRE adequately?
d) Is there any other better way than replica-sets suitable for this case?
Thanks.
Marina and Xavi.
Replica-sets seem the winners, but we have serious doubts:
Suppose we want a replica-set to sync live A/PROD into B/PRE
(and have A likely as a primary and B likely as secondary):
a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?
As at MongoDB 2.4, replication always includes all databases. The design intent is for all nodes to be eventually consistent replicas, so that you can failover to another non-hidden secondary in the same replica set.
b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?
No, there is only a single primary in a replica set.
c) Can I “stop to replicate” so we can deploy to PRE, test the soft
with fresh-data, trash the data with the testing and after tests have
been complete “re-link” the replica so changes in PRE are deleted and
changes in PROD are transported into PRE adequately?
Since there can only be a single primary, the use case of mixing production and test roles in the same replica set is not possible how you've envisioned.
Best practice would to isolate your production and dev/staging environments so there can be no unexpected interaction.
d) Is there any other better way than replica-sets suitable for this case?
There are some approaches you can take to limit the amount of data needed to be transferred so you are not copying the full database (10Gb) across from production each time. Replica sets are suitable as part of the solution, but you will need to have a separate standalone server or replica set for your PRE environment.
Some suggestions:
Use a replica set and add a hidden secondary in your development environment. You can take backups from this node without affecting your production application, and since the secondary replicates changes as they occur you should be doing a comparatively faster local network copy of the backup.
Implement your own scheme for partial replication based on a tailable cursor of MongoDB's oplog. The local oplog.rs capped collection is the same mechanism used to relay changes to members of a replica set and includes details for inserts, deletes, and updates. You could match on the relevant database namespaces and relay matching changes from your production replica set into your isolated PRE environment.
Either of these approaches would allow you control over when the backup is transferred from PROD to PRE, as well as restarting from a previous point after testing.
In our setup we use EBS snapshots to quickly replicate production database on staging environment. Snapshots are run every few hours as part of backup cycle. When starting new DB server in staging, it looks for most recent DB snapshot and use it for EBS drive.
Taking snapshot is almost instant, recovery is also very fast. This approach also scales up very well, we actually using it in huge sharded MongoDB installation. The only downside is that you need to rely on AWS services to implement it. That can be undesirable in some cases.

How to query from nearest shard/replica set

We are looking at CQRS and are evaluating MongoDB as our query database. We have several branch offices with very poor internet connections (256 - 768 mbs DSL) that cannot be upgraded (remote rural locations with no ISP options). The thought was that we could put a box running MongoDB in each branch office and use some sort of clustering so that reads would become very fast since they were accessed over the LAN.
I do not need fail over or high availability.
How do I setup MongoDB so that computers in the branch office talk to the Mongodb instance in the branch office? Note this will be read only.
If MongoDB cannot support this workflow then I am open to suggestions for alternate document based databases that can support this.
You can add a secondary to the MongoDB replica set for each geographical location. Then you can connect to each secondary directly by specifying the hostname of the mongo instance that's local to where you are connecting from, and then issue the slaveOk() command to allow it to be queried. Replication will keep the secondaries in sync with the primary as closely as possible.
Writes/updates will still need to be handled by the primary of the replica set, however.
CouchDB was built for this: occasionally connected document databases.
You can copy a database to your laptop, take it on a plane, modify it there and when you'll have network again, you'll be able to synchronize it with master database. The same works for remote offices.
I strongly suggest that you check it out.

mongoDB replication with offline nodes

Is it possible to set up mongoDB replica set with following scenario (if it is,how):
2 servers always online running mongodb, one of them holds the main node, the other one a rescue copy;
n computers each of them running mongodb, occasionally connected to internet, holding nodes which need synchronizing with main node, when they go online.
Backup only. In order to do this, you'll have to specify the priority of this node to 0. If your node is never going to be used as master nor queried, you can also set buildIndexes to false.
More informations here.
Intermitent slave. Due to limitations (mainly on the oplog queue), you can't have a slave halted for a very long time if you have many writes on your MongoDB, see here. However, you can use the mongodump and mongorestore tools directly over network or by script + sync the backup file. More informations here. Note that a restore will bring a db or collection in a server and recreate the indexes completely (if you restore the system.indexes collection too) which can take some time.