How to query from nearest shard/replica set - mongodb

We are looking at CQRS and are evaluating MongoDB as our query database. We have several branch offices with very poor internet connections (256 - 768 mbs DSL) that cannot be upgraded (remote rural locations with no ISP options). The thought was that we could put a box running MongoDB in each branch office and use some sort of clustering so that reads would become very fast since they were accessed over the LAN.
I do not need fail over or high availability.
How do I set up MongoDB so that computers in the branch office talk to the MongoDB instance in the branch office? Note that this will be read-only.
If MongoDB cannot support this workflow then I am open to suggestions for alternate document based databases that can support this.

You can add a secondary to the MongoDB replica set for each geographical location. Then you can connect to each secondary directly by specifying the hostname of the mongo instance that's local to where you are connecting from, and then issue the slaveOk() command to allow it to be queried. Replication will keep the secondaries in sync with the primary as closely as possible.
Writes/updates will still need to be handled by the primary of the replica set, however.
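As a minimal sketch of the "connect to the local secondary directly" idea, the snippet below builds a connection string for one specific member. The hostname and database name are hypothetical. `directConnection` and `readPreference` are standard MongoDB connection string options; in current drivers, `readPreference=secondaryPreferred` plays the role that slaveOk() played in older ones.

```python
from urllib.parse import urlencode


def local_read_uri(host, port=27017, database="test"):
    """Build a URI that talks to one specific member only (no discovery)."""
    options = {
        # Connect to this host only instead of discovering the whole set.
        "directConnection": "true",
        # Allow reads from a secondary (the modern counterpart of slaveOk).
        "readPreference": "secondaryPreferred",
    }
    return f"mongodb://{host}:{port}/{database}?{urlencode(options)}"


# Hypothetical hostname of the MongoDB box in one branch office.
uri = local_read_uri("mongo.branch-a.example.internal", database="reports")
print(uri)
```

Each branch office would use its own hostname here, while the writing side of the application keeps a normal replica set connection to reach the primary.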

CouchDB was built for this: occasionally connected document databases.
You can copy a database to your laptop, take it on a plane, modify it there, and when you have network access again, synchronize it with the master database. The same works for remote offices.
I strongly suggest that you check it out.

Related

Local MongoDB instance with index in remote server

One of our clients has a server running a MongoDB instance, and we have to build an analytical application using the data stored in their MongoDB database, which changes frequently.
The client's requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server, but instead run our own MongoDB instance on a machine in our office, using their MongoDB database directory remotely with read-only access.
We've suggested deploying a REST application or getting a copy of their database dump, but they did not want that. They just want us to run our own MongoDB instance hooked up to their MongoDB instance's directory. Is this even possible?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is a normal request, because analytical queries could put too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up a MongoDB replica set with the production database instance as primary and the analytical database instance as secondary, and configure the analytical instance to never become primary.
If it is not possible to use replication (for example, the client doesn't want it, or the servers cannot connect directly to each other), there is another option. You can read the oplog from the remote database and apply the operations to your own database instance. This is exactly the low-level mechanism a replica set uses, but you can do it manually too. For example, MMS (MongoDB Monitoring Service) Backup reads the oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for applying, in real time, the replication oplog pulled from a remote server to a local server.
I don't think that running two databases that point to the same database files is possible, let alone recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution would be to take file system snapshots and then restore them to your local database.
The downside of these backup/restore solutions is that your data will not be in sync all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.
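The two-member layout described above can be sketched as a replica set configuration document, i.e. the shape of the document that rs.initiate() or rs.reconfig() accepts in the mongo shell. This is only an illustration: the set name and hostnames are hypothetical, and the dict is built in plain Python without contacting any server.

```python
def analytics_replica_config(set_name, client_host, local_host):
    """Return a config document: client primary plus one hidden secondary."""
    return {
        "_id": set_name,
        "members": [
            # The client's production server: the only electable, voting member.
            {"_id": 0, "host": client_host, "priority": 1, "votes": 1},
            # Our office instance: replicates data, never elected, invisible
            # to client applications, and without a vote in elections.
            {"_id": 1, "host": local_host, "priority": 0, "votes": 0,
             "hidden": True},
        ],
    }


# Hypothetical names, for illustration only.
config = analytics_replica_config(
    "analytics-rs", "db.client.example.com:27017", "db.office.example.com:27017"
)
print(config["members"][1])
```

Because the office member has no vote, the client's server holds the only vote and should stay primary even when the link to the office is down.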

Can I keep two mongo databases synced?

I have an app that can run in offline mode. If offline it uses a local mongo database, if it has a data connection it will use a remote mongo database.
Is there an easy way to sync these two databases and make sure they both have the union of their collections and documents?
EDIT: Effectively there are two databases that could both have insertions and deletions happening on them that aren't happening on the other. At fixed points in time I would like to have both databases show the union of them both.
For example over a period of time.
DB1.insert(A)
DB1.insert(B)
DB2.insert(C)
DB1.remove(A)
RUN SYNC
DB1 = DB2 = {B, C}
EDIT2: Been doing some reading. It's not the intended purpose, but could the local databases be set up as slaves (replica set members) of the remote and used that way? The problem is that, I think, replica set hosts must be accessible by way of resolvable DNS. I'm not sure how the remote could access the local host.
You could use a replica set, but MongoDB doesn't support master-master replication. Let's assume you have a setup like this:
two nodes with priority 1 which will be used as remote servers
single arbiter to ensure majority if one of remotes dies
5 local dbs with priority set as 0
When your application goes offline, its local member stays a secondary, so you won't be able to perform writes. When you go online, it will sync changes from the remote dbs, but you still need some way of syncing local changes. One way of dealing with this could be a local fallback db that is used for writes while you are offline. When you go online, you push all new records to the master. Dealing with updates could be a little trickier, but it is doable.
Another problem is that it won't scale if you need to add more applications. If I remember correctly, there is a limit of 12 nodes per replica set (raised to 50 members, of which at most 7 can vote, in MongoDB 3.0). For a small cluster, DNS resolution could be solved by using SSH tunnels.
Another way of dealing with the problem could be a small RESTful service and document timestamps. Whenever the app is online, it can periodically push local inserts to the remote db and pull data from it.
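To make the push/pull idea concrete, here is a small in-memory sketch that replays the example from the question. It is not MongoDB code: OfflineStore is a hypothetical stand-in for one database, and it records removals as tombstones (a swapped-in simplification of the timestamp approach) so that a delete on one side is not undone by the other side's copy. It assumes document ids are never reused after a removal.

```python
class OfflineStore:
    """Hypothetical in-memory stand-in for one database."""

    def __init__(self):
        self.docs = {}        # doc id -> document
        self.removed = set()  # tombstones: ids removed since the last sync

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc

    def remove(self, doc_id):
        self.docs.pop(doc_id, None)
        self.removed.add(doc_id)


def sync(a, b):
    """Give both stores the union of their documents, minus anything removed."""
    removed = a.removed | b.removed
    merged = {**a.docs, **b.docs}  # on an id clash, b's copy wins (assumption)
    merged = {k: v for k, v in merged.items() if k not in removed}
    for store in (a, b):
        store.docs = dict(merged)
        store.removed = set()      # both sides have now seen the removals


db1, db2 = OfflineStore(), OfflineStore()
db1.insert("A", {"name": "A"})
db1.insert("B", {"name": "B"})
db2.insert("C", {"name": "C"})
db1.remove("A")
sync(db1, db2)
print(sorted(db1.docs), sorted(db2.docs))
```

Updates to the same document on both sides are the hard part. This sketch simply lets the second store win, which is where per-document timestamps would come in.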

Mongodb slaveOk - preferred server

Assume I have N servers, each operating as a web server and a mongodb member of a replica set.
I'd like the slaveOk reads to be satisfied first by the local mongodb instance, rather than a remote machine across the network.
The documentation says slaveOk reads are satisfied by an arbitrary member. Is it possible to override that?
Mongodb 1.8, C-sharp driver 1.2.
"The documentation says slaveOk reads are satisfied by an arbitrary member. Is it possible to override that?"
Not without changing the C# driver. You'd probably have to look somewhere in this file to make those changes.
"Assume I have N servers, each operating as a web server and a mongodb member of a replica set."
As a note, this is generally not the expected usage for MongoDB. Implemented in this way, your web server will be competing for RAM with MongoDB. If a server gets overloaded the web server will starve the mongod process which will cause connections to back up and exacerbate the issue.
It sounds like you're trying to use MongoDB as a local cache and there are far better tools for this job.
The closest you could come to what you are describing is for each web application to open a separate direct connection (not in replica set mode) to the local mongodb and use that separate connection for reads.

mongoDB replication with offline nodes

Is it possible to set up a MongoDB replica set for the following scenario (and if so, how):
2 servers always online running mongodb, one of them holds the main node, the other one a rescue copy;
n computers each of them running mongodb, occasionally connected to internet, holding nodes which need synchronizing with main node, when they go online.
Backup only. To do this, set the priority of this node to 0. If the node is never going to be used as master or queried, you can also set buildIndexes to false.
More information here.
Intermittent slave. Due to limitations (mainly on the size of the oplog), a slave can't stay halted for very long if your MongoDB receives many writes; see here. However, you can use the mongodump and mongorestore tools directly over the network, or script them and sync the backup file. More information here. Note that a restore brings a database or collection onto a server and rebuilds its indexes completely (if you restore the system.indexes collection too), which can take some time.
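The oplog limitation can be estimated with simple arithmetic: a member that stays offline longer than the time span covered by the oplog can no longer catch up from it and needs a full resync. The sketch below assumes a constant write rate, which real workloads rarely have, and the numbers are made up. In the mongo shell, rs.printReplicationInfo() reports the actual time span the oplog currently covers.

```python
def oplog_window_hours(oplog_size_mb, oplog_mb_per_hour):
    """Rough time span the oplog covers at a constant write rate."""
    if oplog_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / oplog_mb_per_hour


def can_catch_up(offline_hours, oplog_size_mb, oplog_mb_per_hour):
    """True if a member offline for this long could still sync from the oplog."""
    return offline_hours < oplog_window_hours(oplog_size_mb, oplog_mb_per_hour)


# Made-up numbers: a 1024 MB oplog filling at 64 MB per hour.
window = oplog_window_hours(1024, 64)
print(window)                       # 16.0
print(can_catch_up(8, 1024, 64))    # True
print(can_catch_up(48, 1024, 64))   # False
```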

mongodb single DB replication

I have a working MongoDB "replica set" made up of 3 servers.
It stores two DBs. I wonder whether it is possible to replicate only one of the DBs without running more than one MongoDB instance (one per DB).
Here is a sketch of the "problem"
DB     Server1   Server2   Server3
DB1    X         X         X
DB2    X         X         -
X marks a server the DB has to be replicated to; - marks a server that should not hold it.
Thanks
I don't believe it is possible.
Unlike sharding, where you specify down to the collection level what gets sharded, with replica sets you're defining that a given MongoDB instance is part of a replica set. As only one node in a replica set can be the master at any given time, based on the scenario you are talking about, then there would be a problem if e.g. Server1 went down and Server3 was promoted to master - as DB2 would then not be able to be written to.
I had a similar problem and found a fairly easy solution in JavaScript, to be executed in a mongo shell.
Source code available here:
http://www.suenkel.de/blog/2012/02/mongodb-replicate-one-database-or-collection/
By opening a tailable cursor on the oplog of the master server, each operation can be applied to another server (of course, you can filter by the namespace of the collections or even the databases).
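The namespace filter mentioned above is small enough to sketch on its own. Oplog entries carry an `ns` field of the form "database.collection" and an `op` field such as "i" (insert), "u" (update) or "d" (delete). The sample entries below are made up, and the function only decides which entries to forward; reading the oplog and applying the operations are left out.

```python
def wanted(entry, database, collection=None):
    """Return True if an oplog entry belongs to the database (and collection)."""
    db_name, _, coll_name = entry.get("ns", "").partition(".")
    if db_name != database:
        return False
    return collection is None or coll_name == collection


# Made-up oplog entries, reduced to the fields used here.
entries = [
    {"op": "i", "ns": "shop.orders", "o": {"_id": 1, "total": 20}},
    {"op": "i", "ns": "shop.customers", "o": {"_id": 7, "name": "Ann"}},
    {"op": "d", "ns": "logs.events", "o": {"_id": 3}},
]

whole_db = [e["ns"] for e in entries if wanted(e, "shop")]
one_coll = [e["ns"] for e in entries if wanted(e, "shop", "orders")]
print(whole_db)
print(one_coll)
```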
According to the current MongoDB replica set architecture, you can't have a single replica set in which some members hold only part of the databases or collections.
However, if you need to replicate a single database or collection in real time to another location, here is the workaround I ended up with:
Use directoryPerDB to separate the desired database files (Create a new replica with this option enabled if you don't have this already)
Copy the directory of desired database to the new location.
Deploy a new ReplicaSet with this single database.
Write a simple script and use Change Streams to perform the replication for you.
As I said, you will end up with another replica set dedicated to this database, but replication is done in real time and both replica sets have the data in a consistent state (you have to perform your write operations on the first replica set, though).
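The core of such a script is the step that turns one change stream event into a write on the target. The sketch below shows only that step, against an in-memory dict instead of a real second replica set. The event fields it reads (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) are the ones change events carry. Everything else is an assumption for illustration, and dotted paths for nested fields are not handled.

```python
def apply_change(target, event):
    """Apply one change event to target, a dict mapping _id to document."""
    key = event["documentKey"]["_id"]
    kind = event["operationType"]
    if kind in ("insert", "replace"):
        target[key] = dict(event["fullDocument"])
    elif kind == "update":
        doc = target.setdefault(key, {"_id": key})
        desc = event["updateDescription"]
        doc.update(desc.get("updatedFields", {}))  # top-level fields only
        for field in desc.get("removedFields", []):
            doc.pop(field, None)
    elif kind == "delete":
        target.pop(key, None)
    # Other event types (drop, rename, invalidate, ...) are ignored here.


# Made-up events, in the order a change stream would deliver them.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "a", "tmp": True}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"name": "b"},
                           "removedFields": ["tmp"]}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "name": "c"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

A real script would also persist the stream's resume token, so that it can continue where it left off after a restart.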