MongoDB cluster and reading from secondary when primary is down

This is not a basic question just asking about Mongo clusters, and in my opinion it is not a duplicate.
I have a MongoDB 3-node cluster, and my URI is something along the following lines in a Play Framework conf file:
mongodb.uri = "mongodb://mongodb1:27017,mongodb2:27017,mongodb3:27017/myproj"
By default, when the replica set is configured, all reads and writes happen only on the primary, and that is what I want. However, I want reads to go to a secondary when no primary is left, that is, when 2 nodes go down and only one secondary remains.
I do not want to modify my code to achieve this for each read query. I tried the following on the secondary node, but it does not help:
db.getMongo().setReadPref('primaryPreferred')
What exactly do I need to do to make this work?

I do not want to modify my code to achieve this for each read query. I tried the following on the secondary node but it does not help:
db.getMongo().setReadPref('primaryPreferred')
You are on the right track with read preferences, but you need to set this in your connection string or driver options. Setting a read preference in the mongo shell only affects the current shell session and has no effect on remote connections.
mongodb.uri = "mongodb://mongodb1:27017,mongodb2:27017,mongodb3:27017/myproj"
You need to add some additional parameters as per MongoDB's Connection String URI Format:
(required) The replicaSet=... option indicates that the driver should use "replica set" connection mode as opposed to the default direct connection mode. This parameter enables replica set monitoring, read preferences, and discovery of topology changes. The provided replica set name must match the replica set name configured for your deployment. For full details on the connection behaviour expected for officially supported MongoDB Drivers, see the Server Discovery and Monitoring (SDAM) specification. The rationale section of the spec includes answers to common questions about the chosen approach.
(required) The readPreference=primaryPreferred option indicates the preference to read from a primary but use a secondary if there is no primary available.
(optional) In MongoDB 3.4+ you can specify a maxStalenessSeconds=... option which limits the maximum replication lag (or staleness) when using a secondary read preference. By default there is no max staleness so the driver will not consider replication lag when selecting a secondary based on read preference. If your intent is to use primaryPreferred as a failover option for reads I would set max staleness with caution: you need to ensure that you have at least one secondary which has acceptable staleness.
So, assuming a replica set name of mongocluster and database of myproj the suggested connection string would be:
mongodb://mongodb1:27017,mongodb2:27017,mongodb3:27017/myproj?replicaSet=mongocluster&readPreference=primaryPreferred
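As a quick sanity check, the options a driver would see in that URI can be inspected with only the Python standard library (no MongoDB driver involved; this is just URI parsing, not a connection):

```python
from urllib.parse import urlparse, parse_qs

# The suggested connection string (hosts and names taken from the question).
uri = ("mongodb://mongodb1:27017,mongodb2:27017,mongodb3:27017/myproj"
       "?replicaSet=mongocluster&readPreference=primaryPreferred")

# urlparse keeps the comma-separated host list in netloc;
# the driver options live in the query string.
parsed = urlparse(uri)
options = {key: values[0] for key, values in parse_qs(parsed.query).items()}

print(options["replicaSet"])      # mongocluster
print(options["readPreference"])  # primaryPreferred
```

If either option is misspelled or missing, it simply will not appear in the parsed dictionary, which makes this a cheap way to catch typos before deploying the conf file.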

Why isn't "connect" option in mongo connection string documented?

The issue was that even if I target just one node of my replica set in my connection string, mongo-go-driver always wants to discover and connect to the other nodes.
I found a solution here that basically says I should add the connect option to the connection string:
mongodb://host:27017/authDb?connect=direct
My question is: how good or bad a practice is this, and why doesn't MongoDB document it? Are there other values this option can have?
That option only exists for the Go driver. For all other drivers it is unrecognized, which is why it is not documented as a general connection string option.
It is documented for the Go Driver at https://godoc.org/go.mongodb.org/mongo-driver/mongo#example-Connect--Direct
How good or bad a practice is this, why doesn't MongoDB document it, and are there other values this option can have?
As pointed out in the accepted answer, this is documented in the driver documentation. Now for the other part of the question.
Generally speaking, in the replica set context you would want to connect to the topology instead of directly to a specific replica set member, with an exception for administrative purposes. Replication is designed to provide redundancy, and connecting directly to one member, i.e. the primary, is not recommended in case of fail-over.
All of the official MongoDB drivers follow the MongoDB Specifications. In regards to direct connections, the requirement currently is server-discovery-and-monitoring.rst#general-requirements:
Direct connections: A client MUST be able to connect to a single
server of any type. This includes querying hidden replica set members,
and connecting to uninitialized members (see RSGhost) in order to run
"replSetInitiate". Setting a read preference MUST NOT be necessary to
connect to a secondary. Of course, the secondary will reject all
operations done with the PRIMARY read preference because the slaveOk
bit is not set, but the initial connection itself succeeds. Drivers
MAY allow direct connections to arbiters (for example, to run
administrative commands).
It only specifies that a driver MUST be able to do so, but not how. The MongoDB Go driver is not the only driver that currently supports the direct option approach; the .NET/C# and Ruby drivers do as well.
Currently there is an open PR for the specifications to unify the behaviour. In the future, all drivers will have the same way of establishing a direct connection.

How do I set Secondary Preferred reads on Mongo Database from Parse Server?

We created 10 nodes with Parse Server and they killed our Mongo database's primary.
Is there a way we can set up Secondary Preferred for reads on the Parse servers? Otherwise all the load will be put on the primary node.
You should read the MongoDB documentation on read preferences.
If you use the MongoDB native driver or Mongoose, you can achieve this by setting secondaryPreferred as the read preference.
The documentation states that with secondaryPreferred, in most situations operations read from secondary members, but if no secondary members are available, operations read from the primary.
You can also tune your secondary read preference by configuring maxStalenessSeconds.
Please go through the documentation to understand more about read preferences:
https://docs.mongodb.com/manual/core/read-preference/
The Parse Server docs you are looking for are here.
Example:
const query = new Parse.Query("MyClass");
query.readPreference("SECONDARY");

Replicating data between 2 Mongo replica sets

I currently have a Mongo replica set consisting of 1 primary and 2 secondaries, used by a read-only application. I'm adding a 2nd read-only application that requires access to the same data. I have considered using the same RS for both applications, but was wondering if there's a way to create a specific type of configuration with Mongo that works something like this:
1 x primary that handles all writes, but is not seen as part of a replica set by the applications, and then 2 sets of read-only secondaries, each of which replicates writes from the primary. Conceptually, something like:
/----> RS1: |Secondary1|Secondary2|..|SecondaryN| <--- App1
PRIMARY|=>
\----> RS2: |Secondary1|Secondary2|..|SecondaryN| <--- App2
Is this sort of configuration possible at all? What similar architectures could I consider for this use-case?
Thanks in advance.
Brett
I came across a way to implement this using Mongo tooling:
1. Create a replica set to use as the master. Data updates are written to this RS (represented by "PRIMARY" in the diagram). Do not enable authentication on this host.
2. Create 2 replica sets with the same data (completely independent of each other).
3. Schedule regular mongooplog runs, using the RS from #1 as the --from source and each of the RSs from #2 as the --host destination (see the manual).
4. Set up authentication on the RSs from #2 that gives applications read-only access to the data.
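For illustration, the scheduled mongooplog runs could be crontab entries along these lines (hostnames, ports, and the schedule are placeholders, not from the question; --from and --host are mongooplog's source and destination flags):

```
# Every 10 minutes, replay new oplog entries from the master RS
# to one node of each read-only replica set
*/10 * * * * mongooplog --from master1:27017 --host rs1-node1:27017
*/10 * * * * mongooplog --from master1:27017 --host rs2-node1:27017
```

How frequently you run this depends on how stale the read-only sets are allowed to become and on the oplog window of the master RS.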
I haven't tested this yet, but from what I can tell, this approach should work for my objectives - is there anything I've overlooked in this approach?
Edit:
While trying this approach, I ran into issues using mongooplog with authentication on the destination. Over and above that, mongooplog doesn't cater for authentication on the source (--from) RS, so I wrote a tool to handle this:
https://github.com/brettcave/mongo-oplogreplay
It supports authentication on both source and destination, as well as replicaset targets.

MongoDB one way replication

I need some way to push data from client databases to a central database. Basically, there are several instances of MongoDB running on remote machines [clients], and I need some method to periodically update the central Mongo database with newly added and modified documents from the clients. Each client must replicate its records to the single central server.
E.g.:
If I have 3 Mongo instances running on 3 machines, each holding 10GB of data, then after the data migration the 4th machine's MongoDB must have 30GB of data. The central MongoDB machine must be periodically updated with the data of all 3 machines. But these 3 machines not only get new documents; existing documents in them may be updated, and I would like the central MongoDB machine to receive those updates as well.
Your desired replication strategy is not formally supported by MongoDB.
A MongoDB replica set consists of a single primary with asynchronous replication to one or more secondary servers in the same replica set. You cannot configure a replica set with multiple primaries or replication to a different replica set.
However, there are a few possible approaches for your use case depending on how actively you want to keep your central server up to date and the volume of data/updates you need to manage.
Some general caveats:
Merging data from multiple standalone servers can create unexpected conflicts. For example, unique indexes would not know about documents created on other servers.
Ideally the data you are consolidating will still be separated by a unique database name per origin server, so you don't have strange crosstalk between disparate documents from different origin servers that happen to share the same namespace and _id.
Approach #1: use mongodump and mongorestore
If you just need to periodically sync content to your central server, one way to do so is using mongodump and mongorestore. You can schedule a periodic mongodump from each of your standalone instances and use mongorestore to import them into the central server.
Caveats:
There is a --db parameter for mongorestore that allows you to restore into a different database from the original name (if needed)
mongorestore only performs inserts into the existing database (i.e. does not perform updates or upserts). If existing data with the same _id already exists on the target database, mongorestore will not replace it.
You can use mongodump options such as --query to be more selective on data to export (for example, only select recent data rather than all)
If you want to limit the amount of data to dump & restore on each run (for example, only exporting "changed" data), you will need to work out how to handle updates and deletions on the central server.
Given the caveats, the simplest use of this approach would be to do a full dump & restore (i.e. using mongorestore --drop) to ensure all changes are copied.
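A sketch of that full dump-and-restore cycle might look like the following (hostnames, paths, and the per-origin database name are placeholders; the flags shown are standard mongodump/mongorestore options, but check your tool versions):

```
# Dump one database from a standalone origin server
mongodump --host origin1:27017 --db appdata --out /backups/origin1

# Restore into the central server under a per-origin database name,
# dropping existing collections first so deletions are propagated
mongorestore --host central:27017 --db appdata_origin1 --drop /backups/origin1/appdata
```

Using a distinct target database per origin server (as noted in the caveats above) avoids _id collisions between documents from different origins.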
Approach #2: use a tailable cursor with the MongoDB oplog.
If you need more realtime or incremental replication, a possible approach is creating tailable cursors on the MongoDB replication oplog.
This approach is basically "roll your own replication". You would have to write an application which tails the oplog on each of your MongoDB instances and looks for changes of interest to save to your central server. For example, you may only want to replicate changes for selective namespaces (databases or collections).
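To illustrate the "roll your own replication" idea, the sketch below dispatches already-fetched oplog entries to an in-memory stand-in for the central server. The field names follow the oplog format ("op" is the operation code: "i" insert, "u" update, "d" delete; "ns" is the namespace; "o" the payload; "o2" the update selector), but the namespace names are made up, a real implementation would tail the oplog with a driver, and real update payloads may contain modifiers such as $set rather than full documents:

```python
WATCHED = {"myproj.orders"}  # namespaces we choose to replicate (made-up name)

def apply_entry(entry, central):
    """Apply one oplog entry to `central`, a dict of {namespace: {_id: doc}}."""
    if entry["ns"] not in WATCHED:
        return  # skip namespaces we are not replicating
    coll = central.setdefault(entry["ns"], {})
    if entry["op"] == "i":        # insert
        coll[entry["o"]["_id"]] = entry["o"]
    elif entry["op"] == "u":      # update: "o2" carries the _id of the target
        coll[entry["o2"]["_id"]] = entry["o"]
    elif entry["op"] == "d":      # delete
        coll.pop(entry["o"]["_id"], None)

central = {}
oplog_entries = [
    {"op": "i", "ns": "myproj.orders", "o": {"_id": 1, "qty": 2}},
    {"op": "u", "ns": "myproj.orders", "o2": {"_id": 1}, "o": {"_id": 1, "qty": 5}},
    {"op": "d", "ns": "myproj.users", "o": {"_id": 9}},  # ignored namespace
]
for entry in oplog_entries:
    apply_entry(entry, central)

print(central["myproj.orders"][1])  # {'_id': 1, 'qty': 5}
```

The same dispatch structure is where you would add conflict handling, resume tokens/timestamps, and error recovery in a production version.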
A related tool that may be of interest is the experimental Mongo Connector from 10gen labs. This is a Python module that provides an interface for tailing the replication oplog.
Caveats:
You have to implement your own code for this, and learn/understand how to work with the oplog documents
There may be an alternative product which better supports your desired replication model "out of the box".
You should be aware that replica sets are MongoDB's only replication mechanism, and a replica set always means one primary and multiple secondaries; writes always go to the primary. Apparently you want multi-master replication, which is not supported by MongoDB, so you may want to look into a different technology such as CouchDB or Couchbase. MongoDB is not a good fit here.
There may be a way since MongoDB 3.6 to achieve your goal: Change Streams.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.
There are some configuration options that affect whether you can use Change Streams or not, so please read about them.
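To sketch how a consumer of change streams might look: change events are documents with an "operationType", an "ns" subdocument ({db, coll}), and for inserts a "fullDocument" field. The example below filters already-received events as plain dicts (the database and collection names are made up; a live change stream requires a driver and a replica set deployment):

```python
def wanted(event, db="myproj", coll="orders"):
    """Select events for the one collection we mirror centrally (names made up)."""
    ns = event.get("ns", {})
    return ns.get("db") == db and ns.get("coll") == coll

events = [
    {"operationType": "insert",
     "ns": {"db": "myproj", "coll": "orders"},
     "fullDocument": {"_id": 1, "qty": 2}},
    {"operationType": "delete",
     "ns": {"db": "myproj", "coll": "users"},
     "documentKey": {"_id": 9}},
]

# Mirror only inserts on the watched collection to the central server.
mirrored = [e["fullDocument"] for e in events
            if wanted(e) and e["operationType"] == "insert"]
print(mirrored)  # [{'_id': 1, 'qty': 2}]
```

In practice this kind of filtering can also be pushed into the change stream's aggregation pipeline on the server side, so only matching events are delivered to the client.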
Another option is Delayed Replica Set Members.
Because delayed members are a "rolling backup" or a running "historical" snapshot of the data set, they may help you recover from various kinds of human error. For example, a delayed member can make it possible to recover from unsuccessful application upgrades and operator errors including dropped databases and collections.
Hidden Replica Set Members may be another option to consider.
A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members are good for workloads with different usage patterns from the other members in the replica set.
Another option may be to configure a Priority 0 Replica Set Member.
A priority 0 member cannot become primary and cannot trigger elections, but it otherwise maintains a copy of the data set and can serve read operations.
I am interested in these options myself, but I haven't decided what approach I will use.

MongoDB BI Architecture

We've have a production app running Mongo with a replica set on different box.
I'd like to start doing some BI on the data, possibly using Pentaho.
My question is: how should I setup my architecture such that I'm not doing BI directly on the production environment?
Should I create a separate BI instance and do an mongoexport to the BI instance or is there some other best-practice I should follow?
There are several options to consider depending on your data set, BI requirements, and MongoDB server version. If you just need to read data for reports, there are more options than if you want to write data as well (eg. for a map/reduce operation). MongoDB 2.2 also introduces some very useful features and improvements as noted below.
In general, use of a replica set configuration will be extremely helpful for administrative purposes as this enables a full copy of your data set to be available without disrupting your primary MongoDB server. For larger data sets and horizontal write scaling, MongoDB's sharding feature can also be used in conjunction with any of the suggestions below.
Before going down the path of separating your BI data, it would be worth determining what the actual impact is by testing in a staging environment.
The following approaches are roughly in order of isolation from your production environment:
With replica sets you can use read preferences to direct queries to appropriate servers. In versions of MongoDB prior to 2.2 the general read preferences were limited to reading from a primary or allowing reads from non-hidden secondaries with "slaveOK" (equivalent to "secondaryPreferred"). In MongoDB 2.2 there are some additional read preferences including "secondary" (read from secondary if available, otherwise error); "primary preferred" (read from primary if available .. otherwise a secondary); and "nearest" (read from nearest primary or secondary node based on network latency). The read preferences in MongoDB 2.2 can be used in conjunction with tag sets to provide even more granular control over directing queries to servers in a replica set or sharded cluster.
For MongoDB 1.8 and higher, you can use replica sets with a hidden secondary node. A hidden node will not be advertised to clients connecting to the replica set normally, but can be connected to directly for report generation. Note: the hidden node will be a read-only secondary, so this limits the use of some queries. For example, map/reduce requires write access to save to an output collection .. but you could use an inline map/reduce depending on your BI requirements.
MongoDB 2.2 has a database-level write lock (an improvement from prior versions that had a global write lock). If you need to write BI data, you can save this into a separate database to minimize lock contention. You still have to consider the overall resource effect .. for example, processing a significant number of older documents for BI purposes might compete with the caching of latest documents that are being queried by your production application.
If you want to completely separate BI data from the production environment, you could create a separate instance using one of the MongoDB backup strategies. If you have replication enabled, you can create a backup from a secondary in your replica set. Depending on the size of your data set, it will likely be faster to do a snapshot copy of the data (which includes indexes that are already built) rather than a full mongodump/mongorestore cycle.
Use a replica set and run the analysis on a secondary node (as long as only reads are involved).