Is it possible to consolidate mongodb primaries to a single secondary? - mongodb

I want to achieve a configuration in which there are 3 primary mongodbs on 3 different servers and 1 mongodb running in the cloud. All the 3 primaries will have different collection names in which they write their data. What I want to achieve is that all the 3 primaries should sync their data to the one present in the cloud to their own collections. Is this possible? If yes, then can you point me to any resource for it? If no, then I think, I will have to have a seperate secondary in the cloud for each primary, right?
I did quite extensive searches and went through mongodb docs but couldn't find anything relevant.

One mongodb process can't belong to more than one replica set, so you'll have to have three processes for the secondaries, be it on the same cloud server or several.

Related

how to write data to multi master instance in mongodb 3.4

How can I write data to multi mongodb instances and keep data synchronous among these instances? Just like in mariaDB.
Currently we use the replica-set in mongodb, but this seems can only support writing data to one node, and this may cause pressure issue if writing requests going up.
Sharded Cluster is not appropriate for me.
Thanks.
Please read the docs (Replication in MongoDB)
Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes

Mongodb two nodes deployment

I know Mongodb suggest the replication set minimum is 3
Can I use two servers to install Mongodb with 4 replication sets in order to prevent Write failure when one node down?
My idea is to install two more instances on each server to fake/get rid of it:
Mongodb 1
Replcation set 1 (Master)
Replcation set 2 (Secondary)
Mongodb 2
Replcation set 3 (Secondary)
Replcation set 4 (Arbiter)
If one server down, it still has two replication sets for it.
Can anyone can comment my idea?
I would like to know is there any issue/risk needed to consider?
Thanks!
This is a bad idea. Your suggested configuration would be counter-productive for two main reasons:
Running two instances of MongoDB on the same hosts will lead to them both running slowly and awkwardly, competing for memory, cpu, disk i/o, etc.
If one of the hosts goes down, the two nodes on the other will not be able to continue the replica set because with only two nodes running out of four, you don't have the majority necessary to become primary.
MongoDB is intended to be run effectively on multiple hosts, so it is designed for that; even if you could find a way to shoe-horn a functioning replica set onto only two hosts, you would be working against MongoDB's design and struggling to make it work effectively.
Really, your best option is to run a single mongod instance on each of your two hosts, and to run an arbiter on a third (separate) host; that kind of normal 3-host configuration is an effective and straightforward way of achieving both data replication and high availability.
Please read https://docs.mongodb.com/manual/replication/
Even number of members in a set is not recommended. The only reason to add an Arbiter is to resolve this issue and increase number of members by 1 to make it odd. It makes no sense to add 2 Arbiters at all.
Similarly there is no point to put an Arbiter to the same server with any other member. It is quite lightweight, so you can co-locate it with an application server for example.
Having 2 db servers the best replica set configuration would be:
Mongodb 1
Master
Mongodb 2
Secondary
Any other server
Arbiter
Simple answer is, no you cannot do it..
Reason is "majority". If you loose one server of those two, you don't have majority of votes online.
This is reason, why 3 is minimum. It can be 3 data bearing nodes or 2 data bearing nodes and one arbiter. All of those have one vote and if you loose one of those, replica set still have 2/3 votes what is majority. Like 3/5 or 4/7.
Values 1/2 (or 2/4) are not majority.

MongoDB replication factors

I'm new to Mongo, have been using Cassandra for a while. I didn't find any clear answers from Mongo's docs. My questions below are all closely related.
1) How to specify the replication factors in Mongo?
In C* you can define replication factors (how many replicas) for the entire database and/or for each table. Mongo's newer replica set concept is similar to C*'s idea. But, I don't see how I can define the replication factors for each table/collection in Mongo. Where to define that in Mongo?
2) Replica sets and replication factors?
It seems you can define multiple replica sets in a Mongo cluster for your databases. A replica set can have up to 7 voting members. Does a 5-member replica set means the data is replicated 5 times? Or replica sets are only for voting the primary?
3) Replica sets for what collections?
The replica set configuration doc didn't mention anything about databases or collections. Is there a way to specify what collections a replica set is intended for?
4) Replica sets definition?
If I wanted to create a 15-node Mongo cluster and keeping 3 copies of each record, how to partition the nodes into multiple replica sets?
Thanks.
Mongo replication works by replicating the entire instance. It is not done at the individual database or collection level. All replicas contain a copy of the data except for arbiters. Arbiters do not hold any data and only participate in elections of a new primary. They are usually deployed to create enough of a majority that if an instance goes down a new instance can be elected as the primary.
Its pretty well explained here https://docs.mongodb.com/manual/replication/
Replication is referred to the process of ensuring that the same data is available on more than one Mongo DB Server. This is sometimes required for the purpose of increasing data availability.
Because if your main MongoDB Server goes down for any reason, there will be no access to the data. But if you had the data replicated to another server at regular intervals, you will be able to access the data from another server even if the primary server fails.
Another purpose of replication is the possibility of load balancing. If there are many users connecting to the system, instead of having everyone connect to one system, users can be connected to multiple servers so that there is an equal distribution of the load.
In MongoDB, multiple MongDB Servers are grouped in sets called Replica sets. The Replica set will have a primary server which will accept all the write operation from clients. All other instances added to the set after this will be called the secondary instances which can be used primarily for all read operations.

MongoDB load balancing in multiple AWS instances

We're using amazon web service for a business application which is using node.js server and mongodb as database. Currently the node.js server is runing on a EC2 medium instance. And we're keeping our mongodb database in a separate micro instance. Now we want to deploy replica set in our mongodb database, so that if the mongodb gets locked or unavailble, we still can run our database and get data from it.
So we're trying to keep each member of the replica set in separate instances, so that we can get data from the database even if the instance of the primary memeber shuts down.
Now, I want to add load balancer in the database, so that the database works fine even in huge traffic load at a time. In that case I can read balance the database by adding slaveOK config in the replicaSet. But it'll not load balance the database if there is huge traffic load for write operation in the database.
To solve this problem I got two options till now.
Option 1: I've to shard the database and keep each shard in separate instance. And under each shard there will be a reaplica set in the same instance. But there is a problem, as the shard divides the database in multiple parts, so each shard will not keep same data within it. So if one instance shuts down, we'll not be able to access the data from the shard within that instance.
To solve this problem I'm trying to divide the database in shards and each shard will have a replicaSet in separate instances. So even if one instance shuts down, we'll not face any problem. But if we've 2 shards and each shard has 3 members in the replicaSet then I need 6 aws instances. So I think it's not the optimal solution.
Option 2: We can create a master-master configuration in the mongodb, that means all the database will be primary and all will have read/write access, but I would also like them to auto-sync with each other every so often, so they all end up being clones of each other. And all these primary databases will be in separate instance. But I don't know whether mongodb supports this structure or not.
I've not got any mongodb doc/ blog for this situation. So, please suggest me what should be the best solution for this problem.
This won't be a complete answer by far, there is too many details and I could write an entire essay about this question as could many others however, since I don't have that kind of time to spare, I will add some commentary about what I see.
Now, I want to add load balancer in the database, so that the database works fine even in huge traffic load at a time.
Replica sets are not designed to work like that. If you wish to load balance you might in fact be looking for sharding which will allow you to do this.
Replication is for automatic failover.
In that case I can read balance the database by adding slaveOK config in the replicaSet.
Since, to stay up to date, your members will be getting just as many ops as the primary it seems like this might not help too much.
In reality instead of having one server with many connections queued you have many connections on many servers queueing for stale data since member consistency is eventual, not immediate unlike ACID technologies, however, that being said they are only eventually consistent by 32-odd ms which means they are not lagging enough to give decent throughput if the primary is loaded.
Since reads ARE concurrent you will get the same speed whether you are reading from the primary or secondary. I suppose you could delay a slave to create a pause of OPs but that would bring back massively stale data in return.
Not to mention that MongoDB is not multi-master as such you can only write to one node a time makes slaveOK not the most useful setting in the world any more and I have seen numerous times where 10gen themselves recommend you use sharding over this setting.
Option 2: We can create a master-master configuration in the mongodb,
This would require you own coding. At which point you may want to consider actually using a database that supports http://en.wikipedia.org/wiki/Multi-master_replication
This is since the speed you are looking for is most likely in fact in writes not reads as I discussed above.
Option 1: I've to shard the database and keep each shard in separate instance.
This is the recommended way but you have found the caveat with it. This is unfortunately something that remains unsolved that multi-master replication is supposed to solve, however, multi-master replication does add its own ship of plague rats to Europe itself and I would strongly recommend you do some serious research before you think as to whether MongoDB cannot currently service your needs.
You might be worrying about nothing really since the fsync queue is designed to deal with the IO bottleneck slowing down your writes as it would in SQL and reads are concurrent so if you plan your schema and working set right you should be able to get a massive amount of OPs.
There is in fact a linked question around here from a 10gen employee that is very good to read: https://stackoverflow.com/a/17459488/383478 and it shows just how much throughput MongoDB can achieve under load.
It will grow soon with the new document level locking that is already in dev branch.
Option 1 is the recommended way as pointed out by #Sammaye but you would not need 6 instances and can manage it with 4 instances.
Assuming you need below configuration.
2 shards (S1, S2)
1 copy for each shard (Replica set secondary) (RS1, RS2)
1 Arbiter for each shard (RA1, RA2)
You could then divide your server configuration like below.
Instance 1 : Runs : S1 (Primary Node)
Instance 2 : Runs : S2 (Primary Node)
Instance 3 : Runs : RS1 (Secondary Node S1) and RA2 (Arbiter Node S2)
Instance 4 : Runs : RS2 (Secondary Node S2) and RA1 (Arbiter Node S1)
You could run arbiter nodes along with your secondary nodes which would help you in election during fail-overs.

How to specify databases to be replicated in MongoDB replicasets?

I am very new to MongoDB.
I know that MongoDB can replicate the nodes.
Is there a way to choose specific databases to be replicated ?
No, in a replica set, every node contains the exact same data as the other nodes, so that at any point a secondary can step up and be the primary seamlessly. There is no way to pick and choose what databases are included, nor would there be any way to make that work with the intention of using them for redundancy.
In a MongoDB replica set, the entire node is replicated between all members of the replica set. In this way you don't pick a specific database to replicate, rather all databases on the primary node are replicated to all the secondary nodes.