MongoDB Sharding and Replication - mongodb

I've already set up MongoDB sharding and now I need to set up replication for availability. How do I do this? I currently have:
2 mongos instances running in different datacenters
2 mongod config servers running in different datacenters
2 mongod shard servers running in different datacenters
all communication is over a private network set up by my provider that is available cross-datacenter
Do I just set up replication on each server (by assigning each a secondary)?

I would build the whole system in 3 DCs for redundancy.
Every data center would have three servers with services of:
1x mongoS at Server1
1x node of config server replica set at Server1
1x node of shard1 replica set at Server2
1x node of shard2 replica set at Server3
So, a total of 9 nodes (physical or virtual).
If we "lose" one DC, everything still works, because the two remaining DCs still hold a majority in all three replica sets.
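The majority argument above can be checked with a small sketch. This is only an illustration, not a MongoDB API: the function name `survives_dc_loss` and the data-center labels are made up for the example, and it assumes every listed member is a voting member.

```python
from collections import Counter

def survives_dc_loss(member_dcs):
    """Return True if a replica set keeps a voting majority after
    losing any single data center.

    member_dcs: one data-center label per voting member (hypothetical labels).
    """
    total = len(member_dcs)
    majority = total // 2 + 1  # strict majority of all voting members
    per_dc = Counter(member_dcs)
    # Worst case: we lose the DC that hosts the most members.
    return total - max(per_dc.values()) >= majority

# One member per DC across three DCs, as in the layout above.
print(survives_dc_loss(["dc1", "dc2", "dc3"]))  # True
# One member in each of two DCs, as in the original question.
print(survives_dc_loss(["dc1", "dc2"]))         # False
```

With only two DCs, no split of members survives the loss of the larger side, which is why a third location (even one holding only an arbiter) is needed.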

You need 3 servers in each replica set for redundancy. Either put the third one in one of the data centers or get a third data center.
The config replica set needs 3 servers.
Each of the shard replica sets needs 3 servers.
You can keep the 2 mongoses.

After reading through the suggestions from D. SM and JJussi (thanks by the way), I'll be implementing the following infrastructure:
3 mongos instances spaced across different datacenters
3 config servers spaced across different datacenters
2 shards, each with 2 storage servers spaced across different datacenters plus an arbiter (to cut down on costs for now)
Thanks once again for your input.

Related

Does MongoDB have a centralized way to get node status for sharded replica sets?

I have a mongodb cluster running 11 shards across 25 host machines. Each shard is based on a replica set spread across 3 instances (2 data + 1 arbiter).
Is there some easy centralized way I can get node status via mongos? I like the data output by sh.status(), but it doesn't tell me if any of the nodes are down.
I know that I can log into 11 different nodes and run rs.status() on each (if I know which ones are working), but it seems like it would be good to have some centralized way of getting status for both the shards and their underlying replica sets. Is there one?

mongodb shards, how many mongod for this case

In MongoDB, if you want to build a production system with two shards, each one a replica set with three nodes, how many mongod processes must you start?
Why is the answer 9?
Because you need 3 replicas per shard x the 2 shards + 3 config servers to run the sharded cluster = 9 mongods. The config servers, although also mongod processes, aren't data carrying nodes. You must have 3 config servers though, to guarantee redundancy among the config server nodes.
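The arithmetic in this answer can be written out as a short sketch. The function name `mongod_count` is made up for illustration; it counts only mongod processes, so mongos routers are not included.

```python
def mongod_count(shards, members_per_shard, config_servers=3):
    """Count mongod processes in a sharded cluster.

    Each shard is a replica set of `members_per_shard` mongods, and the
    config servers are mongod processes too. mongos routers are a
    different binary and are not counted here.
    """
    return shards * members_per_shard + config_servers

# Two shards, three replica set members each, plus three config servers.
print(mongod_count(shards=2, members_per_shard=3))  # 9
```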
http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/

MongoDB - Mongos instance saturated fast

I'm using MongoDB with 3 config servers, 1 mongos and 3 mongod instances. The server that runs the mongos instance saturates very quickly. Is this normal, or am I setting some properties wrong? To work around it I have to run more mongos instances and connect to them. Also, is there any relevant difference in performance between having 2 shards with one node each and 1 shard composed of two nodes?
Thanks for replying!

do we need 3*N instances on amazon ec2 to host N mongodb shards?

The question might seem ridiculous but it seems to me that a "yes" would be a little crazy.
MongoDB suggests having replica sets of 3 machines. So if the database fits on 1 computer, I need 3 machines, and if tomorrow I need to shard across 2 machines I will actually need 6, right?
Or is there something smarter that can be done and that comes for free with MongoDB? (With coding theory such as Hamming codes, the number of extra bits needed is not linear in the total number of bits.)
Please don't hesitate to ask me to reformulate if what I say is not clear
Thanks in advance for your answers,
Thomas
There is some really good documentation on the recommended cluster setup in terms of physical instance separation. At least two things should be considered separately. One is replication; for that, see this documentation: http://docs.mongodb.org/manual/core/replica-set-members/
This means you need at least two data nodes (for HA) in a replica set, and you can have one arbiter, which holds no data and only participates in elections, as described in the docs linked above. You need an odd number of set members because the primary has to be elected by a majority of the replica set.
The other aspect is sharding. Sharding needs an additional layer that maintains metadata, which is achieved through additional processes: the config servers and the mongos routers. For a sharded production cluster see: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/. In this setup the three config servers have to be on separate instances. Also, the two mongos processes cannot reside on the same instance.
So, for the minimal layout, the following have to be considered:
You must not colocate data nodes (the two data nodes in each shard have to be on separate instances)
The arbiter belonging to a specific shard's replica set has to be on a separate instance from that set's two data nodes
The three config servers should reside on separate instances from each other
The minimum of two mongos processes have to reside on separate instances from each other
Although data nodes cannot be colocated with each other, config servers and mongos processes can be on the same instances as the data nodes.
So, theoretically, one can lay out a sharded cluster with two shards on 4 instances without breaking any of the recommendations, like this:
Instance 1:
datanode replicaset 1, configserver 1, arbiter replicaset 2
Instance 2:
datanode replicaset 1, configserver 2, mongos 1
Instance 3:
datanode replicaset 2, configserver 3, arbiter replicaset 1
Instance 4:
datanode replicaset 2, mongos 2
Where replicaset 1 represents the first shard and replicaset 2 represents the second.
"datanode" is not general MongoDB terminology; I just use it to refer to the mongod processes that handle real data (the primaries and secondaries in a replica set).
Just as a side note, I would not do this. Just start micro instances for the config servers and keep the mongos processes on the application servers.
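As a sanity check, the four-instance layout listed in this answer can be encoded and tested against the placement rules it gives. This is only a sketch: the `kind:name` labels and the `check_layout` helper are invented for the example and are not part of any MongoDB tool.

```python
# The four-instance layout from the answer, with hypothetical "kind:name" labels.
layout = {
    "instance1": ["data:rs1", "config:1", "arbiter:rs2"],
    "instance2": ["data:rs1", "config:2", "mongos:1"],
    "instance3": ["data:rs2", "config:3", "arbiter:rs1"],
    "instance4": ["data:rs2", "mongos:2"],
}

def check_layout(layout):
    """Return a list of rule violations (empty if the layout is acceptable)."""
    problems = []
    for instance, procs in layout.items():
        kinds = [p.split(":")[0] for p in procs]
        # Data nodes, config servers and mongos routers must each be
        # on separate instances from others of the same kind.
        for kind in ("data", "config", "mongos"):
            if kinds.count(kind) > 1:
                problems.append(f"{instance}: more than one {kind} process")
        # An arbiter must not share an instance with a data node of its own set.
        data_sets = {p.split(":")[1] for p in procs if p.startswith("data:")}
        arbiter_sets = {p.split(":")[1] for p in procs if p.startswith("arbiter:")}
        for rs in sorted(data_sets & arbiter_sets):
            problems.append(f"{instance}: arbiter and data node of {rs} together")
    return problems

print(check_layout(layout))  # []
```

Moving, say, the arbiter of replicaset 1 onto Instance 1 would make the check report a violation, since it would then sit next to one of that set's data nodes.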

is this the optimal minimum setup for mongodb to allow for sharding/scaling?

3 instances for config servers
1 instance for webserver & mongos
1 instance for shard 1
Then, when I need to start more shards, can I just add more instances?
Also, what is a replica set? If I had, say, 3 servers for shard 1, is that a replica set?
A Replica Set is a set of computers that are clones of each other. (i.e.: replicas) Within a given set there is an elected master. By default reads and writes go to this elected master and the replicas just "tail" the changes to be up-to-date copies. If the master fails, a new one is elected and the system just keeps going. The documentation is here.
So you ask about scaling with MongoDB. There are two types of scaling:
Read Scaling: use Replica Sets (see here)
Write Scaling: use Sharding
The minimum config for Replica Sets is
- 2 full replicas
- 1 arbiter (lightweight process, breaks ties when voting)
The minimum config for Sharding is
- 1 config server
- 1 mongod process (only one shard)
- 1 or more mongos (generally on the app server)
However, you probably don't want to run like this in production. Running only a single DB means that you only have one source for the data, which can result in long downtimes or total data loss. This is generally solved by using replica sets.
Additionally, the config server is quite important. MongoDB supports 1 or 3 config servers. Most production deployments use 3. Note that config servers and arbiters are very lightweight and can live on other boxes or on Amazon micro instances.
Most production deployments with sharding also involve replica sets. In fact, they usually start as replica sets.
Then, when I need to start more shards, can I just add more instances?
From a sharding perspective it should be as easy as:
- start new shard server
- run the addShard command from a mongos
Note that when you add a shard, you will need to allow for time and resources as data migrates between shards and everything re-balances.
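The two steps above can be sketched from a client's point of view. The helper names, host names and the replica set name `rs2` below are all made up for illustration. The sketch only builds the command document; actually sending it assumes the third-party pymongo driver and a reachable mongos, so that part is left as a comment.

```python
def shard_seed(replica_set, hosts):
    """Build the 'replSetName/host1:port,host2:port' string used to
    identify a replica set shard."""
    return f"{replica_set}/{','.join(hosts)}"

def add_shard_command(replica_set, hosts):
    """Build the admin command document for adding a replica set shard."""
    return {"addShard": shard_seed(replica_set, hosts)}

# Hypothetical hosts for the new shard's replica set members.
cmd = add_shard_command(
    "rs2",
    ["shard2-a.example.net:27018", "shard2-b.example.net:27018"],
)
print(cmd)

# Sending it (assumption: pymongo is installed and a mongos listens here):
#   from pymongo import MongoClient
#   MongoClient("mongodb://mongos.example.net:27017").admin.command(cmd)
```

The command has to be run against a mongos, not against the new shard itself, and the balancer then starts migrating data to the new shard in the background.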