MongoDB with same throughput - mongodb

I'm testing MongoDB starting with one node, then 2 data nodes (1 master, 1 config server and 2 masters), then 4 data nodes (1 master, 1 config server, 4 masters) and then 16 data nodes (same configuration as before with 16 masters). I noticed that the throughput is the same regardless of the number of nodes: with 30 threads (using YCSB) I get about 6000 ops/sec with 2, 4 or 16 nodes!
Is this normal? Or are there some parameters I need to set?
Thanks for your replies!

You may have an unbalanced shard configuration, depending on the values of the keys. This shard key is unlikely to be optimal with YCSB:
shard key: { "_id" : 1 }
You may instead want to try hash sharding like:
shard key: { "_id" : "hashed" }
One way to tell is to run mongostat on each shard and see whether the operations are spread evenly across them. If not, iterate on shard keys until they are. YCSB may need to be altered with this in mind.
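To see why a ranged key can leave most shards idle while a hashed key spreads the load, here is a small simulation. It is only a sketch: the four shards, the key space, the evenly split ranges and the use of MD5 as the hash are assumptions made for illustration, not MongoDB's actual chunk layout or hash function. The monotonically increasing keys are an assumed workload, not necessarily what YCSB generates.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4            # assumed cluster size, for illustration only
KEY_SPACE = 1_000_000     # assumed range of possible _id values


def ranged_shard(key: int) -> int:
    """Range sharding: each shard owns one contiguous quarter of the key space."""
    return min(key * NUM_SHARDS // KEY_SPACE, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Hashed sharding stand-in: hash the key, then map the hash to a shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def distribution(keys, placement) -> list:
    """Count how many keys land on each shard under the given placement function."""
    counts = Counter(placement(k) for k in keys)
    return [counts.get(shard, 0) for shard in range(NUM_SHARDS)]


# Assumed workload: 10,000 inserts with monotonically increasing keys.
recent_keys = range(990_000, 1_000_000)

ranged_counts = distribution(recent_keys, ranged_shard)
hashed_counts = distribution(recent_keys, hashed_shard)

print("ranged:", ranged_counts)   # every insert hits the last shard
print("hashed:", hashed_counts)   # roughly even across the four shards
```

If the per-shard numbers from mongostat look like the "ranged" line, adding shards will not add throughput, because one shard is still doing all the work.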

I was able to add more mongos instances on the network, so now I have 4 mongod and 4 mongos. I connect to each mongos with a different client, but it seems that one mongos works really hard, the second one works but not as hard, and the other two don't work at all (CPU utilization is around 10%). My question is: how does mongos work? Does the mongos cluster decide which mongos has to handle the load/connection? If I connect from a client to one mongos, can that mongos redirect the request to another mongos? Thanks!

Related

MongoDB Sharding and Replication

I've already set up MongoDB sharding and now I need to set up replication for availability. How do I do this? I've currently got this:
2 mongos instances running in different datacenters
2 mongod config servers running in different datacenters
2 mongod shard servers running in different datacenters
all communication is over a private network setup by my provider that is available cross-datacenter
Do I just set up replication on each server (by assigning each a secondary)?
I would build the whole system across 3 DCs, for redundancy.
Each data center would have three servers running these services:
1x mongoS at Server1
1x node of config server replica set at Server1
1x node of shard1 replica set at Server2
1x node of shard2 replica set at Server3
So, a total of 9 nodes (physical or virtual).
If we "lose" one DC, everything still works, because we keep a majority in all three replica sets.
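The majority argument can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one voting member of each replica set per data center as in the layout above. The function and variable names are made up for the example.

```python
def has_majority(voting_members: int, reachable_members: int) -> bool:
    """A primary can be elected only if a strict majority of voting members is reachable."""
    return reachable_members > voting_members // 2


# Assumed layout: each replica set (config, shard1, shard2) has one member per data center.
MEMBERS_PER_DC = {"DC1": 1, "DC2": 1, "DC3": 1}


def survives_loss_of(lost_dc: str) -> bool:
    """Check whether a replica set keeps a majority after one data center goes away."""
    total = sum(MEMBERS_PER_DC.values())
    remaining = total - MEMBERS_PER_DC[lost_dc]
    return has_majority(total, remaining)


for dc in MEMBERS_PER_DC:
    status = "still has a majority" if survives_loss_of(dc) else "no majority"
    print(dc, "lost ->", status)

# With only two members in two data centers, losing either one leaves 1 of 2,
# which is not a majority.
print(has_majority(2, 1))
```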
You need 3 servers in each replica set for redundancy. Either put the third one in one of the data centers or get a third data center.
The config replica set needs 3 servers.
Each of the shard replica sets needs 3 servers.
You can keep the 2 mongoses.
After reading through the suggestions from D. SM and JJussi (thanks by the way), I'll be implementing the following infrastructure:
3 mongos instances spread across different datacenters
3 config servers spread across different datacenters
2 shards, each with 2 storage servers spread across different datacenters plus an arbiter (to cut down on costs for now)
Thanks once again for your input.

Does MongoDB have a centralized way to get node status for sharded replica sets?

I have a mongodb cluster running 11 shards across 25 host machines. Each shard is based on a replica set spread across 3 instances (2 data + 1 arbiter).
Is there some easy centralized way I can get node status via mongos? I like the data output by sh.status(), but it doesn't tell me if any of the nodes are down.
I know that I can log into 11 different nodes and run rs.status() on each (if I know which ones are working), but it seems like it would be good to have some centralized way of getting status for both the shards and their underlying replica sets. Is there one?

mongodb shards, how many mongod for this case

In MongoDB:
If you want to build a production system with two shards, each one a replica set with three nodes, how many mongod processes must you start?
Why is the answer 9?
Because you need 3 replicas per shard x the 2 shards + 3 config servers to run the sharded cluster = 9 mongods. The config servers, although also mongod processes, aren't data carrying nodes. You must have 3 config servers though, to guarantee redundancy among the config server nodes.
http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
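The arithmetic in the answer can be written out as a small helper. This is just a sketch: the function name and parameters are invented for the example, and mongos routers are left out because they are mongos processes, not mongod processes.

```python
def mongod_count(shards: int, members_per_shard: int, config_servers: int = 3) -> int:
    """Total mongod processes: every replica set member plus every config server."""
    return shards * members_per_shard + config_servers


# Two shards, each a three-node replica set, plus three config servers.
print(mongod_count(shards=2, members_per_shard=3))  # 9
```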

MongoDB - Mongos instance saturated fast

I'm using MongoDB with 3 config servers, 1 mongos and 3 mongod. I see that the server running the mongos instance saturates very quickly. Is that normal, or am I setting some properties wrong? To solve it I have to run more mongos instances and connect to them. Also, is there any relevant difference in performance between having 2 shards with one node each and 1 shard composed of two nodes?
Thanks for replying!

do we need 3*N instances on amazon ec2 to host N mongodb shards?

The question might seem ridiculous but it seems to me that a "yes" would be a little crazy.
MongoDB recommends replica sets of 3 machines. So if the database fits on 1 computer, I need 3 machines, and if tomorrow I need to shard across 2 machines I will actually need 6, right?
Or is there something smarter that can be done and that comes for free with MongoDB? (With coding theory like Hamming codes, the number of extra bits needed is not linear in the total number of bits.)
Please don't hesitate to ask me to reformulate if what I say is not clear
Thanks in advance for your answers,
Thomas
There is some really good documentation on the recommended cluster setup in terms of physical instance separation. There are (at least) two things to consider separately. One is replication; for that, see this documentation: http://docs.mongodb.org/manual/core/replica-set-members/
This means you need at least two data nodes in a replica set (for HA), and you can add one arbiter, which holds no data and only participates in elections, as described in the docs linked above. You need an odd number of set members because the primary has to be elected by a majority of the replica set.
The other aspect is sharding. Sharding needs an additional layer for maintaining metadata, which is provided by additional processes: the config servers and the mongos routers. For a sharded production cluster see: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/. In this setup the three config servers have to be on separate instances. Also, the two mongos processes cannot reside on the same instance.
So, for a minimal layout, the following has to be considered:
You must not colocate data nodes (the two data nodes of each shard have to be on separate instances)
The arbiter belonging to a shard's replica set has to be on a separate instance from that shard's two data nodes
The three config servers should reside on separate instances from each other
The minimum of two mongos processes have to reside on separate instances from each other
Although data nodes cannot be colocated with each other, config servers and mongos processes can share instances with the data nodes.
So theoretically one can lay out a sharded cluster with two shards on 4 instances without breaking any of the recommendations, like this:
Instance 1:
datanode replicaset 1, configserver 1, arbiter replicaset 2
Instance 2:
datanode replicaset 1, configserver 2, mongos 1
Instance 3:
datanode replicaset 2, configserver 3, arbiter replicaset 1
Instance 4:
datanode replicaset 2, mongos 2
Where replicaset 1 represents the first shard and replicaset 2 represents the second.
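The placement rules listed above can be checked mechanically against this four-instance layout. The following is a rough sketch: the data structure, the role names and the violations() helper are all made up for illustration and are not part of any MongoDB tooling.

```python
# The four-instance layout from the answer, as (role, replica set or None) pairs.
LAYOUT = {
    "instance1": [("data", "rs1"), ("config", None), ("arbiter", "rs2")],
    "instance2": [("data", "rs1"), ("config", None), ("mongos", None)],
    "instance3": [("data", "rs2"), ("config", None), ("arbiter", "rs1")],
    "instance4": [("data", "rs2"), ("mongos", None)],
}


def violations(layout: dict) -> list:
    """Return human-readable rule violations (an empty list means the layout passes)."""
    problems = []
    for instance, processes in layout.items():
        roles = [role for role, _ in processes]
        # Rule 1: no two data nodes on one instance.
        if roles.count("data") > 1:
            problems.append(f"{instance}: more than one data node")
        # Rule 2: an arbiter must not share an instance with a data node of its own set.
        data_sets = {rs for role, rs in processes if role == "data"}
        for role, rs in processes:
            if role == "arbiter" and rs in data_sets:
                problems.append(f"{instance}: arbiter of {rs} next to its data node")
        # Rule 3: config servers on separate instances.
        if roles.count("config") > 1:
            problems.append(f"{instance}: more than one config server")
        # Rule 4: mongos processes on separate instances.
        if roles.count("mongos") > 1:
            problems.append(f"{instance}: more than one mongos")
    return problems


print(violations(LAYOUT))  # []
```

Moving, say, the arbiter of replica set 2 onto instance 4 would make the check report a violation, since it would then sit next to a data node of its own replica set.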
"Data node" is not general MongoDB terminology; I just use the name for those mongod processes that handle real data (the primaries and secondaries in a replica set).
As a side note, I would not do this. Just start micro instances for the config servers and keep the mongos processes on the application servers.