I'm using MongoDB with 3 config servers, 1 mongos, and 3 mongod instances. I see that the server running the mongos instance saturates very quickly. Is this normal, or am I setting some properties wrong? To solve it, do I have to run more mongos instances and connect to them? Also, is there any relevant difference in performance between having 2 shards with one node each and 1 shard composed of two nodes?
Thanks for replying!
I've already set up MongoDB sharding and now I need to set up replication for availability. How do I do this? I've currently got this:
2 mongos instances running in different datacenters
2 mongod config servers running in different datacenters
2 mongod shard servers running in different datacenters
all communication is over a private network setup by my provider that is available cross-datacenter
Do I just set up replication on each server (by assigning each a secondary)?
I would build the whole system across 3 DCs, for redundancy.
Every data center would have three servers running these services:
1x mongoS at Server1
1x node of config server replica set at Server1
1x node of shard1 replica set at Server2
1x node of shard2 replica set at Server3
So, a total of 9 nodes (physical or virtual).
If we "lose" one DC, everything works still, because we have a majority in all three replica sets.
You need 3 servers in each replica set for redundancy. Either put the third one in one of the data centers or get a third data center.
The config replica set needs 3 servers.
Each of the shard replica sets needs 3 servers.
You can keep the 2 mongoses.
After reading through the suggestions from D. SM and JJussi (thanks by the way), I'll be implementing the following infrastructure:
3 mongos instances spaced across different datacenters
3 config servers spaced across different datacenters
2 shards, each with 2 storage servers spaced across different datacenters, plus an arbiter each (to cut down on costs for now)
Thanks once again for your input.
In MongoDB, if you want to build a production system with two shards, each one a replica set with three nodes, how many mongod processes must you start?
Why is the answer 9?
Because you need 3 replicas per shard × 2 shards, plus 3 config servers to run the sharded cluster: 3 × 2 + 3 = 9 mongods. The config servers, although also mongod processes, aren't data-carrying nodes. You must have 3 config servers, though, to guarantee redundancy among the config server nodes.
http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
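To make the count concrete, the nine mongod processes could be started along these lines (replica set names, ports, and dbpaths are illustrative; each member would normally run on its own host):
# shard 1: a three-member replica set (repeat on three hosts)
mongod --shardsvr --replSet shard1 --port 27018 --dbpath /data/shard1
# shard 2: another three-member replica set (repeat on three hosts)
mongod --shardsvr --replSet shard2 --port 27018 --dbpath /data/shard2
# config servers: three metadata-only mongods (repeat on three hosts)
mongod --configsvr --port 27019 --dbpath /data/config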
I'm testing MongoDB starting with one node, then 2 datanodes (1 master, 1 config-server and 2 masters), then 4 datanodes (1 master, 1 config-server, 4 masters), and with 16 datanodes (same configuration as before with 16 masters). I noticed the throughput is the same independently of the number of nodes: with 30 threads (using YCSB) I obtained about 6000 ops/sec with 2, 4 or 16 nodes!
Is this normal? Or are there some parameters to set?
Thanks for your replies!
You may have an unbalanced shard distribution based on the values of the keys. This shard key likely isn't optimal with YCSB:
shard key: { "_id" : 1 }
You may instead want to try hashed sharding, like:
shard key: { "_id" : "hashed" }
One way you can tell is to run mongostat on each shard and see if you are getting a nice spread of OPS. If not, then iterate on shard keys until you do. YCSB may need to be altered with this in mind.
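For instance, assuming YCSB's default database and collection names (ycsb.usertable; adjust if your workload differs), switching to a hashed shard key would look like this from the mongos shell (hashed shard keys require MongoDB 2.4+):
sh.enableSharding("ycsb")
sh.shardCollection("ycsb.usertable", { "_id": "hashed" })  // chunks are pre-split across shards automatically for hashed keys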
I was able to add more mongos instances to the network. So now I have 4 mongod and 4 mongos instances. I connect to each mongos with a different client, but it seems that one mongos works really hard, the second one works but not as hard, and the other two don't work at all (CPU utilization is around 10%). My question is: how do mongos instances work? I mean, does the mongos cluster decide which mongos has to manage the load/connection? If I connect from a client to a mongos, could that mongos redirect the request to another mongos? Thanks!
The question might seem ridiculous but it seems to me that a "yes" would be a little crazy.
MongoDB suggests having replica sets of 3 machines. So if the database can fit on 1 computer, I need 3 machines, and if tomorrow I need to shard and need 2 machines, I will actually need 6, right?
Or is there something smarter that can be done and that comes for free with MongoDB? (With coding theory like Hamming codes, ..., the number of extra bits we need is not linear in the total number of bits.)
Please don't hesitate to ask me to reformulate if what I say is not clear
Thanks in advance for your answers,
Thomas
So there is some really good documentation on the recommended cluster setup in terms of physical instance separation. Two things (at least) should be considered separately. One is replication; for this, see this documentation: http://docs.mongodb.org/manual/core/replica-set-members/
This means you have to have at least two data nodes (for HA) in a replica set, and you can have one arbiter which holds no data and just participates in elections, as described in the docs linked above. You need an odd number of set members because the primary has to be elected by a majority inside the replica set.
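For example, an arbiter can be added to an existing replica set from the primary with a single command (hostname and port here are hypothetical):
rs.addArb("arbiter.example.net:27017")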
The other aspect is sharding. Sharding needs an additional metadata-maintaining layer, which is achieved through additional processes: the config servers and the mongos routers. For a sharded production cluster, see: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/. In this setup the three config servers have to be on separate instances. Also, the two mongos processes cannot reside on the same instance.
So, for the minimal arrangement, the following have to be considered:
You must not collocate data nodes (the two data nodes in each shard have to be on separate instances)
The arbiter node belonging to a specific shard's replica set has to be on a separate instance from the two data nodes
The three config servers should reside on instances separate from each other
The minimum of two mongos processes have to reside on separate nodes from each other
However, while data nodes cannot be collocated, config servers and mongos processes can be on the same instances as the data nodes.
So theoretically one can arrange a sharded cluster with two shards on 4 instances, without breaking any of the recommendations, like this:
Instance 1:
datanode replicaset 1, configserver 1, arbiter replicaset 2
Instance 2:
datanode replicaset 1, configserver 2, mongos 1
Instance 3:
datanode replicaset 2, configserver 3, arbiter replicaset 1
Instance 4:
datanode replicaset 2, mongos 2
Where replica set 1 represents the first shard and replica set 2 represents the second.
"Datanode" is not terminology used for MongoDB in general; I just use this name for the mongod processes which handle real data (the primaries and secondaries in a replica set).
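As a sketch, the three processes on Instance 1 of this layout might be started like this (ports and dbpaths are illustrative, and the config servers here run standalone, as in the pre-3.2 architecture the linked docs describe):
mongod --shardsvr --replSet rs1 --port 27018 --dbpath /data/rs1    # datanode of replica set 1
mongod --configsvr --port 27019 --dbpath /data/config              # config server 1
mongod --replSet rs2 --port 27020 --dbpath /data/arb2              # joined as arbiter of replica set 2 via rs.addArb()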
Just as a side note, I would not do this. Instead, start micro instances for the config servers and keep the mongos processes on the application servers.
I've been working with Mongo for a few weeks, building my environment in dev. I started with a single node, then moved to a sharded cluster, and now want to move to a replicated sharded cluster. From what I read, a replicated sharded cluster is the best of the best: scalability, durability, performance increase, etc.
I've read most of the (very good) tutorials in their help. It seems their lessons advise going from single node, to replica set, to sharded replica set, which, unfortunately, is the opposite of the way I did it. I can't seem to find anything on upgrading a sharded cluster to a replicated sharded cluster.
Here are 5 hosts that I have:
APPSERVER
CONFIGSERVER
SHARD1
SHARD2
SHARD3
I started each of the shard servers with:
mongod --shardsvr
Then I started the config server with:
mongod --configsvr
Then I started the mongos process on the APPSERVER with:
mongos --configdb CONFIGSERVER
Then in mongos, I added the shards, enabled sharding on my database, and defined a shardkey for a collection:
sh.addShard("SHARD1:27018"); // again for 2 and 3
sh.enableSharding("beacon");
sh.shardCollection("beacon.alpha2", {"ip":1});
I want each of the shards replicated on each of the other two (right?). Do I need to bring down the mongod processes on the shards and restart them with different command-line parameters? What commands do I need to issue in my mongos shell to get it to replicate? Should I export all my data, take everything down, restart and reimport? Again, I see a lot of tutorials on how to create a replica set, but I don't really see anything on how to build one given a sharded system to start with.
Thanks!
For each shard, you will need to restart the current member, and start both it and two new members (or 1 new member and an arbiter) with the --replSet command line option (an example restart command is shown after the shell commands below). You could add more members than that, but 3 is the smallest workable set. Then, from inside what will become the new primary (your current SHARD1, for example), you could do the following:
rs.add("newmember1:port")
rs.add("newmember2:port")
rs.initiate();
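The restart referred to above might look like this on SHARD1 (the replica set name "shard1", port, and dbpath are illustrative, not mandated; the two new members would be started with the same --replSet option):
mongod --shardsvr --replSet shard1 --port 27018 --dbpath /data/db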
You would then need to check and make sure that sh.status() has been updated to reflect the new members of the replica set. In 2.2 this has become slightly easier, as it should be automatic; for prior versions it was necessary to manually save the shard information in the config database, which is reflected in the documentation under sharded cluster. If it has been added automatically, you will see the replica set listed in the sh.status() output, similar to the following:
{ "_id" : "shard1", "host" : "shard1/SHARD1:port,newmember1:port,newmember2:port" }
If this does not happen automatically you will need to follow the procedure outlined in the documentation link above, namely from inside mongos:
db.getSiblingDB("config").shards.save({_id:"<name>", host:"<rsName>/member1,member2,..."})
Following the above example it would look like:
db.getSiblingDB("config").shards.save({_id:"SHARD1", host:"shard1/SHARD1:port,newmember1:port,newmember2:port"})
You would need to do this procedure for each shard, and you should do them one at a time so that you have all of SHARD1 as a replica set before moving on to SHARD2. You will also need to be aware that each replica set will become read-only while the initial election takes place, so at the very least you should schedule this in a downtime or maintenance window. Ideally test first in a staging environment.