How many config servers do I need to run a MongoDB sharded environment with 4 replica sets of 3 nodes each?
What is the relationship between config servers, shards and nodes?
You can run MongoDB with either one or three config servers. For a production deployment this should always be three, to allow for redundancy. Documentation for this is available here.
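As a sketch of the formula asked about (the helper name is mine, not a MongoDB API): total mongod processes = shards × nodes per shard + config servers. For the setup in the question:

```javascript
// Sketch: count the mongod processes in a sharded cluster.
// Every shard is a replica set of N nodes; the config servers are
// additional mongod processes on top of the data-carrying nodes.
function totalMongods(shards, nodesPerShard, configServers) {
  return shards * nodesPerShard + configServers;
}

// 4 shards x 3 nodes each + 3 config servers:
console.log(totalMongods(4, 3, 3)); // 15
```

So a 4-shard cluster of 3-node replica sets needs 15 mongod processes in total (plus one or more mongos routers, which are separate processes).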
I am new to MongoDB. After installing a Hortonworks HDP cluster, I embedded MongoDB with 3 nodes in the HDP cluster.
Now I am trying to set up sharding with MongoDB. I tried a few things and executed a few steps. When I connect with mongo, I see that these 3 servers show the prompts shard0:PRIMARY, shard1:SECONDARY> and shard1:SECONDARY>.
Q1. Does this mean I have sharding working?
Q2. If this is not right, how do I remove all the settings and get back to the initial state?
The answer is simple: you have managed to start a replica set, not a sharded cluster.
A cluster is a group of mongod nodes or replica sets across which sharded collections are distributed. Sharding is analogous to Oracle partitioning.
If you are building a cluster, it is better to do it with replica sets, so that every shard in the cluster is a replica set of (at least) three nodes.
We want to create a MongoDB sharded cluster (v2.4). The official documentation recommends having 3 config servers.
However, our company's policies won't allow us to get 3 extra servers for this purpose. Since we already have 3 application servers (1 web node, 2 process nodes), we are considering putting the config servers on the same application servers, alongside the mongos. Availability is not critical for us.
What do you think about this configuration? Could we run into problems, or is it discouraged for some reason?
Given that availability is not critical for your use case, it should be fine to place the config servers on the same servers as the applications and mongos.
If one of the process nodes goes down, you will lose 1 mongos, 1 application server and 1 config server. During this downtime the remaining two config servers will be read-only, which means there will be no shard balancing, no modifications to the cluster config, etc., although your other two mongos should still be operational for CRUD. If your web node is down, you have a bigger problem to deal with.
If two of the nodes are down (both process nodes, or the web server and one process node), again you would have a bigger problem to deal with: your applications are probably not going to work anyway.
Having said that, consider whether these nodes have the capacity (CPU, RAM, network connections, etc.) to handle a mongos, an application server and a config server at the same time.
I would recommend testing this deployment architecture in a development/staging cluster first, under your typical workload and use cases.
Also see Sharded Cluster High Availability for more info.
Lastly, I would recommend checking out MongoDB v3.2, which is the current stable release. The config servers in v3.2 are modelled as a replica set; see Sharded Cluster Config Servers for more info.
In MongoDB:
If you want to build a production system with two shards, each one a replica set with three nodes, how many mongod processes must you start?
Why is the answer 9?
Because you need 3 replicas per shard × 2 shards, plus 3 config servers to run the sharded cluster: 3 × 2 + 3 = 9 mongods. The config servers, although also mongod processes, are not data-carrying nodes. You must still have 3 config servers to guarantee redundancy among them.
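The arithmetic above can be written out as a small sketch (variable names are mine):

```javascript
// Count the mongod processes for the cluster described in the question.
const shards = 2;          // two shards
const nodesPerShard = 3;   // each shard is a 3-node replica set
const configServers = 3;   // required for a production deployment
const total = shards * nodesPerShard + configServers;
console.log(total); // 9
```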
http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
After reading the official documentation on the MongoDB sharding architecture, I have not found out why you need one or three config servers rather than some other number.
The MongoDB documentation on Config Servers says:
"If one or two config servers become unavailable, the cluster’s metadata becomes read only. You can still read and write data from the shards, but no chunk migrations or splits will occur until all three servers are available."
Hence the question: one server is a single point of failure, but with two servers we get the same behavior as with three, right?
So why exactly three servers, and not just two, for example?
Because the doc says also:
Config servers do not run as replica sets.
Config Server Protocols
MongoDB 3.0 and earlier only support a single type of config server deployment protocol which is referred to as the legacy SCCC (Sync Cluster Connection Configuration) as of MongoDB 3.2. An SCCC deployment has either 1 config server (development only) or 3 config servers (production).
MongoDB 3.2 deprecates the SCCC protocol and supports a new deployment type: Config Servers as Replica Sets (CSRS). A CSRS deployment has the same limits as a standard replica set: 1 config server (development only) or up to 50 servers (production) as of MongoDB 3.2. A minimum of 3 CSRS servers is recommended for high availability in a production deployment, but additional servers may be useful for geographically distributed deployments.
SCCC (Sync Cluster Connection Configuration)
With SCCC, the config servers are updated using a two-phase commit protocol which requires consensus from multiple servers for a transaction. You can use a single config server for testing/development purposes, but in production usage you should always have 3. A practical answer for why you cannot use only 2 (or more than 3) servers in MongoDB is that the MongoDB code base only supports 1 or 3 config servers for an SCCC configuration.
Three servers provide a stronger consistency guarantee than two, and allow for maintenance activity (for example, backups) on one config server while two remain available for your mongos to query. More than three servers would increase the time required to commit data across all of them.
The metadata for your sharded cluster needs to be identical across all config servers and is maintained by the MongoDB sharding implementation. It includes the essential details of which shards currently hold which ranges of documents (aka chunks). In an SCCC configuration, config servers are not a replica set, so if one or more config servers are offline the config data becomes read-only -- otherwise there would be no way for the data to propagate to the offline config servers once they are back online.
Clearly 1 config server provides no redundancy or backup. With 2 config servers, a potential failure scenario is that both servers are available but their data does not agree (for example, one of them has suffered data corruption) -- there is no way to tell which copy is correct. With 3 config servers you can improve on that scenario: if 2 of the 3 servers are consistent, you can identify the odd server out.
CSRS (Config Servers as Replica Sets)
MongoDB 3.2 deprecates the use of three mirrored mongod instances for config servers, and starting in 3.2 config servers are (by default) deployed as a replica set. Replica set config servers must use the WiredTiger 3.2+ storage engine (or another storage engine that supports the new readConcern read isolation semantics). CSRS also disallows some non-default replica set configuration options (e.g. arbiterOnly, buildIndexes, and slaveDelay) that are unsuitable for the sharded cluster metadata use case.
The CSRS deployment improves consistency and availability for config servers, since MongoDB can take advantage of the standard replica set read and write protocols for sharding config data. In addition, this allows a sharded cluster to have more than 3 config servers, since a replica set can have up to 50 members (as of MongoDB 3.2).
With a CSRS deployment, write availability depends on maintaining a quorum of members that can see the current primary of the replica set. For example, a 3-node replica set requires 2 available members to maintain a primary. Additional members can be added for improved fault tolerance, subject to the same election rules as a normal replica set. mongos uses a readConcern of majority to ensure that sharded cluster metadata is only read once it has been committed to a majority of replica set members, and a readPreference of nearest to route requests to the nearest config server.
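The quorum arithmetic can be sketched as follows (helper names are mine): a replica set of n voting members needs a strict majority to elect and sustain a primary, so it tolerates the loss of the remaining members.

```javascript
// Majority quorum for a replica set of n voting members, and how many
// member failures the set can tolerate while still keeping a primary.
function majority(n) {
  return Math.floor(n / 2) + 1;
}
function faultTolerance(n) {
  return n - majority(n);
}

console.log(majority(3), faultTolerance(3)); // 2 1
console.log(majority(5), faultTolerance(5)); // 3 2
```

Note that an even-sized set gains no fault tolerance over the next smaller odd size: majority(4) is 3, so a 4-member set still only tolerates one failure.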
- 3 instances for config servers
- 1 instance for the web server & mongos
- 1 instance for shard 1
Then, when I need to start more shards, can I just add more instances?
Also, what is a replica set? If I had, say, 3 servers for shard 1, would that be a replica set?
A replica set is a set of computers that are clones of each other (i.e. replicas). Within a given set there is an elected primary. By default reads and writes go to this elected primary, and the replicas just "tail" its changes to stay up-to-date copies. If the primary fails, a new one is elected and the system just keeps going. The documentation is here.
So you ask about scaling with MongoDB. There are two types of scaling:
Read Scaling: use Replica Sets (see here)
Write Scaling: use Sharding
The minimum config for Replica Sets is
- 2 full replicas
- 1 arbiter (lightweight process, breaks ties when voting)
The minimum config for Sharding is
- 1 config server
- 1 mongod process (only one shard)
- 1 or more mongos (generally on the app server)
However, you probably don't want to run like this in production. Running only a single DB means you have only one copy of the data, which can result in long downtime or total data loss. This is generally solved by using replica sets.
Additionally, the config server is quite important. MongoDB supports 1 or 3 config servers. Most production deployments use 3. Note that config servers and arbiters are very lightweight and can live on other boxes or on Amazon micro instances.
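A minimal sketch of starting that bare-bones topology (paths, ports and hostnames are placeholders; flags as in the MongoDB 3.0-era tooling):

```shell
# Config server (1 for development; 3 for production)
mongod --configsvr --dbpath /data/config --port 27019

# One shard (a single mongod; use a replica set in production)
mongod --dbpath /data/shard1 --port 27018

# mongos router, pointed at the config server(s)
mongos --configdb cfg1.example.net:27019
```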
Most production deployments with sharding also involve replica sets. In fact, they usually start as replica sets.
then when i need to start more shards i can just add more instances?
From a sharding perspective it should be as easy as:
- start the new shard server
- run the addShard command from a mongos
Note that when you add a shard, you will need to allow time and resources for data to migrate between shards and for everything to rebalance.
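The two steps above, sketched in the mongo shell (the replica-set name and hostname are placeholders):

```javascript
// Connect to a mongos, then register the new shard
sh.addShard("shard2rs/node4.example.net:27018")

// Inspect the cluster and watch chunks migrate as the balancer runs
sh.status()
```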