Mongo sharding confusion with replication turned on - mongodb

Let's say I have 3 machines. Call them A, B, C. I have MongoDB replication turned on. Assume A is the primary, and B and C are secondaries.
Now I would like to shard the database. But I believe all writes still go to A, since A is the primary. Is that true? Can somebody explain the need for sharding and how it increases write throughput?
I understand that mongos will find the right shard, but the writes still have to be done on the primary, and the other machines B and C will just follow. Am I missing something?
I also don't understand the statement: "Each shard is a replica set." The shard seems incomplete unless it is also the primary.

Sharding is completely different to replication.
I am now going to attempt to explain your confusion without making you more confused.
When sharding is taken into consideration, a replica set is a replicated range of the sharded data.
As such, this means that every shard in the cluster is actually a replica set in itself, holding a range of the sharded data (it can hold different chunks of ranges, but that's another topic) and replicating that data across the members of its own self-contained replica set.
Imagine that each shard is a primary of its own replica set, with its own set of members and so on. This is what is meant by each shard being a replica set.
As such, each shard has its own self-contained primary, and at the same time the primary of each shard's replica set can receive writes from a mongos whenever the range that the replica set holds matches the shard key target being sent down by the client.
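To make that concrete, here is a minimal sketch of how each replica set would be registered as a shard from a mongos (the replica set names and hostnames below are hypothetical):
// run from a mongo shell connected to a mongos; names and hosts are made up for illustration
sh.addShard("rs1/host1.example.net:27018,host2.example.net:27018,host3.example.net:27018")
sh.addShard("rs2/host4.example.net:27018,host5.example.net:27018,host6.example.net:27018")
sh.addShard("rs3/host7.example.net:27018,host8.example.net:27018,host9.example.net:27018")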
I hope that makes sense.
Edit
This post might help: http://www.kchodorow.com/blog/2010/08/09/sharding-and-replica-sets-illustrated/
Does that apply here? Say I want writes acknowledged by 2 mongod instances. Is that possible? I still have trouble understanding. Can you please explain with an example?
OK, let's take an example. You have 3 shards which are in fact 3 replica sets, called rs1, rs2 and rs3, each consisting of 3 nodes (1 primary and 2 secondaries). If you want a write to be acknowledged by 2 of the members of the replica set, then you can do that as you normally would:
db.users.insert({ username: 'sammaye' }, { writeConcern: { w: 2 } })
Where username is the shard key.
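For context, a brief sketch of how the collection would have been sharded on username in the first place (the database name mydb is hypothetical):
sh.enableSharding("mydb")
sh.shardCollection("mydb.users", { username: 1 })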
The mongos this query is sent to will seek out the correct shard, which is in fact a replica set, connect to that replica set, and then perform the insert as normal.
So as an example, rs2 actually holds the range of all usernames starting with the letters m-s. The mongos will use its internal mapping of rs2 (obtained when connecting to the replica set) along with your write/read concern to judge which members to write to and read from.
Tags will apply all the same too.
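If you want to see which shard (replica set) ended up with which portion of the data, the shell helper below, run through a mongos against the hypothetical mydb.users collection from above, prints how the documents and chunks are spread across the shards:
db.users.getShardDistribution()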
If mongos finds a shard whose node is not the "primary" (as in replication), is the write still performed on a secondary?
You are still confusing yourself here. There is no cluster-wide "primary" like in replication; there is a primary shard (sometimes called the master) for a sharded database, but that is a different thing entirely. The mongos does not have to write to that shard in any way; it is free to write to any part of the sharded cluster that your queries target. Within each shard, the write still goes to that shard's own replica set primary, never to a secondary.

Related

MongoDB: make the primary shard (not to be confused with the primary of a replica set) hold only the unsharded collections

I have 3 computers (A, B, C).
On all computers there will be a replica set for the primary shard (to have redundancy for the unsharded collections).
On computers B and C there will be a single-member shard replica set each.
How do I tell MongoDB not to hold any sharded collection on the primary shard's replica set?
Sharding is used to horizontally scale (aka scale out). If you shard but then put multiple shards on the same computer, this defeats the purpose of sharding in the first place. Simply use a 3 node replica set with your 3 computers.
With that said, see https://docs.mongodb.com/manual/tutorial/sharding-segmenting-data-by-location/ for how to route your data to specific nodes in a sharded cluster.
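As a rough sketch of that approach (the shard, zone and collection names below are hypothetical, and it assumes the collection is range-sharded on userId), you can put only the shards on B and C into a zone and pin the whole shard key range to that zone, so the balancer keeps the sharded data off the primary shard on A:
// hypothetical names; assumes mydb.events is sharded on { userId: 1 }
sh.addShardToZone("shardB", "data")
sh.addShardToZone("shardC", "data")
sh.updateZoneKeyRange("mydb.events", { userId: MinKey }, { userId: MaxKey }, "data")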

mongodb write concern for sharded cluster

From the mongodb docs:
mongos uses "majority" for the write concern of the shardCollection command and its helper sh.shardCollection().
In replica sets, majority is the number of nodes divided by 2, rounded up.
Is it the majority of the config servers or what exactly?
I guess the docs don't specify very precisely what happens. If you dig one more page down, to Write Concern, you'll read that "In sharded clusters, mongos instances will pass the write concern on to the shards."
Assuming each shard is also a replica set, e.g. P-S-S (primary-secondary-secondary), it should follow the same behavior as if you only had a single unsharded replica set: "For this replica set, calculated majority is two, and the write must propagate to the primary and one secondary to acknowledge the write concern to the client." The client here is technically the mongos.
The other shards do not need to acknowledge the write if they are not being written to. If you are writing to multiple shards, then I'd assume each of those shards needs to ack the write similarly.
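As a small illustration (the collection name is hypothetical), a write sent to a mongos with a majority write concern is acknowledged by a majority of the targeted shard's replica set, not by every shard in the cluster:
db.orders.insert({ item: "abc", qty: 1 }, { writeConcern: { w: "majority", wtimeout: 5000 } })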

How to understand "The shards are replica sets."

When I put sharding and replica sets together, I get confused.
Why does the reference say that the shards are replica sets?
Do replica sets contain shards?
Can someone give me a conceptual explanation?
A replica set is a cluster of MongoDB servers which implements a master-slave setup. So, basically, the same data is shared between multiple replicas, i.e. the master and the slave(s). The master is also termed the primary node, and the slave(s) are considered secondary nodes.
It replicates your data on multiple mongo instances to solve/avoid failovers. MongoDB also automatically performs an election of a new primary node from among the secondary nodes whenever the primary node goes down.
Sharding is used to store a large data set across multiple machines. So basically, if you simply want to compare the two: sharded nodes don't (or may not) contain the same data, whereas replicated nodes contain the same data.
Sharding has a different purpose: a large data set is spread across multiple machines.
Now, a subset of this large data set can also be replicated to multiple nodes as a primary and secondaries to overcome failovers. So basically a shard can itself be a replica set, and that replica set contains a subset of the large data set.
So, multiple shards together make up the whole large data set, which is separated into chunks. These chunks are replicated within a shard using its replica set.
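If you want to see this laid out for a real cluster, running the command below from a mongos prints the shards (each one a replica set), the sharded databases and collections, and the chunk ranges each shard currently holds:
sh.status()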
You can also get more details related to this in the MongoDB manual.
Sharding happens one level above replication.
When you use both sharding and replication, your cluster consists of many replica-sets and one replica-set consists of many mongod instances.
However, it is also possible to create a cluster of stand-alone mongod instances which are not replicated or have only some shards implemented as replica-sets and some shards implemented as stand-alone mongod instances.
Each shard is a replica set, not "the shards are replica sets".
This is a language nuance: in English, to say such a thing really means the same as "each shard is a replica set" in this context.
So to explain, say you have a collection of names a-z. Shard 1 holds a-b. This shard is also a replica set, which means it has automated failover and replication of that range as well. So sharding, in this sense, is a top-level term that sits above replica sets.
Shards are used to break up a collection and store parts of it in different places. It is not necessary that a shard be a replica set; it can be a single server. But to achieve reliability and avoid loss of data, a replica set can be used as a shard instead of a single server. That way, if one of the servers in the replica set goes down, the others still hold the data.

Why do we need an 'arbiter' in MongoDB replication?

Assume we set up MongoDB replication without an arbiter. If the primary is unavailable, the replica set will elect a secondary to be primary. So I think there is already a kind of implicit arbiter, since the replica set will elect a primary automatically.
So I am wondering why we need a dedicated arbiter node. Thanks!
I created a spreadsheet to better illustrate the effect of Arbiter nodes in a Replica Set.
It basically comes down to these points:
With an RS of 2 data nodes, losing 1 server brings you below your voting minimum (which is "greater than N/2"). An arbiter solves this.
With an RS of even numbered data nodes, adding an Arbiter increases your fault tolerance by 1 without making it possible to have 2 voting clusters due to a split.
With an RS of odd numbered data nodes, adding an Arbiter would allow a split to create 2 isolated clusters with "greater than N/2" votes and therefore a split brain scenario.
Elections are explained [in poor] detail here. In that document it states that an RS can have 50 members (even number) and 7 voting members. I emphasize "states" because it does not explain how it works. To me it seems that if you have a split happen with 4 members (all voting) on one side and 46 members (3 voting) on the other, you'd rather have the 46 elect a primary and the 4 to be a read-only cluster. But, that's exactly what "limited voting" prevents. In that situation you will actually have a 4 member cluster with a primary and a 46 member cluster that is read only. Explaining how that makes sense is out of the scope of this question and beyond my knowledge.
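As a concrete sketch of the two-data-node case above (hostnames are hypothetical), you would initiate the set with the two data-bearing members and then add the arbiter as a third voter:
// run from a mongo shell connected to one of the data-bearing members
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "host1.example.net:27017" },
    { _id: 1, host: "host2.example.net:27017" }
  ]
})
// the arbiter votes in elections but holds no data
rs.addArb("host3.example.net:27017")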
It's necessary to have an arbiter in a replica set for the reasons below:
Replication is more reliable if the replica set has an odd number of members. In case there is an even number of members, it's better to add an arbiter to the replica set.
Arbiters do not hold data; they are only there to vote in elections when there is a node failure.
An arbiter is a lightweight process and does not consume many hardware resources.
Arbiters only exchange user credential data with the rest of the replica set, and that traffic is encrypted.
Votes during elections, heartbeats and configuration data are not encrypted while being communicated between the members of the replica set.
It is better to run the arbiter on a separate machine, rather than alongside one of the replica set members, to retain high availability.
Hope this helps!
This really comes down to the CAP theorem, which states that if there is an equal number of servers on either side of the partition, the database cannot maintain CAP (Consistency, Availability, and Partition tolerance). An arbiter is specifically designed to create an "imbalance", or majority, on one side so that a primary can still be elected in this case.
If you get an even number of nodes on either side MongoDB will not elect a primary and your set will not accept writes.
Edit
By either side I mean, for example, 2 on one side and 2 on the other. My English wasn't easy to understand there.
So really what I mean is both sides.
Edit
Wikipedia presents quite a good case for explaining CAP: http://en.wikipedia.org/wiki/CAP_theorem
Arbiters are an optional mechanism to allow voting to succeed when you have an even number of mongods deployed in a replica set. Arbiters are lightweight, meant to be deployed on a server that is NOT a dedicated mongo replica, i.e. a server whose primary role is some other task, like a redis server. Since they're light, they won't interfere (noticeably) with the system's resources.
From the docs:
An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. Arbiters allow replica sets to have an uneven number of members, without the overhead of a member that replicates data.
http://docs.mongodb.org/manual/core/replica-set-arbiter/
http://docs.mongodb.org/manual/core/replica-set-elections/#replica-set-elections

How does MongoDB do both sharding and replication at the same time?

For scaling/failover, MongoDB uses a "replica set" where there is a primary and one or more secondary servers. The primary is used for writes; secondaries are used for reads. This is pretty much the master-slave pattern used in SQL programming.
If the primary goes down a secondary in the cluster of secondaries takes its place.
So the issue of horizontal scaling and failover is taken care of. However, this does not seem to be a solution that allows for sharding. A true shard holds only a portion of the entire data, so if a secondary in a replica set is a shard, how can it qualify as primary when it doesn't have all of the data needed to service the requests?
Wouldn't we have to have a replica set for each one of the shards?
This is obviously a beginner question, so a link that visually or otherwise illustrates how this is done would be helpful.
Your assumption is correct: each shard contains a separate replica set. When a write request comes in, the mongos finds the right shard for it based on the shard key, and the data is written to the primary of the replica set contained in that shard. This results in write scaling, as a (well chosen) shard key should distribute writes over all your shards.
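For example (names hypothetical), a hashed shard key is one common way to get writes spread evenly across shards:
sh.shardCollection("mydb.clicks", { userId: "hashed" })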
A shard is the sum of a primary and secondaries (replica set), so yes, you would have to have a replica set in each shard.
The shard's portion of the data is held on its primary and replicated to the secondaries to maintain consistency. If the primary goes down, a secondary is elected to be the new primary; it has the same data as its predecessor, so it can begin serving immediately. That means the sharded data is still present and not lost.
You would typically map individual shards to separate replica sets.
See http://docs.mongodb.org/manual/core/sharded-clusters/ for an overview of MongoDB sharding.