Why do we need an 'arbiter' in MongoDB replication? - mongodb

Assume we setup a MongoDB replication without arbiter, If the primary is unavailable, the replica set will elect a secondary to be primary. So I think it's kind of implicit arbiter, since the replica will elect a primary automatically.
So I am wondering why do we need a dedicated arbiter node? Thanks!

I created a spreadsheet to better illustrate the effect of Arbiter nodes in a Replica Set.
It basically comes down to these points:
With an RS of 2 data nodes, losing 1 server brings you below your voting minimum (which is "greater than N/2"). An arbiter solves this.
With an RS of even numbered data nodes, adding an Arbiter increases your fault tolerance by 1 without making it possible to have 2 voting clusters due to a split.
With an RS of odd numbered data nodes, adding an Arbiter would allow a split to create 2 isolated clusters with "greater than N/2" votes and therefore a split brain scenario.
Elections are explained [in poor] detail here. In that document it states that an RS can have 50 members (even number) and 7 voting members. I emphasize "states" because it does not explain how it works. To me it seems that if you have a split happen with 4 members (all voting) on one side and 46 members (3 voting) on the other, you'd rather have the 46 elect a primary and the 4 to be a read-only cluster. But, that's exactly what "limited voting" prevents. In that situation you will actually have a 4 member cluster with a primary and a 46 member cluster that is read only. Explaining how that makes sense is out of the scope of this question and beyond my knowledge.

Its necessary to have a arbiter in a replication for the below reasons:
Replication is more reliable if it has odd number of replica sets. Incase if there is even number of replica sets its better to add a arbiter in the replication.
Arbiters do not hold data in them and they are just to vote in election when there is any node failure.
Arbiter is a light weight process they do not consume much hardware resources.
Arbiters just exchange the user credentials data between the replica set which are encrypted.
Vote during elections,hearbeats and configureation data are not encrypted while communicating in between the replica sets.
It is better to run arbiter on a separate machine rather than along with any one of the replica set to retain high availability.
Hope this helps !!!

This really comes down to the CAP theorem whereby it is stated that if there are equal number of servers on either side of the partition the database cannot maintain CAP (Consistency, Availability, and Partition tolerance). An Arbiter is specifically designed to create an "imbalance" or majority on one side so that a primary can be elected in this case.
If you get an even number of nodes on either side MongoDB will not elect a primary and your set will not accept writes.
Edit
By either side I mean, for example, 2 on one side and 2 on the other. My English wasn't easy to understand there.
So really what I mean is both sides.
Edit
Wikipedia presents quite a good case for explaining CAP: http://en.wikipedia.org/wiki/CAP_theorem

Arbiters are an optional mechanism to allow voting to succeed when you have an even number of mongods deployed in a replicaset. Arbiters are light weight, meant to be deployed on a server that is NOT a dedicated mongo replica, i.e: the server's primary role is some other task, like a redis server. Since they're light they won't interfere (noticeably) with the system's resources.
From the docs :
An arbiter does not have a copy of data set and cannot become a
primary. Replica sets may have arbiters to add a vote in elections of
for primary. Arbiters allow replica sets to have an uneven number of
members, without the overhead of a member that replicates data.
http://docs.mongodb.org/manual/core/replica-set-arbiter/
http://docs.mongodb.org/manual/core/replica-set-elections/#replica-set-elections

Related

Spring Boot Mongo DB Replica Set not working as expected [duplicate]

I am investigating using MongoDB ReplicaSet for high availability.
But just discovered that in ReplicaSet with 3 nodes, if PRIMARY mongod is the only one left (that is 2 other mongod instances died or were shut down), then after several seconds it switches role to SECONDARY and accepts writes no more. That makes Replica Set worth less than single instance.
I know & understand about PRIMARY election, but the PRIMARY role is fixed to a server (by using priority set to ,say, 10) and (for example due to network problems) other servers become inaccessible, why the main server just gives up?!
Tested with 2.4.8 on Windows (mongodb-win32-x86_64-2008plus-2.4.8) and Linux (CentOS) and 2.0.x on Linux
BOUNTY STARTED:
If the replica set gives up when PRIMARY feels alone, what are alternative to ensure 100% availability? Or maybe there is special configuration needed for the case. The current implementation makes ReplicaSet fragile in case of network problems.
UPDATED:
Alas, I have not said before the scenario when #3 goes down (PRIMARY & SECONDARY are left)
and then after a while SECONDARY goes down. Then PRIMARY really just "gives up", because it is already known that #3 is unavailable for some time. This was actually tested in my test environment.
var rsconfig = {"_id":"rs4","members":[{"_id":0,"host":"localhost:27041","priority":10},{"_id":1,"host":"localhost:27042"},{"_id":2,"host":"localhost:27043","arbiterOnly":true}]}
printjson(rsconfig)
rs.initiate(rsconfig)
We initially thought to put SECONDARY and #3 (that is ARBITER) on the same server,
but because of question in title, we cannot use such configuration.
Thanks to Alan Spencer for first explaining the logic that MongoDB takes.
This is expected, since the majority of the members are down MongoDB does not assume the last remaining member is consistent.
When you have a majority of the members down there are a couple of options: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/
You say that when the primary is cut off from the other two nodes it should stay up, otherwise write availability is lost, but that's not necessarily the case. If the other two nodes are actually up and on the other side of the network partition, then they have elected a new primary (as two out of three are a majority) and it is that primary that is accepting new writes.
If the previous primary continued to accept writes, you would have potentially conflicting data which there is no mechanism to resolve. Since MongoDB replica set is a single primary architecture (as opposed to a multi-master system) the election mechanism assures that there cannot be two primaries at the same time.
From the point of view of two secondaries, network partition is the same as primary being unavailable, and from the primary's point of view, network partition is indistinguishable from "both other nodes are down". It steps down, because in case of network partition there may already be another primary on the other side of it, and it assures there cannot be two primaries by stepping down.
It is not the case that the "replica set" gives up when primary feels alone - the reason primary steps down when it feels alone is precisely to preserve the integrity of the replica set as a whole. It is not true that setting high priority score fixes a role to a node - a primary can only be elected via consensus among majority - all priority scores do is influence election when all other things are equal.
I highly recommend the excellent "call me maybe" series as reading to understand the challenges of write availability in a distributed system: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
Just to chime in on the answers. The behavior in this scenario is expected. MongoDB uses a leader election algorithm to elect the new leader. So if there is no majority you cannot elect a leader and hence no writes.
Your only option at the point where 2 nodes are down is to reconfigure your replica set as a 1 node replica set to make it writeable. You can do this using the rs.reconfig cmd with just one server. However please note that this should just be a temporary and emergency configuration. For the longer duration you should have an odd number of total nodes (3+) in your replica set configuration.
Try to use arbiters, most documents say to use just one, but in you case, you need to win the election.
From http://docs.mongodb.org/manual/core/replica-set-architectures/ :
Fault tolerance for a replica set is the number of members that can
become unavailable and still leave enough members in the set to elect
a primary. In other words, it is the difference between the number of
members in the set and the majority needed to elect a primary. Without
a primary, a replica set cannot accept write operations. Fault
tolerance is an effect of replica set size, but the relationship is
not direct.
More on elections: http://docs.mongodb.org/manual/core/replica-set-elections/
More on arbiters: http://docs.mongodb.org/manual/faq/replica-sets/#how-many-arbiters-do-replica-sets-need

Requires simple explanation on Arbiter's role in a given mongoDB replica set

I came across MongoDB official site explaining on having odd number of members replica set up. I also heard of the term Arbiter from the same site, which based on my understanding, it will not be elected as primary and it does participate on election (from https://docs.mongodb.com/manual/core/replica-set-arbiter/).
There is also a post related to Arbiter in Why do we need an 'arbiter' in MongoDB replication? which then relates to CAP theorem, which further gets things more complicated.
First of all, why do we need to make the number of members odd? Also, can someone explain to me what this Arbiter is and what is its role in a given replica set in simple layman English??
Thanks in advance.
In short: it is to stop the two normal nodes of the replica set getting into a split-brain situation if they lose contact with each other.
MongoDB replica sets are designed so that, if one or more members goes down or loses contact, the other members are able to keep going as long as between them they have a majority. The majority clause is important: without that, you might have a situation where the network is split in two, and the nodes on each side of the partition think that they're still carrying on the replica set, and end up with different sets of data.
So to avoid the split brain problem, the nodes of a replica set will not continue if they can't command an absolute majority. An example of this is if you have two nodes, in a replica set like this:
If they lose communication, the outcome is symmetrical:
Each one will reason the same way:
realise it has lost communication with the other
assess whether it is possible to keep the replica set going
realise that 1 node (out of 2) does not constitute a majority
revert to Secondary mode
The difference an Arbiter makes
If there is a third node, then even if the two main nodes lose contact with each other then there will still be one of them in contact with the arbiter. This allows the two main nodes to make different decisions, and keep the replica set going while avoiding the split-brain problem.
Consider the following example of a 3-node replica set:
Whichever way the network partition goes, one node will still be in contact with the arbiter; for example like this:
Node A will:
realise it can contact neither node B nor the arbiter
assess whether it is possible to keep the replica set going
realise that 1 node (out of 3) does not constitute a majority
revert to Secondary mode
Whereas node B is able to react differently:
realise it cannot contact node A, but still has contact with the arbiter
assess whether it is possible to keep the replica set going
realise that 2 nodes (out of 3) do constitute a majority
take over as Primary
This also illustrates how you should deploy an arbiter to get that benefit:
try to put the arbiter on a system independent of both the data-bearing nodes, to maximise the chance of it still being able to communicate with either throughout network problems
it doesn't need to store data, so you don't need high-spec hardware
Just 1 arbiter is enough to break the deadlock; you don't get any benefit from multiple arbiters
Take the example of a 2-member replica set: in the event of a network-partitioning, i.e., the 2 members lost touch of each other, who gets to become the primary? There will be a tie and a need for a tie-breaker. That would not be the case if we have a 3-member replica set: the group that contains two nodes will win and one of them will become primary. That is the basis of the requirement for an odd number of nodes in a replica set. As for an arbiter, it happens to be light weight so that I guess one can save money by having in place a smaller machine, since we do not expect it to hold any data, and that we just need it to be present to vote for primary.

Why is an arbiter needed for an election in a primary - secondary - arbiter MongoDB replica set?

Mongo docs list this three-member configuration: primary, secondary, arbiter, as the minimal architecture of a replica set.
Why would an arbiter be necessary there? If the primary fails, the secondary won't see the heartbeat, so it needs to become primary. In other words, why wouldn't a primary + secondary configuration be sufficient? This related question doesn't seem to address the issue, as it discusses larger numbers of nodes.
Suppose you have only two servers, one primary and one secondary.
If suddenly the secondary can not reach the primary server it could be that the primary is down (in that case the secondary should become primary) but it could be as well a network issue that isolated the secondary (this the secondary is the one that is in deed down).
however, if you have an arbiter and the secondary cannot reach the primary but it CAN reach the arbiter then the issue is with the primary so it must become the new primary. If it CANNOT reach the primary, nor the arbiter, then the secondary knows that the issue is that he is isolated/broken -poor secondary :(- so he must not become the primary
If you bring the Arbiter down to its core it is essentially a none-data holding member used for voting.
One case for an Arbiter is as I state in the linked question: Why do we need an 'arbiter' in MongoDB replication? to break the problems of CAP but that is not its true purpose since you could easily replace that Arbiter with a data holding node and have the same effect.
However, an Arbiter will have a few benefits:
Small footprint
No data
No need to synch
can instantly vote
can be put literally anywhere in your network, app server or even another secondary to boost that part of your network (this comes into partitions).
So an Arbiter is extremely useful, even on one side of a partition (i.e. you have no partitioning in your network).
Now to explain base setup. An Arbiter would NOT be required, you could factor it out for a data holding node, but 3 data holding nodes is not the minimum (that is the minimum you need to keep automatic failover), 2 data holding nodes and 1 Arbiter is actually the minimum.
Now to answer:
In other words, why wouldn't a primary + secondary configuration be sufficient?
Because if one of those goes down there is only 50% of the vote left (2-1 = 1) and 50% is not classed as a sufficient majority for MongoDB to actually vote in a member (judged by the total configured voteable members in your rs.config).
Also in this case MongoDB does not actually know if that last member is the last member. It needs other members to tell it otherwise.
So yes, this is why you need a third guy.

Advantage of Mongodb Aribter in mongo replication

I have the following set up in my production environement,
3 nodes, with 1 primary and 2 secondary...
I came to know about the arbiter, Since mongo db itself does the work of election within the replicaset. What is the need of arbiter in the mongo replication?
In which scenario, arbiter will be useful?
Regards,
Harry
The point of an Arbiter is to break the deadlock when an election needs to be held for a Primary. In such that there are a majority of nodes that can make the decision as to which node to elect.
In your current configuration you have an odd number of nodes, so the election process is simple when all nodes are up, and in a failover one of the other nodes will simply be elected.
If you have an even number of nodes in a replica set to begin with and Arbiter may be required in the case where you do not want to commit the same level of hardware to have say a five node* replica set. Here you could use an arbiter on a lower spec machine in order to avoid a deadlock in elections.
An arbiter is also useful if you want to give preference to certain nodes to be elected as the Primary.
Plenty of information in the documentation:
http://docs.mongodb.org/manual/core/replica-set-members/
An arbiter enables MongoDB to garnish majority holdings across partitions without the need for a data holding node.
This all stems from the CAP theorem: http://en.wikipedia.org/wiki/CAP_theorem and essentially it enables for one side to have a majority voting as such stops stale mate elections across paritions within your clusters.
Of course Arbiters can be used only on side too if you want say 4 data holding nodes and 1 that is not.
In addition to breaking ties during election, arbiters do not hold data. An arbiter is lightweight and can run alongside other processes. Since it does not participate in a replication/serving queries, it does not use much CPU/memory. Arbiters have minimal resource requirements and do not require dedicated hardware.

MongoDB ReplicaSet - PRIMARY role falls to SECONDARY when only PRIMARY is left

I am investigating using MongoDB ReplicaSet for high availability.
But just discovered that in ReplicaSet with 3 nodes, if PRIMARY mongod is the only one left (that is 2 other mongod instances died or were shut down), then after several seconds it switches role to SECONDARY and accepts writes no more. That makes Replica Set worth less than single instance.
I know & understand about PRIMARY election, but the PRIMARY role is fixed to a server (by using priority set to ,say, 10) and (for example due to network problems) other servers become inaccessible, why the main server just gives up?!
Tested with 2.4.8 on Windows (mongodb-win32-x86_64-2008plus-2.4.8) and Linux (CentOS) and 2.0.x on Linux
BOUNTY STARTED:
If the replica set gives up when PRIMARY feels alone, what are alternative to ensure 100% availability? Or maybe there is special configuration needed for the case. The current implementation makes ReplicaSet fragile in case of network problems.
UPDATED:
Alas, I have not said before the scenario when #3 goes down (PRIMARY & SECONDARY are left)
and then after a while SECONDARY goes down. Then PRIMARY really just "gives up", because it is already known that #3 is unavailable for some time. This was actually tested in my test environment.
var rsconfig = {"_id":"rs4","members":[{"_id":0,"host":"localhost:27041","priority":10},{"_id":1,"host":"localhost:27042"},{"_id":2,"host":"localhost:27043","arbiterOnly":true}]}
printjson(rsconfig)
rs.initiate(rsconfig)
We initially thought to put SECONDARY and #3 (that is ARBITER) on the same server,
but because of question in title, we cannot use such configuration.
Thanks to Alan Spencer for first explaining the logic that MongoDB takes.
This is expected, since the majority of the members are down MongoDB does not assume the last remaining member is consistent.
When you have a majority of the members down there are a couple of options: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/
You say that when the primary is cut off from the other two nodes it should stay up, otherwise write availability is lost, but that's not necessarily the case. If the other two nodes are actually up and on the other side of the network partition, then they have elected a new primary (as two out of three are a majority) and it is that primary that is accepting new writes.
If the previous primary continued to accept writes, you would have potentially conflicting data which there is no mechanism to resolve. Since MongoDB replica set is a single primary architecture (as opposed to a multi-master system) the election mechanism assures that there cannot be two primaries at the same time.
From the point of view of two secondaries, network partition is the same as primary being unavailable, and from the primary's point of view, network partition is indistinguishable from "both other nodes are down". It steps down, because in case of network partition there may already be another primary on the other side of it, and it assures there cannot be two primaries by stepping down.
It is not the case that the "replica set" gives up when primary feels alone - the reason primary steps down when it feels alone is precisely to preserve the integrity of the replica set as a whole. It is not true that setting high priority score fixes a role to a node - a primary can only be elected via consensus among majority - all priority scores do is influence election when all other things are equal.
I highly recommend the excellent "call me maybe" series as reading to understand the challenges of write availability in a distributed system: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
Just to chime in on the answers. The behavior in this scenario is expected. MongoDB uses a leader election algorithm to elect the new leader. So if there is no majority you cannot elect a leader and hence no writes.
Your only option at the point where 2 nodes are down is to reconfigure your replica set as a 1 node replica set to make it writeable. You can do this using the rs.reconfig cmd with just one server. However please note that this should just be a temporary and emergency configuration. For the longer duration you should have an odd number of total nodes (3+) in your replica set configuration.
Try to use arbiters, most documents say to use just one, but in you case, you need to win the election.
From http://docs.mongodb.org/manual/core/replica-set-architectures/ :
Fault tolerance for a replica set is the number of members that can
become unavailable and still leave enough members in the set to elect
a primary. In other words, it is the difference between the number of
members in the set and the majority needed to elect a primary. Without
a primary, a replica set cannot accept write operations. Fault
tolerance is an effect of replica set size, but the relationship is
not direct.
More on elections: http://docs.mongodb.org/manual/core/replica-set-elections/
More on arbiters: http://docs.mongodb.org/manual/faq/replica-sets/#how-many-arbiters-do-replica-sets-need