MongoDB Replica Set on Azure with an Arbiter

I want to use MongoDB with replication, so I created VMs for four members:
1 Primary
2 Secondaries
1 Arbiter
I'm trying to understand how this system works, so I have some questions:
1) The documentation says: "If a replica set has an even number of members, add an arbiter." So I added an arbiter, but I'm not sure I did it correctly. Does this even number refer to the secondaries only, or to all members in total?
2) What does the arbiter actually do? I don't really understand its job.
3) I created public IP addresses for each VM, in order to connect to them from outside. I successfully connected from my application, using this connection string:
    mongodb://username:password@vm0:27017,vm1:27017,vm2:27017/dbname?replicaSet=xxx&readPreference=primaryPreferred
I didn't add the arbiter to this connection string. Should I add it or not?
4) When I shut down the primary machine, one of the secondaries successfully became primary, as I expected. No problem there. But when I then shut down the new primary as well, my application throws an error and the remaining secondary does not become primary. Why is this happening?
5) If all the other VMs are working but I shut down the arbiter, my application again throws an error and I cannot connect to the db. I'm testing this because the arbiter machine could go down in the future due to maintenance or other problems.
Maybe this only seems wrong because I don't understand the role of the arbiter, but why isn't one of the secondaries converted into an arbiter? And why does the whole system stop working when I shut down just the arbiter?
Thanks.

1) The "even number" refers to all members in total, not just the secondaries. If you have 1 Primary and 2 Secondaries, you have 3 members in your replica set, which is already an odd number, so you should not be adding an arbiter.
2) An arbiter is a node which doesn't hold data and can't be elected as Primary itself. Its only job is to vote in elections, for example when the current Primary goes down and a new one must be chosen.
For example, say you have 1 Primary and 1 Secondary: the replica set has 2 members. If the Primary goes down, the replica set will attempt to elect a new Primary. To be elected, a node needs more than half of the votes. But if the Secondary votes for itself, it only gets 1 of 2 votes. That's not more than half, so it will not be elected, and the replica set will be left without a Primary and stop accepting writes.
To fix this, you can add an arbiter to the replica set. This is usually a much smaller machine, since it doesn't need to hold data. It has just one job: voting for the Secondary to become the new Primary when an election happens.
But, since you already have 3 data-bearing nodes, you won't want to add an arbiter. You can read more about arbiters here.
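For completeness, if you ever do need one, adding an arbiter from the mongo shell is a single helper call (the hostname here is just a placeholder):

    rs.addArb("arbiter.example.net:27017")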
3) You can add arbiters to connection strings but in general you won't need to. Adding the data-bearing nodes is just fine. That's what people usually do.
4) You had 4 members in the replica set and took down 2 of them. That leaves only 2 of 4 votes, exactly 50%, and a member needs more than 50% to be elected, so no Primary can be chosen (the vote math is tabulated below).
In general, testing two nodes going down is overkill. You probably want a 3 member replica set. Each member should be in a different availability zone (Availability Set in Azure). If two nodes go down your replica set will be unavailable. But two nodes going down at the same time is very unlikely if all nodes are in different availability zones. So don't worry too much about more than one node going down. If that's a real concern (in most applications it really isn't), you want to make a 5 member replica set.
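To make the vote math above concrete, here is the standard majority arithmetic; it follows directly from the "more than half" rule:

    Members | Majority needed | Failures tolerated
    --------+-----------------+-------------------
    2       | 2               | 0
    3       | 2               | 1
    4       | 3               | 1
    5       | 3               | 2

Note that 4 members tolerate no more failures than 3, which is why even-sized sets are discouraged.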
5) That's weird. This sounds like your replica set might be configured incorrectly. As I said, you don't need an arbiter anyway. So you could just try setting it up again without the arbiter and see if it works. Open a new question if you're still having issues. Make sure to include the output of running rs.status() in your question.
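As a quick sanity check before posting, you can print each member's reported state in the mongo shell; rs.status() is a standard helper and stateStr is one of its per-member fields:

    rs.status().members.forEach(function (m) {
        print(m.name + " -> " + m.stateStr);  // PRIMARY, SECONDARY, ARBITER, etc.
    });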

Related

Two nodes MongoDB replica set without arbiter

Is it possible to create a MongoDB replica set consisting of only 1 primary and 1 secondary member?
I would like to have a delayed replica set member that copies data from the primary with a delay of 24 hours. I know I can put an arbiter on one of the servers (primary or secondary; I know this is not advised, but my only wish is to run this configuration on two servers) and it would run fine, but I want to know if it is possible to leave the arbiter out completely.
It would look like this: one primary and one delayed secondary on two servers, with no arbiter at all.
Short answer: don't.
Long answer: automatic failover in MongoDB works by the replica set needing a qualified majority to successfully elect a new primary. Delayed members do have votes in elections. So if either of your two nodes fails, the replica set finds that it no longer has that majority, and the current primary steps down even though it didn't fail itself. What you are essentially doing is doubling the chances of your replica set failing. An arbiter is a very cheap process in terms of RAM, CPU, and even disk space when run with --smallfiles --nojournal --noprealloc (or the equivalent options set in the config file). Those options are safe to use here, since an arbiter essentially only checks the heartbeats of the data-bearing nodes. You could put the arbiter on the application server, for example.
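For illustration, starting such a minimal arbiter could look like this (the set name, port, and dbpath are placeholders; the flags are the ones mentioned above, which apply to the older MMAPv1-era mongod):

    mongod --replSet rs0 --port 30000 --dbpath /srv/mongodb/arb \
        --smallfiles --nojournal --noprealloc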
Disclaimer: the following procedure is strongly discouraged. Proceed at your own risk.
You could set the votes of the delayed server to 0 (sketched below). This way the undelayed node will call for an election in case the delayed member fails, conclude that it is the only online voting member of the replica set and that it holds the majority of votes (1/1), and continue to work as expected. This course of action needs some attention: if you later add a member to the replica set you will have an even number of votes again and will need to reconfigure. It also has serious implications in network partition scenarios. Again: use at your own risk.
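A minimal sketch of that reconfiguration in the mongo shell, assuming the delayed member sits at index 1 of the members array:

    cfg = rs.conf()
    cfg.members[1].votes = 0     // strip the delayed member's vote
    cfg.members[1].priority = 0  // a non-voting, delayed member must not be electable
    rs.reconfig(cfg)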
Yes, it is possible, but not recommended. The caveat of this approach is that there is no automatic failover.
If your primary goes down, you will have to manually promote the other server to primary.
If you are keeping your secondary only as a mirror of your primary and you are fine with manual failover, then it should work for you.
More info here:
http://openmymind.net/Does-My-Replica-Set-Need-An-Arbiter/
Yes, you can; all you really need to do is make the member ineligible to become primary.
There is documentation on how to make sure a member cannot be elected as primary here: http://docs.mongodb.org/manual/tutorial/configure-secondary-only-replica-set-member/
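That tutorial boils down to setting the member's priority to 0; a sketch, assuming the member in question is at index 1:

    cfg = rs.conf()
    cfg.members[1].priority = 0  // priority 0 makes the member ineligible for primary
    rs.reconfig(cfg)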
In this case, the best option is to add an arbiter. I tried the votes approach before, but on a 2-node replica set you can run into sync issues.

Why is an arbiter needed for an election in a primary - secondary - arbiter MongoDB replica set?

Mongo docs list this three-member configuration: primary, secondary, arbiter, as the minimal architecture of a replica set.
Why would an arbiter be necessary there? If the primary fails, the secondary won't see the heartbeat, so it needs to become primary. In other words, why wouldn't a primary + secondary configuration be sufficient? This related question doesn't seem to address the issue, as it discusses larger numbers of nodes.
Suppose you have only two servers, one primary and one secondary.
If the secondary suddenly cannot reach the primary, it could be that the primary is down (in which case the secondary should become primary), but it could just as well be a network issue that isolated the secondary (in which case the secondary is the one that is in fact down).
However, if you have an arbiter and the secondary cannot reach the primary but CAN reach the arbiter, then the issue is with the primary, so the secondary must become the new primary. If it can reach neither the primary nor the arbiter, then the secondary knows the problem is that it is itself isolated/broken (poor secondary :() and it must not become the primary.
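A toy sketch of that decision logic (just the idea, not MongoDB's actual implementation; canReach, standForElection, and stepDown are made-up helpers):

    // runs on the secondary once the primary's heartbeat is lost
    if (canReach(arbiter)) {
        standForElection();  // me + arbiter = majority; the primary must be the one down
    } else {
        stepDown();          // I can see nobody, so I am probably the isolated one
    }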
If you strip the Arbiter down to its core, it is essentially a non-data-holding member used for voting.
One case for an Arbiter, as I state in the linked question Why do we need an 'arbiter' in MongoDB replication?, is to break the ties that CAP describes, but that is not its only purpose, since you could easily replace that Arbiter with a data-holding node and get the same effect.
However, an Arbiter will have a few benefits:
Small footprint
No data
No need to sync
Can vote instantly
Can be put literally anywhere in your network, on an app server or even next to another secondary, to strengthen that part of your network (this comes into play with partitions).
So an Arbiter is extremely useful, even when there is no partitioning in your network at all.
Now to explain the base setup. An Arbiter is NOT strictly required; you could swap it for a data-holding node. Three voting members is the minimum you need to keep automatic failover, but they do not all have to hold data: 2 data-holding nodes plus 1 Arbiter is the actual minimum.
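For reference, a minimal PSA (primary-secondary-arbiter) set could be initiated like this (hostnames are placeholders):

    rs.initiate({
        _id: "rs0",
        members: [
            { _id: 0, host: "data1.example.net:27017" },
            { _id: 1, host: "data2.example.net:27017" },
            { _id: 2, host: "arb.example.net:27017", arbiterOnly: true }
        ]
    })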
Now to answer:
In other words, why wouldn't a primary + secondary configuration be sufficient?
Because if one of those two goes down, there is only 50% of the vote left (2 - 1 = 1), and 50% is not counted as a sufficient majority for MongoDB to elect a member (the majority is judged against the total configured votable members in your rs.config).
Also in this case MongoDB does not actually know if that last member is the last member. It needs other members to tell it otherwise.
So yes, this is why you need a third guy.

ReplicaSet with only 2 nodes

I've got two servers with a mongo instance each.
On the first server I set the mongo instance as primary, and on the second, mongo is secondary.
I don't have the possibility of getting another server to use as an arbiter.
How can I use MongoDB with just two servers?
If the primary fails, does the secondary automatically become primary?
Thanks!
How can I use MongoDB with just two servers?
If you really want to go down this road (which, may I add, is a very bad road), you can set your primary to have no votes, in which case the only voting member would be the secondary in the event of a failover. However, this then causes another problem: in the event of a secondary failure you cannot have a primary elected (the failure of any member triggers an election).
So even though with 2 members you can account for one member failing, you cannot account for both equally.
It is not good practice to have an even number of members in a replica set, because it leads to election problems. In order to be elected, a node is required to get a majority of votes. If you have two members, a node needs both votes, which is impossible when at least one node is down. There are several options:
add a lightweight arbiter node on the first or second server, so you would have three members in the replica set. It doesn't protect you from a network partition, but it is a bit better than just having a two-node replica set.
use the replica set in master-slave mode, i.e. without automatic recovery; you could achieve this by setting votes:2 for the primary. If the primary goes down, you reconfigure the replica set and set votes:2 for the secondary, and the secondary would then be elected primary. That gives you the option of manual recovery (see the sketch below).
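A sketch of that manual recovery, with one caveat: newer MongoDB versions (3.0+) only allow votes of 0 or 1, so the modern equivalent is to take the vote away from the dead member rather than give the survivor two votes:

    // run on the surviving member; force is needed because there is no primary
    cfg = rs.conf()
    cfg.members[0].votes = 0     // index 0 assumed to be the failed member
    cfg.members[0].priority = 0  // non-voting members must have priority 0
    rs.reconfig(cfg, { force: true })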

Mongodb replica set odd member count vs even member count + an arbiter

I've read quite a bit about MongoDB replica sets and how elections work on a failover. My question is: assuming the client uses readPreference set to primary only, is there any advantage to having an odd number of members over an even number of members plus an arbiter?
For example, if you have a 3-member replica set, you can make all 3 members data-bearing replicas, or you can have only 2 replicas and an arbiter (which you can install on a smaller machine). The safety is basically the same: any one machine can go down and the replica set is still OK, but if two of them go down, the replica set is in a stalemate (it cannot elect a new primary).
The only difference is that in the second case you can use a way smaller machine for the arbiter.
It's actually not true that three data holding nodes provide the same "safety" net as two data holding nodes plus an arbiter.
Consider these cases:
1) One of your nodes loses its disk and you need to fully resync it. With three data-holding nodes you can resync from the other secondary instead of the primary (which reduces the load on the primary).
2) One of your nodes loses its disk and it takes you a while to locate a replacement. While that secondary is down, you are running with ZERO safety net if you had two nodes and an arbiter, since you only have one node with data left; if anything happens to it, you are toast.

Why do we need an 'arbiter' in MongoDB replication?

Assume we set up MongoDB replication without an arbiter. If the primary is unavailable, the replica set will elect a secondary to be primary, so it seems there is a kind of implicit arbiter already, since the replicas elect a primary automatically.
So I am wondering: why do we need a dedicated arbiter node? Thanks!
I created a spreadsheet to better illustrate the effect of Arbiter nodes in a Replica Set.
It basically comes down to these points:
With an RS of 2 data nodes, losing 1 server brings you below your voting minimum (which is "greater than N/2"). An arbiter solves this.
With an RS of even numbered data nodes, adding an Arbiter increases your fault tolerance by 1 without making it possible to have 2 voting clusters due to a split.
With an RS of odd numbered data nodes, adding an Arbiter would allow a split to create 2 isolated clusters with "greater than N/2" votes and therefore a split brain scenario.
Elections are explained [in poor] detail here. In that document it states that an RS can have 50 members (even number) and 7 voting members. I emphasize "states" because it does not explain how it works. To me it seems that if you have a split happen with 4 members (all voting) on one side and 46 members (3 voting) on the other, you'd rather have the 46 elect a primary and the 4 to be a read-only cluster. But, that's exactly what "limited voting" prevents. In that situation you will actually have a 4 member cluster with a primary and a 46 member cluster that is read only. Explaining how that makes sense is out of the scope of this question and beyond my knowledge.
It's necessary to have an arbiter in a replica set for the reasons below:
Replication is more reliable with an odd number of members. If there is an even number of members, it's better to add an arbiter.
Arbiters do not hold data; they are only there to vote in elections when a node fails.
An arbiter is a lightweight process and does not consume many hardware resources.
Arbiters only exchange user credential data with the rest of the replica set, and that traffic is encrypted.
Votes during elections, heartbeats, and configuration data are not encrypted in communication between replica set members.
It is better to run the arbiter on a separate machine, rather than alongside one of the replica set members, to retain high availability.
Hope this helps!
This really comes down to the CAP theorem, whereby it is stated that if there is an equal number of servers on either side of a partition, the database cannot maintain CAP (Consistency, Availability, and Partition tolerance). An Arbiter is specifically designed to create an "imbalance", a majority on one side, so that a primary can still be elected in that case.
If you end up with an even number of nodes on either side, MongoDB will not elect a primary and your set will not accept writes.
Edit
By either side I mean, for example, 2 on one side and 2 on the other. My English wasn't easy to understand there.
So really what I mean is both sides.
Edit
Wikipedia presents quite a good case for explaining CAP: http://en.wikipedia.org/wiki/CAP_theorem
Arbiters are an optional mechanism to allow voting to succeed when you have an even number of mongods deployed in a replica set. Arbiters are lightweight, meant to be deployed on a server that is NOT a dedicated mongo replica, i.e. a server whose primary role is some other task, like a redis server. Since they're light, they won't interfere (noticeably) with the system's resources.
From the docs :
An arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for primary. Arbiters allow replica sets to have an uneven number of members, without the overhead of a member that replicates data.
http://docs.mongodb.org/manual/core/replica-set-arbiter/
http://docs.mongodb.org/manual/core/replica-set-elections/#replica-set-elections