I'm facing some issues when trying the following:
I have a three-node MongoDB 4.0 cluster without sharding: one mongod instance is the primary and the other two are secondaries. When I shut down the primary, one of the secondaries becomes the primary and the other remains a secondary. When I then shut down the new primary (the former secondary), the remaining secondary does not become primary, so the cluster stays inoperative.
I've been following the steps in the documentation, so there must be some configuration I forgot. I even changed the order of creation, and I always get the same result.
What am I doing wrong?
Thanks in advance.
MongoDB cannot automatically fail over to a single member; this is due to how elections work.
Think about it logically: if there were a network partition between the primary and a secondary, how would either of them know whether the other is down? So they both step down to be secondaries until one of them can see a majority of the nodes.
https://docs.mongodb.com/manual/core/replica-set-elections/#network-partition
A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of nodes in the replica set, the primary steps down as primary and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the nodes (including itself) holds an election to become the new primary.
The election process needs a majority to elect a PRIMARY. For example, when 2 of your 3 nodes are down, there is no majority of nodes to elect a new PRIMARY, so the 3rd node stays SECONDARY until you start at least one of the other members.
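To make the majority rule concrete, here is a minimal sketch in plain JavaScript. The function names are my own and are not part of any MongoDB API; the sketch assumes every configured member has exactly one vote.

```javascript
// Strict majority of the *configured* voting members,
// not of the members that happen to be running right now.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary can only be elected (or stay primary) while a majority is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

console.log(canElectPrimary(3, 3)); // true: all three members up
console.log(canElectPrimary(3, 2)); // true: one member down, 2 of 3 is a majority
console.log(canElectPrimary(3, 1)); // false: two members down, 1 of 3 is not
```

The last call is the situation in the question: the surviving node counts itself against the three configured members, not against the one member still running.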
Related
I am investigating using MongoDB ReplicaSet for high availability.
But I just discovered that in a ReplicaSet with 3 nodes, if the PRIMARY mongod is the only one left (that is, the 2 other mongod instances died or were shut down), then after several seconds it switches its role to SECONDARY and no longer accepts writes. That makes a Replica Set worth less than a single instance.
I know and understand PRIMARY election, but if the PRIMARY role is fixed to a server (by setting its priority to, say, 10) and the other servers become inaccessible (for example due to network problems), why does the main server just give up?!
Tested with 2.4.8 on Windows (mongodb-win32-x86_64-2008plus-2.4.8) and Linux (CentOS) and 2.0.x on Linux
BOUNTY STARTED:
If the replica set gives up when the PRIMARY feels alone, what are the alternatives to ensure 100% availability? Or maybe there is a special configuration needed for this case. The current implementation makes a ReplicaSet fragile in case of network problems.
UPDATED:
Alas, I did not mention earlier the scenario where #3 goes down first (PRIMARY and SECONDARY are left)
and then, after a while, SECONDARY goes down. Then PRIMARY really just "gives up", even though it has already known for some time that #3 is unavailable. This was actually tested in my test environment.
var rsconfig = {"_id":"rs4","members":[{"_id":0,"host":"localhost:27041","priority":10},{"_id":1,"host":"localhost:27042"},{"_id":2,"host":"localhost:27043","arbiterOnly":true}]}
printjson(rsconfig)
rs.initiate(rsconfig)
We initially thought of putting SECONDARY and #3 (that is, the ARBITER) on the same server,
but because of the question in the title, we cannot use such a configuration.
Thanks to Alan Spencer for first explaining the logic that MongoDB takes.
This is expected: since the majority of the members are down, MongoDB does not assume the last remaining member is consistent.
When you have a majority of the members down there are a couple of options: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/
You say that when the primary is cut off from the other two nodes it should stay up, otherwise write availability is lost, but that's not necessarily the case. If the other two nodes are actually up and on the other side of the network partition, then they have elected a new primary (as two out of three are a majority) and it is that primary that is accepting new writes.
If the previous primary continued to accept writes, you would have potentially conflicting data which there is no mechanism to resolve. Since MongoDB replica set is a single primary architecture (as opposed to a multi-master system) the election mechanism assures that there cannot be two primaries at the same time.
From the point of view of two secondaries, network partition is the same as primary being unavailable, and from the primary's point of view, network partition is indistinguishable from "both other nodes are down". It steps down, because in case of network partition there may already be another primary on the other side of it, and it assures there cannot be two primaries by stepping down.
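The argument above can be checked with a small sketch in plain JavaScript. The helper name is hypothetical, and the model assumes one vote per member and that every member sits on exactly one side of the partition. It shows that, however the set is split, at most one side can hold a majority, which is why at most one primary can exist.

```javascript
// Hypothetical helper: given the number of voting members on each side of a
// network partition, return the indexes of the sides that can hold a primary.
function sidesThatCanHoldPrimary(partitionSizes) {
  const total = partitionSizes.reduce((sum, n) => sum + n, 0);
  const needed = Math.floor(total / 2) + 1; // strict majority of all members
  return partitionSizes
    .map((size, index) => (size >= needed ? index : -1))
    .filter((index) => index >= 0);
}

// Old primary alone on side 0, the two secondaries together on side 1.
console.log(sidesThatCanHoldPrimary([1, 2])); // [ 1 ]

// Every member isolated from the others: nobody can be primary.
console.log(sidesThatCanHoldPrimary([1, 1, 1])); // []
```

From inside side 0, the isolated primary cannot tell the first case apart from "both other nodes are down", so stepping down is the only safe choice in both.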
It is not the case that the "replica set" gives up when primary feels alone - the reason primary steps down when it feels alone is precisely to preserve the integrity of the replica set as a whole. It is not true that setting high priority score fixes a role to a node - a primary can only be elected via consensus among majority - all priority scores do is influence election when all other things are equal.
I highly recommend the excellent "call me maybe" series as reading to understand the challenges of write availability in a distributed system: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
Just to chime in on the answers. The behavior in this scenario is expected. MongoDB uses a leader election algorithm to elect the new leader. So if there is no majority you cannot elect a leader and hence no writes.
Your only option at the point where 2 nodes are down is to reconfigure your replica set as a 1-node replica set to make it writable. You can do this by running the rs.reconfig command on the one remaining server. However, please note that this should only be a temporary, emergency configuration. For the long term you should have an odd number of total nodes (3+) in your replica set configuration.
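As a rough sketch of that emergency step, the plain JavaScript below builds a one-member configuration from an existing one. The helper name singleMemberConfig is my own invention; the sample config reuses the hosts from the question. The shell commands in the trailing comment follow the "Reconfigure a Replica Set with Unavailable Members" tutorial linked earlier, and the member index there is an assumption you must adapt to your own set.

```javascript
// Hypothetical helper: return a copy of the config that keeps only the
// surviving member. The original config object is left untouched.
function singleMemberConfig(currentConfig, survivingHost) {
  const members = currentConfig.members.filter((m) => m.host === survivingHost);
  if (members.length !== 1) {
    throw new Error("surviving host not found in config: " + survivingHost);
  }
  return Object.assign({}, currentConfig, { members: members });
}

// Sample config modelled on the one in the question.
const example = {
  _id: "rs4",
  members: [
    { _id: 0, host: "localhost:27041", priority: 10 },
    { _id: 1, host: "localhost:27042" },
    { _id: 2, host: "localhost:27043", arbiterOnly: true },
  ],
};

const emergency = singleMemberConfig(example, "localhost:27041");
console.log(JSON.stringify(emergency));

// In the mongo shell, connected to the surviving member, the equivalent is:
//   var cfg = rs.conf();
//   cfg.members = [cfg.members[0]];      // index of the surviving member (assumed 0)
//   rs.reconfig(cfg, { force: true });   // force is needed because there is no primary
```

Once the other servers are back, add them again so the set returns to three voting members.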
Try to use arbiters. Most documents say to use just one, but in your case you need to win the election.
From http://docs.mongodb.org/manual/core/replica-set-architectures/ :
Fault tolerance for a replica set is the number of members that can
become unavailable and still leave enough members in the set to elect
a primary. In other words, it is the difference between the number of
members in the set and the majority needed to elect a primary. Without
a primary, a replica set cannot accept write operations. Fault
tolerance is an effect of replica set size, but the relationship is
not direct.
More on elections: http://docs.mongodb.org/manual/core/replica-set-elections/
More on arbiters: http://docs.mongodb.org/manual/faq/replica-sets/#how-many-arbiters-do-replica-sets-need
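The quoted definition can be turned into a few lines of plain JavaScript. This is only an illustration of the arithmetic, assuming one vote per member; the function name is mine.

```javascript
// Fault tolerance = members in the set minus the majority needed to elect a primary.
function faultTolerance(members) {
  const majority = Math.floor(members / 2) + 1;
  return members - majority;
}

for (const n of [1, 2, 3, 4, 5, 6, 7]) {
  const majority = Math.floor(n / 2) + 1;
  console.log(n + " members: majority " + majority + ", fault tolerance " + faultTolerance(n));
}
```

This is what "the relationship is not direct" means in practice: going from 3 to 4 members raises the required majority from 2 to 3 but leaves the fault tolerance at 1, so the extra member buys no additional failover capacity.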
I'm using 3 Linux servers that run mongodb. I want to use replication with 1 primary and 2 secondaries. I succeeded in setting it up, and when I shut down the primary I saw that one of the secondaries was chosen as primary.
So now I am left with 1 primary and 1 secondary.
When I shut down the primary I am left with a secondary which is not elected as primary.
I have read the documentation and searched the web but I couldn't find anything about it. Why is the secondary not being elected as primary?
There are a couple of reasons the last member is not being elected but the biggest one is that you require a majority of configured, voting, members to be online and voting in order for MongoDB to know that it is not:
A Network Partition
Or a problem within that mongod
If a majority of members are offline MongoDB will, naturally, stop writes to the set since it cannot make a best guess at the current state of the replica set.
Once you have a majority offline you will need to intervene manually: http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/
Mongo docs list this three-member configuration: primary, secondary, arbiter, as the minimal architecture of a replica set.
Why would an arbiter be necessary there? If the primary fails, the secondary won't see the heartbeat, so it needs to become primary. In other words, why wouldn't a primary + secondary configuration be sufficient? This related question doesn't seem to address the issue, as it discusses larger numbers of nodes.
Suppose you have only two servers, one primary and one secondary.
If the secondary suddenly cannot reach the primary server, it could be that the primary is down (in that case the secondary should become primary), but it could just as well be a network issue that isolated the secondary (so the secondary is the one that is in effect down).
However, if you have an arbiter and the secondary cannot reach the primary but it CAN reach the arbiter, then the issue is with the primary, so the secondary must become the new primary. If it can reach NEITHER the primary NOR the arbiter, then the secondary knows that it is the one that is isolated/broken -poor secondary :(- so it must not become the primary.
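The reasoning above fits in a tiny decision function. This is a simplified model written in plain JavaScript for a primary + secondary + arbiter set, not MongoDB's actual election protocol, and the function name is hypothetical.

```javascript
// Simplified model: what should the secondary conclude from what it can reach?
function secondaryShouldSeekElection(canReachPrimary, canReachArbiter) {
  if (canReachPrimary) {
    return false; // the primary is fine, nothing to do
  }
  // Primary unreachable: with the arbiter, secondary + arbiter hold 2 of 3 votes.
  // Without the arbiter, the secondary must assume it is the isolated one.
  return canReachArbiter;
}

console.log(secondaryShouldSeekElection(true, true));   // false: healthy set
console.log(secondaryShouldSeekElection(false, true));  // true: primary is the problem
console.log(secondaryShouldSeekElection(false, false)); // false: secondary is isolated
```

With only two servers the second argument does not exist, so the secondary can never tell the last two cases apart.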
If you bring the Arbiter down to its core, it is essentially a non-data-holding member used for voting.
One case for an Arbiter is the one I state in the linked question, "Why do we need an 'arbiter' in MongoDB replication?": to break the problems of CAP. But that is not its true purpose, since you could easily replace that Arbiter with a data-holding node and have the same effect.
However, an Arbiter will have a few benefits:
Small footprint
No data
No need to sync
Can vote instantly
Can be put literally anywhere in your network: on an app server, or even next to another secondary to boost that part of your network (this matters for partitions).
So an Arbiter is extremely useful, even on one side of a partition (i.e. you have no partitioning in your network).
Now to explain the base setup. An Arbiter is NOT strictly required, since you could swap it for a data-holding node, but 3 data-holding nodes is not the minimum you need to keep automatic failover; 2 data-holding nodes and 1 Arbiter is actually the minimum.
Now to answer:
In other words, why wouldn't a primary + secondary configuration be sufficient?
Because if one of those goes down there is only 50% of the vote left (2-1 = 1), and 50% is not a sufficient majority for MongoDB to actually vote in a member (judged against the total number of configured voting members in your rs.config).
Also in this case MongoDB does not actually know if that last member is the last member. It needs other members to tell it otherwise.
So yes, this is why you need a third guy.
I've got two servers with a mongo instance each.
On the first server I set the mongo instance as primary, and on the second, mongo is secondary.
I don't have the possibility of getting another server to use as an arbiter.
How can I use mongodb with just two servers?
If the primary fails, does the secondary automatically become primary?
Thanks!
How can I use mongodb with just two servers?
If you really want to go down this road, which, may I add, is a very bad road, then you can set your primary to have no votes, in which case the only voting member would be the secondary in the event of a failover. However, this then causes another problem: if the secondary fails, you cannot have a primary elected (the failure of any member will trigger an election).
So even though with 2 members you can account for one failure, you cannot account for both equally.
It is not good practice to have an even number of members in a replica set because it leads to election problems. In order to be elected, a node is required to get a majority of the votes. If you have two members you need to get two votes, which is impossible if at least one node is down. There are several options:
Add a lightweight arbiter node on the first or second server to the replica set, so you would have three members in the replica set. It doesn't prevent you from recovery in case of a network partition, but it is a bit better than just having a two-node replica set.
Use the replica set in master-slave mode, i.e. without automatic recovery. You could achieve this by setting votes:2 for the primary. If the primary is down, you need to reconfigure the replica set and set votes:2 for the secondary; then the secondary would be elected as primary. So you would have an option for manual recovery.
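To see why votes:2 gives manual rather than automatic recovery, here is a sketch of the weighted vote count in plain JavaScript. The host names and the helper are hypothetical. Note that MongoDB 3.0 and later only accept votes of 0 or 1, so this models the older behaviour that the second option relies on.

```javascript
// Hypothetical helper: does the set of reachable hosts hold a strict
// majority of all configured votes?
function hasVoteMajority(votesByHost, reachableHosts) {
  const total = Object.values(votesByHost).reduce((sum, v) => sum + v, 0);
  const reachable = reachableHosts.reduce(
    (sum, host) => sum + (votesByHost[host] || 0),
    0
  );
  return reachable > total / 2;
}

// Assumed two-server layout: the primary carries 2 votes, the secondary 1.
const votes = { "server1:27017": 2, "server2:27017": 1 };

console.log(hasVoteMajority(votes, ["server1:27017"])); // true: 2 of 3 votes
console.log(hasVoteMajority(votes, ["server2:27017"])); // false: 1 of 3 votes
```

The primary survives the loss of the secondary, but the secondary alone can never win, which is why someone has to step in and move the extra vote by hand.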