MongoDB failover when 2 nodes of a 3 node replicaset go down

I need to set up a mongo replicaset across two data centers.
For the sake of testing, I set up a replicaset of 3 nodes, thinking of putting 2 nodes on the local site (a Primary and a secondary) and another standby on the other site.
However, if I take down the Primary and one of the standbys, the remaining standby stays as secondary and is not promoted to Primary as I expected.
From reading other questions here, it looks like the only solution is to use an arbiter on a third site, which is quite problematic.
As a temporary solution, is there a way to force this lone secondary to become a primary?

In order to elect a PRIMARY, the majority of all members must be up.
With 2 of the 3 nodes down, the 1 remaining node is not a majority. Typically the data center itself does not crash; usually you "only" lose the connection to a data center.
You can do the following.
Put 2 nodes in the first data center and 1 node in the second data center. In this setup the first data center acts as primary and must not fail! The second data center may fail.
Another setup is to put one node in each data center and an ARBITER on a different site. This "different site" does not need to be a full-blown data center: the MongoDB ARBITER process is very lightweight and does not store any data, so it could run on a small host somewhere in your IT network. Of course, it must have a connection to both data centers.
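The two layouts above can be checked with a little arithmetic. The following is a minimal sketch in Python, not MongoDB code: the site names and the `survives_loss_of` helper are made up for illustration, and it only counts votes (it ignores priorities and data freshness).

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return voting_members // 2 + 1


def survives_loss_of(layout: dict, lost_site: str) -> bool:
    """True if the members outside lost_site still form a majority."""
    total = sum(layout.values())
    remaining = total - layout[lost_site]
    return remaining >= majority(total)


# Hypothetical layouts: site name -> number of voting members placed there.
two_plus_one = {"dc1": 2, "dc2": 1}
# third_site hosts only the arbiter.
one_one_arbiter = {"dc1": 1, "dc2": 1, "third_site": 1}

for name, layout in [("2+1", two_plus_one), ("1+1+arbiter", one_one_arbiter)]:
    for site in layout:
        outcome = "primary possible" if survives_loss_of(layout, site) else "no primary"
        print(f"{name}: lose {site} -> {outcome}")
```

Under these assumptions the 2+1 layout survives losing dc2 but not dc1, while the layout with an arbiter on a third site survives the loss of any single site.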

Related

Mongodb Replicaset on AZURE with an Arbiter

I want to use MongoDB with replication; I created a VM with 2 secondary nodes and 1 arbiter:
1 Primary
2 Secondary
1 Arbiter
I'm trying to understand how this system works, so I have some questions:
1) According to the guidance "If a replica set has an even number of members, add an arbiter", I added an arbiter. But I'm not sure if I have done it correctly. Does this even number apply to secondaries or to all members in total?
2) What does this arbiter do? I actually don't understand its job.
3) I created public IP addresses for each VM, in order to connect to them from outside. I successfully connected from my application, using this connection string:
mongodb://username:password@vm0:27017,vm1:27017,vm2:27017/dbname?replicaSet=xxx&readPreference=primaryPreferred
I didn't add the arbiter to this connection string. Should I add it or not?
4) When I shut down the primary machine, one of the secondary machines successfully became primary, as I expected. There is no problem in this case; but when I then shut down that new primary machine, my application throws an error. The remaining secondary node has not become primary. Why is this happening?
5) If all VMs are working but I shut down the arbiter, my application again throws an error and I cannot connect to the db. I'm trying this because I'm thinking of the case where something goes wrong on the arbiter machine and it is shut down in the future because of maintenance or some other problem.
Maybe it is because I didn't understand the role of an arbiter, and I suspect my expectation is wrong, but why is it not converting any secondary machine to an arbiter? And why does the whole system stop working when I shut down the arbiter?
Thanks.

1) If you have 1 Primary and 2 Secondaries, you have 3 members in your replica set. Therefore you should not be adding an arbiter. You already have an odd number of nodes.
2) An arbiter is a node which doesn't hold data and can't be elected as Primary. It is only used to elect a new Primary if the current Primary goes down.
For example, say you have 1 Primary and 1 Secondary. The replica set has 2 members. If the Primary goes down, the replica set will attempt to vote to elect a new Primary. In order for a node to be elected, it needs to win more than half the votes. But if the Secondary votes for itself, it will only get 1 out of 2 votes. That's not more than half, so it will not be elected. Thus the replica set will not be able to elect a new Primary, and it will stop accepting writes until a majority is available again.
To fix this, you can add an arbiter to the replica set. This is usually a much smaller machine since it doesn't need to hold data. It just has one job, voting for the Secondary to be the new Primary in the case of elections.
But, since you already have 3 data-bearing nodes, you won't want to add an arbiter. You can read more about arbiters in the MongoDB documentation.
3) You can add arbiters to connection strings but in general you won't need to. Adding the data-bearing nodes is just fine. That's what people usually do.
4) You have 4 members in the replica set. You took down 2 of them. That means there are only 2 votes left. The final secondary won't be able to get more than 50% of the votes so no Primary will be elected.
In general, testing two nodes going down is overkill. You probably want a 3 member replica set. Each member should be in a different availability zone (Availability Set in Azure). If two nodes go down your replica set will be unavailable. But two nodes going down at the same time is very unlikely if all nodes are in different availability zones. So don't worry too much about more than one node going down. If that's a real concern (in most applications it really isn't), you want to make a 5 member replica set.
5) That's weird. This sounds like your replica set might be configured incorrectly. As I said, you don't need an arbiter anyway. So you could just try setting it up again without the arbiter and see if it works. Open a new question if you're still having issues. Make sure to include the output of running rs.status() in your question.
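To make the vote counting in points 2) and 4) concrete, here is a small Python sketch. It is only arithmetic on the number of voting members, and the function names are made up for this example.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a Primary: strictly more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be down while a majority remains."""
    return voting_members - majority(voting_members)


for n in range(2, 6):
    print(f"{n} members: majority is {majority(n)}, tolerates {fault_tolerance(n)} down")
```

With 4 members (3 data-bearing nodes plus the arbiter) the majority is 3, so the set tolerates only 1 member down, the same as with 3 members. Going to 5 members is what raises the tolerance to 2.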

Requires simple explanation on Arbiter's role in a given mongoDB replica set

I came across the MongoDB official site explaining that a replica set should have an odd number of members. I also came across the term Arbiter on the same site; based on my understanding, it will not be elected as primary but it does participate in elections (from https://docs.mongodb.com/manual/core/replica-set-arbiter/).
There is also a post related to Arbiters, "Why do we need an 'arbiter' in MongoDB replication?", which then relates to the CAP theorem, which makes things even more complicated.
First of all, why do we need to make the number of members odd? Also, can someone explain to me what this Arbiter is and what its role is in a given replica set, in simple layman's English?
Thanks in advance.

In short: it is to stop the two normal nodes of the replica set getting into a split-brain situation if they lose contact with each other.
MongoDB replica sets are designed so that, if one or more members goes down or loses contact, the other members are able to keep going as long as between them they have a majority. The majority clause is important: without that, you might have a situation where the network is split in two, and the nodes on each side of the partition think that they're still carrying on the replica set, and end up with different sets of data.
So to avoid the split-brain problem, the nodes of a replica set will not continue if they can't command an absolute majority. An example of this is a replica set with just two nodes.
If they lose communication, the outcome is symmetrical.
Each one will reason the same way:
1) realise it has lost communication with the other
2) assess whether it is possible to keep the replica set going
3) realise that 1 node (out of 2) does not constitute a majority
4) revert to Secondary mode
The difference an Arbiter makes
If there is a third node, then even if the two main nodes lose contact with each other then there will still be one of them in contact with the arbiter. This allows the two main nodes to make different decisions, and keep the replica set going while avoiding the split-brain problem.
Consider a 3-node replica set made up of node A, node B and an arbiter.
Whichever way the network partition goes, one node will still be in contact with the arbiter; for example, suppose node A is cut off while node B can still reach the arbiter.
Node A will:
1) realise it can contact neither node B nor the arbiter
2) assess whether it is possible to keep the replica set going
3) realise that 1 node (out of 3) does not constitute a majority
4) revert to Secondary mode
Whereas node B is able to react differently:
1) realise it cannot contact node A, but still has contact with the arbiter
2) assess whether it is possible to keep the replica set going
3) realise that 2 nodes (out of 3) do constitute a majority
4) take over as Primary
This also illustrates how you should deploy an arbiter to get that benefit:
try to put the arbiter on a system independent of both the data-bearing nodes, to maximise the chance of it still being able to communicate with either throughout network problems
it doesn't need to store data, so you don't need high-spec hardware
Just 1 arbiter is enough to break the deadlock; you don't get any benefit from multiple arbiters
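The reasoning of the nodes above can be written down as a toy model. This is a simplified Python sketch, not how MongoDB implements elections: the `Member` class and `side_can_have_primary` function are invented for illustration, and the only rules modelled are "strict majority of votes" and "an arbiter cannot be Primary".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    arbiter: bool = False  # arbiters vote but hold no data and cannot be Primary


def side_can_have_primary(side: set, all_members: set) -> bool:
    """One side of a partition can have a Primary only if it holds a strict
    majority of all votes and contains at least one data-bearing member."""
    has_majority = len(side) > len(all_members) / 2
    has_data_node = any(not m.arbiter for m in side)
    return has_majority and has_data_node


a, b, arb = Member("A"), Member("B"), Member("arbiter", arbiter=True)

# Two nodes, no arbiter: after a split neither side has a majority.
print(side_can_have_primary({a}, {a, b}), side_can_have_primary({b}, {a, b}))

# Two nodes plus an arbiter, with node A cut off from the other two.
members = {a, b, arb}
print(side_can_have_primary({a}, members), side_can_have_primary({b, arb}, members))
```

In the two-node case both checks fail, which is the symmetrical outcome described above. With the arbiter, only the side containing node B and the arbiter passes, so at most one side can ever have a Primary.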

Take the example of a 2-member replica set: in the event of a network partition, i.e., the 2 members lose touch with each other, who gets to become the primary? There will be a tie and a need for a tie-breaker. That would not be the case with a 3-member replica set: the group that contains two nodes will win and one of them will become primary. That is the basis of the requirement for an odd number of nodes in a replica set. As for an arbiter, it is lightweight, so one can save money by using a smaller machine for it: we do not expect it to hold any data, we just need it to be present to vote for a primary.

Mongodb architecture and failover with two data centres

I’m trying to figure out whether there is a way to seamlessly failover a mongo replicaset where most of the mongodb nodes live in the primary data centre. My current limitation is 2 data centres and third datacentre is out of the question. The issue I have is that if data centre 1 goes down, the secondary node in data centre 2 will not be promoted to primary without manual intervention.
Data centre 1 (Primary):
Mongo Node (Primary)
Mongo Node (Arbiter)
Data centre 2 (Secondary):
Mongo Node (Secondary)
I've looked at mongodb whitepapers but they state manual intervention is required to make the mongodb instance in dc2 primary if dc1 is lost.
My question is whether there is an architecture or configuration that makes it possible to lose data centre 1 and still have data centre 2 take over with writes enabled, without manual intervention or reconfiguration. Is this possible without going down a 3 data centre architecture path? Is it possible to keep two 3-member replica sets, one at each site, synchronised and potentially do the failover at a network level for the connecting applications?
Thanks.

If you go with 2 data centers, to me the easiest solution is to cover only failure of the Primary. The good news is that if the Slave is dead, you only need to wait.
If access to the Primary fails, you need a callback procedure that will force the Slave to become Primary. This switch will cause downtime in your application unless you spend more time creating a gateway that buffers queries and waits for the callback from the switch. That way you will only see slowness from the increased timeout.
After the Primary is live again you need to connect back to it (because your Slave node is not reliable). This will again cause downtime: you need another process that checks (from data center 2) whether the Primary is alive and, if it is, triggers an event and proceeds with the callback.
The manual intervention to force the Slave to become Primary can be wrapped in a script.
To me the best solution here is to go with a 3rd data center where the arbiter will stay. The effort to skip that and put application logic there instead is not worth it. Automatic failover in Mongo works very well and it's reliable. You may have lots of problems if you go with application logic to achieve that with 2 data centers ... I would rather go with their recommendation.

First, as you have noticed, you cannot do automatic failover with only two nodes. Second, money is not a real issue when you think about that "third" data center. You may ask why or "how so"?
You need an arbiter, as you know. An arbiter doesn't really need resources; any small Linux machine will do fine. Small VPS machines don't cost that much. Here you can find a machine with 1 x 2.40 GHz, 512 MB, 20 GB for only 1.24€/month. From here you get a beefier machine for 1.99€/month.
Actually, both of those places could run quite a big mongodb on those "tiny" machines.

Replica set sharding. Can second data centre do failover?

We have 3 shards, replicated over 3 boxes each (9 boxes in total). 2 replicas are at our main hosting provider (site A) and we have a third replica (set to secondary only) on another hosting provider (site B). If site A fails, (how) can we automatically take requests from site B?
We have configured site B's replica to secondary-only as advised in http://docs.mongodb.org/manual/core/replication/. I know you can do rs.slaveOk() on these boxes and take requests, but this would only cover the local shard, which is a third of the database.
All help appreciated!

You cannot automatically failover when the majority of the replica set is not available.
Losing site A means you've lost 2/3rds of your replica nodes. In order to have site B nodes accept reads and writes, you have to reconfigure the replica set. You can either remove the nodes at site A from the configuration or add arbiters to restore "majority".
Note that you will not have any redundancy running this way, so I would recommend that rather than trying to come up automatically or as quickly as possible, in case of major loss of servers that you take the time and spin up new servers that will provide redundancy of your new configuration.
If surviving the loss of a data center is a requirement for your application, the recommended configuration would be to have the same number of nodes in DC1 and DC2 and then an arbiter in a third location (so that whichever data center failed, you could ensure that majority of the replica set is still present and can elect a new primary).
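The manual reconfiguration mentioned above (removing the nodes at site A from the configuration) amounts to editing the replica set config document. Below is a minimal Python sketch of that edit only. The host names, the `surviving_config` helper and the sample config are hypothetical; the priority change is an assumption based on the question, where site B's member was configured as secondary-only and so could not become primary as-is.

```python
import copy


def surviving_config(current_cfg: dict, surviving_hosts: set) -> dict:
    """Return a copy of a replica set config that keeps only the members
    whose host is in surviving_hosts."""
    cfg = copy.deepcopy(current_cfg)
    cfg["members"] = [m for m in cfg["members"] if m["host"] in surviving_hosts]
    if not cfg["members"]:
        raise ValueError("no surviving members left in the configuration")
    for member in cfg["members"]:
        # Assumption: a secondary-only member (priority 0) must be given a
        # non-zero priority before it can be elected. Arbiters are left alone.
        if not member.get("arbiterOnly", False):
            member["priority"] = 1
    # Bump the version, as the rs.reconfig() shell helper does.
    cfg["version"] = cfg["version"] + 1
    return cfg


# Hypothetical config for one shard, shaped like the output of rs.conf().
current = {
    "_id": "shard0",
    "version": 4,
    "members": [
        {"_id": 0, "host": "a1.example.net:27017"},
        {"_id": 1, "host": "a2.example.net:27017"},
        {"_id": 2, "host": "b1.example.net:27017", "priority": 0},
    ],
}

new_cfg = surviving_config(current, {"b1.example.net:27017"})
print(new_cfg)
# Applying it is a separate, deliberate step on the surviving member, for
# example rs.reconfig(cfg, {force: true}) in the mongo shell. Not run here.
```

Keeping the config edit as a pure function makes it easy to review before anyone forces it onto a live member, which fits the advice above to take your time rather than automate this step.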

Two Datacenters, connectivity breaks, both continue writing, connectivity returns, sync?

We have two datacenters, and are writing data to Mongo from both datacenters. The collection is sharded and we have the primary for one shard in datacenter A and primary for the other in datacenter B. Occasionally, connectivity between the datacenters fails.
We'd like to be able to continue writing IN BOTH DATACENTERS. The data we're writing won't conflict - they're both just adding documents, or updating documents that won't be updated in two places.
Then, when connectivity returns (sometimes in seconds, or even minutes), we'd like the database to cope with this situation nicely and have all the data updated automatically.
Can someone please advise if this is possible? It doesn't say much in the docs about what happens when you divide a replica set to two independent DB's, then get both to become master until you reconnect them. What happens? How do I set this up?

I don't see why this wouldn't work the way you already have it set up presuming that your secondaries are in the same data center as your primary.
In other words, if primary and secondaries for shard A are in data center A and primary and secondaries for shard B are in data center B, then you are already writing in both data centers.
If you now lose connectivity between the two data centers, then clients from data center A won't be able to read or write to shard B and clients in data center B won't be able to write to shard A but both data center clients will continue writing to the shard that's at the same data center as they are.
So, that's simple then - keep the majority of the replica set at the same data center and you will continue writing to that shard as long as that data center is up.
I have a feeling though that you expect that somehow magically clients from a disconnected data center will stash away their writes for the other data center's shards somewhere - that cannot happen - they cannot see the other data center. So when connectivity returns, there is nothing for the DB to cope with (other than the fact that there were a bunch of writes that failed during disconnected phase).

It's not possible to "divide a replica set" to have two primaries in the same set.
So you have two replica sets, sharded on one key, with mongos as the router.
One solution is to make the first part of the sharding key start with 'A' or 'B', so that a new record starting with A gets routed to the first set, and one starting with B gets routed to the 2nd set.
This is the only way you can control where mongos will try to put the data.
Connectivity problems between mongos and the different replica sets don't matter as long as your new entries don't have a sharding key that matches the broken replica set.
You could, for example, let the mongo client in datacenter A that writes data always start the sharding key with A. That means if datacenter B is down, only records with A are created.
Clients in both centers will have access to reads from both shards as long as they are up.
Mongos should run close to each client, so you will have these in both locations, each with access to the sharding configuration.
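The key-prefix idea can be sketched in a few lines of Python. This is only an illustration of the naming scheme: in a real cluster mongos does the routing based on the shard key ranges, and the shard names, the `:` separator and the helper functions here are all made up.

```python
# Hypothetical mapping from key prefix to the replica set (shard) that owns it.
PREFIX_TO_SHARD = {"A": "shard_in_datacenter_A", "B": "shard_in_datacenter_B"}


def make_shard_key(datacenter: str, record_id: str) -> str:
    """Build a shard key whose first character names the writing datacenter."""
    if datacenter not in PREFIX_TO_SHARD:
        raise ValueError(f"unknown datacenter: {datacenter!r}")
    return f"{datacenter}:{record_id}"


def owning_shard(shard_key: str) -> str:
    """Stand-in for the routing decision: pick the shard by key prefix."""
    return PREFIX_TO_SHARD[shard_key[0]]


def write_possible(shard_key: str, reachable_shards: set) -> bool:
    """A write can succeed only if the shard owning the key is reachable."""
    return owning_shard(shard_key) in reachable_shards


# A client in datacenter A while the link to datacenter B is down.
reachable_from_a = {"shard_in_datacenter_A"}
local_key = make_shard_key("A", "order-1001")
remote_key = make_shard_key("B", "order-2002")
print(local_key, write_possible(local_key, reachable_from_a))
print(remote_key, write_possible(remote_key, reachable_from_a))
```

As long as each client only creates keys with its own datacenter's prefix, its writes never depend on the other datacenter being reachable.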

A replica set node cannot become master unless it sees a majority (more than half) of the nodes. This means that if there's no link to the datacenter where the primary is, you can't write to that shard.
I can see how one can implement a home-grown solution for this. But it requires some effort.
You might also want to look at CouchDB. It's also schemaless and JSON-based and it can handle your situation well (it was basically built for situations like this).
