MongoDB replica set configuration - mongodb

Could you please tell me if this configuration will cause any issues with failover? For example, what would happen if host mongo2.local went down? (Assuming the original host and the arbiter go down, only 2 members are left.) Would the remaining members ever be able to elect a new primary?
I know that there shouldn't be an arbiter here, as it makes things worse, but I wanted to know whether a failover would occur with this setup if mongo2.local went down.
mongo:ARBITER> rs.status()
{
"set" : "mongo",
"date" : ISODate("2015-02-12T09:00:08Z"),
"myState" : 7,
"members" : [
{
"_id" : 0,
"name" : "mongo1.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2572473,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:07Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:07Z"),
"pingMs" : 0,
"syncingTo" : "mongo2.local:27017"
},
{
"_id" : 1,
"name" : "mongo2.local:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 12148099,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:08Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:08Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1423711411, 1),
"electionDate" : ISODate("2015-02-12T03:23:31Z")
},
{
"_id" : 2,
"name" : "mongo3.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 5474488,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:07Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:08Z"),
"pingMs" : 139,
"syncingTo" : "mongo2.local:27017"
},
{
"_id" : 3,
"name" : "mongo2.local:27020",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 12148101,
"self" : true
}
],
"ok" : 1
}
and:
mongo:ARBITER> rs.config()
{
"_id" : "mongo",
"version" : 5,
"members" : [
{
"_id" : 0,
"host" : "mongo1.local:27017",
"priority" : 0.5
},
{
"_id" : 1,
"host" : "mongo2.local:27017"
},
{
"_id" : 2,
"host" : "mongo3.local:27017",
"priority" : 0.5
},
{
"_id" : 3,
"host" : "mongo2.local:27020",
"arbiterOnly" : true
}
]
}

If less than a majority of the votes in a replica set are available, the set cannot elect or maintain a primary; it becomes unhealthy and read-only. Ergo, if only 2 of your 4 members are up, you will not have a primary. No automatic failover will occur because there aren't enough votes for an election.
Don't use an even number of nodes in a replica set. It increases the chance of problems, simply because there are more servers, without increasing the failure tolerance of the set. With either 3 or 4 replica set members, 2 down servers will render the set unhealthy.
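The majority arithmetic above can be sketched as a small helper (illustrative only; the function names and member counts are hypothetical):

```javascript
// Sketch of the election-majority rule: a replica set needs strictly more
// than half of its voting members reachable to elect (or keep) a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canElectPrimary(votingMembers, membersUp) {
  return membersUp >= majority(votingMembers);
}

// The 4-member set from the question (3 data nodes + 1 arbiter):
console.log(canElectPrimary(4, 2)); // 2 of 4 up -> false, no primary
// A 3-member set tolerates exactly one failure:
console.log(canElectPrimary(3, 2)); // true
```

This is why adding the arbiter buys nothing here: it raises the majority threshold from 2-of-3 to 3-of-4 without adding a data-bearing member.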

Related

Resync a Mongo Replica Set

I have a replica set, and to free some disk space, I want to resync my replica set members.
Thus, on the SECONDARY member of the replica set, I've emptied the /var/lib/mongodb/ directory, which holds the data for the database.
When I open a shell to the replica set and execute the command rs.status(), the following is shown:
{
"set" : "rs1",
"date" : ISODate("2016-12-13T08:28:00.414Z"),
"myState" : 5,
"term" : NumberLong(29),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "10.20.2.87:27017",
"health" : 1.0,
"state" : 5,
"stateStr" : "SECONDARY",
"uptime" : 148,
"optime" : {
"ts" : Timestamp(6363490787761586, 1),
"t" : NumberLong(29)
},
"optimeDate" : ISODate("2016-12-13T07:54:16.000Z"),
"infoMessage" : "could not find member to sync from",
"configVersion" : 3,
"self" : true
},
{
"_id" : 1,
"name" : "10.20.2.95:27017",
"health" : 1.0,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 146,
"optime" : {
"ts" : Timestamp(6363490787761586, 1),
"t" : NumberLong(29)
},
"optimeDate" : ISODate("2016-12-13T07:54:16.000Z"),
"lastHeartbeat" : ISODate("2016-12-13T08:27:58.435Z"),
"lastHeartbeatRecv" : ISODate("2016-12-13T08:27:59.447Z"),
"pingMs" : NumberLong(0),
"electionTime" : Timestamp(6363486827801739, 1),
"electionDate" : ISODate("2016-12-13T07:38:54.000Z"),
"configVersion" : 3
},
{
"_id" : 2,
"name" : "10.20.2.93:30001",
"health" : 1.0,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 146,
"lastHeartbeat" : ISODate("2016-12-13T08:27:58.437Z"),
"lastHeartbeatRecv" : ISODate("2016-12-13T08:27:59.394Z"),
"pingMs" : NumberLong(0),
"configVersion" : 3
}
],
"ok" : 1.0
}
Why does my secondary member show "could not find member to sync from", even though my primary is up and running?
My collection is sharded over 6 servers, and I see this message on 2 replica set members: the ones that have the SECONDARY member at the top of the members array when requesting the replica set status.
I really would like to get rid of this error message.
It scares me :-)
Kind regards
I had a similar problem, and it was due to the heartbeat timeout being too short; you can solve that problem here
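If the heartbeat timeout is indeed the cause, raising it is a reconfiguration, roughly like this (mongo shell sketch run on the PRIMARY; the value 20 is an assumption to tune for your network, not a recommendation from the answer above):

```javascript
// Raise the replica set heartbeat timeout from the default (10 seconds)
// so that slow links are not flagged as failures too eagerly.
cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.heartbeatTimeoutSecs = 20  // assumed value; adapt to your deployment
rs.reconfig(cfg)
```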

How to know when replica set initial sync completed

From the MongoDB documentation:
At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and network connection between members of the replica set.
Source
My question is very simple: how can I know when it's safe to stepDown the PRIMARY member of my replica set? I just upgraded my secondary to use WiredTiger.
Output of rs.status():
{
"set" : "m0",
"date" : ISODate("2015-03-18T09:59:21.486Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "example.com",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 4642,
"optime" : Timestamp(1426672500, 1),
"optimeDate" : ISODate("2015-03-18T09:55:00Z"),
"electionTime" : Timestamp(1426668268, 1),
"electionDate" : ISODate("2015-03-18T08:44:28Z"),
"configVersion" : 7,
"self" : true
},
{
"_id" : 1,
"name" : "example.com",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1309,
"optime" : Timestamp(1426672500, 1),
"optimeDate" : ISODate("2015-03-18T09:55:00Z"),
"lastHeartbeat" : ISODate("2015-03-18T09:59:20.968Z"),
"lastHeartbeatRecv" : ISODate("2015-03-18T09:59:20.762Z"),
"pingMs" : 0,
"syncingTo" : "example.com",
"configVersion" : 7
},
{
"_id" : 2,
"name" : "example.com",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 4640,
"lastHeartbeat" : ISODate("2015-03-18T09:59:21.009Z"),
"lastHeartbeatRecv" : ISODate("2015-03-18T09:59:21.238Z"),
"pingMs" : 59,
"configVersion" : 7
}
],
"ok" : 1
}
Found the solution:
While performing the initial sync, the member's state is RECOVERING
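That check can be sketched against an rs.status()-style document (the field names follow the output above; the sample data and member names are made up):

```javascript
// Initial sync is finished once the member has left RECOVERING (state 3)
// and reports SECONDARY (state 2) or PRIMARY (state 1).
function initialSyncDone(status, memberName) {
  const m = status.members.find(function (x) { return x.name === memberName; });
  if (!m) throw new Error("member not found: " + memberName);
  return m.stateStr === "SECONDARY" || m.stateStr === "PRIMARY";
}

// Hypothetical snapshot shaped like the rs.status() output above:
const status = {
  members: [
    { name: "example.com:27017", stateStr: "PRIMARY" },
    { name: "example.com:27018", stateStr: "RECOVERING" }
  ]
};
console.log(initialSyncDone(status, "example.com:27018")); // false: still syncing
```

Before stepping down, you would also want the secondary's optime to have caught up to the primary's, not just the state to read SECONDARY.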

mongoDB servers are not available when the master is down

Once I turn off the master node, the other nodes are effectively down too:
I can NOT get data from the other nodes (while the master is down/off).
The following errors happen when the master replica is down:
ConnectionPool::PoolShuttingDownError (ConnectionPool::PoolShuttingDownError):
Moped::Errors::ConnectionFailure (Could not connect to a primary node for replica set #<Moped::Cluster:70117691586500 #seeds=[<Moped::Node resolved_address="0.0.0.0:27017">]>
I have 3 mongoDB replicas.
It seems the primary replica is 172.19.16.109:27017.
I don't understand why, after I shut down the machine 172.19.16.109, the remaining 2 replicas are unable to serve data from their databases.
Doesn't that defeat the purpose of replicas? If the master is shut down, the others become unavailable too.
I expect that even if one replica is down, the others should keep working correctly.
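One thing worth checking in this situation: by default, both the mongo shell and most drivers route reads to the primary only, so reads fail whenever no primary exists. A sketch of allowing secondary reads from the shell (the read preference mode is standard; whether this helps depends on why no new primary was elected, and the collection name is hypothetical):

```javascript
// By default reads require a primary. To read from a secondary while the
// set has no primary, set a read preference on the shell connection:
db.getMongo().setReadPref("secondaryPreferred")
db.mycollection.find()  // hypothetical collection; now permitted on a secondary
```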
My configurations are as below.
vvtk_dqa:PRIMARY> rs.conf()
{
"_id" : "vvtk_dqa",
"version" : 4,
"members" : [
{
"_id" : 1,
"host" : "172.19.16.109:27017"
},
{
"_id" : 2,
"host" : "172.19.16.104:27017"
},
{
"_id" : 3,
"host" : "192.168.14.7:27017"
}
]
}
mongodb.conf
# Where to store the data.
dbpath=/var/lib/mongodb
#where to log
logpath=/var/log/mongodb/mongodb.log
logappend=true
bind_ip = 0.0.0.0
#port = 27017
# Enable journaling, http://www.mongodb.org/display/DOCS/Journaling
journal=true
replSet=vvtk_dqa
isMaster
vvtk_dqa:PRIMARY> rs.isMaster()
{
"setName" : "vvtk_dqa",
"setVersion" : 4,
"ismaster" : true,
"secondary" : false,
"hosts" : [
"172.19.16.109:27017",
"192.168.14.7:27017",
"172.19.16.104:27017"
],
"primary" : "172.19.16.109:27017",
"me" : "172.19.16.109:27017",
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2015-02-11T02:05:12.021Z"),
"maxWireVersion" : 2,
"minWireVersion" : 0,
"ok" : 1
}
After I shut down 172.19.16.109 and typed rs.status() on 192.168.14.7,
I still could not get any data from 192.168.14.7 or 172.19.16.104:
vvtk_dqa:PRIMARY> rs.status()
{
"set" : "vvtk_dqa",
"date" : ISODate("2015-02-12T01:51:37Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "172.19.16.109:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1423649845, 1),
"optimeDate" : ISODate("2015-02-11T10:17:25Z"),
"lastHeartbeat" : ISODate("2015-02-12T01:51:28Z"),
"lastHeartbeatRecv" : ISODate("2015-02-11T10:31:04Z"),
"pingMs" : 0,
"syncingTo" : "192.168.14.7:27017"
},
{
"_id" : 2,
"name" : "172.19.16.104:27017",
"health" : 1,
"state" : 4,
"stateStr" : "FATAL",
"uptime" : 64832,
"optime" : Timestamp(1423545748, 1),
"optimeDate" : ISODate("2015-02-10T05:22:28Z"),
"lastHeartbeat" : ISODate("2015-02-12T01:51:37Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T01:51:35Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "192.168.14.7:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 64835,
"optime" : Timestamp(1423704914, 1),
"optimeDate" : ISODate("2015-02-12T01:35:14Z"),
"self" : true
}
],
"ok" : 1
}

mongo secondary has no queries after recovery

I have a test case: a sharded cluster with 1 shard.
The shard is a replica set with 1 primary and 2 secondaries.
My application uses the secondaryPreferred read preference; at first the queries were balanced over the two secondaries. Then I stopped 1 secondary, 10.160.243.22, to simulate a fault, and then rebooted it. The status is ok:
rs10032:PRIMARY> rs.status()
{
"set" : "rs10032",
"date" : ISODate("2014-12-05T09:21:07Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "10.160.243.22:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2211,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"lastHeartbeat" : ISODate("2014-12-05T09:21:05Z"),
"lastHeartbeatRecv" : ISODate("2014-12-05T09:21:07Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: 10.160.188.52:27017",
"syncingTo" : "10.160.188.52:27017"
},
{
"_id" : 1,
"name" : "10.160.188.52:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 2211,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"electionTime" : Timestamp(1417770837, 1),
"electionDate" : ISODate("2014-12-05T09:13:57Z"),
"self" : true
},
{
"_id" : 2,
"name" : "10.160.189.52:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2209,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"lastHeartbeat" : ISODate("2014-12-05T09:21:07Z"),
"lastHeartbeatRecv" : ISODate("2014-12-05T09:21:06Z"),
"pingMs" : 0,
"syncingTo" : "10.160.188.52:27017"
}
],
"ok" : 1
}
but all queries go to the other secondary, 10.160.188.52, and 10.160.243.22 is idle.
Why are the queries not balanced across the two secondaries after recovery, and how can I fix it?
Your application uses some kind of driver (I don't know the exact technology stack you are using) to connect to MongoDB. Your driver may remember (cache) the replica set status or its connections for some period of time, so there is no guarantee that a secondary node will be used immediately after it recovers.
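The caching behaviour described above can be sketched as follows (illustrative only; real drivers do this internally, and the 10-second refresh interval is an assumption, not a documented value):

```javascript
// Sketch of why a recovered secondary is not used immediately: the driver
// only re-scans the replica set topology every refreshIntervalMs.
function makeTopologyCache(refreshIntervalMs, scan) {
  let cached = null;
  let lastScan = -Infinity;
  return function getTopology(nowMs) {
    if (nowMs - lastScan >= refreshIntervalMs) {
      cached = scan();        // re-discover which members are up
      lastScan = nowMs;
    }
    return cached;            // otherwise serve the possibly stale view
  };
}

// Hypothetical timeline: the secondary comes back at t=2000ms, but the
// driver keeps routing around it until the next refresh at t=10000ms.
let membersUp = ["10.160.188.52:27017"];
const getTopology = makeTopologyCache(10000, function () { return membersUp.slice(); });

getTopology(0);                              // initial scan: one secondary up
membersUp.push("10.160.243.22:27017");       // secondary recovers
console.log(getTopology(2000));              // still the stale, single-member view
console.log(getTopology(10000));             // refresh: both secondaries visible
```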

MongoDB Primary replica set member syncing to secondary

I have a replica set with three members, with host0:27100 as the primary. Recently I changed the configuration and made host2:27102 the primary member, following these docs.
After changing the configuration, the rs.status() output says that host1:27101 is "syncingTo" : "host2:27102", which is intended.
But the output for the new primary, host2:27102, shows it is "syncingTo" : "host0:27100", the previous primary that has since become a secondary.
I cannot understand why it's syncing to a secondary member. Is this normal behavior?
s0:SECONDARY> rs.status()
{
"set" : "s0",
"date" : ISODate("2013-09-25T12:31:42Z"),
"myState" : 2,
"syncingTo" : "host2:27102",
"members" : [
{
"_id" : 0,
"name" : "host0:27100",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 428068,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"self" : true
},
{
"_id" : 1,
"name" : "host1:27101",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 397,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"lastHeartbeat" : ISODate("2013-09-25T12:31:42Z"),
"lastHeartbeatRecv" : ISODate("2013-09-25T12:31:41Z"),
"pingMs" : 10,
"syncingTo" : "host2:27102"
},
{
"_id" : 2,
"name" : "host2:27102",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 397,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"lastHeartbeat" : ISODate("2013-09-25T12:31:42Z"),
"lastHeartbeatRecv" : ISODate("2013-09-25T12:31:41Z"),
"pingMs" : 2,
"syncingTo" : "host0:27100"
}
],
"ok" : 1
}
This is a known issue. There is an open ticket about rs.status() showing the primary as syncingTo when run from a secondary, if the current primary was a secondary in the past (SERVER-9989). The fix version is 2.5.1.