How to know when replica set initial sync has completed - MongoDB

From the MongoDB documentation:
At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and network connection between members of the replica set.
Source
My question is very simple: how can I know when it's safe to stepDown the PRIMARY member of my replica set? I just upgraded my secondary to use WiredTiger.
Output of rs.status():
{
"set" : "m0",
"date" : ISODate("2015-03-18T09:59:21.486Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "example.com",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 4642,
"optime" : Timestamp(1426672500, 1),
"optimeDate" : ISODate("2015-03-18T09:55:00Z"),
"electionTime" : Timestamp(1426668268, 1),
"electionDate" : ISODate("2015-03-18T08:44:28Z"),
"configVersion" : 7,
"self" : true
},
{
"_id" : 1,
"name" : "example.com"",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1309,
"optime" : Timestamp(1426672500, 1),
"optimeDate" : ISODate("2015-03-18T09:55:00Z"),
"lastHeartbeat" : ISODate("2015-03-18T09:59:20.968Z"),
"lastHeartbeatRecv" : ISODate("2015-03-18T09:59:20.762Z"),
"pingMs" : 0,
"syncingTo" : "example.com"",
"configVersion" : 7
},
{
"_id" : 2,
"name" : "example.com"",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 4640,
"lastHeartbeat" : ISODate("2015-03-18T09:59:21.009Z"),
"lastHeartbeatRecv" : ISODate("2015-03-18T09:59:21.238Z"),
"pingMs" : 59,
"configVersion" : 7
}
],
"ok" : 1
}

Found the solution:
While performing the initial sync, the member's status is RECOVERING.
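As a quick sketch of how you might act on this, here is a hypothetical helper (plain JavaScript, runnable in Node or the mongo shell) that scans the `members` array from rs.status() for members that are still syncing. Note that during initial sync a member may report RECOVERING or, on newer MongoDB versions, STARTUP2:

```javascript
// Hypothetical helper: given the `members` array from rs.status(),
// list the members that are still performing initial sync.
// During initial sync a member reports RECOVERING (state 3) or,
// on newer MongoDB versions, STARTUP2 (state 5).
function membersStillSyncing(members) {
  return members
    .filter(m => m.stateStr === "RECOVERING" || m.stateStr === "STARTUP2")
    .map(m => m.name);
}

// It is only safe to stepDown the primary when this list is empty.
const members = [
  { name: "example.com:27017", stateStr: "PRIMARY" },
  { name: "example.com:27018", stateStr: "RECOVERING" },
];
console.log(membersStillSyncing(members)); // -> ["example.com:27018"]
```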

Related

Resync a Mongo Replica Set

I have a replica set, and to free some disk space I want to resync its members.
Thus, on the SECONDARY member of the replica set, I've emptied the /var/lib/mongodb/ directory which holds the data for the database.
When I open a shell to the replica set and execute rs.status(), the following is shown:
{
"set" : "rs1",
"date" : ISODate("2016-12-13T08:28:00.414Z"),
"myState" : 5,
"term" : NumberLong(29),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "10.20.2.87:27017",
"health" : 1.0,
"state" : 5,
"stateStr" : "SECONDARY",
"uptime" : 148,
"optime" : {
"ts" : Timestamp(6363490787761586, 1),
"t" : NumberLong(29)
},
"optimeDate" : ISODate("2016-12-13T07:54:16.000Z"),
"infoMessage" : "could not find member to sync from",
"configVersion" : 3,
"self" : true
},
{
"_id" : 1,
"name" : "10.20.2.95:27017",
"health" : 1.0,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 146,
"optime" : {
"ts" : Timestamp(6363490787761586, 1),
"t" : NumberLong(29)
},
"optimeDate" : ISODate("2016-12-13T07:54:16.000Z"),
"lastHeartbeat" : ISODate("2016-12-13T08:27:58.435Z"),
"lastHeartbeatRecv" : ISODate("2016-12-13T08:27:59.447Z"),
"pingMs" : NumberLong(0),
"electionTime" : Timestamp(6363486827801739, 1),
"electionDate" : ISODate("2016-12-13T07:38:54.000Z"),
"configVersion" : 3
},
{
"_id" : 2,
"name" : "10.20.2.93:30001",
"health" : 1.0,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 146,
"lastHeartbeat" : ISODate("2016-12-13T08:27:58.437Z"),
"lastHeartbeatRecv" : ISODate("2016-12-13T08:27:59.394Z"),
"pingMs" : NumberLong(0),
"configVersion" : 3
}
],
"ok" : 1.0
}
Why does my secondary member show "could not find member to sync from" when my primary is up and running?
My collection is sharded over 6 servers, and I see this message on 2 replica set members: the ones that have the SECONDARY member listed first in the members array when requesting the replica set status.
I really would like to get rid of this error message.
It scares me :-)
Kind regards
I had a similar problem, and it was due to the heartbeat timeout being too short; you can solve it by raising the heartbeat timeout in the replica set configuration.
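For reference, a sketch of raising the heartbeat timeout in the mongo shell. `settings.heartbeatTimeoutSecs` is a real replica-set configuration field (default 10); the value 20 here is purely illustrative, and the reconfig must be run against the primary:

```javascript
// Run in the mongo shell, connected to the primary.
cfg = rs.conf();
cfg.settings = cfg.settings || {};
cfg.settings.heartbeatTimeoutSecs = 20; // illustrative value
rs.reconfig(cfg);
```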

MongoDB replicaset configuration

Could you please tell me if this will cause any issues with failover? For example, what would happen if host mongo2.local is down (i.e. the original host and the arbiter go down and only 2 members are left)? Will the remaining members ever be able to elect a new primary?
I know that there shouldn't be an arbiter here, as it makes things worse, but I want to know whether a failover will occur with this setup if mongo2.local goes down.
mongo:ARBITER> rs.status()
{
"set" : "mongo",
"date" : ISODate("2015-02-12T09:00:08Z"),
"myState" : 7,
"members" : [
{
"_id" : 0,
"name" : "mongo1.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2572473,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:07Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:07Z"),
"pingMs" : 0,
"syncingTo" : "mongo2.local:27017"
},
{
"_id" : 1,
"name" : "mongo2.local:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 12148099,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:08Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:08Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1423711411, 1),
"electionDate" : ISODate("2015-02-12T03:23:31Z")
},
{
"_id" : 2,
"name" : "mongo3.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 5474488,
"optime" : Timestamp(1423731603, 4),
"optimeDate" : ISODate("2015-02-12T09:00:03Z"),
"lastHeartbeat" : ISODate("2015-02-12T09:00:07Z"),
"lastHeartbeatRecv" : ISODate("2015-02-12T09:00:08Z"),
"pingMs" : 139,
"syncingTo" : "mongo2.local:27017"
},
{
"_id" : 3,
"name" : "mongo2.local:27020",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 12148101,
"self" : true
}
],
"ok" : 1
}
and:
mongo:ARBITER> rs.config()
{
"_id" : "mongo",
"version" : 5,
"members" : [
{
"_id" : 0,
"host" : "mongo1.local:27017",
"priority" : 0.5
},
{
"_id" : 1,
"host" : "mongo2.local:27017"
},
{
"_id" : 2,
"host" : "mongo3.local:27017",
"priority" : 0.5
},
{
"_id" : 3,
"host" : "mongo2.local:27020",
"arbiterOnly" : true
}
]
}
If less than a majority of the votes in a replica set are available, the set cannot elect or maintain a primary; it will be unhealthy and read-only. Ergo, if only 2 of your 4 members are up, you will not have a primary, and no automatic failover will occur because there aren't enough votes for an election.
Don't have an even number of nodes in a replica set. It increases the chances that there will be problems, just because there are more servers, without increasing the failure tolerance of the set. With 3 or 4 replica set members, 2 down servers will render the set unhealthy.
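The majority arithmetic above can be sketched in plain JavaScript (illustrative only, names hypothetical):

```javascript
// A replica set needs a strict majority of its voting members
// reachable to elect (or sustain) a primary.
function canElectPrimary(votingMembers, membersUp) {
  return membersUp > Math.floor(votingMembers / 2);
}

console.log(canElectPrimary(4, 2)); // false: 2 of 4 is not a majority
console.log(canElectPrimary(3, 2)); // true:  2 of 3 is a majority
```

This is why adding a fourth member (or an arbiter) doesn't buy extra failure tolerance: with either 3 or 4 members, losing 2 leaves you without a primary.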

mongo secondary has no queries after recovery

I have a test case, a sharding cluster with 1 shard.
The shard is a replica set with 1 primary and 2 secondaries.
My application uses the secondaryPreferred read preference. At first the queries were balanced over the two secondaries. Then I stopped one secondary, 10.160.243.22, to simulate a fault, and rebooted it; the status is OK:
rs10032:PRIMARY> rs.status()
{
"set" : "rs10032",
"date" : ISODate("2014-12-05T09:21:07Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "10.160.243.22:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2211,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"lastHeartbeat" : ISODate("2014-12-05T09:21:05Z"),
"lastHeartbeatRecv" : ISODate("2014-12-05T09:21:07Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: 10.160.188.52:27017",
"syncingTo" : "10.160.188.52:27017"
},
{
"_id" : 1,
"name" : "10.160.188.52:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 2211,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"electionTime" : Timestamp(1417770837, 1),
"electionDate" : ISODate("2014-12-05T09:13:57Z"),
"self" : true
},
{
"_id" : 2,
"name" : "10.160.189.52:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2209,
"optime" : Timestamp(1417771218, 3),
"optimeDate" : ISODate("2014-12-05T09:20:18Z"),
"lastHeartbeat" : ISODate("2014-12-05T09:21:07Z"),
"lastHeartbeatRecv" : ISODate("2014-12-05T09:21:06Z"),
"pingMs" : 0,
"syncingTo" : "10.160.188.52:27017"
}
],
"ok" : 1
}
but all queries go to the other secondary, 10.160.188.52, and 10.160.243.22 is idle.
Why are the queries not balanced across the two secondaries after recovery, and how can I fix it?
Your application uses some kind of driver (I don't know the exact technology stack you are using) to connect to MongoDB. Your driver may remember (cache) the replica set status or its connections for some period of time, so there is no guarantee that a secondary node will be used again immediately after a recovery.
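As a sketch of one way to make the driver notice the recovered secondary sooner, assuming the Node.js driver: `readPreference` and `heartbeatFrequencyMS` are standard MongoDB connection-string options (the hosts are taken from the question, and the 5000 ms value is illustrative):

```javascript
// Sketch only: lowering heartbeatFrequencyMS makes the driver
// re-check replica set topology more often, so a recovered
// secondary re-enters the read rotation sooner.
const { MongoClient } = require("mongodb");

const client = new MongoClient(
  "mongodb://10.160.243.22:27017,10.160.188.52:27017,10.160.189.52:27017" +
  "/?replicaSet=rs10032&readPreference=secondaryPreferred" +
  "&heartbeatFrequencyMS=5000"
);
```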

How come the primary member is behind a secondary in one MongoDB replica set

In our production environment we see strange behavior in a mongo replica set: our primary is always behind the secondaries.
rs.status():
{
"set" : "repl01",
"date" : ISODate("2014-02-20T11:11:28.000Z"),
"myState" : 2,
"syncingTo" : "prodsrv04:27018",
"members" : [
{
"_id" : 0,
"name" : "prodsrv02:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 6271,
"optime" : Timestamp(1392894670, 97),
"optimeDate" : ISODate("2014-02-20T11:11:10.000Z"),
"self" : true
},
{
"_id" : 1,
"name" : "prodsrv03:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 6270,
"optime" : Timestamp(1392894670, 68),
"optimeDate" : ISODate("2014-02-20T11:11:10.000Z"),
"lastHeartbeat" : ISODate("2014-02-20T11:11:28.000Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00.000Z"),
"pingMs" : 2
},
{
"_id" : 2,
"name" : "prodsrv04:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 6270,
"optime" : Timestamp(1392894669, 113),
"optimeDate" : ISODate("2014-02-20T11:11:09.000Z"),
"lastHeartbeat" : ISODate("2014-02-20T11:11:27.000Z"),
"lastHeartbeatRecv" : ISODate("2014-02-20T11:11:28.000Z"),
"pingMs" : 6
}
],
"ok" : 1
}
Master optime: Timestamp(1392894669, 113);
Slave optime : Timestamp(1392894670, 68);
How come?
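No answer is recorded here, but note how these optimes compare: a BSON Timestamp(seconds, increment) orders by the seconds field first and only then by the per-second increment, so the primary in this output is less than one second behind. A minimal comparison sketch in plain JavaScript (names hypothetical, values taken from the question):

```javascript
// Compare two BSON-style optimes {t: seconds, i: increment}:
// the seconds field dominates; the increment only breaks ties
// within the same second.
function compareOptimes(a, b) {
  if (a.t !== b.t) return a.t - b.t;
  return a.i - b.i;
}

const primary = { t: 1392894669, i: 113 }; // master optime from the question
const slave   = { t: 1392894670, i: 68 };  // slave optime from the question
console.log(compareOptimes(primary, slave) < 0); // true: primary is behind
```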

MongoDB Primary replica set member syncing to secondary

I have a replica set with three members, with host0:27100 as the primary member. Recently I changed the configuration and made host2:27102 the primary member, following these docs.
After changing the configuration, the rs.status() output says that host1:27101 is "syncingTo" : "host2:27102", which is intended.
But the output for the new primary host2:27102 shows it is "syncingTo" : "host0:27100", the previous primary, which is now a secondary.
I cannot understand why it's syncing to a secondary member. Is this normal behavior?
s0:SECONDARY> rs.status()
{
"set" : "s0",
"date" : ISODate("2013-09-25T12:31:42Z"),
"myState" : 2,
"syncingTo" : "host2:27102",
"members" : [
{
"_id" : 0,
"name" : "host0:27100",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 428068,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"self" : true
},
{
"_id" : 1,
"name" : "host1:27101",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 397,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"lastHeartbeat" : ISODate("2013-09-25T12:31:42Z"),
"lastHeartbeatRecv" : ISODate("2013-09-25T12:31:41Z"),
"pingMs" : 10,
"syncingTo" : "host2:27102"
},
{
"_id" : 2,
"name" : "host2:27102",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 397,
"optime" : Timestamp(1380112272, 1),
"optimeDate" : ISODate("2013-09-25T12:31:12Z"),
"lastHeartbeat" : ISODate("2013-09-25T12:31:42Z"),
"lastHeartbeatRecv" : ISODate("2013-09-25T12:31:41Z"),
"pingMs" : 2,
"syncingTo" : "host0:27100"
}
],
"ok" : 1
}
This is a known issue. There is an open ticket about rs.status() showing the primary as syncingTo when run from a secondary, if the current primary was a secondary in the past (SERVER-9989). The fix version is 2.5.1.