Primary election doesn't happen after the primary is killed on a MongoDB cluster - mongodb

I am trying to test a failover scenario for a MongoDB cluster. When I stop the primary, I don't see any new primary election in my Java application's logs; read/write operations are ignored and I get the following:
No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=mongo1:30001, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}, ServerDescription{address=mongo2:30002, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=3215664, setName='rs0', canonicalAddress=mongo2:30002, hosts=[mongo1:30001], passives=[mongo2:30002, mongo3:30003], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=Fri Mar 26 02:08:27 CET 2021, lastUpdateTimeNanos=91832460163658}, ServerDescription{address=mongo3:30003, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=3283858, setName='rs0', canonicalAddress=mongo3:30003, hosts=[mongo1:30001], passives=[mongo2:30002, mongo3:30003], arbiters=[], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=Fri Mar 26 02:08:27 CET 2021, lastUpdateTimeNanos=91832459878686}]}. Waiting for 30000 ms before timing out
I am using the following config:
var cfg = {
    "_id": "rs0",
    "protocolVersion": 1,
    "version": 1,
    "members": [
        {
            "_id": 0,
            "host": "mongo1:30001",
            "priority": 4
        },
        {
            "_id": 1,
            "host": "mongo2:30002",
            "priority": 3
        },
        {
            "_id": 2,
            "host": "mongo3:30003",
            "priority": 2
        }
    ]
};
rs.initiate(cfg, { force: true });
rs.secondaryOk();
db.getMongo().setReadPref('primary');
rs.isMaster() returns this:
{
"hosts" : [
"mongo1:30001"
],
"passives" : [
"mongo2:30002",
"mongo3:30003"
],
"setName" : "rs0",
"setVersion" : 1,
"ismaster" : true,
"secondary" : false,
"primary" : "mongo1:30001",
"me" : "mongo1:30001",
"electionId" : ObjectId("7fffffff0000000000000017"),
"lastWrite" : {
"opTime" : {
"ts" : Timestamp(1616719738, 1),
"t" : NumberLong(23)
},
"lastWriteDate" : ISODate("2021-03-26T00:48:58Z"),
"majorityOpTime" : {
"ts" : Timestamp(1616719738, 1),
"t" : NumberLong(23)
},
"majorityWriteDate" : ISODate("2021-03-26T00:48:58Z")
},
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 100000,
"localTime" : ISODate("2021-03-26T00:49:08.019Z"),
"logicalSessionTimeoutMinutes" : 30,
"connectionId" : 28,
"minWireVersion" : 0,
"maxWireVersion" : 8,
"readOnly" : false,
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1616719738, 1),
"signature" : {
"hash" : BinData(0,"/+QXGSyYY+M/OXbZ1UixjrDOVz4="),
"keyId" : NumberLong("6942620613131370499")
}
},
"operationTime" : Timestamp(1616719738, 1)
}
What I see here is that the hosts list contains the primary node and the passives list contains the secondaries. I don't know in which case all nodes are listed under hosts in a cluster setup, so that passives would be empty. The only related information I found is that the priority of the secondaries should not be 0; otherwise they won't be considered as candidates in a primary election.
"mongo1:30001"
],
"passives" : [
"mongo2:30002",
"mongo3:30003"
],...

From the docs:
isMaster.passives
An array of strings in the format of "[hostname]:[port]" listing all members of the replica set which have a members[n].priority of 0.
This field only appears if there is at least one member with a members[n].priority of 0.
Those nodes have been set to priority 0 somehow, and will therefore never attempt to become primary.
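If that is the case, giving those members a nonzero priority via rs.reconfig() makes them electable again. A minimal sketch (mongo shell, run against the current primary; the priority values below are illustrative, not taken from the question):

// Inspect the effective configuration and each member's priority
var cfg = rs.conf();
cfg.members.forEach(function (m) {
    print(m.host + " priority=" + m.priority);
});

// Assign nonzero priorities so every member can stand for election
// (values below are examples only)
cfg.members[0].priority = 4;
cfg.members[1].priority = 3;
cfg.members[2].priority = 2;
rs.reconfig(cfg);

After the reconfig, db.isMaster().passives should be empty and all three hosts should appear under hosts, at which point the secondaries can take part in an election when the primary is stopped.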

Related

[Mongodb][OpsManager]Mongodb secondary instances are not adding to replica set

I have created 3 MongoDB instances [Primary1, Secondary1, Secondary2] with the mongod.conf below:
net:
  port: 27017
  bindIp: 0.0.0.0
replication:
  replSetName: Replica1
I started the 3 instances using "sudo service mongod start".
If I connect from the primary server to the other 2 servers using mongo --host "ip" --port 27017, it works and connects.
Issue 1:
After I run rs.initiate() and rs.conf(),
Secondary 1 is included in members, but when I check rs.status() it shows "stateStr" : "STARTUP".
And when I check its log, I see the error below:
"msg":"Failed to reap transaction table","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
Issue 2:
Because of issue 1, when I try rs.add() for Secondary 2, it does not work.
rs.status() response:
{
"_id" : 0,
"name" : "primary:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
.......
}
{
"_id" : 1,
"name" : "secondary1:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 1579,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDurable" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2021-01-08T04:47:35.455Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"configVersion" : -2,
"configTerm" : -1
}
Log output from secondary 1's mongodb.log:
{"t":{"$date":"2021-01-08T04:39:16.634+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"primary:27017"}}
{"t":{"$date":"2021-01-08T04:39:18.004+00:00"},"s":"I", "c":"CONTROL", "id":20714, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
{"t":{"$date":"2021-01-08T04:39:18.004+00:00"},"s":"I", "c":"CONTROL", "id":20712, "ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions reap interval","attr":{"error":"NamespaceNotFound: config.system.sessions does not exist"}}
{"t":{"$date":"2021-01-08T04:39:36.634+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"primary:27017"}}
The issue got fixed. The primary database was able to connect to the secondary hosts, but the secondaries couldn't connect back. So I adjusted the security groups and they connected. The takeaway is that all primary and secondary hosts should be able to connect to each other on the MongoDB port (27017).
The same issue happened to me; the network and everything else were fine. The problem was a missing hosts-file entry for the primary server, while the secondary was still trying to connect by DNS name:
"ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"MONGODB01:27717"}}
As soon as I added the host entry, everything synced.
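A quick way to rule out this class of problem is to confirm from each member that it can reach every other member on the replication port. A rough sketch using the legacy shell's Mongo() constructor (hostnames here follow the question; substitute your own):

// Run from a mongo shell on the primary, then repeat from each secondary
["secondary1:27017", "secondary2:27017"].forEach(function (h) {
    try {
        var conn = new Mongo(h);   // throws if the host is unreachable
        printjson(conn.getDB("admin").runCommand({ ping: 1 }));
    } catch (e) {
        print("cannot reach " + h + ": " + e);
    }
});

If any direction fails, fix the firewall/security-group rules or hosts entries before touching the replica set configuration.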

Our config version of 1 is no larger than the version on torvm-core20.xyz.com:27019, which is 1

I was trying out SSL-enabled MongoDB 3.7.9 replica sets. Below is the code.
I ran this command on abc.xyz.com:27019
> rs.initiate({ _id: "rs0", configsvr: true, members: [{ _id : 0, host : "pqr.xyz.com:27019" }, { _id : 1, host : "abc.xyz.com:27019" }]});
{
"ok" : 0,
"errmsg" : "Our config version of 1 is no larger than the version on pqr.xyz.com:27019, which is 1",
"code" : 103,
"codeName" : "NewReplicaSetConfigurationIncompatible",
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("000000000000000000000000")
},
"$clusterTime" : {
"clusterTime" : Timestamp(1536753816, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"lastCommittedOpTime" : Timestamp(0, 0)
}
I did not find any hints on the internet. Can someone guide me on this? I am not able to see the replica set being created; ideally it should have been created.
This error says that you are trying to initialize a new replica set, but the node at pqr.xyz.com:27019 already holds a replica set configuration with the same _id. The check compares config version values to avoid mismatched replica set configs. See the code for more details.
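To confirm that, you can connect to the node named in the error and look at the configuration it already stores; a sketch (host and port taken from the question, add your usual SSL options when connecting):

// On pqr.xyz.com:27019
rs.conf()      // the replica set configuration the node already holds
rs.status()    // the members it currently believes are in the set

If that configuration is stale and the node is really meant to join a fresh set, one common approach is to stop its mongod, remove the old configuration by clearing the node's local database (or its data directory), restart it, and then add it from the intended primary with rs.add(); otherwise, reconfigure the existing set from its current primary instead of initiating a new one.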

In MongoDB 3.0 replication, how do elections happen when a secondary goes down

Situation: I have a MongoDB replica set spread over two computers.
One computer is a server that holds the primary node and the arbiter. This server is a live server and is always on. Its local IP used for replication is 192.168.0.4.
The second is a PC that the secondary node resides on and is on for a few hours a day. Its local IP used for replication is 192.168.0.5.
My expectation: I wanted the live server to be the main point of data interaction for my application, regardless of the state of the PC (whether it is reachable or not, since the PC is secondary), so I wanted to make sure the server's node is always primary.
The following is the result of rs.config():
liveSet:PRIMARY> rs.config()
{
"_id" : "liveSet",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "192.168.0.4:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 10,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 1,
"host" : "192.168.0.5:5051",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 2,
"host" : "192.168.0.4:5052",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatTimeoutSecs" : 10,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
Also I have set the storage engine to be WiredTiger, if that matters.
What I actually get, and the problem: When I turn off the PC, or kill its mongod process, then the node on the server becomes secondary.
The following is the output of the server when I killed PC's mongod process, while connected to primary node's shell:
liveSet:PRIMARY>
2015-11-29T10:46:29.471+0430 I NETWORK Socket recv() errno:10053 An established connection was aborted by the software in your host machine. 127.0.0.1:27017
2015-11-29T10:46:29.473+0430 I NETWORK SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:27017]
2015-11-29T10:46:29.475+0430 I NETWORK DBClientCursor::init call() failed
2015-11-29T10:46:29.479+0430 I NETWORK trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2015-11-29T10:46:29.481+0430 I NETWORK reconnect 127.0.0.1:27017 (127.0.0.1) ok
liveSet:SECONDARY>
I have two doubts here:
Considering this part of MongoDB documentation:
Replica sets use elections to determine which set member will become primary. Elections occur after initiating a replica set, and also any time the primary becomes unavailable.
The election occurs when the primary is not available (or at the time of initiating, though that part does not concern our case), but the primary was always available, so why does an election happen?
Considering this part of the same documentation:
If a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only.
Considering the part 'members become read-only', I have two nodes up versus one down, so this should not affect our replication either.
Now my question: how do I keep the node on the server as primary when the node on the PC is not reachable?
Update:
This is the output of rs.status().
Thanks to Wan Bachtiar, this now makes the behavior obvious, since the arbiter was not reachable.
liveSet:PRIMARY> rs.status()
{
"set" : "liveSet",
"date" : ISODate("2015-11-30T04:33:03.864Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.0.4:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1807553,
"optime" : Timestamp(1448796026, 1),
"optimeDate" : ISODate("2015-11-29T11:20:26Z"),
"electionTime" : Timestamp(1448857488, 1),
"electionDate" : ISODate("2015-11-30T04:24:48Z"),
"configVersion" : 2,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.0.5:5051",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 496,
"optime" : Timestamp(1448796026, 1),
"optimeDate" : ISODate("2015-11-29T11:20:26Z"),
"lastHeartbeat" : ISODate("2015-11-30T04:33:03.708Z"),
"lastHeartbeatRecv" : ISODate("2015-11-30T04:33:02.451Z"),
"pingMs" : 1,
"configVersion" : 2
},
{
"_id" : 2,
"name" : "192.168.0.4:5052",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"lastHeartbeat" : ISODate("2015-11-30T04:33:00.008Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"configVersion" : -1
}
],
"ok" : 1
}
liveSet:PRIMARY>
As stated in the documentation, if a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only.
In this case the primary has to step down if the arbiter and the secondary are not reachable. rs.status() should be able to determine the health of the replica members.
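To make the arithmetic explicit, you can derive the required majority from the configuration; a small sketch (mongo shell):

// Count voting members and the strict majority needed to elect/keep a primary
var members = rs.conf().members;
var voters = members.filter(function (m) { return m.votes > 0; }).length;
var majority = Math.floor(voters / 2) + 1;
print("voting members: " + voters + ", majority needed: " + majority);

Here there are 3 voting members (primary, secondary, arbiter), so the majority is 2; with the PC's secondary off and the arbiter unreachable, only 1 voting member remains and the primary has to step down to SECONDARY.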
One thing you should also watch for is the primary oplog size. The size of the oplog determines how long a replica set member can be down for and still be able to catch up when it comes back online. The bigger the oplog size, the longer you can deal with a member being down for as the oplog can hold more operations. If it does fall too far behind, you must resynchronise the member by removing its data files and performing an initial sync.
See Check the size of the Oplog for more info.
Regards,
Wan.
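For reference, the oplog window and secondary lag mentioned in the answer above can be checked from the shell; a quick sketch:

// On the primary: oplog size and the time span of operations it currently holds
rs.printReplicationInfo()

// Lag of each secondary relative to the primary
rs.printSlaveReplicationInfo()

// The same oplog information as a document, handy for scripting
printjson(db.getReplicationInfo())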

Mongo 2.6.4 won't stepDown() because it can't find a secondary within 10 seconds, but rs.status() shows optimeDates in sync

I'm attempting to step down my mongo primary and have my secondaries take over - mongo won't step down and says my secondaries are more than 10 seconds out of sync. Yet my replica set says they are in sync - I'm baffled, and it is likely something silly I'm missing.
here's my output:
MongoDB shell version: 2.6.4
sessionV2:PRIMARY> rs.stepDown()
{
"closest" : NumberLong(0),
"difference" : NumberLong(1441842526),
"ok" : 0,
"errmsg" : "no secondaries within 10 seconds of my optime"
}
sessionV2:PRIMARY> rs.status()
{
"set" : "sessionV2",
"date" : ISODate("2015-09-09T23:48:53Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "sessionv2-mongo-replset-moprd1-02:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 2659,
"optime" : Timestamp(1441842533, 61),
"optimeDate" : ISODate("2015-09-09T23:48:53Z"),
"electionTime" : Timestamp(1441839881, 1),
"electionDate" : ISODate("2015-09-09T23:04:41Z"),
"self" : true
},
{
"_id" : 1,
"name" : "sessionv2-mongo-replset-moprd1-01:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2658,
"optime" : Timestamp(1441842531, 120),
"optimeDate" : ISODate("2015-09-09T23:48:51Z"),
"lastHeartbeat" : ISODate("2015-09-09T23:48:51Z"),
"lastHeartbeatRecv" : ISODate("2015-09-09T23:48:51Z"),
"pingMs" : 0,
"syncingTo" : "sessionv2-mongo-replset-moprd1-03:27017"
},
{
"_id" : 2,
"name" : "sessionv2-mongo-replset-moprd1-03:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 2658,
"optime" : Timestamp(1441842531, 120),
"optimeDate" : ISODate("2015-09-09T23:48:51Z"),
"lastHeartbeat" : ISODate("2015-09-09T23:48:51Z"),
"lastHeartbeatRecv" : ISODate("2015-09-09T23:48:52Z"),
"pingMs" : 0,
"syncingTo" : "sessionv2-mongo-replset-moprd1-02:27017"
}
],
"ok" : 1
}
sessionV2:PRIMARY>
here's what the primary reports as far a status:
sessionV2:PRIMARY> rs.printSlaveReplicationInfo()
source: sessionv2-mongo-replset-moprd1-01:27017
syncedTo: Wed Sep 09 2015 19:15:02 GMT-0500 (CDT)
1 secs (0 hrs) behind the primary
source: sessionv2-mongo-replset-moprd1-03:27017
syncedTo: Wed Sep 09 2015 19:15:02 GMT-0500 (CDT)
1 secs (0 hrs) behind the primary
sessionV2:PRIMARY>
and an oplog view from a secondary:
sessionV2:SECONDARY> db.getReplicationInfo()
{
"logSizeMB" : 5120,
"usedMB" : 5077.25,
"timeDiff" : 12226,
"timeDiffHours" : 3.4,
"tFirst" : "Wed Sep 09 2015 15:53:29 GMT-0500 (CDT)",
"tLast" : "Wed Sep 09 2015 19:17:15 GMT-0500 (CDT)",
"now" : "Wed Sep 09 2015 19:17:15 GMT-0500 (CDT)"
}
thanks in advance!
2015 Sept 10th update:
We stopped each secondary and performed an initial sync from the primary, then attempted to step down the primary again. It looks like the PRIMARY can't find the secondaries' oplogDate (we were unsure whether forcing would free a SECONDARY to take the PRIMARY role).
sessionV2:PRIMARY> db.runCommand( { replSetStepDown: 60, force: false } )
{
"closest" : NumberLong(0),
"difference" : NumberLong(1441936029),
"ok" : 0,
"errmsg" : "no secondaries within 10 seconds of my optime"
}
Issue solved! I was playing with workarounds and documenting the replica setup. Our initial scripts set the primary at the default priority and the secondaries at priority 0, which means they will never take the PRIMARY role. So basically: bad config, and the error message gives no hint about the root problem (the docs are pretty clear; I just missed it because I trusted our init scripts). If you run into this and your replica set oplogs are up to date, double-check that the priorities are not set to 0.
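For anyone hitting the same thing, a quick check from the primary's shell will flag the misconfigured members (a sketch; the fix is the same kind of rs.reconfig() priority change shown earlier on this page):

// List members that can never become primary because their priority is 0
rs.conf().members
    .filter(function (m) { return m.priority === 0; })
    .forEach(function (m) { print(m.host + " has priority 0 and cannot be elected"); });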

mongodb - All nodes in replica set are primary

I am trying to configure a replica set with two nodes, but when I execute rs.add("node2") and then rs.status(), both nodes are set to PRIMARY. Also, when I run rs.status() on the other node, the only node that appears is the local one.
Edit1:
rs.status() output:
{
"set" : "rs0",
"date" : ISODate("2012-09-22T01:01:12Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "node1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 70968,
"optime" : Timestamp(1348207012000, 1),
"optimeDate" : ISODate("2012-09-21T05:56:52Z"),
"self" : true
},
{
"_id" : 1,
"name" : "node2:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 68660,
"optime" : Timestamp(1348205568000, 1),
"optimeDate" : ISODate("2012-09-21T05:32:48Z"),
"lastHeartbeat" : ISODate("2012-09-22T01:01:11Z"),
"pingMs" : 0
}
],
"ok" : 1
}
Edit2: I tried doing the same thing with 3 different nodes and I got the same result (rs.status() says I have a replica set with three primary nodes). Is it possible that this problem is caused by some specific configuration of the network?
If you issue rs.initiate() on both of the members of the replica set before rs.add(), then both will come up as primary.
You should only use rs.initiate() on one of the members of the replica set, the one that you intend to be primary initially. Then you can rs.add() the other member to the replica set.
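For a two-node setup like the one in the question, the sequence would look roughly like this (a sketch; "rs0", node1 and node2 stand in for your own set name and hostnames):

// On node1 only: initiate the set with node1 as the sole initial member
rs.initiate({ _id: "rs0", members: [ { _id: 0, host: "node1:27017" } ] })

// Still on node1, once it has become PRIMARY, add the second member
rs.add("node2:27017")

// node2 should join as SECONDARY; do not run rs.initiate() on it
rs.status()

Note that a two-member set cannot keep a primary if either member goes down; adding an arbiter or a third data-bearing member avoids that.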
The answer above does not explain how to fix it. I kind of got it done by trial and error.
I cleaned up the data directory (as in rm -rf *) and restarted all of these PRIMARY nodes except one, then added them back. It seems to work.
Edit1
The nice little trick below did not seem to work for me,
So, I logged into the mongod console using mongo <hostname>:27018
Here is what the shell looks like:
rs2:PRIMARY> rs.conf()
{
"_id" : "rs2",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "ip-10-159-42-911:27018"
}
]
}
I decided to change it to secondary. So,
rs2:PRIMARY> var c = {
... "_id" : "rs2",
... "version" : 1,
... "members" : [
... {
... "_id" : 1,
... "host" : "ip-10-159-42-911:27018",
... "priority": 0.5
... }
... ]
... }
rs2:PRIMARY> rs.reconfig(c, { "force": true})
Mon Nov 11 19:46:39.244 DBClientCursor::init call() failed
Mon Nov 11 19:46:39.245 trying reconnect to ip-10-159-42-911:27018
Mon Nov 11 19:46:39.245 reconnect ip-10-159-42-911:27018 ok
reconnected to server after rs command (which is normal)
rs2:SECONDARY>
Now it is secondary. I do not know if there is a better way. But this seems to work.
HTH