[Mongodb][OpsManager]Mongodb secondary instances are not adding to replica set - mongodb

I have created 3 mongodb instances with below mongod.conf [Primary1,Secondary1,Secondary2]
net:
port: 27017
bindIp: 0.0.0.0
replication:
replSetName: Replica1
I have started the 3 instance using " sudo service mongod start "
If I connect primary server to other 2 servers using mongo –host "ip" –port 27017..its working and connected.
Issue 1:
After I do rs.initiate() and rs.conf()
Secondary 1 is included in members but when I use rs.status() but showing stateStr" : "STARTUP"
And If I check its log means below error :
"msg":"Failed to reap transaction table","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
Issue 2:
Because of issue 1 if I try to rs.add() for Secondary 2 ,its not working
rs.status() reponse
{
"_id" : 0,
"name" : "primary:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
.......
}
{
"_id" : 1,
"name" : "secondary1:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 1579,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDurable" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2021-01-08T04:47:35.455Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"configVersion" : -2,
"configTerm" : -1
}
Log Output of secondary 1's mongodb.log
{"t":{"$date":"2021-01-08T04:39:16.634+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"primary:27017"}}
{"t":{"$date":"2021-01-08T04:39:18.004+00:00"},"s":"I", "c":"CONTROL", "id":20714, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"NotYetInitialized: Replication has not yet been configured"}}
{"t":{"$date":"2021-01-08T04:39:18.004+00:00"},"s":"I", "c":"CONTROL", "id":20712, "ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions reap interval","attr":{"error":"NamespaceNotFound: config.system.sessions does not exist"}}
{"t":{"$date":"2021-01-08T04:39:36.634+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"primary:27017"}}

Issue got fixed. Primary database was able to connect to secondary hosts but in return secondary host can't. So I adjusted the security groups and connected. Takeaway is all the primary and secondary hosts should be able to connect with each other in mongodb port (27017)

Same issue happened for me , networks and everything is fine . Issue with host entry missing for primary server, still secondary was trying to connect DNS name
"ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"MONGODB01:27717"}}
Immediately have added into host entry everything is synced

Related

MonogoDB Replica Set Status Not changing from Startup to Secondary

I have setup a MongoDB replica set with 3 nodes(vm's running CentOS). One node became Primary other 2 stuck in Startup. When these 2 nodes will change their states from startup to secondary.
aryabhata:PRIMARY> rs.status()
{
"set" : "aryabhata",
"date" : ISODate("2016-04-30T08:10:45.173Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost.localdomain:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 69091,
"optime" : Timestamp(1461935462, 1),
"optimeDate" : ISODate("2016-04-29T13:11:02Z"),
"electionTime" : Timestamp(1461934754, 1),
"electionDate" : ISODate("2016-04-29T12:59:14Z"),
"configVersion" : 459192,
"self" : true
},
{
"_id" : 1,
"name" : "repset1.com:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 92,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-04-30T08:10:44.485Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"configVersion" : -2
},
{
"_id" : 2,
"name" : "repset2.com:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 68382,
"lastHeartbeat" : ISODate("2016-04-30T08:10:43.974Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"configVersion" : -2
}
],
"ok" : 1
}
My problem fix with set ip address for Primary instead hostname
cfg = rs.conf()
cfg.members[0].host = "public-or-private-primary-ip:27017"
rs.reconfig(cfg)
after that secondary state change to STARTUP2
From primary check whether you are able to connect to secondary
mongo --host repset1.com --port 27017
When the above one fails may be firewall or BindIP issue.
Check bind_ip (should be 0.0.0.0, change in mongodb.conf is it's 127.0.0.1):
netstat -plunt | grep :27017 | grep LISTEN
Look at the log-files of secondaries, why they are stuck. Did they receive the configuration details?
Try to reconfigure, see mongo replicaset reconfigure
For me the problem was that primary had authorization enabled. In this case the secondaries always stayed in STARTUP.
To use authorization you need to set keyFile in configuration file of all nodes (primary and secondary).
Create mongodb key file on linux:
openssl rand -base64 741 > mongodb.key
chmod 600 mongodb.key
chown mongod:mongod mongodb.key
mongod.conf file:
replication:
replSetName: rs0
security:
authorization: enabled
keyFile: /home/mongodb.key
Source MongoDB replica set with simple password authentication
It requires that both the primary resolves the host name to IP of the secondary and also the secondary resolves the hostname of the primary to an IP.
In my case I forgot to add in hosts file for the secondary to resolve the hostname of the primary. Once I updated the hosts file in the secondary, the state of the secondary transitioned to STARTUP2 and then to SECONDARY.

In mongodb 3.0 replication, how elections happen when a secondary goes down

Situation: I have a MongoDB replication set over two computers.
One computer is a server that holds the primary node and the arbiter. This server is a live server and is always on. It's local IP that is used in replication is 192.168.0.4.
Second is a PC that the secondary node resides on and is on for a few hours a day. It's local IP that is used in replication is 192.168.0.5.
My expectation: I wanted the live server to be the main point of data interaction of my application, regardless of the state of the PC (whether it is reachable or not, since PC is secondary), so I wanted to make sure that server's node is always primary.
The following is the result of rs.config():
liveSet:PRIMARY> rs.config()
{
"_id" : "liveSet",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "192.168.0.4:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 10,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 1,
"host" : "192.168.0.5:5051",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 2,
"host" : "192.168.0.4:5052",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatTimeoutSecs" : 10,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
Also I have set the storage engine to be WiredTiger, if that matters.
What I actually get, and the problem: When I turn off the PC, or kill its mongod process, then the node on the server becomes secondary.
The following is the output of the server when I killed PC's mongod process, while connected to primary node's shell:
liveSet:PRIMARY>
2015-11-29T10:46:29.471+0430 I NETWORK Socket recv() errno:10053 An established connection was aborted by the software in your host machine. 127.0.0.1:27017
2015-11-29T10:46:29.473+0430 I NETWORK SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:27017]
2015-11-29T10:46:29.475+0430 I NETWORK DBClientCursor::init call() failed
2015-11-29T10:46:29.479+0430 I NETWORK trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2015-11-29T10:46:29.481+0430 I NETWORK reconnect 127.0.0.1:27017 (127.0.0.1) ok
liveSet:SECONDARY>
There are two doubts for me:
Considering this part of MongoDB documentation:
Replica sets use elections to determine which set member will become primary. Elections occur after initiating a replica set, and also any time the primary becomes unavailable.
The election occurs when the primary is not available (or at the time of initiating, however this is part does not concern our case), but primary was always available, so why the election happens.
Considering this part of the same documentation:
If a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only.
Considering the part 'members become read-only', I have two nodes up vs one down, so this should not also affect our replication.
Now my question: How to keep the node on the server as primary, when the node on PC is not reachable?
Update:
This is the output of rs.status().
Thanks to Wan Bachtiar, now This makes the behavior obvious, since arbiter was not reachable.
liveSet:PRIMARY> rs.status()
{
"set" : "liveSet",
"date" : ISODate("2015-11-30T04:33:03.864Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.0.4:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1807553,
"optime" : Timestamp(1448796026, 1),
"optimeDate" : ISODate("2015-11-29T11:20:26Z"),
"electionTime" : Timestamp(1448857488, 1),
"electionDate" : ISODate("2015-11-30T04:24:48Z"),
"configVersion" : 2,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.0.5:5051",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 496,
"optime" : Timestamp(1448796026, 1),
"optimeDate" : ISODate("2015-11-29T11:20:26Z"),
"lastHeartbeat" : ISODate("2015-11-30T04:33:03.708Z"),
"lastHeartbeatRecv" : ISODate("2015-11-30T04:33:02.451Z"),
"pingMs" : 1,
"configVersion" : 2
},
{
"_id" : 2,
"name" : "192.168.0.4:5052",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"lastHeartbeat" : ISODate("2015-11-30T04:33:00.008Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"configVersion" : -1
}
],
"ok" : 1
}
liveSet:PRIMARY>
As stated in the documentation, if a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only.
In this case the primary has to step down if the arbiter and the secondary are not reachable. rs.status() should be able to determine the health of the replica members.
One thing you should also watch for is the primary oplog size. The size of the oplog determines how long a replica set member can be down for and still be able to catch up when it comes back online. The bigger the oplog size, the longer you can deal with a member being down for as the oplog can hold more operations. If it does fall too far behind, you must resynchronise the member by removing its data files and performing an initial sync.
See Check the size of the Oplog for more info.
Regards,
Wan.

Sitecore/xDB/MongoDB arbiter instance receiving application account connection attempts

I have a Sitecore 8 test environment with three mongo 2.6.5 instances for xDB. These are configured as a replication set using a keyfile with two servers 'mongotest1' & 'mongotest2' and one arbiter 'mongotest3'. A non-admin account has been createdin mongo for the web application, stored in the admin database with readWrite permissions in all five xDB databases. My sitecore connection strings are in the format:
connectionString="mongodb://webapp:password#mongotest1:27018,mongotest2.local:27019/sc8-tracking-history?replicaSet=repSet1&authSource=admin
Note that the arbiter is not specified in the connection string, which is normal.
rs.status() gives the following, which looks correct:
"date" : ISODate("2015-07-23T16:47:08Z"),
"myState" : 2,
"syncingTo" : "mongotest1.local:27018",
"members" : [
{
"_id" : 0,
"name" : "mongotest1.local:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 93,
"optime" : Timestamp(1437668096, 1),
"optimeDate" : ISODate("2015-07-23T16:14:56Z"),
"lastHeartbeat" : ISODate("2015-07-23T16:47:07Z"),
"lastHeartbeatRecv" : ISODate("2015-07-23T16:47:08Z"),
"pingMs" : 0,
"electionTime" : Timestamp(1437669503, 1),
"electionDate" : ISODate("2015-07-23T16:38:23Z")
},
{
"_id" : 1,
"name" : "mongotest2.local:27019",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 93,
"optime" : Timestamp(1437668096, 1),
"optimeDate" : ISODate("2015-07-23T16:14:56Z"),
"infoMessage" : "syncing to: mongotest1.local:27018",
"self" : true
},
{
"_id" : 2,
"name" : "mongotest3.local:27020",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 91,
"lastHeartbeat" : ISODate("2015-07-23T16:47:07Z"),
"lastHeartbeatRecv" : ISODate("2015-07-23T16:47:07Z"),
"pingMs" : 0
}
],
"ok" : 1
When the three mongo instances are started, normal log output is observed. However, after the Sitecore web application connects to mongo I see connection attempts using the 'webapp' account to the arbiter service which fail because the arbiter does not hold an admin database with any user accounts defined. The frequency of this increases if I shutdown mongotest1 and mongotest2 instances. If I shutdown Sitecore, the connection attempts stop.
2015-07-23T18:02:42.741+0100 [conn699] end connection 127.0.0.1:62206 (2 connections now open)
2015-07-23T18:02:43.610+0100 [conn689] end connection 127.0.0.1:62193 (1 connection now open)
2015-07-23T18:02:43.611+0100 [initandlisten] connection accepted from 127.0.0.1:62207 #700 (3 connections now open)
2015-07-23T18:02:43.613+0100 [conn700] authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
2015-07-23T18:02:43.890+0100 [initandlisten] connection accepted from 127.0.0.1:62208 #701 (3 connections now open)
2015-07-23T18:02:43.891+0100 [conn701] authenticate db: admin { authenticate: 1, user: "webapp", nonce: "xxx", key: "xxx" }
2015-07-23T18:02:43.891+0100 [conn701] Failed to authenticate webapp#admin with mechanism MONGODB-CR: AuthenticationFailed UserNotFound Could not find
user webapp#admin
Why is this happening? The arbiter shouldn't be receiving any connection attempts from clients. Its as if the replica set is sharing the arbiter connection to the web application which then invokes some connection attempts.
This was a known c-sharp driver issue, see https://jira.mongodb.org/browse/CSHARP-851
Some aspects of the situation are Sitecore release specific - the bundled driver (v1.8.3.9) seems to slightly predate the driver released with v2.6.5, which meant it predated the fix.
Sitecore 8 update 5 updates Mongo and includes the fix. Alternatively a minimal fix can be achieved by upgrading the bundled MongoDB.Driver.dll to v1.9 and adding an assembly redirect to web.config.
I blogged a more verbose version of this answer at http://blog.paulgeorge.co.uk/2016/01/26/sitecore-mongo-arbiter-shows-failed-authentication-attempts/

mongodb java driver not finding master after rs.stepDown()

Grails 2.2.1
MongoDB GORM plugin 1.2
When running with a replica set I am finding that stepping down the primary causes the following infinitely repeated errors in the java driver.
2013-09-09 16:00:19,655 [SimpleAsyncTaskExecutor-1] ERROR grails.app.services.plover.UserStreamAnalyzerService - Exception while handling status update event: org.springframework.data.mongodb.UncategorizedMongoDbException: not talking to master and retries used up; nested exception is com.mongodb.MongoException: not talking to master and retries used up
...
Caused by: org.springframework.data.mongodb.UncategorizedMongoDbException: not talking to master and retries used up; nested exception is com.mongodb.MongoException: not talking to master and retries used up
The stacktrace is here:
Caused by: com.mongodb.MongoException: not talking to master and retries used up
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:314)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:316)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:257)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DBCursor._check(DBCursor.java:368)
at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
at com.mongodb.DBCursor._fill(DBCursor.java:518)
at com.mongodb.DBCursor.toArray(DBCursor.java:553)
at com.mongodb.DBCursor.toArray(DBCursor.java:542)
at org.grails.datastore.mapping.mongo.query.MongoQuery$MongoResultList.<init>(MongoQuery.java:908)
at org.grails.datastore.mapping.mongo.query.MongoQuery$36.doInDB(MongoQuery.java:536)
at org.grails.datastore.mapping.mongo.query.MongoQuery$36.doInDB(MongoQuery.java:508)
I have setup a local test environment to replicate this problem. Here is the config output:
{
"set" : "rsMesh",
"date" : ISODate("2013-09-10T01:08:20Z"),
"myState" : 2,
"syncingTo" : "macbookpro.local:27018",
"members" : [
{
"_id" : 1,
"name" : "macbookpro.local:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 9940,
"optime" : {
"t" : 1378767619,
"i" : 5
},
"optimeDate" : ISODate("2013-09-09T23:00:19Z"),
"lastHeartbeat" : ISODate("2013-09-10T01:08:19Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "macbookpro.local:27019",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 9914,
"lastHeartbeat" : ISODate("2013-09-10T01:08:19Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "macbookpro.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 10392,
"optime" : {
"t" : 1378767619,
"i" : 5
},
"optimeDate" : ISODate("2013-09-09T23:00:19Z"),
"self" : true
}
],
"ok" : 1
}
Replica set configuration has been set in Datasource.groovy as per documentation:
grails {
mongo {
replicaSet = ["macbookpro.local:27017", "macbookpro.local:27018", "macbookpro.local:27019"]
}
}
So I am not running in standalone, the replica set servers are synched properly, and all servers are running properly. But if I force a new server to become the primary then all access appears to fail as if the driver was not redirecting queries to the new primary.
What am I missing?
No one ever answered this so I decided that the recovery from replica sets was non-functional. Instead I went to sharding and hoped that layering a mongos between the app server and the clusters themselves would provide enough protection.
The answer was a definitive "sort of". Before, when I stepped a primary down (or simulated a crash) the app server would hang indefinitely. Now I just get a few errors about not finding a primary in a given cluster and then the system recovers. Not a desirable solution but at least it is better than a permanent failure.

Why a member of mongodb keep RECOVERING?

I set up a replica set with three members and one of them is an arbiter.
One time I restart a member, the member keep RECOVERING for a long time and did not be SECONDARY again, even though the database was not large.
The status of replica set is like that:
rs:PRIMARY> rs.status()
{
"set" : "rs",
"date" : ISODate("2013-01-17T02:08:57Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "192.168.1.52:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 67968,
"optime" : Timestamp(1358388479000, 1),
"optimeDate" : ISODate("2013-01-17T02:07:59Z"),
"self" : true
},
{
"_id" : 2,
"name" : "192.168.1.50:29017",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 107,
"lastHeartbeat" : ISODate("2013-01-17T02:08:56Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "192.168.1.50:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 58,
"optime" : Timestamp(1358246732000, 100),
"optimeDate" : ISODate("2013-01-15T10:45:32Z"),
"lastHeartbeat" : ISODate("2013-01-17T02:08:55Z"),
"pingMs" : 0,
"errmsg" : "still syncing, not yet to minValid optime 50f6472f:5d"
}
],
"ok" : 1
}
How should I fix this problem?
I had exact same issue: Secondary member of replica stuck in recovering mode.
Here how to solve the issue:
stop secondary mongo db
delete all secondary db data files
start secondary mongo
It will start in startup2 mode and will replicate all data from Primary
I've fixed the issue by following the below procedure.
Step1:
Login to different node and remove the issue node from mongodb replicaset. eg.
rs.remove("10.x.x.x:27017")
Step 2:
Stop the mongodb server on the issue node
systemctl stop mongodb.service
Step 3:
Create a new new folder on the dbpath
mkdir /opt/mongodb/data/db1
Note : existing path was /opt/mongodb/data/db
Step 4:
Modify dbpath on /etc/mongod.conf or mongdb yaml file
dbPath: /opt/mongodb/data/db1
Step 5:
Start the mongodb service
systemctl start mongodb.service
Step 6:
Takebackup of the existing folder and remove it
mkdir /opt/mongodb/data/backup
mv /opt/mongodb/data/db/* /opt/mongodb/data/backup
tar -cvf /opt/mongodb/data/backup.tar.gz /opt/mongodb/data/backup
rm -rf /opt/mongodb/data/db/
This will happen if replication has been broken for a while and on the slave it's not enough data to resume replication.
You would have to re-sync the slave either by replicating data from scratch or by copying it from another server and then resume it.
Check mongodb documentation for this issue https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-auto-resync-stale-member