Why does a member of a MongoDB replica set keep RECOVERING?

I set up a replica set with three members, one of which is an arbiter.
One time I restarted a member, and it stayed in RECOVERING for a long time without returning to SECONDARY, even though the database was not large.
The replica set status looks like this:
rs:PRIMARY> rs.status()
{
"set" : "rs",
"date" : ISODate("2013-01-17T02:08:57Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "192.168.1.52:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 67968,
"optime" : Timestamp(1358388479000, 1),
"optimeDate" : ISODate("2013-01-17T02:07:59Z"),
"self" : true
},
{
"_id" : 2,
"name" : "192.168.1.50:29017",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 107,
"lastHeartbeat" : ISODate("2013-01-17T02:08:56Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "192.168.1.50:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 58,
"optime" : Timestamp(1358246732000, 100),
"optimeDate" : ISODate("2013-01-15T10:45:32Z"),
"lastHeartbeat" : ISODate("2013-01-17T02:08:55Z"),
"pingMs" : 0,
"errmsg" : "still syncing, not yet to minValid optime 50f6472f:5d"
}
],
"ok" : 1
}
How should I fix this problem?

I had the exact same issue: a secondary member of the replica set was stuck in RECOVERING.
Here is how to solve it:
stop the secondary mongod
delete all of the secondary's data files
start the secondary again
It will start in STARTUP2 state and replicate all data from the primary; you can watch the progress with the sketch below.
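A quick way to watch the resync is to print each member's state from the primary until the wiped node reaches SECONDARY; a minimal mongo-shell sketch (the member names come from your own rs.status() output):
// run on the primary; the wiped member should go STARTUP2 -> SECONDARY
rs.status().members.forEach(function (m) {
    print(m.name, m.stateStr);
});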

I fixed the issue by following the procedure below.
Step 1:
Log in to a different node and remove the problem node from the replica set, e.g.
rs.remove("10.x.x.x:27017")
Step 2:
Stop the mongodb server on the issue node
systemctl stop mongodb.service
Step 3:
Create a new folder for the dbPath
mkdir /opt/mongodb/data/db1
Note: the existing path was /opt/mongodb/data/db
Step 4:
Modify dbPath in /etc/mongod.conf (or the MongoDB YAML config file)
dbPath: /opt/mongodb/data/db1
Step 5:
Start the mongodb service
systemctl start mongodb.service
Step 6:
Take a backup of the existing folder and remove it
mkdir /opt/mongodb/data/backup
mv /opt/mongodb/data/db/* /opt/mongodb/data/backup
tar -cvf /opt/mongodb/data/backup.tar.gz /opt/mongodb/data/backup
rm -rf /opt/mongodb/data/db/
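Because the node was removed from the replica set in Step 1, it usually still has to be added back from the primary so it can perform an initial sync into the new dbPath; a hedged sketch (the IP is the placeholder from Step 1):
// run in the mongo shell on the current primary
rs.add("10.x.x.x:27017")
// the re-added node should move through STARTUP2 to SECONDARY as the initial sync completes
rs.status()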

This will happen if replication has been broken for a while and the secondary no longer has enough data to resume replication.
You have to re-sync the secondary, either by replicating the data from scratch or by copying it from another server, and then let replication resume.
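One way to confirm the member really is too stale to resume, assuming you can run the mongo shell on the primary, is to compare the primary's oplog window with the stuck member's last applied optime; a sketch:
// on the primary: prints "oplog first event time" and "oplog last event time"
db.printReplicationInfo()
// on any member: each node's state and last applied optimeDate
rs.status().members.forEach(function (m) {
    print(m.name, m.stateStr, m.optimeDate);
});
// if the stuck member's optimeDate is older than the primary's "oplog first event time",
// it can no longer catch up from the oplog and needs a full re-sync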

See the MongoDB documentation for this issue: https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-auto-resync-stale-member

Related

MongoDB Replica Set Status Not changing from Startup to Secondary

I have set up a MongoDB replica set with 3 nodes (VMs running CentOS). One node became PRIMARY; the other 2 are stuck in STARTUP. When will these 2 nodes change their state from STARTUP to SECONDARY?
aryabhata:PRIMARY> rs.status()
{
"set" : "aryabhata",
"date" : ISODate("2016-04-30T08:10:45.173Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost.localdomain:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 69091,
"optime" : Timestamp(1461935462, 1),
"optimeDate" : ISODate("2016-04-29T13:11:02Z"),
"electionTime" : Timestamp(1461934754, 1),
"electionDate" : ISODate("2016-04-29T12:59:14Z"),
"configVersion" : 459192,
"self" : true
},
{
"_id" : 1,
"name" : "repset1.com:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 92,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-04-30T08:10:44.485Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"configVersion" : -2
},
{
"_id" : 2,
"name" : "repset2.com:27017",
"health" : 1,
"state" : 0,
"stateStr" : "STARTUP",
"uptime" : 68382,
"lastHeartbeat" : ISODate("2016-04-30T08:10:43.974Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"configVersion" : -2
}
],
"ok" : 1
}
My problem was fixed by setting the IP address of the primary instead of its hostname:
cfg = rs.conf()
cfg.members[0].host = "public-or-private-primary-ip:27017"
rs.reconfig(cfg)
After that, the secondary's state changed to STARTUP2.
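You can then confirm the change took effect and watch the members leave STARTUP; a small sketch (the member indexes match the rs.status() output above):
rs.conf().members[0].host        // should now show the IP you set for the primary
rs.status().members[1].stateStr  // expect STARTUP2 and then SECONDARY for repset1.com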
From the primary, check whether you are able to connect to the secondary:
mongo --host repset1.com --port 27017
If the above fails, it may be a firewall or bindIp issue.
Check bind_ip (it should be 0.0.0.0; change it in mongodb.conf if it is 127.0.0.1):
netstat -plunt | grep :27017 | grep LISTEN
Look at the log files of the secondaries to see why they are stuck. Did they receive the configuration details?
Try to reconfigure; see mongo replicaset reconfigure.
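For example, a reconfigure sketch run from the primary, assuming the hostnames from the question are the addresses the members should actually use:
cfg = rs.conf()
cfg.members[1].host = "repset1.com:27017"   // or the node's reachable IP
cfg.members[2].host = "repset2.com:27017"
rs.reconfig(cfg)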
For me the problem was that the primary had authorization enabled. In that case the secondaries always stayed in STARTUP.
To use authorization you need to set keyFile in the configuration file of all nodes (primary and secondaries).
Create mongodb key file on linux:
openssl rand -base64 741 > mongodb.key
chmod 600 mongodb.key
chown mongod:mongod mongodb.key
mongod.conf file:
replication:
  replSetName: rs0
security:
  authorization: enabled
  keyFile: /home/mongodb.key
Source: MongoDB replica set with simple password authentication
It requires both that the primary can resolve the secondary's hostname to an IP and that the secondary can resolve the primary's hostname to an IP.
In my case I had forgotten to add an entry to the secondary's hosts file so it could resolve the primary's hostname. Once I updated the hosts file on the secondary, its state transitioned to STARTUP2 and then to SECONDARY.

mongodb java driver not finding master after rs.stepDown()

Grails 2.2.1
MongoDB GORM plugin 1.2
When running with a replica set I am finding that stepping down the primary causes the following infinitely repeated errors in the java driver.
2013-09-09 16:00:19,655 [SimpleAsyncTaskExecutor-1] ERROR grails.app.services.plover.UserStreamAnalyzerService - Exception while handling status update event: org.springframework.data.mongodb.UncategorizedMongoDbException: not talking to master and retries used up; nested exception is com.mongodb.MongoException: not talking to master and retries used up
...
Caused by: org.springframework.data.mongodb.UncategorizedMongoDbException: not talking to master and retries used up; nested exception is com.mongodb.MongoException: not talking to master and retries used up
The stacktrace is here:
Caused by: com.mongodb.MongoException: not talking to master and retries used up
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:314)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:316)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:257)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DBCursor._check(DBCursor.java:368)
at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
at com.mongodb.DBCursor._fill(DBCursor.java:518)
at com.mongodb.DBCursor.toArray(DBCursor.java:553)
at com.mongodb.DBCursor.toArray(DBCursor.java:542)
at org.grails.datastore.mapping.mongo.query.MongoQuery$MongoResultList.<init>(MongoQuery.java:908)
at org.grails.datastore.mapping.mongo.query.MongoQuery$36.doInDB(MongoQuery.java:536)
at org.grails.datastore.mapping.mongo.query.MongoQuery$36.doInDB(MongoQuery.java:508)
I have set up a local test environment to reproduce this problem. Here is the config output:
{
"set" : "rsMesh",
"date" : ISODate("2013-09-10T01:08:20Z"),
"myState" : 2,
"syncingTo" : "macbookpro.local:27018",
"members" : [
{
"_id" : 1,
"name" : "macbookpro.local:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 9940,
"optime" : {
"t" : 1378767619,
"i" : 5
},
"optimeDate" : ISODate("2013-09-09T23:00:19Z"),
"lastHeartbeat" : ISODate("2013-09-10T01:08:19Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "macbookpro.local:27019",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 9914,
"lastHeartbeat" : ISODate("2013-09-10T01:08:19Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "macbookpro.local:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 10392,
"optime" : {
"t" : 1378767619,
"i" : 5
},
"optimeDate" : ISODate("2013-09-09T23:00:19Z"),
"self" : true
}
],
"ok" : 1
}
Replica set configuration has been set in Datasource.groovy as per documentation:
grails {
mongo {
replicaSet = ["macbookpro.local:27017", "macbookpro.local:27018", "macbookpro.local:27019"]
}
}
So I am not running standalone, the replica set servers are synced properly, and all servers are running. But if I force a new server to become the primary, all access appears to fail, as if the driver were not redirecting queries to the new primary.
What am I missing?
No one ever answered this, so I decided that recovery from replica set failover was non-functional in my setup. Instead I moved to sharding and hoped that layering a mongos between the app server and the clusters themselves would provide enough protection.
The answer was a definitive "sort of". Before, when I stepped a primary down (or simulated a crash), the app server would hang indefinitely. Now I just get a few errors about not finding a primary in a given cluster and then the system recovers. Not a desirable solution, but at least it is better than a permanent failure.

Mongodb replica set status showing "RECOVERING"

I have set up a replica set over 3 mongo servers and imported 5 GB of data.
Now the status of the secondary servers shows "RECOVERING".
Could you let me know what "RECOVERING" means and how to solve this issue?
The status is as below:
rs.status()
{
"set" : "kutendarep",
"date" : ISODate("2013-01-15T05:04:18Z"),
"myState" : 3,
"members" : [
{
"_id" : 0,
"name" : "10.1.4.138:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 86295,
"optime" : Timestamp(1357901076000, 4),
"optimeDate" : ISODate("2013-01-11T10:44:36Z"),
"errmsg" : "still syncing, not yet to minValid optime 50f04941:2",
"self" : true
},
{
"_id" : 1,
"name" : "10.1.4.21:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 86293,
"optime" : Timestamp(1358160135000, 18058),
"optimeDate" : ISODate("2013-01-14T10:42:15Z"),
"lastHeartbeat" : ISODate("2013-01-15T05:04:18Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "10.1.4.88:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 86291,
"optime" : Timestamp(1357900674000, 10),
"optimeDate" : ISODate("2013-01-11T10:37:54Z"),
"lastHeartbeat" : ISODate("2013-01-15T05:04:16Z"),
"pingMs" : 0,
"errmsg" : "still syncing, not yet to minValid optime 50f04941:2"
}
],
"ok" : 1
The message on the "RECOVERING" replica set nodes means that these nodes are still performing the initial sync.
These nodes are not available for reads until they transition to the SECONDARY state.
There are several steps in the initial sync.
See here for more information about the replica set synchronization process:
http://docs.mongodb.org/manual/core/replica-set-sync/
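While the initial sync is running you can keep an eye on each member's progress from the primary; a sketch using the shell helpers of that era (newer shells call the first helper rs.printSecondaryReplicationInfo()):
rs.printSlaveReplicationInfo()   // each secondary's "syncedTo" time and lag behind the primary
rs.status().members.forEach(function (m) {
    print(m.name, m.stateStr, m.optimeDate);
});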
Log in to the RECOVERING instance.
Check the RECOVERING instance's replication status with:
db.printReplicationInfo()
You will get a result like this:
oplog first event time: Tue Jul 30 2019 17:26:37 GMT+0000 (UTC)
oplog last event time: Wed Jul 31 2019 16:46:53 GMT+0000
now: Thu Aug 22 2019 07:36:38 GMT+0000 (UTC)
If you see a large gap between the "oplog last event time" and "now", this instance has fallen behind: it is neither PRIMARY nor SECONDARY and is no longer an active member of the replica set.
There are two solutions for this.
First:
1. Log in to the RECOVERING instance
2. Delete the data from the existing dbPath, which will be /data/db
3. Restart this RECOVERING instance
4. (optional) If you see the following error, remove that mongod.pid from the specified location.
Error starting mongod. /var/run/mongod/mongod.pid
5. Restart the instance.
6. Now your recovering instance will be running, and it will show PRIMARY or SECONDARY in place of RECOVERING.
Second:
Copy the data files from another running instance to the RECOVERING instance and restart mongodb.
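Before choosing between the two options, it can help to quantify the gap; a small sketch run on the primary (db.getReplicationInfo() reports the oplog size and time span):
var info = db.getReplicationInfo();
print("oplog window (hours):", info.timeDiffHours);
print("oldest oplog entry:", info.tFirst);
// if the RECOVERING member's optimeDate is older than tFirst, only a full resync (option one)
// or a file copy from a healthy member (option two) will bring it back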

mongodb replica set with multiple primaries and pingMS=0

I am trying to set up a replica set with two nodes: Node0 and Node1. From Node0 I initialized a replica set named "rs0" and added Node1 to it. The problem is that Node1 was added as a primary node instead of a secondary, and the final result is a replica set with two primary nodes.
This is the result of executing the rs.status() command from Node0
"set" : "rs0",
"date" : ISODate("2012-10-23T21:03:37Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "Node0:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 61185,
"optime" : Timestamp(1350967947000, 1),
"optimeDate" : ISODate("2012-10-23T04:52:27Z"),
"self" : true
},
{
"_id" : 1,
"name" : "Node1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 58270,
"optime" : Timestamp(1350956423000, 1),
"optimeDate" : ISODate("2012-10-23T01:40:23Z"),
"lastHeartbeat" : ISODate("2012-10-23T21:03:37Z"),
"pingMs" : 0
}
],
If I execute the same command from Node1, the only node listed is Node1 itself. Note that pingMs is 0. Trying to add a third node or an arbiter gives similar results: each one is added as primary and pingMs is always 0.
You mentioned that you ran rs.initiate() on both servers. This should only be done on one.
I suggest you start from scratch by deleting the dbpath directory on each node (back up the data first if the db was not empty). Then start all mongod processes, log into one of them, and call
rs.initiate()
rs.add(<other node 1>)
The other node gets the replica set configuration through the first one automatically. Repeat rs.add() for each additional node you want to add.
I ran into the same situation, having wrongly run rs.initiate() on two instances. I solved this by shutting down the would-be second instance, removing its data directory, and relaunching the instance. Upon restart, it is properly detected as a member of the replica set, synchronizes properly, and most importantly there is only one primary.
This operation should not be dangerous since, to my knowledge, a replica set replicates all the data across its nodes. To be safe, you could move the data directory aside after shutting down the second node, so that you keep a backup in case anything goes wrong.
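After relaunching the wiped instance you can check from the first node that everything settled into a single primary; a tiny sketch using the node names from the question:
// run on Node0
rs.status().members.forEach(function (m) {
    print(m.name, m.stateStr);   // expected: Node0 PRIMARY, Node1 STARTUP2 and then SECONDARY
});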

mongodb - All nodes in replica set are primary

I am trying to configure a replica set with two nodes, but when I execute rs.add("node2") and then rs.status(), both nodes are set to PRIMARY. Also, when I run rs.status() on the other node, the only node that appears is the local one.
Edit1:
rs.status() output:
{
"set" : "rs0",
"date" : ISODate("2012-09-22T01:01:12Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "node1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 70968,
"optime" : Timestamp(1348207012000, 1),
"optimeDate" : ISODate("2012-09-21T05:56:52Z"),
"self" : true
},
{
"_id" : 1,
"name" : "node2:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 68660,
"optime" : Timestamp(1348205568000, 1),
"optimeDate" : ISODate("2012-09-21T05:32:48Z"),
"lastHeartbeat" : ISODate("2012-09-22T01:01:11Z"),
"pingMs" : 0
}
],
"ok" : 1
}
Edit2: I tried doing the same thing with 3 different nodes and got the same result (rs.status() says I have a replica set with three primary nodes). Is it possible that this problem is caused by some specific network configuration?
If you issue rs.initiate() on both members of the replica set before rs.add(), then both will come up as primary.
You should only run rs.initiate() on one member of the replica set, the one that you intend to be the initial primary. Then you can rs.add() the other member to the replica set.
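In other words, the correct sequence, using the host names from the question, looks roughly like this:
// run only on node1; never run rs.initiate() on node2
rs.initiate()
rs.add("node2:27017")   // node2 receives the config and should come up as SECONDARY
rs.status()             // verify: one PRIMARY (node1), one SECONDARY (node2)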
The answer above does not explain how to fix it. I kind of got it done using trial and error.
I cleaned up the data directory (as in rm -rf *) on all of the PRIMARY nodes except one, restarted them, and added them back. It seems to work.
Edit1
The nice little trick below did not seem to work for me, so I logged into the mongod console using mongo <hostname>:27018.
Here is how the shell looks:
rs2:PRIMARY> rs.conf()
{
"_id" : "rs2",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "ip-10-159-42-911:27018"
}
]
}
I decided to change it to secondary. So,
rs2:PRIMARY> var c = {
... "_id" : "rs2",
... "version" : 1,
... "members" : [
... {
... "_id" : 1,
... "host" : "ip-10-159-42-911:27018",
... "priority": 0.5
... }
... ]
... }
rs2:PRIMARY> rs.reconfig(c, { "force": true})
Mon Nov 11 19:46:39.244 DBClientCursor::init call() failed
Mon Nov 11 19:46:39.245 trying reconnect to ip-10-159-42-911:27018
Mon Nov 11 19:46:39.245 reconnect ip-10-159-42-911:27018 ok
reconnected to server after rs command (which is normal)
rs2:SECONDARY>
Now it is secondary. I do not know if there is a better way, but this seems to work.
HTH