I am trying to add one more member to an existing replica set and am running into a reachability problem.
What are the possible reasons for a replica set member being reported as not reachable/healthy?
"name" : "IP ADDRESS",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"t" : 0,
"i" : 0
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-06-18T10:52:50Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
I had a similar problem; the solution was to use a keyfile.
http://docs.mongodb.org/manual/tutorial/deploy-replica-set-with-auth/#create-the-key-file-to-be-used-by-each-member-of-the-replica-set
I could ping and telnet to both servers but still faced the same issue.
The error I was getting was "[ReplicationExecutor] Error in heartbeat request to prodmongo:27017; HostUnreachable Connection refused".
I also had the "(not reachable/healthy)" stateStr.
Please check the key on both servers; all replica set members must run with the same key. I had the same issue and found that the key was not identical on my secondary server.
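One way to confirm that the key files really match is to compare a digest of each file instead of reading them by eye. This is only a minimal sketch: the function names are hypothetical, it assumes both files have been copied somewhere you can read them, and it strips whitespace first on the assumption that mongod ignores whitespace in key files.

```python
import hashlib
from pathlib import Path


def keyfile_digest(path):
    """Return the SHA-256 hex digest of a key file's content.

    Whitespace is removed first (assumption: mongod ignores whitespace
    in key files), so a trailing newline does not cause a mismatch.
    """
    raw = Path(path).read_bytes()
    compact = b"".join(raw.split())  # drop spaces, tabs and newlines
    return hashlib.sha256(compact).hexdigest()


def keyfiles_match(path_a, path_b):
    """True when both key files hold the same key material."""
    return keyfile_digest(path_a) == keyfile_digest(path_b)
```

Comparing digests also avoids printing the key itself to a terminal or log.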
In my case this happened when, after the replica set was created, the primary could not reach the secondary on port 27017 (or whichever port is configured).
Writes on the primary worked fine, and the secondaries even synced the data from the primary. I wonder whether the secondary tails the primary's oplog on some port other than the ones configured in the replica set configuration.
I also faced a similar problem and solved it.
If the replica set members are on different servers, first check that MongoDB is accessible from the other server and that the MongoDB port is open.
To check this, I connected to the MongoDB server from the other server.
Second case: I had started MongoDB without "replSet", which gave me the "Not reachable/healthy" replica set problem.
To solve it, I restarted MongoDB with "--replSet" on the other computer where my MongoDB was running, then ran rs.add("ServerName:PortNumber") on the master (primary) server.
This worked for me.
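The "check that the port is open from the other server" step can be scripted. Below is a rough sketch using only the Python standard library; can_connect is a hypothetical helper name, and a successful TCP connection only shows that the port is reachable, not that authentication or the replica set configuration is correct.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names.
        return False
```

It is worth running this from every member towards every other member, because heartbeats have to work in both directions.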
I have a mongoDB cluster
server1:27017
server2:27017
server3:27017
For historical reasons, the IT team could not provide the replica set name for this cluster.
My question is: without knowing the replica set name, is the following MongoDB URL legal, and will omitting the optional replicaSet parameter cause any problems in the future?
mongodb://username:password@server1:27017,server2:27017,server3:27017
I am using Java to set up the MongoDB connection as follows:
String MONGODB_REPLICA_SET = "mongodb://username:password@server1:27017,server2:27017,server3:27017";
MongoClientURI mongoClientURI = new MongoClientURI(MONGODB_REPLICA_SET);
mongoClient = new MongoClient(mongoClientURI);
To clarify: although connecting to the replica set without it may work, it is preferable to specify the replicaSet option.
Depending on the MongoDB driver you're using, it may behave slightly differently. For example, quoting the Server Discovery and Monitoring Spec on the initial topology type:
In the Java driver a single seed means Single, but a list containing one seed means Unknown, so it can transition to replica-set monitoring if the seed is discovered to be a replica set member. In contrast, PyMongo requires a non-null setName in order to begin replica-set monitoring, regardless of the number of seeds.
There are variations, and it's best to check whether the connection can still handle topology discovery and failover.
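To make the difference between the two URL forms concrete, here is a small sketch that builds the connection string with or without the replicaSet option. build_mongo_uri is a hypothetical helper, not part of any driver, and the set name "rs0" in the usage note is a placeholder.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(hosts, username=None, password=None, replica_set=None):
    """Build a mongodb:// URI from a seed list of "host:port" strings.

    The replicaSet option is appended only when a name is supplied.
    """
    credentials = ""
    if username is not None:
        # Percent-encode so characters such as "@" or ":" stay unambiguous.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    uri = "mongodb://" + credentials + ",".join(hosts) + "/"
    if replica_set:
        uri += "?" + urlencode({"replicaSet": replica_set})
    return uri
```

Calling it with the three servers and no set name gives the URL from the question (with a trailing slash); passing replica_set="rs0" appends ?replicaSet=rs0.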
For historical reason, IT team could not provide the replicaSet name for this cluster.
If you have access to the admin database, you can run rs.status() in the mongo shell to find out the name of the replica set. See also replSetGetStatus for more information.
It should be possible to find out the name of the replica set, to avoid this worry. Open a connection to any one of your nodes (e.g. direct to server1:27017), and run rs.status(); that will tell you the name of your replica set as well as lots of other useful data such as the complete set of configured nodes and their individual statuses.
In this example of the output, "rsInternalTest" is the replica set name:
{
"set" : "rsInternalTest",
"date" : ISODate("2018-05-01T11:38:32.608Z"),
"myState" : 1,
"term" : NumberLong(123),
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
...
},
"members" : [
{
"_id" : 1,
"name" : "server1:27017",
"health" : 1.0,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1652592,
"optime" : {
"ts" : Timestamp(1525174711, 1),
"t" : NumberLong(123)
},
"optimeDate" : ISODate("2018-05-01T11:38:31.000Z"),
"electionTime" : Timestamp(1524371004, 1),
"electionDate" : ISODate("2018-04-22T04:23:24.000Z"),
"configVersion" : 26140,
"self" : true
}
...
],
"ok" : 1.0
}
Note that you will need the login of a high-level user account, otherwise you won't have permission to run rs.status().
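If you capture the rs.status() result as a document (for example through a driver), pulling out the set name and spotting unreachable members takes only a few lines. A sketch with hypothetical helper names, relying only on the fields shown in the output above ("set", "members", "name", "health", "stateStr"):

```python
def summarize_status(status):
    """Return (set_name, {member_name: stateStr}) from an rs.status() document."""
    members = {m["name"]: m["stateStr"] for m in status.get("members", [])}
    return status.get("set"), members


def unhealthy_members(status):
    """Names of members whose health field is 0."""
    return [m["name"] for m in status.get("members", []) if m.get("health") == 0]
```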
Most guides recommend using mongodump/mongorestore, but for large production databases the downtime can be very long.
You can use replication and an additional server instead, or the same server if the load allows.
You need 3 running MongoDB instances:
The server you want to upgrade (remember that WiredTiger is supported since 3.0).
A second MongoDB instance, which can run on an additional server. The database will be temporarily copied to it by replication.
A third MongoDB instance acting as an arbiter, which doesn’t store data and only participates in the election of the primary. The arbiter can run on the additional server on a separate port.
In any case, you need to back up your database first. You can run “mongodump” without parameters, and a “./dump” directory will be created with the database dump. You can use the “--gzip” parameter to compress the result.
mongodump --gzip
Just in case, the command to restore:
mongorestore --gzip
It should be run in the directory that contains the “./dump” directory, and the “--gzip” parameter should be added if it was used with “mongodump”.
Begin configuring from the additional server. My target system is Red Hat Linux without Internet access, so I downloaded and installed MongoDB manually via RPM. Add this section to /etc/mongod.conf:
replication:
  oplogSizeMB: 10240
  replSetName: REPLICA
Check that the net section looks like this to allow access from other servers:
net:
  bindIp: 0.0.0.0
  port: 27017
and run:
service mongod start
Run the third MongoDB instance - arbiter. It can work on the additional server on a different port. Create a temporary directory for the arbiter database:
mkdir /tmp/mongo
chmod 777 -R /tmp/mongo
and run:
mongod --dbpath /tmp/mongo --port 27001 --replSet REPLICA \
--fork --logpath /tmp/mongo/db1.log
Now configure the main server. Edit /etc/mongod.conf:
replication:
  oplogSizeMB: 10240
  replSetName: REPLICA
and restart MongoDB on the main server:
service mongod restart
This is important! After restarting the main server, read operations may be unavailable. I was getting the following error:
{ "ok" : 0, "errmsg" : "node is recovering", "code" : 13436 }
So you need to connect to MongoDB on the main server via the “mongo” console as quickly as possible and run the following command to configure replication:
rs.initiate(
{
_id: "REPLICA",
members: [
{ _id: 0, host : "<IP address of main server>:27017",
priority: 1.0 },
{ _id: 1, host : "<IP address of additional server>:27017",
priority: 0.5 },
{ _id: 2, host : "<IP address of additional server(the arbiter)>:27001",
arbiterOnly : true, priority: 0.5 }
]
}
)
After this operation, all operations on MongoDB will be available again and data synchronization will start.
I don’t recommend running rs.initiate() on the main server without parameters, as most tutorials do, because the name of the main server will then default to the DNS name from /etc/hostname. That is not convenient for me because I use IP addresses for communication in my projects.
To check the synchronization progress, run this from the “mongo” console:
rs.status()
Result example:
{
"set" : "REPLICA",
"date" : ISODate("2017-01-19T14:30:34.292Z"),
"myState" : 1,
"term" : NumberLong(1),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "<IP address of main server>:27017",
"health" : 1.0,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 165,
"optime" : {
"ts" : Timestamp(6377323060650835, 3),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2017-01-19T14:30:33.000Z"),
"infoMessage" : "could not find member to sync from",
"electionTime" : Timestamp(6377322974751490, 1),
"electionDate" : ISODate("2017-01-19T14:30:13.000Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "<IP address of additional server>:27017",
"health" : 1.0,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 30,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00.000Z"),
"lastHeartbeat" : ISODate("2017-01-19T14:30:33.892Z"),
"lastHeartbeatRecv" : ISODate("2017-01-19T14:30:34.168Z"),
"pingMs" : NumberLong(3),
"syncingTo" : "<IP address of main server>:27017",
"configVersion" : 1
},
{
"_id" : 2,
"name" : "<IP address of additional server (the arbiter)>:27001",
"health" : 1.0,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 30,
"lastHeartbeat" : ISODate("2017-01-19T14:30:33.841Z"),
"lastHeartbeatRecv" : ISODate("2017-01-19T14:30:30.158Z"),
"pingMs" : NumberLong(0),
"configVersion" : 1
}
],
"ok" : 1.0
}
Once the “stateStr” of the additional server changes from “STARTUP2” to “SECONDARY”, our servers are synchronized.
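If you poll rs.status() from a script rather than by hand, the "are we synchronized yet" check could look roughly like this. The function name is hypothetical, and it only looks at the "stateStr" field shown in the example output:

```python
def initial_sync_finished(status):
    """True when every data-bearing member is PRIMARY or SECONDARY.

    Arbiters are skipped because they hold no data and never leave
    the ARBITER state.
    """
    data_members = [m for m in status["members"] if m["stateStr"] != "ARBITER"]
    return bool(data_members) and all(
        m["stateStr"] in ("PRIMARY", "SECONDARY") for m in data_members
    )
```

With the example result above it returns False, because the additional server is still in STARTUP2.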
While we wait for the synchronization to finish, the client applications need to be modified slightly so that they can work with all servers in the replica set.
If you use a connection string, you should replace it with something like:
mongodb://<IP address of main server>:27017,<IP address of additional server>:27017,<IP address of additional server (the arbiter)>:27001/?replicaSet=REPLICA
If you use the legacy C++ mongo-cxx-driver, as I do, you should use mongo::DBClientReplicaSet instead of mongo::DBClientConnection and list all three servers in the connection parameters, including the arbiter.
There is a third option: you can simply change the IP address of the MongoDB server in the clients after switching PRIMARY and SECONDARY, but that is not a very clean solution.
After the synchronization has finished and the additional server's status has become SECONDARY, we need to switch the PRIMARY and SECONDARY by running the following in the “mongo” console on the main server. This is important because the command will not work on the additional server.
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
rs.reconfig(cfg)
Then check server status by executing:
rs.status()
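The priority swap is just a small edit to the configuration document. The sketch below shows the same idea on a plain dictionary shaped like the rs.conf() result. prefer_member is a hypothetical helper, and unlike the shell commands above it leaves arbiters untouched, on the assumption that an arbiter's priority is irrelevant since it can never become primary.

```python
import copy


def prefer_member(config, preferred_index, high=1, low=0.5):
    """Return a copy of the config in which one member has the highest priority.

    Members flagged arbiterOnly are left unchanged.
    """
    new_config = copy.deepcopy(config)
    for index, member in enumerate(new_config["members"]):
        if member.get("arbiterOnly"):
            continue
        member["priority"] = high if index == preferred_index else low
    return new_config
```

Passing preferred_index=1 mirrors the switch to the additional server, and preferred_index=0 mirrors switching back later.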
Stop MongoDB on the main server:
service mongod stop
and simply delete the entire contents of the database directory. This is safe because we have a working copy on the additional server and we made a backup at the beginning. Be careful: MongoDB doesn’t create the database directory itself. If you've deleted the directory, you need not only to recreate it:
mkdir /var/lib/mongo
but also to set its owner:
chown -R mongod:mongod /var/lib/mongo
Check that the wiredTiger storage engine is configured in /etc/mongod.conf. Since 3.2 it is used by default:
storage:
  ...
  engine: wiredTiger
  ...
And run MongoDB:
service mongod start
The main server will get the configuration from the secondary server automatically, and the data will be synced back into WiredTiger storage.
After the synchronization is finished, switch the PRIMARY back. This operation must be performed on the additional server because it is the PRIMARY now.
cfg = rs.conf()
cfg.members[0].priority = 1
cfg.members[1].priority = 0.5
cfg.members[2].priority = 0.5
rs.reconfig(cfg)
Revert the database clients to the old version, or change the connection string back.
Now turn off replication if necessary. Remove the two other members by running the following on the main server:
rs.remove("<IP address of additional server>:27017")
rs.remove("<IP address of additional server (the arbiter)>:27001")
Remove the entire “replication” section from /etc/mongod.conf and restart MongoDB:
service mongod restart
After this, we get the following warning when connecting via the “mongo” console:
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** WARNING: mongod started without --replSet yet 1 documents are present in local.system.replset
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** Restart with --replSet unless you are doing maintenance and no other clients are connected.
2017-01-19T12:26:51.948+0300 I STORAGE [initandlisten] ** The TTL collection monitor will not start because of this.
To get rid of it, you need to drop the “local” database. In its default state this database contains only one collection, “startup_log”, so you can safely do this via the “mongo” console:
use local
db.dropDatabase()
and restart MongoDB:
service mongod restart
If you drop the “local” database before removing the “replication” section from /etc/mongod.conf, it is immediately restored, so I could not get by with a single MongoDB restart.
On the additional server, perform the same steps:
remove the “replication” section from /etc/mongod.conf
restart MongoDB
drop the “local” database
restart again
Just stop the arbiter and remove its data:
pkill -f /tmp/mongo
rm -r /tmp/mongo
I have a replica set (hosted on Amazon) which has:
primary
secondary
arbiter
All of them are version 3.2.6, and this replica set forms one shard in my sharded cluster (in case that is important, although I think it is not).
When I run rs.status() on the primary, it says that it cannot reach the secondary (the same happens on the arbiter):
{
"_id" : 1,
"name" : "secondary-ip:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2016-07-20T15:40:50.479Z"),
"lastHeartbeatRecv" : ISODate("2016-07-20T15:40:51.793Z"),
"pingMs" : NumberLong(0),
"lastHeartbeatMessage" : "Couldn't get a connection within the time limit",
"configVersion" : -1
}
(btw look at the optimeDate O.o)
Error in my log is:
[ReplicationExecutor] Error in heartbeat request to secondary-ip:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
The strange thing is that when I go to the secondary and run rs.status(), everything looks OK. I am also able to connect to the secondary from my primary instance (with mongo --host secondary), so I guess it is not a network issue. Yesterday it was all working fine.
TL;DR: my primary cannot see the secondary, my arbiter cannot see the secondary, but my secondary sees the primary. It was all working fine just a day ago, and I am able to connect manually to the secondary from the primary instance.
Does anyone have an idea what could have gone wrong?
Tnx,
Ivan
It seems the secondary's optimeDate is responsible for the error. The best way to find out why the optimeDate is wrong is to check the current date and time on the secondary's machine, as it could be wrong as well. I'm not sure whether you are still looking for an answer, but the optimeDate is the problem, not the connection between your replica set machines.
I have a MongoDB replica set with 2 nodes (node0, node1). One day one of them (node1) crashed.
Since deleting all of node1's data and restarting it would take a long time, I shut down node0 and rsynced its data to node1.
After that, I started node0 and node1. Both got stuck at STARTUP2. Below is some of the log:
Sat Feb 8 13:14:22.031 [rsMgr] replSet I don't see a primary and I can't elect myself
Sat Feb 8 13:14:24.888 [rsStart] replSet initial sync pending
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
How to solve this problem?
EDIT 10/29/15: I found there's actually an easier way to get your primary back: use rs.reconfig with the option {force: true}. You can find the detailed documentation here. Use it with caution, though; as mentioned in the documentation, it may cause a rollback.
You should never build a 2-member replica set, because once one member is down, the other cannot tell whether its peer is down or whether it has itself been cut off from the network. As a solution, add an arbiter node for voting.
So your problem is that when you restart node0 while node1 is already dead, no other node votes for it. It no longer knows whether it is suitable to run as a primary node, so it falls back to being a secondary. That's why you see the message
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
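The arithmetic behind the "never build a 2-member replica set" advice is simple: a primary needs the votes of a strict majority of the voting members. A tiny illustration with hypothetical function names:

```python
def votes_needed(voting_members):
    """Votes required to elect a primary: a strict majority of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """True if the reachable members (counting the node itself) form a majority."""
    return reachable_members >= votes_needed(voting_members)
```

With 2 members, votes_needed(2) is 2, so a lone survivor can never win an election. With 2 data nodes plus an arbiter, votes_needed(3) is still 2, and one data node together with the arbiter is enough.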
As far as I know, there's no official way to resolve this issue other than rebuilding the replica set (though you can find a trick further down). Follow these steps:
Stop node0.
Go to the data folder of node0 (on my machine it's /var/lib/mongodb; find yours in the config file located at /etc/mongodb.conf).
Delete local.* from the folder. Note that
this cannot be undone, even if you backed up these files.
You'll lose all the users in the local database.
Start node0, and you should see it running as a standalone node.
Then follow the MongoDB manual to recreate the replica set:
Run rs.initiate() to initialize the replica set.
Add node1 to the replica set: rs.add("node1 domain name");
I'm afraid you'll have to wait a long time for the sync to finish. After that you are good to go.
I strongly recommend adding an arbiter to avoid this situation in the future.
The above is the official way to resolve your issue; what follows is how I did it with MongoDB 2.4.8. I didn't find any documentation to back it up, so there's absolutely NO guarantee. You do it at your own risk. Anyway, if it doesn't work for you, just fall back to the official way. Worth trying ;)
Make sure that during the whole process no application is trying to modify your database; otherwise those modifications will not be synced to the secondary server.
Restart your server without the replSet=[set name] parameter, so that it runs as a standalone and you can modify it.
Go to the local database and delete node1 from db.system.replset. For example, on my machine it originally looked like this:
{
"_id": "rs0",
"version": 5,
"members": [{
"_id": 0,
"host": "node0"
}, {
"_id": 1,
"host": "node1"
}]
}
You should change it to
{
"_id": "rs0",
"version": 5,
"members": [{
"_id": 0,
"host": "node0"
}]
}
Restart with replSet=[set name], and you should see node0 become primary again.
Add node1 to the replica set with the rs.add command.
That's all. Let me know if you have any questions.
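The edit to the db.system.replset document described above amounts to filtering the members array. Here is a sketch of just that transformation on a plain dictionary; drop_member is a hypothetical name, and writing the result back to the local database is deliberately left out because it depends on your shell or driver.

```python
import copy


def drop_member(config, host):
    """Return a copy of a replica set config document without the given host."""
    new_config = copy.deepcopy(config)
    new_config["members"] = [
        member for member in new_config["members"] if member["host"] != host
    ]
    return new_config
```

Applied to the first document with host "node1", it produces the second document and leaves the original untouched.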
I had the same issue when using MMS. I created a new replica set of 3 machines (2 data + 1 arbiter, which is tricky to set up on MMS, by the way) and they were all in STARTUP2 with "initial sync need a member to be primary or secondary to do our initial sync".
myReplicaSet:STARTUP2> rs.status()
{
"set" : "myReplicaSet",
"date" : ISODate("2015-01-17T21:20:12Z"),
"myState" : 5,
"members" : [
{
"_id" : 0,
"name" : "server1.mydomain.com:27000",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 142,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
"lastHeartbeatRecv" : ISODate("2015-01-17T21:20:11Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
},
{
"_id" : 1,
"name" : "server2.mydomain.com:27000",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 142,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"infoMessage" : "initial sync need a member to be primary or secondary to do our initial sync",
"self" : true
},
{
"_id" : 3,
"name" : "server3.mydomain.com:27000",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 140,
"lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
"lastHeartbeatRecv" : ISODate("2015-01-17T21:20:10Z"),
"pingMs" : 0
}
],
"ok" : 1
}
To fix it, I used yaoxing's answer. I had to shut down the replica set on MMS and wait for all members to shut down. It took a while...
Then, on all of them, I removed the contents of the data dir:
sudo rm -Rf /var/data/*
Only after that did I turn the replica set back on, and all was fine.
I want to set
mongoClient.setWriteConcern(WriteConcern.REPLICAS_SAFE);
only if a replica set is present.
But in a sharded environment, when I call:
mongoClient.getReplicaSetStatus();
it returns null even though I have a replica set.
I am passing the mongos IP to the Mongo client.
Most MongoDB drivers, and in particular the Java driver you are using, will throw an exception if you try to set the REPLICA_ACKNOWLEDGED write concern when it's not possible to get an acknowledgement from two or more nodes.
From the docs:
WriteConcern.REPLICA_ACKNOWLEDGED Tries to write to two separate nodes. [...] will
throw an exception if two writes are not possible.
See the following for more details:
http://docs.mongodb.org/manual/reference/write-concern/
http://docs.mongodb.org/ecosystem/drivers/java-replica-set-semantics/
In my testing with the mongo shell, if you pass the REPLICA_ACKNOWLEDGED (formerly called REPLICAS_SAFE) concern to the 'getlasterror' command, you will get an error when you are not communicating with a replica set. When talking to a mongos process, the error will be:
{
"singleShard" : "localhost:30001",
"n" : 0,
"connectionId" : 3,
"wnote" : "no replication has been enabled, so w=2.0 won't work",
"err" : "norepl",
"ok" : 1
}
It is not the case that the client will hang forever if wtimeout is not specified; that would only happen if there is a replica set but two nodes are indefinitely unavailable for writes.
Note that using "majority" as the w value for the write concern works correctly through mongos. Note the difference in the write concern responses:
mongos> db.coll.insert({}); db.runCommand({getlasterror:1,w:"majority"})
{
"singleShard" : "localhost:30001",
"n" : 0,
"connectionId" : 3,
"err" : null,
"ok" : 1
}
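Instead of relying on getReplicaSetStatus(), one option is to look at the isMaster response of the server you are connected to: a mongos reports "msg" : "isdbgrid", while a replica set member reports a "setName". The sketch below turns that into a w value. The function name and the mapping itself (majority behind mongos, 2 on a replica set, 1 otherwise) are my own assumptions for illustration, not something a driver does for you.

```python
def choose_write_concern(is_master):
    """Pick a w value from an isMaster response document."""
    if is_master.get("msg") == "isdbgrid":
        # Connected through mongos: "majority" works per shard.
        return "majority"
    if "setName" in is_master:
        # Direct connection to a replica set member: wait for two nodes.
        return 2
    # Standalone server: only the server itself can acknowledge.
    return 1
```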
First, verify that your replica set has a PRIMARY using the mongo shell command rs.status().
If that worked, verify that you are connecting to the database correctly:
MongoClient mongoClient = new MongoClient( "hostname" , 27017 );
If both of those are true, there should be no reason for mongoClient.getReplicaSetStatus() to return null. It should return a ReplicaSetStatus object.