I have a MongoDB cluster:
server1:27017
server2:27017
server3:27017
For historical reasons, the IT team could not provide the replicaSet name for this cluster.
My question is: without knowing the replicaSet name, is the following MongoDB URL legal, and will omitting the optional replicaSet parameter cause any problems in the future?
mongodb://username:password@server1:27017,server2:27017,server3:27017
I am using Java to set up the MongoDB connection as follows:
String MONGO_REPLICA_SET = "mongodb://username:password@server1:27017,server2:27017,server3:27017";
MongoClientURI mongoClientURI = new MongoClientURI(MONGO_REPLICA_SET);
mongoClient = new MongoClient(mongoClientURI);
To clarify: although it may be functional to connect to the replica set without it, it would be preferable to specify the replicaSet option.
Depending on the MongoDB driver that you're using, it may behave slightly differently. For example, quoting the Server Discovery and Monitoring spec on the initial topology type:
In the Java driver a single seed means Single, but a list containing one seed means Unknown, so it can transition to replica-set monitoring if the seed is discovered to be a replica set member. In contrast, PyMongo requires a non-null setName in order to begin replica-set monitoring, regardless of the number of seeds.
There are variations, and it's best to check whether the connection can still handle topology discovery and failover.
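For instance, once the set name is known (see below for how to find it), the connection string and Java setup might look like the following sketch; "rs0" is only a placeholder for your actual replica set name:
// "rs0" below is a placeholder; substitute the name reported by rs.status().
String MONGO_REPLICA_SET = "mongodb://username:password@server1:27017,server2:27017,server3:27017/?replicaSet=rs0";
MongoClientURI mongoClientURI = new MongoClientURI(MONGO_REPLICA_SET);
MongoClient mongoClient = new MongoClient(mongoClientURI);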
For historical reasons, the IT team could not provide the replicaSet name for this cluster.
If you have access to the admin database, you could execute rs.status() in the mongo shell to find out the name of the replica set. See also replSetGetStatus for more information.
It should be possible to find out the name of the replica set, to avoid this worry. Open a connection to any one of your nodes (e.g. directly to server1:27017) and run rs.status(); that will tell you the name of your replica set as well as lots of other useful data, such as the complete set of configured nodes and their individual statuses.
In this example of the output, "rsInternalTest" is the replica set name:
{
    "set" : "rsInternalTest",
    "date" : ISODate("2018-05-01T11:38:32.608Z"),
    "myState" : 1,
    "term" : NumberLong(123),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        ...
    },
    "members" : [
        {
            "_id" : 1,
            "name" : "server1:27017",
            "health" : 1.0,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1652592,
            "optime" : {
                "ts" : Timestamp(1525174711, 1),
                "t" : NumberLong(123)
            },
            "optimeDate" : ISODate("2018-05-01T11:38:31.000Z"),
            "electionTime" : Timestamp(1524371004, 1),
            "electionDate" : ISODate("2018-04-22T04:23:24.000Z"),
            "configVersion" : 26140,
            "self" : true
        }
        ...
    ],
    "ok" : 1.0
}
Note that you will need to log in with a sufficiently privileged user account; otherwise you won't have permission to run rs.status().
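If such a user does not exist yet, a minimal sketch of creating one in the mongo shell could look like this (the username and password are placeholders):
use admin
// The clusterMonitor role is sufficient to run rs.status()/replSetGetStatus.
db.createUser({
    user: "monitorUser",       // placeholder name
    pwd: "changeMe",           // placeholder password
    roles: [ { role: "clusterMonitor", db: "admin" } ]
})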
Related
How to configure removed members of a replica set to form a new replica set?
I have a replica set with 4 mongod instances.
Output of rs.config():
{
    "_id" : "rs0",
    "members" : [
        {
            "_id" : 0,
            "host" : "localhost:27031"
        },
        {
            "_id" : 1,
            "host" : "localhost:27032"
        },
        {
            "_id" : 2,
            "host" : "localhost:27033"
        },
        {
            "_id" : 3,
            "host" : "localhost:27034"
        }
    ],
    "settings" : {
        "replicaSetId" : ObjectId("5cf22332f5b9d21b01b9b6b2")
    }
}
I removed 2 instances from the replica set
rs.remove("localhost:27033")
rs.remove("localhost:27034")
Now my requirement is to form a new replica set from these 2 removed members. What is the best way to do that?
My current solution
connect to a removed member
mongo --port 27033
and execute
conf = {
    "_id" : "rs0",
    "members" : [
        {
            "_id" : 2,
            "host" : "localhost:27033"
        },
        {
            "_id" : 3,
            "host" : "localhost:27034"
        }
    ],
    "settings" : {
        "replicaSetId" : ObjectId("5cf22332f5b9d21b01b9b6b2")
    }
}
and then
rs.reconfig(conf, {force:true})
Outcome
This solution worked fine in practice.
The removed members formed a replica set; one of them became primary and the other became secondary. Data was replicated between them.
And this replica set seems to be isolated from the initial replica set from which they were removed.
Concerns
1) I had to use forced reconfiguration. Not sure about the consequences.
"errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is REMOVED; use the \"force\" argument to override",
2) Is the new replica set actually a new one? In the rs.config() output, the replicaSetId is the same as the old one.
"replicaSetId" : ObjectId("5cf22332f5b9d21b01b9b6b2")
3) I had to use the same values for the members' _id as in the config of the old replica set:
"errmsg" : "New and old configurations both have members with host of localhost:27034 but in the new configuration the _id field is 1 and in the old configuration it is 3 for replica set rs0",
Is this solution good?
Is there any better solution?
Note: I need to retain data from old replica set (data which was present at the time of removal) in the new replica set.
As you have suspected, the procedure did not create a new replica set. Rather, it's a continuation of the old replica set, although superficially they look different.
There is actually a procedure in the MongoDB documentation to do what you want: Restore a Replica Set from MongoDB Backups. The difference being, you're not restoring from a backup. Rather, you're using one of the removed secondaries to seed a new replica set.
Hence you need to modify the first step in the procedure mentioned in the link above. The rest of the procedure is still the same (a rough command-line sketch of the steps follows the list):
Restart the removed secondary as a standalone (without the --replSet parameter) and connect to it using the mongo shell.
Drop the local database in the standalone node:
use local
db.dropDatabase()
Restart the ex-secondary, this time with the --replSet parameter (with a new replica set name)
Connect to it using the mongo shell.
rs.initiate() the new set.
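As a rough sketch of those steps (the port, dbpath, and new set name "rs1" are placeholders taken from the example above; adjust them to your deployment):
# 1. Restart the removed member as a standalone (no --replSet):
mongod --port 27033 --dbpath /data/db33
# 2. In the mongo shell connected to it, drop the local database:
#      use local
#      db.dropDatabase()
# 3. Restart it with a NEW replica set name ("rs1" is a placeholder):
mongod --port 27033 --dbpath /data/db33 --replSet rs1
# 4. Connect again and initiate the new set:
#      rs.initiate()
Further members can then be added as described in the linked procedure.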
After this, the new set should have a different replicaSetId compared to the old set. In my quick test of the procedure above, this is the result I see:
Old set:
> rs.conf()
...
"replicaSetId": ObjectId("5cf45d72a1c6c4de948ff5d8")
...
New set:
> rs.conf()
...
"replicaSetId": ObjectId("5cf45d000dda9e1025d6c65e")
...
As with any major deployment changes like this, please ensure that you have a backup, and thoroughly test the procedures before doing it on a production system.
I have a replica set (hosted on Amazon) which has:
primary
secondary
arbiter
All of them are version 3.2.6, and this replica set makes up one shard in my sharded cluster (in case that is important, although I think it is not).
When I run rs.status() on the primary, it says that it cannot reach the secondary (the same thing happens on the arbiter):
{
    "_id" : 1,
    "name" : "secondary-ip:27017",
    "health" : 0,
    "state" : 8,
    "stateStr" : "(not reachable/healthy)",
    "uptime" : 0,
    "optime" : {
        "ts" : Timestamp(0, 0),
        "t" : NumberLong(-1)
    },
    "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
    "lastHeartbeat" : ISODate("2016-07-20T15:40:50.479Z"),
    "lastHeartbeatRecv" : ISODate("2016-07-20T15:40:51.793Z"),
    "pingMs" : NumberLong(0),
    "lastHeartbeatMessage" : "Couldn't get a connection within the time limit",
    "configVersion" : -1
}
(btw look at the optimeDate O.o)
Error in my log is:
[ReplicationExecutor] Error in heartbeat request to secondary-ip:27017; ExceededTimeLimit: Couldn't get a connection within the time limit
The strange thing is that when I go to the secondary and run rs.status(), everything looks OK. Also, I am able to connect to the secondary from my primary instance (with mongo --host secondary), so I guess it is not a network issue. Yesterday it was all working fine.
TL;DR: my primary cannot see the secondary, the arbiter cannot see the secondary, my secondary sees the primary, it was all working fine just a day ago, and I am able to manually connect to the secondary from the primary instance.
Does anyone have an idea what could have gone wrong?
Tnx,
Ivan
It seems the secondary's optimeDate is responsible for the error. The best way to find the reason for this wrong optimeDate is to check the secondary machine's current date and time, as it could be wrong as well. Not sure whether you are still looking for an answer, but the optimeDate is the problem, not the connection between your replica set machines.
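For example, a quick way to compare the clocks (a sketch, not a diagnosis of this particular setup) is to run date on each machine and to ask each mongod for its own time from the mongo shell:
// Run against the primary and the secondary and compare the results;
// a large difference points to a clock problem on one of the machines.
db.serverStatus().localTime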
I have a MongoDB replica set with 2 nodes (node0, node1); one day one of them (node1) crashed.
Considering that deleting all the data on node1 and restarting it would take a long time, I shut down node0 and rsynced its data to node1.
After that, I started node0 and node1. Both replica set members are stuck at STARTUP2; below is some of the log:
Sat Feb 8 13:14:22.031 [rsMgr] replSet I don't see a primary and I can't elect myself
Sat Feb 8 13:14:24.888 [rsStart] replSet initial sync pending
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
How to solve this problem?
EDIT 10/29/15: I found there's actually an easier way to get your primary back by using rs.reconfig with the option {force: true}. You can find the detailed documentation here. Use it with caution though; as mentioned in the documentation, it may cause a rollback.
You should never build a 2-member replica set, because once one member is down, the other one cannot tell whether the other member is down or it has itself been cut off from the network. As a solution, add an arbiter node for voting.
So your problem is: when you restart node0 while node1 is already dead, no other node votes for it. It doesn't know whether it's still suitable to run as a primary node. Thus it falls back to a secondary; that's why you see the message
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
I'm afraid that, as far as I know, there's no official way to resolve this issue other than rebuilding the replica set (but you can find some tricks later in this answer). Follow these steps:
Stop node0
Go to the data folder of node0 (on my machine it's /var/lib/mongodb; find yours in the config file located at /etc/mongodb.conf)
Delete local.* from the folder. Note that:
this cannot be undone, even if you backed up these files.
You'll lose all the users in the local database.
Start node0 and you shall see it running as a standalone node.
Then follow the MongoDB manual to recreate the replica set:
run rs.initiate() to initialize the replica set
add node1 to the replica set: rs.add("node1 domain name");
I'm afraid you'll have to spend a long time waiting for the sync to finish. And then you are good to go.
I strongly recommend adding an arbiter to avoid this situation again.
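For example, connected to the primary, adding an arbiter is a one-liner in the mongo shell (the host name here is a placeholder):
// Adds a voting-only member so elections still work when one data node is down.
rs.addArb("arbiter-host:27017")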
So, the above is the official way to resolve your issue, and here is how I did it with MongoDB 2.4.8. I didn't find any documentation to prove it, so there's absolutely NO guarantee; you do it at your own risk. Anyway, if it doesn't work for you, just fall back to the official way. Worth trying ;)
Make sure that during the whole process no application is trying to modify your database; otherwise these modifications will not be synced to the secondary server.
Restart your server without the replSet=[set name] parameter, so that it runs as a standalone and you can make modifications to it.
Go to the local database and delete node1 from db.system.replset. For example, on my machine it originally looks like:
{
    "_id": "rs0",
    "version": 5,
    "members": [{
        "_id": 0,
        "host": "node0"
    }, {
        "_id": 1,
        "host": "node1"
    }]
}
You should change it to:
{
    "_id": "rs0",
    "version": 5,
    "members": [{
        "_id": 0,
        "host": "node0"
    }]
}
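A sketch of how that change could be applied from the mongo shell while the node is running as a standalone (the filter and field values simply mirror the documents shown above):
use local
// Keep only node0 in the stored replica set configuration.
db.system.replset.update(
    { "_id": "rs0" },
    { $set: { "members": [ { "_id": 0, "host": "node0" } ] } }
)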
Restart with replSet=[set name] and you are supposed to see node0 become primary again.
Add node1 to the replica set with the rs.add command.
That's all. Let me know if you have any questions.
I had the same issue when using MMS. I created a new ReplicaSet of 3 machines (2 data + 1 arbiter, which is tricky to set up on MMS, btw) and they were all stuck in STARTUP2 with "initial sync need a member to be primary or secondary to do our initial sync":
myReplicaSet:STARTUP2> rs.status()
{
    "set" : "myReplicaSet",
    "date" : ISODate("2015-01-17T21:20:12Z"),
    "myState" : 5,
    "members" : [
        {
            "_id" : 0,
            "name" : "server1.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 142,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
            "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:11Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
        },
        {
            "_id" : 1,
            "name" : "server2.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 142,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "infoMessage" : "initial sync need a member to be primary or secondary to do our initial sync",
            "self" : true
        },
        {
            "_id" : 3,
            "name" : "server3.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 140,
            "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
            "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:10Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
To fix it, I used yaoxing's answer. I had to shut down the ReplicaSet on MMS and wait for all members to be shut down. It took a while...
Then, on all of them, I removed the content of the data dir:
sudo rm -Rf /var/data/*
And only after that, I turned the ReplicaSet on, and all was fine.
A MongoDB shard needs to know about the members of a replica set. Is the member list discovery dynamic? I mean, if we add a node to an existing replica set which is already configured as a shard on the config servers, does the shard automatically update, or do we have to manually update the shard configuration with any new member added to the replica set?
In older versions of Mongo, prior to 2.0.3, all of the replica set members needed to be specified when adding a shard. Thus it seems fair to conclude that, since adding a shard now only requires one of the members of the replica set to be specified, all activity between replica set members is delegated to the replica set itself.
Probably the best way to be sure is to fire up a test scenario on your own machine. But there is nothing to suggest that any additional sharding configuration should be required.
And as a bit of an update, since I had nothing to do over lunch :) I just spun up a load of instances as mapped out in the listed tutorial:
http://docs.mongodb.org/manual/tutorial/add-shards-to-shard-cluster/
A few differences being the use of sh.addShard with only the first member of each replica set, rather than the all-members syntax shown in the docs.
Once the shards were up, I just added two more replica set nodes to the firstset:
http://docs.mongodb.org/manual/tutorial/expand-replica-set/
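Roughly, the commands used were along these lines (the ports follow the tutorial's localhost layout and are only illustrative):
// Add each shard by naming the replica set and a single seed member:
sh.addShard("firstset/localhost:10001")
sh.addShard("secondset/localhost:20001")

// Later, on the firstset PRIMARY, grow the replica set; the shard learns
// about the new members on its own, with no change to the shard config:
rs.add("localhost:10004")
rs.add("localhost:10005")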
Without doing anything else, let's see the status from mongos:
mongos> db.printShardingStatus()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 4,
      "minCompatibleVersion" : 4,
      "currentVersion" : 5,
      "clusterId" : ObjectId("52f2f77a538f784f4413e6b9")
  }
  shards:
      { "_id" : "firstset", "host" : "firstset/localhost:10001,localhost:10002,localhost:10003,localhost:10004,localhost:10005" }
      { "_id" : "secondset", "host" : "secondset/localhost:20001,localhost:20002,localhost:20003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : true, "primary" : "firstset" }
          test.test_collection
              shard key: { "number" : 1 }
              chunks:
                  secondset  23
                  firstset   191
So the shard is still moving chunks, and the new nodes had just finished initializing as I was typing.
And that's all there is to adding additional nodes to a replica set on a shard. Most of this was done during a 1 million document insert.
I am trying to add one more member to an existing replica set and am running into a reachability problem.
What are the reasons for a replica set member showing as "(not reachable/healthy)"?
"name" : "IP ADDRESS",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"t" : 0,
"i" : 0
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-06-18T10:52:50Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0
I had a similar problem; the solution was to have a keyfile.
http://docs.mongodb.org/manual/tutorial/deploy-replica-set-with-auth/#create-the-key-file-to-be-used-by-each-member-of-the-replica-set
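Following that tutorial, a minimal sketch of creating and using a keyfile (the path and set name are just examples; the same file content must be used on every member) looks like:
# Generate a key and restrict its permissions:
openssl rand -base64 756 > /srv/mongodb/keyfile
chmod 400 /srv/mongodb/keyfile

# Start every replica set member with the same keyfile:
mongod --replSet rs0 --keyFile /srv/mongodb/keyfile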
I could ping and telnet to both servers but was facing the same issue.
The error I was getting was "[ReplicationExecutor] Error in heartbeat request to prodmongo:27017; HostUnreachable Connection refused".
I also had the "(not reachable/healthy)" stateStr.
Please check the keyfile on both servers; all replica set members should run with the same key. I had the same issue and found that the key was not identical on my secondary server.
In my case this issue happened when, after the replica set was created, the primary could not reach the secondary on 27017 (or whatever port was configured).
Writing data on the primary was fine, and the secondaries even synced the data from the primary. I wonder if the secondaries tail the primary's oplog on some port other than the ones configured in the replica set configuration.
I also faced a similar type of problem, but solved it.
If the replica set member is on a different server, then first check MongoDB access from the other server. Check that the MongoDB port is open.
For that, I connected to the MongoDB server from the other server.
Second case: in my case I had started mongod without "--replSet", which gave me the "not reachable/healthy" replica set problem. To solve it,
I started mongod again with "--replSet" on the other computer where my MongoDB was running, then ran rs.add("ServerName:PortNumber") on the master (primary) replication server.
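A rough sketch of those checks and commands (host names, ports, and the set name are placeholders):
# 1. From the other server, verify the mongod port is reachable:
mongo --host otherServer --port 27017
# 2. Restart the member with the replica set name:
mongod --replSet rs0 --port 27017
# 3. Then, in the mongo shell on the primary:
#      rs.add("ServerName:PortNumber")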
It worked for me!