MongoDB replica set with multiple primaries and pingMs=0

I am trying to set up a replica set with two nodes: Node0 and Node1. From Node0 I initialized a replica set named "rs0" and added Node1 to it. The problem is that Node1 is added as a primary node instead of a secondary node, and the final result is a replica set with two primary nodes.
This is the result of executing the rs.status() command from Node0
"set" : "rs0",
"date" : ISODate("2012-10-23T21:03:37Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "Node0:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 61185,
"optime" : Timestamp(1350967947000, 1),
"optimeDate" : ISODate("2012-10-23T04:52:27Z"),
"self" : true
},
{
"_id" : 1,
"name" : "Node1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 58270,
"optime" : Timestamp(1350956423000, 1),
"optimeDate" : ISODate("2012-10-23T01:40:23Z"),
"lastHeartbeat" : ISODate("2012-10-23T21:03:37Z"),
"pingMs" : 0
}
],
If I execute the same command from Node1, the only node listed is itself. Note that the pingMs is 0. Trying to add a third node or an arbiter gives similar results: each one is added as primary and the pingMs is always 0.

You mentioned that you ran rs.initiate() on both servers. It should only be run on one.
I suggest you start from scratch by deleting the dbpath directory for each node (back up the data first if the db was not empty). Then start all mongod processes, log into one of them, and call
rs.initiate()
rs.add(<other node 1>)
The other node gets the replica set configuration through the first one automatically. Repeat rs.add() for each additional node you want to add.
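For reference, a minimal sketch of that sequence, run from the mongo shell on Node0 only (each mongod already started with --replSet rs0; hostnames taken from your question, the arbiter host is a placeholder):
// run on Node0 only
rs.initiate({ _id: "rs0", members: [ { _id: 0, host: "Node0:27017" } ] })
rs.add("Node1:27017")    // repeat for each additional data-bearing node
rs.addArb("Node2:27017") // optional: an arbiter, if you want one (placeholder hostname)
rs.status()              // should now show exactly one PRIMARY and one SECONDARY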

I ran into the same situation, having wrongly run rs.initiate() on two instances. I solved this by shutting down the should-be second instance, removing the data directory and relaunching the instance. Upon restart, it is properly detected as a member of the replica set, is properly synchronized, and most importantly there is only one primary.
This operation should not be dangerous since, to my knowledge, a replica set replicates all the data to every node. To be safe, you could simply move the data directory aside after shutting down the second instance, so that you keep a backup in case anything goes wrong.
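If you want a clean shutdown before moving the data directory aside, a sketch of that from the shell of the instance you are about to wipe (paths and names are whatever yours are):
// connect to the extra "primary" that should have been a secondary
use admin
db.shutdownServer()
// then move its dbpath aside as a backup on the OS side and restart mongod with the same --replSet name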

Related

Mongos routing with ReadPreference=NEAREST

I'm having trouble diagnosing an issue where my Java application's requests to MongoDB are not getting routed to the Nearest replica, and I hope someone can help. Let me start by explaining my configuration.
The Configuration:
I am running a MongoDB instance in production that is a Sharded ReplicaSet. It is currently only a single shard (it hasn't gotten big enough yet to require a split). This single shard is backed by a 3-node replica set. 2 nodes of the replica set live in our primary data center. The 3rd node lives in our secondary datacenter, and is prohibited from becoming the Master node.
We run our production application simultaneously in both data centers, however the instance in our secondary data center operates in "read-only" mode and never writes data into MongoDB. It only serves client requests for reads of existing data. The objective of this configuration is to ensure that if our primary datacenter goes down, we can still serve client read traffic.
We don't want to waste all of this hardware in our secondary datacenter, so even in happy times we actively load balance a portion of our read-only traffic to the instance of our application running in the secondary datacenter. This application instance is configured with readPreference=NEAREST and is pointed at a mongos instance running on localhost (version 2.6.7). The mongos instance is obviously configured to point at our 3-node replica set.
From a mongos:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 4,
"minCompatibleVersion" : 4,
"currentVersion" : 5,
"clusterId" : ObjectId("52a8932af72e9bf3caad17b5")
}
shards:
{ "_id" : "shard1", "host" : "shard1/failover1.com:27028,primary1.com:27028,primary2.com:27028" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard1" }
{ "_id" : "MyApplicationData", "partitioned" : false, "primary" : "shard1" }
From the failover node of the replicaset:
shard1:SECONDARY> rs.status()
{
"set" : "shard1",
"date" : ISODate("2015-09-03T13:26:18Z"),
"myState" : 2,
"syncingTo" : "primary1.com:27028",
"members" : [
{
"_id" : 3,
"name" : "primary1.com:27028",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 674841,
"optime" : Timestamp(1441286776, 2),
"optimeDate" : ISODate("2015-09-03T13:26:16Z"),
"lastHeartbeat" : ISODate("2015-09-03T13:26:16Z"),
"lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
"pingMs" : 49,
"electionTime" : Timestamp(1433952764, 1),
"electionDate" : ISODate("2015-06-10T16:12:44Z")
},
{
"_id" : 4,
"name" : "primary2.com:27028",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 674846,
"optime" : Timestamp(1441286777, 4),
"optimeDate" : ISODate("2015-09-03T13:26:17Z"),
"lastHeartbeat" : ISODate("2015-09-03T13:26:18Z"),
"lastHeartbeatRecv" : ISODate("2015-09-03T13:26:18Z"),
"pingMs" : 53,
"syncingTo" : "primary1.com:27028"
},
{
"_id" : 5,
"name" : "failover1.com:27028",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 8629159,
"optime" : Timestamp(1441286778, 1),
"optimeDate" : ISODate("2015-09-03T13:26:18Z"),
"self" : true
}
],
"ok" : 1
}
shard1:SECONDARY> rs.conf()
{
"_id" : "shard1",
"version" : 15,
"members" : [
{
"_id" : 3,
"host" : "primary1.com:27028",
"tags" : {
"dc" : "primary"
}
},
{
"_id" : 4,
"host" : "primary2.com:27028",
"tags" : {
"dc" : "primary"
}
},
{
"_id" : 5,
"host" : "failover1.com:27028",
"priority" : 0,
"tags" : {
"dc" : "failover"
}
}
],
"settings" : {
"getLastErrorModes" : {"ACKNOWLEDGED" : {}}
}
}
The Problem:
The problem is that requests which hit this mongos in our secondary datacenter seem to be getting routed to a replica running in our primary datacenter, not the nearest node, which is running in the secondary datacenter. This incurs a significant amount of network latency and results in bad read performance.
My understanding is that the mongos is deciding which node in the replica set to route the request to, and it's supposed to honor the ReadPreference from my java driver's request. Is there a command I can run in the mongos shell to see the status of the replica set, including ping times to nodes? Or some way to see logging of incoming requests which indicates the node in the replicaSet that was chosen and why? Any advice at all on how to diagnose the root cause of my issue?
When the read preference is NEAREST, the system does not simply pick the member with the minimum network latency: it may well decide the primary is the nearest member if the network connection to it is good. However, the nearest read mode, when combined with a tag set, selects the matching member with the lowest network latency, and that member may be either the primary or a secondary. The behaviour of mongos with read preferences configured, in terms of network latency, is not explained very clearly in the official docs.
http://docs.mongodb.org/manual/core/read-preference/#replica-set-read-preference-tag-sets
hope this helps
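For what it's worth, you can also test the tag-set behaviour directly from a mongo shell connected to that mongos; a sketch using the dc: failover tag from your rs.conf() (the collection name is just a placeholder):
// applies to the current shell connection only
db.getMongo().setReadPref("nearest", [ { "dc": "failover" } ])
db.getSiblingDB("MyApplicationData").someCollection.findOne()  // someCollection is a placeholder
This makes it easier to verify which member is actually being selected.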
If I start mongos with the flag -vvvv (4x verbose), then I am presented with request-routing information in the log files, including information about the read preference used and the host to which requests were routed. For example:
2015-09-10T17:17:28.020+0000 [conn3] dbclient_rs say
using secondary or tagged node selection in shard1,
read pref is { pref: "nearest", tags: [ {} ] }
(primary : primary1.com:27028,
lastTagged : failover1.com:27028)
Despite the wording, when using nearest the absolute fastest member isn't necessarily the one chosen. Instead, a random member is chosen out of the pool of members whose latency falls within the calculated latency window.
The latency window is calculated by taking the fastest member's ping and adding replication.localPingThresholdMs, whose default is 15ms. You can read more about the algorithm here.
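As a rough illustration of that selection logic (this is just a sketch of the idea described above, not the actual mongos/driver code), using ping numbers similar to the ones in the rs.status() output:
// illustrative only: pick a random member within (fastest ping + localPingThresholdMs)
var members = [ { host: "primary1.com:27028",  ping: 49 },
                { host: "primary2.com:27028",  ping: 53 },
                { host: "failover1.com:27028", ping: 0  } ];
var fastest = Math.min.apply(null, members.map(function (m) { return m.ping; }));
var window  = fastest + 15;   // localPingThresholdMs defaults to 15 ms
var pool    = members.filter(function (m) { return m.ping <= window; });
var chosen  = pool[Math.floor(Math.random() * pool.length)];
// here only failover1 (0 ms) falls within the 15 ms window, so it would be the one chosen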
So what I do is I combine nearest with tags so that I can specify the member manually that I know is geographically closest.
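You can also override the read preference per query from the shell, which is handy when testing through the mongos (if your shell version supports the tag-set argument to cursor.readPref(); the collection name is a placeholder):
db.someCollection.find().readPref("nearest", [ { "dc": "failover" } ])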

MongoDB as windows service and setting up replicaSet

I have installed MongoDB and it is set up as a Windows service. When I try to set up a replica set, I get the error "Only one usage of each socket address (protocol/network address/port) is normally permitted. for socket: 0.0.0.0:27017".
So I stopped the Windows service and set up the replica set. The replica set is working fine now, but I don't see the Windows service up and running. Does that mean I can't have a replica set and the MongoDB service at the same time?
You can set up a replica set and run MongoDB as a Windows service at the same time. Since you have already set up a replica set, you are aware that you need a data directory and a log file for each replica set member. If you are running all replica set members on a single machine, each member must also be assigned a different port number. The sample provided here is for development or functional testing only: running all replica set members on a single machine creates a single point of failure, in addition to being a drag on performance.
Create a configuration file for each replica set member, including data directory, log file, port number, and replica set name. For example, I have a replica set of 3 members: one primary, MongoDB, running on port 27017, and two secondaries, MongoDB1 on port 37017 and MongoDB2 on port 47017. The replica set name is rs1.
Here is the configuration file for instance Mongodb.
# mongod.conf
# data directory
dbpath=C:\data\db
# log file
logpath=C:\mongodb-win32-i386-2.4.4\log\mongo.log
logappend=true
#port number
port=27017
#replica set name
replSet=rs1
Here is the configuration file for one of the secondaries.
# mongo.conf
# data directory
dbpath=C:\data\db2
# log file
logpath=C:\mongodb-win32-i386-2.4.4\log2\mongo.log
logappend=true
# port number
port=47017
# replica set name
replSet=rs1
The following link provides a complete list of configuration file options:
http://docs.mongodb.org/manual/reference/configuration-options/
Add all three MongoDB instances as Windows services. Since I did not specify the service name and service display name for the first instance, it will use the default service/service display name, MongoDB.
C:\mongodb-2.4.4\bin>mongod --config C:\mongodb-2.4.4\mongod.cfg --install
Install the other two MongoDB instances with service name and service display name.
C:\mongodb-2.4.4\bin>mongod --config C:\mongodb-2.4.4\mongod1.cfg --serviceName MongoDB1 --serviceDisplayName MongoDB1 --install
C:\mongodb-2.4.4\bin>mongod --config C:\mongodb-2.4.4\mongod2.cfg --serviceName MongoDB2 --serviceDisplayName MongoDB2 --install
Start all three instances of MongoDB.
C:\mongodb-2.4.4\bin>net start mongodb
The Mongo DB service is starting.
The Mongo DB service was started successfully.
C:\mongodb-2.4.4\bin>net start mongodb1
The MongoDB1 service is starting.
The MongoDB1 service was started successfully.
C:\mongodb-2.4.4\bin>net start mongodb2
The MongoDB2 service is starting.
The MongoDB2 service was started successfully.
Verify the status of all three Windows services using the sc command with the query option.
C:\mongodb-2.4.4\bin>sc query mongodb
C:\mongodb-2.4.4\bin>sc query mongodb1
C:\mongodb-2.4.4\bin>sc query mongodb2
Configure the replica set from the MongoDB shell. In the following example, the MongoDB instance on port 27017 will be the primary replica set member.
C:\mongodb-2.4.4\bin>mongo --port 27017
Set replica set configuration from MongoDB shell.
> config = { _id: "rs1", members:[
... { _id : 0, host : "localhost:27017"},
... { _id : 1, host : "localhost:37017"},
... { _id : 2, host : "localhost:47017"}
... ] }
At MongoDB shell, initialize the replica set and verify its status.
> rs.initiate(config)
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
> rs.status()
{
"set" : "rs1",
"date" : ISODate("2013-07-02T18:40:27Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 651,
"optime" : {
"t" : 1372790393,
"i" : 1
},
"optimeDate" : ISODate("2013-07-02T18:39:53Z"),
"self" : true
},
{
"_id" : 1,
"name" : "localhost:37017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 31,
"optime" : {
"t" : 1372790393,
"i" : 1
},
"optimeDate" : ISODate("2013-07-02T18:39:53Z"),
"lastHeartbeat" : ISODate("2013-07-02T18:40:26Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"syncingTo" : "localhost:27017"
},
{
"_id" : 2,
"name" : "localhost:47017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 31,
"optime" : {
"t" : 1372790393,
"i" : 1
},
"optimeDate" : ISODate("2013-07-02T18:39:53Z"),
"lastHeartbeat" : ISODate("2013-07-02T18:40:26Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"syncingTo" : "localhost:27017"
}
],
"ok" : 1
}
rs1:PRIMARY>
You can also check the status of the secondaries. Here I am connecting to one of the secondaries on port 37017.
C:\mongodb-2.4.4\bin>mongo --port 37017
The following prompt will appear in the MongoDB shell, showing the secondary status.
rs1:SECONDARY>
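If you then want to run queries against that secondary from this 2.4-era shell, you first have to allow secondary reads on the connection; for example (the database and collection here are just placeholders):
rs1:SECONDARY> rs.slaveOk()
rs1:SECONDARY> db.getSiblingDB("test").foo.find()  // placeholder database and collection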
A tutorial on deploying a replica set can be found here:
https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
It does not cover the Windows service part. For setting up a sharded replica set, try this blog, which shows how to set up a MongoDB cluster on your local machine and includes all the configuration files, along with a Windows script to create the required folder structure:
https://medium.com/#arun2pratap/set-up-mongodb-sharded-cluster-for-windows-with-just-a-double-click-6eedbb7b79e

Why does a member of MongoDB keep RECOVERING?

I set up a replica set with three members and one of them is an arbiter.
Once, when I restarted a member, it stayed in RECOVERING for a long time and never became SECONDARY again, even though the database was not large.
The status of the replica set looked like this:
rs:PRIMARY> rs.status()
{
"set" : "rs",
"date" : ISODate("2013-01-17T02:08:57Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "192.168.1.52:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 67968,
"optime" : Timestamp(1358388479000, 1),
"optimeDate" : ISODate("2013-01-17T02:07:59Z"),
"self" : true
},
{
"_id" : 2,
"name" : "192.168.1.50:29017",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 107,
"lastHeartbeat" : ISODate("2013-01-17T02:08:56Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "192.168.1.50:27017",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 58,
"optime" : Timestamp(1358246732000, 100),
"optimeDate" : ISODate("2013-01-15T10:45:32Z"),
"lastHeartbeat" : ISODate("2013-01-17T02:08:55Z"),
"pingMs" : 0,
"errmsg" : "still syncing, not yet to minValid optime 50f6472f:5d"
}
],
"ok" : 1
}
How should I fix this problem?
I had the exact same issue: a secondary member of the replica set stuck in RECOVERING mode.
Here is how to solve it:
stop the secondary mongod
delete all of the secondary's data files
start the secondary mongod again
It will start in the STARTUP2 state and replicate all the data from the primary.
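You can watch the member catch up from the primary's shell while this runs; for example:
rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr); })
// the wiped member should move through STARTUP2 (initial sync) and end up SECONDARY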
I've fixed the issue by following the below procedure.
Step 1:
Log in to a different node and remove the problem node from the MongoDB replica set, e.g.
rs.remove("10.x.x.x:27017")
Step 2:
Stop the MongoDB server on the problem node
systemctl stop mongodb.service
Step 3:
Create a new folder for the dbpath
mkdir /opt/mongodb/data/db1
Note: the existing path was /opt/mongodb/data/db
Step 4:
Modify dbPath in /etc/mongod.conf or the MongoDB YAML config file
dbPath: /opt/mongodb/data/db1
Step 5:
Start the mongodb service
systemctl start mongodb.service
Step 6:
Take a backup of the existing folder and remove it
mkdir /opt/mongodb/data/backup
mv /opt/mongodb/data/db/* /opt/mongodb/data/backup
tar -cvf /opt/mongodb/data/backup.tar.gz /opt/mongodb/data/backup
rm -rf /opt/mongodb/data/db/
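Presumably the node then has to be added back to the replica set from the primary so it can perform an initial sync into the new dbpath; something like:
rs.add("10.x.x.x:27017")   // run from the primary, using the same host:port removed in Step 1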
This will happen if replication has been broken for a while and the secondary no longer has enough oplog data to resume replication.
You will have to re-sync the secondary, either by replicating the data from scratch or by copying it from another server, and then resume replication.
Check mongodb documentation for this issue https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-auto-resync-stale-member

mongodb - All nodes in replica set are primary

I am trying to configure a replica set with two nodes but when I execute rs.add("node2") and then rs.status() both nodes are set to PRIMARY. Also when I run rs.status() on the other node the only node that appears is the local one.
Edit1:
rs.status() output:
{
"set" : "rs0",
"date" : ISODate("2012-09-22T01:01:12Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "node1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 70968,
"optime" : Timestamp(1348207012000, 1),
"optimeDate" : ISODate("2012-09-21T05:56:52Z"),
"self" : true
},
{
"_id" : 1,
"name" : "node2:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 68660,
"optime" : Timestamp(1348205568000, 1),
"optimeDate" : ISODate("2012-09-21T05:32:48Z"),
"lastHeartbeat" : ISODate("2012-09-22T01:01:11Z"),
"pingMs" : 0
}
],
"ok" : 1
}
Edit2: I tried doing the same thing with 3 different nodes and I got the same result (rs.status() says I have a replica set with three primary nodes). Is it possible that this problem is caused by some specific configuration of the network?
If you issue rs.initiate() from both of the members of the replica set before rs.add(), then both will come up as primary.
You should only use rs.initiate() on one of the members of the replica set, the one that you intend to be primary initially. Then you can rs.add() the other member to the replica set.
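In other words, a minimal correct sequence (hostnames from your question) is roughly:
// from the mongo shell on node1 only
rs.initiate()
rs.add("node2:27017")
rs.status()   // node1 should be PRIMARY and node2 SECONDARY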
The answer above does not explain how to fix it; I more or less got it done by trial and error.
I cleaned up the data directory (as in rm -rf *) on these PRIMARY nodes, all except one, restarted them, and then added them back. It seems to work.
Edit1
The nice little trick below did not seem to work for me, so I logged into the mongod console using mongo <hostname>:27018.
Here is what the shell looks like:
rs2:PRIMARY> rs.conf()
{
"_id" : "rs2",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "ip-10-159-42-911:27018"
}
]
}
I decided to change it to secondary. So,
rs2:PRIMARY> var c = {
... "_id" : "rs2",
... "version" : 1,
... "members" : [
... {
... "_id" : 1,
... "host" : "ip-10-159-42-911:27018",
... "priority": 0.5
... }
... ]
... }
rs2:PRIMARY> rs.reconfig(c, { "force": true})
Mon Nov 11 19:46:39.244 DBClientCursor::init call() failed
Mon Nov 11 19:46:39.245 trying reconnect to ip-10-159-42-911:27018
Mon Nov 11 19:46:39.245 reconnect ip-10-159-42-911:27018 ok
reconnected to server after rs command (which is normal)
rs2:SECONDARY>
Now it is secondary. I do not know if there is a better way. But this seems to work.
HTH

RS102 MongoDB on ReplicaSet

I have set up a replica set with 4 servers.
For testing purposes, I wrote a script to fill my database with ~150 million rows of photos using GridFS. My photos are around ~15KB. (Using GridFS for small files shouldn't be a problem, should it?)
After a few hours, there were around 50 million rows, but I had this message in the logs:
replSet error RS102 too stale to catch up, at least from 192.168.0.1:27017
And here is the replSet status :
rs.status();
{
"set" : "rsdb",
"date" : ISODate("2012-07-18T09:00:48Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.0.1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"optime" : {
"t" : 1342601552000,
"i" : 245
},
"optimeDate" : ISODate("2012-07-18T08:52:32Z"),
"self" : true
},
{
"_id" : 1,
"name" : "192.168.0.2:27018",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 64770,
"optime" : {
"t" : 1342539026000,
"i" : 5188
},
"optimeDate" : ISODate("2012-07-17T15:30:26Z"),
"lastHeartbeat" : ISODate("2012-07-18T09:00:47Z"),
"pingMs" : 0,
"errmsg" : "error RS102 too stale to catch up"
},
{
"_id" : 2,
"name" : "192.168.0.3:27019",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 64735,
"optime" : {
"t" : 1342539026000,
"i" : 5188
},
"optimeDate" : ISODate("2012-07-17T15:30:26Z"),
"lastHeartbeat" : ISODate("2012-07-18T09:00:47Z"),
"pingMs" : 0,
"errmsg" : "error RS102 too stale to catch up"
},
{
"_id" : 3,
"name" : "192.168.0.4:27020",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 65075,
"optime" : {
"t" : 1342539085000,
"i" : 3838
},
"optimeDate" : ISODate("2012-07-17T15:31:25Z"),
"lastHeartbeat" : ISODate("2012-07-18T09:00:46Z"),
"pingMs" : 0,
"errmsg" : "error RS102 too stale to catch up"
}
],
"ok" : 1
The set is still accepting data, but as 3 of my servers are effectively "DOWN", how should I proceed to repair them (something nicer than deleting the data and re-syncing, which will take ages but will work)?
And especially:
Is this happening because the script is too aggressive? Meaning that it would almost never happen in production?
You don't need to repair, simply perform a full resync.
On the secondary, you can:
stop the failed mongod
delete all data in the dbpath (including subdirectories)
restart it and it will automatically resynchronize itself
Follow the instructions here.
What's happened in your case is that your secondaries have become stale, i.e. there is no longer any common point between their oplogs and the oplog on the primary. Look at this document, which details the various statuses. The writes to the primary member have to be replicated to the secondaries, and your secondaries couldn't keep up until they eventually went stale. You will need to consider resizing your oplog.
Regarding oplog size, it depends on how much data you insert/update over time. I would choose a size which allows you many hours or even days of oplog.
Additionally, I'm not sure which O/S you are running. However, for 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space. For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog and for 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
How big are your records and how many do you want to insert? It depends on whether this data insertion is something typical or something abnormal that you were merely testing.
For example, at 2000 documents per second for documents of 1KB, that would net you 120MB per minute and your 5GB oplog would last about 40 minutes. This means if the secondary ever goes offline for 40 minutes or falls behind by more than that, then you are stale and have to do a full resync.
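To see the configured oplog size and the time window it actually covers on your deployment, the built-in shell helpers are handy (run against the primary):
db.printReplicationInfo()        // configured oplog size and "log length start to end"
db.printSlaveReplicationInfo()   // how far behind each secondary currently is
The "log length start to end" value is effectively how long a secondary can be down or lagging before it goes stale and needs a full resync.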
I recommend reading the Replica Set Internals document here. You have 4 members in your replica set, which is not recommended. You should have an odd number of voters for the election (of the primary), so you either need to add an arbiter, add another secondary, or remove one of your secondaries.
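For example, if you keep all four data-bearing members, you could add an arbiter on a small separate host to make the number of voters odd (the host:port below is just a placeholder):
rs.addArb("arbiter.example.com:27021")   // arbiters hold no data, they only vote in elections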
Finally, here's a detailed document on RS administration.