I'm attempting to set up a MongoDB test replica set. The problem is that I can't find an error message anywhere, and one of the nodes remains permanently in DOWN or UNKNOWN status.
Here is my rs.status() output from the primary:
{
"set" : "rs0",
"date" : ISODate("2014-05-08T00:41:11Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "mongo1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3319,
"optime" : Timestamp(1399509356, 1),
"optimeDate" : ISODate("2014-05-08T00:35:56Z"),
"electionTime" : Timestamp(1399506359, 1),
"electionDate" : ISODate("2014-05-07T23:45:59Z"),
"self" : true
},
{
"_id" : 2,
"name" : "mongo3:30000",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 319,
"lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
"lastHeartbeatRecv" : ISODate("2014-05-08T00:41:11Z"),
"pingMs" : 2,
"syncingTo" : "mongo1:27017"
},
{
"_id" : 3,
"name" : "mongo2:27018",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 315,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 2,
"lastHeartbeatMessage" : "still initializing"
}
],
"ok" : 1
}
Here is the rs.conf() output from the primary:
{
"_id" : "rs0",
"version" : 12,
"members" : [
{
"_id" : 0,
"host" : "mongo1:27017"
},
{
"_id" : 2,
"host" : "mongo3:30000",
"arbiterOnly" : true
},
{
"_id" : 3,
"host" : "mongo2:27018"
}
]
}
The issue is mongo2:27018. I've tried adding and removing it. I've tried wiping the entire box and re-installing CentOS + Mongo. From any of the 3 boxes I can connect with mongo to the other 2; so from mongo1:27017 I can type mongo mongo2:27018 and it has no problems. All 3 boxes have the same configuration, which I've double, triple, and quadruple checked in their /etc/hosts.
The only debugging information I can find anywhere is the following block on the problematic node:
2014-05-08T02:45:51.763+0200 [initandlisten] connection accepted from 10.0.2.2:48720 #50 (2 connections now open)
2014-05-08T02:46:00.593+0200 [rsStart] trying to contact mongo1:27017
2014-05-08T02:46:00.602+0200 [rsStart] trying to contact mongo3:30000
2014-05-08T02:46:00.605+0200 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
Any guidance is appreciated; I've been struggling with this for 5 hours now.
The eventual issue we discovered is that the hostname for each replica node needs to resolve correctly not only between the nodes but also from each node to itself!
For example, due to some port forwarding going on, mongo1 could reach mongo2 at mongo2:27018 and mongo3 could reach mongo2 at mongo2:27018, but mongo2 could not reach itself at mongo2:27018 (since it was actually listening on 27017). The reason it worked from the other boxes was that mongo1 and mongo3 had an alias for mongo2 that forwarded port 27018 to 27017.
So basically, unless each node can reach itself AND the other nodes at the hostname and port in the config, it will not work!
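A quick way to verify this (a sketch; the host names and ports come from the rs.conf() above, adjust them to your own config) is to run the same connection test from every node against every member entry, including the node's own entry:
# run all three of these on each of mongo1, mongo2 and mongo3;
# every one must succeed, including the test against the node's own hostname:port
mongo --host mongo1 --port 27017 --eval "db.adminCommand({ ping: 1 })"
mongo --host mongo2 --port 27018 --eval "db.adminCommand({ ping: 1 })"
mongo --host mongo3 --port 30000 --eval "db.adminCommand({ ping: 1 })"
If any of these fails on the node it is run from, that node cannot participate in the replica set under that hostname.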
I have a Meteor project with two replicated MongoDB servers.
It works pretty well, but when I tested some DB failure simulations, the MongoDB replica set's primary election and multi-oplog setup behaved differently from what I expected.
Here are my Meteor and MongoDB settings.
- MongoDB -
rs.conf()
{
"_id" : "meteor",
"version" : 6,
"members" : [
{
"_id" : 0,
"host" : "hostname1:27017",
"priority" : 2.5
},
{
"_id" : 1,
"host" : "hostname2:27017",
"priority" : 1.5
}
]
}
rs.status()
{
"set" : "meteor",
"date" : ISODate("2014-10-08T10:40:38Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "hostname1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 462,
"optime" : Timestamp(1412764634, 2),
"optimeDate" : ISODate("2014-10-08T10:37:14Z"),
"electionTime" : Timestamp(1412764612, 1),
"electionDate" : ISODate("2014-10-08T10:36:52Z"),
"self" : true
},
{
"_id" : 1,
"name" : "hostname2:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 234,
"optime" : Timestamp(1412764634, 2),
"optimeDate" : ISODate("2014-10-08T10:37:14Z"),
"lastHeartbeat" : ISODate("2014-10-08T10:40:37Z"),
"lastHeartbeatRecv" : ISODate("2014-10-08T10:40:37Z"),
"pingMs" : 141,
"syncingTo" : "hostname2:27017"
}
],
"ok" : 1
}
- Meteor Environment -
MONGO_OPLOG_URL=mongodb://hostname1:27017,hostname2:27017/local MONGO_URL=mongodb://hostname1:27017/sports meteor
I assumed that if I terminated hostname1, then hostname2 would be elected primary, so my Meteor server could redirect its oplog tailing to it.
But when I terminated hostname1, hostname2 stayed a secondary and Meteor couldn't find any oplog server.
And even when I terminated hostname2, hostname1, which had been primary, stepped down to secondary, so Meteor couldn't find any oplog server either.
I think I'm missing something big, but I can't figure it out.
Does anyone have any ideas about this?
Thanks in advance.
Your MONGO_URL needs to include both members of the replica set. Meteor is built on the node driver for mongo and thus has to be aware of both members of the replica set in order for failover to occur gracefully.
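For example (a sketch based on the hostnames above; the replicaSet name comes from the rs.conf() shown earlier), both URLs can list both members and name the set so the driver knows it is talking to a replica set and can fail over:
MONGO_URL="mongodb://hostname1:27017,hostname2:27017/sports?replicaSet=meteor" \
MONGO_OPLOG_URL="mongodb://hostname1:27017,hostname2:27017/local?replicaSet=meteor" \
meteor
Note also that with only two voting members, the surviving member cannot form a majority on its own when the other goes down, so it steps down to SECONDARY, which matches what you observed; an arbiter or a third data-bearing member is needed for automatic failover.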
I'm trying to deploy a sharded cluster in MongoDB.
I followed the tutorial here:
http://docs.mongodb.org/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/
First I deployed a replica set with test data on a separate machine with IP 192.168.1.212, and this is the status after I finished deploying it:
firstset:PRIMARY> rs.status();
{
"set" : "firstset",
"date" : ISODate("2014-03-24T10:54:06Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "localhost:10001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 117,
"optime" : Timestamp(1395650164, 10507),
"optimeDate" : ISODate("2014-03-24T08:36:04Z"),
"self" : true
},
{
"_id" : 2,
"name" : "localhost:10002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 65,
"optime" : Timestamp(1395650164, 10507),
"optimeDate" : ISODate("2014-03-24T08:36:04Z"),
"lastHeartbeat" : ISODate("2014-03-24T10:54:05Z"),
"lastHeartbeatRecv" : ISODate("2014-03-24T10:54:05Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: localhost:10001",
"syncingTo" : "localhost:10001"
},
{
"_id" : 3,
"name" : "localhost:10003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 51,
"optime" : Timestamp(1395650164, 10507),
"optimeDate" : ISODate("2014-03-24T08:36:04Z"),
"lastHeartbeat" : ISODate("2014-03-24T10:54:05Z"),
"lastHeartbeatRecv" : ISODate("2014-03-24T10:54:04Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: localhost:10001",
"syncingTo" : "localhost:10001"
}
],
"ok" : 1
}
Then I deployed three config servers on separate machines and ran a mongos instance on another machine.
Then I wanted to add the replica set as a shard using the following command:
sh.addShard("firstset/192.168.1.212:10001,192.168.1.212:10002,192.168.1.212:10003")
But I get the following error
mongos> sh.addShard('firstset/192.168.1.212:10001,192.168.1.212:10002,192.168.1.212:10003');
{
"ok" : 0,
"errmsg" : "couldn't connect to new shard ReplicaSetMonitor no master found for set: firstset"
}
I found the solution to this problem with Sammaye's help.
The problem was that when the replica set is initiated you should take care of which hostnames or IPs you use, because when the router (mongos) tries to connect to the replica set, it reads the replica set's configuration.
So if you call rs.initiate() without explicitly setting your configuration, the configuration will look like this:
{
"_id" : "firstset",
"version" : 1,
"members" : [
{
"_id" : 1,
"host" : "localhost:10001"
},
{
"_id" : 2,
"host" : "localhost:10002"
},
{
"_id" : 3,
"host" : "localhost:10003"
}
]
}
So the router will try to look for the replica set members at localhost and won't find them, because they are on another machine.
If you use different machines for testing, initialize the replica set manually as follows:
rsconf ={
"_id" : "firstset",
"version" : 1,
"members" : [
{
"_id" : 1,
"host" : "machine_ip:machine_port"
},
{
"_id" : 2,
"host" : "machine_ip:machine_port"
},
{
"_id" : 3,
"host" : "machine_ip:machine_port"
}
]
}
rs.initiate(rsconf);
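For example, with the addresses used in this question (a sketch; substitute your own IPs and ports, and make sure they are reachable from the mongos machine), the configuration and the shard registration would look like this:
rsconf = {
    "_id" : "firstset",
    "version" : 1,
    "members" : [
        { "_id" : 1, "host" : "192.168.1.212:10001" },
        { "_id" : 2, "host" : "192.168.1.212:10002" },
        { "_id" : 3, "host" : "192.168.1.212:10003" }
    ]
}
rs.initiate(rsconf);
// then, from the mongos shell:
sh.addShard("firstset/192.168.1.212:10001,192.168.1.212:10002,192.168.1.212:10003");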
Also if you use either “localhost” or “127.0.0.1” as the host identifier, then you must use “localhost” or “127.0.0.1” for all host settings for any MongoDB instances in the cluster. This applies to both the host argument to addShard and the value to the mongos --configdb run time option. If you mix localhost addresses with remote host address, MongoDB will produce errors.
I'm trying to implement MongoDB replication with 4 nodes running as virtual machines.
Info: I use VirtualBox, and the VMs communicate with each other through the host-only adapter. Communication has been tested, and every node can ping the other nodes.
This is the output of the rs.conf() command:
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 4,
"members" : [
{
"_id" : 0,
"host" : "192.168.56.1:27017"
},
{
"_id" : 1,
"host" : "192.168.56.101:27018"
},
{
"_id" : 2,
"host" : "192.168.56.102:27019"
},
{
"_id" : 3,
"host" : "192.168.56.103:27020"
}
]
}
This is the output of the rs.status() command:
rs0:PRIMARY> rs.status()
{
"set" : "rs0",
"date" : ISODate("2013-12-14T16:09:36Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.56.1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3207,
"optime" : Timestamp(1387034904, 1),
"optimeDate" : ISODate("2013-12-14T15:28:24Z"),
"self" : true
},
{
"_id" : 1,
"name" : "192.168.56.101:27018",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2542,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:35Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 1,
"lastHeartbeatMessage" : "still initializing"
},
{
"_id" : 2,
"name" : "192.168.56.102:27019",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2497,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:35Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "still initializing"
},
{
"_id" : 3,
"name" : "192.168.56.103:27020",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2472,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:36Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 1,
"lastHeartbeatMessage" : "still initializing"
}
],
"ok" : 1
}
From the last command it seems the replica set is still initializing, but looking at the 4 instances of mongod, something does not seem to be working properly.
I'm wondering why all the nodes try to contact only the primary instance and ignore the others, and why, when the connection is accepted, they try to contact the same node again and fail, saying "Couldn't load config yet". I really need a push to understand the problem; if any other command output or information is needed, just let me know and I'll post it.
Thanks in advance for any help.
For anyone else stumbling on this:
Make sure your primary's hostname is resolvable from the members, or alternatively make sure your replica set config uses the primary's IP instead of a hostname or FQDN.
By default mongo uses the primary's hostname in the "host" configuration field, so all the other members will fail to communicate with it if they cannot resolve that name (for example via /etc/hosts).
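For example (a sketch with a hypothetical hostname; adjust the IP and name to your setup), every member would need an entry like this in its /etc/hosts, or working DNS, so that the name in the config resolves:
# /etc/hosts on every member
192.168.56.1    mongo-primary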
Solved: I just checked the connections between all members with a mongo connection, like rubenfa suggested in his comment:
mongo --host 192.168.56.103 --port 27020
and every connection between the members works properly using the host-only adapter.
The main problem I posted at the beginning was that I hadn't checked the 3 secondary nodes for leftover local databases created by previous attempts to configure the replica set. I had only checked the primary node and deleted the local db there, without checking the other nodes.
So remember to delete the local db from all the nodes before trying to reconfigure the replica set.
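One way to do that (a sketch, assuming you can restart each mongod temporarily without the --replSet option; the port is an example from this setup, use each node's own port) is:
# on each node, with mongod restarted WITHOUT --replSet
mongo --port 27018
> use local
> db.dropDatabase()
> exit
# then restart mongod with --replSet rs0 and run rs.initiate()/rs.add() again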
I have deployed a MongoDB replica set with 3 members (one primary, one secondary, and an arbiter).
The server mongoA (primary) is located on one Linux machine, and mongoB (secondary) and mongoB (arbiter) are located on another Linux machine.
If I start the primary on the mongoB Linux box and start the secondary and arbiter on mongoA, I can't see any of my data (collections) under my db on the mongoB Linux machine, even though the mongo shell shows PRIMARY; the other way around works fine.
The logs aren't showing any errors.
Please let me know if this is expected behaviour on the mongoB machine.
These are the statistics of my server, collected from mongoB:
nat:PRIMARY> rs.conf()
{
"_id" : "nat",
"version" : 18,
"members" : [
{
"_id" : 0,
"host" : "mongoA:27017"
},
{
"_id" : 1,
"host" : "mongoB:27018"
},
{
"_id" : 2,
"host" : "mongoB:27019",
"arbiterOnly" : true
}
]
}
nat:PRIMARY> rs.status()
{
"set" : "nat",
"date" : ISODate("2013-11-05T09:57:30Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "mongoA:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 216,
"optime" : Timestamp(1383315218, 1),
"optimeDate" : ISODate("2013-11-01T14:13:38Z"),
"self" : true
},
{
"_id" : 1,
"name" : "mongoB:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 150,
"optime" : Timestamp(1383315218, 1),
"optimeDate" : ISODate("2013-11-01T14:13:38Z"),
"lastHeartbeat" : ISODate("2013-11-05T09:57:28Z"),
"lastHeartbeatRecv" : ISODate("2013-11-05T09:57:28Z"),
"pingMs" : 0,
"syncingTo" : "mongoA:27017"
},
{
"_id" : 2,
"name" : "mongoB:27019",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 134,
"lastHeartbeat" : ISODate("2013-11-05T09:57:28Z"),
"lastHeartbeatRecv" : ISODate("2013-11-05T09:57:29Z"),
"pingMs" : 0
}
],
"ok" : 1
}
In your rs.status() mongoA:27017 is primary and mongoB:27018 is secondary. The status of the secondary is "syncingTo" : "mongoA:27017" - which means that the secondary is still syncing to the primary (mongoA).
You need to wait for the servers to sync and then try again.
Your rs.status() output shows that the primary and secondary should contain the same data, because the value of optime is identical on both nodes ("optime" refers to the latest oplog entry that each node has applied):
"optime" : Timestamp(1383315218, 1),
This means you should be seeing the data on mongoA and mongoB.
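As a quick sanity check (assuming a mongo shell of roughly this vintage), you can also see each secondary's sync status and lag directly from the primary:
nat:PRIMARY> rs.printSlaveReplicationInfo()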
Try running the following command to see which databases you have:
show dbs
If you see your application database then you should be good.
(Otherwise, please explain how you came to the conclusion that the data isn't there.)
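For example, on each machine (myappdb is a placeholder; substitute your actual database name):
show dbs
use myappdb
show collections
If the shell you are connected to happens to be on a member that is currently SECONDARY, run rs.slaveOk() first in that shell, otherwise queries against a secondary are rejected.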
Ever since I added a new database to MongoDB, it has stopped syncing to the replica set secondary instance, i.e. the database name appears when running show dbs yet shows as (empty).
There is a repeating error in the log file on the secondary, which also appears in rs.status() as:
"errmsg" : "syncTail: ...
Below is the output of rs.status() on the primary:
PRIMARY> rs.status()
{
"set" : "contoso_db_set",
"date" : ISODate("2012-11-01T13:05:22Z"),
"myState" : 1,
"syncingTo" : "dbuse1d.int.contoso.com:27017",
"members" : [
{
"_id" : 0,
"name" : "dbuse1a.int.contoso.com:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"optime" : {
"t" : 1351775119000,
"i" : 2
},
"optimeDate" : ISODate("2012-11-01T13:05:19Z"),
"self" : true
},
{
"_id" : 1,
"name" : "dbuse1d.int.contoso.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4108139,
"optime" : {
"t" : 1351405977000,
"i" : 12
},
"optimeDate" : ISODate("2012-10-28T06:32:57Z"),
"lastHeartbeat" : ISODate("2012-11-01T13:05:21Z"),
"pingMs" : 1,
"errmsg" : "syncTail: 10068 invalid operator: $oid, syncing: { ts: Timestamp 1351576230000|1, h: -2878874165043062831, op: \"i\", ns: \"new_contoso_db.accounts\", o: { _id: { $oid: \"4f79a1d1d4941d3755000000\" }, delegation: [ \"nE/UhsnmZ1BCCB+tiiS8fjjNwkxbND5PwESsaXeuaJw=\""
},
{
"_id" : 2,
"name" : "dbuse1a.int.contoso.com:8083",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 10671267,
"optime" : {
"t" : 0,
"i" : 0
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2012-11-01T13:05:21Z"),
"pingMs" : 0
}
],
"ok" : 1
}
PRIMARY>
The solution I found was to delete the entire data directory on the secondary:
# rm -rf /data/db
# mkdir -p /data/db
and then restart mongod and set up the replica set again.
See more in Mongo's docs:
What to do on a RS102 sync error
If one of your members has been offline and is now too far behind to catch up, you will need to resync. There are a number of ways to do this.
Perform a full resync. If you stop the failed mongod, delete all data in the dbpath (including subdirectories), and restart it, it will automatically resynchronize itself. Obviously it would be better/safer to back up the data first. If disk space is adequate, simply move it to a backup location on the machine if appropriate. Resyncing may take a long time if the database is huge or the network slow – even idealized, one terabyte of data would require three hours to transmit over gigabit ethernet.
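In practice, a resync on the stale member looks roughly like this (a sketch; the service name, data path, and backup location are examples, use your actual init system and dbpath):
# on the stale secondary
sudo service mongod stop               # stop the lagging mongod
mv /data/db /data/db.bak               # keep a backup instead of deleting outright
mkdir -p /data/db
sudo service mongod start              # on restart it performs a full initial sync from the primary
Once the initial sync finishes, the member should come back as SECONDARY in rs.status().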