Ever since I added a new database to MongoDB, it has stopped syncing to the replica set's secondary instances: the database name appears when running show dbs, yet it shows as (empty).
There is a repeating error in the log file on the secondary, which also appears in the rs.status() output:
"errmsg" : "syncTail: ...
Below is the output of rs.status() on the primary:
PRIMARY> rs.status()
{
"set" : "contoso_db_set",
"date" : ISODate("2012-11-01T13:05:22Z"),
"myState" : 1,
"syncingTo" : "dbuse1d.int.contoso.com:27017",
"members" : [
{
"_id" : 0,
"name" : "dbuse1a.int.contoso.com:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"optime" : {
"t" : 1351775119000,
"i" : 2
},
"optimeDate" : ISODate("2012-11-01T13:05:19Z"),
"self" : true
},
{
"_id" : 1,
"name" : "dbuse1d.int.contoso.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 4108139,
"optime" : {
"t" : 1351405977000,
"i" : 12
},
"optimeDate" : ISODate("2012-10-28T06:32:57Z"),
"lastHeartbeat" : ISODate("2012-11-01T13:05:21Z"),
"pingMs" : 1,
"errmsg" : "syncTail: 10068 invalid operator: $oid, syncing: { ts: Timestamp 1351576230000|1, h: -2878874165043062831, op: \"i\", ns: \"new_contoso_db.accounts\", o: { _id: { $oid: \"4f79a1d1d4941d3755000000\" }, delegation: [ \"nE/UhsnmZ1BCCB+tiiS8fjjNwkxbND5PwESsaXeuaJw=\""
},
{
"_id" : 2,
"name" : "dbuse1a.int.contoso.com:8083",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 10671267,
"optime" : {
"t" : 0,
"i" : 0
},
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2012-11-01T13:05:21Z"),
"pingMs" : 0
}
],
"ok" : 1
}
PRIMARY>
The solution I found was to delete the entire data directory on the secondary:
# rm -rf /data/db
# mkdir -p /data/db
And then restarting mongod and setting up the replica set again.
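For completeness, a sketch of the restart; the data path matches the commands above, and the replica set name contoso_db_set comes from the rs.status() output (adjust both to your setup):
# mongod --dbpath /data/db --replSet contoso_db_set
Once the mongod is back up it rejoins the set and resynchronizes automatically, as long as the member is still listed in the replica set config.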
See more in MongoDB's docs:
What to do on a RS102 sync error
If one of your members has been offline and is now too far behind to
catch up, you will need to resync. There are a number of ways to do
this.
Perform a full resync. If you stop the failed mongod, delete all data in the dbpath (including subdirectories), and restart it, it will
automatically resynchronize itself. Obviously it would be better/safer
to back up the data first. If disk space is adequate, simply move it
to a backup location on the machine if appropriate. Resyncing may take
a long time if the database is huge or the network slow – even idealized, one terabyte of data would require three hours to transmit over gigabit ethernet.
Related
I have a Meteor project with two replicated MongoDB servers.
It works pretty well, but when I simulated some DB failures for testing,
the replica set's primary election and multi-oplog setup behaved differently from what I expected.
Here are my Meteor and MongoDB settings.
- MongoDB -
rs.conf()
{
"_id" : "meteor",
"version" : 6,
"members" : [
{
"_id" : 0,
"host" : "hostname1:27017",
"priority" : 2.5
},
{
"_id" : 1,
"host" : "hostname2:27017",
"priority" : 1.5
}
]
}
rs.status()
{
"set" : "meteor",
"date" : ISODate("2014-10-08T10:40:38Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "hostname1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 462,
"optime" : Timestamp(1412764634, 2),
"optimeDate" : ISODate("2014-10-08T10:37:14Z"),
"electionTime" : Timestamp(1412764612, 1),
"electionDate" : ISODate("2014-10-08T10:36:52Z"),
"self" : true
},
{
"_id" : 1,
"name" : "hostname2:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 234,
"optime" : Timestamp(1412764634, 2),
"optimeDate" : ISODate("2014-10-08T10:37:14Z"),
"lastHeartbeat" : ISODate("2014-10-08T10:40:37Z"),
"lastHeartbeatRecv" : ISODate("2014-10-08T10:40:37Z"),
"pingMs" : 141,
"syncingTo" : "hostname2:27017"
}
],
"ok" : 1
}
- Meteor Environment -
MONGO_OPLOG_URL=mongodb://hostname1:27017,hostname2:27017/local MONGO_URL=mongodb://hostname1:27017/sports meteor
I assumed that if I terminated DB hostname1, then hostname2 would be elected primary, so my Meteor server could redirect its oplog URL to it.
But when I terminated hostname1, hostname2 stayed secondary and Meteor couldn't find any oplog server.
And even when I terminated hostname2, hostname1, which had been primary, stepped down to secondary, so Meteor couldn't find any oplog server either.
I think I've missed something big, but I can't figure it out.
Does anyone have any idea about this?
Thanks in advance.
Your MONGO_URL needs to include both members of the replica set. Meteor is built on the node driver for mongo and thus has to be aware of both members of the replica set in order for failover to occur gracefully.
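A minimal sketch of the adjusted start command, assuming the replica set name meteor from the rs.conf() above (sports is the database from the question):
MONGO_URL=mongodb://hostname1:27017,hostname2:27017/sports?replicaSet=meteor MONGO_OPLOG_URL=mongodb://hostname1:27017,hostname2:27017/local meteor
With both members in the seed list, the driver can discover whichever member is currently primary instead of being pinned to hostname1.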
I'm trying to implement MongoDB replication with 4 nodes, each running as a virtual machine.
Info: I use VirtualBox, and the virtual machines communicate with each other through the host-only adapter. Communication has been tested, and every node can ping the other nodes.
This is the output of the rs.conf() command:
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 4,
"members" : [
{
"_id" : 0,
"host" : "192.168.56.1:27017"
},
{
"_id" : 1,
"host" : "192.168.56.101:27018"
},
{
"_id" : 2,
"host" : "192.168.56.102:27019"
},
{
"_id" : 3,
"host" : "192.168.56.103:27020"
}
]
}
This is the output of the rs.status() command:
rs0:PRIMARY> rs.status()
{
"set" : "rs0",
"date" : ISODate("2013-12-14T16:09:36Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.56.1:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3207,
"optime" : Timestamp(1387034904, 1),
"optimeDate" : ISODate("2013-12-14T15:28:24Z"),
"self" : true
},
{
"_id" : 1,
"name" : "192.168.56.101:27018",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2542,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:35Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 1,
"lastHeartbeatMessage" : "still initializing"
},
{
"_id" : 2,
"name" : "192.168.56.102:27019",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2497,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:35Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "still initializing"
},
{
"_id" : 3,
"name" : "192.168.56.103:27020",
"health" : 1,
"state" : 6,
"stateStr" : "UNKNOWN",
"uptime" : 2472,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2013-12-14T16:09:36Z"),
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
"pingMs" : 1,
"lastHeartbeatMessage" : "still initializing"
}
],
"ok" : 1
}
From the last command it seems the replica set is still initializing, but looking at the 4 instances of mongod, it seems like something is not working properly.
I'm wondering why all the nodes try to contact only the primary instance, ignoring the others. And why, when the connection is accepted, does each node try to contact the same node again, failing because it says "Couldn't load config yet"? I really need a push toward understanding the problem; if any other command output or information is needed, just let me know and I'll post it.
Thanks in advance for any help.
For anyone else stumbling on this:
Make sure your primary's hostname is resolvable from the other members, or that your replica set config uses the primary's IP address instead of its hostname or FQDN.
By default mongod puts the primary's hostname in the "host" configuration field,
so all the other members fail to communicate with it if that name isn't in their /etc/hosts file.
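If you would rather keep IP addresses, as this host-only setup does, one option is to rewrite a member's host field from the primary's shell. A sketch, using the primary's IP from the rs.conf() above:
cfg = rs.conf()
cfg.members[0].host = "192.168.56.1:27017"
rs.reconfig(cfg)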
Solved. I just checked the connections between all members with a mongo shell connection, as rubenfa suggested in his comment:
mongo --host 192.168.56.103 --port 27020
Every connection between the members works properly over the host-only adapter.
The main problem I posted at the beginning was that I hadn't checked, on the 3 secondary nodes, whether there were leftover local databases created by previous attempts to configure the replica set. I had only checked the primary node and deleted the local db there, without checking the other nodes.
Also, remember to delete the local db on all the nodes before trying to reconfigure the replica set.
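A sketch of that cleanup, only on nodes you intend to re-initialize: restart each mongod without the --replSet option, connect with the mongo shell, drop the leftover local database, then restart with --replSet and reconfigure:
> use local
> db.dropDatabase()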
I have deployed a MongoDB replica set with 3 servers (one primary, one secondary, and an arbiter).
The server mongoA (primary) is located on one Linux machine, and
mongoB (secondary) and mongoB (arbiter) are located on another Linux machine.
If I start the primary script on the mongoB Linux box and start the secondary and arbiter on mongoA, I can't see any of my data (collections) under my db on the mongoB Linux machine, even though the mongo shell shows PRIMARY; the other way around works fine.
The logs aren't showing any errors.
Please let me know if this is expected behaviour on the mongoB machine.
These are the statistics of my servers, collected from mongoB:
nat:PRIMARY> rs.conf()
{
"_id" : "nat",
"version" : 18,
"members" : [
{
"_id" : 0,
"host" : "mongoA:27017"
},
{
"_id" : 1,
"host" : "mongoB:27018"
},
{
"_id" : 2,
"host" : "mongoB:27019",
"arbiterOnly" : true
}
]
}
nat:PRIMARY> rs.status()
{
"set" : "nat",
"date" : ISODate("2013-11-05T09:57:30Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "mongoA:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 216,
"optime" : Timestamp(1383315218, 1),
"optimeDate" : ISODate("2013-11-01T14:13:38Z"),
"self" : true
},
{
"_id" : 1,
"name" : "mongoB:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 150,
"optime" : Timestamp(1383315218, 1),
"optimeDate" : ISODate("2013-11-01T14:13:38Z"),
"lastHeartbeat" : ISODate("2013-11-05T09:57:28Z"),
"lastHeartbeatRecv" : ISODate("2013-11-05T09:57:28Z"),
"pingMs" : 0,
"syncingTo" : "mongoA:27017"
},
{
"_id" : 2,
"name" : "mongoB:27019",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 134,
"lastHeartbeat" : ISODate("2013-11-05T09:57:28Z"),
"lastHeartbeatRecv" : ISODate("2013-11-05T09:57:29Z"),
"pingMs" : 0
}
],
"ok" : 1
}
In your rs.status(), mongoA:27017 is the primary and mongoB:27018 is the secondary. The secondary's status shows "syncingTo" : "mongoA:27017", which means that the secondary is still syncing from the primary (mongoA).
You need to wait for the servers to sync and then try again.
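To watch the secondary catch up, one option in shells of this era is the following helper, run against the primary (a sketch; the exact output format varies by MongoDB version):
nat:PRIMARY> db.printSlaveReplicationInfo()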
Your rs.status() output shows that the primary and secondary should contain the same data, because the value of optime is identical on both nodes ("optime" refers to the last oplog entry that each node has applied):
"optime" : Timestamp(1383315218, 1),
This means you should be seeing the data on mongoA and mongoB.
Try running the following command to see what databases you have:
show dbs
If you see your application database, then you should be good.
(Otherwise, please explain how you came to the conclusion that the data isn't there.)
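For reference, a minimal session to verify the data directly on the secondary; the database and collection names below are placeholders. Note that reading from a secondary in the shell requires rs.slaveOk() first:
$ mongo --host mongoB --port 27018
nat:SECONDARY> rs.slaveOk()
nat:SECONDARY> show dbs
nat:SECONDARY> // "yourdb" and "yourcoll" below are placeholder names
nat:SECONDARY> use yourdb
nat:SECONDARY> db.yourcoll.count()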
If a mongo node is offline for too long and the oplog wraps before it comes back up then it can get stuck in a stale state and require manual intervention. How can I recognise that state from the replica set status document? Will it stick in state 3, which is also used by nodes in maintenance mode and presumably by nodes that can catch up? If so, how can I tell the difference?
From http://docs.mongodb.org/manual/reference/replica-status/:
Number State
0 Starting up, phase 1 (parsing configuration)
1 Primary
2 Secondary
3 Recovering (initial syncing, post-rollback, stale members)
4 Fatal error
5 Starting up, phase 2 (forking threads)
6 Unknown state (the set has never connected to the member)
7 Arbiter
8 Down
9 Rollback
10 Removed
It will be in state 3, Recovering. To recognize the stale state specifically you need to look for the errmsg field. When stale, the secondary in question will have an errmsg like this:
"errmsg" : "error RS102 too stale to catch up"
In terms of a full output, it would look something like this:
rs.status()
{
"set" : "testReplSet",
"date" : ISODate("2013-01-29T01:39:38Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "hostname:31000",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 507,
"optime" : Timestamp(1359423456000, 893),
"optimeDate" : ISODate("2013-01-29T01:37:36Z"),
"self" : true
},
{
"_id" : 1,
"name" : "hostname:31001",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 483,
"optime" : Timestamp(1359423456000, 893),
"optimeDate" : ISODate("2013-01-29T01:37:36Z"),
"lastHeartbeat" : ISODate("2013-01-29T01:39:37Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "hostname:31002",
"health" : 1,
"state" : 3,
"stateStr" : "RECOVERING",
"uptime" : 4,
"optime" : Timestamp(1359423087000, 1),
"optimeDate" : ISODate("2013-01-29T01:31:27Z"),
"lastHeartbeat" : ISODate("2013-01-29T01:39:38Z"),
"pingMs" : 0,
"errmsg" : "error RS102 too stale to catch up"
}
],
"ok" : 1
}
And finally, a code snippet to print out the error only, if it exists, from the shell:
rs.status().members.forEach(function (rsmember) {
    if (rsmember.errmsg) {
        print(rsmember.errmsg);
    }
});
I have a replica set whose primary I am trying to upgrade to a machine with more memory and upgraded disk space. So I RAID'd a couple of disks together on the new primary, rsync'd the data from a secondary, and added it to the replica set. After checking rs.status(), I noticed that all the secondaries are about 12 hours behind the primary. So when I try to force the new server into the primary spot it won't work, because it is not up to date.
This seems like a big issue, because if the primary fails, we are at least 12 hours behind, and in some cases almost 48 hours behind.
The oplogs all overlap and the oplog size is fairly large. The only thing I can figure is that I am performing a lot of writes/reads on the primary, which could keep the server locked and not allow the secondaries to catch up properly.
Is there a way to possibly force a secondary to catch up to the primary?
There are currently 5 servers; the last 2 are there to replace 2 of the other nodes.
The node with _id 6 is the one meant to replace the primary. The node that is furthest behind the primary's optime is a little over 48 hours behind.
{
"set" : "gryffindor",
"date" : ISODate("2011-05-12T19:34:57Z"),
"myState" : 2,
"members" : [
{
"_id" : 1,
"name" : "10******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305057514000,
"i" : 31
},
"optimeDate" : ISODate("2011-05-10T19:58:34Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 2,
"name" : "10******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305056009000,
"i" : 400
},
"optimeDate" : ISODate("2011-05-10T19:33:29Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 3,
"name" : "10******:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 20229,
"optime" : {
"t" : 1305228858000,
"i" : 422
},
"optimeDate" : ISODate("2011-05-12T19:34:18Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 5,
"name" : "10*******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305058009000,
"i" : 226
},
"optimeDate" : ISODate("2011-05-10T20:06:49Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 6,
"name" : "10*******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"optime" : {
"t" : 1305050495000,
"i" : 384
},
"optimeDate" : ISODate("2011-05-10T18:01:35Z"),
"self" : true
}
],
"ok" : 1
}
I'm not sure why the syncing has failed in your case, but one way to brute-force a resync is to remove the data files on the replica and restart the mongod. That will initiate a resync. See http://www.mongodb.org/display/DOCS/Halted+Replication. It is likely to take quite some time, depending on the size of your database.
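A sketch of that brute-force resync, with /data/db standing in for your dbpath; back the files up first, as the docs quoted earlier advise. Stop the stale secondary's mongod, then:
# mv /data/db /data/db.bak
# mkdir -p /data/db
Restart the mongod with the same --replSet name and it will perform a full initial sync from the primary.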
After looking through everything I saw a single error, which led me back to a map-reduce job that was run on the primary and hit this issue: https://jira.mongodb.org/browse/SERVER-2861. So when replication was attempted, it failed to sync because of a faulty/corrupt operation in the oplog.