Secondary keeps rolling back - mongodb

In the last 7 days, our secondary servers have gone down three times with the following messages. What do these errors mean? Why does the node roll back? I have attached a screenshot of the oplog window and the replication lag.
Around 4 AM the server went down. Around 3:50 the replication lag rose to 300 seconds, but that is just 5 minutes, and the node has a larger oplog window than that.
We take backups using MMS from one of the secondaries; could this be the cause of the issue?
Mon May 19 03:50:27.146 [rsBackgroundSync] replSet syncing to: xxxx.prod.xxxx.net:17017
Mon May 19 03:50:27.231 [rsBackgroundSync] replSet our last op time fetched: May 19 03:50:16:152
Mon May 19 03:50:27.231 [rsBackgroundSync] replset source's GTE: May 19 03:50:16:153
Mon May 19 03:50:27.231 [rsBackgroundSync] replSet rollback 0
Mon May 19 03:50:27.231 [rsBackgroundSync] replSet ROLLBACK
Mon May 19 03:50:27.231 [rsBackgroundSync] replSet rollback 1
Mon May 19 03:50:27.231 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Mon May 19 03:50:27.232 [rsBackgroundSync] replSet info rollback our last optime: May 19 03:50:16:152
Mon May 19 03:50:27.232 [rsBackgroundSync] replSet info rollback their last optime: May 19 03:50:16:155
Mon May 19 03:50:27.232 [rsBackgroundSync] replSet info rollback diff in end of log times: 0 seconds
Mon May 19 03:50:27.691 [rsBackgroundSync] replSet rollback found matching events at Mar 13 06:12:22:11
Mon May 19 03:50:27.691 [rsBackgroundSync] replSet rollback findcommonpoint scanned : 222891
Mon May 19 03:50:27.691 [rsBackgroundSync] replSet replSet rollback 3 fixup
Mon May 19 03:50:30.065 [rsBackgroundSync] replSet rollback 3.5
Mon May 19 03:50:30.065 [rsBackgroundSync] replSet rollback 4 n:7018
Mon May 19 03:50:30.065 [rsBackgroundSync] replSet minvalid=May 19 03:50:16 5379e1e8:155
Mon May 19 03:50:30.065 [rsBackgroundSync] replSet rollback 4.6
Mon May 19 03:50:30.065 [rsBackgroundSync] replSet rollback 4.7
Mon May 19 03:50:30.443 [rsBackgroundSync] ERROR: rollback cannot find object by id
Mon May 19 03:50:30.444 [rsBackgroundSync] ERROR: rollback cannot find object by id
Mon May 19 03:50:30.444 [rsBackgroundSync] replSet rollback 5 d:4 u:7016
Mon May 19 03:50:30.460 [rsBackgroundSync] replSet rollback 6

We found that the oplog on the primary had somehow become corrupted. We detected this by running the following queries against the local database:
db.oplog.rs.find().sort({$natural:1}).explain()
db.oplog.rs.find().sort({$natural:-1}).explain()
So we did a primary step-down and a fresh resync, as sketched below.
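
For anyone hitting the same state, the recovery looks roughly like this (a hedged sketch: the 120-second step-down window and the dbpath are illustrative choices, not values from the original post):
rs.stepDown(120)    // on the primary with the corrupted oplog: refuse primaryship for 120 seconds
// then, on that node, force a fresh initial sync from a system shell:
// service mongod stop; rm -rf /var/lib/mongodb/*; service mongod start
// emptying the dbpath makes mongod resync everything from the new primary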

Related

MongoDB data corruption on a replica set

I am working with a MongoDB database running in a replica set.
Unfortunately, I noticed that the data appears to be corrupted.
There should be over 10,000 documents in the database. However, there are several thousand records that are not being returned in queries.
The total count DOES show the correct total.
db.records.find().count()
10793
And some records are returned when querying by RecordID (a custom sequence integer).
db.records.find({"RecordID": 10049})
{ "_id" : ObjectId("5dfbdb35c1c2a400104edece")
However, when querying for records that I know for a fact should exist, nothing is returned.
db.records.find({"RecordID": 10048})
db.records.find({"RecordID": 10047})
db.records.find({"RecordID": 10046})
The issue appears to be very sporadic, and in some cases entire ranges of records are missing. The entire range from RecordIDs 1500 to 8000 is missing.
Questions: What could be the cause of the issue? What can I do to troubleshoot this issue further and recover the corrupted data? I looked into running repairDatabase but that is for standalone instances only.
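One low-risk check that might help narrow this down (a suggestion on my part, not something from the original post) is validate(), which scans a collection's data files and indexes for structural problems:
db.records.validate(true)    // full validation; reports invalid documents and index inconsistencies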
UPDATE:
More info on replication:
rs.printReplicationInfo()
configured oplog size: 5100.880859375MB
log length start to end: 14641107secs (4066.97hrs)
oplog first event time: Wed Mar 03 2021 05:21:25 GMT-0500 (EST)
oplog last event time: Thu Aug 19 2021 17:19:52 GMT-0400 (EDT)
now: Thu Aug 19 2021 17:20:01 GMT-0400 (EDT)
rs.printSecondaryReplicationInfo()
source: node2-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
source: node3-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
UPDATE 2:
We did a restore from a backup and somehow it looks like it fixed the issue.

MongoDB SECONDARY becoming RECOVERING at nighttime

I am running a conventional MongoDB Replica Set consisting of 3 members (member1 in datacenter A, member2 and member3 in datacenter B).
member1 is the current PRIMARY and I am adding members 2 and 3 via rs.add(). They are performing their initial sync and become SECONDARY very soon. Everything is fine all day long and the replication delay of both members is 0 seconds until 2 AM at nighttime.
Now: every night at 2 AM both members shift into the RECOVERING state and stop replicating altogether, which leaves them hours behind by the time I look at rs.printSlaveReplicationInfo() in the morning. At around 2 AM there are no massive inserts or maintenance tasks known to me.
I get the following log entries on the PRIMARY:
2015-10-09T01:59:38.914+0200 [initandlisten] connection accepted from 192.168.227.209:59905 #11954 (37 connections now open)
2015-10-09T01:59:55.751+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan
2015-10-09T01:59:55.869+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan
2015-10-09T01:59:55.870+0200 [conn11111] getmore local.oplog.rs cursorid:1155433944036 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:32168 nreturned:0 reslen:20 134ms
2015-10-09T01:59:55.872+0200 [conn11111] end connection 192.168.227.209:58972 (36 connections now open)
And, which is more interesting, I get the following log entries on both SECONDARYs:
2015-10-09T01:59:55.873+0200 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
2015-10-09T01:59:55.873+0200 [rsBackgroundSync] replSet syncing to: member1:27017
2015-10-09T01:59:56.065+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from member1:27017
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet our last optime : Oct 9 01:59:23 5617035b:17f
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet oldest at member1:27017 : Oct 9 01:59:23 5617035b:1af
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet RECOVERING
What is also striking: the start of the oplog "resets" itself every night at around 2 AM:
configured oplog size: 990MB
log length start to end: 19485secs (5.41hrs)
oplog first event time: Fri Oct 09 2015 02:00:33 GMT+0200 (CEST)
oplog last event time: Fri Oct 09 2015 07:25:18 GMT+0200 (CEST)
now: Fri Oct 09 2015 07:25:26 GMT+0200 (CEST)
I am not sure if this is somehow correlated with the issue. I am also wondering how such a small gap (Oct 9 01:59:23 5617035b:17f <-> Oct 9 01:59:23 5617035b:1af) can make the members stale.
Could this also be a server (VM host) time issue, or is it something completely different? (Why is the first oplog event being "reset" every night instead of "shifting" towards a timestamp like NOW minus 24 hrs?)
What can I do to investigate and to avoid this?
Upping the oplog size should solve this (per our comments).
Some references for others who run into this issue
Workloads that Might Require a Larger Oplog Size
Error: replSet error RS102 too stale to catch up link1 & link2
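
For completeness, a hedged sketch of the fix: on MongoDB 3.6+ the oplog can be resized online with replSetResizeOplog (run against the admin database on each member); on the 2.x/3.0 series in this question, follow the "change oplog size" tutorial referenced above instead.
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })    // new size in MB; 16000 is an arbitrary example
rs.printReplicationInfo()                                  // confirm the oplog window has grown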

Mongo Crashes Periodically

We have a 3-node replicaSet that periodically crashes and is unable to recover. Looking through our PRIMARY server's mongod.log file, I see multiple errors. I'm not sure where to begin or even what to include in this post but I'll start with the errors I am receiving. If I'm missing something, please let me know and I'll edit the post and include it. Can anyone shed any light on why this is happening?
Thu Feb 27 14:09:47.790 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.790 [rsBackgroundSync] replSet syncing to: mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.791 [rsBackgroundSync] repl: couldn't connect to server mongos2i.hostname.com:27017
Thu Feb 27 14:09:47.792 [conn152] end connection xx.xxx.xxx.107:43904 (71 connections now open)
Thu Feb 27 14:09:48.077 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:09:48.077 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet info mongos2i.hostname.com:27017 is down (or slow to respond):
Thu Feb 27 14:09:48.078 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state DOWN
Thu Feb 27 14:09:48.080 [rsMgr] not electing self, mongos1i.hostname.com:27017 would veto with 'mongom1i.hostname.com:27017 is trying to elect itself but mongos2i.hostname.com:27017 is already primary and more up-to-date'
Thu Feb 27 14:09:49.079 [conn153] replSet info voting yea for mongos1i.hostname.com:27017 (1)
Thu Feb 27 14:09:50.080 [rsHealthPoll] replSet member mongos1i.hostname.com:27017 is now in state PRIMARY
Thu Feb 27 14:09:50.081 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is up
Thu Feb 27 14:09:50.082 [initandlisten] connection accepted from xx.xxx.xxx.107:43907 #154 (72 connections now open)
Thu Feb 27 14:09:50.082 [conn154] end connection xx.xxx.xxx.107:43907 (71 connections now open)
Thu Feb 27 14:09:50.086 [initandlisten] connection accepted from xx.xxx.xxx.107:43909 #155 (72 connections now open)
Thu Feb 27 14:09:50.792 [rsBackgroundSync] replSet syncing to: mongos1i.hostname.com:27017
Thu Feb 27 14:09:52.082 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:10:04.090 [conn155] end connection xx.xxx.xxx.107:43909 (71 connections now open)
Thu Feb 27 14:10:04.091 [initandlisten] connection accepted from xx.xxx.xxx.107:43913 #156 (72 connections now open)
Thu Feb 27 14:10:10.731 [conn153] end connection xx.xxx.xxx.97:52297 (71 connections now open)
Thu Feb 27 14:10:10.732 [initandlisten] connection accepted from xx.xxx.xxx.97:52302 #157 (72 connections now open)
Thu Feb 27 14:10:29.706 [initandlisten] connection accepted from 127.0.0.1:56436 #158 (73 connections now open)
Thu Feb 27 14:10:34.100 [conn156] end connection xx.xxx.xxx.107:43913 (72 connections now open)
Thu Feb 27 14:10:34.101 [initandlisten] connection accepted from xx.xxx.xxx.107:43916 #159 (73 connections now open)
Thu Feb 27 14:10:40.743 [conn157] end connection xx.xxx.xxx.97:52302 (72 connections now open)
Thu Feb 27 14:10:40.744 [initandlisten] connection accepted from xx.xxx.xxx.97:52309 #160 (73 connections now open)
Thu Feb 27 14:11:04.110 [conn159] end connection xx.xxx.xxx.107:43916 (72 connections now open)
Thu Feb 27 14:11:04.111 [initandlisten] connection accepted from xx.xxx.xxx.107:43918 #161 (73 connections now open)
Thu Feb 27 14:11:09.191 [conn161] end connection xx.xxx.xxx.107:43918 (72 connections now open)
Thu Feb 27 14:11:09.452 [initandlisten] connection accepted from xx.xxx.xxx.107:43919 #162 (73 connections now open)
Thu Feb 27 14:11:09.453 [conn162] end connection xx.xxx.xxx.107:43919 (72 connections now open)
Thu Feb 27 14:11:09.456 [initandlisten] connection accepted from xx.xxx.xxx.107:43921 #163 (73 connections now open)
Thu Feb 27 14:11:10.111 [rsHealthPoll] DBClientCursor::init call() failed
Thu Feb 27 14:11:10.111 [rsHealthPoll] replset info mongos2i.hostname.com:27017 heartbeat failed, retrying
Thu Feb 27 14:11:10.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state STARTUP2
Thu Feb 27 14:11:10.755 [conn160] end connection xx.xxx.xxx.97:52309 (72 connections now open)
Thu Feb 27 14:11:10.757 [initandlisten] connection accepted from xx.xxx.xxx.97:52311 #164 (73 connections now open)
Thu Feb 27 14:11:12.113 [rsHealthPoll] replSet member mongos2i.hostname.com:27017 is now in state SECONDARY
Thu Feb 27 14:11:23.462 [conn163] end connection xx.xxx.xxx.107:43921 (72 connections now open)
Thu Feb 27 14:11:23.463 [initandlisten] connection accepted from xx.xxx.xxx.107:43925 #165 (73 connections now open)
Thu Feb 27 14:11:31.831 [conn158] end connection 127.0.0.1:56436 (72 connections now open)
Thu Feb 27 14:11:40.768 [conn164] end connection xx.xxx.xxx.97:52311 (71 connections now open)
Thu Feb 27 14:11:40.769 [initandlisten] connection accepted from xx.xxx.xxx.97:52315 #166 (72 connections now open)
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
We are using CentOS and Mongo 2.4.9.
Thanks in advance for the help.
The log output you have posted shows that your MongoDB instance did not crash. It exited normally.
Consider the following lines:
Thu Feb 27 14:11:53.082 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Feb 27 14:11:53.082 [signalProcessingThread] now exiting
Thu Feb 27 14:11:53.082 dbexit:
The first line above indicates that your MongoDB instance received signal 15 (SIGTERM) from your OS. This led to MongoDB terminating. SIGTERM is the default signal for the kill command and for the stop portion of an init script in most Linux distros.
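
To illustrate (my own addition, not part of the original answer), both of the following deliver signal 15, and scheduled jobs are the usual culprits for a termination at a consistent time of day:
kill <pid>       # default signal is SIGTERM (15)
kill -15 <pid>   # explicit equivalent
# likely suspects: cron jobs, init scripts, configuration management;
# on CentOS, check the logs around the 14:11 timestamp:
sudo grep -i mongo /var/log/cron /var/log/messages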

Can't initialise replica set on Debian (open/create failed in createPrivateMap)

I am trying to set up MongoDB on my new virtual server running Debian 7.3. When I try to configure the replica set with
hosts = {
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "localhost:27017"
        }
    ]
}
rs.initiate(hosts);
MongoDB crashes with the following exceptions:
Tue Jan 21 00:10:24.599 [initandlisten] MongoDB starting : pid=3616 port=27017 dbpath=/var/lib/mongodb 64-bit host=lvps176-28-17-95.dedicated.hosteurope.de
Tue Jan 21 00:10:24.599 [initandlisten]
Tue Jan 21 00:10:24.600 [initandlisten] ** WARNING: You are running in OpenVZ. This is known to be broken!!!
Tue Jan 21 00:10:24.600 [initandlisten]
Tue Jan 21 00:10:24.600 [initandlisten] db version v2.4.7
Tue Jan 21 00:10:24.600 [initandlisten] git version: 0161738abf06c1f067b56a465b706efd6f4bf2aa
Tue Jan 21 00:10:24.600 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Tue Jan 21 00:10:24.600 [initandlisten] allocator: tcmalloc
Tue Jan 21 00:10:24.600 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "rs0" }
Tue Jan 21 00:10:24.609 [initandlisten] journal dir=/var/lib/mongodb/journal
Tue Jan 21 00:10:24.609 [initandlisten] recover : no journal files present, no recovery needed
Tue Jan 21 00:10:24.740 [initandlisten] preallocateIsFaster=true 2.38
Tue Jan 21 00:10:24.780 [initandlisten] waiting for connections on port 27017
Tue Jan 21 00:10:24.780 [websvr] admin web console waiting for connections on port 28017
Tue Jan 21 00:10:24.786 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:24.786 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Tue Jan 21 00:10:27.429 [initandlisten] connection accepted from 127.0.0.1:50602 #1 (1 connection now open)
Tue Jan 21 00:10:34.786 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:40.160 [conn1] replSet replSetInitiate admin command received from client
Tue Jan 21 00:10:40.163 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
Tue Jan 21 00:10:40.164 [conn1] replSet replSetInitiate all members seem up
Tue Jan 21 00:10:40.164 [conn1] ******
Tue Jan 21 00:10:40.164 [conn1] creating replication oplog of size: 24630MB...
Tue Jan 21 00:10:40.165 [FileAllocator] allocating new datafile /var/lib/mongodb/local.1, filling with zeroes...
Tue Jan 21 00:10:40.165 [FileAllocator] creating directory /var/lib/mongodb/_tmp
Tue Jan 21 00:10:40.205 [FileAllocator] done allocating datafile /var/lib/mongodb/local.1, size: 2047MB, took 0.036 secs
Tue Jan 21 00:10:40.206 [FileAllocator] allocating new datafile /var/lib/mongodb/local.2, filling with zeroes...
Tue Jan 21 00:10:40.233 [FileAllocator] done allocating datafile /var/lib/mongodb/local.2, size: 2047MB, took 0.027 secs
Tue Jan 21 00:10:40.234 [FileAllocator] allocating new datafile /var/lib/mongodb/local.3, filling with zeroes...
Tue Jan 21 00:10:40.255 [FileAllocator] done allocating datafile /var/lib/mongodb/local.3, size: 2047MB, took 0.02 secs
Tue Jan 21 00:10:40.256 [FileAllocator] allocating new datafile /var/lib/mongodb/local.4, filling with zeroes...
Tue Jan 21 00:10:40.275 [FileAllocator] done allocating datafile /var/lib/mongodb/local.4, size: 2047MB, took 0.019 secs
Tue Jan 21 00:10:40.276 [FileAllocator] allocating new datafile /var/lib/mongodb/local.5, filling with zeroes...
Tue Jan 21 00:10:40.355 [FileAllocator] done allocating datafile /var/lib/mongodb/local.5, size: 2047MB, took 0.079 secs
Tue Jan 21 00:10:40.356 [FileAllocator] allocating new datafile /var/lib/mongodb/local.6, filling with zeroes...
Tue Jan 21 00:10:40.372 [FileAllocator] done allocating datafile /var/lib/mongodb/local.6, size: 2047MB, took 0.014 secs
Tue Jan 21 00:10:40.372 [FileAllocator] allocating new datafile /var/lib/mongodb/local.7, filling with zeroes...
Tue Jan 21 00:10:40.498 [FileAllocator] done allocating datafile /var/lib/mongodb/local.7, size: 2047MB, took 0.121 secs
Tue Jan 21 00:10:40.499 [FileAllocator] allocating new datafile /var/lib/mongodb/local.8, filling with zeroes...
Tue Jan 21 00:10:40.546 [FileAllocator] done allocating datafile /var/lib/mongodb/local.8, size: 2047MB, took 0.046 secs
Tue Jan 21 00:10:40.546 [conn1] ERROR: mmap private failed with out of memory. (64 bit build)
Tue Jan 21 00:10:40.546 [conn1] Assertion: 13636:file /var/lib/mongodb/local.8 open/create failed in createPrivateMap (look in log for more information)
0xde0151 0xda188b 0xda1dcc 0xa5a63b 0xa5af9a 0xaba3b1 0x8d518d 0x8d5698 0x8d577f 0x8d5a1e 0xabbb00 0xac1429 0xa75908 0xc10af1 0x8dd4da 0x8de04d 0x8df582 0xa81f00 0xa867cc 0x9fa469
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde0151]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xda188b]
/usr/bin/mongod() [0xda1dcc]
/usr/bin/mongod(_ZN5mongo8MongoMMF13finishOpeningEv+0x1fb) [0xa5a63b]
/usr/bin/mongod(_ZN5mongo8MongoMMF6createERKSsRyb+0x5a) [0xa5af9a]
/usr/bin/mongod(_ZN5mongo13MongoDataFile4openEPKcib+0x141) [0xaba3b1]
/usr/bin/mongod(_ZN5mongo8Database7getFileEiib+0xbd) [0x8d518d]
/usr/bin/mongod(_ZN5mongo8Database8addAFileEib+0x38) [0x8d5698]
/usr/bin/mongod(_ZN5mongo8Database12suitableFileEPKcibb+0xaf) [0x8d577f]
/usr/bin/mongod(_ZN5mongo8Database11allocExtentEPKcibb+0x9e) [0x8d5a1e]
/usr/bin/mongod(_ZN5mongo13_userCreateNSEPKcRKNS_7BSONObjERSsPb+0x7a0) [0xabbb00]
/usr/bin/mongod(_ZN5mongo12userCreateNSEPKcNS_7BSONObjERSsbPb+0x2b9) [0xac1429]
/usr/bin/mongod(_ZN5mongo11createOplogEv+0xa78) [0xa75908]
/usr/bin/mongod(_ZN5mongo18CmdReplSetInitiate3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1da1) [0xc10af1]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8dd4da]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x71d) [0x8de04d]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8df582]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa81f00]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa867cc]
/usr/bin/mongod() [0x9fa469]
Tue Jan 21 00:10:40.563 [conn1] replSet replSetInitiate exception: file /var/lib/mongodb/local.8 open/create failed in createPrivateMap (look in log for more information)
Tue Jan 21 00:10:40.563 [conn1] command admin.$cmd command: { replSetInitiate: { _id: "rs0", version: 1.0, members: [ { _id: 0.0, host: "localhost:27017" } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:401$
Tue Jan 21 00:10:44.787 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate admin command received from client
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate config object parses ok, 1 members specified
Tue Jan 21 00:10:53.835 [conn1] replSet replSetInitiate all members seem up
Tue Jan 21 00:10:53.835 [conn1] replSet info saving a newer config version to local.system.replset
Tue Jan 21 00:10:54.564 Invalid access at address: 0x18 from thread: conn1
Tue Jan 21 00:10:54.564 Got signal: 11 (Segmentation fault).
Why does MongoDB try to create 8 files of 2 GB each? My machine runs with 16 GB of RAM; could that be the problem? One line of the error output ("[conn1] ERROR: mmap private failed with out of memory. (64 bit build)") looks like my machine ran out of memory, but I am only creating a replica set with one member and an empty database. Does somebody know this bug?
It's not RAM but disk space that is the problem, and it's not a bug either.
Tue Jan 21 00:10:40.164 [conn1] creating replication oplog of size: 24630MB...
From MongoDB docs:
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, allowing them to maintain the current state of the database.
For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB allocates 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB allocates 1 gigabyte of space.
Above via http://docs.mongodb.org/manual/core/replica-set-oplog/
The oplog is needed for replication (it's a capped collection of a fixed size) and is created automatically when you initiate the replica set. oplogSize can be set via the configuration options (useful if you just want to experiment with the setup and can't free up more disk space).
Here's a doc on it: http://docs.mongodb.org/manual/reference/configuration-options/#oplogSize
However:
Once the mongod has created the oplog for the first time, changing oplogSize will not affect the size of the oplog.
via http://docs.mongodb.org/manual/reference/configuration-options/#oplogSize
If you would like to change oplogSize after it has been already created you could use this tutorial: http://docs.mongodb.org/manual/tutorial/change-oplog-size/
However, if this is your "playground" installation, it is better to delete the contents of your old MongoDB data directory (/var/lib/mongodb), change the config file /etc/mongodb.conf (or pass the --oplogSize parameter to mongod when it starts), and get a "fresh start" with a smaller oplog, or point your MongoDB dbpath to a place where it has more disk space, as sketched below.
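
To make that concrete, here is a minimal "fresh start" sketch assuming the stock Debian packaging and the 2.4-era plain-text config format; the 512 MB figure and the paths are illustrative choices, not required values:
sudo service mongodb stop
sudo rm -rf /var/lib/mongodb/*                             # wipe the playground data directory
echo "oplogSize = 512" | sudo tee -a /etc/mongodb.conf     # cap the new oplog at 512 MB
sudo service mongodb start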

MongoDB repair command failed

Previously I ran out of disk space and mongod stopped working. I have since increased the disk size, but mongod still does not start.
Though I have journaling enabled, I executed the following command:
sudo -u mongodb mongod --dbpath /var/lib/mongodb/ --repair
But this repair command hit an exception, stopped repairing, and exited.
Fri Nov 30 13:29:36 [initandlisten] build index bd_production.news { _id: 1 }
Fri Nov 30 13:29:36 [initandlisten] fastBuildIndex dupsToDrop:0
Fri Nov 30 13:29:36 [initandlisten] build index done. scanned 2549 total records. 0.008 secs
Fri Nov 30 13:29:36 [initandlisten] bd_production.change_sets
Assertion failure isOk() src/mongo/db/pdfile.h 360
0x879d86a 0x85a9835 0x85e441e 0x84caa02 0x84c7d19 0x8229b5a 0x822bfd8
0x875bd51 0x875f0c7 0x8760df4 0x83e6523 0x83b6c3b 0x8753b07 0x83b92bf
0x8827ab7 0x882a53b 0x882d4bf 0x882d691 0x85ed280 0x81719dc
mongod(_ZN5mongo15printStackTraceERSo+0x2a) [0x879d86a]
mongod(_ZN5mongo10logContextEPKc+0xa5) [0x85a9835]
...
...
...
... some error msg
Fri Nov 30 13:29:36 [initandlisten] assertion 0 assertion src/mongo/db/pdfile.h:360 ns:bd_production.change_sets query:{}
Fri Nov 30 13:29:36 [initandlisten] problem detected during query over bd_production.change_sets : { $err: "assertion src/mongo/db/pdfile.h:360" }
Fri Nov 30 13:29:36 [initandlisten] query bd_production.change_sets ntoreturn:0 keyUpdates:0 exception: assertion src/mongo/db/pdfile.h:360 reslen:71 197ms
Fri Nov 30 13:29:36 [initandlisten] exception in initAndListen: 13106 nextSafe(): { $err: "assertion src/mongo/db/pdfile.h:360" }, terminating
Fri Nov 30 13:29:36 dbexit:
...
...
The news collection is repaired successfully, but the change_sets collection is not.
How can I repair that particular collection (change_sets), or the database?
UPDATE:
When I ran mongodump with --repair for the change_sets collection, I got the following error message:
Tue Dec 4 10:45:21 [tools] backwards extent pass
Tue Dec 4 10:45:21 [tools] extent loc: 5:1181e000
Tue Dec 4 10:45:21 [FileAllocator] allocating new datafile /home/suvankar/dd/bd_production.5, filling with zeroes...
Tue Dec 4 10:45:21 [FileAllocator] creating directory /home/suvankar/dd/_tmp
Tue Dec 4 10:45:21 [FileAllocator] done allocating datafile /home/suvankar/dd/bd_production.5, size: 511MB, took 0.042 secs
Tue Dec 4 10:45:21 [tools] warning: Extent not ok magic: 0 going to try to continue
Tue Dec 4 10:45:21 [tools] length:0
Tue Dec 4 10:45:21 [tools] ERROR: offset is 0 for record which should be impossible
Tue Dec 4 10:45:21 [tools] wrote 1 documents
Tue Dec 4 10:45:21 [tools] extent loc: 0:0
Tue Dec 4 10:45:21 [tools] ERROR: invalid extent ofs: 0
Tue Dec 4 10:45:21 [tools] 5 objects
Tue Dec 4 10:45:21 dbexit:
Tue Dec 4 10:45:21 [tools] shutdown: going to close listening sockets...
Tue Dec 4 10:45:21 [tools] shutdown: going to flush diaglog...
Tue Dec 4 10:45:21 [tools] shutdown: going to close sockets...
Tue Dec 4 10:45:21 [tools] shutdown: waiting for fs preallocator...
Tue Dec 4 10:45:21 [tools] shutdown: lock for final commit...
If mongod with --repair is not doing it, then it has run into a level of corruption that it cannot fix or work around, in the sense of producing a valid and correct set of database files to start up from.
You can run mongodump with --repair instead, which is more aggressive in trying to get around the corruption and does not start a mongod instance (hence it does not require the files to be correct in order to proceed):
mongodump --repair --dbpath /var/lib/mongodb/ <other options here>
Be aware though, that because of the way it attempts to route around the corruption, you may end up with multiple copies of a document. With how mongorestore works this is not an issue, but depending on the level of corruption you can end up with dump files far larger than you would expect. In a very extreme case, I once saw 10x data produced, though that was the exception rather than the rule.
Once you have dumped everything out to your satisfaction, start mongod clean and re-import to get back to a good state.
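
Roughly, the full cycle looks like this (a sketch; the dump directory and the name of the set-aside backup are placeholders I chose):
mongodump --repair --dbpath /var/lib/mongodb/ --out /data/repair-dump
sudo mv /var/lib/mongodb /var/lib/mongodb.corrupt          # keep the damaged files, just in case
sudo mkdir /var/lib/mongodb && sudo chown mongodb /var/lib/mongodb
sudo -u mongodb mongod --dbpath /var/lib/mongodb/ --fork --logpath /var/log/mongodb/mongodb.log
mongorestore /data/repair-dump                             # re-import into the clean instance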