MongoDB replica server getting killed due to low memory? - mongodb

I need some serious help here, since this is affecting our production instance.
One of the replica servers keeps failing due to lack of memory (see the excerpt from kern.log below):
kernel: [80110.848341] Out of memory: kill process 4643 (mongod) score 214181 or a child
kernel: [80110.848349] Killed process 4643 (mongod)
UPDATE
kernel: mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
kernel: [85544.157191] mongod cpuset=/ mems_allowed=0
kernel: [85544.157195] Pid: 7545, comm: mongod Not tainted 2.6.32-318-ec2
Insight:
The primary's DB size is 50GB, of which 30GB is indexes.
The primary has 7GB of RAM whereas the secondary has 3.1GB.
Both servers are 64-bit machines, running Debian and Ubuntu respectively.
Both are running MongoDB 2.0.2.
Note:
A similar issue was filed on the MongoDB Jira site recently - no answer to it yet.

Have you got swap enabled on these instances? While generally not needed for MongoDB operation, it can prevent the process from being killed by the kernel when you hit an OOM situation. That is mentioned here:
http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-Swap
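As a quick check and stopgap, something along these lines usually works (the 4GB size is just an illustrative choice, and the paths are assumptions):
free -m                                         # the Swap row shows whether any swap is configured
dd if=/dev/zero of=/swapfile bs=1M count=4096   # create a 4GB swap file
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile                                # takes effect immediately; add an /etc/fstab entry to persist across reboots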
The issue referenced is happening during a full re-sync rather than ongoing production replication - is that what you are doing also?
Once you get things stable, take a look at your resident (res) memory in mongostat or MMS; if that is close to or exceeding 3GB, you should consider upgrading your secondary.
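For example, watching the res column for a while (hostname and port are placeholders):
mongostat --host secondary.example.com --port 27017 5
# the "res" column is the resident memory used by mongod; if it hovers near the
# secondary's 3.1GB of RAM, the OOM killer is likely to strike again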

I had a similar issue. One of the things to check is how many open connections you have. Run the lsof command to see the open files associated with the mongod process. Try disabling journaling and see whether the number of open files drops. If so, let the replica catch up and then re-enable journaling. That might help. Adding swap should help too, or, if possible, temporarily increase the RAM.
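Roughly like this, assuming mongod's PID is 4643 as in the kern.log excerpt above:
lsof -p 4643 | wc -l         # total files and sockets held open by mongod
lsof -p 4643 | grep -c TCP   # just the network connections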

Related

mongodb high resource usage and configuration hints

I am running a MongoDB instance as an Ubuntu service, on a VM with enough resources to handle it with ease; the system is on-premise, not cloud.
I am a newbie with MongoDB and this is a test/dev environment.
The problem I am getting right now is abnormally high CPU and RAM usage, caused by a huge number of MongoDB threads running and hanging around.
Here's an htop summary and an strace of the worst of those threads:
sudo strace -p 973
strace: Process 973 attached
futex(0x5644b5e929e8, FUTEX_WAIT_PRIVATE, 0, NULL
strace: Process 973 detached
<detached ...>
Besides a possible solution, can you point me to any good articles on setting up and configuring MongoDB for production?
Other info:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
> db.version()
4.2.7
Solved.
Apparently one of my collections had grown too big and I was executing various aggregation pipelines on it.
It ended up clogging the machine; solved with some tweaks to the structure :)
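If anyone hits the same symptom, a quick way to spot an oversized collection and take memory pressure off aggregations is something like this (mycollection and pipeline are placeholders):
> db.mycollection.stats().size        // uncompressed data size in bytes
> db.mycollection.totalIndexSize()    // total size of the collection's indexes
> db.mycollection.aggregate(pipeline, { allowDiskUse: true })   // lets large stages spill to disk instead of RAM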

Exception when performing restart from replica set to standalone

I am currently experimenting with MongoDB replica set mechanism.
I already have a working standalone Mongo server with a main database of about 20GB of data.
I decided to convert this mongo server to a primary replica set server, then added a 2nd machine with a similar configuration (but a newer mongo version), as a secondary replica set server.
This works fine, all data is replicated to the secondary as expected.
But I would like to perform some alteration operations on the data (my data model has changed and I need to, for example, rename some properties or convert references to a simple ObjectId, things like that). At the same time I would like to update the first server, which runs an old version (2.4), to the latest version available (2.6).
So I decided to follow the instructions on the MongoDB website to perform maintenance on replica set members.
Shut down the secondary server. (ok)
Restart the server as a standalone on another port (both servers usually run on 27017):
mongod --dbpath /my/database/path --port 37017
And then, the server never restarts correctly and I get this:
2014-10-03T08:20:58.716+0200 [initandlisten] opening db: myawesomedb
2014-10-03T08:20:58.735+0200 [initandlisten] myawesomedb Assertion failure _name == nsToDatabaseSubstring( ns ) src/mongo/db/catalog/database.cpp 472
2014-10-03T08:20:58.740+0200 [initandlisten] myawesomedb 0x11e6111 0x1187e49 0x116c15e 0x8c2208 0x765f0e 0x76ab3f 0x76c62f 0x76cedb 0x76d475 0x76d699 0x7fd958c3eec5 0x764329
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
/usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x17e) [0x116c15e]
/usr/bin/mongod(_ZN5mongo8Database13getCollectionERKNS_10StringDataE+0x288) [0x8c2208]
/usr/bin/mongod(_ZN5mongo17checkForIdIndexesEPNS_8DatabaseE+0x19e) [0x765f0e]
/usr/bin/mongod() [0x76ab3f]
/usr/bin/mongod(_ZN5mongo14_initAndListenEi+0x5df) [0x76c62f]
/usr/bin/mongod(_ZN5mongo13initAndListenEi+0x1b) [0x76cedb]
/usr/bin/mongod() [0x76d475]
/usr/bin/mongod(main+0x9) [0x76d699]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fd958c3eec5]
/usr/bin/mongod() [0x764329]
2014-10-03T08:20:58.756+0200 [initandlisten] exception in initAndListen: 0 assertion src/mongo/db/catalog/database.cpp:472, terminating
2014-10-03T08:20:58.757+0200 [initandlisten] dbexit:
What am I doing wrong?
Note that at this time, the first server is still running as primary member.
Thanks in advance!
I believe you are hitting a bug in VMware here (can you confirm you are using VMware VMs? confirmed) - I have seen it confirmed on Ubuntu and Fedora so far. The bug causes pieces of previous data to not be zeroed out when creating the MongoDB namespace files (not always, but sometimes). That previous data essentially manifests as corruption in the namespace files and leads to the assertion you saw.
To work around the issue, a fix will be released in MongoDB versions 2.4.12 and 2.6.5+ as part of SERVER-15369. The OS/kernel-level fix will eventually percolate down from the kernel bug and the Ubuntu patch, but it may take some time to actually become available as an official update (hence the need for the workaround change in MongoDB itself in the interim).
The issue only becomes apparent when you upgrade to 2.6, because of additional checking added in that version that was not present in 2.4. However, the corruption is still present on 2.4; it is just not reported.
If you still have your primary running, and it does not have the corruption, I would recommend syncing a secondary that is not on a VMWare VM and/or taking a backup of your files as soon as possible for safety - there is no automatic way to fix this corruption right now.
You can also look at using version 2.6.5 once it is released (2.6.5 rc4, which includes the fix, is available as of writing this). You will still need to resync with that version off your good source to create a working secondary, but at least there will then be no corruption of the ns files.
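For reference, a minimal resync of a secondary looks roughly like this (the dbpath and service name are assumptions, and it presumes you still have a healthy source to sync from):
# on the affected secondary
sudo service mongod stop      # take the member down cleanly
rm -rf /data/db/*             # wipe the (possibly corrupt) data files - make sure the good primary is still up
sudo service mongod start     # on restart the member performs a full initial sync from the replica set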
Updates:
Version 2.6.5 which includes the fix mentioned was released on October 9th
Version 2.4.12 which includes the fix was released on October 16th
Official MongoDB Advisory: https://groups.google.com/forum/#!topic/mongodb-announce/gPjazaAePoo

Why and in what cases to issue the MongoDB repair command

I am using MongoDB 2.4.8 on 64-bit machines with 3 servers as a replica set, and I have currently disabled journaling on my development box.
Durability is not so important for our application, which is the reason I have disabled the journaling option.
I see only one advantage of journaling: in case of an unclean shutdown we don't have to issue a repair command, as journaling will take care of it.
To produce such an unclean shutdown I killed the mongo replica process with kill -9 <mongod process id>, then simply removed the mongo lock files and restarted the primary, secondary and arbiter servers, and everything started fine.
My question is: when should we actually issue the repair command (since removing the locks and restarting works)?
Please excuse me if the question is too dumb; I just want to know the risk of disabling journaling in production.
The repairDatabase command checks your whole database for corrupted data and discards that data so the rest becomes usable again.
This can become necessary after an unclean shutdown. In your case the shutdown didn't appear to corrupt any data (or maybe it did, but it hasn't become apparent yet because the data in question hasn't been accessed). But that doesn't mean this will always be the case. Was your database actually doing anything at that moment? When the database is idle or only performing read operations, there is usually not much to worry about. But when it is in the middle of a large write operation, a sudden shutdown without journaling can be much more troublesome.
Another scenario where a database could be corrupted and repairDatabase could help is a physical malfunction of the storage medium or a corruption of the underlying filesystem.
Important note regarding replica sets: when you have a replica set and only one node is corrupted, you should instead remove that node and rebuild it from the other members of the replica set. repairDatabase will discard any corrupted data; restoring from the replica set will not.
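In shell terms the two options boil down to roughly this (the dbpath and replica-set name are placeholders):
# option 1: repair in place - any corrupted documents found are discarded
mongod --dbpath /data/db --repair

# option 2, preferred for a replica-set member: wipe and resync from the healthy members
rm -rf /data/db/*
mongod --dbpath /data/db --replSet rs0    # initial sync rebuilds a clean copy from the other nodes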

Is mongod --repair still a blocking task in MongoDB 1.8?

I have a 5GB database I want to compact and repair. Unfortunately, I have an active application running on that database.
I'm wondering if running a mongod --repair task with MongoDB 1.8 will block all the other write operations on the database.
I don't want to shut down the entire application for hours...
You may take a look at the --journal option. It keeps a write-ahead log of recent operations, and recovery from it can take much less time than a repair.
http://www.mongodb.org/display/DOCS/Durability+and+Repair
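Enabling it is just a startup flag (the dbpath is a placeholder):
mongod --dbpath /data/db --journal   # after a crash, mongod replays the journal on startup instead of needing a repair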
Yes, repairDatabase is a blocking operation, which means you'll need to do it during a scheduled maintenance window.
Alternatively, if you are using a replica set, it's possible to repair with no downtime by taking one member out of the replica set, repairing it, re-adding it to the replica set, and repeating until all are repaired. See the note in yellow at the end of this section for more info and caveats.
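A rough outline of that rolling procedure, assuming default paths and an init-script install (adjust for your setup):
# on one secondary at a time
sudo service mongod stop              # take the member offline
mongod --dbpath /data/db --repair     # blocking repair, but only this member is down
sudo service mongod start             # rejoin the set and let it catch up
# repeat for each member; step down the primary (rs.stepDown() in the shell) and repair it last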

Is it normal for MongoDB's whole /data/db to be gone after an electric power trip that results in a crash?

I have a single machine that has MongoDB and its data is at /data/db as usual.
When my machine crashed due to an electric power trip, MongoDB refused to start at launch (Mac OS X Server via LaunchAgent) and /data/db had mysteriously disappeared!
Also, all log files were wiped out. This happened on my development SSD MBA and I thought it was just a weird SSD quirk, but my Xserve server gets it as well when the power trips.
Am I missing some data-protection step somewhere? Surely it can't be so unreliable as to just delete /data/db!?
MongoDB will never ever remove your database files!
In case of a crash you have to start mongod using the --repair option.
In addition, the new journaling option in MongoDB 1.8+ should help a lot when you run MongoDB as a standalone service.
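Assuming the data directory is actually still there, the classic pre-journaling recovery after an unclean shutdown looks roughly like this (default dbpath assumed):
rm /data/db/mongod.lock               # only remove the stale lock file if you are about to run a repair
mongod --dbpath /data/db --repair     # blocking; rewrites the data files, discarding anything unreadable
mongod --dbpath /data/db --journal    # then start normally, with journaling enabled going forward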
No, that is not normal.
If it won't start, it's likely MongoDB is indicating that you need to run a repair, because mongod.lock is present in /data/db and is in a certain state. But that would mean /data/db exists.
If /data/db existed but were empty (which in this case would obviously be bad), it would start right up.
If your logs are missing too, it sounds like a more general disk issue.
So, if there is data there, check the startup message about mongod.lock. Also, with v1.8+, use journaling (although you wouldn't lose all data files even without journaling).
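A quick way to see what is actually on disk before deciding anything (the log path is an assumption; adjust for your install):
ls -la /data/db                            # does the directory exist, and are the .ns/.0/.1 data files there?
cat /data/db/mongod.lock                   # a non-empty lock file (it holds a PID) usually means an unclean shutdown
tail -n 50 /var/log/mongodb/mongod.log     # startup messages will say whether a repair is being requested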