Exception when restarting from replica set to standalone - MongoDB

I am currently experimenting with the MongoDB replica set mechanism.
I already have a working standalone Mongo server with a main database of about 20 GB of data.
I decided to convert this Mongo server to the primary of a replica set, then added a second machine with a similar configuration (but a newer Mongo version) as a secondary.
This works fine; all data is replicated to the secondary as expected.
But I would like to perform some alteration operations on the data (my data model has changed and I need to, for example, rename some properties or convert references to a simple ObjectId). At the same time I would like to update the first server, which runs an old version (2.4), to the latest version available (2.6).
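For illustration, the kind of property rename I mean looks roughly like this (a sketch; the collection and field names are placeholders):
# sketch: rename a property on every document of a (placeholder) collection
mongo myawesomedb --eval 'db.things.update({}, { $rename: { "oldName": "newName" } }, { multi: true })'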
So I decided to follow the instructions on the MongoDB website to perform maintenance on replica set members.
Shut down the secondary server. (OK)
Restart the server as a standalone on another port (both servers normally run on 27017):
mongod --dbpath /my/database/path --port 37017
And then, the server never restarts correctly and I get this:
2014-10-03T08:20:58.716+0200 [initandlisten] opening db: myawesomedb
2014-10-03T08:20:58.735+0200 [initandlisten] myawesomedb Assertion failure _name == nsToDatabaseSubstring( ns ) src/mongo/db/catalog/database.cpp 472
2014-10-03T08:20:58.740+0200 [initandlisten] myawesomedb 0x11e6111 0x1187e49 0x116c15e 0x8c2208 0x765f0e 0x76ab3f 0x76c62f 0x76cedb 0x76d475 0x76d699 0x7fd958c3eec5 0x764329
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
/usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x17e) [0x116c15e]
/usr/bin/mongod(_ZN5mongo8Database13getCollectionERKNS_10StringDataE+0x288) [0x8c2208]
/usr/bin/mongod(_ZN5mongo17checkForIdIndexesEPNS_8DatabaseE+0x19e) [0x765f0e]
/usr/bin/mongod() [0x76ab3f]
/usr/bin/mongod(_ZN5mongo14_initAndListenEi+0x5df) [0x76c62f]
/usr/bin/mongod(_ZN5mongo13initAndListenEi+0x1b) [0x76cedb]
/usr/bin/mongod() [0x76d475]
/usr/bin/mongod(main+0x9) [0x76d699]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fd958c3eec5]
/usr/bin/mongod() [0x764329]
2014-10-03T08:20:58.756+0200 [initandlisten] exception in initAndListen: 0 assertion src/mongo/db/catalog/database.cpp:472, terminating
2014-10-03T08:20:58.757+0200 [initandlisten] dbexit:
What am I doing wrong?
Note that at this time, the first server is still running as the primary member.
Thanks in advance!

I believe you are hitting a bug in VMware here (can you confirm you are using VMware VMs? confirmed) - I have seen it confirmed on Ubuntu and Fedora so far. The bug causes pieces of previous data not to be zeroed out when creating the MongoDB namespace files (not always, but sometimes). That previous data essentially manifests as corruption in the namespace files and leads to the assertion you saw.
To work around the issue, a fix will be released in MongoDB versions 2.4.12 and 2.6.5+ as part of SERVER-15369. The OS/kernel-level fix will eventually percolate down from the kernel bug and the Ubuntu patch, but it may take some time before that is actually available as an official update (hence the need for the workaround in MongoDB itself in the interim).
The issue only becomes apparent when you upgrade to 2.6 because of additional checking added in that version that was not present in 2.4; the corruption is still present on 2.4, just not reported.
If you still have your primary running, and it does not have the corruption, I would recommend syncing a secondary that is not on a VMware VM and/or taking a backup of your files as soon as possible for safety - there is no automatic way to fix this corruption right now.
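If you go the backup route, a minimal mongodump sketch (host, port, and output path are assumptions to adapt):
# dump all databases from the still-healthy primary
mongodump --host localhost --port 27017 --out /backup/pre-fix-dump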
You can also look at using version 2.6.5 once it is released (2.6.5-rc4, which includes the fix, is available as of this writing). You will still need to resync off your good source with that version to create a working secondary, but at least there will then be no corruption of the .ns files.
Updates:
Version 2.6.5, which includes the fix mentioned, was released on October 9th.
Version 2.4.12, which includes the fix, was released on October 16th.
Official MongoDB Advisory: https://groups.google.com/forum/#!topic/mongodb-announce/gPjazaAePoo

Related

MongoDB WiredTiger error: WiredTiger.turtle: handle-open: open: operation not permitted

MongoDB was working beautifully for me for several months until I had an unexpected shutdown a week or two ago. Since then, I've been getting the error in the title that snowballs into an invalid argument, then a library panic, then some fatal assertions which cause MongoDB to crash.
Now, I've done my research: the normal answers are to run the repair function and to make sure SELinux isn't screwing up the process. Neither of those have worked. The error gets thrown during WiredTiger's checkpoint process, so reads/writes to the database aren't the issue, and because it's during the checkpoint process, it guarantees that MongoDB won't stay up for more than a day.
To be clear: all the files in the database are owned by mongod:mongod and have permissions set to 600 (the default; I tried setting them to 755 to see if that fixed it, and it didn't). I'm running MongoDB as a service on a CentOS 7 box, and the service file specifies that it should run as user mongod. The mongod.conf file specifies a mounted filesystem as the database path, and it was happy with that until the unexpected shutdown. I'm running MongoDB version 4.0.1, so WiredTiger really doesn't like it if I disable journaling either (disregarding the fact that I shouldn't disable it in the first place).
I feel like I've exhausted all my options, and that the only thing I can do is backup my data and reinstall MongoDB. Are there any that I've missed?
After creating a backup of my data via mongodump, shutting down mongo, removing the entire database with rm -rf 'path-to-database', restarting mongod (without the replication config), and restoring the data with mongorestore, MongoDB still crashes. This time, however, it's with an invariant failure after the open: operation not permitted. The only conclusion I can draw is that the data itself has become corrupted in some way. Thankfully, this isn't "mission critical" data, so to speak, and I can easily obtain new data.
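For reference, the sequence I described boils down to something like this (a sketch; the paths and the systemd service name are assumptions from my CentOS 7 setup):
# back up everything while mongod is still running
mongodump --out /backup/dump
# stop mongod and wipe the data directory (the dbPath from mongod.conf)
sudo systemctl stop mongod
sudo rm -rf /path/to/database/*
# start mongod again (without the replication config) and restore the dump
sudo systemctl start mongod
mongorestore /backup/dump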
Unfortunately, this doesn't answer my original question of "what other options do I have?". However, I'm still posting this in case others run into this same kind of issue.
EDIT: invariant issue was caused by me forgetting to re-initialize my replication set. After fixing that, it's clean. Because of this, I no longer believe it was a data corruption issue, but a checkpoint corruption issue.
EDIT 2: So the issue arose again after about a week, and after another week of trying various debugging methods, I tried simply moving the mongo process to another server. So far, that's been working. The previous server was acting up (I couldn't even run top at one point - another process had a lock on a necessary library file), so here's hoping that the current server doesn't follow suit.

mongod unclean shutdown detected

I try to start mongod.exe but I get the following error:
C:\MongoDB\Server\30\bin>mongod.exe
2015-12-16T19:12:17.108+0100 I CONTROL
2015-12-16T19:12:17.110+0100 W CONTROL 32-bit servers don't have journaling enabled by default.
Please use --journal if you want durability.
2015-12-16T19:12:17.110+0100 I CONTROL
2015-12-16T19:12:17.120+0100 I CONTROL Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-12-16T19:12:17.132+0100 I STORAGE [initandlisten] **************
2015-12-16T19:12:17.132+0100 I STORAGE [initandlisten] Error: journal files are present in journal directory, yet starting without journaling enabled.
2015-12-16T19:12:17.133+0100 I STORAGE [initandlisten] It is recommended that you start with journaling enabled so that recovery may occur.
2015-12-16T19:12:17.133+0100 I STORAGE [initandlisten] **************
2015-12-16T19:12:17.135+0100 I STORAGE [initandlisten] exception in initAndListen: 13597 can't start without --journal enabled when journal/ files are present, terminating
2015-12-16T19:12:17.135+0100 I CONTROL [initandlisten] dbexit: rc: 100
I also tried to run it with --repair but then I get the same error.
Finally, I tried to delete the mongod.lock file but I still get the error.
How should I fix the unclean shutdown?
The solution to this problem is mongod --repair. This command automatically shuts down all processes and repairs MongoDB issues. You can find more details in the official documentation.
OK, to clear up some confusion here: journal files are not there to annoy you. They hold data not yet applied to the data files, but already received and acknowledged by the server. The mongod process finishes a request after applying the data to the journal, but before applying it to the data files.
This behavior is configured by the chosen write concern.
Bottom line: special measures were taken to make the data in the journal durable; you should not ignore that.
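To make that concrete: a write concern that involves the journal only acknowledges a write once it has hit the journal, so such writes fail when journaling is off (a sketch against placeholder names; the writeConcern option requires a 2.6+ shell):
REM j: true = acknowledge only after the write is durable in the journal
mongo mydb --eval "db.mycoll.insert({ x: 1 }, { writeConcern: { w: 1, j: true } })"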
So you should create a configuration file containing this (among other things, if one already exists):
storage:
  journal:
    enabled: true
Please follow the documentation on running MongoDB on Windows to the letter, and adjust the configuration file with options according to your needs.
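Then point mongod at that file when starting it (the path here is an example, not a required location):
mongod.exe --config C:\MongoDB\mongod.cfg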
If you are absolutely, positively sure that you do not need journaling, you can start mongod with the --journal command line option just once, shut the instance down after the journal has been successfully applied, and remove the journal files then. Expect any write with a write concern involving the journal to fail, however.
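If you really go that route, the sequence would look roughly like this (a sketch, assuming C:\data\db as the dbpath):
REM start once with journaling so recovery can apply the journal
mongod.exe --dbpath C:\data\db --journal
REM after startup completes, shut the instance down cleanly (from another shell)
mongo admin --eval "db.shutdownServer()"
REM only then remove the (now applied) journal files
rmdir /s /q C:\data\db\journal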
Note 1: You are using the 32-bit version of MongoDB, which is only suitable for testing. Note that the 32-bit version only supports up to 2 GB of data.
Note 2: MongoDB is VERY well documented. You really should read the manual from top to bottom – it gets you started fast enough while providing a lot of information on the internals.
Start a cmd shell as admin and call start_mongo. This should fix it.
Same error.
It's a permission issue. If you get this error on the Windows platform, you should do all operations with administrator privileges.
On Linux, run
mongod --repair
but you should run it with sudo or as root. If you run it as root, you will then need to change the ownership of the files in the DB data directory back. Do it carefully or another error will appear.
Try removing the lock file, then running with --repair.
Here's what the Mongo Docs say about recovering data / restarting after an unexpected shutdown.
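On Linux that boils down to something like this (a sketch; adjust the dbpath to yours):
# remove the stale lock file left by the unclean shutdown
rm /data/db/mongod.lock
# run a repair pass against the data files
mongod --dbpath /data/db --repair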

Why and in what case to issue the MongoDB repair command

I am using MongoDB 2.4.8 on a 64-bit machine with 3 servers as a replica set, for which I have currently disabled journaling on my development box.
Durability is not so important for our application, which is the reason I have disabled the journaling option.
I see only one advantage of journaling: in case of an unclean shutdown, we don't have to issue a repair command, as journaling will take care of it.
To produce this unclean shutdown, I killed the Mongo replica process using kill -9 <mongo process id>; I just removed the Mongo locks and restarted the primary, secondary, and arbiter servers, and everything started fine.
My question is: when should we actually issue the repair command (since removing the locks and restarting works)?
Please excuse the question if it is too dumb; I want to know the risk of disabling journaling in production.
The repairDatabase command checks your whole database for corrupted data and discards that data so the rest becomes usable again.
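It can be run offline against the data files, or online from the shell (a sketch; mydb is a placeholder):
# offline, with the server stopped
mongod --dbpath /data/db --repair
# or online, per database, from the mongo shell
mongo mydb --eval 'db.repairDatabase()'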
This can become necessary after an unclean shutdown. In your case the shutdown didn't appear to corrupt any data (or maybe it did, but it hasn't become apparent yet because the data in question hasn't been accessed). But that doesn't mean this will always be the case. Was your database actually doing anything at that moment? When the database is idle or only performing read operations, there is usually not much to worry about. But when it is in the middle of a large write operation, a sudden shutdown without journaling can be much more troublesome.
Another scenario where a database could be corrupted and repairDatabase could help is a physical malfunction of the storage medium or a corruption of the underlying filesystem.
Important note regarding replica sets: when you have a replica set and only one node is corrupted, you should rather remove that node and rebuild it from the other members of the replica set. repairDatabase will destroy any corrupted data; restoring from the replica set will not.
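Rebuilding a member that way is roughly (a sketch; the dbpath and replica-set name are placeholders):
# on the corrupted member: stop mongod, then empty its data directory
rm -rf /data/db/*
# restart it with its original replica-set options; it repopulates itself
# from the healthy members via an initial sync
mongod --dbpath /data/db --replSet rs0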

MongoDB 2Gb limit - can't compact database

I have been adding files to GridFS in my 32-bit Mongo database. It eventually failed when the size of all the Mongo files hit 2 GB. So I then deleted the files in GridFS. I've tried running the repairDatabase() command, but it fails, saying "mongo requires 64 bit for larger datasets". I get the same error trying to run the compact command against GridFS.
So I've hit the 2 GB limit, but it won't let me compact or repair because it doesn't have space. Talk about a Catch-22!
What do I do?
Edit
This is an immediate problem I have - how do I compact the database right now?
I think the only recourse is to upgrade to a 64-bit OS.
I had the same problem with my database and I solved it this way. First I created an Amazon EC2 64-bit instance and moved the database files over from the 32-bit instance via a plain copy. Then I did all the needed cleanups in the database on the 64-bit instance and made a dump with mongodump. I moved this dump back to the 32-bit instance and restored the database from it.
If you need to restore the database with the same name that you had before, you can just rename your old db files in the dbpath (the files have the database name in their name).
And of course, you should upgrade to 64-bit later. MongoDB support on a 32-bit OS is very limited.
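The round trip described above, in rough commands (a sketch; paths are placeholders):
# on the 64-bit EC2 instance, after copying the data files over:
mongod --dbpath /data/db        # start it, do the cleanups, then:
mongodump --out /tmp/dump
# back on the 32-bit instance, restore from the dump:
mongorestore /tmp/dump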
A shot in the dark here... you could try opening a slave off the master (in 64-bit) and see if you can force a replication over to the slave, essentially backing up your data. I have no idea if this would actually work, as it's pretty clear that 32-bit has a 2 GB limit (all their docs warn about this :( ), but I thought I'd at least post a somewhat potentially creative solution..
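For reference, the old master/slave mode this refers to was configured with dedicated flags (a sketch; host names are placeholders, and the feature is long since deprecated):
# on the existing 32-bit instance
mongod --dbpath /data/db --master
# on a 64-bit machine, replicating from it
mongod --dbpath /data/db --slave --source master-host:27017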

Is it normal for MongoDB's whole /data/db to be gone after an electric trip that results in a crash

I have a single machine that has MongoDB, and its data is at /data/db as usual.
When my machine crashed due to an electric power trip, MongoDB refused to start at launch (Mac OS X Server via LaunchAgent) and /data/db mysteriously disappeared!
All the log files were wiped out as well. This happened on my development SSD MBA, and I thought it was just a weird SSD case. But my Xserve server is getting it as well when the power trips.
Am I missing some data protection articles somewhere? Surely it can't be so unreliable as to just delete /data/db!?
MongoDB will never ever remove your database files!
In case of a crash you have to start mongod using the --repair option.
In addition: using the new journaling option of MongoDB in v1.8+ should help a lot when you run MongoDB as a standalone service.
No, that is not normal.
If it won't start, mongodb is likely indicating that you need to run a repair because mongod.lock is present in /data/db and has a certain state. But that would mean /data/db exists.
If /data/db existed but were empty (which in this case would obviously be bad), it would start right up.
If your log(s) are missing too, it sounds like a more general disk issue.
So check the startup message about mongod.lock to see whether there is data there. Also, with v1.8+, use journaling (albeit you wouldn't lose all the data files even without journaling).