MongoRestore create index phase uses 100% of resources and locks up the database

I'm using MongoDB. I have a collection with 7M records and a weighted text search index.
When I run mongorestore, the index-creation phase of the restore uses 100% of my database's resources, and MongoDB is unresponsive to everything else until it is done. My db is locked to any incoming connections. At that point it even stops reporting the progress of the index creation to my output, and my MongoDB client starts getting request timeout errors. I can still tail the server-side MongoDB logs to check the progress of the index creation.
I need the database to stay responsive while this process is happening. The restore works just fine for all my other collections, which are a bit smaller. The next largest collection, which restores without trouble and also uses a weighted text search index, is around 3M records.
What do I do?! Thanks.

I haven't tried this, but it seems that indexes created with { background: true } are dumped with this property by mongodump, and the property is then passed along to mongorestore during the index-creation phase.
Maybe you could recreate some strategic indexes with the background option and then dump the database. The restore process should then put less strain on the server and finish faster, and read and write operations should still be allowed while MongoDB rebuilds the backgrounded indexes.
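For example, a minimal sketch in the mongo shell, using hypothetical collection, field, and index names, that recreates a weighted text index as a background build before dumping:
db.articles.dropIndex("title_text_body_text")   // hypothetical name of the existing text index
db.articles.ensureIndex(
    { title: "text", body: "text" },
    { weights: { title: 10, body: 1 }, background: true }
)
After this, mongodump should record background: true in the index metadata, and mongorestore should honor it.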
Note that background index builds take longer to complete and result in a larger index. Also, this will not help on secondary replica set members, since background index creation operations run in the foreground on secondaries.
http://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/
http://docs.mongodb.org/manual/tutorial/build-indexes-on-replica-sets/
HTH.

I ran into similar issues:
mongorestore took up so many resources that other database operations would simply time out or take on the order of a minute to complete (the restore is, in essence, a denial-of-service attack on the DB).
The mongorestore index-creation phase completely blocked the database.
I found that limiting the bandwidth available to the restore solved issue 1. I used the Linux tc command-line tool to achieve this, tweaking the rate and burst from very low values until other database operations started to be affected, and then scaling back a bit. The command looked as follows:
sudo tc qdisc change dev enp3s0 root tbf rate 30000kbit burst 40000kbit latency 5ms
To solve issue 2, I found this link, which suggests you either:
update the *.metadata.json files in the dump directory to add background: true where it is not present, or
use mongorestore's --noIndexRestore option to avoid accidentally building any indexes in the foreground, and then create the indexes with background: true after mongorestore finishes restoring the data (see the sketch below).
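A minimal sketch of the second approach, with hypothetical paths and collection/field names:
mongorestore --noIndexRestore /backups/dump
# then, from the mongo shell, rebuild each index in the background, e.g.:
# db.mycollection.ensureIndex({ title: "text", body: "text" }, { background: true })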
Of course, all of this is only an issue because MongoDB best practices are not being followed, namely that the operational database should always run in some form of replica set. If replication is present, you have many more options, such as (oversimplified) taking one member out of the replica set, restoring to it, and then moving it back into the replica set.

Related

Drop a 5TB collection in Mongo without bringing down the db

In our Mongo configuration we have a replica set with a primary and 2 secondaries. We currently have a collection that is about 5TB in size that we want to drop completely. From reading the docs it sounds like just dropping the collection would lock the database. It seems like it might take a while to delete 5TB, and anything more than a few minutes of downtime really isn't an option.
I tried deleting records a little bit at a time via query and remove commands, but this still slowed the db down to a crawl.
I've thought about taking the primary out of the set, dropping the collection, and then putting it back in the set as primary, but what will the impact be of having those changes replicate to the secondaries? Is it still just going to use a ton of CPU and lock things up?
The end goal is to move all of our mongo instances to smaller disks, so it would be nice if there was an option that allowed us to tackle both the migration and the deletion of the data at the same time.
Any advice is appreciated.

How to reclaim deleted space without `db.repairDatabase()`?

I want to shrink my data files by reclaiming deleted space, but I can't run db.repairDatabase() because there is not enough free disk space.
Update: With WiredTiger, compact does free space.
The original answer to this question is here:
Reducing MongoDB database file size
There really is nothing outside of repair that will reclaim space. Running compact should allow you to go much longer on the existing space; otherwise, you will have to migrate to a bigger drive.
One way to do this is to use an off-line secondary from your Replica Set. This should give you a whole maintenance window to migrate, repair, move back and bring back up.
If you are not running a Replica Set, it's time to look at doing just that.
You could run the compact command on a single collection, or one by one in all the collections you want to shrink.
http://www.mongodb.org/display/DOCS/Compact+Command
db.runCommand( { compact : 'mycollectionname' } )
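For instance, a minimal sketch that compacts every collection in the current database, one at a time (note that on MMAPv1 compact blocks operations on the database while it runs):
db.getCollectionNames().forEach(function (name) {
    if (name.indexOf("system.") === 0) return;   // skip system collections
    printjson(db.runCommand({ compact: name }));
});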
As noted in the comments, I was mistaken: compact does not actually reclaim disk space, it only defragments and rebuilds collection indexes.
Instead, you could use the --repairpath option if you have another drive available with enough free space.
For example:
mongod --dbpath /data/db --repair --repairpath /data/db0
Shown here: http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/
You can also do a manual mongodump and mongorestore; that's basically what repairDatabase does. That way you can dump to, and restore from, a different machine with sufficient disk space.
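A minimal sketch of that approach, with hypothetical hostnames and paths:
mongodump --host olddb.example.com --out /backups/dump
mongorestore --host newdb.example.com --drop /backups/dump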
If you're running a replica set, you will want to issue a resync on each of your secondaries, one at a time. Once this has been completed, step down your primary and resync the newly assigned secondary.
To resync, stop your mongod instance, delete the data in its dbpath (including the local database), and start the process back up. Watch the logs to ensure everything starts back up properly and the resync has initiated.
If you have a lot of data / indexes, ensure your oplog is large enough, otherwise the member is likely to go stale.
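A rough sketch of the per-secondary procedure, assuming a typical Linux service setup and a dbpath of /data/db (both hypothetical):
sudo service mongod stop
rm -rf /data/db/*                      # wipe the dbpath so the member performs a full initial sync
sudo service mongod start
tail -f /var/log/mongodb/mongod.log    # confirm the initial sync starts and completes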
There is one other option if you are using a replica set, but it comes with a lot of caveats. You can fail over to another set member, then delete the files on the now-former primary and do a full resync. A full resync rewrites the files from scratch in a similar way to a repair, but you will also have to rebuild indexes. This is not to be done lightly.
If you go down this path, my recommendation would be to have a 3-member replica set before doing this for disk-space reclamation, so that at any time when a member is syncing from scratch you still have 2 fully functional set members.
If you do not have a replica set, I recommend creating one with two secondaries. When you sync them initially, you will be creating nice unfragmented and unpadded copies of your data. More here:
http://www.mongodb.org/display/DOCS/Replica+Set+Configuration
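As a rough illustration, a 3-member replica set can be initiated from the mongo shell like this (hostnames are hypothetical, and each mongod must already be running with --replSet rs0):
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})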

Is there a limit for copyDatabase in Mongo?

I'm copying 100 million records (about 97 GB of data) to another server using copyDatabase in Mongo. Both servers have more than 500 GB of disk space. However, I notice that although the process is still running, the actual files are not growing anymore; they stop at xxxxx.11. Any idea?
copyDatabase will move the collection data over, then build indexes for each collection. Building indexes on a 100 GB data set can take a lot of time (especially with small amounts of RAM). It's likely that you're in the middle of a large index build.
You can check the progress by watching the logs and running db.currentOp() in the shell on the destination DB.
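For example, a small sketch that filters db.currentOp() output down to in-progress index builds on the destination (the "Index Build" message text is an assumption that matches older MongoDB versions):
db.currentOp(true).inprog.forEach(function (op) {
    if (op.msg && op.msg.indexOf("Index Build") !== -1) printjson(op);
});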

MongoDB: mongodump/restore vs. backup up files directly

I'm wondering about experiences people have had with MongoDB backups. Assuming a filesystem snapshot is not an option, what have your experiences been with mongodump/restore versus taking a write lock and backing up the files? Have you run into any bugs with one method that caused you to switch?
From the reading I've done so far, it seems like mongodump/restore has the advantage of being able to run it while the server is live, but I'm not sure how well it will scale.
Locking and copying files is only an option when you don't have heavy write load.
mongodump can be run against a live server. It will create some additional load, so don't do it during peak hours. Also, it is advisable to do it on a secondary node (if you don't use replica sets, you should).
There are some complications when you have a DB so large that no single machine can hold it. See this document.
Also, if you have a replica set, you can take down one of the secondaries and copy its files directly. See http://www.mongodb.org/display/DOCS/Backups:
A simple approach is just to stop the database, back up the data files, and resume. This is safe but of course requires downtime. This can be done on a secondary without requiring downtime, but you must ensure your oplog is large enough to cover the time the secondary is unavailable so that it can catch up again when you restart it.
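A minimal sketch of that stop-copy-restart approach on a secondary (the service name and paths are hypothetical):
sudo service mongod stop
cp -a /data/db /backups/db-snapshot-$(date +%F)
sudo service mongod start    # the secondary catches up from the oplog once it rejoins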

When/Where is the best-practices time/place to configure a MongoDB "schema"?

In an app that uses MongoDB, when/where is the best place to make database changes that would be migrations in a relational database?
For example, how should creating indexes or setting shard keys be managed? Where should this code go?
It's probably best to do this in the shell, consciously, because you could cause havoc if you accidentally start such a command at the wrong moment and on the wrong instance.
Most importantly: do this offline on an extra slave instance if you are adding an index to an existing DB! For large data sets, building an index can take hours, even days!
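A rough sketch of the offline, one-member-at-a-time approach (the port, path, and index spec are hypothetical):
# restart the secondary as a standalone on a different port so the application cannot reach it
mongod --dbpath /data/db --port 47017
# from a shell connected to port 47017, build the index:
#   db.mycollection.ensureIndex({ lastname: 1, firstname: 1 })
# then restart it with its original --replSet options and let it catch up from the oplog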
see also:
http://www.mongodb.org/display/DOCS/Indexes
http://www.javabeat.net/articles/print.php?article_id=353
http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation
http://nosql.mypopescu.com/post/1312926692/mongodb-indexes-and-indexing
If you have a large data set, make sure to read up on the foursquare outage from last year:
http://www.infoq.com/news/2010/10/4square_mongodb_outage
http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
One of the main reasons for not wanting to put indexing in a script or config file of some sort is that in MongoDB the index operation is blocking(!) -- that means MongoDB will stop other operations on the database from proceeding until the indexing is completed. Just imagine an innocent change in the code, requiring a new index to improve performance -- and this change is carelessly checked in and deployed to production ... and suddenly your production MongoDB is freezing up for your app server, because MongoDB is internally adding the new index before doing anything else... ouch! Apparently that has happened to a couple of folks, which is why they keep reminding people at the MongoDB conferences to be careful not to 'programmatically' require indexes.
Newer versions of MongoDB allow background indexing -- you should always use it, e.g. db.yourcollection.ensureIndex(..., {background: true})
otherwise, not-so-fun stuff happens:
https://jira.mongodb.org/browse/SERVER-1341
https://jira.mongodb.org/browse/SERVER-3067