How to reclaim deleted space without `db.repairDatabase()`? - mongodb

I want to shrink the data files by reclaiming deleted space, but I can't run db.repairDatabase() because there is not enough free disk space.

Update: With WiredTiger, compact does free space.
The original answer to this question is here:
Reducing MongoDB database file size
There really is nothing outside of repair that will reclaim space. Running compact should allow you to go much longer on the existing space. Otherwise, you will have to migrate to a bigger drive.
One way to do this is to use an off-line secondary from your Replica Set. This should give you a whole maintenance window to migrate, repair, move back and bring back up.
If you are not running a Replica Set, it's time to look at doing just that.

You could run the compact command on a single collection, or one by one in all the collections you want to shrink.
http://www.mongodb.org/display/DOCS/Compact+Command
db.runCommand( { compact : 'mycollectionname' } )
As noted in the comments, I was mistaken: compact does not actually reclaim disk space; it only defragments the collection and rebuilds its indexes.
Instead, you could use the --repairpath option if you have another drive with enough free space.
For example:
mongod --dbpath /data/db --repair --repairpath /data/db0
Shown here: http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/
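For the "one by one in all the collections" variant of compact mentioned above, here is a minimal sketch in mongo-shell-style JavaScript. It assumes `db` behaves like the mongo shell's database handle (with getCollectionNames() and runCommand()). The helper name compactAll and the rule of skipping system.* collections are illustrative assumptions, not part of the answer. As the correction above says, on the original storage engine this defragments rather than returning space to the operating system.

```javascript
// Hypothetical helper: run compact on every collection, one at a time.
// `db` is assumed to behave like the mongo shell's database handle.
function compactAll(db) {
  var results = {};
  db.getCollectionNames().forEach(function (name) {
    // Skip internal collections such as system.indexes (an assumption;
    // adjust this filter to your deployment).
    if (name.indexOf("system.") === 0) {
      return;
    }
    // Same command as in the answer, issued once per collection.
    results[name] = db.runCommand({ compact: name });
  });
  return results;
}
```

In the shell this could be called as printjson(compactAll(db)), ideally during a maintenance window.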

You can also do a manual mongodump and mongorestore, which is basically what repairDatabase does. That way you can dump to, and restore from, a different machine with sufficient disk space.
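As a rough sketch of that dump-and-restore round trip, the snippet below only builds the two command lines so they can be reviewed before anything is run. The host names, database name, and dump directory are placeholders (assumptions), and the function names are hypothetical. The flags used are the standard --host, --db, --out, and --drop options of mongodump and mongorestore.

```javascript
// Build argv arrays for a dump from the full machine and a restore elsewhere.
// All host names and paths passed in are placeholders chosen by the caller.
function dumpCommand(sourceHost, dbName, outDir) {
  return ["mongodump", "--host", sourceHost, "--db", dbName, "--out", outDir];
}

function restoreCommand(targetHost, dumpDir) {
  // --drop removes each collection on the target before restoring it.
  return ["mongorestore", "--host", targetHost, "--drop", dumpDir];
}

// Print the commands for inspection; nothing is executed here.
console.log(dumpCommand("db1.example.net", "mydb", "/mnt/spare/dump").join(" "));
console.log(restoreCommand("db2.example.net", "/mnt/spare/dump").join(" "));
```

The dump directory has to live on a volume with enough free space, which is the whole point of doing this from a different machine.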

If you're running a replica set, you will want to issue a resync on each of your secondaries, one at a time. Once this has been completed, step down your primary and resync the newly assigned secondary.
To resync, stop your mongod instance, delete its local data files and start the process back up. Watch the logs to ensure everything starts back up properly and the resync has initiated.
If you have a lot of data / indexes, ensure your oplog is large enough, otherwise it's likely to go stale.
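One way to sanity-check the oplog before wiping a member is to compare the oplog window with how long you expect the resync to take. The sketch below assumes `info` looks like the result of db.getReplicationInfo() in the mongo shell, which reports the window as timeDiffHours. The function name, the estimated sync time, and the default safety factor of 2 are assumptions for illustration.

```javascript
// `info` is assumed to look like the result of db.getReplicationInfo(),
// which reports the oplog window in hours as `timeDiffHours`.
function oplogCoversResync(info, estimatedSyncHours, safetyFactor) {
  // Default safety margin of 2x is an arbitrary, illustrative choice.
  var factor = safetyFactor === undefined ? 2 : safetyFactor;
  return info.timeDiffHours >= estimatedSyncHours * factor;
}
```

In the shell, oplogCoversResync(db.getReplicationInfo(), 6) would return false if the oplog currently holds less than 12 hours of operations, which suggests growing the oplog first.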

There is one other option, if you are using a replica set, but with a lot of caveats. You can fail over to another set member, then delete the files on the now former primary and do a full resync. A full resync rewrites the files from scratch in a similar way to a repair, but you will also have to rebuild indexes. This is not to be done lightly.
If you go down this path, my recommendation would be to have a 3 member replica set before doing this for disk space reclamation, so that at any time when a member is syncing from scratch you have 2 set members fully functional.
If you do not have a replica set, I recommend creating one, with two secondaries. When you sync them initially you will be creating nice unfragmented and unpadded versions of your data. More here:
http://www.mongodb.org/display/DOCS/Replica+Set+Configuration

Related

Mongodb normal exit before applying a write lock

I am using Python, Scrapy, and MongoDB for my web scraping project. I scrape about 40 GB of data daily. Is there a way, or a setting in the mongodb.conf file, to make MongoDB exit normally before it applies a write lock on the database because of a disk-full error?
I run into this disk-full error in MongoDB every time, and then I have to manually re-install MongoDB to remove the write lock from the database. I can't run the repair or compact commands on the database because they also need free space.
MongoDB doesn't handle disk-full errors very well in certain cases, but you do not have to uninstall and re-install MongoDB to remove the lock file. Instead, you can just remove the mongod.lock file from the data directory. As long as you have journalling enabled, your data should be good. Of course, at that point you can't add more data to the MongoDB databases.
You probably wouldn't need repair, and compact only helps if you have actually deleted data from MongoDB. compact does not compress data, so it is only useful in that case.
Constantly adding and later deleting data can cause fragmentation and leave lots of disk space unused. You can mostly prevent that by using the usePowerOf2Sizes option, which you can set on collections. compact also mitigates this by rewriting the database files, but as you said, you need free disk space for that. I would also advise you to add some monitoring that warns you when your data size reaches 50% of your total disk space. At that point, there is still plenty of time to use compact to reclaim unused space.
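The 50% rule of thumb can be expressed as a tiny check that a monitoring job calls with numbers it already collects. This is only a sketch: the function name is hypothetical, and where the two byte counts come from (for example the fileSize field of db.stats() on older versions, and your operating system's disk statistics) is an assumption left to the caller.

```javascript
// Return true once the data files occupy at least `threshold` of the disk.
// The default of 0.5 mirrors the 50% suggestion in the answer.
function shouldWarn(dataBytes, diskBytes, threshold) {
  var limit = threshold === undefined ? 0.5 : threshold;
  if (!(diskBytes > 0)) {
    throw new RangeError("diskBytes must be a positive number");
  }
  return dataBytes / diskBytes >= limit;
}
```

For example, shouldWarn(114e9, 134e9) is true, since the data already uses well over half of the disk.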

MongoDB Replica Set: Disk size difference in Primary and Secondary Nodes

I just finished configuring a MongoDB replica set and everything looks good. All the data was copied to the secondary nodes properly. But when I looked at the data directories, I saw that the primary has ~140G of data while the secondary has only ~110G.
Has anyone come across this while setting up a replica set? Is this normal behavior?
When you do an initial sync from scratch on a secondary, it writes all the data fresh. This removes padding, empty space (deleted data) etc. As a result, in that respect it is similar to running a repair.
If you ran a repair on the primary (blocking operation, only to be done if absolutely necessary), then the two would be far closer overall.
If you check the output from db.stats() you should see that the various databases have the same object count; the difference in data directory size is nothing to worry about.
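A minimal sketch of that comparison, assuming you have captured db.stats() for the same database on both members and pass the two result documents in. The function name is hypothetical; objects and storageSize are fields of the db.stats() output.

```javascript
// `primaryStats` and `secondaryStats` are assumed to be db.stats() results
// for the same database, taken on the primary and on a secondary.
function compareMembers(primaryStats, secondaryStats) {
  return {
    // The document counts should match if replication has caught up.
    sameObjectCount: primaryStats.objects === secondaryStats.objects,
    // Positive when the primary holds space the freshly synced secondary does not.
    storageDeltaBytes: primaryStats.storageSize - secondaryStats.storageSize
  };
}
```

A matching object count combined with a positive storage delta is the pattern described above: the same data, with padding and deleted space still present on the primary.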

mongodb Excessive Disk Space

My MongoDB data takes 114 GB, which is 85% of my disk.
Trying to free some space using db.repairDatabase() fails because I don't have enough free space.
I know that my data shouldn't take so much space, as I used to have a big collection that took 90% of the disk.
I then dropped this collection and re-inserted only 20% of its data.
How can I free some space?
The disk space for data storage is preallocated by MongoDB and can only be reclaimed by rebuilding the database with new preallocated files. Typically this is done through a db.repairDatabase() or a backup & restore. As you've noted, a repair requires enough space to create a new copy of the database so may not be an option.
Here are a few possible solutions, but all involve having some free space available elsewhere:
If there is enough free space to mongodump that database, you could mongodump, drop, and mongorestore it. db.dropDatabase() will remove the data files on disk.
If you are using a volume manager such as LVM or ZFS, you could add extra disk space to the logical volume in order to repair or dump & restore the database.
If you have another server, you could set up a replica set to sync the data without taking down your current server (which would be the "primary" in the replica set). Once the data is synced to a secondary server, you could then step down the original primary and resync the database.
Note that for a replica set you need a minimum of three nodes: either three data nodes, or two data nodes plus an arbiter. In a production environment the arbiter would normally run on a third server so it can allow either of the data nodes to become a primary if the current primary is unavailable. In your disk space reclamation scenario it would be OK to run the arbiter on one of the servers instead (presumably the new server).
Replica sets are generally very helpful for administrative purposes, as they allow you to step down a server for maintenance (such as running a database compact or repair) while still having a server available for your application. They have other benefits as well, such as maintaining redundant copies of your data for automatic failover/recovery.

MongoDB: mongodump/restore vs. backing up files directly

I'm wondering about experiences people have had with MongoDB backups. Assuming a filesystem snapshot is not an option, what have your experiences been with mongodump/restore versus doing a write lock and backing up the files? Have you run into any bugs with one method that caused you to switch?
From the reading I've done so far, it seems like mongodump/restore has the advantage of being able to run it while the server is live, but I'm not sure how well it will scale.
Locking and copying files is only an option when you don't have heavy write load.
mongodump can be run against a live server. It will create some additional load, so don't do it during peak hours. Also, it is advised to run it on a secondary node (if you don't use replica sets, you should).
There are some complications when you have a DB so large that no single machine can hold it. See this document.
Also, if you have a replica set, you can take down one of the secondaries and copy its files directly. See http://www.mongodb.org/display/DOCS/Backups:
A simple approach is just to stop the database, back up the data files, and resume. This is safe but of course requires downtime. This can be done on a secondary without requiring downtime, but you must ensure your oplog is large enough to cover the time the secondary is unavailable so that it can catch up again when you restart it.

MongoDB data remove - reclaim diskspace [duplicate]

Possible Duplicate:
Auto compact the deleted space in mongodb?
My understanding is that on delete operations MongoDB won't free up the disk space but would reuse it as needed.
Is that correct?
If not, would I have to run a repair command?
Could the repair be run on a live mongo instance?
Yes it is correct.
No. It is better to give MongoDB as much disk space as possible (the more space MongoDB can allocate, the less disk fragmentation you will have; in addition, allocating space is an expensive operation). But if you wish, you can run db.repairDatabase() from the mongo shell to shrink the database size.
Yes, you can run repairDatabase on a live MongoDB instance (it is better to run it during off-peak hours).
This is somewhat of a duplicate of this MongoDB question ...
Auto compact the deleted space in mongodb?
See that answer for details on how to ...
Reclaim some space
Use server-side JS to run a recurring job to get back space (including a script you can run ...)
How you might want to look into Capped Collections for some use cases!
Also you can see this related blog posting: http://learnmongo.com/posts/compacting-mongodb-data-files/
I have another solution that might work better than doing db.repairDatabase() if you can't afford for the system to be locked, or don't have double the storage.
You must be using a replica set.
My thought is once you've removed all of the excess data that's gobbling your disk, stop a secondary replica, wipe its data directory, start it up and let it resynchronize with the master. Repeat with the other secondaries, one at a time.
On the master, run rs.stepDown() to hand over the master role to one of the synced secondaries; then stop the old master, wipe it, and let it resync.
The process is time consuming, but it should only cost a few seconds of downtime, when you do the rs.stepDown().
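The order of operations described above can be sketched as a small planner. It assumes `members` looks like the members array returned by rs.status(), where each entry has a name and a stateStr such as "PRIMARY" or "SECONDARY". The function name and the wording of the steps are illustrative only, and nothing here stops or wipes anything.

```javascript
// `members` is assumed to look like rs.status().members.
// Returns the steps in the order described above: secondaries first,
// one at a time, then step down and resync the former primary.
function rollingResyncPlan(members) {
  var steps = [];
  members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      steps.push("stop " + m.name + ", wipe its data directory, restart, wait for resync");
    }
  });
  members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") {
      steps.push("run rs.stepDown() on " + m.name);
      steps.push("stop " + m.name + ", wipe its data directory, restart, wait for resync");
    }
  });
  return steps;
}
```

In the shell, rollingResyncPlan(rs.status().members).forEach(function (s) { print(s); }) would print the checklist; each step should only start once the previous member is back in SECONDARY state.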