MongoDB data remove - reclaim diskspace [duplicate] - mongodb

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Auto compact the deleted space in mongodb?
My understanding is that on delete operations MongoDB won't free up the disk space but would reuse it as needed.
Is that correct?
If not, would I have to run a repair command?
Could the repair be run on a live mongo instance?

Yes, that is correct.
No. It is better to give MongoDB as much disk space as possible: the more space MongoDB can allocate, the less disk fragmentation you will have, and allocating space is an expensive operation. If you wish, though, you can run db.repairDatabase() from the mongo shell to shrink the database size.
Yes, you can run repairDatabase on a live MongoDB instance (it is better to run it during off-peak hours).

This is somewhat of a duplicate of this MongoDB question: Auto compact the deleted space in mongodb?
See that answer for details on:
How to reclaim some space
How to use server-side JS to run a recurring job to get back space (including a script you can run ...)
How you might want to look into Capped Collections for some use cases!
Also, you can see this related blog post: http://learnmongo.com/posts/compacting-mongodb-data-files/

I have another solution that might work better than doing db.repairDatabase() if you can't afford for the system to be locked, or don't have double the storage.
You must be using a replica set.
My thought is this: once you've removed all of the excess data that's gobbling your disk, stop a secondary replica, wipe its data directory, start it up, and let it resynchronize with the master. Repeat with the other secondaries, one at a time.
On the master, run rs.stepDown() to hand over the master role to one of the synced secondaries. Then stop the old master, wipe it, and let it resync.
The process is time-consuming, but it should only cost a few seconds of downtime, when you run rs.stepDown().
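The ordering described above (secondaries first, one at a time, and the current primary last) can be sketched as a small planning helper. This is only an illustration: resyncOrder is a hypothetical function, and it assumes its input has the same shape as the members array returned by rs.status(), with name and stateStr fields. It only produces a checklist; it does not stop or wipe anything.

```javascript
// Hypothetical helper (not part of MongoDB): given replica set members,
// return the order in which to wipe and resync them.
// `members` is assumed to look like rs.status().members: { name, stateStr }.
function resyncOrder(members) {
  const secondaries = members.filter(m => m.stateStr === "SECONDARY");
  const primaries = members.filter(m => m.stateStr === "PRIMARY");
  const steps = [];
  // Secondaries go first, one at a time.
  for (const m of secondaries) {
    steps.push({
      host: m.name,
      action: "stop, wipe data directory, restart, wait for resync"
    });
  }
  // The primary goes last, after handing over its role.
  for (const m of primaries) {
    steps.push({
      host: m.name,
      action: "rs.stepDown(), then stop, wipe data directory, restart, wait for resync"
    });
  }
  return steps;
}
```

In the mongo shell, something like resyncOrder(rs.status().members) would give the list to work through by hand, waiting for each member to return to SECONDARY before moving on.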

Related

Completely deleting a database in sharded mongoDB cluster

I am planning to test a MongoDB cluster with some random data to test for performance. Then, I am planning to delete the data and use it for production.
My concern is that just running db.dropDatabase() may not reclaim all the disk space on all the shards and config servers. This answer from Stack Overflow says that "MongoDB does not recover disk space when actually data size drops due to data deletion along with other causes."
This documentation kind of says that "You do not need to reclaim disk space for MongoDB to reuse freed space. See Empty records for information on reuse of freed space" but I want to know the proper steps to delete a sharded MongoDB database.
I am currently on MongoDB 3.6.2.
Note: To people who may say I need a different MongoDB database for testing and production, I want to make it clear that the production deployment is itself a test to replace another old database. So right now, I am not looking for another big cluster just to test for performance.
I think the article below has the best solution. I could explain it here, but I would only be wasting my time and yours by repeating it:
https://dzone.com/articles/reclaiming-disk-space-from-mongodb

MongoDB disk space reclamation

I am familiar both with the MongoDB repairDatabase and compact commands, but these both seem to lock the database and/or collection. Is there another way to reclaim deleted disk space without essentially shutting down the database? What are best practices in this area? Thanks!
Best practice would probably depend on your schema and what your application does. Here's my use case; perhaps you can learn something from it. My application stores very large amounts of time-stamped data samples. Deleting data from a very large store is a very expensive operation, and it gets more complicated when you try to do it on live systems. MongoDB had several issues in the past with returning disk space to the OS, and we had to dance around this; I'm not sure how well it works now.
What we did solved everything for good: we partitioned the data in such a way that we could dispose of old data by simply dropping an entire database. Dropping a MongoDB database is a very cheap and efficient operation, almost instantaneous even when you drop a TB. Note that dropping a collection is not as effective as dropping a database; this was actually the key to the solution. To do this we had to redesign the schema. Your case could of course be different, but the lesson learned is that deleting data from large storage is very expensive.
The best method currently is to run a master/slave setup.
Shut down one mongod instance and let it resync.
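A minimal sketch of what such partitioning could look like follows. The answer does not say how the data was split, so the scheme here (one database per calendar month, named with a "samples" prefix) and the helper names dbNameFor and expiredDatabases are assumptions made purely for illustration.

```javascript
// Hypothetical naming scheme: one database per calendar month,
// e.g. "samples_2013_04". Zero-padded so names sort chronologically.
function dbNameFor(date, prefix) {
  const year = date.getUTCFullYear();
  const month = ("0" + (date.getUTCMonth() + 1)).slice(-2);
  return prefix + "_" + year + "_" + month;
}

// Return the database names that belong to months strictly before the
// month containing `cutoff`. Names without the prefix are ignored.
function expiredDatabases(names, prefix, cutoff) {
  const oldestKept = dbNameFor(cutoff, prefix);
  return names.filter(function (name) {
    return name.indexOf(prefix + "_") === 0 && name < oldestKept;
  });
}
```

In the mongo shell, each expired name could then be dropped with db.getSiblingDB(name).dropDatabase(), which removes the whole database rather than deleting documents one by one.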
More details here: Reducing MongoDB database file size

Why mongodb doesn't use the deleted data's space? [duplicate]

This question already has answers here:
Auto compact the deleted space in mongodb?
(4 answers)
Closed 8 years ago.
I set one shard's maxSize to 20GB and remove data every day. However, the storageSize of one db in this shard is about 18GB, while its data files on disk take about 33GB and keep growing. I know MongoDB only marks data as deleted, but shouldn't it reuse that space? Or do I need to configure something?
I would not run repair on your database, as suggested, whenever you feel like getting your space back; for one thing, this would cause disastrous performance problems.
The problem you have is that documents are not "fitting" into the spaces in your deleted buckets list (for reference, this is a good presentation: http://www.mongodb.com/presentations/storage-engine-internals; it explains everything I am talking about). Since MongoDB runs out of "steam" searching that list before it allocates space for a whole new document, you are most likely getting a new extent for every document you insert or update.
You could run a compact, but that is as bad as repair in terms of performance. It will, however, send those deleted spaces to a free list that can be used by documents of any size.
You could also use collMod with usePowerOf2Sizes: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes which may well fix your problem.
Ideally, you need to redo your application; you underestimated how in-place updates/inserts in MongoDB work, and now it is slaughtering you.
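To illustrate why power-of-2 allocation helps with reuse, here is a simplified sketch. The rounding function is an approximation for illustration only (the 32-byte minimum is an assumption, and the real allocator has more rules); powerOf2Command simply builds the collMod command document referred to above, and "events" in the note below is a hypothetical collection name.

```javascript
// Simplified model: round a record size up to the next power of two.
// The 32-byte floor is an assumption for this illustration.
function powerOf2Size(bytes) {
  let size = 32;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

// Build the collMod command document that enables usePowerOf2Sizes.
// The command name must be the first key in the document.
function powerOf2Command(collectionName) {
  return { collMod: collectionName, usePowerOf2Sizes: true };
}
```

Under this model a deleted 700-byte document and a new 1000-byte document both map to a 1024-byte slot, so the freed slot can be reused instead of forcing a new allocation. In the shell the command would be sent with db.runCommand(powerOf2Command("events")).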

Mongodb normal exit before applying a write lock

I am using Python, Scrapy, and MongoDB for my web scraping project. I scrape about 40GB of data daily. Is there a way, or a setting in the mongodb.conf file, to make MongoDB exit normally before it applies a write lock on the db due to a disk-full error?
I ask because I run into this disk-full error in MongoDB every time. Then I have to manually re-install MongoDB to remove the write lock from the db. I can't run the repair and compact commands on the database, because running these commands also requires free space.
MongoDB doesn't handle disk-full errors very well in certain cases, but you do not have to uninstall and then re-install MongoDB to remove the lock file. Instead, you can just remove the mongod.lock file. As long as you have journaling enabled, your data should be good. Of course, at that point, you still can't add more data to the MongoDB databases.
You probably wouldn't need repair, and compact only helps if you have actually deleted data from MongoDB. compact does not compress data, so it is only useful if you have indeed deleted data.
Constantly adding data and then deleting it later can cause fragmentation and leave lots of disk space unused. You can mostly prevent that by using the usePowerOf2Sizes option, which you can set on collections. compact also mitigates this by rewriting the database files, but as you said, you need free disk space for this. I would also advise you to add some monitoring that warns you when your data size reaches 50% of your total disk space. At that point, there is still plenty of time to use compact to reclaim unused space.
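The 50% rule of thumb above could be checked with something as small as the following sketch. diskUsageWarning is a hypothetical helper; where the two numbers come from is left open (for example, a size reported by db.stats() and the disk capacity reported by the operating system), and that choice is an assumption, not something the answer specifies.

```javascript
// Hypothetical check: warn once used space reaches a fraction of the disk.
// `threshold` defaults to 0.5, matching the 50% suggestion above.
function diskUsageWarning(usedBytes, totalBytes, threshold) {
  const limit = threshold === undefined ? 0.5 : threshold;
  if (!(totalBytes > 0)) {
    throw new Error("totalBytes must be a positive number");
  }
  const ratio = usedBytes / totalBytes;
  return { ratio: ratio, warn: ratio >= limit };
}
```

A scheduled job could call this periodically and send an alert when warn is true, while there is still enough free space left to run compact.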

How to reclaim deleted space without `db.repairDatabase()`?

I want to shrink the data files by reclaiming deleted space, but I can't run db.repairDatabase(), because there is not enough free disk space.
Update: With WiredTiger, compact does free space.
The original answer to this question is here:
Reducing MongoDB database file size
There really is nothing outside of repair that will reclaim space. Running compact should allow you to go much longer on the existing space. Otherwise, you will have to migrate to a bigger drive.
One way to do this is to use an off-line secondary from your Replica Set. This should give you a whole maintenance window to migrate, repair, move back and bring back up.
If you are not running a Replica Set, it's time to look at doing just that.
You could run the compact command on a single collection, or one by one in all the collections you want to shrink.
http://www.mongodb.org/display/DOCS/Compact+Command
db.runCommand( { compact : 'mycollectionname' } )
As noted in comments, I was mistaken, compact does not actually reclaim disk space, it only defragments and rebuilds collection indexes.
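Keeping the correction above in mind, running compact over several collections one by one could be sketched like this. compactAll is a hypothetical helper; it assumes the object passed in exposes runCommand the way the shell's db object does, and it only reports whether each command returned ok: 1.

```javascript
// Hypothetical helper: run the compact command on each named collection,
// one at a time, and collect a simple per-collection result.
// `database` is assumed to behave like the shell's `db` object.
function compactAll(database, collectionNames) {
  return collectionNames.map(function (name) {
    const result = database.runCommand({ compact: name });
    return { collection: name, ok: result.ok === 1 };
  });
}
```

In the mongo shell this might be called as compactAll(db, db.getCollectionNames()). Since compact blocks operations while it runs, it is probably safer to pass an explicit list of collections and run it outside peak hours.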
Instead, though, you could use the "--repairpath" option if you have another drive available with enough free space.
For example:
mongod --dbpath /data/db --repair --repairpath /data/db0
Shown here: http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/
You can also do a manual mongodump and mongorestore. That is basically the same as what repairDatabase does. That way you can dump to, and restore from, a different machine with sufficient disk space.
If you're running a replica set, you will want to issue a resync on each of your secondaries, one at a time. Once this has been completed, step down your primary and resync the newly assigned secondary.
To resync, stop your mongod instance, delete the locals and start the process back up. Watch the logs to ensure everything starts back up properly and the resync has initiated.
If you have a lot of data / indexes, ensure your oplog is large enough, otherwise it's likely to go stale.
There is one other option, if you are using a replica set, but with a lot of caveats. You can fail over to another set member, then delete the files on the now former primary and do a full resync. A full resync rewrites the files from scratch in a similar way to a repair, but you will also have to rebuild indexes. This is not to be done lightly.
If you go down this path, my recommendation would be to have a 3 member replica set before doing this for disk space reclamation, so that at any time when a member is syncing from scratch you have 2 set members fully functional.
If you do not have a replica set, I recommend creating one, with two secondaries. When you sync them initially, you will be creating nice unfragmented and unpadded versions of your data. More here:
http://www.mongodb.org/display/DOCS/Replica+Set+Configuration