I am planning to test a MongoDB cluster with some random data to test for performance. Then, I am planning to delete the data and use it for production.
My concern is that running just db.dropDatabase() may not reclaim all the disk space on all the shards and config servers. This answer from Stack Overflow says that "MongoDB does not recover disk space when actually data size drops due to data deletion along with other causes."
This documentation kind of says that "You do not need to reclaim disk space for MongoDB to reuse freed space. See Empty records for information on reuse of freed space" but I want to know the proper steps to delete a sharded MongoDB database.
I am currently on MongoDB 3.6.2.
Note: To people who may say I need a different MongoDB database for testing and production, I want to make it clear that the production deployment is itself a test to replace another, older database. So right now, I am not looking for another big cluster just to test for performance.
I think the best solution is described in the article below. I could explain it myself, but that would only waste your time and mine.
https://dzone.com/articles/reclaiming-disk-space-from-mongodb
I have set up a free-tier MongoDB Atlas database and have a script that is storing tweets in it. db.collection.stats() reports a storage size of 32768, which will fill up quite fast. Firstly, what happens when you exceed this limit? Are new entries rejected, or does something else happen? Secondly, is there a way to deal with this without upgrading? For example, is it possible to clear entries before exceeding capacity?
When you exceed the limit, the Atlas cluster node that has exceeded it will become unavailable. It is possible that the entire cluster will go down, in which case you will need to contact MongoDB support to bring the cluster back up.
The best option is to upgrade to the next tier to get more storage capacity. If you don't want to do that, you can write a script that deletes old data from your cluster and, after deleting the data, runs the compact command to reclaim the storage.
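The delete-then-compact script mentioned above could look roughly like the following Python sketch. It assumes a pymongo-style database object (`delete_many` on a collection, `command` on the database) and a hypothetical `created_at` timestamp field; neither the field name nor any particular retention period comes from the answer. Whether compact is permitted at all depends on the deployment and tier, so the call is optional here.

```python
from datetime import datetime, timedelta, timezone


def purge_old_documents(db, collection_name, max_age_days, now=None,
                        run_compact=True, date_field="created_at"):
    """Delete documents older than max_age_days, then optionally compact.

    `db` is expected to behave like a pymongo Database. `date_field` is a
    hypothetical field name and must match your own schema.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    # Remove every document whose timestamp is older than the cutoff.
    result = db[collection_name].delete_many({date_field: {"$lt": cutoff}})

    if run_compact and result.deleted_count > 0:
        # compact can block other operations while it runs, depending on
        # the server version, so schedule this for a quiet period.
        db.command("compact", collection_name)

    return result.deleted_count
```

With a real connection this might be called as `purge_old_documents(client["mydb"], "tweets", 30)`, where the database and collection names are placeholders.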
I am using a replica set (2 MongoDB nodes, 1 arbiter) for my Sitecore CD servers.
Assuming all MongoDB data gets flushed to the Reporting SQL DB, do we need to back up the MongoDB database on the production CD servers?
If yes, what is the best approach and frequency, considering that my application makes moderate use of the analytics features (Personalization, Campaigns, etc.)?
Unfortunately, your assumption is bad - the MongoDB is the definitive source of analytic data, not the reporting db. The reporting db contains only the aggregate info needed for generating the report (mostly). In fact, if (when) something goes wrong with the SQL DB, the idea is that it is rebuilt from the source MongoDB. Remember: You can't un-add two numbers after you've added them!
Backup vs Replication
A backup is a point-in-time view of the database, where replication is multiple active copies of a current database. I would advocate for replication over backup for this type of data. Why? Glad you asked!
Currency - under what circumstance would you want to restore a 50GB MongoDB? What if it was a week old? What if it was a month? Really the only useful data is current data, and websites are volatile places - log data backups are out of date within an hour. If you personalise on stale data is that providing a good user experience?
Cost - backing up large datasets is costly in terms of time, storage capacity and compute requirements; they are also a pain to restore and the bigger they are the more likely there's a corruption somewhere
Run of business
In a production MongoDB environment you really should have 2-3 replicas. That's going to save your arse if one of the boxes dies, which they sometimes do - MongoDB works the disks very hard.
These replicas are self-healing and always current (pretty much), so they are much better than taking backups. The chance that you lose all your replicas at once is really low except for one particular edge case... upgrades. So a backup is really only protection against hardware failure or data corruption which, in a multi-instance replica set, is already very effectively handled. Unless you're paranoid, you're never going to use that backup and it'll cost you plenty to have it.
Sitecore Upgrades
This is the killer edge-case - always make backups (see Back Up and Restore with MongoDB Tools) before running an upgrade because you can corrupt all of your replicas in one motion and you'll want to be able to roll back.
Data Trimming (side-note)
You didn't ask this, but at some point you'll be thinking "how the heck can I back up this 170GB monster db every day? this is ridiculous" - and you'll be right.
There are various schools of thought around how long this data should be persisted for - that's a question only you or your client can answer. I suggest keeping it until there's too much, then make a decision on how much you have to get rid of. Keep as much as you can tolerate.
I am familiar both with the MongoDB repairDatabase and compact commands, but these both seem to lock the database and/or collection. Is there another way to reclaim deleted disk space without essentially shutting down the database? What are best practices in this area? Thanks!
Best practice would probably depend on your schema and what your application does. Here's my use case; perhaps you can learn something from it. My application stores very large amounts of time-stamped data samples. Deleting data from a very large store is a very expensive operation, and it gets more complicated when you try to do it on live systems. MongoDB had several issues in the past with returning disk space to the OS and we had to dance around this; I'm not sure how well it works now. But what we did solved everything for good: we partitioned the data in such a way that we could dispose of old data by simply dropping an entire database. Dropping a MongoDB database is a very cheap and efficient operation, almost instantaneous even when you drop a TB. Note that dropping a collection is not as effective as dropping a database; this was actually the key to the solution. To do this we had to redesign the schema. Your case could of course be different, but the lesson learned is that deleting data from large storage is very expensive.
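As a rough illustration of the partitioning idea above, the Python sketch below maps each sample's date to a per-month database name and works out which of those databases have aged out. The `samples` prefix, the monthly granularity, and both function names are assumptions made for the example; the answer does not say how its data was actually partitioned.

```python
from datetime import date


def database_for(day, prefix="samples"):
    """Name of the database holding data for the month containing `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def expired_databases(database_names, today, keep_months, prefix="samples"):
    """Return partition databases older than the retention window.

    keep_months=2 keeps the current month and the one before it.
    """
    # Count months from year 0 so month arithmetic is a plain subtraction.
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in database_names:
        if not name.startswith(prefix + "_"):
            continue  # not one of our partitions
        try:
            year, month = (int(p) for p in name[len(prefix) + 1:].split("_"))
        except ValueError:
            continue  # unexpected name, leave it alone
        if current - (year * 12 + (month - 1)) >= keep_months:
            expired.append(name)
    return sorted(expired)
```

Each name returned by `expired_databases` could then be passed to pymongo's `MongoClient.drop_database`, which removes the whole partition in one cheap operation instead of deleting documents one by one.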
The best method currently is to run a Master Slave Setup.
Shut down one mongod instance and let it resync.
More details here: Reducing MongoDB database file size
I am using Python, Scrapy, and MongoDB for my web scraping project, and I scrape about 40 GB of data daily. Is there a setting in the mongodb.conf file that makes MongoDB exit normally before it applies a write lock on the database because of a disk-full error?
I ask because I run into this disk-full error in MongoDB every time. I then have to manually re-install MongoDB to remove the write lock from the database. I can't run the repair or compact commands on the database, because running them also requires free space.
MongoDB doesn't handle disk-full errors very well in certain cases, but you do not have to uninstall and then re-install MongoDB to remove the lock file. Instead, you can just remove the mongod.lock file. As long as you have journalling enabled, your data should be good. Of course, at that moment, you can't add more data to the MongoDB databases.
You probably wouldn't need repair, and compact only helps if you have actually deleted data from MongoDB. compact does not compress data, so it is only useful if you have indeed deleted data.
Constantly adding data and then deleting it later can cause fragmentation and leave lots of disk space unused. You can mostly prevent that by using the usePowerOf2Sizes option, which you can set on collections. compact mitigates this as well by rewriting the database files, but as you said, you need free disk space for this. I would advise you to also add some monitoring to warn you when your data size reaches 50% of your full disk space. In that case, there is still plenty of time to use compact to reclaim unused space.
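A minimal sketch of such a monitoring check in Python, using only the standard library. It measures how full the whole partition is, which is only an approximation of MongoDB's own data size, and the function names are made up for the example. The 0.5 default mirrors the 50% suggestion above.

```python
import shutil


def disk_usage_fraction(path):
    """Fraction (0.0 to 1.0) of the partition containing `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_warn(used_fraction, threshold=0.5):
    """True once usage reaches the threshold, i.e. it is time to act."""
    return used_fraction >= threshold
```

A cron job or similar scheduler could call `should_warn(disk_usage_fraction(dbpath))` with the mongod data directory as `dbpath` and send an alert when it returns True.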
Possible Duplicate:
Auto compact the deleted space in mongodb?
My understanding is that on delete operations MongoDB won't free up the disk space but would reuse it as needed.
Is that correct?
If not, would I have to run a repair command?
Could the repair be run on a live mongo instance?
Yes, that is correct.
No. It is better to give MongoDB as much disk space as possible (the more space MongoDB can allocate, the less disk fragmentation you will have; in addition, allocating space is an expensive operation). But if you wish, you can run db.repairDatabase() from the mongo shell to shrink the database size.
Yes, you can run repairDatabase on a live MongoDB instance (it is better to run it during off-peak hours).
This is somewhat of a duplicate of this MongoDB question ...
Auto compact the deleted space in mongodb?
See that answer for details on how to ...
Reclaim some space
Use server-side JS to run a recurring job to get back space (including a script you can run ...)
How you might want to look into Capped Collections for some use cases!
Also you can see this related blog posting: http://learnmongo.com/posts/compacting-mongodb-data-files/
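For the capped collection suggestion, here is a small Python sketch. A capped collection has a fixed maximum size and overwrites its oldest documents once full, so it never grows past that size. The helper name is hypothetical; it assumes a pymongo-style database object whose `create_collection` accepts the `capped`, `size`, and `max` options.

```python
def create_capped_collection(db, name, size_bytes, max_documents=None):
    """Create a fixed-size collection that discards its oldest documents.

    `db` is expected to behave like a pymongo Database.
    """
    options = {"capped": True, "size": size_bytes}
    if max_documents is not None:
        # Optional cap on document count; the byte size limit still applies.
        options["max"] = max_documents
    return db.create_collection(name, **options)
```

This only suits data where losing the oldest entries is acceptable, such as logs or recent-activity feeds.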
I have another solution that might work better than doing db.repairDatabase() if you can't afford for the system to be locked, or don't have double the storage.
You must be using a replica set.
My thought is once you've removed all of the excess data that's gobbling your disk, stop a secondary replica, wipe its data directory, start it up and let it resynchronize with the master. Repeat with the other secondaries, one at a time.
On the master, do an rs.stepDown() to hand over MASTER to one of the synched secondaries, now stop this one, wipe it, and let it resync.
The process is time consuming, but it should only cost a few seconds of down time, when you do the rs.stepDown().
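The order of operations described above can be written down as a small Python sketch that only builds the checklist and does not touch any server. The host names are placeholders, and the function is an illustration of the sequence rather than a tested automation.

```python
def rolling_resync_plan(primary, secondaries):
    """Return the ordered steps for resyncing every member one at a time."""
    if not secondaries:
        raise ValueError("a replica set with at least one secondary is required")

    def resync(host):
        return [
            f"stop mongod on {host}",
            f"wipe the data directory on {host}",
            f"start mongod on {host} and wait for its resync to finish",
        ]

    steps = []
    # Secondaries first, one at a time, so the set keeps serving traffic.
    for host in secondaries:
        steps.extend(resync(host))
    # Only then hand over the primary role and resync the old primary.
    steps.append(f"run rs.stepDown() on {primary}")
    steps.extend(resync(primary))
    return steps
```

Printing the result of `rolling_resync_plan("db1", ["db2", "db3"])` gives a ten-step checklist in which the only step that interrupts writes is the rs.stepDown() call.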