How to control the size of journals in MongoDB?

How can I control the size of large journal files? Journal files take up a large amount of disk space. How can space be saved by using smaller files?

After doing some research:
Setting the smallfiles option for controlling journaling doesn't, on its own, control the size.
However, the relevant flag is --smallfiles.

Here are some ways to control journal file size:
Use the --smallfiles option in mongod, which caps each journal file at 128MB instead of 1GB.
ulimit is the Unix way to control system limits; it lets you set a maximum file size for the system. Consider this when you have sharding in place.
Reduce commitIntervalMs to a lower value, i.e. flush data to disk more frequently. Use this option only if you can tolerate periodic heavy I/O load. A combined configuration sketch follows this list.
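As a rough illustration, a minimal mongod YAML configuration combining both knobs might look like the following. This is a sketch only: it assumes the MMAPv1 storage engine, the path and the 50ms interval are placeholder values, and the smallFiles key moved under storage.mmapv1 in MongoDB 3.0.

    # mongod.conf - sketch, MMAPv1 assumed
    storage:
      dbPath: /data/db           # placeholder path
      smallFiles: true           # caps each journal file at 128MB (2.6 key;
                                 # on 3.0+ this is storage.mmapv1.smallFiles)
      journal:
        enabled: true
        commitIntervalMs: 50     # flush to the journal more often than the 100ms default

The command-line equivalent is mongod --smallfiles --journalCommitInterval 50.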

Related

MongoDB set maximum size limit on data files

I downloaded MongoDB onto my local desktop. Since I have other important stuff on this machine, I need to make sure the MongoDB data files never occupy more than 10GB of disk storage. If they exceed 10GB, I expect an error message when inserting new documents.
Is there a way to set this max size on disk via the config file?
As of MongoDB 4.0, there isn't a general configuration option to limit the maximum size on disk for a deployment.
However, there are several ways you could limit storage usage on your desktop:
Use a separate partition or storage volume for your MongoDB dbPath
Run MongoDB inside a virtual machine or container with a maximum storage allocation (see the container sketch after this list)
Connect to a hosted MongoDB deployment
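For the container route, one hedged possibility (not part of the original answer) is Docker's per-container storage quota. The --storage-opt size=... flag only works with storage drivers that support quotas, e.g. overlay2 on an xfs backing filesystem mounted with pquota:

    # Sketch: cap the container's writable layer at 10GB
    docker run -d --name mongo-capped --storage-opt size=10G mongo:4.0

A dedicated, fixed-size partition mounted at the dbPath achieves a similar cap for the data files themselves.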
In general it is a bad idea to let your database server run out of space, as this may result in unexpected errors or a shutdown, depending on which operations are trying to complete when space runs out. Typically you would want a process monitoring storage so you can proactively free some space before the issue becomes critical.
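As an illustration of such monitoring, a minimal cron-able shell check might look like this (the dbPath location, the 80% threshold, and the alerting are all assumptions):

    #!/bin/sh
    # Sketch: warn when the volume holding the dbPath exceeds 80% usage
    usage=$(df --output=pcent /data/db | tail -1 | tr -dc '0-9')
    if [ "$usage" -ge 80 ]; then
        echo "MongoDB volume at ${usage}% - free up space soon" >&2
    fi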
A few space saving tips that might help:
Rotate your MongoDB log files regularly to limit storage usage. If you are using a Unix/Linux system you can configure the logrotate utility to rotate and compress logs when they reach a target filesize and subsequently remove archived logs when they reach a certain age.
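A sketch of such a logrotate policy (paths and limits are placeholders; it assumes mongod is started with systemLog.logRotate: reopen so that SIGUSR1 makes it reopen the log file rather than rename it):

    # /etc/logrotate.d/mongod - sketch only
    /var/log/mongodb/mongod.log {
        size 100M        # rotate once the log reaches 100MB
        rotate 7         # keep at most 7 archived logs
        compress
        missingok
        notifempty
        postrotate
            kill -USR1 $(pidof mongod) 2>/dev/null || true
        endscript
    }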
Consider using TTL indexes to automatically remove old data from collections. This can be useful if you have collections with ephemeral data like user sessions that will become stale after an expiry date.
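For example (mongo shell; the collection and field names are placeholders), session documents could be expired one hour after their createdAt timestamp:

    // Sketch: the TTL monitor removes documents once createdAt
    // is more than 3600 seconds in the past
    db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })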
Drop unused indexes & collections. The WiredTiger storage engine (default in MongoDB 3.2+) allocates a file per collection and index, so dropping either of those should immediately free up storage space.
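Both are one-liners in the mongo shell (names here are placeholders):

    db.staging_import.drop()         // frees the collection's file on disk
    db.orders.dropIndex("status_1")  // frees that index's file on disk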

High level of fragmentation with MongoDB 2.2.1

On a legacy system running MongoDB 2.2.1 we are running out of disk space due to excessively large database files. Our actual data size is just under 3GB, with about 1.7GB of index size, but the storage size is over 70GB, so the storage to data+index ratio is close to a factor of 15. There are about 40 data files, most of which are at the 2GB maximum file size.
We are contemplating running compact() or repair() to regain some of the unused space, but we are worried about the problem recurring soon afterwards. It seems that the current configuration (pretty close to the default) is not suitable for our application's database usage pattern.
What other tools, diagnostics, remedies or configuration changes are available that could help MongoDB make better use of the disk space?
WiredTiger, used in MongoDB 3.0 and later, is much more efficient in terms of disk usage.
However, migrating from MongoDB 2.2 to 3.0 is going to be a huge leap.
Another option, assuming this is configured as a replica set, is to re-sync the secondary nodes individually and then perform a failover. This has the same effect as a repair, without the downtime that would result from using the repairDatabase command.
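A rough outline of the per-secondary resync (service names and paths are assumptions; do one member at a time so the replica set keeps a majority):

    # Sketch: force an initial sync on one secondary
    sudo service mongod stop     # stop the secondary
    rm -rf /data/db/*            # wipe its dbPath (double-check the path first!)
    sudo service mongod start    # on restart it performs a full initial sync
    # once all secondaries are done, run rs.stepDown() on the primary to fail over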

Creating an instance from AMI increases mongodb journal size to 3GB

I have an AWS instance running on one machine. It has all the data files, server setup, MongoDB database, etc. I created a new AMI image and then tried to launch an instance from this image.
On the new machine, as soon as it was created, the size of the MongoDB journal started to increase, from just 2.6MB on the original machine to 3.1GB on the new one. (When the machine starts and I ssh into it, I can see the size of the files increasing gradually; in about 10 minutes it reaches around 3.1GB and stops.)
I see from other answers that 3.1GB is some magic number for journal files. My question is: why was it small on my original machine, and why does it grow only after launching the new instance?
I don't see the 'smallFiles' setting enabled on either the old or the new machine, and there are no other changes. I have retried creating new images and new instances from those images multiple times.
Please let me know how to fix this issue. My total data file size is only around 195MB, and the original journal file size is around 2.6MB.
From the sounds of things, you are using MMAPv1 and so are just seeing the natural usage of the journal files, where each journal file should be up to 1GB in size. As stated in the docs, in normal conditions you should have up to 3 files and so up to 3GB of journal.
The fix for really small DBs that need to run on small VMs (like yours) is to enable the smallFiles setting. As already noted on SO, this should not be a problem.
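A minimal way to enable it (a sketch; the dbPath is a placeholder, and on pre-2.6 INI-style config files the key is written smallfiles = true):

    # On the command line:
    mongod --dbpath /data/db --smallfiles
    # Or in a legacy-format /etc/mongod.conf:
    #   smallfiles = true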
While you're at it, you might also want to check out this other answer when you switch: Setting smallfiles option for controlling journaling doesn't control the size

MongoDB Journal File Size - Relation to Data File Size

Is there a way to determine the journal file size based on a data file size?
For example, I've arrived at a data file size of 10 GB (approximately) based on data + index length considerations and preallocation.
I understand the journal is also preallocated (a new file after every 1GB). So, for a 10GB data file, is it safe to assume the journal will also be 10GB? Or is there another way to calculate it?
The MongoDB (MMAPv1) journal files are fixed-size 1GB files (unless you use the smallfiles option). There will be at most three 1GB journal files, so you will never have more than 3GB of journal.
http://docs.mongodb.org/manual/core/journaling/
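To check the actual footprint rather than estimating, you can inspect the journal directory under the dbPath directly (a sketch; the path and file names vary by setup):

    du -h /data/db/journal/
    # typical MMAPv1 output - a few ~1GB files, e.g.
    #   1.0G  /data/db/journal/j._0
    #   1.0G  /data/db/journal/prealloc.1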

MongoDB and disk space

We have a MongoDB cluster with 4 shards.
Our primary shard's disk has 700GB, and according to db.stats() that shard is using ~530GB.
When checking df -h, disk usage is at 99% (9.5GB free); I'm guessing this means all the rest is data files preallocated by Mongo.
I've run compact on a couple of collections, and the disk space was reduced to 3.5GB(?)
We're going to run a process that will generate ~140GB of extra data (35GB per shard).
Should we be concerned about running out of disk space?
Thanks in advance.
compact doesn't decrease disk usage at all; in fact it can even lead to additional file preallocation. To reduce disk usage you can use the repairDatabase command or start mongod with the --repair option. Note, however, that repair requires additional free space on disk.
The described situation can arise if you have performed a lot of document deletions, or operations that forced documents to move; in that case your database will be highly fragmented. The compact command reduces fragmentation so you have more room for new records but, again, it does not reclaim any space back to the OS.
Your best option is to work out why you have such a high level of fragmentation.
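For reference, gauging fragmentation and compacting a collection from the mongo shell looks like this (the collection name is a placeholder; on MMAPv1, compact blocks operations on the database, so schedule a maintenance window):

    db.stats(1024 * 1024)                      // compare dataSize vs storageSize, in MB
    db.runCommand({ compact: "mycollection" })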