High level of fragmentation with MongoDB 2.2.1 - mongodb

On a legacy system that is running MongoDB 2.2.1 we are running out of disk space due to excessively large database files. Our actual data size is just under 3 GB, with about 1.7 GB index size, but the storage size is over 70 GB. So, the storage to data+index ratio is close to factor 15. There are about 40 data files, most of which are at the 2 GB maximum file size.
We are contemplating to run a compact() or repair() to regain some of the unused space, but we are worried about the problem recurring soon after. It seems that the current configuration (pretty close to the default configuration) is not suitable for the database usage pattern of our application.
What other tools, diagnostics, remedies or configuration changes are available that could help MongoDB make better use of the disk space?

WiredTiger, used in MongoDB 3.0 and later, is much more efficient in terms of disk usage.
However, migrating from MongoDB 2.2 to 3.0 is going to be a huge leap.
Another option, assuming this is configured as a replica set, is to re-sync the Secondary nodes individually and then perform a failover. This will have the same affect as performing a repair without the downtime that would occur as a result of using the repairDatabase command.

Related

Mongodb terminates when it runs out of memory

I have the following configuration:
a host machine that runs three docker containers:
Mongodb
Redis
A program using the previous two containers to store data
Both Redis and Mongodb are used to store huge amounts of data. I know Redis needs to keep all its data in RAM and I am fine with this. Unfortunately, what happens is that Mongo starts taking up a lot of RAM and as soon as the host RAM is full (we're talking about 32GB here), either Mongo or Redis crashes.
I have read the following previous questions about this:
Limit MongoDB RAM Usage: apparently most RAM is used up by the WiredTiger cache
MongoDB limit memory: here apparently the problem was log data
Limit the RAM memory usage in MongoDB: here they suggest to limit mongo's memory so that it uses a smaller amount of memory for its cache/logs/data
MongoDB using too much memory: here they say it's WiredTiger caching system which tends to use as much RAM as possible to provide faster access. They also state it's completely okay to limit the WiredTiger cache size, since it handles I/O operations pretty efficiently
Is there any option to limit mongodb memory usage?: caching again, they also add MongoDB uses the LRU (Least Recently Used) cache algorithm to determine which "pages" to release, you will find some more information in these two questions
MongoDB index/RAM relationship: quote: MongoDB keeps what it can of the indexes in RAM. They'll be swaped out on an LRU basis. You'll often see documentation that suggests you should keep your "working set" in memory: if the portions of index you're actually accessing fit in memory, you'll be fine.
how to release the caching which is used by Mongodb?: same answer as in 5.
Now what I appear to understand from all these answers is that:
For faster access it would be better for Mongo to fit all indices in RAM. However, in my case, I am fine with indices partially residing on disk as I have a quite fast SSD.
RAM is mostly used for caching by Mongo.
Considering this, I was expecting Mongo to try and use as much RAM space as possible but being able to function also with few RAM space and fetching most things from disk. However, I limited Mongo Docker container's memory (to 8GB for instance), by using --memory and --memory-swap, but instead of fetching stuff from disk, Mongo just crashed as soon as it ran out of memory.
How can I force Mongo to use only the available memory and to fetch from disk everything that does not fit into memory?
Thanks to #AlexBlex's comment I solved my issue. Apparently the problem was that Docker limited the container's RAM to 8GB but the wiredTiger storage engine was still trying to use up 50% - 1GB of the total system RAM for it's cache (which in my case would have been 15 GB).
Capping wiredTiger's cache size by using this configuration option to a value less than what Docker was allocating solved the problem.

mongodb sudden spike in memory usage

There is a spike in the memory utilization of mongodb on our CentOS-07 server.It has 64 Gigabytes of RAM.This is a stand alone mongodb instance which doesn't have any application running on it and there are house keeping scripts enabled to keep only the relevant data .We haven't indexed the data .The total size of data on disk is 81 Gigabyte. This issue was not seen before we tried enabling replication,post which the the node set-up has been using high memory hence was disabled,we then brought up a fresh stand alone instance of mongo. The memory usage hasn't come down ever since we tried re-starting the mongo server but hasn't worked.Is there any reason for mongodb to use so much memory??Below is a link to the snap shot of the mem usage taken from the site server.
The mongo version is 2.6.5
Image link
This is not surprising. See the Memory Use section in the docs for the MMAPv1 storage engine (which is what MongoDB 2.6 uses):
With MMAPv1, MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.
It is also not surprising that the usage spiked after enabling replication, as it sounds like you had a fully populated database and then added a replica member. This would mean that the replica member would need to perform an initial sync of the data from that node, which would require a read of every document which would "prime" MongoDB's cache as a result.

Memory usage remains 99 % even after create index is done on MongoDB Collection

While doing indexing on MongoDB. Now we have nearly 350 GBs of data in the database and its deployed as a windows service in AWS EC2.
And we are doing indexing for some experimentation. But every time I run the indexing command the memory usage goes to 99% and even after the indexing is done the memory usage keeps like that until I restart the service.
The instance has 30 GB of RAM and SSD drive. And right now we have the DB setup as stand alone (not sharded till now). And we are using the latest version of MongoDB.
Any feedback related to this will be helpful.
Thanks,
Arpan
That's normal behavior for MongoDB.
MongoDB grabs all the RAM it can get to cache each accessed document as long as possible. When you add an index to a collection, each document needs to be read once to build the index, which causes MongoDB to load each document into RAM. It then keeps them in RAM in case you want to access them later. But MongoDB will not squat the RAM. When another process needs memory, MongoDB will willingly release it.
This is explained in the FAQ:
Does MongoDB require a lot of RAM?
Not necessarily. It’s certainly
possible to run MongoDB on a machine with a small amount of free RAM.
MongoDB automatically uses all free memory on the machine as its
cache. System resource monitors show that MongoDB uses a lot of
memory, but its usage is dynamic. If another process suddenly needs
half the server’s RAM, MongoDB will yield cached memory to the other
process.
Technically, the operating system’s virtual memory subsystem manages
MongoDB’s memory. This means that MongoDB will use as much free memory
as it can, swapping to disk as needed. Deployments with enough memory
to fit the application’s working data set in RAM will achieve the best
performance.
See also: FAQ: MongoDB Diagnostics for answers to additional questions
about MongoDB and Memory use.

Can't map file memory-mongo requires 64 bit build for larger datasets

I have a sharded cluster in 3 systems.
While inserting I get the error message:
cant map file memory-mongo requires 64 bit build for larger datasets
I know that 32 bit machine have a limit size of 2 gb.
I have two questions to ask.
The 2 gb limit is for 1 system, so the total data will be, 6gb as my sharding is done in 3 systems. So it would be only 2 gb or 6 gb?
While sharding is done properly, all the data are stored in single system in spite of distributing data in all the three sharded system?
Does Sharding play any role in increasing the datasize limit?
Does chunk size play any vital role in performance?
I would not recommend you do anything with 32bit MongoDB beyond running it on a development machine where you perhaps cannot run 64bit. Once you hit the limit the file becomes unuseable.
The documentation states "Use 64 bit for production. This is important as if you hit the mmap size limit (exact limit varies but less than 2GB) you will be unable to write to the database (analogous to a disk full condition)."
Sharding is all about scaling out your data set across multiple nodes so in answer to your question, yes you have increased the possible size of your data set. Remember though that namespaces and indexes also take up space.
You haven't specified where your mongos resides??? Where are you seeing the error from - a mongod or the mongos? I suspect that it's the mongod. I believe that you need to look at pre-splitting the chunks - http://docs.mongodb.org/manual/administration/sharding/#splitting-chunks.
which would seem to indicate that all your data is going to the one mongod.
If you have a mongos, what does sh.status() return? Are chunks spread across all mongod's?
For testing, I'd recommend a chunk size of 1mb. In production, it's best to stick with the default of 64mb unless you've some really important reason why you don't want the default and you really know what you are doing. If you have too small of a chunk size, then you will be performing splits far too often.

MongoDB scaling

How much a MongoDB can scale? I heard a talk about 32bit system have 2-4GB of space available or something like that? Can it save 32GB of data in a single Mongo database in a computer and support querying that 32GB of data from that database using a regular query?
How powerful is MongoDB anyway in terms of size? And when/if the sharding comes into play. I'm looking for a gigantic database as long as the disk permits using MongoDB? It would be funny if MongoDB supports 4GB per database. I'm looking towards 200GB of storage in 5 collections in 1 mongo database in 1 computer running Mongo.
It's true that a single instance of MongoDB on a 32-bit system supports up to 2Gb of data. This is due to the storage engine being directly built on top of memory mapped files which have a maximum addressable space of 2Gb.
That said, I'd say very few, if any, companies will actually run a production database on 32-bit hardware so it's hardly ever an issue. On 64-bit builds the theoretical maximum storage is 2^63, but that's obviously well beyond the size of any real world dataset.
So, on a single 64-bit system you can very easily run 200Gb of data. Whether or not you want to on a production environment is another question. If you only run a single instance there's no real fail-over available. With journaling enabled and safe writes (w >= 1) you should be relatively fine though.
You can have a look at this document about sharding and scaling limits:
http://www.mongodb.org/display/DOCS/Sharding+Limits
Scale Limits
Goal is support of systems of up to 1,000 shards. Testing so far has
been limited to clusters with a modest number of shards (e.g., 100).