MongoDB - Forcefully keeping index + working set in RAM

I am currently using MongoDB to store a single collection of data. This data is 50 GB in size and has about 95 GB of indexes. I am running a machine with 256 GB of RAM. I basically want to have all the indexes and the working set in RAM, since the machine is exclusively allocated to mongo.
Currently I see that, although mongo is running with a collection size of 50 GB plus an index size of 95 GB, the total RAM being used on the machine is less than 20 GB.
Is there a way to force mongo to leverage the available RAM so that it can store all its indexes and working set in memory?

When your mongod process starts, it has none of your data in resident memory. Data is then paged in as it is accessed. Given that your collection fits in memory (which is the case here), you can run the touch command on it. On Linux this will call the readahead system call to pull your data and indexes into the filesystem cache, making them available in memory to mongod. On Windows, mongod will read the first byte of each page, pulling it into memory.
If your collection+indexes do not fit into memory, only the tail end of data accessed during the touch will be available.
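As a rough illustration of the check and the command described above, here is a minimal Python sketch. The helper names and the collection name `mycollection` are made up for this example; only the shape of the `touch` command document (`touch`, `data`, `index`) comes from MongoDB itself.

```python
def fits_in_ram(data_gb, index_gb, ram_gb):
    """Return True if data plus indexes fit within the given RAM size."""
    return data_gb + index_gb <= ram_gb


def touch_command(collection, data=True, index=True):
    """Build the command document for the `touch` database command.

    The command name has to be the first key; Python dicts keep
    insertion order, so a plain dict is enough here.
    """
    return {"touch": collection, "data": data, "index": index}


# Figures from the question: 50 GB of data, 95 GB of indexes, 256 GB of RAM.
print(fits_in_ram(50, 95, 256))

# "mycollection" is a placeholder name, not taken from the question.
print(touch_command("mycollection"))
```

With a driver such as PyMongo, the resulting document could be passed to `db.command(...)`. This assumes a server version that still provides `touch`; as far as I know, the command was dropped together with MMAPv1 in MongoDB 4.2, so it is worth checking against your version first.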

Related

Understanding MongoDB storage size, logical data, vs database size, in MongoDB Atlas

We have a cluster with 100 GB storage, per the configuration for the cluster in mongodb atlas.
The overview page for the cluster shows that 43.3 GB out of a 100 GB max are used.
Since the cluster's configuration also has 100 GB storage selected, I am assuming the 100 GB of disk space is the same as the 100 GB of available storage?
When we click into our database, it shows the database size is 66.64 GB + 3.21 GB indexes, for a total size of about 70GB.
What is the difference between the 100 GB of available storage and disk, and the database size + index size of 70 GB? Should we be concerned that the 70 GB is approaching 100 GB, or is it only the 43.3 GB of disk usage that matters?
Edit: Since I posted this, MongoDB has removed database size and replaced it with both storage size and logical data size, which further complicates this. In most instances, the logical data size is 3-4x the storage size.
By default your MongoDB database uses the WiredTiger storage engine with snappy compression, which most probably means that your data takes up 43.3 GB on disk while the actual (uncompressed) data size is ~70 GB. There is nothing to worry about, since you have used only 43.3% of your 100 GB of storage. Of course, you need to monitor your data growth, and if it is increasing fast you may need to allocate more space.
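To make the arithmetic concrete, here is a small Python sketch using the figures from the question. The function name is made up for this example, and it assumes the 43.3 GB on disk corresponds to the whole ~70 GB of data plus indexes, which the question does not confirm.

```python
def storage_summary(logical_gb, on_disk_gb, capacity_gb):
    """Summarise compression ratio and disk headroom from three sizes in GB."""
    return {
        "compression_ratio": logical_gb / on_disk_gb,
        "disk_used_pct": 100.0 * on_disk_gb / capacity_gb,
        "free_gb": capacity_gb - on_disk_gb,
    }


# 66.64 GB database size + 3.21 GB indexes (uncompressed), 43.3 GB on disk,
# 100 GB of provisioned storage.
summary = storage_summary(logical_gb=66.64 + 3.21, on_disk_gb=43.3, capacity_gb=100)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The number to watch against the 100 GB limit is the on-disk figure, not the uncompressed one; with these inputs the compression ratio works out to roughly 1.6.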

Mongodb terminates when it runs out of memory

I have the following configuration:
a host machine that runs three docker containers:
Mongodb
Redis
A program using the previous two containers to store data
Both Redis and Mongodb are used to store huge amounts of data. I know Redis needs to keep all its data in RAM and I am fine with this. Unfortunately, what happens is that Mongo starts taking up a lot of RAM and as soon as the host RAM is full (we're talking about 32GB here), either Mongo or Redis crashes.
I have read the following previous questions about this:
1. Limit MongoDB RAM Usage: apparently most RAM is used up by the WiredTiger cache
2. MongoDB limit memory: here apparently the problem was log data
3. Limit the RAM memory usage in MongoDB: here they suggest limiting mongo's memory so that it uses a smaller amount of memory for its cache/logs/data
4. MongoDB using too much memory: here they say it's the WiredTiger caching system, which tends to use as much RAM as possible to provide faster access. They also state it's completely okay to limit the WiredTiger cache size, since it handles I/O operations pretty efficiently
5. Is there any option to limit mongodb memory usage?: caching again; they also add: "MongoDB uses the LRU (Least Recently Used) cache algorithm to determine which "pages" to release, you will find some more information in these two questions"
6. MongoDB index/RAM relationship: quote: "MongoDB keeps what it can of the indexes in RAM. They'll be swapped out on an LRU basis. You'll often see documentation that suggests you should keep your "working set" in memory: if the portions of index you're actually accessing fit in memory, you'll be fine."
7. how to release the caching which is used by Mongodb?: same answer as in 5.
Now what I appear to understand from all these answers is that:
For faster access it would be better for Mongo to fit all indices in RAM. However, in my case, I am fine with indices partially residing on disk as I have a quite fast SSD.
RAM is mostly used for caching by Mongo.
Considering this, I was expecting Mongo to try to use as much RAM as possible, while still being able to function with little RAM by fetching most things from disk. However, when I limited the Mongo Docker container's memory (to 8 GB, for instance) by using --memory and --memory-swap, Mongo did not fetch from disk; it just crashed as soon as it ran out of memory.
How can I force Mongo to use only the available memory and to fetch from disk everything that does not fit into memory?
Thanks to @AlexBlex's comment I solved my issue. Apparently the problem was that Docker limited the container's RAM to 8 GB, but the WiredTiger storage engine was still trying to use roughly 50% of the total system RAM minus 1 GB for its cache (which in my case would have been about 15 GB).
Capping WiredTiger's cache size (the `storage.wiredTiger.engineConfig.cacheSizeGB` setting, or `--wiredTigerCacheSizeGB` on the command line) to a value less than what Docker was allocating solved the problem.
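For illustration, here is a minimal Python sketch of the sizing logic. It assumes the default documented for recent MongoDB versions, the larger of 50% of (RAM minus 1 GB) and 256 MB; older versions used a different formula. The helper names are made up, and reusing the same formula for the container limit is just one possible way to pick a cap, not what the original poster necessarily did.

```python
def default_wiredtiger_cache_gb(ram_gb):
    """Assumed default cache size in GB: max(50% of (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (ram_gb - 1), 0.25)


def mongod_args(cache_gb):
    """Build a mongod argument list that caps the WiredTiger cache size."""
    return ["mongod", "--wiredTigerCacheSizeGB", str(cache_gb)]


host_ram_gb = 32        # host RAM from the question
container_limit_gb = 8  # Docker --memory limit from the question

# What WiredTiger would aim for if it sized its cache from the host RAM.
print(default_wiredtiger_cache_gb(host_ram_gb))

# One possible cap: apply the same formula to the container limit instead.
capped = default_wiredtiger_cache_gb(container_limit_gb)
print(mongod_args(capped))
```

Sized from the 32 GB host, the cache target is far above the 8 GB container limit, which matches the crash described above. Any cap comfortably below the container limit should avoid that, bearing in mind that mongod also needs memory outside the cache for connections, sorts and so on.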

MongoDB consumes all memory(RAM)

I use MongoDB 3.2 with the WiredTiger engine. I am doing batch inserts of 10K records at a time, with each record on the order of kilobytes in size. Everything works fine at first, but after 60-70 million such records the machine runs out of memory. Mongo's cache is limited to 3 GB, but the memory is consumed by the memory-mapped files of the collections and indexes. After some time Mongo's CPU load reaches 100% and it stops accepting data. OS: Windows 7. What am I doing wrong? :)

mongodb architect design with hardware specification

Currently we have one replica set of 3 members with 25 GB of data. Normal CPU usage is 1.5 on both secondaries and 0.5 on the primary (reads happen on the secondary instances only), and normally 1200 users hit our website. We now plan to increase the traffic to our website and expect about 5000 concurrent users. Can you please suggest how many instances we need to add to the replica set?
Current infra in our replica set:
1. Primary instance
CPUs: 16
RAM: 32 GB
HDD: 100 GB
2. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
3. Secondary instance
CPUs: 8
RAM: 16 GB
HDD: 100 GB
Assuming your application scales linearly with the number of users, CPU capacity should not be a problem (whether it actually does, only you can tell; we don't know what your application does).
The question is: how much do you expect your data to grow? When you currently have 25 GB of data and 16 GB of ram, 64% of your data fits into RAM. That likely means that many queries can be served directly from the RAM cache without hitting the hard drives. These queries are usually very fast. But when your working set increases further beyond the size of your RAM, you might experience some increased latency when accessing the data which now needs to be read from the hard drives (it depends, though: when your application interacts primarily with recent data and rarely with older data, you might not even notice much of a difference).
The solution to this is obvious: get more RAM. Should this not be an option (for example because the server reached the maximum RAM capacity the hardware allows), your next option is building a sharded cluster where each shard is responsible for serving an interval of your data.
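The back-of-the-envelope numbers above can be sketched in Python as follows. The function names are made up, and the code assumes that the "CPU usage" of 1.5 quoted in the question is a load average and that load grows linearly with the number of users, neither of which is confirmed.

```python
def ram_coverage_pct(data_gb, ram_gb):
    """Percentage of the data set that could fit in RAM, capped at 100."""
    return min(100.0, 100.0 * ram_gb / data_gb)


def projected_load(current_load, current_users, target_users):
    """Scale a load figure linearly with the number of users."""
    return current_load * target_users / current_users


# Secondary instances from the question: 25 GB of data, 16 GB of RAM, 8 CPUs.
print(ram_coverage_pct(25, 16))

# Load of 1.5 at 1200 users, projected to 5000 users.
print(projected_load(1.5, 1200, 5000))
```

Under these assumptions the projected load on a secondary stays below its 8 CPUs, which is consistent with CPU not being the first bottleneck. The RAM coverage figure only gets worse as the data grows, so that is the number to keep an eye on.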

Memory usage remains 99 % even after create index is done on MongoDB Collection

We are building indexes on MongoDB. We currently have nearly 350 GB of data in the database, and it is deployed as a Windows service on AWS EC2.
We are doing the indexing for some experimentation. But every time I run the indexing command, the memory usage goes to 99%, and even after the indexing is done the memory usage stays there until I restart the service.
The instance has 30 GB of RAM and an SSD drive. Right now we have the DB set up as standalone (not sharded so far), and we are using the latest version of MongoDB.
Any feedback related to this will be helpful.
Thanks,
Arpan
That's normal behavior for MongoDB.
MongoDB grabs all the RAM it can get to cache each accessed document for as long as possible. When you add an index to a collection, each document needs to be read once to build the index, which causes MongoDB to load each document into RAM. It then keeps them in RAM in case you want to access them later. But MongoDB will not squat on the RAM. When another process needs memory, MongoDB will willingly release it.
This is explained in the FAQ:
Does MongoDB require a lot of RAM?
Not necessarily. It’s certainly possible to run MongoDB on a machine with a small amount of free RAM. MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.
Technically, the operating system’s virtual memory subsystem manages MongoDB’s memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application’s working data set in RAM will achieve the best performance.
See also: FAQ: MongoDB Diagnostics for answers to additional questions about MongoDB and Memory use.