I'm using pymongo to insert a large number of JSON documents into MongoDB GridFS, plus some data into a regular collection.
Some time ago I noticed that MongoDB consumes a huge amount of RAM while a single connection stays open. As soon as I close that connection, the memory is released.
RAM consumption is about 10-12 GB in total while the connection is open and about 200 MB without it. The collection itself is only ~300 MB, with 10-18 GB of GridFS storage.
Why does this happen? How can opening a new connection for each bulk operation be so much less resource-intensive than using one single connection for everything?
Is it somehow related to Journaling?
I will break this problem down into several smaller points for ease of understanding:
It is well known that MongoDB is RAM hungry; it will try to use as much RAM as possible.
GridFS stores file contents in the fs.chunks collection and the corresponding metadata in fs.files. Files stored in GridFS are split into chunks of 256 KB each by default (255 KB in newer versions).
When you read GridFS data over an open connection, the chunks belonging to the file(s) have to be loaded from disk into RAM (if they are not already there). So RAM usage is directly proportional to the amount of data stored and, importantly, to how often the GridFS data is accessed. To reiterate: GridFS data gets pulled into RAM if a query references it.
If you keep an active connection open for large amounts of GridFS data, you should expect heavy RAM usage. If your query frequency is low (mostly writes, rare reads), RAM usage will be relatively lower. If you are mostly writing data, make sure the connection is closed after the operation is done.
The more connections you keep open, the higher your RAM usage will be.
This is in no way related to journaling.
Note: GridFS also supports sharding, which may help with your problem of excessive RAM usage.
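To put the chunking described above into numbers, here is a minimal sketch (plain Python, no driver needed) that counts how many fs.chunks documents a file occupies. The function name and the 10 GB example are my own illustration. The 256 KB default is the figure quoted above; the chunk size is configurable (newer drivers default to 255 KB), so it is a parameter here:

```python
# Chunk size quoted in this answer, in bytes. This is an assumption you can
# override: the chunk size is configurable per file.
DEFAULT_CHUNK_SIZE = 256 * 1024


def gridfs_chunk_count(file_size_bytes: int,
                       chunk_size_bytes: int = DEFAULT_CHUNK_SIZE) -> int:
    """Return how many chunk documents a file of the given size is split into."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must not be negative")
    if chunk_size_bytes <= 0:
        raise ValueError("chunk_size_bytes must be positive")
    # Integer ceiling division: a partially filled last chunk still counts.
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


# Hypothetical example: 10 GB of GridFS data, as in the question.
chunks = gridfs_chunk_count(10 * 1024**3)
print(chunks)  # 40960
```

Every one of those chunks is an ordinary document, so reading the files back touches that many documents in fs.chunks.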
Hope this clarifies.
Since MongoDB 2.0, each connection consumes about 1 MB of RAM.
You can read more here.
Related
I have a MongoDB 4.4 server (single node, no replicas) for storing IoT-style data. Data is written to several collections every few seconds by my NodeJS app. Documents are not updated/modified, and reads are less common than writes.
I have TTL indices on my big collections so that data older than 6 months is deleted. However, Mongo seems to consume more and more disk space. When the disk inevitably fills up, Mongo and my app stop working. I need to stop Mongo from consuming increasing amounts of disk space.
If I call stats() on my big collections, I can see that there are gigabytes of "file bytes available for reuse". But when I use db.runCommand({compact:'big_collection'}), it doesn't seem to release any space. Other people seem to have similar experiences. I wish I understood why compact isn't working.
I suspect the best alternative approach is to remove the TTL index and then cap the collection to a fixed size, but I'd like to hear if anyone has experience with such a process, or alternative recommendations.
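To quantify the "file bytes available for reuse" figure mentioned above, here is a minimal sketch that computes what fraction of a collection's data file is reusable. It assumes the stats output has a wiredTiger section with a "block-manager" subsection containing "file size in bytes" and "file bytes available for reuse"; check those key names against your own stats() output. The function name and the sample numbers are hypothetical:

```python
def reusable_fraction(coll_stats: dict) -> float:
    """Fraction of the collection's data file that is marked as reusable."""
    block_manager = coll_stats["wiredTiger"]["block-manager"]
    file_size = block_manager["file size in bytes"]
    reusable = block_manager["file bytes available for reuse"]
    if file_size <= 0:
        # Avoid dividing by zero for an empty or missing data file.
        return 0.0
    return reusable / file_size


# Hypothetical numbers, shaped like the relevant part of the stats output.
sample_stats = {
    "wiredTiger": {
        "block-manager": {
            "file size in bytes": 8 * 1024**3,
            "file bytes available for reuse": 2 * 1024**3,
        }
    }
}
print(f"{reusable_fraction(sample_stats):.0%} of the data file is reusable")
```

With pymongo, a dictionary of this shape can be obtained with db.command("collStats", "big_collection"). Tracking this fraction before and after running compact shows whether any space was actually handed back.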
As far as I know, MongoDB caches the working set in RAM.
So if I increase wiredTigerCacheSizeGB until it is as large as all of the data on disk, will it work as fast as an in-memory DB?
If not, what is the difference?
See In-Memory Storage Engine and WiredTiger Storage Engine
(In-memory) By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.
Keep in mind that wiredTigerCacheSizeGB is limited to 10000 GB. To get the most performance out of WiredTiger, you should also disable journaling and set storage.syncPeriodSecs to 0. Even then, WiredTiger still has to create at least the WiredTiger.wt and WiredTiger.turtle files on disk.
PS. I think this link might answer your question
I cannot answer all your questions.
A cache reads data from disk and keeps it in the RAM. When you access such data again then you read it from RAM instead of reading it again from disk - which would be much slower.
So, a cache is useless if you have to read the data only once. Some applications anticipate the data you may read in future and put it into the cache in advance.
The MongoDB in-memory DB keeps all data in RAM only; it does not read or write anything from disk, apart from some logging data. When you stop an in-memory MongoDB process, all data is lost.
The WiredTiger storage engine is the engine (and data format) MongoDB uses to store data persistently on disk.
If you set wiredTigerCacheSizeGB high enough to hold all of your data, then all of your reads will be satisfied from the cache. Writes will update the cache and also be written to storage.
If you use the in-memory configuration then all of your reads will be satisfied from memory. Writes will only go to memory and will not be stored on disk.
So if your workload is mostly reads, then the large cache will behave similarly to an in-memory DB. If your workload has a lot of writes, then the large cache configuration may be slower because it needs to write to disk.
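As a rough illustration of "high enough to hold all of your data", here is a small sketch. It relies on two things not stated above: the default WiredTiger cache size is the larger of 50% of (RAM - 1 GB) and 256 MB, and data in the cache is held uncompressed, so the size to compare against is the uncompressed data size, not the (usually smaller) compressed size on disk. The function names and example figures are my own:

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Default WiredTiger cache size: max(50% of (RAM - 1 GB), 0.25 GB)."""
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


def fits_in_cache(uncompressed_data_gb: float,
                  index_gb: float,
                  cache_gb: float) -> bool:
    """Rough check: can all data and indexes stay resident in the cache?

    This is a simplification; it ignores cache overhead and dirty-page limits.
    """
    return uncompressed_data_gb + index_gb <= cache_gb


# Hypothetical server with 30 GB of RAM.
cache_gb = default_wiredtiger_cache_gb(30)
print(cache_gb)                         # 14.5
print(fits_in_cache(10, 2, cache_gb))   # True
print(fits_in_cache(20, 2, cache_gb))   # False
```

Even when the check passes, writes still go to disk, which is the difference from the in-memory engine described above.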
Also, the in-memory DB will not preserve your data in the event of a crash, since it only holds data in memory.
Say I have a single collection in MongoDB with only one index, and I need that index for the entire life cycle of the application using the collection.
I would like to know how MongoDB behaves here.
Once the index is loaded into memory, will MongoDB keep it in RAM?
Thanks
The first thing MongoDB will knock out of RAM will be the LRU (least recently used) piece of data. So if you only have one index, chances are it will continue to be used pretty regularly and it should stay in memory.
Source
Unfortunately, you cannot currently pin a collection or index in memory. MongoDB (with the older MMAPv1 storage engine) uses memory-mapped files to load collections and indexes into memory. As your activities touch various pieces of your database through queries, updates, insertions and deletions, that data gets loaded into memory. This is referred to as the working set. If the total memory required to load the working set is less than the available memory, no problem.
If not, MongoDB is going to use an LRU algorithm to pick what to unload from memory. This is why it's so important to understand the concept of the working set and how it relates to your available memory.
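To see why a frequently used index tends to survive LRU eviction, here is a toy simulation. It is only a sketch of the LRU idea, not how MongoDB or the operating system implements it; the class name and the page names are made up:

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy cache holding at most `capacity` pages, evicting the least recently used."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are ordered from least recently used to most recently used.
        self._pages = OrderedDict()

    def touch(self, page_id: str) -> bool:
        """Access a page. Return True on a hit, False if it had to be loaded."""
        if page_id in self._pages:
            self._pages.move_to_end(page_id)  # mark as most recently used
            return True
        self._pages[page_id] = True
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return False

    def resident(self) -> list:
        """Pages currently in memory, least recently used first."""
        return list(self._pages)


# The index is touched between every document access, so it is never the
# least recently used page, while older documents get evicted.
cache = LRUPageCache(capacity=3)
for page in ["index", "doc1", "index", "doc2", "index", "doc3", "index", "doc4"]:
    cache.touch(page)
print(cache.resident())  # ['doc3', 'index', 'doc4']
```

If the index were touched only rarely, it would drift to the front of the ordering and be evicted like any other page.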
This writeup from the documentation should be helpful:
How do I calculate how much RAM I need for my application?
The amount of RAM you need depends on several factors, including but
not limited to:
The relationship between database storage and working set.
The operating system’s cache strategy for LRU (Least Recently Used)
The impact of journaling
The number or rate of page faults and other MMS gauges to detect when you need more RAM
Each database connection thread will need up to 1 MB of RAM. MongoDB
defers to the operating system when loading data into memory from
disk. It simply memory maps all its data files and relies on the
operating system to cache data. The OS typically evicts the
least-recently-used data from RAM when it runs low on memory. For
example if clients access indexes more frequently than documents, then
indexes will more likely stay in RAM, but it depends on your
particular usage.
To calculate how much RAM you need, you must calculate your working
set size, or the portion of your data that clients use most often.
This depends on your access patterns, what indexes you have, and the
size of your documents. Because MongoDB uses a thread per connection
model, each database connection also will need up to 1MB of RAM,
whether active or idle.
If page faults are infrequent, your working set fits in RAM. If fault
rates rise higher than that, you risk performance degradation. This is
less critical with SSD drives than with spinning disks.
http://docs.mongodb.org/manual/faq/diagnostics/
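The arithmetic in the quoted passage can be sketched as follows. This is a back-of-the-envelope estimate only: it adds up the working set and the "up to 1 MB" per connection quoted above, and ignores journaling and operating system overhead. The function names and example figures are hypothetical:

```python
def estimate_ram_mb(working_set_mb: float,
                    connections: int,
                    per_connection_mb: float = 1.0) -> float:
    """Working set plus per-connection overhead (up to 1 MB each, per the FAQ)."""
    if connections < 0:
        raise ValueError("connections must not be negative")
    return working_set_mb + connections * per_connection_mb


def working_set_fits(working_set_mb: float,
                     connections: int,
                     available_ram_mb: float) -> bool:
    """True if the estimate stays within the RAM available to the database."""
    return estimate_ram_mb(working_set_mb, connections) <= available_ram_mb


# Hypothetical example: 4 GB working set and 200 open connections.
print(estimate_ram_mb(4096, 200))         # 4296.0
print(working_set_fits(4096, 200, 8192))  # True
print(working_set_fits(4096, 200, 4200))  # False
```

Note how small the connection term is compared with the working set: a few hundred connections add a few hundred megabytes at most.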
You can use the serverStatus command to get an estimate of your current working set (the workingSet section exists only in MongoDB 2.4 and 2.6; it was removed in 3.0):
db.runCommand( { serverStatus: 1, workingSet: 1 } )
I have a problem while building indexes on MongoDB. We now have nearly 350 GB of data in the database, and it is deployed as a Windows service on AWS EC2.
We are building indexes for some experimentation. Every time I run the indexing command, memory usage goes to 99%, and even after the indexing is done it stays there until I restart the service.
The instance has 30 GB of RAM and an SSD drive. Right now the DB is set up as a standalone instance (not sharded so far), and we are using the latest version of MongoDB.
Any feedback related to this will be helpful.
Thanks,
Arpan
That's normal behavior for MongoDB.
MongoDB grabs all the RAM it can get in order to cache each accessed document for as long as possible. When you add an index to a collection, each document needs to be read once to build the index, which causes MongoDB to load each document into RAM. It then keeps them in RAM in case you want to access them later. But MongoDB will not squat on the RAM. When another process needs memory, MongoDB will willingly release it.
This is explained in the FAQ:
Does MongoDB require a lot of RAM?
Not necessarily. It’s certainly
possible to run MongoDB on a machine with a small amount of free RAM.
MongoDB automatically uses all free memory on the machine as its
cache. System resource monitors show that MongoDB uses a lot of
memory, but its usage is dynamic. If another process suddenly needs
half the server’s RAM, MongoDB will yield cached memory to the other
process.
Technically, the operating system’s virtual memory subsystem manages
MongoDB’s memory. This means that MongoDB will use as much free memory
as it can, swapping to disk as needed. Deployments with enough memory
to fit the application’s working data set in RAM will achieve the best
performance.
See also: FAQ: MongoDB Diagnostics for answers to additional questions
about MongoDB and Memory use.
I'm using MongoDB on a cloud server with 10 GB of storage and 1 GB of RAM. After importing about 4.4 GB of data into a MongoDB database, the server freezes whenever I type "mongo" on the command line to test some queries.
Is there a cap on the memory resource allocation to MongoDB that I can remove? Or is it simply a matter of increasing RAM?
MongoDB uses memory-mapped files, which are allocated by the OS. This means that there is no specific resource that you can free up to make more room for a Mongo console to run.
There are a couple of things to note about your environment. Firstly, the amount of RAM you have is on the small side for the amount of data you have loaded. MongoDB is going to try to keep as much of the working set in memory as it can, to avoid page faults, as disk seeks are a real killer for performance. Secondly, there will be some initial work going on when the data is loaded, which could affect performance.
You can check out the Wiki page Checking Server Memory Usage for information on how much memory Mongo is using up, and general information on the Memory Usage of Mongo.
Can you try connecting to mongod from another machine, so as to remove this burden from the DB server?