MongoDB consumes a lot of memory - mongodb

I have been at war with MongoDB for more than a month, and so far I am losing =] ...
Battle 1. Battle 2.
And now there is a new problem: once again, not enough memory.
Initially I solved this by simply moving to a VPS plan with more memory, then by setting journal = false. But now I have reached the top plan, so adding more memory is no longer possible.
4 GB of memory is not enough for my database.
When I was choosing a database for the project, nowhere was it written that MongoDB needs so much memory. With about 10 million records, 4 GB of memory is not enough for MongoDB, while my MySQL database with 10 million records copes easily with 1.4 GB of memory.
As I understand it, the problem is the large number of indexed fields. But since I cannot get into the database, I cannot remove the indexes. I needed them in the early stages of development; now they are not important to me.
Please tell me, can I remove them somehow?
I have a complete copy of the database: the whole database/data/db folder.
The database does not start on my PC with 4 GB of memory, and it is the same on a VPS with 4 GB.
As an alternative, I am thinking of taking a trial period at some VPS/VDS provider to run mongo and delete the indexes.
Do you know a hosting provider with a trial period and 6 GB of memory?
Or if there is another option, could you tell me what it is?

The issue has very little to do with the size of your data set. MongoDB uses memory-mapped files for its storage engine. As such, it will page hot data into memory whenever it can, and it does so fairly aggressively (or, more accurately, the OS memory management does).
Basically it uses as much memory as is available to it, and there is very little you can do to avoid that. All data pages (be they actual data or indexes) that are accessed during operation will be paged into memory if there is space available.
There are plenty of references to this on the internet and on mongodb.org by the way. Saying it isn't mentioned anywhere isn't really true.

Related

Is there a way to release RAM occupied by mongodb indexes, after dropping collection?

The problem is that we have a huge dataset consisting of 50 million records, and almost all fields are indexed, which causes huge RAM consumption. After the collection is deleted, the resources are not released. I know this can be solved by restarting the server, but that solution is not applicable in our situation. So my question is: is there a way to release RAM without restarting the mongo server? The MongoDB version is 4.4. Thanks in advance.
Not directly. MongoDB never frees memory; it just replaces its contents or allocates more.
But if you start reading from disk the data that you are going to need, that data will replace that part of memory.
The underlying problem is that MongoDB will always (eventually) use all the free memory that is available and try to keep all active data in memory. So reading data from disk makes that data "active" and changes the content of the disk cache in memory.

How do I have Mongo 3.0 / WiredTiger load my whole database into RAM?

I have a static database (that will never even receive a write) of around 5 GB, while my server RAM is 30 GB. I'm focusing on returning complicated aggregations to the user as fast as possible, so I don't see a reason why I shouldn't have (a) the indexes and (b) the entire dataset stored entirely in RAM, and (c) automatically stored there whenever the Mongo server boots up. Currently my main bottleneck is running group commands to find unique elements out of millions of rows.
My question is, how can I do either (a), (b), or (c) while running on the new Mongo/WiredTiger? I know the "touch" command doesn't work with WiredTiger, so most information on the Internet seems out of date. Are (a), (b), or (c) already done automatically? Should I not be doing each of these steps with this use case?
Normally you shouldn't have to do anything. The disk pages are loaded into RAM upon request and stay there. If there is no more free memory, the older (unused) pages get unloaded so the memory can be used by other programs that need it.
If you must have your whole db in RAM, you could use a ramdisk and tell mongo to use it as a storage device.
I would recommend that you revise your indices and/or data structures. Having the correct ones can make a huge difference in performance. We are talking about seconds vs hours.

What is memory map in mongodb?

I read about this topic at
http://docs.mongodb.org/manual/faq/storage/#faq-storage-memory-mapped-files
But I didn't understand the point. Is it used to keep query data in physical memory? How is it related to virtual memory? Why is it important, and how does it affect performance?
I'll try to explain in a simple way.
MongoDB (and other storage systems) stores data in files. Each database has its own files, created as they are needed. The first file is 64 MB, the next 128 MB, and so on up to 2 GB. After that, each new file is 2 GB. Each of these files is logically divided into blocks, each of which corresponds to one virtual memory block.
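To make the file-size rule above concrete, here is a minimal sketch in Python. It only models the arithmetic as described in this answer (start at 64 MB, double each time, cap at 2 GB); the function name and parameters are made up for illustration and are not part of MongoDB.

```python
def data_file_sizes_mb(count, first_mb=64, cap_mb=2048):
    """Return the sizes (in MB) of the first `count` data files.

    Hypothetical helper modelling the rule described in the answer:
    each file is twice the size of the previous one until the cap is
    reached, after which every new file has the cap size.
    """
    sizes = []
    size = first_mb
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap_mb)
    return sizes


if __name__ == "__main__":
    sizes = data_file_sizes_mb(8)
    print(sizes)       # [64, 128, 256, 512, 1024, 2048, 2048, 2048]
    print(sum(sizes))  # 8128 (MB allocated across the first eight files)
```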
When MongoDB needs to access a file or part of it, it loads all the virtual blocks corresponding to that file or part into memory using mmap. mmap, in turn, is a way for applications to leverage the system cache (Linux).
So what really happens when you run a query is that MongoDB "tells" the OS to load the part it needs with the requested data, so the next time that data is requested, access will be faster. As you can imagine, this is a very important feature for boosting performance in databases like MongoDB, because accessing RAM is far faster than accessing a hard drive.
Another benefit of using mmap is that MongoDB's memory usage grows as needed, as long as the system has free memory.
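The general idea of a memory-mapped file can be tried out with Python's standard `mmap` module. This is only a sketch of the OS mechanism, not of how MongoDB itself is implemented; the helper names and the demo file layout are assumptions made for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # size of one OS memory page, in bytes


def make_demo_file(num_pages=16):
    """Create a temporary file where page i is filled with the byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i]) * PAGE)
    return path


def read_record(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map.

    The program just slices the mapping; the OS decides which pages to
    bring into RAM and keeps them in its cache for later reads.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    path = make_demo_file()
    try:
        # Touches only the page that holds the requested bytes.
        print(read_record(path, 5 * PAGE, 4))  # b'\x05\x05\x05\x05'
    finally:
        os.remove(path)
```

The point of the sketch is that the application never calls an explicit "load into cache" routine: reading from the mapping is enough, and caching is left to the operating system.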

design mongodb to load entire content in memory

I am involved in a project where they get enough RAM to store the entire database in memory. According to the manager, that is what 10Gen recommended. This is counter intuitive. Is that really the way you want to use Mongodb?
It is not counter intuitive... I find it quite intuitive, actually.
In How much faster is the memory usually than the disk? you can read:
(...) memory is only about 6 times faster when you're doing sequential
access (350 Mvalues/sec for memory compared with 58 Mvalues/sec for
disk); but it's about 100,000 times faster when you're doing random
access.
So if you can fit all your data in RAM, it is quite good because you are going to be really fast reading your data.
Regarding MongoDB, from the FAQ's:
It’s certainly possible to run MongoDB on a machine with a small
amount of free RAM.
MongoDB automatically uses all free memory on the machine as its
cache. System resource monitors show that MongoDB uses a lot of
memory, but its usage is dynamic. If another process suddenly needs
half the server’s RAM, MongoDB will yield cached memory to the other
process.
Technically, the operating system’s virtual memory subsystem manages
MongoDB’s memory. This means that MongoDB will use as much free memory
as it can, swapping to disk as needed. Deployments with enough memory
to fit the application’s working data set in RAM will achieve the best
performance.
The problem is that you usually have much more data than memory available. Then you have to go to disk, and disk I/O is slow. For database performance, avoiding full-scan queries is key (and matters even more when the data has to come from disk). Therefore, if your data set does not fit in memory, you should aim at having indexes for the vast majority of your access patterns and try to fit those indexes in memory:
If you have created indexes for your queries and your working data set
fits in RAM, MongoDB serves all queries from memory.
It all depends on the size of your database. I am guessing that you told them your database was actually quite small; otherwise I cannot see how someone at 10gen would give such advice. I mean, not even @Stennie gives such advice (he is at 10gen, by the way).
Even if your database is small, I don't see how the manager could recommend that. MongoDB does not do its own memory management; as such, it does not "pin" data in pages the way memcached or other memory-based databases do.
This means that the paging of mongod's data can be quite unpredictable; in other words, you will spend more time trying to keep things in RAM than paging in data. This is why it is better to just make sure your working set fits and can be loaded quickly; such things depend on your hardware and queries.
@Stennie's comment pretty much sums up the stance you should be taking with MongoDB.

mongod clean memory used in ram

I have a huge amount of data in my MongoDB. It is filled with tweets (50 GB), and my RAM is 8 GB. When querying, it retrieves all tweets and MongoDB starts filling the RAM; when it reaches 8 GB, it starts moving data to disk. This is the part where it gets really slowwwww. So I changed the query from using skip to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops. Then I restart the program and it reads the id of the tweet from the file. But the mongod server is still occupying the RAM with the first 8 GB, which will no longer be used, because I have an index to the last tweet. How can I clean the memory of the MongoDB server without restarting it?
(running on Windows)
I am a bit confused by your logic here.
So I changed the query from using skip to using indexes. Now I have indexes and I query only 8 GB into my program, save the id of the last tweet used in a file, and the program stops.
Using ranged queries will not reduce the amount of data you have to page in (in fact it might make it worse because of the index); it merely makes the query faster server-side by using an index in place of huge skips (like a 42K+ row skip). If you are doing the same thing as that skip() but with an index, then (without a covered index) you are still paging in exactly the same data.
It is slow due to memory mapping and your working set. You have more data than RAM, and not only that, you are using more of that data than you have RAM, so you are probably page faulting all the time.
Restarting the program will not solve this, nor will clearing its data OS side (with restart or specific command) because of your queries. You probably need to either:
Think about your queries so that your working set is more in line with your memory
Or shard your data across many servers so that you don't have to build up your primary server
Or get a bigger primary server (moar RAM!!!!!)
Edit
The LRU of your OS should be swapping out old data already since MongoDB is using its fully allocated lot, which means that if that 8GB isn't swapped it is because your working set is taking that full 8GB (most likely with some swap on the end).
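The effect of a working set that is larger than RAM can be illustrated with a toy LRU cache. This is a simplified model, not how any particular OS actually manages pages; the function name and the page counts are invented for the example.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_pages):
    """Simulate an LRU page cache and return the fraction of cache hits.

    `accesses` is a sequence of page numbers; `cache_pages` is how many
    pages fit in memory. Toy model only.
    """
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # "page fault": load the page
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Working set of 50 pages scanned repeatedly with room for only 8:
    # every page is evicted before it is needed again.
    print(lru_hit_rate(list(range(50)) * 10, cache_pages=8))  # 0.0
    # Working set of 8 pages with room for 8: only the first pass misses.
    print(lru_hit_rate(list(range(8)) * 10, cache_pages=8))   # 0.9
```

In this model, freeing the cache by hand would not change the first result: the misses come from the access pattern, which is why the suggestions above focus on the working set and on RAM rather than on clearing memory.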