We chose to deploy the mongos router in the same VM as our applications, but we're running into issues where the application gets OOM-killed because the mongos eats up a lot more RAM than we'd expect or want it to.
After a reboot, the mongos footprint is a bit under 2GB, but from there it steadily requires more memory, about 500MB per week. It has gone up to 4.5+GB.
These are the stats for one of our mongos instances for the past 2 weeks, and it clearly looks like it's leaking memory...
So my question is: how do we investigate such behavior? We've not really been able to find explanations as to why the router might require more RAM, how to diagnose the behavior, or even how to set a memory usage limit on the mongos.
With db.serverStatus() on the mongos we can see the allocations:
"tcmalloc" : {
"generic" : {
"current_allocated_bytes" : 536925728,
"heap_size" : NumberLong("2530185216")
},
"tcmalloc" : {
"pageheap_free_bytes" : 848211968,
"pageheap_unmapped_bytes" : 213700608,
"max_total_thread_cache_bytes" : NumberLong(1073741824),
"current_total_thread_cache_bytes" : 819058352,
"total_free_bytes" : 931346912,
"central_cache_free_bytes" : 108358128,
"transfer_cache_free_bytes" : 3930432,
"thread_cache_free_bytes" : 819058352,
"aggressive_memory_decommit" : 0,
"pageheap_committed_bytes" : NumberLong("2316484608"),
"pageheap_scavenge_count" : 35286,
"pageheap_commit_count" : 64660,
"pageheap_total_commit_bytes" : NumberLong("28015460352"),
"pageheap_decommit_count" : 35286,
"pageheap_total_decommit_bytes" : NumberLong("25698975744"),
"pageheap_reserve_count" : 513,
"pageheap_total_reserve_bytes" : NumberLong("2530185216"),
"spinlock_total_delay_ns" : NumberLong("38522661660"),
"release_rate" : 1
}
},
------------------------------------------------
MALLOC: 536926304 ( 512.1 MiB) Bytes in use by application
MALLOC: + 848211968 ( 808.9 MiB) Bytes in page heap freelist
MALLOC: + 108358128 ( 103.3 MiB) Bytes in central cache freelist
MALLOC: + 3930432 ( 3.7 MiB) Bytes in transfer cache freelist
MALLOC: + 819057776 ( 781.1 MiB) Bytes in thread cache freelists
MALLOC: + 12411136 ( 11.8 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 2328895744 ( 2221.0 MiB) Actual memory used (physical + swap)
MALLOC: + 213700608 ( 203.8 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 2542596352 ( 2424.8 MiB) Virtual address space used
MALLOC:
MALLOC: 127967 Spans in use
MALLOC: 73 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
But I can't say it's really helpful. At least to me.
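For reference, the same numbers can be pulled programmatically, which at least makes it easy to watch the gap between what the application actually allocates and what tcmalloc keeps cached. A rough sketch (Number() is used to unwrap the NumberLong values in the legacy shell, and the aggressive-decommit parameter mentioned in the comment is an assumption to verify on 4.0 before relying on it):

// Sketch: pull the tcmalloc section from serverStatus and compute how much
// memory sits in allocator freelists rather than being used by the mongos itself.
// Field names match the output above.
var t = db.serverStatus().tcmalloc;
var inUse = Number(t.generic.current_allocated_bytes);
var committed = Number(t.tcmalloc.pageheap_committed_bytes);
print("in use by mongos  : " + (inUse / 1024 / 1024).toFixed(1) + " MiB");
print("cached by tcmalloc: " + ((committed - inUse) / 1024 / 1024).toFixed(1) + " MiB");
// Assumption: some versions let tcmalloc return freed pages more aggressively, e.g.
// db.adminCommand({ setParameter: 1, tcmallocAggressiveMemoryDecommit: 1 })
// -- check that this parameter exists on your build before using it.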
In the server stats we can also see that the number of calls to killCursors is quite high (2909015), but I'm not sure how that would explain the steady increase in memory usage, as the cursors are automatically killed after 30-ish seconds and the number of calls made to the mongos is pretty much steady throughout the period.
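To rule cursors in or out, the cursor metrics on the router itself can be checked; a minimal sketch (assuming the metrics.cursor section is present on a 4.0 mongos):

// Sketch: check how many cursors this mongos currently holds open and how many
// have timed out, to rule cursors in or out as the source of the growth.
var c = db.serverStatus().metrics.cursor;
printjson({ open: c.open, timedOut: c.timedOut });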
So yeah, any idea on how to diagnose this / where to look / what to look for?
Mongos version: 4.0.19
Edit: it seems our monitoring is based on virtual and not resident memory, so the graph might not be very pertinent. However, we still ended up with 4+ GB of RES memory at some point.
Why would the router require more memory?
If any query in the sharded cluster requires a scatter-gather, the merging activity is handled by the mongos itself.
For example, say I run the query db.collectionname.find({ something: 1 }).
If this something field is not the shard key itself, then by default the query will be a scatter-gather; use explain() to check it. It becomes a scatter-gather because the mongos consults the config server and realises that it has no routing information for that field. (This applies to collections that are sharded.)
To make things worse, if you have sort operations where no index can be used, then even the sort now has to be done on the mongos itself. A sort has to hold the candidate documents in memory before it can order them, so the larger the result set, the more memory stays pinned until the sort completes.
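A quick way to see whether a given query is being merged (and possibly sorted) on the mongos is its explain output. A minimal sketch, assuming a sharded collection named collectionname, a non-shard-key field something, and a placeholder sort field someOtherField:

// Sketch: on a sharded collection, a winningPlan stage of SINGLE_SHARD means the
// query was routed to one shard; SHARD_MERGE (or a similar merge stage) means the
// mongos itself merges results from several shards, and any sort that no index
// can satisfy happens there too.
var plan = db.collectionname.find({ something: 1 })
                            .sort({ someOtherField: 1 })
                            .explain("executionStats");
print(plan.queryPlanner.winningPlan.stage);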
What should you do?
Based on your slowms setting (the default should be 100ms), check the logs and take a look at the slow queries in the system. If you see a lot of SHARD_MERGE stages and in-memory sorts taking place, then you have your culprit right there.
As a quick fix, increase the available swap and make sure your settings are appropriate.
All the best.
Without access to the machine one can only speculate; with that said, I want to say this is somewhat expected behaviour from Mongo.
Mongo likes memory; it keeps many things in RAM in order to increase performance.
Many different things can cause this, I'll list a few:
MongoDB uses RAM to handle open connections, aggregations, server-side code, open cursors and more.
WiredTiger keeps multiple versions of records in its cache (Multi-Version Concurrency Control; read operations access the last version committed before their operation started).
WiredTiger keeps checksums of the data in its cache.
There are many more things that are cached or stored in memory by Mongo; one example is the index tree for a collection.
If Mongo has the memory to store something, it will. That's why, as you use it more and more, the RAM usage increases. However, I personally do not think it's a memory leak. As I said, Mongo just likes RAM. A lot.
We chose to deploy the mongos router in the same VM as our applications.
This is a big no-no. From personal experience, and in general because of Mongo's memory hunger, I would personally try to avoid this if possible.
To summarize, I don't think you have a memory leak (although it's possible); I just think that as more time passes, Mongo stores more things in RAM as they are being used.
You should be on the lookout for long-running queries, as those are the most likely culprit IMO.
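If you want to see which of these is actually piling up on the router, the connections section of serverStatus is a reasonable first stop; a minimal sketch:

// Sketch: see how many client connections the mongos is servicing; each open
// connection costs memory (roughly up to 1MB of stack), so a steadily growing
// connection count alone can push resident memory up over time.
printjson(db.serverStatus().connections);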
Related
I created a database containing a total of 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean... out of which one single table contains about 800 MB (including index) of data and 5 million records (daily addition of about 6000 records).
The system is PostgreSQL on Windows: a Windows 7 laptop with 8 GB RAM and an SSD.
I allocated 2048MB as shared_buffers, 256MB as temp_buffers and 128MB as work_mem.
I execute a single query multiple times against the single table - hoping that the table stays in RAM (hence the above parameters).
But, although I see a spike in memory usage during execution (by about 200 MB), I do not see memory consumption remain at 500 MB or more (which it should if the data were staying in memory). All running postgres executables show 2-6 MB in Task Manager. Hence, I suspect the LRU does not keep the data in memory.
Average query execution time is about 2 seconds (a very simple single-table query)... but I need to get it down to about 10-20 ms or even less if possible, purely because the same query is going to be executed so many times, and that can only be achieved by keeping the data in memory.
Any advice?
Regards,
Kapil
You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.
That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.
That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.
Depending on the OS, behaviour for disk writes may differ. Linux will write-back cache "dirty" buffers, and will still return blocks from cache even if they've been written to. It'll write these back to the disk lazily unless forced to write them immediately by an fsync() as Pg uses at COMMIT time. When it does that it marks the cached blocks clean, but doesn't flush them. I don't know how Windows behaves here.
The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
If you're on Windows and this won't work, you really just have to rely on observing disk activity. Does performance monitor show lots of uncached disk reads? Does operating system memory monitoring show lots of memory used for disk cache in the OS?
Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at explain analyze output and system performance metrics to see what's going on.
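For completeness, here is a minimal sketch of capturing that explain analyze output programmatically, using the node-postgres client purely as an example; the connection string, table name and filter are placeholders:

// Sketch: run EXPLAIN (ANALYZE, BUFFERS) against the slow query and print the
// plan, so you can see whether the time actually goes to disk reads or elsewhere.
const { Client } = require("pg");

async function explainSlowQuery() {
  const client = new Client({ connectionString: "postgres://localhost/mydb" }); // placeholder
  await client.connect();
  const res = await client.query(
    "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table WHERE some_column = 42" // placeholder query
  );
  res.rows.forEach(function (row) { console.log(row["QUERY PLAN"]); });
  await client.end();
}

explainSlowQuery().catch(console.error);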
MongoDB 2.4.6 & 2.4.8
Use case:
Load 100,000 documents into a collection with 2 indexes. Resident memory increases (mongostat), and no page faults happen.
Restart mongod. Resident memory is low (this is expected).
Try to 'preheat' mongo with the touch command db.runCommand({ touch: "collection", data: true, index: true }) or by other means (at the OS level, vmtouch / dd).
a) At this step, on my development machine (macOS), I see a lot of page faults in mongostat while warming it up (expected) and resident memory rises. From that point on, updates do not cause page faults.
b) On a NUMA server (256 GB RAM), even though I started mongo following this guide: http://docs.mongodb.org/manual/administration/production-notes/#mongodb-on-numa-hardware (note: I do not have superuser access; however, the 2nd step, echoing 0 into /proc/sys/vm/zone_reclaim_mode, is already 0 so I left it as-is), I cannot seem to pre-heat the memory with the 'touch' command. Nothing happens, even though it returns successfully. In mongostat, only 'mapped' and 'vsize' get higher, and resident memory stays the same (35m). I even tried to load the data files into the OS's memory with the vmtouch and dd commands. Only re-indexing the collection changed the resident memory.
The problem started a while after I began to load data into the server. I do a lot of upserts, and performance was awesome in the beginning (3000 - 4000 upserts/sec). This was expected because the working set fit in memory. After 30,000,000 documents the process seems to generate a lot of page faults and I do not know why. The data files are approx. 33GB and performance is about 500 upserts/sec, with a lot of page faults. That would mean the working set is not in memory. However, 256GB of RAM should be more than enough. I tried the 'touch' command, but resident memory stayed low (I even restarted the mongod process and ran the touch command, and even though 'mapped' and 'vsize' skyrocketed to many GB, resident memory stayed low, 35m). I tried to reIndex the collection and voilà, resident memory went from 35m -> 20GB. However, again, I saw page faults. Then I tried to vmtouch the data files (or use dd). Again, a lot of page faults.
The problem is that I cannot live with 'only' 500 upserts/sec. Should I change my application logic? I thought that with 256GB of memory my 'active' working set (expected to be 60GB) should fit in memory. I am at the halfway point (30GB) and it seems there is nothing I can do to fix this. Is it the NUMA hardware? Should I make any other changes?
Thanks in advance
I just wrote a pretty detailed answer over on ServerFault regarding resident memory, page faulting, and how to troubleshoot, tweak and tune etc. so I will not re-hash that here.
I will say that Sammaye's comment is correct: the touch (or dd, vmtouch, etc.) command will not cause memory to be reported as resident against the mongod process until the process actually accesses the data (until then it is just in the FS cache), and even then you can hit the issue in SERVER-9415, which can cause resident memory to be under-reported.
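If you want the pages to count as resident, you have to make mongod itself read them. One rough way to do that from the shell (the collection name is a placeholder, and a full collection scan is expensive, so treat this strictly as a deliberate warm-up step):

// Sketch: force mongod itself to read every document by iterating a full
// collection scan from the shell. Unlike vmtouch/dd, these reads happen inside
// the mongod process, so the pages count towards its resident memory.
var touched = db.mycollection.find().itcount();   // placeholder collection name
print("documents read: " + touched);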
I think you are already looking at the key metrics here, and you should be able to achieve higher resident memory than you are reporting (or at least, get more data into memory without significant page faults being seen). The situation you are describing sounds like memory pressure from elsewhere, but I am assuming you would have noticed another process eating significant amounts of memory.
What I will note is that I have previously spent days (literally) attempting to make a particular AWS instance go above a 30% memory threshold without success.
When we finally gave up and tried on another instance, without changing a thing (we just added a new instance as a secondary and failed over to it) it instantly went to over 70% resident memory. Granted, that was on m2.4xlarge instances, so not at the same scale as yours, but it's always worth bearing in mind. If you can try it on another instance, I would recommend giving it a shot.
After starting Mongo via mongod, I ran a Mongo query that took 300 seconds. Calling db.serverStatus() on my "admin" db showed Mongo having resident memory of 1 GB. The docs explain that "resident" memory is the amount of physical RAM that Mongo uses.
Then, I re-ran the same query, but it took 8 seconds. Looking at the resident memory this time, I saw 5 GB.
The large increase in RAM, I believe, helps to explain why the query time shrank from 300 to 8 seconds, but why did the resident memory jump so quickly?
Is there some type of "warming" step recommended to prepare Mongo so as to avoid 300 second queries?
The reason behind this is that MongoDB uses the mmap functionality of the operating system. This means, at least on Linux systems, that MongoDB's memory handling is built on an operating system feature called memory-mapped files.
Memory in Linux systems is addressed on several levels. Basically, any program sees a virtual address space: about 2GB overall on 32-bit systems, 128TB on 64-bit systems. This is a virtual address space, meaning that amount of memory can be addressed in 4kb memory pages (a page is the individually managed unit of memory). That is why, if you start mongoDB on a 32-bit system, it raises a warning that the database on such a system can only handle 2GB of data. This virtual address space is obviously bigger than the amount of physical memory, so there is a mapping between virtual addresses and physical ones. Some virtual addresses reside in actual physical memory, but the algorithm which decides that is on the kernel side. Programs running on Linux systems deal only with virtual addresses; if one tries to access a virtual memory address which is not in physical memory, a page fault occurs (you can track this in the serverStatus command's extra_info field). (You can find a short explanation of this here.)
Accessing memory whose virtual address currently resides in physical memory is as fast as RAM; accessing a virtual address with no physical backing means first paging from disk into memory and then reading it, so it is only as fast as a random disk read. (This is what makes the difference in your case.)
There is a command in mongoDB with which you can force the caching of a collection or an index: the touch command (see the sketch below).
If you use this command to load the data into memory before the first query, you will get the 8-second result on the first try. Unfortunately you cannot really force the OS to keep this in memory forever, so if other things use up the memory, the OS will page this data out after some time.
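For reference, a minimal sketch of the touch command (the collection name is a placeholder; the command loads data and/or index pages into memory on MMAPv1-era versions):

// Sketch: load both the documents and the indexes of a collection into memory
// ahead of the first query.
db.runCommand({ touch: "mycollection", data: true, index: true });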
If you have enough physical memory, mongoDB will keep everything (the data and indexes) in memory. This is not always needed. There is a portion of the data which needs to be in memory to avoid an excessive amount of page faults: this is the working set. You can check the size of the working set with the db.runCommand( { serverStatus: 1, workingSet: 1 } ) command.
You cannot control the paging yourself, since it happens at the OS level, but if you have enough memory the kernel usually likes to keep as much cached as it can. If the working set fits in memory you are more or less OK. If some documents are accessed really rarely and there is not enough memory to keep everything, they will be paged out anyway.
When you run a query, several things can happen. An index can cover the query, which means no documents are touched at all; if your query is selective in some way, only a part of the index is touched. Unfortunately it is really hard to define how much memory is sufficient, and the only thing you can do is monitor (the workingSet metric is an estimate). The symptoms of running out of memory can be identified; check this presentation. And use MMS.
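As a small illustration of that monitoring, the page-fault counter and the working set estimate can be sampled from the shell; a rough sketch (field availability depends on version, and workingSet is an MMAPv1-era metric):

// Sketch: sample the cumulative page fault counter and the working set
// estimate. Compare two samples taken some time apart to get a fault rate.
var st = db.runCommand({ serverStatus: 1, workingSet: 1 });
print("page faults so far: " + st.extra_info.page_faults);
if (st.workingSet) printjson(st.workingSet);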
I'm using MongoDB on a 32 bit production system, which sucks but it's out of my control right now. The challenge is to keep the memory usage under ~2.5GB since going over this will cause 32 bit systems to crash.
According to the MongoDB team, the best way to track memory usage is to use your operating system's process tracking tools (e.g. ps or htop on Unix systems; Process Explorer on Windows) and look at virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. receiving data at regular intervals from sensors, and every day a cron job wipes all data from before the last 3 days. Over a period of time, the memory usage slowly increases. I took some notes over time using db.serverStats(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has reduced in the last month but the memory usage increased nonetheless.
Now, there is some scope for adjustment in how many days of data I store. Today I deleted basically half of the data, and then restarted mongodb, and yet the mem virtual / mem mapped and most importantly memory usage according to ps have hardly changed! Why do these not reduce when I wipe data (and restart)? I read some other questions where people said that mongo isn't really using all the memory that it might appear to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5GB limit?
Unless there is a way to stem this gradual increase in memory usage, irrespective of dataset size, it seems to me that the 32-bit version of Mongo is unusable. Note: I don't mind losing a bit of performance if it solves the problem.
To answer regarding why the mapped and virtual memory usage does not decrease with the deletes, the mapped number is actually what you get when you mmap() the entire set of data files. This does not shrink when you delete records, because although the space is freed up inside the data files, they are not themselves reduced in size - the files are just more empty afterwards.
Virtual will include journal files, and connections, and other non-data related memory usage also, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So, the 2GB storage size limitation on 32-bit will actually apply to the data files whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and will require the database to be offline/unavailable while it runs. It will also need up to 2x the original size in free disk space, since it essentially means writing out the files again from scratch.
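For reference, a minimal sketch of triggering that repair from the shell (the old repairDatabase command from this era; remember it blocks the database and needs up to 2x the disk space):

// Sketch: rewrite the current database's files from scratch, which reclaims
// space freed by deletes. Blocking and disk-hungry; take the database out of
// service first.
db.runCommand({ repairDatabase: 1 });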
This limitation, and the problems it causes, is why the 32-bit version should not be run in production; it is just not suitable. I would recommend getting onto a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the available memory it can in terms of resident memory (which is why you should always have swap configured so that the OOM Killer does not come into play).
Once that is used, the OS does not stop allocating memory, it will just have the oldest items paged out to make room for the new data (LRU). In other words, the recycling of memory will be done for you, and the resident memory level will remain fairly constant.
Your options for stretching 32-bit are limited, but you can try some things. What you run out of is address space, and because each additional database file is larger than the last, you want to avoid crossing the boundary from "n" files to "n+1". It may be worth structuring your data into more or fewer databases so that you can get the maximum amount of actual data into memory and as little "dead space" as possible.
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.
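One way to see how much of each database's allocated file space actually holds data, versus preallocated dead space, is db.stats(); a rough sketch (fileSize is an MMAPv1-era field):

// Sketch: compare allocated file size with actual data + index size for each
// database, to see how much 32-bit address space goes to mostly-empty files.
db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    var s = db.getSiblingDB(d.name).stats();
    print(d.name + ": fileSize=" + s.fileSize + " dataSize=" + s.dataSize +
          " indexSize=" + s.indexSize);
});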
It is well known that 32-bit should not be used in production.
Use 64-bit systems.
Period.
I am concerned about my server machine's performance. The application deals with huge amounts of data from a RETS server feed. Whenever the server starts the mongod service, performance tanks and PF usage shoots up to ~3.59GB, even though the machine has a decent configuration (Server 2008, 4GB RAM) and is running the latest 64-bit MongoDB release (2.0.6). Please enlighten me on this.
Thanks
I'm not sure how much you know about MongoDB but Mongo uses memory mapped files to access data, which results in large numbers being displayed for the mongod process. This is normal when using memory-mapped files. The amount of mapped datafile is shown in the virtual size parameter and resident bytes shows how much data is being cached in RAM. The larger your data files, the higher the vmsize of the mongod process.
If other processes need more RAM, the operating system's virtual memory manager will relinquish some memory from the cache, and the resident bytes of the mongod process will drop.
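You can watch those numbers from the shell as well; a minimal sketch (the mapped field only appears on MMAPv1-era builds such as 2.0.x):

// Sketch: report resident, virtual and mapped memory (in MB) as mongod sees them.
var mem = db.serverStatus().mem;
print("resident: " + mem.resident + " MB, virtual: " + mem.virtual +
      " MB, mapped: " + mem.mapped + " MB");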
It is recommended to use a fixed pagefile size. If you use a dynamic page file, the OS doesn't grow it fast enough to keep up with the (private) mapped memory calls. There's actually an open ticket to add a special warning if the page file is dynamic or its minimum is set too small.
This document explains how memory usage works on MongoDB.
Here are some tools that show how you can diagnose system issues with MongoDB -
mongostat
Monitoring and Diagnostics
To be honest, I'd recommend moving this issue to the MongoDB User Google Group and posting it there along with the mongostat output captured during the issue, as well as information from perfmon, as this will likely be a longer discussion.
Something else to consider is setting up MMS on your mongod instances.
https://mms.10gen.com