BsonChunkPool and memory leak - mongodb

I use Mongodb 3.6 + .net driver (MongoDb.Driver 2.10) to manage our data. Recenyly, we've noticed that our services (background) consume a lot of memory. After anaylyzing a dump, it turned out that there's a mongo object called BsonChunkPool that always consumes around 0.5 GB of memory. Is it normal ? I cannot really find any valuable documentation about this type, and what it actually does. Can anyone help ?

The BsonChunkPool exists so that large memory buffers (chunks) can be reused, thus easing the amount of work the garbage collector has to do.
Initially the pool is empty, but as buffers are returned to the pool the pool is no longer empty. Whatever memory is held in the pool will not be garbage collected. This is by design. That memory is intended to be reused. This is not a memory leak.
The default configuration of the BsonChunkPool is such that it can hold a maximum of 8192 chunks of 64KB each, so if the pool were to grow to its maximum size it would use 512MB of memory (even more than the 7 or 35 MB you are observing).
If for some reason you don't want the BsonChunkPool to use that much memory, you can configure it differently by putting the following statement at the beginning of your application:
BsonChunkPool.Default = new BsonChunkPool(16, 64 * 1024); // e.g. max 16 chunks of 64KB each, for a total of 1MB
We haven't experimented with different values for chunk counts and sizes so if you do decide to change the default BsonChunkPool configuration you should do some benchmarking and verify that it doesn't have an adverse impact on your performance.
From jira: BsonChunkPool and memory leak

Related

Understanding Postgres work_mem, maintenance_work_mem and temp_buffers allocation

I want to understand allocation of some postgresql.conf parameters. I want to know when maintenance_work_mem, work_mem and temp_buffers gets allocated.
As per my understanding,
maintenance_work_memory gets allocated at server start and this
memory cannot be used by any other process.
work_mem gets allocated at time of query parsing, planner checks for number of sort methods
or hash tables and allocates memory accordingly. Sort operation may not use full allocated memory but still its reserved for that particular operation and cannot be used by any other process.
temp_buffers gets allocated at start of each session.
I have gone through the docs but didn't get any proper answer.
Is this understanding correct?
Maintenance work mem is allocated per session for VACUUM, CREATE INDEX and ADD FOREIGN KEYS, and it depends on parallel workers too like with autovacuum_max_workers = 3 and maintenance_work_mem = 1 GB then autovacuum will consume 1*3= 3GB of memory similarly while creating an index.
Now, work_mem also gets allocated per session depending on your sort/hash operations however I am sure Postgres don't reserve anything to be used in the future and for tuning this and you should always consider your number of parallel connection before allocating this memory as this parameter consumes memory work_mem*sort operations running in your cluster per connection.
Yes, that's true temp_buffers can be changed within individual sessions but only before the first use of temporary tables within the session.
http://rhaas.blogspot.com/2019/01/how-much-maintenanceworkmem-do-i-need.html by Robert Hass
https://www.depesz.com/2011/07/03/understanding-postgresql-conf-work_mem/ by Deprez
https://www.interdb.jp/pg/pgsql02.html and https://severalnines.com/database-blog/architecture-and-tuning-memory-postgresql-databases was very helpful understanding the memory architecture
work_mem is the maximum memory that a single step in an execution plan can use. It is allocated by the executor freed when the query execution is done.
maintenance_work_mem is the same, but for maintenance operations like CREATE INDEX and VACUUM.
temp_buffers is used to cache temporary tables and remains in use for the duration of the database session.

how does memory allocation for postgres work?

shared_buffers - In a regular PostgreSQL installation, say I am allocating 25% of my memory to shared_buffers that means it leaves 75% for rest such as OS, page cache and work_mems etc. Is my understanding correct?
If so, AWS Aurora for Postgres uses 75% of memory for shared_buffers, then it would leave just 25% for other things?
Does the memory specified for work_mem, fully gets allocated to all sessions irrespective of whether they do any sorting or hashing operations?
Your first statement is necessarily true:
If 75% of the RAM are used for shared buffers, then only 25% are available for other things like process private memory.
work_mem is the upper limit of memory that one operation (“node”) in an execution plan is ready to use for operations like creating a hash or a bitmap or sorting. That does not mean that every operation allocates that much memory, it is just a limit.
Without any claim for absolute reliability, here is my personal rule of thumb:
shared_buffers + max_connections * work_mem should be less or equal to the RAM available. Then you are unlikely to run out of memory.

why is kdb process showing high memory usage on system?

I am running into serious memory issues with my kdb process. Here is the architecture in brief.
The process runs in slave mode (4 slaves). It loads a ton of data from database into memory initially (total size of all variables loaded in memory calculated from -22! is approx 11G). Initially this matches .Q.w[] and close to unix process memory usage. This data set increases by very little in incremental operations. However, after a long operation, although the kdb internal memory stats (.Q.w[]) show expected memory usage (both used and heap) ~ 13 G, the process is consuming close to 25G on the system (unix /proc, top) eventually running out of physical memory.
Now, when I run garbage collection manually (.Q.gc[]), it frees up memory and brings unix process usage close to heap number displayed by .Q.w[].
I am running Q 2.7 version with -g 1 option to run garbage collection in immediate mode.
Why is unix process usage so significantly differently from kdb internal statistic -- where is the difference coming from? Why is "-g 1" option not working? When i run a simple example, it works fine. But in this case, it seems to leak a lot of memory.
I tried with 2.6 version which is supposed to have automated garbage collection. Suprisingly, there is still a huge difference between used and heap numbers from .Q.w when running with version 2.6 both in single threaded (each) and multi threaded modes (peach). Any ideas?
I am not sure of the concrete answer but this is my deduction based on following information (and some practical experiments) which is mentioned on wiki:
http://code.kx.com/q/ref/control/#peach
It says:
Memory Usage
Each slave thread has its own heap, a minimum of 64MB.
Since kdb 2.7 2011.09.21, .Q.gc[] in the main thread executes gc in the slave threads too.
Automatic garbage collection within each thread (triggered by a wsful, or hitting the artificial heap limit as specified with -w on the command line) is only executed for that particular thread, not across all threads.
Symbols are internalized from a single memory area common to all threads.
My observations:
Thread Specific Memory:
.Q.w[] only shows stats of main thread and not the summation of all the threads (total process memory). This could be tested by starting 'q' with 2 threads. Total memory in that case should be at least 128MB as per point 1 but .Q.w[] it still shows 64 MB.
That's why in your case at the start memory stats were close to unix stats as all the data was in main thread and nothing on other threads. After doing some operations some threads might have taken some memory (used/garbage) which is not shown by .Q.w[].
Garbage collector call
As mentioned on wiki, calling garbage collector on main thread calls GC on all threads. So that might have collected the garbage memory from threads and reduced the total memory usage which was reflected by reduced unix memory stats.

PostgreSQL Table in memory

I created a database containing a total of 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean... out of which one single table contains about 800 MB (including index) of data and 5 million records (daily addition of about 6000 records).
The system is PG-Windows with 8 GB RAM Windows 7 laptop with SSD.
I allocated 2048MB as shared_buffers, 256MB as temp_buffers and 128MB as work_mem.
I execute a single query multiple times against the single table - hoping that the table stays in RAM (hence the above parameters).
But, although I see a spike in memory usage during execution (by about 200 MB), I do not see memory consumption remaining at at least 500 MB (for the data to stay in memory). All postgres exe running show 2-6 MB size in task manager. Hence, I suspect the LRU does not keep the data in memory.
Average query execution time is about 2 seconds (very simple single table query)... but I need to get it down to about 10-20 ms or even lesser if possible, purely because there are just too many times, the same is going to be executed and can be achieved only by keeping stuff in memory.
Any advice?
Regards,
Kapil
You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.
That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.
That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.
Depending on the OS, behaviour for disk writes may differ. Linux will write-back cache "dirty" buffers, and will still return blocks from cache even if they've been written to. It'll write these back to the disk lazily unless forced to write them immediately by an fsync() as Pg uses at COMMIT time. When it does that it marks the cached blocks clean, but doesn't flush them. I don't know how Windows behaves here.
The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
If you're on Windows and this won't work, you really just have to rely on observing disk activity. Does performance monitor show lots of uncached disk reads? Does operating system memory monitoring show lots of memory used for disk cache in the OS?
Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at explain analyze output and system performance metrics to see what's going on.

How to keep 32 bit mongodb memory usage down on changing dataset

I'm using MongoDB on a 32 bit production system, which sucks but it's out of my control right now. The challenge is to keep the memory usage under ~2.5GB since going over this will cause 32 bit systems to crash.
According to the mongoDB team, the best way to track the memory usage is to use your operating system's process tracking system (i.e. ps or htop on Unix systems; Process Explorer on Windows.) for virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. receiving data at regular intervals from sensors, and every day a cron job wipes all data from before the last 3 days. Over a period of time, the memory usage slowly increases. I took some notes over time using db.serverStats(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has reduced in the last month but the memory usage increased nonetheless.
Now, there is some scope for adjustment in how many days of data I store. Today I deleted basically half of the data, and then restarted mongodb, and yet the mem virtual / mem mapped and most importantly memory usage according to ps have hardly changed! Why do these not reduce when I wipe data (and restart)? I read some other questions where people said that mongo isn't really using all the memory that it might appear to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5GB limit?
Unless there is a way to stem this dataset-size-irrespective gradual increase in memory usage, it seems to me that the 32-bit version of Mongo is unuseable. Note: I don't mind losing a bit of performance if it solves the problem.
To answer regarding why the mapped and virtual memory usage does not decrease with the deletes, the mapped number is actually what you get when you mmap() the entire set of data files. This does not shrink when you delete records, because although the space is freed up inside the data files, they are not themselves reduced in size - the files are just more empty afterwards.
Virtual will include journal files, and connections, and other non-data related memory usage also, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So, the 2GB storage size limitation on 32-bit will actually apply to the data files whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and will require the database to be offline/unavailable while it was run. It will also need up to 2x the original size in terms of free disk space to be able to run the repair, since it essentially represents writing out the files again from scratch.
This limitation, and the problems it causes, is why the 32-bit version should not be run in production, it is just not suitable. I would recommend getting onto a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the available memory it can in terms of resident memory (which is why you should always have swap configured to the OOM Killer does not come into play).
Once that is used, the OS does not stop allocating memory, it will just have the oldest items paged out to make room for the new data (LRU). In other words, the recycling of memory will be done for you, and the resident memory level will remain fairly constant.
Your options for stretching 32-bit are limited, but you can try some things. The thing that you run out of is address space, and the increases in the sizes of additional database files mean that you would like to avoid crossing over the boundary from "n" files to "n+1". It may be worth structuring your data into more or fewer databases so that you can get the maximum amount of actual data into memory and as little as possible "dead space".
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.
It is a well-known message that 32-bit should not be used for production.
Use 64-bit systems.
Point.