In MongoDB how does the cache get warmed up?

I'm wondering how the cache (memory) gets warmed up. I understand that MongoDB uses memory mapped files and the OS's virtual memory to swap pages in and out as needed. What I don't understand is how it gets warmed up on startup.
Upon startup, does mongod map all of the pages in the database into virtual memory, or is there some other mechanism to load pages that are not yet mapped, which get mapped as queries are run against the database?
Similarly, is the size of the database limited to the amount of virtual memory available to the system? I understand that on a 64-bit system this is a lot. Is there another mechanism other than memory mapping for pages to be moved to and from disk?

Memory mapping means that there is a representation of all the on-disk files available, but only a portion of these files may be present in RAM. When a given page is needed (and it is not in RAM), it can be read from disk into RAM to be accessed.
Regarding limitations, you can see them on the MongoDB limits page.
MongoDB does not do any specific "warming" of pages on startup, as it does not have any concept of which pages would be useful and which would not.
If you wish to "warm" certain collections manually before using them, you should look at the touch command.
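For example, here is a minimal sketch using the MongoDB Java driver; the connection string, database name, and collection name are placeholders, and touch only applies to the MMAPv1 storage engine (it was removed in later MongoDB releases):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class WarmCollection {
    public static void main(String[] args) {
        // Placeholder connection string, database, and collection names.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("mydb");
            // touch asks mongod to page the collection's data and/or indexes into RAM.
            Document result = db.runCommand(
                    new Document("touch", "orders")
                            .append("data", true)
                            .append("index", true));
            System.out.println(result.toJson());
        }
    }
}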

How is a program loaded by the OS?

I am reading about logical and physical addressing. I am confused: when a binary file is run, does it pass first through the CPU, where the logical addresses are generated, or is it directly copied to physical memory?
when a binary file is run, does it pass first through the CPU, where the logical addresses are generated, or is it directly copied to physical memory?
Typically some code somewhere loads the executable file's headers into memory, and then uses information from the headers to figure out where various pieces of the file (sections, e.g. .text, .data, etc.) should be in virtual memory and what each virtual page's virtual permissions should be (whether writes are allowed, whether execution is allowed).
After this, areas of the virtual address space are set up. Often this is done by memory mapping the relevant parts of the file into the virtual address space, without actually loading them into physical memory. In this case each page's actual permissions don't initially reflect the page's virtual permissions. For example, a "read/write" page might be "not present" initially; when software tries to read from the page you'll get a page fault, and the page fault handler might fetch the page from disk and change the page to "present, read only". Later, when software tries to write to the page, you might get a second page fault, and the page fault handler might do a "copy on write" (so that anything else using the same physical page isn't affected) and then make the new copy "read/write" so that it matches the original virtual permissions.
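As a rough user-space illustration of this lazy, copy-on-write mapping behaviour, here is a small Java sketch using a private file mapping; the file name is a placeholder, and this only mimics what a loader does rather than being the loader itself:
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PrivateMappingDemo {
    public static void main(String[] args) throws Exception {
        // "program.bin" is a placeholder; any existing, non-empty file will do.
        try (RandomAccessFile file = new RandomAccessFile("program.bin", "rw");
             FileChannel channel = file.getChannel()) {
            // MapMode.PRIVATE gives a copy-on-write mapping: nothing is read yet,
            // the file's pages are just made available in virtual memory.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.PRIVATE, 0, channel.size());
            byte first = buf.get(0);  // first access may page-fault and pull the page in from disk
            buf.put(0, (byte) 42);    // write triggers copy-on-write; the file on disk is unchanged
            System.out.println("first byte was " + first);
        }
    }
}
The kernel does the equivalent for an executable's sections, applying the page permissions described above.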
While this is happening, the OS could (depending on the amount of free physical RAM and whether storage devices have more important data to transfer) be prefetching the file data from disk (e.g. into the VFS cache), and could be "opportunistically" updating the process's page tables to avoid the overhead of page faults for pages that have already been prefetched.
However, if the OS knows that the file is on unreliable and/or removable media, it may decide that using memory mapped files is a bad idea and may actually load the needed executable sections into memory before executing it. An OS could also have other features that cause the file to be loaded into RAM before it's executed (e.g. if the OS checks that an executable file's digital signature is correct before allowing the file to be executed, then the entire file probably needs to be loaded into memory so that the digital signature can be checked, and in that case the entire file is likely to still be in memory when the virtual address space is set up).
You need to read an entire book on these topics and spend several weeks on it.
But Operating Systems: Three Easy Pieces is a good book, and it is freely downloadable.
Once you have read it, perhaps also look into osdev.org for practical things. And don't forget free software OSes such as Linux; see e.g.
https://kernelnewbies.org/
Be aware of copy-on-write and virtual address space...
An executable file is in a sense a program itself, interpreted and executed by the loader. The executable contains instructions that tell the loader how the program should exist in VIRTUAL memory; by that, I mean the instructions in the executable define the initial VIRTUAL representation of the process address space.
So when the executable starts, there is only a virtual representation of the address space in secondary storage. As the program executes, it starts page faulting repeatedly to load pages into memory. After the initial load, the page fault rate dies down.
The executable NORMALLY only contains logical addresses.

Magento 2 website goes down every day and I need to restart the server

I have an e-commerce website on Magento 2.2.2 and it keeps going down almost every day. Whenever it goes down, users get a "site took too long to respond" error and it never loads. To get the website working again, I have to restart the server, and then it works.
Total space on the server is 50 GB, of which the whole website is around 18 GB (11 GB of media files plus vendor files, etc.). Here are the things that I cannot figure out:
a.) The server shows that 33 GB has been used, although it should show only 18 GB used. I have checked everywhere and I can't find what is consuming the additional 15 GB of space. The complete HTML folder is only 18 GB.
b.) When I checked the log files, they show the following:
WARNING: Memory size allocated for the temporary table is more than 20% of innodb_buffer_pool_size. Please update innodb_buffer_pool_size or decrease batch size value (which decreases memory usages for the temporary table). Current batch size: 100000; Allocated memory size: 280000000 bytes; InnoDB buffer pool size: 1073741824 bytes.
I have already set innodb_buffer_pool_size to 2 GB, but this problem keeps coming back.
The server is an Amazon EC2 server and Magento is in production mode. Will allocating 100 GB instead of 50 GB solve the problem?
Update: I increased the InnoDB buffer pool size to 10 GB and the logs no longer show the error, but the server still goes down every day. Since our server has only 4 GB of RAM, could that be the main cause? Everyone seems to suggest at least 8 GB of RAM.
Try the things below.
Magento 2 has large log files and a caching system, so the files in your var folder may keep growing.
You should also check whether your site has more than 3000 products with large product images, and whether you are storing all of these on the server itself.
My suggestion: if your site has that many products, as mentioned above, use a CDN for better performance, so that all the images are served by a third party.
Next, set up Cloudflare to reduce downtime errors and the impact on customers. You can make your index page load even while the server is down, and obviously you should write a script to restart the site automatically when it is down.
On the server side, check the PHP memory limit; it is better to set it to 2G.
On the MySQL side: check whether there are sleeping queries. If they come from your custom extension code, ask your developer to optimize it.
For example, the code may be loading a whole collection just to fetch a single item.
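As a hedged sketch of that check, something like the following JDBC snippet can list long-sleeping MySQL connections; the JDBC URL, credentials, and the 60-second threshold are placeholders, and it assumes the MySQL Connector/J driver is on the classpath:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FindSleepingConnections {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL and credentials.
        String url = "jdbc:mysql://localhost:3306/magento";
        try (Connection conn = DriverManager.getConnection(url, "magento", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW FULL PROCESSLIST")) {
            while (rs.next()) {
                String command = rs.getString("Command");
                long seconds = rs.getLong("Time");
                // Connections sleeping for a long time often point to code or
                // extensions that hold connections open instead of releasing them.
                if ("Sleep".equalsIgnoreCase(command) && seconds > 60) {
                    System.out.printf("id=%d user=%s sleeping for %ds%n",
                            rs.getLong("Id"), rs.getString("User"), seconds);
                }
            }
        }
    }
}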
You can use a tool like New Relic.
If everything is fine on the development side, try to optimize things on the server side as well: memory limits, killing stuck MySQL queries, and so on.
Meanwhile, Magento is a big platform for the e-commerce sector, so it covers a lot of ground by default. It is better to remove unwanted modules from your active site, e.g. disable the core modules you are not using.
For an average site, use 16 GB of RAM.
Did you restart your MySQL server to make the change take effect?
Also, you need to set that buffer up to 20971520000, which is around 20 GB.
Magento uses a lot of sessions and cache.

Why do we need a page cache when there is virtual memory?

A simple question about operating systems. I know virtual memory handles memory mapping for us. When some data we need is not in memory, the VM system will page it in and copy the data into main memory, and if we are short of memory it will also page out some stale pages to disk. My question is: since virtual memory already handles this, why do we need a page cache? It seems to me the VM system already makes main memory a cache for the disk.
Virtual memory: lets each process see the whole address space (RAM) as its own, and as bigger than the actual physical memory (swapping to disk as necessary).
Page cache: caches the contents of files opened from some drive (a filesystem thing).
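In other words, the page cache is the kernel's cache of file contents, while virtual memory is the mechanism that lets those cached pages show up in a process's address space. A small Java sketch of the two ways file data reaches a program (the file name is a placeholder):
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheVsMapping {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("data.bin"); // placeholder file
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // 1. Ordinary read: the kernel copies bytes from its page cache
            //    into this user-space buffer.
            ByteBuffer copy = ByteBuffer.allocate(4096);
            channel.read(copy, 0);

            // 2. Memory mapping: the file's cached pages are mapped directly into
            //    this process's virtual address space, with no extra copy.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte b = mapped.get(0); // a page fault may pull this page into the page cache first
            System.out.println("first byte: " + b);
        }
    }
}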

What is a persistent disk?

I have been doing an online course on MongoDB; however, I frequently come across the term "persistent disk". I have googled it, but did not get a satisfying answer. Could you please help me out?
When your application runs (or Mongo in this case), it does all its work in RAM. RAM is not persistent because when you turn off your computer you permanently lose all the information stored in RAM.
Hard drives and SSDs, on the other hand, are persistent. If you write something to them, it sticks for good.
So when you tell Mongo to insert a document, it first gets written into RAM, and then eventually gets written to the (persistent) disk. From an app-dev perspective this is all hidden from you, and you (generally) just assume it's written straight to disk.
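If you do want to wait until a write has actually reached persistent storage rather than just RAM, you can ask for that explicitly. Here is a hedged sketch using the MongoDB Java driver's JOURNALED write concern; the connection string, database, and collection names are placeholders:
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class JournaledInsert {
    public static void main(String[] args) {
        // Placeholder connection string, database, and collection names.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders = client
                    .getDatabase("shop")
                    .getCollection("orders")
                    .withWriteConcern(WriteConcern.JOURNALED);
            // With the JOURNALED write concern the insert is only acknowledged
            // once the write has been committed to the on-disk journal,
            // i.e. once it has actually reached persistent storage.
            orders.insertOne(new Document("item", "book").append("qty", 1));
        }
    }
}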
This is a good video from what I guess to be the Mongo course you're taking that talks about the transition from memory to disk.

Caching strategy to reduce load on web application server

What is a good tool for applying a layer of caching between a web server and an application server?
Basic Requirements:
The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Strategies I have considered:
Static file caching. A request comes in and gets hashed; if a file exists we serve it, and if not we route the request to the app server. Are high I/O or file-locking problems a concern due to concurrency? Is it accurate that the filesystem is actually very fast due to kernel-level caching in memory? (A rough sketch of this approach appears after this list.)
Using a key-value DB like MongoDB or Redis. This would store the finished HTML/JSON fragments in the DB. The web server would be equipped to read from the DB and route to the app server if needed. The app server would be equipped to insert into and remove from the DB.
A memory cache like memcached or Varnish (I don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3-10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
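For reference, here is a minimal Java sketch of strategy (1), assuming the cached responses are HTML and ignoring expiry and locking; the class name, directory layout, and hash choice are illustrative only:
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Optional;

public class StaticFileCache {
    private final Path cacheDir;

    public StaticFileCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Hash the request URI to get a stable cache file name.
    private Path fileFor(String requestUri) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(requestUri.getBytes(StandardCharsets.UTF_8));
        return cacheDir.resolve(new BigInteger(1, digest).toString(16) + ".html");
    }

    // Web server side: serve the cached body if present, otherwise route to the app server.
    public Optional<String> lookup(String requestUri) throws Exception {
        Path file = fileFor(requestUri);
        return Files.exists(file) ? Optional.of(Files.readString(file)) : Optional.empty();
    }

    // App server side: write the rendered body so the next request is a cache hit.
    public void store(String requestUri, String body) throws Exception {
        Files.createDirectories(cacheDir);
        Files.writeString(fileFor(requestUri), body);
    }
}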
Any thoughts on some techniques and pitfalls when trying this type of caching layer?
You can also use GigaSpaces XAP, an in-memory data grid, for caching and even for hosting your web application. You can choose just the caching option, or combine the power of the two and gain single management of your environment, among other things.
Unlike the key-value-pair approach you suggested, with GigaSpaces XAP you'll be able to use complex queries such as SQL, object-based templates and much more. For your caching scenario you should look more specifically at the local-cache-related features:
Local Cache
Web Container
Disclaimer: I am a developer at GigaSpaces.
Eitan
Just to answer this from the POV of using Coherence (http://coherence.oracle.com/):
1. The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
// remove one item from cache
cache.remove(key);
// remove multiple items from cache
cache.keySet().removeAll(keylist);
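Requirement 1 also asks about putting items in the cache with an expiration date; in the same style, a short sketch (assuming the Coherence NamedCache put overload that takes a time-to-live in milliseconds):
// put one item into the cache with a per-entry expiration (time-to-live)
cache.put(key, value, 60000); // this entry expires roughly 60 seconds after it is put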
2. The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
// access one item from cache
Object value = cache.get(key);
// access multiple items from cache
Map mapKV = cache.getAll(keylist);
3. It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Elastic. Just add nodes. Auto-discovery. Auto-load-balancing. No data loss. No interruption. Every time you add a node, you get more data capacity and more throughput.
Automatic high availability (HA). Kill a process, no data loss. Kill a server, no data loss.
A memory cache like memcached or Varnish (don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3 - 10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
Use both RAM and flash. Transparently. Easily handle 10s or even 100s of gigabytes per Coherence node (e.g. up to a TB or more per physical server).
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.