Does it make sense to cache data obtained from a memory mapped file?

Does it make sense to cache data obtained from a memory mapped file? - memory-mapped-files

Or it would be faster to re-read that data from mapped memory once again, since the OS might implement its own cache?
The nature of data is not known in advance, it is assumed that file reads are random.

i wanted to mention a few things i've read on the subject. The answer is no, you don't want to second guess the operating system's memory manager.
The first comes from the idea that you want your program (e.g. MongoDB, SQL Server) to try to limit your memory based on a percentage of free RAM:
Don't try to allocate memory until there is only x% free
Occasionally, a customer will ask for a way to design their program so it continues consuming RAM until there is only x% free. The idea is that their program should use RAM aggressively, while still leaving enough RAM available (x%) for other use. Unless you are designing a system where you are the only program running on the computer, this is a bad idea.
(read the article for the explanation of why it's bad, including pictures)
Next comes from some notes from the author of Varnish, and reverse proxy:
Varnish Cache - Notes from the architect
So what happens with squids elaborate memory management is that it gets into fights with the kernels elaborate memory management, and like any civil war, that never gets anything done.
What happens is this: Squid creates a HTTP object in "RAM" and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive.
Imagine you do cache something from a memory-mapped file. At some point in the future that memory holding that "cache" will be swapped out to disk.
the OS has written to the hard-drive something which already exists on the hard drive
Next comes a time when you want to perform a lookup from your "cache" memory, rather than the "real" memory. You attempt to access the "cache", and since it has been swapped out of RAM the hardware raises a PAGE FAULT, and cache is swapped back into RAM.
your cache memory is just as slow as the "real" memory, since both are no longer in RAM
Finally, you want to free your cache (perhaps your program is shutting down). If the "cache" has been swapped out, the OS must first swap it back in so that it can be freed. If instead you just unmapped your memory-mapped file, everything is gone (nothing needs to be swapped in).
in this case your cache makes things slower
Again from Raymon Chen: If your application is closing - close already:
When DLL_PROCESS_DETACH tells you that the process is exiting, your best bet is just to return without doing anything
I regularly use a program that doesn't follow this rule. The program
allocates a lot of memory during the course of its life, and when I
exit the program, it just sits there for several minutes, sometimes
spinning at 100% CPU, sometimes churning the hard drive (sometimes
both). When I break in with the debugger to see what's going on, I
discover that the program isn't doing anything productive. It's just
methodically freeing every last byte of memory it had allocated during
its lifetime.
If my computer wasn't under a lot of memory pressure, then most of the
memory the program had allocated during its lifetime hasn't yet been
paged out, so freeing every last drop of memory is a CPU-bound
operation. On the other hand, if I had kicked off a build or done
something else memory-intensive, then most of the memory the program
had allocated during its lifetime has been paged out, which means that
the program pages all that memory back in from the hard drive, just so
it could call free on it. Sounds kind of spiteful, actually. "Come
here so I can tell you to go away."
All this anal-rententive memory management is pointless. The process
is exiting. All that memory will be freed when the address space is
destroyed. Stop wasting time and just exit already.
The reality is that programs no longer run in "RAM", they run in memory - virtual memory.
You can make use of a cache, but you have to work with the operating system's virtual memory manager:
you want to keep your cache within as few pages as possible
you want to ensure they stay in RAM, by the virtue of them being accessed a lot (i.e. actually being a useful cache)
Accessing:
a thousand 1-byte locations around a 400GB file
is much more expensive than accessing
a single 1000-byte location in a 400GB file
In other words: you don't really need to cache data, you need a more localized data structure.
If you keep your important data confined to a single 4k page, you will play much nicer with the VMM; Windows is your cache.
When you add 64-byte quad-word aligned cache-lines, there's even more incentive to adjust your data structure layout. But then you don't want it too compact, or you'll start suffering performance penalties of cache flushes from False Sharing.

The answer is highly OS-specific. Generally speaking, there will be no sense in caching this data. Both the "cached" data as well as the memory-mapped can be paged away at any time.
If there will be any difference it will be specific to an OS - unless you need that granularity, there is no sense in caching the data.

Related

What does TCM connection with Icache in this RISCV version?

In the middle of this page (https://github.com/ultraembedded/riscv), there is a block diagram about the core, I really do not know what is TCM doing in the same block with the Icache ? Is it an optional thing to be inside the CPU ?

Some embedded systems provide dedicated memory for code and/or for data.  On some of these systems, Tightly-Coupled Memory serves as a replacement for the (instruction) cache, while on other such systems this memory is in addition to and along side a cache, applying to a certain portion of the address space.  This dedicated memory may be on the chip of the processor.
This memory could be some kind of ROM or other memory that is initialized somehow prior to boot.  In any case, TCM typically isn't backed by main memory, so doesn't suffer cache misses and the associated circuitry, usually also has high performance, like a cache when a hit occurs.
Some systems refer to this as Instruction Tightly Integrated Memory, ITIM, or Data Tightly Integrated Memory, DTIM.
When a system uses ITIM or DTIM, it performs more like a Harvard architecture than the Modified Harvard architecture of laptops and desktops.

The cache has no address space. CPU does not ask for data from the cache, it just asks for a data, then the memory controller first checks the cache if the data is present in the cache. If it is in the cache, data is fetched, if not then the controller checks the RAM. All processor does is ask for data, it does not care where the data came from. In the case of TCM, the CPU can directly write data to TCM and ask data from TCM since it has a specific address. Think of TCM as a RAM that is close to the CPU.

Why program is executed on Memory not HardDisk?

when I study in Computer Architecture and System Programming, some question rises up.
First of all, program is in SSD or Hard Disk but when it executed, this load to memory(RAM). Why program is not executed on HardDisk directly?? Why need to load on RAM?
Thanks

This is simply done because your RAM is way faster than your Hard Disk.
When your computer executes a program, the CPU reads all the instructions from memory one after another and executes them. The CPU itself cannot store the whole program while executing it, so it has to be read from somewhere else. If the CPU had to read the instructions from a hard disk, it would be crazy slow.
Now that we have SSDs this is becoming somewhat less relevant, but in the old days the difference between RAM ("Random Access Memory") and HDD ("Hard Disk Drive") was that the RAM could access any memory address at any point in time, thus "Random Access". The HDD would have to rotate the hard disk your data is stored on to read from a certain address. Accessing random memory addresses is very hard for an HDD.
When the CPU executes a program it has to jump around all the time. It also has to store the program's memory somewhere and access that as quickly as possible whenever needed. An HDD is very bad at those two things, a RAM is very good.
So why did we use HDDs at all? Because RAM
is way to expensive
does not persist data when turned off
And what about SSDs? They are a lot better at random access that HDDs, but they're still considerably slower than RAM.
Also, you have to take swap files into account. The computer can use some of your HDD or SSD storage as system memory if it needs to. This can be very useful if the data that's using up your RAM does not get accessed by the CPU very often.

Does a user process have any control over paging?

A program might have some data that, when needed, it wants to access very fast. Let's call this VIP data. It would like to reduce the likelihood that page in memory that the VIP data resides on gets swapped to disk when memory utilization is high on the system. What types of control/influence does it have over this?
For example, I think it can consider the page replacement policy and try to influence the OS to not swap this VIP data to disk. If the policy is LRU, the program can periodically read the VIP data to ensure that the page has always been accessed fairly recently. A program can also use a very small amount of memory in total, making it likely that all its pages are recently accessed when it runs and therefore the VIP data is not likely swapped to disk.
Can it exert any more explicit control over paging?

In order to do this, you might consider
Prioritising the process using renice command or
Lock the processes in the main memory using MLOCK(2)

This is entirely operating system dependent. On some systems, if you have appropriate privileges you can lock pages in physical memory.

Cache request in Forth CPU

In the RAM of the computer a longer program is stored, written in the Forth programming language. It contains of three subfunctions and takes too many bytes for storing them in the first-level-cache of the CPU.
A caching strategy is needed which decides, that only a window of the RAM is copied into the CPU cache. The Forth CPU is able to read from the first-level-cache, but not from the RAM because access to the RAM takes to much time.
In the figure, the instruction pointer is directing to the first word and has executed one command so far and will reach the end of the word “:add” in a short time. But what will happen then?
If the CPU has executed the last command in the first-level-cache (+) there are no further commands. Which part of the Forth CPU decides to unload the current content of the CPU cache and retrieves the new chunk from the RAM?
I'm not sure if this has something to do with “block buffers”, “stack caching in the Forth VM” or “hardware-based CPU cache. The literature is a bit confusing:
Minforth Blockbuffers
Programming Forth, page 117
Stack Caching for Interpreters

Actually, it seems to be a general question not specific to a Forth CPU.
The part of a CPU that retrieves data from memory into the cache is known as cache controller unit or cache control circuitry.
See also: Microprocessor Design/Cache wikibook, How does CPU identify if the instructions are decoded question.

How to keep 32 bit mongodb memory usage down on changing dataset

I'm using MongoDB on a 32 bit production system, which sucks but it's out of my control right now. The challenge is to keep the memory usage under ~2.5GB since going over this will cause 32 bit systems to crash.
According to the mongoDB team, the best way to track the memory usage is to use your operating system's process tracking system (i.e. ps or htop on Unix systems; Process Explorer on Windows.) for virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. receiving data at regular intervals from sensors, and every day a cron job wipes all data from before the last 3 days. Over a period of time, the memory usage slowly increases. I took some notes over time using db.serverStats(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has reduced in the last month but the memory usage increased nonetheless.
Now, there is some scope for adjustment in how many days of data I store. Today I deleted basically half of the data, and then restarted mongodb, and yet the mem virtual / mem mapped and most importantly memory usage according to ps have hardly changed! Why do these not reduce when I wipe data (and restart)? I read some other questions where people said that mongo isn't really using all the memory that it might appear to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5GB limit?
Unless there is a way to stem this dataset-size-irrespective gradual increase in memory usage, it seems to me that the 32-bit version of Mongo is unuseable. Note: I don't mind losing a bit of performance if it solves the problem.

To answer regarding why the mapped and virtual memory usage does not decrease with the deletes, the mapped number is actually what you get when you mmap() the entire set of data files. This does not shrink when you delete records, because although the space is freed up inside the data files, they are not themselves reduced in size - the files are just more empty afterwards.
Virtual will include journal files, and connections, and other non-data related memory usage also, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So, the 2GB storage size limitation on 32-bit will actually apply to the data files whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and will require the database to be offline/unavailable while it was run. It will also need up to 2x the original size in terms of free disk space to be able to run the repair, since it essentially represents writing out the files again from scratch.
This limitation, and the problems it causes, is why the 32-bit version should not be run in production, it is just not suitable. I would recommend getting onto a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the available memory it can in terms of resident memory (which is why you should always have swap configured to the OOM Killer does not come into play).
Once that is used, the OS does not stop allocating memory, it will just have the oldest items paged out to make room for the new data (LRU). In other words, the recycling of memory will be done for you, and the resident memory level will remain fairly constant.

Your options for stretching 32-bit are limited, but you can try some things. The thing that you run out of is address space, and the increases in the sizes of additional database files mean that you would like to avoid crossing over the boundary from "n" files to "n+1". It may be worth structuring your data into more or fewer databases so that you can get the maximum amount of actual data into memory and as little as possible "dead space".
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.

It is a well-known message that 32-bit should not be used for production.
Use 64-bit systems.
Point.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse