MPU and Cache Relation - Cortex-R4 / Cortex-M3

While configuring the MPU in a Cortex-R4, or configuring the cache, which memory attributes need to be considered?
What is the relation between the memory protection unit's attributes and the cache and memory attributes?

Related

Logging memory used by Entity Framework Core

Can I somehow log the size of data loaded to memory by Entity Framework?
It is easy to load a lot of data into memory by putting an AsEnumerable before the where clause.
Is the peak memory usage for each query loggable?

Is TLB used at all in the instruction fetching pipeline

Is a TLB used at all in the instruction fetching pipeline?
Is this architecture- or microarchitecture-dependent?
Typically, a processor that supports paging (which usually includes a mechanism for excluding execute permission, even if not separately from read permission) will access a TLB as part of instruction fetch.
A virtually tagged instruction cache would not require a TLB access even for permission checks, provided three conditions hold. First, permissions are checked when a block is inserted into the instruction cache (which typically involves a TLB access, though a permission cache could be used with a virtually tagged L2 cache; this includes prefetches into the instruction cache). Second, the permission domain is included with the virtual tag (typically the same as an address space identifier, which is useful anyway to avoid cache flushing). Third, system software ensures that blocks are removed when execute permission is revoked (or when the permission domain/address space identifier is reused for a different permission domain/address space).
(In general, virtually tagged caches do not need a translation lookaside buffer; a cache of permission mappings is sufficient or permissions can be cached with the tag and an indication of the permission domain. Before accessing memory a TLB would be used, but cache hits would not require translation. Permission caching is less expensive than translation caching both because the granularity can be larger and because fewer bits are needed to express permission information.)
A physically tagged instruction cache would require address translation for hit determination, but this can be delayed significantly by speculating that the access was a hit (likely using way prediction). Hit determination can be delayed even to the time of instruction commit/result writeback, though earlier handling is typically better.
Because instruction accesses typically have substantial spatial locality, a very small TLB can provide decent hit rates and a reasonably fast, larger back-up TLB can reduce miss costs. Such a microTLB can facilitate sharing a TLB between data and instruction accesses by filtering out most instruction accesses.
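To make the two-level arrangement concrete, here is a minimal software model (not any particular core's design): a tiny direct-mapped microTLB that filters fetches before falling back to a larger back-up TLB. All sizes and names are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT   12   /* 4 KiB pages, an illustrative choice */
    #define MICRO_TLB_SZ 4    /* tiny, fast first-level TLB */
    #define MAIN_TLB_SZ  64   /* larger, slower back-up TLB */

    typedef struct {
        uint64_t vpn;    /* virtual page number */
        uint64_t pfn;    /* physical frame number */
        bool     valid;
    } tlb_entry;

    static tlb_entry micro_tlb[MICRO_TLB_SZ];
    static tlb_entry main_tlb[MAIN_TLB_SZ];

    /* Translate an instruction-fetch address; returns true on a hit.
     * Spatial locality means most fetches hit the microTLB, so the
     * back-up TLB (which could be shared with data accesses) only
     * sees the misses. */
    static bool itlb_translate(uint64_t vaddr, uint64_t *paddr)
    {
        uint64_t vpn = vaddr >> PAGE_SHIFT;
        uint64_t off = vaddr & ((1ull << PAGE_SHIFT) - 1);

        tlb_entry *u = &micro_tlb[vpn % MICRO_TLB_SZ];
        if (u->valid && u->vpn == vpn) {        /* microTLB hit */
            *paddr = (u->pfn << PAGE_SHIFT) | off;
            return true;
        }
        tlb_entry *m = &main_tlb[vpn % MAIN_TLB_SZ];
        if (m->valid && m->vpn == vpn) {        /* back-up TLB hit */
            *u = *m;                            /* refill the microTLB */
            *paddr = (m->pfn << PAGE_SHIFT) | off;
            return true;
        }
        return false;  /* miss: a real core would walk the page table */
    }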
Obviously, an architecture that does not support paging would not use a TLB (though it might use a memory protection unit to check that an access is permitted or use a different translation mechanism such as adding an offset possibly with a bounds check). An architecture oriented toward single address space operating systems would probably use virtually tagged caches and so access a TLB only on cache misses.
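As a contrast with paging, a sketch of the offset-plus-bounds-check translation mentioned above, under the assumption of a single relocatable region per process (names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* One relocation region: every access is checked against a limit
     * and then offset by a base; no page table or TLB is involved. */
    typedef struct {
        uint64_t base;   /* physical base added to every access */
        uint64_t limit;  /* region size in bytes */
    } segment;

    static bool translate(const segment *s, uint64_t vaddr, uint64_t *paddr)
    {
        if (vaddr >= s->limit)
            return false;         /* bounds violation -> fault */
        *paddr = s->base + vaddr; /* simple offset translation */
        return true;
    }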

Who places the data onto the cache?

We have locality of reference, on the basis of which data is copied to the cache, but who is responsible for this?
Is there any hardware or software which performs this action?
The CPU reads/writes data into the cache when an instruction that accesses memory is executed.
So it's an on-demand service; data is moved upon a request.
It then tries to keep the data in the cache as long as possible, until there is no more space and a replacement policy is used to evict a line in favor of new data.
The minimal unit of data transferred is called a line, and it is usually bigger than the register size (to exploit spatial locality).
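A rough way to see line granularity from software, assuming 64-byte lines (the actual size varies by CPU) and an array much larger than the cache: a sequential pass misses about once per line, while a pass strided by one line misses on every access.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)   /* 16M ints, far larger than any cache */

    int main(void)
    {
        int *a = malloc(N * sizeof *a);
        if (!a) return 1;
        for (int i = 0; i < N; i++)  /* touch every page up front */
            a[i] = i;

        /* Sequential: 16 consecutive ints share one 64-byte line, so
         * only about 1 in 16 accesses misses. */
        clock_t t0 = clock();
        long sum = 0;
        for (int i = 0; i < N; i++)
            sum += a[i];
        clock_t t1 = clock();

        /* Strided by 16 ints (one line): every access lands on a new
         * line, so each pass misses on every access. */
        for (int off = 0; off < 16; off++)
            for (int i = off; i < N; i += 16)
                sum += a[i];
        clock_t t2 = clock();

        printf("sequential %.2fs, strided %.2fs (sum=%ld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        free(a);
        return 0;
    }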
Some CPUs have a prefetcher that, upon recognizing specific memory access patterns, tries to automatically move data into the cache before it is actually requested by the program.
Some architectures have instructions that act as hints for the CPU to prefetch data from a specific address.
This lets the software have minimal control over the prefetching circuitry; however, if the software just wants to move data into the cache, it only has to read that data (the CPU will cache it, if caching is enabled in that region).
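From C, one way to issue such a hint is GCC/Clang's __builtin_prefetch, which typically compiles down to the target's prefetch instruction (PLD on ARM, for instance). The look-ahead distance of 16 elements below is just an illustrative guess:

    #include <stddef.h>

    /* Sum an array while hinting the cache a few lines ahead. */
    long sum_with_prefetch(const int *a, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)  /* rw=0: read, locality=3: keep resident */
                __builtin_prefetch(&a[i + 16], 0, 3);
            sum += a[i];
        }
        return sum;
    }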

Named Cache vs Regions

I have the following requirements:
1) My cache framework should support a global cache and application-wise caches (based on the country in which the application is deployed).
2) The global cache will contain the objects shared by all the applications.
3) For the application-wise cache, should I use named caches or regions? And why?
Regions offer searching capabilities, but by limiting cached objects to a single cache host, the use of regions presents a trade-off between functionality and scalability.
A named cache, also referred to simply as a cache, is the default container and spans all cache hosts in the cluster. Even if there is a limit of 128 named caches, you can work around that limit by using a prefix in your cache key.

mmap() internals

It's widely known that the most significant feature of mmap() is that a file mapping is shared between many processes. It's no less widely known that every process has its own address space.
The question is: where is the memory-mapped file (more specifically, its data) truly kept, and how can processes get access to this memory?
I don't mean *(pa+i) and other high-level stuff; I mean the internals of the process.
This happens at the virtual memory management layer in the operating system. When you memory map a file, the memory manager basically treats the file as if it were swap space for the process. As you access pages in your virtual memory address space, the memory mapper has to interpret them and map them to physical memory. When you cross a page boundary, this may cause a page fault, at which time the OS must map a chunk of disk space to a chunk of physical memory and resolve the memory mapping. With mmap, it simply does so from your file instead of its own swap space.
If you want lots of details of how this happens, you'll have to tell us which operating system you're using, as implementation details vary.
This is very implementation-dependent, but the following is one possible implementation:
When a file is first memory-mapped, the data isn't stored anywhere at first; it's still on disk. The virtual memory manager (VMM) allocates a range of virtual memory addresses to the process for the file, but those addresses aren't immediately added to the page table.
When the program first tries to read or write to one of those addresses, a page fault occurs. The OS catches the page fault, figures out that the address corresponds to a memory-mapped file, and reads the appropriate disk sector into an internal kernel buffer. Then, it maps the kernel buffer into the process's address space and restarts the user instruction that caused the page fault. If the faulting instruction was a read, we're all done for now. If it was a write, the data is written to memory, and the page is marked as dirty. Subsequent reads or writes to data within the same page do not require reading/writing to/from disk, since the data is already in memory.
When the file is flushed or closed, any pages which have been marked dirty are written back to disk.
Using memory-mapped files is advantageous for programs which read or write disk sectors in a very haphazard manner. You only read disk sectors which are actually used, instead of reading the entire file.
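A minimal user-space sketch of this mechanism, assuming a POSIX system (the file name is illustrative): mmap itself reads nothing, and each page is pulled in by the fault taken on its first access.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);   /* illustrative file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* No data is read here: the kernel only records the mapping. */
        const unsigned char *p =
            mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* The first touch of each page takes the page fault that
         * actually brings that part of the file into memory. */
        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];
        printf("sum = %ld\n", sum);

        munmap((void *)p, st.st_size);
        close(fd);
        return 0;
    }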
I'm not really sure what you are asking, but mmap() sets aside a chunk of virtual memory to hold the given amount of data (usually anonymous memory, though it can also be file-backed).
A process is an OS entity, and it gains access to memory-mapped areas through the OS-prescribed method: calling mmap().
The kernel has internal buffers representing chunks of memory. Any given process is assigned a memory mapping in its own address space which refers to that buffer. A number of processes may have their own mappings, but they all end up resolving to the same chunk (via the kernel buffer).
This is a simple enough concept, but it can get a little tricky when processes write. To keep the read-only case simple, there's usually copy-on-write functionality that's only used as needed.
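A small sketch of that sharing, assuming a Unix-like system with MAP_ANONYMOUS: the MAP_SHARED mapping is created before fork(), so parent and child hold separate page-table entries that resolve to the same kernel-managed pages.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Shared anonymous mapping: both processes' page tables will
         * point at the same physical pages backing this region. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        pid_t pid = fork();
        if (pid == 0) {                    /* child writes... */
            strcpy(buf, "written by child");
            return 0;
        }
        waitpid(pid, NULL, 0);
        printf("parent sees: %s\n", buf);  /* ...parent sees it */
        munmap(buf, 4096);
        return 0;
    }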
Any data will be in some form of storage: in some cases an HDD, in embedded systems perhaps flash memory, or even RAM itself (initramfs). Barring the last one, data from that storage is frequently cached in RAM. RAM is logically divided into pages, and the kernel maintains a list of descriptors which uniquely identify each page.
So at best, accessing the data means accessing physical pages. Each process gets its own process address space, which consists of many vm_area_structs, each identifying a mapped section of the address space. In a call to mmap, a new vm_area_struct may be created, or it may be merged with an existing one if the addresses are adjacent.
A new virtual address is returned by the call to mmap. New page-table entries are also created, which hold the mapping from the newly created virtual addresses to the physical addresses where the real data resides. The mapping can be backed by a file, or anonymous as with malloc. The process address-space structure, mm_struct, uses its pgd_t pointer (page global directory) to reach the physical page and access the data.
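Those vm_area_structs are visible from user space on Linux: each line of /proc/self/maps corresponds to one VMA, so a quick way to watch mmap create one is to dump that file before and after the call.

    #include <stdio.h>
    #include <sys/mman.h>

    /* Print /proc/self/maps: one line per vm_area_struct (Linux). */
    static void dump_maps(const char *label)
    {
        printf("--- %s ---\n", label);
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) return;
        char line[256];
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
    }

    int main(void)
    {
        dump_maps("before mmap");
        void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;
        dump_maps("after mmap");  /* a new anonymous VMA appears */
        munmap(p, 1 << 20);
        return 0;
    }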