mmap() internals - mmap

It's widely known that the most significant mmap() feature is that file mapping is shared between many processes. But it's not less widely known that every process has its own address space.
The question is where are memmapped files (more specifically, its data) truly kept, and how processes can get access to this memory?
I mean not *(pa+i) and other high-level stuff, but I mean the internals of the process.

This happens at the virtual memory management layer in the operating system. When you memory map a file, the memory manager basically treats the file as if it were swap space for the process. As you access pages in your virtual memory address space, the memory mapper has to interpret them and map them to physical memory. When you cross a page boundary, this may cause a page fault, at which time the OS must map a chunk of disk space to a chunk of physical memory and resolve the memory mapping. With mmap, it simply does so from your file instead of its own swap space.
If you want lots of details of how this happens, you'll have to tell us which operating system you're using, as implementation details vary.

This is very implementation-dependent, but the following is one possible implementation:
When a file is a first memory-mapped, the data isn't stored anywhere at first, it's still on disk. The virtual memory manager (VMM) allocates a range of virtual memory addresses to the process for the file, but those addresses aren't immediately added to the page table.
When the program first tries to read or write to one of those addresses, a page fault occurs. The OS catches the page fault, figures out that that address corresponds to a memory-mapped file, and reads the appropriate disk sector into an internal kernel buffer. Then, it maps the kernel buffer into the process's address space, and restarts the user instruction that caused the page fault. If the faulting instruction was a read, we're all done for now. If it was a write, the data is written to memory, and the page is marked as dirty. Subsequent reads or writes to data within the same page do not require reading/writing to/from disk, since the data is in memory.
When the file is flushed or closed, any pages which have been marked dirty are written back to disk.
Using memory-mapped files is advantageous for programs which read or write disk sectors in a very haphazard manner. You only read disk sectors which are actually used, instead of reading the entire file.

I'm not really sure what you are asking, but mmap() sets aside a chunk of virtual memory to hold the given amount of data (usually. It can be file-backed sometimes).
A process is an OS entity, and it gains access to memory mapped areas through the OS-proscribed method: calling mmap().

The kernel has internal buffers representing chunks of memory. Any given process is assigned a memory mapping in its own address space which refers to that buffer. A number of proccesses may have their own mappings, but they all end up resolving to the same chunk (via the kernel buffer).
This is a simple enough concept, but it can get a little tricky when processes write. To keep things simple in the read-only case there's usually a copy-on-write functionality that's only used as needed.

Any data will be in some form of memory or others, some cases in HDD, in embedded systems may be some flash memory or even the ram (initramfs), barring the last one, the data in the memory are frequently cached in the RAM, RAM is logical divided into pages and the kernel maintains a list of descriptors which uniquely identify an page.
So at best accessing data would be accessing the physical pages. Process gets there own process address space which consists of many vm_are_struct which identifies a mapped section in the address space. In a call to mmap, new vm_area_struct may be created or may be merged with an existing one if the addresses are adjacent.
A new virtual address is returned to the call to mmap. Also new page tables are created which consists the mapping of the newly created virtual addresses to the physical address where the real data resides. mapping can be done on a file, or anonymously like malloc. The process address space structure mm_struct uses the pointer of pgd_t (Page global directory) to reach the physical page and access the data.

Related

OS: how does kernel virtual memory help in making swap pages of the page table easier?

Upon reading this chapter from "Operating Systems: Three Easy Pieces" book, I'm confused of this excerpt:
If in contrast the kernel were located entirely in physical memory, it would be quite hard to do things like swap pages of the page table to disk;
I've been trying to make sense of it for days but still can't get as to how kernel virtual memory helps in making swap pages of the page table easier. Wouldn't it be the same if the kernel would live completely in physical memory as the pages of different page tables' processes would end up in physical memory in the end anyway (and thus be swapped to disk if needed)? How is it different when page tables reside in kernel virtual memory vs. in kernel-owned physical memory?
Let's assume the kernel needs to access a string of characters from user-space (e.g. maybe a file name passed to an open() system call).
With the kernel in virtual memory; the kernel would have to check that the virtual address of the string is sane (e.g. not an address for kernel's own data) and also guard against a different thread in user-space modifying the data the pointer points to (so that the string can't change while the kernel is using it, possibly after the kernel checked that the string itself is valid but before the kernel uses the string). In theory (and likely in practice) this could all be hidden by a "check virtual address range" function. Apart from those things; kernel can just use the string like normal. If the data is in swap space, then kernel can just let its own page fault handler fetch the data when kernel attempts to access it the data and not worry about it.
With kernel in physical memory; the kernel would still have to do those things (check the virtual address is sane, guard against another thread modifying the data the pointer points to). In addition; it would have to convert the virtual address to a physical addresses itself, and ensure the data is actually in RAM itself. In theory this could still be hidden by a "check virtual address range" function that also converts the virtual address into physical address(es).
However, the "contiguous in virtual memory" data (the string) may not be contiguous in physical memory (e.g. first half of the string in one page with the second half of the string in a different page with a completely unrelated physical address). This means that kernel would have to deal with this problem too (e.g. for a string, even things like strlen() can't work), and it can't be hidden in a "check (and convert) virtual address range" function to make it easier for the rest of the kernel.
To deal with the "not contiguous in physical memory" problem there's mostly only 2 possibilities:
a) A set of "get char/short/int.. at offset N using this list of physical addresses" functions; or
b) Refuse to support it in the kernel; which mostly just shifts unwanted burden to user-space (e.g. the open() function in a C library copying the file name string to a new page if the original string crossed a page boundary before calling the kernel's open() system call).
In an ideal world (which is what we live in now with kernels that are hundreds of megabytes in size running on machines with gigabytes of physical RAM) the kernel would never swap even parts of itself. But in the old days, when physical memory was a constraint, the less of the kernel in physical memory, the more the application could be in physical memory. The more the application is in physical memory, the fewer page faults in user space.
THe linux kernel has been worked over fairly extensively to keep it compact. Case in point: kernel modules. You can load a module using insmod or modprobe, and that module will become resident, but if nothing uses is, after a while it will get swapped out, and that's no big deal because nothing is using it.

What is meant by "CPU generates logical address space"?

As far as I read from the book(correct me I am wrong) after a compiler puts the compiled code in the storage, the CPU creates logical addresses and those logical addresses are mapped to physical memory through MMU(Memory Management Unit).Also I know that CPU directly cannot access anything other than the physical memory.
Then how does CPU produce logical addresses for the process in the first place?
It sounds like you have a bit of confusion about what things do.
The operating system defines the logical address space by setting up page tables that logical pages to physical page frames. The operating system loads hardware registers of the CPU so that it knows about the page tables it has defined.
This use of page tables to define logical address spaces is an integral part of a modern CPU. In some systems, the only use of physical addresses is within page tables.
The compiler generates an object code file that describes the instructions and data used and created.
The linker combines object code into an executable file that defines how the program will be loaded into memory.
The loader reads the instructions in an executable file and sets up the logical addresses space to run the program. The loader calls system routines that set up the page tables the define the logical addresses space.
For example, where the executable has read-only data, the loader will call OS routines to creates read-only pages in the logical address space and map them to the data in the executable file.

Memory usage of zfs for mapped files

I read the following on https://blogs.oracle.com/roch/entry/does_zfs_really_use_more
There is one peculiar workload that does lead ZFS to consume more
memory: writing (using syscalls) to pages that are also mmaped. ZFS
does not use the regular paging system to manage data that passes
through reads and writes syscalls. However mmaped I/O which is closely
tied to the Virtual Memory subsystem still goes through the regular
paging code . So syscall writting to mmaped pages, means we will keep
2 copies of the associated data at least until we manage to get the
data to disk. We don't expect that type of load to commonly use large
amount of ram
What does this mean exactly? does this mean that zfs will "uselessly" double cache any memory region that is backed by a memory mapped file? or does "using syscalls" mean writing using some other method of writing that I am not familiar with.
If so, am I better off keeping the working directories of files written this way on a ufs partition?
Does this mean that zfs will "uselessly" double cache any memory region that is backed by a memory mapped file?
Hopefully, no.
or does "using syscalls" mean writing using some other method of writing that I am not familiar with.
That method is just regular low level write(fd, buf, nbytes) system calls and similars and not what memory mapped files are designed to support: accessing file content just with reading / writing memory by using pointers, using the file data as a byte array or whatever.
If so, am I better off keeping the working directories of files written this way on a ufs partition?
No, unless memory mapped files that are also written to using system calls sum to a significant part of your RAM workload, which is quite unlikely to happen.
PS: Note that this blog is almost ten years old. There might have been changes in the implementation since that time.

What's the difference between page and block in operating system?

I have learned that in an operating system (Linux), the memory management unit (MMU) can translate a virtual address (VA) to a physical address (PA) via the page table data structure. It seems that page is the smallest data unit that is managed by the VM. But how about the block? Is it also the smallest data unit transfered between the disk and the system memory?
What is the difference between pages and blocks?
A block is the smallest unit of data that an operating system can either write to a file or read from a file.
What exactly is a page?
Pages are used by some operating systems instead of blocks. A page is basically a virtual block. And, pages have a fixed size – 4K and 2K are the most commonly used sizes. So, the two key points to remember about pages is that they are virtual blocks and they have fixed sizes.
Why pages may be used instead of blocks
Pages are used because they make processing easier when there are many storage devices, because each device may support a different block size. With pages the operating system can deal with just a fixed size page, rather than try to figure out how to deal with blocks that are all different sizes. So, pages act as sort of a middleman between operating systems and hardware drivers, which translate the pages to the appropriate blocks. But, both pages and blocks are used as a unit of data storage.
http://www.programmerinterview.com/index.php/database-sql/page-versus-block/
Generally speaking, the hard-disk is one of those devices called "block-devices" as opposed to "character-devices" because the unit of transferring data is in the block.
Even if you want only a single character from a file, the OS and the drive will get you a block and then give you access only to what you asked for while the rest remains in a specific cache/buffer.
Note: The block size, however, can differ from one system to another.
To clear a point:
Yes, any data transferred between the hard disk and the RAM is usually sent in blocks rather than actual bytes.
Data which is stored in RAM is managed, typically, by pages yes; of course the assembly instructions only know byte addresses.

Memory mapping and file I/O

If i have memory mapped a file of size 10GB in a 1GB machine and if i trigger a file i/o, after making sure that the data requested is not in physical memory, will the fetched data get mapped to the corresponding virtual address in mmap?
When i access the same location using mmap, will it again do an i/o (or will it make use of the data that was fetched using file i/o)
Thanks in advance,
Gokul.
It depends on the platform, but in general it'll be treated like other memory (swapped out when not in use, swapped in when required), except that instead of using the normal swap files/partitions it swaps from the original file on disk.