Memory mapping and file I/O - mmap

If i have memory mapped a file of size 10GB in a 1GB machine and if i trigger a file i/o, after making sure that the data requested is not in physical memory, will the fetched data get mapped to the corresponding virtual address in mmap?
When i access the same location using mmap, will it again do an i/o (or will it make use of the data that was fetched using file i/o)
Thanks in advance,
Gokul.

It depends on the platform, but in general it'll be treated like other memory (swapped out when not in use, swapped in when required), except that instead of using the normal swap files/partitions it swaps from the original file on disk.

Related

Memory usage of zfs for mapped files

I read the following on https://blogs.oracle.com/roch/entry/does_zfs_really_use_more
There is one peculiar workload that does lead ZFS to consume more
memory: writing (using syscalls) to pages that are also mmaped. ZFS
does not use the regular paging system to manage data that passes
through reads and writes syscalls. However mmaped I/O which is closely
tied to the Virtual Memory subsystem still goes through the regular
paging code . So syscall writting to mmaped pages, means we will keep
2 copies of the associated data at least until we manage to get the
data to disk. We don't expect that type of load to commonly use large
amount of ram
What does this mean exactly? does this mean that zfs will "uselessly" double cache any memory region that is backed by a memory mapped file? or does "using syscalls" mean writing using some other method of writing that I am not familiar with.
If so, am I better off keeping the working directories of files written this way on a ufs partition?
Does this mean that zfs will "uselessly" double cache any memory region that is backed by a memory mapped file?
Hopefully, no.
or does "using syscalls" mean writing using some other method of writing that I am not familiar with.
That method is just regular low level write(fd, buf, nbytes) system calls and similars and not what memory mapped files are designed to support: accessing file content just with reading / writing memory by using pointers, using the file data as a byte array or whatever.
If so, am I better off keeping the working directories of files written this way on a ufs partition?
No, unless memory mapped files that are also written to using system calls sum to a significant part of your RAM workload, which is quite unlikely to happen.
PS: Note that this blog is almost ten years old. There might have been changes in the implementation since that time.

How does the OS decide data that goes in each page?

I have a comma separated data file, lets assume each record is of fixed length.
How does the OS(Linux) determine, which data parts are kept in one page in the hard disk?
Does it simply look at the file, organize the records one after the other(sequentially) in one page? Is it possible to programmatically set this or does the OS take care of it automatically?
Your question is quite general - you didn't specify which OS or filesystem - so the answer will be too.
Generally speaking the OS does not examine the data being written to a file. It simply writes the data to enough disk sectors to contain the data. If the sector size is 4K, then bytes 0-4095 are written to the first sector, bytes 4096-8191 are written to the second sector, etc. The OS does this automatically.
Very few programs wish to manage their disk sector allocation. One exception is high performance database management systems, which often implement their own filesystem in order have low level control of the file data to sector mapping.

What is memory map in mongodb?

I read about this topic at
http://docs.mongodb.org/manual/faq/storage/#faq-storage-memory-mapped-files
But didn't understand point .Does it is used to keep query data in physical memory ? How it is related with virtual memory ? Why it is important and how it effect at performance ?
I'll try to explain in a simple way.
MongoDB (and other storage systems) stores data in files. Each database has its own files, created as they are needed. The first file weights 64 MB, the next 128 and so up to 2 GB. Then, new files created weigh 2 GB. Each of these files are logically divided into different blocks, that correspond with one virtual memory block.
When MongoDB needs to access a file or a part of it, loads all virtual blocks corresponding to that file or parts of the files into memory using mmap.On the other hand, mmap is a way for applications to leverage the system cache (linux).
So what really happens when you are doing a query is that MongoDB "tells" the OS to load the part it needs with the data requested, so the next time is requested will be faster. As you can imagine this is a very important feature to boost performance in databases like MongoDB, because accessing RAM is way faster than hard drive.
Another benefit of using mmap is that MongoDB memory will grow as it needs and the system memory is free.

mmap writes to file on disk(synchronous/asynchronous)

I havea question regarding mmap functionality. when mmap is used in asynchronous mode where the kernel takes care of persisting the data to the mapped file on the disk , is it possible to have the former updates overwrite the later updates ?
Lets say at time T, we modify a location in memory that is memory mapped to a file on disk and again at time T+1 we modify the same location in memory. As the writes to the file are not synchronous, is it possible that kernel first picks up the modifications at time T+1 and then picks up the modifications at time T resulting in inconsistency in the memory mapped file ?
It's not exactly possible. The file is allowed to be inconsistent till msync(2) or munmap(2) - when that happens, dirty (modified) pages are written to disk page by page (sometimes more, depends on filesystem in newer kernels). msync() allows you to specify synchronous operation and invalidation of caches after finished write, which allows you to ensure that the data in cache is the same as data in file. Without that, it's possible that your program will see newer data but file contains older - exact specifics of the rather hairy situation depend on CPU architecture and specific OS implementation of those routines.

mmap() internals

It's widely known that the most significant mmap() feature is that file mapping is shared between many processes. But it's not less widely known that every process has its own address space.
The question is where are memmapped files (more specifically, its data) truly kept, and how processes can get access to this memory?
I mean not *(pa+i) and other high-level stuff, but I mean the internals of the process.
This happens at the virtual memory management layer in the operating system. When you memory map a file, the memory manager basically treats the file as if it were swap space for the process. As you access pages in your virtual memory address space, the memory mapper has to interpret them and map them to physical memory. When you cross a page boundary, this may cause a page fault, at which time the OS must map a chunk of disk space to a chunk of physical memory and resolve the memory mapping. With mmap, it simply does so from your file instead of its own swap space.
If you want lots of details of how this happens, you'll have to tell us which operating system you're using, as implementation details vary.
This is very implementation-dependent, but the following is one possible implementation:
When a file is a first memory-mapped, the data isn't stored anywhere at first, it's still on disk. The virtual memory manager (VMM) allocates a range of virtual memory addresses to the process for the file, but those addresses aren't immediately added to the page table.
When the program first tries to read or write to one of those addresses, a page fault occurs. The OS catches the page fault, figures out that that address corresponds to a memory-mapped file, and reads the appropriate disk sector into an internal kernel buffer. Then, it maps the kernel buffer into the process's address space, and restarts the user instruction that caused the page fault. If the faulting instruction was a read, we're all done for now. If it was a write, the data is written to memory, and the page is marked as dirty. Subsequent reads or writes to data within the same page do not require reading/writing to/from disk, since the data is in memory.
When the file is flushed or closed, any pages which have been marked dirty are written back to disk.
Using memory-mapped files is advantageous for programs which read or write disk sectors in a very haphazard manner. You only read disk sectors which are actually used, instead of reading the entire file.
I'm not really sure what you are asking, but mmap() sets aside a chunk of virtual memory to hold the given amount of data (usually. It can be file-backed sometimes).
A process is an OS entity, and it gains access to memory mapped areas through the OS-proscribed method: calling mmap().
The kernel has internal buffers representing chunks of memory. Any given process is assigned a memory mapping in its own address space which refers to that buffer. A number of proccesses may have their own mappings, but they all end up resolving to the same chunk (via the kernel buffer).
This is a simple enough concept, but it can get a little tricky when processes write. To keep things simple in the read-only case there's usually a copy-on-write functionality that's only used as needed.
Any data will be in some form of memory or others, some cases in HDD, in embedded systems may be some flash memory or even the ram (initramfs), barring the last one, the data in the memory are frequently cached in the RAM, RAM is logical divided into pages and the kernel maintains a list of descriptors which uniquely identify an page.
So at best accessing data would be accessing the physical pages. Process gets there own process address space which consists of many vm_are_struct which identifies a mapped section in the address space. In a call to mmap, new vm_area_struct may be created or may be merged with an existing one if the addresses are adjacent.
A new virtual address is returned to the call to mmap. Also new page tables are created which consists the mapping of the newly created virtual addresses to the physical address where the real data resides. mapping can be done on a file, or anonymously like malloc. The process address space structure mm_struct uses the pointer of pgd_t (Page global directory) to reach the physical page and access the data.