How big can a memory-mapped file be? - mmap

What limits the size of a memory-mapped file? I know it can't be bigger than the largest contiguous chunk of unallocated address space, and that there should be enough free disk space. But are there other limits?

You're being too conservative: A memory-mapped file can be larger than the address space. The view of the memory-mapped file is limited by OS memory constraints, but that's only the part of the file you're looking at at one time. (And I guess technically you could map multiple views of discontinuous parts of the file at once, so aside from overhead and page length constraints, it's only the total # of bytes you're looking at that poses a limit. You could look at bytes [0 to 1024] and bytes [2^40 to 2^40 + 1024] with two separate views.)
In MS Windows, look at the MapViewOfFile function. It effectively takes a 64-bit file offset and a 32-bit length.
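The same idea is exposed by POSIX mmap and by Python's mmap module, which also takes a file offset so you can map a window rather than the whole file. A minimal sketch (the file name is made up, and the offset must be a multiple of the allocation granularity, mirroring MapViewOfFile's alignment rule):

```python
import mmap
import os

path = "big.bin"                      # hypothetical scratch file
size = 64 * 1024 * 1024               # 64 MiB; stored sparsely on most filesystems
with open(path, "wb") as f:
    f.seek(size - 1)
    f.write(b"\0")

with open(path, "r+b") as f:
    gran = mmap.ALLOCATIONGRANULARITY
    offset = (32 * 1024 * 1024 // gran) * gran          # a window ~32 MiB into the file
    view = mmap.mmap(f.fileno(), 4096, offset=offset)   # map only 4 KiB of it
    view[:4] = b"mark"                # write through the view
    view.flush()
    view.close()

with open(path, "rb") as f:           # confirm the write landed at the offset
    f.seek(offset)
    marked = f.read(4)

os.remove(path)
```

Mapping several such windows at different offsets is exactly the "multiple views" trick described above.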

This has been my experience when using memory-mapped files under Win32:
If you map the entire file into one segment, it normally tops out at around 750 MB, because it can't find a bigger contiguous block of memory. If you split it up into smaller segments, say 100MB each, you can get around 1500MB-1800MB depending on what else is running.
If you use the /3GB switch you can get more than 2GB, up to about 2700MB, but OS performance is penalized.
I'm not sure about 64-bit, I've never tried it but I presume the max file size is then limited only by the amount of physical memory you have.

Under Windows: "The size of a file view is limited to the largest available contiguous block of unreserved virtual memory. This is at most 2 GB minus the virtual memory already reserved by the process. "
From MSDN.
I'm not sure about LINUX/OSX/Whatever Else, but it's probably also related to address space.

Yes, there are limits to memory-mapped files. The most shocking is:
Memory-mapped files cannot be larger than 2GB on 32-bit systems.
When a memmap causes a file to be created or extended beyond its current size in the filesystem, the contents of the new part are unspecified. On systems with POSIX filesystem semantics, the extended part will be filled with zero bytes.
Even on my 64-bit, 32GB RAM system, I get the following error if I try to read in one big numpy memory-mapped file instead of taking portions of it using byte-offsets:
OverflowError: memory mapped size must be positive
Big datasets are really a pain to work with.
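The usual workaround (the byte-offset approach mentioned above) is to map a window of the file rather than the whole thing. A minimal sketch with numpy's memmap, using a small stand-in file created on the spot so the example runs:

```python
import os
import numpy as np

# Hypothetical stand-in for a big dataset: 1,000,000 bytes on disk.
path = "data.bin"
(np.arange(1_000_000) % 256).astype(np.uint8).tofile(path)

# Map only a 1000-byte window starting at byte 500,000, instead of the
# whole file at once (which is what overflows on some builds).
chunk = np.memmap(path, dtype=np.uint8, mode="r", offset=500_000, shape=(1000,))
first = int(chunk[0])   # element 500,000 of the file: 500000 % 256 == 32
total = int(chunk.sum())

del chunk               # drop the mapping before removing the file
os.remove(path)
```

Iterating such windows over the file lets you process datasets far larger than the largest single mapping the OS will give you.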

The limit of the virtual address space is more than 16 terabytes on 64-bit Windows systems. The issue discussed here is most probably related to mixing DWORD with SIZE_T.

There should be no other limits. Aren't those enough? ;-)

Related

Virtual memory location on hard-disk

I was reading about paging and swap space and I'm a little confused about how much space (and where) on the hard disk is used to page out / swap out frames. Let's think of the following scenario:
We have a single process which progressively uses newer pages in virtual memory. Each time for a new page, we allocate a frame in physical memory.
But after a while, frames in the physical memory get exhausted and we choose a victim frame to be removed from RAM.
I have the following doubts:
Does the victim frame get swapped out to the swap space or paged out to some different location (apart from swap-space) on the hard-disk?
From what I've seen, swap space is usually around 1-2x size of RAM, so does this mean a process can use only RAM + swap-space amount of memory in total? Or would it be more than that and limited by the size of virtual memory?
Does the victim frame get swapped out to the swap space or paged out to some different location (apart from swap-space) on the hard-disk?
It gets swapped to the swap space. Swap space is used for that. A system without swap space cannot use this feature of virtual memory. It still has other features like avoiding external fragmentation and memory protection.
From what I've seen, swap space is usually around 1-2x size of RAM, so does this mean a process can use only RAM + swap-space amount of memory in total? Or would it be more than that and limited by the size of virtual memory?
The total memory available to a process will be RAM + swap space. Imagine a computer with 1GB of RAM + 1GB of swap space and a process which requires 3GB. The process needs more virtual memory than is available. This will not work: eventually the process will access all of that code/data, and since the process image is bigger than RAM + swap space, the computer simply will not have enough room to hold it. The kernel will crash the process.
There's really 2 options here. You either store a part of the process in RAM directly or you store it in the swap space. If there's no room in both of these for your process then the kernel doesn't have anywhere else to go. It thus crashes the process.

Difference between paging and segmentation

I am trying to understand both paradigms of memory management; however, I fail to see the big picture and the difference between them. Paging consists of bringing fixed-size pages from secondary storage into primary storage in order to do some task requested by a process. Segmentation consists of assigning each unit in a process its own address space, so the units are allowed to grow. I don't quite see how they are related, and that's because there are still a lot of holes in my understanding. Can someone fill them up?
I think you have something confused. One problem you have is that the term "segment" has multiple meanings.
Segmentation is a method of memory management. Memory is managed in segments that are of variable or fixed length, depending upon the processor. Segments originated on 16-bit processors as a means to access more than 64K of memory.
On the PDP-11, programmers used segments to map different memory into the 64K address space. At any given time a process could only access 64K of memory but the memory that made up that 64K could change.
The 8086 and its successors used segments with base registers. Each segment could have 64K (a limit that grew with the processors) but a process could have 4 segments (more in later processors).
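The 8086 base-register scheme can be sketched in a couple of lines: a 16-bit segment register is shifted left 4 bits and added to a 16-bit offset, giving a 20-bit physical address.

```python
def real_mode_address(segment: int, offset: int) -> int:
    # 8086 real mode: physical = segment * 16 + offset, truncated to 20 bits
    return ((segment << 4) + offset) & 0xFFFFF

# Different segment:offset pairs can name the same physical byte:
a = real_mode_address(0x0040, 0x0000)   # 0x00400
b = real_mode_address(0x0000, 0x0400)   # also 0x00400
```

The overlap is exactly why "segment" here means something different from the variable-length protected-mode segments discussed above.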
Paging allows a process to have a larger address space than there is physical memory available.
The 8086's successors used the kludge of paging on top of segments. However, that bit of ugliness has finally gone away in 64-bit mode.
You nearly answered it yourself: paging deals with fixed-size pages, while segmentation deals with variable-size logical units of a process (code, data, stack). The two are orthogonal rather than one containing the other; on processors that combine them, each segment is itself backed by pages.
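The mechanical nature of paging (as opposed to the logical nature of segments) shows up in how an address decomposes; a sketch assuming a hypothetical 4 KB page size:

```python
PAGE_SIZE = 4096                       # fixed size, always a power of two

def split(vaddr: int) -> tuple[int, int]:
    # Paging: every address splits the same way -- high bits pick the page,
    # low bits are the offset within it. No knowledge of program structure.
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE

page, off = split(0x12345)             # page 0x12, offset 0x345
```

A segmented scheme, by contrast, would need a table of (base, length) pairs because segment sizes vary per logical unit.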

what determines the maximum inodes of a single partition?

From what I have learned this far, inodes determine the maximum number of files (and directories) you can have in a single partition. You can use up all of the disk's inodes without actually filling the disk space, or you can fill the disk space with one very big file, leaving inodes unused.
This question has come into my mind recently: where are those numbers coming from?
You did not mention a specific file system so I am going to assume ext4, although what I am saying should mostly apply to ext3 as well.
The number of inodes is determined when the file-system is created. File-systems are generally written to be flexible enough that this number can be specified at creation to better suit the needs of the system. So if you have a lot of small files you can create more inodes, and if you have a smaller number of large files you can create fewer inodes.
With mkfs.ext4 you can use the -i flag to specify the bytes per inode. The default value is typically 16384 bytes per inode. There is nothing special about this number, but if you assume the typical 256 bytes for the inode size and 16384 bytes per inode, you get approximately 1.56% of the disk space being used by inodes.
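Under that default, the inode count is simply the partition size divided by the bytes-per-inode ratio; a quick sketch of the arithmetic (the numbers are the typical defaults mentioned above, not anything mkfs guarantees):

```python
def inode_count(partition_bytes: int, bytes_per_inode: int = 16384) -> int:
    # mkfs.ext4 -i <bytes-per-inode>: one inode per that many bytes of partition
    return partition_bytes // bytes_per_inode

INODE_SIZE = 256                      # typical on-disk inode size in bytes
overhead = INODE_SIZE / 16384         # 0.015625, i.e. about 1.56% of the disk

n = inode_count(100 * 10**9)          # a 100 GB partition -> ~6.1 million inodes
```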

For a 2GBytes memory, suppose its memory width is 8 bits: what is the address space of the memory?

For a 2GBytes memory, suppose its memory width is 8 bits….
what is the address space of the memory?
what is the address width of the memory?
I’m not looking for the answer to question, I’m just trying to understand the process of how to get there.
EDIT: all instances of Gb replaced with GB
The address space is the same as the memory size. This was not true (for example) in 32-bit operating systems that had more than 2^32 bytes of memory installed. Since the number of bits used for addressing is not specified in your question, one can only assume it to be sufficient to address the installed memory. To contrast, while you could install more than 4GB in a 32-bit system, you couldn't access more than 4GB, since (2^32)-1 is the location of the last byte you could access. Bear in mind, this address space must include video memory and any/all BIOSes in the system. This meant that in 32-bit WinXP, MS limited the amount of user-accessible memory to a figure considerably less than 4GB.
Since the memory width is 8 bits, each address will point to 1 byte. Since you've got 2GB, you need to use a number of bits for addressing that is equal-to or greater-than that which will allow you to point to any one of those bytes.
Spoiler:
Your address space is 2GB, and you need 31-bit-wide addresses to use it all.
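The arithmetic behind the spoiler, as a sketch:

```python
import math

def address_width(size_bytes: int) -> int:
    # Byte-addressable memory (8-bit width): one address per byte,
    # so you need ceil(log2(size)) bits to name every byte.
    return math.ceil(math.log2(size_bytes))

two_gb = 2 * 1024**3           # 2 GB = 2^31 bytes
width = address_width(two_gb)  # 31-bit addresses cover 0 .. 2^31 - 1
```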

Difference between 8k block on 32bit VS 4k block on 64bit file system

Is there any difference between 8k block on 32bit file system vs 4k block on 64bit file system?
If there is, how big would the difference in the largest file size be? Would I calculate it the same way I would calculate 8k on 32bit and just change 4bytes on 32bit to 8bytes on 64bit?
Ex. 8K blocks w/ 32-bit disk addresses
So 8192/4 bytes = 2048 addresses; 2048*8KB = 16MB of file data (singly indirect)
and 2048*2048*8KB = 32GB of file data (doubly indirect)
Ex. 4K blocks w/ 64-bit disk addresses
4096/8 = 512 addresses; 512*4KB = 2MB of file data (singly indirect)
and 512*512*4KB = 1GB of file data (doubly indirect)
So if the example is true, there would be a big difference between the two (specially in triply indirect, which gives the max file size)?
If you have real-world filesystems in mind, you can check the maximum file size on Wikipedia.
In particular, if it's ext4, the block size vs. the number of address bits doesn't change the max size of an individual file.
If you're talking about a theoretical system, your calculations are fine. Do keep in mind that inode-based filesystems typically have direct blocks as well; ext4 defines 12 of them. So add 48k to your 4k-block example, and 96k to the 8k-block one. And don't forget to add the triply-indirect blocks!
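Putting the whole calculation together (direct + singly + doubly + triply indirect) for a classic ext2-style inode layout, as a sketch:

```python
def max_file_size(block: int, addr_bytes: int, direct: int = 12) -> int:
    """Largest file for an inode with `direct` direct blocks plus one
    singly-, one doubly-, and one triply-indirect block (ext2-style)."""
    per = block // addr_bytes                 # block addresses per indirect block
    blocks = direct + per + per**2 + per**3
    return blocks * block

# 8 KB blocks / 4-byte addresses: the triply-indirect tier alone is
# 2048^3 * 8 KB = 64 TB, dwarfing the 32 GB doubly-indirect tier.
a = max_file_size(8192, 4)
# 4 KB blocks / 8-byte addresses: the triply-indirect tier is 512^3 * 4 KB = 512 GB.
b = max_file_size(4096, 8)
```

As the answer notes, the triply-indirect tier dominates, so the gap between the two layouts is far larger than the singly-indirect numbers alone suggest.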