From what I have learned so far, the inode count is the maximum number of files (and directories) you can have in a single partition. You can use up all of the inodes without actually filling the disk space, or you can fill the disk space with one very big file, leaving most inodes unused.
This question has come to my mind recently: where do those numbers come from?
You did not mention a specific file system, so I am going to assume ext4, although what I am saying should mostly apply to ext3 as well.
The number of inodes is determined when the file system is created. File systems are generally written to be flexible enough that this number can be specified at creation to better suit the needs of the system. So if you have a lot of small files you can create more inodes, and if you have a smaller number of large files you can create fewer inodes.
With mkfs.ext4 you can use the -i flag to specify the bytes-per-inode ratio. The default value is currently 16384 bytes per inode. This number is nothing particularly special, but if you assume the typical 256 bytes for the inode size and 16384 bytes per inode, you get approximately 1.56% of the disk space being used by inodes.
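As a rough sketch of that arithmetic (the partition size below is just an example; the 16384 bytes-per-inode ratio and 256-byte inode size are the defaults mentioned above):

    # Rough estimate of how many inodes a partition gets and how much space
    # the inode tables consume, using ext4-style defaults.
    def inode_overhead(partition_bytes, bytes_per_inode=16384, inode_size=256):
        inode_count = partition_bytes // bytes_per_inode   # one inode per 16 KiB
        table_bytes = inode_count * inode_size             # space used by inode tables
        return inode_count, table_bytes / partition_bytes

    count, fraction = inode_overhead(100 * 2**30)   # e.g. a 100 GiB partition
    print(count, f"{fraction:.2%}")                 # 6553600 inodes, 1.56% overhead

If you know ahead of time that the partition will hold mostly large files, you can raise the bytes-per-inode ratio with -i and reclaim most of that overhead.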
Say we have a file system with FAT10 and a disk size of 1 GB.
I'd like to know how I can calculate the minimum size of a cluster.
My current approach looks like this: FAT10 means we have 2^10 clusters. Since the disk size is 1 GB, which equals 2^30 bytes, we have 2^(30-10) = 2^20 bytes for each cluster.
Does that mean the minimum cluster size is 2^20 bytes?
I hope this is the correct place to ask, otherwise tell me and I will delete this question! :c
It really depends on what your goals are.
Technically, the minimum cluster size is going to be 1 sector. However, that would mean the vast majority of the 1 GB would likely not be accessible to the FAT10 file system.
If you want to be able to access almost the whole 1 GB disk with FAT10, then your calculation serves as a reasonable approximation. In fact, due to practical constraints, you're probably not going to get much better unless you decide to start making some more unorthodox decisions (I would argue that using a FAT10 system on a 1 GB drive is already unorthodox).
Here are some of the things you will need to know; a small worked sketch follows the list.
How many of the theoretical 1024 FAT values are usable? Remember, some have special meaning, such as "cluster available", "end of cluster chain", "bad block" (if applicable) or "reserved value" (if applicable).
Does your on-disk FAT10 table reserve its space by count of sectors or count of clusters?
What is your sector size?
Are there any extra reserved sectors or clusters?
Is your data section going to be sector or cluster aligned?
Are you going to limit your cluster size to a power of two?
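To give a feel for how those questions combine, here is a small sketch; the 512-byte sectors, the power-of-two constraint and the assumption that 1008 of the 1024 FAT values are usable are all examples, not givens:

    import math

    # Smallest cluster size that lets a FAT10 volume cover (almost) the whole disk.
    disk_bytes      = 2**30   # 1 GB disk, taken as 2^30 bytes as in the question
    sector_bytes    = 512     # assumed logical sector size
    usable_clusters = 1008    # assume some of the 1024 FAT values are special/reserved

    sectors_per_cluster = math.ceil(disk_bytes / (usable_clusters * sector_bytes))
    # Many implementations require a power-of-two cluster size, so round up.
    sectors_per_cluster = 2 ** math.ceil(math.log2(sectors_per_cluster))

    print(sectors_per_cluster * sector_bytes)   # 2097152 bytes, i.e. 2 MiB clusters

In other words, once reserved FAT values and power-of-two rounding are taken into account, you may even end up above the 2^20-byte estimate from the question.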
Consider the following parameters of a FAT-based filesystem:
Blocks are 8 KB (2^13 bytes) large
FAT entries are 32 bits wide, of which 24 bits are used to store a block address
A. How large does the FAT structure need to be to accommodate a 1 GB (2^30 bytes) disk?
B. What is the largest theoretical file size supported by the FAT structure from part (A)?
A. How large does the FAT structure need to be to accommodate a 1 GB (2^30 bytes) disk?
The FAT file system splits the space into clusters, then has a table (the "cluster allocation table", or FAT) with an entry for each cluster (saying whether it is free or faulty, or which cluster is the next one in a chain of clusters). To work out the size of the "cluster allocation table", divide the total size of the volume by the size of a cluster (to determine how many clusters there are, and therefore how many entries in the "cluster allocation table"), then multiply by the size of one entry, then maybe round up to a multiple of the cluster size or not (depending on which answer you want: actual size or space consumed).
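Applied to the numbers in this question (and assuming, for simplicity, that the whole 1 GB is covered by clusters), a quick sketch:

    # Part A: size of the FAT for a 1 GB (2^30-byte) disk with 8 KB blocks
    # and 32-bit (4-byte) FAT entries.
    disk_bytes  = 2**30
    block_bytes = 2**13
    entry_bytes = 4

    entries   = disk_bytes // block_bytes   # 2^17 = 131072 clusters/entries
    fat_bytes = entries * entry_bytes       # 2^19 bytes = 512 KiB
    print(entries, fat_bytes)               # 131072 524288
    # Rounded up to whole blocks, that is exactly 64 blocks of 8 KB.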
B. What is the largest theoretical file size supported by the FAT structure from part (A)?
The largest file size supported is determined by either (whichever is smaller):
the size of "file size" field in the file's directory entry (which is 32-bit for FAT32 and would therefore be 4 GiB); or
the total size of the space minus the space consumed by the hidden/reserved/system area, cluster allocation table, directories and faulty clusters.
For a 1 GiB volume formatted with FAT32, the max. size of a file would be determined by the latter ("total space - sum of areas not usable by the file").
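For this exercise, if you assume the FAT itself is the only overhead (no reserved area, directories or bad clusters), a minimal sketch for part B looks like this:

    # Part B: upper bound on a single file, assuming the FAT is the only overhead.
    disk_bytes  = 2**30
    block_bytes = 2**13
    fat_bytes   = 2**19                     # from part A (64 blocks)

    data_blocks   = (disk_bytes - fat_bytes) // block_bytes
    max_file_size = data_blocks * block_bytes
    print(max_file_size)                    # 1073217536 bytes, just under 1 GB
    # The 24-bit block addresses could in principle chain 2^24 * 2^13 = 128 GiB
    # of blocks, so on a 1 GB volume the disk size is the binding limit.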
Note that if you have a 1 GiB disk, it might (for example) be split into 4 partitions, and a FAT file system might be given a partition with only a fraction of that 1 GiB of space. Even if there is only one partition for the "whole" disk, typically (assuming "MBR partitions" and not the newer "GPT partitions", which take more space for partition tables, etc.) the partition begins on the second track (the first track is "reserved" for the MBR, partition table and maybe a "boot manager") or a later track (e.g. to align the start of the partition to a "4 KiB physical sector size" and avoid performance problems caused by a "512-byte logical sector size").
In other words, the size of the disk has very little to do with the size of the volume used for FAT; and when questions only tell you the size of the disk and don't tell you the size of the partition/volume you can't provide accurate answers.
What you could do is state your assumptions clearly in your answer, for example:
"I assume that a "1 GB" disk is 1000000 KiB (1024000000 bytes, and not 1 GiB or 1073741824 bytes, and not 1 GB or 1000000000 bytes); and I assume that 1 MiB (1024 KiB) of disk space is consumed by the partition table and MBR and all remaining space is used for a single FAT partition; and therefore the FAT volume itself is 998976 KiB."
What is the relationship between the file system block size and the disk space wasted per file?
How can reducing the file system block size reduce the available/free disk space?
It's the CLUSTER SIZE that results in "wasted space." On hard-disk file systems, disk space is allocated in clusters. Clusters are multiples of blocks; the block size is determined by the hardware.
The smaller the cluster size, the more clusters there are on the disk, and the more overhead is required to manage those clusters. Usually this overhead is one or more bitmaps with a bit per cluster.
Larger cluster size = lower overhead.
The tradeoff is that, if you need just one additional byte of storage, you have to allocate an entire cluster for it. The amount of "wasted" space grows with the size of the cluster.
Larger cluster sizes tend to be more efficient with larger files.
Smaller cluster sizes tend to be more efficient with smaller files.
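A quick illustration of that tradeoff, with three arbitrary small file sizes:

    # Internal fragmentation: the last cluster of each file is rarely full,
    # so every non-empty file wastes up to (cluster_size - 1) bytes.
    def wasted_bytes(file_size, cluster_size):
        remainder = file_size % cluster_size
        return 0 if remainder == 0 else cluster_size - remainder

    for cluster_size in (4096, 65536):
        total = sum(wasted_bytes(size, cluster_size) for size in (100, 5000, 70000))
        print(cluster_size, total)   # 4096 -> 10916 bytes wasted, 65536 -> 187044 bytes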
I have a question here, and I do not know how to calculate the maximal size of a file that one can store on a disk that uses inodes and disk blocks.
Assuming a page size of 4096 bytes, a page table entry that points to a frame takes 8 bytes (4 bytes for the pointer plus some flags), and a page table entry that points to another page table takes 4 bytes, how many levels of page tables would be required to map a 32-bit address space if each level of page table must fit into a single page?
What is the maximal file size one can store on a disk that uses inodes and disk blocks that store 4096 bytes, where each inode can store 10 entries and the first inode reserves the last two entries for cascading inodes?
For the first part of the question, I got that the total number of levels is 3, but I do not know how to do the second part.
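Here is roughly how I got 3 levels for the first part (assuming, as stated, 4096-byte pages, 8-byte leaf entries and 4-byte inner entries):

    # Count the page-table levels needed to map a 32-bit address space.
    page_size     = 4096
    leaf_entries  = page_size // 8    # 512 frame pointers per leaf table
    inner_entries = page_size // 4    # 1024 table pointers per inner table

    covered = page_size * leaf_entries   # one leaf table maps 2 MiB
    levels  = 1
    while covered < 2**32:               # add inner levels until 4 GiB is reachable
        covered *= inner_entries
        levels  += 1
    print(levels)                        # 3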
What you're describing sounds like the EXT filesystem.
EXT3 uses a total of 15 pointers.
The first 12 entries are direct: they point directly to data blocks. The third-to-last entry is a level 1 indirect: it points to a block filled entirely with pointers to data blocks. The second-to-last entry is a level 2 indirect: it points to a block completely full of level 1 indirects. The last entry is a level 3 indirect.
The maximum file size on this system is usually a restriction of the operating system and of the block size, and is usually between 16 GB and 2 TB.
The theoretical maximum is 12B + B^2/P + B^3/P^2 + B^4/P^3, where B is the block size in bytes (typically 4096, though different values are possible) and P is the pointer size in bytes (4). This yields a maximum theoretical size of 4,402,345,721,856 bytes, roughly 4 TiB.
EXT3 Inode pointer structure
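Plugging in the usual values (4096-byte blocks, 4-byte pointers), a quick check of that number:

    # Theoretical ext3-style maximum file size: 12 direct blocks plus single,
    # double and triple indirect blocks, each indirect block holding B/P pointers.
    B = 4096                       # block size in bytes
    P = 4                          # pointer size in bytes
    per_block = B // P             # 1024 pointers per indirect block

    max_blocks = 12 + per_block + per_block**2 + per_block**3
    print(max_blocks * B)          # 4402345721856 bytes, roughly 4 TiB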
What limits the size of a memory-mapped file? I know it can't be bigger than the largest continuous chunk of unallocated address space, and that there should be enough free disk space. But are there other limits?
You're being too conservative: A memory-mapped file can be larger than the address space. The view of the memory-mapped file is limited by OS memory constraints, but that's only the part of the file you're looking at at any one time. (And I guess technically you could map multiple views of discontinuous parts of the file at once, so aside from overhead and page length constraints, it's only the total # of bytes you're looking at that poses a limit. You could look at bytes [0 to 1024] and bytes [2^40 to 2^40 + 1024] with two separate views.)
In MS Windows, look at the MapViewOfFile function. It effectively takes a 64-bit file offset and a 32-bit length.
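As an illustration of the same idea outside the Win32 API, here is a sketch using Python's mmap module; the file name is hypothetical, and the offset must be a multiple of mmap.ALLOCATIONGRANULARITY:

    import mmap

    # Map only a 1 MiB window starting 1 GiB into a huge file, rather than
    # mapping the whole file into the address space at once.
    offset = 1 << 30
    assert offset % mmap.ALLOCATIONGRANULARITY == 0

    with open("huge.bin", "rb") as f:
        with mmap.mmap(f.fileno(), length=1 << 20, offset=offset,
                       access=mmap.ACCESS_READ) as view:
            print(view[:16])   # the first 16 bytes of that window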
This has been my experience when using memory-mapped files under Win32:
If you map the entire file into one segment, it normally taps out at around 750 MB, because it can't find a bigger contiguous block of memory. If you split it up into smaller segments, say 100 MB each, you can get around 1500 MB-1800 MB depending on what else is running.
If you use the /3GB switch you can get more than 2 GB, up to about 2700 MB, but OS performance is penalized.
I'm not sure about 64-bit; I've never tried it, but I presume the max file size is then limited only by the amount of physical memory you have.
Under Windows: "The size of a file view is limited to the largest available contiguous block of unreserved virtual memory. This is at most 2 GB minus the virtual memory already reserved by the process. "
From MSDN.
I'm not sure about LINUX/OSX/Whatever Else, but it's probably also related to address space.
Yes, there are limits to memory-mapped files. Most shockingly:
Memory-mapped files cannot be larger than 2GB on 32-bit systems.
When a memmap causes a file to be created or extended beyond its current size in the filesystem, the contents of the new part are unspecified. On systems with POSIX filesystem semantics, the extended part will be filled with zero bytes.
Even on my 64-bit, 32GB RAM system, I get the following error if I try to read in one big numpy memory-mapped file instead of taking portions of it using byte-offsets:
OverflowError: memory mapped size must be positive
Big datasets are really a pain to work with.
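For reference, this is roughly what I mean by taking portions using byte-offsets; the file name, dtype and sizes are just placeholders:

    import numpy as np

    # Map a window of a huge binary file of float64 values instead of the whole thing.
    window = np.memmap("big_data.bin", dtype=np.float64, mode="r",
                       offset=10 * 2**20,     # start 10 MiB into the file
                       shape=(1_000_000,))    # map only one million values
    print(window[:5])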
The limit of the virtual address space is more than 16 terabytes on 64-bit Windows systems. The issue discussed here is most probably related to mixing DWORD with SIZE_T.
There should be no other limits. Aren't those enough? ;-)