How does the OS decide what data goes in each page?

I have a comma-separated data file; let's assume each record is of fixed length.
How does the OS (Linux) determine which parts of the data are kept together in one page on the hard disk?
Does it simply look at the file and organize the records one after the other (sequentially) in one page? Is it possible to set this programmatically, or does the OS take care of it automatically?

Your question is quite general - you didn't specify which OS or filesystem - so the answer will be too.
Generally speaking, the OS does not examine the data being written to a file. It simply writes the data to enough disk sectors to contain it. If the sector size is 4K, then bytes 0-4095 are written to the first sector, bytes 4096-8191 to the second sector, and so on. The OS does this automatically.
Very few programs wish to manage their own disk sector allocation. One exception is high-performance database management systems, which often implement their own filesystem in order to have low-level control of the file-data-to-sector mapping.
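That said, on Linux you can at least inspect (though not choose) where a file's blocks landed, for example via the legacy FIBMAP ioctl. A minimal sketch, assuming root privileges and a filesystem that still supports FIBMAP:

    /* Print the logical-to-physical block mapping of a file (Linux only;
     * needs root, and not every filesystem implements FIBMAP). */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <linux/fs.h>   /* FIBMAP, FIGETBSZ */

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int blocksize;
        if (ioctl(fd, FIGETBSZ, &blocksize) < 0) { perror("FIGETBSZ"); return 1; }

        struct stat st;
        fstat(fd, &st);
        long nblocks = (st.st_size + blocksize - 1) / blocksize;

        for (long i = 0; i < nblocks; i++) {
            int block = (int)i;   /* in: logical block, out: physical block */
            if (ioctl(fd, FIBMAP, &block) < 0) { perror("FIBMAP"); return 1; }
            printf("logical block %ld -> physical block %d\n", i, block);
        }
        close(fd);
        return 0;
    }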


Append-only file write with fsync, on emmc/SSD/sdcard, ext4 or f2fs?

I am building two operating systems for IoT: the Libertas OS and Hornet OS.
The data APIs are designed around append-only time series. fsync() is required after each appended block of bytes to ensure data safety.
The storage could be eMMC, an SSD, or an SD card. The question is: which filesystem is a better fit for the different storage types?
I understand f2fs is designed to be log-structured and append-friendly. But what about ext4? I couldn't easily find information about it.
Theoretically, at least for file content, an append should continue writing within the current underlying block to minimize wear. Since the file size changes after an append, the file metadata must also be updated, ideally through an append-only log.
I also don't know the details of the internal controllers of SD cards and eMMC; will the controller honor such block-level appends?
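Concretely, the write pattern I'm describing is roughly this (the file name and record format are made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int append_record(int fd, const void *buf, size_t len)
    {
        /* O_APPEND guarantees the write lands at the current end of file. */
        ssize_t n = write(fd, buf, len);
        if (n != (ssize_t)len)
            return -1;
        /* fsync() flushes both the data and the metadata (e.g. the new
         * file size) to the device before returning. */
        return fsync(fd);
    }

    int main(void)
    {
        int fd = open("timeseries.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char record[] = "1700000000,sensor0,23.5\n";
        if (append_record(fd, record, strlen(record)) < 0)
            perror("append_record");

        close(fd);
        return 0;
    }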
Any insight will be greatly appreciated!

How does mmap() help read information at a specific offset versus regular POSIX I/O?

I'm trying to understand something a bit better about mmap. I recently read this portion of the accepted answer to the related Stack Overflow question mmap and memory usage (quoted below):
Let's say you read a 100MB chunk of data, and according to the initial 1MB of header data, the information that you want is located at offset 75MB, so you don't need anything between 1~74.9MB! You have read it for nothing but to make your code simpler. With mmap, you will only read the data you have actually accessed (rounded to 4kb, or the OS page size, which is mostly 4kb), so it would only read the first and the 75th MB.
I understand most of the benefits of mmap (no need for context switches, no need to swap contents out, etc.), but I don't quite understand this offset business. If we don't mmap and we need information at the 75MB offset, can't we do that with standard POSIX file I/O calls, without having to use mmap? Why exactly does mmap help here?
Of course you could. You can always open a file and read just the portions you need.
mmap() can be convenient when you don't want to write said code, or when you need sparse access to the contents and don't want to write a bunch of caching logic.
With mmap(), you're "mapping" the entire contents of the file to offsets in memory. Most implementations of mmap() do this lazily, so each ~4K block of the file is read on demand, as you access those memory locations.
All you have to do is access the data in your file as if it were a huge array of chars (e.g. int *someInt = (int *)&map[75000000]; return *someInt;), and let the OS worry about which portions of the file have been read, when to read the file and how much, writing dirty data blocks back to the file, and purging that memory to free up RAM.
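For illustration, a minimal sketch of the 75MB example (the file name "data.bin" is made up, and the file is assumed to be at least 76MB long):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* Map the whole file; nothing is read from disk yet. */
        const char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching this address faults in only the surrounding page(s). */
        size_t offset = 75u * 1024 * 1024;
        int32_t value;
        memcpy(&value, map + offset, sizeof value);  /* memcpy avoids alignment issues */
        printf("value at 75 MB: %d\n", value);

        munmap((void *)map, st.st_size);
        close(fd);
        return 0;
    }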

What's the difference between a page and a block in an operating system?

I have learned that in an operating system (Linux), the memory management unit (MMU) can translate a virtual address (VA) to a physical address (PA) via the page table data structure. It seems that the page is the smallest data unit managed by the VM. But what about the block? Is it the smallest data unit transferred between the disk and system memory?
What is the difference between pages and blocks?
A block is the smallest unit of data that an operating system can either write to a file or read from a file.
What exactly is a page?
Pages are used by some operating systems instead of blocks. A page is basically a virtual block, and pages have a fixed size - 4K and 2K are the most commonly used sizes. So, the two key points to remember about pages are that they are virtual blocks and that they have fixed sizes.
Why pages may be used instead of blocks
Pages are used because they make processing easier when there are many storage devices, since each device may support a different block size. With pages, the operating system can deal with a single fixed page size, rather than trying to cope with blocks of varying sizes. Pages thus act as a sort of middleman between the operating system and the hardware drivers, which translate pages to the appropriate blocks. But both pages and blocks are used as units of data storage.
http://www.programmerinterview.com/index.php/database-sql/page-versus-block/
Generally speaking, the hard disk is one of those devices called "block devices", as opposed to "character devices", because data is transferred to and from it in units of blocks.
Even if you want only a single character from a file, the OS and the drive will fetch a whole block and then give you access only to what you asked for, while the rest remains in a cache/buffer.
Note: The block size, however, can differ from one system to another.
To clarify a point:
Yes, any data transferred between the hard disk and the RAM is usually sent in blocks rather than individual bytes.
Data stored in RAM is typically managed in pages, yes; of course, assembly instructions address individual bytes.
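As a small illustration that the two units are distinct, on Linux you can query both independently; a minimal sketch:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/statvfs.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);   /* the VM's unit */
        struct statvfs vfs;
        if (statvfs("/", &vfs) == 0)         /* the filesystem's unit */
            printf("page size: %ld bytes, filesystem block size: %lu bytes\n",
                   page, (unsigned long)vfs.f_bsize);
        return 0;
    }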

Operating system file system block size?

I recently got this question for homework and I'm having trouble figuring it out. I tried searching online, but I can't seem to find any answers.
" Some file systems use two block sizes for disk storage allocation,
for example, 4- Kbyte and 512-byte blocks. Thus, a 6 Kbytes file can
be allocated with a single 4- Kbyte block and four 512-byte blocks.
Discuss the advantage of this scheme compared to the file systems that
use one block size for disk storage allocation. "
So are more blocks better?
Any help? Thanks in advance.
You can't have a large number of different block sizes; that would be hell to implement and manage. I also think some hardware limitations restrict which sizes you can use.
Now, unless the amount of data you wish to store fits exactly into the blocks you are using, some space is going to be wasted in the last block.
For example, if your block were one gigabyte long (hypothetically speaking) and you wanted to store a file 1 or 2 bytes long, you would have just wasted nearly a gigabyte of disk space. All information is stored as whole blocks; you can't store half a block.
Larger blocks make for better performance, though, since the disk can fetch more data per request before moving on to the next block. There are also fewer blocks to track and manage.
Linux is a fun operating system to play with because it can work with so many different file systems (as far as I remember, you only get a few variations of FAT and NTFS with Windows). You can read more about file systems at this link:
Linux System Administrators Guide: Chapter 5. Using Disks and Other Storage Media
See section 5.10.5 for more info on advantages and disadvantages of small and big block sizes.
So, back to your question: having two block sizes like that lets you optimize storage. You can minimize wasted space by switching to smaller blocks toward the end of the file, while still using as few blocks as possible to reduce I/O time.
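A quick back-of-the-envelope sketch of the trade-off (the 6444-byte file size is just an example I picked):

    #include <stdio.h>

    int main(void)
    {
        long size = 6444;            /* e.g. a file a bit over 6 KB */
        long big = 4096, small = 512;

        /* Scheme (a): one block size - round up to whole 4K blocks. */
        long nbig_only  = (size + big - 1) / big;
        long waste_only = nbig_only * big - size;

        /* Scheme (b): 4K blocks, then 512-byte blocks for the tail. */
        long nbig   = size / big;
        long tail   = size - nbig * big;
        long nsmall = (tail + small - 1) / small;
        long waste_mixed = nbig * big + nsmall * small - size;

        printf("4K only: %ld blocks, %ld bytes wasted\n", nbig_only, waste_only);
        printf("mixed  : %ld x 4K + %ld x 512B blocks, %ld bytes wasted\n",
               nbig, nsmall, waste_mixed);
        return 0;
    }

For 6444 bytes, the single-size scheme needs two 4K blocks and wastes 1748 bytes, while the mixed scheme needs one 4K block plus five 512-byte blocks and wastes only 212.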

mmap() internals

It's widely known that the most significant feature of mmap() is that the file mapping is shared between many processes. But it's no less widely known that every process has its own address space.
The question is: where are memory-mapped files (more specifically, their data) actually kept, and how do processes get access to this memory?
I don't mean *(pa+i) and other high-level stuff; I mean the internals of the process.
This happens at the virtual memory management layer in the operating system. When you memory-map a file, the memory manager basically treats the file as if it were swap space for the process. As you access pages in your virtual address space, the memory manager has to interpret them and map them to physical memory. The first time you touch a page that isn't resident, this causes a page fault, at which time the OS must map a chunk of disk space to a chunk of physical memory and resolve the memory mapping. With mmap, it simply does so from your file instead of from its own swap space.
If you want lots of details of how this happens, you'll have to tell us which operating system you're using, as implementation details vary.
This is very implementation-dependent, but the following is one possible implementation:
When a file is first memory-mapped, the data isn't stored anywhere in memory at first; it's still on disk. The virtual memory manager (VMM) allocates a range of virtual addresses to the process for the file, but those addresses aren't immediately added to the page table.
When the program first tries to read or write one of those addresses, a page fault occurs. The OS catches the page fault, figures out that the address corresponds to a memory-mapped file, and reads the appropriate disk sector into an internal kernel buffer. Then it maps the kernel buffer into the process's address space and restarts the user instruction that caused the page fault. If the faulting instruction was a read, we're all done for now. If it was a write, the data is written to memory and the page is marked dirty. Subsequent reads or writes of data within the same page do not require disk access, since the data is already in memory.
When the file is flushed or closed, any pages which have been marked dirty are written back to disk.
Using memory-mapped files is advantageous for programs that read or write disk sectors in a very haphazard manner: you read only the disk sectors that are actually used, instead of reading the entire file.
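To see this laziness in action on Linux, you can ask mincore() which pages of a mapping are resident before and after touching one. A minimal sketch (the file name is made up, and the counts are "typically", since the page cache may already hold parts of the file):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void report(void *addr, size_t len, long pagesz)
    {
        size_t npages = (len + pagesz - 1) / pagesz;
        unsigned char *vec = malloc(npages);
        if (vec && mincore(addr, len, vec) == 0) {
            size_t resident = 0;
            for (size_t i = 0; i < npages; i++)
                resident += vec[i] & 1;
            printf("%zu of %zu pages resident\n", resident, npages);
        }
        free(vec);
    }

    int main(void)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        report(map, st.st_size, pagesz);  /* typically 0 pages resident   */
        volatile char c = map[0];         /* fault in the first page      */
        (void)c;
        report(map, st.st_size, pagesz);  /* now at least 1 page resident */

        munmap(map, st.st_size);
        close(fd);
        return 0;
    }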
I'm not really sure what you are asking, but mmap() sets aside a chunk of virtual memory to hold the given amount of data (usually anonymous memory, though it can also be file-backed).
A process is an OS entity, and it gains access to memory-mapped areas through the OS-prescribed method: calling mmap().
The kernel has internal buffers representing chunks of memory. Any given process is assigned a memory mapping in its own address space which refers to that buffer. A number of processes may have their own mappings, but they all end up resolving to the same chunk (via the kernel buffer).
This is a simple enough concept, but it can get a little tricky when processes write. To keep things simple, in the read-only case there is usually copy-on-write functionality that is only used as needed.
Any data lives in some form of storage: in most cases an HDD, in embedded systems perhaps flash memory or even RAM itself (initramfs). Barring the last case, data from storage is frequently cached in RAM. RAM is logically divided into pages, and the kernel maintains a list of descriptors which uniquely identify each page.
So, at bottom, accessing data means accessing physical pages. Each process gets its own address space, which consists of many vm_area_structs, each identifying a mapped section of the address space. In a call to mmap, a new vm_area_struct may be created, or it may be merged with an existing one if the addresses are adjacent.
A new virtual address is returned by the call to mmap. New page tables are also created, containing the mapping from the newly created virtual addresses to the physical addresses where the real data resides. A mapping can be backed by a file, or anonymous as with malloc. The process address space structure mm_struct uses its pgd_t pointer (page global directory) to reach the physical pages and access the data.
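On Linux, each such vm_area_struct shows up as a line in /proc/self/maps, so you can watch a new mapping appear; a minimal sketch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* An anonymous 1 MB mapping, like malloc() might create internally. */
        void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        printf("new mapping at %p\n", p);
        /* The region containing p appears as one line in this listing. */
        system("cat /proc/self/maps");

        munmap(p, 1 << 20);
        return 0;
    }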