Where hashed page table is stored - operating-system

What I understand is that we can't guarantee a large amount (larger than the page size) of contiguous memory. If the page table itself is too large to be stored in one page, that is a problem. So we page the page table again, which is called a multilevel page table. But a multilevel page table is not a good choice if the address is wider than 32 bits, because more levels cost more computation.
To avoid this, a hashed page table is used.
From my understanding, the hashed page table's [indexable] size should be under the page size. So for a large address size there are going to be lots of collisions. If the page offset is 12 bits, the page table consists of 2^52 entries and the hash table size is going to be 2^12 (approximately; I don't know the exact calculation), and then there is a 2^40-entry linked list per index. So how is this going to be feasible? So my assumption is that the hash table is stored using other methods or elsewhere. The Operating System Concepts book didn't explain much about it, and neither did other sites.
I have read operating system concepts ninth edition page 380.

What I understand is that we can't guarantee a large amount (larger than the page size) of contiguous memory.
Why? Often a physical memory manager has to be able to handle the allocation of physically contiguous buffers for (some) device drivers.
So we page the page table again, which is called a multilevel page table. But a multilevel page table is not a good choice if the address is wider than 32 bits, because more levels cost more computation.
Why? Most CPUs use multilevel page tables, and then have a TLB ("translation look-aside buffer") to avoid the cost of looking things up in the page tables. Modern 80x86 goes further and also has higher level paging structure caches (in addition to TLBs).
From my understanding, the hashed page table's [indexable] size should be under the page size. So for a large address size there are going to be lots of collisions. If the page offset is 12 bits, the page table consists of 2^52 entries and the hash table size is going to be 2^12 (approximately; I don't know the exact calculation), and then there is a 2^40-entry linked list per index. So how is this going to be feasible?
The thing is, if the translation isn't in the hash table (e.g. because of limited hash table size), usually the CPU generates a fault to ask the OS for assistance, and the OS figures out the translation and shoves it into the hash table (after evicting something else from the hash table to make room). Of course the OS will probably use its own multilevel page table to figure out the translation (to shove into the hash table), so the whole "hash table" thing ends up being a whole layer of annoying extra bloat (compared to CPUs that support multilevel page tables themselves).
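The miss-and-refill behaviour described above can be sketched roughly like this. It is a toy model and not any real CPU's table format: the class name `HashedPageTable`, the `os_lookup` callback, the bucket count, the chain limit and the evict-oldest policy are all assumptions made for illustration.

```python
from collections import deque

PAGE_SHIFT = 12  # 4 KiB pages (assumed)


class HashedPageTable:
    """Toy fixed-size hash table of translations with short chains."""

    def __init__(self, os_lookup, buckets=8, chain_limit=2):
        self.os_lookup = os_lookup      # stand-in for the OS's own tables
        self.buckets = [deque() for _ in range(buckets)]
        self.chain_limit = chain_limit
        self.faults = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        chain = self.buckets[vpn % len(self.buckets)]
        for entry_vpn, frame in chain:  # "hardware" search of one short chain
            if entry_vpn == vpn:
                return (frame << PAGE_SHIFT) | offset
        # Miss: fault to the OS, which finds the translation elsewhere.
        self.faults += 1
        frame = self.os_lookup(vpn)     # raises KeyError if truly unmapped
        if len(chain) >= self.chain_limit:
            chain.popleft()             # evict the oldest entry to make room
        chain.append((vpn, frame))
        return (frame << PAGE_SHIFT) | offset


# The OS's authoritative mapping, here just a dict: vpn -> frame.
os_tables = {vpn: vpn + 100 for vpn in range(64)}
hpt = HashedPageTable(os_tables.__getitem__)

first = hpt.translate(0x5123)   # miss, refilled by the "OS"
second = hpt.translate(0x5456)  # hit, same page
```

The point of the sketch is that the hash table never has to hold every possible translation. It only holds the ones that fit, and the chains stay short because older entries are thrown out and recomputed on demand.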

Related

Different Page Sizes for Processes

As part of the virtual to physical address conversion, for each process a table of mappings between virtual to physical addresses is stored. If a process is scheduled next the content of the page table is loaded into the MMU.
1) Where is the page table for each process stored? As part of the process control block?
2) Does the page table contain entries for not allocated memory so a segfault can be detected (more easily)?
3) Is it possible (and used in any known relevant OS) that one process does have multiple page frame sizes? Especially if question 2 is true it is very convenient to map huge page tables to non existing memory to keep the page table as small as possible. It will still allow high precision in mapping smaller frames to the memory to keep external (and internal) fragmentation as small as possible? This of course requires an extra field storing the frame size for each entry. Please point out the reason(s) if my "idea" cannot exist.
1) They could be, but most OS's have a notion of an address space which a process is attached to. The address space typically contains a description of the sorts of mappings that have been established, and pointers to the page structure(s). If you consider the operation of exec(2), at a certain level of abstraction it merely involves creating a new address space, populating it, then attaching the process to it. Once the operation is known to succeed, the old address space can simply be discarded.
2) It depends upon the mmu architecture of the machine. In a forward mapped arrangement (x86, armv[78]), the page tables form a sort of tree structure, but instead of having the conventional 2 or 3 items per node, there are hundreds or thousands of them. The x86-classic has a 2 level structure, where each of the 1024 entries in the first level points to a pagetable which covers 2^22 bytes (4 MiB) of address space. Invalid entries, either at the inner or leaf level, can represent unmapped space; so in x86-classic, if you have a very small address space, you only need a root table, and a single leaf level table.
3) Yes, multiple page sizes have been supported by most OSes since the early 2000s. Again, in forward mapped ones, each of the levels of the tree can be replaced by a single large page for the same address space as that table level. x86-classic only had one size; later editions supported many more.
3a) There is no need to use large pages to do this -- simply having an invalid page table is sufficient. In x86-classic, the least significant bit of the page table/descriptor entry indicates the validity of the entry.
Your idea exists.
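As a rough illustration of points 2) and 3a), here is a minimal sketch of a two-level walk in the classic 32-bit layout (10-bit directory index, 10-bit table index, 12-bit offset, validity in bit 0). The function names and the `tables` dict standing in for physical memory are made up for the example, and all other entry flag bits are ignored.

```python
PRESENT = 0x1    # bit 0 of an entry marks it valid
ENTRIES = 1024   # entries per table in the classic 32-bit layout


def split(vaddr):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def walk(directory, tables, vaddr):
    """Return the physical address, or None if any level is marked invalid.

    `directory` is a list of 1024 entries; `tables` maps a frame number to the
    list of 1024 entries stored in that frame (a stand-in for physical memory).
    """
    d, t, offset = split(vaddr)
    pde = directory[d]
    if not pde & PRESENT:
        return None                  # the whole 4 MiB region is unmapped
    pte = tables[pde >> 12][t]
    if not pte & PRESENT:
        return None                  # this single 4 KiB page is unmapped
    return (pte & ~0xFFF) | offset


# A tiny address space: one root table plus one leaf table.
directory = [0] * ENTRIES
leaf = [0] * ENTRIES
tables = {7: leaf}                   # the leaf table lives in (made-up) frame 7
directory[0] = (7 << 12) | PRESENT
leaf[1] = (0x42 << 12) | PRESENT     # virtual page 1 -> frame 0x42
```

Every address outside the first 4 MiB hits an invalid directory entry and fails immediately, without any leaf table existing for it.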
1) Where is the page table for each process stored? As part of the process control block?
Usually it's not "a page table". For some CPUs there's only TLB entries (Translation Lookaside Buffer entries - like a cache of what the translations are) where software has to handle "TLB miss" by loading whatever it feels like into the TLB itself, and where the OS might not use tables at all (e.g. could use "list of arbitrary length zones"). For some CPUs it's a hierarchy of multiple levels (e.g. for modern 64-bit 80x86 there's 4 levels); and in this case some of the levels may be in physical memory and some may be in swap space or somewhere else and some may be generated as needed from other data (a little bit like it would've been for "software handling of TLB miss"). In any case, if each process has its own virtual address space (e.g. and it's not some kind of "single-address space shared by many processes" scheme) it's likely that the process control block (directly or indirectly) contains a reference to whatever the OS uses (e.g. maybe a single "physical address for the highest level page table", but maybe a virtual address of a "list of arbitrary length zones" and maybe anything else).
2) Does the page table contain entries for not allocated memory so a segfault can be detected (more easily)?
If there are page tables then there must be a way to indicate "page not present", where "page not present" may mean that the memory isn't allocated but could also mean that the (virtual) memory was allocated but the entry for it hasn't been set (either because OS is generating the tables on demand, or because the actual data is in swap space, or...).
3) Is it possible (and used in any known relevant OS) that one process does have multiple page frame sizes?
Yes. It's relatively common for 64-bit 80x86 where there's 4 KiB pages, 2 MiB "large pages" (4 MiB in 32-bit mode without PAE), plus maybe 1 GiB "huge pages"; and done to reduce the chance of TLB misses (while also reducing memory consumed by page tables). Note that this is mostly an artifact of having multiple levels of page tables - an entry in a higher level table can say "this entry is a large page" or it can say "this entry is a lower level page table that might contain smaller pages". Note that in this case it's not "multiple page sizes in the same table", but is "fixed page size for each level".
Especially if question 2 is true it is very convenient to map huge page tables to non existing memory to keep the page table as small as possible. It will still allow high precision in mapping smaller frames to the memory to keep external (and internal) fragmentation as small as possible? This of course requires an extra field storing the frame size for each entry. Please point out the reason(s) if my "idea" cannot exist.
Converting a virtual address into a physical address (or some kind of fault to indicate the translation doesn't exist) needs to be very fast (because it happens extremely often). When you have "fixed page size for each level" it means you can extract some bits of the virtual address and use them as the index into the table; which is fast.
When you have "multiple page sizes in the same table" there's 2 options. The first option is to duplicate entries in the page table so that you can still extract some bits of the virtual address and use them as the index into the table; which (apart from minor differences in the way TLBs are managed - e.g. auto-detecting adjacent translations vs. being manually told) is effectively identical to not bothering at all; but there are some CPUs (ARM I think) that do this.
The other alternative is searching multiple entries in the page table to find the right entry, where the cost of searching reduces performance. I don't know of any CPU that supports this - performance is too important.
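To make the "fixed page size for each level" case concrete, here is a small sketch assuming the 4-level 64-bit 80x86 layout (9 index bits per level, 12 offset bits). The helper names are invented for the example. Each index is just a shift and a mask, and the page size at a level follows directly from how many address bits are left below it.

```python
OFFSET_BITS = 12   # 4 KiB base page
INDEX_BITS = 9     # 512 entries per table
LEVELS = 4


def indices(vaddr):
    """Return the table index used at each level, top level first."""
    mask = (1 << INDEX_BITS) - 1
    return [
        (vaddr >> (OFFSET_BITS + INDEX_BITS * level)) & mask
        for level in reversed(range(LEVELS))
    ]


def page_size_at(level):
    """Bytes mapped by one entry at `level` (0 = lowest table)."""
    return 1 << (OFFSET_BITS + INDEX_BITS * level)


def offset_in_page(vaddr, level):
    """Offset within a page whose mapping stops at `level`."""
    return vaddr & (page_size_at(level) - 1)
```

With these numbers `page_size_at(0)`, `page_size_at(1)` and `page_size_at(2)` come out as 4 KiB, 2 MiB and 1 GiB, which is why those are the sizes on offer: a large page is simply a walk that stops one or two levels early and treats the remaining bits as offset.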

How do Multi-Level Page Tables Actually Save Space

I've been trying to do some research on why multi level page tables save space, and I think I'm a little confused on how a page table itself works. I found the following from Cornell:
The page table needs one entry per page. Assuming a 4GB (2^32 byte) virtual and physical address space and a page size of 4kB (2^12 bytes), we see that the 2^32 byte address space must be split into 2^20 pages.
It is my understanding each process has its own page table. Does this mean that each process has 4GB of virtual address space? What is the point of the virtual address space being so huge? Why not allocate virtual pages as needed? Is it because the OS wants every possible address that can be made in the word size to map to a virtual page? Why not just prevent the program from dereferencing any virtual page number that is not a valid index for the page table?
I have read that one of the advantages of the multi-level page table is that it saves space by not having page table entries for virtual pages that are not in use. See below from Carnegie Mellon:
But why not just have a single level page table that has continuous entries - why would the process need PTE 1, 2, and then skip to 8? Why allow that? Even still, why do all the trailing, unused PTE's exist? Why not cut the page table short?
Consider a system with a 32-bit logical address space. If the
page size in such a system is 4 KB (2^12), then a page table may consist of over
1 million entries (2^20 = 2^32/2^12). Assuming that each entry consists of 4 bytes,
each process may need up to 4 MB of physical address space for the page table
alone.
Clearly, we would not want to allocate the page table contiguously in
main memory. One simple solution to this problem is to divide the page table
into smaller pieces. We can accomplish this division in several ways.
One way is to use a two-level paging algorithm, in which the page table
itself is also paged. For example, consider again the system with
a 32-bit logical address space and a page size of 4 KB. A logical address is
divided into a page number consisting of 20 bits and a page offset consisting
of 12 bits. Because we page the page table, the page number is further divided
into a 10-bit page number and a 10-bit page offset.
Two-level paging
--Silberschatz A., Galvin P.B.
So for the case proposed above we would use 2^10 * 4B = 4KB for the outer page table (p1) and only N * 2^10 * 4B = N * 4KB for inner page tables, where N is the number of inner page tables the process actually needs (each one covers 2^10 pages, i.e. 4MB of virtual address space). In total this is usually far less than the space needed for a non-leveled page table (4MB).
You should also notice that the process only occupies the pages and memory it needs; the maximum virtual address space only determines the maximum addressable, and thus occupiable, memory for a process given a system configuration (32/64 bit address space).
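The arithmetic above can be checked with a short sketch. The example process (a few pages at the bottom of the address space and a few at the top) is hypothetical, and the function names are made up for illustration.

```python
PAGE_SIZE = 4 * 1024                          # 4 KB pages
ENTRY_SIZE = 4                                # bytes per page table entry
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 2^10 entries fit in one page


def flat_table_bytes(address_bits=32):
    """One entry for every virtual page, mapped or not."""
    return (2 ** address_bits // PAGE_SIZE) * ENTRY_SIZE


def two_level_bytes(mapped_vpns):
    """Outer table plus one inner table per 4 MB region that is in use."""
    inner_tables = {vpn // ENTRIES_PER_TABLE for vpn in mapped_vpns}
    return (1 + len(inner_tables)) * ENTRIES_PER_TABLE * ENTRY_SIZE


# Hypothetical process: 16 pages of code/heap at the bottom of the address
# space and 4 pages of stack at the top.
mapped = list(range(16)) + list(range(2 ** 20 - 4, 2 ** 20))

flat = flat_table_bytes()          # 4 MB
paged = two_level_bytes(mapped)    # outer table + 2 inner tables = 12 KB
```

The flat table cannot simply be cut short here, because the stack pages sit at the very top of the address space. The two-level layout only pays for the two 4 MB regions that are actually touched.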
This saves memory (the word "space" is not specific enough) because the level 1 page table is much smaller than a single flat page table would be. We can keep just the level 1 page table in memory and page out some level 2 page tables to disk.
You can watch this YouTube video to learn about size of page table and multi-level page tables.

Hierarchical page tables vs. Inverted tables

I read about a page structure of the memory and can't get some points:
Page table: As I understand it, the processor (e.g. an Intel i5) has the page table and TLB integrated on its die, doesn't it? But this table doesn't contain the addresses of virtual pages, so the OS must have one more page table in main memory. Is that right?
Inverted tables: I understand that there is a page table, but this table contains the addresses of real blocks of memory. And I got nothing more. Where is this table located: in the processor, or does the OS provide it in main memory? What is the hash function for?
From the picture: PID is the process ID (what is it for?), p is the page number (physical page or virtual page? If it's a physical page, what is this table for?).
Please do not refer me to Wikipedia etc. I have read it already and couldn't understand it. Can someone explain it more clearly?
For learning purposes you should start with plain vanilla page tables. Ignore inverted page tables to get started because they are an oddball used in very few processors.
The simplest case is a single level page table. In that case, the logical address consists of a logical page number and an offset within that page. To translate from logical pages to physical page frames, you take the page number, use that as an index into the page table. The page table then specifies the physical page frame (if there is one) the page is mapped to.
The next level of complexity is a multi level page table. In that case, the logical page number is broken down into bit fields, where each field represents a level in the table. The most significant bit field is an index into the top level page table. The corresponding page table entry references another page table. The next most significant bit field is an index into that page table. The process repeats until you get to the last page table level where the entries specify physical page frames.
Note that in this system the page table maps from logical address to physical page frames. There is no direct mapping between physical page frames and logical addresses.
For inverted page tables you have to relearn everything. There is a single page table with an entry for each physical page frame. The page table indicates the corresponding virtual page (if any) mapped to it.
In the inverted page table system, the processor can map from physical page frames to logical pages directly. In order to map from logical pages to physical page frames, the processor has to scan the page table (relying heavily on caching).
The mechanics of normal page tables are pretty much the same among systems (the major difference being the number of levels). However, there is no such similarity in systems that use inverted page tables.
If the system using inverted page tables uses a single, system wide table (as opposed to one table per process) there must be a PID field in the table to resolve the ambiguity of processes having the same logical pages mapped to different physical page frames.
One way to do the lookup of logical page/PID combinations in the inverted page table is to use a separate hash table. That's the PID in your diagram. Your "p" appears to be logical page numbers.
To get around in the real world, you just need to know that inverted page tables exist and their basic operation.
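For what it's worth, the basic operation described above (one entry per physical frame, a PID to disambiguate, and a hash table to avoid scanning) can be sketched as follows. This is a simplified model under assumed names (`InvertedPageTable`, `anchor`, `map`, `lookup`); real systems that use inverted tables differ from each other in the details.

```python
class InvertedPageTable:
    """One entry per physical frame, found through a hash on (pid, vpn)."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames   # frame -> (pid, vpn), or None if free
        self.anchor = {}                    # hash bucket -> list of frame numbers
        self.num_buckets = num_frames

    def _bucket(self, pid, vpn):
        return hash((pid, vpn)) % self.num_buckets

    def map(self, pid, vpn):
        frame = self.frames.index(None)     # first free frame; ValueError if full
        self.frames[frame] = (pid, vpn)
        self.anchor.setdefault(self._bucket(pid, vpn), []).append(frame)
        return frame

    def lookup(self, pid, vpn):
        for frame in self.anchor.get(self._bucket(pid, vpn), []):
            if self.frames[frame] == (pid, vpn):   # PID resolves same-page ambiguity
                return frame
        return None                         # not resident: page fault


ipt = InvertedPageTable(num_frames=8)
a = ipt.map(pid=1, vpn=0x10)
b = ipt.map(pid=2, vpn=0x10)   # same logical page, different process
```

Note that the table size depends only on the number of physical frames (8 here), no matter how many processes exist or how large their address spaces are.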

Hierarchical Per-Process Page Tables: why don't we use a simple linear array?

I would like to know why we need hierarchical page tables in an OS that handles per-process page tables, using the PTBR and PTLR registers in the CPU (typically stored in the PCB).
Thanks to PTLR I can check the limit of page table size for the current process, so its page table will contain just entries for its address memory space (that will be not so large as system address memory space).
If virtual address space of a process isn't sparse (its virtual page numbers are 0, 1, 2, ...) I will have a process page table of at most some K entries: totally its size will be at most some MBs, and I think it would be better to use a simple contiguous array.
So, why are a lot of real solutions (e.g. x86 and x64) based on multi-level page tables (or hashed page tables)?
Thanks.
Because sparse virtual address space is good. Sparse address space allows the OS to crash a program that chases (some) wild pointers, and it makes prelinked shared libraries practical, and perhaps most useful of all, it allows your stack to grow from the "top" end of memory and your heap from the "bottom" end. You could of course define the page table index as a signed integer, which would allow you to implement the latter feature with just a simple array.
Also, think of "memory overcommit" allocation - when you malloc a few gigabytes the OS might say, "sure, fine!", knowing that most programs that ask for a few gigabytes turn out to use only a small fraction thereof. You could have problems supporting things like that with a simple array that isn't unnecessarily large.

Paging: Basic, Hierarchical, Hashed, and Inverted

With respect to operating systems and page tables, it seems there are 4 general methods to paging and page tables
Basic - A single page table which stores the page number and the offset
Hierarchical - A multi-tiered table which breaks up the virtual address into multiple parts
Hashed - A hashed page table which may often include multiple hashings mapping to the same entry
Inverted - The logical address also includes the PID, page number and offset. Then the PID is used to find the page in the table and the number of rows down the table is added to the offset to find the physical address for main memory. (Rough, and probably terrible definition)
I am just wondering what are the pros and cons of each method? It seems like basic is the easier method but may also take up more space in memory for a larger address space.
What else?
The key to building a usable page model is minimizing the unused space for entries that are not necessary. You want to minimize the amount of memory needed while keeping the computation cost of a memory lookup low.
Basic can take up a lot of memory: with a 32-bit address space, 4 KB pages and 4-byte entries, each process needs a 4 MB table, so a system running several dozen processes can spend hundreds of MB on page tables alone. It is therefore impractical.
Hierarchical reduces that memory a lot by only adding subtables that are actually in use. Still, every process has a root page table. And if the memory footprint of the processes is scattered, there may still be a lot of unnecessary entries in secondary tables. This is a far better solution regarding memory than Basic and introduces only a marginal computation increase.
Hashed works, but hash collisions have to be handled: several virtual page numbers can hash to the same slot, so each slot holds a chain of entries that must be searched for the matching virtual page number.
Inverted is usually combined with hashing to make the lookup practical. The memory use is very small (one entry per physical page frame, shared by all processes, plus some PID and chaining overhead). The problem is, if there is a hash collision (for example, several processes use the same virtual address) you will have to follow the chain information (just as in a linked list) until you find the entry with a matching PID and page number. This may produce a lot of computing overhead in addition to the hash computing, but will keep the memory footprint as small as possible.