According to my understanding it shouldn't be, since it's in kernel space and kernel space is non-pageable. But with a 64-bit address space I don't see how it can hold the full page table, since that would be prohibitively large. Any ideas on how this is achieved?
Also, I guess even holding it fully on disk would take a lot of space. Since most of the VM space would be unused, is there a way to limit the page table to contain only the used VM address ranges?
The page table is actually a tree: it consists of multiple child tables. The head (root) table stores pointers to child tables, the child tables may also store pointers to their own child tables, and so on (the last table in the chain stores the actual page table entries, of course). As most of a 64-bit address space is unused, it is not necessary to actually allocate memory for all the tables. The root table simply sets most of its pointers to null.
On x86_64, there are 4-5 levels of such indirection.
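For illustration, here is a minimal sketch in C of such a sparse tree, simulating the levels with ordinary pointers rather than a real MMU walk. The 9-index-bits-per-level layout matches 4-level x86_64, but the names and structure here are made up:

    /* Sparse 4-level "page table": a NULL pointer at any level means the
       whole region below it is unmapped and consumed no memory at all. */
    #include <stdint.h>
    #include <stdlib.h>

    #define ENTRIES 512   /* 9 index bits per level, as in 4-level x86_64 */

    typedef struct node { struct node *child[ENTRIES]; } node;

    /* Walk the tree for one virtual address. */
    static node *lookup(node *root, uint64_t vaddr)
    {
        node *n = root;
        for (int shift = 39; shift >= 12; shift -= 9) {
            unsigned idx = (vaddr >> shift) & (ENTRIES - 1);
            if ((n = n->child[idx]) == NULL)
                return NULL;        /* hole: nothing allocated here */
        }
        return n;   /* in a real table this last entry is the PTE itself */
    }

    /* Map an address, allocating intermediate tables only on first use. */
    static void map(node *root, uint64_t vaddr)
    {
        node *n = root;
        for (int shift = 39; shift >= 12; shift -= 9) {
            unsigned idx = (vaddr >> shift) & (ENTRIES - 1);
            if (n->child[idx] == NULL)
                n->child[idx] = calloc(1, sizeof(node));  /* on demand */
            n = n->child[idx];
        }
    }

A process that touches only a few megabytes of address space therefore allocates only a handful of these tables, no matter how large the virtual address space is.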
The page table only needs to hold some subset of the pages that are available. There is no rule that it be complete. When an attempt is made to access a virtual address not mapped by the page table, the kernel is invoked. It can then put that mapping into the page table, removing other mappings that haven't been used recently if desired.
The OS is free to keep every possible mapping that a process might access in the page tables if it wants. Alternatively, it can keep only those recently used if it would prefer that arrangement.
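As a toy illustration of the second arrangement, here is a sketch in which the "page table" holds only a handful of recently used pages and a fault handler refills it. The 4-slot table and FIFO eviction are purely illustrative, not taken from any real kernel:

    #include <stdint.h>
    #include <stdio.h>

    #define SLOTS 4                    /* pretend only 4 pages fit in the table */

    static uint64_t mapped_vpn[SLOTS]; /* virtual page numbers currently mapped */
    static int      valid[SLOTS];
    static int      victim;            /* round-robin eviction hand */

    /* The "kernel" side: install a mapping, evicting an older one if needed. */
    static void handle_fault(uint64_t vpn)
    {
        printf("fault on vpn %llu -> slot %d\n", (unsigned long long)vpn, victim);
        mapped_vpn[victim] = vpn;
        valid[victim] = 1;
        victim = (victim + 1) % SLOTS;
    }

    /* The "hardware" side: hit if mapped, otherwise invoke the kernel. */
    static void touch(uint64_t vpn)
    {
        for (int i = 0; i < SLOTS; i++)
            if (valid[i] && mapped_vpn[i] == vpn)
                return;                /* translation present: no fault */
        handle_fault(vpn);
    }

    int main(void)
    {
        for (uint64_t v = 0; v < 6; v++) touch(v);  /* vpns 0 and 1 get evicted */
        touch(0);                                   /* so this faults again */
        return 0;
    }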
Yes.
Quote from the Wikipedia article on page tables:
It was mentioned that creating a page table structure that contained mappings for every virtual page in the virtual address space could end up being wasteful. But, we can get around the excessive space concerns by putting the page table in virtual memory, and letting the virtual memory system manage the memory for the page table.
However, part of this linear page table structure must always stay resident in physical memory to prevent circular page faults: a fault that looks for a key part of the page table which is itself not present, whose entry is in turn not present, and so on.
Related
As part of virtual-to-physical address translation, a table of mappings between virtual and physical addresses is stored for each process. When a process is scheduled next, the contents of its page table are loaded into the MMU.
1) Where is the page table for each process stored? As part of the process control block?
2) Does the page table contain entries for unallocated memory, so a segfault can be detected (more easily)?
3) Is it possible (and used in any known relevant OS) for one process to have multiple page frame sizes? Especially if question 2 is true, it would be very convenient to map huge pages to non-existent memory to keep the page table as small as possible, while still allowing high precision in mapping smaller frames to memory to keep external (and internal) fragmentation as small as possible. This of course requires an extra field storing the frame size for each entry. Please point out the reason(s) if my "idea" cannot exist.
1) They could be, but most OSes have a notion of an address space to which a process is attached. The address space typically contains a description of the sorts of mappings that have been established, and pointers to the page structure(s). If you consider the operation of exec(2), at a certain level of abstraction it merely involves creating a new address space, populating it, and then attaching the process to it. Once the operation is known to succeed, the old address space can simply be discarded.
2) It depends upon the MMU architecture of the machine. In a forward-mapped arrangement (x86, ARMv7/v8), the page tables form a sort of tree structure, but instead of the conventional 2 or 3 items per node, there are hundreds or thousands of them. Classic x86 has a 2-level structure, where each of the 1024 entries in the first level points to a page table covering 2^22 bytes (4 MiB) of address space. Invalid entries, at either the inner or the leaf level, can represent unmapped space; so on classic x86, if you have a very small address space, you only need a root table and a single leaf-level table.
3) Yes, multiple page sizes have been supported by most OSes since the early 2000s. Again, in forward-mapped schemes, each level of the tree can be replaced by a single large page covering the same address range as that table level. Classic x86 had only one size; later editions supported many more.
3a) There is no need to use large pages to do this -- simply having an invalid page table entry is sufficient. On classic x86, the least significant bit of the page table/directory entry indicates the validity of the entry.
Your idea exists.
1) Where is the page table for each process stored? As part of the process control block?
Usually it's not "a page table". For some CPUs there are only TLB entries (Translation Lookaside Buffer entries - like a cache of what the translations are), where software has to handle a "TLB miss" by loading whatever it likes into the TLB itself, and where the OS might not use tables at all (e.g. it could use a "list of arbitrary-length zones"). For some CPUs it's a hierarchy of multiple levels (e.g. for modern 64-bit 80x86 there are 4 levels); in this case some of the levels may be in physical memory, some may be in swap space or somewhere else, and some may be generated as needed from other data (a little like the "software handling of TLB miss" case).

In any case, if each process has its own virtual address space (i.e. it's not some kind of "single address space shared by many processes" scheme), it's likely that the process control block (directly or indirectly) contains a reference to whatever the OS uses (e.g. maybe a single "physical address of the highest-level page table", but maybe the virtual address of a "list of arbitrary-length zones", or anything else).
2) Does the page table contain entries for unallocated memory, so a segfault can be detected (more easily)?
If there are page tables then there must be a way to indicate "page not present", where "page not present" may mean the memory isn't allocated, but could also mean the (virtual) memory was allocated but the entry for it hasn't been set (either because the OS generates the tables on demand, or because the actual data is in swap space, or...).
3) Is it possible (and used in any known relevant OS) for one process to have multiple page frame sizes?
Yes. It's relatively common on 64-bit 80x86, where there are 4 KiB pages and 2 MiB (or 4 MiB) "large pages" (plus maybe 1 GiB "huge pages"); this is done to reduce the chance of TLB misses (while also reducing the memory consumed by page tables). Note that this is mostly an artifact of having multiple levels of page tables - an entry in a higher-level table can say "this entry is a large page" or it can say "this entry points to a lower-level page table that might contain smaller pages". Note that in this case it's not "multiple page sizes in the same table", but "a fixed page size for each level".
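A sketch of the check a walker might make, assuming the x86_64 convention that bit 7 ("PS") of a middle-level entry distinguishes a 2 MiB large page from a pointer to a lower-level table (the macro names are made up):

    #include <stdint.h>
    #include <stdbool.h>

    #define PTE_PRESENT (1ULL << 0)
    #define PTE_PS      (1ULL << 7)   /* "page size" flag */

    /* A level-2 entry with PS set maps a 2 MiB page directly,
       instead of pointing at a page table of 4 KiB pages. */
    static bool is_large_page(uint64_t level2_entry)
    {
        return (level2_entry & PTE_PRESENT) && (level2_entry & PTE_PS);
    }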
Especially if question 2 is true, it would be very convenient to map huge pages to non-existent memory to keep the page table as small as possible, while still allowing high precision in mapping smaller frames to memory to keep external (and internal) fragmentation as small as possible. This of course requires an extra field storing the frame size for each entry. Please point out the reason(s) if my "idea" cannot exist.
Converting a virtual address into a physical address (or some kind of fault to indicate the translation doesn't exist) needs to be very fast, because it happens extremely often. When you have "a fixed page size for each level", you can extract some bits of the virtual address and use them directly as the index into the table, which is fast (see the sketch below).
When you have "multiple page sizes in the same table" there's 2 options. The first option is to duplicate entries in the page table so that you can still extract some bits of the virtual address and use them as the index into the table; which (apart from minor differences in the way TLBs are managed - e.g. auto-detecting adjacent translations vs. being manually told) is effectively identical to not bothering at all; but there are some CPUs (ARM I think) that do this.
The second option is searching multiple entries in the page table to find the right one, where the cost of searching reduces performance. I don't know of any CPU that supports this - performance is too important.
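To illustrate why "a fixed page size for each level" is fast, here is the shift-and-mask index extraction for the 4-level x86_64 layout (9 index bits per level, 12 offset bits); the helper names are informal:

    #include <stdint.h>

    static unsigned pml4_index(uint64_t va) { return (va >> 39) & 0x1ff; }
    static unsigned pdpt_index(uint64_t va) { return (va >> 30) & 0x1ff; }
    static unsigned pd_index(uint64_t va)   { return (va >> 21) & 0x1ff; }
    static unsigned pt_index(uint64_t va)   { return (va >> 12) & 0x1ff; }
    static unsigned page_offset(uint64_t va){ return va & 0xfff; }

Each lookup is a shift and a mask - no searching, no comparisons against variable-sized ranges.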
What I understand is that we can't guarantee a large amount (larger than the page size) of contiguous memory. If the page table itself is too large to be stored in one page, that is a problem. So we page the page table itself, which is called a multilevel page table. But a multilevel page table is not a good choice if the address is wider than 32 bits, because more levels cost more computation.
To avoid this, a hashed page table is used.
From my understanding, the hashed page table (the indexable part) should fit within a page. So for a large address space there are going to be lots of collisions. If the page offset is 12 bits, the page table would consist of 2^52 entries, while the hash table would have about 2^12 slots (approximately - I don't know the exact calculation), leaving a linked list of around 2^40 entries per slot. How is this feasible? So my assumption is that the hash table is stored using other methods or elsewhere. The Operating System Concepts book didn't explain much about it, and other sites don't either.
I have read Operating System Concepts, ninth edition, page 380.
What I understand is that we can't guarantee a large amount (larger than the page size) of contiguous memory.
Why? Often a physical memory manager has to be able to handle the allocation of physically contiguous buffers for (some) device drivers.
So we page the page table itself, which is called a multilevel page table. But a multilevel page table is not a good choice if the address is wider than 32 bits, because more levels cost more computation.
Why? Most CPUs use multilevel page tables, and then have a TLB ("translation look-aside buffer") to avoid the cost of looking things up in the page tables. Modern 80x86 goes further and also has higher-level paging-structure caches (in addition to TLBs).
From my understanding, the hashed page table should fit within a page. So for a large address space there are going to be lots of collisions. If the page offset is 12 bits, the page table would consist of 2^52 entries, the hash table would have about 2^12 slots, and each slot would chain around 2^40 entries. How is this feasible?
The thing is: if the translation isn't in the hash table (e.g. because of the limited hash table size), usually the CPU generates a fault to ask the OS for assistance, and the OS figures out the translation and shoves it into the hash table (after evicting something else from the hash table to make room). Of course the OS will probably use its own multilevel page table to figure out the translation to shove into the hash table; so the whole "hash table" thing ends up being a layer of annoying extra bloat (compared to CPUs that support multilevel page tables themselves).
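A minimal sketch of the chained hash table being described, with made-up sizes and hash function; note how a tiny bucket count forces colliding virtual page numbers to share a chain:

    #include <stdint.h>
    #include <stddef.h>

    #define BUCKETS (1 << 12)         /* tiny compared with 2^52 pages */

    struct hpte {
        uint64_t vpn;                 /* virtual page number (the key) */
        uint64_t pfn;                 /* physical frame it maps to */
        struct hpte *next;            /* chain for hash collisions */
    };

    static struct hpte *bucket[BUCKETS];

    static struct hpte *hpt_lookup(uint64_t vpn)
    {
        /* multiplicative hash; top 12 bits pick one of 2^12 buckets */
        uint64_t h = (vpn * 0x9e3779b97f4a7c15ULL) >> 52;
        for (struct hpte *e = bucket[h]; e != NULL; e = e->next)
            if (e->vpn == vpn)
                return e;             /* found the translation */
        return NULL;  /* miss: the OS faults and refills from its own tables */
    }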
I read about the page structure of memory and can't get some points:
Page table: As I understand it, the processor (like an Intel i5) has the page table and TLB integrated on its die, doesn't it? But this table doesn't contain addresses of virtual pages, so the OS must have one more page table in main memory. So?
Inverted tables: I understand that there is a page table, and this table contains addresses of real blocks of memory. And I got nothing more. Where is this table located - in the processor, or does the OS provide it in main memory? What is the hash function for?
From the picture: PID is the process ID (what is it for?), and p is a page number (a physical page or a virtual page? If it's a physical page, what is this table for?).
Please do not refer me to the wiki etc. I have read it already and couldn't get it. Can someone explain it more clearly?
For learning purposes you should start with plain vanilla page tables. Ignore inverted page tables to get started, because they are an oddball used in very few processors.
The simplest case is a single-level page table. In that case, the logical address consists of a logical page number and an offset within that page. To translate from logical pages to physical page frames, you take the page number and use it as an index into the page table. The page table entry then specifies the physical page frame (if there is one) that the page is mapped to.
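A tiny sketch of that single-level translation, using made-up sizes (16-bit logical addresses, 4 KiB pages; a fuller version would also keep a valid bit per entry):

    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define NPAGES     (1 << (16 - PAGE_SHIFT))   /* 16 pages of logical space */

    static uint32_t page_table[NPAGES];   /* physical frame number per page */

    static uint32_t translate(uint16_t la)
    {
        uint16_t page   = la >> PAGE_SHIFT;             /* logical page number */
        uint16_t offset = la & ((1 << PAGE_SHIFT) - 1); /* offset within page */
        return (page_table[page] << PAGE_SHIFT) | offset;
    }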
The next level of complexity is a multi-level page table. In that case, the logical page number is broken down into bit fields, where each field represents a level in the table. The most significant bit field is an index into the top-level page table. The corresponding page table entry references another page table. The next most significant bit field is an index into that page table. The process repeats until you get to the last page table level, where the entries specify physical page frames.
Note that in this system the page table maps from logical address to physical page frames. There is no direct mapping between physical page frames and logical addresses.
For inverted page tables you have to relearn everything. There is a single page table with an entry for each physical page frame. The page table indicates the corresponding virtual page (if any) mapped to it.
In the inverted page table system, the processor can map from physical page frames to logical pages directly. To map from logical pages to physical page frames, the processor has to scan the page table (relying heavily on caching).
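A sketch of that reverse lookup, with illustrative types and sizes; the linear scan is exactly why real implementations front the table with a hash, and why a PID field is needed, as explained below:

    #include <stdint.h>

    #define NFRAMES 1024              /* one entry per physical page frame */

    struct ipte {
        uint32_t pid;                 /* owning process */
        uint32_t page;                /* logical page mapped to this frame */
        int      valid;
    };

    static struct ipte frame_table[NFRAMES];

    /* Returns the frame number holding (pid, page), or -1 if unmapped. */
    static int ipt_find(uint32_t pid, uint32_t page)
    {
        for (int f = 0; f < NFRAMES; f++)
            if (frame_table[f].valid &&
                frame_table[f].pid == pid && frame_table[f].page == page)
                return f;             /* the index itself is the frame number */
        return -1;
    }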
The mechanics of normal page tables are pretty much the same among systems (the major difference being the number of levels). However, there is no such similarity in systems that use inverted page tables.
If the system using inverted page tables uses a single, system-wide table (as opposed to one table per process), there must be a PID field in the table to resolve the ambiguity of processes having the same logical pages mapped to different physical page frames.
One way to look up logical page/PID combinations in the inverted page table is to use a separate hash table; that is what the hash function in your diagram is for, and the PID is part of the lookup key. Your "p" values appear to be logical page numbers.
To get by in the real world, you just need to know that inverted page tables exist and their basic operation.
Why does every process (or any address space) have its own page table?
I think that if we used a single page table for all processes, then a process could access the address space of other processes, so we need a separate page table for each process. This means the pages which actually belong to a particular process will be valid, and all other pages, which belong to other processes' address spaces, will be marked invalid. Am I correct?
If yes, then why don't we add one more field, "process ID", to the page table to distinguish the address space of every process?
If not, why does every process (or any address space) have its own page table?
How can multilevel paging reduce the size of the page table?
After all, we added some more page tables (in multilevel paging) as overhead, and the actual page table is still in main memory.
Suppose we do 3 levels of paging, as 1 (closest to the CPU)->2->3, so we have a page table for each level. What information is included in each page table? I am worried about the 3rd-level page table, which contains the actual frame number where the data resides. Now, which page tables are used by processes?
All of them? Then the 3rd-level page table, which contains the actual frames, should be the same size as the original (single-level) page table, because it must have entries for all the frames used by physical memory too.
Yes, you are correct in saying that one of the reasons behind separate page tables is security concerns. The paging interface is exposed by the hardware to the OS. The hardware doesn't understand what a process is; process semantics are part of the OS design. For this reason we can't add a process ID to page tables. You can look at hardware manuals to learn how paging works.

Yes, you are right that multilevel paging doesn't reduce the size of a fully populated last-level page table. In my opinion, the hardware requirement is that the top-level page table must always be mapped in memory. If there is only one level, you always need to map all the page table pages even if they are not used. This may be why hardware exposes multiple levels of page tables.
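To make that concrete, a rough worked example with illustrative numbers: assume a 32-bit address space, 4 KiB pages, and 4-byte entries. A single-level table needs 2^20 entries = 4 MiB, resident at all times for every process. With two levels, only the 4 KiB root must always be mapped, plus one 4 KiB second-level table per 4 MiB region actually in use - so a small process might get by with 8-12 KiB of page tables instead of 4 MiB.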
With respect to operating systems and page tables, it seems there are 4 general approaches to paging and page tables:
Basic - A single page table which stores the page number and the offset
Hierarchical - A multi-tiered table which breaks up the virtual address into multiple parts
Hashed - A hashed page table which may often include multiple hashings mapping to the same entry
Inverted - The logical address also includes the PID, page number and offset. The PID is used to find the page in the table, and the number of rows down the table is added to the offset to find the physical address in main memory. (Rough, and probably terrible definition)
I am just wondering what the pros and cons of each method are? It seems like Basic is the easiest method but may also take up more space in memory for a larger address space.
What else?
The key to building a usable paging model is minimizing the unused space for entries that are not necessary. You want to minimize the amount of memory needed while keeping the computation cost of a memory lookup low.
Basic can take up a lot of memory (for a modern system using 4 GB of memory, that might amount to 300 MB just for the tables) and is therefore impractical.
Hierarchical reduces that memory a lot by only adding subtables that are actually in use. Still, every process has a root page table, and if the memory footprint of a process is scattered, there may still be many unnecessary entries in the secondary tables. This is a far better solution than Basic with regard to memory, and it introduces only a marginal increase in computation.
Hashed does not work because of hash collisions
Inverted is the solution that makes Hashed work. The memory use is very small (as big as a Basic table for a single process, plus some PID and chaining overhead). The problem is, if there is a hash collision (several processes use the same virtual address), you have to follow the chain (just as in a linked list) until you find the entry with a matching PID. This may add a lot of computing overhead on top of the hash computation, but it keeps the memory footprint as small as possible.