In our exam the question was asked, and I couldn't answer it. However, I wonder its answer.
If there exist 2^N bit virtual addressing, 2^M bit physical
addressing and 2^L kb page size. In single paging, what is the page
size?
This is unanswerable unless you make random/unfounded assumptions.
For a silly example, you could assume that there are only 2 physical pages and only one virtual page (M == N-1), and that the page table is the same size as a page (and only has one page table entry) and therefore a page table entry consumes 2^L bits where one of these bits are used to select the physical page and all of the other bits are used for other purposes (access permissions, tracking accessed/dirty, spare bits for the OS to use however it likes, ...).
Related
I am trying to understand why is Page Size specified as part of an ISA.
More specifically, I am looking for details where any of the hardware modules (MMU, TLB) (apart from the Operating System) use the Page Size information to provide a certain functionality.
Please let me know the reasons Page Size has to be part of the ISA instead of just being decided by the OS.
Thanks.
The TLB hardware has to know the page size to figure out whether a translation applies to an address or not. e.g. given a translation, does an address 2500 bytes above it use that translation or not?
Or to put it another way, the TLB has to know which address bits are part of the page offset (within a page), and which bits need translating from virtual to physical.
Also, on architectures with HW page walk, the whole page table format is part of the ISA, and the typical design uses the virtual page number as an index to find the right entry (e.g. x86-64's 4-level page tables). Not a linear or binary search through the page table to find an entry that contains the virtual address being searched for. Normally this same design is used for page tables walked by software, AFAIK.
It is possible to build a TLB where each entry has a mask to control how many address bits it matches. i.e. where a single TLB can have entries for pages of multiples sizes. This only works if pages have power-of-2 sizes and are naturally aligned (i.e. the start address of a page is always some multiple of its size, so zeroing the low bits of an address inside a page gives you the page-start address).
You could potentially use this with an extent-based page-table format, where you have one entry for each contiguous mapping instead of one entry for each page.
Page-walks would probably be more costly, having to check more entries for more mappings, but the same number of TLB entries could cover more address space.
In many cases OSes want to be able to easily unmap or even page out unused pages, and this conflicts with using huge pages that cover a mix of hot and cold data or especially code. (But normal fixed-size hugepages have this problem, too, so x86-64's 2M and 1G hugepages aren't always a win vs. standard 4k pages.)
Page size isn't a part of the ISA (what a compiler would normally emit) for x86_64. The instruction set architecture for x86_64 is formally known as Intel® 64 Architecture, and it is briefly described in section 2.2.10 (volume 1) of the Intel® 64 and IA-32 Architectures Software Developer’s Manual. It describes what an application program can see and do. There is something similar for ARMv8.
Instead, page size is left to the OS, and it isn't a part of the ISA. This is because page sizes can vary amongst implementations and can vary according to mode settings (4K/2M/4M/1G). x86_64 implementations present something like an ISA to the OS which Intel refers to as the system programming level (what an OS would use). That's described in Chapter 13 of volume 2 of Intel's Software Developer's Manual.
That level describes page sizes and modes. But a 'correct' application program should run with different page sizes on different systems in different page size modes.
In an OS that uses virtual memory, each process has a page table. Each page table maps a process's virtual memory pages to the system's physical memory pages and indicates whether a given page is currently valid (loaded in memory) or not.
Suppose that memory is running low and the OS needs to choose a page to evict from physical memory. There are different algorithms for this. For example, FIFO, LRU. Once the OS chooses a page to evict, how does it invalidate any existing references to the page?
If the victim page is currently used by the active process, then the OS must invalidate the mapping in the current process's page table. If the victim page is currently used by another process, the OS must invalidate the mapping in the other process's page table. In any case, how does the OS determine which page table to update (if any), and how does it know where the mapping lives in that page table without doing a linear search?
There is a detailed description of x86 page table structure beginning at slide 22 of this presentation:
http://www.scs.stanford.edu/12au-cs140/notes/l8.pdf
I also found some helpful overviews of virtual memory:
http://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/9_VirtualMemory.html
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/virtual.html
Similar Stack Overflow question without an answer:
how does linux update page table after context switch
Actually, the thing you are asking about called reverse mapping. For example, in Linux, you can find usefull function try_to_unmap_anon Inside page descriptor there is a field called mapping. This field is anon_vma for annonymous pages. As you can see this is not just ordinary struct, but also a list entry. There might be several anon_vmas for one page (see try_to_unmap_anon):
list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
exactly one per page mapping. All this vmas linked into list. That is how kernel knows which processes (and their page tables) are in play.
Now about how kernel determines the virtual address ... again the answer could be found here: vma_address
233 pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
234 unsigned long address;
235
236 address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
So we can answer your question shortly now: in order not to do the page tables scan, kernel stores everything it needs (for quick fetch) in page descriptor (struct page).
Usually an OS will pick a process for whom to reduce memory first. Then it will pick pages from within that process. Then it knows the page table to use.
You suggest picking a page then finding which process. That normally occurs in systems that have pageable kernels. In that case, the system page table is shared by all processes.
In the movie The Social Network, when Mark Zuckberg was in class, the teacher asked this question:
Suppose we're given a computer, with a 16-bit virtual address, and a page size of 256-bytes,the system uses one-level page tables that start at address hex 400, may be you want DMA (Direct Memory Access) on your 16-bit system. Who knows? The first pages are reserved for hardware flags, etc. Assume page-table entries have eight status bits. The eight status bits would then be ...
Mark Zuckberg answered:
One valid bit, one modified bit, one reference bit and five permission bits.
How did he get this?
http://chomaloma.blogspot.com.au/2011/02/social-network-inaccuracies-regarding.html
That does explain it a little
Intel nomenclature in parentheses. The 'valid' (present), 'modified' (dirty) and 'reference' (accessed) bits are the minimum set of bits you need for a demand paging manager and MMU.
The 'valid' (present) bit is used by the MMU to know whether the page is mapped to a valid physical address.
The 'modified' (dirty) bit is used by the demand paging manager to determine if the page being evicted needs to be written to backing media. As accessing backing media can be considered an expensive operation, you really want to keep this to a minimum--especially when writing to it as that is generally slower than reading from it.
The 'reference' (accessed) bit is useful to the demand paging manager to figure out how to age the pages it controls. You don't want to evict the most frequently used pages as that would require saving and/or loading them repeatedly from backing store (which has already been stated as SLOW).
The remaining five bits are gravy. They are free to use as permission and/or option bits. For example, can the page be accessed by supervisor and/or user threads? Is the page available for write, or is it read-only? What is the caching strategy to be used on the page?
Hope this helps.
Sparky
How did he get the answer?---That is just movie BS.
If you take the number of bits in the address and subtract the number of bits used to represent a page, you get the number of bits available for the processor to use as system status bits.
With that information, he could identify the number of system status bits
The usage of those bits is another story. The allocation of system status bits is system dependent. Maybe they exist, but I don't know of any 16-bit virtual addressing system. So he's not referring to any specific type of system.
A reference bit is not used by all systems (e.g. VMS). That's not even mandatory.
Hollywood magic.
I'm familiar with the MIPS architecture, which is has a software-managed TLB. So how and where you (the operating system) wants to store the page tables and the page table entries is completely up to you. For example I did a project with a single inverted page table; I saw others using 2-level page tables per process.
But what's the story with x86? From what I know the TLB is hardware-managed. Does x86 tell basically tell you, "Hey this is where the page table entries you're currently using need to go [physical address range]"? But wait, I've always thought x86 uses multi-level page tables, so would it tell you where to put the 1st level or something...? I'm confused.
Thanks for any help.
Upon entering protected mode, the CR3 register points to a "page directory" (you can put it anywhere you want before you enter protected mode), which is a page of memory (remember, a "small" page is 4 KiB, and a "large" page is 4 MiB) with 1024 page directory entries (PDEs) that point to to "page tables". Each entry is the top 10 bits of a pointer (the address of the page table), plus a bunch of flags that make up the bottom portion of the pointer (present, permission, dirty, etc.).
(The 1024 just comes from the fact that a page is 4096 bytes and a pointer is 4 bytes.)
Each "page table" is itself 1024 "page table entries" (PTEs), which, again, contains 1024 entries that point to physical pages in memory, along with a bunch of (almost the same) flags.
So, to translate a 32-bit virtual address, you take the top 10 bits of the pointer as an index into the table at CR3 (since there are 210 entries), and -- if that PDE is further subdivided (meaning it isn't a "large" page, which you can figure out from the flags) -- you take the top 20 bits of the PDE, look up the page table at that address, and index into it with the virtual address's next-topmost 10 bits. Then the topmost 20 bits refer you to the physical page, assuming the bottom 12 bits tell you the physical page is actually present.
If you're using Physical Address Extension (PAE), then you get another level in the hierarchy at the very top.
Note: for your own sanity (and maybe the CPU's), you'd probably want to map the page directory and the page table to themselves, otherwise things get confusing fast. :)
The TLB is hardware-managed -- so the caching of the page tables is transparent -- but there is an instruction, InvlPG, that invalidates a PTE in the the TLB for you. (I don't know exactly when you should use it and when you shouldn't.)
Source: http://wiki.osdev.org/Paging
Is it possible to provide physical address for a given virtual address in a direct way to the TLB on x86-64 architectures in long mode?
For example, lets say, I put zeros in PML4E, so a page fault exception will be triggered because an invalid address will be found, during the exception can the CPU tell the TLB by using some instruction that this virtual address is located at X physical page frame?
I want to do this because by code I can easily tell where the physical address would be, and this way avoid expensive page walk.
No, you need to put a page to the TLB. To be precise, you need to create/update appropriate PTE (with PDE and PDPE if needed). Everything around MMU management is somehow based on page tables and TLB. Even user/supervisor protection mode is done as a special flag of mapped page.
Why do you think that "page walk" is expensive operation? It is not expensive at all. To determine the PTE that must be updated you need to dereference only 4 pointers: PML4E -> PDPE -> PDE -> PTE. These entries are just indices in related tables. To get PML4E you need to use 39-47 bits of address taken during page fault handling and use the value as an index in PML4 table. To get PDPE you need 30-39 bits of an address as an index in PDE table and so on. It's not the thing that can slow down your system. I think allocation of a physical page takes more time than that.