Cache memory logic - cpu-architecture

A computer has 1 MB of RAM and a word size of 8 bits. It has a cache memory with 16 blocks and a block size of 32 bits. Show how the main memory address
1000 1111 1010 0101 1101 will be mapped to cache address, if
i) Direct cache mapping is used
ii) Associative cache mapping is used
iii)Two way Set associative cache mapping is used
Please enlighten me on how to solve this problem. I have looked all over and there is no detailed explanation of this.

32 bits is 4 bytes; you need 2 bits to address these 4 bytes (2^2 = 4), so you split off the 2 least significant bits from the address.
1000 1111 1010 0101 11-01
Direct mapped means it can go into only one place in the cache. There are 16 places in the cache, so we must peel off the next 4 least significant bits (2^4 = 16), getting
1000 1111 1010 01-01 11-01
so 0111 (=7) is the line that gets filled.
If (Fully) Associative cache mapping is used it can go in any of the 16 positions.
Two-way set associative cache mapping is like direct mapped, except that we split the cache in half (8 sets, 8 = 2^3), giving 2 possible positions in which the block can be stored.
1000 1111 1010 010-1 11-01
so 111 is the index and any of the 2 possible positions can be used.
Read all about caches here.
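The three splits above can be checked with a few bit operations. This is only a minimal sketch of the arithmetic in the answer; the function name `split_address` and its parameters are made up for illustration.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) bit fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

ADDR = 0b1000_1111_1010_0101_1101  # the 20-bit address from the question
OFFSET_BITS = 2                    # 4-byte block -> 2 offset bits

# Direct mapped: 16 lines -> 4 index bits.
direct = split_address(ADDR, OFFSET_BITS, 4)
# Fully associative: no index bits, the block may go in any line.
fully = split_address(ADDR, OFFSET_BITS, 0)
# Two-way set associative: 16 / 2 = 8 sets -> 3 index bits.
two_way = split_address(ADDR, OFFSET_BITS, 3)

print(bin(direct[0]), direct[1], direct[2])    # tag, line 7, offset 1
print(bin(two_way[0]), two_way[1], two_way[2]) # tag, set 7, offset 1
```

In the fully associative case the index field is empty, so everything above the offset is tag.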

Related

How to understand the physical address in this example?

The image relates to an example of address translation in virtual memory. The addresses of physical memory run from 0x000 to 0x0FC, then continue from 0x100 to 0x1FC, and so on. Why don't they go 0x000 to 0x0FF, then 0x100 to 0x1FF, etc.? What do the two lowest bits stand for?
Thank you for your answers. This photo came from MIT open course, and they didn't reveal more details about the address. But I finally figured it out in the later example of the courses.
The two lowest bits can always be zero as the following example:
Suppose that we have:
4GB of MM size.
64 lines of cache.
ONLY 1 WORD = 4 bytes PER CACHE LINE.
The address has 32 bits because the main memory is 4 GB.
The part of the address that selects the line has 6 bits because the cache has 64 lines.
And because the cache size is 2^6 * 4 B = 2^8 B:
=> The tag has 24 bits (log2(4 GB / 2^8 B)).
=> The lowest field has 2 bits (32 - 24 - 6).
Because there is only one word per block, the lowest bits, which act as a data boundary (this is what the course said), are always 0.
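The field widths above follow directly from the three sizes. Here is a small sketch of that calculation; `log2_exact` and `cache_fields` are hypothetical helper names, not something from the course.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, e.g. 64 -> 6."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def cache_fields(mem_bytes, lines, line_bytes):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    addr_bits = log2_exact(mem_bytes)
    index_bits = log2_exact(lines)
    offset_bits = log2_exact(line_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 4 GB of main memory, 64 lines, one 4-byte word per line.
fields = cache_fields(4 * 2**30, 64, 4)
print(fields)

# Word-aligned byte addresses step by 4, so the last one below 0x100 is 0x0FC
# and the two lowest bits of every such address are zero.
word_addresses = list(range(0x000, 0x100, 4))
print(hex(word_addresses[-1]))
```

This also shows why the figure lists 0x0FC rather than 0x0FF as the last address of a block: only word addresses are shown.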

Addressing a word inside memory frames

Suppose we have a 64 bit processor with 8GB ram with frame size 1KB.
Now main memory size is 2^33 B
So number of frames is 2^33 / 2^10 which is 2^23 frames.
So we need 23 bits to uniquely identify every frame.
So the address split would be 23 | 10 where 10 bits are required to identify each byte in a frame (total 1024 bytes)
As it is word addressable with each word = 8B, will the address split now be 23 | 7 as we have 2^7 words in each frame?
Also can the data bus size be different than word size ?
If suppose data bus size is 128 bits then does it mean that we can address two words and transfer 2 words at a time in a single bus cycle but can only perform 64 bit operations?
Most of the answers depend on how the system is designed. Also, there is a bit more to the picture than your question covers.
There is something called the available addressable space of a system. In a 32-bit application this would be 2^32 bytes and in a 64-bit application 2^64 bytes. This is called virtual memory. And there is physical memory, which is commonly referred to as RAM. If the application is built as 64 bits, then it is able to work as if 2^64 bytes of memory were available. The underlying hardware may not have 2^64 bytes of RAM available, which is taken care of by the memory management unit. Basically it breaks both virtual memory and physical memory into pages (you have referred to these as frames) and keeps the most frequently used pages in RAM. The rest are stored on the hard disk.
Now you state that the RAM is 8 GB, which gives 2^33 addressable locations. When you say the processor is 64 bits, I presume you are talking about a 64-bit system which supports 2^64 addressable locations. Now remember that the application is free to access any of these 2^64 locations. The number of virtual pages is 2^64/2^10 = 2^54. Now we need to know which virtual page is mapped to which physical page. There is a table called the page table which has this information. So we take the upper 54 bits of the address and index into this table, which returns the physical page number. There are 2^33/2^10 = 2^23 physical pages, so that number is 23 bits wide. We combine these 23 bits with the lowest 10 bits of the virtual address, which gives us the physical address. In a general CPU, once the address is calculated, we don't just go and fetch it. First we check whether it's available in the cache, all the way down the hierarchy. If it's not available, a fetch request will be issued. When a cache issues a fetch request to main memory, it fetches an entire cache line (which is usually a few words).
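The translation step described above can be sketched in a few lines. This is a simplified illustration, not how any particular MMU is implemented: the page table is modelled as a plain dictionary, and the names `translate` and `page_table` are invented for the example.

```python
PAGE_BITS = 10          # 1 KB pages -> 10 offset bits
PHYS_ADDR_BITS = 33     # 8 GB of RAM -> 33-bit physical addresses
FRAME_BITS = PHYS_ADDR_BITS - PAGE_BITS  # 23 bits of frame number

def translate(vaddr, page_table):
    """Map a virtual address to a physical one via a one-level page table."""
    vpn = vaddr >> PAGE_BITS                  # upper bits: virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # lower 10 bits: byte in the page
    frame = page_table[vpn]                   # a missing key stands in for a page fault
    return (frame << PAGE_BITS) | offset

# Hypothetical mapping: virtual page 5 lives in physical frame 3.
page_table = {5: 3}
vaddr = (5 << PAGE_BITS) | 0x2A
paddr = translate(vaddr, page_table)
print(hex(paddr))
```

The offset passes through unchanged; only the page number is replaced by the frame number.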
I'm not sure what you mean by the following question.
As it is word addressable with each word = 8B, will the address split now be 23 | 7 as we have 2^7 words in each frame?
Memories are typically designed to be byte addressable. Therefore you'll need all 33 bits to locate a byte in physical memory: 23 bits for the frame and 10 bits for the byte within it.
Also can the data bus size be different than word size ?
Yes you can design a data bus to have any width, but having it less than a byte would be painful.
If suppose data bus size is 128 bits then does it mean that we can
address two words and transfer 2 words at a time in a single bus cycle
but can only perform 64 bit operations?
Again the question is a bit unclear. If the data bus is 128 bits wide and your cache line is wider than 128 bits, it'll take multiple cycles to return the data in response to a cache miss. You won't be doing operations on partial data in the cache (at least to the best of my knowledge), so you'll wait until the entire cache line is returned. And once it's there, there is no restriction on what operations you can do on that line.
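As a rough illustration of the "multiple cycles" point, the number of bus transfers per cache line is just a ceiling division. The 64-byte line size below is an assumption chosen for the example, and real buses add burst and protocol details that this ignores.

```python
def bus_transfers(line_bytes, bus_bits):
    """Number of bus transfers needed to move one cache line."""
    line_bits = line_bytes * 8
    return -(-line_bits // bus_bits)   # ceiling division on integers

# Assumed 64-byte cache line on a 128-bit data bus.
transfers_128 = bus_transfers(64, 128)
# The same line on a 64-bit bus needs twice as many transfers.
transfers_64 = bus_transfers(64, 64)
print(transfers_128, transfers_64)
```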

What is the total amount of virtual memory covered by one entry of page tables at each level?

The following parameters apply to a system employing a 40-bit virtual address and
1G bytes of physical (main) memory. Word size is 64 bits (8 bytes). Addresses point
to bytes and are aligned on byte boundaries. We use the following notation for an i-bit
address: Ai-1...A2,A1,A0 where Ai-1 is the most significant bit of the address and A0
is the least significant bit of the address. The virtual address is denoted by V39-V0
and the physical address is denoted by P29-P0.
Page size: 64 K bytes
Page table: three-level page table
The virtual page number is split in 3 fields of 8 bits each.
Entries in all tables are 32 bits (4 bytes).
This is what I have found so far,
Since it is a 40 bit virtual address and the Page Size is 64kB (2^16), 16 bits are for offset and we subtract 16 from 40. The remaining 24 bits are for the Virtual Page Number (VPN). The VPN is split in 3 fields of 8 bits each. So we have a three level page table. Each table has 2^8 entries and the size of each table is 2^8 * 4 bytes = 1024 bytes.
From here how would we proceed and find the total amount of virtual memory covered by one entry of page tables at each level?
At the lowest level each entry points to a single page, so working out the amount of virtual memory is trivial: it's the size of 1 page. At each of the higher levels, one entry represents n entries in the lower table (2^8 in this case). So for the second level it's n * the amount covered by a bottom-level entry, or 2^8 * the size of a page. Then use the amount covered by a second-level entry to repeat this calculation for the third level.
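Carrying that calculation through with the numbers from the question gives the following. This is a short sketch; `coverage_per_entry` is a made-up helper name.

```python
PAGE_SIZE = 64 * 1024        # 64 KB pages
ENTRIES_PER_TABLE = 2 ** 8   # each 8-bit VPN field indexes 256 entries

def coverage_per_entry(levels):
    """Bytes of virtual memory covered by one entry, lowest level first."""
    coverage = []
    size = PAGE_SIZE
    for _ in range(levels):
        coverage.append(size)
        size *= ENTRIES_PER_TABLE   # one entry above covers a whole table below
    return coverage

levels = coverage_per_entry(3)
for name, size in zip(["lowest", "middle", "top"], levels):
    print(name, size)

# Sanity check: 256 top-level entries together span the 40-bit address space.
total = levels[-1] * ENTRIES_PER_TABLE
```

So one entry covers 64 KB at the lowest level, 16 MB at the middle level, and 4 GB at the top level.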

Virtual Memory page table growth

When processes are allowed to grow larger than memory, page tables also grow very large. How could we organize page tables and TLB to keep access times as quick as possible for codes with good locality? For example, assume physical memory is 512K, each page is 1K, and a TLB of size 128. If we assume most processes are 256K or less, then we could allocate a fixed-size page table with 256 entries. Now in the unexpected case, where the page table grows larger than 256 entries, how should we organize it? What implications does your design have on average access time and on the maximum virtual memory size of a program?
The solution used on x86 is to have "sparse" page tables, that is there isn't a full table to contain a mapping for each page. Rather a two level mechanism is used:
The virtual memory is 4 GB large. A single page has size 4 KB. Using a one level approach would thus require a table of 4 GB / 4 KB = 1024 * 1024 entries. If an entry consumed 4 bytes, then every process would need 4 MB just to store its table.
Using a two level approach we have a page directory with 1024 entries, each of size 4 bytes (making it fit perfectly into a single 4 KB page). Thus each entry in that directory manages 4 GB / 1024 = 4 MB. If (and only if) there should be a mapping of some pages of virtual memory to physical memory in that 4 MB range, then the entry points to an instance of another structure, a page table. That contains 1024 entries, too, so each one manages 4 MB / 1024 = 4 KB, exactly one page.
If there's a process that just needs a single page to operate, then using the single level approach we need 4 MB to store its virtual memory configuration. Using the two level mechanism described above, we need 4 KB for the page directory and 4 KB for the page table containing the mapping for that single page. Thus only 8 KB are used to store the virtual memory configuration.
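The 4 MB versus 8 KB comparison can be reproduced with a small model of the two schemes. This is only a sketch of the bookkeeping under the sizes given above (4 KB pages, 4-byte entries, 1024 entries per table); the function names are invented.

```python
PAGE = 4096            # bytes per page
ENTRY = 4              # bytes per table entry
ENTRIES = 1024         # entries per directory / page table
VIRTUAL = 4 * 2**30    # 4 GB virtual address space

def single_level_bytes():
    """A flat table always has one entry per virtual page."""
    return (VIRTUAL // PAGE) * ENTRY

def two_level_bytes(mapped_pages):
    """One page directory plus one page table per touched 4 MB region."""
    tables = {vpn // ENTRIES for vpn in mapped_pages}
    return ENTRIES * ENTRY + len(tables) * ENTRIES * ENTRY

flat = single_level_bytes()
one_page = two_level_bytes([0])          # a single mapped page
two_regions = two_level_bytes([0, 1024]) # pages in two different 4 MB regions
print(flat, one_page, two_regions)
```

Pages that fall into the same 4 MB region share a page table, so the cost grows only when a new region is touched.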
If the process needs additional memory at runtime, and if that memory is at a (virtual) address not within the 4 MB range managed by its page table, then a second page table needs to be provided, increasing the memory used to store the mappings by another 4 KB.
Using this two level approach slightly increases access times for pages not in the TLB, because the memory management unit needs to access two memory locations (the page directory, and afterwards the respective page table) to be able to compute the physical address.
The TLB is unaffected by this: It stores mappings of single pages. How these mappings have been established isn't relevant to its operation.
Let's apply this to the example configuration you gave above:
A single page has 1 KB size. Most processes, as you said, will have 256 KB or less memory. But we want to be able to have processes using more virtual memory.
If we choose to have the last level handle a full 256 KB, then we have
256 KB / 1 KB = 256 entries. Assuming a 32 bit architecture, this in turn means we can have each entry with size of 4 byte (to hold an address). 256 entries * 4 Byte = 1 KB and thus a full page. Nice.
To be able to handle more virtual memory than 256 KB we add another layer. Because it's easy, we let this level use tables with 256 entries (4 bytes each), too, to make such a table exactly fit into a page.
This gives us a virtual memory of 256 * 256 KB = 64 MB. A virtual address in that system would then be 26 bits long:
DDDDDDDDTTTTTTTTPPPPPPPPPP
D := Index to page directory, highest level.
8 bit to be able to index 256 entries.
T := Index to page table, lower level.
8 bit to be able to index 256 entries.
P := Offset inside page.
10 bit to be able to address 1024 bytes.
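Extracting the three fields from such a 26-bit address is a matter of shifts and masks. A minimal sketch, with `split_vaddr` as a hypothetical name:

```python
OFFSET_BITS = 10   # P: offset inside a 1 KB page
TABLE_BITS = 8     # T: index into a page table
DIR_BITS = 8       # D: index into the page directory

def split_vaddr(vaddr):
    """Return (directory index, table index, page offset) of a 26-bit address."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    table = (vaddr >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    directory = (vaddr >> (OFFSET_BITS + TABLE_BITS)) & ((1 << DIR_BITS) - 1)
    return directory, table, offset

# Build an address from known fields and split it again.
vaddr = (3 << 18) | (7 << 10) | 5
parts = split_vaddr(vaddr)
print(parts)

# Total addressable virtual memory with this layout.
max_virtual = 1 << (DIR_BITS + TABLE_BITS + OFFSET_BITS)
```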
A process using less than 256 KB needs then 2 KB to manage its memory configuration. Each additional 256 KB of virtual memory needed add another 1 KB of configuration memory.
Assuming the TLB can hold 128 entries (your question is a bit unclear here), it would need 128 * (16 + X - 10) bits, where X is the number of bits used to address physical memory. (Though this depends on the actual implementation. I was thinking of 16 bits per entry to store the indices into the paging structures, plus the upper bits of the physical address, not counting the 10 offset bits.)
I hope this answers your question. An actual implementation will need to make design choices based on a lot of constraints.

What is page table entry size?

I found this example.
Consider a system with a 32-bit logical address space. If the page
size in such a system is 4 KB (2^12), then a page table may consist of
up to 1 million entries (2^32/2^12). Assuming that
each entry consists of 4 bytes, each process may need up to 4 MB of physical address space for the page table alone.
What is the meaning of each entry consists of 4 bytes and why each process may need up to 4 MB of physical address space for the page table?
A page table is a table of conversions from virtual to physical addresses that the OS uses to artificially increase the total amount of main memory available in a system.
Physical memory is the actual storage located at addresses in memory (DRAM), while virtual memory is where the OS "lies" to processes about where their data lives, in order to do things like allow for 2^64 bytes of address space even though the machine has far less RAM installed, for example 2^32 bytes. (2^32 bytes is 4 gigabytes; 2^64 bytes is 16 exabytes.)
In this example the page table can reach 4096 KB (4 MB) per process, and the number of page table entries in use grows as the process needs more address space. Pages can also be smaller or larger than 4 KB; it's just that 4 KB is a common default page size.
Note that a page table is a table of page table entries. The entry size and the table size are separate quantities: here the entries are 4 bytes each, the full table is 4096 KB (4 MB), and the table grows by adding more entries.
As for why a PTE(page table entry) is 4 bytes:
Several answers say it's because the address space is 32 bits and the PTE needs 32 bits to hold the address.
But a PTE doesn't contain the complete address of a byte, only the physical page number. The rest of the bits contain flags or are left unused. It need not be 4 bytes exactly.
1) Because 4 bytes (32 bits) is exactly the right amount of space to hold any address in a 32-bit address space.
2) Because 1 million entries of 4 bytes each makes 4MB.
Your first doubt is about the line "Each entry in the page table, also called a PTE, consists of 4 bytes". To understand this, first ask: what does a page table contain? The answer is PTEs. So these 4 bytes are the size of each PTE, which consists of the physical frame number for one virtual page (and a few other fields, such as valid and protection bits, if required).
So, now that you know what a page table contains, you can easily calculate the memory space it will take: total number of PTEs times the size of a PTE.
Which will be: 1M * 4 bytes = 4 MB
Hope this clears your doubt. :)
The size of a page table entry is the number of bits required to hold any frame number. For example, if you have a physical memory with 2^32 frames, then you would need 32 bits to represent it. These 32 bits are stored in the page table in 4 bytes (32/8).
Now, since the number of pages is 1 million, the total size of the page table =
page table entry size * number of pages
= 4 B * 1 million
= 4 MB.
Hence, 4 MB would be required to store the table in main memory (physical memory).
So, the entry refers to a page table entry (PTE). The data stored in each entry is the physical frame number (PFN). The underlying assumption here is that physical memory also uses a 32-bit address space. Therefore, a 4-byte PTE (4 * 8 = 32 bits) is large enough to hold it.
In a 32-bit system with a memory page size of 4 KB (2^2 * 2^10 B), the maximum number of pages a process can have is 2^(32-12) = 1M. Each process thinks it has access to all physical memory. In order to translate all 1M virtual pages to physical frames, a process may need to store 1M PTEs, that is 4 MB.
Honestly a bit new to this myself, but to keep things short, it looks like the 4 MB comes from the fact that there are 1 million entries (each PTE stores a physical page number, assuming it exists); therefore there are 2^20 = 1M PTEs, and 1M * 4 bytes = 4 MB, so each process may require that much for its page table.
The size of a page table entry depends on the number of frames in physical memory. Since this text is from "Operating System Concepts" by Galvin, it is assumed here that the number of pages and the number of frames are the same. Under that assumption the number of pages/frames comes out to 2^20. Since the page table stores the frame number of the respective page, each page table entry has to be at least 20 bits wide to map 2^20 frame numbers to pages. Here 4 bytes (32 bits) are used as an upper limit, because the page table stores not only frame numbers but also additional bits for protection and security, e.g. the valid/invalid bit. So to map pages to frames we need only 20 bits; the rest are extra bits that store protection and security information.
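The numbers used throughout these answers can be reproduced in a few lines. This is a sketch under the textbook's assumptions (32-bit logical addresses, 4 KB pages, 4-byte entries, and as many frames as pages); `page_table_size` is a made-up helper name.

```python
def page_table_size(addr_bits, page_bytes, entry_bytes):
    """Return (number of entries, table size in bytes) for a flat page table."""
    entries = 2 ** addr_bits // page_bytes
    return entries, entries * entry_bytes

entries, table_bytes = page_table_size(32, 4096, 4)
print(entries, table_bytes)

# With 2^20 frames, the frame number needs only 20 of the 32 bits in an entry;
# the remaining bits are available for flags such as valid and protection bits.
frame_number_bits = (entries - 1).bit_length()
spare_bits = 4 * 8 - frame_number_bits
print(frame_number_bits, spare_bits)
```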