On AMD64 compliant architectures, addresses need to be in canonical form before being dereferenced.
From the Intel manual, section 3.3.7.1:
In 64-bit mode, an address is considered to be in canonical form if
address bits 63 through to the most-significant implemented bit by the
microarchitecture are set to either all ones or all zeros.
Now, the most significant implemented bit on current operating systems and architectures is the 47th bit (counting from bit 0). This leaves us with a 48-bit address space.
Especially when ASLR is enabled, user programs can expect to receive an address with the 47th bit set.
If optimizations such as pointer tagging are used and the upper bits are used to store information, the program must make sure the 48th to 63rd bits are set back to whatever the 47th bit was before dereferencing the address.
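To make the tagging scheme concrete, here is a minimal sketch of storing a tag in the upper 16 bits and restoring canonical form before use. It assumes 48 implemented address bits and treats pointers as plain 64-bit integers; the names tag_pointer, tag_of and untag_pointer are made up for illustration and do not come from any library.

```cpp
#include <cstdint>

// Assumption: 48 implemented virtual-address bits (bit 47 is the top one).
constexpr unsigned kVaBits = 48;
constexpr std::uint64_t kAddrMask = (std::uint64_t{1} << kVaBits) - 1;

// Store a 16-bit tag in bits 48..63, discarding whatever was there.
constexpr std::uint64_t tag_pointer(std::uint64_t addr, std::uint16_t tag) {
    return (addr & kAddrMask) | (static_cast<std::uint64_t>(tag) << kVaBits);
}

// Read the tag back out of bits 48..63.
constexpr std::uint16_t tag_of(std::uint64_t tagged) {
    return static_cast<std::uint16_t>(tagged >> kVaBits);
}

// Restore canonical form: copy bit 47 into bits 48..63.
constexpr std::uint64_t untag_pointer(std::uint64_t tagged) {
    return (tagged & (std::uint64_t{1} << (kVaBits - 1)))
               ? (tagged | ~kAddrMask)   // bit 47 set: fill the top with ones
               : (tagged & kAddrMask);   // bit 47 clear: fill the top with zeros
}
```

A real program would convert to and from an actual pointer with reinterpret_cast around these helpers, and would call untag_pointer before every dereference.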
But consider this code:
int main()
{
    int* intArray = new int[100];
    int* it = intArray;
    // Fill the array with any value.
    for (int i = 0; i < 100; i++)
    {
        *it = 20;
        it++;
    }
    delete [] intArray;
    return 0;
}
Now consider that intArray is, say:
0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1100
After setting the pointer it to intArray and incrementing it once, and assuming sizeof(int) == 4, it will become:
0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
The 47th bit is the leading 1 of the fifth group. What happens here is that the second pointer, obtained by pointer arithmetic, is invalid because it is not in canonical form. The correct address should be:
1111 1111 1111 1111 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
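The arithmetic in this example can be checked with plain integers. This is only a sketch: it assumes 48 implemented bits and a 4-byte int, and is_canonical48, kLast and kNext are illustrative names.

```cpp
#include <cstdint>

// True if bits 63..47 are all zeros or all ones (48-bit canonical form).
constexpr bool is_canonical48(std::uint64_t addr) {
    return (addr >> 47) == 0 || (addr >> 47) == 0x1FFFF;  // 17 bits remain after the shift
}

// The two values from the example, treated as plain integers.
constexpr std::uint64_t kLast = 0x00007FFFFFFFFFFCULL;  // address of the last int below 2^47
constexpr std::uint64_t kNext = kLast + 4;              // assumes sizeof(int) == 4

static_assert(is_canonical48(kLast), "the first address is canonical");
static_assert(kNext == 0x0000800000000000ULL, "the increment carries into bit 47");
static_assert(!is_canonical48(kNext), "the incremented value is not canonical");
static_assert(is_canonical48(0xFFFF800000000000ULL), "the sign-extended form is canonical");
```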
How do programs deal with this? Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?
The canonical address rules mean there is a giant hole in the 64-bit virtual address space. 2^47-1 is not contiguous with the next valid address above it, so a single mmap won't include any of the unusable range of 64-bit addresses.
+----------+
| 2^64-1 | 0xffffffffffffffff
| ... |
| 2^64-2^47| 0xffff800000000000
+----------+
| |
| unusable | not to scale: this part is 2^16 times as large
| |
+----------+
| 2^47-1 | 0x00007fffffffffff
| ... |
| 0 | 0x0000000000000000
+----------+
Also, most kernels reserve the high half of the canonical range for their own use (see, e.g., x86-64 Linux's memory map). User space can only allocate in the contiguous low range anyway, so the existence of the gap is irrelevant.
Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?
Not exactly. The 48-bit address space supported by current hardware is an implementation detail. The canonical-address rules ensure that future systems can support more virtual address bits without breaking backwards compatibility to any significant degree.
At most, you'd just need a compat flag to have the OS not give the process any memory regions with high bits not all the same. (Like Linux's current MAP_32BIT flag for mmap, or a process-wide setting). That could support programs that used the high bits for tags and manually redid sign-extension.
Future hardware won't need to support any kind of flag to ignore high address bits or not, because junk in the high bits is currently an error. Intel 5-level paging adds another 9 virtual address bits, widening the canonical high and low halves (see Intel's white paper).
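As a rough illustration of how the two canonical halves widen, the boundaries can be computed from the number of implemented virtual address bits. The helper names below are hypothetical; 48 and 57 are the bit counts for 4-level and 5-level paging respectively.

```cpp
#include <cstdint>

// Highest address of the low canonical half for `va_bits` implemented bits.
// Assumes 1 <= va_bits <= 64.
constexpr std::uint64_t low_half_top(unsigned va_bits) {
    return (std::uint64_t{1} << (va_bits - 1)) - 1;
}

// Lowest address of the high canonical half: every bit above the low half set.
constexpr std::uint64_t high_half_bottom(unsigned va_bits) {
    return ~low_half_top(va_bits);
}

// 4-level paging: 48 bits.
static_assert(low_half_top(48) == 0x00007FFFFFFFFFFFULL, "2^47 - 1");
static_assert(high_half_bottom(48) == 0xFFFF800000000000ULL, "2^64 - 2^47");
// 5-level paging: 48 + 9 = 57 bits.
static_assert(low_half_top(57) == 0x00FFFFFFFFFFFFFFULL, "2^56 - 1");
static_assert(high_half_bottom(57) == 0xFF00000000000000ULL, "2^64 - 2^56");
```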
See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?
Fun fact: Linux defaults to mapping the stack at the top of the lower range of valid addresses. (Related: Why does Linux favor 0x7f mappings?)
$ gdb /bin/ls
...
(gdb) b _start
Function "_start" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_start) pending.
(gdb) r
Starting program: /bin/ls
Breakpoint 1, 0x00007ffff7dd9cd0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) p $rsp
$1 = (void *) 0x7fffffffd850
(gdb) exit
$ calc
2^47-1
0x7fffffffffff
(Modern GDB can use starti to break before the first user-space instruction executes instead of messing around with breakpoint commands.)
Essentially, how does 4Gb turn into 4GB? If the memory is addressing bytes, should not the possibilities be 2^(32/8)?
It depends on how you address the data.
If you use 32 bits to address each bit, you can address 2^32 bits, or 4Gb = 512MB. If you address bytes, like most current architectures, you get 4GB.
But if you address much larger blocks, you need fewer bits to address 4GB. For example, if you address each 512-byte block (2^9 bytes), you can address 4GB with 23 bits. FAT16 uses 16 bits to address clusters of at most 64KB and can therefore address a volume of at most 4GB. The same idea is used in Java Compressed Oops, where a 32-bit reference can address 32GB of memory.
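All of these figures follow from one formula: capacity = 2^(address bits) times the size of the addressed unit. A small sketch, with an illustrative function name and binary units (1GB taken as 2^30 bytes):

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = std::uint64_t{1} << 20;
constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// Bytes reachable with `addr_bits` address bits when each address names a
// unit of `unit_bits` bits (1 = bit-addressed, 8 = byte-addressed, ...).
// Assumes the total number of bits fits in 64 bits.
constexpr std::uint64_t addressable_bytes(unsigned addr_bits, std::uint64_t unit_bits) {
    return ((std::uint64_t{1} << addr_bits) * unit_bits) / 8;
}

static_assert(addressable_bytes(32, 1) == 512 * kMiB, "bit-addressed: 4Gb = 512MB");
static_assert(addressable_bytes(32, 8) == 4 * kGiB, "byte-addressed: 4GB");
static_assert(addressable_bytes(23, 512 * 8) == 4 * kGiB, "512-byte blocks, 23 bits");
static_assert(addressable_bytes(16, 65536 * 8) == 4 * kGiB, "FAT16 with 64KB clusters");
static_assert(addressable_bytes(32, 8 * 8) == 32 * kGiB, "8-byte units, as in Compressed Oops");
```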
Some older architectures even use word-addressable memory instead of byte-addressable memory like most do nowadays. Modern architectures that have a minimum addressable unit bigger than an octet are mainly found in DSPs. There are also a few architectures with bit-addressable memory, like the Intel 8051.
Most modern computers are byte-addressable, with each address identifying a single eight bit byte of storage; data too large to be stored in a single byte may reside in multiple bytes occupying a sequence of consecutive addresses.
There exist word-addressable computers, where the minimal addressable storage unit is exactly the processor's word. For example, the Data General Nova minicomputer, and the Texas Instruments TMS9900 and National Semiconductor IMP-16 microcomputers used 16-bit words, and there were many 36-bit mainframe computers (e.g., PDP-10) which used 18-bit word addressing, not byte addressing, giving an address space of 2^18 36-bit words, approximately 1 megabyte of storage.
The efficiency of addressing of memory depends on the bit size of the bus used for addresses – the more bits used, the more addresses are available to the computer. For example, an 8-bit-byte-addressable machine with a 20-bit address bus (e.g. Intel 8086) can address 2^20 (1,048,576) memory locations, or one MiB of memory, while a 32-bit bus (e.g. Intel 80386) addresses 2^32 (4,294,967,296) locations, or a 4 GiB address space.
The electrical interface on the chip consists (extremely simplified) of wires for the address (e.g. 32 address lines) and wires for the data (e.g. 8 wires for read data coming from the RAM, 8 wires for write data going to the RAM). In this case you have 2^32 words of 8 bits, so you can address 2^32 * 8 bits of data.
If you had a RAM with a word width of 16-bit instead (much more likely than 8-bit) you would be able to address twice as much RAM with the same number of address bits. On a modern system, you cannot really "read one byte" but instead the CPU fetches a whole cache line from the RAM and then gives you back just the byte that you asked for.
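To illustrate the cache-line point, a byte address can be split into the line that gets fetched and the byte's position within it. This sketch assumes 64-byte cache lines, which is common but not universal; the names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint64_t kLineSize = 64;  // assumption: 64-byte cache lines

// Start address of the cache line that contains `addr`.
constexpr std::uint64_t line_base(std::uint64_t addr) {
    return addr & ~(kLineSize - 1);
}

// Position of the requested byte inside that line.
constexpr std::uint64_t line_offset(std::uint64_t addr) {
    return addr & (kLineSize - 1);
}

// Reading the byte at 0x1234 fetches the line at 0x1200 and picks byte 0x34.
static_assert(line_base(0x1234) == 0x1200, "line start");
static_assert(line_offset(0x1234) == 0x34, "byte within the line");
```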
You can address 2 fields in memory with 1 bit.
You can address 4 fields in memory with 2 bits.
00, 01, 10, 11
So with n address bits we can address 2^n locations. With 32-bit addresses, where each address holds 1 byte, we can address 4GB of data.
2^32 = 4,294,967,296 addresses can hold 4GB of data.
Currently I am going through the book Operating System Principles by Galvin. I am enjoying reading it, but in the meantime I have a question.
Can I say that if I use a 64 bit operating system then the logical address space (that a CPU generates) can be of 64 bits? I.e. it will be able to map a large number of frames in the physical memory. If I use a 32 bit OS then the CPU can generate maximum of 2^32 logical address space.
Is that correct?
Sort of, but there are many technicalities which make these names less useful.
First, there are two different sizes that matter to an operating system: Address size and data size. The address size determines how big of an address space is available, and the data size determines how much data can be used in a single-word operation. In my experience, operating systems are usually identified by data size, which means the address size could be something else.
Below are some example architectures and their address and data sizes. As the table shows, the most common 32 bit and 64 bit architectures today have the same data and address sizes, which is why your statement is partially correct. Note that x86 processors in 16-bit mode have a larger address size than data size. This is caused by additional segment registers being used in addressing, which makes the architecture less restrictive.
              Address size   Data size
x86 16-bit    20 bits        16 bits
x86 32-bit    32 bits        32 bits
x86 64-bit    64 bits        64 bits
ARM 32-bit    32 bits        32 bits
ARM 64-bit    64 bits        64 bits
However, the address size does not necessarily indicate how big of a logical address space can be used. There could be a limitation which restricts the space to a smaller area. For example, no current x86-64 processor supports a 64-bit address space. Instead, they require that the high 16 bits of any address be a sign extension of bit 47, allowing a 2^48 address space, 256 TiB instead of 16 EiB. This reduces the number of address lines which need to be used in the processor while allowing far more than anyone currently uses.
Finally, everything so far has been in reference to the logical or virtual address space. The physical address space could have a different size. Newer 32 bit x86 systems have Physical Address Extension, which enables 36 bit physical addresses, and x86-64 systems are limited to no more than a 52 bit physical address space, but this can be further limited by the memory controller/motherboard. When the logical address space is bigger than the physical address space, it allows the entire physical address space to be mapped to multiple places at once. When the logical address space is smaller, it allows multiple complete address spaces to be stored in physical memory at the same time.
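The sizes mentioned here are easy to verify, along with how many complete logical address spaces fit into a larger physical one. A small sketch with illustrative names and binary units; a full 2^64 does not fit in a 64-bit integer, so 16 EiB is left out.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;
constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;
constexpr std::uint64_t kPiB = std::uint64_t{1} << 50;

// Size in bytes of a byte-addressed space with `bits` address bits (bits < 64).
constexpr std::uint64_t space_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// How many complete logical address spaces fit in the physical address space.
constexpr std::uint64_t copies_that_fit(unsigned phys_bits, unsigned virt_bits) {
    return phys_bits >= virt_bits ? std::uint64_t{1} << (phys_bits - virt_bits) : 0;
}

static_assert(space_bytes(36) == 64 * kGiB, "PAE: 36-bit physical addresses");
static_assert(space_bytes(48) == 256 * kTiB, "x86-64 logical address space");
static_assert(space_bytes(52) == 4 * kPiB, "x86-64 physical address limit");
static_assert(copies_that_fit(36, 32) == 16, "sixteen 4 GiB spaces fit under PAE");
```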
In a book I read the following:
32-bit processors have 2^32 possible addresses, while current 64-bit processors have a 48-bit address space
My expectation was that if it's a 64-bit processor, the address space should also be 2^64.
So I was wondering what is the reason for this limitation?
Because that's all that's needed. 48 bits give you an address space of 256 terabytes. That's a lot. You're not going to see a system which needs more than that any time soon.
So CPU manufacturers took a shortcut. They use an instruction set which allows a full 64-bit address space, but current CPUs only use the lower 48 bits. The alternative was wasting transistors on handling a bigger address space which wasn't going to be needed for many years.
So once we get near the 48-bit limit, it's just a matter of releasing CPUs that handle the full address space, but it won't require any changes to the instruction set, and it won't break compatibility.
Any answer referring to the bus size and physical memory is slightly mistaken, since OP's question was about virtual address space not physical address space. For example the supposedly analogous limit on some 386's was a limit on the physical memory they could use, not the virtual address space, which was always a full 32 bits. In principle you could use a full 64 bits of virtual address space even with only a few MB of physical memory; of course you could do so by swapping, or for specialized tasks where you want to map the same page at most addresses (e.g. certain sparse-data operations).
I think the real answer is that AMD was just being cheap and hoped nobody would care for now, but I don't have references to cite.
Read the limitations section of the wikipedia article:
A PC cannot contain 4 petabytes of memory (due to the size of current memory chips if nothing else) but AMD envisioned large servers, shared memory clusters, and other uses of physical address space that might approach this in the foreseeable future, and the 52 bit physical address provides ample room for expansion while not incurring the cost of implementing 64-bit physical addresses
That is, there's no point implementing full 64 bit addressing at this point, because we can't build a system that could utilize such an address space in full - so we pick something that's practical for today's (and tomorrow's) systems.
The internal native register/operation width does not need to be reflected in the external address bus width.
Say you have a 64 bit processor which only needs to access 1 megabyte of RAM. A 20 bit address bus is all that is required. Why bother with the cost and hardware complexity of all the extra pins that you won't use?
The Motorola 68000 was like this: 32 bits internally, but with 24-bit addresses (23 address lines plus byte-select strobes) and a 16-bit data bus. The CPU could access 16 megabytes of RAM, and loading the native data type (32 bits) took two memory accesses (each carrying 16 bits of data).
There is a more severe reason than just saving transistors in the CPU address path: if you increase the size of the address space you need to increase the page size, increase the size of the page tables, or have a deeper page table structure (that is more levels of translation tables). All of these things increase the cost of a TLB miss, which hurts performance.
From my point of view, this results from the page size. Each 4096-byte page holds at most 4096/8 = 512 page-table entries, and 2^9 = 512. With four levels of 9 index bits each plus a 12-bit page offset, 9 * 4 + 12 = 48.
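The 9 * 4 + 12 split can be sketched as code that cuts a 48-bit virtual address into four table indices and a page offset. The struct and function names are illustrative; the field names follow the usual x86-64 4-level paging terminology.

```cpp
#include <cstdint>

// Four 9-bit table indices plus a 12-bit offset inside the 4096-byte page.
struct VaParts {
    unsigned pml4, pdpt, pd, pt, offset;
};

// Extract the 9-bit index that starts at bit `shift`.
constexpr unsigned index_at(std::uint64_t va, unsigned shift) {
    return static_cast<unsigned>((va >> shift) & 0x1FF);  // 0x1FF = 511, i.e. 512 entries
}

constexpr VaParts split_va(std::uint64_t va) {
    return VaParts{index_at(va, 39), index_at(va, 30), index_at(va, 21),
                   index_at(va, 12), static_cast<unsigned>(va & 0xFFF)};
}

static_assert(4096 / 8 == 512, "entries per page table with 8-byte entries");
static_assert(9 * 4 + 12 == 48, "four index levels plus the page offset");
```

For example, split_va(0x00007fffffffd850) gives the indices 255, 511, 511 and 509, with a page offset of 0x850.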
Many people have this misconception, but I promise that if you read this carefully, it will be cleared up.
Calling a processor 32-bit or 64-bit doesn't mean it has a 32-bit or 64-bit address bus, respectively! I repeat: it DOESN'T!
A 32-bit processor has a 32-bit ALU (Arithmetic and Logic Unit), meaning it can operate on 32-bit binary operands (simply put, binary numbers with 32 digits); similarly, a 64-bit processor can operate on 64-bit binary operands. So whether a processor is 32-bit or 64-bit DOESN'T determine the maximum amount of memory that can be installed; it only shows how large an operand can be. (As an analogy, a 10-digit calculator can produce results of up to 10 digits but not 11 or more. That is decimal, but it keeps the analogy simple.) What you are asking about is the address space, that is, the maximum directly interfaceable size of memory (RAM). The maximum possible RAM size is determined by the width of the address bus, not by the width of the data bus or of the ALU, which is what defines the processor's size (32/64-bit). Yes, a processor with a 32-bit address bus can address 2^32 bytes = 4GB of RAM (or 2^64 bytes with a 64-bit address bus), but calling a processor 32-bit or 64-bit says nothing about this address space; that label depends only on the width of its ALU. Of course, the data bus and address bus may be the same width, and then it may seem that a 32-bit processor must access 2^32 bytes or 4GB of memory, but that is only a coincidence and does not hold in general. For example, the Intel 8086 is a 16-bit processor (it has a 16-bit ALU), so by your reasoning it should access only 2^16 bytes = 64KB of memory, but that is not true: it can access up to 1MB of memory because it has a 20-bit address bus. You can google it if you have any doubts :)
I think I have made my point clear. Now, coming to your question: since a 64-bit processor need not have a 64-bit address bus, there is nothing wrong with having a 48-bit address bus in a 64-bit processor. They kept the address space smaller to make design and fabrication cheaper, as nobody is going to use such a big memory (2^64 bytes) when 2^48 bytes is more than enough nowadays.
To answer the original question: There was no need to add more than 48 Bits of PA.
Servers need the maximum amount of memory, so let's try to dig deeper.
1) The largest (commonly used) server configuration is an 8 Socket system. An 8S system is nothing but 8 Server CPU's connected by a high speed coherent interconnect (or simply, a high speed "bus") to form a single node. There are larger clusters out there but they are few and far between, we are talking commonly used configurations here. Note that in the real world usages, 2 Socket system is one of the most commonly used servers, and 8S is typically considered very high end.
2) The main types of memory used by servers are byte-addressable regular DRAM (e.g. DDR3/DDR4 memory), Memory Mapped IO - MMIO (such as memory used by an add-in card), and the Configuration Space used to configure the devices present in the system. The first type is usually the biggest (and hence needs the most address bits). Some high-end servers use a large amount of MMIO as well, depending on the actual configuration of the system.
3) Assume each server CPU socket can house 16 DDR4 DIMMs, with a maximum DDR4 DIMM size of 256GB. (Depending on the version of the server, the number of possible DIMMs per socket is actually less than 16, but continue reading for the sake of the example.)
So each socket can theoretically have 16*256GB=4096GB = 4 TB.
For our example 8S system, the DRAM size can be a maximum of 4*8 = 32 TB. This means that
the maximum number of bits needed to address this DRAM space is 45 (= log2(32 TB)).
We won't go into the details of the other types of memory (MMIO, MMCFG etc.), but the point here is that the most "demanding" type of memory for an 8 Socket system with the largest DDR4 DIMMs available today (256 GB DIMMs) uses only 45 bits.
For an OS that supports 48 bits (WS16 for example), there are (48-45=) 3 remaining bits.
Which means that if we used the lower 45 bits solely for 32TB of DRAM, we still have 2^3 times of addressable memory which can be used for MMIO/MMCFG for a total of 256 TB of addressable space.
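The arithmetic in this example can be reproduced with a small sketch. The figures are the ones assumed above (16 DIMMs of 256GB per socket, 8 sockets), with GB and TB taken as binary units; bits_needed is an illustrative helper, not a library function.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;
constexpr std::uint64_t kTiB = std::uint64_t{1} << 40;

// Smallest n such that 2^n >= bytes (returns 64 if no smaller n works).
constexpr unsigned bits_needed(std::uint64_t bytes, unsigned n = 0) {
    return (n < 64 && (std::uint64_t{1} << n) < bytes) ? bits_needed(bytes, n + 1) : n;
}

constexpr std::uint64_t kPerSocket = 16 * 256 * kGiB;  // 16 DIMMs of 256GB each
constexpr std::uint64_t kSystem = 8 * kPerSocket;      // 8-socket system

static_assert(kPerSocket == 4 * kTiB, "4 TB per socket");
static_assert(kSystem == 32 * kTiB, "32 TB in total");
static_assert(bits_needed(kSystem) == 45, "45 address bits cover the DRAM");
static_assert((std::uint64_t{1} << 48) / kSystem == 8, "48 bits leave 2^3 times the DRAM size");
static_assert((std::uint64_t{1} << 48) == 256 * kTiB, "256 TB of addressable space");
```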
So, to summarize:
1) 48 bits of Physical address is plenty of bits to support the largest systems of today that are "fully loaded" with copious amounts of DDR4 and also plenty of other IO devices that demand MMIO space. 256TB to be exact.
Note that this 256TB address space (=48bits of physical address) does NOT include any disk drives like SATA drives because they are NOT part of the address map, they only include the memory that is byte-addressable, and is exposed to the OS.
2) CPU hardware may choose to implement 46, 48 or > 48 bits depending on the generation of the server. But another important factor is how many bits does the OS recognize.
Today, WS16 supports 48 bit Physical addresses (=256 TB).
What this means to the user is, even though one has a large, ultra modern server CPU that can support >48 bits of addressing, if you run an OS that only supports 48 bits of PA, then you can only take advantage of 256 TB.
3) All in all, there are two main factors to take advantage of higher number of address bits (= more memory capacity).
a) How many bits does your CPU HW support? (This can be determined by CPUID instruction in Intel CPUs).
b) What OS version are you running and how many bits of PA does it recognize/support.
The min of (a,b) will ultimately determine the amount of addressable space your system can take advantage of.
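The min(a, b) rule can be sketched as follows. On x86, CPUID leaf 0x80000008 returns the physical address width in EAX bits 7:0 and the linear address width in bits 15:8. To keep the example portable, the code only decodes an EAX value passed in rather than executing CPUID; the value 0x3027 is a made-up sample and the function names are illustrative.

```cpp
#include <cstdint>

// Decode EAX as returned by CPUID leaf 0x80000008.
constexpr unsigned phys_bits_from_eax(std::uint32_t eax) {
    return eax & 0xFF;           // bits 7:0: physical address bits
}
constexpr unsigned virt_bits_from_eax(std::uint32_t eax) {
    return (eax >> 8) & 0xFF;    // bits 15:8: linear (virtual) address bits
}

// The usable width is the smaller of what the CPU and the OS support.
constexpr unsigned usable_pa_bits(unsigned cpu_bits, unsigned os_bits) {
    return cpu_bits < os_bits ? cpu_bits : os_bits;
}

// Hypothetical sample value: 39 physical bits, 48 linear bits.
static_assert(phys_bits_from_eax(0x3027) == 39, "physical width");
static_assert(virt_bits_from_eax(0x3027) == 48, "linear width");
// A CPU with 52 bits under an OS that recognizes 48 is limited to 48.
static_assert(usable_pa_bits(52, 48) == 48, "the OS is the limit");
```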
I have written this response without looking into the other responses in detail. Also, I have not delved in detail into the nuances of MMIO, MMCFG and the entirety of the address map construction. But I do hope this helps.
Thanks,
Anand K Enamandram,
Server Platform Architect
Intel Corporation
It's not true that only the low-order 48 bits of a 64 bit VA are used, at least with Intel 64. The upper 16 bits are used, sort of, kind of.
Section 3.3.7.1 Canonical Addressing in the Intel® 64 and IA-32 Architectures Software Developer’s Manual says:
a canonical address must have bits 63 through 48 set to zeros or ones (depending on whether bit 47 is a zero or one)
So bits 47 thru 63 form a super-bit, either all 1 or all 0. If an address isn't in canonical form, the implementation should fault.
On AArch64, this is different. According to the ARMv8 Instruction Set Overview, it's a 49-bit VA.
The AArch64 memory translation system supports a 49-bit virtual address (48 bits per translation table). Virtual addresses are sign-extended from 49 bits, and stored within a 64-bit pointer. Optionally, under control of a system register, the most significant 8 bits of a 64-bit pointer may hold a “tag” which will be ignored when used as a load/store address or the target of an indirect branch
A CPU is considered "N-bit" mainly based on its data-bus width and on the width of most of its internal entities: registers, accumulators, arithmetic-logic unit (ALU), instruction set, etc. For example, the good old Motorola 6800 (or Intel 8050) is an 8-bit CPU. It has an 8-bit data bus, an 8-bit internal architecture, and a 16-bit address bus.
However, an N-bit CPU may have some entities that are not N bits wide. Consider the improvements in the 6809 over the 6800 (both are 8-bit CPUs with an 8-bit data bus): among the significant enhancements introduced in the 6809 were two 8-bit accumulators (A and B, which could be combined into a single 16-bit register, D), two 16-bit index registers (X, Y) and two 16-bit stack pointers.