How the word-length of an ISA is implemented in the hardware and software?

How the word-length of an ISA is implemented in the hardware and software? - word

I have learnt that word-length is an ISA feature, which has to be implemented in hardware and software both. I have a vague idea only about the answer. I need correction or confirmation. Does the word-length becomes size of the general purpose register in the CPU? Does the word-length become the size of the 'int'(just plain int, not long or short) for a compiler?

The word length is the number of bits natively handled by the system. Common versions right now are 32-bit words and 64-bit words.
For example, a byte can hold a number from 0-255. However, a 32-bit integer is from 0-4,294,967,295. An integer is the native "word size" of the system, so is 4-bytes wide in 32-bit systems and therefore is considerably larger than 0-255.
In fact, in many systems/compilers/etc. types which are smaller than a system's native word size are converted to that word size simply because it's more efficient than trying to put multiple values into a single word. A boolean, for example, can be represented by a single bit. However, if you write a piece of software that uses 32 boolean values, it's not going to squeeze them all into a single word. Each will be assigned its own word when it runs on the metal.

I am taking liberty and interpreting this question as size of integer on a computer in C or C++. In that case this link will help - Does the size of an int depend on the compiler and/or processor?.
However if read it literally then size of word of CPU should be size of its register.

Hardware implementation : Word-length is the number of bytes fetched by the CPU at a time and can also be called the natural size of the machine. though there is nothing natural about the computers. it also becomes size of the CPU's register in implementation, since it needs registers to store what it fetches. Having said that, it is possible to use a bigger register for storing purpose. IA-32 softwares (with word length 32bits) can run on x86-64 (with word length 64 bits). Software implementation: word-length becomes the size of 'int' (just plain int, not long,short)

Related

Orthogonality of Instruction Set Architecture

I am studying the difference between CISC and RISC recently, and I've encountered into the term "Orthogonality". After doing some research, my understanding so far is that there are two "axes", addressing modes & operations, which are independent of each other, so it produces a maximum number of (#addressing modes * #operations) instructions in the ISA.
For CISC machine, which is a register-memory architecture, operands may come from register or memory and RISC a register-register(or load-store) one on the contrary.
So, what's the role of orthogonality playing in these two ISA? Is CISC more orthogonal than RISC or vice versa?
As the wiki describes, "Modern CPUs often simulate orthogonality in a preprocessing step before performing the actual tasks in a RISC-like core. This "simulated orthogonality" in general is a broader concept, encompassing the notions of decoupling and completeness in function libraries, like in the mathematical concept: an orthogonal function set is easy to use as a basis into expanded functions, ensuring that parts don’t affect another if we change one part." What does this paragraph mean? What is the preprocessing step, does it have anything to do with the microcode?
Any explanation are appreciated! Thanks a lot!

Maximizing total choices of possible instructions like a CISC is generally not what's meant. Instead it's more about being a simpler compiler target, without complex interactions in what makes an instruction legal or not. RISC machines are often highly orthogonal, and designed with being a compiler target in mind, not human programmers.
My understanding of the term is that orthogonality is more about any register being usable in any case where any other register is usable. Unlike x86 shl reg, cl where variable-count shifts require a specific register. (I know this is a RISC-V question, but the examples of non-orthogonality I know of come from other ISAs, primarily x86.)
And definitely not like 8086 (before 386), where if you needed to multiply, one of the operands had to be in the accumulator, AL or AX. And sign-extension was also only available there. 386 introduced movsx reg, r/m8 and r/m16. (And movzx, allowing easy and more efficient zero-extending of a byte from memory into SI or DI, without having to load 2 bytes and and si, 0x00ff.)
Even worse, 16-bit addressing modes only allow a few registers in very limited ways: [bp|bx] + [si|di] + disp0/8/16, vs. 32-bit addressing modes allowing stuff like lea eax, [ecx + ecx + 3] to use the same register twice, or address memory relative to the stack pointer without having to copy it to the base pointer (BP) register.
Or if some memory operands can use a certain addressing mode, can all memory operands use it? AArch64 ldp/stp (load-pair/store-pair of registers) I think has fewer available addressing modes than single-register loads, because it needs 5 extra bits for a second register number. Unlike ARM32 ldrd where the pair of registers is two contiguous registers, starting with an even number.
In general, the less interaction there is between a choice of one thing (like instruction) and the possible choices for another (a register), the more orthogonal.
One of the major benefits with this is being a simple compiler target. The most optimal code can more often be found with a greedy algorithm that only takes into a account one thing at a time, not interlocking tradeoffs. Not like x86-64 "if I use ECX instead of R9d for this variable, that'll save bytes in multiple instructions not needing REX prefix, but later mean I need an extra mov to copy a register for a shift count". (x86 BMI2 introduced variable-count shifts that can use a count from any register, like shlx ebx, eax, r15d)
Or far worse targeting 8086 or 286, where 16-bit addressing modes impose a lot more constraints on register allocation. And you'd more often you'd want to use instructions that needed their operands in specific registers, especially the accumulator.
But if you're not worried about every byte of code size, x86-64 is a fairly orthogonal ISA, usually you don't need to care about which register you use for what. One change in that direction beyond 386's important changes was making the low byte of every register addressable, like bpl, spl, sil, dil as the low bytes of RBP, RSP, RSI, RDI. (But those require REX prefixes, overlapping encodings with AH/CH/DH/BH which are only usable in instructions without REX prefixes.)
Another example of non-orthogonality is x86's notorious integer SIMD extensions, MMX and SSE2. Want to do minimum of unsigned integers 16 bytes at a time? In SSE2 we have pminub for unsigned byte elements. And pminsw, signed 16-bit elements. But no other combination of size and signedness until SSE4.1, several years later, which filled in the gaps allowing signed bytes and u16, as well as i32 and u32. And then AVX-512 added i64 and u64. Every min available always had a corresponding max, but other than that, SSE2 was highly non-orthogonal in that and many other ways, including signed/unsigned saturating add/sub, and pack of wider to narrower elements with signed or unsigned saturation. And FP vs. integer shuffles, e.g. there's no integer equivalent to shufps that takes two elements from one vector, two from another, using an immediate control operand. Fortunately for shuffles you can use FP shuffles on integer data.
x86 SIMD is still not very orthogonal in many ways, for example in integer multiply where not all combinations of element size are available for everything; 16-bit has 16x16 => 16-bit low half, signed high half, or unsigned high half. (And a widening multiply and horizontal-add, pmaddwd). 32-bit has signed and unsigned widening 32x32 => 64-bit, and with SSE4.1 also non-widening. 8-bit only has a multiply and horizontal-add where one operand is treated as signed, the other as unsigned.
Again, if I'm picking on x86 a lot, it's because it's what I know. And Intel painted a huge "kick me" sign on their back when they designed MMX and SSE2, only taking some steps to fix things later with SSE4.1. (I'm sure there are reasons for some of those choices, including transistor budget and opcode coding-space in x86's notoriously cramped machine-code.) But a lot of programs don't want to assume SSE4.1 as a requirement to run at all, even now, over a decade since the first SSE4.1 CPUs. Most other SIMD ISAs are more orthogonal than x86, like ARM NEON or PowerPC AltiVec.
Anyway, in general, it's more orthogonal if all operations are available in all combinations of size and signedness that exist for any operation. This isn't always a big deal for compilers per-se, more for humans not realizing that a compiler could make their code faster if this variable was unsigned or something.
Modern CPUs often simulate orthogonality in a preprocessing step before performing the actual tasks in a RISC-like core
That sounds like they're talking about decoding to uops, but I don't see how that would gain orthogonality.
Unless they're counting the concept of any instruction allowing a memory source operand as being more orthogonal. Normally you wouldn't, being a load/store architecture is basically a fixed constraint that doesn't make other things harder.
But if you do consider that more orthogonal, then yes, decoding add eax, [rdi] to 2 uops lets it run on a back-end that separates the load work from the store work, like a RISC.

I hadn't heard this term orthogonal instruction set before, however:
The VAX is perhaps the epitome of CISC.  The VAX supports many addressing modes, ranging from register itself, to memory specified by various indexing computations (some including pointer advancement, so as to do *p++ or *--p).
The VAX allows all addressing modes for all operands of any instruction.  Further, the VAX allows both 2 operand and 3 operand instructions, so addl2 is operand2 += operand1, and addl3 is operand3 = operand1 + operand2.
Basically it can encode a lot of stuff in a single instruction, so we can do for example, a[i] = *p++ + b[j]; in one instruction, assuming a, b, i, j, and p are in registers.
Other CISC-style processors limit the encoding, for example, so that we can only do two-operand instructions (no 3 operand), and some even limit the 2nd operand to a register, so only one memory operand.  I believe this is what they're getting at with the term orthogonal or not.
Meanwhile, a RISC processor instead follows a load/store architecture.  Access to memory is not allowed for any operand, but rather only via load and store instructions, and only with those instructions are there addressing modes.  Most all arithmetic operations (except the add for addressing) happen between registers alone.  (In some sense the RISC philosophy has an orthogonality since all arithmetic operations work on registers alone.)
I don't think the term orthogonality is of high value.  I wouldn't dwell on the term itself, but rather take away from that article the comparison between CISC ala VAX, vs. others CISC, vs. RISC.
#Peter also makes a good points, such as that certain registers being hard code (i.e. an implicit source/target) in some architectures for some instructions, which reduces orthogonality.
By that point I might stress that RISC architectures generally don't hard code registers, though MIPS hard codes the return address register ($31) for the jal instruction whereas RISC V does not ($sp and $ra are hard coded but only in the compressed instruction extension).  Whereas some CISC architectures (except VAX) hard code more registers. 
The MC68000 divides the registers into two sets of 8: addressing registers and data registers, which helps encoding by providing 16 registers with only 3-bit register fields, but also limits what you can do with them (and there aren't enough address registers, since one is the stack pointer and another the global pointer, leaving only 6).
CISC architecture often support byte vs. word sized arithmetic, whereas RISC architectures usually support only word sized arithmetic, so if you want byte, you have to simulate it (i.e. with range check or other).

What is char size in a computer architecture?

This Wikipedia article on word sizes provides a table of word sizes in different computer architectures. It has different columns like 'integer size', 'floating point size' etc. I suppose, integer size is the size of arguments for ALU, floating point size is the size of arguments for FPU, unit of address resolution is the number of bits/trits/digits represented by a single address. word size is given as the natural size of data used by the processor (which is still confusing somewhat).
But I'm wondering what does the char size column in the table represents? Is it the smallest object size theoretically possible? Is it the smallest alignment possible? What are the common operations defined over data of char size? In x86, x86-64, ARM architectures char size is 8 bits, which is same as the smallest integer size. But on some other architectures, char size is 5/6/7 bits which is very different from the integer size in that architecture.

In modern C, a char is guaranteed to be independently modifiable, without disturbing surrounding data. It's usually chosen to be the width of the narrowest load/store instruction. So on Alpha or word-addressable CPUs, a char had to be the word size, or else every char store would have to compile to an atomic RMW on the containing word. (Rather than a much cheaper non-atomic RMW like some early compilers actually used, before C11 introduces a thread-aware memory model to the language.) See Can modern x86 hardware not store a single byte to memory? (which covers modern ISAs in general) and C++ memory model and race conditions on char arrays for the requirements C++11 and C11 place on char.
But that Wikipedia table of word and char sizes in historical machines is clearly not about that, given the sizes. (e.g. smaller than a word on some word-addressable machines, I'm pretty sure).
It's about how software (and character I/O hardware like terminals) packed multiple character of the machine's native character encoding (e.g. a subset of ASCII, EBCDIC, or something earlier) into machine words.
Unicode, and variable-length character encodings like UTF-8 and UTF-16, are recent inventions compared to that history. https://en.wikipedia.org/wiki/Character_encoding#History
Many systems used fewer than 8 bits per character, e.g. 6 (64 unique encodings) is enough for the upper and lower case Latin alphabet plus some special characters and control codes.
These historical character sets are what motivated some of the choices for programming languages to use certain special characters or not, because they were developed on systems that had a certain character set.
Historical machines really did do things like pack 3 characters of text into an 18-bit word.
You might want to search on https://retrocomputing.stackexchange.com/, or even ask a question there after doing some more reading.

How are datatypes that need more than 32 bits stored in a 32 bit OS

How are datatypes that would need more than 32 bits stored in the system?
For example consider an unsigned int or a long which can have a value greater than 2 to the power of 32, how is it stored in the memory?

Any OS, or compiler, will use the number of bits that it needs. So if an OS or language has a need for 64-bit integers, it will just store such integers into an 8-byte representation.
There are standards for this, for integers as well as floating point numbers. See this article on Wikipedia for more: http://en.wikipedia.org/wiki/Computer_numbering_formats

The 32-bits in a 32-bit architecture to the number of bits the CPU registers are wide (There are some exceptions, such as floating point registers). This does not mean that the system can't handle datatypes larger than this, only that it must deal with these datatypes 32-bits at a time.
For example, machines have an "Add With Carry" instruction, which allows the machine to chain link multiple adds together so that arbitrarily sized numbers, say two 512-bit numbers, can be added in 16 steps (512/32).

why calculator and excel show value great than 2^31 -1 in 32bit systems?

I have taken some courses in c/c++ and I understand that int type size is limited according to CPU architecture.
For 32bit the maximum value of int is 2^32 ( 4,294,967,295 ), but when I use calculator or excel i get huge number and great than 2^32 .
I am really don't understand this detail how this programs print value great great than 2^23 value .

You can still use values larger than 32 bits in a 32 bit system. In fact, there are BIGINT libraries (like GMP) that allow you to use arbitrarily large integers. These large numbers simply have to be handled in software rather than hardware.
[x86-specific example] Where a simple 32-bit addition uses the add instruction, which adds two 32-bit registers, a 64-bit or BIGINT addition requires the numbers to be added 32 bits at a time, manually propagating the carry from one addition to the next.
See also:
How to implement big int in C++
storing big numbers in c++ or c

What is the exact meaning of 'N' bit processor ? , clarification for freescale arch

While reading one Freescale processor manual I stuck somewhere, which specifies that it is a 32-bit processor.
May I know the exact meaning and logic behind that?
Update:
Does it specify its ALU width or its address width or its register width specifically or all of them together is N-bit each.
Update:
Hope you have heard of Freescale processors. I just came across their site which describes one of their latest Starcore-based processor known as SC3850 as a 16-bit processor. As far as I know, it has 32 bit program counters, including ALU, and 40-bit register width and 2x64 bit address bus width. Also the SC3850 can handle SIMD(2) instructions which are of 32 bit or 64 bit.
For more details please go through this link

One of the major reasons you would care about the register width of the processor is performance. Generally doubling the number of bits doubles the rate at which a processor can move data around, and compute. This is why we're not all using 8 bit processors.
The other major reason is address space. A 16 bit program counter limits you to 64k of address space, and a 32 bit counter limits you to 4 gigabytes. The new 64 bit processors make it possible, if all the address lines are present, to support 17,179,869,184 gigabytes of memory.

Firstly i dont have a definitive answer but i would guess that 8 being a power of 2, is an important factor. Being a power of 2 also means that certain optimisations may be performed by dividing the 8 bits into groups which also means lookup tables can be used for certain operations. 8 bits in the past was also the perfect size when dealing wiht plain old ascii characters. I can imagine that using 5 bit bytes and encoding a string of ascii characters across memory would be a pain.

Please check out the Wikipedia entry on 32-bit processors, from the entry:
In computer architecture, 32-bit
integers, memory addresses, or other
data units are those that are at most
32 bits (4 octets) wide. Also, 32-bit
CPU and ALU architectures are those
that are based on registers, address
buses, or data buses of that size.
32-bit is also a term given to a
generation of computers in which
32-bit processors were the norm.
Read and understand the article - then the answer for N will be obvious.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse