Little Endian in Instruction - cpu-architecture

Little Endian in Instruction - cpu-architecture

I'm learning about RISC-V instructions in Computer Architecture.
What i wonder is, because of little endian, any number in RISC-V's instruction's little digit is on little bit.
I know that RISC-V use little endian to express data in memory. but I'm not sure it is same to express number in instructions.
for example, add instruction has that form, [funct7][rs2][rs1][funct3][rd][opcode], MSB is in funct7, LSB is in opcode. and rs1, rs2, rd is some number with 5bits.
if [rd] is 0b00001, position of 1 is LSB in rd's 5 bits. this point is my question. is reason that 1's position is LSB, RISC-V use little endian? if that is right, is (0b00001's) position of 1, MSB in big endian?

No, endian-ness only changes byte order, not bit ordering within fields of instructions.  In fact, we cannot observe the complete "bit ordering" because the individual bits don't have their own addresses (but we can see the effects of byte ordering when a field crosses a byte boundary).
It is true that the order of the fields themselves in RISC V are sort-of reversed from (big endian) MIPS, but that is really only so that the opcode field comes in the first byte — the lowest address of the bytes in the instruction.  This allows for variable length instructions, with little endian memory.  The lowest bits of the opcode determine the instruction length.

Related

A protocol is telling me to encode the numeric value 150 to 0x01 0x50, and the value 35 to 0x00 0x35?

So I'm trying to implement the 'ECR' protocol that talks to a credit card terminal (Ingenico/Telium device in Costa Rica).
The documentation for the 'length' bytes states:
Length of field DATA (it does not include ETX nor LRC)
Example: if length of field Message Data
is 150 bytes; then, 0x01 0x50 is sent.
I would think that the value '150' should be send as 0x00 0x96.
I've verified that that is not a typo. In a working example message which has 35 bytes of data, they really do send 0x00 0x35.
Am I missing something? Is this form of encoding the decimal representation of a value to its literal representation in hex a thing? Does it have a name? Why would anyone do this?

It has a name, and it was frequent in the past: it is Binary coded Decimal or in short BCD, see https://en.wikipedia.org/wiki/Binary-coded_decimal.
In fact Intel CPU but the 64-bit versions had special instructions to deal with them.
How it works: every decimal digit is encoded in 4 bits (a nibble), so a byte can host two decimal digits. And you get string of them to describe integer numbers. Note: to convert to string (or back from strings): you divide the nibbles and then it is just an addition ('0' + nibble): the C language requires that character encoding of digits must be consecutive (and ordered).
If you works a lot with decimals, it is convenient and fast: no need to transform to binary (which requires shift and addition, or just multiplications) and back (again shift or divisions). So in past when most CPU didn't have floating point co-processors, this was very convenient (especially if you need just to add or subtract numbers). So no need to handle precision errors (which banks doesn't like; was the first movie of Super Man about the villain getting rich by inserting a round error on a bank system? This show the worries of the time).
It has also less problem on number of bits: banks needs accounts with potential billions with a precision of cents. A BCD makes easier to port program on different platforms, with different endianess and different number of bits. Again: it was for the past, where 8-bit, 16-bit, 32-bit, 36-bit, etc. were common, and no real standard architecture.
It is obsolete system: newer CPUs doesn't have problem converting decimal to binary and back, and we have enough bits to handle cents. Note: still in financial sector the floating point is avoided. Just integers with a fixed point (usually 2 digits). But protocols and some sectors tend not to change protocols very often (for interoperability).

RISC-V: Immediate Encoding Variants

In the RISC-V Instruction Set Manual, User-Level ISA, I couldn't understand section 2.3 Immediate Encoding Variants page 11.
There is four types of instruction formats R, I, S, and U, then there is a variants of S and U types which are SB and UJ which I suppose mean Branch and Jump as shown in figure 2.3. Then there is the types of Immediate produced by RISC-V instructions shown in figure 2.4.
So my questions are, why the SB and UJ are needed? and why shuffle the Immediate bits in that way? what does it mean to say "the Immediate produced by RISC-V instructions"? and how are they produced in this manner?

To speed up decoding, the base RISC-V ISA puts the most important fields in the same place in every instruction. As you can see in the instruction formats table,
The major opcode is always in bits 0-6.
The destination register, when present, is always in bits 7-11.
The first source register, when present, is always in bits 15-19.
The second source register, when present, is always in bits 20-24.
The other bits are used for the minor opcode or other data for the instruction (funct3 in bits 12-14 and funct7 in bits 25-31), and for the immediate. How many bits can be used for the immediate depends on how many register numbers are present in the instruction:
Instructions with one destination and two source registers (R-type) have no immediate, for instance adding two registers (ADD);
Instructions with one destination and one source register (I-type) have 12 bits for the immediate, for instance adding one register with an immediate (ADDI);
Instructions with two source registers and no destination register (S-type), for instance the store instructions, have also 12 bits for the immediate, but they have to be in a different place since the register numbers are also in a different place;
Finally, instructions with only a destination register and no minor opcode (U-type), for instance LUI, can use 20 bits for the immediate (the major opcode and the destination register number together need 12 bits).
Now think from the other point of view, of the instructions which will use these immediate values. The simplest users, I-immediate and S-immediate, need only a sign-extended 12-bit value. The U-immediate instructions need the immediate in the upper 20 bits of a 32-bit value. Finally, the branch/jump instructions need the sign-extended immediate in the lower bits of the value, except for the lowest bit which will always be zero, since RISC-V instructions are always aligned to even addresses.
But why are the immediate bits shuffled? Think this time about the physical circuit which decodes the immediate field. Since it's a hardware implementation, the bits will be decoded in parallel; each bit in the output immediate will have a multiplexer to select which input bit it comes from. The bigger the multiplexer, the costlier and slower it is.
The "shuffling" of the immediate bits in the instruction encoding, therefore, is to make each output immediate bit have as little input instruction bit options as possible. For instance, immediate bit 1 can only come from instruction bits 8 (S-immediate or B-immediate), 21 (I-immediate or J-immediate), or constant zero (U-immediate or R-type instruction which has no immediate). Immediate bit 0 can come from instruction bits 7 (S-immediate), 20 (I-immediate), or constant zero. Immediate bit 5 can only come from instruction bit 25 or constant zero. And so on.
Instruction bit 31 is a special case: for RV-64, bits 32-63 of the immediate are always copies of instruction bit 31. This high fan-out adds a delay, which would be even bigger if it also needed a multiplexer, so it only has one option (other than constant zero, which can be treated later in the pipeline by ignoring the whole immediate).
It's also interesting to note that only the major opcode (bits 0-6) is needed to know how to decode the immediate, so immediate decoding can be done in parallel with decoding the rest of the instruction.
So, answering the questions:
SB-type doubles the range of branches, since instructions are always aligned to even addresses;
UJ-type has the same overall instruction format as U-type, but the immediate value is in the lower bits instead of the upper bits;
The immediate bits are shuffled to reduce the cost of decoding the immediate value, by reducing the number of choices for each output immediate bit;
The "immediate produced by RISC-V instructions" table shows the different kinds of immediate values which can be decoded from a RISC-V instruction, and from where in the instruction each bit comes from;
They are produced by, for each output immediate bit, using the major opcode (bits 0-6) to chose an input instruction bit.

The encoding is done to try and make the actual hardware implementation as simple as possible, rather than make it easy for the reader to understand at a glance.
In practice the compiler will generate the output and so it does not matter if it is not easy for the user to understand.
When possible the SB type tries to use the same bits for the same immediate bit positions as type S, that minimizes the hardware design complexity. So imm[4:1] and imm[10:5] are in the same place for both. The top most bit of the immediate values is always at position 31 so that you can use that bit to decide if a sign extension is needed. Again, this makes the hardware easier because for multiple types of instruction the top bit is used to decide on sign extension.

The RISC-V instruction encoding is chosen to simplify the decoder
2.2 Base Instruction Formats
The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding. Except for the 5-bit immediates used in CSR instructions(Chapter 9), immediates are always sign-extended, and are generally packed towards the left most available bits in the instruction and have been allocated to reduce hardware complexity. In particular, the sign bit for all immediates is always in bit 31 of the instruction to speed sign-extension circuitry.
2.3 Immediate Encoding Variants
The only difference between the S and B formats is that the 12-bit immediate field is used to encode branch offsets in multiples of 2 in the B format. Instead of shifting all bits in the instruction-encoded immediate left by one in hardware as is conventionally done, the middle bits (imm[10:1]) and sign bit stay in fixed positions, while the lowest bit in S format (inst[7]) encodes a high-order bit in B format.
Similarly, the only difference between the U and J formats is that the 20-bit immediate is shiftedleft by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instructionbits in the U and J format immediates is chosen to maximize overlap with the other formats andwith each other.
https://riscv.org/technical/specifications/
The reason for the shuffling of the immediate in SB/UL formats has also been explained in the RISC-V spec
Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hard-ware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible timeto static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straight forward immediate encodings.

How are datatypes that need more than 32 bits stored in a 32 bit OS

How are datatypes that would need more than 32 bits stored in the system?
For example consider an unsigned int or a long which can have a value greater than 2 to the power of 32, how is it stored in the memory?

Any OS, or compiler, will use the number of bits that it needs. So if an OS or language has a need for 64-bit integers, it will just store such integers into an 8-byte representation.
There are standards for this, for integers as well as floating point numbers. See this article on Wikipedia for more: http://en.wikipedia.org/wiki/Computer_numbering_formats

The 32-bits in a 32-bit architecture to the number of bits the CPU registers are wide (There are some exceptions, such as floating point registers). This does not mean that the system can't handle datatypes larger than this, only that it must deal with these datatypes 32-bits at a time.
For example, machines have an "Add With Carry" instruction, which allows the machine to chain link multiple adds together so that arbitrarily sized numbers, say two 512-bit numbers, can be added in 16 steps (512/32).

What is the exact meaning of 'N' bit processor ? , clarification for freescale arch

While reading one Freescale processor manual I stuck somewhere, which specifies that it is a 32-bit processor.
May I know the exact meaning and logic behind that?
Update:
Does it specify its ALU width or its address width or its register width specifically or all of them together is N-bit each.
Update:
Hope you have heard of Freescale processors. I just came across their site which describes one of their latest Starcore-based processor known as SC3850 as a 16-bit processor. As far as I know, it has 32 bit program counters, including ALU, and 40-bit register width and 2x64 bit address bus width. Also the SC3850 can handle SIMD(2) instructions which are of 32 bit or 64 bit.
For more details please go through this link

One of the major reasons you would care about the register width of the processor is performance. Generally doubling the number of bits doubles the rate at which a processor can move data around, and compute. This is why we're not all using 8 bit processors.
The other major reason is address space. A 16 bit program counter limits you to 64k of address space, and a 32 bit counter limits you to 4 gigabytes. The new 64 bit processors make it possible, if all the address lines are present, to support 17,179,869,184 gigabytes of memory.

Firstly i dont have a definitive answer but i would guess that 8 being a power of 2, is an important factor. Being a power of 2 also means that certain optimisations may be performed by dividing the 8 bits into groups which also means lookup tables can be used for certain operations. 8 bits in the past was also the perfect size when dealing wiht plain old ascii characters. I can imagine that using 5 bit bytes and encoding a string of ascii characters across memory would be a pain.

Please check out the Wikipedia entry on 32-bit processors, from the entry:
In computer architecture, 32-bit
integers, memory addresses, or other
data units are those that are at most
32 bits (4 octets) wide. Also, 32-bit
CPU and ALU architectures are those
that are based on registers, address
buses, or data buses of that size.
32-bit is also a term given to a
generation of computers in which
32-bit processors were the norm.
Read and understand the article - then the answer for N will be obvious.

Variable-byte encoding clarification

I am very new to the world of byte encoding so please excuse me (and by all means, correct me) if I am using/expressing simple concepts in the wrong way.
I am trying to understand variable-byte encoding. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Variable-width_encoding) as well as a book chapter from an Information Retrieval textbook. I think I understand how to encode a decimal integer. For example, if I wanted to provide variable-byte encoding for the integer 60, I would have the following result:
1 0 1 1 1 1 0 0
(please let me know if the above is incorrect). If I understand the scheme, then I'm not completely sure how the information is compressed. Is it because usually we would use 32 bits to represent an integer, so that representing 60 would result in 1 1 1 1 0 0 preceded by 26 zeros, thus wasting that space as opposed to representing it with just 8 bits instead?
Thank you in advance for the clarifications.

The way you do it is by reserving one of the bits to mean "I'm not done with the value." Usually, that's the most significant bit.
When you read a byte, you process the lower 7 bits. If the most significant bit is 1, then you know there's one more byte to read, and you repeat the process, adding the next 7 bits to the current 7 bits.
The MIDI format uses that exact encoding to represent lengths of MIDI events, in the following manner:
ExpectedValue = 0
byte=ReadFromFile
ExpectedValue = ExpectedValue + (byte AND 0x7f)
if byte > 127 then
ExpectedValue = ExpectedValue SHL 7
Goto 2
Done
For example, the value 0x80 would be represented using the bytes 0x81 0x00. You can try running the algorithm on those two bytes, and you see you'll get the right value.
UTF-8 works similarly, but it uses a slightly more complex scheme to tell you how many bytes you should be expecting. This allows for some error correction, since you can easily tell if the bytes you're getting match the length claimed. Wikipedia describes their structure quite well.

You hit the nail on the head.
There are many encoding schemes, such as gamma and delta, which are special cases of elias coding. These are bit-level codes, as opposed to the byte-level code you used, and are useful when you have a strong skew towards small numbers (which can often be achieved by encoding deltas instead of absolute values).
Bit-level encoding schemes are much more difficult to implement than byte-level schemes and the additional CPU burden may outweigh the time saved by having less data to read, though most modern CPUs have "highest-bit" and "lowest-bit" instructions that dramatically improve the performance of bit-level codecs. As CPU speeds continue to outpace RAM speeds, bit-level schemes will become more attractive, though the simplicity of byte-level codecs is a big factor too.

Yes, you are right, you save space by encoding using one byte instead of 4.
Generally, you will save memory if the values you are encoding are much smaller than the maximum value that would have fit in your original fixed-width encoding.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse