How to calculate branch instruction offset?

Calculating offset of branch instruction:
Hey guys,
My professor sent us a study guide with answers, but he never actually went over how he got them. I've searched online without any luck finding an explanation, so at this point I'm a bit desperate.
Does anyone know how the author arrived at those answers?

0xfffb is the 16-bit signed two's complement representation of -5. So on this machine the offset is scaled by the (presumably fixed) instruction length to get a byte address. (Byte-sized instructions can be ruled out, since the offset field alone is 16 bits.) The architecture is such that by the time the branch executes the PC has already been incremented, so a 0 offset is a NOP, a -1 offset branches to the branch itself, -2 branches to the instruction before the branch, and so on. Count backwards until you get to loop:. (Either there is some more information attached to the question, or known context, giving the details of the architecture I have assumed in making this answer, or it is a fairly badly written question.)
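As a sanity check on the two's complement arithmetic, here is a tiny C sketch; the 4-byte instruction length and the PC value are made-up assumptions for illustration, since the original question isn't quoted above:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    // 0xfffb reinterpreted as a 16-bit signed value is -5.
    int16_t offset = (int16_t)0xfffb;

    // Assume a fixed 4-byte instruction length and a PC that has already been
    // incremented past the branch (both assumptions, not from the question).
    uint32_t pc = 0x1020;
    uint32_t target = pc + (int32_t)offset * 4;

    printf("offset = %d\n", offset);      // -5
    printf("target = 0x%04x\n", target);  // 0x1020 - 0x14 = 0x100c
    return 0;
}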
For the cache question, you mostly just have to know the names used to describe cache architectures (or "cache geometry", "cache shape" etc.). "2-way set associative" means there are two places in the cache any given address can be placed. There are 128 sets, each of which can hold two blocks because it is 2-way associative, and each block is 32-bytes. (I usually call the 32-byte structure a "cache line", though it appears here the word "block" refers specifically to the data that is stored there and "line" also includes the valid bit and tag, etc.) You then break down the address starting from the least significant bit going outwards in the cache geometry.
It looks like this is an instruction cache, so we insist the bottom two bits are 0 and organize the cache in 32-bit items. A 32-byte block needs 5 low address bits: 2 of them are the "byte offset", which here should just be 0, and the remaining 3 bits are what gets called the "block offset" (really the offset within the block). (This subdivision of the 5 low bits doesn't really change anything here.) 128 sets give a 7-bit "index". The rest of the address, 20 bits, has to be used as a tag to check that the block actually holds the address being looked up (i.e. to determine cache hit or miss). Plus we need one more bit to say whether there's actually valid data in the block.
Then we just add it all up -- 32 bytes or 256 bits for the data, plus 20 bits of tag and 1 valid bit, multiplied by 128 sets and 2 ways: (256 + 20 + 1) x 128 x 2 = 70,912 bits.
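Spelling that arithmetic out as a small C sketch (the 32-bit address width is an inference from the 20-bit tag, i.e. 20 + 7 + 5 = 32):

#include <stdio.h>

int main(void) {
    // Cache geometry from the question: 2-way set associative,
    // 128 sets, 32-byte blocks, 32-bit addresses.
    const int addr_bits   = 32;
    const int block_bytes = 32;
    const int sets        = 128;
    const int ways        = 2;

    const int offset_bits = 5;                                    // log2(32)
    const int index_bits  = 7;                                    // log2(128)
    const int tag_bits    = addr_bits - offset_bits - index_bits; // 20
    const int valid_bits  = 1;

    long line_bits  = block_bytes * 8 + tag_bits + valid_bits;    // 256 + 20 + 1 = 277
    long total_bits = line_bits * sets * ways;

    printf("tag bits per line: %d\n", tag_bits);    // 20
    printf("bits per line    : %ld\n", line_bits);  // 277
    printf("total cache bits : %ld\n", total_bits); // 70912
    return 0;
}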

Related

RISC-V: Immediate Encoding Variants

In the RISC-V Instruction Set Manual, User-Level ISA, I couldn't understand section 2.3 Immediate Encoding Variants on page 11.
There are four base instruction formats: R, I, S, and U. Then there are variants of the S and U formats, SB and UJ, which I suppose stand for Branch and Jump, as shown in Figure 2.3. Then there are the types of immediates produced by RISC-V instructions, shown in Figure 2.4.
So my questions are: why are SB and UJ needed? Why shuffle the immediate bits in that way? What does it mean to say "the immediate produced by RISC-V instructions", and how are they produced in this manner?
To speed up decoding, the base RISC-V ISA puts the most important fields in the same place in every instruction. As you can see in the instruction formats table,
The major opcode is always in bits 0-6.
The destination register, when present, is always in bits 7-11.
The first source register, when present, is always in bits 15-19.
The second source register, when present, is always in bits 20-24.
The other bits are used for the minor opcode or other data for the instruction (funct3 in bits 12-14 and funct7 in bits 25-31), and for the immediate. How many bits can be used for the immediate depends on how many register numbers are present in the instruction:
Instructions with one destination and two source registers (R-type) have no immediate, for instance adding two registers (ADD);
Instructions with one destination and one source register (I-type) have 12 bits for the immediate, for instance adding one register with an immediate (ADDI);
Instructions with two source registers and no destination register (S-type), for instance the store instructions, also have 12 bits for the immediate, but those bits have to be in a different place since the register numbers are also in a different place;
Finally, instructions with only a destination register and no minor opcode (U-type), for instance LUI, can use 20 bits for the immediate (the major opcode and the destination register number together need 12 bits).
Now think from the other point of view, of the instructions which will use these immediate values. The simplest users, I-immediate and S-immediate, need only a sign-extended 12-bit value. The U-immediate instructions need the immediate in the upper 20 bits of a 32-bit value. Finally, the branch/jump instructions need the sign-extended immediate in the lower bits of the value, except for the lowest bit which will always be zero, since RISC-V instructions are always aligned to even addresses.
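As a concrete illustration of those immediate shapes, here is a minimal sketch of I-, S-, and U-immediate extraction for the RV32 base formats (the function names are mine, and the code assumes right shifts of signed values are arithmetic shifts, which is true of common compilers):

#include <stdio.h>
#include <stdint.h>

// I-type: imm[11:0] = inst[31:20], sign-extended from instruction bit 31.
int32_t imm_i(uint32_t inst) {
    return (int32_t)inst >> 20;
}

// S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7], same sign bit.
int32_t imm_s(uint32_t inst) {
    return ((int32_t)(inst & 0xfe000000u) >> 20) | ((inst >> 7) & 0x1f);
}

// U-type: imm[31:12] = inst[31:12], the low 12 bits are zero.
int32_t imm_u(uint32_t inst) {
    return (int32_t)(inst & 0xfffff000u);
}

int main(void) {
    // 0xffb00093 encodes "addi x1, x0, -5"; its I-immediate is -5.
    printf("%d\n", imm_i(0xffb00093u));
    return 0;
}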
But why are the immediate bits shuffled? Think this time about the physical circuit which decodes the immediate field. Since it's a hardware implementation, the bits will be decoded in parallel; each bit in the output immediate will have a multiplexer to select which input bit it comes from. The bigger the multiplexer, the costlier and slower it is.
The "shuffling" of the immediate bits in the instruction encoding, therefore, is to make each output immediate bit have as little input instruction bit options as possible. For instance, immediate bit 1 can only come from instruction bits 8 (S-immediate or B-immediate), 21 (I-immediate or J-immediate), or constant zero (U-immediate or R-type instruction which has no immediate). Immediate bit 0 can come from instruction bits 7 (S-immediate), 20 (I-immediate), or constant zero. Immediate bit 5 can only come from instruction bit 25 or constant zero. And so on.
Instruction bit 31 is a special case: for RV-64, bits 32-63 of the immediate are always copies of instruction bit 31. This high fan-out adds a delay, which would be even bigger if it also needed a multiplexer, so it only has one option (other than constant zero, which can be treated later in the pipeline by ignoring the whole immediate).
It's also interesting to note that only the major opcode (bits 0-6) is needed to know how to decode the immediate, so immediate decoding can be done in parallel with decoding the rest of the instruction.
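To make the bit shuffle concrete, here is a minimal sketch of a B-immediate decoder for the RV32 base encoding (the function name is mine, and it assumes arithmetic right shifts for signed values); note how each output bit is taken from exactly one instruction bit:

#include <stdio.h>
#include <stdint.h>

// B-type: imm[12] = inst[31], imm[11] = inst[7], imm[10:5] = inst[30:25],
// imm[4:1] = inst[11:8], and imm[0] is always zero (branch targets are even).
int32_t imm_b(uint32_t inst) {
    return ((int32_t)(inst & 0x80000000u) >> 19)  // imm[12], sign-extended
         | ((inst & 0x00000080u) << 4)            // imm[11]
         | ((inst >> 20) & 0x7e0)                 // imm[10:5]
         | ((inst >> 7)  & 0x1e);                 // imm[4:1]; bit 0 stays 0
}

int main(void) {
    // 0xfe000ee3 is a hand-assembled "beq x0, x0, -4".
    printf("%d\n", imm_b(0xfe000ee3u));   // -4
    return 0;
}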
So, answering the questions:
SB-type doubles the range of branches, since instructions are always aligned to even addresses;
UJ-type has the same overall instruction format as U-type, but the immediate value is in the lower bits instead of the upper bits;
The immediate bits are shuffled to reduce the cost of decoding the immediate value, by reducing the number of choices for each output immediate bit;
The "immediate produced by RISC-V instructions" table shows the different kinds of immediate values which can be decoded from a RISC-V instruction, and from where in the instruction each bit comes from;
They are produced by, for each output immediate bit, using the major opcode (bits 0-6) to chose an input instruction bit.
The encoding is done to try and make the actual hardware implementation as simple as possible, rather than make it easy for the reader to understand at a glance.
In practice the compiler generates the encoding anyway, so it does not matter much that it is not easy for a human to read at a glance.
When possible, the SB type uses the same instruction bits for the same immediate bit positions as the S type, which minimizes the hardware design complexity. So imm[4:1] and imm[10:5] are in the same place for both. The topmost bit of the immediate is always at instruction bit 31, so that single bit can be used to decide whether sign extension is needed. Again, this makes the hardware easier, because the same bit drives sign extension for multiple types of instruction.
The RISC-V instruction encoding is chosen to simplify the decoder
2.2 Base Instruction Formats
The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding. Except for the 5-bit immediates used in CSR instructions (Chapter 9), immediates are always sign-extended, and are generally packed towards the leftmost available bits in the instruction and have been allocated to reduce hardware complexity. In particular, the sign bit for all immediates is always in bit 31 of the instruction to speed sign-extension circuitry.
2.3 Immediate Encoding Variants
The only difference between the S and B formats is that the 12-bit immediate field is used to encode branch offsets in multiples of 2 in the B format. Instead of shifting all bits in the instruction-encoded immediate left by one in hardware as is conventionally done, the middle bits (imm[10:1]) and sign bit stay in fixed positions, while the lowest bit in S format (inst[7]) encodes a high-order bit in B format.
Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other.
https://riscv.org/technical/specifications/
The reason for the shuffling of the immediate in the SB/UJ formats has also been explained in the RISC-V spec
Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings.

Virtual Machine Instruction Length

I'm creating a virtual machine and I'm encoding the instructions into byte code. The instructions are hexadecimal numbers like this: 0x1064. This instruction means load the value 100 (hexadecimal 64) into register 0, and the number of the load instruction is 1. My question is: if I wanted to load a larger number, I would have to change the 64 to a larger number, for example 3E8 (1000 in hexadecimal), and the instruction would then be 5 characters long. Is it possible to keep the instructions the same length somehow?
It is certainly possible to keep the instructions the same length. In fact, it is possible to have a Turing-complete language using only one instruction! The question is what you want to do.
For simplicity of decoding, you may just decide to make all the instructions the same length. It increases the size of the code, but that usually doesn't matter much either way. Just do whatever you think is best.
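One common way to keep a fixed width and still handle big constants is to split the constant across two instructions. Here is a small C sketch of that idea; it keeps your 16-bit (four hex digit) instruction size, and the second opcode (OP_LDHI) is made up for illustration:

#include <stdio.h>
#include <stdint.h>

// Made-up fixed-width encoding: 4-bit opcode, 4-bit register, 8-bit immediate.
#define OP_LDI  0x1   // reg  = imm8        (encode(OP_LDI, 0, 0x64) is your 0x1064)
#define OP_LDHI 0x2   // reg |= imm8 << 8   (invented "load high byte" opcode)

uint16_t encode(uint8_t op, uint8_t reg, uint8_t imm) {
    return (uint16_t)(((op & 0xf) << 12) | ((reg & 0xf) << 8) | imm);
}

int main(void) {
    uint16_t regs[16] = {0};

    // Load 0x03E8 (1000) into register 0 with two fixed-length instructions.
    uint16_t prog[] = {
        encode(OP_LDI,  0, 0xE8),   // r0  = 0x00E8
        encode(OP_LDHI, 0, 0x03),   // r0 |= 0x0300  -> 0x03E8
    };

    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++) {
        uint16_t in  = prog[i];
        uint8_t  op  = in >> 12;
        uint8_t  reg = (in >> 8) & 0xf;
        uint8_t  imm = in & 0xff;
        if (op == OP_LDI)  regs[reg] = imm;
        if (op == OP_LDHI) regs[reg] |= (uint16_t)imm << 8;
    }
    printf("r0 = 0x%04X\n", (unsigned)regs[0]);   // 0x03E8
    return 0;
}

This is essentially how RISC machines with fixed-length instructions build large constants (e.g. a load-upper-immediate followed by an OR/add of the low bits).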

Is there an explanation on the paging qn asked in 'The Social Network'?

"Suppose you are given a computer with a 16-bit virtual address and a page size of 256 bytes. The system uses 1-level page tables that start at address hex 400. Maybe you want DMA...who knows? The first few pages are reserved for hardware flags, etc. Assume page table entries have 8 status bits. The 8 status bits would be..."
http://www.youtube.com/watch?v=-3Rt2_9d7Jg
Can someone explain why the answer is as Mark/Jesse described?
According to this page documenting some technical inaccuracies of The Social Network, the question is (badly) derived from a question from an actual Harvard course.
A sample problem: Suppose we are given a computer with 16-bit virtual addresses, and a page size of 256 bytes. The system uses one-level page tables, which start at address 0x0400. (The first few pages are reserved for hardware flags, etc. Maybe you wanted to have DMA on your 16-bit system, who knows?) Assume page table entries have eight status bits: 1 valid bit, 1 modify bit, 1 reference bit, and 5 permissions bits (this is a very secure system).
How many pages are there? How much memory do the page tables require?
The 8 status bits are architecture dependent and, in this particular problem, are made up as an assumption for an imaginary computer. The movie producers simply took the problem description and turned one of the assumptions into the question, a question that doesn't make sense to ask in the first place.
To more easily understand this, imagine that you have a question like the following
A car traveled across a road for 1 hour. Assuming the car's speed is 100 km/h, how much distance did the car travel?
and the question turned into
A car traveled across a road for 1 hour. The speed of the car was...?
Edit: Didn't realise the original article used a similar analogy to mine.
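For what it's worth, the two questions the Harvard-style problem actually asks (how many pages, how much memory for the page tables) work out quickly. A small sketch, assuming each page-table entry packs the 8 status bits together with an 8-bit frame number into 2 bytes; the entry width is an assumption, since the quoted problem doesn't state it:

#include <stdio.h>

int main(void) {
    int va_bits     = 16;                            // 16-bit virtual addresses
    int page_bytes  = 256;                           // so an 8-bit page offset
    int pages       = (1 << va_bits) / page_bytes;   // 2^16 / 2^8 = 256 pages
    int pte_bytes   = 2;                             // 8 status bits + 8-bit frame number (assumed)
    int table_bytes = pages * pte_bytes;             // 512 bytes, starting at 0x0400

    printf("pages            = %d\n", pages);        // 256
    printf("page table bytes = %d\n", table_bytes);  // 512
    return 0;
}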

What's the biggest number in a computer?

Just asked by my 5 year old kid: what is the biggest number in the computer?
We are not talking about the max number for a specific data type, but the biggest number that a computer can represent.
Infinity is not allowed.
UPDATE: my kid always wants to print the number as well, so let's say the computer needs to print this number and the kid needs to know that it's a big number. Of course, in practice we won't print it because there aren't enough trees.
This question is actually a very interesting one which mathematicians have devoted a fair bit of thought to. You can read about it in this article, which is a fascinating and accessible read.
Briefly, a guy named Tibor Rado set out to find some really big, but still well-defined, numbers by defining a sequence called the Busy Beaver numbers. He defined BB(n) to be the largest number of steps any n-state Turing Machine could take before halting, starting from a blank tape. Note that this sequence is by its very nature not computable, so the numbers themselves, while well-defined, are very difficult to pin down. Here are the first few:
BB(1) = 1
BB(2) = 6
BB(3) = 21
BB(4) = 107
... wait for it ...
BB(5) >= 47,176,870, and BB(6) >= 8,690,333,381,690,951
No one is sure exactly how big BB(5) and BB(6) really are, but they are finite. And no one has any idea how big BB(7) and above are. But at least these numbers are completely well-defined mathematically, unlike "the largest number any human has ever thought of, plus one." ;)
So how about this:
The biggest number a computer can represent is the most instructions a program small enough to fit in its available memory can perform before halting.
Squared.
No, wait, cubed. No, raised to the power of itself!
Dammit!
Bits are not numbers. You, as a programmer, give them the meaning you want, possibly numbers.
Now, I decide that 1 represents "the biggest number ever thought by a human plus one".
Errr this is a five year old?
How about something along the lines of: "I'd love to tell you but the number is so big and would take so long to say, I'd die before I finished telling you".
#include <stdio.h>

// Wait and see: print 9s forever.
int main(void) {
    for (;;) {
        printf("9");
    }
}
roughly 2^AVAILABLE_MEMORY_IN_BITS
EDIT: The above is for actually storing a number and treats all media (RAM, HD, cloud etc.) as memory. Subtracting the OS footprint (measured in KB) doesn't make "roughly" less accurate...
If you want to "represent" a number in a meaningful way, then you probably want to go with what the CPU provides: unsigned 32 bit integers (roughly 4 Gigs) or unsigned 64 bit integers for most computers your kid will come into contact with.
NOTE for talking to 5-year-olds: Often, they just want a factoid. Give him a really big and very accurate number (lots of digits), like 4'294'967'295. Then, once the glazing leaves his eyes, try to see how far you can get with explaining how computers represent numbers.
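If you want those factoid numbers straight from the machine, a couple of lines of C will print the 32-bit and 64-bit unsigned limits:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    // The largest values the usual unsigned fixed-width integer types can hold.
    printf("32-bit: %" PRIu32 "\n", (uint32_t)UINT32_MAX);  // 4294967295
    printf("64-bit: %" PRIu64 "\n", (uint64_t)UINT64_MAX);  // 18446744073709551615
    return 0;
}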
EDIT #2: I once read this article: Who Can Name the Bigger Number that should provide a whole lot of interesting information for your kid. Obviously he's not your normal five-year-old. So this might get you started in a cool direction about numbers and computation.
The answer to life (and this kid's question): 42
That depends on the datatype you use to represent it. The computer only stores bits (0/1). We, as developers, give the bits meaning. (65 can be a number or the letter A).
For example, I can define my datatype as 10^N, where N is unsigned and represented by an array of bits of arbitrary size. The next person can come up with 10^(N+1), which would be ten times larger than my biggest number.
Sure, there would be gaps but if you don't need them, that doesn't matter.
Therefore, the question is meaningless since it doesn't have context.
Well, I had the same question earlier today, so I thought why not write a little C++ program to see where the computer is going to stop...
But my laptop wasn't with me in class so I used another one. The number got very big but it never ends; I'll run it again overnight and then I'll share the number.
You can try it yourself; the code is silly:
#include <stdlib.h>
#include <stdio.h>

int main() {
    // Count up and print each value; the condition i <= i is always true,
    // so this only "stops" once signed overflow finally misbehaves.
    int i = 0;
    for (i = 0; i <= i; i++) {
        printf("%i\n", i);
    }
    return 0;
}
And let it run till it stops ^^
The size will obviously be limited by the total size of hard drives you manage to put into your PC. After all, you can store a number in a text file occupying all disk space.
You can have 4 x 2 TB drives even in a simple box, so around 8 TB available. If you store the number in binary, the biggest number is roughly 2^64,000,000,000,000 (8 TB is about 64 trillion bits).
If your hard drive is 1 TB (8'000'000'000'000 bits), and you would print the number that fits on it on paper as hex digits (nobody would do that, but let's assume), that's 2,000,000,000,000 hex digits.
Each page would contain 4000 hex digits (40 x 100 digits). That's 500,000,000 pages.
Now stack the pages on top of each other (let's say each page is 0.004 inches / 0.1 mm thick); the stack would then be about 50 km (roughly 30 miles) tall.
I'll try to give a practical answer.
Common Lisp number crunching is particularly powerful. It has something called "bignums", which are integers that can be arbitrarily large, limited only by the amount of available memory.
See: http://en.wikibooks.org/wiki/Common_Lisp/Advanced_topics/Numbers#Fixnums_and_Bignums
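The same idea is available from C via arbitrary-precision libraries. For example, a small sketch using GMP (my choice of library, not something mentioned above):

#include <stdio.h>
#include <gmp.h>   // GNU MP library; build with -lgmp

int main(void) {
    mpz_t big;
    mpz_init(big);

    // big = 2^100000, far beyond any fixed-width machine integer.
    mpz_ui_pow_ui(big, 2, 100000);

    printf("2^100000 has about %lu decimal digits\n",
           (unsigned long)mpz_sizeinbase(big, 10));

    mpz_clear(big);
    return 0;
}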
I don't know much about theory, but as far as I understood from your question, it is: what is the largest number that the computer can represent (and I would add: in a reasonable time, not printing "9" until the Earth is "eaten by the Sun"). So I put my PC to work on one simple calculation (in PHP or whatever language): echo pow(2,1023) - resulting in 8.9884656743116E+307. So I guess this is the largest number that my PC can calculate. On the other side, I think the representation of the largest negative number can be: -0,(0)1
LE: That computed value was obtained through PHP, but I tried to figure out what's the largest number that my Windows calculator can compute, and it is pow(2, 33219) = 8.2304951207588748764521361245002E+9999. Now I guess this is the largest number my PC can handle.
I think you should be very proud that your 5 year old is already asking questions like this.
And you should continue to encourage that! This is truly amazing! With that said, I would say that ruling out Infinity reflects a misconception about what numbers mean in computer memory.
I feel like this way of thinking is a handicap.
Mathematicians will never be able to write out ALL the digits of pi or Euler's number, BUT we FULLY understand them.
Pi, as an example, is perfectly represented by this infinite series: (Pi / 4) = 1 - 1/3 + 1/5 - 1/7 + 1/9 - …
Just because you literally can't go to infinity, or print every single digit in a console, means nothing.
You could have printed the symbol representing pi and thereby captured the infinite series.
Computer Algebra Systems (CAS) represent numbers symbolically all the time. Pi, for instance, may be a symbolic object in memory (the binary in memory does not DIRECTLY represent the number; it represents a "mathematical algorithm" for producing the answer to arbitrary precision).
Then you do some math with it, transforming from one expression to the next.
At no point in time did we not represent the number COMPLETELY.
At the end, you can do 2 things with this:
A) Evaluate the expression, turning it into a number of some kind (or Matrix or whatever).
BUT this number could very well be an approximation (say like 20 digits of pi).
B) Keep it in its symbolic form for reference. Obviously we don't like staring at symbols forever, because eventually we need to turn the knobs on the apparatus.
NOTE: sometimes you can get a finite (rational) number perfectly represented in memory (like the number 1) by taking limits or going to infinity; not by literally holding an infinite number in memory, but by representing it symbolically.
Just throw this into Wolfram Alpha: Limit[Exp[-x], x -> Infinity]. It gives you the number 0, which is EXACT.
In short:
It was the human need to have some binary in memory that DIRECTLY represents the number that caused the number to degrade. Symbolically it was perfectly represented. You could design an algorithm that just keeps calculating the next digits of pi or Euler's number, giving you an arbitrary amount of precision (though this is obviously not practical).
I hope this was at least somewhat useful or interesting to you, even if you disagree =)
Depends on how much the computer can handle. Although there are some times when the computer can handle numbers greater than (2^(bits-1)-1)... For example:
My computer is 64 bit (9223372036854775807), however the calculator that comes with the computer itself can handle numbers of up to 10^9999.
Many other supercomputers can exceed these limits, and the one with the most memory (bits) may well be the one holding the record (the current largest number that can be held by a computer).
Or, if it comes to actually seeing it on a screen, you can just write a program that keeps printing 9 to the monitor on the same line, forming an ever-growing string of 9s. :P
Open Chrome, click the three-dot menu at the top, go to More Tools and then Developer Tools, click on Console, and type Number.MAX_VALUE.

redundant encoding?

This is more of a computer science / information theory question than a straightforward programming one, so if anyone knows of a better site to post this, please let me know.
Let's say I have an N-bit piece of data that will be sent redundantly in M messages, where at least M-1 of those messages will be received successfully. I am interested in different ways of encoding the N-bit piece of data in fewer bits per message. (this is similar to RAID but at a much smaller level, where N = 8 or 16 or 32)
Example: suppose N = 16 and M = 4. Then I could use the following algorithm:
1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15
If I can guarantee that 3 of the 4 messages will get through, then at least one message from each group will get through. Thus I can make this work with 9 bits per message; there's probably a way to do this with fewer total bits, but I'm not sure how.
Are there some simple encoding/decoding algorithms to do this kind of thing? Does this problem have a name? (if I know what it's called, I can google it!)
note: in my particular case, the messages either arrive correctly or do not arrive at all (no messages arrive with errors).
(edit: moved 2nd part to a separate question)
(Incomplete answer follows. I may add more later.)
The term you may be interested in is channel coding: adding redundancy to a source in order to make it robust during transmission over a noisy channel. In information theory, the complementary problem to channel coding is source coding: reducing the redundancy in a source to represent it using fewer bits. (The combination of these two problems is called joint source-channel coding.)
Your first question asks to find a channel code. The simple example you give is similar to a repetition code, i.e., you send the same message more than once (usually an odd number of times), and then the message which is received most often is accepted as the original message.
This code is inefficient. To use standard notation, let k = number of bits in original message, and n = number of bits in the transmitted message. For your example, k = 16 and n = 36. A measure of coding efficiency is k/n, where higher means more efficient. In your case, k/n = 0.44. This is low.
The repetition code is a simple kind of block code, i.e., redundancy is added to each block of k bits to create a codeword of n bits. So are the Hamming and Reed-Solomon codes as others mentioned. Hamming codes are relatively easy to understand with some basic linear algebra.
These should be enough terms for you to search on your own. Good luck.
I'm not sure if I understood all the details of your question correctly, but your problem is definitely about designing some kind of error-correcting code. This is a vast area of computer science and thick tomes have been written about it. Start with Wikipedia and see if you can get any simple schemes (like Hamming or Reed-Solomon codes) to work in your case.
If you want to deal not only with symbol corruption, but also deletion of symbols, you should look at erasure codes, this is definitely a more difficult task but good methods exist in many cases.
EDIT: This material from hackersdelight.org seems a nice introduction.
See erasure codes.
You're looking for a packet erasure code. There are only two useful packet erasure codes that are not totally encumbered by patents, and there's only one open-source library to implement those. Find it here: http://planete-bcast.inrialpes.fr/rubrique.php3?id_rubrique=5
Here's a trivially simple scheme that's almost twice as efficient as your example.
You chopped the message into blocks of (N/M)*2 bits. Instead, chop it into M-1 blocks of N/(M-1) bits each. (Round up if necessary.) The first block, src[0], encodes as itself: enc[0]=src[0]. Likewise the last encoded block is just the last source block: enc[M-1]=src[M-2]. Each of the other blocks gets XORed with its left neighbor: enc[i]=src[i-1]^src[i].
Prefix each encoded block with a log(M)-bit sequence number, essentially as you did, so the receiver can tell which was dropped. (If you can be sure that whichever blocks arrive will arrive in order, then a 1-bit sequence number will do. Just alternate 0 and 1.)
To decode, successively XOR from the left and the right until you hit the dropped block. E.g. src[1] == enc[0]^enc[1]. (Dropping one of the endpoint blocks isn't a special case -- e.g. if the first block is dropped, the scan from the right recovers it, and the scan from the left is of length 0.)
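Here is a minimal C sketch of that scheme; it uses byte-sized blocks and a hard-coded M = 4 example, rather than the 6-bit blocks a 16-bit message would actually use, just to keep the illustration short:

#include <stdio.h>
#include <stdint.h>

#define M 4   // messages sent; any M-1 of them are enough to recover the data

int main(void) {
    // M-1 source blocks (one byte each, for simplicity).
    uint8_t src[M - 1] = {0xDE, 0xAD, 0xBE};

    // Encode: the end blocks pass through, the middle blocks are XORed
    // with their left neighbor.
    uint8_t enc[M];
    enc[0] = src[0];
    for (int i = 1; i < M - 1; i++) enc[i] = src[i - 1] ^ src[i];
    enc[M - 1] = src[M - 2];

    // Pretend message 'lost' was dropped, and recover src from the rest
    // by XOR-scanning inwards from both ends.
    int lost = 2;
    uint8_t out[M - 1];
    uint8_t acc = 0;
    for (int i = 0; i < lost; i++) {          // scan from the left
        acc ^= enc[i];
        out[i] = acc;
    }
    acc = 0;
    for (int i = M - 1; i > lost; i--) {      // scan from the right
        acc ^= enc[i];
        out[i - 1] = acc;
    }

    for (int i = 0; i < M - 1; i++) printf("%02X ", out[i]);   // DE AD BE
    printf("\n");
    return 0;
}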