Virtual Machine Instruction Length - encoding

I'm creating a virtual machine and I'm encoding the instructions into bytecode. The instructions are hexadecimal numbers like 0x1064: this instruction means load the value 100 (hexadecimal 64) into register 0, and the opcode of the load instruction is 1. My question is: if I wanted to load a larger number, I would change the 64 to a larger value, for example 3E8 (1000 in decimal), and the instruction would be 5 characters long. Is it possible to keep the instructions the same length somehow?

It is certainly possible to keep the instructions the same length. In fact, it is possible to have a Turing-complete language using only one instruction! The question is what you want to do.
For simplicity of decoding, you may just decide to make all the instructions the same length. It increases the size of the code, but either way it doesn't really matter much. Just do whatever you think is best.
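For example, here's a minimal sketch in C of one way to do it (the opcode value, field widths, and register count below are all made up for illustration): a fixed 32-bit instruction word with an 8-bit opcode, a 4-bit register field, and a 20-bit immediate, so both 0x64 and 0x3E8 encode to the same 4-byte length.
#include <stdio.h>
#include <stdint.h>

/* One possible fixed-width layout (purely illustrative):
   bits 31..24  opcode               (0x01 = LOAD, made up here)
   bits 23..20  destination register (0..15)
   bits 19..0   immediate            (0 .. 0xFFFFF)              */
static uint32_t encode_load(uint8_t reg, uint32_t imm) {
    return (0x01u << 24) | ((uint32_t)(reg & 0xFu) << 20) | (imm & 0xFFFFFu);
}

int main(void) {
    printf("0x%08X\n", encode_load(0, 0x64));   /* prints 0x01000064 */
    printf("0x%08X\n", encode_load(0, 0x3E8));  /* prints 0x010003E8 */
    return 0;
}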

Related

How to calculate branch instruction offset?

Calculating offset of branch instruction:
Hey guys,
My professor sent us a study guide with answers. He never actually went over how he got the answers though. I've searched online but I did not have any luck finding an explanation so at this point I'm a bit desperate.
Does anyone know how the author arrived at those answers?
0xfffb is the 16-bit signed two's complement representation of -5. So in this machine, the offset is scaled by the (presumably fixed) instruction length to get a byte address. (It could have byte-sized instructions, but that is not possible here since the offset itself is 16 bits.) The architecture is such that by the time the branch is executed the PC has already been incremented, so a 0 offset is a NOP, a -1 offset branches to the branch itself, -2 branches to the instruction before the branch, etc. Count backwards until you get to loop:. (Either there is some more info attached to the question, or known context, giving the details of the architecture I have assumed in making the answer, or it is a fairly badly written question.)
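As a rough illustration of that arithmetic (a sketch only; the PC value and the 4-byte instruction size here are assumptions for demonstration, not something stated in the question):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t raw = 0xfffb;              /* offset field from the encoding        */
    int16_t offset = (int16_t)raw;      /* reinterpret as signed: -5             */
    int insn_size = 4;                  /* assumed fixed instruction size        */
    uint32_t pc_after = 0x1010;         /* made-up PC value after the increment  */

    uint32_t target = pc_after + (int32_t)offset * insn_size;
    printf("offset = %d, branch target = 0x%x\n", offset, target);  /* -5, 0xffc */
    return 0;
}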
For the cache question, you mostly just have to know the names used to describe cache architectures (or "cache geometry", "cache shape", etc.). "2-way set associative" means there are two places in the cache any given address can be placed. There are 128 sets, each of which can hold two blocks because it is 2-way associative, and each block is 32 bytes. (I usually call the 32-byte structure a "cache line", though it appears here the word "block" refers specifically to the data that is stored there and "line" also includes the valid bit, tag, etc.) You then break down the address starting from the least significant bit, working outwards through the cache geometry.
It looks like this is an instruction cache, so we insist the bottom two bits are 0 and organize the cache in 32-bit items. The block is 32 bytes, i.e. 5 bits of address. 2 of those are the "byte offset", which should probably just be 0, and the remaining 3 bits complete the 5-bit part, which gets called the "block offset" (really the offset within the block). (This subdivision of the 5 low bits doesn't really change anything here.) 128 sets give a 7-bit "index". The rest of the address, 20 bits, has to be used to tag the block to make sure it holds the address being looked up (i.e. to determine cache hit or miss). Plus we need one more bit to say whether there's actually data in the block.
Then we just add it all up: 32 bytes or 256 bits for the data, plus 20 bits of tag and 1 valid bit, multiplied by 128 sets and 2 ways.
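If it helps, here is the same arithmetic in a small C program (using the geometry assumed above and a 32-bit address):
#include <stdio.h>

int main(void) {
    /* assumed geometry from the question: 2-way, 128 sets, 32-byte blocks */
    int block_bytes = 32;
    int sets = 128;
    int ways = 2;
    int addr_bits = 32;

    int offset_bits = 5;                                       /* log2(32)  */
    int index_bits  = 7;                                       /* log2(128) */
    int tag_bits    = addr_bits - offset_bits - index_bits;    /* 20        */

    int line_bits   = block_bytes * 8 + tag_bits + 1;   /* data + tag + valid */
    int total_bits  = line_bits * sets * ways;

    printf("tag bits per line: %d\n", tag_bits);     /* 20    */
    printf("bits per line:     %d\n", line_bits);    /* 277   */
    printf("total cache bits:  %d\n", total_bits);   /* 70912 */
    return 0;
}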

CISC instruction length

I was wondering, what is the maximum possible length of a CISC instruction on most of today's CISC architectures?
I haven't found the definitive answer yet, but it is suggested that it's 16 bytes long, in theory.
In the video, at around 15:00, why does the speaker say "in theory", and why exactly 16 bytes?
In practice as well. For x86-64, AMD has limited the allowed instruction length to 15 bytes. After that, the instruction decoder will give up and signal an error.
Otherwise, with multiple instruction prefixes and override bytes, we don't know exactly how long the instruction could get. No limit at all, if we allow redundant repetitions of some prefixes.
Agner Fog describes the problem:
Executing three, four or five instructions simultaneously is not unusual. The limit is not the execution units, which we have plenty of, but the instruction decoder. The length of an instruction can be anywhere from one to fifteen bytes. If we want to decode several instructions simultaneously, then we have a serious problem. We have to know the length of the first instruction before we know where the second instruction begins. So we can't decode the second instruction before we have decoded the first instruction. The decoding is a serial process by nature, and it takes a lot of hardware to be able to decode multiple instructions per clock cycle. In other words, the decoding of instructions can be a serious bottleneck, and it becomes worse the more complicated the instruction codes are.
See the rest of his blog post here.
CISC is a design philosophy, not an architecture, so there's no such thing as "CISC instruction length", only the instruction length of a specific CISC architecture (like x86 or Motorola 68k).
Talking specifically about x86, the limit is 15 bytes. Theoretically the instruction length could be unbounded, because prefixes can be repeated. However, that makes things difficult for the decoder, so with the 80286 Intel began to limit it to 10 bytes, and then to 15 bytes in later ISA versions. For more information read:
x86_64 ASM - maximum bytes for an instruction?
What is the maximum length an Intel 386 instruction without any prefixes?
Also note that RISC doesn't mean fixed-length instructions. Modern MIPS, ARM, RISC-V... all have a variable-length instruction mode to increase code density.

Is there any classic 3 byte fingerprint function?

I need a checksum/fingerprint function for short strings (say, 16 to 256 bytes) which fits in a 24 bits word. Is there any well known algorithm for that?
I propose to use a 24-bit CRC as an easy solution. CRCs are available in all lengths and always simple to compute. Wikipedia has a matching entry. The quality is far better than a modulo-reduced sum, because swapping characters will most likely produce a different CRC.
The next step (if a different string producing the same checksum is a real threat) would be a cryptographic MAC such as CMAC. While its output is longer than 24 bits out of the box, it can be truncated to the first 24 bits.
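For reference, here is a bitwise sketch of one well-known 24-bit CRC, the CRC-24 used by OpenPGP (RFC 4880), with polynomial 0x864CFB and initial value 0xB704CE:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC-24 as used by OpenPGP (RFC 4880):
   polynomial 0x864CFB, initial value 0xB704CE, no reflection, no final XOR. */
static uint32_t crc24(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xB704CEu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 16;
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if (crc & 0x1000000u)
                crc ^= 0x1864CFBu;   /* clears bit 24, keeping a 24-bit value */
        }
    }
    return crc & 0xFFFFFFu;
}

int main(void)
{
    const char *s = "hello world";
    printf("CRC-24: 0x%06X\n", crc24((const uint8_t *)s, strlen(s)));
    return 0;
}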
Simplest thing to do is a basic checksum - add up the bytes in the string, mod (2^24).
You have to watch out for character set issues when converting to bytes, though, so that everyone agrees on the same encoding of characters to bytes.
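A minimal sketch of that additive checksum in C (assuming the string has already been converted to bytes in an agreed-upon encoding):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sum of bytes, reduced mod 2^24. Cheap, but transposed characters
   ("ab" vs "ba") produce the same value, unlike a CRC.              */
static uint32_t checksum24(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (sum + data[i]) & 0xFFFFFFu;
    return sum;
}

int main(void)
{
    const char *s = "hello world";
    printf("0x%06X\n", checksum24((const uint8_t *)s, strlen(s)));
    return 0;
}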

How is the word length of an ISA implemented in hardware and software?

I have learnt that word length is an ISA feature which has to be implemented in both hardware and software. I have only a vague idea about the answer and need correction or confirmation. Does the word length become the size of the general purpose registers in the CPU? Does the word length become the size of 'int' (just plain int, not long or short) for a compiler?
The word length is the number of bits natively handled by the system. Common versions right now are 32-bit words and 64-bit words.
For example, a byte can hold a number from 0 to 255. However, a 32-bit integer ranges from 0 to 4,294,967,295. An integer is the native "word size" of the system, so it is 4 bytes wide on 32-bit systems and can therefore hold a considerably larger range than 0-255.
In fact, in many systems/compilers/etc. types which are smaller than a system's native word size are converted to that word size simply because it's more efficient than trying to put multiple values into a single word. A boolean, for example, can be represented by a single bit. However, if you write a piece of software that uses 32 boolean values, it's not going to squeeze them all into a single word. Each will be assigned its own word when it runs on the metal.
I am taking liberty and interpreting this question as size of integer on a computer in C or C++. In that case this link will help - Does the size of an int depend on the compiler and/or processor?.
However, if read literally, the word size of a CPU is the size of its registers.
Hardware implementation: the word length is the number of bytes fetched by the CPU at a time and can also be called the natural size of the machine (though there is nothing natural about computers). It also becomes the size of the CPU's registers in an implementation, since the CPU needs registers to store what it fetches. Having said that, it is possible to use bigger registers for storage: IA-32 software (with a 32-bit word length) can run on x86-64 (with a 64-bit word length). Software implementation: the word length becomes the size of 'int' (just plain int, not long or short).
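A quick way to see what a particular compiler/ABI actually chooses (a sketch; the values printed are implementation-defined and need not match the machine's word size):
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* On a typical LP64 x86-64 build, int is still 4 bytes even though
       the machine word and the pointers are 8 bytes. */
    printf("CHAR_BIT       = %d\n", CHAR_BIT);
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    return 0;
}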

What's the biggest number in a computer?

Just asked by my 5 year old kid: what is the biggest number in the computer?
We are not talking about max number for a specific data types, but the biggest number that a computer can represent.
Infinity is not allowed.
UPDATE: my kid always wants to print as well, so let's say the computer needs to print this number and the kid needs to know that it's a big number. Of course, in practice we won't print it because there aren't enough trees.
This question is actually a very interesting one which mathematicians have devoted a fair bit of thought to. You can read about it in this article, which is a fascinating and accessible read.
Briefly, a guy named Tibor Rado set out to find some really big, but still well-defined, numbers by defining a sequence called the Busy Beaver numbers. He defined BB(n) to be the largest number of steps any n-state Turing Machine could take before halting, when started on a blank tape. Note that this sequence is by its very nature not computable, so the numbers themselves, while well-defined, are very difficult to pin down. Here are the first few:
BB(1) = 1
BB(2) = 6
BB(3) = 21
BB(4) = 107
... wait for it ...
BB(5) >= 8,690,333,381,690,951
No one is sure how big exactly BB(5) is, but it is finite. And no one has any idea how big BB(6) and above are. But at least these numbers are completely well-defined mathematically, unlike "the largest number any human has ever thought of, plus one." ;)
So how about this:
The biggest number a computer can represent is the most instructions a program small enough to fit in its available memory can perform before halting.
Squared.
No, wait, cubed. No, raised to the power of itself!
Dammit!
Bits are not numbers. You, as a programmer, give them the meaning you want, possibly numbers.
Now, I decide that 1 represents "the biggest number ever thought by a human plus one".
Errr this is a five year old?
How about something along the lines of: "I'd love to tell you but the number is so big and would take so long to say, I'd die before I finished telling you".
#include <stdio.h>

// wait to see
int main(void) {
    for (;;) {
        printf("9");   // prints 9s forever
    }
}
roughly 2^AVAILABLE_MEMORY_IN_BITS
EDIT: The above is for actually storing a number and treats all media (RAM, HD, cloud etc.) as memory. Subtracting the OS footprint (measured in KB) doesn't make "roughly" less accurate...
If you want to "represent" a number in a meaningful way, then you probably want to go with what the CPU provides: unsigned 32 bit integers (roughly 4 Gigs) or unsigned 64 bit integers for most computers your kid will come into contact with.
NOTE for talking to 5-year-olds: Often, they just want a factoid. Give him a really big and very accurate number (lots of digits), like 4'294'967'295. Then, once the glazing leaves his eyes, try to see how far you can get with explaining how computers represent numbers.
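If you want to show where those factoids come from, here is a tiny C sketch that prints the limits mentioned above:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* the "really big and very accurate" values mentioned above */
    printf("unsigned 32-bit max: %" PRIu32 "\n", (uint32_t)UINT32_MAX);
    printf("unsigned 64-bit max: %" PRIu64 "\n", (uint64_t)UINT64_MAX);
    return 0;
}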
EDIT #2: I once read this article: Who Can Name the Bigger Number that should provide a whole lot of interesting information for your kid. Obviously he's not your normal five-year-old. So this might get you started in a cool direction about numbers and computation.
The answer to life (and this kid's question): 42
That depends on the datatype you use to represent it. The computer only stores bits (0/1). We, as developers, give the bits meaning. (65 can be a number or the letter A).
For example, I can define my datatype to mean "1 followed by N zeros", where N is unsigned and represented by an array of bits of arbitrary size. The next person can come up with "10 followed by N zeros", which would be ten times larger than my biggest number.
Sure, there would be gaps but if you don't need them, that doesn't matter.
Therefore, the question is meaningless since it doesn't have context.
Well, I had the same question earlier today, so I thought: why not write a little C++ program to see where the computer is going to stop...
But my laptop wasn't with me in class so I used another one. The number got too big but it never ends; I'll run it again overnight and then share the number.
You can try it, the code is stupid:
#include <stdio.h>
#include <limits.h>

int main() {
    // count up and stop once we reach the largest plain int
    for (int i = 0; i < INT_MAX; i++) {
        printf("%i\n", i);
    }
    printf("%i\n", INT_MAX);
}
And let it run till it stops ^^
The size will obviously be limited by the total size of the hard drives you manage to put into your PC. After all, you can store a number in a text file occupying all the disk space.
You can have 4 x 2 TB drives even in a simple box, so around 8 TB available. If you store it as binary, then the biggest number is about 2 pow 64000000000000 (8 TB is roughly 64 trillion bits).
If your hard drive is 1 TB (8'000'000'000'000 bits), and you would print the number that fits on it on paper as hex digits (nobody would do that, but let's assume), that's 2,000,000,000,000 hex digits.
Each page would contain 4000 hex digits (40 x 100 digits). That's 500,000,000 pages.
Now stack the pages on top of each other (let's say each page is 0.004 inches / 0.1 mm thick); the stack would then be about 50 km (roughly 30 miles) tall.
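A quick back-of-the-envelope check of that arithmetic in C (using the same assumed page size and thickness):
#include <stdio.h>

int main(void) {
    double bits = 8e12;                  /* 1 TB = 8,000,000,000,000 bits */
    double hex_digits = bits / 4;        /* 4 bits per hex digit          */
    double pages = hex_digits / 4000;    /* 40 x 100 digits per page      */
    double height_m = pages * 0.0001;    /* 0.1 mm = 0.0001 m per page    */

    printf("hex digits: %.0f\n", hex_digits);            /* 2e12   */
    printf("pages:      %.0f\n", pages);                 /* 5e8    */
    printf("stack:      %.0f km\n", height_m / 1000.0);  /* ~50 km */
    return 0;
}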
I'll try to give a practical answer.
Common Lisp number crunching is particularly powerful. It has something called "bignums", which are integers that can be arbitrarily large, limited only by the amount of available memory.
See: http://en.wikibooks.org/wiki/Common_Lisp/Advanced_topics/Numbers#Fixnums_and_Bignums
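Other languages get the same effect through libraries; for example, a minimal C sketch using GNU GMP (assuming the library is installed; compile with -lgmp) to hold a number far beyond any fixed-width type:
#include <stdio.h>
#include <gmp.h>

int main(void) {
    mpz_t big;
    mpz_init(big);
    mpz_ui_pow_ui(big, 2, 1000);   /* big = 2^1000, far beyond any fixed-width integer */
    gmp_printf("%Zd\n", big);      /* prints all ~302 decimal digits */
    mpz_clear(big);
    return 0;
}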
I don't know much about theory, but as far as I understood from your question, it is: what is the largest number that the computer can represent (and I add: in a reasonable time, and not by printing "9" until the Earth is swallowed by the Sun). So I put my PC to work on one simple calculation (in PHP or whatever language): echo pow(2,1023) -- resulting in 8.9884656743116E+307. So I guess this is the largest number my PC can calculate. On the other side, I think the representation of the largest negative number can be: -0.(0)1
LE: That computed value was obtained through PHP, but I tried to figure out the largest number my Windows calculator can compute, and it is pow(2, 33219) = 8.2304951207588748764521361245002E+9999. Now I guess this is the largest number my PC can handle.
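For comparison, pow(2,1023) sits right at the top of the range of IEEE-754 doubles, which is what PHP floats use; a small C sketch prints the same ceiling (compile with -lm if needed):
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    /* pow(2, 1023) is near the top of the IEEE-754 double range;
       DBL_MAX, the largest finite double, sits just below 2^1024. */
    printf("2^1023  = %g\n", pow(2.0, 1023.0));
    printf("DBL_MAX = %g\n", DBL_MAX);
    return 0;
}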
I think you should be very proud that your 5 year old is already asking questions like this. And you should continue to encourage that! This is truly amazing! With that said, I would say that ruling out Infinity is thinking incorrectly about what numbers mean in computer memory. I feel like this way of thinking is a handicap.
Mathematicians will never be able to write out ALL the digits of pi or Euler's number, BUT we FULLY understand them. Pi, as an example, is perfectly represented by this infinite series: (Pi / 4) = 1 - 1/3 + 1/5 - 1/7 + 1/9 - …
Just because you literally can't go to infinity or print every single digit in a console means nothing. You could have printed the symbol representing pi and thereby captured the infinite series.
Computer Algebra Systems (CAS) represent numbers symbolically all the time. Pi, for instance, may be a symbolic object in memory (the binary in memory does not DIRECTLY represent the number; it represents a "mathematical algorithm" for producing the answer to arbitrary precision). Then you do some math with it, transforming from one expression to the next. At no point in time did we fail to represent the number COMPLETELY.
At the end, you can do 2 things with this:
A) Evaluate the expression, turning it into a number of some kind (or a Matrix or whatever). BUT this number could very well be an approximation (say, 20 digits of pi).
B) Keep it in its symbolic form for reference. Obviously we don't like staring at symbols, because eventually we need to turn the knobs on the apparatus.
NOTE: sometimes you can get a finite (non-irrational) number perfectly represented in memory (like the number 1) by taking limits or going to infinity. Not by literally having an infinite number in memory, but by symbolically representing it. Just throw this into Wolfram Alpha: Lim[Exp[-x], x --> Inf]; it gives you the number 0, which is EXACT.
In short: it was the HUMAN need to have some binary in memory that DIRECTLY represents the number that caused the number to degrade. Symbolically the number was perfectly represented. You could design an algorithm that just keeps calculating the next digits of pi or Euler's number, giving you an arbitrary amount of precision (though this is obviously not practical).
I hope this was at least somewhat useful or interesting to you, even if you disagree =)
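As a small illustration of the series mentioned above, here is a C sketch that evaluates a finite partial sum (an approximation by construction; only the symbolic series is exact):
#include <stdio.h>

int main(void) {
    /* partial sums of the series  pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
       the symbolic series is exact; any finite partial sum only
       approaches pi, and this series does so slowly               */
    double sum = 0.0, sign = 1.0;
    for (long k = 0; k < 1000000; k++) {
        sum += sign / (2.0 * k + 1.0);
        sign = -sign;
    }
    printf("4 * partial sum = %.10f\n", 4.0 * sum);  /* close to 3.14159... */
    return 0;
}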
Depends on how much the computer can handle, although there are times when the computer can handle numbers greater than 2^(bits-1) - 1... For example:
My computer is 64-bit (so plain integers top out at 9223372036854775807), yet the calculator that comes with the computer itself can handle numbers of up to 10^9999.
Many other supercomputers can exceed these limits, and the one with the most memory (bits) might as well be the one with the record (the current largest number that can be held by a computer).
Or, if it comes to visually seeing it on a computer, you can just make a program that keeps writing 9 to the monitor without ever starting a new line, forming an ever-growing string of 9s. :P
Go to Chrome, click the three dots at the top, go to tools and then developer tools, click on the console, and type Number.MAX_VALUE