Representation of a Kilo/Mega/Tera Byte


I was getting a little confused with the representation of different units of bytes.
It is accepted throughout that 1 byte = 8 bits.
However, in a lot of sources I have seen that
1 kiloByte = 2^10 bytes = 1024 bytes
AND
1 kiloByte = 1000 bytes
Doesn't this contradict itself, since in both cases it is stated that 1 byte is 8 bits...?
Different sources give different reasons for these two representations, so I am not sure what the most important/real reason is for this rather confusing difference.
Can someone please explain and clarify?

It is accepted throughout that 1 byte = 8 bits
However, in a lot of sources I have seen that
1 kiloByte = 2^10 bytes = 1024 bytes
AND
1 kiloByte = 1000 bytes
To make sure we're all clear, your question is: "Is a kilobyte equal to 1024 bytes or 1000 bytes?"
Doesn't this contradict itself, since in both cases it is stated that 1 byte is 8 bits...?
This is irrelevant to the question.
So, let's begin. In SI (metric), the multiplier of 1000 is called kilo, abbreviated k. k always means 1000, never anything else.
When binary computers entered the world, we noticed that 2 to the power of 10 is 1024, which is conveniently close to 1000. Computer engineers decided to abuse this coincidence and say that kilo means 1024. By extension, they say that mega means 1024^2 (instead of the proper definition of 1000^2), and so on with giga, tera, etc.
While the difference between 1000 and 1024 is small for many purposes, there are times when exact answers are required, and this is where the abusive terminology hurts everyone. Only decades after kilo=1024 got established did anyone really try to fix the problem. The IEC proposed new prefixes for the binary multipliers: 1024 = kibi, 1024^2 = mebi, 1024^3 = gibi, etc.
In summary, the notion that kilo=1024 is an abusive deviation from the consistent SI definition of kilo=1000. While kilo=1024 is popular in the computer industry, it is nevertheless wrong and should be replaced by kibi=1024. Or numbers need to be recomputed to reflect the true definition of kilo/mega/etc. (For example, "512 MB" of RAM is actually about 536.9 MB.)
Btw, don't use random capitalization; it's spelled kilobyte, not kiloByte.
References and links:
http://physics.nist.gov/cuu/Units/binary.html
http://en.wikipedia.org/wiki/Kilo-
http://en.wikipedia.org/wiki/Kilobyte
http://en.wikipedia.org/wiki/Kibibyte
http://xkcd.com/394/

When you talk about amounts of data in computer science, you always have to calculate with powers of two. See what Wikipedia says:
"In computing, a binary prefix is a specifier or mnemonic that is prepended to the units of digital information, the bit and the byte, to indicate multiplication by a power of 2. In practice the powers used are multiples of 10, so the prefixes denote powers of 1024 = 2^10."
Sometimes people round it as you have mentioned, but that is a misuse.

I don't see what bytes versus bits has to do with anything if you are asking whether 1 kilobyte is equal to 1024 or 1000 bytes. These measurements are not set in stone and are not really controlled at all. Computer makers can (and do) use the 1000 conversion to make it look like they have more memory.
The problem comes up when thinking in binary (base 2) versus base 10. In base 10 you would use 1000; in base 2, 1024.

Related

Why is this question worded like this regarding main memory?

I have this question:
1. How many bits are required to address a 4M × 16 main memory if main memory is word-addressable?
And before you say it, yes, I have looked this question up, and there are posts on Stack Overflow asking how to answer it, but my question is different.
This may sound like a silly question, but I don't understand what it means when it says "How many bits are required to address...".
From my understanding and what I have been taught, if the memory is word-addressable, each cell in the RAM chip contains 16 bits and the addresses run from 0 to 4M-1, giving 2^22 words. But I don't understand what it is asking when it says 'How many bits are required...'.
The answer says 22 bits would be required, but I just don't understand. 22 bits for what? All I know is that each word is 16 bits and each cell is numbered from 0 to 4M-1. Can someone clear this up for me, please?

Since you have about 4 million cells, you need a number that can identify each cell. 22 bits is the address size that can represent 2^22 cells (4,194,304 cells).
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor.
(https://en.m.wikipedia.org/wiki/Word)
Using this principle, imagine a memory whose words are only 2 bits wide and which can store 4 words:
XX|YY|WW|ZZ
Each word in this memory is identified by a number that tells the computer its position.
XX is 0
YY is 1
WW is 2
ZZ is 3
The shortest binary number that can represent 3 is 2 bits long, right? Now apply the same reasoning to a larger memory. It doesn't matter whether the word size is 16 bits or 2 bits; only the number of words matters.

Word Addressable Memory

For a 32 bit word addressable memory, the word has size of 4 bytes.
If I try to store a data structure that uses less than 4 bytes of memory, say 2 bytes, are the remaining 2 bytes wasted?
Should we consider the word size when we decide what data structure to use?
I found a similar question here, but it's not exactly what I am asking.
Please help.

On a modern CPU, memory is usually fetched in chunks called cache lines (64 bytes on x86), but the CPU instruction set can address individual bytes.
If you had some esoteric machine with an instruction set that couldn't address individual bytes, then your compiler would hide that from you.
Whether or not memory is wasted in data structures smaller than a word would depend on the language you use and its implementation, but generally, records are aligned according to the field with the coarsest requirement. If you have an array of 16 bit integers, they will pack together tightly.

If you have 3 or 4 integers, it scarcely matters whether you store them in 2, 4, or 8 bytes.
If you have 3 or 4 billion integers, then it's probably worth considering a more space-efficient structure.
Generally speaking, the natural integer size for a given language implementation is supposed to be optimal in some way, so my advice is in general 'use int unless you know it's not appropriate' and let the compiler worry about it - until you have performance data to show otherwise.

What's the biggest number in a computer?

Just asked by my 5 year old kid: what is the biggest number in the computer?
We are not talking about max number for a specific data types, but the biggest number that a computer can represent.
Infinity is not allowed.
UPDATE: my kid always wants to print it as well, so let's say the computer needs to print this number and the kid needs to know that it's a big number. Of course, in practice we won't print it, because there aren't enough trees.

This question is actually a very interesting one which mathematicians have devoted a fair bit of thought to. You can read about it in this article, which is a fascinating and accessible read.
Briefly, a guy named Tibor Rado set out to find some really big, but still well-defined, numbers by defining a sequence called the Busy Beaver numbers. He defined BB(n) to be the largest number of steps that any halting Turing machine with n states (and two symbols, started on a blank tape) can take before it halts. Note that this sequence is by its very nature not computable, so the numbers themselves, while well-defined, are very difficult to pin down. Here are the first few:
BB(1) = 1
BB(2) = 6
BB(3) = 21
BB(4) = 107
... wait for it ...
BB(5) = 47,176,870
Pinning down BB(5) took decades: for a long time all anyone could say was that it is finite, and its exact value was only proven in 2024. No one knows how big BB(6) and above are, beyond the fact that they are astronomically large. But at least these numbers are completely well-defined mathematically, unlike "the largest number any human has ever thought of, plus one." ;)
So how about this:
The biggest number a computer can represent is the most instructions a program small enough to fit in its available memory can perform before halting.
Squared.
No, wait, cubed. No, raised to the power of itself!
Dammit!

Bits are not numbers. You, as a programmer, give them the meaning you want, possibly numbers.
Now, I decide that 1 represents "the biggest number ever thought by a human plus one".

Errr this is a five year old?
How about something along the lines of: "I'd love to tell you but the number is so big and would take so long to say, I'd die before I finished telling you".

#include <stdio.h>

int main(void) {
    // wait to see: prints 9s forever, so the "number" never finishes
    for (;;) {
        putchar('9');
    }
}

roughly 2^AVAILABLE_MEMORY_IN_BITS
EDIT: The above is for actually storing a number and treats all media (RAM, HD, cloud etc.) as memory. Subtracting the OS footprint (measured in KB) doesn't make "roughly" less accurate...
If you want to "represent" a number in a meaningful way, then you probably want to go with what the CPU provides: unsigned 32-bit integers (up to roughly 4 billion) or unsigned 64-bit integers for most computers your kid will come into contact with.
NOTE for talking to 5-year-olds: Often, they just want a factoid. Give him a really big and very accurate number (lots of digits), like 4'294'967'295. Then, once his eyes stop glazing over, see how far you can get with explaining how computers represent numbers.
EDIT #2: I once read an article, Who Can Name the Bigger Number?, that should provide a whole lot of interesting information for your kid. Obviously he's not your normal five-year-old, so this might get you started in a cool direction about numbers and computation.

The answer to life (and this kids question): 42

That depends on the datatype you use to represent it. The computer only stores bits (0/1). We, as developers, give the bits meaning. (65 can be a number or the letter A).
For example, I can define my datatype as 2^N, where N is unsigned and stored in an array of bits of arbitrary size. The next person can come up with 10^N, which for the same N is vastly larger than my biggest number.
Sure, there would be gaps (most of the numbers in between can't be represented), but if you don't need them, that doesn't matter.
Therefore, the question is meaningless since it doesn't have context.

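Well, I had the same question earlier today, so I thought, why not write a little C code to see where the computer is going to stop...
But my laptop wasn't with me in class, so I used another one. The number got too big but the program never ended; I'll run it again overnight and then share the number.
You can try it; the code is really simple:
#include <limits.h>
#include <stdio.h>

int main(void) {
    int i = 0;
    /* Count up one at a time and stop at the largest int. A condition
       like i <= i is always true, and letting a signed int overflow is
       undefined behavior in C, so test for INT_MAX explicitly. */
    for (;;) {
        printf("%d\n", i);
        if (i == INT_MAX)
            break;
        i++;
    }
    return 0;
}
And let it run till it stops ^^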

The size will obviously be limited by the total size of the hard drives you manage to put into your PC. After all, you can store a number in a text file occupying all the disk space.
You can have 4 x 2 TB drives even in a simple box, so around 8 TB is available. If you store the number in binary instead, the biggest number is 2^64,000,000,000,000 - 1.

If your hard drive is 1 TB (8'000'000'000'000 bits), and you printed the number that fits on it on paper as hex digits (nobody would do that, but let's assume), that's 2,000,000,000,000 hex digits.
Each page would contain 4000 hex digits (40 lines x 100 digits). That's 500,000,000 pages.
Now stack the pages on top of each other (let's say each page is 0.004 inches / 0.1 mm thick): the stack would be about 50 km (about 31 miles) tall.

I'll try to give a practical answer.
Common Lisp number crunching is particularly powerful. It has something called "bignums", which are integers that can be arbitrarily large, limited only by the amount of available memory.
See: http://en.wikibooks.org/wiki/Common_Lisp/Advanced_topics/Numbers#Fixnums_and_Bignums

I don't know much about the theory, but as far as I understood your question, it is: what is the largest number that the computer can represent (and I'd add: in a reasonable time, not printing "9" until the Earth is "eaten by the Sun")? So I had my PC do one simple calculation (in PHP or whatever language): echo pow(2,1023), resulting in 8.9884656743116E+307. So I guess this is the largest number that my PC can calculate. On the other side, I think the representation of the largest negative number can be: -0,(0)1
Later edit: That value was obtained through PHP, but I also tried to figure out the largest number my Windows calculator can compute, and it is pow(2, 33219) = 8.2304951207588748764521361245002E+9999. Now I guess this is the largest number my PC can handle.

I think you should be very proud that your 5 year old is already asking questions like this.
And you should continue to promote that! This is truly amazing! With that said, I would say that saying Infinity does not
count is thinking incorrectly about what numbers mean in computer memory.
I feel like this way of thinking is a handicap.
Mathematicians will never be able to write out ALL the digits of pi or Euler's number, BUT we FULLY understand them.
Pi, as an example, is perfectly represented by this infinite series: (Pi / 4) = 1 - 1/3 + 1/5 - 1/7 + 1/9 - …
The fact that you literally can't go to infinity or print every single digit in a console means nothing.
You could have printed the symbol representing pi, thereby capturing the infinite series.
Computer Algebra Systems (CAS) represent numbers symbolically all the time. Pi, for instance, may be a Symbolic object in memory (the binary in memory does not DIRECTLY represent the number; it represents a "mathematical algorithm" for producing the answer to arbitrary precision).
Then you do some math with it, transforming from one expression to the next.
At no point in time did we not represent the number COMPLETELY.
At the end, you can do 2 things with this:
A) Evaluate the expression, turning it into a number of some kind (or Matrix or whatever).
BUT this number could very well be an approximation (say like 20 digits of pi).
B) Keep it in its symbolic form for reference. Obviously we don't like just staring at symbols, because we eventually need to turn the knobs on the apparatus.
NOTE: sometimes you can get a finite (rational) number perfectly represented in memory (like the number 1) by taking limits or going to infinity. Not by literally having an infinite number in memory, but by representing it symbolically.
Just throw this into Wolfram Alpha: Limit[Exp[-x], x -> Infinity]. It gives you the number 0, which is EXACT.
In short:
It was the HUMAN need to have some binary in memory that DIRECTLY represented the number that caused the number to degrade. Symbolically it was perfectly represented. You could design an algorithm that just keeps calculating the next digits of pi or Euler's number, giving you an arbitrary amount of precision (though this is obviously not practical).
I hope this was at least somewhat useful or interesting to you, even if you disagree =)

It depends on how much the computer can handle. However, sometimes the computer can handle numbers greater than 2^(bits-1) - 1. For example:
My computer is 64-bit (so its largest signed integer is 9223372036854775807), yet the calculator that comes with it can handle numbers up to 10^9999.
Supercomputers can go far beyond these limits, and the one with the most memory (bits) might well hold the record for the largest number a computer can store.
Or, if it comes to seeing it on the screen, you can just write a program that keeps printing 9 without ever starting a new line, forming an ever-growing string of 9s. :P

Open Chrome, click the three-dot menu, go to More tools > Developer tools, open the Console tab, and type Number.MAX_VALUE

Variable-byte encoding clarification

I am very new to the world of byte encoding so please excuse me (and by all means, correct me) if I am using/expressing simple concepts in the wrong way.
I am trying to understand variable-byte encoding. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Variable-width_encoding) as well as a book chapter from an Information Retrieval textbook. I think I understand how to encode a decimal integer. For example, if I wanted to provide variable-byte encoding for the integer 60, I would have the following result:
1 0 1 1 1 1 0 0
(please let me know if the above is incorrect). If I understand the scheme, then I'm not completely sure how the information is compressed. Is it because usually we would use 32 bits to represent an integer, so that representing 60 would result in 1 1 1 1 0 0 preceded by 26 zeros, thus wasting that space as opposed to representing it with just 8 bits instead?
Thank you in advance for the clarifications.

The way you do it is by reserving one of the bits to mean "I'm not done with the value." Usually, that's the most significant bit.
When you read a byte, you process its lower 7 bits. If the most significant bit is 1, you know there's at least one more byte to read, so you shift the value accumulated so far left by 7 bits, add the next byte's lower 7 bits, and repeat.
The MIDI format uses that exact encoding to represent lengths of MIDI events, in the following manner:
ExpectedValue = 0
byte=ReadFromFile
ExpectedValue = ExpectedValue + (byte AND 0x7f)
if byte > 127 then
ExpectedValue = ExpectedValue SHL 7
Goto 2
Done
For example, the value 0x80 would be represented using the bytes 0x81 0x00. You can try running the algorithm on those two bytes, and you see you'll get the right value.
UTF-8 works similarly, but it uses a slightly more complex scheme in which the first byte tells you how many bytes to expect. This allows for some error detection, since you can easily tell whether the bytes you receive match the length claimed. Wikipedia describes its structure quite well.

You hit the nail on the head.
There are many encoding schemes, such as gamma and delta, which are special cases of Elias coding. These are bit-level codes, as opposed to the byte-level code you used, and are useful when you have a strong skew towards small numbers (which can often be achieved by encoding deltas instead of absolute values).
Bit-level encoding schemes are much more difficult to implement than byte-level schemes and the additional CPU burden may outweigh the time saved by having less data to read, though most modern CPUs have "highest-bit" and "lowest-bit" instructions that dramatically improve the performance of bit-level codecs. As CPU speeds continue to outpace RAM speeds, bit-level schemes will become more attractive, though the simplicity of byte-level codecs is a big factor too.

Yes, you are right, you save space by encoding using one byte instead of 4.
Generally, you will save memory if the values you are encoding are much smaller than the maximum value that would have fit in your original fixed-width encoding.

Converting from bandwidth to traffic gives different results depending on operators position?

This must be a stupid question, but nevertheless I find it curious:
Say I have a steady download of 128Kbps.
How much disk space is going to be consumed after an hour, in megabytes?
128 x 60 x 60 / 8 / 1024 = 56.25 MB
But
128 x 60 x 60 / 1000 /8 = 57.6 MB
So what is the correct way to calculate this?
Thanks!

In one calculation you're dividing by 1000, but in another you're dividing by 1024. There shouldn't be any surprise you get different numbers.
Officially, the International Electrotechnical Commission standards body has tried to push "kibibyte" as an alternative to "kilobyte" when you're talking about the 1024-based version. But if you use it, people will laugh at you.

Please remember that there is overhead in any transmission. There can be "dropped" packets, etc. Also, there is generally some upstream traffic as your PC acknowledges receipt of packets. Finally, since packets can be received out of order, the packets themselves contain "extra" data to allow the receiver to reconstruct the data in the proper order.

Ok, I found out an official explanation from Symantec on the matter:
http://seer.entsupport.symantec.com/docs/274171.htm
It seems the idea is to convert from bits to bytes as early as possible in the calculation, and then the usual division by 1024 comes into play.
I just hope it's a standard procedure, and not one imposed by Symantec :).
