Say I had a 6-digit hexadecimal number, signed, in two's complement. What would be its range?
-(16 ^ 5) < x < (16 ^ 5)
Correct?
This sounds like homework. If so, please tag your question as such.
One way to think about this:
How many bytes are represented with 6 hex digits?
How many bits are represented with those bytes?
How many bits do you lose due to the sign?
Given your total number of bits, what is the smallest value you can represent?
Given your total number of bits, what is the largest value you can represent?
Think hard about the answer to the last question.
For example, the smallest signed 32-bit int is -2147483648. The largest signed 32-bit int is 2147483647.
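For reference, the same arithmetic for the 6-hex-digit case can be sketched in a few lines of Scala (nothing beyond the standard library is assumed):

// 6 hex digits -> 24 bits; one of those bits is effectively the sign.
val bits = 6 * 4                     // 24
val min  = -(1L << (bits - 1))       // -2^23     = -8388608
val max  = (1L << (bits - 1)) - 1    //  2^23 - 1 =  8388607
println(f"min = $min (0x${min & 0xFFFFFF}%06X), max = $max (0x$max%06X)")
// min = -8388608 (0x800000), max = 8388607 (0x7FFFFF)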
The max values of int, float and long in Scala are:
Int.MaxValue = 2147483647
Float.MaxValue = 3.4028235E38
Long.MaxValue = 9223372036854775807L
From the authors of the Scala compiler, in the PNW Scala 2013 keynote, slide 16 ("What's Int.MaxValue between friends?"):
val x1: Float = Long.MaxValue
val x2: Float = Long.MaxValue - Int.MaxValue
println (x1 == x2)
// NO WONDER NOTHING WORKS
Why does this expression return true?
A Float is a 4-byte floating point value, while a Long is an 8-byte integer and an Int is a 4-byte integer. However, the way numbers are stored in a 4-byte floating point value gives it only about 7 decimal digits of precision (a 24-bit significand). Consequently, a Float does not have the capacity to store even the 4 most significant bytes (around 9-10 digits) of a Long, regardless of the value of the 4 least significant bytes (another 9-10 digits).
Consequently, the Float representation of the two expressions is the same, because the bits that differ are below the resolution of a Float. Hence the two values compare equal.
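One way to check this concretely (not from the original slides, just a quick REPL-style experiment) is to compare the gap between adjacent Floats at that magnitude against the amount being subtracted:

val x1: Float = Long.MaxValue
val x2: Float = Long.MaxValue - Int.MaxValue
println(math.ulp(x1))   // ~1.09951163E12: the spacing between adjacent Floats here
println(Int.MaxValue)   // 2147483647: about 500 times smaller than that spacing
println(x1 == x2)       // true: the subtraction is below the Float's resolution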
Echoing Mike Allen's answer, but hoping to provide some additional context (would've left this as a comment rather than a separate answer, but SO's reputation feature wouldn't let me).
Integers have a range of values of either 0 to 2^n - 1 (for an unsigned integer) or -2^(n-1) to 2^(n-1) - 1 (for a signed integer), where n is the number of bits in the underlying implementation (n=32 in this case). If you wish to represent a number larger than 2^31 - 1 with a signed value, you can't use an int. A signed long will work up to 2^63 - 1. For anything larger than that, a Float can go up to just under 2^128 (about 3.4 * 10^38), at the cost of precision.
One other thing to note is that these resolution issues are only in force when the value stored in the floating point number approaches the max. In this case, the subtraction operation causes a change in true value that is many orders of magnitude smaller than the first value. A float would not round off the difference between 100 and 101, but it might round off the difference between 10000000000000000000000000000 and 10000000000000000000000000001.
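A quick illustration of that magnitude dependence, using Double (the same effect as with Float, just kicking in at larger values):

println(100.0 + 1 == 100.0)     // false: near 100, a difference of 1 is easily representable
println(1.0e28 + 1 == 1.0e28)   // true: the +1 is lost to rounding
println(math.ulp(1.0e28))       // ~2.2E12: the gap between adjacent Doubles near 1e28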
Same goes for small values. If you cast 0.1 to an integer, you get exactly 0. This is not generally considered a failing of the integer data type.
If you are operating on numbers that are many orders of magnitude different in size, and also not able to tolerate rounding errors, you will need data structures and algorithms that account for the inherent limitations of binary data representation. One possible solution would be to use a floating point encoding with fewer exponent bits, limiting the maximum value but providing greater resolution in the less significant bits. For greater detail, check out:
the IEEE 754 standard, which defines the floating point encodings
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Is it possible to somehow calculate the minimum number of bits needed to represent integers in arithmetic coding? Let's say I have a 100-character string. How many bits do I need to encode and decode that sequence?
To begin with, allow me to confess that I'm an experienced programmer with over 10 years of programming experience. However, the question I'm asking here is one that has bugged me ever since I first picked up a book on C about a decade back.
Below is an excerpt from a book on Python, explaining the Python floating point type.
Floating-point numbers are represented using the native double-precision (64-bit) representation of floating-point numbers on the machine. Normally this is IEEE 754, which provides approximately 17 digits of precision and an exponent in the range of –308 to 308. This is the same as the double type in C.
What I never understood is the meaning of the phrase
" ... which provides approximately 17 digits of precision and an
exponent in the range of –308 to 308 ... "
My intuition goes astray here: I can understand the meaning of precision, but how can the range be different from that? I mean, if a floating point number can represent a value of up to 17 digits (i.e. a maximum of 10^17 - 1), then how can the exponent be +308? Wouldn't that make a 308-digit number if the exponent base is 10, or roughly a 100-digit number if the base is 2?
I hope I'm able to express my confusion.
Suppose that we write 1500 with two digits of precision. That means we are being precise enough to distinguish 1500 from 1600 and 1400, but not precise enough to distinguish 1500 from 1510 or 1490. Telling those numbers apart would require three digits of precision.
Even though I wrote four digits, floating-point representation doesn't necessarily contain all these digits. 1500 is 1.5 * 10^3. In decimal floating-point representation, with two digits of precision, only the first two digits of the number and the exponent would be stored, which I will write (1.5, 3).
Why is there a distinction between the "real" digits and the placeholder zeros? Because it tells us how precisely we can represent numbers, that is, what fraction of their value is lost due to approximation. We can distinguish 1500 = (1.5, 3) from 1500+100 = (1.6, 3). But if we increase the exponent, we can't distinguish 15000 = (1.5, 4) from 15000+100 = (1.51, 4). At best, we can approximate numbers within +/- 10% with two decimal digits of precision. This is true no matter how small or large the exponent is allowed to be.
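That behaviour can be reproduced directly with a two-significant-digit decimal context; here BigDecimal is used purely as a stand-in for a decimal floating point type with two digits of precision:

import java.math.MathContext
val twoDigits = new MathContext(2)            // two significant decimal digits
println(BigDecimal(1500).round(twoDigits))    // 1.5E+3
println(BigDecimal(1600).round(twoDigits))    // 1.6E+3 -- still distinguishable from 1500
println(BigDecimal(15000).round(twoDigits))   // 1.5E+4
println(BigDecimal(15100).round(twoDigits))   // 1.5E+4 -- the +100 has been lost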
The regular decimal representation of numbers obscures the issue. If instead one considers them in normalised scientific notation, separating the mantissa and exponent, then the distinction is immediate. Normalisation is achieved by scaling the mantissa into a fixed interval (for example, at least 0.1 and less than 1.0) and adjusting the exponent so that the value is unchanged.
The precision of the number is the number of digits in the mantissa. A floating point number has a limited number of bits to represent this portion of the value. It determines how accurately numbers that are similar in size can be distinguished.
The range determines the allowable values of the exponent. In your example, the range of -308 through 308 is independent of the mantissa value and is limited by the number of bits in the floating point number allocated to storing the exponent.
These two values can be varied independently to suit. For example, many graphics pipelines use truncated floating point formats, with smaller range and precision, that fit into as little as 16 bits.
Numerical methods libraries expend large amounts of effort in ensuring that these limits are not exceeded to maintain the correctness of calculations. Casual use will not usually encounter these limits.
The choices in IEEE 754 are accepted to be a reasonable trade off between precision and range. The 32-bit single has similar, but different limits. The question What range of numbers can be represented in a 16-, 32- and 64-bit IEEE-754 systems? provides a longer summary and further links for more study.
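To make the precision/range distinction concrete for the 64-bit double the book describes, a quick check in Scala:

// Range: governed by the exponent bits.
println(Double.MaxValue)          // 1.7976931348623157E308 -- the "308" in the book
println(Double.MinPositiveValue)  // 4.9E-324 (smallest subnormal)
// Precision: governed by the significand bits, independently of the range.
println(math.ulp(1.0))            // 2.220446049250313E-16 -- roughly 16-17 decimal digits
println(1.0 + 1e-16 == 1.0)       // true: a change below the resolution at this magnitude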
From Wikipedia, double-precision floating point numbers provide 15-17 significant decimal digits of precision, and the binary exponent can be in the range -1022 to 1023. The -308 to 308 in the book is that same range expressed in decimal terms: the largest double is about 1.8 * 10^308, so the decimal exponent runs from roughly -308 to 308.
I am learning High Level Assembly Language at the moment, and was going over the concept of signed and unsigned integers. It seems simple enough, however getting to sign extension has confused me.
Take the byte 10011010 which I would take to be 154 in decimal. Indeed, using a binary calculator with anything more than word selected shows this as 154 in decimal.
However, if I select the unit to be a byte and type in 10011010, it is suddenly treated as -102 in decimal. Whenever I then widen it, starting from a byte, it is sign extended and always stays -102 in decimal.
If I use anything higher than a byte then it remains 154 in decimal.
Could somebody please explain this seeming disparity?
When you select the unit as a byte, the MSB of 10011010 is treated as the sign bit, which makes the interpretation of this pattern as a one-byte signed integer -102 (two's complement).
For integers larger than 8 bits, say 16 bits, the number will be 0000000010011010, which does not have a 1 in the MSB and is therefore treated as a positive integer, 154 in decimal. When you convert the 8-bit byte to a wider type, sign extension preserves the negative interpretation in the larger storage too.
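The same behaviour is easy to reproduce in a high-level language; a small Scala sketch (the principle is identical in assembly):

val b: Byte = 0x9A.toByte   // bit pattern 1001 1010
println(b)                  // -102: as a signed byte, the MSB acts as the sign bit
println(b.toShort)          // -102: widening sign-extends, preserving the signed value
println(b & 0xFF)           // 154: masking zero-extends instead, giving the unsigned view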
I just can't understand fixed point and floating point numbers due to hard to read definitions about them all over Google. But none that I have read provide a simple enough explanation of what they really are. Can I get a plain definition with example?
A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.
A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.
For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.
Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc.
Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.
From my understanding, fixed-point arithmetic is done using integers, where the fractional part is stored in a fixed number of bits, or the number is multiplied by a power of ten matching how many digits of decimal precision are needed.
For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. When performing math on this number, we use this rule set: adding 5620 (that is, 56.20) to this number yields 6854 in storage, which represents 68.54.
If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.
12.34 (pseudocode):
v1 = 1234 / 100 // get the whole number
v2 = 1234 % 100 // get the decimal number (100ths of a whole).
print v1 + "." + v2 // "12.34"
Floating point numbers are a completely different story in programming. The current standard for single-precision floating point numbers uses 23 bits for the significand, 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
The term ‘fixed point’ refers to the manner in which numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point.
With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number.
For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.
There are already good explanations here of what a fixed-point number is and how it differs from floating point, but very little mention of what I consider the defining feature. The key difference is that floating-point numbers have a constant relative (percentage) error caused by rounding or truncating, while fixed-point numbers have a constant absolute error.
With 64-bit floats, you can be sure that the answer to x + y will never be off by more than one bit (one unit in the last place), but how big is a bit? Well, it depends on x and y: if the exponent is equal to 10, then rounding off the last bit represents an error of 2^10 = 1024, but if the exponent is 0, then rounding off a bit is an error of 2^0 = 1.
With fixed point numbers, a bit always represents the same amount. For example, if we have 32 bits before the decimal point and 32 after, that means truncation errors will always change the answer by 2^-32 at most. This is great if you're working with numbers that are all about equal to 1, which gain a lot of precision, but bad if you're working with numbers that have different units--who cares if you calculate a distance of a googol meters, then end up with an error of 2^-32 meters?
In general, floating-point lets you represent much larger numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed points get better accuracy if you know how big of a number you'll have to represent ahead of time, so that you can put the decimal exactly where you want it for maximum accuracy. But if you don't know what units you're working with, floats are a better choice, because they represent a wide range with an accuracy that's good enough.
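A small demonstration of the constant-relative versus constant-absolute distinction, using Double for the floating point side and a hypothetical 32.32 fixed point format (32 integer bits, 32 fractional bits) for the other:

// Floating point: the absolute worth of one bit grows with the magnitude...
println(math.ulp(1.0))     // 2.220446049250313E-16
println(math.ulp(1.0e10))  // 1.9073486328125E-6
// ...so the relative error stays roughly constant (about 2e-16 in both cases).

// 32.32 fixed point: one bit is always worth exactly 2^-32, however large the value.
println(math.pow(2, -32))  // 2.3283064365386963E-10: the absolute resolution everywhere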
It's worth noting that fixed-point numbers don't only have some fixed number of digits after the point; they are represented mathematically as integers scaled by negative powers of the base. This was very good for mechanical calculators: e.g., the price of something is USD 23.37 (Q = 2 digits after the point), and the machine simply knows where the point is supposed to be!
Take the number 123.456789
As an integer, this number would be 123
As a fixed point (2), this number would be 123.46 (assuming you rounded it up)
As a floating point, this number would be 123.456789
Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.
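The same three views of 123.456789 expressed in Scala, with BigDecimal standing in for a two-decimal fixed point type:

import scala.math.BigDecimal.RoundingMode
val x = 123.456789
println(x.toInt)                                          // 123 (integer)
println(BigDecimal(x).setScale(2, RoundingMode.HALF_UP))  // 123.46 (fixed point, 2 digits)
println(x)                                                // 123.456789 (floating point)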