The maximum values of Int, Float and Long in Scala are:
Int.MaxValue = 2147483647
Float.MaxValue = 3.4028235E38
Long.MaxValue = 9223372036854775807L
From the authors of the Scala compiler (keynote, PNW Scala 2013, slide 16, "What's Int.MaxValue between friends?"):
val x1: Float = Long.MaxValue
val x2: Float = Long.MaxValue - Int.MaxValue
println (x1 == x2)
// NO WONDER NOTHING WORKS
Why does this expression return true?
A Float is a 4-byte floating-point value, whereas a Long is an 8-byte integer and an Int is a 4-byte integer. However, the way numbers are stored in a 4-byte floating-point value gives it only about 7 decimal digits of precision (a 24-bit significand). As a result, a Float cannot even hold the 4 most significant bytes of a Long (around 9-10 digits) exactly, regardless of the value of the 4 least significant bytes (another 9-10 digits).
Consequently, the Float representation of the two expressions is the same, because the bits that differ are below the resolution of a Float. Hence the two values compare equal.
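A quick way to see this from a Scala REPL (the values in the comments are what IEEE 754 single precision predicts, so treat them as approximate):
val x1: Float = Long.MaxValue                // 9.223372E18
val x2: Float = Long.MaxValue - Int.MaxValue // also 9.223372E18
val gap = java.lang.Math.ulp(x1)             // ~1.1E12 (2^40): the spacing between adjacent Floats near x1
println(x1 == x2)                            // true: the difference, 2147483647, is far smaller than that spacing
println(gap > Int.MaxValue)                  // true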
Echoing Mike Allen's answer, but hoping to provide some additional context (would've left this as a comment rather than a separate answer, but SO's reputation feature wouldn't let me).
Integers have a range of values from 0 to 2^n - 1 (for an unsigned integer) or -2^(n-1) to 2^(n-1) - 1 (for a signed integer), where n is the number of bits in the underlying representation (n = 32 in this case). If you wish to represent a signed number larger than 2^31 - 1, you can't use an Int. A signed Long works up to 2^63 - 1. For anything larger than this, a Float can go up to roughly 2^128 (about 3.4e38), at the cost of precision.
One other thing to note is that these resolution issues only come into play when the difference involved is tiny relative to the magnitude of the values. In this case, the subtraction changes the true value by many orders of magnitude less than the value itself. A float would not round off the difference between 100 and 101, but it would round off the difference between 10000000000000000000000000000 and 10000000000000000000000000001.
Same goes for small values. If you cast 0.1 to an integer, you get exactly 0. This is not generally considered a failing of the integer data type.
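A small Scala illustration of both points (the comments assume IEEE 754 single precision and ordinary integer truncation):
println(100.0f + 1.0f == 100.0f)   // false: a difference of 1 is easily resolved at this magnitude
println(1.0e28f + 1.0f == 1.0e28f) // true: near 1e28 the gap between adjacent Floats is about 1e21
println(0.1.toInt)                 // 0: converting 0.1 to an integer truncates toward zero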
If you are operating on numbers that are many orders of magnitude apart in size, and you also cannot tolerate rounding errors, you will need data structures and algorithms that account for the inherent limitations of binary data representation. One possible solution would be to use a floating-point encoding with fewer exponent bits, limiting the maximum value but providing greater resolution in the less significant bits. For greater detail, check out:
the IEEE 754 standard (which defines the floating-point encodings)
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Related
How do I get the smallest value (stride) for a particular number type in Swift? I mean the shortest non-zero stride.
For example, 1 for Int, 0.00...001 for Double, etc.
This has been partially addressed in separate comments already, but to bring it all together...
For integer types:
The smallest possible increment between distinct values is 1. This is part of the definition of an integer, so it's such a foundational aspect of the type that there isn't (and needn't be) a special API for finding it.
(One can argue that successor constitutes such an API. But one can also argue that using successor where you could just use 1 makes your code much less readable.)
For floating-point types:
There is no one smallest possible increment. Because floating-point types use an exponent-based representation, the spacing between representable floating-point numbers varies with the exponent: values near zero are packed densely, and the gaps grow as the magnitude grows.
This is covered in What Every Computer Scientist Should Know About Floating-Point Arithmetic, an essential read for, well, everyone using floating-point numbers. You can read more about the concept in Wikipedia's pages for Unit in the Last Place and Machine Epsilon. Exploring Binary also has a good entry-level read on floating-point spacing.
Back to Swift — the Float and Double types conform to the FloatingPoint protocol (in Swift 3, currently in beta). This defines the features of IEEE 754 floating-point formats, which include both:
The unit in the last place, or ulp, which tells you the increment between a number and the next greater representable number (modulo some edge cases). This is related to, but not the same as, machine epsilon, since it scales with the value. (ulpOfOne is the same as what other libraries call machine epsilon.)
nextUp and nextDown, which tell you the closest greater or lesser representable numbers.
Here's an example (conveniently showing that for 32-bit Float, the minimum increment gets bigger than one sooner than you might think):
let ichi: Float = 1.0
ichi.ulp // -> 1.192093e-07
ichi.nextUp // -> 1.00000012
let man: Float = 10_000
man.ulp // -> 0.0009765625
man.nextUp // -> 10000.001
let oku: Float = 100_000_000
oku.ulp // -> 8
oku.nextUp // -> 100000008.0
In Swift 2, there's no FloatingPoint protocol, but you can make use of the equivalent POSIX constants/functions imported from C: FLT_EPSILON and DBL_EPSILON are defined as the difference between 1.0 and the next representable value, and the nextafter functions find the increment at any value.
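For comparison, the JVM exposes the corresponding IEEE 754 operations through java.lang.Math, so the Swift values above can be reproduced from Scala:
println(java.lang.Math.ulp(1.0f))                               // 1.1920929E-7, i.e. FLT_EPSILON / ulpOfOne
println(java.lang.Math.nextAfter(1.0f, Float.PositiveInfinity)) // 1.0000001 (the value Swift prints as 1.00000012)
println(java.lang.Math.nextUp(100000000.0f))                    // 1.00000008E8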
You can get these numbers by importing Darwin; the values you're looking for are:
DBL_MIN
FLT_MIN
Int.min
However, these won't be 'nice' numbers like 0.00 ... 01, and Int is signed (Int.min is a large negative value); an unsigned type's minimum, such as UInt8.min, gives you 0.
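For reference, the JVM equivalents of those constants look like this in Scala (note that Float.MinPositiveValue is the smallest subnormal, whereas C's FLT_MIN is the smallest normal value):
println(Float.MinPositiveValue)  // 1.4E-45
println(Double.MinPositiveValue) // 4.9E-324
println(Int.MinValue)            // -2147483648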
To begin with, allow me to confess that I'm an experienced programmer with over 10 years of programming experience. However, the question I'm asking here is one that has bugged me ever since I first picked up a book on C about a decade back.
Below is an excerpt from a book on Python, explaining about Python Floating type.
Floating-point numbers are represented using the native double-precision (64-bit) representation of floating-point numbers on the machine. Normally this is IEEE 754, which provides approximately 17 digits of precision and an exponent in the range of –308 to 308. This is the same as the double type in C.
What I never understood is the meaning of the phrase
" ... which provides approximately 17 digits of precision and an
exponent in the range of –308 to 308 ... "
My intuition here goes astray, since I can understand the meaning of precision, but how can the range be different from it? I mean, if a floating-point number can represent a value of up to 17 digits (i.e. a maximum of 10^17 - 1), then how can the exponent be +308? Wouldn't that make a 308-digit number if the exponent base is 10, or roughly a 100-digit number if the base is 2?
I hope I'm able to express my confusion.
Regards
Vaid, Abhishek
Suppose that we write 1500 with two digits of precision. That means we are being precise enough to distinguish 1500 from 1600 and 1400, but not precise enough to distinguish 1500 from 1510 or 1490. Telling those numbers apart would require three digits of precision.
Even though I wrote four digits, floating-point representation doesn't necessarily contain all these digits. 1500 is 1.5 * 10^3. In decimal floating-point representation, with two digits of precision, only the first two digits of the number and the exponent would be stored, which I will write (1.5, 3).
Why is there a distinction between the "real" digits and the placeholder zeros? Because it tells us how precisely we can represent numbers, that is, what fraction of their value is lost to approximation. We can distinguish 1500 = (1.5, 3) from 1500+100 = (1.6, 3). But if we increase the exponent, we can't distinguish 15000 = (1.5, 4) from 15000+100, which would need (1.51, 4). With two decimal digits of precision, the relative error can be as large as roughly 5%, and that bound holds no matter how small or large the exponent is allowed to be.
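The same effect is easy to reproduce with 64-bit binary Doubles; a minimal Scala check (the figures in the comments follow from IEEE 754 double precision):
val big = 1.0e308             // near the top of the exponent range
println(big + 1.0e290 == big) // true: the gap between Doubles at this magnitude is about 2e292, so 1e290 vanishes
println(1.5e3 + 1.0e2)        // 1600.0: both operands and the sum fit comfortably within the precision, so the result is exact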
The regular decimal representation of numbers obscures the issue. If instead one writes them in normalised scientific notation, separating the mantissa and the exponent, the distinction is immediate. Normalisation is achieved by scaling the mantissa into a fixed interval (conventionally 0.1 ≤ m < 1.0, or 1 ≤ m < 10) and adjusting the exponent to compensate, so no scale information is lost.
The precision of the number is the number of digits in the mantissa. A floating point number has a limited number of bits to represent this portion of the value. It determines how accurately numbers that are similar in size can be distinguished.
The range determines the allowable values of the exponent. In your example, the range of -308 through 308 is stored independently of the mantissa and is limited by the number of bits of the floating-point number allocated to the exponent.
These two values can be varied independently to suit. For example, many graphics pipelines use much smaller, truncated floating-point formats that fit into just 16 bits, trading away both range and precision.
Numerical methods libraries expend large amounts of effort in ensuring that these limits are not exceeded to maintain the correctness of calculations. Casual use will not usually encounter these limits.
The choices in IEEE 754 are accepted as a reasonable trade-off between precision and range. The 32-bit single-precision format has similar, but different, limits. The question What range of numbers can be represented in 16-, 32- and 64-bit IEEE-754 systems? provides a longer summary and further links for more study.
From the Wikipedia article, double-precision floating-point numbers give 15-17 significant decimal digits of precision, and the binary exponent ranges from -1022 to 1023.
The -308 to 308 in the excerpt is not a different idea but the decimal equivalent of that binary exponent range: 2^1023 is roughly 10^308, so the two figures describe the same format in different bases.
I'm testing a photo application for Facebook. I'm getting object IDs from the Facebook API, but I received some incorrect ones, which didn't make sense - why would Facebook send wrong IDs? I investigated a bit and found out that numbers with 17 or more digits are automatically being turned into even numbers!
For example, let's say the ID I'm supposed to receive from Facebook is 12345678912345679. In the debugger, I've noticed that Flash Player automatically turns it into 12345678912345678. And I even tried to manually set it back to an odd number, but it keeps changing back to even.
Is there any way to stop Flash Player from rounding the numbers? BTW, the variable is defined as Object; I receive it like that from Facebook.
This is related to the implementation of data types:
int is a 32-bit number, with an (almost) even split between negative and non-negative values, including 0. So the maximum value is (2^32 / 2) - 1 == 2,147,483,647.
uint is also a 32-bit number, but it doesn't have negative values. So the maximum value is 2^32 - 1 == 4,294,967,295.
When you use a numerical value greater than the maximum value of int or uint, it is automatically cast to Number. From the Adobe Doc:
The Number data type is useful when you need to use floating-point values. Flash runtimes handle int and uint data types more efficiently than Number, but Number is useful in situations where the range of values required exceeds the valid range of the int and uint data types. The Number class can be used to represent integer values well beyond the valid range of the int and uint data types. The Number data type can use up to 53 bits to represent integer values, compared to the 32 bits available to int and uint.
53 bits have a maximum value of:
2^53 - 1 == 9,007,199,254,740,991 => 16 digits
So when you use any value greater than that, the inner workings of floating point numbers apply.
You can read about floating-point numbers here, but in short, for any floating-point value, a group of bits (the exponent) specifies a multiplication factor, which determines where the point sits. This allows a far greater range of values than could otherwise be represented with the available bits - at the cost of reduced precision.
When a value exceeds the largest integer a Number can hold exactly, the least significant bit (the ones bit, which distinguishes odd from even) can no longer be stored, and the value is rounded to an even number => hence, you lose the odd IDs.
There is a simple way to get around this: Keep all your IDs as Strings - they can have as many digits as your system has available bytes ;) It's unlikely you're going to do any calculations with them, anyway.
By the way, once a value exceeds 2^54, numbers are rounded to the nearest multiple of 4; beyond 2^55, to the nearest multiple of 8, and so on.
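The same 53-bit limit is easy to demonstrate on the JVM, where Double plays the role of ActionScript's Number; a small Scala sketch:
val exact  = 1L << 53        // 9007199254740992: still exactly representable
val beyond = exact + 1       // 9007199254740993: the first integer a Double cannot hold
println(exact.toDouble.toLong)               // 9007199254740992
println(beyond.toDouble.toLong)              // 9007199254740992: the odd value collapses onto its even neighbour
println(12345678912345679L.toDouble.toLong)  // the 17-digit ID from the question also lands on a nearby even number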
I just can't understand fixed-point and floating-point numbers, due to the hard-to-read definitions of them all over Google. None that I have read provides a simple enough explanation of what they really are. Can I get a plain definition with an example?
A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.
A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
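A toy Scala sketch of the two decimal schemes just described (the formats mirror the IIIII.FFFFF and 8-digit-mantissa examples above; it is purely an illustration, not a real number library):
def fixed(units: Int, fraction: Int): String = f"$units%05d.$fraction%05d"              // IIIII.FFFFF
def floating(mantissa: Double, exponent: Int): String = f"$mantissa%.7fe$exponent%+03d" // 8 digits plus a 2-digit exponent
println(fixed(99999, 99999))     // "99999.99999", the largest fixed-point value
println(fixed(0, 1))             // "00000.00001", the smallest non-zero fixed-point value
println(floating(9.9999999, 50)) // "9.9999999e+50"
println(floating(1.0, -49))      // "1.0000000e-49"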
A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.
For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.
Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc.
Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.
From my understanding, fixed-point arithmetic is done using integers, where the decimal part is stored in a fixed number of bits, or the number is multiplied by a power of ten matching how many digits of decimal precision are needed.
For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. When performing math on this number, we follow the same rule set: adding 5620, i.e. 56.20, to this number yields 6854 in the data, i.e. 68.54.
If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.
12.34 (pseudocode):
v1 = 1234 / 100 // integer division gives the whole part, 12
v2 = 1234 % 100 // the remainder gives the decimal part (100ths of a whole), 34
print v1 + "." + v2 // "12.34" (real code should left-pad v2 with zeros, so 1205 prints as "12.05" rather than "12.5")
Floating-point numbers are a completely different story in programming. The current standard for 32-bit floating-point numbers uses 23 bits for the significand (the digits of the number), 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
The term ‘fixed point’ refers to the way such numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point.
With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number.
For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.
The other answers here describe what a fixed-point number is and how it is represented, but say very little about what I consider the defining feature. The key difference is that floating-point numbers have a constant relative (percentage) error caused by rounding or truncating, whereas fixed-point numbers have a constant absolute error.
With 64-bit floats, you can be sure that the answer to x + y will never be off by more than one unit in the last place - but how big is that unit? It depends on x and y: if the binary exponent is 10, rounding off the last place represents an absolute error 2^10 = 1024 times larger than it does when the exponent is 0.
With fixed-point numbers, a bit always represents the same amount. For example, if we have 32 bits before the binary point and 32 after, truncation errors will always change the answer by at most 2^-32. This is great if you're working with numbers that are all roughly equal to 1, which then carry a lot of precision, but bad if you're working with numbers of very different magnitudes - who cares about an error of 2^-32 meters when the distance you're calculating is a googol meters?
In general, floating point lets you represent a much larger range of numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed point gets better accuracy if you know ahead of time how big the numbers you need to represent will be, so that you can put the point exactly where you want it for maximum accuracy. But if you don't know what scales you're working with, floats are the better choice, because they cover a wide range with an accuracy that's good enough.
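A quick Scala check of that claim for Doubles (the exact digits are whatever IEEE 754 double precision produces; the pattern is what matters):
for (x <- Seq(1.0, 1.0e6, 1.0e12)) {
  val u = java.lang.Math.ulp(x)  // absolute size of the last bit at this magnitude
  println(s"x = $x  ulp = $u  ulp/x = ${u / x}")
}
// ulp grows with x (growing absolute error), while ulp/x stays near 2^-52 (constant relative error)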
Note also that fixed-point numbers don't just have some fixed number of digits after the point; mathematically they are an integer scaled by a fixed negative power of the base. This made them very convenient for mechanical calculators:
e.g. the price of something is USD 23.37 (Q = 2 digits after the point). The machine knows where the point is supposed to be!
Take the number 123.456789
As an integer, this number would be 123
As a fixed point (2), this number would be 123.46 (assuming you rounded it up)
As a floating point, this number would be 123.456789
Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.
I write financial applications where I constantly battle the decision to use a double vs using a decimal.
All of my math works on numbers with no more than 5 decimal places that are no larger than ~100,000. I have a feeling that all of these can be represented as doubles without rounding error anyway, but I have never been sure.
I would go ahead and make the switch from decimals to doubles for the obvious speed advantage, except that at the end of the day, I still use the ToString method to transmit prices to exchanges, and need to make sure it always outputs the number I expect. (89.99 instead of 89.99000000001)
Questions:
Is the speed advantage really as large as naive tests suggest? (~100 times)
Is there a way to guarantee the output from ToString to be what I want? Is this assured by the fact that my number is always representable?
UPDATE: I have to process ~10 billion price updates before my app can run, and I have implemented this with decimal right now for the obvious protective reasons, but it takes ~3 hours just to turn on; doubles would dramatically reduce my start-up time. Is there a safe way to do it with doubles?
Floating-point arithmetic will almost always be significantly faster because it is supported directly by the hardware. So far, almost no widely used hardware supports decimal arithmetic (although this is changing; see the comments).
Financial applications should always use decimal numbers; the number of horror stories stemming from using floating point in financial applications is endless, and you should be able to find many such examples with a Google search.
While decimal arithmetic may be significantly slower than floating point arithmetic, unless you are spending a significant amount of time processing decimal data the impact on your program is likely to be negligible. As always, do the appropriate profiling before you start worrying about the difference.
There are two separable issues here. One is whether the double has enough precision to hold all the bits you need, and the other is whether it can represent your numbers exactly.
As for the exact representation, you are right to be cautious, because an exact decimal fraction like 1/10 has no exact binary counterpart. However, if you know that you only need 5 decimal digits of precision, you can use scaled arithmetic in which you operate on numbers multiplied by 10^5. So for example if you want to represent 23.7205 exactly you represent it as 2372050.
Let's see if there is enough precision: double precision gives you 53 bits of precision.
This is equivalent to 15+ decimal digits of precision. So this would allow you five digits after the decimal point and 10 digits before the decimal point, which seems ample for your application.
I would put this C code in a .h file:
#include <math.h>   /* floor */
#include <stdio.h>  /* FILE, fprintf */

typedef double scaled_int;

#define SCALE_FACTOR 1.0e5 /* 10^5, i.e. five digits after the decimal point */

static inline scaled_int adds(scaled_int x, scaled_int y) { return x + y; }
static inline scaled_int muls(scaled_int x, scaled_int y) { return x * y / SCALE_FACTOR; }
static inline scaled_int scaled_of_int(int x) { return (scaled_int) x * SCALE_FACTOR; }
static inline int intpart_of_scaled(scaled_int x) { return (int) floor(x / SCALE_FACTOR); }
static inline int fraction_of_scaled(scaled_int x) { return (int) (x - SCALE_FACTOR * intpart_of_scaled(x)); }
static inline void fprint_scaled(FILE *out, scaled_int x) {
  fprintf(out, "%d.%05d", intpart_of_scaled(x), fraction_of_scaled(x));
}
There are probably a few rough spots but that should be enough to get you started.
Addition has no overhead; a multiply or divide costs about twice as much because of the extra scaling by SCALE_FACTOR.
If you have access to C99, you can also try scaled integer arithmetic using the int64_t 64-bit integer type. Which is faster will depend on your hardware platform.
Always use Decimal for any financial calculations or you will be forever chasing one-cent rounding errors.
Yes; software arithmetic really is 100 times slower than hardware. Or, at least, it is a lot slower, and a factor of 100, give or take an order of magnitude, is about right. Back in the bad old days when you could not assume that every 80386 had an 80387 floating-point co-processor, you had software simulation of binary floating point too, and that was slow.
No; you are living in a fantasy land if you think that pure binary floating point can ever exactly represent all decimal numbers. Binary fractions can combine halves, quarters, eighths, and so on, but an exact decimal such as 0.01 requires two factors of one fifth and one factor of one quarter (1/100 = (1/4)*(1/5)*(1/5)), and one fifth has no exact representation in binary. So you cannot represent all decimal values exactly with binary values: 0.01 is a counter-example, and it is representative of a huge class of decimal numbers that cannot be represented exactly.
So, you have to decide whether you can deal with the rounding before you call ToString() or whether you need to find some other mechanism that will deal with rounding your results as they are converted to a string. Or you can continue to use decimal arithmetic since it will remain accurate, and it will get faster once machines are released that support the new IEEE 754 decimal arithmetic in hardware.
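A minimal Scala illustration of that trade-off, with binary Doubles on one side and decimal arithmetic (BigDecimal) on the other:
println(0.1 + 0.2)                        // 0.30000000000000004
val cents = (1 to 100).map(_ => 0.01).sum // add one cent a hundred times as Doubles
println(cents == 1.0)                     // false: the accumulated binary rounding error shows
println(BigDecimal("0.01") * 100)         // 1.00: decimal arithmetic stays exact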
Obligatory cross-reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic. That's one of many possible URLs.
Information on decimal arithmetic and the new IEEE 754:2008 standard at this Speleotrove site.
Just use a long and multiply by a power of 10. After you're done, divide by the same power of 10.
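A minimal sketch of that approach in Scala, assuming two decimal places and illustrative names:
val priceCents = 8999L                        // 89.99 stored as 8999 (scaled by 10^2)
val total = priceCents + 250L                 // add 2.50
println(f"${total / 100}.${total % 100}%02d") // "92.49": divide by the same power of 10 only when displaying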
Decimals should always be used for financial calculations. The size of the numbers isn't important.
The easiest way for me to explain is via some C# code.
double one = 3.05;
double two = 0.05;
System.Console.WriteLine((one + two) == 3.1);
That bit of code will print out False even though 3.1 is equal to 3.1...
Same thing...but using decimal:
decimal one = 3.05m;
decimal two = 0.05m;
System.Console.WriteLine((one + two) == 3.1m);
This will now print out True!
If you want to avoid this sort of issue, I recommend you stick with decimals.
I refer you to my answer given to this question.
Use a long, store the smallest amount you need to track, and display the values accordingly.