Losing accuracy with double division - double

I am having a problem with a simple division from two integers. I need it to be as accurate as possible, but for some reason the double type is working strange.
For example, if I execute the following code:
double res = (29970.0/1000.0);
The result is 29.969999999999999, when it should be 29.970.
Any idea why this is happening?
Thanks

Any idea why this is happening?
Because double representation is finite. For example, IEEE754 double-precision standard has 52 bits for fraction. So, not all the real numbers are covered. So, some of the values can not be ideally precise. In your case the result is 10^-15 away from the ideal.

I need it to be as accurate as possible
You shouldn't use doubles, then. In Java, for example, you would use BigDecimal instead (most languages provide a similar facility). double operations are intrinsically inaccurate to some degree. This is due to the internal representation of floating point numbers.

floating point numbers of type float and double are stored in binary format. Therefore numbers cant have precise decimal values. Those values are instead quantisized. If you hypothetically had only 2 bits fraction number type you would be able to represent only 2^-2 quantums: 0.00 0.25 0.50 0.75, nothing between.

I need it to be as accurate as possible
There is no silver bullet, but if you want only basic arithmetic operations (which map ℚ to ℚ), and you REALLY want exact results, then your best bet is rational type composed of two unlimited integers (a.k.a. BigInteger, BigInt, etc.) - but even then, memory is not infinite, and you must think about it.
For the rest of the question, please read about fixed size floating-point numbers, there's plenty of good sources.

Related

iOS - rounding a float with roundf() is not working properly

I am having issue with rounding a float in iPhone application.
float f=4.845;
float s= roundf(f * 100.0)/100;
NSLog(#"Output-1: %.2f",s);
s= roundf(484.5)/100;
NSLog(#"Output-2: %.2f",s);
Output-1: 4.84
Output-2: 4.85
Let me know whats problem in this and how to solve this.
The problem is that you don't yet realise one of the inherent problems with floating point: the fact that most numbers cannot be represented exactly (a).
This means that 4.845 is likely to be, in reality, something like 4.8449999999999 which, when you round it, gives you 4.84 rather than what you expect, 4.85.
And what value you end up with also depends on how you calculate it, which is why you're getting a different result.
And, of course, no floating point "inaccuracy" answer would be complete on SO without the authoritative What Every Computer Scientist Should Know About Floating-Point Arithmetic.
(a) Only sums of exact powers of two, within a certain similar range, can be exactly rendered in IEEE754. So, for example, 484.5 is
256 + 128 + 64 + 32 + 4 + 0.5 (28 + 27 + 26 + 25 + 22 + 2-1).
See this answer for a more detailed look into the IEEE754 format.
As to solving it, you have a few choices. One is to use double instead of float. That gives you more precision and greater range of numbers but only moves the problem further away rather than really solving it. Since 0.1 is a repeating fraction in IEEE754, no amount of bits (short of infinity) can exactly represent it.
Another choice is to use a custom library like a big decimal type, which can represent decimals of arbitrary precision (that's not infinite precision as some people are wont to suggest, since it's limited by memory). This will reduce the errors caused by the binary/decimal mismatch.
You may also want to look into NSDecimalNumber - this doesn't give you arbitrary precision but it does give a large range with accurate decimal representation.
There'll still be numbers you can't represent, like PI or the square root of 2 or any other irrational number, but it should cover most cases. If you really need to handle those other values, you need to switch to symbolic numeric representations.
Unlike 484.5 which can be represented exactly as a float* , 4.845 is represented as 4.8449998 (see this calculator if you wish to try other numbers). Multiplying by one hundred keeps the number at 484.49998, which correctly rounds to 484.
* An exact representation is possible because its fractional part 0.5 is a power of two (i.e. 2^-1).

Matlab precision: simple subtraction is not zero

I compute this simple sum on Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, than rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can either be some arbitrary smallish number, or an expression involving eps, which gives the precision of the numerical type you're working with.
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "Get me the floating-point number that is closest to 0.04. In your expression, there are 2 numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not be a floating-point number. Although it is always of the form m*2^e the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with are different (approximations) to the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a=0.04;
b=0.2;
a-0.2*b
ans=-6.9389e-018
c=a/min(abs([a b]));
d=b/min(abs([a b]));
c-0.2*d
ans=0
EDIT: of course I did not mean to give a universal solution to these kind of problems but it is still a good practice that can make you avoid a few problems in numerical computation (curve fitting, etc ...). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a matlab problem, but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html

Arbitrary precision Float numbers on JavaScript

I have some inputs on my site representing floating point numbers with up to ten precision digits (in decimal). At some point, in the client side validation code, I need to compare a couple of those values to see if they are equal or not, and here, as you would expect, the intrinsics of IEEE754 make that simple check fails with things like (2.0000000000==2.0000000001) = true.
I may break the floating point number in two longs for each side of the dot, make each side a 64 bit long and do my comparisons manually, but it looks so ugly!
Any decent Javascript library to handle arbitrary (or at least guaranteed) precision float numbers on Javascript?
Thanks in advance!
PS: A GWT based solution has a ++
There is the GWT-MATH library at http://code.google.com/p/gwt-math/.
However, I warn you, it's a GWT jsni overlay of a java->javascript automated conversion of java.BigDecimal (actually the old com.ibm.math.BigDecimal).
It works, but speedy it is not. (Nor lean. It will pad on a good 70k into your project).
At my workplace, we are working on a fixed point simple decimal, but nothing worth releasing yet. :(
Use an arbitrary precision integer library such as silentmatt’s javascript-biginteger, which can store and calculate with integers of any arbitrary size.
Since you want ten decimal places, you’ll need to store the value n as n×10^10. For example, store 1 as 10000000000 (ten zeroes), 1.5 as 15000000000 (nine zeroes), etc. To display the value to the user, simply place a decimal point in front of the tenth-last character (and then cut off any trailing zeroes if you want).
Alternatively you could store a numerator and a denominator as bigintegers, which would then allow you arbitrarily precise fractional values (but beware – fractional values tend to get very big very quickly).

From Fraction to Decimal and Back?

Is it possible to store a fraction like 3/6 in a variable of some sort?
When I try this it only stores the numbers before the /. I know that I can use 2 variables and divide them, but the input is from a single text field. Is this at all possible?
I want to do this because I need to calculate the fractions to decimal odds.
A bonus question ;) - Is there an easy way to calculate a decimal value to a fraction? Thanks..
Well in short, there is no true way to extract the original fraction out of a decimal.
Example: take 5/10
you will get 0.5
now, 0.5 also translates back to 1/2, 2/4, 3/6, etc.
Your best bet is to store each integer separately, and perform the calculation later on.
The best thing to do is to implement a fraction class (or rational number class). Normally it would take a numerator and denominator and be able to provide a double, and do basic math with other fraction objects. It should also be able to parse and format fractions.
Rational Arithmetic on Rosetta Code looks like something good to start with.
I'm afraid there aren't any easy answers for you on this. For creating the fraction, you'll have to split the text field on the '/', convert the two halves to doubles, and divide them out. As for converting it back to a fraction, you'll have to crack open a math textbook and figure it out. (Even worse, a double is not actually precise—you may think it has 0.1 in it, but it really has 0.09999999999999998726 or something like that, so you'll have to choose a precision and go for it, or write some sort of fraction class that's based on a pair of integers.)
The method, as been said, is to store the numerator and denominator, much in the way you can write it on paper.
for 'C' use the
GNU Multiple Precision Arithmetic Library
look for 'rational' in the docs.
Is there an easy way to calculate a decimal value to a fraction?
If you limit your decimal values to a certain number of decimal points you could create a lookup table.
0.3333, 1/3
0.6666, 2/3
0.0625, 1/16
0.1250, 1/8
0.2500, 1/4
0.5000, 1/2
0.7500, 3/4
etc...
So if the user input 0.5 you pad it with 0's until you got 4 decimal places. You would then use the lookup table to return "1/2". The lookup table should probably be a dictionary of sorts.
It wouldn't be too difficult to do estimating either. For example, if the user entered 0.0624 you could easily select the value in the table closest to that decimal. In this case it would return "1/16."
Don't let typing/entering of the finite set of decimal/fraction pairs scares you (it's really not that large depending on the precision you choose).
If all else fails perhaps a google search would reveal a library that does this sort of this for you.

Decimal vs Double Speed

I write financial applications where I constantly battle the decision to use a double vs using a decimal.
All of my math works on numbers with no more than 5 decimal places and are not larger than ~100,000. I have a feeling that all of these can be represented as doubles anyways without rounding error, but have never been sure.
I would go ahead and make the switch from decimals to doubles for the obvious speed advantage, except that at the end of the day, I still use the ToString method to transmit prices to exchanges, and need to make sure it always outputs the number I expect. (89.99 instead of 89.99000000001)
Questions:
Is the speed advantage really as large as naive tests suggest? (~100 times)
Is there a way to guarantee the output from ToString to be what I want? Is this assured by the fact that my number is always representable?
UPDATE: I have to process ~ 10 billion price updates before my app can run, and I have implemented with decimal right now for the obvious protective reasons, but it takes ~3 hours just to turn on, doubles would dramatically reduce my turn on time. Is there a safe way to do it with doubles?
Floating point arithmetic will almost always be significantly faster because it is supported directly by the hardware. So far almost no widely used hardware supports decimal arithmetic (although this is changing, see comments).
Financial applications should always use decimal numbers, the number of horror stories stemming from using floating point in financial applications is endless, you should be able to find many such examples with a Google search.
While decimal arithmetic may be significantly slower than floating point arithmetic, unless you are spending a significant amount of time processing decimal data the impact on your program is likely to be negligible. As always, do the appropriate profiling before you start worrying about the difference.
There are two separable issues here. One is whether the double has enough precision to hold all the bits you need, and the other is where it can represent your numbers exactly.
As for the exact representation, you are right to be cautious, because an exact decimal fraction like 1/10 has no exact binary counterpart. However, if you know that you only need 5 decimal digits of precision, you can use scaled arithmetic in which you operate on numbers multiplied by 10^5. So for example if you want to represent 23.7205 exactly you represent it as 2372050.
Let's see if there is enough precision: double precision gives you 53 bits of precision.
This is equivalent to 15+ decimal digits of precision. So this would allow you five digits after the decimal point and 10 digits before the decimal point, which seems ample for your application.
I would put this C code in a .h file:
typedef double scaled_int;
#define SCALE_FACTOR 1.0e5 /* number of digits needed after decimal point */
static inline scaled_int adds(scaled_int x, scaled_int y) { return x + y; }
static inline scaled_int muls(scaled_int x, scaled_int y) { return x * y / SCALE_FACTOR; }
static inline scaled_int scaled_of_int(int x) { return (scaled_int) x * SCALE_FACTOR; }
static inline int intpart_of_scaled(scaled_int x) { return floor(x / SCALE_FACTOR); }
static inline int fraction_of_scaled(scaled_int x) { return x - SCALE_FACTOR * intpart_of_scaled(x); }
void fprint_scaled(FILE *out, scaled_int x) {
fprintf(out, "%d.%05d", intpart_of_scaled(x), fraction_of_scaled(x));
}
There are probably a few rough spots but that should be enough to get you started.
No overhead for addition, cost of a multiply or divide doubles.
If you have access to C99, you can also try scaled integer arithmetic using the int64_t 64-bit integer type. Which is faster will depend on your hardware platform.
Always use Decimal for any financial calculations or you will be forever chasing 1cent rounding errors.
Yes; software arithmetic really is 100 times slower than hardware. Or, at least, it is a lot slower, and a factor of 100, give or take an order of magnitude, is about right. Back in the bad old days when you could not assume that every 80386 had an 80387 floating-point co-processor, then you had software simulation of binary floating point too, and that was slow.
No; you are living in a fantasy land if you think that a pure binary floating point can ever exactly represent all decimal numbers. Binary numbers can combine halves, quarters, eighths, etc, but since an exact decimal of 0.01 requires two factors of one fifth and one factor of one quarter (1/100 = (1/4)*(1/5)*(1/5)) and since one fifth has no exact representation in binary, you cannot exactly represent all decimal values with binary values (because 0.01 is a counter-example which cannot be represented exactly, but is representative of a huge class of decimal numbers that cannot be represented exactly).
So, you have to decide whether you can deal with the rounding before you call ToString() or whether you need to find some other mechanism that will deal with rounding your results as they are converted to a string. Or you can continue to use decimal arithmetic since it will remain accurate, and it will get faster once machines are released that support the new IEEE 754 decimal arithmetic in hardware.
Obligatory cross-reference: What Every Computer Scientist Should Know About Floating-Point Arithmetic. That's one of many possible URLs.
Information on decimal arithmetic and the new IEEE 754:2008 standard at this Speleotrove site.
Just use a long and multiply by a power of 10. After you're done, divide by the same power of 10.
Decimals should always be used for financial calculations. The size of the numbers isn't important.
The easiest way for me to explain is via some C# code.
double one = 3.05;
double two = 0.05;
System.Console.WriteLine((one + two) == 3.1);
That bit of code will print out False even though 3.1 is equal to 3.1...
Same thing...but using decimal:
decimal one = 3.05m;
decimal two = 0.05m;
System.Console.WriteLine((one + two) == 3.1m);
This will now print out True!
If you want to avoid this sort of issue, I recommend you stick with decimals.
I refer you to my answer given to this question.
Use a long, store the smallest amount you need to track, and display the values accordingly.