Is it possible to predict when Perl's decimal/float math will be wrong? [duplicate] - perl

This question already has answers here:
Why can't decimal numbers be represented exactly in binary?
(22 answers)
Closed 7 years ago.
In one respect, I understand that Perl's floats are inexact binary representations, which causes Perl's math to sometimes be wrong. What I don't understand is why these floats sometimes seem to give exact answers and other times do not. Is it possible to predict when Perl's float math will give the wrong (i.e. inexact) answer?
For instance, in the code below, Perl's math is wrong 1 time when the subtraction is "16.12 - 15.13", wrong 2 times when the problem is "26.12 - 25.13", and wrong 20 times when the problem is "36.12 - 35.13". Furthermore, in all of the above test cases, the result of the subtraction problem (i.e. $subtraction_problem) starts out wrong but tends to become more correct the more we add to or subtract from it (with $x). This makes no sense: why is it that the more we add to or subtract from the arithmetic problem, the more likely the value is to be correct (i.e. exact)?
my $subtraction_problem = 16.12 - 15.13;
my $perl_math_failures = 0;
for (my $x = -25; $x < 25; $x++) {
    my $result = $subtraction_problem + $x;
    print "$result\n";
    $perl_math_failures++ if length $result > 6;
}
print "There were $perl_math_failures perl math failures!\n";

None of this is Perl specific. See Goldberg:
Rounding Error
Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation. The section Relative Error and Ulps describes how it is measured.
Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary? That question is a main theme throughout this section. The section Guard Digits discusses guard digits, a means of reducing the error when subtracting two nearby numbers. Guard digits were considered sufficiently important by IBM that in 1968 it added a guard digit to the double precision format in the System/360 architecture (single precision already had a guard digit), and retrofitted all existing machines in the field. Two examples are given to illustrate the utility of guard digits.
The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations.
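To make the rounding visible in this particular case, you can ask Perl for more digits than its default stringification shows (a minimal sketch; the exact digit strings depend on your platform's doubles):
use strict;
use warnings;

# %.17g prints enough digits to uniquely identify an IEEE 754 double,
# so it exposes the rounding hidden by Perl's default stringification.
printf "16.12         -> %.17g\n", 16.12;
printf "15.13         -> %.17g\n", 15.13;
printf "16.12 - 15.13 -> %.17g\n", 16.12 - 15.13;   # typically not exactly 0.99

# As the magnitude of the result grows, fewer of the 53 significand bits
# are left for the fractional part, so the stored value (and its default
# roughly-15-significant-digit stringification) may or may not match the
# "nice" decimal; this is why some results in the loop above print as
# short, exact-looking strings and others do not.
printf "(16.12 - 15.13) + 20 -> %.17g\n", (16.12 - 15.13) + 20;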

Related

Numerical convergence and minimum number size

I have a program which calculates probability values
(p-values),
but it is entering a very large negative number into the
exp function
exp(-626294.830) which evaluates to zero instead of the very small
positive number that it should be.
How can I get this to evaluate as a very small floating point number?
I have tried
Math::BigFloat,
bignum, and
bigrat
but all have failed.
Wolfram Alpha says that exp(-626294.830) is 4.08589×10^-271997... zero is a pretty close approximation to that ;-) Although you've edited and removed the context from your question, do you really need to work with such tiny numbers, or perhaps there is some way you could optimize your algorithm or scale your numbers?
Anyway, you are correct that code like Math::BigFloat->new("-626294.830")->bexp seems to take quite some time, even with the support of use Math::BigFloat lib => 'GMP';.
The only alternative I can offer at the moment is Math::Prime::Util::GMP's expreal, although you need to specify a precision to it.
use Math::Prime::Util::GMP qw/expreal/;
use Math::BigFloat;
my $e = Math::BigFloat->new(expreal(-626294.830,272000));
print $e->bnstr,"\n";
__END__
4.086e-271997
But on my machine, even that still takes ~20s to run, which brings us back to the question of potential optimization in other places.
Floating point numbers do not have infinite precision. Assuming the number is represented as an IEEE 754 double, we have 52 bits for the fraction, 11 bits for the exponent, and one bit for the sign. Due to the way exponents are encoded, the smallest positive normalized number that can be represented is 2^-1022 (subnormals extend this only down to about 2^-1074).
If we look at your number e^-626294.830, we can do a change of base and see that it equals 2^(log_2 e · -626294.830) = 2^-903552.445, which is significantly smaller than 2^-1022. Approximating your number as zero is therefore correct.
Instead of calculating this value using arbitrary-precision numerics, you are likely better off solving the necessary equations by hand, then coding this in a way that does not require extreme precision. For example, it is unlikely that you need the exact value of e^-626294.830, but perhaps just the magnitude. Then, you can calculate the logarithm instead of using exp().
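As a rough sketch of that idea in Perl (assuming only the order of magnitude is needed; the mantissa is only as accurate as a double-precision division allows):
use strict;
use warnings;
use POSIX qw(floor);

# Work with log10(exp($x)) = $x / ln(10) instead of exp($x) itself.
my $x     = -626294.830;
my $log10 = $x / log(10);              # about -271996.39
my $exp10 = floor($log10);             # decimal exponent: -271997
my $mant  = 10 ** ($log10 - $exp10);   # mantissa in [1, 10)

printf "exp(%.3f) ~ %.5fe%d\n", $x, $mant, $exp10;
# prints roughly 4.08589e-271997, matching the Wolfram Alpha value above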

Getting around floating point error with logarithms?

I'm trying to write a basic digit counter (an integer is inputted and the number of digits of that integer is outputted) for positive integers. This is my general formula:
dig(x) := Math.floor(Math.log(x,10))
I tried implementing the equivalent of dig(x) in Ruby, and found that when I was computing dig(1000) I was getting 2 instead of 3 because Math.log was returning 2.9999999999999996 which would then be truncated down to 2. What is the proper way to handle this problem? (I'm assuming this problem can occur regardless of the language used to implement this approach, but if that's not the case then please explain that in your answer).
To get an exact count of the number of digits in an integer, you can do the usual thing (in C/C++, assuming n is positive):
int digits = 0;
while (n > 0) {
    n = n / 10;   // integer division, just drops the ones digit and shifts right
    digits = digits + 1;
}
I'm not certain but I suspect running a built-in logarithm function won't be faster than this, and this will give you an exact answer.
I thought about it for a minute and couldn't come up with a way to make the logarithm-based approach work with any guarantees, and almost convinced myself that it is probably a doomed pursuit in the first place because of floating point rounding errors, etc.
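In Perl, the two approaches look roughly like this (a sketch; whether log(1000)/log(10) comes out just below 3 depends on your platform's libm):
use strict;
use warnings;

# Exact digit count by repeated integer division (same idea as the C loop).
sub dig_loop {
    my ($n) = @_;
    my $digits = 0;
    while ($n > 0) {
        $n = int($n / 10);   # drop the ones digit
        $digits++;
    }
    return $digits;
}

# Log-based version: floor(log10(n)) + 1. log($n)/log(10) can come out
# just below an integer (e.g. 2.9999999999999996 for 1000 on some
# platforms), and int() then truncates it, giving one digit too few.
sub dig_log {
    my ($n) = @_;
    return int(log($n) / log(10)) + 1;
}

for my $n (9, 10, 999, 1000, 123456) {
    printf "%-8d loop=%d  log=%d\n", $n, dig_loop($n), dig_log($n);
}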
From The Art of Computer Programming, volume 2: you can eliminate one bit of error before the floor function is applied by adding that one bit back in.
Let x be the result of log and then do x += x / 0x10000000 for a single precision floating point number (C's float). Then pass the value into floor.
This is guaranteed to be the fastest (assuming you have the answer in numerical form) because it uses only a few floating point instructions.
Floating point is always subject to roundoff error; that's one of the hazards you need to be aware of, and actively manage, when working with it. The proper way to handle it, if you must use floats is to figure out what the expected amount of accumulated error is and allow for that in comparisons and printouts -- round off appropriately, compare for whether the difference is within that range rather than comparing for equality, etcetera.
There is no exact binary-floating-point representation of simple things like 1/10th, for example.
(As others have noted, you could rewrite the problem to avoid using the floating-point-based solution entirely, but since you asked specifically about working log() I wanted to address that question; apologies if I'm off target. Some of the other answers provide specific suggestions for how you might round off the result. That would "solve" this particular case, but as your floating operations get more complicated you'll have to continue to allow for roundoff accumulating at each step and either deal with the error at each step or deal with the cumulative error -- the latter being the more complicated but more accurate solution.)
If this is a serious problem for an application, folks sometimes use scaled fixed point instead (running financial computations in terms of pennies rather than dollars, for example). Or they use one of the "big number" packages which computes in decimal rather than in binary; those have their own round-off problems, but they round off more the way humans expect them to.
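A tiny sketch of the scaled-fixed-point idea (in Perl, with made-up prices; everything stays in integer cents until output):
use strict;
use warnings;

# Hold money as integer cents so addition is exact.
my @prices_cents = (1999, 250, 1);    # $19.99, $2.50, $0.01
my $total_cents  = 0;
$total_cents += $_ for @prices_cents;

printf "Total: \$%.2f\n", $total_cents / 100;   # Total: $22.50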

comparing float and double and printing them

I have a quick question. Say I have a really big number, up to 15 digits, and I assign the input to two variables, one float and one double. If I were to compare the two numbers, how would you compare them? I think a double has precision up to about 15 digits, and a float about 8. So do I simply compare them while the float only contains 8 digits and pad the rest, or do I have the float print out all 15 digits and then make the comparison? Also, if I were asked to print out the float, is the standard way just to print it up to 8 digits, which is its maximum precision?
thanks
Most languages will do some form of type promotion to let you compare types that are not identical, but reasonably similar. For details, you would have to indicate what language you are referring to.
Of course, the real problem with comparing floating point numbers is that the results might be unexpected due to rounding errors. Most mathematical equivalences don't hold for floating point arithmetic, so two sequences of operations which SHOULD yield the same value might actually yield slightly different values (or even very different values if you aren't careful).
EDIT: as for printing, the "standard way" is based on what you need. If, for some reason, you are doing monetary computations in floating point, chances are that you'll only want to print 2 decimal digits.
Thinking in terms of digits may be a problem here. Floats range from negative infinity to positive infinity, but only a finite set of values in between can be represented exactly. In C#, for example, the magnitude range is about ±1.5 × 10^−45 to ±3.4 × 10^38, with a precision of roughly 7 decimal digits.
Also, IEEE 754 defines floats and doubles.
Here is a link that might help http://en.wikipedia.org/wiki/IEEE_floating_point
Your question is the right one. You want to consider your approach, though.
Whether at 32 or 64 bits, the floating-point representation is not meant for comparing numbers for equality. For example, the assertion 2.0/7.0 == 60.0/210.0 may or may not be true in the CPU's view. Conceptually, floating-point is inherently imprecise.
If you wish to compare numbers for equality, use integers. Consider again the ratios of the last paragraph. The assertion that 2*210 == 7*60 is always true -- noting that those are the integral versions of the same four numbers as before, only related using multiplication rather than division. One suspects that what you are really looking for is something like this.
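A short sketch of that suggestion, shown in Perl for concreteness (whether the two floating-point divisions happen to agree is up to the hardware; the integer comparison always holds):
use strict;
use warnings;

# Comparing the ratios directly depends on two rounded divisions agreeing.
my $float_equal = (2.0 / 7.0 == 60.0 / 210.0) ? "true" : "false";

# Cross-multiplying stays in exact integers: a/b == c/d  <=>  a*d == b*c.
my $int_equal = (2 * 210 == 7 * 60) ? "true" : "false";

print "2.0/7.0 == 60.0/210.0 : $float_equal\n";   # may or may not be true
print "2*210   == 7*60       : $int_equal\n";     # always true (420 == 420)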

Matlab precision: simple subtraction is not zero

I compute this simple sum on Matlab:
2*0.04-0.5*0.4^2 = -1.387778780781446e-017
but the result is not zero. What can I do?
Aabaz and Jim Clay have good explanations of what's going on.
It's often the case that, rather than exactly calculating the value of 2*0.04 - 0.5*0.4^2, what you really want is to check whether 2*0.04 and 0.5*0.4^2 differ by an amount that is small enough to be within the relevant numerical precision. If that's the case, then rather than checking whether 2*0.04 - 0.5*0.4^2 == 0, you can check whether abs(2*0.04 - 0.5*0.4^2) < thresh. Here thresh can either be some arbitrary smallish number, or an expression involving eps, which gives the precision of the numerical type you're working with.
EDIT:
Thanks to Jim and Tal for suggested improvement. Altered to compare the absolute value of the difference to a threshold, rather than the difference.
Matlab uses double-precision floating-point numbers to store real numbers. These are numbers of the form m*2^e where m is an integer between 2^52 and 2^53 (the mantissa) and e is the exponent. Let's call a number a floating-point number if it is of this form.
All numbers used in calculations must be floating-point numbers. Often, this can be done exactly, as with 2 and 0.5 in your expression. But for other numbers, most notably most numbers with digits after the decimal point, this is not possible, and an approximation has to be used. What happens in this case is that the number is rounded to the nearest floating-point number.
So, whenever you write something like 0.04 in Matlab, you're really saying "get me the floating-point number that is closest to 0.04". In your expression, there are two numbers that need to be approximated: 0.04 and 0.4.
In addition, the exact result of operations like addition and multiplication on floating-point numbers may not be a floating-point number. Although it is always of the form m*2^e, the mantissa may be too large. So you get an additional error from rounding the results of operations.
At the end of the day, a simple expression like yours will be off by about 2^-52 times the size of the operands, or about 10^-17.
In summary: the reason your expression does not evaluate to zero is two-fold:
Some of the numbers you start out with differ (as approximations) from the exact numbers you provided.
The intermediate results may also be approximations of the exact results.
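The same effect is easy to reproduce in Perl, which uses the same double-precision arithmetic; a hedged sketch of the threshold-style check described above:
use strict;
use warnings;

my $diff = 2 * 0.04 - 0.5 * 0.4 ** 2;   # mathematically zero
printf "difference: %.17g\n", $diff;     # typically a tiny nonzero value near 1e-17

# Compare against a small threshold instead of testing for exact zero.
my $thresh = 1e-12;                      # chosen loosely for this example
print abs($diff) < $thresh ? "effectively zero\n" : "really nonzero\n";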
What you are seeing is quantization error. Matlab uses doubles to represent numbers, and while they are capable of a lot of precision, they still cannot represent all real numbers because there are an infinite number of real numbers. I'm not sure about Aabaz's trick, but in general I would say there isn't anything you can do, other than perhaps massaging your inputs to be double-friendly numbers.
I do not know if it is applicable to your problem but often the simplest solution is to scale your data.
For example:
a=0.04;
b=0.2;
a-0.2*b
ans=-6.9389e-018
c=a/min(abs([a b]));
d=b/min(abs([a b]));
c-0.2*d
ans=0
EDIT: of course I did not mean to give a universal solution to these kinds of problems, but it is still a good practice that can help you avoid a few problems in numerical computation (curve fitting, etc.). See Jim Clay's answer for the reason why you are experiencing these problems.
I'm pretty sure this is a case of ye olde floating point accuracy issues.
Do you need 1e-17 accuracy? Is this merely a case of wanting 'pretty' output?
In that case, you can just use a formatted sprintf to display the number of significant digits you want.
Realize that this is not a matlab problem, but a fundamental limitation of how numbers are represented in binary.
For fun, work out what .1 is in binary...
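In Perl you can peek at the answer (the exact trailing digits may vary by build):
$ perl -e 'printf "%.20f\n", 0.1'
0.10000000000000000555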
Some references:
http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
http://www.mathworks.com/support/tech-notes/1100/1108.html

How to eliminate Perl rounding errors

Consider the following program:
$x = 12345678901.234567000;
$y = ($x - int($x)) * 1000000000;
printf("%f:%f\n", $x, $y);
Here's what is prints:
12345678901.234568:234567642.211914
I was expecting:
12345678901.234567:234567000
This appears to be some sort of rounding issue in Perl.
How could I change it to get 234567000 instead?
Did I do something wrong?
This is a frequently-asked question.
Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
Internally, your computer represents floating-point numbers in binary. Digital (as in powers of two) computers cannot store all numbers exactly. Some real numbers lose precision in the process. This is a problem with how computers store numbers and affects all computer languages, not just Perl.
perlnumber shows the gory details of number representations and conversions.
To limit the number of decimal places in your numbers, you can use the printf or sprintf function. See Floating Point Arithmetic for more details.
printf "%.2f", 10/3;
my $number = sprintf "%.2f", 10/3;
Make "use bignum;" the first line of your program.
Other answers explain what to expect when using floating point arithmetic -- that some digits towards the end are not really part of the answer. This is to make the computations do-able in a reasonable amount of time and space. If you are willing to use unbounded time and space to work with numbers, then you can use arbitrary-precision numbers and math, which is what "use bignum" enables. It's slower and uses more memory, but it works like math you learned in elementary school.
In general, it's best to learn more about how floating point math works before converting your program to arbitrary-precision math. It's only needed in very strange situations.
The whole issue of floating point precision has been answered, but you're still seeing the problem despite bignum. Why? The culprit is printf. bignum is a shallow pragma. It only affects how numbers are represented in variables and math operations. Even though bignum makes Perl do the math right, printf is still implemented in C. %f takes your precise number and turns it right back into an imprecise floating point number.
Print your numbers with just print and they should do fine. You'll have to format them manually.
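A small sketch of how that looks for the original program (assuming a reasonably recent bignum; the trailing zeros in the literal are not preserved):
use strict;
use warnings;
use bignum;    # numeric literals below become Math::BigFloat objects

my $x = 12345678901.234567000;
my $y = ($x - int($x)) * 1000000000;

# print keeps the exact decimal form; printf "%f" would convert back
# to an ordinary double and reintroduce the noise.
print "$x : $y\n";    # 12345678901.234567 : 234567000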
The other thing you can do is to recompile Perl with -Duse64bitint -Duselongdouble, which will force Perl to internally use 64-bit integers and long double floating-point numbers. This will give you a lot more accuracy, more consistently, and at almost no performance cost (bignum is a bit of a performance hog for math-intensive code). It's not 100% accurate like bignum, but it will affect things like printf. However, recompiling Perl this way makes it binary incompatible, so you're going to have to recompile all your extensions. If you do this, I suggest installing a fresh Perl in a different location (/usr/local/perl/64bit or something) rather than trying to manage parallel Perl installs sharing the same library.
Homework (Googlework?) for you: How are floating point numbers represented by computers?
You can only have a limited number of precise digits; everything beyond that is just noise from the base conversion (binary to decimal). That is also why the last digit of your $x appears to be 8.
$x - int($x) is 0.23456linenoise, which is also a floating point number. Multiplied by 1000000000, it gives another floating point number, with more random digits pulled from the incommensurability of the bases.
Perl does not do arbitrary precision arithmetic for its built-in floating point types. So your initial variable $x is an approximation. You can see this by doing:
$ perl -e 'printf "%.10f", 12345678901.234567000'
12345678901.2345676422
This answer works on my x64 platform by accommodating the scale of the errors:
sub safe_eq {
    my ($var1, $var2) = @_;
    return 1 if $var1 == $var2;
    my $dust;
    if ($var2 == 0) { $dust = abs($var1); }
    else            { $dust = abs(($var1 / $var2) - 1); }
    return 0 if $dust > 5.32907051820076e-15;   # 5.32907051820075e-15
    return 1;
}
You can build on the above to solve most of your problems.
Avoid bignum if you can - it's stupendously slow - plus it will not solve any problems if you've got to store your numbers anyplace like a DB or in JSON etc.
This has to do with the (limited) accuracy of the floating point computations a computer does. Generally when comparing floating point numbers you should compare with a suitable epsilon:
$value1 == $value2 or warn;
won't work as expected in most cases. You should do
use constant EPSILON => 1.0e-10;
abs($value1 - $value2) < EPSILON or warn;
EPSILON should be chosen such that it takes into account the complexity of the computations that produced $value1 and $value2. A large computation might lead to a much, much larger EPSILON.
The other option is, as suggested by others:
sprintf("%.5f", value1) eq sprintf("%.5f", value2) or warn;
Or use an arbitrary precision math library.