Does scientific notation affect Perl's precision? - perl

I encountered some weird behaviour in Perl. The following subtraction should yield zero as its result (which it does in Python):
print 7.6178E-01 - 0.76178
-1.11022302462516e-16
Why does this occur, and how can I avoid it?
P.S. Effect appears on "v5.10.0 built for x86_64-linux-gnu-thread-multi" (Ubuntu 9.04) and "v5.8.9 built for darwin-2level" (Mac OS 10.6)

It's not that scientific notation affects the precision so much as the limitations of floating-point numbers represented in binary. See these answers from perlfaq4. This is a problem for any language that relies on the underlying architecture for number storage.
Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
Why is int() broken?
If you need better number handling, check out the bignum pragma.
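For example, here is a minimal sketch of that fix applied to the subtraction from the question. The bignum pragma overloads numeric literals and operators with Math::BigFloat, so the decimal values stay exact:
use bignum;
print 7.6178E-01 - 0.76178;   # prints 0 instead of -1.11022302462516e-16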

Related

Why does NumberLong(9007199254740993) match NumberLong(9007199254740992) in MongoDB's mongo shell?

This happens when the given number is large enough (greater than 9007199254740992). With further testing, I found that many adjacent numbers can match a single number.
Not only does NumberLong(9007199254740996) match NumberLong("9007199254740996"), but so do NumberLong(9007199254740995) and NumberLong(9007199254740997).
When I want to act on a record using its number, I can actually use three different adjacent numbers to get back the same record.
The accepted answer from here makes sense; I quote the most relevant part below:
Caveat: Don't try to invoke the constructor with a too-large number, i.e. don't try db.foo.insert({"t" : NumberLong(1234657890132456789)}); since that number is way too large for a double, it will cause roundoff errors. The above number would be converted to NumberLong("1234657890132456704"), which is obviously wrong.
Here are some additional details to make things clearer:
Firstly, the mongo shell is a JavaScript shell, and JS does not distinguish between integer and floating-point values: all numbers in JS are represented as floating-point values, which means the mongo shell uses 64-bit floating-point numbers by default. If the shell sees the quoted "9007199254740995", it treats it as a string and converts it to a long long. But when we omit the double quotes, the shell sees the unquoted 9007199254740995 and treats it as a floating-point number.
Secondly, JS uses the 64-bit floating-point format defined in the IEEE 754 standard to represent numbers. The maximum value it can represent is ±1.7976931348623157 × 10^308, and the smallest positive value is 5 × 10^-324.
There are infinitely many real numbers, but only a limited number of them can be represented exactly in the JS floating-point format. This means that when you work with real numbers in JS, the representation is usually an approximation of the actual number.
This leads to the so-called rounding-error issue. Because integers are also stored in binary floating-point format, the reason trailing digits of precision are lost is the same as it is for decimals.
The JS number format can represent all integers between -2^53 (-9007199254740992) and 2^53 (9007199254740992) exactly.
Here, since the numbers are bigger than 9007199254740992, rounding error certainly occurs. The binary representations of NumberLong(9007199254740995), NumberLong(9007199254740996) and NumberLong(9007199254740997) are identical, so when we query with any of these three numbers we are practically asking for the same thing. As a result, we get back the same record.
I think understanding that this problem is not specific to JS is important: it affects any programming language that uses binary floating point numbers.
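The effect is easy to reproduce in Perl, which uses the same IEEE 754 doubles. A minimal sketch (the trailing .0 forces floating-point context, since a modern Perl would otherwise compare these as exact 64-bit integers):
# Integers above 2**53 cannot all be represented exactly as doubles:
my $a = 9007199254740993.0;
my $b = 9007199254740992.0;
print $a == $b ? "equal\n" : "different\n";   # prints "equal"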
You are misusing the NumberLong constructor.
The correct usage is to give it a string argument, as stated in the relevant documentation.
NumberLong("2090845886852")

Is it possible to predict when Perl's decimal/float math will be wrong? [duplicate]

This question already has answers here:
Why can't decimal numbers be represented exactly in binary?
(22 answers)
Closed 7 years ago.
In one respect, I understand that Perl's floats are inexact binary representations, which causes Perl's math to sometimes be wrong. What I don't understand is why these floats sometimes seem to give exact answers and other times do not. Is it possible to predict when Perl's float math will give the wrong (i.e., inexact) answer?
For instance, in the code below, Perl's math is wrong 1 time when the subtraction is "16.12 - 15.13", wrong 2 times when the problem is "26.12 - 25.13", and wrong 20 times when the problem is "36.12 - 35.13". Furthermore, for some reason, in all of the above-mentioned test cases, the result of our subtraction problem (i.e. $subtraction_problem) starts out wrong but tends to become more correct the more we add to or subtract from it (with $x). This makes no sense; why is it that the more we add to or subtract from our arithmetic problem, the more likely it becomes that the value is correct (i.e. exact)?
my $subtraction_problem = 16.12 - 15.13;
my $perl_math_failures = 0;
for (my $x = -25; $x < 25; $x++) {
    my $result = $subtraction_problem + $x;
    print "$result\n";
    $perl_math_failures++ if length $result > 6;
}
print "There were $perl_math_failures perl math failures!\n";
None of this is Perl specific. See Goldberg:
Rounding Error
Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation. Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits. In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation. This rounding error is the characteristic feature of floating-point computation. The section Relative Error and Ulps describes how it is measured.
Since most floating-point calculations have rounding error anyway, does it matter if the basic arithmetic operations introduce a little bit more rounding error than necessary? That question is a main theme throughout this section. The section Guard Digits discusses guard digits, a means of reducing the error when subtracting two nearby numbers. Guard digits were considered sufficiently important by IBM that in 1968 it added a guard digit to the double precision format in the System/360 architecture (single precision already had a guard digit), and retrofitted all existing machines in the field. Two examples are given to illustrate the utility of guard digits.
The IEEE standard goes further than just requiring the use of a guard digit. It gives an algorithm for addition, subtraction, multiplication, division and square root, and requires that implementations produce the same result as that algorithm. Thus, when a program is moved from one machine to another, the results of the basic operations will be the same in every bit if both machines support the IEEE standard. This greatly simplifies the porting of programs. Other uses of this precise specification are given in Exactly Rounded Operations.
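To see where the inexactness enters in the example above, you can print each value with 17 significant digits, which is enough to round-trip an IEEE 754 double. A minimal sketch in Perl:
printf "%.17g\n", 16.12;           # 16.120000000000001
printf "%.17g\n", 15.13;           # 15.130000000000001
printf "%.17g\n", 16.12 - 15.13;   # 0.99000000000000021
Neither operand is stored exactly, so the difference cannot be exactly 0.99; whether the default 15-digit output happens to hide the error is what makes the failures look unpredictable.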

Are there any real-world uses for converting numbers between different bases?

I know that we need to convert decimal, octal, and hexadecimal into binary, but I am confused about converting decimal to octal, octal to hexadecimal, or decimal to hexadecimal.
Why and where do we need these kinds of conversions?
Different bases are good for different purposes.
Decimal is obviously what most people know how to deal with, so is good for output of real quantities to end users.
Hex is short and has an even ratio of exactly 2 characters per byte, so it's good for expressing large numbers like SHA1 hashes or private keys and the like in a type-able format, particularly since those numbers don't really represent a quantity, so users don't need to be able to understand them as numbers.
Octal is mostly for legacy reasons -- UNIX file permission codes are traditionally expressed as octal numbers, for example, because three bits per digit corresponds nicely to the three bits per user-category of the UNIX permission encoding scheme.
One sometimes wants to use numbers in one base for a purpose where another base is desired, hence the various conversion functions available. In truth, however, my experience is that in practice you almost never convert directly from one base to another, except to convert numbers from some non-binary base into binary (in the form of your language of choice's native integral type) and back out into whatever base you need for output. Most of the time you go from one non-binary base to another is when learning about bases and getting a feel for what numbers in different bases look like, or when debugging using hexadecimal output. Even then, when a computer does it, the main method is to convert to binary and then back out, because current computers are just inherently good at dealing with base-2 numbers and not so good at anything else.
One important place you see numbers actually stored and operated on in decimal is in some financial applications or others where it's important that "number-of-decimal-place" level precision be preserved. Sometimes fixed-point arithmetic can work for currency, but not always, and if it doesn't using binary-floating-point is a bad idea. Older systems actually had built in support for this in the form of binary-coded-decimal arithmetic. In BCD, each 4 bits acts as a decimal digit, so you give up a chunk of every 4 bits of storage in exchange for maintaining your level of precision in the base-of-choice of the non-computing world.
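As a minimal sketch of the fixed-point idea (in Perl, with hypothetical prices), keeping currency as an integer count of cents keeps the arithmetic exact:
my $price_cents = 1995;                # $19.95, stored exactly as an integer
my $total_cents = $price_cents * 3;    # 5985, no rounding error
printf "total: \$%d.%02d\n", int($total_cents / 100), $total_cents % 100;
# prints: total: $59.85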
Oddly enough, there is one common use case for other bases that's a bit hidden. Modern languages with large-number support (e.g. Python 2.x's long type or Java's BigInteger and BigDecimal types) will usually store the numbers internally in an array, with each element being a digit in some base. Then they implement the math they support on strings of digits of that base. Really efficient bigint implementations may actually use a base approaching 2^(bits in machine native word size); a base-2^64 number is obviously impossible to usefully output in that form, but doing the calculations in chunks of that size ends up making the best use of space and the CPU. (I don't know if that's the best base; it may be best to use a base of half that number of bits to simplify overflow handling from one digit to the next. It's been a while since I wrote my own bigint, and I never implemented the faster/more-complicated versions of multiplication and division.)
MIME uses the hexadecimal system for Quoted-Printable encoding (e.g. mail subjects in Unicode) and a base-64 system for Base64 encoding.
If your workplace is stuck on IPv4 CIDR, you'll be doing quite a lot of bin -> hex -> decimal conversions managing most of the networking equipment, until you get them memorized (or just use some random, simple tool).
Even that usage is a bit few and far between; most businesses just adopt the lazy "/24 everything" approach.
If you do a lot of graphics work, there's a chance you'll want to convert colors between systems and need to convert from hex -> dec... most tools have this built into the color picker, though.
I suppose there's no practical reason to be able to do it, other than that it's really simple and there's no point in not learning how. :)
... unless, for some reason, you're trying to do mantissa binary math in your head.
All of these bases have their uses. Hexadecimal in particular is useful as a shorthand for binary. Every hexadecimal digit is equivalent to 4 bits, so you can write a full 32-bit value as a string of 8 hex digits. Likewise, octal digits are equivalent to 3 bits, and are used frequently as a shorthand for things like Unix file permissions (777 = set read, write, execute bits for user/group/other).
No one base is special--they all have their (obscure) uses. Decimal is special to us because it reflects human experience (10 fingers) but that's really the only reason.
A real-world use case: a program prints an error code in decimal; to get info from a database or the internet, you need the hexadecimal format; and because the bits of the error 'number' convey extra info, you need to look at it in binary.
I'm sure there are occasional uses for this. One use case would be a little app that lets a user convert decimal to octal, like you can with lots of calculators.
But I'm not sure I understand the point of the question. Standard libraries typically don't provide methods like String toOctal(String decimal); instead, you would normally convert from a decimal string to a primitive integer, and then from the primitive integer to (say) an octal string.
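In Perl, for example, that two-step route looks like this (a minimal sketch; oct() parses strings with 0x, 0b, or leading-zero prefixes into native integers):
my $n = oct("0x1f4");         # hex string -> native integer (500)
printf "decimal: %d\n", $n;   # 500
printf "octal:   %o\n", $n;   # 764
printf "hex:     %x\n", $n;   # 1f4
printf "binary:  %b\n", $n;   # 111110100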

How to eliminate Perl rounding errors

Consider the following program:
$x = 12345678901.234567000;
$y = ($x - int($x)) * 1000000000;
printf("%f:%f\n", $x, $y);
Here's what it prints:
12345678901.234568:234567642.211914
I was expecting:
12345678901.234567:234567000
This appears to be some sort of rounding issue in Perl.
How could I change it to get 234567000 instead?
Did I do something wrong?
This is a frequently-asked question.
Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
Internally, your computer represents floating-point numbers in binary. Digital (as in powers of two) computers cannot store all numbers exactly. Some real numbers lose precision in the process. This is a problem with how computers store numbers and affects all computer languages, not just Perl.
perlnumber shows the gory details of number representations and conversions.
To limit the number of decimal places in your numbers, you can use the printf or sprintf function. See the section on floating-point arithmetic for more details.
printf "%.2f", 10/3;
my $number = sprintf "%.2f", 10/3;
Make "use bignum;" the first line of your program.
Other answers explain what to expect when using floating point arithmetic -- that some digits towards the end are not really part of the answer. This is to make the computations do-able in a reasonable amount of time and space. If you are willing to use unbounded time and space to work with numbers, then you can use arbitrary-precision numbers and math, which is what "use bignum" enables. It's slower and uses more memory, but it works like math you learned in elementary school.
In general, it's best to learn more about how floating point math works before converting your program to arbitrary-precision math. It's only needed in very strange situations.
The whole issue of floating-point precision has been answered, but you're still seeing the problem despite bignum. Why? The culprit is printf. bignum is a shallow pragma: it only affects how numbers are represented in variables and in math operations. Even though bignum makes Perl do the math correctly, printf is still implemented in C, and %f takes your precise number and turns it right back into an imprecise floating-point number.
Print your numbers with just print and they should come out fine; you'll have to format them manually.
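A minimal sketch of the caveat described above (output shown for a typical 64-bit build):
use bignum;
my $x = 12345678901.234567;
print $x, "\n";      # 12345678901.234567 -- bignum-aware output
printf "%f\n", $x;   # 12345678901.234568 -- %f coerces the value back
                     # to a native double, reintroducing the error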
The other thing you can do is recompile Perl with -Duse64bitint -Duselongdouble, which will force Perl to internally use 64-bit integers and long double floating-point numbers. This will give you a lot more accuracy, more consistently, and at almost no performance cost (bignum is a bit of a performance hog for math-intensive code). It's not 100% accurate like bignum, but it will affect things like printf. However, recompiling Perl this way makes it binary-incompatible, so you're going to have to recompile all your extensions. If you do this, I suggest installing a fresh Perl in a different location (/usr/local/perl/64bit or something) rather than trying to manage parallel Perl installs sharing the same library.
Homework (Googlework?) for you: How are floating point numbers represented by computers?
You can only have a limited number of precise digits, everything beyond that is just the noise from base conversion (binary to decimal). That is also why the last digit of your $x appears to be 8.
$x - int($x) is 0.23456linenoise, which is also a floating-point number. Multiplied by 1000000000, it gives another floating-point number, with more random digits pulled from the incommensurability of the bases.
Perl does not do arbitrary precision arithmetic for its built-in floating point types. So your initial variable $x is an approximation. You can see this by doing:
$ perl -e 'printf "%.10f", 12345678901.234567000'
12345678901.2345676422
This approach works on my x64 platform by accommodating the scale of the errors:
sub safe_eq {
    my ($var1, $var2) = @_;
    return 1 if $var1 == $var2;
    my $dust;
    if ($var2 == 0) { $dust = abs($var1); }
    else            { $dust = abs(($var1 / $var2) - 1); }
    return 0 if $dust > 5.32907051820076e-15;   # 5.32907051820075e-15
    return 1;
}
You can build on the above to solve most of your problems.
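Hypothetical usage, with the values from the first question on this page:
# safe_eq() treats relative differences below the dust threshold as equality:
print safe_eq(7.6178E-01, 0.76178) ? "equal\n" : "not equal\n";   # equal
print safe_eq(0.1 + 0.2, 0.3)      ? "equal\n" : "not equal\n";   # equal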
Avoid bignum if you can; it's stupendously slow, plus it will not solve any problems if you've got to store your numbers anywhere, like in a DB or in JSON.
This has to do with the (limited) accuracy of the floating point computations a computer does. Generally when comparing floating point numbers you should compare with a suitable epsilon:
$value1 == $value2 or warn;
won't work as expected in most cases. You should do
use constant EPSILON => 1.0e-10;
abs($value1 - $value2) < EPSILON or warn;
EPSILON should be chosen such that it takes into account the complexity of the computations for valueX. A large computation might lead to a much, much larger EPSILON.
The other option is, as suggested by others:
sprintf("%.5f", value1) eq sprintf("%.5f", value2) or warn;
Or use an arbitrary precision math library.
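For the arbitrary-precision route, a minimal sketch with the core Math::BigFloat module (pass the numbers as strings so they are parsed as exact decimals):
use Math::BigFloat;
my $a = Math::BigFloat->new("7.6178E-01");
my $b = Math::BigFloat->new("0.76178");
print $a->bsub($b), "\n";   # prints 0 -- exact decimal arithmetic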

Problem with very small numbers?

I tried to assign a very small number to a double value, like so:
double verySmall = 0.000000001;
That's 9 fractional digits. For some reason, when I multiply this value by 10, I get something like 0.000000007. I slightly remember there being problems with writing numbers like this in plain text in source code. Do I have to wrap it in some function or a directive in order to feed it correctly to the compiler? Or is it fine to type such small numbers in as text?
The problem is with floating-point arithmetic, not with writing literals in source code. It is not designed to be exact. The best way around it is to avoid the built-in double: use integers only (if possible) with power-of-10 coefficients, sum everything up, and display the final useful figure after rounding.
Standard floating point numbers are not stored in a perfect format, they're stored in a format that's fairly compact and fairly easy to perform math on. They are imprecise at surprisingly small precision levels. But fast. More here.
If you're dealing with very small numbers, you'll want to see if Objective-C or Cocoa provides something analogous to the java.math.BigDecimal class in Java. This is designed precisely for dealing with numbers where precision is more important than speed. If there isn't one, you may need to port it (the source to BigDecimal is available and fairly straightforward).
EDIT: iKenndac points out the NSDecimalNumber class, which is the analogue for java.math.BigDecimal. No port required.
As usual, you need to read material like this in order to learn more about how floating-point numbers work on computers. You cannot expect to store any arbitrary fraction with perfect results, just as you can't expect to store any arbitrary integer. There are bits at the bottom, and their number is limited.