sprintf : fixed point, big numbers and precision loss - perl

I need to read some numbers in a database and write them into a text file using Perl.
In the table that holds the numbers, the column is defined as numeric (25,5) (that is, 25 digits in total, 5 of them decimals).
I format the numbers in my file with sprintf "%.5f", $myvalue to force 5 decimals, and I just noticed that for large values there is a precision loss for numbers with more than 17 digits:
db = 123.12345
file = 123.12345 (OK)
db = 12345678901234891.12345
file = 12345678901234892.00000 (seems to be rounded to upper integer)
db = 12345678901234567890.12345
file = 12345678901234567000.00000 (truncation ?)
What is Perl's greatest precision for fixed decimal numbers?
I am aware of the concepts and limitations of floating point arithmetic in general, but I am not a Perl monk and I do not know the internals of Perl, so I don't know if this is normal (or if it is related to floating point at all). I am not sure either whether it is an internal limitation of Perl or a problem related to the sprintf processing.
Is there a workaround or a dedicated module that could help with that problem?
Some notable points :
this is an additional feature of a system that already uses Perl, so using another tool is not an option
the data being crunched is financial so I need to keep every cent and I cannot cope with a +/- 10 000 units precision :^S

Once again, I am finding a solution right after asking SO. I am putting my solution here, to help a future visitor :
replace
$myout = sprintf "%.5f", $myvalue;
by
use Math::BigFloat;
$myout = Math::BigFloat->new($myvalue)->ffround( -5 )->bstr;
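The same idea can be sketched as a small reusable helper. This is only an illustration under two assumptions: the value reaches Perl as a string (once it has been turned into a native number, the extra digits are already gone), and the helper name format_fixed is made up for this example. It uses bfround, the documented Math::BigFloat method for rounding to a given number of digits after the decimal point.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: round a decimal string to $places digits after the
# point without ever passing through a native double.
sub format_fixed {
    my ($value, $places) = @_;
    $places = 5 unless defined $places;
    # A negative argument to bfround means "digits after the decimal point".
    return Math::BigFloat->new($value)->bfround(-$places)->bstr;
}

# Values are quoted so that Perl never parses them as floating-point literals.
print format_fixed('12345678901234567890.12345'), "\n";
print format_fixed('0.123456'), "\n";
```

Quoting the input matters: an unquoted literal would be converted to a double by the Perl parser before Math::BigFloat ever sees it.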

Without modules like Math::BigFloat, everything above 16 digits is pure magic... e.g.
perl -e 'printf "*10^%02d: %-.50g\n", $_, log(42)*(10**$_) for (0..20)'
produces
*10^00: 3.7376696182833684112267746968427672982215881347656
*10^01: 37.376696182833683224089327268302440643310546875
*10^02: 373.76696182833683224089327268302440643310546875
*10^03: 3737.6696182833684360957704484462738037109375
*10^04: 37376.6961828336861799471080303192138671875
*10^05: 373766.96182833681814372539520263671875
*10^06: 3737669.6182833681814372539520263671875
*10^07: 37376696.18283368647098541259765625
*10^08: 373766961.82833683490753173828125
*10^09: 3737669618.283368587493896484375
*10^10: 37376696182.83368682861328125
*10^11: 373766961828.33685302734375
*10^12: 3737669618283.36865234375
*10^13: 37376696182833.6875
*10^14: 373766961828336.8125
*10^15: 3737669618283368.5
*10^16: 37376696182833688
*10^17: 373766961828336832
*10^18: 3737669618283368448
*10^19: 37376696182833684480
*10^20: 373766961828336828416

What is Perl's greatest precision for fixed decimal numbers?
Perl doesn't have fixed point decimal numbers. Very few languages do, actually. You could use a module like Math::FixedPoint, though.

Perl is storing your values as floating-point numbers internally [1]. The precision is dependent on how your version of Perl is compiled, but it's probably a 64-bit double.
C:\>perl -MConfig -E "say $Config::Config{doublesize}"
8
A 64-bit double-precision float [2] has a 53-bit significand (a.k.a. fraction or mantissa), which gives it approximately 16 decimal digits of precision. Your database is defined as storing 25 digits of precision. You'll be fine if you treat the data as a string, but if you treat it as a number you'll lose precision.
Perl's bignum pragma provides transparent support for arbitrarily large numbers. It can slow things down considerably so limit its use to the smallest possible scope. If you want big floats only (without making other numeric types "big") use Math::BigFloat instead.
1. Internally, perl uses a datatype called an SV that can hold floats, ints, and/or strings simultaneously.
2. Assuming IEEE 754 format.
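To make the string-versus-number distinction concrete, here is a minimal sketch. It assumes a Perl built with ordinary 64-bit doubles and uses one of the values from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Math::BigFloat;

my $from_db = '12345678901234891.12345';   # kept as a string, as fetched

# Numeric context converts the string to a native double and drops digits.
my $as_double = sprintf '%.5f', $from_db;

# Math::BigFloat parses the string itself, so every digit survives,
# even through arithmetic.
my $exact = Math::BigFloat->new($from_db);
my $sum   = $exact + Math::BigFloat->new('0.00001');

print "string : $from_db\n";
print "double : $as_double\n";
print "bigsum : ", $sum->bstr, "\n";
```

Only the two lines that go through Math::BigFloat or leave the value as a string keep all five decimals.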

Alternatively, if you're just transferring the values from the database to a text file and not operating on them as numbers, then have the DB format them as strings. Then read and print them as strings (perhaps using "printf '%s'"). For example:
select Big_fixed_point_col(format '-Z(24)9.9(5)')(CHAR(32))

Related

Simple addition in Perl causing result to be off by small fraction [duplicate]

Possible Duplicates:
Why is floating point arithmetic in C# imprecise?
Why does ghci say that 1.1 + 1.1 + 1.1 > 3.3 is True?
#!/usr/bin/perl
$l1 = "0+0.590580+0.583742+0.579787+0.564928+0.504538+0.459805+0.433273+0.384211+0.3035810";
$l2 = "0+0.590580+0.583742+0.579788+0.564928+0.504538+0.459805+0.433272+0.384211+0.3035810";
$val1 = eval ($l1);
$val2 = eval ($l2);
$diff = (($val1 - $val2)/$val1)*100;
print " (($val1 - $val2)/$val1)*100 ==> $diff\n";
Surprisingly the output ended up to be
((4.404445 - 4.404445)/4.404445)*100 ==> -2.01655014354845e-14.
Isn't it supposed to be zero?
Can anyone explain this, please?
What every computer scientist should know about floating point arithmetic
See Why is floating point arithmetic in C# imprecise?
This isn't Perl related, but floating point related.
It's pretty close to zero, which is what I'd expect.
Why's it supposed to be zero? 0.579787 != 0.579788 and 0.433273 != 0.433272. It's likely that none of these have an exact floating point representation so you should expect some inaccuracies.
From perlfaq4's answer to Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?:
Internally, your computer represents floating-point numbers in binary. Digital (as in powers of two) computers cannot store all numbers exactly. Some real numbers lose precision in the process. This is a problem with how computers store numbers and affects all computer languages, not just Perl.
perlnumber shows the gory details of number representations and conversions.
To limit the number of decimal places in your numbers, you can use the printf or sprintf function. See "Floating-point Arithmetic" in perlop for more details.
printf "%.2f", 10/3;
my $number = sprintf "%.2f", 10/3;
When you change the two strings to be equal (there are 2 digits different between $l1 and $l2) it does indeed result in zero.
What it is demonstrating is that you can create 2 different floating point numbers ($val1 and $val2) that look the same when printed out, but internally have a tiny difference. These differences can be magnified up if you're not careful.
Vinko Vrsalovic posted some good links to explain why.
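A short sketch of that effect, assuming a Perl built with 64-bit doubles: two values that print identically by default but are not equal, plus a tolerance-based comparison. The helper nearly_equal and its default tolerance are illustrative choices, not something from the answers above.

```perl
use strict;
use warnings;

my $x = 0.1 + 0.2;
my $y = 0.3;

# Default stringification keeps about 15 significant digits, so the two look alike.
print "default : $x vs $y\n";
# Asking for 17 significant digits exposes the difference.
printf "%%.17g   : %.17g vs %.17g\n", $x, $y;

# Hypothetical helper: compare with a relative tolerance instead of ==.
sub nearly_equal {
    my ($p, $q, $eps) = @_;
    $eps = 1e-9 unless defined $eps;
    my $scale = abs($p) > abs($q) ? abs($p) : abs($q);
    return abs($p - $q) <= $eps * ($scale || 1);
}

print "==           : ", ($x == $y ? "equal" : "different"), "\n";
print "nearly_equal : ", (nearly_equal($x, $y) ? "equal" : "different"), "\n";
```

The right tolerance depends on the data; 1e-9 is just a placeholder.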

Convert Integer in to decimal Perl

I need to convert the following integers
29900
17940
1
into decimal format like
299.00
179.40
0.01
I already tried Data/Types.pm, but
to_decimal(1, 2)
returns 1.00
Perl does not have data types in the sense of integer, float or string. All you need to do is divide by 100. If you want an output with two decimals, use sprintf to format it.
printf '%.2f', 29900 / 100;
Will output 299.00. Note that printf is like sprintf, but with printing instead of returning.
You can read perldata to learn more about what kinds of data Perl has.
Under the hood at the XS and C layer, there are of course data types. You can learn about them in perlguts. But the whole point of a higher language is to abstract those things away. So if all you do is write Perl code, you never need to care that those exist or how they work.
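Applied to the three inputs from the question, the divide-and-format approach might look like this. The helper name cents_to_decimal is made up for illustration, and the sketch assumes the integers are small enough to be represented exactly as native numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: treat an integer as a count of hundredths and
# format it with exactly two decimals.
sub cents_to_decimal {
    my ($cents) = @_;
    return sprintf '%.2f', $cents / 100;
}

print cents_to_decimal($_), "\n" for 29900, 17940, 1;
```

sprintf returns the formatted string, so the result can be stored or printed as needed.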

How can I eliminate floating point inaccuracy when I pack and unpack a floating point number?

I am packing an array of numbers to send via UDP to another piece of hardware using socket programming.
When I pack the number 12.2 and then unpack it, I get 12.1999998092651. As I am working with numbers related to latitudes and longitudes, I cannot have such deviations.
This is the simple script I wrote:
use warnings;
use Time::HiRes qw (sleep);
@Data = ( 20.2, 30.23, 40.121, 1, 2, 3, 4, 6.4, 3.2, 9.9, 0.1, 12.2, 0.99, 7.8, 999, 12.3 );
$myArr = pack('f*', @Data);
print "$myArr\n\n";
@Dec = unpack('f*',$myArr);
print "@Dec";
The output is:
20.2000007629395 30.2299995422363 40.1209983825684 1 2 3 4 6.40000009536743 3.20000004768372 9.89999961853027 0.100000001490116 12.1999998092651 0.990000009536743 7.80000019073486 999 12.3000001907349
Is there any way I can control the precision?
pack's f template is for single-precision floating point numbers, which on most platforms are good to 7 or so significant decimal digits. The d template offers double precision and will be good enough for ~15 significant digits.
print unpack("f", pack("f",12.2)); # "12.1999998092651"
print unpack("d", pack("d",12.2)); # "12.2"
printf "%.20f",unpack("f", pack("f",12.2)); # "12.19999980926513671875"
printf "%.20f",unpack("d", pack("d",12.2)); # "12.19999999999999928946"
The short answer is: don't pack these numbers as floats. You will lose accuracy due to IEEE floating point representation. Instead, convert them to "character decimals" (i.e. strings), and pack them as strings. If you really need the accuracy, and don't need to perform math operations on them, you may want to store them as strings in Perl as well.
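One way to pack decimal strings instead of floats is sketched below. It assumes the receiving side can be made to read NUL-terminated ASCII strings, which is a protocol decision the question does not settle; the variable names are illustrative.

```perl
use strict;
use warnings;

# Keep the values as decimal strings so no binary float conversion happens.
my @data = qw(20.2 30.23 40.121 12.2);

# Z* writes each string followed by a NUL byte; (...)* repeats the group
# once per list element.
my $packet = pack '(Z*)*', @data;

# The same template splits the buffer back into the original strings.
my @back = unpack '(Z*)*', $packet;

print "@back\n";
```

The cost is a larger, variable-length packet compared with fixed 4-byte or 8-byte floats.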
2/10 is a periodic number in binary just like 1/3 is a periodic number in decimal. It's impossible to store it exactly in a floating point number as it would take infinite storage.
As such, it's not pack that's introducing the error; it's faithfully storing precisely the number you provided it.
$ perl -E'say sprintf "%.20e", 12.2'
1.21999999999999992895e+01
$ perl -E'say sprintf "%.20e", unpack "d", pack "d", 12.2'
1.21999999999999992895e+01
As long as you use floating point numbers, you will not be able to store 12.2 exactly.
But as you can see above, you can store it precisely enough by using d (double-precision, almost 16 digits of precision) instead of f (single-precision, over 7 digits of precision). Perl uses double-precision, so you were actually introducing precision loss by using f instead of d.
So use d, and round your results (sprintf "%.10f").
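A minimal sketch of that advice, assuming a Perl whose native float is a 64-bit double and using a few values from the question. The six decimals in the rounding step are an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my @data = (20.2, 30.23, 40.121, 12.2);

# Round trip through single precision (f) and double precision (d).
my @single = unpack 'f*', pack('f*', @data);
my @double = unpack 'd*', pack('d*', @data);

print "f: @single\n";   # shows the single-precision noise
print "d: @double\n";   # identical to the input values

# Round for display or comparison after unpacking.
my @rounded = map { sprintf '%.6f', $_ } @double;
print "rounded: @rounded\n";
```

Both ends of the link have to agree on the template, so switching from f to d also doubles the payload size per value.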

How to stop matlab truncating long numbers

These two long numbers are the same except for the last digit.
test = [];
test(1) = 33777100285870080;
test(2) = 33777100285870082;
but the last digit is lost when the numbers are put in the array:
unique(test)
ans = 3.3777e+16
How can I prevent this? The numbers are ID codes and losing the last digit is screwing everything up.
Matlab uses 64-bit floating point representation by default for numbers. Those have a base-10 16-digit precision (more or less) and your numbers seem to exceed that.
Use something like uint64 to store your numbers:
> test = [uint64(33777100285870080); uint64(33777100285870082)];
> disp(test(1));
33777100285870080
> disp(test(2));
33777100285870082
This is really a rounding error, not a display error. To get the correct strings for output purposes, use int2str, because, again, num2str uses a 64-bit floating point representation, and that has rounding errors in this case.
To add more explanation to @rubenvb's solution, your values are greater than flintmax for IEEE 754 double precision floating-point, i.e., greater than 2^53. After this point not all integers can be exactly represented as doubles. See also this related question.

Perl pack and unpack functions

I am trying to unpack a variable containing a string received from a spectrum analyzer:
#42404?û¢-+Ä¢-VÄ¢-oÆ¢-8æ¢-bÉ¢-ôÿ¢-+Ä¢-?Ö¢-sÉ¢-ÜÖ¢-¦ö¢-=Æ¢-8æ¢-uô¢-=Æ¢-\Å¢-uô¢-?ü¢-}¦¢-=Æ¢-)...
The format is real 32 which uses four bytes to store each value. The number #42404 represents 4 extra bytes present and 2404/4 = 601 points collected. The data starts after #42404. Now when I receive this into a string variable,
$lp = ibqry($ud,":TRAC:DATA? TRACE1;*WAI;");
I am not sure how to convert this into an array of numbers. Should I use something like the following?
@dec = unpack("d", $lp);
I know this is not working, because I am not getting the right values, and the number of data points is certainly not 601.
First, you have to strip the #42404 off and hope none of the following binary data happens to be an ASCII number.
$lp =~ s{^#\d+}{};
I'm not sure what format "Real 32" is, but I'm going to guess that it's a single precision floating point number, which is 32 bits long. Looking at the pack docs, d is "double precision float", which is 64 bits. So I'd try f, which is "single precision".
@dec = unpack("f*", $lp);
Whether your data is big or little endian is a problem. d and f use your computer's native endianness. You may have to force endianness using the > and < modifiers.
@dec = unpack("f>*", $lp); # big endian
@dec = unpack("f<*", $lp); # little endian
If the first 4 encodes the number of remaining digits (2404) before the floats, then something like this might work:
my @dec = unpack "x a/x f>*", $lp;
The x skips the leading #, the a/x reads one digit and skips that many characters after it, and the f>* parses the remaining string as a sequence of 32-bit big-endian floats. (If the output looks weird, try using f<* instead.)
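The header handling can also be written out explicitly, which makes it easier to check the byte count. The sketch below assumes the layout described in the question ('#', one digit giving the number of length digits, the byte count, then the payload) and is tested only on synthetic data; the helper name parse_block is made up, and the endianness template still has to be chosen by trial. The > and < modifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split a block of the form
#   '#' <N> <N digits of byte count> <payload>
# and unpack the payload with the given template.
sub parse_block {
    my ($raw, $template) = @_;
    my ($ndigits) = $raw =~ /\A#([1-9])/
        or die "no '#<digit>' block header found\n";
    my $count = substr $raw, 2, $ndigits;
    $count =~ /\A\d{$ndigits}\z/
        or die "byte count is not $ndigits digits\n";
    # Take exactly $count bytes, so digits inside the binary data and any
    # trailing terminator are never mistaken for part of the header.
    my $payload = substr $raw, 2 + $ndigits, $count;
    length($payload) == $count
        or die "expected $count payload bytes, got " . length($payload) . "\n";
    return unpack $template, $payload;
}

# Synthetic example: three big-endian 32-bit floats are 12 bytes, so "#212".
my $demo   = '#212' . pack('f>*', 1.5, -2.25, 0.5);
my @points = parse_block($demo, 'f>*');
print scalar(@points), " points: @points\n";
```

For the trace in the question the call would be parse_block($lp, 'f>*') or parse_block($lp, 'f<*'), and a correct parse should return 601 values.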