Can Perl detect if a floating point number has been implicitly rounded? - perl

When I use the code:
(sub {
use strict;
use warnings;
print 0.49999999999999994;
})->();
Perl outputs "0.5".
And when I remove one "9" from the number:
(sub {
use strict;
use warnings;
print 0.4999999999999994;
})->();
It prints 0.499999999999999.
Only when I remove another 9, it actually stores the number precisely.
I know that floating point numbers are a can of worms nobody wants to deal with, but I am curious if there is a way in Perl to "trap" this implicit conversion and die, so that I can use eval to catch this die and let the user know that the number they are trying to pass is not supported by Perl in its' native form(So the user can maybe pass a string or an object instead).
The reason why I need this is to avoid a situations like passing 0.49999999999999994 to be rounded by my function, but the number gets converted to 0.5, and in turn gets rounded to 1 instead of 0. I am not sure how to "intercept" this conversion so that my function "knows" that it did not actually get 0.5 as input, but that the user's input was intercepted.
Without knowing how to intercept this kind of conversion, I cannot trust "round" because I do not know whether it received my input as I sent it, or if that input has been modified(at compile time or runtime, not sure) before the function was called(and in turn, the function has no idea if the input it is operating on is the input the user intended or not and has no means to warn the user).
This is not a Perl unique problem, it happens in JavaScript:
(() => {
'use strict';
/* oops: 1 */
console.log(Math.round(0.49999999999999999))
})();
It happens in Ruby:
(Proc.new {
# oops: 1
print (0.49999999999999999.round)
}).call()
It happens in PHP:
<?php
(call_user_func(function() {
/* oops: 1 */
echo round(0.49999999999999999);
}));
?>
it even happens in C(which is okay to happen, but my gcc does not warn me that the number has not been stored precisely(when specifying specific floating point literals, they had better be stored exactly, or the compiler should warn you that it decided to turn it into another form(e.g. "Your number x cannot be represented in 64 bit/32 bit floating point form, so I converted it to y." ) so you can see if that's okay or not, in this case it is NOT)):
#include <math.h>
#include <stdio.h>
int main(int argc, char **argv)
{
/* oops: 1 */
printf("%f.\n", round(0.49999999999999999));
return 0;
}
Summary:
Is it possible to make Perl show error or warning on implicit conversions of floating numbers, or is this something that Perl5(along with other languages) are incapable of doing at this moment(e.g. The compiler does not go out of its' way to support such warnings/offer a flag to enable such warnings)?
e.g.
warning: the number 0.49999999999999994 is not representable, it has been converted to 0.5. using bigint might solve this. Consider reducing precision of the number.

Perhaps use BigNum:
$ perl -Mbignum -le 'print 0.49999999999999994'
0.49999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994+0.1'
0.59999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994-0.1'
0.39999999999999994
$ perl -Mbignum -le 'print 0.49999999999999994+10.1'
10.59999999999999994
It transparently extends precision of Perl floating point and ints to extended precision.

be aware that bignum is 150 times slower than internal and other math solutions, and will typicaly NOT solve your problem (as soon as you need to store your numbers in JSON or databases or whatever, you're back at the same problem again).
Typically, sprintf takes care of prettying your output for you, so you do not have to see the ugly imprecision, however, it's still there.
Here is an example which works on my x64 platform which understands how to deal with that imprecision.
This correctly tells you if the 2 numbers you're interested in are the same:
sub safe_eq {
my($var1,$var2)=#_;
return 1 if($var1==$var2);
my $dust;
if($var2==0) { $dust=abs($var1); }
else { $dust= abs(($var1/$var2)-1); }
return 0 if($dust>5.32907051820076e-15 ); # dust <= 5.32907051820075e-15
return 1;
}
You can build on top of this to solve all your problems.
It works by understanding the magnitude of the imprecision in your native numbers, and accommodating it.

As you said in the question, dealing with floating-point numbers in code is quite the can of worms, precisely because the standard floating-point representation, regardless of the precision employed, is incapable of accurately representing many decimal numbers. The only 100% reliable way around this is to not use floating-point numbers.
The easiest way to apply that is to instead use fixed-point numbers, although that limits precision to a fixed number of decimal places. e.g., Instead of storing 10.0050, define a convention that all numbers are stored to 4 decimal places and store 100050 instead.
But that doesn't seem likely to satisfy you, based on the minimal explanation you've given for what you're actually trying to accomplish (building a general-purpose math library). The next option, then, would be to store the number of decimal places as a scaling factor with each value. So 10.0050 would become an object containing the data { value => 100050, scale => 4 }.
This can then be extended into a more general "rational number" data type by effectively storing each number as a numerator and denominator, thus allowing you to precisely store numbers such as 1/3, which neither base 2 nor base 10 can represent exactly. This is, incidentally, the approach that I am told Perl 6 has taken. So, if switching to Perl 6 is an option, then you may find that it all Just Works for you once you do so.

Related

Does Perl's Glob have a limitation?

I am running the following expecting return strings of 5 characters:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}'x5) {
print "$_\n";
}
but it returns only 4 characters:
anbc
anbd
anbe
anbf
anbg
...
However, when I reduce the number of characters in the list:
while (glob '{a,b,c,d,e,f,g,h,i,j,k,l,m}'x5) {
print "$_\n";
}
it returns correctly:
aamid
aamie
aamif
aamig
aamih
...
Can someone please tell me what I am missing here, is there a limit of some sort? or is there a way around this?
If it makes any difference, It returns the same result in both perl 5.26 and perl 5.28
The glob first creates all possible file name expansions, so it will first generate the complete list from the shell-style glob/pattern it is given. Only then will it iterate over it, if used in scalar context. That's why it's so hard (impossible?) to escape the iterator without exhausting it; see this post.
In your first example that's 265 strings (11_881_376), each five chars long. So a list of ~12 million strings, with (naive) total in excess of 56Mb ... plus the overhead for a scalar, which I think at minimum is 12 bytes or such. So at the order of a 100Mb's, at the very least, right there in one list.โ€ 
I am not aware of any formal limits on lengths of things in Perl (other than in regex) but glob does all that internally and there must be undocumented limits -- perhaps some buffers are overrun somewhere, internally? It is a bit excessive.
As for a way around this -- generate that list of 5-char strings iteratively, instead of letting glob roll its magic behind the scenes. Then it absolutely should not have a problem.
However, I find the whole thing a bit big for comfort, even in that case. I'd really recommend to write an algorithm that generates and provides one list element at a time (an "iterator"), and work with that.
There are good libraries that can do that (and a lot more), some of which are Algorithm::Loops recommended in a previous post on this matter (and in a comment), Algorithm::Combinatorics (same comment), Set::CrossProduct from another answer here ...
Also note that, while this is a clever use of glob, the library is meant to work with files. Apart from misusing it in principle, I think that it will check each of (the ~ 12 million) names for a valid entry! (See this page.) That's a lot of unneeded disk work. (And if you were to use "globs" like * or ? on some systems it returns a list with only strings that actually have files, so you'd quietly get different results.)
โ€ โ€‰I'm getting 56 bytes for a size of a 5-char scalar. While that is for a declared variable, which may take a little more than an anonymous scalar, in the test program with length-4 strings the actual total size is indeed a good order of magnitude larger than the naively computed one. So the real thing may well be on the order of 1Gb, in one operation.
Update A simple test program that generates that list of 5-char long strings (using the same glob approach) ran for 15-ish minutes on a server-class machine and took 725 Mb of memory.
It did produce the right number of actual 5-char long strings, seemingly correct, on this server.
Everything has some limitation.
Here's a pure Perl module that can do it for you iteratively. It doesn't generate the entire list at once and you start to get results immediately:
use v5.10;
use Set::CrossProduct;
my $set = Set::CrossProduct->new( [ ([ 'a'..'z' ]) x 5 ] );
while( my $item = $set->get ) {
say join '', #$item
}

perl6: Cannot unbox 65536 bit wide bigint into native integer

I try some examples from Rosettacode and encounter an issue with the provided Ackermann example: When running it "unmodified" (I replaced the utf-8 variable names by latin-1 ones), I get (similar, but now copyable):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Cannot unbox 65536 bit wide bigint into native integer
in sub A at t/ackermann.p6 line 3
in sub A at t/ackermann.p6 line 11
in sub A at t/ackermann.p6 line 3
in block <unit> at t/ackermann.p6 line 17
Removing the proto declaration in line 3 (by commenting out):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Numeric overflow
in sub A at t/ackermann.p6 line 8
in sub A at t/ackermann.p6 line 11
in block <unit> at t/ackermann.p6 line 17
What went wrong? The program doesn't allocate much memory. Is the natural integer kind-of limited?
I replaced in the code from Ackermann function the ๐‘š with m and the ๐‘› with n for better terminal interaction for copying errors and tried to comment out proto declaration. I also asked Liz ;)
use v6;
proto A(Int \m, Int \n) { (state #)[m][n] //= {*} }
multi A(0, Int \n) { n + 1 }
multi A(1, Int \n) { n + 2 }
multi A(2, Int \n) { 3 + 2 * n }
multi A(3, Int \n) { 5 + 8 * (2 ** n - 1) }
multi A(Int \m, 0 ) { A(m - 1, 1) }
multi A(Int \m, Int \n) { A(m - 1, A(m, n - 1)) }
# Testing:
say A(4,1);
say .chars, " digits starting with ", .substr(0,50), "..." given A(4,2);
A(4, 3).say;
Please read JJ's answer first. It's breezy and led to this answer which is effectively an elaboration of it.
TL;DR A(4,3) is a very big number, one that cannot be computed in this universe. But raku(do) will try. As it does you will blow past reasonable limits related to memory allocation and indexing if you use the caching version and limits related to numeric calculations if you don't.
I try some examples from Rosettacode and encounter an issue with the provided Ackermann example
Quoting the task description with some added emphasis:
Arbitrary precision is preferred (since the function grows so quickly)
raku's standard integer type Int is arbitrary precision. The raku solution uses them to compute the most advanced answer possible. It only fails when you make it try to do the impossible.
When running it "unmodified" (I replaced the utf-8 variable names by latin-1 ones)
Replacing the variable names is not a significant change.
But adding the A(4,3) line shifted the code from being computable in reality to not being computable in reality.
The example you modified has just one explanatory comment:
Here's a caching version of that ... to make A(4,2) possible
Note that the A(4,2) solution is nearly 20,000 digits long.
If you look at the other solutions on that page most don't even try to reach A(4,2). There are comments like this one on the Phix version:
optimised. still no bignum library, so ack(4,2), which is power(2,65536)-3, which is apparently 19729 digits, and any above, are beyond (the CPU/FPU hardware) and this [code].
A solution for A(4,2) is the most advanced possible.
A(4,3) is not computable in practice
To quote Academic Kids: Ackermann function:
Even for small inputs (4,3, say) the values of the Ackermann function become so large that they cannot be feasibly computed, and in fact their decimal expansions cannot even be stored in the entire physical universe.
So computing A(4,3).say is impossible (in this universe).
It must inevitably lead to an overflow of even arbitrary precision integer arithmetic. It's just a matter of when and how.
Cannot unbox 65536 bit wide bigint into native integer
The first error message mentions this line of code:
proto A(Int \m, Int \n) { (state #)[m][n] //= {*} }
The state # is an anonymous state array variable.
By default # variables use the default concrete type for raku's abstract array type. This default array type provides a balance between implementation complexity and decent performance.
While computing A(4,2) the indexes (m and n) remain small enough that the computation completes without overflowing the default array's indexing limit.
This limit is a "native" integer (note: not a "natural" integer). A "native" integer is what raku calls the fixed width integers supported by the hardware it's running on, typically a long long which in turn is typically 64 bits.
A 64 bit wide index can handle indices up to 9,223,372,036,854,775,807.
But in trying to compute A(4,3) the algorithm generates a 65536 bits (8192 bytes) wide integer index. Such an integer could be as big as 265536, a 20,032 decimal digit number. But the biggest index allowed is a 64 bit native integer. So unless you comment out the caching line that uses an array, then for A(4,3) the program ends up throwing the exception:
Cannot unbox 65536 bit wide bigint into native integer
Limits to allocations and indexing of the default array type
As already explained, there is no array that could be big enough to help fully compute A(4,3). In addition, a 64 bit integer is already a pretty big index (9,223,372,036,854,775,807).
That said, raku can accommodate other array implementations such as Array::Sparse so I'll discuss that briefly below because such possibilities might be of interest for other problems.
But before discussing bigger arrays, running the code below on tio.run shows the practical limits for the default array type on that platform:
my #array;
#array[2**29]++; # works
#array[2**30]++; # could not allocate 8589967360 bytes
#array[2**60]++; # Unable to allocate ... 1152921504606846977 elements
#array[2**63]++; # Cannot unbox 64 bit wide bigint into native integer
(Comment out error lines to see later/greater errors.)
The "could not allocate 8589967360 bytes" error is a MoarVM panic. It's a result of tio.run refusing a memory allocation request.
I think the "Unable to allocate ... elements" error is a raku level exception that's thrown as a result of exceeding some internal Rakudo implementation limit.
The last error message shows the indexing limit for the default array type even if vast amounts of memory were made available to programs.
What if someone wanted to do larger indexing?
It's possible to create/use other # (does Positional) data types that support things like sparse arrays etc.
And, using this mechanism, it's possible that someone could write an array implementation that supports larger integer indexing than is supported by the default array type (presumably by layering logic on top of the underlying platform's instructions; perhaps the Array::Sparse I linked above does).
If such an alternative were called BigArray then the cache line could be replaced with:
my #array is BigArray;
proto A(Int \๐‘š, Int \๐‘›) { #array[๐‘š][๐‘›] //= {*} }
Again, this still wouldn't be enough to store interim results for fully computing A(4,3) but my point was to show use of custom array types.
Numeric overflow
When you comment out the caching you get:
Numeric overflow
Raku/Rakudo do arbitrary precision arithmetic. While this is sometimes called infinite precision it obviously isn't actually infinite but is instead, well, "arbitrary", which in this context also means "sane" for some definition of "sane".
This classically means running out of memory to store a number. But in Rakudo's case I think there's an attempt to keep things sane by switching from a truly vast Int to a Num (a floating point number) before completely running out of RAM. But then computing A(4,3) eventually overflows even a double float.
So while the caching blows up sooner, the code is bound to blow up later anyway, and then you'd get a numeric overflow that would either manifest as an out of memory error or a numeric overflow error as it is in this case.
Array subscripts use native ints; that's why you get the error in line 3, when you use the big ints as array subscripts. You might have to define a new BigArray that uses Ints as array subscripts.
The second problem arises in the ** operator: the result is a Real, and when the low-level operations returns a Num, it throws an exception.
https://github.com/rakudo/rakudo/blob/master/src/core/Int.pm6#L391-L401
So creating a BigArray might not be helpful anyway. You'll have to create your own ** too, that always works with Int, but you seem to have hit the (not so infinite) limit of the infinite precision Ints.

perl - int() decrementing an integer

Before I get flamed, I want to say I do understand floating point numbers and things of the sort, but that doesn't seem to be my issue.
To simplify things, I'm trying to determine if a number has more than 2 decimal places. I'm doing this by multiplying the number by 100 (stored under variable "test1") and then truncating it with int() ($test2) and comparing it with an if.
$test1 = $number * 100;
$test2 = int($test1);
unless ($test1 == $test2) {
die ("test1:$test1, test2:$test2");
}
The initial $number comes from a whole series of other functions and should realistically be only two decimals, hence I'm trying to catch those that aren't (as a few entries seem to have very many decimals).
However, I just got:
test1:15, test2:14
from my die().
Can someone explain how that would happen? How can int(15) be 14?
From perldoc:
machine representations of floating-point numbers can sometimes produce counterintuitive results. For example, int(-6.725/0.025) produces -268 rather than the correct -269; that's because it's really more like -268.99999999999994315658 instead
So, the machine representation of "15" is probably something like 14.9999999999999999 and, therefore, int truncates it to 14.
Note that perldoc suggests using the POSIX functions floor or ceil instead.
In a simple, one off, case adding 0.5 to your value before int-ing it will give you what you want.
e.g.
int(14.99 + 0.5)
15
it becomes 15.49 and is int-ed "down" to 15, whereas:
int( 14.45 + 0.5 )
still gets int'ed "down" to 14.0. This is a handy trick but doesn't self document as nicely as using floor and ceil.
As a side note, the Goldberg paper on floating point arithmetic always reminds me how useful it sometimes is to have brains that are not as mindlessly precise as a computer :-)
If I wanted to check if a number had more than two decimal places, I wouldn't do math on it.
my $more_than_two = $number =~ /\d+\.\d{2}\d+\z/;
Before I do that, I might use Scalar::Util's looks_like_a_number. This method will still fail with floating point squishiness if you were expecting 14.99999 to be 15.0.
However, you should tell us what you are trying to do instead of how you are trying to do that. It's easier to give better answers.
For your questions about int, I think it's documentation tell you what you need to know. The rest is answered in the first couple of questions in perlfaq4.

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on โ€œJust another Perl hackerโ€, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning โ€œany number of characters, specified by their number in the current character setโ€. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, โ€ฆ, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, โ€ฆ, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);

reading and storing numbers in perl without a loss of percision (Perl)

I have a few numbers in a file in a variety of formats: 8.3, 0.001, 9e-18. I'm looking for an easy way to read them in and store them without any loss of precision. This would be easy in AWK, but how's it done in Perl? I'm only open to using Perl. Thanks!
Also, I was wondering if there's an easy way to print them in an appropriate format. For example, 8.3 should be printed as "8.3" not "8.3e0"
If they're text strings, then reading them into Perl as strings and writing them back out as strings shouldn't result in any loss of precision. If you have to do arithmetic on them, then I suggest installing the CPAN module Math::BigFloat to ensure that you don't lose any precision to rounding.
As to your second question, Perl doesn't do any reformatting unless you ask it to:
$ perl -le 'print 8.3'
8.3
Am I missing something?
From http://perldoc.perl.org/perlnumber.html:
Perl can internally represent numbers in 3 different ways: as native
integers, as native floating point numbers, and as decimal strings.
Decimal strings may have an exponential notation part, as in
"12.34e-56" . Native here means "a format supported by the C compiler
which was used to build perl".
This means that printing the number out depends on how the number is stored internal to perl, which means, in turn, that you have to know how the number is represented on input.
By and large, Perl will just do the right thing, but you should know how what compiler was used, how it represents numbers internally, and how to print those numbers. For example:
$ perldoc -f int
int EXPR
int Returns the integer portion of EXPR. If EXPR is omitted, uses $_. You should
not use this function for rounding: one because it truncates towards 0, and two
because machine representations of floating-point numbers can sometimes produce
counterintuitive results. For example, "int(-6.725/0.025)" produces -268 rather than
the correct -269; that's because it's really more like -268.99999999999994315658
instead. Usually, the "sprintf", "printf", or the "POSIX::floor" and
"POSIX::ceil" functions will serve you better than will int().
I think that if you want to read a number in explicitly as a string, your best bet would be to use unpack() with the 'A*' format.