perl negative look ahead not working on large strings - perl

Perl's negative lookahead stops working on large strings (length > 40000 in ActivePerl and Cygwin Perl, version 5.14). I tried the same code with MinGW Perl 5.8.8, and there it stops working for strings with length > 5000.
The code I am using is:
my $str = q(A B);
my $pattern = '(A)(?:(?!(X)).)*(B)';
if ( $str =~ m/$pattern/ ) {
print "matched\n";
}
This works fine in all three versions of Perl. But when I increase the length of the string by adding spaces, the pattern stops matching.
For example: my $str = q(A ...some 50000 spaces... B);
Kindly help.

Perl imposes an internal limit (happens to be a signed 16-bit integer on most systems) on the size of various regex operations to limit stack growth. This answer has a very good breakdown of the limit.
From empirical testing, when the space count gets to 32767, that's when you fail, so it's certainly this limit.
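One way to sidestep the limit, sketched here on the assumption that the thing to exclude is a single character (X, as in the question): a negated character class under a plain * is a simple repeat rather than a quantified group, so it should not hit the recursion limit. The helper names build_input and matches_without_x are made up for this illustration.

```perl
use strict;
use warnings;

# Build the test string from the question: A, a run of spaces, then B.
sub build_input {
    my ($spaces) = @_;
    return 'A' . (' ' x $spaces) . 'B';
}

# Same intent as (A)(?:(?!(X)).)*(B) when the forbidden token is the single
# character X. \n is excluded too, because . does not match a newline.
sub matches_without_x {
    my ($str) = @_;
    return $str =~ /(A)[^X\n]*(B)/ ? 1 : 0;
}

for my $spaces (10, 50_000) {
    my $str       = build_input($spaces);
    my $lookahead = $str =~ /(A)(?:(?!(X)).)*(B)/ ? 'matched' : 'no match';
    my $class     = matches_without_x($str)       ? 'matched' : 'no match';
    print "$spaces spaces: lookahead=$lookahead, class=$class\n";
}
```

Whether the lookahead form matches at 50000 spaces depends on the perl version; the character-class form is the one to rely on. For a multi-character forbidden token this rewrite does not apply directly.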

Related

Perl 5.6.1 vs. Perl 5.14 - converting dec to hex

I found something strange.
Different behaviors for different versions of perl.
The code is:
$x = -806;
$x = sprintf "0x%x" , $x;
print "$x";
In 5.6.1 i get:
0xfffffcda
In 5.14 i get:
0xfffffffffffffcda
How can i get 32-bit in 5.14 as well?
Thanks!
The thing with negative numbers is they're represented via 2s complement binary. What you're seeing is the result of the word size being larger.
I'm not entirely sure precisely why it would have changed (aside from 14 years and a general move to 64bit), but it's not easy to fix without recompiling perl. I'd suggest that's not a good idea since what you're really trying to get is a stringification.
A simpler solution would be a bitwise AND with the appropriate length bitmask:
$x = -806;
$x = sprintf ("0x%x" , $x & 0xffffffff);
print "$x";
Some addition to the answer above:
The number of digits Perl produces when its sprintf converts to hex depends on the size of the native C data type Perl uses internally to store unsigned integer values. What type that is is determined by Perl's Configure script when it sets things up to compile the Perl interpreter, so it's not exactly something that can be changed at run time. It can also vary from operating system to operating system and machine to machine, so if you run your script in different environments you can't be sure how many hex digits will be produced (a point strongly in favor of Sobrique's suggestion). It's also quite likely that the default native type was changed from a 32-bit one to a 64-bit one at some point during the 14 years since 5.6.1 was released.
If you want to know what type is used in a particular perl installation, perl -MConfig -E 'say $Config{uvtype}' will tell you (modify as needed for pre-5.10 perls).
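To make the two answers above concrete, here is a minimal sketch that wraps the bitmask in a helper and prints the native unsigned size for comparison. The name hex32 and the zero-padded %08x format are choices made for this example, not something from the original posts.

```perl
use strict;
use warnings;
use Config;

# Format a value as 32-bit two's-complement hex, regardless of how wide
# this perl's native unsigned integer type happens to be.
sub hex32 {
    my ($n) = @_;
    return sprintf '0x%08x', $n & 0xffffffff;
}

print "native unsigned size: $Config{uvsize} bytes\n";
print hex32(-806), "\n";
```

Because the mask is applied before formatting, the number of hex digits no longer depends on how the interpreter was configured.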

Can't use an undefined value as an ARRAY reference

I have a simple script written in Perl, and I keep getting this error when I try to run it. The script generates numbers for checking integer to floating-point conversion. This is the error I get:
Can't use an undefined value as an ARRAY reference at /tools/oss/packages/i86pc-5.10/perl/5.8.8-32/lib/5.8.8/Math/BigInt/Calc.pm line 1180
From the error message I can't figure out where my code is going wrong. By the way, I need to use 64-bit numbers. How do I debug this issue?
Here is the code sample
use bignum;
use warnings;
use strict;
open(VCVT, ">CvtIntToFp") or die "couldn't open file to write:$!";
my $number;
my $sgn;
# left with 31 bits excluding the sign
# 23 bits of significand needed, all the result
# will be exact except where leading bit ignoring singn is >23
# take its 2's complement to get the negative number and put it
# into the register
# 32 bit number 1 bit sign 31 left any number with leading 1 #position >23 (counting from 0) will be inexact when in floating point
# 30-24 bit positons can have a leading ones at at any position and result is an inexact
my $twoPwr32 = 0x100000000; #2**32
my @num=();
for(my $i=0; $i<100; $i++)
{
$sgn = (rand()%2);
my $tempLead = (rand()%7); # there are 7 bits from 24 to 30
$number=$tempLead << 24;
if($sgn)
{$number = ($twoPwr32- $number +1) & 0xffffffff;
}
$number = sprintf("%x", $number);
push(@num, $number);
}
my $item=0;
foreach $item (@num)
{
print "$item\n";
print VCVT "$item\n";
}
Try use diagnostics to get a better error message, and read perldoc bignum. The explanation given there is that bignum internally converts numbers into big-number objects and returns a reference. I only have the Perl 5.14 documentation installed, so the link below points to the Perl 5.20 documentation; I think the bug still exists there. Refer to http://perldoc.perl.org/bignum.html
Update :
Hexadecimal number > 0xffffffff non-portable at throw_stack.pl line 19 (#1)
(W portable) The hexadecimal number you specified is larger than 2**32-1
(4294967295) and therefore non-portable between systems. See
perlport for more on portability concerns.
Also refer to this question for the usage of 64 bit arithmetic in Perl.
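If the values to emit never exceed 32 bits, as seems to be the case in the posted loop, one option is to drop bignum and work with native integers. The following is only a sketch under that assumption: to_hex32 is a hypothetical helper, the random draws use int(rand(N)) because rand() returns a fraction, and the leading one is placed at bit 24 to 30 as the question's comments describe.

```perl
use strict;
use warnings;

# Encode a non-negative magnitude as 32-bit hex; when $negative is true,
# emit its two's complement instead. Hypothetical helper for illustration.
sub to_hex32 {
    my ($magnitude, $negative) = @_;
    my $value = $negative ? ((-$magnitude) & 0xffffffff) : $magnitude;
    return sprintf '%x', $value;
}

my @num;
for (1 .. 100) {
    my $negative = int(rand(2));         # 0 or 1
    my $lead_bit = 24 + int(rand(7));    # leading one somewhere in bits 24..30
    push @num, to_hex32(1 << $lead_bit, $negative);
}
print "$_\n" for @num;
```

This sidesteps the Math::BigInt code path entirely, so it does not explain the original error; it only shows that the same kind of test data can be produced without it.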

Perl version string: why use EVAL EXPR?

I just noticed this in code generated by Catalyst.pl. It is obviously some sort of unannotated hack. What is the advantage of setting up a version string like this? I can't even figure out what they're trying to do.
our $VERSION = '0.01';
$VERSION = eval $VERSION;
Version numbers are complex in Perl. Here's an excellent overview for those looking for the gory details. It might surprise you how many subtle ways there are to get things wrong...
The direct answer to your question though, is that different things expect different formats. For CPAN, you care about development versions for example, as a string. For runtime, you care about them as a number.
Consider the case of $VERSION = "0.01_001". eval converts it to the number 0.01001 correctly.
From perlmodstyle: Version numbering
If you want to release a 'beta' or
'alpha' version of a module but don't
want CPAN.pm to list it as most recent
use an '_' after the regular version
number followed by at least 2 digits,
eg. 1.20_01. If you do this, the
following idiom is recommended:
$VERSION = "1.12_01";
$XS_VERSION = $VERSION; # only needed if you have XS code
$VERSION = eval $VERSION;
With that trick MakeMaker will only
read the first line and thus read the
underscore, while the perl interpreter
will evaluate the $VERSION and convert
the string into a number. Later
operations that treat $VERSION as a
number will then be able to do so
without provoking a warning about
$VERSION not being a number.
The eval converts the string "0.001_001" to a number, following the rules for Perl numeric literals (which allow underscores for legibility). The result is the number 0.001001.
Without the eval, the string is converted to a number following the rule for converting strings, which stops at the first non-numeric character.
E.g.: perl -e 'print "0.001_001" + 0'
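The difference between the two conversions can be seen side by side in a short script. This is just an illustration of the point above; the variable names are made up.

```perl
use strict;
use warnings;

my $version = '0.001_001';

# eval parses the string as Perl source, so numeric-literal rules apply
# and the underscore is ignored as a digit separator.
my $as_literal = eval $version;

# Plain string-to-number conversion stops at the first non-numeric character.
my $as_string = do {
    no warnings 'numeric';    # silence the "isn't numeric" warning for the demo
    $version + 0;
};

print "via eval:   $as_literal\n";
print "via string: $as_string\n";
```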
I may be misremembering this, but I think some automated code parsers like to see the line of code:
our $VERSION = '0.01';
But you really want $VERSION to hold a float instead of a string.
You may want to read this article, I know I am going to.
Oh, dear god, now I remember why I use
our $VERSION = 20100903;
style version numbers. That is just insane. I love Perl, but that is pure, refined, concentrated insanity. I won't try to summarize David Golden's article. You just have to read it and cry.

In Perl, can I treat a string as a byte array?

In Perl, is it appropriate to use a string as a byte array containing 8-bit data? All the documentation I can find on this subject focuses on 7-bit strings.
For instance, if I read some data from a binary file into $data
my $data;
open FILE, "<", $filepath;
binmode FILE;
read FILE, $data, 1024;
and I want to get the first byte out, is substr($data,1,1) appropriate? (again, assuming it is 8-bit data)
I come from a mostly C background, and I am used to passing a char pointer to a read() function. My problem might be that I don't understand what the underlying representation of a string is in Perl.
The bundled documentation for the read command, reproduced here, provides a lot of information that is relevant to your question.
read FILEHANDLE,SCALAR,LENGTH,OFFSET
read FILEHANDLE,SCALAR,LENGTH
Attempts to read LENGTH characters of data into variable SCALAR
from the specified FILEHANDLE. Returns the number of
characters actually read, 0 at end of file, or undef if there
was an error (in the latter case $! is also set). SCALAR will
be grown or shrunk so that the last character actually read is
the last character of the scalar after the read.
An OFFSET may be specified to place the read data at some place
in the string other than the beginning. A negative OFFSET
specifies placement at that many characters counting backwards
from the end of the string. A positive OFFSET greater than the
length of SCALAR results in the string being padded to the
required size with "\0" bytes before the result of the read is
appended.
The call is actually implemented in terms of either Perl's or
system's fread() call. To get a true read(2) system call, see
"sysread".
Note the characters: depending on the status of the filehandle,
either (8-bit) bytes or characters are read. By default all
filehandles operate on bytes, but for example if the filehandle
has been opened with the ":utf8" I/O layer (see "open", and the
"open" pragma, open), the I/O will operate on UTF-8 encoded
Unicode characters, not bytes. Similarly for the ":encoding"
pragma: in that case pretty much any characters can be read.
See perldoc -f pack and perldoc -f unpack for how to treat strings as byte arrays.
You probably want to use sysopen and sysread if you want to read bytes from a binary file.
See also perlopentut.
Whether this is appropriate or necessary depends on what exactly you are trying to do.
#!/usr/bin/perl -l
use strict; use warnings;
use autodie;
use Fcntl;
sysopen my $bin, 'test.png', O_RDONLY;
sysread $bin, my $header, 4;
print map { sprintf '%02x', ord($_) } split //, $header;
Output:
C:\Temp> t
89504e47
Strings are strings of "characters", which are bigger than a byte.[1] You can store bytes in them and manipulate them as though they are characters, taking substrs of them and so on, and so long as you're just manipulating entities in memory, everything is pretty peachy. The data storage is weird, but that's mostly not your problem.[2]
When you try to read and write from files, the fact that your characters might not map to bytes becomes important and interesting. Not to mention annoying. This annoyance is actually made a bit worse by Perl trying to do what you want in the common case: If all the characters in the string fit into a byte and you happen to be on a non-Windows OS, you don't actually have to do anything special to read and write bytes. Perl will complain, however, if you have stored a non-byte-sized character and try to write it without giving it a clue about what to do with it.
This is getting a little far afield, largely because encoding is a large and confusing topic. Let me leave it off there with some references: Look at Encode(3perl), open(3perl), perldoc open, and perldoc binmode for lots of hilarious and gory details.
So the summary answer is "Yes, you can treat strings as though they contained bytes if they do in fact contain bytes, which you can assure by only reading and writing bytes.".
[1] Or pedantically, "which can express a larger range of values than a byte, though they are stored as bytes when that is convenient". I think.
[2] For the record, strings in Perl are internally represented by a data structure called a 'PV' which in addition to a character pointer knows things like the length of the string and the current value of pos.[3]
[3] Well, it will start storing the current value of pos if it starts being interesting. See also
use Devel::Peek;
my $x = "bluh bluh bluh bluh";
Dump($x);
$x =~ /bluh/mg;
Dump($x);
$x =~ /bluh/mg;
Dump($x);
It might help more if you tell us what you are trying to do with the byte array. There are various ways to work with binary data, and each lends itself to a different set of tools.
Do you want to convert the data into a Perl array? If so, pack and unpack are a good start. split could also come in handy.
Do you want to access individual elements of the string without unpacking it? If so, substr is fast and will do the trick for 8-bit data. If you want other bit depths, take a look at the vec function, which treats a string as a bit vector.
Do you want to scan the string and convert certain bytes to other bytes? Then the s/// or tr/// constructs might be useful.
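As a rough illustration of the options listed above, here is a small sketch that applies each tool to the same four-byte string. The sample data and variable names are invented for the example.

```perl
use strict;
use warnings;

my $data = "\x00\x00\x84\x00";    # four raw bytes

my @bytes = unpack 'C*', $data;         # list of unsigned byte values
my $first = ord substr($data, 0, 1);    # offsets start at 0, so this is the first byte
my $third = vec($data, 2, 8);           # element 2 of the string viewed as 8-bit units
(my $swapped = $data) =~ tr/\x00/\xff/; # byte-for-byte translation on a copy

print "bytes:   @bytes\n";
print "first:   $first\n";
print "third:   $third\n";
print "swapped: ", unpack('H*', $swapped), "\n";
```

All of these assume the string really holds bytes, for example because it was read from a filehandle in binmode.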
Allow me just to post a small example about treating string as binary array - since I myself found it difficult to believe that something called "substr" would handle null bytes; but seemingly it does - below is a snippet of a perl debugger terminal session (with both string and array/list approaches):
$ perl -d
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
^D
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.
DB<1> $str="\x00\x00\x84\x00"
DB<2> print $str
�
DB<3> print unpack("H*",$str) # show content of $str as hex via `unpack`
00008400
DB<4> $str2=substr($str,2,2)
DB<5> print unpack("H*",$str2)
8400
DB<6> $str2=substr($str,1,3)
DB<7> print unpack("H*",$str2)
008400
[...]
DB<30> @stra=split('',$str); print @stra # convert string to array (by splitting at empty string)
�
DB<31> print unpack("H*",$stra[3]) # print indiv. elems. of array as hex
00
DB<32> print unpack("H*",$stra[2])
84
DB<33> print unpack("H*",$stra[1])
00
DB<34> print unpack("H*",$stra[0])
00
DB<35> print unpack("H*",join('',@stra[1..3])) # print only portion of array/list via indexes (using range [two dots] operator)
008400

How can I sprintf a big number in Perl?

On a Windows 32-bit platform I have to read some numbers that, unexpectedly, can have values as large as 99,999,999,999, but no larger. Trying to sprintf("%011d", $myNum) them outputs an overflow: -2147483648.
I cannot use the BigInt module because that would require deep changes to the code. I cannot handle the value as a string, sprintf("%011s", $numero), because the minus sign is incorrectly handled.
How can I manage this? Could pack/unpack be of some help?
Try formatting it as a float with no fraction part:
$ perl -v
This is perl, v5.6.1 built for sun4-solaris
...
$ perl -e 'printf "%011d\n", 99999999999'
-0000000001
$ perl -e 'printf "%011.0f\n", 99999999999'
99999999999
Yes, one of Perl's numeric blind spots is formatting; Perl automatically handles representing numbers as integers or floats pretty well, but then coerces them into
one or the other when the printf numeric formats are used, even when that isn't
appropriate. And printf doesn't really handle BigInts at all (except by treating
them as strings and converting that to a number, with loss of precision).
Using %s instead of %d with any number you aren't sure will be in an appropriate
range is a good workaround, except as you note for negative numbers. To handle
those, you are going to have to write some Perl code.
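The "some Perl code" for the negative case could look something like the sketch below. It pads the digit string by hand and puts the sign back in front, so no numeric printf format is involved. zero_pad is a hypothetical helper name, and the approach assumes the number stringifies without an exponent, which holds for integers of this size.

```perl
use strict;
use warnings;

# Zero-pad an integer to a total width, keeping any minus sign in front,
# using only string operations.
sub zero_pad {
    my ($num, $width) = @_;
    "$num" =~ /^(-?)(\d+)$/ or die "not a plain integer: $num";
    my ($sign, $digits) = ($1, $2);
    my $pad = $width - length($sign) - length($digits);
    $pad = 0 if $pad < 0;                  # never truncate, only pad
    return $sign . ('0' x $pad) . $digits;
}

print zero_pad(99999999999, 11), "\n";
print zero_pad(-42, 11), "\n";
```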
Floats can work, up to a point.
perl -e "printf qq{%.0f\n}, 999999999999999"
999999999999999
But only up to a point
perl -e "printf qq{%.0f\n}, 9999999999999999999999999999999999999999999999"
9999999999999998663747590131240811450955988992
Bignum doesn't help here.
perl -e "use bignum ; printf qq{%.0f\n}, 9999999999999999999999999999999999999999999999"
9999999999999999931398190359470212947659194368
The problem is printf. (Do you really need printf?)
Could print work?
perl -e "use bignum;print 9999999999999999999999999999999999999999999999"
9999999999999999999999999999999999999999999999
Having said all of that, the nice thing about perl is it's always an option to roll your own.
e.g.
my $in = ...;
my $out = "";
while($in){
my $chunk=$in & 0xf;
$in >>= 4;
$out = sprintf("%x",$chunk).$out;
}
print "0x$out\n";
I'm no Perl expert, and maybe I'm missing some sort of automatic handling of bignums here, but isn't this simply a case of integer overflow? A 32-bit integer can't hold numbers that are as big as 99,999,999,999.
Anyway, I get the same result with Perl v5.8.8 on my 32-bit Linux machine, and it seems that printf with "%d" doesn't handle larger numbers.
I think your copy of Perl must be broken, this is from CygWin's version (5.10):
pax$ perl -e 'printf("%011d\n", 99999999999);'
99999999999
pax$ perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 6 registered patches, see perl -V for more detail)
Copyright 1987-2007, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
What version are you running (output of perl -v)?
You may have to get a 64-bit enabled version of Perl [and possibly a new 64-bit production machine] (note the "cygwin-thread-multi-64int" in my output). That will at least avoid the need for changing the code.
I'm stating this on the basis that you don't want to change the code greatly (i.e., you fear breaking things). The solution of new hardware, whilst a little expensive, will almost certainly not require you to change the software at all. It depends on your priorities.
Another possibility is that Perl itself may be storing the number correctly but just displaying it wrong due to a printf() foible. In that case, you may want to try:
$million = 1000000;
$bignum = 99999999999;
$firstbit = int($bignum / $million);
$secondbit = $bignum - $firstbit * $million;
printf ("%d%06d\n",$firstbit,$secondbit);
Put that in a function and call the function to return a string, such as:
sub big_honkin_number($) {
$million = 1_000_000;
$bignum = shift;
$firstbit = int($bignum / $million);
$secondbit = $bignum - $firstbit * $million;
return sprintf("%d%06d\n", $firstbit, $secondbit);
}
printf ("%s", big_honkin_number (99_999_999_999));
Note that I tested this but on the 64-bit platform - you'll need to do your own test on 32-bit but you can use whatever scaling factor you want (including more than two segments if need be).
Update: That big_honkin_number() trick works fine on a 32-bit Perl so it looks like it is just the printf() functions that are stuffing you up:
pax@pax-desktop:~$ perl -v
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
pax@pax-desktop:~$ perl qq.pl
99999999999