Perl 5.6.1 vs. Perl 5.14 - converting dec to hex - perl

I found something strange.
Different behaviors for different versions of perl.
The code is:
$x = -806;
$x = sprintf "0x%x" , $x;
print "$x";
In 5.6.1 I get:
0xfffffcda
In 5.14 I get:
0xfffffffffffffcda
How can I get the 32-bit result in 5.14 as well?
Thanks!

Negative numbers are represented in two's complement binary. What you're seeing is the result of a larger word size: the 5.14 perl uses 64-bit integers, so the sign extension fills 16 hex digits instead of 8.
I'm not sure exactly why it changed (aside from 14 years and a general move to 64-bit), but it isn't easy to fix without recompiling perl. I'd suggest that's not a good idea, since what you're really after is a particular stringification.
A simpler solution would be a bitwise AND with the appropriate length bitmask:
$x = -806;
$x = sprintf ("0x%x" , $x & 0xffffffff);
print "$x";
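As a rough sketch of the masking approach, the helper below should give the same 8-digit result whether perl was built with 32-bit or 64-bit integers. The name hex32 is made up for illustration, and the zero padding via %08x is an assumption about the desired output (the original uses plain %x):

```perl
use strict;
use warnings;

# hex32 is a hypothetical helper name, not a built-in.
# Masking with 0xffffffff keeps only the low 32 bits, so the sign
# extension that a 64-bit perl adds is discarded before formatting.
sub hex32 {
    my ($n) = @_;
    return sprintf "0x%08x", $n & 0xffffffff;
}

print hex32(-806), "\n";   # 0xfffffcda
print hex32(255),  "\n";   # 0x000000ff
```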

An addition to the answer above:
The number of digits Perl produces when its sprintf converts to hex depends on the size of the native C data type Perl uses internally to store unsigned integer values. What type that is is determined by Perl's Configure script when it sets things up to compile the Perl interpreter, so it's not exactly something that can be changed at run time. It can also vary from operating system to operating system and machine to machine, so if you run your script in different environments you can't be sure how many hex digits will be produced (a point strongly in favor of Sobrique's suggestion). It's also quite likely that the default native type was changed from a 32-bit one to a 64-bit one at some point during the 14 years since 5.6.1 was released.
If you want to know what type is used in a particular perl installation, perl -MConfig -E 'say $Config{uvtype}' will tell you (modify as needed for pre-5.10 perls).
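The same information is available from inside a script. A minimal sketch, assuming only the core Config module, that prints the unsigned integer type and the widest hex string sprintf can produce on the running perl:

```perl
use strict;
use warnings;
use Config;

# uvtype/uvsize describe the C type perl uses for unsigned integers.
printf "uvtype=%s uvsize=%d bytes\n", $Config{uvtype}, $Config{uvsize};

# ~0 is the largest unsigned integer, so its hex form shows the
# widest output that sprintf "%x" can produce on this build.
my $max_digits = length sprintf "%x", ~0;
print "max hex digits: $max_digits\n";
```

On a build with 8-byte integers this should report 16 digits, which matches the 5.14 output in the question.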

Related

perl negative look ahead not working on large strings

Perl's negative lookahead stops working on large strings (length > 40000 in ActivePerl and Cygwin perl, version 5.14). I tried the same code with MinGW perl 5.8.8, and there it stops working for strings with length > 5000.
The code I am using is:
my $str = q(A B);
my $pattern = '(A)(?:(?!(X)).)*(B)';
if ( $str =~ m/$pattern/ ) {
print "matched\n";
}
This works fine in all three versions of perl. But when I increase the length of the string by adding spaces, the pattern stops matching.
For example: my $str = q(A ...some 50000 spaces... B);
Kindly help.
Perl imposes an internal limit (happens to be a signed 16-bit integer on most systems) on the size of various regex operations to limit stack growth. This answer has a very good breakdown of the limit.
From empirical testing, when the space count gets to 32767, that's when you fail, so it's certainly this limit.
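If the skipped text only has to avoid a single character, as with X here, one possible workaround (a sketch, not something taken from the answer above) is to replace the lookahead group with a negated character class. A character class followed by * is a simple quantifier rather than a quantified group, so it should not run into the same limit. The helper name skips_without_x is hypothetical:

```perl
use strict;
use warnings;

# skips_without_x is an illustrative name.
# [^X\n] matches what "." matched in the original pattern, minus X.
sub skips_without_x {
    my ($text) = @_;
    return $text =~ m/(A)[^X\n]*(B)/ ? 1 : 0;
}

my $str = 'A' . (' ' x 50_000) . 'B';
print "matched\n" if skips_without_x($str);
```

This does not generalize to multi-character exclusions, where the lookahead form is still needed.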

Efficient pre-perl-5.10 equivalent of pack("Q>")

Update: Salva correctly points out that I was wrong about the introduction of the "Q" pack template. It's the ">" modifier that doesn't go back to 5.8.
Perl 5.10 introduced the pack() modifier ">", which, for my use case with "Q", packs an unsigned quad (64-bit) value in big-endian byte order.
Now, I'm looking for an efficient equivalent for
pack("Q>2", @ints)
where @ints contains two 64-bit unsigned ints. "Q>2" means "pack two unsigned quads in big-endian byte order". Obviously, I want this because I am (at least temporarily) tied to a pre-5.10 Perl.
Update2: Actually, on further reflection, something as simple as the following should do:
pack("N4", $ints[0] >> 32, $ints[0], $ints[1] >> 32, $ints[1])
Appears to work on my 64-bit x86-64 Linux. Any reason why this might not be exactly the same as pack("Q>2", @ints)? Any platform-specific matters?
What's the reverse (i.e. the equivalent of unpack("Q>2", $packed))?
The Q pattern was introduced in perl 5.6. Your real problem may be that you are trying to use it in a perl compiled without 64bit support.
Anyway, you can use Math::Int64.
Update, an example:
use Math::Int64 qw(int64_to_native);
my $packed = join '', map int64_to_native($_), @ints;
Another option, if you are on a 64bit perl supporting Q but not Q>, is to reorder the bytes yourself:
pack 'C*', reverse unpack 'C*', pack 'Q', $int;
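For the "N"-based approach from the question, and its reverse, a sketch along the following lines may help. It assumes a perl built with 64-bit integers; pack_u64_be and unpack_u64_be are illustrative names, not core functions. The low half is masked explicitly rather than relying on "N" to truncate:

```perl
use strict;
use warnings;

# Split each value into high and low 32-bit halves. "N" is always
# big-endian, so the byte order does not depend on the platform.
sub pack_u64_be {
    my $bytes = '';
    for my $n (@_) {
        $bytes .= pack 'N2', $n >> 32, $n & 0xffffffff;
    }
    return $bytes;
}

# Reverse operation: read 32-bit halves and recombine them in pairs.
sub unpack_u64_be {
    my ($bytes) = @_;
    my @halves = unpack 'N*', $bytes;
    my @ints;
    while (my ($hi, $lo) = splice @halves, 0, 2) {
        push @ints, ($hi << 32) | $lo;
    }
    return @ints;
}

my @ints   = ((0x01234567 << 32) | 0x89abcdef, 42);
my $packed = pack_u64_be(@ints);
print unpack('H*', $packed), "\n";   # 0123456789abcdef000000000000002a
```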

Perl version string: why use EVAL EXPR?

I just noticed this in code generated by Catalyst.pl. It is obviously some sort of unannotated hack. What is the advantage of setting up a version string like this? I can't even figure out what they're trying to do.
our $VERSION = '0.01';
$VERSION = eval $VERSION;
Version numbers are complex in Perl. Here's an excellent overview for those looking for the gory details. It might surprise you how many subtle ways there are to get things wrong...
The direct answer to your question, though, is that different things expect different formats. For CPAN, for example, you care about development versions, which are expressed as a string. At runtime, you care about the version as a number.
Consider the case of $VERSION = "0.01_001". eval converts it to the number 0.01001 correctly.
From perlmodstyle: Version numbering
If you want to release a 'beta' or 'alpha' version of a module but don't want CPAN.pm to list it as most recent use an '_' after the regular version number followed by at least 2 digits, eg. 1.20_01. If you do this, the following idiom is recommended:
$VERSION = "1.12_01";
$XS_VERSION = $VERSION; # only needed if you have XS code
$VERSION = eval $VERSION;
With that trick MakeMaker will only read the first line and thus read the underscore, while the perl interpreter will evaluate the $VERSION and convert the string into a number. Later operations that treat $VERSION as a number will then be able to do so without provoking a warning about $VERSION not being a number.
The eval converts the string "0.001_001" to a number, following the rules for Perl numeric literals (which allow underscores for legibility). The result is the number 0.001001.
Without the eval, the string is converted to a number following the rule for converting strings, which stops at the first non-numeric character.
E.g.: perl -e 'print "0.001_001" + 0'
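The difference between the two conversions can be seen side by side in a short sketch, using the "0.01_001" value from the earlier answer:

```perl
use strict;
use warnings;

my $string = '0.01_001';

# String-to-number conversion stops at the underscore
# (and warns, so the warning is silenced here).
my $numified = do { no warnings 'numeric'; $string + 0 };

# eval parses the string as a numeric literal, where "_" is
# just a digit separator.
my $evaled = eval $string;

print "numified: $numified\n";   # numified: 0.01
print "evaled:   $evaled\n";     # evaled:   0.01001
```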
I may be misremembering this, but I think some automated code parsers like to see the line of code:
our $VERSION = '0.01';
But you really want $VERSION to hold a float instead of a string.
You may want to read this article, I know I am going to.
Oh, dear god, now I remember why I use
our $VERSION = 20100903;
style version numbers. That is just insane. I love Perl, but that is pure, refined, concentrated insanity. I won't try to summarize David Golden's article. You just have to read it and cry.

How to tokenize Perl source code?

I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer that will split them into tokens and return the type of each token. E.g. for the script
print "Hello, World!\n";
it would return something like this:
keyword 5 bytes
whitespace 1 byte
double-quoted-string 17 bytes
semicolon 1 byte
whitespace 1 byte
Which is the best library (preferably written in Perl) for this? It has to be reasonably correct, i.e. it should be able to parse syntactic constructs like qq{{\}}}, but it doesn't have to know about special parsers like Lingua::Romana::Perligata. I know that parsing Perl is Turing-complete, and only Perl itself can do it right, but I don't need absolute correctness: the tokenizer can fail or be incompatible or assume some default in some very rare corner cases, but it should work correctly most of the time. It must be better than the syntax highlighting built into an average text editor.
FYI I tried the PerlLexer in pygments, which works reasonably well for most constructs, except that it cannot find the 2nd print keyword in this one:
print length(<<"END"); print "\n";
String
END
PPI
use PPI;
Yes, only perl can parse Perl, however PPI is the 95% correct solution.
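A minimal sketch of what that might look like, assuming PPI is installed from CPAN (it is not a core module). It feeds the example script from the question to PPI::Tokenizer and prints each token's class and length; the output format is only an approximation of the one requested:

```perl
use strict;
use warnings;
use PPI;   # CPAN module, not part of core perl

# The example script from the question, with a trailing newline.
my $source = qq{print "Hello, World!\\n";\n};

my $tokenizer = PPI::Tokenizer->new(\$source)
    or die "could not create tokenizer";

# all_tokens returns a reference to an array of PPI::Token objects.
my $tokens = $tokenizer->all_tokens
    or die "tokenizing failed";

for my $token (@$tokens) {
    (my $type = ref $token) =~ s/^PPI::Token:://;
    printf "%-20s %d bytes\n", $type, length $token->content;
}
```

PPI names its token classes differently from the list in the question (for example Word rather than keyword), so a mapping step would be needed if those exact labels matter.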

How can I sprintf a big number in Perl?

On a 32-bit Windows platform I have to read some numbers that, unexpectedly, can be as large as 99,999,999,999 (but no larger). Formatting them with sprintf("%011d", $myNum) overflows and outputs -2147483648.
I cannot use the BigInt module, because that would require extensive changes to the code. I cannot format the value as a string with sprintf("%011s", $numero) either, because the minus sign is handled incorrectly.
How can I manage this? Could pack/unpack be of some help?
Try formatting it as a float with no fraction part:
$ perl -v
This is perl, v5.6.1 built for sun4-solaris
...
$ perl -e 'printf "%011d\n", 99999999999'
-0000000001
$ perl -e 'printf "%011.0f\n", 99999999999'
99999999999
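In script form, a small sketch of the same idea. The zero flag and field width carry over from %d to %.0f, including for negative values; the result stays exact only while the value fits in a double's integer range (up to 2**53 on typical builds):

```perl
use strict;
use warnings;

# %.0f formats the value as a floating-point number with no
# fractional digits; the "0" flag and width still apply.
my $positive = sprintf '%011.0f', 99_999_999_999;
my $negative = sprintf '%011.0f', -42;

print "$positive\n";   # 99999999999
print "$negative\n";   # -0000000042
```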
Yes, one of Perl's numeric blind spots is formatting; Perl automatically handles representing numbers as integers or floats pretty well, but then coerces them into
one or the other when the printf numeric formats are used, even when that isn't
appropriate. And printf doesn't really handle BigInts at all (except by treating
them as strings and converting that to a number, with loss of precision).
Using %s instead of %d with any number you aren't sure will be in an appropriate
range is a good workaround, except as you note for negative numbers. To handle
those, you are going to have to write some Perl code.
Floats can work, up to a point.
perl -e "printf qq{%.0f\n}, 999999999999999"
999999999999999
But only up to a point
perl -e "printf qq{%.0f\n}, 9999999999999999999999999999999999999999999999"
9999999999999998663747590131240811450955988992
Bignum doesn't help here.
perl -e "use bignum ; printf qq{%.0f\n}, 9999999999999999999999999999999999999999999999"
9999999999999999931398190359470212947659194368
The problem is printf. (Do you really need printf?)
Could print work?
perl -e "use bignum;print 9999999999999999999999999999999999999999999999"
9999999999999999999999999999999999999999999999
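The same behaviour can be sketched with an explicit Math::BigInt object (a core module). This assumes the digits are available as a string; both print and the %s format go through the object's stringification, so no digits are lost:

```perl
use strict;
use warnings;
use Math::BigInt;

# Passing the digits as a string avoids any loss of precision
# before the object is created.
my $big = Math::BigInt->new('9' x 46);

print $big, "\n";                # print uses the exact digits
my $text = sprintf '%s', $big;   # and so does %s
print "$text\n";
```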
Having said all of that, the nice thing about perl is it's always an option to roll your own.
e.g.
my $in = ...;
my $out = "";
while($in){
my $chunk=$in & 0xf;
$in >>= 4;
$out = sprintf("%x",$chunk).$out;
}
print "0x$out\n";
I'm no Perl expert, and maybe I'm missing some sort of automatic handling of bignums here, but isn't this simply a case of integer overflow? A 32-bit integer can't hold numbers that are as big as 99,999,999,999.
Anyway, I get the same result with Perl v5.8.8 on my 32-bit Linux machine, and it seems that printf with "%d" doesn't handle larger numbers.
I think your copy of Perl must be broken. This is from Cygwin's version (5.10):
pax$ perl -e 'printf("%011d\n", 99999999999);'
99999999999
pax$ perl -v
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 6 registered patches, see perl -V for more detail)
Copyright 1987-2007, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
What version are you running (output of perl -v)?
You may have to get a 64-bit enabled version of Perl [and possibly a new 64-bit production machine] (note the "cygwin-thread-multi-64int" in my output). That will at least avoid the need for changing the code.
I'm stating this on the basis that you don't want to change the code greatly (i.e., you fear breaking things). The solution of new hardware, whilst a little expensive, will almost certainly not require you to change the software at all. It depends on your priorities.
Another possibility is that Perl itself may be storing the number correctly but just displaying it wrong due to a printf() foible. In that case, you may want to try:
$million = 1000000;
$bignum = 99999999999;
$firstbit = int($bignum / $million);
$secondbit = $bignum - $firstbit * $million;
printf ("%d%06d\n",$firstbit,$secondbit);
Put that in a function and call the function to return a string, such as:
sub big_honkin_number($) {
my $million = 1_000_000;
my $bignum = shift;
my $firstbit = int($bignum / $million); # leading digits
my $secondbit = $bignum - $firstbit * $million; # last six digits
return sprintf("%d%06d\n", $firstbit, $secondbit);
}
printf ("%s", big_honkin_number (99_999_999_999));
Note that I tested this, but only on a 64-bit platform. You'll need to run your own test on 32-bit, but you can use whatever scaling factor you want (including more than two segments if need be).
Update: That big_honkin_number() trick works fine on a 32-bit Perl so it looks like it is just the printf() functions that are stuffing you up:
pax@pax-desktop:~$ perl -v
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
pax@pax-desktop:~$ perl qq.pl
99999999999
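The split-and-join trick does not by itself cover negative values or the zero-padded width of the %011d format from the question. A possible extension, offered as a sketch only: format_wide_int is a hypothetical name, and the padding rule (sign first, then zeros) is an assumption modelled on how %0Nd behaves.

```perl
use strict;
use warnings;

# format_wide_int is an illustrative helper, not a library function.
# It formats the magnitude in two pieces small enough for %d on a
# 32-bit perl, then adds the sign and zero padding by hand.
sub format_wide_int {
    my ($n, $width) = @_;
    my $sign = $n < 0 ? '-' : '';
    my $abs  = abs $n;
    my $high = int($abs / 1_000_000);        # leading digits
    my $low  = $abs - $high * 1_000_000;     # last six digits
    my $digits = $high
        ? sprintf('%d%06d', $high, $low)
        : sprintf('%d', $low);
    my $pad = $width - length($sign) - length($digits);
    $pad = 0 if $pad < 0;
    return $sign . ('0' x $pad) . $digits;
}

print format_wide_int(99_999_999_999, 11), "\n";   # 99999999999
print format_wide_int(-42, 11), "\n";              # -0000000042
```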