Perl pack/unpack and length of binary string - perl

Consider this short example:
$a = pack("d",255);
print length($a)."\n";
# Prints 8
$aa = pack("ddddd", 255,123,0,45,123);
print length($aa)."\n";
# Prints 40
#unparray = unpack("d "x5, $aa);
print scalar(#unparray)."\n";
# Prints 5
print length($unparray[0])."\n"
# Prints 3
printf "%d\n", $unparray[0] '
# Prints 255
# As a one-liner:
# perl -e '$a = pack("d",255); print length($a)."\n"; $aa = pack("dd", 255,123,0,45,123); print length($aa)."\n"; #unparray = unpack("d "x5, $aa); print scalar(#unparray)."\n"; print length($unparray[0])."\n"; printf "%d\n", $unparray[0] '
Now, I'd expect a double-precision float to be eight bytes, so the first length($a) is correct. But why is the length after the unpack (length($unparray[0])) reporting 3 - when I'm trying to go back the exact same way (double-precision, i.e. eight bytes) - and the value of the item (255) is correctly preserved?

By unpacking what you packed, you've gotten back the original values, and the first value is 255. The stringification of 255 is "255", which is 3 characters long, and that's what length tells you.

Related

bug in perl with prefixed 0 numbers

when I try to execute the following(01434.210 instead of 1434.210)
$val=22749.220-(21315.010+01434.210)
print $val
I get these output
output 638.207900000001
But according to me output must be 0.
What am I missing?
A leading 0 in a literal number makes Perl interpret the value I'm base 8:
123 # 123, in decimal
0123 # 123 in octal, but 83 in decimal
This isn't the same for strings converted to numbers. In those Perl ignores the leading 0s. The string-to-number conversion only deals in base-10:
"123" + 0 # 123
"0123" + 0 # still 123
In your example in the comment, you convert a literal number to a string with a leading zero. When you convert that string back to its numeric form you get the same value you started with:
$val=sprintf("%05d",1434); # converting 1434 to the string "01434"
print $val; print "\n"; # still a string
print $val+21315; # "01434" + 21315 => 1434 + 21315
print "\n";
print 01434+21315; # oct(1434) + 21315 => 796 + 21315
The leading zero notation helps with certain builtins that typically use octal numbers, such as those that deal with unix permissions:
chmod 0644, #files

Difference between $var = 500 and $var = '500'

In Perl, what is the difference between
$status = 500;
and
$status = '500';
Not much. They both assign five hundred to $status. The internal format used will be different initially (IV vs PV,UTF8=0), but that's of no importance to Perl.
However, there are things that behave different based on the choice of storage format even though they shouldn't. Based on the choice of storage format,
JSON decides whether to use quotes or not.
DBI guesses the SQL type it should use for a parameter.
The bitwise operators (&, | and ^) guess whether their operands are strings or not.
open and other file-related builtins encode the file name using UTF-8 or not. (Bug!)
Some mathematical operations return negative zero or not.
As already #ikegami told not much. But remember than here is MUCH difference between
$ perl -E '$v=0500; say $v'
prints 320 (decimal value of 0500 octal number), and
$ perl -E '$v="0500"; say $v'
what prints
0500
and
$ perl -E '$v=0900; say $v'
what dies with error:
Illegal octal digit '9' at -e line 1, at end of line
Execution of -e aborted due to compilation errors.
And
perl -E '$v="0300";say $v+1'
prints
301
but
perl -E '$v="0300";say ++$v'
prints
0301
similar with 0x\d+, e.g:
$v = 0x900;
$v = "0x900";
There is only a difference if you then use $var with one of the few operators that has different flavors when operating on a string or a number:
$string = '500';
$number = 500;
print $string & '000', "\n";
print $number & '000', "\n";
output:
000
0
To provide a bit more context on the "not much" responses, here is a representation of the internal data structures of the two values via the Devel::Peek module:
user#foo ~ $ perl -MDevel::Peek -e 'print Dump 500; print Dump "500"'
SV = IV(0x7f8e8302c280) at 0x7f8e8302c288
REFCNT = 1
FLAGS = (PADTMP,IOK,READONLY,pIOK)
IV = 500
SV = PV(0x7f8e83004e98) at 0x7f8e8302c2d0
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK)
PV = 0x7f8e82c1b4e0 "500"\0
CUR = 3
LEN = 16
Here is a dump of Perl doing what you mean:
user#foo ~ $ perl -MDevel::Peek -e 'print Dump ("500" + 1)'
SV = IV(0x7f88b202c268) at 0x7f88b202c270
REFCNT = 1
FLAGS = (PADTMP,IOK,READONLY,pIOK)
IV = 501
The first is a number (the integer between 499 and 501). The second is a string (the characters '5', '0', and '0'). It's not true that there's no difference between them. It's not true that one will be converted immediately to the other. It is true that strings are converted to numbers when necessary, and vice-versa, and the conversion is mostly transparent, but not completely.
The answer When does the difference between a string and a number matter in Perl 5 covers some of the cases where they're not equivalent:
Bitwise operators treat numbers numerically (operating on the bits of the binary representation of each number), but they treat strings character-wise (operating on the bits of each character of each string).
The JSON module will output a string as a string (with quotes) even if it's numeric, but it will output a number as a number.
A very small or very large number might stringify differently than you expect, whereas a string is already a string and doesn't need to be stringified. That is, if $x = 1000000000000000 and $y = "1000000000000000" then $x might stringify to 1e+15. Since using a variable as a hash key is stringifying, that means that $hash{$x} and $hash{$y} may be different hash slots.
The smart-match (~~) and given/when operators treat number arguments differently from numeric strings. Best to avoid those operators anyway.
There are different internally:)
($_ ^ $_) ne '0' ? print "$_ is string\n" : print "$_ is numeric\n" for (500, '500');
output:
500 is numeric
500 is string
I think this perfectly demonstrates what is going on.
$ perl -MDevel::Peek -e 'my ($a, $b) = (500, "500");print Dump $a; print Dump $b; $a.""; $b+0; print Dump $a; print Dump $b'
SV = IV(0x8cca90) at 0x8ccaa0
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 500
SV = PV(0x8acc20) at 0x8ccad0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x8c5da0 "500"\0
CUR = 3
LEN = 16
SV = PVIV(0x8c0f88) at 0x8ccaa0
REFCNT = 1
FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
IV = 500
PV = 0x8d3660 "500"\0
CUR = 3
LEN = 16
SV = PVIV(0x8c0fa0) at 0x8ccad0
REFCNT = 1
FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
IV = 500
PV = 0x8c5da0 "500"\0
CUR = 3
LEN = 16
Each scalar (SV) can have string (PV) and or numeric (IV) representation. Once you use variable with only string representation in any numeric operation and one with only numeric representation in any string operation they have both representations. To be correct, there can be also another number representation, the floating point representation (NV) so there are three possible representation of scalar value.
Many answers already to this question but i'll give it a shot for the confused newbie:
my $foo = 500;
my $bar = '500';
As they are, for practical pourposes they are the "same". The interesting part is when you use operators.
For example:
print $foo + 0;
output: 500
The '+' operator sees a number at its left and a number at its right, both decimals, hence the answer is 500 + 0 => 500
print $bar + 0;
output: 500
Same output, the operator sees a string that looks like a decimal integer at its left, and a zero at its right, hence 500 + 0 => 500
But where are the differences?
It depends on the operator used. Operators decide what's going to happen. For example:
my $foo = '128hello';
print $foo + 0;
output: 128
In this case it behaves like atoi() in C. It takes biggest numeric part starting from the left and uses it as a number. If there are no numbers it uses it as a 0.
How to deal with this in conditionals?
my $foo = '0900';
my $bar = 900;
if( $foo == $bar)
{print "ok!"}
else
{print "not ok!"}
output: ok!
== compares the numerical value in both variables.
if you use warnings it will complain about using == with strings but it will still try to coerce.
my $foo = '0900';
my $bar = 900;
if( $foo eq $bar)
{print "ok!"}
else
{print "not ok!"}
output: not ok!
eq compares strings for equality.
You can try "^" operator.
my $str = '500';
my $num = 500;
if ($num ^ $num)
{
print 'haha\n';
}
if ($str ^ $str)
{
print 'hehe\n';
}
$str ^ $str is different from $num ^ $num so you will get "hehe".
ps, "^" will change the arguments, so you should do
my $temp = $str;
if ($temp ^ $temp )
{
print 'hehe\n';
}
.
I usually use this operator to tell the difference between num and str in perl.

How to convert hex to string of hex

I have a problem understanding and using the 'vec' keyword.
I am reading a logpacket in which values are stored in little endian hexadecimal. In my code, I have to unpack the different bytes into scalars using the unpack keyword.
Here's an example of my problem:
my #hexData1 = qw(50 65);
my $data = pack ('C*', #hexData1);
my $x = unpack("H4",$data); # At which point the hexadecimal number became a number
print $x."\n";
#my $foo = sprintf("%x", $foo);
print "$_-> " . vec("\x65\x50", $_, 1) . ", " for (0..15); # This works.
print "\n";
But I want to use the above statement in the way below. I don't want to send a string of hexadecimal in quotes. I want to use the scalar array of hex $x. But it won't work. How do I convert my $x to a hexadecimal string. This is my requirement.
print "$_-> " . vec($x, $_, 1).", " for (0..15); # This doesn't work.
print "\n";
My final objective is to read the third bit from the right of the two byte hexadecimal number.
How do I use the 'vec' command for that?
You are making the mistake of unpacking $data into $x before using it in a call to vec. vec expects a string, so if you supply a number it will be converted to a string before being used. Here's your code
my #hexData1 = qw(50 65);
my $data= pack ('C*', #hexData1);
The C pack format uses each value in the source list as a character code. It is the same as calling chr on each value and concatenating them. Unfortunately your values look like decimal, so you are getting chr(50).chr(65) or "2A". Since your values are little-endian, what you want is chr(0x65).chr(0x50) or "\x65\x50", so you must write
my $data= pack ('(H2)*', reverse #hexData1);
which reverses the list of data (to account for it being little-endian) and packs it as if it was a list of two-digit hex strings (which, fortunately, it is).
Now you have done enough. As I say, vec expects a string so you can write
print join ' ', map vec($data, $_, 1), 0 .. 15;
print "\n";
and it will show you the bits you expect. To extract the the 3rd bit from the right (assuming you mean bit 13, where the last bit is bit 15) you want
print vec $data, 13, 1;
First, get the number the bytes represent.
If you start with "\x50\x65",
my $num = unpack('v', "\x50\x65");
If you start with "5065",
my $num = unpack('v', pack('H*', "5065"));
If you start with "50","65",
my $num = unpack('v', pack('H*', join('', "50","65"));
Then, extract the bit you want.
If you want bit 10,
my $bit = ($num >> 10) & 1;
If you want bit 2,
my $bit = ($num >> 2) & 1;
(I'm listing a few possibilities because it's not clear to me what you want.)

How do I unpack a double-precision value in Perl?

From this question:
bytearray - Perl pack/unpack and length of binary string - Stack Overflow
I've learned that #unparray = unpack("d "x5, $aa); in the snippet below results with string items in the unparray - not with double precision numbers (as I expected).
Is it possible to somehow obtain an array of double-precision values from the $aa bytestring in the snippet below?:
$a = pack("d",255);
print length($a)."\n";
# prints 8
$aa = pack("ddddd", 255,123,0,45,123);
print length($aa)."\n";
# prints 40
#unparray = unpack("d "x5, $aa);
print scalar(#unparray)."\n";
# prints 5
print length($unparray[0])."\n"
# prints 3
printf "%d\n", $unparray[0] '
# prints 255
# one liner:
# perl -e '$a = pack("d",255); print length($a)."\n"; $aa = pack("ddddd", 255,123,0,45,123); print length($aa)."\n"; #unparray = unpack("d "x5, $aa); print scalar(#unparray)."\n"; print length($unparray[0])."\n" '
Many thanks in advance for any answers,
Cheers!
What makes you think it's not stored as a double?
use feature qw( say );
use Config qw( %Config );
use Devel::Peek qw( Dump );
my #a = unpack "d5", pack "d5", 255,123,0,45,123;
say 0+#a; # 5
Dump $a[0]; # NOK (floating point format)
say $Config{nvsize}; # 8 byte floats on this build
Sorry, but you've misunderstood hobbs' answer to your earlier question.
$unparray[0] is a double-precision floating-point value; but length is not like (say) C's sizeof operator, and doesn't tell you the size of its argument. Rather, it converts its argument to a string, and then tells you the length of that string.
For example, this:
my $a = 3.0 / 1.5;
print length($a), "\n";
will print this:
1
because it sets $a to 2.0, which gets stringified as 2, which has length 1.

Perl: utf8::decode vs. Encode::decode

I am having some interesting results trying to discern the differences between using Encode::decode("utf8", $var) and utf8::decode($var). I've already discovered that calling the former multiple times on a variable will eventually result in an error "Cannot decode string with wide characters at..." whereas the latter method will happily run as many times as you want, simply returning false.
What I'm having trouble understanding is how the length function returns different results depending on which method you use to decode. The problem arises because I am dealing with "doubly encoded" utf8 text from an outside file. To demonstrate this issue, I created a text file "test.txt" with the following Unicode characters on one line: U+00e8, U+00ab, U+0086, U+000a. These Unicode characters are the double-encoding of the Unicode character U+8acb, along with a newline character. The file was encoded to disk in UTF8. I then run the following perl script:
#!/usr/bin/perl
use strict;
use warnings;
require "Encode.pm";
require "utf8.pm";
open FILE, "test.txt" or die $!;
my #lines = <FILE>;
my $test = $lines[0];
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
my #unicode = (unpack('U*', $test));
print "Unicode:\n#unicode\n";
my #hex = (unpack('H*', $test));
print "Hex:\n#hex\n";
print "==============\n";
$test = Encode::decode("utf8", $test);
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
#unicode = (unpack('U*', $test));
print "Unicode:\n#unicode\n";
#hex = (unpack('H*', $test));
print "Hex:\n#hex\n";
print "==============\n";
$test = Encode::decode("utf8", $test);
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
#unicode = (unpack('U*', $test));
print "Unicode:\n#unicode\n";
#hex = (unpack('H*', $test));
print "Hex:\n#hex\n";
This gives the following output:
Length: 7
utf8 flag:
Unicode:
195 168 194 171 194 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
232 171 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 2
utf8 flag: 1
Unicode:
35531 10
Hex:
e8ab8b0a
This is what I would expect. The length is originally 7 because perl thinks that $test is just a series of bytes. After decoding once, perl knows that $test is a series of characters that are utf8-encoded (i.e. instead of returning a length of 7 bytes, perl returns a length of 4 characters, even though $test is still 7 bytes in memory). After the second decoding, $test contains 4 bytes interpreted as 2 characters, which is what I would expect since Encode::decode took the 4 code points and interpreted them as utf8-encoded bytes, resulting in 2 characters. The strange thing is when I modify the code to call utf8::decode instead (replace all $test = Encode::decode("utf8", $test); with utf8::decode($test))
This gives almost identical output, only the result of length differs:
Length: 7
utf8 flag:
Unicode:
195 168 194 171 194 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
232 171 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
35531 10
Hex:
e8ab8b0a
It seems like perl first counts the bytes before decoding (as expected), then counts the characters after the first decoding, but then counts the bytes again after the second decoding (not expected). Why would this switch happen? Is there a lapse in my understanding of how these decoding functions work?
Thanks,Matt
You are not supposed to use the functions from the utf8 pragma module. Its documentation says so:
Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
Always use the Encode module, and also see the question Checklist for going the Unicode way with Perl. unpack is too low-level, it does not even give you error-checking.
You are going wrong with the assumption that the octects E8 AB 86 0A are the result of UTF-8 double-encoding the characters 諆 and newline. This is the representation of a single UTF-8 encoding of these characters. Perhaps the whole confusion on your side stems from that mistake.
length is unappropriately overloaded, at certain times it determines the length in characters, or the length in octets. Use better tools such as Devel::Peek.
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Devel::Peek qw(Dump);
use Encode qw(decode);
my $test = "\x{00e8}\x{00ab}\x{0086}\x{000a}";
# or read the octets without implicit decoding from a file, does not matter
Dump $test;
# FLAGS = (PADMY,POK,pPOK)
# PV = 0x8d8520 "\350\253\206\n"\0
$test = decode('UTF-8', $test, Encode::FB_CROAK);
Dump $test;
# FLAGS = (PADMY,POK,pPOK,UTF8)
# PV = 0xc02850 "\350\253\206\n"\0 [UTF8 "\x{8ac6}\n"]
Turns out this was a bug: https://rt.perl.org/rt3//Public/Bug/Display.html?id=80190.