Why does perl's unpack() think that the second argument is a string?

And how do I fix it?
If I do the following:
print unpack("B8", 7) . "\n";
I get the following output:
00110111
The expected output is of course 00000111. I've checked, and it's giving me the bits of the ASCII character "7", i.e. the string. I'm able to fix it poorly by wrapping the 7 in a chr():
print unpack("B8", chr(7)) . "\n";
Of course, this will only work if my input remains below 255, and I suspect it may go into the low thousands (I'll make the "B8" dynamic too).
I know I'm being obtuse, but I've read the docs on this and they make no mention of it. Its reverse function, pack(), seems to interpret the second argument correctly.

unpack unpacks a string of bytes into scalars with the values represented by those bytes.
$ perl -E'say for unpack("nB8", "\x12\x34\x56")'
4660
01010110
You're looking for
sprintf("%08B", 7)
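Since the question expects values in the low thousands and a dynamic width, here is a minimal sketch of the "pack first, then unpack" route next to sprintf. The helper name to_bits is made up for illustration, and the sketch assumes non-negative integers that fit in 32 bits.

```perl
use strict;
use warnings;

# Hypothetical helper: bit string of the low $width bits of $n (width <= 32).
sub to_bits {
    my ($n, $width) = @_;
    # unpack wants a byte string, so pack the integer into 4 big-endian bytes first.
    my $bits32 = unpack("B32", pack("N", $n));
    return substr($bits32, -$width);
}

printf "%s\n", to_bits(7, 8);        # 00000111
printf "%s\n", to_bits(1000, 16);    # 0000001111101000

# sprintf gives the same string without the pack/unpack round trip;
# '*' takes the field width from the argument list.
printf "%s\n", sprintf("%0*b", 16, 1000);   # 0000001111101000
```

For this particular job sprintf is probably the simpler choice; the pack/unpack form mainly shows what unpack expects as its input.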

Related

Why does this line return sum of integers 1-10?

I'd like to understand how unpack is returning the sum in the given perl one-liner.
I've looked at pack man page and mostly understood that it is simply formatting the given array into a scalar of ten doubles.
However, I couldn't find proper documentation for unpack with %123. Looking for help here.
print unpack "%123d*" , pack( "d*", (1..10));
This line correctly outputs 55 which is 1+2+3+...+10.
From perldoc -f unpack:
In addition to fields allowed in pack(), you may prefix a field with a % to indicate that you want a <number>-bit checksum of the items instead of the items themselves.
Thus %123d* means to add up all the input numbers 1..10 and then keep only the low 123 bits of the result to construct the "<number>-bit checksum". Note that %8d* or just %d* (which is equivalent to %16d*) would suffice too, given that the sum is small enough.
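A small sketch of the checksum prefix may make this more concrete. The helper name checksum is hypothetical; it just builds a template such as %32C* and hands it to unpack.

```perl
use strict;
use warnings;

# Hypothetical helper: <bits>-bit checksum of $data unpacked with $template.
sub checksum {
    my ($bits, $template, $data) = @_;
    return unpack("%${bits}${template}", $data);
}

my $packed = pack("d*", 1 .. 10);               # ten native doubles in one string
printf "%s\n", checksum(123, "d*", $packed);    # 55

# The same prefix works with other formats, e.g. summing byte values:
printf "%s\n", checksum(32, "C*", "abc");       # 294 (97 + 98 + 99)
printf "%s\n", checksum(8,  "C*", "abc");       # 38  (294 truncated to 8 bits)
```

The last line shows where the bit count starts to matter: once the sum no longer fits, only the low bits survive.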

Why are @@, @!, @, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like @@ is probably not a good idea.
When using double quotes (or the qq operator), scalars and arrays are interpolated:
$v = 5;
say "$v"; # prints: 5
$@ = 6;
say "$@"; # prints: 6
@a = (1,2);
say "@a"; # prints: 1 2
Yet, with array names of the form @+special char like @@, @!, @,, @%, @; etc., the array isn't interpolated:
@; = (1,2);
say "@;"; # prints nothing
say @; ; # prints: 1 2
So here is my question: does anyone know why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push@a,1;say"@a"}{say@a' shorter by doing instead perl -nE 'push@;,1;say"@;"}{say@', because that last ; converts say@ to say@;. Well, actually I can't do that because @; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)
Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for @- and @+, added in 5.6 (2000)).
The code in S_scan_const only interprets @ as the start of an array if the following character is
a word character (e.g. @x, @_, @1), or
a : (e.g. @::foo), or
a ' (e.g. @'foo (this is the old syntax for ::)), or
a { (e.g. @{foo}), or
a $ (e.g. @$foo), or
a + or - (the arrays @+ and @-), but not in regexes.
As you can see, the only punctuation arrays that are supported are @- and @+, and even then not inside a regex. Initially no punctuation arrays were supported; @- and @+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c@-\c_]/ work; it used to interpolate @- first.)
There is a workaround: Because @{ is treated as the start of an array variable, the syntax "@{;}" works (but that doesn't help your golf code because it makes the code longer).
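To illustrate the workaround, here is a short sketch that relies on the rules listed above. The sub names via_braces and via_join are invented for this example; the explicit join is just an alternative that avoids interpolation altogether.

```perl
use strict;
use warnings;

@; = (1, 2);    # punctuation variables are exempt from 'strict vars'

# Braced form: the array sigil followed by '{' is accepted as the start of an array.
sub via_braces { return "@{;}" }

# Explicit join: no string interpolation involved at all.
sub via_join   { return join(" ", @;) }

printf "%s\n", via_braces();
printf "%s\n", via_join();      # 1 2

# The two special-cased punctuation arrays interpolate directly (outside a regex):
"abc" =~ /b/;
printf "%s\n", "@- @+";         # 1 2 (match start and end offsets)
```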
Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

Why does `$v = () = split` return 1?

perldoc says "a list assignment in scalar context returns the number of elements on the right-hand side of the list assignment" but when I try this code:
perl -e '$_="aaaaa";print $v=(()=split //)'
The output is 1 which makes me confused. (The answer I expect is 5.)
Can anybody explain this?
According to split documentation:
When assigning to a list, if LIMIT is omitted, or zero, Perl supplies
a LIMIT one larger than the number of variables in the list <...>
Since you specify empty list, split only returns 1 result and this number of results is exactly what ends in your variable.
split has some kind of crazy ultra-magic in it that allows it to know when it is on the right hand side of an assignment that has a list on the left hand side, and adjusts its behavior according to the number of items in that list.
This is described in perlfunc as being done "to avoid unnecessary work", but you've found an observable difference in behavior caused by that optimization.
To see some evidence of what happened, run your script through Deparse like this:
perl -MO=Deparse -e '$_="aaaaa";print $v=(()=split //)'
Update: I went looking for the code that implements this, and it's not where I expected it to be. Actually the optimization is performed by the assignment operator (op.c:Perl_newASSIGNOP) . split doesn't know that much about its context.
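If the goal is simply to count the fields, a sketch like the following sidesteps the implicit LIMIT. The sub names are hypothetical. The first form assigns to an array, which has no fixed number of elements, and the second calls split in scalar context.

```perl
use strict;
use warnings;

# Assigning to an array: no fixed element count, so no LIMIT is implied.
sub count_via_array {
    my ($str) = @_;
    my @chars = split //, $str;
    return scalar @chars;
}

# split in scalar context returns the number of fields directly.
sub count_via_scalar {
    my ($str) = @_;
    my $count = split //, $str;
    return $count;
}

printf "%d\n", count_via_array("aaaaa");    # 5
printf "%d\n", count_via_scalar("aaaaa");   # 5

# The form from the question; the asker reports 1 for it, because the
# empty list on the left makes perl supply a LIMIT of 1.
my $count_empty_list = () = split //, "aaaaa";
printf "%d\n", $count_empty_list;
```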
Why are you assigning to an empty array? the ()=(split //) bit. That's going to end up with - um, well, a mess. Or, in your case, an array with a size of one with not much in it.
Also, that's excessively obscure. perl has a sad reputation for being write-only, and all that modifying $_ and using it doesn't help others - or you - understand what is going on.
Try something like
perl -e '$v = (split //, "aaaaa"); print "$v\n"'
or, if you wish to replicate the behavior of your test:
perl -e '$v = () = (split //, "aaaaa"); print "$v\n"'
Yes, but :
perl -e '$_="aaaaa";print $v=(split //)'
gives 5, as well as
perl -e '$_="aaaaa";print $v=(@x=split //)'
Maybe your left-value () is dropping additional array elements ?
edit : by the way :
perl -e '$_="aaaaa";print $v=(($x,$y)=split //)'
returns 3, because the right side of the $v=... command gives :
( $x , $y , ( a , a , a ) )
So in your original case, ()=split // returns ( ( a , a , a , a , a ) ) (which has only one element)
Edit : bad array notation, and the result was wrong because of a last-minute change to my test case

Difference between "52" and 52?

Perl is not as easy as I thought; I find it quite confusing. I've just moved on to operators and wrote some code, but I can't figure out how the compiler treats it.
$in = "42" ;
$out = "56"+32+"good";
print $out;
The output of the code above is 88. Where did the "good" go? Now let's look at another one.
$in ="42";
$out="good52"+32;
print $out ;
Here the output is 32. Where did the "good" that I stored in $out go, and what about the 52 inside the quotes? Why does the compiler print just 32 and not the remaining text? My other question is about
$in=52;
$in="52";
Both seem to do the same thing: "52" doesn't behave as text, because "52"+32 gives 84. What is happening? And
$in = "hello";
$in = hello;
Do both do the same thing, or do they differ? When I print them, they give the same output. It's so confusing: if "52" and 52, and "hello" and hello, do the same job, why were the quotes introduced at all? I just need an explanation of what is happening in the code above.
In Perl, + is a numeric operator. It tries to interpret its two operands as numbers. 51 is the number 51. "51" is a string containing two digits, and the + operator tries to convert the string to a number, which is 51, and uses it in the calculation. "hello" is a string containing five letters, and when the + operator tries to interpret that as a number, it equates to 0 (zero).
Your first example is thus:
$out = "56"+32+"good";
which evaluates just like:
$out = 56 + 32 + 0;
Your print then converts that to a string on output, and yields 88.
In perl, the + operator will treat its arguments as numbers, and try to convert anything that is not a number to a number. The . (dot) operator is used to join strings: it will try to convert its operands to strings if they aren't already strings.
If you put:
use strict;
use warnings;
At the top of your script, you would get warnings such as:
Argument "good" isn't numeric in addition (+) at ...
Argument "good52" isn't numeric in addition (+) at ...
Perl automatically reassigns a string value to numeric, if possible. So "42" + 10 actually becomes 52. But it cannot do that with a proper string value, such as "good".
In perl, a string in a numerical context (like when you use a + operator) is converted to a number.
In perl, you can concatenate string using the . (dot) operator, not +.
If you use +, perl will try and interpret all of the operands as numbers. This works well for strings that are number representations, otherwise you get 0. This explains what you see.
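The difference between the two operators can be seen side by side in a short sketch. The subs add and concat are made-up wrappers around + and ., and the numeric warnings are switched off only to keep the demonstration quiet.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration

sub add    { return $_[0] + $_[1] }   # numeric context: operands become numbers
sub concat { return $_[0] . $_[1] }   # string context: operands become strings

my @results = (
    add(add("56", 32), "good"),   # 88   ("good" counts as 0)
    add("good52", 32),            # 32   (no leading digits, so "good52" is 0)
    add("52", 32),                # 84   ("52" converts cleanly to 52)
    concat("52", 32),             # 5232 (joined as text)
);
print "$_\n" for @results;
```

So the value keeps whatever form it was given, and it is the operator that decides whether it is read as a number or as text.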
$in=52;
$in="52";
Both seem to do the same thing: "52" doesn't behave as text, because "52"+32 gives 84.
The problem here is not with the variable definition. One is a string and the other a number. But when you use the string in a numerical expression (+), it will be converted to a number.
About your second question:
$in = "hello" defines a string, as you expect;
$in = hello; will just copy the symbol hello (however it is defined) on to your variable. This is actually not "strict" perl and if you set use strict; in your file, perl will complain about it.
First off, give this a read.
Your problem is that the + is a mathematical addition, which doesn't work on strings. If you use that, Perl will assume that you're working with numbers and therefore discard anything that isn't.
To concatenate strings, use .:
$str = "blah " . "blah " . "blah";
As far as the difference between "52" and 52 goes, there isn't one. Since nothing (commands, comments, etc.) in Perl can start with numbers, the compiler doesn't need the quotes to know what to do.

Can you explain the bits I'm getting from unpack?

I'm relatively inexperienced with Perl, but my question concerns the unpack function when getting the bits for a numeric value. For example:
my $bits = unpack("b*", 1);
print $bits;
This results in 10001100 being printed, which is 140 in decimal. In the reverse order it's 49 in decimal. Any other values I've tried seem to give the incorrect bits.
However, when I run $bits through pack, it produces 1 again. Is there something I'm missing here?
It seems that I jumped to conclusions when I thought my problem was solved. Maybe I should briefly explain what it is I'm trying to do.
I need to convert an integer value that could be as big as 24 bits long (the point being that it could be bigger than one byte) into a bit string. This much can be accomplished using unpack and pack as suggested by @ikegami, but I also need to find a way to convert that bit string back into its original integer (not a string representation of it).
As I mentioned, I'm relatively inexperienced with Perl, and I've been trying with no success.
I found what seems to be an optimal solution:
my $bits = sprintf("%032b", $num);
print "$bits\n";
my $orig = unpack("N", pack("B32", substr("0" x 32 . $bits, -32)));
print "$orig\n";
This might be obvious, but the other answers haven't pointed it out explicitly: The second argument in unpack("b*", 1) is being typecast to the string "1", which has an ASCII value of 31 in hex (with the most significant nibble first).
The corresponding binary would be 00110001, which is reversed to 10001100 in your output because you used "b*" instead of "B*". These correspond to the opposite "endian" forms of the binary representation. "Endian-ness" is just whether the most-significant bits go at the start or the end of the binary representation.
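The following sketch puts the two bit orders next to each other, first for the character "1" (what unpack actually receives when handed the number 1) and then for a byte with the value 1. The sub names are invented for the example, and the outputs assume an ASCII platform.

```perl
use strict;
use warnings;

sub bits_desc { return unpack("B*", $_[0]) }   # "B": descending bit order in each byte
sub bits_asc  { return unpack("b*", $_[0]) }   # "b": ascending bit order in each byte

printf "%d 0x%02x\n", ord("1"), ord("1");   # 49 0x31

# The number 1 is stringified to "1" before unpack sees it:
printf "%s\n", bits_desc(1);                # 00110001
printf "%s\n", bits_asc(1);                 # 10001100

# A single byte whose value is 1:
printf "%s\n", bits_desc(chr(1));           # 00000001
printf "%s\n", bits_asc(chr(1));            # 10000000
```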
Yes, you're missing that different machines support different "endianness". And Perl is treating 1 like '1' so ( 0x31 ). So, you're seeing 1 -> 1000 (in ascending order) and 3 -> 1100.
"Wrong" depends on perspective and whether or not you gave Perl enough information to know what encoding and endianness you wanted.
From pack:
b A bit string (ascending bit order inside each byte, like vec()).
B A bit string (descending bit order inside each byte).
I think this is what you want:
unpack( 'B*', chr(1))
You're trying to convert an integer to binary and then back. While you can do that with pack and then unpack, the better way is to use sprintf or printf with the %b format:
my $int = 5;
my $bits = sprintf "%024b", $int;
print "$bits\n";
To go the other way (converting a string of 0s & 1s to an integer), the best way is to use the oct function with a 0b prefix:
my $orig = oct("0b$bits");
print "$orig\n";
As the others explained, unpack expects a string to unpack, so if you have an integer, you first have to pack it into a string. The %b format expects an integer to begin with.
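Put together, the round trip for a value of up to 24 bits might look like the sketch below. The names int_to_bits and bits_to_int are hypothetical, and the sketch assumes non-negative integers.

```perl
use strict;
use warnings;

# Integer to zero-padded bit string; '*' takes the width from the argument list.
sub int_to_bits {
    my ($n, $width) = @_;
    return sprintf("%0*b", $width, $n);
}

# Bit string back to an integer; oct() understands the 0b prefix.
sub bits_to_int {
    my ($bits) = @_;
    return oct("0b$bits");
}

my $bits = int_to_bits(1000, 24);
my $orig = bits_to_int($bits);
print "$bits\n";   # 000000000000001111101000
print "$orig\n";   # 1000
```

The value that comes back is a real number rather than a string of digits, so it can go straight into arithmetic.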
If you need to do a lot of this on bytes, and speed is crucial, you could build a lookup table:
my @binary = map { sprintf '%08b', $_ } 0 .. 255;
print $binary[$int]; # Assuming $int is between 0 and 255
The ord(1) is 49. You must want something like sprintf("%064b", 1), although that does seem like overkill.
You didn't specify what you expect. I'm guessing you're expecting 00000001.
That's the correct bits for the byte you provided, at least on non-EBCDIC systems. Remember, the input of unpack is a string (mostly strings of bytes). Perhaps you wanted
unpack('b*', pack('C', 1))
Update: As others have pointed out, the above gives 10000000. For 00000001, you'd use
unpack('B*', pack('C', 1)) # 00000001
You want "B" instead of "b".
$ perl -E'say unpack "b*", "1"'
10001100
$ perl -E'say unpack "B*", "1"'
00110001