View Perl Variables as Bytes/Bits - perl

Disclaimer: It's been ages since I've done any perl, so if I'm asking/saying something stupid please correct me.
Is it possible to view a byte/bit representation of a perl variable? That is, if I say something like
my $foo = 'a';
I know (think?) the computer sees $foo as something like
0b1100001
Is there a way to get perl to print out the binary representation of a variable?
(Not asking for any practical purpose, just tinkering around with an old friend and trying to understand it more deeply than I did in 1997.)

Sure, using unpack:
print unpack "B*", $foo;
Example:
% perl -e 'print unpack "B*", "bar";'
011000100110000101110010
The perldoc pages for pack and perlpacktut give a nice overview about converting between different representations.
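To make the conversion concrete, here is a minimal sketch of the round trip between a string and its bit string. The variable names are only illustrative; pack and unpack with the "B*" and "H*" templates are the standard built-ins.

```perl
use strict;
use warnings;

my $foo = 'a';

# "B*" unpacks every byte as eight bits, most significant bit first.
my $bits = unpack 'B*', $foo;          # "01100001"

# pack with the same template is the inverse operation.
my $round_trip = pack 'B*', $bits;     # "a"

# "H*" gives the same bytes as hex digits instead.
my $hex = unpack 'H*', 'bar';          # "626172"

print "$bits $round_trip $hex\n";
```

The hex form is often easier to read for longer strings, since each byte is two digits instead of eight.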

The place to start if you want the actual internals is a document called "perlguts". Either perldoc perlguts or read it here: http://perldoc.perl.org/perlguts.html

After seeing the way that Andy interpreted your question, I can follow up by saying that Devel::Peek has a Dump function which can show the internal representation of a variable. It won't take it to the binary level, but if what you are interested in is the internals, you might look at this.
$ perl -MDevel::Peek -e 'my $foo="a";Dump $foo';
SV = PV(0x7fa8a3004e78) at 0x7fa8a3031150
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x7fa8a2c06190 "a"\0
CUR = 1
LEN = 16
$ perl -MDevel::Peek -e 'my %bar=(x=>"y",a=>"b");Dump \%bar'
SV = IV(0x7fbc5182d6e8) at 0x7fbc5182d6f0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x7fbc51831168
SV = PVHV(0x7fbc5180c268) at 0x7fbc51831168
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x7fbc5140f9f0 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "a" HASH = 0xca2e9442
SV = PV(0x7fbc51804f78) at 0x7fbc51807340
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc5140fa60 "b"\0
CUR = 1
LEN = 16
Elt "x" HASH = 0x9303a5e5
SV = PV(0x7fbc51804e78) at 0x7fbc518070d0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc514061a0 "y"\0
CUR = 1
LEN = 16

And one more way:
printf "%v08b\n", 'abc';
output:
01100001.01100010.01100011
(The v flag is a perl-only printf/sprintf feature and also works with numeric formats other than b.)
This differs from the unpack suggestion where there are characters greater than "\xff": unpack will only return the 8 low bits (with a warning), printf '%v...' will show all the bits:
$ perl -we'printf "%vX\n", "\cA\13P\x{1337}"'
1.B.50.1337
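If what you want for a wide character is the bits of its encoded form rather than its code point, one option is to encode the string to UTF-8 first and unpack the resulting bytes. This is only a sketch: it assumes UTF-8 is the encoding you care about, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = "\x{1337}";

# %vX prints each character's code point, however large.
my $code_points = sprintf '%vX', $string;     # "1337"

# utf8::encode replaces the characters of its argument, in place,
# with the bytes of their UTF-8 encoding, so unpack sees only bytes.
my $bytes = $string;
utf8::encode($bytes);
my $utf8_hex  = unpack 'H*', $bytes;          # "e18cb7"
my $utf8_bits = unpack 'B*', $bytes;

print "$code_points $utf8_hex $utf8_bits\n";
```

Because every element of $bytes is below 256 after encoding, unpack does not emit the wrap warning here.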

You can use ord to return the numeric value of a character, and printf with a %b format to display that value in binary.
printf "%08b\n", ord 'a';
output
01100001
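For a whole string, the same idea can be applied to each character in turn. A small sketch, where char_bits is a hypothetical helper name and not anything built in:

```perl
use strict;
use warnings;

# Hypothetical helper: one zero-padded binary group per character.
sub char_bits {
    my ($string) = @_;
    return join ' ', map { sprintf '%08b', ord($_) } split //, $string;
}

print char_bits('abc'), "\n";   # 01100001 01100010 01100011
```

Note that %08b is a minimum width, so characters above "\xff" simply produce longer groups.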

Related

Is the value returned by refaddr permanent?

According to Scalar::Util's documentation, refaddr works like this:
my $addr = refaddr( $ref );
If $ref is a reference, the internal memory address of the referenced value is returned as a plain integer. Otherwise undef is returned.
However, this doesn't tell me if $addr is permanent. Could the refaddr of a reference change over time? In C, for example, running realloc could change the location of something stored in dynamic memory. Is this analogous for Perl 5?
I'm asking because I want to make an inside-out object, and I'm wondering whether refaddr($object) would make a good key. It seems simplest when programming in XS, for example.
First of all, don't reinvent the wheel; use Class::InsideOut.
It is permanent. It must be, or the following would fail:
my $x;
my $r = \$x;
... Do something with $x ...
say $$r;
Scalars have a "head" at a fixed location. If the SV needs an upgrade (e.g. to hold a string), it's a second memory block known as the "body" that will change. The string buffer is yet a third memory block.
$ perl -MDevel::Peek -MScalar::Util=refaddr -E'
my $x=4;
my $r=\$x;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Upgrade SV:";
$x="abc";
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Increase PV size:";
$x="x"x20;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
'
refaddr=0x2e1db58
SV = IV(0x2e1db48) at 0x2e1db58 <-- SVt_IV variables can't hold strings.
REFCNT = 2
FLAGS = (PADMY,IOK,pIOK)
IV = 4
Upgrade SV:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58 <-- Scalar upgrade to SVt_PVIV. New body at new address, but head still at same address.
REFCNT = 2
FLAGS = (PADMY,POK,IsCOW,pPOK)
IV = 4
PV = 0x2e86f20 "abc"\0 <-- The scalar now has a string buffer.
CUR = 3
LEN = 10
COW_REFCNT = 1
Increase PV size:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58
REFCNT = 2
FLAGS = (PADMY,POK,pPOK)
IV = 4
PV = 0x2e5d7b0 "xxxxxxxxxxxxxxxxxxxx"\0 <-- Changing the address of the string buffer doesn't change anything else.
CUR = 20
LEN = 22
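Since the head address stays put for the life of the scalar, refaddr can serve as a hash key for inside-out data. The following is a bare-bones sketch only. Counter, %count_of and live_objects are hypothetical names. It ignores threads, where objects are cloned to new addresses, and that is one reason to prefer a module such as Class::InsideOut.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only
    use Scalar::Util qw(refaddr);

    my %count_of;       # object data lives here, keyed by address

    sub new {
        my ($class) = @_;
        my $self = bless \do { my $anon }, $class;
        $count_of{ refaddr($self) } = 0;
        return $self;
    }

    sub increment {
        my ($self) = @_;
        return ++$count_of{ refaddr($self) };
    }

    # An address can be reused once its object is freed, so drop
    # the entry when the object goes away.
    sub DESTROY {
        my ($self) = @_;
        delete $count_of{ refaddr($self) };
    }

    sub live_objects { return scalar keys %count_of }
}

my $counter = Counter->new;
$counter->increment for 1 .. 3;
print $counter->increment, "\n";    # 4
```

The address is permanent only while the object exists, which is why the DESTROY cleanup matters.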

Assigning a string to Perl substr?

I am looking at Perl script written by someone else, and I found this:
$num2 = '000000';
substr($num2, length($num2)-length($num), length($num)) = $num;
my $id_string = $text."_".$num2;
Forgive my ignorance, but to an untrained Perl programmer the second line looks as if the author is assigning the string $num to the result of the function substr. What exactly does this line do?
Exactly what you think it would do:
$ perldoc -f substr
You can use the substr() function as an lvalue, in which case
EXPR must itself be an lvalue. If you assign something shorter
than LENGTH, the string will shrink, and if you assign
something longer than LENGTH, the string will grow to
accommodate it. To keep the string the same length, you may
need to pad or chop your value using "sprintf".
In Perl (unlike, say, Python, where strings and tuples cannot be modified in place), strings can be modified in situ. That is what substr is doing here: it modifies only part of the string. Instead of this syntax, you can use the more cryptic four-argument syntax:
substr($num2, length($num2)-length($num), length($num),$num);
which accomplishes the same thing. You can stretch this further. Imagine you want to replace all instances of foo with bar in a string, but only within the first 50 characters. Perl will let you do it in a one-liner:
substr($target,0,50) =~ s/foo/bar/g;
Great, isn't it?
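Here are the three forms side by side as a small runnable sketch. The sample values (42, 'foo foo foo') are made up for the example and do not come from the script in the question.

```perl
use strict;
use warnings;

my $num = 42;

# Lvalue form: overwrite the last length($num) characters.
my $lvalue = '000000';
substr($lvalue, length($lvalue) - length($num), length($num)) = $num;

# Four-argument form: the replacement is passed as the last argument.
my $four_arg = '000000';
substr($four_arg, length($four_arg) - length($num), length($num), $num);

# A substitution confined to the first seven characters.
my $target = 'foo foo foo';
substr($target, 0, 7) =~ s/foo/bar/g;

print "$lvalue $four_arg $target\n";    # 000042 000042 bar bar foo
```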
"Exactly", you ask?
Normally, substr returns a boring string (PV with POK).
$ perl -MDevel::Peek -e'$_="abcd"; Dump("".substr($_, 1, 2));'
SV = PV(0x99f2828) at 0x9a0de38
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x9a12510 "bc"\0
CUR = 2
LEN = 12
However, when substr is evaluated where an lvalue (assignable value) is expected, it returns a magical scalar (PVLV with GMG (get magic) and SMG (set magic)).
$ perl -MDevel::Peek -e'$_="abcd"; Dump(substr($_, 1, 2));'
SV = PVLV(0x8941b90) at 0x891f7d0
REFCNT = 1
FLAGS = (TEMP,GMG,SMG)
IV = 0
NV = 0
PV = 0
MAGIC = 0x8944900
MG_VIRTUAL = &PL_vtbl_substr
MG_TYPE = PERL_MAGIC_substr(x)
TYPE = x
TARGOFF = 1
TARGLEN = 2
TARG = 0x8948c18
FLAGS = 0
SV = PV(0x891d798) at 0x8948c18
REFCNT = 2
FLAGS = (POK,pPOK)
PV = 0x89340e0 "abcd"\0
CUR = 4
LEN = 12
This magical scalar holds the parameters passed to substr (TARG, TARGOFF and TARGLEN). You can see the scalar pointed to by TARG (the original scalar passed to substr) repeated at the end (the SV at 0x8948c18 you see at the bottom).
Any read of this magical scalar results in an associated function being called instead. Similarly, a write calls a different associated function. These functions cause the selected part of the string passed to substr to be read or modified.
perl -E'
$_ = "abcde";
my $ref = \substr($_, 1, 3); # $$ref is magical
say $$ref; # bcd
$$ref = 123;
say $_; # a123e
'
Looks to me like it's overwriting the last length($num) characters of $num2 with the contents of $num in order to get a '0' filled number.
I imagine most folks would accomplish this same task w/ sprintf()
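A sketch of the sprintf version, assuming $num holds a non-negative integer (the sample values are invented):

```perl
use strict;
use warnings;

my ($text, $num) = ('item', 42);    # sample values, not from the script

# %06d pads with leading zeros to a minimum width of six digits.
my $id_string = sprintf '%s_%06d', $text, $num;

print "$id_string\n";    # item_000042
```

The width in %06d is a minimum, so a number longer than six digits is printed in full rather than truncated.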

How can I dump a string in perl to see if there are any character differences?

I've occasionally had problems with strings being subtly different, in some cases utf8::all changed the behavior, so I assume the subtle differences are unicode. I'd like to dump strings in such a way that the differences will be visual to me. What are my options for doing this?
I recommend the Dump function in the Devel::Peek module in the Perl core:
$ perl -MDevel::Peek -e 'Dump "abc"'
SV = PV(0x10441500) at 0x10491680
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK)
PV = 0x10442224 "abc"\0
CUR = 3
LEN = 4
$ perl -MDevel::Peek -e 'Dump "\x{FEFF}abc"'
SV = PV(0x10441050) at 0x10443be0
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
PV = 0x10449bc0 "\357\273\277abc"\0 [UTF8 "\x{feff}abc"]
CUR = 6
LEN = 8
(You see how FLAGS contains UTF8 in the second example, because of the wide character, but not in the first?)
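If you only need the flag and the byte count rather than the full dump, the built-in utf8:: functions can report them from inside a program. This is a rough sketch with illustrative variable names, and it shows that the flag describes the internal storage rather than the string's value.

```perl
use strict;
use warnings;

my $plain = "abc";
my $wide  = "\x{FEFF}abc";

my $plain_flag = utf8::is_utf8($plain) ? 1 : 0;   # 0
my $wide_flag  = utf8::is_utf8($wide)  ? 1 : 0;   # 1

# length counts characters; CUR in the dump counts bytes of the
# internal buffer, which for a flagged string is its UTF-8 encoding.
my $chars = length $wide;                          # 4
my $bytes = $wide;
utf8::encode($bytes);
my $octets = length $bytes;                        # 6

# Upgrading changes the internal storage but not the string's value.
my $upgraded = $plain;
utf8::upgrade($upgraded);
my $same_value = $upgraded eq $plain ? 1 : 0;      # 1

print "$plain_flag $wide_flag $chars $octets $same_value\n";
```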
For most uses, Data::Dumper with Useqq will do.
use utf8;
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper("foo–bar"));
print(Dumper("foo-bar"));
Output:
$VAR1 = "foo\x{2013}bar";
$VAR1 = "foo-bar";
If you want internal details (such as the UTF8 flag), use Devel::Peek.
use utf8;
use Devel::Peek;
Dump("foo–bar");
Dump("foo-bar");
Output:
SV = PV(0x328ccc) at 0x1d6a0c4
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
PV = 0x1d6d52c "foo\342\200\223bar"\0 [UTF8 "foo\x{2013}bar"]
CUR = 9
LEN = 12
SV = PV(0x328dcc) at 0x32b594
REFCNT = 1
FLAGS = (PADTMP,POK,READONLY,pPOK)
PV = 0x1d6d50c "foo-bar"\0
CUR = 7
LEN = 12
Have you tried Test::LongString? Even though it's really a test module, it is handy for showing you where the differences in a string occur. It focuses on the parts that are different instead of showing you the whole string, and it makes \x{} escapes for specials.
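If you would rather not pull in a test module, locating the first difference by hand is short enough. This is a plain-Perl sketch of the idea, not how Test::LongString itself works; first_diff is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first differing character,
# or -1 when the two strings are equal.
sub first_diff {
    my ($left, $right) = @_;
    my $shorter = length($left) < length($right)
                ? length($left)
                : length($right);
    for my $i (0 .. $shorter - 1) {
        return $i if substr($left, $i, 1) ne substr($right, $i, 1);
    }
    # One string is a prefix of the other, or they are identical.
    return length($left) == length($right) ? -1 : $shorter;
}

my $index = first_diff("foo\x{2013}bar", "foo-bar");
printf "first difference at index %d\n", $index;    # 3
```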
I'd like to see an example where utf8::all changed the behavior, even if just to see an interesting edge case.
All you need to dump out any string is:
printf "U+%v04X\n", $string;
You could use this to format a string:
($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\x{%x}", ord $1/ge;
or even
use charnames ();
($print_string = $string) =~ s/([^\x20-\x7E])/sprintf "\\N{%s}", charnames::viacode(ord $1)/ge;
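Wrapped up as subs, the two substitutions look roughly like this. hex_escaped and name_escaped are hypothetical names chosen for the sketch. Note that charnames::viacode can return undef for code points that have no name, which this sketch does not handle.

```perl
use strict;
use warnings;
use charnames ();

# Replace everything outside printable ASCII with a \x{...} escape.
sub hex_escaped {
    my ($string) = @_;
    (my $copy = $string) =~ s/([^\x20-\x7E])/sprintf('\\x{%x}', ord($1))/ge;
    return $copy;
}

# Same idea, but using the Unicode character name.
sub name_escaped {
    my ($string) = @_;
    (my $copy = $string)
        =~ s/([^\x20-\x7E])/sprintf('\\N{%s}', charnames::viacode(ord($1)))/ge;
    return $copy;
}

print hex_escaped("foo\x{2013}bar"), "\n";     # foo\x{2013}bar
print name_escaped("foo\x{2013}bar"), "\n";    # foo\N{EN DASH}bar
```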
I have no idea why in the world you would use the misleadingly named utf8::all. It’s not a core module, and you seem to be having some sort of trouble with knowing what it is really doing. If you explicitly used the individual core pieces that go into it, maybe you would understand it all better.

Devel::Peek Question

% perl -Ilib -MDevel::Peek -le '$a="34567"; $a=~s/...//; Dump($a)'
SV = PV(0x8171048) at 0x8186f48 # replaced "12345" with "34567"
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
OFFSET = 3
PV = 0x8181bdb ( "34\003" . ) "67"\0
CUR = 2
LEN = 9
Where do the 2 zeros in the chomped part ( "12\003" . ) between 2 and 3 come from?
Why do I get this kind of output in the chomped part ( "34\003" . )?
A bug? "\003" is chr(3) in octal form. However:
$ perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/...//; Dump($a)'
SV = PVIV(0x869b0bc) at 0x86a5060
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
IV = 3 (OFFSET)
PV = 0x869fac3 ( "123" . ) "45"\0
CUR = 2
LEN = 5
I can't duplicate that; what version of perl are you using?
Note that the part of the string buffer in () is reserved but not currently in use.
I am getting the same result as sid_com using perl 5.12.2 on Windows. However, the string length is taken from the CUR field of the structure anyway. I don't see why this should be a bug; the rest of the string buffer can contain any bytes.

Are Perl strings immutable?

What's happening behind the scenes when I do a concatenation on a string?
my $short = 'short';
$short .= 'cake';
Is Perl effectively creating a new string, then assigning it the correct variable reference, or are Perl strings always mutable by nature?
The motivation for this question came from a discussion I had with a colleague, who said that scripting languages can utilize immutable strings.
Perl strings are mutable. Perl automatically creates new buffers, if required.
use Devel::Peek;
my $short = 'short';
Dump($short);
Dump($short .= 'cake');
Dump($short = "");
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28459078 "short"\0
CUR = 5
LEN = 8
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28458120 "shortcake"\0
CUR = 9
LEN = 12
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28458120 ""\0
CUR = 0
LEN = 12
Note that no new buffer is allocated in the third case.
Perl strings are definitely mutable. Each will store an allocated buffer size in addition to the used length and beginning offset, and the buffer will be expanded as needed. (The beginning offset is useful to allow consumptive operations like s/^abc// to not have to move the actual data.)
$short = 'short';
print \$short;
$short .= 'cake';
print \$short;
After executing this code I get "SCALAR(0x955f468)SCALAR(0x955f468)". My answer would be 'mutable'.
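A slightly fuller sketch of the same check, which also shows that mutability does not mean aliasing: assigning a string to another variable gives that variable its own value. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $short  = 'short';
my $before = refaddr(\$short);

my $copy = $short;      # $copy gets its own value
$short .= 'cake';       # modifies $short in place

my $after = refaddr(\$short);

printf "same scalar: %s\n", $before == $after ? 'yes' : 'no';   # yes
print "$short $copy\n";                                         # shortcake short
```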