Is the value returned by refaddr permanent?

According to Scalar::Util's documentation, refaddr works like this:
my $addr = refaddr( $ref );
If $ref is reference the internal memory address of the referenced value is returned as a plain integer. Otherwise undef is returned.
However, this doesn't tell me if $addr is permanent. Could the refaddr of a reference change over time? In C, for example, running realloc could change the location of something stored in dynamic memory. Is this analogous for Perl 5?
I'm asking because I want to make an inside-out object, and I'm wondering whether refaddr($object) would make a good key. It seems simplest when programming in XS, for example.

First of all, don't reinvent the wheel; use Class::InsideOut.
It is permanent. It must be, or the following would fail:
my $x;
my $r = \$x;
... Do something with $x ...
say $$r;
Scalars have a "head" at a fixed location. If the SV needs an upgrade (e.g. to hold a string), it's a second memory block known as the "body" that will change. The string buffer is yet a third memory block.
$ perl -MDevel::Peek -MScalar::Util=refaddr -E'
my $x=4;
my $r=\$x;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Upgrade SV:";
$x='abc';
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Increase PV size:";
$x="x"x20;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
'
refaddr=0x2e1db58
SV = IV(0x2e1db48) at 0x2e1db58 <-- SVt_IV variables can't hold strings.
REFCNT = 2
FLAGS = (PADMY,IOK,pIOK)
IV = 4
Upgrade SV:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58 <-- Scalar upgraded to SVt_PVIV: new body at a new address, but head still at the same address.
REFCNT = 2
FLAGS = (PADMY,POK,IsCOW,pPOK)
IV = 4
PV = 0x2e86f20 "abc"\0 <-- The scalar now has a string buffer.
CUR = 3
LEN = 10
COW_REFCNT = 1
Increase PV size:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58
REFCNT = 2
FLAGS = (PADMY,POK,pPOK)
IV = 4
PV = 0x2e5d7b0 "xxxxxxxxxxxxxxxxxxxx"\0 <-- Changing the address of the string buffer doesn't change anything else.
CUR = 20
LEN = 22
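Since the question is about inside-out objects, here is a minimal sketch of one keyed by refaddr. The class name Counter, the %count_of hash, and the method names are hypothetical, chosen only for illustration. The DESTROY method matters: the address is stable for the lifetime of the object, but Perl may reuse it for a new scalar once the object has been freed.

```perl
package Counter;    # hypothetical class, for illustration only
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my %count_of;       # inside-out storage: refaddr => attribute value

sub new {
    my ($class) = @_;
    my $self = bless \do { my $anon }, $class;    # empty scalar as the object
    $count_of{ refaddr $self } = 0;
    return $self;
}

sub increment { my ($self) = @_; return ++$count_of{ refaddr $self } }
sub count     { my ($self) = @_; return $count_of{ refaddr $self } }

# Remove the entry on destruction; otherwise a later object that happens
# to be allocated at the same address would inherit stale data.
sub DESTROY { my ($self) = @_; delete $count_of{ refaddr $self }; return }

package main;

my $counter = Counter->new;
my $before  = Scalar::Util::refaddr($counter);
$$counter   = 'x' x 100;    # upgrade the referenced scalar and grow its buffer
$counter->increment for 1 .. 3;
my $after   = Scalar::Util::refaddr($counter);

# Prints "stable=yes count=3"
printf "stable=%s count=%d\n", ($before == $after ? 'yes' : 'no'), $counter->count;
```

The key is computed from the object on every access, so nothing has to be stored inside the blessed scalar itself.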

Related

In perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

sub foo {
my @return_value = (1, 2);
}
my @receiver = foo();
Is this assignment like any other assignment in Perl? Is the array duplicated in memory? I doubt it, because the array held by the subroutine is disposable, so a duplicate would be totally redundant. It would make sense to just 'link' the array to @receiver as an optimization.
By the way, I noticed a similar question, "Perl: function returns reference or copy?", but it didn't give me what I wanted.
I'm talking about Perl 5.
P.S. Are there any books or materials on this sort of topic for Perl?
The scalars returned by :lvalue subs aren't copied.
The scalars returned by XS subs aren't copied.
The scalars returned by functions (named operators) aren't copied.
The scalars returned by other subs are copied.
But that's before any assignment comes into play. If you assign the returned values to a variable, you will be copying them (again, in the case of a normal Perl sub).
This means my $y = sub { $x }->(); copies $x twice!
But that doesn't really matter because of optimizations.
Let's start with an example of when they aren't copied.
$ perl -le'
sub f :lvalue { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x465eb48) # $x
SCALAR(0x465eb48) # The scalar on the stack
But if you remove :lvalue...
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x17d0918) # $x
SCALAR(0x17b1ec0) # The scalar on the stack
Worse, one usually follows up by assigning the scalar to a variable, so a second copy occurs.
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f(); # \
print $r; # > my $y = f();
my $y = $$r; # /
print \$y;
'
SCALAR(0x1802958) # $x
SCALAR(0x17e3eb0) # The scalar on the stack
SCALAR(0x18028f8) # $y
On the plus side, assignment is optimized to minimize the cost of copying strings.
XS subs and functions (named operators) typically return mortal ("TEMP") scalars. These are scalars "on death row". They will be automatically destroyed if nothing steps in to claim a reference to them.
In older versions of Perl (<5.20), assigning a mortal string to another scalar will cause ownership of the string buffer to be transferred to avoid having to copy the string buffer. For example, my $y = lc($x); doesn't copy the string created by lc; simply the string pointer is copied.
$ perl -MDevel::Peek -e'my $s = "abc"; Dump($s); $s = lc($s); Dump($s);'
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,IsCOW,pPOK)
PV = 0x172d4c0 "abc"\0
CUR = 3
LEN = 10
COW_REFCNT = 1
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x1730070 "abc"\0 <-- Note the change of address from stealing the buffer from the scalar returned by lc.
CUR = 3
LEN = 10
In newer versions of Perl (≥5.20), the assignment operator never[1] copies the string buffer. Instead, newer versions of Perl use a copy-on-write ("COW") mechanism.
$ perl -MDevel::Peek -e'my $x = "abc"; my $y = $x; Dump($x); Dump($y);'
SV = PV(0x26b0530) at 0x26ce230
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3 |
LEN = 10 |
COW_REFCNT = 2 +-- Same buffer (0x26d68a0)
SV = PV(0x26b05c0) at 0x26ce248 |
REFCNT = 1 |
FLAGS = (POK,IsCOW,pPOK) |
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3
LEN = 10
COW_REFCNT = 2
Ok, so far, I've only talked about scalars. Well, that's because subs and functions can only return scalars[2].
In your example, the scalars assigned to @return_value would be returned[3], copied, then copied a second time into @receiver by the assignment.
You could avoid all of this by returning a reference to the array.
sub f { my @fizbobs = ...; \@fizbobs }
my $fizbobs = f();
The only thing copied there is a reference, the simplest non-undefined scalar.
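A small sketch of that point, using only core Scalar::Util: the reference the caller receives points at the very array created inside the sub, so no elements are copied. The sub name make_fizbobs and the variable $inner_addr are made up for this illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $inner_addr;    # address of the array as seen inside the sub

sub make_fizbobs {
    my @fizbobs = (1 .. 5);
    $inner_addr = refaddr(\@fizbobs);
    return \@fizbobs;    # only the reference is copied to the caller
}

my $fizbobs    = make_fizbobs();
my $same_array = refaddr($fizbobs) == $inner_addr;

print $same_array ? "same array\n" : "different array\n";    # same array
print scalar(@$fizbobs), " elements\n";                      # 5 elements
```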
[1] Ok, maybe not never. I think there needs to be a free byte in the string buffer to hold the COW count.
[2] In list context, they can return 0, 1 or many of them, but they can only return scalars.
[3] The last operator of your sub is a list assignment operator. In list context, the list assignment operator returns the scalars to which its left-hand side (LHS) evaluates. See Scalar vs List Assignment Operator for more info.
The subroutine returns the result of the last operation if you don't specify an explicit return.
@return_value is created separately from @receiver. The values are copied, and the memory used by @return_value is released when it goes out of scope at subroutine exit.
So yes - the memory used is duplicated.
If you desperately want to avoid this, you can create an anonymous array once, and 'pass' a reference to it around:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $anon_array_ref = [ 1, 2 ];
return $anon_array_ref;
}
my $results_from_foo = foo();
print Dumper $results_from_foo;
This will usually be premature optimisation though, unless you know you're dealing with really big data structures.
Note - you should probably include an explicit return; in your sub after the assignment, as it's good practice to make clear what you're doing.

Attempt to delete readonly key from a restricted hash, when it is not restricted

I quite often arrange my subroutine entry like this:
sub mySub {
my ($self, %opts) = @_;
lock_keys(%opts, qw(count name));
...
my $name = delete $opts{name};
$self->SUPER::mySub(%opts);
}
to allow calling the sub using named arguments like this:
$obj->mySub(count=>1, name=>'foobar');
The lock_keys guards against calling the sub with mis-spelled argument names.
The last couple of lines are another common idiom I use: when writing a method that overrides a superclass method, I extract the arguments that are specific to the subclass and then chain a call to the superclass.
This worked fine in perl 5.8, but after upgrading to CentOS 6 (which has perl 5.10.1) I started to see seemingly random errors like this:
Attempt to delete readonly key 'otherOption' from a restricted hash at xxx.pl line 9.
These errors do not happen all the time (even in the same subroutine) but they do seem to relate to the call chain that results in calling the sub which bombs out.
Also note that they do not happen on perl 5.16 (or at least not on ideone).
What is causing these errors in perl 5.10? According to the manpage for Hash::Util, delete() should still work after lock_keys. It is like the whole hash is getting locked somehow.
I found the answer to this even before posting on SO, but the workaround is not great so feel free to chime in with a better one.
This SSCCE exhibits the problem:
#!/usr/bin/perl
use strict;
use Hash::Util qw(lock_keys);
sub doSomething {
my ($a, $b, %opts) = @_;
lock_keys(%opts, qw(myOption otherOption));
my $x = delete $opts{otherOption};
}
my %h = (
a=>1,
b=>2
);
foreach my $k (keys %h) {
doSomething(1, 2, otherOption=>$k);
}
It seems that the problem is related to the values passed into the named argument hash (%opts in my example). If these values are copied from keys of a hash, as in the example above, they are marked read-only in such a way that it later prevents deleting those keys from the hash.
In fact you can see this using Devel::Peek
$ perl -e'
use Devel::Peek;
my %x=(a=>1);
foreach my $x (keys %x) {
my %y = (x => $x);
Dump($x);
Dump(\%y);
}
'
SV = PV(0x22cfb78) at 0x22d1fd0
REFCNT = 2
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
SV = RV(0x22eeb30) at 0x22eeb20
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x22f8880
SV = PVHV(0x22d7fb8) at 0x22f8880
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x22e99a0 (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
RITER = -1
EITER = 0x0
Elt "x" HASH = 0x9303a5e5
SV = PV(0x22cfc88) at 0x22d1b98
REFCNT = 1
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
Note that the FLAGS for the hash entry include READONLY, and that the variable $x and the corresponding value in %y point at the same string (PV = 0x22f8450 in my example above). It seems that Perl 5.10 is trying hard to avoid copying strings, but in doing so has inadvertently made such entries impossible to delete from a restricted hash.
The workaround I am using is to force a string copy, like this:
foreach my $k (keys %h) {
my $j = "$k";
doSomething(1, 2, otherOption=>$j);
}
This seems an inefficient way to force a string copy, and in any case is easy to forget, so other answers containing better workarounds are welcome.
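One alternative that avoids restricted hashes altogether is to validate the option names by hand. This is only a sketch, not a drop-in replacement for lock_keys: the helper name check_opts is hypothetical, and it checks the names once at entry rather than guarding later accesses.

```perl
use strict;
use warnings;

# Hypothetical helper: die if %$opts contains a key not listed in @allowed.
sub check_opts {
    my ($opts, @allowed) = @_;
    my %is_allowed = map { $_ => 1 } @allowed;
    my @unknown = grep { !$is_allowed{$_} } sort keys %$opts;
    die "Unknown option(s): @unknown\n" if @unknown;
    return;
}

sub doSomething {
    my ($first, $second, %opts) = @_;
    check_opts(\%opts, qw(myOption otherOption));
    # %opts is an ordinary hash here, so delete is not subject to
    # the restricted-hash read-only check.
    return delete $opts{otherOption};
}

my %h = (a => 1, b => 2);
my @deleted = map { doSomething(1, 2, otherOption => $_) } sort keys %h;
print "@deleted\n";    # a b

# A misspelled option name is rejected at entry.
my $ok = eval { doSomething(1, 2, otherOptoin => 'typo'); 1 };
print $ok ? "typo accepted\n" : "typo rejected: $@";
```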

View Perl Variables as Bytes/Bits

Disclaimer: It's been ages since I've done any perl, so if I'm asking/saying something stupid please correct me.
Is it possible to view a byte/bit representation of a perl variable? That is, if I say something like
my $foo = 'a';
I know (think?) the computer sees $foo as something like
0b1100001
Is there a way to get perl to print out the binary representation of a variable?
(Not asking for any practical purpose, just tinkering around with a old friend and trying to understand it more deeply than I did in 1997)
Sure, using unpack:
print unpack "B*", $foo;
Example:
% perl -e 'print unpack "B*", "bar";'
011000100110000101110010
The perldoc pages for pack and perlpacktut give a nice overview about converting between different representations.
The place to start if you want the actual internals is a document called "perlguts". Either perldoc perlguts or read it here: http://perldoc.perl.org/perlguts.html
After seeing the way that Andy interpreted your question, I can follow up by saying that Devel::Peek has a Dump function which can show the internal representation of a variable. It won't take it to the binary level, but if what you are interested in is the internals, you might look at this.
$ perl -MDevel::Peek -e 'my $foo="a";Dump $foo';
SV = PV(0x7fa8a3004e78) at 0x7fa8a3031150
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x7fa8a2c06190 "a"\0
CUR = 1
LEN = 16
$ perl -MDevel::Peek -e 'my %bar=(x=>"y",a=>"b");Dump \%bar'
SV = IV(0x7fbc5182d6e8) at 0x7fbc5182d6f0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x7fbc51831168
SV = PVHV(0x7fbc5180c268) at 0x7fbc51831168
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x7fbc5140f9f0 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "a" HASH = 0xca2e9442
SV = PV(0x7fbc51804f78) at 0x7fbc51807340
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc5140fa60 "b"\0
CUR = 1
LEN = 16
Elt "x" HASH = 0x9303a5e5
SV = PV(0x7fbc51804e78) at 0x7fbc518070d0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc514061a0 "y"\0
CUR = 1
LEN = 16
And one more way:
printf "%v08b\n", 'abc';
output:
01100001.01100010.01100011
(The v flag is a perl-only printf/sprintf feature and also works with numeric formats other than b.)
This differs from the unpack suggestion where there are characters greater than "\xff": unpack will only return the 8 low bits (with a warning), printf '%v...' will show all the bits:
$ perl -we'printf "%vX\n", "\cA\13P\x{1337}"'
1.B.50.1337
You can use ord to return the numeric value of a character, and printf with a %b format to display that value in binary.
printf "%08b\n", ord 'a';
output
01100001
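Combining the approaches above, a per-character view can be sketched like this. The helper name bits_of is made up here, and it assumes every character has a code point below 256, so eight digits per character are enough.

```perl
use strict;
use warnings;

# Hypothetical helper: one 8-digit binary group per character.
sub bits_of {
    my ($string) = @_;
    return join ' ', map { sprintf '%08b', ord } split //, $string;
}

print bits_of('a'),   "\n";       # 01100001
print bits_of('abc'), "\n";       # 01100001 01100010 01100011
print unpack('B*', 'a'), "\n";    # 01100001
```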

Assigning a string to Perl substr?

I am looking at Perl script written by someone else, and I found this:
$num2 = '000000';
substr($num2, length($num2)-length($num), length($num)) = $num;
my $id_string = $text."_".$num2;
Forgive my ignorance, but to an untrained Perl programmer the second line looks as if the author is assigning the string $num to the result of the function substr. What exactly does this line do?
Exactly what you think it would do:
$ perldoc -f substr
You can use the substr() function as an lvalue, in which case
EXPR must itself be an lvalue. If you assign something shorter
than LENGTH, the string will shrink, and if you assign
something longer than LENGTH, the string will grow to
accommodate it. To keep the string the same length, you may
need to pad or chop your value using "sprintf".
In Perl (unlike, say, Python, where strings and tuples cannot be modified in place), strings can be modified in situ. That is what substr is doing here: it modifies only part of the string. Instead of this syntax, you can use the more cryptic four-argument form:
substr($num2, length($num2)-length($num), length($num),$num);
which accomplishes the same thing. You can further stretch it. Imagine you want to replace all instances of foo by bar in a string, but only within the first 50 characters. Perl will let you do it in a one-liner:
substr($target,0,50) =~ s/foo/bar/g;
Great, isn't it?
"Exactly", you ask?
Normally, substr returns a boring string (PV with POK).
$ perl -MDevel::Peek -e'$_="abcd"; Dump("".substr($_, 1, 2));'
SV = PV(0x99f2828) at 0x9a0de38
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x9a12510 "bc"\0
CUR = 2
LEN = 12
However, when substr is evaluated where an lvalue (assignable value) is expected, it returns a magical scalar (PVLV with GMG (get magic) and SMG (set magic)).
$ perl -MDevel::Peek -e'$_="abcd"; Dump(substr($_, 1, 2));'
SV = PVLV(0x8941b90) at 0x891f7d0
REFCNT = 1
FLAGS = (TEMP,GMG,SMG)
IV = 0
NV = 0
PV = 0
MAGIC = 0x8944900
MG_VIRTUAL = &PL_vtbl_substr
MG_TYPE = PERL_MAGIC_substr(x)
TYPE = x
TARGOFF = 1
TARGLEN = 2
TARG = 0x8948c18
FLAGS = 0
SV = PV(0x891d798) at 0x8948c18
REFCNT = 2
FLAGS = (POK,pPOK)
PV = 0x89340e0 "abcd"\0
CUR = 4
LEN = 12
This magical scalar holds the parameters passed to substr (TARG, TARGOFF and TARGLEN). You can see the scalar pointed to by TARG (the original scalar passed to substr) repeated at the end (the SV at 0x8948c18 you see at the bottom).
Any read of this magical scalar results in an associated function being called instead. Similarly, a write calls a different associated function. These functions cause the selected part of the string passed to substr to be read or modified.
perl -E'
$_ = "abcde";
my $ref = \substr($_, 1, 3); # $$ref is magical
say $$ref; # bcd
$$ref = "123";
say $_; # a123e
'
Looks to me like it's overwriting the last length($num) characters of $num2 with the contents of $num in order to get a '0' filled number.
I imagine most folks would accomplish this same task w/ sprintf()
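For comparison, a minimal sketch of the sprintf approach next to the substr assignment from the question, assuming $num is a non-negative integer of at most six digits (the value 42 is just an example).

```perl
use strict;
use warnings;

my $num = 42;    # example value

# The original approach: overwrite the tail of a zero-filled template.
my $num2 = '000000';
substr($num2, length($num2) - length($num), length($num)) = $num;

# The sprintf approach: zero-pad to a width of six.
my $padded = sprintf '%06d', $num;

print "$num2 $padded\n";    # 000042 000042
```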

Are Perl strings immutable?

What's happening behind the scenes when I do a concatenation on a string?
my $short = 'short';
$short .= 'cake';
Is Perl effectively creating a new string, then assigning it the correct variable reference, or are Perl strings always mutable by nature?
The motivation for this question came from a discussion I had with a colleague, who said that scripting languages can utilize immutable strings.
Perl strings are mutable. Perl automatically creates new buffers, if required.
use Devel::Peek;
my $short = 'short';
Dump($short);
Dump($short .= 'cake');
Dump($short = "");
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28459078 "short"\0
CUR = 5
LEN = 8
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28458120 "shortcake"\0
CUR = 9
LEN = 12
SV = PV(0x28403038) at 0x284766f4
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x28458120 ""\0
CUR = 0
LEN = 12
Note that no new buffer is allocated in the third case.
Perl strings are definitely mutable. Each will store an allocated buffer size in addition to the used length and beginning offset, and the buffer will be expanded as needed. (The beginning offset is useful to allow consumptive operations like s/^abc// to not have to move the actual data.)
$short = 'short';
print \$short;
$short .= 'cake';
print \$short;
After executing this code I get "SCALAR(0x955f468)SCALAR(0x955f468)". My answer would be 'mutable'.