Is « my » overwriting memory when called in a loop? - perl

A simple but relevant question: Is « my » overwriting memory when called in a loop?
For instance, is it "better" (in terms of memory leaks, performance, speed) to declare it outside of the loop:
my $variable;
for my $number ( @array ) {
$variable = $number * 5;
_sub($variable);
}
Or should I declare it inside the loop:
for my $number ( @array ) {
my $variable = $number * 5;
_sub($variable);
}
(I just made that code up, it's not meant to do anything nor be used - as it is - in real life)
Will Perl allocate a new space in memory for each and every one of the for iterations ?

Aamir already told you what will happen.
I recommend sticking with the second version unless there is some reason to use the first. You don't want to care about the previous state of $variable; it's simplest to start each iteration with a fresh variable. And if $variable contains a reference, you might actually shoot yourself in the foot if you push that onto an array.
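One way this can go wrong, assuming the loop pushes a reference to the variable itself: with the declaration outside the loop, every stored reference points at the same scalar. A minimal sketch (collect_shared and collect_fresh are names made up for illustration):

```perl
use strict;
use warnings;

# $variable is declared once, outside the loop, so every pushed
# reference points at the same scalar.
sub collect_shared {
    my @refs;
    my $variable;
    for my $number (1 .. 3) {
        $variable = $number * 5;
        push @refs, \$variable;
    }
    return map { $$_ } @refs;
}

# $variable is declared inside the loop, so each iteration gets its own
# scalar and each reference keeps its own value.
sub collect_fresh {
    my @refs;
    for my $number (1 .. 3) {
        my $variable = $number * 5;
        push @refs, \$variable;
    }
    return map { $$_ } @refs;
}

print join(',', collect_shared()), "\n";    # 15,15,15
print join(',', collect_fresh()),  "\n";    # 5,10,15
```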
Edit:
Yes, there is a performance hit. Using a recycled variable will be faster. However, it is hard to tell how much faster it will be, as this will depend on your specific situation. No matter how much faster it is though, always remember: premature optimization is the root of all evil.

From your examples above:
In the first example, new space for $variable will not be allocated every time; the previous one will be reused.
In the second example, new space will be allocated on every iteration of the loop and de-allocated in that same iteration.

These are things you aren't supposed to think about with a dynamic language such as Perl. Even though you might get an answer about what the current implementation does, that's not a feature and it isn't something you should rely on.
Define your variables in the shortest scope possible.
However, if you are merely curious, you can use the Devel::Peek module to cheat a bit and see the internal (not physical) memory address:
use Devel::Peek;
foreach ( 0 .. 5 ) {
my $var = $_;
Dump( $var );
}
In this small case, the address ends up being the same. That's no guarantee that it will always be the same for different situations, or even the same program:
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 0
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 1
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 2
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 3
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 4
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 5
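The same observation can be made without reading Dump output, by counting distinct addresses. This is only a sketch: distinct_addresses is a hypothetical helper, and the "often 1" result is current behaviour rather than a promise. What can be relied on is that scalars which are still referenced cannot share an address.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Returns how many distinct addresses $var had over $iterations passes.
# When $keep is true, a reference to each $var is stored, so Perl cannot
# reuse the scalar for the next iteration.
sub distinct_addresses {
    my ($iterations, $keep) = @_;
    my (%seen, @kept);
    for my $i (1 .. $iterations) {
        my $var = $i;
        $seen{ refaddr(\$var) } = 1;
        push @kept, \$var if $keep;
    }
    return scalar keys %seen;
}

print distinct_addresses(5, 1), "\n";    # 5: every kept scalar is still alive
print distinct_addresses(5, 0), "\n";    # often 1, but that is not guaranteed
```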

You can benchmark the difference between the two uses using the Benchmark module which is made for these types of micro-benchmarking comparisons:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
sub outside {
my $x;
for my $y ( 1 .. 1_000_000 ) {
$x = $y;
}
return;
}
sub inside {
for my $y ( 1 .. 1_000_000 ) {
my $x = $y;
}
return;
}
cmpthese -1 => {
inside => \&inside,
outside => \&outside,
};
Results on my Windows XP SP3 laptop:
Rate inside outside
inside 4.44/s -- -25%
outside 5.91/s 33% --
Predictably, the difference is less pronounced when the body of the loop is executed only once.
That said, I would not declare $x outside the loop unless I needed outside the loop what is assigned to $x inside the loop.

You are totally safe using "my" inside a for loop or any other block. In general you don't have to worry about memory leaks in perl, but you would be equally safe in this circumstance with a non-garbage-collecting language like C++. A normal variable is deallocated at the end of the block in which it has scope.
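A small way to watch that deallocation happen, assuming a throwaway class (Tracker and max_live_in_loop are made up for this illustration) whose DESTROY method keeps a count of live objects:

```perl
use strict;
use warnings;

package Tracker;    # hypothetical class used only for this illustration

my $live = 0;       # number of Tracker objects currently alive

sub new     { my ($class) = @_; $live++; return bless {}, $class }
sub DESTROY { $live-- }
sub live    { return $live }

package main;

# Creates one object per iteration and reports the highest number of
# objects that were alive at the same time.
sub max_live_in_loop {
    my $max = 0;
    for my $i (1 .. 1000) {
        my $tracker = Tracker->new;    # freed at the end of this iteration
        $max = Tracker->live if Tracker->live > $max;
    }
    return $max;
}

print max_live_in_loop(), "\n";    # 1
print Tracker->live, "\n";         # 0
```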

Related

In perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

sub foo {
my @return_value = (1, 2);
}
my @receiver = foo();
Is this assignment like any other assignment in Perl? Is the array duplicated in memory? I doubt it, because the array held by the subroutine is disposable, so a duplication would be totally redundant. It would make sense to just 'link' the array to @receiver as an optimization.
By the way, I noticed a similar question, "Perl: function returns reference or copy?", but it didn't give me what I want.
I'm talking about Perl 5.
P.S. Are there any books or materials on this sort of topic for Perl?
The scalars returned by :lvalue subs aren't copied.
The scalars returned by XS subs aren't copied.
The scalars returned by functions (named operators) aren't copied.
The scalars returned by other subs are copied.
But that's before any assignment comes into play. If you assign the returned values to a variable, you will be copying them (again, in the case of a normal Perl sub).
This means my $y = sub { $x }->(); copies $x twice!
But that doesn't really matter because of optimizations.
Let's start with an example of when they aren't copied.
$ perl -le'
sub f :lvalue { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x465eb48) # $x
SCALAR(0x465eb48) # The scalar on the stack
But if you remove :lvalue...
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f();
print $r;
'
SCALAR(0x17d0918) # $x
SCALAR(0x17b1ec0) # The scalar on the stack
Worse, one usually follows up by assigning the scalar to a variable, so a second copy occurs.
$ perl -le'
sub f { my $x = 123; print \$x; $x }
my $r = \f(); # \
print $r; # > my $y = f();
my $y = $$r; # /
print \$y;
'
SCALAR(0x1802958) # $x
SCALAR(0x17e3eb0) # The scalar on the stack
SCALAR(0x18028f8) # $y
On the plus side, assignment is optimized to minimize the cost of copying strings.
XS subs and functions (named operators) typically return mortal ("TEMP") scalars. These are scalars "on death row". They will be automatically destroyed if nothing steps in to claim a reference to them.
In older versions of Perl (<5.20), assigning a mortal string to another scalar will cause ownership of the string buffer to be transferred to avoid having to copy the string buffer. For example, my $y = lc($x); doesn't copy the string created by lc; simply the string pointer is copied.
$ perl -MDevel::Peek -e'my $s = "abc"; Dump($s); $s = lc($s); Dump($s);'
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,IsCOW,pPOK)
PV = 0x172d4c0 "abc"\0
CUR = 3
LEN = 10
COW_REFCNT = 1
SV = PV(0x1705840) at 0x1723768
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x1730070 "abc"\0 <-- Note the change of address from stealing
CUR = 3 the buffer from the scalar returned by lc.
LEN = 10
In newer versions of Perl (≥5.20), the assignment operator never[1] copies the string buffer. Instead, newer versions of Perl use a copy-on-write ("COW") mechanism.
$ perl -MDevel::Peek -e'my $x = "abc"; my $y = $x; Dump($x); Dump($y);'
SV = PV(0x26b0530) at 0x26ce230
REFCNT = 1
FLAGS = (POK,IsCOW,pPOK)
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3 |
LEN = 10 |
COW_REFCNT = 2 +-- Same buffer (0x26d68a0)
SV = PV(0x26b05c0) at 0x26ce248 |
REFCNT = 1 |
FLAGS = (POK,IsCOW,pPOK) |
PV = 0x26d68a0 "abc"\0 <----+
CUR = 3
LEN = 10
COW_REFCNT = 2
Ok, so far, I've only talked about scalars. Well, that's because subs and functions can only return scalars[2].
In your example, the scalars assigned to @return_value would be returned[3], copied, then copied a second time into @receiver by the assignment.
You could avoid all of this by returning a reference to the array.
sub f { my @fizbobs = ...; \@fizbobs }
my $fizbobs = f();
The only thing copied there is a reference, the simplest non-undefined scalar.
[1] Ok, maybe not never. I think there needs to be a free byte in the string buffer to hold the COW count.
[2] In list context, they can return 0, 1 or many of them, but they can only return scalars.
[3] The last operator of your sub is a list assignment operator. In list context, the list assignment operator returns the scalars to which its left-hand side (LHS) evaluates. See Scalar vs List Assignment Operator for more info.
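To check the claim that returning a reference hands over the array itself rather than a copy, here is a minimal sketch. build_fizbobs and $address_inside are names invented for the illustration; the only library call is refaddr from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $address_inside;    # records where the array lived inside the sub

# Builds an array and returns a reference to it instead of its elements.
sub build_fizbobs {
    my @fizbobs = (1, 2);
    $address_inside = refaddr(\@fizbobs);
    return \@fizbobs;
}

my $fizbobs = build_fizbobs();

# Same address: the caller holds the very array the sub created.
print refaddr($fizbobs) == $address_inside ? "same array\n" : "copied\n";
```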
The subroutine returns the result of the last operation if you don't specify an explicit return.
@return_value is created separately from @receiver; the values are copied, and the memory used by @return_value is released when it goes out of scope at subroutine exit.
So yes - the memory used is duplicated.
If you desperately want to avoid this, you can create an anonymous array once, and 'pass' a reference to it around:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $anon_array_ref = [ 1, 2 ];
return $anon_array_ref;
}
my $results_from_foo = foo();
print Dumper $results_from_foo;
This will usually be premature optimisation though, unless you know you're dealing with really big data structures.
Note - you should probably include an explicit return; in your sub after the assignment, as it's good practice to make clear what you're doing.

Is the value returned by refaddr permanent?

According to Scalar::Util's documentation, refaddr works like this:
my $addr = refaddr( $ref );
If $ref is reference the internal memory address of the referenced value is returned as a plain integer. Otherwise undef is returned.
However, this doesn't tell me if $addr is permanent. Could the refaddr of a reference change over time? In C, for example, running realloc could change the location of something stored in dynamic memory. Is this analogous for Perl 5?
I'm asking because I want to make an inside-out object, and I'm wondering whether refaddr($object) would make a good key. It seems simplest when programming in XS, for example.
First of all, don't reinvent the wheel; use Class::InsideOut.
It is permanent. It must be, or the following would fail:
my $x;
my $r = \$x;
... Do something with $x ...
say $$r;
Scalars have a "head" at a fixed location. If the SV needs an upgrade (e.g. to hold a string), it's a second memory block known as the "body" that will change. The string buffer is yet a third memory block.
$ perl -MDevel::Peek -MScalar::Util=refaddr -E'
my $x=4;
my $r=\$x;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Upgrade SV:";
$x='abc';
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
say "";
say "Increase PV size:";
$x="x"x20;
say sprintf "refaddr=0x%x", refaddr($r);
Dump($$r);
'
refaddr=0x2e1db58
SV = IV(0x2e1db48) at 0x2e1db58 <-- SVt_IV variables can't hold strings.
REFCNT = 2
FLAGS = (PADMY,IOK,pIOK)
IV = 4
Upgrade SV:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58 <-- Scalar upgrade to SVt_PVIV.
REFCNT = 2 New body at new address,
FLAGS = (PADMY,POK,IsCOW,pPOK) but head still at same address.
IV = 4
PV = 0x2e86f20 "abc"\0 <-- The scalar now has a string buffer.
CUR = 3
LEN = 10
COW_REFCNT = 1
Increase PV size:
refaddr=0x2e1db58
SV = PVIV(0x2e18b40) at 0x2e1db58
REFCNT = 2
FLAGS = (PADMY,POK,pPOK)
IV = 4
PV = 0x2e5d7b0 "xxxxxxxxxxxxxxxxxxxx"\0 <-- Changing the address of the string buffer
REFCNT = 2 doesn't change anything else.
CUR = 20
LEN = 22
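For the inside-out use case from the question, a hand-rolled sketch might look like the following. Counter, %count_of and the method names are hypothetical, and this is not how Class::InsideOut is implemented; in particular it ignores threads, where addresses change when objects are cloned, which is one reason to prefer the module.

```perl
use strict;
use warnings;

package Counter;    # hypothetical inside-out class
use Scalar::Util qw(refaddr);

my %count_of;       # object data lives here, keyed by the object's address

sub new {
    my ($class) = @_;
    my $self = bless \do { my $anon }, $class;    # empty scalar as the object
    $count_of{ refaddr($self) } = 0;
    return $self;
}

sub increment { my ($self) = @_; return ++$count_of{ refaddr($self) } }
sub count     { my ($self) = @_; return $count_of{ refaddr($self) } }

# Remove the entry, or a later object that reuses the address would inherit it.
sub DESTROY { my ($self) = @_; delete $count_of{ refaddr($self) }; return }

package main;

my $counter = Counter->new;
$counter->increment for 1 .. 3;
print $counter->count, "\n";    # 3
```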

Attempt to delete readonly key from a restricted hash, when it is not restricted

I quite often arrange my subroutine entry like this:
sub mySub {
my ($self, %opts) = @_;
lock_keys(%opts, qw(count name));
...
my $name = delete $opts{name};
$self->SUPER::mySub(%opts);
}
to allow calling the sub using named arguments like this:
$obj->mySub(count=>1, name=>'foobar');
The lock_keys guards against calling the sub with mis-spelled argument names.
The last couple of lines are another common idiom I use: when writing a method that overrides one in a superclass, I extract the arguments that are specific to the subclass and then chain a call to the superclass.
This worked fine in perl 5.8, but after upgrading to Centos 6 (which has perl 5.10.1) I started to see seemingly random errors like this:
Attempt to delete readonly key 'otherOption' from a restricted hash at xxx.pl line 9.
These errors do not happen all the time (even in the same subroutine) but they do seem to relate to the call chain that results in calling the sub which bombs out.
Also note that they do not happen on perl 5.16 (or at least not on ideone).
What is causing these errors in perl 5.10? According to the manpage for Hash::Util, delete() should still work after lock_keys. It is like the whole hash is getting locked somehow.
I found the answer to this even before posting on SO, but the workaround is not great so feel free to chime in with a better one.
This SSCCE exhibits the problem:
#!/usr/bin/perl
use strict;
use Hash::Util qw(lock_keys);
sub doSomething {
my ($a, $b, %opts) = @_;
lock_keys(%opts, qw(myOption otherOption));
my $x = delete $opts{otherOption};
}
my %h = (
a=>1,
b=>2
);
foreach my $k (keys %h) {
doSomething(1, 2, otherOption=>$k);
}
It seems that the problem is related to the values passed in for the named-argument hash (%opts in my example). If these values are copied from the keys of a hash, as in the example above, they are marked read-only in such a way that it later prevents deleting keys from the hash.
In fact you can see this using Devel::Peek
$ perl -e'
use Devel::Peek;
my %x=(a=>1);
foreach my $x (keys %x) {
my %y = (x => $x);
Dump($x);
Dump(\%y);
}
'
SV = PV(0x22cfb78) at 0x22d1fd0
REFCNT = 2
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
SV = RV(0x22eeb30) at 0x22eeb20
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x22f8880
SV = PVHV(0x22d7fb8) at 0x22f8880
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x22e99a0 (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
RITER = -1
EITER = 0x0
Elt "x" HASH = 0x9303a5e5
SV = PV(0x22cfc88) at 0x22d1b98
REFCNT = 1
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
Note that the FLAGS for the hash entry are "READONLY" and in fact the variable $x and the value of the corresponding value in %y are actually pointing at the same string (PV = 0x22f8450 in my example above). It seems that Perl 5.10 is trying hard to avoid copying strings, but in doing so has inadvertently locked the whole hash.
The workaround I am using is to force a string copy, like this:
foreach my $k (keys %h) {
my $j = "$k";
doSomething(1, 2, otherOption=>$j);
}
This seems an inefficient way to force a string copy, and in any case is easy to forget, so other answers containing better workarounds are welcome.
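Another possible workaround, offered only as a sketch, is to skip lock_keys and check the option names by hand, so that %opts stays an ordinary hash and delete is never restricted. validate_opts is a hypothetical helper, not part of Hash::Util.

```perl
use strict;
use warnings;

# Dies if %opts contains a key that is not listed in $allowed;
# otherwise returns the options unchanged.
sub validate_opts {
    my ($allowed, %opts) = @_;
    my %is_allowed = map { $_ => 1 } @$allowed;
    my @unknown = grep { !$is_allowed{$_} } sort keys %opts;
    die "Unknown option(s): @unknown\n" if @unknown;
    return %opts;
}

sub doSomething {
    my ($first, $second, %opts) = @_;
    %opts = validate_opts([qw(myOption otherOption)], %opts);
    return delete $opts{otherOption};    # ordinary hash, so delete just works
}

my %h = (a => 1, b => 2);
for my $k (sort keys %h) {
    print doSomething(1, 2, otherOption => $k), "\n";    # a, then b
}
```

This still catches misspelled argument names, but it does not protect against typos in later accesses to %opts the way a locked hash does.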

Is it a good practice to use self invoking anonymous function in Perl?

It is a common practice to use self invoking anonymous functions to scope variables etc. in JavaScript:
;(function() {
...
})();
Is it a good practice to use such functions in Perl ?
(sub {
...
})->();
Or is it better for some reason to use main subroutine ?
sub main {
...
}
main();
Perl has lexical scoping mechanisms JS lacks. You are better off simply enclosing code you want scoped somehow in a block, e.g.:
{
my $localvar;
. . .
}
In this case $localvar will be completely invisible outside of those braces; that is also the same mechanism one can use to localise builtin variables such as $/:
{
local $/ = undef;
#reading from a file handle now consumes the entire file
}
#But not out here
(Side note: never set $/ globally. It can break things in subtle and horrible ways if you forget to set it back when you're done, or if you call other code before restoring it.)
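Putting the local $/ idiom inside a small sub keeps the change confined to that sub. A minimal sketch, with slurp as a hypothetical helper name and an in-memory filehandle standing in for a real file:

```perl
use strict;
use warnings;

# Returns everything left on the filehandle as one string.
sub slurp {
    my ($fh) = @_;
    local $/;                # undef for the duration of this sub only
    return scalar <$fh>;
}

my $text = "first line\nsecond line\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $content = slurp($fh);
close $fh;

print length($content), "\n";                                   # 23
print defined $/ && $/ eq "\n" ? "restored\n" : "changed\n";    # restored
```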
In perl, the best practise is to put things in subs when it makes sense; when it doesn't make sense or unnecessarily complicates the code, lexical blocks ensure scoping; if you do need anonymous subroutines (generally for callbacks or similar) then you can do my $subref = sub { . . . }; or even just stick the sub declaration directly into a function argument: do_something(callback => sub { . . . });
Note: see also ysth's answer for a resource-related advantage to self-invoking anonymous subs.
Since perl provides lexically scoped variables (and, as of 5.18, lexical named subs), there is no scoping reason for doing that.
The only reason to do it that I can think of would be for memory management; if the sub in question is a closure (references at least one external lexical variable), any memory used by the sub will be totally freed instead of retained for reuse on the next call:
$ perl -MDevel::Peek -wle'sub { my $x; Dump $x; $x = 42 }->() for 1..2'
SV = NULL(0x0) at 0x944a88
REFCNT = 1
FLAGS = (PADMY)
SV = IV(0x944a78) at 0x944a88
REFCNT = 1
FLAGS = (PADMY)
IV = 42
$ perl -MDevel::Peek -wle'my $y; sub { $y if 0; my $x; Dump $x; $x = 42 }->() for 1..2'
SV = NULL(0x0) at 0x259d238
REFCNT = 1
FLAGS = (PADMY)
SV = NULL(0x0) at 0x259d220
REFCNT = 1
FLAGS = (PADMY)
Though if you are not concerned about memory, this would be a disadvantage.
It's not unheard of but not common either. To restrict variable scope temporarily, it's much more common to use a block with a my variable declaration:
...
{
my $local_variable;
...
}
In Javascript, self-invoking functions have two uses:
Variable scoping. The var declarations are hoisted into the scope of the first enclosing function or into global scope. Therefore,
function () {
if (true) {
var foo = 42
}
}
is the same as
function () {
var foo
if (true) {
foo = 42
}
}
– often an unwanted effect.
Statements on the expression level. Sometimes you need multiple statements to compute something, but want to do so inside an expression.
largeObject = {
...,
// sum from 1 to 42
sum: (function(n){
var sum = 0;
for(var i = 1; i <= n; i++)
sum += i;
return sum;
})(42),
...,
};
Perl has no need for self-invoking functions as a scoping mechanism, because a new scope is introduced by any curly brace. A bare block is always allowed on a statement level:
...
my $foo = 10;
{
my $foo = 42;
}
$foo == 10 or die; # lives
Perl has reduced need for self-invoking functions to introduce statements into an expression because of the do BLOCK builtin:
%large_hash = (
...,
sum => do {
my $sum = 0;
$sum += $_ for 1 .. 42;
$sum;
},
...,
);
However, you will sometimes want to short-circuit in such a block. As return exits the surrounding subroutine (not block), it can be quite useful here. For example in a memoized function:
# moronic cached division by two
my %cache;
sub lookup {
my $key = shift;
return $cache{$key} //= sub {
for (1 .. 100) {
return $_ if $_ * 2 == $key
}
return;
}->();
}

Assigning a string to Perl substr?

I am looking at Perl script written by someone else, and I found this:
$num2 = '000000';
substr($num2, length($num2)-length($num), length($num)) = $num;
my $id_string = $text."_".$num2
Forgive my ignorance, but to an untrained Perl programmer the second line looks as if the author is assigning the string $num to the result of the function substr. What exactly does this line do?
Exactly what you think it would do:
$ perldoc -f substr
You can use the substr() function as an lvalue, in which case
EXPR must itself be an lvalue. If you assign something shorter
than LENGTH, the string will shrink, and if you assign
something longer than LENGTH, the string will grow to
accommodate it. To keep the string the same length, you may
need to pad or chop your value using "sprintf".
In Perl (unlike, say, Python, where strings and tuples cannot be modified in place), strings can be modified in situ. That is what substr is doing here: it is modifying only a part of the string. Instead of this syntax, you can use the more cryptic four-argument form:
substr($num2, length($num2)-length($num), length($num),$num);
which accomplishes the same thing. You can further stretch it. Imagine you want to replace all instances of foo by bar in a string, but only within the first 50 characters. Perl will let you do it in a one-liner:
substr($target,0,50) =~ s/foo/bar/g;
Great, isn't it?
"Exactly", you ask?
Normally, substr returns a boring string (PV with POK).
$ perl -MDevel::Peek -e'$_="abcd"; Dump("".substr($_, 1, 2));'
SV = PV(0x99f2828) at 0x9a0de38
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x9a12510 "bc"\0
CUR = 2
LEN = 12
However, when substr is evaluated where an lvalue (assignable value) is expected, it returns a magical scalar (PVLV with GMG (get magic) and SMG (set magic)).
$ perl -MDevel::Peek -e'$_="abcd"; Dump(substr($_, 1, 2));'
SV = PVLV(0x8941b90) at 0x891f7d0
REFCNT = 1
FLAGS = (TEMP,GMG,SMG)
IV = 0
NV = 0
PV = 0
MAGIC = 0x8944900
MG_VIRTUAL = &PL_vtbl_substr
MG_TYPE = PERL_MAGIC_substr(x)
TYPE = x
TARGOFF = 1
TARGLEN = 2
TARG = 0x8948c18
FLAGS = 0
SV = PV(0x891d798) at 0x8948c18
REFCNT = 2
FLAGS = (POK,pPOK)
PV = 0x89340e0 "abcd"\0
CUR = 4
LEN = 12
This magical scalar holds the parameters passed to substr (TARG, TARGOFF and TARGLEN). You can see the scalar pointed to by TARG (the original scalar passed to substr) repeated at the end (the SV at 0x8948c18 you see at the bottom).
Any read of this magical scalar results in an associated function being called instead. Similarly, a write calls a different associated function. These functions cause the selected part of the string passed to substr to be read or modified.
perl -E'
$_ = "abcde";
my $ref = \substr($_, 1, 3); # $$ref is magical
say $$ref; # bcd
$$ref = '123';
say $_; # a123e
'
Looks to me like it's overwriting the last length($num) characters of $num2 with the contents of $num in order to get a '0' filled number.
I imagine most folks would accomplish this same task w/ sprintf()
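For comparison, here are the two approaches side by side, assuming $num is a non-negative integer of at most six digits (the sub names are made up for illustration):

```perl
use strict;
use warnings;

# The approach from the question: overwrite the tail of a zero-filled string.
sub pad_with_substr {
    my ($num) = @_;
    my $num2 = '000000';
    substr($num2, length($num2) - length($num), length($num)) = $num;
    return $num2;
}

# The sprintf alternative: zero-pad an integer to six digits.
sub pad_with_sprintf {
    my ($num) = @_;
    return sprintf '%06d', $num;
}

print pad_with_substr(42),  "\n";    # 000042
print pad_with_sprintf(42), "\n";    # 000042
```

The two differ outside that assumption: %06d converts its argument to an integer, whereas the substr version copies whatever characters $num contains.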