I saw this bit of code in an answer to another post, "Why would I use Perl anonymous subroutines instead of a named one?", but couldn't figure out exactly what was going on, so I wanted to run it myself.
sub outer
{
my $a = 123;
sub inner
{
print $a, "\n"; #line 15 (for your reference, all other comments are the OP's)
}
# At this point, $a is 123, so this call should always print 123, right?
inner();
$a = 456;
}
outer(); # prints 123
outer(); # prints 456! Surprise!
In the above example, I received a warning: 'Variable "$a" will not stay shared' at line 15.
Obviously, this is why the output is "unexpected," but I still don't really understand what's happening here.
sub outer2
{
my $a = 123;
my $inner = sub
{
print $a, "\n";
};
# At this point, $a is 123, and since the anonymous subroutine
# whose reference is stored in $inner closes over $a in the
# "expected" way...
$inner->();
$a = 456;
}
# ...we see the "expected" results
outer2(); # prints 123
outer2(); # prints 123
In the same vein, I don't understand what's happening in this example either. Could someone please explain?
Thanks in advance.
It has to do with compile-time vs. run-time parsing of subroutines. As the diagnostics message says,
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
Annotating your code:
sub outer
{
# 'my' will reallocate memory for the scalar variable $a
# every time the 'outer' function is called. That is, the address of
# '$a' will be different in the second call to 'outer' than the first call.
my $a = 123;
# the construction 'sub NAME BLOCK' defines a subroutine once,
# at compile-time.
sub inner1
{
# since this subroutine is only getting compiled once, the '$a' below
# refers to the '$a' that is allocated the first time 'outer' is called
print "inner1: ",$a, "\t", \$a, "\n";
}
# the construction sub BLOCK defines an anonymous subroutine, at run time
# '$inner2' is redefined in every call to 'outer'
my $inner2 = sub {
# this '$a' now refers to '$a' from the current call to outer
print "inner2: ", $a, "\t", \$a, "\n";
};
# At this point, $a is 123, so this call should always print 123, right?
inner1();
$inner2->();
# if this is the first call to 'outer', the definition of 'inner1' still
# holds a reference to this instance of the variable '$a', and this
# variable's memory will not be freed when the subroutine ends.
$a = 456;
}
outer();
outer();
Typical output:
inner1: 123 SCALAR(0x80071f50)
inner2: 123 SCALAR(0x80071f50)
inner1: 456 SCALAR(0x80071f50)
inner2: 123 SCALAR(0x8002bcc8)
You can print \&inner; in the first example (after the definition) and print $inner; in the second.
The output is the hex address of each code reference: it is the same on every call in the first example and typically differs between calls in the second.
So in the first example inner is created only once, and it always closes over the lexical variable $a from the first call to outer().
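A minimal sketch of that comparison, under some assumptions: the names outer_named, inner_named and outer_anon are made up for illustration, the code references are returned and kept alive so that a freed address cannot be reused, and the closure warning is silenced only to keep the demo output clean.

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "will not stay shared" for this demo only

sub outer_named {
    my $value = shift;
    # Named sub: compiled once, bound to the $value of the first call.
    sub inner_named { return $value }
    return \&inner_named;
}

sub outer_anon {
    my $value = shift;
    # Anonymous sub: a new closure is created on every call.
    return sub { return $value };
}

# Keep all four references alive so their addresses can be compared.
my ($n1, $n2) = (outer_named(1), outer_named(2));
my ($c1, $c2) = (outer_anon(1),  outer_anon(2));

# Comparing references with == compares their addresses.
print "named: ", ($n1 == $n2 ? "same" : "different"), " code ref\n";
print "anon:  ", ($c1 == $c2 ? "same" : "different"), " code ref\n";
```

The named sub yields the same code reference both times, while each call to outer_anon yields a distinct closure.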
I have executed the following piece of a simple nested subroutine and the output of it makes me crazy.
#!/usr/bin/perl
use strict;
use warnings;
sub outer {
my $a = "123";
sub inner {
print "$a\n";
}
inner();
$a = "456";
}
outer();
outer();
Output to this is
Variable "$a" will not stay shared at E:\Perl\source\public\sss.pl line 9.
123
456
But how is this possible?
I call the inner subroutine when $a value is 123, but why am I getting 456 when outer is called the second time.
perldoc diagnostics gives a quite self-explanatory description of the warning Variable "$a" will not stay shared:
use strict;
use warnings;
use diagnostics;
sub outer {
my $a = "123";
sub inner {
print "$a\n";
}
inner();
$a = "456";
}
outer();
outer();
output
Variable "$a" will not stay shared at -e line 9 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the *first*
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
123
456
There is no point in declaring a subroutine within another one. It works as if it were declared at the top level, and won't function properly as a closure.
If you enable lexical subroutines (and disable the corresponding "experimental" warning) and declare inner as my sub inner, then your code will work as you expect.
#!/usr/bin/perl
use strict;
use warnings 'all';
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';
sub outer {
my $a = "123";
my sub inner {
print "$a\n";
}
inner();
$a = "456";
}
outer();
outer();
output
123
123
I have a subroutine that behaves strangely (to a Python programmer). Simplified, it looks like this:
use strict;
use Data::Dumper;
sub a {
my @x;
sub b { push @x, 1; print "inside: ", Dumper(\@x); }
&b;
print "outside: ", Dumper(\@x);
}
&a;
&a;
I found the result is:
inside: $VAR1 = [ 1 ]
outside: $VAR1 = [ 1 ]
inside: $VAR1 = [ 1, 1 ]
outside: $VAR1 = []
What I expected is that on each call to &a, @x is an empty array after "my @x", has one element after "&b", and is then destroyed. Every call to &a should behave the same way, so every line of output should be $VAR1 = [ 1 ].
Then I read that named subroutines are defined once in the symbol table. When I instead write "my $b = sub { ... }; &$b;", the output makes sense to me.
How can this be explained?
As per the "perlref" man page:
named subroutines are created at compile time so their lexical
variables [i.e., their 'my' variables] get assigned to the parent
lexicals from the first execution of the parent block. If a parent
scope is entered a second time, its lexicals are created again, while
the nested subs still reference the old ones.
In other words, a named subroutine (your b) has its @x bound to the parent subroutine's "first" @x, so when a is called the first time, b adds a 1 to @x, and both the inner and outer copies refer to this same version. However, the second time a is called, a new @x lexical is created, but b still points to the old one, so it adds a second 1 to that list and prints it (inner), but when it comes time for a to print its version, it prints out the (empty) brand new lexical (outer).
Anonymous subroutines don't exhibit this problem, so when you write my $b = sub { ... }, the inner @x always refers to the "current" version of a's lexical @x.
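As a rough sketch of the anonymous-sub version (push_and_count and $push_one are illustrative names, not from the question, and the counts are returned rather than dumped so they are easy to compare):

```perl
use strict;
use warnings;

sub push_and_count {
    my @x;
    # Anonymous sub: created anew on each call, so it closes over this call's @x.
    my $push_one = sub { push @x, 1; return scalar @x };

    my $inside  = $push_one->();   # element count as seen by the inner sub
    my $outside = scalar @x;       # element count as seen by the outer sub
    return ($inside, $outside);
}

for my $call (1, 2) {
    my ($inside, $outside) = push_and_count();
    print "call $call: inside=$inside outside=$outside\n";
}
```

Both calls report one element inside and one outside, because the inner and outer code share the same fresh @x each time.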
My test goes like this:
use strict;
use warnings;
func();
my $string = 'string';
func();
sub func {
print $string, "\n";
}
And the result is:
Use of uninitialized value $string in print at test.pl line 10.
string
Perl allows us to call a function before it has been defined. However when the function uses a variable declared only after the function call, the variable appears to be undefined. Is this behavior documented somewhere? Thank you!
The behaviour of my is documented in perlsub. It boils down to this: perl knows $string is in scope because the my tells it so.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file.
It means the variable is 'in scope' from the point at which it's first 'seen' until the closing bracket of the current 'block' (or, in your example, the end of the file).
However, in your example my also assigns a value.
The scoping is resolved at compile time, when perl checks whether it's valid to use $string (thanks to strict). It can't know the value at that point, though, because the value may change during execution and is non-trivial to analyze.
So if you do this it might be a little clearer what's going on:
#!/usr/bin/env perl
use strict;
use warnings;
my $string; #undefined
func();
$string = 'string';
func();
sub func {
print $string, "\n";
}
$string is in scope in both cases - because the my happened at compile time - before the subroutine has been called - but it doesn't have a value set beyond the default of undef prior to the first invocation.
Note this contrasts with:
#!/usr/bin/env perl
use strict;
use warnings;
sub func {
print $string, "\n";
}
my $string; #undefined
func();
$string = 'string';
func();
This fails to compile under strict because, when the sub is declared, $string isn't in scope yet.
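If the value really has to be visible before the first call, one possible approach (a sketch, not something from the question) is to do the assignment at compile time in a BEGIN block. The names @seen and current_string are made up for illustration.

```perl
use strict;
use warnings;

my @seen;
push @seen, current_string();   # runs before the 'my $string' statement below

my $string;
BEGIN { $string = 'string' }    # the assignment happens at compile time

push @seen, current_string();

# Defined after 'my $string' in the source, so $string is in scope here.
sub current_string { return $string }

print join(", ", @seen), "\n";
```

Here both calls see the value, because the BEGIN block has already run by the time the program starts executing.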
First of all, I would consider this undefined behaviour, since it skips executing my, just as my $x if $cond; does.
That said, the behaviour is currently consistent and predictable. In this instance, it behaves exactly as it would if the optimization that warranted the undefined-behaviour notice didn't exist.
At compile-time, my has the effect of declaring and allocating the variable[1]. Scalars are initialized to undef when created. Arrays and hashes are created empty.
my $string was encountered by the compiler, so the variable was created. But since you haven't executed the assignment yet, it still has its default value (undefined) during the first call to func.
This model allows variables to be captured by closures.
Example 1:
{
my $x = "abc";
sub foo { $x } # Named subs capture at compile-time.
}
say foo(); # abc, even though $x fell out of scope before foo was called.
Example 2:
sub make_closure {
my ($x) = @_;
return sub { $x }; # Anon subs capture at run-time.
}
my $foo = make_closure("foo");
my $bar = make_closure("bar");
say $foo->(); # foo
say $bar->(); # bar
[1] The allocation is possibly deferred until the variable is actually used.
What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?
First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates a new variable on execution.
It returns the new variable on execution.
In short, it's supposed to be like Java's Scalar x = new Scalar();, except that it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
[1] "Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
[2] If anything but the sub itself holds a reference to the variable (REFCNT>1), or if the variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my @a;
for (...) {
my $x = ...;
push @a, \$x;
}
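A small runnable sketch of that behaviour, with made-up loop bounds and values (the elided parts of the snippet above are left as they are):

```perl
use strict;
use warnings;

my @refs;
for my $i (1 .. 3) {
    my $x = $i * 10;
    push @refs, \$x;     # a reference escapes the loop body
}

# References stringify to their address, so hash keys count distinct variables.
my %distinct = map { $_ => 1 } @refs;
print scalar(keys %distinct), " distinct variables: ",
      join(", ", map { $$_ } @refs), "\n";
```

Each iteration ends up with its own scalar, so the three stored references keep their separate values.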
See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.
Given a root directory, I wish to identify the shallowest directory that contains a .svn directory, and likewise for pom.xml.
To achieve this I defined the following function:
use File::Find;
sub firstDirWithFileUnder {
$needle=@_[0];
my $result = 0;
sub wanted {
print "\twanted->result is '$result'\n";
my $dir = "${File::Find::dir}";
if ($_ eq $needle and ((not $result) or length($dir) < length($result))) {
$result=$dir;
print "Setting result: '$result'\n";
}
}
find(\&wanted, @_[1]);
print "Result: '$result'\n";
return $result;
}
..and call it thus:
$svnDir = firstDirWithFileUnder(".svn",$projPath);
print "\tIdentified svn dir:\n\t'$svnDir'\n";
$pomDir = firstDirWithFileUnder("pom.xml",$projPath);
print "\tIdentified pom.xml dir:\n\t'$pomDir'\n";
There are two situations which arise that I cannot explain:
When the search for a .svn is successful, the value of $result perceived inside the nested subroutine wanted persists into the next call of firstDirWithFileUnder. So when the pom search begins, although the line my $result = 0; still exists, the wanted subroutine sees its value as the return value from the last firstDirWithFileUnder call.
If the my $result = 0; line is commented out, then the function still executes properly. This means a) outer scope (firstDirWithFileUnder) can still see the $result variable to be able to return it, and b) print shows that wanted still sees $result value from last time, i.e. it seems to have formed a closure that's persisted beyond the first call of firstDirWithFileUnder.
Can somebody explain what's happening, and suggest how I can properly reset the value of $result to zero upon entering the outer scope?
Using warnings and then diagnostics yields this helpful information, including a solution:
Variable "$needle" will not stay shared at ----- line 12 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
$result is lexically scoped, meaning a brand new variable is allocated every time you call &firstDirWithFileUnder.
sub wanted { ... } is a compile-time subroutine declaration, meaning it is compiled by the Perl interpreter one time and stored in your package's symbol table. Since it contains a reference to the lexically scoped $result variable, the subroutine definition that Perl saves will only refer to the first instance of $result. The second time you call &firstDirWithFileUnder and declare a new $result variable, this will be a completely different variable than the $result inside &wanted.
You'll want to change your sub wanted { ... } declaration to a lexically scoped, anonymous sub:
my $wanted = sub {
print "\twanted->result is '$result'\n";
...
};
and invoke File::Find::find as
find($wanted, $_[1])
Here, $wanted is a run-time declaration for a subroutine, and it gets redefined with the current reference to $result in every separate call to &firstDirWithFileUnder.
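Put together, the whole function might look roughly like this. It is a sketch, not the asker's final code: the snake_case name first_dir_with_file_under is an illustrative rename, the debug prints are dropped, and undef (rather than 0) is assumed as the "not found" value.

```perl
use strict;
use warnings;
use File::Find;

sub first_dir_with_file_under {
    my ($needle, $root) = @_;
    my $result;    # fresh variable on every call; undef means "not found"

    # Anonymous sub: re-created on each call, so it closes over the current
    # $needle and $result instead of those from the first call.
    my $wanted = sub {
        return unless $_ eq $needle;
        my $dir = $File::Find::dir;
        $result = $dir
            if !defined $result || length($dir) < length($result);
    };

    find($wanted, $root);
    return $result;
}
```

It would be called as in the question, for example my $svnDir = first_dir_with_file_under(".svn", $projPath);, and each call starts with a clean $result.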
Update: This code snippet may prove instructive:
sub foo {
my $foo = 0; # lexical variable
$bar = 0; # global variable
sub compiletime {
print "compile foo is ", ++$foo, " ", \$foo, "\n";
print "compile bar is ", ++$bar, " ", \$bar, "\n";
}
my $runtime = sub {
print "runtime foo is ", ++$foo, " ", \$foo, "\n";
print "runtime bar is ", ++$bar, " ", \$bar, "\n";
};
&compiletime;
&$runtime;
print "----------------\n";
push @baz, \$foo; # explained below
}
&foo for 1..3;
Typical output:
compile foo is 1 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 2 SCALAR(0xac18c0)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 3 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xa63d18)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 4 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xac1db8)
runtime bar is 2 SCALAR(0xac1938)
----------------
Note that the compile time $foo always refers to the same variable SCALAR(0xac18c0), and that this is also the run time $foo THE FIRST TIME the function is run.
The last line of &foo, push @baz, \$foo, is included in this example so that $foo doesn't get garbage collected at the end of &foo. Otherwise, the 2nd and 3rd runtime $foo might point to the same address, even though they refer to different variables (the memory is reallocated each time the variable is declared).