Perl - What scopes/closures/environments are producing this behaviour? - perl

Given a root directory I wish to identify the most shallow parent directory of any .svn directory and pom.xml .
To achieve this I defined the following function
use File::Find;
sub firstDirWithFileUnder {
$needle=#_[0];
my $result = 0;
sub wanted {
print "\twanted->result is '$result'\n";
my $dir = "${File::Find::dir}";
if ($_ eq $needle and ((not $result) or length($dir) < length($result))) {
$result=$dir;
print "Setting result: '$result'\n";
}
}
find(\&wanted, #_[1]);
print "Result: '$result'\n";
return $result;
}
..and call it thus:
$svnDir = firstDirWithFileUnder(".svn",$projPath);
print "\tIdentified svn dir:\n\t'$svnDir'\n";
$pomDir = firstDirWithFileUnder("pom.xml",$projPath);
print "\tIdentified pom.xml dir:\n\t'$pomDir'\n";
There are two situations which arise that I cannot explain:
When the search for a .svn is successful, the value of $result perceived inside the nested subroutine wanted persists into the next call of firstDirWithFileUnder. So when the pom search begins, although the line my $result = 0; still exists, the wanted subroutine sees its value as the return value from the last firstDirWithFileUnder call.
If the my $result = 0; line is commented out, then the function still executes properly. This means a) outer scope (firstDirWithFileUnder) can still see the $result variable to be able to return it, and b) print shows that wanted still sees $result value from last time, i.e. it seems to have formed a closure that's persisted beyond the first call of firstDirWithFileUnder.
Can somebody explain what's happening, and suggest how I can properly reset the value of $result to zero upon entering the outer scope?

Using warnings and then diagnostics yields this helpful information, including a solution:
Variable "$needle" will not stay shared at ----- line 12 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
$result is lexically scoped, meaning a brand new variable is allocated every time you call &firstDirWithFileUnder.
sub wanted { ... } is a compile-time subroutine declaration, meaning it is compiled by the Perl interpreter one time and stored in your package's symbol table. Since it contains a reference to the lexically scoped $result variable, the subroutine definition that Perl saves will only refer to the first instance of $result. The second time you call &firstDirWithFileUnder and declare a new $result variable, this will be a completely different variable than the $result inside &wanted.
You'll want to change your sub wanted { ... } declaration to a lexically scoped, anonymous sub:
my $wanted = sub {
print "\twanted->result is '$result'\n";
...
};
and invoke File::Find::find as
find($wanted, $_[1])
Here, $wanted is a run-time declaration for a subroutine, and it gets redefined with the current reference to $result in every separate call to &firstDirWithFileUnder.
Update: This code snippet may prove instructive:
sub foo {
my $foo = 0; # lexical variable
$bar = 0; # global variable
sub compiletime {
print "compile foo is ", ++$foo, " ", \$foo, "\n";
print "compile bar is ", ++$bar, " ", \$bar, "\n";
}
my $runtime = sub {
print "runtime foo is ", ++$foo, " ", \$foo, "\n";
print "runtime bar is ", ++$bar, " ", \$bar, "\n";
};
&compiletime;
&$runtime;
print "----------------\n";
push #baz, \$foo; # explained below
}
&foo for 1..3;
Typical output:
compile foo is 1 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 2 SCALAR(0xac18c0)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 3 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xa63d18)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 4 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xac1db8)
runtime bar is 2 SCALAR(0xac1938)
----------------
Note that the compile time $foo always refers to the same variable SCALAR(0xac18c0), and that this is also the run time $foo THE FIRST TIME the function is run.
The last line of &foo, push #baz,\$foo is included in this example so that $foo doesn't get garbage collected at the end of &foo. Otherwise, the 2nd and 3rd runtime $foo might point to the same address, even though they refer to different variables (the memory is reallocated each time the variable is declared).

Related

Why "my #variable" inside a sub and modified by sub/sub behaves so strange

I have a strange behaved (to Python programmer) subroutine, which simplified as the following:
use strict;
use Data::Dumper;
sub a {
my #x;
sub b { push #x, 1; print "inside: ", Dumper(\#x); }
&b;
print "outside: ", Dumper(\#x);
}
&a;
&a;
I found the result is:
inside: $VAR1=[ 1 ]
outside: $VAR1 = [ 1 ]
inside: $VAR1=[1, 1]
outside: $VAR1= []
What I thought is when calling &a, #x is empty array after "my #x" and has one element after "&b", then dead. Every time I call &a, it is the same. so the output should be all $VAR1 = [ 1 ].
Then I read something like named sub routine are defined once in symbol table, then I do "my $b = sub { ... }; &$b;", it seems make sense to me.
How to explain?
As per the "perlref" man page:
named subroutines are created at compile time so their lexical
variables [i.e., their 'my' variables] get assigned to the parent
lexicals from the first execution of the parent block. If a parent
scope is entered a second time, its lexicals are created again, while
the nested subs still reference the old ones.
In other words, a named subroutine (your b), has its #x bound to the parent subroutine's "first" #x, so when a is called the first time, b adds a 1 to #x, and both the inner and outer copies refer to this same version. However, the second time a is called, a new #x lexical is created, but b still points to the old one, so it adds a second 1 to that list and prints it (inner), but when it comes time for a to print its version, it prints out the (empty) brand new lexical (outer).
Anonymous subroutines don't exhibit this problem, so when you write my $b = sub { ... }, the inner #x always refers to the "current" version of a's lexical #x.

2 Sub references as arguments in perl

I have perl function I dont what does it do?
my what does min in perl?
#ARVG what does mean?
sub getArgs
{
my $argCnt=0;
my %argH;
for my $arg (#ARGV)
{
if ($arg =~ /^-/) # insert this entry and the next in the hash table
{
$argH{$ARGV[$argCnt]} = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;}
Code like that makes David sad...
Here's a reformatted version of the code doing the indentations correctly. That makes it so much easier to read. I can easily tell where my if and loops start and end:
sub getArgs {
my $argCnt = 0;
my %argH;
for my $arg ( #ARGV ) {
if ( $arg =~ /^-/ ) { # insert this entry and the next in the hash table
$argH{ $ARGV[$argCnt] } = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;
}
The #ARGV is what is passed to the program. It is an array of all the arguments passed. For example, I have a program foo.pl, and I call it like this:
foo.pl one two three four five
In this case, $ARGV is set to the list of values ("one", "two", "three", "four", "five"). The name comes from a similar variable found in the C programming language.
The author is attempting to parse these arguments. For example:
foo.pl -this that -the other
would result in:
$arg{"-this"} = "that";
$arg{"-the"} = "other";
I don't see min. Do you mean my?
This is a wee bit of a complex discussion which would normally involve package variables vs. lexically scoped variables, and how Perl stores variables. To make things easier, I'm going to give you a sort-of incorrect, but technically wrong answer: If you use the (strict) pragma, and you should, you have to declare your variables with my before they can be used. For example, here's a simple two line program that's wrong. Can you see the error?
$name = "Bob";
print "Hello $Name, how are you?\n";
Note that when I set $name to "Bob", $name is with a lowercase n. But, I used $Name (upper case N) in my print statement. As it stands, now. Perl will print out "Hello, how are you?" without a care that I've used the wrong variable name. If it's hard to spot an error like this in a two line program, imagine what it would be like in a 1000 line program.
By using strict and forcing me to declare variables with my, Perl can catch that error:
use strict;
use warnings; # Another Pragma that should always be used
my $name = "Bob";
print "Hello $Name, how are you doing\n";
Now, when I run the program, I get the following error:
Global symbol "$Name" requires explicit package name at (line # of print statement)
This means that $Name isn't defined, and Perl points to where that error is.
When you define variables like this, they are in scope with in the block where it's defined. A block could be the code contained in a set of curly braces or a while, if, or for statement. If you define a variable with my outside of these, it's defined to the end of the file.
Thus, by using my, the variables are only defined inside this subroutine. And, the $arg variable is only defined in the for loop.
One more thing:
The person who wrote this should have used the Getopt::Long module. There's a major bug in their code:
For example:
foo.pl -this that -one -two
In this case, my hash looks like this:
$args{'-this'} = "that";
$args{'-one'} = "-two";
$args{'-two'} = undef;
If I did this:
if ( defined $args{'-two'} ) {
...
}
I would not execute the if statement.
Also:
foo.pl -this=that -one -two
would also fail.
#ARGV is a special variable (refer to perldoc perlvar):
#ARGV
The array #ARGV contains the command-line arguments intended for the
script. $#ARGV is generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's command name itself.
See $0 for the command name.
Perl documentation is also available from your command line:
perldoc -v #ARGV

Why does this Perl variable keep its value

What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?
First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates an new variable on execution.
It returns the new variable on execution.
In short, it's suppose to be like Java's Scalar x = new Scalar();, except it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than then creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
"Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
If anything but the sub itself holds a reference to the variable (REFCNT>1) or if variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my #a;
for (...) {
my $x = ...;
push #a, \$x;
}
See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.

How does the closure mechanism work in Perl?

Newbie in Perl again here, trying to understand closure in Perl.
So here's an example of code which I don't understand:
sub make_saying {
my $salute = shift;
my $newfunc = sub {
my $target = shift;
print "$salute, $target!\n";
};
return $newfunc; # Return a closure
}
$f = make_saying("Howdy"); # Create a closure
$g = make_saying("Greetings"); # Create another closure
# Time passes...
$f->("world");
$g->("earthlings");
So my questions are:
If a variable is assigned to a function, is it automatically a reference to that function?
In that above code, can I write $f = \make_saying("Howdy") instead? And when can I use the & because I tried using that in passing the parameters (&$f("world")) but it doesn't work.
and lastly, in that code above how did the strings world and earthlings get appended to the strings Howdy and Greetings.
Note: I understand that $f is somewhat bound to the function with the parameter "Howdy" so that's my understanding how "world" got appended. What I don't understand is the 2nd function inside. How that one operates its magic. Sorry I really don't know how to ask this one.
In Perl, scalar variables cannot hold subroutines directly, they can only hold references. This is very much like scalars cannot hold arrays or hashes, only arrayrefs or hashrefs.
The sub { ... } evaluates to a coderef, so you can directly assign it to a scalar variable. If you want to assign a named function (e.g. foo), you have to obtain the reference like \&foo.
You can call coderefs like $code->(#args) or &$code(#args).
The code
$f = \make_saying("Howdy")
evaluates make_saying("Howdy"), and takes a reference to the returned value. So you get a reference that points to a coderef, not a coderef itself.
Therefore, it can't be called like &$f("world"), you need to dereference one extra level: &$$f("world").
A closure is a function that is bound to a certain environment.
The environment consists of all currently visible variables, so a closure always remembers that scope. In the code
my $x;
sub foo {
my $y;
return sub { "$x, $y" };
}
foo is a closure over $x, as the outer environment consists of $x. The inner sub is a closure over $x and $y.
Each time foo is executed, we get a new $y and therefore a new closure. Most importantly $y does not "go out of scope" when foo is left, but it will be "kept alive" as it's still reachable from the closure returned.
In short: $y is a "local state" of the closure returned.
When we execute make_saying("Howdy"), the $salute variable is set to Howdy. The returned closure remembers that scope.
When we execute it again with make_saying("Greetings"), the body of make_saying is evaluated again. The $salute is now set to Greetings, and the inner sub closes over this variable. This variable is separate from the previous $salute, which still exists, but isn't accessible except through the first closure.
The two greeters have closed over separate $salute variables. When they are executed, their respective $salute is still in scope, and they can access and modify the value.
If a variable is asigned to a function, is it automatically a
reference to that function?
No. In example the function make_saying return reference another function. Such closures do not have name and can catch a variable from outside its scope (variable $salute in your example).
In that above code, can i write $f = \make_saying("Howdy") instead?
And when can i use the & cause i tried using that in passing the
parameters (&$f("world")) but it doesnt work.
No. $f = \make_saying("Howdy") is not what you think (read amon post for details). You can write $f = \&make_saying; which means "put into $f reference to function make_saying". You can use it later like this:
my $f = \&make_saying;
my $other_f = $f->("Howdy");
$other_f->("world");
and lastly, In that code above how in he** did the words world and
earthlings got appended to the words howdy and greetings.
make_saying creating my variable which goes into lamda (my $newfunc = sub); that lambda is returned from make_saying. It holds the given word "Howdy" through "closing" (? sorry dont know which word in english).
Every time you call the subroutine 'make_saying', it:
creates a DIFFERENT closure
assigns the received parameter to the scalar '$salute'
defines (creates but not execute) an inner anonymous subroutine:
That is the reason why at that moment nothing is assigned to the scalar
$target nor is the statement print "$salute, $target!\n"; executed .
finally the subroutine 'make_saying' returns a reference to the inner anonymous subroutine, that reference becomes the only way to call the (specific) anonymous subroutine.
Ever time you call each anonymous subroutine, it:
assign the received parameter to the scalar $target
sees also the scalar $salute that will have the value assigned at the moment in which was created the anonymous subroutine (when was called its parent subroutine make_saying
finally executes the statement print "$salute, $target!\n";

Shared variables in the context of subroutines vs anonymous subroutines

I saw this bit of code in an answer to another post: Why would I use Perl anonymous subroutines instead of a named one?, but couldn't figure out exactly what as going on, so I wanted to run it myself.
sub outer
{
my $a = 123;
sub inner
{
print $a, "\n"; #line 15 (for your reference, all other comments are the OP's)
}
# At this point, $a is 123, so this call should always print 123, right?
inner();
$a = 456;
}
outer(); # prints 123
outer(); # prints 456! Surprise!
In the above example, I received a warning: "Variable $a will not stay shared at line 15.
Obviously, this is why the output is "unexpected," but I still don't really understand what's happening here.
sub outer2
{
my $a = 123;
my $inner = sub
{
print $a, "\n";
};
# At this point, $a is 123, and since the anonymous subrotine
# whose reference is stored in $inner closes over $a in the
# "expected" way...
$inner->();
$a = 456;
}
# ...we see the "expected" results
outer2(); # prints 123
outer2(); # prints 123
In the same vein, I don't understand what's happening in this example either. Could someone please explain?
Thanks in advance.
It has to do with compile-time vs. run-time parsing of subroutines. As the diagnostics message says,
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
Annotating your code:
sub outer
{
# 'my' will reallocate memory for the scalar variable $a
# every time the 'outer' function is called. That is, the address of
# '$a' will be different in the second call to 'outer' than the first call.
my $a = 123;
# the construction 'sub NAME BLOCK' defines a subroutine once,
# at compile-time.
sub inner1
{
# since this subroutine is only getting compiled once, the '$a' below
# refers to the '$a' that is allocated the first time 'outer' is called
print "inner1: ",$a, "\t", \$a, "\n";
}
# the construction sub BLOCK defines an anonymous subroutine, at run time
# '$inner2' is redefined in every call to 'outer'
my $inner2 = sub {
# this '$a' now refers to '$a' from the current call to outer
print "inner2: ", $a, "\t", \$a, "\n";
};
# At this point, $a is 123, so this call should always print 123, right?
inner1();
$inner2->();
# if this is the first call to 'outer', the definition of 'inner1' still
# holds a reference to this instance of the variable '$a', and this
# variable's memory will not be freed when the subroutine ends.
$a = 456;
}
outer();
outer();
Typical output:
inner1: 123 SCALAR(0x80071f50)
inner2: 123 SCALAR(0x80071f50)
inner1: 456 SCALAR(0x80071f50)
inner2: 123 SCALAR(0x8002bcc8)
You can print \&inner; in the first example (after definition), and print $inner; in second.
What you see are hex code references which are equal in first example and differ in second.
So, in the first example inner gets created only once, and it is always closure to $a lexical variable from the first call of the outer().