Why does this Perl variable keep its value - perl

What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?

First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates a new variable on execution.
It returns the new variable on execution.
In short, it's supposed to be like Java's Scalar x = new Scalar();, except it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
"Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
If anything but the sub itself holds a reference to the variable (REFCNT>1) or if the variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my @a;
for (...) {
my $x = ...;
push @a, \$x;
}
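To make the second note concrete, here is a minimal sketch. The helper name collect_refs and the values are made up for illustration; it only assumes core Perl. It shows that once a reference escapes the loop, each iteration ends up with its own scalar rather than one reused variable.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a reference to the loop's "my $x" on every pass.
sub collect_refs {
    my @refs;
    for my $i (1 .. 3) {
        my $x = $i * 10;
        push @refs, \$x;    # a reference escapes, so $x cannot just be cleared and reused
    }
    return @refs;
}

my @refs = collect_refs();
print join(', ', map { $$_ } @refs), "\n";    # 10, 20, 30
```

If the variable were simply cleared and reused, all three references would point at the same scalar.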

See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.

Related

Perl: Visibility of a subroutine's variables to another subroutine

I have a problem with the variables of one subroutine, which cannot be accessed by another subroutine.
The first subroutine:
sub esr_info {
my $esr ;
my @vpls = () ;
my @sap = ();
my @spoke = () ;
&conf_esr($esr , 1);
}
The second:
sub conf_esr {
my $e = $_[0] ;
# some code using (@vpls, @sap, @spoke)
}
The first calls the second, and I need the variables of the first to be local and not global for the whole code (for threading purposes). The second uses all the variables of the first. I get these errors:
Global symbol "$esr" requires explicit package name (did you forget to declare "my $esr"?) at w.pl line 63.
Global symbol "@vpls" requires explicit package name (did you forget to declare "my @vpls"?) at w.pl line 74.
My question: Can a subroutine access the vars of another without declaring those vars as global?
Many thanks for reading the post.
You can contain (restrict the visibility of) the variables to the two subs by introducing a scope { ... }, for example:
{
my $esr ;
my @vpls = () ;
my @sap = ();
my @spoke = () ;
sub esr_info {
conf_esr($esr , 1);
}
sub conf_esr {
my $e = $_[0] ;
# some code (@vpls, @sap, @spoke);
}
}
But note that the variables now retain the values after the subs are exited (they become state variables). This is also called a closure.
But other approaches could be more appropriate (closures can make the code more difficult to read and hence to maintain) depending on your situation. For example, alternatives could be:
you could pass references to the variables as arguments to conf_esr, or better
use an object oriented approach where the variables are contained in a $self hash.
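The first alternative (passing references) could look roughly like the sketch below. The argument order, the return value of esr_info and the strings pushed into the arrays are assumptions made for illustration; the original post does not show what conf_esr actually does with the arrays.

```perl
use strict;
use warnings;

sub conf_esr {
    # $vpls, $sap and $spoke are array references owned by the caller.
    my ($e, $flag, $vpls, $sap, $spoke) = @_;
    push @$vpls,  "vpls for $e";     # placeholder work, not from the original post
    push @$sap,   "sap for $e";
    push @$spoke, "spoke for $e";
    return;
}

sub esr_info {
    my ($esr) = @_;
    my (@vpls, @sap, @spoke);        # lexical to this call, nothing is global
    conf_esr($esr, 1, \@vpls, \@sap, \@spoke);
    return { vpls => \@vpls, sap => \@sap, spoke => \@spoke };
}

my $info = esr_info('esr1');
print "$info->{vpls}[0]\n";          # vpls for esr1
```

Because every call to esr_info gets fresh arrays, nothing is shared between calls, which is what you want for threading.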
My question : Can a subroutine access the vars of another without declaring those vars as global ?
No. You should try passing in variables, it's better form, but you can also use global variables.
my $i=1;
mysub(); # This will not change the global $i
print "i=$i\n"; # This should print '1'
exit;
##########
sub mysub
{my $i=2; # This is a variable local to mysub() only.
return;
}
Type in the code above and run it with Perl. Notice that the $i in the subroutine mysub() is completely different than the global $i in the program itself, because the $i in the mysub() is a different memory address.
Now let's change $i to global. mysub() will change the global $i because it doesn't have a local $i declared.
my $i=1;
mysub(); # This will change the global $i
print "i=$i\n"; # This should print '2'
exit;
##########
sub mysub
{$i=2; # This is changing the value in the global $i memory area.
return;
}

Can someone explain why Perl behaves this way (variable scoping)?

My test goes like this:
use strict;
use warnings;
func();
my $string = 'string';
func();
sub func {
print $string, "\n";
}
And the result is:
Use of uninitialized value $string in print at test.pl line 10.
string
Perl allows us to call a function before it has been defined. However when the function uses a variable declared only after the function call, the variable appears to be undefined. Is this behavior documented somewhere? Thank you!
The behaviour of my is documented in perlsub - it boils down to this - perl knows $string is in scope - because the my tells it so.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file.
It means it's 'in scope' from the point at which it's first 'seen' until the closing bracket of the current 'block'. (Or in your example - the end of the code)
However - in your example my also assigns a value.
This scoping process happens at compile time - where perl checks whether it's valid to use $string or not. (Thanks to strict). However - it can't know what the value was, because that might change during code execution. (and is non-trivial to analyze)
So if you do this it might be a little clearer what's going on:
#!/usr/bin/env perl
use strict;
use warnings;
my $string; #undefined
func();
$string = 'string';
func();
sub func {
print $string, "\n";
}
$string is in scope in both cases - because the my happened at compile time - before the subroutine has been called - but it doesn't have a value set beyond the default of undef prior to the first invocation.
Note this contrasts with:
#!/usr/bin/env perl
use strict;
use warnings;
sub func {
print $string, "\n";
}
my $string; #undefined
func();
$string = 'string';
func();
Which errors because when the sub is declared, $string isn't in scope.
First of all, I would consider this undefined behaviour since it skips executing my like my $x if $cond; does.
That said, the behaviour is currently consistent and predictable. And in this instance, it behaves exactly as expected if the optimization that warranted the undefined behaviour notice didn't exist.
At compile-time, my has the effect of declaring and allocating the variable[1]. Scalars are initialized to undef when created. Arrays and hashes are created empty.
my $string was encountered by the compiler, so the variable was created. But since you haven't executed the assignment yet, it still has its default value (undefined) during the first call to func.
This model allows variables to be captured by closures.
Example 1:
use feature qw( say );
{
my $x = "abc";
sub foo { $x } # Named subs capture at compile-time.
}
say foo(); # abc, even though $x fell out of scope before foo was called.
Example 2:
sub make_closure {
my ($x) = @_;
return sub { $x }; # Anon subs capture at run-time.
}
my $foo = make_closure("foo");
my $bar = make_closure("bar");
say $foo->(); # foo
say $bar->(); # bar
[1] The allocation is possibly deferred until the variable is actually used.

Query reg code in List::Util::reduce

I came across the following code in List::Util for reduce subroutine.
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
I could understand that reduce function is called as:
my $sum = reduce { $a + $b } 1 .. 1000;
So, I understood the code is trying to reference $a mentioned in the subroutine. But, I am unable to understand the intent correctly.
For reference, I am adding the complete code for subroutine
sub reduce (&@) {
my $code = shift;
require Scalar::Util;
my $type = Scalar::Util::reftype($code);
unless($type and $type eq 'CODE') {
require Carp;
Carp::croak("Not a subroutine reference");
}
no strict 'refs';
return shift unless @_ > 1;
use vars qw($a $b);
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
$a = shift;
foreach (@_) {
$b = $_;
$a = &{$code}();
}
$a;
}
The following aliases package variable $foo to variable $bar.
*foo = \$bar;
Any change to one changes the other as both names refer to the same scalar.
$ perl -E'
*foo = \$bar;
$bar=123; say $foo;
$foo=456; say $bar;
say \$foo == \$bar ? 1 : 0;
'
123
456
1
Of course, you can fully qualify *foo since it's a symbol table entry. The following aliases package variable $main::foo to $bar.
*main::foo = \$bar;
Or, if you don't know the name at compile time
my $caller = 'main';
*{$caller."::foo"} = \$bar; # Symbolic reference
$bar, of course, can just as easily be a lexical variable as a package variable. And since my $bar; actually returns the variable being declared,
my $bar;
*foo = \$bar;
can be written as
*foo = \my $bar;
So,
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
declares lexical variables $a and $b and aliases the similarly named package variables in the caller's namespace to them.
local simply causes everything to return to its original state once the sub is exited.
On scope
Perl has two variable name scoping mechanisms: global and lexical. Lexical variables are declared with my, and they are accessible by this name until the closing curly brace of the enclosing block.
Global variables, on the other hand, are accessible from anywhere and do not have a scope. They can be declared with our and use vars, or do not have to be declared if strict is not in effect. However, they have namespaces, or packages. The namespace is a prefix separated from the variable name by two colons (or a single quote, but never do that). Inside the package of the variable, the variable can be accessed with or without the prefix. Outside of the package, the prefix is required.
The local function is somewhat special and gives global variables a temporary value. The scope of this value is the same as that of a lexical variable plus the scopes of all subs called within this scope. The old value is restored once this scope is exited. This is called the dynamic scope.
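A small sketch of dynamic scope may help here. The variable $mode and the two subs are hypothetical names chosen for this example; only core Perl is assumed.

```perl
use strict;
use warnings;

our $mode = 'global value';      # a package (global) variable

sub show_mode { return $mode }   # just reads the global

sub with_local {
    local $mode = 'temporary value';   # seen here and in every sub called from here
    return show_mode();
}

my $inside  = with_local();
my $outside = show_mode();
print "$inside\n";     # temporary value
print "$outside\n";    # global value
```

show_mode never mentions local, yet it sees the temporary value while with_local is still running. That is the difference between dynamic and lexical scope.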
On Globs
Perl organizes global variables in a big hash representing the namespace and all variable names (sometimes called the stash). In each slot of this hash, there is a so-called glob. A typeglob is a special hash that has a field for each of Perl's native types, e.g. scalar, array, hash, IO, format, code etc. You assign to a slot by passing the glob a reference to the value you want to add; the glob figures out the right slot on its own. This is also the reason you can have multiple variables with the same name (like $thing, @thing, %thing, thing()). Typeglobs have a special sigil, namely the asterisk *.
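As a rough illustration of the slots, the sketch below assigns three different kinds of reference to the same glob. The name thing and the values are placeholders for this example; the our declaration is only there to keep strict happy.

```perl
use strict;
use warnings;

our ($thing, @thing);          # declare the package variables for strict

my $scalar = 'a scalar';
my @array  = (1, 2, 3);

*thing = \$scalar;             # scalar reference goes into the scalar slot
*thing = \@array;              # array reference goes into the array slot
*thing = sub { 'a sub' };      # code reference goes into the code slot

my $from_sub = thing();
print "$thing\n";              # a scalar
print scalar(@thing), "\n";    # 3
print "$from_sub\n";           # a sub
```

Each assignment only touches the slot matching the reference type, so the three values coexist under one name.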
On no strict 'refs'
The no strict 'refs' is a cool thing if you know what you are doing. Normally you can only dereference normal references, e.g.
my @array = (1 .. 5);
my $arrayref = \@array; # is a reference
push @{$arrayref}, 6; # works
push @{array}, 6; # works; barewords are considered o.k.
push @{"array"}, 6; # dies horribly, if strict refs enabled.
The last line tries to dereference a string, which is considered bad practice. However, under no strict 'refs', we can access a variable of which we do not know the name at compile time, as we do here.
Conclusion
The caller function returns the name of the package of the calling code, i.e. it looks up one call stack frame. The name is used here to construct the full names of the $a and $b variables of the calling package, so that they can be used there without a prefix. Then, these names are locally (i.e. in the dynamic scope) assigned to the reference of a newly declared, lexical variable.
The global variables $a and $b are predeclared in each package.
In the foreach loop, these lexicals are assigned different values (lexical vars take precedence over global vars), but the global variables $foo::a and $foo::b point to the same data because of the reference, allowing the anonymous callback sub in the reduce call to read the two arguments easily. (See ikegami's answer for details on this.)
All of this hassle is good because (a) the effects are not externally visible, and (b) the callback doesn't have to do tedious argument unpacking.

Perl - What scopes/closures/environments are producing this behaviour?

Given a root directory I wish to identify the most shallow parent directory of any .svn directory and pom.xml .
To achieve this I defined the following function
use File::Find;
sub firstDirWithFileUnder {
$needle=$_[0];
my $result = 0;
sub wanted {
print "\twanted->result is '$result'\n";
my $dir = "${File::Find::dir}";
if ($_ eq $needle and ((not $result) or length($dir) < length($result))) {
$result=$dir;
print "Setting result: '$result'\n";
}
}
find(\&wanted, $_[1]);
print "Result: '$result'\n";
return $result;
}
..and call it thus:
$svnDir = firstDirWithFileUnder(".svn",$projPath);
print "\tIdentified svn dir:\n\t'$svnDir'\n";
$pomDir = firstDirWithFileUnder("pom.xml",$projPath);
print "\tIdentified pom.xml dir:\n\t'$pomDir'\n";
There are two situations which arise that I cannot explain:
When the search for a .svn is successful, the value of $result perceived inside the nested subroutine wanted persists into the next call of firstDirWithFileUnder. So when the pom search begins, although the line my $result = 0; still exists, the wanted subroutine sees its value as the return value from the last firstDirWithFileUnder call.
If the my $result = 0; line is commented out, then the function still executes properly. This means a) outer scope (firstDirWithFileUnder) can still see the $result variable to be able to return it, and b) print shows that wanted still sees $result value from last time, i.e. it seems to have formed a closure that's persisted beyond the first call of firstDirWithFileUnder.
Can somebody explain what's happening, and suggest how I can properly reset the value of $result to zero upon entering the outer scope?
Using warnings and then diagnostics yields this helpful information, including a solution:
Variable "$needle" will not stay shared at ----- line 12 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
$result is lexically scoped, meaning a brand new variable is allocated every time you call &firstDirWithFileUnder.
sub wanted { ... } is a compile-time subroutine declaration, meaning it is compiled by the Perl interpreter one time and stored in your package's symbol table. Since it contains a reference to the lexically scoped $result variable, the subroutine definition that Perl saves will only refer to the first instance of $result. The second time you call &firstDirWithFileUnder and declare a new $result variable, this will be a completely different variable than the $result inside &wanted.
You'll want to change your sub wanted { ... } declaration to a lexically scoped, anonymous sub:
my $wanted = sub {
print "\twanted->result is '$result'\n";
...
};
and invoke File::Find::find as
find($wanted, $_[1])
Here, $wanted is a run-time declaration for a subroutine, and it gets redefined with the current reference to $result in every separate call to &firstDirWithFileUnder.
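Put together, the whole function might look something like the sketch below. The snake_case name first_dir_with_file_under and the parameter name $root are my own choices, and the debug prints are dropped; the comparison logic is copied from the question as-is.

```perl
use strict;
use warnings;
use File::Find;

# Return the directory with the shortest path under $root that directly
# contains an entry named $needle, or 0 if there is none.
sub first_dir_with_file_under {
    my ($needle, $root) = @_;
    my $result = 0;

    # Anonymous sub: created afresh on every call, so it captures
    # this call's $needle and $result rather than the first call's.
    my $wanted = sub {
        my $dir = $File::Find::dir;
        if ($_ eq $needle
            and (not $result or length($dir) < length($result))) {
            $result = $dir;
        }
    };

    find($wanted, $root);
    return $result;
}
```

Calling it twice, e.g. first_dir_with_file_under(".svn", $projPath) and then first_dir_with_file_under("pom.xml", $projPath), should now give independent results, since each call starts with its own $result set to 0.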
Update: This code snippet may prove instructive:
sub foo {
my $foo = 0; # lexical variable
$bar = 0; # global variable
sub compiletime {
print "compile foo is ", ++$foo, " ", \$foo, "\n";
print "compile bar is ", ++$bar, " ", \$bar, "\n";
}
my $runtime = sub {
print "runtime foo is ", ++$foo, " ", \$foo, "\n";
print "runtime bar is ", ++$bar, " ", \$bar, "\n";
};
&compiletime;
&$runtime;
print "----------------\n";
push @baz, \$foo; # explained below
}
&foo for 1..3;
Typical output:
compile foo is 1 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 2 SCALAR(0xac18c0)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 3 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xa63d18)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 4 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xac1db8)
runtime bar is 2 SCALAR(0xac1938)
----------------
Note that the compile time $foo always refers to the same variable SCALAR(0xac18c0), and that this is also the run time $foo THE FIRST TIME the function is run.
The last line of &foo, push @baz,\$foo is included in this example so that $foo doesn't get garbage collected at the end of &foo. Otherwise, the 2nd and 3rd runtime $foo might point to the same address, even though they refer to different variables (the memory is reallocated each time the variable is declared).

For Loop and Lexically Scoped Variables

Version #1
use warnings;
use strict;
my $count = 4;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count = 4
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
Version #2
use warnings;
use strict;
my $count;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count not defined
Whoops! I wanted to capture the exit value of $count, but the for loop had its own lexically scoped version of $count! I just had someone spend two hours trying to track down this bug.
Version #3
use warnings;
use strict;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
print "That's all folks!\n";
This gives me the error Global symbol "$count" requires explicit package name at line 5. But, I thought $count was automatically lexically scoped inside the for block. It seems like that only occurs when I've already declared a lexically scoped version of this variable elsewhere.
What was the reason for this behavior? Yes, I know about Conway's dictate that you should always use my for the for loop variable, but the question is why was the Perl interpreter designed this way.
In Perl, assignment to the variable in the loop is always localized to the loop, and the loop variable is always an alias to the looped over value (meaning you can change the original elements by modifying the loop variable). This is true both for package variables (our) and lexical variables (my).
This behavior is closest to that of Perl's dynamic scoping of package variables (with the local keyword), but is also special cased to work with lexical variables (either declared in the loop or beforehand).
In no case though does the looped over value persist in the loop variable after the loop ends. For a loop scoped variable, this is fairly intuitive, but for variables with scope beyond the loop, the behavior is analogous to a value localized (with local) inside of a block scope created by the loop.
for our $val (1 .. 10) {...}
is equivalent to:
our $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
local *val = \$list[$i++];
# loop body
}
In pure Perl it is not possible to write the expanded lexical version, but with a module like Data::Alias it would look like this:
my $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
alias $val = $list[$i++];
# loop body
}
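Both effects (aliasing and the value not persisting) can be seen in a short sketch. The names @nums and $n are arbitrary; only core Perl is assumed.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);
my $n    = 'before';

for $n (@nums) {
    $n *= 2;          # $n is an alias, so this changes the element of @nums
}

print "@nums\n";      # 2 4 6
print "$n\n";         # before
```

The array was modified through the loop variable, yet after the loop $n holds its old value again, much like a local would.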
Actually, in version #3 the variable is "localized" as opposed to lexically scoped.
The "foreach" loop iterates over a normal list value and sets the
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword "my", then it is lexically scoped, and is
therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with "my", it uses
that variable instead of the global one, but it's still localized to
the loop. This implicit localisation occurs only in a "foreach" loop.
In any case, you will not be able to access the loop variable from that style of for-loop outside the loop. But you could use the other style (C-style) for-loop:
my $count;
for ($count=1; $count <= 8; $count++) {
last if $count == 6;
}
... # $count is now 6.
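If you would rather keep the foreach style, another option is to copy the value into a variable declared outside the loop. This is just a sketch; last_count and $exit_value are made-up names.

```perl
use strict;
use warnings;

sub last_count {
    my $exit_value;                 # declared outside the loop, assigned inside it
    for my $count (1 .. 8) {
        $exit_value = $count;       # remember the most recent value
        last if $count == 6;
    }
    return $exit_value;
}

my $seen = last_count();
print "$seen\n";    # 6
```

Here the loop variable stays private to the loop (with my), and the value you care about survives in an ordinary lexical.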
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
This is wrong. If you had written for my $count (...) { ... } it would be true, but you didn't. Instead, if $count is already a global, it's localized -- the global that already exists is set to new values during the execution of the loop, and set back when it's done. The difference should be clear from this:
our $x = "orig";
sub foo {
print $x, "\n";
}
foo();
for $x (1 .. 3) {
foo();
}
for my $x (1 .. 3) {
foo();
}
The output is
orig
1
2
3
orig
orig
orig
The first for loop, without my, is changing the value of the global $x that already exists. The second for loop, with my, is creating a new lexical $x that isn't visible outside the loop. They're not the same.
This is also why example #3 fails -- since there isn't a lexical $count in scope and you haven't declared that you intend to touch the package global $count, strict 'vars' stops you in your tracks. It's not really behaving any differently for a for loop than anything else.