Explain scoping for functions called from PowerShell closures - powershell

The following PowerShell code displays unexpected scoping behavior for functions called from closures. Can you explain whether this is "by design", or is a defect?
function runblock($block) {
$x = 4
& $block
}
function printx() {
" in printx: x=" + $x
}
"PSVersion $($PSVersionTable.PSVersion)"
$x = 1
$b = {"In block x=" + $x ; $x = 3 ; printx}
$x = 2
runblock $b
$x = 1
$b = {"In closure x=" + $x ; $x = 3 ; printx}.GetNewClosure()
$x = 2
runblock $b
Output from the above is
PSVersion 3.0
In block x=4
in printx: x=3
In closure x=1
in printx: x=4
Most of the output makes sense to me:
The script block outputs In block x=4 since its parent scope is the runblock function. The printx function outputs x=3 since its parent scope is the script block scope.
The closure outputs In closure x=1 since the value of $x was captured by GetNewClosure call. All as expected.
BUT:
The call to printx from the closure outputs in printx: x=4. So the scope that printx executes within is unaffected by the scope of the closure where $x = 3.
It seems strange to me that a function called from a normal script block does see the variables in the script block scope, but a function called from a closure does not see the variables in the closure.

Consider following code:
function Run {
param($ScriptBlock)
$a = 3
# Picture 4
& $ScriptBlock
}
function Print {
# Picture 6
"Print `$a=$a"
"Print `$b=$b"
}
$a = 1
$b = 1
# Picture 1
$SB = {
# Picture 5
"Closure `$a=$a"
"Closure `$b=$b"
Print
}.GetNewClosure()
# Picture 2
$a = 2
$b = 2
# Picture 3
Run $SB
It prints:
Closure $a=1
Closure $b=1
Print $a=3
Print $b=2
Picture 1: you start from global scope, where you define some variables.
Picture 2: GetNewClosure() create new module and copy variables to it. (red arrow show parent scope relationship)
Picture 3: you change value of variables. Module scope not affected.
Picture 4: function Run called. It create local variable $a. (blue arrow show call direction)
Picture 5: $SB script block called. Script block bound to module session state, so you transit to it.
Picture 6: function Print called. Function bound to global session state, so you return to it.

Related

Global variable changed in function not effective

I just tried this code:
$number = 2
Function Convert-Foo {
$number = 3
}
Convert-Foo
$number
I expected that function Convert-Foo would change $number to 3, but it is still 2.
Why isn't the global variable $number changed to 3 by the function?
No, I'm afraid PowerShell isn't designed that way. You have to think in scopes, for more information on this topic please read the PowerShell help about scopes or type Get-Help about_scopes in your PowerShell ISE/Console.
The short answer is that if you want to change a variable that is in the global scope, you should address the global scope:
$number = 2
Function Convert-Foo {
$global:number = 3
}
Convert-Foo
$number
All variables created inside a Function are not visible outside of the function, unless you explicitly defined them as Script or Global. It's best practice to save the result of a function in another variable, so you can use it in the script scope:
$number = 5
Function Convert-Foo {
# do manipulations in the function
# and return the new value
$number * 10
}
$result = Convert-Foo
# Now you can use the value outside the function:
"The result of the function is '$result'"

Concisely passing functions as parameters

This function accepts two values and a function, and calls that function with the two values:
function abc ($a, $func, $b) { &$func $a $b }
Define a function to pass:
function bcd ($x, $y) { $x + $y }
Pass bcd to abc:
abc 10 bcd 20
The result is 30.
It appears that the bcd function object itself isn't being passed, but the string "bcd". Then abc invokes bcd by its name.
Is this considered an acceptable way to pass a function to another function? Most examples I've seen suggest passing a function in a script block which will be invoked in the receiving function. However that approach is more verbose than the above method.
You're correct that it works, as long as the scope doesn't change enough to invalidate the naming. So it's a bit fragile for truly general code; I wouldn't recommend it even in your profile script, for example. (Speaking from some painful experience with insufficiently general profile functions here.)
However, consider this sample, which is even shorter and more robust:
function abc($a, $func, $b) { &$func $a $b }
abc 10 { param($a, $b); $a+$b } 20
(Prints out 30.) You can do whatever you'd normally want with that param block, including validation.
abc 10 {param([parameter(Mandatory=$true)]$a, [parameter(Mandatory=$true)]$b); $a+$b} 20
Alternatively, predefine the function like this:
$bcd = { param($a, $b); $a+$b }
And continue as usual:
abc 10 $bcd 20
Unless there's some reason the passed function call needs to be isolated to it's own scope you can simplify that by just passing a script block and invoking it in the function's local scope:
function abc($a, $func, $b) {.$func}
abc 10 {$a+$b} 20
30

Why does this Perl variable keep its value

What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?
First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates an new variable on execution.
It returns the new variable on execution.
In short, it's suppose to be like Java's Scalar x = new Scalar();, except it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than then creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
"Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
If anything but the sub itself holds a reference to the variable (REFCNT>1) or if variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my #a;
for (...) {
my $x = ...;
push #a, \$x;
}
See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.

Perl - What scopes/closures/environments are producing this behaviour?

Given a root directory I wish to identify the most shallow parent directory of any .svn directory and pom.xml .
To achieve this I defined the following function
use File::Find;
sub firstDirWithFileUnder {
$needle=#_[0];
my $result = 0;
sub wanted {
print "\twanted->result is '$result'\n";
my $dir = "${File::Find::dir}";
if ($_ eq $needle and ((not $result) or length($dir) < length($result))) {
$result=$dir;
print "Setting result: '$result'\n";
}
}
find(\&wanted, #_[1]);
print "Result: '$result'\n";
return $result;
}
..and call it thus:
$svnDir = firstDirWithFileUnder(".svn",$projPath);
print "\tIdentified svn dir:\n\t'$svnDir'\n";
$pomDir = firstDirWithFileUnder("pom.xml",$projPath);
print "\tIdentified pom.xml dir:\n\t'$pomDir'\n";
There are two situations which arise that I cannot explain:
When the search for a .svn is successful, the value of $result perceived inside the nested subroutine wanted persists into the next call of firstDirWithFileUnder. So when the pom search begins, although the line my $result = 0; still exists, the wanted subroutine sees its value as the return value from the last firstDirWithFileUnder call.
If the my $result = 0; line is commented out, then the function still executes properly. This means a) outer scope (firstDirWithFileUnder) can still see the $result variable to be able to return it, and b) print shows that wanted still sees $result value from last time, i.e. it seems to have formed a closure that's persisted beyond the first call of firstDirWithFileUnder.
Can somebody explain what's happening, and suggest how I can properly reset the value of $result to zero upon entering the outer scope?
Using warnings and then diagnostics yields this helpful information, including a solution:
Variable "$needle" will not stay shared at ----- line 12 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
$result is lexically scoped, meaning a brand new variable is allocated every time you call &firstDirWithFileUnder.
sub wanted { ... } is a compile-time subroutine declaration, meaning it is compiled by the Perl interpreter one time and stored in your package's symbol table. Since it contains a reference to the lexically scoped $result variable, the subroutine definition that Perl saves will only refer to the first instance of $result. The second time you call &firstDirWithFileUnder and declare a new $result variable, this will be a completely different variable than the $result inside &wanted.
You'll want to change your sub wanted { ... } declaration to a lexically scoped, anonymous sub:
my $wanted = sub {
print "\twanted->result is '$result'\n";
...
};
and invoke File::Find::find as
find($wanted, $_[1])
Here, $wanted is a run-time declaration for a subroutine, and it gets redefined with the current reference to $result in every separate call to &firstDirWithFileUnder.
Update: This code snippet may prove instructive:
sub foo {
my $foo = 0; # lexical variable
$bar = 0; # global variable
sub compiletime {
print "compile foo is ", ++$foo, " ", \$foo, "\n";
print "compile bar is ", ++$bar, " ", \$bar, "\n";
}
my $runtime = sub {
print "runtime foo is ", ++$foo, " ", \$foo, "\n";
print "runtime bar is ", ++$bar, " ", \$bar, "\n";
};
&compiletime;
&$runtime;
print "----------------\n";
push #baz, \$foo; # explained below
}
&foo for 1..3;
Typical output:
compile foo is 1 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 2 SCALAR(0xac18c0)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 3 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xa63d18)
runtime bar is 2 SCALAR(0xac1938)
----------------
compile foo is 4 SCALAR(0xac18c0)
compile bar is 1 SCALAR(0xac1938)
runtime foo is 1 SCALAR(0xac1db8)
runtime bar is 2 SCALAR(0xac1938)
----------------
Note that the compile time $foo always refers to the same variable SCALAR(0xac18c0), and that this is also the run time $foo THE FIRST TIME the function is run.
The last line of &foo, push #baz,\$foo is included in this example so that $foo doesn't get garbage collected at the end of &foo. Otherwise, the 2nd and 3rd runtime $foo might point to the same address, even though they refer to different variables (the memory is reallocated each time the variable is declared).

For Loop and Lexically Scoped Variables

Version #1
use warnings;
use strict;
my $count = 4;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
1
2
3
4
5
6
4
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
Version #2
use warnings;
use strict;
my $count;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
1
2
3
4
5
6
Count not defined
Whoops! I wanted to capture the exit value of $count, but the for loop had it's own lexically scoped version of $count!. I just had someone spend two hours trying to track down this bug.
Version #3
use warnings;
use strict;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
print "That's all folks!\n";
This gives me the error Global symbol "$count" requires explicit package name at line 5. But, I thought $count was automatically lexically scoped inside the for block. It seems like that only occurs when I've already declared a lexically scoped version of this variable elsewhere.
What was the reason for this behavior? Yes, I know about Conway's dictate that you should always use my for the for loop variable, but the question is why was the Perl interpretor designed this way.
In Perl, assignment to the variable in the loop is always localized to the loop, and the loop variable is always an alias to the looped over value (meaning you can change the original elements by modifying the loop variable). This is true both for package variables (our) and lexical variables (my).
This behavior is closest to that of Perl's dynamic scoping of package variables (with the local keyword), but is also special cased to work with lexical variables (either declared in the loop or before hand).
In no case though does the looped over value persist in the loop variable after the loop ends. For a loop scoped variable, this is fairly intuitive, but for variables with scope beyond the loop, the behavior is analogous to a value localized (with local) inside of a block scope created by the loop.
for our $val (1 .. 10) {...}
is equivalent to:
our $val;
my #list = 1 .. 10;
my $i = 0;
while ($i < #list) {
local *val = \$list[$i++];
# loop body
}
In pure perl it is not possible to write the expanded lexical version, but if a module like Data::Alias is used:
my $val;
my #list = 1 .. 10;
my $i = 0;
while ($i < #list) {
alias $val = $list[$i++];
# loop body
}
Actually, in version #3 the variable is "localized" as opposed to lexically scoped.
The "foreach" loop iterates over a normal list value and sets the
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword "my", then it is lexically scoped, and is
therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with "my", it uses
that variable instead of the global one, but it's still localized to
the loop. This implicit localisation occurs only in a "foreach" loop.
In any case, you will not be able to access the loop variable from that stlye of for-loop outside the loop. But you could use the other style (C-style) for-loop:
my $count;
for ($count=1; $count <= 8; $count++) {
last if $count == 6;
}
... # $count is now 6.
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
This is wrong. If you had written for my $count (...) { ... } it would be true, but you didn't. Instead, if $count is already a global, it's localized -- the global that already exists is set to new values during the execution of the loop, and set back when it's done. The difference should be clear from this:
our $x = "orig";
sub foo {
print $x, "\n";
}
foo();
for $x (1 .. 3) {
foo();
}
for my $x (1 .. 3) {
foo();
}
The output is
orig
1
2
3
orig
orig
orig
The first for loop, without my, is changing the value of the global $x that already exists. The second for loop, with my, is creating a new lexical $x that isn't visible outside the loop. They're not the same.
This is also why example #3 fails -- since there isn't a lexical $count in scope and you haven't declared that you intend to touch the package global $count, strict 'vars' stops you in your tracks. It's not really behaving any differently for a for loop than anything else.