For Loop and Lexically Scoped Variables - perl

Version #1
use warnings;
use strict;
my $count = 4;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count = 4
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
Version #2
use warnings;
use strict;
my $count;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count not defined
Whoops! I wanted to capture the exit value of $count, but the for loop had its own lexically scoped version of $count! I just had someone spend two hours trying to track down this bug.
Version #3
use warnings;
use strict;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
print "That's all folks!\n";
This gives me the error Global symbol "$count" requires explicit package name at line 5. But, I thought $count was automatically lexically scoped inside the for block. It seems like that only occurs when I've already declared a lexically scoped version of this variable elsewhere.
What was the reason for this behavior? Yes, I know about Conway's dictate that you should always use my for the for loop variable, but the question is why the Perl interpreter was designed this way.

In Perl, assignment to the variable in the loop is always localized to the loop, and the loop variable is always an alias to the looped over value (meaning you can change the original elements by modifying the loop variable). This is true both for package variables (our) and lexical variables (my).
This behavior is closest to that of Perl's dynamic scoping of package variables (with the local keyword), but is also special-cased to work with lexical variables (either declared in the loop or beforehand).
In no case though does the looped over value persist in the loop variable after the loop ends. For a loop scoped variable, this is fairly intuitive, but for variables with scope beyond the loop, the behavior is analogous to a value localized (with local) inside of a block scope created by the loop.
for our $val (1 .. 10) {...}
is equivalent to:
our $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
local *val = \$list[$i++];
# loop body
}
In pure perl it is not possible to write the expanded lexical version, but if a module like Data::Alias is used:
my $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
alias $val = $list[$i++];
# loop body
}
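Both effects described above (the alias into the list, and the loop variable regaining its former value) can be seen in a small sketch. The helper name `alias_demo` and its sample data are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a list-style loop over a pre-declared lexical
# and reports what the variable and the array hold afterwards.
sub alias_demo {
    my @list = (1, 2, 3);
    my $val  = 'before';
    for $val (@list) {
        $val *= 2;              # writes through the alias into @list
    }
    return ($val, "@list");     # $val is back to its pre-loop value
}

my ($val_after, $list_after) = alias_demo();
print "val = $val_after, list = $list_after\n";   # prints "val = before, list = 2 4 6"
```

The array elements were changed through the alias, while the outer $val never saw any of the looped-over values.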

Actually, in version #3 the variable is "localized" as opposed to lexically scoped.
The "foreach" loop iterates over a normal list value and sets the
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword "my", then it is lexically scoped, and is
therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with "my", it uses
that variable instead of the global one, but it's still localized to
the loop. This implicit localisation occurs only in a "foreach" loop.
In any case, you will not be able to access the loop variable from that style of for-loop outside the loop. But you could use the other (C-style) for-loop:
my $count;
for ($count=1; $count <= 8; $count++) {
last if $count == 6;
}
... # $count is now 6.
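If the list-style loop is preferred, another option is to copy the value into a variable declared outside the loop on each pass. This is only a sketch; `last_count` and `$last_seen` are made-up names for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value the loop variable held on exit.
sub last_count {
    my $last_seen;                 # outer lexical that survives the loop
    for my $count (1 .. 8) {
        $last_seen = $count;       # copy the aliased value out on every pass
        last if $count == 6;
    }
    return $last_seen;
}

print "Exit value = ", last_count(), "\n";   # prints "Exit value = 6"
```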

Why? Because the for loop creates its own lexically scoped version of $count inside its block.
This is wrong. If you had written for my $count (...) { ... } it would be true, but you didn't. Instead, if $count is already a global, it's localized -- the global that already exists is set to new values during the execution of the loop, and set back when it's done. The difference should be clear from this:
our $x = "orig";
sub foo {
print $x, "\n";
}
foo();
for $x (1 .. 3) {
foo();
}
for my $x (1 .. 3) {
foo();
}
The output is
orig
1
2
3
orig
orig
orig
The first for loop, without my, is changing the value of the global $x that already exists. The second for loop, with my, is creating a new lexical $x that isn't visible outside the loop. They're not the same.
This is also why example #3 fails -- since there isn't a lexical $count in scope and you haven't declared that you intend to touch the package global $count, strict 'vars' stops you in your tracks. It's not really behaving any differently for a for loop than anything else.

Related

Perl: Visibility of a subroutine's variables to another subroutine

I have a problem about variables of a subroutine which cannot be accessed by another subroutine.
the first subroutine :
sub esr_info {
my $esr ;
my @vpls = () ;
my @sap = ();
my @spoke = () ;
&conf_esr($esr , 1);
}
the second :
sub conf_esr {
my $e = $_[0] ;
# some code using (@vpls, @sap, @spoke)
}
The first calls the second, and I need the variables of the first to be local and not global for the whole code (for threading purposes). The second uses all the variables of the first. I get these errors:
Global symbol "$esr" requires explicit package name (did you forget to declare "my $esr"?) at w.pl line 63.
Global symbol "@vpls" requires explicit package name (did you forget to declare "my @vpls"?) at w.pl line 74.
My question: Can a subroutine access the vars of another without declaring those vars as global?
Many thanks for reading the post.
You can contain (restrict the visibility of) the variables to the two subs by introducing a scope { ... }, for example:
{
my $esr ;
my @vpls = () ;
my @sap = ();
my @spoke = () ;
sub esr_info {
conf_esr($esr , 1);
}
sub conf_esr {
my $e = $_[0] ;
# some code using (@vpls, @sap, @spoke)
}
}
But note that the variables now retain the values after the subs are exited (they become state variables). This is also called a closure.
But other approaches could be more appropriate (closures can make the code more difficult to read and hence to maintain) depending on your situation. For example, alternatives could be:
you could pass references to the variables as arguments to conf_esr, or better
use an object oriented approach where the variables are contained in a $self hash.
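The first alternative, passing references, might look roughly like the sketch below. The argument list of conf_esr, the placeholder values, and the body of conf_esr are assumptions made for illustration; the original post does not show what the real code does with these arrays.

```perl
use strict;
use warnings;

sub esr_info {
    my $esr   = 'esr-1';          # placeholder value (assumption)
    my @vpls  = (10, 20);         # placeholder data (assumption)
    my @sap   = ('1/1/1');
    my @spoke = ();
    # Pass references so conf_esr works on this call's own arrays.
    return conf_esr($esr, \@vpls, \@sap, \@spoke);
}

sub conf_esr {
    my ($e, $vpls, $sap, $spoke) = @_;
    push @$spoke, "$e:spoke";     # modifies the caller's @spoke via the reference
    return scalar(@$vpls) + scalar(@$sap) + scalar(@$spoke);
}

print esr_info(), "\n";           # prints "4"
```

Because every call to esr_info creates fresh lexicals, nothing is shared between calls, which fits the threading concern raised in the question better than variables held in an enclosing scope.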
My question : Can a subroutine access the vars of another without declaring those vars as global ?
No. You should try passing in variables, it's better form, but you can also use global variables.
my $i=1;
mysub(); # This will not change the global $i
print "i=$i\n"; # This should print '1'
exit;
##########
sub mysub
{my $i=2; # This is a variable local to mysub() only.
return;
}
Type in the code above and run it with Perl. Notice that the $i inside mysub() is a completely different variable from the file-level $i, because my $i in the sub declares a new variable with its own storage.
Now let's remove the my inside mysub(). mysub() will then change the outer $i, because it has no $i of its own declared.
my $i=1;
mysub(); # This changes the outer $i
print "i=$i\n"; # This should print '2'
exit;
##########
sub mysub
{$i=2; # This is changing the value in the global $i memory area.
return;
}

Why does this Perl variable keep its value

What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?
First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates a new variable on execution.
It returns the new variable on execution.
In short, it's supposed to be like Java's Scalar x = new Scalar();, except it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
[1] "Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
[2] If anything but the sub itself holds a reference to the variable (REFCNT>1), or if the variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my @a;
for (...) {
my $x = ...;
push @a, \$x;
}
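A concrete version of that last snippet, with the elided parts filled in by made-up values, could look like this. `collect_refs` and the numbers are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: stores a reference to $x on every iteration and
# returns the values those references point to once the loop is done.
sub collect_refs {
    my @refs;
    for my $n (1 .. 3) {
        my $x = $n * 10;
        push @refs, \$x;          # the stored reference keeps this $x alive
    }
    return map { $$_ } @refs;     # each reference still sees its own value
}

print join(' ', collect_refs()), "\n";   # prints "10 20 30"
```

If a single $x were merely cleared and reused, all three references would point at the same scalar; the distinct values show that each iteration ended up with its own variable.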
See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.

What's the difference between 'for' and 'foreach' in Perl?

I see these used interchangeably. What's the difference?
There is no difference. From perldoc perlsyn:
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity.
I see these used interchangeably.
There is no difference other than that of syntax.
Ever since its introduction in perl-2.0, foreach has been synonymous with for. It's a nod to the C shell's foreach command.
In my own code, in the rare case that I'm using a C-style for-loop, I write
for (my $i = 0; $i < $n; ++$i)
but for iterating over an array, I spell out
foreach my $x (@a)
I find that it reads better in my head that way.
Four letters.
They're functionally identical, just spelled differently.
From http://perldoc.perl.org/perlsyn.html#Foreach-Loops
The foreach keyword is actually a synonym for the for keyword, so you
can use either. If VAR is omitted, $_ is set to each value.
# Perl's C-style
for (;;) {
# do something
}
for my $j (@array) {
print $j;
}
foreach my $j (@array) {
print $j;
}
However:
If any part of LIST is an array, foreach will get very confused if you
add or remove elements within the loop body, for example with splice.
So don't do that.
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.
There is a subtle difference between the two loop forms, C-style and list-iterating, rather than between the keywords themselves (http://perldoc.perl.org/perlsyn.html#Foreach-Loops):
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localization occurs only in a foreach loop.
This program :
#!/usr/bin/perl -w
use strict;
my $var = 1;
for ($var=10;$var<=10;$var++) {
print $var."\n"; # print 10
foo(); # print 10
}
print $var."\n"; # print 11
foreach $var(100) {
print $var."\n"; # print 100
foo(); # print 11 !
}
sub foo {
print $var."\n";
}
will produce that :
10
10
11
100
11
With "for" (in its C-style form) you can use three steps:
1) Initialization
2) Condition checking
3) Increment or decrement
But with "foreach" (in its list form) you cannot increment or decrement the value yourself. It always steps through the list one element at a time.
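Since perlsyn describes the two keywords as synonyms, the stepping behaviour appears to belong to the loop form, not to the spelling. The sketch below swaps the keywords to show this; `loop_forms` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: runs one loop of each form with the "other" keyword.
sub loop_forms {
    my (@c_style, @list_style);
    # C-style loop spelled with "foreach": explicit step of 2.
    foreach (my $i = 0; $i < 6; $i += 2) {
        push @c_style, $i;
    }
    # List-style loop spelled with "for": the list decides order and stride.
    for my $n (reverse 1 .. 3) {
        push @list_style, $n;
    }
    return ("@c_style", "@list_style");
}

my ($c_out, $list_out) = loop_forms();
print "$c_out | $list_out\n";   # prints "0 2 4 | 3 2 1"
```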

What perl code samples can lead to undefined behaviour?

These are the ones I'm aware of:
The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...").
Modifying a variable twice in the same statement, like $i = $i++;
sort() in scalar context
truncate(), when LENGTH is greater than the length of the file
Using 32-bit integers, "1 << 32" is undefined. Shifting by a negative number of bits is also undefined.
Non-scalar assignment to "state" variables, e.g. state @a = (1..3).
One that is easy to trip over is prematurely breaking out of a loop while iterating through a hash with each.
#!/usr/bin/perl
use strict;
use warnings;
my %name_to_num = ( one => 1, two => 2, three => 3 );
find_name(2); # works the first time
find_name(2); # but fails this time
exit;
sub find_name {
my($target) = @_;
while( my($name, $num) = each %name_to_num ) {
if($num == $target) {
print "The number $target is called '$name'\n";
return;
}
}
print "Unable to find a name for $target\n";
}
Output:
The number 2 is called 'two'
Unable to find a name for 2
This is obviously a silly example, but the point still stands - when iterating through a hash with each you should either never last or return out of the loop; or you should reset the iterator (with keys %hash) before each search.
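The reset mentioned above can be sketched as follows. This variant returns the name instead of printing it, which is a change from the original find_name made only so the result is easy to check.

```perl
use strict;
use warnings;

my %name_to_num = ( one => 1, two => 2, three => 3 );

# Variant of find_name that resets the hash iterator before searching.
sub find_name {
    my ($target) = @_;
    keys %name_to_num;            # keys in void context resets the each() iterator
    while ( my ($name, $num) = each %name_to_num ) {
        return $name if $num == $target;   # early return is now harmless
    }
    return;                       # not found
}

print find_name(2), "\n";         # prints "two"
print find_name(2), "\n";         # prints "two" again
```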
These are just variations on the theme of modifying a structure that is being iterated over:
map, grep and sort where the code reference modifies the list of items to sort.
Another issue with sort arises where the code reference is not idempotent (in the comp sci sense)--sort_func($a, $b) must always return the same value for any given $a and $b.

Is there a way to modify foreach loop variable?

The following code gives an error message:
#!/usr/bin/perl -w
foreach my $var (0, 1, 2){
$var += 2;
print "$var\n";
}
Modification of a read-only value attempted at test.pl line 4.
Is there any way to modify $var? (I'm just asking out of curiosity; I was actually quite surprised to see this error message.)
In a foreach $var (@list) construct, $var becomes aliased to the elements of the loop, in the sense that the memory address of $var would be the same address as an element of @list. So your example code attempts to modify read-only values, and you get the error message.
This little script will demonstrate what is going on in the foreach construct:
my @a = (0,1,2);
print "Before: @a\n";
foreach my $var (@a) {
$var += 2;
}
print "After: @a\n";
Before: 0 1 2
After: 2 3 4
Additional info: This item from perlsyn is easy to gloss over but gives the whole scoop:
Foreach loops
...
If any element of LIST is an lvalue,
you can modify it by modifying VAR
inside the loop. Conversely, if any
element of LIST is NOT an lvalue, any
attempt to modify that element will
fail. In other words, the "foreach"
loop index variable is an implicit
alias for each item in the list that
you're looping over.
Perl is complaining about the values which are constants, not the loop variable. The readonly value it's complaining about is your 0 because $var is an alias to it, and it's not stored in a variable (which is something that you can change). If you loop over an array or a list of variables, you don't have that problem.
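One way to get modifiable loop values from a list of constants is to copy them into an array first. A minimal sketch, with `plus_two` as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns each input value plus 2.
sub plus_two {
    my @copy = @_;                # copying yields ordinary, modifiable scalars
    for my $var (@copy) {
        $var += 2;                # $var aliases an element of @copy, not a constant
    }
    return @copy;
}

print join(' ', plus_two(0, 1, 2)), "\n";   # prints "2 3 4"
```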
Figuring out why the following does not result in the same message will go a long way toward improving your understanding of why the message is emitted in the first place:
#!/usr/bin/perl
use strict;
use warnings;
for my $x ( @{[ 0, 1, 2 ]} ) {
$x += 2;
print $x, "\n";
}