What's the difference between 'for' and 'foreach' in Perl? - perl

I see these used interchangeably. What's the difference?

There is no difference. From perldoc perlsyn:
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity.

I see these used interchangeably.
There is no difference other than that of syntax.

Ever since its introduction in perl-2.0, foreach has been synonymous with for. It's a nod to the C shell's foreach command.
In my own code, in the rare case that I'm using a C-style for-loop, I write
for (my $i = 0; $i < $n; ++$i)
but for iterating over an array, I spell out
foreach my $x (@a)
I find that it reads better in my head that way.

Four letters.
They're functionally identical, just spelled differently.
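A minimal sketch of that interchangeability, using a made-up @letters array: the list form is spelled with for and the C-style form with foreach, which is legal if unusual.

```perl
use strict;
use warnings;

my @letters = ('a', 'b', 'c');   # illustrative data only

# List form spelled with "for".
my @seen_for;
for my $letter (@letters) {
    push @seen_for, $letter;
}

# C-style form spelled with "foreach".
my @seen_foreach;
foreach (my $i = 0; $i < @letters; $i++) {
    push @seen_foreach, $letters[$i];
}

print "@seen_for\n";      # prints "a b c"
print "@seen_foreach\n";  # prints "a b c"
```

Perl decides which kind of loop it is from what is inside the parentheses, not from the keyword.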

From http://perldoc.perl.org/perlsyn.html#Foreach-Loops
The foreach keyword is actually a synonym for the for keyword, so you
can use either. If VAR is omitted, $_ is set to each value.
# Perl's C-style
for (;;) {
    # do something
}
for my $j (@array) {
    print $j;
}
foreach my $j (@array) {
    print $j;
}
However:
If any part of LIST is an array, foreach will get very confused if you
add or remove elements within the loop body, for example with splice.
So don't do that.
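One way to sidestep the problem, sketched here on the assumption that the goal is to drop some elements, is to build the filtered list separately and assign it back once iteration is finished. The @numbers array and the "keep odd values" rule are only illustrative.

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

# Instead of calling splice on @numbers inside a loop over @numbers,
# compute the surviving elements first, then replace the array.
my @kept = grep { $_ % 2 } @numbers;   # keep odd values only
@numbers = @kept;

print "@numbers\n";   # prints "1 3 5 7 9"
```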

The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.

There is a subtle difference between the two loop forms, though not between the two keywords (http://perldoc.perl.org/perlsyn.html#Foreach-Loops):
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localization occurs only in a foreach loop.
This program:
#!/usr/bin/perl -w
use strict;
my $var = 1;
for ($var=10;$var<=10;$var++) {
print $var."\n"; # print 10
foo(); # print 10
}
print $var."\n"; # print 11
foreach $var(100) {
print $var."\n"; # print 100
foo(); # print 11 !
}
sub foo {
print $var."\n";
}
produces this output:
10
10
11
100
11

The C-style form of the loop has three parts:
1) Initialization
2) Condition checking
3) Increment or decrement
The list form has none of these. It simply visits each element of the list in order, so there is no step size to choose. Note that either keyword can introduce either form.
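A list-style loop can still walk in other steps if the list itself is generated that way. The following is only a sketch with arbitrary ranges. It compares an explicit C-style step of 2 with a list built by map, and shows reverse for counting down.

```perl
use strict;
use warnings;

# C-style form: explicit step of 2.
my @c_style;
for (my $i = 0; $i <= 8; $i += 2) {
    push @c_style, $i;
}

# List form: generate the stepped values first, then iterate over them.
my @list_style;
foreach my $i (map { $_ * 2 } 0 .. 4) {
    push @list_style, $i;
}

# Counting down works the same way with reverse.
my @countdown;
foreach my $i (reverse 1 .. 3) {
    push @countdown, $i;
}

print "@c_style\n";     # prints "0 2 4 6 8"
print "@list_style\n";  # prints "0 2 4 6 8"
print "@countdown\n";   # prints "3 2 1"
```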

Related

Scope of the default variable $_ in Perl

I have the following method which accepts a variable and then displays info from a database:
sub showResult {
    if (@_ == 2) {
        my @results = dbGetResults($_[0]);
        if (@results) {
            foreach (@results) {
                print "$count - $_[1] (ID: $_[0])\n";
            }
        } else {
            print "\n\nNo results found";
        }
    }
}
Everything works fine, except the print line in the foreach loop. This $_ variable still contains the values passed to the method.
Is there any way to 'force' the new scope of values on $_, or will it always contain the original values?
If there are any good tutorials that explain how the scope of $_ works, that would also be cool!
Thanks
The problem here is that you're really using @_ instead of $_. The foreach loop sets $_, the scalar variable, not @_, which is what you're accessing when you write $_[X]. Also, check the code again to see what @results actually contains. If it is an array of array references, you will need to dereference, for example with ${$_}[0] or $_->[0].
In Perl, the _ name can refer to a number of different variables:
The common ones are:
$_ the default scalar (set by foreach, map, grep)
@_ the default array (set by calling a subroutine)
The less common:
%_ the default hash (not used by anything by default)
_ the default file handle (used by file test operators)
&_ an unused subroutine name
*_ the glob containing all of the above names
Each of these variables can be used independently of the others. In fact, the only way that they are related is that they are all contained within the *_ glob.
Since the sigils vary with arrays and hashes, when accessing an element, you use the bracket characters to determine which variable you are accessing:
$_[0] # element of @_
$_{...} # element of %_
$$_[0] # first element of the array reference stored in $_
$_->[0] # same
The for/foreach loop can accept a variable name to use rather than $_, and that might be clearer in your situation:
for my $result (@results) {...}
In general, if your code is longer than a few lines, or nested, you should name the variables rather than relying on the default ones.
Since your question was related more to variable names than scope, I have not discussed the actual scope surrounding the foreach loop, but in general, the following code is equivalent to what you have.
for (my $i = 0; $i < @results; $i++) {
local *_ = \$results[$i];
...
}
The line local *_ = \$results[$i] installs the $i-th element of @results into the scalar slot of the *_ glob, aka $_. At this point $_ is an alias of the array element. The localization unwinds at the end of each loop iteration. local creates a dynamic scope, so any subroutines called from within the loop will see the new value of $_ unless they also localize it. There is much more detail available about these concepts, but I think they are outside the scope of your question.
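To make the two points above concrete, here is a small sketch (the subroutine names are invented for illustration): $_[0] reads from @_ even while a loop is setting $_, and a subroutine called from inside the loop sees the loop's current $_.

```perl
use strict;
use warnings;

my @log;

# Hypothetical helper: reads the global $_ instead of taking an argument.
sub record_current {
    push @log, "saw $_";
}

sub show_args {
    # @_ holds this sub's arguments; $_ is a completely separate variable.
    for (qw(x y)) {
        push @log, "arg0=$_[0] item=$_";
        record_current();   # dynamic scope: sees the loop's current $_
    }
}

show_args('first');
print "$_\n" for @log;
# prints:
#   arg0=first item=x
#   saw x
#   arg0=first item=y
#   saw y
```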
As others have pointed out:
You're really using @_ and not $_ in your print statement.
It's not good to keep stuff in these variables since they're used elsewhere.
Officially, $_ and @_ are global variables that Perl always places in package main, whatever the current package is. Perl 5.10 through 5.22 allowed a lexical my $_, but that was a really, really bad idea, and the feature was removed in Perl 5.24. The problem is that Perl could use these variables without you even knowing it. It's bad practice to depend upon their values for more than a few lines.
Here's a slight rewrite of your program that gets rid of the dependency on @_ and $_ as much as possible:
sub showResults {
    my $foo = shift;    # Or some meaningful name
    my $bar = shift;    # Or some meaningful name
    if (not defined $bar) {
        print "didn't pass two parameters\n";
        return;         # No need to hang around
    }
    if (my @results = dbGetResults($foo)) {
        foreach my $item (@results) {
            ...
        }
    }
}
Some modifications:
I used shift to give your two parameters actual names. foo and bar aren't good names, but I couldn't find out what dbGetResults was from, so I couldn't figure out what parameters you were looking for. The @_ is still being used when the parameters are passed, and my shift is depending upon the value of @_, but after the first two lines, I'm free.
Since your two parameters have actual names, I can use the if (not defined $bar) to see if both parameters were passed. I also changed this to the negative. This way, if they didn't pass both parameters, you can exit early. Your code has one less indent, and you don't have an if structure that takes up your entire subroutine. It makes it easier to understand your code.
I used foreach my $item (@results) instead of foreach (@results) and depending upon $_. Again, it's clearer what your program is doing, and you wouldn't have confused $_->[0] with $_[0] (I think that's what you were doing). It would have been obvious you wanted $item->[0].
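For illustration only, the same idea can be made runnable with a stand-in for the database call. Everything about dbGetResults below is an assumption (the original never shows it): it is taken to return a list of [id, name] array references. The formatResults name and the single query parameter are likewise invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real database call; assumed to
# return a list of [id, name] array references.
sub dbGetResults {
    my ($query) = @_;
    return $query eq 'fruit' ? ([1, 'apple'], [2, 'pear']) : ();
}

# Returns the formatted lines instead of printing them directly.
sub formatResults {
    my ($query) = @_;
    return ("No results found") unless defined $query;

    my @results = dbGetResults($query);
    return ("No results found") unless @results;

    my $count = 0;
    my @lines;
    foreach my $item (@results) {
        $count++;
        # $item is an array reference, so elements are reached with ->
        push @lines, "$count - $item->[1] (ID: $item->[0])";
    }
    return @lines;
}

print "$_\n" for formatResults('fruit');
# prints:
#   1 - apple (ID: 1)
#   2 - pear (ID: 2)
```

With a named loop variable there is no $_ to confuse with $_[0].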

For Loop and Lexically Scoped Variables

Version #1
use warnings;
use strict;
my $count = 4;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count = 4
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
Version #2
use warnings;
use strict;
my $count;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
if (not defined($count)) {
print "Count not defined\n";
}
else {
print "Count = $count\n";
}
This prints:
Count = 1
Count = 2
Count = 3
Count = 4
Count = 5
Count = 6
Count not defined
Whoops! I wanted to capture the exit value of $count, but the for loop had its own lexically scoped version of $count! I just had someone spend two hours trying to track down this bug.
Version #3
use warnings;
use strict;
for $count (1..8) {
print "Count = $count\n";
last if ($count == 6);
}
print "That's all folks!\n";
This gives me the error Global symbol "$count" requires explicit package name at line 5. But, I thought $count was automatically lexically scoped inside the for block. It seems like that only occurs when I've already declared a lexically scoped version of this variable elsewhere.
What was the reason for this behavior? Yes, I know about Conway's dictate that you should always use my for the for loop variable, but the question is why the Perl interpreter was designed this way.
In Perl, assignment to the variable in the loop is always localized to the loop, and the loop variable is always an alias to the looped over value (meaning you can change the original elements by modifying the loop variable). This is true both for package variables (our) and lexical variables (my).
This behavior is closest to that of Perl's dynamic scoping of package variables (with the local keyword), but is also special cased to work with lexical variables (either declared in the loop or before hand).
In no case though does the looped over value persist in the loop variable after the loop ends. For a loop scoped variable, this is fairly intuitive, but for variables with scope beyond the loop, the behavior is analogous to a value localized (with local) inside of a block scope created by the loop.
for our $val (1 .. 10) {...}
is equivalent to:
our $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
    local *val = \$list[$i++];
    # loop body
}
In pure perl it is not possible to write the expanded lexical version, but if a module like Data::Alias is used:
my $val;
my @list = 1 .. 10;
my $i = 0;
while ($i < @list) {
    alias $val = $list[$i++];
    # loop body
}
Actually, in version #3 the variable is "localized" as opposed to lexically scoped.
The "foreach" loop iterates over a normal list value and sets the
variable VAR to be each element of the list in turn. If the variable
is preceded with the keyword "my", then it is lexically scoped, and is
therefore visible only within the loop. Otherwise, the variable is
implicitly local to the loop and regains its former value upon exiting
the loop. If the variable was previously declared with "my", it uses
that variable instead of the global one, but it's still localized to
the loop. This implicit localisation occurs only in a "foreach" loop.
In any case, you will not be able to access the loop variable from that style of for-loop outside the loop. But you could use the other style (C-style) for-loop:
my $count;
for ($count=1; $count <= 8; $count++) {
last if $count == 6;
}
... # $count is now 6.
Why? Because the for loop creates its own lexically scoped version of $count inside its block.
This is wrong. If you had written for my $count (...) { ... } it would be true, but you didn't. Instead, if $count is already a global, it's localized -- the global that already exists is set to new values during the execution of the loop, and set back when it's done. The difference should be clear from this:
our $x = "orig";
sub foo {
print $x, "\n";
}
foo();
for $x (1 .. 3) {
foo();
}
for my $x (1 .. 3) {
foo();
}
The output is
orig
1
2
3
orig
orig
orig
The first for loop, without my, is changing the value of the global $x that already exists. The second for loop, with my, is creating a new lexical $x that isn't visible outside the loop. They're not the same.
This is also why example #3 fails -- since there isn't a lexical $count in scope and you haven't declared that you intend to touch the package global $count, strict 'vars' stops you in your tracks. It's not really behaving any differently for a for loop than anything else.
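If the goal is to know where a list-style loop stopped, one common workaround is to copy the value into a variable declared outside the loop. A minimal sketch, with $last_seen as an invented name:

```perl
use strict;
use warnings;

my $last_seen;                 # declared outside, so it survives the loop
for my $count (1 .. 8) {
    $last_seen = $count;       # copy the aliased value out
    last if $count == 6;
}

print defined $last_seen ? "Stopped at $last_seen\n" : "Loop never ran\n";
# prints "Stopped at 6"
```

The loop variable itself stays lexical to the loop, and the copy makes the intent explicit.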

What perl code samples can lead to undefined behaviour?

These are the ones I'm aware of:
The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...").
Modifying a variable twice in the same statement, like $i = $i++;
sort() in scalar context
truncate(), when LENGTH is greater than the length of the file
Using 32-bit integers, "1 << 32" is undefined. Shifting by a negative number of bits is also undefined.
Non-scalar assignment to "state" variables, e.g. state @a = (1..3). (List initialization of state arrays and hashes is supported as of Perl 5.28.)
One that is easy to trip over is prematurely breaking out of a loop while iterating through a hash with each.
#!/usr/bin/perl
use strict;
use warnings;
my %name_to_num = ( one => 1, two => 2, three => 3 );
find_name(2); # works the first time
find_name(2); # but fails this time
exit;
sub find_name {
my($target) = @_;
while( my($name, $num) = each %name_to_num ) {
if($num == $target) {
print "The number $target is called '$name'\n";
return;
}
}
print "Unable to find a name for $target\n";
}
Output:
The number 2 is called 'two'
Unable to find a name for 2
This is obviously a silly example, but the point still stands - when iterating through a hash with each you should either never last or return out of the loop; or you should reset the iterator (with keys %hash) before each search.
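A sketch of the second option, resetting the iterator before each search. This version returns the name instead of printing it, which is a change made here for illustration and not part of the original example.

```perl
use strict;
use warnings;

my %name_to_num = ( one => 1, two => 2, three => 3 );

sub find_name {
    my ($target) = @_;
    keys %name_to_num;    # calling keys resets the hash's each iterator
    while ( my ($name, $num) = each %name_to_num ) {
        return $name if $num == $target;   # early return is now harmless
    }
    return;               # nothing found
}

print find_name(2), "\n";   # prints "two"
print find_name(2), "\n";   # prints "two" again
```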
These are just variations on the theme of modifying a structure that is being iterated over:
map, grep and sort where the code reference modifies the list of items to sort.
Another issue with sort arises where the code reference is not idempotent (in the comp sci sense)--sort_func($a, $b) must always return the same value for any given $a and $b.

Is there a way to modify foreach loop variable?

The following code gives an error message:
#!/usr/bin/perl -w
foreach my $var (0, 1, 2){
$var += 2;
print "$var\n";
}
Modification of a read-only value attempted at test.pl line 4.
Is there any way to modify $var? (I'm just asking out of curiosity; I was actually quite surprised to see this error message.)
In a foreach $var (@list) construct, $var becomes aliased to the elements of the list, in the sense that $var refers to the same memory as the current element of @list. Your example code loops over literal constants, so it attempts to modify read-only values, and you get the error message.
This little script will demonstrate what is going on in the foreach construct:
my @a = (0,1,2);
print "Before: @a\n";
foreach my $var (@a) {
    $var += 2;
}
print "After: @a\n";
Before: 0 1 2
After: 2 3 4
Additional info: This item from perlsyn is easy to gloss over but gives the whole scoop:
Foreach loops
...
If any element of LIST is an lvalue,
you can modify it by modifying VAR
inside the loop. Conversely, if any
element of LIST is NOT an lvalue, any
attempt to modify that element will
fail. In other words, the "foreach"
loop index variable is an implicit
alias for each item in the list that
you're looping over.
Perl is complaining about the values which are constants, not the loop variable. The readonly value it's complaining about is your 0 because $var is an alias to it, and it's not stored in a variable (which is something that you can change). If you loop over an array or a list of variables, you don't have that problem.
Figuring out why the following does not result in the same message will go a long way toward improving your understanding of why the message is emitted in the first place:
#!/usr/bin/perl
use strict;
use warnings;
for my $x ( @{[ 0, 1, 2 ]} ) {
$x += 2;
print $x, "\n";
}
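Two hedged ways to get modified values without touching the read-only literals: work on a copy inside the loop, or copy the literals into an array first and modify that. Variable names are illustrative.

```perl
use strict;
use warnings;

# Option 1: leave the loop variable alone and modify a copy.
my @bumped;
for my $var (0, 1, 2) {
    my $copy = $var + 2;   # the literal itself is never written to
    push @bumped, $copy;
}
print "@bumped\n";   # prints "2 3 4"

# Option 2: copy the literals into an array, then modify its elements
# through the loop alias.
my @values = (0, 1, 2);
$_ += 2 for @values;
print "@values\n";   # prints "2 3 4"
```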

How would I do the equivalent of Prototype's Enumerator.detect in Perl with the least amount of code?

Lately I've been thinking a lot about functional programming. Perl offers quite a few tools to go that way, however there's something I haven't been able to find yet.
Prototype has the function detect for enumerators; its description is simply this:
Enumerator.detect(iterator[, context]) -> firstElement | undefined
Finds the first element for which the iterator returns true.
Enumerator in this case is any list while iterator is a reference to a function, which is applied in turn on each element of the list.
I am looking for something like this to apply in situations where performance is important, i.e. when stopping upon encountering a match saves time by disregarding the rest of the list.
I am also looking for a solution that would not involve loading any extra module, so if possible it should be done with builtins only. And if possible, it should be as concise as this for example:
my @result = map function @array;
You say you don't want a module, but this is exactly what the first function in List::Util does. That's a core module, so it should be available everywhere.
use List::Util qw(first);
my $first = first { some condition } @array;
If you insist on not using a module, you could copy the implementation out of List::Util. If somebody knew a faster way to do it, it would be in there. (Note that List::Util includes an XS implementation, so that's probably faster than any pure-Perl approach. It also has a pure-Perl version of first, in List::Util::PP.)
Note that the value being tested is passed to the subroutine in $_ and not as a parameter. This is a convenience when you're using the first { some condition } @values form, but is something you have to remember if you're using a regular subroutine. Some more examples:
use 5.010; # I want to use 'say'; nothing else here is 5.10 specific
use List::Util qw(first);
say first { $_ > 3 } 1 .. 10; # prints 4
sub wanted { $_ > 4 }; # note we're using $_ not $_[0]
say first \&wanted, 1 .. 10; # prints 5
my $want = \&wanted; # Get a subroutine reference
say first \&$want, 1 .. 10; # This is how you pass a reference in a scalar
# someFunc expects a parameter instead of looking at $_
say first { someFunc($_) } 1 .. 10;
Untested since I don't have Perl on this machine, but:
sub first(&@) {
    my $pred = shift;
    die "First argument to 'first' must be a sub" unless ref $pred eq 'CODE';
    for my $val (@_) {
        return $val if $pred->($val);   # the value arrives in $_[0], not $_
    }
    return undef;
}
Then use it as:
my $first = first { sub performing test } @list;
Note that this doesn't distinguish between no matches in the list and one of the elements in the list being an undefined value and having that match.
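One possible way around that ambiguity, sketched here with an invented helper named first_match, is to return a list: one element on a match and an empty list otherwise. The caller then checks how many elements came back and not whether the value is defined.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a one-element list on a match and an
# empty list otherwise, so a matching undef is still detectable.
sub first_match(&@) {
    my $pred = shift;
    for my $val (@_) {
        return ($val) if $pred->($val);   # predicate receives the value in $_[0]
    }
    return;   # empty list in list context
}

my @hit  = first_match { !defined $_[0] } (1, undef, 3);
my @miss = first_match { defined $_[0] && $_[0] > 5 } (1, undef, 3);

print scalar(@hit), "\n";    # prints 1 (the undef element matched)
print scalar(@miss), "\n";   # prints 0 (nothing matched)
```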
Just since it's not here, a Perl function definition of first that localizes $_ for its block:
sub first (&@) {
    my $code = shift;
    for (@_) {return $_ if $code->()}
    undef
}
my @array = 1 .. 10;
say first {$_ > 5} @array; # prints 6
While it will work fine, I don't advocate using this version, since List::Util is a core module (installed by default), and its implementation of first will usually use the XS version (written in C) which is much faster.