As I use $_ a lot, I want to understand its usage better. As far as I understand it, $_ is a global variable that holds implicit values.
As $_ seems to be set anyway, are there reasons to use named loop variables over $_ besides readability?
In what cases does it matter $_ is a global variable?
So if I use
for (@array){
print $_;
}
or even
print $_ for @array;
it has the same effect as
for my $var (@array){
print $var;
}
But does it work the same? I guess it does not exactly but what are the actual differences?
Update:
It seems $_ is even scoped correctly in this example. Is it not global anymore? I am using 5.12.3.
#!/usr/bin/perl
use strict;
use warnings;
my @array = qw/one two three four/;
my @other_array = qw/1 2 3 4/;
for (@array){
    for (@other_array){
        print $_;
    }
    print $_;
}
that prints correctly 1234one1234two1234three1234four.
For a global $_ I would have expected 1234 4 1234 4 1234 4 1234 4 .. or am I missing something obvious?
When is $_ global then?
Update:
Ok, after having read the various answers and perlsyn more carefully I came to a conclusion:
Besides readability, it is better to avoid using $_ because you have to know where $_ is implicitly localised and take that into account; otherwise you might encounter unexpected behaviour.
Thanks for clarification of that matter.
are there reasons to use named loop variables over $_ besides readability?
The issue is not if they are named or not. The issue is if they are "package variables" or "lexical variables".
See the very good description of the 2 systems of variables used in Perl "Coping with Scoping":
http://perl.plover.com/FAQs/Namespaces.html
package variables are global variables, and should therefore be avoided for all the usual reasons (eg. action at a distance).
Avoiding package variables is a question of "correct operation" or "harder to inject bugs" rather than a question of "readability".
In what cases does it matter $_ is a global variable?
Everywhere.
The better question is:
In what cases is $_ local()ized for me?
There are a few places where Perl will local()ize $_ for you, primarily foreach, grep and map. All other places require that you local()ize it yourself, therefore you will be injecting a potential bug when you inevitably forget to do so. :-)
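To make the localisation concrete, here is a minimal sketch (the values are made up for illustration) showing that foreach, map and grep each restore the previous $_ when they finish, including in nested loops:

```perl
use strict;
use warnings;

$_ = 'outer';                       # the value we expect to survive

my @seen;
for (qw(one two)) {                 # foreach aliases $_ to each element
    for (1, 2) {                    # a nested foreach does the same
        push @seen, $_;
    }
    push @seen, $_;                 # the outer element is visible again
}

my @doubled = map  { $_ * 2 } 1 .. 3;   # map sets $_ per element
my @odd     = grep { $_ % 2 } 1 .. 3;   # so does grep

my $after = $_;                     # still 'outer'
print "@seen | $after\n";           # prints: 1 2 one 1 2 two | outer
```

This is also why the nested-loop example in the question prints what it does: each loop saves and restores $_ around its own iterations.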
The classic failure mode of using $_ (implicitly or explicitly) as a loop variable is
for $_ (@myarray) {
/(\d+)/ or die;
foo($1);
}
sub foo {
open(F, "foo_$_[0]") or die;
while (<F>) {
...
}
}
Because the loop variable in for/foreach is aliased to the actual list item, the while (<F>) in foo overwrites the elements of @myarray with the lines read from the files (and leaves each one undef once the file is exhausted).
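The failure is easy to reproduce. The sketch below is an illustration rather than the original code: it reads from an in-memory string instead of real files, and the names foo_clobbering, foo_safe and $file_text are made up. One sub reads into the global $_, the other into a lexical variable:

```perl
use strict;
use warnings;

my $file_text = "line one\nline two\n";   # hypothetical stand-in for a file

sub foo_clobbering {
    open my $fh, '<', \$file_text or die "Cannot open in-memory file: $!";
    while (<$fh>) { }            # assigns to the global $_, which the caller's
                                 # loop has aliased to an array element
    close $fh;
}

sub foo_safe {
    open my $fh, '<', \$file_text or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) { } # lexical variable, $_ is never touched
    close $fh;
}

my @damaged = ('10', '20');
foo_clobbering() for @damaged;   # each element ends up undef

my @intact = ('10', '20');
foo_safe() for @intact;          # elements are unchanged

my $undef_count = grep { !defined($_) } @damaged;
print "$undef_count clobbered, intact: @intact\n";   # prints: 2 clobbered, intact: 10 20
```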
$_ is the same as naming the variable as in your second example with the way it is usually used. $_ is just a shortcut default variable name for the current item in the current loop to save on typing when doing a quick, simple loop. I tend to use named variables rather than the default. It makes it more clear what it is and if I happen to need to do a nested loop there are no conflicts.
Since $_ is a global variable, you may get unexpected values if you try to use its value that it had from a previous code block. The new code block may be part of a loop or other operation that inserts its own values into $_, overwriting what you expected to be there.
The risk in using $_ is that it is global (unless you localise it with local $_), and so if some function you call in your loop also uses $_, the two uses can interfere.
For reasons which are not clear to me, this has only bitten me occasionally, but I usually localise $_ if I use it inside packages.
There is nothing special about $_ apart from it being the default argument for many functions. If you explicitly scope your $_ lexically with my, perl will use the lexical version of $_ rather than the global one. There is nothing strange in this; it is just like any other named variable. (Note that my $_ only exists in Perl 5.10 through 5.22: it was marked experimental in 5.18 and removed in 5.24.)
sub p { print "[$_]"; } # Prints the global $_
# Compare and contrast
for my $_ (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex1
for my $_ (b1..b5) { for (a1..a5) { p } } print "\n"; # ex2
for (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex3
for (b1..b5) { for (a1..a5) { p } } print "\n"; # ex4
You should be slightly mystified by the output until you find out that perl will preserve the original value of the loop variable on loop exit (see perlsyn).
Note ex2 above. Here the second loop is using the lexically scoped $_ declared in the first loop. Subtle, but expected. Again, this value is preserved on exit so the two loops do not interfere.
Related
I am very very new to perl programming.
While reading about the loops, for the foreach loop I got two examples.
The one example is,
foreach ('hickory','dickory','doc') {
print $_;
print "\n";
}
Output:-
hickory
dickory
doc
The $_ variable contains each item in turn, so it prints them.
In another example, they did not specify the $_ variable in the print statement; there is only a bare print. How does it print the foreach arguments?
foreach ('hickory','dickory','doc') {
print;
print "\n";
}
Output:-
hickory
dickory
doc
This gives the same output. How does it print the values? The book did not give any explanation for that. I searched the internet, but I was not able to find anything.
Your question about print in foreach being answered, here is a little more on $_.
From General Variables in perlvar
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf, quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only), rmdir, say, sin, split (for its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X.
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put the next value or input record when a <FH>, readline, readdir or each operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
$_ is by default a global variable.
As you can see, it is available nearly everywhere and it is indeed used a lot. Note that the perlvar page describes a whole lot more of similar variables, many of them good to know about.
Here is an example. Consider that we read lines from a file, want to discard the ones which have only spaces or start with # (comments), and for others want to split them by spaces into words.
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
next if not /\S/;
next if /^\s*#/;
my @words = split;
# do something with @words ...
}
Let's see how many uses of $_ are in the above example. Here is an equivalent program
while (my $line = <$fh>)
{
next if not $line =~ m/\S/; # if not matching any non-space character
next if $line =~ m/^\s*#/; # if matching # after only (possible) spaces
my @words = split ' ', $line; # split $line by ' ' (any white space)
# do something with @words ...
}
Compare these two
the filehandle read <$fh> in the while condition assigns to $_, then available in the loop.
regular expression's match operator by default works on $_. The m itself can be dropped.
split by default splits $_. We also use the other default, for the pattern to split the string by, which is ' ' (any amount of any white space).
once we do $line = <$fh> the deal with $_ is off (the read no longer sets it inside the loop) and we have to use $line everywhere. So either do this or do while (<$fh>) and use $_.
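To check that the two loops really are equivalent, here is a self-contained sketch that runs both over the same input. The sample text, the in-memory filehandle and the sub names words_implicit and words_explicit are assumptions made for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical input: a comment, a blank line, two data lines, a spaces-only line
my $text = "# a comment\n\n  alpha beta\ngamma\n   \n";

sub words_implicit {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $_;                     # keep the caller's $_ safe
    my @all;
    while (<$fh>) {               # read goes into $_
        next if not /\S/;         # match against $_
        next if /^\s*#/;
        push @all, split;         # split ' ', $_
    }
    return @all;
}

sub words_explicit {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    my @all;
    while (my $line = <$fh>) {
        next if not $line =~ m/\S/;
        next if $line =~ m/^\s*#/;
        push @all, split ' ', $line;
    }
    return @all;
}

my @implicit = words_implicit();
my @explicit = words_explicit();
print "@implicit\n";              # prints: alpha beta gamma
print "@explicit\n";              # prints: alpha beta gamma
```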
To illustrate all this a bit further, let us find the length of the longest capitalized word on each line
use List::Util 'max';
my $longest_cap = max map { length } grep { /^[A-Z]/ } @words;
The grep takes the list in @words and applies the block to each element. Each element is aliased to $_ and is thus available to the code inside the block as $_. This is what the regex uses by default. The ones that satisfy the condition are passed to map, which also iterates, aliasing each to $_, which is of course the default for length. Finally max from List::Util picks the largest one.
Note that $_ is never actually written and no temporary variable is needed.
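Here is that pipeline as a runnable sketch with a made-up word list, so you can see what it evaluates to:

```perl
use strict;
use warnings;
use List::Util 'max';

# Hypothetical sample words; only the capitalized ones should be measured
my @words = qw(a Perl program Reads Sentences quickly);

# grep keeps words starting with A-Z, map turns each into its length,
# max picks the largest length. $_ is used implicitly at every step.
my $longest_cap = max map { length } grep { /^[A-Z]/ } @words;

print "$longest_cap\n";   # prints: 9  (the length of "Sentences")
```

Note that the result is a length, not the word itself. If no word is capitalized, max gets an empty list and returns undef.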
Here is some of the relevant documentation. The I/O Operators in perlop discusses while (<$fh>) and all manner of related things. The regex part is in Regexp Quote-Like Operators in perlop and in perlretut. Also have a look at split.
Defaults are used regularly and to read code written by others you must understand them. When you write your own code though you can choose whether to use $_ or not, as one can always introduce a lexical variable instead of it.
So, when to use $_ as default (which need not be written) and when not to?
Correct use of defaults, $_ in particular, can lead to clearer and more readable code, which generally means better code. But it is quite possible to push this too far and end up with obscure, tricky, and brittle code. So good taste is required.
Another case is when some parts of the code benefit from having $_ for their defaults while at other places you then have to use $_ explicitly. I'd say that if $_ is seen more than once or twice in a section of code it means that there should be a properly named variable instead.
Overall, if in doubt simply name everything.
If you do not declare a variable in your foreach loop, it sets $_ by default.
From perldoc about foreach:
The foreach keyword is actually a synonym for the for keyword, so you
can use either. If VAR is omitted, $_ is set to each value.
So it explains the first loop.
The second loop works because, as you now know, $_ is set to each element of your list, and print uses $_ when you omit its argument.
You could use foreach loop with explicit variable like this:
foreach my $item ( @list )
{
print "My item is: $item\n";
}
Or you can omit it like you did, and print will still work, as @Dada said, because:
If FILEHANDLE is omitted, prints to the last selected (see select)
output handle. If LIST is omitted, prints $_ to the currently selected
output handle.
I will explain why you are getting the same results with the different syntax:
If you omit the control variable from the beginning of the foreach
loop, Perl uses its favorite default variable, $_ . This is (mostly) just like any other scalar variable, except for its unusual name. For example:
foreach ('hickory','dickory','doc') {
print $_;
print "\n";
}
Output :
hickory
dickory
doc
Although this isn’t Perl’s only default by a long shot, it’s Perl’s most common default. You’ll see many other cases in which Perl will automatically use $_ when you don’t tell it to use some other variable or value, thereby saving the programmer from the heavy labor of having to think up and type a new variable name. So as not to keep you in suspense, one of those cases is
print, which will print $_ if given no other argument:
foreach ('hickory','dickory','doc') {
print; # prints $_ by default
print "\n";
}
Output :-
hickory
dickory
doc
I need to edit some Perl script and I'm new to this language.
I encountered the following statement:
print for (@$result);
I know that $result is a reference to an array and @$result returns the whole array.
But what does print for mean?
Thank you in advance.
In Perl, there's such a thing as an implicit variable. You may have seen it already as $_. There's a lot of built in functions in perl that will work on $_ by default.
$_ is set in a variety of places, such as loops. So you can do:
while ( <$filehandle> ) {
chomp;
tr/A-Z/a-z/;
s/oldword/newword/;
print;
}
Each of these lines is using $_ and modifying it as it goes. Your for loop is doing the same - each iteration of the loop sets $_ to the current value and print is then doing that by default.
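If you want to try that loop without a real file, here is a small sketch that reads from an in-memory string (the sample text and the @out array are made up for the demonstration) and collects the results instead of printing them straight away:

```perl
use strict;
use warnings;

# Hypothetical input, read through an in-memory filehandle
my $input = "Some OLDWORD here\nAnother Line\n";
open my $filehandle, '<', \$input or die "Can't open in-memory file: $!";

my @out;
while ( <$filehandle> ) {
    chomp;                  # removes the newline from $_
    tr/A-Z/a-z/;            # lowercases $_
    s/oldword/newword/;     # substitutes in $_
    push @out, $_;          # keep the transformed line
}
close $filehandle;

print "$_\n" for @out;      # prints "some newword here" then "another line"
```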
I would point out though - whilst useful and clever, it's also a really good way to make confusing and inscrutable code. In nested loops, for example, it can be quite unclear what's actually going on with $_.
So I'd typically:
avoid writing it explicitly - if you need to do that, you should consider actually naming your variable properly.
only use it in places where it makes it clearer what's going on. As a rule of thumb - if you use it more than twice, you should probably use a named variable instead.
I find it particularly useful if iterating on a file handle. E.g.:
while ( <$filehandle> ) {
next unless m/keyword/; #skips any line without 'keyword' in it.
my ( $wiggle, $wobble, $fronk ) = split ( /:/ ); #split $_ into 3 variables on ':'
print $wobble, "\n";
}
It would be redundant to assign a variable name to capture a line from <$filehandle>, only to immediately discard it - thus instead we use split which by default uses $_ to extract 3 values.
If it's hard to figure out what's going on, then one of the more useful ways is to use perl -MO=Deparse which'll re-print the 'parsed' version of the script. So in the example you give:
foreach $_ (@$result) {
print $_;
}
It is equivalent to for (@$result) { print; }, which is equivalent to for (@$result) { print $_; }. $_ refers to the current element.
What was the design (or technical) reason for Perl not automatically localizing $_ with the following syntax:
while (<HANDLE>) {...}
Which gets rewritten as:
while (defined( $_ = <HANDLE> )) {...}
All of the other constructs that implicitly write to $_ do so in a localized manner (for/foreach, map, grep), but with while, you must explicitly localize the variable:
local $_;
while (<HANDLE>) {...}
My guess is that it has something to do with using Perl in "Super-AWK" mode with command line switches, but that might be wrong.
So if anyone knows (or better yet was involved in the language design discussion), could you share with us the reasoning behind this behavior? More specifically, why was allowing the value of $_ to persist outside of the loop deemed important, despite the bugs it can cause (which I tend to see all over the place on SO and in other Perl code)?
In case it is not clear from the above, the reason why $_ must be localized with while is shown in this example:
sub read_handle {
while (<HANDLE>) { ... }
}
for (1 .. 10) {
print "$_: \n"; # works, prints a number from 1 .. 10
read_handle;
print "done with $_\n"; # does not work, prints the last line read from
# HANDLE or undef if the file was finished
}
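For comparison, here is a sketch of the same situation with the explicit local $_ in place. It uses an in-memory filehandle and a made-up sub name (read_handle_safely) so that it can be run as-is:

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";     # hypothetical file contents

sub read_handle_safely {
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $_;                     # the caller's $_ is restored on sub exit
    while (<$fh>) { }             # assigns to the localized $_ only
    close $fh;
}

my @seen;
for (1 .. 3) {
    read_handle_safely();
    push @seen, $_;               # still the loop value, not a line or undef
}

print "@seen\n";                  # prints: 1 2 3
```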
From the thread on perlmonks.org:
There is a difference between foreach
and while because they are two totally
different things. foreach always
assigns to a variable when looping
over a list, while while normally
doesn't. It's just that while (<>) is
an exception and only when there's a
single diamond operator there's an
implicit assignment to $_.
And also:
One possible reason for why while(<>)
does not implicitly localize $_ as
part of its magic is that sometimes
you want to access the last value of
$_ outside the loop.
Quite simply, while never localises. No variable is associated with a while construct, so it doesn't even have anything to localise.
If you change some variable in the while loop expression or in a while loop body, it's your responsibility to adequately scope it.
Speculation: Because for and foreach are iterators and loop over values, while while operates on a condition. In the case of while (<FH>) the condition is that data was read from the file. The <FH> is what writes to $_, not the while. The implicit defined() test is just an affordance to prevent naive code from terminating the loop on a read of false value.
For other forms of while loops, e.g. while (/foo/) you wouldn't want to localize $_.
While I agree that it would be nice if while (<FH>) localized $_, it would have to be a very special case, which could cause other problems with recognizing when to trigger it and when not to, much like the rules for <EXPR> distinguishing being a handle read or a call to glob.
As a side note, we only write while(<$fh>) because Perl doesn't have real iterators. If Perl had proper iterators, <$fh> would return one. for would use that to iterate a line at a time rather than slurping the whole file into an array. There would be no need for while(<$fh>) or the special cases associated with it.
I've found in a Module a for-loop written like this
for( @array ) {
my $scalar = $_;
...
...
}
Is there Difference between this and the following way of writing a for-loop?
for my $scalar ( @array ) {
...
...
}
Yes, in the first example, the for loop is acting as a topicalizer (setting $_ which is the default argument to many Perl functions) over the elements in the array. This has the side effect of masking the value $_ had outside the for loop. $_ has dynamic scope, and will be visible in any functions called from within the for loop. You should primarily use this version of the for loop when you plan on using $_ for its special features.
Also, in the first example, $scalar is a copy of the value in the array, whereas in the second example, $scalar is an alias to the value in the array. This matters if you plan on setting the array's value inside the loop. Or, as daotoad helpfully points out, the first form is useful when you need a copy of the array element to work on, such as with destructive function calls (chomp, s///, tr/// ...).
And finally, the first example will be marginally slower.
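The copy versus alias difference is easy to see in a short sketch (the arrays and the multiplication are made up for illustration):

```perl
use strict;
use warnings;

my @copied = (1, 2, 3);
for (@copied) {
    my $scalar = $_;      # $scalar is a copy of the element
    $scalar *= 10;        # changes the copy only
}

my @aliased = (1, 2, 3);
for my $scalar (@aliased) {
    $scalar *= 10;        # $scalar is an alias, so the array element changes
}

print "@copied\n";        # prints: 1 2 3
print "@aliased\n";       # prints: 10 20 30
```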
$_ is the "default input and pattern matching space". In other words, if you read in from a file handle at the top of a while loop, or run a foreach loop and don't name a loop variable, $_ is set up for you.
However, if you write a foreach loop and name a loop variable, $_ is not set up. This can be demonstrated by the following code:
1. #!/usr/bin/perl -w
2. @array = (1,2,3);
3. foreach my $elmnt (@array)
4. {
5. print "$_ ";
6. }
The output being "Use of uninitialized value in concatenation (.)"
However if you replace line 3 by:
foreach (@array)
The output is "1 2 3" as expected.
Now in your case, it is always better to name a loop variable in a foreach loop to make the code more readable (Perl already gets criticized a lot for being hard to read); this way there is also no need for an explicit assignment from $_ and no resulting scoping issues.
I can't explain better than the doc can
I know that in a subroutine in Perl, it's a very good idea to preserve the "default variable" $_ with local before doing anything with it, in case the caller is using it, e.g.:
sub f() {
local $_; # Ensure $_ is restored on dynamic scope exit
while (<$somefile>) { # Clobbers $_, but that's OK -- it will be restored
...
}
}
Now, often the reason you use $_ in the first place is because you want to use regexes, which may put results in handy "magic" variables like $1, $2 etc. I'd like to preserve those variables too, but I haven't been able to find a way to do that.
All perlvar says is that @+ and @-, which $1 etc. seem to depend on internally, refer to the "last successful submatches in the currently active dynamic scope". But even that seems at odds with my experiments. Empirically, the following code prints "aXaa" as I had hoped:
$_ = 'a';
/(.)/; # Sets $1 to 'a'
print $1; # Prints 'a'
{
local $_; # Preserve $_
$_ = 'X';
/(.)/; # Sets $1 to 'X'
print $1; # Prints 'X'
}
print $_; # Prints 'a' ('local' restored the earlier value of $_)
print $1; # Prints 'a', suggesting localising $_ does localise $1 etc. too
But what I find truly surprising is that, in my ActivePerl 5.10.0 at least, commenting out the local line still preserves $1 -- that is, the answer "aXXa" is produced! It appears that the lexical (not dynamic) scope of the brace-enclosed block is somehow preserving the value of $1.
So I find this situation confusing at best and would love to hear a definitive explanation. Mind you, I'd actually settle for a bulletproof way to preserve all regex-related magic variables without having to enumerate them all as in:
local @+, @-, $&, $1, $2, $3, $4, ...
which is clearly a disgusting hack. Until then, I will worry that any regex I touch will clobber something the caller was not expecting to be clobbered.
Thanks!
Maybe you can suggest a better wording for the documentation. Dynamic scope means everything up to the start of the enclosing block or subroutine, plus everything up to the start of that block or subroutine call, etc. except that any closed blocks are excluded.
Another way to say it: "last successful submatches in the currently active dynamic scope" means there is implicitly a local $x=$x; at the start of each block for each variable.
Most of the mentions of dynamic scope (for instance, http://perldoc.perl.org/perlglossary.html#scope or http://perldoc.perl.org/perlglossary.html#dynamic-scoping)
are approaching it from the other way around. They apply if you think of a successful
regex as implicitly doing a local $1, etc.
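A small sketch may make this easier to see. It assumes nothing beyond core Perl; the sub name inner_match is made up. A match inside a nested block or inside a called sub does not change what $1 holds for the enclosing code once that block or sub has finished:

```perl
use strict;
use warnings;

sub inner_match {
    'X' =~ /(.)/ or die "no match";
    return $1;                    # 'X' while we are inside the sub
}

my @seen;
'a' =~ /(.)/ or die "no match";
push @seen, $1;                   # 'a'

{
    'Y' =~ /(.)/ or die "no match";
    push @seen, $1;               # 'Y' inside the nested block
}
push @seen, $1;                   # 'a' again once the block is closed

push @seen, inner_match();        # 'X', returned from the sub
push @seen, $1;                   # still 'a' in the caller

print "@seen\n";                  # prints: a Y a X a
```

So a sub that runs its own regexes does not clobber the caller's $1, without any explicit local.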
I am not sure there is any real reason to be this paranoid about all these variables. I have managed to use Perl for almost ten years without once needing to use an explicit local in this context.
The answer to your specific question is: The number of digit variables is not a given (even though there is a hard memory limit to how many matches you can work with). So, it is not possible to localize all of them at the same time.
I think you are worrying too much. The best thing to do is run your match operator, immediately save the values you want into meaningful variables, then let the special variables do whatever they do without worrying about them:
if( $string =~ m/...(a.c).../ ) {
my $found = $1;
}
When I want to capture parts of the strings, I most often use the match operator in list context to get a list of the memories back:
my @array = $string =~ m/..../g;
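As a concrete sketch of the list-context form (the sample string and patterns are made up for illustration), the captures go straight into named variables and $1 and friends never need to be touched:

```perl
use strict;
use warnings;

my $string = 'released 2011-06-15, patched 2012-01-30';

# Without /g: the captures of the first match come back as a list
my ($year, $month, $day) = $string =~ m/(\d{4})-(\d\d)-(\d\d)/;

# With /g: the captures of every match come back as one flat list
my @years = $string =~ m/(\d{4})-\d\d-\d\d/g;

print "$year $month $day\n";      # prints: 2011 06 15
print "@years\n";                 # prints: 2011 2012
```

If the pattern does not match, the list is empty, so the variables on the left end up undef.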