Question about the foreach-value - perl

I found a for loop written like this in a module:
for (@array) {
my $scalar = $_;
...
...
}
Is there a difference between this and the following way of writing a for loop?
for my $scalar (@array) {
...
...
}

Yes, in the first example, the for loop is acting as a topicalizer (setting $_ which is the default argument to many Perl functions) over the elements in the array. This has the side effect of masking the value $_ had outside the for loop. $_ has dynamic scope, and will be visible in any functions called from within the for loop. You should primarily use this version of the for loop when you plan on using $_ for its special features.
Also, in the first example, $scalar is a copy of the value in the array, whereas in the second example, $scalar is an alias to the value in the array. This matters if you plan on setting the array's value inside the loop. Or, as daotoad helpfully points out, the first form is useful when you need a copy of the array element to work on, such as with destructive function calls (chomp, s///, tr/// ...).
And finally, the first example will be marginally slower.
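To make the copy-versus-alias point concrete, here is a minimal sketch. The array names and contents are made up for illustration; only the two loop forms come from the question.

```perl
use strict;
use warnings;

# Hypothetical data: two arrays with identical contents.
my @aliased = ("a\n", "b\n");
my @copied  = ("a\n", "b\n");

# Named loop variable: $line is an alias, so chomp edits the array itself.
for my $line (@aliased) {
    chomp $line;
}

# Copy of $_: chomp edits only the copy, the array keeps its newlines.
my @chomped;
for (@copied) {
    my $line = $_;
    chomp $line;
    push @chomped, $line;
}

print join("|", @aliased), "\n";    # the newlines are gone from @aliased
print join("|", @chomped), "\n";    # the chomped copies
print join("",  @copied);           # still has its newlines
```

So if you want to leave the array untouched, copy the element first; if you want to change the array in place, work on the loop variable directly.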

$_ is the "default input and pattern matching space". In other words, if you read in from a file handle at the top of a while loop, or run a foreach loop and don't name a loop variable, $_ is set up for you.
However, if you write a foreach loop and name a loop variable, $_ is not set up. The following code demonstrates this:
1. #!/usr/bin/perl -w
2. @array = (1,2,3);
3. foreach my $elmnt (@array)
4. {
5. print "$_ ";
6. }
The output is the warning "Use of uninitialized value in concatenation (.)"
However if you replace line 3 by:
foreach (@array)
The output is "1 2 3" as expected.
Now in your case, it is always better to name a loop variable in a foreach loop to make the code more readable (Perl is already criticized enough for being hard to read). That way there is also no need to copy $_ into another variable explicitly, and you avoid the resulting scoping issues.

I can't explain better than the doc can

Related

Why doesn't Perl's foreach require its variable to be declared with my?

Today, I stumbled over something in Perl I was not aware of: it "localizes" the variable that the elements of the list being iterated over are assigned to.
This, of course, is documented in the Perl documentation - however I failed to remember or read it.
The following script demonstrates what I mean:
use warnings;
use strict;
my $g = 99;
foreach $g (1..5) {
p($g);
}
sub p {
my $l = shift;
printf ("%2d %2d\n", $g, $l);
}
The script prints
99 1
99 2
99 3
99 4
99 5
because $g is "localized" to the foreach loop.
As far as I can tell there is no difference if I had added my to $g in the foreach loop:
foreach my $g (1..5) {
Actually, I ended up doing it because I feel it makes it clearer that the variable is local to the loop.
My question is now: is there a scenario where my using my does make a difference (given that $g is already declared globally)?
The investigated behavior is documented in Foreach Loops in perlsyn
The foreach loop iterates over a normal list value and sets the scalar variable VAR to be each element of the list in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop.
which continues to the explanation
Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop.
Thus there should be no difference between localizing it with my or leaving that to foreach.
A little curiosity is that
This implicit localization occurs only in a foreach loop.
All this is further clarified in this snippet from Private Variables via my() from perlsub
The foreach loop defaults to scoping its index variable dynamically in the manner of local. However, if the index variable is prefixed with the keyword my, or if there is already a lexical by that name in scope, then a new lexical is created instead.
Since a new lexical is created inside the loop in both cases, there cannot be any practical difference.
I absolutely support and recommend (always) having a my there.
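For contrast, here is a small sketch of the case the perlsub quote calls "scoping its index variable dynamically": when the existing variable is a package variable (declared with our) rather than a lexical, leaving out my does change what a called sub sees. The sub name current_g and the result arrays are made up for this illustration.

```perl
use strict;
use warnings;

our $g = 99;                        # package variable, not a lexical

sub current_g { return $g }         # reads the package variable $main::g

my (@seen_without_my, @seen_with_my);

# No "my": the package variable itself is localized to the loop,
# so the sub sees the current loop value.
foreach $g (1 .. 3) {
    push @seen_without_my, current_g();
}

# With "my": a new lexical is created, the package variable is untouched,
# so the sub still sees 99.
foreach my $g (1 .. 3) {
    push @seen_with_my, current_g();
}

print "without my: @seen_without_my\n";
print "with my:    @seen_with_my\n";
print "after the loops: $g\n";
```

This does not contradict the answer above, which covers the case where $g was declared with my. It is one more reason to always write the my.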

what does print for mean in Perl?

I need to edit some Perl script and I'm new to this language.
I encountered the following statement:
print for (@$result);
I know that $result is a reference to an array and @$result returns the whole array.
But what does print for mean?
Thank you in advance.
In Perl, there's such a thing as an implicit variable. You may have seen it already as $_. There are a lot of built-in functions in Perl that work on $_ by default.
$_ is set in a variety of places, such as loops. So you can do:
while ( <$filehandle> ) {
chomp;
tr/A-Z/a-z/;
s/oldword/newword/;
print;
}
Each of these lines is using $_ and modifying it as it goes. Your for loop is doing the same: each iteration of the loop sets $_ to the current value, and print then prints $_ by default.
I would point out though - whilst useful and clever, it's also a really good way to make confusing and inscrutable code. In nested loops, for example, it can be quite unclear what's actually going on with $_.
So I'd typically:
avoid writing it explicitly - if you need to do that, you should consider actually naming your variable properly.
only use it in places where it makes it clearer what's going on. As a rule of thumb - if you use it more than twice, you should probably use a named variable instead.
I find it particularly useful if iterating on a file handle. E.g.:
while ( <$filehandle> ) {
next unless m/keyword/; #skips any line without 'keyword' in it.
my ( $wiggle, $wobble, $fronk ) = split ( /:/ ); #split $_ into 3 variables on ':'
print $wobble, "\n";
}
It would be redundant to assign a variable name to capture a line from <$filehandle>, only to immediately discard it - thus instead we use split which by default uses $_ to extract 3 values.
If it's hard to figure out what's going on, one of the more useful tricks is perl -MO=Deparse, which re-prints the 'parsed' version of the script. For the example you give, it prints:
foreach $_ (@$result) {
print $_;
}
It is equivalent to for (@$result) { print; }, which is equivalent to for (@$result) { print $_; }. $_ refers to the current element.
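As a quick check of those equivalences, the sketch below runs the three spellings against the same data and collects what they print. The contents of $result and the in-memory filehandle are assumptions made so the output can be compared; they are not part of the original script.

```perl
use strict;
use warnings;

my $result = [ "alpha\n", "beta\n" ];    # hypothetical array reference

# Send print's default output to an in-memory buffer for the comparison.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
my $old = select $fh;

print for @$result;                            # statement-modifier form
for (@$result) { print }                       # block form, print defaults to $_
foreach my $item (@$result) { print $item }    # named loop variable

select $old;
close $fh;

print $out;    # the same two lines, three times over
```

All three loops write the same text; the only difference is whether the current element is called $_ or $item.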

While loop and diamond operator in Perl

I am trying to feed a text file to a Perl program and reverse the order of its lines, i.e. the last line becomes the first, the second-to-last becomes the second, and so on. I am using the following code:
#!C:\Perl64\bin
$k = 0;
while (<>){
print "the value of i is $i";
@array[k] = $_;
++$k;
}
print "the array is @array";
But for some reason, my array is only printing the last line of the text file.
Any suggestions?
Typically, rather than keep a separate array index, perl programs use the push operator to push a string onto an array. One way to do this in your program:
push @array, $_;
If you really want to do it by array index, then you need to use the following syntax:
$array[$k] = $_;
Notice the $ rather than @ in front. This tells perl that you're dealing with a single element from the array, not multiple elements. @array gives you the entire array, while $array[$k] gives you a single element. (There is a more advanced topic called "slices," but let's not get into that here. I will say that @array[$k] gives you a slice, and that isn't what you want here.)
If you really just want to slurp the entire file into an array, you can do that in one step:
@array = ( <> );
That will read the entire file into @array in one step.
You might have noticed I omitted/ignored your print statement. I'm not sure what it's doing printing out a variable named $i, since it didn't seem connected at all to the rest of the code. I reasoned it was debug code you had added, and not really relevant to the task at hand.
Anyway, that should get your input into @array. Now reversing the array... There are many ways you could do this in perl, but I'll let you discover those yourself.
Instead of:
@array[k] = $_;
you want:
$array[$k] = $_;
To reference the scalar variable $k, you need the $ on the front. Without that it is interpreted as the literal string 'k', which when used as an array index would be interpreted as 0 (since a non-numeric string will be interpreted as 0 in a numeric context).
So, each time around the loop you are setting the first element to the line read in (overwriting the value set in the previous iteration).
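Here is a small sketch of that overwriting effect, using a hard-coded list instead of <> so it can be run anywhere. The names @lines, @broken and @fixed are made up for the illustration, and the bareword k is written as the string 'k' so the code still compiles under strict.

```perl
use strict;
use warnings;

my @lines = ("first\n", "second\n", "third\n");    # stand-in for the input file
my (@broken, @fixed);
my $k = 0;

for my $line (@lines) {
    {
        no warnings 'numeric';      # silence the "isn't numeric" warning for the demo
        $broken['k'] = $line;       # the string 'k' numifies to index 0
    }
    $fixed[$k] = $line;             # the scalar $k really counts up
    ++$k;
}

print scalar(@broken), " element in \@broken: $broken[0]";
print scalar(@fixed),  " elements in \@fixed\n";
```

@broken ends up with a single element holding the last line, which is exactly the symptom described in the question.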
A few other tips:
@array[ ] is actually the syntax for an array slice rather than a single element. It works in this case because you are assigning to a slice of 1. The usual syntax for accessing a single element would be $array[ ].
I recommend placing 'use strict;' at the top of your script - you would have gotten an error pointing out the incorrect reference to $k
Instead of using an index variable, you could push the values onto the end of the array, eg:
while (<>) {
push @array, $_;
}
Accept input until it finds the word end
Solution1
#!/usr/bin/perl
while(<>) {
last if $_=~/end/i;
push @array,$_;
}
for (my $i=scalar(@array);$i>0;$i--){
print pop @array;
}
Solution2
while(<>){
last if $_=~/end/i;
push @array,$_;
}
print reverse(@array);

A better variable naming system?

I'm a newbie to programming. The task is to extract particular data from a string, and I chose to write the code as follows:
while ($line =<IN>) {
chomp $line;
@tmp=(split /\t/, $line);
next if ($tmp[0] !~ /ch/);
@tgt1=@tmp[8..11];
@tgt2=@tmp[12..14];
@tgt3=@tmp[15..17];
@tgt4=@tmp[18..21];
foreach (1..4) {
print @tgt($_), "\n";
}
}
I thought @tgt($_) would be interpreted as @tgt1, @tgt2, @tgt3, @tgt4 but I still get the error message that @tgt is a global symbol (@tgt1, @tgt2, @tgt3, @tgt4 have been declared).
Q1. Did I misunderstand foreach loop?
Q2. Why couldn't perl see @tgt($_) as @tgt1, @tgt2 ..etc?
Q3. From experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
Q2. Why couldn't perl see @tgt($_) as @tgt1, @tgt2 ..etc?
Q3. From experience this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
I'll answer both together.
@tgt($_) does NOT mean what you hope it means
First off, it's invalid syntax (you can't use () after an array name; the perl interpreter will produce a compile error).
What you're trying to do is access distinct variables via an expression that results in the variable's name (aka symbolic references). This IS possible to do, but it is typically a bad idea and poor-style Perl (as in, you CAN but you SHOULD NOT do it without a very, very good reason).
To access array number $_ the way you tried, you use the @{"tgt$_"} syntax (which only works on package variables, and only with strict refs turned off). But I repeat - Do Not Do That, even if you can.
A correct idiomatic solution: use an array of arrayrefs, with your 1-4 (or rather 0-3) indexing the outer array:
# Old bad code: @tgt1=@tmp[8..11];
# New correct code:
$tgt[0]=[ @tmp[8..11] ]; # [] creates an array reference from a list.
# etc... repeat 4 times - you can even do it in a smart loop later.
This stores a reference to a copy of the array slice in the zeroth element of a single @tgt array.
At the end, the @tgt array has 4 elements, each a reference to an array containing one of the slices.
Q1. Did I misunderstand foreach loop?
Your foreach loop (as opposed to its contents - see above) was correct, with one style caveat - again, while you CAN use a default $_ variable, you should almost never use it, instead always use named variables for readability.
You print the abovementioned array of arrayrefs as follows (ask separately if any of the syntax is unclear - this is a mid-level data structure handling, not for beginners):
foreach my $index (0..3) {
print join(",", @{ $tgt[$index] }) . "\n";
}
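Putting the pieces together, here is one way the "smart loop" could look. This is only a sketch: @tmp is filled with the numbers 0..21 as a stand-in for the real split fields, and @ranges and @rows are names invented here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fields produced by split: field N holds N.
my @tmp = (0 .. 21);

# Start and end index of each slice, taken from the question.
my @ranges = ([8, 11], [12, 14], [15, 17], [18, 21]);

# Build the array of arrayrefs: one anonymous array per slice.
my @tgt = map { [ @tmp[ $_->[0] .. $_->[1] ] ] } @ranges;

# Format each inner array as one comma-separated row.
my @rows = map { join(",", @$_) } @tgt;

print "$_\n" for @rows;
# 8,9,10,11
# 12,13,14
# 15,16,17
# 18,19,20,21
```

Adding a fifth group is then a matter of adding one more pair to @ranges, with no new variable name needed.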

How is $_ different from named input or loop arguments?

As I use $_ a lot I want to understand its usage better. $_ is a global variable for implicit values as far as I understood and used it.
As $_ seems to be set anyway, are there reasons to use named loop variables over $_ besides readability?
In what cases does it matter $_ is a global variable?
So if I use
for (@array){
print $_;
}
or even
print $_ for @array;
it has the same effect as
for my $var (@array){
print $var;
}
But does it work the same? I guess it does not exactly but what are the actual differences?
Update:
It seems $_ is even scoped correctly in this example. Is it not global anymore? I am using 5.12.3.
#!/usr/bin/perl
use strict;
use warnings;
my @array = qw/one two three four/;
my @other_array = qw/1 2 3 4/;
for (@array){
for (@other_array){
print $_;
}
print $_;
}
that prints correctly 1234one1234two1234three1234four.
For global $_ I would have expected 1234 4 1234 4 1234 4 1234 4 .. or am I missing something obvious?
When is $_ global then?
Update:
Ok, after having read the various answers and perlsyn more carefully I came to a conclusion:
Besides readability it is better to avoid using $_ because implicit localisation of $_ must be known and taken account of otherwise one might encounter unexpected behaviour.
Thanks for clarification of that matter.
are there reasons to use named loop variables over $_ besides readability?
The issue is not if they are named or not. The issue is if they are "package variables" or "lexical variables".
See the very good description of the 2 systems of variables used in Perl "Coping with Scoping":
http://perl.plover.com/FAQs/Namespaces.html
Package variables are global variables, and should therefore be avoided for all the usual reasons (e.g. action at a distance).
Avoiding package variables is a question of "correct operation" or "harder to inject bugs" rather than a question of "readability".
In what cases does it matter $_ is a global variable?
Everywhere.
The better question is:
In what cases is $_ local()ized for me?
There are a few places where Perl will local()ize $_ for you, primarily foreach, grep and map. All other places require that you local()ize it yourself, therefore you will be injecting a potential bug when you inevitably forget to do so. :-)
The classic failure mode of using $_ (implicitly or explicitly) as a loop variable is
for $_ (@myarray) {
/(\d+)/ or die;
foo($1);
}
sub foo {
 open(F, "foo_$_[0]") or die;
while (<F>) {
...
}
}
Because the loop variable in for/foreach is bound to the actual list item, the while (<F>) in foo overwrites the elements of @myarray with lines read from the files.
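A runnable sketch of that failure mode, and of one common way around it, might look like this. It reads from an in-memory string instead of a real file, and the sub names and data are invented for the demonstration.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\n";    # hypothetical file contents

# Reads with while (<$fh>), which assigns to the global $_ without localizing it.
sub count_clobbering {
    open my $fh, '<', \$text or die "open failed: $!";
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

# Reads into a lexical instead, so $_ is never touched.
sub count_safely {
    open my $fh, '<', \$text or die "open failed: $!";
    my $n = 0;
    while (my $line = <$fh>) { $n++ }
    return $n;
}

my @clobbered = ("keep 1", "keep 2");
for (@clobbered) { count_clobbering() }    # $_ is aliased to each element

my @intact = ("keep 1", "keep 2");
for (@intact) { count_safely() }

print "clobbered: ", join(",", map { defined $_ ? $_ : "(undef)" } @clobbered), "\n";
print "intact:    ", join(",", @intact), "\n";
```

The first array loses its original values even though the loop body never assigns to it; the second survives because the sub keeps its hands off $_.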
As it is usually used, $_ behaves the same as the named variable in your second example. $_ is just a shortcut default variable name for the current item in the current loop, to save typing in a quick, simple loop. I tend to use named variables rather than the default. It makes it clearer what the value is, and if I happen to need a nested loop there are no conflicts.
Since $_ is a global variable, you may get unexpected values if you try to use its value that it had from a previous code block. The new code block may be part of a loop or other operation that inserts its own values into $_, overwriting what you expected to be there.
The risk in using $_ is that it is global (unless you localise it with local $_), and so if some function you call in your loop also uses $_, the two uses can interfere.
For reasons which are not clear to me, this has only bitten me occasionally, but I usually localise $_ if I use it inside packages.
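There is nothing special about $_ apart from it being the default parameter for many functions. If you explicitly lexically scope your $_ with my, perl will use the local version of $_ rather than the global one. There is nothing strange in this; it is just like any other named variable. (Note that lexical my $_ was added in Perl 5.10, marked experimental in 5.18, and removed in 5.24, so the examples below only run on older perls.)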
sub p { print "[$_]"; } # Prints the global $_
# Compare and contrast
for my $_ (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex1
for my $_ (b1..b5) { for (a1..a5) { p } } print "\n"; # ex2
for (b1..b5) { for my $_ (a1..a5) { p } } print "\n"; # ex3
for (b1..b5) { for (a1..a5) { p } } print "\n"; # ex4
You should be slightly mystified by the output until you find out that perl will preserve the original value of the loop variable on loop exit (see perlsyn).
Note ex2 above. Here the second loop is using the lexically scoped $_ declared in the first loop. Subtle, but expected. Again, this value is preserved on exit so the two loops do not interfere.