Topicalising a variable using "for" is apparently bad. Why? - perl

So I answered a question on SO and got a lot of flak for it.
I have been using Perl for many years and use this topicalising quite a lot.
So let's start with some code. I am doing search and replace in these examples. The idea is to search for one and three from two strings and replace them.
$values = 'one two three four five';
$values2 = 'one 2 three four 5';
$values =~ s/one//g;
$values =~ s/three//g;
$values2 =~ s/one//g;
$values2 =~ s/three//g;
This code is simple and everyone accepts it.
I can also build an array or hash with a list of values to search and replace which is also acceptable.
However, when I write a script that topicalises $values and $values2 to cut down on typing, it seems to be misunderstood.
Here is the code.
$values = 'one two three four five';
$values2 = 'one 2 three four 5';
for ( $values, $values2 ) {
s/one//g;
s/three//g;
}
The above code will topicalise the variables for the duration of the for block, but many programmers are against this. I want to understand why this is unacceptable.

There are several points to consider.
Your code performs multiple substitutions on a list of variables. You can do that without using $_:
for my $s ($values, $values2) {
$s =~ s/one//g;
$s =~ s/three//g;
}
Personally I think nothing is wrong with the above code.
The general problem with $_ is that it's not a local variable. E.g. if the body of your for loop calls a function (that calls a function ...) that modifies $_ without localizing it (e.g. by assigning to it or using a bare s/// or using while (<...>)), then it will overwrite the variables you're iterating over. With a my variable you're protected because other functions can't see your local variables.
That said, if the rest of your code doesn't have this bug (scribbling over $_ without localizing it), $_ will work fine here.
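To make the clobbering concrete, here is a minimal sketch. The helper careless() and the variable names are hypothetical, made up only for this demonstration; the helper assigns to the global $_ without localizing it, which is the bug described above.

```perl
use strict;
use warnings;

# Hypothetical helper that scribbles over the global $_ without localizing it.
sub careless { $_ = 'clobbered' }

# Implicit $_: the loop aliases $_ to $x, then to $y, so the helper overwrites them.
my ($x, $y) = ('one two', 'one 2');
for ($x, $y) {
    careless();
    s/one//g;
}

# Lexical loop variable: the helper only touches the global $_, never $s.
my ($p, $q) = ('one two', 'one 2');
for my $s ($p, $q) {
    careless();
    $s =~ s/one//g;
}

print "[$x] [$y]\n";    # [clobbered] [clobbered]
print "[$p] [$q]\n";    # [ two] [ 2]
```

The second loop survives because other functions cannot see the my variable.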
However, the code in your answer people originally complained about is different:
for ($brackets) {
s/\\left/(/g;
s/\\right/)/g;
}
Here you're not trying to perform the same substitutions on many variables; you're just saving yourself some typing by getting rid of the $brackets =~ part:
$brackets =~ s/\\left/(/g;
$brackets =~ s/\\right/)/g;
Using an explicit loop variable wouldn't be a solution because you'd still have to type out $foo =~ on every line.
This is more a matter of taste. You're only using for for its aliasing effect, not to loop over multiple values. Personally I'm still OK with this.
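As a rough check that the two spellings really are equivalent, the sketch below runs both on the same input. The sample string and the variable names $aliased and $explicit are assumptions for illustration only.

```perl
use strict;
use warnings;

# Sample input (an assumption); single quotes keep the backslashes literal.
my $aliased  = '\left x + y \right';
my $explicit = $aliased;

# "for" used purely as a topicalizer: $_ is an alias for $aliased.
for ($aliased) {
    s/\\left/(/g;
    s/\\right/)/g;
}

# The same substitutions with an explicit binding on every line.
$explicit =~ s/\\left/(/g;
$explicit =~ s/\\right/)/g;

print "$aliased\n";    # ( x + y )
```

Both variables end up holding the same string, so the choice between the two forms is about readability, not behaviour.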

perldoc perlsyn has this
The foreach is the non-experimental way to set a topicalizer.
The OP's construct is a perfectly valid way of writing Perl code. The only provisos I have regarding their earlier answer are
Unlike the example here, only two operations were being applied to a single variable. That is only marginally briefer than simply writing two substitutions and I wouldn't bother here, although I may consider
s/one//g, s/three//g for $value;
Other than the topicaliser, the answer is identical to another one already posted. I don't believe this makes it sufficiently different to warrant another post

Related

Exchange hash values without temporary variable

I have two hash values a couple of levels deep in a data structure that I would like to exchange and then switch back later.
$hashref->{$irrelevant}{$key1} and $hashref->{$irrelevant}{$key2}
Since they are such long names, ($a, $b) = ($b, $a) would be way too long for a single line of code.
Is there a way to do this elegantly, or am I stuck taking up three lines by exchanging with a temporary variable?
You people who hide "irrelevant" data meanings aren't doing anyone any favours. We still have to write a solution, but it has to be in abstract terms that make no sense either to you or me!
The neatest way I can think of is with a pair of hash slices
my $irrelevant_href = $hashref->{$irrelevant};
@{$irrelevant_href}{$key1, $key2} = @{$irrelevant_href}{$key2, $key1};
Create a sub to make it clear what the long line full of symbols is doing.
sub swap { ($_[0], $_[1]) = ($_[1], $_[0]) }
And it also makes the line shorter.
swap($hashref->{$irrelevant}{$key1}, $hashref->{$irrelevant}{$key2});
You could even use
swap(@{ $hashref->{$irrelevant} }{ $key1, $key2 });
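Here is a small self-contained sketch of the slice form. The data and the key names ('outer', 'left', 'right') are placeholders invented for the demonstration; it relies on the elements of @_ being aliases to the caller's hash elements.

```perl
use strict;
use warnings;

# swap() as above: assigning to elements of @_ writes through to the caller's data.
sub swap { ($_[0], $_[1]) = ($_[1], $_[0]) }

# Hypothetical structure standing in for the real one.
my $hashref = { outer => { left => 'L', right => 'R' } };
my ($irrelevant, $key1, $key2) = ('outer', 'left', 'right');

# Pass a two-element hash slice; swap() exchanges the two values in place.
swap(@{ $hashref->{$irrelevant} }{ $key1, $key2 });

print "$hashref->{$irrelevant}{$key1} $hashref->{$irrelevant}{$key2}\n";    # R L
```

Calling swap() a second time with the same arguments switches the values back.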
There are far more grave sins in coding than declaring a lexical temp variable, but since you're asking let me throw an idea at you.
If you'll be doing any significant work with $hashref->{$irrelevant}, perhaps you should grab a copy of it specifically, to both make your code briefer for a maintenance programmer & cut out a level of dereferencing with every use.
For example:
# capture inner reference 'cause we'll be using it a lot anyway...
my $ir_h_ref = $hashref->{$irrelevant};
#stuff here
# Do swap with shorter reference chain
( $ir_h_ref->{$key1}, $ir_h_ref->{$key2} ) =
( $ir_h_ref->{$key2}, $ir_h_ref->{$key1} );
Now this new variable is probably not worth it just for the sake of the switch, but if you'll be doing much more with that hash in the same code block, it just may become attractive.

Round brackets enclosing private variables. Why used in this case?

I am reading Learning Perl 6th edition, and the subroutines chapter has this code:
foreach (1..10) {
my($square) = $_ * $_; # private variable in this loop
print "$_ squared is $square.\n";
}
Now I understand that the list syntax, ie the brackets, are used to distinguish between list context and scalar context as in:
my($num) = @_; # list context, same as ($num) = @_;
my $num = @_; # scalar context, same as $num = @_;
But in the foreach loop case I can't see how a list context is appropriate.
And I can change the code to be:
foreach (1..10) {
my $square = $_ * $_; # private variable in this loop
print "$_ squared is $square.\n";
}
And it works exactly the same. So why did the author use my($square) when a simple my $square could have been used instead?
Is there any difference in this case?
Certainly in this case, the brackets aren't necessary. They're not strictly wrong in the sense that they do do what the author intends. As with so much in Perl, there's more than one way to do it.
So there's the underlying question: why did the author choose to do this this way? I wondered at first whether it was the author's preferred style: perhaps he chose always to put his lists of new variables in brackets simply so that something like:
my ($count) = 4;
where the brackets aren't doing anything helpful, at least looked consistent with something like:
my ($min, $max) = (2, 3);
But looking at the whole book, I can't find a single example of this use of brackets for a single value other than the section you referenced. As one example of many, the m// in List Context section in Chapter 9 contains a variety of different uses of my with assignments, but does not use brackets with any single values.
I'm left with the conclusion that as the author introduced my in subroutines with my($m, $n); he tried to vary the syntax as little as possible the next time he used it, ending up with my($max_so_far) and then tried to explain scalar and list contexts, as you quoted above. I'm not sure this is terribly helpful.
TL;DR It's not necessary, although it's not actually wrong. Probably a good idea to avoid this style in your code.
You're quite correct. It's redundant. It doesn't make any difference in this case, because the right-hand side yields a single value, so a list assignment stores the same thing a scalar assignment would.
E.g.
my ( $square ) = ( $_ * $_ );
Which also produces the same result. So in this case it doesn't matter, but generally speaking it is not good coding style.
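For contrast, here is a short sketch of a case where the brackets do matter, next to the case from the book where they do not. The sub name contexts() and the sample arguments are made up for illustration.

```perl
use strict;
use warnings;

sub contexts {
    my ($first) = @_;    # list assignment: takes the first argument
    my $count   = @_;    # scalar assignment: takes the number of arguments
    return ($first, $count);
}

my ($first, $count) = contexts('a', 'b', 'c');
print "$first $count\n";    # a 3

# With a single scalar value on the right, both forms store the same thing.
my ($square_list) = 7 * 7;
my $square_scalar = 7 * 7;
print "$square_list $square_scalar\n";    # 49 49
```

The brackets only change the outcome when the right-hand side behaves differently in list and scalar context, as an array does.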

A better variable naming system?

I'm a newbie to programming. The task is to extract particular data from a string, and I chose to write the code as follows:
while ($line =<IN>) {
chomp $line;
@tmp=(split /\t/, $line);
next if ($tmp[0] !~ /ch/);
@tgt1=@tmp[8..11];
@tgt2=@tmp[12..14];
@tgt3=@tmp[15..17];
@tgt4=@tmp[18..21];
foreach (1..4) {
print @tgt($_), "\n";
}
}
I thought @tgt($_) would be interpreted as @tgt1, @tgt2, @tgt3, @tgt4, but I still get the error message that @tgt is a global symbol (@tgt1, @tgt2, @tgt3, @tgt4 have been declared).
Q1. Did I misunderstand foreach loop?
Q2. Why couldn't perl see @tgt($_) as @tgt1, @tgt2, etc.?
Q3. From experience, this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
Q2. Why couldn't perl see @tgt($_) as @tgt1, @tgt2, etc.?
Q3. From experience, this is probably a bad way to name variables. What would be a preferred way to name variables that have similar features?
I'll answer both together.
@tgt($_) does NOT mean what you hope it means
First off, it's invalid syntax (you can't use () after an array name; the perl interpreter will produce a compile error).
What you're trying to do is access distinct variables by accessing a variable via an expression resulting in its name (aka symbolic references). This IS possible to do; but is typically a bad idea and poor-style Perl (as in, you CAN but you SHOULD NOT do it, without a very very good reason).
To access the array numbered $_ the way you tried, you use @{"tgt$_"} syntax. But I repeat - Do Not Do That, even if you can.
A correct idiomatic solution: use an array of arrayrefs, with your 1-4 (or rather 0-3) indexing the outer array:
# Old bad code: @tgt1=@tmp[8..11];
# New correct code:
$tgt[0]=[ @tmp[8..11] ]; # [] creates an array reference from a list.
# etc... repeat 4 times - you can even do it in a smart loop later.
What this does is store a reference to an array slice in the zeroth element of a single @tgt array.
At the end, the @tgt array has 4 elements, each an array reference to an array containing one of the slices.
Q1. Did I misunderstand foreach loop?
Your foreach loop (as opposed to its contents - see above) was correct, with one style caveat - again, while you CAN use a default $_ variable, you should almost never use it, instead always use named variables for readability.
You print the abovementioned array of arrayrefs as follows (ask separately if any of the syntax is unclear - this is a mid-level data structure handling, not for beginners):
foreach my $index (0..3) {
print join(",", @{ $tgt[$index] }) . "\n";
}
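Putting the pieces together, the "smart loop" mentioned above could look something like the sketch below. The sample @tmp data and the @ranges table are assumptions for the demonstration; only the index ranges come from the question.

```perl
use strict;
use warnings;

# Stand-in for one split line: 22 fields whose values equal their indices (an assumption).
my @tmp = (0 .. 21);

# Index ranges taken from the four slices in the question.
my @ranges = ([8, 11], [12, 14], [15, 17], [18, 21]);

# Build one array reference per range and collect them in a single @tgt array.
my @tgt = map {
    my ($lo, $hi) = @$_;
    [ @tmp[$lo .. $hi] ];
} @ranges;

foreach my $index (0 .. $#tgt) {
    print join(",", @{ $tgt[$index] }), "\n";
}
```

Adding a fifth group then means adding one entry to @ranges rather than declaring another numbered variable.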

In Perl, why does the `while(<HANDLE>) {...}` construct not localize `$_`?

What was the design (or technical) reason for Perl not automatically localizing $_ with the following syntax:
while (<HANDLE>) {...}
Which gets rewritten as:
while (defined( $_ = <HANDLE> )) {...}
All of the other constructs that implicitly write to $_ do so in a localized manner (for/foreach, map, grep), but with while, you must explicitly localize the variable:
local $_;
while (<HANDLE>) {...}
My guess is that it has something to do with using Perl in "Super-AWK" mode with command line switches, but that might be wrong.
So if anyone knows (or better yet was involved in the language design discussion), could you share with us the reasoning behind this behavior? More specifically, why was allowing the value of $_ to persist outside of the loop deemed important, despite the bugs it can cause (which I tend to see all over the place on SO and in other Perl code)?
In case it is not clear from the above, the reason why $_ must be localized with while is shown in this example:
sub read_handle {
while (<HANDLE>) { ... }
}
for (1 .. 10) {
print "$_: \n"; # works, prints a number from 1 .. 10
read_handle;
print "done with $_\n"; # does not work, prints the last line read from
# HANDLE or undef if the file was finished
}
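A runnable version of the same problem, with the local $_ workaround next to it, might look like this. It is only a sketch: it reads from an in-memory string through a lexical filehandle instead of HANDLE, and the sub names read_unsafe and read_safe are made up for the demonstration.

```perl
use strict;
use warnings;

my $data = "first\nsecond\n";    # sample input (an assumption)

sub read_unsafe {
    open my $fh, '<', \$data or die "open failed: $!";
    while (<$fh>) { }            # assigns to whatever $_ the caller is using
    close $fh;
}

sub read_safe {
    open my $fh, '<', \$data or die "open failed: $!";
    local $_;                    # the caller's $_ is restored when the sub returns
    while (<$fh>) { }
    close $fh;
}

my (@after_unsafe, @after_safe);
for (1 .. 3) { read_unsafe(); push @after_unsafe, $_ }
for (1 .. 3) { read_safe();   push @after_safe,   $_ }

print "safe: @after_safe\n";    # safe: 1 2 3
```

After read_unsafe() the loop variable is undef on every pass, because the final read at end of file stored undef in $_.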
From the thread on perlmonks.org:
There is a difference between foreach
and while because they are two totally
different things. foreach always
assigns to a variable when looping
over a list, while while normally
doesn't. It's just that while (<>) is
an exception and only when there's a
single diamond operator there's an
implicit assignment to $_.
And also:
One possible reason for why while(<>)
does not implicitly localize $_ as
part of its magic is that sometimes
you want to access the last value of
$_ outside the loop.
Quite simply, while never localises. No variable is associated with a while construct, so it doesn't even have anything to localise.
If you change some variable in the while loop expression or in a while loop body, it's your responsibility to adequately scope it.
Speculation: Because for and foreach are iterators and loop over values, while while operates on a condition. In the case of while (<FH>) the condition is that data was read from the file. The <FH> is what writes to $_, not the while. The implicit defined() test is just an affordance to prevent naive code from terminating the loop on a read of false value.
For other forms of while loops, e.g. while (/foo/) you wouldn't want to localize $_.
While I agree that it would be nice if while (<FH>) localized $_, it would have to be a very special case, which could cause other problems with recognizing when to trigger it and when not to, much like the rules for <EXPR> distinguishing being a handle read or a call to glob.
As a side note, we only write while(<$fh>) because Perl doesn't have real iterators. If Perl had proper iterators, <$fh> would return one. for would use that to iterate a line at a time rather than slurping the whole file into an array. There would be no need for while(<$fh>) or the special cases associated with it.

Why does Perl::Critic dislike using shift to populate subroutine variables?

Lately, I've decided to start using Perl::Critic more often on my code. After programming in Perl for close to 7 years now, I've been settled in with most of the Perl best practices for a long while, but I know that there is always room for improvement. One thing that has been bugging me though is the fact that Perl::Critic doesn't like the way I unpack @_ for subroutines. As an example:
sub my_way_to_unpack {
my $variable1 = shift @_;
my $variable2 = shift @_;
my $result = $variable1 + $variable2;
return $result;
}
This is how I've always done it, and, as it's been discussed on both PerlMonks and Stack Overflow, it's not necessarily evil either.
Changing the code snippet above to...
sub perl_critics_way_to_unpack {
my ($variable1, $variable2) = @_;
my $result = $variable1 + $variable2;
return $result;
}
...works too, but I find it harder to read. I've also read Damian Conway's book Perl Best Practices and I don't really understand how my preferred approach to unpacking falls under his suggestion to avoid using @_ directly, as Perl::Critic implies. I've always been under the impression that Conway was talking about nastiness such as:
sub not_unpacking {
my $result = $_[0] + $_[1];
return $result;
}
The above example is bad and hard to read, and I would never ever consider writing that in a piece of production code.
So in short, why does Perl::Critic consider my preferred way bad? Am I really committing a heinous crime unpacking by using shift?
Would this be something that people other than myself think should be brought up with the Perl::Critic maintainers?
The simple answer is that Perl::Critic is not following PBP here. The
book explicitly states that the shift idiom is not only acceptable, but
is actually preferred in some cases.
Running perlcritic with --verbose 11 explains the policies. It doesn't look like either of these explanations applies to you, though.
Always unpack @_ first at line 1, near
'sub xxx{ my $aaa= shift; my ($bbb,$ccc) = @_;}'.
Subroutines::RequireArgUnpacking (Severity: 4)
Subroutines that use `@_' directly instead of unpacking the arguments to
local variables first have two major problems. First, they are very hard
to read. If you're going to refer to your variables by number instead of
by name, you may as well be writing assembler code! Second, `@_'
contains aliases to the original variables! If you modify the contents
of a `@_' entry, then you are modifying the variable outside of your
subroutine. For example:
sub print_local_var_plus_one {
my ($var) = @_;
print ++$var;
}
sub print_var_plus_one {
print ++$_[0];
}
my $x = 2;
print_local_var_plus_one($x); # prints "3", $x is still 2
print_var_plus_one($x); # prints "3", $x is now 3 !
print $x; # prints "3"
This is spooky action-at-a-distance and is very hard to debug if it's
not intentional and well-documented (like `chop' or `chomp').
An exception is made for the usual delegation idiom
`$object->SUPER::something( @_ )'. Only `SUPER::' and `NEXT::' are
recognized (though this is configurable) and the argument list for the
delegate must consist only of `( @_ )'.
It's important to remember that a lot of the stuff in Perl Best Practices is just one guy's opinion on what looks the best or is the easiest to work with, and it doesn't matter if you do it another way. Damian says as much in the introductory text to the book. That's not to say it's all like that -- there are many things in there that are absolutely essential: using strict, for instance.
So as you write your code, you need to decide for yourself what your own best practices will be, and using PBP is as good a starting point as any. Then stay consistent with your own standards.
I try to follow most of the stuff in PBP, but Damian can have my subroutine-argument shifts and my unlesses when he pries them from my cold, dead fingertips.
As for Critic, you can choose which policies you want to enforce, and even create your own if they don't exist yet.
In some cases Perl::Critic cannot enforce PBP guidelines precisely, so it may enforce an approximation that attempts to match the spirit of Conway's guidelines. And it is entirely possible that we have misinterpreted or misapplied PBP. If you find something that doesn't smell right, please mail a bug report to bug-perl-critic@rt.cpan.org and we'll look into it right away.
Thanks,
-Jeff
I think you should generally avoid shift, if it is not really necessary!
I just ran into code like this:
sub way {
my $file = shift;
if (!$file) {
$file = 'newfile';
}
my $target = shift;
my $options = shift;
}
If you start changing something in this code, there is a good chance you might accidentally change the order of the shifts or skip one, and everything goes south. Furthermore, it's hard to read, because you cannot be sure you really see all the parameters for the sub: there might be another shift somewhere a few lines below... And if you use some regexes in between, they might replace the contents of $_ and weird stuff begins to happen...
A direct benefit of using the unpacking my (...) = @_ is you can just copy the (...) part and paste it where you call the method and have a nice signature :) you can even use the same variable-names beforehand and don't have to change a thing!
I think shift implies list operations where the length of the list is dynamic and you want to handle its elements one at a time or where you explicitly need a list without the first element. But if you just want to assign the whole list to x parameters, your code should say so with my (...) = @_; no one has to wonder.
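As an illustration, the sub from above could be written with list unpacking roughly like this. It is a sketch: the return statement and the sample call are additions for the demonstration, not part of the original code.

```perl
use strict;
use warnings;

# All parameters are visible on one line; the default is applied afterwards.
sub way {
    my ($file, $target, $options) = @_;
    $file ||= 'newfile';    # same default as the if (!$file) block
    return ($file, $target, $options);    # returned only so the result can be inspected
}

# Hypothetical call with a missing first argument.
my @args = way(undef, 'out.txt', { verbose => 1 });
print "$args[0] $args[1]\n";    # newfile out.txt
```

The order of the parameters is now fixed in a single place, so inserting code between them cannot silently shuffle the arguments.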