Is it correct to write this?
sub foobar {
local $_ = $_[0] if @_;
s/foo/bar/;
$_;
}
The idea is to use $_ if no argument is given, as chomp does. I can then write either
foobar($_);
or
&foobar;
local $_ = ... if @_; will only localize $_ if the sub received an argument, meaning it won't protect the caller's $_ if the sub doesn't receive an argument, and that's not what you want.
The minimal fix is
sub sfoobar {
local $_ = @_ ? shift : $_;
s/foo/bar/;
return $_;
}
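To see the difference in practice, here is a minimal sketch that puts the original conditional `local` next to the minimal fix. The sub names `foobar_buggy` and `foobar_fixed` are hypothetical, chosen only so both versions fit in one script; both are called with no argument so that they fall back to the caller's `$_`.

```perl
use strict;
use warnings;

# Original pattern: $_ is only localized when an argument is passed.
sub foobar_buggy {
    local $_ = $_[0] if @_;
    s/foo/bar/;    # with no argument, this edits the caller's $_
    return $_;
}

# Minimal fix: always localize, then pick the argument or the caller's $_.
sub foobar_fixed {
    local $_ = @_ ? shift : $_;
    s/foo/bar/;    # only the localized copy is changed
    return $_;
}

$_ = 'foo';
my $fixed_result = foobar_fixed();
my $after_fixed  = $_;    # caller's $_ after the fixed version

$_ = 'foo';
my $buggy_result = foobar_buggy();
my $after_buggy  = $_;    # caller's $_ after the buggy version

print "fixed: returned $fixed_result, caller's \$_ is '$after_fixed'\n";
print "buggy: returned $buggy_result, caller's \$_ is '$after_buggy'\n";
```

Both return 'bar', but only the buggy version leaves the caller's `$_` changed to 'bar'.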
But you might as well use a named variable at this point.
sub sfoobar {
my $s = @_ ? shift : $_;
$s =~ s/foo/bar/;
return $s;
}
5.10+ introduced the _ prototype.
sub sfoobar(_) {
my ($s) = @_;
$s =~ s/foo/bar/;
return $s;
}
5.14+ introduced s///r.
sub sfoobar(_) {
return $_[0] =~ s/foo/bar/r;
}
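The two features combine well: the `_` prototype substitutes `$_` for a missing argument, and `s///r` returns a modified copy, so neither the argument nor the caller's `$_` is touched. A small usage sketch, assuming Perl 5.14 or later; the variable names are illustrative.

```perl
use strict;
use warnings;

sub sfoobar(_) {
    return $_[0] =~ s/foo/bar/r;    # /r returns a modified copy
}

my $word     = 'food';
my $explicit = sfoobar($word);      # explicit argument
$_ = 'foofoo';
my $implicit = sfoobar;             # no argument: the _ prototype passes $_

print "$explicit $implicit $word $_\n";    # bard barfoo food foofoo
```

Because the sub is declared (with its prototype) before it is called, `sfoobar;` parses without parentheses.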
This isn't correct, no. The trouble is that you can't conditionally localise something: it's either localised or it isn't.
Instead of doing that, what I suggest is that you localise it, then conditionally copy from @_:
local $_ = $_;
$_ = shift if @_;
This way, $_ is always localised, but only conditionally copied from the first positional argument if one exists.
If you want to use the outer $_ in a subroutine, you can use the "_" prototype:
# dolund.pl
#
use strict;
sub dolund (_)
{ my $p1 = $_[0];
print "passed parameter is $p1\n";
}
dolund 12; # prints 12
my $fred = 21;
dolund $fred; # prints 21
$_ = 'not 12';
dolund; # prints "not 12"
Obviously, you could use $p1 =~ s/foo/bar/; if you like. I just wanted to demonstrate the implicit passing of $_.
I've got to ask - what are you actually trying to accomplish here?
It looks like you want a sub that works like some of the 'builtins' like chomp. I would suggest this is bad practice.
Unexpected things make code maintenance harder. The next guy to maintain your code should never have to think 'wtf?'.
Messing with built-in variables, such as reassigning values to $_, can have very strange consequences.
If someone sees your subroutine call, they're going to have to go and look up what it does anyway. That's almost by definition a bad subroutine.
Question: What's the scope of $_?
Answer: "It's complicated," because it's sort of global, but sometimes it's implicitly localized. And sometimes it's not a variable in its own right but an alias, so modifying it changes the original.
This means it's just plain bad news from a code maintainability standpoint.
From: http://perldoc.perl.org/perlvar.html#General-Variables
$_ is by default a global variable. However, as of perl v5.10.0, you can use a lexical version of $_ by declaring it in a file or in a block with my. Moreover, declaring our $_ restores the global $_ in the current scope. Though this seemed like a good idea at the time it was introduced, lexical $_ actually causes more problems than it solves. If you call a function that expects to be passed information via $_ , it may or may not work, depending on how the function is written, there not being any easy way to solve this. Just avoid lexical $_ , unless you are feeling particularly masochistic. For this reason lexical $_ is still experimental and will produce a warning unless warnings have been disabled. As with other experimental features, the behavior of lexical $_ is subject to change without notice, including change into a fatal error.
Related
How can I get a perl sub to use $_ when the parameter is omitted, like chr does? Is this the best way?
my @chars = map { chr } @numbers; # example
my @trimmed_names = map { trim } @names;
sub trim
{
my $str = shift || $_;
$str =~ s/^\s+|\s+$//g;
return $str;
}
$_ is directly visible in a sub called within its scope, so you can indeed just use it:
sub trim { s/^\s+|\s+$//gr } # NOTE: doesn't change $_
where, with the /r modifier, the changed string is returned and the original is left unchanged, which is crucial here.
However, this can be tricky and can (easily) result in subtle and very hard-to-find bugs. Here is one ready example. If we changed the $_ in the sub during processing, like
sub trim { # WARNING: caller's data changed
s/^\s+|\s+$//g;
return $_;
}
then the elements of @names in the caller have been changed, which is generally not expected. This is because the changed upper-scope $_ is itself aliased to each element in map's body.† Since $_ is a convenient default for many things, we'd have to keep track of everything used in the sub. So I'd indeed first copy $_, or safer yet localize it, in the sub and work with that.
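A short sketch of that hazard, using the two trim variants from this answer renamed `trim_copy` and `trim_inplace` (hypothetical names) so both fit in one file. Both are declared before the `map` calls, so they can be called without parentheses.

```perl
use strict;
use warnings;

sub trim_copy    { s/^\s+|\s+$//gr }               # returns a copy; $_ is untouched
sub trim_inplace { s/^\s+|\s+$//g; return $_ }     # edits $_, i.e. the caller's element

my @names = ('  alice ', ' bob');

my @copied = map { trim_copy } @names;      # @names still padded afterwards
print join('|', @names), "\n";

my @edited = map { trim_inplace } @names;   # @names trimmed as a side effect
print join('|', @names), "\n";
```

Both `map` calls return the same trimmed list; only the second one silently rewrites `@names`.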
Finally, in order to use either a passed parameter or $_ (at the point of the call):
sub trim {
my $str = @_ ? shift : $_;
$str =~ s/^\s+|\s+$//gr;
}
my @trimmed_names = map { trim } @names; # may omit () if sub declared before
This works because the visibility of $_ is unrelated to the argument list in @_, so one can also pass arguments. Here we also get the (much) safer copying of $_.
The shift || $_ from the question would dismiss a 0 or '' (empty string) in @_, which is in principle valid input; shift // $_ would dismiss an undef, also a possible input. Thanks to ikegami's comment on this. Thus, explicitly test whether there is anything in @_.
While passing a variable that's undef isn't valid here, it may be valid input in general. More to the point, the premise here is to use an argument if one is provided, so we should do that and then (hopefully) detect an error from the calling code (if passing undef shouldn't have happened), instead of quietly side-stepping it by switching to $_.
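To make the difference concrete, here is a small sketch with two hypothetical helpers: `pick_or` uses the `shift || $_` idiom from the question, and `pick_count` tests `@_` explicitly. Passing the string '0' shows which one silently falls back to `$_`.

```perl
use strict;
use warnings;

sub pick_or    { my $v = shift || $_;      return $v }   # any false value falls back to $_
sub pick_count { my $v = @_ ? shift : $_;  return $v }   # only a missing argument does

$_ = 'from $_';
my $with_or    = pick_or('0');       # '0' is false, so $_ is used instead
my $with_count = pick_count('0');    # '0' is kept
my $no_arg     = pick_count();       # no argument at all: $_ is used

print "$with_or / $with_count / $no_arg\n";
```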
So, my answer is a qualified "yes": that's one way to do it, but I may find it uncomfortable to work with a codebase where user-defined subs mix scopes. This example of trim in map is perfectly safe as it stands, but where else may such a function wind up being used? Why not just pass arguments?
Note: In order to be able to call a user-defined sub without parentheses, we must have it declared in the source before the point of invocation, so that the interpreter knows what that bareword (trim) is, since without parens it doesn't have any hints.
† I think it's worth recalling at this point that arguments to a sub are aliased, not copied, so if elements of @_ themselves are changed then the caller's data gets changed. This isn't directly related to $_ but the behavior can be.
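A minimal sketch of that footnote: elements of `@_` are aliases to the caller's variables, so assigning to `$_[0]` changes the caller's data, while copying into a `my` variable first does not. The sub names are illustrative.

```perl
use strict;
use warnings;

sub shout_in_place { $_[0] = uc $_[0]; return }            # writes through the alias
sub shout_copy     { my ($s) = @_; $s = uc $s; return $s } # works on a copy

my $word = 'hello';
shout_copy($word);          # $word is unchanged
my $after_copy = $word;
shout_in_place($word);      # $word itself is now upper-cased

print "$after_copy $word\n";
```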
You can use the _ prototype.
sub trim(_) { $_[0] =~ s/^\s+|\s+\z//rg }
Otherwise, you can simply use $_ if no arguments were provided.
sub trim { ( @_ ? $_[0] : $_ ) =~ s/^\s+|\s+\z//rg }
Either way,
say for map trim, @strings;
-or-
say for map trim($_), @strings;
I'm trying map() with my own subroutine. When I tried it with a Perl built-in function, it worked. But when I tried map() with my own subroutine, it fails.
I couldn't figure out what causes the error.
Here is the code snippet.
#!/usr/bin/perl
use strict;
sub mysqr {
my ($input) = @_;
my $answer = $input * $input;
return $answer;
}
my @questions = (1,2,3,4,5);
my @answers;
@answers = map(mysqr, @questions); # doesn't work.
@answers = map {mysqr($_)} @questions; #works.
print "map = ";
print join(", ", @answers);
print "\n";
Map always assigns an element of the argument list to $_, then evaluates the expression. So map mysqr($_), 1,2,3,4,5 calls mysqr on each of the elements 1,2,3,4,5, because $_ is set to each of 1,2,3,4,5 in turn.
The reason you can often omit the $_ when calling a built-in function is that many Perl built-in functions, if not given an argument, will operate on $_ by default. For example, the lc function does this. Your mysqr function doesn't do this, but if you changed it to do this, the first form would work:
sub mysqr {
my $input;
if (@_) { ($input) = @_ }
else { $input = $_ } # No argument was given, so default to $_
my $answer = $input * $input;
return $answer;
}
map(mysqr, 1,2,3,4,5); # works now
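On Perl 5.10 or later, the `_` prototype is a shorter alternative to testing `@_` by hand. A runnable sketch follows; the prototype only takes effect because `mysqr` is declared before the `map` calls.

```perl
use strict;
use warnings;

# With the _ prototype, a missing argument is replaced by $_.
sub mysqr(_) {
    my ($input) = @_;
    return $input * $input;
}

my @questions = (1, 2, 3, 4, 5);
my @implicit  = map(mysqr, @questions);      # mysqr receives $_ implicitly
my @explicit  = map { mysqr($_) } @questions;
print "map = ", join(", ", @implicit), "\n";
```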
The difference is that in the second case, you are explicitly passing the argument, and in the first one, you pass nothing.
@answers = map(mysqr, @questions); # same as mysqr(), no argument passed
@answers = map {mysqr($_)} @questions; # $_ is passed on to $input
You might be thinking of the fact that many Perl built-in functions use $_ when no argument is given. This is, however, not the default behaviour of user defined subroutines. If you want that functionality, you need to add it yourself. Though be warned that it often is not a good idea.
Note that if you enable use warnings, which you always should, you will get a descriptive message:
Use of uninitialized value $input in multiplication (*) at foo.pl line 8.
Which tells you that no data is passed to $input.
Not using warnings is not removing errors from your code, it is merely hiding them, much like hiding the "low oil" warning lamp in a car does not prevent engine failure.
I've been programming in Perl for a while, but I never have understood a couple of subtleties about Perl:
The use and the setting/unsetting of the $_ variable confuses me. For instance, why does
# ...
shift @queue;
($item1, @rest) = split /,/;
work, but (at least for me)
# ...
shift @queue;
/some_pattern.*/ or die();
does not seem to work?
Also, I don't understand the difference between iterating through a file using foreach versus while. For instance, I seem to be getting different results for
while(<SOME_FILE>){
# Do something involving $_
}
and
foreach (<SOME_FILE>){
# Do something involving $_
}
Can anyone explain these subtle differences?
shift @queue;
($item1, @rest) = split /,/;
If I understand you correctly, you seem to think that this shifts off an element from @queue to $_. That is not true.
The value that is shifted off of @queue simply disappears. The following split operates on whatever is contained in $_ (which is independent of the shift invocation).
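A small sketch of that point, using a hypothetical `@queue` of comma-separated records: `split` only sees the shifted record if you hand it over, here by passing it as `split`'s second argument.

```perl
use strict;
use warnings;

my @queue = ('a,b,c', 'd,e');

$_ = 'x,y';                         # whatever $_ happened to hold
shift @queue;                       # the value is discarded; $_ is untouched
my ($first, @rest) = split /,/;     # splits 'x,y', not 'a,b,c'

my $record = shift @queue;          # keep the value this time
my ($item, @others) = split /,/, $record;   # splits 'd,e'

print "$first @rest | $item @others\n";
```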
while(<SOME_FILE>){
# Do something involving $_
}
Reading from a filehandle in a while statement is special: It is equivalent to
while ( defined( $_ = readline *SOME_FILE ) ) {
This way, you can process even colossal files line-by-line.
On the other hand,
for(<SOME_FILE>){
# Do something involving $_
}
will first load the entire file as a list of lines into memory. Try a 1GB file and see the difference.
Another, albeit subtle, difference between:
while (<FILE>) {
}
and:
foreach (<FILE>) {
}
is that while() will modify the value of $_ outside of its scope, whereas, foreach() makes $_ local. For example, the following will die:
$_ = "test";
while (<FILE1>) {
print "$_";
}
die if $_ ne "test";
whereas, this will not:
$_ = "test";
foreach (<FILE1>) {
print "$_";
}
die if $_ ne "test";
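To try this without real files, the sketch below uses in-memory filehandles (a reference to a scalar passed to `open`, available since Perl 5.8) in place of FILE1; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\n";

$_ = 'test';
open my $fh_while, '<', \$text or die "open: $!";
while (<$fh_while>) { }        # assigns each line to the global $_
my $after_while = $_;          # undef: the final readline returned undef at EOF
close $fh_while;

$_ = 'test';
open my $fh_for, '<', \$text or die "open: $!";
foreach (<$fh_for>) { }        # $_ is localized to the loop
my $after_foreach = $_;        # restored to 'test'
close $fh_for;

print 'while:   ', (defined $after_while ? $after_while : 'undef'), "\n";
print "foreach: $after_foreach\n";
```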
This becomes more important with more complex scripts. Imagine something like:
sub func1() {
while (<$fh2>) { # clobbers $_ set from <$fh1> below
<...>
}
}
while (<$fh1>) {
func1();
<...>
}
Personally, I stay away from using $_ for this reason, in addition to it being less readable, etc.
Regarding the 2nd question:
while (<FILE>) {
}
and
foreach (<FILE>) {
}
Both have the same functional behavior, including setting $_. The difference is that while() evaluates <FILE> in scalar context, while foreach() evaluates <FILE> in list context. Consider the difference between:
$x = <FILE>;
and
@x = <FILE>;
In the first case, $x gets the first line of FILE, and in the second case @x gets the entire file. Each entry in @x is a different line in FILE.
So, if FILE is very big, you'll waste memory slurping it all at once using foreach (<FILE>) compared to while (<FILE>). This may or may not be an issue for you.
The place where it really matters is if FILE is a pipe descriptor, as in:
open FILE, "some_shell_program|";
Now foreach(<FILE>) must wait for some_shell_program to complete before it can enter the loop, while while(<FILE>) can read the output of some_shell_program one line at a time and execute in parallel to some_shell_program.
That said, the behavior with regard to $_ remains unchanged between the two forms.
foreach evaluates the entire list up front. while evaluates the condition on each pass to see if it's true. while should be considered for incremental operations, foreach only for list sources.
For example:
my $t= time() + 10 ;
while ( $t > time() ) {
# do something
}
StackOverflow: What’s the difference between iterating over a file with foreach or while in Perl?
It is to avoid this sort of confusion that it's considered better form to avoid using the implicit $_ constructions.
my $element = shift @queue;
($item,@rest) = split /,/ , $element;
or
($item,@rest) = split /,/, shift @queue;
likewise
while(my $foo = <SOMEFILE>){
# do something
}
or
foreach my $thing(<FILEHANDLE>){
# do something
}
while only checks whether the value is true; for also places each value in $_. There are exceptions: for example, <> will set $_ if used as the condition of a while loop.
To get behavior similar to:
foreach(qw'a b c'){
# Do something involving $_
}
You have to set $_ explicitly.
my @items = qw'a b c';
while( defined( $_ = shift @items ) ){
# Do something involving $_
}
It is better to explicitly set your variables
for my $line(<SOME_FILE>){
}
or better yet
while( my $line = <SOME_FILE> ){
}
which will only read in the file one line at a time.
Also, shift doesn't set $_ unless you specifically ask it to:
$_ = shift @_;
And split works on $_ by default. In Perl versions before 5.12, using split in scalar or void context also populated @_; that behavior has since been removed.
Please read perldoc perlvar so that you will have an idea of the different variables in Perl.
OK, I have the following code:
use strict;
my @ar = (1, 2, 3);
foreach my $a (@ar)
{
$a = $a + 1;
}
print join ", ", @ar;
and the output?
2, 3, 4
What the heck? Why does it do that? Will this always happen? Is $a not really a local variable? What were they thinking?
Perl has lots of these almost-odd syntax things which greatly simplify common tasks (like iterating over a list and changing the contents in some way), but can trip you up if you're not aware of them.
$a is aliased to the value in the array - this allows you to modify the array inside the loop. If you don't want to do that, don't modify $a.
See perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
There is nothing weird or odd about a documented language feature, although I do find it odd how many people refuse to check the docs upon encountering behavior they do not understand.
$a in this case is an alias to the array element. Just don't have $a = in your code and you won't modify the array. :-)
If I remember correctly, map, grep, etc. all have the same aliasing behaviour.
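A quick check of that recollection, as a minimal sketch: inside `map` and `grep` blocks, `$_` is an alias to each element, so anything that modifies `$_` modifies the source array.

```perl
use strict;
use warnings;

my @words = qw(cat dog);
my @upper = map { uc } @words;          # uc returns a new string: @words unchanged
my @kept  = grep { s/o/0/; 1 } @words;  # s/// edits $_, the alias: @words changes

print "@words | @upper | @kept\n";
```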
As others have said, this is documented.
My understanding is that the aliasing behavior of @_, for, map and grep provides a speed and memory optimization as well as providing interesting possibilities for the creative. What happens is essentially a pass-by-reference invocation of the construct's block. This saves time and memory by avoiding unnecessary data copying.
use strict;
use warnings;
use List::MoreUtils qw(apply);
my @array = qw( cat dog horse kanagaroo );
foo(@array);
print join "\n", '', 'foo()', @array;
my @mapped = map { s/oo/ee/g } @array;
print join "\n", '', 'map-array', @array;
print join "\n", '', 'map-mapped', @mapped;
my @applied = apply { s/fee//g } @array;
print join "\n", '', 'apply-array', @array;
print join "\n", '', 'apply-applied', @applied;
sub foo {
$_ .= 'foo' for @_;
}
Note the use of the List::MoreUtils apply function. It works like map but makes a copy of the topic variable, rather than aliasing it. If you hate writing code like:
my @foo = map { my $f = $_; $f =~ s/foo/bar/; $f } @bar;
you'll love apply, which makes it into:
my @foo = apply { s/foo/bar/ } @bar;
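If installing List::MoreUtils isn't an option, core Perl 5.14+ offers the `/r` flag, which gives the same copy-then-modify behavior for substitutions. A minimal sketch with illustrative data; note that the copy-based form has to return the lexical explicitly, because `s///` itself returns the number of substitutions.

```perl
use strict;
use warnings;

my @bar = ('food', 'moon', 'cat');

# Copy into a lexical, edit it, and return it.
my @by_copy = map { my $f = $_; $f =~ s/foo/bar/; $f } @bar;

# Same result with /r, which returns the modified copy (Perl 5.14+).
my @by_r = map { s/foo/bar/r } @bar;

print "@bar\n@by_copy\n@by_r\n";
```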
Something to watch out for: if you pass read-only values into one of these constructs that modifies its input values, you will get a "Modification of a read-only value attempted" error.
perl -e '$_++ for "o"'
The important distinction here is that when you declare a my variable in the initialization section of a for loop, it seems to share some properties of both locals and lexicals (could someone with more knowledge of the internals clarify?)
my @src = 1 .. 10;
for my $x (@src) {
# $x is an alias to elements of @src
}
for (@src) {
my $x = $_;
# $_ is an alias but $x is not an alias
}
The interesting side effect of this is that in the first case, a sub{} defined within the for loop is a closure around whatever element of the list $x was aliased to. Knowing this, it is possible (although a bit odd) to close around an aliased value, which could even be a global; I don't think that is possible with any other construct.
our @global = 1 .. 10;
my @subs;
for my $x (@global) {
push @subs, sub {++$x}
}
$subs[5](); # modifies the @global array
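A runnable version of that snippet with a check of the result, as a sketch: each closure captures the element `$x` was aliased to on that iteration, so calling the sixth closure increments `$global[5]`.

```perl
use strict;
use warnings;

our @global = 1 .. 10;
my @subs;
for my $x (@global) {
    push @subs, sub { ++$x };   # captures the aliased element, not a copy
}
$subs[5]->();                   # increments $global[5] from 6 to 7
print "$global[5]\n";
```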
Your $a is simply being used as an alias for each element of the list as you loop over it. It's being used in place of $_. You can tell that $a is not a local variable because it is declared outside of the block.
It's more obvious why assigning to $a changes the contents of the list if you think about it as being a stand-in for $_ (which is what it is). In fact, the loop doesn't set $_ at all if you define your own iterator like that.
foreach my $a (1..10) {
print $_; # $_ is not set by this loop
}
If you're wondering what the point is, consider the case:
my #row = (1..10);
my #col = (1..10);
foreach (#row){
print $_;
foreach(#col){
print $_;
}
}
In this case it is more readable to provide a friendlier name for $_
foreach my $x (@row){
print $x;
foreach my $y (@col){
print $y;
}
}
Try
foreach my $a (@_ = @ar)
now modifying $a does not modify #ar.
Works for me on v5.20.2
I'm looking through perl code and I see this:
sub html_filter {
my $text = shift;
for ($text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return $text;
}
What does the for loop do in this case, and why would you do it this way?
The for loop aliases each element of the list it's looping over to $_. In this case, there is only one element, $text.
Within the body, this allows one to write
s/&/&amp;/g;
etc. instead of having to write
$text =~ s/&/&amp;/g;
repeatedly. See also perldoc perlsyn.
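A runnable version of html_filter, as a sketch, with the entity names written out in full. The ampersand substitution has to run first; otherwise the & in the entities produced by the later substitutions would be escaped a second time.

```perl
use strict;
use warnings;

sub html_filter {
    my $text = shift;
    for ($text) {            # $_ is an alias to $text inside the loop body
        s/&/&amp;/g;         # must come first, or the entities below get re-escaped
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
    }
    return $text;
}

my $raw  = q{<a href="x">Tom & Jerry</a>};
my $safe = html_filter($raw);
print "$safe\n";
```

Because `$text` is a copy of the argument, the caller's `$raw` is left as it was.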
Without an explicit loop variable, the for loop uses the special variable called $_. The substitution statements inside the loop also use the special $_ variable because none other is specified, so this is just a trick to make the source code shorter. I would probably write this function as:
sub html_filter {
my $text = shift;
$text =~ s/&/&amp;/g;
$text =~ s/</&lt;/g;
$text =~ s/>/&gt;/g;
$text =~ s/"/&quot;/g;
return $text;
}
This will have no performance consequences and is readable by people who aren't Perl experts.
As Mr Hewgill points out, the code sample is implicitly localizing and aliasing to $_, the magical implied variable.
He offers a substitute that is more readable at the cost of boilerplate code.
There is no reason to sacrifice readability for brevity. Simply replace the implicit localization and assignment with an explicit version:
sub html_filter {
local $_ = shift;
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
return $_;
}
If I didn't know Perl all that well and came across this code, I'd know that I needed to look at the docs for $_ and local; as a bonus, perlvar has a few examples of localizing $_.
For anyone who uses Perl a lot, the above should be easy to understand.
So there is really no reason to sacrifice readability for brevity here.
It's just used to alias $text to $_, the default variable. Done because they're too lazy to use an explicit variable or don't want to waste precious cycles creating a new scalar.
It's cleaning up &, <, > and quote characters and replacing them with the appropriate HTML entities.
It loops through your text and substitutes ampersands (&) with &amp;, < with &lt;, > with &gt; and " with &quot;. You'd do this for output to an .html document; those are the proper entity characters.
The original code could be more flexible by using wantarray to test the desired context:
sub html_filter {
my @text = @_;
for (@text) {
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/"/&quot;/g;
}
return wantarray ? @text : "@text"; }
That way you could call it in list context or scalar context and get back the correct results, for example:
my #stuff = html_filter('"','>');
print "$_\n" for #stuff;
my $stuff = html_filter('&');
print $stuff;