How to have sub use $_ when parameter omitted? - perl

How can I get a perl sub to use $_ when the parameter is omitted, like chr does? Is this the best way?
my @chars = map { chr } @numbers; # example
my @trimmed_names = map { trim } @names;
sub trim
{
my $str = shift || $_;
$str =~ s/^\s+|\s+$//g;
return $str;
}

The $_ is directly seen in a sub called in its scope, so you can indeed just use it
sub trim { s/^\s+|\s+$//gr } # NOTE: doesn't change $_
where, with the /r modifier, the changed string is returned and the original isn't changed, which is crucial here.
However, this can be tricky and can (easily) result in subtle and very hard-to-find bugs. Here is one ready example. If we changed the $_ in the sub during processing, like
sub trim { # WARNING: caller's data changed
s/^\s+|\s+$//g;
return $_;
}
then the elements of @names in the caller have been changed, which is generally not expected. This is because $_ in map's body is an alias for the current element, so changing that upper-scope $_ changes the element itself.† As $_ is a convenient default for many things we'd have to keep track of everything used in the sub. So I'd indeed first copy $_, or safer yet localize it, in the sub and work with that.
Finally, in order to use either a passed parameter or $_ (at the point of the call)
sub trim {
my $str = @_ ? shift : $_;
$str =~ s/^\s+|\s+$//gr;
}
my @trimmed_names = map { trim } @names; # may omit () if sub declared before
This works because the visibility of $_ is unrelated to the argument list in @_, so one can also pass arguments. Here we also get the (much) safer copying of $_.
The shift || $_ from the question would dismiss a 0 or '' (empty string) in @_, which is in principle valid input; the shift // $_ would dismiss an undef, also a possible input. Thanks to ikegami's comment on this. Thus explicitly test whether there is anything in @_.
While passing a variable that's undef isn't valid here it may be valid input in general. More to the point, the premise here is to use an argument if provided, so we should do that and then (hopefully) detect an error from the calling code (if passing undef shouldn't have happened), instead of quietly side-stepping it, by switching to $_.
So, my answer is a qualified "yes" -- that's one way to do it; but I may find it uncomfortable to work with a codebase where user's subs mix scopes. This example trim in map is perfectly safe as it stands, but where else may such a function wind up used? Why not just pass arguments?
Note: In order to be able to call a user-defined sub without parentheses we must have it declared in the source before the point of invocation so that the interpreter knows what that bareword (trim) is, since without parens it doesn't have any hints.
† I think it's worth recalling at this point that arguments to a sub are aliased, not copied, so if elements of @_ themselves are changed then the caller's data gets changed. This isn't directly related to $_ but the behavior can be.
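To make the difference between the three fallbacks concrete, here is a minimal sketch. The sub names pick_or, pick_defined and pick_count are made up for illustration, and the defined-or variant is written as shift(@_) // $_ only to keep the parse unambiguous.

```perl
use strict;
use warnings;

# Three hypothetical variants of "use the argument, else fall back to $_".
sub pick_or      { my $v = shift || $_;      return $v }  # form from the question
sub pick_defined { my $v = shift(@_) // $_;  return $v }  # defined-or, perl 5.10+
sub pick_count   { my $v = @_ ? shift : $_;  return $v }  # tests @_ itself

for ('fallback') {                                 # $_ is 'fallback' in this loop
    print "or:      ", pick_or(0),      "\n";      # 0 is false, so $_ is used
    print "defined: ", pick_defined(0), "\n";      # 0 is defined, so 0 is used
    print "count:   ", pick_count(0),   "\n";      # an argument exists, so 0 is used
    print "count:   ", pick_count(),    "\n";      # no argument, so $_ is used
}
```

Only the last variant also hands an explicitly passed undef back to the caller instead of quietly switching to $_.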

You can use the _ prototype.
sub trim(_) { $_[0] =~ s/^\s+|\s+\z//rg }
Otherwise, you can simply use $_ if no arguments were provided.
sub trim { ( @_ ? $_[0] : $_ ) =~ s/^\s+|\s+\z//rg }
Either way,
say for map trim, @strings;
-or-
say for map trim($_), @strings;
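A small self-contained sketch of the _ prototype in action, assuming perl 5.14 or later for s///r. The sample data in @strings is invented for the example.

```perl
use strict;
use warnings;

# The (_) prototype makes perl pass $_ when the call has no argument.
# The sub must be declared before the calls for the prototype to apply.
sub trim(_) { return $_[0] =~ s/^\s+|\s+\z//gr }

my @strings  = ('  a ', "b\n", ' c');
my @implicit = map { trim } @strings;       # $_ supplied through the prototype
my @explicit = map { trim($_) } @strings;   # same result, spelled out

print join('|', @implicit), "\n";   # a|b|c
print join('|', @explicit), "\n";   # a|b|c
```

Because of /r the elements of @strings themselves are left as they were.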

Related

Perl: passing hash by ref using rule1

I am still unclear about why the by-ref portion shows an undefined value for %Q and an uninitialized $_. I have been looking through perlreftut and am still unable to see what I have done wrong. Passing the hash as a flat array has no issue.
Doing it by ref with testRef(\%mkPara) passes a scalar hash reference to the subroutine, right? So, does my %Q = %{$_} not turn it back into a hash?
use strict;
use diagnostics;
use warnings;
my %mkPara = ('aa'=>2,'bb'=>3,'cc'=>4,'dd'=>5);
sub testFlat
{
my %P = @_;
print "$P{'aa'}, $P{'bb'}, ", $P{'cc'}*$P{'dd'}, "\n";
}
sub testRef
{
my %Q = %{$_}; #can't use an undefined value as HASH reference
#print $_->{'aa'}, "\n";#Use of uninitialized value
print $Q{'aa'},"\n";
}
#testFlat(%mkPara);
testRef(\%mkPara);
When you use arguments in a function call (\%mkPara in your case), you can access them through the @_ array inside the function.
Here, you pass a single argument to the function: \%mkPara, which you can then access as the first element of @_ by using $_[0].
$_ is the default variable for some builtin functions/operators (print, m//, s///, chomp and a lot more). It is usually seen in while or for loops. But in your code, you have no reason to use it (you are never setting it to anything, so it's still set to undef, hence the error "Can't use an undefined value as a HASH reference").
So your function should actually be :
sub testRef
{
my %Q = %{$_[0]}; # instead of %{$_}
print $_[0]->{'aa'}, "\n"; # instead of $_->{'aa'}
print $Q{'aa'},"\n";
}
If needed, you can find more about functions on perlsub.
However, as @Ikegami pointed out in the comments, using my %Q = %{$_[0]}; creates a copy of the hash you sent to the function, which in most cases (including this one, where you just print a key of the hash) is very suboptimal as you could just use a hashref (like you are doing when you do $_[0]->{'aa'}).
You can use hash references like this (roughly the same example as in @Zaid's answer):
sub testRef
{
my ( $Q ) = @_;
print $Q->{aa} ;
print $_, "\n" for keys %$Q;
}
testRef(\%mkPara);
There are quite a lot of resources about references online, for instance perlreftut that you were already looking at.
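The copy-versus-reference point can be seen in a short sketch. The helpers bump_copy and bump_ref are hypothetical and exist only to show which hash gets modified.

```perl
use strict;
use warnings;

my %mkPara = (aa => 2, bb => 3);

# bump_copy dereferences into a new hash, bump_ref works through the reference.
sub bump_copy { my %Q = %{ $_[0] }; $Q{aa}++;   return $Q{aa} }
sub bump_ref  { my ($Q) = @_;       $Q->{aa}++; return $Q->{aa} }

bump_copy(\%mkPara);
print "after bump_copy: $mkPara{aa}\n";   # still 2: only the copy changed
bump_ref(\%mkPara);
print "after bump_ref:  $mkPara{aa}\n";   # now 3: the caller's hash changed
```

So the hashref form avoids the copy, but it also means the sub can change the caller's data.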
This can seem a bit tricky at first, but the reason is that $_ is not the same as @_.
From perlvar:
$_ is the implicit/"default" variable that does not have to be spelled out explicitly for certain functions (e.g. split )
Within a subroutine the array @_ contains the parameters passed to that subroutine
So the reason why
my %Q = %{$_};
says you can't use an undefined value as hash reference is because $_ is not defined.
What you really need here is
my %Q = %{$_[0]};
because that is the first element of @_, which is what was passed to testRef in the first place.
In practice I tend to find myself doing things a little differently because it lends itself to flexibility for future modifications:
sub testRef {
my ( $Q ) = @_;
print $_, "\n" for keys %$Q; # just as an example
}

local $_ in a Perl subroutine

Is it correct to write this?
sub foobar {
local $_ = $_[0] if @_;
s/foo/bar/;
$_;
}
The idea is to take $_ if no arguments are given, as chomp does. I can then either write
foobar($_);
or
&foobar;
local $_ = ... if @_; will only localize $_ if the sub received an argument, meaning it won't protect the caller's $_ if the sub doesn't receive an argument, and that's not what you want.
The minimal fix is
sub sfoobar {
local $_ = @_ ? shift : $_;
s/foo/bar/;
return $_;
}
But you might as well use a named variable at this point.
sub sfoobar {
my $s = @_ ? shift : $_;
$s =~ s/foo/bar/;
return $s;
}
5.10+ introduced the _ prototype.
sub sfoobar(_) {
my ($s) = @_;
$s =~ s/foo/bar/;
return $s;
}
5.14+ introduced s///r.
sub sfoobar(_) {
return $_[0] =~ s/foo/bar/r;
}
This isn't correct, no. The trouble is that you can't conditionally local something - it's either localised, or it isn't.
Instead of doing that, what I suggest is that you localise it, then conditionally copy from @_:
local $_ = $_;
$_ = shift if @_;
This way, $_ is always localised, but only conditionally copied from the first positional argument if one exists.
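Here is a rough sketch comparing the two approaches side by side. The names foobar_cond and foobar_safe are invented for the comparison; the first is the conditional form from the question, the second always localizes.

```perl
use strict;
use warnings;

# Conditional form: $_ is localized only when an argument is passed.
sub foobar_cond {
    local $_ = $_[0] if @_;
    s/foo/bar/;
    return $_;
}

# Always localize, then overwrite with the argument if there is one.
sub foobar_safe {
    local $_ = $_;
    $_ = shift if @_;
    s/foo/bar/;
    return $_;
}

$_ = 'foo1';
foobar_safe();
print "after foobar_safe: $_\n";   # foo1: the caller's $_ is untouched
foobar_cond();
print "after foobar_cond: $_\n";   # bar1: the caller's $_ was modified
```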
If you want to use an outer $_ in a subroutine, you can use the "_" prototype:
# dolund.pl
#
use strict;
sub dolund (_)
{ my $p1 = $_[0];
print "passed parameter is $p1\n";
}
dolund 12; # prints 12
my $fred = 21;
dolund $fred; # prints 21
$_ = 'not 12';
dolund; # prints "not 12"
Obviously, you could use $p1 =~ s/foo/bar/; if you like. I just wanted to demonstrate the implicit passing of $_.
I've got to ask - what are you actually trying to accomplish here?
It looks like you want a sub that works like some of the 'builtins' like chomp. I would suggest this is bad practice.
Unexpected things make code maintenance harder. The next guy to maintain your code should never have to think 'wtf?'.
Messing with 'built in' variables, such as reassigning values to $_, can have very strange consequences.
If someone sees your subroutine call, they're going to have to go and look what it does anyway. That's almost by definition a bad subroutine.
Question: What's the scope of $_?
Answer: "It's complicated" because it's sort of global, but sometimes it's implicitly localized. And sometimes it's not a variable in its own right, it's an alias, so modifying it changes the original.
This means it's just plain bad news from a code maintainability standpoint.
From: http://perldoc.perl.org/perlvar.html#General-Variables
$_ is by default a global variable. However, as of perl v5.10.0, you can use a lexical version of $_ by declaring it in a file or in a block with my. Moreover, declaring our $_ restores the global $_ in the current scope. Though this seemed like a good idea at the time it was introduced, lexical $_ actually causes more problems than it solves. If you call a function that expects to be passed information via $_ , it may or may not work, depending on how the function is written, there not being any easy way to solve this. Just avoid lexical $_ , unless you are feeling particularly masochistic. For this reason lexical $_ is still experimental and will produce a warning unless warnings have been disabled. As with other experimental features, the behavior of lexical $_ is subject to change without notice, including change into a fatal error.

Perl map passing arguments

I'm trying map() with my own subroutine. When I tried it with one of Perl's builtin functions, it worked. But when I tried map() with my own subroutine, it failed.
I couldn't work out what causes the error.
Here is the code snippet.
#!/usr/bin/perl
use strict;
sub mysqr {
my ($input) = @_;
my $answer = $input * $input;
return $answer;
}
my @questions = (1,2,3,4,5);
my @answers;
@answers = map(mysqr, @questions); # doesn't work.
@answers = map {mysqr($_)} @questions; #works.
print "map = ";
print join(", ", @answers);
print "\n";
Map always assigns an element of the argument list to $_, then evaluates the expression. So map mysqr($_), 1,2,3,4,5 calls mysqr on each of the elements 1,2,3,4,5, because $_ is set to each of 1,2,3,4,5 in turn.
The reason you can often omit the $_ when calling a built-in function is that many Perl built-in functions, if not given an argument, will operate on $_ by default. For example, the lc function does this. Your mysqr function doesn't do this, but if you changed it to do this, the first form would work:
sub mysqr {
my $input;
if (@_) { ($input) = @_ }
else { $input = $_ } # No argument was given, so default to $_
my $answer = $input * $input;
return $answer;
}
map(mysqr, 1,2,3,4,5); # works now
The difference is that in the second case, you are explicitly passing the argument, and in the first one, you pass nothing.
@answers = map(mysqr, @questions); # same as mysqr(), no argument passed
@answers = map {mysqr($_)} @questions; # $_ is passed on to $input
You might be thinking of the fact that many Perl built-in functions use $_ when no argument is given. This is, however, not the default behaviour of user defined subroutines. If you want that functionality, you need to add it yourself. Though be warned that it often is not a good idea.
Note that if you use use warnings, which you always should, you will get a descriptive error:
Use of uninitialized value $input in multiplication (*) at foo.pl line 8.
Which tells you that no data is passed to $input.
Not using warnings is not removing errors from your code, it is merely hiding them, much like hiding the "low oil" warning lamp in a car does not prevent engine failure.

In Perl, should a function do the wantarray dance, or can we expect the caller to use map?

On the (much appreciated) perlmonks site, I found the following snippet that trims the spaces from both sides of a string:
sub trim {
@_ = $_ if not @_ and defined wantarray;
@_ = @_ if defined wantarray;
for (@_ ? @_ : $_) { s/^\s+//, s/\s+$// }
return wantarray ? @_ : $_[0] if defined wantarray;
}
I don't understand why the author goes to all the trouble of checking wantarray almost every line. Why not just trim the string, and have the programmer use map when he is passing an array?
What is the difference between that trim, called like this:
my @test_array = ( 'string1', ' string2', 'string3 ', ' string4 ');
my @result = trim(@test_array);
Or a simple trim, called like this when one needs to trim an array:
my @test_array = ( 'string1', ' string2', 'string3 ', ' string4 ');
my @result = map { trim($_) } @test_array;
First of all it's better if you abstract that map away:
#e.1.
sub trim
{
my @ar = @_;
for (@ar) { s/^\s+//, s/\s+$// };
return wantarray ? @ar : $ar[0];
}
Second, consider the above example and compare it with:
#e.2.
sub trim
{
for (@_) { s/^\s+//, s/\s+$// };
}
what's the difference?
e.1. returns a new trimmed array, while e.2. modifies the original array.
Ok now, what does the original cryptic subroutine do?
It auto-magically (yeaah, it's Perl) modifies the original array if you are not assigning the return value to anything OR leaves the original array untouched and returns a new trimmed array if you are assigning the return value to another variable.
How?
By checking to see if wantarray is defined at all. As long as the function is on the right-hand side and the return value is assigned to a variable, "defined wantarray" is true (regardless of scalar or list context).
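The three states of wantarray can be observed with a tiny sketch. The sub context and the array @seen are hypothetical names used only to record what each call saw.

```perl
use strict;
use warnings;

# Record the context each call was made in.
my @seen;
sub context {
    my $want = wantarray;   # undef in void, false in scalar, true in list context
    push @seen, !defined $want ? 'void' : $want ? 'list' : 'scalar';
    return;
}

context();             # void context: return value is discarded
my $s = context();     # scalar context
my @l = context();     # list context
print "@seen\n";       # void scalar list
```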
Breaking this down line by line since it hasn't been:
sub trim {
@_ = $_ if not @_ and defined wantarray;
# if there are no arguments, but a return value is requested
# then place a copy of $_ into @_ to work on
@_ = @_ if defined wantarray;
# if the caller expects a return value, copy the values in @_ into itself
# (this breaks the aliasing to the caller's variables)
for (@_ ? @_ : $_) { s/^\s+//, s/\s+$// }
# perform the substitution, in place, on either @_ or $_ depending on
# if arguments were passed to the function
return wantarray ? @_ : $_[0] if defined wantarray;
# if called in list context, return @_, otherwise $_[0]
}
I agree that the code gets a bit tedious with all of the wantarray checks, but the result is a function that shares a level of flexibility with Perl's builtin functions. The net result of making the function "smart" is to clean up the call site (avoiding looping constructs, temp variables, repetition, ...) which depending on the frequency the function is used can meaningfully improve code readability.
The function could be simplified a little bit:
sub trim {
@_ = @_ ? @_ : $_ if defined wantarray;
s/^\s+//, s/\s+$// for @_ ? @_ : $_;
wantarray ? @_ : shift
}
The first two lines can be rolled into one, since they are doing the same thing (assigning to @_) just with different source values. And there is no need for the outer return ... if defined wantarray check at the end, since returning a value in void context doesn't do anything anyway.
But I would probably change the last line to wantarray ? @_ : pop since that makes it behave like a list (last element in scalar context).
Once all is said and done, this allows the following calling styles to be used:
my @test_array = ( 'string1', ' string2', 'string3 ', ' string4 ');
my @result = trim @test_array;
my $result = trim $test_array[0];
trim @test_array; # in place trim
and even still supports the call site loop:
my @result = map trim, @test_array;
or more verbosely as:
my @result = map trim($_), @test_array;
and it can be used inside a while loop similar to chomp
while (<$file_handle>) {
trim;
# do something
}
Opinions about dwimmery in Perl are mixed. I personally like it when functions give me the flexibility to code the caller in a way that makes sense, rather than working around a function's rigid interface.
Probably the author wanted to mimic the behavior of the standard chomp function. There is no need to do this in your own function.
man perlfunc
chomp VARIABLE
chomp( LIST )
chomp
[...] If you chomp a list, each element is chomped. [...]
Note that this is exactly how Text::Trim is implemented. See its description of the various use cases. Playing with wantarray allows a function to distinguish the various contexts and implement different semantics for each.
Personally I prefer just single semantics as it is easier to understand and use. I would avoid using $_ default variable or modification in place, in line with Nylon Smile's example 1.

Why is Perl foreach variable assignment modifying the values in the array?

OK, I have the following code:
use strict;
my @ar = (1, 2, 3);
foreach my $a (@ar)
{
$a = $a + 1;
}
print join ", ", @ar;
and the output?
2, 3, 4
What the heck? Why does it do that? Will this always happen? Is $a not really a local variable? What were they thinking?
Perl has lots of these almost-odd syntax things which greatly simplify common tasks (like iterating over a list and changing the contents in some way), but can trip you up if you're not aware of them.
$a is aliased to the value in the array - this allows you to modify the array inside the loop. If you don't want to do that, don't modify $a.
See perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
There is nothing weird or odd about a documented language feature, although I do find it odd how many people refuse to check the docs upon encountering behavior they do not understand.
$a in this case is an alias to the array element. Just don't have $a = in your code and you won't modify the array. :-)
If I remember correctly, map, grep, etc. all have the same aliasing behaviour.
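A quick sketch that checks this for map and grep. The arrays @nums and @words are sample data invented for the example.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3);

# $_ inside map is an alias to each element, so assigning to it
# changes the original array as well as the result.
my @doubled = map { $_ *= 2 } @nums;
print "@nums\n";       # 2 4 6

# grep aliases $_ the same way.
my @words = qw(a b);
my @kept  = grep { $_ .= '!' } @words;
print "@words\n";      # a! b!

# Copying $_ first leaves the source untouched.
my @plus_one = map { my $n = $_; $n + 1 } @nums;
print "@nums\n";       # 2 4 6
print "@plus_one\n";   # 3 5 7
```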
As others have said, this is documented.
My understanding is that the aliasing behavior of @_, for, map and grep provides a speed and memory optimization as well as providing interesting possibilities for the creative. What happens is, essentially, a pass-by-reference invocation of the construct's block. This saves time and memory by avoiding unnecessary data copying.
use strict;
use warnings;
use List::MoreUtils qw(apply);
my @array = qw( cat dog horse kangaroo );
foo(@array);
print join "\n", '', 'foo()', @array;
my @mapped = map { s/oo/ee/g } @array;
print join "\n", '', 'map-array', @array;
print join "\n", '', 'map-mapped', @mapped;
my @applied = apply { s/fee//g } @array;
print join "\n", '', 'apply-array', @array;
print join "\n", '', 'apply-applied', @applied;
sub foo {
$_ .= 'foo' for @_;
}
Note the use of List::MoreUtils apply function. It works like map but makes a copy of the topic variable, rather than using a reference. If you hate writing code like:
my @foo = map { my $f = $_; $f =~ s/foo/bar/; $f } @bar;
you'll love apply, which makes it into:
my @foo = apply { s/foo/bar/ } @bar;
Something to watch out for: if you pass read only values into one of these constructs that modifies its input values, you will get a "Modification of a read-only value attempted" error.
perl -e '$_++ for "o"'
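The same failure can be caught and inspected with eval, as in this small sketch. The variable names $ok and @vals are made up for the example.

```perl
use strict;
use warnings;

# A literal in the list is read-only, so modifying the alias dies.
my $ok = eval { $_++ for "o"; 1 };
print "error: $@" unless $ok;

# An array element is a modifiable lvalue, so the same loop is fine.
my @vals = ("o");
$_++ for @vals;
print "@vals\n";   # p
```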
The important distinction here is that when you declare a my variable in the initialization section of a for loop, it seems to share some properties of both locals and lexicals (would someone with more knowledge of the internals care to clarify?)
my @src = 1 .. 10;
for my $x (@src) {
# $x is an alias to elements of @src
}
for (@src) {
my $x = $_;
# $_ is an alias but $x is not an alias
}
The interesting side effect of this is that in the first case, a sub{} defined within the for loop is a closure around whatever element of the list $x was aliased to. Knowing this, it is possible (although a bit odd) to close around an aliased value which could even be a global, which I don't think is possible with any other construct.
our @global = 1 .. 10;
my @subs;
for my $x (@global) {
push @subs, sub {++$x}
}
$subs[5](); # modifies the @global array
Your $a is simply being used as an alias for each element of the list as you loop over it. It's being used in place of $_. You can tell that $a is not a local variable because it is declared outside of the block.
It's more obvious why assigning to $a changes the contents of the list if you think about it as being a stand-in for $_ (which is what it is). In fact, $_ is not set by the loop if you define your own iterator like that.
foreach my $a (1..10) {
print $_; # not the loop element: $_ is not set by this loop
}
If you're wondering what the point is, consider the case:
my @row = (1..10);
my @col = (1..10);
foreach (@row){
print $_;
foreach(@col){
print $_;
}
}
In this case it is more readable to provide a friendlier name for $_
foreach my $x (@row){
print $x;
foreach my $y (@col){
print $y;
}
}
Try
foreach my $a (@_ = @ar)
now modifying $a does not modify #ar.
Works for me on v5.20.2
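A more explicit variant of the same idea is to loop over a named copy, sketched below. The names @copy and $n are my own; $n is used instead of $a only because $a is also the special variable used by sort.

```perl
use strict;
use warnings;

my @ar = (1, 2, 3);

# Loop over a copy so that the alias points at the copy, not at @ar.
my @copy = @ar;
for my $n (@copy) {
    $n = $n + 1;
}

print "@ar\n";     # 1 2 3
print "@copy\n";   # 2 3 4
```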