Should I use $_[0] or copy the argument list in Perl? - perl

If I pass a hash to a sub:
parse(\%data);
Should I use a variable to $_[0] first or is it okay to keep accessing $_[0] whenever I want to get an element from the hash? clarification:
sub parse
{ $var1 = $_[0]->{'elem1'};
$var2 = $_[0]->{'elem2'};
$var3 = $_[0]->{'elem3'};
$var4 = $_[0]->{'elem4'};
$var5 = $_[0]->{'elem5'};
}
# Versus
sub parse
{ my $hr = $_[0];
$var1 = $hr->{'elem1'};
$var2 = $hr->{'elem2'};
$var3 = $hr->{'elem3'};
$var4 = $hr->{'elem4'};
$var5 = $hr->{'elem5'};
}
Is the second version more correct since it doesn't have to keep accessing the argument array, or does Perl end up interpereting them the same way anyhow?

In this case there is no difference because you are passing reference to hash. But in case of passing scalar there will be difference:
sub rtrim {
## remove tailing spaces from first argument
$_[0] =~ s/\s+$//;
}
rtrim($str); ## value of the variable will be changed
sub rtrim_bugged {
my $str = $_[0]; ## this makes a copy of variable
$str =~ s/\s+$//;
}
rtrim($str); ## value of the variable will stay the same
If you're passing hash reference, then only copy of reference is created. But the hash itself will be the same. So if you care about code readability then I suggest you to create a variable for all your parameters. For example:
sub parse {
## you can easily add new parameters to this function
my ($hr) = #_;
my $var1 = $hr->{'elem1'};
my $var2 = $hr->{'elem2'};
my $var3 = $hr->{'elem3'};
my $var4 = $hr->{'elem4'};
my $var5 = $hr->{'elem5'};
}
Also more descriptive variable names will improve your code too.

For a general discussion of the efficiency of shift vs accessing #_ directly, see:
Is there a difference between Perl's shift versus assignment from #_ for subroutine parameters?
Is 'shift' evil for processing Perl subroutine parameters?
As for your specific code, I'd use shift, but simplify the data extraction with a hash slice:
sub parse
{
my $hr = shift;
my ($var1, $var2, $var3, $var4, $var5) = #{$hr}{qw(elem1 elem2 elem3 elem4 elem5)};
}
I'll assume that this method does something else with these variables that makes it worthwhile to keep them in separate variables (perhaps the hash is read-only, and you need to make some modifications before inserting them into some other data?) -- otherwise why not just leave them in the hashref where they started?

You are micro-optimizing; try to avoid that. Go with whatever is most readable/maintainable. Usually this would be the one where you use a lexical variable, since its name indicates its purpose...but if you use a name like $data or $x this obviously doesn't apply.
In terms of the technical details, for most purposes you can estimate the time taken by counting the number of basic ops perl will use. For your $_[0], an element lookup in a non-lexical array variable takes multiple ops: one to get the glob, one to get the array part of the glob, one or more to get the index (just one for a constant), and one to look up the element. $hr, on the other hand is a single op. To cater to direct users of #_, there's an optimization that reduces the ops for $_[0] to a single combined op (when the index is between 0 and 255 inclusive), but it isn't used in your case because the hash-deref context requires an additional flag on the array element lookup (to support autovivification) and that flag isn't supported by the optimized op.
In summary, using a lexical is going to be both more readable and (if you using it more than once) imperceptibly faster.

My rule is that I try not to use $_[0] in subroutines that are longer than a couple of statements. After that everything gets a user-defined variable.
Why are you copying all of the hash values into variables? Just leave them in the hash where they belong. That's a much better optimization than the one you are thinking about.

Its the same although the second is more clear

Since they work, both are fine, the common practice is to shift off parameters.
sub parse { my $hr = shift; my $var1 = $hr->{'elem1'}; }

Related

sub _init in Perl explanation

I was wondering what this subroutine does in Perl. I believe i have the general idea but i'm wondering about some of the syntax.
sub _init
{
my $self = shift;
if (#_) {
my %extra = #_;
#$self{keys %extra} = values %extra;
}
}
Here is what i think it does: essentially add any "extra" key-value pairs to the nameless hash refereced by the variable $self. Also i'm not 100% sure about this but i think my $self = shift is actually referring to the variable $self that called the _init() subroutine.
My specific questions are:
Is $self actually referring to the variable that called the subroutine _init() ?
What does the #$ syntax mean when writing #$self{keys %extra} = values %extra;
Your understanding is correct. This allows the user of a class to set any parameters they want into the object.
Yes. If you call, for example, $myobject->_init('color', 'green'), then this code will set $myobject->{'color'} = 'green'.
This is a somewhat confusing hash operation. keys %extra is a list (obviously of keys). We are effectively using a "hash slice" here. Think of it like an array slice, where you can call #$arrayref[1, 3, 4]. We use the # sign here because we're talking about a list - it's the list of values corresponding to the list of keys keys %extra in the hash referenced by $self.
Another way to write this would be:
foreach my $key (keys %extra) {
$self->{$key} = $extra{$key};
}
Is $self actually referring to the variable that called the subroutine _init()?
Variables don't call subroutines.
The invocant (what's on the left of the -> in ->_init()) is passed to the method as its first argument, and you placed this in $self. (shift() is short for shift(#_) in subs.)
What does the #$ syntax mean when writing #$self{keys %extra} = values %extra;
#hash{LIST} is a hash slice.
#{ EXPR }{LIST} is a hash slice where the hash to slice is specified through a reference. The curlies are optional when EXPR is simple scalar lookup, so #{ $hash_ref }{LIST} can be written as #$hash_ref{LIST}.
The method add the arguments to %$self, the hash-based object used as the invocant. It could also have been written as follows:
%$self = ( %$self, #_ );

Scope of the default variable $_ in Perl

I have the following method which accepts a variable and then displays info from a database:
sub showResult {
if (#_ == 2) {
my #results = dbGetResults($_[0]);
if (#results) {
foreach (#results) {
print "$count - $_[1] (ID: $_[0])\n";
}
} else {
print "\n\nNo results found";
}
}
}
Everything works fine, except the print line in the foreach loop. This $_ variable still contains the values passed to the method.
Is there anyway to 'force' the new scope of values on $_, or will it always contain the original values?
If there are any good tutorials that explain how the scope of $_ works, that would also be cool!
Thanks
The problem here is that you're using really #_ instead of $_. The foreach loop changes $_, the scalar variable, not #_, which is what you're accessing if you index it by $_[X]. Also, check again the code to see what it is inside #results. If it is an array of arrays or refs, you may need to use the indirect ${$_}[0] or something like that.
In Perl, the _ name can refer to a number of different variables:
The common ones are:
$_ the default scalar (set by foreach, map, grep)
#_ the default array (set by calling a subroutine)
The less common:
%_ the default hash (not used by anything by default)
_ the default file handle (used by file test operators)
&_ an unused subroutine name
*_ the glob containing all of the above names
Each of these variables can be used independently of the others. In fact, the only way that they are related is that they are all contained within the *_ glob.
Since the sigils vary with arrays and hashes, when accessing an element, you use the bracket characters to determine which variable you are accessing:
$_[0] # element of #_
$_{...} # element of %_
$$_[0] # first element of the array reference stored in $_
$_->[0] # same
The for/foreach loop can accept a variable name to use rather than $_, and that might be clearer in your situation:
for my $result (#results) {...}
In general, if your code is longer than a few lines, or nested, you should name the variables rather than relying on the default ones.
Since your question was related more to variable names than scope, I have not discussed the actual scope surrounding the foreach loop, but in general, the following code is equivalent to what you have.
for (my $i = 0; $i < $#results; $i++) {
local *_ = \$results[$i];
...
}
The line local *_ = \$results[$i] installs the $ith element of #results into the scalar slot of the *_ glob, aka $_. At this point $_ contains an alias of the array element. The localization will unwind at the end of the loop. local creates a dynamic scope, so any subroutines called from within the loop will see the new value of $_ unless they also localize it. There is much more detail available about these concepts, but I think they are outside the scope of your question.
As others have pointed out:
You're really using #_ and not $_ in your print statement.
It's not good to keep stuff in these variables since they're used elsewhere.
Officially, $_ and #_ are global variables and aren't members of any package. You can localize the scope with my $_ although that's probably a really, really bad idea. The problem is that Perl could use them without you even knowing it. It's bad practice to depend upon their values for more than a few lines.
Here's a slight rewrite in your program getting rid of the dependency on #_ and $_ as much as possible:
sub showResults {
my $foo = shift; #Or some meaningful name
my $bar = shift; #Or some meaningful name
if (not defined $foo) {
print "didn't pass two parameters\n";
return; #No need to hang around
}
if (my #results = dbGetResults($foo)) {
foreach my $item (#results) {
...
}
}
Some modifications:
I used shift to give your two parameters actual names. foo and bar aren't good names, but I couldn't find out what dbGetResults was from, so I couldn't figure out what parameters you were looking for. The #_ is still being used when the parameters are passed, and my shift is depending upon the value of #_, but after the first two lines, I'm free.
Since your two parameters have actual names, I can use the if (not defined $bar) to see if both parameters were passed. I also changed this to the negative. This way, if they didn't pass both parameters, you can exit early. This way, your code has one less indent, and you don't have a if structure that takes up your entire subroutine. It makes it easier to understand your code.
I used foreach my $item (#results) instead of foreach (#results) and depend upon $_. Again, it's clearer what your program is doing, and you wouldn't have confused $_->[0] with $_[0] (I think that's what you were doing). It would have been obvious you wanted $item->[0].

Confusion about proper usage of dereference in Perl

I noticed the other day that - while altering values in a hash - that when you dereference a hash in Perl, you actually are making a copy of that hash. To confirm I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows its probably not that big of a deal (in terms of the need to go fix in all of my scripts - but important going forward). I think its going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and # when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
#hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
#$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
#{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
#{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my #array = #$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.

Is 'shift' evil for processing Perl subroutine parameters?

I'm frequently using shift to unpack function parameters:
sub my_sub {
my $self = shift;
my $params = shift;
....
}
However, many on my colleagues are preaching that shift is actually evil. Could you explain why I should prefer
sub my_sub {
my ($self, $params) = #_;
....
}
to shift?
The use of shift to unpack arguments is not evil. It's a common convention and may be the fastest way to process arguments (depending on how many there are and how they're passed). Here's one example of a somewhat common scenario where that's the case: a simple accessor.
use Benchmark qw(cmpthese);
sub Foo::x_shift { shift->{'a'} }
sub Foo::x_ref { $_[0]->{'a'} }
sub Foo::x_copy { my $s = $_[0]; $s->{'a'} }
our $o = bless {a => 123}, 'Foo';
cmpthese(-2, { x_shift => sub { $o->x_shift },
x_ref => sub { $o->x_ref },
x_copy => sub { $o->x_copy }, });
The results on perl 5.8.8 on my machine:
Rate x_copy x_ref x_shift
x_copy 772761/s -- -12% -19%
x_ref 877709/s 14% -- -8%
x_shift 949792/s 23% 8% --
Not dramatic, but there it is. Always test your scenario on your version of perl on your target hardware to find out for sure.
shift is also useful in cases where you want to shift off the invocant and then call a SUPER:: method, passing the remaining #_ as-is.
sub my_method
{
my $self = shift;
...
return $self->SUPER::my_method(#_);
}
If I had a very long series of my $foo = shift; operations at the top of a function, however, I might consider using a mass copy from #_ instead. But in general, if you have a function or method that takes more than a handful of arguments, using named parameters (i.e., catching all of #_ in a %args hash or expecting a single hash reference argument) is a much better approach.
It is not evil, it is a taste sort of thing. You will often see the styles used together:
sub new {
my $class = shift;
my %self = #_;
return bless \%self, $class;
}
I tend to use shift when there is one argument or when I want to treat the first few arguments differently than the rest.
This is, as others have said, a matter of taste.
I generally prefer to shift my parameters into lexicals because it gives me a standard place to declare a group of variables that will be used in the subroutine. The extra verbosity gives me a nice place to hang comments. It also makes it easy to provide default values in a tidy fashion.
sub foo {
my $foo = shift; # a thing of some sort.
my $bar = shift; # a horse of a different color.
my $baz = shift || 23; # a pale horse.
# blah
}
If you are really concerned about the speed of calling your routines, don't unpack your arguments at all--access them directly using #_. Be careful, those are references to the caller's data you are working with. This is a common idiom in POE. POE provides a bunch of constants that you use to get positional parameters by name:
sub some_poe_state_handler {
$_[HEAP]{some_data} = 'chicken';
$_[KERNEL]->yield('next_state');
}
Now the big stupid bug you can get if you habitually unpack params with shift is this one:
sub foo {
my $foo = shift;
my $bar = shift;
my #baz = shift;
# I should really stop coding and go to bed. I am making dumb errors.
}
I think that consistent code style is more important than any particular style. If all my coworkers used the list assignment style, I'd use it too.
If my coworkers said there was a big problem using shift to unpack, I'd ask for a demonstration of why it is bad. If the case is solid, then I'd learn something. If the case is bogus, I could then refute it and help stop the spread of anti-knowledge. Then I'd suggest that we determine a standard method and follow it for future code. I might even try to set up a Perl::Critic policy to check for the decided upon standard.
Assigning #_ to a list can bring some helpul addtional features.
It makes it slightly easier to add additional named parameters at a later date, as you modify your code Some people consider this a feature, similar to how finishing a list or a hash with a trailing ',' makes it slightly easier to append members in the future.
If you're in the habit of using this idiom, then shifting the arguments might seem harmful, because if you edit the code to add an extra argument, you could end up with a subtle bug, if you don't pay attention.
e.g.
sub do_something {
my ($foo) = #_;
}
later edited to
sub do_something {
my ($foo,$bar) = #_; # still works
}
however
sub do_another_thing {
my $foo = shift;
}
If another colleague, who uses the first form dogmatically (perhaps they think shift is evil) edits your file and absent-mindedly updates this to read
sub do_another_thing {
my ($foo, $bar) = shift; # still 'works', but $bar not defined
}
and they may have introduced a subtle bug.
Assigning to #_ can be more compact and efficient with vertical space, when you have a small number of parameters to assign at once. It also allows for you to supply default arguments if you're using the hash style of named function parameters
e.g.
my (%arguments) = (user=>'defaultuser',password=>'password',#_);
I would still consider it a question of style / taste. I think the most important thing is to apply one style or the other with consistency, obeying the principle of least surprise.
I don't think shift is evil. The use of shift shows your willingness to actually name variables - instead of using $_[0].
Personally, I use shift when there's only one parameter to a function. If I have more than one parameter, I'll use the list context.
my $result = decode($theString);
sub decode {
my $string = shift;
...
}
my $otherResult = encode($otherString, $format);
sub encode {
my ($string,$format) = #_;
...
}
There is an optimization for list assignment.
The only reference I could find, is this one.
5.10.0 inadvertently disabled an optimization, which caused a
measurable performance drop in list
assignment, such as is often used to
assign function parameters from #_ .
The optimisation has been re-instated,
and the performance regression fixed.
This is an example of the affected performance regression.
sub example{
my($arg1,$arg2,$arg3) = #_;
}
Perl::Critic is your friend here. It follows the "standards" set up in Damian Conway's book Perl Best Practices. Running it with --verbose 11 gives you an explanation on why things are bad. Not unpacking #_ first in your subs is a severity 4 (out of 5). E.g:
echo 'package foo; use warnings; use strict; sub quux { my foo= shift; my (bar,baz) = #_;};1;' | perlcritic -4 --verbose 11
Always unpack #_ first at line 1, near 'sub quux { my foo= shift; my (bar,baz) = #_;}'.
Subroutines::RequireArgUnpacking (Severity: 4)
Subroutines that use `#_' directly instead of unpacking the arguments to
local variables first have two major problems. First, they are very hard
to read. If you're going to refer to your variables by number instead of
by name, you may as well be writing assembler code! Second, `#_'
contains aliases to the original variables! If you modify the contents
of a `#_' entry, then you are modifying the variable outside of your
subroutine. For example:
sub print_local_var_plus_one {
my ($var) = #_;
print ++$var;
}
sub print_var_plus_one {
print ++$_[0];
}
my $x = 2;
print_local_var_plus_one($x); # prints "3", $x is still 2
print_var_plus_one($x); # prints "3", $x is now 3 !
print $x; # prints "3"
This is spooky action-at-a-distance and is very hard to debug if it's
not intentional and well-documented (like `chop' or `chomp').
An exception is made for the usual delegation idiom
`$object->SUPER::something( #_ )'. Only `SUPER::' and `NEXT::' are
recognized (though this is configurable) and the argument list for the
delegate must consist only of `( #_ )'.
It isn't intrinsically evil, but using it to pull off the arguments of a subroutine one by one is comparatively slow and requires a greater number of lines of code.

What does '#_' do in Perl?

I was glancing through some code I had written in my Perl class and I noticed this.
my ($string) = #_;
my #stringarray = split(//, $string);
I am wondering two things:
The first line where the variable is in parenthesis, this is something you do when declaring more than one variable and if I removed them it would still work right?
The second question would be what does the #_ do?
The #_ variable is an array that contains all the parameters passed into a subroutine.
The parentheses around the $string variable are absolutely necessary. They designate that you are assigning variables from an array. Without them, the #_ array is assigned to $string in a scalar context, which means that $string would be equal to the number of parameters passed into the subroutine. For example:
sub foo {
my $bar = #_;
print $bar;
}
foo('bar');
The output here is 1--definitely not what you are expecting in this case.
Alternatively, you could assign the $string variable without using the #_ array and using the shift function instead:
sub foo {
my $bar = shift;
print $bar;
}
Using one method over the other is quite a matter of taste. I asked this very question which you can check out if you are interested.
When you encounter a special (or punctuation) variable in Perl, check out the perlvar documentation. It lists them all, gives you an English equivalent, and tells you what it does.
Perl has two different contexts, scalar context, and list context. An array '#_', if used in scalar context returns the size of the array.
So given these two examples, the first one gives you the size of the #_ array, and the other gives you the first element.
my $string = #_ ;
my ($string) = #_ ;
Perl has three 'Default' variables $_, #_, and depending on who you ask %_. Many operations will use these variables, if you don't give them a variable to work on. The only exception is there is no operation that currently will by default use %_.
For example we have push, pop, shift, and unshift, that all will accept an array as the first parameter.
If you don't give them a parameter, they will use the 'default' variable instead. So 'shift;' is the same as 'shift #_;'
The way that subroutines were designed, you couldn't formally tell the compiler which values you wanted in which variables. Well it made sense to just use the 'default' array variable '#_' to hold the arguments.
So these three subroutines are (nearly) identical.
sub myjoin{
my ( $stringl, $stringr ) = #_;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift;
my $stringr = shift;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift #_;
my $stringr = shift #_;
return "$stringl$stringr";
}
I think the first one is slightly faster than the other two, because you aren't modifying the #_ variable.
The variable #_ is an array (hence the # prefix) that holds all of the parameters to the current function.