I had a small question. I was reading some code, and since my school didn't teach me anything useful about Perl programming, I'm here to ask you people. I see this line being used a lot in some Perl programs:
$variable = &something();
I don't know what the & sign means here, as I've never seen it in Perl before. The something is a subroutine, I'm guessing; it usually has a name and sometimes takes arguments like a function. Can someone tell me what & stands for here and what that something is?
The variable takes some sort of returned value and is then used to check some conditions, which makes me think it is a subroutine. But still, why the &?
Thanks
Virtually every time you see & outside of \&foo and EXPR && EXPR, it's an error.
&foo(...) is the same as foo(...), except that foo's prototype will be ignored.
sub foo(&@) { ... }   # Causes foo to take a BLOCK as its first arg
foo { ... } ...;
&foo(sub { ... }, ...); # Same thing.
Only subroutines (not operators) will be called by &foo(...).
sub print { ... }
print(...); # Calls the print builtin
&print(...); # Calls the print sub.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely someone using & when they shouldn't.
&foo is similar to &foo(@_), except that foo receives the current sub's @_ itself rather than a copy, so changes foo makes to @_ affect the current sub's @_.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely someone using & when they shouldn't or a foolish attempt at optimization. However, the following is pretty elegant:
sub log_info  { unshift @_, 'info';  &log }
sub log_warn  { unshift @_, 'warn';  &log }
sub log_error { unshift @_, 'error'; &log }
goto &foo is similar to &foo, except the current subroutine is removed from the call stack first. This will cause it to not show up in stack traces, for example.
You'll probably never need to use this feature in your entire programming career. If you see it used, it's surely a foolish attempt at optimization.
sub log_info  { unshift @_, 'info';  goto &log; }  # These are slower than
sub log_warn  { unshift @_, 'warn';  goto &log; }  # not using goto, but
sub log_error { unshift @_, 'error'; goto &log; }  # maybe log uses caller()?
$& contains whatever the last successful regex match matched. Before 5.20, using it causes every regex in your entire interpreter to become slower (if it has no captures), so don't use this.
print $& if /fo+/; # Bad before 5.20
print $MATCH if /fo+/; # Bad (Same thing. Requires "use English;")
print ${^MATCH} if /fo+/p; # Ok (Requires Perl 5.10)
print $1 if /(fo+)/; # Ok
defined &foo is a perfectly legitimate way of checking if a subroutine exists, but it's not something you'll likely ever need. There's also exists &foo, which is similar but not as useful.
EXPR & EXPR is the bitwise AND operator. This is used when dealing with low-level systems that store multiple pieces of information in a single word.
system($cmd);
die "Can't execute command: $!\n" if $? == -1;
die "Child kill by ".($? & 0x7F)."\n" if $? & 0x7F;
die "Child exited with ".($? >> 8)."\n" if $? >> 8;
&{ EXPR }() (and &$ref()) is a subroutine call via a reference. This is a perfectly acceptable and somewhat common thing to do, though I prefer the $ref->() syntax. Example in next item.
\&foo takes a reference to subroutine foo. This is a perfectly acceptable and somewhat common thing to do.
my %dispatch = (
foo => \&foo,
bar => \&bar,
);
my $handler = $dispatch{$cmd} or die;
$handler->();
# Same: &{ $handler }();
# Same: &$handler();
EXPR && EXPR is the boolean AND operator. I'm sure you're familiar with this extremely common operator.
if (0 <= $x && $x <= 100) { ... }
In older versions of Perl, & was used to call subroutines. Now this is no longer necessary, and \& is mostly used to take a reference to a subroutine,
my $sub_ref = \&subroutine;
or to ignore a function's prototype (http://perldoc.perl.org/perlsub.html#Prototypes).
Other than for referencing subroutines, & is the bitwise AND operator:
http://perldoc.perl.org/perlop.html#Bitwise-And
eval is slow when done on a string: The string first has to be parsed before it can be executed.
I am looking for a way to cache the parsing, so that I can reuse the parsed string for yet another eval. The next eval will be the same code, but will not eval to the same value, so I cannot simply cache the results.
From the description I am looking for ceval from Eval::Compile.
But I cannot use Eval::Compile, as that requires a C compiler for the platform, and it is not given that the user has a C compiler.
So can I do something similar to ceval in pure Perl?
Background
GNU Parallel lets the user give Perl expressions that will be eval'ed on every argument. Currently the Perl expressions are given as strings by the user and eval'ed for every argument. The Perl expressions remain unchanged for each argument. It is therefore a waste to recompile the expression as the recompilation will not change anything.
Profiling of the code shows that the eval is one of the bottlenecks.
Example
The user enters: $_ .= "foo" and s/a/b/g
The user's scripts are stored in $usereval1 and $usereval2.
The user gives 10000 random arguments (strings) stored in @arguments.
sub replace {
    my ($script, $arg) = @_;
    local $_;
    $_ = $arg;
    # This is where I would like to cache the parsed $script.
    eval $script;
    return $_;
}
for my $arg (@arguments) {
    # Loads of indirect code (on the order of 1000 lines) that
    # calls subs calling subs calling subs that eventually do:
    $replaced1 = replace($usereval1, $arg);
    $replaced2 = replace($usereval2, $arg);
    # Yet more code that does stuff with $replaced1 and $replaced2
}
You can store a subroutine ref like this:
perl -lwe 'my $x = eval(q( sub { my $foo = shift; $foo*2; } )); print $x->(12);'
This prints 24. You can reuse the code without the need to recompile it.
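Applied to the replace() example above, one way to get that caching in pure Perl is to compile each user script into a closure once and memoize it by its text. This is a sketch of my own, not from the answer: the %compiled cache and the sub { ... } wrapping are assumptions, and //= needs Perl 5.10.
my %compiled;   # user script text => compiled code ref

sub replace {
    my ($script, $arg) = @_;
    # Parse the script only the first time we see it.
    $compiled{$script} //= eval "sub { $script }"
        or die "Can't compile user script: $@";
    local $_ = $arg;
    $compiled{$script}->();   # run the already-compiled code
    return $_;
}
Each later call with the same $script skips the parse and only runs the cached sub.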
I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
It's rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the @ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of @ARGV as arguments
# and print the return value with a trailing newline
print $command->(@ARGV);
print "\n";
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}
sub subtract {
    my ($x, $y) = @_;
    return $x - $y;
}
This script (say it's named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command-line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3. Something like this:
my $num1 = $ARGV[0];
my $num2 = $ARGV[1];
print $num1 + $num2;
will print 5
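If you'd rather read commands interactively from the console, as in the "add 2, 4" example, Perl's split is the counterpart of VB.NET's Split. This is a small sketch of my own (the command name and the separator pattern are just assumptions), not part of the answer above:
$| = 1;                       # flush the prompt right away
print "> ";
while (my $line = <STDIN>) {
    chomp $line;
    # break the line into words on commas and/or whitespace:
    # "add 2, 4" becomes ("add", 2, 4)
    my ($cmd, @args) = split /[,\s]+/, $line;
    if (!defined $cmd || !length $cmd) {
        # empty line, just re-prompt
    }
    elsif ($cmd eq 'add') {
        print $args[0] + $args[1], "\n";
    }
    else {
        print "Unknown command: $cmd\n";
    }
    print "> ";
}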
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are separated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 declares a minimum version; it also enables the say builtin (like print, but it appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
    next if /^\s*#/;   # skip comment lines
    # parse the line. We get a destination $var, a $command, and any number of @args
    my ($var, $command, @args) = parse($_);
    # Execute the anonymous sub specified by $command with the @args
    my $value = $functions{ $command }->(@args);
    # Store the return value if a destination $var was specified
    $env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flag allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion, which matches where the last successful match ended. The /c flag is important for this m//gc style of parsing: it consumes one match at a time and prevents the regex engine's position in our string from being reset on a failed match.
sub parse {
    my ($line) = @_;              # get the $line, which is the argument
    my ($var, $command, @args);   # declare variables to be filled
    # Test if this statement has a variable declaration
    if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
        $var = $1;   # assign first capture if successful
    }
    # Parse the function of this statement.
    if ($line =~ m/\G\s* (\w+) \s*/xgc) {
        $command = $1;
        # Test if the specified function exists in our %functions
        if (not exists $functions{$command}) {
            die "The command $command is not known\n";
        }
    } else {
        die "Command required\n";   # Throw fatal exception on parse error.
    }
    # As long as our matches haven't consumed the whole string...
    while (pos($line) < length($line)) {
        # Try to match variables
        if ($line =~ m/\G \$(\w+) \s*/xgc) {
            die "The variable $1 does not exist\n" if not exists $env{$1};
            push @args, $env{$1};
        }
        # Try to match strings
        elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
            push @args, $1;
        }
        # Try to match ints or floats
        elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
            push @args, 0+$1;
        }
        # Throw error if nothing matched
        else {
            die "Didn't understand that line\n";
        }
    }
    # return our -- now filled -- vars.
    return $var, $command, @args;
}
Perl arrays can be handled like a linked list: shift removes and returns the first element (pop does the same for the last element); push adds an element to the end, unshift to the beginning.
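A quick illustration of those four operations (my own example, not part of the original answer):
my @queue = (1, 2, 3);
push    @queue, 4;           # @queue is now (1, 2, 3, 4)
my $first = shift @queue;    # $first is 1, @queue is (2, 3, 4)
unshift @queue, 0;           # @queue is now (0, 2, 3, 4)
my $last  = pop @queue;      # $last is 4, @queue is (0, 2, 3)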
Our little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved as my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language is saved as meaning_of_life.mll and also set to be executable, then
$ ./meaning_of_life.mll
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).
I am learning Perl and understand that it is a common and accepted practice to unpack subroutine arguments using shift. I also understand that it is common and acceptable practice to omit function arguments to use the default @_ array.
Considering these two things, if you call a subroutine without arguments, @_ can (and will, if using shift) be changed. Does this mean that calling another subroutine with default arguments, or, in fact, using the @_ array after this, is considered bad practice? Consider this example:
sub total { # calculate sum of all arguments
    my $running_sum;
    # take arguments one by one and sum them together
    while (@_) {
        $running_sum += shift;
    }
    $running_sum;
}
sub avg { # calculate the mean of given arguments
    if (@_ == 0) { return }
    my $sum = &total;   # gets the correct answer, but changes @_
    $sum / @_           # causes division by zero, since @_ is now empty
}
My gut feeling tells me that using shift to unpack arguments would actually be bad practice, unless your subroutine is actually supposed to change the passed arguments, but I have read in multiple places, including Stack Overflow, that this is not a bad practice.
So the question is: if using shift is common practice, should I always assume the passed argument list could get changed, as a side-effect of the subroutine (like the &total subroutine in the quoted example)? Is there maybe a way to pass arguments by value, so I can be sure that the argument list does not get changed, so I could use it again (like in the &avg subroutine in the quoted text)?
In general, shifting from the arguments is OK; using the & sigil to call functions isn't. (Except in some very specific situations you'll probably never encounter.)
Your code could be rewritten so that total doesn't shift from @_. Using a for-loop may even be more efficient.
sub total {
    my $total = 0;
    $total += $_ for @_;
    $total;
}
Or you could use the sum function from List::Util:
use List::Util qw(sum);
sub avg { @_ ? sum(@_) / @_ : 0 }
Using shift isn't that common, except for extracting $self in object oriented Perl. But as you always call your functions like foo( ... ), it doesn't matter if foo shifts or doesn't shift the argument array.
(The only thing worth noting about a function is whether it assigns to elements in @_, as these are aliases for the variables you gave as arguments. Assigning to elements in @_ is usually bad.)
Even if you can't change the implementation of total, calling the sub with an explicit argument list is safe, as the argument list is a copy of the array:
(a) &total — calls total with the identical @_, and overrides prototypes.
(b) total(@_) — calls total with a copy of @_.
(c) &total(@_) — calls total with a copy of @_, and overrides prototypes.
Form (b) is standard. Form (c) shouldn't be seen, except in the very few cases where a sub inside the same package has a prototype (and don't use prototypes) that has to be overridden for some obscure reason. A testament to poor design.
Form (a) is only sensible for tail calls (@_ = (...); goto &foo) or other forms of optimization (and premature optimization is the root of all evil).
You should avoid using the &func; style of calling unless you have a really good reason, and trust that others do the same.
To guard your @_ against modification by a callee, just do &func() or func.
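Applied to the avg example from the question, calling total with an explicit list (form (b) above) keeps the caller's @_ intact; a minimal sketch:
sub avg {
    return unless @_;
    my $sum = total(@_);   # total gets a copy, so our @_ is untouched
    return $sum / @_;      # @_ still holds the original arguments
}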
Perl is a little too lax sometimes and having multiple ways of accessing input parameters can make smelly and inconsistent code. For want of a better answer, try to impose your own standard.
Here are a few ways I've used and seen:
Shifty
sub login
{
my $user = shift;
my $passphrase = shift;
# Validate authentication
return 0;
}
Expanding @_
sub login
{
    my ($user, $passphrase) = @_;
    # Validate authentication
    return 0;
}
Explicit indexing
sub login
{
    my $user = $_[0];
    my $passphrase = $_[1];
    # Validate authentication
    return 0;
}
Enforce parameters with function prototypes (this is not popular, however)
sub login($$)
{
    my ($user, $passphrase) = @_;
    # Validate authentication
    return 0;
}
Sadly you still have to perform your own convoluted input validation/taint checking, i.e.:
return unless defined $user;
return unless defined $passphrase;
or, better still, a little more informative:
unless (defined($user) && defined($passphrase)) {
carp "Input error: user or passphrase not defined";
return -1;
}
perldoc perlsub should really be your first port of call.
Hope this helps!
Here are some examples where the careful use of @_ matters.
1. Hash-y Arguments
Sometimes you want to write a function which can take a list of key-value pairs, but one of them is so common that you want it to be available without needing a key. For example
sub get_temp {
    my $location = @_ % 2 ? shift : undef;
    my %options  = @_;
    $location ||= $options{location};
    ...
}
So now if you call the function with an odd number of arguments, the first is the location. This allows get_temp('Chicago') or get_temp('New York', unit => 'C') or even get_temp( unit => 'K', location => 'Nome, Ak'). This may be a more convenient API for your users. By shifting off the odd argument, @_ is now an even-sized list and may be assigned to a hash.
2. Dispatching
Let's say we have a class whose methods we want to dispatch by name (possibly AUTOLOAD could be useful, but we will hand-roll it). Perhaps this is a command-line script where the arguments are method names. In this case we define two dispatch methods, one "clean" and one "dirty". If we call with the -c flag we get the clean one. These methods find the method by name and call it. The difference is how: the dirty one leaves itself in the stack trace, while the clean one has to be more clever, but dispatches without appearing in the stack trace. We add a death method which gives us such a trace.
#!/usr/bin/env perl
use strict;
use warnings;
package Unusual;
use Carp;
sub new {
    my $class = shift;
    return bless { @_ }, $class;
}
sub dispatch_dirty {
    my $self = shift;
    my $name = shift;
    my $method = $self->can($name) or confess "No method named $name";
    $self->$method(@_);
}
sub dispatch_clean {
    my $self = shift;
    my $name = shift;
    my $method = $self->can($name) or confess "No method named $name";
    unshift @_, $self;
    goto $method;
}
sub death {
    my ($self, $message) = @_;
    $message ||= 'died';
    confess "$self->{name}: $message";
}
package main;
use Getopt::Long;
GetOptions
'clean' => \my $clean,
'name=s' => \(my $name = 'Robot');
my $obj = Unusual->new(name => $name);
if ($clean) {
    $obj->dispatch_clean(@ARGV);
} else {
    $obj->dispatch_dirty(@ARGV);
}
So now if we call ./test.pl to invoke the death method
$ ./test.pl death Goodbye
Robot: Goodbye at ./test.pl line 32
Unusual::death('Unusual=HASH(0xa0f7188)', 'Goodbye') called at ./test.pl line 19
Unusual::dispatch_dirty('Unusual=HASH(0xa0f7188)', 'death', 'Goodbye') called at ./test.pl line 46
but we see dispatch_dirty in the trace. If instead we call ./test.pl -c, we now use the clean dispatcher and get
$ ./test.pl -c death Adios
Robot: Adios at ./test.pl line 33
Unusual::death('Unusual=HASH(0x9427188)', 'Adios') called at ./test.pl line 44
The key here is the goto (not the evil goto) which takes the subroutine reference and immediately switches the execution to that reference, using the current @_. This is why I have to unshift @_, $self so that the invocant is ready for the new method.
Refs:
sub refWay {
    my ($refToArray, $secondParam, $thirdParam) = @_;
    # work here
}
refWay(\@array, 'a', 'b');
HashWay:
sub hashWay {
    my $refToHash = shift;   # (if passing a ref to a hash)
    # and I know that:
    return undef unless exists $refToHash->{'user'};
    return undef unless exists $refToHash->{'password'};
    # or the same in a loop:
    for (qw(user password etc)) {
        return undef unless exists $refToHash->{$_};
    }
}
hashWay({ 'user' => 'YourName', 'password' => 'YourPassword' });
I tried a simple example:
#!/usr/bin/perl
use strict;
sub total {
    my $sum = 0;
    while (@_) {
        $sum = $sum + shift;
    }
    return $sum;
}
sub total1 {
    my ($a, $aa, $aaa) = @_;
    return ($a + $aa + $aaa);
}
my $s;
$s = total(10, 20, 30);
print $s;
$s = total1(10, 20, 30);
print "\n$s";
Both print statements gave answer as 60.
But personally, I feel the arguments should be accepted in this manner:
my (arguments, @garb) = @_;
in order to avoid any sort of issue later.
I found the following gem in http://perldoc.perl.org/perlsub.html:
"Yes, there are still unresolved issues having to do with visibility of #_ . I'm ignoring that question for the moment. (But note that if we make #_ lexically scoped, those anonymous subroutines can act like closures... (Gee, is this sounding a little Lispish? (Never mind.)))"
You might have run into one of those issues :-(
OTOH amon is probably right -> +1
I'm dismayed. OK, so this was probably the most fun Perl bug I've ever found. Even today I'm learning new stuff about Perl. Essentially, the flip-flop operator .. (which returns false until the left-hand side returns true, and then true until the right-hand side returns false) keeps global state, or that is what I assume.
Can I reset it (perhaps this would be a good addition to the Perl 4-esque, hardly ever used reset())? Or is there no way to use this operator safely?
I also don't see this (the global state bit) documented anywhere in perldoc perlop. Is this a mistake?
Code
use feature ':5.10';
use strict;
use warnings;
sub search {
    my $arr = shift;
    grep { !( /start/ .. /never_exist/ ) } @$arr;
}
my @foo = qw/foo bar start baz end quz quz/;
my @bar = qw/foo bar start baz end quz quz/;
say 'first shot - foo';
say for search \@foo;
say 'second shot - bar';
say for search \@bar;
Spoiler
$ perl test.pl
first shot
foo
bar
second shot
Can someone clarify what the issue with the documentation is? It clearly indicates:
Each ".." operator maintains its own boolean state.
There is some vagueness there about what "Each" means, but I don't think the documentation would be well served by a complex explanation.
Note that Perl's other iterators (each or scalar-context glob) can lead to the same problems. Because the state for each is bound to a particular hash, not a particular bit of code, each can be reset by calling (even in void context) keys on the hash; a small sketch of that reset follows the glob example below. But for glob or .., there is no reset mechanism available except by calling the iterator until it is reset. A sample glob bug:
sub globme {
print "globbing $_[0]:\n";
print "got: ".glob("{$_[0]}")."\n" for 1..2;
}
globme("a,b,c");
globme("d,e,f");
__END__
globbing a,b,c:
got: a
got: b
globbing d,e,f:
got: c
Use of uninitialized value in concatenation (.) or string at - line 3.
got:
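The keys-based reset for each mentioned above looks like this (a small sketch of my own, with a made-up hash):
my %h = (a => 1, b => 2, c => 3);
my ($first_key) = each %h;   # grab one pair; the iterator now points at the next one
keys %h;                     # calling keys, even in void context, resets the iterator
my ($restarted) = each %h;   # iteration starts from the beginning again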
For the overly curious, here are some examples where the same .. in the source is a different .. operator:
Separate closures:
sub make_closure {
my $x;
return sub {
$x if 0; # Look, ma, I'm a closure
scalar( $^O..!$^O ); # handy values of true..false that don't trigger ..'s implicit comparison to $.
}
}
print make_closure()->(), make_closure()->();
__END__
11
Comment out the $x if 0 line to see that non-closures have a single .. operation shared by all "copies", with the output being 12.
Threads:
use threads;
sub coderef { sub { scalar( $^O..!$^O ) } }
coderef()->();
print threads->create( coderef() )->join(), threads->create( coderef() )->join();
__END__
22
Threaded code starts with whatever the state of the .. had been before thread creation, but changes to its state in the thread are isolated from affecting anything else.
Recursion:
sub flopme {
my $recurse = $_[0];
flopme($recurse-1) if $recurse;
print " "x$recurse, scalar( $^O..!$^O ), "\n";
flopme($recurse-1) if $recurse;
}
flopme(2)
__END__
1
 1
2
  1
3
 2
4
Each depth of recursion is a separate .. operator.
The trick is to not use the same flip-flop, so you have no state to worry about. Just make a generator function that gives you a new subroutine with a new flip-flop that you only use once:
sub make_search {
    my ($left, $right) = @_;
    sub {
        grep { !( /\Q$left\E/ .. /\Q$right\E/ ) } @{$_[0]};
    }
}
my $search_sub1 = make_search( 'start', 'never_existed' );
my $search_sub2 = make_search( 'start', 'never_existed' );
my @foo = qw/foo bar start baz end quz quz/;
my $count1 = $search_sub1->( \@foo );
my $count2 = $search_sub2->( \@foo );
print "count1 $count1 and count2 $count2\n";
I also write about this in Make exclusive flip-flop operators.
The "range operator" .. is documented in perlop under "Range Operators". Looking through the doucmentation, it appears that there isn't any way to reset the state of the .. operator. Each instance of the .. operator keeps its own state, which means there isn't any way to refer to the state of any particular .. operator.
It looks like it's designed for very small scripts such as:
if (101 .. 200) { print; }
The documentation states that this is short for
if ($. == 101 .. $. == 200) { print; }
Somehow the use of $. is implicit there (toolic points out in a comment that that's documented too). The idea seems to be that this loop runs once (until $. == 200) in a given instance of the Perl interpreter, and therefore you don't need to worry about resetting the state of the .. flip-flop.
This operator doesn't seem too useful in a more general reusable context, for the reasons you've identified.
A workaround/hack/cheat for your particular case is to append the end value to your array:
sub search {
    my $arr = shift;
    grep { !( /start/ .. /never_exist/ ) } @$arr, 'never_exist';
}
This will guarantee that the RHS of the range operator will eventually be true.
Of course, this is in no way a general solution.
In my opinion, this behavior is not clearly documented. If you can construct a clear explanation, you could submit a patch to perlop.pod via perlbug.
I found this problem, and as far as I know there's no way to fix it. The upshot is: don't use the .. operator in functions unless you are sure you leave it in the false state when you leave the function; otherwise, the function may return different output (or exhibit different behaviour) for the same input.
Each use of the .. operator maintains its own state. Like Alex Brown said, you need to leave it in the false state when you leave the function. Maybe you could do something like:
sub search {
    my $arr = shift;
    grep { !( /start/       || $_ eq "my magic reset string" ..
              /never_exist/ || $_ eq "my magic reset string" ) }
        (@$arr, "my magic reset string");
}
I'm frequently using shift to unpack function parameters:
sub my_sub {
my $self = shift;
my $params = shift;
....
}
However, many of my colleagues are preaching that shift is actually evil. Could you explain why I should prefer
sub my_sub {
    my ($self, $params) = @_;
    ....
}
to shift?
The use of shift to unpack arguments is not evil. It's a common convention and may be the fastest way to process arguments (depending on how many there are and how they're passed). Here's one example of a somewhat common scenario where that's the case: a simple accessor.
use Benchmark qw(cmpthese);
sub Foo::x_shift { shift->{'a'} }
sub Foo::x_ref { $_[0]->{'a'} }
sub Foo::x_copy { my $s = $_[0]; $s->{'a'} }
our $o = bless {a => 123}, 'Foo';
cmpthese(-2, { x_shift => sub { $o->x_shift },
x_ref => sub { $o->x_ref },
x_copy => sub { $o->x_copy }, });
The results on perl 5.8.8 on my machine:
           Rate  x_copy   x_ref  x_shift
x_copy  772761/s      --    -12%     -19%
x_ref   877709/s     14%      --      -8%
x_shift 949792/s     23%      8%       --
Not dramatic, but there it is. Always test your scenario on your version of perl on your target hardware to find out for sure.
shift is also useful in cases where you want to shift off the invocant and then call a SUPER:: method, passing the remaining @_ as-is.
sub my_method
{
    my $self = shift;
    ...
    return $self->SUPER::my_method(@_);
}
If I had a very long series of my $foo = shift; operations at the top of a function, however, I might consider using a mass copy from @_ instead. But in general, if you have a function or method that takes more than a handful of arguments, using named parameters (i.e., catching all of @_ in a %args hash or expecting a single hash reference argument) is a much better approach.
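For instance, the named-parameter style might look like this; a minimal sketch of my own with made-up field names and defaults:
sub create_user {
    # defaults first, then caller-supplied pairs override them
    my %args = ( name => 'anonymous', admin => 0, @_ );
    die "name required\n" unless length $args{name};
    return \%args;
}

# callers pass named arguments in any order
my $user = create_user( name => 'alice', admin => 1 );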
It is not evil; it is a matter of taste. You will often see the styles used together:
sub new {
    my $class = shift;
    my %self  = @_;
    return bless \%self, $class;
}
I tend to use shift when there is one argument or when I want to treat the first few arguments differently than the rest.
This is, as others have said, a matter of taste.
I generally prefer to shift my parameters into lexicals because it gives me a standard place to declare a group of variables that will be used in the subroutine. The extra verbosity gives me a nice place to hang comments. It also makes it easy to provide default values in a tidy fashion.
sub foo {
my $foo = shift; # a thing of some sort.
my $bar = shift; # a horse of a different color.
my $baz = shift || 23; # a pale horse.
# blah
}
If you are really concerned about the speed of calling your routines, don't unpack your arguments at all; access them directly using @_. Be careful: those are references to the caller's data you are working with. This is a common idiom in POE. POE provides a bunch of constants that you use to get positional parameters by name:
sub some_poe_state_handler {
$_[HEAP]{some_data} = 'chicken';
$_[KERNEL]->yield('next_state');
}
Now the big stupid bug you can get if you habitually unpack params with shift is this one:
sub foo {
    my $foo = shift;
    my $bar = shift;
    my @baz = shift;
    # I should really stop coding and go to bed. I am making dumb errors.
}
I think that consistent code style is more important than any particular style. If all my coworkers used the list assignment style, I'd use it too.
If my coworkers said there was a big problem using shift to unpack, I'd ask for a demonstration of why it is bad. If the case is solid, then I'd learn something. If the case is bogus, I could then refute it and help stop the spread of anti-knowledge. Then I'd suggest that we determine a standard method and follow it for future code. I might even try to set up a Perl::Critic policy to check for the decided upon standard.
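For example, if the team settled on the "unpack @_ into named variables first" rule, a .perlcriticrc entry along these lines could enforce it (a sketch; bumping the severity is my own choice):
# .perlcriticrc -- shared Perl::Critic configuration
# Report subs that use @_ directly instead of unpacking it first,
# even when perlcritic runs at its default (gentle) severity.
[Subroutines::RequireArgUnpacking]
severity = 5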
Assigning @_ to a list can bring some helpful additional features.
It makes it slightly easier to add additional named parameters at a later date as you modify your code. Some people consider this a feature, similar to how finishing a list or a hash with a trailing ',' makes it slightly easier to append members in the future.
If you're in the habit of using this idiom, then shifting the arguments might seem harmful, because if you edit the code to add an extra argument, you could end up with a subtle bug, if you don't pay attention.
e.g.
sub do_something {
    my ($foo) = @_;
}
later edited to
sub do_something {
    my ($foo, $bar) = @_;   # still works
}
however
sub do_another_thing {
my $foo = shift;
}
If another colleague, who uses the first form dogmatically (perhaps they think shift is evil) edits your file and absent-mindedly updates this to read
sub do_another_thing {
my ($foo, $bar) = shift; # still 'works', but $bar not defined
}
and they may have introduced a subtle bug.
Assigning @_ to a list can be more compact and efficient with vertical space when you have a small number of parameters to assign at once. It also allows you to supply default arguments if you're using the hash style of named function parameters,
e.g.
my (%arguments) = (user => 'defaultuser', password => 'password', @_);
I would still consider it a question of style / taste. I think the most important thing is to apply one style or the other with consistency, obeying the principle of least surprise.
I don't think shift is evil. The use of shift shows your willingness to actually name variables - instead of using $_[0].
Personally, I use shift when there's only one parameter to a function. If I have more than one parameter, I'll use the list context.
my $result = decode($theString);
sub decode {
my $string = shift;
...
}
my $otherResult = encode($otherString, $format);
sub encode {
    my ($string, $format) = @_;
    ...
}
There is an optimization for list assignment.
The only reference I could find is this one:
5.10.0 inadvertently disabled an optimization, which caused a measurable performance drop in list assignment, such as is often used to assign function parameters from @_. The optimisation has been re-instated, and the performance regression fixed.
This is an example of the kind of list assignment affected by that regression.
sub example {
    my ($arg1, $arg2, $arg3) = @_;
}
Perl::Critic is your friend here. It follows the "standards" set out in Damian Conway's book Perl Best Practices. Running it with --verbose 11 gives you an explanation of why things are bad. Not unpacking @_ first in your subs is a severity 4 violation (out of 5). E.g.:
echo 'package foo; use warnings; use strict; sub quux { my $foo = shift; my ($bar, $baz) = @_; }; 1;' | perlcritic -4 --verbose 11
Always unpack @_ first at line 1, near 'sub quux { my $foo = shift; my ($bar, $baz) = @_; }'.
Subroutines::RequireArgUnpacking (Severity: 4)
Subroutines that use `@_' directly instead of unpacking the arguments to
local variables first have two major problems. First, they are very hard
to read. If you're going to refer to your variables by number instead of
by name, you may as well be writing assembler code! Second, `@_'
contains aliases to the original variables! If you modify the contents
of a `@_' entry, then you are modifying the variable outside of your
subroutine. For example:
sub print_local_var_plus_one {
    my ($var) = @_;
    print ++$var;
}
sub print_var_plus_one {
    print ++$_[0];
}
my $x = 2;
print_local_var_plus_one($x); # prints "3", $x is still 2
print_var_plus_one($x); # prints "3", $x is now 3 !
print $x; # prints "3"
This is spooky action-at-a-distance and is very hard to debug if it's
not intentional and well-documented (like `chop' or `chomp').
An exception is made for the usual delegation idiom
`$object->SUPER::something( @_ )'. Only `SUPER::' and `NEXT::' are
recognized (though this is configurable) and the argument list for the
delegate must consist only of `( @_ )'.
It isn't intrinsically evil, but using it to pull off the arguments of a subroutine one by one is comparatively slow and requires a greater number of lines of code.
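To illustrate the line-count difference, these two subs do the same thing; the shift version simply takes one line per argument (a trivial sketch of my own, not from the answer above):
# one shift per argument
sub connect_string_shift {
    my $host = shift;
    my $port = shift;
    my $user = shift;
    return "$user\@$host:$port";
}

# one list assignment for all of them
sub connect_string_list {
    my ($host, $port, $user) = @_;
    return "$user\@$host:$port";
}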