How do I get named sub parameters that are also fast? - perl

Parameters to Perl subs are passed in @_. To make my programs easier to read, I've always used this pattern to get named parameters:
sub foo1 {
my ($bar, $baz) = @_;
do_something($bar,$baz);
}
but it causes $_[0] and $_[1] to be copied. If I access $_[0] directly instead of accessing $bar in the above pattern, I get aliased (read/write) access to the caller's parameters, with the usual caveats of call-by-reference, but it is much faster (see demo below).
I'm left with the suspicion that the my ($param1, $param2 ...) = @_; pattern is bad for performance. So I find I have to choose between fast and readable programs, which is, well, an impossible choice.
I end up writing subs where performance is the focus with $_[<n>] and everything else with the pattern above. Trouble is, often I don't know beforehand where the bottlenecks are ;-)
Is there a way to get named parameters that are also fast? Or what seems to be canon on the matter? $_[0] or $bar?
Appendix: Speed demo
use Time::HiRes qw(time);
# Let's just do *something* with the parameters - here we just add up all
# their lengths
my $totalLength = 0;
sub foo1 {
# Access $_[0] directly - effectively call-by-reference
$totalLength += length($_[0]);
}
sub foo2 {
# Access a copy of $_[0] - effectively call-by-value - involves
# copying
my ($bar) = @_;
$totalLength += length($bar);
}
my $a = 'b' x 10_000;
my $t0 = time;
foreach (0..1_000_000) {
foo1($a);
}
printf "foo1 %2.6f\n", time - $t0;
$t0 = time;
foreach (0..1_000_000) {
foo2($a);
}
printf "foo2 %2.6f\n", time - $t0;
Prints out
foo1 0.329470
foo2 1.280364
foo1 is almost 4 times faster than foo2 because it avoids copying $_[0].

Pass them via a hashref:
my %args = (bar=>1, baz=>2);
mySub(\%args);
sub mySub {
my $args = shift;
doSomething($args); # pass entire parameter list for cheap!
doSomething2($args->{bar}); # Use specific parameter ($args is a hash reference)
}
Frankly, I have my slight doubts about the performance benefits (hash access isn't free), but you can benchmark it if you really need to. This isn't the best-performing option (see below) and may not even be needed (see the last part), and thus I don't see much need to try.
Another option (which kinda sucks but is better for performance) is to use $_[1] etc... but combat non-readability via extensive comments.
# pass `baz`
doSomething($_[1]);
Another even-higher-performance-but-bad-design option is to bypass the parameter passing altogether, and use global variables to pass parameters.
our $bar = 1;
mySub();
sub mySub {
#do something with $bar. Pray some other code didn't clobber it
}
Last consideration:
If your code is so well-tuned AND so performance-sensitive that copying a couple of scalars makes a significant difference, you MAY want to drop out of Perl into pure C for those functions.
But, as Knuth said, please don't optimize prematurely.
First, profile your entire app and make sure that the scalar parameter copying is indeed where your biggest bottlenecks are. I don't dispute that this is plausible, but typically, bottlenecks are elsewhere (IO, DB, slow data structures, etc...).
In other words, the fact that $operation_X can be implemented 4 times faster, means nothing if $operation_X takes up a total of 0.01% of your runtime. Speeding it up by a factor of 4 is simply not worth the trouble given decreased readability.
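If you do want numbers before deciding, one way to compare the variants discussed above is the core Benchmark module. This is only a sketch: the sub names by_alias, by_copy and by_hashref are made up for illustration, and the results will vary with the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $payload = 'b' x 10_000;    # same size as the string in the question's demo
my %args    = (bar => $payload);

sub by_alias   { return length $_[0] }                           # no copy at all
sub by_copy    { my ($bar)  = @_; return length $bar }           # copies the string
sub by_hashref { my ($args) = @_; return length $args->{bar} }   # copies only a reference

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alias   => sub { by_alias($payload) },
    copy    => sub { by_copy($payload) },
    hashref => sub { by_hashref(\%args) },
});
```

A micro-benchmark like this only shows the relative cost of the call itself. Whether that cost matters for the whole application is a question for a profiler.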

Well, if you're passing $bar and $baz, in that order, to sub do_something(), another bad option is to use the following scary -- but documented -- syntax:
sub foo1 { goto &do_something}
...which would pass the context on to do_something() immediately. No help with documenting the parameters, but it's possibly the fastest pass-on-to-another-routine mechanism of the bunch. :-)
Heck, I'd downvote this answer myself....
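For anyone who has not seen the goto &NAME form in action, here is a minimal sketch. The body of do_something is invented for the example; the point is only that the current @_ is handed over as-is.

```perl
use strict;
use warnings;

sub do_something {
    my ($bar, $baz) = @_;
    return "$bar-$baz";
}

# goto &NAME replaces the current call with a call to do_something,
# handing over the current @_ unchanged.
sub foo1 { goto &do_something }

print foo1('a', 'b'), "\n";   # prints "a-b"
```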

Related

In perl, what does a parenthesized list of '$' mean in a sub declaration?

I have to debug someone else's code and ran across sub declarations that look like this...
sub mysub($$$$) {
<code here>
}
...also...
sub mysub($$$;$) {
<code here>
}
What does the parenthesized list of '$' (with optional ';') mean?
I ran an experiment and it doesn't seem to care if I pass more and fewer args to a sub declared this way than there are '$' in the list. I was thinking that it might be used to disambiguate two different subs with the same name, differing only by the number of args passed to it (as defined by the ($$$$) vs ($$$) vs ($$) etc... ). But that doesn't seem to be it.
That's a Perl subroutine prototype. It's an old-school way of letting the parser know how many arguments to demand. Unless you know what they are going to do for you, I suggest you avoid these for any new code. If you can avoid prototypes, avoid them. They don't gain you as much as you think. There's a newer but experimental way to do it better.
The elements after the ; are optional arguments. So, mysub($$$$) has four mandatory arguments, and mysub($$$;$) has three mandatory arguments and one optional argument.
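A small sketch of what that means in practice. The sub name three_or_four is made up, and the sub has to be declared before the calls for the prototype to take effect:

```perl
use strict;
use warnings;

# Three mandatory scalars and one optional scalar.
sub three_or_four($$$;$) { return scalar @_ }

print three_or_four(1, 2, 3), "\n";      # prints 3
print three_or_four(1, 2, 3, 4), "\n";   # prints 4

# Too few arguments is rejected when the call is compiled, so the bad
# call is compiled inside a string eval to keep this script runnable.
my $compiled = eval q{ three_or_four(1, 2); 1 };
print "rejected: $@" unless $compiled;
```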
A little about parsing
Perl lets you be a bit loose about parentheses when you want to specify arguments, so these are the same:
print "Hello World\n";
print( "Hello World\n" );
This is one of Perl's philosophical points. When we can omit boilerplate, we should be able to.
Also, Perl lets you pass as many arguments as you like to a subroutine and you don't have to say anything about parameters ahead of time:
sub some_sub { ... }
some_sub( 1, 2, 3, 4 );
some_sub 1, 2, 3, 4; # same
This is another foundational idea of Perl: we have scalars and lists. Many things work on a list, and we don't care what's in it or how many elements it has.
But, some builtins take a definite number of arguments. The sin takes exactly one argument (but print takes zero to effectively infinity):
print sin 5, 'a'; # -0.958924274663138a (a is from `a`)
The rand takes zero or one:
print rand; # 0.331390818188996
print rand 10; # 4.23956650382937
But then, you can define your own subroutines. Prototypes are a way to mimic that same behavior you see in the builtins (which I think is kinda cool but also not as motivating for production situations).
I tend to use parens in argument lists because I find it's easier for people to see what I intend (although not always with print, I guess):
print sin(5), 'a';
There's one interesting use of prototypes that I like. You can make your own syntax that works like map and grep block forms:
map { ... } @array;
If you want to play around with that (but still not subject maintenance programmers to it), check out Object::Iterate for a demonstration of it.
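As a rough idea of what such a block-taking sub looks like, here is a minimal sketch using the (&@) prototype. The name apply_to_each is hypothetical and is not taken from Object::Iterate:

```perl
use strict;
use warnings;

# The & in first position lets callers pass a bare block, as with map and grep.
sub apply_to_each(&@) {
    my ($code, @items) = @_;
    return map { $code->($_) } @items;
}

# No "sub" keyword and no comma after the block.
my @doubled = apply_to_each { $_[0] * 2 } 1, 2, 3;
print "@doubled\n";   # prints "2 4 6"
```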
Experimental signatures
Perl v5.20 introduced an experimental signatures feature where you can give names to parameters. All of these are required:
use v5.20;
use feature qw(signatures);
sub mysub ( $name, $address, $phone ) { ... }
If you wanted an optional parameter, you can give it a default value:
sub mysub ( $name, $address, $phone = undef ) { ... }
Since this is an experimental feature, it warns whenever you use it. You can turn it off though:
no warnings qw(experimental::signatures);
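Putting those pieces together, a sketch that assumes Perl v5.20 or later (the sub name contact and its body are made up for illustration):

```perl
use strict;
use warnings;
use feature qw(signatures);
no warnings qw(experimental::signatures);

sub contact ($name, $address, $phone = undef) {
    return join ', ', grep { defined $_ } $name, $address, $phone;
}

print contact('Ann', '1 Main St'), "\n";               # prints "Ann, 1 Main St"
print contact('Ann', '1 Main St', '555-0100'), "\n";   # prints "Ann, 1 Main St, 555-0100"

# Unlike prototypes, a signature checks the argument count at run time.
my $ok = eval { contact('Ann'); 1 };
print "rejected: $@" unless $ok;
```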
This is interesting.
I ran an experiment and it doesn't seem to care if I pass more and fewer args to a sub declared this way than there are '$' in the list.
Because, of course, that's exactly what the code's author was trying to enforce.
There are two ways to circumvent the parameter counting that prototypes are supposed to enforce.
Call the subroutine as a method on an object ($my_obj->my_sub(...)) or on a class (MyClass->my_sub(...)).
Call the subroutine using the "old-style" ampersand syntax (&my_sub(...)).
From which we learn:
Don't use prototypes on subroutines that are intended to be used as methods.
Don't use the ampersand syntax for calling subroutines.
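Both bypasses can be seen in a few lines. The package Counter and the sub count_args are hypothetical names used only for this sketch:

```perl
use strict;
use warnings;

package Counter;

# The prototype asks for exactly two scalars.
sub count_args($$) { return scalar @_ }

package main;

# Ampersand call: the prototype is ignored.
my $n_amp = &Counter::count_args(1);

# Method call: the prototype is ignored too, and the class name
# is passed as an extra first argument.
my $n_method = Counter->count_args(1, 2, 3);

print "$n_amp $n_method\n";   # prints "1 4"
```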

How to access a sub through symbol table for main:: and name of sub in a scalar variable

Suppose I have a function foo (or ::foo, or main::foo if you prefer), and I define
use strict;
my $sub_name = 'foo';
I want to invoke foo indirectly, as "the function whose name is stored in $sub_name". (For the sake of this example, assume that the invocation should pass the list 1, 2, 3 as arguments.)
I know that there's a way to do this by working with the symbol table for main:: directly, treating it like a hash-like data structure.
This symbol-table incantation is what I'm looking for.
I've done this sort of thing many times before, but I have not programmed Perl in many years, and I no longer remember the incantation.
(I'd prefer to do this without having to resort to no strict, but no biggie if that's not possible.)
I'd simply use a symbolic reference.
my $sub = \&$qualified_sub_name; # \&$symbol is exempt from strict 'refs'.
$sub->()
But you requested that we avoid using symbolic references. Doing that is far more complex. (It also might not handle weird but legit misuse of colons.)
my $pkg = \%::;
my $sub_name = $qualified_sub_name;
$pkg = $pkg->{$1} while $sub_name =~ s/^(.*?::)//sg;
my $sub = $pkg->{$sub_name};
$sub = *{ $pkg->{$sub_name} }{CODE}
if ref(\$sub) eq 'GLOB'; # Skip if glob optimized away.
$sub->()
You can use can:
my $sub_name = 'foo';
my $coderef = main->can($sub_name);
$coderef->(@args);
As others have mentioned, you should note that this can also return methods like "can" or "isa".
Also, if $sub_name contains Some::Module::subname, this will also be called.
If you're not sure what's in $sub_name, you probably want a different approach.
Use this only if you have control over $sub_name and it can contain only expected values. (I assumed this, that's why I wrote this answer.)
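If the name comes from somewhere you do not control, one possible "different approach" is an explicit dispatch table. This is a swapped-in technique rather than the symbol-table lookup asked about, and the names %dispatch and call_by_name are made up for the sketch:

```perl
use strict;
use warnings;

sub foo { return "foo(@_)" }
sub bar { return "bar(@_)" }

# Hypothetical whitelist: only these names may be called indirectly.
my %dispatch = (
    foo => \&foo,
    bar => \&bar,
);

sub call_by_name {
    my ($sub_name, @args) = @_;
    my $code = $dispatch{$sub_name}
        or die "Unknown sub '$sub_name'\n";
    return $code->(@args);
}

print call_by_name('foo', 1, 2, 3), "\n";   # prints "foo(1 2 3)"

# Inherited methods such as "can" are not reachable this way.
my $ok = eval { call_by_name('can', 1); 1 };
print "rejected: $@" unless $ok;            # prints "rejected: Unknown sub 'can'"
```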

Perl subroutine arguments

I have been reading about Perl recently and am slightly perplexed about how Perl handles arguments passed to subroutines.
In a language like Python, Java or PHP, a function definition takes the form (in pseudocode):
function myFunc(arg1, arg2) {
// Do something with arg1 and arg2 here
}
Yet in Perl it's just:
sub mySub {
# @_ holds all arguments passed
}
And as I understand it, that's the only way to do it.
What if I want to restrict the caller to only pass two arguments?
Isn't this just Perl allowing nothing but what other languages (e.g., Python, C, etc.) call variable-number arguments?
Wouldn't that become a problem at some point?
What about all the default argument-number checking in other languages? Would one have to do that explicitly in Perl? For instance
sub a_sub {
if (@_ == 2) {
# Continue function
}
else {
return; # a bare return is false (Perl has no bareword "false")
}
}
You are wary of the Perl environment because it is quite different from the languages you have come across before.
The people who believe in strong typing and function prototypes will disagree here, but I believe that restrictions like that are rarely useful. Has C really caught you passing the wrong number of parameters to a function often enough to be useful?
It is most common in modern Perl to copy the contents of @_ to a list of lexical scalar variables, so you will often see subroutines starting with
sub mysub {
my ($p1, $p2) = @_;
... etc.
}
that way, all parameters that are passed will be available as elements of @_ ($_[0], $_[1] etc.) while the expected ones are named and appear in $p1 and $p2 (although I hope you understand that those names should be chosen appropriately).
In the particular case that the subroutine is a method, the first parameter is special. In other languages it is self or this, but in Perl it is simply the first parameter in @_ and you may call it what you like. In those circumstances you would see
sub method {
my $self = shift;
my ($p1, $p2) = @_;
... etc.
}
so that the context object (or the name of the class if it is a class method) is extracted into $self (a name assumed by convention) and the rest of the parameters remain in @_ to be accessed either directly or, more usually, copied to local scalar variables as $p1, $p2 etc.
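To make the method case concrete, here is a minimal sketch. The Greeter class, its greeting attribute and the method names are all invented for the example:

```perl
use strict;
use warnings;

package Greeter;

# Class method: the class name arrives as the first element of @_.
sub new {
    my $class = shift;
    my (%args) = @_;
    my $greeting = defined $args{greeting} ? $args{greeting} : 'Hello';
    return bless { greeting => $greeting }, $class;
}

# Instance method: shift off the invocant, then unpack the rest by name.
sub greet {
    my $self = shift;
    my ($name) = @_;
    return "$self->{greeting}, $name!";
}

package main;

my $greeter = Greeter->new(greeting => 'Hi');
print $greeter->greet('Ann'), "\n";   # prints "Hi, Ann!"
```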
Most often the complaint is that there is no type checking either, so I can pass any scalar I like as a subroutine parameter. As long as use strict and use warnings are in context, even this is generally simple to debug, simply because the operations that the subroutine can perform on one form of scalar are usually illegal on another.
Although it was originally more to do with encapsulation with respect to object-oriented Perl, this quote from Larry Wall is very relevant
Perl doesn't have an infatuation with enforced privacy. It would prefer that you stayed out of its living room because you weren't invited, not because it has a shotgun
C was designed and implemented in the days when it was a major efficiency boost if you could get a faulty program to fail during compilation rather than at run time. That has changed now, although a similar situation has arisen with client-side JavaScript where it actually would be useful to know that the code is wrong before fetching the data from the internet that it has to deal with. Sadly, JavaScript parameter checking is now looser than it should be.
Update
For those who doubt the usefulness of Perl for teaching purposes, I suggest that it is precisely because Perl's mechanisms are so simple and direct that they are ideal for such purposes.
When you call a Perl subroutine all of the parameters in the call are aliased in @_. You can use them directly to affect the actual parameters, or copy them to prevent external action.
If you call a Perl subroutine as a method then the calling object or class is provided as the first parameter. Again, the subroutine (method) can do what it likes with @_.
Perl doesn't manage your argument handling for you. Instead, it provides a minimal, flexible abstraction and allows you to write code that fits your needs.
Pass By Reference
By default, Perl sticks an alias to each argument in @_. This implements basic pass-by-reference semantics.
my $num = 1;
foo($num);
print "$num\n"; # prints 2.
sub foo { $_[0]++ }
Pass by reference is fast but has the risk of leaking changes to parameter data.
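A short sketch of that risk, with made-up sub names. It also shows what happens when the aliased argument is a literal:

```perl
use strict;
use warnings;

sub bump_alias { $_[0]++; return }                 # modifies the caller's variable
sub bump_copy  { my ($n) = @_; $n++; return $n }   # works on a private copy

my $num = 1;
bump_alias($num);
print "$num\n";                 # prints 2

my $bumped = bump_copy($num);
print "$num $bumped\n";         # prints "2 3"

# Aliasing a literal fails at run time, because constants are read-only.
my $ok = eval { bump_alias(5); 1 };
print "error: $@" unless $ok;   # Modification of a read-only value attempted ...
```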
Pass By Copy
If you want pass by copy semantics, you need to make the copies yourself. Two main approaches to handling lists of positional parameters are common in the Perl community:
sub shifty {
my $foo = shift;
}
sub listy {
my ($foo) = @_;
}
At my place of employment we do a version of listy:
sub fancy_listy {
my ($positional, $args, @bad) = @_;
die "Extra args" if @bad;
}
Named Parameters
Another common practice is the use of named parameters:
sub named_params {
my %opt = @_;
}
Some people are happy with just the above. I prefer a more verbose approach:
use Carp qw(croak); # croak is not built in; it comes from the core Carp module
sub named_params {
my %opt = @_;
my $named = delete $opt{named} // "default value";
my $param = delete $opt{param}
or croak "Missing required 'param'";
croak "Unknown params:", join ", ", keys %opt
if %opt;
# do stuff
}
This unpacks named params into variables, allows space for basic validation and default values and enforces that no extra, unknown arguments were passed in.
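Here is the same pattern as a runnable sketch with calls that exercise each branch. The sub name connect_to and its host and port options are hypothetical:

```perl
use strict;
use warnings;
use Carp qw(croak);

sub connect_to {
    my %opt = @_;

    my $host = delete $opt{host}
        or croak "Missing required 'host'";
    my $port = delete($opt{port}) // 80;    # default when not supplied

    # Anything still left in %opt was not expected.
    croak 'Unknown params: ' . join(', ', sort keys %opt)
        if %opt;

    return "$host:$port";
}

print connect_to(host => 'example.org'), "\n";                 # prints "example.org:80"
print connect_to(port => 8080, host => 'example.org'), "\n";   # prints "example.org:8080"

# A misspelled option name is caught instead of being silently ignored.
my $ok = eval { connect_to(host => 'example.org', prot => 8080); 1 };
print "rejected: $@" unless $ok;
```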
On Perl Prototypes
Perl's "prototypes" are not prototypes in the normal sense. They only provide compiler hints that allow you to skip parentheses on function calls. The only reasonable use is to mimic the behavior of built-in functions. You can easily defeat prototype argument checking. In general, DO NOT USE PROTOTYPES. Use them with the care that you would use with operator overloading, i.e. sparingly and only to improve readability.
For some reason, Perl likes lists, and dislikes static typing. The @_ array actually opens up a lot of flexibility, because subroutine arguments are passed by reference, and not by value. For example, this allows us to do out-arguments:
my $x = 40;
add_to($x, 2);
print "$x\n"; # 42
sub add_to { $_[0] += $_[1] }
… but this is more of an historic performance hack. Usually, the arguments are “declared” by a list assignment:
sub some_sub {
my ($foo, $bar) = @_;
# ^-- this assignment performs a copy
...
}
This makes the semantics of this sub call-by-value, which is usually more desirable. Yes, unused arguments are simply forgotten, and too few arguments do not raise any automatic error – the variables just contain undef. You can add arbitrary validation e.g. by checking the size of @_.
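A minimal sketch of such a size check (the sub name area and the error text are made up):

```perl
use strict;
use warnings;
use Carp qw(croak);

sub area {
    # In numeric context @_ evaluates to the number of arguments.
    croak sprintf('area() expects 2 arguments, got %d', scalar @_)
        unless @_ == 2;
    my ($width, $height) = @_;
    return $width * $height;
}

print area(3, 4), "\n";   # prints 12

my $ok = eval { area(3); 1 };
print "rejected: $@" unless $ok;
```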
There exist plans to finally make named parameters available in the future, which would look like
sub some_sub($foo, $bar) { ... }
You can have this syntax today if you install the signatures module. But there is something even better: I can strongly recommend Function::Parameters, which allows syntax like
fun some_sub($foo, $bar = "default value") { ... }
method some_method($foo, $bar, :$named_parameter, :$named_with_default = 42) {
# $self is autodeclared in methods
}
This also supports experimental type checks.
Parser extensions FTW!
If you really want to impose stricter parameter checks in Perl, you could look at something like Params::Validate.
Perl does have the prototyping capability for parameter placeholders, that you're kind of used to seeing, but it's often unnecessary.
sub foo($){
say shift;
};
foo(); # Error: Not enough arguments for main::foo
foo('bar'); # executes correctly
And if you did sub foo($$){...} it would require 2 non-optional arguments (eg foo('bar','baz'))
You can just use:
my ($arg1, $arg2) = @_;
To explicitly limit the number of arguments you can use:
my $number = 2;
die "Too many arguments" if @_ > $number;
If you are reading about Perl recently, please read about recent Perl. You may read the Modern Perl book for free as well.
Indeed, while with old standard Perl you would need to restrict the number of arguments passed to a subroutine manually, for example with something like this:
sub greet_one {
die "Too many arguments for subroutine" unless @_ <= 1;
my $name = $_[0] || "Bruce";
say "Hello, $name!";
}
With modern Perl you can take advantage of function signatures.
Here are a few examples from the Modern Perl book:
sub greet_one($name = 'Bruce') {
say "Hello, $name!";
}
sub greet_all($leader, @everyone) {
say "Hello, $leader!";
say "Hi also, $_." for @everyone;
}
sub make_nested_hash($name, %pairs) {
return { $name => \%pairs };
}
Please, note that function signatures were introduced in Perl 5.20 and considered experimental until Perl 5.36.
Therefore, if you use a Perl version in that range you may want to disable warnings for the "experimental::signatures" category:
use feature 'signatures';
no warnings 'experimental::signatures';

How to localize a variable in an upper scope in Perl?

I have run across the following pattern a few times while developing Perl modules that use AUTOLOAD or other subroutine dispatch techniques:
sub AUTOLOAD {
my $self = $_[0];
my $code = $self->figure_out_code_ref( $AUTOLOAD );
goto &$code;
}
This works fine, and caller sees the correct scope.
Now what I would like to do is to locally set $_ equal to $self during the execution of &$code. Which would be something like this:
sub AUTOLOAD {
my $self = $_[0];
my $code = $self->figure_out_code_ref( $AUTOLOAD );
local *_ = \$self;
# and now the question is how to call &$code
# goto &$code; # won't work since local scope changes will
# be unrolled before the goto
# &$code; # will preserve the local, but caller will report an
# additional stack frame
}
Solutions that involve wrapping caller are not acceptable due to performance and dependency issues. So that seems to rule out the second option.
Moving back to the first, the only way to prevent the new value of $_ from going out of scope during the goto would be to either not localize the change (not a viable option) or to implement some sort of uplevel_local or goto_with_local.
I have played around with all sorts of permutations involving PadWalker, Sub::Uplevel, Scope::Upper, B::Hooks::EndOfScope and others, but have not been able to come up with a robust solution that cleans up $_ at the right time, and does not wrap caller.
Has anyone found a pattern that works in this case?
(the SO question: How can I localize Perl variables in a different stack frame? is related, but preserving caller was not a requirement, and ultimately the answer there was to use a different approach, so that solution is not helpful in this case)
Sub::Uplevel appears to work -- at least for a simple case not involving AUTOLOAD:
use strict;
use warnings;
use Sub::Uplevel;
$_ = 1;
bar();
sub foo {
printf "%s %s %d - %s\n", caller, $_
}
sub bar {
my $code = \&foo;
my $x = 2;
local *_ = \$x;
uplevel 1, $code;
}
The output is:
main c:\temp\foo.pl 6 - 2
Granted, this doesn't really localize a variable in the parent scope, but I don't think you would really want to do that even if you could. You only want to localize $_ for the duration of the call.
The perlfunc documentation for goto points out (emphasis added)
The goto-&NAME form is quite different from the other forms of "goto". In fact, it isn't a goto in the normal sense at all, and doesn't have the stigma associated with other gotos. Instead, it exits the current subroutine (losing any changes set by local) …
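That loss of local changes is easy to reproduce. The sketch below uses a made-up package variable $flag instead of $_ to keep the example simple:

```perl
use strict;
use warnings;

our $flag = 'outer';
sub show { return $flag }

# The local value is restored before show() starts running.
sub via_goto { local $flag = 'inner'; goto &show }

# The local value survives, at the cost of an extra stack frame.
sub via_call { local $flag = 'inner'; return show() }

print via_goto(), "\n";   # prints "outer"
print via_call(), "\n";   # prints "inner"
```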
What sorts of performance concerns allow for indirection through autoloading but not through a wrapper?

Why does Perl::Critic dislike using shift to populate subroutine variables?

Lately, I've decided to start using Perl::Critic more often on my code. After programming in Perl for close to 7 years now, I've been settled in with most of the Perl best practices for a long while, but I know that there is always room for improvement. One thing that has been bugging me though is the fact that Perl::Critic doesn't like the way I unpack @_ for subroutines. As an example:
sub my_way_to_unpack {
my $variable1 = shift @_;
my $variable2 = shift @_;
my $result = $variable1 + $variable2;
return $result;
}
This is how I've always done it, and, as it's been discussed on both PerlMonks and Stack Overflow, it's not necessarily evil either.
Changing the code snippet above to...
sub perl_critics_way_to_unpack {
my ($variable1, $variable2) = @_;
my $result = $variable1 + $variable2;
return $result;
}
...works too, but I find it harder to read. I've also read Damian Conway's book Perl Best Practices and I don't really understand how my preferred approach to unpacking falls under his suggestion to avoid using @_ directly, as Perl::Critic implies. I've always been under the impression that Conway was talking about nastiness such as:
sub not_unpacking {
my $result = $_[0] + $_[1];
return $result;
}
The above example is bad and hard to read, and I would never ever consider writing that in a piece of production code.
So in short, why does Perl::Critic consider my preferred way bad? Am I really committing a heinous crime unpacking by using shift?
Would this be something that people other than myself think should be brought up with the Perl::Critic maintainers?
The simple answer is that Perl::Critic is not following PBP here. The
book explicitly states that the shift idiom is not only acceptable, but
is actually preferred in some cases.
Running perlcritic with --verbose 11 explains the policies. It doesn't look like either of these explanations applies to you, though.
Always unpack @_ first at line 1, near
'sub xxx{ my $aaa= shift; my ($bbb,$ccc) = @_;}'.
Subroutines::RequireArgUnpacking (Severity: 4)
Subroutines that use `@_' directly instead of unpacking the arguments to
local variables first have two major problems. First, they are very hard
to read. If you're going to refer to your variables by number instead of
by name, you may as well be writing assembler code! Second, `@_'
contains aliases to the original variables! If you modify the contents
of a `@_' entry, then you are modifying the variable outside of your
subroutine. For example:
sub print_local_var_plus_one {
my ($var) = @_;
print ++$var;
}
sub print_var_plus_one {
print ++$_[0];
}
my $x = 2;
print_local_var_plus_one($x); # prints "3", $x is still 2
print_var_plus_one($x); # prints "3", $x is now 3 !
print $x; # prints "3"
This is spooky action-at-a-distance and is very hard to debug if it's
not intentional and well-documented (like `chop' or `chomp').
An exception is made for the usual delegation idiom
`$object->SUPER::something( @_ )'. Only `SUPER::' and `NEXT::' are
recognized (though this is configurable) and the argument list for the
delegate must consist only of `( @_ )'.
It's important to remember that a lot of the stuff in Perl Best Practices is just one guy's opinion on what looks the best or is the easiest to work with, and it doesn't matter if you do it another way. Damian says as much in the introductory text to the book. That's not to say it's all like that -- there are many things in there that are absolutely essential: using strict, for instance.
So as you write your code, you need to decide for yourself what your own best practices will be, and using PBP is as good a starting point as any. Then stay consistent with your own standards.
I try to follow most of the stuff in PBP, but Damian can have my subroutine-argument shifts and my unlesses when he pries them from my cold, dead fingertips.
As for Critic, you can choose which policies you want to enforce, and even create your own if they don't exist yet.
In some cases Perl::Critic cannot enforce PBP guidelines precisely, so it may enforce an approximation that attempts to match the spirit of Conway's guidelines. And it is entirely possible that we have misinterpreted or misapplied PBP. If you find something that doesn't smell right, please mail a bug report to bug-perl-critic@rt.cpan.org and we'll look into it right away.
Thanks,
-Jeff
I think you should generally avoid shift, if it is not really necessary!
Just ran into a code like this:
sub way {
my $file = shift;
if (!$file) {
$file = 'newfile';
}
my $target = shift;
my $options = shift;
}
If you start changing something in this code, there is a good chance you might accidentally change the order of the shifts or maybe skip one, and everything goes south. Furthermore, it's hard to read, because you cannot be sure you really see all parameters for the sub: some lines below there might be another shift somewhere... And if you use some regexes in between, they might replace the contents of $_ and weird stuff begins to happen...
A direct benefit of using the unpacking my (...) = @_ is you can just copy the (...) part and paste it where you call the method and have a nice signature :) you can even use the same variable-names beforehand and don't have to change a thing!
I think shift implies list operations where the length of the list is dynamic and you want to handle its elements one at a time or where you explicitly need a list without the first element. But if you just want to assign the whole list to x parameters, your code should say so with my (...) = @_; no one has to wonder.