Why is 'last' called 'last' in Perl? - perl

What is the historical reason that last is called last in Perl, rather than break as it is in C?
The design of Perl was influenced by C (in addition to awk, sed and sh - see man page below), so there must have been some reasoning behind not going with the familiar C-style naming of break/last.
A bit of history from the Perl 1.000 (released 18 December, 1987) man page:
[Perl] combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC|PLUS.)

The semantics of 'break' or 'last' are
defined by the language (in this case
Perl), not by you.
Why not think of 'last' as "this is
the last statement to run for the
loop".
It's always struck me as odd that the
'continue' statement in 'C' starts the
next pass of a loop. This is
definitely a strange use of the
concept of "continue". But it is the
semantics of 'C', so I accept it.
By trying to map particular
programming concepts into single
English words with existing meaning
there is always going to be some sort
of mismatching oddity.
Source
Plus, Larry Wall is kinda weird. Have you seen his picture?
(source: wired.com)

I expect that this is because Perl was created by a linguist, not a computer scientist. In normal English usage, the concept of declaring that you have completed your final pass through a loop is more strongly connected to the word "last" ("this is the last pass") than to the word "break" ("break the loop"? "break out of the loop"? - it's not even clear how "break" is intended to relate to exiting the loop).

The term 'last' makes more sense when you remember that you can use it with more than just the immediate looping control. You can apply it to labeled blocks one or more levels above the block it is in:
LINE: while( <> ) {
    WORD: foreach ( split ) {
        last LINE if /^__END__\z/;
        ...
    }
}
It reads more naturally to say "last" in English when you read it as "last line if it matches ...".
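A small runnable sketch of the labeled form, assuming a hypothetical in-memory list of lines in place of <> so that it runs without any input:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for lines read from <>.
my @lines = ('alpha beta', 'gamma __END__ delta', 'epsilon');
my @seen;

LINE: for my $line (@lines) {
    WORD: for my $word (split ' ', $line) {
        last LINE if $word eq '__END__';   # leaves both loops at once
        push @seen, $word;
    }
}

print "@seen\n";   # alpha beta gamma
```

Without the LINE label, last would only leave the inner WORD loop and the outer loop would go on to the next line.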

There's an additional reason you might want to consider:
Last does more than just loop control.
sub hello {
    my ( $arg ) = @_;
    scope: {
        foo();
        bar();
        last if $arg > 4;
        baz();
        quux();
    }
}
Last as such is a general flow control mechanism, not limited to loops. Of course, you can view the above as a loop that runs at most once, but the absence of a loop to me indicates "Break? What are we breaking out of?"
Instead, I think of "last" as "Jump to the position of the last brace", which is, for this purpose, more semantically sensible.
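To see the bare-block behaviour in isolation, here is a minimal sketch. The sub name steps_run and the step names are made up for illustration; the block records which steps ran before last left it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of the steps that actually ran.
sub steps_run {
    my ($arg) = @_;
    my @ran;
    STEPS: {
        push @ran, 'foo';
        push @ran, 'bar';
        last STEPS if $arg > 4;   # jump to just past the closing brace of STEPS
        push @ran, 'baz';
    }
    push @ran, 'after';           # always reached, with or without last
    return @ran;
}

print join(',', steps_run(5)), "\n";   # foo,bar,after
print join(',', steps_run(1)), "\n";   # foo,bar,baz,after
```

Perl treats a bare block as a loop that executes once, which is why last (and next) are accepted there.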

I asked Damian Conway the same question about say. Perl 6 will introduce say, which is nothing more than print that automatically adds a newline. My question was why not simply use echo, because this is what echo does in Bash (and probably elsewhere).
His answer was: echo is 33% longer than say.
He has a point there. :)

Because it goes to the last of the loop. And because Larry Wall was a weird guy.

Related

perl: what is the right way to call a function stored in a variable?

What is the right way to call a function stored in a variable?
my $f = sub () { ... };
&$f(); # 1st
$f->(); # 2nd
Both appear to work, and the first probably worked in perl4.
However, I was wondering what the "official perl5 way" was.
Also, are there any performance implications?
Both are the right way. Perl is not about forcing any special style down your throat.
Style #1 &$f()
Pro:
Emphasizes that we are using a subroutine
Con:
Looks like line noise
Bypasses function prototypes
Seems a bit perl4-ly to me
Caveats:
In the dark ages of perl4, there were no references. One could simulate references by passing around variable names (*shudder*). This also works with subs, so this code runs:
sub f { (shift == 0) ? 1 : 0 }
$g = "f";
print &$g(1); # prints 0;
print &$g(0); # prints 1;
Please use strict 'refs' to guard against this horror.
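A short sketch of what strict 'refs' buys you here. It reuses the sub f from above; the variable names are only illustrative.

```perl
use strict;
use warnings;

sub f { return shift() == 0 ? 1 : 0 }

my $name = 'f';   # a plain string, not a code reference

# With strict refs switched off in a small scope, the symbolic call works.
my $symbolic = do { no strict 'refs'; &$name(0) };

# Under strict refs the same call dies at run time; eval traps the error.
my $strict_result = eval { &$name(0) };
my $strict_error  = $@;

print "symbolic call returned $symbolic\n";
print "strict call failed: $strict_error";
```

The error message mentions "strict refs", so an accidental string-as-subroutine call is caught instead of silently calling whatever sub happens to have that name.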
Style #2 $f->()
Pro:
Emphasizes that we are handling a reference
Looks cleaner
Con:
Can be confused with objects and hashrefs
Caveats:
Same as with the other syntax, as they are the same under the hood. But the dereference operator is not misused as often.
Performance implications
Let's face it, if we were all about performance, we would be writing assembler. If you want to optimize Perl, first optimize the algorithm, then code everything in C/XS, throw away any objects and modules, and finally discuss dereferencing syntax.
I would guess style #1 is faster in theory, but I doubt it would have serious implications in real life.
I sincerely doubt there is any difference in performance, since both methods result in the same code:
$ perl -MO=Deparse -e'&$f()'
&$f();
-e syntax OK
$ perl -MO=Deparse -e'$f->()'
&$f();
-e syntax OK
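The two forms also behave the same when called with an explicit argument list. The one place they differ is the & form written without parentheses, which hands the caller's current @_ to the callee. A minimal sketch (outer and the counting sub are invented for illustration):

```perl
use strict;
use warnings;

my $f = sub { return scalar @_ };   # returns how many arguments it received

sub outer {
    my $with_amp   = &$f();    # explicit empty list: 0 arguments
    my $with_arrow = $f->();   # same call, arrow syntax: 0 arguments
    my $bare_amp   = &$f;      # no parentheses: outer's @_ is passed through
    return ($with_amp, $with_arrow, $bare_amp);
}

my @counts = outer('a', 'b', 'c');
print "@counts\n";   # 0 0 3
```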

Is using prototypes to declare array reference context on subroutine args a Good Thing in Perl?

In the linked SO answer, Eric illustrates a way to call a subroutine that accepts arrays by reference as arguments, using prototypes to let the calling code pass array names without the reference operator \@, the way built-ins like push @array, $value do.
# Original code:
sub Hello { my ($x_ref, $y_ref) = @_; ...}
Hello(\@x, \@y);
# Same thing using array ref prototype:
sub Hello (\@\@$) {...}
Hello(@x, @y);
My question is, is this considered to be a Best Practice? And what are the guidelines on the pattern's use?
It seems like this pattern should either be used ONLY for built-ins, or for 100% of subroutines that accept array arguments in all of your code.
Otherwise code maintenance and use of your subs becomes fragile since the developer never knows whether a particular sub, when called, should be forced to reference an array or not.
An additional point of fragility is that you become confused between doing such calls and legitimately using two arrays combined into one using a comma operator.
On the positive side, using the pattern prevents the "forgot to reference the array" bugs, and makes the code calling the subroutines somewhat more readable.
P.S. I don't have Conway's book handy and don't recall if he ever discussed the topic, to pre-empt RTFB responses.
Only use prototypes when you're trying to extend Perl's syntax: e.g. if you're building Moose, or something like the examples in Dominus' Higher Order Perl. If you're doing that, you know enough to disregard PBP (and to annotate your code to shut Perl::Critic up). If you're doing anything else, don't use them.
Let's ask Perl::Critic:
echo "use strict; sub Hello (\@\@$) {...}" | perlcritic
Subroutine prototypes used at line 1, column 1. See page 194 of PBP. (Severity: 5)
Yeah, that would be a no.
I tend to think that anything that makes a subroutine act differently than any other subroutine isn't the best thing. There might be instances where it's a good idea, but in general it's more special cases, documentation, and other things to remember. The more your code acts like most other Perl code, the easier you make things for your users.
Notice that this isn't the main complaint that Perl Best Practices has with prototypes, which is an ugly kludge in Perl.
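For readers who have not seen the pattern in action, here is a minimal sketch of what the prototype changes. The sub sum_lengths is hypothetical and uses a two-array prototype (\@\@) so that it can be called with exactly two arrays.

```perl
use strict;
use warnings;

# Hypothetical sub: the (\@\@) prototype makes Perl pass references to the
# two named arrays instead of flattening them into one list.
sub sum_lengths (\@\@) {
    my ($x_ref, $y_ref) = @_;
    return @$x_ref + @$y_ref;   # arrays in numeric context give their lengths
}

my @x = (1, 2, 3);
my @y = (4, 5);

my $via_prototype = sum_lengths(@x, @y);       # arrays arrive as references
my $via_ampersand = &sum_lengths(\@x, \@y);    # & bypasses the prototype

print "$via_prototype $via_ampersand\n";   # 5 5
```

The second call shows the fragility discussed above: the same sub needs explicit references as soon as the prototype is not in effect.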

What is the CORE:match (opcode) subroutine in Perl profiling?

I previously wrote some utilities in Perl, and I am now rewriting them in order to give some new/better features. However, things seem to be going much more slowly than in the original utilities, so I decided to run one with the NYTProf profiler. Great profiler btw, still trying to figure out all its useful features.
So anyway, it turns out that 93% of my program's time is being spent on calls to the GeneModel::CORE:match (opcode) subroutine, and I have no idea what this is. Most Google hits point to NYTProf profiles others have posted. I indeed wrote the GeneModel class/package, but I don't know what this subroutine is, why it was called so many times, or why it's taking so long. Any ideas?
CORE:match is a call to a regular expression -- in this case, within your GeneModel package.
For example, if we profile this script, Devel::NYTProf reports 1000 calls to Foo::CORE:match.
use strict;
use warnings;
package Foo;
my $s = 'foo foo';
$s =~ /foo/ for 1 .. 1000;
Perl is compiled to opcodes. The match operator results in a match opcode.
> perl -MO=Terse -e'm//'
LISTOP (0x8c4b40) leave [1]
OP (0x8c4070) enter
COP (0x8c4780) nextstate
PMOP (0x8c4260) match
This is not a subroutine, but merely represented that way as opcode profiling is a recent addition and the UI hasn't been overhauled yet to take that into account. In simple words, the profiler is telling you that most time is spent in the regex engine.

Why does Perl::Critic dislike using shift to populate subroutine variables?

Lately, I've decided to start using Perl::Critic more often on my code. After programming in Perl for close to 7 years now, I've been settled in with most of the Perl best practices for a long while, but I know that there is always room for improvement. One thing that has been bugging me though is the fact that Perl::Critic doesn't like the way I unpack @_ for subroutines. As an example:
sub my_way_to_unpack {
    my $variable1 = shift @_;
    my $variable2 = shift @_;
    my $result = $variable1 + $variable2;
    return $result;
}
This is how I've always done it, and, as it's been discussed on both PerlMonks and Stack Overflow, it's not necessarily evil either.
Changing the code snippet above to...
sub perl_critics_way_to_unpack {
    my ($variable1, $variable2) = @_;
    my $result = $variable1 + $variable2;
    return $result;
}
...works too, but I find it harder to read. I've also read Damian Conway's book Perl Best Practices and I don't really understand how my preferred approach to unpacking falls under his suggestion to avoid using @_ directly, as Perl::Critic implies. I've always been under the impression that Conway was talking about nastiness such as:
sub not_unpacking {
    my $result = $_[0] + $_[1];
    return $result;
}
The above example is bad and hard to read, and I would never ever consider writing that in a piece of production code.
So in short, why does Perl::Critic consider my preferred way bad? Am I really committing a heinous crime unpacking by using shift?
Would this be something that people other than myself think should be brought up with the Perl::Critic maintainers?
The simple answer is that Perl::Critic is not following PBP here. The
book explicitly states that the shift idiom is not only acceptable, but
is actually preferred in some cases.
Running perlcritic with --verbose 11 explains the policies. It doesn't look like either of these explanations applies to you, though.
Always unpack @_ first at line 1, near
'sub xxx{ my $aaa= shift; my ($bbb,$ccc) = @_;}'.
Subroutines::RequireArgUnpacking (Severity: 4)
Subroutines that use `@_' directly instead of unpacking the arguments to
local variables first have two major problems. First, they are very hard
to read. If you're going to refer to your variables by number instead of
by name, you may as well be writing assembler code! Second, `@_'
contains aliases to the original variables! If you modify the contents
of a `@_' entry, then you are modifying the variable outside of your
subroutine. For example:
sub print_local_var_plus_one {
    my ($var) = @_;
    print ++$var;
}
sub print_var_plus_one {
    print ++$_[0];
}
my $x = 2;
print_local_var_plus_one($x); # prints "3", $x is still 2
print_var_plus_one($x); # prints "3", $x is now 3 !
print $x; # prints "3"
This is spooky action-at-a-distance and is very hard to debug if it's
not intentional and well-documented (like `chop' or `chomp').
An exception is made for the usual delegation idiom
`$object->SUPER::something( @_ )'. Only `SUPER::' and `NEXT::' are
recognized (though this is configurable) and the argument list for the
delegate must consist only of `( @_ )'.
It's important to remember that a lot of the stuff in Perl Best Practices is just one guy's opinion on what looks the best or is the easiest to work with, and it doesn't matter if you do it another way. Damian says as much in the introductory text to the book. That's not to say it's all like that -- there are many things in there that are absolutely essential: using strict, for instance.
So as you write your code, you need to decide for yourself what your own best practices will be, and using PBP is as good a starting point as any. Then stay consistent with your own standards.
I try to follow most of the stuff in PBP, but Damian can have my subroutine-argument shifts and my unlesses when he pries them from my cold, dead fingertips.
As for Critic, you can choose which policies you want to enforce, and even create your own if they don't exist yet.
In some cases Perl::Critic cannot enforce PBP guidelines precisely, so it may enforce an approximation that attempts to match the spirit of Conway's guidelines. And it is entirely possible that we have misinterpreted or misapplied PBP. If you find something that doesn't smell right, please mail a bug report to bug-perl-critic@rt.cpan.org and we'll look into it right away.
Thanks,
-Jeff
I think you should generally avoid shift, if it is not really necessary!
Just ran into code like this:
sub way {
    my $file = shift;
    if (!$file) {
        $file = 'newfile';
    }
    my $target = shift;
    my $options = shift;
}
If you start changing something in this code, there is a good chance you might accidentally change the order of the shifts or maybe skip one, and everything goes south. Furthermore it's hard to read, because you cannot be sure you really see all parameters for the sub: some lines below there might be another shift somewhere... And if you use some regexes in between, they might replace the contents of $_ and weird stuff begins to happen...
A direct benefit of using the unpacking my (...) = @_ is you can just copy the (...) part and paste it where you call the method and have a nice signature :) you can even use the same variable-names beforehand and don't have to change a thing!
I think shift implies list operations where the length of the list is dynamic and you want to handle its elements one at a time or where you explicitly need a list without the first element. But if you just want to assign the whole list to x parameters, your code should say so with my (...) = @_; no one has to wonder.
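As a rough sketch of that suggestion, the sub way from above could be rewritten with a single list assignment. The return statement and the sample calls are additions for illustration only.

```perl
use strict;
use warnings;

sub way {
    my ($file, $target, $options) = @_;   # the full parameter list in one place
    $file ||= 'newfile';                  # same default as the if (!$file) check
    return ($file, $target, $options);
}

my @with_default = way(undef, 'out', 'fast');
my @explicit     = way('in.txt', 'out', 'fast');

print "$with_default[0] $explicit[0]\n";   # newfile in.txt
```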

What's good practice for Perl special variables?

First off, does anyone have a comprehensive list of the Perl special variables?
Second, are there any tasks that are much easier using them? I always unset $/ to read in files all at once, and set $| to automatically flush buffers, but I'm not sure of any others.
And third, should one use the Perl special variables, or be more explicit in one's coding? Personally I'm a fan of using the special variables to manipulate the way code behaves, but I've heard others argue that it just confuses things.
They are all documented in perlvar.
Note that the long names are only usable if you use English qw( -no_match_vars ); first.
Always remember to local'ize your changes to the punctuation variables. Some of the punctuation variables are useful, others should not be used. For instance, $[ should never be used (it changes the base index of arrays, so local $[ = 1; will cause 1 to refer to the first item in a list or array). Others like $" are iffy. You have to balance the convenience of not having to do the join manually against how readable the result is. For instance, which of these is easier to understand?
local $" = " :: "; #"
my $s = "@a / @b / @c\n";
versus
my $sep = " :: ";
my $s = join(" / ", join($sep, @a), join($sep, @b), join($sep, @c)) . "\n";
or
my $s = join(" / ", map { join " :: ", @$_ } \(@a, @b, @c)) . "\n";
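A minimal sketch of why local matters for these variables. The sub joined and the in-memory "file" are assumptions made so that the example is self-contained.

```perl
use strict;
use warnings;

my @a = (1, 2);
my @b = (3, 4);

sub joined {
    local $" = ' :: ';     # restored automatically when the sub returns
    return "@a / @b";
}

my $inside  = joined();    # "1 :: 2 / 3 :: 4"
my $outside = "@a / @b";   # "1 2 / 3 4", the default separator is back

# Slurp a whole (in-memory, hypothetical) file by localizing $/ to undef.
my $text = "line one\nline two\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $content = do { local $/; <$fh> };   # $/ is undef only inside this block
close $fh;

print "$inside\n$outside\n";
```

In both cases the change is confined to one small scope, so code that runs later still sees the default values.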
1) As far as which ones I use often:
$! is quintessential for IO error handling
$@ for eval error handling when calling mis-designed libraries (like database ones) whose coders weren't considerate enough to code in decent error handling other than "die"
$_ for map/grep blocks, although I 100% agree with a poster above that using it for regular code is not a good practice.
$| for flushing buffers
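A brief sketch of the two error variables from this list. The die message and the file path are made up; the path is assumed not to exist.

```perl
use strict;
use warnings;

# $@ holds the error from the most recent eval; copy it right away,
# because a later eval will overwrite it.
my $ok = eval { die "library blew up\n"; 1 };
my $eval_error = $@;

# $! holds the OS error from the last failed system call.
my $opened   = open my $fh, '<', '/nonexistent/hypothetical/path';
my $os_error = $opened ? '' : "$!";

print "eval failed: $eval_error" unless $ok;
print "open failed: $os_error\n" unless $opened;
```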
2) As far as using punctuation vs. English names, I'll pick on Marc Bollinger's reply above although the same rebuttal goes for anyone arguing that there's no benefit to using English names.
"if you're using Perl, you're obviously not choosing it for neophyte readability"
Marc, I find that is not always (or rather almost never) true. Then again, 99% of my Perl experience is writing production Perl code for large companies, 90% of it full fledged applications instead of 10-line hack scripts, so my analysis may not apply in other domains. The reasons such thinking as Marc's is wrong are:
Just because I'm a Perl non-neophyte (to put it mildly), some noob analyst hired a year ago - or an outsourced "genius" - is probably not. You may not want to confuse them any more than they already are. "If code was hard to write, it should be hard to read" is not exactly high on the list of good attitudes of professional developers, in any language.
When I'm up at 2am, half-asleep and troubleshooting a production problem, I really do not want to depend on the ability of my already-nearly-blind eyes to distinguish between $! and $|. Especially in code written by the aforementioned "genius" who may not have known which one of them to use and switched them around.
When I'm reading code left unfinished by a guy who was cough "restructured" cough out of the company a year ago, I'd rather concentrate on intricacies of screwy logic than readability of the punctuation soup.
The three I use the most are $_, @_ and $!.
I like to use $_ when looping through an array, retrieving parameters (as pointed out by Motti, this is actually @_) or performing substitutions:
Example 1.1:
foreach (@items)
{
    print $_;
}
Example 1.2:
my $prm1 = shift; # implicit use of @_ or @ARGV depending on context
Example 1.3:
s/" "/""/ig; # implicit use of $_
I use $! in cases like this:
Example 2.1:
open(FILE, ">>myfile") || die "Error: $!";
I do agree though, it makes the code more confusing to someone not familiar with Perl. But confusing other people is one of the joys of knowing the language! :)
Typical ones I use are $_, @_, @ARGV, $!, $/. Other ones I comment heavily.
Brad notes that $@ is also a pretty common variable. (Error value from eval()).
I say use them--if you're using Perl, you're obviously not choosing it for neophyte readability. Any more-than-casual developer will likely have a browser/reference window open, and sifting through the perlvar manpage in one window is likely no less arduous than looking up definitions of (and assignments to!) global or external variables. As an example, I just recently encountered the new-in-5.10.x named capture buffers:
/^(?<myName>.*)$/;
# and later
my $capture = $+{'myName'};
And figuring out what was going on wasn't any harder than going into perlvar/perlre and reading a little bit.
I'd much rather find a bunch of wacky special vars in undocumented code than a bunch of wacky algorithms in undocumented code.
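For reference, a small self-contained sketch of named captures and the %+ hash. The input string and capture names are invented for the example.

```perl
use strict;
use warnings;

my $line = 'width=80';   # hypothetical input
my ($key, $value);

if ($line =~ /^(?<key>\w+)=(?<value>\d+)$/) {
    $key   = $+{key};     # a single element of the special hash %+
    $value = $+{value};
}

print "$key -> $value\n";   # width -> 80
```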