Perl sub returns a subroutine - perl

I haven't used Perl for around 20 years, and this is confusing me. I've g******d for it, but I obviously haven't used a suitable search string because I haven't found anything relating to this...
Why would I want to do the following? I understand what it's doing, but the "why" escapes me. Why not just return 0 or 1 to begin with?
I'm working on some code where a sub uses "return sub"; here's a very truncated example e.g.
sub test1 {
$a = shift #_;
if ($a eq "cat") {
return sub {
print("cat test OK\n");
return 0;
}
}
# default if "cat" wasn't the argument
return sub {
print("test for cat did not work\n");
return 1;
}
}
$c = test1("cat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
$c = test1("bat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");

In your code there is no reason to return a sub. However, with a little tweak
sub test1 {
my $animal = shift #_;
if ($animal eq "cat" || $animal eq "dog") {
return sub {
print("$animal test OK\n");
return 0;
};
}
# default if "cat" or "dog" wasn't the argument
return sub {
print("test for cat or dog did not work\n");
return 1;
};
}
We now have a closure around $animal this saves memory as the test for cat and dog share the same code. Note that this only works with my variables. Also note that $a and $b are slightly special to Perl, they are used in the block of code that you can pass to the sort function and bypass some of the checks on visibility so it's best to avoid them for anything except sort.

You probably want to search "perl closures".
There are many reasons that you'd want to return a code reference, but it's not something I can shortly answer in a StackOverflow question. Mark Jason Dominus's Higher Order Perl is a good way to expand your mind, and we cover a little of that in Intermediate Perl.
I wrote File::Find::Closures as a way to demonstrate this is class. Each subroutine in that module returns two code references—one for the callback to File::Find and the other as a way to access the results. The two share a common variable which nothing else can access.
Notice in your case, you aren't merely calling a subroutine to "get a zero". It's doing other things. Even in your simple example there's some output. Some behavior is then deferred until you actually use the result for something.
Having said that, we have no chance of understanding why the programmer who wrote your particular code did it. One plausible guess was that the system was set up for more complex situations and you're looking at a trivial example that fits into that. Another plausible guess was that the programmer just learned that feature and fell in love with it then used it everywhere for a year. There's always a reason, but that doesn't mean there's always a good reason.

Related

Not enough arguments when redefining a subroutine

When I redefine my own subroutine (and not a Perl built-in function), as below :
perl -ce 'sub a($$$){} sub b {a(#_)}'
I get this error :
Not enough arguments for main::a at -e line 1, near "#_)"
I'm wondering why.
Edit :
The word "redefine" is maybe not well chosen. But in my case (and I probably should have explained what I was trying to do originally), I want to redefine (and here "redefine" makes sense) the Test::More::is function by printing first Date and Time before the test result.
Here's what I've done :
Test::More.pm :
sub is ($$;$) {
my $tb = Test::More->builder;
return $tb->is_eq(#_);
}
MyModule.pm :
sub is ($$;$) {
my $t = gmtime(time);
my $date = $t->ymd('/').' '.$t->hms.' ';
print($date);
Test::More::is(#_);
}
The prototype that you have given your subroutine (copied from Test::More::is) says that your subroutine requires two mandatory parameters and one optional one. Passing in a single array will not satisfy that prototype - it is seen as a single parameter which will be evaluated in scalar context.
The fix is to retrieve the two (or three) parameters passed to your subroutine and to pass them, individually, to Test::More::is.
sub is ($$;$) {
my ($got, $expected, $test_name) = #_;
my $t = gmtime(time);
my $date = $t->ymd('/').' '.$t->hms.' ';
print($date);
Test::More::is($got, $expected, $test_name);
}
The problem has nothing to do with your use of a prototype or the fact that you are redefining a subroutine (which, strictly, you aren't as the two subroutines are in different packages) but it's because Test::More::is() has a prototype.
You are not redefining anything here.
You've set a prototype for your sub a by saying sub a($$$). The dollar signs in the function definition tell Perl that this sub has exactly three scalar parameters. When you call it with a(#_), Perl doesn't know how many elements will be in that list, thus it doesn't know how many arguments the call will have, and fails at compile time.
Don't mess with prototypes. You probably don't need them.
Instead, if you know your sub will need three arguments, explicitly grab them where you call it.
sub a($$$) {
...
}
sub b {
my ($one, $two, $three) = #_;
a($one, $two, $three);
}
Or better, don't use the prototype at all.
Also, a and b are terrible names. Don't use them.
In Perl, prototypes don't validate arguments so much as alter parsing rules. $$;$ means the sub expects the caller to match is(EXPR, EXPR) or is(EXPR, EXPR, EXPR).
In this case, bypassing the prototype is ideal.
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
return &Test::More::is(#_);
}
Since you don't care if Test::More::is modifies yours #_, the following is a simple optimization:
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
return &Test::More::is;
}
If Test::More::is uses caller, you'll find the following useful:
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
goto &Test::More::is;
}

Why does this Perl produce "Not a CODE reference?"

I need to remove a method from the Perl symbol table at runtime. I attempted to do this using undef &Square::area, which does delete the function but leaves some traces behind. Specifically, when $square->area() is called, Perl complains that it is "Not a CODE reference" instead of "Undefined subroutine &Square::area called" which is what I expect.
You might ask, "Why does it matter? You deleted the function, why would you call it?" The answer is that I'm not calling it, Perl is. Square inherits from Rectangle, and I want the inheritance chain to pass $square->area through to &Rectangle::area, but instead of skipping Square where the method doesn't exist and then falling through to Rectangle's area(), the method call dies with "Not a CODE reference."
Oddly, this appears to only happen when &Square::area was defined by typeglob assignment (e.g. *area = sub {...}). If the function is defined using the standard sub area {} approach, the code works as expected.
Also interesting, undefining the whole glob works as expected. Just not undefining the subroutine itself.
Here's a short example that illustrates the symptom, and contrasts with correct behavior:
#!/usr/bin/env perl
use strict;
use warnings;
# This generates "Not a CODE reference". Why?
sub howdy; *howdy = sub { "Howdy!\n" };
undef &howdy;
eval { howdy };
print $#;
# Undefined subroutine &main::hi called (as expected)
sub hi { "Hi!\n" }
undef &hi;
eval { hi };
print $#;
# Undefined subroutine &main::hello called (as expected)
sub hello; *hello = sub { "Hello!\n" };
undef *hello;
eval { hello };
print $#;
Update: I have since solved this problem using Package::Stash (thanks #Ether), but I'm still confused by why it's happening in the first place. perldoc perlmod says:
package main;
sub Some_package::foo { ... } # &foo defined in Some_package
This is just a shorthand for a typeglob assignment at compile time:
BEGIN { *Some_package::foo = sub { ... } }
But it appears that it isn't just shorthand, because the two cause different behavior after undefining the function. I'd appreciate if someone could tell me whether this is a case of (1) incorrect docs, (2) bug in perl, or (3) PEBCAK.
Manipulating symbol table references yourself is bound to get you into trouble, as there are lots of little fiddly things that are hard to get right. Fortunately there is a module that does all the heavy lifting for you, Package::Stash -- so just call its methods add_package_symbol and remove_package_symbol as needed.
Another good method installer that you may want to check out is Sub::Install -- especially nice if you want to generate lots of similar functions.
As to why your approach is not correct, let's take a look at the symbol table after deleting the code reference:
sub foo { "foo!\n"}
sub howdy; *howdy = sub { "Howdy!\n" };
undef &howdy;
eval { howdy };
print $#;
use Data::Dumper;
no strict 'refs';
print Dumper(\%{"main::"});
prints (abridged):
$VAR1 = {
'howdy' => *::howdy,
'foo' => *::foo,
};
As you can see, the 'howdy' slot is still present - undefining &howdy doesn't actually do anything enough. You need to explicitly remove the glob slot, *howdy.
The reason it happens is precisely because you assigned a typeglob.
When you delete the CODE symbol, the rest of typeglob is still lingering, so when you try to execute howdy it will point to the non-CODE piece of typeglob.

How can a Perl force its caller to return? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Is it possible for a Perl subroutine to force its caller to return?
I want to write a subroutine which causes the caller to return under certain conditions. This is meant to be used as a shortcut for validating input to a function. What I have so far is:
sub needs($$) {
my ($condition, $message) = #_;
if (not $condition) {
print "$message\n";
# would like to return from the *parent* here
}
return $condition;
}
sub run_find {
my $arg = shift #_;
needs $arg, "arg required" or return;
needs exists $lang{$arg}, "No such language: $arg" or return;
# etc.
}
The advantage of returning from the caller in needs would then be to avoid having to write the repetitive or return inside run_find and similar functions.
I think you're focussing on the wrong thing here. I do this sort of thing with Data::Constraint, Brick, etc. and talk about this in Mastering Perl. With a little cleverness and thought about the structure of your program and the dynamic features that Perl has, you don't need such a regimented, procedural approach.
However, the first thing you need to figure out is what you really want to know in that calling subroutine. If you just want to know yes or no, it's pretty easy.
The problem with your needs is that you're thinking about calling it once for every condition, which forces you to use needs to control program flow. That's the wrong way to go. needs is only there to give you an answer. It's job is not to change program state. It becomes much less useful if you misuse it because some other calling subroutine might want to continue even if needs returns false. Call it once and let it return once. The calling subroutine uses the return value to decide what it should do.
The basic structure involves a table that you pass to needs. This is your validation profile.
sub run_find {
my $arg = shift #_;
return unless needs [
[ sub { $arg }, "arg required" ],
[ sub { exists $lang{$arg} }, "No such language: $arg" ],
];
}
...
}
You construct your table for whatever your requirements are. In needs you just process the table:
sub needs($$) {
my ($table) = #_;
foreach $test ( #$table ) {
my( $sub, $message ) = #$test;
unless( $sub->(...) ) {
print $message;
return
}
}
return 1;
}
Now, the really cool thing with this approach is that you don't have to know the table ahead of time. You can pull that from configuration or some other method. That also means that you can change the table dynamically. Now your code shrinks quite a bit:
sub run_find {
my $arg = shift #_;
return unless needs( $validators{run_find} );
...
}
You cna keep going with this. In Mastering Perl I show a couple of solutions that completely remove that from the code and moves it into a configuration file. That is, you can change the business rules without changing the code.
Remember, almost any time that you are typing the same sequence of characters, you're probably doing it wrong. :)
Sounds like you are re-inventing exception handling.
The needs function should not magically deduce its parent and interrupt the parent's control flow - that's bad manners. What if you add additional functions to the call chain, and you need to go back two or even three functions back? How can you determine this programmatically? Will the caller be expecting his or her function to return early? You should follow the principle of least surprise if you want to avoid bugs - and that means using exceptions to indicate that there is a problem, and having the caller decide how to deal with it:
use Carp;
use Try::Tiny;
sub run_find {
my $arg = shift;
defined $arg or croak "arg required";
exists $lang{$arg} or croak "no such language: $arg";
...
}
sub parent {
try { run_find('foo') }
catch { print $#; }
}
Any code inside of the try block is special: if something dies, the exception is caught and stored in $#. In this case, the catch block is executed, which prints the error to STDOUT and control flow continues as normal.
Disclaimer: exception handling in Perl is a pain. I recommend Try::Tiny, which protects against many common gotchas (and provides familiar try/catch semantics) and Exception::Class to quickly make exception objects so you can distinguish between Perl's errors and your own.
For validation of arguments, you might find it easier to use a CPAN module such as Params::Validate.
You may want to look at a similar recent question by kinopiko:
Is it possible for a Perl subroutine to force its caller to return?
The executive summary for that is: best solution is to use exceptions (die/eval, Try::Tiny, etc...). You van also use GOTO and possibly Continuation::Escape
It doesn't make sense to do things this way; ironically, ya doesn't needs needs.
Here's why.
run_find is poorly written. If your first condition is true, you'll never test the second one since you'll have returned already.
The warn and die functions will provide you printing and/or exiting behavior anyway.
Here's how I would write your run_find sub if you wanted to terminate execution if your argument fails (renamed it to well_defined):
sub well_defined {
my $arg = shift;
$arg or die "arg required";
exists $lang{$arg} or die "no such language: $arg";
return 1;
}
There should be a way to return 0 and warn at the same time, but I'll need to play around with it a little more.
run_find can also be written to return 0 and the appropriate warn message if conditions are not met, and return 1 if they are (renamed to well_defined).
sub well_defined {
my $arg = shift;
$arg or warn "arg required" and return 0;
exists $lang{$arg} or warn "no such language: $arg" and return 0;
return 1;
}
This enables Boolean-esque behavior, as demonstrated below:
perform_calculation $arg if well_defined $arg; # executes only if well-defined

What does the function declaration "sub function($$)" mean?

I have been using Perl for some time, but today I came across this code:
sub function1($$)
{
//snip
}
What does this mean in Perl?
It is a function with a prototype that takes two scalar arguments.
There are strong arguments for not actually using Perl prototypes in general - as noted in the comments below. The strongest argument is probably:
Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl
There's a discussion on StackOverflow from 2008:
SO 297034
There's a possible replacement in the MooseX::Method::Signatures module.
As the other answer mentions, the $$ declares a prototype. What the other answer doesn't say is what prototypes are for. They are not for input validation, they are hints for the parser.
Imagine you have two functions declared like:
sub foo($) { ... }
sub bar($$) { ... }
Now when you write something ambiguous, like:
foo bar 1, 2
Perl knows where to put the parens; bar takes two args, so it consumes the two closest to it. foo takes one arg, so it takes the result of bar and the two args:
foo(bar(1,2))
Another example:
bar foo 2, 3
The same applies; foo takes one arg, so it gets the 2. bar takes two args, so it gets foo(2) and 3:
bar(foo(2),3)
This is a pretty important part of Perl, so dismissing it as "never use" is doing you a disservice. Nearly every internal function uses prototypes, so by understanding how they work in your own code, you can get a better understanding of how they're used by the builtins. Then you can avoid unnecessary parentheses, which makes for more pleasant-looking code.
Finally, one anti-pattern I will warn you against:
package Class;
sub new ($$) { bless $_[1] }
sub method ($) { $_[0]->{whatever} }
When you are calling code as methods (Class->method or $instance->method), the prototype check is completely meaningless. If your code can only be called as a method, adding a prototype is wrong. I have seen some popular modules that do this (hello, XML::Compile), but it's wrong, so don't do it. If you want to document how many args to pass, how about:
sub foo {
my ($self, $a, $b) = #_; # $a and $b are the bars to fooify
....
or
use MooseX::Method::Signatures;
method foo(Bar $a, Bar $b) { # fooify the bars
....
Unlike foo($$), these are meaningful and readable.

How can I override Perl functions, enabling multiple overrides?

some time ago, I asked This question about overriding building perl functions.
How do I do this in a way that allows multiple overrides? The following code yields an infinite recursion.
What's the proper way to do this? If I redefine a function, I don't want to step on someone else's redefinition.
package first;
my $orig_system1;
sub mysystem {
my #args = #_;
print("in first mysystem\n");
return &{$orig_system1}(#args);
}
BEGIN {
if (defined(my $orig = \&CORE::GLOBAL::system)) {
$orig_system1 = $orig;
*CORE::GLOBAL::system = \&first::mysystem;
printf("first defined\n");
} else {
printf("no orig for first\n");
}
}
package main;
system("echo hello world");
The proper way to do it is not to do it. Find some other way to accomplish what you're doing. This technique has all the problems of a global variable, squared. Unless you get your rewrite of the function exactly right, you could break all sorts of code you never even knew existed. And while you might be polite in not blowing over an existing override, somebody else probably will not be.
Overriding system is particularly touchy because it does not have a proper prototype. This is because it does things not expressible in the prototype system. This means your override cannot do some things that system can. Namely...
system {$program} #args;
This is a valid way to call system, though you need to read the exec docs to do it. You might think "oh, well I just won't do that then", but if any module that you use does it, or any module it uses does it, then you're out of luck.
That said, there's little different from overriding any other function politely. You have to trap the existing function and be sure you call it in your new one. Whether you do it before or after is up to you.
The problem in your code is that the proper way to check if a function is defined is defined &function. Taking a code ref, even of an undefined function, will always return a true code ref. I'm not sure why, maybe its like how \undef will return a scalar ref. Why calling this code ref is causing mysystem() to go infinitely recursive is anyone's guess.
There's an additional complexity in that you can't take a reference to a core function. \&CORE::system doesn't do what you mean. Nor can you get at it with a symbolic reference. So if you want to call CORE::system or an existing override depending on which is defined you can't just assign one or the other to a code ref. You have to split your logic.
Here is one way to do it.
package first;
use strict;
use warnings;
sub override_system {
my $after = shift;
my $code;
if( defined &CORE::GLOBAL::system ) {
my $original = \&CORE::GLOBAL::system;
$code = sub {
my $exit = $original->(#_);
return $after->($exit, #_);
};
}
else {
$code = sub {
my $exit = CORE::system(#_);
return $after->($exit, #_);
};
}
no warnings 'redefine';
*CORE::GLOBAL::system = $code;
}
sub mysystem {
my($exit, #args) = #_;
print("in first mysystem, got $exit and #args\n");
}
BEGIN { override_system(\&mysystem) }
package main;
system("echo hello world");
Note that I've changed mysystem() to merely be a hook that runs after the real system. It gets all the arguments and the exit code, and it can change the exit code, but it doesn't change what system() actually does. Adding before/after hooks is the only thing you can do if you want to honor an existing override. Its quite a bit safer anyway. The mess of overriding system is now in a subroutine to keep BEGIN from getting too cluttered.
You should be able to modify this for your needs.