What does the function declaration "sub function($$)" mean? - perl

I have been using Perl for some time, but today I came across this code:
sub function1($$)
{
//snip
}
What does this mean in Perl?

It is a function with a prototype that takes two scalar arguments.
There are strong arguments for not actually using Perl prototypes in general - as noted in the comments below. The strongest argument is probably:
Far More Than Everything You've Ever Wanted to Know about Prototypes in Perl
There's a discussion on StackOverflow from 2008:
SO 297034
There's a possible replacement in the MooseX::Method::Signatures module.

As the other answer mentions, the $$ declares a prototype. What the other answer doesn't say is what prototypes are for. They are not for input validation, they are hints for the parser.
Imagine you have two functions declared like:
sub foo($) { ... }
sub bar($$) { ... }
Now when you write something ambiguous, like:
foo bar 1, 2
Perl knows where to put the parens; bar takes two args, so it consumes the two closest to it. foo takes one arg, so it takes the result of bar and the two args:
foo(bar(1,2))
Another example:
bar foo 2, 3
The same applies; foo takes one arg, so it gets the 2. bar takes two args, so it gets foo(2) and 3:
bar(foo(2),3)
This is a pretty important part of Perl, so dismissing it as "never use" is doing you a disservice. Nearly every internal function uses prototypes, so by understanding how they work in your own code, you can get a better understanding of how they're used by the builtins. Then you can avoid unnecessary parentheses, which makes for more pleasant-looking code.
Finally, one anti-pattern I will warn you against:
package Class;
sub new ($$) { bless $_[1] }
sub method ($) { $_[0]->{whatever} }
When you are calling code as methods (Class->method or $instance->method), the prototype check is completely meaningless. If your code can only be called as a method, adding a prototype is wrong. I have seen some popular modules that do this (hello, XML::Compile), but it's wrong, so don't do it. If you want to document how many args to pass, how about:
sub foo {
my ($self, $a, $b) = #_; # $a and $b are the bars to fooify
....
or
use MooseX::Method::Signatures;
method foo(Bar $a, Bar $b) { # fooify the bars
....
Unlike foo($$), these are meaningful and readable.

Related

Perl sub returns a subroutine

I haven't used Perl for around 20 years, and this is confusing me. I've g******d for it, but I obviously haven't used a suitable search string because I haven't found anything relating to this...
Why would I want to do the following? I understand what it's doing, but the "why" escapes me. Why not just return 0 or 1 to begin with?
I'm working on some code where a sub uses "return sub"; here's a very truncated example e.g.
sub test1 {
$a = shift #_;
if ($a eq "cat") {
return sub {
print("cat test OK\n");
return 0;
}
}
# default if "cat" wasn't the argument
return sub {
print("test for cat did not work\n");
return 1;
}
}
$c = test1("cat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
$c = test1("bat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
In your code there is no reason to return a sub. However, with a little tweak
sub test1 {
my $animal = shift #_;
if ($animal eq "cat" || $animal eq "dog") {
return sub {
print("$animal test OK\n");
return 0;
};
}
# default if "cat" or "dog" wasn't the argument
return sub {
print("test for cat or dog did not work\n");
return 1;
};
}
We now have a closure around $animal this saves memory as the test for cat and dog share the same code. Note that this only works with my variables. Also note that $a and $b are slightly special to Perl, they are used in the block of code that you can pass to the sort function and bypass some of the checks on visibility so it's best to avoid them for anything except sort.
You probably want to search "perl closures".
There are many reasons that you'd want to return a code reference, but it's not something I can shortly answer in a StackOverflow question. Mark Jason Dominus's Higher Order Perl is a good way to expand your mind, and we cover a little of that in Intermediate Perl.
I wrote File::Find::Closures as a way to demonstrate this is class. Each subroutine in that module returns two code references—one for the callback to File::Find and the other as a way to access the results. The two share a common variable which nothing else can access.
Notice in your case, you aren't merely calling a subroutine to "get a zero". It's doing other things. Even in your simple example there's some output. Some behavior is then deferred until you actually use the result for something.
Having said that, we have no chance of understanding why the programmer who wrote your particular code did it. One plausible guess was that the system was set up for more complex situations and you're looking at a trivial example that fits into that. Another plausible guess was that the programmer just learned that feature and fell in love with it then used it everywhere for a year. There's always a reason, but that doesn't mean there's always a good reason.

Not enough arguments when redefining a subroutine

When I redefine my own subroutine (and not a Perl built-in function), as below :
perl -ce 'sub a($$$){} sub b {a(#_)}'
I get this error :
Not enough arguments for main::a at -e line 1, near "#_)"
I'm wondering why.
Edit :
The word "redefine" is maybe not well chosen. But in my case (and I probably should have explained what I was trying to do originally), I want to redefine (and here "redefine" makes sense) the Test::More::is function by printing first Date and Time before the test result.
Here's what I've done :
Test::More.pm :
sub is ($$;$) {
my $tb = Test::More->builder;
return $tb->is_eq(#_);
}
MyModule.pm :
sub is ($$;$) {
my $t = gmtime(time);
my $date = $t->ymd('/').' '.$t->hms.' ';
print($date);
Test::More::is(#_);
}
The prototype that you have given your subroutine (copied from Test::More::is) says that your subroutine requires two mandatory parameters and one optional one. Passing in a single array will not satisfy that prototype - it is seen as a single parameter which will be evaluated in scalar context.
The fix is to retrieve the two (or three) parameters passed to your subroutine and to pass them, individually, to Test::More::is.
sub is ($$;$) {
my ($got, $expected, $test_name) = #_;
my $t = gmtime(time);
my $date = $t->ymd('/').' '.$t->hms.' ';
print($date);
Test::More::is($got, $expected, $test_name);
}
The problem has nothing to do with your use of a prototype or the fact that you are redefining a subroutine (which, strictly, you aren't as the two subroutines are in different packages) but it's because Test::More::is() has a prototype.
You are not redefining anything here.
You've set a prototype for your sub a by saying sub a($$$). The dollar signs in the function definition tell Perl that this sub has exactly three scalar parameters. When you call it with a(#_), Perl doesn't know how many elements will be in that list, thus it doesn't know how many arguments the call will have, and fails at compile time.
Don't mess with prototypes. You probably don't need them.
Instead, if you know your sub will need three arguments, explicitly grab them where you call it.
sub a($$$) {
...
}
sub b {
my ($one, $two, $three) = #_;
a($one, $two, $three);
}
Or better, don't use the prototype at all.
Also, a and b are terrible names. Don't use them.
In Perl, prototypes don't validate arguments so much as alter parsing rules. $$;$ means the sub expects the caller to match is(EXPR, EXPR) or is(EXPR, EXPR, EXPR).
In this case, bypassing the prototype is ideal.
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
return &Test::More::is(#_);
}
Since you don't care if Test::More::is modifies yours #_, the following is a simple optimization:
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
return &Test::More::is;
}
If Test::More::is uses caller, you'll find the following useful:
sub is($$;$) {
print gmtime->strftime("%Y/%m/%d %H:%M:%S ");
goto &Test::More::is;
}

perl constructor keyword 'new'

I am new to Perl and currently learning Perl object oriented and came across writing a constructor.
It looks like when using new for the name of the subroutine the first parameter will be the package name.
Must the constructor be using the keyword new? Or is it because when we are calling the new subroutine using the packagename, then the first parameter to be passed in will be package name?
packagename->new;
and when the subroutine have other name it will be the first parameter will be the reference to an object? Or is it because when the subroutine is call via the reference to the object so that the first parameter to be passed in will be the reference to the object?
$objRef->subroutine;
NB: All examples below are simplified for instructional purposes.
On Methods
Yes, you are correct. The first argument to your new function, if invoked as a method, will be the thing you invoked it against.
There are two “flavors” of invoking a method, but the result is the same either way. One flavor relies upon an operator, the binary -> operator. The other flavor relies on ordering of arguments, the way bitransitive verbs work in English. Most people use the dative/bitransitive style only with built-ins — and perhaps with constructors, but seldom anything else.
Under most (but not quite all) circumstances, these first two are equivalent:
1. Dative Invocation of Methods
This is the positional one, the one that uses word-order to determine what’s going on.
use Some::Package;
my $obj1 = new Some::Package NAME => "fred";
Notice we use no method arrow there: there is no -> as written. This is what Perl itself uses with many of its own functions, like
printf STDERR "%-20s: %5d\n", $name, $number;
Which just about everyone prefers to the equivalent:
STDERR->printf("%-20s: %5d\n", $name, $number);
However, these days that sort of dative invocation is used almost exclusively for built-ins, because people keep getting things confused.
2. Arrow Invocation of Methods
The arrow invocation is for the most part clearer and cleaner, and less likely to get you tangled up in the weeds of Perl parsing oddities. Note I said less likely; I did not say that it was free of all infelicities. But let’s just pretend so for the purposes of this answer.
use Some::Package;
my $obj2 = Some::Package->new(NAME => "fred");
At run time, barring any fancy oddities or inheritance matters, the actual function call would be
Some::Package::new("Some::Package", "NAME", "fred");
For example, if you were in the Perl debugger and did a stack dump, it would have something like the previous line in its call chain.
Since invoking a method always prefixes the parameter list with invocant, all functions that will be invoked as methods must account for that “extra” first argument. This is very easily done:
package Some::Package;
sub new {
my($classname, #arguments) = #_;
my $obj = { #arguments };
bless $obj, $classname;
return $obj;
}
This is just an extremely simplified example of the new most frequent ways to call constructors, and what happens on the inside. In actual production code, the constructor would be much more careful.
Methods and Indirection
Sometimes you don’t know the class name or the method name at compile time, so you need to use a variable to hold one or the other, or both. Indirection in programming is something different from indirect objects in natural language. Indirection just means you have a variable that contains something else, so you use the variable to get at its contents.
print 3.14; # print a number directly
$var = 3.14; # or indirectly
print $var;
We can use variables to hold other things involved in method invocation that merely the method’s arguments.
3. Arrow Invocation with Indirected Method Name:
If you don’t know the method name, then you can put its name in a variable. Only try this with arrow invocation, not with dative invocation.
use Some::Package;
my $action = (rand(2) < 1) ? "new" : "old";
my $obj = Some::Package->$action(NAME => "fido");
Here the method name itself is unknown until run-time.
4. Arrow Invocation with Indirected Class Name:
Here we use a variable to contain the name of the class we want to use.
my $class = (rand(2) < 1)
? "Fancy::Class"
: "Simple::Class";
my $obj3 = $class->new(NAME => "fred");
Now we randomly pick one class or another.
You can actually use dative invocation this way, too:
my $obj3 = new $class NAME => "fred";
But that isn’t usually done with user methods. It does sometimes happen with built-ins, though.
my $fh = ($error_count == 0) ? *STDOUT : *STDERR;
printf $fh "Error count: %d.\n", $error_count;
That’s because trying to use an expression in the dative slot isn’t going to work in general without a block around it; it can otherwise only be a simple scalar variable, not even a single element from an array or hash.
printf { ($error_count == 0) ? *STDOUT : *STDERR } "Error count: %d.\n", $error_count;
Or more simply:
print { $fh{$filename} } "Some data.\n";
Which is pretty darned ugly.
Let the invoker beware
Note that this doesn’t work perfectly. A literal in the dative object slot works differently than a variable does there. For example, with literal filehandles:
print STDERR;
means
print STDERR $_;
but if you use indirect filehandles, like this:
print $fh;
That actually means
print STDOUT $fh;
which is unlikely to mean what you wanted, which was probably this:
print $fh $_;
aka
$fh->print($_);
Advanced Usage: Dual-Nature Methods
The thing about the method invocation arrow -> is that it is agnostic about whether its left-hand operand is a string representing a class name or a blessed reference representing an object instance.
Of course, nothing formally requires that $class contain a package name. It may be either, and if so, it is up to the method itself to do the right thing.
use Some::Class;
my $class = "Some::Class";
my $obj = $class->new(NAME => "Orlando");
my $invocant = (rand(2) < 1) ? $class : $obj;
$invocant->any_debug(1);
That requires a pretty fancy any_debug method, one that does something different depending on whether its invocant was blessed or not:
package Some::Class;
use Scalar::Util qw(blessed);
sub new {
my($classname, #arguments) = #_;
my $obj = { #arguments };
bless $obj, $classname;
return $obj;
}
sub any_debug {
my($invocant, $value) = #_;
if (blessed($invocant)) {
$invocant->obj_debug($value);
} else {
$invocant->class_debug($value);
}
}
sub obj_debug {
my($self, $value) = #_;
$self->{DEBUG} = $value;
}
my $Global_Debug;
sub class_debug {
my($classname, $value) = #_;
$Global_Debug = $value;
}
However, this is a rather advanced and subtle technique, one applicable in only a few uncommon situations. It is not recommended for most situations, as it can be confusing if not handled properly — and perhaps even if it is.
It is not first parameter to new, but indirect object syntax,
perl -MO=Deparse -e 'my $o = new X 1, 2'
which gets parsed as
my $o = 'X'->new(1, 2);
From perldoc,
Perl suports another method invocation syntax called "indirect object" notation. This syntax is called "indirect" because the method comes before the object it is being invoked on.
That being said, new is not some kind of reserved word for constructor invocation, but name of method/constructor itself, which in perl is not enforced (ie. DBI has connect constructor)

Best way to equate 2 subroutines?

I have many modules each of them have a sub insert_info and sub update_info methods. From time to time, the sub update_info and sub insert_info are the same. But I don't want to use only one of these methods when that happens, because in general they are not the same. So How I make the 2 methods equal?
Is this the only way?
sub insert_info {
# code......
}
sub update_info { insert_info(); }
Alias via typeglob
*update_info = \&insert_info;
Adding BEGIN may avoid problems
BEGIN { *update_info = \&insert_info; }
This helps ensure that it runs before other things, which may call it.
Comments on your Example
Also, your sub update_info { insert_info(); } is not a copy because it will always call insert_info with no parameters. If you passed update_info any values (like update_info('someval')), they would not be passed on to insert_info. Furthermore, they are both declared and defined subroutines - both taking memory.
If you wanted to declare it how you did and automatically pass along the arguments to the inner function, you could do sub update_info { insert_info(#_); }, or better is sub update_info { &insert_info }, since the & without any argument list, will automatically pass along #_.
Still these take more memory than using the typeglob assignment, listed at the top.
This is one of the rare opportunities to use goto without being shamed for it.
sub update_info {
goto &insert_info;
}
This has the benefit of passing along any arguments to the inner function, and cleaning up the caller() stack to remove the call to the outer function.
I suggest you use the Sub::Alias module from CPAN, which has the advantages of being explicit and self-documenting as well as neat and clear in use.
Your code becomes
use Sub::Alias 'alias';
sub insert_info {
...
}
alias update_info => 'insert_info';

Why does this Perl produce "Not a CODE reference?"

I need to remove a method from the Perl symbol table at runtime. I attempted to do this using undef &Square::area, which does delete the function but leaves some traces behind. Specifically, when $square->area() is called, Perl complains that it is "Not a CODE reference" instead of "Undefined subroutine &Square::area called" which is what I expect.
You might ask, "Why does it matter? You deleted the function, why would you call it?" The answer is that I'm not calling it, Perl is. Square inherits from Rectangle, and I want the inheritance chain to pass $square->area through to &Rectangle::area, but instead of skipping Square where the method doesn't exist and then falling through to Rectangle's area(), the method call dies with "Not a CODE reference."
Oddly, this appears to only happen when &Square::area was defined by typeglob assignment (e.g. *area = sub {...}). If the function is defined using the standard sub area {} approach, the code works as expected.
Also interesting, undefining the whole glob works as expected. Just not undefining the subroutine itself.
Here's a short example that illustrates the symptom, and contrasts with correct behavior:
#!/usr/bin/env perl
use strict;
use warnings;
# This generates "Not a CODE reference". Why?
sub howdy; *howdy = sub { "Howdy!\n" };
undef &howdy;
eval { howdy };
print $#;
# Undefined subroutine &main::hi called (as expected)
sub hi { "Hi!\n" }
undef &hi;
eval { hi };
print $#;
# Undefined subroutine &main::hello called (as expected)
sub hello; *hello = sub { "Hello!\n" };
undef *hello;
eval { hello };
print $#;
Update: I have since solved this problem using Package::Stash (thanks #Ether), but I'm still confused by why it's happening in the first place. perldoc perlmod says:
package main;
sub Some_package::foo { ... } # &foo defined in Some_package
This is just a shorthand for a typeglob assignment at compile time:
BEGIN { *Some_package::foo = sub { ... } }
But it appears that it isn't just shorthand, because the two cause different behavior after undefining the function. I'd appreciate if someone could tell me whether this is a case of (1) incorrect docs, (2) bug in perl, or (3) PEBCAK.
Manipulating symbol table references yourself is bound to get you into trouble, as there are lots of little fiddly things that are hard to get right. Fortunately there is a module that does all the heavy lifting for you, Package::Stash -- so just call its methods add_package_symbol and remove_package_symbol as needed.
Another good method installer that you may want to check out is Sub::Install -- especially nice if you want to generate lots of similar functions.
As to why your approach is not correct, let's take a look at the symbol table after deleting the code reference:
sub foo { "foo!\n"}
sub howdy; *howdy = sub { "Howdy!\n" };
undef &howdy;
eval { howdy };
print $#;
use Data::Dumper;
no strict 'refs';
print Dumper(\%{"main::"});
prints (abridged):
$VAR1 = {
'howdy' => *::howdy,
'foo' => *::foo,
};
As you can see, the 'howdy' slot is still present - undefining &howdy doesn't actually do anything enough. You need to explicitly remove the glob slot, *howdy.
The reason it happens is precisely because you assigned a typeglob.
When you delete the CODE symbol, the rest of typeglob is still lingering, so when you try to execute howdy it will point to the non-CODE piece of typeglob.