Perl subroutine getting arguments without sending any

I'm attempting to do some hacking in some Git source code (as in the source code for Git, not just some random piece of code managed by Git). The bit I'm looking at is in Perl, and I'm having trouble understanding what's going on.
I have very little experience (and that several years old) of Perl; I've asked a couple of friends with more experience for advice, but they've turned up nothing.
The relevant bit of code is in the v1.8.1.5 source code, where git-svn.perl's cmd_fetch function includes the line:
$_fetch_all ? $gs->fetch_all : $gs->fetch;
My best reading of this is that it will call either the fetch or fetch_all functions (I can't see how it could be doing anything else, certainly).
In SVN.pm we find that fetch function, which starts with the following line:
my ($self, $min_rev, $max_rev, @parents) = @_;
I recognise that as collecting the function arguments, but (and finally, my question): where do these arguments get passed in?

The function called with the arrow notation is called as a method. The first argument to a method is the object whose method was called, so $self will be set to $gs. No further arguments are passed, so $min_rev and $max_rev are undef and @parents is empty.

First, I really hope there is an assignment to the left of the code you cited. Using the conditional operator for control flow is a crime against humanity. That said, your intuition about what happens is correct: Depending on the value of $_fetch_all, either $gs->fetch or $gs->fetch_all is called. Now, on to the argument question.
Perl method calls pass arguments by prepending the invocant to the list of arguments, so the call
$gs->fetch
results in the arguments ($gs) being passed into the method as @_. The argument assignment line
my ($self, $min_rev, $max_rev, @parents) = @_;
then list-assigns
my ($self, $min_rev, $max_rev, @parents) = ($gs);
List assignments assign corresponding elements until an array or hash on the left-hand side eats all the arguments or the list of assignees is exhausted, padding the list with undef as needed. So $self gets $gs, $min_rev and $max_rev get undef, and @parents gets the empty list. It turns out that these are all valid values, and so nothing untoward happens.
If you wanted to affect the values of $min_rev et al., you would alter the call site to read
$gs->fetch(5, 9)
(it turns out @parents is ignored, so I don't know what its legal values might be).
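A minimal, self-contained sketch of the behaviour described above. The package Hypothetical::GS is an invented stand-in for the git-svn object (it is not the real class from the Git source); its fetch method only reports what each variable received, so you can see the invocant prepended to @_ and the missing arguments left undef.

```perl
use strict;
use warnings;

package Hypothetical::GS;    # invented stand-in, not the real git-svn class

sub new { my ($class) = @_; return bless {}, $class }

sub fetch {
    my ($self, $min_rev, $max_rev, @parents) = @_;
    # Report what each variable received from the list assignment.
    return {
        self      => $self,
        min_rev   => $min_rev,
        max_rev   => $max_rev,
        n_parents => scalar @parents,
    };
}

package main;

my $gs = Hypothetical::GS->new;

my $no_args   = $gs->fetch;          # @_ is ($gs): the rest become undef / empty
my $with_args = $gs->fetch(5, 9);    # @_ is ($gs, 5, 9)

for my $r ($no_args, $with_args) {
    printf "min=%s max=%s parents=%d\n",
        $r->{min_rev} // 'undef', $r->{max_rev} // 'undef', $r->{n_parents};
}
```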

Related

Basic Object Oriented subfunction definition and use in Perl

Sorry to bother the community for this but I have unfortunately to code in Perl :'(. It is about an OO perl code I want to understand but I am failing to put all the pieces together.
The following is a template of code that represents somehow what I am currently looking at. The following is the class MyClass:
package Namespace::MyClass;
sub new($)
{
my ($class) = @_;
my $self = { };
bless ($self, $class);
}
sub init($$)
{
my ($self, $param1) = @_;
$self->{whatever} = ($param1, $param1, $param1);
}
1; # a module loaded with require must return a true value
and then the following is a script.pl that supposedly uses the class:
#!/path/to/your/perl
require Namespace::MyClass;
my $myClass = new Namespace::MyClass();
$myClass->init("data_for_param1");
There may be errors, but I am more interested in having the following questions answered than in having my possibly wrong code corrected:
Question group 1: "$" in a sub definition means I need to supply one parameter, right? If so, why does new ask for one when I do not supply it? Does this have to do with the call in the script using (), or is it something similar to how Python works (self is implied)?
Question group 2: is it for the same reason that the init subroutine (here a method) is declared to expect two parameters? If so, does blessing somehow imply that a self is always passed to every function in the module?
I ask this because I saw that in non-blessed modules one $ = one parameter.
Thank you for your time.
QG1:
Prototypes (like "$") mean exactly nothing in Method calls.
Method calls are not influenced by prototypes either, because the function to be called is indeterminate at compile time, since the exact code called depends on inheritance.
Most experienced Perl folk avoid prototypes entirely unless they are trying to imitate a built-in function. Some PHBs inexperienced in Perl mandate their use under the mistaken idea that they work like prototypes in other languages.
The 1st parameter of a Method call is the Object (Blessed Ref) or Class Name (String) that called the Method. In the case of your new Method that would be 'Namespace::MyClass'.
Word to the wise: Also avoid indirect Method calls. Rewrite your line using the direct Method call as follows: my $myClass = Namespace::MyClass->new;
QG2:
Your init method is getting $myClass as its 1st parameter because it is what 'called' the method. The 2nd parameter is from the parameter list. Blessing binds the name of the Class to the Reference, so that when a method call is seen, Perl knows which class to start searching for the correct sub in. If the correct sub is not immediately found, the search continues in the classes named in the class's @ISA array.
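A small sketch putting QG1 and QG2 together, assuming a hypothetical parent class Namespace::Base and assuming init was meant to store an array reference (the question's code assigns a list to a scalar, which keeps only the last element). The prototypes are kept as in the question to show that method calls ignore them.

```perl
use strict;
use warnings;

package Namespace::Base;    # hypothetical parent class for this sketch

sub describe {
    my ($self) = @_;
    return ref($self) . ' handled by Namespace::Base';
}

package Namespace::MyClass;
our @ISA = ('Namespace::Base');    # where method lookup continues

sub new($) {                       # prototype kept from the question; ignored by ->new
    my ($class) = @_;              # 'Namespace::MyClass', supplied by the -> call
    my $self = {};
    return bless $self, $class;
}

sub init($$) {                     # also ignored by ->init
    my ($self, $param1) = @_;      # $self is the object the method was called on
    $self->{whatever} = [$param1, $param1, $param1];    # assumed intent: an array ref
    return $self;
}

package main;

my $myClass = Namespace::MyClass->new;    # direct method call, no arguments given
$myClass->init('data_for_param1');        # one argument given, two received
my $answer = $myClass->describe;          # not in MyClass, found via @ISA
```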
Don't use prototypes! They don't do what you think they do.
Prototypes in Perl are mainly used to allow functions to be called without parentheses, or to let functions that take array references accept the array name directly, like pop or push do. Otherwise, prototypes can cause more trouble and heartbreak than experienced by most soap opera characters.
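To illustrate the push-like use case, here is a sketch with a hypothetical my_push. The (\@@) prototype makes Perl pass the first array by reference, so the caller can write the array name without a backslash, just as with the built-in push. This only applies to plain function calls compiled after the declaration; a method call would ignore the prototype entirely.

```perl
use strict;
use warnings;

# Hypothetical push-alike. The \@ in the prototype makes Perl pass the
# first argument as a reference to the named array.
sub my_push (\@@) {
    my ($array_ref, @values) = @_;
    push @$array_ref, @values;
    return scalar @$array_ref;    # new length, like the built-in push
}

my @list = (1, 2);
my $size = my_push @list, 3, 4;   # no backslash or parentheses needed
```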
Is what you actually want to do to validate parameters? If so, that is not the purpose of prototypes. You could try using signatures, which were new and still experimental when this was written (they became stable in Perl 5.36); some considered the lack of a stable signatures feature a flaw of Perl. The alternatives are modules from CPAN, or writing code in your subs/methods that explicitly validates the params.

Perl references. How do we know it is one?

I am new to Perl and reading about references.
I cannot understand how one knows whether the variable one is working on is a reference.
For instance if I understand correctly, this:
$b = $a could be assigning scalars or references. How do we know which it is?
In C or C++ we would know via the function signature (*a, &a or **a). But in Perl there is no signature of parameters.
So how do we know in code what is a reference and what is not? Or if it is a reference to scalar or array or hash or another reference?
Perl has a ref function that you can use for that:
Returns a non-empty string if EXPR is a reference, the empty string otherwise. [...]
The string returned (if non-empty) will tell you the type of object the reference references.
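A quick sketch of what ref returns for common cases. The variable names and the class name My::Class are made up for illustration; reftype comes from the core Scalar::Util module and reports the underlying type even for blessed references.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

my $plain = 42;
my @array = (1, 2);
my %hash  = (a => 1);

my %kind = (
    plain  => ref($plain),                   # ''           not a reference
    scalar => ref(\$plain),                  # 'SCALAR'
    array  => ref(\@array),                  # 'ARRAY'
    hash   => ref(\%hash),                   # 'HASH'
    code   => ref(sub { 1 }),                # 'CODE'
    refref => ref(\\$plain),                 # 'REF'        reference to a reference
    object => ref(bless {}, 'My::Class'),    # 'My::Class'  blessed: the class name
);

# reftype looks through the blessing to the underlying data type.
my $underlying = reftype(bless {}, 'My::Class');    # 'HASH'
```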
You're asking the wrong question.
While there is a function called ref and another called reftype, these are not functions you should ever need to use.
It's bad to check the type of variables, because overloading and magic mean there's no effective way to know what a value supports without actually using it as intended.
For example, say you designed a function that accepts a reference or a string. That would be a bad design because an object that overloads stringification is both.
A good interface would use context to differentiate the arguments. For example, it could differentiate based on the number of arguments,
foo($point_obj)
-vs-
foo(x => $x, y => $y)
based on the value of other arguments,
foo(fh => $fh)
-vs-
foo(str => $file_contents)
or based on the choice of function called
foo_from_fh($fh)
-vs-
foo($file_contents)
So the answer is: you know it's a reference because your documentation instructs the caller of your function to pass a reference. If you are passed something other than a reference and it's used as a reference, the caller will get a strict refs error for their mistake.
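A rough sketch of the "value of other arguments" style from the list above, using a hypothetical count_newlines function: the caller states whether it is passing a filehandle or a string, so the function never has to guess with ref. The in-memory filehandle is only there to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical function: the argument name says what kind of input is passed,
# so the function never needs to inspect it with ref().
sub count_newlines {
    my (%args) = @_;
    my $text;
    if (exists $args{fh}) {
        my $fh = $args{fh};
        local $/;                     # slurp the whole handle at once
        $text = <$fh>;
    }
    elsif (exists $args{str}) {
        $text = $args{str};
    }
    else {
        die "count_newlines: pass fh => \$fh or str => \$string\n";
    }
    $text = '' unless defined $text;
    my $count = () = $text =~ /\n/g;  # list-assignment trick to count matches
    return $count;
}

my $data = "x\ny\nz\n";
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my $from_fh  = count_newlines(fh  => $in);
my $from_str = count_newlines(str => "a\nb\n");
```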
The ref function is what you're looking for. Documentation is available at http://perldoc.perl.org/functions/ref.html
ref EXPR
Returns a non-empty string if EXPR is a reference, the empty string otherwise. If EXPR is
not specified, $_ will be used. The value returned depends on the type of thing the
reference is a reference to...

Why do '::' and '->' work (sort of) interchangeably when calling methods from Perl modules?

I keep getting :: confused with -> when calling subroutines from modules. I know that :: is more related to paths and where the module/subroutine is, and -> is used for objects, but I don't really understand why I can seemingly interchange the two without immediately getting errors.
I have Perl modules which are part of a larger package, e.g. FullProgram::Part1
I'm just about getting to grips with modules, but am still on wobbly ground when it comes to Perl objects, and I've been accidentally doing this:
FullProgram::Part1::subroutine1();
instead of
FullProgram::Part1->subroutine1();
So when I've been passing a hash ref to subroutine1, having been careful to use $class/$self to deal with the object reference, and then accidentally used ::, I end up pulling my hair out wondering why my hash ref seems to disappear. I have learnt my lesson, but would really like an explanation of the difference. I have read the perldocs and various websites on these, but I haven't seen any comparisons between the two (quite hard to google...)
All help appreciated - always good to understand what I'm doing!
There's no inherent difference between a vanilla sub and one that's a method. It's all in how you call it.
Class::foo('a');
This will call Class::foo. If Class::foo doesn't exist, the inheritance tree will not be checked. Class::foo will be passed only the provided arguments ('a').
It's roughly the same as: my $sub = \&Class::foo; $sub->('a');
Class->foo('a');
This will call Class::foo, or foo in one of its base classes if Class::foo doesn't exist. The invocant (what's on the left of the ->) will be passed as an argument.
It's roughly the same as: my $sub = Class->can('foo'); $sub->('Class', 'a');
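A runnable sketch of the two call styles, with made-up packages Base and Child (Child inherits from Base through @ISA). The method call finds the inherited sub and prepends the invocant, while the fully qualified call neither searches the inheritance tree nor adds any argument.

```perl
use strict;
use warnings;

package Base;                       # hypothetical parent class
sub foo { return join ',', @_ }     # returns whatever arguments it received

package Child;                      # hypothetical subclass with no foo of its own
our @ISA = ('Base');

package main;

my $via_method = Child->foo('a');                     # finds Base::foo, prepends 'Child'
my $via_can    = Child->can('foo')->('Child', 'a');   # the "roughly the same" form
my $plain_call = Base::foo('a');                      # only the arguments you pass
my $no_lookup  = eval { Child::foo('a'); 1 } ? 'found' : 'not found';
```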
FullProgram::Part1::subroutine1();
calls the subroutine subroutine1 of the package FullProgram::Part1 with an empty parameter list while
FullProgram::Part1->subroutine1();
calls the same subroutine with the package name as the first argument (note that it gets a little bit more complex when you're subclassing). This syntax is used by constructor methods that need the class name for building objects of subclasses like
sub new {
my ($class, @args) = @_;
...
return bless $thing, $class;
}
FYI: in Perl OO you see $object->method(@args), which calls Class::method with the object (a blessed reference) as the first argument instead of the package/class name. In a method like this, the subroutine could work like this:
sub method {
my ($self, $foo, $bar) = @_;
$self->do_something_with($bar);
# ...
}
which will call the subroutine do_something_with with the object as the first argument again, followed by the value of $bar, which was the second list element you originally passed to method in @args. That way the object itself doesn't get lost.
For more information about how the inheritance tree becomes important when calling methods, please see ikegami's answer!
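A minimal sketch of the constructor and method pattern described above. Counter, add, do_something_with and the subclass Counter::Sub are hypothetical names; the point is that new blesses into whatever class name arrives in $class, so an inherited constructor still produces an object of the subclass, and each method passes $self along.

```perl
use strict;
use warnings;

package Counter;                          # hypothetical class

sub new {
    my ($class, @args) = @_;              # whatever was left of the ->
    my $self = { count => $args[0] // 0 };
    return bless $self, $class;           # bless into that class, not hard-coded 'Counter'
}

sub add {
    my ($self, $foo, $bar) = @_;          # $foo is unused, as in the answer's sketch
    $self->do_something_with($bar);       # the object is passed along again
    return $self;
}

sub do_something_with {
    my ($self, $value) = @_;
    $self->{count} += $value;
    return;
}

package Counter::Sub;                     # hypothetical subclass without its own new
our @ISA = ('Counter');

package main;

my $c = Counter::Sub->new(10);            # inherited new receives 'Counter::Sub'
$c->add('ignored', 5);
```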
Use both!
use Module::Two;
Module::Two::->class_method();
Note that this works but also protects you against an ambiguity there; the simple
Module::Two->class_method();
will be interpreted as:
Module::Two()->class_method();
if there happens to be a sub named Two in package Module. That calls the subroutine Two in Module and then tries to call class_method on its return value, likely resulting in a runtime error, or in calling a class or instance method in some completely different class. Whether such a sub exists is something you shouldn't depend on one way or the other, since what is in Module is none of your code's business.
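A sketch of the ambiguity, with hypothetical packages: Module defines a sub named Two, and Other is the unrelated class that sub's return value happens to name. The trailing :: version always treats Module::Two as a package; the plain version is the case perlobj warns about, so read the comment on it as the documented behaviour rather than something to rely on.

```perl
use strict;
use warnings;

package Module::Two;
sub class_method { my ($class) = @_; return "Module::Two::class_method called on $class" }

package Module;
# Hypothetical sub whose fully qualified name collides with the package Module::Two.
sub Two { return 'Other' }

package Other;
sub class_method { return 'Other::class_method' }

package main;

# The trailing :: forces Module::Two to be read as a package name.
my $safe = Module::Two::->class_method();

# Without it, perlobj documents that an existing sub of that name wins,
# making this Module::Two()->class_method(), i.e. a call on the string 'Other'.
my $ambiguous = Module::Two->class_method();
```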
Historically, Perl didn't have any OO, and functions from packages were called with the FullProgram::Part1::subroutine1(); syntax, or even earlier with the (now deprecated) FullProgram'Part1'subroutine1(); syntax.
Later, OOP was implemented with the -> sign, but not much actually changed. FullProgram::Part1->subroutine1(); calls subroutine1 with FullProgram::Part1 as the 1st parameter. You can see this when you create an object: my $cgi = CGI->new(). When you then call a method on this object, the left part also goes in as the first parameter to the function: $cgi->param(''). That's how param gets the object it was called on (usually named $self). That's it: -> is a hack for OOP. As a result, Perl does not have classes as such (packages act as them), but it does have objects (which are a hack too: they are blessed references).
Off-topic: you can also write my $cgi = new CGI;, which is the same as CGI->new. A similar handle-first syntax appears when you write print STDOUT "text\n";, although there print is the built-in function rather than a call to IO::Handle::print().

Why aren't both versions of this code failing the -c Perl check?

The new method of Parse::RecDescent has this prototype:
sub new ($$$)
{
# code goes here
}
and if I create an object like this:
my $parser = Parse::RecDescent->new($grammar);
it will create a parser, and the method will receive 2 parameters, "Parse::RecDescent" and $grammar, right? If I try to create an object like:
Parse::RecDescent::new("Parse::RecDescent",$grammar)
this will fail saying "Not enough arguments for Parse::RecDescent::new", and I understand this message. I'm only passing 2 parameters. However, I don't understand why the arrow version works.
Can you explain?
Function prototypes are not checked when you call it as an OO-style method. In addition, you bypass prototype checking when you call a sub with &, as in &sub(arg0, arg1..);
From perldoc perlsub:
Not only does the "&" form make the argument list optional, it also disables any prototype checking on arguments you do provide. This is partly for historical reasons, and partly for having a convenient way to cheat if you know what you're doing. See Prototypes below.
Method calls are not influenced by prototypes either, because the function to be called is indeterminate at compile time, since the exact code called depends on inheritance.
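A sketch showing both rules from perlsub, using a hypothetical Hypothetical::Parser with the same ($$$) prototype as the question (it is not the real Parse::RecDescent). The direct call is compiled inside a string eval only so that its compile-time error can be caught and inspected instead of aborting the script.

```perl
use strict;
use warnings;

package Hypothetical::Parser;    # stand-in; not the real Parse::RecDescent

# Same ($$$) prototype as in the question.
sub new ($$$) {
    my ($class, @args) = @_;
    return bless { args => [@args] }, $class;
}

package main;

# Method call: the prototype is not consulted, so two arguments are fine.
my $parser = Hypothetical::Parser->new('grammar text');

# Plain function call with too few arguments: rejected when compiled.
my $compiled = eval q{ Hypothetical::Parser::new('Hypothetical::Parser', 'grammar text'); 1 };
my $error = $@;

# The & form disables prototype checking, so this call is accepted.
my $bypassed = &Hypothetical::Parser::new('Hypothetical::Parser', 'grammar text');
```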
While Parse::RecDescent::new("Parse::RecDescent", $grammar) is syntactically correct, that's a pretty smelly way of calling the constructor, and now you are forcing it to be defined in that class (rather than in an ancestor). If you really need to validate your arguments, do it inside the method:
sub new
{
my ($class, @args) = @_;
die "Not enough arguments passed to constructor" if @args < 2;
# ...
}
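A self-contained version of the run-time check above, again on a hypothetical class rather than Parse::RecDescent. The threshold of two extra arguments simply mirrors the answer's example; use whatever your constructor actually requires.

```perl
use strict;
use warnings;

package Hypothetical::Parser;    # illustrative class, not Parse::RecDescent itself

# No prototype: the argument count is checked at run time instead.
sub new {
    my ($class, @args) = @_;
    die "Not enough arguments passed to constructor" if @args < 2;
    return bless { args => [@args] }, $class;
}

package main;

my $ok     = eval { Hypothetical::Parser->new('grammar text'); 1 };
my $error  = $@;                                          # the die message
my $parser = Hypothetical::Parser->new('grammar text', 'extra');
```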
See also this earlier question on prototypes and why they aren't usually such a great idea.

What is the difference between new Some::Class and Some::Class->new() in Perl?

Many years ago I remember a fellow programmer counselling this:
new Some::Class; # bad! (but why?)
Some::Class->new(); # good!
Sadly now I cannot remember the/his reason why. :( Both forms will work correctly even if the constructor does not actually exist in the Some::Class module but instead is inherited from a parent somewhere.
Neither of these forms is the same as Some::Class::new(), which will not pass the name of the class as the first parameter to the constructor, so this form is always incorrect.
Even if the two forms are equivalent, I find Some::Class->new() to be much more clear, as it follows the standard convention for calling a method on a module, and in perl, the 'new' method is not special - a constructor could be called anything, and new() could do anything (although of course we generally expect it to be a constructor).
Using new Some::Class is called "indirect" method invocation, and it's bad because it introduces some ambiguity into the syntax.
One reason it can fail is if you have an array or hash of objects. You might expect
dosomethingwith $hashref->{obj}
to be equal to
$hashref->{obj}->dosomethingwith();
but it actually parses as:
$hashref->dosomethingwith->{obj}
which probably isn't what you wanted.
Another problem is if there happens to be a function in your package with the same name as a method you're trying to call. For example, what if some module that you use'd exported a function called dosomethingwith? In that case, dosomethingwith $object is ambiguous, and can result in puzzling bugs.
Using the -> syntax exclusively eliminates these problems, because the method and what you want the method to operate upon are always clear to the compiler.
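A small sketch of the second pitfall, with hypothetical names: a class Widget has a describe method, and the calling package also has a function called describe (as if it had been imported). The indirect form then calls the local function instead of the method, while the -> form is unambiguous.

```perl
use strict;
use warnings;

package Widget;                                   # hypothetical class
sub new      { my ($class) = @_; return bless {}, $class }
sub describe { return 'Widget::describe (method)' }

package main;

# Hypothetical function in the calling package that shares the method's
# name, as if some module had exported it.
sub describe { return 'main::describe (function)' }

my $w = Widget->new;

my $arrow    = $w->describe;    # always the method
my $indirect = describe $w;     # a known sub wins: parsed as describe($w)
```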
See Indirect Object Syntax in the perlobj documentation for an explanation of its pitfalls. freido's answer covers one of them (although I tend to avoid that with explicit parens around my function calls).
Larry once joked that it was there to make C++ programmers feel happy about new, and although people will tell you never to use it, you're probably doing it all the time. Consider this:
print FH "Some message";
Have you ever wondered why there was no comma after the filehandle? Or why there's no comma after the class name in the indirect object notation? It's the same syntax. You could rewrite that as a method call on print:
FH->print( "Some message" );
You may have experienced some weirdness in print if you do it wrong. Putting a comma after the explicit file handle turns it into an argument:
print FH, "some message"; # GLOB(0xDEADBEEF)some message
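A sketch of the print equivalence, writing to an in-memory filehandle so the result can be checked. The explicit use IO::Handle is for older perls; since Perl 5.14 the handle methods are loaded on demand.

```perl
use strict;
use warnings;
use IO::Handle;    # provides ->print on handles for older perls

my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";

print $fh "one;";     # built-in print: no comma after the handle
$fh->print("two;");   # the same output via an explicit method call
close $fh;
```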
Sadly, we have this goofiness in Perl. Not everything that got into the syntax was the best idea, but that's what happens when you pull from so many sources for inspiration. Some of the ideas have to be the bad ones.
The indirect object syntax is frowned upon, for good reasons, but that's got nothing to do with constructors. You're almost never going to have a new() function in the calling package. Rather, you should use Package->new() for two other (better?) reasons:
As you said, all other class methods take the form Package->method(), so consistency is a Good Thing
If you're supplying arguments to the constructor, or you're taking the result of the constructor and immediately calling methods on it (if e.g. you don't care about keeping the object around), it's simpler to say e.g.
$foo = Foo->new(type => 'bar', style => 'baz');
Bar->new->do_stuff;
than
$foo = new Foo(type => 'bar', style => 'baz');
(new Bar)->do_stuff;
Another problem is that new Some::Class happens at run time. If there is an error and your testing never reaches this statement, you won't know about it until it happens in production. It is better to use Some::Class->new unless you are doing dynamic programming.