Fastest way of calling a subroutine - perl

As far as I know, in Perl, we can call a subroutine from a Module by using these techniques:
1. Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
2. Create an Object of that Module in your perl script finally call foo using that Object.
3. Directly call foo using its path, like this myDir::Module::foo();.
I am always confused about which is the better way of calling subroutine foo.
If I have a dynamic script that I run from the browser rather than the command line, which approach should I choose so that the script takes less time?
Thanks.

There is a difference between the fastest, and the best way to call code in Perl.
Edit: please see simbabque's answer as well. He especially covers the differences between #1 and #3, and why you would use either.
#1, #3: Function calls
Your #1 and #3 are identical: The subroutine has a unique name in the globally visible namespace. Many names may map to one subroutine via aliases, or by importing a module.
If the name of the function you are calling is known at compile time, the sub will be resolved at compile time. This assumes that you don't spontaneously redefine your functions. If the exact function is only known at runtime, this is only a hash lookup away.
There are three ways in which functions can be called:
1. foo(@args);
2. &foo(@args);
3. @_ = @args; goto &foo;
Number one (parentheses sometimes optional) is the default, and validates your arguments against the sub prototype (don't use prototypes). Also, a whole call stack frame (with much useful debug information) is constructed. This takes time.
Number two skips the prototype verification, and assumes that you know what you are doing. This is slightly faster. I think this is sloppy style.
Number three is a tail call. This returns from the current sub with the return value of foo. This is fast, as prototypes are ignored, and the current call stack frame can be reused. This isn't useful very often, and has ugly syntax. Inlining the code is about an order of magnitude faster (i.e. in Perl, we prefer loops over recursion ☹).
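The three call styles can be compared with a small sketch. The sub names here (target, via_plain_call, via_ampersand, via_goto) are made up for illustration. The sketch only shows the call-stack effect, using the built-in caller, and makes no claim about timing.

```perl
use strict;
use warnings;

# Reports which sub (if any) appears as the caller of target().
sub target {
    my $caller_sub = (caller(1))[3];
    return defined $caller_sub ? $caller_sub : 'top level';
}

sub via_plain_call { return target(@_) }     # builds a new frame for target
sub via_ampersand  { return &target(@_) }    # same frame handling, but bypasses prototypes
sub via_goto       { goto &target }          # replaces via_goto's own frame, passes @_ along

printf "%-12s %s\n", 'plain call:', via_plain_call();   # ends in ::via_plain_call
printf "%-12s %s\n", '&-call:',     via_ampersand();    # ends in ::via_ampersand
printf "%-12s %s\n", 'goto:',       via_goto();         # via_goto is no longer on the stack
```

With goto &target, the frame of via_goto is gone by the time target runs, which is why caller no longer reports it.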
#2: Method calls
The flexibility of OO comes at a hefty performance price: As the type of the object you call the method on is never known until runtime, the actual method can only be resolved at runtime.
This means that $foo->bar() looks up the function bar in the package that $foo was blessed into. If it can't be found there, it will be searched for in parent classes. This is slow. If you want to use OO, pay attention to shallow hierarchies (→ fewer lookups). Do also note that Perl's default Method Resolution Order is unusual.
You cannot generally reduce a method call to a function call, even if you know the type.
If $foo is of class Foo, and Foo::bar is a sub, then Foo::bar($foo) will skip the method resolution, and might even work. However, this breaks encapsulation, and will break once Foo is subclassed. Also, this doesn't work if Foo doesn't define bar, but the method was defined in a parent class.
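A minimal sketch of both failure modes, assuming two hypothetical classes Animal and Dog that exist only for this example:

```perl
use strict;
use warnings;

package Animal;    # hypothetical base class
sub new   { my ($class) = @_; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;       # hypothetical subclass: overrides speak, inherits new
our @ISA = ('Animal');
sub speak { return 'Woof' }

package main;

my $dog = Dog->new;                        # new is found in Animal via @ISA

my $as_method   = $dog->speak;             # resolved at runtime: Dog::speak
my $as_function = Animal::speak($dog);     # plain function call: the override is ignored

# Dog defines no new() of its own, so calling it as a function dies.
my $direct_new = eval { Dog::new('Dog') };
my $error      = $@;

print "method call:   $as_method\n";       # Woof
print "function call: $as_function\n";     # generic noise
print "Dog::new():    $error";             # Undefined subroutine &Dog::new called ...
```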
I am generally in favour of object orientation, until it is clear from benchmarks that this will not provide the performance you require.

Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
In order to do this, you would use Exporter in the module/package that implements the sub. You tell your module what it will export via @EXPORT_OK and @EXPORT. If you use the module, stuff gets imported into your current namespace at compile time. The following two statements are equivalent.
# This is the same...
use Module;
# ... as this
BEGIN {
    require Module;
    Module->import();
}
You want to do this if you have stuff you are going to use in your main script, or you are going to use often. Some examples are List::Util, Data::Dumper or use feature 'say'. Of course you can also use it in other modules.
use Data::Dumper;
use List::Util qw(max);
use feature qw(say);
my @foo = (1, 2, 3, 4, 5, 23);
print Dumper \@foo;
say max(@foo);
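The examples above show only the importing side. The exporting side can be sketched as follows. My::Util and its double function are hypothetical, and the module is inlined in a BEGIN block (with an %INC entry) only so that the sketch runs as a single file. A real module would live in My/Util.pm.

```perl
use strict;
use warnings;

BEGIN {
    package My::Util;                 # hypothetical module, inlined for the demo
    use Exporter 'import';            # borrow Exporter's import method
    our @EXPORT_OK = qw(double);      # exported only when the caller asks for it
    sub double { return 2 * $_[0] }
    $INC{'My/Util.pm'} = __FILE__;    # mark as loaded so `use` skips the disk search
}

use My::Util qw(double);              # roughly: require, then My::Util->import('double')

my $imported  = double(21);           # short name, available because of the import
my $qualified = My::Util::double(21); # full name, works with or without import

print "$imported $qualified\n";       # 42 42
```

Names listed in @EXPORT would be imported by a plain `use My::Util;` as well, which is why @EXPORT_OK is usually the more polite choice.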
The catch is that here, you 'pollute' your namespace. Do this if you must, but keep in mind that it happens at compile time, so it is not conditional. You cannot say:
if ($foo) {
    use Some::Module 'foo';
    foo($foo);
} else {
    use Something::Else 'bar';
    bar();
}
It will load both Some::Module and Something::Else at compile time, thus increasing the time and memory your program consumes. The condition will work of course, but it is not efficient.
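The compile-time behaviour can be observed through %INC, the hash in which Perl records loaded files. This is a small sketch using two core modules picked arbitrarily (List::Util and Text::Wrap). It assumes nothing else in the program has already loaded Text::Wrap.

```perl
use strict;
use warnings;

my $wanted = 0;    # both branches below are never executed

if ($wanted) {
    use List::Util ();     # compile time: loaded even though this branch never runs
}

if ($wanted) {
    require Text::Wrap;    # run time: only loaded if execution reaches this line
}

printf "List::Util loaded: %s\n", exists $INC{'List/Util.pm'} ? 'yes' : 'no';
printf "Text::Wrap loaded: %s\n", exists $INC{'Text/Wrap.pm'} ? 'yes' : 'no';
```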
Create an Object of that Module in your perl script finally call foo using that Object.
This is the OOP approach. It is (as mentioned above) not comparable to the other methods. You don't need to import methods of an object. You just load your class (which is a module) either with use or require (see above), create an instance and use its methods to your liking. However, you need an object-oriented module for that. If you are interested in how that works, start by taking a look at perlootut.
Directly call foo using its path, like this myDir::Module::foo();.
It's actually not quite its path, but rather its name(space). For example, Data::Dumper is Dumper.pm located in the folder Data, somewhere in your lib dir. But that is not really important.
The main difference from the first approach is that you omit the importing part. This is useful if you want to build something that conditionally loads certain modules, or if you are in a huge (maybe legacy) application and do not want to pollute the namespace.
if ($order_has_some_condition) {
    require Very::Long::NameSpace::For::This::Condition::Module;
    Very::Long::NameSpace::For::This::Condition::Module::do_stuff_with_an_order($order);
}
Imagine this piece of code is in a legacy sub with 2k lines and a lot of stuff going on, most of which is never called in our case. We do not want to use our module, making it available for each of the maybe 100 different cases that are handled in this huge piece of code. Instead, we want to load it only if we really need it. So we require the module and call its sub directly using the full name.
In conclusion, both the first and the third way have their merits. They both need to exist, and they should both be used where appropriate. In some cases, it is just a matter of taste, but in others it makes sense to decide. The second, OOP, approach is something else entirely.
There are no real speed differences, and as Borodin said, Perl is fast. Of course, if you do not import stuff, you don't have to 'pay' for the import. In a 10-liner script, that doesn't matter. In legacy software with potentially thousands of lines of code and many use cases in one huge file, it matters a lot.
I hope this helps you decide.

Related

new SomeModule::SomeName vs SomeModule::SomeName->new(), what's the difference?

I'm very new to perl so you'll have to excuse my ignorance.
I'm working on a legacy project. I don't have a dedicated IDE, I'm using PHPStorm with a dedicated Perl plugin.
When hovering over a new keyword I'm getting the warning "Using of fancy calls is not recommended, use TCO::AEMAP->new()".
The code in question is
my $aemapper = new TCO::AEMAP();
Basically it's suggesting doing
my $aemapper = TCO::AEMAP->new();
Is there any merit to this claim or is it simply more of a convention? I can't find much on google since I'm not exactly sure what to look for.
The new Foo version is called indirect object syntax. It's an old-fashioned way of calling the constructor on a package, and it's discouraged in modern Perl. Here's a partial quote of the relevant section in perldoc.
We recommend that you avoid this syntax, for several reasons.
First, it can be confusing to read. In the above example, it's not
clear if save is a method provided by the File class or simply a
subroutine that expects a file object as its first argument.
When used with class methods, the problem is even worse. Because Perl
allows subroutine names to be written as barewords, Perl has to guess
whether the bareword after the method is a class name or subroutine
name. In other words, Perl can resolve the syntax as either File->new(
$path, $data ) or new( File( $path, $data ) ) .
To parse this code, Perl uses a heuristic based on what package names
it has seen, what subroutines exist in the current package, what
barewords it has previously seen, and other input. Needless to say,
heuristics can produce very surprising results!
Older documentation (and some CPAN modules) encouraged this syntax,
particularly for constructors, so you may still find it in the wild.
However, we encourage you to avoid using it in new code.
The alternative is calling new as a class method on a package with the arrow, as in Foo->new. The arrow -> does three things:
It looks up what's on its left-hand side. In this case, the bareword Foo looks like a package name. So Perl will see if it knows a package (or namespace) with that name.
It calls the method on the right-hand side of the arrow in that package it's just found.
It passes in the thing that's on the left, which in our case is Foo, the package name, as the first argument. That's why in the method declaration you will see my ($class, @args) = @_ or similar.
For all other object oriented calls, it's typical to use the arrow syntax. But there is lots of old code around that uses indirect object syntax for new, and especially older modules on CPAN still use it in their documentation.
Both work, but the indirect object syntax is discouraged. Use Foo->new for new code.
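What the arrow passes to the constructor can be seen in a minimal sketch. Counter and its start argument are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class for illustration

sub new {
    my ($class, %args) = @_;            # $class is whatever stood left of the arrow
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

sub increment {
    my ($self) = @_;                    # for instance calls the invocant is the object
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 5); # unambiguous: class method call on Counter
my $class   = ref $counter;
my $value   = $counter->increment;

print "class: $class, count: $value\n"; # class: Counter, count: 6
```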

Perl: Static vs Package methods

I need to create a package which will be used by other developers.
What is the best way to implement static methods?
For static (class) methods I must expect 1st parameter $class, and method must be called as a class method:
My::Package->Sub1();
On the other hand, I can write a "regular" package subroutine (no $class parameter expected) which will do exactly the same, but needs to be called differently:
My::Package::Sub1();
So, basically there is no difference from the business functionality perspective (at least I don't see it, except package name availability through the first parameter), but 2 different ways to implement and call. Kinda confusing.
Which way should I use and when? Is there some rule?
Also, should I check if method was called as I expected (static vs package)?
First, a functional point: If a second class is created that inherits from My::Package, Child::Class::Sub1() will be undefined, and if Sub1 is written as a non-OO subroutine, Child::Class->Sub1() will ignore the fact that it's being called from Child::Class.
As such, for the sake of the programmers using your module, you'll want to make all of the subroutines in a Package/Class respond to a consistent calling structure/methodology. Your module should either be a library of subroutines/functions or a class full of methods. If part of it is OO, make it all OO. It is possible to create subroutines to behave in a mixed mode, but this complicates the code unnecessarily, and seems to have gone out of fashion on CPAN.
Now if there is truly no reason to distinguish between My::Package->Sub1() and Child::Class->Sub1(), then you can feel free to ignore the implicit class name parameter you'll be passed. This doesn't mean you shouldn't expect that parameter or that you should encourage a non-OO call format in an OO Module.
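The difference between the two calling conventions can be sketched as below. Sub1 follows the question, while first_arg is a hypothetical plain subroutine added only to show what happens to the argument list.

```perl
use strict;
use warnings;

package My::Package;
sub Sub1      { my ($class) = @_; return "Sub1 called on $class" }  # written as a class method
sub first_arg { return $_[0] }                                      # written as a plain function

package Child::Class;
our @ISA = ('My::Package');    # inherits both subs for method calls only

package main;

my $via_parent = My::Package->Sub1();     # Sub1 called on My::Package
my $via_child  = Child::Class->Sub1();    # Sub1 called on Child::Class

# Called as a function, the first argument is the caller's own argument.
# Called as a method, the class name is silently inserted in front of it.
my $as_function = My::Package::first_arg('x');    # x
my $as_method   = My::Package->first_arg('x');    # My::Package

print "$via_parent\n$via_child\n$as_function\n$as_method\n";
```

The shifted argument list is the practical reason to pick one convention per module and document it.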

Should packages be 'use'd globally or from functions that need them?

Apologies if this question isn't appropriate for StackOverflow. I suspect the answer is largely a matter of opinion (unless one of the style guides has a recommendation).
I have code that looks something like this
use File::Temp;
sub foo {
    ...
}
sub bar {
    ...
}
sub baz {
    my $fh = tempfile();
    ...
}
baz is the only subroutine that uses File::Temp, and I'm not using AutoLoader. Is it reasonable to put the use declaration inside baz, or should I leave it at the top of my script?
Since (as chepner said), there is no difference technically, it really is a matter of style.
The pros of putting them all on top:
Clear at first glance what all the direct module dependencies are
Easier to maintain - if you need to move around code using the library, you don't need to remember to move the library.
Please note that the same exact logic also applies to variable declarations, but in that case, the scoping concerns severely trump the "remember to move the declaration" concerns and therefore you should declare variables in the innermost possible scope as close to where they are used as possible.
For esoteric cases where your own code contains complicated logic in BEGIN{} blocks that depends on all the libraries being loaded (e.g., calling a specifically named method from ALL loaded libraries, which I have done), you will have a bug if some library's use call comes AFTER that BEGIN{} block.
The cons of putting them all on top:
One can possibly argue that this makes the code less readable since you need to seek out to the start of file to see what you imported from the module. Frankly, I don't believe that but I have heard it expressed and it has at least some merit.
use is essentially the same as putting a require statement inside a BEGIN block, so it doesn't really matter where you put them; they are evaluated before any of the rest of your code.
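A small sketch of what that means in practice. It uses List::Util's sum instead of File::Temp so that the result is easy to check, and the sub named other is hypothetical.

```perl
use strict;
use warnings;

sub baz {
    use List::Util qw(sum);    # runs at compile time and imports sum into package main
    return sum(@_);
}

sub other {
    return sum(1, 2, 3);       # also works: the import is per package, not per sub
}

my $from_other = other();      # baz has not been called yet
my $from_baz   = baz(4, 5);

print "$from_other $from_baz\n";    # 6 9
```

Placing the use inside baz therefore documents intent but does not restrict where the imported name is visible.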

Why isn't the "import" subroutine capitalized in Perl

I am curious. Most of Perl's implicitly called subroutines must be named in all caps. TIESCALAR, DESTROY, etc. In fact perldoc perltoot says
If constructors can have arbitrary names, then why not destructors? Because while a constructor is explicitly called, a destructor is not. Destruction happens automatically via Perl's garbage collection (GC) system, which is a quick but somewhat lazy reference-based GC system. To know what to call, Perl insists that the destructor be named DESTROY. Perl's notion of the right time to call a destructor is not well-defined currently, which is why your destructors should not rely on when they are called.
Why is DESTROY in all caps? Perl on occasion uses purely uppercase function names as a convention to indicate that the function will be automatically called by Perl in some way. Others that are called implicitly include BEGIN, END, AUTOLOAD, plus all methods used by tied objects, described in perltie.
Why then is the import subroutine left to be lower case? Does anyone have a good insight on this?
I'd say that "import" is not called implicitly. It's an explicit call issued by implementation of use. To quote from perldoc use:
It is exactly equivalent to:
BEGIN { require Module; Module->import( LIST ); }
To expand on DVK's answer a little, there are situations where you'd legitimately want to invoke import explicitly, for example when loading an optional module or auto-populating namespaces:
eval "require $modulename; $modulename->import( LIST ); ";
I can't think of any situation where you would ever want to invoke DESTROY, TIESCALAR, etc. explicitly.
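That import is an ordinary method can be seen by writing one by hand. In this sketch My::Greeter is a hypothetical module, inlined in a BEGIN block with an %INC entry so that the example runs as a single file.

```perl
use strict;
use warnings;

BEGIN {
    package My::Greeter;                # hypothetical module, inlined for the demo
    our @import_calls;
    sub import {
        my ($class, @args) = @_;
        push @import_calls, join ' ', @args;    # record what each call received
    }
    $INC{'My/Greeter.pm'} = __FILE__;   # pretend the file was already loaded
}

use My::Greeter qw(hello world);        # calls My::Greeter->import('hello', 'world')
use My::Greeter ();                     # empty list: import is not called at all
My::Greeter->import('explicit');        # a normal method call at run time

print "$_\n" for @My::Greeter::import_calls;
# hello world
# explicit
```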
It's simply an oversight in the design. It's too late to change.

In Perl, is it better to use a module than to require a file?

Another question got me thinking about different methods of code reuse: use vs. require vs. do
I see a lot of posts here where the question centers around the use of require to load and execute code. This seems to me to be an obvious bad practice, but I haven't found any good resources on the issue that I can point people to.
perlfaq8 covers the difference between use and require, but it doesn't offer any advice in regard to preference (as of 5.10--in 5.8.8 there is a quick bit of advice in favor of use).
This topic seems to suffer from a lack of discussion. I have a few questions I'd like to see discussed:
What is the preferred method of code reuse in Perl?
use ModuleName;
require ModuleName;
require 'file.pl';
do 'file.pl';
What is the difference between require ModuleName and require "file.pl"?
Is it ever a good idea to use require "file.pl"? Why or why not?
Standard practice is to use use most of the time, require occasionally, and do rarely.
do 'file' will execute file as a Perl script. It's almost like calling eval on the contents of the file; if you do the same file multiple times (e.g. in a loop) it will be parsed and evaluated each time which is unlikely to be what you want. The difference between do and eval is that do can't see lexical variables in the enclosing scope, which makes it safer. do is occasionally useful for simple tasks like processing a configuration file that's written in the form of Perl code.
require 'file' is like do 'file' except that it will only parse any particular file one time and will raise an exception if something goes wrong. (e.g. the file can't be found, it contains a syntax error, etc.) The automatic error checking makes it a good replacement for do 'file' but it's still only suited for the same simple uses.
The do 'file' and require 'file' forms are carryovers from days long past when the *.pl file extension meant "Perl Library." The modern way of reusing code in Perl is to organize it into modules. Calling something a "module" instead of a "library" is just semantics, but the words mean distinctly different things in Perl culture. A library is just a collection of subroutines; a module provides a namespace, making it far more suitable for reuse.
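The "every time" versus "only once" behaviour of do 'file' and require 'file' can be checked with a short sketch. It writes two throwaway files into a temporary directory. The helper write_snippet and the counter hash %main::evaluated are made up for this example, and separate files are used because do also records its file in %INC.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a tiny Perl file that bumps a named counter each time it is evaluated.
sub write_snippet {
    my ($dir, $name) = @_;
    my $path = File::Spec->catfile($dir, "$name.pl");
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "\$main::evaluated{'$name'}++;\n1;\n";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $dir          = tempdir(CLEANUP => 1);
my $do_file      = write_snippet($dir, 'via_do');
my $require_file = write_snippet($dir, 'via_require');

for my $round (1 .. 2) {
    do $do_file;              # parsed and run on every pass
    require $require_file;    # run on the first pass, skipped afterwards
}

print "do:      $main::evaluated{via_do}\n";         # 2
print "require: $main::evaluated{via_require}\n";    # 1
```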
use Module is the normal way of using code from a module. Note that Module is the package name as a bareword and not a quoted string containing a file name. Perl handles the translation from a package name to a file name for you. use statements happen at compile time and throw an exception if they fail. This means that if a module your code depends on isn't available or fails to load the error will be apparent immediately. Additionally, use automatically calls the import() method of the module if it has one which can save you a little typing.
require Module is like use Module except that it happens at runtime and does not automatically call the module's import() method. Normally you want to use use to fail early and predictably, but sometimes require is better. For example, require can be used to delay the loading of large modules which are only occasionally required or to make a module optional. (i.e. use the module if it's available but fall back on something else or reduce functionality if it isn't.)
Strictly speaking, the only difference between require Module and require 'file' is that the first form triggers the automatic translation from a package name like Foo::Bar to a file name like Foo/Bar.pm while the latter form expects a filename to start with. By convention, though, the first form is used for loading modules while the second form is used for loading libraries.
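Both the name translation and the "optional module" use of require can be sketched in a few lines. The helper names module_to_file and try_load are hypothetical, and No::Such::Module::Xyzzy is assumed not to be installed.

```perl
use strict;
use warnings;

# Do by hand what `require Foo::Bar` does with a bareword: Foo::Bar -> Foo/Bar.pm
sub module_to_file {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    return "$file.pm";
}

# Try to load a module at run time and report success instead of dying.
sub try_load {
    my ($module) = @_;
    my $file = module_to_file($module);
    return eval { require $file; 1 } ? 1 : 0;
}

my $have_list_util = try_load('List::Util');                # core module
my $have_missing   = try_load('No::Such::Module::Xyzzy');   # assumed absent

print module_to_file('Foo::Bar'), "\n";                     # Foo/Bar.pm
print "List::Util available: $have_list_util\n";            # 1
print "missing module available: $have_missing\n";          # 0
```

A caller can branch on the return value of try_load to fall back on something else or reduce functionality.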
There is a strong preference for use, because it happens at an earlier stage (BEGIN {}) during compilation, and the errors tend to be propagated to the user at a more appropriate time. It also calls the module's sub import {}, which gives the caller control over the import process. This is something that is heavily used. You can get the same effect by calling the specific namespace's import yourself, but that requires you to know the name of the namespace and the file, and to code the call to the subroutine, which is a lot more work. Conversely, use only requires you to know the namespace, and then it requires the file with the matching name, so the link between namespaces and files becomes less of a conscious concern for the user.
Read perldoc -f use, and perldoc -f require, for more information. Per perldoc -f use:
use is the same as BEGIN { require Module; Module->import( LIST ); }, which is just a lot uglier.
The main difference is around import/export. use is preferred when you're making use of a module because it allows you to specify which routines you wish to import into your namespace :
use MyModule qw(foo bar baz); # allows foo(), bar() and baz() to be used
use MyModule qw(); # Requires explicit naming (e.g. MyModule::foo).
use also runs the module's import() procedure, which is often used to set the module up.
See the perldoc for use for more detail.
Loading a module with use includes it at compile time, which increases speed but uses more memory, whereas require Module includes it at run time. Requiring a module only when it is needed, without calling import, uses less memory but reduces speed.