I am curious. Most of Perl's implicitly called subroutines must be named in all caps. TIESCALAR, DESTROY, etc. In fact perldoc perltoot says
If constructors can have arbitrary
names, then why not destructors?
Because while a constructor is
explicitly called, a destructor is
not. Destruction happens
automatically via Perl's garbage
collection (GC) system, which is a
quick but somewhat lazy
reference-based GC system. To know
what to call, Perl insists that the
destructor be named DESTROY. Perl's
notion of the right time to call a
destructor is not well-defined
currently, which is why your
destructors should not rely on when
they are called.
Why is DESTROY in all caps? Perl on
occasion uses purely uppercase
function names as a convention to
indicate that the function will be
automatically called by Perl in some
way. Others that are called
implicitly include BEGIN, END,
AUTOLOAD, plus all methods used by
tied objects, described in perltie.
Why then is the import subroutine left to be lower case? Does anyone have a good insight on this?
I'd say that "import" is not called implicitly. It's an explicit call issued by the implementation of use. To quote from perldoc use:
It is exactly equivalent to:
BEGIN { require Module; Module->import( LIST ); }
To expand on DVK's answer a little, there are situations where you'd legitimately want to invoke import explicitly, for example when loading an optional module or auto-populating namespaces:
eval "require $modulename; $modulename->import( LIST ); ";
I can't think of any situation where you would ever want to invoke DESTROY, TIESCALAR, etc. explicitly.
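As a minimal sketch of the optional-module case, the string eval above can also be written with a block eval and an explicit import call. The helper name load_optional and the deliberately missing module name are assumptions made for illustration; they do not come from any real library.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name at runtime, then call its
# import method explicitly. Returns 1 on success, 0 if loading fails.
sub load_optional {
    my ($module, @import_list) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return 0 unless eval { require $file; 1 };
    $module->import(@import_list);             # an ordinary, explicit method call
    return 1;
}

my $have_list_util = load_optional('List::Util', 'sum');
my $have_missing   = load_optional('No::Such::Module::Hopefully');

print "List::Util loaded: ", ($have_list_util ? "yes" : "no"), "\n";
print "missing module loaded: ", ($have_missing ? "yes" : "no"), "\n";
```

Because the import happens at runtime, an imported function such as sum has to be called with parentheses, for example sum(1, 2, 3).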
It's simply an oversight in the design. It's too late to change.
I'm very new to perl so you'll have to excuse my ignorance.
I'm working on a legacy project. I don't have a dedicated IDE, I'm using PHPStorm with a dedicated Perl plugin.
When hovering over a new keyword I'm getting a warning Using of fancy calls is not recommended, use TCO::AEMAP->new().
The code in question is
my $aemapper = new TCO::AEMAP();
Basically it's suggesting doing
my $aemapper = TCO::AEMAP->new();
Is there any merit to this claim or is it simply more of a convention? I can't find much on google since I'm not exactly sure what to look for.
The new Foo version is called indirect object syntax. It's an old-fashioned way of calling the constructor on a package, and it's discouraged in modern Perl. Here's a partial quote of the relevant section in perldoc.
We recommend that you avoid this syntax, for several reasons.
First, it can be confusing to read. In the above example, it's not
clear if save is a method provided by the File class or simply a
subroutine that expects a file object as its first argument.
When used with class methods, the problem is even worse. Because Perl
allows subroutine names to be written as barewords, Perl has to guess
whether the bareword after the method is a class name or subroutine
name. In other words, Perl can resolve the syntax as either File->new(
$path, $data ) or new( File( $path, $data ) ) .
To parse this code, Perl uses a heuristic based on what package names
it has seen, what subroutines exist in the current package, what
barewords it has previously seen, and other input. Needless to say,
heuristics can produce very surprising results!
Older documentation (and some CPAN modules) encouraged this syntax,
particularly for constructors, so you may still find it in the wild.
However, we encourage you to avoid using it in new code.
The alternative is calling new as a class method on a package with the arrow, as in Foo->new. The arrow -> does three things:
It looks up what's on its left-hand side. In this case, the bareword Foo looks like a package name. So Perl will see if it knows a package (or namespace) with that name.
It calls the method on the right-hand side of the arrow in that package it's just found.
It passes in the thing that's on the left, which in our case is Foo, the package name, as the first argument. That's why in the method declaration you will see my ($class, @args) = @_ or similar.
For all other object oriented calls, it's typical to use the arrow syntax. But there is lots of old code around that uses indirect object syntax for new, and especially older modules on CPAN still use it in their documentation.
Both work, but the indirect object syntax is discouraged. Use Foo->new for new code.
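A minimal sketch of the arrow form, assuming a hypothetical Counter class invented for this example. It shows the package name arriving as the first argument of new:

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;    # $class is whatever stood left of the arrow
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 5);    # arrow syntax, no guessing by the parser
$counter->increment;
print ref($counter), " is at ", $counter->{count}, "\n";    # Counter is at 6
```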
I need to create a package which will be used by other developers.
What is the best way to implement static methods?
For static (class) methods I must expect 1st parameter $class, and method must be called as a class method:
My::Package->Sub1();
On the other hand, I can write a "regular" package subroutine (no $class parameter expected) which does exactly the same thing, but needs to be called differently
My::Package::Sub1();
So, basically there is no difference from the business functionality perspective (at least I don't see it, except package name availability through the first parameter), but 2 different ways to implement and call. Kinda confusing.
Which way should I use and when? Is there some rule?
Also, should I check if method was called as I expected (static vs package)?
First, a functional point: if a second class is created that inherits from My::Package, Child::Class::Sub1() will be undefined, and if Sub1 is written as a non-OO subroutine, Child::Class->Sub1() will ignore the fact that it's being called from Child::Class.
As such, for the sake of the programmers using your module, you'll want to make all of the subroutines in a Package/Class respond to a consistent calling structure/methodology. Your module should either be a library of subroutines/functions or a class full of methods. If part of it is OO, make it all OO. It is possible to create subroutines to behave in a mixed mode, but this complicates the code unnecessarily, and seems to have gone out of fashion on CPAN.
Now if there is truly no reason to distinguish between My::Package->Sub1() and Child::Class->Sub1(), then you can feel free to ignore the implicit class name parameter you'll be passed. This doesn't mean you shouldn't expect that parameter or that you should encourage a non-OO call format in an OO Module.
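The inheritance point can be sketched as follows. The package names come from the question; the subroutine names as_method and as_function are made up for this example.

```perl
use strict;
use warnings;

package My::Package;
sub as_method   { my ($class) = @_; return "called on $class" }
sub as_function { return "called as a plain function" }

package Child::Class;
our @ISA = ('My::Package');    # inherit from My::Package

package main;

my $via_parent = My::Package->as_method;     # "called on My::Package"
my $via_child  = Child::Class->as_method;    # found through @ISA
# A fully qualified function name is not looked up through @ISA:
my $child_has_function = defined &Child::Class::as_function ? 1 : 0;

print "$via_parent\n$via_child\n";
print "Child::Class::as_function defined: $child_has_function\n";
```

Only the method call form sees the subclass name, which is why a class intended for subclassing should keep to the My::Package->Sub1() style.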
As far as I know, in Perl, we can call a subroutine from a Module by using these techniques:
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
Create an Object of that Module in your perl script finally call foo using that Object.
Directly call foo using its path, like this myDir::Module::foo();.
I am always confused about which is the better way of calling a subroutine foo.
If I have a dynamic script, which I run from the browser and not the command line, which approach should I go for so that the script takes less time?
Thanks.
There is a difference between the fastest, and the best way to call code in Perl.
Edit: please see simbabque's answer as well. He especially covers the differences between #1 and #3, and why you would use either.
#1, #3: Function calls
Your #1 and #3 are identical: The subroutine has a unique name in the globally visible namespace. Many names may map to one subroutine via aliases, or importing a module.
If the name of the function you are calling is known at compile time, the sub will be resolved at compile time. This assumes that you don't spontaneously redefine your functions. If the exact function is only known at runtime, this is only a hash lookup away.
There are three ways to call a function:
foo(@args);
&foo(@args);
@_ = @args; goto &foo;
Number one (parentheses sometimes optional) is the default, and validates your arguments against the sub prototype (don't use prototypes). Also, a whole call stack frame (with much useful debug information) is constructed. This takes time.
Number two skips the prototype verification, and assumes that you know what you are doing. This is slightly faster. I think this is sloppy style.
Number three is a tail call. This returns from the current sub with the return value of foo. This is fast, as prototypes are ignored, and the current call stack frame can be reused. This isn't useful very often, and has ugly syntax. Inlining the code is about an order of magnitude faster (i.e. in Perl, we prefer loops over recursion ☹).
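A small sketch of how the three call forms listed above differ in observable behaviour. All subroutine names here are hypothetical, and the prototype is used only to make the difference visible, not as a recommendation.

```perl
use strict;
use warnings;

# The ($) prototype forces scalar context on the argument of a plain call.
sub count_args($) { return scalar @_ }

# Counts the frames currently on the call stack.
sub depth {
    my $n = 0;
    $n++ while caller($n);
    return $n;
}

sub via_call { return depth() }          # depth() gets a frame of its own
sub via_goto { @_ = (); goto &depth }    # depth() takes over this frame

my @args = (10, 20, 30);
my $plain     = count_args(@args);     # prototype applied: receives one value
my $ampersand = &count_args(@args);    # prototype skipped: receives the whole list

my $frames_call = via_call();
my $frames_goto = via_goto();

print "plain call saw $plain argument(s), & call saw $ampersand\n";
print "goto used ", $frames_call - $frames_goto, " fewer stack frame(s)\n";
```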
#2: Method calls
The flexibility of OO comes at a hefty performance price: As the type of the object you call the method on is never known until runtime, the actual method can only be resolved at runtime.
This means that $foo->bar() looks up the function bar in the package that $foo was blessed into. If it can't be found there, it will be searched for in parent classes. This is slow. If you want to use OO, pay attention to shallow hierarchies (→ fewer lookups). Do also note that Perl's default Method Resolution Order is unusual.
You cannot generally reduce a method call to a function call, even if you know the type.
If $foo is of class Foo, and Foo::bar is a sub, then Foo::bar($foo) will skip the method resolution, and might even work. However, this breaks encapsulation, and will break once Foo is subclassed. Also, this doesn't work if Foo doesn't define bar, but the method was defined in a parent class.
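To illustrate why a method call cannot simply be swapped for a function call, here is a sketch with two hypothetical classes, Animal and Dog, invented for this example:

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class) = @_; return bless {}, $class }
sub speak { return "generic noise" }
sub name  { return "animal" }

package Dog;
our @ISA = ('Animal');
sub speak { return "Woof" }    # overrides Animal::speak

package main;

my $dog       = Dog->new;
my $method    = $dog->speak;           # resolved at runtime: Dog::speak
my $bypass    = Animal::speak($dog);   # plain function call: override is skipped
my $inherited = $dog->name;            # found in the parent via @ISA
# Dog::name($dog) would die, because Dog itself defines no sub called name:
my $dog_defines_name = defined &Dog::name ? 1 : 0;

print "method: $method, bypass: $bypass, inherited: $inherited\n";
```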
I am generally in favour of object orientation, until it is clear from benchmarks that this will not provide the performance you require.
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
In order to do this, you would use Exporter in the module/package that implements the sub. You tell your module what it will export via @EXPORT_OK and @EXPORT. If you use the module, stuff gets imported into your current namespace at compile time. The following two statements are equivalent.
# This is the same...
use Module;
# ... as this
BEGIN {
require Module;
Module->import();
}
You want to do this if you have stuff you are going to use in your main script, or you are going to use often. Some examples are List::Util, Data::Dumper or use feature 'say'. Of course you can also use it in other modules.
use Data::Dumper;
use List::Util qw(max);
use feature qw(say);
my @foo = (1, 2, 3, 4, 5, 23);
print Dumper \@foo;
say max(@foo);
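On the module side, the Exporter setup described above might look roughly like this. My::Greeter and its subs are hypothetical, and the package is inlined in a BEGIN block only so that the sketch fits in one file; a real module would live in its own .pm file and be loaded with use.

```perl
use strict;
use warnings;

BEGIN {
    package My::Greeter;            # hypothetical module, inlined for the sketch
    use Exporter 'import';          # borrow Exporter's import method
    our @EXPORT    = ('hello');     # imported by default
    our @EXPORT_OK = ('goodbye');   # imported only when asked for by name
    sub hello   { return "hello, $_[0]" }
    sub goodbye { return "goodbye, $_[0]" }
}

# Stands in for "use My::Greeter;", minus the require of a file on disk.
BEGIN { My::Greeter->import() }

my $greeting    = hello('world');
my $has_goodbye = defined &main::goodbye ? 1 : 0;

print "$greeting\n";
print "goodbye imported: $has_goodbye\n";
```

Only hello lands in the caller's namespace; goodbye would have to be requested explicitly in the import list.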
The catch is that here, you 'pollute' your namespace. Do this if you must, but keep in mind that it happens at compile time, so it is not conditional. You cannot say:
if ($foo) {
use Some::Module 'foo';
foo($foo);
} else {
use Something::Else 'bar';
bar();
}
It will load both Some::Module and Something::Else at compile time, thus increasing the time and memory your program consumes. The condition will work of course, but it is not efficient.
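A small sketch that makes the compile-time behaviour visible by inspecting %INC. The two core modules are arbitrary picks for the example, and the check assumes nothing else in the program has already loaded Tie::RefHash.

```perl
use strict;
use warnings;

my $flag = 0;    # both branches below are skipped at runtime

if ($flag) {
    use List::Util qw(max);    # compile time: loaded even though $flag is false
}
if ($flag) {
    require Tie::RefHash;      # runtime: never reached, so never loaded
}

my $use_loaded     = exists $INC{'List/Util.pm'}   ? 1 : 0;
my $require_loaded = exists $INC{'Tie/RefHash.pm'} ? 1 : 0;

print "loaded via use: $use_loaded, loaded via require: $require_loaded\n";
```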
Create an Object of that Module in your perl script finally call foo using that Object.
This is the OOP approach. It is (as mentioned above) not comparable to the other methods. You don't need to import methods of an object. You just load your class (which is a module) either with use or require (see above), create an instance and use its methods to your liking. However, you need an object-oriented module for that. If you are interested in how that works, start by taking a look at perlootut.
Directly call foo using its path, like this myDir::Module::foo();.
It's actually not quite its path, but rather its name(space). For example, Data::Dumper is Dumper.pm located in the folder Data, somewhere in your lib dir. But that is not really important.
The main difference from the first approach is that you omit the importing part. This is useful if you want to build something that conditionally loads certain modules, or if you are in a huge (maybe legacy) application and do not want to pollute the namespace.
if ($order_has_some_condition) {
require Very::Long::NameSpace::For::This::Condition::Module;
Very::Long::NameSpace::For::This::Condition::Module::do_stuff_with_an_order($order);
}
Imagine this piece of code is in a legacy sub with 2k lines and a lot of stuff going on, most of which is never called in our case. We do not want to use our module, making it available for each of the maybe 100 different cases that are handled in this huge piece of code. Instead, we want to load it only if we really need it. So we require the module and call its sub directly using the full name.
In conclusion, both the first and the third way have their merits. They both need to exist, and they should both be used where appropriate. In some cases, it is just flavor, but in others it makes sense to decide. The second, OOP, approach is something else entirely.
There are no real speed differences, and as Borodin said, Perl is fast. Of course, if you do not import stuff, you don't have to 'pay' for the import. In a 10-liner script, that doesn't matter. In legacy software with potentially thousands of lines of code and many use cases in one huge file, it matters a lot.
I hope this helps you decide.
Apologies if this question isn't appropriate for StackOverflow. I suspect the answer is largely a matter of opinion (unless one of the style guides has a recommendation).
I have code that looks something like this
use File::Temp;
sub foo {
...
}
sub bar {
...
}
sub baz {
my $fh = tempfile();
...
}
baz is the only subroutine that uses File::Temp, and I'm not using AutoLoader. Is it reasonable to put the use declaration inside baz, or should I leave it at the top of my script?
Since (as chepner said), there is no difference technically, it really is a matter of style.
The pros of putting them all on top:
Clear at first glance what all the direct module dependencies are
Easier to maintain - if you need to move around code using the library, you don't need to remember to move the library.
Please note that the same exact logic also applies to variable declarations, but in that case, the scoping concerns severely trump the "remember to move the declaration" concerns and therefore you should declare variables in the innermost possible scope as close to where they are used as possible.
For esoteric cases where your own code contains complicated logic in BEGIN{} blocks that depends on all the libraries being loaded (e.g., call a specifically named method from ALL loaded libraries - which I have done) - you will have a bug if some library's use call is AFTER that BEGIN{} block
The cons of putting them all on top:
One can possibly argue that this makes the code less readable since you need to seek out to the start of file to see what you imported from the module. Frankly, I don't believe that but I have heard it expressed and it has at least some merit.
use is essentially the same as putting a require statement inside a BEGIN block, so it doesn't really matter where you put them; they are evaluated before any of the rest of your code.
As far as I understand, in Scala we can define a function with no parameters either by using empty parentheses after its name, or no parentheses at all, and these two definitions are not synonyms. What is the purpose of distinguishing these two syntaxes, and when should I prefer one over the other?
It's mostly a question of convention. Methods with empty parameter lists are, by convention, evaluated for their side-effects. Methods without parameters are assumed to be side-effect free. That's the convention.
Scala Style Guide says to omit parentheses only when the method being called has no side-effects:
http://docs.scala-lang.org/style/method-invocation.html
Other answers are great, but I also think it's worth mentioning that no-param methods allow for nice access to a class's fields, like so:
person.name
Because of parameterless methods, you could easily write a method to intercept reads (or writes) to the 'name' field without breaking calling code, like so
def name = { log("Accessing name!"); _name }
This is called the Uniform Access Principle.
I have another light to bring to the usefulness of the convention encouraging an empty parentheses block in the declaration of functions (and thus later in calls to them) with side effects.
It is with the debugger.
If one adds a watch in a debugger on, say, process, referring in this example to a boolean in the focused debug context (either as a variable or as a pure, side-effect-free function evaluation), it creates a nasty risk for later troubleshooting.
Indeed, if the debugger keeps that watch as a try-to-evaluate thing whenever you change the context (change thread, move in the call stack, reach another breakpoint...), which I found to be at least the case with IntelliJ IDEA, or Visual Studio for other languages, then the side-effects of any other process function possibly found in any browsed scope would be triggered...
Just imagine the kind of puzzling troubleshooting this could lead to if you do not have that warning in mind, simply because of some innocent, ordinary naming. If the convention were enforced, then in my example the evaluation of the process boolean would never fall back to a process() function call in the debugger watches. Your debugger might still allow you to call the function explicitly by putting process() in the watches, but then it would be clear that you are not directly accessing an attribute or local variable, and fallbacks to other process() functions in other browsed scopes, while perhaps unlucky, would at least be much less surprising.