In Perl, is it better to use a module than to require a file? - perl

Another question got me thinking about different methods of code reuse: use vs. require vs. do
I see a lot of posts here where the question centers around the use of require to load and execute code. This seems to me to be an obvious bad practice, but I haven't found any good resources on the issue that I can point people to.
perlfaq8 covers the difference between use and require, but it doesn't offer any advice in regard to preference (as of 5.10--in 5.8.8 there is a quick bit of advice in favor of use).
This topic seems to suffer from a lack of discussion. I have a few questions I'd like to see discussed:
What is the preferred method of code reuse in Perl?
use ModuleName;
require ModuleName;
require 'file.pl';
do 'file.pl';
What is the difference between require ModuleName and require "file.pl"?
Is it ever a good idea to use require "file.pl"? Why or why not?

Standard practice is to use use most of the time, require occasionally, and do rarely.
do 'file' will execute file as a Perl script. It's almost like calling eval on the contents of the file; if you do the same file multiple times (e.g. in a loop) it will be parsed and evaluated each time which is unlikely to be what you want. The difference between do and eval is that do can't see lexical variables in the enclosing scope, which makes it safer. do is occasionally useful for simple tasks like processing a configuration file that's written in the form of Perl code.
require 'file' is like do 'file' except that it will only parse any particular file one time and will raise an exception if something goes wrong. (e.g. the file can't be found, it contains a syntax error, etc.) The automatic error checking makes it a good replacement for do 'file' but it's still only suited for the same simple uses.
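To make the difference between the two file forms concrete, here is a minimal sketch. The file name counter.pl and the $loads counter are made up for the demo; the file is written to a temporary directory so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical throwaway "library" file, created just for this demo.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter.pl";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "\$main::loads++;\n1;\n";   # ends with a true value, as require expects
close $fh;

our $loads = 0;

do $file for 1 .. 3;            # parsed and executed on every call
my $after_do = $loads;

require $file for 1 .. 3;       # executed once, then remembered in %INC
my $after_require = $loads;

print "after do: $after_do, after require: $after_require\n";
```

The counter goes up three times for the three do calls, but only once more for the three require calls.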
The do 'file' and require 'file' forms are carryovers from days long past when the *.pl file extension meant "Perl Library." The modern way of reusing code in Perl is to organize it into modules. Calling something a "module" instead of a "library" is just semantics, but the words mean distinctly different things in Perl culture. A library is just a collection of subroutines; a module provides a namespace, making it far more suitable for reuse.
use Module is the normal way of using code from a module. Note that Module is the package name as a bareword and not a quoted string containing a file name. Perl handles the translation from a package name to a file name for you. use statements happen at compile time and throw an exception if they fail. This means that if a module your code depends on isn't available or fails to load the error will be apparent immediately. Additionally, use automatically calls the import() method of the module if it has one which can save you a little typing.
require Module is like use Module except that it happens at runtime and does not automatically call the module's import() method. Normally you want to use use to fail early and predictably, but sometimes require is better. For example, require can be used to delay the loading of large modules which are only occasionally required or to make a module optional. (i.e. use the module if it's available but fall back on something else or reduce functionality if it isn't.)
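The "optional module" case might look roughly like the sketch below. try_load is a helper name invented here, and No::Such::Module::Here is a deliberately nonexistent module; List::Util is used only because it ships with Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time.
# Returns 1 on success and 0 if the module cannot be loaded.
sub try_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $have_list_util = try_load('List::Util');              # core module
my $have_missing   = try_load('No::Such::Module::Here');  # assumed not installed

my @n   = (3, 9, 4);
my $max = $have_list_util
    ? List::Util::max(@n)                 # use the module when it is there
    : (sort { $b <=> $a } @n)[0];         # otherwise fall back to plain Perl

print "max = $max\n";
```

Because nothing is imported, the function is called by its full name.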
Strictly speaking, the only difference between require Module and require 'file' is that the first form triggers the automatic translation from a package name like Foo::Bar to a file name like Foo/Bar.pm while the latter form expects a filename to start with. By convention, though, the first form is used for loading modules while the second form is used for loading libraries.

There is a strong preference for use, because it happens earlier, in an implicit BEGIN {} block during compilation, so errors tend to reach the user at a more appropriate time. It also calls the module's sub import {}, which gives the caller control over the import process. This is heavily used. You can get the same effect by calling the specific namespace's import yourself, but that requires you to know the name of the namespace and the file, and to code the call to the subroutine, which is a lot more work. use, by contrast, only requires you to know the namespace; it then requires the file with the matching name, so the user rarely has to think about the link between namespaces and files.
Read perldoc -f use, and perldoc -f require, for more information. Per perldoc -f use:
use Module LIST is the same as BEGIN { require Module; Module->import( LIST ); }, which is just a lot uglier.
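As a small illustration of that equivalence, the sketch below spells out by hand what use List::Util qw(sum); would do. List::Util is just a convenient core module to demonstrate with.

```perl
use strict;
use warnings;

# Long-hand form of: use List::Util qw(sum);
BEGIN {
    require List::Util;
    List::Util->import(qw(sum));
}

# The import ran at compile time, so sum() is already known here.
my $total = sum(1, 2, 3);
print "total = $total\n";
```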

The main difference is around import/export. use is preferred when you're making use of a module because it allows you to specify which routines you wish to import into your namespace :
use MyModule qw(foo bar baz); # allows foo(), bar() and baz() to be used
use MyModule qw(); # Requires explicit naming (e.g. MyModule::foo).
use also runs the module's import() routine, which is often used to set the module up.
See the perldoc for use for more detail.
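Here is a self-contained sketch of the import list in action. MyModule is defined inline (and registered in %INC by hand) purely so the example fits in one file; in real code it would live in MyModule.pm.

```perl
use strict;
use warnings;

# Hypothetical module, defined inline for the demo.
BEGIN {
    package MyModule;
    use Exporter 'import';
    our @EXPORT_OK = qw(foo bar baz);
    sub foo { return 'foo' }
    sub bar { return 'bar' }
    sub baz { return 'baz' }
    $INC{'MyModule.pm'} = __FILE__;   # tell "use" the module is already loaded
}

package main;
use MyModule qw(foo);                 # only foo() is imported

my $short = foo();                    # imported name
my $long  = MyModule::bar();          # not imported, still reachable by full name
my $bar_imported = defined &main::bar ? 1 : 0;

print "$short $long imported-bar=$bar_imported\n";
```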

use Module loads the module at compile time, so its cost is paid up front, before the program starts running, whether or not the code that needs it is ever reached. require Module loads it at run time, so the time and memory are spent only if that statement is actually executed, at the price of a delay at that point.

Related

Save resources by only loading parts of a module?

I am reading through O'Reilly's Perl Objects, References & Modules, more specifically its section about modules. It states that when using use Some::Module you can specify an import list. From its explanation it seems that the only benefit of using this list is for the sake of keeping your namespace clean. In other words, if you have a subroutine some_sub in your main package and the loaded module has a sub with the same name, your subroutine will be overridden. However, if you specify an import list and leave out some_sub from this list, you'll not have this conflict. You can then still run some_sub from the Module by declaring it like so: Some::Module::some_sub.
Is there any other benefit than the one I described above? I am asking this because in some cases you load modules with loads of functionality, even though you are only interested in some of its methods. At first I thought that by specifying an import list you only loaded those methods and not bloating memory with methods you wouldn't use anyway. However, from the explanation above that does not seem the case. Can you selectively save resources by only loading parts of a module? Or is Perl smart enough to do this when compiling without the need of a programmer's intervention?
From use we see that use Module LIST; means exactly
BEGIN { require Module; Module->import( LIST ); }
On the other hand, from require
Otherwise, require demands that a library file be included if it hasn't already been included. The file is included via the do-FILE mechanism, [...]
and do 'file' executes 'file' as a Perl script. Thus with use we load the whole module.
"Importing" a sub means that its name is added (or overwritten) in the caller's symbol table (via the CODE slot for the typeglob, normally aliased), by the package's import function. The sub's code isn't copied. Now, import can be written any way the author wants to, but generally the import list in the use statement merely controls what symbols are brought into the namespace. The preferred way to provide import in a module is to use the Exporter's import method.
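A quick way to see that importing only adds a name is to compare code references. This is a sketch using List::Util as the example module:

```perl
use strict;
use warnings;
use List::Util qw(max);   # import just one name

# The imported name and the original refer to the same piece of code.
my $same_sub = \&main::max == \&List::Util::max ? 1 : 0;

# min() was not requested, so it is absent from our symbol table,
# even though the whole module has been loaded.
my $min_imported = defined &main::min       ? 1 : 0;
my $min_loaded   = defined &List::Util::min ? 1 : 0;

print "same sub: $same_sub, min imported: $min_imported, min loaded: $min_loaded\n";
```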
Selective importing relieves the symbol table (and perhaps some related mechanisms), but I am not aware of practical benefits of this. The benefits are related to programming, via reduced chances for collisions.
Another clear benefit is that it nicely documents what is used in the code.
Note that "import list" is just a convention. A module's import function is free to do whatever it pleases with this list, and you can see it (ab)used by many so-called pragma modules. Partial loading is therefore NOT bound to use in any way. For example, a module can install stubs for its heavy functions whether you've imported them or not, and dynamically load the heavy implementation on the first actual call.
So use with a partial import list may or may not actually save any resources; it all depends on the implementation of the module being used.
While require and use do load the entire .pm file, that file could well be just a lightweight stub and loader for the actual code located elsewhere. By convention, the module holding that actual code is often given a ::Heavy suffix (Exporter::Heavy, for example).
Modules are free to implement partial loading in any way they please as well. Here are just some of the ways a module can save resources:
AUTOLOAD (with its complementary AutoLoader, AutoSplit, and SelfLoader modules).
Use stubs that load necessary submodules.
Dynamically load heavy data (i.e. dictionaries or encoding maps) when they are first accessed by their name.
If you depend on other heavy modules, dynamically require them at runtime in functions that depend on them instead of compile-time use at the very start.
Everything on this list could work automatically behind the scenes, be exposed through the use import list, or be invoked in some other, completely arbitrary way. Once again, it's completely up to the module's implementation.
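The last item on that list, a run-time require inside the function that needs it, can be sketched like this. to_json_text is a made-up helper, and JSON::PP merely stands in for "some heavy dependency" (it has shipped with Perl since 5.14).

```perl
use strict;
use warnings;

# Hypothetical helper: the dependency is loaded on the first call, not at startup.
sub to_json_text {
    my ($data) = @_;
    require JSON::PP;                  # does nothing after the first call
    return JSON::PP->new->canonical->encode($data);
}

my $loaded_before = exists $INC{'JSON/PP.pm'} ? 1 : 0;
my $text          = to_json_text({ a => 1 });
my $loaded_after  = exists $INC{'JSON/PP.pm'} ? 1 : 0;

print "before: $loaded_before, after: $loaded_after, text: $text\n";
```

A program that never calls to_json_text never pays for loading JSON::PP.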

Perl - Is there a way to import a class and all of its child classes?

In java, there is a way to import a class and all of its children in one line:
import java.utils.*
In Perl, I've found I can only import specific classes:
use Perl::Utils::Folder;
use Perl::Utils::Classes qw(new run_class);
Is there similar way like java to import everything that falls under a tree structure, only in Perl?
No, there is not a way to easily do what you are after.
You could walk the relevant paths in your Perl library's filesystem and use every .pm file you came across (that's what Module::Find, as suggested by @Daniel Böhmer, does), but that can miss a few things:
Packages that are declared in funny ways/at runtime.
Multiple packages per module file.
Other cases I haven't thought of.
This is also a bad idea, for a few reasons:
You mentioned "classes" in your question, rather than just packages. Perl packages and subpackages do not necessarily represent classes/instantiable object-oriented code. If you were to programmatically generate a list of all packages in a hierarchy and then call $packagename->new() on each of them, you might get a runtime error if one of the packages was just a library of functions.
Packages and subpackages often are not directly related, developed by the same people, or used for similar things. Just because a package starts with Net:: doesn't mean that it will obey standard conventions that other Net::-prefixed packages expect. For example, File::Find and File::Tail share a prefix, but have very little to do with each other; the prefix is in common because both utilities work with files as their goal.
Lots of packages do things at BEGIN/INIT/etc time when they're compiled. Some of them (sadly) do different things depending on the order in which they're used relative to other modules. The solution to this problem for module developers is "don't do that", but for module users, it's "use sparingly, and only when needed".
It clutters your local namespace with lots of potentially-exported symbols you don't necessarily need (to conditionally import symbols, you'll have to use import arguments like you're doing in your example; there's no programmatic way to define "symbols I'm interested in", since Perl doesn't have that kind of static analysis at compile time . . . not for lots of call styles, at least).
It slows down your program's startup time by compiling things you might not necessarily need. This might not seem important at the early phase of a project, but for larger projects it is very easy to end up in situations where you're pulling in over a thousand CPAN modules when you start Apache (or launch your main script, or whatever), and your app takes more than a minute just to start.
I have a hunch that you're trying to reduce boilerplate (as in: all of your modules have a big block of use statements at the top, and that's duplicated everywhere). There are a few ways to do this, starting with:
Don't: import things in each module as you need them, and use strict/warnings and lots of tests to be told early on if you're calling functionality that you haven't imported yet.
You could also make your own Exporter subclass that uses all of your standard modules and adds the functions that you frequently use from them to its @EXPORT (or splices their @EXPORT lists onto its own, or uses Exporter sub-sub-classing, or something).
Factor your code so that the parts that depend on multiple imported modules live in a single utility module, and import that.
Factor your code so that the parts that depend on the imported modules live in a parent class, and address its methods via instances of subclasses (or SUPER), so your subclasses don't have to explicitly contain the imports, e.g. $instance->method_that_calls_an_imported_function_in_the_parent();
Also, as an aside, using package.* imports in Java is debatable, and has many of the same drawbacks of doing it in Perl.
In Perl, the class Foo::Bar::Foo may not be a subclass of Foo::Bar. Nor, is there any guarantee that a subclass module even has the same class prefix. IO::File is a subclass of IO::Handle and not of IO which isn't even a module.
There also isn't an easy way to tell if a Perl module is a subclass of another Perl module. There are (at least) three ways to declare a subclass's relationship to a class:
use parent
use base
The @ISA package variable
It is possible to use @INC to find all modules, then look at the source for use parent, use base, and @ISA declarations and build a Perl class matrix, then go through that matrix to load the classes you do need. That will probably be slow and cumbersome, and doesn't even cover Moose-based classes.
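For reference, the three declaration styles look like this side by side. The Animal, Dog, Cat and Cow classes are invented for the sketch and all live in one file, which is why use parent needs -norequire here.

```perl
use strict;
use warnings;

{
    package Animal;
    sub new   { my ($class) = @_; return bless {}, $class }
    sub speak { return 'generic noise' }
}
{
    package Dog;
    use parent -norequire, 'Animal';   # -norequire: Animal is defined in this file
}
{
    package Cat;
    use base 'Animal';
}
{
    package Cow;
    our @ISA = ('Animal');             # set at run time, when this block executes
}

# None of the subclass names starts with "Animal::", yet all three inherit from it.
my @is_animal = map { $_->new->isa('Animal') ? 1 : 0 } qw(Dog Cat Cow);
print "@is_animal\n";
```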
You're asking the wrong question. You're asking "Find all of the subclasses of a particular class." This will include classes that you're probably not even interested in. I know from experience (with LWP, for example) that there can be dozens of classes and subclasses that include stuff you don't care about.
What you should be asking is "What do I need to do?", and then find the classes that fulfill your needs. If these classes happen to be child classes of a particular parent class, these subclasses will load the required class.
We do Java programming here, and one of the standards is not to use asterisks in our import statements. This is considered sloppy programming. If you need a particular class, you should declare it rather than simply declaring a superclass. Many of our reporting tools have problems with asterisk declarations in import statements.
There is a Module::Find module, but I am not sure exactly how it works. I believe it simply assumes that subclasses are in the same module hierarchy as the superclass, but that's far from true in Perl.
In general, I think it is a bad idea to load a whole 'tree' of modules (or subclasses so to speak).
There is definitely something wrong in your design if you need to know all and everything about sub classes/modules. You break the rules of encapsulation and you should not need to know how a class handles its responsibilities.

How can I tell if a Perl module is actually used in my program?

I have been on a "cleaning spree" lately at work, doing a lot of touch-up stuff that should have been done a while ago. One thing I have been doing is deleting modules that were imported into files and never used, or that were used at one point but aren't anymore. To do this I have just been deleting an import and running the program's test file, which gets really, really tedious.
Is there any programmatic way of doing this? Short of me writing a program myself to do it.
Short answer, you can't.
Longer possibly more useful answer, you won't find a general purpose tool that will tell you with 100% certainty whether the module you're purging will actually be used. But you may be able to build a special purpose tool to help you with the manual search that you're currently doing on your codebase. Maybe try a wrapper around your test suite that removes the use statements for you and ignores any error messages except messages that say Undefined subroutine &__PACKAGE__::foo and other messages that occur when accessing missing features of any module. The wrapper could then automatically perform a dumb source scan on the codebase of the module being purged to see if the missing subroutine foo (or other feature) might be defined in the unwanted module.
You can supplement this with Devel::Cover to determine which parts of your code don't have tests so you can manually inspect those areas and maybe get insight into whether they are using code from the module you're trying to purge.
Due to the halting problem you can't statically determine whether any program of sufficient complexity will exit or not. This applies to your problem because the "last" instruction of your program might be the one that uses the module you're purging. And since it is impossible to determine what the last instruction is, or if it will ever be executed, it is impossible to statically determine if that module will be used. Further, in a dynamic language, which can extend the program during its run, analysis of the source or even the post-compile symbol tables would only tell you what was calling the unwanted module just before run-time (whatever that means).
Because of this you won't find a general purpose tool that works for all programs. However, if you are positive that your code doesn't use certain run-time features of Perl you might be able to write a tool suited to your program that can determine if code from the module you're purging will actually be executed.
You might create alternative versions of the modules in question, which have only an AUTOLOAD method (and import, see comment) in it. Make this AUTOLOAD method croak on use. Put this module first into the include path.
You might refine this method by making AUTOLOAD only log the usage and then load the real module and forward the original function call. You could also have a subroutine first in @INC which creates the fake module on the fly if necessary.
Of course you need a good test coverage to detect even rare uses.
This concept is definitely not perfect, but it might work with lots of modules and simplify the testing.
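A rough sketch of the logging variant of that stub follows. Suspect::Module, frobnicate and configure are invented names; a real stub would sit in its own file ahead of the genuine module in the include path, and could go on to load the real module and forward the call.

```perl
use strict;
use warnings;

our @usage_log;   # filled in by the stub below

# Hypothetical stand-in for a module suspected of being unused.
{
    package Suspect::Module;
    our $AUTOLOAD;

    sub import { }                          # let "use Suspect::Module" succeed quietly

    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        return if $name =~ /::DESTROY\z/;   # ignore object destruction
        push @main::usage_log, $name;       # record the call instead of croaking
        return;
    }
}

Suspect::Module::frobnicate(42);            # function-style call
Suspect::Module->configure;                 # method-style call

print "$_\n" for @usage_log;
```

If the log is still empty after a full test run, that is a hint (not proof) that the module is unused.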

Fastest way of calling a subroutine

As far as I know, in Perl, we can call a subroutine from a Module by using these techniques:
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
Create an Object of that Module in your perl script finally call foo using that Object.
Directly call foo using its path, like this myDir::Module::foo();.
I am always confused about which is the better way of calling a subroutine foo.
If I have a dynamic script, which I run from the browser and not command line, which approach one should go for so that the script takes less time.
Thanks.
There is a difference between the fastest, and the best way to call code in Perl.
Edit: please see simbabque's answer as well. He especially covers the differences between #1 and #3, and why you would use either.
#1, #3: Function calls
Your #1 and #3 are identical: the subroutine has a unique name in the globally visible namespace. Many names may map to one subroutine via aliases or by importing a module.
If the name of the function you are calling is known at compile time, the sub will be resolved at compile time. This assumes that you don't spontaneously redefine your functions. If the exact function is only known at runtime, this is only a hash lookup away.
There are three ways to call a function:
foo(@args);
&foo(@args);
@_ = @args; goto &foo;
Number one (parentheses sometimes optional) is the default, and validates your arguments against the sub prototype (don't use prototypes). Also, a whole call stack frame (with much useful debug information) is constructed. This takes time.
Number two skips the prototype verification, and assumes that you know what you are doing. This is slightly faster. I think this is sloppy style.
Number three is a tail call. This returns from the current sub with the return value of foo. This is fast, as prototypes are ignored, and the current call stack frame can be reused. This isn't useful very often, and has ugly syntax. Inlining the code is about an order of magnitude faster (i.e. in Perl, we prefer loops over recursion ☹).
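The three call styles, and the stack-frame reuse of goto &sub, can be observed with a small sketch. The sub names are made up, and depth() simply counts frames with caller.

```perl
use strict;
use warnings;

my @args = (2, 3);

sub add      { return $_[0] + $_[1] }
sub add_tail { goto &add }            # tail call: current @_ is passed along

my $r1 = add(@args);                  # 1: normal call
my $r2 = &add(@args);                 # 2: ampersand form, prototypes ignored
my $r3 = add_tail(@args);             # 3: reaches add() via goto &sub

# Count the sub frames currently on the call stack.
sub depth {
    my $n = 0;
    $n++ while caller($n);
    return $n;
}
sub plain_wrapper { return depth() }  # ordinary call: one extra frame
sub tail_wrapper  { goto &depth }     # frame of tail_wrapper is replaced

my $plain_depth = plain_wrapper();
my $tail_depth  = tail_wrapper();

print "results: $r1 $r2 $r3\n";
print "extra frames with a plain call: ", $plain_depth - $tail_depth, "\n";
```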
#2: Method calls
The flexibility of OO comes at a hefty performance price: as the type of the object you call the method on is never known until runtime, the actual method can only be resolved at runtime.
This means that $foo->bar() looks up the function bar in the package that $foo was blessed into. If it can't be found there, it will be searched for in parent classes. This is slow. If you want to use OO, prefer shallow hierarchies (→ fewer lookups). Do also note that Perl's default Method Resolution Order is unusual.
You cannot generally reduce a method call to a function call, even if you know the type.
If $foo is of class Foo, and Foo::bar is a sub, then Foo::bar($foo) will skip the method resolution, and might even work. However, this breaks encapsulation, and will break once Foo is subclassed. Also, this doesn't work if Foo doesn't define bar, but the method was defined in a parent class.
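Here is a minimal sketch of how the direct call goes wrong once a subclass overrides the method. Foo::Child is a hypothetical subclass added for the demo.

```perl
use strict;
use warnings;

{
    package Foo;
    sub new { my ($class) = @_; return bless {}, $class }
    sub bar { return 'Foo::bar' }
}
{
    package Foo::Child;
    our @ISA = ('Foo');
    sub bar { return 'Foo::Child::bar' }   # overrides the parent method
}

my $obj = Foo::Child->new;

my $via_method   = $obj->bar;        # resolved at run time, finds the override
my $via_function = Foo::bar($obj);   # hard-wired to Foo, bypasses the override

print "method call:   $via_method\n";
print "function call: $via_function\n";
```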
I am generally in favour of object orientation, until it is clear from benchmarks that this will not provide the performance you require.
Export subroutine foo, import the module which has this subroutine. Finally call it in your perl script.
In order to do this, you would use Exporter in the module/package that implements the sub. You tell your module what it will export via @EXPORT_OK and @EXPORT. If you use the module, stuff gets imported into your current namespace at compile time. The following two statements are equivalent.
# This is the same...
use Module;
# ... as this
BEGIN {
require Module;
Module->import();
}
You want to do this if you have stuff you are going to use in your main script, or you are going to use often. Some examples are List::Util, Data::Dumper or use feature 'say'. Of course you can also use it in other modules.
use Data::Dumper;
use List::Util qw(max);
use feature qw(say);
my @foo = (1, 2, 3, 4, 5, 23);
print Dumper \@foo;
say max(@foo);
The catch is that here, you 'pollute' your namespace. Do this if you must, but keep in mind that it happens at compile time, so it is not conditional. You cannot say:
if ($foo) {
use Some::Module 'foo';
foo($foo);
} else {
use Something::Else 'bar';
bar();
}
It will load both Some::Module and Something::Else at compile time, thus increasing the time and memory your program consumes. The condition will work of course, but it is not efficient.
Create an Object of that Module in your perl script finally call foo using that Object.
This is the OOP approach. It is (as mentioned above) not comparable to the other methods. You don't need to import methods of an object. You just load your class (which is a module) with either use or require (see above), create an instance and use its methods to your liking. However, you need an object-oriented module for that. If you are interested in how that works, start by taking a look at perlootut.
Directly call foo using its path, like this myDir::Module::foo();.
It's actually not quite its path, but rather its name(space). For example, Data::Dumper is Dumper.pm located in the folder Data, somewhere in your lib dir. But that is not really important.
The main difference from the first approach is that you omit the importing part. This is useful if you want to build something that conditionally loads certain modules, or if you are in a huge (maybe legacy) application and do not want to pollute the namespace.
if ($order_has_some_condition) {
require Very::Long::NameSpace::For::This::Condition::Module;
Very::Long::NameSpace::For::This::Condition::Module::do_stuff_with_an_order($order);
}
Imagine this piece of code is in a legacy sub with 2k lines and a lot of stuff going on, most of which is never called in our case. We do not want to use our module, making it available for each of the maybe 100 different cases that are handled in this huge piece of code. Instead, we want to load it only if we really need it. So we require the module and call its sub directly using the full name.
In conclusion, both the first and the third way have their merits. They both need to exist, and they should both be used where appropriate. In some cases it is just a matter of taste, but in others it makes sense to decide deliberately. The second, OOP, approach is something else entirely.
There are no real speed differences, and as Borodin said, Perl is fast. Of course, if you do not import stuff, you don't have to 'pay' for the import. In a 10-line script, that doesn't matter. In legacy software with potentially thousands of lines of code and many use cases in one huge file, it matters a lot.
I hope this helps you decide.

Why would you want to export symbols in Perl?

It seems strange to me that Perl would allow a package to export symbols into another package's namespace. The exporting package doesn't know if the using package already defined a symbol by the same name, and it certainly can't guarantee that it's the only package exporting a symbol by that name.
A very common problem caused by this is using CGI and LWP::Simple at the same time. Both packages export head() and cause an error. I know, it's easy enough to work around, but that's not the point. You shouldn't have to employ work arounds to use two practically-core Perl libraries.
As far as I can see, the only reason to do this is laziness. You save some key strokes by not typing Foo:: or using an object interface, but is it really worth it?
The practice of exporting all the functions from a module by default is not the recommended one for Perl. You should only export functions if you have a good reason. The recommended practice is to use EXPORT_OK so that the user has to type the name of the required function, like:
use My::Module 'my_function';
Modules from way back when, like LWP::Simple and CGI, were written before this recommendation came in, and it's hard now to alter them to not export things since it would break existing software. I guess the recommendation came about through people noticing problems like that.
Anyway, Perl's object-oriented interfaces don't require you to export anything, and you still have to write $foo->, so that part of your question is wrong.
Exporting is a feature. Like every other feature in any language, it can cause problems if you (ab)use it too frequently, or where you shouldn't. It's good when used wisely and bad otherwise, just like any other feature.
Back in the day when there weren't many modules around, it didn't seem like such a bad thing to export things by default. With 15,000 packages on CPAN, however, there are bound to be conflicts and that's unfortunate. However, fixing the modules now might break existing code. Whenever you make a poor interface choice and release it to the public, you're committed to it even if you don't like it.
So, it sucks, but that's the way it is, and there are ways around it.
The exporting package doesn't know if the using package already defined a symbol by the same name, and it certainly can't guarantee that it's the only package exporting a symbol by that name.
If you wanted to, I imagine your import() routine could check, but the default Exporter.pm routine doesn't check (and probably shouldn't, because it's going to get used a lot, and always checking if a name is defined will cause a major slowdown (and if you found a conflict, what is Exporter expected to do?)).
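A hand-written import() that does check might look something like this sketch. My::Careful is an invented module that exports a single head() function, and skipping the export while noting the conflict is just one possible policy.

```perl
use strict;
use warnings;

our @conflicts;   # names that import() refused to overwrite

# Hypothetical module, defined inline for the demo.
BEGIN {
    package My::Careful;
    sub head { return 'My::Careful::head' }

    # Export head() only if the caller does not already have one.
    sub import {
        my $caller = caller;
        no strict 'refs';
        if (defined &{"${caller}::head"}) {
            push @main::conflicts, "${caller}::head";
            return;
        }
        *{"${caller}::head"} = \&head;
    }
    $INC{'My/Careful.pm'} = __FILE__;   # mark the module as already loaded
}

sub head { return 'main::head' }        # the caller's own head()
use My::Careful;                        # conflict: main::head is left alone

{ package Other; BEGIN { My::Careful->import } }   # no conflict here

my $main_head  = head();
my $other_head = Other::head();
print "$main_head / $other_head / conflicts: @conflicts\n";
```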
As far as I can see, the only reason to do this is laziness. You save some key strokes by not typing Foo:: or using an object interface, but is it really worth it?
Foo:: isn't so bad, but consider My::Company::Private::Special::Namespace:: - your code will look much cleaner if we just export a few things. Not everything can (or should) be in a top-level namespace.
The exporting mechanism should be used when it makes code cleaner. When namespaces collide, it shouldn't be used, because it obviously isn't making code cleaner, but otherwise I'm a fan of exporting things when requested.
It's not just laziness, and it's not just old modules. Take Moose, "the post-modern object system", and Rose::DB::Object, the object interface to a popular ORM. Both import the meta method into the useing package's symbol table in order to provide features in that module.
It's not really any different than the problem of multiply inheriting from modules that each provide a method of the same name, except that the order of your parentage would decide which version of that method would get called (or you could define your own overridden version that somehow manually folded the features of both parents together).
Personally I'd love to be able to combine Rose::DB::Object with Moose, but it's not that big a deal to work around: one can make a Moose-derived class that “has a” Rose::DB::Object-derived object within it, rather than one that “is a” (i.e., inherits from) Rose::DB::Object.
One of the beautiful things about Perl's "open" packages is that if you aren't crazy about the way a module author designed something, you can change it.
package LWPS;
require LWP::Simple;
for my $sub (@LWP::Simple::EXPORT, @LWP::Simple::EXPORT_OK) {
no strict 'refs';
*$sub = sub {shift; goto &{'LWP::Simple::' . $sub}};
}
package main;
my $page = LWPS->get('http://...');
Of course, in this case LWP::Simple::get() would probably be better.