Exporting subroutines from a module 'used' via a 'require' - perl

I'm working with a set of perl scripts which our build system is written with. Unfortunately they were not written as a set of modules, but instead a bunch of .pl files which 'require' each other.
After making some changes to a 'LogOutput.pl' which was used by almost every other file, I started to suffer from some issues caused by the file being 'require'd multiple times.
In an effort to fix this, while not changing every file (some of which are not under my direct control), I did the following:
-Move everything in LogOutput.pl to a new file LogOutput.pm, this one having everything needed to make it a module (based on reading http://www.perlmonks.org/?node_id=102347 ).
-Replace the existing LogOutput.pl with the following
BEGIN
{
use File::Spec;
push #INC, File::Spec->catfile($BuildScriptsRoot, 'Modules');
}
use COMPANY::LogOutput qw(:DEFAULT);
1;
This works, except that I need to change calling code to prefix the sub names with the new package (i.e. COMPANY::LogOutput::OpenLog instead of just OpenLog)
Is there any way for me to export the new module's subroutine's from within LogOutput.pl?

The well named Import::Into can be used to export a module's symbols into another package.
use Import::Into;
# As if Some::Package did 'use COMPANY::LogOutput'
COMPANY::LogOutput->import::into("Some::Package");
However, this shouldn't be necessary. Since LogOutput.pl has no package, its code is in the package it was required from. use COMPANY::LogOutput will export into the package which required LogOutput.pl. Your code, as written, should work to emulate a bunch of functions in a .pl file.
Here's what I assume LogOutput.pl looked like (using the subroutine "pass" as a stand in for whatever subroutines you had in there)...
sub pass { print "pass called\n" }
1;
And what I assume LogOutput.pl and LogOutput.pm look like now...
# LogOutput.pl
BEGIN
{
use File::Spec;
push #INC, File::Spec->catfile($BuildScriptsRoot, 'Modules');
}
use COMPANY::LogOutput qw(:DEFAULT);
1;
# LogOutput.pm
package test;
use strict;
use warnings;
use Exporter "import";
our #EXPORT_OK = qw(pass);
our %EXPORT_TAGS = (
':DEFAULT' => [qw(pass)],
);
sub pass { print "pass called\n" }
1;
Note this will not change the basic nature of require. A module will still only be required once, after that requiring it again is a no-op. So this will still not work...
{
package Foo;
require "test.pl"; # this one will work
pass();
}
{
package Bar;
require "test.pl"; # this is a no-op
pass();
}
You can make it work. Perl stores the list of what files have been required in %INC. If you delete and entry, Perl will load the file again. However, you have to be careful that all the code in the .pl file is ok with this. That #INC hack has to make sure its only run once.
BEGIN
{
use File::Spec;
# Only run this code once, no matter how many times this
# file is loaded.
push #INC, File::Spec->catfile($BuildScriptsRoot, 'Modules')
if $LogOutput_pl::only_once++;
}
use COMPANY::LogOutput qw(:DEFAULT);
# Allow this to be required and functions imported more
# than once.
delete $INC{"LogOutput.pl"};
1;
This is one of the few cases that a global variable is justified. A lexical (my) variable must be declared and would be reset with each loading of the library. A global variable does not need to be declared and will persist between loading.

This turned out to just be a stupid mistake on my part, I didn't put the subs into the #EXPORT list, only into #EXPORT_OK.

Related

Anyway to tell if perl script is run via do?

I have a small script configuration file which is loaded from a main script.
#main.pl
package MYPACKAGE;
our $isMaster=1;
package main;
my config=do "./config.pl"
#config.pl
my $runViaDoFlag;
$runViaDoFlag=$0=~/main\.pl/; #Test if main.pl is the executing script
$runViaDoFlag=defined $MYPACKAGE::isMaster; #Test custom package variable
#Is there a 'built-in' way to do this?
die "Need to run from main script! " unless $runViaDoFlag;
{
options={
option1=>"some value",
option2=>"some value",
},
mySub=>sub {
# do interesting things here
}
}
In a more complicated config file it might not be so obvious that config.pl script is intended to only be executed by do. Hence I want to include a die with basic usage instructions.
Solutions:
test $0 for the main script name
have custom package variable defined in the main script and checked by the config script
simply have a comment in the config instructing the user how to use it.
These work, however is there some way of knowing if a script is executed via do via built-in variable/subs?
I'd offer a change in design: have that configuration in a normal module, in which you can then test whether it's been loaded by (out of) the main:: namespace or not. Then there is no need for any of that acrobatics with control variables etc.
One way to do that
use warnings;
use strict;
use feature 'say';
use FindBin qw($RealBin);
use lib $RealBin; # so to be able to 'use' from current directory
use ConfigPackage qw(load_config);
my $config = load_config();
# ...
and the ConfigPackage.pm (in the same directory)
package ConfigPackage;
use warnings;
use strict;
use feature 'say';
use Carp;
use Exporter qw(); # want our own import
our #EXPORT_OK = qw(load_config);
sub import {
#say "Loaded by: ", (caller)[0];
croak "Must only be loaded from 'main::'"
if not ( (caller)[0] eq 'main' );
# Now switch to Exporter::import to export symbols as needed
goto &Exporter::import;
}
sub load_config {
# ...
return 'Some config-related data structure';
}
1;
(Note that this use of goto is fine.)
This is just a sketch of course; adjust, develop further, and amend as needed. If this is lodaed out of a package other than main::, and so it fails, then that happens in the compile phase, since that's when import is called. I'd consider that a good thing.
If that config code need be able to run as well, as the question may indicate, then have a separate executable that loads this module and runs what need be run.
As for the question as stated, the title and the question's (apparent) quest differ a little, but both can be treated by using caller EXPR. It won't be a clean little "built-in" invocation though.
The thing about do as intended to be used is that
do './stat.pl' is largely like
eval `cat stat.pl`;
except that ...
(That stat.pl is introduced earlier in docs, merely to signify that do is invoked on a file.)
Then caller(0) will have clear hints to offer (see docs). It returns
my ($package, $filename, $line, $subroutine, $hasargs,
$wantarray, $evaltext, $is_require, $hints, $bitmask, $hinthash)
= caller($i);
In a call asked for, do './config.pl', apart from main (package) and the correct filename, the caller(0) in config.pl also returns:
(eval) for $subroutine
./config.pl for $evaltext
1 for $is_require
Altogether this gives plenty to decide whether the call was made as required.
However, I would not recommend this kind of involved analysis instead of just using a package, what is also incomparably more flexible.

Perl: Use custom nested modules organised in nested sub-directories

I'm trying to figure out how to use nested modules organised in nested sub-directories with Perl. I mean, a test.pl program use a Foo module, and this in turn use another Bar module and so on... Let's put a little example, files could be organised in directories like that:
./test.pl
./lib/Foo.pm
./lib/common/Bar.pm
First thing that comes to mind is using the FindBin module in test.pl like that:
use FindBin;
use lib "$FindBin::RealBin/lib/.";
use Foo;
But if you want to do the same inside Foo to "use Bar", you need to include all the relative path from the test.pl program, including "/lib" segment. That means Foo needs to be aware of his relative path to the calling program. This forces kind of rigidity in the directories structure. For example, you can't simply copy&paste your custom modules wherever you want and call it. Besides, it needs to install the FindBin module in order to work.
In order to solve this issues, googling I found this solution: BEGIN solution that directly add paths to #INC. With this in mind, a solution could be:
./test.pl
#!/usr/bin/perl
use strict;
use warnings;
# Include lib in #INC
BEGIN {
use File::Basename;
my($filename, $dirs, $suffix) = fileparse(__FILE__);
my $common_path = $dirs."lib/.";
unshift(#main::INC, $common_path) ;
}
use Foo;
print "Inside: $Foo::message";
./lib/Foo.pm
package Foo;
# Include Common library
BEGIN {
use File::Basename;
my($filename, $dirs, $suffix) = fileparse(__FILE__);
$common_path = $dirs."common/.";
unshift(#main::INC, $common_path) ;
}
use Bar;
$message = " Foo > $Bar::message";
1;
./lib/common/Bar.pm
package Bar;
$message = "Bar";
1;
Executing ./test.pl should print:
Inside: Foo > Bar
You could nest any number of modules in their respective directories (I tested three levels), and what is better, copy&paste at whatever point of path without breaking functionality.
Nevertheless I dont't know if this method is advisable or have any disadvantages (I'm newby with perl). For example, is it advisable editing #INC directly like this?. is it advisable use the BEGIN code block for that?. Or, are there better ways to do this (allowing copy&paste directories at arbitrary points of the modules directories structure) so it works without touching code inside the modules?
Don't change #INC in modules. Leave that to the script.
use FindBin qw( $RealBin );
use lib "$RealBin/lib", "$RealBin/lib/common";
That means Foo needs to be aware of his relative path to the calling program.
If the libs are there for the script and each other, that's not a problem, and using use lib makes sense.
Otherwise, it's just a module for any script to use, and use lib shouldn't be used. Instead, it should be installed into standard installation directory (or a custom one specified by env var PERL5LIB).
By the way, you probably shouldn't put common inside of lib. This implies you can do use common::Bar;, which would be wrong.

In Perl, why do I get "undefined subroutine" in a perl module but not in main ?

I'm getting an "undefined subroutine" for sub2 in the code below but not for sub1.
This is the perl script (try.pl)...
#!/usr/bin/env perl
use strict;
use IO::CaptureOutput qw(capture_exec_combined);
use FindBin qw($Bin);
use lib "$Bin";
use try_common;
print "Running try.pl\n";
sub1("echo \"in sub1\"");
sub2("echo \"in sub2\"");
exit;
sub sub1 {
(my $cmd) = #_;
print "Executing... \"${cmd}\"\n";
my ($stdouterr, $success, $exit_code) = capture_exec_combined($cmd);
print "${stdouterr}\n";
return;
}
This is try_common.pm...
#! /usr/bin/env perl
use strict;
use IO::CaptureOutput qw(capture_exec_combined);
package try_common;
use Exporter;
our #ISA = qw(Exporter);
our #EXPORT = qw(
sub2
);
sub sub2 {
(my $cmd) = #_;
print "Executing... \"${cmd}\"\n";
my ($stdouterr, $success, $exit_code) = capture_exec_combined($cmd);
print "${stdouterr}\n";
return;
}
1;
When I run try.pl I get...
% ./try.pl
Running try.pl
Executing... "echo "in sub1""
in sub1
Executing... "echo "in sub2""
Undefined subroutine &try_common::capture_exec_combined called at
/home/me/PERL/try_common.pm line 20.
This looks like some kind of scoping issue because if I cut/paste the "use IO::CaptureOutput qw(capture_exec_combined);" as the first line of sub2, it works. This is not necessary in the try.pl (it runs sub1 OK), but a problem in the perl module. Hmmmm....
Thanks in Advance for any help!
You imported capture_exec_combined by the use clause before declaring the package, so it was imported into the main package, not the try_common. Move the package declaration further up.
You should take a look at the perlmod document to understand how modules work. In short:
When you use package A (in Perl 5), you change the namespace of the following code to A, and all global symbol (e.g. subroutine) definitions after that point will go into that package. Subroutines inside a scope need not be exported and may be used preceded by their scope name: A::function. This you seem to have found.
Perl uses package as a way to create modules and split code in different files, but also as the basis for its object orientation features.
Most of the times, modules are handled by a special core module called Exporter. See Exporter. This module uses some variables to know what to do, like #EXPORT, #EXPORT_OK or #ISA. The first defines the names that should be exported by default when you include the module with use Module. The second defines the names that may be exported (but need to be mentioned with use Module qw(name1 name2). The last tells in an object oriented fashion what your module is. If you don't care about object orientation, your module typically "is a" Exporter.
Also, as stated in another answer, when you define a module, the package module declaration should be the first thing to be in the file so anything after it will be under that scope.
I hate when I make this mistake although I don't make it much anymore. There are two habits you can develop:
Most likely, make the entire file the package. The first lines will be the package statement and no other package statements show up in the file.
Or, use the new PACKAGE BLOCK syntax and put everything for that package inside the block. I do this for small classes that I might need only locally:
package Foo {
# everything including use statements go in this block
}
I think I figured it out. If, in the perl module, I prefix the "capture_exec_combined" with "::", it works.
Still, why isn't this needed in the main, try.pl ?

Name space pollution from indirectly included module

Consider the following script p.pl:
use strict;
use warnings;
use AA;
BB::bfunc();
where the file AA.pm is:
package AA;
use BB;
1;
and the file BB.pm is:
package BB;
sub bfunc {
print "Running bfunc..\n";
}
1;
Running p.pl gives output (with no warnings or errors):
Running bfunc..
Q: Why is it possible to call BB::bfunc() from p.pl even though there is no use BB; in p.pl? Isn't this odd behavior? Or are there situation where this could be useful?
(To me, it seems like this behavior only presents an information leak to another package and violates the data hiding principle.. Leading to programs that are difficult to maintain.. )
You're not polluting a namespace, because the function within BB isn't being 'imported' into your existing namespace.
They are separate, and may be referenced autonomously.
If you're making a module, then usually you'll define via Exporter two lists:
#EXPORT and #EXPORT_OK.
The former is the list of things that should be imported when you use the package. The latter is the things that you can explicity import via:
use MyPackage qw ( some_func );
You can also define package variables in your local namespace via our and reference them via $main.
our $fish = "haddock";
print $main::fish;
When you do this, you're explicitly referencing the main namespace. When you use a module, then you cause perl to go and look for it, and include it in your %INC. I then 'knows about' that namespace - because it must in order for the dependencies to resolve.
But this isn't namespace pollution, because it doesn't include anything in your namespace until your ask.
This might make a bit more sense if you have multiple packages within the same program:
use strict;
use warnings;
package CC;
our $package_var = "Blong";
sub do_something {
print $package_var,"\n";
}
package main;
use Data::Dumper;
our $package_var = "flonk";
print Dumper $package_var;
print Dumper $CC::package_var;
Each package is it's own namespace, but you can 'poke' things in another. perl will also let you do this with object - poking at the innards of instantiated objects or indeed "patch" them.
That's quite powerful, but I'd generally suggest Really Bad Style.
While it's good practice to use or require every dependency that you are planning to access (tried to avoid use here), you don't have to do that.
As long as you use full package names, that is fine. The important part is that Perl knows about the namespaces. If it does not, it will fail.
When you use something, that is equivalent to:
BEGIN {
require Foo::Bar;
Foo::Bar->import();
}
The require will take the Foo::Bar and convert it to a path according to the operating system's conventions. On Linux, it will try to find Foo/Bar.pm somewhere inside #INC. It will then load that file and make a note in %INC that it loaded the file.
Now Perl knows about that namespace. In case of the use it might import something into your own namespace. But it will always be available from everywhere after that as long as you use the full name. Just the same, stuff that you have in your main script.pl would be available inside of packages by saying main::frobnicate(). (Please don't do that!)
It's also not uncommon to bundle several namespaces/packages in one .pm module file. There are quite a few big names on CPAN that do it, like XML::Twig.
If you do that, and don't import anything, the only way to get to the stuff under the different namespaces is by using the full name.
As you can see, this is not polluting at all.

Avoiding collisions with modules with same method names

I am thinking of using the following perl modules from Cpan:
CSS-Minifier
Javascript-Minifier
I note from the documentation that I need to call the "minify" method for both.
I know I must be missing something obvious as a Perl newbie, but wouldn't they collide?
I suppose the question is how do I specify the "minify" for each module separately so that the CSS module works on CSS only and the JS module works on JS only?
Perl modules include subroutines. Usually, the reason to use a module is to make use of the subroutines from that module. A module will have its own package name and the subroutines from that module will exist in that package. So a simple module might look like this:
package MyModule;
sub my_sub {
print "This is my sub\n";
}
If I load that module in my code, then I have to call the subroutine using its fully qualified name (i.e. including the package name).
use MyModule;
MyModule::my_sub();
That gets repetitive quite quickly, so many modules will export their subroutines into your package. They do that using special arrays called #EXPORT and #EXPORT_OK.
If I put the name of a subroutine into #EXPORT then it automatically gets imported whenever that module is used.
package MyModule;
our #EXPORT = ('my_sub');
sub my_sub {
print "This is my sub\n";
}
I can then use it like this:
use MyModule;
my_sub();
Alternatively, I can use #EXPORT_OK which defines optional exports. Users can ask for subroutines in #EXPORT_OK to be imported - but it doesn't happen automatically. You ask for an option export to be imported by including the name in the use statement.
package MyModule;
our #EXPORT_OK = ('my_sub');
sub my_sub {
print "This is my sub\n";
}
I can now do this:
use MyModule ('my_sub');
my_sub();
There's one more trick that might be useful. You can turn off automatic imports with an empty list on the use statement. Assume we have the #EXPORT version of the module:
use MyModule (); # Turn off imports
my_sub(); # Doesn't work
MyModule::my_sub(); # works
Now we have the knowledge we need to look at your specific problem. The simplest solution is to turn off all automatic imports and use the fully-qualified names of both of the subroutines.
use CSS::Minifier ();
use JavaScript::Minifier();
CSS::Minifier::minify();
JavaScript::Minifier::minify();
But we can be a bit cleverer. The documentation for both modules suggests that both imports are optional so that you need to explicitly import the subroutines.
use CSS::Minifier ('minify');
use JavaScript::Minifier('minify');
This would obviously be a bad thing to do, as you can't import two subroutines with the same name!
However, looking at the source code to the modules, I see that the documentation for JavaScript::Minifier is wrong (Edit: I've submitted a report about this error). CSS::Minifier has this line of code:
our #EXPORT_OK = qw(minify);
But JavaScript::Minifier has this:
our #EXPORT = qw(minify);
So the export is automatic from JavaScript::Minifier and optional from CSS::Minifier. So the simplest approach to take would be:
use JavaScript::Minifier; # Automatically imports minify
use CSS::Minifier; # Doesn't import option export minify
You could then use the two subroutines like this:
minify(); # Automatic export from JavaScript::Minifier
CSS::Minifier::minify(); # Fully-qualified name from CSS::Minifier
I suspect, however, that this is a bad approach as it's hard to be sure where the unqualified version of minify() comes from. I'd therefore recommend turning off all imports from the two modules and using the fully-qualified names for both subroutines.
To call it separately if both exported, use :
CSS::Minifier::minify();
or
JavaScript::Minifier::minify();
You can use the fully qualified names to reference subroutines and remove ambiguity. And create shorter names with aliases.
use CSS::Minifier;
use JavaScript::Minifier;
BEGIN {
*minify_css = \&{CSS::Minifier::minify};
*minify_js = \&{JavaScript::Minifier::minify};
}
minify_css(...);
minify_js(...);