Namespace pollution from indirectly included module - perl

Consider the following script p.pl:
use strict;
use warnings;
use AA;
BB::bfunc();
where the file AA.pm is:
package AA;
use BB;
1;
and the file BB.pm is:
package BB;
sub bfunc {
print "Running bfunc..\n";
}
1;
Running p.pl gives output (with no warnings or errors):
Running bfunc..
Q: Why is it possible to call BB::bfunc() from p.pl even though there is no use BB; in p.pl? Isn't this odd behavior? Or are there situations where this could be useful?
(To me, it seems like this behavior only presents an information leak to another package and violates the data-hiding principle, leading to programs that are difficult to maintain.)

You're not polluting a namespace, because the function within BB isn't being 'imported' into your existing namespace.
They are separate, and may be referenced independently.
If you're making a module, then usually you'll define two lists via Exporter:
@EXPORT and @EXPORT_OK.
The former is the list of things that should be imported when you use the package. The latter is the things that you can explicitly import via:
use MyPackage qw( some_func );
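A single-file sketch of the difference between the two lists; MyPackage and its subs are hypothetical names, and the module is inlined and registered in %INC here only so the use line works without a separate .pm file:

```perl
use strict;
use warnings;

# Hypothetical module, inlined and registered in %INC so this runs
# as a single file; normally it would live in MyPackage.pm.
BEGIN {
    package MyPackage;
    use Exporter 'import';
    our @EXPORT    = qw(always_func);  # imported by a plain 'use MyPackage;'
    our @EXPORT_OK = qw(some_func);    # imported only when asked for
    sub always_func { return 'always' }
    sub some_func   { return 'some' }
    $INC{'MyPackage.pm'} = __FILE__;
}

use MyPackage;                          # pulls in the @EXPORT list
print always_func(), "\n";              # imported automatically
print MyPackage::some_func(), "\n";     # not imported; full name still works
```

With use MyPackage qw(some_func); instead, only some_func would be imported - supplying an explicit list overrides the @EXPORT defaults.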
You can also define package variables in your local namespace via our and reference them via their fully qualified $main:: names.
our $fish = "haddock";
print $main::fish;
When you do this, you're explicitly referencing the main namespace. When you use a module, you cause perl to go and look for it, and include it in your %INC. It then 'knows about' that namespace - because it must, in order for the dependencies to resolve.
But this isn't namespace pollution, because it doesn't include anything in your namespace until you ask.
This might make a bit more sense if you have multiple packages within the same program:
use strict;
use warnings;
package CC;
our $package_var = "Blong";
sub do_something {
print $package_var,"\n";
}
package main;
use Data::Dumper;
our $package_var = "flonk";
print Dumper $package_var;
print Dumper $CC::package_var;
Each package is its own namespace, but you can 'poke' at things in another. perl will also let you do this with objects - poking at the innards of instantiated objects, or indeed 'patching' them.
That's quite powerful, but I'd generally suggest it's Really Bad Style.

While it's good practice to use or require every dependency that you are planning to access (trying to avoid the ambiguous word 'use' here), you don't have to do that.
As long as you use full package names, that is fine. The important part is that Perl knows about the namespaces. If it does not, it will fail.
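A self-contained sketch of that point, inlining the question's BB package into one script (in real code it lives in BB.pm and arrives indirectly via use AA):

```perl
use strict;
use warnings;

# Stand-in for BB.pm, inlined so this sketch runs as one file;
# in the question it is loaded indirectly via 'use AA;'.
package BB;
sub bfunc { return "Running bfunc..\n" }

package main;
# Nothing was imported into main, so a bare bfunc() would fail,
# but the fully qualified name resolves fine.
print BB::bfunc();
```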
When you use something, that is equivalent to:
BEGIN {
require Foo::Bar;
Foo::Bar->import();
}
The require will take Foo::Bar and convert it to a path according to the operating system's conventions. On Linux, it will try to find Foo/Bar.pm somewhere inside @INC. It will then load that file and make a note in %INC that it loaded the file.
Now Perl knows about that namespace. In the case of use, it might also import something into your own namespace. But the package will always be available from everywhere after that, as long as you use the full name. In just the same way, stuff that you have in your main script.pl is available inside packages by saying main::frobnicate(). (Please don't do that!)
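For example, with a hypothetical frobnicate in the main script (it works, but it couples the package to the calling script):

```perl
use strict;
use warnings;

# A sub living in the main package of the script (hypothetical name).
sub frobnicate { return 'frobbed' }

package Helper;
# Code in another package can reach back into main by full name.
# This works, but it couples Helper to the calling script.
sub call_back { return main::frobnicate() }

package main;
print Helper::call_back(), "\n";
```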
It's also not uncommon to bundle several namespaces/packages in one .pm module file. There are quite a few big names on CPAN that do it, like XML::Twig.
If you do that, and don't import anything, the only way to get to the stuff under the different namespaces is by using the full name.
As you can see, this is not polluting at all.


how to access variables in imported module in local scope in perl?

I am stuck while creating a perl Moose module.
I have a global pm module.
package XYZ;
require Exporter;
our @ISA = qw(Exporter); ## EDIT missed this line
our @EXPORT_OK = qw($VAR);
my $VAR1 = 1;
our $VAR = {'XYZ' => $VAR1};
1;
I want to get $VAR in a Moose module I'm creating
package THIS;
use Moose;
use YAML::XS;
sub get_all_blocks{
my ($self) = @_;
require $self->get_pkg(); # this returns the full path+name of the above package
# i cannot use use lib+use since the get_pkg starts complaining
our $VAR;
print YAML::XS::Dump($XYZ::VAR); # this works
print YAML::XS::Dump($VAR); # this does not work
# i cannot use the scope resolution since XYZ would keep changing.
}
1;
Can someone please help me with accessing the variable?
EDIT: Missed one line in the package XYZ code.
I cannot touch the package XYZ since it is owned/used by someone else, I can just use it :(
Exporting variables may easily lead to trouble.
Why not
package XYZ;
use strict;
use warnings;
use Exporter qw(import);
our @EXPORT_OK = qw(get_var);
my $VAR = '...'; # no need for "our" now
sub get_var { return $VAR }
...
1;
and then
package THIS;
use warnings;
use strict;
use XYZ qw(get_var);
my $var = get_var();
...
1;
See Exporter.
As for what you tried to do, there are two direct problems.
$VAR from XYZ is never imported into THIS. If you need symbols from other packages, you need to import them.† Those packages have to make them available first, so you need to add it to @EXPORT_OK as well.
Like above, but with $VAR instead of get_var():
package XYZ;
...
use Exporter qw(import);
our @EXPORT_OK = qw($VAR);
our $VAR = '...'; # need be "our" for this
with
package THIS;
...
use XYZ qw($VAR);
print "$VAR\n";
Now $VAR can be used directly, including being written to (unless declared constant); that can change its value under the feet of yet other code, which may never even know about any of it.
Another way is to use @EXPORT, and then those symbols are introduced into every program that says use Package;. I strongly recommend using only @EXPORT_OK, where callers need to explicitly list what they want. That also nicely documents what is being used.
Even once you add that, there is still a variable with the same name in THIS, which hides (masks, shadows) the $XYZ::VAR. So remove our $VAR in THIS. This is an excellent example of one problem with globals. Once they're introduced we have to be careful about them always and everywhere.
But there are far greater problems with sharing variables across modules.
It makes code components entangled and the code gets harder and harder to work with. It runs contrary to principles of well defined scopes and modular design, it enables action at a distance, etc. Perl provides many good tools for structuring code and we rarely need globals and shared variables. It is telling that the Exporter itself warns against that.
Note how now my $VAR in XYZ is not visible outside XYZ; there is no way for any code outside XYZ to know about it or to access it.‡ When it is our then any code in the interpreter can write it simply as $XYZ::VAR, and without even importing it; that's what we don't want.
Of course, there may be a genuine need for, or good use of, exporting variables, which can occasionally be found in modules. That is an exception though, to be used sparingly and carefully.
† Unless they're declared as package globals under a lexical alias via our in their package, in which case they can be used anywhere as $TheirPackageName::varname.
‡ This complete privacy is courtesy of my.
You do not want our $VAR; in THIS's namespace. That creates a lexically scoped alias to $THIS::VAR - not what you want.
Instead, you need to use XYZ properly:
use XYZ qw($VAR);
However, XYZ doesn't have an import to run here, so you need to update that. There are two ways to fix XYZ to do this - one is to import import, e.g., use Exporter qw(import);, the other is to derive off Exporter, e.g., use parent qw(Exporter);. Both of these will get XYZ->import(...) to work properly.
Once XYZ is using Exporter correctly, then the use XYZ qw($VAR); line will cause perl to implicitly load XYZ and call XYZ->import(qw($VAR)), which will import that variable into your namespace.
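Putting those fixes together, a single-file sketch; XYZ is inlined and registered in %INC purely so the use line runs without a separate XYZ.pm:

```perl
use strict;
use warnings;

# XYZ is inlined and registered in %INC only so the 'use' line
# below works without a separate XYZ.pm file.
BEGIN {
    package XYZ;
    use Exporter qw(import);    # fix 1: give XYZ a working import()
    our @EXPORT_OK = qw($VAR);  # fix 2: offer $VAR for export
    our $VAR = { XYZ => 1 };    # must be 'our', not 'my'
    $INC{'XYZ.pm'} = __FILE__;
}

package THIS;
use XYZ qw($VAR);           # implies XYZ->import(qw($VAR))
print $VAR->{XYZ}, "\n";    # the imported alias, no XYZ:: prefix needed
```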
Now, having answered your question, I will join others in suggesting that exporting variables is a very bad code smell, and probably is not the best / cleanest way to do what you want.

Avoiding collisions with modules with same method names

I am thinking of using the following perl modules from Cpan:
CSS-Minifier
Javascript-Minifier
I note from the documentation that I need to call the "minify" method for both.
I know I must be missing something obvious as a Perl newbie, but wouldn't they collide?
I suppose the question is how do I specify the "minify" for each module separately so that the CSS module works on CSS only and the JS module works on JS only?
Perl modules include subroutines. Usually, the reason to use a module is to make use of the subroutines from that module. A module will have its own package name and the subroutines from that module will exist in that package. So a simple module might look like this:
package MyModule;
sub my_sub {
print "This is my sub\n";
}
If I load that module in my code, then I have to call the subroutine using its fully qualified name (i.e. including the package name).
use MyModule;
MyModule::my_sub();
That gets repetitive quite quickly, so many modules will export their subroutines into your package. They do that using special arrays called @EXPORT and @EXPORT_OK.
If I put the name of a subroutine into @EXPORT then it automatically gets imported whenever that module is used.
package MyModule;
use Exporter 'import'; # needed for the @EXPORT machinery to work
our @EXPORT = ('my_sub');
sub my_sub {
print "This is my sub\n";
}
I can then use it like this:
use MyModule;
my_sub();
Alternatively, I can use @EXPORT_OK, which defines optional exports. Users can ask for subroutines in @EXPORT_OK to be imported - but it doesn't happen automatically. You ask for an optional export to be imported by including the name in the use statement.
package MyModule;
use Exporter 'import'; # needed for the @EXPORT_OK machinery to work
our @EXPORT_OK = ('my_sub');
sub my_sub {
print "This is my sub\n";
}
I can now do this:
use MyModule ('my_sub');
my_sub();
There's one more trick that might be useful. You can turn off automatic imports with an empty list on the use statement. Assume we have the @EXPORT version of the module:
use MyModule (); # Turn off imports
my_sub(); # Doesn't work
MyModule::my_sub(); # works
Now we have the knowledge we need to look at your specific problem. The simplest solution is to turn off all automatic imports and use the fully-qualified names of both of the subroutines.
use CSS::Minifier ();
use JavaScript::Minifier();
CSS::Minifier::minify();
JavaScript::Minifier::minify();
But we can be a bit cleverer. The documentation for both modules suggests that both exports are optional, so that you would explicitly import the subroutines:
use CSS::Minifier ('minify');
use JavaScript::Minifier('minify');
This would obviously be a bad thing to do, as you can't import two subroutines with the same name!
However, looking at the source code to the modules, I see that the documentation for JavaScript::Minifier is wrong (Edit: I've submitted a report about this error). CSS::Minifier has this line of code:
our @EXPORT_OK = qw(minify);
But JavaScript::Minifier has this:
our @EXPORT = qw(minify);
So the export is automatic from JavaScript::Minifier and optional from CSS::Minifier. So the simplest approach to take would be:
use JavaScript::Minifier; # Automatically imports minify
use CSS::Minifier; # Doesn't import the optional export minify
You could then use the two subroutines like this:
minify(); # Automatic export from JavaScript::Minifier
CSS::Minifier::minify(); # Fully-qualified name from CSS::Minifier
I suspect, however, that this is a bad approach as it's hard to be sure where the unqualified version of minify() comes from. I'd therefore recommend turning off all imports from the two modules and using the fully-qualified names for both subroutines.
To call them separately when both are exported, use:
CSS::Minifier::minify();
or
JavaScript::Minifier::minify();
You can use the fully qualified names to reference subroutines and remove ambiguity. And create shorter names with aliases.
use CSS::Minifier;
use JavaScript::Minifier;
BEGIN {
*minify_css = \&CSS::Minifier::minify;
*minify_js = \&JavaScript::Minifier::minify;
}
minify_css(...);
minify_js(...);

Load perl modules automatically during runtime in Perl

Is there a way to load entire modules during runtime in Perl? I had thought I found a good solution with autouse but the following bit of code fails compilation:
package tryAutouse2;
use autouse 'tryAutouse';
my $obj = tryAutouse->new();
I imagine this is because autouse is specifically meant to be used with exported functions, am I correct? Since this fails compilation, is it impossible to have a packaged solution? Am I forced to require before each new module invocation if I want dynamic loading?
The reasoning behind this is that my team loads many modules, but we're afraid this is eating up memory.
You want Class::Autouse or ClassLoader.
Due to too much magic, I use ClassLoader only in my REPL for convenience. For serious code, I always load classes explicitly. Jack Maney points out in a comment that Module::Load and Module::Load::Conditional are suitable for delayed loading.
There's nothing wrong with require IMO. Skip the export of the function and just call the fully qualified name:
require Some::Module;
Some::Module::some_function(@some_arguments);
eval 'use tryAutouse; 1;' or die $@;
Will work. But you might want to hide the ugliness.
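For example, a hypothetical load_class helper (the name is made up for this sketch) that converts a package name to a file path for require:

```perl
use strict;
use warnings;

# Hypothetical helper: a runtime equivalent of 'use Some::Module ()'.
sub load_class {
    my ($class) = @_;
    (my $file = $class) =~ s{::}{/}g;  # Foo::Bar -> Foo/Bar
    require "$file.pm";                # searches @INC, dies if not found
    return $class;
}

# Usage with a core module, calling it by its full name afterwards.
load_class('File::Spec');
print File::Spec->catfile('dir', 'file.txt'), "\n";
```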
When you say:
use Foo::Bar;
You're loading module Foo::Bar in at compile time. Thus, if you want to load your module in at run time, you'd use require:
require Foo::Bar;
They are sort of equivalent, but there are differences. See the Perldoc on use to understand the complete difference. For example, require used in this way won't automatically load in imported functions. That might be important to you.
If you want to test whether a module is there or not, wrap up your require statement in an eval and test whether or not eval is successful.
I use a similar technique to see if a particular Perl module is available:
eval { require Mail::Sendmail; };
if ($@) {
$watch->_Send_Email_Net_SMTP($watcher);
return;
}
In the above, I'll attempt to use Mail::Sendmail which is an optional module if it's available. If not, I'll run another routine that uses Net::SMTP:
sub _Send_Email_Net_SMTP {
my $self = shift;
my $watcher = shift;
require Net::SMTP; # Standard module: it should be available
...
}
WORD O'WARNING: You need to use curly braces around your eval statement and not parentheses. Otherwise, if the require doesn't work, your program will exit, which is probably not what you want.
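The same pattern in a self-contained nutshell; No::Such::Module is deliberately a module that should not exist on your system:

```perl
use strict;
use warnings;

# Probe for an optional dependency; the eval BLOCK catches the die
# thrown by require. No::Such::Module deliberately does not exist.
my $have_optional = eval { require No::Such::Module; 1 };

if ($have_optional) {
    print "using the optional module\n";
}
else {
    # $@ still holds the "Can't locate ..." message here.
    print "falling back to the built-in implementation\n";
}
```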
The use instruction is performed at compile time, so the check of the module's path also takes place at compile time. This may cause incorrect behavior that is difficult to understand until you consider the contents of the @INC array.
One solution is to add a BEGIN block, but the solution shown below is inelegant.
BEGIN { unshift @INC, '/path/to/module/'; }
use My::Module;
You can replace the whole mess with a simple directive:
use lib '/path/to/module';
use My::Module;
This works because use lib is also performed at compile time, so everything is ready when the use My::Module instruction executes.
Instead of the BEGIN block, you can also use other instructions that execute at compile time, e.g. declaring a constant.
use constant LIB_DIR => '/path/to/module';
use lib LIB_DIR;
use My::Module;

How should I organize many Perl modules?

Consider that I have 100 Perl modules in 12 directories, and the main Perl script is a wall of 100 use statements: use p1; use p2; etc. What is the best way to solve this issue?
It seems unlikely to me that you're actually using all 100 modules directly in your main program. If your program uses a function in module A, which then calls a function from module B, but the main program itself doesn't reference anything in module B, then the program should only use A. It should not use B unless it directly calls something from module B.
If, on the other hand, your main program really does talk directly to all 100 modules, then it's probably just plain too big. Identify different functional groupings within the program and break each of those groups out into its own module. The main reason for doing this is so that it will result in code that is more maintainable, flexible, and reusable, but it will also have the happy side-effect of reducing the number of modules that the main program talks to directly, thus cutting down on the number of use statements required in any one place.
(And, yes, I do realize that 100 was probably an exaggeration, but, if you're getting uncomfortable about the number of modules being used by your code, then that's usually a strong indication that the code in question is trying to do too much in one place and should be broken down into a collection of modules.)
Put all the use statements in one file, say Mods.pm:
package Mods;
use Mod1;
use Mod2;
...
and include the file in your main script:
use Mods;
I support eugene's solution, but you could group the use statements in files by topic, like:
package Math;
use ModMatrix;
use ModFourier;
...
And of course you should name the modules and the mod-collections meaningful.
Putting all of the use statements in a separate file as eugene y suggested is probably the best approach. You can minimize the typing in that module with a bit of metaprogramming:
package Mods;
require Exporter;
our @ISA = 'Exporter';
my @packages = qw/Mod1 Mod2 Mod3 .... /;
# or map {"Mod$_"} 1 .. 100 if your modules are actually named that way
for (@packages) {
eval "require $_" or die $@; # 'use' means "require pkg; pkg->import()"
$_->import(); # at compile time
}
our @EXPORT = grep {*{$Mods::{$_}}{CODE}} keys %Mods::; # grab imported subs
# or @EXPORT_OK

Is it mandatory that a folder by the name of a package should be present for creating a package?

We are factoring out the common code from our Perl project. One main program should be split into several re-usable modules.
Our program name is validate_results.pl which contains set of validation commands. We are planning to split this into small modules so that validate_results.pl should be like:
use Common::Validate_Results;
use Common::Validate_Results::CommonCommands;
use Common::Validate_Results::ReturnCodeValidation;
...
As per my understanding, I should create a Common folder and under that Validate_Results.pm should be present. Again under Common, a Validate_Results folder should be created, and under that CommonCommands and ReturnCodeValidation should be present.
Is it mandatory that all these folders should be present or can we have all the Perl programs in a single folder and logically group them and still use the above way to access the modules (say use common::validate_results like that).
The filesystem hierarchy is required. A::B::C will always be located in A/B/C.pm, somewhere in @INC.
If you have to get around this, read perldoc -f require, specifically looking for the section about subroutine references in #INC. Yes, you can make the module loader do weird things if that's what you really want; but that's not what you want, trust me. Just stick to the convention, like the other 99.9999999% of Perl applications do.
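For the curious, here is a minimal sketch of such an @INC hook; Virtual::Mod is a made-up module served from a string instead of from disk:

```perl
use strict;
use warnings;

# A code reference placed in @INC is offered every module file name
# being loaded; this one serves a single made-up module from a string.
BEGIN {
    unshift @INC, sub {
        my ($coderef, $filename) = @_;
        return unless $filename eq 'Virtual/Mod.pm';
        my $source = "package Virtual::Mod; sub hi { 'hi' } 1;\n";
        open my $fh, '<', \$source or die $!;
        return $fh;    # perl reads the module's source from this handle
    };
}

use Virtual::Mod;               # found via the hook, not on disk
print Virtual::Mod::hi(), "\n";
```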
If you want to 'use' your modules, then you must conform to the structure. If you want to get around that you can 'require' your modules instead, passing the filename to require.
You really shouldn't do this, though. If you truly don't want to have a directory structure, take it out of the module names (though that can lead to problems in the future if you ever have a module name that conflicts with something more generic from CPAN). Simply add the script's directory to the @INC path via FindBin and use the modules directly:
use FindBin;
use lib $FindBin::Bin;
use ValidateResults;
use CommonCommands;
use ReturnCodeValidation;
HTH
Here's an example of a module and its sub-modules in the same file:
package Foo;
use strict;
use Exporter 'import';
our #EXPORT = ( 'from_foo' );
sub from_foo { print "from_foo\n"; }
package Foo::Bar;
use strict;
use Exporter 'import';
our #EXPORT = ( 'from_foo_bar' );
sub from_foo_bar { print "from_foo_bar\n"; }
1;
In your program, if you use module Foo (the one with a .pm file):
use Foo;
You will have access to Foo::Bar functions, but only via their fully qualified names (Foo::Bar::from_foo_bar). You can import them like this:
use Foo;
Foo::Bar->import;
Note that you can't do this:
use Foo::Bar;
Because there is no file Foo/Bar.pm.
The package name in a 'use' command is effectively just a path which ends with a .pm file, so you don't need a folder with the name of every package. In your example, you need folders:
Common
Common/Validate_Results
But you don't need folders:
Common/Validate_Results/CommonCommands
Common/Validate_Results/ReturnCodeValidation
The actual package name in the .pm file does not have to be the same as the name in the 'use' command that loads it. But keeping the paths consistent with the package names is always a good idea.