How can I pass common arguments to Perl modules?

I'm not thrilled with the argument-passing architecture I'm evolving for the (many) Perl scripts that call various Hadoop MapReduce jobs.
There are currently 8 scripts (of the form run_something.pl) that are run from cron. (And more are on the way ... we expect anywhere from 1 to 3 more for every function we add to Hadoop.) Each of these has about 6 identical command-line parameters, and a couple of command-line parameters that are similar, all specified with Euclid.
The implementations are in a dozen .pm modules. Some of which are common, and others of which are unique....
Currently I'm passing the args globally to each module ...
Inside run_something.pl I have:
set_common_args (%ARGV);
set_something_args (%ARGV);
And inside Something.pm I have
sub set_something_args {
(%MYARGS) = @_;
}
So then I can do
if ( $MYARGS{'--needs_more_beer'} ) {
$beer++;
}
I'm seeing that I'm probably going to have additional "common" files that I'll want to pass args to, so I'll have three or four set_xxx_args calls at the top of each run_something.pl, and it just doesn't seem too elegant.
On the other hand, it beats passing the whole stupid argument array down the call chain, and choosing and passing individual elements down the call chain is (a) too much work (b) error-prone (c) doesn't buy much.
In lots of ways what I'm doing is just object-oriented design without the object-oriented language trappings, and it looks uglier without said trappings, but nonetheless ...
Anyone have thoughts or ideas?

In the same vein as Pedro's answer, but upgraded to use Moose and MooseX::Getopt, I present the SO community with... a Moose modulino*: a Moose module that can be included and run normally as a module, or separately as a command-line utility:
# this is all in one file, MyApp/Module/Foo.pm:
package MyApp::Module::Foo;
use Moose;
with 'MooseX::Getopt'; # apply the role that provides new_with_options()
has [ qw(my config args here) ] => (
is => 'ro', isa => 'Int',
);
sub run { ... }
package main;
use strict;
use warnings;
sub run
{
my $module = MyApp::Module::Foo->new_with_options();
$module->run();
}
run() unless caller();
The module can be invoked using:
perl MyApp/Module/Foo.pm --my 0 --config 1 --args 2 --here 3
Using this pattern, you can collect command-line arguments using one module, which is used by all other modules and scripts that share the same options, and use standard Moose accessor methods for retrieving those options.
*modulinos are modules that can also be run as stand-alone scripts -- a Perl design pattern by SO's own brian d foy.

Have a look at import in Getopt::Long. You pass arguments to your module through use Module qw/.../ and grab those via the import subroutine.
# Override import.
sub import {
my $pkg = shift; # package
my @syms = (); # symbols to import
my @config = (); # configuration
my $dest = \@syms; # symbols first
for ( @_ ) {
if ( $_ eq ':config' ) {
$dest = \@config; # config next
next;
}
push(@$dest, $_); # push
}
# Hide one level and call super.
local $Exporter::ExportLevel = 1;
push(@syms, qw(&GetOptions)) if @syms; # always export GetOptions
$pkg->SUPER::import(@syms);
# And configure.
Configure(@config) if @config;
}
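To connect this to the original question, here is a minimal sketch of a module that collects shared settings through its own import. The package name My::Common, the config() accessor, and the key/value convention are assumptions made for illustration; they are not part of Getopt::Long. Everything is kept in one file, so import is called explicitly where a real script would write use My::Common needs_more_beer => 1, verbose => 0;

```perl
use strict;
use warnings;

package My::Common;    # hypothetical module name

my %config;            # shared settings, filled in by import()

# Called with the argument list from the "use My::Common ..." line.
sub import {
    my ($class, @args) = @_;
    # Assumption: arguments are key/value pairs.
    %config = (%config, @args);
    return;
}

# Read-only access for any other module that needs the settings.
sub config { return \%config }

package main;

# Equivalent to: use My::Common needs_more_beer => 1, verbose => 0;
My::Common->import(needs_more_beer => 1, verbose => 0);

my $beer = 0;
$beer++ if My::Common::config()->{needs_more_beer};
print "beer: $beer\n";
```

Other modules can then call My::Common::config() instead of each needing its own set_xxx_args call.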

Related

Anyway to tell if perl script is run via do?

I have a small script configuration file which is loaded from a main script.
#main.pl
package MYPACKAGE;
our $isMaster=1;
package main;
my $config = do "./config.pl";
#config.pl
my $runViaDoFlag;
$runViaDoFlag=$0=~/main\.pl/; #Test if main.pl is the executing script
$runViaDoFlag=defined $MYPACKAGE::isMaster; #Test custom package variable
#Is there a 'built-in' way to do this?
die "Need to run from main script! " unless $runViaDoFlag;
{
options=>{
option1=>"some value",
option2=>"some value",
},
mySub=>sub {
# do interesting things here
}
}
In a more complicated config file it might not be so obvious that config.pl script is intended to only be executed by do. Hence I want to include a die with basic usage instructions.
Solutions:
test $0 for the main script name
have custom package variable defined in the main script and checked by the config script
simply have a comment in the config instructing the user how to use it.
These work, however is there some way of knowing if a script is executed via do via built-in variable/subs?
I'd offer a change in design: have that configuration in a normal module, in which you can then test whether it's been loaded by (out of) the main:: namespace or not. Then there is no need for any of that acrobatics with control variables etc.
One way to do that
use warnings;
use strict;
use feature 'say';
use FindBin qw($RealBin);
use lib $RealBin; # so to be able to 'use' from current directory
use ConfigPackage qw(load_config);
my $config = load_config();
# ...
and the ConfigPackage.pm (in the same directory)
package ConfigPackage;
use warnings;
use strict;
use feature 'say';
use Carp;
use Exporter qw(); # want our own import
our @EXPORT_OK = qw(load_config);
sub import {
#say "Loaded by: ", (caller)[0];
croak "Must only be loaded from 'main::'"
if not ( (caller)[0] eq 'main' );
# Now switch to Exporter::import to export symbols as needed
goto &Exporter::import;
}
sub load_config {
# ...
return 'Some config-related data structure';
}
1;
(Note that this use of goto is fine.)
This is just a sketch of course; adjust, develop further, and amend as needed. If this is loaded out of a package other than main::, and so it fails, then that happens in the compile phase, since that's when import is called. I'd consider that a good thing.
If that config code need be able to run as well, as the question may indicate, then have a separate executable that loads this module and runs what need be run.
As for the question as stated, the title and the question's (apparent) quest differ a little, but both can be treated by using caller EXPR. It won't be a clean little "built-in" invocation though.
The thing about do as intended to be used is that
do './stat.pl' is largely like
eval `cat stat.pl`;
except that ...
(That stat.pl is introduced earlier in docs, merely to signify that do is invoked on a file.)
Then caller(0) will have clear hints to offer (see docs). It returns
my ($package, $filename, $line, $subroutine, $hasargs,
$wantarray, $evaltext, $is_require, $hints, $bitmask, $hinthash)
= caller($i);
In a call asked for, do './config.pl', apart from main (package) and the correct filename, the caller(0) in config.pl also returns:
(eval) for $subroutine
./config.pl for $evaltext
1 for $is_require
Altogether this gives plenty to decide whether the call was made as required.
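As a rough illustration of that check, the sketch below writes a small config file to a temporary location and loads it with do. The file contents, the loaded_as key, and the temporary path are made up for the demonstration, and the guard relies only on the '(eval)' subroutine name described above. Note that this test alone would also accept a file loaded with require.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Contents of a stand-in config file. The guard at its top inspects
# caller(0): run directly there is no calling frame at all, while
# under do FILE the frame's subroutine is '(eval)'.
my $config_source = <<'END_CONFIG';
my @frame = caller(0);
die "Load this file with: my \$config = do './config.pl';\n"
    unless @frame && $frame[3] eq '(eval)';
+{ option1 => 'some value', loaded_as => $frame[6] };
END_CONFIG

my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} $config_source;
close $fh or die "close failed: $!";

# do FILE returns the value of the last statement in the file.
my $config = do $path;
die "could not load $path: ", ($@ || $!) unless $config;

print "option1 = $config->{option1}\n";
```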
However, I would not recommend this kind of involved analysis instead of just using a package, which is also incomparably more flexible.

Is there a tool to check a Perl script for unnecessary use statements?

For Python, there is a script called importchecker which tells you if you have unnecessary import statements.
Is there a similar utility for Perl use (and require) statements?
Take a look at Devel::TraceUse; it might give you a chunk of what you're looking for.
Here is a script I wrote to attempt this. It is very simplistic and will not automate anything for you but it will give you something to start with.
#!/usr/bin/perl
use strict;
use v5.14;
use PPI::Document;
use PPI::Dumper;
use PPI::Find;
use Data::Dumper;
my %import;
my $doc = PPI::Document->new($ARGV[0]);
my $use = $doc->find( sub { $_[1]->isa('PPI::Statement::Include') } );
foreach my $u (@$use) {
my $node = $u->find_first('PPI::Token::QuoteLike::Words');
next unless $node;
$import{$u->module} //= [];
push @{ $import{$u->module} }, $node->literal; # explicit deref; push on a bare reference no longer works in newer Perls
}
my $words = $doc->find( sub { $_[1]->isa('PPI::Token::Word') } );
my @words = map { $_->content } @$words;
my %words;
@words{ @words } = 1;
foreach my $u (keys %import) {
say $u;
foreach my $w (@{$import{$u}}) {
if (exists $words{$w}) {
say "\t- Found $w";
}
else {
say "\t- Can't find $w";
}
}
}
There are a number of ways to load packages and import symbols (or not). I am not aware of a tool which single-handedly and directly checks whether those symbols are used or not.
But for cases where an explicit import list is given,
use Module qw(func1 func2 ...);
there is a Perl::Critic policy TooMuchCode::ProhibitUnusedImport that helps with much of that.
One runs on the command line
perlcritic --single-policy TooMuchCode::ProhibitUnusedImport program.pl
and the program is checked. Or run without --single-policy flag for a complete check and seek Severity 1 violations in the output, which this is.
For an example, consider a program
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # a class; but it imports 'path'
use Data::Dumper; # imports 'Dumper'
use Data::Dump qw(dd pp); # imports 'dd' and 'pp'
use Cwd qw(cwd); # imports only 'cwd'
use Carp qw(carp verbose); # imports 'carp'; 'verbose' isn't a symbol
use Term::ANSIColor qw(:constants); # imports a lot of symbols
sub a_func {
say "\tSome data: ", pp [ 2..5 ];
carp "\tA warning";
}
say "Current working directory: ", cwd;
a_func();
Running the above perlcritic command prints
Unused import: dd at line 7, column 5. A token is imported but not used in the same code. (Severity: 1)
Unused import: verbose at line 9, column 5. A token is imported but not used in the same code. (Severity: 1)
We got dd caught, while pp from the same package isn't flagged since it's used (in the sub), and neither are carp and cwd which are also used; as it should be, out of what the policy aims for.
But note
whatever comes with :constants tag isn't found
word verbose, which isn't a function (and is used implicitly), is reported as unused
if a_func() isn't called then those pp and carp in it are still not reported even though they are then unused. This may be OK, since they are present in code, but it is worth noting
(This glitch-list is likely not exhaustive.)
Recall that the import list is passed to an import sub, which may expect and make use of whatever the module's design deemed worthy; these need not be only function names. It is apparently beyond this policy to follow up on all that. Still, loading modules with the explicit import list with function names is good practice and what this policy does cover is an important use case.
Also, per the clearly stated policy's usage, the Dumper (imported by Data::Dumper) isn't found, nor is path from Path::Tiny. The policy does deal with some curious Moose tricks.
How does one do more? One useful tool is Devel::Symdump, which harvests the symbol tables. It catches all symbols in the above program that have been imported (no Path::Tiny methods can be seen if used, of course). The non-existing "symbol" verbose is included as well though. Add
use Devel::Symdump;
my $syms = Devel::Symdump->new;
say for $syms->functions;
to the above example. To also deal with (runtime) require-ed libraries we have to do this at a place in code after they have been loaded, which can be anywhere in the program. Then best do it in an END block, like
END {
my $ds = Devel::Symdump->new;
say for $ds->functions;
};
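Where installing Devel::Symdump is not an option, a rough core-only approximation is to walk the package's symbol table and ask the B module which package each sub was compiled in. This is a swapped-in technique, not what Devel::Symdump does internally, and the helper name imported_functions is hypothetical. It reports only subs visible in the given package that originate elsewhere.

```perl
use strict;
use warnings;
use B ();
use Cwd qw(cwd);
use Carp qw(carp);

sub a_func { return cwd() }    # a locally defined sub, for contrast

# Return the names of subs visible in $pkg that were compiled
# in some other package, i.e. imported ones.
sub imported_functions {
    my ($pkg) = @_;
    no strict 'refs';
    my @found;
    for my $name (sort keys %{"${pkg}::"}) {
        next unless defined &{"${pkg}::$name"};
        # The sub's original glob records the package it came from.
        my $home = B::svref_2object(\&{"${pkg}::$name"})->GV->STASH->NAME;
        push @found, $name if $home ne $pkg;
    }
    return @found;
}

my @imported = imported_functions('main');
print "$_\n" for @imported;
```

For this program the list should contain carp and cwd but not a_func or imported_functions, since those are defined in main itself.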
Then we need to check which of these are unused. At this time the best tool I'm aware of for that job is PPI; see a complete example. Another option is to use a profiler, like Devel::NYTProf.
Another option, which requires some legwork†, is the compiler's backend B::Xref, which gets practically everything that is used in the program. It is used as
perl -MO=Xref,-oreport_Xref.txt find_unused.pl
and the (voluminous) output is in the file report_Xref.txt.
The output has sections for each involved file, which have subsections for subroutines and their packages. The last section of the output is directly useful for the present purpose.
For the example program used above I get the output file like
File /.../perl5/lib/perl5//Data/Dump.pm
...
(some 3,000 lines)
...
File find_unused.pl --> there we go, this program's file
Subroutine (definitions)
... dozens of lines ...
Subroutine (main)
Package main
&a_func &43
&cwd &27
Subroutine a_func
Package ?
@?? 14
Package main
&carp &15
&pp &14
So we see that cwd gets called (on line 27) and that carp and pp are also called in the sub a_func. Thus dd and path are unused (out of all imported symbols found otherwise, by Devel::Symdump for example). This is easy to parse.
However, while path is reported when used, if one uses new instead (also in Path::Tiny as a traditional constructor) then that isn't reported in this last section, nor are other methods.
So in principle† this is one way to find which of the symbols (for functions) reported to exist by Devel::Symdump have been used in the program.
† The example here is simple and easy to process but I have no idea how complete, or hard to parse, this is when all kinds of weird ways for using imported subs are taken into account.

Where should I put common utility functions for Perl .t tests?

I am getting started with Test::More, already have a few .t test scripts. Now I'd like to define a function that will only be used for the tests, but across different .t files. Where's the best place to put such a function? Define another .t without any tests and require it where needed? (As a sidenote I use the module structure created by Module::Starter)
The best approach is to put your test functions, like any other set of functions, into a module. You can then use Test::Builder to have your test diagnostics/fail messages act as if the failure originated from the .t file, rather than your module.
Here is a simple example.
package Test::YourModule;
use Test::Builder;
use Sub::Exporter -setup => { exports => ['exitcode_ok'] }; # or 'use Exporter' etc.
my $Test = Test::Builder->new;
# Runs the command and makes sure its exit code is $expected_code. Contrived!
sub exitcode_ok {
my ($command, $expected_code, $name) = @_;
system($command);
my $exit = $? >> 8;
my $message = $!;
my $ok = $Test->is_num( $exit, $expected_code, $name );
if ( !$ok ) {
$Test->diag("$command exited incorrectly with the error '$message'");
}
return $ok;
}
1; # a module file must return a true value
In your script:
use Test::More tests => 1;
use Test::YourModule qw(exitcode_ok);
exitcode_ok('date', 0, 'date exits without errors');
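For setups without Sub::Exporter, the 'use Exporter' alternative mentioned in the comment might look like the sketch below. It keeps the module and the test in one file purely for demonstration, so import is called explicitly where a real .t file would say use Test::YourModule qw(exitcode_ok). The command run here (the current perl with -e 1) is just a portable stand-in for date.

```perl
use strict;
use warnings;

package Test::YourModule;    # would normally live in its own .pm file

use Test::Builder;
use Exporter 'import';       # core alternative to Sub::Exporter
our @EXPORT_OK = qw(exitcode_ok);

my $Test = Test::Builder->new;

# Runs the command and checks that its exit code is $expected_code.
sub exitcode_ok {
    my ($command, $expected_code, $name) = @_;
    # Report failures at the caller's line, not at this helper's.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    system($command);
    my $exit = $? >> 8;
    return $Test->is_num($exit, $expected_code, $name);
}

package main;

use Test::More tests => 1;

# Equivalent to: use Test::YourModule qw(exitcode_ok);
Test::YourModule->import('exitcode_ok');

exitcode_ok("$^X -e 1", 0, 'perl exits without errors');
```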
Write a module as rjh has demonstrated. Put it in t/lib/Test/YourThing.pm, then it can be loaded as:
use lib 't/lib';
use Test::YourThing;
Or you can put it straight in t/Test/YourThing.pm, call it package t::Test::YourThing and load it as:
use t::Test::YourThing;
The upside is not having to write the use lib line in every test file, and clearly identifying it as a local test module. The downside is that it clutters up t/, it won't work if "." is not in @INC (for example, if you run your tests in taint mode, though that can be worked around with use lib "."), and if you decide to move the .pm file out of your project you have to rewrite all the uses. Your choice.

How do I rename an exported function in Perl?

I have some Perl modules which export various functions. (We haven't used @EXPORT in new modules for some years, but have retained it for compatibility with old scripts.)
I have renamed a number of functions and methods to change to a consistent naming policy, and thought that then adding a list of lines like
*directory_error = *directoryError;
at the end of the module would just alias the old name to the new.
This works, except when the old name is exported, and a calling script calls the function with an unqualified name: in this case, it reports that the subroutine is not found (in the calling module).
I guess that what is happening is that Exporter prepares the list in a BEGIN, when the alias has not been created; but I tried putting the typeglob assignment in a BEGIN block and that didn't help.
I've tried AUTOLOAD, but of course that does not make the name available in the calling context. Of course I could write a series of wrapper functions, but that is tedious. It's possible I could generate wrapper functions automatically, though I'm not sure how.
Any suggestions of a neat way of handling this?
Exporting
Manually managing @EXPORT = () lists is getting a bit haggard.
package Bar;
use strict;
use warnings;
use Sub::Exporter -setup => {
exports => [qw[ foo ]],
groups => {
default => [qw[ foo ]],
}
};
sub foo(){
};
1;
Use:
use strict;
use warnings;
use Bar foo => { -as => 'Foo' };
Sub::Exporter can do lots of awesome stuff, like group exports, group exclusion, and builder methods (i.e., how the exported subs work is determined by passed parameters, and the subs are generated inside other subs, etc.).
Sub::Exporter tutorial
Sub::Exporter
Renaming
For renaming things it might be better to keep a secondary function that stands as a legacy function and carp()s when it's called, recommending that the code calling it be moved to the new method. This will increase consistency codewide.
Then when your tests stop spouting forth warnings, you can remove the legacy function.
sub old { # line 1
Carp::carp('Legacy function \'old\' called, please move to \'newmethod\' ');
goto &newmethod; # this passes @_ literally and hides itself from the stack trace.
} # line 4
sub newmethod { # line 6
Carp::cluck('In New Method');
return 5;
} # line 9
print old(), "\n"; # line 11
Legacy function 'old' called, please move to 'newmethod' at code.pl line 2
main::old() called at code.pl line 11
In New Method at code.pl line 7
main::newmethod() called at code.pl line 11
5
Note how the warnings in newmethod look exactly like they'd been called directly.
The following works for me. This seems to be what you're describing; you must have made a mistake somewhere.
Main script:
use strict;
use warnings;
use Bar;
baz();
Module:
package Bar;
use strict;
use warnings;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(baz);
sub Baz { print "Baz() here\n" }
*baz = *Baz;
1;
You must export both names if you want both names to be visible. Using Michael Carman's answer as a base, you need
our @EXPORT = qw(Baz baz);
or
our @EXPORT = qw(Baz);
our @EXPORT_OK = qw(baz);
if you want to be able to call either one in the program. Just because they point to the same coderef does not mean all names for that coderef will be exported when one is.
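Applied to the names from the question, a minimal single-file sketch could look like this. The function body and its message are invented for illustration. Because everything sits in one file, Bar->import is called at run time; with Bar in its own .pm file, use Bar; would do the same at compile time.

```perl
use strict;
use warnings;

package Bar;

use Exporter 'import';

# Export both the new name and the legacy alias.
our @EXPORT = qw(directory_error directoryError);

sub directory_error { return "directory error: @_" }

# The old name is simply another name for the same sub.
{
    no warnings 'once';
    *directoryError = \&directory_error;
}

package main;

# Equivalent to: use Bar;
Bar->import;

print directory_error('/tmp'), "\n";    # new name
print directoryError('/tmp'), "\n";     # old name, unqualified
```

Both calls reach the same code; leaving directoryError out of @EXPORT reproduces the "subroutine not found" error from the question.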

How can I get Perl to give a warning message when a certain package/tag is imported?

I have a package that I just made and I have an "old-mode" that basically makes it work like it worked before: importing everything into the current namespace. One of the nice things about having this as a package is that we no longer have to do that. Anyway, what I would like to do is have it so that whenever anyone does:
use Foo qw(:oldmode);
I throw a warning that this is deprecated and that they should either import only what they need or just access functions with Foo->fun();
Any ideas on how to do this?
You write your own sub import in package Foo that will get called with the parameter list from use Foo.
An example:
package Foo;
use Exporter;
sub import {
warn "called with parameters '@_'";
# do the real import work
goto &{Exporter->can('import')};
}
So in sub import you can search the argument list for the deprecated tag, and then throw a warning.
Update: As Axeman points out, you should call goto &{Exporter->can('import')}. This form of goto replaces the current subroutine call on the stack, preserving the current arguments (if any). That's needed because Exporter's import() method will export to its caller's namespace.
Well, as you specifically state that you want to alarm in the cases of use Mod qw<:oldmode>; This works better:
package Foo;
use base qw<Exporter>;
use Carp qw<carp>;
...
sub import {
#if ( grep { $_ eq ':oldmode' } @_ ) { # Perl 5.8
if ( @_ ~~ ':oldmode' ) { # Perl 5.10
carp( 'import called with :oldmode!' );
}
goto &{Exporter->can( 'import' )};
}
Thanks to Frew, for mentioning the Perl 5.10 smart match syntax. I'm learning all the ways to work Perl 5.10 into my code.
Note: the standard way to use exporter in an import sub is to either manipulate $Exporter::ExportLevel or to call Foo->export_to_level( 1, @_ ); But I like the way above. It's quicker and, I think, simpler.
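For comparison, here is a minimal sketch of the export_to_level variant, using grep instead of smart match so it does not depend on that operator. The fun sub, the warning text, and the contents of the :oldmode tag are assumptions for the demonstration. The module and caller share one file, so import is invoked explicitly in place of use Foo qw(:oldmode);

```perl
use strict;
use warnings;

package Foo;

use Exporter ();
our @ISA         = ('Exporter');    # export_to_level is inherited
our @EXPORT_OK   = qw(fun);
our %EXPORT_TAGS = ( oldmode => [ qw(fun) ] );

sub fun { return 'fun called' }

sub import {
    my ($class, @args) = @_;
    warn "':oldmode' is deprecated; import only what you need\n"
        if grep { $_ eq ':oldmode' } @args;
    # Level 1 means "export into whoever called this import".
    $class->export_to_level(1, $class, @args);
}

package main;

# Collect warnings so the demonstration can count them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Equivalent to: use Foo qw(:oldmode);
Foo->import(':oldmode');

print fun(), "\n";
print "warnings seen: ", scalar(@warnings), "\n";
```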