Defining constants for a number of scripts and modules in Perl

I am facing the following problem:
I am working on a perl project consisting of a number of modules and scripts. The project must run on two different machines.
Throughout the project I call external programs, but their paths differ between the two machines, so I would like to define them once, globally for all files, and only change that definition when I switch machines.
Since I am fairly new to Perl: what would be a common way to accomplish this?
Should I use "use define" or global variables or something else?
Thanks in advance!

If I were you, I'd definitely do my best to avoid global variables - they are a sign of weak coding style (in any language) and lead to maintenance headaches.
Instead, you could create and use configuration files - one for each of your machines. Since you're using Perl, you have plenty of free, ready-to-use CPAN modules to choose from:
Config::Auto
Config::JSON
Config::YAML
And many, many others.
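For example, a minimal sketch with Config::JSON - the config file path and the key name are made up here, and any of the modules above would do the same job (check the module's docs for the exact API):
use Config::JSON;
# one config file per machine, e.g. /etc/myproject/machine.cfg (illustrative path)
my $config     = Config::JSON->new('/etc/myproject/machine.cfg');
my $blast_path = $config->get('blast_path');   # made-up key holding an external program path
system($blast_path, '--version') == 0 or die "could not run $blast_path: $?";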

Rather than defining globals which may or may not work, why not use a subroutine to find a working executable?
my $program = program_finder();
sub program_finder {
    -x && return $_ for qw( /bin/perl /usr/bin/perl /usr/local/bin/perl );
    die "Could not find a perl executable";
}
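The same idea adapts directly to the original question's external programs. A sketch, with an invented program name and candidate paths:
# pick whichever candidate path exists and is executable on the current machine
my $blast = first_executable(qw( /opt/bio/bin/blastn /usr/local/bin/blastn ));

sub first_executable {
    for my $candidate (@_) {
        return $candidate if -x $candidate;
    }
    die "None of [@_] is executable on this machine";
}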

Create a module to hold your configuration information.
In file My/Config.pm in your perl library path:
package My::Config;
use warnings;
use strict;
use Carp ();
my %setup = (
    one => {path => '/some/path'},
    two => {path => '/other/path'},
);
my $config = $setup{ $ENV{MYCONFIG} }
    or Carp::croak "environment variable MYCONFIG must be set to one of: "
                   . (join ' ' => keys %setup) . "\n";

# Turn each key of the selected config into a class method, e.g. My::Config->path
sub AUTOLOAD {
    my ($key) = our $AUTOLOAD =~ /([^:]+)$/;
    exists $$config{$key} or Carp::croak "no config for '$key'";
    $$config{$key}
}
And then in your files:
use My::Config;
my $path = My::Config->path;
And of course on your machines, set the environment variable MYCONFIG to one of the keys in %setup.
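A script then stays identical on both machines; only MYCONFIG differs. A sketch of the calling side (the external tool name is made up):
#!/usr/bin/perl
use strict;
use warnings;
use My::Config;

# 'path' resolves to whichever %setup entry MYCONFIG selects on this machine
my $tool = My::Config->path . '/some_tool';
system($tool, '--version') == 0 or die "could not run $tool: $?";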

Related

Perl "do", with relative path beginning with "." or ".."

I'm trying to use Perl's do EXPR function as a poor man's config parser, using a second .pl file that just returns a list as configuration information. (I think this is probably the ideal use for do, not least because I can write "do or die" in my code.) Here's an example:
main.pl
# Go read the config file
my %config = do './config.pl';
# do something with it
$web_object->login($config{username}, $config{password});
config.pl
# Configuration file for main script
(
    username       => "username",
    password       => "none_of_your_business",
    favorite_color => "0x0000FF",
);
Reading the perldoc for do gives a lot of helpful advice about relative paths - searching @INC and modifying %INC, special warnings about 5.26 not searching "." any more, etc. But it also has these bits:
# load the exact specified file (./ and ../ special-cased)...
Using do with a relative path (except for ./ and ../), like...
And then it never actually bothers to explain the Special Case path handling for "./" or "../" - an important omission!
So my question(s) are all variations on "what really happens when you do './file.pl';"? For instance...
Does this syntax still work in 5.26, though CWD is removed from @INC?
From whose perspective is "./" anyway: the Perl binary, the Perl script executed, CWD from the user's shell, or something else?
Are there security risks to be aware of?
Is this better or worse than modifying @INC and just using a base filename?
Any insight is appreciated.
OK, so - to start with, I'm not sure your config.pl is really the right approach - it's not perl for starters, because it doesn't compile. Either way though, trying to evaluate stuff to 'parse config' isn't a great plan generally - it's rather prone to unpleasant glitches and security flaws, so should be reserved for when it's needed.
I would urge you to do it differently by either:
Write it as a module
Something like this:
package MyConfig;
# Configuration file for main script
our %config = (
username => "username",
password => "none_of_your_business",
favorite_color => "0x0000FF",
);
You could then in your main script:
use MyConfig; # note - the file needs to be named MyConfig.pm, and be in @INC
and access it as:
print $MyConfig::config{username},"\n";
If you can't put it somewhere already in @INC - and there may be reasons you can't - FindBin lets you use paths relative to your script's location:
use FindBin;
use lib "$FindBin::Bin";
use MyConfig;
Write your 'config' as a defined parsable format, rather than executable code.
YAML
YAML is very solid for a config file particularly:
use YAML::XS;
open ( my $config_file, '<', 'config.yml' ) or die $!;
my $config = Load ( do { local $/; <$config_file> });
print $config -> {username};
And your config file looks like:
username: "username"
password: "password_here"
favourite_color: "green"
air_speed_of_unladen_swallow: "african_or_european?"
(YAML also supports multi-dimensional data structures, arrays etc. You don't seem to need these though.)
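If you'd rather skip the manual open/slurp, YAML::XS also provides a LoadFile helper, so the same thing can be written as:
use YAML::XS qw(LoadFile);
my $config = LoadFile('config.yml');   # reads and parses the file in one step
print $config->{username};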
JSON
A JSON-based config looks much the same; the input is just:
{
    "username": "username",
    "password": "password_here",
    "favourite_color": "green",
    "air_speed_of_unladen_swallow": "african_or_european?"
}
You read it with:
use JSON;
open ( my $config_file, '<', 'config.json' ) or die $!;
my $config = from_json ( do { local $/; <$config_file> });
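Much the same can be done with decode_json, which expects raw UTF-8 bytes rather than decoded text:
use JSON qw(decode_json);
open ( my $config_file, '<:raw', 'config.json' ) or die $!;
my $config = decode_json( do { local $/; <$config_file> });
print $config -> {username};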
Using relative paths to config:
You don't have to worry about @INC at all. You can simply open the file by a relative path... but a better bet is NOT to do that, and to use FindBin instead - it lets you say "relative to my script's path", which is much more robust.
use FindBin;
open ( my $config_file, '<', "$FindBin::Bin/config.yml" ) or die $!;
And then you'll know you're reading the one in the same directory as your script, no matter where it's invoked from.
specific questions:
From whose perspective is "./" anyway: the Perl binary, the Perl script executed, CWD from the user's shell, or something else?
The current working directory is inherited from the parent process. So it is the user's shell's CWD by default, unless the Perl script does a chdir.
Are there security risks to be aware of?
Any time you 'evaluate' something as if it were executable code (and EXPR can be) there's a security risk. It's probably not huge, because the script will be running as the user, and the user is the person who can tamper with CWD. The core risks are:
The user is in a 'different' directory where someone else has put a malicious thing for them to run (e.g. imagine if 'config.pl' had rm -rf /* in it). Maybe there's a 'config.pl' in /tmp that they 'run' accidentally?
The thing you're evaling has a typo, and breaks the script in funky and unexpected ways. (E.g. maybe it redefines $[ and messes with program logic from then on, in ways that are hard to debug.)
The script does anything in a privileged context. Which doesn't appear to be the case here, but see the previous point and imagine you're root or another privileged user.
Is this better or worse than modifying @INC and just using a base filename?
Worse, IMO. Actually, just don't modify @INC at all; use a full path, or a relative one via FindBin. And don't eval things when it's not necessary.

How can I use the Environment Modules system in Perl?

How can one use the Environment Modules system* in Perl?
Running
system("load module <module>");
does not work, presumably because it forks to another environment.
* Not to be confused with Perl modules. According to the Wikipedia entry:
The Environment Modules system is a tool to help users manage their Unix or Linux shell environment, by allowing groups of related environment-variable settings to be made or removed dynamically.
It looks like the Perl module Env::Modulecmd will do what you want. From the documentation:
Env::Modulecmd provides an automated interface to modulecmd from Perl. The most straightforward use of Env::Modulecmd is for loading and unloading modules at compile time, although many other uses are provided.
Example usage:
use Env::Modulecmd { load => 'foo/1.0' };
Alternatively, to do it less like a Perl module and more like the Environment Modules shell interface, you can source the Environment Modules initialization Perl code, just as the other shells do:
do( '/usr/share/Modules/init/perl');
module('load use.own');
print module('list');
For a one-line example:
perl -e "do ('/usr/share/Modules/init/perl');print module('list');"
(This problem - "source a Perl environment module" - uses such generic words that it is almost un-searchable.)
system("load module foo ; foo bar");
or, if that doesn't work, then
system("load module foo\nfoo bar");
I'm guessing it makes changes to the environment variables. To change Perl's environment variables, it would have to be executed within the Perl process. That's not going to work since it was surely only designed to be integrated into the shell. (It might not be too hard to port it, though.)
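A quick way to convince yourself of that (assuming a Unix shell): environment changes made in a child process never propagate back to the parent.
# the subshell sets FOO and then exits; the Perl process never sees it
system('FOO=bar; export FOO');
print defined $ENV{FOO} ? "FOO=$ENV{FOO}\n" : "FOO is not set here\n";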
If you are ok with restarting the script after loading the module, you can use the following workaround:
use String::ShellQuote qw( shell_quote );

BEGIN {
    if (!@ARGV || $ARGV[0] ne '!!foo_loaded!!') {
        # Re-run this script through the shell after loading the module,
        # adding a marker argument so we only re-exec once.
        my $perl_cmd = shell_quote($^X, '--', $0, '!!foo_loaded!!', @ARGV);
        exec("module load foo ; $perl_cmd")
            or die $!;
    }
    shift(@ARGV);   # remove the marker before the rest of the script runs
}

A proper way of using Perl custom modules inside of other Perl modules

I'm using custom modules in my scripts and have to store them outside of the Perl lib directory. So in Perl scripts (*.pl) I use the following block to include them in @INC:
BEGIN {
    use FindBin qw($Bin);
    push @INC, "$Bin/../ModulesFolder1";
    push @INC, "$Bin/../ModulesFolder2";
}
But I also have to use modules inside of my other Perl modules (*.pm), and as I understand it, FindBin works for scripts only. So I change that block to:
BEGIN {
    use File::Spec::Functions qw( catdir );
    use File::Basename qw( dirname );
    push @INC, catdir( dirname( $INC{'ThisModule.pm'} ), qw( .. ModulesFolder1 ) );
    push @INC, catdir( dirname( $INC{'ThisModule.pm'} ), qw( .. ModulesFolder2 ) );
}
It works, but with a little problem. I code in Eclipse with the EPIC plugin, and "if you have something in a BEGIN block that causes the compiler to abort prematurely, it won't report syntax errors to EPIC", so this way I lose the Perl syntax check in my modules.
With FindBin (in scripts) I don't have to call any functions (like catdir) in the BEGIN {} block, and the syntax check of the following code works correctly. Besides, I'd like not to change any environment variables (like PERL5LIB), so that I can use the scripts on my colleagues' machines without any additional preparation.
What's the proper way of using custom Perl modules inside of other modules, and not interfering with EPIC syntax check at the same time? Or maybe I even should include modules in completely other way?
I strongly disagree with modifying @INC in modules. It causes all kinds of headaches. Let the script (or even the calling process, via the PERL5LIB environment variable) set up @INC correctly.
script.pl:
use FindBin qw( $RealBin );
use lib
    "$RealBin/../ModulesFolder1",
    "$RealBin/../ModulesFolder2";
use ModuleInFolder1;
ModuleInFolder1.pm:
use ModuleInFolder2; # Works fine.
As for EPIC, do the following:
Right-click on the project.
Properties
Perl Include Path
${project_loc}/ModulesFolder1, Add to list
${project_loc}/ModulesFolder2, Add to list
(I literally mean the 14 chars ${project_loc}. That means something to EPIC. It will continue to work even if you move the project.)
PS — $RealBin is better than $Bin because it allows you to use a symlink to your script.
PS — __FILE__ is more appropriate than $INC{'ThisModule.pm'}.
Not sure about Eclipse, but you can use use lib (which will probably not help here, since it changes @INC at compile time) or set the environment variable PERL5LIB to point to your library folder(s).
Set up the PERL5LIB environment variable. Every time you use or require, Perl will check all directories listed in it.
Alternatively, place all the necessary custom modules under the script's directory, so you can use relative paths in use lib (see the sketch below). That also lets you quickly make a bundle to transfer everything to another PC by just packing everything recursively from the top-level directory.
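For instance, with the modules bundled in a lib/ directory next to the script (the directory name and module name are illustrative), FindBin keeps the path correct no matter where the script is started from:
use FindBin qw($RealBin);
use lib "$RealBin/lib";     # modules shipped alongside the script
use MyCustomModule;         # made-up module living in lib/MyCustomModule.pm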
Another solution (from my colleague) - a change to be made in the module:
use File::Spec::Functions qw( catfile catdir );
use Carp qw( confess );

sub path_to_current_module {
    my $package_name = __PACKAGE__ . '.pm';
    $package_name =~ s#::#/#g;
    # search @INC for the directory this module was loaded from
    for my $path ( @INC ) {
        # print "\$path == '$path'\n";
        if ( -e catfile( $path, $package_name ) ) {
            return $path;
        }
    }
    confess;
}

BEGIN {
    my $path_to_current_module = path_to_current_module();
    push @INC, catdir( $path_to_current_module, qw( .. ModulesFolder1 ) );
    push @INC, catdir( $path_to_current_module, qw( .. ModulesFolder2 ) );
}
It seems that with the old way (described in the question) Perl couldn't locate the current module's name in %INC - that's why perl -c was interrupted by an error inside the BEGIN block. The sub described above helps it determine the real path to the current module. Besides, it doesn't depend on the current file name and can be copied into another module.

How do I load libraries relative to the script location in Perl?

How can you get the current script's directory in Perl?
This has to work even if the script is imported from another script (with require).
This is not the same as the current working directory.
Example:
# /aaa/foo.pl
require "../bbb/bar.pl";

# /bbb/bar.pl
# I want to obtain my own directory (/bbb/) here:
print $mydir;
The script foo.pl could be executed in any way and from any directory, e.g. perl /aaa/foo.pl or ./foo.pl.
What people usually do is
use FindBin '$Bin';
and then use $Bin as the base-directory of the running script. However, this won't work if you do things like
do '/some/other/file.pl';
and then expect $Bin to contain /some/other/ within that file. I'm sure someone has thought of something incredibly clever to work around this, and you'll find it on CPAN somewhere, but a better approach might be not to include a program within a program, and to use Perl's wonderful ways of code reuse instead, which are much nicer than do and similar constructs. Modules, for example.
Those generally shouldn't care about what directory they were loaded from. If they really need to operate on some path, you can just pass that path to them.
See the Dir::Self CPAN module. It adds a pseudo-constant __DIR__ to complement __FILE__ and __LINE__.
use Dir::Self;
use lib __DIR__ . '/lib';
I use this snippet very often:
use Cwd qw(realpath);
use File::Basename;
my $cwd = dirname(realpath($0));
This will give you the real path to the directory containing the currently running script. "Real path" means that all symlinks, "." and ".." are resolved.
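The same snippet works for adding a library directory next to the script; use lib evaluates its argument at compile time, after Cwd and File::Basename have already been loaded (the lib/ layout is just an example):
use Cwd qw(realpath);
use File::Basename qw(dirname);
# add the lib/ directory that sits next to the running script
use lib dirname(realpath($0)) . '/lib';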
Sorry to the authors of the other four responses, but none of them worked for me; here is a solution that really does.
In the example below, which adds a lib directory to the include path, $dirname will contain the path to the directory of the current file. This works even if the file is included with require from another directory.
our $dirname;
BEGIN {
    use File::Spec;
    use File::Basename;
    $dirname = dirname( File::Spec->rel2abs(__FILE__) ) . "/lib/";
}
use lib $dirname;
From perlfaq8's answer to How do I add the directory my program lives in to the module/library search path?
(contributed by brian d foy)
If you know the directory already, you can add it to @INC as you would for any other directory. You might hard-code it if you know the directory at compile time:
use lib $directory;
The trick in this task is to find the directory. Before your script does anything else (such as a chdir), you can get the current working directory with the Cwd module, which comes with Perl:
BEGIN {
use Cwd;
our $directory = cwd;
}
use lib $directory;
You can do a similar thing with the value of $0, which holds the script name. That might hold a relative path, but rel2abs can turn it into an absolute path. Once you have the absolute path, dirname gives you the directory to add:
BEGIN {
use File::Spec::Functions qw(rel2abs);
use File::Basename qw(dirname);
my $path = rel2abs( $0 );
our $directory = dirname( $path );
}
use lib $directory;
The FindBin module, which comes with Perl, might work. It finds the directory of the currently running script and puts it in $Bin, which you can then use to construct the right library path:
use FindBin qw($Bin);
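For example, to pull in a lib/ directory kept one level above the script (the layout is only illustrative):
use FindBin qw($Bin);
use lib "$Bin/../lib";   # modules kept in lib/ next to the script's parent directory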
You can also use local::lib to do much of the same thing. Install modules using local::lib's settings then use the module in your program:
use local::lib; # sets up a local lib at ~/perl5
See the local::lib documentation for more details.
Let's say you're looking for script.pl. You may be running it, or you may have included it - you don't know. So its path is either in $PROGRAM_NAME (aka $0) in the first case, or in the %INC table in the second.
use strict;
use warnings;
use English qw<$PROGRAM_NAME>;
use File::Basename qw<dirname>;
use File::Spec;
use List::Util qw<first>;
# Here we get the first entry that ends with 'script.pl'
my $key = first { defined && m/\bscript\.pl$/ } keys %INC, $PROGRAM_NAME;
die "Could not find script.pl!" unless $key;
# Here we get the absolute path of the indicated path.
print File::Spec->rel2abs( dirname( $INC{ $key } || $key )), "\n";
See the documentation for File::Basename, File::Spec, and List::Util.

How can I determine CPAN dependencies before I deploy a Perl project?

Does anyone have suggestions for a good approach to finding all the CPAN dependencies that may have arisen in a bespoke development project? As tends to be the case, your local development environment rarely matches your live one, and as you build more and more projects you accumulate a local library of installed modules. That makes it easy not to notice that your latest project depends on a non-core module. Since the entire project generally has to be packaged up for deployment to another group (in our case, our operations team), it is important to know which modules should be included in the package.
Does anyone have any insights into the problem?
Thanks
Peter
I've had this problem myself. Devel::Modlist (as suggested in another answer here) takes a dynamic approach. It reports the modules that were actually loaded during a particular run of your script. This catches modules that are loaded by any means, but it may not catch conditional requirements. That is, if you have code like this:
if ($some_condition) { require Some::Module }
and $some_condition happens to be false, Devel::Modlist will not list Some::Module as a requirement.
I decided to use Module::ExtractUse instead. It does a static analysis, which means that it will always catch Some::Module in the above example. On the other hand, it can't do anything about code like:
my $module = "Other::Module";
eval "use $module;";
Of course, you could use both approaches and then combine the two lists.
Anyway, here's the solution I came up with:
#! /usr/bin/perl
#---------------------------------------------------------------------
# Copyright 2008 Christopher J. Madsen <perl at cjmweb.net>
#
# This program is free software; you can redistribute it and/or modify
# it under the same terms as Perl itself.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the
# GNU General Public License or the Artistic License for more details.
#
# Recursively collect dependencies of Perl scripts
#---------------------------------------------------------------------
use strict;
use warnings;
use File::Spec ();
use Module::CoreList ();
use Module::ExtractUse ();
my %need;
my $core = $Module::CoreList::version{'5.008'};
# These modules have lots of dependencies. I don't need to see them now.
my %noRecurse = map { $_ => 1 } qw(
    Log::Log4perl
    XML::Twig
);
foreach my $file (@ARGV) {
    findDeps($file);
}
foreach my $module (sort keys %need) {
    print " $module\n";
}
#---------------------------------------------------------------------
sub findDeps
{
    my ($file) = @_;
    my $p = Module::ExtractUse->new;
    $p->extract_use($file);
    foreach my $module ($p->array) {
        next if exists $core->{$module};
        next if $module =~ /^5[._\d]+/; # Ignore "use MIN-PERL-VERSION"
        next if $module =~ /\$/;        # Run-time specified module
        if (++$need{$module} == 1 and not $noRecurse{$module}) {
            my $path = findModule($module);
            if ($path) { findDeps($path) }
            else       { warn "WARNING: Can't find $module\n" }
        } # end if first use of $module
    } # end foreach $module used
} # end findDeps
#---------------------------------------------------------------------
sub findModule
{
    my ($module) = @_;
    $module =~ s!::|\'!/!g;
    $module .= '.pm';
    foreach my $dir (@INC) {
        my $path = File::Spec->catfile($dir, $module);
        return $path if -f $path;
    }
    return;
} # end findModule
You'd run this like:
perl finddeps.pl scriptToCheck.pl otherScriptToCheck.pl
It prints a list of all non-core modules necessary to run the scripts listed. (Unless they do fancy tricks with module loading that prevent Module::ExtractUse from seeing them.)
You can use the online web service at deps.cpantesters.org, which provides a lot of useful dependency data. Every module on CPAN already has a link to the dependency site (on the right-hand side of the module page).
In the past I have used Devel::Modlist, which is reasonably good; it allows you to run
perl -d:Modlist script.pl
To get a list of the required modules.
I have a Make-based build system for all my C/C++ applications (both PC-based and for various embedded projects), and while I love being able to do a top-level build on a fresh machine and verify all dependencies are in place (I check my toolchains in to revision control :D), I've been frustrated at not doing the same for interpreted languages that currently have no makefile in my build system.
I'm tempted to write a script that:
searches my revision control repository for files with the .pl or .pm extension
runs perl -d:Modlist on them (thanks Vagnerr!)
concatenates the output into a list of required modules
and finally compares that list to the list of installed modules.
I'd then execute that script as part of my top-level build, so that anyone building anything will know whether they have everything they need to run every Perl script they got from revision control. If there is some Perl script they never run and don't want to CPAN-install the requirements for, they'd have to remove the unwanted script from their hard drive so the dependency checker can't find it. I know how to modify a Perforce client to leave out certain subdirectories when you do a 'sync'; I'll have to figure that out for Subversion...
I'd suggest making the dependency checker a single script that searches for .pl files, as opposed to an individual makefile to check dependencies for each script, or a hard-coded list of script names. If you choose a method that requires user action to have a script checked for dependencies, people will forget to perform that action, since they will be able to run the script even if they don't do the dependency check.
Like I said, I haven't implemented the above yet, but this question has prompted me to try to do so. I'll post back with my experience after I'm done.
The 'obvious' way - painful but moderately effective - is to install a brand new build of base Perl in some out of the way location (you aren't going to use this in production), and then try to install your module using this 'virgin' version of Perl. You will find all the missing dependencies. The first time, this could be painful. After the first time, you'll already have the majority of the dependencies covered, and it will be vastly less painful.
Consider running your own local repository of CPAN modules - so that you won't always have to download the code. Also consider how you clean up the out of date modules.
use Acme::Magic::Pony;
Seriously. It will auto-install Perl modules if they turn up missing. See the Acme::Magic::Pony page in CPAN.
Its a "horse that's bolted" answer but I've got into the habit of creating a Bundle file with all my dependencies. Thus when I go to a new environment I just copy it over and install it.
For eg. I have a Baz.pm
package Bundle::Baz;
$VERSION = '0.1';
1;
__END__
=head1 NAME
Bundle::Baz
=head1 SYNOPSIS
perl -MCPAN -e 'install Bundle::Baz'
=head1 CONTENTS
# Baz's modules
XML::Twig
XML::Writer
Perl6::Say
Moose
Put this in ~/.cpan/Bundle/ (or wherever your .cpan lives) and then install 'Bundle::Baz' like a normal CPAN module. This then installs all the modules listed under "=head1 CONTENTS".
Here is a quickie bash function (using the excellent ack):
# find-perl-module-use <directory> (lib/ by default)
function find-perl-module-use() {
    dir=${1:-lib}
    ack '^\s*use\s+.*;\s*$' $dir | awk '{ print $2 }' | sed 's/();\?$\|;$//' | sort | uniq
    ack '^\s*use\s+base\s+.*;\s*$' $dir | awk '{ print $3 }' | sed 's/();\?$\|;$//' | sort | uniq
}