I have a Perl script that sets up variables near the top for directories and files that it will use. It also requires a few variables to be set as command-line arguments.
Example:
use Getopt::Long;
my ($mount_point, $sub_dir, $database_name, $database_schema);
# Populate variables from the command line:
GetOptions(
    'mount_point=s'     => \$mount_point,
    'sub_dir=s'         => \$sub_dir,
    'database_name=s'   => \$database_name,
    'database_schema=s' => \$database_schema,
);
# ... validation of required arguments here
################################################################################
# Directory variables
################################################################################
my $input_directory = "/${mount_point}/${sub_dir}/input";
my $output_directory = "/${mount_point}/${sub_dir}/output";
my $log_directory = "/${mount_point}/${sub_dir}/log";
my $database_directory = "/db/${database_name}";
my $database_scripts = "${database_directory}/scripts";
################################################################################
# File variables
################################################################################
my $input_file = "${input_directory}/input_file.dat";
my $output_file = "${output_directory}/output_file.dat";
# ... etc
This works fine in my dev, test, and production environments. However, I was trying to make it easier to override certain variables (without going into the debugger) for development and testing. (For example, if I want to set my input_file = "/tmp/my_input_file.dat"). My thought was to use the GetOptions function to handle this, something like this:
GetOptions(
    'input_directory=s'    => \$input_directory,
    'output_directory=s'   => \$output_directory,
    'database_directory=s' => \$database_directory,
    'log_directory=s'      => \$log_directory,
    'database_scripts=s'   => \$database_scripts,
    'input_file=s'         => \$input_file,
    'output_file=s'        => \$output_file,
);
GetOptions can only be called once (as far as I know). The first 4 arguments in my first snippet are required; the last 7 directly above are optional. I think an ideal situation would be to set up the defaults as in my first code snippet, and then somehow override any of them that have been set if arguments were passed at the command line. I thought about storing all my options in a hash and then using that hash when setting up each variable with the default value unless an entry exists in the hash, but that seems to add a lot of additional logic. Is there a way to call GetOptions in two different places in the script?
Not sure if that makes any sense.
Thanks!
It sounds like you need to change your program to use configuration files rather than hard-coded configuration. I devoted an entire chapter of Mastering Perl to this. You don't want to change source code to test the program.
There are many Perl modules on CPAN that make configuration files an easy feature to add. Choose the one that works best for your input data.
Once you get a better configuration model in place, you can easily set default values, take values from multiple places (files, command-line, etc), and easily test the program with different values.
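For instance, here is a minimal sketch of that layering in plain Perl (the file name app.conf and the option names are hypothetical): hard-coded defaults first, then a key=value config file, then the command line, each level overriding the last.
use strict;
use warnings;
use Getopt::Long;

# Level 1: hard-coded defaults
my %config = ( input_directory => '/data/input' );

# Level 2: a simple key=value config file, if present
if ( open my $fh, '<', 'app.conf' ) {
    while (<$fh>) {
        next if /^\s*(#|$)/;                              # skip comments and blanks
        my ($key, $value) = /^\s*(\w+)\s*=\s*(.*?)\s*$/ or next;
        $config{$key} = $value;
    }
}

# Level 3: the command line wins
GetOptions( \%config, 'input_directory=s' );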
Here's another approach. It uses arrays of names and a hash to store the options. It makes all options truly optional, but validates the required ones unless you include "--debug" on the command line. Regardless of whether you use "--debug", you can override any of the others.
You could do more explicit logic checks if that's important to you, of course. I included "--debug" as an example of how to omit the basic options like "mount_point" if you're just going to override the "input_file" and "output_file" variables anyway.
The main idea here is that by keeping sets of option names in arrays, you can include logic checks against the groups with relatively little code.
use Getopt::Long;
my @required_opts = qw(
mount_point
sub_dir
database_name
database_schema
);
my @internal_opts = qw(
input_directory
output_directory
log_directory
database_directory
database_scripts
input_file
output_file
);
my @opt_spec = ("debug", map { "$_:s" } @required_opts, @internal_opts);
# Populate variables from the command line:
GetOptions( \(my %opts), @opt_spec );
# Check required options unless --debug is set
my @errors = grep { ! exists $opts{$_} } @required_opts;
if ( @errors && ! $opts{debug} ) {
    die "$0: missing required option(s): @errors\n";
}
################################################################################
# Directory variables
################################################################################
$opts{input_directory}    ||= "/$opts{mount_point}/$opts{sub_dir}/input";
$opts{output_directory}   ||= "/$opts{mount_point}/$opts{sub_dir}/output";
$opts{log_directory}      ||= "/$opts{mount_point}/$opts{sub_dir}/log";
$opts{database_directory} ||= "/db/$opts{database_name}";
$opts{database_scripts}   ||= "$opts{database_directory}/scripts";
################################################################################
# File variables
################################################################################
$opts{input_file}  ||= "$opts{input_directory}/input_file.dat";
$opts{output_file} ||= "$opts{output_directory}/output_file.dat";
# ... etc
I think what I would do is set input_directory et al to "undef", and then put them in the getopts, and then afterwards, test if they're still undef and if so assign them as shown. If your users are technically sophisticated enough to understand "if I give a relative path it's relative to $mount_point/$sub_dir", then I'd do additional parsing looking for an initial "/".
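A minimal sketch of that undef-then-default approach (assuming Perl 5.10+ for the defined-or operator):
use strict;
use warnings;
use Getopt::Long;

my ( $mount_point, $sub_dir, $input_directory, $input_file );

GetOptions(
    'mount_point=s'     => \$mount_point,
    'sub_dir=s'         => \$sub_dir,
    'input_directory=s' => \$input_directory,
    'input_file=s'      => \$input_file,
) or die "bad options\n";

# Anything still undef gets the computed default
$input_directory //= "/$mount_point/$sub_dir/input";
$input_file      //= "$input_directory/input_file.dat";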
GetOptions can be called with an array as its input data: see GetOptionsFromArray in the Getopt::Long documentation.
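A sketch of what that could look like, assuming Getopt::Long 2.36 or later for GetOptionsFromArray, with pass_through enabled so the first pass leaves unknown options for the second:
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray :config pass_through);

my @args = @ARGV;
my ( $mount_point, $input_file );

# First pass: the required options; unknown ones stay in @args
GetOptionsFromArray( \@args, 'mount_point=s' => \$mount_point );

# Second pass: the optional overrides
GetOptionsFromArray( \@args, 'input_file=s' => \$input_file );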
Related
We have a module at work that is included in most scripts to create a logging event that includes who invoked the script and what command line args were passed to it. Currently this simply uses a dump of @ARGV.
However, we want to include this functionality in scripts that potentially receive secure information on the command line. We therefore still want to log the options passed to the script, but with the values masked, e.g. s/(?<=.{2})./X/sg
For example
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dump qw(dd);
use Getopt::Long qw(GetOptions);
local @ARGV = ( '-i', '--name' => 'value', '--password' => 'secure info', '--list' => 'foobar', '--list' => 'two' );
# The below GetOptions call specifies the allowed command line options
# to be parsed and validated.
#
# I want some way to accomplish the same but WITHOUT having to specify
# anything.
#
# Something like: GetOptions( \my %hash ); # Just do it without validation
GetOptions( \my %hash, 'i', 'name=s', 'password=s', 'list=s@' );
for ( values %hash ) {
    for ( ref($_) ? @$_ : $_ ) {
        s/(?<=.{2})./X/sg;
    }
}
dd \%hash;  # The command line options are then logged generically to a file with the values masked.
Outputs:
{
  i => 1,
  list => ["foXXXX", "twX"],
  name => "vaXXX",
  password => "seXXXXXXXXX",
}
The module I'm used to using for CLI parsing is Getopt::Long.
Is there a way to get Getopt::Long to not validate, but simply generically parse the options into a hash without having to specify any of the allowed options? Alternatively, is there another module that would give this ability?
I am not sure how Getopt::Long affects security, but I can think of a couple of ways to limit how much it works with provided arguments.
When a user subroutine is used to process options
It is up to the subroutine to store the value, or do whatever it thinks is appropriate.
I assume that the code just passes things to the sub. Then you can put them away as you wish
GetOptions(sensitive_opt => sub { $sensitive{$_[0]} = $_[1] }, ...) or usage();
Another way would be to not document the sensitive options for Getopt::Long, but provide the argument callback <>, which runs for each unrecognized thing on the command line; those can then be processed by hand. For this, the pass_through configuration option needs to be enabled
use Getopt::Long qw(:config pass_through); # pre 2.24, use Getopt::Long::Configure()
GetOptions(opt => \$normal_opt, ... '<>' => \&process_sensitive) ...
Then the given options are handled normally while the (expected) sensitive ones are processed by hand in process_sensitive().
The drawback here is that the options unmentioned in GetOptions are literally untouched and passed to the sub as mere words, and one at a time. So there would be some work to do.
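A sketch of what that by-hand work might look like (the option names --password and --token and the pairing logic are just illustrations, not a complete implementation):
my %sensitive;
my $pending;    # sensitive switch currently waiting for its value

sub process_sensitive {
    my ($word) = @_;    # an object that stringifies to the raw command-line word
    if ( defined $pending ) {
        ( my $masked = "$word" ) =~ s/(?<=.{2})./X/sg;
        $sensitive{$pending} = $masked;
        undef $pending;
    }
    elsif ( "$word" =~ /^--(password|token)$/ ) {
        $pending = $1;
    }
}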
Before I start, the whole 'concept' may be technically impossible; hopefully someone will have more knowledge about such things, and advise me.
With Perl, you can "declare" global variables at the start of a script via my / our thus:
my ($a, $b, $c, ...);
That's fine with a few unique variables. But I am using about 50 of them ... and the same names (not values) are used by five scripts. Rather than having to place huge my( ...) blocks at the start of each file, I'm wondering if there is a way to create them in one script. Note: Declare the namespace, not their values.
I have tried placing them all in a single file, with the shebang at the top, and a 1 at the bottom, and then tried "require", "use" and "do" to load them in. But - at certain times -the script complains it cannot find the global package name. (Maybe the "paths.pl" is setting up the global space relative to itself - which cannot be 'seen' by the other scripts)
Looking on Google, somebody suggested setting variables in the second file, and still setting the my in the calling script ... but that is defeating the object of what I'm trying to do, which is simply declare the name space once, and setting the values in another script
** So far, it seems that if I go from a link in an HTML page to a Perl script, the above method works. But when I call a script via XMLHttpRequest using a similar setup, it cannot find the $a, $b, $c etc. within the "paths" script
HTML
<form method="post" action="/cgi-bin/track/script1.pl">
<input type="submit" value="send"></form>
Perl: (script1.pl)
#shebang
require "./paths.pl"
$a=1;
$b="test";
print "content-type: text/html\n\n";
print "$a $b";
paths.pl
our( $a,
     $b,
     $c, ...
);
1;
Seems to work OK, with no errors. But ...
# Shebang
require "./paths.pl"
XHTMLREQUEST script1.pl
Now it complains it cannot find $a or $b etc as an "explicit package" for "script1.pl"
Am I moving into the territory of "modules" - of which I know little. Please bear in mind, I am NOT declaring values within the linked file, but rather setting up the 'global space' so that they can be used by all scripts which declare their own values.
(On a tangent, I thought - in the past - a file in the same directory could be accessed as "paths.pl" -but it won't accept that, and it insists on "./" Maybe this is part of the problem. I have tried absolute and relative paths too, from "url/cgi-bin/track/" to "/cgi-bin/track" but can't seem to get that to work either)
I'm fairly certain it's finding the paths file as I placed a "my value" before the require, and set a string within paths, and it was able to print it out.
First, lexical (my) variables only exist in their scope. A file is a scope, so they only exist in their file. You are now trying to work around that, and when you find yourself fighting the language that way, you should realize that you are doing it wrong.
You should move away from declaring all variables in one go at the top of a program. Declare them near the scope you want to use them, and declare them in the smallest scope possible.
You say that you want to "Set up a global space", so I think you might misunderstand something. If you want to declare a lexical variable in some scope, you just do it. You don't have to do anything else to make that possible.
Instead of this:
my( $foo, $bar, $baz );
$foo = 5;
sub do_it { $bar = 9; ... }
while( ... ) { $baz = 6; ... }
Declare the variable just where you want them:
my $foo = 5;
sub do_it { my $bar = 9; ... }
while( ... ) { my $baz = 6; ... }
Every lexical variable should exist in the smallest scope that can tolerate it. That way nothing else can mess with it and it doesn't retain values from previous operations when it shouldn't. That's the point of them, after all.
When you declare them to be file scoped, then don't declare them in the scope that uses them, you might have two unrelated uses of the same name conflicting with each other. One of the main benefits of lexical variables is that you don't have to know the names of any other variables in scope or in the program:
my( $foo, ... );
while( ... ) {
    $foo = ...;
    do_something();
    ...
}

sub do_something {
    $foo = ...;
}
Are those uses of $foo in the while and the sub the same, or do they accidentally have the same name? That's a cruel question to leave up to the maintenance programmer.
If they are the same thing, make the subroutine get its value from its argument list instead. You can use the same names, but since each scope has its own lexical variables, they don't interfere with each other:
while( ... ) {
    my $foo = ...;
    do_something($foo);
    ...
}

sub do_something {
    my( $foo ) = @_;
}
See also:
How to share/export a global variable between two different perl scripts?
You say you aren't doing what I'm about to explain, but other people may want to do something similar to share values. Since you are sharing the same variable names across programs, though, I suspect that this is actually what is going on.
In that case, there are many modules on CPAN that can do that job. What you choose depends on what sort of stuff you are trying to share between programs. I have a chapter in Mastering Perl all about it.
You might be able to get away with something like this, where one module defines all the values and makes them available for export:
# in Local/Config.pm
package Local::Config;
use Exporter qw(import);
our @EXPORT = qw( $foo $bar );
our $foo = 'Some value';
our $bar = 'Different value';
1;
To use this, merely load it with use. It will automatically import the variables that you put in @EXPORT:
# in some program
use Local::Config;
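After that one line, the exported variables are simply available:
print "$foo and $bar\n";   # prints the values set in Local::Config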
We cover lots of this sort of stuff in Intermediate Perl.
What you want to do here is a form of boilerplate management. Shoving variable declarations into a module or class file. This is a laudable goal. In fact you should shove as much boilerplate into that other module as possible. It makes it far easier to keep consistent behavior across the many scripts in a project. However shoving variables in there will not be as easy as you think.
First of all, $a and $b are special variables reserved for use in sort blocks, so they never have to be declared; using them here will not make for a valid test. require always searches for the file in @INC. See perlfunc require.
To declare a variable it has to be done at compile time. our, my, and state all operate at compile time and legalize a symbol in a lexical scope. Since a module is a scope, and require and do both create a scope for that file, there is no way to have our (let alone my and state) reach back to a parent scope to declare a symbol.
This leaves you with two options. Export package globals back to the calling script or munge the script with a source filter. Both of these will give you heartburn. Remember that it has to be done at compile time.
In the interest of computer science, here's how you would do it (but don't do it).
#boilerplate.pm
use strict;
use vars qw/$foo $bar/;
1;
__END__
#script.pl
use strict;
use boilerplate;
$foo = "foo here";
use vars is how you declare package globals when strict is in effect. Package globals are unscoped ("global") so it doesn't matter what scope or file they're declared in. (NB: our does not create a global like my creates a lexical. our creates a lexical alias to a global, thus exposing whatever is there.) Notice that boilerplate.pm has no package declaration. It will inherit whatever called it which is what you want.
The second way using source filters is devious. You create a module that rewrites the source code of your script on the fly. See Filter::Simple and perlfilter for more information. This only works on real scripts, not perl -e ....
#boilerplate.pm
package boilerplate;
use strict; use diagnostics;
use Filter::Simple;
my $injection = '
our ($foo, $bar);
my ($baz);
';
FILTER { s/__FILTER__/$injection/; }
__END__
#script.pl
use strict; use diagnostics;
use boilerplate;
__FILTER__
$foo = "foo here";
You can make any number of filtering tokens or scenarios for code substitution. e.g. use boilerplate qw/D2_loadout/;
These are the only ways to do it with standard Perl. There are modules that let you meddle with calling scopes through various B modules but you're on your own there. Thanks for the question!
HTH
I'm working with ugly code and trying to do a cleanup by moving values in a module into a configuration file. I want to keep the modules default values if a variable doesn't exist in the conf file, otherwise use the conf file version. There are lots of variables (too many) in the module so I wanted a helper method to support this. This is a first refactoring step, I likely will go further to better handle config variables later, but one step at a time.
I want a method that would take a variable in my module and either load the value from conf or set a default. So something like this (writing this from scratch, so treat it as just pseudocode for now)
our ($var_a, $var_b, ...);
export($var_a, $var_b, ...);
my %conf = ...;  # load config file
load_var(\$var_a, "foo");
load_var(\$var_b, "${var_a}bar");
sub load_var($$) {
    my ($variable_ref, $default) = @_;
    my $variable_name = Dumper($$variable_ref);  # get name of variable
    my $variable_value = $conf{$variable_name} // $default;
    # update original variable by having $variable_ref point to $variable_value
}
So two questions here. First, does anyone know if some functionality like my load_var already exists which I can reuse?
Second, if I have to write it from scratch, can I do it with a Perl version older than 5.22? When I read perlref, it refers to setting references as being a new feature in 5.22, but it seems odd that such a basic behavior of references wasn't implemented sooner, so I wonder if I'm misunderstanding the document. Is there a way to pass a variable to my load_var method and ensure it's actually updated?
For this sort of problem, I would be thinking along the lines of using the AUTOLOAD - I know it's not quite what you asked, but it's sort of doing a similar thing:
If you call a subroutine that is undefined, you would ordinarily get an immediate, fatal error complaining that the subroutine doesn't exist. (Likewise for subroutines being used as methods, when the method doesn't exist in any base class of the class's package.) However, if an AUTOLOAD subroutine is defined in the package or packages used to locate the original subroutine, then that AUTOLOAD subroutine is called with the arguments that would have been passed to the original subroutine.
Something like:
#!/usr/bin/env perl
package Narf;
use Data::Dumper;
use strict;
use warnings;
our $AUTOLOAD;
my %conf = ( fish => 1,
carrot => "banana" );
sub AUTOLOAD {
    print "Loading $AUTOLOAD\n";
    ## read config file
    my $target = $AUTOLOAD =~ s/.*:://gr;
    print $target;
    return $conf{$target} // 0;
}

sub boo {
    print "Boo!\n";
}
You can either call it OO style, or just 'normally' - but bear in mind this creates subs, not variables, so you might need to specify the package (or otherwise 'force' an import/export)
#!/usr/bin/env perl
use strict;
use warnings;
use Narf;
print Narf::fish(),"\n";
print Narf::carrot(),"\n";
print Narf::somethingelse(),"\n";
print Narf::boo;
Note - as these are autoloaded, they're not in the local namespace. Related to variables you've got this perlmonks discussion but I'm not convinced that's a good line to take, for all the reasons outlined in Why it's stupid to `use a variable as a variable name'
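On the reference half of the question: plain assignment through a scalar reference (${$ref} = ...) has worked since long before 5.22; what 5.22 added was aliasing through references (the experimental refaliasing feature). So a load_var can be written for old Perls if you pass the config key explicitly instead of trying to recover the variable's name with Dumper. A sketch, using the %conf hash from the question:
# Sketch: pass the config key explicitly; works on Perls well before 5.22
sub load_var {
    my ( $variable_ref, $name, $default ) = @_;
    ${$variable_ref} = exists $conf{$name} ? $conf{$name} : $default;
}

load_var( \$var_a, 'var_a', 'foo' );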
Is there any portable serialization method/module included in the CORE modules? I know there is Storable, but it is not truly portable nor "cross-platform-standardized". I'm looking for something like YAML, JSON, XML or the like...
I already checked http://perldoc.perl.org/index-modules-T.html - but maybe I missed something.
Motivation: I want to make a simple Perl script that will work with any Perl (without CPAN) and can read some configuration (and data) from a file. Using require with the Data::Dumper format is not very "user friendly"...
So possible solutions:
include something like YAML directly in my script (can be a solution, but...)
forcing users to install CPAN modules (not a solution)
use native Perl and require - not a very user-friendly syntax (for non-Perl users)
Any other suggested solution?
PS: I understand the need to keep the core as small as possible and reasonable, but reading data in some standardized format should maybe be in the core...
There is a YAML parser and serializer bundled with Perl, hidden away. It's called CPAN::Meta::YAML. It only handles a subset of YAML, but that may be sufficient for your purposes.
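For example (a sketch; the file name is made up, and remember the module deliberately supports only the YAML subset used by CPAN metadata):
use strict;
use warnings;
use CPAN::Meta::YAML;   # bundled with Perl since 5.14

my $yaml   = CPAN::Meta::YAML->read('config.yml');   # hypothetical file
my $config = $yaml->[0];                              # first YAML document
print $config->{input_directory}, "\n";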
You can configure Data::Dumper's output to be JSON-like. For example:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Pair = ': ';
$Data::Dumper::Terse = 1;
$Data::Dumper::Useqq = 1;
$Data::Dumper::Indent = 1;
my $structure = {
    foo => 'bar',
    baz => {
        quux   => 'duck',
        badger => 'mythical',
    }
};
print Dumper( $structure );
This prints:
{
  "baz": {
    "quux": "duck",
    "badger": "mythical"
  },
  "foo": "bar"
}
That might get you most of the way towards interoperability? The module does have a bunch of options for controlling / changing output e.g. the Freezer and Toaster options.
Can you explain to me the problem with Storable again? If you look at Perlport, after a discussion of Bigendiness and Littleendiness, it concludes:
One can circumnavigate both these problems in two ways. Either transfer and store numbers always in text format, instead of raw binary, or else consider using modules like Data::Dumper and Storable (included as of perl 5.8). Keeping all data as text significantly simplifies matters.
So, Storable is universal for storing and retrieving data in Perl, and it's not only easy to use, but it's a standard Perl module.
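And if byte order is the worry, Storable's nstore/nfreeze variants write in network order, which sidesteps the endianness issue from the quote above. A quick sketch (the file name is arbitrary):
use strict;
use warnings;
use Storable qw(nstore retrieve);

my %config = ( first => 'Bob', last => 'Smith' );
nstore( \%config, 'config.sto' );       # network byte order: portable
my $config = retrieve('config.sto');    # works on a different-endian box
print $config->{first}, "\n";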
Is the issue that you want to be able to write the data without having a Perl program do it for you? You could simply write your own Perl module. In most Perl installations, that module could be placed in the same directory as your program.
package Some_data; # Can be put in the same directory as the program like a config file
our $data; # Module variable which makes it accessible to your local program
$data = {}; # I am making this complex data structure...
$data->{NAME}->{FIRST} = "Bob";
$data->{NAME}->{LAST} = "Smith";
$data->{PHONE}->[0]->{TYPE} = "C";
$data->{PHONE}->[0]->{NUMBER} = "555-1234";
$data->{PHONE}->[1]->{TYPE} = "H";
$data->{PHONE}->[1]->{NUMBER} = "555-2345";
# Or use Subroutines
sub first {
    return "Bob";
}

sub last {
    return "Smith";
}
...
Now you can include this in your program:
use Some_data;
my $first_name = $Some_data::data->{NAME}->{FIRST};  # As a hash of hashes
# OR
my $first_name = Some_data::first();  # As a constant
The nice thing about the subroutines is that you can't change the data in your program. They're constants. In fact, that's exactly how Perl constants work too.
Speaking about constants. You could use use constant too:
package Some_data;
use constant {
FIRST => "Bob",
SECOND => "Smith",
};
And in your program:
use strict;
use warnings;
use Some_data;
my $first_name = &Some_data::FIRST;  # Note the use of the ampersand!
Not quite as clean because you need to prefix the constant with an ampersand. There are ways of getting around that ampersand, but they're not all that pretty.
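One such way, for completeness: call the constant as a normal fully qualified sub, which strict is also happy with:
my $first_name = Some_data::FIRST();   # parentheses instead of the ampersand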
Now, you have a way of importing your data in your program, and it's really no harder to maintain than a JSON data structure. There's nothing your program has to do except to use Module; to get that data.
One final possibility
Here's one I've done before. I simply have a configuration file that looks like what you'd put on the command line, then use Getopt::Long to pull in the configuration:
Configfile
-first Bob -last Smith
-phone 555-1212
NOTE: It doesn't matter if you put it all on one line or not:
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromString);
my $param_file = "Configfile";
open my $param_fh, "<", $param_file or die "Can't open $param_file: $!";
my @parameters = <$param_fh>;
close $param_fh;
my $params = join " ", @parameters;  # One long string
my ( $first, $phone );
GetOptionsFromString( $params,
    "first=s" => \$first,
    "phone=s" => \$phone,
);
You can't get easier to maintain than that!
I know how to use Perl's Getopt::Long, but I'm not sure how I can configure it to accept any "--key=value" pair that hasn't been explicitly defined and stick it in a hash. In other words, I don't know ahead of time what options the user may want, so there's no way for me to define all of them, yet I want to be able to parse them all.
Suggestions? Thanks ahead of time.
The Getopt::Long documentation suggests a configuration option that might help:
pass_through (default: disabled)
Options that are unknown, ambiguous or supplied
with an invalid option value are passed through
in @ARGV instead of being flagged as errors.
This makes it possible to write wrapper scripts
that process only part of the user supplied
command line arguments, and pass the remaining
options to some other program.
Once the regular options are parsed, you could use code such as that provided by runrig to parse the ad hoc options.
Getopt::Long doesn't do that. You can parse the options yourself...e.g.
my %opt;
my @OPTS = @ARGV;
for ( @OPTS ) {
    if ( /^--(\w+)=(\w+)$/ ) {
        $opt{$1} = $2;
        shift @ARGV;
    } elsif ( /^--$/ ) {
        shift @ARGV;
        last;
    }
}
Or modify Getopt::Long to handle it (or modify the above code to handle more kinds of options if you need that).
I'm a little partial, but I've used Getopt::Whatever in the past to parse unknown arguments.
Potentially, you could use the "Options with hash values" feature.
For example, I wanted to allow users to set arbitrary filters when parsing through an array of objects.
GetOptions(my $options = {}, 'foo=s', 'filter=s%');
my $filters = $options->{filter};
And then call it like
perl ./script.pl --foo bar --filter baz=qux --filter hail=eris
Which would build something like..
$options = {
    'filter' => {
        'hail' => 'eris',
        'baz'  => 'qux'
    },
    'foo' => 'bar'
};
And of course $filters will have the value associated with 'filter'
Good luck! I hope someone found this helpful.
From the documentation:
Argument Callback
A special option 'name' <> can be used to designate a subroutine to handle non-option arguments. When GetOptions() encounters an argument that does not look like an option, it will immediately call this subroutine and passes it one parameter: the argument name.
Well, actually it is an object that stringifies to the argument name.
For example:
my $width = 80;
sub process { ... }
GetOptions ('width=i' => \$width, '<>' => \&process);
When applied to the following command line:
arg1 --width=72 arg2 --width=60 arg3
This will call process("arg1") while $width is 80, process("arg2") while $width is 72, and process("arg3") while $width is 60.
This feature requires configuration option permute, see section
"Configuring Getopt::Long".
This is a good time to roll your own option parser. None of the modules that I've seen on the CPAN provide this type of functionality, and you could always look at their implementations to get a good sense of how to handle the nuts and bolts of parsing.
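A minimal sketch of such a hand-rolled parser that accepts any --key=value or bare --flag into a hash (no validation, by design):
use strict;
use warnings;

my %opt;
while ( @ARGV && $ARGV[0] =~ /^--/ ) {
    my $arg = shift @ARGV;
    last if $arg eq '--';                 # conventional end of options
    if ( $arg =~ /^--([\w-]+)=(.*)$/s ) {
        $opt{$1} = $2;                    # --key=value
    }
    elsif ( $arg =~ /^--([\w-]+)$/ ) {
        $opt{$1} = 1;                     # bare --flag
    }
}
# whatever is left in @ARGV is positional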
As an aside, this type of code makes me hate Getopt variants:
use Getopt::Long;
&GetOptions(
'name' => \$value
);
The inconsistent capitalization is maddening, even for people who have seen and used this style of code for a long time.