Perl: command-line override of config file settings? - perl

I'm building a script that utilizes a config file (YAML) to read in all the necessary configuration information, then prints out all the necessary steps a Linux Admin needs to step through to build a server.
A required option is for the Linux Admin that's running the script to be able to override any of the item/value pairs from the config file, at the command-line.
The way I'm currently handling this seems overly cumbersome and I know there's got to be a more innovative and less clunky way to do this.
In the code I:
Parse the YAML config file with YAML::Tiny
location:
continent: na
country: us
city: rh
Create variables with the same names as the config file items, assigning the values from the config file.
my $yaml = YAML::Tiny->new;
$yaml = YAML::Tiny->read($config_yml);
my $continent = $yaml->[0]->{location}->{continent};
my $country = $yaml->[0]->{location}->{country};
my $city = $yaml->[0]->{location}->{city};
Use Getopt::Long and assign the variables, overriding anything passed at the command-line.
GetOptions (
"city=s" => \$city,
"continent=s" => \$continent,
"country=s" => \$country,
);
So those are just 3 item/values pairs, my actual config has over 40 and will change...Which makes for a bit of work to have to keep updating. Any suggestions?

You can let the admin override the YAML settings with a single, flexible switch similar to what ssh(1) does with -o. This is especially appropriate if the config settings are numerous and likely to change.
$ myscript -o location:city=rh --option location:country=us
Now, inside the script, you might keep all your runtime config bundled together in a hash for convenience (rather than having $this_and_that_opt scalars proliferate over time). Option parsing would then look something like this:
# First, set up %GlobalAppCfg from defaults and YAML
# now handle "-o location:country=us"
GetOptions('option|o=s' => sub {
my (undef, $optstring) = #_;
my ($userkey, $val) = split('=', $optstring, 2);
my ($major, $minor) = split(':', $userkey, 2);
$GlobalAppCfg->{$major}->{$minor} = $val;
},
...);
or whatever. You can normalize config keys and values, handle arbitrarily deep key/subkey/subkey configs, etc. This can get slippery, so you might like to key-lock that global hash.

Take a look at some of the Config modules that marry GetOpt and YAML, perhaps Config::YAML or Config::YAML::Tiny

Just a sketch, but
In your GetOptions call, could you use "deep references" into the YAML structure, thereby getting rid of the "intermediate" variables?
By looking at the generated YAML structure, could you not generate the GetOptions call automatically (based on what variables you actually see in the YAML). By generate, I mean, create the call as a string, and then use "eval" to actually execute it.
If you want the "intermediate variables" for convenience, you could probably generate those yourself from the YAML structure, also as a string, and then use "eval" to actually create the variables.
As I said, just a sketch.

Related

how to include pl files in batch pod conversion?

I use the module Pod::Simple::HTMLBatch to create the documentation of the software which i write.
The script used is taken from the documentation:
use Pod::Simple::HTMLBatch;
my $batchconv = Pod::Simple::HTMLBatch->new;
$batchconv->some_option( some_value );
$batchconv->some_other_option( some_other_value );
$batchconv->batch_convert( \#search_dirs, $output_dir );
The module just considers pm files. How is it possible to tell the module to create the documentation also for pl files?
I did not find an option in the documentation.
Pod::Simple::Search looks for files matching this regular expression:
m/^[-_a-zA-Z0-9]+\.(?:pod|pm|plx?)\z/is
Which should include any *.pl files. If it's not working for you, try turning on its laborious flag, which is somewhat more forgiving about file names, testing like so:
m/\.(pod|pm|plx?)\z/i || -x _ and -T _
The only way to enable the laborious search with Pod::Simple::HTMLBatch is to create a search subclass, like I did for Pod::Site:
package My::Pod::Search;
use parent 'Pod::Simple::Search';
sub new {
my $self = shift->SUPER::new(#_);
$self->laborious(1);
return $self;
}
Then tell HTMLBatch to use your subclass:
$batchconv->search_class('My::Pod::Search');
$batchconv->batch_convert( \#search_dirs, $output_dir );
Might be nice to update HTMLBatch to accept a search object in its constructor to eliminate this silly workaround, though.

Perl Moose - Loading values from configuration file etc

I'm new to using Moose, but I was wondering how I can load values from a configuration file and then exposing those values as properties of my 'config' object where the attributes are the configuration names in the configuration file.
For example,
The configuration file might contain:
server:mozilla.org
protocol:HTTP
So I'd like my config object to have a 'server' attribute with a value of 'mozilla.org' and a protocol attribute with a value of 'HTTP'.
Right now my understanding is that I have to explicitly name the attributes with a
has 'server' => ( is => 'ro', isa => 'Str', default => 'mozilla.org' );
type of entry in my Config.pm file.
How do I dynamically create these so that the configuration file can change without having me rewrite the Config.pm each time that it does?
TIA!
That's such an obvious idea, it has been implemented already several times.
MooseX::ConfigFromFile
MooseX::Configuration
MooseX::SimpleConfig
Also see
MooseX::App
MooseX::App::Cmd
MooseX::Runnable
which map command-line options to attributes, which you most likely also want.
This isn't quite what you asked for, but you can get a config attribute that's a hash reference by using BUILDARGS to populate the config info at the time of creation. Assuming that the lines of your config file consist of key-value pairs separated by :, something like this should work:
package My::Module;
use Moose;
has 'config'=>(isa=>'HashRef[Str]',is=>'rw',required=>1);
around BUILDARGS=>sub
{
my $orig=shift;
my $class=shift;
my $args=shift; #other arguments passed in (if any).
my %config_hash=();
open(my $read,"<","config_file") or confess $!;
while(<$read>)
{
chomp;
my #array=split /:/;
$config_hash{$array[0]}=$array[1];
}
close($read);
$args->{config}=\%config_hash;
return $class->$orig($args);
};
no Moose;
1;
With minimal effort, it's also easy to have additional attributes to specify the name and path of the config file along with the delimiter. These could be accessed inside of BUILDARGS as, eg, $args->{config_file} and $args->{config_delimiter}.

Perl module loading - Safeguarding against: perhaps you forgot to load "Bla"?

When you run perl -e "Bla->new", you get this well-known error:
Can't locate object method "new" via package "Bla"
(perhaps you forgot to load "Bla"?)
Happened in a Perl server process the other day due to an oversight of mine. There are multiple scripts, and most of them have the proper use statements in place. But there was one script that was doing Bla->new in sub blub at line 123 but missing a use Bla at the top, and when it was hit by a click without any of the other scripts using Bla having been loaded by the server process before, then bang!
Testing the script in isolation would be the obvious way to safeguard against this particular mistake, but alas the code is dependent upon a humungous environment. Do you know of another way to safeguard against this oversight?
Here's one example how PPI (despite its merits) is limited in its view on Perl:
use strict;
use HTTP::Request::Common;
my $req = GET 'http://www.example.com';
$req->headers->push_header( Bla => time );
my $au=Auweia->new;
__END__
PPI::Token::Symbol '$req'
PPI::Token::Operator '->'
PPI::Token::Word 'headers'
PPI::Token::Operator '->'
PPI::Token::Word 'push_header'
PPI::Token::Symbol '$au'
PPI::Token::Operator '='
PPI::Token::Word 'Auweia'
PPI::Token::Operator '->'
PPI::Token::Word 'new'
Setting the header and assigning the Auweia->new parse the same. So I'm not sure how you can build upon such a shaky foundation. I think the problem is that Auweia could also be a subroutine; perl.exe cannot tell until runtime.
Further Update
Okay, from #Schwern's instructive comments below I learnt that PPI is just a tokenizer, and you can build upon it if you accept its limitations.
Testing is the only answer worth the effort. If the code contains mistakes like forgetting to load a class, it probably contains other mistakes. Whatever the obstacles, make it testable. Otherwise you're patching a sieve.
That said, you have two options. You can use Class::Autouse which will try to load a module if it isn't already loaded. It's handy, but because it affects the entire process it can have unintended effects.
Or you can use PPI to scan your code and find all the class method calls. PPI::Dumper is very handy to understand how PPI sees Perl.
use strict;
use warnings;
use PPI;
use PPI::Dumper;
my $file = shift;
my $doc = PPI::Document->new($file);
# How PPI sees a class method call.
# PPI::Token::Word 'Class'
# PPI::Token::Operator '->'
# PPI::Token::Word 'method'
$doc->find( sub {
my($node, $class) = #_;
# First we want a word
return 0 unless $class->isa("PPI::Token::Word");
# It's not a class, it's actually a method call.
return 0 if $class->method_call;
my $class_name = $class->literal;
# Next to it is a -> operator
my $op = $class->snext_sibling or return 0;
return 0 unless $op->isa("PPI::Token::Operator") and $op->content eq '->';
# And then another word which PPI identifies as a method call.
my $method = $op->snext_sibling or return 0;
return 0 unless $method->isa("PPI::Token::Word") and $method->method_call;
my $method_name = $method->literal;
printf "$class->$method_name seen at %s line %d.\n", $file, $class->line_number;
});
You don't say what server enviroment you're running under, but from what you say it sounds like you could do with preloading all your modules in advance before executing any individual pages. Not only would this prevent the problems you're describing (where every script has to remember to load all the modules it uses) but it would also save you memory.
In pre-forking servers (as is commonly used with mod_perl and Apache) you really want to load as much of your code before your server forks for the first time so that the code is stored once in copy-on-write shared memory rather than mulitple times in each child process when it is loaded on demand.
For information on pre-loading in Apache, see the section of Practical mod_perl

Why is it a bad idea to write configuration data in code?

Real-life case (from caff) to exemplify the short question subject:
$CONFIG{'owner'} = q{Peter Palfrader};
$CONFIG{'email'} = q{peter#palfrader.org};
$CONFIG{'keyid'} = [ qw{DE7AAF6E94C09C7F 62AF4031C82E0039} ];
$CONFIG{'keyserver'} = 'wwwkeys.de.pgp.net';
$CONFIG{'mailer-send'} = [ 'testfile' ];
Then in the code: eval `cat $config`, access %CONFIG
Provide answers that lay out the general problems, not only specific to the example.
There are many reasons to avoid configuration in code, and I go through some of them in the configuration chapter in Mastering Perl.
No configuration change should carry the risk of breaking the program. It certainly shouldn't carry the risk of breaking the compilation stage.
People shouldn't have to edit the source to get a different configuration.
People should be able to share the same application without using a common group of settings, instead re-installing the application just to change the configuration.
People should be allowed to create several different configurations and run them in batches without having to edit the source.
You should be able to test your application under different settings without changing the code.
People shouldn't have to learn how to program to be able to use your tool.
You should only loosely tie your configuration data structures to the source of the information to make later architectural changes easier.
You really want an interface instead of direct access at the application level.
I sum this up in my Mastering Perl class by telling people that the first rule of programming is to create a situation where you do less work and people leave you alone. When you put configuration in code, you spend more time dealing with installation issues and responding to breakages. Unless you like that sort of thing, give people a way to change the settings without causing you more work.
$CONFIG{'unhappy_employee'} = `rm -rf /`
One major issue with this approach is that your config is not very portable. If a functionally identical tool were built in Java, loading configuration would have to be redone. If both the Perl and the Java variation used a simple key=value layout such as:
owner = "Peter Palfrader"
email = "peter#peter#palfrader.org"
...
they could share the config.
Also, calling eval on the config file seems to open this system up to attack. What could a malicious person add to this config file if they wanted to wreak some havoc? Do you realize that ANY arbitrary code in your config file will be executed?
Another issue is that it's highly counter-intuitive (at least to me). I would expect a config file to be read by some config loader, not executed as a runnable piece of code. This isn't so serious but could confuse new developers who aren't used to it.
Finally, while it's highly unlikely that the implementation of constructs like p{...} will ever change, if they did change, this might fail to continue to function.
It's a bad idea to put configuration data in compiled code, because it can't be easily changed by the user. For scripts, just make sure it's separated entirely from the rest and document it nicely.
A reason I'm surprised no one mentioned yet is testing. When config is in the code you have to write crazy, contorted tests to be able to test safely. You can end up writing tests that duplicate the code they test which makes the tests nearly useless; mostly just testing themselves, likely to drift, and difficult to maintain.
Hand in hand with testing is deployment which was mentioned. When something is easy to test, it is going to be easy (well, easier) to deploy.
The main issue here is reusability in an environment where multiple languages are possible. If your config file is in language A, then you want to share this configuration with language B, you will have to do some rewriting.
This is even more complicated if you have more complex configurations (example the apache config files) and are trying to figure out how to handle potential differences in data structures. If you use something like JSON, YAML, etc., parsers in the language will be aware of how to map things with regards to the data structures of the language.
The one major drawback of not having them in a language, is that you lose the potential of utilizing setting config values to dynamic data.
I agree with Tim Anderson. Somebody here confuses configuration in code as configuration not being configurable. This is corrected for compiled code.
Both a perl or ruby file is read and interpreted, as is a yml file or xml file with configuration data. I choose yml because it is easier on the eye than in code, as grouping by test environment, development, staging and production, which in code would involve more .. code.
As a side note, XML contradicts the "easy on the eye" completely. I find it interesting that XML config is extensively used with compiled languages.
Reason 1. Aesthetics. While no one gets harmed by bad smell, people tend to put effort into getting rid of it.
Reason 2. Operational cost. For a team of 5 this is probably ok, but once you have developer/sysadmin separation, you must hire sysadmins who understand Perl (which is $$$), or give developers access to production system (big $$$).
And to make matters worse you won't have time (also $$$) to introduce a configuration engine when you suddenly need it.
My main problem with configuration in many small scripts I write, is that they often contain login data (username and password or auth-token) to a service I use. Then later, when the scripts gets bigger, I start versioning it and want to upload it on github.
So before every commit I need to replace my configuration with some dummy values.
$CONFIG{'user'} = 'username';
$CONFIG{'password'} = '123456';
Also you have to be careful, that those values did not eventually slip into your commit history at some point. This can get very annoying. When you went through this one or two times, you will never again try to put configuration into code.
Excuse the long code listing. Below is a handy Conf.pm module that I have used in many systems which allows you to specify different variables for different production, staging and dev environments. Then I build my programs to either accept the environment parameters on the command line, or I store this file outside of the source control tree so that never gets over written.
The AUTOLOAD provides automatic methods for variable retrieval.
# Instructions:
# use Conf;
# my $c = Conf->new("production");
# print $c->root_dir;
# print $c->log_dir;
package Conf;
use strict;
our $AUTOLOAD;
my $default_environment = "production";
my #valid_environments = qw(
development
production
);
#######################################################################################
# You might need to change this.
sub set_vars {
my ($self) = #_;
$self->{"access_token"} = 'asdafsifhefh';
if ( $self->env eq "development" ) {
$self->{"root_dir"} = "/Users/patrickcollins/Documents/workspace/SysG_perl";
$self->{"server_base"} = "http://localhost:3000";
}
elsif ($self->env eq "production" ) {
$self->{"root_dir"} = "/mnt/SysG-production/current/lib";
$self->{"server_base"} = "http://api.SysG.com";
$self->{"log_dir"} = "/mnt/SysG-production/current/log"
} else {
die "No environment defined\n";
}
#######################################################################################
# You shouldn't need to configure this.
# More dirs. Move these into the dev/prod sections if they're different per env.
my $r = $self->{'root_dir'};
my $b = $self->{'server_base'};
$self->{"working_dir"} ||= "$r/working";
$self->{"bin_dir"} ||= "$r/bin";
$self->{"log_dir"} ||= "$r/log";
# Other URLs. Move these into the dev/prod sections if they're different per env.
$self->{"new_contract_url"} = "$b/SysG-training-center/v1/contract/new";
$self->{"new_documents_url"} = "$b/SysG-training-center/v1/documents/new";
}
#######################################################################################
# Code, don't change below here.
sub new {
my ($class,$env) = #_;
my $self = {};
bless ($self,$class);
if ($env) {
$self->env($env);
} else {
$self->env($default_environment);
}
$self->set_vars;
return $self;
}
sub AUTOLOAD {
my ($self,$val) = #_;
my $type = ref ($self) || die "$self is not an object";
my $field = $AUTOLOAD;
$field =~ s/.*://;
#print "field: $field\n";
unless (exists $self->{$field} || $field =~ /DESTROY/ )
{
die "ERROR: {$field} does not exist in object/class $type\n";
}
$self->{$field} = $val if ($val);
return $self->{$field};
}
sub env {
my ($self,$in) = #_;
if ($in) {
die ("Invalid environment $in") unless (grep($in,#valid_environments));
$self->{"_env"} = $in;
}
return $self->{"_env"};
}
1;

Is there a tool for extracting all variable, module, and function names from a Perl module file?

My apologies if this is a duplicate; I may not know the proper terms to search for.
I am tasked with analyzing a Perl module file (.pm) that is a fragment of a larger application. Is there a tool, app, or script that will simply go through the code and pull out all the variable names, module names, and function calls? Even better would be something that would identify whether it was declared within this file or is something external.
Does such a tool exist? I only get the one file, so this isn't something I can execute -- just some basic static analysis I guess.
Check out the new, but well recommended Class::Sniff.
From the docs:
use Class::Sniff;
my $sniff = Class::Sniff->new({class => 'Some::class'});
my $num_methods = $sniff->methods;
my $num_classes = $sniff->classes;
my #methods = $sniff->methods;
my #classes = $sniff->classes;
{
my $graph = $sniff->graph; # Graph::Easy
my $graphviz = $graph->as_graphviz();
open my $DOT, '|dot -Tpng -o graph.png' or die("Cannot open pipe to dot: $!");
print $DOT $graphviz;
}
print $sniff->to_string;
my #unreachable = $sniff->unreachable;
foreach my $method (#unreachable) {
print "$method\n";
}
This will get you most of the way there. Some variables, depending on scope, may not be available.
If I understand correctly, you are looking for a tool to go through Perl source code. I am going to suggest PPI.
Here is an example cobbled up from the docs:
#!/usr/bin/perl
use strict;
use warnings;
use PPI::Document;
use HTML::Template;
my $Module = PPI::Document->new( $INC{'HTML/Template.pm'} );
my $sub_nodes = $Module->find(
sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name }
);
my #sub_names = map { $_->name } #$sub_nodes;
use Data::Dumper;
print Dumper \#sub_names;
Note that, this will output:
...
'new',
'new',
'new',
'output',
'new',
'new',
'new',
'new',
'new',
...
because multiple classes are defined in HTML/Template.pm. Clearly, a less naive approach would work with the PDOM tree in a hierarchical way.
Another CPAN tools available is Class::Inspector
use Class::Inspector;
# Is a class installed and/or loaded
Class::Inspector->installed( 'Foo::Class' );
Class::Inspector->loaded( 'Foo::Class' );
# Filename related information
Class::Inspector->filename( 'Foo::Class' );
Class::Inspector->resolved_filename( 'Foo::Class' );
# Get subroutine related information
Class::Inspector->functions( 'Foo::Class' );
Class::Inspector->function_refs( 'Foo::Class' );
Class::Inspector->function_exists( 'Foo::Class', 'bar' );
Class::Inspector->methods( 'Foo::Class', 'full', 'public' );
# Find all loaded subclasses or something
Class::Inspector->subclasses( 'Foo::Class' );
This will give you similar results to Class::Sniff; you may still have to do some processing on your own.
There are better answers to this question, but they aren't getting posted, so I'll claim the fastest gun in the West and go ahead and post a 'quick-fix'.
Such a tool exists, in fact, and is built into Perl. You can access the symbol table for any namespace by using a special hash variable. To access the main namespace (the default one):
for(keys %main::) { # alternatively %::
print "$_\n";
}
If your package is named My/Package.pm, and is thus in the namespace My::Package, you would change %main:: to %My::Package:: to achieve the same effect. See the perldoc perlmod entry on symbol tables - they explain it, and they list a few alternatives that may be better, or at least get you started on finding the right module for the job (that's the Perl motto - There's More Than One Module To Do It).
If you want to do it without executing any code that you are analyzing, it's fairly easy to do this with PPI. Check out my Module::Use::Extract; it's a short bit of code shows you how to extract any sort of element you want from PPI's PerlDOM.
If you want to do it with code that you have already compiled, the other suggestions in the answers are better.
I found a pretty good answer to what I was looking for in this column by Randal Schwartz. He demonstrated using the B::Xref module to extract exactly the information I was looking for. Just replacing the evaluated one-liner he used with the module's filename worked like a champ, and apparently B::Xref comes with ActiveState Perl, so I didn't need any additional modules.
perl -MO=Xref module.pm