Separating configuration data and script logic in Perl scripts

I find the following anti-pattern repeated in my Perl scripts: the script contains some machine/setup specific settings which I store in-line as constants in the script whereas the rest of the script is general in nature:
#!/usr/bin/perl
use strict;
use warnings;
# machine specific settings at the start of the script.
my $SETTING_1 = "foo";
my @SETTING_2 = ("123", "456");
my $SETTING_3 = "something";
# general part of script follows.
...
This pattern is somewhat okay when running on one machine, but as soon as I want to distribute the script to multiple machines the trouble starts, since I must take care not to overwrite the settings part when deploying updates to the general part.
The correct solution is obviously to have one general script file and have it read a configuration file which is specific to the environment that the script runs in.
My question is: What CPAN module would you recommend for solving this problem? Why?

For configuration files, I like to use YAML. Simple, cross-platform, human-readable, and no danger of your configuration accidentally morphing into an actual program.
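As a rough illustration of how little code this takes (the file name and keys are made up for this sketch):
# config.yaml, kept next to the script:
#   setting_1: foo
#   setting_2:
#     - 123
#     - 456
#   setting_3: something
use strict;
use warnings;
use YAML qw(LoadFile);
use FindBin qw($Bin);

# LoadFile returns whatever the YAML document contains -- here a hash reference
my $config = LoadFile("$Bin/config.yaml");

print "setting_1 is $config->{setting_1}\n";
print "setting_2 has ", scalar @{ $config->{setting_2} }, " entries\n";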

My favorite is Config::Std. I like the way it handles multi-line and multi-part configuration values.
You have to be careful when a variable is potentially multi-valued: If a single value exists in the configuration file, it will store the value in a scalar; if multiple values exist, you will get an array reference.
I find it convenient to have two configuration files: One for values that describe the operating environment (where to find libraries etc) and another for user-modifiable behavior.
I also like to write a wrapper around it. For example (updated to include autogenerated read-only accessors):
#!/usr/bin/perl
package My::Config;
use strict; use warnings;
use Config::Std;
use FindBin qw($Bin);
use File::Spec::Functions qw( catfile );
sub new {
    my $class = shift;
    my ($config_file) = @_;
    # fall back to a config.ini next to the script if no file was given
    $config_file = catfile($Bin, 'config.ini') unless defined $config_file;

    read_config $config_file => my %config;

    my $object = bless \%config => $class;

    $object->gen_accessors(
        single => {
            install => [ qw( root ) ],
        },
        multi => {
            template => [ qw( dir ) ],
        },
    );

    return $object;
}

sub gen_accessors {
    my $config = shift;
    my %args   = @_;

    my $class = ref $config;

    {
        no strict 'refs';

        for my $section ( keys %{ $args{single} } ) {
            my @vars = @{ $args{single}->{$section} };
            for my $var ( @vars ) {
                *{ "${class}::${section}_${var}" } = sub {
                    $config->{$section}{$var};
                };
            }
        }

        for my $section ( keys %{ $args{multi} } ) {
            my @vars = @{ $args{multi}->{$section} };
            for my $var ( @vars ) {
                *{ "${class}::${section}_${var}" } = sub {
                    my $val = $config->{$section}{$var};
                    return [ $val ] unless 'ARRAY' eq ref $val;
                    return $val;
                };
            }
        }
    }

    return;
}
package main;
use strict; use warnings;
my $config = My::Config->new;
use Data::Dumper;
print Dumper($config->install_root, $config->template_dir);
C:\Temp> cat config.ini
[install]
root = c:\opt
[template]
dir = C:\opt\app\tmpl
dir = C:\opt\common\tmpl
Output:
C:\Temp> g.pl
$VAR1 = 'c:\\opt';
$VAR2 = [
'C:\\opt\\app\\tmpl',
'C:\\opt\\common\\tmpl'
];

The Config::Properties module is good for reading and writing key/value pair property files.
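A minimal sketch of reading such a file, following the module's documented interface (the file name is made up):
use strict;
use warnings;
use Config::Properties;

open my $fh, '<', 'app.properties'
    or die "Cannot open app.properties: $!";

my $properties = Config::Properties->new();
$properties->load($fh);    # parses "key = value" lines

my $setting_1 = $properties->getProperty('setting_1');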

I prefer YAML and YAML::XS for configuration data. It's simple, readable, and has bindings for almost any programming language. Another popular choice is Config::General.
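For comparison, a small Config::General sketch (Apache-style syntax; the file name and keys are assumptions for the example):
use strict;
use warnings;
use Config::General;

# app.conf might contain, for example:
#   setting_1 foo
#   <servers>
#       host server1
#       host server2
#   </servers>
my $conf   = Config::General->new('app.conf');
my %config = $conf->getall;    # the whole file as a hash

print $config{setting_1}, "\n";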

The usual low-tech method is to simply do() a configuration file, i.e. use Perl's do EXPR to read the file and evaluate it as Perl code. Have you looked into this?
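In case it helps, a sketch of that approach (settings.pl is a made-up name; do returns the value of the file's last expression and flags errors via $@ and $!):
# settings.pl contains plain Perl whose last expression is the config:
#   {
#       setting_1 => "foo",
#       setting_2 => [ "123", "456" ],
#       setting_3 => "something",
#   };
use strict;
use warnings;

my $config = do './settings.pl';
die "Could not parse settings.pl: $@" if $@;
die "Could not read settings.pl: $!"  unless defined $config;

print $config->{setting_1}, "\n";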

At the risk of being laughed out of class, one solution is to store the config in XML (or for more adventurous, JSON). Human-consumable, interoperable outside of Perl, doesn't have to live on local PC (both XML and JSON can be requested off of a "config URL") and a bunch of standard modules (XML::Simple is usually good enough for config XML files) exist on CPAN.
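A hedged sketch of both variants (the file name and URL are placeholders; XML::Simple turns repeated elements into an array reference):
use strict;
use warnings;
use XML::Simple qw(XMLin);
use JSON::PP qw(decode_json);
use LWP::Simple qw(get);

# config.xml might look like:
#   <config>
#     <setting_1>foo</setting_1>
#     <setting_2>123</setting_2>
#     <setting_2>456</setting_2>
#   </config>
my $xml_config = XMLin('config.xml');

# ... or fetch JSON from a "config URL" instead of the local disk
my $json_text = get('http://example.com/app-config.json')
    or die "Could not fetch config URL";
my $json_config = decode_json($json_text);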

For simple configuration like this, especially for trivial things where I don't expect this data to change in the real world, I often simply use YAML. The simplicity cannot be beat:
First, write your Perl data structure containing your configuration.
use YAML;
my $SETTINGS = {
    '1' => "foo",
    '2' => ["123", "456"],
    '3' => "something",
};
Then, pass it to YAML::DumpFile();
YAML::DumpFile("$ENV{HOME}/.$appname.yaml", $SETTINGS);  # Perl does not expand "~", so use $ENV{HOME}
Delete the data structure and replace it with
my $SETTINGS = YAML::LoadFile("$ENV{HOME}/.$appname.yaml");
And then forget about it. Even if you don't know or want to learn YAML syntax, small changes to the config can be made by hand and more major ones can be done in Perl and then re-dumped to YAML.

Don't tie yourself to a format -- use Config::Any, or for a little more whizbang DWIM factor, Config::JFDI (which itself wraps Config::Any). With them you buy yourself the ability to support INI, YAML, XML, Apache-style config, and more.
Config::JFDI builds on this by trying to capture some of the magic of Catalyst's config loader: merging of instance-local config with app-wide config, environment variable support, and a limited macro facility (__path_to(foo/bar)__ comes in handy surprisingly often.)
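A small Config::Any sketch using its documented load_files interface (the file names are made up; with use_ext it picks the parser from the extension):
use strict;
use warnings;
use Config::Any;

my $loaded = Config::Any->load_files({
    files   => [ 'app.yml', 'app_local.yml' ],
    use_ext => 1,
});

# load_files returns an array ref of { filename => config_hashref } entries
my %config;
for my $entry (@$loaded) {
    my ($file, $cfg) = %$entry;
    %config = ( %config, %$cfg );    # naive "later file wins" merge
}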

Related

Merge two yml files does not handle duplicates?

I am trying to merge 2 yml files using the Hash::Merge Perl module, and trying to dump the result to a yml file using Dump from the YAML module.
use strict;
use warnings;
use Hash::Merge qw( merge );
Hash::Merge::set_behavior('RETAINMENT_PRECEDENT');
use File::Slurp qw(write_file);
use YAML;
my $yaml1 = $ARGV[0];
my $yaml2 = $ARGV[1];
my $yaml_output = $ARGV[2];
my $clkgrps = &YAML::LoadFile($yaml1);
my $clkgrps1 = &YAML::LoadFile($yaml2);
my $clockgroups = merge($clkgrps1, $clkgrps);
my $out_yaml = Dump $clockgroups;
write_file($yaml_output, { binmode => ':raw' }, $out_yaml);
After merging the yml files I can see duplicate entries, i.e. the following content is the same in the two yml files, but while merging they are treated as different entries. Is there a built-in way to handle duplicates?
The data structures obtained from YAML files generally contain keys with values being arrayrefs with hashrefs. In your test case that's the arrayref for the key test.
Then a tool like Hash::Merge can only add the hashrefs to the arrayref belonging to the same key; it is not meant to compare array elements, as there aren't general criteria for that. So you need to do this yourself in order to prune duplicates, or apply any specific rules of your choice to data.
One way to handle this is to serialize (stringify) the complex data structures in each arrayref that may contain duplicates, so that they can be used as keys of a hash, which is a standard way to handle duplicates (with O(1) complexity per lookup, albeit possibly with a large constant).
There are a number of ways to serialize data in Perl. I'd recommend JSON::XS, as a very fast tool with output that can be used by any language and tool. (But please research others of course, that may suit your precise needs better.)
A simple complete example, using your test cases
use strict;
use warnings;
use feature 'say';
use Data::Dump qw(dd pp);
use YAML;
use JSON::XS;
use Hash::Merge qw( merge );
#Hash::Merge::set_behavior('RETAINMENT_PRECEDENT'); # irrelevant here
die "Usage: $0 in-file1 in-file2 output-file\n" if #ARGV != 3;
my ($yaml1, $yaml2, $yaml_out) = #ARGV;
my $hr1 = YAML::LoadFile($yaml1);
my $hr2 = YAML::LoadFile($yaml2);
my $merged = merge($hr2, $hr1);
#say "merged: ", pp $merged;
for my $key (keys %$merged) {
# The same keys get overwritten
my %uniq = map { encode_json $_ => 1 } #{$merged->{$key}};
# Overwrite the arrayref with the one without dupes
$merged->{$key} = [ map { decode_json $_ } keys %uniq ];
}
dd $merged;
# Save the final structure...
More complex data structures require a more judicious traversal; consider using a tool for that.
With files as shown in the question this prints
{
    test => [
        { directory => "LIB_DIR", name => "ObsSel.ktc", project => "TOT" },
        { directory => "MODEL_DIR", name => "pipe.v", project => "TOT" },
        {
            directory => "PCIE_LIB_DIR",
            name => "pciechip.ktc",
            project => "PCIE_MODE",
        },
        { directory => "NAME_DIR", name => "fame.v", project => "SINGH" },
        { directory => "TREE_PROJECT", name => "Syn.yml", project => "TOT" },
    ],
}
(I use Data::Dump to show complex data, for its simplicity and default compact output.)
If there are issues with serializing and comparing entire structures consider using a digest (checksum, hashing) of some sort.
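That variant could look like the following sketch, reusing %$merged from the example above; JSON::XS's canonical mode sorts hash keys, so logically identical hashes always digest to the same string:
use Digest::MD5 qw(md5_hex);
use JSON::XS;

my $coder = JSON::XS->new->canonical;    # stable key order before digesting

for my $key (keys %$merged) {
    my %seen;
    # keep the first occurrence of each distinct structure, preserving order
    $merged->{$key} = [
        grep { !$seen{ md5_hex( $coder->encode($_) ) }++ } @{ $merged->{$key} }
    ];
}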
Another option altogether would be to compare data structures as they are in order to resolve duplicates, by hand. For comparison of complex data structures I like to use Test::More, which works very nicely for mere comparisons outside of any testing. But there are dedicated tools as well of course, like Data::Compare.
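For instance, a by-hand pruning pass with Data::Compare might look like this (quadratic, but with no serialization step; $entries stands in for one of the merged array refs):
use Data::Compare;    # exports Compare(), true for deeply identical structures

my @unique;
for my $item (@$entries) {
    push @unique, $item unless grep { Compare($item, $_) } @unique;
}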
Finally, instead of manually processing the result of a naive merge, like above, one can code the desired behavior using Hash::Merge::add_behavior_spec and then have the module do it all. For specific examples of how to use this feature see for instance this post
and this post and this post.
Note that in this case you still write all the code to do the job like above but the module does take some of the mechanics off of your hands.

Replacing a class in Perl ("overriding"/"extending" a class with same name)?

I am trying to iterate over directories in Perl, getting introspectable objects as the result, mostly so I can print fields like mtime when I'm using Dumper on the returns from IO::All.
I have discovered that it can be done if, in the module IO::All::File (for me, /usr/local/share/perl/5.10.1/IO/All/File.pm), I add the line field mtimef => undef; and then modify its sub file so it runs $self->mtimef($self->mtime); (note that this field cannot have the same name (mtime) as the corresponding method/property, as those are dynamically assigned in IO::All). So, in essence, I'm not interested in "overloading", as in having the same name for multiple function signatures - I want to "replace" or "override" a class with its extended version (not sure what this is properly called), but under the same name, so that all other classes that may use it get on to using the extended version from that point on.
The best approach for me now would be, if I could somehow "replace" the IO::All::File class, from my actual "runnable" Perl script -- if somehow possible, by using the mechanisms for inheritance, so I can just add what is "extra". To show what I mean, here is an example:
use warnings;
use strict;
use Data::Dumper;

my @targetDirsToScan = ("./");

use IO::All -utf8; # Turn on utf8 for all io

# try to "replace" the IO::All::File class
{ # recursive inheritance!
    package IO::All::File;
    use IO::All::File -base;
    # hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
    field mtimef => undef; # hack
    sub file {
        my $self = shift;
        bless $self, __PACKAGE__;
        $self->name(shift) if @_;
        $self->mtimef($self->mtime); # hack
        return $self->_init;
    }
    1;
}

# main script start
my $io = io(@targetDirsToScan);
my @contents = $io->all(0); # Get all contents of dir
for my $contentry ( @contents ) {
    print Dumper \%{*$contentry};
}
... which fails with "Recursive inheritance detected in package 'IO::All::Filesys' at /usr/local/share/perl/5.10.1/IO/All/Base.pm line 13."; if you comment out the "recursive inheritance" section, it all works.
I'm sort of clear on why this happens with this kind of syntax - however, is there a syntax, or a way, that can be used to "replace" a class with its extended version but of the same name, similar to how I've tried it above? Obviously, I want the same name, so that I wouldn't have to change anything in IO::All (or any other files in the package). Also, I would preferably do this in the "runner" Perl script (so that I can have everything in a single script file, and I don't have to maintain multiple files) - but if the only way possible is to have a separate .pm file, I'd like to know about it as well.
So, is there a technique I could use for something like this?
Well, I honestly have no idea what is going on, but I poked around with the code above, and it seems all that is required is to remove the -base from the use IO::All::File statement; the code otherwise seems to work as I expect it - that is, the package does get "overridden" - if you change this snippet in the code above:
# ...
{ # no more recursive inheritance!? IO::All::File gets overridden with this?!
    package IO::All::File;
    use IO::All::File; # -base; # just do not use `-base` here?!
    # hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
    field mtimef => undef; # hack
    sub file {
        my $self = shift;
        bless $self, __PACKAGE__;
        $self->name(shift) if @_;
        $self->mtimef($self->mtime); # hack
        print("!! *haxx0rz'd* file() reporting in\n");
        return $self->_init;
    }
    1;
}
# ...
I found this so unbelievable, I even added the print() there to make sure it is the "overridden" function that runs, and sure enough, it is; this is what I get in the output:
...
!! *haxx0rz'd* file() reporting in
$VAR1 = {
    '_utf8' => 1,
    'mtimef' => 1394828707,
    'constructor' => sub { "DUMMY" },
    'is_open' => 0,
    'io_handle' => undef,
    'name' => './test.blg',
    '_encoding' => 'utf8',
    'package' => 'IO::All'
};
...
... and sure enough, the field is there, as expected, too...
Well - I hope someone eventually puts a more qualified answer here; for the time being, I hope this will do as a fix for my problems :) ...
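For what it's worth, the more conventional way to replace a single method in an already-loaded class is to overwrite its entry in the symbol table and delegate to the saved original. This is only a sketch (the wrapper body is illustrative, not IO::All's real internals):
use strict;
use warnings;
use IO::All;
use IO::All::File ();    # make sure the class is loaded before patching it

my $original_file = \&IO::All::File::file;    # keep the real method around

{
    no warnings 'redefine';    # we are deliberately replacing an existing sub
    *IO::All::File::file = sub {
        my $self = shift;
        my $obj  = $original_file->($self, @_);    # run the original file()
        # ... extra behaviour would go here ...
        return $obj;
    };
}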

Argument to Perl module use

Having a C background, I may be trying to write something the wrong way so excuse the beginner question. Here is what I'm looking for :
I want to have a Perl module Verbose (not a class) that defines a subroutine called verbose_print(). This subroutine will print its argument (a string) only if the module's variable $verbose is true. What I want to do is get the -V option from the command line (using Getopt::Long) and then, if the -V switch is on, call the Verbose module with 'true' as the value for $Verbose::verbose.
One workaround is to have a verbose_init function that sets the $Verbose::verbose variable to true inside the Verbose module.
Another one was to declare $verbose in my module using our and then set $Verbose::verbose = $command_line_verbose_switch in the main script.
I was wondering if there is another way to do this in perl?
Don't be so afraid of classes in Perl, they're just packages and modules treated a wee bit differently. They don't bite. However, you said no classes, so no classes.
I prefer not to touch package variables directly. Instead, I'll use a subroutine to set them.
Here's my Local::Verbose (stored under Local/Verbose.pm)
package Local::Verbose;

use strict;
use warnings;

use Exporter 'import';

# Could have used just '@EXPORT', but that's bad manners
our @EXPORT_OK = qw(verbose verbose_switch);

# Use "our", so $verbose_value is a package variable.
# This makes it survive between calls to this package
our $verbose_value;

# prints out message, but only if $verbose_value is set to non-blank/zero value
sub verbose {
    my $message = shift;
    if ( $verbose_value ) {
        print "VERBOSE: $message\n";
        return $message;
    }
    else {
        return;
    }
}

sub verbose_switch {
    my $switch_value = shift;
    $verbose_value = $switch_value;
    return $switch_value;
}

1;
Notice the our. That makes $verbose_value a package variable which means it lives on outside of the package between calls.
Notice how I use Exporter and the @EXPORT_OK array. You can use @EXPORT, which will export all of the named subroutines automatically, but it's now considered bad manners since you could end up covering over someone's local subroutine with the same name. Better to make it explicit. If there's a problem, they can use the fully qualified Local::Verbose::verbose name of the verbose subroutine.
And how it's used
use strict;
use warnings;
use Local::Verbose qw(verbose verbose_switch);
verbose ("This is a test");
verbose_switch(1);
verbose ("This is a second test");
By the way, imagine calling the verbose subroutine like this:
verbose($message, $my_verbose_level);
Now, your verbose subroutine could look like this:
sub verbose {
    my $message = shift;
    my $verbose_level = shift;
    if (not defined $verbose_level) {
        $verbose_level = 1;
    }
    # print when the configured verbosity covers this message's level
    if ( $verbose_value >= $verbose_level ) {
        print "VERBOSE: $message\n";
        return $message;
    }
    else {
        return;
    }
}
Now, you can set your verbose level to various values, and have your verbose statements give you different levels of verbosity. (I do the same thing, but call it debug).
Another way is to create an import function:
package Verbose;
my $verbose_on;
sub import {
    my ($class, $value) = @_;   # the first argument to import() is the package name
    $verbose_on = $value;
}
#... Your code.
Now you can set your verbose like this:
if( ... ) { #check getopt
    require Verbose;     # "use Verbose (1)" would run at compile time, before Getopt is processed
    Verbose->import(1);  # calls import() above, which sets $verbose_on
}
But I think a function-setter is simpler and more obvious for further use.

How do I access a hash from another subroutine?

I am trying to create some scripts for web testing and I use the following piece of code to set up variables from a config file:
package setVariables;
sub readConfig{
open(FH, "workflows.config") or die $!;
while(<FH>)
{
($s_var, $s_val) = split("=", $_);
chomp($s_var);
chomp($s_val);
$args{$s_var} = $s_val;
print "set $s_var = $s_val\n";
}
close(FH);
}
for example: var1=val1
var2=val2
var3=val3
etc...
I want to be able to pass the values set by this subroutine to a subroutine in another package. This is what I have for the package I want it passed into.
package startTest;
use setVariables;
sub startTest{
my %args = %setVariables::args;
my $s_var = $setVariables::s_var;
my $s_val = $setVariables::s_var;
setVariables::readConfig(); #runs the readConfig sub to set variables
my $sel = Test::WWW::Selenium->new( host => "localhost",
port => 4444,
browser => $args{"browser"},
browser_url => $args{"url"} );
$sel->open_ok("/index.aspx");
$sel->set_speed($args{"speed"});
$sel->type_ok("userid", $args{"usrname"});
$sel->type_ok("password", $args{"passwd"});
$sel->click_ok("//button[\#value='Submit']");
$sel->wait_for_page_to_load_ok("30000");
sleep($args{"sleep"});
}
Unfortunately it's not holding on to the variables as is, and I don't know how to reference them.
Thank you for any help.
Your code has some problems. Let's fix those first.
# Package names should start with upper case unless they are pragmas.
package SetVariables;
# Do this EVERYWHERE. It will save you hours of debugging.
use strict;
use warnings;
sub readConfig {
    # Use the three argument form of open()
    open( my $fh, '<', "workflows.config" )
        or die "Error opening config file: $!\n";

    my %config;

    # Use an explicit variable rather than $_
    while ( my $line = <$fh> )
    {
        chomp $line; # One chomp of the line is sufficient.
        my ($s_var, $s_val) = split "=", $line;
        $config{$s_var} = $s_val;
        print "set $s_var = $s_val\n";
    }
    close $fh;
    return \%config;
}
Then use like so:
use SetVariables;
my $config = SetVariables::readConfig();
print "$_ is $config->{$_}\n"
for keys %$config;
But rather than do all this yourself, check out the many, many config file modules on CPAN. Consider Config::Any, Config::IniFiles, Config::JSON.
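For example, with Config::IniFiles the workflow config could be read roughly like this (the section and key names are invented to match the question's usage):
use strict;
use warnings;
use Config::IniFiles;

# workflows.ini might contain:
#   [selenium]
#   browser = *firefox
#   url     = http://localhost/
#   speed   = 500
my $cfg = Config::IniFiles->new( -file => 'workflows.ini' )
    or die "Cannot read workflows.ini: @Config::IniFiles::errors";

my $browser = $cfg->val('selenium', 'browser');
my $url     = $cfg->val('selenium', 'url');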
You note in your comment that you are trying to work with multiple files, your main code and a couple of packages.
One pattern that is common is to load your config in your main code and pass it (or select elements of it) to consuming code:
package LoadConfig;

sub read_config {
    my $file = shift;
    my $config;
    # Do stuff to read a file into your config object;
    return $config;
}

1;
Meanwhile in another file:
package DoStuff;

sub run_some_tests {
    my $foo = shift;
    my $bar = shift;
    # Do stuff here
    return;
}

sub do_junk {
    my $config = shift;
    my $foo = $config->{foo};
    # Do junk
    return;
}

1;
And in your main script:
use DoStuff;
use LoadConfig;
my $config = LoadConfig::read_config('my_config_file.cfg');
run_some_tests( $config->{foo}, $config->{bar} );
do_junk( $config );
So in run_some_tests() I extract a couple elements from the config and pass them in individually. In do_junk() I just pass in the whole config variable.
Are your users going to see the configuration file or just programmers? If it's just programmers, put your configuration in a Perl module, then use use to import it.
The only reason to use a configuration file when only programmers will see it is if you are compiling the program. Since Perl programs are scripts, don't bother with the overhead of parsing a configuration file; just do it as Perl.
Unless it's for your users and its format is simpler than Perl.
PS: There's already a module called Config. Call yours My_config and load it like this:
use FindBin '$RealBin';
use lib $RealBin;
use My_config;
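A sketch of what My_config itself might contain (the settings are placeholders; Exporter can export variables such as %CONFIG on request):
# My_config.pm
package My_config;

use strict;
use warnings;
use Exporter 'import';

our @EXPORT_OK = qw(%CONFIG);

our %CONFIG = (
    browser => '*firefox',
    url     => 'http://localhost/',
    speed   => 500,
);

1;
With @EXPORT_OK the caller imports it explicitly as use My_config qw(%CONFIG), or it can always refer to %My_config::CONFIG directly.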
See:
perldoc FindBin
perldoc Config
I would suggest using a regular format, such as YAML, to store the configuration data. You can then use YAML::LoadFile to read back a hash reference of the configuration data and then use it.
Alternatively, if you don't want to use YAML or some other configuration format with pre-written modules, you'll need your reading routine to actually return either a hash or a hashref.
If you need some more background information, check out perlref, perlreftut and perlintro.
all you need to do is collect the variables in a hash and return a reference to it in readConfig:
my %vars = ( var1 => 'val1',
             var2 => 'val2',
             var3 => 'val3',
);
return \%vars;
and in startTest:
my $set_vars = setVariables::readConfig();

Calling a function in Perl with different properties

I have written a Perl script that starts an SNMP session and extracts the data/counters and their values to a csv file. There are 7 Perl scripts; they have different properties/definitions/variables at the top, but the engine is the same.
At this point those 7 Perl scripts are redundant except for the defined variables. Is there a way to keep each Perl script as a properties/execution file and keep the engine in another file? The properties/execution script would call the engine (using the properties defined in its own script).
So in short, I want to use the variables in their own script (as an execution as well), but calls a specific function from a unified "engine".
i.e.
retrieve_mibs1.pl
retrieve_mibs2.pl
retrieve_mibs3.pl
retrieve_mibs4.pl
retrieve_mibs5.pl
retrieve_mibs6.pl
retrieve_mibs7.pl
retrieve_mibs1.pl
#!/usr/local/bin/perl
use Net::SNMP;

##DEFINITION START
my @Servers = (
    'server1',
    'server2',
);
my $PORT = 161;
my $COMMUNITY = 'secret';
my $BASEOID = '1.2.3.4.5.6.7.8';
my $COUNTERS = [
    [11,'TotalIncomingFromPPH'],
    [12,'TotalFailedIncomingFromPPH'],
];

##ENGINE START
sub main {
    my $stamp = gmtime();
    my @oids = ();
    foreach my $counter (@$COUNTERS) {
        push @oids, ("$BASEOID.$$counter[0].0");
    }
    foreach my $server (@Servers) {
        print "$stamp$SEPARATOR$server";
        my ($session,$error) = Net::SNMP->session(-version => 1,-hostname => $server,-port => $PORT,-community => $COMMUNITY);
        if ($session) {
            my $result = $session->get_request(-varbindlist => \@oids);
            if (defined $result) {
                foreach my $oid (@oids) {
                    print $SEPARATOR,$result->{$oid};
                }
            } else {
                print STDERR "$stamp Request error: ",$session->error,"\n";
                print "$SEPARATOR-1" x scalar(@oids);
            }
        } else {
            print STDERR "$stamp Session error: $error\n";
            print "$SEPARATOR-1" x scalar(@oids);
        }
        print "\n";
    }
}

main();
You could do it using eval: set up the variables in one file, then open the engine and eval its content.
variables.pl (set up your variables and call the engine):
use warnings;
use strict;
use Carp;
use English '-no_match_vars';
require "engine.pl"; # so that we can call it's subs
# DEFINITION START
our $VAR1 = "Hello";
our $VAR2 = "World";
# CALL THE ENGINE
print "START ENGINE:\n";
engine(); # call engine
print "DONE\n";
engine.pl (the actual working stuff):
sub engine {
    print "INSIDE ENGINE\n";
    print "Var1: $VAR1\n";
    print "Var2: $VAR2\n";
}
1; # return a true value
Other alternatives would be:
pass the definitions as command line parameters directly to engine.pl and evaluate the contents of @ARGV
write a perl module containing the engine and use this module (sketched below, after this list)
store the parameters in a config file and read it in from your engine (e.g. using Config::IniFiles)
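A sketch of the module alternative mentioned above (all names are made up): the engine lives in a .pm file and each retrieve_mibs*.pl shrinks to its definitions plus a single call.
# RetrieveMibs.pm -- the shared engine
package RetrieveMibs;

use strict;
use warnings;
use Exporter 'import';

our @EXPORT_OK = qw(retrieve);

sub retrieve {
    my (%opt) = @_;
    # the SNMP loop from the question goes here, reading
    # $opt{servers}, $opt{port}, $opt{community}, $opt{baseoid}, $opt{counters}
    return;
}

1;
and a per-machine wrapper:
#!/usr/local/bin/perl
# retrieve_mibs1.pl -- definitions only, then one call into the engine
use strict;
use warnings;
use RetrieveMibs qw(retrieve);

retrieve(
    servers   => [ 'server1', 'server2' ],
    port      => 161,
    community => 'secret',
    baseoid   => '1.2.3.4.5.6.7.8',
    counters  => [
        [ 11, 'TotalIncomingFromPPH' ],
        [ 12, 'TotalFailedIncomingFromPPH' ],
    ],
);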
Two thoughts come to mind immediately:
Build a Perl module for your common code, and then require or use the module as your needs dictate. (The difference is mostly whether you want to run LynxLee::run_servers() or run_servers() -- do you want the module to influence your current scope or not.)
Use symbolic links: create symlinks such as retrieve_mibs1.pl -> retrieve_mibs.pl, retrieve_mibs2.pl -> retrieve_mibs.pl, and so on, then set the variables based on the program name:
#!/usr/bin/perl -w
use File::Basename;

my $name = basename($0);

my (@Servers, $PORT, $COMMUNITY, $BASEOID, $COUNTERS);
if ($name =~ /retrieve_mibs1\.pl/) {
    @Servers = (
        'server1',
        'server2',
    );
    # ...
} elsif ($name =~ /retrieve_mibs2\.pl/) {
    @Servers = (
        'server3',
        'server4',
    );
    # ...
}
Indexing into a hash with the name of the program to retrieve the parameters would be much cleaner, but I'm not so good at Perl references. :)
I'm not sure what the problem is so I'm guessing a little. You have code in various places that is the same each time save for some variables. This is the very definition of a subroutine.
Maybe the problem is that you don't know how to include the common code in those various scripts. This is fairly easy: you write that code in a Perl module, which is basically a file ending in .pm instead of .pl. Of course you have to take care of a few things such as exporting your functions. Perldoc should be of great help.