How to use Perl as a language for a configuration file? - perl

I have an application where I would like to allow users to specify configuration files in Perl.
What I have in mind is similar to how PKGBUILD is used in Arch: it is a Bash script, which is simply sourced from the "makepkg" tool. It contains variable and function definitions:
pkgbase=somepkg
pkgname=('somepkg')
pkgver=0.0.1
...
prepare() {
cd "${srcdir}/${pkgbase}-${pkgver}"
...
}
I would like to do something like this in Perl. I guess I'm looking for a Perl version of source, which can read a file, evaluate it, and import all the subroutines and variables into the namespace of the caller. With the Bash version, I could conceivably write a PKGBUILD which sources another PKGBUILD and then overrides one or two variables; I want this kind of "inheritance" to be possible in my Perl config files as well.
One problem with Perl's do is that it seems to put the file's variables and subroutines into a separate namespace. Also, I can't figure out how to override true subroutines, only named anonymous ones.
Here's a version that may serve to illustrate what I want to do. It is not very elegant, but it demonstrates overriding both subroutines and variables, as well as calling a previously-defined subroutine:
$ cat test-override
#!/usr/bin/perl
use warnings;
use strict;
my #tables = qw(a b c);
my $print_tables = sub {
print join(", ", #tables), "\n";
};
eval(`cat "test-conf"`) or die "$#";
&$print_tables();
$ cat test-conf
#tables = qw(d e f);
my $old_print_tables = $print_tables;
$print_tables = sub {
warn "In test-conf \$print_tables\n";
&$old_print_tables();
}
$ ./test-override
In test-conf $print_tables
d, e, f
Another way to do this would be to let the configuration file return a hash, with data and subroutines as values. There is also the option of using classes and inheritance. However, I want the configuration files to be as light-weight as possible, syntactically.
The documentation for "do" mentions the possibility of using Perl in configuration files, so I know this problem has been considered before. Is there a canonical example of how to do it in a "user-friendly" manner?

Your description of what you want this "configuration file" to do (import subs and variables into the caller's namespace, override subs, classes, inheritance...) sounds suspiciously like a standard Perl module. So use a standard Perl module.
Note that there is widespread precedent for this approach: The standard cpan command-line client stores its configuration in a module, located at the default path of ~/.cpan/CPAN/MyConfig.pm on *nix-type systems. Granted, cpan's MyConfig.pm is a very simple example which just sets the hashref $CPAN::Config, but there's no reason it couldn't also do all the other things any module does.
But doing it with do is quite simple. I suspect you're just overthinking it:
$ cat test-override
#!/usr/bin/perl
use warnings;
use strict;
our #tables = qw(a b c);
sub print_tables {
print join(", ", #tables), "\n";
};
print_tables;
do "test-conf";
print_tables;
print "\#tables is #tables\n";
$ cat test-conf
#tables = qw(d e f);
sub print_tables {
print "print_tables from test_conf\n";
}
$ ./test-override
a, b, c
print_tables from test_conf
#tables is d e f
The important change I made with #tables was to change it from my, which is visible only within the current scope and the current file, to our, which is visible anywhere within the same package (or from other packages if it's qualified with the package name).
But my print_tables from the config file doesn't call the original print_tables, and you're just out of luck on that one. Because there can only be one &main::print_tables, replacing it completely overwrites the original one, which no longer exists. If you want to be able to override it and still be able to call the original, you need to put the two declarations into different packages, which kind of implies using OO Perl (so that you'll be able to polymorphically call the right one).
Also note that use has the same lexical scoping as my, which means that your use strict; use warnings; does not carry over into the conf file. You can easily demonstrate this by adding a use warnings; to my version of test-conf, at which point it will then generate the warning Subroutine print_tables redefined.

Related

In Perl, why do I get "undefined subroutine" in a perl module but not in main ?

I'm getting an "undefined subroutine" for sub2 in the code below but not for sub1.
This is the perl script (try.pl)...
#!/usr/bin/env perl
use strict;
use IO::CaptureOutput qw(capture_exec_combined);
use FindBin qw($Bin);
use lib "$Bin";
use try_common;
print "Running try.pl\n";
sub1("echo \"in sub1\"");
sub2("echo \"in sub2\"");
exit;
sub sub1 {
(my $cmd) = #_;
print "Executing... \"${cmd}\"\n";
my ($stdouterr, $success, $exit_code) = capture_exec_combined($cmd);
print "${stdouterr}\n";
return;
}
This is try_common.pm...
#! /usr/bin/env perl
use strict;
use IO::CaptureOutput qw(capture_exec_combined);
package try_common;
use Exporter;
our #ISA = qw(Exporter);
our #EXPORT = qw(
sub2
);
sub sub2 {
(my $cmd) = #_;
print "Executing... \"${cmd}\"\n";
my ($stdouterr, $success, $exit_code) = capture_exec_combined($cmd);
print "${stdouterr}\n";
return;
}
1;
When I run try.pl I get...
% ./try.pl
Running try.pl
Executing... "echo "in sub1""
in sub1
Executing... "echo "in sub2""
Undefined subroutine &try_common::capture_exec_combined called at
/home/me/PERL/try_common.pm line 20.
This looks like some kind of scoping issue because if I cut/paste the "use IO::CaptureOutput qw(capture_exec_combined);" as the first line of sub2, it works. This is not necessary in the try.pl (it runs sub1 OK), but a problem in the perl module. Hmmmm....
Thanks in Advance for any help!
You imported capture_exec_combined by the use clause before declaring the package, so it was imported into the main package, not the try_common. Move the package declaration further up.
You should take a look at the perlmod document to understand how modules work. In short:
When you use package A (in Perl 5), you change the namespace of the following code to A, and all global symbol (e.g. subroutine) definitions after that point will go into that package. Subroutines inside a scope need not be exported and may be used preceded by their scope name: A::function. This you seem to have found.
Perl uses package as a way to create modules and split code in different files, but also as the basis for its object orientation features.
Most of the times, modules are handled by a special core module called Exporter. See Exporter. This module uses some variables to know what to do, like #EXPORT, #EXPORT_OK or #ISA. The first defines the names that should be exported by default when you include the module with use Module. The second defines the names that may be exported (but need to be mentioned with use Module qw(name1 name2). The last tells in an object oriented fashion what your module is. If you don't care about object orientation, your module typically "is a" Exporter.
Also, as stated in another answer, when you define a module, the package module declaration should be the first thing to be in the file so anything after it will be under that scope.
I hate when I make this mistake although I don't make it much anymore. There are two habits you can develop:
Most likely, make the entire file the package. The first lines will be the package statement and no other package statements show up in the file.
Or, use the new PACKAGE BLOCK syntax and put everything for that package inside the block. I do this for small classes that I might need only locally:
package Foo {
# everything including use statements go in this block
}
I think I figured it out. If, in the perl module, I prefix the "capture_exec_combined" with "::", it works.
Still, why isn't this needed in the main, try.pl ?

Splitting perl module into multiple files

I'm a perl newbie who is working on a module that is getting quite long and I want to split it into multiple files for easier maintainability and organization. Right now it looks something like this
#ABC.pm
package ABC;
use strict;
use warnings;
my $var1;
my $var2;
sub func1 {
#some operations on a $var
}
sub func2 {
#some operations on a $var
}
return 1;
I'd like it to look something like
#ABC_Part_1.pm
package ABC;
use strict;
use warnings;
my $var1;
my $var2;
sub func1 {
#some operations on a $var
}
return 1;
#ABC_Part_2.pm
package ABC;
use strict;
use warnings;
sub func2 {
#some operations on a $var
}
return 1;
The issue I'm having is getting the variables to be seen across the separate files. I tried to declare them using 'our', but then I have to use the scope resolution operator which I don't want to do. I'd like to treat them as local variables within the module files, but have them hidden to the calling script. I'd also like to only have one include in the calling script, like
#!/usr/bin/env perl
#script.pl
use strict;
use warnings;
use ABC;
func1();
func2();
Thanks
The issue I'm having is getting the variables to be seen across the separate files.
Your best option is to stop wanting that.
The whole point of lexical variables is for them to be accessible in only a small locally visible bit of code. A variable that needs to be accessed from multiple different files is a sign of "code smell".
If you're really sure you want this though…
I tried to declare them using 'our', but then I have to use the scope resolution operator which I don't want to do.
Yes, this should work, but you need to declare the variable at the top of each file you use it in.
# ABC_part_1.pm
package ABC;
our $foo;
# code that accesses $foo goes here
1;
And then:
# ABC_part_2.pm
package ABC;
our $foo;
# code that accesses $foo goes here
1;
You can make your ABC.pm file a collection of require statements.
package ABC;
require ABC1;
require ABC2;
require ABC3;
1;
It's important to work with require instead of use, because use will try to import things automatically. But that will not work since there is no package ABC1 in your ABC1.pm file, so ABC1->import will fail.
Regarding the variables, there's really no way to get lexical variables into different files. You could use do instead of require, which would read and run the files directly in the line with the do. That way, the scope would stay the same, and you could have this.
package ABC;
my $foo;
my $bar;
do 'lib/ABC1.pm';
do 'lib/ABC2.pm';
Please do not do this. It's crazy!
If you feel that your library is getting too big, first add proper documentation to every function, and sort so that things that belong together are together. If that does not help you, split up the file into smaller logical units and make those individual packages that talk to each other through a defined interface, but also are able to stand alone where needed.
If repeating a bunch of use statements feels like too much boiler plate, write your own module collection (like strictures) using Import::Into.
Furthermore, don't use lexical variables in the file scope. If you want to have state for things, create object oriented code and write classes. Then you'll have state and behavior. If you have package/class data, use package variables.
Perl doesn't have the concept of private things for a reason. There are conventions in place to mark things as private, like naming them _stuff with a leading underscore. That's a sign for everyone that this is internal, not a stable API, might change at any moment and shouldn't be messed with. Do that, instead of trying to hide things. It's a strength of Perl to allow you to mess with everything. But that doesn't mean you have to do it. It's an option that you should embrace.

writing reflective method to load variables from conf file, and assigning references?

I'm working with ugly code and trying to do a cleanup by moving values in a module into a configuration file. I want to keep the modules default values if a variable doesn't exist in the conf file, otherwise use the conf file version. There are lots of variables (too many) in the module so I wanted a helper method to support this. This is a first refactoring step, I likely will go further to better handle config variables later, but one step at a time.
I want a method that would take a variable in my module and either load the value from conf or set a default. So something like this (writing this from scratch, so treat it as just pseudocode for now)
Our ($var_a, $var_b ...);
export($var_a, $var_b ...);
my %conf = #load config file
load_var(\$var_a, "foo");
load_var(\$var_b, "$var_abar");
sub load_var($$){
my($variable_ref, $default) = #_
my $variale_name = Dumper($$variable_ref); #get name of variable
my $variable_value = $conf{$variable_name} // $default;
#update original variable by having $variable_ref point to $variable_value
}
So two questions here. First, does anyone know if some functionality like my load_var already exists which I an reuse?
Second, if I have to write it from scratch, can i do it with a perl version older then 5.22? when I read perlref it refers to setting references as being a new feature in 5.22, but it seems odd that such a basic behavior of references wasn't implemented sooner, so I'm wonder if I'm misunderstanding the document. Is there a way to pass a variable to my load_var method and ensure it's actually updated?
For this sort of problem, I would be thinking along the lines of using the AUTOLOAD - I know it's not quite what you asked, but it's sort of doing a similar thing:
If you call a subroutine that is undefined, you would ordinarily get an immediate, fatal error complaining that the subroutine doesn't exist. (Likewise for subroutines being used as methods, when the method doesn't exist in any base class of the class's package.) However, if an AUTOLOAD subroutine is defined in the package or packages used to locate the original subroutine, then that AUTOLOAD subroutine is called with the arguments that would have been passed to the original subroutine.
Something like:
#!/usr/bin/env perl
package Narf;
use Data::Dumper;
use strict;
use warnings;
our $AUTOLOAD;
my %conf = ( fish => 1,
carrot => "banana" );
sub AUTOLOAD {
print "Loading $AUTOLOAD\n";
##read config file
my $target = $AUTOLOAD =~ s/.*:://gr;
print $target;
return $conf{$target} // 0;
}
sub boo {
print "Boo!\n";
}
You can either call it OO style, or just 'normally' - but bear in mind this creates subs, not variables, so you might need to specify the package (or otherwise 'force' an import/export)
#!/usr/bin/env perl
use strict;
use warnings;
use Narf;
print Narf::fish(),"\n";
print Narf::carrot(),"\n";
print Narf::somethingelse(),"\n";
print Narf::boo;
Note - as these are autoloaded, they're not in the local namespace. Related to variables you've got this perlmonks discussion but I'm not convinced that's a good line to take, for all the reasons outlined in Why it's stupid to `use a variable as a variable name'

Name space pollution from indirectly included module

Consider the following script p.pl:
use strict;
use warnings;
use AA;
BB::bfunc();
where the file AA.pm is:
package AA;
use BB;
1;
and the file BB.pm is:
package BB;
sub bfunc {
print "Running bfunc..\n";
}
1;
Running p.pl gives output (with no warnings or errors):
Running bfunc..
Q: Why is it possible to call BB::bfunc() from p.pl even though there is no use BB; in p.pl? Isn't this odd behavior? Or are there situation where this could be useful?
(To me, it seems like this behavior only presents an information leak to another package and violates the data hiding principle.. Leading to programs that are difficult to maintain.. )
You're not polluting a namespace, because the function within BB isn't being 'imported' into your existing namespace.
They are separate, and may be referenced autonomously.
If you're making a module, then usually you'll define via Exporter two lists:
#EXPORT and #EXPORT_OK.
The former is the list of things that should be imported when you use the package. The latter is the things that you can explicity import via:
use MyPackage qw ( some_func );
You can also define package variables in your local namespace via our and reference them via $main.
our $fish = "haddock";
print $main::fish;
When you do this, you're explicitly referencing the main namespace. When you use a module, then you cause perl to go and look for it, and include it in your %INC. I then 'knows about' that namespace - because it must in order for the dependencies to resolve.
But this isn't namespace pollution, because it doesn't include anything in your namespace until your ask.
This might make a bit more sense if you have multiple packages within the same program:
use strict;
use warnings;
package CC;
our $package_var = "Blong";
sub do_something {
print $package_var,"\n";
}
package main;
use Data::Dumper;
our $package_var = "flonk";
print Dumper $package_var;
print Dumper $CC::package_var;
Each package is it's own namespace, but you can 'poke' things in another. perl will also let you do this with object - poking at the innards of instantiated objects or indeed "patch" them.
That's quite powerful, but I'd generally suggest Really Bad Style.
While it's good practice to use or require every dependency that you are planning to access (tried to avoid use here), you don't have to do that.
As long as you use full package names, that is fine. The important part is that Perl knows about the namespaces. If it does not, it will fail.
When you use something, that is equivalent to:
BEGIN {
require Foo::Bar;
Foo::Bar->import();
}
The require will take the Foo::Bar and convert it to a path according to the operating system's conventions. On Linux, it will try to find Foo/Bar.pm somewhere inside #INC. It will then load that file and make a note in %INC that it loaded the file.
Now Perl knows about that namespace. In case of the use it might import something into your own namespace. But it will always be available from everywhere after that as long as you use the full name. Just the same, stuff that you have in your main script.pl would be available inside of packages by saying main::frobnicate(). (Please don't do that!)
It's also not uncommon to bundle several namespaces/packages in one .pm module file. There are quite a few big names on CPAN that do it, like XML::Twig.
If you do that, and don't import anything, the only way to get to the stuff under the different namespaces is by using the full name.
As you can see, this is not polluting at all.

How can I share global values among different packages in Perl?

Is there a standard way to code a module to hold global application parameters to be included in every other package? For instance: use Config;?
A simple package that only contains our variables? What about readonly variables?
There's already a standard Config module, so choose a different name.
Say you have MyConfig.pm with the following contents:
package MyConfig;
our $Foo = "bar";
our %Baz = (quux => "potrzebie");
1;
Then other modules might use it as in
#! /usr/bin/perl
use warnings;
use strict;
use MyConfig;
print "Foo = $MyConfig::Foo\n";
print $MyConfig::Baz{quux}, "\n";
If you don't want to fully qualify the names, then use the standard Exporter module instead.
Add three lines to MyConfig.pm:
package MyConfig;
require Exporter;
our #ISA = qw/ Exporter /;
our #EXPORT = qw/ $Foo %Baz /;
our $Foo = "bar";
our %Baz = (quux => "potrzebie");
1;
Now the full package name is no longer necessary:
#! /usr/bin/perl
use warnings;
use strict;
use MyConfig;
print "Foo = $Foo\n";
print $Baz{quux}, "\n";
You could add a read-only scalar to MyConfig.pm with
our $READONLY;
*READONLY = \42;
This is documented in perlmod.
After adding it to #MyConfig::EXPORT, you might try
$READONLY = 3;
in a different module, but you'll get
Modification of a read-only value attempted at ./program line 12.
As an alternative, you could declare in MyConfig.pm constants using the constant module and then export those.
Don't use global variables for configuration and don't sotre configuration as code. I have an entire chapter in Mastering Perl about this.
Instead, make a configuration class that any other package can use to access the configuration data. It will be much easier in the long run to provide an interface to something you may want to change later than deal with the nuttiness you lock yourself into by scattering variables names you have to support for the rest of your life.
A config interface also gives you the benefit of composing new answers to configuration questions by combining the right bits of actual configuration data. You hide all of that behind a method and the higher levels don't have to see how it's implemented. For instance,
print "Hello!" unless $config->be_silent;
The be_silent answer can be triggered for multiple reasons the higher-level code doesn't need to know about. It could be from a user switch, that the program detected it is not interactive, and so on. It can also be flipped by options such as a debugging switch, which overrides all other preferences. No matter what you decide, that code line doesn't change because that statement only cares about the answer, not how you got the answer.