Perl reading zip files with IO::Uncompress::AnyUncompress - perl

We are moving from our current build system (which is a mess) to one that uses Ant with Ivy. I'm cleaning up all the build files, and finding the jar dependencies. I thought it might be easier if I could automate it a bit, by going through the jars that are checked into the project, finding what classes they contain, then matching those classes with the various import statements in the Java code.
I have used Archive::Tar before, but Archive::Zip isn't a standard Perl module. (My concern is that someone is going to try my script, call me in the middle of the night and tell me it isn't working.)
I noticed that IO::Uncompress::AnyUncompress is a standard module, so I thought I could try IO::Uncompress::AnyUncompressor at leastIO::Uncompress::Unzip` which is also a standard module.
Unfortunately, the documentation for these modules give no examples (According to the documentation, examples are a todo).
I'm able to successfully open my jar and create an object:
my $zip_obj = IO::Uncompress::AnyUncompress->new ( $zip_file );
Now, I want to see the contents. According to the documentation:
getHeaderInfo
Usage is
$hdr = $z->getHeaderInfo();
#hdrs = $z->getHeaderInfo();
This method returns either a hash reference (in scalar context) or a list or hash references (in array context) that contains information about each of the header fields in the compressed data stream(s).
Okay, this isn't an object like Archive::Tar or Archive::Zip returns, and there are no methods or subroutines mentioned to parse the data. I'll use Data::Dumper and see what hash keys are contained in the reference.
Here's a simple test program:
#! /usr/bin/env perl
use 5.12.0;
use warnings;
use IO::Uncompress::AnyUncompress;
use Data::Dumper;
my $obj = IO::Uncompress::AnyUncompress->new("testng.jar")
or die qq(You're an utter failure);
say qq(Dump of \$obj = ) . Dumper $obj;
my #header2 = $obj->getHeaderInfo;
say qq(Dump of \$header = ) . Dumper $headers->[0];
And here's my results:
Dump of $obj = $VAR1 = bless( \*Symbol::GEN0, 'IO::Uncompress::Unzip' );
Dump of $header = $VAR1 = {
'UncompressedLength' => 0,
'Zip64' => 0,
'MethodName' => 'Stored',
'Stream' => 0,
'Time' => 1181224440,
'MethodID' => 0,
'CRC32' => 0,
'HeaderLength' => 43,
'ExtraFieldRaw' => '¦- ',
'ExtraField' => [
[
'¦-',
''
]
],
'FingerprintLength' => 4,
'Type' => 'zip',
'TrailerLength' => 0,
'CompressedLength' => 0,
'Name' => 'META-INF/',
'Header' => 'PK
+N¦6 META-INF/¦- '
};
Some of that looks sort of useful. However, all of my entries return `'Name' => 'META-INF/``, so it doesn't look like a file name.
Is it possible to use IO::Uncompress::AnyUncompress (or even IO::Uncompress:Unzip) to read through the archive and see what files are in its contents. And, if so, how do I parse that header?
Otherwise, I'll have to go with Archive::Zip and let people know they have to download and install it from CPAN on their systems.

The files in the archive are compressed in different data streams, so you need to iterate through the streams to get the individual files.
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);
my $zipfile = 'zipfile.zip';
my $u = new IO::Uncompress::Unzip $zipfile
or die "Cannot open $zipfile: $UnzipError";
die "Zipfile has no members"
if ! defined $u->getHeaderInfo;
for (my $status = 1; $status > 0; $status = $u->nextStream) {
my $name = $u->getHeaderInfo->{Name};
warn "Processing member $name\n" ;
if ($name =~ /\/$/) {
mkdir $name;
}
else {
unzip $zipfile => $name, Name => $name
or die "unzip failed: $UnzipError\n";
}
}

Related

Using a perl script to loop through files in directory

I have a directory filled with several thousand .txt files. I need to run the same perl script on each .txt file and when it's done running the script on each .txt file, to name that file a unique name. Please forgive the basic inquiry as I am learning perl with this script for the first time.
I have seen other posts addressing this issue: How can I loop through files in a directory in Perl? and running loops through terminal: Take all files in dir and for each file do the same perl procedure.
A bit about my data: These are blastx subject sequence ID results
$head file1.txt
GCN2_SCHPO
GCN2_YEAST
GCN20_YEAST
GCNK_GLUOX
$head file2.txt
PDXA_RUEST
PDXA_SULSY
PDXA_SYNFM
PDXA_SYNY3
My perl script- uses Uniprot's Retrieve/ID mapping service programmatically, instead of putting in thousands of requests, manually (Retrieve/ID mapping):
use warnings;
use LWP::UserAgent;
#files = <*.txt>; # File containg list of UniProt IDs.
my $base = 'http://www.uniprot.org';
my $tool = 'uploadlists';
my $contact = ''; # Please set your email address here to help us debug in
case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push #{$agent->requests_redirectable}, 'POST';
foreach $file (#files) {
my $response = $agent->post("$base/$tool/",
[ 'file' => [#files],
'format' => 'tab',
'from' => 'ACC+ID',
'to' => 'ACC',
'columns' => 'id,database(ko)',
],
'Content_Type' => 'form-data');
while (my $wait = $response->header('Retry-After')) {
print STDERR "Waiting ($wait)...\n";
sleep $wait;
$response = $agent->get($response->base);
}
$response->is_success ?
print $response->content :
die 'Failed, got ' . $response->status_line .
' for ' . $response->request->uri . "\n";
print $file . "\n";
}
This script, instead of looping through each .txt file only grabs the first .txt file in my directory and performs this function over and over again on that one file only. However, at the end, it prints the correct file name. Here is an example output:
Entry Cross-reference (ko) yourlist:M20170501A isomap:M201705
Q9HGN1 K16196; GCN2_SCHPO
P15442 K16196; GCN2_YEAST
P43535 K06158; GCN20_YEAST
Q5FQ97 K00851; GCNK_GLUOX
file1.txt
Entry Cross-reference (ko) yourlist:M20170501A isomap:M201705
Q9HGN1 K16196; GCN2_SCHPO
P15442 K16196; GCN2_YEAST
P43535 K06158; GCN20_YEAST
Q5FQ97 K00851; GCNK_GLUOX
file2.txt
I have tried to do this via terminal with the following loop as well:
for i in *; do perl script.pl $i $i.txt; done
and I get the same results.
I am missing something very simple and am asking for your wisdom on understanding why this loop is being loopy. Secondly, is there a way to code this (in the script or via terminal) to name each result of each .txt file differently?
Thank-you!
Your for loop foreach $file (#files) { ... }executes the following block repeatedly, setting $file to each file name in turn. But inside the loop you try to pass all of the files at once, using the parameter 'file' => [#files]
LWP treats that list as a file path, a file name, and a number of header names and values, so the data uploaded always comes from the first file in #files
The quick solution is to to replace that line with file => [ $file ] and then it should work, but there are a few other issues with you code so I've written this refactoring
I'm not in a position to test this at present, but it does compile
use strict;
use warnings 'all';
use LWP::UserAgent;
my #files = glob '*.txt'; # Files containg list of UniProt IDs.
my $base = 'http://www.uniprot.org';
my $tool = 'uploadlists';
my $contact = ''; # Please set your email address here
# to help us debug in case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push #{$agent->requests_redirectable}, 'POST';
for my $file ( #files ) {
my $response = $agent->post(
"$base/$tool/",
Content_Type => 'form-data',
Content => [
file => [ $file ],
format => 'tab',
from => 'ACC+ID',
to => 'ACC',
columns => 'id,database(ko)',
],
);
while ( my $wait = $response->header('Retry-After') ) {
print STDERR "Waiting ($wait)...\n";
sleep $wait;
$response = $agent->get($response->base);
}
if ( $response->is_success ) {
print $response->content;
}
else {
die sprintf "Failed. Got %s for %s\n",
$response->request->uri,
$response->status_line;
}
}

Iterate directories in Perl, getting introspectable objects as result

I'm about to start a script that may have some file lookups and manipulation, so I thought I'd look into some packages that would assist me; mostly, I'd like the results of the iteration (or search) to be returned as objects, which would have (base)name, path, file size, uid, modification time, etc as some sort of properties.
The thing is, I don't do this all that often, and tend to forget APIs; when that happens, I'd rather let the code run on an example directory, and dump all of the properties in an object, so I can remind myself what is available where (obviously, I'd like to "dump", in order to avoid having to code custom printouts). However, I'm aware of the following:
list out all methods of object - perlmonks.org
"Out of the box Perl doesn't do object introspection. Class wrappers like Moose provide introspection as part of their implementation, but Perl's built in object support is much more primitive than that."
Anyways, I looked into:
"Files and Directories Handling in Perl - Perl Beginners' Site" http://perl-begin.org/topics/files-and-directories/
... and started looking into the libraries referred there (also related link: rjbs's rubric: the speed of Perl file finders).
So, for one, File::Find::Object seems to work for me; this snippet:
use Data::Dumper;
#targetDirsToScan = ("./");
use File::Find::Object;
my $tree = File::Find::Object->new({}, #targetDirsToScan);
while (my $robh = $tree->next_obj()) {
#print $robh ."\n"; # prints File::Find::Object::Result=HASH(0xa146a58)}
print Dumper($robh) ."\n";
}
... prints this:
# $VAR1 = bless( {
# 'stat_ret' => [
# 2054,
# 429937,
# 16877,
# 5,
# 1000,
# 1000,
# 0,
# '4096',
# 1405194147,
# 1405194139,
# 1405194139,
# 4096,
# 8
# ],
# 'base' => '.',
# 'is_link' => '',
# 'is_dir' => 1,
# 'path' => '.',
# 'dir_components' => [],
# 'is_file' => ''
# }, 'File::Find::Object::Result' );
# $VAR1 = bless( {
# 'base' => '.',
# 'is_link' => '',
# 'is_dir' => '',
# 'path' => './test.blg',
# 'is_file' => 1,
# 'stat_ret' => [
# 2054,
# 423870,
# 33188,
# 1,
# 1000,
# 1000,
# 0,
# '358',
# 1404972637,
# 1394828707,
# 1394828707,
# 4096,
# 8
# ],
# 'basename' => 'test.blg',
# 'dir_components' => []
... which is mostly what I wanted, except the stat results are an array, and I'd have to know its layout (($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) stat - perldoc.perl.org) to make sense of the printout.
Then I looked into IO::All, which I like because of utf-8 handling (but also, say, socket functionality, which would be useful to me for an unrelated task in the same script); and I was thinking I'd use this package instead. The problem is, I have a very hard time discovering what the available fields in the object returned are; e.g. with this code:
use Data::Dumper;
#targetDirsToScan = ("./");
use IO::All -utf8;
$io = io(#targetDirsToScan);
#contents = $io->all(0);
for my $contentry ( #contents ) {
#print Dumper($contentry) ."\n";
# $VAR1 = bless( \*Symbol::GEN298, 'IO::All::File' );
# $VAR1 = bless( \*Symbol::GEN307, 'IO::All::Dir' ); ...
#print $contentry->uid . " -/- " . $contentry->mtime . "\n";
# https://stackoverflow.com/q/24717210/printing-ret-of-ioall-w-datadumper
print Dumper \%{*$contentry}; # doesn't list uid
}
... I get a printout like this:
# $VAR1 = {
# '_utf8' => 1,
# 'constructor' => sub { "DUMMY" },
# 'is_open' => 0,
# 'io_handle' => undef,
# 'name' => './test.blg',
# '_encoding' => 'utf8',
# 'package' => 'IO::All'
# };
# $VAR1 = {
# '_utf8' => 1,
# 'constructor' => sub { "DUMMY" },
# 'mode' => undef,
# 'name' => './testdir',
# 'package' => 'IO::All',
# 'is_absolute' => 0,
# 'io_handle' => undef,
# 'is_open' => 0,
# '_assert' => 0,
# '_encoding' => 'utf8'
... which clearly doesn't show attributes like mtime, etc. - even if they exist (which you can see if you uncomment the respective print line).
I've also tried Data::Printer's (How can I perform introspection in Perl?) p() function - it prints exactly the same fields as Dumper. I also tried to use print Dumper \%{ref ($contentry) . "::"}; (list out all methods of object - perlmonks.org), and this prints stuff like:
'O_SEQUENTIAL' => *IO::All::File::O_SEQUENTIAL,
'mtime' => *IO::All::File::mtime,
'DESTROY' => *IO::All::File::DESTROY,
...
'deep' => *IO::All::Dir::deep,
'uid' => *IO::All::Dir::uid,
'name' => *IO::All::Dir::name,
...
... but only if you use the print $contentry->uid ... line beforehand; else they are not listed! I guess that relates to this:
introspection - How do I list available methods on a given object or package in Perl? #911294
In general, you can't do this with a dynamic language like Perl. The package might define some methods that you can find, but it can also make up methods on the fly that don't have definitions until you use them. Additionally, even calling a method (that works) might not define it. That's the sort of things that make dynamic languages nice. :)
Still, that prints the name and type of the field - I'd want the name and value of the field instead.
So, I guess my main question is - how can I dump an IO::All result, so that all fields (including stat ones) are printed out with their names and values (as is mostly the case with File::Find::Object)?
(I noticed the IO::All results can be of type, say, IO::All::File, but its docs defer to "See IO::All", which doesn't discuss IO::All::File explicitly much at all. I thought, if I could "cast" \%{*$contentry} to a IO::All::File, maybe then mtime etc fields will be printed - but is such a "cast" possible at all?)
If that is problematic, are there other packages, that would allow introspective printout of directory iteration results - but with named fields for individual stat properties?
Perl does introspection in the fact that an object will tell you what type of object it is.
if ( $object->isa("Foo::Bar") ) {
say "Object is of a class of Foo::Bar, or is a subclass of Foo::Bar.";
}
if ( ref $object eq "Foo::Bar" ) {
say "Object is of the class Foo::Bar.";
}
else {
say "Object isn't a Foo::Bar object, but may be a subclass of Foo::Bar";
}
You can also see if an object can do something:
if ( $object->can("quack") ) {
say "Object looks like a duck!";
}
What Perl can't do directly is give you a list of all the methods that a particular object can do.
You might be able to munge some way.Perl objects are stored in package namespaces which are in the symbol table. Classes are implemented via Perl subroutines. It may be possible to go through the package namespace and then find all the subroutines.
However, I can see several issues. First private methods (the ones you're not suppose to use) and non-method subroutines would also be included. There's no way to know which is which. Also, parent methods won't be listed.
Many languages can generate such a list of methods for their objects (I believe both Python and Ruby can), but these usually give you a list without an explanation what these do. For example, File::Find::Object::Result (which is returned by the next_obj method of File::Find::Object) has a base method. What does it do? Maybe it's like basename and gives me the name of the file. Nope, it's like dirname and gives me the name of the directory.
Again, some languages could give a list of those methods for an object and a description. However, those descriptions depend upon the programmer to maintain and make sure they're correct. No guaranteed of that.
Perl doesn't have introspection, but all Perl modules stored in CPAN must be documented via POD embedded documentation, and this is printable from the command line:
$ perldoc File::Find::Object
This is the documentation you see in CPAN pages, in http://Perldoc.perl.org and in ActiveState's Perl documentation.
It's not bad. It's not true introspection, but the documentation is usually pretty good. After all, if the documentation stunk, I probably wouldn't have installed that module in the first place. I use perldoc all the time. I can barely remember my kids' names let alone the way to use Perl classes that I haven't used in a few months, but I find that using perldoc works pretty wall.
What you should not do is use Data::Dumper to dump out objects and try to figure out what they contain and possible methods. Some cleaver programmers are using Inside-Out Objects to thwart peeking toms.
So no, Perl doesn't list methods of a particular class like some languages can, but perldoc comes pretty close to doing what you need. I haven't use File::Find::Object in a long while, but going over the perldoc, I probably could write up such a program without much difficulty.
As I answered to your previous question, it is not a good idea to go relying on the guts of objects in Perl. Instead just call methods.
If IO::All doesn't offer a method that gives you the information that you need, you might be able to write your own method for it that assembles that information using just the documented methods provided by IO::All...
use IO::All;
# Define a new method for IO::All::Base to use, but
# define it in a lexical variable!
#
my $dump_info = sub {
use Data::Dumper ();
my $self = shift;
local $Data::Dumper::Terse = 1;
local $Data::Dumper::Sortkeys = 1;
return Data::Dumper::Dumper {
name => $self->name,
mtime => $self->mtime,
mode => $self->mode,
ctime => $self->ctime,
};
};
$io = io('/tmp');
for my $file ( $io->all(0) ) {
print $file->$dump_info();
}
Ok, this is more-less as an exercise (and reminder for me); below is some code, where I've tried to define a class (File::Find::Object::StatObj) with accessor fields for all of the stat fields. Then, I have the hack for IO::All::File from Replacing a class in Perl ("overriding"/"extending" a class with same name)?, where a mtimef field is added which corresponds to mtime, just as a reminder.
Then, just to see what sort of interface I could have between the two libraries, I have IO::All doing the iterating; and the current file path is passed to File::Find::Object, from which we obtain a File::Find::Object::Result - which has been "hacked" to also show the File::Find::Object::StatObj; but that one is only generated after a call to the hacked Result's full_components (that might as well have been a separate function). Notice that in this case, you won't get full_components/dir_components of File::Find::Object::Result -- because apparently it is not File::Find::Object doing the traversal here, but IO::All. Anyways, the result is something like this:
# $VAR1 = {
# '_utf8' => 1,
# 'mtimef' => 1403956165,
# 'constructor' => sub { "DUMMY" },
# 'is_open' => 0,
# 'io_handle' => undef,
# 'name' => 'img/test.png',
# '_encoding' => 'utf8',
# 'package' => 'IO::All'
# };
# img/test.png
# > - $VAR1 = bless( {
# 'base' => 'img/test.png',
# 'is_link' => '',
# 'is_dir' => '',
# 'path' => 'img/test.png',
# 'is_file' => 1,
# 'stat_ret' => [
# 2054,
# 426287,
# 33188,
# 1,
# 1000,
# 1000,
# 0,
# '37242',
# 1405023944,
# 1403956165,
# 1403956165,
# 4096,
# 80
# ],
# 'basename' => undef,
# 'stat_obj' => bless( {
# 'blksize' => 4096,
# 'ctime' => 1403956165,
# 'rdev' => 0,
# 'blocks' => 80,
# 'uid' => 1000,
# 'dev' => 2054,
# 'mtime' => 1403956165,
# 'mode' => 33188,
# 'size' => '37242',
# 'nlink' => 1,
# 'atime' => 1405023944,
# 'ino' => 426287,
# 'gid' => 1000
# }, 'File::Find::Object::StatObj' ),
# 'dir_components' => []
# }, 'File::Find::Object::Result' );
I'm not sure how correct this would be, but what I like about this is that I could forget where the fields are; then I could rerun the dumper, and see that I could get mtime via (*::Result)->stat_obj->size - and that seems to work (here I'd need just to read these, not to set them).
Anyways, here is the code:
use Data::Dumper;
my #targetDirsToScan = ("./");
use IO::All -utf8 ; # Turn on utf8 for all io
# try to "replace" the IO::All::File class
{ # https://stackoverflow.com/a/24726797/277826
package IO::All::File;
use IO::All::File; # -base; # just do not use `-base` here?!
# hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
# NB: field is a sub in /usr/local/share/perl/5.10.1/IO/All/Base.pm
field mtimef => undef; # hack
sub file {
my $self = shift;
bless $self, __PACKAGE__;
$self->name(shift) if #_;
$self->mtimef($self->mtime); # hack
#print("!! *haxx0rz'd* file() reporting in\n");
return $self->_init;
}
1;
}
use File::Find::Object;
# based on /usr/local/share/perl/5.10.1/File/Find/Object/Result.pm;
# but inst. from /usr/local/share/perl/5.10.1/File/Find/Object.pm
{
package File::Find::Object::StatObj;
use integer;
use Tie::IxHash;
#use Data::Dumper;
sub ordered_hash { # https://stackoverflow.com/a/3001400/277826
#my (#ar) = #_; #print("# ". join(",",#ar) . "\n");
tie my %hash => 'Tie::IxHash';
%hash = #_; #print Dumper(\%hash);
\%hash
}
my $fields = ordered_hash(
# from http://perldoc.perl.org/functions/stat.html
(map { $_ => $_ } (qw(
dev ino mode nlink uid gid rdev size
atime mtime ctime blksize blocks
)))
); #print Dumper(\%{$fields});
use Class::XSAccessor
#accessors => %{$fields}, # cannot - is seemingly late
# ordered_hash gets accepted, but doesn't matter in final dump;
#accessors => { (map { $_ => $_ } (qw(
accessors => ordered_hash( (map { $_ => $_ } (qw(
dev ino mode nlink uid gid rdev size
atime mtime ctime blksize blocks
))) ),
#))) },
;
use Fcntl qw(:mode);
sub new
{
#my $self = shift;
my $class = shift;
my #stat_arr = #_; # the rest
my $ic = 0;
my $self = {};
bless $self, $class;
for my $k (keys %{$fields}) {
$fld = $fields->{$k};
#print "$ic '$k' '$fld' ".join(", ",$stat_arr[$ic])." ; ";
$self->$fld($stat_arr[$ic]);
$ic++;
}
#print "\n";
return $self;
}
1;
}
# try to "replace" the File::Find::Object::Result
{
package File::Find::Object::Result;
use File::Find::Object::Result;
#use File::Find::Object::StatObj; # no, has no file!
use Class::XSAccessor replace => 1,
accessors => {
(map { $_ => $_ } (qw(
base
basename
is_dir
is_file
is_link
path
dir_components
stat_ret
stat_obj
)))
}
;
#use Fcntl qw(:mode);
#sub new # never gets called
sub full_components
{
my $self = shift; #print("NEWCOMP\n");
my $sobj = File::Find::Object::StatObj->new(#{$self->stat_ret()});
$self->stat_obj($sobj); # add stat_obj and its fields
return
[
#{$self->dir_components()},
($self->is_dir() ? () : $self->basename()),
];
}
1;
}
# main script start
my $io = io($targetDirsToScan[0]);
my #contents = $io->all(0); # Get all contents of dir
for my $contentry ( #contents ) {
print Dumper \%{*$contentry};
print $contentry->name . "\n"; # img/test.png
# get a File::Find::Object::Result - must instantiate
# a File::Find::Object; just item_obj() will return undef
# right after instantiation, so must give it "next";
# no instantition occurs for $tro, though!
#my $tffor = File::Find::Object->new({}, ($contentry->name))->next_obj();
my $tffo = File::Find::Object->new({}, ("./".$contentry->name));
my $tffos = $tffo->next(); # just a string!
$tffo->_calc_current_item_obj(); # unfortunately, this will not calculate dir_components ...
my $tffor = $tffo->item_obj();
# ->full_components doesn't call new, either!
# must call full_compoments, to generate the fields
# (assign to unused variable triggers it fine)
# however, $arrref_fullcomp will be empty, because
# File::Find::Object seemingly calcs dir_components only
# if it is traversing a tree...
$arrref_fullcomp = $tffor->full_components;
#print("# ".$tffor->stat_obj->size."\n"); # seems to work
print "> ". join(", ", #$arrref_fullcomp) ." - ". Dumper($tffor);
}

Dynamically loading Perl modules

I am trying to make an extensible system whereby I can code up a new module to be a handler. I want the program to automatically load any new .pm file that is put into the Handlers directory and conforms to a Moose::Role interface.
I'm wondering whether there is a Perl module or a more Moose sanctioned way to do this automatically? Here is what I have built so far, but it seems a little verbose and there has got to be a simpler way to do it.
handler.pl contains:
#!/usr/bin/perl
use Handler;
use Data::Dumper;
my $base_handler = Handler->new();
$base_handler->load_modules('SysG/Handler');
print Dumper($base_handler);
Handler.pm contains:
package Handler;
use Moose;
has 'handlers' => ( traits => ['Array'], handles => { add_handler => 'push' } );
sub load_modules {
my ($self,$dir) = #_;
push(#INC, $dir);
my #modules = find_modules_to_load($dir);
eval {
# Note that this sort is important. The processing order will be critically important.
# The sort implies the sort order
foreach my $module ( sort #modules) {
(my $file = $module) =~ s|::|/|g;
print "About to load $file.pm for module $module\n" ;
require $file . '.pm';
$module->import();
my $obj = $module->new();
$self->add_handler($obj);
1;
}
} or do {
my $error = $#;
print "Error loading modules: $error" if $error;
};
}
sub find_modules_to_load {
my ($dir) = #_;
my #files = glob("$dir/*.pm");
my $namespace = $dir;
$namespace =~ s/\//::/g;
# Get the leaf name and add the System::Module namespace to it
my #modules = map { s/.*\/(.*).pm//g; "${namespace}::$1"; } #files;
die "ERROR: No classifier modules found in $dir\n" unless #modules;
return #modules;
}
1;
Then I have made a directory called SysG/Handler and added two .pm files which ordinarily will conform to a Moose::Role (as if to define an interface that must be adhered too).
The SysG::Handler::0001_HandleX.pm stub contains:
package SysG::Handler::0001_HandleX;
use Moose;
1;
The SysG::Handler::0002_HandleX.pm stub contains:
package SysG::Handler::0002_HandleY;
use Moose;
1;
Put all this together and the Data::Dumper result is:
$VAR1 = bless( {
'handlers' => [
bless( {}, 'SysG::Handler::0001_HandleX' ),
bless( {}, 'SysG::Handler::0002_HandleY' )
]
}, 'Handler' );
So, now I repeat my original question: There must be a simpler way, or a module or a Moose way to automatically load any modules in a specific directory.
Any Moose experts able to help out here?
MooseX::Object::Pluggable

Separating configuration data and script logic in Perl scripts

I find the following anti-pattern repeated in my Perl scripts: the script contains some machine/setup specific settings which I store in-line as constants in the script whereas the rest of the script is general in nature:
#!/usr/bin/perl
use strict;
use warnings;
# machine specific settings at the start of the script.
my $SETTING_1 = "foo";
my #SETTING_2 = ("123", "456");
my $SETTING_3 = "something";
# general part of script follows.
...
This pattern is somewhat okay when running on one machine, but as soon as I want to distribute the script to multiple machines the trouble starts since I must keep track so that I do not overwrite the settings part with new updates in the general part.
The correct solution is obviously to have one general script file and have it read a configuration file which is specific to the environment that the script runs in.
My question is: What CPAN module would you recommend for solving this problem? Why?
For configuration files, I like to use YAML. Simple, cross-platform, human-readable, and no danger of your configuration accidentally morphing into an actual program.
My favorite is Config::Std. I like the way it handles multi-line and multi-part configuration values.
You have to be careful when a variable is potentially multi-valued: If a single value exists in the configuration file, it will store the value in a scalar; if multiple values exist, you will get an array reference.
I find it convenient to have two configuration files: One for values that describe the operating environment (where to find libraries etc) and another for user-modifiable behavior.
I also like to write a wrapper around it. For example (updated to include autogenerated read-only accessors):
#!/usr/bin/perl
package My::Config;
use strict; use warnings;
use Config::Std;
use FindBin qw($Bin);
use File::Spec::Functions qw( catfile );
sub new {
my $class = shift;
my ($config_file) = #_;
$config_file = catfile($Bin, 'config.ini');
read_config $config_file => my %config;
my $object = bless \%config => $class;
$object->gen_accessors(
single => {
install => [ qw( root ) ],
},
multi => {
template => [ qw( dir ) ],
},
);
return $object;
}
sub gen_accessors {
my $config = shift;
my %args = #_;
my $class = ref $config;
{
no strict 'refs';
for my $section ( keys %{ $args{single} } ) {
my #vars = #{ $args{single}->{$section} };
for my $var ( #vars ) {
*{ "${class}::${section}_${var}" } = sub {
$config->{$section}{$var};
};
}
}
for my $section ( keys %{ $args{multi} } ) {
my #vars = #{ $args{multi}->{$section} };
for my $var ( #vars ) {
*{ "${class}::${section}_${var}" } = sub {
my $val = $config->{$section}{$var};
return [ $val ] unless 'ARRAY' eq ref $val;
return $val;
}
}
}
}
return;
}
package main;
use strict; use warnings;
my $config = My::Config->new;
use Data::Dumper;
print Dumper($config->install_root, $config->template_dir);
C:\Temp> cat config.ini
[install]
root = c:\opt
[template]
dir = C:\opt\app\tmpl
dir = C:\opt\common\tmpl
Output:
C:\Temp> g.pl
$VAR1 = 'c:\\opt';
$VAR2 = [
'C:\\opt\\app\\tmpl',
'C:\\opt\\common\\tmpl'
];
The Config:Properties library is good for reading and writing key/value pair property files.
I prefer YAML and YAML::XS for configuration data. It's simple, readable, and has bindings for almost any programming language. Another popular choice is Config::General.
The usual low-tech method is to simply do EXPR a configuration file. Have you looked into this?
At the risk of being laughed out of class, one solution is to store the config in XML (or for more adventurous, JSON). Human-consumable, interoperable outside of Perl, doesn't have to live on local PC (both XML and JSON can be requested off of a "config URL") and a bunch of standard modules (XML::Simple is usually good enough for config XML files) exist on CPAN.
For simple configuration like this, especially for trivial things where I don't expect this data to change in the real world, I often simply use YAML. The simplicity cannot be beat:
First, write your Perl data structure containing your configuration.
use YAML;
my $SETTINGS = {
'1' => "foo",
'2' => ["123", "456"],
'3' => "something",
};
Then, pass it to YAML::DumpFile();
YAML::DumpFile("~/.$appname.yaml", $SETTINGS);
Delete the data structure and replace it with
my $SETTINGS = YAML::LoadFile("~/.$appname.yaml");
And then forget about it. Even if you don't know or want to learn YAML syntax, small changes to the config can be made by hand and more major ones can be done in Perl and then re-dumped to YAML.
Don't tie yourself to a format -- use Config::Any, or for a little more whizbang DWIM factor, Config::JFDI (which itself wraps Config::Any). With them you buy yourself the ability to support INI, YAML, XML, Apache-style config, and more.
Config::JFDI builds on this by trying to capture some of the magic of Catalyst's config loader: merging of instance-local config with app-wide config, environment variable support, and a limited macro facility (__path_to(foo/bar)__ comes in handy surprisingly often.)

How do I convert Data::Dumper output back into a Perl data structure?

I was wondering if you could shed some lights regarding the code I've been doing for a couple of days.
I've been trying to convert a Perl-parsed hash back to XML using the XMLout() and XMLin() method and it has been quite successful with this format.
#!/usr/bin/perl -w
use strict;
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1 );
Topology:$VAR1 = {
'device' => {
'FOC1047Z2SZ' => {
'ChassisID' => '2009-09',
'Error' => undef,
'Group' => {
'ID' => 'A1',
'Type' => 'Base'
},
'Model' => 'CATALYST',
'Name' => 'CISCO-SW1',
'Neighbor' => {},
'ProbedIP' => 'TEST',
'isDerived' => 0
}
},
'issues' => [
'TEST'
]
};
# create object
my $xml = new XML::Simple (NoAttr=>1,
RootName=>'data',
SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($VAR1);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
I can access all the element in the XML with no problem.
But when I try to create a file that will house the parsed hash, problem arises because I can't seem to access all the XML elements. I guess, I wasn't able to unparse the file with the following code.
#!/usr/bin/perl -w
use strict;
#!/usr/bin/perl
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1, $line_Holder );
#this is the file that contains the parsed hash
my $saveOut = "C:/parsed_hash.txt";
my $result_Holder = IO::File->new($saveOut, 'r');
while ($line_Holder = $result_Holder->getline){
print $line_Holder;
}
# create object
my $xml = new XML::Simple (NoAttr=>1, RootName=>'data', SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($line_Holder);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
Do you have any idea how I could access the $VAR1 inside the text file?
Regards,
newbee_me
$data = $xml->XMLout($line_Holder);
$line_Holder has only the last line of your file, not the whole file, and not the perl hashref that would result from evaling the file. Try something like this:
my $ref = do $saveOut;
The do function loads and evals a file for you. You may want to do it in separate steps, like:
use File::Slurp "read_file";
my $fileContents = read_file( $saveOut );
my $ref = eval( $fileContents );
You might want to look at the Data::Dump module as a replacement for Data::Dumper; its output is already ready to re-eval back.
Basically to load Dumper data you eval() it:
use strict;
use Data::Dumper;
my $x = {"a" => "b", "c"=>[1,2,3],};
my $q = Dumper($x);
$q =~ s{\A\$VAR\d+\s*=\s*}{};
my $w = eval $q;
print $w->{"a"}, "\n";
The regexp (s{\A\$VAR\d+\s*=\s*}{}) is used to remove $VAR1= from the beginning of string.
On the other hand - if you need a way to store complex data structure, and load it again, it's much better to use Storable module, and it's store() and retrieve() functions.
This has worked for me, for hashes of hashes. Perhaps won't work so well with structures which contain references other structures. But works well enough for simple structures, like arrays, hashes, or hashes of hashes.
open(DATA,">",$file);
print DATA Dumper(\%g_write_hash);
close(DATA);
my %g_read_hash = %{ do $file };
Please use dump module as a replacement for Data::Dumper
You can configure the variable name used in Data::Dumper's output with $Data::Dumper::Varname.
Example
use Data::Dumper
$Data::Dumper::Varname = "foo";
my $string = Dumper($object);
eval($string);
...will create the variable $foo, and should contain the same data as $object.
If your data structure is complicated and you have strange results, you may want to consider Storable's freeze() and thaw() methods.