How to include a data file with a Perl module? - perl

What is the "proper" way to bundle a required-at-runtime data file with a Perl module, such that the module can read its contents before being used?
A simple example would be this Dictionary module, which needs to read a list of (word,definition) pairs at startup.
package Reference::Dictionary;
# TODO: This is the Dictionary, which needs to be populated from
# data-file BEFORE calling Lookup!
our %Dictionary;
sub new {
my $class = shift;
return bless {}, $class;
}
sub Lookup {
my ($self,$word) = #_;
return $Dictionary{$word};
}
1;
and a driver program, Main.pl:
use Reference::Dictionary;
my $dictionary = new Reference::Dictionary;
print $dictionary->Lookup("aardvark");
Now, my directory structure looks like this:
root/
Main.pl
Reference/
Dictionary.pm
Dictionary.txt
I can't seem to get Dictionary.pm to load Dictionary.txt at startup. I've tried a few methods to get this to work, such as...
Using BEGIN block:
BEGIN {
open(FP, '<', 'Dictionary.txt') or die "Can't open: $!\n";
while (<FP>) {
chomp;
my ($word, $def) = split(/,/);
$Dictionary{$word} = $def;
}
close(FP);
}
No dice: Perl is looking in cwd for Dictionary.txt, which is the path of the main script ("Main.pl"), not the path of the module, so this gives File Not Found.
Using DATA:
BEGIN {
while (<DATA>) {
chomp;
my ($word, $def) = split(/,/);
$Dictionary{$word} = $def;
}
close(DATA);
}
and at end of module
__DATA__
aardvark,an animal which is definitely not an anteater
abacus,an oldschool calculator
...
This too fails because BEGIN executes at compile-time, before DATA is available.
Hard-code the data in the module
our %Dictionary = (
aardvark => 'an animal which is definitely not an anteater',
abacus => 'an oldschool calculator'
...
);
Works, but is decidedly non-maintainable.
Similar question here: How should I distribute data files with Perl modules? but that one deals with modules installed by CPAN, not modules relative to the current script as I'm attempting to do.

There's no need to load the dictionary at BEGIN time. BEGIN time is relative to the file being loaded. When your main.pl says use Dictionary, all the code in Dictionary.pm is compiled and loaded. Put the code to load it early in Dictionary.pm.
package Dictionary;
use strict;
use warnings;
my %Dictionary; # There is no need for a global
while (<DATA>) {
chomp;
my ($word, $def) = split(/,/);
$Dictionary{$word} = $def;
}
You can also load from Dictionary.txt located in the same directory. The trick is you have to provide an absolute path to the file. You can get this from __FILE__ which is the path to the current file (ie. Dictionary.pm).
use File::Basename;
# Get the directory Dictionary.pm is located in.
my $dir = dirname(__FILE__);
open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";
my %Dictionary;
while (<$fh>) {
chomp;
my ($word, $def) = split(/,/);
$Dictionary{$word} = $def;
}
close($fh);
Which should you use? DATA is easier to distribute. A separate, parallel file is easier for non-coders to work on.
Better than loading the whole dictionary when the library is loaded, it is more polite to wait to load it when it's needed.
use File::Basename;
# Load the dictionary from Dictionary.txt
sub _load_dictionary {
my %dictionary;
# Get the directory Dictionary.pm is located in.
my $dir = dirname(__FILE__);
open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";
while (<$fh>) {
chomp;
my ($word, $def) = split(/,/);
$dictionary{$word} = $def;
}
return \%dictionary;
}
# Get the possibly cached dictionary
my $Dictionary;
sub _get_dictionary {
return $Dictionary ||= _load_dictionary;
}
sub new {
my $class = shift;
my $self = bless {}, $class;
$self->{dictionary} = $self->_get_dictionary;
return $self;
}
sub lookup {
my $self = shift;
my $word = shift;
return $self->{dictionary}{$word};
}
Each object now contains a reference to the shared dictionary (eliminating the need for a global) which is only loaded when an object is created.

I suggest using DATA with INIT instead of BEGIN to ensure that the data is initialised before run time. It also makers it more self-documenting
Or it may be more appropriate to use a UNITCHECK block, which will be executed as early as possible, immediately after the library file is compiled, and so can be considered as an extension of the compilation
package Dictionary;
use strict;
use warnings;
my %dictionary;
UNITCHECK {
while ( <DATA> ) {
chomp;
my ($k, $v) = split /,/;
$dictionary{$k} = $v;
}
}

Related

include/eval perl file into unique namespace defined at runtime

I'm writing a tool that must import a number of other perl config files. The files are not wrapped w/packages and may have similar or conflicting variables/functions. I don't have the ability to change the format of these files, so I must work around what they are. What I was thinking to do was import each into a unique name space, but I've not found a way to do that using do, require, or use. If I don't use dynamic names, just a hardcoded name, I can do it.
Want something like this:
sub sourceTheFile {
my ($namespace, $file) = #_;
package $namespace;
do $file;
1;
return;
}
That doesn't work because the package command requires a constant for the name. So then I try something like this:
sub sourceTheFile {
my ($namespace, $file) = #_;
eval "package $namespace;do $file;1;"
return;
}
But the contents of the file read by do are placed in the main:: scope not the one I want. The target scope is created, just not populated by the
do. (I tried require, and just a straight cat $file inside the eval as well.)
I'm using Devel::Symdump to verify that the namespaces are built correctly or not.
example input file:
my $xyz = "some var";
%all_have_this = ( common=>"stuff" );
ADDITIONAL CHALLENGE
Using the answer that does the temp file build and do call, I can make this work dynamically as I require. BUT, big but, how do I now reference the data inside this new namespace? Perl doesn't seem to have the lose ability to build a variable name from a string and use that as the variable.
I am not sure why the eval did not work. Maybe a bug? Here is a workaround using a temp file. This works for me:
use strict;
use warnings;
use Devel::Symdump;
use File::Temp;
my $file = './test.pl';
my $namespace = 'TEST';
{
my $fh = File::Temp->new();
print $fh "package $namespace;\n";
print $fh "do '$file';\n";
print $fh "1;\n";
close $fh;
do $fh->filename;
}
Perl's use and require facilities make use of any hooks you might have installed in #INC. You can simply install a hook which looks in a specific location to load modules with a prefix you choose:
package MyIncHook;
use strict;
use warnings;
use autouse Carp => qw( croak );
use File::Spec::Functions qw( catfile );
sub import {
my ($class, $prefix, $location) = #_;
unshift #INC, _loader_for($prefix, $location);
return;
}
sub _loader_for {
my $prefix = shift;
my $location = shift;
$prefix =~ s{::}{/}g;
return sub {
my $self = shift;
my $wanted = shift;
return unless $wanted =~ /^\Q$prefix/;
my $path = catfile($location, $wanted);
my ($is_done);
open my $fh, '<', $path
or croak "Failed to open '$path' for reading: $!";
my $loader = sub {
if ($is_done) {
close $fh
or croak "Failed to close '$path': $!";
return 0;
}
if (defined (my $line = <$fh>)) {
$_ = $line;
return 1;
}
else {
$_ = "1\n";
$is_done = 1;
return 1;
}
};
(my $package = $wanted) =~ s{/}{::}g;
$package =~ s/[.]pm\z//;
my #ret = (\"package $package;", $loader);
return #ret;
}
}
__PACKAGE__;
__END__
Obviously, modify the construction of $path according to your requirements.
You can use it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use MyIncHook ('My::Namespace', "$ENV{TEMP}/1");
use My::Namespace::Rand;
print $My::Namespace::Rand::settings{WARNING_LEVEL}, "\n";
where $ENV{TEMP}/1/My/Namespace/Rand.pm contains:
%settings = (
WARNING_LEVEL => 'critical',
);
Output:
C:\Temp> perl t.pl
critical
You can, obviously, define your own mapping from made up module names to file names.

function call in perl

As a part of my course work I have been learning perl programming language for the first time in last the few weeks. I have been writing small functions and making function calls. I have written a function for string matching.
use strict;
use warnings;
sub find_multi_string {
my ($file, #strings) = #_;
my $fh;
open ($fh, "<$file");
#store the whole file in an array
my #array = <$fh>;
for my $string (#strings) {
if (grep /$string/, #array) {
next;
} else {
die "Cannot find $string in $file";
}
}
return 1;
}
find_multi_string('file name', 'string1','string2','string3','string4','string 5');
In the above script I'm passing the arguments in the function call. The script works.
But I'd like to know if there is way to specify the file name and string1... string n in an array in the program itself and just make the function call.
find_multi_string();
That would be a mistake, always pass parameters and return values to your subroutines.
What you're describing is essentially using subroutines solely to subdivide and document your code. If you were to do that, it would better to just remove the subroutine entirely and include a comment before the section of code.
Overall, your code looks good as is. You probably will want to use quotemeta though, and your logic can be simplified a little:
use strict;
use warnings;
use autodie;
sub find_multi_string {
my ($file, #strings) = #_;
# Load the file
my $data = do {
open my $fh, "<", $file;
local $/;
<$fh>
};
for my $string (#strings) {
if ($data !~ /\Q$string/) {
die "Cannot find $string in $file";
}
}
return 1;
}
find_multi_string('file name', 'string1','string2','string3','string4','string 5');
A few improvements of your original code:
use autodie
use 3-args open
as you want to check anywhere in the file, just load the file as a single string
if the matching string are just text without metacharacters from regexp, just use the index function
Your question is about passing the function arguments from your program.
I suspect that you are looking for #ARGV. See perlvar.
Here is the modified code:
use strict;
use warnings;
use autodie;
sub find_multi_string {
my ($file, #strings) = #_;
my $content = do {
open my $fh, '<', $file;
local $/;
<$fh>
};
foreach (#strings) {
die "Cannot find $string in $file" unless index($content, $_) >= 0;
}
return 1;
}
find_multi_string(#ARGV);

perl OOP pass variable

This is SeqIO.pm
package SeqIO;
use strict;
use Carp;
use warnings;
use vars('#ISA');
use vars('#EXPORT_OK');
require Exporter;
#ISA = qw(Exporter);
#EXPORT_OK = qw(readSeq writeSeq);
sub readSeq {
my ($var1)= #_;
print "$var1\n";
open IN, '<$var1' or die "Cannot open file : $!";
while(<IN>) {
chomp $_;
print "$_\n";
}
close IN
}
sub writeSeq {}
sub new {
my $this = {};
bless $this;
return $this;
}
1;
Test.pl call SeqIO.pm
use strict;
use SeqIO;
use warnings;
my $path_fasta=q/D:\360Downloads\A1.fasta/;
my $seqio = new SeqIO;
$seqio->readSeq($path_fasta);
BUT when i use readSeq it shows SeqIO=HASH(0x38ba34), anything wrong?
When you call a Perl subroutine as a method using the -> operator, the invocant, meaning the thing on the left side of -> is passed into the subroutine as the first parameter. So change your method to this:
sub readSeq {
my ($self, $var1) = #_;
print "$var1\n";
open my $in, '<', $var1 or die "Cannot open file : $!";
while(<$in>) {
chomp $_;
print "$_\n";
}
}
I've also changed your filehandle to a lexical variable instead of a global symbol, and changed your open call to the three-argument version which is better.
There's also no need to export readSeq and writeSeq if you intend to use those as object methods. Exporting is only for when you want to modify the client code's namespace. So you can delete all of this:
use vars('#ISA');
use vars('#EXPORT_OK');
require Exporter;
#ISA = qw(Exporter);
#EXPORT_OK = qw(readSeq writeSeq);

How to I use a class property/variable as a print filehandle in Perl?

I want to do the same thing as
open MYFILE, ">", "data.txt";
print MYFILE "Bob\n";
but instead in class variable like
sub _init_tmp_db
{
my ($self) = #_;
open $$self{tmp_db_fh}, ">", "data.txt";
print $$self{tmp_db_fh} "Bob\n";
}
It gave me this error : 'String found where operator expected near "Bob\n"'
what should I do?
From the print manpage:
If you're storing handles in an array or hash, or in general whenever
you're using any expression more complex than a bareword handle or a
plain, unsubscripted scalar variable to retrieve it, you will have to
use a block returning the filehandle value instead.
You should be using:
print { $$self{tmp_db_fh} } "Bob\n";
This code won't work under use strict. To fix it just use a my variable:
open my $fh, ">", "data.txt" or die $!;
$$self{tmp_db_fh} = $fh;
print { $$self{tmp_db_fh} } "Bob\n";
You should the IO::File module instead.
use IO::File;
my $file = IO::File->new;
$file->open("> data.txt");
print_something($file);
sub print_something {
my ($file) = #_;
$file->print("hello world\n");
}
Or in your example function:
use IO::File;
# ...
sub _init_tmp_db
{
my ($self) = #_;
$self{tmp_db_fh} = IO::File->new;
$self{tmp_db_fh}->open(">", "data.txt");
$self{tmp_db_fh}->print"Bob\n";
}
(note, you can still non -> based calls too, but I wrote the above
using the more traditional ->open() type calls.)
Filehandles can only be scalars.
But $$self{tmp_db_fh} is either an open filehandle (to data.txt) then this would work:
sub _init_tmp_db
{
my ($self) = #_;
my $filehandle = $$self{tmp_db_fh} ;
print $filehandle "Bob\n";
}
or you open the filehandle inside _init_tmp_db
sub _init_tmp_db
{
my ($self) = #_;
open my $filehandle , ">", "data.txt" or die "Cannot open data.txt" ;
print $filehandle "Bob\n";
}
But providing a string in $$self{tmp_db_fh} (like 'FILEHANDLE') won't work.
This is easily solved by creating a variable for a file handle:
sub _init_tmp_db {
my $self = shift;
my $fh;
open $fh, ">", "data.txt"
$self->{temp_db_fh} = $fh;
# Sometime later...
$fh = $self-{temp_db_hf};
print $fh "Bob\n";
}
This is an issue because the way the print syntax is parsed and the early sloppiness of the syntax. The print statement has really two separate formats: Format #1 is that the you're simply passing it stuff to print. Format #2 says that the first item may be a file handle, and the rest is the stuff you want to print to the file handle. If print can't easily determine that the first parameter is a file handle, it fails.
If you look at other languages, they'll use a parameter for passing the file handle, and maybe the stuff to print. Or in object oriented languages, they'll overload >> for the file handle parameter. They'll look something like this:
print "This is my statement", file=file_handle;
or
print "This is my statement" >> file_handle;
You might be able to munge the syntax to get away from using a variable. However, it doesn't make the program more efficient or more readable, and may simply make the program harder to maintain. So, just use a variable for the file handle.
You said class in your title. I assume that you are interested in writing a fully fledge object oriented package to do this. Here's a quick example. Notice in the write subroutine method I retrieve the file handle into a variable and use the variable in the print statement.
#! /usr/bin/env perl
#
use strict;
use warnings;
#######################################################
# MAIN PROGRAM
#
my $file = File->new;
$file->open("OUTPUT") or
die "Can't open 'OUTPUT' for writing\n";
$file->write("This is a test");
#
#######################################################
package File;
use Carp;
sub new {
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}
sub open {
my $self = shift;
my $file = shift;
my $fh;
if (defined $file) {
$self->{FILE} = $file;
open ($fh, ">", $file) and $self->_fh($fh);
}
return $self->_fh;
}
sub _fh {
my $self = shift;
my $fh = shift;
if (defined $fh) {
$self->{FH} = $fh;
}
return $self->{FH};
}
sub write {
my $self = shift;
my $note = shift;
my $fh = $self->_fh;
print $fh $note . "\n";
return
}

How to pass a file handle to a function?

When I run the code below I get
Can't use string ("F") as a symbol ref while "strict refs" in use at ./T.pl line 21.
where line 21 is
flock($fh, LOCK_EX);
What am I doing wrong?
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET'; # file locking
use Data::Dumper;
# use xx;
my $file = "T.yaml";
my $fh = "F";
my $obj = open_yaml_with_lock($file, $fh);
$obj->{a} = 1;
write_yaml_with_lock($obj, $fh);
sub open_yaml_with_lock {
my ($file, $fh) = #_;
open $fh, '+<', $file;
flock($fh, LOCK_EX);
my $obj = YAML::Syck::LoadFile($fh);
return $obj;
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh,0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}
What you're doing wrong is using the string "F" as a filehandle. This
has never been something that's worked; you could use a bareword as a
filehandle (open FH, ...; print FH ...), or you could pass in an
empty scalar and perl would assign a new open file object to that
variable. But if you pass in the string F, then you need to refer to
then handle as F, not $fh. But, don't do that.
Do this instead:
sub open_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
We're doing several things here. One, we're not storing the
filehandle in a global. Global state makes your program extremely
difficult to understand -- I had a hard time with your 10 line post --
and should be avoided. Just return the filehandle, if you want to
keep it around. Or, you can alias it like open does:
sub open_yaml_with_lock {
open $_[0], '+<', $_[1] or die $!;
...
}
open_yaml_with_lock(my $fh, 'filename');
write_yaml_with_lock($fh);
But really, this is a mess. Put this stuff in an object. Make new
open and lock the file. Add a write method. Done. Now you can
reuse this code (and let others do the same) without worrying about
getting something wrong. Less stress.
The other thing we're doing here is checking errors. Yup, disks can
fail. Files can be typo'd. If you blissfully ignore the return value
of open and flock, then your program may not be doing what you think
it's doing. The file might not be opened. The file might not be
locked properly. One day, your program is not going to work properly
because you spelled "file" as "flie" and the file can't be opened.
You will scratch your head for hours wondering what's going on.
Eventually, you'll give up, go home, and try again later. This time,
you won't typo the file name, and it will work. Several hours will
have been wasted. You'll die several years earlier than you should
because of the accumulated stress. So just use autodie or write or
die $! after your system calls so that you get an error message when
something goes wrong!
Your script would be correct if you wrote use autodie qw/open flock
seek close/ at the top. (Actually, you should also check that
"print" worked or use
File::Slurp or
syswrite, since autodie can't detect a failing print statement.)
So anyway, to summarize:
Don't open $fh when $fh is defined. Write open my $fh to
avoid thinking about this.
Always check the return values of system calls. Make autodie do
this for you.
Don't keep global state. Don't write a bunch of functions that
are meant to be used together but rely on implicit preconditions
like an open file. If functions have preconditions, put them in
a class and make the constructor satisfy the preconditions.
This way, you can't accidentally write buggy code!
Update
OK, here's how to make this more OO. First we'll do "pure Perl" OO
and then use Moose. Moose is
what I would use for any real work; the "pure Perl" is just for the
sake of making it easy to understand for someone new to both OO and
Perl.
package LockedYAML;
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET';
use YAML::Syck;
use autodie qw/open flock sysseek syswrite/;
sub new {
my ($class, $filename) = #_;
open my $fh, '+<', $filename;
flock $fh, LOCK_EX;
my $self = { obj => YAML::Syck::LoadFile($fh), fh => $fh };
bless $self, $class;
return $self;
}
sub object { $_[0]->{obj} }
sub write {
my ($self, $obj) = #_;
my $yaml = YAML::Syck::Dump($obj);
local $YAML::Syck::ImplicitUnicode = 1; # ensure that this is
# set for us only
my $fh = $self->{fh};
# use system seek/write to ensure this really does what we
# mean. optional.
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
$self->{obj} = $obj; # to keep things consistent
}
Then, we can use the class in our main program:
use LockedYAML;
my $resource = LockedYAML->new('filename');
print "Our object looks like: ". Dumper($resource->object);
$resource->write({ new => 'stuff' });
Errors will throw exceptions, which can be handled with
Try::Tiny, and the YAML
file will stay locked as long as the instance exists. You can, of
course, have many LockedYAML objects around at once, that's why we
made it OO.
And finally, the Moose version:
package LockedYAML;
use Moose;
use autodie qw/flock sysseek syswrite/;
use MooseX::Types::Path::Class qw(File);
has 'file' => (
is => 'ro',
isa => File,
handles => ['open'],
required => 1,
coerce => 1,
);
has 'fh' => (
is => 'ro',
isa => 'GlobRef',
lazy_build => 1,
);
has 'obj' => (
is => 'rw',
isa => 'HashRef', # or ArrayRef or ArrayRef|HashRef, or whatever
lazy_build => 1,
trigger => sub { shift->_update_obj(#_) },
);
sub _build_fh {
my $self = shift;
my $fh = $self->open('rw');
flock $fh, LOCK_EX;
return $fh;
}
sub _build_obj {
my $self = shift;
return YAML::Syck::LoadFile($self->fh);
}
sub _update_obj {
my ($self, $new, $old) = #_;
return unless $old; # only run if we are replacing something
my $yaml = YAML::Syck::Dump($new);
local $YAML::Syck::ImplicitUnicode = 1;
my $fh = $self->fh;
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
return;
}
This is used similarly:
use LockedYAML;
my $resource = LockedYAML->new( file => 'filename' );
$resource->obj; # the object
$resource->obj( { new => 'object' }); # automatically saved to disk
The Moose version is longer, but does a lot more runtime consistency
checking and is easier to enhance. YMMV.
From the documentation:
open FILEHANDLE,EXPR
If FILEHANDLE is an undefined scalar variable (or array or hash
element) the variable is assigned a reference to a new anonymous
filehandle, otherwise if FILEHANDLE is an expression, its value is
used as the name of the real filehandle wanted. (This is considered a
symbolic reference, so "use strict 'refs'" should
not be in effect.)
Filehandle here is an expression ("F") so itsvalue is used as the name of the real filehandle you want. (A filehandle called F). And then... the documentation says "use strict 'refs'" should not be in effect, because you're using F as a symbolic reference.
(use strict; on line 1 includes strict 'refs'.)
Had you just said at the beginning:
my $fh;
This would have worked, because then $fh would become a reference to a new anonymous filehandle and Perl won't try to use it as a symbolic reference.
This works:
#!/usr/bin/perl
my $global_fh;
open_filehandle(\$global_fh);
use_filehandle(\$global_fh);
sub open_filehandle {
my ($fh)=#_;
open($$fh, ">c:\\temp\\testfile") || die;
}
sub use_filehandle {
my($fh) = #_;
# Print is pecular that it expects the next token to be the filehandle
# or a simple scalar. Thus, print $$fh "Hello, world!" will not work.
my $lfh = $$fh;
print $lfh "Hello, world!";
close($$fh);
}
Or you can do what the other poster suggested and use $_[1] directly, but that's a bit harder to read.
If you use the value directly in the sub, it will work:
use strict;
use warnings;
use autodie;
my $fh;
yada($fh);
print $fh "testing, testing";
sub yada {
open $_[0], '>', 'yada.gg';
}
Or as a reference:
yada(\$fh);
sub yada {
my $handle = shift;
open $$handle, '>', 'yada.gg';
}
Or better yet, return a filehandle:
my $fh = yada($file);
sub yada {
my $inputfile = shift;
open my $gg, '>', $inputfile;
return $gg;
}
Replace
my $fh = "F"; # text and also a ref in nonstrict mode
with
my $fh = \*F; # a reference, period
Of course, it's better yet to use lexical filehandles, as in open my $fd, ... or die ..., but that's not always possible, e.g. you have STDIN that's predefined. In such cases, use \*FD wherever $fd fits.
There's also a case with old scripts, you have to watch out where a global FD is opened and closed.