Call a subroutine defined as a variable - perl

I am working on a program which uses different subroutines in separate files.
There are three parts:
A text file with the name of the subroutine
A Perl program with the subroutine
The main program which extracts the name of the subroutine and launches it
The subroutine takes its data from a text file.
I need the user to choose the text file; the program then extracts the name of the subroutine.
The text file contains
cycle.name=cycle01
Here is the main program :
#!/usr/bin/perl -w
use strict;
use warnings;
use cycle01;

my $nb_cycle = 10;

# The user chooses a text file
print STDERR "\nfilename: ";
chomp(my $filename = <STDIN>);

# Extract the name of the cycle
my $cycleToUse;
open (my $fh, "<", "$filename.txt") or die "cannot open $filename";
while ( <$fh> ) {
    if ( /cycle\.name/ ) {
        (undef, $cycleToUse) = split /\s*=\s*/;
    }
}

# I then try to launch the subroutine by passing variables.
# This fails because the subroutine name is stored in a variable.
$cycleToUse($filename, $nb_cycle);
And here is the subroutine in another file:
#!/usr/bin/perl
package cycle01;
use strict;
use warnings;

sub cycle01 {
    # Get the total number of arguments passed
    my ($filename, $nb_cycle) = @_;
    print "$filename, $nb_cycle";
}

1;

Your code doesn't compile, because in the final call, you have mistyped the name of $nb_cycle. It's helpful if you post code that actually runs :-)
Traditionally, Perl module names start with a capital letter, so you might want to rename your package to Cycle01.
The quick and dirty way to do this is to use the string version of eval. But evaluating an arbitrary string containing code is dangerous, so I'm not going to show you that. The best way is to use a dispatch table - basically a hash where the keys are valid subroutine names and the values are references to the subroutines themselves. The best place to add this is in the Cycle01.pm file:
our %subs = (
    cycle01 => \&cycle01,
);
Then, the end of your program becomes:
if (exists $Cycle01::subs{$cycleToUse}) {
    $Cycle01::subs{$cycleToUse}->($filename, $nb_cycle);
} else {
    die "$cycleToUse is not a valid subroutine name";
}
(Note that you'll also need to chomp() the lines as you read them in your while loop.)
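For example, a minimal sketch of that read loop with the chomp added, using the same variables as above:
while ( <$fh> ) {
    chomp;                 # strip the newline before splitting
    if ( /cycle\.name/ ) {
        (undef, $cycleToUse) = split /\s*=\s*/;
    }
}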

To build on Dave Cross' answer, I usually avoid the hash table, partly because, in perl, everything is a hash table anyway. Instead, I have all my entry-point subs start with a particular prefix, that prefix depends on what I'm doing, but here we'll just use ep_ for entry-point. And then I do something like this:
my $subname = 'ep_' . $cycleToUse;
if (my $func = Cycle01->can($subname))
{
    $func->($filename, $nb_cycle);
}
else
{
    die "$cycleToUse is not a valid subroutine name";
}
The can method in UNIVERSAL extracts the CODE reference for me from perl's hash tables, instead of me maintaining my own (and forgetting to update it). The prefix allows me to have other functions and methods in that same namespace that cannot be called by the user code directly, allowing me to still refactor code into common functions, etc.
If you want to have other namespaces as well, I would suggest having them all be in a single parent namespace, and potentially all prefixed the same way, and, ideally, don't allow :: or ' (single quote) in those names, so that you minimise the scope of what the user might call to only that which you're willing to test.
e.g.,
die "Invalid namespace $cycleNameSpaceToUse"
if $cycleNameSpaceToUse =~ /::|'/;
my $ns = 'UserCallable::' . $cycleNameSpaceToUse;
my $subname = 'ep_' . $cycleToUse;
if (my $func = $ns->can($subname))
# ... as before
There are definitely advantages to doing it the other way, such as being explicit about what you want to expose. The advantage here is in not having to maintain a separate list. I'm always horrible at doing that.


Perl, find a match and read next line in perl

I would like to use
myscript.pl targetfolder/*
to read some number from ASCII files.
myscript.pl
@list = <@ARGV>;
# Is the whole file or only 1st line is loaded?
foreach $file ( @list ) {
    open (F, $file);
}
# is this correct to judge if there is still file to load?
while ( <F> ) {
    match_replace()
}
sub match_replace {
    # if I want to read the 5th line in downward, how to do that?
    # if I would like to read multi lines in multi array[row],
    # how to do that?
    if ( /^\sName\s+/ ) {
        $name = $1;
    }
}
I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:
Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, one warning produced by the code you showed would be readline() on unopened filehandle F, giving you the hint that F is not open at that point (more on that below).
@list = <@ARGV>;: This is a bit tricky, I wouldn't recommend it - you're essentially using glob, and expanding targetfolder/* is something your shell should be doing, and if you're on Windows, I'd recommend Win32::Autoglob instead of doing it manually.
foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.
"Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.
I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.
"is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.
match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.
if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.
"if I want to read the 5th line in downward , how to do that ?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator .. - you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop, this will test those line numbers against the special $. variable that keeps track of the most recently read line number.
"and if I would like to read multi lines in multi array[row], how to do that ?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper; # for debugging
$Data::Dumper::Useqq=1;

for my $file (@ARGV) {
    print Dumper($file); # debug
    open my $fh, '<', $file or die "$file: $!";
    while (my $line = <$fh>) {
        next if 1..4;
        chomp($line); # remove line ending
        match_replace($line);
    }
    close $fh;
}

sub match_replace {
    my ($line) = @_; # get argument(s) to sub
    my $name;
    if ( $line =~ /^\sName\s+(.*)$/ ) {
        $name = $1;
    }
    print Data::Dumper->Dump([$line,$name],['line','name']); # debug
    # ... do more here ...
}
The above code is explicitly looping over @ARGV and opening each file, and I did say above that more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in @ARGV and read lines from them. (There's just one small thing: if I want to use the $. variable and have it count the lines per file, I need to use the continue block I've shown below; this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:
while (<>) { # reads line into $_
    next if 1..4;
    chomp; # automatically uses $_ variable
    match_replace($_);
} continue { close ARGV if eof } # needed for $. (and range operator)

Reading/dumping a perl hash from shell

I have a read-only Perl file with a huge hash defined in it. Is there any way for me to read this Perl file and dump out the hash contents?
This is the basic structure of the hash within the file:
%hash_name = {
    -files => [
        '<some_path>',
    ],
    -dirs => [
        '<some_path>',
        '<some_path>',
        '<some_path>',
        '<some_path>',
        '<some_path>',
    ],
};
Ideally you'd copy the file so that you can edit it, then turn it into a module so you can use it nicely.
But if for some reason this isn't feasible here are your options.
If that hash is the only thing in the file, "load" it using do† and assign to a hash
use warnings;
use strict;
my $file = './read_this.pl'; # the file has *only* that one hash
my %hash = do $file;
This form of do executes the file (runs it as a script), returning the last expression that is evaluated. With only the hash in the file that last expression is the hash definition, precisely what you need.
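For example, if read_this.pl contained nothing but the hash (hypothetical contents shown here), the do above would leave %hash with the same keys and values:
# read_this.pl -- hypothetical contents: the hash assignment is the last
# (and only) expression, so it is what do returns
%hash_name = (
    -files => [ '/some/path' ],
    -dirs  => [ '/some/other/path', '/yet/another/path' ],
);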
If the hash is undeclared, i.e. a global variable (or declared with our), then declare a hash with the same name as our in your program and again load the file with do:
our %hash_name; # same name as in the file
do $file; # file has "%hash" or "our %hash" (not "my %hash")
Here we "pick up" the hash that is evaluated as do runs the file by virtues of our
If the hash is "lexical", declared as my %hash (as it should be!) ... well, this is bad. Then you need to parse the text of the file so to extract lines with the hash. This is in general very hard to do, as it amounts to parsing Perl. (A hash can be built using map, returned from a sub as a reference or a flat list ...) Once that is done you eval the variable which contains the text defining that hash.
However, if you know how the hash is built, as you imply, with no () anywhere inside
use warnings;
use strict;

my $file = './read_this.pl';

my $content = do { # "slurp" the file -- read it into a variable
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};

my ($hash_text) = $content =~ /\%hash_name\s*=\s*(\(.*?\))/s;
my %hash = eval $hash_text;
This simple shot leaves out a lot, assuming squarely that the hash is as shown. Also note that this form of eval carries real and serious security risks.
†
Files are also loaded using require. Apart from it doing a lot more than do, the important thing here is that even if it runs multiple times require still loads that file only once. This matters for modules in the first place, which shouldn't be loaded multiple times, and use indeed uses require.
On the other hand, do does it every time, which makes it suitable for loading files to be used as data, which presumably should be read every time. This is the recommended method. Note that require itself uses do to actually load the file.
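A small sketch of the difference, reusing the file name from above:
my %first  = do './read_this.pl';   # runs the file now and returns its last expression
my %second = do './read_this.pl';   # runs it again, so any edits to the file are picked up

require './read_this.pl';           # loads and runs the file once
require './read_this.pl';           # already loaded: this second call is a no-op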
Thanks to Schwern for a comment.

PERL: String Replacement on file

I am working on a script to do a string replacement in a file and I will read the variables and values and files from a configuration file and do string replacement.
Here is my logic to do a string replacement.
sub expansion($$$){
    my $f     = shift(@_) ; # file name
    my $vname = shift(@_) ; # variable name for pattern match
    my $value = shift(@_) ; # value to replace
    my $n = "$f".".new";
    open ( O, "<$f") or print( "Can't open $f file: $!");
    open ( N ,">$n" ) or print( "Can't open $n file: $!");
    while (<O>)
    {
        $_ =~ s/$vname/$value/g; # check for pattern
        print N "$_" ;
    }
    close (O);
    close (N);
}
In my logic I am reading line by line from the input file ($f), checking for the pattern, and writing to a new file ($n).
Instead of writing to a new file, is there any way to do the string replacement on the original file? When I try to do that, I just end up with an empty file with no contents.
Do not. Never, ever¹. Don't you dare, don't even think of, do not use subroutine prototyping. It is horribly broken (that is, it doesn't do what you think it does) and is dangerous.
Now that we've got that out of the way:
Yes, you can do what you want. You can open a file as both readable and writable by using the mode +<. So far, so good.
However, due to buffering, you cannot use the standard read and write methods to read and write to the file. Instead, you need to use sysread and syswrite.
Then, what you need to do is read the line, use sysseek to go back to the start of where you read, and then write to that spot.
Not only is it very complex to do, but it is full of peril. Let's take a simple example. I have a document, and I want to replace my curly quotes with straight quotes.
$line =~ s/“|”/"/g;
That should work. I'm replacing one character with another. What could go wrong?
If this is a UTF-8 file (what Macs and Linux systems use by default), those curly quotes are multi-byte characters and that straight quote is a single-byte character. I would be writing back a line that was shorter than the line I read in. My buffer is going to be off.
Back in the days when computer memory and storage were measured in kilobytes, and you used serial devices like reel-to-reel tapes, this type of operation was quite common. However, in this age where storage is vast, it's simply not worth the complexity and the error-prone process it entails. Stick with reading from one file and writing to another. Then use unlink and rename to delete the original and give the copy the original's name.
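A rough sketch of that approach, with placeholder file names and a placeholder substitution:
use strict;
use warnings;

my $file = 'data.txt';               # placeholder name
my $temp = "$file.new";

open my $in_fh,  '<', $file or die "Can't read $file: $!";
open my $out_fh, '>', $temp or die "Can't write $temp: $!";

while (my $line = <$in_fh>) {
    $line =~ s/search/replacement/g; # whatever replacement you need
    print {$out_fh} $line;
}

close $in_fh;
close $out_fh or die "Can't close $temp: $!";

unlink $file        or die "Can't remove $file: $!";
rename $temp, $file or die "Can't rename $temp to $file: $!";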
A few more pointers:
Don't print if the file can't be opened. Use die. Otherwise, your program will simply continue on blithely unaware that it is not working. Even better, use the pragma use autodie;, and you won't have to worry about testing whether or not a read/write failed.
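For example, a short sketch (autodie has been a core pragma since Perl 5.10.1):
use strict;
use warnings;
use autodie;    # open, close, and friends now throw an exception on failure

open my $in_fh, '<', 'input.txt';    # no "or die" needed; a failure dies with a useful message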
Use scalars for file handles.
That is instead of
open OUT, ">my_file.txt";
use
open my $out_fh, ">my_file.txt";
And, it is highly recommended to use the three parameter open:
Use
open my $out_fh, ">", "my_file.txt";
If you aren't already, always add use strict; and use warnings;.
In fact, your Perl syntax is a bit ancient. You need to get a book on Modern Perl. Perl was originally written as a hack language to replace shell and awk programming. However, Perl has morphed into a full-fledged language that can handle complex data types, object orientation, and large projects. Learning the modern syntax of Perl will help you find errors and become a better developer.
¹ Like all rules, this can be broken, but only if you have a clear and careful understanding of what is going on. It's like those shows that say "Don't do this at home. We're professionals."
sub inplace_expansion($$$){
    my $f     = shift(@_) ; # file name
    my $vname = shift(@_) ; # variable name for pattern match
    my $value = shift(@_) ; # value to replace
    local @ARGV = ( $f );
    local $^I = '';
    while (<>)
    {
        s/\Q$vname/$value/g; # check for pattern
        print;
    }
}
or, my preference would run closer to this (basically equivalent, changes mostly in formatting, variable names, etc.):
use English;

sub inplace_expansion {
    my ( $filename, $pattern, $replacement ) = @_;
    local @ARGV = ( $filename );
    local $INPLACE_EDIT = '';
    while ( <> ) {
        s/\Q$pattern/$replacement/g;
        print;
    }
}
The trick with local basically simulates a command-line script (as one would run with perl -e); for more details, see perldoc perlrun. For more on $^I (aka $INPLACE_EDIT), see perldoc perlvar.
(For the business with \Q (in the s// expression), see perldoc -f quotemeta. This is unrelated to your question, but good to know. Also be aware that passing regex patterns around in variables—as opposed to, e.g., using literal regexes exclusively— can be vulnerable to injection attacks; Perl's built-in taint mode is useful here.)
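A tiny illustration of what \Q buys you when the pattern arrives in a variable (the strings here are made up):
use strict;
use warnings;

my $pattern = 'price ($US)';
my $text    = 'price ($US) = 5';

print "plain match\n"  if $text =~ /$pattern/;      # fails: ( ) and $ are treated as regex syntax
print "quoted match\n" if $text =~ /\Q$pattern/;    # matches: \Q escapes the metacharacters, so the pattern is taken literally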
EDIT: David W. is right about prototypes.

Pass by value vs pass by reference for a Perl hash

I'm using a subroutine to make a few different hash maps. I'm currently passing the hashmap by reference, but this conflicts when doing it multiple times. Should I be passing the hash by value or passing the hash reference?
use strict;
use warnings;
sub fromFile($){
    local $/;
    local our %counts =();
    my $string = <$_[0]>;
    open FILE, $string or die $!;
    my $contents = <FILE>;
    close FILE or die $!;
    my $pa = qr{
        ( \pL {2} )
        (?{
            if(exists $counts{lc($^N)}){
                $counts{lc($^N)} = $counts{lc($^N)} + 1;
            }
            else{
                $counts{lc($^N)} = '1';
            }
        })
        (*FAIL)
    }x;
    $contents =~ $pa;
    return %counts;
}

sub main(){
    my %english_map = &fromFile("english.txt");
    #my $german_map = &fromFile("german.txt");
}
main();
When I run the different txt files individually I get no problems, but with both I get some conflicts.
Three comments:
Don't confuse passing a reference with passing by reference
Passing a reference is passing a scalar containing a reference (a type of value).
The compiler passes an argument by reference when it passes the argument without making a copy.
The compiler passes an argument by value when it passes a copy of the argument.
Arguments are always passed by reference in Perl
Modifying a function's parameters (the elements of @_) will change the corresponding variable in the caller. That's one of the reasons the convention to copy the parameters exists.
my ($x, $y) = @_; # This copies the args.
Of course, the primary reason for copying the parameters is to "name" them, but it saves us from some nasty surprises we'd get by using the elements of @_ directly.
$ perl -E'sub f { my ($x) = @_; "b"=~/(.)/; say $x; } "a"=~/(.)/; f($1)'
a
$ perl -E'sub f { "b"=~/(.)/; say $_[0]; } "a"=~/(.)/; f($1)'
b
One cannot pass an array or hash as an argument in Perl
The only thing that can be passed to a Perl sub is a list of scalars. (It's also the only thing that can be returned by one.)
Since @a evaluates to $a[0], $a[1], ... in list context,
foo(@a)
is the same as
foo($a[0], $a[1], ...)
That's why we create a reference to the array or hash we want to pass to a sub and pass the reference.
If we didn't, the array or hash would be evaluated into a list of scalars, and it would have to be reconstructed inside the sub. Not only is that expensive, it's impossible in cases like
foo(@a, @b)
because foo has no way to know how many arguments were returned by @a and how many were returned by @b.
Note that it's possible to make it look like an array or hash is being passed as an argument using prototypes, but the prototype just causes a reference to the array/hash to be created automatically, and that's what is actually passed to the sub.
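For illustration, a small sketch of that behaviour (the sub name and arrays are invented for this example):
use strict;
use warnings;

# The (\@\@) prototype tells Perl to pass references to the two arrays
# instead of flattening them into one long list.
sub count_both (\@\@) {
    my ($first_ref, $second_ref) = @_;    # two array references arrive in @_
    return scalar(@$first_ref) + scalar(@$second_ref);
}

my @x = (1, 2, 3);
my @y = (4, 5);
my $total = count_both(@x, @y);           # @x and @y are not flattened together
print "$total\n";                         # prints 5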
For a couple of reasons you should use pass-by-reference, but the code you show returns the hash by value.
You should use my rather than local except for built-in variables like $/, and then for only as small a scope as possible.
Prototypes on subroutines are almost never a good idea. They do something very specific, and if you don't know what that is you shouldn't use them.
Calling subroutines using the ampersand sigil, as in &fromFile("english.txt"), hasn't been correct since Perl 4, about twenty years ago. It affects the parameters delivered to a subroutine in at least two different ways and is a bad idea.
I'm not sure why you are using a file glob with my $string = <$_[0]>. Are you expecting wildcards in the filename passed as the parameter? If so then you will be opening and reading only the first matching file, otherwise the glob is unnecessary.
Lexical file handles like $fh are better than bareword file handles like FILE, and will be closed implicitly when they are destroyed - usually at the end of the block where they are declared.
I am not sure how your hash %counts gets populated. No regex on its own can fill a hash, but I will have to trust you!
Try this version. People familiar with Perl will thank you (ironically!) for not using camel-case variable names. And it is rare to see a main subroutine declared and called. That is C, this is Perl.
Update I have changed this code to do what your original regex did.
use strict;
use warnings;
sub from_file {
my ($filename) = #_;
my $contents = do {
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
local $/;
my $contents = <$fh>;
};
my %counts;
$counts{lc $1}++ while $contents =~ /(?=(\pL{2}))/g;
return \%counts;
}
sub main {
my $english_map = from_file('english.txt');
my $german_map = from_file('german.txt');
}
main();
You can use either a reference or pass the entire hash or array. Your choice. There are two issues that might make you choose one over the other:
Passing other parameters
Memory Management
Perl doesn't really have subroutine parameters. Instead, you're simply passing in an array of parameters. Suppose my subroutine needs to see which of two arrays has more elements. I couldn't do this:
foo(@first, @second);
because all I'll be passing in is one big array that combines all the members of both. This is true with hashes too. Imagine a program that takes two hashes and finds the keys they have in common:
@common_keys = common(%hash1, %hash2);
Again, I'm combining all the keys and their values of both hashes into one big list.
The only way around this issue is to pass a reference:
foo(\@first, \@second);
@common_keys = common(\%hash1, \%hash2);
In this case, I'm passing the memory locations where those arrays or hashes are stored. My subroutine can then use the references, but it does have to take some care, as I'll explain next.
The second reason to pass a reference is memory management. If my array or hash has only a few dozen entries, it really doesn't matter all that much. However, imagine I have 10,000,000 entries in my hash or array. Copying all those members could take quite a bit of time and memory. Passing a reference avoids that copy, but it comes at a cost. Most of the time, I'm using subroutines as a way of not affecting my main program. This is why subroutines are supposed to use their own variables and why you're taught in most programming courses about variable scope.
However, when I pass a reference, I'm breaking that scope. Here's a simple program that doesn't pass a reference.
#!/usr/bin/env perl
use strict;
use warnings;

my @array = qw(this that the other);
foo (@array);
print join ( ":", @array ) . "\n";

sub foo {
    my @foo_array = @_;
    $foo_array[1] = "FOO";
}
Note that the subroutine foo¹ is changing the second element of the passed-in array. However, even though I pass @array into foo, the subroutine doesn't change the value of @array. That's because the subroutine is working on a copy (created by my @foo_array = @_;). Once the subroutine exits, the copy disappears.
When I execute this program, I get:
this:that:the:other
Now, here's the same program, except I'm passing in a reference, and in the interest of memory management, I use that reference:
#!/usr/bin/env perl
use strict;
use warnings;

my @array = qw(this that the other);
foo (\@array);
print join ( ":", @array ) . "\n";

sub foo {
    my $foo_array_ref = shift;
    $foo_array_ref->[1] = "FOO";
}
When I execute this program, I get:
this:FOO:the:other
That's because I don't pass in the array, but a reference to that array. It's the same memory location that holds @array. Thus, changing the data through the reference in my subroutine changes it in my main program too. Most of the time, you do not want to do this.
You can get around this by passing in a reference, then copying the array it refers to. For example, if I had done this:
sub foo {
    my @foo_array = @{ shift() };
    $foo_array[1] = "FOO";    # changes only the copy
}
I would be making a copy of the referenced array into another array. That protects my variables, but it does mean I'm copying my array into another array, which takes time and memory. Back in the 1980s when I first was programming, this was a big issue. However, in this age of gigabyte memory and quad-core processors, the main issue isn't memory management, but maintainability. Even if your array or hash contained 10 million entries, you'd probably not notice any time or memory issues.
This works the other way around too. I could return from my subroutine either a reference to a hash or the entire hash. Many people like returning a reference, but this can be problematic.
In object oriented Perl programming, I use references to keep track of my objects. Normally, I'll have a reference to a hash I can use to store other values, arrays, and hashes.
In a recent program, I was counting IDs and how many times they are referenced in a log file. This was stored in an object (which is just a reference to a hash). I had a method that would return the entire hash of IDs and their counts. I could have done this:
return $self->{COUNT_HASH};
But what happens if the user starts modifying the data behind that reference? They would actually be manipulating my object without using my methods to add and subtract from the IDs. Not something that I want them to do. Instead, I create a new hash, and then return a reference to that hash:
my %hash_counts = %{ $self->{COUNT_HASH} };
return \%hash_counts;
This copies the hash inside my object into a new hash and returns a reference to that copy. It protects my data from outside manipulation: I still return a reference, but the user no longer has access to my object's internals without going through my methods.
By the way, I like using wantarray, which gives the caller a choice of how they want their data:
my %hash_counts = %{ $self->{COUNT_HASH} };
return wantarray ? %hash_counts : \%hash_counts;
This allows me to return either a hash or a hash reference, depending on how the user calls my method:
my %hash_counts = $object->totals(); # Returns a hash
my $hash_counts_ref = $object->totals(); # Returns a reference to a hash
¹ A footnote: The @_ array points at the same memory locations as the arguments in the caller. Thus, if I pass in foo(@array) and then do $_[1] = "foo"; inside foo, I would be changing the second element of @array.

module creation using perl script [duplicate]

Possible Duplicate:
How do you create a Perl module?
I have a script that reads an XML file and creates a hash table. It's working properly, but now I need to turn that code into a module that I can call from my main function. The main function takes a file path as input and gives the hash as output.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML::Reader;
#Reading XML with a pull parser
my $file;
open( $file, 'formal.xml');
my $reader = XML::LibXML::Reader->new( IO => $file ) or die ("unable to open file");
my %nums;
while ($reader->nextElement( 'Data' ) ) {
    my $des = $reader->readOuterXml();
    $reader->nextElement( 'Number' );
    my $desnode = $reader->readInnerXml();
    $nums{$desnode}= $des;
    print( " NUMBER: $desnode\n" );
    print( " Datainfo: $des\n" );
}
How can I create a module for this code?
You need to create a file with a .pm extension, e.g. "MyModule.pm", with this code:
package MyModule;
use warnings;
use strict;
use XML::LibXML::Reader;
sub mi_function_name {
    # Reading XML with a pull parser
    my $file;
    open( $file, 'formal.xml');
    my $reader = XML::LibXML::Reader->new( IO => $file ) or die ("unable to open file");
    my %nums;
    while ($reader->nextElement( 'Data' ) ) {
        my $des = $reader->readOuterXml();
        $reader->nextElement( 'Number' );
        my $desnode = $reader->readInnerXml();
        $nums{$desnode}= $des;
        print( " NUMBER: $desnode\n" );
        print( " Datainfo: $des\n" );
    }
}

1; # this is important
And in the file you want to use this module:
use MyModule;
#...
MyModule::mi_function_name();
This is a very simple and basic usage of a module. I recommend reading better tutorials (http://www.perlmonks.org/?node_id=102347) to gain further knowledge on this.
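One practical note, not part of the answer above: Perl searches the directories in @INC for modules, so if MyModule.pm doesn't live somewhere on that list you may need something like this in the calling script:
use FindBin;              # core module that finds the directory of the running script
use lib $FindBin::Bin;    # assumption: MyModule.pm sits next to the script
use MyModule;

MyModule::mi_function_name();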
Take a look at the Perl Documentation. One of the tutorials included is perlmod. This offers a lot of good information.
First step: Make your program into a subroutine. That way, you can call it from your code. I've taken the liberty of doing that:
#!/usr/bin/perl
use warnings;
use strict;
use Carp;
use XML::LibXML::Reader;
#Reading XML with a pull parser
sub myFunction {
    my $fh = shift;    # File handle (should be opened before calling)

    my $reader = XML::LibXML::Reader->new( IO => $fh )
        or croak ("unable to open file");

    my %nums;
    while ($reader->nextElement( 'Data' ) ) {
        my $des = $reader->readOuterXml();
        $reader->nextElement( 'Number' );
        my $desnode = $reader->readInnerXml();
        $nums{$desnode} = $des;
    }
    return %nums;
}

1;
I've made a wee change. You'll notice that I no longer open a file. Instead, you'll pass a file handle to your myFunction subroutine. Second, instead of printing out $desnode and $des, it now returns a hash that holds those values. You don't want subroutines to output data. You want them to return the data, and let your program decide what to do with the information.
I've also put in a use Carp; line. Carp gives you two functions (as well as a few others). One is called carp, which is a replacement for warn, and the other is called croak, which is a replacement for die. What these two functions do is report the line number in the user's program which called your function. That way, the user doesn't see the error in your module, but in their program.
I've also added the line 1; at the bottom of your program. When a module loads, if it returns a false value on load, the load fails. Thus, your last statement should return a true value. The 1; guarantees it.
Now that we have a subroutine that you can return, let's make your program into a module.
To create a module, all you have to do is say package <moduleName> at the top of your program, and also make sure that the last statement executes with a true value. The tradition is just to put a 1; as the last line of the program. Module names end with a .pm suffix by default. Module names can have components separated by double colons, for example File::Basename. In that case, the module Basename.pm lives in the directory File somewhere in the @INC list of directories (which, by default, includes the current directory).
The package command simply creates a separate namespace, so your package variables and functions don't collide with the names of the variables and functions inside the program that uses your package.
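A toy illustration, not from the original answer, of two packages keeping identically named subs separate:
use strict;
use warnings;

package Alpha;
sub greet { return "hello from Alpha" }

package Beta;
sub greet { return "hello from Beta" }

package main;
my $a_msg = Alpha::greet();    # each package has its own greet
my $b_msg = Beta::greet();
print "$a_msg\n$b_msg\n";      # hello from Alpha / hello from Beta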
If you use an object oriented interface, there's no reason why you need to export anything. The program that uses your module will simply use the object oriented syntax. If your module is function based, you probably want to export your function names into the main program.
For example, let's take File::Basename. This module imports the function basename and dirname into your program. This allows you to do this:
my $directoryName = dirname $fileName;
Instead of having to do this:
my $directoryName = File::Basename::dirname $fileName;
To export a function, make sure your module uses the Exporter module, and then set the package variable @EXPORT_OK or @EXPORT to contain the list of functions you're allowing to be exported into the user's program. The difference is that if you use @EXPORT_OK, the functions can be exported, but the user must request each one. If you use @EXPORT, all those functions will automatically be exported.
Using your program as a basis, your module will be called Mypackage.pm and look like this:
#!/usr/bin/perl

package Mypackage;

use warnings;
use strict;
use Exporter qw(import);
use Carp;
use XML::LibXML::Reader;

our @EXPORT_OK = qw(myFunction);

# Reading XML with a pull parser
sub myFunction {
    my $fh = shift;    # File handle (should be opened before calling)

    my $reader = XML::LibXML::Reader->new( IO => $fh )
        or croak ("unable to open file");

    my %nums;
    while ($reader->nextElement( 'Data' ) ) {
        my $des = $reader->readOuterXml();
        $reader->nextElement( 'Number' );
        my $desnode = $reader->readInnerXml();
        $nums{$desnode} = $des;
    }
    return %nums;
}

1;
The big thing is the use of:
package Mypackage
use Exporter qw(import)
our @EXPORT_OK = qw(myFunction);
The package function sets up an independent namespace, so your variables and function names don't collide with (or get overwritten by) the names in the user's program.
The use Exporter says that your program is using the import function of the Exporter module. This allows you to export variables and functions into the main namespace of the user's program. That way, the user can simply refer to your function as myFunction instead of Mypackage::myFunction. In theory, you don't have to export anything, and many newer modules don't. These modules are entirely object oriented or simply don't want to bother with namespace issues.
The @EXPORT_OK array says what you're exporting. This is preferred over @EXPORT. With @EXPORT_OK, the developer must specify which functions they want to import into their program. With @EXPORT, all of them are imported automatically.
In the program that uses your module, you'll need to do this:
use Mypackage qw(myFunction);
Now, all you have to do in your program is
my %returnedHash = myFunction($fh);
Now, things are constantly evolving in Perl, and I've never received any formal training. I simply read the documentation, look at various examples, and hope that I understand them correctly. So, if someone says that I'm doing something wrong, they're probably correct. I also didn't test any of the code, so I might have messed something up in your program when I turned it into a subroutine.
However, the gist should be correct: You need to make your code into callable subroutines that return the information you need. Then, you can turn it into a module. It's not all that difficult to do.