I've had some maintenance tasks on a couple of Perl scripts offloaded to me. One of the requirements is to download a couple of dozen files over HTTP in parallel. I went looking on CPAN for the easiest solution and found a module called IO::Lambda::HTTP.
Unfortunately, I have absolutely no experience with functional programming (and beginner-level Perl experience), so while I see that all of the examples work as documented, I can't really modify any of them to suit my needs.
For example, the sample that comes with the module:
#!/usr/bin/perl
# $Id: parallel.pl,v 1.7 2008/05/06 20:41:33 dk Exp $
#
# This example fetches two pages in parallel, one with http/1.0 another with
# http/1.1 . The idea is to demonstrate three different ways of doing so, by
# using object API, and explicit and implicit loop unrolling
#
use lib qw(./lib);
use HTTP::Request;
use IO::Lambda qw(:lambda);
use IO::Lambda::HTTP qw(http_request);
use LWP::ConnCache;
my $a = HTTP::Request-> new(
    GET => "http://www.perl.com/",
);
$a-> protocol('HTTP/1.1');
$a-> headers-> header( Host => $a-> uri-> host);

my @chain = (
    $a,
    HTTP::Request-> new(GET => "http://www.perl.com/"),
);
sub report
{
    my ( $result) = @_;
    if ( ref($result) and ref($result) eq 'HTTP::Response') {
        print "good:", length($result-> content), "\n";
    } else {
        print "bad:$result\n";
    }
    # print $result-> content;
}
my $style;
#$style = 'object';
#$style = 'explicit';
$style = 'implicit';
# $IO::Lambda::DEBUG++; # uncomment this to see that it indeed goes parallel
if ( $style eq 'object') {
    ## object API, all references and bindings are explicit
    sub handle {
        shift;
        report(@_);
    }
    my $master = IO::Lambda-> new;
    for ( @chain) {
        my $lambda = IO::Lambda::HTTP-> new( $_ );
        $master-> watch_lambda( $lambda, \&handle);
    }
    run IO::Lambda;
} elsif ( $style eq 'explicit') {
    #
    # Functional API, based on context() calls. context is
    # $obj and whatever arguments the current call needs, a RPN of sorts.
    # The context though is not a stack in this analogy, because it stays
    # as is in the callback
    #
    # Explicit loop unrolling - we know that we have exactly 2 steps
    # It's not practical in this case, but it is when a (network) protocol
    # relies on a precise series of reads and writes
    this lambda {
        context $chain[0];
        http_request \&report;
        context $chain[1];
        http_request \&report;
    };
    this-> wait;
} else {
    # implicit loop - we don't know how many states we need
    #
    # also, use 'tail'
    this lambda {
        context map { IO::Lambda::HTTP-> new( $_, async_dns => 1 ) } @chain;
        tails { report $_ for @_ };
    };
    this-> wait;
}
Works as advertised, but I can't for the life of me figure out how to modify either the 'object' or 'implicit' examples to be limited to N parallel instances like the following from IO::Lambda's synopsis:
# http://search.cpan.org/~karasik/IO-Lambda/lib/IO/Lambda.pm
# crawl for all urls in parallel, but keep 10 parallel connections max
print par(10)-> wait(map { http($_) } #hosts);
Can someone show me an example of what the lambda code would look like given the above constraint (e.g. limited to N instances)?
Also, what's the best way to start learning functional programming? It seems totally alien to me.
There are other good options than IO::Lambda for this task, for example AnyEvent::HTTP. See this previous SO question.
Even though I'm familiar with functional programming, the IO::Lambda sample code above looks rather hard for me to understand.
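Since AnyEvent::HTTP came up, here is a rough sketch of the "keep N parallel connections max" pattern with it. The URLs, the counts, and the reporting are placeholders I made up; the pattern is simply to prime N requests and have each completed request pull the next URL off a queue:
#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my @queue = map { "http://example.com/file$_" } 1 .. 24;   # placeholder URLs
my $max_parallel = 10;

my $cv = AnyEvent->condvar;
$cv->begin;   # guard: released after priming, so recv() can't return early

sub start_next {
    my $url = shift @queue or return;   # queue drained: retire this slot
    $cv->begin;
    http_get $url, sub {
        my ($body, $hdr) = @_;
        if ($hdr->{Status} =~ /^2/) {
            print "good: $url (", length($body), " bytes)\n";
        } else {
            print "bad: $url ($hdr->{Status})\n";
        }
        start_next();   # this slot picks up the next URL, keeping N in flight
        $cv->end;
    };
}

start_next() for 1 .. $max_parallel;   # prime at most N parallel requests
$cv->end;    # release the priming guard
$cv->recv;   # run the event loop until all requests have finished
No lambdas needed: the concurrency limit falls out of the fact that only N "slots" ever exist, and a slot only starts a new request when its previous one completes.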
Related
I have one script (A.pl) and one package (B.pm), and I want to create a static variable in B.pm so that it is accessible to A.pl.
A.pl
use lib 'path/to/B_Package';
use B;

for (1 .. 10) {
    B::fun($age);
}
if ($count > 0) {   # wants to read B.pm's failure count here
    print "script fails";
}
B.pm
package B {
    sub fun {
        my $age_given = shift;
        my $count;
        eval {
            my $result = someFileHandling($age_given);
            1;
        } or do {
            $count++;
        };
    }
}
I'd question such design, and some alternatives are offered below.
But yes, it can be done -- a variable declared with our can be accessed by its fully qualified name.
In the package file Pack.pm
package Pack;
use warnings;
use strict;
use Exporter qw(import);
our @EXPORT_OK = qw(func);
our $count = 7;
sub func { ++$count }
1;
and in its user
use warnings;
use strict;
use feature 'say';
use Pack qw(func);
for (1..2) {
    func();
    say "Current value of a global in 'Pack': ", $Pack::count;
}
$Pack::count = 123;
say "Current value of a global in 'Pack': ", $Pack::count;
So changes to $count made in Pack::func() are seen in the calling program. More to the point, $Pack::count can be directly written by any code in the interpreter.
The package globals that are meant to be used directly, like $count above,† are tricky creatures that can be hard to use sensibly but are very easy to end up abusing.
In general you don't want them: their use runs contrary to the critical idea of dividing software into components that communicate via clearly defined interfaces; they introduce uncontrolled coupling and thus defeat scoping. With such variables in use, distinct components in the code get entangled.
But they can of course be useful and are used in libraries, mostly for constants and parameters.
Now, having them change as well? That can get out of control, and even though that, too, is used in libraries (to control their behavior by setting parameters) it veers closer to an analogue of a "God Class," an all-controlling entity. At that point I would flatly call it flawed and a trouble-maker.
Why not have subs handle the counting and return updated values? They can keep values using a state variable, for instance. Or even using a file-scoped variable, as long as that is internal to their business and outsiders aren't allowed to poke at it.
Here's a sample for the two mentioned approaches in revised Pack.pm
package Pack;
use warnings;
use strict;
use feature qw(state);

use Exporter qw(import);
our @EXPORT_OK = qw(count func1 func2);

my $count = 7;

sub func1 { ++$count }   # sets counter while doing its other work

sub count {              # add check that input is numeric
    $count += shift for @_;   # set count if values passed,
    return $count;            # return value in either case
}

sub func2 {
    state $count = 0;             # keeps count (its own $count)
    return $count += shift // 1;  # demo: add some and return
}

1;
Demo for its use:
use warnings;
use strict;
use feature 'say';
use Pack qw(count func1 func2);
say "Call func2(), using 'state' feature, with its own counter: ";
for (1..2) { say "func2($_): ", func2($_) }
say '';
say "Set value for a file-wide counter, retrieve using count(): ";
for (1..2) { func1() }
say "Count is: ", count();
say "Use count() to set values as well: ";
for (1..2) { say "For #$_: ", count($_) }
This prints
Call func2(), using 'state' feature, with its own counter:
func2(1): 1
func2(2): 3
Set value for a file-wide counter, retrieve using count():
Count is: 9
Use count() to set values as well:
With 1: 10
With 2: 12
The next step up is to make this a class, and then you can implement any and all kinds of counters in very natural ways.
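For instance, a bare-bones counter class with plain bless (a sketch; the method names are of course made up):
package Counter;
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

sub increment { my $self = shift; $self->{count} += shift // 1 }
sub count     { $_[0]->{count} }

1;

# Usage: each object carries its own independent count
#   my $c = Counter->new(start => 7);
#   $c->increment for 1 .. 2;
#   say $c->count;   # 9
Since every counter is its own object, nothing is shared unless you choose to pass the object around, which is exactly the interface-based communication argued for above.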
For more on variables, see this post and this post and this Effective Perler article, for starters.
† An our variable is strictly speaking not a global, but a lexical that is aliased to a package variable (a "true" global) with the same name.
I think there's a better way to do what I'm guessing you want to do: try something a certain number of times and give up if you can't achieve that goal.
When you call your subroutine, you want it to know how many times to try. Also, you want to know when it fails.
You don't need to share a variable for this. The die is going to take care of that for you. Call the sub as many times as you like, and each time you don't get back a value from eval, count that as an error:
my $errors = 0;
foreach ( 1 .. 10 ) {
    my $result = eval { do_this($age) };
    $errors++ unless defined $result;
}
print "Script fails" if $errors > 0;
In the subroutine, you don't need to worry about how many times this has been done, because that's happening at the higher level for you. You look at the result of the subroutine to decide if it failed, and adjust a counter at the higher level. Now the subroutine can focus on its small part instead of thinking about why you are calling it. You also don't need the eval at this level because you already have it at the higher level.
sub do_this {
    my( $age ) = @_;
    ...   # some file handling
}
Factories
But let's say that there is some good reason for a lower subroutine to know its count. I don't want to pollute that subroutine for everyone—suppose that 10 other places in the program also call this subroutine and they all fail. Should that count against your call? You probably don't want that. But, there's a way around this. You can create a new version of the subroutine when you need to. A factory is a subroutine that makes other subroutines.
Let's say you want to try something a certain number of times. But, you might want to do that multiple times too. Make a new subroutine every time that you want to try this. Tell that subroutine how many tries it gets:
sub some_factory {
    my( $max_tries ) = @_;

    return sub {
        my( $age ) = @_;
        for ( 1 .. $max_tries ) {
            ...   # file handling ... or die ...
        }
    };
}
Your program would then look something like:
my $try_this = some_factory( 10 );
my $result = eval { $try_this->($age) };
print "Script fails" unless defined $result;
In the same program, you can do it again, and each generated code reference tracks its own use and doesn't bother other subs:
foreach my $age ( list_of_ages() ) {
    my $new_sub = some_factory( 10 );
    my $result = eval { $new_sub->($age) };
    print "Script fails" unless defined $result;
}
I spend quite a bit of time on this sort of stuff in Intermediate Perl and Mastering Perl.
I have a very interesting predicament. I am working on a Perl script interface to the CVS repository and have created Perl objects to represent Modules, Paths, and Files. Since Modules, Paths, and Files can all have CVS commands issued on them, I set up the AUTOLOAD routine to take any unidentified methods and issue them on the object as if they were CVS commands.
All of these CVS commands are executed exactly the same way, but some of them need special processing done on the output to get the result I desire.
For example, I want to take the output from the diff command and reformat it before I return it.
I am using Moose, so typically this special processing could be done as follows:
after 'diff' => sub {
    # Reformat output here
};
The problem is... I never explicitly created the diff method since it is being generated by AUTOLOAD and Perl won't let me create a method modifier for it since it technically doesn't exist!
Is there a way to get this to work how I want?
Apply after to your AUTOLOAD method.
after 'AUTOLOAD' => sub {
    my $method = $The::Package::AUTOLOAD;
    $method =~ s/.*:://;

    if ($method eq 'diff') {
        # do after diff stuff
    } elsif ($method eq 'foo') {
        # do after foo stuff
    } else {
        # never mind, don't want to do anything after this function
    }
};
EDIT:
I found that I may want even more control over the diff command so I have added more detail to your answer. Hopefully someone will find this information useful.
For even more control you can use around!
around 'AUTOLOAD' => sub {
    my $orig = shift;
    my $self = shift;

    (my $command = $AUTOLOAD) =~ s{.+::}{};

    # Special processing
    if ($command eq 'diff') {
        #
        # Add "before" special processing here
        #
        my $output = $self->$orig(@_);
        #
        # Add "after" special processing here
        #
        return $output;
    }
    else {
        return $self->$orig(@_);
    }
};
This allows you to do special processing before the function is called AND after.
For more information see: Moose::Manual::MethodModifiers
Depending on how well the AUTOLOAD-using class is implemented, you may find that it respects the can method too, and that simply calling can is enough to create the method.
__PACKAGE__->can( "diff" );
after diff => sub { ... };
I'd suggest that you re-architect your system to use traits instead of relying on AUTOLOAD behavior. Maintainability and intent will be much more obvious if you don't have behavior scattered all over the place.
As an example, you can do what you want with something like the following:
package Trait::CVSActions;
use Moose::Role;

sub commit { print 'in commit for ' . shift . "\n" }
sub diff   { print 'diffing for '   . shift . "\n" }

package Module;
use Moose;
with 'Trait::CVSActions';

package Path;
use Moose;
with 'Trait::CVSActions';

after commit => sub { print "after commit on Path\n" };

package main;

my $module = Module->new;
my $path   = Path->new;

$module->commit;
$path->commit;
If you're looking to use AUTOLOAD to dispatch to unknown commands, this is dangerous: there may be commands that need special handling which you aren't aware of, and you may be storing up future problems for yourself.
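One way to contain that risk, sketched below, is to dispatch only from an explicit whitelist and croak on anything unknown. Here run_cvs_command() is an assumed helper standing in for however the real class executes CVS commands:
package VCS::Object;   # illustrative package name
use strict;
use warnings;
use Carp;

my %known_cvs_command = map { $_ => 1 } qw(diff commit update log status);

our $AUTOLOAD;
sub AUTOLOAD {
    my $self = shift;
    (my $command = $AUTOLOAD) =~ s/.*:://;
    return if $command eq 'DESTROY';   # never treat DESTROY as a CVS command
    croak "Unknown CVS command '$command'"
        unless $known_cvs_command{$command};
    return $self->run_cvs_command($command, @_);   # assumed helper
}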
I have a module misc which is used by a few scripts. Each script accepts two standard options (-help and -verbose) as well as a bunch of its own.
So, every script now has
my ($verbose,$quiet) = (1,0);
my $help = undef;
...
GetOptions("verbose|v+" => \$verbose, "quiet|q+" => \$quiet, "help|h" => \$help,
....)
or die "GetOptions: error (see above)\n";
if (defined $help) { usage(); exit; }
$verbose -= $quiet;
which is already boring.
Now, I want the misc functions to be aware of the value of $verbose too, so I have to move $verbose et al to misc and use $misc::verbose in scripts:
misc:
our $verbose = 1;
my $quiet = 0;
our $help = undef;
our %common_options = ("verbose|v+" => \$verbose, "quiet|q+" => \$quiet, "help|h" => \$help);
sub set_verbose () { $verbose -= $quiet; }
script:
GetOptions(\%misc::common_options,"verbose|v","quiet|q","help|h",
"count=i" => \$count, "start=i" => \$start, "mincoverage=i" => \$mincoverage,
"output=s" => \$output, "input=s" => \$input, "targets=s" => \$targets)
or die "GetOptions: error (see above)\n";
if (defined $misc::help) { usage(); exit; }
misc::set_verbose();
which does not look much better (and appears not to work anyway; at least -help is ignored).
So, what do people use for command line options shared between modules and scripts?
Personally, I do it simpler:
use a hash to store command line options
GetOptions(\%args, "xxx1","xxx2");
Pass that hash - as is - to ANY class's constructor or module setters
my $object = Class1->new(%args, %constructor_args);
Module2::set_args(\%args);
Argument setter in the module would be:
# Module2
our %args;
sub set_args { my ($href) = @_; %args = %$href; }
This ensures that:
I NEVER have to worry about moving parameters from scope to scope and having to modify some calls. They are ALL passed along to 100% of the places that need them.
Neat and non-messy code. Since we are broadcasting, we don't need to worry about subscribers' individual needs.
Pattern easily replicated to ALL classes you own.
As a matter of fact, if you want to be extra crafty, you can even replace the Module2::set_args(\%args); calls for several classes with smart code that:
Reads in a list of loaded modules
Checks which of those modules implements set_args() call via UNIVERSAL::can()
Calls the supporting modules' set_args() call
The latter makes the code even cleaner, in that the N calls to set_args(), one per module, are all replaced by a single set_all_modules_args() call.
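A sketch of that set_all_modules_args() call, using %INC as the list of loaded modules (the broadcast logic is as described above; everything else is assumed):
sub set_all_modules_args {
    my ($args) = @_;
    for my $file (keys %INC) {
        next unless $file =~ /\.pm\z/;     # only proper modules
        (my $pkg = $file) =~ s{/}{::}g;    # Foo/Bar.pm -> Foo::Bar.pm
        $pkg =~ s/\.pm\z//;                # Foo::Bar.pm -> Foo::Bar
        my $setter = $pkg->can('set_args');
        $setter->($args) if $setter;       # broadcast to supporting modules
    }
}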
Have a module that's responsible for getting standard options.
Use this module, and everyone will be able to access the verbose() call.
package StandardOptions;

use base qw(Exporter);
our @EXPORT = qw(verbose);

use Getopt::Long;
Getopt::Long::Configure(qw(pass_through));

my $helpNeeded;
my $verbose;

GetOptions("help" => \$helpNeeded, "verbose" => \$verbose);

if ($helpNeeded) {
    # provide help
    exit(0);
}

sub verbose {
    return $verbose;
}

1;
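A script using it then needs nothing more than this (sketch):
use StandardOptions;

print "being chatty\n" if verbose();
Because of the pass_through configuration, any options the module doesn't recognize stay in @ARGV, so the script can still run its own GetOptions afterwards.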
I'm writing a Perl script that will run simultaneously N times. The script needs to process a list, and each element of the list must be processed exactly once.
Can I rely on DB_File to mark which elements are processed or being processed? I can make a hash out of the list and store it in the file.
If not, what is the best way to implement this?
Rather than using Berkeley DB, why not just use something like
Parallel::Fork::BossWorker? I've been happily using it for several years to do what you're describing.
Update
Nothing wrong with Berkeley DB per se, but it strikes me that you'd need to write a bunch of queue-management code, whereas a module like BossWorker takes care of all that for you (and lets you concentrate on the real problem).
As an example, I use it to monitor network switches where checking them serially takes too long (especially if one or more switches are having issues) and checking them in parallel buries the monitoring box. The stripped down version looks like:
use strict;
use warnings;
use Parallel::Fork::BossWorker;

my %opts   = get_options();
my %config = read_config_file($opts{config});
my $worker_count = $opts{POLLING_WORKER_PROCESSES} || 3;
my @results;

# Other setup/initialization stuff... yada, yada, yada

# Set up the BossWorker
my $bw = Parallel::Fork::BossWorker->new(
    work_handler   => \&do_work,
    result_handler => \&check_result,
    worker_count   => $worker_count,
);

# Populate the work queue
foreach my $switch (@switches) {
    $bw->add_work({ switch => $switch, config => \%config, opts => \%opts });
}

# Process the work in the queue
$bw->process();

# Once the workers are done, do something with the results
exit;

########################################################################

sub check_result {
    my $result = shift;
    if ($result) {
        push @results, $result;
    }
}

sub do_work {
    my $work   = shift;
    my $switch = $work->{switch};
    my %config = %{ $work->{config} };
    my %opts   = %{ $work->{opts} };
    my $result = '';

    # Do something...

    return $result;
}
I have written a Perl script that starts an SNMP session, extracts data/counters, and writes their values to a CSV file. There are 7 Perl scripts with different properties/definitions/variables at the top, but the engine is the same.
At this point, those 7 Perl scripts are redundant except for the defined variables. Is there a way to keep each script as a small properties/execution file and keep the engine in another file? Each properties/execution script would call the engine, using the properties defined in its own script.
So in short, I want each script to hold its own variables (and act as the entry point), but call a shared function from a unified "engine".
i.e.
retrieve_mibs1.pl
retrieve_mibs2.pl
retrieve_mibs3.pl
retrieve_mibs4.pl
retrieve_mibs5.pl
retrieve_mibs6.pl
retrieve_mibs7.pl
retrieve_mibs1.pl
#!/usr/local/bin/perl
use Net::SNMP;

##DEFINITION START
my @Servers = (
    'server1',
    'server2',
);
my $PORT      = 161;
my $COMMUNITY = 'secret';
my $BASEOID   = '1.2.3.4.5.6.7.8';
my $SEPARATOR = ',';   # assumed: not shown in the original listing, but the engine uses it (output is CSV)
my $COUNTERS  = [
    [11, 'TotalIncomingFromPPH'],
    [12, 'TotalFailedIncomingFromPPH'],
];

##ENGINE START
sub main {
    my $stamp = gmtime();
    my @oids = ();
    foreach my $counter (@$COUNTERS) {
        push @oids, "$BASEOID.$$counter[0].0";
    }
    foreach my $server (@Servers) {
        print "$stamp$SEPARATOR$server";
        my ($session, $error) = Net::SNMP->session(
            -version   => 1,
            -hostname  => $server,
            -port      => $PORT,
            -community => $COMMUNITY,
        );
        if ($session) {
            my $result = $session->get_request(-varbindlist => \@oids);
            if (defined $result) {
                foreach my $oid (@oids) {
                    print $SEPARATOR, $result->{$oid};
                }
            } else {
                print STDERR "$stamp Request error: ", $session->error, "\n";
                print "$SEPARATOR-1" x scalar(@oids);
            }
        } else {
            print STDERR "$stamp Session error: $error\n";
            print "$SEPARATOR-1" x scalar(@oids);
        }
        print "\n";
    }
}

main();
You could do it using eval or require: set up the variables in one file, then load the engine file so that its content is evaluated.
variables.pl (set up your variables and call the engine):
use warnings;
use strict;
use Carp;
use English '-no_match_vars';
require "engine.pl"; # so that we can call it's subs
# DEFINITION START
our $VAR1 = "Hello";
our $VAR2 = "World";
# CALL THE ENGINE
print "START ENGINE:\n";
engine(); # call engine
print "DONE\n";
engine.pl (the actual working stuff):
sub engine {
    print "INSIDE ENGINE\n";
    print "Var1: $VAR1\n";
    print "Var2: $VAR2\n";
}

1; # return a true value
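If you literally want the eval-of-the-content variant mentioned at the top, rather than require, a sketch would be:
# Read engine.pl and evaluate its content; similar in effect to require,
# but without the %INC bookkeeping or the loaded-only-once guard.
open my $fh, '<', 'engine.pl' or die "Cannot open engine.pl: $!";
my $code = do { local $/; <$fh> };   # slurp the whole file
close $fh;

eval $code;
die "engine.pl failed: $@" if $@;
Note that a string eval compiles in the caller's lexical scope, so pragmas such as strict in the calling file also apply to the evaluated code; that is one reason require is usually the better choice.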
Other alternatives would be:
pass the definitions as command-line parameters directly to engine.pl and evaluate the contents of @ARGV
write a Perl module containing the engine and use this module
store the parameters in a config file and read it in from your engine (e.g. using Config::IniFiles; see the sketch below)
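For the config-file alternative, a sketch with Config::IniFiles (the file name, section, and key names are all made up):
use strict;
use warnings;
use Config::IniFiles;

# Assuming retrieve_mibs1.ini contains something like:
#   [snmp]
#   servers   = server1, server2
#   port      = 161
#   community = secret
my $cfg = Config::IniFiles->new( -file => 'retrieve_mibs1.ini' )
    or die "Cannot read config: @Config::IniFiles::errors";

my @Servers   = split /\s*,\s*/, $cfg->val( 'snmp', 'servers' );
my $PORT      = $cfg->val( 'snmp', 'port' );
my $COMMUNITY = $cfg->val( 'snmp', 'community' );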
Two thoughts come to mind immediately:
Build a Perl module for your common code, and then require or use the module as your needs dictate. (The difference is mostly whether you want to run LynxLee::run_servers() or run_servers() -- do you want the module to influence your current scope or not.)
Use symbolic links: create symlinks retrieve_mibs1.pl -> retrieve_mibs.pl, retrieve_mibs2.pl -> retrieve_mibs.pl, and so on, then set the variables based on the program name:
#!/usr/bin/perl -w
use File::Basename;

my $name = basename($0);
my (@Servers, $PORT, $COMMUNITY, $BASEOID, $COUNTERS);

if ($name =~ /retrieve_mibs1\.pl/) {
    @Servers = (
        'server1',
        'server2',
    );
    # ...
} elsif ($name =~ /retrieve_mibs2\.pl/) {
    @Servers = (
        'server3',
        'server4',
    );
    # ...
}
Indexing into a hash with the name of the program to retrieve the parameters would be much cleaner, but I'm not so good at Perl references. :)
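For the record, that hash-based lookup might look like this (a sketch; the per-script parameters are invented to match the question):
use strict;
use warnings;
use File::Basename;

my %config_for = (
    'retrieve_mibs1.pl' => {
        servers => [ 'server1', 'server2' ],
        baseoid => '1.2.3.4.5.6.7.8',
    },
    'retrieve_mibs2.pl' => {
        servers => [ 'server3', 'server4' ],
        baseoid => '1.2.3.4.5.6.7.8',
    },
);

my $cfg = $config_for{ basename($0) }
    or die 'No configuration for ' . basename($0) . "\n";
my @Servers = @{ $cfg->{servers} };
my $BASEOID = $cfg->{baseoid};
Nothing beyond "arrow into a hash of hashes" is needed, and adding an eighth script becomes a new hash entry instead of a new elsif branch.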
I'm not sure what the problem is so I'm guessing a little. You have code in various places that is the same each time save for some variables. This is the very definition of a subroutine.
Maybe the problem is that you don't know how to include the common code in those various scripts. This is fairly easy: you write that code in a Perl module, which is basically a file ending in .pm instead of .pl. Of course you have to take care of a few things, such as exporting your functions. Perldoc should be of great help.