Parallel::ForkManager doesn't work with Perl 5.36 - perl

I have a script that runs well with Perl < 5.36:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use Parallel::ForkManager;

sub execute {
    my $command = shift;
    print "Executing Command: $command\n";
    if (system($command) != 0) {
        my $fail_filename = "$0.fail";
        print "$command failed.\n";
        die;
    }
}

sub run_parallel {
    my $cmd = shift;
    my $manager = new Parallel::ForkManager(2);
    foreach my $command (@{ $cmd }) {
        $manager->start and next;
        execute( $command );
        $manager->finish;
    }
    $manager->wait_all_children; # necessary after all lists
}

my @commands = ('echo "a"', 'echo "b"', 'echo "c"', 'which ls', 'which rm');
run_parallel(\@commands);
but when I make a minor change to the above for 5.36:
#!/usr/bin/env perl
use 5.036;
use warnings FATAL => 'all';
use autodie ':all';
use Parallel::ForkManager;

sub execute {
    my $command = shift;
    print "Executing Command: $command\n";
    if (system($command) != 0) {
        my $fail_filename = "$0.fail";
        print "$command failed.\n";
        die;
    }
}

sub run_parallel {
    my $cmd = shift;
    my $manager = new Parallel::ForkManager(2);
    foreach my $command (@{ $cmd }) {
        $manager->start and next;
        execute( $command );
        $manager->finish;
    }
    $manager->wait_all_children; # necessary after all lists
}

my @commands = ('echo "a"', 'echo "b"', 'echo "c"', 'which ls', 'which rm');
run_parallel(\@commands);
I get an error:
Bareword found where operator expected at debug.pl line 20, near "new Parallel::ForkManager"
All I changed was the use 5.036 line.
Is Parallel::ForkManager incompatible with Perl 5.36, or am I doing something wrong?

Perl v5.36 with use v5.36 turns off indirect object notation, where the method comes before the invocant:
my $p = new Some::Module; # indirect object notation
my $p = Some::Module->new(); # what you should do
If this is inconvenient for you in the short term, you can require the minimum version instead; require checks the version at runtime without loading the feature bundle, so indirect notation keeps working:
require v5.36;
If you don't actually use v5.36 features, also consider requiring the minimum version that your code actually needs. In your snippet, I don't immediately see any minimum version requirement (other than just Perl 5).

Loading the 5.36 feature bundle (which you do through use 5.036;) disables the indirect method call syntax as if you had done no feature qw( indirect );.
This is a method call using the indirect syntax:
METHODNAME INVOCANT ARGS
Either re-enable the feature or use the "direct" syntax:
INVOCANT->METHODNAME( ARGS )
In your case,
my $manager = Parallel::ForkManager->new( 2 );
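If you want to keep the old spelling for now, the bundle's change can also be undone lexically, as the answer notes. Here is a minimal sketch, assuming a perl of at least 5.36; Counter is a hypothetical stand-in class so the example does not depend on Parallel::ForkManager being installed:

```perl
use v5.36;                # disables 'indirect', enables strict/warnings/say
use feature 'indirect';   # opt back in to indirect method-call syntax

# A stand-in class so the example is self-contained.
package Counter { sub new { my ($class, $n) = @_; bless { n => $n }, $class } }

my $c = new Counter(2);   # parses again, same as Counter->new(2)
say ref $c;               # Counter
```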


Overwriting a function defined in a module but before used in its runtime phase?

Let's take something very simple,
# Foo.pm
package Foo {
    my $baz = bar();
    sub bar { 42 }; ## Overwrite this
    print $baz; ## Before this is executed
}
Is there any way that I can, from test.pl, run code that changes what $baz is set to and causes Foo.pm to print something else to the screen?
# maybe something here
use Foo;
# maybe something here
Is it possible with the compiler phases to force the above to print 7?
A hack is required because require (and thus use) both compiles and executes the module before returning.
Same goes for eval. eval can't be used to compile code without also executing it.
The least intrusive solution I've found would be to override DB::postponed. This is called before evaluating a compiled required file. Unfortunately, it's only called when debugging (perl -d).
Another solution would be to read the file, modify it and evaluate the modified file, kinda like the following does:
use File::Slurper qw( read_binary );

eval(read_binary("Foo.pm") . <<'__EOS__') or die $@;
package Foo {
    no warnings qw( redefine );
    sub bar { 7 }
}
__EOS__
The above doesn't properly set %INC, it messes up the file name used by warnings and such, it doesn't call DB::postponed, etc. The following is a more robust solution:
use IO::Unread qw( unread );
use Path::Class qw( dir );

BEGIN {
    my $preamble = '
        UNITCHECK {
            no warnings qw( redefine );
            *Foo::bar = sub { 7 };
        }
    ';
    my @libs = @INC;
    unshift @INC, sub {
        my (undef, $fn) = @_;
        return undef if $_[1] ne 'Foo.pm';
        for my $qfn (map dir($_)->file($fn), @libs) {
            open(my $fh, '<', $qfn)
                or do {
                    next if $!{ENOENT};
                    die $!;
                };
            unread $fh, "$preamble\n#line 1 $qfn\n";
            return $fh;
        }
        return undef;
    };
}

use Foo;
I used UNITCHECK (which is called after compilation but before execution) because I prepended the override (using unread) rather than reading the whole file in and appending the new definition. If you want to use that latter approach, you can get a file handle to return using
open(my $fh_for_perl, '<', \$modified_code);
return $fh_for_perl;
Kudos to @Grinnz for mentioning @INC hooks.
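The compile-then-run ordering that makes the UNITCHECK trick work can be seen in a tiny self-contained demo; nothing here is specific to Foo.pm:

```perl
use strict;
use warnings;

our @order;
BEGIN     { push @order, 'BEGIN' }       # runs while this unit is still compiling
UNITCHECK { push @order, 'UNITCHECK' }   # runs once the unit is compiled, before it executes
push @order, 'runtime';                  # ordinary run-time statement

print "@order\n";    # BEGIN UNITCHECK runtime
```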
Since the only options here are going to be deeply hacky, what we really want here is to run code after the subroutine has been added to the %Foo:: stash:
use strict;
use warnings;
# bless a coderef and run it on destruction
package RunOnDestruct {
    sub new { my $class = shift; bless shift, $class }
    sub DESTROY { my $self = shift; $self->() }
}

use Variable::Magic 0.58 qw(wizard cast dispell);
use Scalar::Util 'weaken';

BEGIN {
    my $wiz;
    $wiz = wizard(store => sub {
        return undef unless $_[2] eq 'bar';
        dispell %Foo::, $wiz; # avoid infinite recursion
        # Variable::Magic will destroy the returned object *after* the store
        return RunOnDestruct->new(sub { no warnings 'redefine'; *Foo::bar = sub { 7 } });
    });
    cast %Foo::, $wiz;
    weaken $wiz; # avoid memory leak from self-reference
}

use lib::relative '.';
use Foo;
This will emit some warnings, but prints 7:
sub Foo::bar {}
BEGIN {
    $SIG{__WARN__} = sub {
        *Foo::bar = sub { 7 };
    };
}
First, we define Foo::bar. Its value will be redefined by the declaration in Foo.pm, but that triggers the "Subroutine Foo::bar redefined" warning, which calls the signal handler, which redefines the subroutine again to return 7.
Here is a solution that combines hooking the module loading process with the readonly-making capabilities of the Readonly module:
$ cat Foo.pm
package Foo {
    my $baz = bar();
    sub bar { 42 }; ## Overwrite this
    print $baz; ## Before this is executed
}
$ cat test.pl
#!/usr/bin/perl
use strict;
use warnings;
use lib qw(.);
use Path::Tiny;
use Readonly;

BEGIN {
    my @remap = (
        '$Foo::{bar} => \&mybar'
    );
    my $pre = join ' ', map "Readonly::Scalar $_;", @remap;
    my @inc = @INC;
    unshift @INC, sub {
        return undef if $_[1] ne 'Foo.pm';
        my ($pm) = grep { $_->is_file && -r } map { path $_, $_[1] } @inc
            or return undef;
        open my $fh, '<', \($pre . "#line 1 $pm\n" . $pm->slurp_raw);
        return $fh;
    };
}

sub mybar { 5 }
use Foo;
$ ./test.pl
5
I have revised my solution here, so that it no longer relies on Readonly.pm, after learning that I had missed a very simple alternative, based on m-conrad's answer, which I have reworked into the modular approach that I had started here.
Foo.pm (Same as in the opening post)
package Foo {
    my $baz = bar();
    sub bar { 42 }; ## Overwrite this
    print $baz; ## Before this is executed
}
# Note, even though print normally returns true, a final line of 1; is recommended.
OverrideSubs.pm Updated
package OverrideSubs;
use strict;
use warnings;
use Path::Tiny;
use List::Util qw(first);

sub import {
    my (undef, %overrides) = @_;
    my $default_pkg = caller; # Default namespace when unspecified.
    my %remap;
    for my $what (keys %overrides) {
        ( my $with = $overrides{$what} ) =~ s/^([^:]+)$/${default_pkg}::$1/;
        my $what_pkg  = $what =~ /^(.*)\:\:/ ? $1 : $default_pkg;
        my $what_file = ( join '/', split /\:\:/, $what_pkg ) . '.pm';
        push @{ $remap{$what_file} }, "*$what = *$with";
    }
    my @inc = grep !ref, @INC; # Filter out any existing hooks; strings only.
    unshift @INC, sub {
        my $remap = $remap{ $_[1] } or return undef;
        my $pre = join ';', @$remap;
        my $pm = first { $_->is_file && -r } map { path $_, $_[1] } @inc
            or return undef;
        # Prepend code to override subroutine(s) and reset line numbering.
        open my $fh, '<', \( $pre . ";\n#line 1 $pm\n" . $pm->slurp_raw );
        return $fh;
    };
}

1;
test-run.pl
#!/usr/bin/env perl
use strict;
use warnings;
use lib qw(.); # Needed for newer Perls, which exclude . from @INC by default.
use OverrideSubs
    'Foo::bar' => 'mybar';
sub mybar { 5 } # This can appear before or after 'use OverrideSubs',
                # but must appear before 'use Foo'.
use Foo;
Run and output:
$ ./test-run.pl
5
If the sub bar inside Foo.pm has a different prototype than an existing Foo::bar function, Perl won't overwrite it? That seems to be the case, and makes the solution pretty simple:
# test.pl
BEGIN { *Foo::bar = sub () { 7 } }
use Foo;
or kind of the same thing
# test.pl
package Foo { use constant bar => 7 };
use Foo;
Update: no, the reason this works is that Perl won't redefine a "constant" subroutine (with prototype ()), so this is only a viable solution if your mock function is constant.
Let's have a golf contest!
sub _override { 7 }

BEGIN {
    my ($pm) = grep -f, map "$_/Foo.pm", @INC or die "Foo.pm not found";
    open my $fh, "<", $pm or die;
    local $/ = undef;
    eval "*Foo::bar = *main::_override;\n#line 1 $pm\n" . <$fh> or die $@;
    $INC{'Foo.pm'} = $pm;
}
use Foo;
This just prefixes the module's code with a replacement of the method, which will be the first line of code that runs after the compilation phase and before the execution phase.
Then, fill in the %INC entry so that future loads of use Foo don't pull in the original.
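The %INC guard can be seen in isolation: require consults the table before reading any file, so pre-claiming the slot stops a later load. 'Foo.pm' here is just the hypothetical module name from the question; no such file needs to exist:

```perl
use strict;
use warnings;

# Pre-claim the module's %INC slot before anything tries to load it.
BEGIN { $INC{'Foo.pm'} = 'preloaded' }

# Because %INC already has an entry, this does not search @INC at all.
require Foo;
print "no file was read\n";
```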

Is it possible to use Perl's Getopt::Std in 2 packages?

[perl script]
perl_script.pl -b "HELLO" -v

use package1;
use package2;
my $argb = GetPackage1Option;
my $argv = GetPackage2Option;
print "$argb\n"; # Print -b argument
print "v is found!\n" if ( GetPackage2Option ); # Print

[package1.pm]
use Getopt::Std;
require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw( GetPackage1Option );

sub GetPackage1Option {
    getopt('b');
    our ($opt_b);
    my $argb = $opt_b || '';
    return $argb;
}

[package2.pm]
use Getopt::Std;
require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw( GetPackage2Option );

sub GetPackage2Option {
    my $bool = 0;
    my %opt;
    getopts('v', \%opt);
    return ( defined $opt{v} ) ? $opt{v} : $bool;
}
I am calling 2 functions from 2 separate packages and each function uses a different option.
I tried this but I am only getting the option for -b. The -v option works if I remove the functions that use -b.
Apologies, this is only a rough summary of my original code.
Please let me know if my question is confusing. Thanks :)
Getopt::Std operates on and "consumes" the global @ARGV array. To run getopt or getopts multiple times, you could use local to make a temporary copy of @ARGV.
sub GetPackage1Option {
    local @ARGV = @ARGV;
    getopt('b');
    our ($opt_b);
    my $argb = $opt_b || '';
    return $argb;
} # end of scope, original @ARGV restored
Though like Sinan and simbabque say, this is an unorthodox design that you should probably reconsider.
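A self-contained sketch of the copy-and-restore approach, with option names mirroring the question; note the second parser also declares -b so that -b's value is not misread as a stray switch:

```perl
use strict;
use warnings;
use Getopt::Std;

sub GetPackage1Option {
    local @ARGV = @ARGV;   # getopt() consumes its input, so work on a copy
    getopt('b');           # -b takes a value; other switches become booleans
    our $opt_b;
    return $opt_b // '';
}

sub GetPackage2Option {
    local @ARGV = @ARGV;
    my %opt;
    getopts('vb:', \%opt); # declare -b here too so its value isn't misparsed
    return $opt{v} // 0;
}

@ARGV = ('-b', 'HELLO', '-v');     # stand-in command line for the demo
print GetPackage1Option(), "\n";   # HELLO
print "v is found!\n" if GetPackage2Option();
```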
The easiest way out of the conundrum is for command line argument parsing to happen at the entry point of your program where the values of the arguments can be saved in a hash, where you can provide default values for those not specified on the command line etc. Then, you can decide how to communicate the values each module is interested in (e.g. package variables or constructor arguments).
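A sketch of that layout; Package1/Package2 and their constructor arguments are hypothetical stand-ins for the real modules:

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical modules that receive values instead of parsing @ARGV themselves.
package Package1 { sub new { my ($class, %arg) = @_; bless {%arg}, $class } }
package Package2 { sub new { my ($class, %arg) = @_; bless {%arg}, $class } }

package main;

@ARGV = ('-b', 'HELLO', '-v');   # stand-in command line for the demo

# Parse once, at the entry point, with defaults for anything not supplied.
my %opt;
getopts('b:v', \%opt);
$opt{b} //= '';
$opt{v} //= 0;

# Hand each module only the values it cares about.
my $one = Package1->new(b       => $opt{b});
my $two = Package2->new(verbose => $opt{v});
print $one->{b}, "\n";               # HELLO
print "v is found!\n" if $two->{verbose};
```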
I would strongly recommend against this kind of action at a distance, but here's how one might go about automating the setting of values of package variables in different modules in response to values given on the command line. Each package is responsible for declaring the command line switches in which it is interested.
#!/usr/bin/env perl
use strict;
use warnings;

package X;
our @PACKAGE_OPTS = qw( v x: );
our ($OPT_V, $OPT_X);
sub func {
    print "$OPT_V\t$OPT_X\n";
}

package Y;
our @PACKAGE_OPTS = qw( t y: z );
our ($OPT_T, $OPT_Y, $OPT_Z);
sub func {
    print join("\t", $OPT_T, $OPT_Y, $OPT_Z), "\n";
}

package main;
use Getopt::Std;

main();

sub main {
    init_opts(qw( X Y ));
    X::func();
    Y::func();
}

sub init_opts {
    my %opts;
    my $opt_string;
    my %opt_to_package;
    for my $pkg ( @_ ) {
        my @opts = eval "\@${pkg}::PACKAGE_OPTS";
        $opt_string .= join '', @opts;
        for my $opt ( @opts ) {
            (my $var = $opt) =~ s/[^A-Za-z]+//g;
            $opts{$var} = '-not provided-';
            $opt_to_package{$var} = $pkg;
        }
    }
    getopts($opt_string, \%opts);
    for my $opt (keys %opts) {
        my $pkg = $opt_to_package{$opt};
        my $var = 'OPT_' . uc $opt;
        eval "\$${pkg}::${var} = \$opts{\$opt}";
    }
    return;
}
Output:
C:\...\Temp> perl tt.pl -zvx a -y b
1 a
-not provided- b 1

Functional interface to IO::Compress::Gzip is not handling arguments correctly

Here is a simple example to illustrate the issue I am seeing when trying to use IO::Compress::Gzip:
use strict;
use warnings;

eval {
    require IO::Compress::Gzip;
    IO::Compress::Gzip->import();
    1;
} or do {
    my $error = $@;
    die "\nERROR: Couldn't load IO::Compress::Gzip" if $error;
};

my $input = shift;
my $out = $input . ".gz";
print "Defined!\n" if defined $out;

IO::Compress::Gzip::gzip $input => $out
    or die "gzip failed: $!\n";
This generates the following error:
Defined!
Use of uninitialized value $_[1] in string eq at /home/statonse/perl/perlbrew/perls/perl-5.22.0/lib/5.22.0/IO/Compress/Base/Common.pm line 280.
IO::Compress::Gzip::gzip: output filename is undef or null string at test.pl line 17.
However, if I use the object interface:
use strict;
use warnings;

eval {
    require IO::Compress::Gzip;
    IO::Compress::Gzip->import();
    1;
} or do {
    my $error = $@;
    die "\nERROR: Couldn't load IO::Compress::Gzip" if $error;
};

my $input = shift;
my $out = $input . ".gz";
print "Defined!\n" if defined $out;

my $z = new IO::Compress::Gzip $out
    or die "IO::Compress::Gzip failed: $!\n";
$z->print($input);
It works just fine. For some context, it would work as normal if I imported the module with use:
use strict;
use warnings;
use IO::Compress::Gzip;

my $input = shift;
my $out = $input . ".gz";
IO::Compress::Gzip::gzip $input => $out
    or die "gzip failed: $!\n";
but I am trying to avoid that since this library is rarely used in the application. Is there something obvious I am doing wrong or is this a behavior specific to this module?
This line:
IO::Compress::Gzip::gzip $input => $out
is parsed differently depending on whether the parser knows that there is a function called IO::Compress::Gzip::gzip or not.
When you load the library with use, its functions are known to the parser before the rest of your program is parsed (because use is a type of BEGIN). In this case the parser chooses the interpretation you want.
In the other case, it chooses the alternate interpretation: indirect object syntax, equivalent to $input->IO::Compress::Gzip::gzip($out).
You can see this for yourself by running perl -MO=Deparse on the different versions of your program.
The fix is to make the function call explicit with parentheses:
IO::Compress::Gzip::gzip($input, $out)
The parser can't misinterpret that.

Perl print out all subs arguments at every call at runtime

I'm looking for a way to debug-print each subroutine call from the namespace Myapp::* (i.e. without dumping the CPAN modules), but without having to edit every .pm file manually to insert some module or print statement.
I'm just learning (better to say: trying to understand) the package DB, which allows me to trace execution (using the shebang #!/usr/bin/perl -d:Mytrace):
package DB;
use 5.010;

sub DB {
    my ($package, $file, $line) = caller;
    my $code = \@{"::_<$file"};
    print STDERR "--> $file $line $code->[$line]";
}

#sub sub {
#    print STDERR "$sub\n";
#    &$sub;
#}

1;
and I'm looking for a way to use the sub hook to print the actual arguments of each called sub from the Myapp::* namespace.
Or is there some easier (common) method to combine the execution line-tracer DB::DB with a dump of each subroutine call's arguments (and its return values, if possible)?
I don't know if it counts as "easier" in any sane meaning of the word, but you can walk the symbol table and wrap all functions in code that prints their arguments and return values. Here's an example of how it might be done:
#!/usr/bin/env perl
use 5.14.2;
use warnings;

package Foo;
sub first {
    my ( $m, $n ) = @_;
    return $m + $n;
}
sub second {
    my ( $m, $n ) = @_;
    return $m * $n;
}

package main;
no warnings 'redefine';

for my $k (keys %{ $::{'Foo::'} }) {
    my $orig = *{ $::{'Foo::'}{$k} }{CODE};
    $::{'Foo::'}{$k} = sub {
        say "Args: @_";
        unless (wantarray) {
            my $r = $orig->(@_);
            say "Scalar return: $r";
            return $r;
        }
        else {
            my @r = $orig->(@_);
            say "List return: @r";
            return @r;
        }
    };
}

say Foo::first(2, 3);
say Foo::second(4, 6);

What is currently the most comfortable and reliable cross-platform Perl module to do parallel downloads?

I'm going to have to download a number of datasets by simply POSTing to a URL and getting XML in return. I will be able to speed this up by doing more than one request at a time, but here's the hook:
It will need to run on both Windows and Linux, so threads and forks are both out. (Since this is purely IO-bound, I don't think they're needed either.)
Additionally, my coworkers aren't at a very high level of Perl understanding, but need to be able to grasp how to use it (not necessarily what's going on; usage is fine). As such I'd be happy if its API was somewhat simple.
Right now I'm looking at IO::Lambda for this.
Any other suggestions?
Post-Mortem: Based on draegtun's suggestion I've now thrown together this, which does the job perfectly: https://gist.github.com/661386 You might see it on CPAN soonish.
Have a look at AnyEvent::HTTP. According to the CPAN testers platform matrix it does compile & work on Windows.
Below is a straightforward example of async POSTing (http_post).
use 5.012;
use warnings;
use AnyEvent::HTTP;

my $cv = AnyEvent->condvar;

my @urls = (
    [google => 'http://google.com', 'some body'],
    [yahoo  => 'http://yahoo.com' , 'any body' ],
);

for my $site (@urls) {
    my ($name, $url, $body) = @$site;
    $cv->begin;
    http_post $url, $body => sub {
        my $xml = shift;
        do_something_with_this( $name, $xml );
        $cv->end;
    };
}

# wait till all finished
$cv->recv;
say "Finished";

sub do_something_with_this { say @_ }
NB. Remember, whatever you decide to do in do_something_with_this, try to avoid anything that blocks. See the other non-blocking AnyEvent modules.
You can try to use LWP::Parallel.
Update:
I just tried to build it on Windows XP with ActiveState's 5.10.1 and encountered a bunch of test failures, some of which are due to the TEST script blindly prepending .. to all entries in @INC, and others seem to be due to a version mismatch with LWP::Protocol::* classes.
This is a concern. I might go with Parallel::ForkManager in conjunction with LWP.
#!/usr/bin/perl
use strict; use warnings;

use Config::Std { def_sep => '=' };
use File::Slurp;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use Parallel::ForkManager;

die "No config file specified\n" unless @ARGV;
my ($ini) = @ARGV;

read_config $ini, my %config;

my $pm = Parallel::ForkManager->new(10);

my @urls = @{ $config{''}{url} };

for my $url ( @urls ) {
    $pm->start and next;
    my $param = [ %{ $config{$url} } ];

    my $request = POST $url, $param;
    my $ua = LWP::UserAgent->new;

    my $fn = sprintf '%s-%s-%s.xml',
        map $request->$_, qw( method uri content );
    $fn =~ s/\W+/_/g;

    my $response = $ua->request( $request );

    if ( $response->code == 200 ) {
        write_file $fn, \ $response->as_string;
    }
    else {
        warn $response->message, "\n";
    }
    $pm->finish;
}

$pm->wait_all_children;
Here is a sample config file:
url = http://one.example.com/search
url = http://two.example.com/query
url = http://three.example.com/question
[http://one.example.com/search]
keyword = Perl
limit = 20
[http://two.example.com/query]
type = Who is
limit = 10
[http://three.example.com/question]
use = Perl
result = profit
Update:
If you need to convince yourself that execution is not serial, try the following short script:
#!/usr/bin/perl
use strict; use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(2);

for my $sub (1 .. 4) {
    $pm->start and next;
    for my $i ('a' .. 'd') {
        sleep rand 3;
        print "[$sub]: $i\n";
    }
    $pm->finish;
}
$pm->wait_all_children;
Output:
[1]: a
[1]: b
[2]: a
[1]: c
[1]: d
[2]: b
[3]: a
[3]: b
[3]: c
[2]: c
[3]: d
[2]: d
[4]: a
[4]: b
[4]: c
[4]: d
Regarding your comment about "reliability", I believe it's misguided. What you are doing is simulated by the following script:
#!/usr/bin/perl
use strict; use warnings;
use Parallel::ForkManager;
use YAML;

my @responses = parallel_run();
print Dump \@responses;

sub parallel_run {
    my $pm = Parallel::ForkManager->new(2);
    my @responses;
    for my $sub (1 .. 4) {
        $pm->start and next;
        for my $i ('a' .. 'd') {
            sleep rand 3;
            push @responses, "[$sub]: $i";
        }
        $pm->finish;
    }
    $pm->wait_all_children;
    return @responses;
}
The output you get from that will be:
--- []
It is up to you to figure out why. That's why Parallel::ForkManager allows you to register callbacks. Just like the ones you are using with AnyEvent::HTTP.
What module you use is your own business. Just don't keep making blatantly false statements.
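The callback mechanism alluded to above can be sketched as follows: a child hands a reference to finish, and the parent's run_on_finish callback receives it (this needs Parallel::ForkManager 0.7.6 or later, which added data passing):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @responses;
my $pm = Parallel::ForkManager->new(2);

# Collect each child's data in the parent as the child exits.
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident, $signal, $core, $data) = @_;
    push @responses, @$data if $data;
});

for my $sub (1 .. 4) {
    $pm->start and next;
    my @out = map { "[$sub]: $_" } 'a' .. 'd';
    $pm->finish(0, \@out);   # second argument is serialized back to the parent
}
$pm->wait_all_children;

print scalar @responses, "\n";   # 16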
Mojo::UserAgent can also do async parallel HTTP. Its API might be easier for non-Perl people to understand than some of the other modules.
Not sure if it qualifies as "reliable" yet, though.