I am trying to use Parallel::ForkManager to control some child processes. I would like to limit the number of processes running concurrently to 10. In total I need to run 20.
I know I could set the process limit at 10 in the first line at the object declaration, but I am also using the $pm object to run child processes that do something different (the current function is much more memory intensive so needs to be limited).
The code I have currently does not work: the run_on_finish callback is never made, so the remaining 10 children never get forked. I don't understand why this is the case. I'd have thought the child would still call the finish code on exit and decrement the count, but the "if" statement seems to stop this. Could someone explain why?
Thanks for any help!
# Parallel declarations
my $active_jobs = 0;
my $total_jobs  = 0;

my $pm = Parallel::ForkManager->new(30);
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_str_ref) = @_;
    --$active_jobs;
});

while ($total_jobs < 20) {
    sleep 300 and next if $active_jobs > 10;
    my $pid = $pm->start and ++$active_jobs and ++$total_jobs and next;
    my $return = module::function(%args);
    $pm->finish(0, { index => $total_jobs, return => $return });
}
print STDERR "Submitted all jobs, now waiting for children to exit.\n";
$pm->wait_all_children();
First, the why: run_on_finish callbacks run in the parent, and only when the parent reaps a child. P::FM reaps inside start (when it has to block because the process limit has been reached) and inside wait_all_children. Your limit of 30 is never reached, and your loop sleeps instead of reaping, so $active_jobs is never decremented.

I'm going to call "type 2" the jobs that are limited to 10.
This is how I'd do it with P::FM:
use strict;
use warnings;

use List::Util            qw( shuffle );
use Parallel::ForkManager qw( );
use POSIX                 qw( WNOHANG );
use Time::HiRes           qw( sleep );

use constant MAX_WORKERS       => 30;
use constant MAX_TYPE2_WORKERS => 10;

sub is_type2_job { $_[0]{type} == 2 }

my @jobs = shuffle(
    ( map { { type => 1, data => $_ } } 0..19 ),
    ( map { { type => 2, data => $_ } } 0..19 ),
);

my $pm = Parallel::ForkManager->new(MAX_WORKERS);

my $type2_count = 0;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $job) = @_;
    --$type2_count if is_type2_job($job);
    print "Finished: $pid, $job->{type}, $job->{data}, $job->{result}\n";
});
my @postponed_jobs;
while (@postponed_jobs || @jobs) {
    my $job;
    if (@postponed_jobs && $type2_count < MAX_TYPE2_WORKERS) {
        $job = shift(@postponed_jobs);
    }
    elsif (@jobs) {
        $job = shift(@jobs);
        if ($type2_count >= MAX_TYPE2_WORKERS && is_type2_job($job)) {
            push @postponed_jobs, $job;
            redo;
        }
    }
    # elsif (@postponed_jobs) {
    #     # Already max type 2 jobs being processed,
    #     # but there are idle workers.
    #     $job = shift(@postponed_jobs);
    # }
    else {
        # Nothing we can launch right now: wait briefly for a child
        # to exit, reap it (which fires run_on_finish), then retry.
        local $SIG{CHLD} = sub { };
        select(undef, undef, undef, 0.300);
        $pm->wait_one_child(WNOHANG);
        redo;
    }

    ++$type2_count if is_type2_job($job);
    my $pid = $pm->start and next;
    $job->{result} = $job->{data} + 100;  # Or whatever.
    $pm->finish(0, $job);
}

$pm->wait_all_children();
But this is broken. The code that picks the next job should be done in the middle of start (i.e. after it waits for children to finish, but before it forks), not before start. This could cause jobs to be run out of order. This isn't the first time I've wished P::FM had a pre-fork callback. Maybe you could ask the maintainer for one. A partial workaround is sketched below.
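In the meantime, one mitigation (a sketch building on the code above, untested): drain all already-finished children at the top of every iteration, so the counts are as fresh as possible when a job is picked. wait_one_child is the same semi-internal P::FM method used above, and returns the reaped pid (or 0/-1 when there is nothing to reap).

use POSIX qw( WNOHANG );

while (@postponed_jobs || @jobs) {
    # Reap every child that has already exited. This fires
    # run_on_finish, keeping $type2_count current before we
    # decide which job to launch next.
    1 while $pm->wait_one_child(WNOHANG) > 0;

    # ... pick and launch $job exactly as above ...
}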
I'm trying to implement an alarm in my Perl backend process so that it will terminate if it gets stuck for too long. I tried to implement the code given on the alarm documentation page on Perldoc (this is verbatim from the documentation other than the line that calls my program's key subroutine instead of the sample line in the documentation):
eval {
    local $SIG{ALRM} = sub { die "alarm\n" };  # NB: \n required
    alarm $timeout;
    &FaithTree::Backend::commandLine({ 'skipWidgets' => $skipWidgets, 'commandLineId' => $commandLineId, 'force' => $force });
    alarm 0;
};
if ($@) {
    die unless $@ eq "alarm\n";  # propagate unexpected errors
    # timed out
}
else {
    # didn't
}
Given this code, nothing happens when the alarm should have timed out. On the other hand, if I remove the custom definition for $SIG{ALRM} (which, again, came straight from the Perl documentation) the alarm does fire, just without the custom handler.
I'm wondering if the fact that I'm using Thread::Queue is playing a role in the alarm failing, but that doesn't explain why it works so long as I skip redefining $SIG{ALRM}.
Here's a minimal, runnable version with a subroutine that is intentionally an infinite loop for testing:
eval {
    $SIG{ALRM} = sub { die "alarm\n" };  # NB: \n required
    alarm 1;
    &FaithTree::Test::Backend::commandLine({ 'skipWidgets' => $skipWidgets, 'commandLineId' => $commandLineId, 'force' => $force });
    alarm 0;
};
if ($@) {
    die unless $@ eq "alarm\n";  # propagate unexpected errors
    # timed out
}
else {
    exit;
}
package FaithTree::Test::Backend;

use feature qw( say );
use File::Tail;
use threads;
use threads::shared;
use Thread::Queue;

sub commandLine {
    our $N //= 4;
    my $Q = Thread::Queue->new;
    my @kids = map threads->create( \&FaithTree::Test::Backend::fetchChild, $Q ), 1 .. $N;
    my @feeds = ( "1", "2", "3", "4" );

    foreach my $feed (@feeds) {
        $Q->enqueue( $feed );
    }

    $Q->enqueue( ( undef ) x $N );
    $_->join for @kids;
}

sub fetchChild {
    print "Test";

    # Access queue.
    my $Q = shift;

    # What is my thread id?
    my $tid = threads->tid();

    my ($num, $num2) = (0, 0);
    for ( ; ; ) {
        if ($num2 == 10000) {
            say STDERR $tid . ': ' . $num;
            $num2 = 0;
        }
        $num++;
        $num2++;
    }

    return 1;
}
If you comment out the $SIG{ALRM} line, it will terminate when the alarm is set to time out. If you leave it in place, it will never terminate.
Signals and threads don't mix well. You might want to rethink your use of signals. For example, you could move all the thread stuff to a child process.
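A sketch of that last idea (my own illustration; do_threaded_work() is a hypothetical stand-in for the real threaded job): run the threads in a forked child, so the parent's alarm only has to interrupt a plain waitpid rather than XS thread code.

my $timeout = 60;

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    do_threaded_work();   # all the threads are confined to this process
    exit 0;
}

# Parent: no threads here, so the signal handler fires reliably.
local $SIG{ALRM} = sub { kill 'KILL', $pid };
alarm $timeout;
waitpid $pid, 0;          # returns once the child exits or is killed
alarm 0;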
Signal handlers are only called between Perl ops. The main thread is in a call to XS sub thread->join, and the signal handler will be called once join returns.
Most blocking system calls can be interrupted (returning error EINTR), so it might be possible to write a signal-aware version of join. Except I seem to remember pthread functions not being interruptible, so maybe not.
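One workaround along those lines (my sketch, not from the original answer): instead of blocking in join, poll for joinable threads with short sleeps, so control returns to Perl-level code often enough for the ALRM handler to fire. The die from the handler then propagates out of this loop to the surrounding eval, which is what the original code expected.

use Time::HiRes qw( sleep );

# A signal-aware replacement for: $_->join for @kids;
while (threads->list()) {                             # non-joined threads left?
    $_->join() for threads->list(threads::joinable);  # reap finished ones
    sleep 0.1;  # a Perl-level op, so pending signals get delivered here
}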
In this particular case, you could have the threads signal the main thread when they finish, using a mechanism that allows the main thread to block until a signal occurs or a timeout has occurred. cond_signal/cond_timedwait is such a system.
use threads;
use threads::shared;

use Sub::ScopeFinalizer qw( scope_finalizer );
use Thread::Queue       qw( );
use Time::HiRes         qw( time );

my $N = 4;  # worker count, matching the question's default

my $lock :shared;
my $threads_remaining = 0;

my $Q = Thread::Queue->new;
my @threads;
{
    lock $lock;
    for (1..$N) {
        ++$threads_remaining;
        push @threads, async {
            # Decrement the counter and wake the main thread
            # when this worker exits, however it exits.
            my $guard = scope_finalizer {
                lock $lock;
                --$threads_remaining;
                cond_signal($lock);
            };

            worker($Q);
        };
    }
}

my $max_end_time = time + 1;

# ...

{
    lock $lock;
    while ($threads_remaining) {
        if (!cond_timedwait($lock, $max_end_time)) {
            # ... Handle timeout ...
        }
    }
}

$_->join for @threads;
My code is like below:
use threads;
use threads::shared;
use Thread::Queue;

my $q = Thread::Queue->new();
my @threads = ();
my $run :shared = 1;

$SIG{'TERM'} = sub {
    $run = 0;
    $q->enqueue('exit');
    foreach (@threads) {
        $_->join();
    }
};

push @threads, threads->create(proc1);
push @threads, threads->create(proc2);

sub proc1 {
    while ($p = $q->dequeue()) {
        if ($p eq 'exit') {
            last;
        }
        .....
    }
    $q->end();
    threads->exit();
}

sub proc2 {
    while ($run) {
        .....
    }
}
On a TERM signal, I'm trying to wait until all threads have ended. However, whenever I send a TERM signal, my program terminates with an error:

Segmentation fault

How do I fix this?
Assuming threads->create(proc1) even works (and that would only be because you didn't use use strict; as you should), then your program exits immediately after creating the threads. You need to have your main thread wait for the children threads to finish.
Fixing that problem (and applying some simplifications) results in the following:
use strict;
use warnings;

use threads;
use threads::shared;
use Thread::Queue 3.01 qw( );

my $q = Thread::Queue->new();
my $run :shared = 1;

$SIG{TERM} = sub {
    print("Received SIGTERM. Shutting down...\n");
    $run = 0;
    $q->end();
};

print("$$\n");

threads->create(\&proc1);
threads->create(\&proc2);

$_->join() for threads->list();

sub proc1 {
    while (my $p = $q->dequeue()) {
        sleep 1;  # Placeholder
    }
}

sub proc2 {
    while ($run) {
        sleep 1;  # Placeholder
    }
}
I don't get a seg fault, but the program doesn't exit either. The signal handler is simply not called. It's because Perl is waiting for join to return before calling the signal handler. You can solve that by polling the list of joinable threads. In other words, replace
$_->join() for threads->list();
with
my $running_threads = 2;
while ($running_threads) {
    for my $thread (threads->list(threads::joinable)) {
        $thread->join();
        $running_threads--;
    }
    sleep 1;
}
I am trying to use Parallel::ForkManager to run jobs in parallel, but unfortunately the subroutine parallel does not return any entries.
sub parallel {
    my ($self, $values) = @_;
    my %hash;
    my $pm = Parallel::ForkManager->new(200);
    foreach my $IP ( keys %{$values} ) {
        my $pid = $pm->start and next;
        $hash{$IP} = $self->getData($IP);
        $pm->finish(0, \$hash{$IP});
    }
    $pm->wait_all_children;
    return %hash;
}

print Dumper( parallel(%data) );
What am I doing wrong? Any ideas?
Forking is the creation of a new process that's a copy of the current process. Changing a variable in one process doesn't change similarly named variables in other processes.
You modify the child's process's %hash, but you're dumping the parent's process's %hash.
P::FM does provide a mechanism for passing data back to the parent process. It's documented under the heading "RETRIEVING DATASTRUCTURES from child processes".
use Data::Dumper          qw( Dumper );
use Parallel::ForkManager qw( );

use constant MAX_WORKERS => 200;

my %hash;

my $pm = Parallel::ForkManager->new(MAX_WORKERS);
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $result_ref) = @_;
    my $IP = $ident;  # The argument passed to ->start() below.
    warn("Child $IP killed by signal $exit_signal"), return if $exit_signal;
    warn("Child $IP exited with error $exit_code"),  return if $exit_code;
    warn("Child $IP encountered an unknown error"),  return if !$result_ref;

    $hash{$IP} = $$result_ref;
});

for my $IP (keys %$values) {
    my $pid = $pm->start($IP) and next;
    $pm->finish(0, \$self->getData($IP));
}

$pm->wait_all_children();

print(Dumper(\%hash));
I am starting to learn AnyEvent and am having some trouble with it.
I have completely misunderstood how it's possible to get an asynchronous benefit. For example:
#!/usr/bin/env perl

package LatencySub;

use strict;
use warnings;

use AnyEvent;

# Sub to emulate latency - is this the right way?
sub do_delay {
    my ($name, $delay) = (@_);

    my $cv = AE::cv;
    my $timer = AE::timer $delay, 0, sub { $cv->send() };
    $cv->recv;

    return $name . ' proceed, delay is ' . $delay;
};

package main;

use 5.12.0;
use warnings;

use Smart::Comments;
use AnyEvent;

my @list = (
    { name => 'first',  delay => 1 },
    { name => 'second', delay => 1 },
    { name => 'third',  delay => 2 }
);

sub process_cb {
    my ( $name, $delay, $cb ) = @_;
    my $result = LatencySub::do_delay( $name, $delay );
    $cb->($result);
}

my %result;

my $cv = AE::cv;

# outer loop
$cv->begin (sub { shift->send (\%result) });

my $before_time = AE::time;
### foreach start...
foreach my $entity (@list) {
    $cv->begin;
    process_cb(
        $entity->{'name'},
        $entity->{'delay'},
        sub {
            $result{ $entity->{'name'} } = shift;
            $cv->end;
        }
    );
}
### foreach end...
$cv->end;

my $time_all = AE::time - $before_time;
### $time_all
### %result
The output I get is:
### foreach start...
### foreach end...
### $time_all: '4.02105116844177'
### %result: {
### first => 'first proceed, delay is 1',
### second => 'second proceed, delay is 1',
### third => 'third proceed, delay is 2'
### }
The sum of all the delays (1+1+2) equals $time_all: 4 seconds.
So, no concurrency benefit at all.
Why is that, and how can I (if it's possible) write the callback the "right" way?
Don't use condvars except to block your top-level program while waiting for events to complete. Using condvars makes it very difficult to reuse code; any function that has a condvar internally can never be safely used in a program that has another function that has a condvar in it. (This is not true if you never call recv and only use cb. But still... it's dangerous and not for those that don't know what they're doing.)
My rule: if the filename ends .pm, no condvars!
If you want to run multiple things in parallel and run some more code once all the results are available, try Event::Join:
use feature qw( say );

use AnyEvent;
use Event::Join;

sub delay($$) {
    # Return the timer guard; the caller must keep it alive,
    # or AnyEvent cancels the timer.
    AnyEvent->timer( after => $_[0], cb => $_[1] );
}

my $join = Event::Join->new(
    on_completion => sub { say "Everything is done" },
    events        => [qw/t1 t2 t3/],
);

my @timers = (
    delay( 1, $join->event_sender_for('t1') ),
    delay( 2, $join->event_sender_for('t2') ),
    delay( 3, $join->event_sender_for('t3') ),
);
Then, after 3 seconds, you'll see "everything is done". Event::Join is like begin and end on condvars, but can never block your program. So it's easy to reuse code that uses it. Also, events are named, so you can collect results as a hash instead of just calling a callback when other callbacks are called.
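Since events are named, the results can come back as a hash. A sketch of that follows; the detail that on_completion receives the hash of collected event values, and that a value passed when sending an event ends up in it, is my reading of the paragraph above and the module's synopsis, so treat it as an assumption:

my $collector = Event::Join->new(
    events        => [qw/t1 t2 t3/],
    on_completion => sub {
        my $results = shift;  # assumed: { t1 => ..., t2 => ..., t3 => ... }
        say "$_ => $results->{$_}" for sort keys %$results;
    },
);

# Wrap each sender so it sends a value when its timer fires.
my @guards = (
    delay( 1, sub { $collector->event_sender_for('t1')->('one done')   } ),
    delay( 2, sub { $collector->event_sender_for('t2')->('two done')   } ),
    delay( 3, sub { $collector->event_sender_for('t3')->('three done') } ),
);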
The call $cv->recv will block until ->send is called, so do_delay() takes $delay secs to return.
Here is an example of starting three timers in parallel and waiting for all of them to complete. The whole thing takes about 5 seconds (the longest delay), not 12 (the sum):
use strict;
use warnings;

use AnyEvent;

sub make_delay {
    my ($name, $delay, $cv) = (@_);
    $cv->begin;
    return AE::timer $delay, 0, sub { warn "done with $name\n"; $cv->end };
}

my $cv = AE::cv;

my @timers = (
    make_delay("t1", 3, $cv),
    make_delay("t2", 5, $cv),
    make_delay("t3", 4, $cv),
);

$cv->recv;
I'm setting something up to SSH out to several servers in 'batches'. I basically want to maintain 5 connections at a time, and when one finishes open up another (following an array of server IPs).
I'm wondering, for something like this, should I be using fork()? If so, what logic can I use to ensure that I maintain 5 children at a time?
Forking (or threading) is what you want, but you should look on CPAN for modules that provide most of what you need, to keep you from reinventing the wheel and going through the learning pains on your own.
For example, Parallel::ForkManager looks like it's EXACTLY what you want.
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

foreach my $data (@all_data) {
    # Forks and returns the pid for the child:
    my $pid = $pm->start and next;

    ... do some work with $data in the child process ...

    $pm->finish;  # Terminates the child process
}
There are several modules that solve exactly this problem. See Parallel::ForkManager, Forks::Super, or Proc::Queue, for example.
If the parallel jobs are SSH connections specifically, Net::OpenSSH::Parallel can maintain the connection limit for you:

use Net::OpenSSH::Parallel;

my $pssh = Net::OpenSSH::Parallel->new(connections => 5);
for my $ip (@ips) {
    $pssh->add_host($ip);
}

$pssh->push('*', command => 'do this');
$pssh->push('*', command => 'do that');
$pssh->push('*', scp_get => 'foo', 'bar-%HOST%');
$pssh->push('*', scp_put => 'doz', 'there');

$pssh->run;
My personal forking(!) favourite is Proc::Fork
General overview from pod:
use Proc::Fork;

run_fork {
    child {
        # child code goes here.
    }
    parent {
        my $child_pid = shift;
        # parent code goes here.
        waitpid $child_pid, 0;
    }
    retry {
        my $attempts = shift;
        # what to do if fork() fails:
        # return true to try again, false to abort
        return if $attempts > 5;
        sleep 1, return 1;
    }
    error {
        # Error-handling code goes here
        # (fork() failed and the retry block returned false)
    }
};
And to limit the maximum number of processes running for something like SSH batches, this should do the trick:
use strict;
use warnings;
use 5.010;

use POSIX qw(:sys_wait_h);
use Proc::Fork;

my $max = 5;
my %pids;

my @ssh_files = (
    sub { system "scp file0001 baz\@foo:/somedir/." },
    # ...
    sub { system "scp file9999 baz\@foo:/somedir/." },
);

while (my $proc = shift @ssh_files) {
    # max limit reached
    while ($max == keys %pids) {
        # loop through the pid list until a child is released
        for my $pid (keys %pids) {
            if (my $kid = waitpid($pid, WNOHANG)) {
                delete $pids{$kid};
                last;
            }
        }
        sleep 1 if $max == keys %pids;  # avoid a busy spin
    }

    run_fork {
        parent {
            my $child = shift;
            $pids{$child} = 1;
        }
        child {
            $proc->();
            exit;
        }
    };
}