Perl process queue

I have a Perl script which forks a number of sub-processes. I'd like to have some kind of functionality like xargs --max-procs=4 --max-args=1 or make -j 4, where Perl will keep a given number of processes running until it runs out of work.
It's easy to fork four processes, wait for them all to complete, and then fork another four, but I'd like to keep four (or n) processes running at the same time, forking a new process as soon as one completes.
Is there a simple way in Perl to implement such a process pool?

Forks::Super can handle this requirement.
use Forks::Super MAX_PROC => 5, ON_BUSY => 'block';   # or ON_BUSY => 'queue'
Calls to fork() can block until the number of active subprocesses falls below 5, or you can pass additional parameters to the fork call and the tasks to perform can queue up:
fork { sub => sub { ... task to run in subprocess ... } }
When one subprocess finishes, another job on the queue will start up.
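For instance, here is a minimal sketch of queue mode; the job count and job bodies are illustrative, not from the question:
use Forks::Super MAX_PROC => 4, ON_BUSY => 'queue';

for my $n (1 .. 10) {
    # each call returns immediately; jobs beyond MAX_PROC wait on the queue
    fork { sub => sub { print "job $n running in $$\n"; sleep 2 } };
}
waitall;   # reap every job, queued or running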
(I am the author of this module).

Check out Parallel::ForkManager -- it does much of what you describe. You can set a maximum number of processes, and the callback function could start a new child as soon as one finishes (as long as there is work to do).
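For instance, a minimal sketch; the task list and per-task work here are illustrative placeholders:
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);   # at most 4 children at a time

for my $task (1 .. 10) {
    $pm->start and next;   # parent: start() returns the child's PID, move on
    # --- child code ---
    print "processing task $task in $$\n";
    sleep 2;
    $pm->finish;           # child exits here
}
$pm->wait_all_children;    # block until every child has been reaped
Because start() blocks while four children are alive, a new child is forked as soon as one finishes, which is exactly the pool behavior asked for.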

While I would almost always use a CPAN module, or write something with the fantastic AnyEvent modules, I think it's important to understand how these things work under the hood. Here's an example that has no dependencies other than Perl itself. The same approach could also be written in C without too much trouble.
#!/usr/bin/env perl
use strict;
use warnings;

## run a function in a forked process
sub background (&) {
    my $code = shift;
    my $pid = fork;
    die "can't fork: $!" unless defined $pid;
    if ($pid) {        # parent: hand back the child's pid
        return $pid;
    } else {           # child: run the code, then exit
        $code->();
        exit;
    }
}

my @work = ('sleep 30') x 8;
my %pids = ();

## prime the pool with four workers
for (1 .. 4) {
    my $w = shift @work;
    my $pid = background {
        exec $w;
    };
    $pids{$pid} = $w;
}

## as each worker exits, start another; waitpid returns -1 once
## there are no children left
while ((my $pid = waitpid(-1, 0)) > 0) {
    if ($?) {
        if ($? & 127) {
            warn "child died with signal " . ($? & 127);
        } else {
            warn "child exited with value " . ($? >> 8);
        }
        ## redo work that died or got killed
        my $npid = background {
            exec $pids{$pid};
        };
        $pids{$npid} = delete $pids{$pid};
    } else {
        delete $pids{$pid};
        ## send more work if there is any
        if (my $w = shift @work) {
            my $pid = background {
                exec $w;
            };
            $pids{$pid} = $w;
        }
    }
}


Understanding how Perl fork works

What would be the right way to fork processes so that each of them runs a different subroutine (sub1, sub2, ..., subN)? After reading a lot of previous threads and material, I feel that I understand the logic, but I'm a bit confused about how to write it in the cleanest way possible (readability is important to me).
Consider 4 subs, each of which gets different arguments. It feels like the most efficient way would be to create 4 forks, each of which runs a different sub. The code will look something like this:
my $forks = 0;
foreach my $i (1..4) {
    if ($i == 1) {
        my $pid = fork();
        if ($pid == 0) {
            run1();
            exit;
        }
        $forks++;
    } elsif ($i == 2) {
        my $pid = fork();
        if ($pid == 0) {
            run2();
            exit;
        }
        $forks++;
    } elsif ($i == 3) {
        my $pid = fork();
        if ($pid == 0) {
            run3();
            exit;
        }
        $forks++;
    } elsif ($i == 4) {
        my $pid = fork();
        if ($pid == 0) {
            run4();
            exit;
        }
        $forks++;
    }
}
for (1 .. $forks) {
    my $pid = wait();
    print "Parent saw $pid exiting\n";
}
print "done\n";
Some points:
This will work only if all of the forks succeed. But I would like to run the subs even if a fork fails (even though it will not be parallel then). In that case, I guess we need to take the subs out of the if and exit only if $pid is 0. Something like:
my $pid = fork();
run1();
$forks++ if ($pid == 0);
exit if ($pid == 0);
But it still doesn't feel right.
Is using exit the right way to end the child process? If the children were ended with exit, should I still use wait? Will that prevent zombies?
Maybe the most interesting question: what do I do if we have 15 function calls? I would like to somehow create 15 forks, but I can't write 15 if-else statements; the code will not be readable that way. At first I thought it would be possible to put those function calls into an array (somehow) and loop over that array, but after some research I didn't find a way to do it.
If possible, I prefer not to use any additional modules like Parallel::ForkManager.
Is there a clean and simple way to solve it?
There are a few questions to clear up here.
A basic example
use warnings;
use strict;
use feature 'say';

my @coderefs;
for my $i (1..4) {
    push @coderefs, sub {
        my @args = @_;
        say "Sub #$i with args: @args";
    };
}

my @procs;
for my $i (0 .. $#coderefs) {
    my $pid = fork // do {
        warn "Can't fork: $!";
        # retry, or record which subs failed so as to run them later
        next;
    };
    if ($pid == 0) {
        $coderefs[$i]->("In $$: $i");
        exit;
    }
    push @procs, $pid;
    #sleep 1;
}
say "Started: @procs";

for my $pid (@procs) {
    my $goner = waitpid $pid, 0;
    say "$goner exited with $?";
}
We generate anonymous subroutines and store those code references in an array, then go through that array and start that many processes, running a sub in each. After that, the parent waitpids on them in the order in which they were started, but normally you'll want to reap them as they exit (see the sketch below, and the docs listed further down).
A child process always exits, or you'd have multiple processes executing all of the rest of the code in the program. Once a child process exits, the kernel notifies the parent, and the parent can "pick up" that notification ("reap" the exit status of the child process) via wait/waitpid, or use a signal handler to handle or ignore it.
If the parent never does this, then after the child exits the OS is left holding the information about the (exited) child process in the process table; that's a zombie. So you do need to wait, so that the OS gets done with the child process (and so you can check how it went). Or, you can indicate in a signal handler that you don't care about the child's exit.† Modern systems often reap would-be zombies eventually, but not always, and you cannot rely on that; clean up after yourself.
Note, you'll need to be reading perlipc, fork, wait and waitpid, perlvar ... and yet other resources that'll come up while working on all this. It will take a little playing and some trial and error. Once you get it all down you may want to start using modules, at least for some types of tasks.
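For instance, a minimal sketch of reaping children in whatever order they exit, instead of the start order used above:
# reap in exit order; wait() returns -1 once no children remain
while ((my $pid = wait()) != -1) {
    say "Reaped $pid, exit status: ", $? >> 8;
}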
† To ignore SIGCHLD, so that children are reaped automatically:
$SIG{CHLD} = 'IGNORE';
Or, you can run code in the handler instead (but it's well advised to keep it minimal):
$SIG{CHLD} = sub { ... };
These signal "dispositions" are inherited in fork-ed processes (but not via execve).
See the docs listed above, the basics of the %SIG variable in perlvar, and man(7) signal. All this is generally *nix business.
%SIG is a global variable, affecting all code in the interpreter. In order to limit the change to the nearest enclosing scope, use local:
local $SIG{CHLD} = ...
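For instance, a minimal sketch of scoping the change:
{
    local $SIG{CHLD} = 'IGNORE';   # children are auto-reaped inside this block
    # fork here without needing wait()/waitpid()
}
# the previous disposition is restored once the scope is left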

Perl (Tk): how to run a system command asynchronously and react to its output?

I'm writing a wrapper for an external command ("sox", if that helps) with Perl Tk.
I need to run it asynchronously, of course, to avoid blocking Tk's MainLoop().
But I need to read its output to notify the user about the command's progress.
I am testing a solution like this one, using IPC::Open3:
{
    $| = 1;
    $pid = open3(gensym, ">&STDERR", \*FH, $cmd)
        or error("Error running command \"$cmd\"");
}
while (defined($ch = FH->getc)) {
    notifyUser($ch) if ($ch =~ /$re/);
}
waitpid $pid, 0;
$retval = $? >> 8;
POSIX::close($_) for 3 .. 1024; # close all open handles (arbitrary upper bound)
But of course the while loop blocks MainLoop() until $cmd terminates.
Is there some way to read the output handle asynchronously?
Or should I go with the standard fork stuff?
The solution should work under win32, too.
For non-blocking read of a filehandle, take a look at Tk::fileevent.
Here's an example script showing how one can use a pipe, a forked process, and fileevent together:
use strict;
use IO::Pipe;
use Tk;

my $pipe = IO::Pipe->new;
if (!fork) { # Child XXX check for failed forks missing
    $pipe->writer;
    $pipe->autoflush(1);
    for (1..10) {
        print $pipe "something $_\n";
        select undef, undef, undef, 0.2;
    }
    exit;
}
$pipe->reader;

my $mw = tkinit;
my $text;
$mw->Label(-textvariable => \$text)->pack;
$mw->Button(-text => "Button", -command => sub { warn "Still working!" })->pack;
$mw->fileevent($pipe, 'readable', sub {
    if ($pipe->eof) {
        warn "EOF reached, closing pipe...";
        $mw->fileevent($pipe, 'readable', '');
        return;
    }
    warn "pipe is readable...\n";
    chomp(my $line = <$pipe>);
    $text = $line;
});
MainLoop;
Forking may or may not work under Windows. Also one needs to be cautious when forking within Tk; you must make sure that only one of the two processes is doing X11/GUI stuff, otherwise bad things will happen (X11 errors, crashes...). A good approach is to fork before creating the Tk MainWindow.

How can I kill forked processes that take too long in my perl script without timing out the forked process?

I've been using the following template for all of my forking/process needs when it comes to processing "things" in parallel. It basically loops through everything I need to process, X entries at a time, and times out any entries that take too long:
my $num_procs = 0;
foreach my $entry (@entries) {
    $num_procs++;
    if ($num_procs == $MAX_PROCS) {
        wait();
        $num_procs--;
    }
    my $pid = fork();
    if ($pid == 0) {
        process($entry);
        exit;   # child is done
    }
}
for (; $num_procs > 0; $num_procs--) {
    wait();
}
The "process" routine has the following template which times out the process:
my $TIMEOUT_IN_SECONDS = 15;
eval {
    local $SIG{ALRM} = sub { die "alarm" };
    alarm($TIMEOUT_IN_SECONDS);
    # do something
    alarm(0);
};
if ($@) {
    # do something about the timeout
}
I've now come across an issue where this no longer works because the child is unable to time itself out (I think this is due to an I/O blocking issue with NFS). The only way around this, I'm thinking, is for the parent itself to kill -9 the child.
Is there a way to modify my code to do this?
Whenever alarm is flaky, it's a good use case for the poor man's alarm:
my $pid = fork();
if ($pid == 0) {
    ...   # child code
    exit;
}
if (fork() == 0) {
    my $time = 15;
    exec($^X, "-e", "sleep 1,kill(0,$pid)||exit for 1..$time;kill -9,$pid");
    die; # shouldn't get here
}
The first fork fires off your child process. The second fork runs a watchdog process that polls the child and kills it after $time seconds.
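An alternative sketch, in case you'd rather keep the timeout logic in the parent itself: record each child's start time and SIGKILL any that overstay. The names (%started, $TIMEOUT) are illustrative, @entries and process() mirror the question's template, and the $MAX_PROCS throttling is omitted for brevity:
use strict;
use warnings;
use POSIX ":sys_wait_h";

my $TIMEOUT = 15;
my %started;                         # pid => epoch time the child began

for my $entry (@entries) {
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) { process($entry); exit }
    $started{$pid} = time;
}

while (%started) {
    my $pid = waitpid(-1, WNOHANG);  # non-blocking reap
    if ($pid > 0) { delete $started{$pid}; next; }
    # nothing exited; kill anything that has run too long
    for my $p (keys %started) {
        kill 'KILL', $p if time - $started{$p} > $TIMEOUT;
    }
    sleep 1;
}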

perl - child process signaling parent

I have written the following piece of code to test signaling between child and parent. Ideally, when the child sends a SIGINT to the parent, the parent should come back in the new iteration and wait for user input. This is what I observe in perl 5.8, but in perl 5.6.1 (which I am asked to use) the parent is actually "killed"; there is no next iteration.
my $parent_pid = $$;
$pid = fork();
if ($pid == 0)
{
    print "child started\n";
    kill 2, $parent_pid;
}
else
{
    while (1)
    {
        eval
        {
            $SIG{INT} = sub { die "GOTCHA"; };
            print 'inside parent' . "\n";
            $a = <>;
        };
        if ($@)
        {
            print "got the signal!!!!\n$@\n";
            next;
        }
    }
}
Could someone please suggest a workaround for this problem, or some other way to signal the parent so that it enters the new iteration?
The failure on 5.6.X might be because of the way Perl used to handle signals, which was fixed with 'Safe Signal Handling' in Perl 5.8.0. In either case, you are using a Perl which is practically archaeological and you should argue strongly to your masters that you should be using at least Perl 5.12, and ideally 5.14.
This is likely to be a race condition, caused by the child sending the SIGINT before the parent was ready for it. Remember that after you fork() you will have two independent processes, each might proceed at whatever pace it likes.
It's best in your case to set up the SIGINT handler before the fork() call, so you know it's definitely in place before the child tries to kill() its parent.
Here is the code with the handler moved (and with some minor corrections):
$SIG{INT} = sub { die "GOTCHA" };

my $parent_pid = $$;
defined( my $pid = fork() ) or die "Cannot fork() - $!";

if ($pid == 0)
{
    print "child started\n";
    kill INT => $parent_pid;
}
else
{
    while (1)
    {
        eval
        {
            print "inside parent\n";
            <>;
        };
        if ($@)
        {
            print "got the signal!!!!\n$@\n";
            next;
        }
    }
}

Killing child and its children when child was created using open

Here's my code, with error handling and other stuff removed for clarity:
sub launch_and_monitor {
    my ($script, $timeout) = @_;

    sub REAPER {
        while ((my $child = waitpid(-1, &WNOHANG)) > 0) {}
        $SIG{CHLD} = \&REAPER;
    }
    $SIG{CHLD} = \&REAPER;

    my $pid = fork;
    if (defined $pid) {
        if ($pid == 0) {
            # in child
            monitor($timeout);
        }
        else {
            launch($script);
        }
    }
}
The launch sub executes a shell script which in turn launches other processes, like so:
sub launch($) {
    my ($script) = @_;
    my $pid = open(PIPE, "$script|");
    # write pid to pidfile
    if ($pid != 0) {
        while (<PIPE>) {
            # do stuff with output
        }
        close(PIPE) or die $!;
    }
}
The monitor sub basically just waits for a specified period of time and then attempts to kill the shell script.
sub monitor($) {
    my ($timeout) = @_;
    sleep $timeout;
    # check if script is still running and if so get pid from pidfile
    if (...) {
        my $pid = getpid(...);
        kill 9, $pid;
    }
}
This kills the script; however, it does not kill any of its subprocesses. How can I fix that?
You can do this with process groups, if your operating system supports them. You need to make the script process become a process group leader. The child processes that it runs will inherit the process group from their parent. You can then use kill to send a signal to each process in the group at the same time.
In launch(), you will need to replace the open line with one that forks. Then in the child, you would call setpgrp() before exec'ing the command. Something like the following should work:
my $pid = open(PIPE, "-|");
if (0 == $pid) {
    setpgrp(0, 0);
    exec $script;
    die "exec failed: $!\n";
}
else {
    while (<PIPE>) {
        # do stuff with output
    }
    close(PIPE) or die $!;
}
Later, to kill the script process and its children, negate the process ID that you're signalling:
kill 9, -$pid;
In general, I don't think you can expect signals to be propagated into all child processes; this isn't specific to perl.
That said, you might be able to use the process group signal feature built into perl kill():
...if SIGNAL is negative, it kills process groups instead of processes...
You probably need to use setpgrp() on your (direct) child process, then change your kill call to something like:
kill -9, $pgrp;
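A self-contained sketch of the whole approach, assuming a POSIX system; the shell command is just a stand-in that spawns a grandchild:
use strict;
use warnings;

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    setpgrp(0, 0);                             # child becomes a process group leader
    exec "sh", "-c", "sleep 100 & sleep 100";  # shell plus a background grandchild
    die "exec failed: $!";
}
sleep 1;          # give the group a moment to start
kill 9, -$pid;    # negative PID: SIGKILL the entire group
waitpid $pid, 0;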
Try adding:
use POSIX qw(setsid);
setsid;
at the top of your launch_and_monitor function. This will put your processes in a separate session, and cause things to exit when the session leader (i.e. the master) exits.
Killing a process group works, but don't forget that the parent can also be killed on its own. Assuming the child processes have an event loop, they can check whether the parent end of a socketpair, created prior to the fork(), is still valid. In fact, select() returns cleanly when the parent socket is gone; all that needs to be done is to check the socket.
E.g.:
use strict; use warnings;
use Socket;

$SIG{CHLD} = sub {};

socketpair(my $p, my $c, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die $!;

print "parent $$, fork 2 kids\n";
for (0..1) {
    my $kid = fork();
    unless ($kid) {
        child_loop($p, $c);
        exit;
    }
    print "parent $$, forked kid $kid\n";
}
print "parent $$, waiting 5s\n";
sleep 5;
print "parent $$ exit, closing sockets\n";

sub child_loop {
    my ($p_s, $c_s) = @_;
    print "kid: $$\n";
    close($c_s);
    my $rin = '';
    vec($rin, fileno($p_s), 1) = 1;
    while (1) {
        select my $rout = $rin, undef, undef, undef;
        if (vec($rout, fileno($p_s), 1)) {
            print "kid: $$, parent gone, exiting\n";
            last;
        }
    }
}
Runs like this:
tim@mint:~$ perl ~/abc.pl
parent 5638, fork 2 kids
parent 5638, forked kid 5639
kid: 5639
parent 5638, forked kid 5640
parent 5638, waiting 5s
kid: 5640
parent 5638 exit, closing sockets
kid: 5640, parent gone, exiting
kid: 5639, parent gone, exiting
tim@mint:~$