Perl forking then stopping multiple child processes from parent - perl

In this scenario, I need my perl program to start multiple child processes that last an unknown amount of time, and in fact only the parent process knows when the child processes need to end. I've been trying to fork off more than one process then ending it from the parent but have been unsuccessful. What I have so far:
Successfully forking off one process then ending it
my $pid = fork();
if($pid == 0){
#do things in child process
}
else{
#examine external conditions, when the time is right:
kill 1, $pid;
}
Unsuccessfully trying to extend it to 2 processes:
my $pid = fork();
if($pid != 0){ #parent makes another fork
my $pid2 = fork();
}
if($pid == 0 || $pid2 = 0){
#do things in child process
}
else{
#examine external conditions, when the time is right:
kill 1, $pid;
kill 2, $pid;
}
I've read all the documentation on fork available on the internet, and it was all written about forking off one process which I understand pretty well, but I have no clue how to extend it to 2 or more processes, and would appreciate any help on how to do that.

Once you understand well what's going on in the first answer (but only then!), go have a look at Parallel::ForkManager (or something similar) for real work. There are many, many small niggling details that you can get wrong while working with child processes, so using a third-party module for that can save you a lot of time.

Follow this code, I hope the code is self explanatory:
my $num_process = 5; ## for as many you want, I tested with 5
my %processes; ## to store the list of children
for ( 1 .. $num_process ) {
my $pid = fork();
if ( not defined $pid ) {
die "Could not fork";
}
elseif ( $pid > 0 ) {
## boss
$processes{$pid} = 1;
}
else {
#do things in child process
## exit for child, dont forget this
exit;
}
}
## when things are right to kill ;-)
foreach my $pid ( keys %processes ) {
kill 1, $pid;
}

Related

Understanding how Perl fork works

What would be the right way to fork processes that each one of them runs a different subroutine sub1,sub2,...,subN. After reading a lot of previous thread and material, I feel that I understand the logic but a bit confused on how to write in the cleanest way possible (readability is important to me).
Consider 4 subs. Each one of them gets different arguments. It feels like that the most efficient way would be to create 7 forks that each one of them will run a different sub. The code will look something like this:
my $forks = 0;
foreach my $i (1..4) {
if ($i == 1) {
my $pid = fork();
if ($pid == 0) {
$forks++;
run1();
exit;
}
} elsif ($i == 2) {
my $pid = fork();
if ($pid == 0) {
$forks++;
run1();
exit;
}
} elsif ($i == 3) {
my $pid = fork();
if ($pid == 0) {
$forks++;
run1();
exit;
}
} elsif ($i == 4) {
my $pid = fork();
if ($pid == 0) {
$forks++;
run1();
exit;
}
}
}
for (1 .. $forks) {
my $pid = wait();
print "Parent saw $pid exiting\n";
}
print "done\n";
Some points:
This will work only if all of the forks were successful. But I would like to run the subs even though the fork failed (even though it will not be parallel. In that case, I guess we need to take the subs out of the if and exit only if the $pid wasn't 0. something like:
my $pid = fork();
run1();
$forks++ if ($pid == 0);
exit if ($pid == 0);
But it still feels not right.
Using exit is the right way to kill the child process? if the processes were killed with exit should I still use wait? Will it prevent zombies?
Maybe the most interesting question: What will I do if we have 15 function calls? I would like to somehow create 15 forks but I can't create 15 if-else statements - the code will not be readable that way. At first, I thought that it is possible to insert those function calls into an array (somehow) and loop over that array. But after some research, I didn't find a way that it is possible.
If possible, I prefer not to use any additional modules like Parallel::ForkManager.
Is there a clean and simple way to solve it?
There are a few questions to clear up here.
A basic example
use warnings;
use strict;
use feature 'say';
my #coderefs;
for my $i (1..4) {
push #coderefs, sub {
my #args = #_;
say "Sub #$i with args: #args";
};
}
my #procs;
for my $i (0 .. $#coderefs) {
my $pid = fork // do {
warn "Can't fork: $!";
# retry, or record which subs failed so to run later
next;
};
if ($pid == 0) {
$coderefs[$i]->("In $$: $i");
exit;
}
push #procs, $pid;
#sleep 1;
}
say "Started: #procs";
for my $pid (#procs) {
my $goner = waitpid $pid, 0;
say "$goner exited with $?";
}
We generate anonymous subroutines and store those code references in an array, then go through that array and start that many processes, running a sub in each. After that the parent waitpids on these in the order in which they were started, but normally you'll want to reap as they exit; see docs listed below.
A child process always exits, or you'd have multiple processes executing all of the rest of the code in the program. Once a child process exits the kernel will notify the parent, and the parent can "pick up" that notification ("reap" the exit status of the child process) via wait/waitpid, or use a signal handler to handle/ignore it.
If the parent never does this after the child exited, once it exits itself later the OS stays stuck with that information about the (exited) child process in the process table; that's a zombie. So you do need to wait, so that OS gets done with the child process (and you check up on how it went). Or, you can indicate in a signal handler that you don't care about the child's exit.† Modern systems reap would-be zombies but not always and you cannot rely on that; clean up after yourself.
Note, you'll need to be reading perlipc, fork, wait and waitpid, perlvar ... and yet other resources that'll come up while working on all this. It will take a little playing and some trial and error. Once you get it all down you may want to start using modules, at least for some types of tasks.
† To ignore the SIGCHLD (default)
$SIG{CHLD} = 'IGNORE';
Or, can run code there (but well advised to be minimal)
$SIG{CHLD} = sub { ... };
These signal "dispositions" are inherited in fork-ed processes (but not via execve).
See the docs listed above, and the basics of %SIG variable in perlvar. Also see man(7) signal. All this is generally *nix business.
This is a global variable, affecting all code in the interpreter. In order to limit the change to the nearest scope use local
local $SIG{CHLD} = ...

IPC communication between 2 processes with Perl

Let's say we have a 'Child' and 'Parent' process defined and subroutines
my $pid = fork;
die "fork failed: $!" unless defined($pid);
local $SIG{USR1} = sub {
kill KILL => $pid;
$SIG{USR1} = 'IGNORE';
kill USR1 => $$;
};
and we divide them, is it possible to do the following?
if($pid == 0){
sub1();
#switch to Parent process to execute sub4()
sub2();
#switch to Parent process to execute sub5()
sub3();
}
else
{
sub4();
#send message to child process so it executes sub2
sub5();
#send message to child process so it executes sub3
}
If yes, can you point how, or where can I look for the solution? Maybe a short example would suffice. :)
Thank you.
There is a whole page in the docs about inter process communication: perlipc
To answer your question - yes, there is a way to do what you want. The problem is, exactly what it is ... depends on your use case. I can't tell what you're trying to accomplish - what you you mean by 'switch to parent' for example?
But generally the simplest (in my opinion) is using pipes:
#!/usr/bin/env perl
use strict;
use warnings;
pipe ( my $reader, my $writer );
my $pid = fork(); #you should probably test for undef for fork failure.
if ( $pid == 0 ) {
## in child:
close ( $writer );
while ( my $line = <$reader> ) {
print "Child got $line\n";
}
}
else {
##in parent:
close ( $reader );
print {$writer} "Parent says hello!\n";
sleep 5;
}
Note: you may want to check your fork return codes - 0 means we're in the child - a number means we're in the parent, and undef means the fork failed.
Also: Your pipe will buffer - this might trip you over in some cases. It'll run to the end just fine, but you may not get IO when you think you should.
You can open pipes the other way around - for child->parent comms. Be slightly cautious when you multi-fork though, because an active pipe is inherited by every child of the fork - but it's not a broadcast.

How can I kill forked processes that take too long in my perl script without timing out the forked process?

I've been using the following template for all of my forking/processes needs when it comes to processing "things" in parallel. It basically loops through everything I need to process, X number of entries at a time, and time's out any entries that take too long:
my $num_procs = 0;
foreach my $entry (#entries) {
$num_procs++;
if($num_procs == $MAX_PROCS) {
wait();
$num_procs--;
}
my $pid = fork();
if($pid == 0) {
process($entry);
}
}
for (; $num_procs>0; $num_procs--) {
wait();
}
The "process" routine has the following template which times out the process:
my $TIMEOUT_IN_SECONDS = 15;
eval {
local $SIG{ALRM} = sub { die "alarm" };
alarm($TIMEOUT_IN_SECONDS);
# do something
alarm(0);
};
if ($#) {
# do something about the timeout
}
I've now come across an issue where this no longer works because the child is unable to time itself out. (I think this is due to an I/O blocking issue with NFS) The only way around this, I'm thinking, is for the parent itself to kill -9 the child.
Is there a way to modify my code to do this?
Whenever alarm can be flaky, it is a good use case for the poor man's alarm:
my $pid = fork();
if ($pid == 0) {
... # child code
exit;
}
if (fork() == 0) {
my $time = 15;
exec($^X, "-e", "sleep 1,kill(0,$pid)||exit for 1..$time;kill -9,$pid");
die; # shouldn't get here
}
The first fork fires off your child process. The second fork is for running a process to kill the first process after $time seconds.

can a perl child wait for another perl child to finish?

So I have a program where i spawn off some children who do some useful tasks. I then spawn off another child who needs to wait for the first children to stop before doing its work. The parent program then continues running and at the end waits for the last forked child to stop.
I'm getting an issue where the child who needs to wait on the others doesn't.
use strict;
use warnings;
use diagnostics;
my $pid1;
my $child1 = fork();
if ($child1) {
# parent
#print "pid is $pid, parent $$\n";
$pid1 = $child1;
} elsif ($child1 == 0) {
# child1
# do something
sleep 20;
print "Child1\n";
exit 0;
} else {
die "couldnt fork: $!\n";
}
my $pid2;
my $child2 = fork();
if ($child2) {
# parent
#print "pid is $pid, parent $$\n";
$pid2 = $child2;
} elsif ($child2 == 0) {
# child2
# wait for child1 to finish
my $tmp = waitpid($pid1, 0);
# do something else
print "Child2\n";
exit 0;
} else {
die "couldnt fork: $!\n";
}
# do more stuff
# wait for child2 to finish
my $tmp = waitpid($pid2, 0);
Is there an easy way to do this? Possibly without having to wrap the first child in the second?
In Unix-like systems, a given process can only wait for its own children to die. It can't wait for siblings, ancestors or grandchildren to die.
Spawn one child, have that process spawn sub-child processes, and then wait for them to finish before continuing in the first child process.
Then have the parent do its work and wait on the child process when it's ready to wait.
The easy way to do this is with Forks::Super.
use Forks::Super;
my $child1 = fork();
if ($child1 != 0) {
# ... parent code ...
} else {
# ... child code ...
exit;
}
my $child2 = fork {
depend_on => $child1,
on_busy => 'queue',
sub => sub {
# ... code to execute in 2nd child ...
}
};
# ... more code to run in the parent ...
# ... and at the end of the program:
waitall;
In Forks::Super, waitpid is still called in the parent (behind the scenes). But when the first child is finished, Forks::Super will know it is time to start launch the second child process in the background.
Assuming your perl has semop and
friends, you could use System V semaphores for synchronization among the children. See
below for a working example program.
We begin with the usual front matter. Rather than calling the low-level
semaphore operations directly, the code uses the built-in
IPC::SysV and
IPC::Semaphore modules.
#! /usr/bin/env perl
use strict;
use warnings;
use IPC::Semaphore;
use IPC::SysV qw/ IPC_PRIVATE S_IRUSR S_IWUSR IPC_CREAT /;
This program separates child processes into the two stages. Children in
the first stage run to completion, performing their processing with no
synchronization concerns. We can have arbitrarily many of these.
We have a single second-stage process, but it executes after all
first-stage children have finished.
Below are simple placeholder implementations.
# how many other children the last child must wait for
my $FIRST_STAGE_CHILDREN = 2;
sub first_stage {
my($id) = #_;
print "[$$] hello from child $id\n";
sleep rand 10;
print "[$$] child $id done\n";
}
sub second_stage {
print "[$$] hello from second-stage child!\n";
}
To implement synchronization between the first and second stages, the
program creates a set of semaphores whose size equals the number of
first-stage children. When a first-stage child completes, the program
releases the particular semaphore corresponding to that child.
my $sem = IPC::Semaphore->new(
IPC_PRIVATE, $FIRST_STAGE_CHILDREN,
S_IRUSR | S_IWUSR | IPC_CREAT)
or die "$0: failed to create semaphore: $!";
As we will see later, the second-stage child waits on his brethren by
attempting to decrement their semaphores. By starting the value at zero,
when the second-stage child attempts these decrements, the OS will put
the child to sleep because. Only after all first-stage children have
exited and released their semaphores will the system unblock the
second-stage child.
# start in blocked state
$sem->setall((0) x $FIRST_STAGE_CHILDREN);
First we fork the first-stage children. In this design, the parent
process does as much bookkeeping as possible. This keeps the definition
of first_stage and second_stage simple. Also, if a first-stage child
somehow exited without releasing its semaphore, the second stage would
have no hope of running.
my %kids;
foreach my $id (0 .. $FIRST_STAGE_CHILDREN - 1) {
my $pid = fork;
die "$0: fork: $!" unless defined $pid;
if ($pid) {
++$kids{$pid};
}
else {
first_stage $id;
$sem->op($id, 1, 0); # release
exit 0;
}
}
Now we fork the second-stage child. Important: although the code
performs an operation on multiple semaphores, this happens atomically,
that is, either it works for all of them or none of them. In no
observable state will it appear that the second stage was able to grab
any fewer than all of the first-stage semaphores. This is an important
property. In more complex systems, haphazard onesie-twosie taking and
releasing will result in deadlock.
my $pid = fork;
die "$0: fork: $!" unless defined $pid;
if ($pid) {
++$kids{$pid};
}
else {
# block waiting on all first-stage children
my #op = map +($_, -1, 0), 0 .. $FIRST_STAGE_CHILDREN - 1;
$sem->op(#op);
second_stage;
exit 0;
}
Finally, the parent process waits for all children to complete.
do {
$pid = waitpid -1, 0;
print "[$$] reaped $pid\n";
warn "$0: unknown child $pid" unless delete $kids{$pid};
} while $pid > 0 && keys %kids;
Sample output is below. It’s more interesting to watch live where you can see the pauses.
[18389] hello from child 0
[18390] hello from child 1
[18390] child 1 done
[18388] reaped 18390
[18389] child 0 done
[18391] hello from second-stage child!
[18388] reaped 18389
[18388] reaped 18391

perl process queue

I have a Perl script which forks a number of sub-processes. I'd like to have some kind of functionality like xargs --max-procs=4 --max-args=1 or make -j 4, where Perl will keep a given number of processes running until it runs out of work.
It's easy to say fork four process and wait for them all to complete, and then fork another four, but I'd like to keep four or n processes running at the same time, forking a new process as soon as one completes.
Is there a simple way in Perl to implement such a process pool?
Forks::Super can handle this requirement.
use Forks::Super MAX_PROC => 5, ON_BUSY => [ block | queue ];
Calls to fork() can block until the number of active subprocesses falls below 5, or you can pass additional parameters to the fork call and the tasks to perform can queue up:
fork { sub => sub { ... task to run in subprocess ... } }
When one subprocess finishes, another job on the queue will start up.
(I am the author of this module).
Check out Parallel::ForkManager -- it does much of what you describe. You can set a maximum number of processes, and the callback function could start a new child as soon as one finishes (as long as there is work to do).
While I would almost always use a CPAN module, or write something with the fantastic AnyEvent modules I think its important to understand how these things work under the hood. Here's an example that has no dependencies other than perl. The same approach could also be written in C without too much trouble.
#!/usr/bin/env perl
use strict;
## run a function in a forked process
sub background (&) {
my $code = shift;
my $pid = fork;
if ($pid) {
return $pid;
} elsif ($pid == 0) {
$code->();
exit;
} else{
die "cant fork: $!"
}
}
my #work = ('sleep 30') x 8;
my %pids = ();
for (1..4) {
my $w = shift #work;
my $pid = background {
exec $w;
};
$pids{$pid} = $w;
}
while (my $pid = waitpid(-1,0)) {
if ($?) {
if ($? & 127) {
warn "child died with signal " . ($? & 127);
} else {
warn "chiled exited with value " . ($? >> 8);
}
## redo work that died or got killed
my $npid = background {
exec $pids{$pid};
};
$pids{$npid} = delete $pids{$pid};
} else {
delete $pids{$pid};
## send more work if there is any
if (my $w = shift #work) {
my $pid = background {
exec shift #work;
};
$pids{$pid} = $w;
}
}
}