Reaping child processes from Perl - perl

I have a script that spawns a set of children. The parent must wait for each of the children to finish.
My script performs similar to the following perl script:
#! /usr/bin/perl
use strict;
use warnings;
print "I am the only process.\n";
my #children_pids;
for my $count (1..10){
my $child_pid = fork();
if ($child_pid) { # If I have a child PID, then I must be the parent
push #children_pids, $child_pid;
}
else { # I am the child
my $wait_time = int(rand(30));
sleep $wait_time;
my $localtime = localtime;
print "Child: Some child exited at $localtime\n";
exit 0; # Exit the child
}
}
foreach my $child (#children_pids) {
print "Parent: Waiting on $child\n";
waitpid($child, 0);
my $localtime = localtime;
print "Parent: Child $child was reaped - $localtime.\n";
}
print "All done.\n";
Similar to the code I've provided above, each child may take a different time to finish.
The problem is when I try to reap the children by looping over the children PIDs, in that last foreach block, the parent waits for the children in the order that they are created.
Obviously the children do not finish in the order which they are spawned and so I'm left with a bunch of zombie processes for children that happen to finish early.
In my actual code, these children may finish days before one another and the number of zombie processes floating around can grow in the hundreds.
Is there a better way for me to reap a set of children?

If your parent process doesn't need to be aware of its children's completion status then you can just set
$SIG{CHLD} = 'IGNORE';
which will automatically reap all children as they complete.
If you do need to be informed of the children completing, then the signal handler needs to be set to reap all possible processes
use POSIX ();
$SIG{CHLD} = sub {
while () {
my $child = waitpid -1, POSIX::WNOHANG;
last if $child <= 0;
my $localtime = localtime;
print "Parent: Child $child was reaped - $localtime.\n";
}
};

use "-1" for the pid, or use the wait() function so that you wait for any child process. The reaped pid is returned, so you can check it against your list if necessary. If that is unacceptable, then periodically waitpid for each pid in your list with POSIX::WNOHANG() as the second argument.

Borodin's answer is perfectly fine for the asynchronous reaping of children as they terminate.
If, as your question and code suggest to me, you are looking for the synchronous (blocking) reaping of all outstanding children in the order in which they terminate, the parent can simply do this:
use feature qw(say);
...
# Block until all children are finished
while (1) {
my $child = waitpid(-1, 0);
last if $child == -1; # No more outstanding children
say "Parent: Child $child was reaped - ", scalar localtime, ".";
}
say "All done."

Never use a loop like this to wait for children:
while (1) {
my $child = waitpid(-1, POSIX::WNOHANG);
last if $child == -1;
print "Parent: Child $child was reaped\n";
}
The parent process will consume 100% cpu while waiting for the child processes to die - especially when they can run for a long time. At least add a sleep (bad idea - when they die fast, the parent is waiting).
Always use a blocking wait + count for TERM/INT/ppid for niceness!:
my $loop = 1;
$SIG{CHLD} = 'DEFAULT'; # turn off auto reaper
$SIG{INT} = $SIG{TERM} = sub {$loop = 0; kill -15 => #children_pids};
while ($loop && getppid() != 1) {
my $child = waitpid(-1, 0);
last if $child == -1;
print "Parent: Child $child was reaped\n";
}
This blocking wait it of course not possible when the parent process also has to do other stuff - like the getppid() call ;-). For that, you can use a socketpair() and put that in a select() that does a blocking call. Even the loop check can benefit from that.

Related

Get the value of a process executed in a child back to parent

I'm looking for a solution which allows me to return the values of a process executed in a child back to the parent process. Currently i try this but have no idea where to hook the return value:
use Proc::ProcessTable;
use POSIX qw(:signal_h :errno_h :sys_wait_h);
$SIG{CHLD} = \&REAPER;
for my $count (1..10) { # start a few demo childs
if (fork () == 0) {
&startChild;
exit 0;
}
}
do {
print "Working\n";
sleep 1;
} while (chkChildProcess());
sub startChild {
print "Starting Child $$\n";
system("date"); #==>Need to get the output of "date" back to parent
sleep 2 + rand 7;
print "End Child $$\n";
}
sub chkChildProcess {
for my $p (#{new Proc::ProcessTable->table}){
if ($p->ppid == $$){
$curPID{$$}=$p->pid;
return 1;
}
}
return undef;
}
sub REAPER {
my $pid;
$pid = waitpid(-1, &WNOHANG);
if ($pid == -1) {
# no child waiting. Ignore it.
} elsif (WIFEXITED($?)) {
print "Process $pid exited.\n";
} else {
print "False alarm on $pid.\n";
}
$SIG{CHLD} = \&REAPER; # in case of unreliable signals
}
Any help would be great.
The bg_eval and bg_qx methods of Forks::Super were made to solve this problem.
use Forks::Super 'bg_eval';
my #result;
for my $count (1 .. 10) {
$result[$count] = bg_eval {
my $date = `date`;
sleep 2 + rand 7;
return $date;
};
}
print "$result[$_]\n" for 1..10;
The block after bg_eval is run asynchronously in a background process. When the background process is finished, the variable $result[$count] will be populated with the result.
When you print $result[$_], one of two things will happen. If the background process associated with that variable is finished, it will contain its return value. If the background process is not finished, it will wait for the process to finish, and then make the return value available in that value.
It looks like you may want to use Parallel::ForkManager, returning the value from the child via the data_structure_reference parameter of the finish method to a run_on_finish callback in the parent.
To capture the output, the easiest way is to use IPC::System::Simple's capture or capturex.
you could use threads::shared instead of fork, create a shared variable, lock it and write into it. keep in mind that locking is reeeally slow!
See also this post on perlmonks on why locking the variable is necessary.

can a perl child wait for another perl child to finish?

So I have a program where i spawn off some children who do some useful tasks. I then spawn off another child who needs to wait for the first children to stop before doing its work. The parent program then continues running and at the end waits for the last forked child to stop.
I'm getting an issue where the child who needs to wait on the others doesn't.
use strict;
use warnings;
use diagnostics;
my $pid1;
my $child1 = fork();
if ($child1) {
# parent
#print "pid is $pid, parent $$\n";
$pid1 = $child1;
} elsif ($child1 == 0) {
# child1
# do something
sleep 20;
print "Child1\n";
exit 0;
} else {
die "couldnt fork: $!\n";
}
my $pid2;
my $child2 = fork();
if ($child2) {
# parent
#print "pid is $pid, parent $$\n";
$pid2 = $child2;
} elsif ($child2 == 0) {
# child2
# wait for child1 to finish
my $tmp = waitpid($pid1, 0);
# do something else
print "Child2\n";
exit 0;
} else {
die "couldnt fork: $!\n";
}
# do more stuff
# wait for child2 to finish
my $tmp = waitpid($pid2, 0);
Is there an easy way to do this? Possibly without having to wrap the first child in the second?
In Unix-like systems, a given process can only wait for its own children to die. It can't wait for siblings, ancestors or grandchildren to die.
Spawn one child, have that process spawn sub-child processes, and then wait for them to finish before continuing in the first child process.
Then have the parent do its work and wait on the child process when it's ready to wait.
The easy way to do this is with Forks::Super.
use Forks::Super;
my $child1 = fork();
if ($child1 != 0) {
# ... parent code ...
} else {
# ... child code ...
exit;
}
my $child2 = fork {
depend_on => $child1,
on_busy => 'queue',
sub => sub {
# ... code to execute in 2nd child ...
}
};
# ... more code to run in the parent ...
# ... and at the end of the program:
waitall;
In Forks::Super, waitpid is still called in the parent (behind the scenes). But when the first child is finished, Forks::Super will know it is time to start launch the second child process in the background.
Assuming your perl has semop and
friends, you could use System V semaphores for synchronization among the children. See
below for a working example program.
We begin with the usual front matter. Rather than calling the low-level
semaphore operations directly, the code uses the built-in
IPC::SysV and
IPC::Semaphore modules.
#! /usr/bin/env perl
use strict;
use warnings;
use IPC::Semaphore;
use IPC::SysV qw/ IPC_PRIVATE S_IRUSR S_IWUSR IPC_CREAT /;
This program separates child processes into the two stages. Children in
the first stage run to completion, performing their processing with no
synchronization concerns. We can have arbitrarily many of these.
We have a single second-stage process, but it executes after all
first-stage children have finished.
Below are simple placeholder implementations.
# how many other children the last child must wait for
my $FIRST_STAGE_CHILDREN = 2;
sub first_stage {
my($id) = #_;
print "[$$] hello from child $id\n";
sleep rand 10;
print "[$$] child $id done\n";
}
sub second_stage {
print "[$$] hello from second-stage child!\n";
}
To implement synchronization between the first and second stages, the
program creates a set of semaphores whose size equals the number of
first-stage children. When a first-stage child completes, the program
releases the particular semaphore corresponding to that child.
my $sem = IPC::Semaphore->new(
IPC_PRIVATE, $FIRST_STAGE_CHILDREN,
S_IRUSR | S_IWUSR | IPC_CREAT)
or die "$0: failed to create semaphore: $!";
As we will see later, the second-stage child waits on his brethren by
attempting to decrement their semaphores. By starting the value at zero,
when the second-stage child attempts these decrements, the OS will put
the child to sleep because. Only after all first-stage children have
exited and released their semaphores will the system unblock the
second-stage child.
# start in blocked state
$sem->setall((0) x $FIRST_STAGE_CHILDREN);
First we fork the first-stage children. In this design, the parent
process does as much bookkeeping as possible. This keeps the definition
of first_stage and second_stage simple. Also, if a first-stage child
somehow exited without releasing its semaphore, the second stage would
have no hope of running.
my %kids;
foreach my $id (0 .. $FIRST_STAGE_CHILDREN - 1) {
my $pid = fork;
die "$0: fork: $!" unless defined $pid;
if ($pid) {
++$kids{$pid};
}
else {
first_stage $id;
$sem->op($id, 1, 0); # release
exit 0;
}
}
Now we fork the second-stage child. Important: although the code
performs an operation on multiple semaphores, this happens atomically,
that is, either it works for all of them or none of them. In no
observable state will it appear that the second stage was able to grab
any fewer than all of the first-stage semaphores. This is an important
property. In more complex systems, haphazard onesie-twosie taking and
releasing will result in deadlock.
my $pid = fork;
die "$0: fork: $!" unless defined $pid;
if ($pid) {
++$kids{$pid};
}
else {
# block waiting on all first-stage children
my #op = map +($_, -1, 0), 0 .. $FIRST_STAGE_CHILDREN - 1;
$sem->op(#op);
second_stage;
exit 0;
}
Finally, the parent process waits for all children to complete.
do {
$pid = waitpid -1, 0;
print "[$$] reaped $pid\n";
warn "$0: unknown child $pid" unless delete $kids{$pid};
} while $pid > 0 && keys %kids;
Sample output is below. It’s more interesting to watch live where you can see the pauses.
[18389] hello from child 0
[18390] hello from child 1
[18390] child 1 done
[18388] reaped 18390
[18389] child 0 done
[18391] hello from second-stage child!
[18388] reaped 18389
[18388] reaped 18391

Perl sleep terminated by SIGCHLD

#!/usr/bin/perl
use POSIX ":sys_wait_h";
$SIG{CHLD} = \&REAPER;
sub REAPER {
my $pid;
while (($pid = waitpid(-1, WNOHANG)) > 0) {
print "where is here,$pid\n";
}
}
sub child {
print "I'm child, pid=$$.\n";
sleep 2;
}
$lid = fork();
if ($lid == 0) {
&child;
exit;
} else {
sleep 1000;
print "I am parent, child pid : $lid\n";
}
Output:
I'm child, pid=11839.
where is here,11839
I am parent, child pid : 11839
The above is my Perl script. The output is right, but one strange thing is that it prints I am parent, child pid : 11839 immediately after the last output. Why didn't the sleep 1000 have any effect?
This is documented: "May be interrupted if the process receives a signal". The whole point is to allow signal handlers to run. Just go back to sleep if it's not time to wake up.
use Time::HiRes qw( time sleep ); # Optional.
sub unint_sleep($) {
my $sleep_til = time + $_[0];
for (;;) {
my $sleep_dur = time - $sleep_til;
last if $sleep_dur <= 0;
sleep($sleep_dur);
}
}
Please always use strict and use warnings, and declare your symbols using my at their point of definition. This applies especially when you are asking for help, as these measures can reveal simple bugs that are otherwise easily overlooked
sleep is implemented by using setitimer to request a SIGALRM after a specified interval, and then pause to suspend the process until it gets the signal.
But if a SIGCHLD comes first this will also wake the process.
The signal mask cannot be set to prevent this as otherwise the SIGCHLD wouldn't get serviced

How to timeout waitpid without killing the child?

I am aware of the many questions regarding waitpid and timeouts but they all cover this by killing the child from within an alarm handler.
That is not what i want, i want to keep the process running but dispatch it from waitpid.
The underlaying problem I try to solve is a daemon process with a main loop that processes a queue. The tasks are processed one at a time.
If a task hangs the whole main loop hangs. To get around this fork() and waitpid seemed an obvious choice. Still if a task hangs the loop hangs.
I can think of workarounds where i do not use waitpid at all but i would have to track running processes another way as i still want to process one task at a time in parallel to possibly hanging tasks.
I could even kill the task but i would like to have it running to examine what exactly is going wrong. A kill handler that dumps some debug information is also possible.
Anyway, the most convenient way to solve that issue is to timeout waitpid if possble.
Edit:
This is how I used fork() and waitpid and it may be clearer what is meant by child.
my $pid = fork();
if ($pid == 0){
# i am the child and i dont want to die
}
elsif ($pid > 0) {
waitpid $pid, 0;
# i am the parent and i dont want to wait longer than $timeout
# for the child to exit
}
else {
die "Could not fork()";
}
Edit:
Using waitpid WNOHANG does what I want. Is this usage good practice or would you do it differently?
use strict;
use warnings;
use 5.012;
use POSIX ':sys_wait_h';
my $pid = fork();
if ($pid == 0){
say "child will sleep";
sleep 20;
say "child slept";
}
else {
my $time = 10;
my $status;
do {
sleep 1;
$status = waitpid -1, WNOHANG;
$time--;
} while ($time && not $status );
say "bye";
}
If a task hangs the whole main loop hangs. To get around this fork()
and waitpid seemed an obvious choice. Still if a task hangs the loop
hangs.
Use waitpid with the WNOHANG option. This way it's not going to suspend the parent process and will immediately return 0 when the child has not yet exited. In your main loop you'll have to periodically poll all the children (tasks).
instead of poling all the children periodically, you might want to set up a signal handler to handle SIGCHLD... from perlipc:
use POSIX ":sys_wait_h";
$SIG{CHLD} = sub {
while ((my $child = waitpid(-1, WNOHANG)) > 0) {
$Kid_Status{$child} = $?;
}
};
# do something that forks...
Enabling and handling SIGCHLD is also a possibility; it'll notify you of child process state changes without polling -- see sigprocmask(2) and signal(3) in the manual pages.

Killing child and its children when child was created using open

Here's my code, with error handling and other stuff removed for clarity:
sub launch_and_monitor {
my ($script, $timeout) = #_;
sub REAPER {
while ((my $child = waitpid(-1, &WNOHANG)) > 0) {}
$SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
my $pid = fork;
if (defined $pid) {
if ($pid == 0) {
# in child
monitor($timeout);
}
else {
launch($script);
}
}
}
The launch sub executes a shell script which in turn launches other processes, like so:
sub launch($) {
my ($script) = #_;
my $pid = open(PIPE, "$script|");
# write pid to pidfile
if ($pid != 0) {
while(<PIPE>) {
# do stuff with output
}
close(PIPE) or die $!;
}
}
The monitor sub basically just waits for a specified period of time and then attempts to kill the shell script.
sub monitor($) {
my ($timeout) = #_;
sleep $timeout;
# check if script is still running and if so get pid from pidfile
if (...) {
my $pid = getpid(...);
kill 9, $pid;
}
}
This kills the script, however, it does not kill any of its subprocesses. How to fix it?
You can do this with process groups, if your operating system supports them. You need to make the script process become a process group leader. The child processes that it runs will inherit the process group from their parent. You can then use kill to send a signal to each process in the group at the same time.
In launch(), you will need to replace the open line with one that forks. Then in the child, you would call setpgrp() before exec'ing the command. Something like the following should work:
my $pid = open(PIPE, "-|");
if (0 == $pid) {
setpgrp(0, 0);
exec $script;
die "exec failed: $!\n";
}
else {
while(<PIPE>) {
# do stuff with output
}
close(PIPE) or die $!;
}
Later, to kill the script process and its children, negate the process ID that you're signalling:
kill 9, -$pid;
In general, I don't think you can expect signals to be propagated into all child processes; this isn't specific to perl.
That said, you might be able to use the process group signal feature built into perl kill():
...if SIGNAL is negative, it kills process groups instead of processes...
You probably need to use setpgrp() on your (direct) child process, then change your kill call to something like:
kill -9, $pgrp;
Try adding:
use POSIX qw(setsid);
setsid;
at the top of your launch_and_monitor function. This will put your processes in a separate session, and cause things to exit when the session leader (i.e. the master) exits.
Killing a processgroup works, but don't forget the parent can be killed alone too. Assuming child processes have an event loop, they can check the parent socket that was created in a socketpair prior doing the fork() for validness. In fact, select() cleanly exits when the parent socket is gone, all that needs to be done is to check the socket.
E.g.:
use strict; use warnings;
use Socket;
$SIG{CHLD} = sub {};
socketpair(my $p, my $c, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die $!;
print "parent $$, fork 2 kids\n";
for (0..1){
my $kid = fork();
unless($kid){
child_loop($p, $c);
exit;
}
print "parent $$, forked kid $kid\n";
}
print "parent $$, waiting 5s\n";
sleep 5;
print "parent $$ exit, closing sockets\n";
sub child_loop {
my ($p_s, $c_s) = #_;
print "kid: $$\n";
close($c_s);
my $rin = '';
vec($rin, fileno($p_s), 1) = 1;
while(1){
select my $rout = $rin, undef, undef, undef;
if(vec($rout, fileno($p_s), 1)){
print "kid: $$, parent gone, exiting\n";
last;
}
}
}
Runs like this:
tim#mint:~$ perl ~/abc.pl
parent 5638, fork 2 kids
parent 5638, forked kid 5639
kid: 5639
parent 5638, forked kid 5640
parent 5638, waiting 5s
kid: 5640
parent 5638 exit, closing sockets
kid: 5640, parent gone, exiting
kid: 5639, parent gone, exiting
tim#mint:~$