Is there an option one can give to IPC::Run which kills the process upon the parent dying? Or alternatively a simple way to kill child processes when the parent dies? I know I can do this by catching signals in the parent, but I'd rather not reinvent the wheel if a simple way to do this already exists. I understand that this may not catch SIGKILL, but that's okay, I plan to kill the parent in a more reasonable manner.
Use an END block to clean up.
use IPC::Run ();  # provides start() and the harness's kill_kill method

my @ipc_run_harnesses;
END { $_->kill_kill for @ipc_run_harnesses }
...
for my $start ( 1..2 ) {
    push @ipc_run_harnesses, IPC::Run::start( "while true; do sleep 1; echo running $start; done" );
}
sleep 10;
exit;
I have a Perl app that's been running largely untroubled on a RH system for a few years. In one place, I have to run a system command that can take many minutes to complete, so I do this in a child process. The overall structure is like this:
$SIG{CHLD} = 'IGNORE'; # Ignore dead children, to avoid zombie processes
my $child = fork();
if ($child) { # Parent; return OK
    $self->status_ok();
} else { # Child; run system command
    # do a bunch of data retrieval, etc.
    my $output;
    my @command = ...; # generate system command here
    use IPC::System::Simple 'capture';
    eval { $output = capture(@command); };
    $self->log->error("Error running @command: $@") if $@;
    # success: log $output, carry on
}
We recently changed some of our infrastructure, although not in ways that I expected would have any influence on this. (Still running on RH, still using nginx, etc.) However, now we find that almost every instance of running this code fails, logging 'Error running {command}: failed to start: "No child processes" at /path/to/code.pl'.
I've looked around and can't figure out what the right solution is for this. There was a suggestion to change $SIG{CHLD} from 'IGNORE' to 'DEFAULT', but then I have to worry about zombie processes.
What is causing the "No child processes" error, and how do we fix this?
There was a suggestion to change $SIG{CHLD} from 'IGNORE' to 'DEFAULT', but then I have to worry about zombie processes.
This isn't true.
A zombie process is a process that has ended, but hasn't been reaped by its parent yet. A parent reaps its children using wait(2), waitpid(2) or similar. capture waits for its child to end, so it doesn't leave any zombie behind.
In fact, the error you are getting is from waitpid. capture waits for the child to end so it can reap it and collect its exit status, but you instructed the OS to clean up the child as soon as it completes, leaving waitpid with no child to reap and no status to collect.
To fix this problem, simply place local $SIG{CHLD} = 'DEFAULT'; before the call to capture.
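A minimal sketch of the fixed child block (the command here is a placeholder; the real code builds @command dynamically):

use IPC::System::Simple 'capture';

my @command = ('sleep', '5'); # placeholder for the generated command
my $output;
{
    local $SIG{CHLD} = 'DEFAULT'; # restore default handling so capture's waitpid can reap
    eval { $output = capture(@command); };
}
warn "Error running @command: $@" if $@;

Because local is scoped to the enclosing block, the 'IGNORE' disposition is restored afterwards, so the rest of the code keeps its zombie-free behavior.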
If I run an external process via a Perl program, the Perl program remains the parent of the process, making process management easy.
system('sleep 3000'); # perl is still the parent
However if I try to run the process in the background so that the program does not have to wait for the process to exit...
system('sleep 3000 &');
The sleep process will be adopted by the system's init process and is no longer associated with the program that executed it.
What is the proper way to handle process management in this situation? How can I emulate running the process in the background while maintaining process ancestry?
You can use threads,
use threads;
my $t = async { system('sleep 3000'); };
# do something in parallel ..
# wait for thread to finish
$t->join;
or fork
sub fasync(&) {
    my ($worker) = @_;
    my $pid = fork() // die "can't fork!";
    if (!$pid) { $worker->(); exit(0); }
    return sub {
        my ($flags) = @_;
        return waitpid($pid, $flags // 0);
    };
}
my $t = fasync { system('sleep 3000'); };
# do something in parallel ..
# wait for fork to finish
$t->();
fork/exec and wait().
fork() creates the child process as a copy of the parent; the parent receives the child's process id and later calls wait() on it.
Meanwhile, the child process uses exec() to overlay itself (a copy of the parent) with the process that you wish to execute.
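A minimal sketch of that pattern (the sleep command stands in for the real program):

my $pid = fork() // die "can't fork: $!";
if ($pid) {
    # Parent: $pid is the child's process id. Do other work here,
    # then reap the child so it doesn't become a zombie.
    waitpid($pid, 0);
} else {
    # Child: overlay this copy of the parent with the program to run.
    exec('sleep', '3000') or die "can't exec: $!";
}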
If you need more than one concurrent background job, I recommend Parallel::ForkManager.
I'm writing a debugging utility, and I want to fork a child while preventing that child's termination from triggering a SIGCHLD to its parent. I still want other children to normally cause a SIGCHLD upon termination.
I want to do this because I don't want the fork to trigger an existing $SIG{CHLD} handler, but I still want other children to trigger it. That is, I want to isolate my new child and I don't want it to interfere with management of existing children.
I'm wary of locally installing a custom $SIG{CHLD} handler since I don't think I can properly detect when I should call the original handler. For instance, if I install a local $SIG{CHLD} handler, then I'm guaranteed to receive a SIGCHLD signal once I spawn a child and have the parent waitpid for it to terminate. However, that SIGCHLD won't indicate whether or not other children have terminated as well, so I can't be sure whether to call the original handler.
From what I've researched, a process cannot change its parent pid. I'm not sure whether changing the child's process group id or session id would be useful.
Is it even possible to prevent a specific child from triggering the SIGCHLD on the parent? Am I forced to rely on the existing $SIG{CHLD} handler to Do The Right Thing when it receives the SIGCHLD signal for a child that it did not expect?
Although there may be a better way to implement that debugging utility (let me know if there is), I'm still wondering whether POSIX offers such fine-grained control over children, and am looking for a Perl solution to my dilemma.
You can't portably prevent the SIGCHLD entirely, but if the signal handler is written properly you can prevent the waitpid from returning your debugging tool's pid by orphaning it.
use Signal::Mask;
use Carp 'croak';

sub start_background_child {
    local $Signal::Mask{CHLD} = 1; # block SIGCHLD for the duration of this scope
    my $first = fork;
    croak "fork failed: $!" if not defined $first;
    if ($first) {
        # reap the intermediate child ourselves while SIGCHLD is blocked
        waitpid $first, 0;
        return;
    }
    my $second = fork;
    croak "fork failed: $!" if not defined $second;
    exit if $second; # intermediate child exits; the grandchild is orphaned
    your_stuff_here();
}
I think you could daemonize the special child -- fork twice -- to sever the parent-child relationship. You'd still receive one SIGCHLD when the intermediate child exits during setup -- I don't know if that's acceptable for you.
sub start_special_child {
    return if fork; # parent returns; rest of this routine runs in the child
    fork && exit;   # child forks again and exits; rest runs in the grandchild
                    # (the exit here triggers SIGCHLD in the parent)
    ...             # now run your special process
    exit;           # this exit does not trigger SIGCHLD in the original parent
}
The other approach is to keep track of the process id's of your child processes, and use waitpid to figure out which process(es) triggered the SIGCHLD handler.
use POSIX ':sys_wait_h';

$SIG{CHLD} = \&sigchld_handler;
$pid1 = start_child_process();
$pid2 = start_child_process();
$pid3 = start_child_process();
$pidS = start_special_child_process();
sub sigchld_handler {
    # reap in a loop: several children may have exited before the handler runs
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        next if $pid == $pidS; # ignore the special child
        # handle the normal children here
    }
}
Is it possible to implement some kind of timeout (time limit) for forks using Parallel::ForkManager?
A basic Parallel::ForkManager script looks like this:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 10 );
for ( 1 .. 1000 ) {
    $pm->start and next;
    # some job for fork
    $pm->finish;
}
$pm->wait_all_children();
I would like to limit the time for "# some job for fork". For example, if it's not finished in 90 seconds, then the fork should be killed/terminated.
I thought about using this, but I have to say that I don't know how to use it with Parallel::ForkManager.
EDIT
Thanks hobbs and ikegami. Both your suggestions worked... but only in this basic example, not in my actual script :(.
These forks will be there forever and, to be honest, I don't know why. I have been using this script for a couple of months and didn't change anything (although many things depend on outside variables).
Every fork has to download a page from a website, parse it, and save the results to a file. It should not take more than 30 seconds per fork. The timeout is set to 180 seconds. The hanging forks are totally random, so it's very hard to trace the problem. That's why I came up with a temporary, simple solution: timeout & kill.
What could possibly disable (interrupt) your timeout methods in my code? I don't have any other alarm() anywhere in my code.
EDIT 2
One of the forks was hanging for 1h38m and then returned "timeout PID", which is what I put in die() for alarm(). So the timeout works... but it's about 1h36.5m late ;). Do you have any ideas?
Update
Sorry to update after the close, but I'd be remiss if I didn't point out that Parallel::ForkManager also supports a run_on_start callback. This can be used to install a "child registration" function that takes care of the time()-stamping of PIDs for you.
E.g.,
$pm->run_on_start(sub { my $pid = shift; $workers{$pid} = time(); });
The upshot is that, in conjunction with run_on_wait as described below, the main loop of a P::FM doesn't have to do anything special. That is, it can remain a simple $pm->start and next, and the callbacks will take care of everything else.
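Putting both callbacks together, the setup might look like this (a sketch; %workers and dismiss_hung_workers are as defined in the original answer below):

my $pm = Parallel::ForkManager->new(10);
$pm->run_on_start(sub { my $pid = shift; $workers{$pid} = time(); });
$pm->run_on_wait(\&dismiss_hung_workers, 1);
for (1 .. 1000) {
    $pm->start and next; # no bookkeeping needed in the loop itself
    # ... child's work ...
    $pm->finish;
}
$pm->wait_all_children;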
Original Answer
Parallel::ForkManager's run_on_wait handler, and a bit of bookkeeping, can force hanging and ALRM-proof children to terminate.
The callback registered by that function can be run, periodically, while the $pm awaits child termination.
use strict;
use warnings;
use Parallel::ForkManager;

use constant PATIENCE => 90; # seconds

our %workers;

sub dismiss_hung_workers {
    while (my ($pid, $started_at) = each %workers) {
        next unless time() - $started_at > PATIENCE;
        kill TERM => $pid;
        delete $workers{$pid}; # deleting the current key during each() is safe
    }
}
...
sub main {
    my $pm = Parallel::ForkManager->new(10);
    $pm->run_on_wait(\&dismiss_hung_workers, 1); # 1 second between callback invocations
    # forget a worker once it finishes normally, so a stale (possibly reused)
    # pid is never sent SIGTERM later
    $pm->run_on_finish(sub { my ($pid) = @_; delete $workers{$pid} });
    for (1 .. 1000) {
        if (my $pid = $pm->start) {
            $workers{$pid} = time();
            next;
        }
        # Here we are the child. Do some work.
        # (Maybe install a $SIG{TERM} handler for graceful shutdown!)
        ...
        $pm->finish;
    }
    $pm->wait_all_children;
}
(As others suggest, it's better to have the children regulate themselves via alarm(), but that appears intermittently unworkable for you. You could also resort to wasteful, gross hacks like having each child itself fork() or exec('bash', '-c', 'sleep 90; kill -TERM $PPID').)
All you need is one line:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 10 );
for ( 1 .. 1000 ) {
    $pm->start and next;
    alarm 90; # <--- SIGALRM kills the child after 90 seconds
    # some job for fork
    $pm->finish;
}
$pm->wait_all_children();
You don't need to set up a signal handler, since you do mean for the process to die.
It even works if you exec in the child. It won't work on Windows, but using fork on Windows is questionable in the first place.
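For example, since the pending alarm persists across exec, even a child that replaces itself with another program is still limited to 90 seconds (the command name below is purely illustrative):

$pm->start and next;
alarm 90;                  # the timer keeps ticking in the exec'ed program
exec('some-long-command')  # hypothetical long-running external command
    or die "can't exec: $!";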
Just do what the answer you linked to suggests, inside the child process (i.e., between the $pm->start and next and the end of the loop). There's nothing special you need to do to make it interact with Parallel::ForkManager, other than make sure you don't accidentally kill the parent instead :)
I have a quite simple perl script, that in one function does the following:
if ( legato_is_up() ) {
    write_log("INFO: Legato is up and running. Continue the installation.");
    $wait_minutes = $WAITPERIOD + 1;
    $legato_up = 1;
}
else {
    my $towait = $WAITPERIOD - $wait_minutes;
    write_log("INFO: Legato is not up yet. Waiting for another $towait minutes...");
    sleep 30;
    $wait_minutes = $wait_minutes + 0.5;
}
For some reason, sometimes (like 1 in 3 runs) the script gets killed. I don't know who's responsible for the kill; I just know it happens during the "sleep" call.
Can anyone give me a hint here? After the script is killed, its job is not done, which is a big problem.
Without knowing what else is running on your system, it's anybody's guess. You could add a signal handler, but all it would tell you is which signal it was (and when), not who sent it:
foreach my $signal (qw(INT PIPE HUP))
{
    my $old_handler = $SIG{$signal};
    $SIG{$signal} = sub {
        print time, ": ", $signal, " received!\n";
        # only chain to the old handler if it was actually a code ref
        # (it may be 'DEFAULT', 'IGNORE', or undef)
        $old_handler->(@_) if ref $old_handler eq 'CODE';
    };
}
You may also want to consider adding __WARN__ and __DIE__ handlers, if you are not logging output from stderr.
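A minimal sketch of such handlers, in the same style as the loop above:

$SIG{__WARN__} = sub { print STDERR time, ": WARN: ", @_ };
$SIG{__DIE__}  = sub { print STDERR time, ": DIE: ", @_ };

(A __DIE__ handler doesn't stop the exception; execution still dies after it returns, so this only adds logging.)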
Under at least Linux, you can see who sent a signal (if it's an external process that used kill(2)) by looking at the siginfo struct (particularly si_pid) passed to a signal handler. I don't know how to see that from Perl, however, but in your case you could strace (or similar on non-Linux platforms) your script and see it that way, e.g. strace -p <pid of your perl script>. You should see something like:
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=89165, si_uid=1000} ---
just before your untimely death.
(a few years late for the OP I know...)