I have a Perl app that's been running largely untroubled on a RH system for a few years. In one place, I have to run a system command that can take many minutes to complete, so I do this in a child process. The overall structure is like this:
$SIG{CHLD} = 'IGNORE';  # Ignore dead children, to avoid zombie processes
my $child = fork();
if ($child) {  # Parent; return OK
    $self->status_ok();
} else {       # Child; run system command
    # do a bunch of data retrieval, etc.
    my $output;
    my @command = ...;  # generate system command here
    use IPC::System::Simple 'capture';
    eval { $output = capture(@command); };
    $self->log->error("Error running @command: $@") if $@;
    # success: log $output, carry on
}
We recently changed some of our infrastructure, although not in ways that I expected would have any influence on this. (Still running on RH, still using nginx, etc.) However, now we find that almost every instance of running this code fails, logging 'Error running {command}: failed to start: "No child processes" at /path/to/code.pl'.
I've looked around and can't figure out what the right solution is for this. There was a suggestion to change $SIG{CHLD} from 'IGNORE' to 'DEFAULT', but then I have to worry about zombie processes.
What is causing the "No child processes" error, and how do we fix this?
There was a suggestion to change $SIG{CHLD} from 'IGNORE' to 'DEFAULT', but then I have to worry about zombie processes.
This isn't true.
A zombie process is a process that has ended, but hasn't been reaped by its parent yet. A parent reaps its children using wait(2), waitpid(2) or similar. capture waits for its child to end, so it doesn't leave any zombie behind.
In fact, the error you are getting is from waitpid. capture waits for the child to end so it can reap it and collect its exit status, but you instructed the OS to clean up the child as soon as it completes, leaving waitpid with no child to reap and no status to collect.
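You can reproduce the failure without capture at all. With SIGCHLD ignored, the OS reaps the child itself, so a later waitpid finds nothing (a self-contained sketch):

$SIG{CHLD} = 'IGNORE';            # OS cleans up children automatically
my $pid = fork() // die "fork failed: $!";
exit 0 if $pid == 0;              # child exits immediately
sleep 1;                          # give the OS time to clean it up
my $got = waitpid($pid, 0);       # returns -1 ...
print "waitpid returned $got: $!\n";  # ... and $! is "No child processes" (ECHILD)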
To fix this problem, simply place local $SIG{CHLD} = 'DEFAULT'; before the call to capture.
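In the child branch above, that looks like this (only the local line is new):

local $SIG{CHLD} = 'DEFAULT';  # children are reapable again within this scope
eval { $output = capture(@command); };
$self->log->error("Error running @command: $@") if $@;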
Related
If I run an external process via a Perl program, the Perl program will remain the parent of the process, which makes process management easy.
system('sleep 3000'); # perl is still the parent
However, if I try to run the process in the background so that the program does not have to wait for the process to exit...
system('sleep 3000 &');
The sleep process will be adopted by the system's init process and is no longer associated with the program that executed it.
What is the proper way to handle process management in this situation? How can I emulate running the process in the background while maintaining process ancestry?
You can use threads,
use threads;
my $t = async { system('sleep 3000'); };
# do something in parallel ..
# wait for thread to finish
$t->join;
or fork
sub fasync(&) {
    my ($worker) = @_;
    my $pid = fork() // die "can't fork!";
    if (!$pid) { $worker->(); exit(0); }
    return sub {
        my ($flags) = @_;
        return waitpid($pid, $flags // 0);
    };
}
my $t = fasync { system('sleep 3000'); };
# do something in parallel ..
# wait for fork to finish
$t->();
fork/exec and wait().
fork() creates the child process as a copy of the parent; the parent receives the child's process ID and calls wait() on it.
Meanwhile, the child process uses exec() to overlay itself (a copy of the parent) with the process that you wish to execute.
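A minimal sketch of that pattern, with sleep standing in for the real command:

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # child: overlay ourselves with the external command
    exec('sleep', '3000') or die "exec failed: $!";
}
# parent: still the ancestor of $pid; reap it when ready
waitpid($pid, 0);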
If you need more than one concurrent background job, I recommend Parallel::ForkManager.
I'm writing a debugging utility, and I want to fork a child while preventing that child's termination from triggering a SIGCHLD to its parent. I still want other children to normally cause a SIGCHLD upon termination.
I want to do this because I don't want the fork to trigger an existing $SIG{CHLD} handler, but I still want other children to trigger it. That is, I want to isolate my new child and I don't want it to interfere with management of existing children.
I'm wary of locally installing a custom $SIG{CHLD} handler since I don't think I can properly detect when I should call the original handler. For instance, if I install a local $SIG{CHLD} handler, then I'm guaranteed to receive a SIGCHLD signal once I spawn a child and have the parent waitpid for it to terminate. However, that SIGCHLD won't indicate whether or not other children have terminated as well, so I can't be sure whether to call the original handler.
From what I've researched, a process cannot change its parent PID. I'm not sure whether changing the child's process group ID or session ID would be useful.
Is it even possible to prevent a specific child from triggering the SIGCHLD on the parent? Am I forced to rely on the existing $SIG{CHLD} handler to Do The Right Thing when it receives the SIGCHLD signal for a child that it did not expect?
Although there may be a better way to implement that debugging utility (let me know if there is), I'm still wondering whether POSIX offers such fine-grained control over children, and am looking for a Perl solution to my dilemma.
You can't portably prevent the SIGCHLD entirely, but if the signal handler is written properly you can prevent the waitpid from returning your debugging tool's pid by orphaning it.
use Carp 'croak';
use Signal::Mask;

sub start_background_child {
    local $Signal::Mask{CHLD} = 1;  # block SIGCHLD while juggling the forks
    my $first = fork;
    croak "fork failed: $!" if not defined $first;
    if ($first) {
        waitpid $first, 0;  # reap the intermediate child ourselves
        return;
    }
    my $second = fork;
    croak "fork failed: $!" if not defined $second;
    exit if $second;  # intermediate child exits; grandchild is orphaned to init
    your_stuff_here();
    exit;  # the grandchild must not fall through into the caller's code
}
I think you could daemonize the special child -- fork twice -- to sever the parent-child relationship. You'd still receive a SIGCHLD when the intermediate child exits during setup; I don't know if that's acceptable for you.
sub start_special_child {
    return if fork;  # run rest of this routine in the child process
    fork && exit;    # run rest of this routine in the grandchild process
                     # (the exit here triggers SIGCHLD in the parent)
    ...              # now run your special process
    exit;            # this exit does not trigger SIGCHLD in the original parent
}
The other approach is to keep track of the process id's of your child processes, and use waitpid to figure out which process(es) triggered the SIGCHLD handler.
use POSIX ':sys_wait_h';  # for WNOHANG

$SIG{CHLD} = \&sigchld_handler;
$pid1 = start_child_process();
$pid2 = start_child_process();
$pid3 = start_child_process();
$pidS = start_special_child_process();

sub sigchld_handler {
    # reap every child that has exited; several may be pending at once
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        next if $pid == $pidS;  # ignore the special child
        # ... handle $pid1/$pid2/$pid3 here
    }
}
Is there an option one can give to IPC::Run which kills the process upon the parent dying? Or alternatively a simple way to kill child processes when the parent dies? I know I can do this by catching signals in the parent, but I'd rather not reinvent the wheel if a simple way to do this already exists. I understand that this may not catch SIGKILL, but that's okay, I plan to kill the parent in a more reasonable manner.
Use an END block to clean up.
my @ipc_run_harnesses;
END { $_->kill_kill for @ipc_run_harnesses }
...
for my $start ( 1 .. 2 ) {
    push @ipc_run_harnesses, IPC::Run::start("while true; do sleep 1; echo running $start; done");
}
sleep 10;
exit;
Is it possible to implement some kind of timeout (time limit) for forks using Parallel::ForkManager?
A basic Parallel::ForkManager script looks like this:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 10 );

for ( 1 .. 1000 ) {
    $pm->start and next;
    # some job for fork
    $pm->finish;
}

$pm->wait_all_children();
I would like to limit the time for "# some job for fork". For example, if it's not finished within 90 seconds, then the fork should be killed/terminated.
I thought about using this, but I have to say that I don't know how to use it with Parallel::ForkManager.
EDIT
Thanks hobbs and ikegami. Both your suggestions worked... but only in this basic example, not in my actual script :(
These forks will be there forever and, to be honest, I don't know why. I've been using this script for a couple of months and didn't change anything (although many things depend on outside variables).
Every fork has to download a page from a website, parse it, and save the results to a file. It should not take more than 30 seconds per fork. The timeout is set to 180 seconds. The hanging forks are totally random, so it's very hard to trace the problem. That's why I came up with a temporary, simple solution: timeout & kill.
What could possibly disable (interrupt) your timeout methods in my code? I don't have any other alarm() anywhere in my code.
EDIT 2
One of the forks was hanging for 1h38m and then returned "timeout PID", which is what I put in die() for alarm(). So the timeout works... but it's about 1h36.5m late ;). Do you have any ideas?
Update
Sorry to update after the close, but I'd be remiss if I didn't point out that Parallel::ForkManager also supports a run_on_start callback. This can be used to install a "child registration" function that takes care of the time()-stamping of PIDs for you.
E.g.,
$pm->run_on_start(sub { my $pid = shift; $workers{$pid} = time(); });
The upshot is that, in conjunction with run_on_wait as described below, the main loop of a P::FM doesn't have to do anything special. That is, it can remain a simple $pm->start and next, and the callbacks will take care of everything else.
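Put together, a sketch of that simplified loop (it assumes the %workers hash and dismiss_hung_workers() from the original answer below):

my $pm = Parallel::ForkManager->new(10);
$pm->run_on_start(sub { my $pid = shift; $workers{$pid} = time() });
$pm->run_on_wait(\&dismiss_hung_workers, 1);

for (1 .. 1000) {
    $pm->start and next;  # no per-PID bookkeeping needed here
    # ... some job for fork ...
    $pm->finish;
}
$pm->wait_all_children;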
Original Answer
Parallel::ForkManager's run_on_wait handler, and a bit of bookkeeping, can force hanging and ALRM-proof children to terminate.
The callback registered by that function can be run, periodically, while the $pm awaits child termination.
use strict;
use warnings;
use Parallel::ForkManager;

use constant PATIENCE => 90;  # seconds

our %workers;

sub dismiss_hung_workers {
    while (my ($pid, $started_at) = each %workers) {
        next unless time() - $started_at > PATIENCE;
        kill TERM => $pid;
        delete $workers{$pid};  # deleting the current key of each() is safe
    }
}

...

sub main {
    my $pm = Parallel::ForkManager->new(10);
    $pm->run_on_wait(\&dismiss_hung_workers, 1);  # 1 second between callback invocations
    for (1 .. 1000) {
        if (my $pid = $pm->start) {  # parent
            $workers{$pid} = time();
            next;
        }
        # Here we are the child. Do some work.
        # (Maybe install a $SIG{TERM} handler for graceful shutdown!)
        ...
        $pm->finish;
    }
    $pm->wait_all_children;
}
(As others suggest, it's better to have the children regulate themselves via alarm(), but that appears intermittently unworkable for you. You could also resort to wasteful, gross hacks like having each child itself fork() or exec('bash', '-c', 'sleep 90; kill -TERM $PPID').)
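For what it's worth, the graceful-shutdown handler hinted at in the comment above could be as small as this (the cleanup is a placeholder):

# inside the child, before doing the real work:
local $SIG{TERM} = sub {
    # flush logs / remove temp files here, then bail out
    exit 1;
};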
All you need is one line:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new( 10 );

for ( 1 .. 1000 ) {
    $pm->start and next;
    alarm 90;  # <--- the child dies of SIGALRM after 90 seconds
    # some job for fork
    $pm->finish;
}

$pm->wait_all_children();
You don't need to set up a signal handler, since you do mean for the process to die.
It even works if you exec in the child. It won't work on Windows, but using fork on Windows is questionable in the first place.
Just do what the answer you linked to suggests, inside the child process (i.e. between the $pm->start and next and the end of the loop). There's nothing special you need to do to make it interact with Parallel::ForkManager, other than make sure you don't accidentally kill the parent instead :)
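The linked answer isn't reproduced here, but the usual alarm-plus-eval idiom inside the child looks roughly like this (a sketch, not Parallel::ForkManager-specific):

$pm->start and next;          # parent continues the loop
my $ok = eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm 90;
    # ... some job for fork ...
    alarm 0;                  # cancel the alarm on success
    1;
};
warn "child $$ timed out\n" if !$ok && $@ eq "timeout\n";
$pm->finish;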
This is my code:
if ($DAEMON) {
    my $pid = fork();
    if (not defined $pid) {
        print "Unable to start daemon.\n";
        exit(1);
    }
    elsif ($pid == 0) {
        open STDOUT, '>', '/dev/null';
        open STDERR, '>', '/dev/null';
        _create_sessions($self, $settings);
        $poe_kernel->run;
    }
    else { print "Script forked to background with PID $pid\n"; }
}
else {
    _create_sessions($self, $settings);
    $poe_kernel->run;
}
When $DAEMON = 1, it complains that POE::Kernel's run() method was never called, but as you can see in the above code, I did that already. The script works perfectly fine when in daemon mode, but I can't get rid of that warning or understand why it says that. I also tried calling $poe_kernel->has_forked() and that didn't make a difference either.
I'm out of ideas. Any suggestions?
Updated to add: Maybe I wasn't clear enough. The code below creates the session and runs the kernel.
_create_sessions($self, $settings);
$poe_kernel->run;
It works perfectly fine. It's only when the same code is run inside a forked child (so I can send the script to the background) that it says POE::Kernel's run() method wasn't called. The script does go into the background and works like it should, which means the kernel is indeed running. I'm only looking to get rid of that annoying warning.
ysth is right. The warning happens because POE::Session instances are created in the parent process but they haven't been given an opportunity to run.
% perl -wle 'use POE; POE::Session->create(inline_states=>{_start => sub {}})'
40023: Sessions were started, but POE::Kernel's run() method was never
40023: called to execute them. This usually happens because an error
40023: occurred before POE::Kernel->run() could be called. Please fix
40023: any errors above this notice, and be sure that POE::Kernel->run()
40023: is called. See documentation for POE::Kernel's run() method for
40023: another way to disable this warning.
In the above example, 40023 is the process ID where the problem was detected.
It's similar to Perl's warning about exiting with active threads:
% perl -wle 'use threads; threads->create(sub { sleep 3600 }); '
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
While your code snippet shows sessions are created and run in the child process, I suspect sessions are also created before or afterward. The parent process doesn't exit in your snippet, so there's no telling where execution goes afterward.
You should also call POE::Kernel->has_forked() in the child process. I can't tell whether that's happening in your code snippet.
The correct solution is to move all session instantiation into the child process when daemonizing. A passable workaround is to call POE::Kernel->run() just after using POE::Kernel and before any sessions are actually created. run() will return immediately because no sessions exist, but the call satisfies the condition you're being warned about. It's a way of saying "yes, yes, but I know what I'm doing".
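A sketch of that workaround; the early run() call is the only addition:

use POE;            # loads POE::Kernel and exports $poe_kernel
POE::Kernel->run;   # no sessions exist yet, so this returns immediately
                    # and satisfies the warning's condition

# ... later: fork, then in the child:
# POE::Kernel->has_forked;   # tell POE about the fork
# _create_sessions($self, $settings);
# POE::Kernel->run;          # the real event loop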
From the doc, POE::Kernel's run is normally called as a class method; what is $poe_kernel?
Somewhere, you seem to be starting a session but don't end up calling POE::Kernel->run().
Update: since the message you see is output with warn, and you are throwing away STDERR in the child, I'm guessing it is the parent giving the warning. Something you are doing (in the code you don't show that loads POE and sets $poe_kernel?) is in fact creating a session, apparently unintentionally.
Try to reduce your code to a short, yet runnable example and you will either find the problem yourself or enable others to help you find it.