As far as I know, the pattern for interrupting something after some time in Perl is as follows:
#!/usr/bin/perl
local $SIG{ALRM} = sub { die "timed out\n"; };
my $pid;
eval {
    alarm 10;
    # do stuff
    alarm 0;
};
print "done\n";
kill(9, $pid) if $pid;
So one example of a "something" is opening a subprocess for reading which might randomly hang. Simulating it for testing purposes works as expected:
# replace "# do stuff" with:
$pid = open FH, "sleep 120|";
my $line = <FH>;
close FH;
However, these globally-named things can't be passed to other subroutines, so I really want to use the following implementation:
# replace "# do stuff" with:
$pid = open my $fh, "sleep 120|";
my $line = <$fh>;
close $fh;
... but when I run this, the Perl script hangs indefinitely! Why?
When there are no references left to a variable (perhaps as a result of it going out of scope), Perl cleans up the resources associated with that variable. Destructors are called, memory is freed, file handles are closed, etc. This is the case even if the variable ceases to be referenced due to an exception.
When a file handle created using open -| or open |- is closed, Perl waits for the child process to end. This happens not just for explicit calls to close, but for implicit closes that happen as a result of destroying an open file handle.
In your scenario, Perl is waiting for the child to complete as a result of the implicit close of the file handle. It doesn't hang indefinitely, but for the remainder of the 120 seconds it takes for the child to end. This is the behaviour one should observe for both versions of the program, and it is the behaviour one does observe for both versions unless one is using an old version of Perl: up until the release of 5.18 (9 years ago), a bug caused the closing of file handles to be skipped during global destruction, which is when the bareword handle from the first version gets closed.
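One workaround, as a rough sketch (not tested against the original setup): declare the handle outside the eval so the exception doesn't trigger the implicit close, and kill the child before closing, so neither the explicit nor the implicit close has to wait out the remaining 120 seconds.

#!/usr/bin/perl
local $SIG{ALRM} = sub { die "timed out\n"; };

my ($pid, $fh);    # declared outside the eval so the die doesn't destroy $fh
eval {
    alarm 10;
    $pid = open $fh, '-|', 'sleep 120' or die "can't open: $!";
    my $line = <$fh>;
    alarm 0;
};
kill 'KILL', $pid if $pid and $@;   # end the child first ...
close $fh if $fh;                   # ... so this close returns promptly
print "done\n";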
Also posted on PerlMonks.
I have this very simple Perl script on my linux server.
What I would like to be able to do is:
Call the script from a browser on a separate machine
Have the script initiate a fork
Have the parent send an httpResponse (freeing up the browser)
Immediately end the parent
Allow the child to do its job (heavy, complex database work), which could take a minute or two
Have the child end itself with no output whatsoever
When I call this script from a browser, the browser does not receive the sent response till the child is complete.
Yes, it works when called from the command line.
Is what I want to do possible?
p.s. I even tried it with ProcSimple, but I get the same hang up.
#!/usr/bin/perl
local $SIG{CHLD} = "IGNORE";
use lib '/var/www/cgi-bin';
use CGI;
my $q = new CGI;
if (!defined($pid = fork())) {
    die "Cannot fork a child: $!";
} elsif ($pid == 0) {
    print $q->header();
    print "i am the child\n";
    sleep(10);
    print "child is done\n";
    exit;
} else {
    print $q->header();
    print "I am the parent\n";
    print "parent is done\n";
    exit 0;
}
exit 0;
In general you must detach the child process from its parent to allow the parent to exit cleanly -- otherwise the parent can't assume that it won't need to handle more input/output.
} elsif ($pid == 0) {
    close STDIN;
    close STDERR;
    close STDOUT;   # or redirect
    do_long_running_task();
    exit;
}
In your example, the child process is making print statements until it exits. Where do those prints go if the parent process has been killed and closed its I/O handles?
One way for a parent process to start another process that will go on its own is to "double fork." The child itself forks and it then exits right away, so its child is taken over by init and can't be a zombie.
This may help here, as it does seem that there may be blocking since file descriptors are shared between parent and child, as brought up in comments. If the child were to exit quickly that might work, but since you need a process for a long-running job, fork twice:
use warnings;
use strict;
use feature 'say';

my $pid = fork // die "Can't fork: $!";

if ($pid == 0) {
    say "\tChild. Fork";

    my $ch_pid = fork // die "Can't fork from child: $!";

    if ($ch_pid == 0) {
        # grandchild, run the long job
        sleep 10;
        say "\t\tgrandkid done";
        exit;
    }

    say "\tChild, which just forked, exiting right away.";
    exit;
}

say "Parent, and done";
I am not sure how to simulate your setup to test whether this helps, but since you say that the child produces "no output whatsoever" it may be enough. It should be worth trying, since it's simpler than daemonizing the process (which I'd expect to do the trick).
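For completeness, a minimal daemonizing sketch (assuming a POSIX system; untested in the OP's CGI setup). The idea is that the parent prints and flushes the HTTP response first, then the child fully detaches from the web server before doing the long work:

use POSIX qw(setsid);

sub daemonize {
    my $pid = fork // die "Can't fork: $!";
    exit 0 if $pid;                       # the process the web server is waiting on exits
    setsid() or die "Can't start a new session: $!";
    open STDIN,  '<', '/dev/null' or die "Can't reopen STDIN: $!";
    open STDOUT, '>', '/dev/null' or die "Can't reopen STDOUT: $!";
    open STDERR, '>', '/dev/null' or die "Can't reopen STDERR: $!";
    # ... now do the long-running database work and exit
}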
Similarly to @mob's post, here's how my web apps do it:
# fork long task
if (my $pid = fork) {
    # parent: return with http response to web client
} else {
    # child: suppress further IO to ensure termination of http connection to client
    open STDOUT, '>', "/dev/null";
    open STDIN,  '<', "/dev/null";
    open STDERR, '>', "/dev/null";
}
# Child carries on from here.
Sometimes the (child) long process prints to a semaphore or status file that the web client may watch to see when the long process is complete.
I don't remember which Perl adept suggested this years ago, but it's served reliably in many situations, and seems very clear from the "re-visit it years later - what was I doing?" perspective...
Note that /dev/null may not work outside of UNIX/Linux, in which case @mob's use of close might be more universal.
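As a rough illustration of the status-file idea mentioned above (the path and format here are placeholders, not from the original apps):

# in the child, once the long job has finished:
my $status_file = "/var/www/myapp/status/job-$$.txt";   # hypothetical location
open my $status, '>', $status_file or die "Can't write $status_file: $!";
print {$status} "done\n";
close $status;

# a later "check status" request from the web client just looks for that file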
How can I echo a progress bar while an external process is executing, and capture its STDOUT when it's done, using only standard modules and without using fork?
Run an external process, something like: @array = `ls -l`;
While it is executing, print a progress bar, like: print '.';
Capture the process's STDOUT into an array when it's done
Continue with the main script
I'm reading about IPC::Open2, IPC::Open3, but I don't understand how to use them for this task. Maybe it's not the right direction?
What do you have so far? If you are having trouble with the interprocess communication, forget about the progress bar for the moment and ask just about that.
You can't really have a progress bar for something that has an indeterminate end. If you don't know how much input you will read, you don't know what fraction of it you have read. People tend to think of progress bars as a representation of the fraction of work done, not just activity. That is, unless you use macOS and understand that "less than one minute" means "more than three hours". ;)
I tend to do something simple, where I output a dot every so often. I don't know how many dots I'll see, but I know that I'll see new ones.
$|++;    # unbuffer STDOUT to see dots as they are output

while( <$fh> ) {
    # $chunk_size and $row_length are presumed set earlier
    print '.'       unless $. % $chunk_size;                  # a dot every $chunk_size lines ($. is the line number)
    print "\n[$.] " unless $. % ($chunk_size * $row_length);  # start a new row of dots every $row_length dots
    ...
}
That $fh can be anything that you want to read from, including a pipe. perlopentut has examples of reading from external processes. Those are doing a fork, though. And, the other modules will fork as well. What's the constraint that makes you think you can't use fork?
You can get more fancy with your display by using curses and other things (a carriage return is handy :), but I'm not inclined to type those out.
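If you do reach for the carriage return, a tiny spinner sketch along those lines (reusing the $fh from above, and assuming output goes to a terminal):

$|++;                                  # unbuffer STDOUT so the spinner updates immediately
my @spin = ('|', '/', '-', '\\');
my $i = 0;
while ( <$fh> ) {
    printf "\rworking %s (line %d)", $spin[ $i++ % @spin ], $.;
}
print "\rdone", ' ' x 20, "\n";        # overwrite the spinner line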
Perhaps the OP is looking for something like the following, just to indicate that the external process is running.
Define a handler for $SIG{ALRM} and set alarm 1 to run the handler every second. Once the process completes, reset with alarm 0 to turn the alarm handler off.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;

my $ls_l;    # variable to store output of external command

$| = 1;      # unbuffered output

$SIG{ALRM} = \&handler;
alarm 1;     # run handler every second

say 'Running sig_alarm_sleep';
$ls_l = `./sig_alarm_sleep`;
say ' done';

alarm 0;

my @fields = qw(rwx count user group size month day time name);
my @lines  = split("\n", $ls_l);
my @array;

for( @lines ) {
    my $x->@{@fields} = split(' ', $_);   # postfix-dereference slice; needs Perl 5.24+ (or the 'postderef' feature)
    push @array, $x;
}

say Dumper(\@array);

exit 0;

sub handler {
    print '.';
    $SIG{ALRM} = \&handler;
    alarm 1;
}
Bash script sig_alarm_sleep sample
#!/usr/bin/bash
sleep 20
ls -al
I don't know if I accidentally deleted something or put in a typo somewhere, but all of a sudden some of my code stopped working. For some reason no lines are ever read from $in.
use Win32::Job;
use IO::Handle;

STDOUT->autoflush;

pipe my $in, my $out;

my $job = Win32::Job->new;

sub flush_pipe {
    while (defined(my $line = <$in>)) {
        chomp($line);
        print($line);
    }
}

my $pid = $job->spawn("cmd", "cmd /C \"ipconfig\"",
    {
        stdout => $out
    }
);
flush_pipe();
Edit:
Through trial and error I eventually found out I have to close the $out filehandle before flushing the pipe.
A pipe is unidirectional. Each of the processes it connects can either read or write.
After pipe you have two filehandles, and both the parent and the child see them both. If the child is going to write and the parent to read, as in your code, then the child must first close the handle it won't use ($in) and the parent must close its unused one, $out. Otherwise you'll have deadlocks.
The spawn from the module starts a child process (or, rather, a Windows approximation of it) and redirects its STDOUT to the writing end of the pipe, $out.
Some very basic code that should cover this
use strict;
use warnings;
use feature 'say';

pipe my $in, my $out;

my $pid = fork // die "Can't fork: $!";

if ($pid == 0) {    # child
    close $in;
    print $out "hi ";        # can't read this yet (no newline) ...
    sleep 1;
    say $out "from child";   # now the other end can read it
    close $out;
    exit;
}

# parent
close $out;
say while <$in>;
close $in;

wait;
When you want prints to become available to the reader right away (up to buffering outside your code), send a newline. Close unused ends of the pipe in each process before doing anything else.
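If you want each print to reach the pipe right away regardless of buffering, you can also turn on autoflush on the write end (the reader still needs a newline to complete a readline):

use IO::Handle;        # for the autoflush method on lexical handles
$out->autoflush(1);    # in the child: flush each print to $out immediately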
I can't write code on Windows now, but in your code the parent must close $out (after the spawn).
The term "flush" here can relate to the code in the writer or to Perl's clearing of IO buffers; the code in your flush_pipe() merely reads the pipe. So I'd change that name, to read_pipe or such.
I'm working on a library with a test suite that uses Perl open to run its tests. It looks something like this:
open (MYOUT, "$myprog $arg1 $arg2 $arg3 2>&1 |") or die "Bad stuff happened";
What I'd really like to do is to measure the runtime of $myprog. Unfortunately, just grabbing a start time and end time around the open command just grabs roughly how long it takes to start up the process.
Is there some way of either forcing the open command to finish the process (and therefore accurately measure time) or perhaps something else that would accomplish the same thing?
Key constraints are that we need to capture (potentially a lot of) STDOUT and STDERR.
Since you open a pipe, you need to time from before opening to at least after the reading
use warnings;
use strict;
use Time::HiRes qw(gettimeofday tv_interval sleep);
my $t0 = [gettimeofday];
open my $read, '-|', qw(ls -l) or die "Can't open process: $!";
while (<$read>)
{
    sleep 0.1;
    print;
}
print "It took ", tv_interval($t0), " seconds\n";
# close pipe and check
or, to time the whole process, after calling close on the pipe (after all reading is done)
my $t0 = [gettimeofday];
open my $read, '-|', qw(ls -l) or die "Can't open process: $!";
# ... while (<$read>) { ... }
close $read or
warn $! ? "Error closing pipe: $!" : "Exit status: $?";
print "It took ", tv_interval($t0), " seconds\n";
The close blocks and waits for the program to finish
Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? [...]
For the status check see $? variable in perlvar and system
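For instance, to unpack $? after the close (following the example in the documentation for system):

# after close $read, $? holds the wait status of the piped command
if    ($? == -1) { warn "Failed to execute: $!\n" }
elsif ($? & 127) { warn "Child died with signal ", ($? & 127), "\n" }
else             { printf "Child exited with value %d\n", $? >> 8 }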
If the timed program forks and doesn't wait on its children in a blocking way this won't time them correctly.
In that case you need to identify resources that they use (files?) and monitor that.
I'd like to add that external commands should be put together carefully, to avoid shell injection trouble. A good module is String::ShellQuote. See for example this answer and this answer
Using a module for capturing streams would free you from the shell and perhaps open other ways to run and time this more reliably. A good one is Capture::Tiny (and there are others as well).
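For example, a rough sketch with Capture::Tiny and Time::HiRes (using the question's $myprog and arguments; the list form of system avoids the shell entirely):

use Time::HiRes qw(gettimeofday tv_interval);
use Capture::Tiny qw(capture);

my $t0 = [gettimeofday];

my ($stdout, $stderr, $exit) = capture {
    system($myprog, $arg1, $arg2, $arg3);   # list form: no shell, no quoting trouble
};

print "It took ", tv_interval($t0), " seconds\n";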
Thanks to HåkonHægland for comments. Thanks to ikegami for setting me straight, to use close (and not waitpid).
My understanding is that closing the handle for an IO::Pipe object should be done with the method ($fh->close) and not the built-in (close($fh)).
The other day I goofed and used the built-in out of habit on a IO::Pipe object that was opened to a command that I expected to fail. I was surprised when $? was zero, and my error checking wasn't triggered.
I realized my mistake. If I use the built-in, IO::Pipe can't perform the waitpid() and can't set $?. But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
I worked up a little test script to show what I mean:
use 5.012;
use warnings;
use IO::Pipe;
say 'init pipes:';
pipes();
my $fh = IO::Pipe->reader(q(false));
say 'post open pipes:';
pipes();
say 'return: ' . $fh->close;
#say 'return: ' . close($fh);
say 'status: ' . $?;
say q();
say 'post close pipes:';
pipes();
sub pipes
{
    for my $fd ( glob("/proc/self/fd/*") )
    {
        say readlink($fd) if -p $fd;
    }
    say q();
}
When using the method it shows the pipe being gone after the close and $? is set as I expected:
init pipes:
post open pipes:
pipe:[992006]
return: 1
status: 256
post close pipes:
And, when using the built-in it also appears to close the pipe, but does not set $?:
init pipes:
post open pipes:
pipe:[952618]
return: 1
status: 0
post close pipes:
It seems odd to me that the built-in results in the pipe closure, but doesn't set $?. Can anyone help explain the discrepancy?
Thanks!
If you look at the code for IO::Handle (of which IO::Pipe::End is a sub-class), you will see the following:
sub close {
    @_ == 1 or croak 'usage: $io->close()';
    my ($io) = @_;
    close($io);
}
It looks like $fh->close just calls close $fh. Of course, we should not be peeking behind the curtain.
We can see after IO::Pipe does a close $fh (behind the scenes), it then does a waitpid:
package IO::Pipe::End;
our(@ISA);
@ISA = qw(IO::Handle);

sub close {
    my $fh = shift;
    my $r = $fh->SUPER::close(@_);    # <-- This just calls a CORE::close
    waitpid(${*$fh}{'io_pipe_pid'}, 0)
        if (defined ${*$fh}{'io_pipe_pid'});
    $r;
}
Also interesting is this from the close Perldoc:
If the filehandle came from a piped open, close returns false if one of the other syscalls involved fails or if its program exits with non-zero status. If the only problem was that the program exited non-zero, $! will be set to 0.
Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? and ${^CHILD_ERROR_NATIVE}.
That answers your question right there.
But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
Why would it? It has no way to know the process at the other end is a child, much less one for which the program should wait. Since it has no reason to call waitpid, $? isn't going to get set.
In fact, I doubt it could wait for the process at the other end of the pipe even if it wanted to, because I doubt there's a way of obtaining the pid of the process at the other end of the pipe, since it's actually possible for there to be multiple processes at the other end.
IO::Pipe::close only calls waitpid when IO::Pipe is used to "open a process".
Similarly, close only calls waitpid when open is used to "open a process".
A process "opened" using one method cannot be closed by the other.
It turns out that my confusion stems from a flawed assumption that the disappearing pipe coincided with a complete process termination. That appears to not be the case, as the process is still available for a wait().
> perl -MIO::Pipe -le 'my $io = IO::Pipe->reader(q(false)); close($io); print $?; print wait(); print $?'
0
8857
256