I'm working on a library with a test suite that uses Perl's open to run its tests. It looks something like this:
open (MYOUT, "$myprog $arg1 $arg2 $arg3 2>&1 |") or die "Bad stuff happened";
What I'd really like to do is measure the runtime of $myprog. Unfortunately, grabbing a start time and an end time around the open command measures only roughly how long it takes to start up the process.
Is there some way of either forcing the open command to wait for the process to finish (and therefore measure the time accurately), or perhaps something else that would accomplish the same thing?
Key constraints are that we need to capture (potentially a lot of) STDOUT and STDERR.
Since you open a pipe, you need to time from before the open until at least after all the reading is done:
use warnings;
use strict;
use Time::HiRes qw(gettimeofday tv_interval sleep);
my $t0 = [gettimeofday];
open my $read, '-|', qw(ls -l) or die "Can't open process: $!";
while (<$read>)
{
    sleep 0.1;
    print;
}
print "It took ", tv_interval($t0), " seconds\n";
# close pipe and check
Or, to time the whole process, take the end time after calling close on the pipe (once all reading is done):
my $t0 = [gettimeofday];
open my $read, '-|', qw(ls -l) or die "Can't open process: $!";
# ... while (<$read>) { ... }
close $read or
    warn $! ? "Error closing pipe: $!" : "Exit status: $?";
print "It took ", tv_interval($t0), " seconds\n";
The close blocks and waits for the program to finish:
Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? [...]
For the status check see $? variable in perlvar and system
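For example, a minimal status check right after the close, following the decoding shown in the system docs:

if ($? == -1) {
    print "Failed to execute: $!\n";
}
elsif ($? & 127) {
    printf "Child died with signal %d, %s coredump\n",
        ($? & 127), ($? & 128) ? 'with' : 'without';
}
else {
    printf "Child exited with value %d\n", $? >> 8;
}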
If the timed program forks and doesn't wait for its children in a blocking way, this won't time them correctly.
In that case you need to identify resources that they use (files?) and monitor those.
I'd like to add that external commands should be assembled carefully, to avoid shell-injection trouble. A good module for this is String::ShellQuote.
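For example, a sketch of how the question's command could be assembled safely (the variables are stand-ins here):

use String::ShellQuote qw(shell_quote);

# stand-ins for the question's $myprog and arguments
my ($myprog, $arg1, $arg2, $arg3) = ('./myprog', 'a', 'b', 'c d');

# shell_quote quotes each word, so metacharacters in the
# arguments can't break out of the command
my $cmd = shell_quote($myprog, $arg1, $arg2, $arg3);
open my $fh, '-|', "$cmd 2>&1" or die "Can't run '$cmd': $!";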
Using a module for capturing streams would free you from the shell and perhaps open other ways to run and time this more reliably. A good one is Capture::Tiny (and there are others as well).
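A minimal sketch with Capture::Tiny, which grabs both streams while Time::HiRes times the whole run; the @cmd list is a stand-in for the question's program and arguments:

use Capture::Tiny qw(capture);
use Time::HiRes qw(gettimeofday tv_interval);

my @cmd = ('ls', '-l');  # stand-in for ($myprog, $arg1, $arg2, $arg3)

my $t0 = [gettimeofday];
# capture runs the block and returns the child's STDOUT and STDERR;
# system's return value (the exit status) comes back as well
my ($stdout, $stderr, $exit) = capture { system(@cmd) };
print "It took ", tv_interval($t0), " seconds\n";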
Thanks to Håkon Hægland for comments. Thanks to ikegami for setting me straight on using close (and not waitpid).
Related
The following script creates a gzipped file named "input.gz". Then the script attempts to open "input.gz" using gzip -dc. Intuitively, die should be triggered if a wrong input file name is provided. However, as in the following script, the program will not die even when a wrong input file name is provided ("inputx.gz"):
use warnings;
use strict;
system("echo PASS | gzip -c > input.gz");
open(IN,"-|","gzip -dc inputx.gz") || die "can't open input.gz!";
print STDOUT "die statment was not triggered!\n";
close IN;
The output of the script above was
die statement was not triggered!
gzip: inputx.gz: No such file or directory
My question is: why wasn't the die statement triggered even though gzip quit with an error? And how can I make the die statement trigger when a wrong file name is given?
It's buried in perlipc, but this seems relevant (emphasis added):
Be careful to check the return values from both open() and close(). If you're writing to a pipe, you should also trap SIGPIPE. Otherwise, think of what happens when you start up a pipe to a command that doesn't exist: the open() will in all likelihood succeed (it only reflects the fork()'s success), but then your output will fail--spectacularly. Perl can't know whether the command worked, because your command is actually running in a separate process whose exec() might have failed. Therefore, while readers of bogus commands return just a quick EOF, writers to bogus commands will get hit with a signal, which they'd best be prepared to handle.
Use IO::Uncompress::Gunzip to read gzipped files instead.
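For example, a minimal sketch with its object interface; new returns undef for a missing file, with the reason in $GunzipError:

use IO::Uncompress::Gunzip qw($GunzipError);

# no external gzip process, so a bad file name fails right here
my $z = IO::Uncompress::Gunzip->new('input.gz')
    or die "Can't open input.gz: $GunzipError\n";

while (my $line = $z->getline) {
    print $line;
}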
The open documentation is explicit about open-ing a process, since that is indeed different:
If you open a pipe on the command - (that is, specify either |- or -| with the one- or two-argument forms of open), an implicit fork is done, so open returns twice: in the parent process it returns the pid of the child process, and in the child process it returns (a defined) 0. Use defined($pid) or // to determine whether the open was successful.
For example, use either
my $child_pid = open(my $from_kid, "-|") // die "Can't fork: $!";
or
my $child_pid = open(my $to_kid, "|-") // die "Can't fork: $!";
(The docs follow with code showing one use of this, which you don't need here.) The main point is to check for defined-ness: by design we get undef if open for a process fails, not just any "false."
While this should be corrected, keep in mind that the open call fails only if fork itself fails, which is rare; in most cases when a "command fails," the fork was successful but something later wasn't. In such cases we never get the // die message, and instead (hopefully) end up seeing messages from the shell, the command, or the OS.
This is alright though, if informative messages indeed get emitted by some part of the process. Wrap the whole thing in eval and you'll have manageable error reporting.
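A minimal sketch of such wrapping, with an example @cmd like the one used below:

my @cmd = ('gzip', '-dc', 'input.gz');  # example command

my $ok = eval {
    my $child_pid = open(my $in, '-|', @cmd) // die "Can't fork for '@cmd': $!";
    print while <$in>;
    close $in or die $! ? "Error closing pipe: $!" : "Exit status: $?";
    1;
};
warn "Problem running '@cmd': $@" if not $ok;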
But it is in general difficult to ensure getting all the right messages, and in some cases not possible. One good approach is to use a module for running and managing external commands. Among their many other advantages, they usually handle errors much more nicely. If you need to handle the process's output right as it is emitted, I recommend IPC::Run (which I'd recommend in general as well).
Read what the linked docs say for specific examples of what you need, and for much useful insight.
In your case
# Check input, depending on how it is given;
# consider String::ShellQuote if needed
my $file = ...;

my @cmd = ('gzip', '-dc', $file);

my $child_pid = open(my $in, '-|', @cmd)
    // die "Can't fork for '@cmd': $!";

while (<$in>) {
    ...
}

close $in or die "Error closing pipe: $!";
Note a few other points
the "list form" of the command bypasses the shell
a lexical filehandle (my $fh) is much better than a typeglob (IN)
print the actual error in the die statement, found in the $! variable
use close for a good final check on how it all went
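To illustrate the first point, a small sketch contrasting the two forms (with a hypothetical $file):

my $file = 'input.gz';

# string form: the whole command line goes through the shell, so shell
# metacharacters in $file would be interpreted (injection risk)
open(my $unsafe, '-|', "gzip -dc $file") or die "Can't run: $!";

# list form: gzip is exec'd directly with its arguments; no shell involved
open(my $safe, '-|', 'gzip', '-dc', $file) // die "Can't fork: $!";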
I created a child process via IPC::Open2.
I need to read from the stdout of this child process line by line.
Problem is, as the stdout of the child process is not connected to a terminal, it's fully buffered and I can't read from it until the process terminates.
How can I flush the output of the child process without modifying its code?
child process code:
while (<STDIN>) {
    print "Received : $_";
}
parent process code:
use IPC::Open2;
use Symbol;
my $in = gensym();
my $out = gensym();
my $pid = open2($out, $in, './child_process');
while (<STDIN>) {
    print $in $_;
    my $line = <$out>;
    print "child said : $line";
}
When I run the code, it gets stuck waiting for the output of the child process.
However, if I run it with bc, the result is what I expect; I believe bc must flush its output manually.
note:
In the child process, if I add $| = 1 at the beginning, or STDOUT->flush() after printing, the parent process can read from it properly.
However this is an example and I must handle programs that don't manually flush their output.
Unfortunately Perl has no control over the buffering behavior of the programs it executes. Some systems have an unbuffer utility that can do this. If you have access to this tool, you could say
my $pid = open2($out, $in, 'unbuffer ./child_process');
There have been discussions about equivalent tools for Windows, but I couldn't say whether any of them are effective.
One way to (try to) deal with buffering is to set up a terminal-like environment for the process, a pseudo-terminal (pty). That is not easy to do in general but IPC::Run has that capability ready for easy use.
Here is the driver; for testing, run it using the at facility so that it has no controlling terminal (or run it via cron):
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(run);
my @cmd = qw(./t_term.pl input arguments);

run \@cmd, '>pty>', sub { say "out: @_" };
#run \@cmd, '>', sub { say "out: @_" };  # no pty
With >pty> it sets up a pseudo-terminal for STDOUT of the program in @cmd (with > it's a pipe); also see <pty< and see more about redirection.
The anonymous sub {} gets called every time there is output from the child, so one can process it as it goes. There are other related options.
The program that is called (t_term.pl) only tests for a terminal:
use warnings;
use strict;
use feature 'say';
say "Is STDOUT filehandle attached to a terminal: ",
( (-t STDOUT) ? "yes" : "no" );
sleep 2;
say "bye from $$";
The -t STDOUT (see filetest operators) is a suitable way to check for a terminal in this example.
The output shows that the called program (t_term.pl) does see a terminal on its STDOUT, even when a driver runs without one (using at, or out of a crontab). If the >pty> is changed to the usual redirection > (a pipe) then there is no terminal.
Whether this solves the buffering problem is clearly up to that program, and to whether it is enough to fool it with a terminal.
Another way around the problem is using unbuffer when possible, as in mob's answer.
I need (would like?) to spawn a slow process from a web app using a Minion queue.
The process - a GLPK solver - can run for a long time but generates progress output.
I'd like to capture that output as it happens and write it to somewhere (database? log file?) so that it can be played back to the user as a status update inside the web app.
Is that possible? I have no idea (hence no code).
I was exploring Capture::Tiny - its simplicity is nice, but I can't tell whether it can report output as it is being written.
A basic way is to use pipe open, where you open a pipe to a process that gets forked. Then the STDOUT from the child is piped to the filehandle in the parent, or the parent pipes to its STDIN.
use warnings;
use strict;
my @cmd = qw(ls -l .);  # your command

my $pid = open(my $fh, '-|', @cmd) // die "Can't open pipe from @cmd: $!";

while (<$fh>) {
    print;
}

close $fh or die "Error closing pipe from @cmd: $!";
This way the parent receives child's STDOUT right as it is emitted.†
There is a bit more that you can do with error checking; see the man page, close, and $? in perlvar. Also, install a handler for SIGPIPE; see perlipc and %SIG in perlvar.
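For example, a minimal handler (most relevant when writing to a pipe, since the reader may exit early):

# die with a recognizable message instead of being killed silently
$SIG{PIPE} = sub { die "Pipe closed on the other end\n" };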
There are modules that make it far easier to run and manage external commands and, in particular, check errors. However, Capture::Tiny and IPC::Run3 use files to transfer the external program's streams.
On the other hand, IPC::Run gives you far more control and power.
To have code executed "...each time some data is read from the child," use a callback:
use warnings;
use strict;
use IPC::Run qw(run);
my @cmd = (
    'perl',
    '-le',
    'STDOUT->autoflush(1); for (qw( abc def ghi )) { print; sleep 1; }'
);

run \@cmd, '>', sub { print $_[0] };
Once you use IPC::Run a lot more is possible, including better error interrogation, setting up a pseudo-tty for the process, and more. For example, using >pty> instead of > sets up a terminal-like environment, so the external program that is run may revert to line buffering and provide more timely output. If the demands on managing the process grow more complex, the work will be easier with the module.
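As one small example of the error interrogation: run returns true only when the command exits with status 0, and $? holds the raw wait status afterwards. A sketch, with a command that deliberately exits non-zero:

use IPC::Run qw(run);

my @cmd = ('perl', '-e', 'exit 3');
my $out;

run \@cmd, '>', \$out
    or die "Command failed, exit code: " . ($? >> 8) . "\n";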
Thanks to ikegami for comments, including the demo @cmd.
† To demonstrate that the parent receives the child's STDOUT as it is emitted, use a command that emits output with delays. For example, instead of ls -l above use
my @cmd = (
    'perl',
    '-le',
    'STDOUT->autoflush(1); for (qw( abc def ghi )) { print; sleep 1; }'
);
This Perl one-liner prints words one second apart, and that is how they wind up on screen.
I am trying to do a fairly simple process using Perl. A snippet of the code:
open(FH,"<command> |") or die "Could not run command .. $!\n";
print "After open\n";
while (<FH>)
{
    print "I am inside loop\n";
    <process..something>
}
I am seeing some inexplicable delay when the while() is entered. I see that the open took 9-10 ms to run (which is within range); however, I see a 200-250 ms delay between the messages "After open" and "I am inside loop".
Has anyone seen anything like this before? Any help would be appreciated.
This is almost certainly because the output from <command> is buffered until either the buffer fills up or the process terminates.
You can probably get around this using unbuffer, which pretends to the command that it is outputting to a terminal.
Try using this instead:
open my $fh, '-|', 'unbuffer <command>' or die "Could not run command: $!\n";
My understanding is that closing the handle for an IO::Pipe object should be done with the method ($fh->close) and not the built-in (close($fh)).
The other day I goofed and used the built-in out of habit on a IO::Pipe object that was opened to a command that I expected to fail. I was surprised when $? was zero, and my error checking wasn't triggered.
I realized my mistake. If I use the built-in, IO::Pipe can't perform the waitpid() and can't set $?. But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
I worked up a little test script to show what I mean:
use 5.012;
use warnings;
use IO::Pipe;
say 'init pipes:';
pipes();
my $fh = IO::Pipe->reader(q(false));
say 'post open pipes:';
pipes();
say 'return: ' . $fh->close;
#say 'return: ' . close($fh);
say 'status: ' . $?;
say q();
say 'post close pipes:';
pipes();
sub pipes
{
    for my $fd ( glob("/proc/self/fd/*") )
    {
        say readlink($fd) if -p $fd;
    }
    say q();
}
When using the method, it shows the pipe being gone after the close, and $? is set as I expected:
init pipes:
post open pipes:
pipe:[992006]
return: 1
status: 256
post close pipes:
And, when using the built-in it also appears to close the pipe, but does not set $?:
init pipes:
post open pipes:
pipe:[952618]
return: 1
status: 0
post close pipes:
It seems odd to me that the built-in results in the pipe closure, but doesn't set $?. Can anyone help explain the discrepancy?
If you look at the code for IO::Handle (of which IO::Pipe::End is a sub-class), you will see the following:
sub close {
    @_ == 1 or croak 'usage: $io->close()';
    my($io) = @_;
    close($io);
}
It looks like $fh->close just calls close $fh. Of course, we should not be peeking behind the curtain.
We can see that after IO::Pipe does a close $fh (behind the scenes), it then does a waitpid:
package IO::Pipe::End;

our(@ISA);
@ISA = qw(IO::Handle);

sub close {
    my $fh = shift;
    my $r = $fh->SUPER::close(@_);  # <-- This just calls a CORE::close

    waitpid(${*$fh}{'io_pipe_pid'}, 0)
        if (defined ${*$fh}{'io_pipe_pid'});

    $r;
}
Also interesting is this from the close Perldoc:
If the filehandle came from a piped open, close returns false if one of the other syscalls involved fails or if its program exits with non-zero status. If the only problem was that the program exited non-zero, $! will be set to 0.
Closing a pipe also waits for the process executing on the pipe to exit--in case you wish to look at the output of the pipe afterwards--and implicitly puts the exit status value of that command into $? and ${^CHILD_ERROR_NATIVE}.
That answers your question right there.
But what I was surprised by was that perl seemed to still close the pipe without setting $? via the core.
Why would it? It has no way to know the process at the other end is a child, much less one for which the program should wait. Since it has no reason to call waitpid, $? isn't going to get set.
In fact, I doubt it could wait for the process at the other end of the pipe even if it wanted to: there's no general way of obtaining the pid of the process at the other end, and it's actually possible for there to be multiple processes at the other end of the pipe.
IO::Pipe::close only calls waitpid when IO::Pipe is used to "open a process".
Similarly, close only calls waitpid when open is used to "open a process".
A process "opened" using one method cannot be closed by the other.
It turns out that my confusion stems from a flawed assumption that the disappearing pipe coincided with complete process termination. That appears not to be the case, as the process is still available for a wait():
> perl -MIO::Pipe -le 'my $io = IO::Pipe->reader(q(false)); close($io); print $?; print wait(); print $?'
0
8857
256