IPC::Open3 and determining if child is waiting for input - perl

sub run_command
{
    my $COMMAND = shift;
    my @OUTPUT;
    my %CMD;
    $CMD{pid} = open3(my $CH_IN, my $CH_OUT, my $CH_ERR, $COMMAND);
    $CMD{_STDIN}  = $CH_IN;
    $CMD{_STDOUT} = $CH_OUT;
    $CMD{_STDERR} = $CH_ERR;
    my $line = readline $CMD{_STDOUT};
    print $line;
    # open my $CMDPROC, q{-|}, $COMMAND or return;
    # foreach (<$CMDPROC>)
    # {
    #     push @OUTPUT, "$ARG";
    # }
    # close $CMDPROC or return;
    return @OUTPUT;
}
The above code is part of a script I am writing which needs to run another script (called child). The child may or may not prompt for input, depending on the presence of a cookie file in /var/tmp (both scripts were written on CentOS 5 / Perl 5.8.8).
I need to determine if and when the child is waiting for input, so that the parent can pass along input from its own STDIN. I also need to use open3 to open the child process, because the parent has to pass the brutal (severity 1) check of Perl::Critic.
I included the commented-out code because, when the cookie file is already set, I can at least get the parent to call the child properly, since the child doesn't wait for input in that case.
I've checked around trying to find examples of how to determine if the child is waiting for input. The one example I found used strace (http://www.perlmonks.org/?node_id=964971), and that feels too complex for what I am trying to do.
Any links to guide me will be greatly appreciated.

You can check if there's space in the pipe (using select). You can even check how much space is available in the pipe. However, I've never heard of the ability to check if a thread is blocked waiting to read from the pipe. I think you should explore other avenues.
It seems to me that a program that only reads from STDIN when certain conditions unrelated to arguments are met would provide a prompt indicating it's waiting for input. If that's the case, one could use Expect to launch and control the child program.
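For instance, a minimal Expect sketch (the ./child path and the prompt text are assumptions for illustration):
use Expect;

my $exp = Expect->spawn('./child') or die "Cannot spawn child: $!";

# Wait up to 10 seconds for the prompt; if it appears, answer it.
if ($exp->expect(10, 'Enter cookie value:')) {
    $exp->send("some input\n");
}
$exp->soft_close();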
But the simplest solution would be to write the data to STDIN unconditionally. Implementing this using IPC::Open3 is very complicated[1], so I recommend switching to IPC::Run3 (simpler) or IPC::Run (more flexible).
# Captures the child's STDERR
run3 [ $prog, @args ], \$text_for_stdin, \my $text_from_stdout, \my $text_from_stderr;
or
# Inherits the parent's STDERR
run3 [ $prog, @args ], \$text_for_stdin, \my $text_from_stdout;
1. When you both write to the child's STDIN and read from the child's STDOUT, you need to use select (or something similar) to avoid deadlocks. IPC::Open3 is very low-level and doesn't do this for you, whereas handling it is the raison d'être of IPC::Run3 and IPC::Run.
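If you do need to interact with the child incrementally, a rough IPC::Run sketch looks like this ($prog and @args are placeholders); start/pump/finish run the select loop for you:
use IPC::Run qw( start pump finish );

my ($in, $out, $err) = ('', '', '');
my $h = start [ $prog, @args ], \$in, \$out, \$err;

$in .= "text for the child\n";
pump $h while length $in;   # feed STDIN, collecting output as it goes
finish $h;                  # close the child's STDIN and drain the rest

print $out;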

Related

Flush output of child process

I created a child process via IPC::Open2.
I need to read from the stdout of this child process line by line.
Problem is, as the stdout of the child process is not connected to a terminal, it's fully buffered and I can't read from it until the process terminates.
How can I flush the output of the child process without modifying its code?
Child process code:
while (<STDIN>) {
    print "Received : $_";
}
Parent process code:
use IPC::Open2;
use Symbol;

my $in  = gensym();
my $out = gensym();
my $pid = open2($out, $in, './child_process');
while (<STDIN>) {
    print $in $_;
    my $line = <$out>;
    print "child said : $line";
}
When I run the code, it gets stuck waiting for the output of the child process.
However, if I run it with bc the result is what I expect; I believe bc must manually flush its output.
Note:
In the child process, if I add $| = 1 at the beginning or STDOUT->flush() after printing, the parent process can read from it properly.
However, this is just an example; I must handle programs that don't manually flush their output.
Unfortunately Perl has no control over the buffering behavior of the programs it executes. Some systems have an unbuffer utility that can do this. If you have access to this tool, you could say
my $pid = open2($out, $in, 'unbuffer ./child_process');
There's a discussion here about the equivalent tools for Windows, but I couldn't say whether any of them are effective.
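If unbuffer isn't available, GNU coreutils ships stdbuf, which can often achieve the same effect for dynamically linked programs (a sketch; -oL requests line buffering on the child's stdout):
my $pid = open2($out, $in, 'stdbuf', '-oL', './child_process');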
One way to (try to) deal with buffering is to set up a terminal-like environment for the process, a pseudo-terminal (pty). That is not easy to do in general but IPC::Run has that capability ready for easy use.
Here is the driver; run it for testing using the at facility so that it has no controlling terminal (or run it via cron):
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(run);
my @cmd = qw(./t_term.pl input arguments);
run \@cmd, '>pty>', sub { say "out: @_" };
#run \@cmd, '>', sub { say "out: @_" };  # no pty
With >pty> it sets up a pseudo-terminal for the STDOUT of the program in @cmd (with > it's a pipe); also see <pty<, and see the docs for more about redirection.
The anonymous sub {} gets called every time there is output from the child, so one can process it as it goes. There are other related options.
The program that is called (t_term.pl) only tests for a terminal
use warnings;
use strict;
use feature 'say';
say "Is STDOUT filehandle attached to a terminal: ",
( (-t STDOUT) ? "yes" : "no" );
sleep 2;
say "bye from $$";
The -t STDOUT (see filetest operators) is a suitable way to check for a terminal in this example. For more/other ways see this post.
The output shows that the called program (t_term.pl) does see a terminal on its STDOUT, even when a driver runs without one (using at, or out of a crontab). If the >pty> is changed to the usual redirection > (a pipe) then there is no terminal.
Whether this solves the buffering problem is clearly up to that program, and to whether it is enough to fool it with a terminal.
Another way around the problem is using unbuffer when possible, as in mob's answer.

does Perl's Capture::Tiny::capture() avoid disk io required when using system()?

When calling an external program from a Perl script, does Capture::Tiny avoid the disk I/O required when using system()? I get essentially the same performance with either. A colleague is using my code and told me that it was hammering his disks. I (perhaps) don't have this problem when running on my local machine and writing to local disks.
I was previously doing this:
open($fhStdin, ">stdin.txt");
print $fhStdin "some text\n";
close($fhStdin);
system("cmd < stdin.txt 1> stdout.txt 2> stderr.txt");
# open and read stdout.txt
# open and read stderr.txt
And changed to this:
($stdout, $stderr, $exit) = capture {
    open($fhStdin, '| cmd');
    print $fhStdin "some text\n";
    close($fhStdin);
};
But NYTProf tells me that they take essentially the same amount of time to run (but NYTProf removes disk io overheads from subroutine times). So I wondered if capture() is writing to temporary files under the hood? (I tried reading the Tiny.pm source code but am ashamed to say I couldn't tell from that.)
Thanks for any tips.
The documentation for Capture::Tiny::capture states that files are indeed used
Captures are normally done to an anonymous temporary filehandle.
This can be seen in the source for the _capture_tee sub, used as a generic routine for all methods. About half-way through this sub we find a call to File::Temp->new happening, unless named files are to be used (see below). The rest of processing can be traced with some care.†
The docs proceed to offer a way to monitor all this via named files instead
To capture via a named file (e.g. to externally monitor a long-running capture), provide custom filehandles as a trailing list of option pairs:
my $out_fh = IO::File->new("out.txt", "w+");
my $err_fh = IO::File->new("err.txt", "w+");
capture { ... } stdout => $out_fh, stderr => $err_fh;
The filehandles must be read/write and seekable. Modifying the files or filehandles during a capture operation will give unpredictable results. Existing IO layers on them may be changed by the capture.
(If this is done then the call to File::Temp doesn't go, as mentioned above. See source.)
If this disk activity is a problem you can use a piped open to read cmd's output (writing its input to a file first), or use qx (backticks). But then you'd have to merge or redirect STDERR, and go through more hoops to check and handle errors.
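A rough sketch of the piped-open route (the file name and cmd are just for illustration):
# Write the child's input to a file first ...
open my $fh_in, '>', 'stdin.txt' or die "Can't write stdin.txt: $!";
print $fh_in "some text\n";
close $fh_in;

# ... then read its output through a pipe, merging STDERR via the shell.
open my $pipe, '-|', 'cmd < stdin.txt 2>&1' or die "Can't run cmd: $!";
my @output = <$pipe>;
close $pipe or warn "cmd exited with status $?";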
Another option is to use IPC::Run3. While it also uses files, it offers far more options, which may be leveraged to lessen the disk I/O or perhaps avoid disk altogether. (The idea of invoking it with a filehandle opened to a scalar (in-memory) doesn't work, since that isn't a real filehandle.‡)
The "nuclear" option is the more complex IPC::Run which can take output without using disk.
† A crude sketch
The "dispatch" of all methods to _capture_tee is done in the beginning, where a set of flags is unshifted to #_ before goto &func takes it away, to distinguish methods. For capture this is 1,1,0,0, what sets up variables $do_stdout and $do_stderr in _capture_tee. These are then used to set up the %do hash, which keys are iterated over to set up $stash.
If extra arguments were passed to capture (for named files) then $stash->{capture} is set, otherwise a File::Temp object is assigned.
The $stash is later passed to _open_std where the redirection happens.
There is a lot more, but mostly related to manipulation of localized globs and layers.
‡ The most usual invocation writes to scalar(s)
run3 \@cmd, \my $in, \my $out, \my $err;
but this uses files, as explained in docs under How it works.
An attempt to trick it into not using files, by writing to a filehandle which is opened to a scalar
my @cmd = qw(ls -l .);
open my $fh, '>', \my $cmd_out;  # not a real filehandle ...
run3 \@cmd, \undef, $fh;         # ... so this won't work
aborts with
run3(): Invalid argument redirecting STDOUT at ...
This is because an open to a scalar doesn't set up a real filehandle. See this post.
If the filehandle is opened to a file this works as intended, writing to that file. This may well result in a more efficient disk I/O operation, compared with what Capture::Tiny does.
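A sketch of that variant (out.txt is a hypothetical name):
use IPC::Run3;

my @cmd = qw(ls -l .);
open my $fh, '>', 'out.txt' or die $!;
run3 \@cmd, \undef, $fh;   # the child's STDOUT is written to out.txt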

Perl: Pass one byte plus STDIN to another command

I would like to do this efficiently:
my $buf;
my $len = read(STDIN, $buf, 1);
if ($len) {
    # Not empty
    open(OUT, "|-", "wc") || die;
    print OUT $buf;
    # This is the line I want to do faster
    print OUT <STDIN>;
    exit;
}
The task is to start wc only if there is any input. If there is no input, the program should just exit.
wc is just an example here. It will be substituted with a much more complex command.
The input can be several TB of data, so I would really like to not touch that data at all (not even with a sysread). I tried doing:
pipe(STDIN,OUT);
But that doesn't work. Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN? Maybe some open(">=&2") gymnastics combined with exec?
The FIONREAD ioctl, mentioned in the Perl Cookbook, can tell you how many bytes are pending on a file descriptor without consuming them. In perlish terms:
use strict;
use warnings;
use IO::Select qw( );
BEGIN { require 'sys/ioctl.ph'; }
sub fionread {
    my $sz = pack('L', 0);
    return unless ioctl($_[0], FIONREAD, $sz);
    return unpack('L', $sz);
}

# Wait until it's known whether the handle has data to read or has reached EOF.
IO::Select->new(\*STDIN)->can_read();

if (fionread(\*STDIN)) {
    system('wc');
    # Check for errors
}
This should be very widely portable to UNIX and UNIX-like platforms.
A child process is always given duplicates of its parent's file handles, so simply starting wc - either with backticks or with a call to system or exec - will cause it to read from the same place as the Perl process's STDIN.
As for starting wc only when there is something to read, it looks like you need IO::Select, which will allow you either to check whether a file handle has something to read, or to block until it does have something.
This program will check whether STDIN has any data waiting, and run wc and print its output if so.
use strict;
use warnings;
use IO::Select;
my $select = IO::Select->new(\*STDIN);
if ( $select->can_read(0) ) {
    print `wc`;
}
The parameter to can_read is a timeout in seconds. Passing a value of zero makes it return immediately, reporting true (actually it returns the file handle itself) if there is data waiting, or false (undef) if not.
If you don't pass a parameter then can_read will wait forever until there is something to read, so you can suspend your program and wait for data for wc by writing just
$select->can_read;
print `wc`;
or you could combine the construction of the object to make it even more concise
IO::Select->new(\*STDIN)->can_read;
print `wc`;
Note also that IO::Select works fine with file descriptors too, and as the fileno for STDIN is zero, you could write
my $select = IO::Select->new(0);
but that isn't very descriptive and would need a comment to make sense.
The specific solution in which you're interested is impossible.
As you surely discovered already, you can't determine if a file handle has reached EOF without reading from it (though apparently you can get close, as the FIONREAD answer above shows). select(2) will tell you that a handle has either reached EOF or has data waiting, but it won't tell you which. This is why you're looking into alternate solutions. Unfortunately, the one you're looking into is just as impossible.
Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN?
No. OUT isn't code; it doesn't read anything. It's a variable. Furthermore, it's a variable in the parent. Changing a variable in the parent isn't going to affect the child.
Maybe you meant to ask: Can one tell the child program to start reading from a second handle?
No, generally speaking. You can't go and edit another program's variables. The program would have to be specifically written to accept two file handles and read from one after the other.
Then again, it's possible to obtain a file name for an arbitrary file handle, so all we need is a program that is specifically written to accept two file names and read from one after the other, and that's quite common.
$ echo abcdef | perl -MFcntl -e'
    if (sysread(STDIN, $buf, 1)) {
        pipe(my $r, my $w);
        my $pid = fork();
        if (!$pid) {
            close($w);
            # Clear the close-on-exec flag so the fd survives exec.
            my $flags = fcntl($r, Fcntl::F_GETFD, 0);
            fcntl($r, Fcntl::F_SETFD, $flags & ~Fcntl::FD_CLOEXEC);
            exec("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
            die $!;
        }
        close($r);
        print($w $buf);
        close($w);
        waitpid($pid, 0);
    }
'
abcdef
(Lots of error checking needed.)
Above, cat was used as an example of where your program would be used, but that suggests another solution: why not just use cat? The overhead of cat should be quite minor for an I/O-bound program.
use String::ShellQuote qw( shell_quote );
my $cmd1 = shell_quote("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
my $cmd2 = ...
exec("$cmd1 | $cmd2");

How do I influence the width of Perl IPC::Open3 output?

I have the following Perl code and would like it to display exactly as invoking /bin/ls in the terminal would display. For example on a terminal sized to 100 columns, it would print up to 100 characters worth of output before inserting a newline. Instead this code prints 1 file per line of output. I feel like it involves assigning some terminal settings to the IO::Pty instance, but I've tried variations of that without luck.
UPDATE: I replaced the <$READER> with a call to sysread hoping the original code might just have a buffering issue, but the output received from sysread is still one file per line.
UPDATE: I added code showing my attempt at changing the IO::Pty's size via the clone_winsize_from method. This didn't result in the output being any different.
UPDATE: As best I can tell (from reading the IPC::Open3 code for version 1.12), it seems you cannot pass a variable of type IO::Handle without open3 creating a pipe rather than dup'ing the filehandle. This means isatty doesn't return a true value when ls invokes it, and ls then forces itself into "one file per line" mode.
I think I just need to do a fork/exec and handle the I/O redirection myself.
#!/usr/bin/env perl
use IPC::Open3;
use IO::Pty;
use strict;
my $READER = IO::Pty->new();
$READER->slave->clone_winsize_from(\*STDIN);
my $pid = open3(undef, $READER, undef, "/bin/ls");
while (my $line = <$READER>)
{
    print $line;
}
waitpid($pid, 0) or die "Error waiting for pid: $!\n";
$READER->close();
I think $READER is getting overwritten with a pipe created by open3, which can be avoided by changing
my $READER = ...;
my $pid = open3(undef, $READER, undef, "/bin/ls");
to
local *READER = ...;
my $pid = open3(undef, '>&READER', undef, "/bin/ls");
See the docs.
You can pass the -C option to ls to force it to use columnar output (without getting IO::Pty involved).
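For instance (a sketch; -C asks ls for multi-column output even when its stdout is not a terminal, though it then typically assumes a width of 80 columns unless COLUMNS is set):
my $pid = open3(undef, my $READER, undef, '/bin/ls', '-C');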
The IO::Pty docs describe a clone_winsize_from(\*FH) method. You might try cloning your actual pty's dimensions.
I see that you're setting up the pty only as stdout of the child process. You might need to set it up also as its stdin — when the child process sends the "query terminal size" escape sequence to its stdout, it would need to receive the response on its stdin.

How can I capture the stdin and stdout of system command from a Perl script?

In the middle of a Perl script, there is a system command I want to execute. I have a string that contains the data that needs to be fed into stdin (the command only accepts input from stdin), and I need to capture the output written to stdout. I've looked at the various methods of executing system commands in Perl, and the open function seems to be what I need, except that it looks like I can only capture stdin or stdout, not both.
At the moment, it seems like my best solution is to use open, redirect stdout into a temporary file, and read from the file after the command finishes. Is there a better solution?
IPC::Open2/3 are fine, but I've found that usually all I really need is IPC::Run3, which handles the simple cases really well with minimal complexity:
use IPC::Run3; # Exports run3() by default
run3( \@cmd, \$in, \$out, \$err );
The documentation compares IPC::Run3 to other alternatives. It's worth a read even if you don't decide to use it.
The perlipc documentation covers many ways that you can do this, including IPC::Open2 and IPC::Open3.
Somewhere at the top of your script, include the line
use IPC::Open2;
That will include the necessary module, usually installed with most Perl distributions by default. (If you don't have it, you could install it using CPAN.) Then, instead of open, call:
$pid = open2($cmd_out, $cmd_in, 'some cmd and args');
You can send data to your command by sending it to $cmd_in and then read your command's output by reading from $cmd_out.
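For example, a minimal sketch using bc (which, as noted elsewhere on this page, flushes its output after each line):
use IPC::Open2;

my $pid = open2(my $cmd_out, my $cmd_in, 'bc');
print $cmd_in "2 + 2\n";
my $answer = <$cmd_out>;   # "4\n"
close $cmd_in;
waitpid($pid, 0);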
If you also want to be able to read the command's stderr stream, you can use the IPC::Open3 module instead.
IPC::Open3 would probably do what you want. It can capture STDERR and STDOUT.
http://metacpan.org/pod/IPC::Open3
A very easy way to do this that I recently found is the IPC::Filter module. It lets you do the job extremely intuitively:
$output = filter $input, 'somecmd', '--with', 'various=args', '--etc';
Note how it invokes your command without going through the shell if you pass it a list. It also does a reasonable job of handling errors for common utilities. (On failure, it dies, using the text from STDERR as its error message; on success, STDERR is just discarded.)
Of course, it’s not suitable for huge amounts of data since it provides no way of doing any streaming processing; also, the error handling might not be granular enough for your needs. But it makes the many simple cases really really simple.
I think you want to take a look at IPC::Open2.
It provides a special Perl function for exactly this:
open2()
More info can be found at: http://sunsite.ualberta.ca/Documentation/Misc/perl-5.6.1/lib/IPC/Open2.html
If you do not want to include extra packages, you can just do
open(TMP, ">tmpfile");
print TMP $tmpdata;
close(TMP);
open(RES, "$yourcommand < tmpfile |");
my $res = "";
while (<RES>) {
    $res .= $_;
}
close(RES);
which is the contrary of what you suggested, but should also work.
I always do it this way if I'm only expecting a single line of output or want to split the result on something other than a newline:
my $result = qx( command args 2>&1 );
my $rc = $?;
# $rc >> 8 is the exit code of the called program.
if ( $rc != 0 ) {
    error();
}
If you want to deal with a multi-line response, get the result as an array:
my @lines = qx( command args 2>&1 );
foreach my $line (@lines) {
    if ( $line =~ /some pattern/ ) {
        do_something();
    }
}