Perl bidirectional pipe IPC, how to avoid output buffering

I am trying to communicate with an interactive process. I want my perl script to be a "middle man" between the user and the process. The process prints text to stdout, prompts the user for a command, prints more text to stdout, prompts the user for a command, and so on. A primitive graphic is provided:
User <----STDOUT---- interface.pl <-----STDOUT--- Process
User -----STDIN----> interface.pl ------STDIN---> Process
User <----STDOUT---- interface.pl <-----STDOUT--- Process
User -----STDIN----> interface.pl ------STDIN---> Process
User <----STDOUT---- interface.pl <-----STDOUT--- Process
User -----STDIN----> interface.pl ------STDIN---> Process
The following simulates what I'm trying to do:
#!/usr/bin/perl
use strict;
use warnings;
use FileHandle;
use IPC::Open2;

my $pid = open2( \*READER, \*WRITER, "cat -n" );
WRITER->autoflush();    # default here, actually

my $got   = "";
my $input = " ";
while ( $input ne "" ) {
    chomp( $input = <STDIN> );
    print WRITER "$input \n";
    $got = <READER>;
    print $got;
}
Due to output buffering, the above example does not work. No matter what text is typed in, or how many times Enter is pressed, the program just sits there. The way to fix it is to issue:
my $pid = open2( \*READER, \*WRITER, "cat -un" );
Notice "cat -un" as opposed to just "cat -n". -u turns off output buffering on cat. When output buffering is turned off this works. The process I am trying to interact with most likely buffers output as I am facing the same issues with "cat -n". Unfortunately I can not turn off output buffering on the process I am communicating with, so how do I handle this issue?
UPDATE 1 (using a pty):
#!/usr/bin/perl
use strict;
use warnings;
use IO::Pty;
use IPC::Open2;

my $reader = IO::Pty->new;
my $writer = IO::Pty->new;
my $pid = open2( $reader, $writer, "cat -n" );

my $got   = "";
my $input = " ";
$writer->autoflush(1);
while ( $input ne "" ) {
    chomp( $input = <STDIN> );
    $writer->print("$input \n");
    $got = $reader->getline;
    print $got;
}

There are three kinds of buffering:
Block buffering: Output is placed into a fixed-sized buffer. The buffer is flushed when it becomes full. You'll see the output come out in chunks.
Line buffering: Output is placed into a fixed-sized buffer. The buffer is flushed when a newline is added to the buffer and when it becomes full.
No buffering: Output is passed directly to the OS.
In Perl, buffering works as follows:
File handles are buffered by default. One exception: STDERR is not buffered by default.
Block buffering is used. One exception: STDOUT is line buffered if and only if it's connected to a terminal.
Reading from STDIN flushes the buffer for STDOUT.
Until recently, Perl used 4KB buffers. Now, the default is 8KB, but that can be changed when Perl is built.
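A small demonstration of those defaults (a sketch; the exact buffer size depends on how Perl was built):
use strict;
use warnings;
use IO::Handle;    # supplies autoflush() on older Perls

# STDERR is unbuffered by default, so this always appears immediately:
print STDERR "progress message\n";

# STDOUT is line buffered on a terminal but block buffered in a pipe.
# autoflush(1) turns buffering off for the handle entirely:
STDOUT->autoflush(1);
print "flushed right away, even when piped\n";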
The first two kinds of buffering are surprisingly standard across applications. That means:
User -------> interface.pl
The user is a person. People don't buffer per se, though they are a very slow source of data. OK
interface.pl ----> Process
interface.pl's output is block buffered. BAD
Fixed by adding the following to interface.pl:
use IO::Handle qw( );
WRITER->autoflush(1);
Process ----> interface.pl
Process's output is block buffered. BAD
Fixed by adding the following to Process:
use IO::Handle qw( );
STDOUT->autoflush(1);
Now, you're probably going to tell me you can't change Process. If so, that leaves you three options:
Use a command line or configuration option provided by the tool to change its buffering behaviour, like cat's -u above. Few tools provide such an option.
Fool the child into using line buffering instead of block buffering by giving it a pseudo tty instead of a pipe, as sketched after this list.
Quitting.
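A minimal sketch of option 2, based on IO::Pty's documented usage, with cat -n standing in for Process and error handling kept to the essentials:
use strict;
use warnings;
use IO::Pty;

my $pty = IO::Pty->new;
my $pid = fork() // die "fork: $!";
if ( $pid == 0 ) {                      # child: becomes Process
    $pty->make_slave_controlling_terminal;
    my $slave = $pty->slave;
    close $pty;
    $slave->set_raw;                    # disable echo so we read only output
    open STDIN,  '<&', $slave or die "dup STDIN: $!";
    open STDOUT, '>&', $slave or die "dup STDOUT: $!";
    close $slave;
    exec 'cat', '-n' or die "exec: $!";
}

# parent: talk to the child through the pty master
$pty->autoflush(1);
print $pty "hello\n";
print scalar <$pty>;                    # "     1\thello"
Because cat now sees a terminal on its STDOUT, it line buffers and replies line by line instead of per 8KB block.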
interface.pl -------> User
interface.pl's output is line buffered. OK (right?)

Related

Flush output of child process

I created a child process via IPC::Open2.
I need to read from the stdout of this child process line by line.
The problem is that, as the stdout of the child process is not connected to a terminal, it is fully buffered and I can't read from it until the process terminates.
How can I flush the output of the child process without modifying its code?
Child process code:
while (<STDIN>) {
    print "Received : $_";
}
Parent process code:
use IPC::Open2;
use Symbol;

my $in  = gensym();
my $out = gensym();
my $pid = open2( $out, $in, './child_process' );

while (<STDIN>) {
    print $in $_;
    my $line = <$out>;
    print "child said : $line";
}
When I run the code, it gets stuck waiting for the output of the child process.
However, if I run it with bc the result is what I expect; I believe bc must flush its output manually.
Note: if I add $| = 1 at the beginning of the child process, or STDOUT->flush() after printing, the parent process can read from it properly.
However, this is just an example; I must handle programs that don't flush their output manually.
Unfortunately Perl has no control over the buffering behavior of the programs it executes. Some systems have an unbuffer utility that can do this. If you have access to this tool, you could say
my $pid = open2($out, $in, 'unbuffer ./child_process');
There's a discussion here about the equivalent tools for Windows, but I couldn't say whether any of them are effective.
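If unbuffer is not available, GNU coreutils' stdbuf can sometimes serve the same purpose; note this only affects dynamically linked programs that use C stdio's default buffering:
my $pid = open2($out, $in, 'stdbuf -oL ./child_process');   # -oL: line buffered stdout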
One way to (try to) deal with buffering is to set up a terminal-like environment for the process, a pseudo-terminal (pty). That is not easy to do in general but IPC::Run has that capability ready for easy use.
Here is the driver. For testing, run it using the at facility so that it has no controlling terminal (or run it via cron):
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(run);

my @cmd = qw(./t_term.pl input arguments);

run \@cmd, '>pty>', sub { say "out: @_" };
#run \@cmd, '>', sub { say "out: @_" };    # no pty
With >pty> it sets up a pseudo-terminal for the STDOUT of the program in @cmd (with > it's a pipe); also see <pty< and the documentation on redirection.
The anonymous sub {} gets called every time there is output from the child, so one can process it as it goes. There are other related options.
The program that is called (t_term.pl) only tests for a terminal:
use warnings;
use strict;
use feature 'say';

say "Is STDOUT filehandle attached to a terminal: ",
    ( (-t STDOUT) ? "yes" : "no" );
sleep 2;
say "bye from $$";
The -t STDOUT (see filetest operators) is a suitable way to check for a terminal in this example. For more/other ways see this post.
The output shows that the called program (t_term.pl) does see a terminal on its STDOUT, even when a driver runs without one (using at, or out of a crontab). If the >pty> is changed to the usual redirection > (a pipe) then there is no terminal.
Whether this solves the buffering problem is clearly up to that program, and to whether it is enough to fool it with a terminal.
Another way around the problem is using unbuffer when possible, as in mob's answer.

Can I capture STDOUT write events from a process in perl?

I need (would like?) to spawn a slow process from a web app using a Minion queue.
The process - a GLPK solver - can run for a long time but generates progress output.
I'd like to capture that output as it happens and write it to somewhere (database? log file?) so that it can be played back to the user as a status update inside the web app.
Is that possible? I have no idea (hence no code).
I was exploring Capture::Tiny; its simplicity is nice, but I can't tell whether it can report write events as they happen.
A basic way is to use pipe open, where you open a pipe to a process that gets forked. Then the STDOUT from the child is piped to the filehandle in the parent, or the parent pipes to its STDIN.
use warnings;
use strict;

my @cmd = qw(ls -l .);    # your command

my $pid = open(my $fh, '-|', @cmd) // die "Can't open pipe from @cmd: $!";
while (<$fh>) {
    print;
}
close $fh or die "Error closing pipe from @cmd: $!";
This way the parent receives the child's STDOUT right as it is emitted.†
There is a bit more that you can do with error checking, see the man page, close, and $? in perlvar. Also, install a handler for SIGPIPE, see perlipc and %SIG in perlvar.
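A sketch of those checks, reusing the example above:
use warnings;
use strict;

# SIGPIPE arrives if we write to a child that has exited; see perlipc
local $SIG{PIPE} = sub { die "child closed the pipe\n" };

my @cmd = qw(ls -l .);
my $pid = open(my $fh, '-|', @cmd) // die "Can't open pipe from @cmd: $!";
print while <$fh>;

# On a failed close, $! is set for an I/O error; otherwise $? holds the
# child's wait status (see perlvar), with the exit code in the high byte
close $fh or die $! ? "Error closing pipe from @cmd: $!"
                    : "Child exited with status " . ($? >> 8) . "\n";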
There are modules that make it far easier to run and manage external commands and, in particular, check errors. However, Capture::Tiny and IPC::Run3 use files to transfer the external program's streams.
On the other hand, IPC::Run gives you far more control and power.
To have code executed "... each time some data is read from the child", use a callback:
use warnings;
use strict;
use IPC::Run qw(run);

my @cmd = (
    'perl',
    '-le',
    'STDOUT->autoflush(1); for (qw( abc def ghi )) { print; sleep 1; }'
);

run \@cmd, '>', sub { print $_[0] };
Once you use IPC::Run a lot more is possible, including better error interrogation, setting up a pseudo tty for the process, etc. For example, using >pty> instead of > sets up a terminal-like environment, so the external program that is run may turn back to line buffering and provide more timely output. If the demands of managing the process grow more complex, work will be easier with this module.
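For example, switching the pipe above to a pseudo-terminal is only a change of redirection operator (a sketch; whether the child then line buffers is up to that program):
use warnings;
use strict;
use IPC::Run qw(run);

my @cmd = ('./some_program');   # hypothetical external program
# '>pty>' attaches a pseudo-terminal to the child's STDOUT instead of a pipe
run \@cmd, '>pty>', sub { print $_[0] };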
Thanks to ikegami for comments, including the demo @cmd.
† To demonstrate that the parent receives the child's STDOUT as it is emitted, use a command that emits output with delays. For example, instead of ls -l above, use
my @cmd = (
    'perl',
    '-le',
    'STDOUT->autoflush(1); for (qw( abc def ghi )) { print; sleep 1; }'
);
This Perl one-liner prints words one second apart, and that is how they wind up on screen.

How to print before a while loop is started in Perl?

I have this code in Perl:
print "Processing ... ";
while ( some condition ) {
# do something over than 10 minutes
}
print "OK\n";
Right now the first print only shows up after the while loop has finished.
How can I print the message before the while loop starts?
Output is buffered, meaning the program decides when it actually renders what you printed. You can put
$| = 1;
to flush stdout in this single instance. For more methods (auto-flushing, file flushing, etc.) you can search around SO for questions about this.
Ordinarily, perl will buffer up to 8KB of output text before flushing it to the device, or up to the next newline if the device is a terminal. You can avoid this by adding
STDOUT->autoflush
to the top of your code, assuming that you are printing to STDOUT. This will force the data to be flushed after every print, say, or write operation.
Note that this is the same as using $| = 1, but it is significantly less cryptic and allows you to change the properties of any given file handle.
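Applied to the snippet from the question, that looks like this (sleep stands in for the long-running loop):
use strict;
use warnings;
use IO::Handle;    # provides autoflush; loaded implicitly on recent Perls

STDOUT->autoflush(1);

print "Processing ... ";    # now appears immediately
sleep 3;                    # stands in for the ten-minute while loop
print "OK\n";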
You can see the prints by flushing the buffer immediately after each one:
print "Processing ... ";
STDOUT->flush;
If you are using autoflush, you should save the current configuration by duplicating the file handle.
use autodie;    # dies if error on open and close

{
    STDOUT->flush;    # empty its buffer
    open my $saved_stdout, '>&', \*STDOUT;
    STDOUT->autoflush;

    # ... output with autoflush active

    open STDOUT, '>&', $saved_stdout;    # restore old STDOUT
}
See perldoc -f open and search for /\>\&/

Perl: retrieve output from process in IPC::Run if it dies

I have been running some commands with the IPC::Run module and everything is fine, except that I can't access the output (STDOUT, STDERR) the processes produced, which was redirected into variables. Is there a way to retrieve it in the error handling?
use IPC::Run qw(harness run timeout);

my @commands;
foreach my $id (1..3) {
    push @commands, [ "perl", "script" . $id . ".pl" ];
}

foreach my $cmd (@commands) {
    my $out = "";
    my $err = "";
    my $h = harness $cmd, \undef, \$out, \$err,
        timeout( 12, exception => { name => 'timeout' } );
    eval {
        run $h;
    };
    if ($@) {
        my $err_msg = $@;    # save in case another error happens
        print "$out\n";
        print "$err\n";
        $h->kill_kill;
    }
}
I don't need any input for now; I just need to execute the commands and get their output.
EDIT
I have been testing it by running perl scripts which look like this:
for (my $i = 0; $i < 10; $i++) {
    sleep 1;
    print "Hello from script 1 " . localtime() . "\n";
}
I have 3 such scripts with different times and the 3rd takes 20 seconds to complete, which is more than the 12 I have in the timer.
As noted by @ysth, the reason you do not get any output is that the STDOUT and STDERR of the process running the command $cmd are not line buffered but block buffered: all output is collected in a buffer that is not shown (printed) until the buffer is full or an explicit flush is requested. When your command times out, the output is still sitting in that buffer; it has never been flushed and hence never collected into the variable $out in the parent process (script).
Also note that since your $cmd script is a Perl script, this behavior is documented in perlvar:
$|
If set to nonzero, forces a flush right away and after every write
or print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; $| tells you only whether you've asked Perl explicitly to flush
after each write). STDOUT will typically be line buffered if output is
to the terminal and block buffered otherwise.
The problem (that the program is not connected to a terminal or a tty) is also noted in the documentation page for IPC::Run:
Interactive applications are usually optimized for human use. This can
help or hinder trying to interact with them through modules like
IPC::Run. Frequently, programs alter their behavior when they detect
that stdin, stdout, or stderr are not connected to a tty, assuming
that they are being run in batch mode. Whether this helps or hurts
depends on which optimizations change. And there's often no way of
telling what a program does in these areas other than trial and error
and occasionally, reading the source. This includes different versions
and implementations of the same program.
The documentation also lists a set of possible workarounds, including using pseudo terminals.
One solution for your specific case is then to enable autoflush on STDOUT at the beginning of your script, so every print is flushed immediately:
STDOUT->autoflush(1);    # or: $| = 1;

for (my $i = 0; $i < 10; $i++) {
    sleep 1;
    print "Hello from script 1 " . localtime() . "\n";
}
Edit:
If you cannot modify the scripts you are running for some reason, you could try connecting the script to a pseudo terminal. Instead of inserting statements like STDOUT->autoflush(1) into the source code of the script, you fool the script into believing it is connected to a terminal, and hence that it should use line buffering. For your case, we just add a >pty> argument before the \$out argument in the call to harness:
my $h = harness $cmd, \undef, '>pty>', \$out,
    timeout( 12, exception => { name => 'timeout' } );
eval {
    run $h;
};

How to capture all output of a command which also requires user input from terminal?

I would like to capture all output (both STDOUT and STDERR) of a command that also requires user interaction from the terminal window, i.e. it reads STDIN and then prints something to STDOUT.
Here is minimal version of the script I want to capture the output from:
user.pl:
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
print "Enter URL: ";
my $ans = <STDIN>;
# do something based on $ans
say "Verification code: AIwquj2VVkwlWEBwway";
say "Access Token: bskjZO8iZotv!";
I tried using Capture::Tiny:
p.pl:
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
use Capture::Tiny qw(tee_merged);
my $output = tee_merged {
    #STDOUT->autoflush(1);    # This does not work
    system "user.pl";
};

if ( $output =~ /Access Token: (.*)$/ ) {
    say $1;
}
but it does not work, since the prompt is not displayed until after the user has entered the input in the terminal.
Edit:
It seems it works fine if I replace user.pl with a python script. For example:
user.py:
#! /usr/bin/env python3
ans = input( 'Enter URL: ' )
# do something based on $ans
print( 'Verification code: AIwquj2VVkwlWEBwway' )
print( 'Access Token: bskjZO8iZotv!' )
TL/DR: There is a solution; it's somewhat ugly, but it works. There are some minor caveats.
What's going on? The problem is actually in user.pl. The sample user.pl that you provided works like this: it starts by printing the string Enter URL: to its stdout, it then flushes its stdout, and it then reads a line from its stdin. The flushing of stdout occurs automatically in perl: when you try to read from stdin with <..> (aka readline), perl flushes stdout. It does that precisely to make programs like this behave correctly. Unfortunately, it appears that perl only implements this behavior when stdout is a tty (pseudo-terminal). If not, it does not flush stdout before reading from stdin. This is why the script works when you execute it in an interactive terminal session and doesn't work correctly when you try to capture its output (because in that case its stdout is connected to a pipe).
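(As an aside, if user.pl itself could be changed, an explicit flush before the read would sidestep the whole problem; the rest of this answer assumes it must stay as it is:
print "Enter URL: ";
STDOUT->flush;    # push the prompt out even when stdout is a pipe
my $ans = <STDIN>;
)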
How to fix this? Since user.pl misbehaves if its stdout is not a tty, we must use a tty. AFAIK, IPC::Run is the only perl module that can capture the output of a subprocess using a tty instead of a plain pipe. Unfortunately, when using a tty, IPC::Run does not allow us to redirect stdout only, it forces us to redirect stdin too. Because of that, we have to handle reading from stdin in the parent process on behalf of the child process (yikes!). Here's an example implementation of p.pl using IPC::Run:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use IPC::Run;

my $complete_output = '';
my $in  = '';
my $out = '';
my $h = IPC::Run::start ['./user.pl'], '<pty<', \$in, '>pty>', \$out;

while ( $h->pumpable ) {
    $h->pump;
    print $out;
    STDOUT->flush;
    if ( $out eq 'Enter URL: ' ) {
        $in .= <STDIN>;
    }
    $complete_output .= $out;
    $out = '';
}
$h->finish;

# do something with $complete_output here
So this is somewhat ugly. For example, we try to detect when the subprocess is waiting for user input (by looking for the string Enter URL:) and, when it is, we read the user input in the parent process and pass it to the child. Also notice that we have to implement the tee functionality ourselves, since IPC::Run doesn't offer it.
There are some caveats. Because we do all the reading in the parent process with a simple <STDIN>, this will not work if the subprocess uses something like the readline library to support line editing. Also, because a tty is used behind the scenes instead of a pipe, all user input will be echoed: whatever the user types at the prompt, we put in $in to send it to the process, and we get it back from the process (via the $out variable). Since our own terminal also echoes, the text will appear twice. One solution is to filter $out to remove the user input so we don't print it a second time.
Finally, this will not work on Windows.
Write your input prompt directly to the tty.
open my $tty, '>', '/dev/tty' or die $!;    # or 'con' on Windows
print $tty "Enter URL:";
my $ans = <STDIN>;
...