How do I influence the width of Perl IPC::Open3 output?

I have the following Perl code and would like it to display exactly as invoking /bin/ls in the terminal would display. For example on a terminal sized to 100 columns, it would print up to 100 characters worth of output before inserting a newline. Instead this code prints 1 file per line of output. I feel like it involves assigning some terminal settings to the IO::Pty instance, but I've tried variations of that without luck.
UPDATE: I replaced the <$READER> with a call to sysread hoping the original code might just have a buffering issue, but the output received from sysread is still one file per line.
UPDATE: I added code showing my attempt at changing the IO::Pty's size via the clone_winsize_from method. This didn't result in the output being any different.
UPDATE: As best I can tell (from reading IPC::open3 code for version 1.12) it seems you cannot pass a variable of type IO::Handle without open3 creating a pipe rather than dup'ing the filehandle. This means isatty doesn't return a true value when ls invokes it and ls then forces itself into "one file per line" mode.
I think I just need to do a fork/exec and handle the I/O redirection myself.
#!/usr/bin/env perl
use IPC::Open3;
use IO::Pty;
use strict;
my $READER = IO::Pty->new();
$READER->slave->clone_winsize_from(\*STDIN);
my $pid = open3(undef, $READER, undef, "/bin/ls");
while (my $line = <$READER>)
{
    print $line;
}
waitpid($pid, 0) or die "Error waiting for pid: $!\n";
$READER->close();

I think $READER is getting overwritten with a pipe created by open3, which can be avoided by changing
my $READER = ...;
my $pid = open3(undef, $READER, undef, "/bin/ls");
to
local *READER = ...;
my $pid = open3(undef, '>&READER', undef, "/bin/ls");
See the docs.
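One concrete reading of that advice, as an untested sketch: give the child the pty slave as its stdout via the '>&' dup syntax, and keep the master in the parent for reading. The handle name SLAVE is mine, not from the docs.
#!/usr/bin/env perl
use strict;
use warnings;
use IPC::Open3;
use IO::Pty;

my $pty = IO::Pty->new();
$pty->slave->clone_winsize_from(\*STDIN);   # assumes our STDIN is a terminal

local *SLAVE = $pty->slave;                 # a named handle that open3 can dup
my $pid = open3('<&STDIN', '>&SLAVE', undef, '/bin/ls');
$pty->close_slave();                        # the parent keeps only the master

while (my $line = <$pty>) {                 # read the child's output from the master
    print $line;
}
waitpid($pid, 0);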

You can pass the -C option to ls to force it to use columnar output (without getting IO::Pty involved).

The IO::Pty docs describe a clone_winsize_from(\*FH) method. You might try cloning your actual pty's dimensions.
I see that you're setting up the pty only as stdout of the child process. You might need to set it up also as its stdin — when the child process sends the "query terminal size" escape sequence to its stdout, it would need to receive the response on its stdin.
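If you do end up doing the fork/exec yourself, as the last update suggests, the pattern from the IO::Pty documentation wires the slave to stdin, stdout, and stderr of the child. Here is an untested sketch along those lines, with /bin/ls as the example command:
use strict;
use warnings;
use IO::Pty;

my $pty = IO::Pty->new();

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {                                      # child
    $pty->make_slave_controlling_terminal();          # setsid() + controlling tty
    my $slave = $pty->slave;
    close $pty;                                       # the child keeps only the slave
    $slave->clone_winsize_from(\*STDIN) if -t STDIN;  # copy the real terminal's size
    open STDIN,  '<&', $slave or die "dup stdin: $!";
    open STDOUT, '>&', $slave or die "dup stdout: $!";
    open STDERR, '>&', $slave or die "dup stderr: $!";
    close $slave;
    exec '/bin/ls' or die "exec failed: $!";
}
$pty->close_slave();                                  # the parent keeps only the master
while (sysread($pty, my $buf, 4096)) {
    print $buf;
}
waitpid($pid, 0);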

Related

Flush output of child process

I created a child process via IPC::Open2.
I need to read from the stdout of this child process line by line.
Problem is, as the stdout of the child process is not connected to a terminal, it's fully buffered and I can't read from it until the process terminates.
How can I flush the output of the child process without modifying its code?
child process code:
while (<STDIN>) {
    print "Received : $_";
}
parent process code:
use IPC::Open2;
use Symbol;
my $in = gensym();
my $out = gensym();
my $pid = open2($out, $in, './child_process');
while (<STDIN>) {
    print $in $_;
    my $line = <$out>;
    print "child said : $line";
}
When I run the code, it gets stuck waiting for the output of the child process.
However, if I run it with bc the result is what I expect; I believe bc must manually flush its output.
note:
In the child process if I add $| = 1 at the beginning or STDOUT->flush() after printing, the parent process can properly read from it.
However this is an example and I must handle programs that don't manually flush their output.
Unfortunately Perl has no control over the buffering behavior of the programs it executes. Some systems have an unbuffer utility that can do this. If you have access to this tool, you could say
my $pid = open2($out, $in, 'unbuffer ./child_process');
There's a discussion here about the equivalent tools for Windows, but I couldn't say whether any of them are effective.
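On Linux systems with GNU coreutils, stdbuf is a commonly available alternative. It only helps for programs that use C stdio and don't adjust their own buffering, so treat this as a hedged suggestion rather than a guaranteed fix:
my $pid = open2($out, $in, 'stdbuf', '-oL', './child_process');  # line-buffer the child's stdout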
One way to (try to) deal with buffering is to set up a terminal-like environment for the process, a pseudo-terminal (pty). That is not easy to do in general but IPC::Run has that capability ready for easy use.
Here is the driver; for testing, run it via the at facility so that it has no controlling terminal (or run it via cron):
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(run);
my @cmd = qw(./t_term.pl input arguments);
run \@cmd, '>pty>', sub { say "out: @_" };
# run \@cmd, '>', sub { say "out: @_" };  # no pty
With >pty> it sets up a pseudo-terminal for STDOUT of the program in @cmd (with > it's a pipe); also see <pty< and the module's documentation for more about redirection.
The anonymous sub {} gets called every time there is output from the child, so one can process it as it goes. There are other related options.
The program that is called (t_term.pl) only tests for a terminal
use warnings;
use strict;
use feature 'say';
say "Is STDOUT filehandle attached to a terminal: ",
( (-t STDOUT) ? "yes" : "no" );
sleep 2;
say "bye from $$";
The -t STDOUT (see filetest operators) is a suitable way to check for a terminal in this example. For more/other ways see this post.
The output shows that the called program (t_term.pl) does see a terminal on its STDOUT, even when a driver runs without one (using at, or out of a crontab). If the >pty> is changed to the usual redirection > (a pipe) then there is no terminal.
Whether this solves the buffering problem is clearly up to that program, and to whether it is enough to fool it with a terminal.
Another way around the problem is using unbuffer when possible, as in mob's answer.
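If you'd rather collect all of the output than process it as it arrives, IPC::Run also accepts a scalar reference in place of the callback in the driver above (the same caveats about the pty apply; this is a sketch):
my $out;
run \@cmd, '>pty>', \$out;
say "got: $out";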

Perl: Pass one byte plus STDIN to another command

I would like to do this efficiently:
my $buf;
my $len = read(STDIN, $buf, 1);
if ($len) {
    # Not empty
    open(OUT, "|-", "wc") || die;
    print OUT $buf;
    # This is the line I want to do faster
    print OUT <STDIN>;
    exit;
}
The task is to start wc only if there is any input. If there is no input, the program should just exit.
wc is just an example here. It will be substituted with a much more complex command.
The input can be of several TB of data, so I would really like to not touch that data at all (not even with a sysread). I tried doing:
pipe(STDIN,OUT);
But that doesn't work. Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN? Maybe some open(">&=2") gymnastics combined with exec?
The FIONREAD ioctl, mentioned in the Perl Cookbook, can tell you how many bytes are pending on a file descriptor without consuming them. In perlish terms:
use strict;
use warnings;
use IO::Select qw( );
BEGIN { require 'sys/ioctl.ph'; }
sub fionread {
    my $sz = pack('L', 0);
    return unless ioctl($_[0], FIONREAD, $sz);
    return unpack('L', $sz);
}

# Wait until it's known whether the handle has data to read or has reached EOF.
IO::Select->new(\*STDIN)->can_read();

if (fionread(\*STDIN)) {
    system('wc');
    # Check for errors
}
This should be very widely portable to UNIX and UNIX-like platforms.
A child process is always given duplicates of its parent's file handles, so simply starting wc - either with backticks or with a call to system or exec - will cause it to read from the same place as the Perl process's STDIN.
As for starting wc only when there is something to read, it looks like you need IO::Select, which will allow you either to check whether a file handle has something to read, or to block until it does have something.
This program will check whether STDIN has any data waiting, and run wc and print its output if so.
use strict;
use warnings;
use IO::Select;
my $select = IO::Select->new(\*STDIN);
if ( $select->can_read(0) ) {
    print `wc`;
}
The parameter to can_read is a timeout in seconds. Passing a value of zero makes it return immediately, reporting true (actually it returns the file handle itself) if there is data waiting, or false (undef) if not.
If you don't pass a parameter then can_read will wait forever until there is something to read, so you can suspend your program and wait for data for wc by writing just
$select->can_read;
print `wc`;
or you could combine the construction of the object to make it even more concise
IO::Select->new(\*STDIN)->can_read;
print `wc`;
Note also that IO::Select works fine with file descriptors too, and as the fileno for STDIN is zero, you could write
my $select = IO::Select->new(0);
but that isn't very descriptive and would need a comment to make sense
The specific solution in which you're interested is impossible.
As you surely discovered already, you can't determine whether a file handle has reached EOF without reading from it (though apparently you can get at it another way: see the FIONREAD approach above). select(2) will get you close: it will tell you that a handle has reached EOF or has data waiting, but it won't tell you which. This is why you're looking into alternate solutions. Unfortunately, the one you're looking into is just as impossible.
Is there some other way that I can tell OUT that after it has gotten the first byte, it should just read from STDIN?
No. OUT isn't code; it doesn't read anything. It's a variable. Furthermore, it's a variable in the parent. Changing a variable in the parent isn't going to affect the child.
Maybe you meant to ask: Can one tell the child program to start reading from a second handle?
No, generally speaking. You can't go and edit another program's variables. The program would have to be specifically written to accept two file handles and read from one after the other.
Then again, it's possible to obtain a file name for an arbitrary file handle, so all we need is a program that is specifically written to accept two file names and read from one after the other, and that's quite common.
$ echo abcdef | perl -MFcntl -e'
    if (sysread(STDIN, $buf, 1)) {
        pipe(my $r, my $w);
        my $pid = fork();
        if (!$pid) {
            close($w);
            # Clear the close-on-exec flag.
            my $flags = fcntl($r, Fcntl::F_GETFD, 0);
            fcntl($r, Fcntl::F_SETFD, $flags & ~Fcntl::FD_CLOEXEC);
            exec("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
            die $!;
        }
        close($r);
        print($w $buf);
        close($w);
        waitpid($pid, 0);
    }
'
abcdef
(Lots of error checking needed.)
Above, cat was used as an example of where your program would be used, but that presents another solution: why not just use cat? The overhead of cat should be quite minor for an IO-bound program.
use String::ShellQuote qw( shell_quote );
my $cmd1 = shell_quote("cat", "/proc/$$/fd/".fileno($r), "/proc/$$/fd/".fileno(STDIN));
my $cmd2 = ...
exec("$cmd1 | $cmd2");

Perl - output from external process directly to stdout (avoid buffering)

I have a Perl script that has to wrap a PHP script that produces a lot of output, and takes about half an hour to run.
At moment I'm shelling out with:
print `$command`;
This works in the sense that the PHP script is called and does its job, but no output is rendered by Perl until the PHP script finishes half an hour later.
Is there a way I could shell out so that the output from PHP is printed by Perl as soon as Perl receives it?
The problem is that Perl won't finish reading until the PHP script terminates, and only when it finishes reading will it write. The backticks operator blocks until the child process exits, and there's no magic to make a read/write loop happen implicitly.
So you need to write one. Try a piped open:
open my $fh, '-|', $command or die 'Unable to open';
while (<$fh>) {
print;
}
close $fh;
This should then read each line as the PHP script writes it, and immediately output it. If the PHP script doesn't output in convenient lines and you want to do it with individual characters, you'll need to look into using read to get data from the file handle, and disable output buffering ($| = 1) on stdout for writing it.
See also http://perldoc.perl.org/perlipc.html#Using-open()-for-IPC
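If the PHP script's output doesn't arrive in convenient lines, a sysread-based sketch might look like this (the command string is a hypothetical stand-in):
use strict;
use warnings;

my $command = 'php script.php';    # hypothetical command
$| = 1;                            # unbuffer our own STDOUT
open my $fh, '-|', $command or die "Unable to open: $!";
while (1) {
    my $n = sysread($fh, my $buf, 4096);
    die "sysread failed: $!" unless defined $n;
    last if $n == 0;               # EOF: the child has closed its stdout
    print $buf;                    # pass each chunk through immediately
}
close $fh;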
Are you really doing print `$command`?
If you are only running a command and not capturing any of its output, simply use system $command. It will write to stdout directly without passing through Perl.
You might want to investigate Capture::Tiny. IIRC something like this should work:
use strict;
use warnings;
use Capture::Tiny qw/tee/;
my ($stdout, $stderr, @result) = tee { system $command };
Actually, just using system might be good enough, YMMV.

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Cant't open file \n";
#lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for (@lines) {
    s/(.*?xsl.*?)xsl/$1xslt/;
    print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT, the program does not execute the final print, print "file changed". Why is this?
Edit: I want to print the message to the console, not to the file.
I suppose it is because print's default filehandle is STDOUT, which at that point is already closed. You could reopen it, or print to another filehandle, for example STDERR:
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement will output the string to STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is the same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and are then printing the message, which will not work. Reopen the STDOUT filehandle or do not close it; as the script ends, the file handles will be closed automatically.
open OLDOUT, ">&", STDOUT;
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&",OLDOUT);
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwrite (the original file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know how print works, i.e. that you can supply a file handle to print to: print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things and give you information on errors, saving you hours of debugging.
You should never "slurp" a file (read the entire file to a variable) unless you really need to, because this is ineffective and slow and for huge files will cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. Otherwise you risk meta-characters in your file names corrupting your open statement. (A short sketch of these last two points follows this list.)
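A minimal sketch of those last two recommendations together (the file name is hypothetical):
use strict;
use warnings;

open my $fh, '<', 'input.txt' or die "Can't open input.txt: $!";
while (my $line = <$fh>) {
    print $line;
}
close $fh;   # optional: $fh is closed automatically when it goes out of scope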
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
The -p switch automatically wraps the code you provide in a while (<>) { } loop and prints each line after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from #ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from @ARGV
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from the original file
    s/(.*?xsl.*?)xsl/$1xslt/;
    print $tmp $_; # print each line to the tmp file
}
close $tmp;
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite the original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
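A hedged illustration of both flags on a made-up string:
my $s = "XSL one xsl two xsl three xsl";
(my $t = $s) =~ s/xsl.*?xsl\K/t/gi;
print "$t\n";   # a 't' is appended after every non-overlapping pair of 'xsl's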
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

How can I capture the stdin and stdout of system command from a Perl script?

In the middle of a Perl script, there is a system command I want to execute. I have a string that contains the data that needs to be fed into stdin (the command only accepts input from stdin), and I need to capture the output written to stdout. I've looked at the various methods of executing system commands in Perl, and the open function seems to be what I need, except that it looks like open can redirect the command's stdin or capture its stdout, but not both.
At the moment, it seems like my best solution is to use open, redirect stdout into a temporary file, and read from the file after the command finishes. Is there a better solution?
IPC::Open2/3 are fine, but I've found that usually all I really need is IPC::Run3, which handles the simple cases really well with minimal complexity:
use IPC::Run3; # Exports run3() by default
run3( \@cmd, \$in, \$out, \$err );
The documentation compares IPC::Run3 to other alternatives. It's worth a read even if you don't decide to use it.
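For instance, a minimal sketch that feeds a string to a command and captures both output streams (wc -l stands in for your command):
use strict;
use warnings;
use IPC::Run3;                  # exports run3() by default

my @cmd = ('wc', '-l');
my $in  = "one\ntwo\nthree\n";
my ($out, $err);
run3(\@cmd, \$in, \$out, \$err);
print "exit status: ", $? >> 8, "\n";
print "stdout: $out";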
The perlipc documentation covers many ways that you can do this, including IPC::Open2 and IPC::Open3.
Somewhere at the top of your script, include the line
use IPC::Open2;
That will include the necessary module, usually installed with most Perl distributions by default. (If you don't have it, you could install it using CPAN.) Then, instead of open, call:
$pid = open2($cmd_out, $cmd_in, 'some cmd and args');
You can send data to your command by sending it to $cmd_in and then read your command's output by reading from $cmd_out.
If you also want to be able to read the command's stderr stream, you can use the IPC::Open3 module instead.
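A small round-trip sketch using open2 (bc stands in for your command; be aware that open2 can deadlock if you write more than a pipe buffer's worth before reading):
use strict;
use warnings;
use IPC::Open2;

my $pid = open2(my $cmd_out, my $cmd_in, 'bc');
print $cmd_in "2^10\n";
close $cmd_in;                  # send EOF so the child knows input is done
while (my $line = <$cmd_out>) {
    print "got: $line";
}
waitpid($pid, 0);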
IPC::Open3 would probably do what you want. It can capture STDERR and STDOUT.
http://metacpan.org/pod/IPC::Open3
A very easy way to do this that I recently found is the IPC::Filter module. It lets you do the job extremely intuitively:
$output = filter $input, 'somecmd', '--with', 'various=args', '--etc';
Note how it invokes your command without going through the shell if you pass it a list. It also does a reasonable job of handling errors for common utilities. (On failure, it dies, using the text from STDERR as its error message; on success, STDERR is just discarded.)
Of course, it’s not suitable for huge amounts of data since it provides no way of doing any streaming processing; also, the error handling might not be granular enough for your needs. But it makes the many simple cases really really simple.
I think you want to take a look at IPC::Open2.
There is a special Perl function for it, open2().
More info can be found at: http://sunsite.ualberta.ca/Documentation/Misc/perl-5.6.1/lib/IPC/Open2.html
If you do not want to include extra packages, you can just do
open(TMP,">tmpfile");
print TMP $tmpdata ;
open(RES,"$yourcommand|");
$res = "" ;
while(<RES>){
$res .= $_ ;
}
which is the contrary of what you suggested, but should work also.
I always do it this way if I'm only expecting a single line of output or want to split the result on something other than a newline:
my $result = qx( command args 2>&1 );
my $rc = $?;
# $rc >> 8 is the exit code of the called program.
if ($rc != 0) {
    error();
}
If you want to deal with a multi-line response, get the result as an array:
my @lines = qx( command args 2>&1 );
foreach my $line (@lines) {
    if ( $line =~ /some pattern/ ) {
        do_something();
    }
}