How to read to and write from a pipe in Perl? - perl

I need to modify an existing Perl program. I want to pipe a string (which can contain multiple lines) through an external program and read the output from this program. This external program is used to modify the string. Let's simply use cat as a filter program. I tried it like this, but it doesn't work. (Output of cat goes to STDOUT instead of being read by perl.)
#!/usr/bin/perl
open(MESSAGE, "| cat |") or die("cat failed\n");
print MESSAGE "Line 1\nLine 2\n";
my $message = "";
while (<MESSAGE>)
{
$message .= $_;
}
close(MESSAGE);
print "This is the message: $message\n";
I've read that this isn't supported by Perl because it may end up in a deadlock, and I can understand it. But how do I do it then?

You can use IPC::Open3 to achieve bi-directional communication with child.
use strict;
use IPC::Open3;
my $pid = open3(\*CHLD_IN, \*CHLD_OUT, \*CHLD_ERR, 'cat')
or die "open3() failed $!";
my $r;
for(my $i=1;$i<10;$i++) {
print CHLD_IN "$i\n";
$r = <CHLD_OUT>;
print "Got $r from child\n";
}

This involves system programming, so it’s more than a basic question. As written, your main program doesn’t require full-duplex interaction with the external program. Dataflow travels in one direction, namely
string → external program → main program
Creating this pipeline is straightforward. Perl’s open has a useful mode explained in the “Safe pipe opens” section of the perlipc documentation.
Another interesting approach to interprocess communication is making your single program go multiprocess and communicate between—or even amongst—yourselves. The open function will accept a file argument of either "-|" or "|-" to do a very interesting thing: it forks a child connected to the filehandle you’ve opened. The child is running the same program as the parent. This is useful for safely opening a file when running under an assumed UID or GID, for example. If you open a pipe to minus, you can write to the filehandle you opened and your kid will find it in his STDIN. If you open a pipe from minus, you can read from the filehandle you opened whatever your kid writes to his STDOUT.
This is an open that involves a pipe, which gives nuance to the return value. The perlfunc documentation on open explains.
If you open a pipe on the command - (that is, specify either |- or -| with the one- or two-argument forms of open), an implicit fork is done, so open returns twice: in the parent process it returns the pid of the child process, and in the child process it returns (a defined) 0. Use defined($pid) or // to determine whether the open was successful.
To create the scaffolding, we work in right-to-left order using open to fork a new process at each step.
Your main program is already running.
Next, fork a process that will eventually become the external program.
Inside the process from step 2
First fork the string-printing process so as to make its output arrive on our STDIN.
Then exec the external program to perform its transformation.
Have the string-printer do its work and then exit, which kicks up to the next level.
Back in the main program, read the transformed result.
With all of that set up, all you have to do is implant your suggestion at the bottom, Mr. Cobb.
#! /usr/bin/env perl
use 5.10.0; # for defined-or and given/when
use strict;
use warnings;
my #transform = qw( tr [A-Za-z] [N-ZA-Mn-za-m] ); # rot13
my #inception = (
"V xabj, Qnq. Lbh jrer qvfnccbvagrq gung V pbhyqa'g or lbh.",
"V jnf qvfnccbvagrq gung lbh gevrq.",
);
sub snow_fortress { print map "$_\n", #inception }
sub hotel {
given (open(STDIN, "-|") // die "$0: fork: $!") { # / StackOverflow hiliter
snow_fortress when 0;
exec #transform or die "$0: exec: $!";
}
}
given (open(my $fh, "-|") // die "$0: fork: $!") {
hotel when 0;
print while <$fh>;
close $fh or warn "$0: close: $!";
}
Thanks for the opportunity to write such a fun program!

You can use the -n commandline switch to effectively wrap your existing program code in a while-loop... look at the man page for -n:
LINE:
while (<>) {
... # your program goes here
}
Then you can use the operating system's pipe mechanism directly
cat file | your_perl_prog.pl
(Edit)
I'll try to explain this more carefully...
The question is not clear about what part the perl program plays: filter or final stage. This works in either case, so I will assume it is the latter.
'your_perl_prog.pl' is your existing code. I'll call your filter program 'filter'.
Modify your_perl_prog.pl so that the shebang line has an added '-n' switch: #!/usr/bin/perl -n or #!/bin/env "perl -n"
This effectively puts a while(<>){} loop around the code in your_perl_prog.pl
add a BEGIN block to print the header:
BEGIN {print "HEADER LINE\n");}
You can read each line with '$line = <>;' and process/print
Then invoke the lot with
cat sourcefile |filter|your_perl_prog.pl

I want to expand on #Greg Bacon's answer without changing it.
I had to execute something similar, but wanted to code without
the given/when commands, and also found there was explicit exit()
calls missing because in the sample code it fell through and exited.
I also had to make it also work on a version running ActiveState perl,
but that version of perl does not work.
See this question How to read to and write from a pipe in perl with ActiveState Perl?
#! /usr/bin/env perl
use strict;
use warnings;
my $isActiveStatePerl = defined(&Win32::BuildNumber);
sub pipeFromFork
{
return open($_[0], "-|") if (!$isActiveStatePerl);
die "active state perl cannot cope with dup file handles after fork";
pipe $_[0], my $child or die "cannot create pipe";
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid) { # parent
close $child;
} else { # child
open(STDOUT, ">&=", $child) or die "cannot clone child to STDOUT";
close $_[0];
}
return $pid;
}
my #transform = qw( tr [A-Za-z] [N-ZA-Mn-za-m] ); # rot13
my #inception = (
"V xabj, Qnq. Lbh jrer qvfnccbvagrq gung V pbhyqa'g or lbh.",
"V jnf qvfnccbvagrq gung lbh gevrq.",
);
sub snow_fortress { print map "$_\n", #inception }
sub hotel
{
my $fh;
my $pid = pipeFromFork($fh); # my $pid = open STDIN, "-|";
defined($pid) or die "$0: fork: $!";
if (0 == $pid) {
snow_fortress;
exit(0);
}
open(STDIN, "<&", $fh) or die "cannot clone to STDIN";
exec #transform or die "$0: exec: $!";
}
my $fh;
my $pid = pipeFromFork($fh); # my $pid = open my $fh, "-|";
defined($pid) or die "$0: fork: $!";
if (0 == $pid) {
hotel;
exit(0);
}
print while <$fh>;
close $fh or warn "$0: close: $!";

the simplest -- not involving all these cool internals -- way to do what the OP needs, is to use a temporary file to hold the output until the external processor is done, like so:
open ToTemp, "|/usr/bin/tac>/tmp/MyTmp$$.whee" or die "open the tool: $!";
print ToTemp $TheMessageWhateverItIs;
close ToTemp;
my $Result = `cat /tmp/MyTmp$$.whee`; # or open and read it, or use File::Slurp, etc
unlink "/tmp/MyTmp$$.whee";
Of course, this isn't going to work for something interactive, but co-routines appear to be out of the scope of the original question.

Related

How do I run perl scripts in parallel and capture outputs in files?

I need to run my perl tests in parallel and capture STDOUT and STDERR in a separate file for each test file. I'm having no success even in capturing in one file. I've been all over SO and have had no luck. Here is where I started (I'll spare you all the variations). Any help is greatly appreciated. Thanks!
foreach my $file ( #files) {
next unless $file =~ /\.t$/;
print "\$file = $file\n";
$file =~ /^(\w+)\.\w+/;
my $file_pfx = $1;
my $this_test_file_name = $file_pfx . '.txt';
system("perl $test_dir\\$file > results\\$test_file_name.txt &") && die "cmd failed: $!\n";
}
Here is a simple example using Parallel::ForkManager to spawn separate processes.
In each process the STDOUT and STDERR streams are redirected, in two ways for a demo: STDOUT to a variable, that can then be passed around as desired (here dumped into a file), and STDERR directly to a file. Or use a library, with an example in a separate code snippet.
The numbers 1..6 represent batches of data that each child will pick from to process. Only three processes are started right away and then as one finishes another one is started in its place.† (Here they exit nearly immediately, the "jobs" being trivial.)
use warnings;
use strict;
use feature 'say';
use Carp qw(carp)
use Path::Tiny qw(path);
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(3);
foreach my $data (1..6) {
$pm->start and next; # start a child process
proc_in_child($data); # code that runs in the child process
$pm->finish; # exit it
}
$pm->wait_all_children; # reap all child processes
say "\nParent $$ done\n";
sub proc_in_child {
my ($data) = #_;
say "Process $$ with data $data"; # still shows on terminal
# Will dump all that was printed to streams to these files
my (outfile, $errfile) =
map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);
# Redirect streams
# One way to do it, redirect to a variable (for STDOUT)...
open my $fh_stdout, ">", \my $so or carp "Can't open handle to variable: $!";
my $fh_STDOUT = select $fh_stdout;
# ...another way to do it, directly to a file (for any stream)
# (first 'dup' it so it can be restored if needed)
open my $SAVEERR, ">&STDERR" or carp "Can't dup STDERR: $!";
open *STDERR, ">", $errfile or carp "Can't redirect STDERR to $errfile: $!";
# Prints wind up in a variable (for STDOUT) and a file (for STDERR)
say "STDOUT: Child process with pid $$, processing data #$data";
warn "STDERR: Child process with pid $$, processing data #$data";
close $fh_stdout;
# If needed to restore (not in this example which exits right away)
select $fh_STDOUT;
open STDERR, '>&', $SAVEERR or carp "Can't reopen STDERR: $!";
# Dump all collected STDOUT to a file (or pass it around, it's a variable)
path( $outfile )->spew($so);
return 1
}
While STDOUT is redirected to a variable, STDERR cannot be redirected that way and here it goes directly to a file. See open. However there are ways to capture it in a variable as well.
Then you can use the module's ability to return from child processes to the parent, which can then handle those variables. See for example this post and this post and this post. (There's way more, these are merely the ones I know.) Or indeed just dump them to files, as done here.
Another way is to use modules that can run code and redirect output, like Capture::Tiny
use Capture::Tiny qw(capture);
sub proc_in_child {
my ($data) = #_;
say "Process $$ with data $data"; # on terminal
# Run code and capture all output
my ($stdout, $stderr, #results) = capture {
say "STDOUT: Child process $$, processing data #$data";
warn "STDERR: Child process $$, processing data #$data";
# return results perhaps...
1 .. 4;
}
# Do as needed with variables with collected STDOUT and STDERR
# Return to parent, or dump to file:
my ($outfile, $errfile) =
map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);
path($outfile) -> spew( $stdout );
path($errfile) -> spew( $stderr );
return 1
}
† This keeps the same number of processes running. Or, one can set it up to wait for the whole batch to finish and then start another batch. For some details of operation see this post
I think, the easiest way is to use shell redirects in your 'system' command. BTW, spawning uncontrolled subprocesses from it with '&' makes me frown.
Here is a simple example of with shell redirects and fork.
#!/usr/bin/perl
use strict;
for my $i (0..2) {
my $stdoutName = "stdout$i.txt";
my $stderrName = "stderr$i.txt";
my $pid = fork();
if($pid == 0) {
system("perl mytest.pl 1>$stdoutName 2>$stderrName"); #redirects are here 1> (stdout) and 2> (stderr)
exit $?;
}
}

Die if anything is written to STDERR?

How can I force Perl script to die if anything is written to STDERR ?
Such action should be done instantly, when such output happen, or even before, to prevent that output...
This doesn't seem like an especially smart idea, but a tied filehandle should work. According to the perltie manpage:
When STDERR is tied, its PRINT method will be called to issue warnings and error messages. This feature is temporarily disabled during the call, which means you can use warn() inside PRINT without starting a recursive loop.
So something like this (adapted from the manpage example) ought to work:
package FatalHandle;
use strict;
use warnings;
sub TIEHANDLE { my $i; bless \$i, shift }
sub PRINT {
my $r = shift;
die "message to STDERR: ", #_;
}
package main;
tie *STDERR, "FatalHandle";
warn "this should be fatal.";
print "Should never get here.";
And that outputs (with exit code 255):
message to STDERR: this should be fatal. at fh.pl line 17.
Here's a method that works no matter how STDERR (fd 2) is written to, even if it's a C extension that doesn't use Perl's STDERR variable to do so. It will even kill child processes that write to STDERR!
{
pipe(my $r, my $w)
or die("Can't create pipe: $!\n");
open(STDERR, '>&', $w)
or die("Can't dup pipe: $!\n");
close($r);
}
print "abc\n";
print "def\n";
print STDERR "xxx\n";
print "ghi\n";
print "jkl\n";
$ perl a.pl
abc
def
$ echo $?
141
Doesn't work on Windows. Doesn't work if you add a SIGPIPE handler.

How can I read the STDOUT of a external Program in realtime?

Let me elaborate.
Say I have perl program
(whch was shamelessly copied and edited from perl
http://perldoc.perl.org/perlfaq8.html#How-can-I-open-a-pipe-both-to-and-from-a-command%3f
)
use IPC::Open3;
use Symbol qw(gensym);
use IO::File;
local *CATCHOUT = IO::File->new_tmpfile;
local *CATCHERR = IO::File->new_tmpfile;
my $pid = open3(gensym, ">&CATCHOUT", ">&CATCHERR", "ping -t localhost");
#waitpid($pid, 0);
seek $_, 0, 0 for \*CATCHOUT, \*CATCHERR;
while( <CATCHOUT> ) {
print $_;
}
But the problem with the above program is it will to a sort of readtoEnd() of the STDOUT belonging to the program ping.exe in this case and allow it ti be read all at once.
But what I want to be able to do is to read the STDOUT as it is being written out to STDOUT.
if I remove waitforpid() then program exits immediately, so that doesn't help either.
Is that Possible ? If so, can you please point me in the right direction.
Update:
Drats!!!! I missed the | symbol... which is essential for piping the output out of ping and into the perl script!!!
use IPC::Open3 qw( open3 );
open(local *CHILD_STDIN, '<', '/dev/null') or die $!;
my $pid = open3(
'<&CHILD_STDIN',
my $child stdout,
'>&STDERR',
'ping', '-t', 'localhost',
);
while (<$child_stdout>) {
chomp;
print("Got: <<<$_>>>\n");
}
waitpid($pid, 0);
But that can be written as
open(my $ping_fh, '-|', 'ping', '-t', 'localhost') or die $!;
while (<$ping_fh>) {
chomp;
print("Got: <<<$_>>>\n");
}
close($ping_fh);
This just shows the proper usage. If these don't work, it's an unrelated problem: ping is buffering it's IO when not connected to a terminal. You can fool it using a pseudo-tty.
One of the strengths (or weaknesses) of perl is that there is more than one way to do things. This works:
perl -e 'open(F,"ping localhost|"); while(<F>) { s/ms/Milliseconds/; print $_; }'
Just put the s/ms/Milliseconds/ to show that the data is being read and changed
Not sure exactly what you have wrong with Open3

Perl IPC::Run appending output and parsing stderr while keeping it in a batch of files

I'm trying to wrap my head around IPC::Run to be able to do the following. For a list of files:
my #list = ('/my/file1.gz','/my/file2.gz','/my/file3.gz');
I want to execute a program that has built-in decompression, does some editing and filtering to them, and prints to stdout, giving some stats to stderr:
~/myprogram options $file
I want to append the stdout of the execution for all the files in the list to one single $out file, and be able to parse and store a couple of lines in each stderr as variables, while letting the rest be written out into separate fileN.log files for each input file.
I want stdout to all go into a ">>$all_into_one_single_out_file", it's the err that I want to keep in different logs.
After reading the manual, I've gone so far as to the code below, where the commented part I don't know how to do:
for $file in #list {
my #cmd;
push #cmd, "~/myprogram options $file";
IPC::Run::run \#cmd, \undef, ">>$out",
sub {
my $foo .= $_[0];
#check if I want to keep my line, save value to $mylog1 or $mylog2
#let $foo and all the other lines be written into $file.log
};
}
Any ideas?
First things first. my $foo .= $_[0] is not necessary. $foo is a new (empty) value, so appending to it via .= doesn't do anything. What you really want is a simple my ($foo) = #_;.
Next, you want to have output go to one specific file for each command while also (depending on some conditional) putting that same output to a common file.
Perl (among other languages) has a great facility to help in problems like this, and it is called closure. Whichever variables are in scope at the time of a subroutine definition, those variables are available for you to use.
use strict;
use warnings;
use IPC::Run qw(run new_chunker);
my #list = qw( /my/file1 /my/file2 /my/file3 );
open my $shared_fh, '>', '/my/all-stdout-goes-here' or die;
open my $log1_fh, '>', '/my/log1' or die "Cannot open /my/log1: $!\n";
open my $log2_fh, '>', '/my/log2' or die "Cannot open /my/log2: $!\n";
foreach my $file ( #list ) {
my #cmd = ( "~/myprogram", option1, option2, ..., $file );
open my $log_fh, '>', "$file.log"
or die "Cannot open $file.log: $!\n";
run \#cmd, '>', $shared_fh,
'2>', new_chunker, sub {
# $out contains each line of stderr from the command
my ($out) = #_;
if ( $out =~ /something interesting/ ) {
print $log1_fh $out;
}
if ( $out =~ /something else interesting/ ) {
print $log2_fh $out;
}
print $log_fh $out;
return 1;
};
}
Each of the output file handles will get closed when they're no longer referenced by anything -- in this case at the end of this snippet.
I fixed your #cmd, though I don't know what your option1, option2, ... will be.
I also changed the way you are calling run. You can call it with a simple > to tell it the next thing is for output, and the new_chunker (from IPC::Run) will break your output into one-line-at-a-time instead of getting all the output all-at-once.
I also skipped over the fact that you're outputting to .gz files. If you want to write to compressed files, instead of opening as:
open my $fh, '>', $file or die "Cannot open $file: $!\n";
Just open up:
open my $fh, '|-', "gzip -c > $file" or die "Cannot startup gzip: $!\n";
Be careful here as this is a good place for command injection (e.g. let $file be /dev/null; /sbin/reboot. How to handle this is given in many, many other places and is beyond the scope of what you're actually asking.
EDIT: re-read problem a bit more, and changed answer to more closely reflect the actual problem.
EDIT2:: Updated per your comment. All stdout goes to one file, and the stderr from command is fed to the inline subroutine. Also fixed a stupid typo (for syntax was pseudo code not Perl).

Showing progress whilst running a system() command in Perl

I have a Perl script which performs some tasks, one of which is to call a system command to "tar -cvf file.tar.....".
This can often take some time so I'd like the command line to echo back a progress indicator, something like a # echoing back to screen whilst the system call is in progress.
I've been doing some digging around and stumbled across fork. Is this the best way to go? Is it possible to fork off the system command, then create a while loop which checks on the staus of the $pid returned by the fork?
I've also seen references to waitpid.... I'm guessing I need to use this also.
fork system("tar ... ")
while ( forked process is still active) {
print #
sleep 1
}
Am I barking up the wrong tree?
Many thanks
John
Perl has a nice construction for this, called "pipe opens." You can read more about it by typing perldoc -f open at a shell prompt.
# Note the use of a list for passing the command. This avoids
# having to worry about shell quoting and related errors.
open(my $tar, '-|', 'tar', 'zxvf', 'test.tar.gz', '-C', 'wherever') or die ...;
Here's a snippet showing an example:
open(my $tar, '-|', 'tar', ...) or die "Could not run tar ... - $!";
while (<$tar>) {
print ".";
}
print "\n";
close($tar);
Replace the print "." with something that prints a hash mark every 10 to 100 lines or so to get a nice gaugebar.
An example that doesn't depend on the child process writing any kind of output, and just prints a dot about once a second as long as it's running:
use POSIX qw(:sys_wait_h);
$|++;
defined(my $pid = fork) or die "Couldn't fork: $!";
if (!$pid) { # Child
exec('long_running_command', #args)
or die "Couldn't exec: $!";
} else { # Parent
while (! waitpid($pid, WNOHANG)) {
print ".";
sleep 1;
}
print "\n";
}
Although it could probably stand to have more error-checking, and there might actually be something better already on CPAN. Proc::Background seems promising for abstracting this kind of job away but I'm not sure how reliable it is.
$|++;
open(my $tar, 'tar ... |') or die "Could not run tar ... - $!";
while ($file=<$tar>) {
print "$file";
}
print "\n";
close($tar);
This prints the filenames received from tar.
For showing progress during a long-running task, you will find Term::ProgressBar useful -- it does the "printing of # across the screen" functionality that you describe.
I would try something like this
open my $tar, "tar -cvf file.tar..... 2>&/dev/null |"
or die "can't fork: $!";
my $i = 0;
while (<$tar>) {
if( i++ % 1000 == 0 ) print;
}
close $tar or die "tar error: $! $?";
Expanding on what Hobbs provided if you would like to get the data from the child process back into the Parent process you need to have an external conduit. I ended up using the tempfs because it was simple like a file, but does not put IO hits on the disk.
** Important **
You need to exit the child process, because otherwise the "child" process will continue along the same script and you will get double print statements. So in the example below foreach (#stdoutput) would happen two times despite only being in the script once.
$shm_id = time; #get unique name for file - example "1452463743"
$shm_file = "/dev/shm/$shm_id.tmp"; #set filename in tempfs
$| = 1; #suffering from buffering
print ("Activity Indicator: "); #No new line here
defined(my $pid = fork) or die "Couldn't fork: $!";
if (!$pid) { # Child
#stdoutput=`/usr/home/script.pl -o $parameter`; #get output of external command
open (SHM, ">$shm_file");
foreach (#stdoutput) {
print SHM ("$_"); #populate file in tempfs
}
close (SHM);
exit; #quit the child process (will not kill parent script)
} else { # Parent
while (! waitpid($pid, WNOHANG)) {
print ("\#"); # prints a progress bar
sleep 5;
}
}
print ("\n"); #finish up bar and go to new line
open (SHM, "$shm_file");
#stdoutput = <SHM>; #Now open the file and read it. Now array is in parent
close (SHM);
unlink ($shm_file); #deletes the tempfs file
chomp(#stdoutput);
foreach (#stdoutput) {
print ("$_\n"); #print results of external script
}