Perl: avoid greedy reading from stdin? - perl

Consider the following perl script (read.pl):
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
If this script is executed from the command line, it will get the first line of input, while cat gets everything else until the end of input (^D is pressed).
However, things are different when the input is piped from another process or read from a file:
$ echo "foo\nbar" | ./read.pl
Perl read: foo
And here's what cat gets:
Perl seems to greadily buffer the entire input somewhere, and processes called using backticks or system do no see any of the input.
The problem is that I'd like to unit test a script that mixes <STDIN> and calls to other processes. What would be the best way to do this? Can I turn off input buffering in perl? Or can I spool the data in a way that will "mimic" a terminal?

This is not a Perl problem. It is a UNIX/shell problem. When you run a command without pipes you are in line buffering mode, but when you redirect with pipes, you are in block buffering mode. You can see this by saying:
cat /usr/share/dict/words | ./read.pl | head
This C program has the same problem:
#include <stdio.h>
int main(int argc, char** argv) {
char line[4096];
FILE* cat;
fgets(line, 4096, stdin);
printf("C got: %s\ncat got:\n", line);
cat = popen("cat", "r");
while (fgets(line, 4096, cat)) {
printf("%s", line);
}
pclose(cat);
return 0;
}

I have good news and bad news.
The good news is a simple modification of read.pl allows you to give it fake input:
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
Sample run:
$ printf "A\nB\nC\nD\n" | ./read.pl
Perl read: A
And here's what cat gets: B
C
D
The bad news is you get a single switchover: if you try to repeat the read-then-cat, the first cat will starve all subsequent reads. To see this, consider
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "1: Perl read: $line";
print "1: And here's what cat gets: ", `cat -`;
$line = <STDIN>;
$line = "<undefined>\n" unless defined $line;
print "2: Perl read: $line";
print "2: And here's what cat gets: ", `cat -`;
and then a sample run that produces
$ printf "A\nB\nC\nD\n" | ./read.pl
1: Perl read: A
1: And here's what cat gets: B
C
D
2: Perl read: <undefined>
2: And here's what cat gets:

Today I think I've found what I needed: Perl has a module called Expect which is perfect for such situations:
#!/usr/bin/perl
use strict;
use warnings;
use Expect;
my $exp = Expect->spawn('./read.pl');
$exp->send("First Line\n");
$exp->send("Second Line\n");
$exp->send("Third Line\n");
$exp->soft_close();
Works like a charm ;)

Here's a sub-optimal way that I've found:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, \$output);
$process->pump() until $output =~ /Perl read:/;
$input .= "Second Line\n";
$process->finish();
print $output;
It's sub-optimal in the sense that one needs to know the "prompt" that the program will emit before waiting for more input.
Another sub-optimal solution is the following:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, my $timer = IPC::Run::timer(1));
$process->pump() until $timer->is_expired();
$timer->start(1);
$input .= "Second Line\n";
$process->finish();
It does not require knowledge of any prompt, but is slow because it waits at least two seconds. Also, I don't understand why the second timer is needed (finish won't return otherwise).
Does anybody know better solutions?

Finally I ended up with the following solution. Still far from optimal, but it works. Even in situations like the one described by gbacon.
use Carp qw( confess );
use IPC::Run;
use Scalar::Util;
use Time::HiRes;
# Invokes the given program with the given input and argv, and returns stdout/stderr.
#
# The first argument provided is the input for the program. It is an arrayref
# containing one or more of the following:
#
# * A scalar is simply passed to the program as stdin
#
# * An arrayref in the form [ "prompt", "input" ] causes the function to wait
# until the program prints "prompt", then spools "input" to its stdin
#
# * An arrayref in the form [ 0.3, "input" ] waits 0.3 seconds, then spools
# "input" to the program's stdin
sub capture_with_input {
my ($program, $inputs, #argv) = #_;
my ($stdout, $stderr);
my $stdin = '';
my $process = IPC::Run::start( [$program, #argv], \$stdin, \$stdout, \$stderr );
foreach my $input (#$inputs) {
if (ref($input) eq '') {
$stdin .= $input;
}
elsif (ref($input) eq 'ARRAY') {
(scalar #$input == 2) or
confess "Input to capture_with_input must be of the form ['prompt', 'input'] or [timeout, 'input']!";
my ($prompt_or_timeout, $text) = #$input;
if (Scalar::Util::looks_like_number($prompt_or_timeout)) {
my $start_time = [ Time::HiRes::gettimeofday ];
$process->pump_nb() while (Time::HiRes::tv_interval($start_time) < $prompt_or_timeout);
}
else {
$prompt_or_timeout = quotemeta $prompt_or_timeout;
$process->pump until $stdout =~ m/$prompt_or_timeout/gc;
}
$stdin .= $text;
}
else {
confess "Unknown input type passed to capture_with_input!";
}
}
$process->finish();
return ($stdout, $stderr);
}
my $input = [
"First Line\n",
["Perl read:", "Second Line\n"],
[0.5, "Third Line\n"],
];
print "Executing process...\n";
my ($stdout, $stderr) = capture_with_input('./read.pl', $input);
print "done.\n";
print "STDOUT:\n", $stdout;
print "STDERR:\n", $stderr;
Usage example (with a slightly modified read.pl to test gbacon's case):
$ time ./spool_read4.pl
Executing process...
done.
STDOUT:
Perl read: First Line
And here's what head -n1 gets: Second Line
Perl read again: Third Line
STDERR:
./spool_read4.pl 0.54s user 0.02s system 102% cpu 0.547 total
Still, I'm open to better solutions...

Related

Prepend perl program with filter

I want to write:
... | my_filter | myperlprogram
But I do not know how to run my_filter until I have started myperlprogram.
Can I somehow in myperlprogram loop STDIN through my_filter before reading it?
I am thinking something like:
pipe($a,$b);
if(not fork()) {
close STDOUT;
open STDOUT, $b;
exec "my_filter --with the correct --options";
} else {
close STDIN
open STDIN, $a
}
# continue reading STDIN now looped through `my_filter`
It's not at all clear from the description why a simple
open STDIN, '-|', 'your_filter', '--option1', ...
will not do.
The way I see the problem is: To filter the STDIN for the script, by using an external program which is run from inside the script once the script is running (so, not with a pipeline). With IPC::Run
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(start pump finish);
my $filtered_in;
FILTER_IN: {
my #cmd = qw(a_filter.pl); # add filter's options/arguments
my $h = start \#cmd, \my $in, \$filtered_in;
while (<>) {
$in = $_;
pump $h while length $in;
# Wait for filter's output -- IF WANT to process lines as received
pump $h until $filtered_in =~ /\n\z/;
chomp $filtered_in; # process/use filter's output
$filtered_in .= '|'; # as it's coming (if needed)
}
finish $h or die "Cleanup returned: $?";
};
say $filtered_in // 'no input';
This allows one to process filter's lines of output as they are emitted. If that is not needed but we only want to accumulate filter's output for later then you don't need the code under # Wait for...
Simplest test with a_filter.pl such as
use warnings;
use strict;
STDOUT->autoflush(1);
my $cnt = 0;
while (<>) { print "line ", ++$cnt, ": ", $_ }
and then run
echo "a\nfew\nlines" | script.pl
with output
line 1: a|line 2: few|line 3: lines|
from our toy processing in script.pl above.
This will filter input via a file as well,
script.pl < input.txt

Perl script to log external executables output and error while that still runs

I have a Perl script, which runs an external executable. That executable runs for a while (sometimes seconds, sometimes an hour), can spit out text to both STDOUT and STDERR as well as an exit code, which all are needed. Following code demonstrates first successful external executable run (small bash script with one line - the comment), then with bad exit status (example with gs - ghostscript).
I want the external executable give its STDOUT to the Perl script for evaluation, filtering, formatting etc. before it gets logged to a logfile (used for other stuff as well) while the external is still executing. STDERR would also be great to be worked on same way.
This script is in stand to log everything from STDOUT, but only after the executable has finished. And the STDERR is logged only directly, without evaluations etc. I have no possibility to install any additional Perl parts, modules etc.
How do I get my Perl script to get each line (STDOUT + STDERR) from the executable while it is spitting it out (not just at the end) as well as its exit code for other purposes?
#!/usr/bin/perl
#array_executable_and_parameters = "/home/username/perl/myexecutable.sh" ; #ls -lh ; for i in {1..5}; do echo X; sleep 1; done
#array_executable_and_parameters2= "gs aaa" ;
my $line;
chdir("/home/username/perl/");
$logFileName = "logfileforsomespecificinput.log";
open(LOGHANDLE, ">>$logFileName" );
open (STDERR, '>>', $logFileName); #Prints to logfile directly
#open (STDERR, '>>', <STDOUT>); #Prints to own STDOUT (screen or mailfile)
print LOGHANDLE "--------------OK run\n";
open CMD, '-|', #array_executable_and_parameters or die $#;
while (defined($line = <CMD>)) { #Logs all at once at end
print LOGHANDLE "-----\$line=$line-----\n";
}
close CMD;
$returnCode1= $?>>8;
print LOGHANDLE "\$returnCode1=$returnCode1\n";
print LOGHANDLE "--------------BAD run\n";
open CMD2, '-|', #array_executable_and_parameters2 or die $#;
while (defined($line = <CMD2>)) {
print LOGHANDLE "-----\$line=$line-----\n";
}
close CMD2;
$returnCode2= $?>>8;
print LOGHANDLE "\$returnCode2=$returnCode2\n";
close(LOGHANDLE);
Take 2. After good advice in comments I have tried the IPC::Run. But something still does not work as expected. I seem to be missing how the looping from start (or pump?) to finish works, as well as how to get it to iterate when I do not know what the last output would be - as the examples everywhere mentions. So far I have now the following code, but it does not work line by line. It spits out listing of files in one go, then waits until the external loop is fully finished to print all the X's out. How do I tame it to the initial needs?
#! /usr/bin/perl
use IPC::Run qw( start pump finish );
#array_executable_and_parameters = ();
push(#array_executable_and_parameters,"/home/username/perl/myexecutable.sh"); #ls -lh ; for i in {1..5}; do echo X; sleep 1; done
my $h = start \#array_executable_and_parameters, \undef, \$out, \$err ;
pump $h;# while ($out or $err);
print "1A. \$out: $out\n";
print "1A. \$err: $err\n";
$out = "";
$err = "";
finish $h or die "Command returned:\n\$?=$?\n\$#=$#\nKilled by=".( $? & 0x7F )."\nExit code=".( $? >> 8 )."\n" ;
print "1B. \$out: $out\n";
print "1B. \$err: $err\n";
Look at IPC modules, especially IPC::Cmd, IPC::Run and if not satisfied then IPC::Run3. There is a lot of details you would have to cover and those modules will make your life a lot easier.
OK, have got it to work, so far. Might have some issues - not sure about environment variables, like umask or language related or the system load when push is waiting/blocking, or how to replace die with capturing of all variables for status. Nevertheless for my purpose, seems to work well. Will see how it works on a real system.
#! /usr/bin/perl
BEGIN {
push #INC, '/home/myusername/perl5/lib/perl5'; #Where the modules from Cpan are
}
use IPC::Run qw( start pump finish );
#array_executable_and_parameters = ();
push(#array_executable_and_parameters,"/home/myusername/perl/myexecutable.sh"); #ls -lh ; for i in {1..5}; do echo X; sleep 1; done
my $h = start \#array_executable_and_parameters, \undef, \$out, \$err ;
while (42) {
pump $h;# while ($out or $err);
if ($out eq '' and $err eq '') {last;}
print "1A. \$out: $out\n";
print "1A. \$err: $err\n";
$out = "";
$err = "";
}
finish $h or die "Command returned:\n\$?=$?\n\$#=$#\nKilled by=".( $? & 0x7F )."\nExit code=".( $? >> 8 )."\n" ;
print "1B. \$out: $out\n";
print "1B. \$err: $err\n";
The key was understanding how the blocking of pump works. All the manuals and help places kind of skipped over this part. So a neverending while which jumps out when pump lets go further without output was the key.

Expression in backticks with Perl `'cmd'.join ...`

I would like to send the remaining #ARGV to foo. I currently do this:
my $cmd = 'foo '.join ' ', #ARGV;
my $out = `$cmd`;
Is there possible to do it in one line? For instance with a non-existent e option:
my $out = qx/'foo'.join ' ', #ARGV/e;
In a more general case I might want to do this:
my $out = qx/'foo?.join(' ', keys %hash)/e;
The builtin readpipe function is what is at the back end of backticks/qx() calls, so you can use that directly:
my $out = readpipe('foo' . join ' ', #ARGV);
You don't need to assemble the command prior to running it. The qx() operator (aliased by the backticks) interpolates.
perl -e 'print `echo #ARGV`' foo bar
or in your script:
my $out = `foo #ARGV`
What "optional" says about qx and interpolation is right: Beware that double interpolation might bite you and it's prone to security issues!
Regarding your update: Try
perl -e '%h = (foo=>1,bar=>2); print `echo #{[keys %h]}`'
That constructs an anonymous arrayref and immediately dereferrences it. Hashes don't interpolate but this array context allows arbitrary Perl code producing a list. Also I'm pretty sure the compiler recognized this idiom and removes the arrayref (de)dereferrence during optimization.
But that is really ugly, nearly unreadable from my point of view. I'd rather recommend:
my #keys = keys %hash;
my $cmd = "foo #keys";
my $out = `$cmd`;
Hint: storing the command in a dedicated variable makes logging executes commands easier what is really desirable.
Sure
my $out = capture_this_command( 'foo', #ARGV );
sub capture_this_command {
use Capture::Tiny qw/ capture /;
## local %ENV;
## delete #ENV{'PATH', 'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};
## $ENV{'PATH'} = '/bin:/usr/bin';
my #cmd = #_;
my( $stdout, $stderr, $exit ) = capture {
system { $cmd[0] } #cmd;
};;
if( $exit ){
die "got the exit( $exit ) and stderr: $stderr\n ";
} elsif( $stderr ){
warn "got stderr: $stderr\n ";
}
return $stdout;
}
update:
qx// is double quotes, it interpolates, so everything perlintro/perlsyn/perlquote say about that goes, but also, remember, qx// calls your shell (to see which one you have perl -V:sh) and shells have their own interpolation
So you could write my $out = qx/foo #ARGV/; but its subject to interpolation, first by perl, then by whatever shell you're invoking

Detect first or second file in a one-liner

In AWK, it is common to see this kind of structure for a script that runs on two files:
awk 'NR==FNR { print "first file"; next } { print "second file" }' file1 file2
Which uses the fact that there are two variables defined: FNR, which is the line number in the current file and NR which is the global count (equivalent to Perl's $.).
Is there something similar to this in Perl? I suppose that I could maybe use eof and a counter variable:
perl -nE 'if (! $fn) { say "first file" } else { say "second file" } ++$fn if eof' file1 file2
This works but it feels like I might be missing something.
To provide some context, I wrote this answer in which I manually define a hash but instead, I would like to populate the hash from the values in the first file, then do the substitutions on the second file. I suspect that there is a neat, idiomatic way of doing this in Perl.
Unfortunately, perl doesn't have a similar NR==FNR construct to differentiate between two files. What you can do is use the BEGIN block to process one file and main body to process the other.
For example, to process a file with the following:
map.txt
a=apple
b=ball
c=cat
d=dog
alpha.txt
f
a
b
d
You can do:
perl -lne'
BEGIN {
$x = pop;
%h = map { chomp; ($k,$v) = split /=/; $k => $v } <>;
#ARGV = $x
}
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
Update:
I gave a pretty simple example, and now when I look at that, I can only say TIMTOWDI since you can do:
perl -F'=' -lane'
if (#F == 2) { $h{$F[0]} = $F[1]; next }
print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
However, I can say for sure, there is no NR==FNR construct for perl and you can probably process them in various different ways based on the files.
It looks like what you're aiming for is to use the same loop for reading both files, and have a conditional inside the loop that chooses what to do with the data. I would avoid that idea because you are hiding what two distinct processes in the same stretch of code, making it less than clear what is going on.
But, in the case of just two files, you could compare the current file with the first element of #ARGV, like this
perl -nE 'if ($ARGV eq $ARGV[0]) { say "first file" } else { say "second file" }' file1 file2
Forgetting about one-line programs, which I hate with a passion, I would just explicitly open $ARGV[0] and $ARGV[1]. Perhaps naming them like this
use strict;
use warnings;
use 5.010;
use autodie;
my ($definitions, $data) = #ARGV;
open my $fh, '<', $definitions;
while (<$fh>) {
# Build hash
}
open $fh, '<', $data;
while (<$fh>) {
# Process file
}
But if you want to avail yourself of the automatic opening facilities then you can mess with #ARGV like this
use strict;
use warnings;
my ($definitions, $data) = #ARGV;
#ARGV = ($definitions);
while (<>) {
# Build hash
}
#ARGV = ($data);
while (<>) {
# Process file
}
You can also create your own $fnr and compare to $..
Given:
var='first line
second line'
echo "$var" >f1
echo "$var" >f2
echo "$var" >f3
You can create a pseudo FNR by setting a variable in the BEGIN block and resetting at each eof:
perl -lnE 'BEGIN{$fnr=1;}
if ($fnr==$.) {
say "first file: $ARGV, $fnr, $. $_";
}
else {
say "$ARGV, $fnr, $. $_";
}
eof ? $fnr=1 : $fnr++;' f{1..3}
Prints:
first file: f1, 1, 1 first line
first file: f1, 2, 2 second line
f2, 1, 3 first line
f2, 2, 4 second line
f3, 1, 5 first line
f3, 2, 6 second line
Definitely not as elegant as awk but it works.
Note that Ruby has support for FNR==NR type logic.

Debug a Perl script

I already did some research on Perl script debugging but couldn't find what I was looking for.
Let me explain my problem here.
I have a Perl script which is not entering into last while loop it seems cos it is not printing anything inside as instructed.
So, I want to know is there any easier method available to see all lines one by one like we can see in shell script using
set -x
Here is my Perl script code
#!/usr/bin/perl -w
my $ZONEADM = "/usr/sbin/zoneadm list -c";
use strict;
use diagnostics;
use warnings;
system("clear");
print "Enter the app\n";
chomp(my $INS = <>);
print "\nEnter the Symmitrix ID\n";
chomp(my $SYMM = <>);
print "\nEnter the Server\n";
chomp(my $SRV = <>);
print "\nEnter the devices\n";
while (<>) {
if($_ !~ m/(q|quit)/) {
chomp($_);
my $TEMP_FILE = "/export/home/ptiwari/scripts/LOG.11";
open (my $FH, '>>', $TEMP_FILE);
my #arr = split(/:/, $_);
if($arr[3]) {
print $FH "/".$INS."db/".$arr[0]." ".$SYMM." ".$arr[1]." ".$arr[2]." ".$arr[3]."\n";
}
else {
print $FH "/".$INS."db/".$arr[0]." ".$SYMM." ".$arr[1]." ".$arr[2]."\n";
}
undef #arr;
close $FH;
}
else {
exit;
}
}
my $IS_ZONE = qx($ZONEADM|grep -i $SRV|grep -v global);
if($IS_ZONE) {
$IS_ZONE = "yes";
}
else {
$IS_ZONE = "no";
}
open(my $FLH, '<', "/export/home/ptiwari/scripts/LOG.11");
my #lines;
while(<$FLH>) {
my ($GLOBAL_MTPT, $SYM, $SYM_DEV, $SIZE, $NEWFS) = split;
print $GLOBAL_MTPT." ".$SYM." ".$SYM_DEV;
print "\n";
}
I already tried perl -d but it didn't show me anything which can help me to troubleshoot why it didn't enter the while loop.
Your while(<>) loop doesn't have sensible termination conditions. The /q|quit/ regex is buggy.
You exit the whole script if any line contains q or quit. You will also exit, if the device descriptions contains things like quill or acquisition. The effect of typing an accidental q is similar to a CtrlC.
The only way to finish the loop and go on with the script is to send an EOF. This requires the user to punch CtrlD into the keyboard, or a file to simply end. Then your script will continue.
There are some other things wrong/weird with this script.
Main criticism: (a) all-uppercase variables are informally reserved for Perl and pragmatic modules. Lowercase or mixed case variables work too. (b) Your script contains quite some redundant code. Either refactor it into subs, or rewrite your logic
Here is an example rewrite that may be easier to debug / may not contain some of the bugs.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use constant DEBUG_FLAG => 1; # set to false value for release
my $zoneadm_command = "/usr/sbin/zoneadm list -c";
my $temp_file_name = "/export/home/ptiwari/scripts/LOG.11";
sub prompt { print "\n", $_[0], "\n"; my $answer = <>; chomp $answer; return $answer }
sub DEBUG { print STDERR "DEBUG> ", #_, "\n" if DEBUG_FLAG }
system("clear");
my $app_name = prompt("Enter the app");
my $symm_id = prompt("Enter the Symmitrix ID");
my $server = prompt("Enter the server name");
print "Enter the devices.\n";
print qq(\tTo terminate the script, type "q" or "quit".\n);
print qq(\tTo finish the list of devices, type Ctrl+D.\n);
open my $temp_file, ">>", $temp_file_name
or die "Can't open log file: $!";
while (<>) {
chomp; # remove trailing newline
exit if /^q(?:uit)?$/; # terminate the script if the input line *is* `q` or `quit`.
my #field = split /:/;
# grep: select all true values
#field = grep {$_} ("/${app_name}db/$field[0]", $symm_id, #field[1 .. 3]);
print $temp_file join(" ", #field), "\n";
}
close $temp_file;
DEBUG("finished the reading loop");
# get the zones with only *one* extra process
my #zones =
grep {not /global/}
grep {/\Q$server\E/i}
map {chomp; $_}
qx($zoneadm_command);
my $is_zone = #zones ? "yes" : "no";
DEBUG("Am I in the zone? $is_zone");
open my $device_file, "<", $temp_file_name or die "Can't open $temp_file_name: $!";
while (<$device_file>) {
chomp;
my ($global_mtpt, $sym, $sym_dev) = split;
print join(" ", $global_mtpt, $sym, $sym_dev), "\n";
# or short: print join(" ", (split)[0 .. 2]), "\n";
}
You need something like this for stepping into the script:
http://www.devshed.com/c/a/Perl/Using-The-Perl-Debugger/
You can really use the debugger: http://perldoc.perl.org/perldebug.html
But if your preference is to trace like bash -x, take a look at this discussion:
http://www.perlmonks.org/?node_id=419653
The Devel::Trace Perl module is designed to mimic sh -x tracing for shell programs.
Try to remove the "my $" from the last open statement and the "$" from there in the last while statement. Or better yet, try this:
open(my FLH, '<', "/export/home/ptiwari/scripts/LOG.11");
my #lines = <FLH>;
foreach (#lines) {
my ($GLOBAL_MTPT, $SYM, $SYM_DEV, $SIZE, $NEWFS) = split;
print $GLOBAL_MTPT." ".$SYM." ".$SYM_DEV;
print "\n";
}
Let me know about the results.