I want to write:
... | my_filter | myperlprogram
But I do not know how to run my_filter until I have started myperlprogram.
Can I somehow in myperlprogram loop STDIN through my_filter before reading it?
I am thinking something like:
pipe($a,$b);
if(not fork()) {
close STDOUT;
open STDOUT, $b;
exec "my_filter --with the correct --options";
} else {
close STDIN;
open STDIN, $a;
}
# continue reading STDIN now looped through `my_filter`
It's not at all clear from the description why a simple
open STDIN, '-|', 'your_filter', '--option1', ...
will not do.
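For completeness, a minimal sketch of that approach, assuming the filter is an ordinary external command called my_filter (the option name here is made up):
use warnings;
use strict;

# Fork my_filter; it inherits the original STDIN, and this script's STDIN
# is then attached to the filter's output.
open STDIN, '-|', 'my_filter', '--with-correct-options'
    or die "Can't start my_filter: $!";

while (<STDIN>) {
    print "filtered: $_";   # the script now reads already-filtered input
}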
The way I see the problem: filter the script's STDIN using an external program that is run from inside the script once the script is already running (so, not with a shell pipeline). With IPC::Run:
use warnings;
use strict;
use feature 'say';
use IPC::Run qw(start pump finish);
my $filtered_in;
FILTER_IN: {
my @cmd = qw(a_filter.pl); # add filter's options/arguments
my $h = start \@cmd, \my $in, \$filtered_in;
while (<>) {
$in = $_;
pump $h while length $in;
# Wait for filter's output -- IF WANT to process lines as received
pump $h until $filtered_in =~ /\n\z/;
chomp $filtered_in; # process/use filter's output
$filtered_in .= '|'; # as it's coming (if needed)
}
finish $h or die "Cleanup returned: $?";
};
say $filtered_in // 'no input';
This allows one to process the filter's lines of output as they are emitted. If that is not needed and we only want to accumulate the filter's output for later, then you don't need the code under # Wait for... (a simplified variant is sketched below).
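For the accumulate-only case, the loop shrinks to something like this sketch, reusing the same @cmd and imports as above:
my $filtered_in;
my $h = start \@cmd, \my $in, \$filtered_in;
while (<>) {
    $in = $_;
    pump $h while length $in;   # just feed the filter, don't wait for its lines
}
finish $h or die "Cleanup returned: $?";
say $filtered_in // 'no input';   # all of the filter's output, accumulated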
Simplest test with a_filter.pl such as
use warnings;
use strict;
STDOUT->autoflush(1);
my $cnt = 0;
while (<>) { print "line ", ++$cnt, ": ", $_ }
and then run
echo "a\nfew\nlines" | script.pl
with output
line 1: a|line 2: few|line 3: lines|
from our toy processing in script.pl above.
This will filter input via a file as well,
script.pl < input.txt
I need to run my perl tests in parallel and capture STDOUT and STDERR in a separate file for each test file. I'm having no success even in capturing in one file. I've been all over SO and have had no luck. Here is where I started (I'll spare you all the variations). Any help is greatly appreciated. Thanks!
foreach my $file (@files) {
next unless $file =~ /\.t$/;
print "\$file = $file\n";
$file =~ /^(\w+)\.\w+/;
my $file_pfx = $1;
my $this_test_file_name = $file_pfx . '.txt';
system("perl $test_dir\\$file > results\\$test_file_name.txt &") && die "cmd failed: $!\n";
}
Here is a simple example using Parallel::ForkManager to spawn separate processes.
In each process the STDOUT and STDERR streams are redirected, in two ways for a demo: STDOUT to a variable, which can then be passed around as desired (here dumped into a file), and STDERR directly to a file. Or use a library, as shown in a separate code snippet below.
The numbers 1..6 represent batches of data that each child will pick from to process. Only three processes are started right away and then as one finishes another one is started in its place.† (Here they exit nearly immediately, the "jobs" being trivial.)
use warnings;
use strict;
use feature 'say';
use Carp qw(carp);
use Path::Tiny qw(path);
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(3);
foreach my $data (1..6) {
$pm->start and next; # start a child process
proc_in_child($data); # code that runs in the child process
$pm->finish; # exit it
}
$pm->wait_all_children; # reap all child processes
say "\nParent $$ done\n";
sub proc_in_child {
my ($data) = @_;
say "Process $$ with data $data"; # still shows on terminal
# Will dump all that was printed to streams to these files
my ($outfile, $errfile) =
map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);
# Redirect streams
# One way to do it, redirect to a variable (for STDOUT)...
open my $fh_stdout, ">", \my $so or carp "Can't open handle to variable: $!";
my $fh_STDOUT = select $fh_stdout;
# ...another way to do it, directly to a file (for any stream)
# (first 'dup' it so it can be restored if needed)
open my $SAVEERR, ">&STDERR" or carp "Can't dup STDERR: $!";
open *STDERR, ">", $errfile or carp "Can't redirect STDERR to $errfile: $!";
# Prints wind up in a variable (for STDOUT) and a file (for STDERR)
say "STDOUT: Child process with pid $$, processing data #$data";
warn "STDERR: Child process with pid $$, processing data #$data";
close $fh_stdout;
# If needed to restore (not in this example which exits right away)
select $fh_STDOUT;
open STDERR, '>&', $SAVEERR or carp "Can't reopen STDERR: $!";
# Dump all collected STDOUT to a file (or pass it around, it's a variable)
path( $outfile )->spew($so);
return 1
}
While STDOUT is redirected to a variable here, STDERR cannot be redirected that way, so it goes directly to a file. See open. However, there are ways to capture it in a variable as well.
Then you can use the module's ability to return data from child processes to the parent, which can then handle those variables; a minimal sketch follows. See for example this post and this post and this post. (There's way more, these are merely the ones I know.) Or indeed just dump them to files, as done here.
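A sketch of returning the captured text through the module itself, using Parallel::ForkManager's run_on_finish callback and the data-reference argument to finish (the hash key and the child body here are simplified stand-ins):
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

# Runs in the parent each time a child is reaped
$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $signal, $core, $data_ref) = @_;
    print "Child $pid sent back:\n$data_ref->{stdout}" if $data_ref;
});

foreach my $data (1..6) {
    $pm->start and next;
    # ... redirect and collect STDOUT into $so as in proc_in_child above ...
    my $so = "output captured in child $$ for data $data\n";
    $pm->finish(0, { stdout => $so });   # reference is handed to run_on_finish
}
$pm->wait_all_children;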
Another way is to use modules that can run code and redirect output, like Capture::Tiny
use Capture::Tiny qw(capture);
sub proc_in_child {
my ($data) = @_;
say "Process $$ with data $data"; # on terminal
# Run code and capture all output
my ($stdout, $stderr, @results) = capture {
say "STDOUT: Child process $$, processing data #$data";
warn "STDERR: Child process $$, processing data #$data";
# return results perhaps...
1 .. 4;
};
# Do as needed with variables with collected STDOUT and STDERR
# Return to parent, or dump to file:
my ($outfile, $errfile) =
map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);
path($outfile) -> spew( $stdout );
path($errfile) -> spew( $stderr );
return 1
}
† This keeps the same number of processes running. Or, one can set it up to wait for the whole batch to finish and then start another batch; a sketch of that follows. For some details of operation see this post.
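A rough sketch of that batch-wise variant, assuming the same proc_in_child as above (wait_all_children blocks until the current group is done before the next group starts):
my $pm = Parallel::ForkManager->new(3);
my @all_data = 1..6;

while (my @batch = splice @all_data, 0, 3) {
    for my $data (@batch) {
        $pm->start and next;
        proc_in_child($data);
        $pm->finish;
    }
    $pm->wait_all_children;   # don't start the next batch until this one is done
}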
I think the easiest way is to use shell redirects in your system command. BTW, spawning uncontrolled subprocesses from it with '&' makes me frown.
Here is a simple example with shell redirects and fork.
#!/usr/bin/perl
use strict;
for my $i (0..2) {
my $stdoutName = "stdout$i.txt";
my $stderrName = "stderr$i.txt";
my $pid = fork();
if($pid == 0) {
system("perl mytest.pl 1>$stdoutName 2>$stderrName"); #redirects are here 1> (stdout) and 2> (stderr)
exit $?;
}
}
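Note that the loop above forks children but never reaps them; if the parent should block until every redirected run has finished, a plain wait loop (not part of the original snippet) would do:
# Reap all children before the parent exits
while ((my $pid = wait()) != -1) {
    printf "child %d exited with status %d\n", $pid, $? >> 8;
}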
I use IPC::Run to get output from an external executable in a cron-run script. I need it to be able to filter and make decisions based on the output on the fly. But the problem is that it gives me output not on the fly, but in a few batches - many lines at once, only after the executable has been running for a while. Is it possible to somehow flush the output, like we can with grep --line-buffered? I do not see this properly answered on the Perl sites. Here is my script part:
use IPC::Run qw( start pump finish );
...
my $externalExecRun = start \@executableAndParameters, \undef, \$executableStdout, \$executableStderr;
while (42) {
pump $externalExecRun;
if ($executableStdout eq '' and $executableStderr eq '') {last;}
WriteToLog("\$executableStdout: -->$executableStdout<--"); #This writes many lines at once
WriteToLog("\$executableStderr: -->$executableStderr<--");
$executableStdout = "";
$executableStderr = "";
}
finish $externalExecRun;
You can use IPC::Run's new_chunker to have it give you output on a line-by-line basis:
use warnings;
use strict;
use IPC::Run qw/ start new_chunker /;
use Data::Dump;
my $run = start ['perl','-le','print "a" x $_ for 1..180'],
'>', new_chunker, \my $out, '2>', new_chunker, \my $err;
while (1) {
$run->pump;
last unless defined $out || defined $err;
dd $out, $err;
($out,$err) = ();
}
$run->finish;
It's still possible that the external program won't output on a line-by-line basis, in which case, at least on *NIX, changing the first '>' into '>pty>' (as suggested by @ikegami in the comments) will hopefully help; or one of the links provided by @daxim.
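For reference, the pty variant would look something like this (an untested sketch; '>pty>' puts a pseudo-terminal between the child and us, which often convinces the child's stdio to line-buffer):
my $run = start ['perl','-le','print "a" x $_ for 1..180'],
    '>pty>', new_chunker, \my $out, '2>', new_chunker, \my $err;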
I've created the script below for an assignment I have. It asks for a text file, checks the frequency of words, and lists the 10 words that appear the most times. Everything is working fine, but I need this script to be able to start via the command line as well as via standard input.
So I need to be able to write 'perl wfreq.pl example.txt' and have that start the script without prompting for a text file. I'm not sure how to accomplish this. I think I might need a loop at the start that skips the STDIN prompt if a text file is given on the command line.
How can I do it?
The script
#! /usr/bin/perl
use utf8;
use warnings;
print "Please enter the name of the file: \n" ;
$file = <STDIN>;
chop $file;
open(my $DATA, "<:utf8", $file) or die "Oops!!: $!";
binmode STDOUT, ":utf8";
while(<$DATA>) {
tr/A-Za-z//cs;
s/[;:()".,!?]/ /gio;
foreach $word (split(' ', lc $_)) {
$freq{$word}++;
}
}
foreach $word (sort { $freq{$b} <=> $freq{$a} } keys %freq) {
@fr = (@fr, $freq{$word});
@ord = (@ord, $word);
}
for ($v =0; $v < 10; $v++) {
print " $fr[$v] | $ord[$v]\n";
}
Instead of reading from <STDIN>, you can read from <> to get data either from files provided on the command line or from stdin if there are no files.
For example, with the program:
#!/usr/bin/env perl
while (<>) {
print $_;
}
The command ./script foo.txt will read and print lines from foo.txt, while ./script by itself will read and print lines from standard input.
You need to do the following:
my $DATA;
my $filename = $ARGV[0];
unless ($filename) {
print "Enter filename:\n";
$filename = <STDIN>;
chomp $filename;
}
open($DATA, $filename) or die $!;
Though I have to say, user-prompts are very un-Unix like.
perl script.pl < input.txt
The use of the operator < passes input.txt to script.pl as standard input. You can then skip querying for the filename. Otherwise, use $ARGV[0] or similar, if defined.
You can check for a command-line argument in @ARGV, which is Perl's array that automagically grabs command-line arguments, and -- if present -- process them (else continue with input from STDIN). Something like:
use utf8;
use strict; #Don't ever forget this! Always, always, ALWAYS use strict!
use warnings;
if (@ARGV)
{
#Assume that the first command line argument is a file to be processed as in your original code.
#You may or may not want to care if additional command line arguments are passed. Up to you.
}
else
{
#Ask for the filename and proceed as normal.
}
Note that you might want to isolate the code for processing the file name (i.e., the opening of DATA, going through the lines, counting the words, etc.) to a subroutine that you can easily call (perhaps with an argument consisting of the file name) in each branch of the code.
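A sketch of that layout (process_file and prompt_for_filename are made-up helper names, and the word cleanup inside is simplified, not the exact code from the question):
sub process_file {
    # Count word frequencies in one file and return them as a hashref
    my ($filename) = @_;
    open my $fh, "<:utf8", $filename or die "Oops!!: $!";
    my %freq;
    while (my $line = <$fh>) {
        $line =~ s/[[:punct:]]/ /g;          # rough cleanup
        $freq{$_}++ for split ' ', lc $line;
    }
    return \%freq;
}

sub prompt_for_filename {
    print "Please enter the name of the file: \n";
    my $name = <STDIN>;
    chomp $name;
    return $name;
}

# Both branches end up producing a filename and calling the same routine
my $filename = @ARGV ? $ARGV[0] : prompt_for_filename();
my $freq = process_file($filename);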
Oh, and to make sure I'm clear about this: always use strict and always use warnings. If you don't, you're asking for trouble.
I can use <> to loop over the pipelined input to a Perl program. However, how can I decide whether there is pipelined input? If there is no pipelined input, I will use an environment variable to load a file. I am trying to use:
my @lines = (<>);
if ($#lines == -1) {
use setenv;
open FILE, "$ENV{'ART_FILE_LIST'}" or die $!;
@lines = <FILE>;
}
Obviously it doesn't work, because the program will wait at the first line.
use 5.010_000;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std :utf8 >;
END {
close(STDOUT)
|| die "can't close stdout: $!";
}
if (@ARGV == 0 && -t STDIN) {
# NB: This is magic open, so the envariable
# could hold a pipe, like 'cat -n /some/file |'
@ARGV = $ENV{ART_FILE_LIST}
|| die q(need $ART_FILE_LIST envariable set);
}
while (<>) {
# blah blah blah
}
You can use the -t operator to see if you are a terminal, i.e., not a pipeline:
if (-t STDIN) { print "Terminal\n" }
else { print "Not a terminal\n" }
Use Getopt::Long
perl -Mylib -e 'Mylib::do_stuff' --i_am_pipe_lined
One of the things about UNIX pipelines is that they achieve their usefulness by not caring what's before them or after them. They just have a job to do and they do it. They do one thing, simply, but they all have switches to do their simple job with a little more customization.
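A sketch of what that could look like inside the script, using Getopt::Long for an explicit flag and falling back to the $ART_FILE_LIST file otherwise (the flag name matches the command line above; process_line is a made-up stand-in for the real work):
use warnings;
use strict;
use Getopt::Long;

sub process_line { my ($line) = @_; print $line }   # stand-in for real work

GetOptions( 'i_am_pipe_lined!' => \my $pipelined )
    or die "Bad options\n";

if ($pipelined) {
    # Caller promised to pipe data in
    while (my $line = <STDIN>) {
        process_line($line);
    }
}
else {
    open my $fh, '<', $ENV{ART_FILE_LIST}
        or die "Can't open $ENV{ART_FILE_LIST}: $!";
    while (my $line = <$fh>) {
        process_line($line);
    }
}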
Consider the following perl script (read.pl):
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
If this script is executed from the command line, it will get the first line of input, while cat gets everything else until the end of input (^D is pressed).
However, things are different when the input is piped from another process or read from a file:
$ echo "foo\nbar" | ./read.pl
Perl read: foo
And here's what cat gets:
Perl seems to greedily buffer the entire input somewhere, and processes called using backticks or system do not see any of the input.
The problem is that I'd like to unit test a script that mixes <STDIN> and calls to other processes. What would be the best way to do this? Can I turn off input buffering in perl? Or can I spool the data in a way that will "mimic" a terminal?
This is not a Perl problem. It is a UNIX/shell problem. When a command reads from a terminal you are in line-buffering mode, but when you redirect its input through a pipe, you are in block-buffering mode. You can see this by saying:
cat /usr/share/dict/words | ./read.pl | head
This C program has the same problem:
#include <stdio.h>
int main(int argc, char** argv) {
char line[4096];
FILE* cat;
fgets(line, 4096, stdin);
printf("C got: %s\ncat got:\n", line);
cat = popen("cat", "r");
while (fgets(line, 4096, cat)) {
printf("%s", line);
}
pclose(cat);
return 0;
}
I have good news and bad news.
The good news is a simple modification of read.pl allows you to give it fake input:
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;
Sample run:
$ printf "A\nB\nC\nD\n" | ./read.pl
Perl read: A
And here's what cat gets: B
C
D
The bad news is you get a single switchover: if you try to repeat the read-then-cat, the first cat will starve all subsequent reads. To see this, consider
#! /usr/bin/perl
use warnings;
use strict;
binmode STDIN, "unix" or die "$0: binmode: $!";
my $line = <STDIN>;
print "1: Perl read: $line";
print "1: And here's what cat gets: ", `cat -`;
$line = <STDIN>;
$line = "<undefined>\n" unless defined $line;
print "2: Perl read: $line";
print "2: And here's what cat gets: ", `cat -`;
and then a sample run that produces
$ printf "A\nB\nC\nD\n" | ./read.pl
1: Perl read: A
1: And here's what cat gets: B
C
D
2: Perl read: <undefined>
2: And here's what cat gets:
Today I think I've found what I needed: Perl has a module called Expect which is perfect for such situations:
#!/usr/bin/perl
use strict;
use warnings;
use Expect;
my $exp = Expect->spawn('./read.pl');
$exp->send("First Line\n");
$exp->send("Second Line\n");
$exp->send("Third Line\n");
$exp->soft_close();
Works like a charm ;)
Here's a sub-optimal way that I've found:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, \$output);
$process->pump() until $output =~ /Perl read:/;
$input .= "Second Line\n";
$process->finish();
print $output;
It's sub-optimal in the sense that one needs to know the "prompt" that the program will emit before waiting for more input.
Another sub-optimal solution is the following:
use IPC::Run;
my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, my $timer = IPC::Run::timer(1));
$process->pump() until $timer->is_expired();
$timer->start(1);
$input .= "Second Line\n";
$process->finish();
It does not require knowledge of any prompt, but is slow because it waits at least two seconds. Also, I don't understand why the second timer is needed (finish won't return otherwise).
Does anybody know better solutions?
Finally I ended up with the following solution. Still far from optimal, but it works. Even in situations like the one described by gbacon.
use Carp qw( confess );
use IPC::Run;
use Scalar::Util;
use Time::HiRes;
# Invokes the given program with the given input and argv, and returns stdout/stderr.
#
# The input argument is an arrayref containing one or more of the following:
#
# * A scalar is simply passed to the program as stdin
#
# * An arrayref in the form [ "prompt", "input" ] causes the function to wait
# until the program prints "prompt", then spools "input" to its stdin
#
# * An arrayref in the form [ 0.3, "input" ] waits 0.3 seconds, then spools
# "input" to the program's stdin
sub capture_with_input {
my ($program, $inputs, @argv) = @_;
my ($stdout, $stderr);
my $stdin = '';
my $process = IPC::Run::start( [$program, @argv], \$stdin, \$stdout, \$stderr );
foreach my $input (@$inputs) {
if (ref($input) eq '') {
$stdin .= $input;
}
elsif (ref($input) eq 'ARRAY') {
(scalar @$input == 2) or
confess "Input to capture_with_input must be of the form ['prompt', 'input'] or [timeout, 'input']!";
my ($prompt_or_timeout, $text) = @$input;
if (Scalar::Util::looks_like_number($prompt_or_timeout)) {
my $start_time = [ Time::HiRes::gettimeofday ];
$process->pump_nb() while (Time::HiRes::tv_interval($start_time) < $prompt_or_timeout);
}
else {
$prompt_or_timeout = quotemeta $prompt_or_timeout;
$process->pump until $stdout =~ m/$prompt_or_timeout/gc;
}
$stdin .= $text;
}
else {
confess "Unknown input type passed to capture_with_input!";
}
}
$process->finish();
return ($stdout, $stderr);
}
my $input = [
"First Line\n",
["Perl read:", "Second Line\n"],
[0.5, "Third Line\n"],
];
print "Executing process...\n";
my ($stdout, $stderr) = capture_with_input('./read.pl', $input);
print "done.\n";
print "STDOUT:\n", $stdout;
print "STDERR:\n", $stderr;
Usage example (with a slightly modified read.pl to test gbacon's case):
$ time ./spool_read4.pl
Executing process...
done.
STDOUT:
Perl read: First Line
And here's what head -n1 gets: Second Line
Perl read again: Third Line
STDERR:
./spool_read4.pl 0.54s user 0.02s system 102% cpu 0.547 total
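(For reference, a read.pl "slightly modified" in that way might look roughly like this; this is a guess reconstructed from the output above, not the exact file used:)
my $line = <STDIN>;
print "Perl read: $line";
print "And here's what head -n1 gets: ", `head -n1`;
$line = <STDIN>;
print "Perl read again: $line";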
Still, I'm open to better solutions...