Perl: process string with shell command (pipe) - perl

Assume a pipeline with three programs:
start | middle | end
If start and end are now part of one perl script, how can I pipe data through a shell command in the perl script, in order to pass through middle?
I tried the following (apologies for lack of strict mode, it was supposed to be a simple proof of concept):
#!/usr/bin/perl -n
# Output of "start" stage
$start = "a b c d\n";
# This shell command is "middle"
open (PR, "| sed -E 's/a/-/g' |") or die 'Failed to start sed';
# Pipe data from "start" into "middle"
print PR $start;
# Read data from "middle" into "end"
$end = "";
while (<PR>) {
$end .= $_;
}
close PR;
# Apply "end" and print output
$end =~ s/b/+/g;
print $end;
Expected output:
- + c d
Actual output:
none, until I hit ENTER, then I get - b c d. The middle command is receiving data from start and processing it, but the output is going to STDOUT instead of end. Also, the attempt to read from middle seems to be reading from STDIN instead (hence the relevance of hitting ENTER).
I'm aware that this could all easily be done in one line of perl (or sed); my problem is how to do piping in perl, not how to replace chars in a string.

You can use IPC::Open2 for this.
This code creates two file handles: $to_sed, which you can print to to send input to the program, and $from_sed which you can readline (or <$from_sed>) from to read the program's output.
use IPC::Open2;
my $pid = open2(my ($from_sed, $to_sed), "sed -E 's/a/-/g'");
Most often it is simplest to involve the shell, but there is an alternative call that allows you to bypass the shell and instead run a program and populate its argv directly. It is described in the linked documentation.

The reason your code does nothing until you hit enter is because you are using perl -n.
-n causes Perl to assume the following loop around your program, which makes it iterate over filename arguments
somewhat like sed -n or awk:
LINE:
while (<>) {
... # your program goes here
}
The part in your code where you read your file again returns nothing.
If you turn on warnings you will discover that perl doesn't do bi-directional pipes.

Related

Error while running sed command in perl cript

I am trying to run the following command in perl script :
#!/usr/bin/perl
my $cmd3 =`sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
system($cmd3);
but is not producing any output nor any error.
Although when I am running the command from command line it is working perfectly and gives desired output. Can you guys please help what I am doing wrong here ?
Thanks
system() doesn't return the output, just the exit status.
To see the output, print $cmd3.
my $cmd3 = `sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
print "$cmd3\n";
Edit:
If you want to check for exceptional return values, use CPAN module IPC::System::Simple:
use IPC::System::Simple qw(capture);
my $result = capture("any-command");
Running sed from inside Perl is just insane.
#!/usr/bin/perl
open (F, '<', "file1") or die "$O: Could not open file1: $!\n";
while (<F>) {
1 while s/^(([^,]*,){0,13}[^,|]*)\|[^,]*/$1/;
print;
}
Notice how Perl differs from your sed regex dialect in that grouping parentheses and alternation are unescaped, whereas a literal round parenthesis or pipe symbol needs to be backslash-escaped (or otherwise made into a literal, such as by putting it in a character class). Also, the right-hand side of the substitution prefers $1 (you will get a warning if you use warnings and have \1 in the substitution; technically, at this level, they are equivalent).
man perlrun has a snippet explaining how to implement the -i option inside a script if you really need that, but it's rather cumbersome. (Search for the first occurrence of "LINE:" which is part of the code you want.)
However, if you want to modify file1 in-place, and you pass it to your Perl script as its sole command-line argument, you can simply say $^I = 1; (or with use English; you can say $INPLACE_EDIT = 1;). See man perlvar.
By the way, the comment that your code "isn't producing any output" isn't entirely correct. It does what you are asking it to; but you are apparently asking for the wrong things.
Quoting a command in backticks executes that command. So
my $cmd3 = `sed ... file1 >file2`;
runs the sed command in a subshell, there and then, with input from file1, and redirected into file2. Because of the redirection, the output from this pipeline is nothing, i.e. an empty string "", which is assigned to $cmd3, which you then completely superfluously attempt to pass to system.
Maybe you wanted to put the sed command in regular quotes instead of backticks (so that the sed command line would be the value of $cmd3, which it then makes sense to pass to system). But because of the redirection, it would still not produce any visible output; it would create file2 containing the (possibly partially substituted) text from file1.

When does the output of a script get appended to a file, and what can I do to circumvent the outcome?

I have a script called my_bash.sh and it calls a perl script, and appends the output of the perl script to a log file. Does the file only get written once the perl script has completed?
my_bash.sh
#!/bin/bash
echo "Starting!" > "my_log.log"
perl my_perl.pl >> "my_log.log" 2>&1
echo "Ending!" >> "my_log.log"
The issue is that as the perl script is running, I'd like to manipulate contents of the my_log.log file while it's running, but it appears the file is blank. Is there a proper way to do this? Please let me know if you'd like more information.
my_perl.pl
...
foreach $component (#arrayOfComponents)
{
print "Component Name: $component (MORE_INFO)\n";
# Do some work to gather more info (including other prints)
#...
# I want to replace "MORE_INFO" above with what I've calculated here
system("sed 's/MORE_INFO/$moreInfo/' my_log.log");
}
The sed isn't working correctly since the print statements haven't yet made it to the my_log.log.
Perl would buffer the output by default. To disable buffering set $| to a non-zero value. Add
$|++;
at the top of your perl script.
Quoting perldoc pervar:
$|
If set to nonzero, forces a flush right away and after every write or
print on the currently selected output channel. Default is 0
(regardless of whether the channel is really buffered by the system or
not; $| tells you only whether you've asked Perl explicitly to flush
after each write). STDOUT will typically be line buffered if output is
to the terminal and block buffered otherwise. Setting this variable is
useful primarily when you are outputting to a pipe or socket, such as
when you are running a Perl program under rsh and want to see the
output as it's happening. This has no effect on input buffering. See
getc for that. See select on how to select the output channel. See
also IO::Handle.
Mnemonic: when you want your pipes to be piping hot.
The answer to this question depends on how your my_perl.pl is outputting data and how much data is being output.
If you're using normal (buffered) I/O to produce your output, then my_log.log will only be written to once the STDOUT buffer fills. Generally speaking, if you're not producing a lot of output, this is when the program ends and the buffer is flushed.
If you're producing enough output to fill the output buffer, you will get output in my_log.log prior to my_perl.pl completing.
Additionally, in Perl, you can make your STDOUT unbuffered with the following code:
select STDOUT; $| = 1;
In which case, your output would be written to STDOUT (and then to my_log.log via redirection) the moment it is produced in your script.
Depending on what you need to do to the log file, who might be able to read each line of output from the perl script, do something with the line, then write it to the log yourself (or not):
#!/bin/bash
echo "Starting!" > "my_log.log"
perl my_perl.pl | \
while read line; do
# do something with the line
echo "$line" >> "my_log.log"
done
echo "Ending!" >> "my_log.log"
In between
print "Component Name: $component (MORE_INFO)\n";
and
system("sed 's/MORE_INFO/$moreInfo/' my_log.log");
do you print stuff? If not, delay the first print until you've figured out $moreInfo
my $header = "Component Name: $component (MORE_INFO)";
# ...
# now I have $moreInfo
$header =~ s/MORE_INFO/$moreInfo/;
print $header, "\n";
If you do print stuff, you could always "queue" it until you have the info you need:
my #output;
foreach my $component (...) {
#output = ("Component Name: $component (MORE_INFO)");
# ...
push #output, "something to print";
# ...
$output[0] =~ s/MORE_INFO/$moreInfo/;
print join("\n", #output), "\n";

Perl -> How return value of qx(perl file)

I need to know how is possible return values of Perl file from other Perl file.
In my first file i call to the second file with sentence similar to:
$variable = qx( perl file2.pl --param1 $p1 --param2 $p2);
I have tried with exit and return to get this data but is not possible.
Any idea?
Processes are no subroutines.
Communication between processes (“IPC”) is mostly done via normal file handles. Such file handles can specifically be
STDIN and STDOUT,
pipes that are set up by the parent process, these are then shared by the child,
sockets
Every process also has an exit code. This code is zero for success, and non-zero to indicate a failure. The code can be any integer in the range 0–255. The exit code can be set via the exit function, e.g. exit(1), and is also set by die.
Using STDIN and STDOUT is the normal mode of operation for command line programs that follow the Unix philosophy. This allows them to be chained with pipes to more complex programs, e.g.
cat a b c | grep foo | sort >out
Such a tool can be implemented in Perl by reading from the ARGV or STDIN file handle, and printing to STDOUT:
while (<>) {
# do something with $_
print $output;
}
Another program can then feed data to that script, and read it from the STDOUT. We can use open to treat the output as a regular file handle:
use autodie;
open my $tool, "-|", "perl", "somescript.pl", "input-data"; # notice -| open mode
while (<$tool>) {
...
}
close $tool;
When you want all the output in one variable (scalar or array), you can use qx as a shortcut: my $tool_output = qx/perl somescript.pl input-data/, but this has two disadvantages: One, a shell process is executed to parse the command (shell escaping problems, inefficiency). Two, the output is available only when the command has finished. Using open on the other hand allows you to do parallel computations.
In file2.pl, you must print something to STDOUT. For example:
print "abc\n";
print is the solution.
Sorry for my idiot question!
#
$variable = system( perl file2.pl --param1 $p1 --param2 $p2);
#$variable has return value of perl file2.pl ...

What's the use of <> in Perl?

What's the use of <> in Perl. How to use it ?
If we simply write
<>;
and
while(<>)
what is that the program doing in both cases?
The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
in other words, ls never see '*.c' unless the pattern doesn't match. Try this at a command prompt (not perl):
echo *
you'll notice that you do not get an *.
So, if the shell is handing you a bunch of file names, and you'd like to go through each one's data in turn, perl's <> operator gives you a nice way of doing that...it puts the next line of the next file (or stdin if no files are named) into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified file handle. Reading from the special file handle ARGV will read from STDIN if #ARGV is empty or from the concatenation of the files named by #ARGV if it's not.
As for
while (<>)
It's a syntax error. If you had
while (<>) { ... }
it get rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from (previously explained) ARGV until there are no more lines to read.
It is called the diamond operator and feeds data from either stdin if ARGV is empty or each line from the files named in ARGV. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.
In many cases of programming with syntactical sugar like this, Deparse of O is helpful to find out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK
Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.
it takes the STDIN standard input:
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
below is the execution:
> temp.pl
3
3
>
so as soon as you execute the script it will wait till the user gives some input.
after 3 is given as input,it stores that value in $count and it prints the value in the next statement.

Perl: How to get filename when using <> construct?

Perl offers this very nice feature:
while ( <> )
{
# do something
}
...which allows the script to be used as script.pl <filename> as well as cat <filename> | script.pl.
Now, is there a way to determine if the script has been called in the former way, and if yes, what the filename was?
I know I knew this once, and I know I even used the construct, but I cannot remember where / how. And it proved very hard to search the 'net for this ("perl stdin filename"? No...).
Help, please?
The variable $ARGV holds the current file being processed.
$ echo hello1 > file1
$ echo hello2 > file2
$ echo hello3 > file3
$ perl -e 'while(<>){s/^/$ARGV:/; print;}' file*
file1:hello1
file2:hello2
file3:hello3
The I/O Operators section of perlop is very informative about this.
Essentially, the first time <> is executed, - is added to #ARGV if it started out empty. Opening - has the effect of cloning the STDIN file handle, and the variable $ARGV is set to the current element of #ARGV as it is processed.
Here's the full clip.
The null filehandle "<>" is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes a
list of filenames, doing the same to each line of input from all of
them. Input from "<>" comes either from standard input, or from each
file listed on the command line. Here's how it works: the first time
"<>" is evaluated, the #ARGV array is checked, and if it is empty,
$ARGV[0] is set to "-", which when opened gives you standard input. The
#ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work. It
really does shift the #ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally. "<>" is just
a synonym for "<ARGV>", which is magical. (The pseudo code above doesn't
work because it treats "<ARGV>" as non-magical.)
If you care to know about when <> switches to a new file (e.g. in my case - I wanted to record the new filename and line number), then the eof() function documentation offers a trick:
# reset line numbering on each input file
while (<>) {
next if /^\s*#/; # skip comments
print "$.\t$_";
} continue {
close ARGV if eof; # Not eof()!
}