Capturing standard out of a program to a file-like object? - perl

I'm trying to capture standard output of a command to a file-like object in Perl.
I essentially need to do the following:
Execute an OS command, capturing standard output.
Run a regular expression on each line of the file and pull output into an array.
Call another OS command for each item in the array of lines of output from the first command.
How can I do step one? I'd like to execute a command and get its standard out in a filelike object so as to be able to read it line by line.

The first part is easy:
use autodie qw(:all);
open my $input, '-|', 'os-command', #args;
Clearly, the remainder is not much harder:
while (<$input>)
{
next unless m/your regex/;
system 'other-command', $_;
}
Automatic error checking for the open and system calls is provided through autodie.

You might do:
my #input = qx( some_command );
for my $line (#input) {
$line =~ m{some_pattern} and system("some_command", "$line");
}

Related

Remove multiple duplicate lines from a file

I have a Perl script run in crontab that generates a file rich with duplicate entries, because on each run it rewrites information previously written.
I would use a sort -u of file, but, I would do it at the end of the Perl script file.
My list
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
...
My code
#!/usr/bin/perl
# Libraries
use strict;
use warnings 'all';
%lines = ();
# Remove duplicate
open( TMP_GL_OUTPUT, '>', $OUTPUT_FILE ) or die $!;
while ( <TMP_GL_OUTPUT> ) {
$lines{$_}++;
}
open( OUTFILE, '>', $TMPOUTPUT_FILE ) or die $!;
print OUTFILE keys %lines;
close( OUTFILE );
close( TMP_GL_OUTPUT );
Where am I going wrong? In shell it feels shorter than in Perl.
sort -u $TMPOUTPUT_FILE > $OUTPUT_FILE
As Suggested by ikegamy user, I've do as following:
move $OUTPUT_FILE, $TMPOUTPUT_FILE; # Copy file
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE; # Remove duplicate
unlink $TMPOUTPUT_FILE;
I think you are asking why your Perl program is longer than your shell script.
First of all, your shell script does something completely different than your Perl program.
Your shell script executes a program, and stores its out in a file.
Your Perl program reads a file, manipulates the data it read, and stores the output in a file.
The Perl equivalent to
sort -u -- "$TMPOUTPUT_FILE" > "$OUTPUT_FILE"
is
use IPC::Run qw( run );
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE;
(There are differences in error handling between these two.)
They're not that different in length.
This brings up the second difference. The shell specializes in executing programs, but Perl is a general purpose language. It would be surprising if it wasn't longer in Perl!
(Now try comparing the size of your Perl program to the source of sort...)
List::Util is a core module.
use List::Util 'uniq';
print for uniq <>
Your code looks almost OK.
My proposition is only to chomp each line, before you
save an element in the hash.
The reason is that e.g. the last line, not terminated
with a \n may look just the same as one of previous lines,
but without chomp the previous line would have contained
the terminating \n, whereas the last - not.
The resut is that both these lines will be different keys in the hash.
Compare my example program (working, presented below) with yours, there are
no other significant differences, apart from reading from __DATA__ and
writing to the console.
In my program, for demonstration purposes, I put 2 variants of printout,
one with key values (repetition counts) and another, printing just keys.
In your program leave only the second printout.
use strict; use warnings; use feature qw(say);
my %lines;
while(<DATA>) {
chomp;
$lines{$_}++;
}
while(my($key, $val) = each %lines) {
printf "%-32s / %d\n", $key, $val;
}
say '========';
foreach my $key (keys %lines) {
say $key;
}
__DATA__
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
Edit
Your code assigns no names to $OUTPUT_FILE and $TMPOUTPUT_FILE,
you even didn't declare these variables, but I assume, that in your actual
code you did it.
Another detail is that %lines should be preceded with my,
otherwise, as you put use strict; the compiler prints an error.
Edit 2
There is a quicker and shorter solution than yours.
Instead of writing lines to a hash and printing them as late as in
the second step, you can do it in a single loop:
Read the line.
Check whether the hash already contains a key equal to the line just read.
If not, then:
write the line to the hash, to block the printout, if just the
same line occured again,
print the line.
You can even write this program as a Perl one-liner:
perl -lne"print if !$lines{$_}++" input.txt
If you run the above command from the Windows cmd, it will print the output
to the console. If you use Linux, instead of double quotes, you can use apostrophes.
You may of course redirect the output to any file, adding > output.txt to
the above command.
The code is executed for each input line, chomped due to -l option.
If any other details concerning Perl one-liners are not known to you, search the web.

how to get text via backticks or system in Perl

I want to call a EXE file in Perl which performs some action
I tried calling the exe file via backtick and system but in both the cases i get only the return value
The exe file prints some text on to the console. Is it possible to capture that as well?
I looked into this variable ${^CHILD_ERROR_NATIVE} but I get only the return value and not text
I am using Perl 5.14
Thanks in advance
The application might not print its output to STDOUT but STDERR instead, which isn't captured by the backtick operator. To capture both, you could use the following:
my $binary = 'foo.exe';
my $output = `$binary 2>&1`;
For a more fine-tuned capturing, you might want to resort to IPC::Open3 with which you can "control" all of a process' streams (IN, OUT and ERR).
I used to execute commands from perl script and capture the output this way
sub execute_command() {
my($host) = #_;
open(COMMAND_IN, "your_command |");
while (<COMMAND_IN>)
{ #The COMMAND_IN will have the output of the command
#Read the output of your command here...
$ans = $_;
}
close(COMMAND_IN);
return $ans;
}
Check whether it helps you
I recommend the capture and capture_err functions from Scriptalicious.
use Scriptalicious qw(capture);
my $output = capture('my_command', 'arg');

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, merely for copying those files, to additionally modifying those files. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (#ARGV) {
my $orig = $_;
(my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able get a file handle successfully as the following script and it's output shows:
use strict;
use warnings;
foreach $_ (#ARGV) {
print "$_\n";
open(my $fh, "+>", $_) or die $!;
print $fh;
#while (my $line = <$fh>) {
# print $line;
#}
close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files lines, which I've tried to do with the commented out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in pelrdoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline read or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDIN you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also your mode argument to open, causes it to clobber the contents of the file. At which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

STDOUT to array Perl

I am compiling a Perl program, i am writing the output STDOUT to a file. In the same program , i want to run another small script using while function on the output of STDOUT. So, I need to save the output of first script in an array, then i can use in while<#array>. Like
open(File,"text.txt");
open(STDOUT,">output,txt");
#file_contents=<FILE>;
foreach (#file_contents){
//SCRIPT GOES HERE//
write;
}
format STDOUT =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
//Here I want to use output of above program in while loop //
while(<>){
}
How can i save the output of first program into array so that i can use in while loop, or how can i directly use STDOUT in while loop. I have to make sure that first part is completely executed. Thanks in advance.
Since you remapped STDOUT so it writes to a file, you could presumably close STDOUT, and then reopen the file for reading.
Quite where you're going to send any other output is a bit of a mystery, but presumably you can resolve that. Were it me, I'd not fiddle with STDOUT. I'd make the script write to a file handle:
use strict;
use warnings;
open my $input, "<", "text.txt" or die "A horrible death";
open my $output, ">", "output.txt" or die "A horrible death";
my #file_contents = <$input>;
close($input);
foreach (#file_contents)
{
# Script goes here
print $output "Any information that goes to output\n";
}
close $output;
open my $reread, "<", "output.txt" or die "A horrible death";
while (<$reread>)
{
# Process the previous output
}
Note the use of lexical file handles, the checking that the open worked, the close when finished with the input file, the use of use strict; and use warnings;. (I've only been working with Perl for 20 years and I know I don't trust my scripts until they run clean with those settings.)
I assume you want to reopen STDOUT in order to make the write function work. However, the correct solution for that is to either specify the file handle, or to a lesser extent, to use select.
write FILEHANDLE;
or
select FILEHANDLE;
write;
Unfortunately, it seems the IO of perlform is a bit arcane, and does not seem to allow for lexical file handles.
Your problem is you can't reuse the formatted text within the program, so a bit of trixy programming is required. What you can do is open a file handle that prints to a scalar. Which is another somewhat arcane perl functionality, but in this case, it might be the only way to do this directly.
# Using FOO as format to avoid destroying STDOUT
format FOO =
VARIABLE #<<<<<< #<<<<<< #<<<<<<
$x $y $z
.
my $foo;
use autodie; # save yourself some typing
open INPUT, '<', "text.txt"; # normally, we would add "or die $!" on these
open FOO, '>', \$foo; # but now autodie handles that for us
open my $output, '>', "output.txt";
while (<FILE>) {
$foo = ""; # we need to reset $foo each iteration
write FOO; # write to the file handle instead
print $output $foo; # this now prints $foo to output.txt
do_something($foo); # now you can also process the text at the same time
}
As you'll notice, we now first print the formatted line to the scalar $foo. While it is there, we can handle it as regular data, so there's no need to save to a file and reopening it to get to the data.
Each iteration, data is concatenated to the end of $foo, so to avoid accumulation, we need to reset $foo. The best way to handle this would be to make $foo lexical within the scope, but unfortunately we need $foo to be declared outside the while loop in order to be able to use it in the open statement.
It might be possible to use local $foo inside the while-loop, but I think that's adding yet more bad practice to this already very bad hack.
Conclusion:
With all this said and done, I suspect the best way to handle this is to not use perlform at all, and format your data in some other way. While perlform might be well suited to print to a file, it is not the best suited for what you have in mind. I recall this question from earlier, perhaps there was some other answer that would work better. Such as using sprintf, like Jonathan suggested
Assuming the output from your first program is tab-delimited:
while (<>) {
chomp $_;
my ($variable, $x, $y, $z) = split("\t", $_);
# do stuff with values
}

How do I get a filehandle from the command line?

I have a subroutine that takes a filehandle as an argument. How do I make a filehandle from a file path specified on the command line? I don't want to do any processing of this file myself, I just want to pass it off to this other subroutine, which returns an array of hashes with all the parsed data from the file.
Here's what the command line input I'm using looks like:
$ ./getfile.pl /path/to/some/file.csv
Here's what the beginning of the subroutine I'm calling looks like:
sub parse {
my $handle = shift;
my #data = <$handle>;
while (my $line = shift(#data)) {
# do stuff
}
}
Command line arguments are available in the predefined #ARGV array. You can get the file name from there and use open to open a filehandle to it. Assuming that you want read-only access to the file, you would do it this way:
my $file = shift #ARGV;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
parse($fh);
Note that the or die... checks the call open for success and dies with an error message if it wasn't. The built-in variable $! will contain the (OS dependent) error message on failure that tells you why the call wasn't successful. e.g. "Permission denied."
parse(*ARGV) is the simplest solution: the explanation is a bit long, but an important part of learning how to use Perl effectively is to learn Perl.
When you use a null filehandle (<>), it actually reads from the magical ARGV filehandle, which has special semantics: it reads from all the files named in #ARGV, or STDIN if #ARGV is empty.
From perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk. Input from <> comes either from standard
input, or from each file listed on the command line. Here’s how it
works: the first time <> is evaluated, the #ARGV array is checked, and
if it is empty, $ARGV[0] is set to "-", which when opened gives you
standard input. The #ARGV array is then processed as a list of
filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn’t so cumbersome to say, and will actually work. It
really does shift the #ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally--<> is just a
synonym for <ARGV>, which is magical. (The pseudo code above doesn’t
work because it treats <ARGV> as non-magical.)
You don't have to use <> in a while loop -- my $data = <> will read one line from the first non-empty file, my #data = <>; will slurp it all up at once, and you can pass *ARGV around as if it were a normal filehandle.
This is what the -n switch is for!
Take your parse method, and do this:
#!/usr/bin/perl -n
#do stuff
Each line is stored in $_. So you run
./getfile.pl /path/to.csv
And it does this.
See here and here for some more info about these. I like -p too, and have found the combo of -a and -F to be really useful.
Also, if you want to do some extra processing, add BEGIN and end blocks.
#!/usr/bin/perl -n
BEGIN {
my $accumulator;
}
# do stuff
END {
print process_total($accumulator);
}
or whatever. This is very, very useful.
Am I missing something or are you just looking for the open() call?
open($fh, "<$ARGV[0]") or die "couldn't open $ARGV[0]: $!";
do_something_with_fh($fh);
close($fh);