Loop Find Command's Output - perl

I'm wanting to issue the find command in Perl and loop through the resulting file paths. I'm trying it like so (but not having any luck):
my $cmd;
open($cmd, '-|', 'find $input_dir -name "*.fastq.gz" -print') or die $!;
while ($line = <$cmd>) {
print $line;
}
close $cmd;
Any ideas?
Thanks

You're not applying enough escaping to the * character.
Prepending a \ should fix it.
It's better not to invoke the shell in the first place,
by separating the arguments:
use warnings;
use strict;
open(my $cmd, '-|', 'find', $input_dir, '-name' ,'*.fastq.gz', '-print') or die $!;
while (my $line = <$cmd>) {
print $line;
}
close $cmd;

Your problem seems to be using single quotes. Your variable will not be interpolated, but the variable name will be fed to find as-is.
But why not use File::Find?
> perl -MFile::Find -lwe '
$foo = "perl";
find ( sub { /\.pl$/i or return; print $File::Find::name }, $foo);'
perl/foo.pl
perl/parsewords.pl
perl/yada.pl
Here, the wanted subroutine is simply a pattern match against the file name. We exit (return from) the subroutine unless the extension is .pl, else we print the file name with the relative path.

If you were to do
print 'find $input_dir -name "*.fastq.gz" -print';
The problem should become obvious: Single-quotes don't interpolate. You probably meant to do
open(my $cmd_fh, '-|', qq{find $input_dir -name "*.fastq.gz" -print}) or die $!;
but that's buggy too. You don't convert $input_dir into a shell literal. Two solutions present themselves.
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote("find", $input_dir, "-name", "*.fastq.gz", "-print");
open(my $cmd_fh, '-|', $cmd) or die $!;
Or
my @cmd = ("find", $input_dir, "-name", "*.fastq.gz", "-print");
open(my $cmd_fh, '-|', @cmd) or die $!;

To read the output of a command, use the backtick operator.
my $command = "find $input_dir ..."; # interpolate the input directory
my $output = `$command`; # be careful here
my @lines = split /\n/ => $output; # split into single lines
for my $line (@lines) { # iterate
# do something with $line
}
I think it's much more readable than piping. The downside is that it blocks until the command has finished, so if you want to process huge outputs with lots of lines, the pipe approach may be better.
But you may want to use an appropriate module. File::Find (core module) should fit your needs.
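As a rough sketch of the File::Find route for the original question, the following collects every path ending in .fastq.gz below a directory. The helper name find_fastq is made up for illustration, and the directory is assumed to come from the command line.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every path under $dir whose name ends in .fastq.gz.
sub find_fastq {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name includes the directory.
            return unless -f $_ && /\.fastq\.gz\z/;
            push @found, $File::Find::name;
        },
        $dir,
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage (assumed): perl script.pl /path/to/input_dir
if (@ARGV) {
    print "$_\n" for find_fastq( $ARGV[0] );
}
```

No shell is involved here, so directory names containing spaces or quotes need no escaping.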

Related

Return file handle from subroutine and pass to other subroutine

I am trying to create a couple of functions that will work together. getFH should take in the mode to open the file (either > or <), and then the file itself (from the command line). It should do some checking to see if the file is okay to open, then open it, and return the file handle. doSomething should take in the file handle, loop over the data, and do whatever. However, when the program reaches the while loop, I get the error:
readline() on unopened filehandle 1
What am I doing wrong here?
#! /usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Getopt::Long;
use Pod::Usage;
# command line param(s)
my $infile = '';
my $usage = "\n\n$0 [options] \n
Options
-infile Infile
-help Show this help message
\n";
# check flags
GetOptions(
'infile=s' => \$infile,
help => sub { pod2usage($usage) },
) or pod2usage(2);
my $inFH = getFh('<', $infile);
doSomething($inFH);
## Subroutines ##
## getFH ##
## @params:
## How to open file: '<' or '>'
## File to open
sub getFh {
my ($read_or_write, $file) = @_;
my $fh;
if ( ! defined $read_or_write ) {
die "Read or Write symbol not provided", $!;
}
if ( ! defined $file ) {
die "File not provided", $!;
}
unless ( -e -f -r -w $file ) {
die "File $file not suitable to use", $!;
}
unless ( open( $fh, $read_or_write, $file ) ) {
die "Cannot open $file",$!;
}
return($fh);
}
#Take in filehandle and do something with data
sub doSomething{
my $fh = @_;
while ( <$fh> ) {
say $_;
}
}
my $fh = @_;
This line does not mean what you think it means. It sets $fh to the number of items in @_ rather than the filehandle that is passed in - if you print the value of $fh, it will be 1 instead of a filehandle.
Use my $fh = shift, my $fh = $_[0], or my ($fh) = @_ instead.
As has been pointed out, my $fh = @_ will set $fh to 1, which is not a file handle. Use
my ($fh) = @_
instead to use list assignment.
In addition
-e -f -r -w $file will not do what you want. You need
-e $file and -f $file and -r $file and -w $file
And you can make this more concise and efficient by using underscore _ in place of the file name, which will re-use the information fetched for the previous file test
-e $file and -f _ and -r _ and -w _
However, note that you will be rejecting a request if a file isn't writeable, which makes no sense if the request is to open a file for reading. Also, -f will return false if the file doesn't exist, so -e is superfluous
It is good to include $! in your die strings as it contains the reason for the failure, but your first two tests don't set this value up, and so should be just die "Read or Write symbol not provided"; etc.
In addition, die "Cannot open $file", $! should probably be
die qq{Cannot open "$file": $!}
to make it clear if the file name is empty, and to add some space between the message and the value of $!
The lines read from the file will have a newline character at the end, so there is no need for say. Simply print while <$fh> is fine
Perl variable and subroutine names are conventionally snake_case, so get_fh and do_something are more usual
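Putting the points above together, a minimal sketch of the two subroutines might look like this. The names get_fh and do_something follow the snake_case suggestion; the choice to check readability only in '<' mode is an assumption about what the checks are meant to achieve.

```perl
use strict;
use warnings;

# Open $file in the given mode ('<' or '>') and return the handle.
sub get_fh {
    my ( $mode, $file ) = @_;
    die "Read or write symbol not provided\n" unless defined $mode;
    die "File not provided\n"                 unless defined $file;

    # Only require a readable plain file when reading; a file opened
    # for writing may not exist yet.
    if ( $mode eq '<' ) {
        -f $file and -r _
            or die qq{File "$file" is not a readable plain file\n};
    }

    open( my $fh, $mode, $file ) or die qq{Cannot open "$file": $!\n};
    return $fh;
}

# Print every line read from the handle.
sub do_something {
    my ($fh) = @_;    # list assignment takes the first argument
    print while <$fh>;
    return;
}
```

With these in place, the calling code would be do_something( get_fh( '<', $infile ) ).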

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my @list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (@list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in declaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and -M $file < 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $out_fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
It should have matched, unless $_ contains the text in a different case.
Try this:
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and to implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files that were last modified within 1 day, and prints each matching filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
chomp(my @list = `find /home/ad -type f -mtime -1`);
foreach my $filename (@list) {
print "PROCESSING: $filename\n";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}
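If the phrase to search for arrives in a variable rather than being typed into the pattern, it may help to quote it so that spaces and punctuation are matched literally. A small sketch, where file_contains is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if any line of $file contains $phrase literally.
sub file_contains {
    my ( $file, $phrase ) = @_;
    open( my $fh, '<', $file ) or die qq{Cannot open "$file": $!\n};
    while ( my $line = <$fh> ) {
        # \Q...\E quotes regex metacharacters so the phrase is matched as-is.
        return 1 if $line =~ /\Q$phrase\E/;
    }
    return 0;
}
```

For example, file_contains( $filename, 'Status Code 1' ) could replace the inner loop's match; add the /i flag if case should be ignored.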

How to get the output for this code?

I have a file t_code.txt in which I want to replace all occurrences of the strings PIOMUX_UART1_TXD and PIOMUX_UART1_RXD with the strings in @array1 containing TXD and RXD respectively, and then print the result to a new file c_code2.txt, but it's not working
open my $f6, '<', 't_code.txt' or die $!;
my @lines = <$f6>;
my @newlines;
foreach (@lines) {
$_ =~ s/PIOMUX_UART1_TXD/ grep ( / TXD / )(@array1)/g;
$_ =~ s/PIOMUX_UART1_RXD/ grep ( / RXD / )(@array1)/g;
push(@newlines, $_);
}
close($f6);
open my $output, '>', 'c_code2.txt' or die "Can't open the output file!";
print $output @newlines;
close($output);
Since @array1 (a dreadful choice of identifier, by the way) doesn't change inside the loop, it is best to build the replacement strings outside instead of every time you make a replacement.
It isn't clear exactly what string you want to replace PIOMUX_UART1_TXD and PIOMUX_UART1_RXD with, but this code joins all the matching elements of the array with commas and uses that. I hope it's clear how to do something different if you need to.
I've also used a while loop, as there's no need to read the whole file into an array beforehand.
my ($in_file, $out_file) = qw/ t_code.txt c_code2.txt /;
open my $in_fh, '<', $in_file or die qq{Unable to open "$in_file" for reading: $!};
open my $out_fh, '>', $out_file or die qq{Unable to open "$out_file" for writing: $!};
my $txd = join ',', grep /TXD/, @array1;
my $rxd = join ',', grep /RXD/, @array1;
while ( <$in_fh> ) {
s/PIOMUX_UART1_TXD/$txd/g;
s/PIOMUX_UART1_RXD/$rxd/g;
print $out_fh $_;
}
close $out_fh or die $!;
Several problems in the code:
To be able to use code in the replacement part of a substitution, you must use the /e modifier.
In a s/// construct, you can't use / unquoted. Either change the separator, or backslash it.
The replacement part in a substitution is a string. In case of code, it's evaluated in scalar context. grep in scalar context returns the number of matches.
Cf:
#! /usr/bin/perl
use warnings;
use strict;
my @array1 = qw( aTXDb cRXDd );
while (<DATA>) {
s/PIOMUX_UART1_TXD/join q(), grep m=TXD=, @array1/eg;
s/PIOMUX_UART1_RXD/join q(), grep m=RXD=, @array1/eg;
print;
}
__DATA__
PIOMUX_UART1_TXD
PIOMUX_UART1_RXD
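The point about context can be checked in isolation. In this sketch the contents of @array1 are made-up sample data; the three assignments show grep in scalar context, in list context taking the first match, and in list context joined with commas.

```perl
use strict;
use warnings;

my @array1 = qw( aTXDb cRXDd eTXDf );    # sample data, assumed for illustration

my $count   = grep { /TXD/ } @array1;              # scalar context: number of matches
my ($first) = grep { /TXD/ } @array1;              # list context: first matching element
my $joined  = join ',', grep { /TXD/ } @array1;    # list context, then joined

print "$count $first $joined\n";    # prints "2 aTXDb aTXDb,eTXDf"
```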

In Perl, how can filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my @files;
my $files;
opendir(DIR,$path) or die "$!";
@files = grep { /\.log$/} readdir(DIR);
my @out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (@files) {
open (FILE, "$files");
my @line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
Few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( @files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.log> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' with opendir on $outpath and then again with open on $out, which isn't going to work.
$out is never assigned, so it isn't a file name. You need to do something different to output to different files. opendir isn't a valid way to open an output.
because you're using opendir that's actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name, doing a chdir are possible solutions. But that's one of the reasons I like glob because it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding @Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one-off scripts more quickly, and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\@ARGV);
sub run {
my $argv = shift;
unless (@$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
if ($outfh) {
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor also matches just before a trailing newline, which is why the script uses \z.
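A short check of that last note, using a made-up file name that ends in a newline:

```perl
use strict;
use warnings;

my $name = "evil.log\n";    # hypothetical file name ending in a newline

my $dollar = $name =~ /[.]log$/  ? 1 : 0;    # $ also matches before a final newline
my $strict = $name =~ /[.]log\z/ ? 1 : 0;    # \z matches only at the very end

print "dollar=$dollar z=$strict\n";    # prints "dollar=1 z=0"
```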

Perl problems printing output to a new file

I want to remove all lines in a text file that start with HPL_. I have achieved this and can print to screen, but when I try to write to a file, I just get the last line of the amended text printed in the new file. Any help please!
open(FILE,"<myfile.txt");
@LINES = <FILE>;
close(FILE);
open(FILE,">myfile.txt");
foreach $LINE (@LINES) {
@array = split(/\:/,$LINE);
my $file = "changed";
open OUTFILE, ">$file" or die "unable to open $file $!";
print OUTFILE $LINE unless ($array[0] eq "HPL_");
}
close(FILE);
close (OUTFILE);
exit;
You just want to remove all lines that start with HPL_? That's easy!
perl -pi -e 's/^HPL_.*//s' myfile.txt
Yes, it really is just a one-liner. :-)
If you don't want to use the one-liner, re-write the "write to file" portion as follows:
my $file = "changed";
open( my $outfh, '>', $file ) or die "Could not open file $file: $!\n";
foreach my $LINE (@LINES) {
my @array = split(/:/,$LINE);
next if $array[0] eq 'HPL_';
print $outfh $LINE;
}
close( $outfh );
Note how you are open()ing the file each time through the loop. This is causing the file to only contain the last line, as using open() with > means "overwrite what's in the file". That's the major problem with your code as it stands.
Edit: As an aside, you want to clean up your code. Use lexical filehandles as I've shown. Always add the three lines that tchrist posted at the top of every one of your Perl programs. Use the three-argument version of open(). Don't slurp the entire file into an array, as if you try to read a huge file it could cause your computer to run out of memory. Your program could be re-written as:
#!perl
use strict;
use autodie;
use warnings FATAL => "all";
my $infile = "myfile.txt";
my $outfile = "changed.txt";
open( my $infh, '<', $infile );
open( my $outfh, '>', $outfile );
while( my $line = <$infh> ) {
next if $line =~ /^HPL_/;
print $outfh $line;
}
close( $outfh );
close( $infh );
Note how with use autodie you don't need to add or die ... to the open() function, as the autodie pragma handles that for you.
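To see why opening with '>' inside the loop leaves only the last line, here is a small self-contained comparison. The file names, the sample lines, and the slurp helper are all made up for the demonstration, and it writes to a temporary directory rather than to myfile.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir( CLEANUP => 1 );
my @lines = ( "keep 1\n", "HPL_skip\n", "keep 2\n" );    # sample data

# Mistake: reopening with '>' on every iteration truncates the file each time.
my $bad = "$dir/bad.txt";
for my $line (@lines) {
    open( my $fh, '>', $bad ) or die "Cannot open $bad: $!";
    print {$fh} $line unless $line =~ /^HPL_/;
    close $fh or die "Cannot close $bad: $!";
}

# Fix: open once before the loop, close once after it.
my $good = "$dir/good.txt";
open( my $out, '>', $good ) or die "Cannot open $good: $!";
for my $line (@lines) {
    print {$out} $line unless $line =~ /^HPL_/;
}
close $out or die "Cannot close $good: $!";

# Hypothetical helper: read a whole file back for comparison.
sub slurp {
    my ($file) = @_;
    open( my $fh, '<', $file ) or die "Cannot open $file: $!";
    local $/;    # undefined record separator: read everything at once
    return scalar <$fh>;
}

print "bad:\n",  slurp($bad);     # only "keep 2"
print "good:\n", slurp($good);    # "keep 1" and "keep 2"
```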
The issue with your code is that you open the file for output within your line-processing loop which, due to your use of the '>' form of open, opens the file each time for write, obliterating any previous content.
Move the invocation of open() to the top of your file, above the loop, and it should work.
Also, I'm not sure of your intent but at line 4 of your example, you reopen your input file for write (using '>'), which also clobbers anything it contains.
As a side note, you might try reading up on Perl's grep() command which is designed to do exactly what you need, as in:
#!/usr/bin/perl
use strict;
use warnings;
open(my $in, '<', 'myfile.txt') or die "failed to open input for read: $!";
my @lines = <$in> or die 'no lines to read from input';
close($in);
# collect all lines that do not begin with HPL_ into @result
my @result = grep ! /^HPL_/, @lines;
open(my $out, '>', 'changed.txt') or die "failed to open output for write: $!";
print { $out } @result;
close($out);