Search string with multiple words in the pattern - perl

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my #list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (#list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For

There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in delcaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $fh "My first report generated by perl";
close $out_fh;
last;
}
}
}

it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {

Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files where last modified within 1 day and print the filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my #list = `find /home/ad -type f -mtime -1`;
foreach my $filename (#list) {
print "PROCESSING: $filename";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

Related

Counting number of lines with conditions

This is my script count.pl, I am trying to count the number of lines in a file.
The script's code :
chdir $filepath;
if (-e "$filepath"){
$total = `wc -l < file.list`;
printf "there are $total number of lines in file.list";
}
i can get a correct output, but i do not want to count blank lines and anything in the file that start with #. any idea ?
As this is a Perl program already open the file and read it, filtering out lines that don't count with
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
where $filename is "file.list." If by "blank lines" you mean also lines with spaces only then chagne regex to /^\s*$|^\s*#/. See grep, and perlretut for regex used in its condition.
That filehandle $fh gets closed when the control exits the current scope, or add close $fh; after the file isn't needed for processing any more. Or, wrap it in a block with do
my $num_lines = do {
open my $fh, '<', $filename or die "Can't open $filename: $!";
grep { not /^$|^\s*#/ } <$fh>;
};
This makes sense doing if the sole purpose of opening that file is counting lines.
Another thing though: an operation like chdir should always be checked, and then there is no need for the race-sensitive if (-e $filepath) either. Altogether
# Perhaps save the old cwd first so to be able to return to it later
#my $old_cwd = Cwd::cwd;
chdir $filepath or die "Can't chdir to $filepath: $!";
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $num_lines = grep { not /^$|^\s*#/ } <$fh>;
A couple of other notes:
There is no reason for printf. For all normal prints use say, for which you need use feature qw(say); at the beginning of the program. See feature pragma
Just in case, allow me to add: every program must have at the beginning
use warnings;
use strict;
Perhaps the original intent of the code in the question is to allow a program to try a non-existing location, and not die? In any case, one way to keep the -e test, as asked for
#my $old_cwd = Cwd::cwd;
chdir $filepath or warn "Can't chdir to $filepath: $!";
my $num_lines;
if (-e $filepath) {
open my $fh, '<', $filename or die "Can't open $filename: $!";
$num_lines = grep { not /^$|^\s*#/ } <$fh>;
}
where I still added a warning if chdir fails. Remove that if you really don't want it. I also added a declaration of the variable that is assigned the number of lines, with my $total_lines;. If it is declared earlier in your real code then of course remove that line here.
perl -ne '$n++ unless /^$|^#/ or eof; print "$n\n" if eof'
Works with multiple files too.
perl -ne '$n++ unless /^$|^#/ or eof; END {print "$n\n"}'
Better for a single file.
open(my $fh, '<', $filename);
my $n = 0;
for(<$fh>) { $n++ unless /^$|^#/}
print $n;
Using sed to filter out the "unwanted" lines in a single file:
sed '/^\s*#/d;/^\s*$/d' infile | wc -l
Obviously, you can also replace infile with a list of files.
The solution is very simple, no any magic.
use strict;
use warnings;
use feature 'say';
my $count = 0;
while( <> ) {
$count++ unless /^\s*$|^\s*#/;
}
say "Total $count lines";
Reference:
<>

Perl Script: sorting through log files.

Trying to write a script which opens a directory and reads bunch of multiple log files line by line and search for information such as example:
"Attendance = 0 " previously I have used grep "Attendance =" * to search my information but trying to write a script to search for my information.
Need your help to finish this task.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/path/';
opendir (DIR, $dir) or die $!;
while (my $file = readdir(DIR))
{
print "$file\n";
}
closedir(DIR);
exit 0;
What's your perl experience?
I'm assuming each file is a text file. I'll give you a hint. Try to figure out where to put this code.
# Now to open and read a text file.
my $fn='file.log';
# $! is a variable which holds a possible error msg.
open(my $INFILE, '<', $fn) or die "ERROR: could not open $fn. $!";
my #filearr=<$INFILE>; # Read the whole file into an array.
close($INFILE);
# Now look in #filearr, which has one entry per line of the original file.
exit; # Normal exit
I prefer to use File::Find::Rule for things like this. It preserves path information, and it's easy to use. Here's an example that does what you want.
use strict;
use warnings;
use File::Find::Rule;
my $dir = '/path/';
my $type = '*';
my #files = File::Find::Rule->file()
->name($type)
->in(($dir));
for my $file (#files){
print "$file\n\n";
open my $fh, '<', $file or die "can't open $file: $!";
while (my $line = <$fh>){
if ($line =~ /Attendance =/){
print $line;
}
}
}

In Perl, how can filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my #files;
my $files;
opendir(DIR,$path) or die "$!";
#files = grep { /\.log$/} readdir(DIR);
my #out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (#files) {
open (FILE, "$files");
my #line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
Few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( #files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' to a directory path (I think?) which isn't going to work.
$outpath isn't, it's a file. You need to do something different to output to different files. opendir isn't really valid to an output.
because you're using opendir that's actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name, doing a chdir are possible solutions. But that's one of the reasons I like glob because it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding #Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one off scripts more quickly, and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\#ARGV);
sub run {
my $argv = shift;
unless (#$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.

Perl - search and replace across multiple lines across multiple files in specified directory

At the moment this code replaces all occurences of my matching string with my replacement string, but only for the file I specify on the command line. Is there a way to change this so that all .txt files for example, in the same directory (the directory I specify) are processed without having to run this 100s of times on individual files?
#!/usr/bin/perl
use warnings;
my $filename = $ARGV[0];
open(INFILE, "<", $filename) or die "Cannot open $ARGV[0]";
my(#fcont) = <INFILE>;
close INFILE;
open(FOUT,">$filename") || die("Cannot Open File");
foreach $line (#fcont) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close INFILE;
I have also tried this:
perl -p0007i -e 's/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/m' *.txt
But have noticed that is only changes the first occurence of the matched pattern and ignores all the rest in the file.
I also have tried this, but it doesn't work in the sense that it just creates a blank file:
use v5.14;
use strict;
use warnings;
use DBI;
my $source_dir = "C:/Testing2";
# Store the handle in a variable.
opendir my $dirh, $source_dir or die "Unable to open directory: $!";
my #files = grep /\.txt$/i, readdir $dirh;
closedir $dirh;
# Stop script if there aren't any files in the list
die "No files found in $source_dir" unless #files;
foreach my $file (#files) {
say "Processing $source_dir/$file";
open my $in, '<', "$source_dir/$file" or die "Unable to open $source_dir/$file: $!\n";
open(FOUT,">$source_dir/$file") || die("Cannot Open File");
foreach my $line (#files) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close $in;
}
say "Status: Processing of complete";
Just wondering what am I missing from my code above? Thanks.
You could try the following:
opendir(DIR,"your_directory");
my #all_files = readdir(DIR);
closedir(DIR);
for (#all_files) .....

Select rows based on text pattern

I want to extract rows from a file that match a particular pattern and I want to do this for over 500 files. It should have the ability to retain the unique name of the file as well.
I used awk but then i have to do each file individually.
c:\>gawk "/S1901/" Census_Tract_*.csv > Census_Tract_*.csv
In the example shown in the link here (http://bit.ly/nMX8qh) I want to retain only those records that have S1901 in them. Apologies for the external link but i am not able to retain formatting of the table.
I found some perl code that I used to write it but it retains all the rows and does not select only those rows/records where the pattern matches. Any tips would be much appreciated. The perl code is below:
#perl -w
$pattern = "Subject_Census*.csv"; # process only those files that match pattern
while (defined ($in = glob($pattern))) {
($out = $in) =~ s/\.csv$/.outcsv/; # read from "xyz.in" and write to "xyz.out"
open (IN, "<", $in) or die "Can't open $in for reading: $!";
open (OUT,">>", $out) or die "Can't open $out for writing: $!";
while (<IN>) {
$mystring =~ /S1901/;
print OUT $_ if $mystring == 0;
}
close (IN) or die "Can't close $in: $!"; # good idea to do some housekeeping
close (OUT) or die "Can't close $out: $!";
}
Untested:
use strict;
use warnings;
use autodie;
my $files_list_filename = 'files.txt';
open my $fl, '<', $files_list_filename;
my #list_of_files = <$fl>;
chomp #list_of_files;
close $fl;
foreach my $file ( #list_of_files ) {
open my $test_fh, '<', $file;
while ( my $line = <$test_fh> ) {
if( $line =~ m/S1901/ ) {
print "$file at $.: $line";
}
}
close $test_fh;
}
Is that sort of what you had in mind? It opens a file named filelist.txt and reads in a list of however many filenames you want to give it. Then it iterates over that list, opening each file one by one, scanning each file one by one, and if a line is found containing the trigger text, it prints the filename and line number, as well as the line itself where the trigger was met. Then it moves on to the next.
perl -ni.bak -e 'print if /S1901/' Subject_Census*.csv