search for text pattern in a directory using perl script - perl

could anyone please share with me a snippet where i can do a grep search in perl file. Example,
i need this grep: grep 1115852311 /opt/files/treated/postpaid/*
to be done in perl script and print all the matches
tried the below, but did not work :
my $start_dir= "\opt\files\treated\postpaid\";
my $file_name = "*";
my #filematches;
opendir(DIR, "$start_dir");
#xml_files = grep(1115852311,readdir(DIR));
print #xml_files;

A good start would be to read the documentation for grep(). If you do, you'll see that Perl's grep() works rather differently to the Unix grep command. The Unix command just looks for text in a list of files. The Perl version works on any list of data and returns any elements in that list for which an arbitrary Boolean expression is true.
A Perl version of the Unix command would look something like this:
while (<$some_open_filehandle>) {
print if /$some_string/;
}
That's not quite what you want, but we can use it as a start. First, let's write something that takes a filename and string and checks whether the string appears in the file:
sub is_string_in_file {
my ($filename, $string) = #_;
open my $fh, '<', $filename or die "Cannot open file '$filename': $!\n";
return grep { /$string/ } <$fh>;
}
We can now use that in a loop which uses readdir() to get a list of files.
my #files;
my $dir = '/opt/files/treated/postpaid/';
opendir my $dh, $dir or die $!;
while (my $file = readdir($dh)) {
if (is_string_in_file("$dir$file", 1115852311) {
push #files, "$dir$file";
}
}
After running that code, the list of files that contain your string will be in #files.
You might want to look at glob() instead of opendir() and readdir().

used the below snippet to achieve what i wanted
#!/usr/bin/perl
use strict;
use warnings;
sub is_string_in_file {
my ($filename, $string) = #_;
open my $fh, '<', $filename
or die "Cannot open file \n";
while(my $line = <$fh>){
if($line =~ /$string/){
print $string;
print $filename."\n";
}
}
#return grep { $_ eq $string } <$fh>;
}
my #files;
my $dir = '/opt/files/treated/postpaid/';
opendir my $dh, $dir or die $!;
while (my $file = readdir($dh)) {
is_string_in_file("$dir$file", 1115852311);
}

Related

Unable to open files returned by readdir in Perl [duplicate]

This question already has answers here:
Why can't I open files returned by Perl's readdir?
(2 answers)
Closed 7 years ago.
I have a problem with a Perl script, as follows.
I must open and analyze all the *.txt files in a directory, but I cannot.
I can read file names that are saved in the #files array and printed, but I cannot open those files for reading.
This is my code:
my $dir= "../Scrivania/programmi" ;
opendir my ($dh), $dir;
my #files = grep { -f and /\.txt/i } readdir $dir;
closedir $dh;
for my $file ( #files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
print "sono nel foreach\n";
print " in : "."$fh\n";
#open(CANALI,$fh);
##righe=<CANALI>;
#close(CANALI);
#print "canali:"."#righe\n";
#foreach $canali (#righe)
#{
# $canali =~ /\d\d:\d\d (-) (.*)/;
# $ora= $1;
#
# if($hhSplit[0] == $ora)
# {
# push(#output, "$canali");
#
# }
#}
}
The main problem you have is that the file names returned by readdir have no path, so you're trying to open, say, x.txt when you should be opening ../Sc/direct/x.txt. The file doesn't exist in the current working directory so your open call fails
You also have a strange mixture of stuff in glob("$dir/(.*).txt/") which looks a little like a regex pattern, which glob doesn't understand. The value of $dir is a directory handle left open from the opendir on the first line. What you should be using is glob '../Sc/direct/*.txt', but then there's no need for the readdir
There are two ways to find the contents of a file. You can use opendir and readdir to read everything in the directory, or you can use glob
The first method returns only the bare name of each entry, which means you must concatenate each name with the path to the containing directory, preferably using catfile from File::Spec::Functions. It also includes the pseudo-directories . and .. so you must filter those out before you can use the list of names
glob has neither of these disadvantages. All the strings it returns are real directory entries, and they will include a path if you provided one in the pattern you passed as a parameter
You seem to have become rather muddled over the two, so I have written this program which differentiates between the two approaches. I hope it makes things clearer
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Spec::Functions qw/ catfile /;
my $dir = '../Sc/direct';
### Using glob
for my $file ( glob catfile($dir, '*.txt') ) {
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}
### Using opendir / readdir
opendir my ($dh), $dir;
my #files = grep { -f and /\.txt$/i } readdir $dir;
closedir $dh;
for my $file ( #files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}
Using $dir in the glob is incorrect. $dir is a GLOB type not a string value. Rather you should be looping over the #files array and looking for names that match what you want. Maybe something like so:
foreach my $fp (#files) {
if ($fp =~ /(.*).txt/) {
print "$fp is a .txt\n";
open (my $in, "<", $fp)
while (<$in>) ...
}
}

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my #list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (#list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in delcaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files where last modified within 1 day and print the filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my #list = `find /home/ad -type f -mtime -1`;
foreach my $filename (#list) {
print "PROCESSING: $filename";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

In Perl, how can filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my #files;
my $files;
opendir(DIR,$path) or die "$!";
#files = grep { /\.log$/} readdir(DIR);
my #out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (#files) {
open (FILE, "$files");
my #line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
Few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( #files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' to a directory path (I think?) which isn't going to work.
$outpath isn't, it's a file. You need to do something different to output to different files. opendir isn't really valid to an output.
because you're using opendir that's actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name, doing a chdir are possible solutions. But that's one of the reasons I like glob because it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding #Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one off scripts more quickly, and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\#ARGV);
sub run {
my $argv = shift;
unless (#$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.

Perl - search and replace across multiple lines across multiple files in specified directory

At the moment this code replaces all occurences of my matching string with my replacement string, but only for the file I specify on the command line. Is there a way to change this so that all .txt files for example, in the same directory (the directory I specify) are processed without having to run this 100s of times on individual files?
#!/usr/bin/perl
use warnings;
my $filename = $ARGV[0];
open(INFILE, "<", $filename) or die "Cannot open $ARGV[0]";
my(#fcont) = <INFILE>;
close INFILE;
open(FOUT,">$filename") || die("Cannot Open File");
foreach $line (#fcont) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close INFILE;
I have also tried this:
perl -p0007i -e 's/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/m' *.txt
But have noticed that is only changes the first occurence of the matched pattern and ignores all the rest in the file.
I also have tried this, but it doesn't work in the sense that it just creates a blank file:
use v5.14;
use strict;
use warnings;
use DBI;
my $source_dir = "C:/Testing2";
# Store the handle in a variable.
opendir my $dirh, $source_dir or die "Unable to open directory: $!";
my #files = grep /\.txt$/i, readdir $dirh;
closedir $dirh;
# Stop script if there aren't any files in the list
die "No files found in $source_dir" unless #files;
foreach my $file (#files) {
say "Processing $source_dir/$file";
open my $in, '<', "$source_dir/$file" or die "Unable to open $source_dir/$file: $!\n";
open(FOUT,">$source_dir/$file") || die("Cannot Open File");
foreach my $line (#files) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close $in;
}
say "Status: Processing of complete";
Just wondering what am I missing from my code above? Thanks.
You could try the following:
opendir(DIR,"your_directory");
my #all_files = readdir(DIR);
closedir(DIR);
for (#all_files) .....

Perl Open: No such file or directory

I'm trying to read every text file in a directory into a variable then print the first 100 characters, including line breaks. However, Perl says that the files don't exist even though they really do exist.
use strict;
use warnings;
my $dir = "C:\\SomeFiles";
my #flist;
open(my $fh, "dir /a:-d /b $dir |") || die "$!";
while (<$fh>) {
if ($_ =~ /.*(.txt)$/i) {
push(#flist, $_);
}
}
foreach my $f (#flist) {
print "$dir\\$f";
my $txt = do {
local $/ = undef;
open(my $ff, "<", "$dir\\$f") || die "$!";
<$ff>;
};
print substr($txt, 0, 100);
}
When I run the script, the following is written to the console:
C:\SomeFiles\file1.txt
No such file or directory at script.pl line 19, <$fh> chunk 10.
It's looking at the right file and I'm certain that the file exists. When I try using this method to open a single file rather than getting each file via an array with foreach, it works just fine. Is there something obvious that I've overlooked here?
A better solution is to use readdir() instead (or File::Find if you ever want to do it recursively):
my $dir = "C:\\SomeFiles";
opendir(my $dh, $dir) || die "$!";
while (my $file = readdir($dh)) {
if ($file =~ /\\.txt$/i) {
print $file . "\n";
my $txt = do {
local $/ = undef;
open(my $ff, "<", "$dir\\$file") || die "$!";
<$ff>;
};
print substr($txt, 0, 100) . "\n";
}
}
closedir($dh);