Perl: How to search for an indefinite list of keywords in a list of files in a folder

Can anyone help me with a Perl script for the problem below?
File1.txt -> contains the keywords to search for:
Hello_
World!
+Bye
Temp-
File2 (can have any extension) is a file in which to search for the keywords, and likewise File3, File4, ...
I want to search for all the keywords from File1 in File2 and, if they are found, print the keyword along with the file and the line number in which that keyword was found.
I want the number of keywords and files to be indefinite: it must be possible to add and modify them.
open(MYINPUTFILE, "<expressions.txt");
# open for input
my(@lines) = <MYINPUTFILE>;
#print @lines;
my @files = grep ( -f ,<*main_log>,<*Project>);
$n = 0;
$l = 0;
#foreach my $file (@files) {
foreach my $line (@lines) {
    my @f = grep /$line/,@files;
    print "@f\n";
}
#}
}
Issue: I tried to run the code above, but it prints nothing at my command prompt. I am using Windows 7.

This answer is based on your posted code:
use strict; # always use these
use warnings;
open( my $kw, '<', 'expressions.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>);
# loop over each file to search keywords in
foreach my $file (@files) {
    open(my $fh, '<', $file) or die $!;
    my @content = <$fh>;
    close($fh);
    foreach my $kw (@keywords) {
        my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
        my $l = 0; # restart the line count for every keyword
        foreach (@content) { # go through every line for this keyword
            $l++;
            printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
                if /$search/;
        }
    }
}
Regarding the questions in the comments below:
The innermost loop just counts line numbers with $l++ and prints each hit as it occurs; the if /$search/ is still part of the printf statement above it. It could also be written as
if ( /$search/ ) {
    printf ...
}
The printf is used to format the output. You could also do this with a plain print and concatenate all the needed variables; I just prefer it this way.
This assumes that you want a list of found lines per keyword for every file. You have to switch the order and logic of the loops over @keywords and @content to get the output ordered by line.
For additional functionality regarding comments in the keyword file, you would have to post-process its content to separate the search terms from the comments, possibly into a hash with the search term as key and the comment as value. Then you could use only the hash keys for the search (see the innermost loop) and print the comment, if one exists, as an additional line.
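The line-ordered variant mentioned above can be sketched roughly as follows. This is only an illustration, not the answer's own code: the helper name find_keywords and the in-memory sample text are assumptions made for the demo, and index() is swapped in for the quotemeta/regex pair because it also searches literally.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [keyword, line number, line text] triple
# per hit, ordered by line number first and by keyword second.
sub find_keywords {
    my ($fh, @keywords) = @_;
    my @hits;
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        chomp $line;
        for my $kw (@keywords) {
            # index() does a literal substring search, so no quotemeta is needed
            push @hits, [$kw, $line_no, $line] if index($line, $kw) >= 0;
        }
    }
    return @hits;
}

# Demo on an in-memory file (an assumption for illustration);
# a real script would open each entry of @files instead.
my $sample = "Hello_ there\nnothing here\n+Bye and Hello_\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
for my $hit (find_keywords($fh, 'Hello_', '+Bye')) {
    printf "Found keyword %s on line %d: %s\n", @$hit;
}
close $fh;
```

Reading line by line also avoids holding a whole file in memory, which may matter for large logs.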

Related

Perl search for a content in file and take out value using regex

I have a set of log files. In each file I want to search for the word Sum and extract the value that follows it.
Instead of reading the file myself, I am using Tie::File to get the file content as an array, from which I plan to pull out whatever value I need.
Here is my code:
...
my $logpath = "C:/Users/Vinod/Perl/LOG/";
opendir(DIR, $logpath);
while (my $file = readdir(DIR)) {
    next unless (-f "$logpath/$file");
    next unless ($file =~ m/\.log$/);
    my @lines;
    print "$file\n";
    tie @lines, 'Tie::File', $file, mode => O_RDWR;
    for (@lines) {
        print $_ if($_ =~ m/Sum/);
    }
    untie @lines;
}
closedir(DIR);
Here is what I am trying to extract from my log file:
test_log_file.log
....
....
=
> Sum: 10 PC's, 5 UPS's
End...
From the above test_log_file.log I want to extract the value 10.
But the line print $_ if($_ =~ m/Sum/); is printing the entire file content. I have no idea how to pick out the line that contains both the Sum and PC keywords so that I can get the value 10 with a regex.
I am able to extract the Sum value with the command below:
$sum = qx/more $file | grep -i 'Sum' | grep 'PC' | awk -F " " '{print \$3}'/;
But I want to solve this within the Perl script itself.

Read line by line. Capture the number and output only the captured part (say needs use feature 'say'; or the -E switch):
while (<>) { say $1 if /Sum: ([0-9]+)/ }
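Folded back into a loop over the log directory, the same idea might look like the sketch below. The helper name pc_sum is hypothetical, and the pattern assumes that the number is always followed by the word PC, as in the sample line from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number after "Sum:" on a line that also
# mentions PC, or undef if the line does not match.
sub pc_sum {
    my ($line) = @_;
    return $line =~ /Sum:\s*(\d+)\s+PC/ ? $1 : undef;
}

my $logpath = 'C:/Users/Vinod/Perl/LOG';    # directory from the question
for my $file (glob "$logpath/*.log") {      # glob returns names including the directory
    open my $fh, '<', $file or die "cannot read $file: $!";
    while (my $line = <$fh>) {
        my $sum = pc_sum($line);
        print "$file: $sum\n" if defined $sum;
    }
    close $fh;
}
```

Because glob returns each name together with its directory, the open call does not depend on the current working directory.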

How to get a comment printed for each line of text that matches within a file?

I am trying to match each keyword/text/line given in a file called expressions.txt against all files matching *main_log. When a match is found, I want to print the comment that belongs to the matching keyword.
Is there a better way to get this printed?
expressions.txt
Hello World ! # I want to print this comments#
Bye* #I want this to print when Bye Is match with main_log#
:::
:::
Below is the code I used:
{
    open( my $kw, '<', 'expressions.txt' ) or die $!;
    my @keywords = <$kw>;
    chomp( @keywords ); # remove newlines at the end of keywords
    # get list of files in current directory
    my @files = grep { -f } ( <*main_log>, <*Project>, <*properties> );
    # loop over each file to search keywords in
    foreach my $file ( @files ) {
        open( my $fh, '<', $file ) or die $!;
        my @content = <$fh>;
        close( $fh );
        my $l = 0;
        foreach my $kw ( @keywords ) {
            my $search = quotemeta( $kw ); # otherwise keyword is used as regex, not literally
            #$kw =~ m/\[(.*)\]/;
            $kw =~ m/\((.*)\)/;
            my $temp = $1;
            print "$temp\n";
            foreach ( @content ) { # go through every line for this keyword
                $l++;
                printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_ if /$search/;
            }
        }
    }
I tried this code to print the comments mentioned within parentheses (...), but it does not print them the way I want, which is as follows:
If expressions.txt contains
Hello World ! # I want to print this comments#
and the string Hello World ! is found in my file called main_log, then only Hello World ! should be matched against main_log, but # I want to print this comments# should be printed as a comment so that the user understands the keyword.
The keywords can be of any length and contain any characters.
It worked fine, but I have one small doubt about writing the output to a file: I used perl -w Test.pl > my_output.txt at the command prompt, and I am not sure how to do the same inside the Perl script itself.
open( my $kw, '<', 'expressions.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# post-processing your keywords file
my $kwhashref = {
    map {
        /^(.*?)(#.*?#)*$/;
        defined($2) ? ($1 => $2) : ( $1 => undef )
    } @keywords
};
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>,<*properties>);
# loop over each file to search keywords in
foreach my $file (@files) {
    open(my $fh, '<', $file) or die $!;
    my @content = <$fh>;
    close($fh);
    my $l = 0;
    #foreach my $kw (@keywords) {
    foreach my $kw (keys %$kwhashref) {
        my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
        #$kw =~ m/\[(.*)\]/;
        #$kw =~ m/\#(.*)\#/;
        #my $temp = $1;
        #print "$temp\n";
        foreach (@content) { # go through every line for this keyword
            $l++;
            if (/$search/)
            {
                # only print if comment defined
                print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
                printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
                #printf '$output';
            }
        }
    }
}

Your example code has mismatched braces { ... } and won't compile.
If you were to add another closing brace to the end of your code then it would compile, but the line
$kw =~ m/\((.*)\)/;
will never succeed, since there are no parentheses anywhere in expressions.txt. If a match does not succeed, $1 retains its value from the most recent successful regex match operation.
You are also trying to search the lines from the files against the whole of each line retrieved from expressions.txt, when you should be splitting those lines into keywords and their corresponding comments.

This seems to be a follow-up to my answer to another question of yours. What I tried to suggest in its last paragraph would go right after the first three lines of your code:
# post-processing your keywords file
my $kwhashref = {
    map {
        /^(.*?)\s*(#.*?#)*$/; # \s* keeps trailing spaces out of the keyword
        defined($2) ? ($1 => $2) : ( $1 => undef )
    } @keywords
};
Now you have the keywords in a hashref, with the actual keywords to search for as keys and the comments as values, if they exist (using your #comment# at-the-end-of-the-line syntax here).
Your keyword loop would now have to use keys %$kwhashref, and you can additionally print the comment in the inner loop, adapted as shown in the answer I linked. The additional print:
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}); # only print if comment defined
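The question also asks how to write the result to a file from inside the script rather than with a shell redirect. A minimal sketch, assuming the #comment# syntax shown above; the helper name split_keyword_line is hypothetical, and my_output.txt is simply the file name used in the question:

```perl
use strict;
use warnings;

# Hypothetical helper: splits "keyword #comment#" into (keyword, comment).
# The comment is undef when the line has no trailing #...# part.
sub split_keyword_line {
    my ($line) = @_;
    my ($kw, $comment) = $line =~ /^(.*?)\s*(#.*#)?$/;
    return ($kw, $comment);
}

my ($kw, $comment) =
    split_keyword_line('Hello World ! # I want to print this comments#');

# Open a file for writing and print to that handle, instead of
# redirecting STDOUT with ">" on the command line.
open my $out, '>', 'my_output.txt' or die "cannot write my_output.txt: $!";
print {$out} "keyword: $kw\n";
print {$out} "comment: $comment\n" if defined $comment;
close $out or die "cannot close my_output.txt: $!";
```

Every print or printf in the search loop can take the same {$out} handle as its first argument.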

How do I find the line a word is on when the user enters text in Perl?

I have a simple text file that includes all 50 states. I want the user to enter a word and have the program return the line the specific state is on in the file or otherwise display a "word not found" message. I do not know how to use find. Can someone assist with this? This is what I have so far.
#!/bin/perl -w
open(FILENAME,"<WordList.txt"); #opens WordList.txt
my(@list) = <FILENAME>; #read file into list
my($state); #create private "state" variable
print "Enter a US state to search for: \n"; #Print statement
$line = <STDIN>; #use of STDIN to read input from user
close (FILENAME);

An alternative solution that reads only the parts of the file until a result is found, or the file is exhausted:
use strict;
use warnings;
print "Enter a US state to search for: \n";
my $line = <STDIN>;
chomp($line);
# open file with 3 argument open (safer)
open my $fh, '<', 'WordList.txt'
    or die "Unable to open 'WordList.txt' for reading: $!";
# read the file until result is found or the file is exhausted
my $found = 0;
while ( my $row = <$fh> ) {
    chomp($row);
    next unless $row eq $line;
    # $. is a special variable representing the line number
    # of the currently (most recently) accessed filehandle
    print "Found '$line' on line# $.\n";
    $found = 1; # indicate that you found a result
    last; # stop searching
}
close($fh);
unless ( $found ) {
    print "'$line' was not found\n";
}
General notes:
Always use strict; and use warnings; they will save you from a wide range of bugs.
Three-argument open is generally preferred, as is the or die ... clause. If the file cannot be opened, every later read from the filehandle will fail.
Documentation for $. can be found in perldoc perlvar.

Tool for the job is grep.
chomp ( $line ); #remove linefeeds
print "$line is in list\n" if grep { m/^\Q$line\E$/ } @list;
You could also transform your @list into a hash, and test that, using map:
chomp ( @list ); #hash keys must not carry the trailing newline
my %cities = map { $_ => 1 } @list;
if ( $cities{$line} ) { print "$line is in list\n";}
Note - the above, because of the presence of ^ and $, is an exact (and case-sensitive) match. You can easily adjust it to support fuzzier scenarios.
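One possible "fuzzier" adjustment is a case-insensitive lookup that also reports the line number. The sketch below is only an illustration: build_index is a hypothetical helper, and the three state names stand in for the contents of WordList.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: maps each lower-cased, trimmed entry
# to its 1-based line number.
sub build_index {
    my @entries = @_;
    my %line_of;
    my $n = 0;
    for my $entry (@entries) {
        $n++;
        (my $key = lc $entry) =~ s/^\s+|\s+$//g;   # trim spaces and the newline
        $line_of{$key} //= $n;                     # keep the first occurrence
    }
    return %line_of;
}

# Stand-in data; a real script would pass the lines read from WordList.txt.
my %line_of = build_index("Alabama\n", "Alaska\n", "Arizona\n");

my $query = lc 'ALASKA';    # normalise the user's input the same way
if (exists $line_of{$query}) {
    print "Found '$query' on line $line_of{$query}\n";
}
else {
    print "'$query' was not found\n";
}
```

The hash costs one pass over the list up front, after which every lookup is a single key test.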

Perl split a text file into chunks

I have a large txt file made up of thousands of articles, and I am trying to split it into individual files, one for each article, which I'd like to save as article_1, article_2, etc. Each article begins with a line containing the word /DOCUMENTS/.
I am totally new to Perl and any insight would be great (even advice on good documentation websites). Thanks a lot.
What I have tried so far looks like this:
#!/usr/bin/perl
use warnings;
use strict;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open IN, $source or die "can t read $source: $!\n";
while (<IN>)
{
    {
        open OUT, ">$destination" or die "can t write $destination: $!\n";
        if (/DOCUMENTS/)
        {
            close OUT ;
            $id++;
        }
    }
}
close IN;

Let's say that /DOCUMENTS/ appears by itself on a line. Thus you can make that the record separator.
use English qw<$RS>;
use File::Slurp qw<write_file>;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
{   local $RS = "\n/DOCUMENTS/\n";
    open my $in, $source or die "can t read $source: $!\n";
    while ( <$in> ) {
        chomp; # removes the line "\n/DOCUMENTS/\n"
        write_file( 'file' . ( ++$id ) . '.txt', $_ );
    }
    # $in is scoped by the surrounding braces (my "local block"),
    # so an explicit close is not strictly necessary
    close $in;
}
NOTES:
use English declares the global variable $RS. The "messy name" for it is $/. See perldoc perlvar.
A line separator is the default record separator. That is, the standard unit of file reading is a record, which is a "line" only by default.
As you will find in the linked documentation, $RS only takes literal strings. So, using the idea that the division between articles was '/DOCUMENTS/' all by itself on a line, I specified newline + '/DOCUMENTS/' + newline. If this is part of a path that occurs somewhere on the line, then that particular value will not work for the record separator.
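The same record-separator idea can be sketched with core Perl only, without File::Slurp. The helper name read_articles and the in-memory sample text are assumptions made for illustration; the article_N.txt output names follow the naming asked for in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records separated by a "/DOCUMENTS/" line.
sub read_articles {
    my ($fh) = @_;
    local $/ = "\n/DOCUMENTS/\n";    # record separator, as in the answer above
    my @articles;
    while (my $record = <$fh>) {
        chomp $record;               # chomp strips the current value of $/
        push @articles, $record if length $record;
    }
    return @articles;
}

# In-memory sample (an assumption); a real script would open the source file.
my $text = "first article\n/DOCUMENTS/\nsecond article\nmore text\n";
open my $in, '<', \$text or die "cannot open in-memory file: $!";
my @articles = read_articles($in);
close $in;

my $id = 0;
for my $article (@articles) {
    my $name = 'article_' . ( ++$id ) . '.txt';
    open my $out, '>', $name or die "cannot write $name: $!";
    print {$out} $article;
    close $out or die "cannot close $name: $!";
}
```

Using local inside the sub restores $/ on return, so other reads in the program still see ordinary lines.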

Have you read Programming Perl? It is the best book for beginners!
I am not sure I understand what you are trying to do. I assume you have a text file containing articles and want to write each article to a separate file.
use warnings;
use strict;
use autodie qw(:all);
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open my $IN, '<', $source;
#open first file
open my $OUT, '>', $destination;
while (<$IN>) {
    chomp; # kill \n at the end
    if ($_ eq '/DOCUMENTS/') { # not sure whether this is the test you are looking for
        close $OUT;
        $id++;
        $destination = "file$id.txt";
        open $OUT, '>', $destination; # reuse the outer $OUT instead of declaring a new one
    } else {
        print {$OUT} $_, "\n"; # print into file with $id name (as you open above)
    }
}
close $OUT;
close $IN;

help merging perl code routines together for file processing

I need some Perl help getting these two processes/pieces of code to work together. I was able to get them working individually as a test, but I need help bringing them together, especially with the loop constructs. I'm not sure whether I should go with foreach. Anyway, the code is below.
Also, any best practices would be great too, as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if I read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);
    #Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        print "$1\n";
        print "$2\n";
        print "$3\n";
    }
    print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2" ;
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec";
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);

You are using an old style of Perl programming. I recommend that you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
    # return an array of files (@)
}
sub extract_file_info {
    # takes the file name and returns an array of values (filename attrs)
}
sub process_file {
    # reads the input file, takes the previous attribs and builds the output file
}
my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
    my @attrs = extract_file_info($ifile);
    process_file($ifile, @attrs);
}
Hope it helps

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}
sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;
    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - readdir returns bare file names, so the directory is prepended
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', "$target_dir$orig_name" or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
    my $data = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest = md5_hex($data);
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2";
        # - $#values is the last valid index; 0 .. scalar(@values) would
        #   run one element past the end of the array
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
            for (0 .. $#values);
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records. This could also be achieved by printing
    #   $extra inside the loop. Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;
    # - $in_fh and $out_fh will be closed automatically when they go out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing them unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.