Perl : search a keywords from a file and display the occurence

Perl : search a keywords from a file and display the occurence - perl

I have two files
1. input.txt
2. keyword.txt
input.txt has contents like
.src_ref 0 "call.s" 24 first
0x000000 0x5a80 0x0060 BRA.l 0x60
.src_ref 0 "call.s" 30 first
0x000002 0x1bc5 RETI
.src_ref 0 "call.s" 31 first
0x000003 0x6840 MOV R0L,R0L
.src_ref 0 "call.s" 35 first
0x000004 0x1bc5 RETI
keyword.txt has contents
MOV
BRA.l
RETI
ADD
SUB
..
etc
Now I want to read this keyword.txt file and search it in input.txt file and find how many times MOV has occured,how many times BRA.l has occured and so on.
So far I have managed to get it working from a single file itself. here is the code
#!/usr/bin/perl
use strict;
use warnings;
sub retriver();
my #lines;
my $lines_ref;
my $count;
$lines_ref=retriver();
#lines=#$lines_ref;
$count=#lines;
print "Count :$count\nLines\n";
print join "\n",#lines;
sub retriver()
{
my $file='C:\Users\vk18434\Desktop\input.txt';
my $keyword_file = 'C:\Users\vk18434\Desktop\keywords.txt';
open FILE, $file or die "FILE $file NOT FOUND - $!\n";
my #contents=<FILE>;
open FILE, $keyword_file or die "FILE $file NOT FOUND - $!\n";
my #key=<FILE>;
my #filtered=grep(/^$key$/,#contents);
#my #filtered = grep $_ eq $keywords,#contents;
return \#filtered;
}
Output should look like:
MOV appeared 1 time
RETI appeared 2 times
Any help is appreciated. Request you to please help on this !!

I couldn't get your code working, but this code works and is a little easier to read IMO (change the paths back to the ones on your filesystem):
#!/usr/bin/perl
open(my $keywordFile, "<", '/Users/mark/workspace/stackOverflow/keyword.txt')
or die 'Could not open keywords.txt';
foreach my $key(<$keywordFile>) {
chomp $key;
open (my $file, '<', '/Users/mark/workspace/stackOverflow/input.txt')
or die 'Could not open input.txt';
my $count = 0;
foreach my $line (<$file>) {
my $number = () = $line =~ /$key/gi;
$count = $count + $number;
}
close($file);
print "$key was found $count times.\n";
}
The one confusing part is the crazy regex line. I found that on StackOverflow here, and didn't have time to come up with anything cleaner : Is there a Perl shortcut to count the number of matches in a string?

Check this and try:
#!/usr/bin/perl
use strict;
use warnings;
my (#text, #lines);
my $lines_ref;
my $count;
$lines_ref = &retriver;
sub retriver
{
my $file='input.txt';
my $keyword_file = 'keywords.txt';
open KEY, $keyword_file or die "FILE $file NOT FOUND - $!\n";
my #key=<KEY>;
my #filtered;
foreach my $keys(#key)
{
my $total = '0';
chomp($keys);
open FILE, $file or die "FILE $file NOT FOUND - $!\n";
while(<FILE>)
{
my $line = $_;
my $counter = () = $line =~ /$keys/gi;
$total = $total + $counter;
}
close(FILE);
print "$keys found in $total\n";
}
}

# perl pe3.pl
Prototype mismatch: sub main::retriver () vs none at pe3.pl line 36.
cygwin warning:
MS-DOS style path detected: C:\Users\xxxxx\Desktop\input.txt
Preferred POSIX equivalent is: /cygdrive/c/Users/xxxxx/Desktop/input.txt
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
Count :3
Lines
BRA.l
RETI
RETI

Related

how to combine the code to make the output is on the same line?

Can you help me to combine both of these progeam to display the output in a row with two columns? The first column is for $1 and the second column is $2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
this is my output for code 1 which contain 26 line of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output code 1 also has same number of line.

paste is useful for this: assuming your shell is bash, then using process substitutions
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
And with this, you con't need to try and "merge" your perl programs.

To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
}
print "$levels, $timing\n";
close (FILE);

You still haven't given us one vital piece of information - what does the input data looks like. Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's possible for both pieces of information to be on the same line - as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to your question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration
if ($levels and $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);

Perl simple filehandling of text

What this program is meant to do is that it reads a text file which looks like:
Item \t\t Price
apple \t\t 20
orange \t\t 50
lime \t\t 30
I'm using split function to split these 2 columns and then i should apply a -25% discount on all items and print it out to a new file. My code so far does what i want but the new text file has a '0' value under my last number in price column. I also get 2 errors if i run it with "use warnings" which are:
Use of uninitialized value $item in multiplication * ...
Use of uninitialized value $item[0] in concatenation (.) ...
I should also tell total number of items calculated but i get like 5 1's instead of 5. (11111 instead of 5)
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
print OUT $header;
while (<IN>) {
chomp;
my #items = split(/\t\t/);
foreach my $item ($items[1]){
my $discount = $item * (0.75);
print OUT "$items[0]\t\t$discount\n";
}
}

This is too complicated and not clear what are you doing in foreach loop and you are not skipping empty lines. Keep it simple:
use warnings;
use strict;
use v5.10;
<>; # skip header
while(my $line = <>)
{
chomp $line;
next unless ($line);
my ($title, $price ) = split /\s+/, $line;
if( $title && defined $price )
{
$price *= 0.75;
say "$title\t\t$price";
}
}
and run like
perl script.pl <input.txt >output.txt

use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
my $item;
my $price;
print OUT $header;
while (<IN>) {
chomp;
($item, $price) = split(/\t\t/);
my $discount = $price*0.75;
print OUT "$item $discount\n";
}
This should help! :)

If the total item count isn't very important to you:
$ perl -wane '$F[1] *= 0.75 if $. > 1; print join("\t", #F), "\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
If you really need the total item count:
$ perl -we 'while (<>) { #F = split; if ($. > 1) { $F[1] *= 0.75; $i++ } print join("\t", #F), "\n"; } print "$i items\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
3 items

I'd use this approach
#!/usr/bin/perl
use strict;
use warnings;
my %items;
my $filename = 'shop.txt';
my $discount = 'discount.txt';
open my $in, '<', $filename or die "Failed to open file! : $!\n";
open my $out, ">", $discount or die "Can't open output file for writing: $!";
print $out "Item\t\tPrice\n";
my $cnt = 0;
while (my $line = <$in>) {
chomp $line;
if (my ($item,$price) = $line =~ /(\w.+)\s+([0-9.]+)/){
$price = $price * (0.75);
print $out "$item\t\t$price\n";
$items{$item} = $price;
$cnt++;
}
}
close($in);
close($out);
my $total = keys %items;
print "Total items - $total \n";
print "Total items - $cnt\n";
Using regex capture groups to capture the item and price (using \w.+ in case the item is 2 words like apple sauce), this will also prevent empty lines from printing to file.
I also hard coded the Item and Price header, probably a good idea if you are going to be using a consistent header.
Hope it helps
---Update ----
I added 2 examples of a total count in my script. The first one is using a hash and printing out the hash size, the second method is using a counter. The hash option is good except if your list has 2 items that are the same in which case the key of the hash will be overridden with the last item found which shares the same name. The counter is a simple solution.

Merge txt files in Perl, but modify them before, leaving original files untouched

I've already posted a question and fixed the problem in my code, but now my "specification has changed" so to say, and now I need to change some things about it.
Here's a code that takes all .txt files from the current directory, cuts off the last line of the first file, the first and the last line of every following file and the first line of the last file and writes everything in a new file (in other words: merge all files, deleting header and footer so that the new file has only one header and one footer).
#!/usr/bin/perl
use warnings;
use Cwd;
use Tie::File;
use Tie::Array;
my $cwd = getcwd();
my $buff = '';
# Get all files in cwd.
my #files = grep ( -f ,<*.txt>);
# Cut off header and footer of $files [1] to $files[$#files-1],
# but only footer of $files[0] and header of $#files[$#files]
for (my $i = 0; $i <= $#files; $i++) {
print 'Opening ' . $files[$i] . "\n";
tie (#lines, Tie::File, $files[$i]) or die "can't update $file: $!";
splice #lines, 0, 1 unless $i == 0;
splice #lines, -1, 1 unless $i == $#files;
untie #lines;
open (file, "<", $files[$i]) or die "can't update $file: $!";
while (my $line =<file>) {
$buff .= $line;
}
close file;
}
# Write the buffer to a new file.
my $allfilename = $cwd.'/Trace.txt';
print 'Writing all files into new file: ' . $allfilename . "\n";
open $outputfile, ">".$allfilename or die "can't write to new file $outputfile: $!";
# Write the buffer into the output file.
print $outputfile $buff;
close $outputfile;
My problem: I don't want to change the original files, but my code does exactly that and I'm having trouble coming up with a solution. The simplest way (simple meaning not having to change too much code) would now be, to just copy all the files to a tmp directory, messing around with them and leaving the original files untouched. Problem: a simple use of dircopy doesn't do it for me, since you have to give a new tmp dir to the dircopy function, making the code only usable for Windows or UNIX systems (but I need portability).
The next approach would be to make use of the File::Temp module but I'm really having trouble with the docs on this one.
Does anybody have a good idea on this one?

I suspected that you didn't really want your original files modified when I answered your previous question.
I don't understand why you've gone back to accumulating all the text in a buffer before printing it, or why you've removed use strict, which is essential to any well-written Perl code.
Here's my previous solution modified to leave the input data untouched.
use strict;
use warnings;
use Tie::File;
my #files = grep -f, glob '*.txt';
my $all_filename = 'Trace.txt';
open my $out_fh, '>', $all_filename or die qq{Unable to open "$all_filename" for output: $!};
for my $i ( 0 .. $#files ) {
my $file = $files[$i];
next if $file eq $all_filename;
print "Opening $file\n";
tie my #lines, 'Tie::File', $file or die qq{Can't open "$file": $!};
my ($start, $end) = (0, $#lines);
++$start unless $i == 0;
--$end unless $i == $#files;
print $out_fh "$_\n" for #lines[$start..$end];
}
close $out_fh;

#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $outfile = 'Trace.txt';
# Get all files in cwd.
my #files = grep { -f && $_ ne $outfile } <*.txt>;
open my $outfh, '>', $outfile;
for my $file (#files) {
my #lines = do { local #ARGV = $file; <> };
shift #lines unless $file eq $files[0];
pop #lines unless $file eq $files[-1];
print $outfh #lines;
}

Just do not use Tie::File. Or is there a reason you do this, for example all your files together do not fit your memory or something?
A version very close to your current implementation would be something like the following (untested) code. It just skips the part where you update the file, just to reopen and read it afterwards. (Note that this is certainly not a very effective or overly elegant way to do this, it just sticks to your implementation as close as possible)
#!/usr/bin/perl
use warnings;
use Cwd;
# use Tie::File;
# use Tie::Array;
my $cwd = getcwd();
my $buff = '';
# Get all files in cwd.
my #files = grep ( -f ,<*.txt>);
# Cut off header and footer of $files [1] to $files[$#files-1],
# but only footer of $files[0] and header of $#files[$#files]
for (my $i = 0; $i <= $#files; $i++) {
print 'Opening ' . $files[$i] . "\n";
open (my $fh, "<", $files[$i]) or die "can't open $file for reading: $!";
my #lines = <$fh>;
splice #lines, 0, 1 unless $i == 0;
splice #lines, -1, 1 unless $i == $#files;
foreach my $line (#lines) {
$buff .= $line;
}
}
# Write the buffer to a new file.
my $allfilename = $cwd.'/Trace.txt';
print 'Writing all files into new file: ' . $allfilename . "\n";
open $outputfile, ">".$allfilename or die "can't write to new file $outputfile: $!";
# Write the buffer into the output file.
print $outputfile $buff;
close $outputfile;

Based on Miller's answer, but most suitable for large files.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $outfile = 'Trace.txt';
# Get all files in cwd.
my #files = grep { -f && $_ ne $outfile } <*.txt>;
open my $outfh, '>', $outfile;
my $counter = 0;
for my $file (#files) {
open my $fh, '<', $file;
my ($line, $prev) = ('', '');
my $l = 0;
while ($line = <$fh>) {
print $outfh $prev unless $l++ == 1 and $counter > 0;
$prev = $line;
}
$counter++;
print $outfh $prev if $counter == #files and $l > 0;
close $fh;
}

how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line of first occurrence of the string "victory" in each txt file in a folder. For each first "victory" in file I would like to save the number from that line to #num and the file name to #filename
Example: For the file a.txt that starts with the line: "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
ARGV holds all the file names. my problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
one more thing - how can I get the last occurrence of "victory" in the last file??
use strict;
use warnings;
use File::Find;
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push #ARGV, $File::Find::name } }, $dir); $^I = ".bak";
my $argvv;
my $counter=0;
my $prev_arg=0;
my $line = 0;
my #filename=0;
my #num=0;
my $i = 0;
foreach $argvv (#ARGV)
{
#open $line, $argvv or die "Could not open file: $!";
my $line = IN
while (<$line>)
{
if (/victory/)
{
$line = s/[^0-9]//g;
$first_bit[$i] = $line;
$filename[$i]=$argvv;
$i++;
last;
}
}
close $line;
}
for ($i=0; $i<3; $i++)
{
print $filename[$i]." ".$num[$i]."\n";
}
Thank you very much! :)

Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
# Find the files we're interested in parsing
my #files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push #files, $File::Find::name } }, $dir);
# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();
foreach my $file (#files)
{
# Using a lexical file handle is the recommended way to open files
open my $in, '<', $file or die "Could not open $file: $!";
while (<$in>)
{
# Uncomment the next two lines to see what's being parsed
# chomp; # Not required, but helpful for the debug print below
# print "$_\n"; # Print out the line being parsed; for debugging
# Capture the number if we find the word 'victory'
# This assumes the number is immediately after the word; if that
# is not the case, it's up to you to modify the logic here
if (m/victory\s+(\d+)/)
{
$foundItems{$file} = $1; # Store the item
last;
}
}
close $in;
}
foreach my $file (sort keys %foundItems)
{
print "$file=> $foundItems{$file}\n";
}

the below searches for a string abc in all the files(file*.txt) and prints only the first line.
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>

How to delete the last 5 lines of a file

I have several commands printing text to a file using perl. During these print commands I have an if statement which should delete the last 5 lines of the file I am currently writing to if the statement is true. The number of lines to delete will always be 5.
if ($exists == 0) {
print(OUTPUT ???) # this should remove the last 5 lines
}

You can use Tie::File:
use Tie::File;
tie my #array, 'Tie::File', filename or die $!;
if ($exists == 0) {
$#array -= 5;
}
You can use the same array when printing, but use push instead:
push #array, "line of text";

$ tac file | perl -ne 'print unless 1 .. 5' | tac > file.tailchopped

Only obvious ways I can think of:
Lock file, scan backwards to find a position and use
truncate.
Don't print to the file directly, go through a buffer
that's at least 5 lines long, and trim the buffer.
Print a marker that means "ignore the last five lines".
Process all your files before reading them with a buffer as in #2
All are pretty fiddly, but that's the nature of flat files I'm afraid.
HTH

As an alternative, print the whole file except last 5 lines:
open($fh, "<", $filename) or die "can't open $filename for reading: $!";
open($fh_new, ">", "$filename.new") or die "can't open $filename.new: $!";
my $index = 0; # So we can loop over the buffer
my #buffer;
my $counter = 0;
while (<$fh>) {
if ($counter++ >= 5) {
print $fh_new $buffer[$index];
}
$buffer[$index++] = $_;
$index = 0 if 5 == $index;
}
close $fh;
close $fh_new;
use File::Copy;
move("$filename.new", $filename) or die "Can not copy $filename.new to $filename: $!";

File::ReadBackwards+truncate is the fastest for large files, and probably as fast as anything else for short files.
use File::ReadBackwards qw( );
my $bfh = File::ReadBackwards->new($qfn)
or die("Can't read \"$qfn\": $!\n");
$bfh->readline() or last for 1..5;
my $fh = $bfh->get_handle();
truncate($qfn, tell($fh))
or die $!;
Tie::File is the slowest, and uses a large amount of memory. Avoid that solution.

you can try something like this:
open FILE, "<", 'filename';
if ($exists == 0){
#lines = <FILE>;
$newLastLine = $#lines - 5;
#print = #lines[0 .. $newLastLine];
print "#print";
}
or even shortened:
open FILE, "<", 'filename';
#lines = <FILE>;
if ($exists == 0){
print "#lines[0 .. $#lines-5]";
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Perl : search a keywords from a file and display the occurence - perl

Related

how to combine the code to make the output is on the same line?

Perl simple filehandling of text

Merge txt files in Perl, but modify them before, leaving original files untouched

how to find the first occurrence of a string in all the files in a folder in perl

How to delete the last 5 lines of a file

Categories

Resources