how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line with the first occurrence of the string "victory" in each .txt file in a folder. For the first "victory" in each file, I would like to save the number from that line to @num and the file name to @filename.
Example: for the file a.txt that starts with the line "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
@ARGV holds all the file names. My problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
One more thing: how can I get the last occurrence of "victory" in the last file?
use strict;
use warnings;
use File::Find;
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @ARGV, $File::Find::name } }, $dir); $^I = ".bak";
my $argvv;
my $counter=0;
my $prev_arg=0;
my $line = 0;
my @filename=0;
my @num=0;
my $i = 0;
foreach $argvv (@ARGV)
{
    #open $line, $argvv or die "Could not open file: $!";
    my $line = IN
    while (<$line>)
    {
        if (/victory/)
        {
            $line = s/[^0-9]//g;
            $first_bit[$i] = $line;
            $filename[$i]=$argvv;
            $i++;
            last;
        }
    }
    close $line;
}
for ($i=0; $i<3; $i++)
{
    print $filename[$i]." ".$num[$i]."\n";
}
Thank you very much! :)

Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
# Find the files we're interested in parsing
my @files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @files, $File::Find::name } }, $dir);
# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();
foreach my $file (@files)
{
    # Using a lexical file handle is the recommended way to open files
    open my $in, '<', $file or die "Could not open $file: $!";
    while (<$in>)
    {
        # Uncomment the next two lines to see what's being parsed
        # chomp; # Not required, but helpful for the debug print below
        # print "$_\n"; # Print out the line being parsed; for debugging
        # Capture the number if we find the word 'victory'
        # This assumes the number is immediately after the word; if that
        # is not the case, it's up to you to modify the logic here
        if (m/victory\s+(\d+)/)
        {
            $foundItems{$file} = $1; # Store the item
            last;
        }
    }
    close $in;
}
foreach my $file (sort keys %foundItems)
{
    print "$file=> $foundItems{$file}\n";
}
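The question also asks how to get the last occurrence of "victory" in the last file, which the script above does not cover. The following is only a minimal sketch, assuming the same "victory <number>" layout; the helper name last_victory_number is hypothetical, and an in-memory string stands in for a real file so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number following the LAST "victory"
# readable from the given filehandle, or undef if there is none.
sub last_victory_number {
    my ($fh) = @_;
    my $last;
    while (my $line = <$fh>) {
        # No "last" here: every later match overwrites the earlier one
        $last = $1 if $line =~ /victory\s+(\d+)/;
    }
    return $last;
}

# Demo on an in-memory file; with real files, open the final element
# of the file list instead (for example $files[-1]).
my $sample = "lalala victory 123456\nnothing here\nvictory 42\n";
open my $in, '<', \$sample or die "Could not open in-memory file: $!";
my $result = last_victory_number($in);
close $in;
print defined $result ? "$result\n" : "no match\n";
```

The only change from the first-occurrence loop is that the loop keeps reading after a match instead of leaving with last.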

The one-liner below searches for the string abc in all the files (file*.txt) and prints only the first matching line of each file.
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
Tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>

Related

modify lines between two tags in perl

I need some help with replacing lines between two tags in Perl. I have a file in which I want to modify the lines between two tags:
some lines
some lines
tag1
ABC somelines
NOP
NOP
ABC somelines
NOP
NOP
ABC somelines
tag2
As you can see, I have two tags, tag1 and tag2, and I want to replace all instances of ABC with NOP between tag1 and tag2. Here is the relevant portion of the code, but it doesn't replace anything. Can anyone please help?
my $fh;
my $cur_file = "file_name";
my @lines = ();
open($fh, '<', "$cur_file") or die "Can't open the file for reading $!";
print "Before while\n";
while(<$fh>)
{
    print "inside while\n";
    my $line = $_;
    if($line =~ /^tag1/)
    {
        print "inside range check\n";
        $line = s/ABC/NOP/;
        push(@lines, $line);
    }
    else
    {
        push(@lines, $line);
    }
}
close($fh);
open ($fh, '>', "$cur_file") or die "Can't open file for wrinting\n";
print $fh @lines;
close($fh);

Consider a one-liner using the Flip-Flop operator.
perl -i -pe 's/ABC/NOP/ if /^tag1/ .. /^tag2/' file

Use $INPLACE_EDIT in conjunction with the range operator ..
use strict;
use warnings;
local $^I = '.bak';
local @ARGV = ($cur_file);
while (<>) {
    if (/^tag1/ .. /^tag2/) {
        s/ABC/NOP/;
    }
    print;
}
unlink "$cur_file$^I"; #delete backup;
For alternative ways to edit a file, check out: perlfaq5
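To see what the range operator does without touching any file, here is a small sketch that applies the same condition to a list of lines in memory. The helper name replace_between_tags and the sample lines are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the lines with ABC replaced by NOP
# only inside each tag1 .. tag2 block (the tag lines themselves included).
sub replace_between_tags {
    my @lines = @_;
    for my $line (@lines) {
        # In scalar context ".." is the flip-flop: it turns true on a line
        # matching /^tag1/ and stays true through the line matching /^tag2/.
        if ($line =~ /^tag1/ .. $line =~ /^tag2/) {
            $line =~ s/ABC/NOP/;
        }
    }
    return @lines;
}

my @input = ("ABC before\n", "tag1\n", "ABC somelines\n", "NOP\n", "tag2\n", "ABC after\n");
print replace_between_tags(@input);
```

Only the ABC line between the tags should change. One caveat: each ".." operator keeps its own state, even across calls to the subroutine that contains it, so input with an unclosed tag1 would carry over into the next call.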

Your line which says $line = s/ABC/NOP/; is incorrect, you need =~ there.
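#!/usr/bin/perl
use strict;
use warnings;
# Reads the sample input from the __DATA__ section at the end of the script
my $tag1 = 0;
my $tag2 = 0;
while (my $line = <DATA>) {
    if ($line =~ /^tag1/) {
        $tag1 = 1; # Set the flag for tag1
    }
    if ($line =~ /^tag2/) {
        $tag2 = 1; # Set the flag for tag2
    }
    if ($tag1 == 1 && $tag2 == 0) {
        $line =~ s/ABC/NOP/;
    }
    print $line;
}
__DATA__
some lines
tag1
ABC somelines
NOP
ABC somelines
tag2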

perl + read multiple csv files + manipulate files + provide output_files + syntax error symbol ref

Building on this question, I am still having syntax trouble with this script:
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
#print "dh -> $dh\n";
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
# added the "()" here ($dh) as otherwise an error
for my $file (readdir($dh)) {
    next if $file !~ /\.csv$/;
    next if $file =~ /^\Q$output_prefix\E/;
    my $orig_file = "$source_dir/$file";
    my $format_file = "$source_dir/$output_prefix$file";
    # .... old processing code here ...
    ## Start:: This part works on one file edited for this script ##
    #open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
    #open my $format_fh, '>', 'format.csv' or die $!;
    print "format_file-> $format_file\n";
    #print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
    print {$format_file} scalar <$orig_file>; # Copy header line
    my %data;
    my @labels;
    #while (<$orig_fh>) { #orig needs changing
    while (<$orig_file>) {
        chomp;
        my @fields = split /,/, $_, -1;
        my ($label, $max_val) = @fields[1,12];
        if ( exists $data{$label} ) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        #print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
        print $format_file join(',', @{ $data{$label} }), "\n";
    }
    ## END:: This part works on one file edited for this script ##
}
I can fix the line opendir my $dh, $source_dir; by adding brackets around ($dh),
but I am still having trouble with the line print {$format_file} scalar <$orig_file>; # Copy header line
I get the following error:
Can't use string ("/home/Kevin Smith/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 29.
Can anyone advise?
I have tried using the advice here, but without much joy.

Use print $format_file ... or print ${format_file} ...
However $format_file is just a string containing the name of the file, not a filehandle. You have to open the file:
open my $format_fh, '>', $format_file or die $!;
...
print $format_fh ... ;
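The same reasoning applies to the input side: <$orig_file> needs a filehandle too, not a path string. Below is a minimal sketch of opening both handles and copying the header line. The helper name copy_header and the sample file names are hypothetical, and the demo runs in a temporary directory rather than on the asker's data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy only the header (first) line of $orig_file
# into $format_file. Both arguments are path strings, not filehandles.
sub copy_header {
    my ($orig_file, $format_file) = @_;
    open my $orig_fh,   '<', $orig_file   or die "Cannot read $orig_file: $!";
    open my $format_fh, '>', $format_file or die "Cannot write $format_file: $!";
    my $header = <$orig_fh>;                    # read from the handle, not the name
    print {$format_fh} $header if defined $header;
    close $orig_fh;
    close $format_fh or die "Cannot close $format_file: $!";
    return $header;
}

# Demo in a throwaway directory so no real data is touched
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/orig.csv" or die "Cannot create sample: $!";
print {$out} "id,label,value\n1,a,10\n";
close $out;
print copy_header("$dir/orig.csv", "$dir/format_orig.csv");
```

Keeping the path in one variable ($orig_file) and the handle in another ($orig_fh) makes it harder to pass the wrong one to print or to the <> operator.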

I can't output properly

I'm trying to print a character from a file each time I get a char as input.
My problem is that it prints the whole line. I know it's a logic problem, I just can't figure out how to fix it.
use Term::ReadKey;
$inputFile = "input.txt";
open IN, $inputFile or die "I can't open the file :$ \n";
ReadMode("cbreak");
while (<IN>) {
$line = <IN>;
$char = ReadKey();
foreach $i (split //, $line) {
print "$i" if ($char == 0);
}
}

Move the ReadKey call into the foreach loop.
use strictures;
use autodie qw(:all);
use Term::ReadKey qw(ReadKey ReadMode);
my $inputFile = 'input.txt';
open my $in, '<', $inputFile;
ReadMode('cbreak');
while (my $line = <$in>) {
foreach my $i (split //, $line) {
my $char = ReadKey;
print $i;
}
}
END { ReadMode('restore') }

Your original code has 3 problems:
You only read the character once per line (outside the foreach loop).
You read one line from the input file when testing while (<IN>) { (LOSING that line!) and then another in $line = <IN>; so your logic only ever sees the even-numbered lines.
print "$i" prints the characters with no newline between them, so you don't see the characters separated.
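The second problem is easy to reproduce in isolation. The sketch below uses an in-memory string as a stand-in for input.txt (an assumption made so the example is self-contained) and counts how many lines each loop style actually processes.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\nline 3\nline 4\n";

# Buggy pattern from the question: the while condition reads a line into $_,
# then the body reads the NEXT line, so only lines 2 and 4 are processed.
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
my @buggy;
while (<$in>) {
    my $line = <$in>;
    push @buggy, $line if defined $line;
}
close $in;

# Fixed pattern: read each line exactly once, in the loop condition.
open $in, '<', \$text or die "Cannot open in-memory file: $!";
my @fixed;
while (my $line = <$in>) {
    push @fixed, $line;
}
close $in;

print "buggy loop saw: ", scalar(@buggy), " lines\n";
print "fixed loop saw: ", scalar(@fixed), " lines\n";
```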

My script reads all the files in a directory, puts them in a list, and chooses a random file from that list.
After that, each time it gets an input char from the user, it prints a char from the file.
#!C:\perl\perl\bin\perl
use strict;
use warnings;
use Term::ReadKey qw(ReadKey ReadMode);
use autodie; #open, opendir and closedir now die on failure
use IO::Handle qw();
STDOUT->autoflush(1);
my $directory = "codes"; #directory's name
opendir(my $dh, $directory); #open the dir
my @allFiles; #array of all the files
while (defined(my $file = readdir($dh))) { #read each entry from the directory
    next if ($file =~ m/^\./); #exclude it if it starts with '.'
    push(@allFiles, $file); #add file to the array
}
closedir($dh); #close the input directory
die "No files found in $directory\n" unless @allFiles;
my $randomNr = int(rand(scalar @allFiles)); #generate a random index into the array
my $file = $allFiles[$randomNr]; #get the file at given index
open my $in, '<', "$directory/$file"; #readdir returns bare names, so prepend the directory
ReadMode('cbreak'); #don't print the user's input
while (my $line = <$in>) { #read each line from file
    foreach my $i (split //, $line) { #split the line in characters (including \n & \t)
        print $i if defined ReadKey(); #once a key is pressed, print the indexed char (even if the key is "0")
    }
}
END {
    ReadMode('restore') #deactivate 'cbreak' read mode
}

How to check for files that has two different extensions in Perl

I have a file reflog with the content below. It lists items with the same name but different extensions. I want to check that each item (file1, file2 and file3 here, as an example) exists with both extensions (.abc and .def). If both extensions exist, the script will perform some regex processing and print the name. Otherwise it will just report the file name together with its extension (i.e., if only one of file1.abc or file1.def exists, that one is printed).
reflog:
file1.abc
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file8abc.def
file7.abc
file1.def
file9abc.def
file10def.abc
My script is below (edited from yb007's script), but I have some issues with the output that I don't know how to resolve. I notice the output is wrong when the reflog file contains any file named *abc.def (e.g. file8abc.def and file9abc.def). The last 4 characters are trimmed off and the wrong extension is returned (.abc here, whereas I expect .def).
#! /usr/bin/perl
use strict;
use warnings;
my @files_abc ;
my @files_def ;
my $line;
open(FILE1, 'reflog') || die ("Could not open reflog") ;
open (FILE2, '>log') || die ("Could not open log") ;
while ($line = <FILE1>) {
    if($line=~ /(.*).abc/) {
        push(@files_abc,$1);
    } elsif ($line=~ /(.*).def/) {
        push(@files_def,$1); }
}
close(FILE1);
my %first = map { $_ => 1 } @files_def ;
my @same = grep { $first{$_} } @files_abc ;
my @abc_only = grep { !$first{$_} } @files_abc ;
foreach my $abc (sort @abc_only) {
    $abc .= ".abc";
}
my %second = map {$_=>1} @files_abc;
my @same2 = grep { $second{$_} } @files_def; #@same and @same2 are equal.
my @def_only = grep { !$second{$_} } @files_def;
foreach my $def (sort @def_only) {
    $def .= ".def";
}
my @combine_all = sort (@same, @abc_only, @def_only);
print "\nCombine all:-\n @combine_all\n" ;
print "\nList of files with same extension\n @same";
print "\nList of files with abc only\n @abc_only";
print "\nList of files with def only\n @def_only";
foreach my $item (sort @combine_all) {
    print FILE2 "$item\n" ;
}
close (FILE2) ;
My output is like this which is wrong:-
1st:- print screen output as below:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc
List of files with same extension
file1 file2 file3 file5
List of files with abc only
file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
file6.def
Log output as below:
**file.abc
file.abc**
file1
file10def.abc
file2
file3
file4.abc
file5
file6.def
file7.abc
Can you please help me see where it goes wrong? Thanks heaps.

ALWAYS add
use strict;
use warnings;
to the head of your program. They will catch most simple errors before you need to ask for help.
You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp @lines; to remove them
Your regular expressions are wrong and you need || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)
If you still have problems when these are fixed then please ask again.

Aside from the errors already pointed out, you appear to be loading @lines from FUNC instead of FILE. Is that also a typo?
Also, if reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?
It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned.

use strict;
use warnings;
my @files_abc;
my @files_def;
my $line;
open(FILE,'reflog') || die ("could not open reflog");
while ($line = <FILE>) {
    if($line=~ /(.*)\.abc/) {
        push(@files_abc,$1);
    }
    elsif($line=~ /(.*)\.def/) {
        push(@files_def,$1);
    }
}
close(FILE);
my %second = map {$_=>1} @files_def;
my @same = grep { $second{$_} } @files_abc;
print "\nList of files with same extension\n @same";
foreach my $abc (@files_abc) {
    $abc .= ".abc";
}
foreach my $def (@files_def) {
    $def .= ".def";
}
print "\nList of files with abc extension\n @files_abc";
print "\nList of files with def extension\n @files_def";
Output is
List of files with same extension
file1 file2 file3 file5
List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc
List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def
Hope this helps...

You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:
xx.pl
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.
I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
This is a variant on control-break reporting.
reflog
file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc
Output
$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$
To handle unsorted file names with blank lines
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my @lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (@lines)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
This is very similar to the original code I posted. The new lines are these:
my @lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", ""); # Old
foreach my $newline (@lines)
This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:
file1
file2
file3
Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.

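use strict;
use warnings;
open( my $file, '<', "reflog" ) or die "Could not open reflog: $!";
open( my $func, '>', "log" ) or die "Could not open log: $!";
my %seen;
while ( my $line = <$file> ) {
    chomp $line;
    $line =~ s/^\s*//;
    # Count how often each base name appears with an .abc or .def extension
    if ( $line =~ /(.+)\.(abc|def)$/ ) {
        $seen{$1}++;
    }
}
foreach my $name ( keys %seen ) {
    if ( $seen{$name} > 1 ) {
        ## do whatever you want to
    }
}
close($func);
close($file);
unlink "reflog";
chmod( 0750, "log" );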
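As a sketch of how the counting idea can produce the report the question asks for, the following records which extensions were seen for each base name and then splits the names into "both" and "single" groups. The helper name classify and the inline sample list are assumptions for illustration; a real script would read the names from reflog instead.

```perl
use strict;
use warnings;

# Hypothetical helper: given file names, return two array refs:
# base names seen with BOTH .abc and .def, and full names seen with only one.
sub classify {
    my %ext_for;    # base name => { extension => 1 }
    for my $name (@_) {
        # Anchor at the end and escape the dot, so "file8abc.def"
        # is read as base "file8abc" with extension "def"
        next unless $name =~ /^\s*(.+)\.(abc|def)\s*$/;
        $ext_for{$1}{$2} = 1;
    }
    my (@both, @single);
    for my $base (sort keys %ext_for) {
        my @exts = sort keys %{ $ext_for{$base} };
        if (@exts == 2) { push @both, $base }
        else            { push @single, "$base.$exts[0]" }
    }
    return (\@both, \@single);
}

my ($both, $single) = classify(qw(
    file1.abc file2.abc file2.def file4.abc file6.def
    file8abc.def file1.def file10def.abc
));
print "both: @$both\n";
print "single: @$single\n";
```

Because the pattern is anchored with $ and the dot is escaped, names such as file8abc.def and file10def.abc keep their real extension, which is the case that went wrong in the question.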

File manipulation in Perl

I have a simple .csv file that I want to extract data from and write to a new file.
I want to write a script that reads in a file, reads each line, then splits the columns and arranges them in a different order; if a line in the .csv contains 'xxx', it should not be written to the output file.
I have already managed to read in a file and create a secondary file. However, I am new to Perl and still trying to work out the commands. The following is a test script I wrote to get to grips with Perl; could I alter it to do what I need?
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";
sub trim($);
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
$a = 0;
$b = 0;
while ($line=<FILE>)
{
chop($line);
if ($line =~ /xxx/)
{
$addr = $line;
$post = substr($line, length($line)-18,8);
}
$a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.

To manipulate CSV files it is better to use one of the available modules at CPAN. I like Text::CSV:
use Text::CSV;
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 }) or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'}
}
close $fh;

If you only need to do this once and won't need the program later, you can do it with a one-liner:
perl -F, -lane 'next if /xxx/; @n=map { s/(^\s*|\s*$)//g;$_ } @F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
     ^^^    ^   <- split lines at ',' and store fields into array @F
next if /xxx/; #skip lines that contain xxx
@n=map { s/(^\s*|\s*$)//g;$_ } @F;
#trim spaces from the beginning and end of each field
#and store the result into new array @n
print join(",", (map{$n[$_]} qw(2 0 1)));
#recombine array @n into new order - here 2 0 1
#join them with comma
#print
Of course, for repeated use or in a bigger project you should use a CPAN module. The one-liner above also has many caveats.
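For readers who prefer a script to a one-liner, the same steps can be sketched as a small function. This is an illustration under the same simplifying assumption as the one-liner (plain fields, no quoted commas); the helper name reorder_line and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one CSV line into a reordered line, or return
# undef when the line should be skipped. Assumes plain fields with no
# quoted commas; for real-world CSV a parser module is the safer choice.
sub reorder_line {
    my ($line, @order) = @_;
    chomp $line;
    return if $line =~ /xxx/;                  # drop lines containing 'xxx'
    my @fields = split /,/, $line, -1;         # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;                # trim each field in place
    return join ',', map { $fields[$_] // '' } @order;
}

for my $line ("a , b,c\n", "1,xxx,3\n", "d,e, f\n") {
    my $out = reorder_line($line, 2, 0, 1);
    print "$out\n" if defined $out;
}
```

The column order is passed in as a list of indexes, so 2, 0, 1 mirrors the qw(2 0 1) in the one-liner.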
