Open files sequentially and search for content match in Perl

I am new to perl.
I am comfortable with opening two files and checking their contents, but how do I open files one after another in a loop and check their contents?

As mkHun suggested, you can use an array to store filenames then loop over it. See the below template to get an idea:
#!/usr/bin/perl
use strict;
use warnings;
my @files = qw(file.txt file2.txt file3.txt filen.txt);
foreach my $file (@files) {
    # open file in read mode to check contents
    open(my $fh, "<", $file) or die "Couldn't open file $file: $!";
    # loop over the file's content line by line
    while (<$fh>) {
        # $_ contains each line of the file. You can manipulate $_ below
        if ($_ =~ /cat/) {
            print "Line $. contains cat\n";
        }
    }
    close $fh;
}
Also read:
Loop Control in Perl (perlsyn)

In addition to Chankey Pathak's answer, if you want to iterate over files in some directory (that is, you don't know the names of the files you want to process, only their location), the File::Find module is an easy and straightforward solution.
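A minimal sketch of the File::Find approach, assuming you want every line under a directory tree that matches a pattern. The sub name find_matches and the "path:line" result format are made up for illustration; find() and its no_chdir and wanted options are part of the standard module.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "path:line_number" for every line in every
# plain file under $dir (searched recursively) that matches $pattern.
sub find_matches {
    my ($dir, $pattern) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # do not chdir, so $_ holds the full path
            wanted   => sub {
                my $path = $_;
                return unless -f $path;    # skip directories and the like
                open(my $fh, '<', $path) or die "Couldn't open $path: $!";
                while (my $line = <$fh>) {
                    # $. is the current line number of $fh
                    push @hits, join(':', $path, $.) if $line =~ $pattern;
                }
                close $fh;
            },
        },
        $dir
    );
    return @hits;
}
```

For example, print "$_\n" for find_matches('.', qr/cat/); would list every matching line below the current directory.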

Related

Perl copying specific lines of VECT File

I want to copy lines 7-12 of files, like this example .vect file, into another .vect file in the same directory.
I want each line to be copied twice, and the two copies of each line to be pasted consecutively in the new file.
This is the code I have used so far, and would like to continue using these methods/packages in Perl.
use strict;
use warnings;
use feature qw(say);
# This method works for reading a single file
my $dir = "D:\\Downloads";
my $readfile = $dir ."\\2290-00002.vect";
my $writefile = $dir . "\\file2.vect";
# Open a file to read
open(DATA1, "<" . $readfile) or die "Can't open '$readfile': $!";
# Open a file to write
open(DATA2, ">" . $writefile) or die "Can't open '$writefile': $!";
# Copy data from one file to another.
while ( <DATA1> ) {
print DATA2 $_;
}
close( DATA1 );
close( DATA2 );
What would be a simple way to do this using the same opening and closing file syntax I have used above?
Just modify the print line to
print DATA2 $_, $_ if 7 .. 12;
See Range Operators in "perlop - Perl operators and precedence" for details.
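To see why this one-liner works, here is a small self-contained sketch using lexical filehandles. The sub name double_lines_7_to_12 is hypothetical. In boolean context, a range with constant operands is a flip-flop that compares each operand against the current input line number $., so the condition turns true at line 7 and false again after line 12.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines 7 through 12 from $in to $out, each twice.
# Caveat: the flip-flop keeps its state between calls, so this sketch
# assumes every input has at least 12 lines.
sub double_lines_7_to_12 {
    my ($in, $out) = @_;
    local $_;                               # do not clobber the caller's $_
    while (<$in>) {
        print {$out} $_, $_ if 7 .. 12;     # same as ($. == 7) .. ($. == 12)
    }
    return;
}
```

With variable bounds the implicit comparison against $. does not apply, so you would have to write it out, e.g. ($. == $first) .. ($. == $last).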
It's worth remembering the Tie::File module, which maps a file line by line to a Perl array and allows you to manipulate text files using simple array operations. It can be slow when working with large amounts of data, but it is ideal for the majority of applications involving regular text files.
Copying a range of lines from one file to another becomes a simple matter of copying an array slice. Remember that the file starts with line one in array element zero, so lines 7 to 12 are at indexes 6..11.
This is the Perl code to do what you ask
use strict;
use warnings;
use Tie::File;
chdir 'D:\Downloads' or die $!;
tie my @infile, 'Tie::File', '2290-00002.vect' or die $!;
tie my @outfile, 'Tie::File', 'file2.vect' or die $!;
@outfile = map { $_, $_ } @infile[6..11];
Nothing else is required. Isn't that neat?

In Perl - Need to modify a script to parse all files in a directory

I have little to no Perl experience, so any assistance is much appreciated. I'm sorry if I'm not giving clear information in the question as I do not have a programming background.
I have a script that will parse a text file, check for a certain number of data points in the text file, then output "# of data points = X". I can get this to run on a single text file, and I can get it to output to a text file which is great.
However, there are 138 text files that I need to parse and analyze, all in one directory. I'm wondering if, rather than running this individual script 138 times, I can modify the script to go to the directory, run on each file in it, and output the results together in a text file.
I didn't write the original script, I inherited it and just barely managed to figure out how to get it to run on a single text file.
You can also do a glob, like so.
my @files = </path/where/files/are/*>;
foreach my $file (@files) {
    print "working on $file...\n";
    # do stuff with $file
}
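If you need to open all 138 files in the same directory, you can walk the directory with opendir and readdir. This example script opens every file in the directory and prints all of its lines:
#!/usr/bin/perl
use strict;
use warnings;
my $directory = '/tmp';
opendir(my $dh, $directory) or die "Can't open $directory: $!";
while (my $file = readdir($dh)) {
    # readdir returns bare names (including . and ..), so build the full path
    my $path = "$directory/$file";
    next unless -f $path;    # skip anything that is not a plain file
    print "$file\n";
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    while (<$fh>) {
        print $_;
    }
    close $fh;
}
closedir($dh);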
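Since the question asks for one combined results file, here is a rough sketch of that last step. The original script's definition of a "data point" is not shown, so this assumes, purely for illustration, that a data point is a line matching a pattern you pass in. The sub name summarize_dir and the tab-separated report layout are likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines matching $pattern in each plain file
# directly inside $dir and write one "name<TAB>count" row per file to $report.
sub summarize_dir {
    my ($dir, $pattern, $report) = @_;

    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir($dh);    # plain files only
    closedir($dh);

    open(my $out, '>', $report) or die "Can't write $report: $!";
    for my $name (sort @names) {
        open(my $fh, '<', "$dir/$name") or die "Can't open $dir/$name: $!";
        # grep in scalar context returns the number of matching lines
        my $count = grep { $_ =~ $pattern } <$fh>;
        close $fh;
        print {$out} "$name\t$count\n";
    }
    close $out or die "Can't close $report: $!";
    return;
}
```

Writing the report somewhere outside $dir keeps it from being counted as an input file on the next run.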

How can I create a new output file for each subfolder under a main folder using perl?

I have 100 subfolders in a main folder. They have different names. Each subfolder includes a .txt file, which has 10 columns. I want to get a new .txt file for each subfolder. Each new .txt file must be in its own folder. That is, I will have 2 .txt files (old and new) in each subfolder. I am trying to select the lines starting with "ATOM" and columns 2, 6, 7 and 8 from each .txt file. My code is the following. It doesn't work correctly: it doesn't create a new .txt file. How can I fix this problem?
#!/usr/bin/perl
$search_text = "ATOM";
@files = <*/*.txt>;
foreach $file (@files) {
    print $file . "\n";
    open(DATA, $file);
    open(OUT_FILE, ">$file a.txt");
    while ($line = <DATA>)
    {
        @fields = split /\s+/, $line;
        if ($line =~ m/$search_text/)
        {
            print OUT_FILE "$fields[2]\t$fields[6]\t$fields[7]\t$fields[8]\n";
        }
    }
}
close(OUT_FILE);
To put the output file a.txt into the same directory as the input file, you need to extract the directory name from the input file name, and prepend it to the output file name (a.txt). There are a couple of ways you can do that; probably the simplest is to use dirname() from the standard module File::Basename:
use File::Basename;
my $dir = dirname($file);
open(OUT_FILE, ">", "$dir/a.txt") or die "Failed to open $dir/a.txt: $!";
or you could use File::Spec directly:
use File::Spec;
my ($volume, $dir) = File::Spec->splitpath($file);
my $outname = File::Spec->catpath($volume, $dir, 'a.txt');
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
or you could just use a regexp substitution:
my $outname = ( $file =~ s![^/]+$!a.txt!r );
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
Ps. In any case, I'd recommend adopting several good habits that will help you write better Perl scripts:
Always start your scripts with use strict; and use warnings;. Fix any errors and warnings they produce. In particular, declare all your local variables with my to make them lexically scoped.
Check the return value of functions like open(), and abort the script if they fail. (I've done this in my examples above.)
Use the three-argument form of open(), as I also did in my examples above. It's a lot less likely to break if your filenames contain funny characters.
Consider using lexically scoped file handles (open my $out_file, ...) instead of global file handles (open OUT_FILE, ...). I didn't do that in my code snippets above, because I wanted to keep them compatible with the rest of your code, but it would be good practice.
If you're pre-declaring a regular expression, like your $search_text, use qr// instead of a plain string, like this:
my $search_text = qr/ATOM/;
It's slightly more efficient, and the quoting rules for special characters are much saner.
For printing multiple columns from an array, consider using join() and a list slice, as in:
print OUT_FILE join("\t", @fields[2,6,7,8]), "\n";
Finally, if I were you, I'd reconsider my file naming scheme: the output file name a.txt matches your input file name glob *.txt, so your script will likely break if you run it twice in a row.
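Putting these recommendations together, the whole script might look roughly like the sketch below. It is not a drop-in replacement: the sub name extract_atoms, the $root argument, and the caller-chosen output name are assumptions made for illustration, and it assumes one .txt file per subfolder, as the question describes.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Glob ':bsd_glob';    # bsd_glob() does not split patterns on spaces

# Hypothetical helper: for every */*.txt under $root, write fields 2, 6, 7
# and 8 of the lines starting with "ATOM" to $out_name in the same subfolder.
sub extract_atoms {
    my ($root, $out_name) = @_;
    my $search_text = qr/^ATOM/;    # anchored: only lines *starting* with ATOM

    for my $file (bsd_glob("$root/*/*.txt")) {
        my $outname = dirname($file) . "/$out_name";
        open(my $in,  '<', $file)    or die "Failed to open $file: $!";
        open(my $out, '>', $outname) or die "Failed to open $outname: $!";
        while (my $line = <$in>) {
            next unless $line =~ $search_text;
            my @fields = split /\s+/, $line;
            print {$out} join("\t", @fields[2, 6, 7, 8]), "\n";
        }
        close $in;
        close $out or die "Failed to close $outname: $!";
    }
    return;
}
```

Calling extract_atoms('.', 'atoms.tsv') uses an output name that does not match *.txt, so running the script twice does not pick up its own output as input.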

How to delete common lines from one of 2 files in Perl?

I have 2 files, a small one and a big one. The small file is a subset of the big one.
For instance:
Small file:
solar:1000
alexey:2000
Big File:
andrey:1001
solar:1000
alexander:1003
alexey:2000
I want to delete all the lines from Big.txt which are also present in Small.txt. In other words, I want to delete the lines in Big file which are common to the small File.
So, I wrote a Perl Script as shown below:
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
open(BIG, "<$big") || die("Couldn't read from the file: $big\n");
my @contents = <BIG>;
close (BIG);
open(SMALL, "<$small") || die ("Couldn't read from the file: $small\n");
while(<SMALL>)
{
    chomp $_;
    @contents = grep !/^\Q$_/, @contents;
}
close(SMALL);
open(OUTPUT, ">>$output") || die ("Couldn't open the file: $output\n");
print OUTPUT @contents;
close(OUTPUT);
However, this Perl Script does not delete the lines in Big.txt which are common to Small.txt
In this script, I first open the big file stream and copy the entire contents into the array, @contents. Then, I iterate over each entry in the small file and check for its presence in the bigger file. I filter the line from Big File and save it back into the array.
I am not sure why this script does not work? Thanks
Your script does NOT work because grep sets $_ itself: for the duration of the grep, $_ is aliased to each element of @contents in turn, so the $_ in your regex is no longer the line that the while loop read from SMALL. Each element ends up being matched against itself, which always succeeds, so every element is removed.
Use a named variable instead (as a rule, NEVER use $_ for any code longer than 1 line, precisely to avoid this type of bug):
while (my $line=<SMALL>) {
    chomp $line;
    @contents = grep !/^\Q$line/, @contents;
}
However, as Oleg pointed out, a more efficient solution is to read small file's lines into a hash and then process the big file ONCE, checking hash contents (I also improved the style a bit - feel free to study and use in the future, using lexical filehandle variables, 3-arg form of open and IO error printing via $!):
#! /usr/bin/perl
use strict;
use warnings;
use File::Slurp;
my ($small, $big, $output) = @ARGV;
# chomp the small file's lines so they compare equal to the chomped big-file lines
chomp(my @small = read_file($small));
my %small = map { ($_ => 1) } @small;
# separate names for the handles, so they do not mask the filename variables
open(my $big_fh, "<", $big) or die "Can not read $big: Error: $!\n";
open(my $out_fh, ">", $output) or die "Can not write to $output: Error: $!\n";
while (my $line = <$big_fh>) {
    chomp $line;
    next if $small{$line}; # Skip common
    print $out_fh "$line\n";
}
close($big_fh);
close($out_fh);
It doesn't work for several reasons. First, the lines in @contents still have their newlines. Second, when you grep, the $_ in !/^\Q$_/ is not the current line from the small file but each element of the @contents array in turn, which effectively means: for each element in the list, return everything except this element. That leaves you with an empty list at the end.
This isn't really a good way to do it: you're reading the big file and then reprocessing it several times. First, read the small file and put every line in a hash. Then read the big file inside a while(<>) loop, so you don't waste memory reading it entirely. On each line, check whether the key exists in the previously populated hash; if it does, go to the next iteration, otherwise print the line.
Here is a small and efficient solution to your problem. Note that it does not preserve the original line order of the big file, because hash keys are unordered:
#!/usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
my %diffx;
open my $bfh, "<", $big or die "Couldn't read from the file $big: $!\n";
# load big file's contents
my @big = <$bfh>;
chomp @big;
# build a lookup table keyed by the big file's lines
@diffx{@big} = ();
close $bfh or die "$!\n";
open my $sfh, "<", $small or die "Couldn't read from the file $small: $!\n";
my @small = <$sfh>;
chomp @small;
# delete the elements that exist in small file from the lookup table
delete @diffx{@small};
close $sfh;
# print join "\n", keys %diffx;
open my $ofh, ">", $output or die "Couldn't open the file $output for writing: $!\n";
# what is left is unique lines from big file
print $ofh join "\n", keys %diffx;
close $ofh;
__END__
P.S. I learned this trick and many others from Perl Cookbook, 2nd Edition. Thanks

How do I open a file whose full name is unknown with Perl?

I want to know if there is anything that lets me do the following:
folder1 has files "readfile1" "f2" "fi5"
The only thing I know is that I need to read the file which starts with readfile, and I don't know what's there in the name after the string readfile. Also, I know that no other file in the directory starts with readfile.
How do I open this file with the open command?
Thank you.
glob can be used to find a file matching a certain string:
my ($file) = glob 'readfile*';
open my $fh, '<', $file or die "can not open $file: $!";
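If nothing matches, glob returns an empty list and $file is undefined, so it can be worth checking before calling open. A minimal sketch, assuming you want exactly one match; the sub name find_by_prefix and its error messages are made up for illustration:

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';    # bsd_glob() does not split patterns on spaces

# Hypothetical helper: return the single plain file in $dir whose name
# starts with $prefix; die if there is no such file or more than one.
sub find_by_prefix {
    my ($dir, $prefix) = @_;
    my @matches = grep { -f $_ } bsd_glob("$dir/$prefix*");
    die "No file starting with '$prefix' in $dir\n" unless @matches;
    die "More than one file starts with '$prefix' in $dir\n" if @matches > 1;
    return $matches[0];
}
```

The result can then be passed to the three-argument open as above, e.g. my $file = find_by_prefix('folder1', 'readfile');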
You can use glob for simple cases, as toolic suggests.
my ($file) = glob 'readfile*';
If the criteria for finding the correct file are more complex, just read the entire directory and use the full power of Perl to winnow the list down to what you need:
use strict;
use warnings;
use File::Slurp qw(read_dir);
my $dir = shift @ARGV;
my @files = read_dir($dir);
# Filter the list as needed.
@files = map { ... } @files;
You don't necessarily need imports to read the contents of the directory - perl has some built-in functions that can do that:
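opendir(my $dh, ".") or die "Can't open current directory: $!";
# anchor the pattern so only names that start with readfile match
my ($file) = grep { /^readfile/ } readdir($dh);
closedir($dh);
open(my $fh, '<', $file) or die "Can't open $file: $!";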