Perl loop through text files in a directory, locate file based on file name and print content - perl

How do I loop through the files in a directory, locate a file based on its name, and print its content?
Please see below code:
files in directory:
1234.txt
345.txt
234.txt
Code:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my @DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift @DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}

This should do (untested):
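my $dir = "/path/to/dir";
opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
while ( defined( my $entry = readdir($dh) ) ) {
    if ( $entry eq $filenameImLookingFor ) {
        # readdir returns bare names, so prepend the directory
        open( my $fh, '<', "$dir/$entry" ) or die "Cannot open $dir/$entry: $!";
        my @lines = <$fh>;
        close($fh);
        print( join( '', @lines ) );
    }
}
closedir($dh);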

The code in your question:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my @DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift @DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}
Will do this:
First it will handily find all files in directory "LOCATION" that do not begin with a period. Then it will iterate in a rather odd loop over each file name. The normal version of this loop would be:
for my $file (@DATA)
Then it will attempt to open the directory "LOCATION" again. This will likely fail, because "LOCATION" is a directory. Since you do not check the return value with die, this error will be silent.
What you probably want is to use
if ($file eq "235.txt") {
open my $fh, "<", $file or die $!;
print <$fh>;
}
This part:
while (FILE)
Is not actually checking the return value of readline(). Without the angle brackets, FILE is just a bareword, which Perl (when strict is off) treats as the string "FILE", and a non-empty string is always true. That matches what I see on my system: the condition is true even if the open failed, which of course means the loop will run indefinitely. What you probably meant was
while (<FILE>)
However, as explained earlier, this will only result in the error "readline() on unopened file handle FILE" since the open statement cannot open a directory.
Your check
if ($file eq "235")
Will never be true, since you said your file names had a .txt extension. You might instead do
if ($file eq "235.txt")
Which should work.
If you wanted to be clever, you could include your check directly in the grep:
my @files = grep { $_ eq "235.txt" } readdir DIR;
And since perl can use the <> diamond operator to print files listed in the @ARGV array, you can even do this
@ARGV = grep { $_ eq "235.txt" } @ARGV;
print <>;
Assuming you call the script with:
perl script.pl dir/*.txt
This is, of course, just the long version of doing:
perl -pe0 235.txt
Which is the long version of
cat 235.txt
So, I get the feeling you are trying to do something other than what your code implies.
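Putting the points above together, a minimal sketch might look like the following. The helper name slurp_named_file, the file name 235.txt, and the temporary directory used in the demo are assumptions for illustration only; they are not from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the content of the file called $wanted
# inside $dir, or undef if no such plain file exists.
sub slurp_named_file {
    my ( $dir, $wanted ) = @_;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    my @matches = grep { $_ eq $wanted } readdir($dh);
    closedir($dh);
    return unless @matches;

    # readdir returns bare names, so the directory must be prepended.
    my $path = "$dir/$wanted";
    return unless -f $path;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Demo in a throwaway directory (file names are assumed).
my $demo_dir = tempdir( CLEANUP => 1 );
for my $name (qw(1234.txt 345.txt 235.txt)) {
    open( my $out, '>', "$demo_dir/$name" ) or die "Cannot create $demo_dir/$name: $!";
    print {$out} "content of $name\n";
    close($out);
}
print slurp_named_file( $demo_dir, '235.txt' );
```

Comparing with eq rather than a regex avoids surprises when the file name contains regex metacharacters such as the dot.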

Related

In Perl, how can I filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my @files;
my $files;
opendir(DIR,$path) or die "$!";
@files = grep { /\.log$/} readdir(DIR);
my @out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (@files) {
open (FILE, "$files");
my @line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
A few things I'd suggest to improve this code:
Declare your loop iterators within the loop: foreach my $file ( @files ) {
Use three-argument open: open ( my $input_fh, "<", $filename );
Use glob rather than opendir followed by grep: foreach my $file ( <$path/*.log> ) {
grep is good for extracting things into arrays. Your grep reads the whole file just to print it, which isn't necessary, although it doesn't matter much if the file is short.
perltidy is great for reformatting code.
You call opendir on $outpath and then reuse OUT as an output filehandle opened on ">>$out", but $out is never assigned, so that isn't going to work.
opendir only reads directory listings; it isn't valid for output. To write a separate output file for each log, you need to open a new file inside $outpath for each one.
readdir gives you bare filenames, not full paths, so you might be in the wrong place to actually open the files. Prepending the path name or doing a chdir are possible solutions, but that's one of the reasons I like glob: it returns the path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding @Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one-off scripts more quickly and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\@ARGV);
sub run {
my $argv = shift;
unless (@$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
# $outfh is only defined if at least one line matched
if ($outfh) {
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
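When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will also match just before a trailing newline. That is why the script uses \z, which matches only at the very end of the string.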
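The difference between the two anchors can be seen in a small sketch. The file name below is made up for illustration.

```perl
use strict;
use warnings;

# A file name with a trailing newline is legal on Unix file systems.
my $odd_name = "report.log\n";

# $ also matches just before a final newline; \z matches only at the very end.
my $dollar_matches = $odd_name =~ /[.]log$/  ? 1 : 0;
my $z_matches      = $odd_name =~ /[.]log\z/ ? 1 : 0;

print "\$ anchor matched:  $dollar_matches\n";
print "\\z anchor matched: $z_matches\n";
```

Here the $ version accepts the odd name while the \z version rejects it.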

perl not able to delete a file using Unlink

I am using a perl script that takes directory name as input from user and searches files in it. After searching file it reads the contents of file. If file contents contain a word "cricket" then using unlink function I should be able to delete the file. But using unlink the file that contains the word "cricket" still exists in the directory after execution of the code. Please help. My code is:
use strict;
use warnings;
use File::Basename;
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open(my $fh, '<', $filepath) or die "unable to open the $file $!";
my $count = 0;
while (my $row = <$fh>) {
chomp $row;
if ($row =~ /cricket/) {
$count++;
}
}
print "$count";
if ($count == 0) {
chomp($filepath);
unlink $filepath;
print " $filepath deleted";
}
}
By your test if($count==0) {...} you'll only delete files if they don't contain "cricket". It should work as you describe if you change it to if($count) {...}.
Additionally you're creating the filepath by concatenating the dir and file names in a manner that will only work if the dir name the user entered includes a trailing slash (${dir}${file}): this would be less error-prone as $dir/$file, or, if you wanted to go to town:
use File::Spec;
my $filepath = File::Spec->catfile($dir, $file);
Additionally, as the comments point out, you're not closing the open file handle, whether or not you try to delete the file. This is bad practice, although on Linux at least the deletion should still work. Call close($fh) before your deletion test.
Note also that "cricket" is case-sensitive so files with "Cricket" won't be deleted. Use $row =~ /cricket/i for case-insensitive search.
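With those changes applied, the whole thing might be sketched as below. The helper name delete_files_containing and the demo files are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: delete every plain file in $dir whose content
# matches $word (case-insensitively) and return the deleted names.
sub delete_files_containing {
    my ( $dir, $word ) = @_;
    my @deleted;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    for my $file ( sort readdir($dh) ) {
        next if $file =~ /^\./;
        my $path = File::Spec->catfile( $dir, $file );
        next unless -f $path;

        open( my $fh, '<', $path ) or die "Unable to open $path: $!";
        my $found = 0;
        while ( my $row = <$fh> ) {
            if ( $row =~ /\Q$word\E/i ) { $found = 1; last; }
        }
        close($fh);    # close the handle before trying to delete

        next unless $found;
        if ( unlink $path ) {
            push @deleted, $file;
        }
        else {
            warn "Could not delete $path: $!";
        }
    }
    closedir($dh);
    return @deleted;
}

# Demo in a throwaway directory (file names and contents are made up).
my $demo_dir = tempdir( CLEANUP => 1 );
my %content = ( 'a.txt' => "I like Cricket\n", 'b.txt' => "I like chess\n" );
for my $name ( sort keys %content ) {
    my $path = File::Spec->catfile( $demo_dir, $name );
    open( my $out, '>', $path ) or die "Cannot create $path: $!";
    print {$out} $content{$name};
    close($out);
}
my @gone = delete_files_containing( $demo_dir, 'cricket' );
print "deleted: @gone\n";
```

Checking the return value of unlink makes a failed deletion (for example, a permissions problem) visible instead of silently printing "deleted".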

Perl to find the extension of file

I have a program that takes directory name as input from user and searches all files inside the directory and prints the contents of file. Is there any way so that I can read the extension of file and read the contents of file that are of specified extension? For example, it should read contents of file that is in ".txt" format.
My code is
use strict;
use warnings;
use File::Basename;
#usr/bin/perl
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while ( my $file = readdir(DIR) ) {
next if ( $file =~ m/^\./ );
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open( my $fh, '<', $filepath ) or die "unable to open the $file $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
}
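To get just the ".txt" files, you can combine a file test operator (-f: regular file) with a regex. Note that readdir takes the directory handle (DIR in your code), not the directory name, and that it returns bare names, so the file test needs the directory prepended:
my @files = grep { -f "$dir/$_" && /\.txt$/ } readdir DIR;
Otherwise, you can look for just text files, using Perl's -T file test operator, which guesses from the content whether a file is text:
my @files = grep { -T "$dir/$_" } readdir DIR;
Or you can try this, which returns full paths:
my @files = grep {-f} glob("$dir/*.txt");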
You're pretty close here. You have a main loop that looks like this:
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
# do stuff
}
See where you're skipping loop iterations if the filename starts with a dot. That's an excellent place to put any other skip requirements that you have - like skipping files that don't end with '.txt'.
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
next unless $file =~ /\.txt$/i; # skip non-text files
# do stuff
}
In the same way as your original test checked for the start of the string (^) followed by a literal dot (\.), we're now searching for a dot (\.) followed by txt and the end of the string ($). Note that I've also added the /i option to the match operator to make the match case-insensitive - so that we match ".TXT" as well as ".txt".
It's worth noting that the extension of a file is a terrible way to work out what the file contains.
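If you would rather pull the extension out of the name than match it with a hand-written regex, a sketch using fileparse from the core File::Basename module could look like this. The helper name extension_of and the sample names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: return the lower-cased extension of a file name
# (without the dot), or an empty string when there is none.
sub extension_of {
    my ($name) = @_;
    # The third return value is the part matched by the suffix pattern.
    my ( undef, undef, $suffix ) = fileparse( $name, qr/\.[^.]*/ );
    $suffix =~ s/^\.//;
    return lc $suffix;
}

my @names = ( 'notes.txt', 'REPORT.TXT', 'archive.tar.gz', 'README' );
my @text  = grep { extension_of($_) eq 'txt' } @names;
print "$_\n" for @text;
```

Lower-casing the extension gives the same case-insensitive behaviour as the /i flag in the loop above.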
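Try this. The code below should give what you expect.
use warnings;
use strict;
print "Enter the directory name: ";
chomp( my $dir = <STDIN> );
print "Enter the file extension type: ";    # only type the extension, e.g. txt or rtf
chomp( my $ext = <STDIN> );
opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
# keep only names that end in a literal dot followed by the extension
my @files = grep { /\.\Q$ext\E\z/ } readdir($dh);
closedir($dh);
foreach my $name (@files) {
    open( my $fh, '<', "$dir/$name" ) or die "Cannot open $dir/$name: $!";
    print <$fh>;
    close($fh);
}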
I store all the file names from the directory in one array, keeping only those with the requested extension by using grep. Then I open each file inside the foreach loop and print it.

Script to remove first line from all the text files in a directory

I'm trying to write a Perl script which reads all the text files in a directory and writes all the lines except first to a separate file. If there are 3 files, I want the script to read all those 3 files and write 3 new files with same lines except the first. This is what I wrote.. but when I try to run the script, it executes fine with no errors but doesn't do the work it is supposed to. Can someone please look into it?
opendir (DIR, "dir\\") or die "$!";
my @files = grep {/*?\.txt/} readdir DIR;
close DIR;
my $count=0;
my $lc;
foreach my $file (@files) {
$count++;
open(FH,"dir\\$file") or die "$!";
$str="dir\\example_".$count.".txt";
open(FH2,">$str");
$lc=0;
while($line = <FH>){
if($lc!=0){
print FH2 $line;
}
$lc++;
}
close(FH);
close(FH2);
}
And the second file doesn't exist yet; it is supposed to be created by the script.
Try changing these lines
opendir (DIR, "dir\\") or die "$!";
...
close DIR;
to
opendir (DIR, "dir") or die "$!";
...
closedir DIR;
I tried running your code locally and the only two issues I had were with the directory name containing the trailing slash and trying to use the filehandle close() function on a dirhandle.
If you have the list of files ...
my $counter = 0;
foreach my $file (@files) {
    open my $infile,  '<', "dir/$file" or die "$!";
    open my $outfile, '>', "dir/example_" . ++$counter . '.txt' or die "$!";
    <$infile>;    # Skip first line.
    while (<$infile>) {
        print $outfile $_;
    }
}
The lexical filehandles will be closed automatically when going out of scope.
Not sure why you're using $count here, as that's going to just turn a list of files like:
01.txt
bob.txt
alice.txt
02.txt
into:
example_1.txt
example_2.txt
example_3.txt
example_4.txt
Keep in mind, @files isn't being sorted, so the files come back in whatever order the directory stores them. On some filesystems, if you were to delete and re-create the file 01.txt, it would move to the end of the list, so the same numbers would now belong to different files:
example_1.txt (from bob.txt)
example_2.txt (from alice.txt)
example_3.txt (from 02.txt)
example_4.txt (from 01.txt)
Since that wasn't really part of your original question, this does exactly what you asked to do:
#!/usr/bin/perl
while(<*.txt>) { # for every file in the *.txt glob from the current directory
open(IN, $_) or die ("Cannot open $_: $!"); # open file for reading
my #in = <IN>; # read the contents into an array
close(IN); # close the file handle
shift @in; # remove the first element from the array
open(OUT, ">$_.new") or die ("Cannot open $_.new: $!"); # open file for writing
print OUT @in; # write the contents of the array to the file
close(OUT); # close the file handle
}
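For larger files, the same idea can be sketched without reading each file into memory, using lexical filehandles and a sorted file list so the processing order is predictable. The helper name strip_first_lines and the demo file are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: for each *.txt file in $dir, write a copy without
# its first line to "<name>.new". Returns the paths that were written.
sub strip_first_lines {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    my @names = sort grep { /\.txt\z/ } readdir($dh);
    closedir($dh);    # closedir, not close, for directory handles

    my @written;
    for my $name (@names) {
        my $in_path  = "$dir/$name";
        my $out_path = "$in_path.new";
        open( my $in,  '<', $in_path )  or die "Cannot open $in_path: $!";
        open( my $out, '>', $out_path ) or die "Cannot open $out_path: $!";
        my $first = <$in>;    # read and discard the first line
        while ( my $line = <$in> ) {
            print {$out} $line;
        }
        close($in);
        close($out) or die "Cannot close $out_path: $!";
        push @written, $out_path;
    }
    return @written;
}

# Demo in a throwaway directory (file name and content are made up).
my $demo_dir = tempdir( CLEANUP => 1 );
open( my $seed, '>', "$demo_dir/sample.txt" ) or die "Cannot create sample: $!";
print {$seed} "header\nfirst\nsecond\n";
close($seed);
print "wrote $_\n" for strip_first_lines($demo_dir);
```

Because the output names end in .new rather than .txt, running the helper a second time does not pick up its own output.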

Perl program help on opendir and readdir

So I have a program that I want to clean some text files. The program asks for the user to enter the full pathway of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (that is specified by the user), and then clean them in the way I need. I have already written the script to clean the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?
I'm not certain what you want, so I made some assumptions:
When you say clean the text file, you meant delete the text file
The names of the files you want to write into are formed by a pattern.
So, if I'm right, try something like this:
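chomp( my $user_supplied_directory = <STDIN> );
opendir( my $dh, $user_supplied_directory )
    or die "Cannot open $user_supplied_directory: $!";
my @dir = readdir $dh;
closedir $dh;
foreach my $name (@dir) {
    # readdir returns bare names, so build the full path
    my $old = "$user_supplied_directory/$name";
    next unless -f $old;    # skips '.', '..' and subdirectories
    # Reads the whole content of the original file
    open( my $in, '<', $old ) or die "Cannot open $old: $!";
    my $contents = do { local $/; <$in> };
    close $in;
    # Here you supply the new filename
    my $new_filename = "$old.new";
    # Writes the content to the new file
    open( my $out, '>', $new_filename ) or die "Cannot open $new_filename: $!";
    print {$out} $contents;
    close $out;
    # Deletes the old file
    unlink $old or warn "Cannot delete $old: $!";
}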
I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach my $file (@files) {
foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "@dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
our @Perls = ();
{
my $perl_rx = qr { ^ perl [\d.] + $ }x;
for my $dir (split(/:/, $ENV{PATH})) {
### scanning: $dir
my $relative = ($dir =~ m{^/});
my $dirpath = $relative ? $dir : "$cwd/$dir";
unless (chdir($dirpath)) {
warn "can't cd to $dirpath: $!\n";
next;
}
opendir(my $dot, ".") || next;
while ($_ = readdir($dot)) {
next unless /$perl_rx/o;
### considering: $_
next unless -f;
next unless -x _;
### saving: $_
push @Perls, "$dir/$_";
}
}
}
{
my $two_dots = qr{ [.] .* [.] }x;
if (grep /$two_dots/, @Perls) {
@Perls = grep /$two_dots/, @Perls;
}
}
{
my (%seen, $dev, $ino);
@Perls = grep {
($dev, $ino) = stat $_;
! $seen{$dev, $ino}++;
} @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.
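That point in isolation, as a minimal sketch: the helper name plain_files_in and the demo directory layout are assumptions for illustration and are not part of the program above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the full paths of the plain files in $dir.
sub plain_files_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    # readdir yields names such as "a.txt", "." and "..", never "$dir/a.txt",
    # so the directory is prepended before the -f file test is applied.
    my @paths = grep { -f $_ } map { "$dir/$_" } readdir($dh);
    closedir($dh);
    @paths = sort @paths;
    return @paths;
}

# Demo in a throwaway directory containing one file and one subdirectory.
my $demo_dir = tempdir( CLEANUP => 1 );
mkdir "$demo_dir/subdir" or die "Cannot create subdir: $!";
open( my $seed, '>', "$demo_dir/a.txt" ) or die "Cannot create a.txt: $!";
print {$seed} "some text\n";
close($seed);
print "$_\n" for plain_files_in($demo_dir);
```

Without the map step, -f would test the bare names against the current working directory and usually find nothing.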
You can do the following, which allows the user to supply their own directory or, if no directory is specified by the user, it defaults to a designated location.
The example shows the use of opendir and readdir. It stores the full path of every entry in the directory in the @files array, and only those that end with '.txt' in the @keys array. The while loop ensures that the full path to each file is stored in the arrays.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;
my $dir = shift || "/some/default/directory";
opendir(my $dh, $dir) || die "Can't open $dir: $!";
while ( readdir $dh ) {
push( @files, "$dir/$_" );
}
closedir($dh);
# store ".txt" files in new array
foreach $file ( @files ) {
push( @keys, $file ) if $file =~ /\.txt\z/;
}
# Move files to new location, even if it's across different devices
for ( @keys ) {
move( $_, "/some/other/directory/" ) or die "Couldn't move $_: $!\n";
}
See the perldoc of File::Copy for more info.