How do I read multiple directories and the contents of their subdirectories in Perl?

I have a folder, and inside that I have many subfolders. In those subfolders I have many .html files to be read. I have written the following code to do that. It opens the parent folder and also the first subfolder, but it prints only one .html file and then shows this error:
NO SUCH FILE OR DIRECTORY
I don't want to change the entire code. Any modifications to the existing code would be good for me.
use FileHandle;
opendir PAR_DIR,"D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
next if(-d $sub_folders);
opendir SUB_DIR,"D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
while(our $file = readdir(SUB_DIR))
{
next if($file !~ m/\.html/i);
print_file_names($file);
}
close(FUNC_MODEL1);
}
close(FUNC_MODEL);
sub print_file_names()
{
my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
or die "ERROR: $!"; #ERROR HERE
print("$file\n");
}

Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
print "$file\n";
}
I mean isn't that sexy?!
A commenter noted that you may want only depth-2 entries. Adding mindepth(2) and maxdepth(2):
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->mindepth(2)->maxdepth(2)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
print "$file\n";
}
applies that restriction.

You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names
{
my $file = shift;
...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ..., which skips the inner loop for directories; that is the complete opposite of what you require. The only reason it works at all is that you're testing $file, which is only the filename relative to the path, not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;
my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
next if ($sub_folders =~ /^\.\.?$/); # skip . and ..
my $path = $parent . '/' . $sub_folders;
next unless (-d $path); # skip anything that isn't a directory
opendir($sub_dir, $path);
while (my $file = readdir($sub_dir)) {
next unless $file =~ /\.html?$/i;
my $full_path = $path . '/' . $file;
print_file_names($full_path);
}
closedir($sub_dir);
}
closedir($par_dir);
sub print_file_names
{
my $file = shift;
my $fh1 = FileHandle->new($file)
or die "ERROR: $!"; #ERROR HERE
print("$file\n");
}

Please start putting:
use strict;
use warnings;
at the top of all your scripts, it will help you avoid problems like this and make your code much more readable.
You can read more about it here: Perlmonks

You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $top = $ENV{TEMP};
find( { wanted => \&wanted, no_chdir=> 1 }, $top );
sub wanted {
return unless -f and /\.html$/i;
print $_, "\n";
}
__END__

Have you considered using
File::Find

Here's one method which does not require File::Find:
First, open the root directory and store all the sub-folders' names in an array using readdir.
Then use a foreach loop. For each sub-folder, open the new directory by joining the root directory and the folder's name, and again use readdir to store the file names in an array.
The last step is to write the code for processing the files inside this foreach loop, as in the sketch below.
Special thanks to my teacher who gave me this idea :) It really worked well!
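A minimal sketch of that approach (untested; the root path and the processing step are placeholders):
use strict;
use warnings;

my $root = 'parent_directory';    # placeholder: your root folder

# Step 1: collect the sub-folder names from the root directory
opendir my $root_dh, $root or die "Cannot open $root: $!";
my @subfolders = grep { !/^\.\.?$/ && -d "$root/$_" } readdir $root_dh;
closedir $root_dh;

# Step 2: open each sub-folder in turn and collect its file names
foreach my $folder (@subfolders) {
    my $path = "$root/$folder";
    opendir my $dh, $path or die "Cannot open $path: $!";
    my @files = grep { -f "$path/$_" } readdir $dh;
    closedir $dh;

    # Step 3: process the files inside this loop
    print "$path/$_\n" for @files;
}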

Related

readdir() attempted on invalid dirhandle $par_dir

I am just trying to execute a Perl script inside multiple folders, but I don't understand why I have a problem with readdir() attempted on invalid dirhandle $par_dir. $parent is printed fine, but $par_dir is printed as "GLOB(0x17e7a68)".
Any idea of why it is happening? Thanks a lot!
Here the code:
#!/usr/bin/perl
use warnings;
use Cwd;
use FileHandle;
use File::Glob;
my $parent = "/media/sequentia/NAS/projects/131-prgdb3/01- DATA/All_plant_genomes_proteomes";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
print $parent."\n";
print $par_dir."\n";
while (my $sub_folders = readdir($par_dir)) {
next if ($sub_folders =~ /^..?$/); # skip . and ..
my $path = $parent . '/' . $sub_folders;
next unless (-d $path); # skip anything that isn't a directory
print $path."\n";
chdir($path) or die;
@files = glob( $path. '/*' );
foreach $filename (@files){
print $filename ."\n";
system ("grep 'comment' PutativeGenes.txt | wc -l");
system ("grep 'class' PutativeGenes.txt | wc -l");
}
}
closedir($par_dir);
The problem is probably that the directory you specify in $parent doesn't exist. You must always check to make sure that a call to open or opendir succeeded before going on to use the handle.
That path step 01- DATA is suspicious. I would expect 01-DATA, or perhaps 01- DATA with a single space, but multiple spaces are rarely used because they are invisible and difficult to count.
Here are some other thoughts on your program:
You must always use strict and use warnings 'all' at the top of every Perl program you write. That will alert you to many simple errors that you may otherwise overlook.
Your statement next if ( $sub_folders =~ /^..?$/ ) is wrong because the dots must be escaped: it should be /^\.\.?$/. As it is, you are discarding any name that is one or two characters in length.
If your path really does contain spaces then you need to use File::Glob ':bsd_glob', as otherwise the spaces will be treated as separators between multiple glob patterns.
You execute the foreach loop for every file or directory found in $path, but your system calls aren't affected by the name of that file, so you're making the same call multiple times.
It's worth noting that glob will do all the directory searching for you. I would write something like this
#!/usr/bin/perl
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my $parent_dir = "/media/sequentia/NAS/projects/131-prgdb3/01-DATA/All_plant_genomes_proteomes";
print "$parent_dir\n";
while ( my $path = glob "$parent_dir/*" ) {
next unless -d $path;
print "$path\n";
chdir $path or die qq{Unable to chdir to "$path": $!};
while ( my $filename = glob "$path/*" ) {
next unless -f $filename;
print "$filename\n";
system "grep 'comment' PutativeGenes.txt | wc -l";
system "grep 'class' PutativeGenes.txt | wc -l";
}
}
opendir() is probably failing and handing you the invalid dirhandle (most likely because you are trying to open a nonexistent $parent directory).
If opendir fails it returns false, and $par_dir is left unchanged as undef. If you attempt to call readdir() on an undefined dirhandle you will get a runtime warning like:
readdir() attempted on invalid dirhandle at ...
Therefore you should always check the return code from opendir. For example, you can do:
opendir($par_dir, $parent) or die "opendir() failed: $!";
or see more suggestions on what to do in this link: Does die have to be used if opening a file fails?
Note that your code could have been simplified using File::Find::Rule, for example:
my @dirs = File::Find::Rule
->directory->maxdepth(1)->mindepth(1)->in( $parent );
for my $dir (@dirs) {
say "$dir";
my @files = File::Find::Rule->file->maxdepth(1)->in( $dir );
say "--> $_" for @files;
}
Alternatively, if you don't need the directory names:
my @files = File::Find::Rule
->file->maxdepth(2)->mindepth(2)->in( $parent );
say for @files;

List content of a directory except hidden files in Perl

My code displays all files within the directory, but I need it not to display hidden files such as "." and "..".
opendir(D, "/var/spool/postfix/hold/") || die "Can't open directory: $!\n";
while (my $f = readdir(D))
{
print "MailID :$f\n";
}
closedir(D);
It sounds as though you might be wanting to use the glob function rather than readdir:
while (my $f = </var/spool/postfix/hold/*>) {
print "MailID: $f\n";
}
<...> is an alternate way of globbing; you can also just use the function directly:
while (my $f = glob "/var/spool/postfix/hold/*") {
This will automatically skip the hidden files.
Just skip the files you don't want to see:
while (my $f = readdir(D))
{
next if $f eq '.' or $f eq '..';
print "MailID :$f\n";
}
On a Linux system, "hidden" files and folders are those starting with a dot.
It is best to use lexical directory handles (and file handles).
It is also important to always use strict and use warnings at the start of every Perl program you write.
This short program uses a regular expression to check whether each name starts with a dot.
use strict;
use warnings;
opendir my $dh, '/var/spool/postfix/hold' or die "Can't open directory: $!\n";
while ( my $node = readdir($dh) ) {
next if $node =~ /^\./;
print "MailID: $node\n";
}

Perl - A way to get only the first (.txt) filename from another directory without loading them all?

I have a directory that holds ~5000 .txt files of roughly 2,400 bytes each.
I just want one filename from that directory; order does not matter.
The file will be processed and deleted.
This is not the scripts working directory.
The intention is:
to open that file,
read it,
do some stuff,
unlink it and then
loop to the next file.
My crude attempt does not restrict itself to .txt files, and it also has to fetch all ~5000 filenames just to get one. I am also possibly loading too many modules?
The Verify_Empty sub was intended to validate that there is a directory and there are files in it, but my attempts are failing, so here I am seeking assistance.
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use CGI ':standard';
print CGI::header();
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
###
use vars qw(@Files $TheFile $PathToFile);
my $ListFolder = CGI::param('openthisfolder');
Get_File($ListFolder);
###
sub Get_File{
$ListFolder = shift;
unless (Verify_Empty($ListFolder)) {
opendir(DIR,$ListFolder);
@Files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
closedir(DIR);
foreach (@Files) {
$TheFile = $_;
}
#### This is where I go off to process and unlink file (sub not here) ####
$PathToFile = $ListFolder.'/'.$TheFile;
OpenFileReadPrepare($PathToFile);
#### After unlinked, the OpenFileReadPrepare sub loops back to this script.
}
else {
print qq~No more files to process~;
exit;
}
exit;
}
####
sub Verify_Empty {
$ListFolder = shift;
opendir(DIR, $ListFolder) or die "Not a directory";
return scalar(grep { $_ ne "." && $_ ne ".." } readdir(DIR)) == 0;
closedir(DIR);
}
Obviously I am very new at this. This method seems quite "hungry"?
Seems like a lot to grab one filename and process it!
Guidance would be great!
EDIT -Latest Attempt
my $dir = '..';
my @files = glob "$dir/*.txt";
for (0..$#files){
$files[$_] =~ s/\.txt$//;
}
my $PathAndFile = $files[0].'.txt';
print qq~$PathAndFile~;
This "works" but, it still gets all the filenames. None of the examples here, so far, have worked for me. I guess I will live with this for today until I figure it out. Perhaps I will revisit and see if anyone came up with anything better.
You could loop using readdir inside a while loop. That way readdir won't return all the files at once but will give them to you one at a time:
# opendir(DIR, ...);
my $first_file = "";
while (my $file = readdir(DIR)) {
next if $file eq "." or $file eq "..";
$first_file = $file;
last;
}
print "$first_file\n"; # first file in directory
You're calling readdir in list context, which returns all of the directory entries. Call it in scalar context instead:
my $file;
while( my $entry = readdir DIR ) {
$file = $entry, last if $entry =~ /\.txt$/;
}
if ( defined $file ) {
print "found $file\n";
# process....
}
Additionally, you read the directory twice; once to see if it has any entries, then to process it. You don't really need to see if the directory is empty; you get that for free during the processing loop.
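A sketch combining both points, reusing the question's $ListFolder and DIR handle (untested):
opendir(DIR, $ListFolder) or die "Cannot open $ListFolder: $!";
my $file;
while ( my $entry = readdir DIR ) {
    next unless $entry =~ /\.txt$/i;
    $file = $entry;    # first .txt entry found
    last;              # stop reading the directory immediately
}
closedir(DIR);

if ( defined $file ) {
    # process and unlink "$ListFolder/$file" here
}
else {
    print qq~No more files to process~;    # the emptiness check comes for free
}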
Unless I am greatly mistaken, what you want is just to iterate over the files in a directory, and all this about "first or last" and "order does not matter" and deleting files is just confusion about how to do this.
So, let me put it in a very simple way for you, and see if that actually does what you want:
my $directory = "somedir";
for my $file (<$directory/*.txt>) {
# do stuff with the files
}
The glob will do the same as a *nix shell would, it would list the files with the .txt extension. If you want to do further tests on the files inside the loop, that is perfectly fine.
The downside is keeping 5000 file names in memory, and also that if processing this file list takes time, there is a possibility that it conflicts with other processes that also access these files.
An alternative is to simply read the files with readdir in a while loop, as mpapec mentioned in his answer. The benefit is that each time you read a new file name, the file will be there. Also, you won't have to keep a large list of files in memory.

Perl finding a file based off it's extension through all subdirectories

I have a working segment of code that finds all of the .txt files in a given directory, but I can't get it to look in the subdirectories.
I need my script to do two things
scan through a folder and all of its subdirectories for a text file
print out just the last segments of its path
For example, I have a directory structured
C:\abc\def\ghi\jkl\mnop.txt
My script points to the path C:\abc\def\. It should then go through each of the subfolders and find mnop.txt and any other text file in those folders.
It then prints out ghi\jkl\mnop.txt
I am using the code below, but it only prints out the file name, and only if the file is directly in that directory.
opendir(Dir, $location) or die "Failure Will Robertson!";
@reports = grep(/\.txt$/,readdir(Dir));
foreach $reports (@reports)
{
my $files = "$location/$reports";
open (res,$files) or die "could not open $files";
print "$files\n";
}
I believe this solution is simpler and easier to read. I hope it is helpful!
#!/usr/bin/perl
use File::Find::Rule;
my @files = File::Find::Rule->file()
->name( '*.txt' )
->in( '/path/to/my/folder/' );
for my $file (@files) {
print "file: $file\n";
}
What about using File::Find?
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;
# for example let location be tmp
my $location="tmp";
sub find_txt {
my $F = $File::Find::name;
if ($F =~ /txt$/ ) {
print "$F\n";
}
}
find({ wanted => \&find_txt, no_chdir=>1}, $location);
Much easier if you just use File::Find core module:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;
my $Target = shift;
find(\&survey, @ARGV);
sub survey {
print "Found $File::Find::name\n" if ($_ eq $Target)
}
First argument: pathless name of file to search for. All subsequent arguments are directories to check. File::Find searches recursively, so you only need to name the top of a tree, all subdirectories will automatically be searched as well.
$File::Find::name is the full pathname of the file, so you could subtract your $location from that if you want a relative path.
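For instance, a brief sketch of deriving that relative path (hypothetical; $location is assumed to hold the top directory, and File::Spec is a core module):
use File::Spec;

# inside the wanted callback: $File::Find::name holds the full path
my $relative = File::Spec->abs2rel( $File::Find::name, $location );

# or, more crudely, strip the prefix with a regex
( my $rel = $File::Find::name ) =~ s{^\Q$location\E/?}{};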

Recursive Perl detail need help

I think this is a simple problem, but I have been stuck on it for some time now! I need a fresh pair of eyes on this.
The thing is, I have this code in Perl:
#!c:/Perl/bin/perl
use CGI qw/param/;
use URI::Escape;
print "Content-type: text/html\n\n";
my $directory = param ('directory');
$directory = uri_unescape ($directory);
my @contents;
readDir($directory);
foreach (@contents) {
print "$_\n";
}
#------------------------------------------------------------------------
sub readDir(){
my $dir = shift;
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
if(-d $dir.$file)
{
#print $dir.$file. " ----- DIR\n";
readDir($dir.$file);
}
push #contents, ($dir . $file);
}
closedir(DIR);
}
I've tried to make it recursive. I need to get all the files of all the directories and subdirectories, with the full path, so that I can open the files in the future.
But my output only returns the files in the current directory and the files in the first directory that it finds. If I have 3 folders inside the directory, it only shows the first one.
Ex. of cmd call:
"perl readDir.pl directory=C:/PerlTest/"
Thanks
Avoid wheel reinvention, use CPAN.
use Path::Class::Iterator;
my $it = Path::Class::Iterator->new(
root => $dir,
breadth_first => 0
);
until ($it->done) {
my $f = $it->next;
push @contents, $f;
}
Make sure that you don't let people set $dir to something that will let them look somewhere you don't want them to look.
Your problem is the scope of the directory handle DIR. DIR has global scope, so each recursive call to readDir uses the same DIR; when you closedir(DIR) and return to the caller, the caller does a readdir on a closed directory handle and everything stops. The solution is to use a local directory handle:
sub readDir {
my ($dir) = @_;
opendir(my $dh, $dir) or die $!;
while(my $file = readdir($dh)) {
next if($file eq '.' || $file eq '..');
my $path = $dir . '/' . $file;
if(-d $path) {
readDir($path);
}
push(@contents, $path);
}
closedir($dh);
}
Also notice that you would be missing a directory separator if (a) it wasn't at the end of $directory or (b) on every recursive call. AFAIK, slashes will be internally converted to backslashes on Windows but you might want to use a path mangling module from CPAN anyway (I only care about Unix systems so I don't have any recommendations).
I'd also recommend that you pass a reference to @contents to readDir rather than leaving it as a global variable; there are fewer errors and less confusion that way. And don't use parentheses on sub definitions unless you know exactly what they do and what they're for. Some sanity checking and scrubbing on $directory would be a good idea as well.
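A sketch of those two suggestions combined (untested): the sub takes an array reference instead of touching a global, and File::Spec->catfile (a core module) builds the paths; $directory is assumed to be the CGI param from the question.
use File::Spec;

sub read_dir {
    my ($dir, $contents) = @_;    # $contents is an array reference
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    while ( my $file = readdir $dh ) {
        next if $file eq '.' || $file eq '..';
        my $path = File::Spec->catfile( $dir, $file );
        read_dir( $path, $contents ) if -d $path;
        push @$contents, $path;
    }
    closedir $dh;
}

my @contents;
read_dir( $directory, \@contents );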
There are many modules that are available for recursively listing files in a directory.
My favourite is File::Find::Rule
use strict ;
use Data::Dumper ;
use File::Find::Rule ;
my $dir = shift ; # get directory from command line
my @files = File::Find::Rule->in( $dir );
print Dumper( \@files );
Which sends a list of files into an array ( which your program was doing).
$VAR1 = [
'testdir',
'testdir/file1.txt',
'testdir/file2.txt',
'testdir/subdir',
'testdir/subdir/file3.txt'
];
There are loads of other options, like only listing files with particular names. Or you can set it up as an iterator, which is described in How can I use File::Find in Perl?
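For instance, a sketch of the iterator form, mirroring the start/match pattern shown near the top of this page (untested; $dir as above):
use File::Find::Rule;

my $finder = File::Find::Rule->file->name('*.txt')->start( $dir );
while ( my $file = $finder->match ) {
    print "$file\n";
}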
If you want to stick to modules that come with Perl Core, have a look at File::Find.