Perl recursive code for scanning directory tree

In this script that scans a directory recursively, I would like to know what happens when ScanDirectory($name) is called: does the next get executed right after?
Because if @names gets populated with new directories after each loop, then we go inside the first directory in @names, and if there are other directories in there ScanDirectory is called again, but are the other directories in the previous @names replaced, so that they are never handled by the loop? Sorry if I don't make sense.
I know there is already a module for this purpose, but I want to improve my understanding of how this loop works so I can deal with recursive code in other situations.
sub ScanDirectory {
    my $workdir = shift;
    my $startdir = cwd;
    chdir $workdir or die;
    opendir my $DIR, '.' or die;
    my @names = readdir $DIR or die;
    closedir $DIR;
    foreach my $name (@names) {
        next if ($name eq ".");
        next if ($name eq "..");
        if (-d $name) {
            ScanDirectory($name);
            next;
        }
    }
    chdir $startdir or die;
}
ScanDirectory('.');

Is this your code?
In the subroutine you call my @names = readdir, which declares a new lexically scoped variable, so each recursive call creates a new instance of that variable. It might work if you used our instead of my: variables declared with our are package scoped, which means each call would use the same @names variable. Actually, not even then, because your readdir wipes out the previous contents of the variable.
You'll be better off using File::Find. File::Find comes with every Perl installation, so it's always available.
use strict;
use warnings;
use File::Find;

my @names;
find( sub {
        return if $_ eq "." or $_ eq "..";   # use return, not next, inside a subroutine
        push @names, $File::Find::name;
    }, "."
);
This is simpler to understand, easier to write, more flexible, and much more efficient, since it doesn't call itself recursively. Most of the time, you'll see this written without the sub embedded in the find call:
my @names;
find( \&wanted, "." );

sub wanted {
    return if $_ eq "." or $_ eq "..";   # use return, not next, inside a subroutine
    push @names, $File::Find::name;
}
I prefer to embed the subroutine if it is fairly small. It prevents the subroutine from wandering away from the find call, and it prevents the mysterious appearance of @names being used in the subroutine without a clear definition.
Okay, they're both the same: both are subroutine references (one is called wanted and one is anonymous). However, in the first version the use of @names doesn't appear so mysterious, since it's literally defined on the line right above the find call.
If you must write your own routine from scratch (maybe a homework assignment?), then don't use recursion. Use push to push the reversed readdir listing onto an array.
Then, pop items off the array one at a time. If you find a directory, read it (again in reverse) and push its entries onto your array. Be careful with . and .. (a sketch of this approach follows).
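Here is a minimal sketch of that stack-based approach, assuming the goal is simply to print every path under a starting directory (the starting directory '.' and the printing are my assumptions, not part of the answer above):
use strict;
use warnings;

my @stack = ('.');                    # paths still to be processed
while (@stack) {
    my $path = pop @stack;
    print "$path\n";
    next unless -d $path;             # only directories need to be read
    opendir my $dh, $path or die "Cannot open $path: $!";
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;
    # push in reverse sorted order so pop returns entries alphabetically
    push @stack, map { "$path/$_" } reverse sort @entries;
}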

This is strangely written code, especially if it is published in a book.
Your confusion arises because the @names array is declared lexically, which means it exists only for the extent of the current block and is unique to a particular stack frame (subroutine call). So each call of scan_directory (local identifiers shouldn't really contain capital letters) has its own independent @names array, which vanishes when the subroutine exits, and there is no question of "replacing" its contents.
Also, the next you're referring to is redundant: it skips to the next iteration over the @names array, which is exactly what would happen without it.
It would be much better written like this:
sub scan_directory {
    my ($workdir) = @_;
    my $startdir = cwd;
    chdir $workdir or die $!;
    opendir my $dh, '.' or die $!;
    while (my $name = readdir $dh) {
        next if $name eq '.' or $name eq '..';
        scan_directory($name) if -d $name;
    }
    chdir $startdir or die $!;
}
scan_directory('.');
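If you also want the walk to report what it found, rather than just visit each directory, one option (not part of the answer above, just a sketch) is to pass an array reference down through the recursive calls, which also sidesteps the scoping question from the original post:
use strict;
use warnings;
use Cwd;

sub scan_directory {
    my ($workdir, $found) = @_;          # $found is an array reference shared by every call
    my $startdir = cwd;
    chdir $workdir or die $!;
    opendir my $dh, '.' or die $!;
    while (my $name = readdir $dh) {
        next if $name eq '.' or $name eq '..';
        push @$found, cwd . "/$name";    # record the absolute path of every entry
        scan_directory($name, $found) if -d $name;
    }
    closedir $dh;
    chdir $startdir or die $!;
}

my @paths;
scan_directory('.', \@paths);
print "$_\n" for @paths;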


readdir() attempted on invalid dirhandle $par_dir

I am just trying to execute a Perl script inside multiple folders, but I don't understand why I get the problem readdir() attempted on invalid dirhandle $par_dir. $parent is printed fine, but $par_dir is printed as "GLOB(0x17e7a68)".
Any idea why this is happening? Thanks a lot!
Here is the code:
#!/usr/bin/perl
use warnings;
use Cwd;
use FileHandle;
use File::Glob;

my $parent = "/media/sequentia/NAS/projects/131-prgdb3/01- DATA/All_plant_genomes_proteomes";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
print $parent."\n";
print $par_dir."\n";

while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^..?$/); # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path); # skip anything that isn't a directory
    print $path."\n";

    chdir($path) or die;
    @files = glob( $path. '/*' );
    foreach $filename (@files){
        print $filename ."\n";
        system ("grep 'comment' PutativeGenes.txt | wc -l");
        system ("grep 'class' PutativeGenes.txt | wc -l");
    }
}
closedir($par_dir);
The problem is probably that the directory you specify in $parent doesn't exist. You must always check that a call to open or opendir has succeeded before going on to use the handle.
That path step 01- DATA is suspicious. I would expect 01-DATA, or perhaps 01- DATA with a single space, but multiple spaces are rarely used because they are invisible and difficult to count.
Here are some other thoughts on your program:
You must always use strict and use warnings 'all' at the top of every Perl program you write. That will alert you to many simple errors that you might otherwise overlook.
Your statement next if ( $sub_folders =~ /^..?$/ ) is wrong because the dots must be escaped, as in /^\.\.?$/. As written, you are discarding any name that is one or two characters long.
If your path really does contain spaces then you need to use File::Glob ':bsd_glob', as otherwise the spaces will be treated as separators between multiple glob patterns.
You execute the foreach loop for every file or directory found in $path, but your system calls aren't affected by the name of that file, so you're making the same calls multiple times.
It's worth noting that glob will do all the directory searching for you. I would write something like this:
#!/usr/bin/perl
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';

my $parent_dir = "/media/sequentia/NAS/projects/131-prgdb3/01-DATA/All_plant_genomes_proteomes";
print "$parent_dir\n";

while ( my $path = glob "$parent_dir/*" ) {
    next unless -d $path;
    print "$path\n";
    chdir $path or die qq{Unable to chdir to "$path": $!};

    while ( my $filename = glob "$path/*" ) {
        next unless -f $filename;
        print "$filename\n";
        system "grep 'comment' PutativeGenes.txt | wc -l";
        system "grep 'class' PutativeGenes.txt | wc -l";
    }
}
Probably opendir() is failing, which gives you the invalid dirhandle (most likely it fails because you are trying to open a nonexistent $parent directory).
If opendir fails it returns false, and $par_dir is left unchanged as undef. If you then call readdir() on an undefined dirhandle you will get a runtime warning like:
readdir() attempted on invalid dirhandle at ...
Therefore you should always check the return code of opendir. For example, you can do:
opendir($par_dir, $parent) or die "opendir() failed: $!";
or see more suggestions on what to do in this link Does die have to be used if opening a file fails?
Note that your code could have been simplified using File::Find::Rule, for example:
my @dirs = File::Find::Rule
             ->directory->maxdepth(1)->mindepth(1)->in( $parent );

for my $dir (@dirs) {
    say "$dir";
    my @files = File::Find::Rule->file->maxdepth(1)->in( $dir );
    say "--> $_" for @files;
}
Alternatively, if you don't need the directory names:
my @files = File::Find::Rule
              ->file->maxdepth(2)->mindepth(2)->in( $parent );
say for @files;

How to regex and get file and directory path

My array (@array) contains these directory and file paths:
/home/testuser/mysql/data/userdata/pushdir/
/home/testuser/mysql/data/userdata/pushdir/test1.sql
/home/testuser/mysql/data/userdata/nextdir/testdir/
/home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql
/home/testuser/mysql/data/userdata/ - from the list above, the path is constant up to this prefix.
I am trying to process the files in another loop. For that I want only the file names relative to that constant prefix, like "pushdir/test1.sql" and "pushdir/testdir/test2.sql".
I am using this code, but I am not getting the expected output. Please share your ideas on how to regex and get that output.
foreach $dir (@array)
{
    chomp $dir;
    print "$dir\n";
    @files = <$dir/*>;
    my @names = join("\n", sort(@files));
    print @names,"\n";
}
foreach my $filepath (@names) {
    (my $volume, my $dirs, my $filelist) = File::Spec->splitpath(+$filepath);
    print "$filelist\n";
}
@names is declared with my, and is therefore scoped to the inside of the foreach $dir loop only. There is no @names array to iterate over in the second foreach loop. Moreover, join
returns a string; you probably don't want that string to go into the array, you want the individual file names to go there.
Use strict (it will tell you there's no @names declared) and warnings. Indent code blocks properly so you can see which statements belong where.
#!/usr/bin/perl
use warnings;
use strict;
use File::Spec;

my @array = qw( home/testuser/mysql/data/userdata/pushdir/
                home/testuser/mysql/data/userdata/pushdir/test1.sql
                home/testuser/mysql/data/userdata/nextdir/testdir/
                home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql );

my @names;
for my $dir (@array) {
    print "DIR: $dir\n";
    push @names, sort glob "$dir/*";
    print "NAMES: @names\n";
}

for my $filepath (@names) {
    my ($volume, $dirs, $filelist) = 'File::Spec'->splitpath($filepath);
    print "FL: $filelist\n";
}

How to pass file names to a subroutine in perl?

I'm writing a perl script and I would like to pass a file name for the output file to a subroutine.
I tried something like this:
use strict;
use warnings;

test("Output.dat");

sub test {
    my $name = @_;
    open(B, ">$name") or die "Failure \n";
    print B "This is a test! \n";
    close(B);
}
I'm going to use the subroutine multiple times, so I have to pass the file name and cannot declare it within the subroutine.
I hope you can help me :)
Your problem is this line:
my $name = @_;
You are assigning an array to a scalar variable. In Perl this gives you the number of elements in the array, so I expect you're ending up with "1" in $name.
There are a number of ways to get the first element from an array:
my $name = $_[0];    # Explicitly get the first element
my $name = shift @_; # Get the first element (and remove it from the array)
my $name = shift;    # Same as the previous example - shift works on @_ by default in a subroutine
my ($name) = @_;     # The parentheses make it into a list assignment
The last two are the ones that you will see most commonly.
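As a quick demonstration of the difference between the scalar assignment in the question and the list assignment above (just an illustration, not part of the original script):
my @args   = ("Output.dat");
my $count  = @args;       # scalar context: $count is 1, the number of elements
my ($name) = @args;       # list context: $name is "Output.dat", the first element
print "$count / $name\n"; # prints "1 / Output.dat"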
A few more points:
1/ You would get a better clue to the problem if you included $name in your error message.
open(A, ">$name") or die "Failure: $name \n";
Or, even better, the error message that Perl gets from your operating system.
open(A, ">$name") or die "Could not open $name: $!\n";
(I've added back the missing comma - I assume that was a typo.)
2/ These days, it is generally accepted good practice to use the three-arg version of open and lexical filehandles.
open(my $output_fh, '>', $name) or die "Failure: $name \n";
3/ In your example you open a filehandle called "A", but then try to write to a filehandle called "B". Is this a typo?
my $name = @_;
Will assign to $name the value of @_ in scalar context, which means the number of elements in the array @_, i.e. the number of arguments. That is most probably not what you want. You have to assign an array to an array, or a scalar to a scalar. You have two options:
my $name = $_[0];
or
my ($name) = @_; # or even (my $name) = @_;
I would prefer the latter because it can easily be modified to my ($a, $b, $c) = @_; and it is the usual Perl idiom.
But your code has more flaws. For example, you should use this form of open:
open my $fd, '>', $name or die "cannot open > $name: $!";
This has a few advantages. First, you use a lexically scoped filehandle, which cannot leak outside of its lexical scope and is closed automatically when that scope is exited. Second, the three-argument form prevents the content of $name from being interpreted as anything other than a file name.
So the resulting code should look like this:
sub test {
    my ($name) = @_;
    open my $fd, '>', $name
        or die "cannot open > $name: $!";
    print $fd "This is a test!\n";
}
Before answering your question, I would like to suggest one thing:
Always use the 3-parameter version of open(), like this:
open (my $FH, '>', 'file.txt') or die "Cannot open the file:$!";
If you are passing a single parameter to the subroutine, you can use the shift operator:
test("Output.dat");
sub test {
my $name = shift;
open (my $B, '>', $name) or die "Cannot open the file:$!";
print $B "This is a test! \n";
close($B);
}

Recursive Perl detail need help

I think this is a simple problem, but I've been stuck on it for some time now! I need a fresh pair of eyes on this.
The thing is, I have this code in Perl:
#!c:/Perl/bin/perl
use CGI qw/param/;
use URI::Escape;

print "Content-type: text/html\n\n";

my $directory = param ('directory');
$directory = uri_unescape ($directory);
my @contents;

readDir($directory);

foreach (@contents) {
    print "$_\n";
}

#------------------------------------------------------------------------
sub readDir(){
    my $dir = shift;
    opendir(DIR, $dir) or die $!;
    while (my $file = readdir(DIR)) {
        next if ($file =~ m/^\./);
        if(-d $dir.$file)
        {
            #print $dir.$file. " ----- DIR\n";
            readDir($dir.$file);
        }
        push @contents, ($dir . $file);
    }
    closedir(DIR);
}
I've tried to make it recursive. I need to get all the files of all the directories and subdirectories, with the full path, so that I can open the files later.
But my output only returns the files in the current directory and the files in the first directory that it finds. If I have 3 folders inside the directory, it only shows the first one.
Example of the cmd call:
perl readDir.pl directory=C:/PerlTest/
Thanks
Avoid reinventing the wheel; use CPAN.
use Path::Class::Iterator;

my $it = Path::Class::Iterator->new(
    root          => $dir,
    breadth_first => 0
);

until ($it->done) {
    my $f = $it->next;
    push @contents, $f;
}
Make sure that you don't let people set $dir to something that will let them look somewhere you don't want them to look.
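For instance, a minimal sketch of one way to do that check, assuming the script should only ever expose a single base directory (the $base path below is a made-up example):
use strict;
use warnings;
use Cwd 'abs_path';

my $base      = '/var/www/data';           # hypothetical allowed root
my $directory = '/var/www/data/reports';   # e.g. the value taken from param('directory')

my $resolved = abs_path($directory);       # resolves '..' and symlinks; undef if the path doesn't exist
my $ok = defined $resolved
      && ( $resolved eq $base || index($resolved, "$base/") == 0 );
die "Refusing to scan '$directory': outside $base\n" unless $ok;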
Your problem is the scope of the directory handle DIR. DIR has global scope, so each recursive call to readDir is using the same DIR; so when you closedir(DIR) and return to the caller, the caller does a readdir on a closed directory handle and everything stops. The solution is to use a lexically scoped directory handle:
sub readDir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die $!;
    while(my $file = readdir($dh)) {
        next if($file eq '.' || $file eq '..');
        my $path = $dir . '/' . $file;
        if(-d $path) {
            readDir($path);
        }
        push(@contents, $path);
    }
    closedir($dh);
}
Also notice that you would be missing a directory separator if (a) it wasn't at the end of $directory or (b) on every recursive call. AFAIK, slashes will be internally converted to backslashes on Windows but you might want to use a path mangling module from CPAN anyway (I only care about Unix systems so I don't have any recommendations).
I'd also recommend that you pass a reference to @contents into readDir rather than leaving it as a global variable; there are fewer errors and less confusion that way. And don't use parentheses on sub definitions unless you know exactly what they do and what they're for. Some sanity checking and scrubbing of $directory would be a good idea as well. A sketch combining the first two suggestions follows.
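Something along these lines, as a sketch only, using core File::Spec for the path joining mentioned above (the starting directory '.' is just an example):
use strict;
use warnings;
use File::Spec;

sub readDir {
    my ($dir, $contents) = @_;                        # $contents is an array reference
    opendir(my $dh, $dir) or die "$dir: $!";
    while (my $file = readdir($dh)) {
        next if $file eq '.' || $file eq '..';
        my $path = File::Spec->catfile($dir, $file);  # portable path joining
        push @$contents, $path;
        readDir($path, $contents) if -d $path;
    }
    closedir($dh);
}

my @contents;
readDir('.', \@contents);
print "$_\n" for @contents;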
There are many modules that are available for recursively listing files in a directory.
My favourite is File::Find::Rule
use strict;
use Data::Dumper;
use File::Find::Rule;

my $dir = shift;    # get directory from command line
my @files = File::Find::Rule->in( $dir );
print Dumper( \@files );
This sends the list of files into an array (which is what your program was doing) and produces output like:
$VAR1 = [
          'testdir',
          'testdir/file1.txt',
          'testdir/file2.txt',
          'testdir/subdir',
          'testdir/subdir/file3.txt'
        ];
There are loads of other options, like only listing files with particular names. Or you can set it up as an iterator, which is described in How can I use File::Find in Perl?
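For example, the iterator interface looks roughly like this (a short sketch using the start/match methods, with 'testdir' and the *.html pattern as made-up values):
use strict;
use warnings;
use File::Find::Rule;

my $finder = File::Find::Rule->file->name('*.html')->start('testdir');
while ( defined( my $file = $finder->match ) ) {
    print "$file\n";
}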
If you want to stick to modules that come with Perl Core, have a look at File::Find.

How do I read multiple directories and read the contents of subdirectories in Perl?

I have a folder and inside that I have many subfolders. In those subfolders I have many .html files to be read. I have written the following code to do that. It opens the parent folder and also the first subfolder, but it prints only one .html file. Then it shows the error:
NO SUCH FILE OR DIRECTORY
I don't want to change the entire code. Any modifications to the existing code will be good for me.
use FileHandle;

opendir PAR_DIR, "D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
    next if(-d $sub_folders);
    opendir SUB_DIR, "D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
    while(our $file = readdir(SUB_DIR))
    {
        next if($file !~ m/\.html/i);
        print_file_names($file);
    }
    close(FUNC_MODEL1);
}
close(FUNC_MODEL);

sub print_file_names()
{
    my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;

my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)
                             ->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
I mean isn't that sexy?!
A user commented that you may wish to use only depth-2 entries.
use File::Find::Rule;

my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)
                             ->mindepth(2)->maxdepth(2)
                             ->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
That will apply this restriction.
You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names()
{
    my $file = shift;
    ...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ..., which means that it skips the inner loop for directories; that appears to be the complete opposite of what you require. The only reason it works at all is that you're testing $file, which is only the filename relative to the path, not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;

my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);

opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^..?$/); # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path); # skip anything that isn't a directory

    opendir($sub_dir, $path);
    while (my $file = readdir($sub_dir)) {
        next unless $file =~ /\.html?$/i;
        my $full_path = $path . '/' . $file;
        print_file_names($full_path);
    }
    closedir($sub_dir);
}
closedir($par_dir);

sub print_file_names()
{
    my $file = shift;
    my $fh1 = FileHandle->new($file)
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Please start putting:
use strict;
use warnings;
at the top of all your scripts; it will help you avoid problems like this and make your code much more readable.
You can read more about it here: Perlmonks
You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $top = $ENV{TEMP};

find( { wanted => \&wanted, no_chdir => 1 }, $top );

sub wanted {
    return unless -f and /\.html$/i;
    print $_, "\n";
}
__END__
Have you considered using File::Find?
Here's one method which does not require File::Find:
First, open the root directory and store all the sub-folders' names in an array by using readdir.
Then, use a foreach loop. For each sub-folder, open a new directory handle built from the root directory and the folder's name, and again use readdir to store the file names in an array.
The last step is to write the code that processes the files inside this foreach loop (a sketch follows below).
Special thanks to my teacher who gave me this idea :) It really worked well!
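A minimal sketch of those steps, assuming the parent folder from the question and that "processing" just means printing each .html file's full path:
use strict;
use warnings;

my $parent = "D:/PERL/perl_programes/parent_directory";

# Step 1: read the sub-folder names from the root directory
opendir(my $root, $parent) or die "Cannot open $parent: $!";
my @folders = grep { $_ ne '.' && $_ ne '..' && -d "$parent/$_" } readdir($root);
closedir($root);

# Step 2: open each sub-folder in turn and collect its file names
foreach my $folder (@folders) {
    opendir(my $dh, "$parent/$folder") or die "Cannot open $parent/$folder: $!";
    my @files = grep { /\.html$/i } readdir($dh);
    closedir($dh);

    # Step 3: process the files inside this loop
    foreach my $file (@files) {
        print "$parent/$folder/$file\n";
    }
}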