Recursive Perl detail need help - perl

i think this is a simple problem, but i'm stuck with it for some time now! I need a fresh pair of eyes on this.
The thing is i have this code in perl:
#!c:/Perl/bin/perl
use CGI qw/param/;
use URI::Escape;
print "Content-type: text/html\n\n";
my $directory = param ('directory');
$directory = uri_unescape ($directory);
my #contents;
readDir($directory);
foreach (#contents) {
print "$_\n";
}
#------------------------------------------------------------------------
sub readDir(){
my $dir = shift;
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
if(-d $dir.$file)
{
#print $dir.$file. " ----- DIR\n";
readDir($dir.$file);
}
push #contents, ($dir . $file);
}
closedir(DIR);
}
I've tried to make it recursive. I need to have all the files of all of the directories and subdirectories, with the full path, so that i can open the files in the future.
But my output only returns the files in the current directory and the files in the first directory that it finds. If i have 3 folders inside the directory it only shows the first one.
Ex. of cmd call:
"perl readDir.pl directory=C:/PerlTest/"
Thanks

Avoid wheel reinvention, use CPAN.
use Path::Class::Iterator;
my $it = Path::Class::Iterator->new(
root => $dir,
breadth_first => 0
);
until ($it->done) {
my $f = $it->next;
push #contents, $f;
}
Make sure that you don't let people set $dir to something that will let them look somewhere you don't want them to look.

Your problem is the scope of the directory handle DIR. DIR has global scope so each recursive call to readDir is using the same DIR; so, when you closdir(DIR) and return to the caller, the caller does a readdir on a closed directory handle and everything stops. The solution is to use a local directory handle:
sub readDir {
my ($dir) = #_;
opendir(my $dh, $dir) or die $!;
while(my $file = readdir($dh)) {
next if($file eq '.' || $file eq '..');
my $path = $dir . '/' . $file;
if(-d $path) {
readDir($path);
}
push(#contents, $path);
}
closedir($dh);
}
Also notice that you would be missing a directory separator if (a) it wasn't at the end of $directory or (b) on every recursive call. AFAIK, slashes will be internally converted to backslashes on Windows but you might want to use a path mangling module from CPAN anyway (I only care about Unix systems so I don't have any recommendations).
I'd also recommend that you pass a reference to #contents to readDir rather than leaving it as a global variable, fewer errors and less confusion that way. And don't use parentheses on sub definitions unless you know exactly what they do and what they're for. Some sanity checking and scrubbing on $directory would be a good idea as well.

There are many modules that are available for recursively listing files in a directory.
My favourite is File::Find::Rule
use strict ;
use Data::Dumper ;
use File::Find::Rule ;
my $dir = shift ; # get directory from command line
my #files = File::Find::Rule->in( $dir );
print Dumper( \#files ) ;
Which sends a list of files into an array ( which your program was doing).
$VAR1 = [
'testdir',
'testdir/file1.txt',
'testdir/file2.txt',
'testdir/subdir',
'testdir/subdir/file3.txt'
];
There a loads of other options, like only listing files with particular names. Or you can set it up as an iterator, which is described in How can I use File::Find
How can I use File::Find in Perl?
If you want to stick to modules that come with Perl Core, have a look at File::Find.

Related

readdir() attempted on invalid dirhandle $par_dir

I am trying just to execute a perl script inside multiple folders, but I don't understand why I have a problem with readdir() attempted on invalid dirhandle $par_dir. $parent is printed good but $par_dir is printed like "GLOB(0x17e7a68)".
Any idea of why it is happening? Thanks a lot!
Here the code:
#!/usr/bin/perl
use warnings;
use Cwd;
use FileHandle;
use File::Glob;
my $parent = "/media/sequentia/NAS/projects/131-prgdb3/01- DATA/All_plant_genomes_proteomes";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
print $parent."\n";
print $par_dir."\n";
while (my $sub_folders = readdir($par_dir)) {
next if ($sub_folders =~ /^..?$/); # skip . and ..
my $path = $parent . '/' . $sub_folders;
next unless (-d $path); # skip anything that isn't a directory
print $path."\n";
chdir($path) or die;
#files = glob( $path. '/*' );
foreach $filename (#files){
print $filename ."\n";
system ("grep 'comment' PutativeGenes.txt | wc -l");
system ("grep 'class' PutativeGenes.txt | wc -l");
}
}
closedir($par_dir);
The problem is probably that the directory you specify in $parent doesn't exist. You must always check to make sure that a call to open or opendir succeeded before going on to use the handle
That path step 01- DATA is suspicious. I would expect 01-DATA or perhaps 01- DATA with a single space, but multiple spaces are rarely used because they are invisible and difficult to count
Here are some other thoughts on your program
You must always use strict and use warnings 'all' at the top of every Perl program you write. That will alert you to many simple errors that you may otherwise overlook
Your statement next if ( $sub_folders =~ /^..?$/ ) is wrong because the dots must be escaped. As it is you are discarding any name that is one or two characters in length
If your path really does contain spaces then you need to use File::Glob ':bsd_glob', as otherwise the spaces will be treated as separators between multipl glob patterns
You execute the foreach loop for every file or directory found in $path, but your system calls aren't affected by the name of that file, so you're making the same call multiple times
It's worth noting that glob will do all the directory searching for you. I would write something like this
#!/usr/bin/perl
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my $parent_dir = "/media/sequentia/NAS/projects/131-prgdb3/01-DATA/All_plant_genomes_proteomes";
print "$parent_dir\n";
while ( my $path = glob "$parent_dir/*" ) {
next unless -d $path;
print "$path\n";
chdir $path or die qq{Unable to chdir to "$path": $!};
while ( my $filename = glob "$path/*" ) {
next unless -f $filename;
print "$filename\n";
system "grep 'comment' PutativeGenes.txt | wc -l";
system "grep 'class' PutativeGenes.txt | wc -l";
}
}
Probably opendir() is failing giving the invalid file handle (probably it fails because you try to open a nonexistent $parent directory).
If opendir fails it will return false, and $par_dir is left unchanged as undef. If you attempt to call readdir() on an undefined file handle you will get a runtime warning like:
readdir() attempted on invalid dirhandle at ...
Therefore you should always check the return code from opendir. For example, you can do:
opendir($par_dir, $parent) or die "opendir() failed: $!";
or see more suggestions on what to do in this link Does die have to be used if opening a file fails?
Note that your code could have been simplified using File::Find::Rule, for example:
my #dirs = File::Find::Rule
->directory->maxdepth(1)->mindepth(1)->in( $parent );
for my $dir (#dirs) {
say "$dir";
my #files = File::Find::Rule->file->maxdepth(1)->in( $dir );
say "--> $_" for #files;
}
Alternatively, if you don't need the directory names:
my #files = File::Find::Rule
->file->maxdepth(2)->mindepth(2)->in( $parent );
say for #files;

PERL - issues extracting a file from directory/subdirectories/..?

Quick note: I've been stuck with this problem for quite a few days and I'm not necessarily hoping to find an answer, but any kind of help that might "enlighten" me. I would also like to mention that I am a beginner in Perl, so my knowledge is not very vast and in this case recursivity is not my forte. here goes:
What I would like my Perl script to do is the following:
take a directory as an argument
go into the directory that was passed and its subdirectories to find an *.xml file
store the full path of the found *.xml file into an array.
Below is the code that i have so far, but i haven't managed to make it work:
#! /usr/bin/perl -W
my $path;
process_files ($path);
sub process_files
{
opendir (DIR, $path) or die "Unable to open $path: $!";
my #files =
# Third: Prepend the full path
map { $path . '/' . $_ }
# Second: take out '.' and '..'
grep { !/^\.{1,2}$/ }
# First: get all files
readdir (DIR);
closedir (DIR);
for (#files)
{
if (-d $_)
{
push #files, process_files ($_);
}
else
{
#analyse document
}
}
return #files;
}
Anybody have any clues to point me in the right direction? Or an easier way to do it?
Thank you,
sSmacKk :D
Sounds like you should be using File::Find. Its find subroutine will traverse a directory recursively.
use strict;
use warnings;
use File::Find;
my #files;
my $path = shift;
find(
sub { (-f && /\.xml$/i) or return;
push #files, $File::Find::name;
}, $path);
The subroutine will perform whatever code it contains on the files it finds. This one simply pushes the XML file names (with full path) onto the #files array. Read more in the documentation for the File::Find module, which is a core module in perl 5.

What is the most efficient way to open/act upon all of the files in a directory?

I need to perform my script (a search) on all the files of a directory. Here are the methods which work. I am just asking which is best. (I need file names of form: parsedchpt31_4.txt)
Glob:
my $parse_corpus; #(for all options)
##glob (only if all files in same directory as script?):
my #files = glob("parsed"."*.txt");
foreach my $file (#files) {
open($parse_corpus, '<', "$file") or die $!;
... all my code...
}
Readdir with while and conditions:
##readdir:
my $dir = '.';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
next unless (-f "$dir/$file"); ##Ensure it's a file
next unless ($file =~ m/^parsed.*\.txt/); ##Ensure it's a parsed file
open($parse_corpus, '<', "$file") or die "Couldn't open directory $!";
... all my code...
}
Readdir with foreach and grep:
##readdir+grep:
my $dir = '.';
opendir(DIR, $dir) or die $!;
foreach my $file (grep {/^parsed.*\.txt/} readdir (DIR)) {
next unless (-f "$dir/$file"); ##Ensure it's a file
open($parse_corpus, '<', "$file") or die "Couldn't open directory $!";
... all my code...
}
File::Find:
##File::Find
my $dir = "."; ##current directory: could be (include quotes): '/Users/jon/Desktop/...'
my #files;
find(\&open_file, $dir); ##built in function
sub open_file {
push #files, $File::Find::name if(/^parsed.*\.txt/);
}
foreach my $file (#files) {
open($parse_corpus, '<', "$file") or die $!;
...all my code...
}
Is there another way? Is it good to enclose my entire script in the loops? Is it okay I don't use closedir? I'm passing this off to others, I'm not sure where their files will be (may not be able to use glob)
Thanks a lot, hopefully this is the right place to ask this.
The best or most efficient approach depends on your purposes and the larger context. Do you mean best in terms of raw speed, simplicity of the code, or something else? I'm skeptical that memory considerations should drive this choice. How many files are in the directory?
For sheer practicality, the glob approach works fairly well. Before resorting to anything more involved, I'd ask whether there is a problem.
If you're able to use other modules, another approach is to let someone else worry about the grubby details:
use File::Util qw();
my $fu = File::Util->new;
my #files = $fu->list_dir($dir, qw(--with-paths --files-only));
Note that File::Find performs a recursive search descending into all subdirectories. Many times you don't want or need that.
I would also add that I dislike your two readdir examples because they comingle different pieces of functionality: (1) getting file names, and (2) processing individual files. I would keep those jobs separate.
my $dir = '.';
opendir(my $dh, $dir) or die $!; # Use a lexical directory handle.
my #files =
grep { -f }
map { "$dir/$_" }
grep { /^parsed.*\.txt$/ }
readdir($dh);
for my $file (#files){
...
}
I think using a while loop is the safer answer. Why? Because loading all the file names into an array could mean a large memory usage, and using line-by-line operation avoids that problem.
I prefer readdir to glob, but that's probably more a matter of taste.
If performance is an issue, one could say that the -f check is unnecessary for any file with the .txt extension.
I find that a recursive directory walking function using the perfect partners opendir/readdir and File::chdir (my fav CPAN module, great for cross-platform) allows one to easily and clearly manipulate anything in a directory including subdirectories if desired (if not, omit the recursion).
Example (a simple deep ls):
#!/usr/bin/env perl
use strict;
use warnings;
use File::chdir; #Provides special variable $CWD
# assign $CWD sets working directory
# can be local to a block
# evaluates/stringifies to absolute path
# other great features
walk_dir(shift);
sub do_something {
print shift . "\n";
}
sub walk_dir {
my $dir = shift;
local $CWD = $dir;
opendir my $dh, $CWD; # lexical opendir, so no closedir needed
print "In: $CWD\n";
while (my $entry = readdir $dh) {
next if ($entry =~ /^\.+$/);
# other exclusion tests
if (-d $entry) {
walk_dir($entry);
} elsif (-f $entry) {
do_something($entry);
}
}
}

How do I read multiple directories and read the contents of subdirectories in Perl?

I have a folder and inside that I have many subfolders. In those subfolders I have many .html files to be read. I have written the following code to do that. It opens the parent folder and also the first subfolder and it prints only one .html file. It shows error:
NO SUCH FILE OR DIRECTORY
I dont want to change the entire code. Any modifications in the existing code will be good for me.
use FileHandle;
opendir PAR_DIR,"D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
next if(-d $sub_folders);
opendir SUB_DIR,"D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
while(our $file = readdir(SUB_DIR))
{
next if($file !~ m/\.html/i);
print_file_names($file);
}
close(FUNC_MODEL1);
}
close(FUNC_MODEL);
sub print_file_names()
{
my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
or die "ERROR: $!"; #ERROR HERE
print("$file\n");
}
Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
print "$file\n";
}
I mean isn't that sexy?!
A user commented that you may be wishing to use only Depth=2 entries.
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->mindepth(2)->maxdepth(2)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
print "$file\n";
}
Will Apply this restriction.
You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names()
{
my $file = shift;
...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ... which means that it'll skip the inner loop for directories, which appears to be the complete opposite of what you require. The only reason it's working at all is because you're testing $file which is only the filename relative to the path, and not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;
my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
next if ($sub_folders =~ /^..?$/); # skip . and ..
my $path = $parent . '/' . $sub_folders;
next unless (-d $path); # skip anything that isn't a directory
opendir($sub_dir, $path);
while (my $file = readdir($sub_dir)) {
next unless $file =~ /\.html?$/i;
my $full_path = $path . '/' . $file;
print_file_names($full_path);
}
closedir($sub_dir);
}
closedir($par_dir);
sub print_file_names()
{
my $file = shift;
my $fh1 = FileHandle->new($file)
or die "ERROR: $!"; #ERROR HERE
print("$file\n");
}
Please start putting:
use strict;
use warnings;
at the top of all your scripts, it will help you avoid problems like this and make your code much more readable.
You can read more about it here: Perlmonks
You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $top = $ENV{TEMP};
find( { wanted => \&wanted, no_chdir=> 1 }, $top );
sub wanted {
return unless -f and /\.html$/i;
print $_, "\n";
}
__END__
Have you considered using
File::Find
Here's one method which does not require to use File::Find:
First open the root directory, and store all the sub-folders' names in an array by using readdir;
Then, use foreach loop. For each sub-folder, open the new directory by linking the root directory and the folder's name. Still use readdir to store the file names in an array.
The last step is to write the codes for processing the files inside this foreach loop.
Special thanks to my teacher who has given me this idea :) It really worked well!

How can I add a prefix to all filenames under a directory?

I am trying to prefix a string (reference_) to the names of all the *.bmp files in all the directories as well sub-directories. The first time we run the silk script, it will create directories as well subdirectories, and under each subdirectory it will store each mobile application's sceenshot with .bmp extension.
When I run the automated silkscript for second time it will again create the *.bmp files in all the subdirectories. Before running the script for second time I want to prefix all the *.bmp with a string reference_.
For example first_screen.bmp to reference_first_screen.bmp,
I have the directory structure as below:
C:\Image_Repository\BG_Images\second
...
C:\Image_Repository\BG_Images\sixth
having first_screen.bmp and first_screen.bmp files etc...
Could any one help me out?
How can I prefix all the image file names with reference_ string?
When I run the script for second time, the Perl script in silk will take both the images from the sub-directory and compare them both pixel by pixel. I am trying with code below.
Could you please guide me how can I proceed to complete this task.
#!/usr/bin/perl -w
&one;
&two;
sub one {
use Cwd;
my $dir ="C:\\Image_Repository";
#print "$dir\n";
opendir(DIR,"+<$dir") or "die $!\n";
my #dir = readdir DIR;
#$lines=#dir;
delete $dir[-1];
print "$lines\n";
foreach my $item (#dir)
{
print "$item\n";
}
closedir DIR;
}
sub two {
use Cwd;
my $dir1 ="C:\\Image_Repository\\BG_Images";
#print "$dir1\n";
opendir(D,"+<$dir1") or "die $!\n";
my #dire = readdir D;
#$lines=#dire;
delete $dire[-1];
#print "$lines\n";
foreach my $item (#dire)
{
#print "$item\n";
$dir2="C:\\Image_Repository\\BG_Images\\$item";
print $dir2;
opendir(D1,"+<$dir2") or die " $!\n";
my #files=readdir D1;
#print "#files\n";
foreach $one (#files)
{
$one="reference_".$one;
print "$one\n";
#rename $one,Reference_.$one;
}
}
closedir DIR;
}
I tried open call with '+<' mode but I am getting compilation error for the read and write mode.
When I am running this code, it shows the files in BG_images folder with prefixed string but actually it's not updating the files in the sub-directories.
You don't open a directory for writing. Just use opendir without the mode parts of the string:
opendir my($dir), $dirname or die "Could not open $dirname: $!";
However, you don't need that. You can use File::Find to make the list of files you need.
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use File::Find;
use File::Find::Closures qw(find_regular_files);
use File::Spec::Functions qw(catfile);
my( $wanted, $reporter ) = find_regular_files;
find( $wanted, $ARGV[0] );
my $prefix = 'recursive_';
foreach my $file ( $reporter->() )
{
my $basename = basename( $file );
if( index( $basename, $prefix ) == 0 )
{
print STDERR "$file already has '$prefix'! Skipping.\n";
next;
}
my $new_path = catfile(
dirname( $file ),
"recursive_$basename"
);
unless( rename $file, $new_path )
{
print STDERR "Could not rename $file: $!\n";
next;
}
print $file, "\n";
}
You should probably check out the File::Find module for this - it will make recursing up and down the directory tree simpler.
You should probably be scanning the file names and modifying those that don't start with reference_ so that they do. That may require splitting the file name up into a directory name and a file name and then prefixing the file name part with reference_. That's done with the File::Basename module.
At some point, you need to decide what happens when you run the script the third time. Do the files that already start with reference_ get overwritten, or do the unprefixed files get overwritten, or what?
The reason the files are not being renamed is that the rename operation is commented out. Remember to add use strict; at the top of your script (as well as the -w option which you did use).
If you get a list of files in an array #files (and the names are base names, so you don't have to fiddle with File::Basename), then the loop might look like:
foreach my $one (#files)
{
my $new = "reference_$one";
print "$one --> $new\n";
rename $one, $new or die "failed to rename $one to $new ($!)";
}
With the aid of find utility from coreutils for Windows:
$ find -iname "*.bmp" | perl -wlne"chomp; ($prefix, $basename) = split(m~\/([^/]+)$~, $_); rename($_, join(q(/), ($prefix, q(reference_).$basename))) or warn qq(failed to rename '$_': $!)"