Recursive directory traversal in Perl

I want to recursively traverse the directory that contains this Perl script.
The idea is to visit every directory below the one containing the script and collect all file paths into a single array variable, then return the list.
Here is the error message:
readdir() attempted on invalid dirhandle $DIR at xxx
closedir() attempted on invalid dirhandle $DIR at xxx
The code is attached for reference. Thank you in advance.
use strict;
use warnings;
use Cwd;
our @childfile = ();
sub recursive_dir{
    my $current_dir = $_[0]; # a parameter
    opendir(my $DIR,$current_dir) or die "Fail to open current directory,error: $!";
    while(my $contents = readdir($DIR)){
        next if ($contents =~ m/^\./); # filter out "." and ".."
        #if-else clause separate dirs from files
        if(-d "$contents"){
            #print getcwd;
            #print $contents;
            #closedir($DIR);
            recursive_dir(getcwd."/$contents");
            print getcwd."/$contents";
        }
        else{
            if($contents =~ /(?<!\.pl)$/){
                push(@childfile,$contents);
            }
        }
    }
    closedir($DIR);
    #print @childfile;
    return @childfile;
}
recursive_dir(getcwd);

Please tell us if this is homework? You are welcome to ask for help with assignments, but it changes the sort of answer you should be given.
You are relying on getcwd to give you the current directory that you are processing, yet you never change the current working directory so your program will loop endlessly and eventually run out of memory. You should simply use $current_dir instead.
I don't believe that those error messages can be produced by the program you show. Your code checks whether opendir has succeeded and the program dies unless $DIR is valid, so the subsequent readdir and closedir must be using a valid handle.
Some other points:
Comments like # a parameter are ridiculous and only serve to clutter your code
Upper-case letters are generally reserved for global identifiers like package names, so $DIR should be lower case. Even then, $dir is a poor name for a directory handle, as it could also mean the directory name or the directory path. Use $dir_handle or $dh
It is crazy to use a negative look-behind just to check that a file name doesn't end with .pl. Just use push @childfile, $contents unless $contents =~ /\.pl$/
You never use the return value from your subroutine, so it is wasteful of memory to return what could be an enormous array from every call. @childfile is accessible throughout the program so you can just access it directly from anywhere
Don't put scalar variables inside double quotes. It simply forces the value to a string, which is probably unnecessary and may cause arcane bugs. Use just -d $contents
You probably want to ignore symbolic links, as otherwise you could be looping endlessly. You should change else { ... } to elsif (-f $contents) { ... }
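The points above can be put together in a minimal sketch. It assumes you want full paths collected in the global @childfile, and it adds an explicit -l test (not part of the advice above) because -d follows symbolic links:

```perl
use strict;
use warnings;

our @childfile = ();

# Walk $current_dir recursively and push the path of every plain file
# that is not a .pl script onto the global @childfile.
sub recursive_dir {
    my ($current_dir) = @_;
    opendir(my $dh, $current_dir)
        or die "Failed to open $current_dir: $!";
    while (defined(my $contents = readdir $dh)) {
        next if $contents =~ m/^\./;    # skips ".", ".." and hidden entries
        my $path = "$current_dir/$contents";
        next if -l $path;               # assumption of this sketch: skip symlinks
        if (-d $path) {
            recursive_dir($path);       # build on $current_dir, not getcwd
        }
        elsif (-f $path) {
            push @childfile, $path unless $contents =~ /\.pl$/;
        }
    }
    closedir $dh;
    return;
}
```

Calling recursive_dir('.') and then reading @childfile should give the list the question asks for; the file tests work here because each one is applied to a path built from $current_dir rather than to the bare entry name.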

Related

Perl: deleting a subdirectory containing a file of a specified format

I have to write a program in Perl, and I'm very new to it. The task is as follows.
There is a main directory containing many subdirectories, each of which contains further subdirectories, with files at the end of each chain. If a file has the extension ".TXT", I should delete the subdirectory directly below the main directory that contains that file.
For example, given raghav\abc\ccd\1.txt, I should delete the subdirectory "abc".
My code is:
#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);
my $path ="d:\\raghav";
search_all_folder($path);
sub search_all_folder {
    my ($folder) = @_;
    if ( -d $folder ) {
        chdir $folder;
        opendir my $dh, $folder or die "can't open the directory: $!";
        while ( defined( my $file = readdir($dh) ) ) {
            chomp $file;
            next if $file eq '.' or $file eq '..';
            search_all_folder("$folder/$file"); ## recursive call
            read_files($file) if ( -f $file );
        }
        closedir $dh or die "can't close directory: $!";
    }
}
sub read_files {
    my ($filename) = @_;
    if($filename= ~/.txt/) {
        rmdir;
    }
}
Never ever implement your own directory traversal. Use File::Find instead. It's more efficient and less prone to breaking.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
my $search_path = "d:\\raghav";
my %text_files_found_in;
sub find_text_files {
    if (m/\.txt$/i) {
        ## you may want to transform this first, to get a higher level directory.
        $text_files_found_in{$File::Find::dir}++;
    }
}
find( \&find_text_files, $search_path );
foreach my $dir ( keys %text_files_found_in ) {
    print "$dir contained a text file\n";
    ##maybe delete it, but don't until you're sure the code's working!
}
You've got a list of files now, against which you can figure out what to delete and then delete it. rmdir won't do the trick though - that only works on empty directories. You can either collate a list of files (as this does) or you could figure out the path to delete as you go, and insert it into a hash. (So you don't get dupes).
Either way though - it's probably better to run the find first, delete second rather than trying to delete a tree you may still be traversing.
Edit: What this program does:
Imports the File::Find module.
Defines the subroutine find_text_files.
Runs find (from the File::Find module) and tells it to run find_text_files on every file it finds in its recursive traversal.
find_text_files is called on every file within the tree (below $search_path). When it's called:
File::Find sets $_ to the current filename. We match against m/\.txt$/i to see if it's a text file.
File::Find also sets two other variables: $File::Find::dir, the path of the directory containing this file, and $File::Find::name, the full path of the file. We insert $File::Find::dir into the hash %text_files_found_in provided that the pattern matches.
Once find is complete, we have a hash called %text_files_found_in whose keys are all the directories where a text file was found.
We can then iterate over keys %text_files_found_in to identify, and delete,* those directories.
At this point, you may want to transform each of the directories in that list, because it'll be the full path of the directory containing the file, and you may only want to delete at a higher level.
* There's no delete code in this script; you'll have to sort that out yourself. I'm not putting anything that might delete stuff up on the internet where people who don't fully understand it might just run it blind.
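The "transform to a higher level directory" step could look something like the sketch below. The helper name top_level_subdir and the sample paths are hypothetical, and the code only prints candidates; it deliberately deletes nothing.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first path component of $dir below $base, or nothing
# if $dir is $base itself or lies outside it.
sub top_level_subdir {
    my ($base, $dir) = @_;
    my $relative = File::Spec->abs2rel($dir, $base);
    my @parts = grep { length } File::Spec->splitdir($relative);
    return if !@parts or $parts[0] eq '.' or $parts[0] eq '..';
    return $parts[0];
}

# Hypothetical directories, standing in for keys %text_files_found_in
my @found = ('/data/raghav/abc/ccd', '/data/raghav/abc', '/data/raghav/xyz/q');
my %candidates;
for my $dir (@found) {
    my $top = top_level_subdir('/data/raghav', $dir);
    $candidates{$top}++ if defined $top;
}
print "would delete: $_\n" for sort keys %candidates;
```

Collecting the names in a hash keeps each top-level subdirectory once, however many text files sit below it.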

Perl - Use of uninitialized value within %frequency in concatenation (.) or string

I'm not entirely sure why, but for some reason I can't print the hash value outside the while loop.
#!/usr/bin/perl -w
opendir(D, "cwd" );
my @files = readdir(D);
closedir(D);
foreach $file (@files)
{
    open F, $file or die "$0: Can't open $file : $!\n";
    while ($line = <F>) {
        chomp($line);
        $line=~ s/[-':!?,;".()]//g;
        $line=~ s/^[a-z]/\U/g;
        @words = split(/\s/, $line);
        foreach $word (@words) {
            $frequency{$word}++;
            $counter++;
        }
    }
    close(F);
    print "$file\n";
    print "$ARGV[0]\n";
    print "$frequency{$ARGV[0]}\n";
    print "$counter\n";
}
Any help would be much appreciated!
cheers.
This line
print "$frequency{$ARGV[0]}\n";
Expects you to have an argument to your script, e.g. perl script.pl argument. If you have no argument, $ARGV[0] is undefined, but it will stringify to the empty string. This empty string is a valid key in the hash, but the value is undefined, hence your warning
Use of uninitialized value within %frequency in concatenation (.) or string
But you should also see the warning
Use of uninitialized value $ARGV[0] in hash element
And it is a very big mistake not to include that error in this question.
Also, when using readdir, you get all the files in the directory, including directories. You might consider filtering the files somewhat.
Using
use strict;
use warnings;
Is something that will benefit you very much, so add that to your script.
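One way to avoid both warnings is to check the argument and the hash key before printing. This is only a sketch; the helper name frequency_line and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Return a printable count for $word, or a usage hint when no word was given.
sub frequency_line {
    my ($frequency, $word) = @_;
    return "usage: perl script.pl WORD" unless defined $word;
    my $count = exists $frequency->{$word} ? $frequency->{$word} : 0;
    return "$word: $count";
}

# Hypothetical counts standing in for the %frequency built by the loop
my %frequency = (Perl => 3, hash => 1);
print frequency_line(\%frequency, $ARGV[0]), "\n";
```

With no argument the script prints the usage hint instead of interpolating an undefined value.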
I had originally written this,
There is no %frequency defined at the top level of your program.
When perl sees you reference %frequency inside the inner-most
loop, it will auto-vivify it, in that scratchpad (lexical scope).
This means that when you exit the inner-most loop (foreach $word
(@words)), the auto-vivified %frequency is out of scope and
garbage-collected. Each time you enter that loop, a new, different
variable will be auto-vivified, and then discarded.
When you later refer to %frequency in your print, yet another new,
different %frequency will be created.
… but then realized that you had forgotten to use strict, and Perl was being generous and giving you a global %frequency, which ironically is probably what you meant. So, this answer is wrong in your case … but declaring the scope of %frequency would probably be good form, regardless.
These other, “unrelated” notes are still useful perhaps, or else I'd delete the answer altogether:
As @TLP mentioned, you should probably also skip directories (at least) in your file loop. A quick way to do this would be my @files = grep { -f "cwd/$_" } (readdir D); this will filter the list to contain only files.
I'm further suspicious that you named a directory "cwd". Do you perhaps mean the current working directory? In all the major OSes in use today, that directory is referenced as "."; as written, you're looking for a directory literally named "cwd".
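Both notes can be combined in a small sketch, assuming the current working directory is what was meant. The helper name plain_files is hypothetical.

```perl
use strict;
use warnings;

# List only the plain files in $dir (default: the current directory, ".").
sub plain_files {
    my ($dir) = @_;
    $dir = '.' unless defined $dir;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    # Test each entry with its directory prefix, since readdir returns bare names
    my @files = sort grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @files;
}
```

The -f test drops "." and ".." as well as real subdirectories, so the open in the main loop is only ever attempted on files.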

pattern search in all the files in a directory

I have a pattern like "keyword : Multinode" and need to search for it in all the files in a directory. If the pattern is found in any of the files, a non-empty string should be returned; it may contain the file name or the directory name.
In shell scripting, the following does the same:
KeyMnode=`grep -w "keyword : Multinode" ${dirname}/*`
I thought of using find(subroutine, directory_path), with the subroutine being run for every entry in the directory. For every entry I want to check whether it is a readable file. If it is, I want to search it for the required pattern "keyword : Multinode". On success, the whole find operation should result in a non-empty string (preferably just the name of the directory); otherwise it should result in an empty string. Please let me know if you need any further information.
I want this to be done in Perl. Please help me with the solution.
Here are some Perl tools that will be useful in doing what you described:
File::Find will do a recursive search for files in a directory and its children, running code (the \&wanted callback in the docs) against each one to determine whether it meets your criteria or not
The -r operator will tell you whether a file is readable (if (-r $file_name)...)
open will get you access to the file and <$fh> will read its contents so that you can check with a regular expression whether they match your target pattern
Adding \b to the beginning and end of the pattern will cause it to match only at word boundaries, similar to grep's -w switch
If you have more specific issues, please post additional questions with code that demonstrates them, including statements both of what you expected to happen and of how the actual results differed from your expectation and we'll be happy to help resolve those issues.
Edit: Cleaned up and runnable version of code from comment:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use File::Find;
# Get $dirname from first command-line argument
my $dirname = shift @ARGV;
# Declared before find() is called so the results are available afterwards
my ($KeyMnode, $KeyThreads);
find(\&do_process, $dirname); # quotes around $dirname weren't needed
sub do_process {
    # chomp($_); - not needed; $_ isn't read from a file, so no newline on it
    if (-r $_) { # quotes around $_ weren't needed
        # $_ is just the final part of the file name; it may be better for
        # reporting the location of matches to set $file_name to
        # $File::Find::name instead
        my $file_name = $_;
        open(my $fh, '<', $file_name) # Use three-arg open!
            or return; # skip anything that can't be opened
        # Reading into $line leaves File::Find's $_ untouched
        while (my $line = <$fh>) {
            chomp $line;
            # Note that, if you store all matches into the same scalar values,
            # you'll end up with only the last value found for each pattern; you
            # may want to push the matches onto arrays instead.
            if ($line =~ /\bkeyword : Multinode\b/i) { $KeyMnode = "$file_name:$line"; }
            if ($line =~ /\bkeyword : Threads\b/i) { $KeyThreads = "$file_name:$line"; }
        }
        close $fh;
    }
}
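As the comment in the code suggests, the matches can be pushed onto an array instead of overwriting a scalar. The following is a sketch under that assumption; the name matches_in_tree is hypothetical, and it reports $File::Find::name so each match carries its full path.

```perl
use strict;
use warnings;
use File::Find;

# Return "path:line" for every line under $dirname that matches $pattern.
sub matches_in_tree {
    my ($dirname, $pattern) = @_;
    my @matches;
    find(sub {
        return unless -f $_ and -r _;      # plain, readable files only
        open(my $fh, '<', $_) or return;   # skip files that cannot be opened
        while (my $line = <$fh>) {
            chomp $line;
            push @matches, "$File::Find::name:$line" if $line =~ $pattern;
        }
        close $fh;
    }, $dirname);
    return @matches;
}
```

A call such as matches_in_tree($dirname, qr/\bkeyword : Multinode\b/i) returns an empty list when nothing matches, which plays the role of the empty string in the shell version.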

Identifying the difference between ordinary folders and "."/".." folders

I'm writing a Perl script to automatically copy PDFs from a folder.
Users are unable to access folders they don't have permission for so they don't accidentally gain access to any information they aren't supposed to.
I have a rough mock-up which works except for one bug: it keeps seeing the . and .. folders and opens them, entering infinite loops.
The following conditional checks whether the file is a PDF and, if so, passes it to my copyPDF routine, which checks for exceptions and then copies the file. Otherwise it tries to open the entry as a folder, scans its contents, and repeats.
I've tried a number of ways to ignore . and .., but it always leads to ignoring all other subfolders as well. Has anyone got a workaround?
if ($file =~ /\.pdf$/i) {
    print "$file is a pdf\n";
    $fileLocation = "$directoryName/$file";
    copyPDF("$fileLocation", "$file");
}
elsif ($file == '.') {
    #print "argh\n";
}
else {
    $openFolder = "$directory/$file";
    print "*$openFolder";
    openNextDirectory("$openFolder");
}
Always use use strict; use warnings;!!!
$file == '.'
produces
Argument "." isn't numeric in numeric eq (==)
because you are asking Perl to compare two numbers. You should be using
$file eq '.'
See perldoc perlop for more information about perl's operators.
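The numeric comparison probably also explains why every other subfolder was being skipped: under ==, a string with no leading digits counts as 0, so an ordinary folder name compares equal to '.'. A minimal sketch of the string test, with a hypothetical helper name:

```perl
use strict;
use warnings;

# True only for the two special directory entries "." and "..".
sub is_dot_entry {
    my ($file) = @_;
    return $file eq '.' || $file eq '..';
}

for my $entry ('.', '..', '.git', 'docs') {
    printf "%-5s %s\n", $entry, is_dot_entry($entry) ? 'skip' : 'descend';
}
```

Because eq compares strings exactly, a hidden folder such as .git is still descended into.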
This old question has some great answers that address this and similar questions:
How can I copy a directory recursively and filter filenames in Perl?
Use the File::Find module:
use strict;
use warnings;
use File::Find;
use File::Copy;
my $directory = 'c:\my\pdf\directory';
find(\&pdfcopy, $directory);
sub pdfcopy {
    my $newdirectory = 'c:\some\new\dir';
    # Only copy entries whose name ends in .pdf
    return if ($File::Find::name !~ /\.pdf$/i);
    copy($File::Find::name, $newdirectory) or
        die "File $File::Find::name cannot be copied: $!";
}

Renaming Sub-directories in Perl

#!/usr/bin/perl -w
use strict;
use File::Copy;
use File::Spec;
my($chptr, $base_path, $new, $dir);
$dir = "Full Metal Alchemist"; #Some dir
opendir(FMA, $dir) or die "Can't open FMA dir";
while($chptr = readdir FMA){
    $base_path = File::Spec->rel2abs($dir).'/'; #find absolute path of $dir
    if($chptr =~ m(Chapter\w*\d*)){ #some regex to avoid the .. and . dirs
        $new = join(" 0", split(/\W/, $chptr)); #modify said sub directory
        rename "$base_path$chptr", "$base_path$new" ? print "Renames $base_path$chptr to $base_path$new\n" : die "rename failed $!";
    }
}
closedir FMA;
Originally, my script only used the relative path to perform the move operation, but for some reason this left the subdirectories unaffected. My next step was to switch to absolute paths, but to no avail. I am just learning Perl, so I feel like I'm making a simple mistake. Where have I gone wrong? TIA
You could exclude . and .. as follows:
if ( $child ne '.' and $child ne '..' ) { ... }
Some general remarks:
Always have a very clear spec of what you want to do. That also helps everybody trying to help you.
It's not clear what goes wrong here. Maybe your regex simply doesn't match the directories you want it to match? What is the problem?
Try to make very specific parts (like the name of the directory where you want to start processing) into parameters. Obviously, some specifics are harder to make into parameters, like what and how to rename.
Using opendir, readdir, rename and File::Spec is fine for starting. There's an easier way, though: take a look at the Path::Class module, and specifically its two subclasses, Path::Class::Dir and Path::Class::File. They provide a well-crafted abstraction layer over File::Spec (and more), and together they are basically a one-stop service for filesystem operations.
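One more thing that may be worth checking in the question's code is operator precedence. rename is a list operator, so in rename $old, $new ? ... : ... the ?: applies to the second argument rather than to rename's return value. The sketch below takes the directory as a parameter, skips . and .. as suggested above, and puts parentheses around the rename call. The zero-padding rule in padded_name is an assumption about the intended result, not something the question states.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical renaming rule: "Chapter 7" becomes "Chapter 07".
sub padded_name {
    my ($name) = @_;
    return $name unless $name =~ /^(Chapter)\W*(\d+)$/;
    return sprintf '%s %02d', $1, $2;
}

# Rename the chapter subdirectories of $dir; returns the new names.
sub rename_chapters {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @children = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;

    my @renamed;
    for my $child (@children) {
        my $old = File::Spec->catdir($dir, $child);
        next unless -d $old;
        my $new_name = padded_name($child);
        next if $new_name eq $child;
        my $new = File::Spec->catdir($dir, $new_name);
        # The parentheses make "or die" test rename's own return value
        rename($old, $new) or die "rename $old failed: $!";
        push @renamed, $new_name;
    }
    my @sorted = sort @renamed;
    return @sorted;
}
```

Calling rename_chapters("Full Metal Alchemist") would then rename only the subdirectories whose names change under the rule, and die with the system error if any rename fails.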