perl deleting a subdirectory containing a file of specified format - perl

I have to write a program in Perl, and I'm very new to it. The task is:
There is a main directory containing many subdirectories, each of which contains further subdirectories, with files at the end of each chain. If a file has the ".TXT" extension, I should delete the subdirectory directly below the main directory that contains that file.
For example, given raghav\abc\ccd\1.txt, I should delete the subdirectory "abc".
My code is:
#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);
my $path ="d:\\raghav";
search_all_folder($path);
sub search_all_folder {
my ($folder) = @_;
if ( -d $folder ) {
chdir $folder;
opendir my $dh, $folder or die "can't open the directory: $!";
while ( defined( my $file = readdir($dh) ) ) {
chomp $file;
next if $file eq '.' or $file eq '..';
search_all_folder("$folder/$file"); ## recursive call
read_files($file) if ( -f $file );
}
closedir $dh or die "can't close directory: $!";
}
}
sub read_files {
my ($filename) = @_;
if($filename= ~/.txt/)
rmdir;
}
}

Never ever implement your own directory traversal. Use File::Find instead. It's more efficient and less prone to breaking.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
my $search_path = "d:\\raghav";
my %text_files_found_in;
sub find_text_files {
if (m/\.txt$/i) {
## you may want to transform this first, to get a higher level directory.
$text_files_found_in{$File::Find::dir}++;
}
}
find( \&find_text_files, $search_path );
foreach my $dir ( keys %text_files_found_in ) {
print "$dir contained a text file\n";
##maybe delete it, but don't until you're sure the code's working!
}
You've got a list of files now, against which you can figure out what to delete and then delete it. rmdir won't do the trick though - that only works on empty directories. You can either collate a list of files (as this does) or you could figure out the path to delete as you go, and insert it into a hash. (So you don't get dupes).
Either way though - it's probably better to run the find first, delete second rather than trying to delete a tree you may still be traversing.
Edit: What this program does:
Imports the File::Find module.
defines the subroutine find_text_files
runs find (in the File::Find module), and tells it to run find_text_files on every file it finds in its recursive traversal.
find_text_files is called on every file within the tree (below $search_path). When it's called:
File::Find sets $_ to the current filename. We match against m/\.txt$/i to see if it's a text file.
File::Find also sets two other variables: $File::Find::dir, the path of the directory containing this file, and $File::Find::name, the full path to the file. We insert $File::Find::dir into the hash text_files_found_in provided that the pattern matches.
Once find is complete, we have a hash called text_files_found_in which contains keys of all the directories where a text file was found.
we can then iterate on keys %text_files_found_in to identify - and delete.*
at this point, you may want to transform each of the directories in that list, because each key is the full path of the directory that directly contains the file, and you may only want to delete at a higher level.
* There's no delete code in this script - you'll have to sort that yourself. I'm not putting anything that might delete stuff up on the internet where people who don't fully understand it might just run it blind.
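The "transform to a higher level directory" step could be sketched roughly as follows. This is only an illustration under stated assumptions: top_level_dir is a hypothetical helper, the temporary sandbox stands in for d:\raghav, and the script deliberately stays a dry run that prints candidates instead of deleting anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical sandbox standing in for d:\raghav, so the sketch is safe to run.
my $search_path = tempdir( CLEANUP => 1 );
make_path( "$search_path/abc/ccd", "$search_path/xyz/deep" );
for my $file ( "$search_path/abc/ccd/1.txt", "$search_path/xyz/deep/notes.log" ) {
    open my $fh, '>', $file or die "Can't create $file: $!";
    close $fh;
}

# top_level_dir() is a hypothetical helper: for a directory somewhere below
# $root, it returns the path of the first-level subdirectory of $root.
sub top_level_dir {
    my ( $root, $dir ) = @_;
    my $relative = File::Spec->abs2rel( $dir, $root );
    my ($first) = grep { length } File::Spec->splitdir($relative);
    return if !defined $first or $first eq File::Spec->curdir;
    return File::Spec->catdir( $root, $first );
}

my %delete_candidates;
find(
    sub {
        return unless -f $_;
        return unless /\.txt\z/i;
        my $top = top_level_dir( $search_path, $File::Find::dir );
        $delete_candidates{$top}++ if defined $top;
    },
    $search_path
);

# Dry run: report only, nothing is removed.
print "would delete: $_\n" for sort keys %delete_candidates;
```

Collecting the candidates in a hash first keeps the traversal and any later deletion separate, and a .txt file sitting directly in the search path is skipped because it has no first-level subdirectory.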

Related

Recursive directory traversal in perl

I intend to recursively traverse a directory containing this piece of perl script.
The idea is to traverse all directories whose parent directory contains the perl script and list all files path into a single array variable. Then return the list.
Here comes the error msg:
readdir() attempted on invalid dirhandle $DIR at xxx
closedir() attempted on invalid dirhandle $DIR at xxx
Code is attached for reference, Thank you in advance.
use strict;
use warnings;
use Cwd;
our @childfile = ();
sub recursive_dir{
my $current_dir = $_[0]; # a parameter
opendir(my $DIR,$current_dir) or die "Fail to open current directory,error: $!";
while(my $contents = readdir($DIR)){
next if ($contents =~ m/^\./); # filter out "." and ".."
#if-else clause separate dirs from files
if(-d "$contents"){
#print getcwd;
#print $contents;
#closedir($DIR);
recursive_dir(getcwd."/$contents");
print getcwd."/$contents";
}
else{
if($contents =~ /(?<!\.pl)$/){
push(@childfile,$contents);
}
}
}
closedir($DIR);
#print @childfile;
return @childfile;
}
recursive_dir(getcwd);
Please tell us if this is homework? You are welcome to ask for help with assignments, but it changes the sort of answer you should be given.
You are relying on getcwd to give you the current directory that you are processing, yet you never change the current working directory so your program will loop endlessly and eventually run out of memory. You should simply use $current_dir instead.
I don't believe that those error messages can be produced by the program you show. Your code checks whether opendir has succeeded and the program dies unless $DIR is valid, so the subsequent readdir and closedir must be using a valid handle.
Some other points:
Comments like # a parameter are ridiculous and only serve to clutter your code
Upper-case letters are generally reserved for global identifiers like package names. And $DIR is a poor name for a directory handle, as it could also mean the directory name or the directory path. Use $dir_handle or $dh
It is crazy to use a negative look-behind just to check that a file name doesn't end with .pl. Just use push @childfile, $contents unless $contents =~ /\.pl$/
You never use the return value from your subroutine, so it is wasteful of memory to return what could be an enormous array from every call. @childfile is accessible throughout the program so you can just access it directly from anywhere
Don't put scalar variables inside double quotes. It simply forces the value to a string, which is probably unnecessary and may cause arcane bugs. Use just -d $contents
You probably want to ignore symbolic links, as otherwise you could be looping endlessly. You should change else { ... } to elsif (-f $contents) { ... }
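Taken together, the points above might lead to something like the following sketch. It is not the original poster's code: list_files is a hypothetical name, the sandbox directory is made up so the example runs anywhere, and returning a list from each call is one possible design (the alternative suggested above is to push onto a shared array directly).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# list_files() is a hypothetical, corrected take on recursive_dir().
# It builds every path from $current_dir instead of relying on getcwd.
sub list_files {
    my ($current_dir) = @_;
    my @found;

    opendir my $dh, $current_dir
        or die "Failed to open '$current_dir': $!";
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $entry ( sort @entries ) {
        my $path = "$current_dir/$entry";
        next if -l $path;    # skip symbolic links to avoid cycles
        if ( -d $path ) {
            push @found, list_files($path);
        }
        elsif ( -f $path ) {
            push @found, $path unless $entry =~ /\.pl$/;
        }
    }
    return @found;
}

# Hypothetical sandbox so the sketch can run anywhere.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die "mkdir failed: $!";
for my $file ( "$root/a.txt", "$root/script.pl", "$root/sub/c.txt" ) {
    open my $fh, '>', $file or die "Can't create $file: $!";
    close $fh;
}

my @childfile = list_files($root);
print "$_\n" for @childfile;
```

Reading all entries and closing the handle before recursing means only one directory handle is open per level at any time.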

How can I print a list of all files in a directory with their full path?

What I have:
I have a folder that has many other folders and files within it. I need to get a list of the paths to all files that are within a folder called l1. There are many different folders called l1 within my main directory, so I have to search for each l1 folder and return the paths to each file within it. I have been able to print a list of all the l1 folder locations, but I don't know how to list the files within these locations. The code I have for finding the locations of all the l1 folders is below.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my @folder;
sub wanted {
if ( -d && $_ eq 'l1' ) {
push @folder, $File::Find::name;
}
}
find \&wanted, '/mnt/vbox_share/';
open fp, ">process.txt";
print fp "@folder";
What do I need to modify or add to be able to list all file paths that are within the folders I searched for?
This is all that's needed.
wanted ignores everything but files, discarding directories and links
The full path to the file is in $File::Find::name when wanted is being executed
Splitting that on / and taking the last but one element $path[-2] finds the name of the parent directory
print the full file path if that directory equals l1
use strict;
use warnings;
use File::Find;
find( \&wanted, '/path/to/root/dir');
sub wanted {
return unless -f;
my @path = split /\//, $File::Find::name;
print $File::Find::name, "\n" if @path > 1 and $path[-2] eq 'l1';
}

Merge multiple HTML Files

I am merging multiple html files in the directory/subdirectory into single html within the same directories. I gone through some website and tried the below code:
#!/usr/bin/perl -w
use strict;
use File::Slurp;
my $basedir = 'c:/test';
opendir(DIR, $basedir) or die $!;
my @files = readdir(DIR); # name arrays plural, hashes singular
closedir DIR;
my $outfilename = 'final.htm';
my $outfilesrc = undef;
foreach (sort @files){
$outfilesrc.= File::Slurp::slurp("$basedir/$_");
}
open(OUT, "> $basedir/$outfilename") or die ("Can't open for writing: $basedir/$outfilename : $!");
print OUT $outfilesrc;
close OUT;
exit;
But I am getting the following error and could not merge the files.
read_file 'c:/test.' - sysopen: Permission denied at mergehtml.pl line 15
Can anyone help me! Is there any way to merge HTML files to single in Perl?
Your error most likely comes from trying to open the "current directory" entry, c:\test\., for reading. It gets there because readdir returns every directory entry, including . and .., not just regular files.
If all you want to do is concatenate the files, it's rather simple on Linux: cat test/* > final.htm. Unfortunately, on Windows it's a bit more tricky.
perl -pe"BEGIN { @ARGV = map glob, @ARGV }" "C:/test/*" > final.htm
Explanation:
We use the -p option to read and print the content of the argument file names. Those arguments are in this case a glob, and the windows command shell does not perform these globs automagically, so we have to ask perl to do it, with the built-in glob command. We do this in a BEGIN block to separate it from the rest of the code. The "rest of the code" is in this case just (basically) a while (<>) { print } block that reads and prints the contents of the files. At the end of the line we redirect all the output to the file final.htm.
Why use glob over readdir? Well, for one thing, readdir includes the directories . (current dir) and .. (parent dir), which will mess up your code, like I mentioned at the top. You would need to filter out directories. And glob does this smoothly with no problem.
If you want the longer version of this script, you can do
use strict;
use warnings;
@ARGV = map glob, @ARGV;
while (<>) {
print;
}
Note that I suspect that you only want html files to be merged. So it would perhaps be a good idea for you to change your glob from * to something like
*.htm *.html
Filter out the files "." and ".." from your @files list.
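A minimal sketch of that filtering, assuming you want to keep the readdir approach: the grep keeps only plain files ending in .htm or .html, which drops "." and ".." as a side effect. The sandbox directory and file names are made up so the example is self-contained, and it reads files with plain open rather than File::Slurp.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sandbox standing in for c:/test.
my $basedir = tempdir( CLEANUP => 1 );
for my $name (qw(b.htm a.htm notes.txt)) {
    open my $fh, '>', "$basedir/$name" or die "Can't create $name: $!";
    print {$fh} "<p>$name</p>\n";
    close $fh;
}

opendir my $dh, $basedir or die "Can't open $basedir: $!";
# Keep only plain files ending in .htm or .html; "." and ".." never match.
my @files = sort grep { /\.html?\z/i && -f "$basedir/$_" } readdir $dh;
closedir $dh;

my $merged = '';
for my $name (@files) {
    open my $in, '<', "$basedir/$name" or die "Can't read $name: $!";
    local $/;    # slurp mode: read the whole file in one go
    $merged .= <$in>;
    close $in;
}
print $merged;
```

If the merged output is written into the same directory under a .htm name, a later run would pick it up as input, so it may be worth excluding the output file name in the grep as well.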

Reading the contents of directories in Perl

I am trying to accomplish two things with a Perl script. I have a file, which in the first subdirectory has different user directories, and in each of these user directories contains some folders that have text files in them. I am trying to write a Perl script that
Lists the directories for each user
Gets the total number of .txt files
For the second objective I have this code
my @emails = glob "$dir/*.txt";
for (0..$#emails){
$emails[$_] =~ s/\.txt$//;
}
$emails=@emails;
but $emails is returning 0. Any insight?
Using glob is typically not a good idea when you need to process files in a directory and its possible subdirectories, because glob "$dir/*.txt" only matches files directly inside $dir.
A much better way is to use the File::Find module, like this:
use File::Find;
my @emails;
File::Find::find(
{
# this will be called for each file under $dir:
wanted => sub {
my $file = $File::Find::name;
return unless -f $file and $file =~ /\.txt$/;
# this file looks like email, remember it:
push @emails, $file;
}
},
$dir
);
print "Found " . scalar @emails . " .txt files (emails) in '$dir'\n";

Renaming Sub-directories in Perl

#!/usr/bin/perl -w
use strict;
use File::Copy;
use File::Spec;
my($chptr, $base_path, $new, $dir);
$dir = "Full Metal Alchemist"; #Some dir
opendir(FMA, $dir) or die "Can't open FMA dir";
while($chptr = readdir FMA){
$base_path = File::Spec->rel2abs($dir).'/'; #find absolute path of $fir
if($chptr =~ m(Chapter\w*\d*)){ #some regex to avoid the .. and . dirs
$new = join(" 0", split(/\W/, $chptr)); #modify said sub directory
rename "$base_path$chptr", "$base_path$new" ? print "Renames $base_path$chptr to
$base_path$new\n" : die "rename failed $!";
}
}
closedir FMA;
Originally, my script only used the relative path to perform the move operation, but for some reason this left the subdirectories unaffected. My next step was to switch to absolute paths, but to no avail. I am just learning Perl, so I feel like I'm making a simple mistake. Where have I gone wrong? TIA
You could exclude . and .. as follows:
if ( $child ne '.' and $child ne '..' ) { ... }
Some general remarks:
Always have a very clear spec of what you want to do. That also helps everybody trying to help you.
It's not clear what goes wrong here. Maybe your regex simply doesn't match the directories you want it to match? What is the problem?
Try to make very specific parts (like the name of the directory where you want to start processing) into parameters. Obviously, some specifics are harder to make into parameters, like what and how to rename.
Using opendir, readdir, rename and File::Spec is fine for starting. There's an easier way, though: take a look at the Path::Class module, and specifically its two subclasses. They provide a well-crafted abstraction layer over File::Spec (and more), and it's basically a one-stop service for filesystem operations.
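For comparison, here is a minimal sketch that sticks with the core opendir, readdir, rename and File::Spec approach. It rests on several assumptions: the sandbox directory and its "Chapter N" names are invented for illustration, and the regex is a guess at what the question intends. One point the remarks above do not cover, offered as a likely contributor rather than a confirmed diagnosis: rename is a list operator, so in rename $old, $new ? ... : ... the conditional is parsed as part of the second argument. Parenthesizing the call avoids that.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sandbox standing in for the "Full Metal Alchemist" directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'Chapter 1', 'Chapter 2' ) {
    mkdir( File::Spec->catdir( $dir, $name ) ) or die "mkdir failed: $!";
}

# Read every entry first, so nothing is renamed while the handle is open.
opendir my $dh, $dir or die "Can't open $dir: $!";
my @children = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
closedir $dh;

for my $child ( sort @children ) {
    next unless $child =~ /^Chapter\W\d+$/;
    my $new = join ' 0', split /\W/, $child;    # "Chapter 1" -> "Chapter 01"
    my $old_path = File::Spec->catdir( $dir, $child );
    my $new_path = File::Spec->catdir( $dir, $new );

    # The parentheses make sure rename receives exactly these two arguments.
    rename( $old_path, $new_path )
        or die "rename '$old_path' failed: $!";
    print "Renamed $child to $new\n";
}
```

The directory to process is held in a single variable, in line with the suggestion above to turn specific parts into parameters.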