Identifying the difference between ordinary folders and "."/".." folders - perl

I'm writing a Perl script to automatically copy PDFs from a folder.
Users cannot access folders they don't have permission for, so they don't accidentally gain access to any information they aren't supposed to see.
I have a rough mock-up which works, except for one bug: it keeps seeing the . and .. folders, opens them, and enters an infinite loop.
The following conditional statement checks whether the file is a PDF; if so, it is passed to my copyPDF sub, which checks for exceptions and then copies the file. Otherwise the script tries to open the entry as a folder and, if it is one, scans its contents and repeats.
I've tried a number of ways to ignore . and .., but they always end up ignoring all the other subfolders as well. Has anyone got a workaround?
if ($file =~ /\.pdf$/i) {
    print "$file is a pdf\n";
    $fileLocation = "$directoryName/$file";
    copyPDF("$fileLocation", "$file");
}
elsif ($file == '.') {
    #print "argh\n";
}
else {
    $openFolder = "$directory/$file";
    print "*$openFolder";
    openNextDirectory("$openFolder");
}

Always use use strict; use warnings;!!!
$file == '.'
produces
Argument "." isn't numeric in numeric eq (==)
because you are asking Perl to compare two numbers. You should be using
$file eq '.'
See perldoc perlop for more information about Perl's operators.
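With string comparison, the usual idiom is to skip both special entries at the top of the read loop before any other tests. A minimal sketch (the sub name and the recursion are illustrative, not from the question's full script):

```perl
use strict;
use warnings;

# Recursively collect PDF paths, skipping the '.' and '..' entries
# with string eq rather than the numeric ==.
sub collect_pdfs {
    my ($dir) = @_;
    my @pdfs;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    while (my $file = readdir $dh) {
        next if $file eq '.' or $file eq '..';    # eq for strings
        my $path = "$dir/$file";
        if ($file =~ /\.pdf$/i) {
            push @pdfs, $path;
        }
        elsif (-d $path) {
            push @pdfs, collect_pdfs($path);      # recurse into real subfolders only
        }
    }
    closedir $dh;
    return @pdfs;
}
```

Because the `next` fires before the directory test, `.` and `..` can never be recursed into, while genuine subfolders still are.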

This old question has some great answers that address this and similar questions:
How can I copy a directory recursively and filter filenames in Perl?

Use the File::Find module:
use File::Find;
use File::Copy;

my $directory = 'c:\my\pdf\directory';
find(\&pdfcopy, $directory);

sub pdfcopy {
    my $newdirectory = 'c:\some\new\dir';
    return if $File::Find::name !~ /\.pdf$/i;
    copy($File::Find::name, $newdirectory)
        or die "File $File::Find::name cannot be copied: $!";
}

Related

create new file listing all text files in a directory with perl

I am trying to list all the text files in a directory using Perl. The script below does run, but the resulting file is empty. This seems close, but maybe it is not what I need. Thank you :).
get_list.pl
#!/bin/perl
# create a list of all *.txt files in the current directory
opendir(DIR, ".");
@files = grep(/\..txt$/, readdir(DIR));
closedir(DIR);

# print all the filenames in our array
foreach $file (@files) {
    print "$file\n";
}
As written, your grep is wrong:
@files = grep(/\..txt$/, readdir(DIR));
In regular expressions, an unescaped . means "any character". So you will find a file called
fish.mtxt
but not a file called
fish.txt
because of that extra dot.
You probably want grep /\.txt$/, readdir(DIR)
But personally, I wouldn't bother, and just use glob instead.
foreach my $file (glob "*.txt") {
print $file,"\n";
}
Also - turn on use strict; use warnings;. Consider them mandatory until you know why you want to turn them off. (There are occasions, but you'll know what they are if you ever REALLY NEED to).
You have one excess dot:
@files = grep(/\..txt$/, readdir(DIR));
should be:
@files = grep(/\.txt$/, readdir(DIR));

perl deleting a subdirectory containing a file of specified format

I have to write a program in Perl, and I'm very new to it. The task is:
There will be a directory, and inside it many subdirectories; each subdirectory contains further subdirectories, with files at the end of the chain. If a file's format is ".TXT", I should delete the subdirectory next to the main directory that contains that .TXT file.
For example, for raghav\abc\ccd\1.txt I should delete the subdirectory "abc".
my code is
#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);

my $path = "d:\\raghav";
search_all_folder($path);

sub search_all_folder {
    my ($folder) = @_;
    if ( -d $folder ) {
        chdir $folder;
        opendir my $dh, $folder or die "can't open the directory: $!";
        while ( defined( my $file = readdir($dh) ) ) {
            chomp $file;
            next if $file eq '.' or $file eq '..';
            search_all_folder("$folder/$file");    ## recursive call
            read_files($file) if ( -f $file );
        }
        closedir $dh or die "can't close directory: $!";
    }
}

sub read_files {
    my ($filename) = @_;
    if ( $filename =~ /\.txt$/i ) {
        rmdir;
    }
}
Never ever implement your own directory traversal. Use File::Find instead. It's more efficient and less prone to breaking.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;

my $search_path = "d:\\raghav";
my %text_files_found_in;

sub find_text_files {
    if (m/\.txt$/i) {
        ## you may want to transform this first, to get a higher level directory.
        $text_files_found_in{$File::Find::dir}++;
    }
}

find( \&find_text_files, $search_path );

foreach my $dir ( keys %text_files_found_in ) {
    print "$dir contained a text file\n";
    ## maybe delete it, but don't until you're sure the code's working!
}
You've got a list of directories now, against which you can figure out what to delete and then delete it. rmdir won't do the trick though - that only works on empty directories. You can either collate a list of directories (as this does) or you could figure out the path to delete as you go, and insert it into a hash. (So you don't get dupes.)
Either way though - it's probably better to run the find first, delete second rather than trying to delete a tree you may still be traversing.
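For the deletion pass itself, File::Path's remove_tree (core since Perl 5.10) removes non-empty directories, which rmdir cannot. A hedged sketch of the delete-after-traversal step, assuming a hash of candidate directories has been filled as above:

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Run this only AFTER find() has finished, so we never delete a
# tree that is still being traversed.
sub delete_collected_dirs {
    my (%dirs) = @_;
    for my $dir (sort keys %dirs) {
        # remove_tree, unlike rmdir, deletes the directory and everything in it
        remove_tree($dir, { error => \my $errors });
        warn "Could not fully remove $dir\n" if @$errors;
    }
}
```

The `error` option collects failures instead of dying mid-delete, so one unremovable directory doesn't abort the whole pass.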
Edit: What this program does:
Imports the File::Find module.
Defines the subroutine find_text_files.
Runs find (in the File::Find module), and tells it to run find_text_files on every file it finds in its recursive traversal.
find_text_files is called on every file within the tree (below $search_path). When it's called:
File::Find sets $_ to the current filename. We match against m/\.txt$/i to see if it's a text file.
File::Find also sets two other variables: $File::Find::dir, the directory path to this file, and $File::Find::name, the full file path. We insert $File::Find::dir into the hash %text_files_found_in provided that pattern matches.
Once find is complete, we have a hash called %text_files_found_in whose keys are all the directories where a text file was found.
We can then iterate on keys %text_files_found_in to identify - and delete.*
At this point, you may want to transform each of the directories in that list, because it'll be a full path to the file, and you may only want to delete at a higher level.
* There's no delete code in this script - you'll have to sort that yourself. I'm not putting anything that might delete stuff up on the internet where people who don't fully understand it might just run it blind.

Recursive directory traversal in perl

I intend to recursively traverse a directory tree, starting from the directory containing this Perl script.
The idea is to traverse all directories below the script's parent directory, collect all file paths into a single array variable, and then return the list.
Here are the error messages:
readdir() attempted on invalid dirhandle $DIR at xxx
closedir() attempted on invalid dirhandle $DIR at xxx
Code is attached for reference. Thank you in advance.
use strict;
use warnings;
use Cwd;

our @childfile = ();

sub recursive_dir {
    my $current_dir = $_[0];    # a parameter
    opendir(my $DIR, $current_dir) or die "Fail to open current directory, error: $!";
    while (my $contents = readdir($DIR)) {
        next if ($contents =~ m/^\./);    # filter out "." and ".."
        # if-else clause separates dirs from files
        if (-d "$contents") {
            #print getcwd;
            #print $contents;
            #closedir($DIR);
            recursive_dir(getcwd . "/$contents");
            print getcwd . "/$contents";
        }
        else {
            if ($contents =~ /(?<!\.pl)$/) {
                push(@childfile, $contents);
            }
        }
    }
    closedir($DIR);
    #print @childfile;
    return @childfile;
}

recursive_dir(getcwd);
recursive_dir(getcwd);
Please tell us if this is homework? You are welcome to ask for help with assignments, but it changes the sort of answer you should be given.
You are relying on getcwd to give you the current directory that you are processing, yet you never change the current working directory so your program will loop endlessly and eventually run out of memory. You should simply use $current_dir instead.
I don't believe that those error messages can be produced by the program you show. Your code checks whether opendir has succeeded and the program dies unless $DIR is valid, so the subsequent readdir and closedir must be using a valid handle.
Some other points:
Comments like # a parameter are ridiculous and only serve to clutter your code
Upper-case letters are generally reserved for global identifiers like package names. And $dir is a poor name for a directory handle, as it could also mean the directory name or the directory path. Use $dir_handle or $dh
It is crazy to use a negative look-behind just to check that a file name doesn't end with .pl. Just use push #childfile, $contents unless $contents =~ /\.pl$/
You never use the return value from your subroutine, so it is wasteful of memory to return what could be an enormous array from every call. #childfile is accessible throughout the program so you can just access it directly from anywhere
Don't put scalar variables inside double quotes. It simply forces the value to a string, which is probably unnecessary and may cause arcane bugs. Use just -d $contents
You probably want to ignore symbolic links, as otherwise you could be looping endlessly. You should change else { ... } to elsif (-f $contents) { ... }
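Putting those points together, a corrected version of the subroutine might look like this (still collecting into the package-wide array, as suggested; names are kept from the question, and the sketch assumes you want full paths rather than bare filenames):

```perl
use strict;
use warnings;

our @childfile = ();

sub recursive_dir {
    my ($current_dir) = @_;
    opendir my $dh, $current_dir or die "Fail to open $current_dir: $!";
    while (my $contents = readdir $dh) {
        next if $contents =~ m/^\./;            # skips '.', '..' and hidden entries
        my $path = "$current_dir/$contents";    # build the path explicitly; no getcwd
        if (-d $path) {
            recursive_dir($path);
        }
        elsif (-f $path) {                      # ignore symlinks and other oddities
            push @childfile, $path unless $contents =~ /\.pl$/;
        }
    }
    closedir $dh;
}
```

Because each recursive call receives the path it should read, the subroutine no longer depends on the current working directory and cannot loop endlessly on itself.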

Reading the contents of directories in Perl

I am trying to accomplish two things with a Perl script. I have a directory whose first level of subdirectories holds different user directories, and each of these user directories contains some folders that have text files in them. I am trying to write a Perl script that
Lists the directories for each user
Gets the total number of .txt files
For the second objective I have this code
my @emails = glob "$dir/*.txt";
for (0 .. $#emails) {
    $emails[$_] =~ s/\.txt$//;
}
$emails = @emails;
but $emails is returning 0. Any insight?
Typically, using glob is not a very good idea when it comes to processing files in directories and possibly subdirectories.
A much better way is to use the File::Find module, like this:
use File::Find;

my @emails;
File::Find::find(
    {
        # this will be called for each file under $dir:
        wanted => sub {
            my $file = $File::Find::name;
            return unless -f $file and $file =~ /\.txt$/;
            # this file looks like an email, remember it:
            push @emails, $file;
        }
    },
    $dir
);
print "Found " . scalar #emails . " .txt files (emails) in '$dir'\n";

Renaming Sub-directories in Perl

#!/usr/bin/perl -w
use strict;
use File::Copy;
use File::Spec;

my ($chptr, $base_path, $new, $dir);
$dir = "Full Metal Alchemist";    # some dir
opendir(FMA, $dir) or die "Can't open FMA dir";
while ($chptr = readdir FMA) {
    $base_path = File::Spec->rel2abs($dir) . '/';    # find absolute path of $dir
    if ($chptr =~ m(Chapter\w*\d*)) {    # some regex to avoid the .. and . dirs
        $new = join(" 0", split(/\W/, $chptr));    # modify said subdirectory
        rename "$base_path$chptr", "$base_path$new"
            ? print "Renames $base_path$chptr to $base_path$new\n"
            : die "rename failed $!";
    }
}
closedir FMA;
Originally, my script only used the relative path to perform the rename, but for some reason this leaves the subdirectories unaffected. My next step was to switch to absolute paths, but to no avail. I am just learning Perl, so I feel like I'm making a simple mistake. Where have I gone wrong? TIA
You could exclude . and .. as follows:
if ( $chptr ne '.' and $chptr ne '..' ) { ... }
Some general remarks:
Always have a very clear spec of what you want to do. That also helps everybody trying to help you.
It's not clear what goes wrong here. Maybe your regex simply doesn't match the directories you want it to match? What is the problem?
Try to make very specific parts (like the name of the directory where you want to start processing) into parameters. Obviously, some specifics are harder to make into parameters, like what and how to rename.
Using opendir, readdir, rename and File::Spec is fine for starting. There's an easier way, though: take a look at the Path::Class module, and specifically its two subclasses. They provide a well-crafted abstraction layer over File::Spec (and more), and it's basically a one-stop service for filesystem operations.
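One concrete thing worth checking in the posted loop: in `rename "$a", "$b" ? A : B`, the ternary binds tighter than the list comma, so Perl parses the `? :` as part of rename's second argument rather than as a test of rename's return value. Parenthesizing the call makes the precedence explicit; a small sketch with an illustrative helper name:

```perl
use strict;
use warnings;

# Rename one chapter directory, reporting success or failing loudly.
# The parentheses keep the ternary out of rename's argument list.
sub rename_chapter {
    my ($base_path, $old, $new) = @_;
    rename("$base_path$old", "$base_path$new")
        ? print "Renamed $base_path$old to $base_path$new\n"
        : die "rename failed: $!";
    return 1;
}
```

An `if`/`else` would work just as well here; the point is only that the condition must be the parenthesized rename call itself.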