Perl Recurse through Directory - perl

So essentially what I'm trying to do is go through a directory and perform an action on all of the files, in this case, the sub searchForErrors. This sub works. What I have so far is:
sub proccessFiles{
my $path = $ARGV[2];
opendir(DIR, $path) or die "Unable to open $path: $!";
my #files = readdir(DIR);
#files = map{$path . '/' . $_ } #files;
closedir(DIR);
for (#files){
if(-d $_){
process_files($_);
}
else{
searchForErrors;
}
}
}
proccessFiles($path);
Any help/suggestions would be great. And again, I'm new to Perl, so the more explanation the better. Thank you!

You should use the File::Find module instead of trying to reinvent the wheel:
use strict;
use warnings;
use File::Find;
my #files;
my $start_dir = "somedir"; # top level dir to search
find(
sub { push #files, $File::Find::name unless -d; },
$start_dir
);
for my $file (#files) {
searchForErrors($file);
}
A problem with your current code is that you are including . and .. directories in your recursive search, which will no doubt cause deep recursion errors.

I thought it would be useful to show a Path::Class solution in addition to TLP's File::Find one.
I think this is pretty much self-explanatory.
use strict;
use warnings;
use Path::Class 'dir';
my $root = dir 'C:\path\to\root';
$root->recurse(callback => sub {
my $file = shift;
searchForErrors($file) unless $file->is_dir;
});

A better way of using File::Find without using a lot of memory would be:
use strict; use warnings;
use File::Find;
find(
sub {
searchForErrors($File::Find::name) unless -d
},
"/tmp/"
);
This is more iterator style.

Related

Traversing Through Directories

I am trying to traverse through directories to change certain file extensions in those directories.
I made it to where I can go through a directory that is given through command-line, but I cannot make it traverse through that directories' subdirectories.
For example: If I want to change the file extensions in the directory Test then if Test has a subdirectory I want to be able to go through that directory and change the file extensions of those files too.
I came up with this. This works for one directory. It correctly changes the file extensions of the files in one specific directory.
#!/usr/local/bin/perl
use strict;
use warnings;
my #argv;
my $dir = $ARGV[0];
my #files = glob "${dir}/*pl";
foreach (#files) {
next if -d;
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
I then heard of File::Find::Rule, so I tried to use that to traverse through the directories.
I came up with this:
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Find;
use File::Find::Rule;
my #argv;
my $dir = $ARGV[0];
my #subdirs = File::find::Rule->directory->in( $dir );
sub fileRecurs{
my #files = glob "${dir}/*pl";
foreach (#files) {
next if -d;
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
}
This does not work/ will not work because I am not familiar enough with File::Find::Rule
Is there a better way to traverse through the directories to change the file extensions?
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Find;
my #argv;
my $dir = $ARGV[0];
find(\&dirRecurs, $dir);
sub dirRecurs{
if (-f)
{
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
}
I figured it out with the help of the tutorial #David sent me! Thank you!

Unable to get absolute path of a file from $File::Find::name - perl

I am unable to get the absolute path of a file from $File::Find::name. It is showing undef vale as a output. Not able to figure it out why :( can any one please help me out in this
Error displayed is : Use of uninitialized value $file_name in concatenation
My Code :
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Copy;
use Cwd;
use Data::Printer;
my $rootPATH = $ARGV[0];
my $id = $ARGV[1];
my #Arraypath;
my $file_name;
our $anr_name;
opendir( my $DIR, $rootPATH );
while ( my $entry = readdir $DIR ) {
next unless -d $rootPATH . '/' . $entry;
next if $entry eq '.' or $entry eq '..';
#print "Found directory $entry\n";
push( #Arraypath, ( split( "\n", $entry ) ) );
}
closedir $DIR;
my $count = 0;
foreach my $line (#Arraypath) {
my $fulllogpath = $rootPATH . "\\" . $line;
#print "$fulllogpath\n";
$count++;
start($fulllogpath);
}
sub start {
my $fulllogpath = shift;
our #content;
#print "$fulllogpath\n\n";
find( \&wanted, $fulllogpath );
sub wanted {
push #content, $_;
return;
}
foreach my $file (#content) {
# print "$file\n\n";
if ( $file =~ /traces[_d]*/ ) {
print "$file\n\n";
$file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}
}
}
Your program is very poorly layed out. It will be much simpler to write and debug code if you indent it properly and use carefully-chosen identifiers: a name like start for a subroutine is useless.
You also have unnecessary subroutine declarations which break up the program flow and make it awkward to follow.
Why do you have a couple of package variables (declared with our)? There is generally no need for them, and it is best to use lexical variables throughout, declared at an appropriate place so that all code has access to them if it needs it.
It is also preferable to use File::Spec to work with file paths, rather than manipulate them using string operators, with which it is easy to make a mistake.
The best way to manage the results of find is to work with absolute paths all the way through. It looks like you want to do more than just print the results returned by find since you load modules like Cwd and File::Copy, but without knowing what that further purpose is I cannot help you to write it.
This code removes all the subroutines and makes everything much more concise.
use strict;
use warnings;
use autodie;
use File::Find 'find';
use File::Spec;
use Data::Printer;
my ($root_path, $id) = #ARGV;
opendir my ($dh), $root_path;
my #dir_list =
grep -d,
map File::Spec->catfile($root_path, $_),
grep { not /\A\.\.?\z/ } readdir $dh;
closedir $dh;
my $count;
for my $dir (#dir_list) {
++$count;
find(sub {
return unless /traces[_d]*/;
my $file = $_;
print "$file\n\n";
my $file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}, $dir);
}
As has already been stated, $File::Find::name is valid only within the wanted function. Not outside of it.
However, I would recommend making the shift to using Path::Class and Path::Class::Rule for some simpler processing of your files in a cross platform compatible way:
use strict;
use warnings;
use Data::Printer;
use Path::Class;
use Path::Class::Rule;
my ( $root_path, $id ) = #ARGV;
my $dir = dir($root_path);
my $next = Path::Class::Rule->new->file->name(qr{traces[_d]*})->iter(
grep { $_->is_dir() } $dir->children
);
while ( my $file = $next->() ) {
# Accomplishes the same as your script. I suspect these prints statements are mostly for debugging though.
print $file->basename(), "\n\n";
p "$file";
print "$file\n";
}

Find::File to search a directory of a list of files

I'm writing a Perl script and I'm new to Perl -- I have a file that contains a list of files. For each item on the list I want to search a given directory and its sub-directories to find the file return the full path. I've been unsuccessful thus far trying to use File::Find. Here's what I got:
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
find(\&wanted, $directory);
sub wanted {
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
return #file_list;
}
I find File::Find::Rule a tad easier and more elegant to use.
use File::Find::Rule;
my $path = '/some/path';
# Find all directories under $path
my #paths = File::Find::Rule->directory->in( $path );
# Find all files in $path
my #files = File::Find::Rule->file->in( $path );
The arrays contain full paths to the objects File::Find::Rule finds.
File::Find is used to traverse a directory structure in the filesystem. Instead of doing what you're trying to do, namely, have the wanted subroutine read in the file, you should read in the file as follows:
use strict;
use warnings;
use vars qw/#file_list/;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
open FILE, "$input_file" or die "$!\n";
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
# do what you need to here with the #file_list array
Okay, well re-read the doc and I misunderstood the wanted subroutine. The wanted is a subroutine that is called on every file and directory that is found. So here's my code to take that into account
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
find(\&wanted, $directory);
sub wanted {
if ( $_ ~~ #file_list ) {
print "$File::Find::name\n";
}
return;
}

Perl search for specific subdirectory then process

So for the program I am writing, what I would like for it to do is search through all of the subdirectories within a directory. If the subdirectory name contains a word, let's say "foo", then the program will open this subdirectory and perform a function on the files within the subdirectory. Can anybody give me some help on how to go about this? it also needs to be recursive. Thanks in advance
This can be done using the File::Find module, but I believe Path::Class is superior even though it isn't a core module and will likely need installing.
This program finds the files wanted and calls process to process them. At present the process subroutine simply prints the name of the file for testing.
use strict;
use warnings;
use Path::Class;
my $dir = dir '/path/to/root/directory';
$dir->recurse(callback => sub {
my $node = shift;
return if $node->is_dir;
my $parent = $node->parent;
if ($parent->basename =~ /foo/) {
process($node);
}
});
sub process {
my $file = shift;
print $file, "\n";
}
Update
If you prefer, this program performs the same task using File::Find.
use strict;
use warnings;
use File::Find;
use File::Basename qw/ basename /;
my $dir = '/path/to/root/directory';
find(sub {
return unless -f;
if (basename($File::Find::dir) =~ /foo/) {
process($File::Find::name);
}
}, $dir);
sub process {
my $file = shift;
print $file, "\n";
}
Update
As requested, here is a further solution using Path::Class::Rule for comparison. As daxim suggested the code is a little shorter.
use strict;
use warnings;
use Path::Class::Rule;
my $rule = Path::Class::Rule->new;
$rule->file->and(sub { $_->parent->basename =~ /foo/ });
my $next = $rule->iter('/path/to/root/directory');
while ( my $file = $next->() ) {
process($file);
}
sub process {
my $file = shift;
print $file, "\n";
}

change the directory and grab the xml file to parse certain data in perl

I am trying to parse specific XML file which is located in sub directories of one directory. For some reason i am getting error saying file does not exists. if the file does not exist it should move on to next sub directory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my #xmlsearch = map { chomp; $_ } `ls`;
foreach my $directory (#xmlsearch) {
print "$directory \n";
chdir($directory) or die "Couldn't change to [$directory]: $!";
my #findResults = `find -name education.xml`;
foreach my $educationresults (#findResults){
print $educationresults;
my $parser = new XML::Simple;
my $data = $parser->XMLin($educationresults);
print Dumper($data);
chdir('..');
}
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml
Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
use autodie;
use File::Find;
use XML::Simple;
use Data::Dumper;
sub findxml {
my #found;
opendir(DIR, '.');
my #where = grep { -d && m#^[^.]+$# } readdir(DIR);
closedir(DIR);
File::Find::find({wanted => sub {
push #found, $File::Find::name if m#^education\.xml$#s && -f _;
} }, #where);
return #found;
}
foreach my $xml (findxml()){
say $xml;
print Dumper XMLin($xml);
}
Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my #array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my #array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my #array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files in dir 1 level up
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;
my #list;
find( sub { push #list, $File::Find::name if /^education\.xml$/i; }, ".");
for (#list) {
# do stuff
# #list contains full path names of education.xml files found in subdirs
# e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");