change the directory and grab the xml file to parse certain data in perl - perl

I am trying to parse specific XML file which is located in sub directories of one directory. For some reason i am getting error saying file does not exists. if the file does not exist it should move on to next sub directory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my #xmlsearch = map { chomp; $_ } `ls`;
foreach my $directory (#xmlsearch) {
print "$directory \n";
chdir($directory) or die "Couldn't change to [$directory]: $!";
my #findResults = `find -name education.xml`;
foreach my $educationresults (#findResults){
print $educationresults;
my $parser = new XML::Simple;
my $data = $parser->XMLin($educationresults);
print Dumper($data);
chdir('..');
}
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml

Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
use autodie;
use File::Find;
use XML::Simple;
use Data::Dumper;
sub findxml {
my #found;
opendir(DIR, '.');
my #where = grep { -d && m#^[^.]+$# } readdir(DIR);
closedir(DIR);
File::Find::find({wanted => sub {
push #found, $File::Find::name if m#^education\.xml$#s && -f _;
} }, #where);
return #found;
}
foreach my $xml (findxml()){
say $xml;
print Dumper XMLin($xml);
}

Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my #array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my #array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my #array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files in dir 1 level up
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;
my #list;
find( sub { push #list, $File::Find::name if /^education\.xml$/i; }, ".");
for (#list) {
# do stuff
# #list contains full path names of education.xml files found in subdirs
# e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");

Related

Unable to get absolute path of a file from $File::Find::name - perl

I am unable to get the absolute path of a file from $File::Find::name. It is showing undef vale as a output. Not able to figure it out why :( can any one please help me out in this
Error displayed is : Use of uninitialized value $file_name in concatenation
My Code :
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Copy;
use Cwd;
use Data::Printer;
my $rootPATH = $ARGV[0];
my $id = $ARGV[1];
my #Arraypath;
my $file_name;
our $anr_name;
opendir( my $DIR, $rootPATH );
while ( my $entry = readdir $DIR ) {
next unless -d $rootPATH . '/' . $entry;
next if $entry eq '.' or $entry eq '..';
#print "Found directory $entry\n";
push( #Arraypath, ( split( "\n", $entry ) ) );
}
closedir $DIR;
my $count = 0;
foreach my $line (#Arraypath) {
my $fulllogpath = $rootPATH . "\\" . $line;
#print "$fulllogpath\n";
$count++;
start($fulllogpath);
}
sub start {
my $fulllogpath = shift;
our #content;
#print "$fulllogpath\n\n";
find( \&wanted, $fulllogpath );
sub wanted {
push #content, $_;
return;
}
foreach my $file (#content) {
# print "$file\n\n";
if ( $file =~ /traces[_d]*/ ) {
print "$file\n\n";
$file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}
}
}
Your program is very poorly layed out. It will be much simpler to write and debug code if you indent it properly and use carefully-chosen identifiers: a name like start for a subroutine is useless.
You also have unnecessary subroutine declarations which break up the program flow and make it awkward to follow.
Why do you have a couple of package variables (declared with our)? There is generally no need for them, and it is best to use lexical variables throughout, declared at an appropriate place so that all code has access to them if it needs it.
It is also preferable to use File::Spec to work with file paths, rather than manipulate them using string operators, with which it is easy to make a mistake.
The best way to manage the results of find is to work with absolute paths all the way through. It looks like you want to do more than just print the results returned by find since you load modules like Cwd and File::Copy, but without knowing what that further purpose is I cannot help you to write it.
This code removes all the subroutines and makes everything much more concise.
use strict;
use warnings;
use autodie;
use File::Find 'find';
use File::Spec;
use Data::Printer;
my ($root_path, $id) = #ARGV;
opendir my ($dh), $root_path;
my #dir_list =
grep -d,
map File::Spec->catfile($root_path, $_),
grep { not /\A\.\.?\z/ } readdir $dh;
closedir $dh;
my $count;
for my $dir (#dir_list) {
++$count;
find(sub {
return unless /traces[_d]*/;
my $file = $_;
print "$file\n\n";
my $file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}, $dir);
}
As has already been stated, $File::Find::name is valid only within the wanted function. Not outside of it.
However, I would recommend making the shift to using Path::Class and Path::Class::Rule for some simpler processing of your files in a cross platform compatible way:
use strict;
use warnings;
use Data::Printer;
use Path::Class;
use Path::Class::Rule;
my ( $root_path, $id ) = #ARGV;
my $dir = dir($root_path);
my $next = Path::Class::Rule->new->file->name(qr{traces[_d]*})->iter(
grep { $_->is_dir() } $dir->children
);
while ( my $file = $next->() ) {
# Accomplishes the same as your script. I suspect these prints statements are mostly for debugging though.
print $file->basename(), "\n\n";
p "$file";
print "$file\n";
}

Find::File to search a directory of a list of files

I'm writing a Perl script and I'm new to Perl -- I have a file that contains a list of files. For each item on the list I want to search a given directory and its sub-directories to find the file return the full path. I've been unsuccessful thus far trying to use File::Find. Here's what I got:
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
find(\&wanted, $directory);
sub wanted {
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
return #file_list;
}
I find File::Find::Rule a tad easier and more elegant to use.
use File::Find::Rule;
my $path = '/some/path';
# Find all directories under $path
my #paths = File::Find::Rule->directory->in( $path );
# Find all files in $path
my #files = File::Find::Rule->file->in( $path );
The arrays contain full paths to the objects File::Find::Rule finds.
File::Find is used to traverse a directory structure in the filesystem. Instead of doing what you're trying to do, namely, have the wanted subroutine read in the file, you should read in the file as follows:
use strict;
use warnings;
use vars qw/#file_list/;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
open FILE, "$input_file" or die "$!\n";
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
# do what you need to here with the #file_list array
Okay, well re-read the doc and I misunderstood the wanted subroutine. The wanted is a subroutine that is called on every file and directory that is found. So here's my code to take that into account
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
find(\&wanted, $directory);
sub wanted {
if ( $_ ~~ #file_list ) {
print "$File::Find::name\n";
}
return;
}

Perl search for specific subdirectory then process

So for the program I am writing, what I would like for it to do is search through all of the subdirectories within a directory. If the subdirectory name contains a word, let's say "foo", then the program will open this subdirectory and perform a function on the files within the subdirectory. Can anybody give me some help on how to go about this? it also needs to be recursive. Thanks in advance
This can be done using the File::Find module, but I believe Path::Class is superior even though it isn't a core module and will likely need installing.
This program finds the files wanted and calls process to process them. At present the process subroutine simply prints the name of the file for testing.
use strict;
use warnings;
use Path::Class;
my $dir = dir '/path/to/root/directory';
$dir->recurse(callback => sub {
my $node = shift;
return if $node->is_dir;
my $parent = $node->parent;
if ($parent->basename =~ /foo/) {
process($node);
}
});
sub process {
my $file = shift;
print $file, "\n";
}
Update
If you prefer, this program performs the same task using File::Find.
use strict;
use warnings;
use File::Find;
use File::Basename qw/ basename /;
my $dir = '/path/to/root/directory';
find(sub {
return unless -f;
if (basename($File::Find::dir) =~ /foo/) {
process($File::Find::name);
}
}, $dir);
sub process {
my $file = shift;
print $file, "\n";
}
Update
As requested, here is a further solution using Path::Class::Rule for comparison. As daxim suggested the code is a little shorter.
use strict;
use warnings;
use Path::Class::Rule;
my $rule = Path::Class::Rule->new;
$rule->file->and(sub { $_->parent->basename =~ /foo/ });
my $next = $rule->iter('/path/to/root/directory');
while ( my $file = $next->() ) {
process($file);
}
sub process {
my $file = shift;
print $file, "\n";
}

Perl Recurse through Directory

So essentially what I'm trying to do is go through a directory and perform an action on all of the files, in this case, the sub searchForErrors. This sub works. What I have so far is:
sub proccessFiles{
my $path = $ARGV[2];
opendir(DIR, $path) or die "Unable to open $path: $!";
my #files = readdir(DIR);
#files = map{$path . '/' . $_ } #files;
closedir(DIR);
for (#files){
if(-d $_){
process_files($_);
}
else{
searchForErrors;
}
}
}
proccessFiles($path);
Any help/suggestions would be great. And again, I'm new to Perl, so the more explanation the better. Thank you!
You should use the File::Find module instead of trying to reinvent the wheel:
use strict;
use warnings;
use File::Find;
my #files;
my $start_dir = "somedir"; # top level dir to search
find(
sub { push #files, $File::Find::name unless -d; },
$start_dir
);
for my $file (#files) {
searchForErrors($file);
}
A problem with your current code is that you are including . and .. directories in your recursive search, which will no doubt cause deep recursion errors.
I thought it would be useful to show a Path::Class solution in addition to TLP's File::Find one.
I think this is pretty much self-explanatory.
use strict;
use warnings;
use Path::Class 'dir';
my $root = dir 'C:\path\to\root';
$root->recurse(callback => sub {
my $file = shift;
searchForErrors($file) unless $file->is_dir;
});
A better way of using File::Find without using a lot of memory would be:
use strict; use warnings;
use File::Find;
find(
sub {
searchForErrors($File::Find::name) unless -d
},
"/tmp/"
);
This is more iterator style.

How do I use chdir to traverse subdirectories and parse XML files?

I want to write a script that traverses a directory and its subdirectories, grabs all the XML files and parses them. I am having trouble with chdir. This works fine:
my $search = "/home/user/books";
chdir($search) or die "cant change dir to $search $!";
system("ls");
But I want the user to decide the path where he want to search it so I am using Getopt::Long:
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use Getopt::Long;
my $outputFile = '';
my $searchPath = "";
my $debug = 0;
GetOptions('outputFile=s' => \$outputFile, 'searchPath=s' => \$searchPath);
if ($outputFile eq '' or $searchPath = '') {
die("parameter --outpulFile=s is required.");
}
$searchPath =~ s/\/*$/\//;
my #founddirs = `cd $searchPath`;
foreach my $foundfiles (#founddirs) {
print $foundfiles;
chdir($foundfiles) or die "cant change dir to $searchPath $!";
chdir('..');
}
Command to run:
perl sample.pl --outputFile=books.txt --searchPath=/home/user/june18
I want to grab all the recursive.xml files from the subdirectories and parse them. Does anyone know how this can be done?
A couple of issues here:
$searchPath = '' is setting the search path to an empty string during the input validation. Use eq instead (not ==)
#founddirs will contain nothing since the backtick operator will return nothing. This is because
my #founddirs = `cd $searchPath`;
does not print found directories that are separated by newlines. Perhaps you're after ls $searchPath
On a side note, why not use File::Find instead?
use strict;
use warnings;
use File::Find;
use Getopt::Long;
my $outputFile;
my $searchPath;
GetOptions(
'outputFile=s' => \$outputFile,
'searchPath=s' => \$searchPath,
);
die "Usage : perl sample.pl -outputFile -searchPath\n"
unless $outputFile && $searchPath;
die "No such directory found: $searchPath\n" unless -d $searchPath;
find( sub { print "$File::Find::name\n" if /$outputFile/ }, $searchPath );
#!/usr/bin/perl --
use strict; use warnings;
use Data::Dump qw/ dd /;
use File::Find::Rule qw/ find /;
my #files = find(
file =>
name => '*.xml',
in => \#ARGV
);
dd \#files;
__END__
$ perl ffrule
[]
$ perl ffrule ../soap
[
"../soap/ex1.xml",
"../soap/ex2.xml",
"../soap/ex3.xml",
]