PERL - issues extracting a file from directory/subdirectories/..? - perl

Quick note: I've been stuck on this problem for quite a few days, and I'm not necessarily hoping for a complete answer, but for any kind of help that might "enlighten" me. I would also like to mention that I am a beginner in Perl, so my knowledge is not very vast, and recursion is not my forte. Here goes:
What I would like my Perl script to do is the following:
take a directory as an argument
go into the directory that was passed and its subdirectories to find an *.xml file
store the full path of the found *.xml file into an array.
Below is the code that I have so far, but I haven't managed to make it work:
#! /usr/bin/perl -W
my $path;
process_files($path);

sub process_files
{
    opendir (DIR, $path) or die "Unable to open $path: $!";
    my @files =
        # Third: Prepend the full path
        map { $path . '/' . $_ }
        # Second: take out '.' and '..'
        grep { !/^\.{1,2}$/ }
        # First: get all files
        readdir (DIR);
    closedir (DIR);
    for (@files)
    {
        if (-d $_)
        {
            push @files, process_files($_);
        }
        else
        {
            # analyse document
        }
    }
    return @files;
}
Anybody have any clues to point me in the right direction? Or an easier way to do it?
Thank you,
sSmacKk :D

Sounds like you should be using File::Find. Its find subroutine will traverse a directory recursively.
use strict;
use warnings;
use File::Find;

my @files;
my $path = shift;

find(
    sub {
        ( -f && /\.xml$/i ) or return;
        push @files, $File::Find::name;
    },
    $path
);
The subroutine will perform whatever code it contains on the files it finds. This one simply pushes the XML file names (with full path) onto the @files array. Read more in the documentation for the File::Find module, which is a core module in perl 5.
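As a self-contained illustration of the same approach, here is a sketch that fabricates a tiny directory tree with a temporary directory (the `sub/doc.xml` layout is invented purely for the demo) and then collects the XML paths exactly as above:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny demo tree: $path/sub/doc.xml
my $path = tempdir( CLEANUP => 1 );
mkdir "$path/sub" or die $!;
open my $fh, '>', "$path/sub/doc.xml" or die $!;
close $fh;

my @files;
find(
    sub {
        ( -f && /\.xml$/i ) or return;
        push @files, $File::Find::name;   # full path of each .xml file
    },
    $path
);

print "$_\n" for @files;
```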

Related

Error - For all subdirectories under a main dir - create a list of files within each subdirectory

I am trying to generate an individual list of the files inside each subdirectory under one given directory. This may be an already addressed and resolved problem, but my specific issue is that my subdirectories will always have a pattern in their names and also an exact number of files, e.g.:
<**Main_Dir**>
<sub-directory>- sub1_dir with files
sub1.name.txt
sub1.place.txt
sub3.time.txt
sub4.date.txt
<sub-directory>- sub2_dir with files
sub2.name.txt
sub2.place.txt
..............
<sub-directory>- sub3_dir with files
sub3.name.txt
...............
.......
Is there a way to make the code loop over each sub* subfolder, since I know the pattern/name for these main folders will remain like this?
In short the script should create a file under each subdirectory with list of files in it.
Eg -
<sub-directory>- sub1_dir with files
List_sub1_dir.txt
sub1.name.txt
..............
<sub-directory>- sub2_dir with files
List_sub2_dir.txt
...............
My edited code is below. It does not create any list file in the subfolders. Can someone please help me find the error? Thanks a lot!!
use strict;
use warnings;
use File::Find::Rule;

my $directory = './Maindir';
my @subdirs = File::Find::Rule->directory->in( $directory );
foreach my $dir (@subdirs) {
    #print "$dir\n";
    next if ($dir eq "..");
    if (-d $dir)
    {
        my @files = File::Find::Rule->file()->name( '*.*' )->in( $dir );
        foreach my $file (@files)
        {
            open (FH, "$file");
            while (<FH>)
            {
                open FILE, ">>./$dir.txt" or die $!;
                print FILE "$_";
            }
            close(FH);
            close FILE;
            #print "$file\n";
        }
    }
}
I think you want:
my @files = glob './Main_dir/*/*.txt';
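For the full task (writing a `List_<subdir>.txt` into each subdirectory), a glob-based sketch might look like the following; the `sub1_dir`/`sub2_dir` tree is fabricated with File::Temp purely so the example is runnable:

```perl
use strict;
use warnings;
use File::Basename;
use File::Temp qw(tempdir);

# Fabricate Maindir/sub1_dir and Maindir/sub2_dir with one .txt file each
my $maindir = tempdir( CLEANUP => 1 );
for my $sub (qw(sub1_dir sub2_dir)) {
    mkdir "$maindir/$sub" or die $!;
    open my $fh, '>', "$maindir/$sub/$sub.name.txt" or die $!;
    print {$fh} "dummy\n";
    close $fh;
}

# One glob per subdirectory: list its *.txt files in List_<subdir>.txt
for my $dir ( glob "$maindir/sub*_dir" ) {
    my @txt  = glob "$dir/*.txt";   # snapshot taken before the list file exists
    my $list = "$dir/List_" . basename($dir) . ".txt";
    open my $out, '>', $list or die "can't write $list: $!";
    print {$out} basename($_), "\n" for @txt;
    close $out;
}
```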
As someone else suggested, File::Find is a useful module for this type of task. Personally, I prefer to use File::Find::Rule instead. Basically, File::Find::Rule provides an alternate interface to the File::Find module.
Here's an example using File::Find::Rule to find all files (with full paths) within a directory (and within its subdirectories):
use strict;
use warnings;
use feature 'say';
use File::Find::Rule;

my $dir = '/mydir';
my $rule = File::Find::Rule->new();
$rule->file;
my @files = $rule->in($dir);
foreach my $file (@files) { say $file; }

Perl finding a file based on its extension through all subdirectories

I have a segment of code that is working that finds all of the .txt files in a given directory, but I can't get it to look in the subdirectories.
I need my script to do two things
scan through a folder and all of its subdirectories for a text file
print out just the last segments of its path
For example, I have a directory structed
C:\abc\def\ghi\jkl\mnop.txt
I have a script that points to the path C:\abc\def\. It should then go through each of the subfolders and find mnop.txt and any other text file that is in those folders.
It then prints out ghi\jkl\mnop.txt
I am using this, but it really only prints out the file name, and only when the file is directly in that directory.
opendir(Dir, $location) or die "Failure Will Robertson!";
@reports = grep(/\.txt$/, readdir(Dir));
foreach $reports (@reports)
{
    my $files = "$location/$reports";
    open (res, $files) or die "could not open $files";
    print "$files\n";
}
I do believe that this solution is simpler and easier to read. I hope it is helpful!
#!/usr/bin/perl
use File::Find::Rule;

my @files = File::Find::Rule->file()
                            ->name( '*.txt' )
                            ->in( '/path/to/my/folder/' );

for my $file (@files) {
    print "file: $file\n";
}
What about using File::Find?
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;

# for example let location be tmp
my $location = "tmp";

sub find_txt {
    my $F = $File::Find::name;
    if ($F =~ /txt$/) {
        print "$F\n";
    }
}

find({ wanted => \&find_txt, no_chdir => 1 }, $location);
Much easier if you just use File::Find core module:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;

my $Target = shift;
find(\&survey, @ARGV);

sub survey {
    print "Found $File::Find::name\n" if ($_ eq $Target);
}
First argument: pathless name of file to search for. All subsequent arguments are directories to check. File::Find searches recursively, so you only need to name the top of a tree, all subdirectories will automatically be searched as well.
$File::Find::name is the full pathname of the file, so you could subtract your $location from that if you want a relative path.
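That subtraction can be done portably with the core File::Spec module's abs2rel. A sketch (the ghi/jkl/mnop.txt tree below is created in a temporary directory just for the demo):

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Demo tree standing in for C:\abc\def: $location/ghi/jkl/mnop.txt
my $location = tempdir( CLEANUP => 1 );
mkdir "$location/ghi"     or die $!;
mkdir "$location/ghi/jkl" or die $!;
open my $fh, '>', "$location/ghi/jkl/mnop.txt" or die $!;
close $fh;

my @relative;
find(
    {
        no_chdir => 1,
        wanted   => sub {
            return unless -f && /\.txt$/i;
            # abs2rel "subtracts" $location, leaving e.g. ghi/jkl/mnop.txt
            push @relative, File::Spec->abs2rel( $File::Find::name, $location );
        },
    },
    $location
);

print "$_\n" for @relative;
```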

Delete wildcard in perl

I am a new to Perl, but I thought the following should work. I have the following snippet of a larger perl script
@mylist = ("${my_dir}AA_???_???.DAT", "${my_dir}AA???.DAT");
foreach my $list (@mylist) {
    if (-e $list) {
        system ("cp ${list} ${my_other_dir}");
    }
}
The above snippet is not able to find the files matching the wildcard AA_???_???.DAT, but it is able to find the file matching the wildcard AA???.DAT.
I have also tried deleting the files AA??_???.DAT as
unlink(glob(${my_dir}AA_???_???.DAT"))
but the script just hangs up. But it is able to delete files match AA???.DAT using:
unlink(glob("${my_dir}AA???.DAT"))
What could be the reasons?
-e $list checks for the existence of a file with that literal name, so it returns false for both AA_???_???.DAT and AA???.DAT (unless you actually have a file named exactly that). It's not true that one works and the other one doesn't.
It's also not true that unlink(glob(${my_dir}AA_???_???.DAT")) hangs. For starters, it doesn't even compile.
I would use the opendir and readdir built-in functions (modified from the documentation example):
opendir(my $dh, $some_dir) || die "can't opendir $some_dir: $!";
@mylist = grep { /^(AA_..._...\.DAT|AA...\.DAT)$/ && -f "$some_dir/$_" } readdir($dh);
closedir $dh;
Then you can plug in your original code:
foreach my $list (@mylist) {
    if (-e $list) {
        system ("cp $some_dir/${list} ${my_other_dir}/");
    }
}
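A side note on the copy itself: system("cp ...") is Unix-only and breaks on filenames containing spaces; the core File::Copy module avoids both problems. A sketch of the same filter-and-copy loop (the demo directories and the AA123.DAT name are invented for the example):

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

my $some_dir     = tempdir( CLEANUP => 1 );
my $my_other_dir = tempdir( CLEANUP => 1 );

# Demo file matching the AA???.DAT pattern
open my $fh, '>', "$some_dir/AA123.DAT" or die $!;
close $fh;

opendir my $dh, $some_dir or die "can't opendir $some_dir: $!";
my @mylist = grep { /^(?:AA_..._...|AA...)\.DAT$/ && -f "$some_dir/$_" }
             readdir $dh;
closedir $dh;

for my $file (@mylist) {
    # copy() is portable and copes with odd characters in names, unlike `cp`
    copy( "$some_dir/$file", "$my_other_dir/$file" )
        or die "copy failed: $!";
}
```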
For directory recursive file operations I really like to use the File::Find CPAN module.
This will traverse through sub directories passing each file to a specified subroutine
to process that file. As an example:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my @dirs = ('/path/to/dir');
my $my_other_dir = '/path/to/otherdir';

find(\&process_files, @dirs);

sub process_files {
    my ($file) = $_;
    my ($fullpath) = $File::Find::name;
    return if ($file !~ /^AA_..._...\.DAT$/ and
               $file !~ /^AA...\.DAT$/);
    system ("cp $fullpath $my_other_dir/");
}

Recursive Perl detail need help

I think this is a simple problem, but I've been stuck with it for some time now! I need a fresh pair of eyes on this.
The thing is, I have this code in Perl:
#!c:/Perl/bin/perl
use CGI qw/param/;
use URI::Escape;

print "Content-type: text/html\n\n";

my $directory = param ('directory');
$directory = uri_unescape ($directory);
my @contents;
readDir($directory);

foreach (@contents) {
    print "$_\n";
}

#------------------------------------------------------------------------
sub readDir(){
    my $dir = shift;
    opendir(DIR, $dir) or die $!;
    while (my $file = readdir(DIR)) {
        next if ($file =~ m/^\./);
        if (-d $dir.$file)
        {
            #print $dir.$file. " ----- DIR\n";
            readDir($dir.$file);
        }
        push @contents, ($dir . $file);
    }
    closedir(DIR);
}
I've tried to make it recursive. I need to get all the files of all the directories and subdirectories, with the full path, so that I can open the files in the future.
But my output only returns the files in the current directory and the files in the first directory that it finds. If I have 3 folders inside the directory, it only shows the first one.
Ex. of cmd call:
"perl readDir.pl directory=C:/PerlTest/"
Thanks
Avoid wheel reinvention, use CPAN.
use Path::Class::Iterator;

my $it = Path::Class::Iterator->new(
    root          => $dir,
    breadth_first => 0
);

until ($it->done) {
    my $f = $it->next;
    push @contents, $f;
}
Make sure that you don't let people set $dir to something that will let them look somewhere you don't want them to look.
Your problem is the scope of the directory handle DIR. DIR has global scope, so each recursive call to readDir is using the same DIR; so, when you closedir(DIR) and return to the caller, the caller does a readdir on a closed directory handle and everything stops. The solution is to use a lexical directory handle:
sub readDir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die $!;
    while (my $file = readdir($dh)) {
        next if ($file eq '.' || $file eq '..');
        my $path = $dir . '/' . $file;
        if (-d $path) {
            readDir($path);
        }
        push(@contents, $path);
    }
    closedir($dh);
}
Also notice that you would be missing a directory separator if (a) it wasn't at the end of $directory or (b) on every recursive call. AFAIK, slashes will be internally converted to backslashes on Windows but you might want to use a path mangling module from CPAN anyway (I only care about Unix systems so I don't have any recommendations).
I'd also recommend that you pass a reference to #contents to readDir rather than leaving it as a global variable, fewer errors and less confusion that way. And don't use parentheses on sub definitions unless you know exactly what they do and what they're for. Some sanity checking and scrubbing on $directory would be a good idea as well.
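That arrayref suggestion might look like this minimal sketch (the demo tree is invented, built in a temporary directory so the example runs on its own):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub read_dir {
    my ($dir, $contents) = @_;      # $contents is an array reference
    opendir my $dh, $dir or die "$dir: $!";
    while ( my $file = readdir $dh ) {
        next if $file eq '.' || $file eq '..';
        my $path = "$dir/$file";
        push @$contents, $path;
        read_dir( $path, $contents ) if -d $path;
    }
    closedir $dh;
}

# Demo tree: $root/sub/a.txt
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die $!;
open my $fh, '>', "$root/sub/a.txt" or die $!;
close $fh;

my @contents;
read_dir( $root, \@contents );
print "$_\n" for @contents;
```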
There are many modules that are available for recursively listing files in a directory.
My favourite is File::Find::Rule
use strict;
use Data::Dumper;
use File::Find::Rule;

my $dir = shift;    # get directory from command line
my @files = File::Find::Rule->in( $dir );
print Dumper( \@files );
This sends the list of files into an array (which is what your program was doing):
$VAR1 = [
'testdir',
'testdir/file1.txt',
'testdir/file2.txt',
'testdir/subdir',
'testdir/subdir/file3.txt'
];
There are loads of other options, like only listing files with particular names. Or you can set it up as an iterator, which is described in "How can I use File::Find in Perl?"
If you want to stick to modules that come with Perl Core, have a look at File::Find.

How do I read multiple directories and read the contents of subdirectories in Perl?

I have a folder and inside that I have many subfolders. In those subfolders I have many .html files to be read. I have written the following code to do that. It opens the parent folder and also the first subfolder, but it prints only one .html file. It shows the error:
NO SUCH FILE OR DIRECTORY
I don't want to change the entire code. Any modification to the existing code will be good for me.
use FileHandle;

opendir PAR_DIR, "D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
    next if (-d $sub_folders);
    opendir SUB_DIR, "D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
    while (our $file = readdir(SUB_DIR))
    {
        next if ($file !~ m/\.html/i);
        print_file_names($file);
    }
    close(FUNC_MODEL1);
}
close(FUNC_MODEL);

sub print_file_names()
{
    my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;

my $finder = File::Find::Rule->new()
                             ->name(qr/\.html?$/i)
                             ->start("D:/PERL/perl_programes/parent_directory");

while ( my $file = $finder->match() ) {
    print "$file\n";
}
I mean isn't that sexy?!
A user commented that you may wish to match only entries at depth 2.
use File::Find::Rule;

my $finder = File::Find::Rule->new()
                             ->name(qr/\.html?$/i)
                             ->mindepth(2)
                             ->maxdepth(2)
                             ->start("D:/PERL/perl_programes/parent_directory");

while ( my $file = $finder->match() ) {
    print "$file\n";
}
That will apply the restriction.
You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names()
{
    my $file = shift;
    ...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ... which means that it'll skip the inner loop for directories, which appears to be the complete opposite of what you require. The only reason it's working at all is because you're testing $file which is only the filename relative to the path, and not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;

my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);

opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^\.\.?$/); # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path); # skip anything that isn't a directory

    opendir($sub_dir, $path);
    while (my $file = readdir($sub_dir)) {
        next unless $file =~ /\.html?$/i;
        my $full_path = $path . '/' . $file;
        print_file_names($full_path);
    }
    closedir($sub_dir);
}
closedir($par_dir);

sub print_file_names
{
    my $file = shift;
    my $fh1 = FileHandle->new($file)
        or die "ERROR: $!";
    print("$file\n");
}
Please start putting:
use strict;
use warnings;
at the top of all your scripts, it will help you avoid problems like this and make your code much more readable.
You can read more about it here: Perlmonks
You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $top = $ENV{TEMP};

find( { wanted => \&wanted, no_chdir => 1 }, $top );

sub wanted {
    return unless -f and /\.html$/i;
    print $_, "\n";
}
__END__
Have you considered using File::Find?
Here's one method which does not require File::Find:
First, open the root directory, and store all the subfolders' names in an array by using readdir.
Then use a foreach loop. For each subfolder, open the new directory by joining the root directory and the folder's name, and again use readdir to store the file names in an array.
The last step is to write the code for processing the files inside this foreach loop.
Special thanks to my teacher who gave me this idea :) It really worked well!
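The steps described above can be sketched as follows; the `sub1/page.html` tree is created with File::Temp just so the example is runnable, and the real parent directory path would go in its place:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo tree standing in for the parent directory: $root/sub1/page.html
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub1" or die $!;
open my $fh, '>', "$root/sub1/page.html" or die $!;
close $fh;

# Step 1: read the root directory, keeping only the subfolders
opendir my $rdh, $root or die "$root: $!";
my @subfolders = grep { !/^\.\.?$/ && -d "$root/$_" } readdir $rdh;
closedir $rdh;

my @html;
# Step 2: for each subfolder, open root/subfolder and read its files
foreach my $sub (@subfolders) {
    opendir my $sdh, "$root/$sub" or die "$root/$sub: $!";
    my @files = grep { /\.html?$/i } readdir $sdh;
    closedir $sdh;
    # Step 3: process the files here (this sketch just collects full paths)
    push @html, map { "$root/$sub/$_" } @files;
}

print "$_\n" for @html;
```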