Find::File to search a directory of a list of files - perl

I'm writing a Perl script and I'm new to Perl -- I have a file that contains a list of files. For each item on the list I want to search a given directory and its sub-directories to find the file return the full path. I've been unsuccessful thus far trying to use File::Find. Here's what I got:
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
find(\&wanted, $directory);
sub wanted {
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
return #file_list;
}

I find File::Find::Rule a tad easier and more elegant to use.
use File::Find::Rule;
my $path = '/some/path';
# Find all directories under $path
my #paths = File::Find::Rule->directory->in( $path );
# Find all files in $path
my #files = File::Find::Rule->file->in( $path );
The arrays contain full paths to the objects File::Find::Rule finds.

File::Find is used to traverse a directory structure in the filesystem. Instead of doing what you're trying to do, namely, have the wanted subroutine read in the file, you should read in the file as follows:
use strict;
use warnings;
use vars qw/#file_list/;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
open FILE, "$input_file" or die "$!\n";
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
# do what you need to here with the #file_list array

Okay, well re-read the doc and I misunderstood the wanted subroutine. The wanted is a subroutine that is called on every file and directory that is found. So here's my code to take that into account
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my #file_list;
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( #file_list, $file );
}
close (FILE);
find(\&wanted, $directory);
sub wanted {
if ( $_ ~~ #file_list ) {
print "$File::Find::name\n";
}
return;
}

Related

Traversing Through Directories

I am trying to traverse through directories to change certain file extensions in those directories.
I made it to where I can go through a directory that is given through command-line, but I cannot make it traverse through that directories' subdirectories.
For example: If I want to change the file extensions in the directory Test then if Test has a subdirectory I want to be able to go through that directory and change the file extensions of those files too.
I came up with this. This works for one directory. It correctly changes the file extensions of the files in one specific directory.
#!/usr/local/bin/perl
use strict;
use warnings;
my #argv;
my $dir = $ARGV[0];
my #files = glob "${dir}/*pl";
foreach (#files) {
next if -d;
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
I then heard of File::Find::Rule, so I tried to use that to traverse through the directories.
I came up with this:
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Find;
use File::Find::Rule;
my #argv;
my $dir = $ARGV[0];
my #subdirs = File::find::Rule->directory->in( $dir );
sub fileRecurs{
my #files = glob "${dir}/*pl";
foreach (#files) {
next if -d;
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
}
This does not work/ will not work because I am not familiar enough with File::Find::Rule
Is there a better way to traverse through the directories to change the file extensions?
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Find;
my #argv;
my $dir = $ARGV[0];
find(\&dirRecurs, $dir);
sub dirRecurs{
if (-f)
{
(my $txt = $_) =~ s/pl$/txt/;
rename($_, $txt);
}
}
I figured it out with the help of the tutorial #David sent me! Thank you!

How to read directories and sub-directories without knowing the directory name in perl?

Hi i want to read directories and sub-directories without knowing the directory name. Current directory is "D:/Temp". 'Temp' has sub-directories like 'A1','A2'. Again 'A1' has sub-directories like 'B1','B2'. Again 'B1' has sub-directories like 'C1','C2'. Perl script doesn't know these directories. So it has to first find directory and then read one file at a time in dir 'C1' once all files are read in 'C1' it should changes to dir 'C2'. I tried with below code here i don't want to read all files in array(#files) but need one file at time. In array #dir elements should be as fallows.
$dir[0] = "D:/Temp/A1/B1/C1"
$dir[1] = "D:/Temp/A1/B1/C2"
$dir[2] = "D:/Temp/A1/B2/C1"
Below is the code i tried.
use strict;
use File::Find::Rule;
use Data::Dumper;
my $dir = "D:/Temp";
my #dir = File::Find::Rule->directory->in($dir);
print Dumper (\#dir);
my $readDir = $dir[3];
opendir ( DIR, $readDir ) || die "Error in opening dir $readDir\n";
my #files = grep { !/^\.\.?$/ } readdir DIR;
print STDERR "files: #files \n\n";
for my $fil (#files) {
open (F, "<$fil");
read (F, my $data);
close (F);
print "$data";
}
use File::Find;
use strict;
use warnings;
my #dirs;
my %has_children;
find(sub {
if (-d) {
push #dirs, $File::Find::name;
$has_children{$File::Find::dir} = 1;
}
}, 'D:/Temp');
my #ends = grep {! $has_children{$_}} #dirs;
print "$_\n" for (#ends);
Your Goal: Find the absolute paths to those directories that do not themselves have child directories.
I'll call those directories of interest terminal directories. Here's the prototype for a function that I believe provides the convenience you are looking for. The function returns its result as a list.
my #list = find_terminal_directories($full_or_partial_path);
And here's an implementation of find_terminal_directories(). Note that this implementation does not require the use of any global variables. Also note the use of a private helper function that is called recursively.
On my Windows 7 system, for the input directory C:/Perl/lib/Test, I get the output:
== List of Terminal Folders ==
c:/Perl/lib/Test/Builder/IO
c:/Perl/lib/Test/Builder/Tester
c:/Perl/lib/Test/Perl/Critic
== List of Files in each Terminal Folder: ==
c:/Perl/lib/Test/Builder/IO/Scalar.pm
c:/Perl/lib/Test/Builder/Tester/Color.pm
c:/Perl/lib/Test/Perl/Critic/Policy.pm
Implementation
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);
my #dir_list = find_terminal_directories("C:/Perl/lib/Test");
print "== List of Terminal Directories ==\n";
print join("\n", #dir_list), "\n";
print "\n== List of Files in each Terminal Directory: ==\n";
for my $dir (#dir_list) {
for my $file (<"$dir/*">) {
print "$file\n";
open my $fh, '<', $file or die $!;
my $data = <$fh>; # slurp entire file contents into $data
close $fh;
# Now, do something with $data !
}
}
sub find_terminal_directories {
my $rootdir = shift;
my #wanted;
my $cwd = getcwd();
chdir $rootdir;
find_terminal_directories_helper(".", \#wanted);
chdir $cwd;
return #wanted;
}
sub find_terminal_directories_helper {
my ($dir, $wanted) = #_;
return if ! -d $dir;
opendir(my $dh, $dir) or die "open directory error!";
my $count = 0;
foreach my $child (readdir($dh)) {
my $abs_child = abs_path($child);
next if (! -d $child || $child eq "." || $child eq "..");
++$count;
chdir $child;
find_terminal_directories_helper($abs_child, $wanted); # recursion!
chdir "..";
}
push #$wanted, abs_path($dir) if ! $count; # no sub-directories found!
}
Perhaps the following will be helpful:
use strict;
use warnings;
use File::Find::Rule;
my $dir = "D:/Temp";
local $/;
my #dirs =
sort File::Find::Rule->exec( sub { File::Find::Rule->directory->in($_) == 1 }
)->directory->in($dir);
for my $dir (#dirs) {
for my $file (<"$dir/*">) {
open my $fh, '<', $file or die $!;
my $data = <$fh>;
close $fh;
print $data;
}
}
local $/; lets us slurp the file's contents into a variable. Delete it if you only want to read the first line.
The sub in the exec() is used to pass only those dirs which don't contain a dir
sort is used to arrange those dirs in your wanted order
A file glob <"$dir/*"> is used to get the files in each dir
Edit: Have modified the code to find only 'terminal directories.' Thanks to DavidRR for this spec clarification.
I would use File::Find
Sample script:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $dir = "/home/chris";
find(\&wanted, $dir);
sub wanted {
print "dir: $File::Find::dir\n";
print "file in dir: $_\n";
print "complete path to file: $File::Find::name\n";
}
OUTPUTS:
$ test.pl
dir: /home/chris/test_dir
file in dir: test_dir2
complete path to file: /home/chris/test_dir/test_dir2
dir: /home/chris/test_dir/test_dir2
file in dir: foo.txt
complete path to file: /home/chris/test_dir/test_dir2/foo.txt
...
Using backticks, write subdirs and files to a file called filelist:
`ls -R $dir > filelist`

Perl finding a file based off it's extension through all subdirectories

I have a segment of code that is working that finds all of the .txt files in a given directory, but I can't get it to look in the subdirectories.
I need my script to do two things
scan through a folder and all of its subdirectories for a text file
print out just the last segments of its path
For example, I have a directory structed
C:\abc\def\ghi\jkl\mnop.txt
I script that points to the path C:\abc\def\. It then goes through each of the subfolders and finds mnop.txt and any other text file that is in that folder.
It then prints out ghi\jkl\mnop.txt
I am using this, but it really only prints out the file name and if the file is currently in that directory.
opendir(Dir, $location) or die "Failure Will Robertson!";
#reports = grep(/\.txt$/,readdir(Dir));
foreach $reports(#reports)
{
my $files = "$location/$reports";
open (res,$files) or die "could not open $files";
print "$files\n";
}
I do believe that this solution is more simple and easier to read. I hope it is helpful !
#!/usr/bin/perl
use File::Find::Rule;
my #files = File::Find::Rule->file()
->name( '*.txt' )
->in( '/path/to/my/folder/' );
for my $file (#files) {
print "file: $file\n";
}
What about using File::Find?
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;
# for example let location be tmp
my $location="tmp";
sub find_txt {
my $F = $File::Find::name;
if ($F =~ /txt$/ ) {
print "$F\n";
}
}
find({ wanted => \&find_txt, no_chdir=>1}, $location);
Much easier if you just use File::Find core module:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;
my $Target = shift;
find(\&survey, #ARGV);
sub survey {
print "Found $File::Find::name\n" if ($_ eq $Target)
}
First argument: pathless name of file to search for. All subsequent arguments are directories to check. File::Find searches recursively, so you only need to name the top of a tree, all subdirectories will automatically be searched as well.
$File::Find::name is the full pathname of the file, so you could subtract your $location from that if you want a relative path.

change the directory and grab the xml file to parse certain data in perl

I am trying to parse specific XML file which is located in sub directories of one directory. For some reason i am getting error saying file does not exists. if the file does not exist it should move on to next sub directory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my #xmlsearch = map { chomp; $_ } `ls`;
foreach my $directory (#xmlsearch) {
print "$directory \n";
chdir($directory) or die "Couldn't change to [$directory]: $!";
my #findResults = `find -name education.xml`;
foreach my $educationresults (#findResults){
print $educationresults;
my $parser = new XML::Simple;
my $data = $parser->XMLin($educationresults);
print Dumper($data);
chdir('..');
}
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml
Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
use autodie;
use File::Find;
use XML::Simple;
use Data::Dumper;
sub findxml {
my #found;
opendir(DIR, '.');
my #where = grep { -d && m#^[^.]+$# } readdir(DIR);
closedir(DIR);
File::Find::find({wanted => sub {
push #found, $File::Find::name if m#^education\.xml$#s && -f _;
} }, #where);
return #found;
}
foreach my $xml (findxml()){
say $xml;
print Dumper XMLin($xml);
}
Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my #array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my #array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my #array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files in dir 1 level up
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;
my #list;
find( sub { push #list, $File::Find::name if /^education\.xml$/i; }, ".");
for (#list) {
# do stuff
# #list contains full path names of education.xml files found in subdirs
# e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");

How can I add a prefix to all filenames under a directory?

I am trying to prefix a string (reference_) to the names of all the *.bmp files in all the directories as well sub-directories. The first time we run the silk script, it will create directories as well subdirectories, and under each subdirectory it will store each mobile application's sceenshot with .bmp extension.
When I run the automated silkscript for second time it will again create the *.bmp files in all the subdirectories. Before running the script for second time I want to prefix all the *.bmp with a string reference_.
For example first_screen.bmp to reference_first_screen.bmp,
I have the directory structure as below:
C:\Image_Repository\BG_Images\second
...
C:\Image_Repository\BG_Images\sixth
having first_screen.bmp and first_screen.bmp files etc...
Could any one help me out?
How can I prefix all the image file names with reference_ string?
When I run the script for second time, the Perl script in silk will take both the images from the sub-directory and compare them both pixel by pixel. I am trying with code below.
Could you please guide me how can I proceed to complete this task.
#!/usr/bin/perl -w
&one;
&two;
sub one {
use Cwd;
my $dir ="C:\\Image_Repository";
#print "$dir\n";
opendir(DIR,"+<$dir") or "die $!\n";
my #dir = readdir DIR;
#$lines=#dir;
delete $dir[-1];
print "$lines\n";
foreach my $item (#dir)
{
print "$item\n";
}
closedir DIR;
}
sub two {
use Cwd;
my $dir1 ="C:\\Image_Repository\\BG_Images";
#print "$dir1\n";
opendir(D,"+<$dir1") or "die $!\n";
my #dire = readdir D;
#$lines=#dire;
delete $dire[-1];
#print "$lines\n";
foreach my $item (#dire)
{
#print "$item\n";
$dir2="C:\\Image_Repository\\BG_Images\\$item";
print $dir2;
opendir(D1,"+<$dir2") or die " $!\n";
my #files=readdir D1;
#print "#files\n";
foreach $one (#files)
{
$one="reference_".$one;
print "$one\n";
#rename $one,Reference_.$one;
}
}
closedir DIR;
}
I tried open call with '+<' mode but I am getting compilation error for the read and write mode.
When I am running this code, it shows the files in BG_images folder with prefixed string but actually it's not updating the files in the sub-directories.
You don't open a directory for writing. Just use opendir without the mode parts of the string:
opendir my($dir), $dirname or die "Could not open $dirname: $!";
However, you don't need that. You can use File::Find to make the list of files you need.
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use File::Find;
use File::Find::Closures qw(find_regular_files);
use File::Spec::Functions qw(catfile);
my( $wanted, $reporter ) = find_regular_files;
find( $wanted, $ARGV[0] );
my $prefix = 'recursive_';
foreach my $file ( $reporter->() )
{
my $basename = basename( $file );
if( index( $basename, $prefix ) == 0 )
{
print STDERR "$file already has '$prefix'! Skipping.\n";
next;
}
my $new_path = catfile(
dirname( $file ),
"recursive_$basename"
);
unless( rename $file, $new_path )
{
print STDERR "Could not rename $file: $!\n";
next;
}
print $file, "\n";
}
You should probably check out the File::Find module for this - it will make recursing up and down the directory tree simpler.
You should probably be scanning the file names and modifying those that don't start with reference_ so that they do. That may require splitting the file name up into a directory name and a file name and then prefixing the file name part with reference_. That's done with the File::Basename module.
At some point, you need to decide what happens when you run the script the third time. Do the files that already start with reference_ get overwritten, or do the unprefixed files get overwritten, or what?
The reason the files are not being renamed is that the rename operation is commented out. Remember to add use strict; at the top of your script (as well as the -w option which you did use).
If you get a list of files in an array #files (and the names are base names, so you don't have to fiddle with File::Basename), then the loop might look like:
foreach my $one (#files)
{
my $new = "reference_$one";
print "$one --> $new\n";
rename $one, $new or die "failed to rename $one to $new ($!)";
}
With the aid of find utility from coreutils for Windows:
$ find -iname "*.bmp" | perl -wlne"chomp; ($prefix, $basename) = split(m~\/([^/]+)$~, $_); rename($_, join(q(/), ($prefix, q(reference_).$basename))) or warn qq(failed to rename '$_': $!)"