Perl check if filename matching a pattern exists recursively - perl

I'm looping through folders in a directory and need to check whether a file matching a pattern exists in each one. I've used glob, but it seems to work for the first folder only. I get file not found for the second folder even though I know the file is there.
Here is my code:
my @dirs = grep { -d } glob '/data/test_all_runs/*';
for my $dir ( @dirs ) {
print "the directory is $dir\n";
my $run_folder = (split '/', $dir)[3];
print "the folder is $run_folder\n";
my $matrix_excel = $dir."/*bcmatrix.xls";
my $summary_excel = $dir."/*bc_summary.xls";
unless (-e $summary_excel) {
if (glob($summary_excel)) {
# At least one file matches "*.file"
}
else
{
print "File Doesn't Exist!";
print STDERR "|=============================================|\n";
print STDERR "| |\n";
print STDERR "| Can't find Summary .xls File!!! |\n";
print STDERR "| |\n";
print STDERR "| Upload the file and rerun the program. |\n";
print STDERR "| |\n";
print STDERR "|=============================================|\n";
die;
}
}
Is there another method to check if *bcmatrix.xls file exists in each folder of /data/test_all_runs/*?

This may be a bit overkill, but it seems to do what you need. I use File::Find::Rule to fetch all of the directories in the directory structure, then use glob to get the list of file names that match the pattern:
Given this directory structure:
orig
|-a
|-a.txt
|-b
|-ba.txt
|-c
With this code:
use warnings;
use strict;
use File::Basename;
use File::Find::Rule;
my $dir = 'orig';
my $file = 'a.txt';
my @dirs = File::Find::Rule->directory
->in($dir);
for (@dirs){
next if m{(?:^|/)\.\.?$}; # skip only "." and "..", not every name containing a dot
if (my @files = glob "$_/*$file"){
for my $path (@files){
my $name = basename $path;
print "file $name exists in $_\n";
}
}
else {
print "file not found in directory $_\n";
}
}
I get the following output:
file not found in directory orig
file ba.txt exists in orig/b
file not found in directory orig/c
file a.txt exists in orig/a

I suggest that you use something like this. It will build a hash of arrays that lists all the files in each subdirectory of /data/test_all_runs that look like either *bcmatrix.xls or *bc_summary.xls
You should be able to do what you want with the result
use strict;
use warnings 'all';
use File::Spec::Functions 'splitdir';
my %files;
for my $path ( glob '/data/test_all_runs/*/*{bcmatrix,bc_summary}.xls' ) {
my ($subdir, $file) = (splitdir $path)[-2, -1];
push @{ $files{$subdir} }, $file;
}
use Data::Dumper;
print Dumper \%files;
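A likely reason the loop in the question works only for the first folder: in scalar (boolean) context, glob acts as an iterator and keeps working through its current list until that list is exhausted, so a later call with a different pattern may just report the end of the previous list. Calling glob in list context avoids this. A minimal sketch, in which the helper name dirs_missing_file and its arguments are made up for illustration:

```perl
use strict;
use warnings;

# Return the subdirectories of $root that contain no file whose name
# ends in $suffix. Assumes paths without whitespace, which glob would
# treat as a separator between patterns.
sub dirs_missing_file {
    my ($root, $suffix) = @_;
    my @missing;
    for my $dir ( grep { -d } glob "$root/*" ) {
        # List context: glob returns every match at once, so no iterator
        # state is carried over from the previous directory.
        my @matches = glob "$dir/*$suffix";
        push @missing, $dir unless @matches;
    }
    return @missing;
}
```

With the layout from the question, dirs_missing_file('/data/test_all_runs', 'bc_summary.xls') should return the run folders that still lack a summary file.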

Related

Print files and subdirectories of given directory

I am trying to get all files and directories from a given directory but I can't specify what is the type (file/ directory). Nothing is being printed. What I am doing wrong and how to solve it. Here is the code:
sub DoSearch {
my $currNode = shift;
my $currentDir = opendir (my $dirHandler, $currNode->rootDirectory) or die $!;
while (my $node = readdir($dirHandler)) {
if ($node eq '.' or $node eq '..') {
next;
}
print "File: " . $node . "\n" if -f $node;
print "Directory " . $node . "\n" if -d $node;
}
closedir($dirHandler);
}
readdir returns only the node name without any path information. The file test operators look in the current working directory if no path is specified, and because the current directory isn't $currNode->rootDirectory, the nodes won't be found.
I suggest you use rel2abs from the File::Spec::Functions core module to combine the node name with the path. You can use string concatenation, but the library function takes care of corner cases like whether the directory ends with a slash.
It's also worth pointing out that Perl identifiers are most often in snake_case, and people familiar with the language would thank you for not using capital letters. They should especially be avoided for the first character of an identifier, as names like that are reserved for globals like package names
I think your subroutine should look like this
use File::Spec::Functions 'rel2abs';
sub do_search {
my ($curr_node) = @_;
my $dir = $curr_node->rootDirectory;
opendir my $dh, $dir or die qq{Unable to open directory "$dir": $!};
while ( my $node = readdir $dh ) {
next if $node eq '.' or $node eq '..';
my $fullname = rel2abs($node, $dir);
print "File: $node\n" if -f $fullname;
print "Directory $node\n" if -d $fullname;
}
}
An alternative method is to set the current working directory to the directory being read. That way there is no need to manipulate file paths, but you would need to save and restore the original working directory before and after changing it
The Cwd core module provides getcwd and your code would look like this
use Cwd 'getcwd';
sub do_search {
my ($curr_node) = @_;
my $cwd = getcwd;
chdir $curr_node->rootDirectory or die $!;
opendir my $dh, '.' or die $!;
while ( my $node = readdir $dh ) {
next if $node eq '.' or $node eq '..';
print "File: $node\n" if -f $node;
print "Directory $node\n" if -d $node;
}
chdir $cwd or die $!;
}
Use the core File::Find module to get all files and subdirectories recursively.
use File::Find;
find(\&getFile, $dir);
my @fileList;
sub getFile{
print $File::Find::name."\n";
# Below lines will print only file name.
#if ($File::Find::name =~ /.*\/(.*)/ && $1 =~ /\./){
#push @fileList, $File::Find::name."\n";
}
Already answered, but sometimes it is handy not to care about the implementation details, and some CPAN modules hide those details for you.
One of them is the wonderful Path::Tiny module.
Your code could look like this:
use 5.014; #strict + feature 'say' + ...
use warnings;
use Path::Tiny;
do_search($_) for @ARGV;
sub do_search {
my $curr_node = path(shift);
for my $node ($curr_node->children) {
say "Directory : $node" if -d $node;
say "Plain File : $node" if -f $node;
}
}
The children method excludes the . and the .. automatically.
You also need to understand that the -f test is true only for plain files. It follows symlinks, so a symlink that points to a plain file passes, but FIFOs, sockets, device files and so on do not. Such "files" can usually be opened and read like plain files, so sometimes it is handy to use the -e && ! -d test (i.e. exists, but is not a directory) instead of -f.
Path::Tiny has some methods for this, e.g. you could write
for my $node ($curr_node->children) {
print "Directory : $node\n" if $node->is_dir;
print "File : $node\n" if $node->is_file;
}
The is_file method is usually DWIM, i.e. it does the -e && ! -d test.
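The difference between these file tests can be checked with a small helper. This is only a sketch: node_type is a hypothetical name, and it relies on -l examining the link itself while -d, -f and -e follow symlinks.

```perl
use strict;
use warnings;

# Classify a path. -l uses lstat, so it sees the link itself;
# -d, -f and -e use stat and therefore follow symlinks.
sub node_type {
    my ($path) = @_;
    return 'symlink'    if -l $path;
    return 'directory'  if -d $path;
    return 'plain file' if -f $path;
    return 'other'      if -e $path;    # FIFO, socket, device, ...
    return 'missing';
}
```

Because the -l check comes first, a symlink is reported as such whatever it points to; drop that line if links should be classified by their target.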
Using the Path::Tiny you could also easily extend your function to walk the whole tree using the iterator method:
use 5.014;
use warnings;
use Path::Tiny;
do_search($_) for @ARGV;
sub do_search {
#maybe you need some error-checking here for the existence of the argument or like...
my $iterator = path(shift)->iterator({recurse => 1});
while( my $node = $iterator->() ) {
say "Directory : ", $node->absolute if $node->is_dir;
say "File : ", $node->absolute if $node->is_file;
}
}
The above prints the type of every file and directory, recursing down from the given argument.
And so on... Path::Tiny is really worth installing.

Perl to find the extension of file

I have a program that takes directory name as input from user and searches all files inside the directory and prints the contents of file. Is there any way so that I can read the extension of file and read the contents of file that are of specified extension? For example, it should read contents of file that is in ".txt" format.
My code is
use strict;
use warnings;
use File::Basename;
#usr/bin/perl
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while ( my $file = readdir(DIR) ) {
next if ( $file =~ m/^\./ );
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open( my $fh, '<', $filepath ) or die "unable to open the $file $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
}
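To get just the ".txt" files, you can use a file test operator (-f: regular file) and a regex. Note that readdir needs a directory handle rather than the directory name, and that it returns bare names, so the file test needs the directory prepended.
opendir my $dh, $dir or die "cannot open directory $dir: $!";
my @files = grep { -f "$dir/$_" && /\.txt$/ } readdir $dh;
Otherwise, you can look for just text files, using Perl's -T (text file test operator):
my @files = grep { -T "$dir/$_" } readdir $dh;
Or you can even try this, which returns full paths:
my @files = grep {-f} glob("$dir/*.txt");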
You're pretty close here. You have a main loop that looks like this:
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
# do stuff
}
See where you're skipping loop iterations if the filename starts with a dot. That's an excellent place to put any other skip requirements that you have - like skipping files that don't end with '.txt'.
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
next unless $file =~ /\.txt$/i; # skip non-text files
# do stuff
}
In the same way as your original test checked for the start of the string (^) followed by a literal dot (\.), we're now searching for a dot (\.) followed by txt and the end of the string ($). Note that I've also added the /i option to the match operator to make the match case-insensitive - so that we match ".TXT" as well as ".txt".
It's worth noting that the extension of a file is a terrible way to work out what the file contains.
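If you would rather extract the extension than match it with a regex, File::Basename (already loaded in the question's script) can split it off. A small sketch; extension_of is a made-up helper name:

```perl
use strict;
use warnings;
use File::Basename 'fileparse';

# Return the lower-cased extension of a path, without the dot,
# or an empty string if the name has no dot in it.
sub extension_of {
    my ($path) = @_;
    # The third value returned is whatever the suffix pattern matched
    # at the end of the file name, e.g. ".txt".
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.]*/);
    $suffix =~ s/^\.//;
    return lc $suffix;
}
```

Inside the readdir loop this could be used as next unless extension_of($file) eq 'txt';. Note that only the last component counts, so "archive.tar.gz" yields "gz".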
Try this. The code below does what you expect.
use warnings;
use strict;
print "Enter the directory name: ";
chomp(my $dir=<>);
print "Enter the file extension type: "; # type only the extension, e.g. txt or rtf
chomp(my $ext=<>);
opendir(my $dh, $dir) or die "cannot open directory $dir: $!";
my @files = grep { /\.\Q$ext\E$/ } readdir($dh);
foreach my $ech (@files){
open(my $fh, '<', "$dir/$ech") or die "cannot open $dir/$ech: $!";
print <$fh>;
}
I store all the file names from the directory in an array, use grep to keep only the names with the requested extension, and then open and print each file inside the foreach loop.

How to read directories and sub-directories without knowing the directory name in perl?

Hi, I want to read directories and sub-directories without knowing the directory names. The current directory is "D:/Temp". 'Temp' has sub-directories like 'A1', 'A2'. 'A1' in turn has sub-directories like 'B1', 'B2', and 'B1' has sub-directories like 'C1', 'C2'. The Perl script doesn't know these directories, so it has to find them first and then read one file at a time in dir 'C1'; once all files in 'C1' are read, it should change to dir 'C2'. I tried the code below, but I don't want to read all files into an array (@files); I need one file at a time. The elements of the array @dir should be as follows.
$dir[0] = "D:/Temp/A1/B1/C1"
$dir[1] = "D:/Temp/A1/B1/C2"
$dir[2] = "D:/Temp/A1/B2/C1"
Below is the code i tried.
use strict;
use File::Find::Rule;
use Data::Dumper;
my $dir = "D:/Temp";
my @dir = File::Find::Rule->directory->in($dir);
print Dumper (\@dir);
my $readDir = $dir[3];
opendir ( DIR, $readDir ) || die "Error in opening dir $readDir\n";
my @files = grep { !/^\.\.?$/ } readdir DIR;
print STDERR "files: @files \n\n";
for my $fil (@files) {
open (F, "<$fil");
read (F, my $data);
close (F);
print "$data";
}
use File::Find;
use strict;
use warnings;
my @dirs;
my %has_children;
find(sub {
if (-d) {
push @dirs, $File::Find::name;
$has_children{$File::Find::dir} = 1;
}
}, 'D:/Temp');
my @ends = grep {! $has_children{$_}} @dirs;
print "$_\n" for (@ends);
Your Goal: Find the absolute paths to those directories that do not themselves have child directories.
I'll call those directories of interest terminal directories. Here's the prototype for a function that I believe provides the convenience you are looking for. The function returns its result as a list.
my @list = find_terminal_directories($full_or_partial_path);
And here's an implementation of find_terminal_directories(). Note that this implementation does not require the use of any global variables. Also note the use of a private helper function that is called recursively.
On my Windows 7 system, for the input directory C:/Perl/lib/Test, I get the output:
== List of Terminal Folders ==
c:/Perl/lib/Test/Builder/IO
c:/Perl/lib/Test/Builder/Tester
c:/Perl/lib/Test/Perl/Critic
== List of Files in each Terminal Folder: ==
c:/Perl/lib/Test/Builder/IO/Scalar.pm
c:/Perl/lib/Test/Builder/Tester/Color.pm
c:/Perl/lib/Test/Perl/Critic/Policy.pm
Implementation
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);
my @dir_list = find_terminal_directories("C:/Perl/lib/Test");
print "== List of Terminal Directories ==\n";
print join("\n", @dir_list), "\n";
print "\n== List of Files in each Terminal Directory: ==\n";
for my $dir (@dir_list) {
for my $file (<"$dir/*">) {
print "$file\n";
open my $fh, '<', $file or die $!;
my $data = do { local $/; <$fh> }; # slurp entire file contents into $data
close $fh;
# Now, do something with $data !
}
}
sub find_terminal_directories {
my $rootdir = shift;
my @wanted;
my $cwd = getcwd();
chdir $rootdir;
find_terminal_directories_helper(".", \@wanted);
chdir $cwd;
return @wanted;
}
sub find_terminal_directories_helper {
my ($dir, $wanted) = @_;
return if ! -d $dir;
opendir(my $dh, $dir) or die "open directory error!";
my $count = 0;
foreach my $child (readdir($dh)) {
my $abs_child = abs_path($child);
next if (! -d $child || $child eq "." || $child eq "..");
++$count;
chdir $child;
find_terminal_directories_helper($abs_child, $wanted); # recursion!
chdir "..";
}
push @$wanted, abs_path($dir) if ! $count; # no sub-directories found!
}
Perhaps the following will be helpful:
use strict;
use warnings;
use File::Find::Rule;
my $dir = "D:/Temp";
local $/;
my @dirs =
sort File::Find::Rule->exec( sub { File::Find::Rule->directory->in($_) == 1 }
)->directory->in($dir);
for my $dir (@dirs) {
for my $file (<"$dir/*">) {
open my $fh, '<', $file or die $!;
my $data = <$fh>;
close $fh;
print $data;
}
}
local $/; lets us slurp the file's contents into a variable. Delete it if you only want to read the first line.
The sub in the exec() is used to pass only those dirs which don't contain a dir
sort is used to arrange those dirs in your wanted order
A file glob <"$dir/*"> is used to get the files in each dir
Edit: Have modified the code to find only 'terminal directories.' Thanks to DavidRR for this spec clarification.
I would use File::Find
Sample script:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $dir = "/home/chris";
find(\&wanted, $dir);
sub wanted {
print "dir: $File::Find::dir\n";
print "file in dir: $_\n";
print "complete path to file: $File::Find::name\n";
}
OUTPUTS:
$ test.pl
dir: /home/chris/test_dir
file in dir: test_dir2
complete path to file: /home/chris/test_dir/test_dir2
dir: /home/chris/test_dir/test_dir2
file in dir: foo.txt
complete path to file: /home/chris/test_dir/test_dir2/foo.txt
...
Using backticks, write subdirs and files to a file called filelist:
`ls -R $dir > filelist`

Perl finding a file based off its extension through all subdirectories

I have a segment of code that is working that finds all of the .txt files in a given directory, but I can't get it to look in the subdirectories.
I need my script to do two things
scan through a folder and all of its subdirectories for a text file
print out just the last segments of its path
For example, I have a directory structured like this:
C:\abc\def\ghi\jkl\mnop.txt
I have a script that points to the path C:\abc\def\. It then goes through each of the subfolders and finds mnop.txt and any other text file that is in that folder.
It then prints out ghi\jkl\mnop.txt
I am using this, but it only prints out the file name, and only if the file is directly in that directory.
opendir(Dir, $location) or die "Failure Will Robertson!";
@reports = grep(/\.txt$/,readdir(Dir));
foreach $reports(@reports)
{
my $files = "$location/$reports";
open (res,$files) or die "could not open $files";
print "$files\n";
}
I believe this solution is simpler and easier to read. I hope it is helpful!
#!/usr/bin/perl
use File::Find::Rule;
my @files = File::Find::Rule->file()
->name( '*.txt' )
->in( '/path/to/my/folder/' );
for my $file (@files) {
print "file: $file\n";
}
What about using File::Find?
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;
# for example let location be tmp
my $location="tmp";
sub find_txt {
my $F = $File::Find::name;
if ($F =~ /\.txt$/ ) {
print "$F\n";
}
}
find({ wanted => \&find_txt, no_chdir=>1}, $location);
Much easier if you just use File::Find core module:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;
my $Target = shift;
find(\&survey, @ARGV);
sub survey {
print "Found $File::Find::name\n" if ($_ eq $Target)
}
First argument: pathless name of file to search for. All subsequent arguments are directories to check. File::Find searches recursively, so you only need to name the top of a tree, all subdirectories will automatically be searched as well.
$File::Find::name is the full pathname of the file, so you could subtract your $location from that if you want a relative path.
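Rather than trimming the prefix by hand, the core File::Spec module can compute the relative path. A sketch under the assumption that you want every .txt file below a starting directory; relative_txt_files is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Collect every *.txt file below $location and return the paths
# relative to $location, sorted.
sub relative_txt_files {
    my ($location) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # $_ holds the full path, same as $File::Find::name
            wanted   => sub {
                return unless -f $_ && /\.txt\z/i;
                push @found, File::Spec->abs2rel($File::Find::name, $location);
            },
        },
        $location,
    );
    my @sorted = sort @found;
    return @sorted;
}
```

For the layout in the question, calling it with C:\abc\def should give ghi\jkl\mnop.txt, with the separator chosen by File::Spec for the platform.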

How can I add a prefix to all filenames under a directory?

I am trying to prefix a string (reference_) to the names of all the *.bmp files in all the directories as well as sub-directories. The first time we run the silk script, it creates directories and subdirectories, and under each subdirectory it stores each mobile application's screenshot with a .bmp extension.
When I run the automated silk script a second time, it will again create the *.bmp files in all the subdirectories. Before running the script the second time, I want to prefix all the *.bmp files with the string reference_.
For example first_screen.bmp to reference_first_screen.bmp,
I have the directory structure as below:
C:\Image_Repository\BG_Images\second
...
C:\Image_Repository\BG_Images\sixth
having first_screen.bmp and first_screen.bmp files etc...
Could any one help me out?
How can I prefix all the image file names with reference_ string?
When I run the script for second time, the Perl script in silk will take both the images from the sub-directory and compare them both pixel by pixel. I am trying with code below.
Could you please guide me how can I proceed to complete this task.
#!/usr/bin/perl -w
&one;
&two;
sub one {
use Cwd;
my $dir ="C:\\Image_Repository";
#print "$dir\n";
opendir(DIR,"+<$dir") or "die $!\n";
my @dir = readdir DIR;
#$lines=@dir;
delete $dir[-1];
print "$lines\n";
foreach my $item (@dir)
{
print "$item\n";
}
closedir DIR;
}
sub two {
use Cwd;
my $dir1 ="C:\\Image_Repository\\BG_Images";
#print "$dir1\n";
opendir(D,"+<$dir1") or "die $!\n";
my @dire = readdir D;
#$lines=@dire;
delete $dire[-1];
#print "$lines\n";
foreach my $item (@dire)
{
#print "$item\n";
$dir2="C:\\Image_Repository\\BG_Images\\$item";
print $dir2;
opendir(D1,"+<$dir2") or die " $!\n";
my @files=readdir D1;
#print "@files\n";
foreach $one (@files)
{
$one="reference_".$one;
print "$one\n";
#rename $one,Reference_.$one;
}
}
closedir DIR;
}
I tried open call with '+<' mode but I am getting compilation error for the read and write mode.
When I am running this code, it shows the files in BG_images folder with prefixed string but actually it's not updating the files in the sub-directories.
You don't open a directory for writing. Just use opendir without the mode parts of the string:
opendir my($dir), $dirname or die "Could not open $dirname: $!";
However, you don't need that. You can use File::Find to make the list of files you need.
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use File::Find;
use File::Find::Closures qw(find_regular_files);
use File::Spec::Functions qw(catfile);
my( $wanted, $reporter ) = find_regular_files;
find( $wanted, $ARGV[0] );
my $prefix = 'reference_';
foreach my $file ( $reporter->() )
{
my $basename = basename( $file );
if( index( $basename, $prefix ) == 0 )
{
print STDERR "$file already has '$prefix'! Skipping.\n";
next;
}
my $new_path = catfile(
dirname( $file ),
"$prefix$basename"
);
unless( rename $file, $new_path )
{
print STDERR "Could not rename $file: $!\n";
next;
}
print $file, "\n";
}
You should probably check out the File::Find module for this - it will make recursing up and down the directory tree simpler.
You should probably be scanning the file names and modifying those that don't start with reference_ so that they do. That may require splitting the file name up into a directory name and a file name and then prefixing the file name part with reference_. That's done with the File::Basename module.
At some point, you need to decide what happens when you run the script the third time. Do the files that already start with reference_ get overwritten, or do the unprefixed files get overwritten, or what?
The reason the files are not being renamed is that the rename operation is commented out. Remember to add use strict; at the top of your script (as well as the -w option which you did use).
If you get a list of files in an array @files (and the names are base names, so you don't have to fiddle with File::Basename), then the loop might look like:
foreach my $one (@files)
{
my $new = "reference_$one";
print "$one --> $new\n";
rename $one, $new or die "failed to rename $one to $new ($!)";
}
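Putting these points together, here is one possible sketch that walks the tree, skips files that already carry the prefix (so a second or third run does nothing new), and refuses to overwrite an existing target. The helper name add_prefix and the choice to skip rather than overwrite are assumptions for illustration, not something the question specifies.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Find;
use File::Spec::Functions 'catfile';

# Rename every *.bmp file below $root so that its name starts with
# $prefix. Returns the list of new paths.
sub add_prefix {
    my ($root, $prefix) = @_;

    # Collect the candidates first so nothing is renamed while
    # File::Find is still walking the tree.
    my @targets;
    find(
        {
            no_chdir => 1,    # $_ holds the full path
            wanted   => sub { push @targets, $_ if -f $_ && /\.bmp\z/i },
        },
        $root,
    );

    my @renamed;
    for my $old (@targets) {
        my $base = basename($old);
        next if index($base, $prefix) == 0;    # already prefixed
        my $new = catfile(dirname($old), $prefix . $base);
        if (-e $new) {
            warn "$new already exists, skipping $old\n";
            next;
        }
        rename $old, $new or die "failed to rename $old to $new ($!)";
        push @renamed, $new;
    }
    return @renamed;
}
```

A call such as add_prefix('C:/Image_Repository/BG_Images', 'reference_') would cover the directories from the question.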
With the aid of find utility from coreutils for Windows:
$ find -iname "*.bmp" | perl -wlne"chomp; ($prefix, $basename) = split(m~\/([^/]+)$~, $_); rename($_, join(q(/), ($prefix, q(reference_).$basename))) or warn qq(failed to rename '$_': $!)"