Finding matching files with Perl module File::Find::Rule


I'm trying to use the File::Find::Rule module to find a specific file (output.txt) in a subdirectory and, if it is not there, search from the root directory to see if it exists. The issue is that multiple output.txt files exist, so we should only look for the others if the original is not found.
Basically, the directory structure looks like this:
top
    level-1-a
        level-2-a
            output.txt
        level-2-b
            output.txt
    level-1-b
        level-2-a
            output.txt
        level-2-b
            output.txt
Right now I have:
@files = File::Find::Rule->file()->name($output)->in($sub_dir);
if ( ! @files ) {
    @files = File::Find::Rule->file()->name($output)->in($root_dir);
}
The behavior is that we look for output.txt in \top\level-1-a first, where it finds the matches in level-2-a and level-2-b. If there are no matching files under level-1-a, we then make the same call on \top, which finds the matches in the level-1-b directories. Is there a cleaner way to do this "if-else" check?

I would check the subdirectories one at a time. Here's an example. The last statement breaks out of the for loop as soon as a subdirectory has been found that contains the required file:
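use File::Find::Rule;
my $output = 'output.txt';
my @files;
for my $subdir ( 'level-1-a', 'level-1-b' ) {
    # List assignment in boolean context is true when at least one file matched
    last if @files = File::Find::Rule->file()->name($output)->in("/top/$subdir");
}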
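The same "first directory with a match wins" idea can be wrapped in a small helper. This is a minimal sketch that swaps File::Find::Rule for the core File::Find module so it runs without installing anything from CPAN. The helper name find_first_match and the temporary demo tree are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: search each directory in @dirs in order and return
# the matches from the first directory that has at least one file named $name.
sub find_first_match {
    my ( $name, @dirs ) = @_;
    for my $dir (@dirs) {
        my @found;
        # find() chdirs into each directory, so $_ is the bare file name here
        find( sub { push @found, $File::Find::name if -f $_ && $_ eq $name }, $dir );
        return sort @found if @found;    # stop at the first directory with hits
    }
    return;                              # nothing found in any directory
}

# Demo tree modelled on the question, but level-1-a has no output.txt.
my $top = tempdir( CLEANUP => 1 );
make_path( map { "$top/$_" } qw(level-1-a/level-2-a level-1-b/level-2-a level-1-b/level-2-b) );
for my $sub (qw(level-1-b/level-2-a level-1-b/level-2-b)) {
    open my $fh, '>', "$top/$sub/output.txt" or die "open: $!";
    close $fh;
}

# Try level-1-a first, then fall back to the whole tree.
my @files = find_first_match( 'output.txt', "$top/level-1-a", $top );
print "$_\n" for @files;
```

Because the directories are tried in list order, adding more fallback locations is just a matter of appending them to the argument list.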

Related

Using perl script find file count with multiple pattern

I want to count the files inside a directory.
If the number of files in the folder is less than or equal to 5, list the file names with the count;
otherwise report that no files were found, with a message like "zero files found".
For example, the directory contains files like:
testfile1d_01012022.txt
testfile2d_01012022.txt
testfile4d_01012022.txt
testfile9d_01012022.txt
testfile7d_01012022.txt
Based on the file names above, I use this search pattern:
$Filename = "testfile*d*";
But the result I get looks like this:
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt (the number before the "d" has been removed)
These are the relevant lines in my script:
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
How can I show the available files with their count?

You say that you are using this "search pattern".
$Filename = "testfile*d*"
What that looks like is a shell filename pattern, like if you were to say ls testfile*d* in the shell. That's not how regular expressions work.
If you want a pattern that matches "testfile" and then some arbitrary string and then "d" and then some other arbitrary string, you would write that as
testfile.*d.*
Second, you're not using that pattern correctly. Your code shows:
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
The first line does read a list of files, but the second line doesn't do anything useful. What you want your second line to be is:
my @matched_files = grep /testfile.*d.*/, @files;
When you do that, @matched_files will have all the files that match that regular expression.
HOWEVER, you don't have to use regular expressions at all to get a list of files. You can use the built-in glob function that DOES handle shell globbing like you want, and you don't need that readdir stuff.
my @files = glob( 'testfile*d*' ); # Note it uses shell glob syntax, not regex
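Here is a runnable sketch of both approaches side by side, plus the count check from the question. It assumes a throwaway directory created with File::Temp, with file names copied from the question and one extra non-matching file. The "more than 5" branch is a guess, since the question doesn't say what should happen in that case. It also assumes $dir contains no spaces, because glob() splits its argument on whitespace.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Demo directory: the file names from the question plus one non-match.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(testfile1d_01012022.txt testfile2d_01012022.txt
                 testfile4d_01012022.txt testfile9d_01012022.txt
                 testfile7d_01012022.txt other.txt)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

# Option 1: readdir + regex (testfile*d* written as a regex, anchored at the start).
opendir my $dh, $dir or die "opendir $dir: $!";
my @all = grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;
my @by_regex = sort grep { /^testfile.*d.*/ } @all;

# Option 2: glob with the shell pattern as-is; strip the directory part for display.
my @by_glob = sort map { basename($_) } glob("$dir/testfile*d*");

my $count = @by_glob;
if ( $count == 0 ) {
    print "zero files found\n";
}
elsif ( $count <= 5 ) {
    print "$count file(s) found:\n";
    print "  $_\n" for @by_glob;
}
else {
    print "$count files found (more than 5, not listing them)\n";
}
```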

How do you delete a file that does not have a file extension?

How do you delete a file that does not have a file extension?
I'm using Strawberry Perl 5.32.1.1 64 bit version. Here is what I have:
unlink glob "$dir/*.*";
I've also tried the following:
my @file_list;
opendir(my $dh, $dir) || die "can't opendir $dir: $!";
while (readdir $dh){
    next unless -f;
    push @file_list, $_;
}
closedir $dh;
unlink @file_list;
The result of this is that all files with an extension are deleted,
but those files without an extension remain undeleted.

You don't really explain what the problem is, so this is just a guess.
You're using readdir() to get a list of files to delete and then passing that list to unlink(). The problem here is that the filenames that you get back from readdir() do not include the directory name that you originally passed to opendir(). So you need to populate your array like this (the -f test needs the full path too):
next unless -f "$dir/$_";
push @file_list, "$dir/$_";
In a comment you say:
I'm testing unlink glob "$dir/*.*"; but the files without file extensions are not deleted.
Well, if you think about it, you're only asking for files with an extension. If the pattern you pass to glob() contains *.*, then it will only return files that match that pattern (i.e. files with a dot and more text after the dot).
The solution would seem to be to simplify the pattern that you are passing to glob() so it's just *.
unlink glob "$dir/*";
That will, of course, try to delete directories as well, so you might want this instead:
unlink grep { -f } glob "$dir/*";

You expect glob to perform DOS-like globbing, but the glob function provides csh-like globbing.
glob("*.*") matches all files that contain a ., ignoring files with a leading dot.
glob("*") matches all files, ignoring files with a leading dot.
glob("* .*") matches all files.
Note that every kind of file is matched. This includes directories. In particular, note that . and .. are matched by .*.
If you want DOS-style globs, you can use File::DosGlob.
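A quick way to see these rules is to run the three patterns against a throwaway directory. This is a sketch: the file names, the subdirectory and the temporary directory are made up for the demonstration, and the script changes into that directory only for the duration of the three glob() calls.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Demo directory: a name with an extension, one without, a dotfile and a subdirectory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(report.txt README .hidden)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}
mkdir "$dir/subdir" or die "mkdir: $!";

my $old = getcwd();
chdir $dir or die "chdir $dir: $!";
my @dotted  = glob('*.*');     # names containing a dot, minus leading-dot names
my @visible = glob('*');       # every name without a leading dot, directories included
my @all     = glob('* .*');    # two patterns: .* adds the dotfiles, plus . and ..
chdir $old or die "chdir back: $!";

print "*.*   => @dotted\n";
print "*     => @visible\n";
print "* .*  => @all\n";
```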

Yes, you can use glob, but you need to do a little more work than that. grep can help select the files you are looking for.
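* grabs all entries (which do not begin with a dot), -f selects only files, and the last grep removes files with a dot in the name. Matching only the part after the last slash keeps a dot elsewhere in $dir from excluding everything:
unlink grep { !m{\.[^/]*$} } grep { -f } glob "$dir/*";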
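The readdir fix (prefixing the directory) and the "no dot in the name" filter can also be combined without glob. This is a minimal sketch, assuming you want to remove only plain files whose names contain no dot. The helper name delete_extensionless is hypothetical, and the demo runs against a temporary directory so nothing real is deleted.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: delete the plain files in $dir whose names contain no dot.
# Returns the list of paths it actually removed.
sub delete_extensionless {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my @targets =
      grep { -f $_ }                              # plain files only, no directories
      map  { File::Spec->catfile( $dir, $_ ) }    # readdir names are relative to $dir
      grep { !/\./ }                              # no dot in the bare name (also drops . and ..)
      readdir $dh;
    closedir $dh;
    return grep { unlink $_ } @targets;           # keep only the successful deletions
}

# Demo on a throwaway directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(keep.txt notes LICENSE)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}
mkdir "$dir/subdir" or die "mkdir: $!";

my @removed = delete_extensionless($dir);
print "removed: $_\n" for @removed;
```

The dot test runs on the bare name before the directory is joined on, so a dot in $dir itself has no effect.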

perl deleting a subdirectory containing a file of specified format

I have to write a program in Perl, and I'm very new to it. The task is:
There is a directory containing many subdirectories, and each subdirectory contains further subdirectories. At the end of each chain of subdirectories there are files. If a file's format is ".TXT", I should delete the subdirectory directly under the main directory that contains the .TXT file.
For example, for raghav\abc\ccd\1.txt I should delete the subdirectory "abc".
My code is:
#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);
my $path ="d:\\raghav";
search_all_folder($path);
sub search_all_folder {
    my ($folder) = @_;
    if ( -d $folder ) {
        chdir $folder;
        opendir my $dh, $folder or die "can't open the directory: $!";
        while ( defined( my $file = readdir($dh) ) ) {
            chomp $file;
            next if $file eq '.' or $file eq '..';
            search_all_folder("$folder/$file"); ## recursive call
            read_files($file) if ( -f $file );
        }
        closedir $dh or die "can't close directory: $!";
    }
}
sub read_files {
    my ($filename) = @_;
    if ($filename =~ /.txt/) {
        rmdir;
    }
}

Never ever implement your own directory traversal. Use File::Find instead. It's more efficient and less prone to breaking.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
my $search_path = "d:\\raghav";
my %text_files_found_in;
sub find_text_files {
    if (m/\.txt$/i) {
        ## you may want to transform this first, to get a higher level directory.
        $text_files_found_in{$File::Find::dir}++;
    }
}
find( \&find_text_files, $search_path );
foreach my $dir ( keys %text_files_found_in ) {
    print "$dir contained a text file\n";
    ##maybe delete it, but don't until you're sure the code's working!
}
You've now got a list of the directories that contain text files, from which you can figure out what to delete and then delete it. rmdir won't do the trick, though: it only works on empty directories. You can either collate a list (as this does) or figure out the path to delete as you go and insert it into a hash (so you don't get dupes).
Either way though - it's probably better to run the find first, delete second rather than trying to delete a tree you may still be traversing.
Edit: What this program does:
Imports the File::Find module.
defines the subroutine find_text_files
runs find (in the File::Find module), and tells it to run find_text_files on every file it finds in its recursive traversal.
find_text_files is called on every file within the tree (below $search_path). When it's called:
File::Find sets: $_ to be the current filename. We match against m/\.txt$/ to see if it's a text file.
File::Find also sets two other variables: $File::Find::dir, the directory path of this file, and $File::Find::name, the full file path. We insert $File::Find::dir into the hash %text_files_found_in when the pattern matches.
Once find is complete, we have a hash called text_files_found_in which contains keys of all the directories where a text file was found.
we can then iterate on keys %text_files_found_in to identify - and delete.*
At this point, you may want to transform each of the directories in that list, because each one is the full path of the directory holding the file, and you may only want to delete at a higher level.
* There's no delete code in this script - you'll have to sort that yourself. I'm not putting anything that might delete stuff up on the internet where people who don't fully understand it might just run it blind.
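The "transform to a higher-level directory" step can be sketched without any deletion. This version maps each .txt file to the first-level subdirectory under the search path (raghav\abc\ccd\1.txt becomes abc) and only reports the candidates. The helper name top_dirs_with_txt and the demo tree are hypothetical. It also assumes $File::Find::dir starts with the search path exactly as passed to find(), so it strips that prefix as a string.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: for every *.txt file under $search_path, record the
# first-level subdirectory it lives in. It reports candidates; it deletes nothing.
sub top_dirs_with_txt {
    my ($search_path) = @_;
    my %top;    # hash keys, so each directory is reported once
    find(
        sub {
            return unless -f $_ && /\.txt$/i;
            # Path of the containing directory relative to $search_path
            ( my $rel = $File::Find::dir ) =~ s{^\Q$search_path\E[/\\]*}{};
            my ($first) = split m{[/\\]}, $rel;
            # Text files directly in $search_path have no subdirectory to report
            $top{$first}++ if defined $first && length $first;
        },
        $search_path
    );
    return sort keys %top;
}

# Demo tree modelled on the question: abc/ccd/1.txt and xyz/data.csv.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/abc/ccd", "$root/xyz" );
for my $file ( "$root/abc/ccd/1.txt", "$root/xyz/data.csv" ) {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

my @candidates = top_dirs_with_txt($root);
print "would delete: $root/$_\n" for @candidates;
```

Collecting first and acting afterwards follows the advice above: the traversal has finished before anything in the tree is touched.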

perl file size calculation not working

I am trying to write a simple perl script that will iterate through the regular files in a directory and calculate the total size of all the files put together. However, I am not able to get the actual size of the file, and I can't figure out why. Here is the relevant portion of the code. I put in print statements for debugging:
$totalsize = 0;
while ($_ = readdir (DH)) {
print "current file is: $_\t";
$cursize = -s $_;
print "size is: $cursize\n";
$totalsize += $cursize;
}
This is the output I get:
current file is: test.pl size is:
current file is: prob12.pl size is:
current file is: prob13.pl size is:
current file is: prob14.pl size is:
current file is: prob15.pl size is:
So the file size remains blank. I tried using $cursize = $_ instead but the only effect of that was to retrieve the file sizes for the current and parent directories as 4096 bytes each; it still didn't get any of the actual file sizes for the regular files.
I have looked online and through a couple of books I have on perl, and it seems that perl isn't able to get the file sizes because the script can't read the files. I tested this by putting in an if statement:
print "Cannot read file $_\n" if (! -r _);
Sure enough for each file I got the error saying that the file could not be read. I do not understand why this is happening. The directory that has the files in question is a subdirectory of my home directory, and I am running the script as myself from another subdirectory in my home directory. I have read permissions to all the relevant files. I tried changing the mode on the files to 755 (from the previous 711), but I still got the Cannot read file output for each file.
I do not understand what's going on. Either I am mixed up about how permissions work when running a perl script, or I am mixed up about the proper way to use -s _. I appreciate your guidance. Thanks!

If it isn't just your typo (-s _ instead of the correct -s $_), then please remember that readdir returns file names relative to the directory you've opened with opendir. The proper way would be something like:
my $base_dir = '/path/to/somewhere';
opendir DH, $base_dir or die;
while ($_ = readdir DH) {
print "size of $_: " . (-s "$base_dir/$_") . "\n";
}
closedir DH;
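Applied to the original goal of totalling the sizes, the same fix looks like this. This is a minimal sketch: total_size is a hypothetical helper name, and the demo writes two files of known length into a temporary directory. It also shows the legitimate use of -s _, which reuses the stat result from the preceding -f test on the full path.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: total size in bytes of the regular files directly in $dir.
sub total_size {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my $total = 0;
    while ( defined( my $name = readdir $dh ) ) {
        my $path = File::Spec->catfile( $dir, $name );    # readdir gives bare names
        next unless -f $path;                             # skips ., .. and subdirectories
        $total += -s _;                                   # reuse the stat done by -f
    }
    closedir $dh;
    return $total;
}

# Demo: two files of known size in a throwaway directory.
my $dir = tempdir( CLEANUP => 1 );
my %content = ( 'a.pl' => 'x' x 10, 'b.pl' => 'y' x 32 );
for my $name ( keys %content ) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    print {$fh} $content{$name};
    close $fh;
}
mkdir "$dir/sub" or die "mkdir: $!";

my $total = total_size($dir);
print "total: $total bytes\n";
```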
You could also take a look at the core module IO::Dir, which offers a tied-hash interface (tie my %dir, 'IO::Dir', $base_dir) for accessing both the file names and their attributes more simply.

You have a typo:
$cursize = -s _;
Should be:
$cursize = -s $_;

How do I skip .svn folders with File::Find?

I am writing a Perl script that is iterating over file names in a directory and its sub-directories, using the following method:
find(\&getFile, $mainDir);
sub getFile {
my $file_dir = $File::Find::name;
return unless -f $file_dir; # return if it's a folder
}
The file structure looks like this:
main/classes/pages/filename.php
However because of version control each folder and subfolder has a hidden .svn directory that has duplicates of every file inside with a .svn-base suffix:
main/.svn/classes/pages/filename.php.svn-base
I was wondering if there is a return statement like the one I had previously using:
return if ($file_dir eq "something here");
to skip all the .svn folders so that filenames with the .svn-base suffix are not found. I have been fiddling around with regex and searching for hours without much luck. I have only been using Perl for a couple of days.

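You may use
return if $file_dir =~ m{/\.svn/};
The =~ operator matches a variable against a pattern, so this returns early for any path that has a .svn directory in it. (Its negation, !~, is equivalent to !($file_dir =~ m{/\.svn/}) and would do the opposite: skip everything except the .svn files.)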
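An alternative that avoids visiting the .svn trees at all is to set $File::Find::prune when the callback sees a .svn directory. This is a sketch of that approach, which is a different technique from the return-based check. It reuses the getFile and $mainDir names from the question, and its demo tree under a temporary directory is made up.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

my @files;

# Wanted callback: prune any .svn directory so find() never descends into it,
# then collect plain files from everywhere else.
sub getFile {
    if ( -d $_ && $_ eq '.svn' ) {
        $File::Find::prune = 1;    # skip this directory's whole subtree
        return;
    }
    return unless -f $_;           # ignore other directories
    push @files, $File::Find::name;
}

# Demo tree modelled on the question.
my $mainDir = tempdir( CLEANUP => 1 );
make_path( "$mainDir/classes/pages", "$mainDir/.svn/classes/pages" );
for my $file ( "$mainDir/classes/pages/filename.php",
               "$mainDir/.svn/classes/pages/filename.php.svn-base" ) {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

find( \&getFile, $mainDir );
print "$_\n" for sort @files;
```

Pruning saves the work of walking every duplicated file under .svn, which the return-based check still has to visit one by one.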
