How do I use $File::Find::prune? - perl

I need to edit the cue files in the first directory only, without recursing into the subdirectories.
find(\&read_cue, $dir_source);

sub read_cue {
    /\.cue$/ or return;
    my $fd = $File::Find::dir;
    my $fn = $File::Find::name;
    tie my @lines, 'Tie::File', $fn
        or die "could not tie file: $!";
    foreach (@lines) {
        s/some substitution//;
    }
    untie @lines;
}
I've tried variations of
$File::Find::prune = 1;
return;
but with no success. Where should I place and define $File::Find::prune?
Thanks

If you don't want to recurse, you probably want to use glob:
for (glob("*.cue")) {
    read_cue($_);
}

If you want to filter the subdirectories recursed into by File::Find, you should use the preprocess function (not the $File::Find::prune variable) as this gives you much more control. The idea is to provide a function which is called once per directory, and is passed a list of files and subdirectories; the return value is the filtered list to pass to the wanted function, and (for subdirectories) to recurse into.
As msw and Brian have commented, your example would probably be better served by a glob, but if you wanted to use File::Find, you might do something like the following. Here, the preprocess function calls -f on every file or directory it's given, returning a list of files. Then the wanted function is called only for those files, and File::Find does not recurse into any of the subdirectories:
use strict;
use File::Find;
# Function is called once per directory, with a list of files and
# subdirectories; the return value is the filtered list to pass to
# the wanted function.
sub preprocess { return grep { -f } @_; }
# Function is called once per file or subdirectory.
sub wanted { print "$File::Find::name\n" if /\.cue$/; }
# Find files in or below the current directory.
find { preprocess => \&preprocess, wanted => \&wanted }, '.';
This can be used to create much more sophisticated file finders. For example, I wanted to find all files in a Java project directory, without recursing into subdirectories starting with ".", such as ".idea" and ".svn", created by IntelliJ and Subversion. You can do this by modifying the preprocess function:
# Function is called once per directory, with a list of files and
# subdirectories; return value is the filtered list to pass to the
# wanted function.
sub preprocess { return grep { -f or (-d and /^[^.]/) } @_; }

If you only want the files in a directory without searching subdirectories, you don't want to use File::Find. A simple glob probably does the trick:
my @files = glob( "$dir_source/*.cue" );
You don't need that subroutine. In general, when you're doing a lot of work for a task that you think should be simple, you're probably doing it wrong. :)
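Combining that with the Tie::File editing from the question, a minimal sketch might look like this (the directory path and substitution are placeholders):

use strict;
use warnings;
use Tie::File;

my $dir_source = '/path/to/cues';   # placeholder starting directory

# Only the .cue files directly inside $dir_source; glob does not recurse.
for my $fn ( glob "$dir_source/*.cue" ) {
    tie my @lines, 'Tie::File', $fn
        or die "could not tie $fn: $!";
    s/some substitution// for @lines;
    untie @lines;
}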

Say you have a directory subtree with
/tmp/foo/file.cue
/tmp/foo/bar/file.cue
/tmp/foo/bar/baz/file.cue
Running
#! /usr/bin/perl
use warnings;
use strict;
use File::Find;

sub read_cue {
    if (-f && /\.cue$/) {
        print "found $File::Find::name\n";
    }
}

@ARGV = (".") unless @ARGV;
find \&read_cue => @ARGV;
outputs
found /tmp/foo/file.cue
found /tmp/foo/bar/file.cue
found /tmp/foo/bar/baz/file.cue
But if you remember the directories in which you found cue files
#! /usr/bin/perl
use warnings;
use strict;
use File::Find;

my %seen_cue;

sub read_cue {
    if (-f && /\.cue$/) {
        print "found $File::Find::name\n";
        ++$seen_cue{$File::Find::dir};
    }
    elsif (-d && $seen_cue{$File::Find::dir}) {
        $File::Find::prune = 1;
    }
}

@ARGV = (".") unless @ARGV;
find \&read_cue => @ARGV;
you get only the toplevel cue file:
found /tmp/foo/file.cue
That's because $File::Find::prune emulates the -prune option of find that affects directory processing:
-prune
True; if the file is a directory, do not descend into it.
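For the original question (editing only the cue files in the top-level directory), a minimal sketch would prune every directory below the starting one inside the wanted sub; that is where $File::Find::prune belongs. The path here is a placeholder, and the actual editing (Tie::File, as in the question) would go where the print is:

use strict;
use warnings;
use File::Find;

my $dir_source = '/path/to/cues';   # placeholder starting directory

find(sub {
    # Any directory other than the starting one gets pruned, so find()
    # never descends below $dir_source.
    if (-d && $File::Find::name ne $dir_source) {
        $File::Find::prune = 1;
        return;
    }
    print "editing $File::Find::name\n" if -f && /\.cue$/;
}, $dir_source);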

Related

Perl, How to read subfolder Output

I am writing a script to read the contents of multiple sub-folders in a directory.
Recently I also need to read the contents of a folder inside each of those sub-folders.
How can I write the code to read those folders inside the multiple sub-folders?
This is the new layout:
Multiple Sub-folder -> Local folder -> fileAAA.csv
How do I read fileAAA in the Local folder of each sub-folder?
The code I currently have works well for this layout:
Multiple Sub-folder -> fileAAA.csv
It is able to read fileAAA from each sub-folder.
Below is the code I use to read
Multiple Sub-folder -> fileAAA.csv
my ( $par_dir, $sub_dir );
opendir( $par_dir, "$parent" );
while ( my $sub_folders = readdir($par_dir) ) {
    next if ( $sub_folders =~ /^\.\.?$/ );    # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless ( -d $path );                 # skip anything that isn't a directory

    opendir( $sub_dir, $path );
    while ( my $file = readdir($sub_dir) ) {
        next unless $file =~ /\.csv$/i;
        my $full_path = $path . '/' . $file;
        print_file_names($full_path);
    }
    closedir($sub_dir);
    $flag = 0;
}
closedir($par_dir);
......
Updated
You should look at the File::Find module, which has everything already in place to do searches like this and has taken account of all the corner cases for you.
I wrote that on my tablet, and at the time I couldn't offer sample code to support it. I believe this will do what you're asking for, which is simply to find all CSV files at any level beneath a parent directory:
use strict;
use warnings;

use File::Find qw/ find /;

STDOUT->autoflush;

my $parent = '/Multiple Sub-folder';

find(sub {
    return unless -f and /\.csv$/i;
    print_file_names($File::Find::name);
}, $parent);

sub print_file_names {
    my ($fn) = @_;
    print $fn, "\n";
}
Without using a module, try this.
Instead of opendir, you can use glob for the subdirectory search.
In the script below I make a subroutine for the recursive search.
When the elsif condition is satisfied, the directory's path is passed back to the find subroutine, which searches it in turn, and so on.
my $v = "/Multiple Sub-folder";
find($v);

sub find {
    my ($s) = @_;
    foreach my $ma (glob "$s/*") {
        if (-f $ma) {
            if ($ma =~ m/\.csv$/) {    # here, search for csv files
                print "$ma\n";
            }
        }
        elsif (-d $ma) {
            find($ma);
        }
    }
}
But you should use the File::Find module to search for the files in the directory, as in Borodin's answer, which is the best approach.

How to process and push multiple directories in Perl using simple code

I am trying to get all subdirectories and files inside 'dir1, dir2, dir3, dir4, dir5' as below and push them to another directory. With this code I get everything, but I need to process more directories like this. Is there a simple way to process all of 'dir1' to 'dirx' instead of assigning each directory individually as below? Thanks in advance.
use File::Find::Rule;

my @pushdir;
my @pushdir1 = File::Find::Rule->directory->in('/tmp/dirx');
my @pushdir2 = File::Find::Rule->directory->in('/tmp/nextdir');
my @pushdir3 = File::Find::Rule->directory->in('/tmp/pushdir');
my @pushdir4 = File::Find::Rule->directory->in('/tmp/logdir');
my @pushdir5 = File::Find::Rule->directory->in('/tmp/testdir');
push @pushdir, @pushdir1, @pushdir2, @pushdir3, @pushdir4, @pushdir5;

my @Files;
foreach my $dir (@pushdir) {
    push @Files, sort glob "$dir/*.txt";
}
Use a subroutine:
#!/usr/bin/env perl

use strict;
use warnings;

use File::Find::Rule;

my @dirnames = qw( dirx temp_dir testdir nextdir );
my @dirs_to_search = map { "/tmp/$_" } @dirnames;

my @files;
for my $dir (@dirs_to_search) {
    push @files, find_files_in_dir($dir);
}

sub find_files_in_dir {
    my ($dir) = @_;

    my @subdirs = File::Find::Rule->directory->in( $dir );

    my @txt_files;
    for my $subdir ( @subdirs ) {
        push @txt_files, sort glob "$subdir/*.txt";
    }
    return @txt_files;
}
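Since File::Find::Rule can match plain files by name and its in method takes a list of directories, the helper can arguably be collapsed into a single rule. A minimal sketch, using the same hypothetical directory names as above:

use strict;
use warnings;
use File::Find::Rule;

# Same hypothetical /tmp subdirectories as above.
my @dirs_to_search = map { "/tmp/$_" } qw( dirx temp_dir testdir nextdir );

# One rule: every plain file named *.txt at any depth under those directories.
my @found = File::Find::Rule->file->name('*.txt')->in(@dirs_to_search);
my @files = sort @found;

Note that this sorts the whole result at once, whereas the loop version sorts per directory; the set of files found is the same.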
It sounds like you want a list of all *.txt files contained in /tmp's child directories and their descendants.
That's easily done with a single glob call, like this:
my @files = glob '/tmp/**/*.txt';

Determine if directory is the root directory

I am trying to traverse the directory tree upwards, searching for a given directory name, if the directory is found, I should chdir to it, otherwise give an error message. For example:
use warnings;
use strict;
use Cwd qw(getcwd);

die "Base directory not found!" if (!gotoDir());

sub gotoDir {
    my $baseDir = '.test';
    my $curdir  = getcwd();
    while (1) {
        return 1 if (-d $baseDir);
        if (! chdir("..")) {
            chdir($curdir);
            return 0;
        }
    }
}
The problem is that chdir does not fail when going beyond the root, so the above program enters an infinite loop if .test is not found.
Of course, I could just test for / since I am on Linux, but I would like to do this in a system independent manner.
As @Gnouc has answered, the File::Spec module has a portable representation of the root directory with its rootdir method.
This is how I would write your goto_dir subroutine. Note that capital letters are conventionally reserved for global identifiers like Package::Names.
I think it is best to pass the directory you are searching for as a parameter to the subroutine to make it more general. I have also written it so that the subroutine does a chdir to the .test directory if it is found, which is what you say you want but not what your own solution tries to do.
Finally, since portability is important, I have used File::Spec->updir in place of a literal '..' to refer to the parent of the current directory.
#!/usr/bin/env perl

use strict;
use warnings;

use Cwd 'cwd';
use File::Spec;

goto_dir('.test') or die 'Base directory not found!';

sub goto_dir {
    my ($base_dir) = @_;
    my $original_dir = cwd;
    while () {
        if (-d $base_dir) {
            chdir $base_dir;
            return 1;
        }
        elsif (cwd eq File::Spec->rootdir) {
            chdir $original_dir;
            return 0;
        }
        else {
            chdir File::Spec->updir;
        }
    }
}
You can use File::Spec to get the root directory:
$ perl -MFile::Spec -E 'say File::Spec->rootdir()'
/
File::Spec is great for obtaining what the root directory is, but testing whether a given directory is or isn't the root is not so easy. For that, you likely want to use stat and compare whether the dev and ino fields are equal:
use File::stat;
use File::Spec;

my $rootstat = stat(File::Spec->rootdir);
...
my $thisstat = stat($dir);
if ( $thisstat->dev == $rootstat->dev and $thisstat->ino == $rootstat->ino ) {
    say "This is the root directory";
}
This avoids problems with the string form of the path to the directory, since you may, for example, be holding the path as ../../../../../..
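Wrapped into a small helper, the same dev/ino comparison could be reused inside the goto_dir loop above. A minimal sketch (the helper name is hypothetical):

use File::stat;
use File::Spec;

# Hypothetical helper: true if $dir is the filesystem root, compared
# by device and inode rather than by path string.
sub is_root_dir {
    my ($dir) = @_;
    my $rootstat = stat(File::Spec->rootdir) or return 0;
    my $dirstat  = stat($dir)                or return 0;
    return $dirstat->dev == $rootstat->dev
        && $dirstat->ino == $rootstat->ino;
}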

Ignore an entire directory when using File::Find in Perl script

I have a script which scans every local filesystem for world-writable files. Any found files are written to an output file. It also uses another file which provides a list of files to ignore.
We have the Tivoli monitoring agent installed which, for some strange reason, has been designed to create every file in its installation path with world-writable permissions. As it is known and there is little we can do about it, we would like to simply ignore the entire directory.
I imagine I can utilize a glob such as /opt/IBM/ITM/* but I haven't the first bit of a clue at to how to do that.
At the moment I've hard-coded the directory into the script. This is less than ideal, but functional. I'd prefer to have it in the list of excludes.
Over at Code Review it was suggested that I use File::Find::prune. Unfortunately, this hasn't worked. From what I gather about File::Find::prune, if it finds a file at /opt/IBM/ITM/.../.../file.txt which is supposed to be excluded, it will then skip the entire /opt/IBM/ITM/.../.../ directory. This is fine, but it means I would need to have an exclusion entry for every sub-directory of /opt/IBM/ITM/. This would be a tedious endeavor considering how many sub-directories and sub-sub-directories there are.
I did try placing a world-writable file under /opt/IBM/ITM/ and add that to the exclusion list, but it didn't work. I'm guessing because it wasn't found first.
The script:
#!/usr/bin/perl
use warnings;
use strict;
use Fcntl ':mode';
use File::Find;
no warnings 'File::Find';
no warnings 'uninitialized';
my $dir = "/var/log/tivoli/";
my $mtab = "/etc/mtab";
my $permFile = "world_writable_files.txt";
my $tmpFile = "world_writable_files.tmp";
my $exclude = "/usr/local/etc/world_writable_excludes.txt";
my $mask = S_IWUSR | S_IWGRP | S_IWOTH;
my (%excludes, %devNums);
my $errHeader;
# Compile a list of mountpoints that need to be scanned
my @mounts;
open MT, "<${mtab}" or die "Cannot open ${mtab}, $!";
# We only want the local mountpoints
while (<MT>) {
if ($_ =~ /ext[34]/) {
chomp;
my @line = split;
push(@mounts, $line[1]);
my @stats = stat($line[1]);
$devNums{$stats[0]} = undef;
}
}
close MT;
# Build a hash from /usr/local/etc/world_writables_excludes.txt
if ((! -e $exclude) || (-z $exclude)) {
$errHeader = <<HEADER;
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! !!
!! /usr/local/etc/world_writable_excludes.txt is !!
!! is missing or empty. This report includes !!
!! every world-writable file including those which !!
!! are expected and should be excluded. !!
!! !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
HEADER
} else {
open XCLD, "<${exclude}" or die "Cannot open ${exclude}, $!\n";
while (<XCLD>) {
chomp;
$excludes{$_} = 1;
}
}
sub wanted {
my @dirStats = stat($File::Find::name);
# Is it excluded from the report...
return if exists $excludes{$File::Find::name};
# ...is the Tivoli installation directory...
return if ($File::Find::name =~ /\b\/ITM\b/);
# ...in a special directory, ...
return if ($File::Find::name =~ /^\bsys\b|\bproc\b|\bdev\b$/);
# ...a regular file, ...
return unless -f;
# ...local, ...
return unless (exists $devNums{$dirStats[0]});
# ...and world writable?
return unless ($dirStats[2] & $mask) == $mask;
# If so, add the file to the list of world writable files
print(WWFILE "$File::Find::name\n");
}
# Create the output file path if it doesn't already exist.
mkdir($dir) or die "Cannot execute mkdir on ${dir}, $!" unless (-d $dir);
# Create our filehandle for writing our findings
open WWFILE, ">${dir}${tmpFile}" or die "Cannot open ${dir}${tmpFile}, $!";
print(WWFILE "${errHeader}") if ($errHeader);
finddepth(\&wanted, @mounts);
close WWFILE;
# If no world-writable files have been found ${tmpFile} should be zero-size;
# Delete it so Tivoli won't alert
if (-z "${dir}${tmpFile}") {
unlink "${dir}${tmpFile}";
} else {
rename("${dir}${tmpFile}","${dir}${permFile}") or die "Cannot rename file ${dir}${tmpFile}, $!";
}
It has also been suggested elsewhere that I use File::Find::Rule. I'd rather avoid doing this simply because I don't want to perform a complete rewrite of the script.
As I've said, the script above works. I'd prefer not hard-coding the exclusion, though. Figuring out how to do this would also allow me to remove the match against the "special" directories.
To prune an entire directory tree, just set the $File::Find::prune value in your wanted sub. This will work as long as bydepth was not specified:
if ($File::Find::name eq '/opt/IBM/ITM') {
    $File::Find::prune = 1;
    return;
}
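If you would rather not hard-code the path, one option (a sketch only, not tested against the full script) is to keep a second excludes file listing directories and consult it in wanted. Note that the script above calls finddepth, which implies bydepth, and $File::Find::prune has no effect under bydepth, so you would need to switch to find for the pruning to work. The %dir_excludes hash and its source file are hypothetical:

# Hypothetical: directory paths to skip entirely, loaded the same way
# %excludes is, e.g. from /usr/local/etc/world_writable_dir_excludes.txt
# (one path per line).
my %dir_excludes;

sub wanted {
    # Prune excluded directories so find() never descends into them.
    if (-d && exists $dir_excludes{$File::Find::name}) {
        $File::Find::prune = 1;
        return;
    }
    # ... the rest of the original wanted() checks go here ...
}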

directory tree warning

I have written a script that recursively prints a directory's contents, but it prints a warning for each folder. How can I fix this?
sample folder:
dev# cd /tmp/test
dev# ls -p -R
test2/  testfile  testfile2

./test2:
testfile3  testfile4
my code:
#!/usr/bin/perl
use strict;
use warnings;

browseDir('/tmp/test');

sub browseDir {
    my $path = shift;
    opendir(my $dir, $path);
    while (readdir($dir)) {
        next if /^\.{1,2}$/;
        if (-d "$path/$_") {
            browseDir("$path/$_");
        }
        print "$path/$_\n";
    }
    closedir($dir);
}
and the output:
dev# perl /tmp/cotest.pl
/tmp/test/test2/testfile3
/tmp/test/test2/testfile4
Use of uninitialized value $_ in concatenation (.) or string at /tmp/cotest.pl line 16.
/tmp/test/
/tmp/test/testfile
/tmp/test/testfile2
Try this code instead:
#!/usr/bin/perl
use strict;
use warnings;

browseDir('/tmp');

sub browseDir {
    my $path = shift;
    opendir(my $dir, $path);
    while (readdir($dir)) {
        next if /^\.{1,2}$/;
        print "$path/$_\n";
        if (-d "$path/$_") {
            browseDir("$path/$_");
        }
    }
    closedir($dir);
}
You get that error because you call browseDir() before using $_: the recursive call clobbers $_, so print "$path/$_\n" before recursing into the directory.
Why not use the File::Find module? It's included in almost all distributions of Perl since Perl 5.x. It's not my favorite module due to the sort of messy way it works, but it does a good job.
You define a wanted subroutine that does what you want and filters out what you don't want. In this case, you're printing pretty much everything, so all wanted does is print out what is found.
In File::Find, the name of the file is kept in $File::Find::name and the directory for that file is in $File::Find::dir. The variable $_ holds the file name itself (relative to the current directory) and can be used for testing.
Here's a basic way of what you want:
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $directory = '/tmp/test';

find( \&wanted, $directory );

sub wanted {
    say $File::Find::name;
}
I prefer to put my wanted function inside my call to find, so they're together. This is equivalent to the above:
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $directory = '/tmp/test';

find(
    sub {
        say $File::Find::name;
    },
    $directory,
);
Good programming says not to print in subroutines. Instead, you should use the subroutine to store and return your data. Unfortunately, find doesn't return anything at all. You have to use a global array to capture the list of files, and later print them out:
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $directory = '/tmp/test';
my @directory_list;

find(
    sub {
        push @directory_list, $File::Find::name;
    }, $directory );

for my $file (@directory_list) {
    say $file;
}
Or, if you prefer a separate wanted subroutine:
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $directory = '/tmp/test';
my @directory_list;

find( \&wanted, $directory );

sub wanted {
    push @directory_list, $File::Find::name;
}

for my $file (@directory_list) {
    say $file;
}
The fact that my wanted subroutine depends upon an array that's not local to the subroutine bothers me, which is why I prefer embedding the wanted subroutine inside my find call.
One thing you can do is use your subroutine to filter out what you want. Let's say you're only interested in JPG files:
use strict;
use warnings;
use feature qw(say);
use File::Find;

my $directory = '/tmp/test';
my @directory_list;

find( \&wanted, $directory );

sub wanted {
    return unless /\.jpg$/i;    # skip everything that doesn't have a .jpg suffix
    push @directory_list, $File::Find::name;
}

for my $file (@directory_list) {
    say $file;
}
Note how the wanted subroutine returns early on any file I don't want before I push it into my @directory_list array. Again, I prefer the embedding:
find( sub {
    return unless /\.jpg$/i;    # skip everything that doesn't have a .jpg suffix
    push @directory_list, $File::Find::name;
}, $directory );
I know this isn't exactly what you asked, but I just wanted to let you know about the File::Find module and introduce you to Perl modules (if you didn't already know about them), which can add a lot of functionality to Perl.
You place a value in $_ before calling browseDir and you expect that value to still be present after calling browseDir (a reasonable expectation), but browseDir modifies that variable.
Just add local $_; to browseDir to make sure that any changes to it are undone before the sub exits.
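A minimal sketch of that fix applied to the original browseDir; the local $_; line and the opendir error check are the only changes:

sub browseDir {
    local $_;    # keep the recursive call from clobbering the caller's $_
    my $path = shift;
    opendir(my $dir, $path) or die "Cannot open $path: $!";
    while (readdir($dir)) {
        next if /^\.{1,2}$/;
        if (-d "$path/$_") {
            browseDir("$path/$_");
        }
        print "$path/$_\n";
    }
    closedir($dir);
}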
Unrelated to your question, here are three other issues:
Not even minimal error checking!
You could run out of directory handles while navigating a deep directory tree.
You filter out files named ".\n" and "..\n", because $ in your regex also matches before a trailing newline; use \z to match only the true end of the string.
Fix:
#!/usr/bin/perl
use strict;
use warnings;

browseDir('/tmp/test');

sub browseDir {
    my $path = shift;
    opendir(my $dh, $path) or die $!;
    my @files = readdir($dh);
    closedir($dh);
    for (@files) {
        next if /^\.{1,2}\z/;
        if (-d "$path/$_") {
            browseDir("$path/$_");
        }
        print "$path/$_\n";
    }
}
Finally, why don't you use a module like File::Find::Rule?
use File::Find::Rule qw( );
print "$_\n" for File::Find::Rule->in('/tmp');
Note: Before 5.12, while (readdir($dh)) would have to be written while (defined($_ = readdir($dh))).