How do I use File::Find::Rule in taint mode? - perl

I am trying to get a list of subdirectories in a given directory using something like the following:
#!/usr/bin/perl -wT
use strict;
use warnings;
use File::Find::Rule;
use Data::Dumper;
my @subdirs = File::Find::Rule->maxdepth(1)->directory->relative->in('mydir');
print Dumper(@subdirs);
However, running this gives the result:
Insecure dependency in chdir while running with -T switch
I understand that File::Find has options for dealing with taint mode, but I can’t seem to find an equivalent in File::Find::Rule. Is it possible to do the above? Should I use an alternative method for listing subdirectories? Am I completely misunderstanding something obvious that I really should understand about taint mode?

(Edit!) Okay, logic would suggest that throwing in the following would work:
->extras( {untaint => 1, untaint_pattern => $untaint_pattern, untaint_skip => 1} )
This lets you use the taint-mode features of File::Find by passing arguments directly to that module's find() function. Incidentally, File::Find mentions that one should set $untaint_pattern by using the qr// operator. For example, the default value is
$untaint_pattern = qr|^([-+@\w./]+)$|
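Applied to the original example, the attempted call would look something like this (a sketch only, using File::Find's default pattern quoted above):
my $untaint_pattern = qr|^([-+@\w./]+)$|;   # File::Find's documented default
my @subdirs = File::Find::Rule->maxdepth(1)
                              ->directory
                              ->relative
                              ->extras({ untaint         => 1,
                                         untaint_pattern => $untaint_pattern,
                                         untaint_skip    => 1 })
                              ->in('mydir');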
However, this does not work! In fact, your issue is a known bug in File::Find::Rule. (For example, here are the CPAN and Debian bug reports.) If you would like a bugfix, then both of those bug reports have patches.
If you are in a restricted environment, one thing you can do is essentially implement the patch yourself in your code. For example, if you want to keep everything in one file, you can add the large code block below after use File::Find::Rule. Note that this is a very quick fix and may be suboptimal. If it doesn't work for you (e.g., because you have spaces in your filenames), change the pattern qr|^([-+@\w./]+)$| that is used.
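For instance (illustrative only, not part of the patch itself), a variant of the pattern that also admits spaces could be substituted wherever the default appears below:
# Hypothetical replacement: File::Find's default untaint pattern, widened to allow spaces.
# Adjust the character class to whatever your paths legitimately contain.
my $untaint_pattern = qr|^([-+@\w. /]+)$|;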
Note finally that if you want your code organization to be a bit better, you may want to dump this into a separate package, maybe called MyFileFindRuleFix or something, that you always use after File::Find::Rule itself.
package File::Find::Rule;
no warnings qw(redefine);

sub in {
    my $self = _force_object shift;

    my @found;
    my $fragment = $self->_compile( $self->{subs} );
    my @subs = @{ $self->{subs} };

    warn "relative mode handed multiple paths - that's a bit silly\n"
      if $self->{relative} && @_ > 1;

    my $topdir;
    my $code = 'sub {
        (my $path = $File::Find::name) =~ s#^(?:\./+)+##;
        $path = "." if ($path eq ""); # See Debian bug #329377
        my @args = ($_, $File::Find::dir, $path);

        my $maxdepth = $self->{maxdepth};
        my $mindepth = $self->{mindepth};
        my $relative = $self->{relative};

        # figure out the relative path and depth
        my $relpath = $File::Find::name;
        $relpath =~ s{^\Q$topdir\E/?}{};
        my $depth = scalar File::Spec->splitdir($relpath);
        #print "name: \'$File::Find::name\' ";
        #print "relpath: \'$relpath\' depth: $depth relative: $relative\n";

        defined $maxdepth && $depth >= $maxdepth
           and $File::Find::prune = 1;

        defined $mindepth && $depth < $mindepth
           and return;

        #print "Testing \'$_\'\n";

        my $discarded;
        return unless ' . $fragment . ';
        return if $discarded;
        if ($relative) {
            push @found, $relpath if $relpath ne "";
        }
        else {
            push @found, $path;
        }
    }';

    #use Data::Dumper;
    #print Dumper \@subs;
    #warn "Compiled sub: '$code'\n";

    my $sub = eval "$code" or die "compile error '$code' $@";
    my $cwd = getcwd;

    # Untaint it
    if ( $cwd =~ qr|^([-+@\w./]+)$| ) {
        $cwd = $1;
    }
    else {
        die "Couldn't untaint \$cwd: [$cwd]";
    }

    for my $path (@_) {
        # $topdir is used for relative and maxdepth
        $topdir = $path;

        # slice off the trailing slash if there is one (the
        # maxdepth/mindepth code is fussy)
        $topdir =~ s{/?$}{}
          unless $topdir eq '/';

        $self->_call_find( { %{ $self->{extras} }, wanted => $sub }, $path );
    }

    chdir $cwd;

    return @found;
}

use warnings;
package main;
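If you do prefer the separate-package organization suggested above, a minimal sketch (using the hypothetical name MyFileFindRuleFix from that note) could look like this:
# MyFileFindRuleFix.pm -- hypothetical wrapper module holding the patch above.
package MyFileFindRuleFix;

use strict;
use warnings;
use File::Find::Rule;   # load the real module before redefining in()

# Paste the block above here, from "package File::Find::Rule;" down to
# "use warnings;", dropping the trailing "package main;" line.

1;

# And in any script that needs it:
use File::Find::Rule;
use MyFileFindRuleFix;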

Related

Print files and subdirectories of given directory

I am trying to get all files and directories from a given directory, but I can't tell which type each entry is (file or directory). Nothing is being printed. What am I doing wrong and how do I solve it? Here is the code:
sub DoSearch {
    my $currNode = shift;
    my $currentDir = opendir (my $dirHandler, $currNode->rootDirectory) or die $!;
    while (my $node = readdir($dirHandler)) {
        if ($node eq '.' or $node eq '..') {
            next;
        }
        print "File: " . $node . "\n" if -f $node;
        print "Directory " . $node . "\n" if -d $node;
    }
    closedir($dirHandler);
}
readdir returns only the node name without any path information. The file test operators will look in the current working directory if no path is specified, and because the current directory isn't $currNode->rootDirectory they won't be found.
I suggest you use rel2abs from the File::Spec::Functions core module to combine the node name with the path. You could use string concatenation, but the library function takes care of corner cases such as whether the directory ends with a slash.
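As a quick illustration of that corner case (a standalone sketch with made-up paths):
use File::Spec::Functions 'rel2abs';

# rel2abs joins a name to a base directory and normalizes the result,
# so a trailing slash on the base doesn't produce a doubled separator.
print rel2abs('file.txt', '/tmp/dir'),  "\n";   # /tmp/dir/file.txt
print rel2abs('file.txt', '/tmp/dir/'), "\n";   # also /tmp/dir/file.txt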
It's also worth pointing out that Perl identifiers are most often written in snake_case, and people familiar with the language will thank you for not using capital letters. They should especially be avoided for the first character of an identifier, as names like that are reserved for globals such as package names.
I think your subroutine should look like this
use File::Spec::Functions 'rel2abs';

sub do_search {
    my ($curr_node) = @_;
    my $dir = $curr_node->rootDirectory;
    opendir my $dh, $dir or die qq{Unable to open directory "$dir": $!};
    while ( my $node = readdir $dh ) {
        next if $node eq '.' or $node eq '..';
        my $fullname = rel2abs($node, $dir);
        print "File: $node\n"      if -f $fullname;
        print "Directory $node\n"  if -d $fullname;
    }
}
An alternative method is to set the current working directory to the directory being read. That way there is no need to manipulate file paths, but you would need to save and restore the original working directory before and after changing it
The Cwd core module provides getcwd and your code would look like this
use Cwd 'getcwd';

sub do_search {
    my ($curr_node) = @_;
    my $cwd = getcwd;
    chdir $curr_node->rootDirectory or die $!;
    opendir my $dh, '.' or die $!;
    while ( my $node = readdir $dh ) {
        next if $node eq '.' or $node eq '..';
        print "File: $node\n"      if -f $node;
        print "Directory $node\n"  if -d $node;
    }
    chdir $cwd or die $!;
}
Use the File::Find module to get all files and subdirectories recursively.
use File::Find;

find(\&getFile, $dir);    # $dir holds the directory to search

my @fileList;

sub getFile {
    print $File::Find::name."\n";
    # The lines below would collect only the file names:
    #if ($File::Find::name =~ /.*\/(.*)/ && $1 =~ /\./){
    #    push @fileList, $File::Find::name."\n";
    #}
}
Already answered, but sometimes it is handy not to care about the implementation details, and you can use a CPAN module to hide them.
One of them is the wonderful Path::Tiny module.
Your code could look like this:
use 5.014;    # strict + feature 'say' + ...
use warnings;
use Path::Tiny;

do_search($_) for @ARGV;

sub do_search {
    my $curr_node = path(shift);
    for my $node ($curr_node->children) {
        say "Directory : $node"  if -d $node;
        say "Plain File : $node" if -f $node;
    }
}
The children method excludes the . and the .. automatically.
You also need to understand that the -f test is true only for regular files, so the above code skips things like FIFOs, sockets and other special files. Such "files" can often be opened and read just like plain files, so sometimes it is handy to use the -e && ! -d test (i.e. exists, but is not a directory) instead of -f.
Path::Tiny has methods for this too; e.g. you could write
for my $node ($curr_node->children) {
    print "Directory : $node\n" if $node->is_dir;
    print "File : $node\n"      if $node->is_file;
}
The is_file method is usually DWIM: it does the -e && ! -d test rather than -f.
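A tiny demonstration of the difference (a sketch that assumes a POSIX system, since it creates a FIFO to test against):
use strict;
use warnings;
use Path::Tiny;
use POSIX qw(mkfifo);

# Create a named pipe in a temporary directory and compare the two tests.
my $dir  = Path::Tiny->tempdir;
my $fifo = $dir->child('demo.fifo');
mkfifo("$fifo", 0700) or die "mkfifo failed: $!";

print "-f says:      ", ( -f "$fifo"     ? "file" : "not a file" ), "\n";  # not a file
print "is_file says: ", ( $fifo->is_file ? "file" : "not a file" ), "\n";  # "file", if is_file is -e && ! -d as described above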
Using Path::Tiny you can also easily extend your function to walk the whole tree using the iterator method:
use 5.014;
use warnings;
use Path::Tiny;

do_search($_) for @ARGV;

sub do_search {
    # maybe you need some error-checking here for the existence of the argument or the like...
    my $iterator = path(shift)->iterator({recurse => 1});
    while ( my $node = $iterator->() ) {
        say "Directory : ", $node->absolute if $node->is_dir;
        say "File : ",      $node->absolute if $node->is_file;
    }
}
The above prints the type of every file and directory recursively down from the given argument...
And so on... Path::Tiny is really worth having installed.

Unable to get absolute path of a file from $File::Find::name - perl

I am unable to get the absolute path of a file from $File::Find::name. It is showing an undef value as output. Not able to figure out why :( Can anyone please help me out with this?
The error displayed is: Use of uninitialized value $file_name in concatenation
My code:
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Copy;
use Cwd;
use Data::Printer;
my $rootPATH = $ARGV[0];
my $id = $ARGV[1];
my @Arraypath;
my $file_name;
our $anr_name;
opendir( my $DIR, $rootPATH );
while ( my $entry = readdir $DIR ) {
next unless -d $rootPATH . '/' . $entry;
next if $entry eq '.' or $entry eq '..';
#print "Found directory $entry\n";
push( @Arraypath, ( split( "\n", $entry ) ) );
}
closedir $DIR;
my $count = 0;
foreach my $line (@Arraypath) {
my $fulllogpath = $rootPATH . "\\" . $line;
#print "$fulllogpath\n";
$count++;
start($fulllogpath);
}
sub start {
my $fulllogpath = shift;
our @content;
#print "$fulllogpath\n\n";
find( \&wanted, $fulllogpath );
sub wanted {
push @content, $_;
return;
}
foreach my $file (@content) {
# print "$file\n\n";
if ( $file =~ /traces[_d]*/ ) {
print "$file\n\n";
$file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}
}
}
Your program is very poorly laid out. It will be much simpler to write and debug code if you indent it properly and use carefully-chosen identifiers: a name like start for a subroutine is useless.
You also have unnecessary subroutine declarations which break up the program flow and make it awkward to follow.
Why do you have a couple of package variables (declared with our)? There is generally no need for them, and it is best to use lexical variables throughout, declared at an appropriate place so that all code has access to them if it needs it.
It is also preferable to use File::Spec to work with file paths, rather than manipulate them using string operators, with which it is easy to make a mistake.
The best way to manage the results of find is to work with absolute paths all the way through. It looks like you want to do more than just print the results returned by find since you load modules like Cwd and File::Copy, but without knowing what that further purpose is I cannot help you to write it.
This code removes all the subroutines and makes everything much more concise.
use strict;
use warnings;
use autodie;

use File::Find 'find';
use File::Spec;
use Data::Printer;

my ($root_path, $id) = @ARGV;

opendir my ($dh), $root_path;
my @dir_list =
    grep -d,
    map File::Spec->catfile($root_path, $_),
    grep { not /\A\.\.?\z/ } readdir $dh;
closedir $dh;

my $count;

for my $dir (@dir_list) {
    ++$count;
    find(sub {
        return unless /traces[_d]*/;
        my $file = $_;
        print "$file\n\n";
        my $file_name = $File::Find::name;
        p $file_name;
        print "$file_name\n";
    }, $dir);
}
As has already been stated, $File::Find::name is valid only within the wanted function. Not outside of it.
However, I would recommend making the shift to Path::Class and Path::Class::Rule for simpler processing of your files in a cross-platform compatible way:
use strict;
use warnings;

use Data::Printer;
use Path::Class;
use Path::Class::Rule;

my ( $root_path, $id ) = @ARGV;

my $dir = dir($root_path);

my $next = Path::Class::Rule->new->file->name(qr{traces[_d]*})->iter(
    grep { $_->is_dir() } $dir->children
);

while ( my $file = $next->() ) {
    # Accomplishes the same as your script. I suspect these print statements
    # are mostly for debugging though.
    print $file->basename(), "\n\n";
    p "$file";
    print "$file\n";
}

Perl search for specific subdirectory then process

So, for the program I am writing, what I would like it to do is search through all of the subdirectories within a directory. If a subdirectory name contains a word, let's say "foo", then the program will open that subdirectory and perform a function on the files within it. Can anybody give me some help on how to go about this? It also needs to be recursive. Thanks in advance.
This can be done using the File::Find module, but I believe Path::Class is superior even though it isn't a core module and will likely need installing.
This program finds the files wanted and calls process to process them. At present the process subroutine simply prints the name of the file for testing.
use strict;
use warnings;

use Path::Class;

my $dir = dir '/path/to/root/directory';

$dir->recurse(callback => sub {
    my $node = shift;
    return if $node->is_dir;
    my $parent = $node->parent;
    if ($parent->basename =~ /foo/) {
        process($node);
    }
});

sub process {
    my $file = shift;
    print $file, "\n";
}
Update
If you prefer, this program performs the same task using File::Find.
use strict;
use warnings;

use File::Find;
use File::Basename qw/ basename /;

my $dir = '/path/to/root/directory';

find(sub {
    return unless -f;
    if (basename($File::Find::dir) =~ /foo/) {
        process($File::Find::name);
    }
}, $dir);

sub process {
    my $file = shift;
    print $file, "\n";
}
Update
As requested, here is a further solution using Path::Class::Rule for comparison. As daxim suggested the code is a little shorter.
use strict;
use warnings;

use Path::Class::Rule;

my $rule = Path::Class::Rule->new;
$rule->file->and(sub { $_->parent->basename =~ /foo/ });

my $next = $rule->iter('/path/to/root/directory');
while ( my $file = $next->() ) {
    process($file);
}

sub process {
    my $file = shift;
    print $file, "\n";
}

change the directory and grab the xml file to parse certain data in perl

I am trying to parse a specific XML file which is located in the subdirectories of one directory. For some reason I am getting an error saying the file does not exist. If the file does not exist, it should move on to the next subdirectory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;

my @xmlsearch = map { chomp; $_ } `ls`;

foreach my $directory (@xmlsearch) {
    print "$directory \n";
    chdir($directory) or die "Couldn't change to [$directory]: $!";
    my @findResults = `find -name education.xml`;
    foreach my $educationresults (@findResults){
        print $educationresults;
        my $parser = new XML::Simple;
        my $data = $parser->XMLin($educationresults);
        print Dumper($data);
        chdir('..');
    }
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml
Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
use strict;
use warnings;
use feature 'say';
use autodie;

use File::Find;
use XML::Simple;
use Data::Dumper;

sub findxml {
    my @found;
    opendir(DIR, '.');
    my @where = grep { -d && m#^[^.]+$# } readdir(DIR);
    closedir(DIR);
    File::Find::find({wanted => sub {
        push @found, $File::Find::name if m#^education\.xml$#s && -f _;
    } }, @where);
    return @found;
}

foreach my $xml (findxml()){
    say $xml;
    print Dumper XMLin($xml);
}
Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my @array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my @array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my @array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files one directory level down
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;

my @list;
find( sub { push @list, $File::Find::name if /^education\.xml$/i; }, ".");

for (@list) {
    # do stuff
    # @list contains full path names of education.xml files found in subdirs
    # e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");

How to rename directories recursively

I want to rename directories recursively using File::Find::Rule, e.g. remove extra spaces in each one found, but as I understand it the module doesn't do finddepth and renames only one. Is there a way to do that? Thanks.
use autodie;
use strict;
use warnings;
use File::Find::Rule;

my $dir = 'D:/Test';

my @fd = File::Find::Rule->directory
                         ->in( $dir );

for my $fd ( @fd ) {
    my $new = $fd;
    $new =~ s/\s\s+/ /g;
    print "$new\n";
    rename $fd, $new;
}
You want to process the deeper results first, so process the list in reverse. You can only rename the leaf part of the path; you'll get to the more shallow parts later.
use Path::Class qw( dir );

for ( reverse @fd ) {
    my $dir = dir($_);
    my $parent = $dir->parent;

    my $old_leaf = my $new_leaf = $dir->dir_list(-1);
    $new_leaf =~ s/\s+/ /g;

    if ($new_leaf ne $old_leaf) {
        my $old_file = $parent->dir($old_leaf);
        my $new_file = $parent->dir($new_leaf);

        # Prevent accidental deletion of files.
        if (-e $new_file) {
            warn("$new_file already exists\n");
            next;
        }

        rename($old_file, $new_file);
    }
}
Answer to original question:
I don't see how FFR comes into play.
rename 'Test1/Test2/Test3', 'Test1/Test2/Dir3';
rename 'Test1/Test2', 'Test1/Dir2';
rename 'Test1', 'Dir1';
For arbitrary paths,
use Path::Class qw( dir );

my @parts1 = dir('Test1/Test2/Test3')->dir_list();
my @parts2 = dir('Dir1/Dir2/Dir3'   )->dir_list();
die if @parts1 != @parts2;

for (reverse 0..$#parts1) {
    my $path1 = dir(@parts1[ 0..$_ ]);
    my $path2 = dir(@parts2[ 0..$_ ]);
    rename($path1, $path2);
}
Or maybe you want to rename all Test1 to Dir1, Test2 to Dir2, and Test3 to Dir3, in which case you should process the list in reverse order.
my %map = (
    'Test1' => 'Dir1',
    'Test2' => 'Dir2',
    'Test3' => 'Dir3',
);

my $pat = join '|', map quotemeta, keys %map;

for ( reverse @fd ) {
    my $o = $_;
    my $n = $_;
    $n =~ s{/\K($pat)\z}{$map{$1}};
    if ($n ne $o) {
        if (-e $n) {
            warn("$n already exists\n");
            next;
        }
        rename($o, $n);
    }
}
I have a module for doing actions recursively in a directory tree. It didn't have the ability to act on the directories themselves, though, so it took a little updating. I have uploaded version 0.03 of my File::chdir::WalkDir; it can be installed from its GitHub repo and is now also available via your favorite CPAN utility. This script will then remove extra spaces from directory names inside the base directory 'Test', relative to the working directory:
#!/usr/bin/env perl
use strict;
use warnings;

use File::chdir::WalkDir 0.030;
use File::Copy;

my $job = sub {
    my ($name, $in_dir) = @_;
    # ONLY act on directories
    return 0 unless (-d $name);
    my $new_name = $name;
    if ($new_name =~ s/\s+/ /g) {
        move($name, $new_name);
    }
};

walkdir( 'Test', $job, {'act_on_directories' => 1} );