Perl script to penetrate to sub directories - perl

I have a long list of sub directories such that C0/C1/C2...C354. It individually contains respective files. I am trying to change the ownership of the directories (not the files in the directories). This is what I have wrote:
#!/usr/bin/perl -w
use strict;
use File::Find;
my #directories;
find sub{
print "$File::Find::name";
print "\n";
return unless -d;
next if (m/^\./);
push #directories, $File::Find::name;
}, ".";
foreach my $file (#directories){
my $cmd = qx |chown deep:deep $file|;
}
It goes uptil C0/C1 and stops penetrating into the other files. Is there a problem with my linux file-system or there is a limitation to the File::Find module in Perl. Please help. Thank you.

There is no loop inside the function given to File::Find::find but you still use "next" inside it. This will cause it to look for a loop outside the function and it will probably find it somewhere inside File::Find, thus causing erratic behavior.
You probably want to have a "return" instead of a "next".
And there should have been a (runtime) warning about it.

Related

Perl glob returning a false positive

What seemed liked a straightforward piece of code most certainly didn't do what I wanted it to do.
Can somebody explain to me what it does do and why?
my $dir = './some/directory';
if ( -d $dir && <$dir/*> ) {
print "Dir exists and has non-hidden files in it\n";
}
else {
print "Dir either does not exist or has no non-hidden files in it\n";
}
In my test case, the directory did exist and it was empty. However, the then (first) section of the if triggered instead of the else section as expected.
I don't need anybody to suggest how to accomplish what I want to accomplish. I just want to understand Perl's interpretation of this code, which definitely does not match mine.
Using glob (aka <filepattern>) in a scalar context makes it an iterator; it will return one file at a time each time it is called, and will not respond to changes in the pattern (e.g. a different $dir) until it has finished iterating over the initial results; I suspect this is causing the trouble you see.
The easy answer is to always use it in list context, like so:
if( -d $dir && ( () = <$dir/*> ) ) {
glob may only really be used safely in scalar context in code you will execute more than once if you are absolutely sure you will exhaust the iterator before you try to start a new iteration. Most of the time it's just easier to avoid glob in scalar context altogether.
I believe that #ysth is on the right track, but repeated calls to glob in scalar context don't generate false positives.
For example
use strict;
use warnings;
use 5.010;
say scalar glob('/usr/*'), "\n";
say scalar glob('/usr/*'), "\n";
output
/usr/bin
/usr/bin
But what is true is that any single call to glob maintains a state, so if I have
use strict;
use warnings;
use 5.010;
for my $dir ( '/sys', '/usr', '/sys', '/usr' ) {
say scalar glob("$dir/*"), "\n";
}
output
/sys/block
/sys/bus
/sys/class
/sys/dev
So clearly that glob statement inside the loop is maintaining a state, and ignoring the changes to $dir.
This is similar to the way that the pos (and corresponding \G regex anchor) has a state per scalar variable, and how print without a specific file handle prints to the last selected handle. In the end it is how all of Perl works, with the it variable $_ being the ultimate example.

simple way to test if a given filename is underneath a directory?

Is there a Perl module (preferably core) that has a function that will tell me if a given filename is inside a directory (or a subdirectory of the directory, recursively)?
For example:
my $f = "/foo/bar/baz";
# prints 1
print is_inside_of($f, "/foo");
# prints 0
print is_inside_of($f, "/foo/asdf");
I could write my own, but there are some complicating factors such as symlinks, relative paths, whether it's OK to examine the filesystem or not, etc. I'd rather not reinvent the wheel.
Path::Tiny is not in core, but it has no non-core dependencies, so is a very quick and easy installation.
use Path::Tiny qw(path);
path("/usr/bin")->subsumes("/usr/bin/perl"); # true
Now, it does this entirely by looking at the file paths (after canonicalizing them), so it may or may not be adequate depending on what sort of behaviour you're expecting in edge cases like symlinks. But for most purposes it should be sufficient. (If you want to take into account hard links, the only way is to search through the entire directory structure and compare inode numbers.)
If you want the function to work for only a filename (without a path) and a path, you can use File::Find:
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
sub is_inside_of {
my ($file, $path) = #_;
my $found;
find( sub { $found = 1 if $_ eq $file }, $path);
return $found
}
If you don't want to check the filesystem, but only process the path, see File::Spec for some functions that can help you. If you want to process symlinks, though, you can't avoid touching the file system.

Recursive directory traversal in perl

I intend to recursively traverse a directory containing this piece of perl script.
The idea is to traverse all directories whose parent directory contains the perl script and list all files path into a single array variable. Then return the list.
Here comes the error msg:
readdir() attempted on invalid dirhandle $DIR at xxx
closedir() attempted on invalid dirhandle $DIR at xxx
Code is attached for reference, Thank you in advance.
use strict;
use warnings;
use Cwd;
our #childfile = ();
sub recursive_dir{
my $current_dir = $_[0]; # a parameter
opendir(my $DIR,$current_dir) or die "Fail to open current directory,error: $!";
while(my $contents = readdir($DIR)){
next if ($contents =~ m/^\./); # filter out "." and ".."
#if-else clause separate dirs from files
if(-d "$contents"){
#print getcwd;
#print $contents;
#closedir($DIR);
recursive_dir(getcwd."/$contents");
print getcwd."/$contents";
}
else{
if($contents =~ /(?<!\.pl)$/){
push(#childfile,$contents);
}
}
}
closedir($DIR);
#print #childfile;
return #childfile;
}
recursive_dir(getcwd);
Please tell us if this is homework? You are welcome to ask for help with assignments, but it changes the sort of answer you should be given.
You are relying on getcwd to give you the current directory that you are processing, yet you never change the current working directory so your program will loop endlessly and eventually run out of memory. You should simply use $current_dir instead.
I don't believe that those error messages can be produced by the program you show. Your code checks whether opendir has succeeded and the program dies unless $DIR is valid, so the subsequent readdir and closedir must be using a valid handle.
Some other points:
Comments like # a parameter are ridiculous and only serve to clutter your code
Upper-case letters are generally reserved for global identifiers like package names. And $dir is a poor name for a directory handle, as it could also mean the directory name or the directory path. Use $dir_handle or $dh
It is crazy to use a negative look-behind just to check that a file name doesn't end with .pl. Just use push #childfile, $contents unless $contents =~ /\.pl$/
You never use the return value from your subroutine, so it is wasteful of memory to return what could be an enormous array from every call. #childfile is accessible throughout the program so you can just access it directly from anywhere
Don't put scalar variables inside double quotes. It simply forces the value to a string, which is probably unnecessary and may cause arcane bugs. Use just -d $contents
You probably want to ignore symbolic links, as otherwise you could be looping endlessly. You should change else { ... } to elsif (-f $contents) { ... }

Finding files with Perl

File::Find and the wanted subroutine
This question is much simpler than the original title ("prototypes and forward declaration of subroutines"!) lets on. I'm hoping the answer, however simple, will help me understand subroutines/functions, prototypes and scoping and the File::Find module.
With Perl, subroutines can appear pretty much anywhere and you normally don't need to make forward declarations (except if the sub declares a prototype, which I'm not sure how to do in a "standard" way in Perl). For what I usually do with Perl there's little difference between these different ways of running somefunction:
sub somefunction; # Forward declares the function
&somefunction;
somefunction();
somefunction; # Bare word warning under `strict subs`
I often use find2perl to generate code which I crib/hack into parts of scripts. This could well be bad style and now my dirty laundry is public, but so be it :-) For File::Find the wanted function is a required subroutine - find2perl creates it and adds sub wanted; to the resulting script it creates. Sometimes, when I edit the script I'll remove the "sub" from sub wanted and it ends up as &wanted; or wanted();. But without the sub wanted; forward declaration form I get this warning:
Use of uninitialized value $_ in lstat at findscript.pl line 29
My question is: why does this happen and is it a real problem? It is "just a warning", but I want to understand it better.
The documentation and code say $_ is localized inside of sub wanted {}. Why would it be undefined if I use wanted(); instead of sub wanted;?
Is wanted using prototypes somewhere? Am I missing something obvious in Find/File.pm?
Is it because wanted returns a code reference? (???)
My guess is that the forward declaration form "initializes" wanted in some way so that the first use doesn't have an empty default variable. I guess this would be how prototypes - even Perl prototypes, such as they exist - would work as well. I tried grepping through the Perl source code to get a sense of what sub is doing when a function is called using sub function instead of function(), but that may be beyond me at this point.
Any help deepening (and speeding up) my understanding of this is much appreciated.
EDIT: Here's a recent example script here on Stack Overflow that I created using find2perl's output. If you remove the sub from sub wanted; you should get the same error.
EDIT: As I noted in a comment below (but I'll flag it here too): for several months I've been using Path::Iterator::Rule instead of File::Find. It requires perl >5.10, but I never have to deploy production code at sites with odd, "never upgrade", 5.8.* only policies so Path::Iterator::Rule has become one of those modules I never want to do with out. Also useful is Path::Class. Cheers.
I'm not a big fan of File::Find. It just doesn't work right. The find command doesn't return a list of files, so you either have to use a non-local array variable in your find to capture your list of files you've found (not good), or place your entire program in your wanted subroutine (even worse). Plus, the separate subroutine means that your logic is separate from your find command. It's just ugly.
What I do is inline my wanted subroutine inside my find command. Subroutine stays with the find. Plus, my non-local array variable is now just part of my find command and doesn't look so bad
Here's how I handle the File::Find -- assuming I want files that have a .pl suffix:
my #file_list;
find ( sub {
return unless -f; #Must be a file
return unless /\.pl$/; #Must end with `.pl` suffix
push #file_list, $File::Find::name;
}, $directory );
# At this point, #file_list contains all of the files I found.
This is exactly the same as:
my #file_list;
find ( \&wanted, $directory );
sub wanted {
return unless -f;
return unless /\.pl$/;
push #file_list, $File::Find::name;
}
# At this point, #file_list contains all of the files I found.
In lining just looks nicer. And, it keep my code together. Plus, my non-local array variable doesn't look so freaky.
I also like taking advantage of the shorter syntax in this particular way. Normally, I don't like using the inferred $_, but in this case, it makes the code much easier to read. My original Wanted is the same as this:
sub wanted {
my $file_name = $_;
if ( -f $file_name and $file_name =~ /\.pl$/ ) {
push #file_list, $File::Find::name;
}
}
File::Find isn't that tricky to use. You just have to remember:
When you find a file you don't want, you use return to go to the next file.
$_ contains the file name without the directory, and you can use that for testing the file.
The file's full name is $File::Find::name.
The file's directory is $File::Find::dir.
And, the easiest way is to push the files you want into an array, and then use that array later in your program.
Removing the sub from sub wanted; just makes it a call to the wanted function, not a forward declaration.
However, the wanted function hasn't been designed to be called directly from your code - it's been designed to be called by File::Find. File::Find does useful stuff like populating$_ before calling it.
There's no need to forward-declare wanted here, but if you want to remove the forward declaration, remove the whole sub wanted; line - not just the word sub.
Instead of File::Find, I would recommend using the find_wanted function from File::Find::Wanted.
find_wanted takes two arguments:
a subroutine that returns true for any filename that you would want.
a list of the files you are searching for.
find_wanted returns an array containing the list of filenames that it found.
I used code like the following to find all the JPEG files in certain directories on a computer:
my #files = find_wanted( sub { -f && /\.jpg$/i }, #dirs );
Explanation of some of the syntax, for those that might need it:
sub {...} is an anonymous subroutine, where ... is replaced with the code of the subroutine.
-f checks that a filename refers to a "plain file"
&& is boolean and
/\.jpg$/i is a regular expression that checks that a filename ends in .jpg (case insensitively).
#dirs is an array containing the directory names to be searched. A single directory could be searched as well, in which case a scalar works too (e.g. $dir).
Why not use open and invoke the shell find? The user can edit $findcommand (below) to be anything they want, or can define it in real time based on arguments and options passed to a script.
#!/usr/bin/perl
use strict; use warnings;
my $findcommand='find . -type f -mtime 0';
open(FILELIST,"$findcommand |")||die("can't open $findcommand |");
my #filelist=<FILELIST>;
close FILELIST;
my $Nfilelist = scalar(#filelist);
print "Number of files is $Nfilelist \n";

Perl Subdirectory Traversal

I am writing a script that goes through our large directory of Perl Scripts and checks for certain things. Right now, it takes two kinds of input: the main directory, or a single file. If a single file is provided, it runs the main function on that file. If the main directory is provided, it runs the main function on every single .pm and .pl inside that directory (due to the recursive nature of the directory traversal).
How can I write it (or what package may be helpful)- so that I can also enter one of the seven SUBdirectories, and it will traverse ONLY that subdirectory (instead of the entire thing)?
I can't really see the difference in processing between the two directory arguments. Surely, using File::Find will just do the right thing in both instances.
Something like this...
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $input = shift;
if (-f $input) {
# Handle a single file
handle_a_file($input);
} else {
# Handler a directory
handle_a_directory($input);
}
sub handle_a_file {
my $file = shift;
# Do whatever you need to with a single file
}
sub handle_a_directory {
my $dir = shift;
find(\&do this, $dir);
}
sub do_this {
return unless -f;
return unless /\.p[ml]$/;
handle_a_file($File::Find::name);
}
One convenient way would be to use the excellent Path::Class module, more precisely: the traverse() method of Path::Class::Dir. You'd control what to process from within the callback function which is supplied as the first argument to traverse(). The manpages has sample snippets.
Using the built-ins like opendir is perfectly fine, of course.
I've just turned to using Path::Class almost everywhere, though, as it has so many nice convenience methods and simply feels right. Be sure to read the docs for Path::Class::File to know what's available. Really does the job 99% of the time.
If you know exactly what directory and subdirectories you want to look at you can use glob("$dir/*/*/*.txt") for example to get ever .txt file in 3rd level of the given $dir