Renaming items in a File::Find folder traversal - perl

I am supposed to traverse through a whole tree of folders and rename everything (including folders) to lower case. I looked around quite a bit and saw that the best way was to use File::Find. I tested this code:
#!/usr/bin/perl -w
use File::Find;
use strict;
print "Folder: ";
chomp(my $dir = <STDIN>);
find(\&lowerCase, $dir);
sub lowerCase {
    print $_, " = ", lc($_), "\n";
    rename $_, lc($_);
}
and it seems to work fine. But can anyone tell me if I might run into trouble with this code? I remember posts on how I might run into trouble because of renaming folders before files or something like that.

If you are on Windows, as the comments stated, then no, renaming files or folders in any order won't be a problem, because the path DIR1/file1 is the same as dir1/file1 to Windows.
It MAY be a problem on Unix though, in which case you are better off doing a recursive traversal by hand.
Also, when doing system calls like rename, ALWAYS check result:
rename($from, $to) || die "Error renaming $from to $to: $!";
As noted in comments, take care when renaming "ABC" to "abc". On Windows this is not a problem.
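A hedged sketch of the depth-first approach using core File::Find's finddepth (the sub name and the clash handling are illustrative, not the asker's code): finddepth visits a directory's contents before the directory itself, so renaming a folder can never invalidate paths that remain to be visited.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

sub lowercase_tree {
    my ($dir) = @_;
    finddepth(sub {
        return if $_ eq '.' or $_ eq '..';
        my $lower = lc $_;
        return if $lower eq $_;              # already lower case
        if (-e $lower) {
            # On a case-insensitive filesystem -e lc($_) sees the file
            # itself; only treat it as a clash if it is a different file.
            my @old = stat $_;
            my @new = stat $lower;
            unless ($old[0] == $new[0] && $old[1] == $new[1]) {
                warn "skipping $File::Find::name: $lower already exists\n";
                return;
            }
        }
        rename $_, $lower
            or warn "can't rename $File::Find::name: $!\n";
    }, $dir);
}

lowercase_tree($ARGV[0]) if @ARGV;   # e.g. perl rename_lc.pl /some/dir
</imports>

```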

Personally, I prefer to:
List files to be renamed using find dir/ > 2b_renamed
Review the list manually, using an editor of choice (vim 2b_renamed, in my case)
Use the rename from CPAN on that list: xargs rename 'y/A-Z/a-z/' < 2b_renamed
That manual review is very important to me, even when I can easily rollback changes (via git or even Time Machine).
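The xargs step can also be done in plain Perl; a minimal sketch (sub and list-file names are illustrative) that reads the reviewed list and lowercases each entry. Note that, as with the xargs version, entries deeper in the tree should come before their parent directories, or renaming a folder will break the paths below it.

```perl
use strict;
use warnings;

# Read the manually reviewed list, one path per line, and lowercase each.
sub rename_from_list {
    my ($listfile) = @_;
    open my $list, '<', $listfile or die "Can't open $listfile: $!";
    while (my $from = <$list>) {
        chomp $from;
        my $to = lc $from;
        next if $to eq $from;                 # nothing to do
        rename $from, $to or warn "Can't rename $from to $to: $!";
    }
    close $list;
}

rename_from_list($ARGV[0]) if @ARGV;   # e.g. perl do_renames.pl 2b_renamed
```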

Related

How can i delete directories including their contents using perl?

Is it possible to delete a few directories, including their contents, at once using Perl?
E.g. I have the folders below:
test1
test2
test3
test4
Can I delete them like rm -rf test*? I tried
system('rm', '-fr', "$path/test*");
but it's not expanding the *.
I must first say that this is inherently dangerous, recursively removing directories out of a program.
Having said that, one module that provides the capability is the core File::Path
use File::Path qw(remove_tree);
...
remove_tree( @dirs_to_delete, { safe => 1 } );
This will croak on errors, which in my opinion is a pretty good idea if recursive removal runs into trouble. But if that doesn't fit your design, you can use the error option, and then it's up to you to check for errors. See "Error Handling" in the File::Path docs, which also describe a few other useful options.
This removes files as well. So be careful in composing the list of things to blow away, for example
use File::Glob ':bsd_glob';
my @dirs_to_delete = glob "$path/test*/"; # note the trailing /
where the trailing / makes it so that only directories are returned. With the File::Glob use statement we get the glob from :bsd_glob, which works with filenames that have spaces.†
It may also be safer to first get "closer" to what you want to remove
use Cwd qw(cwd);
my $orig_cwd = cwd();
chdir $path or die "Can't chdir to $path: $!";
my @dirs_to_delete = glob "test*/";
# chdir $orig_cwd;
(there are options other than Cwd for chdir-ing and getting back to the original cwd)
This of course does not (at all) make it error-proof.
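Putting the pieces above together, a minimal sketch (sub name and the test* pattern are illustrative) that collects only the matching directories and checks for problems via the error option instead of croaking:

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);
use File::Glob ':bsd_glob';    # glob that copes with spaces in names

sub remove_matching_dirs {
    my ($path) = @_;
    # Trailing / restricts the glob to directories only
    my @dirs = glob "\Q$path\E/test*/";
    remove_tree(@dirs, { safe => 1, error => \my $err });
    if ($err && @$err) {
        for my $diag (@$err) {
            my ($file, $message) = %$diag;
            warn "Problem removing $file: $message\n";
        }
    }
    return @dirs;
}
```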
Please excuse repeated warnings but posting on recursive removal makes me a little nervous.
† Or double those (double) quotes, glob qq{"$path/test*/"}, or use quotemeta's \Q...\E form, glob "\Q$path\E/test*/", which however escapes all non-word characters in $path.

About searching recursively in Perl

I have a Perl script that I, well, mostly pieced together from questions on this site. I've read the documentation on some parts to better understand it. Anyway, here it is:
#!/usr/bin/perl
use File::Find;
my $dir = '/home/jdoe';
my $string = "hard-coded pattern to match";
find(\&printFile, $dir);
sub printFile
{
my $element = $_;
if(-f $element && $element =~ /\.txt$/)
{
open my $in, "<", $element or die $!;
while(<$in>)
{
if (/\Q$string\E/)
{
print "$File::Find::name\n";
last; # stops looking after match is found
}
}
}
}
This is a simple script that, similar to grep, will look down recursively through directories for a matching string. It will then print the location of the file that contains the string. It works, but only if the file is located in my home directory. If I change the hard-coded search to look in a different directory (that I have permissions in), for example /admin/programs, the script no longer seems to do anything: no output is displayed, even when I know it should be matching at least one file (tested by making a file in /admin/programs with the hard-coded pattern). Why am I experiencing this behavior?
Also, might as well disclaim that this isn't a really useful script (heck, this would be so easy with grep or awk!), but understanding how to do this in Perl is important to me right now. Thanks
EDIT: Found the problem. A simple oversight in that the files in the directory I was looking for did not have .txt as extension. Thanks for helping me find that.
I was able to get the desired output from the code you pasted by making a few changes, like:
use strict;
use warnings;
You should always use them as they notify of various errors in your code which you may not get hold of.
Next I changed the line :
my $dir = './home/jdoe'; ##'./admin/programs'
The . signifies the current directory. Also, if you still face problems, try using the absolute path (from the root) instead of a relative path. Do let me know if this solves your problem.
This script works fine without any issue. One thing hidden from us in this script is the pattern. You can share the pattern and tell us what you expect it to match, so that we can validate it.
You could also run your program in debug mode i.e.,
perl -d your_program
That will take you into debug mode, where many options are available for inspecting the flow. Type 'n' at the debug prompt to step through the code; each 'n' prints the current execution point and its result.
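Pulling the suggestions together, here is a hedged rewrite of the script with strict/warnings enabled and the match logic in a reusable sub (names are illustrative; the .txt filter is kept, and adjusting it was the actual fix, per the asker's edit):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the paths of all files under $dir whose names match $ext_re
# and whose contents contain the literal $string.
sub find_matches {
    my ($dir, $string, $ext_re) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && $_ =~ $ext_re;
        open my $in, '<', $_
            or do { warn "$File::Find::name: $!\n"; return };
        while (<$in>) {
            if (/\Q$string\E/) {
                push @hits, $File::Find::name;
                last;                # stop after the first match
            }
        }
    }, $dir);
    return @hits;
}

if (@ARGV >= 2) {
    my ($dir, $string) = @ARGV;
    print "$_\n" for find_matches($dir, $string, qr/\.txt\z/);
}
```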

Perl recursively copy files of specific format [duplicate]

This question already has answers here:
How can I copy a directory recursively and filter filenames in Perl?
(5 answers)
Closed 8 years ago.
Note: The related question is solved by use of a deprecated module and is not consistent across OSes. The answer to the current question uses newer modules and hence is being posted here.
I have a module that in turn uses the File::NCopy CPAN module for recursively copying files from a source to a destination directory.
The problem is - I need to recursively copy only the files of specific file type to the destination. Is there any way to filter the source by extension?
As an alternative - is there a way to copy all files except the hidden files? My main problem being the .git folder also gets copied - which is not desired.
Platform: MacOS
Alternatives explored:
1) File::Copy::Recursive module :- seems to provide only recursive copy of files or directories. Does not seem to help with either hidden files or exclude filter
2) Using rsync -avz --exclude=".*" :- unable to combine this with recursive copy functionality.
3) Homegrown solution similar to How can I copy a directory recursively and filter filenames in Perl? :- Might be the last resort - but does not seem portable unless tweaked and tested across different platforms. Will be falling back to this unless a module already exists.
4) https://metacpan.org/pod/Path::Class::Dir :- Seems plausible - will be running a quick implementation using this.
Implemented Solution:
I used the File::Copy::Recursive module (for fcopy) together with Path::Class::Dir:
use Path::Class qw(dir file);
use File::Copy::Recursive qw(fcopy);

dir($sourceDir)->recurse(callback => sub {
    my $file = shift;
    return if $file eq '.' || $file eq '..';
    return if -d $file;
    if (<custom filter>)
    {
        my $path = file($file)->relative($sourceDir);
        fcopy("$sourceDir/$path", "$destinationDir/$path")
            or die "Could not perform fcopy: $!";
    }
});
Without the relative path - the destination folder structure does not seem to be the same as the source folder structure.
I think the easiest solution is to use File::Copy::Recursive to copy the directory structure fully, and then to go back with File::Find::Rule to determine all the dirs that you want to filter and then remove them.
Given that .git folders don't necessarily hold that much data, I think the performance hit from copying more files than you need to is likely to be pretty small. The following would be sufficient to accomplish what you desire:
use strict;
use warnings;
use File::Copy::Recursive qw(dircopy pathrmdir);
use File::Find::Rule;
my $src = '...src...';
my $dest = '...dest...';
dircopy($src, $dest) or die "Can't dircopy: $!";
my @git = File::Find::Rule->directory()
                          ->name('.git')
                          ->in($dest);
pathrmdir($_) or die "Can't remove $_: $!" for @git;
Alternatively, if you'd like to roll your own, you might take a look at the "Further Examples" section of File::Find::Rule, which includes an example of how to "ignore CVS directories".
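If you would rather not copy .git at all, a core-modules-only sketch of the "homegrown" route (option 3 in the question) is not much code: it prunes hidden entries during the walk and recreates the directory layout as it copies. Sub and variable names here are illustrative.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Find;
use File::Path qw(make_path);
use File::Spec;

# Copy every non-hidden plain file under $src to $dest, skipping
# dot-directories such as .git and preserving the directory structure.
sub copy_visible {
    my ($src, $dest) = @_;
    find({
        preprocess => sub { grep { !/^\./ } @_ },   # drop hidden entries
        wanted     => sub {
            return unless -f $_;
            my $rel = File::Spec->abs2rel($File::Find::name, $src);
            my $to  = File::Spec->catfile($dest, $rel);
            make_path(dirname($to));
            copy($File::Find::name, $to)
                or die "copy $File::Find::name -> $to: $!";
        },
    }, $src);
}
```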

How can I get the last changed directory in Perl?

Apache version 2.2.11 (Unix)
Architecture x86_64
Operating system Linux
Kernel version 2.6.18-164.el5
Ok, here is what I have working. However, I may not be using File::Util for anything else in the rest of the script.
My directory names are 8 digits, starting at 10000000.
I was comparing the highest found number with stat's last-created time as a double check, but that's overkill, I believe.
Another issue is that I did not know how to put a regex into the list_dir command so that only 8-digit names (e.g. m!^([0-9]{8})\z!x) would remain in the list. Reading the man page, the example reads '--pattern=\.txt$', but my futile attempt '--pattern=m!^([0-9]{8})\z!x)' was just that: futile.
So, would there be a "better" way to grab the latest folder/directory?
use File::Util;
my($f) = File::Util->new();
my(@dirs) = $f->list_dir('/home/accountname/public_html/topdir','--no-fsdots');
my @last = (sort { $b <=> $a } @dirs);
my $new = ($last[0]+1);
print "Content-type: text/html\n\n";
print "I will now create dir $new\n";
And: how would I ignore anything not matching my regex?
I was thinking an answer may reside in ls -d as well, but as a beginner here I am new to system calls from a script (if in fact that's what that would be).
More specifically:
Best way to open a directory, return the name of the latest 8 digit directory in that directory ignoring all else. Increase the 8 digit dir name by 1 and create the new directory.
Whichever is most efficient: stat or actual 8 digit file name. (directory names are going to be 8 digits either way.) Better to use File::Util or just built in Perl calls?
What are you doing? It sounds really weird and fraught with danger. I certainly wouldn't want to let a CGI script create new directories. There might be a better solution for what you are trying to achieve.
How many directories do you expect to have? The more entries you have in any directory, the slower things are going to get. You should work out a scheme where you can hash things into a directory structure that spreads out the files so no directory holds that many items. Say, if you have the name '0123456789', you create a directory structure like:
0/01/0123456789
You can have as many directory levels as you like. See the directory structure of CPAN, for instance. My author name is BDFOY, so my author directory is authors/id/B/BD/BDFOY. That way there isn't any directory that has a large number of entries (unless your author id is ADAMK or RJBS).
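That fan-out scheme is only a couple of lines of Perl; a sketch (function name illustrative) that derives the nested path from a name's leading characters, exactly as in the CPAN authors/id layout:

```perl
use strict;
use warnings;

# Build a fan-out path from the first one and two characters of a name,
# e.g. '0123456789' -> '0/01/0123456789', 'BDFOY' -> 'B/BD/BDFOY'.
sub hashed_path {
    my ($name) = @_;
    return join '/', substr($name, 0, 1), substr($name, 0, 2), $name;
}
```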
You also have a potential contention issue to work out. Between the time you discover the latest and the time you try to make the next one, you might already create the directory.
As for the task at hand, I think I'd punt to system for this one if you are going to have a million directories. With something like:
ls -t -d -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -1
I don't think you'll be able to get any faster than ls for this task. If there are a large number of directories, the cost of the fork should be outweighed by the work you have to do to go through everything yourself.
I suspect, however, that what you really need is some sort of database.
Best way to open a directory, return the name of the latest 8 digit directory in that directory ignoring all else. Increase the 8 digit dir name by 1 and create the new directory. Whichever is most efficient: stat or actual 8 digit file name?
First, I should point out that having about 100,000,000 subdirectories in a directory is likely to be very inefficient.
How do you get only the directory names that consist of eight digits?
use File::Slurp;
my @dirs = grep { -d and /\A[0-9]{8}\z/ } read_dir $top;
How do you get the largest?
use List::Util qw( max );
my $latest = max @dirs;
Now, the problem is, between the determination of $latest and the attempt to create the directory, some other process can create the same directory. So, I would use $latest as the starting point and keep trying to create the next directory until I succeed or run out of numbers.
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use File::Spec::Functions qw( catfile );
use List::Util qw( max );

sub make_numbered_dir {
    my $max = 100_000_000;
    my $top = '/home/accountname/public_html/topdir';
    my $latest = max grep { /\A[0-9]{8}\z/ } read_dir $top;
    while ( ++$latest < $max ) {
        mkdir catfile($top, sprintf '%8.8d', $latest)
            and return 1;
    }
    return;
}
If you try to do it the way I originally recommended, you will invoke mkdir way too many times.
As for how you use File::Util::list_dir to filter entries:
#!/usr/bin/perl
use strict;
use warnings;
use File::Util;

my $fu = File::Util->new;

print "$_\n" for $fu->list_dir('.',
    '--no-fsdots',
    '--pattern=\A[0-9]{8}\z'
);
C:\Temp> ks
10001010
12345678
However, I must point out that I did not much like this module in the few minutes I spent with it, especially the module author's obsession with invoking methods and functions in list context. I do not think I will be using it again.

What reasons are there to prefer glob over readdir (or vice-versa) in Perl?

This question is a spin-off from this one. Some history: when I first learned Perl, I pretty much always used glob rather than opendir + readdir because I found it easier. Then later various posts and readings suggested that glob was bad, and so now I pretty much always use readdir.
After thinking over this recent question I realized that my reasons for one or the other choice may be bunk. So, I'm going to lay out some pros and cons, and I'm hoping that more experienced Perl folks can chime in and clarify. The question in a nutshell is are there compelling reasons to prefer glob to readdir or readdir to glob (in some or all cases)?
glob pros:
No dotfiles (unless you ask for them)
Order of items is guaranteed
No need to prepend the directory name onto items manually
Better name (c'mon - glob versus readdir is no contest if we're judging by names alone)
(From ysth's answer; cf. glob cons 4 below) Can return non-existent filenames:
@deck = glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
glob cons:
Older versions are just plain broken (but 'older' means pre 5.6, I think, and frankly if you're using pre 5.6 Perl, you have bigger problems)
Calls stat each time (i.e., useless use of stat in most cases).
Problems with spaces in directory names (is this still true?)
(From brian's answer) Can return filenames that don't exist:
$ perl -le 'print glob "{ab}{cd}"'
readdir pros:
(From brian's answer) opendir returns a filehandle which you can pass around in your program (and reuse), but glob simply returns a list
(From brian's answer) readdir is a proper iterator and provides functions to rewinddir, seekdir, telldir
Faster? (Pure guess based on some of glob's features from above. I'm not really worried about this level of optimization anyhow, but it's a theoretical pro.)
Less prone to edge-case bugs than glob?
Reads everything (dotfiles too) by default (this is also a con)
May convince you not to name a file 0 (a con also - see Brad's answer)
Anyone? Bueller? Bueller?
readdir cons:
If you don't remember to prepend the directory name, you will get bit when you try to do filetests or copy items or edit items or...
If you don't remember to grep out the . and .. items, you will get bit when you count items, or try to walk recursively down the file tree or...
Did I mention prepending the directory name? (A sidenote, but my very first post to the Perl Beginners mail list was the classic, "Why does this code involving filetests not work some of the time?" problem related to this gotcha. Apparently, I'm still bitter.)
Items are returned in no particular order. This means you will often have to remember to sort them in some manner. (This could be a pro if it means more speed, and if it means that you actually think about how and if you need to sort items.) Edit: Horrifically small sample, but on a Mac readdir returns items in alphabetical order, case insensitive. On a Debian box and an OpenBSD server, the order is utterly random. I tested the Mac with Apple's built-in Perl (5.8.8) and my own compiled 5.10.1. The Debian box is 5.10.0, as is the OpenBSD machine. I wonder if this is a filesystem issue, rather than Perl?
Reads everything (dotfiles too) by default (this is also a pro)
Doesn't necessarily deal well with a file named 0 (see pros also - see Brad's answer)
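Cons 1 and 2 above can be made concrete in a few lines; a sketch of the usual safe pattern (sub name illustrative), which greps out the dot entries and prepends the directory name before any filetest:

```perl
use strict;
use warnings;

# readdir returns bare names, so filetests must be run against
# "$dir/$name", not $name alone (which is relative to the cwd).
sub list_plain_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't opendir $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return grep { -f "$dir/$_" } @entries;
}
```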
You missed the most important, biggest difference between them: glob gives you back a list, but opendir gives you a directory handle. You can pass that directory handle around to let other objects or subroutines use it. With the directory handle, the subroutine or object doesn't have to know anything about where it came from, who else is using it, and so on:
sub use_any_dir_handle {
    my( $dh ) = @_;
    rewinddir $dh;
    ...do some filtering...
    return \@files;
}
With the dirhandle, you have a controllable iterator where you can move around with seekdir, although with glob you just get the next item.
As with anything though, the costs and benefits only make sense when applied to a certain context. They do not exist outside of a particular use. You have an excellent list of their differences, but I wouldn't classify those differences without knowing what you were trying to do with them.
Some other things to remember:
You can implement your own glob with opendir, but not the other way around.
glob uses its own wildcard syntax, and that's all you get.
glob can return filenames that don't exist:
$ perl -le 'print glob "{ab}{cd}"'
glob pros: Can return 'filenames' that don't exist:
my @deck = List::Util::shuffle glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
while (my @hand = splice @deck, 0, 13) {
    say join ",", @hand;
}
__END__
6♥,8♠,7♠,Q♠,K♣,Q♦,A♣,3♦,6♦,5♥,10♣,Q♣,2♠
2♥,2♣,K♥,A♥,8♦,6♠,8♣,10♠,10♥,5♣,3♥,Q♥,K♦
5♠,5♦,J♣,J♥,J♦,9♠,2♦,8♥,9♣,4♥,10♦,6♣,3♠
3♣,A♦,K♠,4♦,7♣,4♣,A♠,4♠,7♥,J♠,9♥,7♦,9♦
glob makes it convenient to read all the subdirectories of a given fixed depth, as in glob "*/*/*". I've found this handy in several occasions.
Here is a disadvantage for opendir and readdir.
{
    open my $file, '>', 0;
    print {$file} 'Breaks while( readdir ){ ... }';
}

opendir my $dir, '.';

my $a = 0;
++$a for readdir $dir;
print $a, "\n";

rewinddir $dir;

my $b = 0;
++$b while readdir $dir;
print $b, "\n";
You would expect that code would print the same number twice, but it doesn't because there is a file with the name of 0. On my computer it prints 251, and 188, tested with Perl v5.10.0 and v5.10.1
This problem also makes it so that this just prints out a bunch of empty lines, regardless of the existence of file 0:
use 5.10.0;
opendir my $dir, '.';
say while readdir $dir;
Where as this always works just fine:
use 5.10.0;
my $a = 0;
++$a for glob '*';
say $a;
my $b = 0;
++$b while glob '*';
say $b;
say for glob '*';
say while glob '*';
I fixed these issues, and sent in a patch which made it into Perl v5.11.2, so this will work properly with Perl v5.12.0 when it comes out.
My fix converts this:
while( readdir $dir ){ ... }
into this:
while( defined( $_ = readdir $dir ) ){ ... }
Which makes it work the same way that read has worked on files. Actually it is the same bit of code, I just added another element to the corresponding if statements.
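On perls older than v5.12, which lack that implicit defined test, the workaround is simply to write it yourself; a sketch (sub name illustrative):

```perl
use strict;
use warnings;

# Count directory entries without mis-handling a file named "0":
# the explicit defined() is what perl v5.12+ now adds implicitly.
sub count_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't opendir $dir: $!";
    my $count = 0;
    while ( defined( my $entry = readdir $dh ) ) {
        $count++;
    }
    closedir $dh;
    return $count;
}
```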
Well, you pretty much cover it. All that taken into account, I would tend to use glob when I'm throwing together a quick one-off script and its behavior is just what I want, and use opendir and readdir in ongoing production code or libraries where I can take my time and clearer, cleaner code is helpful.
That was a pretty comprehensive list. readdir (and readdir + grep) has less overhead than glob and so that is a plus for readdir if you need to analyze lots and lots of directories.
For small, simple things, I prefer glob. Just the other day, I used it and a twenty-line Perl script to retag a large portion of my music library. glob, however, has a pretty strange name. Glob? It's not intuitive at all, as far as names go.
My biggest hangup with readdir is that it treats a directory in a way that's somewhat odd to most people. Usually, programmers don't think of a directory as a stream, they think of it as a resource, or list, which glob provides. The name is better, the functionality is better, but the interface still leaves something to be desired.
glob pros:
3) No need to prepend the directory name onto items manually
Exception:
say for glob "*";
--output:--
1perl.pl
2perl.pl
2perl.pl.bak
3perl.pl
3perl.pl.bak
4perl.pl
data.txt
data1.txt
data2.txt
data2.txt.out
As far as I can tell, the rule for glob is: you must provide a full path to the directory to get full paths back. The Perl docs do not seem to mention that, and neither do any of the posts here.
That means that glob can be used in place of readdir when you want just filenames (rather than full paths), and you don't want hidden files returned, i.e. ones starting with '.'. For example,
chdir ("../..");
say for glob("*");
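That rule, you get back the same form you put in, can be demonstrated directly; a sketch (sub name illustrative):

```perl
use strict;
use warnings;
use Cwd qw(cwd);

# glob echoes the form of its pattern: a bare wildcard yields bare
# names, while a full-path wildcard yields full paths.
sub glob_both_forms {
    my ($dir) = @_;
    my @full = glob "$dir/*";       # e.g. '/some/dir/data.txt'
    my $old  = cwd();
    chdir $dir or die "Can't chdir to $dir: $!";
    my @bare = glob '*';            # e.g. 'data.txt'
    chdir $old or die "Can't chdir back to $old: $!";
    return (\@full, \@bare);
}
```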
On a similar note, File::Slurp has a function called read_dir.
Since I use File::Slurp's other functions a lot in my scripts, read_dir has also become a habit.
It also has following options: err_mode, prefix, and keep_dot_dot.
First, do some reading: Chapter 9.6 of the Perl Cookbook outlines the point I want to get to nicely, just under the discussion heading.
Secondly, do a search for glob and dosglob in your Perl directory. While many different sources (ways to get the file list) can be used, the reason why I point you to dosglob is that if you happen to be on a Windows platform (and using the dosglob solution), it is actually using opendir/readdir/closedir. Other versions use built-in shell commands or precompiled OS specific executables.
If you know you are targetting a specific platform, you can use this information to your advantage. Just for reference I looked into this on Strawberry Perl Portable edition 5.12.2, so things may be slightly different on newer or original versions of Perl.