Recursive directory traversal in Perl

I'm trying to write a script that prints out the file structure starting at the folder the script is located in. The script works fine without the recursive call but with that call it prints the contents of the first folder and crashes with the following message: closedir() attempted on invalid dirhandle DIR at printFiles.pl line 24. The folders are printed and the execution reaches the last line but why isn't the recursive call done? And how should I solve this instead?
#!/usr/bin/perl -w
printDir(".");
sub printDir{
opendir(DIR, $_[0]);
local(@files);
local(@dirs);
(@files) = readdir(DIR);
foreach $file (@files) {
if (-f $file) {
print $file . "\n";
}
if (-d $file && $file ne "." && $file ne "..") {
push(@dirs, $file);
}
}
foreach $dir (@dirs) {
print "\n";
print $dir . "\n";
printDir($dir);
}
closedir(DIR);
}

You should always use strict; and use warnings; at the start of your Perl program, especially before you ask for help with it. That way Perl will show up a lot of straightforward errors that you may not notice otherwise.
The invalid dirhandle error is likely because DIR is a global directory handle and has already been closed by a previous execution of the subroutine. It is best to always use lexical handles for both files and directories, and to test the return value to make sure the open succeeded, like this
opendir my $dh, $_[0] or die "Failed to open $_[0]: $!";
One advantage of lexical file handles is that they are closed implicitly when they go out of scope, so there is no need for your closedir call at the end of the subroutine.
local isn't meant to be used like that. It doesn't suffice as a declaration, and you are creating a temporary copy of a global variable that everything can access. Best to use my instead, like this
my @dirs;
my @files = readdir $dh;
Also, the file names you are using from readdir have no path, and so your file tests will fail unless you either chdir to the directory being processed or append the directory path string to the file name before testing it.
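Putting those points together, a minimal sketch might look like the following. The name list_tree is hypothetical, and two choices here are assumptions that go beyond the advice above: an unreadable directory is skipped with a warning rather than aborting the run, and symlinked directories are not followed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for printDir: returns every path below $dir,
# depth-first, instead of printing as it goes.
sub list_tree {
    my ($dir) = @_;

    # Lexical handle, so each recursive call gets its own directory handle.
    opendir my $dh, $dir or do {
        warn "Failed to open $dir: $!\n";
        return;
    };
    # Drop "." and "..", which would otherwise recurse forever.
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @found;
    for my $entry (sort @entries) {
        my $path = "$dir/$entry";    # readdir returns bare names, so add the path
        push @found, $path;
        # Recurse into real directories only (assumption: skip symlinks).
        push @found, list_tree($path) if -d $path && !-l $path;
    }
    return @found;
}

print "$_\n" for list_tree('.');
```

Because the file tests run on "$dir/$entry" rather than on the bare name, the script no longer depends on the current working directory.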

Use the File::Find module. The way I usually do this is with the find2perl tool, which takes the same parameters as find and creates a suitable Perl script using File::Find. (find2perl shipped with Perl up to version 5.20; since 5.22 it is available separately on CPAN as App::find2perl.) Then I fine-tune the generated script to do what I want it to do. But it's also possible to use File::Find directly.

Why not use File::Find?
use strict; #ALWAYS!
use warnings; #ALWAYS!
use File::Find;
find(sub{print "$_\n";},".");

Related

Optimized way to print directory paths recursively without file comparison in perl

I have a directory which contains multiple levels of sub dirs. I want to print path for each and every directory.
Currently, I am using
use File::Find;
find(
{
wanted => \&findfiles,
}, $maindirectory);
sub findfiles
{
if (-d) {
push @arrayofdirs,$File::Find::dir;
}
}
But each subdirectory contains thousands of files at each level. The above code takes lot of time to provide the result as it compares each file for directory. Is there a way to get subdirectories path without comparing files to save time or any other optimized method?
Edit: This issue got partially resolved but a new issue came up because of this solution. I have listed it here: Multiple File search in varying level of directories in perl
If you are on a UNIX/Linux platform then you can try reading output of find $maindirectory -type d command into your program (see this answer for a safe way to do that.). This command prints the names of directories in $maindirectory. It is faster because a compiled C program (find) does all the hard work. The following script should print all directory paths found.
Sample script:
use strict;
use warnings;
my $maindirectory = '.';
open my $fh, '-|', 'find', $maindirectory, '-type', 'd' or die "Can't open pipe: $!";
while( my $dir = <$fh>) {
print $dir;
}
close $fh or warn "can't close pipe: $!";
Note that there is no point in calling find through perl and then just printing its output without any processing. You can just as well run find $maindirectory -type d in shell itself.

how to point this script to one folder for reading and another for writing

I can't seem to get this script to read from one directory and write to another. Both directories exist. I've commented out what I tried. Funny thing is, it runs fine when I place it in the directory with the files to process. Here's the code:
use strict;
use warnings "all";
my $tmp;
my $dir = ".";
#my $dir = "Ask/Parsed/Html4/";
opendir(DIR, $dir) or die "Cannot open directory: $dir!\n";
my @files = readdir(DIR);
closedir(DIR);
open my $out, ">>output.txt" or die "Cannot open output.txt!\n";
#open my $out, ">>Ask/Parsed/Html5/output.txt" or die "Cannot open output.txt!\n";
foreach my $file (@files)
{
if($file =~ /html$/)
{
open my $in, "<$file" or die "Cannot open $file!\n";
undef $tmp;
while(<$in>)
{
$tmp .= $_;
}
print $out ">$file\n";
print $out "$tmp\n";
#print $out "===============";
close $in;
}
}
close $out;
The directories you use -- . and Ask/Parsed/Html4/ -- are relative paths, which means they are relative to your current working directory, and so it makes a difference where in the file system you are currently located when you run the script.
In addition, the files you are opening -- output.txt and $file -- have no path information, so Perl will look in your current working directory to find them.
There are a few ways to solve this.
You could cd to the directory where your files are before running the script, and open the directory as . as you currently do
You could achieve the same effect by calling chdir from within the script, which will change the current working directory and make the program ignore your location when you ran it
Or you could add an absolute directory path to the beginning of the file names, preferably using catfile from File::Spec::Functions
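The third option might be sketched as follows, keeping opendir / readdir but building each path with catfile. The helper name html_files_in is hypothetical, and the two directory names are placeholders taken from the question; they are still relative here, so substitute absolute paths to make the script independent of where it is run.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);

# Hypothetical helper: full paths of the .html files directly inside $dir.
sub html_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @names = grep { /\.html\z/ } readdir $dh;
    closedir $dh;
    # readdir returns bare names, so join each one to its directory.
    return map { catfile($dir, $_) } sort @names;
}

# Placeholder locations: read from one directory, write to another.
my ($in_dir, $out_dir) = ('Ask/Parsed/Html4', 'Ask/Parsed/Html5');
if (-d $in_dir && -d $out_dir) {
    open my $out, '>>', catfile($out_dir, 'output.txt')
        or die "Cannot open output file: $!";
    print {$out} "$_\n" for html_files_in($in_dir);
    close $out or die "Cannot close output file: $!";
}
```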
However I would choose to use glob -- which works in the same way as command-line filename expansion -- in preference to opendir / readdir as the resulting strings include the path (if one was specified in the parameter) and there is no need to separately filter the .html files.
I would also choose to undefine the input record separator $/ to read the whole file, rather than reading it line-by-line and concatenating them all.
Finally, if you are running Perl 5.10.1 or later, which bundles autodie as a core module, it is simpler to use autodie than to check the success of every open, close, opendir, closedir and so on by hand.
Something like this
use strict;
use warnings 'all';
use 5.010;
use autodie;
my $dir = '/path/to/Ask/Parsed/Html4';
my @html = glob "$dir/*.html";
open my $out, '>>', "$dir/output.txt";
for my $file (@html) {
my $contents = do {
open my $in, '<', $file;
local $/;
<$in>;
};
print $out "> $file\n";
print $out "$contents\n";
print $out "===============";
}
close $out;
It is likely trying to access the files relative to wherever you are calling the script from. If your files are located relative to the location of the script, use the following example to build a full path:
use FindBin;
my $file = "$FindBin::Bin/Ask/Parsed/Html5/output.txt";
If your file is not relative to the script, provide the full path:
my $file = "/home/john.doe/Ask/Parsed/Html5/output.txt";
Note that readdir() returns only the file name. If you want to open the file, prepend the directory, e.g.
open my $in, "<", "$dir/$file" or die "Cannot open $file!\n";
Note that best practice says you should be using the three parameter version of open, otherwise

Rename all .txt files in a directory and then open that file in perl

I need some help with file manipulations and need some expert advice.
It looks like I am making a silly mistake somewhere but I can't catch it.
I have a directory that contains files with a .txt suffix, for example file1.txt, file2.txt, file3.txt.
I want to add a revision string, say rev0, to each of those files and then open the modified files. For instance rev0_file1.txt, rev0_file2.txt, rev0_file3.txt.
I can append rev0, but my program fails to open the files.
Here is the relevant portion of my code
my $dir = "path to my directory";
my @::tmp = ();
opendir(my $DIR, "$dir") or die "Can't open directory, $!";
@::list = readdir($DIR);
@::list2 = sort grep(/^.*\.txt$/, @::list);
foreach (@::list2) {
my $new_file = "$::REV0" . "_" . "$_";
print "new file is $new_file\n";
push(@::tmp, "$new_file\n");
}
closedir($DIR);
foreach my $cur_file (<@::tmp>) {
$cur_file = $_;
print "Current file name is $cur_file\n"; # This debug print shows nothing
open($fh, '<', "$cur_file") or die "Can't open the file\n"; # Fails to open file;
}
Your problem is here:
foreach my $cur_file(<@::tmp>) {
$cur_file = $_;
You are using the loop variable $cur_file, but you overwrite it with $_, which is not used at all in this loop. To fix this, just remove the second line.
Your biggest issue is the fact you are using $cur_file in your loop for the file name, but then reassign it with $_ even though $_ won't have a value at that point. Also, as Borodin pointed out, $::REV0 was never defined.
You can use the move command from File::Copy to move the files, and you can use File::Find to find the files you want to move:
use strict;
use warnings;
use feature qw(say);
use autodie;
use File::Copy; # Provides the move command
use File::Find; # Gives you a way to find the files you want
use constant {
DIRECTORY => '/path/to/directory',
PREFIX => 'rev0_',
};
my @files_to_rename;
find (
sub {
return unless /\.txt$/; # Only interested in ".txt" files (return, not next: this is a sub, not a loop)
push @files_to_rename, $File::Find::name;
}, DIRECTORY );
for my $file ( @files_to_rename ) {
(my $new_name = $file) =~ s{([^/]+)\z}{ PREFIX . $1 }e; # Prefix the file name, not the whole path
move $file, $new_name;
$file = $new_name; # Updates @files_to_rename with new name
open my $file_fh, "<", $new_name; # Open the file here?
...
close $file_fh;
}
for my $file ( @files_to_rename ) {
open my $file_fh, "<", $file; # Or, open the file here? ($file already holds the new name)
...
close $file_fh;
}
See how using Perl modules can make your task much easier? Perl comes with hundreds of pre-installed packages to handle zip files, tarballs, time, email, etc. You can find a list at the Perldoc page (make sure you select the version of Perl you're using!).
The $file = $new_name is actually changing the value of the file name right inside the @files_to_rename array, because the loop variable of a for loop is an alias for the current array element. It's a little Perl trick. This way, your array refers to the file even though it has been renamed.
You have two choices where to open the file for reading: You can rename all of your files first, and then loop through once again to open each one, or you can open them after you rename them. I've shown both places.
Don't use $:: at all. This is very bad form since it overrides use strict; -- that is if you're using use strict to begin with. The standard is not to use package variables (aka global variables) unless you have to. Instead, you should use lexically scoped variables (aka local variables) defined with my.
One of the advantages of a my variable is that I don't really need the close command: the variable falls out of scope with each iteration of the loop and disappears entirely once the loop is complete. When the variable that contains the file handle falls out of scope, the file handle is automatically closed.
Always include use strict;, use warnings at the top of EVERY script. And use autodie; anytime you're doing file or directory processing.
There is no reason why you should be prefixing your variables with :: so please simplify your code like the following:
use strict;
use warnings;
use autodie;
use File::Copy;
my $dir = "path to my directory";
chdir($dir); # Make easier by removing the need to prefix path information
foreach my $file (glob('*.txt')) {
my $newfile = 'rev0_'.$file;
copy($file, $newfile) or die "Can't copy $file -> $newfile: $!";
open my $fh, '<', $newfile;
# File processing
}
What you've attempted to store is the updated name of the file in @::tmp. The file hasn't been renamed, so it's little surprise that the code died because it couldn't find the renamed file.
Since it's just renaming, consider the following code:
use strict;
use warnings;
use File::Copy 'move';
for my $file ( glob( "file*.txt" ) ) {
move( $file, "rev0_$file" )
or die "Unable to rename '$file': $!";
}
From a command line/terminal, consider the rename utility if it is available:
$ rename file rev0_file file*.txt

Error while trying to rename files in script and cmd line

$dir = "/home/naveen/mp3tag/testfolder";
opendir(DMP3, $dir) || die("Cannot open directory");
@files= readdir(DMP3;
foreach $f (@files)
{
unless ( ($f eq ".") || ($f eq "..") )
{
$oldfile = $f;
$newfile = $f;
$newfile =~ s/ /_/g;
print "Old file: $oldfile \t";
print "New file: $newfile";
print "\n";
rename ("$oldfile", "$newfile") or warn "Couldn't rename $oldfile to $newfile !\n";
}
}
I'm writing a simple program to replace spaces with underscores in existing file names and rename the files. This is how far I've gotten with the code. However, it's not able to rename the file and gives me a warning, and I'm not sure where the mistake is.
Also when i tried the same line on the cmd line I get the following error msg.
$ rename Jacques\ Greene\ -\ Clark\ \(Original\ Mix\).mp3 JG - C.mp3
Bareword "mp3" not allowed while "strict subs" in use at (eval 1) line 1.
$ rename Jacques\ Greene\ -\ Clark\ \(Original\ Mix\) JG - C
Can't locate object method "Original" via package "Mix" (perhaps you forgot to load "Mix"?) at (eval 1) line 1.
You're trying to rename all the files in the directory, not just one file. The error could be a great many things; since you did not mention it, I can only guess.
rename is, as I recall, a bit wonky, and using move from File::Copy is a safer bet. Also, you might want to avoid renaming directories. Using a more intuitive interface would probably not be a bad idea either.
One of your biggest mistakes is not using use strict; use warnings;. The amount of trouble you bring on yourself by leaving these out should not be underestimated.
use strict;
use warnings;
use File::Copy qw(move);
for (@ARGV) {
my $org = $_;
tr/ /_/;
move($org, $_) or warn "Couldn't move $org to $_: $!";
}
Usage:
perl script.pl /home/naveen/mp3tag/testfolder/*.mp3
So, as long as you give a proper glob as argument, your script will only affect those files. You can add more checks to make it stricter.
If that command-line attempt of yours was meant to use the tool at /usr/bin/rename, I would hazard a guess that your error can be avoided simply by using quotes.
This working example might help
use strict;
use warnings;
use File::Copy;
my $dir = '/home/naveen/mp3tag/testfolder';
my @mp3s = glob ("$dir/*.mp3");
for my $mp3 (@mp3s) {
my $new_mp3 = $mp3;
$new_mp3 =~ s/\s/_/g;
move($mp3, $new_mp3);
}
You are calling rename in /usr/bin. If you want to call your program, choose a better name for it, or call it with full path specified.
But before you do, add at least the missing right bracket to readdir.

How can I create a path with all of its subdirectories in one shot in Perl?

If you have a path to a file (for example, /home/bob/test/foo.txt) where each subdirectory in the path may or may not exist, how can I create the file foo.txt in a way that uses "/home/bob/test/foo.txt" as the only input instead of creating every nonexistent directory in the path one by one and finally creating foo.txt itself?
You can use File::Basename and File::Path
use strict;
use File::Basename;
use File::Path qw/make_path/;
my $file = "/home/bob/test/foo.txt";
my $dir = dirname($file);
make_path($dir);
open my $fh, '>', $file or die "Ouch: $!\n"; # now go do stuff w/file
I didn't add any tests to see if the file already exists but that's pretty easy to add with Perl.
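One way that existence test might be added is sketched below. The helper name create_new_file is hypothetical, and the policy of refusing to touch an existing file is an assumption; note also that a separate -e check followed by open leaves a small window in which another process could create the file.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: create $file and any missing parent directories,
# but leave an existing file alone. Returns a write handle, or nothing
# if the file was already there.
sub create_new_file {
    my ($file) = @_;
    return if -e $file;            # already exists: do not overwrite
    make_path(dirname($file));     # does nothing if the directory exists
    open my $fh, '>', $file or die "Cannot create $file: $!";
    return $fh;
}
```

A caller would then test the return value, for example my $fh = create_new_file($file) or warn "$file already exists\n";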
Use make_dir from File::Util
use File::Util;
my($f) = File::Util->new();
$f->make_dir('/var/tmp/tempfiles/foo/bar/');
# optionally specify a creation bitmask to be used in directory creations
$f->make_dir('/var/tmp/tempfiles/foo/bar/',0755);
I don't think there's a standard function that can do all of what you ask, directly from the filename.
But mkpath(), from the module File::Path, can almost do it given the filename's directory. From the File::Path docs:
The "mkpath" function provides a convenient way to create directories, even if your "mkdir" kernel call won't create more than one level of directory at a time.
Note that mkpath() does not report errors in a nice way: it dies instead of just returning zero, for some reason.
Given all that, you might do something like:
use File::Basename;
use File::Path;
my $fname = "/home/bob/test/foo.txt";
eval {
local $SIG{'__DIE__'}; # ignore user-defined die handlers
mkpath(dirname($fname));
};
my $fh;
if ($@) {
print STDERR "Error creating dir: $@";
} elsif (!open($fh, ">", $fname)) {
print STDERR "Error creating file: $!\n";
}
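As an alternative to wrapping the call in eval, File::Path version 2 and later lets make_path collect its errors in an array via the error option instead of dying. The sketch below assumes such a version is installed; the helper name open_with_parents is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: returns a write handle for $fname, or nothing
# after printing a diagnostic to STDERR.
sub open_with_parents {
    my ($fname) = @_;

    # With the 'error' option, make_path records failures instead of dying.
    make_path(dirname($fname), { error => \my $errors });
    my @errors = @{ $errors || [] };
    for my $diag (@errors) {
        my ($path, $message) = %$diag;    # each entry is a one-pair hash
        print STDERR "Error creating dir $path: $message\n";
    }
    return if @errors;

    open my $fh, '>', $fname or do {
        print STDERR "Error creating file: $!\n";
        return;
    };
    return $fh;
}
```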