How can I list all files in a directory using Perl?

I usually use something like
my $dir="/path/to/dir";
opendir(DIR, $dir) or die "can't open $dir: $!";
my @files = readdir DIR;
closedir DIR;
or sometimes I use glob, but either way I always need to add a line or two to filter out . and .., which is quite annoying.
How do you usually go about this common task?

my @files = grep { !/^\./ } readdir DIR;
This will exclude all the dotfiles as well, but that's usually What You Want.

I often use File::Slurp. Benefits include: (1) it dies automatically if the directory does not exist; (2) it excludes . and .. by default. Its behavior is like readdir's in that it does not return the full paths.
use File::Slurp qw(read_dir);
my $dir = '/path/to/dir';
my @contents = read_dir($dir);
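Since read_dir returns bare names just as readdir does, you can map them back to full paths yourself. A minimal sketch (using catfile as one portable way to join them):
use File::Slurp qw(read_dir);
use File::Spec::Functions qw(catfile);

my $dir = '/path/to/dir';
# read_dir strips . and .. automatically; prepend the directory to get usable paths
my @paths = map { catfile($dir, $_) } read_dir($dir);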
Another useful module is File::Util, which provides many options when reading a directory. For example:
use File::Util;
my $dir = '/path/to/dir';
my $fu = File::Util->new;
my @contents = $fu->list_dir( $dir, '--with-paths', '--no-fsdots' );

I will normally use the glob method:
for my $file (glob "$dir/*") {
    #do stuff with $file
}
This works fine unless the directory has lots of files in it. In those cases you have to switch back to readdir in a while loop (putting readdir in list context is just as bad as the glob):
opendir my $dh, $dir
    or die "could not open $dir: $!";
while (my $file = readdir $dh) {
    next if $file =~ /^[.]/;
    #do stuff with $file
}
Often though, if I am reading a bunch of files in a directory, I want to read them in a recursive manner. In those cases I use File::Find:
use File::Find;
find sub {
    return if /^[.]/;
    #do stuff with $_ or $File::Find::name
}, $dir;

If some of the dotfiles are important,
my @files = grep !/^\.\.?$/, readdir DIR;
will only exclude . and ..
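Put together with a lexical directory handle, that looks something like this (a sketch; the path is a placeholder):
use strict;
use warnings;

my $dir = '/path/to/dir';
opendir my $dh, $dir or die "can't open $dir: $!";
my @files = grep { !/^\.\.?$/ } readdir $dh;   # drop only . and ..
closedir $dh;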

When I just want the files (as opposed to directories), I use grep with a -f test:
my @files = grep { -f } readdir $dh;
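One caveat: readdir returns bare names, so -f tests them relative to the current working directory. For any other directory, prepend the path inside the grep; a sketch, assuming $dir holds the directory's path:
opendir my $dh, $dir or die "can't open $dir: $!";
# test each entry at its real location, not relative to the cwd
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;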

Thanks Chris and Ether for your recommendations. I used the following to read a listing of all files (excluding directories), from a directory handle referencing a directory other than my current directory, into an array. The array was always missing one file when I did not use the absolute path in the grep statement.
use File::Slurp;
print "\nWhich folder do you want to replace text? " ;
chomp (my $input = <>);
if ($input eq "") {
print "\nNo folder entered exiting program!!!\n";
exit 0;
}
opendir(my $dh, $input) or die "\nUnable to access directory $input!!!\n";
my @dir = grep { -f "$input\\$_" } readdir $dh;
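A more portable variant of that grep uses catfile from File::Spec::Functions instead of the hard-coded backslash; a sketch under the same assumptions as the code above:
use File::Spec::Functions qw(catfile);

opendir(my $dh, $input) or die "\nUnable to access directory $input!!!\n";
my @dir = grep { -f catfile($input, $_) } readdir $dh;
closedir $dh;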

Related

Unable to open files returned by readdir in Perl

I have a problem with a Perl script, as follows.
I must open and analyze all the *.txt files in a directory, but I cannot.
I can read the file names that are saved in the @files array and print them, but I cannot open those files for reading.
This is my code:
my $dir= "../Scrivania/programmi" ;
opendir my ($dh), $dir;
my @files = grep { -f and /\.txt/i } readdir $dh;
closedir $dh;
for my $file ( @files ) {
    $file = catfile($dir, $file);
    print qq{Opening "$file"\n};
    open my $fh, '<', $file;
    # Do stuff with the data from $fh
    print "I am in the foreach\n";
    print " in : " . "$fh\n";
    #open(CANALI, $fh);
    #@righe = <CANALI>;
    #close(CANALI);
    #print "canali:" . "@righe\n";
    #foreach $canali (@righe)
    #{
    #    $canali =~ /\d\d:\d\d (-) (.*)/;
    #    $ora = $1;
    #
    #    if ($hhSplit[0] == $ora)
    #    {
    #        push(@output, "$canali");
    #
    #    }
    #}
}
The main problem you have is that the file names returned by readdir have no path, so you're trying to open, say, x.txt when you should be opening ../Sc/direct/x.txt. The file doesn't exist in the current working directory, so your open call fails.
You also have a strange mixture of stuff in glob("$dir/(.*).txt/"), which looks a little like a regex pattern, but glob doesn't understand that. The value of $dir is a directory handle left open from the opendir on the first line. What you should be using is glob '../Sc/direct/*.txt', but then there's no need for the readdir.
There are two ways to find the contents of a directory. You can use opendir and readdir to read everything in the directory, or you can use glob.
The first method returns only the bare name of each entry, which means you must concatenate each name with the path to the containing directory, preferably using catfile from File::Spec::Functions. It also includes the pseudo-directories . and .., so you must filter those out before you can use the list of names.
glob has neither of these disadvantages. All the strings it returns are real directory entries, and they will include a path if you provided one in the pattern you passed as a parameter.
You seem to have become rather muddled over the two, so I have written this program, which differentiates between the two approaches. I hope it makes things clearer.
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Spec::Functions qw/ catfile /;

my $dir = '../Sc/direct';

### Using glob
for my $file ( glob catfile($dir, '*.txt') ) {
    print qq{Opening "$file"\n};
    open my $fh, '<', $file;
    # Do stuff with the data from $fh
}

### Using opendir / readdir
opendir my ($dh), $dir;
my @files = grep { -f catfile($dir, $_) and /\.txt$/i } readdir $dh;
closedir $dh;

for my $file ( @files ) {
    $file = catfile($dir, $file);
    print qq{Opening "$file"\n};
    open my $fh, '<', $file;
    # Do stuff with the data from $fh
}
Using $dir in the glob is incorrect. $dir is a GLOB type, not a string value. Rather, you should be looping over the @files array and looking for names that match what you want. Maybe something like so:
foreach my $fp (@files) {
    if ($fp =~ /(.*)\.txt$/) {
        print "$fp is a .txt\n";
        open my $in, "<", $fp or die "could not open $fp: $!";
        while (<$in>) { ... }
    }
}

relative addressing in perl ,using open dir

I have the following code for listing all files in a directory, but I have trouble with the path addressing. My directory is */tmp/*; basically, I want the files that are in a directory inside a tmp directory, but I am not allowed to use *. Do you have any idea?
my $directory="*/tmp/*/";
opendir(DIR, $directory) or die "couldn't open $directory: $!\n";
my @files = readdir DIR;
foreach $files (@files){
    #...
}
closedir DIR;
opendir can't work with wildcards.
For your task there is a slightly ugly, but working, solution:
my @files = grep {-f} <*/tmp/*>; # this is the equivalent of ls */tmp/*
# grep {-f} will stat on each entry and filter folders
# So #files would contain only file names with relative path
foreach my $file (@files) {
# do with $file whatever you want
}
Without globbing and the * wildcard:
use 5.010;
use Path::Class::Rule qw();
for my $tmp_dir (Path::Class::Rule->new->dir->and(sub { return 'tmp' eq (shift->dir_list(1,1) // q{}) })->all) {
say $_ for $tmp_dir->children;
}
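If even the <...> angle-bracket form counts as a wildcard, the same */tmp/* layout can be walked with two plain opendir/readdir passes. A minimal sketch (starting from the current directory, which is an assumption):
opendir my $top, '.' or die "couldn't open .: $!";
for my $sub (grep { -d && !/^\.\.?$/ } readdir $top) {
    my $tmp = "$sub/tmp";
    next unless -d $tmp;    # only subdirectories that contain a tmp/
    opendir my $dh, $tmp or die "couldn't open $tmp: $!";
    my @files = grep { -f "$tmp/$_" } readdir $dh;
    closedir $dh;
    # do something with @files
}
closedir $top;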

perl iterate through directories

I'm trying to get the names of all directories in the specified path.
I tried the following, but that gives me every level down, not just the directories at the path I specified:
find(\&dir_names, "C:\\mydata\\");
sub dir_names {
    print "$File::Find::dir\n" if(-f $File::Find::dir,'/');
}
my @dirs = grep { -d } glob 'C:\mydata\*';
Use opendir instead:
opendir DIR, $dirname or die "Couldn't open dir '$dirname': $!";
my @files = readdir(DIR);
closedir DIR;
#next processing...
EDIT:
"This will give all the files, not just the directories. You'd still have to grep."
Yes, and in that case you can just use a file test operator to see whether it's a directory or not.
In Windows:
$dirname="C:\\";
opendir(DIR, $dirname);
#files = readdir(DIR);
closedir DIR;
foreach $key (#files)
{
if(-d "$dirname\\$key")
{
print "$key\n";
}
}
See chapter 2, Filesystems, of Automating System Administration with Perl, which provides us with this:
sub ScanDirectory {
    my ($workdir) = shift;
    chdir($workdir) or die "Unable to enter dir $workdir:$!\n";
    opendir(DIR, ".") or die "Unable to open $workdir:$!\n";
    my @names = readdir(DIR) or die "Unable to read $workdir:$!\n";
    closedir(DIR);
    foreach my $name (@names){
        next if ($name eq ".");
        next if ($name eq "..");
        if (-d $name){ # is this a directory?
            # Whatever you want to do goes here, e.g. recurse
            # into $name for a full directory walk.
        }
    }
}
glob or readdir would probably be my choice too. Another way to do it is to use the Windows dir command to do the job:
my @dirs = qx(dir /AD /B);
chomp @dirs;

What is the most efficient way to open/act upon all of the files in a directory?

I need to perform my script (a search) on all the files of a directory. Here are the methods that work; I am just asking which is best. (I need file names of the form parsedchpt31_4.txt.)
Glob:
my $parse_corpus; #(for all options)
##glob (only if all files in same directory as script?):
my @files = glob("parsed"."*.txt");
foreach my $file (@files) {
    open($parse_corpus, '<', "$file") or die $!;
    ... all my code...
}
Readdir with while and conditions:
##readdir:
my $dir = '.';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    next unless (-f "$dir/$file"); ##Ensure it's a file
    next unless ($file =~ m/^parsed.*\.txt/); ##Ensure it's a parsed file
    open($parse_corpus, '<', "$file") or die "Couldn't open file $file: $!";
    ... all my code...
}
Readdir with foreach and grep:
##readdir+grep:
my $dir = '.';
opendir(DIR, $dir) or die $!;
foreach my $file (grep {/^parsed.*\.txt/} readdir (DIR)) {
    next unless (-f "$dir/$file"); ##Ensure it's a file
    open($parse_corpus, '<', "$file") or die "Couldn't open file $file: $!";
    ... all my code...
}
File::Find:
##File::Find
use File::Find;
my $dir = "."; ##current directory: could be (include quotes): '/Users/jon/Desktop/...'
my @files;
find(\&open_file, $dir); ##find comes from File::Find
sub open_file {
    push @files, $File::Find::name if (/^parsed.*\.txt/);
}
foreach my $file (@files) {
    open($parse_corpus, '<', "$file") or die $!;
    ...all my code...
}
Is there another way? Is it good to enclose my entire script in the loops? Is it okay that I don't use closedir? I'm passing this off to others, and I'm not sure where their files will be (so I may not be able to use glob).
Thanks a lot, hopefully this is the right place to ask this.
The best or most efficient approach depends on your purposes and the larger context. Do you mean best in terms of raw speed, simplicity of the code, or something else? I'm skeptical that memory considerations should drive this choice. How many files are in the directory?
For sheer practicality, the glob approach works fairly well. Before resorting to anything more involved, I'd ask whether there is a problem.
If you're able to use other modules, another approach is to let someone else worry about the grubby details:
use File::Util qw();
my $fu = File::Util->new;
my @files = $fu->list_dir($dir, qw(--with-paths --files-only));
Note that File::Find performs a recursive search descending into all subdirectories. Many times you don't want or need that.
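If you do want File::Find but only one level deep, you can stop it from descending by setting $File::Find::prune on each subdirectory; a sketch of that idea:
use File::Find;

my $dir = '.';
my @files;
find(sub {
    # prune every subdirectory so find never descends into it
    if (-d && $File::Find::name ne $dir) {
        $File::Find::prune = 1;
        return;
    }
    push @files, $File::Find::name if -f && /^parsed.*\.txt$/;
}, $dir);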
I would also add that I dislike your two readdir examples because they commingle different pieces of functionality: (1) getting file names, and (2) processing individual files. I would keep those jobs separate.
my $dir = '.';
opendir(my $dh, $dir) or die $!; # Use a lexical directory handle.
my @files =
    grep { -f }
    map  { "$dir/$_" }
    grep { /^parsed.*\.txt$/ }
    readdir($dh);

for my $file (@files){
    ...
}
I think using a while loop is the safer answer. Why? Because loading all the file names into an array could mean heavy memory usage, and reading the directory entry by entry avoids that problem.
I prefer readdir to glob, but that's probably more a matter of taste.
If performance is an issue, one could say that the -f check is unnecessary for any file with the .txt extension.
I find that a recursive directory-walking function using the perfect partners opendir/readdir and File::chdir (my favorite CPAN module, great for cross-platform work) lets one easily and clearly manipulate anything in a directory, including subdirectories if desired (if not, omit the recursion).
Example (a simple deep ls):
#!/usr/bin/env perl
use strict;
use warnings;
use File::chdir; # Provides the special variable $CWD
# assigning to $CWD sets the working directory
# can be local to a block
# evaluates/stringifies to the absolute path
# other great features

walk_dir(shift);

sub do_something {
    print shift . "\n";
}

sub walk_dir {
    my $dir = shift;
    local $CWD = $dir;
    opendir my $dh, $CWD; # lexical opendir, so no closedir needed
    print "In: $CWD\n";
    while (my $entry = readdir $dh) {
        next if ($entry =~ /^\.+$/);
        # other exclusion tests
        if (-d $entry) {
            walk_dir($entry);
        } elsif (-f $entry) {
            do_something($entry);
        }
    }
}

Filter filenames by pattern

I need to search for files in a directory that begin with a particular pattern, say "abc". I also need to eliminate all the files in the result that end with ".xh". I am not sure how to go about doing it in Perl.
I have something like this:
opendir(MYDIR, $newpath);
my @files = grep(/abc\*.*/, readdir(MYDIR)); # DOES NOT WORK
I also need to eliminate all files from result that end with ".xh"
Thanks, Bi
Try:
@files = grep { !/\.xh$/ } <$MYDIR/abc*>;
where $MYDIR is a string containing the path of your directory.
opendir(MYDIR, $newpath); my @files = grep(/abc*.*/, readdir(MYDIR)); # DOES NOT WORK
You are confusing a regex pattern with a glob pattern.
#!/usr/bin/perl
use strict;
use warnings;

opendir my $dir_h, '.'
    or die "Cannot open directory: $!";

my @files = grep { /abc/ and not /\.xh$/ } readdir $dir_h;

closedir $dir_h;

print "$_\n" for @files;
opendir(MYDIR, $newpath) or die "$!";
my @files = grep { !/\.xh$/ && /abc/ } readdir(MYDIR);
closedir MYDIR;
foreach (@files) {
    # do something
}
The point that kevinadc and Sinan Unur are using but not mentioning is that readdir() returns a list of all the entries in the directory when called in list context. You can then use any list operator on that. That's why you can use:
my @files = grep { /abc/ && !/\.xh$/ } readdir MYDIR;
So:
readdir MYDIR
returns a list of all the files in MYDIR.
And:
grep { /abc/ && !/\.xh$/ }
returns all the elements returned by readdir MYDIR that match the criteria there.
foreach $file (@files)
{
    my ($fileN) = $file =~ /([^\/]+)$/;
    if ($fileN =~ /\.xh$/)
    {
        unlink $file;
        next;
    }
    if ($fileN =~ /^abc/)
    {
        open(FILE, "<", $file) or die "can't open $file: $!";
        while (<FILE>)
        {
            # read through file.
        }
        close FILE;
    }
}
Also, all the files in a directory can be accessed by doing:
$DIR = "/somedir/somepath";
foreach $file (<$DIR/*>)
{
# apply file checks here like above.
}
Alternatively, you can use the File::Find module.
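For completeness, a sketch of that File::Find route (note that, unlike the readdir answers above, it recurses into subdirectories; $newpath is the same directory variable as before):
use File::Find;

my @matches;
find(sub {
    # names starting with "abc" but not ending in ".xh"
    push @matches, $File::Find::name if -f && /^abc/ && !/\.xh$/;
}, $newpath);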
Instead of using opendir and filtering readdir (don't forget to closedir!), you could instead use glob:
use File::Spec::Functions qw(catfile splitpath);
my @files =
    grep !/\.xh$/,            # filter out names ending in ".xh"
    map +(splitpath $_)[-1],  # filename only
    glob                      # perform shell-like glob expansion
    catfile $newpath, 'abc*'; # "$newpath/abc*" (or \ or :, depending on OS)
If you don't care about eliminating the $newpath prefix from the results of glob, get rid of the map+splitpath.