How can I get a list of all files with a certain extension from a specific directory? - perl

I'm using this code to get a list of all the files in a specific directory:
opendir DIR, $dir or die "cannot open dir $dir: $!";
my @files = readdir DIR;
closedir DIR;
How can I modify this code or append something to it so that it only looks for text files and only loads the array with the prefix of the filename?
Example directory contents:
.
..
923847.txt
98398523.txt
198.txt
deisi.jpg
oisoifs.gif
lksdjl.exe
Example array contents:
files[0]=923847
files[1]=98398523
files[2]=198

my @files = glob "$dir/*.txt";
for (0..$#files) {
    $files[$_] =~ s{^.*/}{};    # strip the leading "$dir/" that glob adds
    $files[$_] =~ s/\.txt$//;    # strip the extension
}

It is enough to change one line:
my @files = map { s/\.[^.]+$//; $_ } grep { /\.txt$/ } readdir DIR;
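A minimal sketch of the same filter wrapped in a sub, using only core modules. The sub name txt_prefixes is just an illustration. It uses a lexical directory handle, passes the full path to -f (readdir returns bare names relative to $dir, not to the current directory), and strips ".txt" from a copy of each name. The demo recreates the question's example directory under File::Temp.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the names, minus ".txt", of the regular .txt files in $dir.
# readdir gives bare names relative to $dir, so -f gets the full path.
sub txt_prefixes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open dir $dir: $!";
    my @prefixes =
        map  { my $name = $_; $name =~ s/\.txt\z//; $name }    # copy, then strip the extension
        grep { /\.txt\z/ && -f File::Spec->catfile($dir, $_) }
        readdir $dh;
    closedir $dh;
    return sort @prefixes;
}

# Demo: recreate the example directory from the question in a temp dir.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(923847.txt 98398523.txt 198.txt deisi.jpg oisoifs.gif lksdjl.exe)) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>', $path) or die "cannot create $path: $!";
    close $fh;
}
print "$_\n" for txt_prefixes($demo_dir);
```

Note that the names come back in string order, so "198" sorts before "923847".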

If you can use the features introduced in Perl 5.10, this is how I would write it (note that given/when are experimental, so newer versions of Perl emit warnings for them).
use strict;
use warnings;
use 5.10.1;
use autodie; # don't need to check the output of opendir now
my $dir = ".";
{
    opendir my($dirhandle), $dir;
    for( readdir $dirhandle ){ # sets $_
        when(-d "$dir/$_"){ next } # skip directories (readdir returns names relative to $dir)
        when(/^[.]/){ next } # skip dot-files
        when(/(.+)[.]txt$/){ say "text file: ", $1 }
        default{
            say "other file: ", $_;
        }
    }
    # $dirhandle is automatically closed here
}
Or if you have very large directories, you could use a while loop.
{
    opendir my($dirhandle), $dir;
    while( my $elem = readdir $dirhandle ){
        given( $elem ){ # sets $_
            when(-d "$dir/$_"){ next } # skip directories
            when(/^[.]/){ next } # skip dot-files
            when(/(.+)[.]txt$/){ say "text file: ", $1 }
            default{
                say "other file: ", $_;
            }
        }
    }
}

This is the simplest way I've found (as in human readable) using the glob function:
# Store only TXT-files in the @files array using glob
my @files = grep { -f } <*.txt>;
# Write them out
foreach my $file (@files) {
    print "$file\n";
}
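Additionally, the -f test ensures that only actual files (and not directories) are stored in the array. Note that <*.txt> looks in the current working directory; use glob "$dir/*.txt" for another directory.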

To get just the ".txt" files, you can use a file test operator (-f : regular file) and a regex.
my @files = grep { -f "$dir/$_" && /\.txt$/ } readdir $dh;
Alternatively, to select files whose contents look like text whatever their extension, use Perl's -T file test (a heuristic check for text files):
my @files = grep { -f "$dir/$_" && -T "$dir/$_" } readdir $dh;
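A small sketch of the -T approach, assuming you want regular files only. The sub name text_files is hypothetical. Because -T inspects the start of each file's contents rather than its name, a text file with a non-.txt extension is kept and a file containing NUL bytes is dropped.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the regular files in $dir that Perl's -T heuristic considers text,
# whatever their extension. -T inspects the start of each file's contents.
sub text_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open dir $dir: $!";
    my @text = grep {
        my $path = File::Spec->catfile($dir, $_);
        -f $path && -T $path;
    } readdir $dh;
    closedir $dh;
    return sort @text;
}

# Demo files: two plain-text files (one without a .txt extension) and one
# file containing NUL bytes, which -T treats as binary.
my $demo_dir = tempdir(CLEANUP => 1);
my %content = (
    'notes.dat'  => "hello world\n",
    'readme.txt' => "plain text\n",
    'image.bin'  => "\x00\x01\x02\x03binary",
);
for my $name (sort keys %content) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>:raw', $path) or die "cannot create $path: $!";
    print {$fh} $content{$name};
    close $fh;
}
print "$_\n" for text_files($demo_dir);
```

Since -T is a heuristic, it is worth combining it with an extension check if you know your text files always end in ".txt".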

Just use this:
my @files = map { -f "$dir/$_" && s{\.txt\z}{} ? $_ : () } readdir DIR;

Related

Perl cannot stat $_

I want the last modified time for each file in the directory. To make sure my loop is working, I print $_ and I see the file names of the directory:
for ( @Files ) {
    opendir( D, $path . '\/' . $_ ) or die "$!";
    my @textfiles = grep { ! /^\.{1,2}$/ } readdir( D );
    for ( @textfiles ) {
        # print "$_\n"; <----the file names.
        my $epoch_timestamp = ( stat( $_ ) )[9];
        print "$epoch_timestamp\n";
    }
}
I get this error
Use of uninitialized value $epoch_timestamp in concatenation (.) or string
What am I doing wrong?
readdir returns only the names of the files. If your current working directory is different, you must build the full path, as you did with the parameter to opendir. The easiest way is to use map in the list for the for loop.
I'm concerned about your statement:
opendir( D, $path . '\/' . $_ ) or die "$!";
which will put, literally, \/ between $path and $_. I think you need just /, but it is simplest to interpolate the variables with
opendir( D, "$path/$_" ) or die "$!";
But $_ comes from the array @Files. If these are indeed file names, then your opendir will fail; they need to be directory names.
In my solution I've built the variable $dir as
my $dir = "$path/$_";
so that it can be used in the call to opendir as well as to build the full path to the files in the following for loop.
Note that I have also used a lexical directory handle, my $dh, which is far superior to a global handle like D.
for ( @Files ) {
    my $dir = "$path/$_";
    opendir my $dh, $dir or die $!;
    my @textfiles = grep { ! /^\.{1,2}$/ } readdir $dh;
    for ( map { "$dir/$_" } @textfiles ) {
        # print "$_\n"; <----the file names.
        my $epoch_timestamp = ( stat( $_ ) )[9];
        print "$epoch_timestamp\n";
    }
}
As an alternative to the answers above, you could use a module to make your life easier, such as Path::Tiny:
use 5.014;
use warnings;
use Path::Tiny;
my $path = path('/etc');
my @Files = qw(defaults cups ssl);
for my $dir (@Files) {
    my @textfiles = $path->child($dir)->children;
    for my $file (@textfiles) {
        say "$file: ", $file->stat->mtime;
    }
}
Of course, the nested loop above could be written as:
for my $dir (@Files) {
    my @textfiles = $path->child($dir)->children;
    say "$_: ", $_->stat->mtime for (@textfiles);
}
Storing the list of files in @textfiles isn't necessary either, so it could be reduced to:
for my $dir (@Files) {
    say "$_: ", $_->stat->mtime for ( $path->child($dir)->children );
}
Path::Tiny conveniently throws a clean exception message on error.
readdir only returns the name of the file in the directory. You need to provide a qualified path to the file to stat.
my $dir_qfn = ...;
opendir(my $dh, $dir_qfn)
    or do {
        warn("Can't read dir \"$dir_qfn\": $!\n");
        next;
    };
while (defined( my $fn = readdir($dh) )) {
    next if $fn =~ /^\.\.?\z/;
    my $qfn = "$dir_qfn/$fn";
    my $mtime = ( stat($qfn) )[9];
    defined($mtime)
        or do {
            warn("Can't stat file \"$qfn\": $!\n");
            next;
        };
    ...
}
Using glob instead:
my $dir = ...;
my %ts =
    map { $_ => (stat $_)[9] }
    grep { !m{/\.\.?\z} }
    glob "\Q$dir\E/{*,.*}";
say "$_ => $ts{$_}" for sort keys %ts;
I use a hash of name => timestamp to collect both in one data structure. The pattern $dir/{*,.*} is there to catch dot files as well; otherwise it would be just $dir/*.
The grep filters out the . and .. entries by matching the end of each path with m{...}. The glob pattern needs \Q...\E to prevent an injection bug with directory names that contain glob metacharacters. It also escapes spaces, so File::Glob with its :bsd_glob option isn't needed. Thanks to ikegami for comments.
If you'd rather process files one at a time, retrieve the list with glob and then iterate through it.
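A self-contained sketch of the readdir fix described above, using core modules only. The sub name mtimes_in is hypothetical. It joins each bare name from readdir onto the directory with File::Spec before calling stat, and the demo pins a file's mtime with utime so the result is predictable.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Map each entry name in $dir (except . and ..) to its modification time.
# stat is given the full path, because readdir only returns bare names.
sub mtimes_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't read dir \"$dir\": $!";
    my %mtime;
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' || $name eq '..';
        my $path = File::Spec->catfile($dir, $name);
        my $t = (stat $path)[9];
        if (!defined $t) {
            warn "Can't stat \"$path\": $!\n";
            next;
        }
        $mtime{$name} = $t;
    }
    closedir $dh;
    return %mtime;
}

# Demo: one file whose mtime is set to a known value.
my $demo_dir = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($demo_dir, 'report.txt');
open(my $fh, '>', $file) or die "Can't create $file: $!";
close $fh;
utime(1_000_000_000, 1_000_000_000, $file) or die "utime $file: $!";
my %ts = mtimes_in($demo_dir);
print "$_ => $ts{$_}\n" for sort keys %ts;
```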

How to read directories and sub-directories without knowing the directory name in perl?

Hi, I want to read directories and sub-directories without knowing the directory names. The current directory is "D:/Temp". 'Temp' has sub-directories like 'A1' and 'A2'. 'A1' in turn has sub-directories like 'B1' and 'B2', and 'B1' has sub-directories like 'C1' and 'C2'. The Perl script doesn't know these directory names, so it has to find the directories first and then read one file at a time in 'C1'; once all files in 'C1' are read, it should change to 'C2'. I tried the code below, but I don't want to read all the files into an array (@files); I need one file at a time. The elements of the array @dir should be as follows:
$dir[0] = "D:/Temp/A1/B1/C1"
$dir[1] = "D:/Temp/A1/B1/C2"
$dir[2] = "D:/Temp/A1/B2/C1"
Below is the code I tried.
use strict;
use File::Find::Rule;
use Data::Dumper;
my $dir = "D:/Temp";
my @dir = File::Find::Rule->directory->in($dir);
print Dumper (\@dir);
my $readDir = $dir[3];
opendir ( DIR, $readDir ) || die "Error in opening dir $readDir\n";
my @files = grep { !/^\.\.?$/ } readdir DIR;
print STDERR "files: @files \n\n";
for my $fil (@files) {
    open (F, "<$fil");
    read (F, my $data);
    close (F);
    print "$data";
}
use File::Find;
use strict;
use warnings;
my @dirs;
my %has_children;
find(sub {
    if (-d) {
        push @dirs, $File::Find::name;
        $has_children{$File::Find::dir} = 1;
    }
}, 'D:/Temp');
my @ends = grep {! $has_children{$_}} @dirs;
print "$_\n" for (@ends);
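The same %has_children idea can be wrapped in a reusable sub. This is a sketch under a couple of assumptions: the sub name terminal_dirs is hypothetical, it uses File::Find's no_chdir option so that paths stay absolute, and it skips the starting directory so the root never marks itself as having a child. The demo rebuilds the layout from the question in a temporary directory.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Return the directories below $root that have no subdirectories of their own.
sub terminal_dirs {
    my ($root) = @_;
    my (@dirs, %has_child);
    find({
        no_chdir => 1,    # $_ and $File::Find::name hold full paths; no chdir
        wanted   => sub {
            return unless -d $_;
            return if $File::Find::name eq $root;    # skip the starting directory
            push @dirs, $File::Find::name;
            $has_child{$File::Find::dir} = 1;        # the parent has a subdirectory
        },
    }, $root);
    my @terminal = grep { !$has_child{$_} } @dirs;
    return sort @terminal;
}

# Demo: rebuild the layout from the question (A1/B1/C1, A1/B1/C2, A1/B2/C1).
my $root = tempdir(CLEANUP => 1);
make_path(map { "$root/$_" } qw(A1/B1/C1 A1/B1/C2 A1/B2/C1));
print "$_\n" for terminal_dirs($root);
```

Reading the files one at a time is then a matter of looping over each returned directory and opening its files individually.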
Your Goal: Find the absolute paths to those directories that do not themselves have child directories.
I'll call those directories of interest terminal directories. Here's the prototype for a function that I believe provides the convenience you are looking for. The function returns its result as a list.
my #list = find_terminal_directories($full_or_partial_path);
And here's an implementation of find_terminal_directories(). Note that this implementation does not require the use of any global variables. Also note the use of a private helper function that is called recursively.
On my Windows 7 system, for the input directory C:/Perl/lib/Test, I get the output:
== List of Terminal Folders ==
c:/Perl/lib/Test/Builder/IO
c:/Perl/lib/Test/Builder/Tester
c:/Perl/lib/Test/Perl/Critic
== List of Files in each Terminal Folder: ==
c:/Perl/lib/Test/Builder/IO/Scalar.pm
c:/Perl/lib/Test/Builder/Tester/Color.pm
c:/Perl/lib/Test/Perl/Critic/Policy.pm
Implementation
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);
my @dir_list = find_terminal_directories("C:/Perl/lib/Test");
print "== List of Terminal Directories ==\n";
print join("\n", @dir_list), "\n";
print "\n== List of Files in each Terminal Directory: ==\n";
for my $dir (@dir_list) {
    for my $file (<"$dir/*">) {
        print "$file\n";
        open my $fh, '<', $file or die $!;
        my $data = do { local $/; <$fh> }; # slurp entire file contents into $data
        close $fh;
        # Now, do something with $data !
    }
}
sub find_terminal_directories {
    my $rootdir = shift;
    my @wanted;
    my $cwd = getcwd();
    chdir $rootdir or die "Can't chdir to $rootdir: $!";
    find_terminal_directories_helper(".", \@wanted);
    chdir $cwd;
    return @wanted;
}
sub find_terminal_directories_helper {
    my ($dir, $wanted) = @_;
    return if ! -d $dir;
    opendir(my $dh, $dir) or die "open directory error: $!";
    my $count = 0;
    foreach my $child (readdir($dh)) {
        next if (! -d $child || $child eq "." || $child eq "..");
        my $abs_child = abs_path($child);
        ++$count;
        chdir $child;
        find_terminal_directories_helper($abs_child, $wanted); # recursion!
        chdir "..";
    }
    push @$wanted, abs_path($dir) if ! $count; # no sub-directories found!
}
Perhaps the following will be helpful:
use strict;
use warnings;
use File::Find::Rule;
my $dir = "D:/Temp";
local $/;
my @dirs =
    sort { $a cmp $b }
    File::Find::Rule->exec( sub { File::Find::Rule->directory->in($_) == 1 } )
    ->directory->in($dir);
for my $dir (@dirs) {
    for my $file (<"$dir/*">) {
        open my $fh, '<', $file or die $!;
        my $data = <$fh>;
        close $fh;
        print $data;
    }
}
local $/; lets us slurp the file's contents into a variable. Delete it if you only want to read the first line.
The sub in the exec() is used to pass only those dirs which don't contain a dir.
sort is used to arrange those dirs in your wanted order.
A file glob <"$dir/*"> is used to get the files in each dir.
Edit: Have modified the code to find only 'terminal directories.' Thanks to DavidRR for this spec clarification.
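One design note on local $/: at file scope it turns off line splitting for every later read in that scope. A common alternative, sketched here with a hypothetical slurp sub, localizes $/ inside a do block so slurp mode applies to exactly one read and $/ is restored afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one string. Localizing $/ inside the do block
# limits slurp mode to this single read; $/ is restored when the block ends.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    return $data;
}

# Demo: write two lines to a temp file and read them back in one go.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line one\nline two\n";
close $fh;
my $content = slurp($path);
print length($content), " bytes read\n";
```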
I would use File::Find
Sample script:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $dir = "/home/chris";
find(\&wanted, $dir);
sub wanted {
print "dir: $File::Find::dir\n";
print "file in dir: $_\n";
print "complete path to file: $File::Find::name\n";
}
OUTPUTS:
$ test.pl
dir: /home/chris/test_dir
file in dir: test_dir2
complete path to file: /home/chris/test_dir/test_dir2
dir: /home/chris/test_dir/test_dir2
file in dir: foo.txt
complete path to file: /home/chris/test_dir/test_dir2/foo.txt
...
Using backticks, write subdirs and files to a file called filelist:
`ls -R $dir > filelist`

Perl program help on opendir and readdir

So I have a program that I want to use to clean some text files. The program asks the user to enter the full path of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (specified by the user), and then clean them the way I need. I have already written the script to clean the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?
I'm not certain what you want, so I made some assumptions:
When you say "clean the text file", you mean delete the text file.
The names of the files you want to write into are formed by a pattern.
So, if I'm right, try something like this:
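chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory) or die "Can't open $user_supplied_directory: $!";
my @dir = readdir DIR;
closedir DIR;
foreach (@dir) {
    next if (($_ eq '.') || ($_ eq '..'));
    # readdir returns bare names, so prepend the directory
    my $old_filename = "$user_supplied_directory/$_";
    # Reads the whole content of the original file
    open FILE, '<', $old_filename or die "Can't read $old_filename: $!";
    my $contents = do { local $/; <FILE> };
    close FILE;
    # Here you supply the new filename
    my $new_filename = $old_filename . ".new";
    # Writes the content to the new file
    open FILE, '>', $new_filename or die "Can't write $new_filename: $!";
    print FILE $contents;
    close FILE;
    # Deletes the old file
    unlink $old_filename;
}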
I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach my $file (@files) {
    foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "@dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
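Putting those notes together, here is a sketch in which a callback plays the role of foo. The sub name for_each_text_file is hypothetical, and it assumes the files to clean end in ".txt". The names are read and the handle is closed before any callback runs, so a callback that writes or deletes files does not disturb the listing.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Call $callback with the full path of each regular ".txt" file in $dir.
# The names are read and the handle closed before any callback runs, so a
# callback may create or delete files without disturbing the listing.
sub for_each_text_file {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @names = sort grep { /\.txt\z/ } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);    # prepend the folder name
        next unless -f $path;                            # skip subdirectories named *.txt
        $callback->($path);
    }
    return;
}

# Demo: two text files and one image; only the text files reach the callback.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt image.png)) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    print {$fh} "contents of $name\n";
    close $fh;
}
my @seen;
for_each_text_file($demo_dir, sub { push @seen, $_[0] });
print "$_\n" for @seen;
```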
I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
use Cwd qw(getcwd);
my $cwd = getcwd();    # starting directory, used to resolve relative PATH entries
our @Perls = ();
{
    my $perl_rx = qr { ^ perl [\d.] + $ }x;
    for my $dir (split(/:/, $ENV{PATH})) {
        ### scanning: $dir
        my $is_absolute = ($dir =~ m{^/});
        my $dirpath = $is_absolute ? $dir : "$cwd/$dir";
        unless (chdir($dirpath)) {
            warn "can't cd to $dirpath: $!\n";
            next;
        }
        opendir(my $dot, ".") || next;
        while ($_ = readdir($dot)) {
            next unless /$perl_rx/o;
            ### considering: $_
            next unless -f;
            next unless -x _;
            ### saving: $_
            push @Perls, "$dir/$_";
        }
    }
}
{
    my $two_dots = qr{ [.] .* [.] }x;
    if (grep /$two_dots/, @Perls) {
        @Perls = grep /$two_dots/, @Perls;
    }
}
{
    my (%seen, $dev, $ino);
    @Perls = grep {
        ($dev, $ino) = stat $_;
        ! $seen{$dev, $ino}++;
    } @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.
You can do the following, which allows the user to supply their own directory or, if no directory is specified by the user, it defaults to a designated location.
The example shows the use of opendir and readdir; it stores all files in the directory in the @files array, and only the files that end with '.txt' in the @keys array. The while loop ensures that the full paths to the files are stored in the arrays.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;
my $dir = shift || "/some/default/directory";
opendir(my $dh, $dir) || die "Can't open $dir: $!";
my (@files, @keys);
while ( readdir $dh ) {
    next if $_ eq '.' || $_ eq '..';
    push( @files, "$dir/$_");
}
closedir $dh;
# store ".txt" files in new array
foreach my $file ( @files ) {
    push( @keys, $file ) if $file =~ /\.txt\z/;
}
# Move files to new location, even if it's across different devices
for ( @keys ) {
    move($_, "/some/other/directory/") or die "Couldn't move $_: $!\n";
}
See the perldoc of File::Copy for more info.

Filter filenames by pattern

I need to search for files in a directory that begin with a particular pattern, say "abc". I also need to eliminate all the files in the result that end with ".xh". I am not sure how to go about doing it in Perl.
I have something like this:
opendir(MYDIR, $newpath);
my #files = grep(/abc\*.*/,readdir(MYDIR)); # DOES NOT WORK
I also need to eliminate all files from result that end with ".xh"
Thanks, Bi
Try:
@files = grep {!/\.xh$/} <$MYDIR/abc*>;
where $MYDIR is a string containing the path of your directory.
opendir(MYDIR, $newpath); my #files = grep(/abc*.*/,readdir(MYDIR)); #DOES NOT WORK
You are confusing a regex pattern with a glob pattern.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $dir_h, '.'
or die "Cannot open directory: $!";
my @files = grep { /^abc/ and not /\.xh$/ } readdir $dir_h;
closedir $dir_h;
print "$_\n" for #files;
opendir(MYDIR, $newpath) or die "$!";
my @files = grep{ !/\.xh$/ && /^abc/ } readdir(MYDIR);
closedir MYDIR;
foreach (@files) {
    # do something with $_
}
The point that kevinadc and Sinan Unur are using but not mentioning is that readdir() returns a list of all the entries in the directory when called in list context. You can then use any list operator on that. That's why you can use:
my @files = grep { /^abc/ && !/\.xh$/ } readdir MYDIR;
So:
readdir MYDIR
returns a list of all the entries in MYDIR.
And:
grep { /^abc/ && !/\.xh$/ }
returns all the elements returned by readdir MYDIR that match the criteria there.
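Because readdir in list context hands every entry to grep, both conditions, plus a check that the entry is a regular file, can go in one grep block. This is a sketch; the sub name abc_files and the demo file names are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the regular files in $dir whose names start with "abc" and do not
# end with ".xh". -f needs the full path, since readdir returns bare names.
sub abc_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @files = grep {
        /^abc/ && !/\.xh\z/ && -f File::Spec->catfile($dir, $_)
    } readdir $dh;
    closedir $dh;
    return sort @files;
}

# Demo: a mix of matching and non-matching names, plus a directory.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(abc1.c abc2.xh xabc.c abc3.h)) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close $fh;
}
mkdir(File::Spec->catdir($demo_dir, 'abcdir')) or die "Can't mkdir: $!";
print "$_\n" for abc_files($demo_dir);
```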
foreach my $file (@files)
{
    my ($fileN) = $file =~ m{([^/]+)$};
    if ($fileN =~ /\.xh$/)
    {
        unlink $file;
        next;
    }
    if ($fileN =~ /^abc/)
    {
        open(FILE, "<", $file) or die "Can't open $file: $!";
        while(<FILE>)
        {
            # read through file.
        }
        close(FILE);
    }
}
Also, all files in a directory can be accessed by doing:
$DIR = "/somedir/somepath";
foreach $file (<$DIR/*>)
{
    # apply file checks here like above.
}
Alternatively, you can use the Perl module File::Find.
Instead of using opendir and filtering readdir (don't forget to closedir!), you could instead use glob:
use File::Spec::Functions qw(catfile splitpath);
my #files =
grep !/\.xh$/, # filter out names ending in ".xh"
map +(splitpath $_)[-1], # filename only
glob # perform shell-like glob expansion
catfile $newpath, 'abc*'; # "$newpath/abc*" (or \ or :, depending on OS)
If you don't care about eliminating the $newpath prefixed to the results of glob, get rid of the map+splitpath.
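A variant of the glob approach, sketched with File::Basename's basename in place of splitpath and with an extra -f check so directories matching abc* are dropped. The sub name abc_names_by_glob is hypothetical, and it assumes the directory path contains no whitespace or glob metacharacters, since glob would otherwise split or expand it.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# glob expands "abc*" inside $dir and returns full paths; -f keeps regular
# files, basename strips the directory part, and grep drops the .xh names.
# Assumes $dir has no whitespace or glob metacharacters.
sub abc_names_by_glob {
    my ($dir) = @_;
    my @paths = grep { -f $_ } glob(File::Spec->catfile($dir, 'abc*'));
    return sort grep { !/\.xh\z/ } map { basename($_) } @paths;
}

# Demo: same kind of mix as before, including a directory named abcdir.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(abc1.c abc2.xh xabc.c abc3.h)) {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close $fh;
}
mkdir(File::Spec->catdir($demo_dir, 'abcdir')) or die "Can't mkdir: $!";
print "$_\n" for abc_names_by_glob($demo_dir);
```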

How do I read in the contents of a directory in Perl?

How do I get Perl to read the contents of a given directory into an array?
Backticks can do it, but is there some method using 'scandir' or a similar term?
opendir(D, "/path/to/directory") || die "Can't open directory: $!\n";
while (my $f = readdir(D)) {
print "\$f = $f\n";
}
closedir(D);
EDIT: Oh, sorry, missed the "into an array" part:
my $d = shift;
opendir(D, "$d") || die "Can't open directory $d: $!\n";
my @list = readdir(D);
closedir(D);
foreach my $f (@list) {
    print "\$f = $f\n";
}
EDIT2: Most of the other answers are valid, but I wanted to comment on this answer specifically, in which this solution is offered:
opendir(DIR, $somedir) || die "Can't open directory $somedir: $!";
@dots = grep { (!/^\./) && -f "$somedir/$_" } readdir(DIR);
closedir DIR;
First, to document what it's doing since the poster didn't: it's passing the returned list from readdir() through a grep() that only returns those values that are files (as opposed to directories, devices, named pipes, etc.) and that do not begin with a dot (which makes the list name @dots misleading, but that's due to the change he made when copying it over from the readdir() documentation). Since it limits the contents of the directory it returns, I don't think it's technically a correct answer to this question, but it illustrates a common idiom used to filter filenames in Perl, and I thought it would be valuable to document. Another example seen a lot is:
@list = grep !/^\.\.?$/, readdir(D);
This snippet reads all contents from the directory handle D except '.' and '..', since those are very rarely desired to be used in the listing.
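A sketch of that idiom as a small sub, using a lexical handle so nothing global is left open. The sub name dir_entries is hypothetical. It anchors with \z instead of $, because $ also matches before a trailing newline, so /^\.\.?$/ would additionally drop an entry literally named ".\n".

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Read every entry of $dir into an array, dropping only "." and "..".
# \z anchors at the very end of the string, unlike $, which also
# matches just before a trailing newline.
sub dir_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @entries = grep { !/^\.\.?\z/ } readdir $dh;
    closedir $dh;
    return sort @entries;
}

# Demo: a dot file, a regular file and a subdirectory are all kept.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name ('.hidden', 'a') {
    my $path = File::Spec->catfile($demo_dir, $name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close $fh;
}
mkdir(File::Spec->catdir($demo_dir, 'sub')) or die "Can't mkdir: $!";
print "$_\n" for dir_entries($demo_dir);
```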
A quick and dirty solution is to use glob
@files = glob ('/path/to/dir/*');
This will do it, in one line (note the '*' wildcard at the end)
@files = </path/to/directory/*>;
# To demonstrate:
print join(", ", @files);
IO::Dir is nice and provides a tied hash interface as well.
From the perldoc:
use IO::Dir;
$d = IO::Dir->new(".");
if (defined $d) {
while (defined($_ = $d->read)) { something($_); }
$d->rewind;
while (defined($_ = $d->read)) { something_else($_); }
undef $d;
}
tie %dir, 'IO::Dir', ".";
foreach (keys %dir) {
print $_, " " , $dir{$_}->size,"\n";
}
So you could do something like:
tie %dir, 'IO::Dir', $directory_name;
my @dirs = keys %dir;
You could use DirHandle:
use DirHandle;
$d = new DirHandle ".";
if (defined $d)
{
while (defined($_ = $d->read)) { something($_); }
$d->rewind;
while (defined($_ = $d->read)) { something_else($_); }
undef $d;
}
DirHandle provides an alternative, cleaner interface to the opendir(), closedir(), readdir(), and rewinddir() functions.
Similar to the above, but I think the best version is (slightly modified) from "perldoc -f readdir":
opendir(DIR, $somedir) || die "can't opendir $somedir: $!";
@dots = grep { (!/^\./) && -f "$somedir/$_" } readdir(DIR);
closedir DIR;
You can also use the children method from the popular Path::Tiny module:
use Path::Tiny;
my @files = path("/path/to/dir")->children;
This creates an array of Path::Tiny objects, which are often more useful than plain filenames if you want to do things to the files. If you want plain path strings instead:
my @files = map { $_->stringify } path("/path/to/dir")->children;
Here's an example of recursing through a directory structure and copying files from a backup script I wrote.
use File::Copy;    # provides copy()
use File::stat;    # makes stat() return an object with an ->mtime method
sub copy_directory {
    my ($source, $dest) = @_;
    my $start = time;
    # get the contents of the directory.
    opendir(D, $source) or die "Can't open $source: $!";
    my @f = readdir(D);
    closedir(D);
    # recurse through the directory structure and copy files.
    foreach my $file (@f) {
        # Setup the full path to the source and dest files.
        my $filename = $source . "\\" . $file;
        my $destfile = $dest . "\\" . $file;
        # get the file info for the 2 files.
        my $sourceInfo = stat( $filename );
        my $destInfo = stat( $destfile );
        # make sure the destination directory exists.
        mkdir( $dest, 0777 );
        if ($file eq '.' || $file eq '..') {
        } elsif (-d $filename) { # if it's a directory then recurse into it.
            #print "entering $filename\n";
            copy_directory($filename, $destfile);
        } else {
            # Only backup the file if it has been created/modified since the last backup
            if( (not -e $destfile) || ($sourceInfo->mtime > $destInfo->mtime ) ) {
                #print $filename . " -> " . $destfile . "\n";
                copy( $filename, $destfile ) or print "Error copying $filename: $!\n";
            }
        }
    }
    print "$source copied in " . (time - $start) . " seconds.\n";
}
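For comparison, a sketch of the same incremental backup idea built on the core File::Find, File::Path and File::Copy modules instead of manual recursion with a global directory handle. The sub name backup_tree is hypothetical, and it builds paths with File::Spec rather than hard-coded backslashes. It relies on File::Find visiting each directory before its contents (the default, non-bydepth order), so the destination directory exists before files are copied into it.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Mirror every file under $source into the same relative location under
# $dest, skipping files whose copy is already at least as new as the
# original. Returns how many files were copied.
sub backup_tree {
    my ($source, $dest) = @_;
    my $copied = 0;
    find({
        no_chdir => 1,    # $File::Find::name is a full path; no chdir needed
        wanted   => sub {
            my $from = $File::Find::name;
            my $to   = $from eq $source
                ? $dest
                : File::Spec->catfile($dest, File::Spec->abs2rel($from, $source));
            if (-d $from) {
                make_path($to);    # directories are visited before their contents
                return;
            }
            return if -e $to && (stat $to)[9] >= (stat $from)[9];
            copy($from, $to) or die "Error copying $from: $!";
            $copied++;
        },
    }, $source);
    return $copied;
}

# Demo: a small source tree, backed up twice.
my $src = tempdir(CLEANUP => 1);
my $dst = File::Spec->catdir(tempdir(CLEANUP => 1), 'backup');
make_path(File::Spec->catdir($src, 'sub'));
for my $rel ('a.txt', File::Spec->catfile('sub', 'b.txt')) {
    my $path = File::Spec->catfile($src, $rel);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    print {$fh} "data for $rel\n";
    close $fh;
}
my $first  = backup_tree($src, $dst);    # copies both files
my $second = backup_tree($src, $dst);    # nothing changed, so nothing is copied
print "first run: $first, second run: $second\n";
```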
from: http://perlmeme.org/faqs/file_io/directory_listing.html
#!/usr/bin/perl
use strict;
use warnings;
my $directory = '/tmp';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
print "$file\n";
}
The following example (based on a code sample from perldoc -f readdir) gets all the files (not directories) beginning with a period from the open directory. The filenames are found in the array @dots.
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/tmp';
opendir(DIR, $dir) or die $!;
my @dots
    = grep {
        /^\./ # Begins with a period
        && -f "$dir/$_" # and is a file
    } readdir(DIR);
# Loop through the array printing out the filenames
foreach my $file (@dots) {
    print "$file\n";
}
closedir(DIR);
exit 0;