How can I find the newest created file in a directory? - perl

Is there an elegant way in Perl to find the newest file in a directory (newest by modification date)?
What I have so far searches for the files I need, gets each one's modification time, pushes the filename and modification time into an array, and then sorts it.
There must be a better way.

Your way is the "right" way if you need a sorted list (and not just the first; see Brian's answer for that). If you don't fancy writing that code yourself, use this:
use File::DirList;
my @list = File::DirList::list('.', 'M');
Personally I wouldn't go with the ls -t method - that involves forking another program and it's not portable. Hardly what I'd call "elegant"!
Regarding rjray's hand-coded solution, I'd change it slightly:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my @files = map { [ stat "$DIR/$_", $_ ] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
sub rev_by_date { $b->[9] <=> $a->[9] }
my @sorted_files = sort rev_by_date @files;
After this, @sorted_files contains the sorted list, where the 0th element is the newest file, and each element itself contains a reference to the results of stat, with the filename itself in the last element:
my @newest = @{$sorted_files[0]};
my $name = pop(@newest);
The advantage of this is that it's easier to change the sorting method later, if desired.
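For example, swapping in a different comparison sub is a one-line change. A sketch sorting by file size instead (element 7 of stat's list is the size in bytes):
sub rev_by_size { $b->[7] <=> $a->[7] }
my @sorted_by_size = sort rev_by_size @files;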
EDIT: here's an easier-to-read (but longer) version of the directory scan, which also ensures that only plain files are added to the listing:
my @files;
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
while (defined (my $file = readdir($DH))) {
my $path = $DIR . '/' . $file;
next unless (-f $path); # ignore non-files - automatically does . and ..
push(@files, [ stat(_), $path ]); # re-uses the stat results from '-f'
}
closedir($DH);
NB: readdir()'s result is tested with defined() because a file called '0' would otherwise terminate the loop early if you wrote while (my $file = readdir($DH)).
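For instance:
while (my $file = readdir($DH)) { ... } # stops early at a file named '0'
while (defined(my $file = readdir($DH))) { ... } # safe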

You don't need to keep all of the modification times and filenames in a list, and you probably shouldn't. All you need to do is look at one file and see if it's newer than the newest one you've previously seen:
{
opendir my $dh, $dir or die "Could not open $dir: $!";
my( $newest_name, $newest_time ) = ( undef, 2**31 -1 );
while( defined( my $file = readdir( $dh ) ) ) {
my $path = File::Spec->catfile( $dir, $file );
next if -d $path; # skip directories, or anything else you like
( $newest_name, $newest_time ) = ( $file, -M _ ) if( -M $path < $newest_time );
}
print "Newest file is $newest_name\n";
}

You could try using the shell's ls command:
@list = `ls -t`;
$newest = $list[0];

Assuming you know the $DIR you want to look in:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
# $sorted_files[0] is the most-recently modified. If it isn't the actual
# file-of-interest, you can iterate through @sorted_files until you find
# the interesting file(s).
The grep that wraps the readdir filters out the "." and ".." special files in a UNIX(-ish) filesystem.

If you can't let ls do the sorting for you as @Nathan suggests, then you can optimize your process by keeping only the newest modification time and associated filename seen thus far, replacing them every time you find a newer file in the directory. There's no need to keep any files around that you know are older than the newest one seen so far, and certainly no need to sort them, since you can detect the newest one while reading from the directory.
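A minimal sketch of that single-pass approach (assuming $dir holds the directory of interest; uses the mtime field from stat):
opendir my $dh, $dir or die "Cannot open $dir: $!";
my ($newest_name, $newest_mtime) = (undef, -1);
while (defined(my $file = readdir $dh)) {
my $path = "$dir/$file";
next unless -f $path; # plain files only
my $mtime = (stat _)[9]; # re-use the stat from -f
($newest_name, $newest_mtime) = ($file, $mtime) if $mtime > $newest_mtime;
}
closedir $dh;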

The subject is old, but maybe someone will still try this - it isn't portable (Unix-like systems only), but it's quite simple and works:
chdir $directory or die "cannot change directory";
my $newest_file = `bash -c 'ls -t | head -1'`;
chomp $newest_file;
print "$newest_file \n";

Related

What is the most efficient way to open/act upon all of the files in a directory?

I need to perform my script (a search) on all the files of a directory. Here are the methods which work. I am just asking which is best. (I need file names of form: parsedchpt31_4.txt)
Glob:
my $parse_corpus; #(for all options)
##glob (only if all files in same directory as script?):
my @files = glob("parsed"."*.txt");
foreach my $file (@files) {
open($parse_corpus, '<', "$file") or die $!;
... all my code...
}
Readdir with while and conditions:
##readdir:
my $dir = '.';
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
next unless (-f "$dir/$file"); ##Ensure it's a file
next unless ($file =~ m/^parsed.*\.txt/); ##Ensure it's a parsed file
open($parse_corpus, '<', "$file") or die "Couldn't open directory $!";
... all my code...
}
Readdir with foreach and grep:
##readdir+grep:
my $dir = '.';
opendir(DIR, $dir) or die $!;
foreach my $file (grep {/^parsed.*\.txt/} readdir (DIR)) {
next unless (-f "$dir/$file"); ##Ensure it's a file
open($parse_corpus, '<', "$file") or die "Couldn't open directory $!";
... all my code...
}
File::Find:
##File::Find
my $dir = "."; ##current directory: could be (include quotes): '/Users/jon/Desktop/...'
my @files;
find(\&open_file, $dir); ##built in function
sub open_file {
push @files, $File::Find::name if(/^parsed.*\.txt/);
}
foreach my $file (@files) {
open($parse_corpus, '<', "$file") or die $!;
...all my code...
}
Is there another way? Is it good to enclose my entire script in the loops? Is it okay that I don't use closedir? I'm passing this off to others, and I'm not sure where their files will be (so I may not be able to use glob).
Thanks a lot, hopefully this is the right place to ask this.
The best or most efficient approach depends on your purposes and the larger context. Do you mean best in terms of raw speed, simplicity of the code, or something else? I'm skeptical that memory considerations should drive this choice. How many files are in the directory?
For sheer practicality, the glob approach works fairly well. Before resorting to anything more involved, I'd ask whether there is a problem.
If you're able to use other modules, another approach is to let someone else worry about the grubby details:
use File::Util qw();
my $fu = File::Util->new;
my @files = $fu->list_dir($dir, qw(--with-paths --files-only));
Note that File::Find performs a recursive search descending into all subdirectories. Many times you don't want or need that.
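If you do use File::Find but want only the top level, one option (a sketch using File::Find's preprocess hook) is to filter out subdirectory entries so find() never descends:
use File::Find;
find({
wanted => sub { print "$File::Find::name\n" if -f },
preprocess => sub { grep { !-d } @_ }, # drop subdirectories before descent
}, $dir);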
I would also add that I dislike your two readdir examples because they comingle different pieces of functionality: (1) getting file names, and (2) processing individual files. I would keep those jobs separate.
my $dir = '.';
opendir(my $dh, $dir) or die $!; # Use a lexical directory handle.
my @files =
grep { -f }
map { "$dir/$_" }
grep { /^parsed.*\.txt$/ }
readdir($dh);
for my $file (@files){
...
}
I think using a while loop is the safer answer. Why? Because loading all the file names into an array could mean a large memory usage, and using line-by-line operation avoids that problem.
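For illustration, a sketch of that line-by-line style applied to the same filtering job:
my $dir = '.';
opendir(my $dh, $dir) or die $!;
while (defined(my $entry = readdir $dh)) {
next unless $entry =~ /^parsed.*\.txt$/;
my $file = "$dir/$entry";
next unless -f $file;
# ... process $file here, one at a time ...
}
closedir($dh);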
I prefer readdir to glob, but that's probably more a matter of taste.
If performance is an issue, one could say that the -f check is unnecessary for any file with the .txt extension.
I find that a recursive directory walking function using the perfect partners opendir/readdir and File::chdir (my fav CPAN module, great for cross-platform) allows one to easily and clearly manipulate anything in a directory including subdirectories if desired (if not, omit the recursion).
Example (a simple deep ls):
#!/usr/bin/env perl
use strict;
use warnings;
use File::chdir; #Provides special variable $CWD
# assign $CWD sets working directory
# can be local to a block
# evaluates/stringifies to absolute path
# other great features
walk_dir(shift);
sub do_something {
print shift . "\n";
}
sub walk_dir {
my $dir = shift;
local $CWD = $dir;
opendir my $dh, $CWD or die "Cannot open $CWD: $!"; # lexical opendir, so no closedir needed
print "In: $CWD\n";
while (defined(my $entry = readdir $dh)) {
next if ($entry =~ /^\.+$/);
# other exclusion tests
if (-d $entry) {
walk_dir($entry);
} elsif (-f $entry) {
do_something($entry);
}
}
}

Recursive Perl detail need help

I think this is a simple problem, but I've been stuck on it for some time now! I need a fresh pair of eyes on this.
The thing is, I have this code in Perl:
#!c:/Perl/bin/perl
use CGI qw/param/;
use URI::Escape;
print "Content-type: text/html\n\n";
my $directory = param ('directory');
$directory = uri_unescape ($directory);
my @contents;
readDir($directory);
foreach (@contents) {
print "$_\n";
}
#------------------------------------------------------------------------
sub readDir(){
my $dir = shift;
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
if(-d $dir.$file)
{
#print $dir.$file. " ----- DIR\n";
readDir($dir.$file);
}
push @contents, ($dir . $file);
}
closedir(DIR);
}
I've tried to make it recursive. I need to have all the files of all the directories and subdirectories, with the full path, so that I can open the files in the future.
But my output only returns the files in the current directory and the files in the first directory that it finds. If I have 3 folders inside the directory, it only shows the first one.
Ex. of cmd call:
"perl readDir.pl directory=C:/PerlTest/"
Thanks
Avoid wheel reinvention, use CPAN.
use Path::Class::Iterator;
my $it = Path::Class::Iterator->new(
root => $dir,
breadth_first => 0
);
until ($it->done) {
my $f = $it->next;
push @contents, $f;
}
Make sure that you don't let people set $dir to something that will let them look somewhere you don't want them to look.
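For example, a minimal (and deliberately strict) guard might look like this; the allowed-character pattern is only an illustrative assumption, not a complete defense:
my $dir = uri_unescape(param('directory'));
die "Suspicious directory name\n"
if $dir =~ /\.\./ or $dir !~ m{^[\w./-]+\z};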
Your problem is the scope of the directory handle DIR. DIR has global scope, so each recursive call to readDir is using the same DIR; when you closedir(DIR) and return to the caller, the caller does a readdir on a closed directory handle and everything stops. The solution is to use a local directory handle:
sub readDir {
my ($dir) = @_;
opendir(my $dh, $dir) or die $!;
while(my $file = readdir($dh)) {
next if($file eq '.' || $file eq '..');
my $path = $dir . '/' . $file;
if(-d $path) {
readDir($path);
}
push(@contents, $path);
}
closedir($dh);
}
Also notice that you would be missing a directory separator if (a) it wasn't already at the end of $directory, or (b) it wasn't added on every recursive call. AFAIK, slashes will be internally converted to backslashes on Windows, but you might want to use a path-mangling module from CPAN anyway (I only care about Unix systems so I don't have any recommendations).
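For what it's worth, the core File::Spec module handles that joining portably, for example:
use File::Spec;
my $path = File::Spec->catfile($dir, $file); # 'dir/file' on Unix, 'dir\file' on Windows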
I'd also recommend that you pass a reference to @contents to readDir rather than leaving it as a global variable, fewer errors and less confusion that way. And don't use parentheses on sub definitions unless you know exactly what they do and what they're for. Some sanity checking and scrubbing on $directory would be a good idea as well.
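A sketch of that reference-passing version (same logic as above, hypothetical signature):
my @contents;
readDir($directory, \@contents);
sub readDir {
my ($dir, $contents) = @_;
opendir(my $dh, $dir) or die $!;
while (defined(my $file = readdir $dh)) {
next if $file eq '.' || $file eq '..';
my $path = "$dir/$file";
readDir($path, $contents) if -d $path;
push @$contents, $path;
}
closedir($dh);
}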
There are many modules that are available for recursively listing files in a directory.
My favourite is File::Find::Rule
use strict;
use Data::Dumper;
use File::Find::Rule;
my $dir = shift; # get directory from command line
my @files = File::Find::Rule->in( $dir );
print Dumper( \@files );
This sends the list of files into an array (which is what your program was doing).
$VAR1 = [
'testdir',
'testdir/file1.txt',
'testdir/file2.txt',
'testdir/subdir',
'testdir/subdir/file3.txt'
];
There are loads of other options, like only listing files with particular names. Or you can set it up as an iterator, which is described in "How can I use File::Find in Perl?".
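For example, the iterator form looks roughly like this (->start and ->match are part of File::Find::Rule's documented interface):
my $rule = File::Find::Rule->file->name('*.txt')->start($dir);
while (defined(my $file = $rule->match)) {
print "$file\n";
}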
If you want to stick to modules that come with Perl Core, have a look at File::Find.

Perl program help on opendir and readdir

So I have a program that I want to use to clean some text files. The program asks the user to enter the full pathway of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (specified by the user), and then clean them in the way I need. I have already written the script to clean the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?
I'm not certain what you want, so I made some assumptions:
When you say clean the text file, you mean delete the text file
The names of the files you want to write into are formed by a pattern.
So, if I'm right, try something like this:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
my @dir = readdir DIR;
foreach (@dir) {
next if (($_ eq '.') || ($_ eq '..'));
my $path = "$user_supplied_directory/$_";
# Reads the content of the original file
open FILE, $path;
my $contents = do { local $/; <FILE> }; # slurp the whole file, not just the first line
close FILE;
# Here you supply the new filename
my $new_filename = $path . ".new";
# Writes the content to the new file
open FILE, '>'.$new_filename;
print FILE $contents;
close FILE;
# Deletes the old file
unlink $path;
}
I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach $file (@files) {
foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "#dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
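For example (a sketch along those lines):
my @files = grep { !/^\.\.?$/ && -f "$user_supplied_directory/$_" } readdir DIR;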
I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
our @Perls = ();
{
my $perl_rx = qr { ^ perl [\d.] + $ }x;
for my $dir (split(/:/, $ENV{PATH})) {
### scanning: $dir
my $is_absolute = ($dir =~ m{^/});
my $dirpath = $is_absolute ? $dir : "$cwd/$dir";
unless (chdir($dirpath)) {
warn "can't cd to $dirpath: $!\n";
next;
}
opendir(my $dot, ".") || next;
while (defined($_ = readdir($dot))) {
next unless /$perl_rx/o;
### considering: $_
next unless -f;
next unless -x _;
### saving: $_
push @Perls, "$dir/$_";
}
}
}
{
my $two_dots = qr{ [.] .* [.] }x;
if (grep /$two_dots/, @Perls) {
@Perls = grep /$two_dots/, @Perls;
}
}
{
my (%seen, $dev, $ino);
@Perls = grep {
($dev, $ino) = stat $_;
! $seen{$dev, $ino}++;
} @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.
You can do the following, which allows the user to supply their own directory or, if no directory is specified, defaults to a designated location.
The example shows the use of opendir and readdir, stores all the files in the directory in the @files array, and stores only files that end with '.txt' in the @keys array. The while loop ensures that the full paths to the files are stored in the arrays.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;
my $dir = shift || "/some/default/directory";
opendir(my $dh, $dir) || die "Can't open $dir: $!";
while ( readdir $dh ) {
push( @files, "$dir/$_");
}
# store ".txt" files in new array
foreach $file ( @files ) {
push( @keys, $file ) if $file =~ /(\S+\.txt\z)/g;
}
# Move files to new location, even if it's across different devices
for ( @keys ) {
move($_, "/some/other/directory/") || die "Couldn't move files: $!\n";
}
See the perldoc of File::Copy for more info.

How can I list all files in a directory using Perl?

I usually use something like
my $dir="/path/to/dir";
opendir(DIR, $dir) or die "can't open $dir: $!";
my @files = readdir DIR;
closedir DIR;
or sometimes I use glob, but anyway, I always need to add a line or two to filter out . and .. which is quite annoying.
How do you usually go about this common task?
my @files = grep {!/^\./} readdir DIR;
This will exclude all the dotfiles as well, but that's usually What You Want.
I often use File::Slurp. Benefits include: (1) it dies automatically if the directory does not exist; (2) it excludes . and .. by default. Its behavior is like readdir in that it does not return full paths.
use File::Slurp qw(read_dir);
my $dir = '/path/to/dir';
my @contents = read_dir($dir);
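If you do want full paths, read_dir also takes a prefix option (per the File::Slurp documentation) that prepends the directory name to each entry:
my @paths = read_dir($dir, prefix => 1); # e.g. '/path/to/dir/file.txt'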
Another useful module is File::Util, which provides many options when reading a directory. For example:
use File::Util;
my $dir = '/path/to/dir';
my $fu = File::Util->new;
my @contents = $fu->list_dir( $dir, '--with-paths', '--no-fsdots' );
I will normally use the glob method:
for my $file (glob "$dir/*") {
#do stuff with $file
}
This works fine unless the directory has lots of files in it. In those cases you have to switch back to readdir in a while loop (putting readdir in list context is just as bad as the glob):
opendir my $dh, $dir
or die "could not open $dir: $!";
while (my $file = readdir $dh) {
next if $file =~ /^[.]/;
#do stuff with $file
}
Often though, if I am reading a bunch of files in a directory, I want to read them in a recursive manner. In those cases I use File::Find:
use File::Find;
find sub {
return if /^[.]/;
#do stuff with $_ or $File::Find::name
}, $dir;
If some of the dotfiles are important,
my @files = grep !/^\.\.?$/, readdir DIR;
will only exclude . and ..
When I just want the files (as opposed to directories), I use grep with a -f test:
my @files = grep { -f } readdir $dir;
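One caveat: the bare -f tests names relative to the current working directory, so this works as written only if the handle was opened on '.'. Otherwise, prefix the directory name (a sketch, where $path is the directory you opened):
my @files = grep { -f "$path/$_" } readdir $dh;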
Thanks Chris and Ether for your recommendations. I used the following to read a listing of all files (excluding directories) into an array, from a directory handle referencing a directory other than my current directory. The array was always missing one file when I did not use the full path in the grep statement:
use File::Slurp;
print "\nWhich folder do you want to replace text? " ;
chomp (my $input = <>);
if ($input eq "") {
print "\nNo folder entered exiting program!!!\n";
exit 0;
}
opendir(my $dh, $input) or die "\nUnable to access directory $input!!!\n";
my @dir = grep { -f "$input\\$_" } readdir $dh;

How to get nested directories contents in Perl

I'm trying to write a script which would process certain files. The data are organized like this: there is a folder (let's call it X) where my script will be placed. In this same folder there is a subfolder called 'data'. This contains several more subfolders with various names, and each of these contains many files (no other subfolders, just files). I need to process all files in a subfolder (more specifically, run a function on each file) and then merge the results for all files in the subfolder, so for each folder there is one result (no matter how many files it contains).
The problem is, I'm not able to get to the files so that I can run my function on them. What I have now is this:
$dirname = "data";
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
while( ($dirname2 = readdir(DIR)) )
{
next if $dirname2 eq ".";
next if $dirname2 eq "..";
opendir ( DIR2, $dirname2 ) || die "Error in opening dir $dirname2\n";
while( ($file = readdir(DIR2)) )
{
next if $file eq ".";
next if $file eq "..";
print( "file:$file\n" );
}
closedir(DIR2);
}
closedir(DIR);
It always fails with the message "Error in opening dir alex". 'alex' happens to be the first directory in the data directory. My question is: where is the problem? Is this even the correct way to achieve what I'm trying to do? I'm also worried that this may fail if there is a file directly in the data folder, since I cannot open it with opendir, or can I?
PS: sorry for the horrible Perl code - I'm still trying to learn this language.
Thanks,
Peter
You can try File::Path - Create or remove directory trees
Running your program, I think you have to specify the full path when opening a directory, i.e.:
opendir ( DIR2, $dirname."\\".$dirname2 ) || die "Error in opening dir $dirname2\n"; # running code on Windows
It will work, try it.
You can use File::Find to find files in nested directories.
Are you sure the folder contains only folders? Add an additional check:
next if !(-d $dirname2);
Here is a slightly cleaned up version of what was posted in the question.
use strict;
use warnings;
use autodie;
use File::Spec::Functions qw'catdir catfile';
my $dirname = "data";
{
opendir my $dir_h, $dirname;
while( my $dirname2 = readdir($dir_h) ){
next if $dirname2 eq ".";
next if $dirname2 eq "..";
$dirname2 = catdir( $dirname, $dirname2 );
next unless -d $dirname2;
opendir my $dir_h2, $dirname2;
while( my $file = readdir($dir_h2) )
{
next if $file eq ".";
next if $file eq "..";
$file = catfile($dirname2,$file);
if( -f $file ){
print( "file:$file\n" );
}
}
# $dir_h2 automatically closes here
}
# $dir_h automatically closes here
}
If you are going to run it on Perl versions earlier than 5.12.0 you should wrap the while loop's conditional with defined().
while( my $dirname2 = readdir($dir_h) ){
while( defined( my $dirname2 = readdir($dir_h) ) ){