Perl program help on opendir and readdir - perl

I have a program that should clean some text files. It asks the user for the full path of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (specified by the user), and then clean them the way I need. I have already written the script that cleans the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?

I'm not certain what you want, so I made some assumptions:
When you say "clean the text file", you mean delete the text file.
The names of the files you want to write to follow a pattern.
If I'm right, try something like this:
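chomp(my $user_supplied_directory = <STDIN>);
opendir(my $dh, $user_supplied_directory)
    or die "Can't open $user_supplied_directory: $!";
my @dir = readdir $dh;
closedir $dh;
foreach (@dir) {
    next if $_ eq '.' || $_ eq '..';
    # readdir returns bare names, so prepend the directory
    my $path = "$user_supplied_directory/$_";
    next unless -f $path;
    # Reads the whole content of the original file
    open my $in, '<', $path or die "Can't read $path: $!";
    my $contents = do { local $/; <$in> };
    close $in;
    # Here you supply the new filename
    my $new_filename = "$path.new";
    # Writes the content to the new file
    open my $out, '>', $new_filename or die "Can't write $new_filename: $!";
    print $out $contents;
    close $out or die "Can't close $new_filename: $!";
    # Deletes the old file
    unlink $path or warn "Can't delete $path: $!";
}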

I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach my $file (@files) {
    foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "@dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
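The notes above can be put together in a small helper. This is only a sketch: the name list_files is made up for illustration, and it assumes you want the plain files directly inside the directory, not those in subfolders.

```perl
use strict;
use warnings;

# Return the full paths of the plain files directly inside $dir.
# (list_files is a hypothetical helper name, not part of any module.)
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    # readdir returns bare names, so test and return "$dir/$name";
    # the -f test also throws out ".", ".." and any subfolders
    my @paths = map { "$dir/$_" } grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort @paths;
}

# Usage: print every plain file in each directory named on the command line
print "$_\n" for map { list_files($_) } @ARGV;
```

Each returned path can then be handed straight to whatever cleaning routine you already have.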

I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
use Cwd qw(getcwd);
my $cwd = getcwd();    # presumably set elsewhere in the larger program
our @Perls = ();
{
    my $perl_rx = qr { ^ perl [\d.] + $ }x;
    for my $dir (split(/:/, $ENV{PATH})) {
        ### scanning: $dir
        my $absolute = ($dir =~ m{^/});
        my $dirpath = $absolute ? $dir : "$cwd/$dir";
        unless (chdir($dirpath)) {
            warn "can't cd to $dirpath: $!\n";
            next;
        }
        opendir(my $dot, ".") || next;
        while ($_ = readdir($dot)) {
            next unless /$perl_rx/o;
            ### considering: $_
            next unless -f;
            next unless -x _;
            ### saving: $_
            push @Perls, "$dir/$_";
        }
    }
}
{
    # prefer names with two dots (e.g. perl5.10.1) if there are any
    my $two_dots = qr{ [.] .* [.] }x;
    if (grep /$two_dots/, @Perls) {
        @Perls = grep /$two_dots/, @Perls;
    }
}
{
    # drop duplicates that are the same file (same device and inode)
    my (%seen, $dev, $ino);
    @Perls = grep {
        ($dev, $ino) = stat $_;
        ! $seen{$dev, $ino}++;
    } @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.

You can do the following, which allows the user to supply their own directory or, if no directory is specified by the user, it defaults to a designated location.
The example uses opendir and readdir, stores every entry of the directory in the @files array, and stores only the files that end with '.txt' in the @keys array. The while loop ensures that the full path to each file is stored.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;
my $dir = shift || "/some/default/directory";
opendir(my $dh, $dir) || die "Can't open $dir: $!";
my (@files, @keys);
while ( readdir $dh ) {
    push( @files, "$dir/$_" );
}
closedir $dh;
# store ".txt" files in new array
foreach my $file ( @files ) {
    push( @keys, $file ) if $file =~ /\.txt\z/;
}
# Move files to new location, even if it's across different devices
for ( @keys ) {
    move( $_, "/some/other/directory/" ) || die "Couldn't move $_: $!\n";
}
See the perldoc of File::Copy for more info.

Related

Perl to find the extension of file

I have a program that takes directory name as input from user and searches all files inside the directory and prints the contents of file. Is there any way so that I can read the extension of file and read the contents of file that are of specified extension? For example, it should read contents of file that is in ".txt" format.
My code is
use strict;
use warnings;
use File::Basename;
#usr/bin/perl
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while ( my $file = readdir(DIR) ) {
next if ( $file =~ m/^\./ );
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open( my $fh, '<', $filepath ) or die "unable to open the $file $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
}
To get just the ".txt" files, you can use a file test operator (-f: regular file) and a regex. Because readdir returns bare names, prepend the directory before testing:
my @files = grep { -f "$dir/$_" && /\.txt$/ } readdir DIR;
Otherwise, you can look for just text files using Perl's -T file test operator, which guesses whether a file is text:
my @files = grep { -T "$dir/$_" } readdir DIR;
Or you can even try this:
my @files = grep { -f } glob("$dir/*.txt");
You're pretty close here. You have a main loop that looks like this:
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
# do stuff
}
See where you're skipping loop iterations if the filename starts with a dot. That's an excellent place to put any other skip requirements that you have - like skipping files that don't end with '.txt'.
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
next unless $file =~ /\.txt$/i; # skip non-text files
# do stuff
}
In the same way as your original test checked for the start of the string (^) followed by a literal dot (\.), we're now searching for a dot (\.) followed by txt and the end of the string ($). Note that I've also added the /i option to the match operator to make the match case-insensitive - so that we match ".TXT" as well as ".txt".
It's worth noting that the extension of a file is a terrible way to work out what the file contains.
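If you need the extension itself rather than a yes/no match, the core File::Basename module (already loaded in the question's code) can split it off. A minimal sketch follows; the helper name extension_of is hypothetical, and it treats only the part after the last dot as the extension.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Return the lower-cased extension of a path without the dot,
# or an empty string if the name has no dot.
# (extension_of is a made-up name for this illustration.)
sub extension_of {
    my ($path) = @_;
    # fileparse returns (name, directory, suffix); the pattern
    # matches the last ".something" at the end of the basename
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.]*/);
    return '' unless length $suffix;
    return lc(substr($suffix, 1));
}

# Usage: report the extension of each path given on the command line
print "$_: ", extension_of($_), "\n" for @ARGV;
```

Inside the readdir loop you could then write next unless extension_of($file) eq 'txt'; in place of the regex test.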
Try this. The code below does what you expect.
use warnings;
use strict;
print "Enter the directory name: ";
chomp(my $dir = <>);
print "Enter the file extension type: "; # type only the extension, e.g. txt or rtf
chomp(my $ext = <>);
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
# \Q...\E quotes the extension; \. and \z anchor it to the end of the name
my @files = grep { /\.\Q$ext\E\z/ } readdir($dh);
closedir($dh);
foreach my $ech (@files) {
    open(my $fh, '<', "$dir/$ech") or die "Cannot open $dir/$ech: $!";
    print <$fh>;
    close($fh);
}
I store all the file names from the directory in one array, keeping only those with the requested extension by using grep. Then I open each file inside the foreach loop and print it.

Perl loop through text files in a directory, locate file based on file name and print content

How do i loop through files in a directory, locate file based on file name and print file content?
Please see below code:
files in directory:
1234.txt
345.txt
234.txt
Code:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my @DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift @DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}
This should do (untested):
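my $dir = "/path/to/dir";
opendir( DIR, $dir ) or die "Cannot open $dir: $!";
while ( defined( my $entry = readdir( DIR ) ) ) {
    if ( $entry eq $filenameImLookingFor ) {
        # readdir returns the bare name, so open it relative to $dir
        open( FILE, '<', "$dir/$entry" ) or die "Cannot open $dir/$entry: $!";
        my @lines = <FILE>;
        close( FILE );
        print( join( '', @lines ) );
    }
}
closedir( DIR );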
The code in your question:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my @DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift @DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}
Will do this:
First it will handily find all files in directory "LOCATION" that do not begin with a period. Then it will iterate in a rather odd loop over each file name. The normal version of this loop would be:
for my $file (@DATA)
Then it will attempt to open the directory "LOCATION" again. This will likely fail, because "LOCATION" is a directory. Since you do not check the return value with die, this error will be silent.
What you probably want is to use
if ($file eq "235.txt") {
open my $fh, "<", "LOCATION/$file" or die $!;
print <$fh>;
}
This part:
while (FILE)
Is not actually checking the return value of readline(), it is checking whether the file handle is returning a true value. As near as I can tell on my system, it does return a true value even if the open failed. Which means of course that the loop will run indefinitely. What you probably meant was
while (<FILE>)
However, as explained earlier, this will only result in the error "readline() on unopened file handle FILE" since the open statement cannot open a directory.
Your check
if ($file eq "235")
Will never be true, since you said your file names had a .txt extension. You might instead do
if ($file eq "235.txt")
Which should work.
If you wanted to be clever, you could include your check directly in the grep:
my @files = grep { $_ eq "235.txt" } readdir DIR;
And since perl can use the <> diamond operator to print files listed in the @ARGV array, you can even do this
@ARGV = grep { $_ eq "235.txt" } @ARGV;
print <>;
Assuming you call the script with:
perl script.pl dir/*.txt
This is, of course, just the long version of doing:
perl -pe0 235.txt
Which is the long version of
cat 235.txt
So, I get the feeling you are trying to do something other than what your code implies.

Odd file handling in perl on OS X

I'm very much a perl newbie, so bear with me.
I was looking for a way to recurse through folders in OS X and came across this solution: How to traverse all the files in a directory...
I modified perreal's answer (see code below) slightly so that I could specify the search folder in an argument; i.e. I changed my @dirs = ("."); to my @dirs = ($ARGV[0]);
But for some reason this wouldn't work -- it would open the folder, but would not identify any of the subdirectories as folders, apart from '.' and '..', so it never actually went beyond the specified root.
If I actively specified the folder (e.g. \Volumes\foo\bar) it still doesn't work. But, if I go back to my @dirs = ("."); and then sit in my desired folder (foo\bar) and call the script from its own folder (foo\boo\script.pl) it works fine.
Is this 'expected' behaviour? What am I missing?!
Many thanks,
Mat
use warnings;
use strict;
my @dirs = (".");
my %seen;
while (my $pwd = shift @dirs) {
opendir(DIR,"$pwd") or die "Cannot open $pwd\n";
my @files = readdir(DIR);
closedir(DIR);
foreach my $file (@files) {
if (-d $file and ($file !~ /^\.\.?$/) and !$seen{$file}) {
$seen{$file} = 1;
push @dirs, "$pwd/$file";
}
next if ($file !~ /\.txt$/i);
my $mtime = (stat("$pwd/$file"))[9];
print "$pwd $file $mtime";
print "\n";
}
}
The problem is that you are using the -d operator on the file basename without its path. Perl will look in the current working directory for a directory of that name and return true if it finds one there, when it should be looking in $pwd.
This solution changes $file to always hold the full name of the file or directory, including the path.
use strict;
use warnings;
my @dirs = (shift);
my %seen;
while (my $pwd = shift @dirs) {
opendir DIR, $pwd or die "Cannot open $pwd\n";
my @files = readdir DIR;
closedir DIR;
foreach (@files) {
next if /^\.\.?$/;
my $file = "$pwd/$_";
next if $seen{$file};
if ( -d $file ) {
$seen{$file} = 1;
push @dirs, $file;
}
elsif ( $file =~ /\.txt$/i ) {
my $mtime = (stat $file)[9];
print "$file $mtime\n";
}
}
}
Use the full path with -d:
-d "$pwd/$file"
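For comparison, the core File::Find module does the directory walking itself and avoids the basename-versus-path mistake. The sketch below is an alternative to the hand-written queue above, not the code from either answer; the sub name txt_files_with_mtime is made up for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Collect [path, mtime] pairs for every .txt file under $root.
# (txt_files_with_mtime is a hypothetical helper name.)
sub txt_files_with_mtime {
    my ($root) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the basename and find() has already
        # changed into its directory; $File::Find::name is the full path.
        return unless -f $_ && /\.txt$/i;
        push @found, [ $File::Find::name, (stat _)[9] ];
    }, $root);
    return @found;
}

# Usage: perl script.pl /some/folder
printf "%s %d\n", @$_ for map { txt_files_with_mtime($_) } @ARGV;
```

Because find() changes into each directory before calling the callback, the -f test on the bare name is valid there, which is exactly what the original loop was missing.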

How can I add a prefix to all filenames under a directory?

I am trying to prefix a string (reference_) to the names of all the *.bmp files in all the directories as well as subdirectories. The first time we run the silk script, it creates the directories and subdirectories, and under each subdirectory it stores each mobile application's screenshot with a .bmp extension.
When I run the automated silk script a second time, it will again create the *.bmp files in all the subdirectories. Before running the script the second time, I want to prefix all the *.bmp files with the string reference_.
For example first_screen.bmp to reference_first_screen.bmp,
I have the directory structure as below:
C:\Image_Repository\BG_Images\second
...
C:\Image_Repository\BG_Images\sixth
having first_screen.bmp and first_screen.bmp files etc...
Could any one help me out?
How can I prefix all the image file names with reference_ string?
When I run the script for second time, the Perl script in silk will take both the images from the sub-directory and compare them both pixel by pixel. I am trying with code below.
Could you please guide me how can I proceed to complete this task.
#!/usr/bin/perl -w
&one;
&two;
sub one {
use Cwd;
my $dir ="C:\\Image_Repository";
#print "$dir\n";
opendir(DIR,"+<$dir") or "die $!\n";
my @dir = readdir DIR;
#$lines=@dir;
delete $dir[-1];
print "$lines\n";
foreach my $item (@dir)
{
print "$item\n";
}
closedir DIR;
}
sub two {
use Cwd;
my $dir1 ="C:\\Image_Repository\\BG_Images";
#print "$dir1\n";
opendir(D,"+<$dir1") or "die $!\n";
my @dire = readdir D;
#$lines=@dire;
delete $dire[-1];
#print "$lines\n";
foreach my $item (@dire)
{
#print "$item\n";
$dir2="C:\\Image_Repository\\BG_Images\\$item";
print $dir2;
opendir(D1,"+<$dir2") or die " $!\n";
my @files=readdir D1;
#print "@files\n";
foreach $one (@files)
{
$one="reference_".$one;
print "$one\n";
#rename $one,Reference_.$one;
}
}
closedir DIR;
}
I tried open call with '+<' mode but I am getting compilation error for the read and write mode.
When I am running this code, it shows the files in BG_images folder with prefixed string but actually it's not updating the files in the sub-directories.
You don't open a directory for writing. Just use opendir without the mode parts of the string:
opendir my($dir), $dirname or die "Could not open $dirname: $!";
However, you don't need that. You can use File::Find to make the list of files you need.
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use File::Find;
use File::Find::Closures qw(find_regular_files);
use File::Spec::Functions qw(catfile);
my( $wanted, $reporter ) = find_regular_files;
find( $wanted, $ARGV[0] );
my $prefix = 'reference_';
foreach my $file ( $reporter->() )
{
my $basename = basename( $file );
if( index( $basename, $prefix ) == 0 )
{
print STDERR "$file already has '$prefix'! Skipping.\n";
next;
}
my $new_path = catfile(
dirname( $file ),
"$prefix$basename"
);
unless( rename $file, $new_path )
{
print STDERR "Could not rename $file: $!\n";
next;
}
print $file, "\n";
}
You should probably check out the File::Find module for this - it will make recursing up and down the directory tree simpler.
You should probably be scanning the file names and modifying those that don't start with reference_ so that they do. That may require splitting the file name up into a directory name and a file name and then prefixing the file name part with reference_. That's done with the File::Basename module.
At some point, you need to decide what happens when you run the script the third time. Do the files that already start with reference_ get overwritten, or do the unprefixed files get overwritten, or what?
The reason the files are not being renamed is that the rename operation is commented out. Remember to add use strict; at the top of your script (as well as the -w option which you did use).
If you get a list of files in an array @files (and the names are base names, so you don't have to fiddle with File::Basename), then the loop might look like:
foreach my $one (@files)
{
my $new = "reference_$one";
print "$one --> $new\n";
rename $one, $new or die "failed to rename $one to $new ($!)";
}
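When the names are full paths rather than base names, the directory and file parts have to be split before the prefix goes on. Here is a minimal sketch of that step; add_prefix is a hypothetical helper, and refusing to overwrite an existing target is just one possible answer to the "third run" question above.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Spec::Functions qw(catfile);

# Rename $path so that its basename starts with $prefix.
# Returns the new path, or the unchanged path if the prefix is already there.
# (add_prefix is a made-up name for this illustration.)
sub add_prefix {
    my ($path, $prefix) = @_;
    my $base = basename($path);
    return $path if index($base, $prefix) == 0;    # already prefixed
    my $new = catfile(dirname($path), $prefix . $base);
    die "$new already exists, refusing to overwrite\n" if -e $new;
    rename $path, $new or die "failed to rename $path to $new ($!)";
    return $new;
}

# Usage: perl script.pl C:\Image_Repository\BG_Images\second\first_screen.bmp
print "$_\n" for map { add_prefix($_, 'reference_') } @ARGV;
```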
With the aid of the GNU find utility (part of findutils, available for Windows):
$ find -iname "*.bmp" | perl -wlne"chomp; ($prefix, $basename) = split(m~\/([^/]+)$~, $_); rename($_, join(q(/), ($prefix, q(reference_).$basename))) or warn qq(failed to rename '$_': $!)"

How can I find the newest created file in a directory?

Is there an elegant way in Perl to find the newest file in a directory (newest by modification date)?
What I have so far is searching for the files I need, getting each one's modification time, pushing the filename and modification time into an array, and then sorting it.
There must be a better way.
Your way is the "right" way if you need a sorted list (and not just the first, see Brian's answer for that). If you don't fancy writing that code yourself, use this
use File::DirList;
my @list = File::DirList::list('.', 'M');
Personally I wouldn't go with the ls -t method - that involves forking another program and it's not portable. Hardly what I'd call "elegant"!
Regarding rjray's hand-coded solution, I'd change it slightly:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my @files = map { [ stat "$DIR/$_", $_ ] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
sub rev_by_date { $b->[9] <=> $a->[9] }
my @sorted_files = sort rev_by_date @files;
After this, @sorted_files contains the sorted list, where the 0th element is the newest file, and each element is a reference to an array holding the results of stat, with the filename itself in the last element:
my @newest = @{$sorted_files[0]};
my $name = pop(@newest);
The advantage of this is that it's easier to change the sorting method later, if desired.
EDIT: here's an easier-to-read (but longer) version of the directory scan, which also ensures that only plain files are added to the listing:
my @files;
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
while (defined (my $file = readdir($DH))) {
    my $path = $DIR . '/' . $file;
    next unless (-f $path); # ignore non-files - automatically does . and ..
    push(@files, [ stat(_), $path ]); # re-uses the stat results from '-f'
}
closedir($DH);
NB: the test for defined() on the result of readdir() is because a file called '0' would cause the loop to fail if you only test for if (my $file = readdir($DH))
You don't need to keep all of the modification times and filenames in a list, and you probably shouldn't. All you need to do is look at one file at a time and see if it's newer than the newest you've previously seen:
use File::Spec;
{
    opendir my $dh, $dir or die "Could not open $dir: $!";
    # -M is the file's age in days, so the smallest value is the newest file
    my( $newest_name, $newest_time ) = ( undef, 2**31 -1 );
    while( defined( my $file = readdir( $dh ) ) ) {
        my $path = File::Spec->catfile( $dir, $file );
        next if -d $path; # skip directories, or anything else you like
        ( $newest_name, $newest_time ) = ( $file, -M _ ) if( -M $path < $newest_time );
    }
    print "Newest file is $newest_name\n";
}
you could try using the shell's ls command:
@list = `ls -t`;
$newest = $list[0];
Assuming you know the $DIR you want to look in:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
# $sorted_files[0] is the most-recently modified. If it isn't the actual
# file-of-interest, you can iterate through @sorted_files until you find
# the interesting file(s).
The grep that wraps the readdir filters out the "." and ".." special files in a UNIX(-ish) filesystem.
If you can't let ls do the sorting for you as @Nathan suggests, then you can optimize your process by only keeping the newest modification time and associated filename seen thus far and replace it every time you find a newer file in the directory. No need to keep any files around that you know are older than the newest one you've seen so far and certainly no need to sort them since you can detect which is the newest one while reading from the directory.
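That single-pass idea might look like the following. It is a sketch only: newest_file is a made-up name, it compares modification times from stat (seconds since the epoch) rather than -M ages, and it considers only plain files.

```perl
use strict;
use warnings;

# Return the path of the most recently modified plain file in $dir,
# or undef if the directory holds no plain files.
# (newest_file is a hypothetical helper name.)
sub newest_file {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Could not open $dir: $!";
    my ($newest_path, $newest_mtime);
    while (defined(my $name = readdir $dh)) {
        my $path = "$dir/$name";
        next unless -f $path;       # also skips . and ..
        my $mtime = (stat _)[9];    # reuse the stat done by -f
        if (!defined $newest_mtime || $mtime > $newest_mtime) {
            ($newest_path, $newest_mtime) = ($path, $mtime);
        }
    }
    closedir $dh;
    return $newest_path;
}

# Usage: perl script.pl /some/directory
for my $dir (@ARGV) {
    my $newest = newest_file($dir);
    if (defined $newest) {
        print "Newest file is $newest\n";
    }
    else {
        print "No plain files in $dir\n";
    }
}
```

Only one path and one timestamp are held at any time, so memory use stays constant however large the directory is.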
Subject is old, but maybe someone will try it - it isn't portable (Unix-like systems only), but it's quite simple and works:
chdir $directory or die "cannot change directory";
my $newest_file = `bash -c 'ls -t | head -1'`;
chomp $newest_file;
print "$newest_file \n";