How to loop through a directory in Perl

I have a directory dir1 containing several hundred files, which are to be processed iteratively by a speech program called HRest. The program is supposed to take each file one by one, process it, and put it in a new directory (dir2 for the first iteration) to be used in the next iteration. My problem is that I don't know whether the way I've looped through the files in dir1, and the way I am running the script (trainhmms.pl dir1 1), is correct.
If the files in dir1 are L1, L2, L3, ..., L500, I want HRest to be executed as
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir1/L1 -M dir2 -S train.scp
for the first file, and as
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir1/L2 -M dir2 -S train.scp
for the next file, and so on for all files. Then in the next call of the script, I want it to change to
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir2/L1 -M dir3 -S train.scp
for the first file, and so on.
Here is the script for the first iteration:
#!/usr/bin/perl
use File::Slurp;
# Usage: trainhmms.pl dir1 1
# dir1: Folder containing models after being initialised by HInit (L1,L2,..,L512)
$file = $ARGV[0];
$iter = $ARGV[1];
my @files = read_dir '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state//trainHMMs/dir1';
for my $file ( @files ) {
    $iter2 = $iter+1;
    $cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$dir[$iter]/$file' -M '$dir[$iter2]' -S train.scp ";
    system("$cmd");
}

You can't just use readdir on a directory string. You have to opendir the string, then readdir from the directory handle that you get, and finally closedir the handle.
You must also remember that readdir returns directory names as well as file names, and the pseudo-directories . and .. too. To filter out just the files, you can use the -f test operator. And it is usually most convenient to chdir to the directory you are reading, so that you don't have to prepend the path to each file name that readdir returns before you do the test.
I don't know what HRest is, but if your command line must be executed from a specific working directory (perhaps to access timedlabels_train.mlf and train.scp) then please say so, as I will have to remove the chdir statement.
Something like this should get you going. I have used autodie, which does automatic checks on file system operations. It saves having to check chdir and opendir explicitly each time with or die $!.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use File::Spec::Functions 'catdir';
my ($file, $iter) = @ARGV;
my $root = '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state/trainHMMs';
my $dir1 = catdir $root, 'dir'.$iter;
my $dir2 = catdir $root, 'dir'.($iter+1);
chdir $dir1;
opendir my ($dh), '.';
my @files = grep -f, readdir $dh;
closedir $dh;
for my $file ( @files ) {
    my $cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$dir1/$file' -M '$dir2' -S train.scp";
    system($cmd);
}
Update
Here is an alternative version that avoids chdir so that the current working directory remains unchanged.
I have added the secondary loop that was in your bash script. I have also added a print statement so that you can see each command before it is executed.
To allow the system call to go ahead, just delete or comment out the next statement.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use File::Spec::Functions qw/ catdir catfile /;
STDOUT->autoflush;
my $root = '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state/trainHMMs';
for my $iter (1 .. 4) {
    my $dir1 = catdir $root, 'dir'.$iter;
    my $dir2 = catdir $root, 'dir'.($iter+1);
    opendir my ($dh), $dir1;
    while (my $node = readdir $dh) {
        my $file = catfile($dir1, $node);
        next unless -f $file;
        my $cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$file' -M '$dir2' -S train.scp";
        print $cmd, "\n";
        next; # Remove for full functionality
        system($cmd);
    }
    closedir $dh;
}

You can do this:
my @files = <$path/*>;
foreach my $filename ( reverse(@files) ) {
    ...
}

Related

Need to loop through directory - delete lines that match pattern

Need to loop through a Unix directory and search each line in each file. If there is a pattern match, delete the line. I was not able to get the line deletion to work, so I'm just trying to find the pattern and replace it with another.
I'm populating an array with file names and looping through them. I have a counter set, and it's looking at each of the lines in each file (at least the count is correct).
#!/usr/bin/perl -l
#!/usr/bin/perl -i.bak -w
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
# 4-1-19
# pfs
# remove lines with dental code ADD2999 from all HMO Max load files in /home/hsxxx/dat/feeLoad directory
$| = 1;
chdir "/home/hstrn/dat/feeLoad";
chdir;
my $dir = </home/hstrn/dat/feeLoad/>;
my #files;
my $count=0;
opendir(DIR, $dir) or die "Cannot open directory $dir, Perl says $!\n";
while (my $file = readdir DIR)
{
    push @files, "$dir/$file" unless -d "$dir/$file";
}
closedir DIR;
{
    local @ARGV = @files;
    while (<>)
    {
        s/ADD2999/sometext/g;
        $count++;
    }
    print "Total lines read are: $count";
}
I would expect all ADD2999 strings to be replaced with sometext.
To remove lines, you need to avoid printing them when writing to the new file. Your code doesn't write to any files at all???
This might be a job for existing tools.
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f \
-exec perl -i~ -ne'print if !/ADD2999/' {} +
Use -i instead of -i~ if you want to avoid creating a backup. I prefer creating the backups, then deleting them once I've confirmed that everything is ok.
Show the files that are going to get deleted:
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f -name '*~'
Delete the files:
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f -name '*~' -delete
This would be my first attempt at the problem, but it could use some more corner case checking. E.g. how do you handle write-protected files, etc. It also assumes that the files are small enough to fit into memory for processing.
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use File::Spec;
use File::Slurper qw(read_text write_text);
my $count = 0;
my $dir = "tmp";
opendir(my $dh, $dir);
while (readdir $dh) {
    # skip anything that shouldn't be processed
    next if /^\.\.?$/; # . && ..
    my $file = File::Spec->catfile($dir, $_);
    next if -d $file; # directories
    # slurp file content and replace text
    my $content = read_text($file);
    my $matches = ($content =~ s/ADD2999/sometext/g);
    # count lines
    my @eols = ($content =~ /(\n)/g);
    $count += @eols;
    # replace original file if contents were modified
    write_text($file, $content) if $matches;
}
closedir($dh);
print "Total lines read are: $count\n";
exit 0;
Test run:
$ wc -l tmp/test*.txt
5 tmp/test2.txt
6 tmp/test.txt
11 total
$ fgrep ADD2999 tmp/*.txt
tmp/test2.txt:asddsada ADD2999 asdsadasd
tmp/test2.txt:21312398 ADD2999 dasdas
$ perl dummy.pl
Total lines read are: 11
$ fgrep ADD2999 tmp/*.txt
$ fgrep sometext tmp/*.txt
tmp/test2.txt:asddsada sometext asdsadasd
tmp/test2.txt:21312398 sometext dasdas
If the files are large, you will need to use a line-by-line processing approach (only the contents of the loop are shown). That has the side effect that all files will be rewritten, even if they contain no replacements:
# read file and replace text
open(my $ifh, '<', $file);
my $tmpfile = File::Spec->catfile($dir, "$_.$$");
open(my $ofh, '>', $tmpfile);
while (<$ifh>) {
    s/ADD2999/sometext/g;
    print $ofh $_;
}
$count += $.; # total lines in $ifh
close($ofh);
close($ifh);
# replace original file with new file
unlink($file);
rename($tmpfile, $file);

Perl directory is getting past if(! -d) statement? [readdir results]

Okay, so I have a program that looks into a passed-in directory; if any file names match a pattern, I make a directory and move that specific file, and any others that match it (regardless of extension), into that directory. If they don't match, I should move them into the PassedInDir/misc/ directory.
I have a condition in both cases to avoid passing in any directory (as my program isn't ready to deal with those yet), something like if( ! -d $fp).
Everything works fine when I run it the first time on the directory. However, when I run it again on the same directory (which should now only contain directories) I get the error "Could not move file assignmentZ to destination DataB/misc at projectSorter.pl line 16.". assignmentZ is a directory, yet it somehow gets past the (! -d) in the second case.
#!/usr/bin/perl -w
use File::Copy;
if(@ARGV < 1){
    print "\nUsage: proj6.pl <directory>\n\n";
    exit;
}
die("\nDirectory $ARGV[0] does not exist\n\n") if( ! -e $ARGV[0]);
opendir( DIR, $ARGV[0]) or die("\nCould not open directory $ARGV[0]\n\n");
while(($fp = readdir(DIR))){
    if($fp =~ m/proj(.*)\./){
        (! -d "$ARGV[0]/assignment$1") && (mkdir "$ARGV[0]/assignment$1");
        move("$ARGV[0]/$fp" , "$ARGV[0]/assignment$1") or die("Could not move file $fp to destination $ARGV[0]/assignment$1");
    }
    elsif(! -d $fp){ #gets past here!!!
        (! -d "$ARGV[0]/misc") && (mkdir "$ARGV[0]/misc");
        move("$ARGV[0]/$fp" , "$ARGV[0]/misc") or die("Could not move file $fp to destination $ARGV[0]/misc");
    }
}
It is the only one of the directories previously created by running my program that does this. I am curious why this is happening.
$fp as set by readdir is relative to the scanned directory. Either chdir to the scanned directory, or prepend the scanned directory name before the -d test.
You already use "$ARGV[0]/$fp" as the argument to the move function.
perldoc -f readdir
readdir DIRHANDLE
Returns the next directory entry for a directory opened by
"opendir". […]
If you're planning to filetest the return values out of a
"readdir", you'd better prepend the directory in question.
Otherwise, because we didn't "chdir" there, it would have been
testing the wrong file.
Some suggestions.
‣ Don't use the -w flag with Perl. Some modules turn warnings off to do their work but the -w flag is global. With it, they will report warnings that should be ignored.
‣ Always have these two lines at the top of every script.
use strict;
use warnings;
These will catch a lot of errors in your code. See perldoc strict and perldoc warnings for more details.
‣ Use glob() or File::Find instead of opendir/readdir/closedir.
‣ Use make_path() from File::Path instead of mkdir.
‣ Use an if statement for conditional execution instead of &&.
‣ Place blank lines in your code to make reading it easier.
File::Find and File::Path are standard modules that come installed with Perl. For a list of the standard modules, see perldoc perlmodlib.
#!/usr/bin/perl
# --------------------------------------
# pragmas
use strict;
use warnings;
# --------------------------------------
# modules
use File::Copy;
use File::Path qw( make_path );
# --------------------------------------
# main
# make sure there is something to work on
if(@ARGV < 1){
    print "\nUsage: proj6.pl <directory>\n\n";
    exit;
}
# arguments should be directories
for my $src_dir ( @ARGV ){
    # validate the source directory
    die("\n$src_dir does not exist\n\n") if( ! -e $src_dir);
    die("\n$src_dir is not a directory\n\n") if( ! -d $src_dir);
    # move proj* files
    for my $proj ( glob( "$src_dir/proj*" )){
        # get the proj number
        ( my $number ) = $proj =~ m/proj(.*)\./;
        # get the destination directory
        my $dst_dir = "$src_dir/assignment$number";
        # create the directory where it goes
        if( ! -d $dst_dir ){
            make_path( $dst_dir ) or die "could not make path $dst_dir";
        }
        # move the file
        move( $proj, $dst_dir ) or die( "could not move file $proj to destination $dst_dir" );
    } # end of $proj files
    # move other files
    for my $file ( grep { ! -d } glob( "$src_dir/*" )){
        # get the destination directory
        my $dst_dir = "$src_dir/misc";
        # create the directory where it goes
        if( ! -d $dst_dir ){
            make_path( $dst_dir ) or die "could not make path $dst_dir";
        }
        # move the file
        move( $file, $dst_dir ) or die( "could not move file $file to destination $dst_dir" );
    } # end other files
} # end of src_dir

Perl script to compare file contents of dirA with file contents of dirB and output the difference in a separate file

I am just a beginner at perl and I need help with the following.
DirA and DirB have files f1, f2, f3, f4, f5 (not necessarily 5 in number). I need to compare the contents of f1 in DirA with the contents of f1 in DirB and output the result in f1_diff. Similarly, I need to do this for all files in directories A and B. So in the above case, assuming the contents of all files are different, the script will output 5 files (f1_diff, f2_diff, f3_diff, f4_diff, f5_diff), each containing the lines that differ between the DirA and DirB copies of that file.
For example, if
f1 dir A has line1, line2, line3, line4xxxx, line5
f1 dir B has line1, line2xxxx, line3, line4, line5
f1_diff should have
line2 --> line2xxxx
line4xxxx -->line4
Can someone please help me with this.
Update:
I have the below script right now and need to add the following.
Filename: # of different lines
File 1 – 1
File 2 - 30
File 3 – missing in dir1
File 3a – missing in dir2
Secondly, I need the number of differing lines. Can someone help me modify the script to do this?
#!/usr/bin/perl
package Main; {
use strict;
use warnings;
use Cwd;
my $DirA;
my $DirB;
my $y;
print ("\nChoose one of the entries below\n");
print ("e1\e2\e3\n\n");
print("Enter here --->");
my $dir = <>;
chomp($dir);
if ($dir eq "e1"){
$DirA = "./dir1";
$DirB = "./dir2";
}
elsif ($dir eq "e2"){
$DirA = "./dir3";
$DirB = "./dir4";
}
else{
$DirA = "./dir5";
$DirB = "./dir6";
}
opendir my($dh), "$DirA" or die "Couldn't open dir DirA!";
my @files = readdir $dh;
closedir $dh;
system("rm -rf diffs");
system ("mkdir diffs");
foreach my $file (@files) {
    next if -d $file;
    my $diff = `diff -y --suppress-common-lines "$DirA/$file" "$DirB/$file"`;
    open DIFF_FILE, ">$file"."_diff";
    print DIFF_FILE $diff;
    close DIFF_FILE;#}
}
chdir("./diffs/");
my $cwd = cwd();
system("mv ../*_diff .");
foreach(glob('*.*')) {
    unlink if (-f $_) && !-s _;
}
print("Total number of differences = "); system("ls *_diff | wc -l");print("\n");
}
I recommend the use of CPAN modules to achieve this task:
To find files: File::Find
To compare files: File::Compare
To show file differences: Text::Diff
Check whether the output format of Text::Diff is what you need; it offers configuration options.
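As a rough sketch of how those modules could fit together, assuming the directory names DirA and DirB and the _diff suffix from the question (Text::Diff's default unified format is shown, not the side-by-side --> layout asked for):
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare);
use Text::Diff qw(diff);

my ($dir_a, $dir_b) = ('DirA', 'DirB');   # directory names assumed from the question

opendir my $dh, $dir_a or die "Couldn't open $dir_a: $!";
for my $name (readdir $dh) {
    my ($file_a, $file_b) = ("$dir_a/$name", "$dir_b/$name");
    next unless -f $file_a && -f $file_b;     # only compare files present in both
    next if compare($file_a, $file_b) == 0;   # identical files need no _diff output

    my $diff_text = diff($file_a, $file_b);   # unified diff text by default
    open my $out, '>', "${name}_diff" or die "Can't write ${name}_diff: $!";
    print $out $diff_text;
    close $out;
}
closedir $dh;
Text::Diff's diff() also takes a STYLE option (Unified, Context, OldStyle, Table) if a different layout is closer to what you want.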
You could use Text::Diff.
#!/usr/bin/perl
use Text::Diff;
use strict;
use warnings;
opendir my($dh), "DirA" or die "Couldn't open dir DirA!";
my @files = readdir $dh;
closedir $dh;
foreach my $file (@files) {
    next if -d $file;
    my $diff = diff "DirA/$file", "DirB/$file";
    open DIFF_FILE, ">$file"."_diff";
    print DIFF_FILE $diff;
    close DIFF_FILE;
}
It is easy if GNU parallel is available:
ls dirA/f* | parallel 'echo {} - dirB/{/} >>{/}_dif ; diff {} dirB/{/} >> {/}_dif '

How can I detect that a symlink is broken in Perl?

I would like to remove a broken symlink in a directory using Perl.
In my mind I just had to list the files of the directory, test whether each is a symlink (-l), and if that returns false just unlink it.
But it appears that when using readdir to list all the files, my broken symlinks are not recognized as files. Because my link points to nothing, I understand why.
All the files in $myDir are symlinks, either valid or broken.
When I display @files I only get a list of the valid symlinks.
opendir DIR, $myDir;
my @files = grep(/$regexp/,readdir(DIR));
closedir DIR;
print "filenames : #files\n";
There are two main relevant system calls, stat() and lstat(). The lstat() call will tell you that it is a symlink (but on other files, behaves the same as stat()). This allows you to determine that the name is a symlink. The stat() system call follows a symlink to its end, and tells you about the file (or directory) at the end of the link. If the stat() call fails on the symlink, then the symlink is broken or you're trying to access a directory or file where you have no permission.
The Perl file test operators include -l to detect whether a name is a symlink. You can use the Perl functions stat and lstat explicitly. Between these, you should be able to sort out whether a symlink is broken or not - but you should probably plan to write a function to do the job.
You probably don't need to use the readlink Perl function. Beware the underlying system readlink() call; it does not return a null-terminated string!
It is interesting that neither Perl nor its POSIX module supports the realpath() function. However, the PathTools module does support it. If realpath fails on a symlink, the symlink is non-functional (aka broken).
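For illustration, a minimal sketch of the kind of helper function suggested above, built only from lstat and stat; the function name is made up:
use strict;
use warnings;

# True if $name is a symlink whose target cannot be reached
# (either dangling, or unreachable due to permissions).
sub is_broken_symlink {
    my ($name) = @_;
    return 0 unless lstat $name;   # name does not exist at all
    return 0 unless -l _;          # exists, but is not a symlink
    return !stat $name;            # stat follows the link; failure means it dangles
}

print "$_ looks broken\n" for grep { is_broken_symlink($_) } glob '*';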
Here's some code I've used to remove broken links:
chdir $dir or die;
opendir(DIR, '.') or die;
foreach my $link (readdir DIR) {
    next unless -l $link and not -e readlink($link);
    print "Removing broken link $link\n";
    unlink $link;
}
closedir DIR;
Note that it's important that the directory containing the links is the current directory. readdir returns only filenames, and the links might be relative.
Combining lstat with stat:
say "dangling link at $fn" if (lstat $fn and not stat $fn);
update: it works for me...
salva@topo:~/t/dl$ perl -E 'opendir $dh, "."; say $_ for grep { !stat $_ and lstat $_ } readdir $dh'
foo
salva@topo:~/t/dl$ ls -l
total 0
-rw-rw-r-- 1 salva salva 0 2011-07-05 12:34 f
lrwxrwxrwx 1 salva salva 11 2011-07-05 12:00 fii -> /etc/shadow
lrwxrwxrwx 1 salva salva 12 2011-07-05 11:59 foo -> /etc/hjdkshf
Check for broken symlinks (checking only the top level if there are symlinks to symlinks):
use strict;
use warnings;
use autodie;
opendir my $dirh, '.';
while (my $file = readdir $dirh) {
    if ( -l $file ) {
        my $target = readlink $file;
        if ( ! -e $target && ! -l $target ) {
            print "$file -> $target broken\n";
        }
    }
}
Use readlink() and stat() the result.
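A small sketch of that approach; note that a relative link target has to be resolved against the link's own directory before stat can be applied to it (the helper name is illustrative):
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

sub target_missing {
    my ($link) = @_;
    my $target = readlink $link;
    return 0 unless defined $target;   # not a symlink (or readlink failed)

    # a relative target is relative to the directory holding the link
    $target = File::Spec->rel2abs($target, dirname($link))
        unless File::Spec->file_name_is_absolute($target);

    return !stat $target;              # stat failure means the link dangles
}

print "$_ -> dangling\n" for grep { target_missing($_) } glob '*';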
Using the built-in Perl glob function?
For example:
@files = <*>;
foreach $file (@files) {
    print $file . "\n";
}
For a specific $dir:
@files = <$dir*>;
foreach $file (@files) {
    print $file . "\n";
}
A broken symlink is a link (-l) that does not exist (! -e):
perl -e 'print "broken: $_\n" for grep { -l and ! -e } glob("*");'

How do I check for a sub-subdirectory in Perl?

I have a folder called Client which contains many subfolders. I want to create a Perl script to look at each of those subfolders and check for a folder there. If it is there, I want to skip it and move on; if it is not there, I want to create it and do some processing.
How do I go about looping through all of the subfolders and checking for the directory I want? I have found a lot of information on how to get all the files in a folder and/or subfolders, but nothing on checking for a directory within each subfolder.
Augh! Too much complexity in the other answers. The original question doesn't appear to be asking for a recursive traversal. As far as I can see, this is a perfectly sensible solution, and vastly more readable to boot:
foreach my $dir (glob "Client/*") {
    next if ! -d $dir; # skip if it's not a directory
    next if -d "$dir/subfolder"; # skip if subfolder already exists
    mkdir "$dir/subfolder" or die; # create it
    do_some_processing(); # do some processing
}
Seriously folks: opendir/readdir? Really?
It's pretty easy once you break it into steps. Get a list of the subdirectories with glob, then see which ones don't have the second-level directory. If you are using a File::Find-like module, you are probably doing too much work:
#!perl
use strict;
use warnings;
use File::Spec::Functions;
my $start = 'Clients';
my $subdir = 'already_there';
# @queue is the list of directories you need to process
my @queue = grep { ! -d catfile( $_, $subdir ) } # filter for the second level
            grep { -d }                          # filter for directories
            glob catfile( $start, '*' );         # everything below $start
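Continuing with the @queue and $subdir variables from the snippet above, a rough sketch of the follow-up step might be (the processing itself is left as a comment):
for my $dir ( @queue ) {
    my $new_dir = catfile( $dir, $subdir );
    mkdir $new_dir or die "Could not create $new_dir: $!";
    # ... do whatever processing the directory needs here
}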
#!/usr/bin/perl
use strict;
use Fcntl qw( :DEFAULT :flock :seek );
use File::Spec;
use IO::Handle;
my $startdir = shift @ARGV || '.';
die "$startdir is not a directory\n"
unless -d $startdir;
my $verify_dir_name = 'MyDir';
my $dh = new IO::Handle;
opendir $dh, $startdir or
die "Cannot open $startdir: $!\n";
while(defined(my $cont = readdir($dh))) {
    next
        if $cont eq '.' || $cont eq '..';
    my $fullpath = File::Spec->catfile($startdir, $cont);
    next
        unless -d $fullpath && -r $fullpath && -w $fullpath;
    my $verify_path = File::Spec->catfile($fullpath, $verify_dir_name);
    next
        if -d $verify_path;
    mkdir($verify_path, 0755);
    # do whatever other operations you want to $verify_path
}
closedir($dh);
The short answer is: use File::Find.
The long answer is: first write a subroutine that checks whether the folder exists and, if it is not there, creates it and then does the processing needed. Then invoke the find function of the File::Find module with a reference to that subroutine and the starting folder, to process all the subfolders.
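A rough sketch of that approach, with made-up names for the top-level directory and the wanted folder (File::Find recurses everywhere, so the callback below restricts itself to the direct subfolders of Client):
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(dirname);
use File::Spec::Functions qw(catdir);

my $start  = 'Client';      # top-level folder from the question
my $wanted = 'processed';   # name of the folder to check for (assumed)

find( { wanted => \&check_subfolder, no_chdir => 1 }, $start );

sub check_subfolder {
    my $dir = $File::Find::name;
    return unless -d $dir;                   # directories only
    return unless dirname($dir) eq $start;   # only direct subfolders of Client

    my $target = catdir( $dir, $wanted );
    return if -d $target;                    # already there, skip it

    mkdir $target or die "Could not create $target: $!";
    # ... do the extra processing for $dir here
}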