How do I distinguish a file from a directory in Perl? - perl

I'm trying to traverse through all the subdirectories of the current directory in Perl, and get data from those files. I'm using grep to get a list of all files and folders in the given directory, but I don't know which of the values returned is a folder name and which is a file with no file extention.
How can I tell the difference?

You can use a -d file test operator to check if something is a directory. Here's some of the commonly useful file test operators
-e File exists.
-z File has zero size (is empty).
-s File has nonzero size (returns size in bytes).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
See perlfunc manual page for more
Also, try using File::Find which can recurse directories for you. Here's a sample which looks for directories....
sub wanted {
if (-d) {
print $File::Find::name." is a directory\n";
}
}
find(\&wanted, $mydir);

print "$file is a directory\n" if ( -d $file );

Look at the -X operators:
perldoc -f -X
For directory traversal, use File::Find, or, if you're not a masochist, use my File::Next module which makes an iterator for you and doesn't require crazy callbacks. In fact, you can have File::Next ONLY return files, and ignore directories.
use File::Next;
my $iterator = File::Next::files( '/tmp' );
while ( defined ( my $file = $iterator->() ) ) {
print $file, "\n";
}
# Prints...
/tmp/foo.txt
/tmp/bar.pl
/tmp/baz/1
/tmp/baz/2.txt
/tmp/baz/wango/tango/purple.txt
It's at http://metacpan.org/pod/File::Next

my $dh = opendir(".");
my #entries = grep !/^\.\.?$/, readdir($dh);
closedir $dh;
foreach my $entry (#entries) {
if(-f $entry) {
# $entry is a file
} elsif (-d $entry) {
# $entry is a directory
}
}

my #files = grep { -f } #all;
my #dirs = grep { -d } #all;

It would be easier to use File::Find.

Related

How can Perl verify that a path is a directory yet not be able to call opendir() on the very same?

I'm trying to find empty subdirectories and delete them. I'm sure there are better ways to achieve this (I'm a poor programmer and relatively new with Perl) but even so, I'd like to understand what's wrong with my approach.
use strict;
use warnings;
use File::Basename;
use File::Find
my $lambda2 = sub
{
my $path = $File::Find::name;
if ( -d $path )
{
print("Directory: ", $path, "\n");
# Define anonymous function to test if directory is empty
my $hasContent = sub
{
my $directory = shift;
opendir ( my $dh, $directory );
return scalar ( grep { $_ ne "." && $_ ne ".." } readdir ( $dh ) );
};
# Remove item if it is an empty directory
if ( ! $hasContent->( $path ) )
{
rmdir( $path );
}
}
};
my $directory = "/Users/username/testdir/";
find( { wanted => $lambda2, no_chdir => 1 }, $directory );
If testdir has an empty subdirectory called testsubdir, say, I get the seemingly contradictory response:
Directory: /Users/username/testdir
Directory: /Users/username/testdir/testsubdir
Can't opendir(/Users/username/testdir/testsubdir): No such file or directory
The printing of the latter directory implies that it passed the -d check, but the subsequent error message says there is no such directory. As far as I can see nothing occurs inbetween.
The code's removing directories under find's feet, so to speak.
The simplest fix: change find to finddepth, for postorder traversal, since
it invokes the &wanted function for a directory after invoking it for the directory's contents.
(original emphasis)   Then it won't attempt to invoke wanted on the directory just removed.
Or, merely collect the list of empty directories in find and delete them after find completes.
Let's throw in some logging statements and see what is happening:
my $lambda2 = sub {
my $path = $File::Find::name;
if ( -d $path ) {
print("Directory: ", $path, "\n");
my $hasContent = sub {
my $directory = shift;
opendir ( my $dh, $directory );
return scalar ( grep { $_ ne "." && $_ ne ".." } readdir ( $dh ) );
};
my $hc = $hasContent->($path);
print STDERR "hc($path) = $hc\n";
if (! $hc) {
print STDERR "Deleting $path\n";
rmdir( $path );
}
}
};
$ mkdir -p /Users/username/testdir/testsubdir
$ perl subdir.pl
Directory: /Users/username/testdir
hc(/Users/username/testdir) = 1
Directory: /Users/username/testdir/testsubdir
hc(/Users/username/testdir/testsubdir) = 0
Deleting /Users/username/testdir/testsubdir
Can't opendir(/Users/username/testdir/testsubdir): No such file or directory
at subdir.pl line 26.
So the code is more or less working as designed, it's just that File::Find is trying to walk /Users/username/testdir/testsubdir after you have deleted it.
Using du and awk.
List all empty directories under $target_directory
du $target_directory| awk '$1=="0"{print $2}'
Remove all empty directories under $target_directory
du $target_directory| awk '$1=="0"{system("rmdir "$2);}'

Check if the nested directory structure is empty using perl

My requirement is to check if a nested directory structure is having any binary file or not.
The directory structure looks something like this:
DIR-A
|
|--DIR-X
| |
| |--DIR-X1
| |--DIR-X2
|
|--DIR-Y
| |
| |--DIR-Y1
| |--DIR-Y2
| |--DIR-Y3
|
|--DIR-Z
| |
| |--DIR-Z1
| |--DIR-Z2
| |--DIR-Z3
At any point in time there can be more directories at Level-1 or Level-2 i.e. there can be some more directories i.e. DIR-P, DIR-Q etc at level-1 and there can be DIR-X3 or DIR-Y4 at level-2.
I have written a sample code but it exits if it finds DIR-X1, Ideally it should exit if there is a binary file inside the directory.
#!/usr/bin/perl
my $someDir = "/path/of/DIR-A";
my #files = ();
my $file;
my $i=0;
opendir(DIR, "$someDir") or die "Cant open $someDir: $!\n";
#files = readdir(DIR);
foreach $file(#files)
{
unless ($file =~ /^[.][.]?\z/)
{
print "$i : $file \n";
$i++;
last;
}
}
if ($i != 0)
{
print "The directory contains files! \n";
exit 1;
}
else
{
print "This DIR-A is Empty! \n";
exit 0;
}
closedir(DIR);
Please suggest me get to the expected solution as below:
read DIR-A
print SUCCESS, if none of the nested directories have a binary file.
print ERROR, if at least one of the nested directories has a binary file.
Thanks!
Use File::Find::Rule
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find::Rule;
my $someDir = "/path/of/DIR-A";
my #files = File::Find::Rule->file()
->name('*.bin')
->in($someDir);
This will get you all files with the extension '.bin'.
If you need to perform a per file test to check that they are 'binary' then you can use grep on your list of #files.
my #files = grep {-B} File::Find::Rule->file()
->in($someDir);
print "Binary files found\n" if #files;
Also:
use strict; use warnings;. It's good.
Code formatting is a really good thing. perltidy -pbp makes it easy.
I am unclear as to what a binary file is for your test. I am assuming that any file found in the directory structure traversed is a binary file. Using File::Find, which is a core module:
use File::Find;
my $error = 0;
find(\&wanted, #ARGV);
if( $error ) {
print "ERROR, $error files found\n";
}
else {
print "SUCCESS\n";
}
sub wanted {
if( -f $_ ) {
$error++;
}
}
You may add any test to the wanted function. The find function will invoke the function provided for each file found in the list of directories that is also passed, which will be traversed recursively in depth-first search order (much like the find command does.) Passing it #ARGV you may invoke the script with a list of directories as required (maybe using shell expansion like DIR-*.)
The test function will get the file name being traversed in $_, while the current working directory is set to the directory that contains the file.
You can use below script to find if binary file exist or not recursively.
#! /usr/bin/env perl
use warnings;
use strict;
use File::Find;
my $path="/path/of/DIR-A";
sub find_binary {
my $file = $File::Find::name;
if (-B $file && ! -d $file) {
print "ERROR: At least one of the nested directories has a binary file : $file\n";
exit;
}
}
find(\&find_binary,$path);
print("SUCCESS: None of the nested directories have a binary file. \n");
Use (warning: my) module File::Globstar:
use v5.10;
use File::Globstar qw(globstar);
my $dir = $ARGV[0] // '.';
say join "\n", grep { -B && ! -d } globstar "$dir/**";
If you want the list of files in a list, assign it instead of printing it:
my #nondirs = grep { -B && ! -d } globstar "$dir/**";
If you know the extender of the files, you can also do this:
my #nondirs = grep { -B && ! -d } globstar "$dir/**/*.png";
Note that the file test -B produces a truthy value for empty files which is maybe not what you want. In that case change the test to -B && -s && ! -d.

Perl directory is getting past if(! -d) statement? [readdir results]

Okay so I have a program that basically looks into a passed in directory, if any file names match a pattern I will make a directory and move that specific file and any that matches it (regardless of extension) into that directory. Now if they don't match I should move them into the PassedInDir/misc/ directory.
I have a condition in both cases to avoid passing in any directory (as my program isn't ready to deal with those yet) something like if( ! -d $fp).
Everything works fine when I run it the first time in the directory. However when I run it again on the same directory (which should now only contain directories) I get the Error Could not move file assignmentZ to destination DataB/misc at projectSorter.pl line 16.. AssignmentZ is a directory however its somehow getting past the (!-d) in the second case.
#!/usr/bin/perl -w
use File::Copy;
if(#ARGV < 1){
print "\nUsage: proj6.pl <directory>\n\n";
exit;
}
die("\nDirectory $ARGV[0] does not exist\n\n") if( ! -e $ARGV[0]);
opendir( DIR, $ARGV[0]) or die("\nCould not open directory $ARGV[0]\n\n");
while(($fp = readdir(DIR))){
if($fp =~ m/proj(.*)\./){
(! -d "$ARGV[0]/assignment$1") && (mkdir "$ARGV[0]/assignment$1");
move("$ARGV[0]/$fp" , "$ARGV[0]/assignment$1") or die("Could not move file $fp to destination $ARGV[0]/assignment$1");
}
elsif(! -d $fp){ #gets past here!!!
(! -d "$ARGV[0]/misc") && (mkdir "$ARGV[0]/misc");
move("$ARGV[0]/$fp" , "$ARGV[0]/misc") or die("Could not move file $fp to destination $ARGV[0]/misc");
}
}
It is the only directory to do it out of the ones previously made by running my program once. I am curious about why this is happening.
$fp as set by readdir is relative to scanned directory. chdir to the scanned directory or prepend the scanned directory name for -d test.
You use "$ARGV[0]/$fp" as argument to move function.
perldoc -f readdir
readdir DIRHANDLE
Returns the next directory entry for a directory opened by
"opendir". […]
If you're planning to filetest the return values out of a
"readdir", you'd better prepend the directory in question.
Otherwise, because we didn't "chdir" there, it would have been
testing the wrong file.
Some suggestions.
‣ Don't use the -w flag with Perl. Some modules turn warnings off to do their work but the -w flag is global. With it, they will report warnings that should be ignored.
‣ Always have these two lines at the top of every script.
use strict;
use warnings;
These will catch a lot of errors in your code. See perldoc strict and perldoc warnings for more details.
‣ Use glob() or Find::Find instead of opendir/readdir/closedir.
‣ Use make_path() from File::Path instead of mkdir.
‣ Use an if statement for conditional execution instead of &&.
‣ Place blank lines in your code to make reading it easier.
File::Find and File::path are standard modules that come installed with Perl. For a list of the standard modules, see perldoc perlmodlib.
#!/usr/bin/perl
# --------------------------------------
# pragmas
use strict;
use warnings;
# --------------------------------------
# modules
use File::Copy;
use File::Path qw( make_path );
# --------------------------------------
# main
# make sure there is something to work on
if(#ARGV < 1){
print "\nUsage: proj6.pl <directory>\n\n";
exit;
}
# arguments should be directories
for my $src_dir ( #ARGV ){
# validate the source directory
die("\n$src_dir does not exist\n\n") if( ! -e $src_dir);
die("\n$src_dir is not a directory\n\n") if( ! -d $src_dir);
# move proj* files
for my $proj ( glob( "$src_dir/proj*" )){
# get the proj number
( my $number ) = $proj =~ m/proj(.*)\./;
# get the destination directory
my $dst_dir = "$src_dir/assignment$number";
# create the directory where it goes
if( ! -d $dst_dir ){
make_path( $dst_dir ) or die "could not make path $dst_dir";
}
# move the file
move( $proj, $dst_dir ) or die( "could not move file $proj to destination $dst_dir" );
} # end of $proj files
# move other files
for my $file ( grep { ! -d } glob( "$src_dir/*" )){
# get the destination directory
my $dst_dir = "$src_dir/misc";
# create the directory where it goes
if( ! -d $dst_dir ){
make_path( $dst_dir ) or die "could not make path $dst_dir";
}
# move the file
move( $file, $dst_dir ) or die( "could not move file $file to destination $dst_dir" );
} # end other files
} # end of src_dir

Perl script is not recognizing all directories using -d flag

I am having issues getting my Perl script to recognize some subdirectories when traversing through my file system.
Please note that this is part of a homework assignment, and I am unable to use modules of any kind. I have attempted to troubleshoot this on my own for quite some time, and am now at a bit of a roadblock.
I am attempting to traverse a file structure, capture the names of all of the files, directories, and subdirectories into their respective arrays, and print them out in the illustrated format below:
Directory: ./
Files: file1.text file2.pl file3.text
Directories: subdir1 subdir2 subdir3
Directory: subdir1
Files: file3.text file4.pl
Directories: subdir42
...and so on.
I have attempted to use recursion as a solution to this, but my teacher indicated that recursion was not the appropriate way to handle this problem.
I have managed to print, in the appropriate format, the current working directory, and subdirectories within the current working directory.
For some reason, when I change the current code block to
if (-d $entry){
next if $entry =~/^\./;
push(#subdirs,"$entry");
push(#dirs,"$currdir/$entry");
}
elsif(-f $entry) {
push(#files,"$entry");
}
It will omit some of the subdirectories.
Please see the entire script below.
#!/usr/bin/perl
use strict;
use warnings;
sub traverse {
my #dirs = ('.');
my #subdirs;
my #files;
while(my $currdir = shift #dirs){
opendir(my $dirhandle, $currdir) or die $!;
while( my $entry = readdir $dirhandle){
if (-d -r $entry){
next if $entry =~/^\./;
push(#subdirs,"$entry");
push(#dirs,"$entry");
}
else {
push(#files,"$entry");
}
}
print "Directory: $currdir\n";
print "Directories: ";
print "#subdirs";
print"\n";
print "Files: ";
foreach my $curfile (#files) {
next if $curfile eq '.' or $curfile eq '..';
if ($curfile =~ /(\S*\.delete)/){
unlink "$currdir/$curfile";
}
$curfile =~ s/txt$/text/;
print "$curfile ";
}
print "\n";
close $dirhandle;
undef #files;
undef #subdirs;
}
return;
}
traverse();
And the current output:
Directory: .
Directories: test dir_3 test2
Files: recursion.text count_files.pl testing.text thelastone.pl testing.pl prog5_test.pl program51.pl program5.pl recursion.pl recurse.text prog52.pl
dirs.pl
Directory: test
Directories:
Files: testfile1.text prog5_test.pl stilltesting program5.pl testfile2.text dirs.pl
Directory: dir_3
Directories:
Files:
Directory: test2
Directories:
Files: file2.text moretesting file3.text
stilltesting and moretesting should be recognized as directories.
if (-d $entry)
should be
if (-d "$currdir/$entry")
$entry is just a name in a directory. -d needs an actual path.

How can I compare file list from a tar archive and directory?

I am still learning Perl. Can anyone please suggest me the Perl code to compare files from .tar.gz and a directory path.
Let's say I have tar.gz backup of following directory path which I have taken few days back.
a/file1
a/file2
a/file3
a/b/file4
a/b/file5
a/c/file5
a/b/d/file and so on..
Now I want to compare files and directories under this path with the tar.gz backup file.
Please suggest Perl code to do that.
See Archive::Tar.
The Archive::Tar and File::Find modules will be helpful. A basic example is shown below. It just prints information about the files in a tar and the files in a directory tree.
It was not clear from your question how you want to compare the files. If you need to compare the actual content, the get_content() method in Archive::Tar::File will likely be needed. If a simpler comparison is adequate (for example, name, size, and mtime), you won't need much more than methods used in the example below.
#!/usr/bin/perl
use strict;
use warnings;
# A utility function to display our results.
sub Print_file_info {
print map("$_\n", #_), "\n";
}
# Print some basic information about files in a tar.
use Archive::Tar qw();
my $tar_file = 'some_tar_file.tar.gz';
my $tar = Archive::Tar->new($tar_file);
for my $ft ( $tar->get_files ){
# The variable $ft is an Archive::Tar::File object.
Print_file_info(
$ft->name,
$ft->is_file ? 'file' : 'other',
$ft->size,
$ft->mtime,
);
}
# Print some basic information about files in a directory tree.
use File::Find;
my $dir_name = 'some_directory';
my #files;
find(sub {push #files, $File::Find::name}, $dir_name);
Print_file_info(
$_,
-f $_ ? 'file' : 'other',
-s,
(stat)[9],
) for #files;
Perl is kind of overkill for this, really. A shell script would do fine. The steps you need to take though:
Extract the tar to a temporary folder somewhere.
diff -uR the two folders and redirect the output somewhere (or perhaps pipe to less as appropriate)
Clean up the temporary folder.
And you're done. Shouldn't be more than 5-6 lines. Something quick and untested:
#!/bin/sh
mkdir $TEMP/$$
tar -xz -f ../backups/backup.tgz $TEMP/$$
diff -uR $TEMP/$$ ./ | less
rm -rf $TEMP/$$
Heres an example that checks to see if every file that is in an archive, also exists in a folder.
# $1 is the file to test
# $2 is the base folder
for file in $( tar --list -f $1 | perl -pe'chomp;$_=qq["'$2'$_" ]' )
do
# work around bash deficiency
if [[ -e "$( perl -eprint$file )" ]]
then
echo " $file"
else
echo "no $file"
fi
done
This is how I tested this:
I removed / renamed config, then ran the following:
bash test Downloads/update-dnsomatic-0.1.2.tar.gz Downloads/
Which gave the output of:
"Downloads/update-dnsomatic-0.1.2/"
no "Downloads/update-dnsomatic-0.1.2/config"
"Downloads/update-dnsomatic-0.1.2/update-dnsomatic"
"Downloads/update-dnsomatic-0.1.2/README"
"Downloads/update-dnsomatic-0.1.2/install.sh"
I am new to bash / shell programming, so there is probably a better way to do this.
This might be a good starting point for a good Perl program. It does what the question asked for though.
It was just hacked together, and ignores most of the best practices for Perl.
perl test.pl full \
Downloads/update-dnsomatic-0.1.2.tar.gz \
Downloads/ \
update-dnsomatic-0.1.2
#! /usr/bin/env perl
use strict;
use 5.010;
use warnings;
use autodie;
use Archive::Tar;
use File::Spec::Functions qw'catfile catdir';
my($action,$file,$directory,$special_dir) = #ARGV;
if( #ARGV == 1 ){
$file = *STDOUT{IO};
}
if( #ARGV == 3 ){
$special_dir = '';
}
sub has_file(_);
sub same_size($$);
sub find_missing(\%$);
given( lc $action ){
# only compare names
when( #{[qw'simple name names']} ){
my #list = Archive::Tar->list_archive($file);
say qq'missing file: "$_"' for grep{ ! has_file } #list;
}
# compare names, sizes, contents
when( #{[qw'full aggressive']} ){
my $next = Archive::Tar->iter($file);
my( %visited );
while( my $file = $next->() ){
next unless $file->is_file;
my $name = $file->name;
$visited{$name} = 1;
unless( has_file($name) ){
say qq'missing file: "$name"' ;
next;
}
unless( same_size( $name, $file->size ) ){
say qq'different size: "$name"';
next;
}
next unless $file->size;
unless( same_checksum( $name, $file->get_content ) ){
say qq'different checksums: "$name"';
next;
}
}
say qq'file not in archive: "$_"' for find_missing %visited, $special_dir;
}
}
sub has_file(_){
my($file) = #_;
if( -e catfile $directory, $file ){
return 1;
}
return;
}
sub same_size($$){
my($file,$size) = #_;
if( -s catfile($directory,$file) == $size ){
return $size || '0 but true';
}
return; # empty list/undefined
}
sub same_checksum{
my($file,$contents) = #_;
require Digest::SHA1;
my($outside,$inside);
my $sha1 = Digest::SHA1->new;
{
open my $io, '<', catfile $directory, $file;
$sha1->addfile($io);
close $io;
$outside = $sha1->digest;
}
$sha1->add($contents);
$inside = $sha1->digest;
return 1 if $inside eq $outside;
return;
}
sub find_missing(\%$){
my($found,$current_dir) = #_;
my(#dirs,#files);
{
my $open_dir = catdir($directory,$current_dir);
opendir my($h), $open_dir;
while( my $elem = readdir $h ){
next if $elem =~ /^[.]{1,2}[\\\/]?$/;
my $path = catfile $current_dir, $elem;
my $open_path = catfile $open_dir, $elem;
given($open_path){
when( -d ){
push #dirs, $path;
}
when( -f ){
push #files, $path, unless $found->{$path};
}
default{
die qq'not a file or a directory: "$path"';
}
}
}
}
for my $path ( #dirs ){
push #files, find_missing %$found, $path;
}
return #files;
}
After renaming config to config.rm, adding an extra char to README, changing a char in install.sh, and adding a file .test. This is what it outputted:
missing file: "update-dnsomatic-0.1.2/config"
different size: "update-dnsomatic-0.1.2/README"
different checksums: "update-dnsomatic-0.1.2/install.sh"
file not in archive: "update-dnsomatic-0.1.2/config.rm"
file not in archive: "update-dnsomatic-0.1.2/.test"