Check if the nested directory structure is empty using Perl

My requirement is to check whether a nested directory structure contains any binary files.
The directory structure looks something like this:
DIR-A
|
|--DIR-X
| |
| |--DIR-X1
| |--DIR-X2
|
|--DIR-Y
| |
| |--DIR-Y1
| |--DIR-Y2
| |--DIR-Y3
|
|--DIR-Z
| |
| |--DIR-Z1
| |--DIR-Z2
| |--DIR-Z3
At any point in time there can be more directories at level 1 or level 2, e.g. DIR-P, DIR-Q etc. at level 1 and DIR-X3 or DIR-Y4 at level 2.
I have written some sample code, but it exits as soon as it finds DIR-X1. Ideally it should exit only if there is a binary file inside the directory.
#!/usr/bin/perl
my $someDir = "/path/of/DIR-A";
my @files = ();
my $file;
my $i=0;
opendir(DIR, "$someDir") or die "Cant open $someDir: $!\n";
@files = readdir(DIR);
foreach $file(@files)
{
unless ($file =~ /^[.][.]?\z/)
{
print "$i : $file \n";
$i++;
last;
}
}
if ($i != 0)
{
print "The directory contains files! \n";
exit 1;
}
else
{
print "This DIR-A is Empty! \n";
exit 0;
}
closedir(DIR);
Please suggest how I can get to the expected solution, as below:
read DIR-A
print SUCCESS, if none of the nested directories have a binary file.
print ERROR, if at least one of the nested directories has a binary file.
Thanks!

Use File::Find::Rule
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find::Rule;
my $someDir = "/path/of/DIR-A";
my @files = File::Find::Rule->file()
->name('*.bin')
->in($someDir);
This will get you all files with the extension '.bin'.
If you need to perform a per-file test to check that they are 'binary', then you can use grep on your list of @files.
my @files = grep {-B} File::Find::Rule->file()
->in($someDir);
print "Binary files found\n" if @files;
Also:
use strict; use warnings; is good practice.
Code formatting is a really good thing. perltidy -pbp makes it easy.

I am unclear as to what a binary file is for your test. I am assuming that any file found in the directory structure traversed is a binary file. Using File::Find, which is a core module:
use File::Find;
my $error = 0;
find(\&wanted, @ARGV);
if( $error ) {
print "ERROR, $error files found\n";
}
else {
print "SUCCESS\n";
}
sub wanted {
if( -f $_ ) {
$error++;
}
}
You may add any test to the wanted function. The find function invokes the function you provide for each file found in the list of directories passed to it, which it traverses recursively in depth-first order (much like the find command does). By passing it @ARGV, you can invoke the script with a list of directories as required (perhaps using shell expansion such as DIR-*).
The test function will get the file name being traversed in $_, while the current working directory is set to the directory that contains the file.
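To make the roles of $_ and $File::Find::name concrete, here is a minimal sketch that builds a throwaway tree with File::Temp and counts the regular files in it. The helper name count_files and the directory and file names are made up for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Count regular files under the given directories (hypothetical helper).
sub count_files {
    my @dirs  = @_;
    my $count = 0;
    find(
        sub {
            # $_ is the bare entry name; File::Find has already changed into
            # the containing directory, so -f $_ resolves correctly.
            return unless -f $_;
            $count++;
            print "found: $File::Find::name\n";    # path including the start dir
        },
        @dirs
    );
    return $count;
}

# Throwaway tree: DIR-X/DIR-X1 stays empty, DIR-Y/DIR-Y1 holds one file.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/DIR-X/DIR-X1", "$root/DIR-Y/DIR-Y1" );
open my $fh, '>', "$root/DIR-Y/DIR-Y1/data.bin" or die "Cannot create file: $!";
close $fh;

my $found = count_files($root);
print $found ? "ERROR, $found files found\n" : "SUCCESS\n";
```

Directories alone never increase the count, so a tree made only of nested empty directories reports SUCCESS.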

You can use the script below to check recursively whether a binary file exists.
#! /usr/bin/env perl
use warnings;
use strict;
use File::Find;
my $path="/path/of/DIR-A";
sub find_binary {
my $file = $File::Find::name;
if (-B $file && ! -d $file) {
print "ERROR: At least one of the nested directories has a binary file : $file\n";
exit 1;
}
}
find(\&find_binary,$path);
print("SUCCESS: None of the nested directories have a binary file. \n");

Use (warning: my) module File::Globstar:
use v5.10;
use File::Globstar qw(globstar);
my $dir = $ARGV[0] // '.';
say join "\n", grep { -B && ! -d } globstar "$dir/**";
If you want the list of files in a list, assign it instead of printing it:
my @nondirs = grep { -B && ! -d } globstar "$dir/**";
If you know the extension of the files, you can also do this:
my @nondirs = grep { -B && ! -d } globstar "$dir/**/*.png";
Note that the file test -B produces a truthy value for empty files which is maybe not what you want. In that case change the test to -B && -s && ! -d.
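The empty-file behaviour is easy to check for yourself. The following is a small sketch that only uses core modules; the file names empty and blob are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# One empty file and one file holding NUL and high bytes.
my $empty  = "$dir/empty";
my $binary = "$dir/blob";
open my $fh, '>', $empty or die "Cannot create $empty: $!";
close $fh;
open $fh, '>:raw', $binary or die "Cannot create $binary: $!";
print {$fh} "\x00\x01\x02\xff" x 64;
close $fh;

for my $file ( $empty, $binary ) {
    my $loose  = -B $file ? 1 : 0;                    # true for the empty file too
    my $strict = ( -B $file && -s $file ) ? 1 : 0;    # also requires non-zero size
    print "$file: -B=$loose, -B && -s=$strict\n";
}
```

Only the non-empty file passes the stricter test.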

Related

How to use the Unix/AIX find command with a pipe in Perl?

I'm trying to use the Unix/AIX find command piped to the head command to return the first file in a directory and assign it to a variable. However, all of my attempts have resulted in all the files that find returns being assigned to the variable without the head command being applied.
Here are my three attempts:
Attempt 1:
$first_file = `/usr/bin/find $my_path -type f -name $potential_file_names | head -n1`;
Attempt 2:
$first_file = `/usr/bin/find $my_path -type f -name $potential_file_names '|' head -n1`;
Attempt 3:
$first_file = `/usr/bin/find $my_path -type f -name $potential_file_names \\| head -n1`;
The $potential_file_names variable is a string with wildcard characters to return any file in the directory that's in the format "fileXXX.txt" where 'XXX' is a three digit number.
$potential_file_names = 'file???.txt';
The first attempt doesn't work because Perl appears to take exception to the pipe, as it returns the error "sh[2]: 0403-057".
First attempt output:
file001.txt
file002.txt
file003.txt
file004.txt
file005.txt
The second and third attempts also fail. The error for them is, "sh[2]: |: not found."
The output for the second and third attempts is the same as the first attempt.
Is it possible to use the find command piped to head to return the first file in the directory I'm searching (in my case, "file001.txt")?
Update
I should mention that the file names may not start with 001, so I'll need the oldest file. The files are created sequentially, so grabbing the first file using find and piping to head -n1 works from the command line outside the script. It needs to be the oldest/first file because I'll be deleting files using a loop later in the script and this needs to find the oldest/first file for each iteration.
Thanks.
Try something like this:
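# '-|' opens the command for reading; the pattern is quoted so the shell does not expand it
open my $exe, '-|', qq{/usr/bin/find $my_path -type f -name '$potential_file_names' | head -n1}
or die qq{Error running command: $!};
my $file = <$exe>;
close $exe;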
Avoid using system and backticks when there are pure Perl equivalents; your code will be more portable and you won't have to worry about nasty shell quoting issues.
If you don't care about subdirectories, you can use readdir to get a list of files inside a particular directory:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = 'foo';
opendir my $dh, $dir or die $!;
my @files = sort { -M "$dir/$b" <=> -M "$dir/$a" }
grep { /^file\d{3}\.txt$/ && -f "$dir/$_" } readdir $dh;
closedir $dh;
print $files[0];
This prints the name of the file with the oldest modified date, although you could certainly use another file test instead.
If you also want to search inside subdirectories, you can use File::Find, which is a core module:
use File::Find;
use File::Spec;
my #files;
my $dir = 'foo';
find(sub { push @files, $File::Find::name if /^file\d{3}\.txt$/ and -f $_; }, $dir);
my @sorted = sort { -M $b <=> -M $a } @files;
print $sorted[0];
This prints the path to the file with the oldest modified date.
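If only the single oldest file is needed, sorting the whole list is not required. Here is a sketch that keeps the smallest modification time seen in one pass over the directory, using stat instead of -M. The helper name oldest_file and the demo file names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the name of the oldest file in $dir matching fileNNN.txt,
# or undef if there is none (hypothetical helper).
sub oldest_file {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my ( $oldest, $oldest_mtime );
    while ( defined( my $name = readdir $dh ) ) {
        next unless $name =~ /^file\d{3}\.txt$/ && -f "$dir/$name";
        my $mtime = ( stat "$dir/$name" )[9];    # modification time, epoch seconds
        if ( !defined $oldest_mtime || $mtime < $oldest_mtime ) {
            ( $oldest, $oldest_mtime ) = ( $name, $mtime );
        }
    }
    closedir $dh;
    return $oldest;
}

# Demo: three files whose modification times are set explicitly.
my $demo_dir = tempdir( CLEANUP => 1 );
my $now      = time;
my %age      = ( 'file007.txt' => 300, 'file008.txt' => 200, 'file009.txt' => 100 );
for my $name ( keys %age ) {
    open my $fh, '>', "$demo_dir/$name" or die "Cannot create $name: $!";
    close $fh;
    utime $now - $age{$name}, $now - $age{$name}, "$demo_dir/$name"
        or die "Cannot set times on $name: $!";
}
print oldest_file($demo_dir), "\n";
```

Calling the helper again after deleting the file it returned gives the next oldest one, which suits a delete loop.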
Okay, some of the answers create a dog's breakfast for follow-on coders, but they do point in the right direction with the File::Find module.
Sample of how I use it.
find (\&wanted, $directory); # start searching the path
sub wanted {
my $file = $File::Find::name;
if (-d $file ) {
$directoryMap{$file} = $file;
return;
}
if (-z $file) {
$zeroHash{$file} = 1;
return;
}
if ($file =~ /(AAF|MXF|NSV|Ogg|RM|SVI|SMI|WMV)$/i) {
my $size = -s $file;
if ($size) {
$hashmap{$file} = $size;
return;
}
else {
$rejectMap{$file} = 1;
return;
}
}
else {
$rejectMap{$file} = 1;
return;
}
}
I use this to look for files with specific extensions and then stuff them into a hash. The whole code can be found on my GitHub in my Perl directory (https://github.com/alexmac131/mediaData). You can change the wanted function to something useful for you.

How to compare two directories and their files in perl

Fred here again with a little issue I'm having that I hope you guys can help me with.
I'm reviewing for midterms and going over an old file I found on here and I wanted to get it working. I can't find it on here anymore but I still have the source code so I'll make another question on it.
So here was his assignment:
Write a Perl script that will compare two directories for differences in regular files. All regular files with the same names should be tested with the Unix command /usr/bin/diff -q, which will determine whether they are identical. A file in dir1 which does not have a similarly named file in dir2 will have its name printed after the string <<<, while a file in dir2 without a corresponding dir1 entry will be prefixed with the string >>>. If two files have the same name but are different, then the file name will be surrounded by > <.
Here is the script:
#!/usr/bin/perl -w
use File::Basename;
@files1 = `/usr/bin/find $ARGV[0] -print`;
chop @files1;
@files2 = `/usr/bin/find $ARGV[1] -print`;
chop @files2;
statement:
for ($i=1; @files1 >= $i; $i++) {
for ($x=1; @files2 >= $x; $x++) {
$file1 = basename($files1[$i]);
$file2 = basename($files2[$x]);
if ($file1 eq $file2) {
shift @files1;
shift @files2;
$result = `/usr/bin/diff -q $files1[$i] $files2[$x]`;
chop $result;
if ($result eq "Files $files1[$i] and $files2[$x] differ") {
print "< $file1 >\n";
next statement;
} else {
print "> $file1 <\n";
}
} else {
if ( !-e "$files1[$i]/$file2") { print ">>> $file2\n";}
unless ( -e "$files2[$x]/$file1") { print "<<< $file1\n";}
}
}
}
This is the output:
> file2 <
>>> file5
<<< file1
The output should be:
> file1 <
> file2 <
<<< file4
>>> file5
I already checked the files to make sure that they all match and such but still having problems. If anyone can help me out I would greatly appreciate it!
First off, always use these:
use strict;
use warnings;
It comes with a short learning curve, but they more than make up for it in the long run.
Some notes:
You should use the File::Find module instead of using a system call.
You start your loops at array index 1. In perl, the first array index is 0. So you skip the first element.
Your loop condition is wrong. @files >= $x means you will iterate to 1 more than max index (normally). You want either $x < @files or $x <= $#files.
You should use chomp, which is a safer version of chop.
Altering the arrays you are iterating over is a sure way to cause yourself some confusion.
Why use if (! -e ...) and then unless (-e ...)? That surely just adds confusion.
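Two of these notes are quick to demonstrate. The sketch below uses made-up list contents to show how chop differs from chomp, and a loop that visits every index from 0 to the last one.

```perl
use strict;
use warnings;

my @names = ( "file1\n", "file2" );    # second entry has no trailing newline

my @chopped = @names;
chop @chopped;     # removes the last character unconditionally
my @chomped = @names;
chomp @chomped;    # removes only a trailing newline

print "chop:  @chopped\n";
print "chomp: @chomped\n";

# Index-based loop covering every element: indices run from 0 to $#chomped.
for my $i ( 0 .. $#chomped ) {
    print "$i: $chomped[$i]\n";
}
```

With chop, the entry that had no newline loses its final character, which is how file names get silently damaged.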
And this part:
$file1 = basename($files1[$i]);
...
if ( !-e "$files1[$i]/$file2" )
Assuming @files1 contains file names and not just directories, this will never match anything. For example:
$file2 = basename("dir/bar.html");
$file1 = basename("foo/bar.html");
-e "foo/bar.html/bar.html"; # does not compute
I would recommend using hashes for the lookup, assuming you only want to match against identical file names and missing file names:
use strict;
use warnings;
use File::Find;
use List::MoreUtils qw(uniq);
my (%files1, %files2);
my ($dir1, $dir2) = @ARGV;
# statement-modifier form avoids the precedence clash between && and =
find( sub { $files1{$_} = $File::Find::name if -f }, $dir1);
find( sub { $files2{$_} = $File::Find::name if -f }, $dir2);
my @all = uniq(keys %files1, keys %files2);
for my $file (@all) {
my $result;
if ($files1{$file} && $files2{$file}) { # file exists in both dirs
$result = qx(/usr/bin/diff -q $files1{$file} $files2{$file});
# ... etc
} elsif ($files1{$file}) { # file only exists in dir1
} else { # file only exists in dir2
}
}
In the find() subroutine, $_ represents the base name, and $File::Find::name the name including path (which is suitable for use with diff). The -f check will assert that you only include regular files in your hash.
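For reference, here is one way the three branches might be filled in. This is only a sketch: it swaps in the core module File::Compare for the external diff -q call, removes duplicates with a %seen hash instead of List::MoreUtils, and returns the report lines rather than printing them. The names compare_dirs and write_file and the demo files are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Compare;
use File::Find;
use File::Temp qw(tempdir);

# Compare regular files in two trees by base name (hypothetical helper).
sub compare_dirs {
    my ( $dir1, $dir2 ) = @_;
    my ( %files1, %files2 );
    find( sub { $files1{$_} = $File::Find::name if -f }, $dir1 );
    find( sub { $files2{$_} = $File::Find::name if -f }, $dir2 );

    my %seen;
    my @all = grep { !$seen{$_}++ } keys %files1, keys %files2;
    my @report;
    for my $file ( sort @all ) {
        if ( $files1{$file} && $files2{$file} ) {
            # compare() returns 0 for identical files and 1 for different ones
            push @report, "> $file <"
                if compare( $files1{$file}, $files2{$file} ) == 1;
        }
        elsif ( $files1{$file} ) {
            push @report, "<<< $file";    # only in dir1
        }
        else {
            push @report, ">>> $file";    # only in dir2
        }
    }
    return @report;
}

# Demo with two throwaway directories.
sub write_file {
    my ( $path, $text ) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
}

my ( $left, $right ) = ( tempdir( CLEANUP => 1 ), tempdir( CLEANUP => 1 ) );
write_file( "$left/file1",  "same\n" );
write_file( "$right/file1", "same\n" );
write_file( "$left/file2",  "one\n" );
write_file( "$right/file2", "two\n" );
write_file( "$left/file4",  "only left\n" );
write_file( "$right/file5", "only right\n" );

print "$_\n" for compare_dirs( $left, $right );
```

Because the hashes are keyed by base name, two files with the same name in different subdirectories of one tree overwrite each other, the same limitation the hash approach above has.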

How to change all occurrences of a char in all files in a directory (and subdirectories ) using Perl

** I have a follow-up question that is marked with '**' **
I was asked to write Perl code that replaces every { with {function(<counter>) and in every replacement the counter should get larger by 1. e.g. first replacement of { will be {function(0) ,
second replacement of { will be {function(1) etc.
It is supposed to do the replacement in every *.c and *.h file in a folder, including subfolders.
I wrote this code :
#!/usr/bin/perl
use Tie::File;
use File::Find;
$counter = 0;
$flag = 1;
@directories_to_search = 'd:\testing perl';
@newString = '{ function('.$counter.')';
$refChar = "{";
finddepth(\&fileMode, @directories_to_search);
sub fileMode
{
my @files = <*[ch]>; # get all files ending in .c or .h
foreach $file (@files) # go through all the .c and .h flies in the directory
{
if (-f $file) # check if it is a file or dir
{
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{' '
tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( @lines )
{
if (s/{/@newString/g)
{
$counter++;
@newString = '{function('.$counter.')';
}
untie @lines; # free @lines
}
}
}
}
The code searches the directory d:\testing Perl and does the replacement but instead of getting
{function(<number>) I get {function(number1) function(number3) function(number5) function(number7) for instance for the first replacement I get
{function(0) function(2) function(4) function(6) and I wanted to get {function(0)
I really don't know what is wrong with my code.
An awk solution or any other Perl solution will also be great!
* I have a follow-up question.
Now I want my Perl program to do the same substitution in all the files, except on lines that contain both a '{'
and a '}'. So I modified the code this way:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Find;
my $dir = "C:/test dir";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
#foreach $filename (@ARGV)
while (<>)
{
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{' '
tie @lines, 'Tie::File', $ARGV or die "Can't read file: $!\n";
#$_='{function(' . $counter++ . ')';
foreach (@lines)
{
if (!( index (@lines,'}')!= -1 )) # if there is a '}' in the same line don't add the macro
{
s/{/'{function(' . $counter++ . ')'/ge;
print;
}
}
untie @lines; # free @lines
}
What I am trying to do is go through all the files in @ARGV that I found in my directory and its subdirectories. For each *.c or *.h file, I want to go line by line and check whether the line contains '}'. If it does, the program should not look for a '{' and should not make the substitution; if it doesn't, the program should substitute '{' with '{function();'.
unfortunately this code does not work.
I'm ashamed to say that I'm trying to make it work all day and still no go.
I would really appreciate some help.
Thank You!!
This is a simple matter of combining a finding method with an in-place edit. You could use Tie::File, but it is really the same end result. Also, needless to say, you should keep backups of your original files, always, when doing edits like these because changes are irreversible.
So, if you do not need recursion, your task is dead simple in Unix/Linux style:
perl -pi -we 's/{/"{ function(" . $i++ . ")"/ge' *.h *.c
Of course, since you seem to be using Windows, the cmd shell won't glob our arguments, so we need to do that manually. And we need to change the quotes around. And also, we need to supply a backup argument for the -i (in-place edit) switch.
perl -pi.bak -we "BEGIN { @ARGV = map glob, @ARGV }; s/{/'{ function(' . $i++ . ')'/ge" *.h *.c
This is almost getting long enough to make a script of.
If you do need recursion, you would use File::Find. Note that this code is pretty much identical in functionality to the one above.
use strict;
use warnings;
use File::Find;
my $dir = "d:/testing perl"; # use forward slashes in paths
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
while (<>) {
s/{/'{ function(' . $counter++ . ')'/ge;
print;
}
Don't be lulled into a false sense of security by the backup option: If you run this script twice in a row, those backups will be overwritten, so keep that in mind.
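For the follow-up requirement (leave lines alone when they contain both '{' and '}'), the substitution itself can be tried out on plain strings before touching any files. The following is a sketch under that assumption; the helper name add_counters and the sample C lines are made up for illustration.

```perl
use strict;
use warnings;

# Insert "function(N)" after each "{", numbering across all lines, but leave
# lines that contain both "{" and "}" untouched (hypothetical helper).
sub add_counters {
    my @lines   = @_;
    my $counter = 0;
    for my $line (@lines) {
        next if $line =~ /\{/ && $line =~ /\}/;    # brace pair on one line: skip
        # /e evaluates the replacement once per match, so every "{" gets its own number
        $line =~ s/\{/'{function(' . $counter++ . ')'/ge;
    }
    return @lines;
}

my @out = add_counters(
    "int main() {\n",
    "    int a[] = { 1, 2 };\n",
    "    if (x) { {\n",
    "}\n",
);
print @out;
```

In the in-place loop, the same `next`-style check would need to print the untouched line before moving on, otherwise that line would be dropped from the output file.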
$ perl -pi -e 's| (?<={) | q#function(# . ($i++) . q#)# |gex' *.c *.h
It can be done in a single line as below:
perl -pi -e 's/({)/"{function(".++$a.")"/ge;' your_file
I have just taken an example input file and tested too.
> cat temp
line-1 { { { {
line-2 { { {
line-3 { {
line-4 {
Now the execution:
> perl -pi -e 's/({)/"{function(".++$a.")"/ge;' temp
> cat temp
line-1 {function(1) {function(2) {function(3) {function(4)
line-2 {function(5) {function(6) {function(7)
line-3 {function(8) {function(9)
line-4 {function(10)
Using awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' and your code as input:
$ awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' file
sub fileMode
{function(0)
my @files = <*[ch]>; # get all files ending in .c or .h
foreach $file (@files) # go through all the .c and .h flies in the directory
{function(1)
if (-f $file) # check if it is a file or dir
{function(2)
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{function(3)' '
tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( @lines )
{function(4)
if (s/{function(5)/@newString/g)
{function(6)
$counter++;
@newString = '{function(7)function('.$counter.')';
}
untie @lines; # free @lines
}
}
}
}
Note: The function number won't be incremented for inline nested {.
$ echo -e '{ { \n{\n-\n{' | awk '/{/{gsub(/{/,"{function("i++")");print;next}1'
{function(0) {function(0)
{function(1)
-
{function(2)
Explanation:
/{/ # For any lines that contain {
gsub( /{/ , "{function("i++")" ) # replace every { on the line with {function(i), then increment i once
print;next # print the line where the replacement happened and skip to the next
print # print all the lines

How do I check for a sub-subdirectory in Perl?

I have a folder called Client which contains many subfolders. I want to create a Perl script to look at each of those subfolders and check for a folder there. If it is there, I want to skip it and move on, if it is not there, I want to create it and do some processing.
How do I go about looping through all of the subfolders and checking for the directory I want? I have found a lot of information on how to get all the files in a folder and/or subfolders, but nothing on checking for a directory within each subfolder.
Augh! Too much complexity in the other answers. The original question doesn't appear to be asking for a recursive traversal. As far as I can see, this is a perfectly sensible solution, and vastly more readable to boot:
foreach my $dir (glob "Client/*") {
next if ! -d $dir; # skip if it's not a directory
next if -d "$dir/subfolder"; # skip if subfolder already exists
mkdir "$dir/subfolder" or die; # create it
do_some_processing(); # do some processing
}
Seriously folks: opendir/readdir? Really?
It's pretty easy once you break it into steps. Get a list of the subdirectories with glob then see which ones don't have the second-level directory. If you are using a File::Find-like module, you are probably doing too much work:
#!perl
use strict;
use warnings;
use File::Spec::Functions;
my $start = 'Clients';
my $subdir = 'already_there';
# @queue is the list of directories you need to process
my @queue = grep { ! -d catfile( $_, $subdir ) } # filter for the second level
grep { -d } # filter for directories
glob catfile( $start, '*' ); # everything below $start
#!/usr/bin/perl
use strict;
use Fcntl qw( :DEFAULT :flock :seek );
use File::Spec;
use IO::Handle;
my $startdir = shift @ARGV || '.';
die "$startdir is not a directory\n"
unless -d $startdir;
my $verify_dir_name = 'MyDir';
my $dh = new IO::Handle;
opendir $dh, $startdir or
die "Cannot open $startdir: $!\n";
while(defined(my $cont = readdir($dh))) {
next
if $cont eq '.' || $cont eq '..';
my $fullpath = File::Spec->catfile($startdir, $cont);
next
unless -d $fullpath && -r $fullpath && -w $fullpath;
my $verify_path = File::Spec->catfile($fullpath, $verify_dir_name);
next
if -d $verify_path;
mkdir($verify_path, 0755);
# do whatever other operations you want to $verify_path
}
closedir($dh);
The short answer is use File::Find.
The long answer is first write a subroutine that validates the existence of the folder and if the folder is not there, create it and then do the processing needed. Then invoke the find method of the File::Find module with a reference to the subroutine and the starting folder to process all the subfolders.
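The subroutine half of that approach might look like the sketch below. Since only one level of subfolders is involved, a plain glob stands in for the File::Find traversal here; that substitution, the helper name ensure_subdir, and the folder name subfolder are all assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create $name inside $parent unless it already exists (hypothetical helper).
# Returns 1 if the directory was created, 0 if it was already there.
sub ensure_subdir {
    my ( $parent, $name ) = @_;
    my $target = "$parent/$name";
    return 0 if -d $target;
    mkdir $target or die "Cannot create $target: $!";
    return 1;
}

# Demo tree: folder "a" already has the subfolder, folder "b" does not.
my $client = tempdir( CLEANUP => 1 );
mkdir "$client/$_" or die "Cannot create $client/$_: $!" for qw(a b);
mkdir "$client/a/subfolder" or die "Cannot create subfolder: $!";

for my $dir ( grep { -d } glob "$client/*" ) {
    if ( ensure_subdir( $dir, 'subfolder' ) ) {
        print "created subfolder in $dir; processing would go here\n";
    }
}
```

Returning a flag from the subroutine keeps the "create, then process" decision in the caller, so existing folders are skipped without extra checks.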

How do I distinguish a file from a directory in Perl?

I'm trying to traverse through all the subdirectories of the current directory in Perl, and get data from those files. I'm using grep to get a list of all files and folders in the given directory, but I don't know which of the values returned is a folder name and which is a file with no file extension.
How can I tell the difference?
You can use the -d file test operator to check if something is a directory. Here are some of the commonly useful file test operators:
-e File exists.
-z File has zero size (is empty).
-s File has nonzero size (returns size in bytes).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
See the perlfunc manual page for more.
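As a quick illustration of these operators, the sketch below creates an empty file, a non-empty file without an extension, and a directory in a temporary location, then classifies each one. The helper name describe and the entry names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $tmp        = tempdir( CLEANUP => 1 );
my $empty_file = "$tmp/empty";
my $full_file  = "$tmp/notes";     # a file without an extension
my $sub_dir    = "$tmp/folder";

open my $fh, '>', $empty_file or die "Cannot create $empty_file: $!";
close $fh;
open $fh, '>', $full_file or die "Cannot create $full_file: $!";
print {$fh} "hello\n";
close $fh;
mkdir $sub_dir or die "Cannot create $sub_dir: $!";

# Classify an entry using the file test operators (hypothetical helper).
sub describe {
    my ($path) = @_;
    return 'missing'        unless -e $path;
    return 'directory'      if -d $path;
    return 'empty file'     if -f $path && -z $path;
    return 'non-empty file' if -f $path && -s $path;
    return 'other';
}

print "$_: ", describe($_), "\n" for $empty_file, $full_file, $sub_dir, "$tmp/nope";
```

The name alone never decides the result; a file without an extension and a directory are told apart purely by -f and -d.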
Also, try using File::Find which can recurse directories for you. Here's a sample which looks for directories....
sub wanted {
if (-d) {
print $File::Find::name." is a directory\n";
}
}
find(\&wanted, $mydir);
print "$file is a directory\n" if ( -d $file );
Look at the -X operators:
perldoc -f -X
For directory traversal, use File::Find, or, if you're not a masochist, use my File::Next module which makes an iterator for you and doesn't require crazy callbacks. In fact, you can have File::Next ONLY return files, and ignore directories.
use File::Next;
my $iterator = File::Next::files( '/tmp' );
while ( defined ( my $file = $iterator->() ) ) {
print $file, "\n";
}
# Prints...
/tmp/foo.txt
/tmp/bar.pl
/tmp/baz/1
/tmp/baz/2.txt
/tmp/baz/wango/tango/purple.txt
It's at http://metacpan.org/pod/File::Next
opendir(my $dh, ".") or die "Cannot open current directory: $!";
my @entries = grep !/^\.\.?$/, readdir($dh);
closedir $dh;
foreach my $entry (@entries) {
if(-f $entry) {
# $entry is a file
} elsif (-d $entry) {
# $entry is a directory
}
}
my @files = grep { -f } @all;
my @dirs = grep { -d } @all;
It would be easier to use File::Find.