Perl abs_path undefined return using File::Find::dir value from relative directory - perl

A Perl script running on macOS takes a directory argument and then uses File::Find with a &wanted() callback to process the contents.
If I supply the directory as an absolute path (e.g. "/Users/sw/Downloads/FGT") I have no problems.
But if I supply a relative path (i.e. "." when run from above directory) I get an error.
The error happens when processing a file in a subdirectory: it is always the first file in the first second-level subdirectory that triggers it. (If I create or touch a new file with an alphabetically earlier name than any existing file in that directory, the new file is the one that causes the error.) Many earlier files are processed without a problem.
Complete simplified code below, with output.
So why does the abs_path call below give an undefined result for this one particular file (the first file in the first second-level subdirectory)?
COMPLETE CODE:
#!/usr/local/bin/perl
use warnings;
use English;
use vars qw( $VERSION );
use Cwd 'abs_path';
use File::Find;

find({ wanted => \&processFile, follow => 0 }, $ARGV[0]); # we do NOT follow symbolic links
exit();

sub processFile
{
    if (substr($_, 0, 1) eq ".") { return; } # ignore entries starting with a dot

    print "\nPROCESSING WITH:\n";
    print ">> \$File::Find::dir = <<$File::Find::dir>>\n";
    print ">> \$_ = <<$_>>\n";
    print ">> \$File::Find::name=<<$File::Find::name>>\n";

    my $dirAbsPath = abs_path($File::Find::dir);
    die "ERROR: undef abs_path() using File::Find::dir value!\n" if ! defined $dirAbsPath;
    print "Dir Abs Path is <<$dirAbsPath>>\n";
}
OUTPUT:
PROCESSING WITH:
>> $File::Find::dir = <<.>>
>> $_ = <<L1DIR>>
>> $File::Find::name=<<./L1DIR>>
Dir Abs Path is <</Users/sw/Downloads/FGT>>
PROCESSING WITH:
>> $File::Find::dir = <<./L1DIR>>
>> $_ = <<LPT1.csv>>
>> $File::Find::name=<<./L1DIR/LPT1.csv>>
Dir Abs Path is <</Users/sw/Downloads/FGT/L1DIR/L1DIR>>
PROCESSING WITH:
>> $File::Find::dir = <<./L1DIR>>
>> $_ = <<L2DIR2>>
>> $File::Find::name=<<./L1DIR/L2DIR2>>
Dir Abs Path is <</Users/sw/Downloads/FGT/L1DIR/L1DIR>>
PROCESSING WITH:
>> $File::Find::dir = <<./L1DIR/L2DIR2>>
>> $_ = <<abc>>
>> $File::Find::name=<<./L1DIR/L2DIR2/abc>>
ERROR: undef abs_path() using File::Find::dir value!
DIRECTORY TREE OF ".":
sw@Max FGT % tree .
.
└── L1DIR
├── L2DIR1
│   └── LD.csv
├── L2DIR2
│   ├── abc
│   └── x.csv
├── L2DIR3
│   └── xsym2 -> T2/x.csv
└── LPT1.csv

$File::Find::dir is relative to the original CWD. But File::Find changes the CWD unless you use no_chdir => 1.
When processing ./L1DIR/L2DIR2/abc with no_chdir => 0 (the default):
The CWD is $orig_cwd/./L1DIR/L2DIR2.
$File::Find::name is ./L1DIR/L2DIR2/abc.
$File::Find::dir is ./L1DIR/L2DIR2.
$_ is abc.
(Where $orig_cwd is the absolute path of . before you call find.)
So abs_path( $File::Find::dir ) tries to resolve $orig_cwd/./L1DIR/L2DIR2/./L1DIR/L2DIR2, which doesn't exist, so it returns undef.
Using abs_path(".") there would work, since the CWD is already the directory being processed.
When processing ./L1DIR/L2DIR2/abc with no_chdir => 1:
The CWD is $orig_cwd.
$File::Find::name is ./L1DIR/L2DIR2/abc.
$File::Find::dir is ./L1DIR/L2DIR2.
$_ is ./L1DIR/L2DIR2/abc.
So that means that abs_path( $File::Find::dir ) is the absolute path of $orig_cwd/./L1DIR/L2DIR2, which is what you want.
I prefer to avoid all those needless chdir calls, so I use no_chdir => 1 at all times.
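For reference, here is a minimal sketch of the script with no_chdir => 1 (my own adaptation, not the asker's code). Since $_ then holds the full relative path rather than just the basename, the dot-entry check uses basename from File::Basename:
#!/usr/local/bin/perl
use strict;
use warnings;
use Cwd 'abs_path';
use File::Find;
use File::Basename 'basename';

# With no_chdir => 1 the CWD never changes, so $File::Find::dir
# stays resolvable relative to the directory find() was started in.
find({ wanted => \&processFile, follow => 0, no_chdir => 1 }, $ARGV[0]);
exit();

sub processFile
{
    # Under no_chdir => 1, $_ is the full (relative) path, e.g. ./L1DIR/L2DIR2/abc.
    return if substr(basename($_), 0, 1) eq '.';   # ignore entries starting with a dot

    my $dirAbsPath = abs_path($File::Find::dir);
    die "ERROR: undef abs_path() using File::Find::dir value!\n" unless defined $dirAbsPath;
    print "Dir Abs Path is <<$dirAbsPath>>\n";
}
The alternative, if you keep the default chdir behaviour, is to call abs_path('.') inside the callback instead of abs_path($File::Find::dir).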

Related

Zip a complete directory in Perl

I have tried zipping a directory which contains subdirectories and files in Perl, and this is my code.
#!/usr/bin/perl -w
use strict;
use List::Util qw(first);
use Archive::Zip qw(:ERROR_CODES :CONSTANTS);

my $input  = 'D:\temp\sample';
my $outDir = 'D:\out\sample.zip';

my $obj = Archive::Zip->new();
$obj->addTree( $input );

# Write the files to zip.
if ($obj->writeToFileNamed($outDir) == AZ_OK)
{
    # write to disk
    print "\n\nArchive created successfully!\n";
}
else
{
    print "Error while Zipping !";
}
When I execute this, the zip file (sample.zip) is created.
I always unzip it manually by selecting the "Extract Here" option.
When I unzip it, the directory bursts open, putting all the subfolders and files at the same location (D:\out).
What I want is for the output to be a single folder (D:\out\sample).
So, what changes need to be made in my code? Please help.
From the docs:
$zip->addTree( $root, $dest [, $pred, $compressionLevel ] ) -- Add tree of files to a zip
$root is the root of the tree of files and directories to be added. It
is a valid directory name on your system. $dest is the name for the
root in the zip file (undef or blank means to use relative pathnames).
It is a valid ZIP directory name (that is, it uses forward slashes (/)
for separating directory components).
And the examples, also from the docs:
use Archive::Zip;
my $zip = Archive::Zip->new();
# add all readable files and directories below . as xyz/*
$zip->addTree( '.', 'xyz' );
# add all readable plain files below /abc as def/*
$zip->addTree( '/abc', 'def', sub { -f && -r } );
# add all .c files below /tmp as stuff/*
$zip->addTreeMatching( '/tmp', 'stuff', '\.c$' );
# add all .o files below /tmp as stuff/* if they aren't writable
$zip->addTreeMatching( '/tmp', 'stuff', '\.o$', sub { ! -w } );
# add all .so files below /tmp that are smaller than 200 bytes as stuff/*
$zip->addTreeMatching( '/tmp', 'stuff', '\.o$', sub { -s < 200 } );
# and write them into a file
$zip->writeToFileNamed('xxx.zip');
# now extract the same files into /tmpx
$zip->extractTree( 'stuff', '/tmpx' );
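Given that signature, passing a $dest name as the second argument to addTree should be enough to get a single top-level folder inside the archive. A minimal sketch for the code in the question (the 'sample' name is chosen to match the D:\out\sample folder the asker wants):
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES :CONSTANTS);

my $input  = 'D:\temp\sample';
my $outDir = 'D:\out\sample.zip';

my $obj = Archive::Zip->new();

# Store everything under a "sample/" prefix inside the archive, so that
# "Extract Here" produces D:\out\sample\... instead of spilling files into D:\out.
$obj->addTree( $input, 'sample' );

print $obj->writeToFileNamed($outDir) == AZ_OK
    ? "Archive created successfully!\n"
    : "Error while zipping!\n";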

Find up-to-date files for different paths but with identical file names

I have the following files
./path/to/stuff1/file1 (x)
./path/to/stuff1/file2
./path/to/stuff1/file3
./path/to/stuff2/file1
./path/to/stuff2/file2 (x)
./path/to/stuff2/file3
./path/to/stuff3/file1 (x)
./path/to/stuff3/file2
./path/to/stuff3/file3
where I have marked the files I touched most recently. I want to get exactly those marked files. In other words:
I want to get the newest file in each directory.
I constructed the bash command
for line in $( find . -name 'file*' -type f | awk -F/ 'sub($NF,x)' | sort | uniq ); do
    find $line -name 'file*' -type f -printf '%T# %p\n' | sort -n | tail -1 | cut -f2 -d' '
done
which I am able to use in Perl via the system command after escaping the $. Is it possible to do this directly in Perl, or do you think my approach is fine?
edit
If possible the task should be done in perl without using external modules.
edit2
Sorry, I noticed my question wasn't clear. I thought the answer from @TLP would work, but I have to clarify: I want to check for the newest file in each folder, e.g. the newest file in stuff1. Say I do
touch ./path/to/stuff1/file1
touch ./path/to/stuff2/file2
touch ./path/to/stuff3/file1
before I run the script. It then should output:
./path/to/stuff1/file1
./path/to/stuff2/file2
./path/to/stuff3/file1
The filename can be identical across the different stuff directories, but only one file per path should be output.
The script from @codnodder does this, but I want to match only on the filename, not on the full path. So I want to search recursively for all files whose names begin with file.
Your find command can be emulated with File::Find's find command. This is a core module in Perl 5, and is almost certainly already on your system. To check the file modification time, you can use the -M file test.
So something like this:
use strict;
use warnings;
use File::Find;

my %times;

find(\&wanted, '.');

for my $dir (keys %times) {
    print $times{$dir}{file}, "\n";
}

sub wanted {
    return unless (-f && /^file/);
    my $mod = -M $_;
    if (!defined($times{$File::Find::dir}) or
            $mod < $times{$File::Find::dir}{mod}) {
        $times{$File::Find::dir}{mod}  = $mod;
        $times{$File::Find::dir}{file} = $File::Find::name;
    }
}
If I run this in my test directory, on my system, I get the following Data::Dumper structure, where you can clearly see the directory name as the key, the full path stored under the file key, and the modification age (in days, relative to the script's start time) under mod.
$VAR1 = {
    './phone' => {
        'file' => './phone/file.txt',
        'mod' => '3.47222222222222e-005'
    },
    './foo' => {
        'file' => './foo/fileb.txt',
        'mod' => '0.185'
    },
    '.' => {
        'file' => './file.conf',
        'mod' => '0.154490740740741'
    }
};
There are three general approaches I can think of:
Using opendir(), readdir(), and stat().
Using File::Find.
Using glob().
The most appropriate option depends on the specifics of what you have to work with, which we can't see from your posting.
Also, I assume when you say "no external modules", you are not
excluding modules installed with Perl (i.e., in Core).
Here is an example using glob():
use File::Basename qw/fileparse/;

for my $file (newest_file()) {
    print "$file\n";
}

sub newest_file {
    my %files;
    for my $file (glob('./path/stuff*/file*')) {
        my ($name, $path, $suffix) = fileparse($file);
        my $mtime = (stat($file))[9];
        if (!exists $files{$path} || $mtime > $files{$path}[0]) {
            $files{$path} = [$mtime, $name];
        }
    }
    return map { $files{$_}[1] } keys %files;
}
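For completeness, here is a rough sketch of the first approach (opendir/readdir/stat), which is not shown above. The newest_in_dir helper is just for illustration, and it assumes the ./path/to/stuff* layout from the question:
use strict;
use warnings;

# Newest file (by mtime) in a single directory, considering only
# plain files whose names begin with "file".
sub newest_in_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my ($newest, $newest_mtime);
    for my $entry (readdir $dh) {
        next unless $entry =~ /^file/;
        my $path = "$dir/$entry";
        next unless -f $path;
        my $mtime = (stat($path))[9];
        if (!defined $newest_mtime || $mtime > $newest_mtime) {
            ($newest, $newest_mtime) = ($path, $mtime);
        }
    }
    closedir $dh;
    return $newest;
}

for my $dir (glob('./path/to/stuff*')) {
    my $file = newest_in_dir($dir);
    print "$file\n" if defined $file;
}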

How to go 1 level back for a directory path stored in a variable in perl

I am dynamically getting the path of a directory, and it is stored in a variable. I need to go one level up and get the parent directory. Also, the path is a Windows path.
Basically, I am running Perl on Windows.
How can I achieve this in Perl? Anything is fine: a regular expression or any other way.
Please help!
For example, the original directory path:
my $dir = "C:/mytest/mydata/mywork/mydir";
The output that is needed:
my $dir = "C:/mytest/mydata/mywork";
I need the path only up to the parent directory "mywork", not "mydir".
Can anybody help with writing code for this?
Second query:
I have a space in the directory path, for example:
my $testdir = "C:/mytest own/mydata/mywork/mydir";
Notice that "mytest own" is one directory, but with a space.
Now, when I try to use this variable to change into that directory, I am unable to do so.
For query 1, I tried the piece of code below:
@Ingo:
#!/usr/bin/perl
my $dir = "C:/mytest/mydata/mywork/mydir";
print "Input directory is : $dir\n";
my $outdir = $dir . "/..";
print "Output directory is : $outdir";
Note: I need to accomplish this without using any Perl module.
@TLP: I tried your shortcut method (code copied again below), but it doesn't give the desired output.
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

my $str = "C:/mytest/mydata/mywork/mydir";
my @dirs = File::Spec->splitdir($str); # parse directories
my $newdir = File::Spec->catdir(@dirs, File::Spec->updir()); # create new path
print $newdir;
The output just gets /.. appended:
C:/mytest/mydata/mywork/mydir/..
Using a module for manipulating a path is perhaps a safer option. File::Spec has the convenient splitdir and catdir functions:
use strict;
use warnings;
use File::Spec;

my $str = "C:/mytest/mydata/mywork/mydir";
my @dirs = File::Spec->splitdir($str);   # parse directories
pop @dirs;                               # remove the last directory
my $newdir = File::Spec->catdir(@dirs);  # create new path
print $newdir;
Note that the new path will by default use path delimiters that your current system uses.
You can also do
my $newdir = File::Spec->catdir(@dirs, File::Spec->updir());
(updir() returns "..".) The output is the same on my system (Windows 7), which is to say
C:\mytest\mydata\mywork
I would have thought it would be C:\mytest\mydata\mywork\mydir\.., but apparently the module cleans up the path.
First, to get the parent directory, just append "/.."
$dir . "/.."
Second, are you using this in an external command or something? Then you must quote the path name, like in:
system "ls '$testdir'"
Otherwise, if you want to change directory in Perl, the following should be fine:
chdir $testdir or die "Can't change to '$testdir' - $!"
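To illustrate the point about the second query: no quoting is needed as long as you stay inside Perl, because no shell is involved; quoting only matters when the path is interpolated into a shell command line. A small sketch, using the path from the question:
use strict;
use warnings;

my $testdir = "C:/mytest own/mydata/mywork/mydir";

# chdir takes the string as-is; the embedded space is not a problem,
# because no shell is involved.
chdir $testdir or die "Can't change to '$testdir' - $!";

# Quoting only matters when the path is interpolated into a shell command, e.g.:
# system qq{dir "$testdir"};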
Since you might want to know how to separate a directory path into a list of path elements, here is a subroutine which will do that. This is for Linux, so you would probably want to change the path separator regex:
#!/usr/bin/perl
use strict;
use warnings;

# examine pathname, extract directory path
sub dirname
{
    my ($pn) = @_;
    my @p = split(/\//, $pn);    ## Windows would want split(/\\/, $pn)
    if ( $#p > 0 ) { pop(@p); push(@p, ""); }
    else           { pop(@p); push(@p, "."); }
    $pn = join('/', @p);         ## you would want join('\\', @p)
    return($pn);
};

my $dir = $ARGV[0] || "/a/b/c/d/e/f/g/h/i/j";
while ( $dir && length($dir) > 1 ) {
    print "$dir\n";
    $dir = dirname($dir);
}
Or use this code:
my $dir = "C:/mytest/mydata/mywork/mydir";
my $char = '/';
my $result = rindex($dir, $char);
my $path = substr($dir, 0, $result);
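A quick check of that rindex/substr approach with the path from the question (a minimal sketch; the expected output is shown in the comment):
use strict;
use warnings;

my $dir    = "C:/mytest/mydata/mywork/mydir";
my $result = rindex($dir, '/');          # position of the last slash
my $path   = substr($dir, 0, $result);   # everything before it

print "$path\n";   # prints: C:/mytest/mydata/mywork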

Comparing two directories using Perl

I am new to Perl, so excuse my noobness.
Here's what I intend to do.
$ perl dirComp.pl dir1 dir2
dir1 & dir2 are directory names.
The script dirComp.pl should identify whether contents in dir1 & dir2 are identical or not.
I have come up with an algorithm:
Store all the contents of dir1 (recursively) in a list.
Store all the contents of dir2 in another list.
Compare the two lists; if they are the same, dir1 and dir2 are identical, otherwise they are not.
use File::Basename qw(fileparse);

# DIR1h and DIR2h are assumed to be opendir() handles for dir1 and dir2
my @files1 = readdir(DIR1h);
my @files2 = readdir(DIR2h);

# Remove filename extensions for each list.
foreach my $item (@files1) {
    my ( $fileName, $filePath, $fileExt ) = fileparse($item, qr/\.[^.]*/);
    $item = $fileName;
}
foreach my $item (@files2) {
    my ( $fileName, $filePath, $fileExt ) = fileparse($item, qr/\.[^.]*/);
    $item = $fileName;
}
I am not able to recursively traverse subdirectories of a given directory with the above code. Any help would be appreciated.
EDIT: Using File::DirCompare
#!/usr/bin/perl -w
use File::DirCompare;
use File::Basename;

if ($#ARGV < 1)
{
    &usage;
}

my $dir1 = $ARGV[0];
my $dir2 = $ARGV[1];

File::DirCompare->compare($dir1, $dir2, sub {
    my ($a, $b) = @_;
    if ( !$b )
    {
        printf "Test result:PASSED.\n";
        printf "Only in %s : %s\n", dirname($a), basename($a);
    } elsif ( !$a ) {
        printf "Test result:PASSED.\n";
        printf "Only in %s : %s\n", dirname($b), basename($b);
    } else {
        printf "Test result:FAILED.\n";
        printf "Files $a and $b are different.\n";
    }
});
I have a directory structure as below,
dir1/            dir2/
--file1.txt      --file1.txt
--file2.txt      --file2.txt
--file3.cpp      --file3.cpp
I am getting Test result:FAILED, but the result should have been PASSED. Can anyone please correct me?
Thanks
The example you supplied using File::DirCompare works as intended.
Keep in mind that the callback subroutine is called for every unique file in each directory and for every pair of files which differ in their content. Having the same filename is not enough; the contents of each file in each directory must be exactly the same as well.
Furthermore, the cases in which you report "PASSED" aren't a success at all (by your definition) since they detail the cases in which a file exists in one of the directories, but not the other: meaning the directories' contents are not identical.
This should be closer to what you want:
#!/usr/bin/perl
use strict;
use warnings;
use File::DirCompare;
use File::Basename;

sub compare_dirs
{
    my ($dir1, $dir2) = @_;
    my $equal = 1;
    File::DirCompare->compare($dir1, $dir2, sub {
        my ($a, $b) = @_;
        $equal = 0; # if the callback was called even once, the dirs are not equal
        if ( !$b )
        {
            printf "File '%s' only exists in dir '%s'.\n", basename($a), dirname($a);
        }
        elsif ( !$a ) {
            printf "File '%s' only exists in dir '%s'.\n", basename($b), dirname($b);
        }
        else
        {
            printf "File contents for $a and $b are different.\n";
        }
    });
    return $equal;
}

print "Please specify two directory names\n" and exit if (@ARGV < 2);
printf "%s\n", &compare_dirs($ARGV[0], $ARGV[1]) ? 'Test: PASSED' : 'Test: FAILED';
I'd recommend using the File::DirCompare module instead. It takes care of all the hard work of traversing the directory structure; you just need to define how your directories should be compared (whether the sub should compare the file contents, etc.).
You might want to try the ol' File::Find. It's not my favorite module (it is just funky in the way it works), but for your purposes it allows you to easily find all the files in two directories and compare them. Here's a brief example:
use strict;
use warnings;
use feature qw(say);
use Digest::MD5::File qw(file_md5_hex);
use File::Find;

use constant {
    DIR_1 => "/usr/foo",
    DIR_2 => "/usr/bar",
};

my %dir_1;
my %dir_2;

find ( sub {
    if ( -f $File::Find::name ) {
        $dir_1{$File::Find::name} = file_md5_hex($File::Find::name);
    }
    else {
        $dir_1{$File::Find::name} = "DIRECTORY!";
    }
}, DIR_1);

find ( sub {
    if ( -f $File::Find::name ) {
        $dir_2{$File::Find::name} = file_md5_hex($File::Find::name);
    }
    else {
        $dir_2{$File::Find::name} = "DIRECTORY!";
    }
}, DIR_2);
This will create two hashes keyed by the file names in each directory. I used Digest::MD5::File to create an MD5 checksum for each file. If the checksums of two files differ, I know the files differ (although I don't know where).
Now you have to do three things:
Go through %dir_1 and see if there's an equivalent key in %dir_2. If there is not an equivalent key, you know that a file exists in %dir_1 and not %dir_2.
If there is an equivalent key in each hash, check whether the MD5 checksums agree. If they do, the files match. If they don't, they differ; you can't say where they differ, but they differ.
Finally, go through %dir_2 and check to see if there's an equivalent key in %dir_1. If there is, do nothing. If there isn't, that means there's a file in %dir_2 that's not in %dir_1.
Just a word of warning: the keys in these two hashes won't match. You'll have to transform one to the other when doing your comparison. For example, you'll have two files such as:
/usr/bar/my/file/is/here.txt
/usr/foo/my/file/is/here.txt
As you can see, my/file/is/here.txt exists in both directories, but in my code the two hashes will have two different keys. You could either fix the two subroutines to strip the directory name off the front of the file paths, or transform one key to the other when you do your comparison. I didn't run through a full test (the bit of code I wrote works in my testing), so I'm not 100% sure what you'll have to do to make sure you find the matching keys.
Oh, another warning: I pick up all entries, not just files. For directories, I can check whether the stored hash value is equal to DIRECTORY! or not, or I could simply ignore everything that's not a file.
And, you might want to check for special cases. Is this a link? Is it a hard link or a soft link? What about some sort of special file. That makes things a bit more complex. However, the basics are here.
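To make the three comparison steps concrete, here is a rough sketch of my own (not part of the original answer) that strips the DIR_1/DIR_2 prefixes so the keys line up; it assumes it is appended to the script above, so %dir_1, %dir_2, DIR_1 and DIR_2 are already defined:
# Re-key both hashes by the path relative to their top directory.
my $prefix_1 = DIR_1;
my $prefix_2 = DIR_2;
my %rel_1 = map { ( substr($_, length $prefix_1) => $dir_1{$_} ) } keys %dir_1;
my %rel_2 = map { ( substr($_, length $prefix_2) => $dir_2{$_} ) } keys %dir_2;

# Step 1: entries only in the first tree.
for my $key (sort keys %rel_1) {
    print "Only in $prefix_1: $key\n" unless exists $rel_2{$key};
}

# Step 2: entries present in both trees but with different checksums.
for my $key (sort keys %rel_1) {
    next unless exists $rel_2{$key};
    print "Differs: $key\n" if $rel_1{$key} ne $rel_2{$key};
}

# Step 3: entries only in the second tree.
for my $key (sort keys %rel_2) {
    print "Only in $prefix_2: $key\n" unless exists $rel_1{$key};
}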

How can I remove relative path components but leave symlinks alone in Perl?

I need to get Perl to remove relative path components from a Linux path. I've found a couple of functions that almost do what I want, but:
File::Spec->rel2abs does too little. It does not resolve ".." into a directory properly.
Cwd::realpath does too much. It resolves all symbolic links in the path, which I do not want.
Perhaps the best way to illustrate how I want this function to behave is to post a bash log where FixPath is a hypothetical command that gives the desired output:
'/tmp/test'$ mkdir -p a/b/c1 a/b/c2
'/tmp/test'$ cd a
'/tmp/test/a'$ ln -s b link
'/tmp/test/a'$ ls
b link
'/tmp/test/a'$ cd b
'/tmp/test/a/b'$ ls
c1 c2
'/tmp/test/a/b'$ FixPath . # rel2abs works here
===> /tmp/test/a/b
'/tmp/test/a/b'$ FixPath .. # realpath works here
===> /tmp/test/a
'/tmp/test/a/b'$ FixPath c1 # rel2abs works here
===> /tmp/test/a/b/c1
'/tmp/test/a/b'$ FixPath ../b # realpath works here
===> /tmp/test/a/b
'/tmp/test/a/b'$ FixPath ../link/c1 # neither one works here
===> /tmp/test/a/link/c1
'/tmp/test/a/b'$ FixPath missing # should work for nonexistent files
===> /tmp/test/a/b/missing
Alright, here is what I came up with:
use Cwd 'getcwd';   # needed for getcwd() below

sub mangle_path {
    # NOT PORTABLE
    # Attempt to remove relative components from a path - can return
    # incorrect results for paths like ../some_symlink/.. etc.
    my $path = shift;
    $path = getcwd . "/$path" if '/' ne substr $path, 0, 1;
    my @dirs = ();
    for (split '/', $path) {
        pop @dirs, next if $_ eq '..';
        push @dirs, $_ unless $_ eq '.' or $_ eq '';
    }
    return '/' . join '/', @dirs;
}
I know this is possibly insecure and invalid, but any input to this routine will come from me on the command line, and it solves a couple of tricky use cases for me.
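A quick sanity check against the bash log above, run from /tmp/test/a/b with the same directory layout (a minimal sketch; the expected results are in the comments):
use strict;
use warnings;
use Cwd 'getcwd';

# ... mangle_path() as defined above ...

print mangle_path('.'),          "\n";   # /tmp/test/a/b
print mangle_path('..'),         "\n";   # /tmp/test/a
print mangle_path('c1'),         "\n";   # /tmp/test/a/b/c1
print mangle_path('../b'),       "\n";   # /tmp/test/a/b
print mangle_path('../link/c1'), "\n";   # /tmp/test/a/link/c1
print mangle_path('missing'),    "\n";   # /tmp/test/a/b/missing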