Search file in directory structure - perl

Does anybody know a way to search for a file in a directory structure without using File::Find? I know how to do it step by step, but a smoother approach would be helpful.

File::Find has been a core module since Perl 5.000, so I don't see a reason not to use it.
But if you still want to go your crazy way, you could call the find command.
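Because File::Find ships with Perl, a search by file name stays short. Below is a minimal sketch, not code from this answer; `find_named` is a hypothetical helper, and it assumes an exact match on the file's basename is what you want.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Basename qw(basename);

# Hypothetical helper: return every path under @dirs whose basename is $name.
# no_chdir => 1 keeps the working directory unchanged, so $_ holds the
# full path (the same value as $File::Find::name).
sub find_named {
    my ($name, @dirs) = @_;
    my @hits;
    find({
        no_chdir => 1,
        wanted   => sub { push @hits, $_ if basename($_) eq $name },
    }, @dirs);
    return @hits;
}

# Usage: print "$_\n" for find_named('xyz.txt', '.');
```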

From one File::Find hater to another: DirWalk.pm, inspired by Python's os.walk().
package DirWalk;
use strict;
use warnings;
sub new {
my ($class, @dirs) = @_;
my @odirs = @dirs;
@dirs = qw/./ unless @dirs;
# strip trailing slashes and a trailing "/."
s!/+$!! for @dirs;
s!/+\.$!! for @dirs;
my $self = { _odirs => [@odirs], _dirs => [@dirs], _dhstack => [], _dnstack => [] };
# open the first directory now; the others are opened later by next()
opendir(my $dirh, $dirs[0]) or return undef;
shift @{ $self->{_dirs} };
unshift @{ $self->{_dhstack} }, $dirh;
unshift @{ $self->{_dnstack} }, $dirs[0];
return bless $self, $class;
}
sub _walk_op {
my ($self) = @_;
if (wantarray) {
my @ret;
while (defined(my $x = $self->next())) {
push @ret, $x;
}
return @ret;
}
elsif (defined wantarray) {
return $self->next();
}
return undef;
}
sub next
{
my ($self) = @_;
my $dstack = $self->{_dhstack};
my $nstack = $self->{_dnstack};
if (@$dstack) {
my $x;
# read the next entry, skipping . and ..
do {
$x = readdir $dstack->[0];
} while (defined($x) && ($x eq '.' || $x eq '..'));
if (defined $x) {
my $nm = $nstack->[0].'/'.$x;
if (-d $nm) {
# open dir, and put the handle on the stack
if (opendir(my $dh, $nm)) {
unshift @{ $self->{_dhstack} }, $dh;
unshift @{ $self->{_dnstack} }, $nm;
}
else {
warn "can't walk into $nm: $!";
}
$nm .= '/';
}
# return the name
return $nm;
}
else {
# this directory is exhausted: close it and pop it off both stacks
closedir $dstack->[0];
shift @$dstack;
shift @$nstack;
unless (@$dstack) {
# open the next top-level directory that can be read
while (@{ $self->{_dirs} }) {
my $dir = shift @{ $self->{_dirs} };
opendir(my $dirh, $dir) or next;
unshift @{ $self->{_dhstack} }, $dirh;
unshift @{ $self->{_dnstack} }, $dir;
last;
}
}
return $self->next();
}
}
else {
return undef;
}
}
use overload '<>' => \&_walk_op;
use overload '""' => sub { 'DirWalk('.join(', ', @{$_[0]->{_odirs}}).')'; };
1;
Example:
# prepare test structure
mkdir aaa
touch aaa/bbb
mkdir aaa/ccc
touch aaa/ccc/ddd
# example invocation:
perl -I. -mDirWalk -E '$dw=DirWalk->new("aaa"); say while <$dw>;'
#output
aaa/ccc/
aaa/ccc/ddd
aaa/bbb
Another example:
use strict;
use warnings;
use DirWalk;
# iteration:
my $dw = DirWalk->new("aaa");
while (<$dw>) {
print "$_\n";
}
# or as a list:
$dw = DirWalk->new("aaa");
my @list = <$dw>;
for (@list) {
print "$_\n";
}

The method I've been implementing uses three built-ins: opendir, readdir, and closedir. See below for an example:
opendir my $dir1, $cwd or die "cannot read the directory $cwd: $!";
my @cwd = grep { $_ ne '.' && $_ ne '..' } readdir $dir1;
closedir $dir1;
foreach (@cwd) { if ($_ =~ /$file_search_name/) { print "I have found the file: $_\n"; } }
The directory listing is stored in @cwd. readdir also returns the . and .. entries; their position in the listing is not guaranteed, so filtering them out with grep is safer than calling shift twice. I am unfortunately short on time, but you can build on this idea with an anonymous array to store the directory handles and another array to store the directory paths. Use -d to check whether an entry is a directory. There might be file permission issues, so something like unless (opendir ...) would be a good option.
Best of luck.
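The opendir/readdir/closedir idea described above extends to a whole tree with a stack of pending directories. The following is one possible sketch, not the answerer's code: `search_tree` is a hypothetical name, the pattern is assumed to be a `qr//` object, unreadable directories are skipped with a warning, and symlinked directories are deliberately not followed to avoid loops.

```perl
use strict;
use warnings;

# Iterative search without File::Find: pop a directory off the stack,
# read its entries, record matches, and push subdirectories back on.
sub search_tree {
    my ($root, $pattern) = @_;
    my @stack = ($root);
    my @found;
    while (@stack) {
        my $dir = pop @stack;
        my $dh;
        unless (opendir($dh, $dir)) {
            warn "Cannot read $dir: $!\n";   # permission problems etc.
            next;
        }
        for my $entry (readdir $dh) {
            next if $entry eq '.' || $entry eq '..';
            my $path = "$dir/$entry";
            push @found, $path if $entry =~ $pattern;
            # descend into real directories only (not symlinks)
            push @stack, $path if -d $path && !-l $path;
        }
        closedir $dh;
    }
    return @found;
}

# Usage: print "$_\n" for search_tree('.', qr/\Axyz\.txt\z/);
```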

I'm sure I will be flayed alive for this answer, but you could always use system() or backticks (``) to execute the regular Linux find command, or do some sort of ls:
@files = `ls $var/folder/*.logfile`
@files = `find . -name $file2find`
I expect some seasoned perlers have many good reasons not to do this.
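One likely reason: interpolating `$file2find` into backticks sends it through the shell, so spaces or shell metacharacters in the name change the command. If you do shell out, the list form of a piped `open` passes each argument straight to find without a shell. A sketch, assuming a Unix-like `find` on the PATH; `find_with_command` is a hypothetical name, and file names that contain newlines would still be split incorrectly.

```perl
use strict;
use warnings;

# Run find(1) without a shell: every element of the list is passed to
# find as a separate argument, so $name is never parsed by a shell.
sub find_with_command {
    my ($dir, $name) = @_;
    open(my $fh, '-|', 'find', $dir, '-name', $name)
        or die "Cannot run find: $!";
    chomp(my @found = <$fh>);    # one path per output line
    close $fh or warn "find exited with status $?\n";
    return @found;
}

# Usage: print "$_\n" for find_with_command('.', 'xyz.txt');
```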

You can also try something like this:
# I want to find file xyz.txt in $dir (say C:\sandbox)
Findfile("xyz.txt", $dir);
sub Findfile
{
my $file = shift;
my $Searchdir = shift;
my @content = <$Searchdir/*>;
foreach my $element (@content)
{
# match the file name exactly; \Q...\E quotes regex metacharacters such as "."
if($element =~ m{/\Q$file\E$})
{
print "found: $element\n";
last;
}
elsif (-d $element)
{
Findfile($file, $element); #recursive search
}
}
}

File::Find::Rule is "smoother".
use feature 'say';
use File::Find::Rule qw( );
say for File::Find::Rule->in(".");

Related

Perl - How to modify variables with subroutines without return

The following code gets a file path and checks that it exists and can be read; otherwise the value is switched to a custom one:
use strict;
use warnings;
[...]
sub checkFilePath{
my ($args) = @_;
my $checkingPath = $args->{path};
my $checkingCustomPath = $args->{customPath};
my $canBeRead = 1;
if ($checkingPath) {
if (!(-e "$checkingPath")) {
print "[WARN] File $checkingPath doesn't exist.\n";
$canBeRead = 0;
} elsif (!(-f "$checkingPath")) {
print "[WARN] $checkingPath is not a file.\n";
$canBeRead = 0;
} elsif (!(-r "$checkingPath")) {
print "[WARN] File $checkingPath can't be read.\n";
$canBeRead = 0;
}
}
if (!($canBeRead)) {
# Testing custom regex file path
# If doesn't exist, it will try to use custom file or the script will die
die "[ERR] Custom file $checkingCustomPath doesn't exist\n" if (!(-e $checkingCustomPath));
die "[ERR] Custom file $checkingCustomPath is not a file\n" if (!(-f $checkingCustomPath));
die "[ERR] Custom file $checkingCustomPath cannot be read\n" if (!(-r $checkingCustomPath));
return $checkingCustomPath;
}
return $checkingPath;
}
[...]
$logPath = checkFilePath({
path => $logPath,
customPath => $customLogPath
});
I was wondering if there is a way to modify this code so that $logPath is updated by the subroutine call alone, like:
# $logPath = '/tmp/thisfiledoesntexist.txt'
checkFilePath({
path => $logPath,
customPath => $customLogPath
});
# $logPath now has a valid filepath, which is the same as $customLogPath
If $logPath was passed to the subroutine as an argument (or via a reference), it would be possible to change it (by modifying the correct element of @_ (or modifying the referenced scalar)). But you copy its value into a hash and pass a reference to that hash instead. At best, you could modify $hash->{path} instead of $logPath.
sub fixFilePath {
our $checkingPath; local *checkingPath = \shift; # my \$checkingPath = \shift;
my %args = @_;
my $checkingCustomPath = $args{customPath};
...
return if $canBeRead;
...
$checkingPath = $checkingCustomPath;
}
fixFilePath($logPath,
customPath => $customLogPath,
);
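The two mechanisms named in this answer, assigning to an element of @_ and assigning through a scalar reference, can be shown in isolation. A minimal sketch: the helper names are hypothetical and the check is reduced to "readable plain file". With the @_ form, passing a literal instead of a variable dies with "Modification of a read-only value attempted".

```perl
use strict;
use warnings;

# 1. @_ aliases the caller's arguments: assigning to $_[0] changes the
#    caller's variable directly.
sub fix_path_alias {
    my $custom = $_[1];
    # -r _ reuses the stat result from the preceding -f test
    $_[0] = $custom unless defined $_[0] && -f $_[0] && -r _;
    return;
}

# 2. Explicit scalar reference: the caller passes \$var and the sub
#    assigns through it.
sub fix_path_ref {
    my ($path_ref, $custom) = @_;
    $$path_ref = $custom unless defined $$path_ref && -f $$path_ref && -r _;
    return;
}

# Usage:
#   fix_path_alias($logPath, $customLogPath);
#   fix_path_ref(\$logPath, $customLogPath);
```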
Thinking about this a little more, I decided to propose a different, less repetitive, and, IMO, clearer way of doing it:
use strict;
use warnings;
use autouse Carp => qw(croak);
print chooseFilePath('doesnot.exist', "$ENV{TEMP}/t.log"), "\n";
sub chooseFilePath {
my $wantedPath = shift;
my $defaultPath = shift;
if (defined(my $reason = isBadFilePath($wantedPath))) {
warn "[WARN] $reason.\n";
if (defined($reason = isBadFilePath($defaultPath))) {
die "[ERR] $reason.\n";
}
return $defaultPath;
}
return $wantedPath;
}
sub isBadFilePath {
@_ or croak 'Need a path';
my $path = shift;
-e $path or return "File '$path' doesn't exist";
-f _ or return "'$path' is not a file";
-r _ or return "File '$path' can't be read";
return;
}
Output:
C:\...\Temp> perl s.pl
[WARN] File 'doesnot.exist' doesn't exist.
[ERR] File 'C:\...\Temp/t.log' doesn't exist.
C:\...\Temp> echo x > t.log
C:\...\Temp> perl s.pl
[WARN] File 'doesnot.exist' doesn't exist.
C:\...\Temp/t.log

Preserve local context across nested subroutines

Consider the code below, which is what I would like to write. I call process recursively, and for each recursion I use a local %context. That way I get my context back when I return from a call.
sub process {
my %context; # Local context
process() if rerun();
job1();
job2();
sub job1() {print $context{foo}}
sub job2() {print $context{bar}}
}
Unfortunately, Perl does not handle nested subroutines as I expected. If I move my subroutines out of the process subroutine, I have a problem because I can no longer access %context. So I need to make it global and use a stack, as follows:
my %context; # Local context
my @context_stack;
sub process {
push @context_stack, %context;
%context = undef;
process() if rerun();
job1();
job2();
%context = pop @context_stack;
}
sub job1() {print $context{foo}}
sub job2() {print $context{bar}}
The third solution is to pass the context to all subroutines, which can be annoying for very small subroutines. Also, %context becomes global to my whole program, so I lose the privacy of this variable.
my %context; # Local context
my @context_stack;
sub process {
push @context_stack, %context;
%context = undef;
process() if rerun(\%context);
job1(\%context);
job2(\%context);
%context = pop @context_stack;
}
sub job1() {$context = shift; print $context->{foo}}
sub job2() {$context = shift; print $context->{bar}}
What would be the best approach?
EDIT
To make my specific case clearer, here is another example:
process(@ARGV);
exit 0;
sub process {
my $infile = shift;
my $outfile = shift;
open my $fp_in, '<', $infile;
open my $fp_out, '>', $outfile;
LINE: while(<$fp_in>) {
remove_c_comment();
say STDERR "File is $infile";
process($1, "$1.processed") if /#include "(.*?)";
warning("Huh raisin, no!") if /\braisin/;
say STDERR "Fill is still $infile";
print $fp_out $_;
}
sub remove_c_comment { s|//.*$|| }
sub warning { say "[Warning] $infile:$. ".shift() }
}
The thing you're looking for - but you may not know it - is called a closure. (see also: perlref)
{
my %context;
sub job1 { print $context{foo} };
sub job2 { print $context{bar} };
sub init_context{ $context{foo} = 1 };
}
Context remains private within this block, but accessible to all the subroutines.
As an alternative - you can return a code reference from a subroutine - like this:
use strict;
use warnings;
sub make_sub_with_context {
my %context;
$context{"bar"} = 1;
return sub { print $context{"bar"}++ };
}
my $job1_with_context = make_sub_with_context();
my $another_job_with_context = make_sub_with_context();
$job1_with_context->();
$job1_with_context->();
$another_job_with_context->();
$another_job_with_context->();
$another_job_with_context->();
Which may be a better example.
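Applied to the recursive process from the question, the same idea means creating the helpers as anonymous subs inside each call, so each one closes over that call's own `my %context`. Named subs nested inside process are compiled once and keep seeing the first call's variables, which is the behaviour the question ran into. A minimal sketch; the `$depth` and `$log` parameters are hypothetical stand-ins for `rerun()` and the printing.

```perl
use strict;
use warnings;

sub process {
    my ($name, $depth, $log) = @_;
    my %context = (file => $name);    # a fresh hash on every call
    # Anonymous sub created per call: it closes over *this* call's %context.
    my $job = sub { push @$log, "$context{file}: $_[0]" };

    $job->('start');
    process("$name.child", $depth - 1, $log) if $depth > 0;
    $job->('end');                    # still this call's context after recursion
    return $log;
}

my $log = process('top', 1, []);
print "$_\n" for @$log;
```

This prints the four entries in the order "top: start", "top.child: start", "top.child: end", "top: end", with no explicit stack.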
Edit:
Following on from your updated example, it looks like your problem is to iterate over a set of files and (recursively) traverse the files they reference.
Sort of like find, but following include directives. I would point out that doing it that way can potentially end up in an infinite loop (for example, two files that include each other), which isn't ideal.
Can I suggest instead taking a different approach? Don't recurse:
use strict;
use warnings;
my @files_to_process = @ARGV;
my %done;
while ( my $infile = pop @files_to_process ) {
next if $done{$infile}++;
open my $fp_in, '<', $infile or die $!;
open my $fp_out, '>', $infile . ".processed" or die $!;
while ( my $line = <$fp_in> ) {
$line =~ s|\/\/.*$||;
if ( my ($include) = ( $line =~ m/#include "(.*?)"/ ) ) {
push @files_to_process, $include;
}
print {$fp_out} $line;
}
close($fp_out);
close($fp_in);
}
With a bit more thought, and given that this task needs to process things in declaration order, I'd suggest instead that an OO approach might help. Something like:
use strict;
use warnings;
use feature 'say';
package parser;
sub new {
my ($class) = @_;
my $self = {};
bless $self, $class;
return $self;
}
sub warning {
my ( $self, $lineno, $message ) = @_;
say "[Warning] $self->{infile}:$lineno $message";
}
sub process {
my ( $self, $infile, $outfile ) = @_;
$self->{infile} = $infile;
open my $fp_in, '<', $infile or die "Cannot open $infile: $!";
open my $fp_out, '>', $outfile or die "Cannot open $outfile: $!";
LINE: while ( my $line = <$fp_in> ) {
my $lineno = $.;    # save it before a nested include reads another file
$line =~ s|\/\/.*$||;
say STDERR "File is $infile";
if ( my ($includefile) = ( $line =~ m/#include "(.*?)"/ ) ) {
my $processor = parser->new();
$processor -> process( $includefile, "$includefile.processed" );
}
$self->warning( $lineno, "Huh raisin, no!" ) if $line =~ /\braisin/;
say STDERR "Fill is still $infile";
print $fp_out $line;
}
}
package main;
parser->new()->process(@ARGV);

How to build hierarchical hash in Perl from directory tree

I am trying to build a structure like this.
{
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
"directory_movies": [
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
"directory_sub_movies": [
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
]
]
};
In other words, it mirrors any directory hierarchy on Unix: there are plain files and directories, and each directory becomes a nested hash, recursively.
I need to represent it as hash in perl, the easiest way I have found is to use File::Find module.
It works correctly but I cannot figure out how to save hierarchy in hash to be nested as above.
Here is my test script. It determines the type of the current item correctly.
use File::Find;
use File::Basename;
sub path_checker {
if (-d $File::Find::name) {
print "Directory " . $_ . "\n";
}
elsif (-f $File::Find::name) {
print "File " . $_ . " Category is " . basename($File::Find::dir) . "\n";
}
}
sub parse_tree {
my ($class,$root_path) = @_;
File::Find::find(\&path_checker, $root_path);
}
Please help to modify it to create structure like I have described above. I would be very grateful.
Subfolders should also be hashes, not arrays:
use strict;
use warnings;
# use Data::Dumper;
use File::Find;
use JSON;
sub parse_tree {
my ($root_path) = @_;
my %root;
my %dl;
my %count;
my $path_checker = sub {
my $name = $File::Find::name;
if (-d $name) {
my $r = \%root;
my $tmp = $name;
$tmp =~ s|^\Q$root_path\E/?||;
$r = $r->{$_} ||= {} for split m|/|, $tmp; #/
$dl{$name} ||= $r;
}
elsif (-f $name) {
my $dir = $File::Find::dir;
my $key = "file". ++$count{ $dir };
$dl{$dir}{$key} = $_;
}
};
find($path_checker, $root_path);
return \%root;
}
print encode_json(parse_tree("/tmp"));
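For comparison, the same layout can be produced without File::Find by recursing with opendir/readdir. In this sketch `dir_to_hash` is a hypothetical name; entries are sorted so the fileN numbering is stable, a subdirectory literally named file1 would collide with a file key, and symlinked directories are skipped to avoid loops.

```perl
use strict;
use warnings;

# Hypothetical helper: each directory becomes a hashref; plain files are
# stored under "file1", "file2", ... and subdirectories under their names.
sub dir_to_hash {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # sort so that the fileN numbering does not depend on readdir order
    my @entries = sort { $a cmp $b } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my %node;
    my $count = 0;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-d $path && !-l $path) {
            $node{$entry} = dir_to_hash($path);    # recurse into subdirectory
        }
        elsif (-f $path) {
            $node{ 'file' . ++$count } = $entry;
        }
    }
    return \%node;
}

# Usage (JSON::PP is a core module):
#   use JSON::PP;
#   print JSON::PP->new->canonical->pretty->encode(dir_to_hash('/tmp'));
```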

How can I still get automatic assignment to '$_' with a mocked 'readline' function?

Perl has some special handling for the readline function (and the equivalent <> I/O operator) where it treats the expressions
while (<HANDLE>)
while (readline(HANDLE))
as equivalent to
while (defined($_ = <HANDLE>))
cf.
$ perl -MO=Deparse -e 'f($_) while <>'
f($_) while defined($_ = <ARGV>); <--- implicitly sets $_
-e syntax OK
But this automatic assignment doesn't seem to happen if you hijack the readline function:
$ perl -MO=Deparse -e 'BEGIN {
> *CORE::GLOBAL::readline = sub { }
> }
> f($_) while <>'
sub BEGIN {
*CORE::GLOBAL::readline = sub {
};
}
f($_) while readline(ARGV); <--- doesn't set $_ !
-e syntax OK
Of course, this will make the custom readline function work incorrectly for a lot of legacy code. The output of this code is "foo" with the BEGIN block and "bar" without it, but I want it to be "BAR".
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return uc $line if defined $line;
return;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
What options do I have to hijack the readline function but still get the proper treatment of the while (<...>) idiom? It's not practical to explicitly convert everything to while (defined($_=<...>)) in all the legacy code.
This is a fairly dirty hack using overloading to detect boolean context, but it seems to do the trick. It certainly needs more testing than I have given it before using this solution in a production environment:
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return Readline->new(uc $line) if defined $line;
return;
}
{package Readline;
sub new {shift; bless [@_]}
use overload fallback => 1,
'bool' => sub {defined($_ = $_[0][0])}, # set $_ in bool context
'""' => sub {$_[0][0]},
'+0' => sub {$_[0][0]};
}
my $bar;
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
which prints:
BAR
This will also make if (<X>) {...} set $_. I don't know if there is a way to limit the magic to only while loops.
This code:
use warnings;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return unless defined $line;
$line = uc $line;
$_ = $line;
return $line;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print $_; # want and expect to see "BAR\n"
}
print "$_"; # prints "BAR" instead of "foo"
does almost the right thing, but $_ is not localised, so after the loop, $_ is set to the last value read from the filehandle. Adding Scope::Upper to the mix fixes that:
use warnings;
use Scope::Upper qw/localize SCOPE/;
BEGIN { *CORE::GLOBAL::readline = \&uc_readline; }
sub uc_readline {
my $line = CORE::readline(shift || *ARGV);
return unless defined $line;
$line = uc $line;
local $_ = $line;
# localize $_ in the scope of the while
localize *main::_, \$line, SCOPE(1);
return $line;
}
($_, $bar) = ("foo\n", "bar\n");
open X, '<', \$bar;
while (<X>) {
print "$_"; # want and expect to see "BAR\n"
}
print "$_"; # will print 'foo', not "BAR"

How can I recursively read out directories in Perl?

I want to read a directory recursively to print its data structure in an HTML page with Template::Toolkit.
But I'm stuck on how to save the paths and files in a form that can be read out easily.
My idea started like this:
sub list_dirs{
my ($rootPath) = @_;
my (@paths);
$rootPath .= '/' if($rootPath !~ /\/$/);
for my $eachFile (glob($path.'*'))
{
if(-d $eachFile)
{
push (@paths, $eachFile);
&list_dirs($eachFile);
}
else
{
push (@files, $eachFile);
}
}
return @paths;
}
How could I solve this problem?
This should do the trick
use strict;
use warnings;
use File::Find qw(finddepth);
my @files;
finddepth(sub {
return if($_ eq '.' || $_ eq '..');
push @files, $File::Find::name;
}, '/my/dir/to/search');
You should always use strict and warnings to help you debug your code. Perl would have told you, for example, that @files is not declared. But the real problem with your function is that you declare a lexical variable @paths on every recursive call to list_dirs and don't push the return value back after the recursion step:
push @paths, list_dirs($eachFile);
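Put together, a corrected version of the original function might look like the sketch below (not code from this answer). It uses `bsd_glob` from the core File::Glob module so that paths containing spaces are not split; as with any glob, `*` skips dotfiles, and characters such as `[` in `$rootPath` are still treated as pattern syntax.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Returns directories and files under $rootPath, each directory listed
# before its contents. The recursive call's result is pushed back onto
# @paths, which is what the original version was missing.
sub list_dirs {
    my ($rootPath) = @_;
    $rootPath .= '/' unless $rootPath =~ m{/\z};
    my @paths;
    for my $eachFile (bsd_glob($rootPath . '*')) {
        push @paths, $eachFile;
        push @paths, list_dirs($eachFile) if -d $eachFile;
    }
    return @paths;
}

# Usage: print "$_\n" for list_dirs('/my/dir/to/search');
```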
If you don't want to install additional modules, the following solution should probably help you:
use strict;
use warnings;
use File::Find qw(find);
sub list_dirs {
my @dirs = @_;
my @files;
find({ wanted => sub { push @files, $_ } , no_chdir => 1 }, @dirs);
return @files;
}
The answer by mdom explains how your initial attempt went astray. I would also suggest that you consider friendlier alternatives to File::Find. CPAN has several options. Here's one.
use strict;
use warnings;
use File::Find::Rule;
my @paths = File::Find::Rule->in(@ARGV);
Also see here:
SO answer providing CPAN alternatives to File::Find.
SO question on directory iterators.
And here is a rewrite of your recursive solution. Things to note: use strict; use warnings; and the use of a scoping block to create a static variable for the subroutine.
use strict;
use warnings;
print $_, "\n" for dir_listing(@ARGV);
{
my @paths;
sub dir_listing {
my ($root) = @_;
$root .= '/' unless $root =~ /\/$/;
for my $f (glob "$root*"){
push @paths, $f;
dir_listing($f) if -d $f;
}
return @paths;
}
}
I think you have a problem in the following line of your code:
for my $eachFile (glob($path.'*'))
Change the $path variable to $rootPath, and it will store the paths correctly.
I use this script to remove the hidden files that Mac OS X creates on my USB pen drive. I usually use the drive to listen to music in the car, and any file ending in ".mp3", even one starting with "._", shows up in the car's audio list.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find qw(find);
sub list_dirs {
my @dirs = @_;
my @files;
find({ wanted => sub { push @files, $_ } , no_chdir => 1 }, @dirs);
return @files;
}
if ( ! @ARGV || !$ARGV[0] ) {
print "** Invalid dir!\n";
exit ;
}
if ( $ARGV[0] !~ /\/Volumes\/\w/s ) {
print "** Dir should be at /Volumes/... > $ARGV[0]\n";
exit ;
}
my @paths = list_dirs($ARGV[0]) ;
foreach my $file (@paths) {
my ($filename) = ( $file =~ /([^\\\/]+)$/s ) ;
if ($filename =~ /^\._/s ) {
unlink $file ;
print "rm> $file\n" ;
}
}
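You can use this method as a recursive file search that picks out specific file types:
use File::Find qw(find);
my @files;
push @files, list_dir($outputDir);
sub list_dir {
my @dirs = @_;
my @files;
# for each visited path, glob the .txt files inside it; the embedded
# double quotes keep glob from splitting paths that contain spaces
find({ wanted => sub { push @files, glob "\"$_/*.txt\"" } , no_chdir => 1 }, @dirs);
return @files;
}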
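An alternative to running glob for every visited path is to filter inside `wanted` itself, which also handles spaces in names without any quoting. A sketch, with `files_with_ext` as a hypothetical name and the extension compared case-sensitively:

```perl
use strict;
use warnings;
use File::Find qw(find);

# Collect regular files whose names end in ".$ext" under @dirs.
sub files_with_ext {
    my ($ext, @dirs) = @_;
    my @files;
    find({
        no_chdir => 1,    # $_ is the full path, so -f works without chdir
        wanted   => sub { push @files, $_ if -f $_ && /\.\Q$ext\E\z/ },
    }, @dirs);
    return sort @files;
}

# Usage: my @txt = files_with_ext('txt', $outputDir);
```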