How to build hierarhical hash in Perl from directory tree - perl

I am trying to build the structure like this.
{
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
"directory_movies": [
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
"directory_sub_movies": [
"file1": "supersong.mp3",
"file2": "supersong2.mp3",
"file3": "text.txt",
"file4": "tex2t.txt",
"file5": "text3.txt",
"file6": "json.pl",
]
]
};
So as any directory hierarchy in my case in unix. So we have simple files or directories, if it is directory it is nested hash and so on recursively.
I need to represent it as hash in perl, the easiest way I have found is to use File::Find module.
It works correctly but I cannot figure out how to save hierarchy in hash to be nested as above.
Here is my test script. That determines type of current item correctly.
sub path_checker {
if (-d $File::Find::name) {
print "Directory " . $_ . "\n";
}
elsif (-f $File::Find::name) {
print "File " . $_ . " Category is " . basename($File::Find::dir) . "\n";
}
}
sub parse_tree {
my ($class,$root_path) = #_;
File::Find::find(\&path_checker, $root_path);
}
Please help to modify it to create structure like I have described above. I would be very grateful.

Subfolders should also be hashes, not arrays,
use strict;
use warnings;
# use Data::Dumper;
use File::Find;
use JSON;
sub parse_tree {
my ($root_path) = #_;
my %root;
my %dl;
my %count;
my $path_checker = sub {
my $name = $File::Find::name;
if (-d $name) {
my $r = \%root;
my $tmp = $name;
$tmp =~ s|^\Q$root_path\E/?||;
$r = $r->{$_} ||= {} for split m|/|, $tmp; #/
$dl{$name} ||= $r;
}
elsif (-f $name) {
my $dir = $File::Find::dir;
my $key = "file". ++$count{ $dir };
$dl{$dir}{$key} = $_;
}
};
find($path_checker, $root_path);
return \%root;
}
print encode_json(parse_tree("/tmp"));

Related

Search file in directory structure

Does anybody know a method to search for a file in a directory structure without using File::Find? I know step-by-step how to do it but if it is possible to make it smoother that will be helpful.
File::Find is a core module since perl 5.000 so I don't see a reason for not using it.
But if you still want to take your crazy way you could call the find command.
From one File::Find hater to another: DirWalk.pm, inspired by the Python's os.walk().
package DirWalk;
use strict;
use warnings;
sub new {
my ($class, #dirs) = #_;
my #odirs = #dirs;
#dirs = qw/./ unless #dirs;
s!/+$!! for #dirs;
s!/+\.$!! for #dirs;
my $self = { _odirs => [#odirs], _dirs => [#dirs], _dhstack => [], _dnstack => [] };
opendir my($dirh), $dirs[0];
return undef unless $dirh;
shift #{ $self->{_dirs} };
unshift #{ $self->{_dhstack} }, $dirh;
unshift #{ $self->{_dnstack} }, $dirs[0];
return bless $self, $class;
}
sub _walk_op {
my ($self) = #_;
if (wantarray) {
my #ret;
while (defined(my $x = $self->next())) {
push #ret, $x;
}
return #ret;
}
elsif (defined wantarray) {
return $self->next();
}
return undef;
}
sub next
{
my ($self) = #_;
my $dstack = $self->{_dhstack};
my $nstack = $self->{_dnstack};
if (#$dstack) {
my $x;
do {
$x = readdir $dstack->[0];
} while (defined($x) && ($x eq '.' || $x eq '..'));
if (defined $x) {
my $nm = $nstack->[0].'/'.$x;
if (-d $nm) {
# open dir, and put the handle on the stack
opendir my($dh), $nm;
if (defined $dh) {
unshift #{ $self->{_dhstack} }, $dh;
unshift #{ $self->{_dnstack} }, $nm;
}
else {
warn "can't walk into $nm!"
}
$nm .= '/';
}
# return the name
return $nm;
}
else {
closedir $dstack->[0];
shift #$dstack;
shift #$nstack;
unless (#$dstack) {
while (#{ $self->{_dirs} }) {
my $dir = shift #{ $self->{_dirs} };
opendir my($dirh), $dir;
next unless defined $dirh;
unshift #{ $self->{_dhstack} }, $dirh;
unshift #{ $self->{_dnstack} }, $dir;
last;
}
}
return $self->next();
}
}
else {
return undef;
}
}
use overload '<>' => \&_walk_op;
use overload '""' => sub { 'DirWalk('.join(', ', #{$_[0]->{_odirs}}).')'; };
1;
Example:
# prepare test structure
mkdir aaa
touch aaa/bbb
mkdir aaa/ccc
touch aaa/ccc/ddd
# example invocation:
perl -mDirWalk -E '$dw=DirWalk->new("aaa"); say while <$dw>;'
#output
aaa/ccc/
aaa/ccc/ddd
aaa/bbb
Another example:
use strict;
use warnings;
use DirWalk;
# iteration:
my $dw = DirWalk->new("aaa");
while (<$dw>) {
print "$_\n";
}
# or as a list:
$dw = DirWalk->new("aaa");
my #list = <$dw>;
for (#list) {
print "$_\n";
}
The method I've been inplamenting is utilizing three commands: opendir, readdir, and closedir. See below for an example:
opendir my $dir1, $cwd or die "cannot read the directory $cwd: $!";
#cwd= readdir $dir1;
closedir $dir1;
shift #cwd; shift #cwd;
foreach(#cwd){if ($_=~/$file_search_name/){print "I have found the file in $_\n!";}}
The directory will be stored in #cwd, which includes . and .. For windows, shift #cwd will remove these. I unfortunately am tight for time, but utilize this idea with an anon array to store the directory handles as well as another array for storing the directory paths. Perhaps utilize -d to check if it is a directory. There might be file permission issues, so perhaps unless(opendir ...) would be a great option.
Best of luck.
I'm sure I will be flayed alive for this answer but you could always use either system() or backticks `` to execute the regular linux find command. Or do some sort of ls...
#files = `ls $var/folder/*.logfile`
#files = `find . -name $file2find`
I expect some seasoned perlers have many good reasons not to do this.
yon can also try some stuff like this!!!
# I want to find file xyz.txt in $dir (say C:\sandbox)
Findfile("xyz.txt", $dir);
sub Findfile ()
{
my $file = shift;
my $Searchdir = shift;
my #content = <$Searchdir/*>;
foreach my $element (#content)
{
if($element =~ /.*$file$/)
{
print "found";
last;
}
elsif (-d $element)
{
Findfile($file, $element); #recursive search
}
}
}
File::Find::Rule is "smoother".
use File::Find::Rule qw( );
say for File::Fine::Rule->in(".");

Match two strings based on common substring

I have a list of files that needs to be grouped in pairs. (I need to append an HTML 'File B' (body) to 'File A' (header) because I need to serve them statically without server-side includes).
Example:
/path/to/headers/.../matching_folder/FileA.html
/someother/path/to/.../matching_folder/body/FileB.html
Emphasizing with the ellipses that the paths are not of uniform length, nor is 'matching folder' always in the same position in the path.
It seems I need to match/join based on the common substring 'matching_folder', but I am stumped on scanning each string, storing, matching (excerpt):
my #dirs = ( $headerPath, $bodyPath );
my #files = ();
find( { wanted => \&wanted, no_chdir => 1 }, #dirs );
foreach my $file (#files) {
# pseudocode: append $file[0] to $file[1] if both paths contain same 'matching_folder'
};
sub wanted {
return unless -f and /(FileA\.html$)|(FileB\.html$)/i;
push #files, $_;
};
Hash the files by all the directory steps in their names.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use File::Find;
my $headerPath = 'headers';
my $bodyPath = 'bodies';
my #dirs = ($headerPath, $bodyPath);
my #files;
sub wanted {
return unless -f and /file.\.html$/;
push #files, $_;
};
find({ wanted => \&wanted, no_chdir => 1 }, #dirs);
my %common;
for my $file (#files) {
my #steps = split m(/), $file;
push #{ $common{$_} }, $file for #steps;
};
# All the headers and all the bodies share their prefixes,
# but that's not what we're interested in.
delete #common{qw{ bodies headers }};
for my $step (keys %common) {
next if 1 == #{ $common{$step} };
print "$step common for #{ $common{$step} }\n";
}
Tested on the following structure:
bodies/3/something/C/something2/fileA.html
bodies/2/junk/B/fileB.html
bodies/1/A/fileC.html
headers/a/B/fileD.html
headers/c/one/A/two/fileE.html
headers/b/garbage/C/fileF.html
Output:
B common for headers/a/B/fileD.html bodies/2/junk/B/fileB.html
C common for headers/b/garbage/C/fileF.html bodies/3/something/C/something2/fileA.html
A common for headers/c/one/A/two/fileE.html bodies/1/A/fileC.html
With the above, I can get to
for my $step (keys %common) {
next unless 2 == #{ $common{$step} }; # pairs
my #pairs = #{ $common{$step} };
my $html;
foreach my $f (#pairs) {
$html .= &readfile($f);
};
&writefile($html, $step . '.html');
}
And get what I need for now. Thanks all! (I love Perl, making hard things possible indeed).

Sorting module subroutines alphabetically

I would like to sort my module subroutines alphabetically (I have a lot of subroutines, and I think it will be easier to edit the file if the subroutines are ordered in the file). For example given A.pm:
package A;
use warnings;
use strict;
sub subA {
print "A\n";
}
sub subC {
print "C\n";
}
sub subB {
print "B\n";
}
1;
I would like to run a sortSub A.pm the gives:
package A;
use warnings;
use strict;
sub subA {
print "A\n";
}
sub subB {
print "B\n";
}
sub subC {
print "C\n";
}
1;
Is there any CPAN resource that can help with this task?
To parse and reformat Perl code, you should use PPI.
This is the same tool that Perl::Critic and Perl::Tidy use to accomplish all of their feats.
In this case, I studied the code for PPI::Dumper to get a sense of how to navigate the Document Tree that PPI returns.
The following will parse source code and separate out sections containing subroutines and comments. It will tie the comments, pod, and whitespace before a subroutine with it, and then it will sort all the neighboring subs by their names.
use strict;
use warnings;
use PPI;
use Data::Dump;
my $src = do { local $/; <DATA> };
# Load a document
my $doc = PPI::Document->new( \$src );
# Save Sub locations for later sorting
my #group = ();
my #subs = ();
for my $i ( 0 .. $#{ $doc->{children} } ) {
my $child = $doc->{children}[$i];
my ( $subtype, $subname )
= $child->isa('PPI::Statement::Sub')
? grep { $_->isa('PPI::Token::Word') } #{ $child->{children} }
: ( '', '' );
# Look for grouped subs, whitespace and comments. Sort each group separately.
my $is_related = ($subtype eq 'sub') || grep { $child->isa("PPI::Token::$_") } qw(Whitespace Comment Pod);
# State change or end of stream
if ( my $range = $is_related .. ( !$is_related || ( $i == $#{ $doc->{children} } ) ) ) {
if ($is_related) {
push #group, $child;
if ( $subtype ) {
push #subs, { name => "$subname", children => [#group] };
#group = ();
}
}
if ( $range =~ /E/ ) {
#group = ();
if (#subs) {
# Sort and Flatten
my #sorted = map { #{ $_->{children} } } sort { $a->{name} cmp $b->{name} } #subs;
# Assign back to document, and then reset group
my $min_index = $i - $range + 1;
#{ $doc->{children} }[ $min_index .. $min_index + $#sorted ] = #sorted;
#subs = ();
}
}
}
}
print $doc->serialize;
1;
__DATA__
package A;
use warnings;
use strict;
=comment
Pod describing subC
=cut
sub subC {
print "C\n";
}
INIT {
print "Hello World";
}
sub subB {
print "B\n";
}
# Hello subA comment
sub subA {
print "A\n";
}
1;
Output:
package A;
use warnings;
use strict;
=comment
Pod describing subC
=cut
sub subC {
print "C\n";
}
INIT {
print "Hello World";
}
# Hello subA comment
sub subA {
print "A\n";
}
sub subB {
print "B\n";
}
1;
First, here's my solution;
#!/bin/sh
TOKEN=sub
gsed -e ':a;N;$!ba;s/\n/__newline__/g' "$1" > "$1.out"
gsed -i "s/__newline__\\s*$TOKEN\W/\\nsub /g" "$1.out"
sort $1.out -o $1.out
gsed -i 's/__newline__/\n/g' $1.out
Usage: token_sort.sh myfile.pl
This is how it works;
Replace all newlines with a placeholder, __newline__
break out all $TOKENS, in this case subs, to their own line
Sort the lines using unix sort
Replace back all the newlines
You should now have a sorted copy of your file in myfile.pl.out
A few caveats;
Add a comment, "# Something", or "#!/usr/bin/env perl" to the top of the file; this will ensure that the header block remains sorted at the top.
The sorted block will be the start of the current sub to the next sub - comments at above the sub will get sorted with the previous sub.
You need to use gnu-sed for this to work, on a mac this means doing a "brew install gnu-sed"

Find unused "use'd" Perl modules

I am working on a very large, very old "historically grown" codebase. In the past, there were often people thinking "Oh, I may need this and that module, so I just include it...", and later, people often "cached" Data inside of modules ("use ThisAndThat" needing a few seconds to load some hundred MB from DB to RAM, yeah, its really a stupid Idea, we are working on that too) and so, often, we have a small module use'ing like 20 or 30 modules, from who 90% are totally unused in the source itself, and, because of "caching" in several use'd submodules, modules tend to take up one minute to load or even more, which is, of course, not acceptable.
So, Im trying to get that done better. Right now, my way is looking through all the modules, understanding them as much as possible and I look at all the modules including them and see whether they are needed or not.
Is there any easier way? I mean: There are functions returning all subs a module has like
...
return grep { defined &{"$module\::$_"} } keys %{"$module\::"}
, so, aint there any simple way to see which ones are exported by default and which ones come from where and are used in the other modules?
A simple example is Data::Dumper, which is included in nearly every file, even, when all debug-warns and prints and so on arent in the script anymore. But still the module has to load Data::Dumper.
Is there any simple way to check that?
Thanks!
The following code could be part of your solution - it will show you which symbols are imported for each instance of use:
package traceuse;
use strict;
use warnings;
use Devel::Symdump;
sub import {
my $class = shift;
my $module = shift;
my $caller = caller();
my $before = Devel::Symdump->new($caller);
my $args = \#_;
# more robust way of emulating use?
eval "package $caller; require $module; $module\->import(\#\$args)";
my $after = Devel::Symdump->new($caller);
my #added;
my #after_subs = $after->functions;
my %before_subs = map { ($_,1) } $before->functions;
for my $k (#after_subs) {
push(#added, $k) unless $before_subs{$k};
}
if (#added) {
warn "using module $module added: ".join(' ', #added)."\n";
} else {
warn "no new symbols from using module $module\n";
}
}
1;
Then just replace "use module ..." with "use traceuse module ...", and you'll get a list of the functions that were imported.
Usage example:
package main;
sub foo { print "debug: foo called with: ".Dumper(\#_)."\n"; }
use traceuse Data::Dumper;
This will output:
using module Data::Dumper added: main::Dumper
i.e. you can tell which functions were imported in robust way. And you can easily extend this to report on imported scalar, array and hash variables - check the docs on Devel::Symdump.
Determine which functions are actually used is the other half of the equation. For that you might be able to get away with a simple grep of your source code - i.e. does Dumper appear in the module's source code that's not on a use line. It depends on what you know about your source code.
Notes:
there may be a module which does what traceuse does - I haven't checked
there might be a better way to emulate "use" from another package
I kind of got of got it to work with PPI. It looks like this:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Term::ANSIColor;
use PPI;
use PPI::Dumper;
my %doneAlready = ();
$" = ", ";
our $maxDepth = 2;
my $showStuffOtherThanUsedOrNot = 0;
parse("/modules/Test.pm", undef, undef, 0);
sub parse {
my $file = shift;
my $indent = shift || 0;
my $caller = shift || $file;
my $depth = shift || 0;
if($depth && $depth >= $maxDepth) {
return;
}
return unless -e $file;
if(exists($doneAlready{$file}) == 1) {
return;
}
$doneAlready{$file} = 1;
my $skript = PPI::Document->new($file);
my #included = ();
eval {
foreach my $x (#{$skript->find("PPI::Statement::Include")}) {
foreach my $y (#{$x->{children}}) {
push #included, $y->{content} if (ref $y eq "PPI::Token::Word" && $y->{content} !~ /^(use|vars|constant|strict|warnings|base|Carp|no)$/);
}
}
};
my %double = ();
print "===== $file".($file ne $caller ? " (Aufgerufen von $caller)" : "")."\n" if $showStuffOtherThanUsedOrNot;
if($showStuffOtherThanUsedOrNot) {
foreach my $modul (#included) {
next unless -e createFileName($modul);
my $is_crap = ((exists($double{$modul})) ? 1 : 0);
print "\t" x $indent;
print color("blink red") if($is_crap);
print $modul;
print color("reset") if($is_crap);
print "\n";
$double{$modul} = 1;
}
}
foreach my $modul (#included) {
next unless -e createFileName($modul);
my $anyUsed = 0;
my $modulDoc = parse(createFileName($modul), $indent + 1, $file, $depth + 1);
if($modulDoc) {
my #exported = getExported($modulDoc);
print "Exported: \n" if(scalar #exported && $showStuffOtherThanUsedOrNot);
foreach (#exported) {
print(("\t" x $indent)."\t");
if(callerUsesIt($_, $file)) {
$anyUsed = 1;
print color("green"), "$_, ", color("reset") if $showStuffOtherThanUsedOrNot;
} else {
print color("red"), "$_, ", color("reset") if $showStuffOtherThanUsedOrNot;
}
print "\n" if $showStuffOtherThanUsedOrNot;
}
print(("\t" x $indent)."\t") if $showStuffOtherThanUsedOrNot;
print "Subs: " if $showStuffOtherThanUsedOrNot;
foreach my $s (findAllSubs($modulDoc)) {
my $isExported = grep($s eq $_, #exported) ? 1 : 0;
my $rot = callerUsesIt($s, $caller, $modul, $isExported) ? 0 : 1;
$anyUsed = 1 unless $rot;
if($showStuffOtherThanUsedOrNot) {
print color("red") if $rot;
print color("green") if !$rot;
print "$s, ";
print color("reset");
}
}
print "\n" if $showStuffOtherThanUsedOrNot;
print color("red"), "=========== $modul wahrscheinlich nicht in Benutzung!!!\n", color("reset") unless $anyUsed;
print color("green"), "=========== $modul in Benutzung!!!\n", color("reset") if $anyUsed;
}
}
return $skript;
}
sub createFileName {
my $file = shift;
$file =~ s#::#/#g;
$file .= ".pm";
$file = "/modules/$file";
return $file;
}
sub getExported {
my $doc = shift;
my #exported = ();
eval {
foreach my $x (#{$doc->find("PPI::Statement")}) {
my $worthATry = 0;
my $isMatch = 0;
foreach my $y (#{$x->{children}}) {
$worthATry = 1 if(ref $y eq "PPI::Token::Symbol");
if($y eq '#EXPORT') {
$isMatch = 1;
} elsif($isMatch && ref($y) ne "PPI::Token::Whitespace" && ref($y) ne "PPI::Token::Operator" && $y->{content} ne ";") {
push #exported, $y->{content};
}
}
}
};
my #realExported = ();
foreach (#exported) {
eval "\#realExported = $_";
}
return #realExported;
}
sub callerUsesIt {
my $subname = shift;
my $caller = shift;
my $namespace = shift || undef;
my $isExported = shift || 0;
$caller = `cat $caller`;
unless($namespace) {
return 1 if($caller =~ /\b$subname\b/);
} else {
$namespace = createPackageName($namespace);
my $regex = qr#$namespace(?:::|->)$subname#;
if($caller =~ $regex) {
return 1;
}
}
return 0;
}
sub findAllSubs {
my $doc = shift;
my #subs = ();
eval {
foreach my $x (#{$doc->find("PPI::Statement::Sub")}) {
my $foundName = 0;
foreach my $y (#{$x->{children}}) {
no warnings;
if($y->{content} ne "sub" && ref($y) eq "PPI::Token::Word") {
push #subs, $y;
}
use warnings;
}
}
};
return #subs;
}
sub createPackageName {
my $name = shift;
$name =~ s#/modules/##g;
$name =~ s/\.pm$//g;
$name =~ s/\//::/g;
return $name;
}
Its really ugly and maybe not 100% working, but it seems, with the tests that Ive done now, that its good for a beginning.

How can I recursively read out directories in Perl?

I want to read out a directory recursively to print the data-structure in an HTML-Page with Template::Toolkit.
But I'm hanging in how to save the Paths and Files in a form that can be read our easy.
My idea started like this
sub list_dirs{
my ($rootPath) = #_;
my (#paths);
$rootPath .= '/' if($rootPath !~ /\/$/);
for my $eachFile (glob($path.'*'))
{
if(-d $eachFile)
{
push (#paths, $eachFile);
&list_dirs($eachFile);
}
else
{
push (#files, $eachFile);
}
}
return #paths;
}
How could I solve this problem?
This should do the trick
use strict;
use warnings;
use File::Find qw(finddepth);
my #files;
finddepth(sub {
return if($_ eq '.' || $_ eq '..');
push #files, $File::Find::name;
}, '/my/dir/to/search');
You should always use strict and warnings to help you debug your code. Perl would have warned you for example that #files is not declared. But the real problem with your function is that you declare a lexical variable #paths on every recursive call to list_dirs and don't push the return value back after the recursion step.
push #paths, list_dir($eachFile)
If you don't want to install additional modules, the following solution should probably help you:
use strict;
use warnings;
use File::Find qw(find);
sub list_dirs {
my #dirs = #_;
my #files;
find({ wanted => sub { push #files, $_ } , no_chdir => 1 }, #dirs);
return #files;
}
The answer by mdom explains how your initial attempt went astray. I would also suggest that you consider friendlier alternatives to File::Find. CPAN has several options. Here's one.
use strict;
use warnings;
use File::Find::Rule;
my #paths = File::Find::Rule->in(#ARGV);
Also see here:
SO answer providing CPAN
alternatives to File::Find.
SO question on directory iterators.
And here is a rewrite of your recursive solution. Things to note: use strict; use warnings; and the use of a scoping block to create a static variable for the subroutine.
use strict;
use warnings;
print $_, "\n" for dir_listing(#ARGV);
{
my #paths;
sub dir_listing {
my ($root) = #_;
$root .= '/' unless $root =~ /\/$/;
for my $f (glob "$root*"){
push #paths, $f;
dir_listing($f) if -d $f;
}
return #paths;
}
}
I think you have problem in the following line in your code
for my $eachFile (glob($path.'*'))
You change the $path variable into $rootpath.
It will store the path correctly.
I use this script to remove hidden files (created by Mac OS X) from my USB Pendrive, where I usually use it to listen music in the car, and any file ending with ".mp3", even when it starts with "._", will be listed in the car audio list.
#!/bin/perl
use strict;
use warnings;
use File::Find qw(find);
sub list_dirs {
my #dirs = #_;
my #files;
find({ wanted => sub { push #files, $_ } , no_chdir => 1 }, #dirs);
return #files;
}
if ( ! #ARGV || !$ARGV[0] ) {
print "** Invalid dir!\n";
exit ;
}
if ( $ARGV[0] !~ /\/Volumes\/\w/s ) {
print "** Dir should be at /Volume/... > $ARGV[0]\n";
exit ;
}
my #paths = list_dirs($ARGV[0]) ;
foreach my $file (#paths) {
my ($filename) = ( $file =~ /([^\\\/]+)$/s ) ;
if ($filename =~ /^\._/s ) {
unlink $file ;
print "rm> $file\n" ;
}
}
you can use this method as recursive file search that separate specific file types,
my #files;
push #files, list_dir($outputDir);
sub list_dir {
my #dirs = #_;
my #files;
find({ wanted => sub { push #files, glob "\"$_/*.txt\"" } , no_chdir => 1 }, #dirs);
return #files;
}