Create an array of hashes describing files obtained from a given directory - perl

I need to
Read a list of the files in a specified directory
Create an array of hashes describing those files
My program fetches the path to a directory from the command line, opens it and reads its contents. Directories and "dot" files are skipped, and the absolute path to every other entry is printed.
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use File::Spec;
use Digest::SHA qw(sha256_hex);
my $dir = $ARGV[0];
opendir DIR, $dir or die "cannot open dir $dir: $!";
while ( my $file = readdir DIR ) {
next unless ( -f "${ \File::Spec->catfile($dir, $file) }" );
next if ( "$file" =~ m/^\./) ;
print "${ \File::Spec->rel2abs($file) }\n";
}
closedir DIR;
Here I take a single file and create a hash with the path, size, and sha256sum.
my $file = "file1.txt";
my @fileref = (
{
path => "Full path: " . File::Spec->rel2abs($file),
size => "Size (bytes): " . -s $file,
id => "SHA256SUM: " . sha256_hex($file),
},
);
print "\n";
print "$fileref[0]{path}\n";
print "$fileref[0]{size}\n";
print "$fileref[0]{id}\n";
All of this works, but I cannot figure out how to iterate over each file and add it to the array.
This is what I planned
for each file
push file into array
add the path, size, and id key:value pairs to file
repeat
How can I generate the necessary array?

Thanks to Wumpus Q. Wumbley's push suggestion, I have solved my problem:
my @array;
opendir DIR, $dir or die "cannot open dir $dir: $!";
while(my $file = readdir DIR) {
next unless(-f "${\File::Spec->catfile($dir, $file)}");
next if("$file" =~ m/^\./);
#print "${\File::Spec->rel2abs($file)}\n";
my %hash = (
path => File::Spec->rel2abs($file),
size => -s $file,
id => sha256_hex($file),
);
push(@array, \%hash);
#print Dumper sort \@array;
}
closedir DIR;
print Dumper \@array;
I create the "frame" for the hash, and then pass it to the array via reference and the push function.

Here's my approach to a solution. I've saved a lot of code by using File::Spec::Functions instead of File::Spec, and by calling rel2abs only once.
I've also removed the labels, like "Full path: ", from the values in the hashes. There's no reason to put presentation strings in there: that's for the output code to do.
use strict;
use warnings;
use File::Spec::Functions qw/ catfile rel2abs /;
use Digest::SHA qw/ sha256_hex /;
use Data::Dumper qw/ Dumper /;
my ( $root ) = @ARGV;
$root = rel2abs( $root ); # Allow for relative path input
my @file_data;
{
opendir my $dh, $root or die qq{Cannot open directory "$root": $!};
while ( readdir $dh ) {
next if /^\./;
my $file = catfile( $root, $_ );
next unless -f $file;
push @file_data, {
path => $file,
size => -s $file,
id => sha256_hex( $file ),
};
}
}
print Dumper \@file_data;
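With the labels kept out of the data, they can be added back at output time. A minimal sketch of that idea follows; format_record is a hypothetical helper name, and the sample records are made up to stand in for @file_data.

```perl
use strict;
use warnings;

# Turn one file record (a hash reference with path, size and id keys)
# into the labelled lines that the question originally stored in the hash.
sub format_record {
    my ($rec) = @_;
    return join "\n",
        "Full path: $rec->{path}",
        "Size (bytes): $rec->{size}",
        "SHA256SUM: $rec->{id}";
}

# Hypothetical sample data standing in for @file_data.
my @file_data = (
    { path => '/tmp/file1.txt', size => 12, id => 'abc123' },
    { path => '/tmp/file2.txt', size => 0,  id => 'def456' },
);

# Each element of the array is a hash reference, so $_ is passed straight on.
print format_record($_), "\n\n" for @file_data;
```

Keeping the raw values in the hashes means the same array can also be sorted by size or compared by id without stripping labels first.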

You already have all the code: you are creating an array with its first element. The way you dereference it hints at an easy way to add the others:
my $i = 0;
while(my $file = readdir DIR) {
next unless(-f "${\File::Spec->catfile($dir, $file)}");
next if("$file" =~ m/^\./);
print "${\File::Spec->rel2abs($file)}\n";
$fileref[$i]{path} = File::Spec->rel2abs($file);
$fileref[$i]{size} = -s $fileref[$i]{path};
$fileref[$i]{id} = sha256_hex($fileref[$i]{path});
$i++;
}
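One thing worth checking in all of the versions above: sha256_hex computes the digest of the string it is given, so sha256_hex($file) hashes the characters of the file name rather than the file's contents. If the id is meant to be the file's checksum, something along these lines might be closer. This is only a sketch; file_sha256 is a hypothetical helper name, and the temporary file exists just for the demonstration.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Return the SHA-256 hex digest of a file's contents.
# sha256_hex($path) would instead hash the characters of the path string.
sub file_sha256 {
    my ($path) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');    # 'b' reads the file in binary mode
    return $sha->hexdigest;
}

# Demonstration on a throwaway file.
my ($fh, $tmp) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "hello\n";
close $fh;

print file_sha256($tmp), "\n";
```

In the hashes above, id => file_sha256($file) would then take the place of id => sha256_hex($file).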

Related

Perl cannot stat $_

I want the last modified time for each file in the directory. To make sure my loop is working I print $_ and I see the file names of the directory:
for ( @Files ) {
opendir( D, $path . '\/' . $_ ) or die "$!";
my @textfiles = grep { ! /^\.{1,2}$/ } readdir( D );
for ( @textfiles ) {
# print "$_\n"; <----the file names.
my $epoch_timestamp = ( stat( $_ ) )[9];
print "$epoch_timestamp\n";
}
I get this error
Use of uninitialized value $epoch_timestamp in concatenation (.) or string
What am I doing wrong?
readdir returns only the names of the files. If your current working directory is different from the directory being read, you must build the full path, as you did with the parameter to opendir. The easiest way is to use map in the list for the for loop.
I'm concerned about your statement
opendir( D, $path . '\/' . $_ ) or die "$!";
which will put, literally, \/ between $path and $_. I think you need just /, but it is simplest to interpolate the variables with
opendir( D, "$path/$_" ) or die "$!";
But $_ comes from the array @Files. If these are indeed file names then your opendir will fail. They need to be directory names.
In my solution I've built the variable $dir as
my $dir = "$path/$_"
so that it can be used in the call to opendir as well as to build the full path to the files in the following for loop.
Note that I have also used a lexical directory handle, my $dh; lexical handles are far superior to global handles like D.
for ( @Files ) {
my $dir = "$path/$_";
opendir my $dh, $dir or die $!;
my @textfiles = grep { ! /^\.{1,2}$/ } readdir $dh;
for ( map { "$dir/$_" } @textfiles ) {
# print "$_\n"; <----the file names.
my $epoch_timestamp = ( stat( $_ ) )[9];
print "$epoch_timestamp\n";
}
}
Or, as an alternative to the perfect answers above, you could use some modules and make your life easier. :) For example, Path::Tiny:
use 5.014;
use warnings;
use Path::Tiny;
my $path = path('/etc');
my @Files = qw(defaults cups ssl);
for my $dir (@Files) {
my @textfiles = $path->child($dir)->children;
for my $file (@textfiles) {
say "$file: ", $file->stat->mtime;
}
}
Of course, the nested loop above could be written as
for my $dir (@Files) {
my @textfiles = $path->child($dir)->children;
say "$_: ", $_->stat->mtime for (@textfiles);
}
and storing the list of files in @textfiles isn't necessary either, so it could be reduced to:
for my $dir (@Files) {
say "$_: ", $_->stat->mtime for ( $path->child($dir)->children );
}
}
Path::Tiny conveniently throws a clean exception message on error.
readdir only returns the name of the file in the directory. You need to provide a qualified path to the file to stat.
my $dir_qfn = ...;
opendir(my $dh, $dir_qfn)
or do {
warn("Can't read dir \"$dir_qfn\": $!\n");
next;
};
while (defined( my $fn = readdir($dh) )) {
next if $fn =~ /^\.\.?\z/;
my $qfn = "$dir_qfn/$fn";
my $mtime = ( stat($qfn) )[9];
defined($mtime)
or do {
warn("Can't stat file \"$qfn\": $!\n");
next;
};
...
}
Using glob instead
my $dir = ...;
my %ts =
map { $_ => (stat $_)[9] }
grep { !m{/\.\.?\z} } #/
glob "\Q$dir\E/{*,.*}";
say "$_ => $ts{$_}" for sort keys %ts;
I use a hash name => timestamp to collect both in a data structure. The pattern $dir/{*,.*} is there to catch dot files as well, or it would be just $dir/*.
The grep filters out the . and .. entries, which are found at the end of each path by the m{..} match. The glob pattern needs \Q..\E to prevent an injection bug with particular directory names. It also escapes spaces, so File::Glob with its :bsd_glob option isn't needed. Thanks to ikegami for comments.
If you'd rather process files one at a time, retrieve the list with glob and then iterate through it.
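A rough sketch of that one-at-a-time variant, reusing the same glob pattern and filter as above. The helper name dir_entries and the choice of '.' as the directory are assumptions made for the example.

```perl
use strict;
use warnings;

# Return the entries of $dir as full paths, dot files included,
# but without the . and .. entries.
sub dir_entries {
    my ($dir) = @_;
    return grep { !m{/\.\.?\z} } glob "\Q$dir\E/{*,.*}";
}

my $dir = '.';    # hypothetical directory; substitute your own
for my $path ( dir_entries($dir) ) {
    my $mtime = ( stat $path )[9];
    next unless defined $mtime;    # stat failed, e.g. a dangling symlink
    print "$path => $mtime\n";
}
```

The list is still built in full by glob before the loop starts; the loop only lets you handle, skip or report on each file individually.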

Counting the number of files in a directory of special type in perl

I would like to know if there is any way to find the number of files of a particular type in a folder. For example, I have a folder with 30 files with .txt, .doc and .html extensions, and I want to know the number of, say, .html files in this directory.
Update: Here is what I have so far for counting the number of files in the directory, but I am not sure how to use glob(). Of course, instead of getcwd one could supply another directory.
use Cwd;
my $dir = getcwd;
my $count = 0;
opendir (DIR, $dir) or die $!;
my @dir = readdir DIR;
my @file_list;
if (@file_list eq glob "*.pl"){
print "$item\n";
$count = $count + 1;
}
closedir DIR;
$count = $count - 2;
print "There are $count files in this directory.";
I found out how to do it without glob():
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
my $dir = getcwd;
my $count = 0;
opendir(my $dh, $dir) or die "$0: $dir: $!\n";
while (my $file = readdir($dh)) {
# We only want files
next unless (-f "$dir/$file");
# Use a regular expression to find files ending in .html
next unless ($file =~ m/\.html$/);
print "$file\n";
$count = $count + 1;
}
closedir($dh);
print "There are $count files in this directory.";
exit 0;
Thanks a lot for the comments!!
The problem you've got in your question is that glob is a bit magic. You can do this:
foreach my $file ( glob ("*.txt") ) {
print $file,"\n";
}
and
while ( my $file = glob ("*.txt" )) {
print $file,"\n";
}
glob detects whether it is called in scalar context (you expect a single value), in which case it works as an iterator, or in list context (you expect multiple values), in which case it returns the whole lot.
You can make it do what you want like this:
my @stuff = glob ( "*.txt" );
print "There are: ", scalar @stuff," files matching the pattern\n";
print join ( "\n", @stuff );
Note that readdir works the same way - you can either slurp the whole lot doing it in a list context, or one line at a time with a scalar context:
opendir ( my $dirh, "some_directory");
my @stuff = readdir ( $dirh );
#etc.
Or
opendir ( my $dirh, "." ) or die $!;
while ( my $dir_entry = readdir ( $dirh ) ) {
#etc.
}
If you do want to do readdir-and-filter you can also do it like this:
my @matches = grep { m/\.txt$/ } readdir ( $dirh );
(This doesn't save you any efficiency; grep just hides the loop. It might make the code more readable, but that's a matter of taste.)
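Since the original question was about counting, here is a small sketch that puts the readdir-and-filter idea to that use. It relies on grep returning the number of matches when called in scalar context; the sub name count_by_extension is made up for the example.

```perl
use strict;
use warnings;

# Count the regular files in $dir whose names end in ".$ext".
sub count_by_extension {
    my ($dir, $ext) = @_;
    opendir my $dh, $dir or die "Cannot open directory '$dir': $!";
    # Assigning grep to a scalar gives the number of matching entries.
    my $count = grep { /\.\Q$ext\E\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return $count;
}

my $n = count_by_extension('.', 'html');
print "There are $n .html files in this directory.\n";
```

Because only names ending in the extension are counted, there is no need to subtract 2 for the . and .. entries.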

perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long-winded, but I would really appreciate an answer here as I am having difficulty getting this to work.
Building on this question, I have a script that works on a CSV file (orig.csv) and produces the CSV file that I want (format.csv). What I want is to make this more generic, so that it accepts any number of '.csv' files and produces an 'output_csv' for each input file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my @labels;
while (<$orig_fh>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
print $format_fh join(',', @{ $data{$label} }), "\n";
}
i was hoping to use this script from here but am having great difficulty putting the 2 together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my @files = readdir(DIR);
closedir DIR;
foreach my $file (@files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
the script reads all the files in the directory and finds the line with this string 'KAC 50' and then appends that line to an output_$file for that inputfile. so there will be 1 output_$file for every inputfile that is read
issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' files in the directory and produces a
'output_.' and 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
EDIT: I have tried combining the two versions but am getting errors. Advice greatly appreciated.
UserName@wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my @labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my @fields = split /,/, $_, -1;
my ($label, $max_val) = @fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \@fields;
push @labels, $label;
}
}
for my $label (@labels) {
#print $format_fh join(',', @{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', @{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destination? Maybe just have a fixed directory in which you want to process all the CSV files, and prefix the result.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.
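The "symbol ref" error in the combined code comes from printing to and reading from $format_file and $orig_file, which are plain path strings rather than filehandles. A sketch of how the old processing code could be slotted into the loop, with both handles opened per file, might look like this. The sub names format_csv and format_all are hypothetical, and the script only does anything when a directory is given on the command line.

```perl
use strict;
use warnings;
use autodie;

# Per-file logic from the question: for each label (column 2), keep the
# row with the largest value in column 13, in order of first appearance.
sub format_csv {
    my ($orig_file, $format_file) = @_;

    open my $orig_fh,   '<', $orig_file;      # filehandles, not path strings
    open my $format_fh, '>', $format_file;

    my $header = <$orig_fh>;                  # copy the header line
    print {$format_fh} $header if defined $header;

    my (%data, @labels);
    while (my $line = <$orig_fh>) {
        chomp $line;
        my @fields = split /,/, $line, -1;
        my ($label, $max_val) = @fields[1, 12];
        if (exists $data{$label}) {
            my $prev_max_val = $data{$label}[12] || 0;
            $data{$label} = \@fields if $max_val and $max_val > $prev_max_val;
        }
        else {
            $data{$label} = \@fields;
            push @labels, $label;
        }
    }
    for my $label (@labels) {
        print {$format_fh} join(',', @{ $data{$label} }), "\n";
    }
    close $format_fh;
    close $orig_fh;
}

# Run format_csv on every .csv file in $source_dir, skipping earlier output.
sub format_all {
    my ($source_dir, $output_prefix) = @_;
    opendir my $dh, $source_dir;
    for my $file (readdir $dh) {
        next if $file !~ /\.csv$/;
        next if $file =~ /^\Q$output_prefix\E/;
        format_csv("$source_dir/$file", "$source_dir/$output_prefix$file");
    }
    closedir $dh;
}

format_all($ARGV[0], 'format_') if @ARGV;
```

Declaring %data and @labels inside format_csv means each input file starts with empty data, so rows from one CSV cannot leak into the next.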

How to read directories and sub-directories without knowing the directory name in perl?

Hi, I want to read directories and sub-directories without knowing their names in advance. The current directory is "D:/Temp". 'Temp' has sub-directories such as 'A1' and 'A2'; 'A1' in turn has sub-directories such as 'B1' and 'B2'; and 'B1' has sub-directories such as 'C1' and 'C2'. The Perl script doesn't know these names, so it first has to find the directories and then read one file at a time in 'C1'; once all the files in 'C1' have been read, it should move on to 'C2'. I tried the code below, but I don't want to read all the files into an array (@files); I need one file at a time. The elements of the array @dir should be as follows.
$dir[0] = "D:/Temp/A1/B1/C1"
$dir[1] = "D:/Temp/A1/B1/C2"
$dir[2] = "D:/Temp/A1/B2/C1"
Below is the code i tried.
use strict;
use File::Find::Rule;
use Data::Dumper;
my $dir = "D:/Temp";
my @dir = File::Find::Rule->directory->in($dir);
print Dumper (\@dir);
my $readDir = $dir[3];
opendir ( DIR, $readDir ) || die "Error in opening dir $readDir\n";
my @files = grep { !/^\.\.?$/ } readdir DIR;
print STDERR "files: @files \n\n";
for my $fil (#files) {
open (F, "<$fil");
read (F, my $data);
close (F);
print "$data";
}
use File::Find;
use strict;
use warnings;
my @dirs;
my %has_children;
find(sub {
if (-d) {
push @dirs, $File::Find::name;
$has_children{$File::Find::dir} = 1;
}
}, 'D:/Temp');
my @ends = grep {! $has_children{$_}} @dirs;
print "$_\n" for (@ends);
Your Goal: Find the absolute paths to those directories that do not themselves have child directories.
I'll call those directories of interest terminal directories. Here's the prototype for a function that I believe provides the convenience you are looking for. The function returns its result as a list.
my #list = find_terminal_directories($full_or_partial_path);
And here's an implementation of find_terminal_directories(). Note that this implementation does not require the use of any global variables. Also note the use of a private helper function that is called recursively.
On my Windows 7 system, for the input directory C:/Perl/lib/Test, I get the output:
== List of Terminal Folders ==
c:/Perl/lib/Test/Builder/IO
c:/Perl/lib/Test/Builder/Tester
c:/Perl/lib/Test/Perl/Critic
== List of Files in each Terminal Folder: ==
c:/Perl/lib/Test/Builder/IO/Scalar.pm
c:/Perl/lib/Test/Builder/Tester/Color.pm
c:/Perl/lib/Test/Perl/Critic/Policy.pm
Implementation
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);
my @dir_list = find_terminal_directories("C:/Perl/lib/Test");
print "== List of Terminal Directories ==\n";
print join("\n", @dir_list), "\n";
print "\n== List of Files in each Terminal Directory: ==\n";
for my $dir (@dir_list) {
for my $file (<"$dir/*">) {
print "$file\n";
open my $fh, '<', $file or die $!;
my $data = do { local $/; <$fh> }; # slurp entire file contents into $data
close $fh;
# Now, do something with $data !
}
}
sub find_terminal_directories {
my $rootdir = shift;
my @wanted;
my $cwd = getcwd();
chdir $rootdir;
find_terminal_directories_helper(".", \@wanted);
chdir $cwd;
return @wanted;
}
sub find_terminal_directories_helper {
my ($dir, $wanted) = @_;
return if ! -d $dir;
opendir(my $dh, $dir) or die "open directory error!";
my $count = 0;
foreach my $child (readdir($dh)) {
my $abs_child = abs_path($child);
next if (! -d $child || $child eq "." || $child eq "..");
++$count;
chdir $child;
find_terminal_directories_helper($abs_child, $wanted); # recursion!
chdir "..";
}
push @$wanted, abs_path($dir) if ! $count; # no sub-directories found!
}
Perhaps the following will be helpful:
use strict;
use warnings;
use File::Find::Rule;
my $dir = "D:/Temp";
local $/;
my @dirs =
sort File::Find::Rule->exec( sub { File::Find::Rule->directory->in($_) == 1 }
)->directory->in($dir);
for my $dir (@dirs) {
for my $file (<"$dir/*">) {
open my $fh, '<', $file or die $!;
my $data = <$fh>;
close $fh;
print $data;
}
}
local $/; lets us slurp the file's contents into a variable. Delete it if you only want to read the first line.
The sub in the exec() is used to pass only those dirs which don't contain a dir
sort is used to arrange those dirs in your wanted order
A file glob <"$dir/*"> is used to get the files in each dir
Edit: Have modified the code to find only 'terminal directories.' Thanks to DavidRR for this spec clarification.
I would use File::Find
Sample script:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $dir = "/home/chris";
find(\&wanted, $dir);
sub wanted {
print "dir: $File::Find::dir\n";
print "file in dir: $_\n";
print "complete path to file: $File::Find::name\n";
}
OUTPUTS:
$ test.pl
dir: /home/chris/test_dir
file in dir: test_dir2
complete path to file: /home/chris/test_dir/test_dir2
dir: /home/chris/test_dir/test_dir2
file in dir: foo.txt
complete path to file: /home/chris/test_dir/test_dir2/foo.txt
...
Using backticks, write subdirs and files to a file called filelist:
`ls -R $dir > filelist`

File::Find to search a directory for a list of files

I'm writing a Perl script and I'm new to Perl. I have a file that contains a list of files. For each item on the list I want to search a given directory and its sub-directories to find the file and return its full path. I've been unsuccessful thus far trying to use File::Find. Here's what I've got:
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my @file_list;
find(\&wanted, $directory);
sub wanted {
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( @file_list, $file );
}
close (FILE);
return @file_list;
}
I find File::Find::Rule a tad easier and more elegant to use.
use File::Find::Rule;
my $path = '/some/path';
# Find all directories under $path
my @paths = File::Find::Rule->directory->in( $path );
# Find all files in $path
my @files = File::Find::Rule->file->in( $path );
The arrays contain full paths to the objects File::Find::Rule finds.
File::Find is used to traverse a directory structure in the filesystem. Instead of doing what you're trying to do, namely, have the wanted subroutine read in the file, you should read in the file as follows:
use strict;
use warnings;
use vars qw/@file_list/;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
open FILE, "$input_file" or die "$!\n";
foreach my $file (<FILE>) {
chomp($file);
push ( @file_list, $file );
}
# do what you need to here with the @file_list array
Okay, I re-read the docs and found that I had misunderstood the wanted subroutine: it is called on every file and directory that is found. So here's my code, taking that into account:
use strict;
use warnings;
use File::Find;
my $directory = '/home/directory/';
my $input_file = '/home/directory/file_list';
my @file_list;
open (FILE, $input_file);
foreach my $file (<FILE>) {
chomp($file);
push ( @file_list, $file );
}
close (FILE);
find(\&wanted, $directory);
sub wanted {
if ( $_ ~~ @file_list ) {
print "$File::Find::name\n";
}
return;
}
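The smartmatch operator (~~) used above is experimental and warns on newer perls, and it scans the whole list for every file visited. A hash lookup is a common substitute. The following is only a sketch of that variant; find_listed_files is a hypothetical name, and the script expects the directory and the list file as its two command-line arguments.

```perl
use strict;
use warnings;
use File::Find;

# Return the full paths under $directory of every regular file whose
# base name appears in the list referenced by $names.
sub find_listed_files {
    my ($directory, $names) = @_;
    my %wanted = map { $_ => 1 } @$names;    # name => 1 for quick lookup
    my @found;
    find(sub {
        # Inside the callback $_ is the base name of the current entry.
        push @found, $File::Find::name if $wanted{$_} && -f $_;
    }, $directory);
    return @found;
}

if (@ARGV == 2) {
    my ($directory, $input_file) = @ARGV;
    open my $fh, '<', $input_file or die "Cannot open '$input_file': $!";
    chomp(my @file_list = <$fh>);
    close $fh;
    print "$_\n" for find_listed_files($directory, \@file_list);
}
```

If the same file name exists in several sub-directories, every matching path is returned.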