I have multiple subdirectories, and each of them contains a different number of txt files. I am trying to read each of the txt files into an array, for each subdirectory. Note that each subdirectory has a different number of txt files. I have struggled to find an example of somebody doing something similar. Does anybody have a suggestion on where to look or how to do this?
I found an example that uses a server command, but it fails to do what I want. I am also a bit confused about how to name each array, although arrays in different subdirectories can have the same names, like array1, array2, array3...
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Glob;

my $txt;
my @fh;
my @table;
my $table;

for my $txt (glob '*.txt') {
    open my $fh, '<', $txt;
    print "$txt\n";
    for (my $txt = 1; $txt <= 8; $txt++) {
        open($fh, "server$txt");
        while (<$fh>) {
            chomp;
            my @values = split " ", $_;
            push @{ "table$txt" }, \@values;
            print "$table$txt\n";
        }
    }
}
I can use this bash script to run perl script on all subdirectories:
for i in `ls -d */`;do cd $i; pwd; for j in *txt; do perl ../foo.pl $j; done; cd ../ ; done
I have not tested this; I only typed the code into the window. I am also rusty on my Perl and do things in a blunt manner.
Assuming all the text directories are under one main directory: open the main directory with opendir, then read all the entries and test whether each one is a subdirectory (where the .txt files will be).
For each subdirectory, collect the .txt files with glob, which returns an array, and push that array into the main array, creating an array of arrays. You can look up how to iterate over this structure to get your information.
my @all_subdirs;
my $main_dir = "C:/main_dir";
my @files;

opendir (DIR, $main_dir) || die $!;
@files = readdir (DIR);
closedir (DIR) || die $!;

my $i = 0;
foreach my $subdir (@files) {
    if (-d "$main_dir/$subdir") {    # -d tests for directory; needs the full path
        my @tmp = glob "$main_dir/$subdir/*.txt";
        $all_subdirs[$i] = [ @tmp ]; # store an array reference
        $i++;
    }
}
This stores each subdirectory's file list as an array reference. To get an array back, you need to dereference it as follows:
my $array_ref = $all_subdirs[0];
my @an_array = @$array_ref;
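A short sketch of iterating the whole array-of-arrays built above:
for my $i (0 .. $#all_subdirs) {
    print "subdir $i:\n";
    print "    $_\n" for @{ $all_subdirs[$i] };    # the .txt paths collected for it
}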
It is not clear whether you wish to read the names of the files into an array, or the contents of the files.
In the former case, you may benefit from the File::Find module to collect all filenames in a specific directory and its subdirectories; in the latter case you should use both File::Find and File::Slurp to read the contents.
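For the latter case, a minimal sketch, assuming the files of interest end in .txt and the starting directory is in $top_dir (both placeholders):
use File::Find;
use File::Slurp;

my $top_dir = 'main_dir';    # placeholder
my %contents;                # full path => file contents
find(sub {
    return unless -f && /\.txt\z/;
    $contents{$File::Find::name} = read_file($_);
}, $top_dir);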
I received a Perl script which currently reads a list of directories from a text file and stores them in a string vector. I would like to modify it so that it reads the names of all the directories in the current folder, and stores them in the vector. This way, the user doesn't have to modify the input file each time the list of directories in the current folder changes.
I have no knowledge of Perl, apart from the fact that it looks like array indices in Perl start from 0 (as in Python). I have a basic knowledge of bash and Python, but I'd rather not rewrite the script from scratch in Python. It's a long, complex script, and I'm not sure I'd be able to rewrite it in Python. Can you help me? Here is the part of the script which is currently reading the text file:
#!/usr/bin/perl
use Cwd;
.
.
.
open FILES, "<files.txt" or die; # open input file
<FILES> or die; # skip a comment
my $nof = <FILES> or die; # number of directories
<FILES> or die; # skip a comment
my #massflow; # read directories
for (my $i = 0; $i < $nof; $i++){
    chomp($massflow[$i] = <FILES>);
}
.
.
.
close(FILES);
PS I think the script is rather self-explanatory, but just to be sure, this piece opens a text file called "files.txt", skips a line, reads the number of directories, skips another line and reads, one name for each line, the names of all the directories in the current folder, as written in "files.txt".
EDIT: I wrote this script following @Sobrique's suggestion, but it lists files as well, not only dirs:
#!/usr/bin/perl
use Cwd;
my @flow = glob ("*");
my $arrSize = @flow;
print $arrSize;
for (my $i = 0; $i < $arrSize; $i++){
    print $flow[$i], "\n";
}
It's simpler than you think:
my @list_of_files = glob ("/path/to/files/*");
If you want to filter by a criterion, like "is it a directory", you can:
my @list_of_dirs = grep { -d } glob "/path/to/dirs/*";
Open the directory that contains the sub-directories with opendir and read its contents with readdir. Filter out everything that is not a directory using the file test -d; see -X in perlfunc.
my $rootdir = 'top-level-directory';
opendir my $dh, $rootdir or die "Can't open directory $rootdir: $!";
my @dirlist = grep { -d && !m{/\.\.?\z} } map { "$rootdir/$_" } readdir ($dh);
Since readdir returns bare names we need to prepend the path; the extra pattern check drops the . and .. entries, which would otherwise pass the -d test.
You can also get the directories by shelling out to find; note that each returned entry keeps its trailing newline:
my @dir = `find . -type d`;
perl -e ' use strict; use warnings; use Data::Dumper; my @dir = `find . -type d`; print Dumper(\@dir);'
$VAR1 = [
'.
',
'./.fonts
',
'./.mozilla
',
'./bin
',
'./.ssh
',
'./scripts
'
];
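The trailing newlines can be stripped in one go:
chomp @dir;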
I am new to Perl. I have a directory structure, and in each directory there is a log file. I want to grep a pattern from each of those files and do post-processing. Right now I grep the pattern from those files using Unix grep, put the results into a text file, and read that text file for post-processing, but I want to automate the task of reading each file and grepping the pattern from it. In the code below, mdp_cgdis_1102.txt contains the patterns grepped from the directories. I would really appreciate any help.
#!/usr/bin/perl
use strict;
use warnings;

open FILE, 'mdp_cgdis_1102.txt' or die "Cannot open file $!";
my @array = <FILE>;
my @arr;
my @brr;
foreach my $i (@array){
    @arr = split (/\//, $i);
    @brr = split (/\:/, $i);
    print " $arr[0] --- $brr[2]";
}
It is unclear to me which part of the process needs automating. I'll go by "want to automate reading each file and grepping a pattern from it", whereby you presumably already have a list of files. If you actually need to build the file list as well, see the added code below.
One way: pull all patterns from each file and store that in a hash (filename => arrayref-with-patterns)
my %file_pattern;

foreach my $file (@filelist) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    $file_pattern{$file} = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}
The [ ] takes a reference to the list returned by grep, i.e. it constructs an "anonymous array", and that reference is assigned as the value for the $file key.
Now you can process your patterns, per log file
foreach my $filename (sort keys %file_pattern) {
    print "Processing log $filename.\n";
    my @patterns = @{ $file_pattern{$filename} };
    # Process the list of patterns in this log file
}
ADDED
In order to build the list of files @filelist used above from a known list of directories, use the core File::Find module, which recursively scans the supplied directories and applies the supplied subroutines.
use File::Find;
find( { wanted => \&process_logs, preprocess => \&select_logs }, @dir_list );
Your subroutine process_logs() is applied to each file or directory that passed preprocessing by the second sub, with its name available as $File::Find::name; in it you can either populate the hash with patterns-per-log as shown above, or run the complete processing as needed.
Your subroutine select_logs() contains code to filter the log files from all the entries in each directory that File::Find would normally process, so that process_logs() only gets the log files. A minimal sketch of both callbacks follows.
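This sketch assumes log files end in .log and that %file_pattern and $pattern are file-scoped variables from the earlier snippet:
sub select_logs {
    # Receives the entries of the directory being scanned. Keep subdirectories
    # (so find can recurse into them) and anything that looks like a log file.
    return grep { -d or /\.log\z/ } @_;
}

sub process_logs {
    return unless -f;    # skip the directories themselves
    open my $fh, '<', $_
        or do { warn "Can't open $File::Find::name: $!"; return };
    $file_pattern{$File::Find::name} = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}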
Another way would be to use the other invocation
find(\&process_all, @dir_list);
where now the sub process_all() is applied to every entry (files and directories) found, so this sub itself needs to ensure that it only processes the log files. See the linked documentation.
The equivalent of
find ... -name '*.txt' -type f -exec grep ... {} +
is
use File::Find::Rule qw( );
my $base_dir_qfn = ...;
my $re = qr/.../;
my @log_qfns =
    File::Find::Rule
        ->name(qr/\.txt\z/)
        ->file
        ->in($base_dir_qfn);
my $success = 1;
for my $log_qfn (@log_qfns) {
    open(my $fh, '<', $log_qfn)
        or do {
            $success = 0;
            warn("Can't open log file \"$log_qfn\": $!\n");
            next;
        };
    while (<$fh>) {
        print if /$re/;
    }
}
exit(1) if !$success;
Use File::Find to traverse the directory.
In a loop go through all the logfiles:
Open the file
read it line by line
For each line, do a regular expression match (if ($line =~ /pattern/)), or use if (index($line, $searchterm) >= 0) if you are looking for a certain static string.
If you find a match, print the line.
close the file
I hope that gives you enough pointers to get started. You will learn more if you find out how to do each of these steps in Perl by yourself (I pointed out the hard ones). A rough sketch tying the steps together follows.
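A minimal, untested sketch of those steps, assuming the logs end in .log and the search pattern is in $pattern (both placeholders):
use strict;
use warnings;
use File::Find;

my $pattern = qr/ERROR/;    # placeholder pattern
find(sub {
    return unless -f && /\.log\z/;    # assumed log naming
    open my $fh, '<', $_
        or do { warn "Can't open $File::Find::name: $!"; return };
    while (my $line = <$fh>) {
        print "$File::Find::name: $line" if $line =~ $pattern;
    }
    close $fh;
}, '.');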
How do I find, in a given path, all folders with no further subfolders? They may contain files but no further folders.
For example, given the following directory structure:
time/aa/
time/aa/bb
time/aa/bb/something/*
time/aa/bc
time/aa/bc/anything/*
time/aa/bc/everything/*
time/ab/
time/ab/cc
time/ab/cc/here/*
time/ab/cc/there/*
time/ab/cd
time/ab/cd/everywhere/*
time/ac/
The output of find(time) should be as follows:
time/aa/bb/something/*
time/aa/bc/anything/*
time/aa/bc/everything/*
time/ab/cc/here/*
time/ab/cc/there/*
time/ab/cd/everywhere/*
* above represents files.
Any time you want to write a directory walker, always use the standard File::Find module. When dealing with the filesystem, you have to be able to handle odd corner cases, and naïve implementations rarely do.
The environment provided to the callback (named wanted in the documentation) has three variables that are particularly useful for what you want to do.
$File::Find::dir is the current directory name
$_ is the current filename within that directory
$File::Find::name is the complete pathname to the file
When we find a directory that is not . or .., we record the complete path and delete its parent, which we now know cannot be a leaf directory. At the end, any recorded paths that remain must be leaves because find in File::Find performs a depth-first search.
#! /usr/bin/env perl

use strict;
use warnings;
use File::Find;

@ARGV = (".") unless @ARGV;

my %dirs;

sub wanted {
    return unless -d && !/^\.\.?\z/;
    ++$dirs{$File::Find::name};
    delete $dirs{$File::Find::dir};
}

find \&wanted, @ARGV;
print "$_\n" for sort keys %dirs;
You can run it against a subdirectory of the current directory
$ leaf-dirs time
time/aa/bb/something
time/aa/bc/anything
time/aa/bc/everything
time/ab/cc/here
time/ab/cc/there
time/ab/cd/everywhere
or use a full path
$ leaf-dirs /tmp/time
/tmp/time/aa/bb/something
/tmp/time/aa/bc/anything
/tmp/time/aa/bc/everything
/tmp/time/ab/cc/here
/tmp/time/ab/cc/there
/tmp/time/ab/cd/everywhere
or plumb multiple directories in the same invocation.
$ mkdir -p /tmp/foo/bar/baz/quux
$ leaf-dirs /tmp/time /tmp/foo
/tmp/foo/bar/baz/quux
/tmp/time/aa/bb/something
/tmp/time/aa/bc/anything
/tmp/time/aa/bc/everything
/tmp/time/ab/cc/here
/tmp/time/ab/cc/there
/tmp/time/ab/cd/everywhere
Basically, you open the root folder and use the following procedure:
sub child_dirs {
    my ($directory) = @_;
1. Open the directory:
    opendir my $dir, $directory or die $!;
2. From the entries in this directory, select those that are themselves directories:
    my @subdirs = grep {-d $_ and not m</\.\.?$>} map "$directory/$_", readdir $dir;
    #            ^-- directory and not . or ..   ^-- use full name
3. If the list of such selected entries contains elements,
3.1. then recurse into each such directory,
3.2. else this directory is a "leaf" and it will be appended to the output.
    if (@subdirs) {
        return map { child_dirs($_) } @subdirs;
    } else {
        return "$directory/*";
    }
    # OR: @subdirs ? map { child_dirs($_) } @subdirs : "$directory/*";
}
Example usage:
say $_ for child_dirs("time"); # dir `time' has to be in the current directory; say requires use v5.10 or use feature 'say'.
This function will do it. Just call it with your initial path:
sub isChild {
    my $folder = shift;
    my $isChild = 1;
    opendir(my $dh, $folder) || die "can't opendir $folder: $!";
    while (readdir($dh)) {          # sets $_ implicitly (requires Perl 5.12+)
        next if (/^\.{1,2}$/);      # skip . and ..
        if (-d "$folder/$_") {
            $isChild = 0;
            isChild("$folder/$_");
        }
    }
    closedir $dh;
    if ($isChild) { print "$folder\n"; }
}
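An example call, using the directory layout from the question:
isChild("time");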
I tried the readdir way of doing things. Then I stumbled upon this...
use File::Find::Rule;
# find all the subdirectories of a given directory
my @subdirs = File::Find::Rule->directory->in( $directory );
From that output I then eliminated every entry that is a path prefix of another entry, leaving only the leaf directories.
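A hedged sketch of that elimination step, assuming @subdirs from above: a directory is a leaf exactly when no other entry starts with it plus a path separator.
my @leaves = grep {
    my $d = $_;
    !grep { index($_, "$d/") == 0 } @subdirs;
} @subdirs;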
I have a string stored in a Perl variable that should match with the beginning part of a file name stored in a directory.
I use this variable to find the file matching this pattern from the directory, using Perl's grep. Here's what I am doing:
opendir (DIR, "data/testroot/") or die "$!";
@file1 = <$f1/*.hdf>;
foreach (@file1) {
    $patt = substr(basename($_), 0, $ind);
    $file2 = grep {/${patt}*\.hdf/} readdir(DIR);
    # other code follows.......
}
closedir(DIR);
First, I get a list of all files in folder $f1 and store them in the array @file1. Then, for each entry in @file1, I extract the first few characters, store them in $patt, and try to pick up similar files from another folder, data/testroot/, whose names begin with the pattern stored in $patt.
That grep $file2 = grep {/${patt}*\.hdf/} readdir(DIR); is not working.
I think you want to find all *.hdf files in directory A whose filenames match the first $ind characters of any such file in directory B?
You should use either glob or readdir for both directories, but not both. In this case glob seems to be the best bet as it allows you to select all *.hdf files from A without having to check them with the regex.
The program below seems to do what you need. I have substituted sample values for $f1, $f2, and $ind.
use strict;
use warnings;
use File::Basename;
my $f1 = 'data';
my $f2 = 'data/testroot';
my $ind = 6;
foreach (glob "$f1/*.hdf") {
    my $patt = substr(basename($_), 0, $ind);
    my @match = glob "$f2/$patt*.hdf";
    # other code follows.......
}
In /${patt}*\.hdf/ the * applies only to the last character of the interpolated $patt, so it means "$patt with its final character repeated zero or more times, followed by .hdf". Are you sure you don't mean "$patt, followed by arbitrary text, followed by .hdf"?
That would be /${patt}.*\.hdf/.
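A corrected version of the grep from the question might then look like this (hypothetical, reusing the question's variables); \Q...\E protects any regex metacharacters in $patt, the ^ anchors the match at the start of the name, and assigning to an array keeps all matches instead of a count:
my @file2 = grep { /^\Q$patt\E.*\.hdf\z/ } readdir(DIR);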
How can I scan an entire directory's contents, including its subdirectories' contents, and find the newest .pl file within them using Perl?
I want to build a sorted array/list of the full file paths of all .pl files within a directory tree.
So, for example, if my base directory is /home/users/cheeseconqueso/ I want to search for .pl files in that directory and any subdirectory within that path and then sort the .pl files by date.
The end result would be an array, #pl_paths, where $pl_paths[0] would be something like /home/users/cheeseconqueso/maybe_not_newest_directory/surely_newest_file.pl
From that result, I want to execute the file, but I think once I get the sorted array figured out, executing the file in $pl_paths[0], won't be a problem.
There is a similar question on SO that I have been trying to modify to suit my needs, but I am here now for obvious reasons.
The code I'm using to get the newest file NAME only in one directory is:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
print $sorted_files[0]."\n";
You can use File::Find if you want a core module for this, but I would prefer to use File::Find::Rule.
To start off, we can find all of the .pl files under a directory with
use File::Find::Rule;
my @files = File::Find::Rule->file
                            ->name('*.pl')
                            ->in($directory);
Then let's use map to associate filenames with their modification times:
my @files_with_mtimes = map +{ name => $_, mtime => (stat $_)[9] }, @files;
And sort them by mtime:
my @sorted_files = reverse sort { $a->{mtime} <=> $b->{mtime} }
                   @files_with_mtimes;
And from there, the name of the newest one is in $sorted_files[0]{name}.
If you only want to find the top one, there's actually no need to do a complete sort, but the nicest solution I can think of involves some slightly advanced FP, so don't worry about it at all if it looks strange to you:
use List::Util 'reduce';
my ($top_file) = reduce { $a->{mtime} >= $b->{mtime} ? $a : $b }
                 @files_with_mtimes;
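Since the goal is to run the newest script, a possible follow-up, assuming $sorted_files[0]{name} from above ($^X is the perl binary currently running):
system($^X, $sorted_files[0]{name}) == 0
    or warn "Failed to run $sorted_files[0]{name}: $?";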
With File::Find::Rule and a Schwartzian transform, you can get the newest file with the .pl extension in a subtree starting from dir_path.
#!/usr/bin/env perl
use v5.12;
use strict;
use File::Find::Rule;
my @files = File::Find::Rule->file()->name( '*.pl' )->in( 'dir_path' );

# Note that (stat $_)[ 9 ] yields the last-modified timestamp
@files =
    map  { $_->[ 0 ] }
    sort { $b->[ 1 ] <=> $a->[ 1 ] }
    map  { [ $_, ( stat $_ )[ 9 ] ] } @files;

# Here is the newest file in path dir_path
say $files[ 0 ];
The map-sort-map chain (a Schwartzian transform) is a typical idiom: getting the timestamp is slow, so we do it only once per file, keeping each timestamp with its file in an arrayref. We then sort the new list by timestamp (comparing the second element of each arrayref), and finally discard the timestamps, keeping only the filenames.
Use the File::Find core module.