How can I list files under a directory with a specific name pattern using Perl?

I have a directory /var/spool and inside that, directories named
a b c d e f g h i j k l m n o p q r s t u v x y z
And inside each "letter directory", a directory called "user" and inside this, many directories called auser1 auser2 auser3 auser4 auser5 ...
Every user directory contains mail messages and the file names have the following format: 2. 3. 4. 5. etc.
How can I list the email files for every user in every directory in the following way:
/var/spool/a/user/auser1/11.
/var/spool/a/user/auser1/9.
/var/spool/a/user/auser1/8.
/var/spool/a/user/auser1/10.
/var/spool/a/user/auser1/2.
/var/spool/a/user/auser1/4.
/var/spool/a/user/auser1/12.
/var/spool/b/user/buser1/12.
/var/spool/b/user/buser1/134.
/var/spool/b/user/buser1/144.
etc.
I need to list those files and then open every single one to modify the header and body. That part I already have, but I need the first part.
I am trying this:
dir = "/var/spool";
opendir ( DIR, $dir ) || die "No pude abrir el directorio $dirname\n";
while( ($filename = readdir(DIR))){
#directorios1 = `ls -l "$dir/$filename"`;
print("#directorios1\n");
}
closedir(DIR);
But it does not work the way I need it to.

You can use File::Find.

As others have noted, use File::Find:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
find(\&find_emails => '/var/spool');
sub find_emails {
    return unless /\A[0-9]+[.]\z/;
    return unless -f $File::Find::name;
    process_an_email($File::Find::name);
    return;
}

sub process_an_email {
    my ($file) = @_;
    print "Processing '$file'\n";
}

Use File::Find to traverse a directory tree.

For a fixed level of directories, sometimes it's easier to use glob than File::Find:
while (my $file = </var/spool/[a-z]/user/*/*>) {
    print "Processing $file\n";
}

People keep recommending File::Find, but the other piece that makes it easy is my File::Find::Closures, which provides the convenience functions for you:
use File::Find;
use File::Find::Closures qw( find_by_regex );
my( $wanted, $reporter ) = find_by_regex( qr/^\d+\.\z/ );
find( $wanted, @directories_to_search );
my @files = $reporter->();
You don't even need to use File::Find::Closures. I wrote the module so that you could lift out the subroutine you wanted and paste it into your own code, perhaps tweaking it to get what you needed.
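For illustration, a lifted-and-simplified pair might look roughly like this (a sketch only, not the module's actual source; find_by_regex_sketch is a made-up name):
use File::Find;
# Rough sketch of the closure pair: $wanted collects matching names,
# $reporter hands the collected list back.
sub find_by_regex_sketch {
    my ($regex) = @_;
    my @found;
    my $wanted   = sub { push @found, $File::Find::name if /$regex/ };
    my $reporter = sub { return @found };
    return ( $wanted, $reporter );
}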

Try this:
sub browse($);
sub browse($)
{
    my $path = $_[0];
    # append a / if missing
    if($path !~ /\/$/)
    {
        $path .= '/';
    }
    # loop through the files contained in the directory
    for my $eachFile (glob($path.'*'))
    {
        # if the file is a directory
        if(-d $eachFile)
        {
            # browse directory recursively
            browse($eachFile);
        }
        else
        {
            # your file processing here
        }
    }
} # browse
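To adapt this to the original question, the processing comment could call a handler of your own, for example (hypothetical; process_message() stands in for the OP's existing header/body code):
# inside the else branch above: process_message($eachFile);
browse('/var/spool');
sub process_message
{
    my ($file) = @_;
    print "Processing $file\n"; # header/body rewriting goes here
}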

Related

Need to loop through directory and all of its subdirectories to find files of a certain size in Perl

I am attempting to loop through a directory and all of its sub-directories to see if the files within those directories are a certain size. But I am not sure whether the @files array still contains the file sizes so that I can compare them (i.e. size <= value_size). Can someone offer any guidance?
use strict;
use warnings;
use File::Find;
use DateTime;
my @files;
my $dt = DateTime->now;
my $date = $dt->ymd;
my $start_dir = "/apps/trinidad/archive/in/$date";
my $empty_file = 417;
find( \&wanted, $start_dir);
for my $file( @files )
{
    if(`ls -ltr | awk '{print $5}'` <= $empty_file)
    {
        print "The file $file appears to be empty please check within the folder if this is empty"
    }
    else
        return;
}
exit;
sub wanted {
    push @files, $File::Find::name unless -d;
    return;
}
I think you could use this code instead of shelling out to awk.
(I don't understand why my $empty_file = 417; represents an empty file size.)
if (-s $file <= $empty_file)
Also notice that you are missing an open and close brace for your else branch.
(I'm also unsure about the 'return': the first file found that is not 'empty' branches to it, but it doesn't do anything useful there, because return is only for returning from a function.) The exit is unnecessary, and so is the return in the wanted function.
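Putting those fixes together, the check might look like this (a sketch, assuming $empty_file is a byte-size threshold as in the question):
# -s returns the file size in bytes
for my $file (@files) {
    if ( -s $file <= $empty_file ) {
        print "The file $file appears to be empty; please check it\n";
    }
}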
Update: A File::Find::Rule solution could be used. Here is a small program that captures all files less than 14 bytes in my current directory and all of its subdirectories.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Find::Rule;
my $dir = '.';
my @files = find( file => size => "<14", in => $dir);
say -s $_, " $_" for @files;

Perl - concatenate files with similar names pattern and write concatenated file names to a list

I have a directory with multiple sub-directories in it and each subdir has a fixed set of files - one for each category like -
1)Main_dir
1.1) Subdir1 with files
- Test.1.age.txt
- Test.1.name.txt
- Test.1.place.csv
..........
1.2) Subdir2 with files
- Test.2.age.txt
- Test.2.name.txt
- Test.2.place.csv
.........
There are around 20 folders with 10 files in each. I need to first concatenate the files under each category, e.g. Test.1.age.txt and Test.2.age.txt into a Combined.age.txt file, and once I have done all the concatenation I want to print out these file names in a new Final_list.txt file, like:
./Main_dir/Combined.age.txt
./Main_dir/Combined.name.txt
I am able to read all the files from all subdirs into an array, but I am not sure how to do a pattern search for similar file names. I will be able to figure out the printing part of the code. Can anyone please share how to do this pattern search for concatenation? My code so far:
use warnings;
use strict;
use File::Spec;
use Data::Dumper;
use File::Basename;
foreach my $file (@files) {
    print "$file\n";
}
my $testdir = './Main_dir';
my @Comp_list = glob("$testdir/test_dir*/*.txt");
I am trying to do the pattern search on the array contents in @Comp_list, which I surely need to learn:
foreach my $f1 (@Comp_list) {
    if($f1 !~ /^(\./\.txt$/) {
        print $f1; # check if reading the file right
        # push it to a file using concatfile(
    }
}
Thanks a lot!
This should work for you. I've only tested it superficially as it would take me a while to create some test data, so as you have some at hand I'm hoping you'll report back with any problems
The program segregates all the files found by the equivalent of your glob call, and puts them in buckets according to their type. I've assumed that the names are exactly as you've shown, so the type is the penultimate field when the file name is split on dots; i.e. the type of Test.1.age.txt is age.
Having collected all of the file lists, I've used a technique that was originally designed to read through all of the files specified on the command line. If @ARGV is set to a list of files then an <ARGV> operation will read through all the files as if they were one, and so they can easily be copied to a new output file.
If you need the files concatenated in a specific order then I will have to amend my solution. At present they will be processed in the order that glob returns them -- probably in lexical order of their file names, but you shouldn't rely on that
use strict;
use warnings 'all';
use v5.14.0; # For autoflush method
use File::Spec::Functions 'catfile';
use constant ROOT_DIR => './Main_dir';
my %files;
my $pattern = catfile(ROOT_DIR, 'test_dir*', '*.txt');
for my $file ( glob $pattern ) {
    my @fields = split /\./, $file;
    my $type = lc $fields[-2];
    push @{ $files{$type} }, $file;
}
STDOUT->autoflush; # Get prompt reports of progress
for my $type ( keys %files ) {
    my $outfile = catfile(ROOT_DIR, "Combined.$type.txt");
    open my $out_fh, '>', $outfile or die qq{Unable to open "$outfile" for output: $!};
    my $files = $files{$type};
    printf qq{Writing aggregate file "%s" from %d input file%s ... },
            $outfile,
            scalar @$files,
            @$files == 1 ? '' : 's';
    local @ARGV = @$files;
    print $out_fh $_ while <ARGV>;
    print "complete\n";
}
I think it's easier if you categorize the files first; then you can work with them.
use warnings;
use strict;
use File::Spec;
use Data::Dumper;
use File::Basename;
my %hash = ();
my $testdir = './main_dir';
my @comp_list = glob("$testdir/**/*.txt");
foreach my $file (@comp_list){
    $file =~ /(\w+\.\d\..+\.txt)/;
    next if not defined $1;
    my @tmp = split(/\./, $1);
    if (not defined $hash{$tmp[-2]}) {
        $hash{$tmp[-2]} = [$file];
    } else {
        push @{ $hash{$tmp[-2]} }, $file;
    }
}
print Dumper(\%hash);
Files:
main_dir
├── sub1
│   ├── File.1.age.txt
│   └── File.1.name.txt
└── sub2
    ├── File.2.age.txt
    └── File.2.name.txt
Result:
$VAR1 = {
          'age' => [
                     './main_dir/sub1/File.1.age.txt',
                     './main_dir/sub2/File.2.age.txt'
                   ],
          'name' => [
                      './main_dir/sub1/File.1.name.txt',
                      './main_dir/sub2/File.2.name.txt'
                    ]
        };
You can then create a loop to concatenate and combine the files; a sketch follows.
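For instance, a minimal sketch building on %hash above (the Combined.* names and Final_list.txt follow the question; treat this as a starting point, not a tested solution):
# Concatenate each category into Combined.<type>.txt and record the names
open my $list_fh, '>', './main_dir/Final_list.txt' or die "Can't open Final_list.txt: $!";
for my $type (keys %hash) {
    my $outfile = "./main_dir/Combined.$type.txt";
    open my $out, '>', $outfile or die "Can't open $outfile: $!";
    for my $name (@{ $hash{$type} }) {
        open my $in, '<', $name or die "Can't open $name: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out;
    print {$list_fh} "$outfile\n";
}
close $list_fh;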

Recursive grep in perl

I am new to Perl. I have a directory structure, and in each directory there is a log file. I want to grep a pattern from each file and do post-processing. Right now I am grepping the pattern from those files using Unix grep, putting the results into a text file, and reading that text file for the post-processing, but I want to automate the task of reading each file and grepping the pattern from it. In the code below, mdp_cgdis_1102.txt holds the patterns grepped from the directories. I would really appreciate any help.
#!/usr/bin/perl
use strict;
use warnings;
open FILE, 'mdp_cgdis_1102.txt' or die "Cannot open file $!";
my @array = <FILE>;
my @arr;
my @brr;
foreach my $i (@array){
    @arr = split (/\//, $i);
    @brr = split (/\:/, $i);
    print " $arr[0] --- $brr[2]";
}
It is unclear to me which part of the process needs automating. I'll go by "want to automate reading each file and grepping pattern from that file," whereby you presumably already have a list of files. If you actually need to build the file list as well see the added code below.
One way: pull all patterns from each file and store that in a hash (filename => arrayref-with-patterns)
my %file_pattern;
foreach my $file (@filelist) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    $file_pattern{$file} = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}
The [ ] takes a reference to the list returned by grep, i.e. it constructs an "anonymous array", and that reference is assigned as the value for the $file key.
Now you can process your patterns, per log file
foreach my $filename (sort keys %file_pattern) {
    print "Processing log $filename.\n";
    my @patterns = @{ $file_pattern{$filename} };
    # Process the list of patterns in this log file
}
ADDED
In order to build the list of files @filelist used above from a known list of directories, use the core File::Find module, which recursively scans the supplied directories and applies the supplied subroutines.
use File::Find;
find( { wanted => \&process_logs, preprocess => \&select_logs }, @dir_list);
Your subroutine process_logs() is applied to each file/directory that passed preprocessing by the second sub, with its name available as $File::Find::name, and in it you can either populate the hash with patterns-per-log as shown above, or run complete processing as needed.
Your subroutine select_logs() contains code to filter the log files from all the files in each directory that File::Find would normally process, so that process_logs() only gets the log files.
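A minimal sketch of such a pair, assuming log files end in .log (an assumption; adjust the filter to your naming scheme):
# preprocess: keep subdirectories (so find can recurse) plus *.log files
sub select_logs {
    return grep { -d $_ or /\.log\z/ } @_;
}
# wanted: called for each entry that survives preprocessing
sub process_logs {
    return unless -f $_;
    # open $File::Find::name and collect lines matching $pattern here
}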
Another way would be to use the other invocation
find(\&process_all, @dir_list);
where now the sub process_all() is applied to all entries (files and directories) found and thus this sub itself needs to ensure that it only processes the log files. See linked documentation.
The equivalent of
find ... -name '*.txt' -type f -exec grep ... {} +
is
use File::Find::Rule qw( );
my $base_dir_qfn = ...;
my $re = qr/.../;
my @log_qfns =
    File::Find::Rule
    ->name(qr/\.txt\z/)
    ->file
    ->in($base_dir_qfn);
my $success = 1;
for my $log_qfn (@log_qfns) {
    open(my $fh, '<', $log_qfn)
        or do {
            $success = 0;
            warn("Can't open log file \"$log_qfn\": $!\n");
            next;
        };
    while (<$fh>) {
        print if /$re/;
    }
}
exit(1) if !$success;
Use File::Find to traverse the directory.
In a loop, go through all the logfiles:
Open the file
Read it line by line
For each line, do a regular expression match, e.g. if ($line =~ /pattern/), or use if (index($line, $searchterm) >= 0) if you are looking for a certain static string.
If you find a match, print the line.
Close the file
I hope that gives you enough pointers to get started. You will learn more if you find out how to do each of these steps in Perl by yourself (I pointed out the hard ones).
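For reference, those steps might take roughly this shape ($logfile and $pattern are placeholders, not names from the question):
# open, scan line by line, print matches, close
open my $fh, '<', $logfile or die "Can't open $logfile: $!";
while ( my $line = <$fh> ) {
    print $line if $line =~ /$pattern/;
}
close $fh;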

script in perl to copy directory structure from the source to the destination

#!/usr/bin/perl -w
use File::Copy;
use strict;
my $i= "0";
my $j= "1";
my $source_directory = $ARGV[$i];
my $target_directory = $ARGV[$j];
#print $source_directory,"\n";
#print $target_directory,"\n";
my#list=process_files ($source_directory);
print "remaninign files\n";
print #list;
# Accepts one argument: the full path to a directory.
# Returns: A list of files that reside in that path.
sub process_files {
    my $path = shift;
    opendir (DIR, $path)
        or die "Unable to open $path: $!";

    # We are just chaining the grep and map from
    # the previous example.
    # You'll see this often, so pay attention ;)
    # This is the same as:
    # LIST = map(EXP, grep(EXP, readdir()))
    my @files =
        # Third: Prepend the full path
        map { $path . '/' . $_ }
        # Second: take out '.' and '..'
        grep { !/^\.{1,2}$/ }
        # First: get all files
        readdir (DIR);
    closedir (DIR);

    for (@files) {
        if (-d $_) {
            # Add all of the new files from this directory
            # (and its subdirectories, and so on... if any)
            push @files, process_files ($_);
        } else { #print @files,"\n";
            # for(@files)
            while(@files)
            {
                my $input = pop @files;
                print $input,"\n";
                copy($input, $target_directory);
            }
        }
        # NOTE: we're returning the list of files
        return @files;
    }
}
This basically copies files from the source to the destination, but I need some guidance on how to copy the directory structure as well. The main thing to note here is that no CPAN modules are allowed except copy, move, and path.
Instead of rolling your own directory processing adventure, why not simply use File::Find to go through the directory structure for you.
#! /usr/bin/env perl
use v5.10;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Copy;
use Cwd;
# The first two arguments are source and dest
# 'shift' pops those arguments off the front of
# the @ARGV list, and returns what was removed
# I use "cwd" to get the current working directory
# and prepend that to $dest_dir. That way, $dest_dir
# is in correct relationship to my input parameter.
my $source_dir = shift;
my $dest_dir = cwd . "/" . shift;
# I change into my $source_dir, so the $source_dir
# directory isn't in the file name when I find them.
chdir $source_dir
    or die qq(Cannot change into "$source_dir");
find ( sub {
    return unless -f;    # We want files only
    make_path "$dest_dir/$File::Find::dir"
        unless -d "$dest_dir/$File::Find::dir";
    copy "$_", "$dest_dir/$File::Find::dir"
        or die qq(Can't copy "$File::Find::name" to "$dest_dir/$File::Find::dir");
}, ".");
Now, you don't need a process_files subroutine. You let File::Find::find handle recursing the directory for you.
By the way, you could rewrite the find like this which is how you usually see it in the documentation:
find ( \&wanted, ".");
sub wanted {
    return unless -f;    # We want files only
    make_path "$dest_dir/$File::Find::dir"
        unless -d "$dest_dir/$File::Find::dir";
    copy "$_", "$dest_dir/$File::Find::dir"
        or die qq(Can't copy "$File::Find::name" to "$dest_dir/$File::Find::dir");
}
I prefer to embed my wanted subroutine into my find command instead because I think it just looks better. It first of all guarantees that the wanted subroutine is kept with the find command. You don't have to look at two different places to see what's going on.
Also, the find command has a tendency to swallow up your entire program. Imagine a case where I get a list of files and then do some complex processing on them. The entire program can end up in the wanted subroutine. To avoid this, you simply create an array of the files you want to operate on, and then operate on them inside your program:
...
my @file_list;
find ( \&wanted, "$source_dir" );
for my $file ( @file_list ) {
    ...
}
sub wanted {
    return unless -f;
    push @file_list, $File::Find::name;
}
I find this a programming abomination. First of all, what is going on with find? It's modifying my @file_list, but how? Nowhere in the find command is @file_list mentioned. What is it doing?
Then at the end of my program is this sub wanted function that is using a variable, @file_list, in a global manner. That's bad programming practice.
Embedding my subroutine directly into my find command solves many of these issues:
my @file_list;
find ( sub {
    return unless -f;
    push @file_list, $File::Find::name;
}, $source_dir );
for my $file ( @file_list ) {
    ...
}
This just looks better. I can see that @file_list is being manipulated directly by my find command. Plus, that pesky wanted subroutine has disappeared from the end of my program. It's the exact same code. It just looks better.
Let's get to what that find command is doing and how it works with the wanted subroutine:
The find command finds each and every file, directory, link, or whatnot located in the directory list you pass to it. With each item it finds in that directory, it passes it to your wanted subroutine for processing. A return leaves the wanted subroutine and allows find to fetch the next item.
Each time the wanted subroutine is called, find sets three variables:
$File::Find::name: The name of the item found with the full path attached to it.
$File::Find::dir: The name of the directory where the item was found.
$_: The name of the item without the directory name.
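For example, a small illustration of how the three line up (the /tmp start directory is arbitrary):
use File::Find;
# For /tmp/a/b.txt found while scanning /tmp:
#   $File::Find::name is "/tmp/a/b.txt"
#   $File::Find::dir  is "/tmp/a"
#   $_                is "b.txt"
find( sub {
    print "name=$File::Find::name dir=$File::Find::dir item=$_\n";
}, "/tmp" );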
In Perl, that $_ variable is very special. It's sort of a default variable for many commands. That is, if you execute a command and don't give it a variable to use, that command will use $_. For example:
print
prints out $_
return if -f;
Is the same as saying this:
if ( -f $_ ) {
return;
}
This for loop:
for ( @file_list ) {
...
}
Is the same as this:
for $_ ( @file_list ) {
...
}
Normally, I avoid the default variable. It's global in scope and it's not always obvious what is being acted upon. However, there are a few circumstances where I'll use it because it really clarifies the program's meaning:
return unless -f;
in my wanted function is very obvious. I exit the wanted subroutine unless I was handed a file. Here's another:
return unless /\.txt$/;
This will exit my wanted function unless the item ends with '.txt'.
I hope this clarifies what my program is doing. Plus, I eliminated a few bugs while I was at it. I miscopied $File::Find::dir to $File::Find::name which is why you got the error.

how to exclude files having extension ".doc or .jar" from search using FILE::FIND in perl

I am facing two problems.
1) I am searching a directory for certain keywords, and the directory contains files with many extensions like .java, .sql, .ksh, .pl, .jar, etc. My problem is that I don't want to search the files having .jar or .doc or .docx. Currently my code looks like below:
use strict;
use warnings;
use File::Find;
my $dir = 'C:\DURAB';
my $out = "output.txt";
open my $out, ">", "output.txt";
find(\&printFile,$dir);
sub printFile {
    my $element = $_;
    if(-f $element && $element =~ /\.*$/){
        open my $in, "<", $element or die $!;
        while(<$in>) {
            if (/\Q$searchString\E/) {
                my $last_update_time = (stat($element))[9];
                my $timestamp = localtime($last_update_time);
                print $out "$File::Find::name". " $timestamp". " $searchString\n";
                last;
            }
        }
    }
}
Right now this searches all the files; I want to restrict the search so that there is no need to search .jar, .exe, .doc and similar extensions.
2)
While searching the directory I have many subdirectories. I don't want to search in directories having the name obsolete or retired in them (for example obsolete_dr, retired_as). How can I achieve this using File::Find?
I think I am being a little bit confusing; actually, below is the output I need.
From the above code, $File::Find::name displays the file path. If this file path contains obsolete or retired then that output should not be shown.
For example, I got paths like
c:\DURAB\ASD\OBSOLETE_EC23\sirgu.sql and c:\DURAB\AS\drive.ksh
From the above, the first path should not be displayed since it contains the word obsolete.
I think now I am clear. Thanks in advance
return if /\.(?:docx?|jar)\z/;
and
if (/obsolete|retired/) {
    $File::Find::prune = 1;
    return;
}
To reject the unwanted files you should check the name of the file using a regular expression and return from the print_file subroutine if necessary.
To ignore directories containing obsolete or retired you should check whether the node is a directory whose name contains either of these words. If so, then setting $File::Find::prune to a true value will prevent File::Find from recursing into it.
Note that I have used use autodie which will implicitly raise an exception if an open call doesn't work, and File::stat allows by-name access to the fields of the result of a call to stat.
I'm not sure what you meant by your regex /\.*$/ which will match any string as it requires zero or more trailing dots, but this code also excludes file names that end with a dot.
use strict;
use warnings;
use File::Find;
use File::stat;
use autodie;
my $dir = 'C:\DURAB';
my $search_string = 'ABCDEF';
my $out = 'output.txt';
open my $outfh, '>', $out;
find(\&print_file, $dir);
sub print_file {

    if ( -d and /obsolete|retired/i ) {
        $File::Find::prune = 1;
        return;
    }

    return unless -f and not /\.(exe|jar|docx?|)$/;

    my $element = $_;

    open my $in, '<', $element;

    while (<$in>) {
        if ( /\Q$search_string\E/ ) {
            my $last_update_time = stat($element)->mtime;
            my $timestamp = localtime $last_update_time;
            printf $outfh "%s %s %s\n",
                    $File::Find::name,
                    $timestamp,
                    $search_string;
            last;
        }
    }
}
Change this line
if(-f $element && $element =~ /\.*$/){
to this
if( -f $element && $element !~ /\.(jar|docx?)$/i && $File::Find::dir !~ /(obsolete|retired)/i ){
a good guide: http://www.perlmonks.org/?node_id=217166