I am using File::Find and file i/o on a text file to parse a series of directories and move the contents into a new folder. It is a simple script (see below):
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Copy;
my $dir = "/opt/CollectMinderDocuments/coastalalglive"; #base directory for Coastal documents
#read file that contains a list of closed IDs
open(MYDATA, "Closed.txt");
mkdir("Closed");
while(my $line = <MYDATA>) {
chomp $line;
my $str = "$dir" . "/Account$line";
print "$str\n";
find(\&move_documents, $str);
}
sub move_documents {
my $smallStr = substr $File::Find::name, 43;
if(-d) {
#system("mkdir ~/Desktop/Closed/$smallStr");
print "I'm here\n";
system("mkdir /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr");
#print "Made a directory: /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr\n";
}
else {
print "Now I'm here\n";
my $smallerStr = substr $File::Find::dir, 43;
my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
system("$temp");
}
}
The text file contains a list of numbers:
1234
2805
5467
The code worked when I executed it last month, but it is now returning a "file or directory not found" error. The actual error is "No such file or directoryerDocuments/coastalalglive/Account2805". I know all of the directories it is searching for exist. I have manually typed in one of the directories, and the script executes fine:
find(\&move_documents, "/opt/CollectMinderDocuments/coastalalglive/Account2805/");
I am not sure why the error is being returned. Thanks in advance for the help.
Your error:
"No such file or directoryerDocuments/coastalalglive/Account2805"
Seems to imply that there is an \r that was not removed by your chomp. That will happen when transferring files between different file systems, where the file contains \r\n as line endings. The real error string would be something like:
/opt/CollectMinderDocuments/coastalalglive/Account2805\r: No such file or directory
Try changing chomp $line to $line =~ s/[\r\n]+$//; instead, and see if that works.
Also:
my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
system("$temp");
Is very wrong. The first non-directory file in that loop will move all the remaining files (including dirs? not sure if mv does that by default). Hence, subsequent iterations of the subroutine will find nothing to move, also causing a "Not found" type error. Though not one caught by perl, since you are using system instead of File::Copy::move. E.g.:
move $_, "/opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/" or die $!;
Related
I am new to perl. I have a directory structure. In each directory, I have a log file. I want to grep pattern from that file and do post processing. Right now I am grepping the pattern from those files using unix grep and putting into text file and reading that text file to do post processing, But I want to automate task of reading each file and grepping pattern from that file. In the code below the mdp_cgdis_1102.txt have grepped pattern from directories. I would really appreciate any help
#!usr/bin/perl
use strict;
use warnings;
open FILE, 'mdp_cgdis_1102.txt' or die "Cannot open file $!";
my #array = <FILE>;
my #arr;
my #brr;
foreach my $i (#array){
#arr = split (/\//, $i);
#brr = split (/\:/, $i);
print " $arr[0] --- $brr[2]";
}
It is unclear to me which part of the process needs automating. I'll go by "want to automate reading each file and grepping pattern from that file," whereby you presumably already have a list of files. If you actually need to build the file list as well see the added code below.
One way: pull all patterns from each file and store that in a hash (filename => arrayref-with-patterns)
my %file_pattern;
foreach my $file (#filelist) {
open my $fh, '<', $file or die "Can't open $file: $!";
$file_pattern{$file} = [ grep { /$pattern/ } <$fh> ];
close $fh;
}
The [ ] takes a reference to the list returned by grep, ie. constructs an "anonymous array", and that (reference) is assigned as a value to the $file key.
Now you can process your patterns, per log file
foreach my $filename (sort keys %file_pattern) {
print "Processing log $filename.\n";
my #patterns = #{$file_pattern{$filename}};
# Process the list of patterns in this log file
}
ADDED
In order to build the list of files #filelist used above, from a known list of directories, use core File::Find
module which recursively scans supplied directories and applies supplied subroutines
use File::Find;
find( { wanted => \&process_logs, preprocess => \&select_logs }, #dir_list);
Your subroutine process_logs() is applied to each file/directory that passed preprocessing by the second sub, with its name available as $File::Find::name, and in it you can either populate the hash with patterns-per-log as shown above, or run complete processing as needed.
Your subroutine select_logs() contains code to filter log files from all files in each directory, that File::Find would normally processes, so that process_file() only gets the log files.
Another way would be to use the other invocation
find(\&process_all, #dir_list);
where now the sub process_all() is applied to all entries (files and directories) found and thus this sub itself needs to ensure that it only processes the log files. See linked documentation.
The equivalent of
find ... -name '*.txt' -type f -exec grep ... {} +
is
use File::Find::Rule qw( );
my $base_dir_qfn = ...;
my $re = qr/.../;
my #log_qfns =
File::Find::Rule
->name(qr/\..txt\z/)
->file
->in($base_dir_qfn);
my $success = 1;
for my $log_qfn (#log_qfns) {
open(my $fh, '<', $log_qfn)
or do {
$success = 0;
warn("Can't open log file \"$log_qfn\": $!\n);
next;
};
while (<$fh>) {
print if /$re/;
}
}
exit(1) if !$success;
Use File::Find to traverse the directory.
In a loop go through all the logfiles:
Open the file
read it line by line
For each line, do a regular expression match (
if ($line =~ /pattern/) ) or use
if (index($line, $searchterm) >= 0) if you are looking for a certain static string.
If you find a match, print the line.
close the file
I hope that gives you enough pointers to get started. You will learn more if you find out how to do each of these steps in Perl by yourself (I pointed out the hard ones).
This perl script is traversing all directories and sub directories, searching for a file named RUN in it. Then it opens the file and runs the 1st line written in the file. The problem is that I am not able to redirect the output of the system command to a file named error.log and STDERR to another file named test_file.errorlog, but no such file is created.
Note that all variable are declared if not found.
find (\&pickup_run,$path_to_search);
### Subroutine for extracting path of directories with RUN FILE PRESENT
sub pickup_run {
if ($File::Find::name =~/RUN/) {
### If RUN file is present , push it into array named run_file_present
push(#run_file_present,$File::Find::name);
}
}
###### Iterate over the array containing paths to directories containing RUN files one by one
foreach my $var (#run_file_present) {
$var =~ s/\//\\/g;
($path_minus_run=$var) =~ s/RUN\b//;
#print "$path_minus_run\n";
my $test_case_name;
($test_case_name=$path_minus_run) =~ s/expression to be replced//g;
chdir "$path_minus_run";
########While iterating over the paths, open each file
open data, "$var";
#####Run the first two lines containing commands
my #lines = <data>;
my $return_code=system (" $lines[0] >error.log 2>test_file.errorlog");
if($return_code) {
print "$test_case_name \t \t FAIL \n";
}
else {
print "$test_case_name \t \t PASS \n";
}
close (data);
}
The problem is almost certainly that $lines[0] has a newline at the end after being read from the file
But there are several improvements you could make
Always use strict and use warnings at the top of every Perl program, and declare all your variables using my as close as possible to their first point of use
Use the three-parameter form of open and always check whether it succeeded, putting the built-in variable $! into your die string to say why it failed. You can also use autodie to save writing the code for this manually for every open, but it requires Perl v5.10.1 or better
You shouldn't put quotes around scalar variables -- just used them as they are. so chdir $path_minus_run and open data, $var are correct
There is also no need to save all the files to be processed and deal with them later. Within the wanted subroutine, File::Find sets you up with $File::Find::dir set to the directory containing the file, and $_ set to the bare file name without a path. It also does a chdir to the directory for you, so the context is ideal for processing the file
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Find;
my $path_to_search;
find( \&pickup_run, $path_to_search );
sub pickup_run {
return unless -f and $_ eq 'RUN';
my $cmd = do {
open my $fh, '<', $_;
<$fh>;
};
chomp $cmd;
( my $test_name = $File::Find::dir ) =~ s/expression to be replaced//g;
my $retcode = system( "$cmd >error.log 2>test_file.errorlog" );
printf "%s\t\t%s\n", $test_name, $retcode ? 'FAIL' : 'PASS';
}
I've solved #1, But I got 2 question left.
Anyone that can help me?
From the directory local dir, list all the files that start with a number (0..9) and which file
extension is .song
Create the subdirectory ”local dir/selected” in which you will copy each one of these files after
numbering each (non blank) line in each one of them.
Print out, in a file called stats.txt, the following informations concerning each one of the files:
a) Number of (non blank) lines.
b) Number of paragraphs. A paragraph here is a block of text composed of non empty lines and
delimited at its beginning and at its end by either the beginning of the file, the end of the file or
by a blank line.
c) The mean size of a paragraph (in number of lines).
d) If, yes or no, all paragraphs in the file have the same length.
Bonus questions:
e) Detect each rhymes present in each file.
f) Give the ratio of rhyming lines towards the total number of lines.
For #1:
#!/usr/bin/perl
use strict;
use warnings;
my $directory = '/local_dir';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
# Use a regular expression to find files ending in .song.txt
next unless ($file =~ m/\.song.txt$/);
print "$file\n";
}
closedir(DIR);
exit 0;
Here's how you might go about doing the first and second step:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $src_dir = '';
my #files = glob $src_dir . qq([0-9]*.song.txt);
my $dest_dir = 'selected';
mkdir $dest_dir;
for my $file (#files) {
open my $fin, "<", $file;
open my $fout, ">", "$dest_dir/$file";
my $c = 1;
while (<$fin>) {
$_ = $c++ . " $_" unless /^$/;
print $fout $_;
}
}
Rather than using opendir, you can use glob to find the files that you are interested in, starting with a number [0-9] and ending with .song.txt. After creating the output directory, the for loop goes through these files and creates a new file in the destination directory with the same name. The while loop goes through each line of the input file. It adds a number to the start of each line if it is not blank, i.e. the start of the line is not followed immediately by the end of the line /^$/. Then it writes the line to the new file.
As for the third step, I think that it would be worth you having a go at that yourself and asking a new question if you get stuck.
I have a perl script to which i supply input(text file) from batch or sometimes from command prompt. When i supply input from batch file sometimes the file may not exisits. I want to catch the No such file exists error and do some other task when this error is thrown. Please find the below sample code.
while(<>) //here it throws an error when file doesn't exists.
{
#parse the file.
}
#if error is thrown i want to handle that error and do some other task.
Filter #ARGV before you use <>:
#ARGV = grep {-e $_} #ARGV;
if(scalar(#ARGV)==0) die('no files');
# now carry on, if we've got here there is something to do with files that exist
while(<>) {
#...
}
<> reads from the files listed in #ARGV, so if we filter that before it gets there, it won't try to read non-existant files. I've added the check for the size of #ARGV because if you supply a list files which are all absent, it will wait on stdin (the flipside of using <>). This assumes that you don't want to do that.
However, if you don't want to read from stdin, <> is probably a bad choice; you might as well step through the list of files in #ARGV. If you do want the option of reading from stdin, then you need to know which mode you're in:
$have_files = scalar(#ARGV);
#ARGV = grep {-e $_} #ARGV;
if($have_files && scalar(grep {defined $_} #ARGV)==0) die('no files');
# now carry on, if we've got here there is something to do;
# have files that exist or expecting stdin
while(<>) {
#...
}
The diamond operator <> means:
Look at the names in #ARGV and treat them as files you want to open.
Just loop through all of them, as if they were one big file.
Actually, Perl uses the ARGV filehandle for this purpose
If no command line arguments are given, use STDIN instead.
So if a file doesn't exist, Perl gives you an error message (Can't open nonexistant_file: ...) and continues with the next file. This is what you usually want. If this is not the case, just do it manually. Stolen from the perlop page:
unshift(#ARGV, '-') unless #ARGV;
FILE: while ($ARGV = shift) {
open(ARGV, $ARGV);
LINE: while (<ARGV>) {
... # code for each line
}
}
The open function returns a false value when a problem is encountered. So always invoke open like
open my $filehandle "<", $filename or die "Can't open $filename: $!";
The $! contains a reason for the failure. Instead of dieing, we can do some other error recovery:
use feature qw(say);
#ARGV or #ARGV = "-"; # the - symbolizes STDIN
FILE: while (my $filename = shift #ARGV) {
my $filehandle;
unless (open $filehandle, "<", $filename) {
say qq(Oh dear, I can't open "$filename". What do you wan't me to do?);
my $tries = 5;
do {
say qq(Type "q" to quit, or "n" for the next file);
my $response = <STDIN>;
exit if $response =~ /^q/i;
next FILE if $response =~ /^n/i;
say "I have no idea what that meant.";
} while --$tries;
say "I give up" and exit!!1;
}
LINE: while (my $line = <$filehandle>) {
# do something with $line
}
}
I am trying to write a program that reads all files recursively from some top point into an array and subsequently read lines of filenames from a separate file, trying to print if those filenames are present in the earlier array.
my program churns through the 43K files in the directory structure and subsequently gets through about 300 of the 400 lines in the file before providing me with a spectactular "* glibc detected perl: corrupted double-linked list: 0x0000000000a30740 **"
Which I have no knowledge about at all.. Could this be an 'out of memory' type bug? I can't imagine it is not since the host has 24G of memory.
Do you have any idea where I'm going wrong? I was trying to save time and effort by reading the entire list of files from the subdirectory into an array one time and subsequently matching against it using the shorter list of files from the filename given as ARGV[0].
Here is my code:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use File::Find;
use 5.010001;
## debug subroutine
my $is_debug = $ENV{DEBUG} // 0;
sub debug { print "DEBUG: $_[0]\n" if $is_debug };
## exit unless properly called with ARGV
die "Please provide a valid filename: $!" unless $ARGV[0] && (-e $ARGV[0]);
my #pic_files;
my $pic_directory="/files/multimedia/pictures";
find( sub {
push #pic_files, $File::Find::name
if -f && ! -d ;
}, $pic_directory);
open LIST, '<', $ARGV[0] or die "Could not open $ARGV[0]: $!";
while(<LIST>) {
chomp;
debug "\$_ is ->$_<-";
if ( #pic_files ~~ /.*$_/i ) {
print "found: $_\n";
} else {
print "missing: $_\n";
}
}
close LIST or die "Could not close $ARGV[0]: $!";
And here is a sample of the file:
DSC02338.JPG
DSC02339.JPG
DSC02340.JPG
DSC02341.JPG
DSC02342.JPG
DSC02343.JPG
DSC02344.JPG
DSC02345.JPG
DSC02346.JPG
DSC02347.JPG
And the obligitory error:
missing: DSC02654.JPG
DEBUG: is ->DSC02655.JPG<-
missing: DSC02655.JPG
DEBUG: is ->DSC02656.JPG<-
missing: DSC02656.JPG
*** glibc detected *** perl: corrupted double-linked list: 0x0000000000a30740 ***
======= Backtrace: =========
/lib/libc.so.6(+0x71bd6)[0x7fb6d15dbbd6]
/lib/libc.so.6(+0x7553f)[0x7fb6d15df53f]
Thanks in advance!
This is a very inefficient algorithm. You are running 21,500 * n regexes, where n is the number of files in LIST. My guess is, this is opening you up to some kind of underlying memory issue or bug.
Here is an alternative approach that would be much more efficient without many changes. First, read the files into a hash rather than an array (I added lc to make everything lowercase, since you want case-insensitive matching):
my %pic_files;
find( sub {
$pic_files{lc $File::Find::name}++
if -f && ! -d ;
}, $pic_directory);
Edit: Second, rather than using a regex to search every single file in the directory, use a regex on the input line to intelligently find potential matches.
my $path_portion = lc $_;
my $found = 0;
do {
if (exists $pic_files{$path_portion} or exists $pic_files{'/' . $path_portion} )
{
$found = 1;
}
} while (!found and $path_portion =~ /\/(.*)$/ and $path_portion = $1);
if ($found) { print "found: $_"; }
else { print "not found: $_\n"; }
This checks the path in the input file, then lops off the first directory in the path each time it does not match and checks again. It should be much faster, and hopefully this strange bug will go away (though it would be nice to figure out what was happening; if it is a bug in Perl, your version becomes very important, since smart match is a new feature that has had a lot of recent changes and bug fixes).
Although I haven't seen an error like this before, I suspect it is being caused by generating a 43,000-element list of files and using it in a smart match. Are you using a 64-bit perl?
You also make things more difficult by storing the full path to each file when all you need to match is the base file name.
This really isn't the sort of thing smart match is good for, and I suggest that you should create a hash of the file names in the input file and mark them off one by one as find comes across them
This program shows the idea. I don't have a perl installation at hand so I can't test it but it looks OK
use strict;
use warnings;
use File::Find;
my $listfile = shift;
die "Please provide a valid filename" unless $listfile;
open my $list, '<', $listfile or die "Unable to open '$listfile': $!";
my %list;
while (<$list>) {
chomp;
$list{$_} = 0;
}
close $list;
my $pic_directory = '/files/multimedia/pictures';
find( sub {
if (-f and exists $list{$_}) {
print "found: $_\n";
$list{$_}++;
}
}, $pic_directory);
for my $file (keys %list) {
print "missing: $_\n" unless $list{$file};
}