Unable to open files returned by readdir in Perl [duplicate] - perl

This question already has answers here:
Why can't I open files returned by Perl's readdir?
(2 answers)
Closed 7 years ago.
I have a problem with a Perl script, as follows.
I must open and analyze all the *.txt files in a directory, but I cannot.
I can read file names that are saved in the #files array and printed, but I cannot open those files for reading.
This is my code:
my $dir= "../Scrivania/programmi" ;
opendir my ($dh), $dir;
my #files = grep { -f and /\.txt/i } readdir $dir;
closedir $dh;
for my $file ( #files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
print "sono nel foreach\n";
print " in : "."$fh\n";
#open(CANALI,$fh);
##righe=<CANALI>;
#close(CANALI);
#print "canali:"."#righe\n";
#foreach $canali (#righe)
#{
# $canali =~ /\d\d:\d\d (-) (.*)/;
# $ora= $1;
#
# if($hhSplit[0] == $ora)
# {
# push(#output, "$canali");
#
# }
#}
}

The main problem you have is that the file names returned by readdir have no path, so you're trying to open, say, x.txt when you should be opening ../Sc/direct/x.txt. The file doesn't exist in the current working directory so your open call fails
You also have a strange mixture of stuff in glob("$dir/(.*).txt/") which looks a little like a regex pattern, which glob doesn't understand. The value of $dir is a directory handle left open from the opendir on the first line. What you should be using is glob '../Sc/direct/*.txt', but then there's no need for the readdir
There are two ways to find the contents of a file. You can use opendir and readdir to read everything in the directory, or you can use glob
The first method returns only the bare name of each entry, which means you must concatenate each name with the path to the containing directory, preferably using catfile from File::Spec::Functions. It also includes the pseudo-directories . and .. so you must filter those out before you can use the list of names
glob has neither of these disadvantages. All the strings it returns are real directory entries, and they will include a path if you provided one in the pattern you passed as a parameter
You seem to have become rather muddled over the two, so I have written this program which differentiates between the two approaches. I hope it makes things clearer
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Spec::Functions qw/ catfile /;
my $dir = '../Sc/direct';
### Using glob
for my $file ( glob catfile($dir, '*.txt') ) {
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}
### Using opendir / readdir
opendir my ($dh), $dir;
my #files = grep { -f and /\.txt$/i } readdir $dir;
closedir $dh;
for my $file ( #files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}

Using $dir in the glob is incorrect. $dir is a GLOB type not a string value. Rather you should be looping over the #files array and looking for names that match what you want. Maybe something like so:
foreach my $fp (#files) {
if ($fp =~ /(.*).txt/) {
print "$fp is a .txt\n";
open (my $in, "<", $fp)
while (<$in>) ...
}
}

Related

Recovering a specific line in multiple .txt in a directory using Perl

I have the results of a program which gives me the results from some search giving me 2000+ file txt archives. I just need a specific line in each file, this is what I have been trying with Perl:
opendir(DIR, $dirname) or die "Could not open $dirname\n";
while ($filename = readdir(DIR)) {
print "$filename\n";
open ($filename, '<', $filename)or die("Could not open file.");
my $line;
while( <$filename> ) {
if( $. == $27 ) {
print "$line\n";
last;
}
}
}
closedir(DIR);
But there is a problem with the $filename in line 5 and I don't know an alternative to it so I don't have to manually name each file.
Several issues with that code:
Using an old-school bareword identifier for the directory handle instead of a autovivified variable like you are for the file handle.
Using the same variable for the filename and file handle is pretty strange.
You don't check to see if the file is a directory or something else other than a plain file before trying to open it.
$27?
You never assign anything to that $line variable before printing it.
Unless $directory is your program's current working directory, you're running into an issue mentioned in the readdir documentation
If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.
(Substitute open for filetest)
Always use strict; and use warnings;.
Personally, if you just want to print the 27th line of a large number of files, I'd turn to awk and find (Using its -exec test to avoid potential errors about the command line maximum length being hit):
find directory/ -maxdepth 1 -type -f -exec awk 'FNR == 27 { print FILENAME; print }' \{\} \+
If you're on a Windows system without standard unix tools like those installed, or it's part of a bigger program, a fixed up perl way:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use File::Spec;
my $directory = shift;
opendir(my $dh, $directory);
while (my $filename = readdir $dh) {
my $fullname = File::Spec->catfile($directory, $filename); # Construct a full path to the file
next unless -f $fullname; # Only look at regular files
open my $fh, "<", $fullname;
while (my $line = <$fh>) {
if ($. == 27) {
say $fullname;
print $line;
last;
}
}
close $fh;
}
closedir $dh;
You might also consider using glob to get the filenames instead of opendir/readdir/closedir.
And if you have Path::Tiny available, a simpler version is:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use Path::Tiny;
my $directory = shift;
my $dir = path $directory;
for my $file ($dir->children) {
next unless -f $file;
my #lines = $file->lines({count => 27});
if (#lines == 27) {
say $file;
print $lines[-1];
}
}

search for text pattern in a directory using perl script

could anyone please share with me a snippet where i can do a grep search in perl file. Example,
i need this grep: grep 1115852311 /opt/files/treated/postpaid/*
to be done in perl script and print all the matches
tried the below, but did not work :
my $start_dir= "\opt\files\treated\postpaid\";
my $file_name = "*";
my #filematches;
opendir(DIR, "$start_dir");
#xml_files = grep(1115852311,readdir(DIR));
print #xml_files;
A good start would be to read the documentation for grep(). If you do, you'll see that Perl's grep() works rather differently to the Unix grep command. The Unix command just looks for text in a list of files. The Perl version works on any list of data and returns any elements in that list for which an arbitrary Boolean expression is true.
A Perl version of the Unix command would look something like this:
while (<$some_open_filehandle>) {
print if /$some_string/;
}
That's not quite what you want, but we can use it as a start. First, let's write something that takes a filename and string and checks whether the string appears in the file:
sub is_string_in_file {
my ($filename, $string) = #_;
open my $fh, '<', $filename or die "Cannot open file '$filename': $!\n";
return grep { /$string/ } <$fh>;
}
We can now use that in a loop which uses readdir() to get a list of files.
my #files;
my $dir = '/opt/files/treated/postpaid/';
opendir my $dh, $dir or die $!;
while (my $file = readdir($dh)) {
if (is_string_in_file("$dir$file", 1115852311) {
push #files, "$dir$file";
}
}
After running that code, the list of files that contain your string will be in #files.
You might want to look at glob() instead of opendir() and readdir().
used the below snippet to achieve what i wanted
#!/usr/bin/perl
use strict;
use warnings;
sub is_string_in_file {
my ($filename, $string) = #_;
open my $fh, '<', $filename
or die "Cannot open file \n";
while(my $line = <$fh>){
if($line =~ /$string/){
print $string;
print $filename."\n";
}
}
#return grep { $_ eq $string } <$fh>;
}
my #files;
my $dir = '/opt/files/treated/postpaid/';
opendir my $dh, $dir or die $!;
while (my $file = readdir($dh)) {
is_string_in_file("$dir$file", 1115852311);
}

In Perl, how can filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my #files;
my $files;
opendir(DIR,$path) or die "$!";
#files = grep { /\.log$/} readdir(DIR);
my #out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (#files) {
open (FILE, "$files");
my #line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
Few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( #files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' to a directory path (I think?) which isn't going to work.
$outpath isn't, it's a file. You need to do something different to output to different files. opendir isn't really valid to an output.
because you're using opendir that's actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name, doing a chdir are possible solutions. But that's one of the reasons I like glob because it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding #Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one off scripts more quickly, and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\#ARGV);
sub run {
my $argv = shift;
unless (#$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.

Why does readdir() list the filenames in wrong order?

I'm using the following code to read filenames from a directory and push them onto an array:
#!/usr/bin/perl
use strict;
use warnings;
my $directory="/var/www/out-original";
my $filterstring=".csv";
my #files;
# Open the folder
opendir(DIR, $directory) or die "couldn't open $directory: $!\n";
foreach my $filename (readdir(DIR)) {
if ($filename =~ m/$filterstring/) {
# print $filename;
# print "\n";
push (#files, $filename);
}
}
closedir DIR;
foreach my $file (#files) {
print $file . "\n";
}
The output I get from running this code is:
Report_10_2014.csv
Report_04_2014.csv
Report_07_2014.csv
Report_05_2014.csv
Report_02_2014.csv
Report_06_2014.csv
Report_03_2014.csv
Report_01_2014.csv
Report_08_2014.csv
Report.csv
Report_09_2014.csv
Why is this code pushing the file names into the array in this order, and not from 01 to 10?
Unix directories are not stored in sorted order. Unix commands like ls and sh sort directory listings for you, but Perl's opendir function does not; it returns items in the same order the kernel does, which is based on the order they're stored in. If you want the results to be sorted, you'll need to do that yourself:
for my $filename (sort readdir(DIR)) {
(Btw: bareword file handles, like DIR, are global variables; it's considered good practice to use lexical file handles instead, like:
opendir my $dir, $directory or die "Couldn't open $directory: $!\n";
for my $filename (sort readdir($dir)) {
as a safety measure.)

How can I list all files in a directory using Perl?

I usually use something like
my $dir="/path/to/dir";
opendir(DIR, $dir) or die "can't open $dir: $!";
my #files = readdir DIR;
closedir DIR;
or sometimes I use glob, but anyway, I always need to add a line or two to filter out . and .. which is quite annoying.
How do you usually go about this common task?
my #files = grep {!/^\./} readdir DIR;
This will exclude all the dotfiles as well, but that's usually What You Want.
I often use File::Slurp. Benefits include: (1) Dies automatically if the directory does not exist. (2) Excludes . and .. by default. It's behavior is like readdir in that it does not return the full paths.
use File::Slurp qw(read_dir);
my $dir = '/path/to/dir';
my #contents = read_dir($dir);
Another useful module is File::Util, which provides many options when reading a directory. For example:
use File::Util;
my $dir = '/path/to/dir';
my $fu = File::Util->new;
my #contents = $fu->list_dir( $dir, '--with-paths', '--no-fsdots' );
I will normally use the glob method:
for my $file (glob "$dir/*") {
#do stuff with $file
}
This works fine unless the directory has lots of files in it. In those cases you have to switch back to readdir in a while loop (putting readdir in list context is just as bad as the glob):
open my $dh, $dir
or die "could not open $dir: $!";
while (my $file = readdir $dh) {
next if $file =~ /^[.]/;
#do stuff with $file
}
Often though, if I am reading a bunch of files in a directory, I want to read them in a recursive manner. In those cases I use File::Find:
use File::Find;
find sub {
return if /^[.]/;
#do stuff with $_ or $File::Find::name
}, $dir;
If some of the dotfiles are important,
my #files = grep !/^\.\.?$/, readdir DIR;
will only exclude . and ..
When I just want the files (as opposed to directories), I use grep with a -f test:
my #files = grep { -f } readdir $dir;
Thanks Chris and Ether for your recommendations. I used the following to read a listing of all files (excluded directories), from a directory handle referencing a directory other than my current directory, into an array. The array was always missing one file when not using the absolute path in the grep statement
use File::Slurp;
print "\nWhich folder do you want to replace text? " ;
chomp (my $input = <>);
if ($input eq "") {
print "\nNo folder entered exiting program!!!\n";
exit 0;
}
opendir(my $dh, $input) or die "\nUnable to access directory $input!!!\n";
my #dir = grep { -f "$input\\$_" } readdir $dh;