Why does "print readdir(DIR_HANDLE);" print many files? - perl

I use readdir(DIR) to read a directory. When I use
$file = readdir(DIR);
print $file;
print "\n";
sleep(2);
it prints one file name at a time,
but when I use
print readdir(DIR);
print "\n";
sleep(2);
it prints many file names at once.
What's wrong with it?
Thanks

readdir does not read a file. It reads the next entry from a directory handle.
You can check out the perldoc for it here: readdir
It printed only one file name in your first example because assigning to the scalar $file puts readdir in scalar context, where it reads from the directory handle once and returns a single entry.
More commonly, when you want to read an entire directory, you assign the result to an array. In list context readdir returns all remaining entries, which is why your second example printed every directory entry.

readdir returns the next file when evaluated in scalar context (or undef after the last one has been read).
my $file = readdir($fh);
The scalar assign operator evaluates its RHS operand in scalar context.
readdir returns the remaining files when evaluated in list context.
my @files = readdir($fh);
print evaluates its argument list in list context.
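The two contexts can be compared side by side. The following is only a minimal sketch: the scratch directory and the file names a.txt, b.txt and c.txt are made up for illustration, and the order in which readdir returns entries depends on the file system.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory holding three empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.txt)) {
    open my $out, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $out;
}

opendir my $dh, $dir or die "Can't opendir $dir: $!";

# Scalar context: one entry per call (it may well be '.' or '..').
my $first = readdir $dh;

# List context: every entry that has not been returned yet.
my @rest = readdir $dh;

closedir $dh;

print "first: $first\n";
print "rest:  @rest\n";
```

Because print supplies list context, writing print readdir $dh; behaves like the @rest line, not like the $first line.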

Related

Perl - Could not open and read files

I've created a script that validates XML files in a given input folder. It should grep the XML files from the input directory, sort them, and check a condition. But it dies with the message: not Open at line , <STDIN> line 1.
It also creates an empty log file.
Since I got a numeric error while sorting, I commented that line out.
So, given an input location, the script should check the XML files and write any errors to the specified log file.
Can anyone help with this?
Script
#!/usr/bin/perl
# use strict;
use warnings;
use Cwd;
use File::Basename;
use File::Path;
use File::Copy;
use File::Find;
print "Enter the path: ";
my $filepath = <STDIN>;
chomp $filepath;
die "\n\tpleas give input folder \n" if(!defined $filepath or !-d $filepath);
my $Toolpath = dirname($0);
my $base = basename($filepath);
my $base_path = dirname($filepath);
my ($xmlF, @xmlF);
my @errors=();
my @warnings=();
my @checkings=();
my $ecount=0;
my $wcount=0;
my $ccount=0;
my ($x, $y);
my $z="0";
opendir(DIR,"$filepath");
my @xmlFiles = grep{/\.xml$/} readdir(DIR);
closedir(DIR);
my $logfile = "$base_path\\$base"."_Err.log";
# @xmlF=sort{$a <=> $b}@xmlFiles;
@xmlF=sort{$a cmp $b}@xmlFiles;
open(OUT, ">$logfile") || die ("\nLog file couldnt write $logfile :$!");
my $line;
my $flcnt = scalar (@xmlF);
for ($x=0; $x < $flcnt; $x++)
{
open IN, "$xmlF[$x]" or die "not Open";
print OUT "\n".$xmlF[$x]."\n==================\n";
print "\nProcessing File $xmlF[$x] .....\n";
local $/;
while ($line=<IN>)
{
while ($line=~m#(<res(?: [^>]+)? type="weblink"[^>]*>)((?:(?!</res>).)*)</res>#igs)
{
my $tmp1 = $1; my $tmp2 = $&; my $pre1 = $`;
if($tmp1 =~ m{ subgroup="Weblink"}i){
my $pre = $pre1.$`;
if($tmp2 !~ m{<tooltip><\!\[CDATA\[Weblink\]\]><\/tooltip>}ms){
my $pre = $pre1.$`;
push(@errors,lineno($pre),"\t<tooltip><\!\[CDATA\[Weblink\]\]></tooltip> is missing\n");
}
}
}
foreach my $warnings(@warnings)
{
$wcount = $wcount+1;
}
foreach my $checkings(@checkings)
{
$ccount = $ccount+1;
}
foreach my $errors(@errors)
{
$ecount = $ecount+1;
}
my $count_err = $ecount/2;
print OUT "".$count_err." Error(s) Found:-\n------------------------\n ";
print OUT "@errors\n";
$ecount = 0;
my $count_war = $wcount/2;
print OUT "$count_war Warning(s) Found:-\n-------------------------\n ";
print OUT "@warnings\n";
$wcount = 0;
my $count_check = $ccount/2;
print OUT "$count_check Checking(s) Found:-\n-------------------------\n ";
print OUT "@checkings\n";
$wcount = 0;
undef @errors;
undef @warnings;
undef @checkings;
close IN;
}
}
readdir returns bare file names, without the path.
So when you go on to open those files, you need to prepend the directory that readdir read them from (here $filepath) to each name it returned. Or build the full path names right away:
use warnings;
use strict;
use feature 'say';
use File::Spec;
print "Enter the path: ";
my $filepath = <STDIN>;
chomp $filepath;
die "\nPlease give input folder\n" if !defined $filepath or !-d $filepath;
opendir(my $fh_dir, $filepath) or die "Can't opendir $filepath: $!";
my @xml_files =
map { File::Spec->catfile($filepath, $_) }
grep { /\.xml$/ }
readdir $fh_dir;
closedir $fh_dir;
say for @xml_files;
where I used File::Spec to portably piece together the file name.
map can also be made to do grep's job, so that only one pass is made over the file list:
my @xml_files =
map { /\.xml$/ ? File::Spec->catfile($filepath, $_) : () }
readdir $fh_dir;
The empty list () gets flattened in the returned list, effectively disappearing altogether.
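The single-pass map can be tried without touching the file system. This is only a sketch: the directory name data and the entries in @entries are hypothetical stand-ins for $filepath and for what readdir would return.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directory name and readdir-style bare names.
my $filepath = 'data';
my @entries  = ('.', '..', 'a.xml', 'notes.txt', 'b.xml');

# Keep only *.xml names and turn each into a path; all other entries
# contribute the empty list and so vanish from the result.
my @xml_files =
    map { /\.xml$/ ? File::Spec->catfile($filepath, $_) : () }
    @entries;

print "$_\n" for @xml_files;
```

The resulting paths can be passed straight to open, which is what the original script was missing.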
Here are some comments on the code. Note that this is normally done at Code Review but I feel that it is needed here.
First: a long list of variables is declared upfront. It is in fact important to declare in as small a scope as possible. It turns out that most of those variables can indeed be declared where they are used, as seen in comments below.
The location of the executable is best found using
use FindBin qw($RealBin);
where $RealBin also resolves links (as opposed to $Bin, also available)
Assigning () to an array at declaration doesn't do anything; it is exactly the same as a plain my @errors;. The declarations can also go together: my (@errors, @warnings, @checks);. If the array already holds something, then = () clears it, which is a good way to empty an array
Assigning a "0" makes the variable a string. While Perl normally converts between strings and numbers as needed, if a number is needed then use a number, my $z = 0;
Lexical filehandles (open my $fh, ...) are better than globs (open FH, ...)
I don't understand the comment about "numeric error" in sorting. The cmp operator sorts lexicographically, for numeric sort use <=>
When an array is used in scalar context (when assigned to a scalar, for example) the number of elements is returned. So there is no need for scalar; just do my $flcnt = @xmlF;
For iteration over array indices use $#ary, the index of the last element of @ary, as in
foreach my $i (0..$#xmlF) { ... }
But if there aren't any uses of the index (I don't see any) then loop over elements
foreach my $file (@xmlF) { ... }
When you check the file open print the error $!, open ... or die "... : $!";. This is done elsewhere in the code, and it should be done always.
The local $/; unsets the input record separator, which makes the following read take in the whole file. If that is intended then $line is not a good name. Also note that a variable can be declared inside the condition, while (my $line = <$fh>) { }
I can't comment on the regex as I don't know what it's supposed to accomplish, but it is complex; any chance to simplify all that?
The series of foreach loops only works out the number of elements of those arrays; there is no need for loops then, just my $ecount = @errors; (etc). This also allows you to keep the declaration of those counter variables in minimal scope.
The undef @errors; (etc) aren't needed since those arrays collect per-file results, so you can declare them inside the loop, anew at each iteration (and at the smallest scope). When you wish to empty an array it is better to do @ary = (); than to undef it; that way it's not allocated all over again on the next use
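A few of the array points above can be seen in isolation. This is only a sketch; the messages in @errors are made up and stand in for whatever the script collects for one file.

```perl
use strict;
use warnings;

# Hypothetical messages standing in for what the script collects per file.
my @errors = ("tooltip missing", "tooltip missing", "bad subgroup");

my $ecount     = @errors;    # array in scalar context: number of elements
my $last_index = $#errors;   # index of the last element

# Loop over indices only when the index itself is needed.
for my $i (0 .. $#errors) {
    print "$i: $errors[$i]\n";
}

@errors = ();                # empties the array
my $after = @errors;
print "$ecount error(s) before, $after after\n";
```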

So what exactly does <FILE> do?

So, I've used <FILE> a large number of times. A simple example would be:
open (FILE, '<', "someFile.txt");
while (my $line = <FILE>){
print $line;
}
So, I had thought that using <FILE> would take a part of a file at a time (a line, specifically) and use it, and when it was called again, it would go to the next line. And indeed, whenever I assigned <FILE> to a scalar, that's exactly what it did. But when I gave the computer a line like this one:
print <FILE>;
it printed the entire file, newlines and all. So my question is, what does the computer think when it's passed <FILE>, exactly?
The diamond operator <>, used to read from a file, is actually the built-in readline function.
From perldoc -f readline
Reads from the filehandle whose typeglob is contained in EXPR (or from *ARGV if EXPR is not provided). In scalar context, each call reads and returns the next line until end-of-file is reached, whereupon the subsequent call returns undef. In list context, reads until end-of-file is reached and returns a list of lines.
If you would like to check which context a particular expression is evaluated in,
sub context { return wantarray ? "LIST" : "SCALAR" }
print my $line = context(), "\n";
print my @array = context(), "\n";
print context(), "\n";
output
SCALAR
LIST
LIST
It depends if it's used in a scalar context or a list context.
In scalar context: my $line = <FILE> it reads one line at a time.
In list context: my @lines = <FILE> it reads the whole file.
When you say print <FILE>; it's list context.
The behaviour is different depending on what context it is being evaluated in:
my $scalar = <FILE>; # Read one line from FILE into $scalar
my @array = <FILE>; # Read all lines from FILE into @array
As print takes a list argument, <FILE> is evaluated in list context and behaves in the latter way.
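The difference is easy to see with an in-memory file. This is a minimal sketch; the three-line string in $text is made up and takes the place of someFile.txt.

```perl
use strict;
use warnings;

# Hypothetical file contents, opened as an in-memory filehandle.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my $line  = <$fh>;   # scalar context: the next line only
my @lines = <$fh>;   # list context: all remaining lines
close $fh;

print "scalar: $line";
print "list:   @lines";
```

Here $line holds only the first line, and @lines holds the two lines that were still unread when the list-context read happened.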

reading and printing a text file in Perl

I have a simple question:
Why does the first program not print the first line of the file, while the second one does?
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (<FH>) {
print (<FH>);
}
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (my $file = <FH>) {
print ("$file");
}
Context.
Your first program tests for end-of-file on FH by reading the first line, then reads FH in list context as an argument to print. That translates to the whole file, as a list with one line per item. It then tests for EOF again, most likely detects it, and stops.
Your second program iterates by line, each one read in scalar context to variable $file, and prints them individually. It detects EOF by a special case in the while syntax. (see the code samples in the documentation)
So the specific reason why your program doesn't print the first line in one case is that it's lost in the argument to while. Do note that the two programs' structure is pretty different: the first only runs a single while iteration, while the second iterates once per line.
PS: nowadays, the recommended way to manage files tends towards lexical filehandles (open my $file, 'name'; print <$file>;).
Because you are consuming the first line with the <> operator in the while condition and then using <> again in the print, the first line is already gone and never printed. <> is the readline operator. You need to print the $_ variable, or assign the line to a named variable as you do in the second code. You could rewrite the body of the first loop as:
print;
And it would work, because print uses $_ if you don't give it anything.
When used in scalar context, <FH> returns the next single line from the file.
When used in list context, <FH> returns a list of all remaining lines in the file.
while (my $file = <FH>) is a scalar context, since you're assigning to a scalar. while (<FH>) is short for while(defined($_ = <FH>)), so it is also a scalar context. print (<FH>); makes it a list context, since you're using it as argument to a function that can take multiple arguments.
while (<FH>) {
print (<FH>);
}
The while part reads the first line into $_ (which is never used again). Then the print part reads the rest of the lines all at once, then prints them all out again. Then the while condition is checked again, but since there are now no lines left, <FH> returns undef and the loop quits after just one iteration.
while (my $file = <FH>) {
print ("$file");
}
does what you probably expect: it reads and then prints one line during each iteration of the loop.
By the way, print $file; does the same as print ("$file");
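Both loop shapes can be run against the same input to see which lines each one would print. This is only a sketch: the text in $text is a made-up stand-in for file.txt, and the lines are collected in arrays instead of being printed so that they can be counted.

```perl
use strict;
use warnings;

# Hypothetical contents standing in for file.txt.
my $text = "line 1\nline 2\nline 3\n";

# First program's pattern: the while condition consumes line 1,
# then <$fh> in list context drains everything that is left.
open my $fh, '<', \$text or die "Can't open: $!";
my @printed_by_first;
while (<$fh>) {
    push @printed_by_first, <$fh>;
}
close $fh;

# Second program's pattern: one line per iteration.
open $fh, '<', \$text or die "Can't open: $!";
my @printed_by_second;
while (my $line = <$fh>) {
    push @printed_by_second, $line;
}
close $fh;

print "first:  ", scalar(@printed_by_first),  " line(s)\n";
print "second: ", scalar(@printed_by_second), " line(s)\n";
```

The first loop collects every line except the first, and the second loop collects all of them.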
while (<FH>) {
print (<FH>);
}
use this instead:
while (<FH>) {
print $_;
}

Can not remove . and .. using grep?

I use readdir to get the files of a directory, but I want to remove . and .. using grep. The output shows that it still contains . and .., and I can't figure out what's wrong with it.
here is my code
#!/usr/bin/perl
opendir(Dir,$ARGV[0]);
@Dirs = readdir(Dir);
@Dirs = grep { $_ != /./ } @Dirs;
# @Dirs = grep { $_ =~ /^./ } @Dirs;
print join("\n",@Dirs);
Thanks
I strongly suggest you take note of the following
Always use strict and use warnings, even for the tiniest bit of code. They will repay you the extra typing time many times over
Always use lexical directory handles and file handles. Global handles like this have been the wrong choice for over twelve years now
Always check the success of file and directory open calls, and use a die string that includes the $! variable to say why the open failed
Use lower-case letters and underscores for local variable names. Upper case is reserved by convention for global items like package names and built-in variables
Use print "$_\n" for @array instead of print join "\n", @array because a) using join produces a second copy of the text in the array and wastes space, and b) using join omits the newline from the last line of the array
Take a look at this alternative to your program, which applies the advice above. I have excluded all directory entries beginning with a dot, as it successfully removes . and .. as well as Linux "hidden" entries that start with a dot. You may require something different.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $dh, $ARGV[0] or die $!;
my @dirs = grep { not /^\./ } readdir $dh;
print "$_\n" for @dirs;
Try escaping the . and using the negated match operator !~ instead of the numeric !=:
@Dirs = grep { $_ !~ /^\.\.?$/ } @Dirs;
The dot is a special metacharacter which matches any character when not escaped.
. in a regexp means "any character", try escaping it like this: \.
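The two filters suggested in the answers differ in what they keep. This is only a sketch, with a made-up @entries list in place of real readdir output. For comparison, the original $_ != /./ compares each name as a number with the result of a match, so for ordinary non-numeric names it is true for every entry and filters nothing.

```perl
use strict;
use warnings;

# Hypothetical readdir output.
my @entries = ('.', '..', '.bashrc', 'a.txt', 'dir');

# Drop only the two special entries.
my @no_dots = grep { $_ ne '.' and $_ ne '..' } @entries;

# Drop everything that starts with a dot (hidden entries too).
my @visible = grep { !/^\./ } @entries;

print "no_dots: @no_dots\n";
print "visible: @visible\n";
```

Which one is appropriate depends on whether hidden entries such as .bashrc should appear in the listing.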

Why doesn't Perl file glob() work outside of a loop in scalar context?

According to the Perl documentation on file globbing, the <*> operator or glob() function, when used in a scalar context, should iterate through the list of files matching the specified pattern, returning the next file name each time it is called or undef when there are no more files.
But, the iterating process only seems to work from within a loop. If it isn't in a loop, then it seems to start over immediately before all values have been read.
From the Perl docs:
In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted.
http://perldoc.perl.org/functions/glob.html
However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out.
http://perldoc.perl.org/perlop.html#I/O-Operators
Example code:
use warnings;
use strict;
my $filename;
# in scalar context, <*> should return the next file name
# each time it is called or undef when the list has run out
$filename = <*>;
print "$filename\n";
$filename = <*>; # doesn't work as documented, starts over and
print "$filename\n"; # always returns the same file name
$filename = <*>;
print "$filename\n";
print "\n";
print "$filename\n" while $filename = <*>; # works in a loop, returns next file
# each time it is called
In a directory with 3 files...file1.txt, file2.txt, and file3.txt, the above code will output:
file1.txt
file1.txt
file1.txt
file1.txt
file2.txt
file3.txt
Note: The actual perl script should be outside the test directory, or you will see the file name of the script in the output as well.
Am I doing something wrong here, or is this how it is supposed to work?
Here's a way to capture the magic of the <> glob operator's state into an object that you can manipulate in a normal sort of way: anonymous subs (and/or closures)!
sub all_files {
return sub { scalar <*> };
}
my $iter = all_files();
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";
or perhaps:
sub dir_iterator {
my $dir = shift;
return sub { scalar glob("$dir/*") };
}
my $iter = dir_iterator("/etc");
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";
Then again my inclination is to file this under "curiosity". Ignore this particular oddity of glob() / <> and use opendir/readdir, IO::All/readdir, or File::Glob instead :)
The following code also seems to create 2 separate instances of the iterator...
for ( 1..3 )
{
$filename = <*>;
print "$filename\n" if defined $filename;
$filename = <*>;
print "$filename\n" if defined $filename;
}
I guess I see the logic there, but it is kind of counterintuitive and contradictory to the documentation. The docs don't mention anything about having to be in a loop for the iteration to work.
Also from perlop:
A (file)glob evaluates its (embedded) argument only when it is starting a new list.
Calling glob creates a list, which is either returned whole (in list context) or retrieved one element at a time (in scalar context). But each call to glob creates a separate list.
(Scratching away at my rusty memory of Perl...) I think that multiple lexical instances of <*> are treated as independent invocations of glob, whereas in the while loop you are invoking the same "instance" (whatever that means).
Imagine, for instance, if you did this:
while (<*>) { ... }
...
while (<*>) { ... }
You certainly wouldn't expect those two invocations to interfere with each other.
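The per-occurrence behaviour described in these answers can be reproduced in a scratch directory. This is only a sketch: the directory and the names file1.txt, file2.txt and file3.txt are created just for the demonstration, and it assumes the temporary directory path contains no spaces or glob metacharacters.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory with three files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(file1.txt file2.txt file3.txt)) {
    open my $out, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $out;
}

# One glob in the source, reached three times: a single iterator advances.
my @same_op;
for (1 .. 3) {
    my $name = glob("$dir/*");
    push @same_op, basename($name);
}

# Two globs in the source: each one starts its own list from the beginning.
my $x = basename(scalar glob("$dir/*"));
my $y = basename(scalar glob("$dir/*"));

print "same op:      @same_op\n";
print "separate ops: $x $y\n";
```

The loop walks through all three names, while the two separate glob calls both return the first name, which matches the output shown in the question.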