Why does Perl's glob return undef for every other call?

I'm not necessarily looking for a better way to do this; rather, an explanation of the output would be greatly appreciated. Recently, a senior programmer asked me why his code worked, but only for one instance. What I came to find out was that it worked on every other occurrence. Here is my example:
#!/usr/bin/perl -w
use strict;
my @list_env_vars = (
'$SERVER',
'$SERVER',
'$SERVER',
'$SERVER',
'$SERVER',
'$SERVER',
);
foreach (@list_env_vars){
print "$_ = ".glob()."\n";
}
which output for perl 5.004:
$SERVER = UNIX_SERVER
$SERVER =
$SERVER = UNIX_SERVER
$SERVER =
$SERVER = UNIX_SERVER
$SERVER =
or output for perl 5.10:
$SITE = $SITE
Use of uninitialized value in concatenation (.) or string at glob_test.pl line 14.
$SITE =
$SITE = $SITE
Use of uninitialized value in concatenation (.) or string at glob_test.pl line 14.
$SITE =
$SITE = $SITE
Use of uninitialized value in concatenation (.) or string at glob_test.pl line 14.
$SITE =
I personally have never used glob() in this fashion so I was ill equipped to answer him. I read through perldoc glob documentation and followed the File::Glob link on that page and still couldn’t find anything that would explain the output. Any help would be much appreciated.

glob in scalar context:
In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted.
In
foreach (@list_env_vars){
print "$_ = ".glob()."\n";
}
The glob() there really is glob($_). On every iteration, $_ contains the string $SERVER. Given that the environment variable does not change, $SERVER is expanded to the same string each time. The first time, this string is returned. The second time, the list is exhausted, so undef is returned. The third time, glob starts over, and so on.
Clarification: It does not matter that the argument to the second call is the same as the one for the first call since there is no way to reset glob's iterator.
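A minimal sketch that reproduces the alternation on a modern perl. It assumes a literal pattern, no_wildcards_here (a hypothetical name), that contains no glob metacharacters, so glob simply hands it back unchanged, much as the 5.10 run handed back $SITE. The same glob op is evaluated in scalar context on every pass, so it yields the value, then undef, then the value again:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One glob op, evaluated four times in scalar context. A pattern without
# wildcards expands to itself, so the iterator's list has a single element.
my @results;
for (1 .. 4) {
    my $r = glob('no_wildcards_here');    # scalar assignment => scalar context
    push @results, defined $r ? $r : 'undef';
}
print "@results\n";
```

This prints no_wildcards_here undef no_wildcards_here undef: the second call finds the one-element list exhausted, and the third call starts a new list.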
You can see this more clearly using the following example (the current directory contains the files '1.a', '1.b', '2.a' and '2.b'):
#!/usr/bin/perl -w
use strict;
my @patterns = (
'*.a',
'*.b',
);
for my $v ( @patterns ) {
print "$v = ", scalar glob($v), "\n";
}
Output:
C:\Temp> d
*.a = 1.a
*.b = 2.a
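For contrast, a sketch of the same loop with glob in list context. It uses the core File::Temp module to create the four files in a scratch directory, a choice made only so the example is self-contained. Assigning glob's result to a list makes it expand its pattern afresh on every call, so each pattern reports its own first match:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recreate the four files from the example in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(1.a 1.b 2.a 2.b)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $name: $!";
    close $fh;
}

my %first;
for my $v ('*.a', '*.b') {
    # List assignment puts glob in list context, so the pattern is expanded
    # afresh on every iteration and no iterator state carries over.
    my ($match) = glob("$dir/$v");
    $first{$v} = $match;
    print "$v = $match\n";
}
```

Here *.b maps to .../1.b rather than continuing with 2.a from the *.a listing.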
I would recommend accessing environment variables via the %ENV hash:
my @list_env_vars = ($ENV{SERVER}) x 6;
or
my @list_env_vars = @ENV{qw(HOME TEMP SERVER)};

Incidentally, the reason why in 5.004 you get a variable expansion, while on 5.10 you just get your literal string back, is because on old perl, glob() was carried out by the system shell, which just as a side-effect performs variable expansion. Since perl 5.6, glob() uses the File::Glob module which does the work itself, without the shell, and doesn't expand environment variables (which glob was never intended to do). %ENV is the proper way to get at the environment.
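If the goal really is to expand shell-style $NAME references stored in strings, as in the original list, a small substitution over %ENV does the job without glob. This is only a sketch: it handles the $NAME form (not ${NAME}), expands unset variables to the empty string, and the helper name expand_env is hypothetical. It sets SERVER itself purely so the example runs anywhere:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace each $NAME with the value of $ENV{NAME}; unset names become ''.
sub expand_env {
    my ($str) = @_;
    (my $out = $str) =~ s/\$(\w+)/defined $ENV{$1} ? $ENV{$1} : ''/ge;
    return $out;
}

$ENV{SERVER} = 'UNIX_SERVER';    # demo value only
my @list_env_vars = ('$SERVER') x 3;
print "$_ = ", expand_env($_), "\n" for @list_env_vars;
```

Every line prints $SERVER = UNIX_SERVER, since the lookup no longer depends on any iterator state.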

Notes on the old behavior, wiki'd for your convenience (and so that I have the full range of markup and no 500-char limit):
The fact that glob and <*globbything*> changed in 5.6 is mentioned in passing in the docs (perl56delta, perlop, -f glob) but the only real source on exactly how it used to work is a pre-5.6 version of perlop. Here's the relevant bit from 5.005:
Example:
while (<*.c>) {
chmod 0644, $_;
}
is equivalent to
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
chop;
chmod 0644, $_;
}
In fact, it's currently implemented that way. (Which means it will not work on filenames with spaces in them unless you have csh(1) on your machine.)
Heh, that's pretty evil stuff. Anyway, if you ever find yourself wanting to consult old perldocs like that, just go to search.cpan.org, pull up the perl distribution, use the pulldown list to select an old version, then click through to the doc that you need. perl itself isn't really subject to getting "tidied" off of CPAN; currently everything from 5.004 on up is available without hitting BackPan.

Related

Line Input operator with glob returning old values

The following code excerpt, when run on perl 5.16.3 and older versions, behaves strangely: subsequent calls to a glob in the line input operator cause the glob to continue returning previous values, rather than running the glob anew.
#!/usr/bin/env perl
use strict;
use warnings;
my @dirs = ("/tmp/foo", "/tmp/bar");
foreach my $dir (@dirs) {
my $count = 0;
my $glob = "*";
print "Processing $glob in $dir\n";
while (<$dir/$glob>) {
print "Processing file $_\n";
$count++;
last if $count > 0;
}
}
If you put two files in /tmp/foo and one or more in /tmp/bar, and run the code, I get the following output:
Processing * in /tmp/foo
Processing file /tmp/foo/foo.1
Processing * in /tmp/bar
Processing file /tmp/foo/foo.2
I thought that when the while loop is terminated by last, the new invocation of the while on the second iteration would re-run the glob and give me the files in /tmp/bar, but instead I get a continuation of what's in /tmp/foo.
It's almost like the angle operator glob is acting like a precompiled pattern. My hypothesis is that the angle operator is creating a filehandle in the symbol table that's still open and being reused behind the scenes, and that it's scoped to the containing foreach, or possibly the whole subroutine.
From I/O Operators in perlop
(my emphasis)
A (file)glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In list context, this isn't important because you automatically get them all anyway. However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out.
Since <> is called in scalar context here and you exit the loop with last after the first iteration, the next time you enter it it keeps reading from the original list.
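The behaviour described in that perlop passage can be reproduced in a scratch directory. This sketch assumes the core File::Temp module and recreates the foo/bar layout from the question under a temporary root; a single textual glob op is entered once per directory and abandoned with last after one file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recreate the question's layout: two files in foo, one in bar.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/$_" or die "mkdir $_: $!" for qw(foo bar);
for my $f (qw(foo/foo.1 foo/foo.2 bar/bar.1)) {
    open my $fh, '>', "$root/$f" or die "create $f: $!";
    close $fh;
}

my @seen;
for my $dir ("$root/foo", "$root/bar") {
    # Scalar-context glob: the pattern is only evaluated when a new list starts.
    while (my $file = <$dir/*>) {
        push @seen, $file;
        last;    # leaves the iterator part-way through the foo listing
    }
}
print "$_\n" for @seen;
```

The second line printed is .../foo/foo.2, not a file from bar, because the first list was never read to the end.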
It is clarified in comments that there is a practical need behind this quest: process only some of the files from a directory and never return all filenames since there can be many.
So assigning from glob to a list and working with it, or better yet using for instead of while as commented by ysth, doesn't help here as it returns a huge list.
I haven't found a way to make glob (what <> with a filename pattern uses) drop and rebuild the list once it's generated it, without getting to its end first.
Apparently, each instance of the operator gets its own list. So using another <> inside the while loop with the hope of resetting it, in any way and even with the same pattern, doesn't affect the list being iterated over in while (<$glob>).
Just to note, breaking out of the loop with a die (with while in an eval) doesn't help either; the next time we come to that while the same list is continued. Wrapping it in a closure
sub iter_glob { my $dir = shift; return sub { scalar <"$dir/*"> } }
for my $d (@dirs) {
my $iter = iter_glob($d);
while (my $f = $iter->()) {
# ...
}
}
met with the same fate; the original list keeps being used.
The solution then is to use readdir instead.
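A minimal sketch of the readdir approach, assuming the goal is to take only the first few entries of each directory without ever building the full listing. The helper name first_entries is hypothetical. Unlike the glob operator, each call opens its own directory handle, so nothing carries over between directories. Note that readdir returns entries in no particular order and, unlike *, includes dotfiles; only . and .. are skipped here:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return at most $max entries of $dir, reading lazily with readdir.
sub first_entries {
    my ($dir, $max) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!\n";
    my @out;
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' || $entry eq '..';
        push @out, "$dir/$entry";
        last if @out >= $max;    # stop early; the handle is simply closed
    }
    closedir $dh;
    return @out;
}

for my $dir (@ARGV) {
    print "Processing file $_\n" for first_entries($dir, 1);
}
```

Run as, for example, perl script.pl /tmp/foo /tmp/bar; each directory now contributes its own file.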

Perl glob returning a false positive

What seemed like a straightforward piece of code most certainly didn't do what I wanted it to do.
Can somebody explain to me what it does do and why?
my $dir = './some/directory';
if ( -d $dir && <$dir/*> ) {
print "Dir exists and has non-hidden files in it\n";
}
else {
print "Dir either does not exist or has no non-hidden files in it\n";
}
In my test case, the directory did exist and it was empty. However, the then (first) section of the if triggered instead of the else section as expected.
I don't need anybody to suggest how to accomplish what I want to accomplish. I just want to understand Perl's interpretation of this code, which definitely does not match mine.
Using glob (aka <filepattern>) in a scalar context makes it an iterator; it will return one file at a time each time it is called, and will not respond to changes in the pattern (e.g. a different $dir) until it has finished iterating over the initial results; I suspect this is causing the trouble you see.
The easy answer is to always use it in list context, like so:
if( -d $dir && ( () = <$dir/*> ) ) {
glob may only really be used safely in scalar context in code you will execute more than once if you are absolutely sure you will exhaust the iterator before you try to start a new iteration. Most of the time it's just easier to avoid glob in scalar context altogether.
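A sketch of the list-context check wrapped in a helper (the name has_visible_entries is hypothetical). Because glob runs in list context, the pattern is re-evaluated on every call, so checking several directories in turn cannot leak results from one call into the next. It assumes directory names without whitespace, since glob splits its pattern on spaces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if $dir is a directory containing at least one entry matched by '*'
# (so dotfiles do not count). "() = glob(...)" forces list context, and a
# list assignment in scalar context yields the number of elements on its right.
sub has_visible_entries {
    my ($dir) = @_;
    return -d $dir && (() = glob("$dir/*")) > 0;
}

for my $dir (@ARGV) {
    print "$dir: ", (has_visible_entries($dir) ? "has files" : "empty or missing"), "\n";
}
```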
I believe that @ysth is on the right track, but repeated calls to glob in scalar context don't generate false positives.
For example
use strict;
use warnings;
use 5.010;
say scalar glob('/usr/*'), "\n";
say scalar glob('/usr/*'), "\n";
output
/usr/bin
/usr/bin
But what is true is that any single call to glob maintains a state, so if I have
use strict;
use warnings;
use 5.010;
for my $dir ( '/sys', '/usr', '/sys', '/usr' ) {
say scalar glob("$dir/*"), "\n";
}
output
/sys/block
/sys/bus
/sys/class
/sys/dev
So clearly that glob statement inside the loop is maintaining a state, and ignoring the changes to $dir.
This is similar to the way pos (and the corresponding \G regex anchor) keeps a state per scalar variable, and how print without an explicit filehandle prints to the last selected handle. In the end it is how much of Perl works, with the "it" variable $_ being the ultimate example.

Perl - Use of uninitialized value within %frequency in concatenation (.) or string

Not entirely sure why, but for some reason I can't print the hash value outside the while loop.
#!/usr/bin/perl -w
opendir(D, "cwd" );
my @files = readdir(D);
closedir(D);
foreach $file (@files)
{
open F, $file or die "$0: Can't open $file : $!\n";
while ($line = <F>) {
chomp($line);
$line=~ s/[-':!?,;".()]//g;
$line=~ s/^[a-z]/\U/g;
@words = split(/\s/, $line);
foreach $word (@words) {
$frequency{$word}++;
$counter++;
}
}
close(F);
print "$file\n";
print "$ARGV[0]\n";
print "$frequency{$ARGV[0]}\n";
print "$counter\n";
}
Any help would be much appreciated!
cheers.
This line
print "$frequency{$ARGV[0]}\n";
Expects you to pass an argument to your script, e.g. perl script.pl argument. If you have no argument, $ARGV[0] is undefined, but it stringifies to the empty string. The empty string is a valid key in the hash, but its value is undefined, hence your warning:
Use of uninitialized value within %frequency in concatenation (.) or string
But you should also see the warning
Use of uninitialized value $ARGV[0] in hash element
And it is a big mistake not to include that warning in this question.
Also, when using readdir, you get all the files in the directory, including directories. You might consider filtering the files somewhat.
Using
use strict;
use warnings;
is something that will benefit you greatly, so add it to your script.
I had originally written this,
There is no %frequency defined at the top level of your program.
When perl sees you reference %frequency inside the inner-most
loop, it will auto-vivify it, in that scratchpad (lexical scope).
This means that when you exit the inner-most loop (foreach $word
(@words)), the auto-vivified %frequency is out of scope and
garbage-collected. Each time you enter that loop, a new, different
variable will be auto-vivified, and then discarded.
When you later refer to %frequency in your print, yet another new,
different %frequency will be created.
… but then realized that you had forgotten to use strict, and Perl was being generous and giving you a global %frequency, which ironically is probably what you meant. So, this answer is wrong in your case … but declaring the scope of %frequency would probably be good form, regardless.
These other, “unrelated” notes are still useful perhaps, or else I'd delete the answer altogether:
As @TLP mentioned, you should probably also skip directories (at least) in your file loop. A quick way to do this would be my @files = grep { -f "cwd/$_" } (readdir D); this will filter the list to contain only files.
I'm further suspicious that you named a directory "cwd"; did you perhaps mean the current working directory? In all the major OSes in use today, that directory is referred to as "."; are you really looking for a directory literally named "cwd"?
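Putting the suggestions from both answers together, here is a sketch of the script under strict and warnings: %frequency is declared once, directories are filtered out with -f, the current directory is spelled ".", and a missing argument is handled instead of producing an undefined hash key. The helper name word_frequencies and the whitespace splitting with split ' ' are choices made for this sketch, not the asker's exact rules:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count word frequencies across the plain files in $dir.
sub word_frequencies {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "$0: Can't open directory $dir: $!\n";
    # readdir also returns '.', '..' and subdirectories; keep plain files only.
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my %frequency;    # declared once, so it outlives the loops below
    for my $file (@files) {
        open my $fh, '<', "$dir/$file" or die "$0: Can't open $file: $!\n";
        while (my $line = <$fh>) {
            $line =~ s/[-':!?,;".()]//g;             # strip punctuation, as in the question
            $frequency{$_}++ for split ' ', $line;    # split on runs of whitespace
        }
        close $fh;
    }
    return \%frequency;
}

if (@ARGV) {
    my $freq = word_frequencies('.');
    print "$ARGV[0]: ", ($freq->{ $ARGV[0] } // 0), "\n";
}
```

Run as perl script.pl word; with no argument the script prints nothing instead of warning about an undefined key.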

perl s/this/that/r ==> "Bareword found where operator expected"

Perl docs recommend this:
$foo = $bar =~ s/this/that/r;
However, I get this error:
Bareword found where operator expected near
"s/this/that/r" (#1)
This is specific to the r modifier, without it the code works.
However, I do not want to modify $bar.
I can, of course, replace
my $foo = $bar =~ s/this/that/r;
with
my $foo = $bar;
$foo =~ s/this/that/;
Is there a better solution?
As ruakh wrote, /r is new in perl 5.14. However, you can do this in previous versions of perl:
(my $foo = $bar) =~ s/this/that/;
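A short sketch showing that the parenthesised form works on a copy: the assignment inside the parentheses is the lvalue that =~ binds to, so $bar keeps its original value. The sample string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bar = 'this and this';

# (my $foo = $bar) copies $bar into $foo and yields $foo as an lvalue,
# so the substitution is applied to $foo only.
(my $foo = $bar) =~ s/this/that/;

print "bar: $bar\n";    # unchanged
print "foo: $foo\n";    # first "this" replaced
```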
There's no better solution, no (though I usually write it on one line, since the s/// is essentially serving as part of the initialization process):
my $foo = $bar; $foo =~ s/this/that/;
By the way, the reason for your error-message is almost certainly that you're running a version of Perl that doesn't support the /r flag. That flag was added quite recently, in Perl 5.14. You might find it easier to develop using the documentation for your own version; for example, http://perldoc.perl.org/5.12.4/perlop.html if you're on Perl 5.12.4.
For completeness: if you are stuck with an older version of perl and really want to use the s/// command without resorting to a temporary variable, here is one way:
perl -E 'say map { s/_iter\d+\s*$//; $_ } $ENV{PWD}'
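Basically, map transforms the string and returns the final result, instead of what s/// does, which is to return the number of substitutions. Note that map aliases $_ to each element of its list, so this also modifies $ENV{PWD} itself rather than a copy; that is harmless in a one-liner but worth keeping in mind in a larger program.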
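To see the effect of map's aliasing, here is a sketch that uses a hypothetical hash in place of %ENV so it runs anywhere. The first map substitutes on the aliased element; the second copies into a lexical first and leaves the original untouched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for $ENV{PWD}.
my %h = (dir => '/work/build_iter42');
my %g = (dir => '/work/build_iter42');

# $_ is an alias for $h{dir}, so the substitution rewrites the hash value too.
my ($aliased) = map { s/_iter\d+\s*$//; $_ } $h{dir};

# Copying into a lexical first: the substitution touches only $s.
my ($copied) = map { (my $s = $_) =~ s/_iter\d+\s*$//; $s } $g{dir};

print "aliased: $aliased (original now $h{dir})\n";
print "copied:  $copied (original still $g{dir})\n";
```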

Can NOT List directory including space using Perl in Windows Platform

In order to list paths in Windows, I wrote the Perl function below (run under Strawberry Perl):
sub listpath
{
my $path = shift;
my @list = glob "$path/*";
#my @list = <$path/*>;
my @pathes = grep { -d and $_ ne "." and $_ ne ".." } @list;
}
But it doesn't correctly handle a directory whose name contains spaces. For example,
when I called:
listpath("e:/test/test1/test11/test111/test1111/test11111 - Copy");
The function returned an array including two elements:
1: e:/test/test1/test11/test111/test1111/test11111
2: -
I am wondering whether glob can handle directories with spaces like the one above. Thanks a lot.
Try bsd_glob instead:
use File::Glob ':glob';
my #list = bsd_glob "$path/*";
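A sketch of the question's listpath rewritten around bsd_glob, importing it explicitly from the core File::Glob module. bsd_glob does not split its argument on whitespace, so a path such as "test11111 - Copy" stays one pattern. Glob metacharacters in the path itself (for example [ or *) would still be interpreted, so this assumes paths free of those:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Return the subdirectories of $path. Unlike glob(), bsd_glob() treats its
# whole argument as a single pattern, so spaces in $path are kept intact.
sub listpath {
    my ($path) = @_;
    # '*' never matches '.' or '..' (or other dotfiles), so no extra filtering.
    return grep { -d } bsd_glob("$path/*");
}
```

Usage mirrors the question: listpath("e:/test/test1/test11/test111/test1111/test11111 - Copy") returns that directory's subdirectories instead of the two fragments.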
Even though this was answered a long time ago, I recently encountered the same problem, and a quick search gave me another solution, from PerlMonks (last reply):
my $path = shift;
$path =~ s/ /\\ /g;
my #list = glob "$path/*";
But prefer bsd_glob; it also supports a couple of other neat features, such as [] for character classes.
The question is about Windows platform, where Bentoy13's solution does not work because the backslash would be mistaken for a path separator.
Here's an option if for whatever reason you don't want to go with bsd_glob: wrap the offending part of the path in double quotes. This can be one directory name (path\\"to my"\\file.txt) or several directory names ("path\\to my"\\file.txt). A slash instead of a backslash usually works, too. The quoted part doesn't have to contain a space, so this always works:
my #list = glob "\"$path\"/*";
Remember, it's a Windows solution; whether it works under Linux depends on context.