Filehandle stored in hash variable reading as GLOB [duplicate] - perl

Code
$ cat test1
hello
i am
lazer
nananana
$ cat 1.pl
use strict;
use warnings;
my @fh;
open $fh[0], '<', 'test1', or die $!;
my @res1 = <$fh[0]>; # Way1: why does this not work as expected?
print @res1."\n";
my $fh2 = $fh[0];
my @res2 = <$fh2>; # Way2: this works!
print @res2."\n";
Run
$ perl 1.pl
1
5
$
I am not sure why Way1 does not work as expected while Way2 does. Aren't those two methods the same? What is happening here?

Because of the dual nature of the <> operator (i.e. is it glob or readline?), the rules are that to behave as readline, you can only have a bareword or a simple scalar inside the brackets. So you'll have to either assign the array element to a simple scalar (as in your example), or use the readline function directly.

From perlop:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means <$x> is always a readline() from an indirect handle, but <$hash{key}> is always a glob().
You can spell the <> operator as readline instead to avoid problems with this magic.

Anything more complex than a bareword (interpreted as a file handle) or a simple scalar $var is interpreted as an argument to the glob() function. Only barewords and simple scalars are treated as file handles to be iterated by the <...> operator.
Basically the rules are:
<bareword> ~~ readline bareword
<$scalar> ~~ readline $scalar
<$array[0]> ~~ glob "$array[0]"
<anything else> ~~ glob ...

It's because <$fh[0]> is parsed as glob($fh[0]).
Use readline instead:
my @res1 = readline($fh[0]);
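The two workarounds named in the answers above can be compared side by side. This is a minimal sketch, assuming an in-memory file in place of the question's test1 (the $data variable is a stand-in introduced for illustration):

```perl
use strict;
use warnings;

# In-memory stand-in for the question's test1 file (an assumption of this sketch).
my $data = "hello\ni am\nlazer\nnananana\n";

my @fh;
open $fh[0], '<', \$data or die "open failed: $!";

# <$fh[0]> would be compiled as glob("$fh[0]"), so call readline explicitly.
my @res1 = readline($fh[0]);

# Rewind, then read again through a simple scalar copy of the handle.
seek $fh[0], 0, 0 or die "seek failed: $!";
my $fh2  = $fh[0];
my @res2 = <$fh2>;

print scalar(@res1), "\n";   # number of lines read via readline()
print scalar(@res2), "\n";   # number of lines read via <$fh2>
```

Both reads should return the same lines, since only the syntax inside the angle brackets differs.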

Related

In Perl, how does readline assign to $_ in a loop condition but not elsewhere?

How is readline implemented in Perl?
The question is why readline sets $_ when it is used in a loop condition such as:
while(<>) {
#here $_ is set
print;
}
By contrast, if we just do
<>;
print; #$_ is not set here
nothing is printed.
How is this implemented? How does the function know it is being used in a loop condition? Or is it simply built-in behaviour that was designed that way?
In this case, there's nothing special about the implementation of readline. It never sets $_. Instead, there's a special case in the Perl compiler that examines the condition of a while loop and rewrites certain conditions internally.
For example, while (<>) {} gets rewritten into
while (defined($_ = <ARGV>)) {
();
}
You can see this with perl -MO=Deparse -e 'while (<>) {}'.
This is documented under I/O Operators in perlop:
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a while statement (even if disguised as a for(;;) loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously.
It's also mentioned in Loop Control & For Loops in perlsyn.
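The rewrite described above can be observed directly. A minimal sketch, assuming an in-memory filehandle ($text and the variable names are illustrative, not from the question):

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";
open my $fh, '<', \$text or die "open failed: $!";

local $_ = 'untouched';

# Outside a while condition the line is read and discarded; $_ is left alone.
<$fh>;
my $after_bare_read = $_;

# As the sole condition of while, the read is compiled as defined($_ = <$fh>).
my @seen;
while (<$fh>) {
    chomp;
    push @seen, $_;
}

print "after bare read: $after_bare_read\n";
print "seen in loop: @seen\n";
```

The bare read consumes the first line without touching $_, so the loop only sees the remaining lines.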
while is a special case here, assigning to $_. In the second case you just read a line of input and throw it away immediately. For further details, read the docs: https://metacpan.org/pod/perlop#I-O-Operators
Perl functions can behave differently in different contexts.
Your first example is a scalar context. Your second example is a void context.
You can determine the calling context of a function by using the built-in wantarray.
perldoc perlvar declares all the places where $_ is modified or used:
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf, quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only), rmdir, say, sin, split (for its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f , -d) except for -t, which defaults to STDIN. See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put the next value or input record when a <FH>, readline, readdir or each operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.

Why can't I use a typeglob in the diamond operator in Perl?

Usually a bareword filehandle, or a variable that holds a filehandle, can be placed inside the <> operator to read from the file, but NOT a filehandle extracted from a typeglob, as the last line below shows. Why doesn't it work, given that the last case also references a filehandle?
open FILE, 'file.txt';
my $myfile = *FILE{IO};
print <$myfile>;
print <*FILE{IO}>; # this line doesn't work.
<> is, among other things, a shortcut for readline(), and it accepts only simple scalars or a bareword, i.e. <FILE>. For more complex expressions you have to be more explicit,
print readline *FILE{IO};
otherwise it will be interpreted as glob()
perl -MO=Deparse -e 'print <*FILE{IO}>;'
use File::Glob ();
print glob('*FILE{IO}');
In perlop, it says:
If what's within the angle brackets is neither a filehandle nor a
simple scalar variable containing a filehandle name, typeglob, or
typeglob reference, it is interpreted as a filename pattern to be
globbed ...
Since we want to be able to say things like:
foreach (<*.c>) {
# Do something for each file that matches *.c
}
it is not possible for perl to interpret the '*' as meaning a typeglob.
As noted in the other answer, you can work around this using readline, or you can assign the typeglob to a scalar first (as your example shows).
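Both workarounds can be sketched as follows, assuming an in-memory file instead of file.txt ($text is a stand-in introduced for this example):

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";
open FILE, '<', \$text or die "open failed: $!";

# Passing the IO slot straight to readline avoids the <...> glob parse.
my $first = readline(*FILE{IO});

# Copying it to a simple scalar makes <$io> a readline as well.
my $io     = *FILE{IO};
my $second = <$io>;

close FILE;

print $first, $second;
```

Both reads go through the same underlying handle, so the second one continues where the first stopped.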

Difference between "printf" and "print sprintf"

The following two simple perl programs have different behaviors:
#file1
printf @ARGV;
#file2
$tmp = sprintf @ARGV;
print $tmp;
$> perl file1 "hi %04d %.2f" 5 7.12345
#output: hi 0005 7.12
$> perl file2 "hi %04d %.2f" 5 7.12345
#output: 3
Why the difference? I had thought the two programs were equivalent. Is there a way to make file2 (using "sprintf") behave like file1?
The builtin sprintf function has a prototype:
$ perl -e 'print prototype("CORE::sprintf")'
$@
It treats the first argument as a scalar. Since you provided the argument @ARGV, it was coerced into a scalar by passing the number of elements in @ARGV instead.
Since the printf function has to support the syntax printf HANDLE TEMPLATE,LIST as well as printf TEMPLATE,LIST, it cannot support a prototype. So it always treats its arguments as a flat list, and uses the first element in the list as the template.
One way to make the second script work correctly would be to call it like
$tmp = sprintf shift @ARGV, @ARGV
Another difference between printf and sprintf is that print sprintf appends $\ to the output, while printf does not (thanks, ysth).
@ARGV contains the arguments passed to the script in list form. printf takes that list, uses its first element as the format, and prints the formatted result.
In the second example you pass the array to sprintf, which evaluates its first argument in scalar context. The format therefore becomes the number of elements in the array, and that is what ends up in $tmp. Hence you get 3 as output.
From the perl docs (jaypal said it already)
Unlike printf, sprintf does not do what you probably mean when you pass it an array as your first argument. The array is given scalar context, and instead of using the 0th element of the array as the format, Perl will use the count of elements in the array as the format, which is almost never useful.
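The scalar-context behaviour and one way around it can be sketched like this. The @args array is a stand-in for @ARGV, introduced so the example runs without command-line arguments:

```perl
use strict;
use warnings;

# Stand-in for @ARGV from the question (an assumption of this sketch).
my @args = ('hi %04d %.2f', 5, 7.12345);

# The $@ prototype puts the array in scalar context: the format becomes "3".
my $count = sprintf @args;

# Splitting the format from the remaining arguments restores printf-like behaviour.
my ($format, @rest) = @args;
my $formatted = sprintf $format, @rest;

print "$count\n";
print "$formatted\n";
```

Unlike the shift @ARGV variant, the list assignment leaves the original array intact.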

What happens internally when you have < FH >, <>, or < * > in perl?

I apologize if this question sounds simple, my intention is to understand in depth how this (these?) particular operator(s) works and I was unable to find a satisfactory description in the perldocs (It probably exists somewhere, I just couldn't find it for the life of me)
Particularly, I am interested in knowing if
a) <>
b) <*> or whatever glob and
c) <FH>
are fundamentally similar or different, and how they are used internally.
I built my own testing functions to gain some insight on this (presented below). I still don't have a full understanding (my understanding might even be wrong) but this is what I've concluded:
<>
In Scalar Context: Reads the next line of the "current file" being read (provided in @ARGV). Questions: This seems like a very particular scenario, and I wonder why it is the way it is and whether it can be generalized or not. Also what is the "current file" that is being read? Is it in a file handle? What is the counter?
In List Context: Reads ALL of the files in @ARGV into an array
<list of globs>
In Scalar Context: Name of the first file found in current folder that matches the glob. Questions: Why the current folder? How do I change this? Is the only way to change this doing something like < /home/* > ?
In List Context: All the files that match the glob in the current folder.
<FH> just seems to return undef when assigned to a variable.
Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
General Question: What is it that handles the value of <> and the others during execution? In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I also noticed that even though I am assigning them in sequence, the output is reset each time. i.e. I would have assumed that when I do
$thing_s = <>;
@thing_l = <>;
@thing_l would be missing the first item, since it was already received by $thing_s. Why is this not the case?
Code used for testing:
use strict;
use warnings;
use Switch;
use Data::Dumper;
die "Call with a list of files\n" if (@ARGV<1);
my @whats = ('<>','<* .*>','<FH>');
my $thing_s;
my @thing_l;
for my $what(@whats){
switch($what){
case('<>'){
$thing_s = <>;
@thing_l = <>;
}
case('<* .*>'){
$thing_s = <* .*>;
@thing_l = <* .*>;
}
case('<FH>'){
open FH, '<', $ARGV[0];
$thing_s = <FH>;
@thing_l = <FH>;
}
}
print "$what in scalar context is: \n".Dumper($thing_s)."\n";
print "$what in list context is: \n".Dumper(@thing_l)."\n";
}
The <> thingies are all iterators. All of these variants have common behaviour:
Used in list context, all remaining elements are returned.
Used in scalar context, only the next element is returned.
Used in scalar context, it returns undef once the iterator is exhausted.
These last two properties make it suitable for use as a condition in while loops.
There are two kinds of iterators that can be used with <>:
Filehandles. In this case <$fh> is equivalent to readline $fh.
Globs, so <* .*> is equivalent to glob '* .*'.
The <> is parsed as a readline when it contains either nothing, a bareword, or a simple scalar. For a more complex expression, call readline explicitly, e.g. readline($fh[0]).
It is parsed as a glob in all other cases. This can be made explicit by using quotes: <"* .*"> but you should really be explicit and use the glob function instead.
Some details differ, e.g. where the iterator state is kept:
When reading from a file handle, the file handle holds that iterator state.
When using the glob form, each glob expression has its own state.
Another part is if the iterator can restart:
glob restarts after returning one undef.
filehandles can only be restarted by seeking – not all FHs support this operation.
If no file handle is used in <>, then this defaults to the special ARGV file handle. The behaviour of <ARGV> is as follows:
If @ARGV is empty, then ARGV is STDIN.
Otherwise, the elements of @ARGV are treated as file names. The following pseudocode is executed:
$ARGV = shift @ARGV;
open ARGV, $ARGV or die ...; # careful! no open mode is used
The $ARGV scalar holds the filename, and the ARGV file handle holds that file handle.
When ARGV would be eof, the next file from @ARGV is opened.
Only when @ARGV is completely empty can <> return undef.
This can actually be used as a trick to read from many files:
local @ARGV = qw(foo.txt bar.txt baz.txt);
while (<>) {
...;
}
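A runnable version of this trick might look as follows. It is only a sketch: the temporary directory, the file contents, and the @lines array are made up for illustration, and the files are created first so the example does not depend on anything already on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small files to read back (contents are illustrative only).
my $dir   = tempdir(CLEANUP => 1);
my @files = map { "$dir/$_" } qw(foo.txt bar.txt);
for my $i (0 .. $#files) {
    open my $out, '>', $files[$i] or die "open failed: $!";
    print {$out} "line from file $i\n";
    close $out or die "close failed: $!";
}

my @lines;
{
    # local restores the caller's @ARGV when the block exits.
    local @ARGV = @files;
    while (<>) {
        chomp;
        push @lines, "$ARGV: $_";   # $ARGV holds the name of the current file
    }
}

print "$_\n" for @lines;
```

Keeping the local inside its own block limits the override to the read loop.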
What is it that handles the value of <> and the others during execution?
The Perl compiler is very context-aware, and often has to choose between multiple ambiguous interpretations of a code segment. It will compile <> as a call to readline or to glob depending on what is inside the brackets.
In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I'm not sure what you're asking here, or why you think the variables that take the result of a <> should be any different from other variables. They are always simple string values: either a filename returned by glob, or some file data returned by readline.
<FH> just seems to return undef when assigned to a variable. Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
This form will treat FH as a filehandle, and return the next line of data from the file if it is open and not at eof. Otherwise undef is returned, to indicate that nothing valid could be read. Perl is very flexible with types, but undef behaves as its own type, like Ruby's nil. The operator behaves the same whether FH is a global file handle or a (variable that contains) a reference to a typeglob.

Why does Perl autovivify in this case?

Why does $a become an arrayref? I'm not pushing anything to it.
perl -MData::Dumper -e 'use strict; 1 for @$a; print Dumper $a'
$VAR1 = [];
It is because the for loop treats contents of @$a as lvalues--something that you can assign to. Remember that for aliases the contents of the array to $_. It appears that the act of looking for aliasable contents in @$a is sufficient to cause autovivification, even when there are no contents to alias.
This effect of aliasing is consistent, too. The following also lead to autovivification:
map {stuff} @$a;
grep {stuff} @$a;
a_subroutine( @$a);
If you want to manage autovivification, you can use the eponymous pragma to effect lexical controls.
When you treat a scalar variable whose value is undef as any sort of reference, Perl makes the value the reference type you tried to use. In this case, $a has the value undef, and when you use @$a, it has to autovivify an array reference in $a so you can dereference it as an array reference.
$a becomes an ARRAY reference due to Perl's autovivification feature.
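The difference between the aliasing (lvalue) case and a plain copy can be sketched as follows. The lexical variables $looped and $copied are hypothetical names used in place of the question's $a:

```perl
use strict;
use warnings;

# Lvalue context: for aliases the elements, so the undef scalar is vivified.
my $looped;
1 for @$looped;
my $looped_type = ref $looped;

# Rvalue context: copying the elements does not vivify; under strict refs it dies.
my $copied;
my $ok = eval { my @copy = @$copied; 1 };
my $copied_is_undef = !defined $copied;

print "looped is now a(n) $looped_type reference\n";
print "plain copy ", ($ok ? "succeeded" : "died"), "\n";
```

The eval is there only to capture the failure of the rvalue dereference so the script can continue.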
$a and $b are special package variables in Perl (used by sort) and are exempt from strict 'vars', which is why the one-liner compiles without declaring $a. The same code with $c
perl -MData::Dumper -e 'use strict; 1 for @$c; print Dumper $c'
produces
Global symbol "$c" requires explicit package name at -e line 1.
Global symbol "$c" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.