I have two Perl scripts; both wait for the user to enter some input, as shown below.
Are they the same?
Is the "STDIN" written inside <> just there for code readability?
If not, please tell me the differences.
a) $in = <STDIN>;
b) $in = <>;
The form <FILEHANDLE> will only read from FILEHANDLE.
The form <> will read from STDIN if @ARGV is empty; or from all the files whose names are still in @ARGV, which contains the command line arguments passed to the program.
<> is shorthand for <ARGV>. And ARGV is a special filehandle that either opens and iterates through all of the filenames specified in @ARGV (the command-line arguments) or gets aliased to STDIN (when @ARGV is empty).
You can find more information about <> in perlop, in the section on I/O Operators.
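A minimal sketch of the difference, assuming a hypothetical helper named read_via_diamond and a temporary file created only for the demonstration. It shows <> picking up whatever names are in @ARGV, which <STDIN> would never look at:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return every line that <> yields for the given file names.
# Note: called with no names, <> would fall back to reading STDIN.
sub read_via_diamond {
    my @files = @_;
    local @ARGV = @files;    # <> takes its list of files from @ARGV
    my @lines = <>;          # list context: all lines of all listed files
    return @lines;
}

# Demo file, created only for this example.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line from a file\n";
close $fh;

# <> reads the file named in @ARGV; <STDIN> would have read standard input instead.
print read_via_diamond($path);
```

Using local keeps the caller's @ARGV untouched once the helper returns.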
Related
I have noticed that the contents of @ARGV get directed to the input of a <> command.
If I am going to get input from the keyboard using <>, should I clear @ARGV beforehand? Is that the only way to do it?
eg:
@ARGV = ();
$input = <>;
(I was quite surprised that @ARGV interferes with <>. How does that make sense?)
<> means <ARGV>. The ARGV filehandle refers to the concatenation of the files listed in @ARGV. If @ARGV is empty, it acts as if @ARGV = ('-'), which means reading from standard input. (It's magical that way.) See I/O Operators in perldoc perlop and perldoc -f readline.
<> is meant to emulate the common behavior of many unix tools (e.g. cat, sort, wc, ...) that read from standard input if passed no arguments, and otherwise read input from all files listed on the command line (treating - as a directive to read from standard input as well).
If you just want to read from STDIN, do this:
my $line = <STDIN>;
... or, my preference:
my $line = readline STDIN;
(Note that standard input does not necessarily refer to the keyboard. You can easily redirect it e.g. to a file: yourscript.pl < input.txt)
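As a rough sketch of that advice, the hypothetical helper read_answer below reads from an explicit handle, so leftover command-line arguments cannot interfere. An in-memory handle stands in for STDIN here (an assumption made only so the example runs unattended); in a real script you would pass \*STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from an explicit handle and strip the newline.
# Pass \*STDIN to read standard input; @ARGV is never consulted.
sub read_answer {
    my ($in) = @_;
    my $line = readline $in;
    return undef unless defined $line;    # end of input
    chomp $line;
    return $line;
}

# Leftover argument that <> would try (and fail) to open as a file.
local @ARGV = ('not-a-real-file');

# In-memory handle standing in for STDIN in this demo.
my $typed = "yes\n";
open my $fake_stdin, '<', \$typed or die "Cannot open in-memory handle: $!";

print read_answer($fake_stdin), "\n";
```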
I apologize if this question sounds simple. My intention is to understand in depth how this operator (these operators?) works, and I was unable to find a satisfactory description in the perldocs. (It probably exists somewhere; I just couldn't find it for the life of me.)
Particularly, I am interested in knowing if
a) <>
b) <*> or whatever glob and
c) <FH>
are fundamentally similar or different, and how they are used internally.
I built my own testing functions to gain some insight on this (presented below). I still don't have a full understanding (my understanding might even be wrong) but this is what I've concluded:
<>
In Scalar Context: Reads the next line of the "current file" being read (provided in @ARGV). Questions: This seems like a very particular scenario, and I wonder why it is the way it is and whether it can be generalized or not. Also what is the "current file" that is being read? Is it in a file handle? What is the counter?
In List Context: Reads ALL of the files in @ARGV into an array
<list of globs>
In Scalar Context: Name of the first file found in current folder that matches the glob. Questions: Why the current folder? How do I change this? Is the only way to change this doing something like < /home/* > ?
In List Context: All the files that match the glob in the current folder.
<FH> just seems to return undef when assigned to a variable.
Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
General Question: What is it that handles the value of <> and the others during execution? In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I also noticed that even though I am assigning them in sequence, the output is reset each time. i.e. I would have assumed that when I do
$thing_s = <>;
@thing_l = <>;
@thing_l would be missing the first item, since it was already received by $thing_s. Why is this not the case?
Code used for testing:
use strict;
use warnings;
use Switch;
use Data::Dumper;
die "Call with a list of files\n" if (@ARGV<1);
my @whats = ('<>','<* .*>','<FH>');
my $thing_s;
my @thing_l;
for my $what(@whats){
    switch($what){
        case('<>'){
            $thing_s = <>;
            @thing_l = <>;
        }
        case('<* .*>'){
            $thing_s = <* .*>;
            @thing_l = <* .*>;
        }
        case('<FH>'){
            open FH, '<', $ARGV[0];
            $thing_s = <FH>;
            @thing_l = <FH>;
        }
    }
    print "$what in scalar context is: \n".Dumper($thing_s)."\n";
    print "$what in list context is: \n".Dumper(@thing_l)."\n";
}
The <> thingies are all iterators. All of these variants have common behaviour:
Used in list context, all remaining elements are returned.
Used in scalar context, only the next element is returned.
Used in scalar context, it returns undef once the iterator is exhausted.
These last two properties make it suitable for use as a condition in while loops.
There are two kinds of iterators that can be used with <>:
Filehandles. In this case <$fh> is equivalent to readline $fh.
Globs, so <* .*> is equivalent to glob '* .*'.
The <> is parsed as a readline when it contains either nothing, a bareword, or a simple scalar variable. For a more complex expression, such as a hash element holding a handle, call readline explicitly.
It is parsed as a glob in all other cases. This can be made explicit by using quotes: <"* .*"> but you should really be explicit and use the glob function instead.
Some details differ, e.g. where the iterator state is kept:
When reading from a file handle, the file handle holds that iterator state.
When using the glob form, each glob expression has its own state.
Another part is if the iterator can restart:
glob restarts after returning one undef.
filehandles can only be restarted by seeking – not all FHs support this operation.
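The filehandle side of these rules can be sketched as follows. The helper name iterate_demo is hypothetical, and an in-memory handle is used only so the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: walk through the iterator behaviour of a filehandle.
sub iterate_demo {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    my $first = <$fh>;    # scalar context: only the next line
    my @rest  = <$fh>;    # list context: all remaining lines
    my $done  = <$fh>;    # exhausted: undef

    # The handle holds the iterator state, so rewinding it restarts the iteration.
    seek $fh, 0, 0 or die "Cannot seek: $!";
    my $again = <$fh>;

    return ($first, scalar @rest, $done, $again);
}

my ($first, $rest_count, $done, $again) = iterate_demo("one\ntwo\nthree\n");
print "first: $first";
print "rest:  $rest_count lines\n";
print "done:  ", (defined $done ? 'defined' : 'undef'), "\n";
print "again: $again";
```

Because the scalar read consumes the first line, the list read that follows only sees what is left on the same handle.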
If no file handle is used in <>, then this defaults to the special ARGV file handle. The behaviour of <ARGV> is as follows:
If @ARGV is empty, then ARGV is STDIN.
Otherwise, the elements of @ARGV are treated as file names. The following pseudocode is executed:
$ARGV = shift @ARGV;
open ARGV, $ARGV or die ...; # careful! no open mode is used
The $ARGV scalar holds the filename, and the ARGV file handle holds that file handle.
When ARGV would be eof, the next file from @ARGV is opened.
Only when @ARGV is completely empty can <> return undef.
This can actually be used as a trick to read from many files:
local @ARGV = qw(foo.txt bar.txt baz.txt);
while (<>) {
...;
}
What is it that handles the value of <> and the others during execution?
The Perl compiler is very context-aware, and often has to choose between multiple ambiguous interpretations of a code segment. It will compile <> as a call to readline or to glob depending on what is inside the brackets.
In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I'm not sure what you're asking here, or why you think the variables that take the result of a <> should be any different from other variables. They are always simple string values: either a filename returned by glob, or some file data returned by readline.
<FH> just seems to return undef when assigned to a variable. Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
This form will treat FH as a filehandle, and return the next line of data from the file if it is open and not at eof. Otherwise undef is returned, to indicate that nothing valid could be read. Perl is very flexible with types, but undef behaves as its own type, like Ruby's nil. The operator behaves the same whether FH is a global file handle or (a variable that contains) a reference to a typeglob.
Perl offers this very nice feature:
while ( <> )
{
# do something
}
...which allows the script to be used as script.pl <filename> as well as cat <filename> | script.pl.
Now, is there a way to determine if the script has been called in the former way, and if yes, what the filename was?
I know I knew this once, and I know I even used the construct, but I cannot remember where / how. And it proved very hard to search the 'net for this ("perl stdin filename"? No...).
Help, please?
The variable $ARGV holds the current file being processed.
$ echo hello1 > file1
$ echo hello2 > file2
$ echo hello3 > file3
$ perl -e 'while(<>){s/^/$ARGV:/; print;}' file*
file1:hello1
file2:hello2
file3:hello3
The I/O Operators section of perlop is very informative about this.
Essentially, the first time <> is executed, - is added to @ARGV if it started out empty. Opening - has the effect of cloning the STDIN file handle, and the variable $ARGV is set to the current element of @ARGV as it is processed.
Here's the full clip.
The null filehandle "<>" is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes a
list of filenames, doing the same to each line of input from all of
them. Input from "<>" comes either from standard input, or from each
file listed on the command line. Here's how it works: the first time
"<>" is evaluated, the @ARGV array is checked, and if it is empty,
$ARGV[0] is set to "-", which when opened gives you standard input. The
@ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work. It
really does shift the @ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally. "<>" is just
a synonym for "<ARGV>", which is magical. (The pseudo code above doesn't
work because it treats "<ARGV>" as non-magical.)
If you care to know about when <> switches to a new file (e.g. in my case - I wanted to record the new filename and line number), then the eof() function documentation offers a trick:
# reset line numbering on each input file
while (<>) {
next if /^\s*#/; # skip comments
print "$.\t$_";
} continue {
close ARGV if eof; # Not eof()!
}
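Combining $ARGV with the eof trick gives a per-file line label. The following is a sketch under assumptions: the helper name label_lines and the temporary demo files are hypothetical and not part of the original answers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: label each input line as "file:line: text".
sub label_lines {
    my @files = @_;
    local @ARGV = @files;
    my @out;
    while (my $line = <>) {
        # $ARGV is the file currently being read, $. its line counter.
        push @out, sprintf('%s:%d: %s', $ARGV, $., $line);
    } continue {
        close ARGV if eof;    # end of the current file: resets $. for the next one
    }
    return @out;
}

# Demo files, created only for this example.
my ($fh_a, $file_a) = tempfile(UNLINK => 1);
print {$fh_a} "alpha\nbeta\n";
close $fh_a;
my ($fh_b, $file_b) = tempfile(UNLINK => 1);
print {$fh_b} "gamma\n";
close $fh_b;

print label_lines($file_a, $file_b);
```

Without the close in the continue block, $. would keep counting across files instead of restarting at 1.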
print reverse <>;
print sort <>;
What exact steps does Perl take for these operations?
It seems that for the first one, Perl reverses not just the order of the invocation parameters, but also the contents of each file...
print reverse <>;
<> is evaluated in list context, meaning it "slurps" the input: it reads everything at once. In the case of the magic file represented by the files named in @ARGV, it will read the contents of all files in the order referenced by the command line arguments (@ARGV).
reverse then reverses the order of the list, meaning the last line from the last file comes first, and the first line from the first file comes last.
print then prints the array.
From your notes, you might want something like this:
perl -e 'sub BEGIN { @ARGV=reverse @ARGV; } print <>;' /etc/motd /etc/passwd
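To make the two behaviours easier to compare, here is a sketch with two hypothetical helpers, lines_reversed and files_reversed, run against temporary files created only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: every line of every file, in fully reversed order
# (what `print reverse <>` does).
sub lines_reversed {
    local @ARGV = @_;
    my @lines = reverse <>;    # slurp everything, then reverse the whole list
    return @lines;
}

# Hypothetical helper: files visited last-to-first, lines inside each file kept in order
# (what reversing @ARGV before reading does).
sub files_reversed {
    local @ARGV = reverse @_;
    my @lines = <>;
    return @lines;
}

# Demo files, created only for this example.
my ($fh_a, $file_a) = tempfile(UNLINK => 1);
print {$fh_a} "a1\na2\n";
close $fh_a;
my ($fh_b, $file_b) = tempfile(UNLINK => 1);
print {$fh_b} "b1\nb2\n";
close $fh_b;

print "lines reversed:\n", lines_reversed($file_a, $file_b);
print "files reversed:\n", files_reversed($file_a, $file_b);
```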
This is described in the docs for I/O operators. Here's an excerpt from the docs:
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames.
It's worth reading the entire doc, as it provides "non-magical" Perl code equivalent to <> in various use cases.
Consider the following silly Perl program:
$firstarg = $ARGV[0];
print $firstarg;
$input = <>;
print $input;
I run it from a terminal like:
perl myprog.pl sample_argument
And get this error:
Can't open sample_argument: No such file or directory at myprog.pl line 5.
Any ideas why this is? When it gets to the <> is it trying to read from the (non-existent) file, "sample_argument" or something? And why?
<> is shorthand for "read from the files specified in @ARGV, or if @ARGV is empty, then read from STDIN". In your program, @ARGV contains the value ("sample_argument"), and so Perl tries to read from that file when you use the <> operator.
You can fix it by clearing @ARGV before you get to the <> line:
$firstarg = shift @ARGV;
print $firstarg;
$input = <>; # now @ARGV is empty, so read from STDIN
print $input;
See the perlop man page (section "I/O Operators"), which reads in part:
The null filehandle <> is special: it can be used to emulate the behavior of sed
and awk. Input from <> comes either from standard input, or from each file listed
on the command line. Here’s how it works: the first time <> is evaluated, the
@ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when
opened gives you standard input. The @ARGV array is then processed as a list of
filenames.
If you want STDIN, use STDIN, not <>.
By default, perl consumes the command line arguments as input files for <>. After you've used them, you should consume them yourself with shift.