Need more explanation about what's being done here? - perl

print reverse <>;
print sort <>;
What's the exact steps perl handles with these operations?
It seems for the 1st one,perl not just reverses the order of invocation parameters,but also the contents of each file...

print reverse <>;
<> is evaluated in an array context, meaning it "slurps" the file. It reads the entire file. In the case of the magic file represented by the files named in #ARGV, it will read the contents of all files in the order referenced by the command line arguments (#ARGV).
reverse then reverses the order of the array, meaning the last line from the last file comes first, and the first line from the last file comes last.
print then prints the array.
From your notes, you might want something like this:
perl -e 'sub BEGIN { #ARGV=reverse #ARGV; } print <>;' /etc/motd /etc/passwd

This is described in the docs for I/O operators. Here's an excerpt from the docs:
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the #ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The #ARGV array is then processed as a list of filenames.
It's worth reading the entire doc, as it provides equivalent "non-magical" Perl code equivalent to <> in various use cases.

Related

Clear #ARGV before getting input from <>?

I have noticed that the contents of #ARGV gets directed to the input of a <> command.
If I am going to get input from the keyboard using <>, should I clear #ARGV beforehand? Is that the only way to do it?
eg:
#ARGV = ();
$input = <>;
(I was quite surprised that #ARGV interferes with <>. How does that make sense?)
<> means <ARGV>. The ARGV filehandle refers to the concatenation of the files listed in #ARGV. If #ARGV is empty, it acts as if #ARGV = ('-'), which means reading from standard input. (It's magical that way.) See I/O Operators in perldoc perlop and perldoc -f readline.
<> is meant to emulate the common behavior of many unix tools (e.g. cat, sort, wc, ...) that read from standard input if passed no arguments, and otherwise read input from all files listed on the command line (treating - as a directive to read from standard input as well).
If you just want to read from STDIN, do this:
my $line = <STDIN>;
... or, my preference:
my $line = readline STDIN;
(Note that standard input does not necessarily refer to the keyboard. You can easily redirect it e.g. to a file: yourscript.pl < input.txt)

does "STDIN" written in <> effect behaviour of perl program?

I've two perl scripts, both of them wait for user to enter some input as below,
Does both of them are same ?
Does "STDIN" written in <> are just to for user-readability of code ?
If not please tell me the differences.
a) $in = <STDIN>;
b) $in = <>;
The form <FILEHANDLE> will only read from FILEHANDLE.
The form <> will read from STDIN if #ARGV is empty; or from all the files whose names are still in #ARGV which contains the command line arguments passed to the program.
<> is shorthand for <ARGV>. And ARGV is a special filehandle that either opens and iterates through all of the filenames specified in #ARGV (the command-line arguments) or gets aliased to STDIN (when #ARGV is empty).
More information about <> you can get from perlop, section about I/O Operators

What's the use of <> in Perl?

What's the use of <> in Perl. How to use it ?
If we simply write
<>;
and
while(<>)
what is that the program doing in both cases?
The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
in other words, ls never see '*.c' unless the pattern doesn't match. Try this at a command prompt (not perl):
echo *
you'll notice that you do not get an *.
So, if the shell is handing you a bunch of file names, and you'd like to go through each one's data in turn, perl's <> operator gives you a nice way of doing that...it puts the next line of the next file (or stdin if no files are named) into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified file handle. Reading from the special file handle ARGV will read from STDIN if #ARGV is empty or from the concatenation of the files named by #ARGV if it's not.
As for
while (<>)
It's a syntax error. If you had
while (<>) { ... }
it get rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from (previously explained) ARGV until there are no more lines to read.
It is called the diamond operator and feeds data from either stdin if ARGV is empty or each line from the files named in ARGV. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.
In many cases of programming with syntactical sugar like this, Deparse of O is helpful to find out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK
Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.
it takes the STDIN standard input:
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
below is the execution:
> temp.pl
3
3
>
so as soon as you execute the script it will wait till the user gives some input.
after 3 is given as input,it stores that value in $count and it prints the value in the next statement.

Perl: How to get filename when using <> construct?

Perl offers this very nice feature:
while ( <> )
{
# do something
}
...which allows the script to be used as script.pl <filename> as well as cat <filename> | script.pl.
Now, is there a way to determine if the script has been called in the former way, and if yes, what the filename was?
I know I knew this once, and I know I even used the construct, but I cannot remember where / how. And it proved very hard to search the 'net for this ("perl stdin filename"? No...).
Help, please?
The variable $ARGV holds the current file being processed.
$ echo hello1 > file1
$ echo hello2 > file2
$ echo hello3 > file3
$ perl -e 'while(<>){s/^/$ARGV:/; print;}' file*
file1:hello1
file2:hello2
file3:hello3
The I/O Operators section of perlop is very informative about this.
Essentially, the first time <> is executed, - is added to #ARGV if it started out empty. Opening - has the effect of cloning the STDIN file handle, and the variable $ARGV is set to the current element of #ARGV as it is processed.
Here's the full clip.
The null filehandle "<>" is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes a
list of filenames, doing the same to each line of input from all of
them. Input from "<>" comes either from standard input, or from each
file listed on the command line. Here's how it works: the first time
"<>" is evaluated, the #ARGV array is checked, and if it is empty,
$ARGV[0] is set to "-", which when opened gives you standard input. The
#ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work. It
really does shift the #ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally. "<>" is just
a synonym for "<ARGV>", which is magical. (The pseudo code above doesn't
work because it treats "<ARGV>" as non-magical.)
If you care to know about when <> switches to a new file (e.g. in my case - I wanted to record the new filename and line number), then the eof() function documentation offers a trick:
# reset line numbering on each input file
while (<>) {
next if /^\s*#/; # skip comments
print "$.\t$_";
} continue {
close ARGV if eof; # Not eof()!
}

Can someone suggest how this Perl script works?

I have to maintain the following Perl script:
#!/usr/bin/perl -w
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
undef $/;
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
foreach my $g (0 .. $#f1) {
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
}
print STDERR "$#f1 serials found\n";
I know pretty much what it does, but how it's done is difficult to follow. The calls to split() are particulary puzzling.
It's fairly idiomatic Perl and I would be grateful if a Perl expert could make a few clarifying suggestions about how it does it, so that if I need to use it on input files it can't deal with, I can attempt to modify it.
It combines the best results from two datalog files containing test results. The datalog files contain results for various serial numbers and the data for each serial number begins and ends with SERIAL NUMBER: n (I know this because my equipment creates the input files)
I could describe the format of the datalog files, but I think the only important aspect is the SERIAL NUMBER: n because that's all the Perl script checks for
The ternary operator is used to print a value from one input file or the other, so the output can be redirected to a third file.
This may not be what I would call "idiomatic" (that would be use Module::To::Do::Task) but they've certainly (ab)used some language features here. I'll see if I can't demystify some of this for you.
die "Usage: $0 <file1> <file2>\n" unless scalar(#ARGV)>1;
This exits with a usage message if they didn't give us any arguments. Command-line arguments are stored in #ARGV, which is like C's char **argv except the first element is the first argument, not the program name. scalar(#ARGV) converts #ARGV to "scalar context", which means that, while #ARGV is normally a list, we want to know about it's scalar (i.e. non-list) properties. When a list is converted to scalar context, we get the list's length. Therefore, the unless conditional is satisfied only if we passed no arguments.
This is rather misleading, because it will turn out your program needs two arguments. If I wrote this, I would write:
die "Usage: $0 <file1> <file2>\n" unless #ARGV == 2;
Notice I left off the scalar(#ARGV) and just wrote #ARGV. The scalar() function forces scalar context, but if we're comparing equality with a number, Perl can implicitly assume scalar context.
undef $/;
Oof. The $/ variable is a special Perl built-in variable that Perl uses to tell what a "line" of data from a file is. Normally, $/ is set to the string "\n", meaning when Perl tries to read a line it will read up until the next linefeed (or carriage return/linefeed on Windows). Your writer has undef-ed the variable, though, which means when you try to read a "line", Perl will just slurp up the whole file.
my #f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This is a fun one. <> is a special filehandle that reads line-by-line from each file given on the command line. However, since we've told Perl that a "line" is an entire file, calling <> once will read in the entire file given in the first argument, and storing it temporarily as a string.
Then we take that string and split() it up into pieces, using the regex /(?=(?:SERIAL NUMBER:\s+\d+))/. This uses a lookahead, which tells our regex engine "only match if this stuff comes after our match, but this stuff isn't part of our match," essentially allowing us to look ahead of our match to check on more info. It basically splits the file into pieces, where each piece (except possibly the first) begins with "SERIAL NUMBER:", some arbitrary whitespace (the \s+ part), and then some digits (the \d+ part). I can't teach you regexes, so for more info I recommend reading perldoc perlretut - they explain all of that better than I ever will.
Once we've split the string into a list, we store that list in a list called #f1.
my #f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>);
This does the same thing as the last line, only to the second file, because <> has already read the entire first file, and storing the list in another variable called #f2.
die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2);
This line prints an error message if #f1 and #f2 are different sizes. $#f1 is a special syntax for arrays - it returns the index of the last element, which will usually be the size of the list minus one (lists are 0-indexed, like in most languages). He uses this same value in his error message, which may be deceptive, as it will print 1 fewer than might be expected. I would write it as:
die "Error: file $ARGV[0] has ", $#f1 + 1, " serials, file $ARGV[1] has ", $#f2 + 1, "\n"
if $#f1 != $#f2;
Notice I changed "file1" to "file $ARGV[0]" - that way, it will print the name of the file you specified, rather than just the ambiguous "file1". Notice also that I split up the die() function and the if() condition on two lines. I think it's more readable that way. I also might write unless $#f1 == $#f2 instead of if $#f1 != $#f2, but that's just because I happen to think != is an ugly operator. There's more than one way to do it.
foreach my $g (0 .. $#f1) {
This is a common idiom in Perl. We normally use for() or foreach() (same thing, really) to iterate over each element of a list. However, sometimes we need the indices of that list (some languages might use the term "enumerate"), so we've used the range operator (..) to make a list that goes from 0 to $#f1, i.e., through all the indices of our list, since $#f1 is the value of the highest index in our list. Perl will loop through each index, and in each loop, will assign the value of that index to the lexically-scoped variable $g (though why they didn't use $i like any sane programmer, I don't know - come on, people, this tradition has been around since Fortran!). So the first time through the loop, $g will be 0, and the second time it will be 1, and so on until the last time it is $#f1.
print (($f2[$g] =~ m/RESULT:\s+PASS/) ? $f2[$g] : $f1[$g]);
This is the body of our loop, which uses the ternary conditional operator ?:. There's nothing wrong with the ternary operator, but if the code gives you trouble we can just change it to an if(). Let's just go ahead and rewrite it to use if():
if($f2[$g] =~ m/RESULT:\s+PASS/) {
print $f2[$g];
} else {
print $f1[$g];
}
Pretty simple - we do a regex check on $f2[$g] (the entry in our second file corresponding to the current entry in our first file) that basically checks whether or not that test passed. If it did, we print $f2[$g] (which will tell us that test passed), otherwise we print $f1[$g] (which will tell us the test that failed).
print STDERR "$#f1 serials found\n";
This just prints an ending diagnostic message telling us how many serials were found (minus one, again).
I personally would rewrite that whole hairy bit where he hacks with $/ and then does two reads from <> to be a loop, because I think that would be more readable, but this code should work fine, so if you don't have to change it too much you should be in good shape.
The undef $/ line deactivates the input record separator. Instead of reading records line by line, the interpreter will read whole files at once after that.
The <>, or 'diamond operator' reads from the files from the command line or standard input, whichever makes sense. In your case, the command line is explicitely checked, so it will be files. Input record separation has been deactivated, so each time you see a <>, you can think of it as a function call returning a whole file as a string.
The split operators take this string and cut it in chunks, each time it meets the regular expression in argument. The (?= ... ) construct means "the delimiter is this, but please keep it in the chunked result anyway."
That's all there is to it. There would always be a few optimizations, simplifications, or "other ways to do it," but this should get you running.
You can get quick glimpse how the script works, by translating it into java or scala. The inccode.com translator delivers following java code:
public class script extends CRoutineProcess implements IInProcess
{
VarArray arrF1 = new VarArray();
VarArray arrF2 = new VarArray();
VarBox call ()
{
// !/usr/bin/perl -w
if (!(BoxSystem.ProgramArguments.scalar().isGT(1)))
{
BoxSystem.die(BoxString.is(VarString.is("Usage: ").join(BoxSystem.foundArgument.get(0
).toString()).join(" <file1> <file2>\n")
)
);
}
BoxSystem.InputRecordSeparator.empty();
arrF1.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
arrF2.setValue(BoxConsole.readLine().split(BoxPattern.is("(?=(?:SERIAL NUMBER:\\s+\\d+))")));
if ((arrF1.length().isNE(arrF2.length())))
{
BoxSystem.die("Error: file1 has $#f1 serials, file2 has $#f2\n");
}
for (
VarBox varG : VarRange.is(0,arrF1.length()))
{
BoxSystem.print((arrF2.get(varG).like(BoxPattern.is("RESULT:\\s+PASS"))) ? arrF2.get(varG) : arrF1.get(varG)
);
}
return STDERR.print("$#f1 serials found\n");
}
}