Usage of defined with Filehandle and while Loop - perl

While reading a book on advanced Perl programming(1), I came across
this code:
while (defined($s = <>)) {
...
Is there any special reason for using defined here? The documentation for
perlop says:
In these loop constructs, the assigned value (whether assignment is
automatic or explicit) is then tested to see whether it is defined. The
defined test avoids problems where line has a string value that would be
treated as false by Perl, for example a "" or a "0" with no trailing
newline. If you really mean for such values to terminate the loop, they
should be tested for explicitly: [...]
So, would there be a corner case or that's simply because the book is too old
and the automatic defined test was added in a recent Perl version?
(1) Advanced Perl Programming, First Edition, Sriram Srinivasan. O'Reilly
(1997)

Perl has a lot of implicit behaviors, many more than most other languages. Perl's motto is There's More Than One Way To Do It, and because there is so much implicit behavior, there is often More Than One Way To express the exact same thing.
/foo/ instead of $_ =~ m/foo/
$x = shift instead of $x = shift @_
while (<>) instead of while (defined($_ = <ARGV>))
etc.
Which expressions to use are largely a matter of your local coding standards and personal preference. The more explicit expressions remind the reader what is really going on under the hood. This may or may not improve the readability of the code -- that depends on how knowledgeable the audience is and whether you are using well-known idioms.
In this case, the implicit behavior is a little more complicated than it seems. Sometimes perl will implicitly perform a defined(...) test on the result of the readline operator:
$ perl -MO=Deparse -e 'while($s=<>) { print $s }'
while (defined($s = <ARGV>)) {
print $s;
}
-e syntax OK
but sometimes it won't:
$ perl -MO=Deparse -e 'if($s=<>) { print $s }'
if ($s = <ARGV>) {
print $s;
}
-e syntax OK
$ perl -MO=Deparse -e 'while(some_condition() && ($s=<>)) { print $s }'
while (some_condition() and $s = <ARGV>) {
print $s;
}
-e syntax OK
Suppose that you are concerned about the corner cases that this implicit behavior is supposed to handle. Have you committed perlop to memory so that you understand when Perl uses this implicit behavior and when it doesn't? Do you understand the differences in this behavior between Perl v5.14 and Perl v5.6? Will the people reading your code understand?
Again, there's no right or wrong answer about when to use the more explicit expressions, but the case for using an explicit expression is stronger when the implicit behavior is more esoteric.

Say you have the following file
4<LF>
3<LF>
2<LF>
1<LF>
0
(<LF> represents a line feed. Note the lack of newline on the last line.)
Say you use the code
while ($s = <>) {
chomp($s);
say $s;
}
If Perl didn't do anything magical, the output would be
4
3
2
1
Note the lack of 0, since the string 0 is false. defined is needed in the unlikely case that both of the following are true:
You have a non-standard text file (missing trailing newline).
The last line of the file consists of a single ASCII zero (0x30).
BUT WAIT A MINUTE! If you actually ran the above code with the above data, you would see 0 printed! What many don't know is that Perl automagically translates
while ($s = <>) {
to
while (defined($s = <>)) {
as seen here:
$ perl -MO=Deparse -e'while($s=<DATA>) {}'
while (defined($s = <DATA>)) {
();
}
__DATA__
-e syntax OK
So you technically don't even need to specify defined in this very specific circumstance.
That said, I can't blame someone for being explicit instead of relying on Perl automagically modifying their code. After all, Perl is (necessarily) quite specific as to which code sequences it will change. Note the lack of defined in the following even though it's supposedly equivalent code:
$ perl -MO=Deparse -e'while((), $s=<DATA>) {}'
while ((), $s = <DATA>) {
();
}
__DATA__
-e syntax OK
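To see the difference in practice, here is a minimal sketch that reads the sample file shown above from an in-memory filehandle. The always_true helper and the read_lines wrapper are hypothetical names introduced for illustration; always_true stands in for some_condition() and exists only to put the read inside a compound condition, a form that, per the Deparse output shown earlier, does not get the implicit defined.

```perl
use strict;
use warnings;

# Hypothetical stand-in for some_condition(); it is a real sub call,
# so perl cannot fold it away at compile time.
sub always_true { 1 }

# Read every line of $data, with or without an explicit defined() test.
sub read_lines {
    my ($data, $explicit) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my (@lines, $s);
    if ($explicit) {
        while (always_true() && defined($s = <$fh>)) {
            chomp $s;
            push @lines, $s;
        }
    }
    else {
        # No implicit defined() here: a final "0" ends the loop early.
        while (always_true() && ($s = <$fh>)) {
            chomp $s;
            push @lines, $s;
        }
    }
    close $fh;
    return @lines;
}

my $data = "4\n3\n2\n1\n0";    # no newline after the final 0

my @implicit = read_lines($data, 0);
my @explicit = read_lines($data, 1);

print "without defined: @implicit\n";    # without defined: 4 3 2 1
print "with defined:    @explicit\n";    # with defined:    4 3 2 1 0
```

Writing defined explicitly costs a few characters and makes the loop behave the same no matter how the condition is later rearranged.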

while($line=<DATA>){
chomp($line);
if(defined $line){
print "SEE:$line\n";
}
}
__DATA__
1
0
3
Try the code with defined removed and you will see the different result.

Related

Is it possible to pass command-line arguments to @ARGV when using the -n or -p options?

I think the title of my question basically covers it. Here's a contrived example which tries to filter for input lines that exactly equal a parameterized string, basically a Perlish fgrep -x:
perl -ne 'chomp; print if $_ eq $ARGV[0];' bb <<<$'aa\nbb\ncc';
## Can't open bb: No such file or directory.
The problem of course is that the -n option creates an implicit while (<>) { ... } loop around the code, and the diamond operator gobbles up all command-line arguments for file names. So, although technically the bb argument did get to @ARGV, the whole program fails because the argument was also picked up by the diamond operator. The end result is, it is impossible to pass command-line arguments to the Perl program when using -n.
I suppose what I really want is an option that would create an implicit while (<STDIN>) { ... } loop around the code, so command-line arguments wouldn't be taken for file names, but such a thing does not exist.
I can think of three possible workarounds:
1: BEGIN { ... } block to copy and clear @ARGV.
perl -ne 'BEGIN { our @x = shift(@ARGV); } chomp; print if $_ eq $x[0];' bb <<<$'aa\nbb\ncc';
## bb
2: Manually code the while-loop in the one-liner.
perl -e 'while (<STDIN>) { chomp; print if $_ eq $ARGV[0]; }' bb <<<$'aa\nbb\ncc';
## bb
3: Find another way to pass the arguments, such as environment variables.
PAT=bb perl -ne 'chomp; print if $_ eq $ENV{PAT};' <<<$'aa\nbb\ncc';
## bb
The BEGIN { ... } block solution is undesirable since it constitutes a bit of a jarring context switch in the one-liner, is somewhat verbose, and requires messing with the special variable @ARGV.
I consider the manual while-loop solution to be more of a non-solution, since it forsakes the -n option entirely, and the point is I want to be able to use the -n option with command-line arguments.
The same can be said for the environment variable solution; the point is I want to be able to use command-line arguments with the -n option.
Is there a better way?
You've basically identified them all. The only one you missed, that I know of at least, is the option of passing switch arguments (instead of positional arguments):
$ perl -sne'chomp; print if $_ eq $kwarg' -- -kwarg=bb <<<$'aa\nbb\ncc';
bb
You could also use one of the many getopt modules instead of -s. This is essentially doing the same thing as manipulating @ARGV in a BEGIN {} block before the main program loop, but doing it for you and making it a little cleaner for a one-liner.
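As a rough sketch of the getopt route, the snippet below uses Getopt::Long's GetOptionsFromArray on a stand-in array rather than the real @ARGV, so that it can be run as is. The option name --pat and the file name input.txt are made up for illustration. The point is only that the parser removes the options it recognizes, leaving the remaining arguments for the diamond operator to treat as file names.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV: one named option followed by a file name.
my @args = ('--pat=bb', 'input.txt');

my $pat;
GetOptionsFromArray(\@args, 'pat=s' => \$pat)
    or die "Could not parse options\n";

# The recognized option is gone; only the positional argument remains.
print "pattern: $pat\n";       # pattern: bb
print "remaining: @args\n";    # remaining: input.txt
```

In a one-liner the equivalent would presumably be a GetOptions call on @ARGV inside a BEGIN block, with the script's options placed after a -- separator as in the -s example.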

print doesn't recognize barewords as parameter?

In non-strict mode of Perl, barewords could be recognized as string, like below:
$x = hello;
print $x;
But it seems barewords cannot be passed to print directly, like below one doesn't output the string. Why are they different?
print hello;
In cases like this, the B::Deparse module can be very helpful:
$ perl -MO=Deparse -e 'print hello;'
print hello $_;
-e syntax OK
As you see, it interprets the identifier as a filehandle and takes the value to be printed from $_.
Barewords should be avoided like the plague they are. Personally, I'll continue to avoid them all the time. They're a relic left over from the wild west days of Perl (circa 1990) and would have been eliminated from the language except for the need to maintain backwards compatibility.
In any case, in that context, it prints $_ to the file handle hello, doesn't it? That's the sort of reason why barewords are worth avoiding. (Compare: print STDERR "Hello\n"; and print STDERR;, and print hello;).
For example:
open hello, ">junk.out";
while (<>)
{
print STDERR "Hello\n";
print STDERR;
print hello;
}
Sample run:
$ perl hello.pl
abc
Hello
abc
def
Hello
def
$ cat junk.out
abc
def
$
In case it isn't clear, I typed one line of abc, which was followed by Hello and abc on standard error; then I typed def, which was followed by Hello and def on standard error. I typed Control-D to indicate EOF, and showed the contents of the file junk.out (which didn't exist before I ran the script). It contained the two lines that I'd typed.
So, don't use barewords — they're confusing. And do use use strict; and use warnings; so that you have less opportunity to be confused.
An identifier is only a bareword if it has no other meaning.
A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as "barewords".
So, for example, there are no barewords in the following program:
sub f { print STDOUT "f()\n"; }
X: f;
In other circumstances, all of sub, f, print, STDOUT and X could be barewords, but they all have other meanings here. I could add use strict;, and it'll still work fine.
In your code, you used print hello as I used print STDOUT. If an identifier follows print, you are using the print FILEHANDLE LIST syntax of print, where the identifier is the name of a file handle.
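One way to sidestep the ambiguity is to use a lexical filehandle and, optionally, the block form of print, which makes the filehandle slot explicit. The following is a minimal sketch under use strict; the in-memory file is just a convenience chosen here so the example needs no file on disk.

```perl
use strict;
use warnings;

my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";

print {$out} "hello\n";    # block form: clearly "print to $out"
print $out "world\n";      # same thing; no comma after the filehandle

close $out;

print $buffer;
# hello
# world
```

With a lexical handle there is no bareword at all, so nothing can be mistaken for a string or a filehandle.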

Does Perl optimize based on specific arguments, while parsing the source code?

Is perl only checking for syntax errors during the parsing of the source code, or also doing some optimizations based on arguments/parameters?
E.g. if we run:
perl source.pl debug=0
and inside source.pl there is an if condition:
if ($debug == 1) {...} else {...}
Would the "precompilation/parsing" optimize the code so that the "if" check is skipped (of course assuming that $debug is assigned only at the beginning of the code etc, etc.)?
By the way, any idea if TCL does that?
Giorgos
Thanks
Optimizations in Perl are rather limited. This is mostly due to the very permissive type system, and the absence of static typing. Features like eval etc. don't make it any easier, either.
Perl does not optimize code like
my $foo = 1;
if ($foo) { ... }
to
do { ... };
However, one can declare compile time constants:
use constant FOO => 1;
if (FOO) { ... }
which is then optimized (constant folding). Constants are implemented as special subroutines, with the assumption that subs won't be redefined. Literals will be folded as well, so print 1 + 2 + 3 will actually be compiled as print 6.
Interesting runtime optimizations include method caching, and regex optimizations.
However, perl won't try to prove certain properties about your code, and will always assume that variables are truly variable, even if they are only ever assigned once.
Given a Perl script, you can look at the way it was parsed and compiled by passing perl the -MO=Deparse option. This turns the compiled opcodes back to Perl code. The output isn't always runnable. When '???' turns up, this indicates code that was optimized away, but is irrelevant. Examples:
$ perl -MO=Deparse -e' "constant" ' # literal in void context
'???';
$ perl -MO=Deparse -e' print 1 + 2 + 3 ' # constant folding
print 6;
$ perl -MO=Deparse -e' print 1 ? "yep" : "nope" ' # constant folding removes branches
print 'yep';
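The same check can be made from inside a script with B::Deparse's coderef2text method. The sketch below, with hypothetical sub names, compares a branch guarded by a constant against one guarded by an ordinary variable. The exact deparsed text varies between perl versions, so it only checks whether the body of the dead branch survives compilation.

```perl
use strict;
use warnings;
use B::Deparse;

use constant DEBUG => 0;
my $debug = 0;

# Branch guarded by a compile-time constant: eligible for folding.
sub with_constant {
    if (DEBUG) { print "debugging\n" }
    return 42;
}

# Branch guarded by a variable: perl assumes it may change at run time.
sub with_variable {
    if ($debug) { print "debugging\n" }
    return 42;
}

my $deparse = B::Deparse->new;
my $folded  = $deparse->coderef2text(\&with_constant);
my $kept    = $deparse->coderef2text(\&with_variable);

print "constant branch removed: ", ($folded !~ /debugging/ ? "yes" : "no"), "\n";
print "variable branch kept:    ", ($kept   =~ /debugging/ ? "yes" : "no"), "\n";
```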

perl s/this/that/r ==> "Bareword found where operator expected"

Perl docs recommend this:
$foo = $bar =~ s/this/that/r;
However, I get this error:
Bareword found where operator expected near
"s/this/that/r" (#1)
This is specific to the r modifier, without it the code works.
However, I do not want to modify $bar.
I can, of course, replace
my $foo = $bar =~ s/this/that/r;
with
my $foo = $bar;
$foo =~ s/this/that/;
Is there a better solution?
As ruakh wrote, /r is new in perl 5.14. However you can do this in previous versions of perl:
(my $foo = $bar) =~ s/this/that/;
There's no better solution, no (though I usually write it on one line, since the s/// is essentially serving as part of the initialization process):
my $foo = $bar; $foo =~ s/this/that/;
By the way, the reason for your error-message is almost certainly that you're running a version of Perl that doesn't support the /r flag. That flag was added quite recently, in Perl 5.14. You might find it easier to develop using the documentation for your own version; for example, http://perldoc.perl.org/5.12.4/perlop.html if you're on Perl 5.12.4.
For completeness: if you are stuck with an older version of perl and really want to use the s/// command without resorting to a temporary variable, here is one way:
perl -E 'say map { s/_iter\d+\s*$//; $_ } $ENV{PWD}'
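Basically, use map to transform the string and return the final result, instead of the count of substitutions that s/// itself returns. Note that map aliases $_ to the original element, so the s/// here also modifies $ENV{PWD} in place; copy the value inside the block if the original must stay untouched.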
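For comparison, here is a small sketch of two non-destructive substitutions that avoid /r and so should also work on perls older than 5.14. The sample string is made up. Because map aliases $_ to each input element, the map variant copies the value inside the block before substituting.

```perl
use strict;
use warnings;

my $bar = "replace this here";

# Copy first, then substitute on the copy.
(my $foo = $bar) =~ s/this/that/;

# map variant: copy inside the block so that $bar is left alone.
my ($via_map) = map { (my $copy = $_) =~ s/this/that/; $copy } $bar;

print "$bar\n";        # replace this here
print "$foo\n";        # replace that here
print "$via_map\n";    # replace that here
```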

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (written 10 years ago by people who don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usually remove the quotes so my editor colors them as variables, not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straightforward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI objects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
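To show what happens behind such a stringification without depending on the URI module, here is a minimal sketch with a made-up Temperature class that overloads the "" operator. The class and its field name are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

package Temperature;    # hypothetical class, for illustration only

# Code to run whenever an object of this class is used as a string.
use overload '""' => sub {
    my ($self) = @_;
    return "$self->{degrees} C";
};

sub new {
    my ($class, $degrees) = @_;
    return bless { degrees => $degrees }, $class;
}

package main;

my $temp = Temperature->new(21);
my $str  = "$temp";    # the quotes trigger the overloaded stringification

print ref($temp), "\n";                                      # Temperature
print ref($str) eq '' ? "plain string\n" : "reference\n";    # plain string
print "$str\n";                                              # 21 C
```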
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in @INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case, the quotes are completely useless. We can even say that it is wrong because this is not idiomatic, as others wrote.
However, quoting a variable may sometimes be necessary: this explicitly triggers stringification of the value of the variable. Stringification may give a different result for some values if those values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code less efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.