Does Perl optimize based on specific arguments, while parsing the source code? - perl

Is perl only checking for syntax errors during the parsing of the source code, or also doing some optimizations based on arguments/parameters?
E.g. if we run:
perl source.pl debug=0
and inside source.pl there is an if condition:
if ($debug == 1) {...} else {...}
Would the "precompilation/parsing" optimize the code so that the "if" check is skipped (of course assuming that $debug is assigned only at the beginning of the code etc, etc.)?
By the way, any idea if TCL does that?
Giorgos
Thanks

Optimizations in Perl are rather limited. This is mostly due to the very permissive type system, and the absence of static typing. Features like eval etc. don't make it any easier, either.
Perl does not optimize code like
my $foo = 1;
if ($foo) { ... }
to
do { ... };
However, one can declare compile time constants:
use constant FOO => 1;
if (FOO) { ... }
which is then optimized (constant folding). Constants are implemented as special subroutines, with the assumption that subs won't be redefined. Literals will be folded as well, so print 1 + 2 + 3 will actually be compiled as print 6.
Interesting runtime optimizations include method caching, and regex optimizations.
However, perl won't try to prove certain properties about your code, and will always assume that variables are truly variable, even if they are only ever assigned once.
Given a Perl script, you can look at the way it was parsed and compiled by passing perl the -MO=Deparse option. This turns the compiled opcodes back into Perl code. The output isn't always runnable. When '???' turns up, it marks code that was optimized away and can be ignored. Examples:
$ perl -MO=Deparse -e' "constant" ' # literal in void context
'???';
$ perl -MO=Deparse -e' print 1 + 2 + 3 ' # constant folding
print 6;
$ perl -MO=Deparse -e' print 1 ? "yep" : "nope" ' # constant folding removes branches
print 'yep';
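Applied to the original question, one way to let perl fold the "if" away is to turn the command-line flag into a compile-time constant. The following is only a minimal sketch under assumptions: the helper name debug_flag_from_args and the debug=N argument format are made up for illustration, mirroring the question's example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns N from a "debug=N" argument, or 0 if absent.
sub debug_flag_from_args {
    my @args = @_;
    for my $arg (@args) {
        return $1 + 0 if $arg =~ /\Adebug=(\d+)\z/;
    }
    return 0;
}

# "use" statements run at compile time. The sub above has already been
# compiled when the parser gets here, and @ARGV is already populated.
use constant DEBUG => debug_flag_from_args(@ARGV);

sub report {
    # DEBUG is a constant, so perl can fold this test at compile time
    # and keep only one of the two branches.
    if (DEBUG == 1) {
        return "debug branch";
    }
    else {
        return "normal branch";
    }
}

print report(), "\n";
```

Running such a script through -MO=Deparse with debug=0 on the command line should show only the surviving branch, in the same way as the print 1 ? "yep" : "nope" example.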

Related

Perl eval command not working as expected

I have a sort of perl "terminal" (pastebin code) we'll call it that I've written, the idea behind writing it is I wanted to run perl code line by line, allowing me to run new commands on existing (large) data sets, without having to change a script and reload the data set and re-run my script.
(Mind you, I wrote this almost a year ago now, and it was mostly a learning experiment (with a dynamic function table), however now I have some use for it and discovered some issues which are preventing me from utilising it.)
As such, I eval user entered commands, however, they aren't behaving as expected and perhaps someone can shed some light on why this would be.
This is the 'important' bit, I have the command line data stored in @args, and the first element of that is stored in $prog. I check if there's an existing function (I allow users to create functions, and really abuse references to get an action table) if not I try and eval the command.
if(exists($actions{$prog})){
print "\n";
$actions{$prog}->(@args);
print "\n";
}else{
print "\nEVALing '$command'\n";
eval $command;
warn $@ if $@;
print "\n";
}
As can be seen below, it works as expected for the assignment of scalars, but fails with the assignment of arrays and hashes.
user@host:~/$ perl term.pl
1358811935>$a = 0;
EVALing '$a = 0;'
1358811937>print $a;
EVALing 'print $a;'
0
1358811944>@b = qw(2 3);
EVALing '@b = qw(2 3);'
Global symbol "@b" requires explicit package name at (eval 5) line 1.
1358811945>print @b;
EVALing 'print @b;'
Global symbol "@b" requires explicit package name at (eval 6) line 1.
1358812008>my @b = qw(2 3);
EVALing 'my @b = qw(2 3);'
1358812008>print "@b";
EVALing 'print "@b";'
Possible unintended interpolation of @b in string at (eval 9) line 1.
Global symbol "@b" requires explicit package name at (eval 9) line 1.
1358812016>print join(',',@b);
EVALing 'print join(',',@b);'
Global symbol "@b" requires explicit package name at (eval 10) line 1.
1358812018>
Variables $a and $b are special, because they are used by sort. Therefore, strict does not complain if they are not declared. Using $x would trigger the same error as arrays and hashes.
For this kind of thing, you probably want to allow arbitrary package variables to be used by saying no strict 'vars';. Declaring a lexical (my) variable in the eval'd code will work, but will no longer be in scope for the next eval.
Alternatively, pre-declare a set of variables for the eval'd code to use (perhaps including a %misc hash).
A completely different approach is to eval, on every pass, a concatenation of all the code entered so far (if printed output matters, suppress or redirect the output of everything except the most recently entered code).
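The first suggestion (no strict 'vars') could look roughly like the following. This is a sketch, not the asker's actual terminal code: the helper name eval_line is hypothetical, and forcing package main is an assumption made so that variables land in one predictable namespace.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluates one line of user input with strict 'vars'
# switched off, so undeclared @arrays and %hashes become package variables
# that are still there on the next call.
sub eval_line {
    my ($code) = @_;
    my @result = do {
        no strict 'vars';            # lexically scoped; string eval inherits it
        eval "package main; $code";
    };
    warn $@ if $@;
    return @result;
}

eval_line('@b = qw(2 3);');
print eval_line('join(",", @b)'), "\n";

# A lexical declared in one eval is gone by the next one:
eval_line('my @c = (1, 2, 3); 1');
print eval_line('scalar @c'), "\n";   # refers to the (empty) package @c
```

The second call still sees @b because it is a package variable, whereas the my @c from the third call is out of scope as soon as that eval finishes.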

perl: what is the right way to call a function stored in a variable?

What is the right way to call a function stored in a variable?
my $f = sub () { ... };
&$f(); # 1st
$f->(); # 2nd
Both appear to work, and the first probably worked in perl4.
However, I was wondering what the "official perl5 way" was.
Also, are there any performance implications?
Both are the right way. Perl is not about forcing any special style down your throat.
Style #1 &$f()
Pro:
Emphasizes that we are using a subroutine
Con:
Looks like line noise
Overrides function prototypes
Seems a bit perl4-ly to me
Caveats:
In the dark ages of perl4, there were no references. One could simulate references by passing around variable names (*shudder*). This also works with subs, so this code runs:
sub f { (shift == 0) ? 1 : 0 }
$g = "f";
print &$g(1); # prints 0;
print &$g(0); # prints 1;
Please use strict 'refs' to guard against this horror.
Style #2 $f->()
Pro:
Emphasizes that we are handling a reference
Looks cleaner
Con:
can be confused with objects and hashrefs
Caveats:
Same as with the other syntax, as they are the same under the hood. But the dereference operator is not misused as often.
Performance implications
Let's face it, if we were all about performance, we would be writing assembler. If you want to optimize Perl, first optimize the algorithm, then code everything in C/XS, throw away any objects and modules, and finally discuss dereferencing syntax.
I would guess style #1 is faster in theory, but I doubt it would have serious implications in real life.
I sincerely doubt there is any difference in performance, since both methods result in the same code:
$ perl -MO=Deparse -e'&$f()'
&$f();
-e syntax OK
$ perl -MO=Deparse -e'$f->()'
&$f();
-e syntax OK
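A small sketch of the points made in these answers: both call styles give the same result, and strict 'refs' refuses the perl4-style call through a plain string. The helper names call_both_ways and call_by_name are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Same sub as in the caveat above; only reachable by name here.
sub f { (shift == 0) ? 1 : 0 }

# Calls a code reference with both syntaxes and returns both results.
sub call_both_ways {
    my ($code, @args) = @_;
    return (&$code(@args), $code->(@args));
}

# Tries the perl4-style call through a string. With strict 'refs' in
# effect this dies at run time instead of silently calling f().
sub call_by_name {
    my ($name) = @_;
    my $ok = eval { &$name(1); 1 };
    return $ok ? 'called' : 'refused';
}

my @results = call_both_ways(sub { 2 * $_[0] }, 21);
print "@results\n";               # both entries are 42
print call_by_name('f'), "\n";    # refused
```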

Usage of defined with Filehandle and while Loop

While reading a book on advanced Perl programming(1), I came across
this code:
while (defined($s = <>)) {
...
Is there any special reason for using defined here? The documentation for
perlop says:
In these loop constructs, the assigned value (whether assignment is
automatic or explicit) is then tested to see whether it is defined. The
defined test avoids problems where line has a string value that would be
treated as false by Perl, for example a "" or a "0" with no trailing
newline. If you really mean for such values to terminate the loop, they
should be tested for explicitly: [...]
So, would there be a corner case or that's simply because the book is too old
and the automatic defined test was added in a recent Perl version?
(1) Advanced Perl Programming, First Edition, Sriram Srinivasan. O'Reilly
(1997)
Perl has a lot of implicit behaviors, many more than most other languages. Perl's motto is There's More Than One Way To Do It, and because there is so much implicit behavior, there is often More Than One Way To express the exact same thing.
/foo/ instead of $_ =~ m/foo/
$x = shift instead of $x = shift @_
while(<>) instead of while (defined($_=<ARGV>))
etc.
Which expressions to use are largely a matter of your local coding standards and personal preference. The more explicit expressions remind the reader what is really going on under the hood. This may or may not improve the readability of the code -- that depends on how knowledgeable the audience is and whether you are using well-known idioms.
In this case, the implicit behavior is a little more complicated than it seems. Sometimes perl will implicitly perform a defined(...) test on the result of the readline operator:
$ perl -MO=Deparse -e 'while($s=<>) { print $s }'
while (defined($s = <ARGV>)) {
print $s;
}
-e syntax OK
but sometimes it won't:
$ perl -MO=Deparse -e 'if($s=<>) { print $s }'
if ($s = <ARGV>) {
print $s;
}
-e syntax OK
$ perl -MO=Deparse -e 'while(some_condition() && ($s=<>)) { print $s }'
while (some_condition() and $s = <ARGV>) {
print $s;
}
-e syntax OK
Suppose that you are concerned about the corner cases that this implicit behavior is supposed to handle. Have you committed perlop to memory so that you understand when Perl uses this implicit behavior and when it doesn't? Do you understand the differences in this behavior between Perl v5.14 and Perl v5.6? Will the people reading your code understand?
Again, there's no right or wrong answer about when to use the more explicit expressions, but the case for using an explicit expression is stronger when the implicit behavior is more esoteric.
Say you have the following file
4<LF>
3<LF>
2<LF>
1<LF>
0
(<LF> represents a line feed. Note the lack of newline on the last line.)
Say you use the code
while ($s = <>) {
chomp($s);
say $s;
}
If Perl didn't do anything magical, the output would be
4
3
2
1
Note the lack of 0, since the string 0 is false. defined is needed in the unlikely case that
You have a non-standard text file (missing trailing newline).
The last line of the file consists of a single ASCII zero (0x30).
BUT WAIT A MINUTE! If you actually ran the above code with the above data, you would see 0 printed! What many don't know is that Perl automagically translates
while ($s = <>) {
to
while (defined($s = <>)) {
as seen here:
$ perl -MO=Deparse -e'while($s=<DATA>) {}'
while (defined($s = <DATA>)) {
();
}
__DATA__
-e syntax OK
So you technically don't even need to specify defined in this very specific circumstance.
That said, I can't blame someone for being explicit instead of relying on Perl automagically modifying their code. After all, Perl is (necessarily) quite specific as to which code sequences it will change. Note the lack of defined in the following even though it's supposedly equivalent code:
$ perl -MO=Deparse -e'while((), $s=<DATA>) {}'
while ((), $s = <DATA>) {
();
}
__DATA__
-e syntax OK
while($line=<DATA>){
chomp($line);
if(defined $line){
print "SEE:$line\n";
}
}
__DATA__
1
0
3
Try the code with defined removed and you will see the different result.
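The corner case discussed in the answers above can be reproduced without a real file by reading from an in-memory filehandle. This is a sketch: the two helper names are hypothetical, and the compound condition in the second one is used only because, as the Deparse output above shows, perl does not add the implicit defined test there.

```perl
use strict;
use warnings;

# Reads all lines with an explicit defined() test.
sub lines_with_defined {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Reads lines with a compound condition, where perl does NOT insert
# the implicit defined() test, so a final "0" ends the loop early.
sub lines_without_defined {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    my $keep_going = 1;
    my $line;
    while ($keep_going && ($line = <$fh>)) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

my $data = "1\n0";    # last line is "0" with no trailing newline
print join(',', lines_with_defined($data)),    "\n";   # 1,0
print join(',', lines_without_defined($data)), "\n";   # 1
```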

Why is parenthesis optional only after sub declaration?

(Assume use strict; use warnings; throughout this question.)
I am exploring the usage of sub.
sub bb { print @_; }
bb 'a';
This works as expected. The parentheses are optional, as with many other functions, like print, open etc.
However, this causes a compilation error:
bb 'a';
sub bb { print @_; }
String found where operator expected at t13.pl line 4, near "bb 'a'"
(Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.
But this does not:
bb('a');
sub bb { print @_; }
Similarly, a sub without args, such as:
special_print;
sub special_print { print $some_stuff }
Will cause this error:
Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.
Ways to alleviate this particular error are:
Put & before the sub name, e.g. &special_print
Put empty parentheses after sub name, e.g. special_print()
Predeclare special_print with sub special_print at the top of the script.
Call special_print after the sub declaration.
My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?
ETA: I know how I can fix it. I want to know the logic behind this.
I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:
In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:
sub name {...} === BEGIN {*name = sub {...}}
This means that the sub NAME syntax has a compile time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, the set is the builtin functions. Since Perl already knows about these, it lets you omit the parentheses.
As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
Without a predeclared rule, an identifier will be interpreted as follows:
bareword === 'bareword' # a string
bareword LIST === syntax error, missing ','
bareword() === &bareword() # runtime execution of &bareword
&bareword === &bareword # same
&bareword() === &bareword() # same
When using strict and warnings as you have stated, barewords will not be converted into strings, so the first example is a syntax error.
When predeclared with any of the following:
sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}
Then the identifier will be interpreted as follows:
bareword === &bareword() # compile time binding to &bareword
bareword LIST === &bareword(LIST) # same
bareword() === &bareword() # same
&bareword === &bareword # same
&bareword() === &bareword() # same
So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
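As a concrete illustration of the rules above, here is a minimal sketch in which both subs are defined at the bottom of the file. The sub names greet, shout, and demo are hypothetical. greet is forward-declared and can therefore be called without parentheses, while shout is not and needs them.

```perl
use strict;
use warnings;

sub greet;    # forward declaration: the parser now knows greet is a sub

sub demo {
    my $without_parens = greet 'b';     # parsed as greet('b')
    my $with_parens    = shout('a');    # parentheses mark this as a sub call
    return ($without_parens, $with_parens);
}

# Definitions appear after their first use.
sub greet { return "hello $_[0]" }
sub shout { return uc $_[0] }

print join(', ', demo()), "\n";
```

Removing the sub greet; line should bring back the "String found where operator expected" error from the question, since the parser would then see an unknown bareword followed by a string.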
As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.
That said, you will be hard pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (and you will get a warning "subroutine called too early to check prototype" if you don't call it this way).
The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.
Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 did some major changes which can cause problems with earlier programs written in a free flowing manner.
Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:
b $a, $c;
Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.
When the Perl compiler sees this:
b($a, $c);
It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
When you pre-declare your function, Perl can see this:
sub b; #Or use subs qw(b); will also work.
b $a, $c;
and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.
By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.
Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.
Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).
The reason is that Larry Wall is a linguist, not a computer scientist.
Computer scientist: The grammar of the language should be as simple & clear as possible.
Avoids complexity in the compiler
Eliminates sources of ambiguity
Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.

How can I eval environment variables in Perl?

I would like to evaluate an environment variable and set the result to a variable:
$x=eval($ENV{EDITOR});
print $x;
outputs:
/bin/vi
works fine.
If I set an environment variable QUOTE to \' and try the same thing:
$x=eval($ENV{QUOTE});
print $x;
outputs:
(nothing)
$@ set to: "Can't find a string terminator anywhere before ..."
I do not wish to simply set $x=$ENV{QUOTE}; as the eval is also used to call a script and return its last value (very handy), so I would like to stick with the eval(); Note that all of the Environment variables eval'ed in this manner are set by me in a different place so I am not concerned with malicious access to the environment variables eval-ed in this way.
Suggestions?
Well, of course it does nothing.
If your ENV variable contains text that looks like code but isn't valid code, and you hand that string to something that evaluates it as Perl, of course it's not going to work.
You only have 3 options:
Programmatically process the string so it doesn't have invalid syntax in it
Manually make sure your ENV variables are not rubbish
Find a solution not involving eval but gives the right result.
You may as well complain that
$x = '
Is not valid code, because that's essentially what's occurring.
Samples of Fixing the value of 'QUOTE' to work
# Bad.
QUOTE="'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Bad.
QUOTE="\'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Bad.
QUOTE="\\'" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# Can't find string terminator "'" anywhere before EOF at (eval 1) line 1.
# Good
QUOTE="'\''" perl -wWe 'print eval $ENV{QUOTE}; print "$@"'
# '
Why are you eval'ing in the first place? Should you just say
my $x = $ENV{QUOTE};
print "$x\n";
The eval is executing the string in $ENV{QUOTE} as if it were Perl code, which I certainly hope it isn't. That is why \ disappears. If you were to check the $@ variable you would find an error message like
syntax error at (eval 1) line 2, at EOF
If your environment variables are going to contain code that Perl should be executing then you should look into the Safe module. It allows you to control what sort of code can execute in an eval so you don't accidentally wind up executing something like "use File::Find; find sub{unlink $File::Find::name}, '.'"
Evaluating an environment value is very dangerous, and would generate errors if running under taint mode.
# purposely broken
QUOTE='`rm system`'
$x=eval($ENV{QUOTE});
print $x;
Now just imagine if this script was running with root access, and was changed to actually delete the file system.
Kent's answer, while technically correct, misses the point. The solution is not to use eval better, but to not use eval at all!
The crux of this problem seems to be in understanding what eval STRING does (there is eval BLOCK which is completely different despite having the same name). It takes a string and runs it as Perl code. 99.99% of the time this is unnecessary and dangerous and results in spaghetti code, and you absolutely should not be using it so early in your Perl programming career. You have found the gun in your dad's sock drawer. Having discovered that it can blow holes in things, you are now trying to use it to hang a poster. It's better to forget it exists; your code will be so much better for it.
$x = eval($ENV{EDITOR}); does not do what you think it does. I don't even have to know what you think it does, that you even used it there means you don't know. I also know that you're running with warnings off because Perl would have screamed at you for that. Why? Let's assume that EDITOR is set to /bin/vi. The above is equivalent to $x = /bin/vi which isn't even valid Perl code.
$ EDITOR=/bin/vi perl -we '$x=eval($ENV{EDITOR}); print $x'
Bareword found where operator expected at (eval 1) line 1, near "/bin/vi"
(Missing operator before vi?)
Unquoted string "vi" may clash with future reserved word at (eval 1) line 2.
Use of uninitialized value $x in print at -e line 1.
I'm not sure how you got it to work in the first place. I suspect you left something out of your example. Maybe tweaking EDITOR until it worked?
You don't have to do anything magical to read an environment variable. Just $x = $ENV{EDITOR}. Done. $x is now /bin/vi as you wanted. It's just the same as $x = $y. Same thing with QUOTE.
$ QUOTE=\' perl -wle '$x=$ENV{QUOTE}; print $x'
'
Done.
Now, I suspect what you really want to do is run that editor and use that quote in some shell command. Am I right?
Well, you could double-escape the QUOTE's value, I guess, since you know that it's going to be evaled.
Maybe what you want is not Perl's eval but to evaluate the environment variable as the shell would. For this, you want to use backticks.
$x = `$ENV{QUOTE}`;
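To make the contrast drawn in the answers above concrete, here is a minimal sketch comparing a plain read of an environment variable with a string eval of it. The helper names read_env_plain and read_env_eval are hypothetical, and the script sets QUOTE itself purely for demonstration.

```perl
use strict;
use warnings;

# Plain read: the value is just data, whatever characters it contains.
sub read_env_plain {
    my ($name) = @_;
    return $ENV{$name};
}

# String eval: the value is compiled and run as Perl source code.
sub read_env_eval {
    my ($name) = @_;
    my $value = eval $ENV{$name};
    return defined $value ? $value : "eval failed: $@";
}

$ENV{QUOTE} = "'";                      # a lone single quote
print read_env_plain('QUOTE'), "\n";    # prints the quote character
print read_env_eval('QUOTE');           # prints the string-terminator error
```

The plain read returns the quote character unchanged, while the eval version fails because a lone single quote is not a complete Perl expression.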