Perl `defined' and `undef' subroutine scope

Please take a look at the following code:
use strict;
use warnings;
print "subroutine is defined\n" if defined &myf;
myf();
sub myf
{
print "called myf\n";
}
undef &myf;
#myf();
print "now subroutine is defined\n" if defined &myf;
The output is
subroutine is defined
called myf
The first print statement does print. Does that mean the interpreter (or compiler?) looks further ahead and sees the subroutine definition? If so, why doesn't it see the undef &myf; the way the second print statement does?
Thanks

That doesn't have to do with scope, but with compile time and run time. Here's a simplified explanation.
The Perl interpreter will scan your code initially, and follow any use statements or BEGIN blocks. At that point, it sees all the subs, and notes them down in their respective packages. So now you have a &::myf in your symbol table.
When compile time has reached the end of the program, it will switch into run time.
At that point, it actually runs the code. Your first print statement is executed if &myf is defined. We know it is, because it got set at compile time. Perl then calls that function. All is well. Now you undef that entry in the symbol table. That occurs at run time, too.
After that, defined &myf returns false, so it doesn't print.
You even have the second call to myf() there in the code, but commented out. If you remove the comment, it will complain about Undefined subroutine &main::myf called. That's a good hint at what happened.
So in fact it doesn't look forward or backward in the code. It is already finished scanning the code at that time.
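A small sketch of that ordering, for anyone who wants to watch it happen. The sub name greet and the @log array are made up for the illustration. A BEGIN block stands in for "compile time", because it runs as soon as the parser reaches it, before the rest of the file has been seen:

```perl
use strict;
use warnings;

our @log;

# Runs during compilation, before the parser has reached "sub greet" below.
BEGIN { push @log, "BEGIN block: " . (defined(&greet) ? "defined" : "not defined") }

# First run-time statement: the whole file has been compiled by now.
push @log, "first run-time statement: " . (defined(&greet) ? "defined" : "not defined");

sub greet { return "hello" }

# Executed at run time, in program order.
undef &greet;

push @log, "after undef: " . (defined(&greet) ? "defined" : "not defined");

print "$_\n" for @log;
```

The three lines should read "not defined", "defined", "not defined": the definition is installed while compiling, the undef only takes effect when execution reaches it.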
The different stages are explained in perlmod.
Note that there are not a lot of use cases for actually undefing a function. I don't see why you would remove it, unless you wanted to clean up your namespace manually.

Related

Perl Function call

$\ = "\n";
sub foo
{
print("one");
}
foo(); // mark1
sub foo
{
print("two");
}
foo(); //mark2
On executing the above code, the output is: two, two. As far as I understand, Perl is an interpreter, so when foo() is called at mark1, shouldn't one get printed, and when foo() is called at mark2, two? Why is two printed both times? Please explain how.
Because Perl isn't an interpreted language in the way that you understand it. Perl code is compiled before it is run. There's no separate compilation step for you to run, but the compiler parses and compiles all of the source code before starting to execute the program.
If you had included use warnings in your code, then you would have seen the following warning (before the output from the first function call):
Subroutine foo redefined at func line 12.
Which makes it pretty clear what is going on.
Oh, and by the way - // is not a comment in Perl. You wanted #.
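Here is a sketch of the same situation with warnings enabled and the comments fixed. The @warnings array and the __WARN__ handler are additions for the illustration, so the compile-time warning can be printed alongside the results instead of going to STDERR:

```perl
use strict;
use warnings;

our @warnings;

# Install the handler at compile time, so it is in place before the
# second definition of foo is compiled.
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }

sub foo { return "one" }
my $first = foo();     # mark1

sub foo { return "two" }
my $second = foo();    # mark2

print "first call:  $first\n";
print "second call: $second\n";
print "warning: $_" for @warnings;
```

Both calls return "two", because the second definition replaced the first before any statement was executed, and the captured warning is the "Subroutine foo redefined" message.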

identify a procedure and replace it with a different procedure

What I want to achieve:
###############CODE########
old_procedure(arg1, arg2);
#############CODE_END######
I have a huge code base which has an old procedure in it. I want every call to old_procedure to go to a new procedure (new_procedure(arg1, arg2)) with the same arguments.
Now I know the question seems pretty stupid, but the trick is that I am not allowed to change the code or the bad_function. So the only thing I can do is create a procedure externally which reads the code flow or something, and whenever it finds the bad_function, replaces it with the new_function. They have a void type, so I don't have to worry about the return values.
I am using Perl. If someone knows how to at least start in this direction, please comment or answer. It would be nice if the new code could be done in Perl or C, but other known languages are good too: C++, Java.
EDIT: The code is written in shell script and Perl. I cannot edit the code and I don't have the location of the old_function. I mean I can find it, but it's really tough. So I can use the package thing pointed out, but is there a way around it, so that I could parse the thread with that function and replace function calls? Please don't remove tags, as I need suggestions from Java and C++ experts also.
EDIT: @mirod
So I tried it out, and your answer made a new subroutine, and now there is no way of accessing the old one. I had created a variable whose value decides which way to go (old_sub or new_sub). Is there a way to add the variable to the new code, so that it sends control back to old_function if the variable is not set?
like:
use BadPackage; # sub is defined there
BEGIN
{ package BadPackage;
no warnings; # to avoid the "Subroutine bad_sub redefined" message
# check for the variable and send to old_sub if the var is not set
sub bad_sub
{ # good code
}
}
# Thanks @mirod
This is easier to do in Perl than in a lot of other languages, but that doesn't mean it's easy, and I don't know if it's what you want to hear. Here's a proof-of-concept:
Let's take some broken code:
# file name: Some/Package.pm
package Some::Package;
use base 'Exporter';
our @EXPORT = qw(forty_two nineteen);
sub forty_two { 19 }
sub nineteen { 19 }
1;
# file name: main.pl
use Some::Package;
print "forty-two plus nineteen is ", forty_two() + nineteen();
Running the program perl main.pl produces the output:
forty-two plus nineteen is 38
It is given that the files Some/Package.pm and main.pl are broken and immutable. How can we fix their behavior?
One way we can insert arbitrary code to a perl command is with the -M command-line switch. Let's make a repair module:
# file: MyRepairs.pm
CHECK {
no warnings 'redefine';
*forty_two = *Some::Package::forty_two = sub { 42 };
};
1;
Now running the program perl -MMyRepairs main.pl produces:
forty-two plus nineteen is 61
Our repair module uses a CHECK block to execute code in between the compile-time and run-time phase. We want our code to be the last code run at compile-time so it will overwrite some functions that have already been loaded. The -M command-line switch will run our code first, so the CHECK block delays execution of our repairs until all the other compile time code is run. See perlmod for more details.
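The timing can be reproduced in a single file. This is only a sketch: the package name Broken is made up, and the CHECK block sits at the top of the file to imitate a module pulled in first by -M. It is compiled before the broken subs exist, but it does not run until the rest of the file has been compiled:

```perl
use strict;
use warnings;

# Stand-in for the repair module: compiled first, executed after the
# whole file has been compiled, so the broken sub is already installed.
CHECK {
    no warnings 'redefine';
    *Broken::forty_two = sub { 42 };
}

package Broken;

sub forty_two { 19 }    # the bug we are not allowed to edit
sub nineteen  { 19 }

package main;

my $sum = Broken::forty_two() + Broken::nineteen();
print "forty-two plus nineteen is $sum\n";
```

If the CHECK is changed to a BEGIN, the repair runs too early and the later sub forty_two { 19 } overwrites it again, which is exactly why the delay matters.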
This solution is fragile. It can't do much about modules loaded at run-time (with require ... or eval "use ...", both of which are common) or about subroutines defined in other CHECK blocks (these are rare).
If we assume the shell script that runs main.pl is also immutable (i.e., we're not allowed to change perl main.pl to perl -MMyRepairs main.pl), then we move up one level and pass the -MMyRepairs in the PERL5OPT environment variable:
PERL5OPT="-I/path/to/MyRepairs -MMyRepairs" bash the_immutable_script_that_calls_main_pl.sh
These are called automated refactoring tools and are common for other languages. For Perl though you may well be in a really bad way because parsing Perl to find all the references is going to be virtually impossible.
Where is the old procedure defined?
If it is defined in a package, you can switch to the package, after it has been used, and redefine the sub:
use BadPackage; # sub is defined there
BEGIN
{ package BadPackage;
no warnings; # to avoid the "Subroutine bad_sub redefined" message
sub bad_sub
{ # good code
}
}
If the code is in the same package but in a different file (loaded through a require), you can do the same thing without having to switch package.
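Regarding the EDIT in the question (keeping the old sub reachable): one possible approach is to take a reference to the original before replacing it, and let the replacement decide where to go. This is a sketch under assumptions, not a drop-in fix: BadPackage is inlined here so the example runs on its own, and the switch variable $USE_NEW and the return strings are invented for the illustration.

```perl
use strict;
use warnings;

package BadPackage;

sub bad_sub { return "old(@_)" }    # stand-in for the code we cannot edit

package main;

our $USE_NEW = 1;    # hypothetical switch: true means use the new code

BEGIN {
    # Keep a handle on the original before the glob is overwritten.
    my $old = \&BadPackage::bad_sub;

    no warnings 'redefine';
    *BadPackage::bad_sub = sub {
        return $old->(@_) unless $main::USE_NEW;    # fall back to the old code
        return "new(@_)";                           # the good code goes here
    };
}

my $with_new = BadPackage::bad_sub(1, 2);
$USE_NEW = 0;
my $with_old = BadPackage::bad_sub(1, 2);

print "$with_new\n$with_old\n";
```

The closure keeps $old alive, so the original body stays callable even though the name bad_sub now points at the wrapper.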
If all the code is in the same file, then change it:
sed -i 's/old_procedure/new_procedure/g' codefile
Is this what you mean?

Why is parenthesis optional only after sub declaration?

(Assume use strict; use warnings; throughout this question.)
I am exploring the usage of sub.
sub bb { print @_; }
bb 'a';
This works as expected. The parentheses are optional, as with many other functions, like print, open etc.
However, this causes a compilation error:
bb 'a';
sub bb { print @_; }
String found where operator expected at t13.pl line 4, near "bb 'a'"
(Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.
But this does not:
bb('a');
sub bb { print @_; }
Similarly, a sub without args, such as:
special_print;
sub special_print { print $some_stuff }
Will cause this error:
Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.
Ways to alleviate this particular error are:
Put & before the sub name, e.g. &special_print
Put empty parentheses after the sub name, e.g. special_print()
Predeclare special_print with sub special_print at the top of the script.
Call special_print after the sub declaration.
My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?
ETA: I know how I can fix it. I want to know the logic behind this.
I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:
In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:
sub name {...} === BEGIN {*name = sub {...}}
This means that the sub NAME syntax has a compile time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, the set is the builtin functions. Since Perl already knows about these, it lets you omit the parenthesis.
As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
Without a predeclared rule, an identifier will be interpreted as follows:
bareword === 'bareword' # a string
bareword LIST === syntax error, missing ','
bareword() === &bareword() # runtime execution of &bareword
&bareword === &bareword # same
&bareword() === &bareword() # same
When using strict and warnings as you have stated, barewords will not be converted into strings, so the first example is a syntax error.
When predeclared with any of the following:
sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}
Then the identifier will be interpreted as follows:
bareword === &bareword() # compile time binding to &bareword
bareword LIST === &bareword(LIST) # same
bareword() === &bareword() # same
&bareword === &bareword # same
&bareword() === &bareword() # same
So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
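A short sketch of the two cases side by side. The sub names shout and later are invented for the example; only shout gets a forward declaration, so only shout may be called without parentheses ahead of its definition:

```perl
use strict;
use warnings;

sub shout;    # forward declaration: the parser now knows shout is a sub

my $early = shout 'a';     # parsed as shout('a')
my $late  = later('b');    # later is not declared yet, so parentheses are required

sub shout { return uc $_[0] }
sub later { return uc $_[0] }

print "$early $late\n";
```

Removing the forward declaration turns the shout 'a' line back into the "String found where operator expected" compile error from the question.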
As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.
That said, you will be hard pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (and you will get a warning "subroutine called too early to check prototype" if you don't call it this way).
The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.
Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 did some major changes which can cause problems with earlier programs written in a free flowing manner.
Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:
b $a, $c;
Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.
When the Perl compiler sees this:
b($a, $c);
It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
When you pre-declare your function, Perl can see this:
sub b; #Or use subs qw(b); will also work.
b $a, $c;
and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.
By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.
Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.
Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).
The reason is that Larry Wall is a linguist, not a computer scientist.
Computer scientist: The grammar of the language should be as simple & clear as possible.
Avoids complexity in the compiler
Eliminates sources of ambiguity
Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.

In Perl, is there any way to tie a stash?

Similar to the way AUTOLOAD can be used to define subroutines on demand, I am wondering if there is a way to tie a package's stash so that I can intercept access to variables in that package.
I've tried various permutations of the following idea, but none seem to work:
{package Tie::Stash;
use Tie::Hash;
BEGIN {our #ISA = 'Tie::StdHash'}
sub FETCH {
print "calling fetch\n";
}
}
{package Target}
BEGIN {tie %Target::, 'Tie::Stash'}
say $Target::x;
This dies with Bad symbol for scalar ... on the last line, without ever printing "calling fetch". If the say $Target::x; line is removed, the program runs and exits properly.
My guess is that the failure has to do with stashes being like, but not the same as hashes, so the standard tie mechanism is not working right (or it might just be that stash lookup never invokes tie magic).
Does anyone know if this is possible? Pure Perl would be best, but XS solutions are ok.
You're hitting a compile-time internal error ("Bad symbol for scalar"). It happens while Perl is trying to work out what '$Target::x' should be, which you can verify by running a debugging Perl with:
perl -DT foo.pl
...
### 14:LEX_NORMAL/XOPERATOR ";\n"
### Pending identifier '$Target::x'
Bad symbol for scalar at foo.pl line 14.
I think the GV for '::Target' is replaced by something else when you tie() it, so that whatever eventually tries to get to its internal hash cannot. Given that tie() is a little bit of a mess, I suspect what you're trying to do won't work, which is also suggested by this (old) set of exchanges on p5p:
https://groups.google.com/group/perl.perl5.porters/browse_thread/thread/f93da6bde02a91c0/ba43854e3c59a744?hl=en&ie=UTF-8&q=perl+tie+stash#ba43854e3c59a744
A little late to the question, but although it's not possible to use tie to do this, Variable::Magic allows you to attach magic to a stash and thereby achieve something similar.
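For what it's worth, reading a stash (as opposed to intercepting access to it) does work with ordinary hash operations, which is the "like a hash" half of the guess in the question. A minimal sketch, with the contents of package Target invented for the example:

```perl
use strict;
use warnings;

package Target;

our $x = 1;
sub f { return 'f' }

package main;

# The stash is visible as a hash whose name ends in '::'.
my @names = sort keys %Target::;
print "Target contains: @names\n";

# Entries can be tested like any other hash keys.
my $has_x = exists $Target::{x} ? 1 : 0;
my $has_f = exists $Target::{f} ? 1 : 0;
print "x present: $has_x, f present: $has_f\n";
```

This only inspects the symbol table. It does nothing to hook lookups, which is the part that tie cannot do.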

When should I use the & to call a Perl subroutine?

I have heard that people shouldn't be using & to call Perl subs, i.e:
function($a,$b,...);
# opposed to
&function($a,$b,...);
I know for one the argument list becomes optional, but what are some cases where it is appropriate to use the & and the cases where you should absolutely not be using it?
Also, how does the performance increase come into play here when omitting the &?
I'm a frequent abuser of &, but mostly because I'm doing weird interface stuff. If you don't need one of these situations, don't use the &. Most of these are just to access a subroutine definition, not call a subroutine. It's all in perlsub.
Taking a reference to a named subroutine. This is probably the only common situation for most Perlers:
my $sub = \&foo;
Similarly, assigning to a typeglob, which allows you to call the subroutine with a different name:
*bar = \&foo;
Checking that a subroutine is defined, as you might in test suites:
if( defined &foo ) { ... }
Removing a subroutine definition, which shouldn't be common:
undef &foo;
Providing a dispatcher subroutine whose only job is to choose the right subroutine to call. This is the only situation I use & to call a subroutine, and when I expect to call the dispatcher many, many times and need to squeeze a little performance out of the operation:
sub figure_it_out_for_me {
# all of these re-use the current @_
if( ...some condition... ) { &foo }
elsif( ...some other... ) { &bar }
else { &default }
}
To jump into another subroutine using the current argument stack (and replacing the current subroutine in the call stack), a fairly common operation in dispatching, especially in AUTOLOAD:
goto &sub;
Call a subroutine that you've named after a Perl built-in. The & always gives you the user-defined one. That's why we teach it in Learning Perl. You don't really want to do that normally, but it's one of the features of &.
There are some places where you could use them, but there are better ways:
To call a subroutine with the same name as a Perl built-in. Just don't have subroutines with the same name as a Perl built-in. Check perlfunc to see the list of built-in names you shouldn't use.
To disable prototypes. If you don't know what that means or why you'd want it, don't use the &. Some black magic code might need it, but in those cases you probably know what you are doing.
To dereference and execute a subroutine reference. Just use the -> notation.
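The difference between &foo and the other call forms is easy to miss, so here is a small sketch of it. The sub names count_args and dispatcher are made up for the example:

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub dispatcher {
    my $shared = &count_args;      # no parentheses: the callee sees dispatcher's @_
    my $empty  = &count_args();    # parentheses: an explicit, empty argument list
    my $plain  = count_args();     # same as the line above
    return ($shared, $empty, $plain);
}

my @counts = dispatcher('a', 'b', 'c');
print "@counts\n";
```

Only the bare &count_args form sees the three arguments. The other two calls start with a fresh, empty @_.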
IMO, the only time there's any reason to use & is if you're obtaining or calling a coderef, like:
sub foo() {
print "hi\n";
}
my $x = \&foo;
&$x();
The main time that you can use it that you absolutely shouldn't in most circumstances is when calling a sub that has a prototype that specifies any non-default call behavior. What I mean by this is that some prototypes allow reinterpretation of the argument list, for example converting @array and %hash specifications to references. So the sub will be expecting those reinterpretations to have occurred, and unless you go to whatever lengths are necessary to mimic them by hand, the sub will get inputs wildly different from those it expects.
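A sketch of what that looks like in practice, using a made-up sub count_items with a (\@) prototype. Called normally, Perl passes a reference to the array. Called with &, the prototype is skipped and the array is flattened unless you take the reference yourself:

```perl
use strict;
use warnings;

# The (\@) prototype makes Perl pass a reference to the named array.
sub count_items(\@) {
    my ($aref) = @_;
    return scalar @$aref;
}

my @items = (10, 20, 30);

my $with_proto = count_items(@items);      # callee receives \@items
my $by_hand    = &count_items(\@items);    # prototype skipped: pass the reference yourself

# Prototype skipped and no reference taken: the callee receives 10, 20, 30,
# and dereferencing 10 as an array dies under strict refs.
my $flattened = eval { &count_items(@items) };
my $error     = $@;

print "with prototype: $with_proto\n";
print "by hand:        $by_hand\n";
print "flattened:      ", (defined $flattened ? $flattened : "failed: $error");
```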
I think mainly people are trying to tell you that you're still writing in Perl 4 style, and we have a much cleaner, nicer thing called Perl 5 now.
Regarding performance, there are various ways that Perl optimizes sub calls which & defeats, with one of the main ones being inlining of constants.
There is also one circumstance where using & provides a performance benefit: if you're forwarding a sub call with foo(@_). Using &foo is infinitesimally faster than foo(@_). I wouldn't recommend it unless you've definitively found by profiling that you need that micro-optimization.
The &subroutine() form disables prototype checking. This may or may not be what you want.
http://www.perl.com/doc/manual/html/pod/perlsub.html#Prototypes
Prototypes allow you to specify the numbers and types of your subroutine arguments, and have them checked at compile time. This can provide useful diagnostic assistance.
Prototypes don't apply to method calls, or calls made in the old-fashioned style using the & prefix.
The & is necessary to reference or dereference a subroutine or code reference
e.g.
sub foo {
# a subroutine
}
my $subref = \&foo; # take a reference to the subroutine
&$subref(@args); # make a subroutine call using the reference.
my $anon_func = sub { ... }; # anonymous code reference
&$anon_func(); # called like this
Prototypes aren't applicable to subroutine references either.
The &subroutine form is also used in the so-called magic goto form.
The expression goto &subroutine replaces the current calling context with a call to the named subroutine, using the current value of @_.
In essence, you can completely switch a call to one subroutine with a call to the named one. This is commonly seen in AUTOLOAD blocks, where a deferred subroutine call can be made, perhaps with some modification to @_, but it looks to the program entirely as if it was a call to the named sub.
e.g.
sub AUTOLOAD {
...
push @_, @extra_args; # add more arguments onto the parameter list
goto &subroutine; # switch to the other subroutine, as if we were never here
}
Potentially this could be useful for tail call elimination, I suppose.
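To make the "as if we were never here" part concrete, here is a sketch that compares an ordinary call with goto &sub. All the sub names are invented for the example. stack_depth counts frames with caller, and show_args just reports what it received:

```perl
use strict;
use warnings;

# Counts the sub frames on the call stack, including its own.
sub stack_depth {
    my $depth = 0;
    $depth++ while caller($depth);
    return $depth;
}

sub via_call { return stack_depth(@_) }    # ordinary call: adds a frame
sub via_goto { goto &stack_depth }         # replaces via_goto's own frame

my $extra_frames = via_call() - via_goto();
print "an ordinary call adds $extra_frames extra frame(s)\n";

# The current @_ travels along with the goto, modifications included.
sub show_args    { return join ',', @_ }
sub add_and_jump { push @_, 'extra'; goto &show_args }

my $seen = add_and_jump('a', 'b');
print "target saw: $seen\n";
```

The difference in depth is one frame: after the goto, the target cannot tell that via_goto was ever on the stack.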
see detailed explanation of this technique here
I've read the arguments against using '&', but I nearly always use it. It saves me too much time not to. I spend a very large fraction of my Perl coding time looking for what parts of the code call a particular function. With a leading &, I can search and find them instantly. Without a leading &, I get the function definition, comments, and debug statements, usually tripling the amount of code I have to inspect to find what I'm looking for.
The main thing not using '&' buys you is it lets you use function prototypes. But Perl function prototypes may create errors as often as they prevent them, because they will take your argument list and reinterpret it in ways you might not expect, so that your function call no longer passes the arguments that it literally says it does.