Nested subroutines and Scoping in Perl - perl

I'm writing Perl for quite some time now and always discovering new things, and I just ran into something interesting that I don't have the explanation to it, nor found it over the web.
sub a {
sub b {
print "In B\n";
}
}
b();
how come I can call b() from outside its scope and it works?
I know its a bad practice to do it, and I dont do it, I use closured and such for these cases, but just saw that.

Subroutines are stored in a global namespace at compile time. In your example b(); is short hand for main::b();. To limit visibility of a function to a scope you need to assign an anonymous subroutines to a variable.
Both named and anonymous subroutines can form closures, but since named subroutines are only compiled once if you nest them they don't behave as many people expect.
use warnings;
sub one {
my $var = shift;
sub two {
print "var: $var\n";
}
}
one("test");
two();
one("fail");
two();
__END__
output:
Variable "$var" will not stay shared at -e line 5.
var: test
var: test
Nesting named subroutines is allowed in Perl but it's almost certainly a sign that the code is doing someting incorrectly.

The "official" way to create nested subroutines in perl is to use the local keyword. For example:
sub a {
local *b = sub {
return 123;
};
return b(); # Works as expected
}
b(); # Error: "Undefined subroutine &main::b called at ..."
The perldoc page perlref has this example:
sub outer {
my $x = $_[0] + 35;
local *inner = sub { return $x * 19 };
return $x + inner();
}
"This has the interesting effect of creating a function local to another function, something not normally supported in Perl."

The following prints 123.
sub a {
$b = 123;
}
a();
print $b, "\n";
So why are you surprised that the following does too?
sub a {
sub b { return 123; }
}
a();
print b(), "\n";
Nowhere is any request for $b or &b to be lexical. In fact, you can't ask for &b to be lexical (yet).
sub b { ... }
is basically
BEGIN { *b = sub { ... }; }
where *b is the symbol table entry for $b, #b, ..., and of course &b. That means subs belong to packages, and thus can be called from anywhere within the package, or anywhere at all if their fully qualified name is used (MyPackage::b()).

Subroutines are defined during compile time, and are not affected by scope. In other words, they cannot truly be nested. At least not as far as their own scope is concerned. After being defined, they are effectively removed from the source code.

Related

Access object created in another function

My program creates an object, which, in turn, creates another object
MainScript.pm
use module::FirstModule qw ($hFirstModule);
$hFirstModule->new(parametres);
$hFirstModule->function();
FirstModule.pm
use Exporter ();
#EXPORT = qw($hFirstModule);
use module::SecondModule qw ($hSecondModule);
sub new {
my $className = shift;
my $self = { key => 'val' };
bless $self, $classname;
return $self;
}
sub function{
$hSecondModule->new(parametres);
#some other code here
}
I want to acces $hSecondModule from MainScript.pm.
It depends.
We would have to see the actual code. What you've shown is a bit ambiguous. However, there are two scenarios.
You can't
If your code is not exactly like what you have shown as pseudo-code, then there is no chance to do that. Consider this code in &module1::function.
sub function {
my $obj = Module2->new;
# ... more stuff
return;
}
In this case, you are not returning anything, and the $obj is lexically scoped. A lexical scope means that it only exists inside of the closest {} block (and all blocks inside that). That's the block of the function sub. Once the program returns out of that sub, the variable goes out of scope and the object is destroyed. There is no way to get to it afterwards. It's gone.
Even if it was not destroyed, you cannot reach into a different scope.
You can
If you however return the object from the function, then you'd have to assign it in your script, and then you can access it later. If the code is exactly what you've shown above, this works.
sub function {
my $obj = Module2->new;
# nothing here
}
In Perl, subs always return the last true statement. If you don't have a return and the last statement is the Module2->new call, then the result of that statement, which is the object, is returned. Of course it also works if you actually return explicitly.
sub function {
return Module2->new;
}
So if you assign that to a variable in your script, you can access it in the script.
my $obj = module1->function();
This is similar to the factory pattern.
This is vague, but without more information it's impossible to answer the question more precicely.
Here is a very hacky approach that takes your updated code into consideration. It uses Sub::Override to grab the return value of the constructor call to your SecondModule thingy. This is something that you'd usually maybe do in a unit test, but not in production code. However, it should work. Here's an example.
Foo.pm
package Foo;
use Bar;
sub new {
return bless {}, $_[0];
}
sub frobnicate {
Bar->new;
return;
}
Bar.pm
package Bar;
sub new {
return bless {}, $_[0];
}
sub drink {
return 42; # because.
}
script.pl
package main;
use Foo; # this will load Bar at compile time
use Sub::Override;
my $foo = Foo->new;
my $bar; # this is lexical to the main script, so we can use it inside
my $orig = \&Bar::new; # grab the original function
my $sub = Sub::Override->new(
"Bar::new" => sub {
my $self = shift;
# call the constructor of $hSecondModule, grab the RV and assign
# it to our var from the main script
$bar = $self->$orig(#_);
return $bar;
}
);
$foo->frobnicate;
# restore the original sub
$sub->restore;
# $bar is now assigend
print $bar->drink;
Again, I would not do this in production code.
Let's take a look at the main function. It first creates a new Foo object. Then it grabs a reference to the Bar::new function. We need that as the original, so we can call it to create the object. Then we use Sub::Override to temporarily replace the Bar::new with our sub that calls the original, but takes the return value (which is the object) and assigns it to our variable that's lexical to the main script. Then we return it.
This function will now be called when $foo->frobnicate calls Bar->new. After that call, $bar is populated in our main script. Then we restore Bar::new so we don't accidentally overwrite our $bar in case that gets called again from somewhere else.
Afterwards, we can use $bar.
Note that this is advanced. I'll say again that I would not use this kind of hack in production code. There is probably a better way to do what you want. There might be an x/y problem here and you need to better explain why you need to do this so we can find a less crazy solution.

How do I access variables in a higher stack frame in perl?

How can I do something like this in Perl? E.g. access $a in a function that it isn't defined in? I don't want to use globals, and also don't want to use a CPAN module or pass $a as a parameter to bar.
sub foo {
my $a;
bar();
}
sub bar {
print STDOUT "a is " . magic_function_that_looks_into_callers_frame('a');
}
It sounds like what you're looking for is dynamic extent (i.e., the value hangs around until Perl execution is done with the subroutine which started it). Perl implements this with local (rather than my). Check out the answer at: https://stackoverflow.com/a/8473837/2140998, but here's a small example:
our $foo;
sub top {
local $foo = "top";
bar();
}
sub bar {
say "Called from $foo";
}
top();
So the (value of the) variable can be accessed from the calling stack frame, although the variable needs to actually exist globally or the code won't properly compile (Perl does like its lexical scoping).
For more advanced work, there's also: https://metacpan.org/pod/PadWalker, but that's really playing with Perl's internals, so not for normal use.
Since $a is in bar()'s environment when its called, a way out is to call bar() with $a as an argument. For example:
sub bar {
print #_;
}
sub foo {
my $a = "what";
bar($a);
}
foo;

Perl: "Variable will not stay shared"

I looked up a few answers dealing with this warning, but neither did they help me, nor do I truly understand what Perl is doing here at all. Here's what I WANT it to do:
sub outerSub {
my $dom = someBigDOM;
...
my $otherVar = innerSub();
return $otherVar;
sub innerSub {
my $resultVar = doStuffWith($dom);
return $resultVar;
}
}
So basically, I have a big DOM object stored in $dom that I don't want to pass along on the stack if possible. In outerSub, stuff is happening that needs the results from innerSub. innerSub needs access to $dom. When I do this, I get this warning "Variable $dom will not stay shared".
What I don't understand:
Does this warning concern me here? Will my intended logic work here or will there be strange things happening?
If it doesn't work as intended: is it possible to do that? To make a local var visible to a nested sub? Or is it better to just pass it as a parameter? Or is it better to declare an "our" variable?
If I push it as a parameter, will the whole object with all its data (may have several MB) be pushed on the stack? Or can I just pass something like a reference? Or is Perl handling that parameter as a reference all by itself?
In "Variable $foo will not stay shared" Warning/Error in Perl While Calling Subroutine, someone talks about an anonymous sub that will make this possible. I did not understand how that works, never used anything like that.
I do not understand that explanation at all (maybe cause English is not my first language): "When the inner subroutine is called, it will see the value of the outer subroutine's variable as it was before and during the first call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable.":
What does "the first call to the outer subroutine is complete? mean"
I mean: first I call the outer sub. The outer sub calls the inner sub. The outer sub is of course still running. Once the outer sub is complete, the inner sub will be finished as well. Then how does any of this still apply when the inner sub is already finished? And what about the "first" call? When is the "second" call happening... sorry, this explanation confuses me to no end.
Sorry for the many questions. Maybe someone can at least answer some of them.
In brief, the second and later times outerSub is called will have a different $dom variable than the one used by innerSub. You can fix this by doing this:
{
my $dom;
sub outerSub {
$dom = ...
... innerSub() ...
}
sub innerSub {
...
}
}
or by doing this:
sub outerSub {
my $dom = ...
*innerSub = sub {
...
};
... innerSub() ...
}
or this:
sub outerSub {
my $dom = ...
my $innerSub = sub {
...
};
... $innerSub->() ...
}
All the variables are originally preallocated, and innerSub and outerSub share the same $dom. When you leave a scope, perl goes through the lexical variables that were declared in the scope and reinitializes them. So at the point that the first call to outerSub is completed, it gets a new $dom. Because named subs are global things, though, innerSub isn't affected by this, and keeps referring to the old $dom. So if outerSub is called a second time, its $dom and innerSub's $dom are in fact separate variables.
So either moving the declaration out of outerSub or using an anonymous sub (which gets freshly bound to the lexical environment at runtime) fixed the problem.
You need to have an anonymous subroutine to capture variables:
my $innerSub = sub {
my $resultVar = doStuffWith($dom);
return $resultVar;
};
Example:
sub test {
my $s = shift;
my $f = sub {
return $s x 2;
};
print $f->(), "\n";
$s = "543";
print $f->(), "\n";
}
test("a1b");
Gives:
a1ba1b
543543
If you want to minimize the amount of size passing parameters to subs, use Perl references. The drawback / feature is that the sub could change the referenced param contents.
my $dom = someBigDOM;
my $resultVar = doStuffWith(\$dom);
sub doStuffWith {
my $dom_reference = shift;
my $dom_contents = $$dom_reference;
#...
}
Following http://www.foo.be/docs/perl/cookbook/ch10_17.htm , you should define a local GLOB as follows :
local *innerSub = sub {
...
}
#You can call this sub without ->
innerSub( ... )
Note that even if warning is displayed, the result stay the same as it should be expected : variables that are not defined in the inner sub are modified in the outer sub scope. I cannot see what this warning is about.

Is there any difference between "standard" and block package declaration?

Usually, a package starts simply as
package Cat;
... #content
!0;
I just discovered that starting from the perl 5.14 there is the "block" syntax too.
package Cat {
... #content
}
It is probably the same. But just to be sure, is there any difference?
And about the 1; at the end of the package file. The return value of any block, is taken as the value of the last evaluated expression. So can I put the 1; before the closing }? To make require happy, is there any difference between:
package Cat {
... #content
1;
}
and
package Cat {
... #content
}
1;
Of course there is a difference. The second variant has a block.
A package declaration sets the current namespace for subs and globals. This is scoped normally, i.e. the scope ends with the end of file or eval string, or with an enclosing block.
The package NAME BLOCK syntax is just syntactic sugar for
{ package NAME;
...;
}
and even compiles down to the same opcodes.
While the package declaration is syntactically a statement, this isn't semantically true; it just sets compile-time properties. Therefore, the last statement of the last block is the last statement of the file, and there is no difference between
package Foo;
1;
and
package Foo {
1;
}
wrt. the last statement.
The package BLOCK syntax is interesting mainly because it looks like class Foo {} in other languages, I think. Because the block limits scope, this also makes using properly scoped variables easier. Think:
package Foo;
our $x = 1;
package main;
$::x = 42;
say $x;
Output: 1, because our is lexically scoped like my and just declares an alias! This can be prevented by the block syntax:
package Foo {
our $x = 1;
}
package main {
$::x = 42;
say $x;
}
works as expected (42), although strict isn't happy.
package Foo { ... }
is equivalent to
{ package Foo; ... }
which is different from the following in that it create a block
package Foo; ...
This only matters if you have code that follows in the file.
package Foo { ... }
isn't equivalent to
BEGIN { package Foo; ... }
I would have liked this, but that proposal was not accepted.
require (and do) requires that the last expression evaluated in the file returns a true value. { ... } evaluates to the last value evaluated within, so the following is fine:
package Foo { ...; 1 }
Placing the 1; on the outside makes more sense to me — it pertains to the file rather than the package — but that's merely a style choice.
There is a difference :) If you can try running below two different code snippets (just import the modules in a perl file )
# perlscript.pl
use wblock;
wblock::wbmethod();
First Code snippet without block, wblock.pm
package wblock1;
my $a =10;
sub wbmethod1{
print "in wb $a";
}
package wblock;
sub wbmethod{
print "in wb1 $a";
}
1;
Second with wblock.pm
package wblock1 {
my $a =10;
sub wbmethod1{
print "in wb $a";
}
1;
}
package wblock {
sub wbmethod{
print "in wb1 $a";
}
1;
}
Now the difference as you might have seen is, variable $a is not available for wblock package when we use BLOCK. But without BLOCK we can use $a from other package, as it's scope is for file.
More to say from perldoc itself:
That is, the forms without a BLOCK are operative through the end of
the current scope, just like the my, state, and our operators.
There is a difference but only if you are doing hard-to-maintain programming. Specially, if you are using a my variable across packages. The convention for Perl is to have only one package per file and the entire package in one file.
There's one other difference that other answers haven't given, that may help explain why there are two.
package Foo;
sub bar { ... }
was always the way to do it in Perl 5. The package BLOCK syntax of
package Foo {
sub bar { ... }
}
was only added at perl 5.14.
The main difference then is that the latter form is neater but only works since 5.14; the former form will work back to older versions. The neater form was added largely for visual neatness; it doesn't have any semantic difference worth worrying about.

Perl Anonymous Subroutine/Function error

I have the following piece of code: (extremely simplified for the purposes of this question, but perfectly illustrates the problem I am having)
#!/usr/bin/perl
use strict;
use warnings;
&outer;
my $connected_sub;
sub outer {
print "HELLO\n";
&$connected_sub;
$connected_sub = sub {
print "GOODBYE\n";
}
}
When run the program gives this output and error:
HELLO
Use of uninitialized value in subroutine entry at subTesting line 13.
Can't use string ("") as a subroutine ref while "strict refs" in use at subTesting.pl line 13.
Am I totally overlooking something here? I cannot understand or work out what the problem with this is.
To clarify:
Subroutine definitions happen in the compilation stage. Thus code like this will work:
foo();
sub foo { print "No need to declare me before calling!"; }
But an assignment doesn't actually happen until that line of code is called. That is why this won't work:
my $foo;
&$foo();
$foo = sub { print "Foo hasn't been set to me when you try to call me." }
I assume that what you are trying to do here is assign an anonymous sub to the variable $connected_sub. This is not a good way to do it.
What you are doing is taking an empty variable, trying to use it as a code reference, assigning a code reference to it, then exiting the sub and then declaring the variable with my. Not the best order of doing things.
What you probably want to do is return a value which can be assigned to the variable, like so:
my $connected = outer();
$connected->();
sub outer {
print "HELLO\n";
my $sub = sub { print "GOODBYE\n"; }
return $sub;
}
Using a lexical variable inside a subroutine is somewhat confusing, I think. Besides the general drawbacks of using global variables, the subroutine is also compiled before the code is executed and the variable declared.
Also, when calling a subroutine, the standard way of doing so is
name(#args);
Where #args is your argument list. Using & is old style perl, and using it has a special meaning (override prototypes). When using an anonymous sub in a variable, use the ->() notation.
The $connected_sub is not initializated. Try to assign to an anonymous sub:
my $connected_sub = sub {
print "The code you need to run\n";
}
At the definition, and drop the code after the &$connected_sub call
This is the complete example modified:
#!/usr/bin/perl
use strict;
use warnings;
my $connected_sub = sub {
print "GOODBYE\n";
};
&outer;
sub outer
{
print "HELLO\n";
&$connected_sub;
}
Looks like you're using $connected_stub before it is initialized. Try to move the initialization up, like:
$connected_sub = sub {
print "GOODBYE\n";
}
&$connected_sub;