How to avoid global variable declaration when using Perl's dynamic scoping? - perl

I am trying to write a perl script that calls a function written somewhere else (by someone else) which manipulates some of the variables in my script's scope. Let's say the script is main.pl and the function is there in funcs.pm. My main.pl looks like this:
use warnings;
use strict;
package plshelp;
use funcs;
my $var = 3;
print "$var\n"; # <--- prints 3
{ # New scope somehow prevents visibility of $pointer outside
local our $pointer = \$var;
change();
}
print "$var\n"; # <--- Ideally should print whatever funcs.pm wanted
For some reason, using local our $pointer; prevents visibility of $pointer outside the scope. But if I just use our $pointer;, the variable can be seen outside the scope in main.pl using $plshelp::pointer (but not in funcs.pm, so it would be useless anyway). As a side-note, could someone please explain this?
funcs.pm looks something like this:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
I expected this to change the value of $var and print 4 when the main script was run. But I get a compile error saying $pointer wasn't declared. This error can be removed by adding our $pointer; at the top of change in funcs.pm, but that would create an unnecessary global variable that is visible everywhere. We can also remove this error by removing the use strict;, but that seems like a bad idea. We can also get it to work by using $plshelp::pointer in funcs.pm, but the person writing funcs.pm doesn't want to do that.
Is there a good way to achieve this functionality of letting funcs.pm manipulate variables in my scope without declaring global variables? If we were going for global variables anyway, I guess I don't need to use dynamic scoping at all.
Let's just say it's not possible to pass arguments to the function for some reason.
Update
It seems that local our isn't doing any "special" as far as preventing visibility is concerned. From perldoc:
This means that when use strict 'vars' is in effect, our lets you use a package variable without qualifying it with the package name, but only within the lexical scope of the our declaration. This applies immediately--even within the same statement.
and
This works even if the package variable has not been used before, as package variables spring into existence when first used.
So this means that $pointer "exists" even after we leave the curly braces. Just that we have to refer to it using $plshelp::pointer instead of just $pointer. But since we used local before initializing $pointer, it is still undefined outside the scope (although it is still "declared", whatever that means). A clearer way to write this would be (local (our $pointer)) = \$var;. Here, our $pointer "declares" $pointer and returns $pointer as well. We now apply local on this returned value, and this operation returns $pointer again which we are assigning to \$var.
But this still leaves the main question of whether there is a good way of achieving the required functionality unanswered.

Let's be clear about how global variables with our work and why they have to be declared: There's a difference between the storage of a global variable, and visibility of its unqualified name. Under use strict, undefined variable names will not implicitly refer to a global variable.
We can always access the global variable with its fully qualified name, e.g. $Foo::bar.
If a global variable in the current package already exists at compile time and is marked as an imported variable, we can access it with an unqualified name, e.g. $bar. If a Foo package is written appropriately, we could say use Foo qw($bar); say $bar where $bar is now a global variable in our package.
With our $foo, we create a global variable in the current package if that variable doesn't already exist. The name of the variable is also made available in the current lexical scope, just like the variable of a my declaration.
The local operator does not create a variable. Instead, it saves the current value of a global variable and clears that variable. At the end of the current scope, the old value is restored. You can interpret each global variable name as a stack of values. With local you can add (and remove) values on the stack.
So while local can dynamically scope a value, it does not create a dynamically scoped variable name.
By carefully considering which code is compiled when, it becomes clear why your example doesn't currently work:
In your main script, you load the module funcs. The use statement is executed in the BEGIN phase, i.e. during parsing.
use warnings;
use strict;
package plshelp;
use funcs;
The funcs module is compiled:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
At this point, no $pointer variable is in lexical scope and no imported global $pointer variable exists. Therefore you get an error. This compile-time observation is unrelated to the existence of a $pointer variable at runtime.
The canonical way to fix this error is to declare an our $pointer variable name in the scope of the sub change:
sub change {
our $pointer;
${$pointer} = 4;
}
Note that the global variable will exist anyway, this just brings the name into scope for use as an unqualified variable name.
Just because you can use global variables doesn't mean that you should. There are two issues with them:
On a design level, global variables do not declare a clear interface. By using a fully qualified name you can simply access a variable without any checks. They do not provide any encapsulation. This makes for fragile software and weird action-at-a-distance.
On an implementation level, global variables are simply less efficient than lexical variables. I have never actually seen this matter, but think of the cycles!
Also, global variables are global variables: They can only have one value at a time! Scoping the value with local can help to avoid this in some cases, but there can still be conflicts in complex systems where two modules want to set the same global variable to different values and those modules call into each other.
The only good uses for global variables I have seen are to provide additional context to a callback that cannot take extra parameters, roughly similar to your approach. But where possible it is always better to pass the context as a parameter. Subroutine arguments are already effectively dynamically scoped:
sub change {
my ($pointer) = #_;
${$pointer} = 4;
}
...
my $var = 3;
change(\$var);
If there is a lot of context it can be come cumbersome to pass all those references: change(\$foo, \$bar, \$baz, \#something_else, \%even_more, ...). It could then make sense to bundle that context into an object, which can then be manipulated in a more controlled manner. Manipulating local or global variables is not always the best design.

There's too much wrong with your code to just fix it
You've used package plshelp in both the main script and the module, even though the main entry point is in main.pl and your module is in funcs.pm. That's just irresponsible. Did you imagine that the package statement was solely for advertising for help and it didn't matter what you put in there?
Your post doesn't say what is wrong with what you have written, but it's surprising that it doesn't throw an error.
Here's something close that does what you seem to expect. I can't really explain things as your own code is so far from working
Functions.pm
package Functions;
use strict;
use warnings;
use Exporter 'import';
our #EXPORT_OK = 'change';
sub change {
my ($ref) = #_;
$$ref = 4;
}
main.pl
use strict;
use warnings 'all';
use Functions 'change';
my $var = 44;
print "$var\n";
change(\$var);
print "$var\n";
output
44
4

Related

Use a variable from one file into another

I have a variable $x which currently has a local scope in A.pm and I want to use the output of $x (which is usually PASSED/FAILED) in an if else statement in B.pm
Something like below
A.pm:
if (condition1) { $x = 'PASSED'; }
if (condition2) { $x = 'FAILED'; }
B.pm:
if ($x=='PASSED') { $y=1; } else { $y=0; }
I tried using require ("A.pm"); in B.pm but it gives me an error global symbol requires an explicit package name which means it is not able to read the variable from require. Any inputs would help
I have a variable $x which currently has a local scope in A.pm and I want to use the output of $x (which is usually PASSED/FAILED) in an if else statement in B.pm
We could show you how to do this, but this is a really bad, awful idea.
There's a reason why variables are scoped, and even global variables declared with our and not my are still scoped to a particular package.
Imagine someone modifying one of your packages, and not realizing there's a direct connection to a variable name $x. They could end up making a big mess without even knowing why.
What I would HIGHLY recommend is that you use functions (subroutines) to pass around the value you need:
Local/A.pm
package Local::A;
use strict;
use warnings;
use lib qw($ENV{HOME});
use Exporter qw(import);
our #EXPORT_OK = qw(set_condition);
sub set_condition {
if ( condition1 ) {
return "PASSED";
elsif ( condition2 ) {
return "FALSED";
else {
return "Huh?";
}
1;
Here's what I did:
I can't use B as a module name because that's an actual module. Therefore, I used Local::B and Local::A instead. The Local module namespace is undefined in CPAN and never used. You can always declare your own modules under this module namespace.
The use lib allows me to specify where to find my modules.
The package command gives this module a completely separate namespace. This way, variables in A.pm don't affect B.pm.
use Exporter allows me to export subroutines from one module to another. #EXPORT_OK are the names of the subroutines I want to export.
Finally, there's a subroutine that runs my test for me. Instead of setting a variable in A.pm, I return the value from this subroutine.
Check your logic. Your logic is set that $x isn't set if neither condition is true. You probably don't want that.
Your module can't return a zero as the last value. Thus, it's common to always put 1; as the last line of a module.
Local/B.pm
package Local::B;
use lib qw($ENV{HOME});
use Local::A qw(set_condition);
my $condition = set_contition();
my $y;
if ( $condition eq 'PASSED' ) { # Note: Use `eq` and not `==` because THIS IS A STRING!
$y = 1;
else {
$y = 0;
}
1;
Again, I define a separate module namespace with package.
I use Local::A qw(set_condition); to export my set_condition subroutine into B.pm. Now, I can call this subroutine without prefixing it with Local::A all of the time.
I set a locally scoped variable called $condition to the status of my condition.
Now, I can set $y from the results of the subroutine set_condition. No messy need to export variables from one package to another.
If all of this looks like mysterious magic, you need to read about Perl modules. This isn't light summer reading. It can be a bit impenetrable, but it's definitely worth the struggle. Or, get Learning Perl and read up on Chapter 11.
After you require A;, you can then access the variable by giving it an explicit package name like the error message says.
in B.pm:
my $y = $A::x eq 'PASSED ? 1 : 0
The variable $x will have to be declared with our instead of my.
Finally, use eq instead of == for doing string comparisons.

Naming convention for passing a hash reference to a subroutine

When I pass a hash reference to a subroutine, what is a good naming convention for the variable to which the hash ref parameter is assigned?
In the example below, what is the better option:
Use the same variable name as the original variable (and let Perl hide the original within the block).
Come up with a new name for the hash ref within the subroutine?
I am inclined to use option 1, because a new name for the same hash reference feels redundant.
If option 2 is better, what would be a good naming convention for the variable within the subroutine ?
## example:
use strict;
use warnings;
my $href_phone_book = generate_phone_book();
my $phone_number = get_phone_number($href_phone_book, "john");
print $phone_number."\n";
sub generate_phone_book {
my %phone_book = (
john => "5554321",
alice => "5551234"
);
return \%phone_book;
}
sub get_phone_number{
# Is it OK to hide the original variable
my $href_phone_book = shift;
# Or should I use a new variable name? What would then be a good naming convention?
my $person = shift;
my $phone_number = $href_phone_book->{$person};
return $phone_number;
}
You should name your variables for the data that they reference. Ideally you should use a short non-ambiguous name, and remove context where possible. For example if your function did something that would apply generically to any hash reference, the name $hash_ref would be preferable IMO to $phone_book
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
Although not a generically safe assumption, it is what I would do in your example. I would use $phone_book as the single param.
Note if you are using the same reference data structure repeatedly as a parameter to functions, then it points to writing some OO Perl. You should probably create a class, with the data structure the object, and functions changed to instance methods.
Caveat: Your example script will generate a warning due to variable scope. So my advice needs to be modified. I would not alter the subroutine's variable but instead name the global variable something like $main_phone_book . . . but this is only necessary due to the structure of your example script. Ideally you should avoid global data structures, and use modularisation (a PhoneBook.pm file that contained all the logic for handling phone numbers, whether exported functions or a new class, would not have a clash on the variable name because of package namespaces). Or if the variables are truly global to the problem at hand you probably would not be passing them as parameters at all.
2. Come up with a new name for the hash ref within the subroutine?
In this case it is only necessary due to file structure of the example script. Consistency, at least at the level of individual scripts and modules, is more important than finding some special naming convention that can handle all your needs.
I suggest you read the variable naming convention chapters of Perl Best Practices. It is very clear and useful advice.

Perl scoping and the life of local variables

How long does the memory location allocated by a local variable in Perl live for (both for arrays, hashes and scalars)? For instance:
sub routine
{
my $foo = "bar";
return \$foo;
}
Can you still access the string "bar" in memory after the function has returned? How long will it live for, and is it similar to a static variable in C or more like a variable declared off the heap?
Basically, does this make sense in this context?
$ref = routine()
print ${$ref};
Yes, that code will work fine.
Perl uses reference counting, so the variable will live as long as somebody has a reference to it. Perl's lexical variables are sort of like C's automatic variables, because they normally go away when you leave the scope, but they're also like a variable on the heap, because you can return a reference to one and it will just work.
They're not like C's static variables, because you get a new $foo every time you call routine (even recursively). (Perl 5.10 introduced state variables, which are rather like a C static.)

How does this call to a subroutine in a Perl module work?

I recently saw some Perl code that confused me. I took out all of the extra parts to see how it was working, but I still don't understand why it works.
Basically, I created this dummy "module" (TTT.pm):
use strict;
use warnings;
package TTT;
sub new {
my $class = shift;
return bless {'Test' => 'Test'}, $class;
}
sub acquire {
my $tt = new TTT();
return $tt;
}
1;
Then I created this script to use the module (ttt.pl):
#!/usr/bin/perl
use strict;
use warnings;
use TTT;
our $VERSION = 1;
my $tt = acquire TTT;
print $tt->{Test};
The line that confuses me, that I thought would not work, is:
my $tt = acquire TTT;
I thought it would not work since the "acquire" sub was never exported. But it does work.
I was confused by the "TTT" after the call to acquire, so I removed that, leaving the line like this:
my $tt = acquire;
And it complained of a bareword, like I thought it would. I added parens, like this:
my $tt = acquire();
And it complained that there wasn't a main::acquire, like I thought it would.
I'm used to the subroutines being available to the object, or subroutines being exported, but I've never seen a subroutine get called with the package name on the end. I don't even know how to search for this on Google.
So my question is, How does adding the package name after the subroutine call work? I've never seen anything like that before, and it probably isn't good practice, but can someone explain what Perl is doing?
Thanks!
You are using the indirect object syntax that Perl allows (but in modern code is discouraged). Basically, if a name is not predeclared, it can be placed in front of an object (or class name) separated with a space.
So the line acquire TTT actually means TTT->acquire. If you actually had a subroutine named acquire in scope, it would instead be interpreted as aquire(TTT) which is can lead to ambiguity (hence why it is discouraged).
You should also update the new TTT(); line in the method to read TTT->new;
It's the indirect object syntax for method calls, which lets you put the method name before the object name.
As the documentation there shows, it's best avoided because it's unwieldy and it can break in unpredictable ways, for example if there is an imported or defined subroutine named acquire — but it used to be more common than it is today, and so you will find it pretty often in old code and docs.

Can Perl method calls be intercepted?

Can you intercept a method call in Perl, do something with the arguments, and then execute it?
Yes, you can intercept Perl subroutine calls. I have an entire chapter about that sort of thing in Mastering Perl. Check out the Hook::LexWrap module, which lets you do it without going through all of the details. Perl's methods are just subroutines.
You can also create a subclass and override the method you want to catch. That's a slightly better way to do it because that's the way object-oriented programming wants you do to it. However, sometimes people write code that doesn't allow you to do this properly. There's more about that in Mastering Perl too.
To describe briefly, Perl has the aptitude to modify symbol table. You call a subroutine (method) via symbol table of the package, to which the method belongs. If you modify the symbol table (and this is not considered very dirty), you can substitute most method calls with calling the other methods you specify. This demonstrates the approach:
# The subroutine we'll interrupt calls to
sub call_me
{
print shift,"\n";
}
# Intercepting factory
sub aspectate
{
my $callee = shift;
my $value = shift;
return sub { $callee->($value + shift); };
}
my $aspectated_call_me = aspectate \&call_me, 100;
# Rewrite symbol table of main package (lasts to the end of the block).
# Replace "main" with the name of the package (class) you're intercepting
local *main::call_me = $aspectated_call_me;
# Voila! Prints 105!
call_me(5);
This also shows that, once someone takes reference of the subroutine and calls it via the reference, you can no longer influence such calls.
I am pretty sure there are frameworks to do aspectation in perl, but this, I hope, demonstrates the approach.
This looks like a job for Moose! Moose is an object system for Perl that can do that and lots more. The docs will do a much better job at explaining than I can, but what you'll likely want is a Method Modifier, specifically before.
You can, and Pavel describes a good way to do it, but you should probably elaborate as to why you are wanting to do this in the first place.
If you're looking for advanced ways of intercepting calls to arbitrary subroutines, then fiddling with symbol tables will work for you, but if you want to be adding functionality to functions perhaps exported to the namespace you are currently working in, then you might need to know of ways to call functions that exist in other namespaces.
Data::Dumper, for example, normally exports the function 'Dumper' to the calling namespace, but you can override or disable that and provide your own Dumper function which then calls the original by way of the fully qualified name.
e.g.
use Data::Dumper;
sub Dumper {
warn 'Dumping variables';
print Data::Dumper::Dumper(#_);
}
my $foo = {
bar => 'barval',
};
Dumper($foo);
Again, this is an alternate solution that may be more appropriate depending on the original problem. A lot of fun can be had when playing with the symbol table, but it may be overkill and could lead to hard to maintain code if you don't need it.
Yes.
You need three things:
The arguments to a call are in #_ which is just another dynamically scoped variable.
Then, goto supports a reference-sub argument which preserves the current #_ but makes another (tail) function call.
Finally local can be used to create lexically scoped global variables, and the symbol tables are buried in %::.
So you've got:
sub foo {
my($x,$y)=(#_);
print "$x / $y = " . ((0.0+$x)/$y)."\n";
}
sub doit {
foo(3,4);
}
doit();
which of course prints out:
3 / 4 = 0.75
We can replace foo using local and go:
my $oldfoo = \&foo;
local *foo = sub { (#_)=($_[1], $_[0]); goto $oldfoo; };
doit();
And now we get:
4 / 3 = 1.33333333333333
If you wanted to modify *foo without using its name, and you didn't want to use eval, then you could modify it by manipulating %::, for example:
$::{"foo"} = sub { (#_)=($_[0], 1); goto $oldfoo; };
doit();
And now we get:
3 / 1 = 3