Perl scoping and the life of local variables - perl

How long does the memory location allocated by a local variable in Perl live for (both for arrays, hashes and scalars)? For instance:
sub routine
{
my $foo = "bar";
return \$foo;
}
Can you still access the string "bar" in memory after the function has returned? How long will it live for, and is it similar to a static variable in C or more like a variable declared off the heap?
Basically, does this make sense in this context?
$ref = routine()
print ${$ref};

Yes, that code will work fine.
Perl uses reference counting, so the variable will live as long as somebody has a reference to it. Perl's lexical variables are sort of like C's automatic variables, because they normally go away when you leave the scope, but they're also like a variable on the heap, because you can return a reference to one and it will just work.
They're not like C's static variables, because you get a new $foo every time you call routine (even recursively). (Perl 5.10 introduced state variables, which are rather like a C static.)

Related

How to avoid global variable declaration when using Perl's dynamic scoping?

I am trying to write a perl script that calls a function written somewhere else (by someone else) which manipulates some of the variables in my script's scope. Let's say the script is main.pl and the function is there in funcs.pm. My main.pl looks like this:
use warnings;
use strict;
package plshelp;
use funcs;
my $var = 3;
print "$var\n"; # <--- prints 3
{ # New scope somehow prevents visibility of $pointer outside
local our $pointer = \$var;
change();
}
print "$var\n"; # <--- Ideally should print whatever funcs.pm wanted
For some reason, using local our $pointer; prevents visibility of $pointer outside the scope. But if I just use our $pointer;, the variable can be seen outside the scope in main.pl using $plshelp::pointer (but not in funcs.pm, so it would be useless anyway). As a side-note, could someone please explain this?
funcs.pm looks something like this:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
I expected this to change the value of $var and print 4 when the main script was run. But I get a compile error saying $pointer wasn't declared. This error can be removed by adding our $pointer; at the top of change in funcs.pm, but that would create an unnecessary global variable that is visible everywhere. We can also remove this error by removing the use strict;, but that seems like a bad idea. We can also get it to work by using $plshelp::pointer in funcs.pm, but the person writing funcs.pm doesn't want to do that.
Is there a good way to achieve this functionality of letting funcs.pm manipulate variables in my scope without declaring global variables? If we were going for global variables anyway, I guess I don't need to use dynamic scoping at all.
Let's just say it's not possible to pass arguments to the function for some reason.
Update
It seems that local our isn't doing any "special" as far as preventing visibility is concerned. From perldoc:
This means that when use strict 'vars' is in effect, our lets you use a package variable without qualifying it with the package name, but only within the lexical scope of the our declaration. This applies immediately--even within the same statement.
and
This works even if the package variable has not been used before, as package variables spring into existence when first used.
So this means that $pointer "exists" even after we leave the curly braces. Just that we have to refer to it using $plshelp::pointer instead of just $pointer. But since we used local before initializing $pointer, it is still undefined outside the scope (although it is still "declared", whatever that means). A clearer way to write this would be (local (our $pointer)) = \$var;. Here, our $pointer "declares" $pointer and returns $pointer as well. We now apply local on this returned value, and this operation returns $pointer again which we are assigning to \$var.
But this still leaves the main question of whether there is a good way of achieving the required functionality unanswered.
Let's be clear about how global variables with our work and why they have to be declared: There's a difference between the storage of a global variable, and visibility of its unqualified name. Under use strict, undefined variable names will not implicitly refer to a global variable.
We can always access the global variable with its fully qualified name, e.g. $Foo::bar.
If a global variable in the current package already exists at compile time and is marked as an imported variable, we can access it with an unqualified name, e.g. $bar. If a Foo package is written appropriately, we could say use Foo qw($bar); say $bar where $bar is now a global variable in our package.
With our $foo, we create a global variable in the current package if that variable doesn't already exist. The name of the variable is also made available in the current lexical scope, just like the variable of a my declaration.
The local operator does not create a variable. Instead, it saves the current value of a global variable and clears that variable. At the end of the current scope, the old value is restored. You can interpret each global variable name as a stack of values. With local you can add (and remove) values on the stack.
So while local can dynamically scope a value, it does not create a dynamically scoped variable name.
By carefully considering which code is compiled when, it becomes clear why your example doesn't currently work:
In your main script, you load the module funcs. The use statement is executed in the BEGIN phase, i.e. during parsing.
use warnings;
use strict;
package plshelp;
use funcs;
The funcs module is compiled:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
At this point, no $pointer variable is in lexical scope and no imported global $pointer variable exists. Therefore you get an error. This compile-time observation is unrelated to the existence of a $pointer variable at runtime.
The canonical way to fix this error is to declare an our $pointer variable name in the scope of the sub change:
sub change {
our $pointer;
${$pointer} = 4;
}
Note that the global variable will exist anyway, this just brings the name into scope for use as an unqualified variable name.
Just because you can use global variables doesn't mean that you should. There are two issues with them:
On a design level, global variables do not declare a clear interface. By using a fully qualified name you can simply access a variable without any checks. They do not provide any encapsulation. This makes for fragile software and weird action-at-a-distance.
On an implementation level, global variables are simply less efficient than lexical variables. I have never actually seen this matter, but think of the cycles!
Also, global variables are global variables: They can only have one value at a time! Scoping the value with local can help to avoid this in some cases, but there can still be conflicts in complex systems where two modules want to set the same global variable to different values and those modules call into each other.
The only good uses for global variables I have seen are to provide additional context to a callback that cannot take extra parameters, roughly similar to your approach. But where possible it is always better to pass the context as a parameter. Subroutine arguments are already effectively dynamically scoped:
sub change {
my ($pointer) = #_;
${$pointer} = 4;
}
...
my $var = 3;
change(\$var);
If there is a lot of context it can be come cumbersome to pass all those references: change(\$foo, \$bar, \$baz, \#something_else, \%even_more, ...). It could then make sense to bundle that context into an object, which can then be manipulated in a more controlled manner. Manipulating local or global variables is not always the best design.
There's too much wrong with your code to just fix it
You've used package plshelp in both the main script and the module, even though the main entry point is in main.pl and your module is in funcs.pm. That's just irresponsible. Did you imagine that the package statement was solely for advertising for help and it didn't matter what you put in there?
Your post doesn't say what is wrong with what you have written, but it's surprising that it doesn't throw an error.
Here's something close that does what you seem to expect. I can't really explain things as your own code is so far from working
Functions.pm
package Functions;
use strict;
use warnings;
use Exporter 'import';
our #EXPORT_OK = 'change';
sub change {
my ($ref) = #_;
$$ref = 4;
}
main.pl
use strict;
use warnings 'all';
use Functions 'change';
my $var = 44;
print "$var\n";
change(\$var);
print "$var\n";
output
44
4

Naming convention for passing a hash reference to a subroutine

When I pass a hash reference to a subroutine, what is a good naming convention for the variable to which the hash ref parameter is assigned?
In the example below, what is the better option:
Use the same variable name as the original variable (and let Perl hide the original within the block).
Come up with a new name for the hash ref within the subroutine?
I am inclined to use option 1, because a new name for the same hash reference feels redundant.
If option 2 is better, what would be a good naming convention for the variable within the subroutine ?
## example:
use strict;
use warnings;
my $href_phone_book = generate_phone_book();
my $phone_number = get_phone_number($href_phone_book, "john");
print $phone_number."\n";
sub generate_phone_book {
my %phone_book = (
john => "5554321",
alice => "5551234"
);
return \%phone_book;
}
sub get_phone_number{
# Is it OK to hide the original variable
my $href_phone_book = shift;
# Or should I use a new variable name? What would then be a good naming convention?
my $person = shift;
my $phone_number = $href_phone_book->{$person};
return $phone_number;
}
You should name your variables for the data that they reference. Ideally you should use a short non-ambiguous name, and remove context where possible. For example if your function did something that would apply generically to any hash reference, the name $hash_ref would be preferable IMO to $phone_book
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
Although not a generically safe assumption, it is what I would do in your example. I would use $phone_book as the single param.
Note if you are using the same reference data structure repeatedly as a parameter to functions, then it points to writing some OO Perl. You should probably create a class, with the data structure the object, and functions changed to instance methods.
Caveat: Your example script will generate a warning due to variable scope. So my advice needs to be modified. I would not alter the subroutine's variable but instead name the global variable something like $main_phone_book . . . but this is only necessary due to the structure of your example script. Ideally you should avoid global data structures, and use modularisation (a PhoneBook.pm file that contained all the logic for handling phone numbers, whether exported functions or a new class, would not have a clash on the variable name because of package namespaces). Or if the variables are truly global to the problem at hand you probably would not be passing them as parameters at all.
2. Come up with a new name for the hash ref within the subroutine?
In this case it is only necessary due to file structure of the example script. Consistency, at least at the level of individual scripts and modules, is more important than finding some special naming convention that can handle all your needs.
I suggest you read the variable naming convention chapters of Perl Best Practices. It is very clear and useful advice.

Why do '::' and '->' work (sort of) interchangeably when calling methods from Perl modules?

I keep getting :: confused with -> when calling subroutines from modules. I know that :: is more related to paths and where the module/subroutine is and -> is used for objects, but I don't really understand why I can seemingly interchange both and it not come up with immediate errors.
I have perl modules which are part of a larger package, e.g. FullProgram::Part1
I'm just about getting to grips with modules, but still am on wobbly grounds when it comes to Perl objects, but I've been accidentally doing this:
FullProgram::Part1::subroutine1();
instead of
FullProgram::Part1->subroutine1();
so when I've been passing a hash ref to subroutine1 and been careful about using $class/$self to deal with the object reference and accidentally use :: I end up pulling my hair out wondering why my hash ref seems to disappear. I have learnt my lesson, but would really like an explanation of the difference. I have read the perldocs and various websites on these but I haven't seen any comparisons between the two (quite hard to google...)
All help appreciated - always good to understand what I'm doing!
There's no inherent difference between a vanilla sub and one's that's a method. It's all in how you call it.
Class::foo('a');
This will call Class::foo. If Class::foo doesn't exist, the inheritance tree will not be checked. Class::foo will be passed only the provided arguments ('a').
It's roughly the same as: my $sub = \&Class::foo; $sub->('a');
Class->foo('a');
This will call Class::foo, or foo in one of its base classes if Class::foo doesn't exist. The invocant (what's on the left of the ->) will be passed as an argument.
It's roughly the same as: my $sub = Class->can('foo'); $sub->('Class', 'a');
FullProgram::Part1::subroutine1();
calls the subroutine subroutine1 of the package FullProgram::Part1 with an empty parameter list while
FullProgram::Part1->subroutine1();
calls the same subroutine with the package name as the first argument (note that it gets a little bit more complex when you're subclassing). This syntax is used by constructor methods that need the class name for building objects of subclasses like
sub new {
my ($class, #args) = #_;
...
return bless $thing, $class;
}
FYI: in Perl OO you see $object->method(#args) which calls Class::method with the object (a blessed reference) as the first argument instead of the package/class name. In a method like this, the subroutine could work like this:
sub method {
my ($self, $foo, $bar) = #_;
$self->do_something_with($bar);
# ...
}
which will call the subroutine do_something_with with the object as first argument again followed by the value of $bar which was the second list element you originally passed to method in #args. That way the object itself doesn't get lost.
For more informations about how the inheritance tree becomes important when calling methods, please see ikegami's answer!
Use both!
use Module::Two;
Module::Two::->class_method();
Note that this works but also protects you against an ambiguity there; the simple
Module::Two->class_method();
will be interpreted as:
Module::Two()->class_method();
(calling the subroutine Two in Module and trying to call class_method on its return value - likely resulting in a runtime error or calling a class or instance method in some completely different class) if there happens to be a sub Two in Module - something that you shouldn't depend on one way or the other, since it's not any of your code's business what is in Module.
Historically, Perl dont had any OO. And functions from packages called with FullProgram::Part1::subroutine1(); sytax. Or even before with FullProgram'Part1'subroutine1(); syntax(deprecated).
Later, they implemented OOP with -> sign, but dont changed too much actually. FullProgram::Part1->subroutine1(); calls subroutine1 and FullProgram::Part1 goes as 1st parameter. you can see usage of this when you create an object: my $cgi = CGI->new(). Now, when you call a method from this object, left part also goes as first parameter to function: $cgi->param(''). Thats how param gets object he called from (usually named $self). Thats it. -> is hack for OOP. So as a result Perl does not have classes(packages work as them) but does have objects("objects" hacks too - they are blessed scalars).
Offtop: Also you can call with my $cgi = new CGI; syntax. This is same as CGI->new. Same when you say print STDOUT "text\n";. Yeah, just just calling IOHandle::print().

Can one pass Perl hash references between processes?

I have an ActiveState PerlCtrl project. I'd like to know if it's possible to have a hash in the COM DLL, pass it's ref out to the calling process as a string (e.g. "HASH(0x2345)") and then pass that string back into the COM DLL and somehow bless it back into pointing to the relevant hash.
Getting the hashref seems easy enough, using return "" . \%Graph; and I have tried things like $Graph = shift; $Graph = bless {%$Graph}; but they don't seem to achieve what I'm after. The %Graph hash is at least global to the module.
The testing code (VBScript):
set o = CreateObject("Project.BOGLE.1")
x = o.new_graph()
wscript.echo x
x = o.add_vertex(x, "foo")
If these are different processes, you will need to either serialize the content of the hash or persistently store it in a disk file. To do the former, see Storable or Data::Dumper; for the latter, it depends whether it's a hash of simple scalars or something more complex.
If it is the same perl interpreter in the same process, you can keep some global variable like %main::hashes;
set $main::hashes{\%Graph} = \%Graph before passing the stringified reference back to the calling process, then later use it to look up the actual hash reference.
Don't do this, though: http://perlmonks.org/?node_id=379395.
No, you can't reliably pass hash references between processes.

Can Perl method calls be intercepted?

Can you intercept a method call in Perl, do something with the arguments, and then execute it?
Yes, you can intercept Perl subroutine calls. I have an entire chapter about that sort of thing in Mastering Perl. Check out the Hook::LexWrap module, which lets you do it without going through all of the details. Perl's methods are just subroutines.
You can also create a subclass and override the method you want to catch. That's a slightly better way to do it because that's the way object-oriented programming wants you do to it. However, sometimes people write code that doesn't allow you to do this properly. There's more about that in Mastering Perl too.
To describe briefly, Perl has the aptitude to modify symbol table. You call a subroutine (method) via symbol table of the package, to which the method belongs. If you modify the symbol table (and this is not considered very dirty), you can substitute most method calls with calling the other methods you specify. This demonstrates the approach:
# The subroutine we'll interrupt calls to
sub call_me
{
print shift,"\n";
}
# Intercepting factory
sub aspectate
{
my $callee = shift;
my $value = shift;
return sub { $callee->($value + shift); };
}
my $aspectated_call_me = aspectate \&call_me, 100;
# Rewrite symbol table of main package (lasts to the end of the block).
# Replace "main" with the name of the package (class) you're intercepting
local *main::call_me = $aspectated_call_me;
# Voila! Prints 105!
call_me(5);
This also shows that, once someone takes reference of the subroutine and calls it via the reference, you can no longer influence such calls.
I am pretty sure there are frameworks to do aspectation in perl, but this, I hope, demonstrates the approach.
This looks like a job for Moose! Moose is an object system for Perl that can do that and lots more. The docs will do a much better job at explaining than I can, but what you'll likely want is a Method Modifier, specifically before.
You can, and Pavel describes a good way to do it, but you should probably elaborate as to why you are wanting to do this in the first place.
If you're looking for advanced ways of intercepting calls to arbitrary subroutines, then fiddling with symbol tables will work for you, but if you want to be adding functionality to functions perhaps exported to the namespace you are currently working in, then you might need to know of ways to call functions that exist in other namespaces.
Data::Dumper, for example, normally exports the function 'Dumper' to the calling namespace, but you can override or disable that and provide your own Dumper function which then calls the original by way of the fully qualified name.
e.g.
use Data::Dumper;
sub Dumper {
warn 'Dumping variables';
print Data::Dumper::Dumper(#_);
}
my $foo = {
bar => 'barval',
};
Dumper($foo);
Again, this is an alternate solution that may be more appropriate depending on the original problem. A lot of fun can be had when playing with the symbol table, but it may be overkill and could lead to hard to maintain code if you don't need it.
Yes.
You need three things:
The arguments to a call are in #_ which is just another dynamically scoped variable.
Then, goto supports a reference-sub argument which preserves the current #_ but makes another (tail) function call.
Finally local can be used to create lexically scoped global variables, and the symbol tables are buried in %::.
So you've got:
sub foo {
my($x,$y)=(#_);
print "$x / $y = " . ((0.0+$x)/$y)."\n";
}
sub doit {
foo(3,4);
}
doit();
which of course prints out:
3 / 4 = 0.75
We can replace foo using local and go:
my $oldfoo = \&foo;
local *foo = sub { (#_)=($_[1], $_[0]); goto $oldfoo; };
doit();
And now we get:
4 / 3 = 1.33333333333333
If you wanted to modify *foo without using its name, and you didn't want to use eval, then you could modify it by manipulating %::, for example:
$::{"foo"} = sub { (#_)=($_[0], 1); goto $oldfoo; };
doit();
And now we get:
3 / 1 = 3