Naming convention for passing a hash reference to a subroutine - perl

When I pass a hash reference to a subroutine, what is a good naming convention for the variable to which the hash ref parameter is assigned?
In the example below, what is the better option:
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
2. Come up with a new name for the hash ref within the subroutine?
I am inclined to use option 1, because a new name for the same hash reference feels redundant.
If option 2 is better, what would be a good naming convention for the variable within the subroutine?
## example:
use strict;
use warnings;
my $href_phone_book = generate_phone_book();
my $phone_number = get_phone_number($href_phone_book, "john");
print $phone_number."\n";
sub generate_phone_book {
my %phone_book = (
john => "5554321",
alice => "5551234"
);
return \%phone_book;
}
sub get_phone_number{
# Is it OK to hide the original variable
my $href_phone_book = shift;
# Or should I use a new variable name? What would then be a good naming convention?
my $person = shift;
my $phone_number = $href_phone_book->{$person};
return $phone_number;
}

You should name your variables for the data that they reference. Ideally you should use a short, unambiguous name, and remove context where possible. For example, if your function did something that would apply generically to any hash reference, the name $hash_ref would be preferable IMO to $phone_book.
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
Although not a generically safe assumption, it is what I would do in your example. I would use $phone_book as the single param.
Note if you are using the same reference data structure repeatedly as a parameter to functions, then it points to writing some OO Perl. You should probably create a class, with the data structure the object, and functions changed to instance methods.
Caveat: Your example script will generate a warning due to variable scope. So my advice needs to be modified. I would not alter the subroutine's variable but instead name the global variable something like $main_phone_book . . . but this is only necessary due to the structure of your example script. Ideally you should avoid global data structures, and use modularisation (a PhoneBook.pm file that contained all the logic for handling phone numbers, whether exported functions or a new class, would not have a clash on the variable name because of package namespaces). Or if the variables are truly global to the problem at hand you probably would not be passing them as parameters at all.
2. Come up with a new name for the hash ref within the subroutine?
In this case it is only necessary due to file structure of the example script. Consistency, at least at the level of individual scripts and modules, is more important than finding some special naming convention that can handle all your needs.
I suggest you read the variable naming convention chapters of Perl Best Practices. It is very clear and useful advice.
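To make the OO suggestion above concrete, here is a minimal sketch. The PhoneBook class and its number_for method are hypothetical names chosen for illustration; in a real project the package would normally live in its own PhoneBook.pm file rather than in a block inside the script.

```perl
use strict;
use warnings;

{
    # Hypothetical class; in a real project this would live in PhoneBook.pm.
    package PhoneBook;

    sub new {
        my ($class, %numbers) = @_;
        # The object is a blessed copy of the name => number pairs.
        my $self = { %numbers };
        return bless $self, $class;
    }

    # Instance method replacing get_phone_number($href, $person).
    sub number_for {
        my ($self, $person) = @_;
        return $self->{$person};
    }
}

my $phone_book   = PhoneBook->new(john => "5554321", alice => "5551234");
my $phone_number = $phone_book->number_for("john");
print "$phone_number\n";    # prints 5554321
```

Inside the method the data is simply $self, so the question of what to call the hash ref parameter largely goes away.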

Related

How to avoid global variable declaration when using Perl's dynamic scoping?

I am trying to write a perl script that calls a function written somewhere else (by someone else) which manipulates some of the variables in my script's scope. Let's say the script is main.pl and the function is there in funcs.pm. My main.pl looks like this:
use warnings;
use strict;
package plshelp;
use funcs;
my $var = 3;
print "$var\n"; # <--- prints 3
{ # New scope somehow prevents visibility of $pointer outside
local our $pointer = \$var;
change();
}
print "$var\n"; # <--- Ideally should print whatever funcs.pm wanted
For some reason, using local our $pointer; prevents visibility of $pointer outside the scope. But if I just use our $pointer;, the variable can be seen outside the scope in main.pl using $plshelp::pointer (but not in funcs.pm, so it would be useless anyway). As a side-note, could someone please explain this?
funcs.pm looks something like this:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
I expected this to change the value of $var and print 4 when the main script was run. But I get a compile error saying $pointer wasn't declared. This error can be removed by adding our $pointer; at the top of change in funcs.pm, but that would create an unnecessary global variable that is visible everywhere. We can also remove this error by removing the use strict;, but that seems like a bad idea. We can also get it to work by using $plshelp::pointer in funcs.pm, but the person writing funcs.pm doesn't want to do that.
Is there a good way to achieve this functionality of letting funcs.pm manipulate variables in my scope without declaring global variables? If we were going for global variables anyway, I guess I don't need to use dynamic scoping at all.
Let's just say it's not possible to pass arguments to the function for some reason.
Update
It seems that local our isn't doing anything "special" as far as preventing visibility is concerned. From perldoc:
This means that when use strict 'vars' is in effect, our lets you use a package variable without qualifying it with the package name, but only within the lexical scope of the our declaration. This applies immediately--even within the same statement.
and
This works even if the package variable has not been used before, as package variables spring into existence when first used.
So this means that $pointer "exists" even after we leave the curly braces. Just that we have to refer to it using $plshelp::pointer instead of just $pointer. But since we used local before initializing $pointer, it is still undefined outside the scope (although it is still "declared", whatever that means). A clearer way to write this would be (local (our $pointer)) = \$var;. Here, our $pointer "declares" $pointer and returns $pointer as well. We now apply local on this returned value, and this operation returns $pointer again which we are assigning to \$var.
But this still leaves the main question of whether there is a good way of achieving the required functionality unanswered.
Let's be clear about how global variables with our work and why they have to be declared: There's a difference between the storage of a global variable, and visibility of its unqualified name. Under use strict, undefined variable names will not implicitly refer to a global variable.
We can always access the global variable with its fully qualified name, e.g. $Foo::bar.
If a global variable in the current package already exists at compile time and is marked as an imported variable, we can access it with an unqualified name, e.g. $bar. If a Foo package is written appropriately, we could say use Foo qw($bar); say $bar where $bar is now a global variable in our package.
With our $foo, we create a global variable in the current package if that variable doesn't already exist. The name of the variable is also made available in the current lexical scope, just like the variable of a my declaration.
The local operator does not create a variable. Instead, it saves the current value of a global variable and clears that variable. At the end of the current scope, the old value is restored. You can interpret each global variable name as a stack of values. With local you can add (and remove) values on the stack.
So while local can dynamically scope a value, it does not create a dynamically scoped variable name.
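The "stack of values" behaviour of local can be seen in a small sketch. The variable and subroutine names here are made up for illustration:

```perl
use strict;
use warnings;

our $level = "outer";    # package variable $main::level

# Reads whatever value is current at call time.
sub current_level { return $level }

sub with_inner_level {
    local $level = "inner";    # push a temporary value
    return current_level();    # the callee sees "inner"
}                              # the old value is restored here

my $during = with_inner_level();
my $after  = current_level();
print "$during $after\n";      # prints "inner outer"
```

Note that current_level never declares anything with local; it just sees whichever value is on top of the stack when it runs.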
By carefully considering which code is compiled when, it becomes clear why your example doesn't currently work:
In your main script, you load the module funcs. The use statement is executed in the BEGIN phase, i.e. during parsing.
use warnings;
use strict;
package plshelp;
use funcs;
The funcs module is compiled:
use warnings;
use strict;
package plshelp;
sub change
{
${$pointer} = 4;
}
At this point, no $pointer variable is in lexical scope and no imported global $pointer variable exists. Therefore you get an error. This compile-time observation is unrelated to the existence of a $pointer variable at runtime.
The canonical way to fix this error is to declare an our $pointer variable name in the scope of the sub change:
sub change {
our $pointer;
${$pointer} = 4;
}
Note that the global variable will exist anyway, this just brings the name into scope for use as an unqualified variable name.
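As a rough, single-file sketch of that fix (collapsing main.pl and funcs.pm into one file purely for illustration, which sidesteps the module-loading details):

```perl
use strict;
use warnings;

package plshelp;

sub change {
    our $pointer;        # brings $plshelp::pointer into scope by its short name
    ${$pointer} = 4;
}

my $var = 3;
{
    # The value is visible to change() only while this block is running.
    local our $pointer = \$var;
    change();
}
print "$var\n";          # prints 4
```

After the block ends, $plshelp::pointer still exists as a package variable but is back to being undefined.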
Just because you can use global variables doesn't mean that you should. There are two issues with them:
On a design level, global variables do not declare a clear interface. By using a fully qualified name you can simply access a variable without any checks. They do not provide any encapsulation. This makes for fragile software and weird action-at-a-distance.
On an implementation level, global variables are simply less efficient than lexical variables. I have never actually seen this matter, but think of the cycles!
Also, global variables are global variables: They can only have one value at a time! Scoping the value with local can help to avoid this in some cases, but there can still be conflicts in complex systems where two modules want to set the same global variable to different values and those modules call into each other.
The only good uses for global variables I have seen are to provide additional context to a callback that cannot take extra parameters, roughly similar to your approach. But where possible it is always better to pass the context as a parameter. Subroutine arguments are already effectively dynamically scoped:
sub change {
my ($pointer) = @_;
${$pointer} = 4;
}
...
my $var = 3;
change(\$var);
If there is a lot of context it can become cumbersome to pass all those references: change(\$foo, \$bar, \$baz, \@something_else, \%even_more, ...). It could then make sense to bundle that context into an object, which can then be manipulated in a more controlled manner. Manipulating local or global variables is not always the best design.
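One possible shape for such a context object is sketched below. The Context class and its get/set methods are hypothetical, not part of any existing module; the point is only that the callee goes through a small interface instead of poking at variables directly.

```perl
use strict;
use warnings;

{
    # Hypothetical context class bundling the values a callee may change.
    package Context;

    sub new {
        my ($class, %fields) = @_;
        my $self = { %fields };
        return bless $self, $class;
    }

    # Only known fields may be updated, so typos are caught early.
    sub set {
        my ($self, $name, $value) = @_;
        die "unknown field '$name'\n" unless exists $self->{$name};
        $self->{$name} = $value;
        return $self;
    }

    sub get {
        my ($self, $name) = @_;
        return $self->{$name};
    }
}

sub change {
    my ($context) = @_;
    $context->set(var => 4);
}

my $context = Context->new(var => 3, label => "demo");
change($context);
print $context->get("var"), "\n";    # prints 4
```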
There's too much wrong with your code to just fix it
You've used package plshelp in both the main script and the module, even though the main entry point is in main.pl and your module is in funcs.pm. That's just irresponsible. Did you imagine that the package statement was solely for advertising for help and it didn't matter what you put in there?
Your post doesn't say what is wrong with what you have written, but it's surprising that it doesn't throw an error.
Here's something close that does what you seem to expect. I can't really explain things as your own code is so far from working
Functions.pm
package Functions;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = 'change';
sub change {
my ($ref) = @_;
$$ref = 4;
}
main.pl
use strict;
use warnings 'all';
use Functions 'change';
my $var = 44;
print "$var\n";
change(\$var);
print "$var\n";
output
44
4

Anonymous subroutines/subroutine references stored in data structures

Why would I use Perl anonymous subroutines instead of a named one? inspired me to think about the merit of:
Storing anonymous subs in arrays, hashes and scalars.
It's a pretty cool concept, but is it practical in any way? Is there any reason why I'd have to use anonymous subs/sub references stored in some sort of data structure? Or perhaps a situation where it will be convenient?
I understand why anonymous subs are required in certain contexts such as dealing with shared variables (when an anonymous sub is declared inside another sub), but unless I'm missing something, I just don't see the point of using any sort of function reference. It seems like we should just call the functions outright and the code would look much nicer/more organized.
Please tell me I'm wrong. I'd love to have a good reason to use these things.
Thanks in advance.
A dispatch table is useful for dynamically determining steps to take based on some value:
my %disp = (
foo => sub { 'foo' },
bar => sub { 'bar' },
);
my $cmd = get_cmd_somehow();
if (defined $disp{$cmd}) {
$disp{$cmd}->(@args)
} else {
die "I don't know how to handle $cmd"
}
(Method dispatch via ->can($method) is conceptually similar, but more flexible and the details are hidden.)
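For a version of the dispatch table that runs on its own, here is a small sketch. The command names and the run_command helper are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical commands; each handler receives the remaining arguments.
my %dispatch = (
    add   => sub { my ($x, $y) = @_; return $x + $y },
    upper => sub { return uc(join(' ', @_)) },
);

sub run_command {
    my ($cmd, @args) = @_;
    my $handler = $dispatch{$cmd}
        or die "I don't know how to handle $cmd\n";
    return $handler->(@args);
}

my $sum   = run_command(add   => 2, 3);
my $shout = run_command(upper => 'hello', 'world');
print "$sum $shout\n";    # prints "5 HELLO WORLD"
```

Adding a new command means adding one entry to the hash; the calling code does not change.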
Anonymous functions and lexical closure has many other uses; perhaps look deeper into "higher-order functions." (Think about map()/grep(), for example.)
Object-oriented methods are very much akin to anonymous subroutines. Polymorphism means that an object's methods can change without the calling code having to do lookups manually to see what routine to run. And that's VERY useful.
Also, think about perl's sort. Why set up a named routine just for a simple sort method? Ditto map and grep.
As well, iterators are very useful. Also, think about storing a routine that can be resolved later, rather than only being able to store a static value.
In the end, if you don't want to store anonymous routines (or even references to routines) that's your business. But having the option is way better than not having it.
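As an illustration of the iterator point, here is a minimal closure-based sketch. make_iterator is a made-up helper name; the anonymous sub it returns keeps its own private @items and $index.

```perl
use strict;
use warnings;

# Returns an anonymous sub that yields the next item on each call
# and undef once the list is exhausted.
sub make_iterator {
    my @items = @_;
    my $index = 0;
    return sub {
        return $index < @items ? $items[$index++] : undef;
    };
}

my $next = make_iterator(qw(a b c));
my @seen;
while (defined(my $item = $next->())) {
    push @seen, $item;
}
print "@seen\n";    # prints "a b c"
```

This simple version cannot iterate over lists that contain undef, since undef doubles as the end marker.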

Why do '::' and '->' work (sort of) interchangeably when calling methods from Perl modules?

I keep getting :: confused with -> when calling subroutines from modules. I know that :: is more related to paths and where the module/subroutine is and -> is used for objects, but I don't really understand why I can seemingly interchange both and it not come up with immediate errors.
I have perl modules which are part of a larger package, e.g. FullProgram::Part1
I'm just about getting to grips with modules, but still am on wobbly grounds when it comes to Perl objects, but I've been accidentally doing this:
FullProgram::Part1::subroutine1();
instead of
FullProgram::Part1->subroutine1();
so when I've been passing a hash ref to subroutine1 and been careful about using $class/$self to deal with the object reference and accidentally use :: I end up pulling my hair out wondering why my hash ref seems to disappear. I have learnt my lesson, but would really like an explanation of the difference. I have read the perldocs and various websites on these but I haven't seen any comparisons between the two (quite hard to google...)
All help appreciated - always good to understand what I'm doing!
There's no inherent difference between a vanilla sub and one that's a method. It's all in how you call it.
Class::foo('a');
This will call Class::foo. If Class::foo doesn't exist, the inheritance tree will not be checked. Class::foo will be passed only the provided arguments ('a').
It's roughly the same as: my $sub = \&Class::foo; $sub->('a');
Class->foo('a');
This will call Class::foo, or foo in one of its base classes if Class::foo doesn't exist. The invocant (what's on the left of the ->) will be passed as an argument.
It's roughly the same as: my $sub = Class->can('foo'); $sub->('Class', 'a');
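The difference is easy to observe with a sub that simply returns its argument list. The Base and Derived packages below are hypothetical and exist only for this sketch:

```perl
use strict;
use warnings;

{
    package Base;
    sub foo { return [@_] }    # return the argument list for inspection
}
{
    package Derived;
    our @ISA = ('Base');       # Derived inherits foo from Base
}

my $as_function = Base::foo('a');       # arguments: ('a')
my $as_method   = Base->foo('a');       # arguments: ('Base', 'a')
my $inherited   = Derived->foo('a');    # found via @ISA; arguments: ('Derived', 'a')

# Derived::foo does not exist, and a plain function call ignores @ISA.
my $ok = eval { Derived::foo('a'); 1 };

print "@$as_function | @$as_method | @$inherited | ",
    ($ok ? "found" : "not found"), "\n";
# prints "a | Base a | Derived a | not found"
```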
FullProgram::Part1::subroutine1();
calls the subroutine subroutine1 of the package FullProgram::Part1 with an empty parameter list while
FullProgram::Part1->subroutine1();
calls the same subroutine with the package name as the first argument (note that it gets a little bit more complex when you're subclassing). This syntax is used by constructor methods that need the class name for building objects of subclasses like
sub new {
my ($class, @args) = @_;
...
return bless $thing, $class;
}
FYI: in Perl OO you see $object->method(@args), which calls Class::method with the object (a blessed reference) as the first argument instead of the package/class name. In a method like this, the subroutine could work like this:
sub method {
my ($self, $foo, $bar) = @_;
$self->do_something_with($bar);
# ...
}
which will call the subroutine do_something_with with the object as first argument again, followed by the value of $bar, which was the second list element you originally passed to method in @args. That way the object itself doesn't get lost.
For more information about how the inheritance tree becomes important when calling methods, please see ikegami's answer!
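Putting the constructor and method pieces together, a small sketch (Animal and Dog are hypothetical classes) shows why new blesses into $class rather than a hard-coded package name:

```perl
use strict;
use warnings;

{
    package Animal;            # hypothetical base class
    sub new {
        my ($class, %args) = @_;
        my $self = { name => $args{name} };
        return bless $self, $class;    # $class is whatever was left of ->
    }
    sub describe {
        my ($self) = @_;
        return ref($self) . " named " . $self->{name};
    }
}
{
    package Dog;               # hypothetical subclass, inherits new and describe
    our @ISA = ('Animal');
}

my $dog = Dog->new(name => "Rex");
print $dog->describe, "\n";    # prints "Dog named Rex"
```

Had the same constructor been called as Animal::new(name => "Rex"), the string "name" would have ended up in $class, which is the kind of "disappearing" argument described in the question.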
Use both!
use Module::Two;
Module::Two::->class_method();
Note that this works but also protects you against an ambiguity there; the simple
Module::Two->class_method();
will be interpreted as:
Module::Two()->class_method();
(calling the subroutine Two in Module and trying to call class_method on its return value - likely resulting in a runtime error or calling a class or instance method in some completely different class) if there happens to be a sub Two in Module - something that you shouldn't depend on one way or the other, since it's not any of your code's business what is in Module.
Historically, Perl had no OO, and functions in packages were called with the FullProgram::Part1::subroutine1(); syntax, or, even earlier, with the now-deprecated FullProgram'Part1'subroutine1(); syntax.
Later, OOP was implemented with the -> operator, without changing much underneath. FullProgram::Part1->subroutine1(); calls subroutine1 with FullProgram::Part1 as the first parameter. You can see this when you create an object: my $cgi = CGI->new(). When you then call a method on this object, the left part again goes in as the first parameter: $cgi->param(''). That's how param gets the object it was called on (usually named $self). That's it: -> is a hack for OOP. As a result, Perl does not have classes as such (packages serve as them) but it does have objects (which are a hack too: they are blessed references).
Off-topic: you can also write my $cgi = new CGI;, which is the same as CGI->new. The same goes for print STDOUT "text\n";. Yes, that is much like calling IO::Handle::print().

Can one pass Perl hash references between processes?

I have an ActiveState PerlCtrl project. I'd like to know if it's possible to have a hash in the COM DLL, pass its ref out to the calling process as a string (e.g. "HASH(0x2345)") and then pass that string back into the COM DLL and somehow bless it back into pointing to the relevant hash.
Getting the hashref seems easy enough, using return "" . \%Graph; and I have tried things like $Graph = shift; $Graph = bless {%$Graph}; but they don't seem to achieve what I'm after. The %Graph hash is at least global to the module.
The testing code (VBScript):
set o = CreateObject("Project.BOGLE.1")
x = o.new_graph()
wscript.echo x
x = o.add_vertex(x, "foo")
If these are different processes, you will need to either serialize the content of the hash or persistently store it in a disk file. To do the former, see Storable or Data::Dumper; for the latter, it depends whether it's a hash of simple scalars or something more complex.
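A minimal sketch of the serialization route with Storable's freeze and thaw follows. The %graph contents are made up, and how the resulting byte string is actually transported (it is binary, so a COM string may need extra encoding) is left out as an assumption.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %graph = (vertices => ['foo', 'bar'], edges => { foo => ['bar'] });

# Serialize to a byte string that can cross a process boundary
# (pipe, socket, file, ...).
my $frozen = freeze(\%graph);

# The receiving side rebuilds an independent copy of the structure.
my $copy = thaw($frozen);
print join(',', @{ $copy->{vertices} }), "\n";    # prints "foo,bar"
```

The receiver gets a copy, not the original hash, so changes made on one side are not seen on the other unless the data is sent back.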
If it is the same perl interpreter in the same process, you can keep some global variable like %main::hashes;
Set $main::hashes{\%Graph} = \%Graph before passing the stringified reference back to the calling process, then later use it to look up the actual hash reference.
Don't do this, though: http://perlmonks.org/?node_id=379395.
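For what it's worth, the lookup-table idea can be sketched as below. This only works inside one interpreter in one process; register_graph and lookup_graph are hypothetical helper names, and a lexical %hashes stands in for %main::hashes.

```perl
use strict;
use warnings;

my %hashes;    # registry: stringified reference => real reference

sub register_graph {
    my ($graph) = @_;
    my $handle = "$graph";        # e.g. "HASH(0x...)"
    $hashes{$handle} = $graph;    # keeps the hash alive and findable
    return $handle;
}

sub lookup_graph {
    my ($handle) = @_;
    return $hashes{$handle};      # undef for unknown handles
}

my %graph  = (vertices => []);
my $handle = register_graph(\%graph);
push @{ lookup_graph($handle)->{vertices} }, "foo";
print scalar(@{ $graph{vertices} }), "\n";    # prints 1
```

The string is used purely as a hash key here; nothing tries to turn the text "HASH(0x...)" back into a reference.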
No, you can't reliably pass hash references between processes.

How to make a hash of objects in perl

I would like to be able to store objects in a hash structure so I can work with the name of the object as a variable.
Could someone help me make a
sub new{
...
}
routine that creates a new object as member of a hash? I am not exactly sure how to go about doing this or how to refer to and/or use the object when it is stored like this. I just want to be able to use and refer to the name of the object for other subroutines.
See my comment in How can I get name of an object in Perl? for why I want to do this.
Thank you
Objects don't really have names. Why are you trying to give them names? One of the fundamental points of references is that you don't need to know a name, or even what class it is, to work with it.
There's probably a much better way to achieve your task.
However, since objects are just references, and references are just scalars, the object can be a hash value:
my %hash = (
some_name => Class->new( ... ),
other_name => Class->new( ... ),
);
You might want to check out a book such as Intermediate Perl to learn how references and objects work.
Don't quite understand what you are trying to do. Perhaps you can provide some concrete examples?
You can store objects into hashes just like any other variable in perl.
my %hash = ( );
$hash{'foo'} = new Foo(...);
$hash{'bar'} = new Bar(...);
Assuming you know the object stored at 'foo' is a Foo object and at 'bar' is a Bar object, then you can retrieve the objects from the hash and use it.
$hash{'foo'}->foo_method();
$hash{'bar'}->bar_method();
You may want to programmatically determine this behavior at run time. That's assuming that you are sticking with this naming scheme.
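A small runnable sketch of this, with a hypothetical Counter class standing in for Foo and Bar, might look like the following. The can() check is one way to decide at run time whether the stored object supports a method.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class used only for illustration
    sub new       { my ($class, $start) = @_; return bless { count => $start }, $class }
    sub increment { my ($self) = @_; return ++$self->{count} }
}

# The hash key plays the role of the object's "name".
my %objects = (
    clicks => Counter->new(0),
    visits => Counter->new(10),
);

my $name = "visits";    # could come from user input
die "no object named $name\n" unless exists $objects{$name};

# can() confirms at run time that the stored object supports the method.
my $result;
if ($objects{$name}->can('increment')) {
    $result = $objects{$name}->increment;
}
print "$name is now $result\n";    # prints "visits is now 11"
```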