How to make a hash of objects in perl - perl

I would like to be able to store objects in a hash structure so I can work with the name of the object as a variable.
Could someone help me make a
sub new{
...
}
routine that creates a new object as member of a hash? I am not exactly sure how to go about doing this or how to refer to and/or use the object when it is stored like this. I just want to be able to use and refer to the name of the object for other subroutines.
See my comment in How can I get name of an object in Perl? for why I want to do this.
Thank you

Objects don't really have names. Why are you trying to give them names? One of the fundamental points of references is that you don't need to know a name, or even what class it is, to work with it.
There's probably a much better way to achieve your task.
However, since objects are just references, and references are just scalars, the object can be a hash value:
my %hash = (
some_name => Class->new( ... ),
other_name => Class->new( ... ).
);
You might want to check out a book such as Intermediate Perl to learn how references and objects work.

Don't quite understand what you are trying to do. Perhaps you can provide some concrete examples?
You can store objects into hashes just like any other variable in perl.
my %hash = ( );
$hash{'foo'} = new Foo(...);
$hash{'bar'} = new Bar(...);
Assuming you know the object stored at 'foo' is a Foo object and at 'bar' is a Bar object, then you can retrieve the objects from the hash and use it.
$hash{'foo'}->foo_method();
$hash{'bar'}->bar_method();
You may want to programmatically determine this behavior at run time. That's assuming that you are sticking with this naming scheme.

Related

Naming convention for passing a hash reference to a subroutine

When I pass a hash reference to a subroutine, what is a good naming convention for the variable to which the hash ref parameter is assigned?
In the example below, what is the better option:
Use the same variable name as the original variable (and let Perl hide the original within the block).
Come up with a new name for the hash ref within the subroutine?
I am inclined to use option 1, because a new name for the same hash reference feels redundant.
If option 2 is better, what would be a good naming convention for the variable within the subroutine ?
## example:
use strict;
use warnings;
my $href_phone_book = generate_phone_book();
my $phone_number = get_phone_number($href_phone_book, "john");
print $phone_number."\n";
sub generate_phone_book {
my %phone_book = (
john => "5554321",
alice => "5551234"
);
return \%phone_book;
}
sub get_phone_number{
# Is it OK to hide the original variable
my $href_phone_book = shift;
# Or should I use a new variable name? What would then be a good naming convention?
my $person = shift;
my $phone_number = $href_phone_book->{$person};
return $phone_number;
}
You should name your variables for the data that they reference. Ideally you should use a short non-ambiguous name, and remove context where possible. For example if your function did something that would apply generically to any hash reference, the name $hash_ref would be preferable IMO to $phone_book
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
Although not a generically safe assumption, it is what I would do in your example. I would use $phone_book as the single param.
Note if you are using the same reference data structure repeatedly as a parameter to functions, then it points to writing some OO Perl. You should probably create a class, with the data structure the object, and functions changed to instance methods.
Caveat: Your example script will generate a warning due to variable scope. So my advice needs to be modified. I would not alter the subroutine's variable but instead name the global variable something like $main_phone_book . . . but this is only necessary due to the structure of your example script. Ideally you should avoid global data structures, and use modularisation (a PhoneBook.pm file that contained all the logic for handling phone numbers, whether exported functions or a new class, would not have a clash on the variable name because of package namespaces). Or if the variables are truly global to the problem at hand you probably would not be passing them as parameters at all.
2. Come up with a new name for the hash ref within the subroutine?
In this case it is only necessary due to file structure of the example script. Consistency, at least at the level of individual scripts and modules, is more important than finding some special naming convention that can handle all your needs.
I suggest you read the variable naming convention chapters of Perl Best Practices. It is very clear and useful advice.

Anonymous subroutines/subroutine references stored in data structures

Why would I use Perl anonymous subroutines instead of a named one? inspired me to think about the merit of:
Storing anonymous subs in arrays, hashes and scalars.
It's a pretty cool concept, but is it practical in any way? Is there any reason why I'd have to use anonymous subs/sub references stored in some sort of data structure? Or perhaps a situation where it will be convenient?
I understand why anonymous subs are required in certain contexts such as dealing with shared variables (when an anonymous sub is declared inside another sub), but unless I'm missing something, I just don't see the point of using any sort of function reference. It seems like we should just call the functions outright and the code would look much nicer/more organized.
Please tell me I'm wrong. I'd love to have a good reason to use these things.
Thanks in advance.
A dispatch table is useful for dynamically determining steps to take based on some value:
my %disp = (
foo => sub { 'foo' },
bar => sub { 'bar' },
);
my $cmd = get_cmd_somehow();
if (defined $disp{$cmd}) {
$disp{$cmd}->(#args)
} else {
die "I don't know how to handle $cmd"
}
(Method dispatch via ->can($method) is conceptually similar, but more flexible and the details are hidden.)
Anonymous functions and lexical closure has many other uses; perhaps look deeper into "higher-order functions." (Think about map()/grep(), for example.)
Object-oriented methods are very much akin to anonymous subroutines. Polymorphism means that an object's methods can change without the calling code having to do lookups manually to see what routine to run. And that's VERY useful.
Also, think about perl's sort. Why set up a named routine just for a simple sort method? Ditto map and grep.
As well, iterators are very useful. Also, think about storing a routine that can be resolved later, rather than only being able to store a static value.
In the end, if you don't want to store anonymous routines (or even references to routines) that's your business. But having the option is way better than not having it.

Perl Autovivication Use case

I came across this piece of code (modified excerpt):
my $respMap;
my $respIdArray;
foreach my $respId (#$someList) {
push(#$respIdArray, $respId);
}
$respMap->{'ids'} = $respIdArray;
return $respMap;
Is there a reason to use autovivication in this case? Why not simply do
my $respMap;
my #respIdArray;
foreach my $respId (#$someList) {
push(#respIdArray, $respId);
}
$respMap->{'ids'} = \#respIdArray;
return $respMap;
Follow up: Could someone give me a good use case of autovivication?
Either way is correct; first one using array reference $respIdArray, and second plain array #respIdArray. You'll need array references when building complex data structures (check perldoc perlreftut), but other than that it's up to you which one you'll choose.
Note that in both cases you're assigning array reference to $respMap->{'ids'}, so examples are actually pretty similar.
And btw, autovivification is another thing and has to do with dynamic creation of data structures.
Autovivication is more useful when dealing with deep structures.
push( #{$hash{'key'}{$subkey}}, 'value' );

Perl - Calling subclass constructor from superclass (OO)

This may turn out to be an embarrassingly stupid question, but better than potentially creating embarrassingly stupid code. :-) This is an OO design question, really.
Let's say I have an object class 'Foos' that represents a set of dynamic configuration elements, which are obtained by querying a command on disk, 'mycrazyfoos -getconfig'. Let's say that there are two categories of behavior that I want 'Foos' objects to have:
Existing ones: one is, query ones that exist in the command output I just mentioned (/usr/bin/mycrazyfoos -getconfig`. Make modifications to existing ones via shelling out commands.
Create new ones that don't exist; new 'crazyfoos', using a complex set of /usr/bin/mycrazyfoos commands and parameters. Here I'm not really just querying, but actually running a bunch of system() commands. Affecting changes.
Here's my class structure:
Foos.pm
package Foos, which has a new($hashref->{name => 'myfooname',) constructor that takes a 'crazyfoo NAME' and then queries the existence of that NAME to see if it already exists (by shelling out and running the mycrazyfoos command above). If that crazyfoo already exists, return a Foos::Existing object. Any changes to this object requires shelling out, running commands and getting confirmation that everything ran okay.
If this is the way to go, then the new() constructor needs to have a test to see which subclass constructor to use (if that even makes sense in this context). Here are the subclasses:
Foos/Existing.pm
As mentioned above, this is for when a Foos object already exists.
Foos/Pending.pm
This is an object that will be created if, in the above, the 'crazyfoo NAME' doesn't actually exist. In this case, the new() constructor above will be checked for additional parameters, and it will go ahead and, when called using ->create() shell out using system() and create a new object... possibly returning an 'Existing' one...
OR
As I type this out, I am realizing it is perhaps it's better to have a single:
(an alternative arrangement)
Foos class, that has a
->new() that takes just a name
->create() that takes additional creation parameters
->delete(), ->change() and other params that affect ones that exist; that will have to just be checked dynamically.
So here we are, two main directions to go with this. I'm curious which would be the more intelligent way to go.
In general it's a mistake (design-wise, not syntax-wise) for the new method to return anything but a new object. If you want to sometimes return an existing object, call that method something else, e.g. new_from_cache().
I also find it odd that you're splitting up this functionality (constructing a new object, and returning an existing one) not just into separate namespaces, but also different objects. So in general, you're closer with your second approach, but you can still have the main constructor (new) handle a variety of arguments:
package Foos;
use strict;
use warnings;
sub new
{
my ($class, %args) = #_;
if ($args{name})
{
# handle the name => value option
}
if ($args{some_other_option})
{
# ...
}
my $this = {
# fill in any fields you need...
};
return bless $this, $class;
}
sub new_from_cache
{
my ($class, %args) = #_;
# check if the object already exists...
# if not, create a new object
return $class->new(%args);
}
Note: I don't want to complicate things while you're still learning, but you may also want to look at Moose, which takes care of a lot of the gory details of construction for you, and the definition of attributes and their accessors.
It is generally speaking a bad idea for a superclass to know about its subclasses, a principle which extends to construction.[1] If you need to decide at runtime what kind of object to create (and you do), create a fourth class to have just that job. This is one kind of "factory".
Having said that in answer to your nominal question, your problem as described does not seem to call for subclassing. In particular, you apparently are going to be treating the different classes of Foos differently depending on which concrete class they belong to. All you're really asking for is a unified way to instantiate two separate classes of objects.
So how's this suggestion[3]: Make Foos::Exists and Foos::Pending two separate and unrelated classes and provide (in Foos) a method that returns the appropriate one. Don't call it new; you're not making a new Foos.
If you want to unify the interfaces so that clients don't have to know which kind they're talking about, then we can talk subclassing (or better yet, delegation to a lazily-created and -updated Foos::Handle).
[1]: Explaining why this is true is a subject hefty enough for a book[2], but the short answer is that it creates a dependency cycle between the subclass (which depends on its superclass by definition) and the superclass (which is being made to depend on its subclass by a poor design decision).
[2]: Lakos, John. (1996). Large-scale C++ Software Design. Addison-Wesley.
[3]: Not a recommendation, since I can't get a good enough handle on your requirements to be sure I'm not shooting fish in a dark ocean.
It is also a factory pattern (bad in Perl) if the object's constructor will return an instance blessed into more than one package.
I would create something like this. If the names exists than is_created is set to 1, otherwise it is set to 0.. I would merge the ::Pending, and ::Existing together, and if the object isn't created just put that into the default for the _object, the check happens lazily. Also, Foo->delete() and Foo->change() will defer to the instance in _object.
package Foo;
use Moose;
has 'name' => ( is => 'ro', isa => 'Str', required => 1 );
has 'is_created' => (
is => 'ro'
, isa => 'Bool'
, init_arg => undef
, default => sub {
stuff_if_exists ? 1 : 0
}
);
has '_object' => (
isa => 'Object'
, is => 'ro'
, lazy => 1
, init_arg => undef
, default => sub {
my $self = shift;
$self->is_created
? Foo->new
: Bar->new
}
, handles => [qw/delete change/]
);
Interesting answers! I am digesting it as I try out different things in code.
Well, I have another variation of the same question -- the same question, mind you, just a different problem to the same class:subclass creation issue!
This time:
This code is an interface to a command line that has a number of different complex options. I told you about /usr/bin/mycrazyfoos before, right? Well, what if I told you that that binary changes based on versions, and sometimes it completely changes its underlying options. And that this class we're writing, it has to be able to account for all of these things. The goal (or perhaps idea) is to do: (perhaps called FROM the Foos class we were discussing above):
Foos::Commandline, which has as subclasses different versions of the underlying '/usr/bin/mycrazyfoos' command.
Example:
my $fcommandobj = new Foos::Commandline;
my #raw_output_list = $fcommandobj->getlist();
my $result_dance = $fcommandobj->dance();
where 'getlist' and 'dance' are version-dependent. I thought about doing this:
package Foos::Commandline;
new (
#Figure out some clever way to decide what version user has
# (automagically)
# And call appropriate subclass? Wait, you all are telling me this is bad OO:
# if v1.0.1 (new Foos::Commandline::v1.0.1.....
# else if v1.2 (new Foos::Commandline::v1.2....
#etc
}
then
package Foos::Commandline::v1.0.1;
sub getlist ( eval... system ("/usr/bin/mycrazyfoos", "-getlistbaby"
# etc etc
and (different .pm files, in subdir of Foos/Commandline)
package Foos::Commandline::v1.2;
sub getlist ( eval... system ("/usr/bin/mycrazyfoos", "-getlistohyeahrightheh"
#etc
Make sense? I expressed in code what I'd like to do, but it just doesn't feel right, particularly in light of what was discussed in the above responses. What DOES feel right is that there should be a generic interface / superclass to Commandline... and that different versions should be able to override it. Right? Would appreciate a suggestion or two on that. Gracias.

What's the best Perl practice for returning hashes from functions?

I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
#array = #{$ref_array};
%hash = %{$ref_hash};
Pass in references (#array, %hash) to the function that will hold the output data.
$this->getData('input', \#array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to
check the data structures to make
sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
#array = #{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my #foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = #_;
# do stuff
return #results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here to for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = #_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.