Can one pass Perl hash references between processes?

I have an ActiveState PerlCtrl project. I'd like to know if it's possible to have a hash in the COM DLL, pass its ref out to the calling process as a string (e.g. "HASH(0x2345)"), then pass that string back into the COM DLL and somehow bless it back into pointing to the relevant hash.
Getting the hashref seems easy enough, using return "" . \%Graph; and I have tried things like $Graph = shift; $Graph = bless {%$Graph}; but they don't seem to achieve what I'm after. The %Graph hash is at least global to the module.
The testing code (VBScript):
set o = CreateObject("Project.BOGLE.1")
x = o.new_graph()
wscript.echo x
x = o.add_vertex(x, "foo")

If these are different processes, you will need to either serialize the content of the hash or persistently store it in a disk file. To do the former, see Storable or Data::Dumper; for the latter, it depends whether it's a hash of simple scalars or something more complex.
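As a rough sketch of the serialization route, assuming the Storable module mentioned above: freeze turns the structure into a byte string and thaw rebuilds an independent copy on the other side. The %graph contents here are invented for illustration, since the question does not show the real layout.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical graph data; the real %Graph layout is not shown in the question.
my %graph = ( vertices => [ 'foo', 'bar' ], edges => { foo => ['bar'] } );

# Serialize to a byte string that can be written to a pipe, socket, or file.
my $frozen = freeze(\%graph);

# In the receiving process: rebuild an independent copy of the structure.
my $copy = thaw($frozen);

print join(',', @{ $copy->{vertices} }), "\n";   # foo,bar
```

Note that the frozen string is binary, so it may need encoding before it can travel through an interface that expects text.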
If it is the same perl interpreter in the same process, you can keep a global registry such as %main::hashes: set $main::hashes{\%Graph} = \%Graph before passing the stringified reference back to the calling process, then later use the string to look up the actual hash reference.
Don't do this, though: http://perlmonks.org/?node_id=379395.
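For completeness, here is a minimal sketch of the same-interpreter registry idea described above. The sub names new_graph and add_vertex mirror the VBScript calls in the question, but their bodies are assumptions. The string handle is only ever used as a hash key, never turned back into a reference.

```perl
use strict;
use warnings;

# Registry mapping stringified references back to the real references.
our %hashes;

# Hypothetical stand-in for the question's new_graph method.
sub new_graph {
    my %graph;
    my $handle = "" . \%graph;      # e.g. "HASH(0x2345)"
    $hashes{$handle} = \%graph;     # keeps the hash alive and findable
    return $handle;
}

# Hypothetical stand-in for add_vertex: look the handle up in the registry.
sub add_vertex {
    my ($handle, $name) = @_;
    my $graph = $hashes{$handle} or die "Unknown graph handle: $handle\n";
    $graph->{$name} = {};
    return $handle;
}

my $h = new_graph();
add_vertex($h, 'foo');
print join(',', sort keys %{ $hashes{$h} }), "\n";   # foo
```

Entries stay in the registry until they are deleted explicitly, so a long-running process would need some way to release handles.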

No, you can't reliably pass hash references between processes.

Related

How can I stop or allow a perl object to be used as a hashref?

I have a Perl class which is based on a blessed hashref ( https://github.com/kylemhall/Koha/blob/master/Koha/Object.pm )
This is a community based project, with many developers of varying skill.
What I've seen is some developers accidentally using our objects as hashrefs. The actual data is not stored in the blessed hashref, but in a dbic object that is stored in the hashref ( in $self->{_result} ). When the dev tries something like $object->{id} perl doesn't complain, it just returns undef, as would be expected.
What I'd like to do is to either
A) Make the script explode with an error when this happens
B) Allow the use of the hashref syntax for setting / getting values in the dbic objects stored in $self->{_result}
I tried using:
use overload '%{}' => \&get_hashref;
but when I do this, get_hashref is called any time a regular method is called! This makes sense in a way, since the object itself is a hashref. I'm sure this has something to do with the Perl internals for blessed hashrefs as objects.
Is what I'm trying to accomplish even possible?
I suggest using a scalar-based or array-based object instead of a hash-based object. This is a cheap (efficient) solution since it simply causes the offender to run afoul of existing type checks.
For example, the object produced by the following is simply a reference to the actual object. Just use $$self instead of $self in the methods.
$ perl -e'
sub new {
my $class = shift;
my $self = bless(\{}, $class);
# $$self->{...} = ...;
return $self;
}
my $o = __PACKAGE__->new();
my $id = $o->{id};
'
Not a HASH reference at -e line 9.
One way to do this is to use "inside-out objects", where the object is just a blessed simple scalar and the data is stored separately.
See, for instance, Class::STD
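A hand-rolled sketch of the inside-out idea, without Class::STD, might look like the following. The class name Counter and its count attribute are made up for illustration; the point is that the object is a blessed scalar, so hash-style access fails immediately.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name
use Scalar::Util qw(refaddr);

my %count_of;       # attribute storage, keyed by object address

sub new {
    my ($class, %args) = @_;
    my $self = bless \do { my $anon }, $class;   # blessed scalar, holds no data
    $count_of{ refaddr $self } = $args{start} // 0;
    return $self;
}

sub count { $count_of{ refaddr $_[0] } }

sub DESTROY { delete $count_of{ refaddr $_[0] } }   # avoid leaking entries

package main;

my $c = Counter->new(start => 5);
print $c->count, "\n";                           # 5
my $ok = eval { my $x = $c->{count}; 1 };        # dies: Not a HASH reference
print "hash access refused\n" unless $ok;
```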

Naming convention for passing a hash reference to a subroutine

When I pass a hash reference to a subroutine, what is a good naming convention for the variable to which the hash ref parameter is assigned?
In the example below, what is the better option:
Use the same variable name as the original variable (and let Perl hide the original within the block).
Come up with a new name for the hash ref within the subroutine?
I am inclined to use option 1, because a new name for the same hash reference feels redundant.
If option 2 is better, what would be a good naming convention for the variable within the subroutine ?
## example:
use strict;
use warnings;
my $href_phone_book = generate_phone_book();
my $phone_number = get_phone_number($href_phone_book, "john");
print $phone_number."\n";
sub generate_phone_book {
my %phone_book = (
john => "5554321",
alice => "5551234"
);
return \%phone_book;
}
sub get_phone_number{
# Is it OK to hide the original variable
my $href_phone_book = shift;
# Or should I use a new variable name? What would then be a good naming convention?
my $person = shift;
my $phone_number = $href_phone_book->{$person};
return $phone_number;
}
You should name your variables for the data that they reference. Ideally you should use a short, unambiguous name, and remove context where possible. For example, if your function did something that would apply generically to any hash reference, the name $hash_ref would be preferable IMO to $phone_book.
1. Use the same variable name as the original variable (and let Perl hide the original within the block).
Although not a generically safe assumption, it is what I would do in your example. I would use $phone_book as the single param.
Note if you are using the same reference data structure repeatedly as a parameter to functions, then it points to writing some OO Perl. You should probably create a class, with the data structure the object, and functions changed to instance methods.
Caveat: Your example script will generate a warning due to variable scope. So my advice needs to be modified. I would not alter the subroutine's variable but instead name the global variable something like $main_phone_book . . . but this is only necessary due to the structure of your example script. Ideally you should avoid global data structures, and use modularisation (a PhoneBook.pm file that contained all the logic for handling phone numbers, whether exported functions or a new class, would not have a clash on the variable name because of package namespaces). Or if the variables are truly global to the problem at hand you probably would not be passing them as parameters at all.
2. Come up with a new name for the hash ref within the subroutine?
In this case it is only necessary due to file structure of the example script. Consistency, at least at the level of individual scripts and modules, is more important than finding some special naming convention that can handle all your needs.
I suggest you read the variable naming convention chapters of Perl Best Practices. It is very clear and useful advice.
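To illustrate the suggestion above about moving to OO Perl, here is one possible shape for such a class. The package name PhoneBook comes from the answer; the method name number_for is an assumption.

```perl
use strict;
use warnings;

package PhoneBook;   # class name taken from the answer above

sub new {
    my ($class, %numbers) = @_;
    return bless { numbers => {%numbers} }, $class;
}

# Hypothetical accessor replacing get_phone_number($href, $person).
sub number_for {
    my ($self, $person) = @_;
    return $self->{numbers}{$person};
}

package main;

my $phone_book = PhoneBook->new(john => "5554321", alice => "5551234");
print $phone_book->number_for("john"), "\n";   # 5554321
```

With the data held in the object, the naming question for the hash ref parameter mostly disappears, because methods simply receive $self.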

Perl scoping and the life of local variables

How long does the memory location allocated by a local variable in Perl live for (both for arrays, hashes and scalars)? For instance:
sub routine
{
my $foo = "bar";
return \$foo;
}
Can you still access the string "bar" in memory after the function has returned? How long will it live for, and is it similar to a static variable in C or more like a variable declared off the heap?
Basically, does this make sense in this context?
$ref = routine()
print ${$ref};
Yes, that code will work fine.
Perl uses reference counting, so the variable will live as long as somebody has a reference to it. Perl's lexical variables are sort of like C's automatic variables, because they normally go away when you leave the scope, but they're also like a variable on the heap, because you can return a reference to one and it will just work.
They're not like C's static variables, because you get a new $foo every time you call routine (even recursively). (Perl 5.10 introduced state variables, which are rather like a C static.)
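A small script makes both points concrete. The routine sub is the one from the question; the counter sub is an added example of a state variable.

```perl
use strict;
use warnings;
use feature 'state';

sub routine {
    my $foo = "bar";
    return \$foo;            # the scalar survives because a reference escapes
}

# Added example: a state variable keeps its value between calls.
sub counter {
    state $calls = 0;        # initialized once (Perl 5.10+)
    return ++$calls;
}

my $first  = routine();
my $second = routine();
$$first = "changed";         # each call produced its own $foo

print "$$first $$second\n";  # changed bar
counter();
print counter(), "\n";       # 2
```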

What's the best Perl practice for returning hashes from functions?

I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference them.
($ref_array,$ref_hash) = $this->getData('input');
@array = @{$ref_array};
%hash = %{$ref_hash};
Pass in references (@array, %hash) to the function that will hold the output data.
$this->getData('input', \@array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
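To see the difference, here is a short demonstration with an invented data structure. The shallow copy shares the nested array with the original, while the dclone copy does not.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Illustrative data only.
my $ref_hash = { name => 'orig', tags => ['a'] };

my %shallow = %{$ref_hash};             # copies top-level values only
my %deep    = %{ dclone($ref_hash) };   # copies nested structures too

push @{ $shallow{tags} }, 'b';          # also changes $ref_hash->{tags}
push @{ $deep{tags} },    'c';          # affects only the deep copy
$shallow{name} = 'copy';                # top-level change stays local

print "@{ $ref_hash->{tags} } $ref_hash->{name}\n";   # a b orig
```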
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to
check the data structures to make
sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
@array = @{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my @foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = @_;
# do stuff
return @results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here too for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = @_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.
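A minimal sketch of that approach follows. The sub name get_data and the keys words and length_of are invented for illustration; the question's getData is not shown.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's getData method.
sub get_data {
    my ($input) = @_;
    my @words     = split ' ', $input;
    my %length_of = map { $_ => length($_) } @words;

    # One return value that carries both outputs.
    return { words => \@words, length_of => \%length_of };
}

my $result = get_data('alpha be');
print scalar @{ $result->{words} }, " $result->{length_of}{alpha}\n";   # 2 5
```

The caller gets named outputs and nothing passed in is modified.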

How can I elegantly call a Perl subroutine whose name is held in a variable?

I keep the name of the subroutine I want to call at runtime in a variable called $action. Then I use this to call that sub at the right time:
&{\&{$action}}();
Works fine. The only thing I don't like is that it's ugly and every time I do it, I feel beholden to add a comment for the next developer:
# call the sub by the name of $action
Anyone know a prettier way of doing this?
UPDATE: The idea here was to avoid having to maintain a dispatch table every time I added a new callable sub, since I am the sole developer, I'm not worried about other programmers following or not following the 'rules'. Sacrificing a bit of security for my convenience. Instead my dispatch module would check $action to make sure that 1) it is the name of a defined subroutine and not malicious code to run with eval, and 2) that it wouldn't run any sub prefaced by an underscore, which would be marked as internal-only subs by this naming convention.
Any thoughts on this approach? Whitelisting subroutines in the dispatch table is something I will forget all the time, and my clients would rather me err on the side of "it works" than "it's wicked secure". (very limited time to develop apps)
FINAL UPDATE: I think I've decided on a dispatch table after all. Although I'd be curious if anyone who reads this question has ever tried to do away with one and how they did it, I have to bow to the collective wisdom here. Thanks to all, many great responses.
Rather than storing subroutine names in a variable and calling them, a better way to do this is to use a hash of subroutine references (otherwise known as a dispatch table.)
my %actions = ( foo => \&foo,
bar => \&bar,
baz => sub { print 'baz!' }
...
);
Then you can call the right one easily:
$actions{$action}->();
You can also add some checking to make sure $action is a valid key in the hash, and so forth.
In general, you should avoid symbolic references (what you're doing now) as they cause all kinds of problems. In addition, using real subroutine references will work with strict turned on.
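Putting the table and the validity check together might look like this. The dispatch wrapper and the sub bodies are assumptions added for the example.

```perl
use strict;
use warnings;

sub foo { return "foo called" }
sub bar { return "bar called with @_" }

my %actions = (
    foo => \&foo,
    bar => \&bar,
    baz => sub { return 'baz!' },
);

# Hypothetical wrapper: look the name up first, so unknown names never run.
sub dispatch {
    my ($action, @args) = @_;
    my $handler = $actions{$action}
        or die "Unknown action '$action'\n";
    return $handler->(@args);
}

print dispatch('bar', 1, 2), "\n";            # bar called with 1 2
my $ok = eval { dispatch('system'); 1 };
print "rejected: $@" unless $ok;              # rejected: Unknown action 'system'
```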
Just &$action(), but usually it's nicer to use coderefs from the beginning, or use a dispatcher hash. For example:
my $disp = {foo => \&some_sub, bar => \&some_other_sub };
$disp->{'foo'}->();
Huh? You can just say
$action->()
Example:
sub f { return 11 }
$action = 'f';
print $action->();
$ perl subfromscalar.pl
11
Constructions like
'f'->() # equivalent to &f()
also work.
I'm not sure I understand what you mean. (I think this is another in a recent group of "How can I use a variable as a variable name?" questions, but maybe not.)
In any case, you should be able to assign an entire subroutine to a variable (as a reference), and then call it straightforwardly:
# create the $action variable - a reference to the subroutine
my $action = \&sing_out;
# later - perhaps much later - I call it
$action->();
sub sing_out {
print "La, la, la, la, la!\n"
}
The most important question is: why do you want to use a variable as a function name? What happens if its value is 'eval'?
Is there a list of functions that can be used, or can it be any function? If a list exists, how long is it?
Generally, the best way to handle such cases is to use dispatch tables:
my %dispatch = (
'addition' => \&some_addition_function,
'multiplication' => sub { $self->call_method( @_ ) },
);
And then just:
$dispatch{ $your_variable }->( 'any', 'args' );
__PACKAGE__->can($action)->(@args);
For more info on can(): http://perldoc.perl.org/UNIVERSAL.html
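Here is a sketch of how can() might be combined with the checks from the question's update (the name must be a defined sub and must not start with an underscore). The wrapper call_action and the two example subs are hypothetical.

```perl
use strict;
use warnings;

sub greet    { return "hello @_" }
sub _private { return "internal" }

# Hypothetical wrapper: resolve a sub by name in the current package,
# refusing names that start with an underscore or are not plain identifiers.
sub call_action {
    my ($action, @args) = @_;
    die "Bad action name\n" unless $action =~ /\A[A-Za-z]\w*\z/;
    my $code = __PACKAGE__->can($action)
        or die "No such action '$action'\n";
    return $code->(@args);
}

print call_action('greet', 'world'), "\n";     # hello world
my $refused = !eval { call_action('_private'); 1 };
print "refused\n" if $refused;
```

Keep in mind that can() also finds imported functions, inherited methods such as can and isa, and call_action itself, so this is looser than an explicit dispatch table.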
I do something similar. I split it into two lines to make it slightly more identifiable, but it's not a lot prettier.
my $sub = \&{$action};
$sub->();
I do not know of a more correct or prettier way of doing it. For what it's worth, we have production code that does what you are doing, and it works without having to disable use strict.
Every package in Perl is already a hash table. You can add elements and reference them by the normal hash operations. In general it is not necessary to duplicate the functionality by an additional hash table.
#! /usr/bin/perl -T
use strict;
use warnings;
my $tag = 'HTML';
*::->{$tag} = sub { print '<html>', @_, '</html>', "\n" };
HTML("body1");
*::->{$tag}("body2");
The code prints:
<html>body1</html>
<html>body2</html>
If you need a separate name space, you can define a dedicated package.
See perlmod for further information.
Either use
&{\&{$action}}();
Or use eval to execute the function:
eval("$action()");
I did it in this way:
@func = qw(cpu mem net disk);
foreach my $item (@func){
$ret .= &$item(1);
}
If it's only in one program, write a function that calls a subroutine using a variable name, and only have to document it/apologize once?
I used this: it works for me.
(\$action)->();
Or you can use 'do', quite similar with previous posts:
$p = do { \&$conn;};
$p->();