In Perl, on what basis should one choose between passing a reference to a hash/array and passing a hashref/arrayref scalar? - perl

To be brief, I'm wondering if there are any best-practice reasons for deciding between:
my %hash = ( foo => 1, bar => 2 );
# some in-between logic
some_func(\%hash);
and
my $hashref = { foo => 1, bar => 2 };
# some in-between logic
some_func($hashref);
Or is it purely a style decision?

The two are equivalent and interchangeable. The decision should be made based on what is clearest for you.
You can also move back and forth:
my $hashref = {x => 1, y => 2};
our %hash; *hash = $hashref;
some_func($hashref);
some_func(\%hash); # \%hash == $hashref
In general I prefer to work with the plural forms %name and @name, since it results in less line noise due to dereferencing. Also, some_func(\%var) is clearer with regard to var's type than some_func($var).
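To make the equivalence concrete, here is a minimal sketch. The helper name count_keys is made up for illustration; the point is only that the callee receives a hash reference in both cases and cannot tell which style the caller used.

```perl
use strict;
use warnings;

# Hypothetical helper: it receives a hash reference either way.
sub count_keys {
    my ($href) = @_;
    return scalar keys %$href;
}

my %hash    = ( foo => 1, bar => 2 );
my $hashref = { foo => 1, bar => 2 };

# Both calls hand the function a scalar holding a hash reference.
my $from_named = count_keys(\%hash);
my $from_anon  = count_keys($hashref);

# ref() reports the same type for both kinds of reference.
my $same_type = ref(\%hash) eq ref($hashref);

print "$from_named $from_anon\n";
```

So the choice only affects how the calling code reads, not what some_func sees.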

These two examples do exactly the same thing. So it mainly depends on which is more convenient for what else, if anything, you do with the %list variable and/or $listref variable.
Or maybe you'd like to skip the extra variable entirely:
some_func( { foo => 1, bar => 2 } );
(Less likely now that you've added those "in-between logic" comments.)

As indicated above, they're the same though the first may give you better context to what type of variable (i.e. a hash) you're using in your "some in-between logic" code.

In most cases it won't make a difference, but I have had some nasty bugs that occurred from trying to change where my reference points if I pass in a reference-of list.
Unless I really want to change the contents of a pre-existing array or hash, I personally favor using anonymous lists/hashes (like your second example).
This is just personal experience, though. I will admit that maybe if I understood Perl better I would know the objective best practices (if any exist).

Related

*why* does list assignment flatten its left hand side?

I understand that list assignment flattens its left hand side:
my ($a, $b, $c);
($a, ($b, $c)) = (0, (1.0, 1.1), 2);
say "\$a: $a"; # OUTPUT: «$a: 0»
say "\$b: $b"; # OUTPUT: «$b: 1 1.1» <-- $b is *not* 1
say "\$c: $c"; # OUTPUT: «$c: 2» <-- $c is *not* 1.1
I also understand that we can use :($a, ($b, $c)) := (0, (1.0, 1.1)) to get the non-flattening behavior.
What I don't understand is why the left hand side is flattened during list assignment. And that's kind of two questions: First, how does this flattening behavior fit in with the rest of the language? Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
On the first question, I know that Raku historically had a lot of auto-flattening behavior. Before the Great List Refactor, an expression like my @a = 1, (2, 3), 4 would auto-flatten its right hand side, resulting in the Array [1, 2, 3, 4]; similarly, map and many other iterating constructs would flatten their arguments. Post-GLR, though, Raku basically never flattens a list without being told to. In fact, I can't think of any other situation where Raku flattens without flat, .flat, |, or *@ being involved somehow. (@_ creates an implicit *@). Am I missing something, or is the LHS behavior in list assignment really inconsistent with post-GLR semantics? Is this behavior a historical oddity, or does it still make sense?
With respect to my second question, I suspect that the flattening behavior of list assignment may somehow help support for laziness. For example, I know that we can use list assignment to consume certain values from a lazy list without producing/calculating them all – whereas using := with a list will need to calculate all of the RHS values. But I'm not sure if/how auto-flattening the LHS is required to support this behavior.
I also wonder if the auto-flattening has something to do with the fact that = can be passed to meta operators – unlike :=, which generates a "too fiddly" error if used with a metaoperator. But I don't know how/if auto-flattening makes = less "fiddly".
[edit: I've found IRC references to the "(GLR-preserved) decision that list assignment is flattening" as early as 2015-05-02, so it's clear that this decision was intentional and well-justified. But, so far, I haven't found that justification and suspect that it may have been decided at in-person meetings. So I'm hoping someone knows.]
Finally, I also wonder how the LHS is flattened, at a conceptual level. (I don't mean in the Rakudo implementation specifically; I mean as a mental model). Here's how I'd been thinking about binding versus list assignment:
my ($a, :$b) := (4, :a(2)); # Conceptually similar to calling .Capture on the RHS
my ($c, $d, $e);
($c, ($d, $e)) = (0, 1, 2); # Conceptually similar to calling flat on the LHS
Except that actually calling .Capture on the RHS in line 1 works, whereas calling flat on the LHS in line 3 throws a Cannot modify an immutable Seq error – which I find very confusing, given that we flatten Seqs all the time. So is there a better mental model for thinking about this auto-flattening behavior?
Thanks in advance for any help. I'm trying to understand this better as part of my work to improve the related docs, so any insight you can provide would support that effort.
Somehow, answering the questions parts in the opposite order felt more natural to me. :-)
Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
It's relatively common to want to assign the first (or first few) items of a list into scalars and have the rest placed into an array. List assignment descending into iterables on the left is what makes this work:
my ($first, $second, @rest) = 1..5;
.say for $first, $second, @rest;
The output being:
1
2
[3 4 5]
With binding, which respects structure, it would instead be more like:
my ($first, $second, *@rest) := |(1..5);
First, how does this flattening behavior fit in with the rest of the language?
In general, operations where structure would not have meaning flatten it away. For example:
# Process arguments
my $proc = Proc::Async.new($program, @some-args, @some-others);
# Promise combinators
await Promise.anyof(@downloads, @uploads);
# File names
unlink @temps, @previous-output;
# Hash construction
my @a = x => 1, y => 2;
my @b = z => 3;
dd hash @a, @b; # {:x(1), :y(2), :z(3)}
List assignment could, of course, have been defined in a structure-respecting way instead. These things tend to happen for multiple reasons, but for one, the language already has binding for when you do want to do structured things, and for another, my ($first, @rest) = @all is just a bit too common to send folks wanting it down the binding/slurpy power tool path.

Is instantiating a hash in a function inefficient in perl?

Is there any difference in doing the following, efficiency, bad practice...?
(In a context of bigger hashes and sending them through many functions)
sub function {
my ($self, $hash_ref) = @_;
my %hash = %{$hash_ref};
print $hash{$key};
return;
}
Compared to:
sub function {
my ($self, $hash_ref) = @_;
print $hash_ref->{$key};
return;
}
Let's say %$hash_ref contains N elements.
The first snippet does the following in addition to what the second snippet does:
Creates N scalars. (Can involve multiple memory allocations each.)
Adds N*2 scalars to the stack. (Cheap.)
Creates a hash. (More memory allocations...)
Adds N elements to a hash. (More memory allocations...)
The second snippet does the following in addition to what the first snippet does:
[Nothing. It's a complete subset of the first snippet]
The first snippet is therefore far less efficient than the second. It is also more complicated by virtue of having extra code. The complete lack of benefit and the numerous costs dictate that one should avoid the pattern used in the first snippet.
1st snippet is silly. But it's convenient practice to emulate named arguments:
sub function {
my ($self, %params) = @_;
...
}
So, pass arrays/hashes by reference; creating a new (especially big) hash will be much slower. But there is nothing wrong with the "named arguments" hack.
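A minimal sketch of that "named arguments" idiom, written as a plain function rather than a method. The name describe and its color/size parameters are hypothetical; putting defaults before @_ lets the caller's pairs override them.

```perl
use strict;
use warnings;

# Hypothetical sub: the flat argument list is read as key/value pairs.
sub describe {
    # Defaults come first; later pairs from the caller replace them.
    my %params = ( color => 'red', size => 1, @_ );
    return "$params{color}/$params{size}";
}

my $default  = describe();              # red/1
my $override = describe( size => 3 );   # red/3
print "$default $override\n";
```

The hash built here is small and local to the call, which is why this idiom does not carry the cost of copying a big caller-owned hash.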
And did you know that key/value slices exist (v5.20+ only)? You can easily copy part of a hash this way:
my %foo = ( one => 1, two => 2, three => 3, four => 4);
my %bar = %foo{'one', 'four'};
More information in perldoc perldata
The first version of the sub creates a local copy of the data structure whose reference is passed to it. As such it is far less efficient, of course.
There is one legitimate reason for this: to make sure the data in the caller isn't changed. That local %hash can be changed in the sub as needed or convenient and the data in the calling code is not affected. This way the data in the caller is also protected against accidental changes.
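A small sketch of that protection, assuming a flat hash with a single count key (bump_copy and bump_in_place are illustrative names, not from the question):

```perl
use strict;
use warnings;

# Works on a local copy: the caller's hash is left alone.
sub bump_copy {
    my ($hash_ref) = @_;
    my %hash = %{$hash_ref};   # local (shallow) copy
    $hash{count}++;            # changes only the copy
    return $hash{count};
}

# Works through the reference: the caller's hash is modified.
sub bump_in_place {
    my ($hash_ref) = @_;
    $hash_ref->{count}++;
    return $hash_ref->{count};
}

my %data = ( count => 1 );

bump_copy(\%data);
my $after_copy = $data{count};       # still 1

bump_in_place(\%data);
my $after_in_place = $data{count};   # now 2

print "$after_copy $after_in_place\n";
```

This only holds for the top level of the hash; nested references inside the copy still point at the caller's data.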
Another reason why a local copy of data is done, in particular with deeper data structures, is to avoid long chains of dereferencing and thus simplify code; so parts of deep hierarchies may be copied for simpler access. Then this is merely for (presumed) programming convenience.
So in the shown example there'd be absolutely no reason to make a local copy. However, presumably the question is about subs where more work is done and then what's best depends on details.

What's the best Perl practice for returning hashes from functions?

I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
@array = @{$ref_array};
%hash = %{$ref_hash};
Pass in references (@array, %hash) to the function that will hold the output data.
$this->getData('input', \@array, \%hash);
Just return the reference. There is no need to dereference the whole hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
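Here is a sketch of the difference, using made-up data. Storable ships with Perl and dclone is its deep-copy function; the name and tags keys are only for illustration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $ref_hash = { name => 'orig', tags => [ 'a', 'b' ] };

my %shallow = %{$ref_hash};             # top-level keys copied, nested refs shared
my %deep    = %{ dclone($ref_hash) };   # nested structures copied as well

push @{ $shallow{tags} }, 'c';   # also changes $ref_hash->{tags}
push @{ $deep{tags} },    'd';   # does not touch the original

my $orig_tags = join ',', @{ $ref_hash->{tags} };   # a,b,c
my $deep_tags = join ',', @{ $deep{tags} };         # a,b,d

print "$orig_tags $deep_tags\n";
```

The 'c' pushed through the shallow copy shows up in the original, while the 'd' pushed into the deep copy does not.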
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to check the data structures to make sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
@array = @{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my @foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = @_;
# do stuff
return @results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here too for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = @_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.

How can I elegantly call a Perl subroutine whose name is held in a variable?

I keep the name of the subroutine I want to call at runtime in a variable called $action. Then I use this to call that sub at the right time:
&{\&{$action}}();
Works fine. The only thing I don't like is that it's ugly and every time I do it, I feel beholden to add a comment for the next developer:
# call the sub by the name of $action
Anyone know a prettier way of doing this?
UPDATE: The idea here was to avoid having to maintain a dispatch table every time I added a new callable sub, since I am the sole developer, I'm not worried about other programmers following or not following the 'rules'. Sacrificing a bit of security for my convenience. Instead my dispatch module would check $action to make sure that 1) it is the name of a defined subroutine and not malicious code to run with eval, and 2) that it wouldn't run any sub prefaced by an underscore, which would be marked as internal-only subs by this naming convention.
Any thoughts on this approach? Whitelisting subroutines in the dispatch table is something I will forget all the time, and my clients would rather me err on the side of "it works" than "it's wicked secure". (very limited time to develop apps)
FINAL UPDATE: I think I've decided on a dispatch table after all. Although I'd be curious if anyone who reads this question has ever tried to do away with one and how they did it, I have to bow to the collective wisdom here. Thanks to all, many great responses.
Rather than storing subroutine names in a variable and calling them, a better way to do this is to use a hash of subroutine references (otherwise known as a dispatch table.)
my %actions = ( foo => \&foo,
bar => \&bar,
baz => sub { print 'baz!' }
...
);
Then you can call the right one easily:
$actions{$action}->();
You can also add some checking to make sure $action is a valid key in the hash, and so forth.
In general, you should avoid symbolic references (what you're doing now) as they cause all kinds of problems. In addition, using real subroutine references will work with strict turned on.
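One way that check might look, as a sketch. The dispatch wrapper and the sample actions are hypothetical; the lookup simply refuses any name that is not a key in the table.

```perl
use strict;
use warnings;

# Hypothetical actions for illustration.
my %actions = (
    foo => sub { return 'foo called' },
    bar => sub { return "bar got @_" },
);

# Look the name up and refuse anything that is not in the table.
sub dispatch {
    my ($action, @args) = @_;
    my $code = $actions{$action}
        or die "Unknown action '$action'\n";
    return $code->(@args);
}

my $ok = dispatch('bar', 1, 2);   # bar got 1 2

# An unlisted name dies instead of running arbitrary code.
my $err = eval { dispatch('system'); 1 } ? '' : $@;

print "$ok\n$err";
```

Because only table entries can run, a hostile or mistyped $action fails loudly rather than reaching some other sub.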
Just &$action(), but usually it's nicer to use coderefs from the beginning, or use a dispatcher hash. For example:
my $disp = {foo => \&some_sub, bar => \&some_other_sub };
$disp->{'foo'}->();
Huh? You can just say
$action->()
Example:
sub f { return 11 }
$action = 'f';
print $action->();
$ perl subfromscalar.pl
11
Constructions like
'f'->() # equivalent to &f()
also work.
I'm not sure I understand what you mean. (I think this is another in a recent group of "How can I use a variable as a variable name?" questions, but maybe not.)
In any case, you should be able to assign an entire subroutine to a variable (as a reference), and then call it straightforwardly:
# create the $action variable - a reference to the subroutine
my $action = \&sing_out;
# later - perhaps much later - I call it
$action->();
sub sing_out {
print "La, la, la, la, la!\n"
}
The most important thing is: why do you want to use a variable as a function name? What will happen if it is 'eval'?
Is there a list of functions that can be used? Or can it be any function? If a list exists, how long is it?
Generally, the best way to handle such cases is to use dispatch tables:
my %dispatch = (
'addition' => \&some_addition_function,
'multiplication' => sub { $self->call_method( @_ ) },
);
And then just:
$dispatch{ $your_variable }->( 'any', 'args' );
__PACKAGE__->can($action)->(@args);
For more info on can(): http://perldoc.perl.org/UNIVERSAL.html
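A sketch of how can() could be combined with the underscore rule from the question's update. The call_by_name wrapper and the two sample subs are hypothetical. Note that can() also finds inherited methods such as can and isa themselves, so real code would probably want a stricter check.

```perl
use strict;
use warnings;

sub greet   { return "hello @_" }
sub _secret { return 'internal' }

# Hypothetical wrapper: resolves a sub by name in the current package.
sub call_by_name {
    my ($action, @args) = @_;

    # Treat a leading underscore as "internal only".
    die "Refusing internal sub '$action'\n" if $action =~ /^_/;

    # can() returns a code reference, or a false value if no such sub exists.
    my $code = __PACKAGE__->can($action)
        or die "No such sub '$action'\n";

    return $code->(@args);
}

my $greeting = call_by_name('greet', 'world');              # hello world
my $blocked  = eval { call_by_name('_secret'); 1 } ? 0 : 1; # 1: refused
my $missing  = eval { call_by_name('nope');    1 } ? 0 : 1; # 1: not found

print "$greeting $blocked $missing\n";
```

This works under use strict because no symbolic reference is ever dereferenced; can() hands back a real code reference.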
I do something similar. I split it into two lines to make it slightly more identifiable, but it's not a lot prettier.
my $sub = \&{$action};
$sub->();
I do not know of a more correct or prettier way of doing it. For what it's worth, we have production code that does what you are doing, and it works without having to disable use strict.
Every package in Perl is already a hash table. You can add elements and reference them by the normal hash operations. In general it is not necessary to duplicate the functionality by an additional hash table.
#! /usr/bin/perl -T
use strict;
use warnings;
my $tag = 'HTML';
*::->{$tag} = sub { print '<html>', @_, '</html>', "\n" };
HTML("body1");
*::->{$tag}("body2");
The code prints:
<html>body1</html>
<html>body2</html>
If you need a separate name space, you can define a dedicated package.
See perlmod for further information.
Either use
&{\&{$action}}();
Or use eval to execute the function:
eval("$action()");
I did it in this way:
@func = qw(cpu mem net disk);
foreach my $item (@func){
$ret .= &$item(1);
}
If it's only in one program, write a function that calls a subroutine using a variable name, and only have to document it/apologize once?
I used this: it works for me.
(\$action)->();
Or you can use 'do', quite similar with previous posts:
$p = do { \&$conn;};
$p->();

Any good collection module in perl?

Can someone suggest a good module in perl which can be used to store collection of objects?
Or is ARRAY a good enough substitute for most of the needs?
Update:
I am looking for a collections class because I want to be able to do an operation like compute collection level property from each element.
Since I need to perform many such operations, I might as well write a class which can be extended by individual objects. This class will obviously work with arrays (or may be hashes).
There are collection modules for more complex structures, but it is common style in Perl to use Arrays for arrays, stacks and lists. Perl has built in functions for using the array as a stack or list : push/pop, shift/unshift, splice (inserting or removing in the middle) and the foreach form for iteration.
Perl also has a map, called a hashmap which is the equivalent to a Dictionary in Python - allowing you to have an association between a single key and a single value.
Perl developers often compose these two data-structures to build what they need - need multiple values? Store array-references in the value part of the hashtable (Map). Trees can be built in a similar manner - if you need unique keys, use multiple levels of hashmaps, or if you don't, use nested array references.
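For instance, a hash whose values are array references gives you several values per key. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

# Hypothetical data: several values per key, stored as array references.
my %pets_by_owner;
push @{ $pets_by_owner{alice} }, 'cat', 'dog';   # autovivifies the inner array
push @{ $pets_by_owner{bob} },   'snake';

my $alice_count = scalar @{ $pets_by_owner{alice} };   # 2
my $bobs_first  = $pets_by_owner{bob}[0];              # snake

print "$alice_count $bobs_first\n";
```

No module is needed for this; the inner arrays spring into existence the first time something is pushed onto them.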
These two primitive collection types in Perl don't have an Object Oriented api, but they still are collections.
If you look on CPAN you'll likely find modules that provide other Object Oriented data structures, it really depends on your need. Is there a particular data structure you need besides a List, Stack or Map? You might get a more precise answer (eg a specific module) if you're asking about a particular data structure.
Forgot to mention, if you're looking for small code examples across a variety of languages, PLEAC (Programming Language Examples Alike Cookbook) is a decent resource.
I would second Michael Carman's comment: please do not use the term "Hashmap" or "map" when you mean a hash or associative array. Especially when Perl has a map function; that just confuses things.
Having said that, Kyle Burton's response is fundamentally sound: either a hash or an array, or a complex structure composed of a mixture of the two, is usually enough. Perl groks OO, but doesn't enforce it; chances are that a loosely-defined data structure may be good enough for what you need.
Failing that, please define more exactly what you mean by "compute collection level property from each element". And bear in mind that Perl has keywords like map and grep that let you do functional programming things like e.g.
my $record = get_complex_structure();
# $record = {
# 'widgets' => {
# name => 'ACME Widgets',
# skus => [ 'WIDG01', 'WIDG02', 'WIDG03' ],
# sales => {
# WIDG01 => { num => 25, value => 105.24 },
# WIDG02 => { num => 10, value => 80.02 },
# WIDG03 => { num => 8, value => 205.80 },
# },
# },
# ### and so on for 'grommets', 'nuts', 'bolts' etc.
# }
my @standouts =
map { $_->[0] }
sort {
$b->[2] <=> $a->[2]
|| $b->[1] <=> $a->[1]
|| $record->{$a->[0]}->{name} cmp $record->{$b->[0]}->{name}
}
map {
my ($num, $value);
for my $sku (@{$record->{$_}{skus}}) {
$num += $record->{$_}{sales}{$sku}{num};
$value += $record->{$_}{sales}{$sku}{value};
}
[ $_, $num, $value ];
}
keys %$record;
Reading from back to front, this particular Schwartzian transform does three things:
3) It takes a key to $record, goes through the SKUs defined in this arbitrary structure, and works out the aggregate number and total value of transactions. It returns an anonymous array containing the key, the number of transactions and the total value.
2) The next block takes in a number of arrayrefs and sorts them a) first of all by comparing the total value, numerically, in descending order; b) if the values are equal, by comparing the number of transactions, numerically in descending order; and c) if that fails, by sorting asciibetically on the name associated with this order.
1) Finally, we take the key to $record from the sorted data structure, and return that.
It may well be that you don't need to set up a separate class to do what you want.
I would normally use an @array or a %hash.
What features are you looking for that aren't provided by those?
Base your decision on how you need to access the objects. If pushing them onto an array, indexing into, popping/shifting them off works, then use an array. Otherwise hash them by some key or organize them into a tree of objects that meets your needs. A hash of objects is a very simple, powerful, and highly-optimized way of doing things in Perl.
Since Perl arrays can easily be appended to, resized, sorted, etc., they are good enough for most "collection" needs. In cases where you need something more advanced, a hash will generally do. I wouldn't recommend that you go looking for a collection module until you actually need it.
Either an array or a hash can store a collection of objects. A class might be better if you want to work with the class in certain ways but you'd have to tell us what those ways are before we could make any good recommendations.
I would stick with an ARRAY or a HASH.
@names = ('Paul','Michael','Jessica','Megan');
and
my %petsounds = ("cat" => "meow",
"dog" => "woof",
"snake" => "hiss");
It depends a lot; there's Sparse Matrix modules, some forms of persistence, a new style of OO etc
Most people just man perldata, perllol, perldsc to answer their specific issue with a data structure.