Any good collection module in perl? - perl

Can someone suggest a good module in perl which can be used to store collection of objects?
Or is ARRAY a good enough substitute for most of the needs?
Update:
I am looking for a collections class because I want to be able to do an operation like compute collection level property from each element.
Since I need to perform many such operations, I might as well write a class which can be extended by individual objects. This class will obviously work with arrays (or may be hashes).

There are collection modules for more complex structures, but it is common style in Perl to use Arrays for arrays, stacks and lists. Perl has built in functions for using the array as a stack or list : push/pop, shift/unshift, splice (inserting or removing in the middle) and the foreach form for iteration.
Perl also has a map, called a hashmap which is the equivalent to a Dictionary in Python - allowing you to have an association between a single key and a single value.
Perl developers often compose these two data-structures to build what they need - need multiple values? Store array-references in the value part of the hashtable (Map). Trees can be built in a similar manner - if you need unique keys, use multiple-levels of hashmaps, or if you don't use nested array references.
These two primitive collection types in Perl don't have an Object Oriented api, but they still are collections.
If you look on CPAN you'll likely find modules that provide other Object Oriented data structures, it really depends on your need. Is there a particular data structure you need besides a List, Stack or Map? You might get a more precise answer (eg a specific module) if you're asking about a particular data structure.
Forgot to mention, if you're looking for small code examples across a variety of languages, PLEAC (Programming Language Examples Alike Cookbook) is a decent resource.

I would second Michael Carman's comment: please do not use the term "Hashmap" or "map" when you mean a hash or associative array. Especially when Perl has a map function; that just confuses things.
Having said that, Kyle Burton's response is fundamentally sound: either a hash or an array, or a complex structure composed of a mixture of the two, is usually enough. Perl groks OO, but doesn't enforce it; chances are that a loosely-defined data structure may be good enough for what you need.
Failing that, please define more exactly what you mean by "compute collection level property from each element". And bear in mind that Perl has keywords like map and grep that let you do functional programming things like e.g.
my $record = get_complex_structure();
# $record = {
# 'widgets' => {
# name => 'ACME Widgets',
# skus => [ 'WIDG01', 'WIDG02', 'WIDG03' ],
# sales => {
# WIDG01 => { num => 25, value => 105.24 },
# WIDG02 => { num => 10, value => 80.02 },
# WIDG03 => { num => 8, value => 205.80 },
# },
# },
# ### and so on for 'grommets', 'nuts', 'bolts' etc.
# }
my #standouts =
map { $_->[0] }
sort {
$b->[2] <=> $a->[2]
|| $b->[1] <=> $a->[1]
|| $record->{$a->[0]}->{name} cmp $record->{$b->[0]}->{name}
}
map {
my ($num, $value);
for my $sku (#{$record->{$_}{skus}}) {
$num += $record->{$_}{sales}{$sku}{num};
$value += $record->{$_}{sales}{$sku}{value};
}
[ $_, $num, $value ];
}
keys %$record;
Reading from back to front, this particular Schwarztian transform does three things:
3) It takes a key to $record, goes through the SKUs defined in this arbitrary structure, and works out the aggregate number and total value of transactions. It returns an anonymous array containing the key, the number of transactions and the total value.
2) The next block takes in a number of arrayrefs and sorts them a) first of all by comparing the total value, numerically, in descending orders; b) if the values are equal, by comparing the number of transactions, numerically in descending order; and c) if that fails, by sorting asciibetically on the name associated with this order.
1) Finally, we take the key to $record from the sorted data structure, and return that.
It may well be that you don't need to set up a separate class to do what you want.

I would normally use an #array or a %hash.
What features are you looking for that aren't provided by those?

Base your decision on how you need to access the objects. If pushing them onto an array, indexing into, popping/shifting them off works, then use an array. Otherwise hash them by some key or organize them into a tree of objects that meets your needs. A hash of objects is a very simple, powerful, and highly-optimized way of doing things in Perl.

Since Perl arrays can easily be appended to, resized, sorted, etc., they are good enough for most "collection" needs. In cases where you need something more advanced, a hash will generally do. I wouldn't recommend that you go looking for a collection module until you actually need it.

Either an array or a hash can store a collection of objects. A class might be better if you want to work with the class in certain ways but you'd have to tell us what those ways are before we could make any good recommendations.

i would stick with an ARRAY or a HASH.
#names = ('Paul','Michael','Jessica','Megan');
and
my %petsounds = ("cat" => "meow",
"dog" => "woof",
"snake" => "hiss");
source

It depends a lot; there's Sparse Matrix modules, some forms of persistence, a new style of OO etc
Most people just man perldata, perllol, perldsc to answer their specific issue with a data structure.

Related

Is instantiating a hash in a function inefficient in perl?

Is there any difference in doing the following, efficiency, bad practice...?
(In a context of bigger hashes and sending them through many functions)
sub function {
my ($self, $hash_ref) = #_;
my %hash = %{$hash_ref};
print $hash{$key};
return;
}
Compared to:
sub function {
my ($self, $hash_ref) = #_;
print $hash_ref->{$key};
return;
}
Let's say %$hash_ref contains N elements.
The first snippet does the following in addition to what the second snippet does:
Creates N scalars. (Can involve multiple memory allocations each.)
Adds N*2 scalars to the stack. (Cheap.)
Creates a hash. (More memory allocations...)
Adds N elements to a hash. (More memory allocations...)
The second snippet does the following in addition to what the first snippet does:
[Nothing. It's a complete subset of the first snippet]
The first snippet is therefore far less efficient than the second. It also more complicated by virtue of having extra code. The complete lack of benefit and the numerous costs dictate that one should avoid the pattern used in the first snippet.
1st snippet is silly. But it's convenient practice to emulate named arguments:
sub function {
my ($self, %params ) = #_;
...
}
So, pass arrays/hashes by reference, creation of new (especially big) hash will be much slower. But there is nothing bad in "named arguments" hack.
And did you now that there exist key/value slice (v5.20+ only)? You can copy part of hash easily this way:
my %foo = ( one => 1, two => 2, three => 3, four => 4);
my %bar = %foo{'one', 'four'};
More information in perldoc perldata
The first version of the sub creates a local copy of the data structure which reference is passed to it. As such it is far less efficient, of course.
There is one legitimate reason for this: to make sure the data in the caller isn't changed. That local %hash can be changed in the sub as needed or convenient and the data in the calling code is not affected. This way the data in the caller is also protected against accidental changes.
Another reason why a local copy of data is done, in particular with deeper data structures, is to avoid long chains of dereferencing and thus simplify code; so parts of deep hierarchies may be copied for simpler access. Then this is merely for (presumed) programming convenience.
So in the shown example there'd be absolutely no reason to make a local copy. However, presumably the question is about subs where more work is done and then what's best depends on details.

Perl Autovivication Use case

I came across this piece of code (modified excerpt):
my $respMap;
my $respIdArray;
foreach my $respId (#$someList) {
push(#$respIdArray, $respId);
}
$respMap->{'ids'} = $respIdArray;
return $respMap;
Is there a reason to use autovivication in this case? Why not simply do
my $respMap;
my #respIdArray;
foreach my $respId (#$someList) {
push(#respIdArray, $respId);
}
$respMap->{'ids'} = \#respIdArray;
return $respMap;
Follow up: Could someone give me a good use case of autovivication?
Either way is correct; first one using array reference $respIdArray, and second plain array #respIdArray. You'll need array references when building complex data structures (check perldoc perlreftut), but other than that it's up to you which one you'll choose.
Note that in both cases you're assigning array reference to $respMap->{'ids'}, so examples are actually pretty similar.
And btw, autovivification is another thing and has to do with dynamic creation of data structures.
Autovivication is more useful when dealing with deep structures.
push( #{$hash{'key'}{$subkey}}, 'value' );

In Perl, on what basis should one choose between passing a reference to a hash/array and passing a hashref/arrayref scalar?

To be brief, I'm wondering if there are any best-practice reasons for deciding between:
my %hash = ( foo => 1, bar => 2 );
# some in-between logic
some_func(\%hash);
and
my $hashref = { foo => 1, bar => 2 };
# some in-between logic
some_func($hashref);
Or is purely a style decision?
The two are equivalent and interchangeable. The decision should be made based on what is clearest for you.
You can also move back and forth:
my $hashref = {x => 1, y => 2};
our %hash; *hash = $hashref;
some_func($hashref);
some_func(\%hash); # \%hash == $hashref
In general I prefer to work with the plural forms %name and #name since it results in less line noise due to dereferencing. That and some_func(\%var) is clearer with regard to var's type than some_func($var)
These two examples do exactly the same thing. So it mainly depends on which is more convenient for what else, if anything, you do with the %list variable and/or $listref variable.
Or maybe you'd like to skip the extra variable entirely:
some_func( { foo => 1, bar => 2 } );
(Less likely now that you've added those "in-between logic" comments.)
As indicated above, they're the same though the first may give you better context to what type of variable (i.e. a hash) you're using in your "some in-between logic" code.
In most cases it won't make a difference, but I have had some nasty bugs that occurred from trying to change where my reference points if I pass in a reference-of list.
Unless I really want to change the contents of a pre-existing array or hash, I personally favor using anonymous lists/hashes (like your second example).
This is just personal experience, though. I will admit that maybe if I understood Perl better I would know the objective best practices (if any exist).

What's the best Perl practice for returning hashes from functions?

I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
#array = #{$ref_array};
%hash = %{$ref_hash};
Pass in references (#array, %hash) to the function that will hold the output data.
$this->getData('input', \#array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to
check the data structures to make
sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
#array = #{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my #foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = #_;
# do stuff
return #results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here to for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = #_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.

Is there a Perl solution for lazy lists this side of Perl 6?

Has anybody found a good solution for lazily-evaluated lists in Perl? I've tried a number of ways to turn something like
for my $item ( map { ... } #list ) {
}
into a lazy evaluation--by tie-ing #list, for example. I'm trying to avoid breaking down and writing a source filter to do it, because they mess with your ability to debug the code. Has anybody had any success. Or do you just have to break down and use a while loop?
Note: I guess that I should mention that I'm kind of hooked on sometimes long grep-map chains for functionally transforming lists. So it's not so much the foreach loop or the while loop. It's that map expressions tend to pack more functionality into the same vertical space.
As mentioned previously, for(each) is an eager loop, so it wants to evaluate the entire list before starting.
For simplicity, I would recommend using an iterator object or closure rather than trying to have a lazily evaluated array. While you can use a tie to have a lazily evaluated infinite list, you can run into troubles if you ever ask (directly or indirectly, as in the foreach above) for the entire list (or even the size of the entire list).
Without writing a full class or using any modules, you can make a simple iterator factory just by using closures:
sub make_iterator {
my ($value, $max, $step) = #_;
return sub {
return if $value > $max; # Return undef when we overflow max.
my $current = $value;
$value += $step; # Increment value for next call.
return $current; # Return current iterator value.
};
}
And then to use it:
# All the even numbers between 0 - 100.
my $evens = make_iterator(0, 100, 2);
while (defined( my $x = $evens->() ) ) {
print "$x\n";
}
There's also the Tie::Array::Lazy module on the CPAN, which provides a much richer and fuller interface to lazy arrays. I've not used the module myself, so your mileage may vary.
All the best,
Paul
[Sidenote: Be aware that each individual step along a map/grep chain is eager. If you give it a big list all at once, your problems start much sooner than at the final foreach.]
What you can do to avoid a complete rewrite is to wrap your loop with an outer loop. Instead of writing this:
for my $item ( map { ... } grep { ... } map { ... } #list ) { ... }
… write it like this:
while ( my $input = calculcate_next_element() ) {
for my $item ( map { ... } grep { ... } map { ... } $input ) { ... }
}
This saves you from having to significantly rewrite your existing code, and as long as the list does not grow by several orders of magnitude during transformation, you get pretty nearly all the benefit that a rewrite to iterator style would offer.
If you want to make lazy lists, you'll have to write your own iterator. Once you have that, you can use something like Object::Iterate which has iterator-aware versions of map and grep. Take a look at the source for that module: it's pretty simple and you'll see how to write your own iterator-aware subroutines.
Good luck, :)
There is at least one special case where for and foreach have been optimized to not generate the whole list at once. And that is the range operator. So you have the option of saying:
for my $i (0..$#list) {
my $item = some_function($list[$i]);
...
}
and this will iterate through the array, transformed however you like, without creating a long list of values up front.
If you wish your map statement to return variable numbers of elements, you could do this instead:
for my $i (0..$#array) {
for my $item (some_function($array[$i])) {
...
}
}
If you wish more pervasive laziness than this, then your best option is to learn how to use closures to generate lazy lists. MJD's excellent book Higher Order Perl can walk you through those techniques. However do be warned that they will involve far larger changes to your code.
Bringing this back from the dead to mention that I just wrote the module List::Gen on CPAN which does exactly what the poster was looking for:
use List::Gen;
for my $item ( #{gen { ... } \#list} ) {...}
all computation of the lists are lazy, and there are map / grep equivalents along with a few other functions.
each of the functions returns a 'generator' which is a reference to a tied array. you can use the tied array directly, or there are a bunch of accessor methods like iterators to use.
Use an iterator or consider using Tie::LazyList from CPAN (which is a tad dated).
I asked a similar question at perlmonks.org, and BrowserUk gave a really good framework in his answer. Basically, a convenient way to get lazy evaluation is to spawn threads for the computation, at least as long as you're sure you want the results, Just Not Now. If you want lazy evaluation not to reduce latency but to avoid calculations, my approach won't help because it relies on a push model, not a pull model. Possibly using Corooutines, you can turn this approach into a (single-threaded) pull model as well.
While pondering this problem, I also investigated tie-ing an array to the thread results to make the Perl program flow more like map, but so far, I like my API of introducing the parallel "keyword" (an object constructor in disguise) and then calling methods on the result. The more documented version of the code will be posted as a reply to that thread and possibly released onto CPAN as well.
If I remember correctly, for/foreach do get the whole list first anyways, so a lazily evaluated list would be read completely and then it would start to iterate through the elements. Therefore, I think there's no other way than using a while loop. But I may be wrong.
The advantage of a while loop is that you can fake the sensation of a lazily evaluated list with a code reference:
my $list = sub { return calculate_next_element };
while(defined(my $element = &$list)) {
...
}
After all, I guess a tie is as close as you can get in Perl 5.