What is the most elegant way in Perl to expand an iterator into a list?

I have an iterator with this interface: $hit->next_hsp
The current implementation to listify it is:
my @list;
while ( my $hsp = $hit->next_hsp ) {
    push( @list, $hsp );
}
Now I'm thinking that there might be better ways to do this in less code. What do you say, stackers?

All iterators I've ever seen return undef to signify that they are exhausted. Therefore you should write while (defined(my $hsp = $hit->next_hsp)). The following example demonstrates the fault in the question's loop, which tests for truth (and so stops at 0) instead of definedness (which gets all the way through to 'liftoff').
use 5.010;
my $hit = __PACKAGE__;
sub next_hsp {
state $i;
$i++;
return ['mumble', 4, 3, 2, 1, 0, 'liftoff']->[$i];
}
# insert snippet from question here
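A self-contained variant of that demonstration runs both loop styles side by side. The Countdown class and its new constructor are hypothetical stand-ins for $hit; only the method name next_hsp comes from the question.

```perl
use strict;
use warnings;

# Hypothetical iterator class standing in for $hit.
package Countdown;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

# Returns the next item, or undef once the items are exhausted.
sub next_hsp {
    my ($self) = @_;
    return shift @{ $self->{items} };
}

package main;

# Truth test: stops early at the first false value (here 0).
my $hit = Countdown->new(4, 3, 2, 1, 0, 'liftoff');
my @truthy;
while (my $hsp = $hit->next_hsp) {
    push @truthy, $hsp;
}

# Definedness test: stops only when the iterator returns undef.
$hit = Countdown->new(4, 3, 2, 1, 0, 'liftoff');
my @defined;
while (defined(my $hsp = $hit->next_hsp)) {
    push @defined, $hsp;
}

print "truth test:   @truthy\n";    # truth test:   4 3 2 1
print "defined test: @defined\n";   # defined test: 4 3 2 1 0 liftoff
```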

It entirely depends on the iterator implementation. If next_hsp is the only available method, then you're doing it right.

Don't worry about playing golf; the code you have looks just fine (apart from the point the other answers make about using defined). However, if you find yourself repeating this pattern, two things come to mind.
The first is obvious: refactor it into a utility function, so that you can write my @list = expand($hit).
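A minimal sketch of such a utility, assuming (as in the question) that the iterator returns undef when it is exhausted. The optional second argument, a method name defaulting to next_hsp, is a hypothetical addition so the helper also works with iterators whose method is named differently; the ListIter class exists only for the usage example.

```perl
use strict;
use warnings;

# Drain an iterator object into a list.
# $method defaults to 'next_hsp', the method from the question.
sub expand {
    my ($iter, $method) = @_;
    $method //= 'next_hsp';
    my @list;
    while (defined(my $item = $iter->$method)) {
        push @list, $item;
    }
    return @list;
}

# Hypothetical iterator class used only to exercise expand().
package ListIter;
sub new      { my ($class, @items) = @_; return bless [@items], $class }
sub next_hsp { my ($self) = @_; return shift @$self }

package main;

my @list = expand(ListIter->new('a', 0, 'b'));
print "@list\n";   # a 0 b
```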
The second point is a bit deeper, but to me it is more of a smell than the lack of golf. The whole point of iterators is to consume values as you need them, so if you find yourself expanding them into lists often, are you sure it's really the right thing to do? Perhaps you're moving this data outside your own API, so you're constrained by others' choices; but if you have the option of consuming an iterator rather than a list, that may be the cleaner solution.

Is instantiating a hash in a function inefficient in perl?

Is there any difference between the following two versions, in terms of efficiency or bad practice?
(In a context of bigger hashes and sending them through many functions)
sub function {
    my ($self, $hash_ref) = @_;
    my %hash = %{$hash_ref};
    print $hash{$key};
    return;
}
Compared to:
sub function {
    my ($self, $hash_ref) = @_;
    print $hash_ref->{$key};
    return;
}
Let's say %$hash_ref contains N elements.
The first snippet does the following in addition to what the second snippet does:
Creates N scalars. (Can involve multiple memory allocations each.)
Adds N*2 scalars to the stack. (Cheap.)
Creates a hash. (More memory allocations...)
Adds N elements to a hash. (More memory allocations...)
The second snippet does the following in addition to what the first snippet does:
[Nothing. It's a complete subset of the first snippet]
The first snippet is therefore far less efficient than the second. It is also more complicated by virtue of having extra code. The complete lack of benefit and the numerous costs dictate that one should avoid the pattern used in the first snippet.
The first snippet is silly. But copying the arguments into a hash is a convenient way to emulate named arguments:
sub function {
    my ($self, %params) = @_;
    ...
}
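A short sketch of the named-arguments idiom with defaults merged in; the Client package, the connect_to sub and its host/port parameters are hypothetical.

```perl
use strict;
use warnings;

package Client;

# The caller passes key/value pairs, which are copied into %params.
# Defaults come first in the list, so the caller's values override them.
sub connect_to {
    my ($self, %params) = @_;
    my %opts = (host => 'localhost', port => 80, %params);
    return "$opts{host}:$opts{port}";
}

package main;

print Client->connect_to(port => 8080), "\n";   # localhost:8080
```

Because the pairs are copied, this is cheap for a handful of options but not a substitute for passing a large hash by reference.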
So pass arrays and hashes by reference; creating a new (especially big) hash will be much slower. But there is nothing wrong with the "named arguments" hack.
And did you know that there are key/value slices (v5.20+ only)? You can copy part of a hash easily this way:
my %foo = ( one => 1, two => 2, three => 3, four => 4);
my %bar = %foo{'one', 'four'};
More information is in perldoc perldata.
The first version of the sub creates a local copy of the data structure whose reference is passed to it. As such it is, of course, far less efficient.
There is one legitimate reason for this: to make sure the data in the caller isn't changed. That local %hash can be changed in the sub as needed or convenient and the data in the calling code is not affected. This way the data in the caller is also protected against accidental changes.
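One caveat worth checking: my %hash = %{$hash_ref} is a shallow copy. Changes to top-level values stay local, but nested arrays and hashes are still shared with the caller through their references. A small demonstration (the sub name modify and the sample data are hypothetical):

```perl
use strict;
use warnings;

# Copies the caller's hash, then changes a top-level value and a
# nested value in the copy.
sub modify {
    my ($hash_ref) = @_;
    my %hash = %{$hash_ref};        # shallow copy
    $hash{name} = 'changed';        # affects only the copy
    push @{ $hash{tags} }, 'new';   # same array as the caller's
    return;
}

my %data = (name => 'original', tags => ['a']);
modify(\%data);

print "$data{name}\n";        # original
print "@{ $data{tags} }\n";   # a new
```

If nested data must be protected as well, a deep copy is needed, for example with dclone from the core Storable module.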
Another reason why a local copy of data is done, in particular with deeper data structures, is to avoid long chains of dereferencing and thus simplify code; so parts of deep hierarchies may be copied for simpler access. Then this is merely for (presumed) programming convenience.
So in the shown example there'd be absolutely no reason to make a local copy. However, presumably the question is about subs where more work is done and then what's best depends on details.

How to detect if Perl code is being run inside an eval?

I've got a module that uses an @INC hook and tries to install missing modules as they are used. I don't want this behaviour to fire inside an eval. My attempt to do this currently is:
return
if ( ( $caller[3] && $caller[3] =~ m{eval} )
|| ( $caller[1] && $caller[1] =~ m{eval} ) );
That's the result of me messing around with the call stack in some experiments, but it's not catching everything, like this code in HTTP::Tinyish:
sub configure_backend {
my($self, $backend) = @_;
unless (exists $configured{$backend}) {
$configured{$backend} =
eval { require_module($backend); $backend->configure };
}
$configured{$backend};
}
sub require_module {
local $_ = shift;
s!::!/!g;
require "$_.pm";
}
Maybe I just need to traverse every level of the call stack until I hit an eval or run out of levels. Is there a better or easier way for me to figure out whether or not code is being wrapped in an eval without traversing the call stack?
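A sketch of the stack-walking approach, using only the built-in caller. For a frame created by eval, the subroutine field (index 3) is the string "(eval)"; frames created by require, use or do FILE look the same but have the is-require field (index 7) set, so they are skipped here. The helper name in_eval is hypothetical, and whether skipping require frames is the right choice inside an @INC hook is an assumption worth testing.

```perl
use strict;
use warnings;

# Returns 1 if any frame on the call stack is a block or string eval.
sub in_eval {
    my $level = 0;
    while (my @frame = caller($level++)) {
        # $frame[3] is the sub name; $frame[7] is true for require/use
        return 1 if $frame[3] eq '(eval)' && !$frame[7];
    }
    return 0;
}

sub deep { return in_eval() }

print in_eval(), "\n";         # 0
print eval { deep() }, "\n";   # 1
print eval 'deep()', "\n";     # 1
```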
Post mortem on this question:
as was suggested by multiple posters, this was basically a bad idea
$^S is technically a correct way to do this, but it doesn't let you know if you're inside an eval that was called somewhere higher in the stack
using a regex + Carp::longmess() seems to be the most concise way to figure this out
knowing if code is running inside an eval may be somewhat helpful for informational purposes, but since this could be happening for many different reasons, it's very hard to infer why it's happening
regardless, this was an interesting exercise. I thank all contributors for their helpful input
Carp::longmess traverses the stack for you in one call, if that makes things easier
return if Carp::longmess =~ m{^\s+eval }m
If $^S is true, the code is inside an eval.
sub foo { print $^S }
eval { foo() }; # 1
foo(); # 0
Don't try to do this in reusable code. There are many reasons for code to be in an eval without wanting this kind of action-at-a-distance change.

Concerns with concatenating strings and ints

I have taken a principles of programming class and have been given a Perl expression that concatenates a string number to an int number and then adds another number to it and it evaluates fine. i.e. ("4" . 3) + 7 == 50.
I'm trying to understand why Perl does this and what concerns it may bring up. I'm having a hard time grasping many of the concepts of the class and am trying to get explanations from different sources apart from my horrible teacher and equally horrible notes.
Can the concept behind this kind of expression be explained to me as well as concerns they might bring up? Thanks in advance for the help.
Edit: For Clarity
Perl is built around the central concept of 'do what I mean'.
A scalar is a multi-purpose variable type, and is intended to implicitly convert values to a data type that's appropriate to what you're doing.
The reason this works is because perl is context sensitive - it knows the difference between different expected return values.
At a basic level, you can see this with the wantarray function (which is arguably badly named, because what it detects is list context, not an array).
sub context_test {
if ( not defined wantarray() ) {
print "Void context\n";
}
if ( wantarray() ) {
return ( "List", "Context" );
}
else {
return "scalar context";
}
}
context_test();
my $scalar = context_test();
my @list = context_test();
print "Scalar context gave me $scalar\n";
print "List context gave me @list\n";
This principle occurs throughout perl. If you want, you can use something like Contextual::Return to extend this further - testing the difference between numeric, string and boolean subsets of scalar contexts.
The reason I mention this is because a scalar is a special sort of data type - if you look at Scalar::Util you will see a capability of creating a dualvar - a scalar that has different values in different contexts.
use Scalar::Util ();
my $dualvar = Scalar::Util::dualvar( 666, "the beast" );
print "Numeric:", $dualvar + 0, "\n";
print "String:", $dualvar . '', "\n";
Now, messing around with dualvars is a good way to create some really annoying, hard-to-trace bugs, but the point is that a scalar is a magic datatype, and perl is always aware of what you're doing with the result.
If you perform a string operation, perl treats it as a string. If you perform a numeric operation, perl tries to treat it as a number.
my $value = '4'; #string;
my $newvalue = $value . 3; #because we concat, perl treats _both_ as strings.
print $newvalue,"\n";
my $sum = $newvalue + 7; #perl turns strings back to numbers, because we're adding;
print $sum,"\n";
if ( Scalar::Util::isdual ( $newvalue ) ) { print "newvalue Is a dual var\n" };
if ( not Scalar::Util::isdual ( $sum ) ) { print "sum is NOT a dual var\n"; };
Mostly 'context' is something that happens behind the scenes in perl, and you don't have to worry about it. If you come from a strongly typed programming background, the idea of implicit conversion between int and string may seem a little bit dirty. But it mostly works fine.
With use warnings enabled, you may occasionally get warnings like:
Argument "4a3" isn't numeric in addition (+)
One of the downsides of this approach is that such problems only surface at runtime, because no strong type checking is done at 'compile' time.
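A small demonstration, assuming use warnings is on: the non-numeric string still produces a value (Perl uses its leading numeric part, so "4a3" counts as 4), and the complaint arrives only at runtime, as a warning that can be trapped with a __WARN__ handler.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $input = "4a3";        # e.g. read from user input at runtime
my $sum   = $input + 7;   # warns, then uses the leading "4"

print "$sum\n";           # 11
print scalar(@warnings), " warning(s): $warnings[0]";
```

To make such conversions fail loudly instead, use warnings FATAL => 'numeric'; turns this warning category into a fatal error.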
So in terms of specific concerns:
Type checking happens at runtime, not compile time. With strict static types, you can detect an attempt to add a string to an int before you start to run anything.
You're not always operating in the context that you assume you are, which can lead to some unpredictable behaviour. One of the best examples is that print operates in list context, so to take the example above:
print context_test();
You'll get the list-context result, printed as ListContext (print adds no separator between its arguments by default).
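If the scalar-context result is what you want inside print, the built-in scalar forces it. A cut-down copy of context_test is repeated here so the snippet runs on its own:

```perl
use strict;
use warnings;

# Cut-down version of context_test from above.
sub context_test {
    return wantarray ? ("List", "Context") : "scalar context";
}

print context_test(), "\n";           # ListContext
print scalar(context_test()), "\n";   # scalar context
```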
If you monkey around with context sensitive return types, you can create some really annoying bugs that are immensely irritating to back trace and troubleshoot.

How can I elegantly call a Perl subroutine whose name is held in a variable?

I keep the name of the subroutine I want to call at runtime in a variable called $action. Then I use this to call that sub at the right time:
&{\&{$action}}();
Works fine. The only thing I don't like is that it's ugly and every time I do it, I feel beholden to add a comment for the next developer:
# call the sub by the name of $action
Anyone know a prettier way of doing this?
UPDATE: The idea here was to avoid having to maintain a dispatch table every time I added a new callable sub. Since I am the sole developer, I'm not worried about other programmers following or not following the 'rules'; I'm sacrificing a bit of security for my convenience. Instead, my dispatch module would check $action to make sure that 1) it is the name of a defined subroutine and not malicious code to run with eval, and 2) it isn't a sub prefixed with an underscore, which this naming convention marks as internal-only.
Any thoughts on this approach? Whitelisting subroutines in the dispatch table is something I will forget all the time, and my clients would rather I err on the side of "it works" than "it's wicked secure" (very limited time to develop apps).
FINAL UPDATE: I think I've decided on a dispatch table after all. Although I'd be curious if anyone who reads this question has ever tried to do away with one and how they did it, I have to bow to the collective wisdom here. Thanks to all, many great responses.
Rather than storing subroutine names in a variable and calling them, a better way to do this is to use a hash of subroutine references (otherwise known as a dispatch table.)
my %actions = ( foo => \&foo,
                bar => \&bar,
                baz => sub { print 'baz!' },
                ...
              );
Then you can call the right one easily:
$actions{$action}->();
You can also add some checking to make sure $action is a valid key in the hash, and so forth.
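A minimal sketch of that check; the action names and the run_action helper are hypothetical.

```perl
use strict;
use warnings;

my %actions = (
    foo => sub { return "foo called with @_" },
    bar => sub { return "bar called" },
);

# Look the action up before calling it, so an unknown name fails with
# a clear message instead of "Can't use an undefined value as a
# subroutine reference".
sub run_action {
    my ($action, @args) = @_;
    my $code = $actions{$action}
        or die "Unknown action '$action'\n";
    return $code->(@args);
}

print run_action('foo', 1, 2), "\n";   # foo called with 1 2
```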
In general, you should avoid symbolic references (what you're doing now) as they cause all kinds of problems. In addition, using real subroutine references will work with strict turned on.
Just &$action(), but usually it's nicer to use coderefs from the beginning, or use a dispatcher hash. For example:
my $disp = {foo => \&some_sub, bar => \&some_other_sub };
$disp->{'foo'}->();
Huh? You can just say
$action->()
Example (this is a symbolic reference, so it only works without use strict 'refs'):
sub f { return 11 }
$action = 'f';
print $action->();
$ perl subfromscalar.pl
11
Constructions like
'f'->() # equivalent to &f()
also work.
I'm not sure I understand what you mean. (I think this is another in a recent group of "How can I use a variable as a variable name?" questions, but maybe not.)
In any case, you should be able to assign an entire subroutine to a variable (as a reference), and then call it straightforwardly:
# create the $action variable - a reference to the subroutine
my $action = \&sing_out;
# later - perhaps much later - I call it
$action->();
sub sing_out {
print "La, la, la, la, la!\n"
}
The most important thing is: why do you want to use a variable as a function name? What will happen if it is 'eval'?
Is there a list of functions that can be used? Or can it be any function? If such a list exists, how long is it?
Generally, the best way to handle such cases is to use dispatch tables:
my %dispatch = (
'addition' => \&some_addition_function,
'multiplication' => sub { $self->call_method( @_ ) },
);
And then just:
$dispatch{ $your_variable }->( 'any', 'args' );
__PACKAGE__->can($action)->(@args);
For more info on can(): http://perldoc.perl.org/UNIVERSAL.html
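A sketch of how the asker's no-dispatch-table plan from the update might be built on can: the callable subs live in a dedicated package, names starting with an underscore are refused, and can returns undef for names that are not defined subs, so arbitrary strings are never evaluated as code. The package name Actions and the dispatch helper are hypothetical. Because can also finds methods every class inherits from UNIVERSAL (isa, can, DOES, VERSION) and accepts package-qualified names, those are rejected explicitly.

```perl
use strict;
use warnings;

package Actions;

sub greet    { return "hello, $_[0]" }
sub _private { return "internal only" }

package main;

# Names UNIVERSAL makes available to every package via can().
my %universal = map { $_ => 1 } qw(isa can DOES VERSION);

sub dispatch {
    my ($action, @args) = @_;
    # Plain identifiers only (no '::'), no leading underscore.
    die "Refusing internal or invalid action '$action'\n"
        if $action =~ /^_/ || $action !~ /\A\w+\z/ || $universal{$action};
    my $code = Actions->can($action)
        or die "No such action '$action'\n";
    return $code->(@args);
}

print dispatch('greet', 'world'), "\n";   # hello, world
```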
I do something similar. I split it into two lines to make it slightly more identifiable, but it's not a lot prettier.
my $sub = \&{$action};
$sub->();
I do not know of a more correct or prettier way of doing it. For what it's worth, we have production code that does what you are doing, and it works without having to disable use strict.
Every package in Perl is already a hash table. You can add elements and reference them by the normal hash operations. In general it is not necessary to duplicate the functionality by an additional hash table.
#! /usr/bin/perl -T
use strict;
use warnings;
my $tag = 'HTML';
*::->{$tag} = sub { print '<html>', @_, '</html>', "\n" };
HTML("body1");
*::->{$tag}("body2");
The code prints:
<html>body1</html>
<html>body2</html>
If you need a separate name space, you can define a dedicated package.
See perlmod for further information.
Either use
&{\&{$action}}();
Or use eval to execute the function:
eval("$action()");
I did it in this way:
@func = qw(cpu mem net disk);
foreach my $item (@func){
$ret .= &$item(1);
}
If it's only in one program, write a function that calls a subroutine using a variable name, and only have to document it/apologize once?
I used this: it works for me.
(\&$action)->();
Or you can use 'do', quite similar to previous posts:
$p = do { \&$action };
$p->();

Is there a Perl solution for lazy lists this side of Perl 6?

Has anybody found a good solution for lazily-evaluated lists in Perl? I've tried a number of ways to turn something like
for my $item ( map { ... } @list ) {
}
into a lazy evaluation, by tie-ing @list, for example. I'm trying to avoid breaking down and writing a source filter to do it, because they mess with your ability to debug the code. Has anybody had any success, or do you just have to break down and use a while loop?
Note: I guess that I should mention that I'm kind of hooked on sometimes long grep-map chains for functionally transforming lists. So it's not so much the foreach loop or the while loop. It's that map expressions tend to pack more functionality into the same vertical space.
As mentioned previously, for(each) is an eager loop, so it wants to evaluate the entire list before starting.
For simplicity, I would recommend using an iterator object or closure rather than trying to have a lazily evaluated array. While you can use a tie to have a lazily evaluated infinite list, you can run into troubles if you ever ask (directly or indirectly, as in the foreach above) for the entire list (or even the size of the entire list).
Without writing a full class or using any modules, you can make a simple iterator factory just by using closures:
sub make_iterator {
my ($value, $max, $step) = @_;
return sub {
return if $value > $max; # Return undef when we overflow max.
my $current = $value;
$value += $step; # Increment value for next call.
return $current; # Return current iterator value.
};
}
And then to use it:
# All the even numbers between 0 - 100.
my $evens = make_iterator(0, 100, 2);
while (defined( my $x = $evens->() ) ) {
print "$x\n";
}
There's also the Tie::Array::Lazy module on the CPAN, which provides a much richer and fuller interface to lazy arrays. I've not used the module myself, so your mileage may vary.
All the best,
Paul
[Sidenote: Be aware that each individual step along a map/grep chain is eager. If you give it a big list all at once, your problems start much sooner than at the final foreach.]
What you can do to avoid a complete rewrite is to wrap your loop with an outer loop. Instead of writing this:
for my $item ( map { ... } grep { ... } map { ... } @list ) { ... }
… write it like this:
while ( defined( my $input = calculate_next_element() ) ) {
for my $item ( map { ... } grep { ... } map { ... } $input ) { ... }
}
This saves you from having to significantly rewrite your existing code, and as long as the list does not grow by several orders of magnitude during transformation, you get pretty nearly all the benefit that a rewrite to iterator style would offer.
If you want to make lazy lists, you'll have to write your own iterator. Once you have that, you can use something like Object::Iterate which has iterator-aware versions of map and grep. Take a look at the source for that module: it's pretty simple and you'll see how to write your own iterator-aware subroutines.
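To show the idea without depending on any module's API, here is a minimal closure-based sketch. imap, igrep and upto are hypothetical names, not Object::Iterate's functions: each takes an iterator (a code ref that returns undef when exhausted) and returns a new iterator, so nothing is computed until a value is pulled. It assumes undef never appears as a legitimate element.

```perl
use strict;
use warnings;

# Source iterator over the integers $from..$to.
sub upto {
    my ($from, $to) = @_;
    return sub { return $from <= $to ? $from++ : undef };
}

# Lazy map: applies $f to each element as it is requested.
sub imap {
    my ($f, $it) = @_;
    return sub {
        my $x = $it->();
        return defined $x ? $f->($x) : undef;
    };
}

# Lazy grep: pulls from $it until an element passes $pred.
sub igrep {
    my ($pred, $it) = @_;
    return sub {
        while (defined(my $x = $it->())) {
            return $x if $pred->($x);
        }
        return undef;
    };
}

# Squares of the even numbers up to 10, computed on demand.
my $it = imap(sub { $_[0] ** 2 }, igrep(sub { $_[0] % 2 == 0 }, upto(1, 10)));

my @out;
while (defined(my $v = $it->())) {
    push @out, $v;
}
print "@out\n";   # 4 16 36 64 100
```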
Good luck, :)
There is at least one special case where for and foreach have been optimized not to generate the whole list at once: the range operator. So you have the option of saying:
for my $i (0..$#list) {
my $item = some_function($list[$i]);
...
}
and this will iterate through the array, transformed however you like, without creating a long list of values up front.
If you wish your map statement to return variable numbers of elements, you could do this instead:
for my $i (0..$#array) {
for my $item (some_function($array[$i])) {
...
}
}
If you wish more pervasive laziness than this, then your best option is to learn how to use closures to generate lazy lists. MJD's excellent book Higher Order Perl can walk you through those techniques. However do be warned that they will involve far larger changes to your code.
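For a taste of that closure technique, here is a minimal sketch of an infinite lazy list: each node is a pair of a value and a code ref that builds the rest of the list only when asked, and the tail is cached once it has been computed. The names node, tail, upfrom and take are hypothetical and only loosely modelled on the book's style.

```perl
use strict;
use warnings;

# A node is [head, tail]; the tail starts out as a code ref (a promise).
sub node { my ($head, $promise) = @_; return [$head, $promise] }

# Force the tail the first time it is needed, then cache the result.
sub tail {
    my ($s) = @_;
    $s->[1] = $s->[1]->() if ref $s->[1] eq 'CODE';
    return $s->[1];
}

# The infinite list $n, $n+1, $n+2, ...
sub upfrom {
    my ($n) = @_;
    return node($n, sub { upfrom($n + 1) });
}

# Take the first $count heads as an ordinary Perl list.
sub take {
    my ($s, $count) = @_;
    my @out;
    while ($s && $count-- > 0) {
        push @out, $s->[0];
        $s = tail($s);
    }
    return @out;
}

print join(' ', take(upfrom(1), 5)), "\n";   # 1 2 3 4 5
```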
Bringing this back from the dead to mention that I just wrote the module List::Gen on CPAN which does exactly what the poster was looking for:
use List::Gen;
for my $item ( @{gen { ... } \@list} ) {...}
All computation of the lists is lazy, and there are map/grep equivalents along with a few other functions.
Each of the functions returns a 'generator', which is a reference to a tied array. You can use the tied array directly, or use one of a bunch of accessor methods, such as iterators.
Use an iterator or consider using Tie::LazyList from CPAN (which is a tad dated).
I asked a similar question at perlmonks.org, and BrowserUk gave a really good framework in his answer. Basically, a convenient way to get lazy evaluation is to spawn threads for the computation, at least as long as you're sure you want the results, Just Not Now. If you want lazy evaluation not to reduce latency but to avoid calculations, my approach won't help, because it relies on a push model, not a pull model. Possibly, using coroutines, you can turn this approach into a (single-threaded) pull model as well.
While pondering this problem, I also investigated tie-ing an array to the thread results to make the Perl program flow more like map, but so far, I like my API of introducing the parallel "keyword" (an object constructor in disguise) and then calling methods on the result. The more documented version of the code will be posted as a reply to that thread and possibly released onto CPAN as well.
If I remember correctly, for/foreach do get the whole list first anyway, so a lazily evaluated list would be read completely and then it would start to iterate through the elements. Therefore, I think there's no other way than using a while loop. But I may be wrong.
The advantage of a while loop is that you can fake the sensation of a lazily evaluated list with a code reference:
my $list = sub { return calculate_next_element() };
while (defined(my $element = $list->())) {
...
}
After all, I guess a tie is as close as you can get in Perl 5.