Perl 5 - Iterator - perl

I have implemented a simple iterator in perl. I normally work with C#, and use iterators and functional programming quite frequently. So I thought it would be simple to get some basics working in perl.
Problem is, I'm getting some poor performance, I don't expect be any faster than for or foreach, but I thought someone could give me some insight in how to speed it up.
Here is the guts of my package:
package Iterator;
use strict;
#Constructor for Iterator type
sub new {
my $type = $_[0];
my $this = {};
#set default values
$this->{Array} = #_[1];
$this->{Index} = 0;
$this->{Sub} = sub {
my #array = #{$this->{Array}};
return $#array >= $this->{Index} ? $array[$this->{Index}++] : undef;
};
#return self
bless($this, $type);
return $this;
}
#Iterates next
sub Next {
return $_[0]->{Sub}->();
}
Allows you to do this:
my $iterator = Iterator->new(\#array);
while (defined(my $current = $iterator->Next())) {
print $current, "\n";
}
Not flashy... yet.
Also enables some functional code like this:
my $sum = 0;
Iterator
->new(\#array)
->Where(sub { $_[0] % 2 == 0 })
->ForEach(sub { $sum += $_[0] });
Which would sum up all the even values of an array.
My bottleneck is the iteration code:
$this->{Sub} = sub {
my #array = #{$this->{Array}};
return $#array >= $this->{Index} ? $array[$this->{Index}++] : undef;
};
Any pointers to speed this up?

A bit late to the game here, but since you are concerned about performance, one of the largest bottlenecks in iterator type code is that the fields of your hash based object need to be dereferenced on each access. One way to combat this is to use closures in which the fields are closed over variables, avoiding unneeded dereferencing.
In my module List::Gen which contains a fairly performant implementation of lazy lists, I wrote the utility function curse which makes closure based objects behave like normal Perl objects.
Here is a short example of your iterator class written with curse. In a simple benchmark summing 1000 numbers, this method is twice as fast as yours, even after fixing all of the inefficiencies noted in the other answers.
{package Iterator;
use List::Gen 'curse';
sub new {
my ($class, $array) = #_;
my $index = 0;
curse {
next => sub {$$array[$index++]},
index => sub :lvalue {$index},
} => $class
}
sub reset {shift->index = 0}
}
If you are really pushing for more speed, since the next method does not need to be passed anything, you could even write:
my $next = $iterator->can('next');
while (defined (my $x = $next->()) {...}
Which will give you a 30% to 40% speed boost over a normal method call.
You can read the source of List::Gen for more advanced usage of curse

You might find it useful to read a bit of Higher Order Perl.

this line:
my #array = #{$this->{Array}};
duplicates the array into #array, and I don't think you want to. Just do $#{$this->{Array}} to find the endpoint of your array.

A much more efficient version:
package Iterator;
use strict;
#Constructor for Iterator type
sub new {
my $type = shift;
my $array = shift;
my $this = {};
$this->{array} = $array;
$this->{index} = 0;
bless($this, $type);
return $this;
}
#Iterates next
sub Next {
my $this = shift;
return $this->{array}->[$this->{index}++];
}

Summing even numbers is easier done using grep and List::Util:
use List::Util 'sum';
say sum grep { not $_ % 2 } (1 .. 10); // 30
It seems very likely to me that that the code suggested by your question is over-engineering. Unless you can come up with a decent example that cannot be easily solved using the traditional Perl primitives.

Have a look at List::Util and List::MoreUtils for utilities that may help you with this.
You can even use perl5i for a more modern looking syntax.
Example:
use perl5i::2;
my #nums = (0..100);
my $sumEven = #nums->grep(sub { $_ % 2 == 0 })->reduce(sub { $a+$b });
say $sumEven;

There is already an array iterator in CPAN, so you can look at its approach if you have not done it yet.
By the way in your code you have:
#set default values
$this->{Array} = #_[1];
I assume you want to say $_[1]. With #_[1] you are requesting an array slice of one element. At the end the result is the same but the semantics isn't. The curious thing is that I was expecting to have an array of one element if I do #_[1] or an error but tested in the debugger and you obtain the scalar (at least in perl 5.10). Perl 6 will go for this behaviour anyway and will not change sigil for accessing elements in arrays or hashes so you are coding 'advanced' Perl ;-)

Don't unload the stored array. You're copying every element of an array from where it is pointed at by $this->{Array} to the local list #array when you do this:
my #array = #{$this->{Array}};
Also if you know that you are going to stop when you hit undef, then you don't have to even check bounds.
$this->{Sub} = sub { return $this->{Array}[++$this->{Index}]; }
Is all you need. When {Index} gets out of range, it will return undef.
In addition, you can write your expression in Perl like:
$sum += $_ foreach grep { $_ % 2 == 0 } #array;

A much simpler Perl iterator:
my #array = (1, 2, 3, 4);
while (my $i = shift #array)
{
print $i . "\n";
}

Related

Memory/performance tradeoff when determining the size of a Perl hash

I was browsing through some Perl code in a popular repositiory on GitHub and ran across this method to calculate the size of a hash:
while ( my ($a, undef ) = each %h ) { $num++; }
I thought why would one go through the trouble of writing all that code when it could more simply be written as
$num = scalar keys %h;
So, I compared both methods with Benchmark.
my %h = (1 .. 1000);
cmpthese(-10, {
keys => sub {
my $num = 0;
$num = scalar keys %h;
},
whileloop => sub {
my $num = 0;
while ( my ($a, undef ) = each %h ) {
$num++;
}
},
});
RESULTS
Rate whileloop keys
whileloop 5090/s -- -100%
keys 7234884/s 142047% --
The results show that using keys is MUCH faster than the while loop. My question is this: why would the original coder use such a slow method? Is there something that I'm missing? Also, is there a faster way?
I cannot read the mind of whomever might have written that piece of code, but he/she likely thought:
my $n = keys %hash;
used more memory than iterating through everything using each.
Note that the scalar on the left hand side of the assignment creates scalar context: There is no need for scalar unless you want to create a scalar context in what would otherwise have been list context.
Because he didn't know about keys's ability to return the number of elements in the hash.

Perl custom syntax for passing function arguments

I have been using perl for some time now.
I want to know how I can run the following operation in perl:
subtract(40)(20)
To get the result:
20
I think I would have to look at custom parsing techniques for Perl.
This is what I am looking at right now:
Devel::Declare
Devel::CallParser
and
http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html
Now, I am not sure what to look for or what to do.
Any help on HOW to go about this, WHAT to read would be appreciated. Please be clear.
Thank you.
I recommend trying Parse::Keyword. Parse::Keyword is really great for parsing custom syntax, as it lets you call back various parts of the Perl parser, such as parse_listexpr, parse_block, parse_fullstmt, etc (see perlapi).
It has a drawback in that if you use those to parse expressions that close over variables, these are handled badly, but this can be worked around with PadWalker.
Parse::Keyword (including PadWalker trickery) is what Kavorka uses; and that does some pretty complex stuff! Early versions of p5-mop-redux used it too.
Anyway, here's a demonstration of how your weird function could be parsed...
use v5.14;
use strict;
use warnings;
# This is the package where we define the functions...
BEGIN {
package Math::Weird;
# Set up parsing for the functions
use Parse::Keyword {
add => \&_parser,
subtract => \&_parser,
multiply => \&_parser,
divide => \&_parser,
};
# This package is an exporter of course
use parent 'Exporter::Tiny';
our #EXPORT = qw( add subtract multiply divide );
# We'll need these things from PadWalker
use PadWalker qw( closed_over set_closed_over peek_my );
sub add {
my #numbers = _grab_args(#_);
my $sum = 0;
$sum += $_ for #numbers;
return $sum;
}
sub subtract {
my #numbers = _grab_args(#_);
my $diff = shift #numbers;
$diff -= $_ for #numbers;
return $diff;
}
sub multiply {
my #numbers = _grab_args(#_);
my $product = 1;
$product *= $_ for #numbers;
return $product;
}
sub divide {
my #numbers = _grab_args(#_);
my $quotient = shift #numbers;
$quotient /= $_ for #numbers;
return $quotient;
}
sub _parser {
lex_read_space;
my #args;
while (lex_peek eq '(')
{
# read "("
lex_read(1);
lex_read_space;
# read a term within the parentheses
push #args, parse_termexpr;
lex_read_space;
# read ")"
lex_peek eq ')' or die;
lex_read(1);
lex_read_space;
}
return sub { #args };
}
# In an ideal world _grab_args would be implemented like
# this:
#
# sub _grab_args { map scalar(&$_), #_ }
#
# But because of issues with Parse::Keyword, we need
# something slightly more complex...
#
sub _grab_args {
my $caller_vars = peek_my(2);
map {
my $code = $_;
my $closed_over = closed_over($code);
$closed_over->{$_} = $caller_vars->{$_} for keys %$closed_over;
set_closed_over($code, $closed_over);
scalar $code->();
} #_;
}
# We've defined a package inline. Mark it as loaded, so
# that we can `use` it below.
$INC{'Math/Weird.pm'} = __FILE__;
};
use Math::Weird qw( add subtract multiply );
say add(2)(3); # says 5
say subtract(40)(20); # says 20
say multiply( add(2)(3) )( subtract(40)(20) ); # says 100
If you can live with the additions of a sigil and an arrow, you could curry subtract as in
my $subtract = sub {
my($x) = #_;
sub { my($y) = #_; $x - $y };
};
Call it as in
my $result = $subtract->(40)(20);
If the arrow is acceptable but not the sigil, recast subtract as
sub subtract {
my($x) = #_;
sub { my($y) = #_; $x - $y };
};
Invocation in this case looks like
my $result = subtract(40)->(20);
Please don't tack on broken syntax extensions on your program to solve a solved problem.
What you want are closures, and a technique sometimes called currying.
Currying is the job of transforming a function that takes multiple arguments into a function that is invoked multiple times with one argument each. For example, consider
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
Now we can create a subroutine that already provides the first argument:
sub subtract1 { subtract(40, #_) }
Invoking subtract1(20) now evaluates to 20.
We can use anonymous subroutines instead, which makes this more flexible:
my $subtract = sub { subtract(40, #_) };
$subtract->(20);
We don't need that variable:
sub { subtract(40, #_) }->(20); # equivalent to subtract(40, 20)
We can write subtract in a way that does this directly:
sub subtract_curried {
my $x = shift;
# don't return the result, but a subroutine that calculates the result
return sub {
my $y = shift;
return $x - $y;
};
}
Now: subtract_curried(40)->(20) – notice the arrow in between, as we are dealing with a code reference (another name for anonymous subroutine, or closures).
This style of writing functions is much more common in functional languages like Haskell or OCaml where the syntax for this is prettier. It allows very flexible combinations of functions. If you are interested in this kind of programming in Perl, you might want to read Higher-Order Perl.
#Heartache: Please forget this challenge as it makes no sense for the parser and the user.
You can think of using fn[x][y] or fn{x}{y} which are valid syntax variants - i.e. you can stack [] and {} but not lists,
or fn(x,y) or fn(x)->(y) which do look nice, are also valid and meaningful syntax variants.
But fn(x)(y) will not know in which context the second list should be used.
For fn(x)(y) the common interpretation would be fn(x); (y) => (y). It returns the 2nd list after evaluating the first call.
You may create the source code filter:
package BracketFilter;
use Filter::Util::Call;
sub import {
filter_add(sub {
my $status;
s/\)\(/, /g if ($status = filter_read()) > 0;
return $status ;
});
}
1;
And use it:
#!/usr/bin/perl
use BracketFilter;
subtract(40)(20);
sub subtract {
return $_[0] - $_[1];
}

Sub references as arguments

Another Perl-beginner question, but strangely enough, I found no tutorial to explain me this simple problem.
I wanted, as an exercise, to write a function map that takes a function and an array, returning an array. In functional languages, this is used quite often and I heard about the sub references and how to use them.
sub map {
my $f = shift;
my #r = ();
foreach (#_) {
push(#r, &f($_));
}
return #r;
}
sub square {
my $r = shift;
return $r*$r;
}
print map(\&shift, 1, 2, 3, 4, 5);
But, for some reason, I only get the word CODE and a hex-number as an output, five times. I have then changed the call of f in map, to $$f($_) and $f->($_) but all of it had the same result.
What do I do wrong here?
perl have a builin map function. let's call it map2
use &$f to dereference $f
use join to print an array properly
sub map2 {
my $f = shift;
my #r = ();
foreach (#_) {
push(#r, &$f($_));
}
return #r;
}
sub square {
my $r = shift;
return $r*$r;
}
print join ",", map2(\&square, 1, 2, 3, 4, 5);
$ perl 1.pl
1,4,9,16,25
As mentioned in comments, Perl has a built in map function that you should use.
my #squares = map {$_ ** 2} 1 .. 5;
Rather than passing an argument, Perl's built in map sets $_ to each element, which allows you to write your square function succinctly as either {$_ * $_} or {$_ ** 2}
But Perl also gives you the ability to make custom map-like functions with a similar syntax. For example, say you wanted to write a version of map that maps over pairs of values:
sub pair_map (&#) { # the (&#) prototype here tells perl that the sub
my $code = shift; # takes a code block, and then a list, just like `map`
my #ret;
while (#_) {
push #ret, $code->(splice #_, 0, 2);
}
#ret
}
my #pairs = pair_map {\#_} 1 .. 10;
pair_map {print "$_[0]: $_[1]\n"} %hash;
But since Perl has been around a while, most utility functions have probably been written already. A search of CPAN will turn up many map-like functions that do various things.
I found that I frequently need to map over lists with various step sizes, so I wrote the mapn function in List::Gen. This is a fully developed solution, so it includes an optimization when called in void context, and falls back to Perl's own map when n == 1:
sub mapn (&$#) {
my ($sub, $n, #ret) = splice #_, 0, 2;
croak '$_[1] must be >= 1' unless $n >= 1;
return map $sub->($_) => #_ if $n == 1;
my $want = defined wantarray;
while (#_) {
local *_ = \$_[0];
if ($want) {push #ret =>
$sub->(splice #_, 0, $n)}
else {$sub->(splice #_, 0, $n)}
}
#ret
}
Both pair_map and mapn utilize an advanced feature of Perl subroutines called prototypes. These prototypes are not argument validation tools (like in many other languages). Rather, they tell perl to interpret calls to the functions in a special way (similar to the way some other builtin functions are used). In this case, the & portion of the prototype tells perl that the first argument to these functions can be written as a bare block, just like a normal map call.

Why does my Perl max() function always return the first element of the array?

I am relatively new to Perl and I do not want to use the List::Util max function to find the maximum value of a given array.
When I test the code below, it just returns the first value of the array, not the maximum.
sub max
{
my #array = shift;
my $cur = $array[0];
foreach $i (#array)
{
if($i > $cur)
{
$cur = $i;
}
else
{
$cur = $cur;
}
}
return $cur;
}
Replace
my #array = shift;
with
my #array = #_;
#_ is the array containing all function arguments. shift only grabs the first function argument and removes it from #_. Change that code and it should work correctly!
Why don't you want to use something that works?
One of the ways to solve problems like this is to debug your data structures. At each step you print the data you have to see if what you expect is actually in there. That can be as simple as:
print "array is [#array]\n";
Or for complex data structures:
use Data::Dumper;
print Dumper( \#array );
In this case, you would have seen that #array has only one element, so there it must be the maximum.
If you want to see how list assignment and subroutine arguments work, check out Learning Perl.
You can write the function as:
#!/usr/bin/perl
use strict; use warnings;
print max(#ARGV);
sub max {
my $max = shift;
$max >= $_ or $max = $_ for #_;
return $max;
}
However, it would be far more efficient to pass it a reference to the array and even more efficient to use List::Util::max.

How can I verify that a value is present in an array (list) in Perl?

I have a list of possible values:
#a = qw(foo bar baz);
How do I check in a concise way that a value $val is present or absent in #a?
An obvious implementation is to loop over the list, but I am sure TMTOWTDI.
Thanks to all who answered! The three answers I would like to highlight are:
The accepted answer - the most "built-in" and backward-compatible way.
RET's answer is the cleanest, but only good for Perl 5.10 and later.
draegtun's answer is (possibly) a bit faster, but requires using an additional module. I do not like adding dependencies if I can avoid them, and in this case do not need the performance difference, but if you have a 1,000,000-element list you might want to give this answer a try.
If you have perl 5.10, use the smart-match operator ~~
print "Exist\n" if $var ~~ #array;
It's almost magic.
Perl's bulit in grep() function is designed to do this.
#matches = grep( /^MyItem$/, #someArray );
or you can insert any expression into the matcher
#matches = grep( $_ == $val, #a );
This is answered in perlfaq4's answer to "How can I tell whether a certain element is contained in a list or array?".
To search the perlfaq, you could search through the list of all questions in perlfaq using your favorite browser.
From the command line, you can use the -q switch to perldoc to search for keywords. You would have found your answer by searching for "list":
perldoc -q list
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ #array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
#blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (#blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
#primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
#is_tiny_prime = ();
for (#primes) { $is_tiny_prime[$_] = 1 }
# or simply #istiny_prime[#primes] = (1) x #primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
#articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (#articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&#) {
my $code = shift;
foreach (#_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, #array;
If you want to actually extract the matching elements, simply use grep in list context.
my #matches = grep $_ eq $whatever, #array;
Use the first function from List::Util which comes as standard with Perl....
use List::Util qw/first/;
my #a = qw(foo bar baz);
if ( first { $_ eq 'bar' } #a ) { say "Found bar!" }
NB. first returns the first element it finds and so doesn't have to iterate through the complete list (which is what grep will do).
One possible approach is to use List::MoreUtils 'any' function.
use List::MoreUtils qw/any/;
my #array = qw(foo bar baz);
print "Exist\n" if any {($_ eq "foo")} #array;
Update: corrected based on zoul's comment.
Interesting solution, especially for repeated searching:
my %hash;
map { $hash{$_}++ } #a;
print $hash{$val};
$ perl -e '#a = qw(foo bar baz);$val="bar";
if (grep{$_ eq $val} #a) {
print "found"
} else {
print "not found"
}'
found
$val='baq';
not found
If you don't like unnecessary dependency, implement any or first yourself
sub first (&#) {
my $code = shift;
$code->() and return $_ foreach #_;
undef
}
sub any (&#) {
my $code = shift;
$code->() and return 1 foreach #_;
undef
}