How is the map function in Perl implemented? - perl

Is map function in Perl written in Perl? I just can not figure out how to implement it. Here is my attempt:
use Data::Dumper;
sub Map {
my ($function, $sequence) = #_;
my #result;
foreach my $item (#$sequence) {
my $_ = $item;
push #result, $function->($item);
}
return #result
}
my #sample = qw(1 2 3 4 5);
print Dumper Map(sub { $_ * $_ }, \#sample);
print Dumper map({ $_ * $_ } #sample);
$_ in $function is undefined as it should be, but how map overcomes this?

map has some special syntax, so you can't entirely implement it in pure-perl, but this would come pretty close to it (as long as you're using the block form of map):
sub Map(&#) {
my ($function, #sequence) = #_;
my #result;
foreach my $item (#sequence) {
local $_ = $item;
push #result, $function->($item);
}
return #result
}
use Data::Dumper;
my #sample = qw(1 2 3 4 5);
print Dumper Map { $_ * $_ } #sample;
print Dumper map { $_ * $_ } #sample;
$_ being undefined is overcome by using local $_ instead of my $_. Actually you almost never want to use my $_ (even though you do want to use it on almost all other variables).
Adding the (&#) prototype allows you not to specify sub in front of the block. Again, you almost never want to use prototypes but this is a valid use of them.

While the accepted answer implements a map-like function, it does NOT do it in the way perl would. An important part of for, foreach, map, and grep is that the $_ they provide to you is always an alias to the values in the argument list. This means that calling something like s/a/b/ in any of those constructs will modify the elements they were called with. This allows you to write things like:
my ($x, $y) = qw(foo bar);
$_ .= '!' for $x, $y;
say "$x $y"; # foo! bar!
map {s/$/!!!/} $x, $y;
say "$x $y"; # foo!!!! bar!!!!
Since in your question, you have asked for Map to use array references rather than arrays, here is a version that works on array refs that is as close to the builtin map as you can get in pure Perl.
use 5.010;
use warnings;
use strict;
sub Map (&\#) {
my ($code, $array) = splice #_;
my #return;
push #return, &$code for #$array;
#return
}
my #sample = qw(1 2 3 4 5);
say join ', ' => Map { $_ * $_ } #sample; # 1, 4, 9, 16, 25
say join ', ' => map { $_ * $_ } #sample; # 1, 4, 9, 16, 25
In Map, the (&\#) prototype tells perl that the Map bareword will be parsed with different rules than a usual subroutine. The & indicates that the first argument will either be a bare block Map {...} NEXT or it will be a literal code reference Map \&somesub, NEXT. Note the comma between the arguments in the latter version. The \# prototype indicates that the next argument will start with # and will be passed in as an array reference.
Finally, the splice #_ line empties #_ rather than just copying the values out. This is so that the &$code line will see an empty #_ rather than the args Map received. The reason for &$code is that it is the fastest way to call a subroutine, and is as close to the multicall calling style that map uses as you can get without using C. This calling style is perfectly suited for this usage, since the argument to the block is in $_, which does not require any stack manipulation.
In the code above, I cheat a little bit and let for do the work of localizing $_. This is good for performance, but to see how it works, here is that line rewritten:
for my $i (0 .. $#$array) { # for each index
local *_ = \$$array[$i]; # install alias into $_
push #return, &$code;
}

My Object::Iterate module is an example of what you are trying to do.

Related

Dereferencing Conditionally in Perl

I have a scalar that may or may not be a reference to an array. If it is a reference to an array, I would like to dereference it and iterate over it. If not, I would like to treat it as a one-element array and iterate over that.
my $result = my_complicated_expression;
for my $value (ref($result) eq 'ARRAY' ? #$result : ($result)) {
# Do work with $value
}
Currently, I have the above code, which works fine but feels clunky and not very Perlish. Is there a more concise way to express the idea of dereferencing a value with fallback behavior if the value is not what I expect?
Just force it before the loop.
Limited, known ref type
my $result = *some function call* // [];
$result = [$result] if ref $result ne 'ARRAY';
for my $val ( #$result ){
print $val;
}
Ref type unknown
#!/usr/bin/perl
use 5.012;
use strict;
no warnings;
sub array_ref;
my $result = [qw/foo bar foobar/];
# $result = 'foo'; # scalar test case
# $result = {foo=>q{bar}}; # hash test case
$result = array_ref $result;
for my $val ( #$result ){
say $val;
}
sub array_ref {
my $ref = shift;
given(ref $ref){
$ref = [%$ref] when('HASH');
$ref = [$ref] when(['SCALAR','']);
when('ARRAY'){}
default {
die 'Did not prepare for other ref types';
}
}
return $ref;
}
This is for demo purposes (you shouldn't use given/when in production code), but shows you could easily test for the ref type and cast a new response. However, if you really don't know what type of variable your function is returning, how are you sure it's even a reference. What if it was an array or hash?
Being perl, there's going to be several answers to this with the 'right' one being a matter of taste - IMHO, an acceptable shortening involves relying on the fact that the ref function returns the empty string if the expression given it is scalar. This means you don't need the eq 'ARRAY' if you know there are only two possibilities (ie, a scalar value and an array ref).
Secondly, you can iterate over a single scalar value (producing 1 iteration, obviously), so you don't have to put the $result in parentheses in the "scalar" case.
Putting these two small simplifications togeather gives;
use v5.12;
my $result1 = "Hello World";
my $result2 = [ "Hello" , "World" ];
for my $result ($result1, $result2) {
for my $value ( ref $result ? #$result : $result) {
say $value ;
}
}
which produces;
Hello World
Hello
World
There's likely to be 'fancier' things you can do, but this seems a reasonable compromise between being terse and readable. Of course, YMMV.
I see that I'm late to this, but I can't help it. With eval and $#, and the comma operator
my $ra = [ qw(a b c) ];
my $x = 23;
my $var = $ra;
# my $var = $x; # swap comment to test the other
foreach my $el ( eval { #{$var} }, $# && $var )
{
next if $el =~ /^$/; # when #$var is good comma adds empty line
print $el, "\n";
}
Prints a b c (one per line), if we swap to my $var = $x it prints 23.
When $var has the reference, the $# is empty but the comma is still executed and this adds an empty line, thus the next in the loop. Alternatively to skipping empty lines one can filter them out
foreach my $el ( grep { !/^$/ } eval { #{$var} }, $# && $var )
This does, in addition, clean out empty lines. However, most of the time that is desirable.
sub deref {
map ref($_) eq 'ARRAY'? #$_ : ref($_) eq 'HASH'? %$_ : $_, #_
}
sub myCompExpr {
1, 2, 3, [4, 5, 6], {Hello => 'world', Answer => 42}
}
print $_ for deref myCompExpr

Why isn't my sort working in Perl?

I have never used Perl, but I need to complete this exercise. My task is to sort an array in a few different ways. I've been provided with a test script. This script puts together the array and prints statements for each stage of it's sorting. I've named it foo.pl:
use strict;
use warnings;
use MyIxHash;
my %myhash;
my $t = tie(%myhash, "MyIxHash", 'a' => 1, 'abe' => 2, 'cat'=>'3');
$myhash{b} = 4;
$myhash{da} = 5;
$myhash{bob} = 6;
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the starting key => val pairs\n";
$t->SortByKey; # sort alphabetically
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the alphabetized key => val pairs\n";
$t->SortKeyByFunc(sub {my ($a, $b) = #_; return ($b cmp $a)}); # sort alphabetically in reverse order
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the reverse alphabetized key => val pairs\n";
$t->SortKeyByFunc(\&abcByLength); # use abcByLength to sort
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the abcByLength sorted key => val pairs\n";
print "Done\n\n";
sub abcByLength {
my ($a, $b) = #_;
if(length($a) == length($b)) { return $a cmp $b; }
else { return length($a) <=> length($b) }
}
Foo.pl uses a package called MyIxHash which I've created a module for called MyIxHash.pm. The script runs through the alphabetical sort: "SortByKey", which I've inherited via the "IxHash" package in my module. The last two sorts are the ones giving me issues. When the sub I've created: "SortKeyByFunc" is ran on the array, it passes in the array and a subroutine as arguments. I've attempted to take those arguments and associate them with variables.
The final sort is supposed to sort by string length, then alphabetically. A subroutine for this is provided at the bottom of foo.pl as "abcByLength". In the same way as the reverse alpha sort, this subroutine is being passed as a parameter to my SortKeyByFunc subroutine.
For both of these sorts, it seems the actual sorting work is done for me, and I just need to apply this subroutine to my array.
My main issue here seems to be that I don't know how, if possible, to take my subroutine argument and run my array through it as a parameter. I'm a running my method on my array incorrectly?
package MyIxHash;
#use strict;
use warnings;
use parent Tie::IxHash;
use Data::Dumper qw(Dumper);
sub SortKeyByFunc {
#my $class = shift;
my ($a, $b) = #_;
#this is a reference to the already alphabetaized array being passed in
my #letters = $_[0][1];
#this is a reference to the sub being passed in as a parameter
my $reverse = $_[1];
#this is my variable to contain my reverse sorted array
my #sorted = #letters->$reverse();
return #sorted;
}
1;
"My problem occurs where I try: my #sorted = #letters->$reverse(); I've also tried: my #sorted = sort {$reverse} #letters;"
You were really close; the correct syntax is:
my $reverse = sub { $b cmp $a };
# ...
my #sorted = sort $reverse #letters;
Also note that, for what are basically historical reasons, sort passes the arguments to the comparison function in the (slightly) magic globals $a and $b, not in #_, so you don't need to (and indeed shouldn't) do my ($a, $b) = #_; in your sortsubs (unless you declare them with a prototype; see perldoc -f sort for the gritty details).
Edit: If you're given a comparison function that for some reason does expect its arguments in #_, and you can't change the definition of that function, then your best bet is probably to wrap it in a closure like this:
my $fixed_sortsub = sub { $weird_sortsub->($a, $b) };
my #sorted = sort $fixed_sortsub #letters;
or simply:
my #sorted = sort { $weird_sortsub->($a, $b) } #letters;
Edit 2: Ah, I see the/a problem. When you write:
my #letters = $_[0][1];
what you end up with a is a single-element array containing whatever $_[0][1] is, which is presumably an array reference. You should either dereference it immediately, like this:
my #letters = #{ $_[0][1] };
or just keep is as a reference for now and dereference it when you use it:
my $letters = $_[0][1];
# ...
my #sorted = sort $whatever #$letters;
Edit 3: Once you do manage to sort the keys, then, as duskwuff notes in his original answer, you'll also need to call the Reorder() method from your parent class, Tie::IxHash to actually change the order of the keys. Also, the first line:
my ($a, $b) = #_;
is completely out of place in what's supposed to be an object method that takes a code reference (and, in fact, lexicalizing $a and $b is a bad idea anyway if you want to call sort later in the same code block). What it should read is something like:
my ($self, $sortfunc) = #_;
In fact, rather than enumerating all the things that seem to be wrong with your original code, it might be easier to just fix it:
package MyIxHash;
use strict;
use warnings;
use parent 'Tie::IxHash';
sub SortKeyByFunc {
my ($self, $sortfunc) = #_;
my #unsorted = $self->Keys();
my #sorted = sort { $sortfunc->($a, $b) } #unsorted;
$self->Reorder( #sorted );
}
1;
or simply:
sub SortKeyByFunc {
my ($self, $sortfunc) = #_;
$self->Reorder( sort { $sortfunc->($a, $b) } $self->Keys() );
}
(Ps. I now see why the comparison functions were specified as taking their arguments in #_ rather than in the globals $a and $b where sort normally puts them: it's because the comparison functions belong to a different package, and $a and $b are not magical enough to be the same in every package like, say, $_ and #_ are. I guess that could be worked around, but it would take some quite non-trivial trickery with caller.)
(Pps. Please do credit me and duskwuff / Stack Overflow when you hand in your exercise. And good luck with learning Perl — trust me, it'll be a useful skill to have.)
Your SortKeyByFunc method returns the results of sorting the array (#sorted), but it doesn't modify the array "in place". As a result, just calling $t->SortKeyByFunc(...); doesn't end up having any visible permanent effects.
You'll need to call $t->Reorder() within your SortKeyByFunc method to have any lasting impact on the array. I haven't tried it, but something like:
$t->Reorder(#sorted);
at the end of your method may be sufficient.

Sub references as arguments

Another Perl-beginner question, but strangely enough, I found no tutorial to explain me this simple problem.
I wanted, as an exercise, to write a function map that takes a function and an array, returning an array. In functional languages, this is used quite often and I heard about the sub references and how to use them.
sub map {
my $f = shift;
my #r = ();
foreach (#_) {
push(#r, &f($_));
}
return #r;
}
sub square {
my $r = shift;
return $r*$r;
}
print map(\&shift, 1, 2, 3, 4, 5);
But, for some reason, I only get the word CODE and a hex-number as an output, five times. I have then changed the call of f in map, to $$f($_) and $f->($_) but all of it had the same result.
What do I do wrong here?
perl have a builin map function. let's call it map2
use &$f to dereference $f
use join to print an array properly
sub map2 {
my $f = shift;
my #r = ();
foreach (#_) {
push(#r, &$f($_));
}
return #r;
}
sub square {
my $r = shift;
return $r*$r;
}
print join ",", map2(\&square, 1, 2, 3, 4, 5);
$ perl 1.pl
1,4,9,16,25
As mentioned in comments, Perl has a built in map function that you should use.
my #squares = map {$_ ** 2} 1 .. 5;
Rather than passing an argument, Perl's built in map sets $_ to each element, which allows you to write your square function succinctly as either {$_ * $_} or {$_ ** 2}
But Perl also gives you the ability to make custom map-like functions with a similar syntax. For example, say you wanted to write a version of map that maps over pairs of values:
sub pair_map (&#) { # the (&#) prototype here tells perl that the sub
my $code = shift; # takes a code block, and then a list, just like `map`
my #ret;
while (#_) {
push #ret, $code->(splice #_, 0, 2);
}
#ret
}
my #pairs = pair_map {\#_} 1 .. 10;
pair_map {print "$_[0]: $_[1]\n"} %hash;
But since Perl has been around a while, most utility functions have probably been written already. A search of CPAN will turn up many map-like functions that do various things.
I found that I frequently need to map over lists with various step sizes, so I wrote the mapn function in List::Gen. This is a fully developed solution, so it includes an optimization when called in void context, and falls back to Perl's own map when n == 1:
sub mapn (&$#) {
my ($sub, $n, #ret) = splice #_, 0, 2;
croak '$_[1] must be >= 1' unless $n >= 1;
return map $sub->($_) => #_ if $n == 1;
my $want = defined wantarray;
while (#_) {
local *_ = \$_[0];
if ($want) {push #ret =>
$sub->(splice #_, 0, $n)}
else {$sub->(splice #_, 0, $n)}
}
#ret
}
Both pair_map and mapn utilize an advanced feature of Perl subroutines called prototypes. These prototypes are not argument validation tools (like in many other languages). Rather, they tell perl to interpret calls to the functions in a special way (similar to the way some other builtin functions are used). In this case, the & portion of the prototype tells perl that the first argument to these functions can be written as a bare block, just like a normal map call.

Why is Perl foreach variable assignment modifying the values in the array?

OK, I have the following code:
use strict;
my #ar = (1, 2, 3);
foreach my $a (#ar)
{
$a = $a + 1;
}
print join ", ", #ar;
and the output?
2, 3, 4
What the heck? Why does it do that? Will this always happen? is $a not really a local variable? What where they thinking?
Perl has lots of these almost-odd syntax things which greatly simplify common tasks (like iterating over a list and changing the contents in some way), but can trip you up if you're not aware of them.
$a is aliased to the value in the array - this allows you to modify the array inside the loop. If you don't want to do that, don't modify $a.
See perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
There is nothing weird or odd about a documented language feature although I do find it odd how many people refuse check the docs upon encountering behavior they do not understand.
$a in this case is an alias to the array element. Just don't have $a = in your code and you won't modify the array. :-)
If I remember correctly, map, grep, etc. all have the same aliasing behaviour.
As others have said, this is documented.
My understanding is that the aliasing behavior of #_, for, map and grep provides a speed and memory optimization as well as providing interesting possibilities for the creative. What happens is essentially, a pass-by-reference invocation of the construct's block. This saves time and memory by avoiding unnecessary data copying.
use strict;
use warnings;
use List::MoreUtils qw(apply);
my #array = qw( cat dog horse kanagaroo );
foo(#array);
print join "\n", '', 'foo()', #array;
my #mapped = map { s/oo/ee/g } #array;
print join "\n", '', 'map-array', #array;
print join "\n", '', 'map-mapped', #mapped;
my #applied = apply { s/fee//g } #array;
print join "\n", '', 'apply-array', #array;
print join "\n", '', 'apply-applied', #applied;
sub foo {
$_ .= 'foo' for #_;
}
Note the use of List::MoreUtils apply function. It works like map but makes a copy of the topic variable, rather than using a reference. If you hate writing code like:
my #foo = map { my $f = $_; $f =~ s/foo/bar/ } #bar;
you'll love apply, which makes it into:
my #foo = apply { s/foo/bar/ } #bar;
Something to watch out for: if you pass read only values into one of these constructs that modifies its input values, you will get a "Modification of a read-only value attempted" error.
perl -e '$_++ for "o"'
the important distinction here is that when you declare a my variable in the initialization section of a for loop, it seems to share some properties of both locals and lexicals (someone with more knowledge of the internals care to clarify?)
my #src = 1 .. 10;
for my $x (#src) {
# $x is an alias to elements of #src
}
for (#src) {
my $x = $_;
# $_ is an alias but $x is not an alias
}
the interesting side effect of this is that in the first case, a sub{} defined within the for loop is a closure around whatever element of the list $x was aliased to. knowing this, it is possible (although a bit odd) to close around an aliased value which could even be a global, which I don't think is possible with any other construct.
our #global = 1 .. 10;
my #subs;
for my $x (#global) {
push #subs, sub {++$x}
}
$subs[5](); # modifies the #global array
Your $a is simply being used as an alias for each element of the list as you loop over it. It's being used in place of $_. You can tell that $a is not a local variable because it is declared outside of the block.
It's more obvious why assigning to $a changes the contents of the list if you think about it as being a stand in for $_ (which is what it is). In fact, $_ doesn't exist if you define your own iterator like that.
foreach my $a (1..10)
print $_; # error
}
If you're wondering what the point is, consider the case:
my #row = (1..10);
my #col = (1..10);
foreach (#row){
print $_;
foreach(#col){
print $_;
}
}
In this case it is more readable to provide a friendlier name for $_
foreach my $x (#row){
print $x;
foreach my $y (#col){
print $y;
}
}
Try
foreach my $a (#_ = #ar)
now modifying $a does not modify #ar.
Works for me on v5.20.2

How can I make Perl functions that use $_ by default?

I have an array and a simple function that trims white spaces:
my #ar=("bla ", "ha 1")
sub trim { my $a = shift; $a =~ s/\s+$//; $a}
Now, I want to apply this to an array with the map function. Why can't I do this by just giving the function name like one would do with built-in functions?
For example, you can do
print map(length, #ar)
But you can't do
print map(trim, #ar)
You have to do something like:
print map {trim($_)} #ar
print map(trim($_), #ar)
If you are using 5.10 or later, you can specify _ as the prototype for trim. If you are using earlier versions, use Axeman's answer:
As the last character of a prototype, or just before a semicolon, you can use _ in place of $ : if this argument is not provided, $_ will be used instead.
use strict; use warnings;
my #x = ("bla ", "ha 1");
sub trim(_) { my ($x) = #_; $x =~ s!\s+$!!; $x }
print map trim, #x;
Incidentally, don't use $a and $b outside of a sort comparator: They are immune from strict checking.
However, I prefer not to use prototypes for functions I write mainly because their use makes it harder to mentally parse the code. So, I would prefer using:
map trim($_), #x;
See also perldoc perlsub:
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
The prototype that Sinan talks about is the best current way. But for earlier versions, there is still the old standby:
sub trim {
# v-- Here's the quick way to do it.
my $str = #_ ? $_[0] : $_;
# That was it.
$str =~ s/^\s+|\s+$//;
return $str;
}
Of course, I have a trim function with more features and handles more arguments and list context, but it doesn't demonstrate the concept as well. The ternary expression is a quick way to do what the '_' prototype character now does.
My favorite way to optionally use $_ without needing 5.10+ is as follows:
sub trim {
my ($s) = (#_, $_);
$s =~ s/\s+$//;
$s
}
This assigns the first element of #_ to $s if there is one. Otherwise it uses $_.
Many Perl built-in functions operate on $_ if given no arguments.
If your function did the same, it would work:
my #ar = ("bla ", "ha 1");
sub trim { my $s = #_ ? $_[0] : $_; $s =~ s/\s+$//; $s}
print map(trim, #ar), "\n";
And yes, Perl is kind of gross.