Perl - Best practices when sending blocks to subs

I'm a big fan of functional programming, so when I discovered block references in Perl I started using them a lot.
However, the functions I've written that take blocks as arguments are written in this style:
sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$code;
    }
    @result;
}
(From http://perldoc.perl.org/perlsub.html#Prototypes)
In essence, most of my functions set $_ in order for the code block to gain access to data in my sub. I guess my question can be split into three subquestions:
1. Are there some major pitfalls in this approach?
2. Is it a better idea to localize $_ before setting it?
3. Should I use partially applied functions instead?
I'm still a Perl newbie so any answers and suggestions are appreciated - thanks in advance! :)

In the code you have written:
sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$code;
    }
    @result;
}
The foreach loop is implicitly localizing the $_ variable on every loop iteration. It is perfectly safe (and the fastest way to get values into $_ properly).
The only nit that I have with the code above is that every time &$code is executed, it has access to the source argument list, which could cause a bug: calling a sub with & and no parentheses passes along the current @_. You could rewrite the code as follows:
sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach $_ (splice @_) {
        push(@result, $_) if &$code; # @_ is empty here
    }
    @result;
}
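To make that nit concrete, here is a small sketch comparing the two versions. The names mygrep_shared and mygrep_spliced are invented for the comparison, and each block simply records how many elements it can see in its own @_:

```perl
use strict;
use warnings;

# Question's version: &$code (no parentheses) hands the block mygrep's own @_.
sub mygrep_shared (&@) {
    my $code = shift;
    my @result;
    foreach $_ (@_) {
        push(@result, $_) if &$code;
    }
    @result;
}

# Rewritten version: splice empties @_ before the block is ever called.
sub mygrep_spliced (&@) {
    my $code = shift;
    my @result;
    foreach $_ (splice @_) {
        push(@result, $_) if &$code;
    }
    @result;
}

# Each block records how many elements it finds in its own @_.
my @seen_shared;
my @seen_spliced;
my @kept_shared  = mygrep_shared  { push @seen_shared,  scalar @_; 1 } 1, 2, 3;
my @kept_spliced = mygrep_spliced { push @seen_spliced, scalar @_; 1 } 1, 2, 3;

print "shared:  @seen_shared\n";    # the block saw mygrep's remaining arguments
print "spliced: @seen_spliced\n";   # the block saw an empty @_
```

If the block shifted or otherwise edited its @_, the first version would be editing mygrep's argument list in the middle of the loop.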
Here are a few other ways you could write that function:
sub mygrep (&@) {
    my ($code, @result) = shift;
    &$code and push @result, $_ for splice @_;
    @result
}

sub mygrep (&@) {
    my $code = shift;
    # or using grep in our new grep:
    grep &$code, splice @_
}
Each of these examples provides an aliased $_ to its subroutine, with proper localization.
If you are interested in higher order functions, I'd encourage you to take a look at my module List::Gen on CPAN, which provides dozens of higher order functions for manipulating both real and lazy lists.
use 5.010;  # for say
use List::Gen;
my $list = filter {$_ % 2} <1..>;
# as a lazy array:
say "@$list[0 .. 5]"; # 1 3 5 7 9 11
# as an object:
$list->map('**2')->drop(100)->say(5); # 40401 41209 42025 42849 43681
zip('.' => <a..>, <1..>)->say(5); # a1 b2 c3 d4 e5

How about using $code->($arg)?
sub mygrep (&@) {
    my $code = shift;
    my @result;
    foreach my $arg (@_) {
        push(@result, $arg) if $code->( $arg);
    }
    @result;
}
I haven't tested it but I would assume this would work, and it would let you pass additional arguments to $code.
Update: this looked fun, so I went ahead and tested it. It works just fine; see below. (I intensely dislike prototypes, so I removed the prototype, especially as it kept complaining about @a not being an array ref.)
#!/usr/bin/perl
use strict;
use warnings;
sub mygrep {
    my $code = shift;
    my @result;
    foreach my $arg (@_) {
        push(@result, $arg) if $code->( $arg);
    }
    @result;
}
my @a= ( 1, 2, 3, 4, 5, 6);
print mygrep( sub { return shift() % 2 }, @a), "\n";
And of course the main fun with this line of thinking is also to generate the code:
#!/usr/bin/perl
use strict;
use warnings;
sub mygrep {
    my $code = shift;
    my @result;
    foreach my $arg (@_) {
        push(@result, $arg) if $code->( $arg);
    }
    @result;
}
my @a= ( 1, 2, 3, 4, 5, 6, 7, 8, 9);
print mygrep( mod_filter( 3), @a), "\n";
print mygrep( mod_filter( 4), @a), "\n";
# Returns a closure that keeps the multiples of $filter.
sub mod_filter
{ my( $filter)= @_;
  return sub { ! (shift() % $filter) };
}
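To illustrate the point about passing additional arguments to $code, here is a sketch of a variant (mygrep_indexed and every_nth are hypothetical names, not from the answer) that hands the callback both the element and its index:

```perl
use strict;
use warnings;

# Hypothetical variant of mygrep: the callback receives the element and its index.
sub mygrep_indexed {
    my $code = shift;
    my @result;
    for my $i (0 .. $#_) {
        push @result, $_[$i] if $code->($_[$i], $i);
    }
    return @result;
}

# Generator in the style of mod_filter: keep elements whose index is a multiple of $step.
sub every_nth {
    my ($step) = @_;
    return sub {
        my ($value, $index) = @_;
        return $index % $step == 0;
    };
}

my @letters = ('a' .. 'h');
my @picked  = mygrep_indexed(every_nth(3), @letters);
print "@picked\n";
```

This prints a d g, since only indices 0, 3 and 6 pass the filter.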

1. Are there some major pitfalls in this approach?
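If a lexical my $_; is in scope where the block is written, the block sees that lexical instead of the package variable $_, so your changes to $_ are hidden from it. There's nothing you can do about that from inside of mygrep. (Lexical $_ was experimental and was removed in Perl 5.24.)
&$code is very special: called with & and no parentheses, the sub reuses the current @_ instead of getting a fresh argument list. You want &$code() or $code->() instead.
Because foreach aliases $_ to each element, changing $_ will change the arguments passed to mygrep. That's undesirable here.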
2. Is it a better idea to localize $_ before setting it?
for provides much better localisation than local, but it also provides aliasing that's undesirable here.
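A minimal sketch of the aliasing half of this remark, assuming a deliberately careless block that writes to $_: under for the write lands in the array, while local $_ = ... hands the block a copy.

```perl
use strict;
use warnings;

my @data = (1, 2, 3);
my $code = sub { $_ *= 10 };   # a deliberately careless block that writes to $_

# for: $_ is an alias, so the block's write lands in the array.
my @aliased = @data;
$code->() for @aliased;

# local: $_ holds a copy, so the array is left alone.
my @copied = @data;
for my $elem (@copied) {
    local $_ = $elem;   # the previous $_ comes back when this block exits
    $code->();
}

print "aliased: @aliased\n";
print "copied:  @copied\n";
```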
3. Should I use partially applied functions instead?
I don't know what that means.
Fixed:
sub mygrep (&@) {
    my $code = shift;
    my @result;
    for (@_) {
        # Create copy so $_ can be modified safely.
        for (my $s = $_) {
            push @result, $_ if $code->();
        }
    }
    return @result;
}
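As a quick check of the Fixed version, this sketch repeats it with a block that modifies $_ (the same block used in the comparison that follows) and confirms that the caller's array survives:

```perl
use strict;
use warnings;

# The fixed mygrep: the inner for aliases $_ to a private copy $s.
sub mygrep (&@) {
    my $code = shift;
    my @result;
    for (@_) {
        # Create copy so $_ can be modified safely.
        for (my $s = $_) {
            push @result, $_ if $code->();
        }
    }
    return @result;
}

my @list = (1 .. 6);
# Keeps odd numbers and increments them, but only the copies are incremented.
my @out = mygrep { if ($_ % 2) { ++$_; 1 } else { 0 } } @list;

print "out:  @out\n";    # incremented copies of the odd elements
print "list: @list\n";   # the caller's array is untouched
```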
That said, I think mygrep is kind of pointless, since map+grep already does what you want more easily. Compare
mygrep { if ($_ % 2) { ++$_; 1 } else { 0 } } LIST
with
map { $_+1 } grep { $_ % 2 } LIST
You can even merge the map and grep.
map { $_ % 2 ? $_+1 : () } LIST
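A small sketch confirming that the two-pass and single-pass forms agree, using 1 .. 9 as an arbitrary input list:

```perl
use strict;
use warnings;

my @list = (1 .. 9);

# Two passes: grep keeps the odd numbers, map adds one to each survivor.
my @two_pass = map { $_ + 1 } grep { $_ % 2 } @list;

# One pass: returning the empty list () from map drops the element.
my @one_pass = map { $_ % 2 ? $_ + 1 : () } @list;

print "@two_pass\n";
print "@one_pass\n";
```

Both lines print 2 4 6 8 10.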

It's absolutely better to localize $_. The subref can modify the value of $_, and those changes will propagate into the calling function. This isn't a problem in the mygrep() case, but could be in others.
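To illustrate the propagation, here is a sketch with two hypothetical helpers, call_without_local and call_with_local, each invoked while the caller's $_ is aliased to an array element:

```perl
use strict;
use warnings;

# Two hypothetical helpers that hand a callback a value through $_.
sub call_without_local {
    my ($code, $value) = @_;
    $_ = $value;          # writes straight into whatever $_ the caller had
    return $code->();
}

sub call_with_local {
    my ($code, $value) = @_;
    local $_ = $value;    # the caller's $_ is restored when this sub returns
    return $code->();
}

my @plain   = qw(apple banana);
my @guarded = qw(apple banana);

# In both loops $_ is aliased to an array element while the helper runs.
for (@plain)   { call_without_local(sub { length }, 'xyz') }
for (@guarded) { call_with_local(sub { length }, 'xyz') }

print "@plain\n";     # the elements were overwritten
print "@guarded\n";   # the elements are intact
```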

Related

Unexpected results for high order function

I have a higher order function that maps even position values in an array:
sub map_even(&@) {
    my $block = shift;
    my @res;
    for $i (0..$#_) {
        push @res, $i%2 ? $_[$i] : &$block($_[$i]);
    }
    @res;
}
print map_even {$_*$_} 1,2,3,4;
I am expecting the output to be 14316, but the actual output is
0204
Why does this happen, and how can I fix it? And can any improvements be made to the code?

In your anonymous function you have to access the first input argument via $_[0] (hint: the @_ array).
use strict;
use warnings;
sub map_even(&@) {
    my $block = shift;
    my @res;
    for my $i (0..$#_) {
        push @res, $i%2 ? $block->($_[$i]) : $_[$i];
    }
    @res;
}
print join ",", map_even {$_[0]*$_[0]} 1,2,3,4;
output
1,4,3,16
Using $_:
sub map_even(&@) {
    my $block = shift;
    my @res;
    for my $i (0..$#_) {
        push @res, $i%2 ? $block->() : $_ for $_[$i];
        # or
        # local $_ = $_[$i];
        # push @res, $i%2 ? $block->() : $_;
    }
    @res;
}
print join ",", map_even {$_*$_} 1,2,3,4;

In your map_even block, you use the special $_ variable. However, you have to set it inside your loop:
local $_ = $_[$i];
... $block->();
$_ is a global variable and can be temporarily overridden with the local operator. It has nothing to do with subroutine arguments.
About aliasing: Perl's for, map and grep mostly alias $_ to the current element as a performance hack, not because this behavior would be particularly desirable. To perform such an alias yourself, localize the whole *_ typeglob, which contains the $_ variable, and then assign a scalar reference to the alias target to the glob:
local *_ = \$_[$i];
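Putting that idiom into map_even, here is a sketch contrasting the glob alias with a plain local $_ = $_[$i] copy. The names map_even_alias and map_even_copy are made up, and the block deliberately modifies $_ to expose the difference:

```perl
use strict;
use warnings;

# Aliasing variant: the glob trick makes $_ *be* $_[$i] while the block runs.
sub map_even_alias (&@) {
    my $block = shift;
    my @res;
    for my $i (0 .. $#_) {
        if ($i % 2) {
            local *_ = \$_[$i];   # install the alias; undone at the end of this if-block
            push @res, $block->();
        }
        else {
            push @res, $_[$i];
        }
    }
    @res;
}

# Copying variant: local $_ = ... gives the block a private copy.
sub map_even_copy (&@) {
    my $block = shift;
    my @res;
    for my $i (0 .. $#_) {
        local $_ = $_[$i];
        push @res, $i % 2 ? $block->() : $_;
    }
    @res;
}

my @x = (1, 2, 3, 4);
my @y = (1, 2, 3, 4);
my @from_alias = map_even_alias { $_ *= 10 } @x;
my @from_copy  = map_even_copy  { $_ *= 10 } @y;

print "@from_alias | @x\n";   # the block's writes reached @x
print "@from_copy | @y\n";    # @y is untouched
```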

I would solve this one of two ways.
First, by using List::Util's pairmap:
use strict;
use warnings;
use List::Util qw(pairmap);
my @x = (1 .. 4);
my @result = pairmap {$a, $b**2} @x;
print "@result\n";
Or more simply, by just using the indexes:
use strict;
use warnings;
my @x = (1 .. 4);
my @result = map {$_ % 2 ? $x[$_] ** 2 : $x[$_]} (0..$#x);
print "@result\n";
However, if you really wanted a new sub, I'd just set up a flip-flop:
use strict;
use warnings;
sub map_even(&@) {
    my $block = shift;
    my $even = 1;
    map {($even ^= 1) ? $block->() : $_} @_;
}
print join " ", map_even {$_*$_} 1,2,3,4;
All output:
1 4 3 16

What do dollar, at-sign and semicolon characters in Perl parameter lists mean?

I have encountered a number of Perl scripts in the codebase at my job. Some of them contain subroutines with the following syntax oddity:
sub sum($$$) {
    my($a,$b,$m)=@_;
    for my $i (0..$m) {
        $$a[$i] += $$b[$i] if $$b[$i] > 0;
    }
}
sub gNode($$;$$) {
    my($n,$l,$s,$d) = @_;
    return (
        "Node name='$n' label='$l' descr='$d'" ,
        $s ? ("Shape type='$s' /") : (),
        '/Node'
    );
}
sub gOut($$@) {
    my $h = shift;
    my $i = shift;
    if ($i > 0) {
        print $h (('')x$i, map '<'.$_.'>', @_);
    } else {
        print $h map '<'.$_.'>', @_;
    }
}
Leaving aside the question of what these subroutines are meant to do (I'm not entirely sure myself...), what do the sequences of characters in the 'parameter list' position mean? Viz. the $$$, $$;$$ and $$@ sequences in these examples.
I have a very limited understanding of Perl, but I believe that the my($a,$b,$m)=@_; line in the first example (sum) unpacks the parameters passed to the subroutine into the $a, $b and $m local variables. This suggests that the $$$ indicates the arity and type signature of sum (it expects three scalars, in this case). This would potentially suggest that gOut expects two scalars and an array. Is this the correct interpretation?
Even if the above interpretation is correct, I'm lost as to the meaning of the semicolon in the second routine (gNode).

See the Prototypes section of perldoc perlsub:
# Declared as          Called as
sub mylink ($$)        mylink $old, $new
sub myvec ($$$)        myvec $var, $offset, 1
sub myindex ($$;$)     myindex &getstring, "substr"
sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off
sub myreverse (@)      myreverse $a, $b, $c
sub myjoin ($@)        myjoin ":", $a, $b, $c
sub mypop (+)          mypop @array
sub mysplice (+$$@)    mysplice @array, 0, 2, @pushme
sub mykeys (+)         mykeys %{$hashref}
sub myopen (*;$)       myopen HANDLE, $name
sub mypipe (**)        mypipe READHANDLE, WRITEHANDLE
sub mygrep (&@)        mygrep { /foo/ } $a, $b, $c
sub myrand (;$)        myrand 42
sub mytime ()          mytime
Don't forget: This is all very powerful, of course, and should be used only in moderation to make the world a better place.
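Regarding the interpretation in the question: a $ slot really means "evaluate this argument in scalar context", and ; marks where the optional arguments begin. Here is a small sketch with hypothetical subs takes_scalar and node_like (the latter shaped like gNode's ($$;$$)):

```perl
use strict;
use warnings;

# '$' puts the argument in scalar context, so an array arrives as its length.
sub takes_scalar ($) { return $_[0] }

# ';' splits mandatory from optional: two required, up to two more allowed.
sub node_like ($$;$$) { return scalar @_ }

my @items = qw(x y z);
my $got  = takes_scalar(@items);            # 3, not 'x'
my $two  = node_like('n', 'l');             # optional arguments left out
my $four = node_like('n', 'l', 's', 'd');   # all four supplied

# Too few arguments is a compile-time error, so compile the call with string eval.
my $too_few_ok  = eval q{ node_like('n'); 1 };
my $too_few_err = $@;

print "$got $two $four\n";
print "too few: ", ($too_few_ok ? "compiled\n" : $too_few_err);
```

Because argument-count checks happen when the call is compiled, a too-short call to node_like is rejected before it ever runs, which is why the sketch compiles it with a string eval.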

I agree with the rest: don't use sub prototypes unless you know what you're doing. "With great power comes great responsibility." Those look like they were created by someone used to C prototypes. For example, since sum dereferences its first two arguments as arrays, it really should have this prototype:
sub sum (\@\@$) {

Why does this Perl function appear to process only some of its arguments?

Why do
my $i=0;
my @arr=();
sub readall {
    foreach (@_) {
        $arr[$i] = shift @_;
        $i++;
    }
}
readall(1, 2, 3, 4, 5);
print "@arr"
and
my $i=0;
my @arr=();
sub readall {
    foreach (@_) {
        $arr[$i] = shift @_;
        print $arr[$i];
        $i++;
    }
}
readall(1, 2, 3, 4, 5);
print only three of the arguments to readall?
Why does this function, which seems like it should behave the same, process all five arguments?
sub readall {
    foreach (@_) {
        print $_;
    }
}
readall(1, 2, 3, 4, 5);
This also reads all five (but does operate on a different principle):
my @arr=();
sub readall {
    push(@arr, @_);
}
readall(1, 2, 3, 4, 5);
print "@arr"

Using foreach and shift on the same array may be causing confusion. Both of the versions that fail use it; both of the versions that work don't.
Just changing shift @_ to $_ fixes it.
This is happening because you are shortening the array as you are iterating.

Every time you shift your array, it gets shorter, so you are not operating on the whole array, and the loop stops early. You can see this by adding a line to your code:
perl -wlae 'my $i=0; my @arr=(); sub readall {foreach (@_) {$arr[$i]=shift @_; $i++; print @_;}} readall(1,2,3,4,5); print "@arr"'
2345
345
45
I assume you can figure it out from here.

You iterate over all arguments in @_, simultaneously shifting @_ to make it shorter:
sub readall {foreach (@_) {$arr[$i]=shift @_ ....}
Let the great Perlists here explain what's expected in this case, what's documented and why you shouldn't do it. For me it's just logically wrong, and does not make any sense. (perlsyn does warn that if any part of the LIST is an array, foreach will get very confused if you add or remove elements within the loop body.) Perhaps s/foreach/while/ is more idiomatic (at least, it works).
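A sketch of the s/foreach/while/ suggestion, assuming the goal is simply to copy every argument into @arr. The while condition re-checks the current length of @_ on every pass, so shifting inside the loop is safe:

```perl
use strict;
use warnings;

my @arr;

# while re-evaluates @_ (its current length) before every pass,
# so shrinking it with shift inside the loop is safe.
sub readall {
    while (@_) {
        push @arr, shift @_;
    }
}

readall(1, 2, 3, 4, 5);
print "@arr\n";
```

This prints all five values: 1 2 3 4 5.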

shift @_ inside foreach (@_) is wrong; getting rid of it fixes the array walk:
$ perl -wlae 'my $i=0; my @arr=(); sub readall {foreach (@_) {$arr[$i]=$_[$i]; $i++}} readall(1,2,3,4,5); print "@arr"'
1 2 3 4 5
foreach works by aliasing $_ to each element of the array:
$ perl -wlae 'my @arr=(1..5);foreach (@arr) { $_ *= 2 }; foreach (@arr) { print }'
2
4
6
8
10
So removing elements with shift/pop while foreach is walking the array messes everything up.

How is the map function in Perl implemented?

Is the map function in Perl written in Perl? I just cannot figure out how to implement it. Here is my attempt:
use Data::Dumper;
sub Map {
    my ($function, $sequence) = @_;
    my @result;
    foreach my $item (@$sequence) {
        my $_ = $item;
        push @result, $function->($item);
    }
    return @result
}
my @sample = qw(1 2 3 4 5);
print Dumper Map(sub { $_ * $_ }, \@sample);
print Dumper map({ $_ * $_ } @sample);
$_ in $function is undefined, as it should be, but how does map overcome this?

map has some special syntax, so you can't entirely implement it in pure Perl, but this would come pretty close to it (as long as you're using the block form of map):
sub Map(&@) {
    my ($function, @sequence) = @_;
    my @result;
    foreach my $item (@sequence) {
        local $_ = $item;
        push @result, $function->($item);
    }
    return @result
}
use Data::Dumper;
my @sample = qw(1 2 3 4 5);
print Dumper Map { $_ * $_ } @sample;
print Dumper map { $_ * $_ } @sample;
$_ being undefined is overcome by using local $_ instead of my $_. Actually, you almost never want to use my $_ (even though you do want to use my on almost all other variables).
Adding the (&@) prototype lets you omit sub in front of the block. Again, you almost never want to use prototypes, but this is a valid use of them.
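As a sketch of the prototype's two call forms, this repeats the Map above and calls it both with a bare block and with a reference to a named sub (square is a made-up helper that reads $_). In the \&name form, the comma is required:

```perl
use strict;
use warnings;

# The Map from the answer: copies each element into a localized $_.
sub Map (&@) {
    my ($function, @sequence) = @_;
    my @result;
    foreach my $item (@sequence) {
        local $_ = $item;
        push @result, $function->($item);
    }
    return @result;
}

sub square { $_ * $_ }   # hypothetical named sub that reads $_

my @sample   = (1 .. 5);
my @blockish = Map { $_ * $_ } @sample;   # bare block: no 'sub', no comma
my @named    = Map \&square, @sample;     # \&name form: comma required

print "@blockish\n@named\n";
```

Both lines print 1 4 9 16 25.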

While the accepted answer implements a map-like function, it does NOT do it in the way perl would. An important part of for, foreach, map, and grep is that the $_ they provide to you is always an alias to the values in the argument list. This means that calling something like s/a/b/ in any of those constructs will modify the elements they were called with. This allows you to write things like:
use 5.010;  # for say
my ($x, $y) = qw(foo bar);
$_ .= '!' for $x, $y;
say "$x $y"; # foo! bar!
map {s/$/!!!/} $x, $y;
say "$x $y"; # foo!!!! bar!!!!
Since your question has Map working on array references rather than plain lists, here is a version that works on array refs and is as close to the builtin map as you can get in pure Perl.
use 5.010;
use warnings;
use strict;
sub Map (&\@) {
    my ($code, $array) = splice @_;
    my @return;
    push @return, &$code for @$array;
    @return
}
my @sample = qw(1 2 3 4 5);
say join ', ' => Map { $_ * $_ } @sample; # 1, 4, 9, 16, 25
say join ', ' => map { $_ * $_ } @sample; # 1, 4, 9, 16, 25
In Map, the (&\@) prototype tells perl that the Map bareword will be parsed with different rules than a usual subroutine. The & indicates that the first argument will either be a bare block Map {...} NEXT or it will be a literal code reference Map \&somesub, NEXT. Note the comma between the arguments in the latter version. The \@ prototype indicates that the next argument will start with @ and will be passed in as an array reference.
Finally, the splice @_ line empties @_ rather than just copying the values out. This is so that the &$code line will see an empty @_ rather than the args Map received. The reason for &$code is that it is the fastest way to call a subroutine, and is as close to the multicall calling style that map uses as you can get without using C. This calling style is perfectly suited for this usage, since the argument to the block is in $_, which does not require any stack manipulation.
In the code above, I cheat a little bit and let for do the work of localizing $_. This is good for performance, but to see how it works, here is that line rewritten:
for my $i (0 .. $#$array) { # for each index
    local *_ = \$$array[$i]; # install alias into $_
    push @return, &$code;
}
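To see why the aliasing matters, this sketch pairs the Map above with a hypothetical MapCopy that uses local $_ = $item instead, and runs a substitution through each:

```perl
use strict;
use warnings;

# Aliasing Map, as in the answer: $_ is the array element itself.
sub Map (&\@) {
    my ($code, $array) = splice @_;
    my @return;
    push @return, &$code for @$array;
    @return
}

# Hypothetical copying variant: $_ is a localized copy of each element.
sub MapCopy (&\@) {
    my ($code, $array) = splice @_;
    my @return;
    for my $item (@$array) {
        local $_ = $item;
        push @return, &$code;
    }
    @return
}

my @words  = qw(foo bar);
my @words2 = qw(foo bar);
Map     { s/$/!/ } @words;    # edits @words in place, like the builtin map
MapCopy { s/$/!/ } @words2;   # only the copies are edited

print "@words | @words2\n";
```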

My Object::Iterate module is an example of what you are trying to do.

perl: iterate over a typeglob

Given a typeglob, how can I find which types are actually defined?
In my application, we use Perl as a simple configuration format.
I'd like to require() the user config file, then be able to see which variables are defined, as well as what types they are.
Code: (questionable quality advisory)
#!/usr/bin/env perl
use strict;
use warnings;
my %before = %main::;
require "/path/to/my.config";
my %after = %main::;
foreach my $symbol (sort keys %after) {
    next if exists $before{$symbol};
    local *myglob = $after{$symbol};
    # the SCALAR slot always holds a reference, so we check the value instead
    if ( defined ${ *myglob{SCALAR} } ) {
        my $val = ${ *myglob{SCALAR} };
        print "\$$symbol = '".$val."'\n" ;
    }
    if ( defined *myglob{ARRAY} ) {
        my @val = @{ *myglob{ARRAY} };
        print "\@$symbol = ( '". join("', '", @val) . "' )\n" ;
    }
    if ( defined *myglob{HASH} ) {
        my %val = %{ *myglob{HASH} };
        print "\%$symbol = ( ";
        while( my ($key, $val) = each %val ) {
            print "$key=>'$val', ";
        }
        print ")\n" ;
    }
}
my.config:
@A = ( a, b, c );
%B = ( b=>'bee' );
$C = 'see';
output:
@A = ( 'a', 'b', 'c' )
%B = ( b=>'bee', )
$C = 'see'
$_<my.config = 'my.config'

In the fully general case, you can't do what you want thanks to the following excerpt from perlref:
*foo{THING} returns undef if that particular THING hasn't been used yet, except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn't been used yet. This might change in a future release.
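A small sketch of the behaviour that excerpt describes, using a throwaway package variable @used (only its ARRAY slot is ever touched):

```perl
use strict;
use warnings;

our @used = (1, 2);   # only the ARRAY slot of *used is ever touched

my $array_ref  = *used{ARRAY};    # used: a real reference
my $hash_ref   = *used{HASH};     # never used: undef
my $scalar_ref = *used{SCALAR};   # a reference anyway, even though $used was never used

print defined $array_ref ? "ARRAY: defined\n" : "ARRAY: undef\n";
print defined $hash_ref  ? "HASH: defined\n"  : "HASH: undef\n";
print "SCALAR: ", ref $scalar_ref, ", value ",
      (defined $$scalar_ref ? "defined" : "undef"), "\n";
```

This is why the approaches below check whether the scalar's value is defined rather than whether the SCALAR slot is.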
But if you're willing to accept the restriction that any scalar must have a defined value to be detected, then you might use code such as
#! /usr/bin/perl
use strict;
use warnings;
open my $fh, "<", \$_; # get DynaLoader out of the way
my %before = %main::;
require "my.config";
my %after = %main::;
foreach my $name (sort keys %after) {
unless (exists $before{$name}) {
no strict 'refs';
my $glob = $after{$name};
print "\$$name\n" if defined ${ *{$glob}{SCALAR} };
print "\@$name\n" if defined *{$glob}{ARRAY};
print "%$name\n" if defined *{$glob}{HASH};
print "&$name\n" if defined *{$glob}{CODE};
print "$name (format)\n" if defined *{$glob}{FORMAT};
print "$name (filehandle)\n" if defined *{$glob}{IO};
}
}
will get you there.
With my.config of
$JACKPOT = 3_756_788;
$YOU_CANT_SEE_ME = undef;
@OPTIONS = qw/ apple cherries bar orange lemon /;
%CREDITS = (1 => 1, 5 => 6, 10 => 15);
sub is_jackpot {
    local $" = ""; # join interpolated lists with no separator
    "@_[0,1,2]" eq "barbarbar";
}
open FH, "<", \$JACKPOT;
format WinMessage =
You win!
.
the output is
%CREDITS
FH (filehandle)
$JACKPOT
@OPTIONS
WinMessage (format)
&is_jackpot
Printing the values as well takes a little work, but we can use the Data::Dumper module to take part of the burden. The front matter is similar:
#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub _dump {
my($ref) = @_;
local $Data::Dumper::Indent = 0;
local $Data::Dumper::Terse = 1;
scalar Dumper $ref;
}
open my $fh, "<", \$_; # get DynaLoader out of the way
my %before = %main::;
require "my.config";
my %after = %main::;
We need to dump the various slots slightly differently and in each case remove the trappings of references:
my %dump = (
    SCALAR => sub {
        my($ref,$name) = @_;
        return unless defined $$ref;
        "\$$name = " . substr _dump($ref), 1;
    },
    ARRAY => sub {
        my($ref,$name) = @_;
        return unless defined $ref;
        for ("\@$name = " . _dump $ref) {
            s/= \[/= (/;
            s/\]$/)/;
            return $_;
        }
    },
    HASH => sub {
        my($ref,$name) = @_;
        return unless defined $ref;
        for ("%$name = " . _dump $ref) {
            s/= \{/= (/;
            s/\}$/)/;
            return $_;
        }
    },
);
Finally, we loop over the set-difference between %before and %after:
foreach my $name (sort keys %after) {
unless (exists $before{$name}) {
no strict 'refs';
my $glob = $after{$name};
foreach my $slot (keys %dump) {
my $var = $dump{$slot}(*{$glob}{$slot},$name);
print $var, "\n" if defined $var;
}
}
}
Using the my.config from your question, the output is
$ ./prog.pl
@A = ('a','b','c')
%B = ('b' => 'bee')
$C = 'see'

Working code using a CPAN module that gets some of the hair out of the way, Package::Stash. As noted in my comment to gbacon's answer, this is blind to the config file doing $someval = undef but that seems to be unavoidable, and at least the other cases are caught. It also limits itself to the SCALAR, ARRAY, HASH, CODE, and IO types -- getting GLOB and FORMAT is possible but it makes the code less pretty and also creates noise in the output :)
#!perl
use strict;
use warnings;
use Package::Stash;
sub all_vars_in {
    my ($package) = @_;
    my @ret;
    my $stash = Package::Stash->new($package);
    for my $sym ($stash->list_all_package_symbols) {
        for my $sigil (qw($ @ % &), '') {
            my $fullsym = "$sigil$sym";
            push @ret, $fullsym if $stash->has_package_symbol($fullsym);
        }
    }
    @ret;
}
my %before;
$before{$_} ++ for all_vars_in('main');
require "my.config";
for my $var (all_vars_in('main')) {
print "$var\n" unless exists $before{$var};
}

Beginning in 5.010, you can distinguish whether a SCALAR exists using the B introspection module; see Detecting declared package variables in perl
Update: example copied from that answer:
# package main;
our $f;
sub f {}
sub g {}
use B;
use 5.010;
if ( ${ B::svref_2object(\*f)->SV } ) {
say "f: Thar be a scalar tharrr!";
}
if ( ${ B::svref_2object(\*g)->SV } ) {
say "g: Thar be a scalar tharrr!";
}
1;

UPDATE:
gbacon is right. *glob{SCALAR} is defined.
Here is the output I get using your code:
Name "main::glob" used only once: possible typo at test_glob_foo_thing.pl line 13.
'FOO1' (SCALAR)
'FOO1' (GLOB)
'FOO2' (SCALAR)
'FOO2' (GLOB)
'_<my.config' (SCALAR)
'_<my.config' (GLOB)
This is despite FOO2 being defined as a hash, but not as a scalar.
ORIGINAL ANSWER:
If I understand you correctly, you simply need to use the defined built-in.
#!/usr/bin/env perl
use strict;
use warnings;
my %before = %main::;
require "/path/to/my.config";
my %after = %main::;
foreach my $key (sort keys %after) {
if (not exists $before{$key}) {
if (defined($after{$key})) {
my $val = $after{$key};
my $what = ref($val);
print "'$key' ($what)\n";
}
}
}

I hate to ask, but instead of messing around with typeglobs, why not switch to a real configuration format? For example, check out Config::Simple and YAML.
I wouldn't recommend messing around with typeglobs and symbol tables in normal cases (some CPAN modules do that, but only at the bottom levels of large systems - e.g. Moose in the lowest levels of Class::MOP). Perl gives you a lot of rope to work with, but that rope is also quite happy to self-noosify and self-tie-around-your-neck if you're not careful :)
See also: How do you manage configuration files in Perl?

no strict 'refs';
my $func_name = 'myfunc';
*{$func_name}{CODE}->();
use strict 'refs';
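A self-contained version of that snippet, assuming a stand-in sub named myfunc. Wrapping no strict 'refs' in a bare block limits the pragma's scope without needing the trailing use strict 'refs':

```perl
use strict;
use warnings;

sub myfunc { return "called with @_" }   # hypothetical target sub

my $func_name = 'myfunc';
my $result;
{
    no strict 'refs';   # symbolic glob lookup; the pragma ends with this block
    my $code = *{"main::$func_name"}{CODE};
    $result = $code->(1, 2) if defined $code;
}
print "$result\n";
```

If all you need is the code reference, main->can($func_name) returns it without disabling strict refs.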

If you don't mind parsing Data::Dump output, you could use it to tease out the differences.
use strict;
use warnings;
use Data::Dump qw{ dump };
my %before = %main::;
require "my.config";
my %after = %main::;
foreach my $key ( sort keys %after ) {
if ( not exists $before{$key} ) {
my $glob = $after{$key};
print "'$key' " . dump( $glob) . "\n";
}
}
Using this code with the following config file:
$FOO1 = 3;
$FOO2 = 'my_scalar';
%FOO2 = ( a=>'b', c=>'d' );
@FOO3 = ( 1 .. 5);
$FOO4 = [ 1 .. 5 ];
I believe that this output provides enough information to be able to figure out which parts of each type glob are defined:
'FOO1' do {
my $a = *main::FOO1;
$a = \3;
$a;
}
'FOO2' do {
my $a = *main::FOO2;
$a = \"my_scalar";
$a = { a => "b", c => "d" };
$a;
}
'FOO3' do {
my $a = *main::FOO3;
$a = [1 .. 5];
$a;
}
'FOO4' do {
my $a = *main::FOO4;
$a = \[1 .. 5];
$a;
}
'_<my.config' do {
my $a = *main::_<my.config;
$a = \"my.config";
$a;
}
