How to call a subroutine with a variable pre-assigned to some value? - perl

In Perl, when one uses the sort function with a custom comparison, the variables $a and $b are already assigned to the current pair of elements to compare, e.g. in:
#decreasing = sort { $b <=> $a } #list;
How can I write other subroutines with a similar functionality? For example, imagine that I want to write sort of process_and_store function that does something special with each item of a list and then stores it in a database; and where the variable $item is already assigned to the current item being processed. I would like to write for example something like:
process_and_store { do_something_with($item); } #list;
Rather than
process_and_store { my $item = shift; do_something_with($item); } #list;
How should I go about doing this?
UPDATE: For completeness, although flesk's answer works without problems, in order to “properly” localize the changes I make to the $item variable I had to follow the advice from Axeman. In SomePackage.pm I placed something like:
package SomePackage;
use strict;
require Exporter;
our #ISA = qw/Exporter/;
our #EXPORT = qw(process_and_store);
our $item;
sub import {
my $import_caller = caller();
{ no strict 'refs';
*{ $import_caller . '::item' } = \*item;
}
# Now, cue Exporter!
goto &{ Exporter->can( 'import' ) };
}
sub process_and_store (&#) {
my $code = shift;
for my $x (#_) {
local *item = \$x;
$code->();
print "stored: $item\n"
}
}
1;
Then I call this from main.pl with something like:
#!/usr/bin/perl -w
use strict;
use SomePackage;
process_and_store { print "seen: $item\n"; } (1, 2, 3);
And get the expected result:
seen: 1
stored: 1
seen: 2
stored: 2
seen: 3
stored: 3

In my "associative array" processing library, I do something similar. The user can export the variables $k and $v (key-value) so that they can do things like this:
each_pair { "$k => $v" } some_source_list()
Here's how I do it:
I declare our ( $k, $v ) in the implementing package.
In import I allow packages to export those symbols and alias them in the
receiving package: *{$import_caller.'::'.$name} = \*{ $name };
In the pair processors, I do the following:
local *::_ = \$_[0];
local *k = \$_[0];
local *v = \$_[1];
#res = $block->( $_[0], $_[1] );
Thus $k and $v are aliases of what's in the queue. If this doesn't have to be the case, then you might be happy enough with something like the following:
local ( $k, $v ) = splice( #_, 0, 2 );
local $_ = $k;
But modifiable copies also allow me to do things like:
each_pair { substr( $k, 0, 1 ) eq '-' and $v++ } %some_hash;
UPDATE:
It seems that you're neglecting step #2. You have to make sure that the symbol in the client package maps to your symbol. It can be as simple as:
our $item;
sub import {
my $import_caller = caller();
{ no strict 'refs';
*{ $import_caller . '::item' } = \*item;
}
# Now, cue Exporter!
goto &{ Exporter->can( 'import' ) };
}
Then when you localize your own symbol, the aliased symbol in the client package is localized as well.
The main way that I can see that it would work without the local, is if you were calling it from the same package. Otherwise, $SomePackage::item and $ClientPackage::item are two distinct things.

I think it's a bit of a hack, but you could do something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $item;
sub process_and_store(&#) {
my $code = shift;
for (#_) {
$item = $_;
&$code();
}
undef $item;
}
The thing is, $item has to be a global scalar for this to work, so process_and_store has to update that scalar while looping over the list. You should also undef $item at the end of the sub routine to limit any potential side-effects. If I were to write something like this, I'd tuck it away in a module and make it possible to define the iterator variable, so as to limit name conflicts.
Test:
my #list = qw(apples pears bananas);
process_and_store { do_something_with($item) } #list;
sub do_something_with {
my $fruit = shift;
print "$fruit\n";
}
Output:
apples
pears
bananas

The $a and $b variables are special in Perl; they're real global variables and hence exempt from use strict, and also used specifically by the sort() function.
Most other similar uses in Perl would use the $_ global for this sort of thing:
process_and_store { do_something_with( $_ ) } #list;
Which is already handled by the normal $_ rules. Don't forget to localise $_:
sub process_and_store(&#)
{
my $code = shift;
foreach my $item (#_) {
local $_ = $item;
$code->();
}
}

Related

Use of reference to elements in #_ to avoid duplicating code

Is it safe to take reference of elements of #_ in a subroutine in order to avoid duplicating code? I also wonder if the following is good practice or can be simplified. I have a subroutine mod_str that takes an option saying if a string argument should be modified in-place or not:
use feature qw(say);
use strict;
use warnings;
my $str = 'abc';
my $mstr = mod_str( $str, in_place => 0 );
say $mstr;
mod_str( $str, in_place => 1 );
say $str;
sub mod_str {
my %opt;
%opt = #_[1..$#_];
if ( $opt{in_place} ) {
$_[0] =~ s/a/A/g;
# .. do more stuff with $_[0]
return;
}
else {
my $str = $_[0];
$str =~ s/a/A/g;
# .. do more stuff with $str
return $str;
}
}
In order to avoid repeating/duplicating code in the if and else blocks above, I tried to improve mod_str:
sub mod_str {
my %opt;
%opt = #_[1..$#_];
my $ref;
my $str;
if ( $opt{in_place} ) {
$ref = \$_[0];
}
else {
$str = $_[0]; # make copy
$ref = \$str;
}
$$ref =~ s/a/A/g;
# .. do more stuff with $$ref
$opt{in_place} ? return : return $$ref;
}
The "in place" flag changes the function's interface to the point where it should be a new function. It will simplify the interface, testing, documentation and the internals to have two functions. Rather than having to parse arguments and have a big if/else block, the user has already made that choice for you.
Another way to look at it is the in_place option will always be set to a constant. Because it fundamentally changes how the function behaves, there's no sensible case where you'd write in_place => $flag.
Once you do that, the reuse becomes more obvious. Write one function to do the operation in place. Write another which calls that on a copy.
sub mod_str_in_place {
# ...Do work on $_[0]...
return;
}
sub mod_str {
my $str = $_[0]; # string is copied
mod_str_in_place($str);
return $str;
}
In the absence of the disgraced given I like using for as a topicalizer. This effectively aliases $_ to either $_[0] or the local copy depending on the value of the in_place hash element. It's directly comparable to your $ref but with aliases, and a lot cleaner
I see no reason to return a useless undef / () in the case that the string is modified in place; the subroutine may as well return the new value of the string. (I suspect the old value might be more useful, after the fashion of $x++, but that makes for uglier code!)
I'm not sure whether this is readable code to anyone but me, so comments are welcome!
use strict;
use warnings;
my $ss = 'abcabc';
printf "%s %s\n", mod_str($ss), $ss;
$ss = 'abcabc';
printf "%s %s\n", mod_str($ss, in_place => 1), $ss;
sub mod_str {
my ($copy, %opt) = #_;
for ( $opt{in_place} ? $_[0] : $copy ) {
s/a/A/g;
# .. do more stuff with $_
return $_;
}
}
output
AbcAbc abcabc
AbcAbc AbcAbc

perl list subs in a package, excluding imported subs from other packages

There's several way to list all subs in a package:
sub list_methods {
my $package = shift;
no strict 'refs';
return grep { defined &{"$package\::$_"} } keys %{"$package\::"}
}
But, if the package 'use' other packages such as 'File::Basename', the subs like 'fileparse' will be listed as well.
I tried to 'require' packages instead of 'use' them, the problem can be resolved. On the other hand, if I 'require' packages, I have to specify the full path of the subs.
Do you have any thoughts?
use B qw( svref_2object );
sub list_nonimported_subs {
my ($pkg_name) = #_;
my $pkg = do { no strict 'refs'; *{ $pkg_name . '::' } };
my #nonimported_subs;
for my $name (keys %$pkg) {
my $glob = $pkg->{$name};
my $code = *$glob{CODE}
or next;
my $cv = svref_2object($code);
my $orig_pkg_name = $cv->GV->STASH->NAME;
next if $orig_pkg_name ne $pkg_name;
push #nonimported_subs, $name;
}
return #nonimported_subs;
}
There's a flag that will tell whether the CV in a glob is imported or not, but I can't find how to get that using B, so I check the __PACKAGE__ of the sub against the package being inspected.
It's impossible to tell whether something is a method or not, so I generalised the name of the sub.
PPI will parse the source, so the module doesn't even need to be loaded:
use PPI;
my $source = $INC{'Some/Module.pm'}; # or whatever
my $Document = PPI::Document->new($source) or die "oops";
for my $sub ( #{ $Document->find('PPI::Statement::Sub') || [] } ) {
unless ( $sub->forward ) {
print $sub->name, "\n";
}
}

"Passing arguments to subroutine"-question?

Is routine2 ok too or shouldn't I do this? (I don't need a copy of #list in the subroutine)
#!/usr/bin/perl
use 5.012;
use warnings;
my #list = 0 .. 9;
sub routine1 {
my $list = shift;
for (#$list) { $_++ };
return $list
}
my $l = routine1( \#list );
say "#$l";
sub routine2 {
for (#list) { $_++ };
}
routine2();
say "#list";
If it works for you, then it's ok too. But the first sub can do the job for any array you pass to it which makes it more general.
P.S. Remember that #_ contains aliases for the parameters passed to the function. So you could also use this:
sub increment { $_++ for #_ }
increment(#list);
If you're worried about making the syntax look nice, try this:
sub routine3 (\#) {
for (#{$_[0]}) { $_++ }
}
my #list = (0 .. 9);
routine3(#list);
say "#list"; # prints 1 .. 10
This declares routine3 with a prototype - it takes an array argument by reference. So $_[0] is a reference to #list, no rather unsightly \ needed by the caller. (Some people discourage prototypes, so take this as you will. I like them.)
But unless this is a simplification for what your actual routine does, what I'd do is this:
my #list = 0 .. 9;
my #new_list = map { $_ + 1 } #list;
say "#new_list";
Unless routine is actually really complicated, and it's vital somehow that you modify the original array, I'd just use map. Especially with map, you can plug in a subroutine:
sub complex_operation { ... }
my #new_list = map { complex_operation($_) } #list;
Of course, you could prototype complex_operation with (_) and then just write map(complex_operation, #list); but I like the bracket-syntax personally.

Perl: if ( element in list )

I'm looking for presence of an element in a list.
In Python there is an in keyword and I would do something like:
if element in list:
doTask
Is there something equivalent in Perl without having to manually iterate through the entire list?
UPDATE:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
If you can get away with requiring Perl v5.10, then you can use any of the following examples.
The smart match ~~ operator.
if( $element ~~ #list ){ ... }
if( $element ~~ [ 1, 2, 3 ] ){ ... }
You could also use the given/when construct. Which uses the smart match functionality internally.
given( $element ){
when( #list ){ ... }
}
You can also use a for loop as a "topicalizer" ( meaning it sets $_ ).
for( #elements ){
when( #list ){ ... }
}
One thing that will come out in Perl 5.12 is the ability to use the post-fix version of when. Which makes it even more like if and unless.
given( $element ){
... when #list;
}
If you have to be able to run on older versions of Perl, there still are several options.
You might think you can get away with using List::Util::first, but there are some edge conditions that make it problematic.
In this example it is fairly obvious that we want to successfully match against 0. Unfortunately this code will print failure every time.
use List::Util qw'first';
my $element = 0;
if( first { $element eq $_ } 0..9 ){
print "success\n";
} else {
print "failure\n";
}
You could check the return value of first for defined-ness, but that will fail if we actually want a match against undef to succeed.
You can safely use grep however.
if( grep { $element eq $_ } 0..9 ){ ... }
This is safe because grep gets called in a scalar context. Arrays return the number of elements when called in scalar context. So this will continue to work even if we try to match against undef.
You could use an enclosing for loop. Just make sure you call last, to exit out of the loop on a successful match. Otherwise you might end up running your code more than once.
for( #array ){
if( $element eq $_ ){
...
last;
}
}
You could put the for loop inside the condition of the if statement ...
if(
do{
my $match = 0;
for( #list ){
if( $element eq $_ ){
$match = 1;
last;
}
}
$match; # the return value of the do block
}
){
...
}
... but it might be more clear to put the for loop before the if statement.
my $match = 0;
for( #list ){
if( $_ eq $element ){
$match = 1;
last;
}
}
if( $match ){ ... }
If you're only matching against strings, you could also use a hash. This can speed up your program if #list is large and, you are going to match against %hash several times. Especially if #array doesn't change, because then you only have to load up %hash once.
my %hash = map { $_, 1 } #array;
if( $hash{ $element } ){ ... }
You could also make your own subroutine. This is one of the cases where it is useful to use prototypes.
sub in(&#){
local $_;
my $code = shift;
for( #_ ){ # sets $_
if( $code->() ){
return 1;
}
}
return 0;
}
if( in { $element eq $_ } #list ){ ... }
if( $element ~~ #list ){
do_task
}
~~ is the "smart match operator", and does more than just list membership detection.
grep is helpful here
if (grep { $_ eq $element } #list) {
....
}
If you plan to do this many times, you can trade-off space for lookup time:
#!/usr/bin/perl
use strict; use warnings;
my #array = qw( one ten twenty one );
my %lookup = map { $_ => undef } #array;
for my $element ( qw( one two three ) ) {
if ( exists $lookup{ $element }) {
print "$element\n";
}
}
assuming that the number of times the element appears in #array is not important and the contents of #array are simple scalars.
List::Util::first
$foo = first { ($_ && $_ eq "value" } #list; # first defined value in #list
Or for hand-rolling types:
my $is_in_list = 0;
foreach my $elem (#list) {
if ($elem && $elem eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
A slightly different version MIGHT be somewhat faster on very long lists:
my $is_in_list = 0;
for (my $i = 0; i < scalar(#list); ++$i) {
if ($list[i] && $list[i] eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
TIMTOWTDI
sub is (&#) {
my $test = shift;
$test->() and return 1 for #_;
0
}
sub in (#) {#_}
if( is {$_ eq "a"} in qw(d c b a) ) {
print "Welcome in perl!\n";
}
List::MoreUtils
On perl >= 5.10 the smart match operator is surely the easiest way, as many others have already said.
On older versions of perl, I would instead suggest List::MoreUtils::any.
List::MoreUtils is not a core module (some say it should be) but it's very popular and it's included in major perl distributions.
It has the following advantages:
it returns true/false (as Python's in does) and not the value of the element, as List::Util::first does (which makes it hard to test, as noted above);
unlike grep, it stops at the first element which passes the test (perl's smart match operator short circuits as well);
it works with any perl version (well, >= 5.00503 at least).
Here is an example which works with any searched (scalar) value, including undef:
use List::MoreUtils qw(any);
my $value = 'test'; # or any other scalar
my #array = (1, 2, undef, 'test', 5, 6);
no warnings 'uninitialized';
if ( any { $_ eq $value } #array ) {
print "$value present\n"
}
P.S.
(In production code it's better to narrow the scope of no warnings 'uninitialized').
Probably Perl6::Junction is the clearest way to do. No XS dependencies, no mess and no new perl version required.
use Perl6::Junction qw/ any /;
if (any(#grant) eq 'su') {
...
}
This blog post discusses the best answers to this question.
As a short summary, if you can install CPAN modules then the best solutions are:
if any(#ingredients) eq 'flour';
or
if #ingredients->contains('flour');
However, a more usual idiom is:
if #any { $_ eq 'flour' } #ingredients
which i find less clear.
But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list. While any() will stop as soon as it finds your value.
Check out the blog post for more details.
PS: i'm answering for people who will have the same question in the future.
You can accomplish a similar enough syntax in Perl if you do some Autoload hacking.
Create a small package to handle the autoload:
package Autoloader;
use strict;
use warnings;
our $AUTOLOAD;
sub AUTOLOAD {
my $self = shift;
my ($method) = (split(/::/, $AUTOLOAD))[-1];
die "Object does not contain method '$method'" if not ref $self->{$method} eq 'CODE';
goto &{$self->{$method}};
}
1;
Then your other package or main script will contain a subroutine that returns the blessed object which gets handled by Autoload when its method attempts to be called.
sub element {
my $elem = shift;
my $sub = {
in => sub {
return if not $_[0];
# you could also implement this as any of the other suggested grep/first/any solutions already posted.
my %hash; #hash{#_} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}
This leaves you with usage looking like:
doTask if element('something')->in(#array);
If you reorganize the closure and its arguments, you can switch the syntax around the other way to make it look like this, which is a bit closer to the autobox style:
doTask if search(#array)->contains('something');
function to do that:
sub search {
my #arr = #_;
my $sub = {
contains => sub {
my $elem = shift or return;
my %hash; #hash{#arr} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}

How can I list all variables that are in a given scope?

I know I can list all of the package and lexcial variables in a given scope using Padwalker's peek_our and peek_my, but how can I get the names and values of all of the global variables like $" and $/?
#!/usr/bin/perl
use strict;
use warnings;
use PadWalker qw/peek_our peek_my/;
use Data::Dumper;
our $foo = 1;
our $bar = 2;
{
my $foo = 3;
print Dumper in_scope_variables();
}
print Dumper in_scope_variables();
sub in_scope_variables {
my %in_scope = %{peek_our(1)};
my $lexical = peek_my(1);
#lexicals hide package variables
while (my ($var, $ref) = each %$lexical) {
$in_scope{$var} = $ref;
}
##############################################
#FIXME: need to add globals to %in_scope here#
##############################################
return \%in_scope;
}
You can access the symbol table, check out p. 293 of "Programming Perl"
Also look at "Mastering Perl: http://www252.pair.com/comdog/mastering_perl/
Specifically: http://www252.pair.com/comdog/mastering_perl/Chapters/08.symbol_tables.html
Those variables you are looking for will be under the main namespace
A quick Google search gave me:
{
no strict 'refs';
foreach my $entry ( keys %main:: )
{
print "$entry\n";
}
}
You can also do
*sym = $main::{"/"}
and likewise for other values
If you want to find the type of the symbol you can do (from mastering perl):
foreach my $entry ( keys %main:: )
{
print "-" x 30, "Name: $entry\n";
print "\tscalar is defined\n" if defined ${$entry};
print "\tarray is defined\n" if defined #{$entry};
print "\thash is defined\n" if defined %{$entry};
print "\tsub is defined\n" if defined &{$entry};
}
And that does it. Thanks to MGoDave and kbosak for providing the answer in front of my face that I was too stupid to see (I looked in %main:: to start with, but missed that they didn't have their sigils). Here is the complete code:
#!/usr/bin/perl
use strict;
use warnings;
use PadWalker qw/peek_our peek_my/;
use Data::Dumper;
our $foo = 1;
our $bar = 2;
{
my $foo = 3;
print Dumper in_scope_variables();
}
print Dumper in_scope_variables();
sub in_scope_variables {
my %in_scope = %{peek_our(1)};
my $lexical = peek_my(1);
for my $name (keys %main::) {
my $glob = $main::{$name};
if (defined ${$glob}) {
$in_scope{'$' . $name} = ${$glob};
}
if (defined #{$glob}) {
$in_scope{'#' . $name} = [#{$glob}];
}
if (defined %{$glob}) {
$in_scope{'%' . $name} = {%{$glob}};
}
}
#lexicals hide package variables
while (my ($var, $ref) = each %$lexical) {
$in_scope{$var} = $ref;
}
return \%in_scope;
}
You can do something like the following to check the symbol table of the main package:
{
no strict 'refs';
for my $var (keys %{'main::'}) {
print "$var\n";
}
}
Thanks, Chas, very useful code.
As a note for future users of your code with perl > 5.12:
I was using it in in my pdl2 .perldlrc to find out lexical variables (like the 'y' command in the debugger) and I had this warning:
load_rcfile: loading
/homes/pmg/.perldlrc defined(%hash) is deprecated at (eval 254) line 36.
(Maybe you should just omit the defined()?)
From perldoc -f defined
Use of defined on aggregates (hashes
and arrays) is deprecated. It used to
report whether memory for that
aggregate had ever been allocated.
This behavior may disappear in future
versions of Perl. You should instead
use a simple test for size:
> if (#an_array) { print "has array elements\n" }
> if (%a_hash) { print "has hash members\n" }
What I don't understand is why it only complained with the defined hash and not also with the array?