Perl sort won't use function from another package - perl

I have a function for case insensitive sorting. It works if it's from the same package, but not otherwise.
This works:
my #arr = sort {lc $a cmp lc $b} #list;
This works (if a function called "isort" is defined in the same file):
my #arr = sort isort #list;
This does not (function exported with Exporter from another package):
my #arr = sort isort #list;
This does not (function referred to explicitly by package name):
my #arr = sort Utils::isort #list;
What is going on? How do I put a sorting function in another package?

What evidence do you have for it not working? Have you put a print() statement in the subroutine to see if it's being called?
I suspect you're being tripped up by this (from perldoc -f sort):
$a and $b are set as package globals in the package the sort() is called from. That means $main::a and $main::b (or $::a and $::b ) in the main package, $FooPack::a and $FooPack::b in the FooPack package, etc.
Oh, and later on it's more specific:
Sort subroutines written using $a and $b are bound to their calling package. It is possible, but of limited interest, to define them in a different package, since the subroutine must still refer to the calling package's $a and $b:
package Foo;
sub lexi { $Bar::a cmp $Bar::b }
package Bar;
... sort Foo::lexi ...
Use the prototyped versions (see above) for a more generic alternative.
The "prototyped versions" are described above like this:
If the subroutine's prototype is ($$) , the elements to be compared are passed by reference in #_, as for a normal subroutine. This is slower than unprototyped subroutines, where the elements to be compared are passed into the subroutine as the package global variables $a and $b (see example below).
So you could try rewriting your subroutine like this:
package Utils;
sub isort ($$) {
my ($a, $b) = #_;
# existing code...
}
And then calling it using one of your last two alternatives.

Related

Can I make a variable optional in a perl sub prototype?

I'd like to understand if it's possible to have a sub prototype and optional parameters in it. With prototypes I can do this:
sub some_sub (\#\#\#) {
...
}
my #foo = qw/a b c/;
my #bar = qw/1 2 3/;
my #baz = qw/X Y Z/;
some_sub(#foo, #bar, #baz);
which is nice and readable, but the minute I try to do
some_sub(#foo, #bar);
or even
some_sub(#foo, #bar, ());
I get errors:
Not enough arguments for main::some_sub at tablify.pl line 72, near "#bar)"
or
Type of arg 3 to main::some_sub must be array (not stub) at tablify.pl line 72, near "))"
Is it possible to have a prototype and a variable number of arguments? or is something similar achievable via signatures?
I know it could be done by always passing arrayrefs I was wondering if there was another way. After all, TMTOWTDI.
All arguments after a semi-colon are optional:
sub some_sub(\#\#;\#) {
}
Most people are going to expect your argument list to flatten, and you are reaching for an outdated tool to do what people don't expect.
Instead, pass data structures by reference:
some_sub( \#array1, \#array2 );
sub some_sub {
my #args = #_;
say "Array 1 has " . $args[0]->#* . " elements";
}
If you want to use those as named arrays within the sub, you can use ref aliasing
use v5.22;
use experimental qw(ref_aliasing);
sub some_sub {
\my( #array1 ) = $_[0];
...
}
With v5.26, you can move the reference operator inside the parens:
use v5.26;
use experimental qw(declared_refs);
sub some_sub {
my( \#array1 ) = $_[0];
...
}
And, remember that v5.20 introduced the :prototype attribute so you can distinguish between prototypes and signatures:
use v5.20;
sub some_sub :prototype(##;#) { ... }
I write about these things at The Effective Perler (which you already read, I see), in Perl New Features, a little bit in Preparing for Perl 7 (which is mostly about what you need to stop doing in Perl 5 to be future proof).

Query reg code in List::Util::reduce

I came across the following code in List::Util for reduce subroutine.
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
I could understand that reduce function is called as:
my $sum = reduce { $a + $b } 1 .. 1000;
So, I understood the code is trying to reference $a mentioned in the subroutine. But, I am unable to understand the intent correctly.
For reference, I am adding the complete code for subroutine
sub reduce (&#) {
my $code = shift;
require Scalar::Util;
my $type = Scalar::Util::reftype($code);
unless($type and $type eq 'CODE') {
require Carp;
Carp::croak("Not a subroutine reference");
}
no strict 'refs';
return shift unless #_ > 1;
use vars qw($a $b);
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
$a = shift;
foreach (#_) {
$b = $_;
$a = &{$code}();
}
$a;
}
The following aliases package variable $foo to variable $bar.
*foo = \$bar;
Any change to one changes the other as both names refer to the same scalar.
$ perl -E'
*foo = \$bar;
$bar=123; say $foo;
$foo=456; say $bar;
say \$foo == \$bar ? 1 : 0;
'
123
456
1
Of course, you can fully qualify *foo since it's a symbol table entry. The following aliases package variable $main::foo to $bar.
*main::foo = \$bar;
Or, if you don't know the name at compile time
my $caller = 'main';
*{$caller."::foo"} = \$bar; # Symbolic reference
$bar, of course, can just as easily be a lexical variable as a package variable. And since my $bar; actually returns the variable begin declared,
my $bar;
*foo = \$bar;
can be written as
*foo = \my $bar;
So,
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
declares and aliases lexical variables $a and $b the similarly named package variables in the caller's namespace.
local simply causes everything to return to their original state once the sub is exited.
On scope
Perl has two variable name scoping mechnisms: global and lexical. Declaration of lexical vars is done with my, and they are accessibly by this name until they encounter a closing curly brace.
Global variables, on the other hand, are accessible from anywhere and do not have a scope. They can be declared with our and use vars, or do not have to be declared if strict is not in effect. However, they have namespaces, or packages. The namespace is a prefix seperated from the variable name by two colons (or a single quote, but never do that). Inside the package of the variable, the variable can be accessed with or without the prefix. Outside of the package, the prefix is required.
The local function is somewhat special and gives global variables a temporary value. The scope of this value is the same as that of a lexical variable plus the scopes of all subs called within this scope. The old value is restored once this scope is exited. This is called the dynamic scope.
On Globs
Perl organizes global variables in a big hash representing the namespace and all variable names (sometimes called the stash). In each slot of this hash, there is a so-called glob. A typeglob is a special hash that has a field for each of Perls native types, e.g. scalar, array, hash, IO, format, code etc. You assign to a slot by passing the glob a reference of a value you want to add - the glob figures out the right slot on it's own. This is also the reason you can have multiple variables with the same name (like $thing, #thing, %thing, thing()). Typeglobs have a special sigil, namely the asterisk *.
On no strict 'refs'
The no strict 'refs' is a cool thing if you know what you are doing. Normally you can only dereference normal references, e.g.
my #array = (1 .. 5);
my $arrayref = \#array; # is a reference
push #{$arrayref}, 6; # works
push #{array}, 6; # works; barewords are considered o.k.
push #{"array"}, 6; # dies horribly, if strict refs enabled.
The last line tried to dereference a string, this is considered bad practice. However, under no strict 'refs', we can access a variable of which we do not know the name at compile time, as we do here.
Conclusion
The caller functions returns the name of the package of the calling code, i.e. it looks up one call stack frame. The name is used here to construct the full names of $a and $b variables of the calling packages, so that they can be used there without a prefix. Then, these names are locally (i.e. in the dynamic scope) assigned to the reference of a newly declared, lexical variable.
The global variables $a and $b are predeclared in each package.
In the foreach loop, these lexicals are assigned different values (lexical vars take precedence over global vars), but the global variables $foo::a and $foo::$b point to the same data because of the reference, allowing the anonymous callback sub in the reduce call to read the two arguments easily. (See ikegamis answer for details on this.)
All of this hassle is good because (a) the effects are not externaly visible, and (b) the callback doesn't have to do tedious argument unpacking.

Using a sorting subroutine from another package

I have a script and a package like so:
# file: sortscript.pl
use strict;
use warnings;
use SortPackage;
my #arrays = ([1,"array1"],[10,"array3"],[4,"array2"]);
print "Using sort outside package\n";
foreach (sort SortPackage::simplesort #arrays){
print $_->[1],"\n";
}
print "\nUsing sort in same package\n";
SortPackage::sort_from_same_package(#arrays);
--
# file: SortPackage.pm
use strict;
use warnings;
package SortPackage;
sub simplesort{
return ($a->[0] <=> $b->[0]);
}
sub sort_from_same_package{
my #arrs = #_;
foreach (sort simplesort #arrs){
print $_->[1],"\n";
}
}
1;
Running the script produces the output:
$ perl sortscript.pl
Using sort outside package
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
Use of uninitialized value in numeric comparison (<=>) at SortPackage.pm line 15.
array1
array3
array2
Using sort in same package
array1
array2
array3
Why am I not able to correctly use the subroutine to sort with when it is in another package?
As has been mentioned, $a and $b are package globals, so another solution is to temporarily alias the globals at the call site to the ones in package SortPackage:
{
local (*a, *b) = (*SortPackage::a, *SortPackage::b);
foreach (sort SortPackage::simplesort #arrays){
print $_->[1],"\n";
}
}
But this is pretty ugly, of course. I would just have SortPackage export a complete sorting routine, not just a comparator:
package SortPackage;
use strict;
sub _sort_by_first_element_comparator {
return $a->[0] <=> $b->[0];
}
sub sort_by_first_element {
return sort _sort_by_first_element_comparator #_;
}
$a and $b are special "package global" variables.
To use the main scope's $a and $b your comparator function would have to refer to $::a or $main::a (and likewise for $b).
However that comparator function then wouldn't work when called from any other package, or even from within its own package.
See the perlvars help and the perldoc for the sort function. The solution is also in the latter help text:
If the subroutine’s prototype is "($$)", the elements to be
compared are passed by reference in #_, as for a normal
subroutine. This is slower than unprototyped subroutines,
where the elements to be compared are passed into the
subroutine as the package global variables $a and $b (see
example below). Note that in the latter case, it is usually
counter‐productive to declare $a and $b as lexicals.
The special variables $a and $b are package globals. Your subroutine expects $SortPackage::a and $SortPackage::b. When you call it from sortscript.pl, the variables $main::a and $main::b are being set by sort.
The solution is to use a prototyped subroutine:
package SortPackage;
sub simplesort ($$) {
return ($_[0]->[0] <=> $_[1]->[0]);
}
It's a little slower (since you have actual parameters being passed, instead of reading pre-set globals), but it allows you to use subroutines from other packages by name, as you are attempting.

Where is $_ being modified in this perl code?

The following perl code generates a warning in PerlCritic (by Activestate):
sub natural_sort {
my #sorted;
#sorted = grep {s/(^|\D)0+(\d)/$1$2/g,1} sort grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
}
The warning generated is:
Don't modify $_ in list functions
More info about that warning here
I don't understand the warning because I don't think I'm modifying $_, although I suppose I must be.
Can someone explain it to me please?
Both of your greps are modifying $_ because you're using s//. For example, this:
grep {s/(^|\D)0+(\d)/$1$2/g,1}
is the same as this:
grep { $_ =~ s/(^|\D)0+(\d)/$1$2/g; 1 }
I think you'd be better off using map as you are not filtering anything with your greps, you're just using grep as an iterator:
sub natural_sort {
my $t;
return map { ($t = $_) =~ s/(^|\D)0+(\d)/$1$2/g; $t }
sort
map { ($t = $_) =~ s/(\d+)/sprintf"%06.6d",$1/ge; $t }
#_;
}
That should do the same thing and keep critic quiet. You might want to have a look at List::MoreUtils if you want some nicer list operators than plain map.
You are doing a substitution ( i.e. s/// ) in the grep, which modifies $_ i.e. the list being grepped.
This and other cases are explained in perldoc perlvar:
Here are the places where Perl will
assume $_ even if you don't use it:
The following functions:
abs, alarm, chomp, chop, chr, chroot,
cos, defined, eval, exp, glob, hex,
int, lc, lcfirst, length, log, lstat,
mkdir, oct, ord, pos, print,
quotemeta, readlink, readpipe, ref,
require, reverse (in scalar context
only), rmdir, sin, split (on its
second argument), sqrt, stat, study,
uc, ucfirst, unlink, unpack.
All file tests (-f , -d ) except for -t , which defaults to STDIN.
See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when
used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is
supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put an input record when a operation's result
is tested by itself as the sole
criterion of a while test. Outside a
while test, this will not happen.
Many people have correctly answered that the s operator is modifying $_, however in the soon to be released Perl 5.14.0 there will be the new r flag for the s operator (i.e. s///r) which rather than modify in-place will return the modified elements. Read more at The Effective Perler . You can use perlbrew to install this new version.
Edit: Perl 5.14 is now available! Announcement Announcement Delta
Here is the function suggested by mu (using map) but using this functionality:
use 5.14.0;
sub natural_sort {
return map { s/(^|\D)0+(\d)/$1$2/gr }
sort
map { s/(\d+)/sprintf"%06.6d",$1/gre }
#_;
}
The VERY important part that other answers have missed is that the line
grep {s/(\d+)/sprintf"%06.6d",$1/ge,1} #_;
Is actually modifying the arguments passed into the function, and not copies of them.
grep is a filtering command, and the value in $_ inside the code block is an alias to one of the values in #_. #_ in turn contains aliases to the arguments passed to the function, so when the s/// operator performs its substitution, the change is being made to the original argument. This is shown in the following example:
sub test {grep {s/a/b/g; 1} #_}
my #array = qw(cat bat sat);
my #new = test #array;
say "#new"; # prints "cbt bbt sbt" as it should
say "#array"; # prints "cbt bbt sbt" as well, which is probably an error
The behavior you are looking for (apply a function that modifies $_ to a copy of a list) has been encapsulated as the apply function in a number of modules. My module List::Gen contains such an implementation. apply is also fairly simple to write yourself:
sub apply (&#) {
my ($sub, #ret) = #_;
$sub->() for #ret;
wantarray ? #ret : pop #ret
}
With that, your code could be rewritten as:
sub natural_sort {
apply {s/(^|\D)0+(\d)/$1$2/g} sort apply {s/(\d+)/sprintf"%06.6d",$1/ge} #_
}
If your goal with the repeated substitutions is to perform a sort of the original data with a transient modification applied, you should look into a Perl idiom known as the Schwartzian transform which is a more efficient way of achieving that goal.

How would I do the equivalent of Prototype's Enumerator.detect in Perl with the least amount of code?

Lately I've been thinking a lot about functional programming. Perl offers quite a few tools to go that way, however there's something I haven't been able to find yet.
Prototype has the function detect for enumerators, the descriptions is simply this:
Enumerator.detect(iterator[, context]) -> firstElement | undefined
Finds the first element for which the iterator returns true.
Enumerator in this case is any list while iterator is a reference to a function, which is applied in turn on each element of the list.
I am looking for something like this to apply in situations where performance is important, i.e. when stopping upon encountering a match saves time by disregarding the rest of the list.
I am also looking for a solution that would not involve loading any extra module, so if possible it should be done with builtins only. And if possible, it should be as concise as this for example:
my #result = map function #array;
You say you don't want a module, but this is exactly what the first function in List::Util does. That's a core module, so it should be available everywhere.
use List::Util qw(first);
my $first = first { some condition } #array;
If you insist on not using a module, you could copy the implementation out of List::Util. If somebody knew a faster way to do it, it would be in there. (Note that List::Util includes an XS implementation, so that's probably faster than any pure-Perl approach. It also has a pure-Perl version of first, in List::Util::PP.)
Note that the value being tested is passed to the subroutine in $_ and not as a parameter. This is a convenience when you're using the first { some condition} #values form, but is something you have to remember if you're using a regular subroutine. Some more examples:
use 5.010; # I want to use 'say'; nothing else here is 5.10 specific
use List::Util qw(first);
say first { $_ > 3 } 1 .. 10; # prints 4
sub wanted { $_ > 4 }; # note we're using $_ not $_[0]
say first \&wanted, 1 .. 10; # prints 5
my $want = \&wanted; # Get a subroutine reference
say first \&$want, 1 .. 10; # This is how you pass a reference in a scalar
# someFunc expects a parameter instead of looking at $_
say first { someFunc($_) } 1 .. 10;
Untested since I don't have Perl on this machine, but:
sub first(\&#) {
my $pred = shift;
die "First argument to "first" must be a sub" unless ref $pred eq 'CODE';
for my $val (#_) {
return $val if $pred->($val);
}
return undef;
}
Then use it as:
my $first = first { sub performing test } #list;
Note that this doesn't distinguish between no matches in the list and one of the elements in the list being an undefined value and having that match.
Just since its not here, a Perl function definition of first that localizes $_ for its block:
sub first (&#) {
my $code = shift;
for (#_) {return $_ if $code->()}
undef
}
my #array = 1 .. 10;
say first {$_ > 5} #array; # prints 6
While it will work fine, I don't advocate using this version, since List::Util is a core module (installed by default), and its implementation of first will usually use the XS version (written in C) which is much faster.