Can I pass arguments to the compare subroutine of sort in Perl? - perl

I'm using sort with a customized comparison subroutine I've written:
sub special_compare {
# calc something using $a and $b
# return value
}
my #sorted = sort special_compare #list;
I know it's best use $a and $b which are automatically set, but sometimes I'd like my special_compare to get more arguments, i.e.:
sub special_compare {
my ($a, $b, #more) = #_; # or maybe 'my #more = #_;' ?
# calc something using $a, $b and #more
# return value
}
How can I do that?

Use the sort BLOCK LIST syntax, see perldoc -f sort.
If you have written the above special_compare sub, you can do, for instance:
my #sorted = sort { special_compare($a, $b, #more) } #list;

You can use closure in place of the sort subroutine:
my #more;
my $sub = sub {
# calc something using $a, $b and #more
};
my #sorted = sort $sub #list;
If you want to pass the elements to be compared in #_, set subroutine's prototype to ($$). Note: this is slower than unprototyped subroutine.

Related

Perl unexpected result

Imagine I have this Perl script
my $name = " foo ";
my $sn = " foosu";
trim($name, \$sn);
print "name: [$name]\n";
print "sn: [$sn]\n";
exit 0;
sub trim{
my $fref_trim = sub{
my ($ref_input) = #_;
${$ref_input} =~ s/^\s+// ;
${$ref_input} =~ s/\s+$// ;
};
foreach my $input (#_){
if (ref($input) eq "SCALAR"){
$fref_trim->($input);
} else {
$fref_trim->(\$input);
}
}
}
Result:
name: [foo]
sn: [foosu]
I would expect $name to be "[ foo ]" when printing the value after calling trim, but the sub is setting $name as I would want it. Why is this working, when it really shouldn't?
I'm not passing $name by reference and the trim sub is not returning anything. I'd expect the trim sub to create a copy of the $name value, process the copy, but then the original $name would still have the leading and trailing white spaces when printed in the main code.
I assume it is because of the alias with #_, but shouldn't the foreach my $input (#_) force the sub to copy the value and only treat the value not the alias?
I know I can simplify this sub and I used it only as an example.
Elements of #_ are aliases to the original variables. What you are observing is the difference between:
sub ltrim {
$_[0] =~ s/^\s+//;
return $_[0];
}
and
sub ltrim {
my ($s) = #_;
$s =~ s/^\s+//;
return $s;
}
Compare your code to:
#!/usr/bin/env perl
my $name = " foo ";
my $sn = " foosu";
trim($name, \$sn);
print "name: [$name]\n";
print "sn: [$sn]\n";
sub trim {
my #args = #_;
my $fref_trim = sub{
my ($ref_input) = #_;
${$ref_input} =~ s/^\s+//;
${$ref_input} =~ s/\s+\z//;
};
for my $input (#args) {
if (ref($input) eq "SCALAR") {
$fref_trim->($input);
}
else {
$fref_trim->(\$input);
}
}
}
Output:
$ ./zz.pl
name: [ foo ]
sn: [foosu]
Note also that the loop variable in for my $input ( #array ) does not create a new copy for each element of the array. See perldoc perlsyn:
The foreach loop iterates over a normal list value and sets the scalar variable VAR to be each element of the list in turn. ...
...
the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
In your case, this would mean that, at each iteration $input is an alias to the corresponding element of #_ which itself is an alias to the variable that was passed in as an argument to the subroutine.
Making a copy of #_ thus prevents the variables in the calling context from being modified. Of course, you could do something like:
sub trim {
my $fref_trim = sub{
my ($ref_input) = #_;
${$ref_input} =~ s/^\s+//;
${$ref_input} =~ s/\s+\z//;
};
for my $input (#_) {
my $input_copy = $input;
if (ref($input_copy) eq "SCALAR") {
$fref_trim->($input_copy);
}
else {
$fref_trim->(\$input_copy);
}
}
}
but I find making a wholesale copy of #_ once to be clearer and more efficient assuming you do not want to be selective.
I assume it is because of the alias with #_, but shouldn't the foreach my $input (#_) force the sub to copy the value and only treat the value not the alias?
You're right that #_ contains aliases. The part that's missing is that foreach also aliases the loop variable to the current list element. Quoting perldoc perlsyn:
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
So ultimately $input is an alias for $_[0], which is an alias for $name, which is why you see the changes appearing in $name.

what is the usage of \& and $expr->()

sub reduce(&#) {
my $expr = \&{shift #ARG};
my $result = shift #ARG;
while (scalar #ARG > 0) {
our $a = $result;
our $b = shift #ARG;
$result = $expr->();
}
return $result;
}
I cannot really understand some grammar in this code. Anyone can explain to me? like \& and $result = $expr->()
\&name returns a reference to the subroutine named name.
$code_ref->() calls the subroutine referenced by $code_ref.
$ perl -e'
sub f { CORE::say "Hi" }
my $code_ref = \&f;
$code_ref->();
'
Hi
In your case, shift #ARG returns a subroutine reference. \&{ $code_ref } simply returns the code ref. As such,
my $expr = \&{shift #ARG};
could have been written as
my $expr = shift #ARG;
Note that reduce's prototype allows it to be called as
reduce { ... } ...
but what is actually executed is
reduce( sub { ... }, ... )
Note that this version of reduce is buggy. You should use the one provided by List::Util.
local $a and local $b should have been used to avoid clobbering the values its caller might have in $a and $b.
This version of reduce expects its callback to have been compiled in the same package as reduce itself. Otherwise, the callback sub won't be able to simply use $a and $b.
Declaring the variables using our is actually completely useless in this version case since $a and $b are exempt from use strict; checks, and the undeclared use of $a and $b would access the very same package variables.
Having a look some List::Util::reduce() examples will probably help.
Let's take the first one:
$foo = reduce { $a > $b ? $a : $b } 1..10;
So reduce takes a BLOCK followed by a LIST, which the function signature declares: sub reduce(&#) {. The block in our case is the statement $a > $b ? $a : $b, while the list is 1..10. From the docs:
Reduces #list by calling "BLOCK" in a scalar context multiple times,
setting $a and $b each time. The first call will be with $a and $b set to
the first two elements of the list, subsequent calls will be done by
setting $a to the result of the previous call and $b to the next element
in the list.
Returns the result of the last call to the "BLOCK". If #list is empty then
"undef" is returned. If #list only contains one element then that element
is returned and "BLOCK" is not executed.
And now to an annotated version of the code:
$foo = reduce { $a > $b ? $a : $b } 1..10; # $foo will be set to 10
sub reduce(&#) {
# reduce() takes a BLOCK followed by a LIST
my $expr = \&{shift #ARG};
# $expr is now a subroutine reference, i.e.
# $expr = sub { $a > $b ? $a : $b };
# Start by setting $result to the first item in the list, 1
my $result = shift #ARG;
# While there are more items in the list...
while (scalar #ARG > 0) {
# Set $a to the current result
our $a = $result;
# Set $b to the next item in the list
our $b = shift #ARG;
# Set $result to the result of $a > $b ? $a : $b
$result = $expr->();
}
# List has now been reduced by the operation $a > $b ? $a : $b
return $result;
}

Perl: Change in Subroutine not printing outside of routine

So I want to change numbers that I pass into a subroutine, and then retain those numbers being changed, but it doesn't seem to work.
my $A = 0;
my $T = 0;
my $C = 0;
my $G = 0;
foreach my $bases in (keys %basereads){
count ($bases, $A, $T, $C, $G);
}
Here is my subroutine
sub count {
my $bases = shift;
my $A = shift;
my $T = shift;
my $C = shift;
my $G = shift;
for (my $i = 0; $i < length($bases); $i++){
print "$bases\t";
if (uc(substr($bases,$i,1)) eq 'A'){
$A++;
}elsif (uc(substr($bases,$i,1)) eq 'T'){
$T++;
} elsif (uc(substr($bases,$i,1)) eq 'G'){
$G++;
} elsif (uc(substr($bases,$i,1)) eq 'C'){
$C++;
} else { next; }
}
print "$A\t$C\t$T\t$G\n";
return my($bases, $A, $T, $C, $G);
}
after the subroutine, I want to stored the altered A, C, T, G into a hashmap. When I print bases and ATCG inside the subroutine, it prints, so I know the computer is running through the subroutine, but it's not saving it, and when I try to manipulate it outside the subroutine (after I've called it), it starts from zero (what I had defined the four bases as before). I'm new to Perl, so I'm a little weary of subroutines.
Could someone help?
Always include use strict; and use warnings; at the top of EVERY script.
With warnings enabled, you should've gotten the following messages:
"my" variable $bases masks earlier declaration in same scope at script.pl line ...
"my" variable $A masks earlier declaration in same scope at script.pl line ...
"my" variable $T masks earlier declaration in same scope at script.pl line ...
"my" variable $C masks earlier declaration in same scope at script.pl line ...
"my" variable $G masks earlier declaration in same scope at script.pl line ...
These are caused by the my before your return statement:
return my($bases, $A, $T, $C, $G);
Correct this by simply removing the my:
return ($bases, $A, $T, $C, $G);
And then you just need to capture your returned values
($bases, $A, $T, $C, $G) = count($bases, $A, $T, $C, $G);
Given that you're new to perl, I'm sure you won't be surprised that your code could be cleaned up further though. If one uses a hash, it makes it a lot easier to count various characters in a string, as demonstrated below:
use strict;
use warnings;
my $A = 0;
my $T = 0;
my $C = 0;
my $G = 0;
foreach my $bases (keys %basereads) {
my %counts;
for my $char (split //, $bases) {
$counts{$char}++;
}
$A += $counts{A};
$T += $counts{T};
$C += $counts{C};
$G += $counts{G};
}

Perl code is not throwing any error

#!/usr/bin/perl
use strict ;
$b = 5;
my $var=10;
print $var;
Above code is not throwing error. I have not used my for variable $b.
$b is a predeclared variable in Perl. It is used with sort() like so:
my #sorted = sort { $a <=> $b } #list;
It is documented in perldoc perlvar.

Is there a Perl idiom which is the functional equivalent of calling a subroutine from within the substitution operator?

Perl allows ...
$a = "fee";
$result = 1 + f($a) ; # invokes f with the argument $a
but disallows, or rather doesn't do what I want ...
s/((fee)|(fie)|(foe)|(foo))/f($1)/ ; # does not invoke f with the argument $1
The desired-end-result is a way to effect a substitution geared off what the regex matched.
Do I have to write
sub lala {
my $haha = shift;
return $haha . $haha;
}
my $a = "the giant says foe" ;
$a =~ m/((fee)|(fie)|(foe)|(foo))/;
my $result = lala($1);
$a =~ s/$1/$result/;
print "$a\n";
See perldoc perlop. You need to specify the e modifier so that the replacement part is evaluated.
#!/usr/bin/perl
use strict; use warnings;
my $x = "the giant says foe" ;
$x =~ s/(f(?:ee|ie|o[eo]))/lala($1)/e;
print "$x\n";
sub lala {
my ($haha) = #_;
return "$haha$haha";
}
Output:
C:\Temp> r
the giant says foefoe
Incidentally, avoid using $a and $b outside of sort blocks as they are special package scoped variables special-cased for strict.