Query regarding code in List::Util::reduce - perl

I came across the following code in List::Util for the reduce subroutine.
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
I understand that the reduce function is called as:
my $sum = reduce { $a + $b } 1 .. 1000;
So I understand the code is trying to reference the $a mentioned in the subroutine, but I cannot work out the intent.
For reference, I am adding the complete code for the subroutine:
sub reduce (&@) {
my $code = shift;
require Scalar::Util;
my $type = Scalar::Util::reftype($code);
unless($type and $type eq 'CODE') {
require Carp;
Carp::croak("Not a subroutine reference");
}
no strict 'refs';
return shift unless @_ > 1;
use vars qw($a $b);
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
$a = shift;
foreach (@_) {
$b = $_;
$a = &{$code}();
}
$a;
}

The following aliases package variable $foo to variable $bar.
*foo = \$bar;
Any change to one changes the other as both names refer to the same scalar.
$ perl -E'
*foo = \$bar;
$bar=123; say $foo;
$foo=456; say $bar;
say \$foo == \$bar ? 1 : 0;
'
123
456
1
Of course, you can fully qualify *foo since it's a symbol table entry. The following aliases package variable $main::foo to $bar.
*main::foo = \$bar;
Or, if you don't know the name at compile time
my $caller = 'main';
*{$caller."::foo"} = \$bar; # Symbolic reference
$bar, of course, can just as easily be a lexical variable as a package variable. And since my $bar; actually returns the variable being declared,
my $bar;
*foo = \$bar;
can be written as
*foo = \my $bar;
So,
my $caller = caller;
local(*{$caller."::a"}) = \my $a;
local(*{$caller."::b"}) = \my $b;
declares lexical variables $a and $b and aliases the similarly named package variables in the caller's namespace to them.
local simply causes everything to return to its original state once the sub is exited.
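The aliasing described above can be tried in isolation. The following is a minimal sketch, not the List::Util implementation: my_reduce is a hypothetical name, and the lexicals are called $acc and $item (names chosen here) only to make it obvious that the callback reaches them through the caller's $a and $b.

```perl
use strict;
use warnings;

# Hypothetical stripped-down reduce, for illustration only.
sub my_reduce (&@) {
    my $code = shift;
    return shift unless @_ > 1;

    no strict 'refs';                       # needed for the symbolic glob names
    my $caller = caller;                    # package that called my_reduce
    local(*{$caller."::a"}) = \my $acc;     # caller's $a now aliases $acc
    local(*{$caller."::b"}) = \my $item;    # caller's $b now aliases $item

    $acc = shift;
    foreach (@_) {
        $item = $_;
        $acc  = $code->();                  # the block reads $a and $b
    }
    return $acc;
}

my $sum = my_reduce { $a + $b } 1 .. 10;
print "$sum\n";    # 55
```

Because of local, the caller's own $a and $b get their previous contents back as soon as my_reduce returns.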

On scope
Perl has two variable name scoping mechanisms: global and lexical. Lexical variables are declared with my and are accessible by name until the closing curly brace of the enclosing block.
Global variables, on the other hand, are accessible from anywhere and do not have a scope. They can be declared with our and use vars, or do not have to be declared if strict is not in effect. However, they have namespaces, or packages. The namespace is a prefix separated from the variable name by two colons (or a single quote, but never do that). Inside the package of the variable, the variable can be accessed with or without the prefix. Outside of the package, the prefix is required.
The local function is somewhat special and gives global variables a temporary value. The scope of this value is the same as that of a lexical variable plus the scopes of all subs called within this scope. The old value is restored once this scope is exited. This is called the dynamic scope.
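A short sketch of dynamic scope, using made-up names ($level, show, with_local): the temporary value set by local is seen by a sub called from inside the scope, and the old value is back once the scope is left.

```perl
use strict;
use warnings;

our $level = 'outer';               # a package (global) variable

sub show { return $level }          # reads whatever value is current

sub with_local {
    local $level = 'inner';         # temporary value for this dynamic scope
    return show();                  # show() is called inside that scope
}

my @seen = (show(), with_local(), show());
print "@seen\n";    # outer inner outer
```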
On Globs
Perl organizes global variables in a big hash representing the namespace and all variable names (sometimes called the stash). In each slot of this hash, there is a so-called glob. A typeglob is a special hash that has a field for each of Perl's native types, e.g. scalar, array, hash, IO, format, code etc. You assign to a slot by passing the glob a reference to the value you want to add; the glob figures out the right slot on its own. This is also the reason you can have multiple variables with the same name (like $thing, @thing, %thing, thing()). Typeglobs have a special sigil, namely the asterisk *.
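As a rough illustration of the slot selection, the sketch below assigns three different kinds of reference to the same glob. The name thing is taken from the paragraph above; the values are invented for the example.

```perl
use strict;
use warnings;

our ($thing, @thing);               # declare the package variables for strict

my $scalar = 'a scalar';
my @list   = (1, 2, 3);

*thing = \$scalar;                  # fills the SCALAR slot
*thing = \@list;                    # fills the ARRAY slot
*thing = sub { 'from the sub' };    # fills the CODE slot

my $from_sub = thing();             # calls the sub installed above

print "$thing\n";                   # a scalar
print scalar(@thing), "\n";         # 3
print "$from_sub\n";                # from the sub
```

Each assignment leaves the other slots untouched, which is why $thing, @thing and thing() can coexist.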
On no strict 'refs'
The no strict 'refs' is a cool thing if you know what you are doing. Normally you can only dereference normal references, e.g.
my @array = (1 .. 5);
my $arrayref = \@array; # is a reference
push @{$arrayref}, 6; # works
push @{array}, 6; # works; barewords are considered o.k.
push @{"array"}, 6; # dies horribly, if strict refs enabled.
The last line tries to dereference a string, which is considered bad practice. However, under no strict 'refs', we can access a variable whose name we do not know at compile time, as we do here.
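The difference can be observed directly. In this sketch (variable names are illustrative) the same symbolic dereference is attempted once with strict refs disabled in a small block and once with it enabled.

```perl
use strict;
use warnings;

our @array = (1 .. 5);              # package variable @main::array
my $name = 'array';                 # name known only at run time

{
    no strict 'refs';               # allow symbolic references in this block only
    push @{"main::$name"}, 6;       # resolves to @main::array
}

# Outside the block strict refs is active again, so the same push dies.
my $ok = eval { push @{"main::$name"}, 7; 1 };

print scalar(@array), "\n";                 # 6
print $ok ? "allowed\n" : "refused\n";      # refused
```

Keeping no strict 'refs' inside the smallest possible block limits where a mistyped string can silently create or touch a global.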
Conclusion
The caller function returns the name of the package of the calling code, i.e. it looks up one call stack frame. The name is used here to construct the full names of the $a and $b variables of the calling package, so that they can be used there without a prefix. Then, these names are locally (i.e. in the dynamic scope) assigned a reference to a newly declared lexical variable.
The global variables $a and $b are predeclared in each package.
In the foreach loop, these lexicals are assigned different values (lexical vars take precedence over global vars), but the global variables $foo::a and $foo::b point to the same data because of the reference, allowing the anonymous callback sub in the reduce call to read the two arguments easily. (See ikegami's answer for details on this.)
All of this hassle is good because (a) the effects are not externally visible, and (b) the callback doesn't have to do tedious argument unpacking.

Related

Perl sort won't use function from another package

I have a function for case insensitive sorting. It works if it's from the same package, but not otherwise.
This works:
my @arr = sort {lc $a cmp lc $b} @list;
This works (if a function called "isort" is defined in the same file):
my @arr = sort isort @list;
This does not (function exported with Exporter from another package):
my @arr = sort isort @list;
This does not (function referred to explicitly by package name):
my @arr = sort Utils::isort @list;
What is going on? How do I put a sorting function in another package?
What evidence do you have for it not working? Have you put a print() statement in the subroutine to see if it's being called?
I suspect you're being tripped up by this (from perldoc -f sort):
$a and $b are set as package globals in the package the sort() is called from. That means $main::a and $main::b (or $::a and $::b ) in the main package, $FooPack::a and $FooPack::b in the FooPack package, etc.
Oh, and later on it's more specific:
Sort subroutines written using $a and $b are bound to their calling package. It is possible, but of limited interest, to define them in a different package, since the subroutine must still refer to the calling package's $a and $b:
package Foo;
sub lexi { $Bar::a cmp $Bar::b }
package Bar;
... sort Foo::lexi ...
Use the prototyped versions (see above) for a more generic alternative.
The "prototyped versions" are described above like this:
If the subroutine's prototype is ($$), the elements to be compared are passed by reference in @_, as for a normal subroutine. This is slower than unprototyped subroutines, where the elements to be compared are passed into the subroutine as the package global variables $a and $b (see example below).
So you could try rewriting your subroutine like this:
package Utils;
sub isort ($$) {
my ($a, $b) = @_;
# existing code...
}
And then calling it using one of your last two alternatives.
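A self-contained sketch of the prototyped approach follows. The comparison body is an assumption based on the case-insensitive block in the question, and $left/$right are names chosen here to avoid shadowing $a and $b; both packages live in one file only to keep the example short.

```perl
use strict;
use warnings;

package Utils;

# With the ($$) prototype, sort passes the two elements in @_,
# so the sub no longer depends on the calling package's $a and $b.
sub isort ($$) {
    my ($left, $right) = @_;
    return lc($left) cmp lc($right);
}

package main;

my @sorted = sort Utils::isort qw(banana Apple cherry);
print "@sorted\n";    # Apple banana cherry
```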

Can someone explain why Perl behaves this way (variable scoping)?

My test goes like this:
use strict;
use warnings;
func();
my $string = 'string';
func();
sub func {
print $string, "\n";
}
And the result is:
Use of uninitialized value $string in print at test.pl line 10.
string
Perl allows us to call a function before it has been defined. However when the function uses a variable declared only after the function call, the variable appears to be undefined. Is this behavior documented somewhere? Thank you!
The behaviour of my is documented in perlsub - it boils down to this - perl knows $string is in scope - because the my tells it so.
The my operator declares the listed variables to be lexically confined to the enclosing block, conditional (if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval, or do/require/use'd file.
It means it's 'in scope' from the point at which it's first 'seen' until the closing bracket of the current 'block'. (Or in your example - the end of the code)
However - in your example my also assigns a value.
This scoping process happens at compile time, when perl checks whether it's valid to use $string or not (thanks to strict). However, it can't know what the value will be, because that might change during code execution (and is non-trivial to analyze).
So if you do this it might be a little clearer what's going on:
#!/usr/bin/env perl
use strict;
use warnings;
my $string; #undefined
func();
$string = 'string';
func();
sub func {
print $string, "\n";
}
$string is in scope in both cases - because the my happened at compile time - before the subroutine has been called - but it doesn't have a value set beyond the default of undef prior to the first invocation.
Note this contrasts with:
#!/usr/bin/env perl
use strict;
use warnings;
sub func {
print $string, "\n";
}
my $string; #undefined
func();
$string = 'string';
func();
Which errors because when the sub is declared, $string isn't in scope.
First of all, I would consider this undefined behaviour since it skips executing my like my $x if $cond; does.
That said, the behaviour is currently consistent and predictable. And in this instance, it behaves exactly as expected if the optimization that warranted the undefined behaviour notice didn't exist.
At compile-time, my has the effect of declaring and allocating the variable[1]. Scalars are initialized to undef when created. Arrays and hashes are created empty.
my $string was encountered by the compiler, so the variable was created. But since you haven't executed the assignment yet, it still has its default value (undefined) during the first call to func.
This model allows variables to be captured by closures.
Example 1:
{
my $x = "abc";
sub foo { $x } # Named subs capture at compile-time.
}
say foo(); # abc, even though $x fell out of scope before foo was called.
Example 2:
sub make_closure {
my ($x) = @_;
return sub { $x }; # Anon subs capture at run-time.
}
my $foo = make_closure("foo");
my $bar = make_closure("bar");
say $foo->(); # foo
say $bar->(); # bar
[1] The allocation is possibly deferred until the variable is actually used.

How do I define an anonymous scalar ref in Perl?

How do I properly define an anonymous scalar ref in Perl?
my $scalar_ref = ?;
my $array_ref = [];
my $hash_ref = {};
If you want a reference to some mutable storage, there's no particularly neat direct syntax for it. About the best you can manage is
my $var;
my $sref = \$var;
Or neater
my $sref = \my $var;
Or if you don't want the variable itself to be in scope any more, you can use a do block:
my $sref = do { \my $tmp };
At this point you can pass $sref around by value, and any mutations to the scalar it references will be seen by others.
This technique of course works just as well for array or hash references, just that there's neater syntax for doing that with [] and {}:
my $aref = do { \my @tmp }; ## same as my $aref = [];
my $href = do { \my %tmp }; ## same as my $href = {};
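To illustrate the point about passing the scalar reference around by value, here is a small sketch. bump is a hypothetical helper; every copy of the reference sees the change it makes.

```perl
use strict;
use warnings;

my $sref = do { \my $tmp };         # anonymous scalar, initially undef

# Hypothetical helper that mutates the scalar through the reference.
sub bump { my ($ref) = @_; $$ref++; return }

my $copy = $sref;                   # copies the reference, not the scalar
bump($sref) for 1 .. 3;

print "$$copy\n";                   # 3
```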
Usually you just declare and don't initialize it.
my $foo; # will be undef.
You have to consider that empty hash refs and empty array refs point to a data structure that has a representation. Both of them, when dereferenced, give you an empty list.
perldata says (emphasis mine):
There are actually two varieties of null strings (sometimes referred to as "empty" strings), a defined one and an undefined one. The defined version is just a string of length zero, such as "" . The undefined version is the value that indicates that there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array or hash. Although in early versions of Perl, an undefined scalar could become defined when first used in a place expecting a defined value, this no longer happens except for rare cases of autovivification as explained in perlref. You can use the defined() operator to determine whether a scalar value is defined (this has no meaning on arrays or hashes), and the undef() operator to produce an undefined value.
So an empty scalar (which it didn't actually say) would be undef. If you want it to be a reference, make it one.
use strict;
use warnings;
use Data::Printer;
my $scalar_ref = \undef;
my $scalar = $$scalar_ref;
p $scalar_ref;
p $scalar;
This will output:
\ undef
undef
However, as ikegami pointed out, it will be read-only because it's not a variable. LeoNerd provides a better approach for this in his answer.
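The read-only behaviour is easy to confirm. This sketch (variable names are illustrative) tries to assign through both kinds of reference and traps the failure with eval.

```perl
use strict;
use warnings;

my $ro = \undef;                    # reference to a constant, not a variable
my $rw = \my $var;                  # reference to a real variable

my $ro_ok = eval { $$ro = 1; 1 };   # dies: modification of a read-only value
my $rw_ok = eval { $$rw = 1; 1 };   # succeeds

print $ro_ok ? "writable\n" : "read-only\n";   # read-only
print $rw_ok ? "writable\n" : "read-only\n";   # writable
```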
Anyway, my point is, an empty hash ref and an empty array ref when dereferenced both contain an empty list (). And that is not undef but nothing. But there is no nothing as a scalar value, because everything that is not nothing is a scalar value.
my $r = [];
say ref $r; # ARRAY
say scalar @$r; # 0
say "'@$r'"; # ''
So there is no real way to initialize with nothing. You can only not initialize. But Moose will turn it to undef anyway.
What you could do is make it maybe a scalar ref.
use strict;
use warnings;
use Data::Printer;
{
package Foo;
use Moose;
has bar => (
is => 'rw',
isa => 'Maybe[ScalarRef]',
predicate => 'has_bar'
);
}
my $foo = Foo->new;
p $foo->has_bar;
p $foo;
say $foo->bar;
Output:
""
Foo {
Parents Moose::Object
public methods (3) : bar, has_bar, meta
private methods (0)
internals: {}
}
Use of uninitialized value in say at scratch.pl line 268.
The predicate gives a value that is not true (the empty string ""). undef is also not true. The people who made Moose decided to go with that, but it really doesn't matter.
Probably what you want is not to have a default value, but just to make it a ScalarRef and required.
Note that perlref doesn't say anything about initializing an empty scalar ref either.
I'm not entirely sure why you need to but I'd suggest:
my $ref = \undef;
print ref $ref;
Or perhaps:
my $ref = \0;
@LeoNerd's answer is spot on.
Another option is to use a temporary anonymous hash value:
my $scalar_ref = \{_=>undef}->{_};
$$scalar_ref = "Hello!\n";
print $$scalar_ref;

Why does this Perl variable keep its value

What is the difference between the following two Perl variable declarations?
my $foo = 'bar' if 0;
my $baz;
$baz = 'qux' if 0;
The difference is significant when these appear at the top of a loop. For example:
use warnings;
use strict;
foreach my $n (0,1){
my $foo = 'bar' if 0;
print defined $foo ? "defined\n" : "undefined\n";
$foo = 'bar';
print defined $foo ? "defined\n" : "undefined\n";
}
print "==\n";
foreach my $m (0,1){
my $baz;
$baz = 'qux' if 0;
print defined $baz ? "defined\n" : "undefined\n";
$baz = 'qux';
print defined $baz ? "defined\n" : "undefined\n";
}
results in
undefined
defined
defined
defined
==
undefined
defined
undefined
defined
It seems that if 0 fails, so foo is never reinitialized to undef. In this case, how does it get declared in the first place?
First, note that my $foo = 'bar' if 0; is documented to be undefined behaviour, meaning it's allowed to do anything including crash. But I'll explain what happens anyway.
my $x has three documented effects:
It declares a symbol at compile-time.
It creates a new variable on execution.
It returns the new variable on execution.
In short, it's supposed to be like Java's Scalar x = new Scalar();, except it returns the variable if used in an expression.
But if it actually worked that way, the following would create 100 variables:
for (1..100) {
my $x = rand();
print "$x\n";
}
This would mean two or three memory allocations per loop iteration for the my alone! A very expensive prospect. Instead, Perl only creates one variable and clears it at the end of the scope. So in reality, my $x actually does the following:
It declares a symbol at compile-time.
It creates the variable at compile-time[1].
It puts a directive on the stack that will clear[2] the variable when the scope is exited.
It returns the new variable on execution.
As such, only one variable is ever created[2]. This is much more CPU-efficient than creating one every time the scope is entered.
Now consider what happens if you execute a my conditionally, or never at all. By doing so, you are preventing it from placing the directive to clear the variable on the stack, so the variable never loses its value. Obviously, that's not meant to happen, so that's why my ... if ...; isn't allowed.
Some take advantage of the implementation as follows:
sub foo {
my $state if 0;
$state = 5 if !defined($state);
print "$state\n";
++$state;
}
foo(); # 5
foo(); # 6
foo(); # 7
But doing so requires ignoring the documentation forbidding it. The above can be achieved safely using
{
my $state = 5;
sub foo {
print "$state\n";
++$state;
}
}
or
use feature qw( state ); # Or: use 5.010;
sub foo {
state $state = 5;
print "$state\n";
++$state;
}
Notes:
[1] "Variable" can mean a couple of things. I'm not sure which definition is accurate here, but it doesn't matter.
[2] If anything but the sub itself holds a reference to the variable (REFCNT>1) or if the variable contains an object, the directive replaces the variable with a new one (on scope exit) instead of clearing the existing one. This allows the following to work as it should:
my @a;
for (...) {
my $x = ...;
push @a, \$x;
}
See ikegami's better answer, probably above.
In the first example, you never define $foo inside the loop because of the conditional, so when you use it, you're referencing and then assigning a value to an implicitly declared global variable. Then, the second time through the loop that outside variable is already defined.
In the second example, $baz is defined inside the block each time the block is executed. So the second time through the loop it is a new, not yet defined, local variable.

What does '@_' do in Perl?

I was glancing through some code I had written in my Perl class and I noticed this.
my ($string) = @_;
my @stringarray = split(//, $string);
I am wondering two things:
The first line, where the variable is in parentheses: is this something you do when declaring more than one variable, and would it still work if I removed them?
The second question is: what does @_ do?
The @_ variable is an array that contains all the parameters passed into a subroutine.
The parentheses around the $string variable are absolutely necessary. They designate that you are assigning variables from an array. Without them, the @_ array is assigned to $string in scalar context, which means that $string would be equal to the number of parameters passed into the subroutine. For example:
sub foo {
my $bar = @_;
print $bar;
}
foo('bar');
The output here is 1--definitely not what you are expecting in this case.
Alternatively, you could assign the $string variable without using the @_ array and using the shift function instead:
sub foo {
my $bar = shift;
print $bar;
}
Choosing one method over the other is largely a matter of taste. I asked this very question, which you can check out if you are interested.
When you encounter a special (or punctuation) variable in Perl, check out the perlvar documentation. It lists them all, gives you an English equivalent, and tells you what it does.
Perl has two different contexts: scalar context and list context. An array such as '@_', if used in scalar context, returns the size of the array.
So given these two examples, the first one gives you the size of the @_ array, and the other gives you the first element.
my $string = @_ ;
my ($string) = @_ ;
Perl has three 'default' variables: $_, @_, and, depending on who you ask, %_. Many operations will use these variables if you don't give them a variable to work on. The only exception is that no operation currently uses %_ by default.
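The two assignments above can be compared side by side. In this sketch count_args and first_arg are hypothetical names, one per form of assignment.

```perl
use strict;
use warnings;

sub count_args { my $n       = @_; return $n }       # scalar context: element count
sub first_arg  { my ($first) = @_; return $first }   # list context: first element

my $count = count_args('x', 'y', 'z');
my $first = first_arg('x', 'y', 'z');

print "$count\n";   # 3
print "$first\n";   # x
```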
For example we have push, pop, shift, and unshift, that all will accept an array as the first parameter.
If you don't give them a parameter, they will use the 'default' variable instead. So, inside a subroutine, 'shift;' is the same as 'shift @_;'
The way that subroutines were designed, you couldn't formally tell the compiler which values you wanted in which variables, so it made sense to just use the 'default' array variable '@_' to hold the arguments.
So these three subroutines are (nearly) identical.
sub myjoin{
my ( $stringl, $stringr ) = @_;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift;
my $stringr = shift;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift @_;
my $stringr = shift @_;
return "$stringl$stringr";
}
I think the first one is slightly faster than the other two, because you aren't modifying the @_ variable.
The variable @_ is an array (hence the @ prefix) that holds all of the parameters to the current function.