How does the closure mechanism work in Perl? - perl

Newbie in Perl again here, trying to understand closure in Perl.
So here's an example of code which I don't understand:
sub make_saying {
my $salute = shift;
my $newfunc = sub {
my $target = shift;
print "$salute, $target!\n";
};
return $newfunc; # Return a closure
}
$f = make_saying("Howdy"); # Create a closure
$g = make_saying("Greetings"); # Create another closure
# Time passes...
$f->("world");
$g->("earthlings");
So my questions are:
If a variable is assigned to a function, is it automatically a reference to that function?
In that above code, can I write $f = \make_saying("Howdy") instead? And when can I use the & because I tried using that in passing the parameters (&$f("world")) but it doesn't work.
and lastly, in that code above how did the strings world and earthlings get appended to the strings Howdy and Greetings.
Note: I understand that $f is somewhat bound to the function with the parameter "Howdy" so that's my understanding how "world" got appended. What I don't understand is the 2nd function inside. How that one operates its magic. Sorry I really don't know how to ask this one.

In Perl, scalar variables cannot hold subroutines directly, they can only hold references. This is very much like scalars cannot hold arrays or hashes, only arrayrefs or hashrefs.
The sub { ... } evaluates to a coderef, so you can directly assign it to a scalar variable. If you want to assign a named function (e.g. foo), you have to obtain the reference like \&foo.
You can call coderefs like $code->(#args) or &$code(#args).
The code
$f = \make_saying("Howdy")
evaluates make_saying("Howdy"), and takes a reference to the returned value. So you get a reference that points to a coderef, not a coderef itself.
Therefore, it can't be called like &$f("world"), you need to dereference one extra level: &$$f("world").
A closure is a function that is bound to a certain environment.
The environment consists of all currently visible variables, so a closure always remembers that scope. In the code
my $x;
sub foo {
my $y;
return sub { "$x, $y" };
}
foo is a closure over $x, as the outer environment consists of $x. The inner sub is a closure over $x and $y.
Each time foo is executed, we get a new $y and therefore a new closure. Most importantly $y does not "go out of scope" when foo is left, but it will be "kept alive" as it's still reachable from the closure returned.
In short: $y is a "local state" of the closure returned.
When we execute make_saying("Howdy"), the $salute variable is set to Howdy. The returned closure remembers that scope.
When we execute it again with make_saying("Greetings"), the body of make_saying is evaluated again. The $salute is now set to Greetings, and the inner sub closes over this variable. This variable is separate from the previous $salute, which still exists, but isn't accessible except through the first closure.
The two greeters have closed over separate $salute variables. When they are executed, their respective $salute is still in scope, and they can access and modify the value.

If a variable is asigned to a function, is it automatically a
reference to that function?
No. In example the function make_saying return reference another function. Such closures do not have name and can catch a variable from outside its scope (variable $salute in your example).
In that above code, can i write $f = \make_saying("Howdy") instead?
And when can i use the & cause i tried using that in passing the
parameters (&$f("world")) but it doesnt work.
No. $f = \make_saying("Howdy") is not what you think (read amon post for details). You can write $f = \&make_saying; which means "put into $f reference to function make_saying". You can use it later like this:
my $f = \&make_saying;
my $other_f = $f->("Howdy");
$other_f->("world");
and lastly, In that code above how in he** did the words world and
earthlings got appended to the words howdy and greetings.
make_saying creating my variable which goes into lamda (my $newfunc = sub); that lambda is returned from make_saying. It holds the given word "Howdy" through "closing" (? sorry dont know which word in english).

Every time you call the subroutine 'make_saying', it:
creates a DIFFERENT closure
assigns the received parameter to the scalar '$salute'
defines (creates but not execute) an inner anonymous subroutine:
That is the reason why at that moment nothing is assigned to the scalar
$target nor is the statement print "$salute, $target!\n"; executed .
finally the subroutine 'make_saying' returns a reference to the inner anonymous subroutine, that reference becomes the only way to call the (specific) anonymous subroutine.
Ever time you call each anonymous subroutine, it:
assign the received parameter to the scalar $target
sees also the scalar $salute that will have the value assigned at the moment in which was created the anonymous subroutine (when was called its parent subroutine make_saying
finally executes the statement print "$salute, $target!\n";

Related

Perl Recursive Explanation

I have following script that need to understand
&main('key_1');
sub main {
#{$tmp{'key_1'}} = ("A", "B");
#{$tmp{'A'}} = ("C");
&test_recurse(\%tmp, 'key_1');
}
sub test_recurse {
my ($hash, $cell) = #_;
foreach my $cur_subcell (#{$hash->{$cell}}){
&test_recurse($hash, $cur_subcell);
}
print "X($cell)\n";
}
The output:
X(C)
X(A)
X(B)
X(key_1)
I want to understand why the key_1 is printing at the last? I am expecting the key_1 might not be printed at all.
I am expecting the key_1 might not be printed at all
The function test_recurse ends with print "X($cell)\n". This means that it ends by printing its second argument. Since you initially call it with key_1 as argument, it prints X(key_1).
To understand a bit better how the function test_recurse works, I suggest to add some prints as follows:
sub test_recurse {
my ($hash, $cell, $offset) = #_;
print "${offset}test_recurse($cell)\n";
foreach my $cur_subcell (#{$hash->{$cell}}){
&test_recurse($hash, $cur_subcell, $offset . " ");
}
print "${offset}X($cell)\n";
}
Thanks to the addition of $offset, each time you make a recursive call, the prints within this recursive call are indented further to the right. Calling this modified function with test_recurse(\%tmp, 'key_1', ""), you'll get this output:
test_recurse(key_1)
test_recurse(A)
test_recurse(C)
X(C)
X(A)
test_recurse(B)
X(B)
X(key_1)
So, what happens is:
You call test_recurse with key_1. This prints test_recurse(key_1).
In the foreach loop, it will make two successive calls to test_recurse:
The first one with A as argument. This will print test_recurse(A).
In the foreach loop, it will make a call to
test_recurse with C as argument. This will print test_recurse(C).
Since $tmp{C} does not exist, this call does not enter in the foreach loop, and directly proceed to the final print and prints X(C). We then go back to the caller (test_recurse with A as argument).
Now that the foreach loop is done, this function moves on to the last print and prints X(A). We then go back to the caller (test_recurse with key_1 as argument).
The second recursive call is to test_recurse with B as argument. This will print test_recurse(B). Since $tmp{B} does not exist, we do not enter the foreach loop and move on to the final print, which prints X(B). We then return to the caller (test_recurse with key_1 as argument).
The foreach loop is now over and we move on to the final print, which prints X(key_1).
Some tips:
Always add use strict and use warnings at the beginning of your scripts.
#{$tmp{'key_1'}} = ("A", "B"); would be clearer as $tmp{'key_1'} = [ 'A', 'B' ].
The whole initialization of %tmp could actually be done with:
my %tmp = (
key_1 => [ 'A', 'B' ],
A => [ 'C' ]
);
You call &main('key_1'); with key_1 as argument, but main does not expect any argument.
To call a function, you don't need &: do test_recurse(\%tmp, 'key_1'); instead of &test_recurse(\%tmp, 'key_1');.
In a comment, you say:
I am just thinking that as if the variable $cell has been replace by C, A, B why the key_1 is coming back at the end.
And I think that's probably a good indication of where the confusion lies.
Your test_recurse() subroutine starts with this line:
my ($hash, $cell) = #_;
That defines two new variables called $hash and $cell and then populates them from #_. Because these variables are declared using my, they are lexical variables. That means they are only visible within the block of code where they are declared.
Later on in test_recurse() you call the same subroutine again. And, once again, that subroutine starts with the same declaration statement and creates another two variables called $hash and $cell. These two new variables are completely separate from the original two variables. The original variables still exist and still contain their original values - but you can't currently access them because they are declared in a different call to the subroutine.
So when your various calls to the subroutine end, you rewind back to the original call - and that still has the original two variables which still hold their original values.

perl assign reference to subroutine

I use #_ in a subroutine to get a parameter which is assigned as a reference of an array, but the result dose not showing as an array reference.
My code is down below.
my #aar = (9,8,7,6,5);
my $ref = \#aar;
AAR($ref);
sub AAR {
my $ref = #_;
print "ref = $ref";
}
This will print 1 , not an array reference , but if I replace #_ with shift , the print result will be a reference.
can anyone explain why I can't get a reference using #_ to me ?
This is about context in Perl. It is a crucial aspect of the language.
An expression like
my $var = #ary;
attempts to assign an array to a scalar.
That doesn't make sense as it stands and what happens is that the right-hand side is evaluated to the number of elements of the array and that is assigned to $var.
In order to change that behavior you need to provide the "list context" to the assignment operator.† In this case you'd do
my ($var) = #ary;
and now we have an assignment of a list (of array elements) to a list (of variables, here only $var), where they are assigned one for one. So here the first element of #ary is assigned to $var. Please note that this statement plays loose with the elusive notion of the "list."
So in your case you want
my ($ref) = #_;
and the first element from #_ is assigned to $ref, as needed.
Alternatively, you can remove and return the first element of #_ using shift, in which case the scalar-context assignment is fine
my $ref = shift #_;
In this case you can also do
my $ref = shift;
since shift by default works on #_.
This is useful when you want to remove the first element of input as it's being assigned so that the remaining #_ is well suited for further processing. It is often done in object-oriented code.
It is well worth pointing out that many operators and builtin facilities in Perl act differently depending on what context they are invoked in.
For some specifics, just a few examples: the regex match operator returns true/false (1/empty string) in scalar context but the actual matches in list context,‡ readdir returns a single entry in scalar context but all of them in list context, while localtime shows a bit more distinct difference. This context-sensitive behavior is in every corner of Perl.
User level subroutines can be made to behave that way via wantarray.
†
See Scalar vs List Assignment Operator
for a detailed discussion
‡
See it in perlretut and in perlop for instance
When you assign an array to a scalar, you're getting the size of the array. You pass one argument (a reference to an array) to AAR, that's why you get 1.
To get the actual parameters, place the local variable in braces:
sub AAR {
my ($ref) = #_;
print "ref = $ref\n";
}
This prints something like ref = ARRAY(0x5566c89a4710).
You can then use the reference to access the array elements like this:
print join(", ", #{$ref});

Is it possible to use hash, array, scalar with the same name in Perl?

I'm converting perl script to python.
I haven't used perl language before so many things in perl confused me.
For example, below, opt was declared as a scalar first but declared again as a hash. %opt = ();
Is it possible to declare a scalar and a hash with the same name in perl?
As I know, foreach $opt (#opts) means that scalar opt gets values of array opts one by one. opt is an array at this time???
In addition, what does $opt{$opt} mean?
opt outside $opt{$opt} is a hash and opt inside $opt{$opt} is a scalar?
I'm so confused, please help me ...
sub ParseOptions
{
local (#optval) = #_;
local ($opt, #opts, %valFollows, #newargs);
while (#optval) {
$opt = shift(#optval);
push(#opts,$opt);
$valFollows{$opt} = shift(#optval);
}
#optArgs = ();
%opt = ();
arg: while (defined($arg = shift(#ARGV))) {
foreach $opt (#opts) {
if ($arg eq $opt) {
push(#optArgs, $arg);
if ($valFollows{$opt}) {
if (#ARGV == 0) {
&Usage();
}
$opt{$opt} = shift(#ARGV);
push(#optArgs, $opt{$opt});
} else {
$opt{$opt} = 1;
}
next arg;
}
}
push(#newargs,$arg);
}
#ARGV = #newargs;
}
In Perl types SCALAR, ARRAY, and HASH are distinguished by the leading sigil in the variable name, $, #, and % respectively, but otherwise they may bear the same name. So $var and #var and %var are, other than belonging to the same *var typeglob, completely distinct variables.
A key for a hash must be a scalar, let's call it $key. A value in a hash must also be a scalar, and the one corresponding to $key in the hash %h is $h{$key}, the leading $ indicating that this is now a scalar. Much like an element of an array is a scalar, thus $ary[$index].
In foreach $var (#ary) the scalar $var does not really "get values" of array elements but rather aliases them. So if you change it in the loop you change the element of the array.
(Note, you have the array #opts, not #opt.)
A few comments on the code, in the light of common practices in modern Perl
One must always have use warnings; on top. The other one that is a must is use strict, as it enforces declaration of variables with my (or our), promoting all kinds of good behavior. This restricts the scope of lexical (my) variables to the nearest block enclosing the declaration
A declaration may be done inside loops as well, allowing while (my $var = EXPR) {} and foreach my $var (LIST) {}, and then $var is scoped to the loop's block (and is also usable in the rest of the condition)
The local is a different beast altogether, and in this code those should be my instead
Loop labels like arg: are commonly typed in block capitals
The & in front of a function name has rather particular meanings† and is rarely needed. It is almost certainly not needed in this code
Assignment of empty list like my #optArgs = () when the variable is declared (with my) is unneeded and has no effect (with a global #name it may be needed to clear it from higher scope)
In this code there is no need to introduce variables first since they are global and thus created the first time they are used, much like in Python – Unless they are used outside of this sub, in which case they may need to be cleared. That's the thing with globals, they radiate throughout all code
Except for the last two these are unrelated to your Python translation task, for this script.
† It suppresses prototypes; informs interpreter (at runtime) that it's a subroutine and not a "bareword", if it hasn't been already defined (can do that with () after the name); ensures that the user-defined sub is called, if there is a builtin of the same name; passes the caller's #_ (when without parens, &name); in goto &name and defined ...
See, for example, this post and this post
First, to make it clear, in perl, as in shell, you could surround variable names in curly brackets {} :
${bimbo} and $bimbo are the same scalar variable.
#bimbo is an array pointer ;
%bimbo is a hash pointer ;
$bimbo is a scalar (unique value).
To address an array or hash value, you will have to use the '$' :
$bimbo{'index'} is the 'index' value of hash %bimbo.
If $i contains an int, such as 1 for instance,
$bimbo[$i] is the second value of the array #bimbo.
So, as you can see, # or % ALWAYS refers to the whole array, as $bimbo{} or $bimbo[] refers to any value of the hash or array.
${bimbo[4]} refers to the 5th value of the array #bimbo ; %{bimbo{'index'}} refers to the 'index' value of array %bimbo.
Yes, all those structures could have the same name. This is one of the obvious syntax of perl.
And, euhm… always think in perl as a C++ edulcored syntax, it is simplified, but it is C.

How does #_ work in Perl subroutines?

I was always sure that if I pass a Perl subroutine a simple scalar, it can never change its value outside the subroutine. That is:
my $x = 100;
foo($x);
# without knowing anything about foo(), I'm sure $x still == 100
So if I want foo() to change x, I must pass it a reference to x.
Then I found out this is not the case:
sub foo {
$_[0] = 'CHANGED!';
}
my $x = 100;
foo($x);
print $x, "\n"; # prints 'CHANGED!'
And the same goes for array elements:
my #arr = (1,2,3);
print $arr[0], "\n"; # prints '1'
foo($arr[0]);
print $arr[0], "\n"; # prints 'CHANGED!'
That kinda surprised me. How does this work? Isn't the subroutine only gets the value of the argument? How does it know its address?
In Perl, the subroutine arguments stored in #_ are always aliases to the values at the call site. This aliasing only persists in #_, if you copy values out, that's what you get, values.
so in this sub:
sub example {
# #_ is an alias to the arguments
my ($x, $y, #rest) = #_; # $x $y and #rest contain copies of the values
my $args = \#_; # $args contains a reference to #_ which maintains aliases
}
Note that this aliasing happens after list expansion, so if you passed an array to example, the array expands in list context, and #_ is set to aliases of each element of the array (but the array itself is not available to example). If you wanted the latter, you would pass a reference to the array.
Aliasing of subroutine arguments is a very useful feature, but must be used with care. To prevent unintended modification of external variables, in Perl 6 you must specify that you want writable aliased arguments with is rw.
One of the lesser known but useful tricks is to use this aliasing feature to create array refs of aliases
my ($x, $y) = (1, 2);
my $alias = sub {\#_}->($x, $y);
$$alias[1]++; # $y is now 3
or aliased slices:
my $slice = sub {\#_}->(#somearray[3 .. 10]);
it also turns out that using sub {\#_}->(LIST) to create an array from a list is actually faster than [ LIST ] since Perl does not need to copy every value. Of course the downside (or upside depending on your perspective) is that the values remain aliased, so you can't change them without changing the originals.
As tchrist mentions in a comment to another answer, when you use any of Perl's aliasing constructs on #_, the $_ that they provide you is also an alias to the original subroutine arguments. Such as:
sub trim {s!^\s+!!, s!\s+$!! for #_} # in place trimming of white space
Lastly all of this behavior is nestable, so when using #_ (or a slice of it) in the argument list of another subroutine, it also gets aliases to the first subroutine's arguments:
sub add_1 {$_[0] += 1}
sub add_2 {
add_1(#_) for 1 .. 2;
}
This is all documented in detail in perldoc perlsub. For example:
Any arguments passed in show up in the array #_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The
array #_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the
corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the
function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the
element whether or not the element was assigned to.) Assigning to the whole array #_ removes that aliasing, and does not update any arguments.
Perl passes arguments by reference, not by value. See http://www.troubleshooters.com/codecorn/littperl/perlsub.htm

What does '#_' do in Perl?

I was glancing through some code I had written in my Perl class and I noticed this.
my ($string) = #_;
my #stringarray = split(//, $string);
I am wondering two things:
The first line where the variable is in parenthesis, this is something you do when declaring more than one variable and if I removed them it would still work right?
The second question would be what does the #_ do?
The #_ variable is an array that contains all the parameters passed into a subroutine.
The parentheses around the $string variable are absolutely necessary. They designate that you are assigning variables from an array. Without them, the #_ array is assigned to $string in a scalar context, which means that $string would be equal to the number of parameters passed into the subroutine. For example:
sub foo {
my $bar = #_;
print $bar;
}
foo('bar');
The output here is 1--definitely not what you are expecting in this case.
Alternatively, you could assign the $string variable without using the #_ array and using the shift function instead:
sub foo {
my $bar = shift;
print $bar;
}
Using one method over the other is quite a matter of taste. I asked this very question which you can check out if you are interested.
When you encounter a special (or punctuation) variable in Perl, check out the perlvar documentation. It lists them all, gives you an English equivalent, and tells you what it does.
Perl has two different contexts, scalar context, and list context. An array '#_', if used in scalar context returns the size of the array.
So given these two examples, the first one gives you the size of the #_ array, and the other gives you the first element.
my $string = #_ ;
my ($string) = #_ ;
Perl has three 'Default' variables $_, #_, and depending on who you ask %_. Many operations will use these variables, if you don't give them a variable to work on. The only exception is there is no operation that currently will by default use %_.
For example we have push, pop, shift, and unshift, that all will accept an array as the first parameter.
If you don't give them a parameter, they will use the 'default' variable instead. So 'shift;' is the same as 'shift #_;'
The way that subroutines were designed, you couldn't formally tell the compiler which values you wanted in which variables. Well it made sense to just use the 'default' array variable '#_' to hold the arguments.
So these three subroutines are (nearly) identical.
sub myjoin{
my ( $stringl, $stringr ) = #_;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift;
my $stringr = shift;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift #_;
my $stringr = shift #_;
return "$stringl$stringr";
}
I think the first one is slightly faster than the other two, because you aren't modifying the #_ variable.
The variable #_ is an array (hence the # prefix) that holds all of the parameters to the current function.