Can anybody please explain (my $self = shift) in Perl - perl

I'm having a really hard time understanding the intersection of OO Perl and my $self = shift; The documentation on these individual elements is great, but none of them that I've found touch on how they work together.
I've been using Moose to make modules with attributes, and of course, it's useful to reference a module's attribute within said module. I've been told over and over again to use my $self = shift; within a subroutine to assign the module's attributes to that variable. This makes sense and works, but when I'm also passing arguments to the subroutine, this process clearly takes the first element of the #ARGV array and assigns it to $self as well.
Can someone offer an explanation of how I can use shift to gain internal access to a module's attributes, while also passing in arguments in the #ARGV array?

First off, a subroutine isn't passed the #ARGV array. Rather all the parameters passed to a subroutine are flattened into a single list represented by #_ inside the subroutine. The #ARGV array is available at the top-level of your script, containing the command line arguments passed to you script.
Now, in Perl, when you call a method on an object, the object is implicitly passed as a parameter to the method.
If you ignore inheritance,
$obj->doCoolStuff($a, $b);
is equivalent to
doCoolStuff($obj, $a, $b);
Which means the contents of #_ in the method doCoolStuff would be:
#_ = ($obj, $a, $b);
Now, the shift builtin function, without any parameters, shifts an element out of the default array variable #_. In this case, that would be $obj.
So when you do $self = shift, you are effectively saying $self = $obj.
I also hope this explains how to pass other parameters to a method via the -> notation. Continuing the example I've stated above, this would be like:
sub doCoolStuff {
# Remember #_ = ($obj, $a, $b)
my $self = shift;
my ($a, $b) = #_;
Additionally, while Moose is a great object layer for Perl, it doesn't take away from the requirement that you need to initialize the $self yourself in each method. Always remember this. While language like C++ and Java initialize the object reference this implicitly, in Perl you need to do it explicitly for every method you write.

In top level-code, shift() is short for shift(#ARGV). #ARGV contains the command-line arguments.
In a sub, shift() is short for shift(#_). #_ contains the sub's arguments.
So my $self = shift; is grabbing the sub's first argument. When calling a method, the invocant (what's left of the ->) is passed as the first parameter. In other words,
$o->method(#a)
is similar to
my $sub = $o->can('method');
$sub->($o, #a);
In that example, my $self = shift; will assign $o to $self.

If you call:
$myinstance->myMethod("my_parameter");
is the same that doing:
myMethod($myinstance, "my_parameter");
but if you do:
myMethod("my_parameter");
only "my_parameter" wil be passed.
THEN if inside myMethod always you do :
$self = shift #_;
$self will be the object reference when myMethod id called from an object context
but will be "my_parameter" when called from another method inside on a procedural way.
Be aware of this;

Related

Assigning shift to a variable

Other thread that I read: What does assigning 'shift' to a variable mean?
I also used perldoc -f shift:
shift ARRAY
shift
Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If there are no elements in the array, returns the undefined value. If ARRAY is omitted, shifts the #_ array within the lexical scope of subroutines and formats, and the #ARGV array outside of a subroutine and also within the lexical scopes established by the eval STRING, BEGIN {}, INIT {}, CHECK {}, UNITCHECK {} and END {} constructs.
See also unshift, push, and pop. shift and unshift do the same thing to the left end of an array that pop and push do to the right end.
I understand outside of subroutines, the array is #ARGV, and inside arguments are passed through #_
I've read countless tutorials on how to use the shift function, but it's always about arrays, and how it removes the first element at the beginning of the array and returns it. But I see sometimes
$eg = shift;
~do more things here~
It seems as if nothing is making sense to me, and I feel like I can't continue reading more until I understand how this works as it's a "basic building block" to the language.
I'm not quite sure if an example of code is needed, as I believe the same principles apply to all programs that use shift. But if wrong I can provide some examples of code.
It depends on your context.
In all cases, shift removes the element at list index 0 and returns it, shifting all remaining elements down one.
Inside a sub, shift without an argument (bare shift) operates on the #_ list of subroutine parameters.
Suppose I call mysub('me', '22')
sub mysub {
my $self = shift; # return $_[0], moves everything else down, $self = 'me', #_ = ( '22' )
my $arg1 = shift; # returns $_[0], moves everything down, $arg1 = '22', #_ = ()
}
Outside a sub, it operates on the #ARGV list of command line parameters.
Given an argument, shift operates on that list.
my #list1 = ( 1,2,3,4,5 );
my $first = shift #list1; # $first = 1, #list1 = (2,3,4,5)
It removes the first element from an array and returns it.
If #_ contains the elements ("foo","bar",123) then the statement
$eg = shift; # same as $eg = shift #_
Assigns the value "foo" to the variable $eg, and leaves #_ containing the elements ("bar",123).
This is very much a basic building block of the language. You will frequently see this construction at the beginning of subroutines as it is one of the ways to copy the arguments to the subroutine (#_) into other variables.
sub func {
my $x = shift; # put first arg into $x
my $y = shift; # put second arg into $y
my #z = #_; # put remaining args into #z
...
}
$r = func(1,2,"foo");

sub _init in Perl explanation

I was wondering what this subroutine does in Perl. I believe i have the general idea but i'm wondering about some of the syntax.
sub _init
{
my $self = shift;
if (#_) {
my %extra = #_;
#$self{keys %extra} = values %extra;
}
}
Here is what i think it does: essentially add any "extra" key-value pairs to the nameless hash refereced by the variable $self. Also i'm not 100% sure about this but i think my $self = shift is actually referring to the variable $self that called the _init() subroutine.
My specific questions are:
Is $self actually referring to the variable that called the subroutine _init() ?
What does the #$ syntax mean when writing #$self{keys %extra} = values %extra;
Your understanding is correct. This allows the user of a class to set any parameters they want into the object.
Yes. If you call, for example, $myobject->_init('color', 'green'), then this code will set $myobject->{'color'} = 'green'.
This is a somewhat confusing hash operation. keys %extra is a list (obviously of keys). We are effectively using a "hash slice" here. Think of it like an array slice, where you can call #$arrayref[1, 3, 4]. We use the # sign here because we're talking about a list - it's the list of values corresponding to the list of keys keys %extra in the hash referenced by $self.
Another way to write this would be:
foreach my $key (keys %extra) {
$self->{$key} = $extra{$key};
}
Is $self actually referring to the variable that called the subroutine _init()?
Variables don't call subroutines.
The invocant (what's on the left of the -> in ->_init()) is passed to the method as its first argument, and you placed this in $self. (shift() is short for shift(#_) in subs.)
What does the #$ syntax mean when writing #$self{keys %extra} = values %extra;
#hash{LIST} is a hash slice.
#{ EXPR }{LIST} is a hash slice where the hash to slice is specified through a reference. The curlies are optional when EXPR is simple scalar lookup, so #{ $hash_ref }{LIST} can be written as #$hash_ref{LIST}.
The method add the arguments to %$self, the hash-based object used as the invocant. It could also have been written as follows:
%$self = ( %$self, #_ );

in perl, is it bad practice to call multiple subroutines with default arguments?

I am learning perl and understand that it is a common and accepted practice to unpack subroutine arguments using shift. I also understand that it is common and acceptable practice to omit function arguments to use the default #_ array.
Considering these two things, if you call a subroutine without arguments, the #_ can (and will, if using shift) be changed. Does this mean that calling another subroutine with default arguments, or, in fact, using the #_ array after this, is considered bad practice? Consider this example:
sub total { # calculate sum of all arguments
my $running_sum;
# take arguments one by one and sum them together
while (#_) {
$running_sum += shift;
}
$running_sum;
}
sub avg { calculate the mean of given arguments
if (#_ == 0) { return }
my $sum = &total; # gets the correct answer, but changes #_
$sum / #_ # causes division by zero, since #_ is now empty
}
My gut feeling tells me that using shift to unpack arguments would actually be bad practice, unless your subroutine is actually supposed to change the passed arguments, but I have read in multiple places, including Stack Overflow, that this is not a bad practice.
So the question is: if using shift is common practice, should I always assume the passed argument list could get changed, as a side-effect of the subroutine (like the &total subroutine in the quoted example)? Is there maybe a way to pass arguments by value, so I can be sure that the argument list does not get changed, so I could use it again (like in the &avg subroutine in the quoted text)?
In general, shifting from the arguments is ok—using the & sigil to call functions isn't. (Except in some very specific situations you'll probably never encounter.)
Your code could be re-written, so that total doesn't shift from #_. Using a for-loop may even be more efficient.
sub total {
my $total = 0;
$total += $_ for #_;
$total;
}
Or you could use the sum function from List::Util:
use List::Util qw(sum);
sub avg { #_ ? sum(#_) / #_ : 0 }
Using shift isn't that common, except for extracting $self in object oriented Perl. But as you always call your functions like foo( ... ), it doesn't matter if foo shifts or doesn't shift the argument array.
(The only thing worth noting about a function is whether it assigns to elements in #_, as these are aliases for the variables you gave as arguments. Assigning to elements in #_ is usually bad.)
Even if you can't change the implementation of total, calling the sub with an explicit argument list is safe, as the argument list is a copy of the array:
(a) &total — calls total with the identical #_, and overrides prototypes.
(b) total(#_) — calls total with a copy of #_.
(c) &total(#_) — calls total with a copy of #_, and overrides prototypes.
Form (b) is standard. Form (c) shouldn't be seen, except in very few cases for subs inside the same package where the sub has a prototype (and don't use prototypes), and they have to be overridden for some obscure reason. A testament to poor design.
Form (a) is only sensible for tail calls (#_ = (...); goto &foo) or other forms of optimization (and premature optimization is the root of all evil).
You should avoid using the &func; style of calling unless you have a really good reason, and trust that others do the same.
To guard your #_ against modification by a callee, just do &func() or func.
Perl is a little too lax sometimes and having multiple ways of accessing input parameters can make smelly and inconsistent code. For want of a better answer, try to impose your own standard.
Here's a few ways I've used and seen
Shifty
sub login
{
my $user = shift;
my $passphrase = shift;
# Validate authentication
return 0;
}
Expanding #_
sub login
{
my ($user, $passphrase) = #_;
# Validate authentication
return 0;
}
Explicit indexing
sub login
{
my user = $_[0];
my user = $_[1];
# Validate authentication
return 0;
}
Enforce parameters with function prototypes (this is not popular however)
sub login($$)
{
my ($user, $passphrase) = #_;
# Validate authentication
return 0;
}
Sadly you still have to perform your own convoluted input validation/taint checking, ie:
return unless defined $user;
return unless defined $passphrase;
or better still, a little more informative
unless (defined($user) && defined($passphrase)) {
carp "Input error: user or passphrase not defined";
return -1;
}
Perldoc perlsub should really be your first port of call.
Hope this helps!
Here are some examples where the careful use of #_ matters.
1. Hash-y Arguments
Sometimes you want to write a function which can take a list of key-value pairs, but one is the most common use and you want that to be available without needing a key. For example
sub get_temp {
my $location = #_ % 2 ? shift : undef;
my %options = #_;
$location ||= $options{location};
...
}
So now if you call the function with an odd number of arguments, the first is location. This allows get_temp('Chicago') or get_temp('New York', unit => 'C') or even get_temp( unit => 'K', location => 'Nome, Ak'). This may be a more convenient API for your users. By shifting the odd argument, now #_ is an even list and may be assigned to a hash.
2. Dispatching
Lets say we have a class that we want to be able to dispatch methods by name (possibly AUTOLOAD could be useful, we will hand roll). Perhaps this is a command line script where arguments are methods. In this case we define two dispatch methods one "clean" and one "dirty". If we call with the -c flag we get the clean one. These methods find the method by name and call it. The difference is how. The dirty one leaves itself in the stack trace, the clean one has to be more cleaver, but dispatches without being in the stack trace. We make a death method which gives us that trace.
#!/usr/bin/env perl
use strict;
use warnings;
package Unusual;
use Carp;
sub new {
my $class = shift;
return bless { #_ }, $class;
}
sub dispatch_dirty {
my $self = shift;
my $name = shift;
my $method = $self->can($name) or confess "No method named $name";
$self->$method(#_);
}
sub dispatch_clean {
my $self = shift;
my $name = shift;
my $method = $self->can($name) or confess "No method named $name";
unshift #_, $self;
goto $method;
}
sub death {
my ($self, $message) = #_;
$message ||= 'died';
confess "$self->{name}: $message";
}
package main;
use Getopt::Long;
GetOptions
'clean' => \my $clean,
'name=s' => \(my $name = 'Robot');
my $obj = Unusual->new(name => $name);
if ($clean) {
$obj->dispatch_clean(#ARGV);
} else {
$obj->dispatch_dirty(#ARGV);
}
So now if we call ./test.pl to invoke the death method
$ ./test.pl death Goodbye
Robot: Goodbye at ./test.pl line 32
Unusual::death('Unusual=HASH(0xa0f7188)', 'Goodbye') called at ./test.pl line 19
Unusual::dispatch_dirty('Unusual=HASH(0xa0f7188)', 'death', 'Goodbye') called at ./test.pl line 46
but wee see dispatch_dirty in the trace. If instead we call ./test.pl -c we now use the clean dispatcher and get
$ ./test.pl -c death Adios
Robot: Adios at ./test.pl line 33
Unusual::death('Unusual=HASH(0x9427188)', 'Adios') called at ./test.pl line 44
The key here is the goto (not the evil goto) which takes the subroutine reference and immediately switches the execution to that reference, using the current #_. This is why I have to unshift #_, $self so that the invocant is ready for the new method.
Refs:
sub refWay{
my ($refToArray,$secondParam,$thirdParam) = #_;
#work here
}
refWay(\#array, 'a','b');
HashWay:
sub hashWay{
my $refToHash = shift; #(if pass ref to hash)
#and i know, that:
return undef unless exists $refToHash->{'user'};
return undef unless exists $refToHash->{'password'};
#or the same in loop:
for (qw(user password etc)){
return undef unless exists $refToHash->{$_};
}
}
hashWay({'user'=>YourName, 'password'=>YourPassword});
I tried a simple example:
#!/usr/bin/perl
use strict;
sub total {
my $sum = 0;
while(#_) {
$sum = $sum + shift;
}
return $sum;
}
sub total1 {
my ($a, $aa, $aaa) = #_;
return ($a + $aa + $aaa);
}
my $s;
$s = total(10, 20, 30);
print $s;
$s = total1(10, 20, 30);
print "\n$s";
Both print statements gave answer as 60.
But personally I feel, the arguments should be accepted in this manner:
my (arguments, #garb) = #_;
in order to avoid any sort of issue latter.
I found the following gem in http://perldoc.perl.org/perlsub.html:
"Yes, there are still unresolved issues having to do with visibility of #_ . I'm ignoring that question for the moment. (But note that if we make #_ lexically scoped, those anonymous subroutines can act like closures... (Gee, is this sounding a little Lispish? (Never mind.)))"
You might have run into one of those issues :-(
OTOH amon is probably right -> +1

How does #_ work in Perl subroutines?

I was always sure that if I pass a Perl subroutine a simple scalar, it can never change its value outside the subroutine. That is:
my $x = 100;
foo($x);
# without knowing anything about foo(), I'm sure $x still == 100
So if I want foo() to change x, I must pass it a reference to x.
Then I found out this is not the case:
sub foo {
$_[0] = 'CHANGED!';
}
my $x = 100;
foo($x);
print $x, "\n"; # prints 'CHANGED!'
And the same goes for array elements:
my #arr = (1,2,3);
print $arr[0], "\n"; # prints '1'
foo($arr[0]);
print $arr[0], "\n"; # prints 'CHANGED!'
That kinda surprised me. How does this work? Isn't the subroutine only gets the value of the argument? How does it know its address?
In Perl, the subroutine arguments stored in #_ are always aliases to the values at the call site. This aliasing only persists in #_, if you copy values out, that's what you get, values.
so in this sub:
sub example {
# #_ is an alias to the arguments
my ($x, $y, #rest) = #_; # $x $y and #rest contain copies of the values
my $args = \#_; # $args contains a reference to #_ which maintains aliases
}
Note that this aliasing happens after list expansion, so if you passed an array to example, the array expands in list context, and #_ is set to aliases of each element of the array (but the array itself is not available to example). If you wanted the latter, you would pass a reference to the array.
Aliasing of subroutine arguments is a very useful feature, but must be used with care. To prevent unintended modification of external variables, in Perl 6 you must specify that you want writable aliased arguments with is rw.
One of the lesser known but useful tricks is to use this aliasing feature to create array refs of aliases
my ($x, $y) = (1, 2);
my $alias = sub {\#_}->($x, $y);
$$alias[1]++; # $y is now 3
or aliased slices:
my $slice = sub {\#_}->(#somearray[3 .. 10]);
it also turns out that using sub {\#_}->(LIST) to create an array from a list is actually faster than [ LIST ] since Perl does not need to copy every value. Of course the downside (or upside depending on your perspective) is that the values remain aliased, so you can't change them without changing the originals.
As tchrist mentions in a comment to another answer, when you use any of Perl's aliasing constructs on #_, the $_ that they provide you is also an alias to the original subroutine arguments. Such as:
sub trim {s!^\s+!!, s!\s+$!! for #_} # in place trimming of white space
Lastly all of this behavior is nestable, so when using #_ (or a slice of it) in the argument list of another subroutine, it also gets aliases to the first subroutine's arguments:
sub add_1 {$_[0] += 1}
sub add_2 {
add_1(#_) for 1 .. 2;
}
This is all documented in detail in perldoc perlsub. For example:
Any arguments passed in show up in the array #_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The
array #_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the
corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the
function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the
element whether or not the element was assigned to.) Assigning to the whole array #_ removes that aliasing, and does not update any arguments.
Perl passes arguments by reference, not by value. See http://www.troubleshooters.com/codecorn/littperl/perlsub.htm

What does '#_' do in Perl?

I was glancing through some code I had written in my Perl class and I noticed this.
my ($string) = #_;
my #stringarray = split(//, $string);
I am wondering two things:
The first line where the variable is in parenthesis, this is something you do when declaring more than one variable and if I removed them it would still work right?
The second question would be what does the #_ do?
The #_ variable is an array that contains all the parameters passed into a subroutine.
The parentheses around the $string variable are absolutely necessary. They designate that you are assigning variables from an array. Without them, the #_ array is assigned to $string in a scalar context, which means that $string would be equal to the number of parameters passed into the subroutine. For example:
sub foo {
my $bar = #_;
print $bar;
}
foo('bar');
The output here is 1--definitely not what you are expecting in this case.
Alternatively, you could assign the $string variable without using the #_ array and using the shift function instead:
sub foo {
my $bar = shift;
print $bar;
}
Using one method over the other is quite a matter of taste. I asked this very question which you can check out if you are interested.
When you encounter a special (or punctuation) variable in Perl, check out the perlvar documentation. It lists them all, gives you an English equivalent, and tells you what it does.
Perl has two different contexts, scalar context, and list context. An array '#_', if used in scalar context returns the size of the array.
So given these two examples, the first one gives you the size of the #_ array, and the other gives you the first element.
my $string = #_ ;
my ($string) = #_ ;
Perl has three 'Default' variables $_, #_, and depending on who you ask %_. Many operations will use these variables, if you don't give them a variable to work on. The only exception is there is no operation that currently will by default use %_.
For example we have push, pop, shift, and unshift, that all will accept an array as the first parameter.
If you don't give them a parameter, they will use the 'default' variable instead. So 'shift;' is the same as 'shift #_;'
The way that subroutines were designed, you couldn't formally tell the compiler which values you wanted in which variables. Well it made sense to just use the 'default' array variable '#_' to hold the arguments.
So these three subroutines are (nearly) identical.
sub myjoin{
my ( $stringl, $stringr ) = #_;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift;
my $stringr = shift;
return "$stringl$stringr";
}
sub myjoin{
my $stringl = shift #_;
my $stringr = shift #_;
return "$stringl$stringr";
}
I think the first one is slightly faster than the other two, because you aren't modifying the #_ variable.
The variable #_ is an array (hence the # prefix) that holds all of the parameters to the current function.