Array in scalar context in Perl

In Perl, if you assign an array to a scalar, you get the length of that array.
Example:
my @arr = (1,2,3,4);
my $x = @arr;
Now if you check the content of $x, you will find that it contains the length of the @arr array.
I want to know why Perl does this. What is the reason behind it? I tried to work it out myself but could not find a good reason. Can someone help me understand what is happening behind the scenes?

Perl uses contexts where other languages use functions to get some information or to convert a value between types. The concept of context is explained thoroughly in the perldata manual page (man perldata).
In Perl, the same data looks different in different contexts. An array looks like a list of its elements in list context, and like the number of its elements in scalar context.
How else could it possibly look in scalar context?
It could be the first element of the array. This can be done with my ($x) = @arr; or my $x = shift @arr; or my $x = $arr[0];.
It could be the last element of the array. This can be done with my ($x) = reverse @arr; or my $x = pop @arr; or my $x = $arr[-1];. (Note that shift and pop also remove that element from the array.)
I cannot think of any other reasonable way to make a scalar from an array. Using the array length as its scalar value is arguably better than either of these two, because it is a property of the whole array, while these two are fairly local. It is also very natural when you look at the typical use of an array in scalar context:
die "Not enough arguments" if @ARGV < 5;
You can read < quite naturally as “is smaller than”.
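The behaviour described above can be checked with a short script. This is a minimal sketch, assuming perl 5 with strict and warnings enabled; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @arr = (1, 2, 3, 4);

my $count    = @arr;          # scalar assignment: the array in scalar context gives 4
my ($first)  = @arr;          # list assignment: $first receives the first element, 1
my $last     = $arr[-1];      # explicit access to the last element, 4
my $explicit = scalar(@arr);  # scalar() forces scalar context, also 4

# A numeric comparison puts @arr in scalar context: "has fewer than 5 elements"
print "fewer than 5 elements\n" if @arr < 5;

# An empty array is 0 in scalar context, which is false
my @empty;
print "empty\n" unless @empty;

print "count=$count first=$first last=$last explicit=$explicit\n";
```

Using the length as the scalar value is what makes the `if @arr < 5` and `unless @empty` idioms read naturally.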

Related

Round brackets enclosing private variables. Why used in this case?

I am reading Learning Perl 6th edition, and the subroutines chapter has this code:
foreach (1..10) {
my($square) = $_ * $_; # private variable in this loop
print "$_ squared is $square.\n";
}
Now I understand that the list syntax, i.e. the brackets, is used to distinguish between list context and scalar context, as in:
my($num) = @_; # list context, same as ($num) = @_;
my $num = @_; # scalar context, same as $num = @_;
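The difference between the two lines above only shows up when @_ holds several values. A minimal sketch, using two hypothetical subs named only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: list assignment takes the first argument.
sub first_arg {
    my ($num) = @_;    # list context: $num gets $_[0]
    return $num;
}

# Hypothetical helper: scalar assignment takes the argument count.
sub arg_count {
    my $num = @_;      # scalar context: $num gets the number of elements in @_
    return $num;
}

my $got_first = first_arg(7, 8, 9);   # 7
my $got_count = arg_count(7, 8, 9);   # 3
print "first_arg: $got_first, arg_count: $got_count\n";
```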
But in the foreach loop case I can't see how a list context is appropriate.
And I can change the code to be:
foreach (1..10) {
my $square = $_ * $_; # private variable in this loop
print "$_ squared is $square.\n";
}
And it works exactly the same. So why did the author use my($square) when a simple my $square could have been used instead?
Is there any difference in this case?
Certainly in this case, the brackets aren't necessary. They're not strictly wrong in the sense that they do do what the author intends. As with so much in Perl, there's more than one way to do it.
So there's the underlying question: why did the author choose to do this this way? I wondered at first whether it was the author's preferred style: perhaps he chose always to put his lists of new variables in brackets simply so that something like:
my ($count) = 4;
where the brackets aren't doing anything helpful, at least looked consistent with something like:
my ($min, $max) = (2, 3);
But looking at the whole book, I can't find a single example of this use of brackets for a single value other than the section you referenced. As one example of many, the m// in List Context section in Chapter 9 contains a variety of different uses of my with assignments, but does not use brackets with any single values.
I'm left with the conclusion that as the author introduced my in subroutines with my($m, $n); he tried to vary the syntax as little as possible the next time he used it, ending up with my($max_so_far) and then tried to explain scalar and list contexts, as you quoted above. I'm not sure this is terribly helpful.
TL;DR It's not necessary, although it's not actually wrong. Probably a good idea to avoid this style in your code.
You're quite correct. It's redundant. It doesn't make any difference in this case, because the right-hand side is a single scalar value, so forcing list context on the assignment changes nothing.
E.g.
my ( $square ) = ( $_ * $_ );
This also produces the same result. So in this case it doesn't matter, but generally speaking it is not good coding style.
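The brackets only change the result when the right-hand side behaves differently in list and scalar context. A small sketch, assuming a hypothetical primes() sub that returns an array:

```perl
use strict;
use warnings;

# Hypothetical sub: returns an array, so what the caller gets depends on context.
sub primes {
    my @p = (2, 3, 5);
    return @p;
}

my ($with_parens)  = primes();   # list assignment: first element, 2
my $without_parens = primes();   # scalar assignment: @p in scalar context, 3 (its length)

# With a single scalar expression on the right, both forms give the same value.
my ($sq1) = 4 * 4;   # 16
my $sq2   = 4 * 4;   # 16
```

This is why my($square) = $_ * $_; and my $square = $_ * $_; behave identically in the loop from the question.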

Perl operators that modify inputs in-place

I recently took a Perl test and one of the questions was to find all the Perl operations that can be used to modify their inputs in-place. The options were
sort
map
do
grep
eval
I don't think any of these can modify the inputs in-place. Am I missing anything here or is the question wrong?
Try this:
my @array = qw(1 2 3 4);
print "@array\n";
my @new_array = map ++$_, @array;
print "@new_array\n";
print "@array\n"; # oops, we modified this in-place
grep is similar. For sort, the $a and $b variables are aliases back to the original array, so can also be used to modify it. The result is somewhat unpredictable, depending on what sorting algorithm Perl is using (which has historically changed in different versions of Perl, though hasn't changed in a while).
my @arr = qw(1 2 3 4 5);
my @new = sort { ++$a } @arr;
print "@arr\n";
do and eval can take an arbitrary code block, so can obviously modify any non-readonly variable, though it's not clear whether that counts as modifying inputs in place. Slade's example using the stringy form of eval should certainly count though.
I'm assuming the question is testing to see if the student knows to properly use the return values of sort, map, and so on instead of using them in void context and expecting side effects. It's totally possible to modify the parameters given, though.
map and grep alias $_ to each element, so modifying $_ will change the values of the variables in the list passed to it (assuming they're not constants or literals).
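Because $_ aliases each element inside map and grep, a substitution inside grep rewrites the original array. A minimal sketch (the data is made up for illustration), followed by a non-destructive alternative that edits a copy instead:

```perl
use strict;
use warnings;

my @words = ('  alpha', ' beta', 'gamma');

# s/// edits $_, and $_ is an alias for each element of @words,
# so this grep trims the original array as a side effect.
my @kept = grep { s/^\s+//; 1 } @words;

print join(',', @words), "\n";   # alpha,beta,gamma  (the original was modified)

# Non-destructive version: copy each element into a lexical before editing it.
my @orig    = ('  delta', ' epsilon');
my @trimmed = map { (my $w = $_) =~ s/^\s+//; $w } @orig;
```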
eval EXPR and do EXPR can do anything, more or less, so there's nothing stopping you from doing something like:
use feature 'say';
my $code = q($code = 'modified');
eval $code;
say $code;   # prints "modified"
The arguments to do BLOCK and eval BLOCK are always a literal block of code, which aren't valid lvalues in any way I know of.
sort has a special optimization when called like @array = sort { $a <=> $b } @array;. If you look at the opcodes generated by this with B::Concise, you'll see something like:
9 <@> sort lK/INPLACE,NUM
But for a question about the language semantics, an implementation detail is irrelevant.

Is there any advantage to using keys @array instead of 0 .. $#array?

I was quite surprised to find that the keys function happily works with arrays:
keys HASH
keys ARRAY
keys EXPR
Returns a list consisting of all the keys of the named hash, or the indices of an array. (In scalar context, returns the number of keys or indices.)
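As the documentation quoted above says, keys on an array yields its indices, so it produces the same list as 0 .. $#array. A minimal sketch, assuming perl 5.12 or later (the first version where keys accepts an array):

```perl
use strict;
use warnings;

my @array = ('a' .. 'e');

my @via_keys  = keys @array;    # indices 0..4 (keys on arrays needs perl 5.12+)
my @via_range = 0 .. $#array;   # the same indices via the range operator

my $n_keys = keys @array;       # scalar context: the number of indices, 5

# Either list of indices can drive an array slice to fetch the elements.
my @elements = @array[@via_range];   # ('a' .. 'e')

print "@via_keys\n";   # 0 1 2 3 4
```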
Is there any benefit in using keys @array instead of 0 .. $#array with respect to memory usage, speed, etc., or are the reasons for this functionality more of a historic origin?
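Seeing that keys @array holds up to $[ modification, I'm guessing it's historic (note that assigning a non-zero value to $[ is a fatal error in Perl 5.30 and later, so this demonstration needs an older perl):
$ perl -Mstrict -wE 'local $[=4; my @array="a".."z"; say join ",", keys @array;'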
Use of assignment to $[ is deprecated at -e line 1.
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
Mark has it partly right, I think. What he's missing is that each now works on an array, and, like each with hashes, each with arrays returns two items on each call. Where each %hash returns key and value, each @array also returns key (index) and value.
while (my ($idx, $val) = each @array)
{
if ($idx > 0 && $array[$idx-1] eq $val)
{
print "Duplicate indexes: ", $idx-1, "/", $idx, "\n";
}
}
Thanks to Zaid for asking, and jmcnamara for bringing it up on perlmonks' CB. I didn't see this before - I've often looped through an array and wanted to know what index I'm at. This is waaaay better than manually manipulating some $i variable created outside of a loop and incremented inside, as I expect that continue, redo, etc., will survive this better.
So, because we can now use each on arrays, we need to be able to reset that iterator, and thus we have keys.
The link you provided actually has one important reason you might use/not use keys:
As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
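The reset behaviour quoted above can be seen by partially consuming an array's each iterator and then calling keys in void context. A minimal sketch, assuming perl 5.12 or later (each and keys on arrays):

```perl
use strict;
use warnings;

my @array = qw(x y z);

# Take only the first index/value pair from the array's iterator.
my ($idx, $val) = each @array;     # (0, 'x')

# keys in void context resets the iterator without building a list.
keys @array;

# The next call to each starts from the beginning again.
my ($idx2, $val2) = each @array;   # (0, 'x') rather than (1, 'y')
```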
That would cause each to reset to the beginning of the array. Using keys and each with arrays might be important if they ever natively support sparse arrays as a real data-type.
All that said, with so many array-aware language constructs like foreach and join in perl, I can't remember the last time I used 0..$#array.
I actually think you've answered your own question: it returns the valid indices of the array, no matter what value you've set for $[. So from a generality point of view (especially for library usage), it's preferable.
The version of Perl I have (5.10.1) doesn't support using keys with arrays, so it can't be for historic reasons.
Well, in your example you are putting them in a list, so in list context
keys @array will be replaced with all the indices of the array,
whereas 0 .. $#array produces the same indices with the range operator, which is also handy for array slicing: instead of @array[0 .. $#array] you can write @array[0 .. (some specific index)].

What is the meaning of @_ in Perl?

What is the meaning of @_ in Perl?
perldoc perlvar is the first place to check for any special-named Perl variable info.
Quoting:
@_: Within a subroutine the array @_ contains the parameters passed to that subroutine.
More details can be found in perldoc perlsub (Perl subroutines), linked from perlvar:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
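The aliasing rules quoted from perlsub can be demonstrated with two small subs. This is a sketch; bump_first and copy_then_bump are hypothetical names chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical sub: modifies its first argument through the alias.
sub bump_first {
    $_[0]++;               # $_[0] is an alias for the caller's first argument
}

# Hypothetical sub: breaks the aliasing before modifying.
sub copy_then_bump {
    my @values = @_;
    @_ = @values;          # assigning to the whole of @_ removes the aliasing
    $_[0]++;               # so this only changes the sub's own copy
}

my $n = 10;
bump_first($n);            # $n becomes 11
copy_then_bump($n);        # $n stays 11

# Passing a literal would fail: bump_first(5) dies with
# "Modification of a read-only value attempted".
```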
Usually, you unpack the parameters passed to a sub from the @_ variable:
sub test {
    my ($first, $second, $third) = @_;   # avoid $a and $b, which sort uses
    # ... work with the copies here ...
}
# call the test sub with the parameters
test('alice', 'bob', 'charlie');
That is the style perlcritic considers correct.
First hit of a search for perl @_ says this:
@_ is the list of incoming parameters to a sub.
It also has a longer and more detailed explanation of the same.
The question was what @_ means in Perl. The answer to that question is that, insofar as $_ means it in Perl, @_ similarly means they.
No one seems to have mentioned this critical aspect of its meaning — as well as theirs.
They’re consequently both used as pronouns, or sometimes as topicalizers.
They typically have nominal antecedents, although not always.
You can also use shift to take individual arguments in most cases:
my $var1 = shift;
This is a topic worth researching further, as Perl has a number of interesting ways of accessing outside information inside your subroutine.
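Inside a sub, a bare shift removes and returns the first element of @_ (outside any sub it works on @ARGV instead). A minimal sketch with a hypothetical greet() sub and an optional second argument; the // operator needs perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical sub: each shift takes the next argument off @_.
sub greet {
    my $name     = shift;                 # first argument
    my $greeting = shift(@_) // 'Hello';  # optional second argument with a default
    return "$greeting, $name!";
}

my $g1 = greet('alice');       # "Hello, alice!"
my $g2 = greet('bob', 'Hi');   # "Hi, bob!"
```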
All Perl's "special variables" are listed in the perlvar documentation page.
In very old versions of Perl (before 5.12), split had a special side effect: when it was called in scalar or void context, without assigning its result to any variable as below, it split into @_, so the fields could be accessed later through @_:
$str = "Mr.Bond|Chewbaaka|Spider-Man";
split(/\|/, $str);
print $_[0]; # 'Mr.Bond'
This split the string $str and set the array @_. Modern Perl no longer does this, so don't rely on it.
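Modern perls (5.12 and later) no longer populate @_ from split, so the portable way is to assign split's return value to an array. A minimal sketch using the string from the example above:

```perl
use strict;
use warnings;

my $str = "Mr.Bond|Chewbaaka|Spider-Man";

# Assign the list that split returns; no hidden side effect on @_ is needed.
my @names = split /\|/, $str;

print "$names[0]\n";   # Mr.Bond
```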
@ is used for an array.
In a subroutine, or when you call a function in Perl, you may pass a parameter list. In that case, @_ holds the parameter list passed to the function:
sub Average {
    # Get total number of arguments passed.
    my $n = scalar(@_);
    my $sum = 0;
    foreach my $item (@_) {
        # foreach is like a for loop: it visits every
        # array element through an iterator variable
        $sum += $item;
    }
    my $average = $sum / $n;
    print "Average for the given numbers: $average\n";
}
Function call
Average(10, 20, 30);
In the code above, look at the foreach loop over @_: that is where the input parameters are used.
Never try to modify the elements of @_! Leave them untouched, or you may get unexpected effects. For example:
my $size = 1234;
sub sub1 {
    $_[0] = 500;
}
sub1 $size;
Before the call to sub1, $size contains 1234, but afterwards it contains 500(!!). So don't edit these values! If you pass two or more variables and change them in the subroutine, all of them will be changed. I've never seen this effect described, and the programs I've seen also treat the @_ array as read-only. Only then can you safely pass a variable without it being changed inside the subroutine.
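You should always do this instead:
sub sub2 {
    my @m = @_;   # copy the arguments; changing @m does not affect the caller
    # ... work with @m here ...
}
That is, assign @_ to local variables in the subroutine and then work with those.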
The exception is some deeply recursive algorithms that return an array: there you may work directly on @_ to reduce the memory used for local variables, but only if you return the @_ array itself.

Why do you need $ when accessing array and hash elements in Perl?

Since arrays and hashes can only contain scalars in Perl, why do you have to use the $ to tell the interpreter that the value is a scalar when accessing array or hash elements? In other words, assuming you have an array @myarray and a hash %myhash, why do you need to do:
$x = $myarray[1];
$y = $myhash{'foo'};
instead of just doing:
$x = myarray[1];
$y = myhash{'foo'};
Why are the above ambiguous?
Wouldn't it be illegal Perl code if it was anything but a $ in that place? For example, aren't all of the following illegal in Perl?
@var[0];
@var{'key'};
%var[0];
%var{'key'};
I've just used
my $x = myarray[1];
in a program and, to my surprise, here's what happened when I ran it:
$ perl foo.pl
Flying Butt Monkeys!
That's because the whole program looks like this:
$ cat foo.pl
#!/usr/bin/env perl
use strict;
use warnings;
sub myarray {
print "Flying Butt Monkeys!\n";
}
my $x = myarray[1];
So myarray calls a subroutine passing it a reference to an anonymous array containing a single element, 1.
That's another reason you need the sigil on an array access.
Slices aren't illegal:
@slice = @myarray[1, 2, 5];
@slice = @myhash{qw/foo bar baz/};
And I suspect that's part of the reason why you need to specify whether you want to get a single value out of the hash/array or not.
The sigil gives you the return type of the container. So if something starts with @, you know that it returns a list. If it starts with $, it returns a scalar.
Now if there is only an identifier after the sigil (like $foo or @foo), then it's a simple variable access. If it's followed by a [, it is an access on an array; if it's followed by a {, it's an access on a hash.
# variables
$foo
@foo
# accesses
$stuff{blubb}   # accesses %stuff, returns a scalar
@stuff{@list}   # accesses %stuff, returns a list
$stuff[blubb]   # accesses @stuff, returns a scalar
                # (and calls the blubb() function)
@stuff[blubb]   # accesses @stuff, returns a list
Some human languages have very similar concepts.
However many programmers found that confusing, so Perl 6 uses an invariant sigil.
In general the Perl 5 compiler wants to know at compile time if something is in list or in scalar context, so without the leading sigil some terms would become ambiguous.
This is valid Perl: @var[0]. It is an array slice of length one. @var[0,1] would be an array slice of length two.
@var['key'] does not do what you might want, because arrays can only be indexed by numbers ('key' is treated as index 0, with a warning under use warnings), and
the other two (%var[0] and %var{'key'}) were not valid Perl before 5.20, because slices used the @ sigil; since Perl 5.20 they are valid key/value slices.
@var{'key'} and @var{0} are both valid hash slices, though. Obviously it isn't normal to take slices of length one, but it is certainly valid.
See the Slices section of the perldata perldoc for more information about slicing in Perl.
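The slice forms discussed above, side by side. A minimal sketch with made-up data, assuming perl 5.20 or later (needed only for the two key/value slices at the end); perl also emits a "Scalar value ... better written as ..." warning for the one-element slices, which is expected:

```perl
use strict;
use warnings;

my @var = (10, 20, 30);
my %h   = (key => 'v1', other => 'v2');

my $elem     = $var[0];            # single element: $ sigil, a scalar
my @one      = @var[0];            # array slice of length one: (10)
my @two      = @var[0, 1];         # array slice of length two: (10, 20)
my @hash_one = @h{'key'};          # hash slice of length one: ('v1')
my @hash_two = @h{qw(key other)};  # ('v1', 'v2')

# Key/value and index/value slices: valid only from perl 5.20 onwards.
my %sub_hash = %h{'key'};          # (key => 'v1')
my %pos      = %var[0, 1];         # (0 => 10, 1 => 20)
```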
People have already pointed out that you can have slices and contexts, but sigils are there to separate the things that are variables from everything else. You don't have to know all of the keywords or subroutine names to choose a sensible variable name. It's one of the big things I miss about Perl in other languages.
I can think of one way that
$x = myarray[1];
is ambiguous: what if you wanted an array called m?
$x = m[1];
How can you tell that apart from a regex match?
In other words, the syntax is there to help the Perl interpreter, well, interpret!
In Perl 5 (to be changed in Perl 6) a sigil indicates the context of your expression.
You want a particular scalar out of a hash so it's $hash{key}.
You want the value of a particular slot out of an array, so it's $array[0].
However, as pointed out by zigdon, slices are legal. They interpret those expressions in list context.
If you want a list of one value from a hash, @hash{key} works.
But larger lists work as well, like @hash{qw<key1 key2 ... key_n>}.
If you want a couple of slots out of an array, @array[0,3,5..7,$n..$n+5] works.
@array[0] is a list of size 1.
There is no "hash context", so before Perl 5.20 neither %hash{@keys} nor %hash{key} had any meaning; since 5.20 they are key/value slices.
So you have "@" + "array[0]" <=> < sigil = context > + < indexing expression > as the complete expression.
The sigil provides the context for the access:
$ means scalar context (a scalar variable or a single element of a hash or an array)
@ means list context (a whole array or a slice of a hash or an array)
% is an entire hash
In Perl 5 you need the sigils ($ and @) because the default interpretation of a bareword identifier is that of a subroutine call (thus eliminating the need to use & in most cases).