Perl: Meaning of double squared brackets after a string?

I don't understand a seemingly basic piece of code in Perl, which looks like this:
$line[$k][1]
What is the meaning of the double squared brackets?
I'm sorry if this was already asked or is so basic it can be found in every beginners book for Perl. I couldn't find it anywhere

It means you're working with a two dimensional array.
#!/usr/bin/env perl
use strict;
use warnings;
my @stuff = (
[ 1, 2, 3, 4 ],
[ 5, 6, 7, 8 ],
);
print $stuff[1][2];
#prints '7'

It means that what you have there is not "a string". It's an array called @line, and every element in @line is a reference to another array.
When you access a single element in a Perl array, the sigil changes from @ (which implies multiple values) to $ (which implies a single value). So to look up the element with index $k in an array called @line, you use:
$line[$k]
But in your example, $line[$k] contains a reference to another array. To get from an array reference to one of the elements of the referenced array, we use the ->[...] syntax. So the second element of the array referenced by the $kth element of @line is given by:
$line[$k]->[1];
And in Perl, we have a rule that when two sets of array (or hash) look-up brackets are separated by just a dereferencing arrow, we can omit that arrow. So my previous example can be simplified to:
$line[$k][1];
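To make the equivalence concrete, here is a minimal sketch. The contents of @line and the value of $k are made up for illustration; only the two look-up forms come from the explanation above.

```perl
use strict;
use warnings;

# Hypothetical sample data: @line holds references to two inner arrays.
my @line = (
    [ 'a', 'b', 'c' ],
    [ 'd', 'e', 'f' ],
);
my $k = 1;

my $row      = $line[$k];        # an array reference (a single scalar)
my $explicit = $line[$k]->[1];   # arrow written out
my $implicit = $line[$k][1];     # arrow omitted between the two brackets

print ref($row), "\n";           # ARRAY
print "$explicit $implicit\n";   # e e
```

Both forms reach the same element; the arrow is only optional between adjacent subscripts.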

A [..] is an array index. If you've got 2 of them, that means it's an array of arrays. In your example you're getting the 2nd element (indexes start at 0) of the $kth element of @line.
That you think it's a string is possibly a sign the code isn't very well written, as there should be a line saying something along the lines of my @line;
Make sure the code has use strict; and use warnings; at the top and that should throw up any problems with the code.

Related

Perl assign array elements to hash user defined key

Below is code in which I need help.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @arrayElements = ('Array Functions');
print join(", ", @arrayElements);
### Output => Array Functions
my %hashElements = ();
I want to assign the content of @arrayElements to $hashElements{Item}
I'm either missing some core concepts or going about this the wrong way, and I've been struggling with it for a while.
You seem to be missing some core concepts of Perl (or programming in general). If you are learning Perl through a book or online tutorial, I suggest you re-read the chapters on arrays and hashes.
Let's look at the things involved here. You have:
@arrayElements, which is an array. It contains a list with one element, the string 'Array Functions'.
%hashElements, which is a hash. It's empty.
$hashElements{Item}, which is a scalar value. You want to set this.
You say you want $hashElements{Item} to have the value 'Array Functions', which you have as the first element in your array @arrayElements.
$hashElements{Item} = $arrayElements[0];
And that's it. Both $hashElements{Item} and $arrayElements[0] are scalar values. That's why their sigil (the sign at the front) changes from @ (for an array) or % (for a hash) to $. You can distinguish whether the value came from a hash or an array by the brackets used to access the elements. [] is for arrays, and {} is for hashes.
You cannot do the following though.
$hashElements{Item} = @arrayElements;
Because $hashElements{Item} is a scalar, the thing on the right hand side of the assignment will be treated in scalar context. An array in scalar context gets converted to the number of elements in the array, so this would assign 1. That's not what you want.
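A short sketch of the difference, using snake_case names. The Count and All keys are hypothetical additions for illustration, and storing a reference under All is an assumption about what you might want if you need the whole array, not something the question asked for.

```perl
use strict;
use warnings;

my @array_elements = ('Array Functions');
my %hash_elements;

$hash_elements{Item}  = $array_elements[0];   # first element of the array
$hash_elements{Count} = @array_elements;      # scalar context: number of elements
$hash_elements{All}   = \@array_elements;     # assumption: a reference keeps the whole array

print "$hash_elements{Item}\n";      # Array Functions
print "$hash_elements{Count}\n";     # 1
print "$hash_elements{All}[0]\n";    # Array Functions
```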
You should really read up more about this, and also pick better names for your variables. Your example is very confusing. In general, we don't do $CamelCase for variable names in Perl, but instead use $snake_case, which is easier to read and type.
Take a look at the following resources to learn more about the concepts I've mentioned above.
Perl Maven, perldata, perldsc

Index of an element in an array in Perl

I am trying to find a way to get the index of an element in an array that partially matches a certain pattern.
Let's say I have an array with values
Maria likes tomatoes,
Sonia likes plums,
Andrew likes oranges
If my search term is plums, I will get 1 returned as index.
Thank you!
Quick search didn't find a dupe, but I'm sure there is one. Meanwhile:
To find elements of an array that meet a certain condition, you use grep. If you want the indexes instead of the elements.. well, Perl 6 added a grep-index method to handle that case, but in Perl 5 the easiest way is to change the target of grep. That is, instead of running it on the original array, run it on a list of indexes - just with a condition that references the original array. In your case, that might look like this:
my @array = ( 'Maria likes tomatoes',
'Sonia likes plums',
'Andrew likes oranges');
grep { $array[$_] =~ /plums/ } 0..$#array; # 1
Relevant bits:
$#array returns the index of the last element of @array.
m..n generates a range of values between m and n (inclusive); in list context that becomes a list of those values.
grep { code } list returns the elements of list for which code produces a true value when the special variable $_ is set to the element.
These sorts of expressions read most easily from right to left. So, first we generate a list of all the indexes of the original array (0..$#array), then we use grep to test each index (represented by $_) to see if the corresponding element of @array ($array[$_]) matches (=~) the regular expression /plums/.
If it does, that index is included in the list returned by the grep; if not, it's left out. So the end result is a list of only those indexes for which the condition is true. In this case, that list contains only the value 1.
Added to reply to your comment: It's important to note that the return value of grep is normally a list of matching elements, even if there is only one match. If you assign the result to an array (e.g. with my @indexes = grep...), the array will contain all the matching values. However, grep is context-sensitive, and if you call it in scalar context (e.g. by assigning its return value to a scalar variable with something like my $count = grep...), you'll instead only get a number telling you how many matches there were. You might want to take a look at this tutorial on context sensitivity in Perl.
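Here is a small sketch of the same grep in the different contexts. The third array entry is changed from the question (a hypothetical second match) so that the list and the count differ visibly.

```perl
use strict;
use warnings;

my @array = (
    'Maria likes tomatoes',
    'Sonia likes plums',
    'Andrew likes plums too',   # hypothetical entry, added to get two matches
);

my @indexes = grep { $array[$_] =~ /plums/ } 0 .. $#array;   # list context: all matching indexes
my $count   = grep { $array[$_] =~ /plums/ } 0 .. $#array;   # scalar context: number of matches
my ($first) = grep { $array[$_] =~ /plums/ } 0 .. $#array;   # list assignment: first match only

print "@indexes\n";   # 1 2
print "$count\n";     # 2
print "$first\n";     # 1
```

The parentheses around $first are what keep the assignment in list context; without them you would get the count again.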
This is what firstidx from List::MoreUtils is for.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils 'firstidx';
my @array = ('Maria likes tomatoes',
'Sonia likes plums',
'Andrew likes oranges');
say firstidx { /plums/ } @array;
Update: I see that draegtun has answered your comment about getting multiple indexes. But I wonder why you couldn't just browse the List::MoreUtils documentation to see if there was a useful-looking function in there.

Initialize multidimensional array

I want to make a multidimensional array, but when I declare it I don't know how many elements it will have, so I tried this:
my @multarray = [ ][ ];
Is it good?
Perl isn't C, and you don't need to initialise any sort of array. Just my @multarray is fine.
Observe
use strict;
use warnings;
my @data;
$data[2][2] = 99;
print $data[2][2], "\n";
The section in perldoc perllol on Declaration and Access of Arrays of Arrays will be of help here.
output
99
Perl doesn't support multidimensional arrays directly; an array is always just a sequence of scalars. But you can create references to arrays, and store the references in arrays, and that allows you to simulate multidimensional arrays.
The square bracket notation is useful here to create an array literal and return a reference to it. For example, the following creates an array of ('a','b',1234) and returns a reference to it:
my $ref_array = ['a','b',1234];
Here's an example of how to create (what we might call) a multidimensional array literal, of dimension 3x2:
my $multarray = [['a','b'],['c','d'],['e','f']];
So you could access the 'c' (for example) with:
print($multarray->[1]->[0]);
Now, you say you don't know what the final dimensions will be. That's fine; in Perl, you can conceptually view arrays as being infinite in size; all elements that haven't been assigned yet will be returned as undef, whether or not their index is less than or equal to the greatest index that has thus far been assigned. So here's what you can do:
my $multarray = [];
Now you can read and write directly to any element at any index and at any depth level:
$multarray->[23]->[19234]->[3] = 'a';
print($multarray->[23]->[19234]->[3]); ## prints 'a'
The syntax I've been using is the "explicit" syntax, where you explicitly create and dereference all the array references that are being manipulated. But for convenience, Perl allows you to omit the dereference tokens when you bracket-index an array reference, except when dereferencing the top-level array reference, which must always be explicitly dereferenced with the -> token:
$multarray->[23][19234][3] = 'a';
print($multarray->[23][19234][3]); ## prints 'a'
Finally, if you prefer to work with array variables (unlike me; I actually prefer to work with scalar references all the time), you can work with a top-level array variable instead of an array reference, and in that case you can escape having to use the dereference token entirely:
my @multarray;
$multarray[23][19234][3] = 'a';
print($multarray[23][19234][3]); ## prints 'a'
But even if you use that somewhat deceptively concise syntax, it's good to have an understanding of what's going on behind-the-scenes; it's still a nested structure of array references.
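To see that nested structure for yourself, a minimal sketch along these lines may help. The indexes are kept small and are arbitrary; the helper variable names are just for illustration.

```perl
use strict;
use warnings;

my @multarray;
$multarray[2][3] = 'a';   # the inner array is created automatically on assignment

my $outer_size = scalar @multarray;                           # indexes 0..2 now exist
my $inner_type = ref $multarray[2];                           # the slot holds an array reference
my $inner_size = scalar @{ $multarray[2] };                   # indexes 0..3 of the inner array
my $untouched  = defined $multarray[0] ? 'defined' : 'undef'; # never assigned

print "$outer_size\n";   # 3
print "$inner_type\n";   # ARRAY
print "$inner_size\n";   # 4
print "$untouched\n";    # undef
```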

Perl: Anonymous Multi-Dimensional Arrays

NOTE: See the end of this post for final explanation.
This is probably a very basic question, but I'm still trying to master a few of the fundamentals regarding references in Perl, and came across something in the perldsc page that I'd like to confirm. The following code is in the Generating Array of Arrays section:
while ( <> ) {
push @AoA, [ split ];
}
Obviously, the <> operation in the while loop reads one line of input in at a time. I am assuming at this point that the line is then put into an anonymous array via the [ ] brackets; we'll call this @zero. Then the split command places everything in a given line separated by whitespace within the array (e.g., the first word is assigned to $zero[0], the second to $zero[1] and so on). The scalar reference of @zero is then pushed onto @AoA.
The next line of input is passed via the <> operator and gets assigned to a completely new anonymous array (e.g. @one), and its scalar reference is pushed onto @AoA.
Once @AoA is populated, I could then access its contents with a nested foreach loop; the first iterating through the "rows" (e.g. for $row (@AoA)), and a second, inner foreach loop to access the columns of that particular row.
The latter (accessing said "columns") would be done by dereferencing the particular $row being read by the previous, "outer" foreach loop (e.g., for $column (@$row)).
Is my understanding correct? I'm assuming you could still access any element of @AoA just as you would if it were assigned vs. being anonymous? That is $element = $AoA[8][1]; .
I want to verify my thought process here. Is the automatic declaration of a unique, anonymous array each time through the loop part of the autovivification in Perl? I guess that is what is throwing me off a bit. Thanks.
EDIT: Based on the comments below my understanding regarding the anonymous array is still unclear, so I want to take a shot at one more description to see if it meets everyone's understanding.
Starting with the push @AoA, [split]; statement, split takes in the line from $_ and returns a list parsed by whitespace. That list is captured by [ ], which then returns an array reference. That array reference (created by [ ]) is then pushed onto @AoA. Is this accurate re: [ ]? The next step (dereferencing / use of @AoA) was covered very well by @krico below.
FINAL ANSWER/EXPLANATION: Based on all of the comments / feedback here, some further research on my part, and testing, it seems my understanding was correct. I'll break it down here, so others can easily reference it later. See @krico's response below for a more explicit code representation that follows the steps outlined here.
while ( <> ) {
push @AoA, [ split ];
}
One line of input is passed at a time to the <> operator
The split function takes that line in via $_ and parses it based on whitespace (the default).
split then returns a LIST.
The [ ] is an anonymous array that provides the perl data structure for the List passed by split.
The push @AoA pushes the reference to the anonymous array onto its queue as element $AoA[0] (the second anonymous array reference will be put into $AoA[1], etc...).
This continues through the entire input file. Once completed, @AoA is a 2D array, holding reference values (scalar values) to each of the previously generated anonymous arrays.
From this point @AoA can be dereferenced appropriately to work with the underlying/referenced elements taken in from the input file. The default dereferencing technique is CIRCUMFIX (see perlref below); however as of 5.19 a new method of dereferencing is available and will be released in 5.20, POSTFIX. Articles are linked below.
References: Perl References Documentation, Perl References Tutorial, Perl References Question noted by @Eli Hubert, Mike Friedman's blog post about differences between arrays and lists, Upcoming Postfix dereferencing in Perl, and Postfix dereferencing Article
This is what is going on:
The <> will put the line into the default variable $_
The split function will read $_ and return a list
The [ ] brackets will copy that list into a new anonymous array and return a scalar holding a reference to it
That reference is then pushed into the @AoA array
When you do $AoA[8][2] you are implicitly dereferencing the scalar. It's the same as $AoA[8]->[2].
Here is the same code written out a little more readably, which should help you understand it.
my @AoA;
while ( my $line = <STDIN> ) {
my @parts = split ' ', $line;  # split the line itself on whitespace
my $partsRef = \@parts;        # reference to this iteration's array
push @AoA, $partsRef;
}
Now, if you wanted to print the 2nd part of the 5th line you could say.
my $ref = $AoA[4];
my @parts = @$ref;
print $parts[1];
Get it?
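Putting the pieces together, here is a sketch of building the array of arrays and walking it with nested loops. The @input list is a hypothetical stand-in for the lines that <> would read, so that the example runs without any input file.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for what <> would read.
my @input = ( "alpha beta gamma\n", "one two\n" );

my @AoA;
for my $line (@input) {
    push @AoA, [ split ' ', $line ];   # [ ... ] copies the list into a new anonymous array
}

for my $row (@AoA) {                   # each $row is an array reference
    for my $column (@$row) {           # dereference it to walk the "columns"
        print "$column ";
    }
    print "\n";
}

print $AoA[1][0], "\n";                # one
```

Each pass through the first loop gets its own anonymous array, which is why the rows do not overwrite one another.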

Why do you need $ when accessing array and hash elements in Perl?

Since arrays and hashes can only contain scalars in Perl, why do you have to use the $ to tell the interpreter that the value is a scalar when accessing array or hash elements? In other words, assuming you have an array @myarray and a hash %myhash, why do you need to do:
$x = $myarray[1];
$y = $myhash{'foo'};
instead of just doing :
$x = myarray[1];
$y = myhash{'foo'};
Why are the above ambiguous?
Wouldn't it be illegal Perl code if it was anything but a $ in that place? For example, aren't all of the following illegal in Perl?
@var[0];
@var{'key'};
%var[0];
%var{'key'};
I've just used
my $x = myarray[1];
in a program and, to my surprise, here's what happened when I ran it:
$ perl foo.pl
Flying Butt Monkeys!
That's because the whole program looks like this:
$ cat foo.pl
#!/usr/bin/env perl
use strict;
use warnings;
sub myarray {
print "Flying Butt Monkeys!\n";
}
my $x = myarray[1];
So myarray calls a subroutine passing it a reference to an anonymous array containing a single element, 1.
That's another reason you need the sigil on an array access.
Slices aren't illegal:
@slice = @myarray[1, 2, 5];
@slice = @myhash{qw/foo bar baz/};
And I suspect that's part of the reason why you need to specify if you want to get a single value out of the hash/array or not.
The sigil gives you the return type of the container. So if something starts with @, you know that it returns a list. If it starts with $, it returns a scalar.
Now if there is only an identifier after the sigil (like $foo or @foo), then it's a simple variable access. If it's followed by a [, it is an access on an array; if it's followed by a {, it's an access on a hash.
# variables
$foo
@foo
# accesses
$stuff{blubb} # accesses %stuff, returns a scalar
@stuff{@list} # accesses %stuff, returns an array
$stuff[blubb] # accesses @stuff, returns a scalar
# (and calls the blubb() function)
@stuff[blubb] # accesses @stuff, returns an array
Some human languages have very similar concepts.
However many programmers found that confusing, so Perl 6 uses an invariant sigil.
In general the Perl 5 compiler wants to know at compile time if something is in list or in scalar context, so without the leading sigil some terms would become ambiguous.
This is valid Perl: @var[0]. It is an array slice of length one. @var[0,1] would be an array slice of length two.
@var['key'] is not valid Perl because arrays can only be indexed by numbers, and the other two (%var[0] and %var['key']) are not valid Perl because hash slices use the {} to index the hash.
@var{'key'} and @var{0} are both valid hash slices, though. Obviously it isn't normal to take slices of length one, but it is certainly valid.
See the slice section of the perldata perldoc for more information about slicing in Perl.
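As a rough illustration of the element and slice forms side by side, here is a sketch with made-up data. Note that @var and %var are separate variables that happen to share a name; the brackets decide which one is accessed.

```perl
use strict;
use warnings;

my @var = ( 'zero', 'one', 'two' );
my %var = ( key => 'value', other => 'thing' );   # separate variable from @var

my $element     = $var[0];                  # single element of @var
my @array_slice = @var[ 0, 1 ];             # array slice: @ sigil, [] brackets
my $hash_value  = $var{key};                # single value from %var
my @hash_slice  = @var{ 'key', 'other' };   # hash slice: @ sigil, {} brackets

print "$element\n";        # zero
print "@array_slice\n";    # zero one
print "$hash_value\n";     # value
print "@hash_slice\n";     # value thing
```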
People have already pointed out that you can have slices and contexts, but sigils are there to separate the things that are variables from everything else. You don't have to know all of the keywords or subroutine names to choose a sensible variable name. It's one of the big things I miss about Perl in other languages.
I can think of one way that
$x = myarray[1];
is ambiguous - what if you wanted an array called m?
$x = m[1];
How can you tell that apart from a regex match?
In other words, the syntax is there to help the Perl interpreter, well, interpret!
In Perl 5 (to be changed in Perl 6) a sigil indicates the context of your expression.
You want a particular scalar out of a hash so it's $hash{key}.
You want the value of a particular slot out of an array, so it's $array[0].
However, as pointed out by zigdon, slices are legal. They interpret those expressions in a list context.
You want a list of 1 value in a hash: @hash{key} works.
But larger lists work as well, like @hash{qw<key1 key2 ... key_n>}.
You want a couple of slots out of an array: @array[0,3,5..7,$n..$n+5] works.
@array[0] is a list of size 1.
There is no "hash context", so (at least before Perl 5.20, which added key/value slices) neither %hash{@keys} nor %hash{key} has meaning.
So you have "@" + "array[0]" <=> < sigil = context > + < indexing expression > as the complete expression.
The sigil provides the context for the access:
$ means scalar context (a scalar variable or a single element of a hash or an array)
@ means list context (a whole array or a slice of a hash or an array)
% is an entire hash
In Perl 5 you need the sigils ($ and @) because the default interpretation of a bareword identifier is that of a subroutine call (thus eliminating the need to use & in most cases).