Index of an element in an array in Perl - perl

I am trying to find a way to get the index of an element in an array that partially matches a certain pattern.
Let's say I have an array with values
Maria likes tomatoes,
Sonia likes plums,
Andrew likes oranges
If my search term is plums, I will get 1 returned as index.
Thank you!

Quick search didn't find a dupe, but I'm sure there is one. Meanwhile:
To find elements of an array that meet a certain condition, you use grep. If you want the indexes instead of the elements.. well, Perl 6 added a grep-index method to handle that case, but in Perl 5 the easiest way is to change the target of grep. That is, instead of running it on the original array, run it on a list of indexes - just with a condition that references the original array. In your case, that might look like this:
my @array = ( 'Maria likes tomatoes',
'Sonia likes plums',
'Andrew likes oranges');
grep { $array[$_] =~ /plums/ } 0..$#array; # 1
Relevant bits:
$#array returns the index of the last element of @array.
m..n generates a range of values between m and n (inclusive); in list context that becomes a list of those values.
grep { code } list returns the elements of list for which code produces a true value when the special variable $_ is set to the element.
These sorts of expressions read most easily from right to left. So, first we generate a list of all the indexes of the original array (0..$#array), then we use grep to test each index (represented by $_) to see if the corresponding element of @array ($array[$_]) matches (=~) the regular expression /plums/.
If it does, that index is included in the list returned by the grep; if not, it's left out. So the end result is a list of only those indexes for which the condition is true. In this case, that list contains only the value 1.
Added to reply to your comment: It's important to note that the return value of grep is normally a list of matching elements, even if there is only one match. If you assign the result to an array (e.g. with my @indexes = grep...), the array will contain all the matching values. However, grep is context-sensitive, and if you call it in scalar context (e.g. by assigning its return value to a scalar variable with something like my $count = grep...), you'll instead only get a number telling you how many matches there were. You might want to take a look at this tutorial on context sensitivity in Perl.
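To make the context difference concrete, here is a small sketch reusing the array from above. The variable names (@indexes, $count, $first) are only illustrative.

```perl
use strict;
use warnings;

my @array = ( 'Maria likes tomatoes',
              'Sonia likes plums',
              'Andrew likes oranges' );

# List context: grep returns every matching index.
my @indexes = grep { $array[$_] =~ /likes/ } 0 .. $#array;

# Scalar context: grep returns only the number of matches.
my $count = grep { $array[$_] =~ /likes/ } 0 .. $#array;

# Parentheses on the left force list context; $first takes the first match.
my ($first) = grep { $array[$_] =~ /plums/ } 0 .. $#array;

print "indexes: @indexes\n";   # indexes: 0 1 2
print "count:   $count\n";     # count:   3
print "first:   $first\n";     # first:   1
```

The parenthesised form is handy when you expect a single match but still want the index rather than a count.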

This is what firstidx from List::MoreUtils is for.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils 'firstidx';
my @array = ('Maria likes tomatoes',
'Sonia likes plums',
'Andrew likes oranges');
say firstidx { /plums/ } @array;
Update: I see that draegtun has answered your comment about getting multiple indexes. But I wonder why you couldn't just browse the List::MoreUtils documentation to see if there was a useful-looking function in there.
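If installing List::MoreUtils is not an option, something similar can be sketched with first from the core List::Util module, run over the indexes rather than the elements. This is an alternative technique, not what firstidx does internally, and the search term bananas is made up to show the no-match case.

```perl
use strict;
use warnings;
use List::Util 'first';

my @array = ( 'Maria likes tomatoes',
              'Sonia likes plums',
              'Andrew likes oranges' );

# first stops at the first index whose element matches.
my $idx = first { $array[$_] =~ /plums/ } 0 .. $#array;

# first returns undef when nothing matches (firstidx returns -1 instead).
my $missing = first { $array[$_] =~ /bananas/ } 0 .. $#array;

print defined $idx     ? "found at $idx\n"     : "not found\n";   # found at 1
print defined $missing ? "found at $missing\n" : "not found\n";   # not found
```

Test the result with defined rather than for truth, since index 0 is a valid match.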

Related

Perl: Meaning of double squared brackets after a string?

I don't understand a seemingly basic piece of code in Perl, which looks like this:
$line[$k][1]
What is the meaning of the double squared brackets?
I'm sorry if this was already asked or is so basic it can be found in every beginners book for Perl. I couldn't find it anywhere
It means you're working with a two dimensional array.
#!/usr/bin/env perl
use strict;
use warnings;
my @stuff = (
[ 1, 2, 3, 4 ],
[ 5, 6, 7, 8 ],
);
print $stuff[1][2];
#prints '7'
It means that what you have there is not "a string". It's an array called @line, and every element in @line is a reference to another array.
When you access a single element in a Perl array, the sigil changes from @ (which implies multiple values) to $ (which implies a single value). So to look up the element with index $k in an array called @line, you use:
$line[$k]
But in your example, $line[$k] contains a reference to another array. To get from an array reference to one of the elements of the referenced array, we use the ->[...] syntax. So the second element of the array referenced by the $kth element of @line is given by:
$line[$k]->[1];
And in Perl, we have a rule that when two sets of array (or hash) look-up brackets are separated by just a dereferencing arrow, we can omit that arrow. So my previous example can be simplified to:
$line[$k][1];
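The spellings discussed above can be checked side by side. This is a minimal sketch; the contents of @line and the value of $k are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: each element of @line is a reference to another array.
my @line = (
    [ 'a', 'b', 'c' ],
    [ 'd', 'e', 'f' ],
);
my $k = 1;

my $explicit = $line[$k]->[1];      # arrow written out
my $implicit = $line[$k][1];        # arrow omitted between subscripts
my $verbose  = ${ $line[$k] }[1];   # block dereference, same element

my @row = @{ $line[$k] };           # the whole inner array

print "$explicit $implicit $verbose\n";   # e e e
print "row: @row\n";                      # row: d e f
```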
A [..] is an array index. If you've got 2 of them, that means it's an array of arrays. In your example you're getting the 2nd element (indexes start at 0) of the $kth element of @line.
That you think it's a string is possibly a sign the code isn't very well written, as there should be a line saying something along the lines of my @line;
Make sure the code has use strict; and use warnings; at the top and that should throw up any problems with the code.

map documentation not clear enough

I've tried to understand the map function by reading its documentation to no avail.
In the documentation it says "Evaluates the BLOCK or EXPR for each element of LIST"
However, how is one to know that one can also use file test operators as well as shown below?
map { [$_, -s] } ('perl.c', 'sv.c', 'hv.c', 'av.c');
Source of the above code is: http://www.stllinux.org/meeting_notes/1997/0918/schwtr.html
So basically, the result will be a hash of files along with its file size but how on earth was I supposed to know about this from the documentation alone?
Can you guys help me out to understand more?
Actually, it says
map BLOCK LIST
Evaluates the BLOCK or EXPR for each element of LIST (locally setting
$_ to each element) and returns the list value composed of the results
of each such evaluation. In scalar context, returns the total number
of elements so generated. Evaluates BLOCK or EXPR in list context, so
each element of LIST may produce zero, one, or more elements in the
returned value.
The important part is that $_ is localized to the BLOCK, containing the value of each element of the LIST. Much the same is true for a for loop, i.e. for (LIST).
The -s function is as you say a file test, and without explicit argument it operates on $_. This is the same default behaviour that many of Perl's built-in functions have, for example print, unpack, ord, length.
The code you are showing contains a single scalar expression: [$_, -s], which is an array ref containing the file name inside $_ and as you say, its size.
So, basically, what you are seeing here is basic Perl techniques. If there is anything that is still not clear, feel free to ask.
Update:
As for what this code in specific does, it is probably part of a Schwartzian transform, whereby you perform a more efficient sort on a list, where the sort criteria consists of an expensive operation. For example:
my @files = ('perl.c', 'sv.c', 'hv.c', 'av.c');
my @sorted = sort { -s $a <=> -s $b } @files; # sorting by file size
For a small list, this will not matter much, but with a larger list, it might not be very efficient to run file tests multiple times, so instead we cache the test result in an array ref:
my @sorted = map $_->[0], # restore original value
sort { $a->[1] <=> $b->[1] } # perform sort on element #2
map { [ $_, -s ] } @files; # your map statement
And this is then called a Schwartzian transform.
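To see the caching effect without touching the file system, here is a sketch that swaps the -s file test for a hypothetical cost() function that counts how often it is called. The word list and cost() are assumptions for illustration only.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for an expensive lookup such as -s on a file.
sub cost { $calls++; return length $_[0] }

my @words = ( 'perl', 'sv', 'hv', 'regcomp', 'av', 'toke' );

my @sorted =
    map  { $_->[0] }                 # 3. unwrap the original value
    sort { $a->[1] <=> $b->[1] }     # 2. compare the cached keys
    map  { [ $_, cost($_) ] }        # 1. compute each key exactly once
    @words;

print "@sorted\n";
print "cost() was called $calls times for ", scalar(@words), " items\n";
```

Calling cost() inside the sort block instead would run it twice per comparison, which is what the transform avoids.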

Is there any advantage to using keys @array instead of 0 .. $#array?

I was quite surprised to find that the keys function happily works with arrays:
keys HASH
keys ARRAY
keys EXPR
Returns a list consisting of all the keys of the named hash, or the
indices of an array. (In scalar context, returns the number of keys or
indices.)
Is there any benefit in using keys @array instead of 0 .. $#array with respect to memory usage, speed, etc., or are the reasons for this functionality more of a historic origin?
Seeing that keys @array holds up to $[ modification, I'm guessing it's historic:
$ perl -Mstrict -wE 'local $[=4; my @array="a".."z"; say join ",", keys @array;'
Use of assignment to $[ is deprecated at -e line 1.
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
Mark has it partly right, I think. What he's missing is that each now works on an array, and, like the each with hashes, each with arrays returns two items on each call. Where each %hash returns key and value, each @array also returns key (index) and value.
while (my ($idx, $val) = each @array)
{
if ($idx > 0 && $array[$idx-1] eq $val)
{
print "Duplicate indexes: ", $idx-1, "/", $idx, "\n";
}
}
Thanks to Zaid for asking, and jmcnamara for bringing it up on perlmonks' CB. I didn't see this before - I've often looped through an array and wanted to know what index I'm at. This is waaaay better than manually manipulating some $i variable created outside of a loop and incremented inside, as I expect that continue, redo, etc., will survive this better.
So, because we can now use each on arrays, we need to be able to reset that iterator, and thus we have keys.
The link you provided actually has one important reason you might use/not use keys:
As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
That would cause each to reset to the beginning of the array. Using keys and each with arrays might be important if they ever natively support sparse arrays as a real data-type.
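The reset behaviour can be sketched as follows. It assumes Perl 5.12 or later, where each and keys accept arrays; the array contents are made up.

```perl
use strict;
use warnings;
use 5.012;    # each and keys on arrays need Perl 5.12 or later

my @array = qw(a b b c);

# Stop early, leaving the array's internal iterator part-way through.
my @seen;
while ( my ( $idx, $val ) = each @array ) {
    push @seen, $idx;
    last if $idx == 1;
}

keys @array;    # void context: resets the iterator

# After the reset, each starts again from index 0.
my ( $idx, $val ) = each @array;
say "seen before reset: @seen";     # seen before reset: 0 1
say "after reset: $idx => $val";    # after reset: 0 => a

keys @array;    # leave the iterator clean for later code
```

Without the first keys call, the second each would carry on from index 2.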
All that said, with so many array-aware language constructs like foreach and join in perl, I can't remember the last time I used 0..$#array.
I actually think you've answered your own question: it returns the valid indices of the array, no matter what value you've set for $[. So from a generality point of view (especially for library usage), it's more preferred.
The version of Perl I have (5.10.1) doesn't support using keys with arrays, so it can't be for historic reasons.
Well, in your example you are putting them in a list. So, in list context,
keys @array will be replaced with all the indices of the array,
whereas 0 .. $#array will do the same but can also be used for array slicing. So, instead of @array[0 .. $#array] you can also write @array[0 .. (some specific index)]

Using Delete in Perl

I'm creating a graph data structure that contains essentially an array of nodes (and edgeList with extra information). I also have a hash that allows me to quickly get a reference to a particular node by giving its name. Suppose I now want to implement a removeNode() function in the graph class, how can I delete something quickly. Let's say the function takes the name of a node, and I hash directly to it (and have a reference to that node). Delete takes arrays or hashes as a parameter, but within an array I want to delete the object that I have a reference to.
Any ideas?
I'm not clear on exactly what you're trying to do. If you just want to remove an item from a hash, delete $hash{$key}; is all you need.
If you want to remove an item from an array, and not leave that index undefined, then you can use splice @array, $index, 1; which will remove the item and shift everything after it down one spot.
If you want to just remove an element from an array but leave the rest of the list alone, then you can just undefine it: $array[$index] = undef;
That's the same thing that delete $array[$index] does, but using delete on an array index is deprecated.
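The difference between the two array approaches can be seen in a short sketch; the array names and contents are invented for illustration.

```perl
use strict;
use warnings;

my @spliced = qw(a b c d);
my @blanked = qw(a b c d);

# splice removes the element and closes the gap.
my ($removed) = splice @spliced, 1, 1;

# Assigning undef keeps the slot, so later indexes do not move.
$blanked[1] = undef;

print "removed: $removed\n";                              # removed: b
print "spliced: @spliced (", scalar(@spliced), ")\n";     # spliced: a c d (3)
print "blanked has ", scalar(@blanked), " slots\n";       # blanked has 4 slots
```

If other structures store indexes into the array, splice invalidates every index after the removed one, while the undef approach does not.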
Edit:
If you need to find an object in an array and then delete it, the best way is to use firstidx from List::MoreUtils, e.g.
use List::MoreUtils 'firstidx';
my $obj = get_object_to_delete();
my $index = firstidx { $_ eq $obj } @array;
splice @array, $index, 1;
This assumes the objects stringify to something suitable for comparing for equality. If they have stringification overloaded, use something like refaddr from Scalar::Util to get the numeric reference address directly.
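Putting this together with the name-to-node hash from the question, a removal routine might look roughly like the sketch below. The names @nodes, %by_name and remove_node are hypothetical, the nodes are plain hash references, and first from the core List::Util module stands in for firstidx so that the example needs no CPAN modules.

```perl
use strict;
use warnings;
use List::Util   'first';
use Scalar::Util 'refaddr';

# Hypothetical graph storage: an array of nodes plus a lookup hash by name.
my @nodes   = map { +{ name => $_ } } qw(a b c);
my %by_name = map { ( $_->{name} => $_ ) } @nodes;

sub remove_node {
    my ($name) = @_;

    # Drop the hash entry; give up if no such node exists.
    my $node = delete $by_name{$name} or return;

    # Compare reference addresses, so overloaded stringification is irrelevant.
    my $index = first { refaddr( $nodes[$_] ) == refaddr($node) } 0 .. $#nodes;
    splice @nodes, $index, 1 if defined $index;

    return $node;
}

remove_node('b');
print join( ',', map { $_->{name} } @nodes ), "\n";    # a,c
```

The array search is still linear in the number of nodes; only the hash lookup is constant time.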

Why do you need $ when accessing array and hash elements in Perl?

Since arrays and hashes can only contain scalars in Perl, why do you have to use the $ to tell the interpreter that the value is a scalar when accessing array or hash elements? In other words, assuming you have an array @myarray and a hash %myhash, why do you need to do:
$x = $myarray[1];
$y = $myhash{'foo'};
instead of just doing :
$x = myarray[1];
$y = myhash{'foo'};
Why are the above ambiguous?
Wouldn't it be illegal Perl code if it was anything but a $ in that place? For example, aren't all of the following illegal in Perl?
@var[0];
@var{'key'};
%var[0];
%var{'key'};
I've just used
my $x = myarray[1];
in a program and, to my surprise, here's what happened when I ran it:
$ perl foo.pl
Flying Butt Monkeys!
That's because the whole program looks like this:
$ cat foo.pl
#!/usr/bin/env perl
use strict;
use warnings;
sub myarray {
print "Flying Butt Monkeys!\n";
}
my $x = myarray[1];
So myarray calls a subroutine passing it a reference to an anonymous array containing a single element, 1.
That's another reason you need the sigil on an array access.
Slices aren't illegal:
@slice = @myarray[1, 2, 5];
@slice = @myhash{qw/foo bar baz/};
And I suspect that's part of the reason why you need to specify if you want to get a single value out of the hash/array or not.
The sigil gives you the return type of the container. So if something starts with @, you know that it returns a list. If it starts with $, it returns a scalar.
Now if there is only an identifier after the sigil (like $foo or @foo), then it's a simple variable access. If it's followed by a [, it is an access on an array; if it's followed by a {, it's an access on a hash.
# variables
$foo
@foo
# accesses
$stuff{blubb} # accesses %stuff, returns a scalar
@stuff{@list} # accesses %stuff, returns an array
$stuff[blubb] # accesses @stuff, returns a scalar
# (and calls the blubb() function)
@stuff[blubb] # accesses @stuff, returns an array
Some human languages have very similar concepts.
However many programmers found that confusing, so Perl 6 uses an invariant sigil.
In general the Perl 5 compiler wants to know at compile time if something is in list or in scalar context, so without the leading sigil some terms would become ambiguous.
This is valid Perl: @var[0]. It is an array slice of length one. @var[0,1] would be an array slice of length two.
@var['key'] is not valid Perl because arrays can only be indexed by numbers, and
the other two (%var[0] and %var['key']) are not valid Perl because hash slices use the {} to index the hash.
@var{'key'} and @var{0} are both valid hash slices, though. Obviously it isn't normal to take slices of length one, but it is certainly valid.
See the slice section of the perldata perldoc for more information about slicing in Perl.
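A small sketch of these forms follows. The data is invented, and the length-one slice goes through an index array (@wanted) because a literal one-element slice such as @var[0] draws a "better written as $var[0]" warning under use warnings.

```perl
use strict;
use warnings;

my @var = qw(zero one two three);
my %var = ( key => 'value', other => 'thing' );    # unrelated to @var

my $element = $var[0];                 # single element: $ sigil, scalar
my @pair    = @var[ 0, 1 ];            # array slice of length two
my @values  = @var{qw(key other)};     # hash slice: @ sigil, curly braces

my @wanted = (2);
my @single = @var[@wanted];            # a slice of length one is still a list

print "$element | @pair | @values | @single\n";
# zero | zero one | value thing | two
```

Note that @var[...] reads from the array @var while @var{...} reads from the hash %var; the brackets, not the sigil, pick the container.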
People have already pointed out that you can have slices and contexts, but sigils are there to separate the things that are variables from everything else. You don't have to know all of the keywords or subroutine names to choose a sensible variable name. It's one of the big things I miss about Perl in other languages.
I can think of one way that
$x = myarray[1];
is ambiguous - what if you wanted an array called m?
$x = m[1];
How can you tell that apart from a regex match?
In other words, the syntax is there to help the Perl interpreter, well, interpret!
In Perl 5 (to be changed in Perl 6) a sigil indicates the context of your expression.
You want a particular scalar out of a hash so it's $hash{key}.
You want the value of a particular slot out of an array, so it's $array[0].
However, as pointed out by zigdon, slices are legal. They interpret those expressions in a list context.
If you want a list of 1 value from a hash, @hash{key} works.
But larger lists work as well, like @hash{qw<key1 key2 ... key_n>}.
If you want a couple of slots out of an array, @array[0,3,5..7,$n..$n+5] works.
@array[0] is a list of size 1.
There is no "hash context", so neither %hash{@keys} nor %hash{key} has meaning.
So you have "@" + "array[0]" <=> < sigil = context > + < indexing expression > as the complete expression.
The sigil provides the context for the access:
$ means scalar context (a scalar variable or a single element of a hash or an array)
@ means list context (a whole array or a slice of a hash or an array)
% is an entire hash
In Perl 5 you need the sigils ($ and @) because the default interpretation of a bareword identifier is that of a subroutine call (thus eliminating the need to use & in most cases).