Using string as array index in Perl

I have run into a strange behavior in Perl that I haven't been able to find documentation for. If I (by accident) use a string as an index in an array I get the first item of the array and not undef as I would have expected.
$index = "Some string";
@array = qw(one two three);
$item = $array[$index];
print "item: " . $item;
I expected to get item: as output, but instead I get item: one. I assume that because the string doesn't start with a number it's "translated" to 0 and hence giving me the first item in the array. If the string starts with a number that part of the string seems to be used as the index.
Is this to be expected, and is there any documentation describing how strings (e.g. "2strings") are interpreted as numbers in Perl?

Array index imposes numeric context. The string "Some string" in numeric context is equal to 0.
Under warnings, Perl will complain
Argument "Some string" isn't numeric in array or hash lookup at ...
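The conversion described above can be sketched as follows. This is a minimal illustration, not part of the original answer: the variable names are made up, the numeric warnings are silenced only so the script runs quietly, and the looks_like_number guard from the core Scalar::Util module is just one possible way to reject non-numeric indexes.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @array = qw(one two three);

my ($from_text, $from_prefix, $first, $third);
{
    # Silence the "isn't numeric" warnings for this demonstration only.
    no warnings 'numeric';

    my $text   = "Some string";
    my $prefix = "2strings";

    # Adding 0 forces numeric context, the same conversion an index gets.
    $from_text   = $text + 0;      # 0: the string has no leading number
    $from_prefix = $prefix + 0;    # 2: the leading digits are used

    $first = $array[$text];        # same as $array[0]
    $third = $array[$prefix];      # same as $array[2]
}

print "Some string -> $from_text -> $first\n";   # Some string -> 0 -> one
print "2strings -> $from_prefix -> $third\n";    # 2strings -> 2 -> three

# A possible guard: accept an index only if the whole string is a number.
for my $index ("Some string", "2strings", "2") {
    my $verdict = looks_like_number($index) ? "numeric" : "not numeric";
    print "'$index' is $verdict\n";
}
```

Note that looks_like_number also accepts values such as "2.5", so a stricter integer check may be needed for real index validation.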

Array indexes must be integers, so non-integer values are converted to integers.
The string "Some string" produces the number 0, as well as the following warning:
Argument "Some string" isn't numeric in array or hash lookup
This concept is found throughout Perl. For the same reason, arguments to addition and multiplication will similarly be converted to numbers. And values used as hash keys will be converted to strings. Dereferencing undef scalars even produces the necessary value and reference in a process called autovivification!
$ perl -Mv5.10 -e'
my $ref;
say $ref // "[undef]";
$ref->[1] = 123;
say $ref // "[undef]";
'
[undef]
ARRAY(0x560b653ae4b8)
As you can see, an array and a reference to that array were spontaneously created in the above program because they were needed.
The lesson to take: Always use use strict; use warnings;.

Related

How do references in Perl work

Can anyone explain the push statement in the following Perl code to me? I know how push in Perl works but I can't understand what the first argument in the following push command represents. I am trying to interpret someone's script. I tried to print "@a\n"; but it only printed ARRAY(0x9aa370) which makes me think that the push is not doing anything. Any help is appreciated. Thanks!
my @a = ();
my $b = 10;
my $c = 'a';
push(@{$a[$b]}, $c);
Let's break it down.
The @{...} is understood from "Using References" in perlref:
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
So what is inside the { ... } block had better work out to an array reference. You have $a[$b] there, an element of @a at index $b, so that element must be an arrayref.
Then @{...} dereferences it and pushes a new element $c to it. Altogether, $c is copied into a (sole) element of an anonymous array whose reference is at index $b of the array @a.
And a crucial part: as there is in fact no arrayref † there, the autovivification kicks in and it is created. Since there are no elements at indices preceding $b they are created as well, with value undef.
Now please work through
tutorial perlreftut and
data-structures cookbook perldsc
while using perlref linked in the beginning for a full reference.
With complex data structures it is useful to be able to see them, and there are tools for that. A most often used one is the core Data::Dumper, and here is an example with Data::Dump
perl -MData::Dump=dd -wE'@ary = (1); push @{$ary[3]}, "ah"; dd \@ary'
with output
[1, undef, undef, ["ah"]]
where [] inside indicate an arrayref, with its sole element being the string ah.
† More precisely, an undef scalar is dereferenced and since this happens in an lvalue context the autovivification goes. Thanks to ikegami for a comment. See for instance this post with its links.
Let's start with the following two assertions:
@a starts out as an empty array with no elements.
$b is assigned the value of 10.
Now look at this construct:
@{$a[$b]}
To understand we can start in the middle: $a[$b] indexes element 10 of the array @a.
Now we can work outward from there: @{...} treats its contents as a reference to an array. So @{$a[$b]} treats the content of element 10 of the array @a as a reference to an anonymous array. That is to say, the scalar value contained in $a[10] is an array reference.
Now layer in the push:
push @{$a[$b]}, $c;
Into the anonymous array referenced in element 10 of @a you are pushing the value of $c, which is the character "a". You could access that element like this:
my $value = $a[10]->[0]; # character 'a'
Or shorthand,
my $value = $a[10][0];
If you pushed another value into @{$a[10]} then you would access it at:
my $other_value = $a[10][1];
But what about $a[0] through $a[9]? You're only pushing a value into $a[$b], which is $a[10]. Perl automatically extends the array to accommodate that 11th element ($a[10]), but leaves the value in $a[0] through $a[9] as undef. You mentioned that you tried this:
print "@a\n";
Interpolating an array into a string causes its elements to be printed with a space between each one. So you didn't see this:
ARRAY(0xa6f328)
You saw this:
          ARRAY(0xa6f328)
...because there were ten spaces before the 11th element which contains an array reference.
If you were running your script with use warnings at the top, you would have seen this instead:
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
Use of uninitialized value in join or string at scripts/mytest.pl line 12.
          ARRAY(0xa6f328)
...or something quite similar.
Your structure currently looks like this:
@a = (undef,undef,undef,undef,undef,undef,undef,undef,undef,undef,['a'])
If you ever want to really get a look at what a data structure looks like, rather than using a simple print, do something like this:
use Data::Dumper;
print Dumper \@a;
I've had a discussion over this yesterday here
what it means is that
@a is an array
$a[$b]
is a cell in the array
the @{} syntax helps perl understand that the cell in question is an array so you can perform push/pop operations on it.
if you do
use Data::Dumper;
print Dumper \@a;
you should see something like:
$VAR1 = [
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          undef,
          [
            'a'
          ]
        ];
as you can see, the 11th cell is an array containing the letter 'a' as its only value
the push operation on an empty cell could have also been written as:
$a[$b] = [$c]
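The behaviour described in the answers above can be checked with a short sketch. The names @rows, $index and $value are stand-ins chosen for this illustration (the question's $a and $b are avoided because they double as sort's special variables); nothing here comes from the original script.

```perl
use strict;
use warnings;

my @rows;                               # starts out empty
my ($index, $value) = (10, 'a');

# Dereferencing the undefined element as an array autovivifies an arrayref.
push @{ $rows[$index] }, $value;

my $length   = scalar @rows;            # 11: indexes 0..10 now exist
my $kind     = ref $rows[$index];       # 'ARRAY'
my $pushed   = $rows[$index][0];        # 'a'
my $defined_before = grep { defined } @rows[0 .. $index - 1];   # 0

# The explicit form mentioned above, valid when the slot is still empty.
my @explicit;
$explicit[$index] = [$value];

print "length=$length kind=$kind pushed=$pushed defined_before=$defined_before\n";
print "explicit form matches\n" if $explicit[$index][0] eq $pushed;
```

The two forms differ once the slot already holds an array reference: push appends to it, while the assignment replaces it.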

perl 101, getting length of array from using $ on the array variable

I was expecting this to give the length of the array. Since I thought $mo implied scalar context.
But instead, I get the error :
Global symbol "$mo" requires explicit package name at ./a.pl line 7.
#! /usr/bin/perl
use strict;
use warnings;
my @mo = (3,4,5);
print( $mo);
UPDATE::
I thought mo is the variable and the sigil $ on $mo is using scalar context. My question is more on the sigil than actually getting the length.
In order to get the number of elements in @mo use scalar @mo.
my $num_elements = scalar @mo;
You can omit the scalar when the context dictates that it must be scalar, such as in a comparison:
if ($count < @mo) { print "$count is less than the number of elements" }
You can also use $#mo, which is the index of the last element (generally one less than the number of elements).
my $last_index = $#mo;
This is useful when you are iterating through an array and need the array index:
for (0..$#mo)
{
print "Index $_ is $mo[$_]\n";
}
The $mo[...] form is used when obtaining a single element of the array:
my $second_element = $mo[1];
$mo just by itself is a totally separate variable (though you probably shouldn't create such a variable, as it would be confusing).
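A small sketch of the point about sigils, under the assumption that a scalar named $mo is declared purely for illustration (as noted above, doing so in real code would be confusing):

```perl
use strict;
use warnings;

my @mo = (3, 4, 5);
my $mo = 'unrelated scalar';   # hypothetical: shares only its name with @mo

my $count      = @mo;          # array in scalar context: 3
my $last_index = $#mo;         # index of the last element: 2
my $second     = $mo[1];       # element of @mo, not related to $mo: 4

print "count=$count last_index=$last_index second=$second\n";
print "scalar \$mo is still '$mo'\n";
```

The $ in $mo[1] says that one element is being fetched; the square brackets are what tie the expression to the array @mo.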
You are trying to print a scalar variable $mo which does not exist. You need to use the array name in scalar context as:
my @mo = (3,4,5);
print scalar @mo;
Another way is to use $#mo which would return the largest index in the array which in your case is 2.
You may get length of an array as
my $mo = @mo;
print $mo;
my $mo = scalar (@mo);
print $mo;
my $mo = $#mo + 1; print $mo;

= and , operators in Perl

Please explain this apparently inconsistent behaviour:
$a = b, c;
print $a; # this prints: b
$a = (b, c);
print $a; # this prints: c
The = operator has higher precedence than ,.
And the comma operator throws away its left argument and returns the right one.
Note that the comma operator behaves differently depending on context. From perldoc perlop:
Binary "," is the comma operator. In
scalar context it evaluates its left
argument, throws that value away, then
evaluates its right argument and
returns that value. This is just like
C's comma operator.
In list context, it's just the list
argument separator, and inserts both
its arguments into the list. These
arguments are also evaluated from left
to right.
As eugene's answer seems to leave the OP with some questions, I will try to explain based on it:
$a = "b", "c";
print $a;
Here the left argument is $a = "b". Because = has a higher precedence than , it is evaluated first, and after that $a contains "b".
The right argument is "c", which is what the comma expression returns, as I will show shortly.
At that point when you print $a it is obviously printing b to your screen.
$a = ("b", "c");
print $a;
Here the term ("b","c") will be evaluated first because of the higher precedence of parentheses. It returns "c" and this will be assigned to $a.
So here you print "c".
$var = ($a = "b","c");
print $var;
print $a;
Here $a contains "b" and $var contains "c".
Once you get the precedence rules this is perfectly consistent.
Since eugene and mugen have answered this question nicely with good examples already, I am going to set up some concepts then ask some conceptual questions of the OP to see if it helps to illuminate some Perl concepts.
The first concept is what the sigils $ and @ mean (we won't discuss % here). @ means multiple items (said "these things"). $ means one item (said "this thing"). To get the first element of an array @a you can do $first = $a[0]; to get the last element, $last = $a[-1]. N.B. not @a[0] or @a[-1]. You can slice by doing @shorter = @longer[1,2].
The second concept is the difference between void, scalar and list context. Perl has the concept of the context in which your containers (scalars, arrays etc.) are used. An easy way to see this is that if you store a list (we will get to this) as an array @array = ("cow", "sheep", "llama") and then store the array as a scalar $size = @array we get the length of the array. We can also force this behavior by using the scalar operator such as print scalar @array. I will say it one more time for clarity: An array (not a list) in scalar context will return, not an element (as a list does) but rather the length of the array.
Remember from before you use the $ sigil when you only expect one item, i.e. $first = $a[0]. In this way you know you are in scalar context. Now when you call $length = @array you can see clearly that you are calling the array in scalar context, and thus you trigger the special property of an array in scalar context: you get its length.
This has another nice feature for testing if there are elements in the array: print '@array contains items' if @array; print '@array is empty' unless @array. The if/unless tests force scalar context on the array, thus the if sees the length of the array, not its elements. Since all numerical values are 'truthy' except zero, if the array has non-zero length, the statement if @array evaluates to true and you get the print statement.
Void context means that the return value of some operation is ignored. A useful operation in void context could be something like incrementing. $n = 1; $n++; print $n; In this example $n++ (increment after returning) was in void context in that its return value "1" wasn't used (stored, printed etc).
The third concept is the difference between a list and an array. A list is an ordered set of values, an array is a container that holds an ordered set of values. You can see the difference for example in the gymnastics one must do to get particular element after using sort without storing the result first (try pop sort { $a cmp $b } @array for example, which doesn't work because pop does not act on a list, only an array).
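The three concepts can be put side by side in a short sketch. It reuses the animal list from this answer; the list slice (...)[-1] is shown as one possible replacement for the pop sort attempt, and the void warning is silenced only so the demonstration runs quietly.

```perl
use strict;
use warnings;

my @array = ("cow", "sheep", "llama");

# An array in scalar context gives its length.
my $size = @array;                                # 3

# A parenthesised comma expression in scalar context gives its last value.
my $last;
{
    no warnings 'void';   # the discarded constants would otherwise warn
    $last = ("cow", "sheep", "llama");            # 'llama'
}

# pop needs an array, but a list slice works directly on sort's result.
my $max = (sort { $a cmp $b } @array)[-1];        # 'sheep'

print "size=$size last=$last max=$max\n";
print "\@array contains items\n" if @array;
```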
Now we can ask, when you attempt your examples, what would you want Perl to do in these cases? As others have said, this depends on precedence.
In your first example, since the = operator has higher precedence than the ,, you haven't actually assigned a list to the variable, you have done something more like ($a = "b"), ("c") which effectively does nothing with the string "c". In fact it was called in void context. With warnings enabled, since this operation does not accomplish anything, Perl attempts to warn you that you probably didn't mean to do that with the message: Useless use of a constant in void context.
Now, what would you want Perl to do when you attempt to store a list to a scalar (or use a list in a scalar context)? It will not store the length of the list, this is only a behavior of an array. Therefore it must store one of the values in the list. While I know it is not canonically true, this example is very close to what happens.
my @animals = ("cow", "sheep", "llama");
my $return;
foreach my $animal (@animals) {
$return = $animal;
}
print $return;
And therefore you get the last element of the list (the canonical difference is that the preceding values were never stored then overwritten, however the logic is similar).
There are ways to store something that looks like a list in a scalar, but this involves references. Read more about that in perldoc perlreftut.
Hopefully this makes things a little more clear. Finally I will say, until you get the hang of Perl's precedence rules, it never hurts to put in explicit parentheses for lists and function arguments.
There is an easy way to see how Perl handles both of the examples, just run them through with:
perl -MO=Deparse,-p -e'...'
As you can see, the difference is because the order of operations is slightly different than you might suspect.
perl -MO=Deparse,-p -e'$a = a, b;print $a'
(($a = 'a'), '???');
print($a);
perl -MO=Deparse,-p -e'$a = (a, b);print $a'
($a = ('???', 'b'));
print($a);
Note: you see '???', because the original value got optimized away.

how to grep perl Hash Keys in to an array?

I am a Perl newbie and need help understanding the piece of code below.
I have a perl Hash defined like this
1 my %myFavourite = ("Apple"=>"Apple");
2 my @fruits = ("Apple", "Orange", "Grape");
3 @myFavourite{@fruits}; # This returns Apple. But how?
It would be great if perl gurus could explain what's going on in Line-3 of the above code.
myFavourite is declared as a hash, but used as an array? And the statement simply takes the key of the hash, greps it into the array and returns the hash values corresponding to the key searched. Is this the way we grep hash keys into the array?
It doesn't return Apple. It evaluates to a hash slice consisting of all of the values in the hash corresponding to the keys in @fruits. Notice if you turn on warnings that you get 2 warnings about uninitialized values. This is because myFavourite does not contain values for the keys Orange and Grape. Look up 'hash slice' in perldata.
Essentially, @myFavourite{@fruits} is shorthand for ($myFavourite{Apple}, $myFavourite{Orange}, $myFavourite{Grape}), which in this case is ($myFavourite{Apple},undef,undef). If you print it, the only output you see is Apple.
myFavourite is declared as a hash, but used as an array?
Yes, and it returns a list. It's a hash slice. See: http://perldoc.perl.org/perldata.html
Think of it as an expansion of array @fruits into multiple hash key lookups.
The @hash{@keys} syntax is just a handy way of extracting portions of the hash.
Specifically:
@myFavourite{@fruits}
is equivalent to:
($myFavourite{'Apple'},$myFavourite{'Orange'},$myFavourite{'Grape'})
which returns a three-item list in list context (e.g. as an argument to print, which prints the elements one after another). In scalar context a slice evaluates to its last element.
my @slice_values = @myFavourite{@fruits};
# @slice_values now contains ('Apple',undef,undef)
# which is functionally equivalent to:
@slice_values = map { $myFavourite{$_} } @fruits;
If you want to extract values only for keys that exist in the hash, do:
my @favourite_fruits = @myFavourite{ grep { exists $myFavourite{$_} } @fruits };
# @favourite_fruits now contains ('Apple')
If you:
use warnings;
you'll see the interpreter's warnings about the two uninitialized values when the slice is printed. The missing keys simply yield undef; nothing is added to the hash.
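A minimal sketch that checks the slice behaviour discussed in this thread. The hash and array are the ones from the question; the counting variables are additions for illustration only.

```perl
use strict;
use warnings;

my %myFavourite = ("Apple" => "Apple");
my @fruits      = ("Apple", "Orange", "Grape");

# The slice yields one value per key, undef where the key is missing.
my @values        = @myFavourite{@fruits};
my $value_count   = @values;                      # 3
my $defined_count = grep { defined } @values;     # 1

# Reading a slice this way does not add the missing keys to the hash.
my $key_count = keys %myFavourite;                # 1

# Filtering the keys first avoids the undef entries altogether.
my @favourites = @myFavourite{ grep { exists $myFavourite{$_} } @fruits };

print "values=$value_count defined=$defined_count keys=$key_count\n";
print "favourites: @favourites\n";
```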

Accessing Array Elements of Referenced Array

I am new to Perl. I wrote a snippet to access array elements and print it to the console:
use strict;
use warnings;
my @array1 = ('20020701 00000', 'Sending Mail in Perl', 'Philip Yuson');
my @array2 = ('20020601', 'Manipulating Dates in Perl', 'Philip Yuson');
my @array3 = ('20020501', 'GUI Application for CVS', 'Philip Yuson');
my @main = (\@array1, \@array2, \@array3);
my $a = $main[0];
print @$a;
print @$a . "pdf";
First print:
20020701 00000Sending Mail in PerlPhilip Yuson
But why second print outputs this?
3pdf
I need to get the output like
20020701 00000Sending Mail in PerlPhilip Yusonpdf
I don't know why it is giving 3pdf. Any help is greatly appreciated.
The 3 is the number of elements in the array. The . is forcing the array into scalar context, and then you get the number of elements instead of the array contents. You could use
print "@$a pdf";
or
print @$a , "pdf";
depending on what kind of output you want.
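The difference between the two contexts can be sketched as follows. The data is the first array from the question; $ref and the result variables are names chosen for this illustration, and join is shown as one more way to build the string explicitly.

```perl
use strict;
use warnings;

my @array1 = ('20020701 00000', 'Sending Mail in Perl', 'Philip Yuson');
my $ref    = \@array1;

# Concatenation imposes scalar context: the array becomes its element count.
my $counted = @$ref . "pdf";                 # '3pdf'

# join builds the string from the elements explicitly.
my $joined = join('', @$ref) . "pdf";

# Interpolation joins the elements with a space (the default list separator).
my $spaced = "@$ref.pdf";

print "$counted\n";   # 3pdf
print "$joined\n";    # 20020701 00000Sending Mail in PerlPhilip Yusonpdf
print "$spaced\n";    # 20020701 00000 Sending Mail in Perl Philip Yuson.pdf
```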
Arrays are one of the parts of Perl that act differently according to the “context”, which is a very important concept in Perl programming. Consider this:
my @fruits = qw/apples pears bananas/;
my $items = @fruits;
On the second line you are assigning to a scalar (⇒ here we have some context), but on the right side you have an array. We say that the array here is used in scalar context, and in scalar context the value of an array is the number of its items.
Now to your problem: When you are simply printing the array, there is not much magic involved. But when you try to append a string onto the array using the . operator, you are using the array in scalar context. Which means the array evaluates to the number of its items (3), to which you append the pdf.
Is that clear? You should Google up something on “Perl context”, that will make Perl programming much easier for you.
This is a matter of contexts. In Perl, the data type of a value is only part of what it evaluates to; the other half is the context that value is used in.
As you may know, there are three built-in data types: scalars, arrays, and hashes. There is also some degree of implicit casting that can be done between these data types.
There are also two major contexts: list and scalar. Arrays and hashes both work without casting in list context; scalar values work without change in scalar contexts.
The behavior of an operator can depend on the context it is run in. If an operator requires a particular context, and Perl is able to implicitly cast the value into something matching that context, it will. In the case of arrays and associative arrays being cast to integers, what you get is the ''cardinality'' of the array, the number of elements it contains.
In your example above, @$a evaluates to data typed as an array. The other half of that story, though, is the context in which the operator . runs. Reading perldoc perlop, it says the following:
Binary . concatenates two strings.
Well, strings are scalar values, and so we need to cast the array @$a to be valid in a scalar context, and in doing so get back the cardinality of the array. @$a contains 3 things, so this evaluates to the scalar value 3, which is then turned into a string so the . operator can work its magic.
Hope this makes some sense.
print @$a . "pdf" evaluates the array in scalar context, thus outputting the number of elements in the array, which is why you get 3.
You're probably looking for something like this:
print @$a, "pdf";
The comma operator instead of dot forces it into list context.
I have a feeling what you really would like is:
print "@$a.pdf", "\n";
That is:
my @array1 = ('20020701 00000', 'Sending Mail in Perl', 'Philip Yuson');
my @array2 = ('20020601', 'Manipulating Dates in Perl', 'Philip Yuson');
my @array3 = ('20020501', 'GUI Application for CVS', 'Philip Yuson');
my @main = (\@array1, \@array2, \@array3);
for my $x ( @main ) {
print "@$x.pdf", "\n";
}
Output:
20020701 00000 Sending Mail in Perl Philip Yuson.pdf
20020601 Manipulating Dates in Perl Philip Yuson.pdf
20020501 GUI Application for CVS Philip Yuson.pdf