Using the $# operator - perl

I've just taken over maintenance of a piece of Perl system. The machine it used to run on is dead so I do not know which version of Perl it was using, but it was working. It included the following line to count the lines in a page of ASCII text
my $lcnt = $#{#{$page{'lines'}}};
In Perl 5.10.1 ( we are now running this on CentOS 6.3 ) the above code no longer works. I instead use the following, which works fine.
my #arr = #{$page{'lines'}};
my $lcnt = $#arr;
I'll admit my Perl isn't great but from what I can see the first version should never have worked as it is trying to deference an array rather than an array ref
First question - is my guess at why this first line of code doesn't now work correct, and secondly did it work earlier due to a now fixed bug in a prior Perl version?
Thanks!

The first version never worked. Assuming $page{'lines'} is an arrayref, this is what you want:
my $lcnt = $#{$page{'lines'}};
Note that this is going to give you one less than the number of items in your arraref. The $# operator is the INDEX of the last item, not the number of items. If you want the number of items in $page{'lines'}, you probably want this:
my $lcnt = scalar(#{$page{'lines'}});

Some things about your code. This:
my $lcnt = $#{#{$page{'lines'}}};
Was never correct. Take a look at the three things going on here
$page{'lines'} # presumably an array ref
#{ ... } # dereference into an array
$#{ ... } # get last index of an array ref
This is equivalent to (continuing on your own code):
my #arr = #{$page{'lines'}};
my $foo = #arr; # foo is now the size of the array, e.g. 3
my $lcnt = $#$foo;
If you use
use strict;
use warnings;
Which you should always do, without question (!), you will get the informative fatal error message:
Can't use string ("3") as an ARRAY ref while "strict refs" in use
(Where 3 will be the size of your array)
The correct way to get the size (number of elements) of an array is to put the array in scalar context:
my $size = #{ $page{'lines'} };
The way to get the index of the last element is using the $# sigil:
my $last_index = $#{ $page{'lines'} };
As you'll note, the syntax is the same, it is just a matter of using # or $# to get what you want, just the same as when using a regular array
my $size = #array;
my $last = $#array;
So, to refer back to the beginning: Using both # and $# is not and was never correct.

Related

Why doesn't this deref of a reference work as a one-liner?

Given:
my #list1 = ('a');
my #list2 = ('b');
my #list0 = ( \#list1, \#list2 );
then
my #listRef = $list0[1];
my #list = #$listRef; # works
but
my #list = #$($list0[1]); # gives an error message
I can't figure out why. What am I missing?
There is one simple de-referencing rule that covers this. Loosely put:
What follows the sigil need be the correct reference for it, or a block that evaluates to that.
A specific case from perlreftut
You can always use an array reference, in curly braces, in place of the name of an array.
In your case then, it should be
my #list = #{ $list0[1] };
(not index [2] since your #list0 has two elements)  Spaces are there only for readability.
The attempted #$($list0[2]) is a syntax error, first because the ( (following the $) isn't allowed in an identifier (variable name), what presumably follows that $.
A block {} though would be allowed after the $ and would be evaluated, and must yield a scalar reference in this case, to be dereferenced by that $ in front of it; but then the first # would be in error. That can then fixed as well but this gets messy if pushed, and wasn't meant to go that far. While the exact rules are (still) a little murky, see Identifier Parsing in perldata.
The #$listRef earlier is correct syntax in general. But it refers to a scalar variable $listRef (which must be an array reference since it's getting dereferenced into an array by the first #), and there is no such a thing in the example -- you have an array variable #listRef.
So with use strict; in effect this, too, would fail to compile.
Dereferencing an arrayref to assign a new array is expensive as it has to copy all elements (and to construct the new array variable), while it's rarely needed (unless you actually want a copy). With the array reference on hand ($ar) all that one may need is readily available
#$ar; # list of elements
$ar->[$index]; # specific element
#$ar[#indices]; # slice -- list of some elements, like #$ar[0,2..5,-1]
$ar->#[0,-1]; # slice, with new "postfix dereferencing" (stable at v5.24)
$#$ar; # last index in the anonymous array referred by $ar
See Slices in perldata and Postfix reference slicing in perlref
You need
#{ $list0[1] }
Whenever you can use the name of a variable, you can use a block that evaluates to a reference. That means the syntax for getting the elements of an array are
#NAME # If you have the name
#BLOCK # If you have a reference
That means that
my #array1 = 4..5;
my #array2 = #array1;
and
my $array1 = [ 4..5 ];
my #array2 = #{ $array1 }
are equivalent.
When the only thing in the block is a simple scalar ($NAME or $BLOCK), you can omit the curlies. That means that
#{ $array1 }
is equivalent to
#$array1
That's why #$listRef works, and it's why #{ $list0[1] } can't be simplified.
See Perl Dereferencing Syntax.
You have a lot going on there and multiple levels of inadvertent references, so let's go through it:
First, you start by making a list of two items, each of which is an array reference. You store that in an array:
my #list0 = ( \#list2, \#list2 );
Then you ask for the item with index 2, which is a single item, and store that in an array:
my #listRef = $list0[2];
However, there is no item with index 2 because Perl indexes from zero. The value in #listRef in undefined. Not only that, but you've asked for a single item and stored it in an array instead of a scalar. That's probably not what you meant.
You say this following line works, but I don't think you know that because it won't give you the value you were expecting even if you didn't get an error. Something else is happening. You haven't declared or used a variable $listRef, so Perl creates it for you and gives it the value undef. When you try to dereference it, Perl uses "autovivification" to create the reference. This is the process where Perl helpfully creates a reference structure for you if you start with undef:
my #list = #$listRef; # works
There is nothing in that array so #list should be empty.
Fix that to get the last item, which has index of 1, and fix it so you are assigning the single value (the reference) to a scalar variable:
my $listRef = $list0[1];
Data::Dumper is handy here:
use Data::Dumper;
my #list2 = qw(a b c);
my #list0 = ( \#list2, \#list2 );
my $listRef = $list0[1];
print Dumper($listRef);
You get the output:
$VAR1 = [
'a',
'b',
'c'
];
Perl has some features that can catch these sorts of variable naming mistakes and will help you track down problems. Add these to the top of your program:
use strict;
use warnings;
For the rest, you might want to check out my book Intermediate Perl which explains all this reference stuff.
And, recent Perls have a new feature called postfix dereferencing that allows you to write dereferences from left to right:
my #items = ( \#list2, \#list2 );
my #items_of_last_ref = $items[1]->#*;
my #list = #$#listRef; # works
I doubt that works. That may not throw a syntax error but it sure as hell does not do what you think it does. For once
my #list0 = ( \#list2, \#list2 );
defines an array with 2 elements and you access
my #listRef = $list0[2];
the third element. So #listRef is an array that contains one element which is undef. The following code doesn't make sense either.
Unless the question is purely academic (answered by zdim already), I assume you want the second element of #list into a separate array, I would write
my #list = #{ $list0[1] };
The question is not complete and not clear on desired outcome.
OP tries to access an element $list0[2] of array #list0 which does not exist -- array has elements with indexes 0 and 1.
Perhaps #listRef should be $listRef instead in the post.
Bellow is my vision of described problem
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my #list1 = qw/word1 word2 word3 word4/;
my #list2 = 1000..1004;
my #list0 = (\#list1, \#list2);
my $ref_array = $list0[0];
map{ say } #{$ref_array};
$ref_array = $list0[1];
map{ say } #{$ref_array};
say "Element: " . #{$ref_array}[2];
output
word1
word2
word3
word4
1000
1001
1002
1003
1004
Element: 1002

Handling with two warnings of #ARGS

Small debug question I can't solve for some reason. Consider the following code:
use warnings;
my $flag = 0;
foreach my $i (0..scalar(#ARGV)) {
$data{$OPTION} .= $ARGV[$i]." " if($flag);
$flag = 1 if($ARGV[$i] =~ /$OPTION/);
undef $ARGV[$i] if($flag);
}
I get the following two warnings:
Use of uninitialized value within #ARGV in concatenation (.) or string at line 4
Use of uninitialized value in pattern match (m//) at line 5
I get the reason is that I undefine some value of #ARGV and then it tries to check it.
The way I do it like this is because I would like to 'cut' some of the data of #ARGV before using GetOpt module (which uses this array).
How to solve it?
Let's expand on those comments a bit.
Imagine #ARGV contains four elements. They will have the indexes 0, 1, 2 and 3 (as arrays in Perl are zero-based).
And your loop looks like this:
foreach my $i (0..scalar(#ARGV)) {
You want to visit each element in #ARGV, so you use the range operator (..) to generate a list of all those indexes. But scalar #ARGV returns the number of elements in #ARGV and that's 4. So your range is 0 .. 4. And there's no value at $ARGV[4] - so you get an "undefined value" warning (as you're trying to read past the end of an array).
A better way to do this is to use $#ARGV instead of scalar #ARGV. For every array variable in Perl (say #foo) you also get a variable (called $#foo) which contains the last index number in the array. In our case, that's 3 and your range (0 .. $#ARGV) now contains the integers 0 .. 3 and you no longer try to read past the end of the array and you don't get the "undefined value" warnings.
There's one other improvement I would suggest. Inside your loop, you only ever use $i to access an element from #ARGV. It's only used in expressions like $ARGV[$i]. In this case, it's probably better to skip the middle man and to iterate across the elements in the array, not the indexes.
I mean you can write your code like this:
foreach my $arg (#ARGV) {
$data{$OPTION} .= $arg . " " if($flag);
$flag = 1 if($arg =~ /$OPTION/);
undef $arg if($flag);
}
I think that's a little easier to follow.

Confusion about proper usage of dereference in Perl

I noticed the other day that - while altering values in a hash - that when you dereference a hash in Perl, you actually are making a copy of that hash. To confirm I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows its probably not that big of a deal (in terms of the need to go fix in all of my scripts - but important going forward). I think its going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and # when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
#hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
#$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
#{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
#{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my #array = #$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.

Why does Perl's shift complain 'Type of arg 1 to shift must be array (not grep iterator).'?

I've got a data structure that is a hash that contains an array of hashes. I'd like to reach in there and pull out the first hash that matches a value I'm looking for. I tried this:
my $result = shift grep {$_->{name} eq 'foo'} #{$hash_ref->{list}};
But that gives me this error: Type of arg 1 to shift must be array (not grep iterator). I've re-read the perldoc for grep and I think what I'm doing makes sense. grep returns a list, right? Is it in the wrong context?
I'll use a temporary variable for now, but I'd like to figure out why this doesn't work.
A list isn't an array.
my ($result) = grep {$_->{name} eq 'foo'} #{$hash_ref->{list}};
… should do the job though. Take the return from grep in list context, but don't assign any of the values other than the first.
I think a better way to write this would be this:
use List::Util qw/first/;
my $result = first { $_->{name} eq 'foo' } #{ $hash_ref->{list} };
Not only will it be more clear what you're trying to do, it will also be faster because it will stop grepping your array once it has found the matching element.
Another way to do it:
my $result = (grep {$_->{name} eq 'foo'} #{$hash_ref->{list}})[0];
Note that the curlies around the first argument to grep are redundant in this case, so you can avoid block setup and teardown costs with
my $result = (grep $_->{name} eq 'foo', #{$hash_ref->{list}})[0];
“List value constructors” in perldata documents subscripting of lists:
A list value may also be subscripted like a normal array. You must put the list in parentheses to avoid ambiguity. For example:
# Stat returns list value.
$time = (stat($file))[8];
# SYNTAX ERROR HERE.
$time = stat($file)[8]; # OOPS, FORGOT PARENTHESES
# Find a hex digit.
$hexdigit = ('a','b','c','d','e','f')[$digit-10];
# A "reverse comma operator".
return (pop(#foo),pop(#foo))[0];
As I recall, we got this feature when Randal Schwartz jokingly suggested it, and Chip Salzenberg—who was a patching machine in those days—implemented it that evening.
Update: A bit of searching shows the feature I had in mind was $coderef->(#args). The commit message even logs the conversation!

What does it mean to pre-increment $#array?

I've come across the following line of code. It has issues:
it is intended to do the same as push
it ought to have used push
it's hard to read, understand
I've since changed it to use push
it does something I thought was illegal, but clearly isn't
here it is:
$array [++$#array] = 'data';
My question is: what does it mean to pre-increment $#array? I always considered $#array to be an attribute of an array, and not writable.
perldata says:
"The length of an array is a scalar value. You may find the length of array #days by evaluating $#days , as in csh. However, this isn't the length of the array; it's the subscript of the last element, which is a different value since there is ordinarily a 0th element. Assigning to $#days actually changes the length of the array. Shortening an array this way destroys intervening values. Lengthening an array that was previously shortened does not recover values that were in those elements."
Modifying $#array is useful in some cases, but in this case, clearly push is better.
A post-increment will return the variable first and then increment it.
If you used post-increment you would be modifing the last element, since its returned first, and then pushing an empty element onto the end. On the second loop you would be modifing that empty value and pushing a new empty one for later. So it wouldn't work like a push at all.
The pre-increment will increment the variable and then return it. That way your example will always being writing to a new, last element of the array and work like push. Example below:
my (#pre, #post);
$pre[$#pre++] = '1';
$pre[$#pre++] = '2';
$pre[$#pre++] = '3';
$post[++$#post] = '1';
$post[++$#post] = '2';
$post[++$#post] = '3';
print "pre keys: ".#pre."\n";
print "pre: #pre\n";
print "post keys: ".#post."\n";
print "post: #post\n";
outputs:
pre keys: 3
pre: 2 3
post keys: 3
post: 1 2 3
Assigning a value larger than the current array length to $#array extends the array.
This code works too:
$ perl -le 'my #a; $a[#a]="Hello"; $a[#a]=" world!"; print #a'
Hello world!
Perl array is dynamic and grows when assign beyond limits.
First of all, that's foul.
That said, I'm also surprised that it works. I would have guessed that ++$#array would have gotten the "Can't modify constant" error you get when trying to increment a number. (Not that I ever accidentally do that, of course.) But, I guess that's exactly where we were wrong: $#array isn't a constant (a number); it's a variable expression. As such you can mess with it. Consider the following:
my #array = qw/1 2 3/;
++$#array;
$array[$#array] = qw/4/;
print "#array\n"
And even, for extra fun, this:
my #array = qw/1 2 3/;
$#array += 5;
foreach my $wtf (#array) {
if (defined $wtf) {
print "$wtf\n";
}
else {
print "undef\n";
}
}
And, yeah, the Perl Cookbook is happy to mess with $#array to grow or truncate arrays (Chapter 4, recipe 3). I still find it ugly, but maybe that's just a lingering "but it's a number" prejudice.