What does `$hash{$key} |= {}` do in Perl? - perl

I was wrestling with some Perl that uses hash references.
In the end it turned out that my problem was the line:
$myhash{$key} |= {};
That is, "assign $myhash{$key} a reference to an empty hash, unless it already has a value".
Dereferencing this and trying to use it as a hash reference, however, resulted in interpreter errors about using a string as a hash reference.
Changing it to:
if( ! exists $myhash{$key}) {
$myhash{$key} = {};
}
... made things work.
So I don't have a problem. But I'm curious about what was going on.
Can anyone explain?

The reason you're seeing an error about using a string as a hash reference is because you're using the wrong operator. |= means "bitwise-or-assign." In other words,
$foo |= $bar;
is the same as
$foo = $foo | $bar
What's happening in your example is that your new anonymous hash reference is getting stringified, then bitwise-ORed with the value of $myhash{$key}. To confuse matters further, if $myhash{$key} is undefined at the time, the value is the simple stringification of the hash reference, which looks like HASH(0x80fc284). So if you do a cursory inspection of the structure, it may look like a hash reference, but it's not. Here's some useful output via Data::Dumper:
perl -MData::Dumper -le '$hash{foo} |= { }; print Dumper \%hash'
$VAR1 = {
'foo' => 'HASH(0x80fc284)'
};
And here's what you get when you use the correct operator:
perl -MData::Dumper -le '$hash{foo} ||= { }; print Dumper \%hash'
$VAR1 = {
'foo' => {}
};

Perl has shorthand assignment operators. The ||= operator is often used to set default values for variables due to Perl's feature of having logical operators return the last value evaluated. The problem is that you used |= which is a bitwise or instead of ||= which is a logical or.
As of Perl 5.10 it's better to use //= instead. // is the logical defined-or operator and doesn't fail in the corner case where the current value is defined but false.

I think your problem was using "|=" (bitwise-or assignment) instead of "||=" (assign if false).
Note that your new code is not exactly equivalent. The difference is that "$myhash{$key} ||= {}" will replace existing-but-false values with a hash reference, but the new one won't. In practice, this is probably not relevant.

Try this:
my %myhash;
$myhash{$key} ||= {};
You can't declare a hash element in a my clause, as far as I know. You declare the hash first, then add the element in.
Edit: I see you've taken out the my. How about trying ||= instead of |=? The former is idiomatic for "lazy" initialisation.

Related

How can #? be used on a dereferenced array without first using #?

An array in perl is dereferenced like so,
my #array = #{$array_reference};
When trying to assign an array to a dereference without the '#', like,
my #array = {$array_reference};
Perl throws the error, 'Odd number of elements in anonymous hash at ./sand.pl line 22.' We can't assign it to an array variable becauase Perl is confused about the type.
So how can we perform...
my $lastindex = $#{$array_reference};
if Perl struggles to understand that '{$array_reference}' is an array type? It would make more sense to me if this looked like,
my $lastindex = $##{$array_reference};
(despite looking much uglier).
tl;dr: It's $#{$array_reference} to match the syntax of $#array.
{} is overloaded with many meanings and that's just how Perl is.
Sometimes {} creates an anonymous hash. That's what {$array_reference} is doing, trying to make a hash where the key is the stringification of $array_reference, something like "ARRAY(0x7fb21e803280)" and there is no value. Because you're trying to create a hash with a key and no value you get an "odd number of elements" warning.
Sometimes {...} is a block like sub { ... } or if(...) { ... }, or do {...} and so on.
Sometimes it's a bare block like { local $/; ... }.
Sometimes it's indicating the key of a hash like $hash{key} or $hash->{key}.
Preceeded with certain sigils {} makes dereferencing explicit. While you can write $#$array_reference or #$array_reference sometimes you want to dereference something that isn't a simple scalar. For example, if you had a function that returned an array reference you could get its size in one line with $#{ get_array_reference() }. It's $#{$array_reference} to match the syntax of $#array.
$#{...} dereferences an array and gets the index. #{...} dereferences an array. %{...} dereferences a hash. ${...} dereferences a scalar. *{...} dereferences a glob.
You might find the section on Variable Names and Sigils in Modern Perl helpful to see the pattern better.
It would make more sense to me if this looked like...
There's a lot of things like that. Perl has been around since 1987. A lot of these design decisions were made decades ago. The code for deciding what {} means is particularly complex. That there is a distinction between an array and an array reference at all is a bit odd.
$array[$index]
#array[#indexes]
#array
$#array
is equivalent to
${ \#array }[$index]
#{ \#array }[#indexes]
#{ \#array }
$#{ \#array }
See the pattern? Wherever the NAME of an array isused, you can use a BLOCK that returns a reference to an array instead. That means you can use
${ $ref }[$index]
#{ $ref }[#indexes]
#{ $ref }
$#{ $ref }
This is illustrated in Perl Dereferencing Syntax.
Note that you can omit the curlies if the BLOCK contains nothing but a simple scalar.
$$ref[$index]
#$ref[#indexes]
#$ref
$#$ref
There's also an "arrow" syntax which is considered clearer.
$ref->[$index]
$ref->#[#indexes]
$ref->#*
$ref->$#*
Perl is confused about the type
Perl struggles to understand that '{$array_reference}' is an array type
Well, it's not an array type. Perl doesn't "struggle"; you just have wrong expectations.
The general rule (as explained in perldoc perlreftut) is: You can always use a reference in curly braces in place of a variable name.
Thus:
#array # a whole array
#{ $array_ref } # same thing with a reference
$array[$i] # an array element
${ $array_ref }[$i] # same thing with a reference
$#array # last index of an array
$#{ $array_ref } # same thing with a reference
On the other hand, what's going on with
my #array = {$array_reference};
is that you're using the syntax for a hash reference constructor, { LIST }. The warning occurs because the list in question is supposed to have an even number of elements (for keys and values):
my $hash_ref = {
key1 => 'value1',
key2 => 'value2',
};
What you wrote is treated as
my #array = ({
$array_reference => undef,
});
i.e. an array containing a single element, which is a reference to a hash containing a single key, which is a stringified reference (and whose value is undef).
The syntactic difference between a dereference and a hashref constructor is that a dereference starts with a sigil (such as $, #, or %) whereas a hashref constructor starts with just a bare {.
Technically speaking the { } in the dereference syntax form an actual block of code:
print ${
print "one\n"; # yeah, I just put a statement in the middle of an expression
print "two\n";
["three"] # the last expression in this block is implicitly returned
# (and dereferenced by the surrounding $ [0] construct outside)
}[0], "\n";
For (hopefully) obvious reasons, no one actually does this in real code.
The syntax is
my $lastindex = $#$array_reference;
which assigns to $lastindex the index of the last element of the anonymous array which reference is in the variable $array_reference.
The code
my #ary = { $ra }; # works but you get a warning
doesn't throw "an error" but rather a warning. In other words, you do get #ary with one element, a reference to an anonymous hash. However, a hash need have an even number of elements so you also get a warning that that isn't so.
Your last attempt dereferences the array with #{$array_reference} -- which returns a list, not an array variable. A "list" is a fleeting collection of scalars in memory (think of copying scalars on stack to go elsewhere); there is no notion of "index" for such a thing. For this reason a $##{$ra} isn't even parsed as intended and is a syntax error.
The syntax $#ary works only with a variable #ary, and then there is the $#$arrayref syntax. You can in general write $#{$arrayref} since the curlies allow for an arbitrary expression that evaluates to an array reference but there is no reason for that since you do have a variable with an array reference.
I'd agree readily that much of this syntax takes some getting-used-to, to put it that way.

How do I define an anonymous scalar ref in Perl?

How do I properly define an anonymous scalar ref in Perl?
my $scalar_ref = ?;
my $array_ref = [];
my $hash_ref = {};
If you want a reference to some mutable storage, there's no particularly neat direct syntax for it. About the best you can manage is
my $var;
my $sref = \$var;
Or neater
my $sref = \my $var;
Or if you don't want the variable itself to be in scope any more, you can use a do block:
my $sref = do { \my $tmp };
At this point you can pass $sref around by value, and any mutations to the scalar it references will be seen by others.
This technique of course works just as well for array or hash references, just that there's neater syntax for doing that with [] and {}:
my $aref = do { \my #tmp }; ## same as my $aref = [];
my $href = do { \my %tmp }; ## same as my $href = {};
Usually you just declare and don't initialize it.
my $foo; # will be undef.
You have to consider that empty hash refs and empty array refs point to a data structure that has a representation. Both of them, when dereferenced, give you an empty list.
perldata says (emphasis mine):
There are actually two varieties of null strings (sometimes referred to as "empty" strings), a defined one and an undefined one. The defined version is just a string of length zero, such as "" . The undefined version is the value that indicates that there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array or hash. Although in early versions of Perl, an undefined scalar could become defined when first used in a place expecting a defined value, this no longer happens except for rare cases of autovivification as explained in perlref. You can use the defined() operator to determine whether a scalar value is defined (this has no meaning on arrays or hashes), and the undef() operator to produce an undefined value.
So an empty scalar (which it didn't actually say) would be undef. If you want it to be a reference, make it one.
use strict;
use warnings;
use Data::Printer;
my $scalar_ref = \undef;
my $scalar = $$scalar_ref;
p $scalar_ref;
p $scalar;
This will output:
\ undef
undef
However, as ikegami pointed out, it will be read-only because it's not a variable. LeoNerd provides a better approach for this in his answer.
Anyway, my point is, an empty hash ref and an empty array ref when dereferenced both contain an empty list (). And that is not undef but nothing. But there is no nothing as a scalar value, because everything that is not nothing is a scalar value.
my $a = [];
say ref $r; # ARRAY
say scalar #$r; # 0
say "'#$r'"; # ''
So there is no real way to initialize with nothing. You can only not initialize. But Moose will turn it to undef anyway.
What you could do is make it maybe a scalar ref.
use strict;
use warnings;
use Data::Printer;
{
package Foo;
use Moose;
has bar => (
is => 'rw',
isa => 'Maybe[ScalarRef]',
predicate => 'has_bar'
);
}
my $foo = Foo->new;
p $foo->has_bar;
p $foo;
say $foo->bar;
Output:
""
Foo {
Parents Moose::Object
public methods (3) : bar, has_bar, meta
private methods (0)
internals: {}
}
Use of uninitialized value in say at scratch.pl line 268.
The predicate gives a value that is not true (the empty string ""). undef is also not true. The people who made Moose decided to go with that, but it really doesn't matter.
Probably what you want is not have a default value, but just make it a ScalarRef an required.
Note that perlref doesn't say anything about initializing an empty scalar ref either.
I'm not entirely sure why you need to but I'd suggest:
my $ref = \undef;
print ref $ref;
Or perhaps:
my $ref = \0;
#LeoNerd's answer is spot on.
Another option is to use a temporary anonymous hash value:
my $scalar_ref = \{_=>undef}->{_};
$$scalar_ref = "Hello!\n";
print $$scalar_ref;

Perl dereferencing in non-strict mode

In Perl, if I have:
no strict;
#ARY = (58, 90);
To operate on an element of the array, say it, the 2nd one, I would write (possibly as part of a larger expression):
$ARY[1] # The most common way found in Perldoc's idioms.
Though, for some reason these also work:
#ARY[1]
#{ARY[1]}
Resulting all in the same object:
print (\$ARY[1]);
print (\#ARY[1]);
print (\#{ARY[1]});
Output:
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
What is the syntax rules that enable this sort of constructs? How far could one devise reliable program code with each of these constructs, or with a mix of all of them either? How interchangeable are these expressions? (always speaking in a non-strict context).
On a concern of justifying how I come into this question, I agree "use strict" as a better practice, still I'm interested at some knowledge on build-up non-strict expressions.
In an attemp to find myself some help to this uneasiness, I came to:
The notion on "no strict;" of not complaining about undeclared
variables and quirk syntax.
The prefix dereference having higher precedence than subindex [] (perldsc § "Caveat on precedence").
The clarification on when to use # instead of $ (perldata § "Slices").
The lack of "[]" (array subscript / slice) description among the Perl's operators (perlop), which lead me to think it is not an
operator... (yet it has to be something else. But, what?).
For what I learned, none of these hints, put together, make me better understand my issue.
Thanks in advance.
Quotation from perlfaq4:
What is the difference between $array[1] and #array[1]?
The difference is the sigil, that special character in front of the array name. The $ sigil means "exactly one item", while the # sigil means "zero or more items". The $ gets you a single scalar, while the # gets you a list.
Please see: What is the difference between $array[1] and #array[1]?
#ARY[1] is indeed a slice, in fact a slice of only one member. The difference is it creates a list context:
#ar1[0] = qw( a b c ); # List context.
$ar2[0] = qw( a b c ); # Scalar context, the last value is returned.
print "<#ar1> <#ar2>\n";
Output:
<a> <c>
Besides using strict, turn warnings on, too. You'll get the following warning:
Scalar value #ar1[0] better written as $ar1[0]
In perlop, you can read that "Perl's prefix dereferencing operators are typed: $, #, %, and &." The standard syntax is SIGIL { ... }, but in the simple cases, the curly braces can be omitted.
See Can you use string as a HASH ref while "strict refs" in use? for some fun with no strict refs and its emulation under strict.
Extending choroba's answer, to check a particular context, you can use wantarray
sub context { return wantarray ? "LIST" : "SCALAR" }
print $ary1[0] = context(), "\n";
print #ary1[0] = context(), "\n";
Outputs:
SCALAR
LIST
Nothing you did requires no strict; other than to hide your error of doing
#ARY = (58, 90);
when you should have done
my #ARY = (58, 90);
The following returns a single element of the array. Since EXPR is to return a single index, it is evaluated in scalar context.
$array[EXPR]
e.g.
my #array = qw( a b c d );
my $index = 2;
my $ele = $array[$index]; # my $ele = 'c';
The following returns the elements identified by LIST. Since LIST is to return 0 or more elements, it must be evaluated in list context.
#array[LIST]
e.g.
my #array = qw( a b c d );
my #indexes ( 1, 2 );
my #slice = $array[#indexes]; # my #slice = qw( b c );
\( $ARY[$index] ) # Returns a ref to the element returned by $ARY[$index]
\( #ARY[#indexes] ) # Returns refs to each element returned by #ARY[#indexes]
${foo} # Weird way of writing $foo. Useful in literals, e.g. "${foo}bar"
#{foo} # Weird way of writing #foo. Useful in literals, e.g. "#{foo}bar"
${foo}[...] # Weird way of writing $foo[...].
Most people don't even know you can use these outside of string literals.

Why does Perl think that a non-existent multi-level hash element is there?

Sorry, this seems like such a basic question but I still don't understand. If I have a hash, for example:
my %md_hash = ();
$md_hash{'top'}{'primary'}{'secondary'} = 0;
How come this is true?
if ($md_hash{'top'}{'foobar'}{'secondary'} == 0) {
print "I'm true even though I'm not in that hash\n";
}
There is no "foobar" level in the hash so shouldn't that result in false?
TIA
This isn't a multidimensional hash specific question.
It works the same with
my %foo;
if ( $foo{'bar'} == 0 ) {
print "I'm true even though I'm not in that hash\n";
}
$foo{'bar'} is undef, which compares true to 0, albeit with a warning if you have warnings enabled as you should.
There is an additional side effect in your case; when you say
my %md_hash = ();
$md_hash{'top'}{'primary'}{'secondary'} = 0;
if ( $md_hash{'top'}{'foobar'}{'secondary'} == 0 ) {
print "I'm true even though I'm not in that hash\n";
}
$md_hash{'top'} returns a hash reference, and the 'foobar' key is looked for in that hash. Because of the {'secondary'}, that 'foobar' element lookup is in hash-dereference context. This makes $md_hash{'top'}{'foobar'} "autovivify" a hash reference as the value of the 'foobar' key, leaving you with this structure:
my %md_hash = (
'top' => {
'primary' => {
'secondary' => 0,
},
'foobar' => {},
},
);
The autovivification pragma can be used to disable this behavior. People sometimes assert that exists() has some effect on autovifification, but this is not true.
You are testing an undefined value for numeric zero. Of course you get a true result! What were you expecting?
You should also get a warning under use warnings. Why didn’t you?
If you do not start a program with:
use v5.12; # or whatever it is you are using
use strict;
use warnings;
you really shouldn’t even bother. :)
EDIT
NB: I am only clarifying for correctness, because the comment lines are not good for that. I really could not possibly care one whingeing whit less about the reputation points. I just want people to understand.
Even under the CPAN autovivification module, nothing changes. Witness:
use v5.10;
use strict;
use warnings;
no autovivification;
my %md_hash = ();
$md_hash{top}{primary}{secondary} = 0;
if ($md_hash{top}{foobar}{secondary} == 0) {
say "yup, that was zero.";
}
When run, that says:
$ perl /tmp/demo
Use of uninitialized value in numeric eq (==) at /tmp/demo line 10.
yup, that was zero.
The test is the == operator. Its RHS is 0. Its LHS is undef irrespective of autovivification. Since undef is numerically 0, that means that both LHS and RHS contain 0, which the == correctly identifies as holding the same number.
Autovivification is not the issue here, as ysth correctly observes when he writes that “This isn’t a multidimensional hash specific question.” All that matters is what you pass to ==. And undef is numerically 0.
You can stop autoviv if you really, really want to — by using the CPAN pragma. But you will not ever manage to change what happens here by suppressing autoviv. That shows that it is not an autoviv matter at all, just an undef one.
Now, you will get “extra” keys when you do this, since the undefined lvalue will get filled in on the way through the dereference chain. These are neccesarily all the same:
$md_hash{top}{foobar}{secondary} # implicit arrow for infix deref
$md_hash{top}->{foobar}->{secondary} # explicit arrow for infix deref
${ ${ $md_hash{top} }{foobar} }{secondary} # explicit prefix deref
Whenever you deref an lvaluable undef in Perl, that storage location always gets filled in with the proper sort of reference to a newly allocated anonymous referent of the proper type. In short, it autovivs.
And that you can stop either by suppressing or else by sidestepping the autoviv. However, denying the autoviv is not the same as sidestepping it, because you just change what sort of thing gets returned. The overall expession is still fully evaluated: there is no automatic short-circuiting just because you suppress autoviv. That’s because autoviv is not the problem (and if it were not there, you would be really annoyed: trust me).
If you want short-circuiting, you have to write that yourself. I never seem to need to myself. In Perl, that is. On the other hand, C programmers are quite accustomed to writing
if (p && p->whatever) { ... }
And so you can do that, too, if you want to. However, it is pretty rare in my own experience. You almost have to bend over wrongwards in Perl for that ever to make a difference, as it is quite easy to arrange for your code not to change how it acts if there are empty levels.
Try a search on "Perl autovivification".
The hash values "spring into existence" when you first access them. In this case, the value is undef, which when interpreted as a number is zero.
To test for existence of a hash value without auto-vivifying it, use the exists operator:
if (exists $md_hash{'top'}{'foobar'}{'secondary'}
&& $md_hash{'top'}{'foobar'}{'secondary'} == 0) {
print "I exist and I am zero\n";
}
Note that this will still auto-vivify $md_hash{'top'} and
$md_hash{'top'}{'foobar'} (i.e. the sub-hashes).
[edit]
As tchrist points out in a comment, it is poor style to compare undef against anything. So a better way to write this code would be:
if (defined $md_hash{'top'}{'foobar'}{'secondary'}
&& $md_hash{'top'}{'foobar'}{'secondary'} == 0) {
print "I exist and I am zero\n";
}
(Although this will now auto-vivify all three levels of the nested hash, setting the lowest level to undef'.)

Confusion about proper usage of dereference in Perl

I noticed the other day that - while altering values in a hash - that when you dereference a hash in Perl, you actually are making a copy of that hash. To confirm I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows its probably not that big of a deal (in terms of the need to go fix in all of my scripts - but important going forward). I think its going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and # when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
#hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
#$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
#{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
#{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my #array = #$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.