Why does Perl's strict mode allow you to dereference a variable with an undefined value in this foreach context but not in an assignment context? - perl

This code:
#!/usr/bin/perl
use 5.18.0;
use strict;
# Part 1
my $undef = undef;
print "1 $undef\n";
foreach my $index (@$undef) {
    print "unreachable with no crash\n";
}
print "2 $undef\n";
# Part 2
my $undef = undef;
my @array = @$undef;
print "unreachable with crash\n";
Outputs:
1
2 ARRAY(0x7faefa803ee8)
Can't use an undefined value as an ARRAY reference at /tmp/perlfile line 12.
Questions about Part 1:
1) Why does dereferencing $undef in Part 1 change $undef to an arrayref to an empty array?
2) Are there other contexts (other than a foreach) where dereferencing $undef would change it in the same way? What is the terminology to describe the most generic such case?
Questions about Part 2:
3) Why does dereferencing $undef in Part 2 fall afoul of strict?
4) Are there other contexts (other than assignment) where dereferencing $undef would fall afoul of strict? What is the terminology to describe the most generic such case?

1) for() in Perl puts its operand into "l-value context", therefore the $undef is being auto-vivified into existence as an array (reference) with zero elements (see this relatively similar question/answer regarding l-value context).
3) Because you're dereferencing an undefined value in r-value context, and that's illegal under strict refs (nothing gets auto-vivified in this context, so you're not magically creating a variable from nothing like you would be in an l-value operation).
As for questions 2 and 4, there are several other contexts, too many to think of off the top of my head. For 2, map() comes to mind, or any other operation that treats its operand as an l-value.

When you dereference an undefined variable in lvalue context, Perl will auto-vivify the reference and that which it references.
For example,
@$ref = qw( a b c );
means
@{ $ref //= [] } = qw( a b c );
When you dereference an undefined variable in rvalue context, Perl won't auto-vivify. Under strict refs, this is an error. Otherwise, the undefined value is stringified (with a warning) to the empty string, which is used as a symbolic reference.
For example,
no strict qw( refs ); my $ref; my @a = @$ref;
is equivalent to
no strict qw( refs ); my @a = @{""};
(Aside from the lack of warning for the latter.)
Lvalue context is provided to:
The left-hand-side argument of assignments. (This is the "L" in "lvalue".)
Arguments of sub and method calls (because of aliasing of elements of @_).
Foreach's list (because of aliasing of $_).
The operands of some named operators (e.g. map and grep, because of aliasing of $_).
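A minimal sketch of the contexts listed above, assuming a stock Perl 5.10 or later; `takes_args` is a hypothetical helper introduced only for illustration. Each undefined `$rN` is dereferenced once: the lvalue uses (assignment, foreach, sub arguments) autovivify an empty array, while the plain rvalue copy dies under `strict refs`.

```perl
use strict;
use warnings;

# Assignment: @$r0 = LIST behaves like @{ $r0 //= [] } = LIST.
my $r0;
@$r0 = qw( a b c );

# foreach aliases its loop variable to each element, so its list is in
# lvalue context and the undefined scalar is autovivified into [].
my $r1;
for my $x (@$r1) { }

# takes_args is a hypothetical helper; sub arguments are aliased via @_,
# so they are lvalue context as well.
sub takes_args { return scalar @_ }
my $r2;
takes_args(@$r2);

# A plain rvalue copy under strict refs dies instead of autovivifying.
my $r3;
my $ok  = eval { my @copy = @$r3; 1 };
my $err = $@;

print ref($r0), " ", scalar(@$r0), "\n";             # ARRAY 3
print ref($r1), " ", ref($r2), "\n";                 # ARRAY ARRAY
print defined($r3) ? "defined\n" : "still undef\n";  # still undef
print $ok ? "no error\n" : "died: $err";             # died: Can't use an undefined value as an ARRAY reference ...
```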

Related

How to check if LVALUE represents SCALAR

For years, I have been using code that contains the following condition:
ref \$_[0] eq 'SCALAR'
I always expect there an ARRAY or SCALAR, but recently I passed substr() into that parameter. Unexpected things happened. The condition returned a false value.
Then I figured it out. The ref returned LVALUE instead of SCALAR.
Since LVALUE is a weird type of reference, I need to check if a scalar is behind it.
How can I check is this LVALUE represents SCALAR or not?
An LVALUE is a special magical value that takes action when it is assigned to. So for example, calling f($hash{foo}) doesn't immediately create the foo entry in %hash; instead a temporary LVALUE is created and passed to the sub as $_[0]. If $_[0] is subsequently assigned to, that value gets stored as $hash{foo}.
An LVALUE is a scalar (i.e. it holds a single value), just an odd type of scalar.
So it's likely that the code can be fixed by just accepting both SCALAR and LVALUE as valid values. But it will depend on exactly why the check is done. It may also be that such a check is a logic error and is not actually needed, or is buggy. For example ref \$_[0] should never return ARRAY.
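A small sketch of both behaviours described above, assuming a stock Perl 5; `arg_type`, `read_arg` and `assign_arg` are hypothetical helpers, not part of the original code. It passes `substr()` the way the question does, accepts either SCALAR or LVALUE, and shows that a missing hash element is only created once `$_[0]` is assigned to. It deliberately avoids taking `\$_[0]` of the missing hash element, since creating a reference to it may itself create the element.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub arg_type   { return ref \$_[0] }                          # the question's check
sub read_arg   { return defined $_[0] ? "defined" : "undef" } # only reads $_[0]
sub assign_arg { $_[0] = 42; return }                         # writes through the alias

my $s = "hello";
my $t_substr = arg_type( substr( $s, 0, 1 ) );   # LVALUE
my $t_plain  = arg_type($s);                     # SCALAR

# The suggested fix: accept either kind of scalar.
my $accepted = ( $t_substr eq 'SCALAR' || $t_substr eq 'LVALUE' ) ? 1 : 0;

# A missing hash element passed as an argument isn't created by the call...
my %hash;
my $read    = read_arg( $hash{foo} );            # "undef"
my $created = exists $hash{foo} ? 1 : 0;         # 0
# ...but assigning to $_[0] stores the value in the hash.
assign_arg( $hash{foo} );

print "$t_substr $t_plain $accepted $read $created $hash{foo}\n";
# LVALUE SCALAR 1 undef 0 42
```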
Values that can be returned by reftype( $x ) (from Scalar::Util) in Perl 5.36:
| Value | Description | Can be dereferenced? |
|---|---|---|
| undef | The value of $x isn't a reference. | No |
| "ARRAY" | $x references an array (a scalar of type PVAV). | As an array |
| "HASH" | $x references a hash (a scalar of type PVHV). | As a hash |
| "CODE" | $x references a sub (a scalar of type PVCV). | As a sub |
| "GLOB" | $x references a glob (a scalar of type PVGV). | As a glob, scalar, array, hash or code |
| "LVALUE" | $x references a scalar of type PVLV, with some exceptions. | As a scalar |
| "REGEXP" | $x references a scalar that's a compiled regexp pattern (a scalar of type REGEXP). | As a scalar |
| "IO" | $x references a scalar that's an IO object (a scalar of type PVIO). | No |
| "FORMAT" | $x references a scalar of type PVFM. I don't know if it's possible to get a reference to a format. | No |
| "INVLIST" | $x references a scalar of type INVLIST. This is an internal type. You should never see this. | No |
| "REF" | $x references a scalar that contains a reference. | As a scalar |
| "VSTRING" | $x references a scalar that contains a version string. | As a scalar |
| "SCALAR" | $x references something else. | As a scalar |
Exceptions:
SCALAR is returned for reference to a tied PVLV.
REF is returned for reference to a PVLV containing a reference.
Warnings:
REF is returned for magical variables that currently contain a reference, and SCALAR for those that don't. But the value of magical variables can change every time you fetch it.
ref( $x ) is similar. It returns false (the special false scalar, not the string "false") instead of undef if the value of $x isn't a reference. If $x is a reference to something blessed, it returns the package into which it is blessed. Otherwise, it returns the same as reftype( $x ). I consider ref a broken mix of reftype and blessed.
I always expect there an ARRAY or SCALAR
reftype( \$_[0] ) will never be ARRAY.
ref( \$_[0] ) can return ARRAY, but only if you do something weird like my $x; bless( \$x, "ARRAY" ); f( $x ). (And that's why I consider it broken.)
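To see the difference between ref and reftype described above, here is a sketch that reproduces the "something weird" case, assuming Scalar::Util (a core module) is available; `describe` is a hypothetical helper.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype blessed );

# Hypothetical helper: report ref and reftype of a reference to the first argument.
sub describe { return ( ref \$_[0], reftype \$_[0] ) }

my $plain;
my ( $ref_plain, $type_plain ) = describe($plain);   # SCALAR, SCALAR

# The weird case: bless the scalar itself into a package named "ARRAY".
my $x;
bless \$x, "ARRAY";
my ( $ref_x, $type_x ) = describe($x);               # ARRAY, SCALAR
my $pkg = blessed \$x;                               # ARRAY

print "$ref_plain $type_plain $ref_x $type_x $pkg\n";
# SCALAR SCALAR ARRAY SCALAR ARRAY
```

ref reports the package name, while reftype still reports the underlying SCALAR, which is why reftype is the safer check.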
$_[0] is always going to be a scalar. Array elements can only be scalars. Arguments can only be scalars.
How can I check is this LVALUE represents SCALAR or not?
It's always a scalar.
In your case, you don't need reftype or ref to tell you that. Array elements are always scalars, so $_[0] is always going to be a scalar.
As per above, it's returned for scalars of type PVLV. It's a type of magical scalar. Most magical scalars are of type PVMG, but some types of magic take advantage of the extra fields provided by a PVLV scalar. PVLV scalars are returned by $h{EXPR} as an argument, pos as lvalue, substr as lvalue, vec as lvalue, and more.

How can $# be used on a dereferenced array without first using @?

An array in perl is dereferenced like so,
my @array = @{$array_reference};
When trying to assign an array to a dereference without the '@', like,
my @array = {$array_reference};
Perl throws the error, 'Odd number of elements in anonymous hash at ./sand.pl line 22.' We can't assign it to an array variable because Perl is confused about the type.
So how can we perform...
my $lastindex = $#{$array_reference};
if Perl struggles to understand that '{$array_reference}' is an array type? It would make more sense to me if this looked like,
my $lastindex = $#@{$array_reference};
(despite looking much uglier).
tl;dr: It's $#{$array_reference} to match the syntax of $#array.
{} is overloaded with many meanings and that's just how Perl is.
Sometimes {} creates an anonymous hash. That's what {$array_reference} is doing, trying to make a hash where the key is the stringification of $array_reference, something like "ARRAY(0x7fb21e803280)" and there is no value. Because you're trying to create a hash with a key and no value you get an "odd number of elements" warning.
Sometimes {...} is a block like sub { ... } or if(...) { ... }, or do {...} and so on.
Sometimes it's a bare block like { local $/; ... }.
Sometimes it's indicating the key of a hash like $hash{key} or $hash->{key}.
Preceded by certain sigils, {} makes dereferencing explicit. While you can write $#$array_reference or @$array_reference, sometimes you want to dereference something that isn't a simple scalar. For example, if you had a function that returned an array reference, you could get its last index in one line with $#{ get_array_reference() }. It's $#{$array_reference} to match the syntax of $#array.
$#{...} dereferences an array and gets its last index. @{...} dereferences an array. %{...} dereferences a hash. ${...} dereferences a scalar. *{...} dereferences a glob.
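A quick sketch of those forms; `get_array_reference` is a hypothetical function standing in for any call that returns an array reference. Note that $#{...} yields the last index, which is one less than the element count.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to a new three-element array.
sub get_array_reference { return [ qw( a b c ) ] }

my $array_reference = [ 10, 20, 30, 40 ];

my $last_index = $#{$array_reference};               # 3 (same as $#$array_reference)
my $count      = @{$array_reference};                # 4 (array in scalar context)
my $last_fn    = $#{ get_array_reference() };        # 2: the block can be any expression
my $count_fn   = scalar @{ get_array_reference() };  # 3

print "$last_index $count $last_fn $count_fn\n";     # 3 4 2 3
```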
You might find the section on Variable Names and Sigils in Modern Perl helpful to see the pattern better.
It would make more sense to me if this looked like...
There's a lot of things like that. Perl has been around since 1987. A lot of these design decisions were made decades ago. The code for deciding what {} means is particularly complex. That there is a distinction between an array and an array reference at all is a bit odd.
$array[$index]
@array[@indexes]
@array
$#array
is equivalent to
${ \@array }[$index]
@{ \@array }[@indexes]
@{ \@array }
$#{ \@array }
See the pattern? Wherever the NAME of an array is used, you can use a BLOCK that returns a reference to an array instead. That means you can use
${ $ref }[$index]
@{ $ref }[@indexes]
@{ $ref }
$#{ $ref }
This is illustrated in Perl Dereferencing Syntax.
Note that you can omit the curlies if the BLOCK contains nothing but a simple scalar.
$$ref[$index]
@$ref[@indexes]
@$ref
$#$ref
There's also an "arrow" syntax which is considered clearer.
$ref->[$index]
$ref->@[@indexes]
$ref->@*
$ref->$#*
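The following sketch checks that the four spellings in each group above agree, assuming Perl 5.24 or later, where postfix dereference (->@[...], ->$#*) is enabled without a feature pragma.

```perl
use v5.24;       # also enables strict and say; postfix deref is always on from 5.24
use warnings;

my @array   = qw( a b c d );
my $ref     = \@array;
my $index   = 2;
my @indexes = ( 1, 2 );

# Element: each form fetches 'c'
my @elements = ( $array[$index], ${ $ref }[$index], $$ref[$index], $ref->[$index] );

# Slice: each form fetches ('b', 'c'), joined here for easy comparison
my @slices = (
    join( ',', @array[@indexes] ),
    join( ',', @{ $ref }[@indexes] ),
    join( ',', @$ref[@indexes] ),
    join( ',', $ref->@[@indexes] ),
);

# Last index: each form gives 3
my @last = ( $#array, $#{ $ref }, $#$ref, $ref->$#* );

say "@elements";   # c c c c
say "@slices";     # b,c b,c b,c b,c
say "@last";       # 3 3 3 3
```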
Perl is confused about the type
Perl struggles to understand that '{$array_reference}' is an array type
Well, it's not an array type. Perl doesn't "struggle"; you just have wrong expectations.
The general rule (as explained in perldoc perlreftut) is: You can always use a reference in curly braces in place of a variable name.
Thus:
@array # a whole array
@{ $array_ref } # same thing with a reference
$array[$i] # an array element
${ $array_ref }[$i] # same thing with a reference
$#array # last index of an array
$#{ $array_ref } # same thing with a reference
On the other hand, what's going on with
my @array = {$array_reference};
is that you're using the syntax for a hash reference constructor, { LIST }. The warning occurs because the list in question is supposed to have an even number of elements (for keys and values):
my $hash_ref = {
key1 => 'value1',
key2 => 'value2',
};
What you wrote is treated as
my @array = ({
    $array_reference => undef,
});
i.e. an array containing a single element, which is a reference to a hash containing a single key, which is a stringified reference (and whose value is undef).
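To confirm that reading, here is a sketch that captures the warning with a $SIG{__WARN__} handler (an addition for the demo, not part of the original answer) and inspects what ends up in @array.

```perl
use strict;
use warnings;

my $array_reference = [ 1, 2, 3 ];

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# After "=", a bare { ... } is an anonymous hash constructor, not a dereference.
my @array = {$array_reference};

my $elements = @array;                  # 1
my $type     = ref $array[0];           # HASH
my ($key)    = keys %{ $array[0] };     # the stringified ref, e.g. "ARRAY(0x...)"
my $value    = $array[0]{$key};         # undef

print "$elements $type $key\n";
print $warnings[0];                     # Odd number of elements in anonymous hash at ...
```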
The syntactic difference between a dereference and a hashref constructor is that a dereference starts with a sigil (such as $, @, or %) whereas a hashref constructor starts with just a bare {.
Technically speaking the { } in the dereference syntax form an actual block of code:
print ${
    print "one\n";  # yeah, I just put a statement in the middle of an expression
    print "two\n";
    ["three"]       # the last expression in this block is implicitly returned
                    # (and dereferenced by the surrounding $ [0] construct outside)
}[0], "\n";
For (hopefully) obvious reasons, no one actually does this in real code.
The syntax is
my $lastindex = $#$array_reference;
which assigns to $lastindex the index of the last element of the anonymous array whose reference is stored in $array_reference.
The code
my @ary = { $ra }; # works but you get a warning
doesn't throw "an error" but rather a warning. In other words, you do get @ary with one element, a reference to an anonymous hash. However, a hash needs an even number of elements, so you also get a warning that this one doesn't have them.
Your last attempt dereferences the array with @{$array_reference}, which returns a list, not an array variable. A "list" is a fleeting collection of scalars in memory (think of copying scalars on the stack to go elsewhere); there is no notion of "index" for such a thing. For this reason $#@{$ra} isn't even parsed as intended and is a syntax error.
The syntax $#ary works only with a variable @ary, and then there is the $#$arrayref syntax. In general you can write $#{$arrayref}, since the curlies allow for an arbitrary expression that evaluates to an array reference, but there is no need for that here since you already have a variable holding an array reference.
I'd agree readily that much of this syntax takes some getting-used-to, to put it that way.

How do I define an anonymous scalar ref in Perl?

How do I properly define an anonymous scalar ref in Perl?
my $scalar_ref = ?;
my $array_ref = [];
my $hash_ref = {};
If you want a reference to some mutable storage, there's no particularly neat direct syntax for it. About the best you can manage is
my $var;
my $sref = \$var;
Or neater
my $sref = \my $var;
Or if you don't want the variable itself to be in scope any more, you can use a do block:
my $sref = do { \my $tmp };
At this point you can pass $sref around by value, and any mutations to the scalar it references will be seen by others.
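A sketch of that claim, assuming only core Perl; `set_through` is a hypothetical helper standing in for any code that receives $sref. The loop also shows that each evaluation of do { \my $tmp } hands back a separate scalar, so references made this way don't share storage.

```perl
use strict;
use warnings;

# Anonymous mutable scalar: $tmp's name is only visible inside the do block,
# but the returned reference keeps the storage alive.
my $sref = do { \my $tmp };

# Hypothetical helper: writes through whatever scalar ref it is given.
sub set_through { my ( $ref, $value ) = @_; $$ref = $value; return }

set_through( $sref, "Hello" );
print "$$sref\n";                              # Hello

# Each pass through the do block yields a fresh scalar.
my @refs;
for my $i ( 1 .. 3 ) {
    my $r = do { \my $tmp };
    $$r = $i;
    push @refs, $r;
}
print join( " ", map { $$_ } @refs ), "\n";    # 1 2 3
```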
This technique of course works just as well for array or hash references, just that there's neater syntax for doing that with [] and {}:
my $aref = do { \my @tmp }; ## same as my $aref = [];
my $href = do { \my %tmp }; ## same as my $href = {};
Usually you just declare and don't initialize it.
my $foo; # will be undef.
You have to consider that empty hash refs and empty array refs point to a data structure that has a representation. Both of them, when dereferenced, give you an empty list.
perldata says (emphasis mine):
There are actually two varieties of null strings (sometimes referred to as "empty" strings), a defined one and an undefined one. The defined version is just a string of length zero, such as "" . The undefined version is the value that indicates that there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array or hash. Although in early versions of Perl, an undefined scalar could become defined when first used in a place expecting a defined value, this no longer happens except for rare cases of autovivification as explained in perlref. You can use the defined() operator to determine whether a scalar value is defined (this has no meaning on arrays or hashes), and the undef() operator to produce an undefined value.
So an empty scalar (which it didn't actually say) would be undef. If you want it to be a reference, make it one.
use strict;
use warnings;
use Data::Printer;
my $scalar_ref = \undef;
my $scalar = $$scalar_ref;
p $scalar_ref;
p $scalar;
This will output:
\ undef
undef
However, as ikegami pointed out, it will be read-only because it's not a variable. LeoNerd provides a better approach for this in his answer.
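A short sketch of the read-only point, assuming core Perl only: writing through \undef dies, while a reference to a real variable (as in LeoNerd's \my $var) can be written through.

```perl
use strict;
use warnings;

# \undef refers to Perl's shared undef value, which is read-only.
my $undef_ref = \undef;
my $ok  = eval { $$undef_ref = "x"; 1 };
my $err = $@;

# A reference to a freshly declared variable is writable.
my $var_ref = \my $var;
$$var_ref = "x";

print $ok ? "writable\n" : "died: $err";   # died: Modification of a read-only value attempted at ...
print "$var\n";                            # x
```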
Anyway, my point is, an empty hash ref and an empty array ref when dereferenced both contain an empty list (). And that is not undef but nothing. But there is no nothing as a scalar value, because everything that is not nothing is a scalar value.
use feature 'say';
my $r = [];
say ref $r; # ARRAY
say scalar @$r; # 0
say "'@$r'"; # ''
So there is no real way to initialize with nothing. You can only not initialize. But Moose will turn it to undef anyway.
What you could do is make it a Maybe[ScalarRef], i.e. a scalar ref that may also be undef.
use strict;
use warnings;
use feature 'say';
use Data::Printer;
{
    package Foo;
    use Moose;
    has bar => (
        is        => 'rw',
        isa       => 'Maybe[ScalarRef]',
        predicate => 'has_bar'
    );
}
my $foo = Foo->new;
p $foo->has_bar;
p $foo;
say $foo->bar;
Output:
""
Foo {
Parents Moose::Object
public methods (3) : bar, has_bar, meta
private methods (0)
internals: {}
}
Use of uninitialized value in say at scratch.pl line 268.
The predicate gives a value that is not true (the empty string ""). undef is also not true. The people who made Moose decided to go with that, but it really doesn't matter.
Probably what you want is not to have a default value, but to make it a required ScalarRef.
Note that perlref doesn't say anything about initializing an empty scalar ref either.
I'm not entirely sure why you need to but I'd suggest:
my $ref = \undef;
print ref $ref;
Or perhaps:
my $ref = \0;
@LeoNerd's answer is spot on.
Another option is to use a temporary anonymous hash value:
my $scalar_ref = \{_=>undef}->{_};
$$scalar_ref = "Hello!\n";
print $$scalar_ref;

Perl dereferencing in non-strict mode

In Perl, if I have:
no strict;
@ARY = (58, 90);
To operate on an element of the array, say the 2nd one, I would write (possibly as part of a larger expression):
$ARY[1] # The most common way found in Perldoc's idioms.
Though, for some reason these also work:
@ARY[1]
@{ARY[1]}
Resulting all in the same object:
print (\$ARY[1]);
print (\@ARY[1]);
print (\@{ARY[1]});
Output:
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
What are the syntax rules that enable these sorts of constructs? How reliable would code be that uses each of these constructs, or a mix of all of them? How interchangeable are these expressions? (Always speaking in a non-strict context.)
To explain how I came to this question: I agree that "use strict" is the better practice, but I'm still interested in how non-strict expressions are built up.
In an attempt to find some help with this uneasiness, I came across:
The notion that "no strict;" doesn't complain about undeclared variables and quirky syntax.
The prefix dereference having higher precedence than subindex [] (perldsc § "Caveat on precedence").
The clarification on when to use @ instead of $ (perldata § "Slices").
The lack of a description of "[]" (array subscript / slice) among Perl's operators (perlop), which led me to think it is not an operator... (yet it has to be something. But what?).
From what I learned, none of these hints, even put together, help me understand my issue better.
Thanks in advance.
Quotation from perlfaq4:
What is the difference between $array[1] and @array[1]?
The difference is the sigil, that special character in front of the array name. The $ sigil means "exactly one item", while the @ sigil means "zero or more items". The $ gets you a single scalar, while the @ gets you a list.
Please see: What is the difference between $array[1] and @array[1]?
@ARY[1] is indeed a slice, in fact a slice of only one member. The difference is it creates a list context:
@ar1[0] = qw( a b c ); # List context.
$ar2[0] = qw( a b c ); # Scalar context, the last value is returned.
print "<@ar1> <@ar2>\n";
Output:
<a> <c>
Besides using strict, turn warnings on, too. You'll get the following warning:
Scalar value @ar1[0] better written as $ar1[0]
In perlop, you can read that "Perl's prefix dereferencing operators are typed: $, @, %, and &." The standard syntax is SIGIL { ... }, but in the simple cases, the curly braces can be omitted.
See Can you use string as a HASH ref while "strict refs" in use? for some fun with no strict refs and its emulation under strict.
Extending choroba's answer, to check a particular context, you can use wantarray
sub context { return wantarray ? "LIST" : "SCALAR" }
print $ary1[0] = context(), "\n";
print @ary1[0] = context(), "\n";
Outputs:
SCALAR
LIST
Nothing you did requires no strict; other than to hide your error of doing
@ARY = (58, 90);
when you should have done
my @ARY = (58, 90);
The following returns a single element of the array. Since EXPR is to return a single index, it is evaluated in scalar context.
$array[EXPR]
e.g.
my @array = qw( a b c d );
my $index = 2;
my $ele = $array[$index]; # my $ele = 'c';
The following returns the elements identified by LIST. Since LIST is to return 0 or more elements, it must be evaluated in list context.
@array[LIST]
e.g.
my @array = qw( a b c d );
my @indexes = ( 1, 2 );
my @slice = @array[@indexes]; # my @slice = qw( b c );
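A runnable version of the two forms above, assuming core Perl only. It also shows what happens if $ is used with a list of indexes by mistake: the subscript is evaluated in scalar context, so @indexes contributes its element count.

```perl
use strict;
use warnings;

my @array   = qw( a b c d );
my @indexes = ( 1, 2 );

# @array[LIST]: the subscript is in list context, so both indexes are used.
my @slice = @array[@indexes];   # ( 'b', 'c' )

# $array[EXPR]: the subscript is in scalar context, so @indexes yields
# its element count (2), which selects 'c'.
my $ele = $array[@indexes];     # 'c'

print "@slice | $ele\n";        # b c | c
```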
\( $ARY[$index] ) # Returns a ref to the element returned by $ARY[$index]
\( @ARY[@indexes] ) # Returns refs to each element returned by @ARY[@indexes]
${foo} # Weird way of writing $foo. Useful in literals, e.g. "${foo}bar"
@{foo} # Weird way of writing @foo. Useful in literals, e.g. "@{foo}bar"
${foo}[...] # Weird way of writing $foo[...].
Most people don't even know you can use these outside of string literals.
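A sketch of the last two groups above, assuming core Perl only: \(...) over a slice returns one reference per selected element, and the braced names behave like the plain ones, both inside and outside string literals.

```perl
use strict;
use warnings;

my @ARY     = ( 58, 90 );
my @indexes = ( 0, 1 );

# One reference per element; the second points at the same scalar as \$ARY[1].
my @refs = \( @ARY[@indexes] );
my $same = ( $refs[1] == \$ARY[1] ) ? 1 : 0;   # 1

my $foo = "x";
my @foo = ( 10, 20 );
my $s1  = "${foo}bar";   # "xbar" (without braces Perl would look for $foobar)
my $s2  = "@{foo}bar";   # "10 20bar"
my $e   = ${foo}[1];     # same as $foo[1], i.e. 20

print scalar(@refs), " $same | $s1 | $s2 | $e\n";   # 2 1 | xbar | 10 20bar | 20
```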

What does `$hash{$key} |= {}` do in Perl?

I was wrestling with some Perl that uses hash references.
In the end it turned out that my problem was the line:
$myhash{$key} |= {};
That is, "assign $myhash{$key} a reference to an empty hash, unless it already has a value".
Dereferencing this and trying to use it as a hash reference, however, resulted in interpreter errors about using a string as a hash reference.
Changing it to:
if ( ! exists $myhash{$key} ) {
    $myhash{$key} = {};
}
... made things work.
So I don't have a problem. But I'm curious about what was going on.
Can anyone explain?
The reason you're seeing an error about using a string as a hash reference is because you're using the wrong operator. |= means "bitwise-or-assign." In other words,
$foo |= $bar;
is the same as
$foo = $foo | $bar;
What's happening in your example is that your new anonymous hash reference is getting stringified, then bitwise-ORed with the value of $myhash{$key}. To confuse matters further, if $myhash{$key} is undefined at the time, the value is the simple stringification of the hash reference, which looks like HASH(0x80fc284). So if you do a cursory inspection of the structure, it may look like a hash reference, but it's not. Here's some useful output via Data::Dumper:
perl -MData::Dumper -le '$hash{foo} |= { }; print Dumper \%hash'
$VAR1 = {
'foo' => 'HASH(0x80fc284)'
};
And here's what you get when you use the correct operator:
perl -MData::Dumper -le '$hash{foo} ||= { }; print Dumper \%hash'
$VAR1 = {
'foo' => {}
};
Perl has shorthand assignment operators. The ||= operator is often used to set default values for variables due to Perl's feature of having logical operators return the last value evaluated. The problem is that you used |= which is a bitwise or instead of ||= which is a logical or.
As of Perl 5.10 it's better to use //= instead. // is the logical defined-or operator and doesn't fail in the corner case where the current value is defined but false.
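A sketch comparing the three operators discussed above, assuming Perl 5.10 or later for //= and no use feature 'bitwise' (which changes how | treats strings). It reproduces the Data::Dumper finding: |= leaves a string that merely looks like a hash reference, and using it as one fails under strict refs.

```perl
use strict;
use warnings;

my %h;

# |= is bitwise-or assignment: the new hashref is stringified and OR-ed
# with the (undef) current value, leaving a plain string behind.
$h{bitwise} |= {};

# ||= assigns only if the current value is false; //= only if it is undef.
$h{logical} ||= {};
$h{zero}     = 0;
$h{zero}    //= {};   # 0 is defined, so it is kept
$h{zero_or}  = 0;
$h{zero_or} ||= {};   # 0 is false, so it is replaced

# Using the string as a hash reference fails under strict refs.
my $ok  = eval { my $v = $h{bitwise}{key}; 1 };
my $err = $@;

print ref( $h{bitwise} ) ? "ref\n" : "string: $h{bitwise}\n";     # string: HASH(0x...)
print ref( $h{logical} ), " $h{zero} ", ref( $h{zero_or} ), "\n";  # HASH 0 HASH
print $ok ? "no error\n" : "died: $err";  # died: Can't use string ("HASH(0x...)") as a HASH ref ...
```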
I think your problem was using "|=" (bitwise-or assignment) instead of "||=" (assign if false).
Note that your new code is not exactly equivalent. The difference is that "$myhash{$key} ||= {}" will replace existing-but-false values with a hash reference, but the new one won't. In practice, this is probably not relevant.
Try this:
my %myhash;
$myhash{$key} ||= {};
You can't declare a hash element in a my clause, as far as I know. You declare the hash first, then add the element in.
Edit: I see you've taken out the my. How about trying ||= instead of |=? The former is idiomatic for "lazy" initialisation.