How to check if LVALUE represents SCALAR - perl

For years, I have been using code that contains the following condition:
ref \$_[0] eq 'SCALAR'
I always expected an ARRAY or SCALAR there, but recently I passed substr() as that parameter, and unexpected things happened: the condition returned a false value.
Then I figured it out. The ref returned LVALUE instead of SCALAR.
Since LVALUE is a weird type of reference, I need to check if a scalar is behind it.
How can I check whether this LVALUE represents a SCALAR or not?

An LVALUE is a special magical value that takes action when it is assigned to. So for example, calling f($hash{foo}) doesn't immediately create the foo entry in %hash; instead a temporary LVALUE is created and passed to the sub as $_[0]. If $_[0] is subsequently assigned to, that value gets stored as $hash{foo}.
An LVALUE is a scalar (i.e. it holds a single value), just an odd type of scalar.
So it's likely that the code can be fixed by just accepting both SCALAR and LVALUE as valid values. But it will depend on exactly why the check is done. It may also be that such a check is a logic error and is not actually needed, or is buggy. For example ref \$_[0] should never return ARRAY.

Values that can be returned by reftype( $x ) (from Scalar::Util) in Perl 5.36:
| Value | Description | Can be dereferenced? |
|---|---|---|
| undef | The value of $x isn't a reference. | No |
| "ARRAY" | $x references an array (a scalar of type PVAV). | As an array |
| "HASH" | $x references a hash (a scalar of type PVHV). | As a hash |
| "CODE" | $x references a sub (a scalar of type PVCV). | As a sub |
| "GLOB" | $x references a glob (a scalar of type PVGV). | As a glob, scalar, array, hash or code |
| "LVALUE" | $x references a scalar of type PVLV, with some exceptions. | As a scalar |
| "REGEXP" | $x references a scalar that's a compiled regexp pattern (a scalar of type REGEXP). | As a scalar |
| "IO" | $x references a scalar that's an IO object (a scalar of type PVIO). | No |
| "FORMAT" | $x references a scalar of type PVFM. I don't know if it's possible to get a reference to a format. | No |
| "INVLIST" | $x references a scalar of type INVLIST. This is an internal type. You should never see this. | No |
| "REF" | $x references a scalar that contains a reference. | As a scalar |
| "VSTRING" | $x references a scalar that contains a version string. | As a scalar |
| "SCALAR" | $x references something else. | As a scalar |
Exceptions:
SCALAR is returned for reference to a tied PVLV.
REF is returned for reference to a PVLV containing a reference.
Warnings:
REF is returned for magical variables that currently contain a reference, and SCALAR for those that don't. But the value of magical variables can change every time you fetch it.
ref( $x ) is similar. It returns false (the special scalar, not the string false) instead of undef if the value of $x isn't a reference. If $x is a reference to something blessed, it returns the package into which it is blessed. Otherwise, it returns the same as reftype( $x ). I consider ref a broken mix of reftype and blessed.
I always expect there an ARRAY or SCALAR
reftype( \$_[0] ) will never be ARRAY.
ref( \$_[0] ) can return ARRAY, but only if you do something weird like my $x; bless( \$x, "ARRAY" ); f( $x ). (And that's why I consider it broken.)
$_[0] is always going to be a scalar. Array elements can only be scalars. Arguments can only be scalars.
How can I check is this LVALUE represents SCALAR or not?
It's always a scalar.
In your case, you don't need reftype or ref to tell you that. The values of array elements are always scalars, so $_[0] is always going to be a scalar.
As per above, it's returned for scalars of type PVLV. It's a type of magical scalar. Most magical scalars are of type PVMG, but some types of magic take advantage of the extra fields provided by a PVLV scalar. PVLV scalars are returned by $h{EXPR} as an argument, pos as lvalue, substr as lvalue, vec as lvalue, and more.
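As a minimal sketch of the suggestion above (accept both SCALAR and LVALUE), the following checks the type of whatever was passed as the first argument. The sub names kind_of_arg and accepts_scalar_arg are hypothetical, chosen only for illustration; whether accepting LVALUE is right for your code depends on why the check exists in the first place.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Hypothetical helper: report what kind of scalar the first argument is.
sub kind_of_arg {
    return reftype( \$_[0] );
}

# Hypothetical helper: treat ordinary scalars and magical LVALUE scalars alike.
sub accepts_scalar_arg {
    my $type = reftype( \$_[0] );
    return $type eq 'SCALAR' || $type eq 'LVALUE';
}

my $str = "hello world";

print kind_of_arg( $str ), "\n";                    # SCALAR
print kind_of_arg( substr( $str, 0, 5 ) ), "\n";    # LVALUE
print accepts_scalar_arg( substr( $str, 0, 5 ) ) ? "accepted\n" : "rejected\n";   # accepted
```

Using reftype rather than ref here avoids being misled by blessed scalars, as described in the answer above.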

Related

perl assign reference to subroutine

I use @_ in a subroutine to get a parameter that is passed as a reference to an array, but the result does not show up as an array reference.
My code is below.
my @aar = (9,8,7,6,5);
my $ref = \@aar;
AAR($ref);
sub AAR {
    my $ref = @_;
    print "ref = $ref";
}
This prints 1, not an array reference, but if I replace @_ with shift, the printed result is a reference.
Can anyone explain why I can't get a reference using @_?
This is about context in Perl. It is a crucial aspect of the language.
An expression like
my $var = @ary;
attempts to assign an array to a scalar.
That doesn't make sense as it stands and what happens is that the right-hand side is evaluated to the number of elements of the array and that is assigned to $var.
In order to change that behavior you need to provide the "list context" to the assignment operator.† In this case you'd do
my ($var) = @ary;
and now we have an assignment of a list (of array elements) to a list (of variables, here only $var), where they are assigned one for one. So here the first element of @ary is assigned to $var. Please note that this statement plays loose with the elusive notion of the "list."
So in your case you want
my ($ref) = @_;
and the first element from @_ is assigned to $ref, as needed.
Alternatively, you can remove and return the first element of @_ using shift, in which case the scalar-context assignment is fine
my $ref = shift @_;
In this case you can also do
my $ref = shift;
since shift by default works on @_.
This is useful when you want to remove the first element of input as it's being assigned so that the remaining @_ is well suited for further processing. It is often done in object-oriented code.
It is well worth pointing out that many operators and builtin facilities in Perl act differently depending on what context they are invoked in.
For some specifics, just a few examples: the regex match operator returns true/false (1/empty string) in scalar context but the actual matches in list context,‡ readdir returns a single entry in scalar context but all of them in list context, while localtime shows a bit more distinct difference. This context-sensitive behavior is in every corner of Perl.
User level subroutines can be made to behave that way via wantarray.
† See Scalar vs List Assignment Operator for a detailed discussion.
‡ See it in perlretut and in perlop, for instance.
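To make the context rules above concrete, here is a small sketch. The sub name context_name is hypothetical; it only reports what wantarray sees at the call.

```perl
use strict;
use warnings;

my @ary = (9, 8, 7);

my $count   = @ary;    # scalar context: the number of elements
my ($first) = @ary;    # list context: the first element

# Hypothetical sub that reports the context it was called in.
sub context_name {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

my $in_scalar   = context_name();
my ($in_list)   = context_name();

print "$count $first $in_scalar $in_list\n";    # 3 9 scalar list
```

The only difference between the two assignments in each pair is the parentheses around the variable on the left-hand side.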
When you assign an array to a scalar, you're getting the size of the array. You pass one argument (a reference to an array) to AAR, that's why you get 1.
To get the actual parameters, place the local variable in parentheses:
sub AAR {
    my ($ref) = @_;
    print "ref = $ref\n";
}
This prints something like ref = ARRAY(0x5566c89a4710).
You can then use the reference to access the array elements like this:
print join(", ", @{$ref});

How can $# be used on a dereferenced array without first using @?

An array in perl is dereferenced like so,
my @array = @{$array_reference};
When trying to assign an array to a dereference without the '@', like,
my @array = {$array_reference};
Perl throws the error, 'Odd number of elements in anonymous hash at ./sand.pl line 22.' We can't assign it to an array variable because Perl is confused about the type.
So how can we perform...
my $lastindex = $#{$array_reference};
if Perl struggles to understand that '{$array_reference}' is an array type? It would make more sense to me if this looked like,
my $lastindex = $#@{$array_reference};
(despite looking much uglier).
tl;dr: It's $#{$array_reference} to match the syntax of $#array.
{} is overloaded with many meanings and that's just how Perl is.
Sometimes {} creates an anonymous hash. That's what {$array_reference} is doing, trying to make a hash where the key is the stringification of $array_reference, something like "ARRAY(0x7fb21e803280)" and there is no value. Because you're trying to create a hash with a key and no value you get an "odd number of elements" warning.
Sometimes {...} is a block like sub { ... } or if(...) { ... }, or do {...} and so on.
Sometimes it's a bare block like { local $/; ... }.
Sometimes it's indicating the key of a hash like $hash{key} or $hash->{key}.
Preceded by certain sigils, {} makes dereferencing explicit. While you can write $#$array_reference or @$array_reference, sometimes you want to dereference something that isn't a simple scalar. For example, if you had a function that returned an array reference, you could get its last index in one line with $#{ get_array_reference() }. It's $#{$array_reference} to match the syntax of $#array.
$#{...} dereferences an array and gets the last index. @{...} dereferences an array. %{...} dereferences a hash. ${...} dereferences a scalar. *{...} dereferences a glob.
You might find the section on Variable Names and Sigils in Modern Perl helpful to see the pattern better.
It would make more sense to me if this looked like...
There's a lot of things like that. Perl has been around since 1987. A lot of these design decisions were made decades ago. The code for deciding what {} means is particularly complex. That there is a distinction between an array and an array reference at all is a bit odd.
$array[$index]
@array[@indexes]
@array
$#array
is equivalent to
${ \@array }[$index]
@{ \@array }[@indexes]
@{ \@array }
$#{ \@array }
See the pattern? Wherever the NAME of an array is used, you can use a BLOCK that returns a reference to an array instead. That means you can use
${ $ref }[$index]
@{ $ref }[@indexes]
@{ $ref }
$#{ $ref }
This is illustrated in Perl Dereferencing Syntax.
Note that you can omit the curlies if the BLOCK contains nothing but a simple scalar.
$$ref[$index]
@$ref[@indexes]
@$ref
$#$ref
There's also an "arrow" syntax which is considered clearer.
$ref->[$index]
$ref->@[@indexes]
$ref->@*
$ref->$#*
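The equivalences above can be checked side by side. This is a sketch that assumes Perl 5.24 or later, where the postfix dereference forms are available without enabling an extra feature; get_array_reference is a stand-in function defined here only for the example.

```perl
use v5.24;
use warnings;

my @array = qw( a b c d );
my $ref   = \@array;

# Four spellings of "last index of the array".
say $#array;        # 3
say $#{ $ref };     # 3
say $#$ref;         # 3
say $ref->$#*;      # 3

# Three spellings of the same slice through the reference.
say join ' ', @{ $ref }[1, 2];     # b c
say join ' ', @$ref[1, 2];         # b c
say join ' ', $ref->@[1, 2];       # b c

# The curlies are needed when the reference comes from an expression.
sub get_array_reference { return [ 10, 20, 30 ] }    # stand-in for illustration
say $#{ get_array_reference() };   # 2
```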
Perl is confused about the type
Perl struggles to understand that '{$array_reference}' is an array type
Well, it's not an array type. Perl doesn't "struggle"; you just have wrong expectations.
The general rule (as explained in perldoc perlreftut) is: You can always use a reference in curly braces in place of a variable name.
Thus:
@array # a whole array
@{ $array_ref } # same thing with a reference
$array[$i] # an array element
${ $array_ref }[$i] # same thing with a reference
$#array # last index of an array
$#{ $array_ref } # same thing with a reference
On the other hand, what's going on with
my @array = {$array_reference};
is that you're using the syntax for a hash reference constructor, { LIST }. The warning occurs because the list in question is supposed to have an even number of elements (for keys and values):
my $hash_ref = {
key1 => 'value1',
key2 => 'value2',
};
What you wrote is treated as
my @array = ({
$array_reference => undef,
});
i.e. an array containing a single element, which is a reference to a hash containing a single key, which is a stringified reference (and whose value is undef).
The syntactic difference between a dereference and a hashref constructor is that a dereference starts with a sigil (such as $, #, or %) whereas a hashref constructor starts with just a bare {.
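A short sketch of what the bare-brace form actually produces. The warning is silenced locally only to keep the output clean; the variable names follow the question.

```perl
use strict;
use warnings;

my $array_reference = [ 1, 2, 3 ];

my @array;
{
    # Silence "Odd number of elements in anonymous hash" for this one statement.
    no warnings 'misc';
    @array = { $array_reference };    # hash reference constructor, not a dereference
}

print scalar(@array), "\n";      # 1
print ref( $array[0] ), "\n";    # HASH

# The single key is the stringified array reference; its value is undef.
my ($key) = keys %{ $array[0] };
print $key eq "$array_reference" ? "key is the stringified reference\n"
                                 : "unexpected key\n";

my @copy = @{ $array_reference };    # with the sigil: a real dereference
print scalar(@copy), "\n";       # 3
```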
Technically speaking the { } in the dereference syntax form an actual block of code:
print ${
print "one\n"; # yeah, I just put a statement in the middle of an expression
print "two\n";
["three"] # the last expression in this block is implicitly returned
# (and dereferenced by the surrounding $ [0] construct outside)
}[0], "\n";
For (hopefully) obvious reasons, no one actually does this in real code.
The syntax is
my $lastindex = $#$array_reference;
which assigns to $lastindex the index of the last element of the anonymous array whose reference is in the variable $array_reference.
The code
my @ary = { $ra }; # works but you get a warning
doesn't throw "an error" but rather a warning. In other words, you do get @ary with one element, a reference to an anonymous hash. However, a hash needs to have an even number of elements, so you also get a warning that that isn't so.
Your last attempt dereferences the array with @{$array_reference} -- which returns a list, not an array variable. A "list" is a fleeting collection of scalars in memory (think of copying scalars on stack to go elsewhere); there is no notion of "index" for such a thing. For this reason $#@{$ra} isn't even parsed as intended and is a syntax error.
The syntax $#ary works only with a variable @ary, and then there is the $#$arrayref syntax. You can in general write $#{$arrayref} since the curlies allow for an arbitrary expression that evaluates to an array reference, but there is no reason for that here since you do have a variable with an array reference.
I'd agree readily that much of this syntax takes some getting-used-to, to put it that way.

Why does Perl's strict mode allow you to dereference a variable with an undefined value in this foreach context but not in an assignment context?

This code:
#!/usr/bin/perl
use 5.18.0;
use strict;
# Part 1
my $undef = undef;
print "1 $undef\n";
foreach my $index (@$undef) {
    print "unreachable with no crash\n";
}
print "2 $undef\n";
# Part 2
my $undef = undef;
my @array = @$undef;
print "unreachable with crash\n";
Outputs:
1
2 ARRAY(0x7faefa803ee8)
Can't use an undefined value as an ARRAY reference at /tmp/perlfile line 12.
Questions about Part 1:
Why does dereferencing $undef in Part 1 change $undef to an arrayref to an empty array?
Are there other contexts (other than a foreach) where dereferencing $undef would change it in the same way? What is the terminology to describe the most generic such case?
Questions about Part 2:
Why does dereferencing $undef in Part 2 fall afoul of strict?
Are there other contexts (other than assignment) where dereferencing $undef would fall afoul of strict? What is the terminology to describe the most generic such case?
1) for() in Perl puts its operand into "l-value context", therefore the $undef is being auto-vivified into existence as an array (reference) with zero elements (see this relatively similar question/answer regarding l-value context).
3) Because you're trying to coercively assign an undefined value into something else in r-value context, and that's illegal under strict (nothing gets auto-vivified in this context, so you're not magically creating a variable from nothing like you would be in an l-value operation).
As for questions 2 and 4, there are several other contexts, too many to think of off the top of my head. For 2, map() comes to mind, or any other operation that treats the operand as an l-value.
When you dereference an undefined variable in lvalue context, Perl will auto-vivify the reference and that which it references.
For example,
@$ref = qw( a b c );
means
@{ $ref //= [] } = qw( a b c );
When you dereference an undefined variable in rvalue context, Perl won't auto-vivify. Under strict refs, this is an error. Otherwise, undefined is stringified (with warning) to the empty string, which is used as symbolic reference.
For example,
no strict qw( refs ); my $ref; my @a = @$ref;
is equivalent to
no strict qw( refs ); my @a = @{""};
(Aside from the lack of warning for the latter.)
Lvalue context is provided to:
The left-hand-side argument of assignments. (This is the "L" in "lvalue".)
Arguments of sub and method calls (because of aliasing of elements of @_).
Foreach's list (because of aliasing of $_).
The operands of some named operators (e.g. map and grep, because of aliasing of $_).
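The difference between the two contexts can be observed directly. This is a sketch covering only the foreach and assignment cases from the question; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Lvalue context: foreach aliases its list, so the undefined
# variable is autovivified into a reference to an empty array.
my $in_foreach;
for my $item (@$in_foreach) { }
print ref($in_foreach), "\n";    # ARRAY

# Rvalue context: nothing is autovivified, and strict refs makes
# the dereference fatal. eval traps the exception so we can inspect it.
my $in_assignment;
my $ok = eval { my @copy = @$in_assignment; 1 };
print "died: $@" unless $ok;
print defined($in_assignment) ? "vivified\n" : "still undef\n";    # still undef
```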

How does @_ work in Perl subroutines?

I was always sure that if I pass a Perl subroutine a simple scalar, it can never change its value outside the subroutine. That is:
my $x = 100;
foo($x);
# without knowing anything about foo(), I'm sure $x still == 100
So if I want foo() to change x, I must pass it a reference to x.
Then I found out this is not the case:
sub foo {
$_[0] = 'CHANGED!';
}
my $x = 100;
foo($x);
print $x, "\n"; # prints 'CHANGED!'
And the same goes for array elements:
my @arr = (1,2,3);
print $arr[0], "\n"; # prints '1'
foo($arr[0]);
print $arr[0], "\n"; # prints 'CHANGED!'
That kinda surprised me. How does this work? Doesn't the subroutine only get the value of the argument? How does it know its address?
In Perl, the subroutine arguments stored in @_ are always aliases to the values at the call site. This aliasing only persists in @_; if you copy values out, that's what you get: values.
So in this sub:
sub example {
    # @_ is an alias to the arguments
    my ($x, $y, @rest) = @_; # $x, $y and @rest contain copies of the values
    my $args = \@_; # $args contains a reference to @_, which maintains the aliases
}
Note that this aliasing happens after list expansion, so if you passed an array to example, the array expands in list context, and @_ is set to aliases of each element of the array (but the array itself is not available to example). If you wanted the latter, you would pass a reference to the array.
Aliasing of subroutine arguments is a very useful feature, but must be used with care. To prevent unintended modification of external variables, in Perl 6 you must specify that you want writable aliased arguments with is rw.
One of the lesser known but useful tricks is to use this aliasing feature to create array refs of aliases:
my ($x, $y) = (1, 2);
my $alias = sub {\@_}->($x, $y);
$$alias[1]++; # $y is now 3
or aliased slices:
my $slice = sub {\@_}->(@somearray[3 .. 10]);
It also turns out that using sub {\@_}->(LIST) to create an array from a list is actually faster than [ LIST ] since Perl does not need to copy every value. Of course the downside (or upside depending on your perspective) is that the values remain aliased, so you can't change them without changing the originals.
As tchrist mentions in a comment to another answer, when you use any of Perl's aliasing constructs on @_, the $_ that they provide you is also an alias to the original subroutine arguments. Such as:
sub trim {s!^\s+!!, s!\s+$!! for @_} # in place trimming of white space
Lastly, all of this behavior is nestable, so when using @_ (or a slice of it) in the argument list of another subroutine, it also gets aliases to the first subroutine's arguments:
sub add_1 {$_[0] += 1}
sub add_2 {
    add_1(@_) for 1 .. 2;
}
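Putting the snippets from this answer together into something runnable, as a sketch (the variable names $n and $text are made up for the example; the subs are the ones shown above):

```perl
use strict;
use warnings;

sub add_1 { $_[0] += 1 }
sub add_2 { add_1(@_) for 1 .. 2 }            # @_ passes its aliases along

sub trim  { s/^\s+//, s/\s+$// for @_ }       # $_ aliases each element of @_

my $n = 5;
add_2($n);
print "$n\n";          # 7

my $text = "  padded  ";
trim($text);
print "[$text]\n";     # [padded]

# A reference to @_ keeps the aliases alive after the sub returns.
my ($x, $y) = (1, 2);
my $alias = sub { \@_ }->($x, $y);
$alias->[1]++;
print "$y\n";          # 3

# Copying out of @_ breaks the link to the caller's variable.
sub no_change { my ($copy) = @_; $copy = 'CHANGED!' }
no_change($x);
print "$x\n";          # 1
```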
This is all documented in detail in perldoc perlsub. For example:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The
array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the
corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the
function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the
element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
Perl passes arguments by reference, not by value. See http://www.troubleshooters.com/codecorn/littperl/perlsub.htm

How to clear a Perl hash

Let's say we define an anonymous hash like this:
my $hash = {};
And then use the hash afterwards. Then it's time to empty or clear the hash for reuse. After some Google searching, I found:
%{$hash} = ()
and:
undef %{$hash}
Both will serve my needs. What's the difference between the two? Are they both identical ways to empty a hash?
%$hash_ref = (); makes more sense than undef-ing the hash. Undef-ing the hash says that you're done with the hash. Assigning an empty list says you just want an empty hash.
Yes, they are absolutely identical. Both remove any existing keys and values from the table and set the hash to the empty list.
See perldoc -f undef:
undef EXPR
undef Undefines the value of EXPR, which must be an lvalue. Use only
on a scalar value, an array (using "@"), a hash (using "%"), a
subroutine (using "&"), or a typeglob (using "*")...
Examples:
undef $foo;
undef $bar{'blurfl'}; # Compare to: delete $bar{'blurfl'};
undef @ary;
undef %hash;
However, you should not use undef to remove the value of anything except a scalar. For other variable types, set it to the "empty" version of that type -- e.g. for arrays or hashes, @foo = (); %bar = ();
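A minimal sketch of both forms applied through a hash reference. The second reference, $same, is an assumption added only to show that the hash is emptied in place rather than replaced by a new one.

```perl
use strict;
use warnings;

my $hash = { a => 1, b => 2 };
my $same = $hash;                     # second reference to the same hash

%{$hash} = ();                        # assign the empty list
print scalar( keys %$same ), "\n";    # 0

$hash->{c} = 3;                       # the hash is still usable afterwards
print scalar( keys %$same ), "\n";    # 1

undef %{$hash};                       # the undef form
print scalar( keys %$same ), "\n";    # 0

$hash->{d} = 4;                       # and still usable after undef
print join( ',', keys %$same ), "\n"; # d
```

In both cases every other reference to the hash sees it become empty, which is different from assigning a fresh {} to $hash.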