Understanding precedence when assigning and testing for definedness in Perl - perl

When trying to assign a variable and test it for definedness in one operation in Perl, as would be useful for instance in an if's condition, it would seem natural to me to write:
if ( defined my $thing = $object->get_thing ) {
$thing->do_something;
}
As far as my understanding goes, defined has the precedence of a rightward list operator, which is lower than that of the assignment, therefore I would expect my code above to be equivalent to:
if ( defined ( my $thing = $object->get_thing ) ) {
$thing->do_something;
}
While the latter, parenthesised code does work, the former yields the following fatal error: "Can't modify defined operator in scalar assignment".
It's not a big deal having to add parentheses, but I would love to understand why the first version doesn't work, e.g. what kind of "thing" defined is and what is its precedence?

Named operators are divided into unary operators (operators that always take exactly one operand) and list operators (everything else)[1].
defined and my[2] are unary operators, which have much higher precedence than other named operators.
The same goes for subs, so I'll use them to demonstrate.
$ perl -MO=Deparse,-p -e'sub f :lvalue {} sub g :lvalue {} f g $x = 123;'
sub f : lvalue { }
sub g : lvalue { }
f(g(($x = 123)));
-e syntax OK
$ perl -MO=Deparse,-p -e'sub f($) :lvalue {} sub g($) :lvalue {} f g $x = 123;'
sub f ($) : lvalue { }
sub g ($) : lvalue { }
(f(g($x)) = 123);
-e syntax OK
But of course, defined is not an lvalue function, so finding it on the LHS of an assignment results in an error.
[1] and, or, not, xor, lt, le, gt, ge, eq, ne and cmp are not considered named operators.
[2] my is very unusual. Aside from having both a compile-time and run-time effect, its syntax varies depending on whether parens are used around its argument(s) or not. Without parens, it's a unary operator. With parens, it's a list operator.
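As a minimal sketch of the point above: the parenthesised form makes the whole assignment the operand of defined, while the unparenthesised form is rejected at compile time. The get_thing sub below is a hypothetical stand-in for $object->get_thing, and the string eval is only there so the compile error can be captured and printed instead of aborting the script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $object->get_thing: returns its argument unchanged.
sub get_thing { return $_[0] }

my @results;
for my $input ( 0, undef, 'x' ) {
    # The parentheses make the assignment the operand of defined.
    if ( defined( my $thing = get_thing($input) ) ) {
        push @results, "defined: $thing";
    }
    else {
        push @results, 'undefined';
    }
}
print "$_\n" for @results;

# Without the parentheses the statement does not compile; a string eval
# captures that error in $@ instead of aborting the script.
my $ok    = eval q{ my $value = 1; defined my $thing = $value; 1 };
my $error = $ok ? '' : $@;
print $error;
```

Note that 0 counts as defined here, which is the usual reason for testing definedness rather than truth.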

Related

Order of `shift` execution in Perl equation

For a demonstration of my question, consider the following Perl code:
use strict;
use warnings;
use Data::Dumper;
my @a = (1, 2);
my %h;
sub test {
$h{shift @_} = shift;
}
&test(@a);
print Dumper(%h);
The output IS as the following:
$VAR1 = '2';
$VAR2 = 1;
Why does Perl execute the shift on the right-hand side of the assignment first, rather than the one on the left?
Why is the output NOT the following?
$VAR1 = '1';
$VAR2 = 2;
In most languages, operand evaluation order is undefined or at least undocumented for most operators.[1] Perl is no exception.
Does f() + g() call f() or g() first? Well, that's undocumented and presumably undefined.
Now, it turns out that perl is currently very consistent. The binary arithmetic operators will always evaluate their left-hand side operand before their right-hand side operand (including **, which is right-associative), while the scalar assignment operator and list assignment operator evaluate their RHS operand before their LHS operand.
Notable exceptions include the comma operator in scalar context, and short-circuiting operators.
The comma operator in scalar context is documented to evaluate its LHS before its RHS, though no such guarantee is made when it's called in list context.
Short-circuiting operators —namely &&, ||, and, or and the conditional operator— must necessarily evaluate their LHS before any other operand.
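Since the order is not documented, one way to avoid depending on it is to unpack the arguments in separate statements before using them. The following is a small sketch of that idea; store_pair is a hypothetical name, not something from the question.

```perl
use strict;
use warnings;

my %h;

# Unpack the arguments in separate statements so the key/value order
# does not depend on which side of the assignment is evaluated first.
sub store_pair {
    my $key   = shift;
    my $value = shift;
    $h{$key} = $value;
    return;
}

store_pair( 1, 2 );
print join( ' => ', %h ), "\n";    # prints "1 => 2"
```

Statements are executed in sequence, so the first shift is guaranteed to see the first argument.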

Perl dereferencing in non-strict mode

In Perl, if I have:
no strict;
@ARY = (58, 90);
To operate on an element of the array, say the 2nd one, I would write (possibly as part of a larger expression):
$ARY[1] # The most common way found in Perldoc's idioms.
Though, for some reason these also work:
@ARY[1]
@{ARY[1]}
All of them result in the same object:
print (\$ARY[1]);
print (\@ARY[1]);
print (\@{ARY[1]});
Output:
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
What syntax rules enable these constructs? How far could one write reliable code with each of them, or with a mix of them? How interchangeable are these expressions (always speaking of a non-strict context)?
To justify why I'm asking: I agree that "use strict" is the better practice, but I'm still interested in understanding how non-strict expressions are built.
In an attempt to resolve this myself, I found:
The notion that "no strict;" stops Perl complaining about undeclared variables and quirky syntax.
The prefix dereference having higher precedence than the subscript [] (perldsc § "Caveat on precedence").
The clarification on when to use @ instead of $ (perldata § "Slices").
The lack of a description of "[]" (array subscript / slice) among Perl's operators (perlop), which leads me to think it is not an operator (yet it has to be something; but what?).
From what I have learned, none of these hints, put together, helps me better understand my issue.
Thanks in advance.
Quotation from perlfaq4:
What is the difference between $array[1] and @array[1]?
The difference is the sigil, that special character in front of the array name. The $ sigil means "exactly one item", while the @ sigil means "zero or more items". The $ gets you a single scalar, while the @ gets you a list.
Please see: What is the difference between $array[1] and @array[1]?
@ARY[1] is indeed a slice, in fact a slice of only one member. The difference is it creates a list context:
@ar1[0] = qw( a b c ); # List context.
$ar2[0] = qw( a b c ); # Scalar context, the last value is returned.
print "<@ar1> <@ar2>\n";
Output:
<a> <c>
Besides using strict, turn warnings on, too. You'll get the following warning:
Scalar value @ar1[0] better written as $ar1[0]
In perlop, you can read that "Perl's prefix dereferencing operators are typed: $, @, %, and &." The standard syntax is SIGIL { ... }, but in the simple cases, the curly braces can be omitted.
See Can you use string as a HASH ref while "strict refs" in use? for some fun with no strict refs and its emulation under strict.
Extending choroba's answer, to check a particular context, you can use wantarray
sub context { return wantarray ? "LIST" : "SCALAR" }
print $ary1[0] = context(), "\n";
print @ary1[0] = context(), "\n";
Outputs:
SCALAR
LIST
Nothing you did requires no strict; other than to hide your error of doing
@ARY = (58, 90);
when you should have done
my @ARY = (58, 90);
The following returns a single element of the array. Since EXPR is to return a single index, it is evaluated in scalar context.
$array[EXPR]
e.g.
my @array = qw( a b c d );
my $index = 2;
my $ele = $array[$index]; # my $ele = 'c';
The following returns the elements identified by LIST. Since LIST is to return 0 or more elements, it must be evaluated in list context.
@array[LIST]
e.g.
my @array = qw( a b c d );
my @indexes = ( 1, 2 );
my @slice = @array[@indexes]; # my @slice = qw( b c );
\( $ARY[$index] ) # Returns a ref to the element returned by $ARY[$index]
\( @ARY[@indexes] ) # Returns refs to each element returned by @ARY[@indexes]
${foo} # Weird way of writing $foo. Useful in literals, e.g. "${foo}bar"
@{foo} # Weird way of writing @foo. Useful in literals, e.g. "@{foo}bar"
${foo}[...] # Weird way of writing $foo[...].
Most people don't even know you can use these outside of string literals.
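To tie the element, slice, and reference forms together, here is a short strict-mode sketch. The variable names are illustrative only; the point is that the references produced by \( @array[@indexes] ) point at the array's own elements, so writing through one changes the array.

```perl
use strict;
use warnings;

my @array   = qw( a b c d );
my @indexes = ( 1, 2 );

my $element = $array[2];           # single element, scalar: 'c'
my @slice   = @array[@indexes];    # slice, list of copies: ('b', 'c')

# One reference per element named by the slice.
my @refs = \( @array[@indexes] );
${ $refs[0] } = 'B';               # writes through to $array[1]

print "$element | @slice | @array\n";    # prints "c | b c | a B c d"
```

The slice copied into @slice is unaffected by the later write, because list assignment copies the values.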

Perl increment or decrement, but not both?

$a++; # ok
$a--; # ok
--$a; # ok
++$a; # ok
--$a++; # syntax error
$a++--; # syntax error
($a++)--; # syntax error
--$a--; # syntax error
On some of these, I can sort of see why- but on like --$a-- there is no ambiguity and no precedence conflict. I'm floored Larry didn't let me do that.. (and don't even get me started on the lack of a floor operator!)
Not that I would need or want to- I was just trying to understand more about how these operators worked and discovered that sort of surprising result..
In the Perldoc for auto increment/decrement we find:
"++" and "--" work as in C.
and slightly earlier on the same page
Perl operators have the following associativity and precedence, listed from highest precedence to lowest. Operators borrowed from C keep the same precedence relationship with each other, even where C's precedence is slightly screwy. (This makes learning Perl easier for C folks.)
Since C returns an rvalue in both cases, Perl does the same. Interestingly, C++ returns a reference to an lvalue for pre-increment/decrement thus having different semantics.
Consider the following:
length($x) = 123;
Just like ++(++$a), there is no ambiguity, there is no precedence conflict, and it would require absolutely no code to function. The limitation is completely artificial[1], which means code was added specifically to forbid it!
So why is length($x) = 123; disallowed? Because disallowing it allows us to catch errors with little or no downside.
length($x) = 123; # XXX Did you mean "length($x) == 123"?
How is it disallowed? Using a concept of lvalues. lvalues are values that are allowed to appear on the left of a scalar assignment.
Some operators are deemed to return lvalues.
$x = 123; # $x returns an lvalue
$#a = 123; # $#a returns an lvalue
substr($s,0,0) = "abc"; # substr returns an lvalue
Some arguments are expected to be an lvalue.
length($x) = 123; # XXX LHS of scalar assignment must be an lvalue
++length($x); # XXX Operand of pre/post-inc/dec must be an lvalue.
The pre/post-increment/decrement operators aren't flagged as returning an lvalue. Operators that expect an lvalue will not accept them.
++$a = 123; # XXX Did you mean "++$a == 123"?
This has the side effect of also preventing ++(++$a) which would work fine without the lvalue check.
$ perl -E' ++( ++$a); say $a;'
Can't modify preincrement (++) in preincrement (++) at -e line 1, near ");"
Execution of -e aborted due to compilation errors.
$ perl -E'sub lvalue :lvalue { $_[0] } ++lvalue(++$a); say $a;'
2
Changing ++$a to return an lvalue would allow ++(++$a) to work, but it would also allow ++$a = 123 to work. What's more likely? ++$a = 123 was intentional, or ++$a = 123 is a typo for ++$a == 123?
The following shows that length($x) = 123 would work without the lvalue syntax check.
$ perl -E' say length($x) = 123;'
Can't modify length in scalar assignment at -e line 1, near "123;"
Execution of -e aborted due to compilation errors.
$ perl -E'sub lvalue :lvalue { $_[0] } say lvalue(length($x)) = 123;'
123
The value you see printed is the value of the scalar returned by length after it was changed by the assignment.
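The lvalue rule described above can be seen in a few lines. This is only a sketch: the string eval is used so that the compile-time rejection of ++(++$n) can be captured and printed rather than aborting the script.

```perl
use strict;
use warnings;

my $s = 'world';
substr( $s, 0, 0 ) = 'hello ';    # substr returns an lvalue
print "$s\n";                     # prints "hello world"

my @a = ( 1 .. 5 );
$#a = 1;                          # $#a returns an lvalue; truncates @a
print "@a\n";                     # prints "1 2"

# ++(++$n) is rejected at compile time; capture the error with a string eval.
my $ok    = eval q{ my $n = 0; ++( ++$n ); 1 };
my $error = $ok ? '' : $@;
print $error;

# Two separate statements have the intended effect.
my $n = 0;
++$n;
++$n;
print "$n\n";                     # prints "2"
```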
For example, what do you expect for:
$a = 1;
$b = --$a++; # imaginary syntax
I think it will be harder to explain that $b is equal to 0 and $a is 1, won't it? In any case, I don't remember any real example where that syntax would be useful. It's useless and ugly.

Change meaning of the operator "+" in perl

Currently "+" in Perl means addition. In my project, we do string concatenation a lot. I know we can concatenate with the "." operator, like:
$x = $a . $b; #will concatenate string $a, and string $b
But "+" feels better. Wonder if there is a magic to make the following do concatenation.
$x = $a + $b;
Even better, make it check the operand types: if both variables ($a, $b) are numbers, then do "addition" in the usual sense; otherwise, do concatenation.
I know in C++, one can overload the operator. Hope there is something similar in perl.
Thanks.
Yes, Perl too offers operator overloading.
package UnintuitiveString;
use Scalar::Util qw/looks_like_number/;
use overload '+' => \&concat,
'.' => \&concat,
'""' => \&as_string;
# Additionally, the following operators *have* to be overridden
# I suggest you raise an exception if an implementation does not make sense
# - * / % ** << >> x
# <=> cmp
# & | ^ ~
# atan2 cos sin exp log sqrt int
# 0+ bool
# ~~
sub new {
my ($class, $val) = @_;
return bless \$val => $class;
}
sub concat {
my ($self, $other, $swap) = @_;
# check for append mode
if (not defined $swap) {
$$self .= "$other";
return $self;
}
($self, $other) = ($other, $self) if $swap;
return UnintuitiveString->new("$self" . "$other");
}
sub as_string {
my ($self) = @_;
return $$self;
}
sub as_number {
my ($self) = @_;
return 0+$$self if looks_like_number $$self;
return undef;
}
Now we can do weird stuff like:
my $foo = UnintuitiveString->new(4);
my $bar = UnintuitiveString->new(2);
print $foo + $bar, "\n"; # "42"
my ($num_x, $num_y) = map { $_->as_number } $foo, $bar;
print $num_x + $num_y, "\n"; # "6"
$foo += 6;
print $foo + "\n"; # "46"
But just because we can do such things does not at all mean that we should:
Perl already has a concatenation operator: the dot (.). It's perfectly fine to use that.
Operator overloading comes at a massive performance cost. What previously was a single opcode in perl's VM is now a series of method calls and intermediate copies.
Changing the meaning of your operators is extremely confusing for people who actually know Perl. I stumbled a few times with the test cases above, when I was surprised that $foo + 6 wouldn't produce 10.
Perl's scalars are not a number or a string, they are both at the same time and are interpreted as one or the other depending on their usage context. This is actually half-true, and the scalars have different representations. They could be a string (PV), an integer (IV), a float (NV). However, once a PV is used in a numerical context like addition, a numerical value is determined and saved alongside the string, and we get a PVIV or PVNV. The reverse is also true: when a number is used in a stringy context, the formatted string is saved alongside the number. The looks_like_number function mentioned above determines whether a given string could represent a valid number like "42" or "NaN". Because just using a scalar in some context can change the representation, checking that a given scalar is a PV does not guarantee that it was intended to be a string, and an IV does not guarantee that it was intended to be an integer.
Perl has two sets of operators for a very good reason: If the “type” of a scalar is fluid, we need another way to explicitly request certain behavior. E.g. Perl has numeric comparison operators < <= == != >= > <=> and stringy comparison operators lt le eq ne ge gt cmp which can behave very differently: 4 XXX 12 will be -1 for <=> (because 4 is numerically smaller than 12), but 1 for cmp (because 4 comes later than 1 in most collation orders).
Other languages suffer a lot from having operators coerce their operands to required types but not offering two sets of operators. E.g. in Java, + is overloaded to concat strings. However, this leads to a loss of commutativity and associativity. Given three values x, y, z which can be either strings or numbers, we get different results for:
x + y and y + x – string concatenation is not commutative, whereas numeric addition is.
(x + y) + z and x + (y + z) – the + is not associative as soon as one string enters the playing field. Consider x = 1, y = 2, z = "4". Then the first evaluation order leads to "34", whereas the second leads to "124".
In Java, this is not a problem, because the language is statically typed, and because there are very few coercions (autoboxing, autounboxing, widening conversions, and stringification in concatenation). However, JavaScript (which is dynamically typed and will perform conversions from strings to numbers for other operators) shows the exact same behavior. Oops.
Stop this madness. Now. Perl's set of operators (barring smartmatch) is one of the best designed parts of the language (and its type system one of the worst parts from a modern viewpoint). If you dislike Perl because its operators make sense, you are free to use PHP instead (which, by the way, also uses . for concatenation to avoid such issues) :P
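For reference, the two operator sets discussed above behave as follows on the same operands. This is a plain illustration with made-up values, not part of the overloading example.

```perl
use strict;
use warnings;

my ( $x, $y ) = ( '10', '5' );

my $sum    = $x + $y;     # numeric addition
my $joined = $x . $y;     # string concatenation
print "$sum\n";           # prints "15"
print "$joined\n";        # prints "105"

my $numeric = 4 <=> 12;   # numeric comparison: 4 is less than 12
my $stringy = 4 cmp 12;   # string comparison: "4" sorts after "12"
print "$numeric\n";       # prints "-1"
print "$stringy\n";       # prints "1"
```

The operator, not the operand, decides whether the values are treated as numbers or as strings.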

Is ~~ a short-circuit operator?

From the Smart matching in detail section in perlsyn:
The smart match operator
short-circuits whenever possible.
Does ~~ have anything in common with short circuit operators (&&, ||, etc.) ?
The meaning of short-circuiting here is that evaluation will stop as soon as the boolean outcome is established.
perl -E "@x=qw/a b c d/; for (qw/b w/) { say qq($_ - ), $_ ~~ @x ? q(ja) : q(nein) }"
For the input b, Perl won't look at the elements following b in @x. The grep built-in, on the other hand, to which the document you quote makes reference, will process the entire list even though all that's needed might be a boolean.
perl -E "@x=qw/a b c/; for (qw/b d/) { say qq($_ - ), scalar grep $_, @x ? q(ja) : q(nein) }"
Yes, in the sense that when one of the arguments is an Array or a Hash, ~~ will only check elements until it can be sure of the result.
For instance, in sub x { ... }; my %h; ...; %h ~~ \&x, the smart match returns true only if x returns true for all the keys of %h; if one call returns false, the match can return false at once without checking the rest of the keys. This is similar to the && operator.
On the other hand, in /foo/ ~~ %h, the smart match can return true if it finds just one key that matches the regular expression; this is similar to ||.
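The difference between stopping early and scanning the whole list can be made visible by counting how many elements are examined. The sketch below deliberately uses any from List::Util rather than ~~ itself (a substitution on my part, since smartmatch is not available in every perl); any stops at the first match in the same way the answers describe, while grep always visits every element.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @x = qw( a b c d );

# any stops as soon as the block returns true.
my $any_checked = 0;
my $found = any { $any_checked++; $_ eq 'b' } @x;

# grep in scalar context counts matches, so it must visit every element.
my $grep_checked = 0;
my $count = grep { $grep_checked++; $_ eq 'b' } @x;

print "any:  found=", ( $found ? 1 : 0 ), " examined=$any_checked\n";    # examined=2
print "grep: count=$count examined=$grep_checked\n";                     # examined=4
```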