Is the truthiness of a dualvar always that of its string part? - perl

The empirical behaviour of my Perl 5.26.2 x64 (Cygwin) is that a dualvar is truthy if and only if its string part is truthy:
# Falsy number, truthy string => truthy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 0, "foo"; say "yes" if $v'
yes
# Truthy number, falsy string => falsy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 1, ""; say "yes" if $v'
# Truthy number, truthy string => truthy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 1, "foo"; say "yes" if $v'
yes
# Falsy number, falsy string => falsy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 0, ""; say "yes" if $v'
This has been the case since 2009 per this.
Question: Is this guaranteed behaviour?
Boolean::String says that this is the behaviour. However, I don't know if that's something I can rely on, in terms of backward compatibility.
I also do not see an express statement in perlsyn, Scalar::Util, or perldata#Context.
I do see the following in perldata#Scalar-values:
A scalar value is interpreted as FALSE in the Boolean sense if it is undefined, the null string or the number 0 (or its string equivalent, "0"), and TRUE if it is anything else. The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed.
The statement that "no conversion ... is ever performed" unfortunately doesn't tell me which part(s) of a dualvar the interpreter is looking at!
Similarly, Chas. Owens's related answer says that
the truthiness test looks at strings first
But if it looks at strings first, what does it look at second, and when?
Edit My understanding is that if overload is defined on a variable, dualvar or not, the bool overload will control. I am wondering about the non-overloaded case.
Edit 2 ikegami's answer here points out that PL_sv_yes and PL_sv_no also have an NV (double) component. For bonus points :) , does the NV have any effect on truthiness if a dualvar has one? (Let me know if that answer is actually involved enough to deserve a separate question.)

Yes, at least so far. The SvTRUE_common macro is usually used to decide whether an SV is "true" in a boolean context. Here's how it is defined in sv.h in the perl 5.26.1 source:
#define SvTRUE_common(sv,fallback) ( \
!SvOK(sv) \
? 0 \
: SvPOK(sv) \
? SvPVXtrue(sv) \
: (SvFLAGS(sv) & (SVf_IOK|SVf_NOK)) \
? ( (SvIOK(sv) && SvIVX(sv) != 0) \
|| (SvNOK(sv) && SvNVX(sv) != 0.0)) \
: (fallback))
After the scalar passes the SvOK test (whether it is defined), the next check is SvPOK -- whether the scalar has a valid internal string representation. Dualvars always pass this check, so the boolean test of a dualvar is whether its string representation is true (SvPVXtrue(...)).
The code is different in perl 5.6.2:
I32
Perl_sv_true(pTHX_ register SV *sv)
{
if (!sv)
return 0;
if (SvPOK(sv)) {
register XPV* tXpv;
if ((tXpv = (XPV*)SvANY(sv)) &&
(tXpv->xpv_cur > 1 ||
(tXpv->xpv_cur && *tXpv->xpv_pv != '0')))
return 1;
else
return 0;
}
else {
...
but the logic is the same -- check SvPOK first and then return whether the string representation is not empty and not equal to "0".
I would think future generations of Perl developers would be wary of changing this long-standing logic.
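The string-first check can also be observed from Perl code. The following is a minimal sketch, not a guarantee: it assumes a perl whose truth test checks SvPOK first, as in the sources quoted above, and the helper name dualvar_is_true is made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Hypothetical helper: build a dualvar from the given numeric and string
# parts and report how it behaves in Boolean context (1 = true, 0 = false).
sub dualvar_is_true {
    my ($num, $str) = @_;
    my $v = dualvar($num, $str);
    return $v ? 1 : 0;
}

# Each case is [numeric part, string part].
for my $case ([0, "foo"], [1, ""], [1, "0"], [0, "0.0"]) {
    my ($num, $str) = @$case;
    printf "num=%s str=%-6s => %s\n", $num, "'$str'",
        dualvar_is_true($num, $str) ? "true" : "false";
}
```

On such a perl, only the string part matters: "foo" and "0.0" come out true, while "" and "0" come out false, whatever the numeric part is.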

Question: Is this guaranteed behaviour?
This boils down to how a scalar is tested in Boolean context: as a string or as a number?
In Perl the documentation is the closest thing to a standard. So if there is no statement in docs then the formal answer must be: No, it is not "guaranteed behaviour".
Since the docs come tantalizingly close a few times, talking about that context and conversions, and yet specifically do not spell out which test is done, I'd say that this must indeed be taken as an implementation detail. You cannot "rely" on it.
If strict reliability is needed, one solution is a simple class that performs exactly the test you need.
In more practical terms, it appears that in if ($v) it is the string part that is tested, and if there is no string part then a numeric test is done (without any actual conversion, as the docs say). Since you ask about variables that have been set with dualvar, for those it's going to be the string test.
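One way such a class might look is sketched below. The package name My::StringyBool and its field names are hypothetical. It relies on the documented overload pragma, so the truth test is spelled out explicitly instead of depending on how the interpreter treats a dualvar.

```perl
package My::StringyBool;    # hypothetical class name, for illustration only
use strict;
use warnings;

use overload
    # Boolean context: true unless the string part is undef, "" or "0".
    'bool' => sub {
        my $s = $_[0]{str};
        return (defined $s && $s ne '' && $s ne '0') ? 1 : 0;
    },
    '""'     => sub { $_[0]{str} },    # string context
    '0+'     => sub { $_[0]{num} },    # numeric context
    fallback => 1;

sub new {
    my ($class, $num, $str) = @_;
    return bless { num => $num, str => $str }, $class;
}

package main;

my $v = My::StringyBool->new(0, "foo");
print "truthy\n" if $v;
print "as number: ", $v + 0, "\n";
print "as string: $v\n";
```

The cost is that the value is now an object and not a plain scalar, so code that checks ref($v) or serializes it will see the difference.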

Related

Perl comparison operator output

I am not exactly sure what the output of a comparison is. For instance, consider
$rr = 1>2;
$qq = 2>1;
print $rr; #nothing printed
print $qq; #1 printed
Is $rr the empty string? Is this behavior documented somewhere? Or how can one tell for sure?
I was looking for the answer in Learning Perl by Schwartz et al., but could not immediately resolve the answer.
http://perldoc.perl.org/perlop.html#Relational-Operators:
Perl operators that return true or false generally return values that can be safely used as numbers. For example, the relational operators in this section and the equality operators in the next one return 1 for true and a special version of the defined empty string, "" , which counts as a zero but is exempt from warnings about improper numeric conversions, just as "0 but true" is.
So what is returned is something that is an empty string in string context, and 0 in numeric context.
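A short sketch that checks this with the same two variables as in the question. It runs under warnings to show that using the false value as a number does not produce a "isn't numeric" warning.

```perl
use strict;
use warnings;

my $rr = 1 > 2;    # false result of a comparison
my $qq = 2 > 1;    # true result of a comparison

printf "false is defined: %s\n", defined $rr ? "yes" : "no";
printf "false as string: [%s] (length %d)\n", $rr, length $rr;
printf "false as number: %d\n", $rr + 0;    # no numeric warning here
printf "true as string: [%s]\n", $qq;
```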

Why is my command line argument being interpreted as a Boolean (Perl 6)?

Given this program:
#!/bin/env perl6
sub MAIN ($filename='test.fq', :$seed=floor(now) )
{
say "Seed is $seed";
}
When I run it without any command line arguments, it works fine. However, when I give it a command line argument for seed, it says that its value is True:
./seed.p6 --seed 1234
Seed is True
Why is the number 1234 being interpreted as a boolean?
Perl 6's MAIN argument handling plays well with gradual typing. Arguments can, and should be typecast to reduce ambiguity and improve validation:
#!/bin/env perl6
sub MAIN (Str $filename='test.fq', Int :$seed=floor(now))
{
say "Seed is $seed.";
}
After typecasting seed to Int, this option must be given a numeric argument and no longer defaults to a Boolean:
perl6 ./seed.pl -seed 1234
Usage:
./seed.pl [--seed=<Int>] [<filename>]
perl6 ./seed.pl -seed=abc
Usage:
./seed.pl [--seed=<Int>] [<filename>]
perl6 ./seed.pl -seed=1234
Seed is 1234.
You need to use an = sign between your option --seed and its value 1234:
./seed.p6 --seed=1234
Since you have a positional argument in your MAIN subroutine signature (i.e. $filename), the first argument not tied to a value with an = sign will be assigned to it.
Your original
./seed.p6 --seed 1234
was being interpreted as if 1234 were the filename (i.e. it was assigned to the variable $filename). Since a command line option without an argument is considered to be True, $seed was being assigned True in your original invocation of that script.

When does Perl impose string context?

It appears that string context (while a real thing, and mentioned in "Programming Perl" chapter "2.7.1. Scalar and List Context" as a sub-idea of scalar context), isn't clearly documented anywhere I was able to find on Perldoc.
Obviously, some things in Perl (e.g. eq operator, or qq// quoting interpolation) force a value into a string context.
When does Perl impose string context?
perldoc seems to contain no useful answer.
Perl will vivify the PV (the string component) of the structure that comprises a scalar when Perl needs a string. The best place to learn about this is in perlguts, perldata, and to a lesser degree, perlop. But essentially any time a string type operation is performed with a scalar, it will impose your sense of string context, and if the scalar only contains, for example, an integer, a string will be implicitly created from that value.
So, if you have $var = 15, which places an integer in $var, and then say, if( $var eq '15' ) {...}, a string representation of the integer 15 will be generated and stored in the PV portion of the scalar's struct.
This is by no means a complete list, but the following will do the trick:
String comparison operators (eq, ne, ge, le, lt, gt)
Match binding operators for left operand (=~, !~)
string interpolation (qq{$var} , "$var", qx/$var/, and backticks)
Regex operators (m/$interpolated/, s/$interpolated//, qr/$interpolated/)
<<"HERE" (HERE doc with interpolation)
Hash key $hash{$stringified_key}.
. concatenation operator.
In newer versions of Perl, with the bitwise feature enabled, the string bitwise operators &., |., ^., and ~., along with their assignment counterparts such as &.= will invoke string context on their operands.
vec imposes string context on its first parameter.
There probably are others. But the good news is that this implementation detail rarely leaks to abstraction layers outside of the "guts" level. One example of where it can be a concern is when encoding JSON, since the JSON modules I am familiar with all look at whether or not a given scalar has a PV component to decide whether to encode a value as a string or a number.
As Joel identified in a comment below, the Devel::Peek module, which has been in the Perl core since Perl version 5.6.0 can facilitate introspection into the guts of a scalar:
use Devel::Peek;
my $foo = 12;
print "Initial state of \$foo:\n";
Dump($foo);
my $bar = "$foo";
print "\n\nFinal state of \$foo:\n";
Dump($foo);
The output produced by that code is:
Initial state of $foo:
SV = IV(0x56547b4bb2a0) at 0x56547b4bb2b0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 12
Final state of $foo:
SV = PVIV(0x56547b4b5880) at 0x56547b4bb2b0
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 12
PV = 0x56547b4ab600 "12"\0
CUR = 2
LEN = 10
As you can see, after the forced stringification there is a PV element, the POK flag is set, and the CUR and LEN fields are present to indicate the current length of the string and the allocated size of its buffer, respectively.

0, 0e0, 0.0, -0, +0, 000 all mean the same thing to Perl, why?

Just puzzling to me.
Related, but different question:
What does “0 but true” mean in Perl?
Perl doesn't distinguish kinds of numbers. Looking at all of those with a non-CS/programmer eye, they all mean the same thing to me as well: zero. (This is one of the foundations of Perl: it tries to work like people, not like computers. "If it looks like a duck....")
So, if you use them as numbers, they're all the same thing. If you use them as strings, they differ. This does lead to situations where you may need to force one interpretation ("0 but true"; see also "nancy typing"), but by and large it "does the right thing" automatically.
I don't understand, what else should they mean?
You give integer, scientific, floating point, signed integers and octal notations of zero. Why should they differ?
0==0 as everyone, including Larry Wall, knows.
Perl interprets every scalar value as both a string and (potentially) a number. All of those string representations of zero can convert to the integer value 0 , according to perl's conversion rules:
"0", "0.0", "-0", "+0", "000" => Simplest case of straight string to numeric conversion.
"0e0" => This is valid scientific notation (0 times 10 to the power 0), so the whole string converts to 0. More generally, in a numeric context only the leading valid numeric characters are converted. For example, "1984abcdef2112" would be interpreted numerically as 1984.
"0 but true" in perl means that a string like "0e0" will evaluate numerically to 0, but in a boolean context will be "true" because the conversion to boolean follows different rules than the strict numeric conversion.
Perl works in contexts. In string context, they are all different. In numeric context, they are all zero.
print "same string\n" if '0' eq '0.0';
print "same number\n" if 0 == 0.0;
'0 but true' in boolean context is true:
print "boolean context\n" if '0 but true';
print "string context\n" if '0 but true' eq '0';
print "numeric context\n" if '0 but true' == 0;
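To see the contexts side by side, here is a small sketch that runs each spelling of zero, as a string, through a numeric test and a Boolean test. The helper name classify is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return (is numerically zero, is true in Boolean context).
sub classify {
    my ($s) = @_;
    return ($s == 0 ? 1 : 0, $s ? 1 : 0);
}

for my $s ("0", "0.0", "0e0", "000", "-0", "+0", "0 but true") {
    my ($is_zero, $is_true) = classify($s);
    printf "%-14s == 0: %-3s  boolean: %s\n", "'$s'",
        $is_zero ? "yes" : "no",
        $is_true ? "true" : "false";
}
```

Every string in the list is numerically zero, but only the exact string "0" is false in Boolean context.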

Pass zero in to Getopt::Std

I am using Getopt::Std in a Perl script, and would like to pass in a zero as value. I am checking that values are set correctly using unless(). At the moment unless() is rejecting the value as being unset.
Is there a way to get unless() to accept zero as a valid value (any non-negative integer is valid).
This is probably perfectly simple, but I've never touched Perl before a few days ago!
Rich
You need to use unless defined <SOMETHING> instead of unless <SOMETHING> , because zero is false in Perl.
Perl 5 has several false values: 0, "0", "", undef, ().
It is important to note that some things may look like they should be false, but aren't. For instance 0.0 is false because it is a number that is equivalent to 0, but "0.0" is not (the only strings which are false are the empty string ("") and "0").
It also has the concept of definedness. A variable that has a value (other than undef) assigned to it is said to be defined and will return true when tested with the defined function.
Given that you want an argument to be a non-negative integer, it is probably better to test for that:
unless (defined $value and $value =~ /^[0-9]+$/) {
#blah
}
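Put together with Getopt::Std, the check might look like the sketch below. The option letter -n, the helper name is_count, and the simulated @ARGV are assumptions made for illustration, since the original script is not shown.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: true only for a defined, non-negative integer string.
sub is_count {
    my ($value) = @_;
    return (defined $value and $value =~ /^[0-9]+$/) ? 1 : 0;
}

# Simulate "script.pl -n 0" so the example is self-contained.
local @ARGV = ('-n', '0');

my %opts;
getopts('n:', \%opts) or die "Usage: $0 -n <non-negative integer>\n";

unless (is_count($opts{n})) {
    die "-n requires a non-negative integer\n";
}
print "n = $opts{n}\n";    # zero is accepted here
```

A plain unless ($opts{n}) would have rejected the 0, because "0" is false even though it is defined.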