When does Perl impose string context? - perl

It appears that string context (while a real thing, and mentioned in "Programming Perl" chapter "2.7.1. Scalar and List Context" as a sub-idea of scalar context), isn't clearly documented anywhere I was able to find on Perldoc.
Obviously, some things in Perl (e.g. eq operator, or qq// quoting interpolation) force a value into a string context.
When does Perl impose string context?
perldoc seems to contain no useful answer.

Perl will vivify the PV (the string component) of the structure that comprises a scalar when Perl needs a string. The best places to learn about this are perlguts, perldata, and, to a lesser degree, perlop. But essentially any time a string-type operation is performed on a scalar, it will impose your sense of string context, and if the scalar only contains, for example, an integer, a string will be implicitly created from that value.
So, if you have $var = 15, which places an integer in $var, and then say, if( $var eq '15' ) {...}, a string representation of the integer 15 will be generated and stored in the PV portion of the scalar's struct.
This is by no means a complete list, but the following will do the trick:
String comparison operators (eq, ne, ge, le, lt, gt)
Match binding operators for left operand (=~, !~)
string interpolation (qq{$var} , "$var", qx/$var/, and backticks)
Regex operators (m/$interpolated/, s/$interpolated//, qr/$interpolated/)
<<"HERE" (HERE doc with interpolation)
Hash key $hash{$stringified_key}.
. concatenation operator.
In newer versions of Perl, with the bitwise feature enabled, the string bitwise operators &., |., ^., and ~., along with their assignment counterparts such as &.= will invoke string context on their operands.
vec imposes string context on its first parameter.
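A minimal sketch of a few of the operations listed above, assuming nothing beyond core Perl. The helper name string_views is hypothetical. It passes one numeric value through interpolation, concatenation, a hash key, and eq, and shows that each of them sees the number's string form rather than the literal as it was typed:

```perl
use strict;
use warnings;

# Hypothetical helper: push one value through several string-context
# operations and return what each of them sees.
sub string_views {
    my ($value) = @_;
    my %seen = ($value => 1);                  # hash keys are always strings
    return (
        "$value",                              # interpolation
        $value . '',                           # concatenation
        (keys %seen)[0],                       # hash key
        ($value eq '1.50' ? 'eq' : 'ne'),      # string comparison
    );
}

print join(', ', string_views(1.50)), "\n";    # prints: 1.5, 1.5, 1.5, ne
```

The numeric literal 1.50 stringifies as "1.5", which is why the eq comparison against '1.50' fails.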
There probably are others. But the good news is that this implementation detail rarely leaks to abstraction layers outside of the "guts" level. One example of where it can be a concern is when encoding JSON, since the JSON modules I am familiar with all look at whether or not a given scalar has a PV component to decide whether to encode a value as a string or a number.
As Joel identified in a comment below, the Devel::Peek module, which has been in the Perl core since Perl version 5.6.0 can facilitate introspection into the guts of a scalar:
use Devel::Peek;
my $foo = 12;
print "Initial state of \$foo:\n";
Dump($foo);
my $bar = "$foo";
print "\n\nFinal state of \$foo:\n";
Dump($foo);
The output produced by that code is:
Initial state of $foo:
SV = IV(0x56547b4bb2a0) at 0x56547b4bb2b0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 12
Final state of $foo:
SV = PVIV(0x56547b4b5880) at 0x56547b4bb2b0
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 12
PV = 0x56547b4ab600 "12"\0
CUR = 2
LEN = 10
As you can see, after the forced stringification there is a PV element, the POK flag is set, and the CUR and LEN fields are present to indicate the current length of the string's contents and the allocated size of its buffer, respectively.
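The same check can be sketched programmatically with the core B module instead of reading a dump by eye. This is only an illustration: the helper name has_string_slot is hypothetical, and it tests the private pPOK flag, on the assumption that newer Perl releases may cache the string without setting the public POK flag shown in the dump above.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar currently carries a cached
# string form (private POK flag), false otherwise.
sub has_string_slot {
    my ($ref) = @_;
    return (B::svref_2object($ref)->FLAGS & B::SVp_POK()) ? 1 : 0;
}

my $foo    = 12;
my $before = has_string_slot(\$foo);   # integer only, no string yet
my $copy   = "$foo";                   # string context caches a PV in $foo
my $after  = has_string_slot(\$foo);

print "before: $before, after: $after\n";
```

Flag handling is an implementation detail, so results may differ between Perl versions.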

Related

How does word counting by list assignment work in Perl?

I cannot exactly understand how the following snippet works:
my $str = 'abc def ghi';
my $num = () = $str =~ /\w+/g;
say $num; # prints the word count, 3
I know that $str =~ /\w+/g returns a list of the words which, apparently, is conveyed to the leftmost assignment. Then $num imposes a scalar context on that list and becomes 3.
But what does () = ('abc', 'def', 'ghi') mean? Is it something like my $a = my @b = (3, 5, 8)? If so, how is the list at the rightmost side transferred to $num at the leftmost side?
Each perl operator has specific behavior in list and scalar context. Operators give context to their operands, but receive context from what they are an operand to.
When a list assignment is placed in scalar context, it returns the number of elements on the right side of the assignment. This enables code like:
while (my @pair = splice(@array, 0, 1)) {
There's nothing special about how = () = is handled; you could just as well do = ($dummy) = or = (@dummy) =; the key part is that you want the match to be list context (producing all the possible matches) and then to just get a count of them.
So you do a list assignment (which is what = does whenever there's either a parenthesized expression or an array or slice as the left operand) but since you don't actually want the values, you can use an empty list. And then place that in scalar context; in this case, using the list assignment as the right operand to a scalar assignment.
Nowadays fewer people start learning Perl; one of the reasons is that it allows obscure code like your example.
Check the perlsecret page for Saturn https://metacpan.org/pod/distribution/perlsecret/lib/perlsecret.pod#Goatse
=( )=
(Alternate nickname: "Saturn")
If you don't understand the name of this operator, consider yourself lucky. You are advised not to search the Internet for a visual explanation.
The goatse operator provides a list context to its right side and returns the number of elements to its left side. Note that the left side must provide a scalar context; obviously, a list context on the left side will receive the empty list in the middle.
The explanation is that a list assignment in scalar context returns the number of elements on the right-hand side of the assignment, no matter how many of those elements were actually assigned to variables. In this case, all the elements on the right are simply assigned to an empty list (and therefore discarded).
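The rule described above can be sketched as follows. The helper names count_words and count_rhs are hypothetical. The second helper shows that the count reflects the right-hand side even when only some elements are actually assigned to variables:

```perl
use strict;
use warnings;

# Count regex matches: the empty list forces list context on the match,
# and the scalar assignment then receives the number of elements matched.
sub count_words {
    my ($str) = @_;
    my $num = () = $str =~ /\w+/g;
    return $num;
}

# The count is that of the right-hand side, even though only two
# variables receive values here.
sub count_rhs {
    my $count = ( my ($first, $second) = @_ );
    return $count;
}

print count_words('abc def ghi'), "\n";   # prints 3
print count_rhs(3, 5, 8), "\n";           # prints 3
```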

Perl's foreach with string argument

Perldocs only indicate that foreach loops "iterates over a normal list value" https://perldoc.perl.org/perlsyn.html#Foreach-Loops, but I sometimes see them with string arguments, such as the following examples:
foreach (`curl example.com 2>/dev/null`) {
# iterates 50 times
}
foreach ("foo\nbar\nbaz") {
# iterates just 1 time. Why?
}
Is the behavior of passing a string like this defined? Separately, why the disparate results from passing the string returned by a backticked command, and a literal string, as in the example?
In scalar context, backticks return a single scalar containing all the output of the enclosed command. But foreach (...) evaluates the backticks in list context, which separates the output into a list with one line per element.
The question revolves around the context, a critical concept for many things in Perl.
The foreach loop needs a list to iterate over, so it imposes the list context to build the list values you saw mentioned in docs. The list may be formed with literals, qw(a b c), and may have one element; this is your second example, where one string is given, forming the one-element list that is iterated over.
The list can also come from an expression, that is evaluated in the list context; this is your first example. Many operations yield different returns based on context, and qx is such an operator as explained in mob's answer. This is something to note and be careful with. An expression may also return a single value regardless of context; then it is simply used to populate the list.
From perldoc -f qx:
In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
From perldoc perlsyn:
Compound statements
[...]
LABEL foreach VAR (LIST) BLOCK
A string is not a list. If you want to iterate over the characters in a string you'll need
foreach my $character (split('', "foo\nbar\nbaz")) {
I think you might be confusing Perl with Python:
>>> for c in "foo\nbar\nbaz":
... print c
...
f
o
.... remainder deleted ....
a
z
>>>
As pointed out by the other answers backticks/qx{} return a list of output lines from the executed command in list context.
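The difference can be sketched without running an external command. In this illustration, split /^/ stands in for the line splitting that backticks perform in list context (an approximation, assuming the default $/), and the helper name iterations is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times foreach iterates over its list.
sub iterations {
    my $count = 0;
    foreach my $item (@_) { $count++ }
    return $count;
}

my $text = "foo\nbar\nbaz";

my $as_string = iterations($text);              # one-element list
my $as_lines  = iterations(split /^/, $text);   # one element per line
my $as_chars  = iterations(split //, $text);    # one element per character

print "$as_string $as_lines $as_chars\n";       # prints: 1 3 11
```

A single string is simply a list with one element, so the loop body runs once unless the string is split first.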

Is the truthiness of a dualvar always that of its string part?

The empirical behaviour of my Perl 5.26.2 x64 (Cygwin) is that a dualvar is truthy if and only if its string part is truthy:
# Falsy number, truthy string => truthy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 0, "foo"; say "yes" if $v'
yes
# Truthy number, falsy string => falsy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 1, ""; say "yes" if $v'
# Truthy number, truthy string => truthy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 1, "foo"; say "yes" if $v'
yes
# Falsy number, falsy string => falsy
$ perl -MScalar::Util=dualvar -E 'my $v=dualvar 0, ""; say "yes" if $v'
This has been the case since 2009 per this.
Question: Is this guaranteed behaviour?
Boolean::String says that this is the behaviour. However, I don't know if that's something I can rely on, in terms of backward compatibility.
I also do not see an express statement in perlsyn, Scalar::Util, or perldata#Context.
I do see the following in perldata#Scalar-values:
A scalar value is interpreted as FALSE in the Boolean sense if it is undefined, the null string or the number 0 (or its string equivalent, "0"), and TRUE if it is anything else. The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed.
The statement that "no conversion ... is ever performed" unfortunately doesn't tell me which part(s) of a dualvar the interpreter is looking at!
Similarly, Chas. Owens's related answer says that
the truthiness test looks at strings first
But if it looks at strings first, what does it look at second, and when?
Edit My understanding is that if overload is defined on a variable, dualvar or not, the bool overload will control. I am wondering about the non-overloaded case.
Edit 2 ikegami's answer here points out that PL_sv_yes and PL_sv_no also have an NV (double) component. For bonus points :) , does the NV have any effect on truthiness if a dualvar has one? (Let me know if that answer is actually involved enough to deserve a separate question.)
Yes, at least so far. The SvTRUE_common macro is usually used to decide whether an SV is "true" in a boolean context. Here's how it is defined in sv.h in the perl 5.26.1 source:
#define SvTRUE_common(sv,fallback) ( \
!SvOK(sv) \
? 0 \
: SvPOK(sv) \
? SvPVXtrue(sv) \
: (SvFLAGS(sv) & (SVf_IOK|SVf_NOK)) \
? ( (SvIOK(sv) && SvIVX(sv) != 0) \
|| (SvNOK(sv) && SvNVX(sv) != 0.0)) \
: (fallback))
After the scalar passes the SvOK test (whether it is defined), the next check is SvPOK -- whether the scalar has a valid internal string representation. Dualvars always pass this check, so the boolean test of a dualvar is whether its string representation is true (SvPVXtrue(...)).
The code is different in perl 5.6.2:
I32
Perl_sv_true(pTHX_ register SV *sv)
{
if (!sv)
return 0;
if (SvPOK(sv)) {
register XPV* tXpv;
if ((tXpv = (XPV*)SvANY(sv)) &&
(tXpv->xpv_cur > 1 ||
(tXpv->xpv_cur && *tXpv->xpv_pv != '0')))
return 1;
else
return 0;
}
else {
...
but the logic is the same -- check SvPOK first and then return whether the string representation is not empty and not equal to "0".
I would think future generations of Perl developers would be wary of changing this long-standing logic.
Question: Is this guaranteed behaviour?
This boils down to how a scalar is tested in the Boolean context, as string or numeric?
In Perl the documentation is the closest thing to a standard. So if there is no statement in docs then the formal answer must be: No, it is not "guaranteed behaviour".
Since the docs come tantalizingly close a few times, talking about that context and conversions, and yet specifically do not spell out which test is done I'd say that this must indeed be taken as an implementation detail. You cannot "rely" on it.
If strict reliability is needed, one solution is a simple class that ensures the test you need is the one performed.
In more practical terms, it appears that in if ($v) it is the string part that is tested, and if there is none, a numeric test is done (without any actual conversion, as the docs say). Since you ask about variables that have been set up with dualvar, for those it is going to be the string test.
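One way such a class might look is sketched below. It is an assumption about what "a simple class" could mean here, not something taken from the documentation, and the package name NumericTruth is hypothetical. It uses the core overload pragma so that the boolean test is explicitly tied to the numeric part:

```perl
use strict;
use warnings;

package NumericTruth;    # hypothetical class name

use overload
    'bool'   => sub { $_[0]{num} != 0 },   # truthiness from the number only
    '""'     => sub { $_[0]{str} },        # string part
    '0+'     => sub { $_[0]{num} },        # numeric part
    fallback => 1;

sub new {
    my ($class, $num, $str) = @_;
    return bless { num => $num, str => $str }, $class;
}

package main;

my $v = NumericTruth->new(1, "");
print $v ? "true\n" : "false\n";   # prints "true"; dualvar(1, "") would be false
```

With the bool overload in place, the result no longer depends on which internal slot the interpreter happens to inspect.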

Perl comparison operator output

I am not exactly sure what the output of a comparison is. For instance, consider
$rr = 1>2;
$qq = 2>1;
print $rr; #nothing printed
print $qq; #1 printed
Is $rr the empty string? Is this behavior documented somewhere? Or how can one tell for sure?
I was looking for the answer in Learning Perl by Schwartz et al., but could not immediately resolve the answer.
http://perldoc.perl.org/perlop.html#Relational-Operators:
Perl operators that return true or false generally return values that can be safely used as numbers. For example, the relational operators in this section and the equality operators in the next one return 1 for true and a special version of the defined empty string, "" , which counts as a zero but is exempt from warnings about improper numeric conversions, just as "0 but true" is.
So what is returned is something that is an empty string in string context, and 0 in numeric context.
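A small sketch for checking this directly. The helper name describe_false is hypothetical. Warnings are made fatal so that any "isn't numeric" warning would abort the script rather than pass unnoticed:

```perl
use strict;
use warnings FATAL => 'all';   # any warning would abort the script

# Hypothetical helper: inspect the value returned by a false comparison.
sub describe_false {
    my $rr = 1 > 2;
    return (
        (defined $rr ? 'defined' : 'undef'),
        length($rr),
        ($rr eq '' ? 'empty string' : 'non-empty'),
        $rr + 0,               # numeric use, no warning raised
    );
}

print join(', ', describe_false()), "\n";   # prints: defined, 0, empty string, 0
```

An ordinary empty string used in $rr + 0 would trigger a numeric warning; the value returned by the comparison does not.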

What does this mean in Perl 1..$#something?

I have a loop for example :
for my $something ( @place[1..$#thing] ) {
}
I don't get this statement 1..$#thing
I know that # is for comments but my IDE doesn't color #thing as comment. Or is it really just a comment for someone to know that what is in "$" is "thing" ? And if it's a comment why was the rest of the line not commented out like ] ) { ?
If it has other meanings, i will like to know. Sorry if my question sounds odd, i am just new to perl and perplexed by such an expression.
The $# is the syntax for getting the highest index of the array in question, so $#thing is the highest index of the array @thing. This is documented in perldoc perldata.
.. is the range operator, and 1 .. $#thing means a list of numbers, from 1 to whatever the highest index of @thing is.
Using this list inside array brackets with the @ sigil denotes that this is an array slice, which is to say, a selected number of elements in the @place array.
So assuming the following:
my @thing = qw(foo bar baz);
my @place = qw(home work restaurant gym);
then @place[1 .. $#thing] (that is, @place[1 .. 2]) would expand into the list work, restaurant.
It is correct that # is used for comments, but not in this case.
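Put together as a runnable sketch, using the same example data as above (the variable name @selected is only for illustration):

```perl
use strict;
use warnings;

my @thing = qw(foo bar baz);
my @place = qw(home work restaurant gym);

my $last_index = $#thing;                  # highest index of @thing
my @selected   = @place[1 .. $#thing];     # array slice: indexes 1 and 2

print "$last_index\n";                     # prints 2
print join(', ', @selected), "\n";         # prints: work, restaurant

for my $something (@selected) {
    print "visiting $something\n";
}
```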
It's how you define a range: from a starting value to some other value.
for my $something ( @place[1..3] ) {
# Takes the elements at indexes 1 through 3 (the second through fourth)
}
Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a list
of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
foreach (1..10) loops and for doing slice operations on arrays. In the
current implementation, no temporary array is created when the range
operator is used as the expression in foreach loops, but older
versions of Perl might burn a lot of memory when you write something
like this:
http://perldoc.perl.org/perlop.html#Range-Operators