Perl variable assignment side effects - perl

I'll be the first to admit that Perl is not my strong suit. But today I ran across this bit of code:
my $scaledWidth = int($width1x * $scalingFactor);
my $scaledHeight = int($height1x * $scalingFactor);
my $scaledSrc = $Media->prependStyleCodes($src, 'SX' . $scaledWidth);
# String concatenation makes this variable into a
# string, so we need to make it an integer again.
$scaledWidth = 0 + $scaledWidth;
I could be missing something obvious here, but I don't see anything in that code that could make $scaledWidth turn into a string. Unless somehow the concatenation in the third line causes Perl to permanently change the type of $scaledWidth. That seems ... wonky.
I searched a bit for "perl assignment side effects" and similar terms, and didn't come up with anything.
Can any of you Perl gurus tell me if that commented line of code actually does anything useful? Does using an integer variable in a concatenation expression really change the type of that variable?

It is only a little bit useful.
Perl can store a scalar value as a number or a string or both, depending on what it needs.
use Devel::Peek;
Dump($x = 42);
Dump($x = "42");
Outputs:
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
PV = 0x178d9e0 "0"\0
CUR = 1
LEN = 16
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (POK,pPOK)
IV = 42
PV = 0x178d9e0 "42"\0
CUR = 2
LEN = 16
The IV and IOK tokens refer to how the value is stored as a number and whether the current integer representation is valid, while PV and POK indicate the string representation and whether it is valid. Using a numeric scalar in a string context can change the internal representation.
use Devel::Peek;
$x = 42;
Dump($x);
$y = "X" . $x;
Dump($x);
SV = IV(0x17969d0) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
SV = PVIV(0x139aaa8) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 42
PV = 0x162fc00 "42"\0
CUR = 2
LEN = 16
Perl will seamlessly convert one to the other as needed, and there is rarely a need for the Perl programmer to worry about the internal representation.
I say rarely because there are some known situations where the internal representation matters.

Perl variables are not typed. Any scalar can be either a number or a string depending how you use it. There are a few exceptions where an operation is dependent on whether a value seems more like a number or string, but most of them have been either deprecated or considered bad ideas. The big exception is when these values must be serialized to a format that explicitly stores numbers and strings differently (commonly JSON), so you need to know which it is "supposed" to be.
The internal details are that a SV (scalar value) contains any of the values that have been relevant to its usage during its lifetime. So your $scaledWidth first contains only an IV (integer value) as the result of the int function. When it is concatenated, that uses it as a string, so it generates a PV (pointer value, used for strings). That variable contains both, it is not one type or the other. So when something like JSON encoders need to determine whether it's supposed to be a number or a string, they see both in the internal state.
There have been three strategies that JSON encoders have taken to resolve this situation. Originally, JSON::PP and JSON::XS would simply consider it a string if it contains a PV, or in other words, if it's ever been used as a string; and as a number if it only has an IV or NV (double). As you alluded to, this leads to an inordinate amount of false positives.
Cpanel::JSON::XS, a fork of JSON::XS that fixes a large number of issues, along with more recent versions of JSON::PP, use a different heuristic. Essentially, a value will still be considered a number if it has a PV but the PV matches the IV or NV it contains. This, of course, still results in false positives (example: you have the string '5', and use it in a numerical operation), but in practice it is much more often what you want.
The third strategy is the most useful if you need to be sure what types you have: be explicit. You can do this by reassigning every value to explicitly be a number or string as in the code you found. This assigns a new SV to $scaledWidth that contains only an IV (the result of the addition operation), so there is no ambiguity. Another method of being explicit is using an encoding method that allows specifying the types you want, like Cpanel::JSON::XS::Type.
The details of course vary if you're not talking about the JSON format, but that is where this issue has been most deliberated. This distinction is invisible in most Perl code where the operation, not the values, determine the type.

Related

What does flag `pIOK` mean?

When dumping perl SV with Devel::Peek I can see:
SV = IV(0x1c13168) at 0x1c13178
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 2
But can not find the description what pIOK mean.
I tried to look it at Devel::Peek, perlapi , perlguts, perlxs ...
In sources I found that:
{SVp_IOK, "pIOK,"}
But still can not find what SVp_IOK is. What is it?
UPD
I found this document. It shed the light a bit what flags mean and where they are situated. (beware this DOC is outdated a bit)
This flag indicates that the object has a valid non-public IVX field value. It can only be set for value type SvIV or subtypes of it.
UPD
Why private and public flags are differ
pIOK is how Devel::Peek represents the bit corresponding to bit mask SVp_IOK. The p indicates a "private" flag, and it forms a pair with "public" flag IOK (bit mask SVf_IOK)
The exact meaning of the private flags has changed across perl versions, but in general terms they mean that the IV (or NV or PV) field of the SV is "inaccurate" in some way
The most common situation where pIOK is set on its own (pIOK is always set if IOK is set) is where a PV has been converted to a numeric NV value. The NV and IV fields are both populated, but if the IV value isn't an accurate representation of the number (i.e. it has been truncated) then pIOK is set but IOK is cleared
This code shows a way to reach that state. Variable $pi_str is set to a string value for π and it is converted to a floating-point value by adding 0.0 and storing it into $pi_num. Devel::Peek now shows that NOK/pNOK and POK/pPOK are set, but only pIOK while IOK remains clear. Looking at the IV value we can see why: it is set to 3, which is the cached value of int $pi_str in case we need it again, but it is not an accurate representation of the string "3.14159" in integer form
use strict;
use warnings 'all';
use Devel::Peek 'Dump';
my $pi_str = "3.14159";
my $pi_num = $pi_str + 0.0;
Dump $pi_str;
output
SV = PVNV(0x28fba68) at 0x3f30d30
REFCNT = 1
FLAGS = (NOK,POK,IsCOW,pIOK,pNOK,pPOK)
IV = 3
NV = 3.14159
PV = 0x3fb7ab8 "3.14159"\0
CUR = 7
LEN = 10
COW_REFCNT = 1
Perl v5.16 and before used to use the flag to indicate "magic" variables (such as tied values) because the value in the IV field could not be used directly. That was changed in v5.18 and later, and magic values now use pIOK in the same way as any other value

Why does sv_setref_pv() store its void * argument in the IV slot?

When looking at the Perl API, I was wondering why
sv_setref_iv() stores its IV argument in the IV slot,
sv_setref_nv() stores its NV argument in the NV slot,
but sv_setref_pv() stores its void * argument in the IV slot, instead of the PV slot?
I have a hunch (the CUR and LEN fields wouldn't make sense for such a variable), but I'd like to have the opinion of someone knowledgeable in XS :-)
There are many different types of scalars.
SvNULL isn't capable of holding any value except undef.
SvIV is capable of holding an IV, UV or RV.
SvNV is capable of holding an NV.
SvPV is capable of holding a PV.
SvPVIV is capable of holding a PV, as well as an IV, UV or RV.
...
AV, HV, CV, GV are really just types of scalar too.
Note I said "capable" of holding. You can think of scalars as being objects, and the above as being classes and subclasses. Each of the above has a different structure.
Having these different types of scalars allows one to save memory.
An SvIV (the smallest scalar type capable of holding an IV) is smaller than an SvPV (the smallest scalar type capable of holding a PV).
$ perl -le'
use Devel::Size qw( total_size );
use Inline C => <<'\''__EOI__'\'';
void upgrade_to_iv(SV* sv) {
SvUPGRADE(sv, SVt_IV);
}
void upgrade_to_pv(SV* sv) {
SvUPGRADE(sv, SVt_PV);
}
__EOI__
{ my $x; upgrade_to_iv($x); print total_size($x); }
{ my $x; upgrade_to_pv($x); print total_size($x); }
'
24
40
Using an SvIV instead of an SvPV is a savings of 16 bytes per reference.

Perl booleans, negation (and how to explain it)?

I'm new here. After reading through how to ask and format, I hope this will be an OK question. I'm not very skilled in perl, but it is the programming language what I known most.
I trying apply Perl to real life but I didn't get an great understanding - especially not from my wife. I tell her that:
if she didn't bring to me 3 beers in the evening, that means I got zero (or nothing) beers.
As you probably guessed, without much success. :(
Now factually. From perlop:
Unary "!" performs logical negation, that is, "not".
Languages, what have boolean types (what can have only two "values") is OK:
if it is not the one value -> must be the another one.
so naturally:
!true -> false
!false -> true
But perl doesn't have boolean variables - have only a truth system, whrere everything is not 0, '0' undef, '' is TRUE. Problem comes, when applying logical negation to an not logical value e.g. numbers.
E.g. If some number IS NOT 3, thats mean it IS ZERO or empty, instead of the real life meaning, where if something is NOT 3, mean it can be anything but 3 (e.g. zero too).
So the next code:
use 5.014;
use Strictures;
my $not_3beers = !3;
say defined($not_3beers) ? "defined, value>$not_3beers<" : "undefined";
say $not_3beers ? "TRUE" : "FALSE";
my $not_4beers = !4;
printf qq{What is not 3 nor 4 mean: They're same value: %d!\n}, $not_3beers if( $not_3beers == $not_4beers );
say qq(What is not 3 nor 4 mean: #{[ $not_3beers ? "some bears" : "no bears" ]}!) if( $not_3beers eq $not_4beers );
say ' $not_3beers>', $not_3beers, "<";
say '-$not_3beers>', -$not_3beers, "<";
say '+$not_3beers>', -$not_3beers, "<";
prints:
defined, value><
FALSE
What is not 3 nor 4 mean: They're same value: 0!
What is not 3 nor 4 mean: no bears!
$not_3beers><
-$not_3beers>0<
+$not_3beers>0<
Moreover:
perl -E 'say !!4'
what is not not 4 IS 1, instead of 4!
The above statements with wife are "false" (mean 0) :), but really trying teach my son Perl and he, after a while, asked my wife: why, if something is not 3 mean it is 0 ? .
So the questions are:
how to explain this to my son
why perl has this design, so why !0 is everytime 1
Is here something "behind" what requires than !0 is not any random number, but 0.
as I already said, I don't know well other languages - in every language is !3 == 0?
I think you are focussing to much on negation and too little on what Perl booleans mean.
Historical/Implementation Perspective
What is truth? The detection of a higher voltage that x Volts.
On a higher abstraction level: If this bit here is set.
The abstraction of a sequence of bits can be considered an integer. Is this integer false? Yes, if no bit is set, i.e. the integer is zero.
A hardware-oriented language will likely use this definition of truth, e.g. C, and all C descendants incl Perl.
The negation of 0 could be bitwise negation—all bits are flipped to 1—, or we just set the last bit to 1. The results would usually be decoded as integers -1 and 1 respectively, but the latter is more energy efficient.
Pragmatic Perspective
It is convenient to think of all numbers but zero as true when we deal with counts:
my $wordcount = ...;
if ($wordcount) {
say "We found $wordcount words";
} else {
say "There were no words";
}
or
say "The array is empty" unless #array; # notice scalar context
A pragmatic language like Perl will likely consider zero to be false.
Mathematical Perspective
There is no reason for any number to be false, every number is a well-defined entity. Truth or falseness emerges solely through predicates, expressions which can be true or false. Only this truth value can be negated. E.g.
¬(x ≤ y) where x = 2, y = 3
is false. Many languages which have a strong foundation in maths won't consider anything false but a special false value. In Lisps, '() or nil is usually false, but 0 will usually be true. That is, a value is only true if it is not nil!
In such mathematical languages, !3 == 0 is likely a type error.
Re: Beers
Beers are good. Any number of beers are good, as long as you have one:
my $beers = ...;
if (not $beers) {
say "Another one!";
} else {
say "Aaah, this is good.";
}
Boolification of a beer-counting variable just tells us if you have any beers. Consider !! to be a boolification operator:
my $enough_beer = !! $beers;
The boolification doesn't concern itself with the exact amount. But maybe any number ≥ 3 is good. Then:
my $enough_beer = ($beers >= 3);
The negation is not enough beer:
my $not_enough_beer = not($beers >= 3);
or
my $not_enough_beer = not $beers;
fetch_beer() if $not_enough_beer;
Sets
A Perl scalar does not symbolize a whole universe of things. Especially, not 3 is not the set of all entities that are not three. Is the expression 3 a truthy value? Yes. Therefore, not 3 is a falsey value.
The suggested behaviour of 4 == not 3 to be true is likely undesirable: 4 and “all things that are not three” are not equal, the four is just one of many things that are not three. We should write it correctly:
4 != 3 # four is not equal to three
or
not( 4 == 3 ) # the same
It might help to think of ! and not as logical-negation-of, but not as except.
How to teach
It might be worth introducing mathematical predicates: expressions which can be true or false. If we only ever “create” truthness by explicit tests, e.g. length($str) > 0, then your issues don't arise. We can name the results: my $predicate = (1 < 2), but we can decide to never print them out, instead: print $predicate ? "True" : "False". This sidesteps the problem of considering special representations of true or false.
Considering values to be true/false directly would then only be a shortcut, e.g. foo if $x can considered to be a shortcut for
foo if defined $x and length($x) > 0 and $x != 0;
Perl is all about shortcuts.
Teaching these shortcuts, and the various contexts of perl and where they turn up (numeric/string/boolean operators) could be helpful.
List Context
Even-sized List Context
Scalar Context
Numeric Context
String Context
Boolean Context
Void Context
as I already said, I don't know well other languages - in every language is !3 == 0?
Yes. In C (and thus C++), it's the same.
void main() {
int i = 3;
int n = !i;
int nn = !n;
printf("!3=%i ; !!3=%i\n", n, nn);
}
Prints (see http://codepad.org/vOkOWcbU )
!3=0 ; !!3=1
how to explain this to my son
Very simple. !3 means "opposite of some non-false value, which is of course false". This is called "context" - in a Boolean context imposed by negation operator, "3" is NOT a number, it's a statement of true/false.
The result is also not a "zero" but merely something that's convenient Perl representation of false - which turns into a zero if used in a numeric context (but an empty string if used in a string context - see the difference between 0 + !3 and !3 . "a")
The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed. (perldoc perldata)
why perl has this design, so why !0 is everytime 1
See above. Among other likely reasons (though I don't know if that was Larry's main reason), C has the same logic and Perl took a lot of its syntax and ideas from C.
For a VERY good underlying technical detail, see the answers here: " What do Perl functions that return Boolean actually return " and here: " Why does Perl use the empty string to represent the boolean false value? "
Is here something "behind" what requires than !0 is not any random number, but 0.
Nothing aside from simplicity of implementation. It's easier to produce a "1" than a random number.
if you're asking a different question of "why is it 1 instead of the original # that was negated to get 0", the answer to that is simple - by the time Perl interpreter gets to negate that zero, it no longer knows/remembers that zero was a result of "!3" as opposed to some other expression that resulted in a value of zero/false.
If you want to test that a number is not 3, then use this:
my_variable != 3;
Using the syntax !3, since ! is a boolean operator, first converts 3 into a boolean (even though perl may not have an official boolean type, it still works this way), which, since it is non-zero, means it gets converted to the equivalent of true. Then, !true yields false, which, when converted back to an integer context, gives 0. Continuing with that logic shows how !!3 converts 3 to true, which then is inverted to false, inverted again back to true, and if this value is used in an integer context, gets converted to 1. This is true of most modern programming languages (although maybe not some of the more logic-centered ones), although the exact syntax may vary some depending on the language...
Logically negating a false value requires some value be chosen to represent the resulting true value. "1" is as good a choice as any. I would say it is not important which value is returned (or conversely, it is important that you not rely on any particular true value being returned).

What do Perl functions that return Boolean actually return

The Perl defined function (and many others) returns "a Boolean value".
Given Perl doesn't actually have a Boolean type (and uses values like 1 for true, and 0 or undef for false) does the Perl language specify exactly what is returned for a Boolean values? For example, would defined(undef) return 0 or undef, and is it subject to change?
In almost all cases (i.e. unless there's a reason to do otherwise), Perl returns one of two statically allocated scalars: &PL_sv_yes (for true) and &PL_sv_no (for false). This is them in detail:
>perl -MDevel::Peek -e"Dump 1==1"
SV = PVNV(0x749be4) at 0x3180b8
REFCNT = 2147483644
FLAGS = (PADTMP,IOK,NOK,POK,READONLY,pIOK,pNOK,pPOK)
IV = 1
NV = 1
PV = 0x742dfc "1"\0
CUR = 1
LEN = 12
>perl -MDevel::Peek -e"Dump 1==0"
SV = PVNV(0x7e9bcc) at 0x4980a8
REFCNT = 2147483647
FLAGS = (PADTMP,IOK,NOK,POK,READONLY,pIOK,pNOK,pPOK)
IV = 0
NV = 0
PV = 0x7e3f0c ""\0
CUR = 0
LEN = 12
yes is a triple var (IOK, NOK and POK). It contains a signed integer (IV) equal to 1, a floating point number (NV) equal to 1, and a string (PV) equal to 1.
no is also a triple var (IOK, NOK and POK). It contains a signed integer (IV) equal to 0, a floating point number (NV) equal to 0, and an empty string (PV). This means it stringifies to the empty string, and it numifies to 0. It is neither equivalent to an empty string
>perl -wE"say 0+(1==0);"
0
>perl -wE"say 0+'';"
Argument "" isn't numeric in addition (+) at -e line 1.
0
nor to 0
>perl -wE"say ''.(1==0);"
>perl -wE"say ''.0;"
0
There's no guarantee that this will always remain the case. And there's no reason to rely on this. If you need specific values, you can use something like
my $formatted = $result ? '1' : '0';
They return a special false value that is "" in string context but 0 in numeric context (without a non-numeric warning). The true value isn't so special, since it's 1 in either context. defined() does not return undef.
(You can create similar values yourself with e.g. Scalar::Util::dualvar(0,"").)
Since that's the official man page I'd say that its exact return value is not specified. If the Perl documentation talks about a Boolean value then then it almost always talks about evaluating said value in a Boolean context: if (defined ...) or print while <> etc. In such contexts several values evaluate to a false: 0, undef, "" (empty strings), even strings equalling "0".
All other values evaluate to true in a Boolean context, including the infamous example "0 but true".
As the documentation is that vague I would not ever rely on defined() returning any specific value for the undefined case. However, you'll always be OK if you simply use defined() in a Boolean context without comparing it to a specific value.
OK: print "yes\n" if defined($var)
Not portable/future proof: print "yes\n" if defined($var) eq '' or something similar
It probably won't ever change, but perl does not specify the exact boolean value that defined(...) returns.
When using Boolean values good code should not depend on the actual value used for true and false.
Example:
# not so great code:
my $bool = 0; #
...
if (some condition) {
$bool = 1;
}
if ($bool == 1) { ... }
# better code:
my $bool; # default value is undef which is false
$bool = some condition;
if ($bool) { ... }
99.9% of the time there is no reason to care about the value used for the boolean.
That said, there are some cases when it is better to use an explicit 0 or 1 instead of the boolean-ness of a value. Example:
sub foo {
my $object = shift;
...
my $bool = $object;
...
return $bool;
}
the intent being that foo() is called with either a reference or undef and should return false if $object is not defined. The problem is that if $object is defined foo() will return the object itself and thus create another reference to the object, and this may interfere with its garbage collection. So here it would be better to use an explicit boolean value here, i.e.:
my $bool = $object ? 1 : 0;
So be careful about using a reference itself to represent its truthiness (i.e. its defined-ness) because of the potential for creating unwanted references to the reference.

How can I set a default value for a Perl variable?

I am completely new to Perl. I needed to use an external module HTTP::BrowserDetect. I was testing some code and tried to get the name of the OS from os_string method. So, I simply initialized the object and created a variable to store the value returned.
my $ua = HTTP::BrowserDetect->new($user_agent);
my $os_name = $ua->os_string();
print "$user_agent $os_name\n";
there are some user agents that are not browser user agents so they won't get any value from os_string. I am getting an error Use of uninitialized value $os_name in concatenation (.) or string
How do I handle such cases when the $os_name is not initialized because the method os_string returns undef (this is what I think happens from reading the module source code). I guess there should be a way to give a default string, e.g. No OS in these cases.
Please note: the original answer's approach ( $var = EXPRESSION || $default_value ) would produce the default value for ANY "Perl false values" returned from the expression, e.g. if the expression is an empty string, you will use the default value instead.
In your particular example it's probably the right thing to do (OS name should not be an empty string), but in general case, it can be a bug, if what you actually wanted was to only use the default value instead of undef.
If you only want to avoid undef values, but not empty string or zeros, you can do:
Perl 5.10 and later: use the "defined-or operator" (//):
my $os_name = $ua->os_string() // 'No OS';
Perl 5.8 and earlier: use a conditional operator because defined-or was not yet available:
my $os_string = $ua->os_string(); # Cache the value to optimize a bit
my $os_name = (defined $os_string) ? $os_string : 'No OS';
# ... or, un-optimized version (sometimes more readable but rarely)
my $os_name = (defined $ua->os_string()) ? $ua->os_string() : 'No OS';
A lot more in-depth look at the topic (including details about //) are in brian d foy's post here: http://www.effectiveperlprogramming.com/2010/10/set-default-values-with-the-defined-or-operator/
my $os_name = $ua->os_string() || 'No OS';
If $ua->os_string() is falsy (ie: undef, zero, or the empty string), then the second part of the || expression will be evaluated (and will be the value of the expression).
Are you looking for defined?