What is the definition of stringwise equality in Perl? - perl

I was curious about the results of the following perl snippets:
my $var1 ;
my $var2 ;
if( $var1 eq $var2 ) {
print "yes";
} else {
print "no";
}
and
my $var1 ;
my $var2 = "";
if( $var1 eq $var2 ) {
print "yes";
} else {
print "no";
}
They turn out to be yes(Perl 5.16).
Unlike javascript specificaton, there is clear description of Equality Comparison Algorithm(http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.3), the perl doc for Equality Operators says:
Binary "eq" returns true if the left argument is stringwise equal to the right argument.
But what is the definition of stringwise equality?

You're not having a problem with the definition of stringwise equality. What you don't seem to have wrapped your head arround yet is the concept of stringification. In this particular case, undef stringifies to ''.

Perl has monomorphic operators (mostly) and contextually polymorphic data types.
What this means is that it is the operator that dictates how the data is represented. The eq operator is for stringwise equality. The == operator is for numeric equality. When == is applied to strings, they are treated as numbers. When eq is applied to numbers, they are treated as strings. Each of these internal transformations follow a very specific set of rules.
Thus, when you apply a stringwise operator to a pair of values, those values will be treated as strings. When you apply a numeric operator to a pair of values, they will be treated as numbers.
In your second example, $var1 contains undef, and $var2 contains an empty string. When you compare $var1 eq $var2, the rules of stringification are applied to the operands. $var2 is already a string. But $var1 must be stringified before it can be compared. The rule of stringification for an undefined value is to treat it as an empty string. So $var1 eq $var2 is seen by the eq operator as '' eq '', which is to say, empty string eq empty string.
TRUE.
Note: In modern versions of Perl, using an undefined scalar variable in a stringwise operation only results in stringification for the duration of the operation; the underlying data in the container isn't altered. (see perldata).
perldata is one resource on this topic.

Your definition question has been answered by DavidO, but the (probably) most important piece of information is in TLP's comment - you should always use these two at the top of every script:
use strict;
use warnings;
The one relevant in this case is use warnings, which will produce the following warning:
Use of uninitialized value $var1 in string eq
Some programs go even further - this will cause the program to die because of this:
use warnings FATAL => 'all';
You are comparing variables using eq, which means you are doing a string comparison. A string comparison expects two strings, but you are providing an undefined variable (variable declared, but not defined, so not a string). This is not clean because you are using that operator with invalid input. If you wanted to know if the variables differ, not just the strings they (may or may not) represent, you would not use a string comparison like that.
It works because Perl knows that you want to compare strings (eq) so it assumes you don't care about the fact that a variable has no value assigned. The undefined variable is converted into an empty string (temporarily), which is then compared. (Actually, the variable itself isn't converted, it's still undef after the comparison, but that's not important.)
Of course, this assumption could be wrong and there might be a bug in your code. So it'd be better to check for invalid input before comparing it.
You don't care about the difference between undef and '' (and you know why).
You could explicitly compare empty strings instead of undef. Another programmer who is reading your code will know what's going on (won't assume there's a bug).
if (($var1 // q()) eq ($var2 // q()))
...
In many cases, you might actually care about undef.
For example, your script might take some input (maybe a hash) and if that input variable is an empty string, that would be okay, but if it's undef (maybe not found in the input hash), that would be an error.
if (!defined($var1))
{
die "Input data missing, can't continue!";
}
if ($var1 eq $var2)
...

Related

Does fetchrow_hashref return into a default value?

I'm trying to write like:
my $q='select name as N from names';
my $sth=$dbh->prepare( $q ) ;
$sth->execute;
$test->{ $_->{N} } = 1 while $sth->fetchrow_hashref();
but $_ is undef. Does the return value get stored somewhere accessible without explicitly assigning it? Tim, it sure would be handy if it got stored somewhere!
The while loop does not always implicitly assign a value to the default operator $_. As described in perldoc perlsyn:
If the condition expression of a while statement is based on any of a
group of iterative expression types then it gets some magic treatment.
The affected iterative expression types are readline, the
input operator, readdir, glob, the globbing operator, and
each. If the condition expression is one of these expression types,
then the value yielded by the iterative operator will be implicitly
assigned to $_.
(...and otherwise it is not, is implied here)
To do what you want, you need to do
... while $_ = $sth->fetchrow_hashref();
Or better yet, use the idiomatic syntax:
while (my $row = $sth->fetchrow_hashref()) {
$test->{ $row->{N} } = 1;
}

How can #? be used on a dereferenced array without first using #?

An array in perl is dereferenced like so,
my #array = #{$array_reference};
When trying to assign an array to a dereference without the '#', like,
my #array = {$array_reference};
Perl throws the error, 'Odd number of elements in anonymous hash at ./sand.pl line 22.' We can't assign it to an array variable becauase Perl is confused about the type.
So how can we perform...
my $lastindex = $#{$array_reference};
if Perl struggles to understand that '{$array_reference}' is an array type? It would make more sense to me if this looked like,
my $lastindex = $##{$array_reference};
(despite looking much uglier).
tl;dr: It's $#{$array_reference} to match the syntax of $#array.
{} is overloaded with many meanings and that's just how Perl is.
Sometimes {} creates an anonymous hash. That's what {$array_reference} is doing, trying to make a hash where the key is the stringification of $array_reference, something like "ARRAY(0x7fb21e803280)" and there is no value. Because you're trying to create a hash with a key and no value you get an "odd number of elements" warning.
Sometimes {...} is a block like sub { ... } or if(...) { ... }, or do {...} and so on.
Sometimes it's a bare block like { local $/; ... }.
Sometimes it's indicating the key of a hash like $hash{key} or $hash->{key}.
Preceeded with certain sigils {} makes dereferencing explicit. While you can write $#$array_reference or #$array_reference sometimes you want to dereference something that isn't a simple scalar. For example, if you had a function that returned an array reference you could get its size in one line with $#{ get_array_reference() }. It's $#{$array_reference} to match the syntax of $#array.
$#{...} dereferences an array and gets the index. #{...} dereferences an array. %{...} dereferences a hash. ${...} dereferences a scalar. *{...} dereferences a glob.
You might find the section on Variable Names and Sigils in Modern Perl helpful to see the pattern better.
It would make more sense to me if this looked like...
There's a lot of things like that. Perl has been around since 1987. A lot of these design decisions were made decades ago. The code for deciding what {} means is particularly complex. That there is a distinction between an array and an array reference at all is a bit odd.
$array[$index]
#array[#indexes]
#array
$#array
is equivalent to
${ \#array }[$index]
#{ \#array }[#indexes]
#{ \#array }
$#{ \#array }
See the pattern? Wherever the NAME of an array isused, you can use a BLOCK that returns a reference to an array instead. That means you can use
${ $ref }[$index]
#{ $ref }[#indexes]
#{ $ref }
$#{ $ref }
This is illustrated in Perl Dereferencing Syntax.
Note that you can omit the curlies if the BLOCK contains nothing but a simple scalar.
$$ref[$index]
#$ref[#indexes]
#$ref
$#$ref
There's also an "arrow" syntax which is considered clearer.
$ref->[$index]
$ref->#[#indexes]
$ref->#*
$ref->$#*
Perl is confused about the type
Perl struggles to understand that '{$array_reference}' is an array type
Well, it's not an array type. Perl doesn't "struggle"; you just have wrong expectations.
The general rule (as explained in perldoc perlreftut) is: You can always use a reference in curly braces in place of a variable name.
Thus:
#array # a whole array
#{ $array_ref } # same thing with a reference
$array[$i] # an array element
${ $array_ref }[$i] # same thing with a reference
$#array # last index of an array
$#{ $array_ref } # same thing with a reference
On the other hand, what's going on with
my #array = {$array_reference};
is that you're using the syntax for a hash reference constructor, { LIST }. The warning occurs because the list in question is supposed to have an even number of elements (for keys and values):
my $hash_ref = {
key1 => 'value1',
key2 => 'value2',
};
What you wrote is treated as
my #array = ({
$array_reference => undef,
});
i.e. an array containing a single element, which is a reference to a hash containing a single key, which is a stringified reference (and whose value is undef).
The syntactic difference between a dereference and a hashref constructor is that a dereference starts with a sigil (such as $, #, or %) whereas a hashref constructor starts with just a bare {.
Technically speaking the { } in the dereference syntax form an actual block of code:
print ${
print "one\n"; # yeah, I just put a statement in the middle of an expression
print "two\n";
["three"] # the last expression in this block is implicitly returned
# (and dereferenced by the surrounding $ [0] construct outside)
}[0], "\n";
For (hopefully) obvious reasons, no one actually does this in real code.
The syntax is
my $lastindex = $#$array_reference;
which assigns to $lastindex the index of the last element of the anonymous array which reference is in the variable $array_reference.
The code
my #ary = { $ra }; # works but you get a warning
doesn't throw "an error" but rather a warning. In other words, you do get #ary with one element, a reference to an anonymous hash. However, a hash need have an even number of elements so you also get a warning that that isn't so.
Your last attempt dereferences the array with #{$array_reference} -- which returns a list, not an array variable. A "list" is a fleeting collection of scalars in memory (think of copying scalars on stack to go elsewhere); there is no notion of "index" for such a thing. For this reason a $##{$ra} isn't even parsed as intended and is a syntax error.
The syntax $#ary works only with a variable #ary, and then there is the $#$arrayref syntax. You can in general write $#{$arrayref} since the curlies allow for an arbitrary expression that evaluates to an array reference but there is no reason for that since you do have a variable with an array reference.
I'd agree readily that much of this syntax takes some getting-used-to, to put it that way.

Comparing empty string with number

I have a global value:
my $gPrevious ='';
# main();
func();
sub func {
my $localval = 52552;
if ($gPrevious != $localval) {
------------
x statements;
}
}
Output:
Argument "" isn't numeric in numeric ne (!=) at line x.
Why do I use the ne operator?
Here I am comparing with an empty string.
The value undef is there specifically so that you can test whether a variable has been defined yet. Initialising a variable to a string when it is to be compared to a number rather defeats this purpose.
You should leave your $gPrevious variable undefined, and test for that inside your subroutine.
Like this
my $gPrevious;
func();
sub func {
my $localval = 52552;
unless (defined $gPrevious and $gPrevious == $localval) {
# x statements;
}
}
Using the '==' or '!=' comparison operator requires having two numbers.
If one of the two is a string that perl can easily make a number such as "1", perl will do what you want and compare the 1.
But: If you compare a string or undef with the '!=' operator, there will be a warning.
In your case, I suggest you simply use
$gPrevious = 0;
# ...
if ( $gPrevious != $local_val ){ #...
which would get rid of the warning and work fine.
Testing for defined is another possibility, while testing for integers is not trivial.
You could use eq instead, but this is problematic, as then for example '1.0' ne '1'.
Perl provides two separate operators for string comparison (eq/ne) and numeric comparison (==/!=). LHS value must match the type of operator being used. In your case, LHS is a string and you are using a numeric operator for comparison. Hence, the error.

Perl dereferencing in non-strict mode

In Perl, if I have:
no strict;
#ARY = (58, 90);
To operate on an element of the array, say it, the 2nd one, I would write (possibly as part of a larger expression):
$ARY[1] # The most common way found in Perldoc's idioms.
Though, for some reason these also work:
#ARY[1]
#{ARY[1]}
Resulting all in the same object:
print (\$ARY[1]);
print (\#ARY[1]);
print (\#{ARY[1]});
Output:
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
What is the syntax rules that enable this sort of constructs? How far could one devise reliable program code with each of these constructs, or with a mix of all of them either? How interchangeable are these expressions? (always speaking in a non-strict context).
On a concern of justifying how I come into this question, I agree "use strict" as a better practice, still I'm interested at some knowledge on build-up non-strict expressions.
In an attemp to find myself some help to this uneasiness, I came to:
The notion on "no strict;" of not complaining about undeclared
variables and quirk syntax.
The prefix dereference having higher precedence than subindex [] (perldsc § "Caveat on precedence").
The clarification on when to use # instead of $ (perldata § "Slices").
The lack of "[]" (array subscript / slice) description among the Perl's operators (perlop), which lead me to think it is not an
operator... (yet it has to be something else. But, what?).
For what I learned, none of these hints, put together, make me better understand my issue.
Thanks in advance.
Quotation from perlfaq4:
What is the difference between $array[1] and #array[1]?
The difference is the sigil, that special character in front of the array name. The $ sigil means "exactly one item", while the # sigil means "zero or more items". The $ gets you a single scalar, while the # gets you a list.
Please see: What is the difference between $array[1] and #array[1]?
#ARY[1] is indeed a slice, in fact a slice of only one member. The difference is it creates a list context:
#ar1[0] = qw( a b c ); # List context.
$ar2[0] = qw( a b c ); # Scalar context, the last value is returned.
print "<#ar1> <#ar2>\n";
Output:
<a> <c>
Besides using strict, turn warnings on, too. You'll get the following warning:
Scalar value #ar1[0] better written as $ar1[0]
In perlop, you can read that "Perl's prefix dereferencing operators are typed: $, #, %, and &." The standard syntax is SIGIL { ... }, but in the simple cases, the curly braces can be omitted.
See Can you use string as a HASH ref while "strict refs" in use? for some fun with no strict refs and its emulation under strict.
Extending choroba's answer, to check a particular context, you can use wantarray
sub context { return wantarray ? "LIST" : "SCALAR" }
print $ary1[0] = context(), "\n";
print #ary1[0] = context(), "\n";
Outputs:
SCALAR
LIST
Nothing you did requires no strict; other than to hide your error of doing
#ARY = (58, 90);
when you should have done
my #ARY = (58, 90);
The following returns a single element of the array. Since EXPR is to return a single index, it is evaluated in scalar context.
$array[EXPR]
e.g.
my #array = qw( a b c d );
my $index = 2;
my $ele = $array[$index]; # my $ele = 'c';
The following returns the elements identified by LIST. Since LIST is to return 0 or more elements, it must be evaluated in list context.
#array[LIST]
e.g.
my #array = qw( a b c d );
my #indexes ( 1, 2 );
my #slice = $array[#indexes]; # my #slice = qw( b c );
\( $ARY[$index] ) # Returns a ref to the element returned by $ARY[$index]
\( #ARY[#indexes] ) # Returns refs to each element returned by #ARY[#indexes]
${foo} # Weird way of writing $foo. Useful in literals, e.g. "${foo}bar"
#{foo} # Weird way of writing #foo. Useful in literals, e.g. "#{foo}bar"
${foo}[...] # Weird way of writing $foo[...].
Most people don't even know you can use these outside of string literals.

= and , operators in Perl

Please explain this apparently inconsistent behaviour:
$a = b, c;
print $a; # this prints: b
$a = (b, c);
print $a; # this prints: c
The = operator has higher precedence than ,.
And the comma operator throws away its left argument and returns the right one.
Note that the comma operator behaves differently depending on context. From perldoc perlop:
Binary "," is the comma operator. In
scalar context it evaluates its left
argument, throws that value away, then
evaluates its right argument and
returns that value. This is just like
C's comma operator.
In list context, it's just the list
argument separator, and inserts both
its arguments into the list. These
arguments are also evaluated from left
to right.
As eugene's answer seems to leave some questions by OP i try to explain based on that:
$a = "b", "c";
print $a;
Here the left argument is $a = "b" because = has a higher precedence than , it will be evaluated first. After that $a contains "b".
The right argument is "c" and will be returned as i show soon.
At that point when you print $a it is obviously printing b to your screen.
$a = ("b", "c");
print $a;
Here the term ("b","c") will be evaluated first because of the higher precedence of parentheses. It returns "c" and this will be assigned to $a.
So here you print "c".
$var = ($a = "b","c");
print $var;
print $a;
Here $a contains "b" and $var contains "c".
Once you get the precedence rules this is perfectly consistent
Since eugene and mugen have answered this question nicely with good examples already, I am going to setup some concepts then ask some conceptual questions of the OP to see if it helps to illuminate some Perl concepts.
The first concept is what the sigils $ and # mean (we wont descuss % here). # means multiple items (said "these things"). $ means one item (said "this thing"). To get first element of an array #a you can do $first = $a[0], get the last element: $last = $a[-1]. N.B. not #a[0] or #a[-1]. You can slice by doing #shorter = #longer[1,2].
The second concept is the difference between void, scalar and list context. Perl has the concept of the context in which your containers (scalars, arrays etc.) are used. An easy way to see this is that if you store a list (we will get to this) as an array #array = ("cow", "sheep", "llama") then we store the array as a scalar $size = #array we get the length of the array. We can also force this behavior by using the scalar operator such as print scalar #array. I will say it one more time for clarity: An array (not a list) in scalar context will return, not an element (as a list does) but rather the length of the array.
Remember from before you use the $ sigil when you only expect one item, i.e. $first = $a[0]. In this way you know you are in scalar context. Now when you call $length = #array you can see clearly that you are calling the array in scalar context, and thus you trigger the special property of an array in list context, you get its length.
This has another nice feature for testing if there are element in the array. print '#array contains items' if #array; print '#array is empty' unless #array. The if/unless tests force scalar context on the array, thus the if sees the length of the array not elements of it. Since all numerical values are 'truthy' except zero, if the array has non-zero length, the statement if #array evaluates to true and you get the print statement.
Void context means that the return value of some operation is ignored. A useful operation in void context could be something like incrementing. $n = 1; $n++; print $n; In this example $n++ (increment after returning) was in void context in that its return value "1" wasn't used (stored, printed etc).
The third concept is the difference between a list and an array. A list is an ordered set of values, an array is a container that holds an ordered set of values. You can see the difference for example in the gymnastics one must do to get particular element after using sort without storing the result first (try pop sort { $a cmp $b } #array for example, which doesn't work because pop does not act on a list, only an array).
Now we can ask, when you attempt your examples, what would you want Perl to do in these cases? As others have said, this depends on precedence.
In your first example, since the = operator has higher precedence than the ,, you haven't actually assigned a list to the variable, you have done something more like ($a = "b"), ("c") which effectively does nothing with the string "c". In fact it was called in void context. With warnings enabled, since this operation does not accomplish anything, Perl attempts to warn you that you probably didn't mean to do that with the message: Useless use of a constant in void context.
Now, what would you want Perl to do when you attempt to store a list to a scalar (or use a list in a scalar context)? It will not store the length of the list, this is only a behavior of an array. Therefore it must store one of the values in the list. While I know it is not canonically true, this example is very close to what happens.
my #animals = ("cow", "sheep", "llama");
my $return;
foreach my $animal (#animals) {
$return = $animal;
}
print $return;
And therefore you get the last element of the list (the canonical difference is that the preceding values were never stored then overwritten, however the logic is similar).
There are ways to store a something that looks like a list in a scalar, but this involves references. Read more about that in perldoc perlreftut.
Hopefully this makes things a little more clear. Finally I will say, until you get the hang of Perl's precedence rules, it never hurts to put in explicit parentheses for lists and function's arguments.
There is an easy way to see how Perl handles both of the examples, just run them through with:
perl -MO=Deparse,-p -e'...'
As you can see, the difference is because the order of operations is slightly different than you might suspect.
perl -MO=Deparse,-p -e'$a = a, b;print $a'
(($a = 'a'), '???');
print($a);
perl -MO=Deparse,-p -e'$a = (a, b);print $a'
($a = ('???', 'b'));
print($a);
Note: you see '???', because the original value got optimized away.