What is the most efficient operator to compare any two items? - type-conversion

Frequently I need to convert data from one type to another and then compare them. Some operators will convert to specific types first and this conversion may cause loss of efficiency. For instance, I may have
my ($a, $b) = 0, "foo"; # initial value
$a = (3,4,5).Set; # re-assign value
$b = open "dataFile"; # re-assign value
if $a eq $b { say "okay"; } # convert to string
if $a == 5 { say "yes"; } # convert to number
if $a == $b {} # error, Cannot resolve caller Numeric(IO::Handle:D: );
The operators "eq" and "==" will convert data to the digestible types first and may slow things down. Will the operators "eqv" and "===" skip converting data types and be more efficient if data to be compared cannot be known in advance (i.e., you absolutely have no clue what you are going to get in advance)?

It's not quite clear to me from the question if you actually want the conversions to happen or not.
Operators like == and eq are really calls to multi subs with names like infix:<==>, and there are many candidates. For example, there's one for (Int, Int), which is selected if we're comparing two Ints. In that case, it knows that it doesn't need to coerce, and will just do the integer comparison.
The eqv and === operators will not coerce; the first thing they do is to check that the values have the same type, and if they don't, they go no further. Make sure to use the correct one depending on whether you want snapshot semantics (eqv) or reference semantics (===). Note that the types really must be exactly the same, so 1e0 === 1 will not come out true because the one value is a Num and the other an Int.
The auto-coercion behavior of operators like == and eq can be really handy, but from a performance angle it can also be a trap. They coerce, use the result of the coercion for the comparison, and then throw it away. Repeatedly doing comparisons can thus repeatedly trigger coercions. If you have that situation, it makes sense to split the work into two phases: first "parse" the incoming data into the appropriate data types, and then go ahead and do the comparisons.
Finally, in any discussion on efficiency, it's worth noting that the runtime optimizer is good at lifting out duplicate type checks. Thus while in principle, if you read the built-ins source, == might seem cheaper in the case where it gets two things that have the same type, because it isn't enforcing that they are precisely the same type, in reality that extra check will get optimized out for === anyway.

Both === and eqv first check whether the operands are of the same type, and will return False if they are not. So at that stage, there is no real difference between them.
The a === b operator is really short for a.WHICH eq b.WHICH. So it would call the .WHICH method on the operands, which could be expensive if an operand is something like a really large Buf.
The a eqv b operator is more complicated in that it has special cased many object comparisons, so in general you cannot say much about it.
In other words: YMMV. And if you're really interested in performance, benchmark! And be prepared to adapt your code if another way of solving the problem proves more performant.

Related

*why* does list assignment flatten its left hand side?

I understand that list assignment flattens its left hand side:
my ($a, $b, $c);
($a, ($b, $c)) = (0, (1.0, 1.1), 2);
say "\$a: $a"; # OUTPUT: «$a: 0»
say "\$b: $b"; # OUTPUT: «$b: 1 1.1» <-- $b is *not* 1
say "\$c: $c"; # OUTPUT: «$c: 2» <-- $c is *not* 1.1
I also understand that we can use :($a, ($b, $c)) := (0, (1.0, 1.1)) to get the non-flattening behavior.
What I don't understand is why the left hand side is flattened during list assignment. And that's kind of two questions: First, how does this flattening behavior fit in with the rest of the language? Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
On the first question, I know that Raku historically had a lot of auto-flattening behavior. Before the Great List Refactor, an expression like my @a = 1, (2, 3), 4 would auto-flatten its right hand side, resulting in the Array [1, 2, 3, 4]; similarly, map and many other iterating constructs would flatten their arguments. Post-GLR, though, Raku basically never flattens a list without being told to. In fact, I can't think of any other situation where Raku flattens without flat, .flat, |, or *@ being involved somehow. (@_ creates an implicit *@.) Am I missing something, or is the LHS behavior in list assignment really inconsistent with post-GLR semantics? Is this behavior a historical oddity, or does it still make sense?
With respect to my second question, I suspect that the flattening behavior of list assignment may somehow help support for laziness. For example, I know that we can use list assignment to consume certain values from a lazy list without producing/calculating them all – whereas using := with a list will need to calculate all of the RHS values. But I'm not sure if/how auto-flattening the LHS is required to support this behavior.
I also wonder if the auto-flattening has something to do with the fact that = can be passed to meta operators – unlike :=, which generates a "too fiddly" error if used with a metaoperator. But I don't know how/if auto-flattening makes = less "fiddly".
[edit: I've found IRC references to the "(GLR-preserved) decision that list assignment is flattening" as early as 2015-05-02, so it's clear that this decision was intentional and well-justified. But, so far, I haven't found that justification and suspect that it may have been decided at in-person meetings. So I'm hoping someone knows.]
Finally, I also wonder how the LHS is flattened, at a conceptual level. (I don't mean in the Rakudo implementation specifically; I mean as a mental model). Here's how I'd been thinking about binding versus list assignment:
my ($a, :$b) := (4, :a(2)); # Conceptually similar to calling .Capture on the RHS
my ($c, $d, $e);
($c, ($d, $e)) = (0, 1, 2); # Conceptually similar to calling flat on the LHS
Except that actually calling .Capture on the RHS in line 1 works, whereas calling flat on the LHS in line 3 throws a Cannot modify an immutable Seq error – which I find very confusing, given that we flatten Seqs all the time. So is there a better mental model for thinking about this auto-flattening behavior?
Thanks in advance for any help. I'm trying to understand this better as part of my work to improve the related docs, so any insight you can provide would support that effort.
Somehow, answering the parts of the question in the opposite order felt more natural to me. :-)
Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
It's relatively common to want to assign the first (or first few) items of a list into scalars and have the rest placed into an array. List assignment descending into iterables on the left is what makes this work:
my ($first, $second, @rest) = 1..5;
.say for $first, $second, @rest;
The output being:
1
2
[3 4 5]
With binding, which respects structure, it would instead be more like:
my ($first, $second, *@rest) := |(1..5);
First, how does this flattening behavior fit in with the rest of the language?
In general, operations where structure would not have meaning flatten it away. For example:
# Process arguments
my $proc = Proc::Async.new($program, @some-args, @some-others);
# Promise combinators
await Promise.anyof(@downloads, @uploads);
# File names
unlink @temps, @previous-output;
# Hash construction
my @a = x => 1, y => 2;
my @b = z => 3;
dd hash @a, @b; # {:x(1), :y(2), :z(3)}
List assignment could, of course, have been defined in a structure-respecting way instead. These things tend to happen for multiple reasons, but for one, the language already has binding for when you do want to do structured things, and for another, my ($first, @rest) = @all is just a bit too common to send folks wanting it down the binding/slurpy power tool path.

Is the uplus function useful?

This is a rhetorical question about the uplus function in MATLAB, or its corresponding operator, the unary plus +.
Is there a case where this operator is useful? Even better, is there a case where this operator is necessary?
It is not necessary, but a language without a unary plus does not allow you to write +1. Obviously you could also write 1, but when importing data that always includes the + or - sign, it's very nice to have.
Searching through some source code, I found a curious use of +:
A=+A
which replaced the code:
if ~isnumeric(A)
A=double(A);
end
It casts chars and logicals to double, but all numeric data types remain untouched.
It can be useful when defining new numeric types.
Suppose you define quaternion and overload uplus:
classdef quaternion
...
end
Then in your code you can write:
x = quaternion(...);
y = [+x, -x];
z = +quaternion.Inf;
t = -quaternion.Inf;
If you don't, you cannot use the same syntax as for other numeric types.
PS: To the question "is it useful" (in the sense of mandatory for some syntaxes) ... well I can't find any reason ... but sometimes writing '+x' makes things clearer when reading back the code.
I'm not sure if this fully constitutes "useful" or if it's the best programming practice, but in some cases, one may wish to use the unary + for symmetry/clarity reasons. There's probably a better example, but I'm thinking of something like this:
A = [+1 -1 +1;
-1 +1 -1;
+1 -1 +1];
As for the uplus function, it's kind of a NOOP for numeric operations. If one writes a function that requires a function handle input to specify an operation to perform, it might be useful to have a do-nothing option.
Lastly, numeric operators can be overloaded for other classes. The uplus function could have more use in other built-in classes or even ones you might want to write yourself.

Concerns with concatenating strings and ints

I have taken a principles of programming class and have been given a Perl expression that concatenates a string number to an int number and then adds another number to it and it evaluates fine. i.e. ("4" . 3) + 7 == 50.
I'm trying to understand why Perl does this and what concerns it may bring up. I'm having a hard time grasping many of the concepts of the class and am trying to get explanations from different sources apart from my horrible teacher and equally horrible notes.
Can the concept behind this kind of expression be explained to me as well as concerns they might bring up? Thanks in advance for the help.
Edit: For Clarity
Perl is built around the central concept of 'do what I mean'.
A scalar is a multi purpose variable type, and is intended to implicitly cast values to a data type that's appropriate to what you're doing.
The reason this works is because perl is context sensitive - it knows the difference between different expected return values.
At a basic level, you can see this with the wantarray function. (Which as noted below - is probably badly named, because we're talking about a LIST context)
sub context_test {
if ( not defined wantarray() ) {
print "Void context\n";
}
if ( wantarray() ) {
return ( "List", "Context" );
}
else {
return "scalar context";
}
}
context_test();
my $scalar = context_test();
my @list = context_test();
print "Scalar context gave me $scalar\n";
print "List context gave me @list\n";
This principle occurs throughout perl. If you want, you can use something like Contextual::Return to extend this further - testing the difference between numeric, string and boolean subsets of scalar contexts.
The reason I mention this is because a scalar is a special sort of data type - if you look at Scalar::Util you will see a capability of creating a dualvar - a scalar that has different values in different contexts.
use Scalar::Util; # load the module that provides dualvar and isdual
my $dualvar = Scalar::Util::dualvar ( 666, "the beast" );
print "Numeric:",$dualvar + 0,"\n";
print "String:",$dualvar . '',"\n";
Now, messing around with dualvars is a good way to create some really annoying and hard to trace bugs, but the point is - a scalar is a magic datatype, and perl is always aware of what you're doing with the result.
If you perform a string operation, perl treats it as a string. If you perform a numeric operation, perl tries to treat it as a number.
my $value = '4'; #string;
my $newvalue = $value . 3; #because we concat, perl treats _both_ as strings.
print $newvalue,"\n";
my $sum = $newvalue + 7; #perl turns strings back to numbers, because we're adding;
print $sum,"\n";
if ( Scalar::Util::isdual ( $newvalue ) ) { print "newvalue Is a dual var\n" };
if ( not Scalar::Util::isdual ( $sum ) ) { print "sum is NOT a dual var\n"; };
Mostly 'context' is something that happens behind the scenes in perl, and you don't have to worry about it. If you've come from a programming background, the idea of implicit casting between int and string may seem a little bit dirty. But it mostly works fine.
You may occasionally get warnings like:
Argument "4a3" isn't numeric in addition (+)
One of the downsides of this approach is that these problems only surface at runtime, because you're not doing strong type checking at 'compile' time.
So in terms of specific concerns:
You're runtime type checking, not compile time. If you have strict types, you can detect an attempt to add a string to an int before you start to run anything.
You're not always operating in the context that you assume you are, which can lead to some unpredictable behaviour. One of the best examples is that print operates in a list context - so to take the example above:
print context_test();
You'll get List Context.
If you monkey around with context sensitive return types, you can create some really annoying bugs that are immensely irritating to back trace and troubleshoot.

if (Option.nonEmpty) vs Option.foreach

I want to perform some logic if the value of an option is set.
Coming from a java background, I used:
if (opt.nonEmpty) {
//something
}
Going a little further into scala, I can write that as:
opt.foreach(o => {
//something
})
Which one is better? The "foreach" one sounds more "idiomatic" and less Java, but it is less readable - "foreach" applied to a single value sounds weird.
Your example is not complete and you don't use minimal syntax. Just compare these two versions:
if (opt.nonEmpty) {
val o = opt.get
// ...
}
// vs
opt foreach {
o => // ...
}
and
if (opt.nonEmpty)
doSomething(opt.get)
// vs
opt foreach doSomething
In both versions there is more syntactic overhead in the if solution, but I agree that foreach on an Option can be confusing if you think of it only as an optional value.
Instead, foreach describes that you want to perform some sort of side effect, which makes a lot of sense if you think of Option as a monad and foreach as just a method that operates on it. Using foreach furthermore has the great advantage that it makes refactorings easier: you can just change the type to a List or any other monad and you will not get any compiler errors (because of Scala's great collections library you are not constrained to use only operations that work on monads; there are a lot of methods defined on a lot of types).
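As a rough illustration of the refactoring point, here is a minimal sketch. The object and method names (ForeachRefactor, visitOption, visitList) are made up for this example; the only thing that differs between the two methods is the parameter type, while the foreach call itself stays the same.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical names, for illustration only.
object ForeachRefactor {
  // Records every element that foreach visits, so the side effect is observable.
  def visitOption(opt: Option[Int]): List[Int] = {
    val seen = ListBuffer.empty[Int]
    opt.foreach(n => seen += n) // runs zero or one time
    seen.toList
  }

  // Same body after changing the parameter type from Option to List.
  def visitList(xs: List[Int]): List[Int] = {
    val seen = ListBuffer.empty[Int]
    xs.foreach(n => seen += n) // runs once per element
    seen.toList
  }
}
```

Calling ForeachRefactor.visitOption(None) simply does nothing in the loop body, which is the behaviour the if (opt.nonEmpty) version has to spell out by hand.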
foreach does make sense, if you think of Option as being like a List, but with a maximum of one element.
A neater style, IMO, is to use a for-comprehension:
for (o <- opt) {
something(o)
}
foreach makes sense if you consider Option to be a list that can contain at most a single value. This also leads to a correct intuition about many other methods that are available to Option.
I can think of at least one important reason you might want to prefer foreach in this case: it removes possible run-time errors. With the nonEmpty approach, you'll at one point have to do a get*, which can crash your program spectacularly if you by accident forget to check for emptiness one time.
If you completely erase get from your mind to avoid bugs of that kind, a side effect is that you also have less use for nonEmpty! And you'll start to enjoy foreach and let the compiler take care of what should happen if the Option happens to be empty.
You'll see this concept cropping up in other places. You would never do
if (age.nonEmpty)
Some(age.get >= 18)
else
None
What you'll rather see is
age.map(_ >= 18)
The principle is that you want to avoid having to write code that handles the failure case – you want to offload that burden to the compiler. Many will tell you that you should never use get, however careful you think you are about pre-checking. So that makes things easier.
* Unless you actually just want to know whether or not the Option contains a value and don't really care for the value, in which case nonEmpty is just fine. In that case it serves as a sort of toBoolean.
Things I didn't find in the other answers:
When using if, I prefer if (opt isDefined) to if (opt nonEmpty) as the former is less collection-like and makes it more clear we're dealing with an option, but that may be a matter of taste.
if and foreach are different in the sense that if is an expression that will return a value while foreach returns Unit. So in a certain way using foreach is even more java-like than if, as java has a foreach loop and java's if is not an expression. Using a for comprehension is more scala-like.
You can also use pattern matching (but this is also less idiomatic)
You can use the fold method, which takes two functions so you can evaluate one expression in the Some case and another in the None case. You may need to explicitly specify the expected type because of how type inference works (as shown here). So in my opinion it may sometimes still be clearer to use either pattern matching or val result = if (opt isDefined) expression1 else expression2.
If you don't need a return value and really have no need to handle the not-defined case, you can use foreach.
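The alternatives listed above can be put side by side in a small sketch. The names (OptionAlternatives, viaMatch, viaFold, viaIf) are invented for this example. Note that in the standard library Option.fold takes the result for the empty case in its first parameter list and a function for the defined case in its second.

```scala
// Hypothetical names, for illustration only.
object OptionAlternatives {
  // Pattern matching: both cases are handled explicitly.
  def viaMatch(opt: Option[Int]): String = opt match {
    case Some(n) => s"got $n"
    case None    => "nothing"
  }

  // fold: first the result for None, then a function applied to the value in Some.
  def viaFold(opt: Option[Int]): String =
    opt.fold("nothing")(n => s"got $n")

  // if as an expression: works, but relies on calling get after the check.
  def viaIf(opt: Option[Int]): String =
    if (opt.isDefined) s"got ${opt.get}" else "nothing"
}
```

All three return the same result for the same input; which one reads best is largely a matter of taste, as the answer says.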

Why doesn't Array's == function return true for Array(1,2) == Array(1,2)?

In Programming in Scala the authors write that Scala's == function compares value equality instead of reference equality.
This works as expected on lists:
scala> List(1,2) == List(1,2)
res0: Boolean = true
It doesn't however work on arrays:
scala> Array(1,2) == Array(1,2)
res1: Boolean = false
The authors recommend using the sameElements function instead:
scala> Array(1,2).sameElements(Array(1,2))
res2: Boolean = true
As an explanation they write:
While this may seem like an inconsistency, encouraging an explicit test of the equality of two mutable data structures is a conservative approach on the part of the language designers. In the long run, it should save you from unexpected results in your conditionals.
What does this mean? What kind of unexpected results are they talking about? What else could I expect from an array comparison than to return true if the arrays contain the same elements in the same position? Why does the equals function work on List but not on Array?
How can I make the equals function work on arrays?
It is true that the explanation offered in the book is questionable, but to be fair it was more believable when they wrote it. It's still true in 2.8, but we have to retrofit different reasoning because as you've noticed, all the other collections do element comparisons even if they're mutable.
A lot of blood had been shed trying to make Arrays seem like the rest of the collections, but this was a tremendously leaky abstraction and in the end it was impossible. It was determined, correctly I think, that we should go to the other extreme and supply native arrays the way they are, using implicit machinery to enhance their capabilities. Where this most noticeably falls down is toString and equals, because neither of them behaves in a reasonable fashion on Arrays, but we cannot intercept those calls with implicit conversions because they are defined on java.lang.Object. (Conversions only happen when an expression doesn't type check, and those always type check.)
So you can pick your explanation, but in the end arrays are treated fundamentally differently by the underlying architecture and there's no way to paper over that without paying a price somewhere. It's not a terrible situation, but it is something you have to be aware of.
This exact question has been voiced many times (by myself too, see Strange behaviour of the Array type).
Note that it is ONLY the Array collection that does not support ==, all other collections do. The root cause is that Array IS the Java array.
It's all about referential transparency. The idea is, if two values are ==, it shouldn't matter which one you use for something. If you have two arrays with the same contents, it clearly matters which one you modify, so == returns false unless they are the same one.
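For the practical side of the question, here is a small sketch of a few ways to compare arrays by content, plus the mutation case that the referential transparency argument is about. The object and member names (ArrayEquality, mutateAndCompare, and so on) are made up for this example; sameElements, toSeq and java.util.Arrays.equals are standard library calls.

```scala
// Hypothetical names, for illustration only.
object ArrayEquality {
  val a: Array[Int] = Array(1, 2)
  val b: Array[Int] = Array(1, 2)

  // == on arrays compares references, because Array is the Java array.
  val sameReference: Boolean = a == b

  // Element-wise comparisons.
  val sameContents: Boolean = a.sameElements(b)
  val viaSeq: Boolean = a.toSeq == b.toSeq
  val viaJava: Boolean = java.util.Arrays.equals(a, b)

  // Two arrays with equal contents stop being interchangeable once one is modified.
  def mutateAndCompare(): (Boolean, Boolean) = {
    val x = Array(1, 2)
    val y = Array(1, 2)
    val before = x.sameElements(y)
    y(0) = 99
    (before, x.sameElements(y))
  }
}
```

Here sameReference is false while the three content comparisons are true, and mutateAndCompare() returns (true, false), which shows why it matters which of the two arrays you hold on to.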