Why doesn't Array's == function return true for Array(1,2) == Array(1,2)?

In Programming in Scala the authors write that Scala's == function compares value equality instead of reference equality.
This works as expected on lists:
scala> List(1,2) == List(1,2)
res0: Boolean = true
It doesn't however work on arrays:
scala> Array(1,2) == Array(1,2)
res1: Boolean = false
The authors recommend using the sameElements function instead:
scala> Array(1,2).sameElements(Array(1,2))
res2: Boolean = true
As an explanation they write:
While this may seem like an inconsistency, encouraging an explicit test of the equality of two mutable data structures is a conservative approach on the part of the language designers. In the long run, it should save you from unexpected results in your conditionals.
What does this mean? What kind of unexpected results are they talking about? What else could I expect from an array comparison than to return true if the arrays contain the same elements in the same position? Why does the equals function work on List but not on Array?
How can I make the equals function work on arrays?

It is true that the explanation offered in the book is questionable, but to be fair it was more believable when they wrote it. It's still true in 2.8, but we have to retrofit different reasoning because as you've noticed, all the other collections do element comparisons even if they're mutable.
A lot of blood had been shed trying to make Arrays seem like the rest of the collections, but this was a tremendously leaky abstraction and in the end it was impossible. It was determined, correctly I think, that we should go to the other extreme and supply native arrays the way they are, using implicit machinery to enhance their capabilities. Where this most noticeably falls down is toString and equals, because neither of them behaves in a reasonable fashion on Arrays, but we cannot intercept those calls with implicit conversions because they are defined on java.lang.Object. (Conversions only happen when an expression doesn't type check, and those always type check.)
So you can pick your explanation, but in the end arrays are treated fundamentally differently by the underlying architecture and there's no way to paper over that without paying a price somewhere. It's not a terrible situation, but it is something you have to be aware of.
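As a minimal sketch of the explicit comparisons that are available (the object and value names are illustrative only), here are a few ways to compare two arrays element by element instead of relying on ==:

```scala
object ArrayEquality {
  val a: Array[Int] = Array(1, 2)
  val b: Array[Int] = Array(1, 2)

  // == on arrays is java.lang.Object#equals, i.e. reference equality.
  val byReference: Boolean = a == b

  // Element-wise comparison provided through the implicit array enrichment.
  val bySameElements: Boolean = a.sameElements(b)

  // The plain Java helper works too, because a Scala Array is a Java array.
  val byJavaUtil: Boolean = java.util.Arrays.equals(a, b)

  // Converting to a Seq gives a collection whose == compares elements.
  val bySeq: Boolean = a.toSeq == b.toSeq
}
```

Note that sameElements compares the elements themselves with ==, so for arrays of arrays the inner arrays are again compared by reference; java.util.Arrays.deepEquals is one option for nested arrays.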

This exact question has been voiced many times (by myself too, see Strange behaviour of the Array type).
Note that it is ONLY the Array collection that does not support ==, all other collections do. The root cause is that Array IS the Java array.

It's all about referential transparency. The idea is, if two values are ==, it shouldn't matter which one you use for something. If you have two arrays with the same contents, it clearly matters which one you modify, so == returns false unless they are the same one.
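A small sketch of that argument, with made-up names: two arrays that start with the same contents stop being interchangeable as soon as one of them is modified, which is why treating them as == would be misleading.

```scala
object MutableIdentity {
  val xs: Array[Int] = Array(1, 2)
  val ys: Array[Int] = Array(1, 2)

  // Same contents right now...
  val sameBefore: Boolean = xs.sameElements(ys)

  // ...but modifying one does not affect the other,
  // so it matters which of the two you hold.
  xs(0) = 99
  val sameAfter: Boolean = xs.sameElements(ys)

  // An array is always == to itself, whatever its contents.
  val selfEqual: Boolean = xs == xs
}
```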

Related

Does an expression exist such that <<expression>> == <<expression>> is always false?

I'm not an expert on Haskell. And this question is not exactly a Haskell question, but I know Haskell people would have a better understanding of what I'm trying to achieve.
So I'm building a dynamic language and I want it to be pure... Totally pure. (With support for IO effects, and I already know why, but that's not part of this question)
Also, I would like it to have some form of polymorphism, so I'm toying with the idea of adding class support.
(Also, everything in the language is supposed to be an expression, so yep, no statements)
While exploring the idea I ended up realizing that in order for it to be referentially transparent, class expressions should be able to be substituted too.
The thing with class expressions is that one of its main functionalities is to check whether some value is instance of it.
So
val Person = class {...}
val person1 = Person(blabla)
Person.instantiated(person1) // returns true
// Equivalent to
class {...}.instantiated(class {...}(blabla))
Yet! That last part makes no sense... It feels wrong, like I created two different classes
So!
Is there an expression such that
val expr = <<expression>>
expr == expr // true
But <<expression>> == <<expression>> is false?
In a pure language?
I think that what I'm asking is equivalent to asking if the newtype Haskell statement could become an expression
The way you've worded your question, you're likely to get at least a few answers that talk about peculiarities of the == operator (and, as I write this, you've already gotten one comment to that effect). But, that's not what you're asking, so forget about ==. Go back to your class example.
Referential transparency implies that after:
val Person = class {<PERSONCLASSDEFN>}
val person1 = Person(<PERSONARGS>)
the two expressions:
Person.instantiated(person1)
and:
(class {<PERSONCLASSDEFN>}).instantiated((class {<PERSONCLASSDEFN>})(<PERSONARGS>))
should be indistinguishable. That is, a program's meaning should not change if one is substituted for the other and vice versa.
Therefore, the identity of classes must depend only on their definition (the part in the curly braces), not where or how many times they are (re)defined or the names they are given.
As a simpler example, you should also consider the implications of:
val Person = class {<CLASSDEFN>}
val Automobile = class {<CLASSDEFN>}
val person = Person(<ARGS>)
val automobile = Automobile(<ARGS>)
after which, the two objects person and automobile should be indistinguishable.
I find it difficult to see what this question actually is about, but maybe the problem is that you're talking about equalities when you actually mean equivalence relations?
Two objects that are an instance of the same class are typically not equal, and correspondingly == will yield False. Yet they are equivalent in the sense of being instances of the same class. They're members of the same equivalence class (mathematical term; the usage of the word “class” in both OO and Haskell descends from this).
You can just have that equivalence class as a different operator. Like, in Python
def sameclassinstances(a, b):
    return (type(a) is type(b))
Depending on your language's syntax that could of course also be a custom infix operator like
infix 4 ~=.
A separate issue is that equality itself can be interpreted as either value equality (always in Haskell), or some form of implementation equality or reference equality, which is fairly common in other languages. But if you want your language to be pure, you should probably stay away from the latter, or give it a telling name like GHC's reallyUnsafePtrEquality#.

What is the most efficient operator to compare any two items?

Frequently I need to convert data from one type to another and then compare them. Some operators will convert to specific types first and this conversion may cause loss of efficiency. For instance, I may have
my ($a, $b) = 0, "foo"; # initial value
$a = (3,4,5).Set; # re-assign value
$b = open "dataFile"; # re-assign value
if $a eq $b { say "okay"; } # convert to string
if $a == 5 { say "yes"; } # convert to number
if $a == $b {} # error, Cannot resolve caller Numeric(IO::Handle:D: );
The operators "eq" and "==" will convert data to the digestible types first and may slow things down. Will the operators "eqv" and "===" skip converting data types and be more efficient if data to be compared cannot be known in advance (i.e., you absolutely have no clue what you are going to get in advance)?
It's not quite clear to me from the question if you actually want the conversions to happen or not.
Operators like == and eq are really calls to multi subs with names like infix:<==>, and there are many candidates. For example, there's one for (Int, Int), which is selected if we're comparing two Ints. In that case, it knows that it doesn't need to coerce, and will just do the integer comparison.
The eqv and === operators will not coerce; the first thing they do is to check that the values have the same type, and if they don't, they go no further. Make sure to use the correct one depending on whether you want snapshot semantics (eqv) or reference semantics (===). Note that the types really must be exactly the same, so 1e0 === 1 will not come out true because one value is a Num and the other an Int.
The auto-coercion behavior of operators like == and eq can be really handy, but from a performance angle it can also be a trap. They coerce, use the result of the coercion for the comparison, and then throw it away. Repeatedly doing comparisons can thus repeatedly trigger coercions. If you have that situation, it makes sense to split the work into two phases: first "parse" the incoming data into the appropriate data types, and then go ahead and do the comparisons.
Finally, in any discussion on efficiency, it's worth noting that the runtime optimizer is good at lifting out duplicate type checks. Thus while in principle, if you read the built-ins source, == might seem cheaper in the case where it gets two things that have the same type, because it isn't enforcing that they are precisely the same type, in reality that extra check will get optimized out for === anyway.
Both === and eqv first check whether the operands are of the same type, and will return False if they are not. So at that stage, there is no real difference between them.
The a === b operator is really short for a.WHICH eq b.WHICH. So it would call the .WHICH method on the operands, which could be expensive if an operand is something like a really large Buf.
The a eqv b operator is more complicated in that it has special cased many object comparisons, so in general you cannot say much about it.
In other words: YMMV. And if you're really interested in performance, benchmark! And be prepared to adapt your code if another way of solving the problem proves more performant.

Convert tuple to array in Scala

What is the best way to convert a tuple into an array in Scala? Here "best" means in as few lines of code as possible. I was shocked to search Google and StackOverflow only to find nothing on this topic, which seems like it should be trivial and common. Lists have a toArray function; why don't tuples?
Use productIterator, immediately followed by toArray:
(42, 3.14, "hello", true).productIterator.toArray
gives:
res0: Array[Any] = Array(42, 3.14, hello, true)
The type of the result shows the main reason why it's rarely used: in tuples, the types of the elements can be heterogeneous, in arrays they must be homogeneous, so that often too much type information is lost during this conversion. If you want to do this, then you probably shouldn't have stored your information in tuples in the first place.
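To make the loss of type information concrete, here is a hedged sketch (the object and value names are hypothetical). It contrasts the Array[Any] produced by productIterator with two ways of keeping Array[Int] when the tuple happens to be homogeneous:

```scala
object TupleToArray {
  // Heterogeneous tuple: the only common element type is Any.
  val mixed: Array[Any] = (42, 3.14, "hello", true).productIterator.toArray

  // Homogeneous tuple: spelling out the fields keeps the element type.
  val t: (Int, Int, Int) = (1, 2, 3)
  val ints: Array[Int] = Array(t._1, t._2, t._3)

  // Alternatively, filter by type at runtime; non-Int elements are dropped.
  val collected: Array[Int] = t.productIterator.collect { case i: Int => i }.toArray
}
```

The second and third forms only make sense when you already know the element types, which is usually a hint that a tuple was not the right container to begin with.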
There is almost nothing you can (safely) do with an Array[Any], except printing it out or converting it to an even more degenerate Set[Any]. Any of the following would be a better alternative:
Lists of case classes belonging to a common sealed trait,
shapeless HLists,
a carefully chosen base class with a bit of inheritance,
or something that at least keeps some kind of schema at runtime (like Apache Spark Datasets).
In the somewhat less likely case that the elements of the "tuples" that you are processing frequently turn out to have an informative least upper bound type, it might be because you aren't working with plain tuples, but with some kind of traversable data structure that puts restrictions on the number of substructures in the nodes. In this case, you should consider implementing something like a Traverse interface for the structure, instead of messing with some "tuples" manually.

Behavior of scala fold in parallel collections

Let's run the following line of code several times:
Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
The results are quite interesting:
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res10: Int = 8
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res11: Int = 20
However, it should clearly match the sequential version:
scala> Set(1,2,3,4,5,6,7).fold(0)(_ - _)
res15: Int = -28
I understand that operation - is non-associative on integers and that's the reason behind such behavior, but my question is quite simple: doesn't it mean that fold should not be parallelized in .par implementation of collections?
When you look at the standard library documentation, you see that fold is nondeterministic here:
Folds the elements of this sequence using the specified associative binary operator.
The order in which operations are performed on elements is unspecified and may be nondeterministic.
As an alternative, there's foldLeft:
Applies a binary operator to a start value and all elements of this sequence, going left to right.
Applies a binary operator to a start value and all elements of this collection or iterator, going left to right.
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.
As Set is not an ordered collection, there's no canonical order in which the elements could be folded, so the standard library allows itself to be nondeterministic even for foldLeft. If you used an ordered sequence here, foldLeft would be deterministic.
The scaladoc does say:
The order in which the elements are reduced is unspecified and may be nondeterministic.
So, as you stated, a binary operation applied in ParSet#fold that is not associative is not guaranteed to produce a deterministic result. The above warning is all you get.
Does that mean ParSet#fold (and cousins) should not be parallelized? Not exactly. If your binary operation is associative and commutative and you don't care about non-determinism of side-effects (not that a fold should have any), then there isn't a problem. However, you are hit with the caveat of needing to tread carefully around parallel collections.
Whether or not it is correct is more of a matter of opinion. One could argue that if a method can result in accidental non-determinism, that it should not exist in a language or library. But the alternative is to clip out functionality so that ParSet is missing functionality that is present in most of the other collection implementations. You could use the same line of thinking to also suggest the removal of Stream#foreach to prevent people from accidentally triggering infinite loops on infinite streams, but should you?
It is useful to parallelize the fold operation under high workloads. However, to guarantee a deterministic output from calling collection.par.fold(z)(f), the following conditions must hold:
1- f(f(a,b),c) == f(a,f(b,c)) // Associativity
2- f(z,a) == f(a,z) == a , where z is the neutral element for f (like 0 for sum, and 1 for multiplication).
Fabian's answer suggests using foldLeft instead. Although this is deterministic, using .par with it won't really parallelize anything, because foldLeft is sequential by nature.
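The effect of a non-associative operator can be reproduced without the parallel collections module. The sketch below is only a simulation under stated assumptions: chunkedFold is a hypothetical helper, not a library method, and it mimics one possible parallel schedule by folding fixed-size chunks separately (each starting from z) and then combining the partial results with the same operator.

```scala
object FoldSketch {
  // Hypothetical helper: fold each chunk from z, then combine the partial results.
  def chunkedFold(xs: Seq[Int], chunkSize: Int)(z: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(z)(op))
      .reduceOption(op)
      .getOrElse(z)

  val xs: Seq[Int] = 1 to 7

  // Sequential left fold with subtraction.
  val sequentialMinus: Int = xs.foldLeft(0)(_ - _)

  // Different chunkings give different answers for subtraction...
  val chunkedMinus3: Int = chunkedFold(xs, 3)(0)(_ - _)
  val chunkedMinus4: Int = chunkedFold(xs, 4)(0)(_ - _)

  // ...but agree for addition, which is associative with neutral element 0.
  val chunkedPlus3: Int = chunkedFold(xs, 3)(0)(_ + _)
  val chunkedPlus4: Int = chunkedFold(xs, 4)(0)(_ + _)
}
```

With subtraction the result depends on where the chunk boundaries fall, while addition gives the same sum for every chunking, which is what the two conditions above are meant to guarantee.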

Scala: Why does Seq.contains take an Any argument, instead of an argument of the sequence type?

So for example why does List(1,2,3,4).contains("wtf") even compile? Wouldn't it be nice if the compiler rejected this?
Lots of interesting answers, but here's my own theory: if contains did not receive an Any, then Seq could not be co-variant.
See, for instance, Set, which is not co-variant and whose contains takes an A instead of an Any.
The reason for that is left as an exercise to the reader. ;-) But here is a hint:
scala> class Container[+A](elements: A*) {
| def contains(what: A): Boolean = elements exists (what ==)
| }
<console>:7: error: covariant type A occurs in contravariant position in type A of value what
def contains(what: A): Boolean = elements exists (what ==)
^
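One way to keep the container covariant and still compile is to give contains its own type parameter with a lower bound. This is a sketch of the general technique, with hypothetical names, and is not presented as the library's definition:

```scala
// B >: A places the parameter in a position the variance checker accepts.
class Container[+A](elements: A*) {
  def contains[B >: A](what: B): Boolean = elements.exists(_ == what)
}

object ContainerDemo {
  val numbers = new Container(1, 2, 3)

  // B is inferred as Int here.
  val found: Boolean = numbers.contains(2)

  // B is inferred as Any, so this still compiles and simply returns false.
  val notFound: Boolean = numbers.contains("wtf")
}
```

The lower bound satisfies the variance checker, but because B can always widen to Any, the compiler still accepts arguments of unrelated types, much like a contains that takes Any.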
"contains" is fundamentally about equality testing, and equality in Scala (as in Java before it) is untyped. The practical value of having untyped-equality is small, but not zero. There are, for instance, a few occasions where it makes sense for two objects of different classes to be equal to one another. For instance, you might wish an object of type RGBColor to be equal to a PantoneColor if they define the same hue, or an immutable HashSet and an immutable TreeSet to be equal if they contain the same elements. That said, untyped-equality also causes a bunch of headaches, and the fact that the compiler could easily catch that List(1,2,3,4).contains("wtf") is nonsensical but won't is one of them.
Most Java bug-finding tools include tests to detect the presence of improbable untyped-equality uses. (I wrote the inspections to do this in IntelliJ IDEA.) I have no doubt that when Scala bug-finding tools come online, these will be among the first bugs detected.
SeqLike.contains checks whether a value is present by checking for an element in the sequence that is equal to the value (using ==). == takes an Any so I suspect that this is the reason.