Scala iterable set and sameElements [duplicate] - scala

Whilst working through the Scala exercises on Iterables, I encountered the following strange behaviour:
val xs = Set(5,4,3,2,1)
val ys = Set(1,2,3,4,5)
xs sameElements ys // true
val xs = Set(3,2,1)
val ys = Set(1,2,3)
xs sameElements ys // false - WAT?!
Surely these Sets have the same elements, and should ignore ordering; and why does this work as expected only for the larger set?

The Scala collections library provides specialised implementations for Sets of fewer than 5 values (see the source). The iterators for these implementations return elements in the order in which they were added, rather than the consistent, hash-based ordering used for larger Sets.
Furthermore, sameElements (scaladoc) is defined on Iterables (it is implemented in IterableLike - see the source); it returns true only if the iterators return the same elements in the same order.
So although Set(1,2,3) and Set(3,2,1) ought to be equivalent, their iterators are different, therefore sameElements returns false.
This behaviour is surprising, and arguably a bug since it violates the mathematical expectations for a Set (but only for certain sizes of Set!).
As I.K. points out in the comments, == works fine if you are just comparing Sets with one another, i.e. Set(1,2,3) == Set(3,2,1). However, sameElements is more general in that it can compare the elements of any two iterables. For example, List(1, 2, 3) == Array(1, 2, 3) is false, but List(1, 2, 3) sameElements Array(1, 2, 3) is true.
More generally, equality can be confusing - note that:
List(1,2,3) == Vector(1,2,3)
List(1,2,3) != Set(1,2,3)
List(1,2,3) != Array(1,2,3)
Array(1,2,3) != Array(1,2,3)
I have submitted a fix for the Scala exercises that explains the sameElements problem.

Related

Scala Semantic Equality [duplicate]

scala> List(1,2,3) == List(1,2,3)
res2: Boolean = true
scala> Map(1 -> "Olle") == Map(1 -> "Olle")
res3: Boolean = true
But when trying to do the same with Array, it does not work the same. Why?
scala> Array('a','b') == Array('a','b')
res4: Boolean = false
I have used 2.8.0.RC7 and 2.8.0.Beta1-prerelease.
Because the definition of "equals" for Arrays is that they refer to the same array.
This is consistent with Java's array equality, using Object.Equals, so it compares references.
If you want to check pairwise elements, then use sameElements
Array('a','b').sameElements(Array('a','b'))
or deepEquals, which has been deprecated in 2.8, so instead use:
Array('a','b').deep.equals(Array('a','b').deep)
There's a good Nabble discussion on array equality.
The root cause the it is that fact that Scala use the same Array implementation as Java, and that's the only collection that not support == as equality operator.
Also, it's important to note that the chosen answer suggest equally sameElements and deep comparison when actually it's preferred to use:
Array('a','b').deep.equals(Array('a','b').deep)
Or, because now we can use == back again:
Array('a','b').deep == Array('a','b').deep
Instead of:
Array('a','b').sameElements(Array('a','b'))
Because sameElements won't for for nested array, it's not recursive. And deep comparison will.

Lists in Scala - plus colon vs double colon (+: vs ::)

I am little bit confused about +: and :: operators that are available.
It looks like both of them gives the same results.
scala> List(1,2,3)
res0: List[Int] = List(1, 2, 3)
scala> 0 +: res0
res1: List[Int] = List(0, 1, 2, 3)
scala> 0 :: res0
res2: List[Int] = List(0, 1, 2, 3)
For my novice eye source code for both methods looks similar (plus-colon method has additional condition on generics with use of builder factories).
Which one of these methods should be used and when?
+: works with any kind of collection, while :: is specific implementation for List.
If you look at the source for +: closely, you will notice that it actually calls :: when the expected return type is List. That is because :: is implemented more efficiently for the List case: it simply connects the new head to the existing list and returns the result, which is a constant-time operation, as opposed to linear copying the entire collection in the generic case of +:.
+: on the other hand, takes CanBuildFrom, so you can do fancy (albeit, not looking as nicely in this case) things like:
val foo: Array[String] = List("foo").+:("bar")(breakOut)
(It's pretty useless in this particular case, as you could start with the needed type to begin with, but the idea is you can prepend and element to a collection, and change its type in one "go", avoiding an additional copy).

Scala Sets contain the same elements, but sameElements() returns false

Whilst working through the Scala exercises on Iterables, I encountered the following strange behaviour:
val xs = Set(5,4,3,2,1)
val ys = Set(1,2,3,4,5)
xs sameElements ys // true
val xs = Set(3,2,1)
val ys = Set(1,2,3)
xs sameElements ys // false - WAT?!
Surely these Sets have the same elements, and should ignore ordering; and why does this work as expected only for the larger set?
The Scala collections library provides specialised implementations for Sets of fewer than 5 values (see the source). The iterators for these implementations return elements in the order in which they were added, rather than the consistent, hash-based ordering used for larger Sets.
Furthermore, sameElements (scaladoc) is defined on Iterables (it is implemented in IterableLike - see the source); it returns true only if the iterators return the same elements in the same order.
So although Set(1,2,3) and Set(3,2,1) ought to be equivalent, their iterators are different, therefore sameElements returns false.
This behaviour is surprising, and arguably a bug since it violates the mathematical expectations for a Set (but only for certain sizes of Set!).
As I.K. points out in the comments, == works fine if you are just comparing Sets with one another, i.e. Set(1,2,3) == Set(3,2,1). However, sameElements is more general in that it can compare the elements of any two iterables. For example, List(1, 2, 3) == Array(1, 2, 3) is false, but List(1, 2, 3) sameElements Array(1, 2, 3) is true.
More generally, equality can be confusing - note that:
List(1,2,3) == Vector(1,2,3)
List(1,2,3) != Set(1,2,3)
List(1,2,3) != Array(1,2,3)
Array(1,2,3) != Array(1,2,3)
I have submitted a fix for the Scala exercises that explains the sameElements problem.

Should Scala's map() behave differently when mapping to the same type?

In the Scala Collections framework, I think there are some behaviors that are counterintuitive when using map().
We can distinguish two kinds of transformations on (immutable) collections. Those whose implementation calls newBuilder to recreate the resulting collection, and those who go though an implicit CanBuildFrom to obtain the builder.
The first category contains all transformations where the type of the contained elements does not change. They are, for example, filter, partition, drop, take, span, etc. These transformations are free to call newBuilder and to recreate the same collection type as the one they are called on, no matter how specific: filtering a List[Int] can always return a List[Int]; filtering a BitSet (or the RNA example structure described in this article on the architecture of the collections framework) can always return another BitSet (or RNA). Let's call them the filtering transformations.
The second category of transformations need CanBuildFroms to be more flexible, as the type of the contained elements may change, and as a result of this, the type of the collection itself maybe cannot be reused: a BitSet cannot contain Strings; an RNA contains only Bases. Examples of such transformations are map, flatMap, collect, scanLeft, ++, etc. Let's call them the mapping transformations.
Now here's the main issue to discuss. No matter what the static type of the collection is, all filtering transformations will return the same collection type, while the collection type returned by a mapping operation can vary depending on the static type.
scala> import collection.immutable.TreeSet
import collection.immutable.TreeSet
scala> val treeset = TreeSet(1,2,3,4,5) // static type == dynamic type
treeset: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2, 3, 4, 5)
scala> val set: Set[Int] = TreeSet(1,2,3,4,5) // static type != dynamic type
set: Set[Int] = TreeSet(1, 2, 3, 4, 5)
scala> treeset.filter(_ % 2 == 0)
res0: scala.collection.immutable.TreeSet[Int] = TreeSet(2, 4) // fine, a TreeSet again
scala> set.filter(_ % 2 == 0)
res1: scala.collection.immutable.Set[Int] = TreeSet(2, 4) // fine
scala> treeset.map(_ + 1)
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 3, 4, 5, 6) // still fine
scala> set.map(_ + 1)
res3: scala.collection.immutable.Set[Int] = Set(4, 5, 6, 2, 3) // uh?!
Now, I understand why this works like this. It is explained there and there. In short: the implicit CanBuildFrom is inserted based on the static type, and, depending on the implementation of its def apply(from: Coll) method, may or may not be able to recreate the same collection type.
Now my only point is, when we know that we are using a mapping operation yielding a collection with the same element type (which the compiler can statically determine), we could mimic the way the filtering transformations work and use the collection's native builder. We can reuse BitSet when mapping to Ints, create a new TreeSet with the same ordering, etc.
Then we would avoid cases where
for (i <- set) {
val x = i + 1
println(x)
}
does not print the incremented elements of the TreeSet in the same order as
for (i <- set; x = i + 1)
println(x)
So:
Do you think this would be a good idea to change the behavior of the mapping transformations as described?
What are the inevitable caveats I have grossly overlooked?
How could it be implemented?
I was thinking about something like an implicit sameTypeEvidence: A =:= B parameter, maybe with a default value of null (or rather an implicit canReuseCalleeBuilderEvidence: B <:< A = null), which could be used at runtime to give more information to the CanBuildFrom, which in turn could be used to determine the type of builder to return.
I looked again at it, and I think your problem doesn't arise from a particular deficiency of Scala collections, but rather a missing builder for TreeSet. Because the following does work as intended:
val list = List(1,2,3,4,5)
val seq1: Seq[Int] = list
seq1.map( _ + 1 ) // yields List
val vector = Vector(1,2,3,4,5)
val seq2: Seq[Int] = vector
seq2.map( _ + 1 ) // yields Vector
So the reason is that TreeSet is missing a specialised companion object/builder:
seq1.companion.newBuilder[Int] // ListBuffer
seq2.companion.newBuilder[Int] // VectorBuilder
treeset.companion.newBuilder[Int] // Set (oops!)
So my guess is, if you take proper provision for such a companion for your RNA class, you may find that both map and filter work as you wish...?

How to do something like this in Scala?

Sorry for the lack of a descriptive title; I couldn't think of anything better. Edit it if you think of one.
Let's say I have two Lists of Objects, and they are always changing. They need to remain as separate lists, but many operations have to be done on both of them. This leads me to doing stuff like:
//assuming A and B are the lists
A.foo(params)
B.foo(params)
In other words, I'm doing the exact same operation to two different lists at many places in my code. I would like a way to reduce them down to one list without explicitly having to construct another list. I know that just combining lists A and b into a list C would solve all my problems, but then we'd just be back to the same operation if I needed to add a new object to the list (because I'd have to add it to C as well as its respective list).
It's in a tight loop and performance is very important. Is there any way to construct an iterator or something that would iterate A and then move on to B, all transparently? I know another solution would be to construct the combined list (C) every time I'd like to perform some kind of function on both of these lists, but that is a huge waste of time (computationally speaking).
Iterator is what you need here. Turning a List into an Iterator and concatenating 2 Iterators are both O(1) operations.
scala> val l1 = List(1, 2, 3)
l1: List[Int] = List(1, 2, 3)
scala> val l2 = List(4, 5, 6)
l2: List[Int] = List(4, 5, 6)
scala> (l1.iterator ++ l2.iterator) foreach (println(_)) // use List.elements for Scala 2.7.*
1
2
3
4
5
6
I'm not sure if I understand what's your meaning.
Anyway, this is my solution:
scala> var listA :List[Int] = Nil
listA: List[Int] = List()
scala> var listB :List[Int] = Nil
listB: List[Int] = List()
scala> def dealWith(op : List[Int] => Unit){ op(listA); op(listB) }
dealWith: ((List[Int]) => Unit)Unit
and then if you want perform a operator in both listA and listB,you can use like following:
scala> listA ::= 1
scala> listB ::= 0
scala> dealWith{ _ foreach println }
1
0