It appears that MultiMap's addBinding does not preserve the insertion order of values bound to the same key, because the underlying collection it uses is a HashSet. What would be an idiomatic way to preserve insertion order with a MultiMap?
Based on the MultiMap source, where the implementation states:
/** Creates a new set.
 *
 *  Classes that use this trait as a mixin can override this method
 *  to have the desired implementation of sets assigned to new keys.
 *  By default this is `HashSet`.
 *
 *  @return An empty set of values of type `B`.
 */
protected def makeSet: Set[B] = new HashSet[B]
You can simply define:
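import scala.collection.mutable.{LinkedHashSet, MultiMap, Set}

trait OrderedMultimap[A, B] extends MultiMap[A, B] {
  // LinkedHashSet iterates in insertion order, unlike the default HashSet
  override def makeSet: Set[B] = new LinkedHashSet[B]
}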
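A possible usage sketch of the trait above. The object name OrderedMultimapDemo and the sample key and values are made up for illustration, and the trait is repeated so the snippet compiles on its own. Note that MultiMap is deprecated as of Scala 2.13, so expect a deprecation warning there.

```scala
import scala.collection.mutable

object OrderedMultimapDemo {
  // Same idea as above: back each key with an insertion-ordered set
  trait OrderedMultimap[A, B] extends mutable.MultiMap[A, B] {
    override protected def makeSet: mutable.Set[B] = new mutable.LinkedHashSet[B]
  }

  val mm = new mutable.HashMap[String, mutable.Set[Int]] with OrderedMultimap[String, Int]
  mm.addBinding("k", 3)
  mm.addBinding("k", 1)
  mm.addBinding("k", 2)
  mm.addBinding("k", 3) // already present: a set keeps it only once, at its first position

  // Values come back in the order they were first bound
  val valuesInOrder: List[Int] = mm("k").toList
}
```

Because the values are still a Set, repeated values for the same key are dropped; only the order of first insertion is kept.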
One way would probably be to fall back to a regular Map (not a MultiMap) whose value type is a collection in which order can be enforced (i.e. not a Set). As I understand it, to preserve insertion order in the wider sense that also allows repeated elements, the natural Scala collection to use would be a Seq implementation (e.g. Vector or Queue, depending on the access patterns).
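A minimal sketch of that idea, assuming an immutable Map with Vector values. The helper addBinding and the sample data are hypothetical names chosen for this example, not part of any library.

```scala
object SeqValuedMapDemo {
  // Hypothetical helper: appends `value` to the Vector stored under `key`
  def addBinding[K, V](m: Map[K, Vector[V]], key: K, value: V): Map[K, Vector[V]] =
    m.updated(key, m.getOrElse(key, Vector.empty[V]) :+ value)

  val bindings = List("a" -> 3, "a" -> 1, "b" -> 7, "a" -> 3)

  // Fold the bindings in one by one; each key's Vector keeps order and repeats
  val grouped: Map[String, Vector[Int]] =
    bindings.foldLeft(Map.empty[String, Vector[Int]]) {
      case (acc, (k, v)) => addBinding(acc, k, v)
    }
}
```

Unlike the LinkedHashSet variant, the repeated value 3 under key "a" is kept here.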
What exactly happens when you evaluate the expression: Seq(1,2,3)?
I am new to Scala and I am now a bit confused about the various collection types. Seq is a trait, right? So when you call it like this: Seq(1,2,3), it must be some kind of a companion object? Or not? Is it some kind of a class that extends Seq? And most importantly, what is the type of the returned value? Is it Seq and if yes, why is it not explicitly the extension class instead?
Also in the REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int]. Why is it not an IndexedSeq collection type, like Vector? What is the logic behind all that?
What exactly happens when you evaluate expression: Seq(1,2,3)?
In Scala, foo(bar) is syntactic sugar for foo.apply(bar), unless this also has a method named foo, in which case it is a method call on the implicit this receiver, i.e. just like Java, it is then equivalent to this.foo(bar).
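To illustrate the desugaring, here is a small sketch. Greeter is a hypothetical object invented for this example; the Seq lines use the standard library.

```scala
object ApplySugarDemo {
  // Hypothetical object with an apply method
  object Greeter {
    def apply(name: String): String = s"Hello, $name"
  }

  val sugared: String = Greeter("Scala")        // rewritten by the compiler to the line below
  val explicit: String = Greeter.apply("Scala")

  // The same rewriting happens for the standard library's Seq object
  val seqSugared: Seq[Int] = Seq(1, 2, 3)
  val seqExplicit: Seq[Int] = Seq.apply(1, 2, 3)
}
```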
Just like any other OO language, the receiver of a method call alone decides what to do with that call, so in this case, Seq decides what to do.
Seq is a trait, right?
There are two Seqs in the standard library:
The trait Seq, which is a type.
The object Seq, which is a value.
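The distinction can be seen in a single line of code. This is only an illustrative sketch; the names xs, factory and ys are made up.

```scala
object TypeVsValueDemo {
  // Left of `=`: Seq in type position is the trait.
  // Right of `=`: Seq in value position is the object.
  val xs: Seq[Int] = Seq(1, 2, 3)

  // The object is an ordinary value, so it can be stored and passed around
  val factory = Seq
  val ys: Seq[Int] = factory(4, 5) // still just factory.apply(4, 5)
}
```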
So when you call it like that Seq(1,2,3) it must be some kind of a companion object? Or not?
Yes, it must be an object, since you can only call methods on objects. You cannot call methods on types, therefore, when you see a method call, it must be an object. Always. So, in this case, Seq cannot possibly be the Seq trait, it must be the Seq object.
Note that "it must be some kind of a companion object" is not true. The only thing you can see from that piece of code is that Seq is an object. You cannot know from that piece of code whether it is a companion object. For that, you would have to look at the source code. In this particular case, it turns out that it is, in fact, a companion object, but you cannot conclude that from the code you showed.
Is it some kind of a class that extends Seq?
No. It cannot possibly be a class, since you can only call methods on objects, and classes are not objects in Scala. (This is not like Ruby or Smalltalk, where classes are also objects and instances of the Class class.) It must be an object.
And most importantly what is the type of the returned value?
The easiest way to find that out is to simply look at the documentation for Seq.apply:
def apply[A](elems: A*): Seq[A]
Creates a collection with the specified elements.
A: the type of the collection's elements
elems: the elements of the created collection
returns a new collection with elements elems
So, as you can see, the return type of Seq.apply is Seq, or more precisely, Seq[A], where A is a type variable denoting the type of the elements of the collection.
Is it Seq and if yes, why is not explicitly the extension class instead?
Because there is no extension class.
Also, the standard design pattern in Scala is that the apply method of a companion object returns an instance of the companion class or trait. It would be weird and surprising to break this convention.
Also in REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int].
The static type is Seq[Int]. That is all you need to know. That is all you can know.
Now, Seq is a trait, and traits cannot be instantiated, so the runtime type will be some subclass of Seq. But! You cannot and must not care what specific runtime type it is.
Why is not an Indexed collection type, like Vector? What is the logic behind all that?
How do you know it is not going to return a Vector the next time you call it? It wouldn't matter one bit, since the static type is Seq and thus you are only allowed to call Seq methods on it, and you are only allowed to rely on the contract of Seq, i.e. Seq's post-conditions, invariants, etc. anyway. Even if you knew it was a Vector that is returned, you wouldn't be able to do anything with this knowledge.
Thus, Seq.apply returns the simplest thing it can possibly return, and that is a List.
Seq is a val defined in the scala package object:
package object scala {
  ...
  val Seq = scala.collection.Seq
  ...
}
It points to the object scala.collection.Seq:
/** $factoryInfo
 *  The current default implementation of a $Coll is a `List`.
 *  @define coll sequence
 *  @define Coll `Seq`
 */
object Seq extends SeqFactory[Seq] {
  /** $genericCanBuildFromInfo */
  implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Seq[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]

  def newBuilder[A]: Builder[A, Seq[A]] = immutable.Seq.newBuilder[A]
}
When you write Seq(1,2,3), the apply() method is invoked from the scala.collection.generic.GenericCompanion abstract class:
/** A template class for companion objects of "regular" collection classes
 *  represent an unconstrained higher-kinded type. Typically
 *  such classes inherit from trait `GenericTraversableTemplate`.
 *  @tparam  CC   The type constructor representing the collection class.
 *  @see [[scala.collection.generic.GenericTraversableTemplate]]
 *  @author Martin Odersky
 *  @since 2.8
 *  @define coll  collection
 *  @define Coll  `CC`
 */
abstract class GenericCompanion[+CC[X] <: GenTraversable[X]] {
  ...
  /** Creates a $coll with the specified elements.
   *  @tparam A      the type of the ${coll}'s elements
   *  @param elems  the elements of the created $coll
   *  @return a new $coll with elements `elems`
   */
  def apply[A](elems: A*): CC[A] = {
    if (elems.isEmpty) empty[A]
    else {
      val b = newBuilder[A]
      b ++= elems
      b.result()
    }
  }
}
Finally, this method builds the Seq using the newBuilder defined in the code above.
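The non-empty branch of apply can be reproduced by hand, as a rough sketch. The object and value names below are made up; only Seq.newBuilder, ++= and result() come from the library.

```scala
object SeqBuilderDemo {
  val elems = List(1, 2, 3)

  // The same three steps apply performs for a non-empty argument list
  val builder = Seq.newBuilder[Int]
  builder ++= elems
  val built: Seq[Int] = builder.result()

  // For comparison: the factory call itself
  val viaApply: Seq[Int] = Seq(1, 2, 3)
}
```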
And most importantly what is the type of the returned value?
object MainClass {
  def main(args: Array[String]): Unit = {
    val isList = Seq(1,2,3).isInstanceOf[List[Int]]
    println(isList)
  }
}
prints:
true
So the runtime type is scala.collection.immutable.List.
Also in REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int].
The default implementation of Seq is List, as the code above shows.
Why is not an Indexed collection type, like Vector? What is the logic behind all that?
Because of its immutable design. List is an immutable singly linked list, which gives it a constant-time prepend operation but an O(n) append and O(n) access to the n-th element. Vector, by contrast, offers effectively constant-time indexed access and update as well as prepend and append.
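The operations mentioned above look like this in code. This is only a sketch of the API, with made-up value names; the cost notes in the comments restate the complexities described in the answer rather than measuring anything.

```scala
object ListVsVectorDemo {
  val list = List(1, 2, 3)
  val vector = Vector(1, 2, 3)

  val listPrepended = 0 :: list   // constant time: the new cell points at the old list
  val listAppended = list :+ 4    // O(n): the whole list is copied
  val listThird = list(2)         // O(n): walks the cells from the head

  val vectorPrepended = 0 +: vector        // effectively constant time
  val vectorAppended = vector :+ 4         // effectively constant time
  val vectorThird = vector(2)              // effectively constant time
  val vectorUpdated = vector.updated(1, 20) // new Vector; the original is unchanged
}
```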
To have a better understanding of how the List is designed in Scala, see https://mauricio.github.io/2013/11/25/learning-scala-by-building-scala-lists.html
Why does this comparison output true?
import scala.collection.immutable.ListSet
Set(1) == ListSet(1) // Expect false
//Output
res0: Boolean = true
And in a more general sense, how is the comparison actually done?
Since the inheritance chain Set <: GenSet <: GenSetLike is a bit lengthy, it might not be immediately obvious where to look for the code of equals, so I thought I would quote it here:
GenSetLike.scala:
/** Compares this set with another object for equality.
 *
 *  '''Note:''' This operation contains an unchecked cast: if `that`
 *        is a set, it will assume with an unchecked cast
 *        that it has the same element type as this set.
 *        Any subsequent ClassCastException is treated as a `false` result.
 *  @param that the other object
 *  @return     `true` if `that` is a set which contains the same elements
 *              as this set.
 */
override def equals(that: Any): Boolean = that match {
  case that: GenSet[_] =>
    (this eq that) ||
    (that canEqual this) &&
    (this.size == that.size) &&
    (try this subsetOf that.asInstanceOf[GenSet[A]]
     catch { case ex: ClassCastException => false })
  case _ =>
    false
}
Essentially, it checks whether the other object is also a GenSet, and if so, it performs some fail-fast checks (invoking canEqual and comparing sizes); if those pass, it checks whether this set is a subset of the other set, presumably by checking each element.
So the exact class used to represent the set at runtime is irrelevant; what matters is that the compared object is also a GenSet and has the same elements.
From Scala collections equality:
The collection libraries have a uniform approach to equality and hashing. The idea is, first, to divide collections into sets, maps, and sequences.
...
On the other hand, within the same category, collections are equal if and only if they have the same elements
In your case, both collections are considered sets and they contain the same elements, hence, they're equal.
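A few comparisons illustrate the rule quoted above. This is a sketch with made-up value names; the last pair is typed as Iterable[Int] only so that a set and a list can be compared without the compiler having any static opinion about the result.

```scala
import scala.collection.immutable.{ListSet, TreeSet}

object CollectionEqualityDemo {
  // Same category (sets), different implementations: equal if same elements
  val hashVsList: Boolean = Set(1, 2) == ListSet(2, 1)
  val treeVsHash: Boolean = TreeSet(1, 2) == Set(2, 1)

  // Same category (sequences): equal if same elements in the same order
  val listVsVector: Boolean = List(1, 2) == Vector(1, 2)

  // Different categories are never equal, even with the same elements
  val aSet: Iterable[Int] = Set(1, 2)
  val aList: Iterable[Int] = List(1, 2)
  val setVsList: Boolean = aSet == aList
}
```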
Scala 2.12.8 documentation:
This class implements immutable sets using a list-based data
structure.
So ListSet is a set too, just with a concrete (list-based) implementation.
Let's say you have the following:
case class Foo(x: Int, y: Int) extends Ordered[Foo] {
  override def compare(that: Foo): Int = x compareTo that.x
}
val mutableSet: scala.collection.mutable.SortedSet[Foo] =
  scala.collection.mutable.SortedSet(Foo(1, 2), Foo(1, 3))
I expect the result of mutableSet.size to be 2. Why? Foo(1,2) and Foo(1,3) are not equal, but they have the same ordering. So the sorted set should (IMO) be Foo(1,2), Foo(1,3), as this is the order they were created in (even the other way around would be fine; counterintuitive, but fine).
However, the result of mutableSet.size is 1 and it saves the last value i.e. Foo(1,3).
What am I missing?
The behavior is similar to Java's SortedSet collections. SortedSet uses the comparison (compare/compareTo) to define equality, so it treats the two Foo instances from your example as the same element and keeps only one.
In Scala 2.11, the implementation behind mutable SortedSet is scala.collection.mutable.TreeSet. The best way to see this is to put a breakpoint in your compare method.
TreeSet is implemented using an AVL tree data structure; you can inspect the behavior by looking at the insert method of the Node class in AVLTree.scala. It compares the result of the comparison with 0 to decide whether the element is already in the collection.
You have overridden compare so that only the first field is used in comparison. The Set uses this compare function not just to sort the items but also to determine if the items are equal for the purpose of storing them in the Set.
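One way to get both elements into the set is to make compare return 0 only for elements that should really count as the same. The sketch below contrasts the two behaviours; the class names ByXOnly and ByXThenY are hypothetical and stand in for the Foo from the question.

```scala
import scala.collection.mutable

object SortedSetOrderingDemo {
  // Compares on x only, as in the question: elements with equal x collapse
  case class ByXOnly(x: Int, y: Int) extends Ordered[ByXOnly] {
    def compare(that: ByXOnly): Int = Integer.compare(x, that.x)
  }

  // Hypothetical variant: ties on x are broken by y, so compare is 0
  // only when both fields match
  case class ByXThenY(x: Int, y: Int) extends Ordered[ByXThenY] {
    def compare(that: ByXThenY): Int = {
      val byX = Integer.compare(x, that.x)
      if (byX != 0) byX else Integer.compare(y, that.y)
    }
  }

  val collapsed = mutable.SortedSet(ByXOnly(1, 2), ByXOnly(1, 3))
  val kept = mutable.SortedSet(ByXThenY(1, 3), ByXThenY(1, 2))
}
```

With the tie-breaking comparison the set holds both elements, sorted by x and then y, regardless of insertion order.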
In a nutshell what I wish to do is take a set of Longs, arbitrarily ordered as in (7,3,9,14,123,2) and have available a series of Objects:
Set(SomeObject(7),SomeObject(3),SomeObject(9),SomeObject(14),SomeObject(123),SomeObject(2))
However I do not want the SomeObject objects initialized until I actually ask for them. I wish to be able to ask for them in arbitrary order as well: As in give me the 3rd SomeObject (by index) or give me the SomeObject that maps to the Long value of 7. All that without triggering initializations down the stack.
I understand lazy streams; however, I'm not quite sure how to connect the dots between the first Set of Longs (map will do that instantly, of course, as in map { x => SomeObject(x) }) and yet end up with a lazy Stream (in the same initial arbitrary order, please!).
One of the additional rules is that this needs to be Set based, so I never have the same Long (and its matching SomeObject) appear twice.
An additional need is to handle multiple Sets of Longs initially being mashed together while maintaining the (FIFO) order and uniqueness, but I believe that is all built into a subclass of Set to begin with.
Set doesn't provide indexed access, so you can't get the "3rd SomeObject". Also, a Set can't provide any operations without evaluating the values it contains, because these values need to be ordered (in the case of a tree-based implementation) or hashed (in the case of a HashSet), and you can't sort or hash a value that you do not know.
If creation of SomeObject is resource-consuming, maybe it is better to create a "SomeObjectHolder" class that would create the SomeObject on demand and provide hashing operations that do not require creating it.
Then you will have
Set(SomeObjectHolder(7),SomeObjectHolder(3),SomeObjectHolder(9),...
And each SomeObjectHolder will create the corresponding SomeObject for you when you need it.
Some of your requirements can be satisfied by a lazy view of an indexed sequence:
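A minimal sketch of what such a holder could look like, assuming equality and hashing are based on the Long key alone. Everything here (SomeObjectHolder's shape, the createdKeys log, the use of LinkedHashSet to keep insertion order) is an assumption made for illustration.

```scala
import scala.collection.mutable

object HolderDemo {
  // Records which keys have actually been materialised (for demonstration only)
  val createdKeys = mutable.ListBuffer.empty[Long]

  case class SomeObject(v: Long) {
    createdKeys += v // stands in for an expensive initialisation
  }

  // Hypothetical holder: hashes and compares on the key only,
  // and builds the SomeObject the first time `value` is read
  final class SomeObjectHolder(val key: Long) {
    lazy val value: SomeObject = SomeObject(key)
    override def equals(other: Any): Boolean = other match {
      case that: SomeObjectHolder => that.key == key
      case _                      => false
    }
    override def hashCode: Int = key.hashCode
  }

  // LinkedHashSet keeps insertion order and drops the repeated key 7
  val holders: mutable.LinkedHashSet[SomeObjectHolder] =
    mutable.LinkedHashSet(7L, 3L, 9L, 7L).map(k => new SomeObjectHolder(k))
}
```

Building the set triggers no SomeObject construction; reading value on a holder constructs that one object, once.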
case class SomeObject(v: Long) {
  println(s"$v created")
}
val source = Vector(0L, 1L, 2L, 3L, 4L)
val col = source.view.map(SomeObject.apply)
In this case, when you access individual elements by index, e.g. col(2), only the requested elements are evaluated. However, when you request a slice, all elements from 0 to the endpoint are evaluated.
col.slice(1, 2).toList
Prints:
0 created
1 created
This approach has several drawbacks:
when you request an element several times, it gets evaluated each time
when you request a slice, all elements from the beginning are evaluated
you can't request a mapping for an arbitrary key (only for an index)
To satisfy all your requirements, a custom class should be created:
import scala.collection.mutable

class CachedIndexedSeq[K, V](source: IndexedSeq[K], func: K => V) extends IndexedSeq[V] {
  // Remembers every value computed so far, so func runs at most once per key
  private val cache = mutable.Map[K, V]()
  def getMapping(key: K): V = cache.getOrElseUpdate(key, func(key))
  override def length: Int = source.length
  override def apply(idx: Int): V = getMapping(source(idx))
}
This class takes a source indexed sequence as an argument along with a mapping function. It lazily evaluates elements and also provides a getMapping method to lazily map an arbitrary key.
val source = Vector(0L, 1L, 2L)
val col2 = new CachedIndexedSeq[Long, SomeObject](source, SomeObject.apply)
col2.slice(1, 3).toList
col2(1)
col2(1)
col2.getMapping(1L)
Prints:
1 created
2 created
The only remaining requirement is the ability to avoid duplicates. Set doesn't combine well with requesting elements by index, so I suggest putting all your initial Longs into an indexed seq (such as Vector) and then calling distinct on it before wrapping it in CachedIndexedSeq.
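The de-duplication step could be sketched as follows, here with two made-up input batches merged in arrival order. distinct keeps the first occurrence of each element, so the original order is preserved.

```scala
object MergeDistinctDemo {
  // Hypothetical input batches of keys, in arrival (FIFO) order
  val first = Vector(7L, 3L, 9L)
  val second = Vector(3L, 14L, 7L, 2L)

  // Concatenate, then drop later repeats; first occurrences keep their positions
  val keys: Vector[Long] = (first ++ second).distinct
}
```

The resulting Vector is what would then be passed as the source sequence to the caching wrapper.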
I'd like to extract the distinct elements from a Scala list, but I don't want to use the natural equality relation. How can I specify it?
Do I have to rewrite the function or is there any way (maybe using some implicit definition that I am missing) to invoke the distinct method with a custom equality relation?
distinct does not take an ordering; it uses the equals method.
One way to achieve what you want is to create your own Ordering and pass it to a SortedSet, which expects one (note that the result then comes out sorted by that Ordering rather than in the original list order):
implicit val ord: Ordering[Int] = new Ordering[Int] {
  def compare(i: Int, j: Int): Int = ??? // your implementation here
}
val sortedList = collection.immutable.SortedSet(list: _*)/*(ord)*/.toList
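If the custom equality can be expressed as "two elements are equal when some derived key is equal", an order-preserving alternative is to filter on keys already seen. This is a different technique from the SortedSet approach above; the helper name distinctByKey and the sample data are hypothetical. (Scala 2.13 collections also offer a built-in distinctBy for this case.)

```scala
import scala.collection.mutable

object DistinctByKeyDemo {
  // Hypothetical helper: keeps the first element for each key, in original order
  def distinctByKey[A, K](xs: List[A])(key: A => K): List[A] = {
    val seen = mutable.HashSet.empty[K]
    // add returns true only the first time a key is inserted
    xs.filter(x => seen.add(key(x)))
  }

  val words = List("apple", "Avocado", "banana", "Blueberry", "cherry")

  // Treat words as equal when they start with the same letter, ignoring case
  val byInitial: List[String] = distinctByKey(words)(_.head.toLower)
}
```

This does not cover arbitrary equivalence relations that cannot be reduced to a key function; for those, a pairwise comparison or the Ordering-based approach is still needed.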