I'd like to write a merge method that takes two iterables and merges them together (maybe merge is not the best word to describe what I want, but for the sake of this question it's irrelevant). I'd like this method to be generic so it works with different concrete iterables.
For example, merge(Set(1,2), Set(2,3)) should return Set(1,2,3) and
merge(List(1,2), List(2,3)) should return List(1, 2, 2, 3). I've made the following naive attempt, but the compiler complains about the type of res: it is Iterable[Any] instead of A.
def merge[A <: Iterable[_]](first: A, second: A): A = {
val res = first ++ second
res
}
How can I fix this compile error? (I'm more interested in understanding how to implement such a functionality, rather than a library that does it for me, so explanation of why my code does not work is very appreciated.)
Let's start off with why your code didn't work. First off, you're accidentally using the abbreviated syntax for an existential type, rather than a type bound on a higher-kinded type.
// What you wrote is equivalent to this
def merge[A <: Iterable[T] forSome {type T}](first: A, second: A): A
Even fixing it though doesn't quite get you what you want.
def merge[A, S[T] <: Iterable[T]](first: S[A], second: S[A]): S[A] = {
first ++ second // CanBuildFrom errors :(
}
This is because ++ doesn't use type bounds to achieve its polymorphism, it uses an implicit CanBuildFrom[From, Elem, To]. CanBuildFrom is responsible for giving an appropriate Builder[Elem, To], which is a mutable buffer which we use to build up the collection of our desired type.
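To make the Builder mechanics concrete, here is a small sketch (added here for illustration, not part of the original answer) that summons a CanBuildFrom explicitly and drives its Builder by hand:
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

// Ask the compiler for the CanBuildFrom that builds a List[Int] from a List[Int].
val cbf = implicitly[CanBuildFrom[List[Int], Int, List[Int]]]

val builder: Builder[Int, List[Int]] = cbf() // a fresh mutable buffer
builder += 1
builder ++= List(2, 3)
builder.result()                             // List(1, 2, 3)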
So that means we're going to have to give it the CanBuildFrom it so desires and everything'll work right?
import collection.generic.CanBuildFrom
// Cannot construct a collection of type S[A] with elements of type A
// based on a collection of type Iterable[A]
def merge0[A, S[T] <: Iterable[T], That](x: S[A], y: S[A])
(implicit bf: CanBuildFrom[S[A], A, S[A]]): S[A] = x.++[A, S[A]](y)
Nope :(.
I've added the extra type annotations to ++ to make the compiler error more relevant. What this is telling us is that because we haven't specifically overridden Iterable's ++ with our own for our arbitrary S, we're using Iterable's implementation of it, which just so happens to take an implicit CanBuildFrom that builds from Iterables to our S.
This is incidentally the problem @ChrisMartin was running into (and this whole thing really is a long-winded comment on his answer).
Unfortunately Scala does not offer such a CanBuildFrom, so it looks like we're gonna have to use CanBuildFrom manually.
So down the rabbit hole we go...
Let's start off by noticing that ++ is actually defined in TraversableLike, so we can make our custom merge a bit more general.
def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
(implicit bf: CanBuildFrom[S[A], A, That]): That = ???
Now let's actually implement that signature.
import collection.mutable.Builder
def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
(implicit bf: CanBuildFrom[S[A], A, That]): That = {
// Getting our mutable buffer from CanBuildFrom
val builder: Builder[A, That] = bf()
builder ++= it
builder ++= that
builder.result()
}
Note that I've changed GenTraversableOnce[B]* to TraversableOnce[B]**. This is because the only way to make Builder's ++= work is to have sequential access***. And that's all there is to CanBuildFrom. It gives you a mutable buffer that you fill with all the values you want; then you convert the buffer into whatever your desired output collection is with result.
scala> merge(List(1, 2, 3), List(2, 3, 4))
res0: List[Int] = List(1, 2, 3, 2, 3, 4)
scala> merge(Set(1, 2, 3), Set(2, 3, 4))
res1: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
scala> merge(List(1, 2, 3), Set(1, 2, 3))
res2: List[Int] = List(1, 2, 3, 1, 2, 3)
scala> merge(Set(1, 2, 3), List(1, 2, 3)) // Not the same behavior :(
res3: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
In short, the CanBuildFrom machinery lets you build code that deals with the fact that we often wish to automatically convert between different branches of the inheritance graph of Scala's collections, but it comes at the cost of some complexity and occasionally unintuitive behavior. Weigh the tradeoffs accordingly.
Footnotes:
* "Generalized" collections for which we can "Traverse" at least "Once", but maybe not more, in some order which may or may not be sequential, e.g. perhaps parallel.
** Same thing as GenTraversableOnce except not "General" because it guarantees sequential access.
*** TraversableLike gets around this by forcibly calling seq on the GenTraversableOnce internally, but I feel like that's cheating people out of parallelism when they might have otherwise expected it. Force callers to decide whether they want to give up their parallelism; don't do it invisibly for them.
Preliminarily, here are the imports needed for all of the code in this answer:
import collection.GenTraversableOnce
import collection.generic.CanBuildFrom
Start by looking at the API doc to see the method signature for Iterable.++ (Note that the API docs for most collections are wrong, and you need to click "Full Signature" to see the real type):
def ++[B >: A, That](that: GenTraversableOnce[B])
(implicit bf: CanBuildFrom[Iterable[A], B, That]): That
From there you can just do a straightforward translation from an instance method to a function:
def merge[A, B >: A, That](it: Iterable[A], that: GenTraversableOnce[B])
(implicit bf: CanBuildFrom[Iterable[A], B, That]): That = it ++ that
Breaking this down:
[A, B >: A, That] —
Iterable has one type parameter A, and ++ has two type parameters B and That, so the resulting function has all three type parameters A, B, and That
it: Iterable[A] — The method belongs to Iterable[A], so we made that the first value parameter
that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Iterable[A], B, That]): That — the remaining parameter and type constraint copied directly from the signature of ++
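For illustration, a usage sketch (mine, not from the answer): because the implicit is pinned to Iterable[A], the only CanBuildFrom the compiler finds in implicit scope is the one from the Iterable companion, so the statically inferred result type is likely the general Iterable rather than the concrete input type:
import scala.collection.GenTraversableOnce
import scala.collection.generic.CanBuildFrom

def merge[A, B >: A, That](it: Iterable[A], that: GenTraversableOnce[B])
    (implicit bf: CanBuildFrom[Iterable[A], B, That]): That = it ++ that

// That is inferred from CanBuildFrom[Iterable[Int], Int, That], which resolves to
// Iterable.canBuildFrom, so the static type here is Iterable[Int], not List[Int].
val xs = merge(List(1, 2), List(2, 3))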
Related
Noticing that my code was essentially iterating over a list and updating a value in a Map, I first created a trivial helper method which took a function for the transformation of the map value and returned an updated map. As the program evolved, it gained a few other Map-transformation functions, so it was natural to turn it into an implicit value class that adds methods to scala.collection.immutable.Map[A, B]. That version works fine.
However, there's nothing about the methods that require a specific map implementation and they would seem to apply to a scala.collection.Map[A, B] or even a MapLike. So I would like it to be generic in the map type as well as the key and value types. This is where it all goes pear-shaped.
My current iteration looks like this:
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B): MapType[A, B] =
self updated (k, f(self(k)))
}
This code does not compile because self updated (k, f(self(k))) is a scala.collection.Map[A, B], which is not a MapType[A, B]. In other words, the return type of self.updated is as if self's type were the upper type bound rather than the actual declared type.
I can "fix" the code with a downcast:
def updatedWith(k: A, f: B => B): MapType[A, B] =
self.updated(k, f(self(k))).asInstanceOf[MapType[A, B]]
This does not feel satisfactory because downcasting is a code smell and indicates misuse of the type system. In this particular case it would seem that the value will always be of the cast-to type, and the fact that the whole program compiles and runs correctly with this downcast supports this view, but it still smells.
So, is there a better way to write this code to have scalac correctly infer types without using a downcast, or is this a compiler limitation and a downcast is necessary?
[Edited to add the following.]
My code which uses this method is somewhat more complex and messy as I'm still exploring a few ideas, but an example minimum case is the computation of a frequency distribution as a side-effect with code roughly like this:
var counts = Map.empty[Int, Int] withDefaultValue 0
for (item <- items) {
// loads of other gnarly item-processing code
counts = counts updatedWith (count, 1 + _)
}
There are three answers to my question at the time of writing.
One boils down to just letting updatedWith return a scala.collection.Map[A, B] anyway. Essentially, it takes my original version that accepted and returned an immutable.Map[A, B], and makes the type less specific. In other words, it's still insufficiently generic and sets policy on which types the caller uses. I can certainly change the type on the counts declaration, but that is also a code smell to work around a library returning the wrong type, and all it really does is move the downcast into the caller's code. So I don't really like this answer at all.
The other two are variations on CanBuildFrom and builders in that they essentially iterate over the map to produce a modified copy. One inlines a modified updated method, whereas the other calls the original updated and appends it to the builder and thus appears to make an extra temporary copy. Both are good answers which solve the type correctness problem, although the one that avoids an extra copy is the better of the two from a performance standpoint and I prefer it for that reason. The other is however shorter and arguably more clearly shows intent.
In the case of a hypothetical immutable Map that shares large trees in a similar vein to List, this copying would break the sharing and reduce performance, so it would be preferable to use the existing updated method without performing copies. However, Scala's immutable maps don't appear to do this, and so copying (once) seems to be the pragmatic solution that is unlikely to make any difference in practice.
Yes! Use CanBuildFrom. This is how the Scala collections library infers the closest collection type to the one you want. You need implicit evidence of CanBuildFrom[From, Elem, To], where From is the type of collection you're starting with, Elem is the type contained within the collection, and To is the end result you want. The CanBuildFrom supplies a Builder to which you can add elements, and when you're done, you can call Builder#result() to get the completed collection of the appropriate type.
In this case:
From = MapType[A, B]
Elem = (A, B) // The type actually contained in maps
To = MapType[A, B]
Implementation:
import scala.collection.generic.CanBuildFrom
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B)(implicit cbf: CanBuildFrom[MapType[A, B], (A, B), MapType[A, B]]): MapType[A, B] = {
val builder = cbf()
builder ++= self.updated(k, f(self(k)))
builder.result()
}
}
scala> val m = collection.concurrent.TrieMap(1 -> 2, 5 -> 3)
m: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 2, 5 -> 3)
scala> m.updatedWith(1, _ + 10)
res1: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 12, 5 -> 3)
Please note that the updated method returns a Map, rather than the generic type, so I would say you should be fine returning Map as well. But if you really want to return a proper type, you could have a look at the implementation of updated in List.
I've written a small example. I'm not sure it covers all the cases, but it works in my tests. I also used a mutable Map, because it was harder for me to test the immutable one, but I guess it can easily be converted.
implicit class RichMap[A, B, MapType[x, y] <: Map[x, y]](val self: MapType[A, B]) extends AnyVal {
import scala.collection.generic.CanBuildFrom
def updatedWith[R >: B](k: A, f: B => R)(implicit bf: CanBuildFrom[MapType[A, B], (A, R), MapType[A, R]]): MapType[A, R] = {
val b = bf(self)
for ((key, value) <- self) {
if (key != k) {
b += (key -> value)
} else {
b += (key -> f(value))
}
}
b.result()
}
}
import scala.collection.immutable.{TreeMap, HashMap}
val map1 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map1 type is HashMap[Int, String]
val map2 = TreeMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map2 type is TreeMap[Int, String]
val map3 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.asInstanceOf[Any]) // map3 type is HashMap[Int, Any]
Please also note that the CanBuildFrom pattern is much more powerful, and this example doesn't use all of its power. Thanks to CanBuildFrom, some operations can change the type of the collection completely: the type of BitSet(1, 3, 5, 7) map { _.toString } is actually SortedSet[String], as the sketch below shows.
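As an illustration of that point (a small sketch added here, not part of the original answer):
import scala.collection.immutable.{BitSet, SortedSet}

val bits = BitSet(1, 3, 5, 7)
// The element type is no longer Int, so the result can't be a BitSet;
// the closest CanBuildFrom in scope produces a SortedSet[String] instead.
val asStrings: SortedSet[String] = bits.map(_.toString)
// The elements are still Ints, so the BitSet CanBuildFrom applies and we keep a BitSet.
val doubled: BitSet = bits.map(_ * 2)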
How does immutable.Map.map work? It looks like there is something wrong with the documentation:
def map[B](f: (A) ⇒ B): Map[B]
[use case]
Builds a new collection by applying a function to all elements of this immutable map.
Full Signature
def map[B, That](f: ((A, B)) ⇒ B)(implicit bf: CanBuildFrom[Map[A, B], B, That]): That
Map[B] does not make sense since Map takes two type parameters.
In the full signature, there is a name conflict between B the type argument of map, and B the type parameter of Map.
The understandable confusion stems from the fact that map is not implemented in Map, but in TraversableLike. However, the function documentation is inherited by subclasses.
TraversableLike takes two type parameters, TraversableLike[+A, +Repr], and the map function has the signature TraversableLike.map[B](f: (A) ⇒ B): Traversable[B]. In the API docs of Map the documentation for that is inherited and partly adjusted (Traversable[B] becomes Map[B]), but the B from TraversableLike is not resolved to the (A, B) from Map.
(Might be a bug in Scaladoc, that probably won't be fixed, as there could be an erasure problem. But that's just guessing on my side.)
You can check what is actually implemented in Map if you configure the visibility of members right above the member documentation.
EDIT:
Now to your core question:
If the documentation gave us what we could read intuitively, if we simplify it a bit for readability, and if we then use a bit more natural language instead of single letters, the signature of map for a Map[A, B] could look like:
map[ResultItem, ResultCollection](f: (A,B) => ResultItem)(implicit bf: CanBuildFrom[Map[A, B], ResultItem, ResultCollection]): ResultCollection
So, basically you apply a function to each key-value pair of the Map that transforms a key-value pair of type (A,B) into a value of type ResultItem.
As you can build almost any kind of collection from a map (or any other collection), this result type does not have to be another tuple. And ResultCollection does not have to be another Map. E.g.:
Map("1" -> 1, "2" -> 2).map((keyValuePair: (String, Int)) => keyValuePair._2)
or short
Map("1" -> 1, "2" -> 2).map(_._2)
has List(1, 2) as result, List[Int] as ResultCollection, Int as ResultItem.
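Conversely, when the function returns key-value pairs, the inferred builder produces another Map (a small sketch added here for illustration):
// The function returns (String, Int) pairs, so the compiler picks the Map
// CanBuildFrom and the result is again a Map[String, Int].
val incremented: Map[String, Int] =
  Map("1" -> 1, "2" -> 2).map { case (k, v) => (k, v + 1) } // Map("1" -> 2, "2" -> 3)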
This is possible because of the implicit CanBuildFrom parameter, which supplies a builder that takes each result of the map function and appends it to the collection it is building. In most cases CanBuildFrom is inferred by the compiler. However, there are cases when it will not be able to infer the proper result collection.
In such cases you have to give the compiler more information:
val test2: Vector[Int] = Map("1" -> 1, "2" -> 2).map(_._2)(collection.breakOut)
val test3: Set[Int] = Map("1" -> 1, "2" -> 2).map(_._2)(collection.breakOut)
For more information on breakOut and CanBuildFrom I'd recommend this answer
I want to write a polymorphic function that accepts either an IndexedSeq[A] or a ParVector[A]. Inside the function I want access to the prepend method, i.e. +:, which is in SeqLike. SeqLike is a rather confusing type for me since it takes a Repr, which I sort of ignored, unsuccessfully of course.
def goFoo[M[_] <: SeqLike[_,_], A](ac: M[A])(p: Int): M[A] = ???
The function should accept an empty accumulator to start with and call itself recursively p times and each time prepend an A. Here is a concrete example
def goStripper[M[_] <: SeqLike[_,_]](ac: M[PDFTextStripper])(p: Int): M[PDFTextStripper] = {
val str = new PDFTextStripper
str.setStartPage(p)
str.setEndPage(p)
if (p > 1) goStripper(str +: ac)(p-1)
else str +: ac
}
But of course this doesn't compile because I am missing something fundamental about SeqLike. Does anyone have a solution (preferably with an explanation)?
Thanks.
Dealing with SeqLike[A, Repr] can be a bit difficult sometimes. You really need to have a good understanding of how the collections library works (This is a great article if you are interested, http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html). Thankfully, in your case, you actually don't need to even worry about it too much. Both IndexedSeq[A] and ParVector[A] are subclasses of scala.collection.GenSeq[A]. So you can merely write your method as follows
Simple Solution
scala> def goFoo[A, B <: GenSeq[A] with GenSeqLike[A, B]](ac: B)(p: Int): B = ac
goFoo: [A, B <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B]](ac: B)(p: Int)B
scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1))(1)
res26: IndexedSeq[Int] = Vector(1)
scala> goFoo[Int, ParVector[Int]](new ParVector(Vector(1)))(1)
res27: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)
You need to enforce that B is both a subtype of GenSeq[A] and GenSeqLike[A, Repr] so that you can provide the correct value for the Repr. You also need to enforce that the Repr in GenSeqLike[A, Repr] is B. Otherwise some of the methods won't return the correct type. Repr is the underlying representation of the collection. To really understand it, you should read the article I linked, but you can think of it as the output type of many of the collection operations, although that is very oversimplified. I talk about it more below, if you are really interested. For now, it suffices to say we want it to be the same type as the collection we are operating on.
Higher Kind Solution
Right now, the type system needs you to manually supply both generic parameters, which is fine, but we can do a little better. You can make this a little cleaner if you allow for higher kinds.
scala> import scala.language.higherKinds
import scala.language.higherKinds
scala> def goFoo[A, B[A] <: GenSeq[A] with GenSeqLike[A, B[A]]](ac: B[A])(p: Int): B[A] = ac
goFoo: [A, B[A] <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B[A]]](ac: B[A])(p: Int)B[A]
scala> goFoo(IndexedSeq(1))(1)
res28: IndexedSeq[Int] = Vector(1)
scala> goFoo(new ParVector(Vector(1)))(1)
res29: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)
Now you don't have to worry about manually supplying the types.
Recursion
These solutions work with recursion as well.
scala> #tailrec
| def goFoo[A, B <: GenSeq[A] with GenSeqLike[A, B]](ac: B)(p: Int): B =
| if(p == 0){
| ac
| } else {
| goFoo[A, B](ac.drop(1))(p-1)
| }
goFoo: [A, B <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B]](ac: B)(p: Int)B
scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1, 2))(1)
res30: IndexedSeq[Int] = Vector(2)
And the higher kinded version
scala> #tailrec
| def goFoo[A, B[A] <: GenSeq[A] with GenSeqLike[A, B[A]]](ac: B[A])(p: Int): B[A] =
| if(p == 0){
| ac
| } else {
| goFoo(ac.drop(1))(p-1)
| }
goFoo: [A, B[A] <: scala.collection.GenSeq[A] with scala.collection.GenSeqLike[A,B[A]]](ac: B[A])(p: Int)B[A]
scala> goFoo(IndexedSeq(1, 2))(1)
res31: IndexedSeq[Int] = Vector(2)
Using GenSeqLike[A, Repr] Directly TL;DR
So I just want to say, unless you have a need for a more general solution don't do this. It is the hardest to understand and work with. We can't use SeqLike[A, Repr] because ParVector is not an instance of SeqLike, but we can use GenSeqLike[A, Repr], which both ParVector[A] and IndexedSeq[A] subclass.
That being said let's talk about how you could also solve this problem using GenSeqLike[A, Repr] directly.
Unpacking the type variables
First the easy one
A
This is just the type of the value in the collection, so for a Seq[Int] this would be Int.
Repr
This is the underlying type of the collection.
Scala collections implement most of their functionality in common traits, so that they don't have to duplicate code all over the place. Further, they wish to permit out-of-band types to function as though they are collections even if they don't inherit from a collections trait (I'm looking at you, Array), and to allow client libraries/programs to add their own collection instances very easily while getting the bulk of the collection methods defined for free.
They are designed with two guiding constraints
Return the most specific type for the operation
Don't violate Liskov's Substitution Principle (https://en.wikipedia.org/wiki/Liskov_substitution_principle)
Note: These examples are taken from the aforementioned article and are not my own.(linked again here for completeness http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html)
The first constraint can be shown in the following example. A BitSet is a set of non-negative integers. If I perform the following operation, what should the result be?
BitSet(1).map(_+1): ???
The correct answer was a BitSet. I know that seemed rather obvious, but consider the following. What is the type of this operation?
BitSet(1).map(_.toFloat): ???
It can't be a BitSet, right? Because we said that BitSet values are non-negative integers. So it turns out to be a SortedSet[Float].
The Repr parameter, combined with an appropriate CanBuildFrom instance (I explain what this is in a second), is one of the primary mechanisms that allows for returning the most specific type possible. We can see this by sort of cheating the system in the REPL. Consider the following: Vector is a subclass of both IndexedSeq and Seq. So what if we do this...
scala> val x: GenSeqLike[Int, IndexedSeq[Int]] = Vector(1)
x: scala.collection.GenSeqLike[Int,IndexedSeq[Int]] = Vector(1)
scala> 1 +: x
res26: IndexedSeq[Int] = Vector(1, 1)
See how the final type here was IndexedSeq[Int]. This was because we told the type system that the underlying representation of the collection was IndexedSeq[Int] so it tries to return that type if possible. Now watch this,
scala> val x: GenSeqLike[Int, Seq[Int]] = Vector(1)
x: scala.collection.GenSeqLike[Int,Seq[Int]] = Vector(1)
scala> 1 +: x
res27: Seq[Int] = Vector(1, 1)
Now we get a Seq out.
So Scala collections try to give you the most specific type for your operation, while still allowing for huge amounts of code reuse. They do this by leveraging the Repr type, as well as a CanBuildFrom (still getting to it). I know you're probably wondering what this has to do with your question; don't worry, we're getting to that right now. I am not going to say anything about the Liskov Substitution Principle, as it doesn't pertain much to your specific question (but you should still read about it!)
Okay, so now we understand that GenSeqLike[A, Repr] is the trait that Scala collections use to reuse the code for Seq (and other Seq-like things). And we understand that Repr is used to store the underlying collection representation, to help inform the type of collection to return. How this last point works we have yet to explain, so let's do that now!
CanBuildFrom[-From, -Elem, +To]
A CanBuildFrom instance is how the collections library knows how to build the result type of a given operation. For instance the real type of the +: method on SeqLike[A, Repr] is this.
abstract def +:[B >: A, That](elem: B)(implicit bf: CanBuildFrom[Repr, B, That]): That
This means that in order to prepend an element to a GenSeqLike[A, Repr] we need an instance of CanBuildFrom[Repr, B, That], where Repr is the type of our current collection, B is a supertype of the elements that we have in our collection, and That is the type of collection we will have after the operation is done. I am not going to get into the internals of how CanBuildFrom works (again, see the linked article for the details); for now just believe me that this is what it does.
Putting it all together
So now we are ready to build an instance of goFoo that works with GenSeqLike[A, Repr] values.
scala> def goFoo[A, Repr <: GenSeqLike[A, Repr]](ac: Repr)(p: Int)(implicit cbf: CanBuildFrom[Repr, A, Repr]): Repr = ac
goFoo: [A, Repr <: scala.collection.GenSeqLike[A,Repr]](ac: Repr)(p: Int)(implicit cbf: scala.collection.generic.CanBuildFrom[Repr,A,Repr])Repr
scala> goFoo[Int, IndexedSeq[Int]](IndexedSeq(1))(1)
res7: IndexedSeq[Int] = Vector(1)
scala> goFoo[Int, ParVector[Int]](new ParVector(Vector(1)))(1)
res8: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)
What we are saying here is that there is a CanBuildFrom that will take a subclass of GenSeqLike of type Repr over elements A and build a new Repr. This means that we can perform any operation on the Repr type that will result in a new Repr, or in the specific case a new ParVector or IndexedSeq.
Unfortunately we must provide the generic parameters manually or the type system gets confused. Thankfully we can use higher kinds again to avoid this,
scala> def goFoo[A, Repr[A] <: GenSeqLike[A, Repr[A]]](ac: Repr[A])(p: Int)(implicit cbf: CanBuildFrom[Repr[A], A, Repr[A]]): Repr[A] = ac
goFoo: [A, Repr[A] <: scala.collection.GenSeqLike[A,Repr[A]]](ac: Repr[A])(p: Int)(implicit cbf: scala.collection.generic.CanBuildFrom[Repr[A],A,Repr[A]])Repr[A]
scala> goFoo(IndexedSeq(1))(1)
res16: IndexedSeq[Int] = Vector(1)
scala> goFoo(new ParVector(Vector(1)))(1)
res17: scala.collection.parallel.immutable.ParVector[Int] = ParVector(1)
So this is nice, in that it is a little more general than using GenSeq, but it is also way more confusing. I would not recommend doing this for anything other than a thought experiment.
Conclusion
While it has hopefully been informative to learn about how Scala collections work by using GenSeqLike directly, I can hardly think of a use case where I would actually recommend it. The code is hard to understand, hard to work with, and may very well have some edge cases that I have missed. In general I would recommend avoiding interacting with Scala collections implementation traits, such as GenSeqLike, as much as is possible, unless you are installing your own collection into the system. You still have to touch GenSeqLike lightly to get all of the operations in GenSeq by giving it the correct Repr type, but you can avoid thinking about CanBuildFrom values.
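To tie this back to the original question, here is a hedged sketch of how goStripper might look with the higher-kinded bound plus a CanBuildFrom, so that +: gives back the caller's concrete collection type. This is my own illustration, not from the answer above, and the PDFTextStripper import path is an assumption that depends on your PDFBox version:
import scala.collection.GenSeqLike
import scala.collection.generic.CanBuildFrom
import scala.language.higherKinds
import org.apache.pdfbox.text.PDFTextStripper // assumed import; adjust to your PDFBox version

def goStripper[M[A] <: GenSeqLike[A, M[A]]](ac: M[PDFTextStripper])(p: Int)(
    implicit cbf: CanBuildFrom[M[PDFTextStripper], PDFTextStripper, M[PDFTextStripper]]
): M[PDFTextStripper] = {
  val str = new PDFTextStripper
  str.setStartPage(p)
  str.setEndPage(p)
  // The supplied CanBuildFrom makes +: return M[PDFTextStripper]
  // rather than some more general sequence type.
  if (p > 1) goStripper(str +: ac)(p - 1)
  else str +: ac
}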
I was looking at the definition of toArray for hashmaps:
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.HashMap
It has
toArray: Array[A]
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
I don't quite understand this: the first bit says you get an Array[A], but the second part says you get an Array[B]? Neither of these is what I expect: Array[(A,B)].
When I check it myself:
scala> val x = scala.collection.mutable.HashMap[String, Int]()
x: scala.collection.mutable.HashMap[String,Int] = Map()
scala> x.put("8", 7)
res0: Option[Int] = None
scala> x foreach println
(8,7)
scala> x.toArray
res2: Array[(String, Int)] = Array((8,7))
Why isn't it like toList?
toList: scala.List[(A, B)]
The scaladoc has all kinds of subtle bugs. The problem here is that you are seeing the "simplified" version of the method signature (meant as a way to convey the essential part of the signature and hide things such as CanBuildFrom in map/flatMap methods, which are really an implementation detail).
The simplification went a little awry here, and does not seem to make much sense.
If you click on the "full signature" link, you'll see that the real signature looks like:
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
In fact this is still wrong, as we certainly cannot have a type B where B >: (A, B). It should be more like:
def toArray[C >: (A, B)](implicit arg0: ClassTag[C]): Array[C]
The problem is that there are actually two Bs: the first one comes from the HashMap class declaration itself (HashMap[A, +B]), while the other one comes from the method toArray defined in its base class
TraversableOnce (def toArray[B >: A](implicit arg0: ClassTag[B]): Array[B]). It just happens that the Scaladoc generator failed to dedup the two instances of B.
The API you see in the Scaladoc of toArray:
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
Is equivalent to:
def toArray[C >: (A, B)](implicit arg0: ClassTag[C]): Array[C]
The choice of the type variable B is indeed unfortunate (and maybe even a Scaladoc bug, I'm not sure if you are allowed to write that).
It basically means you'll get an array of the most specific supertype of (A,B) for which a ClassTag is available. The ClassTag is required in order to create the Array.
This basically means that if, at compile time, the type of the Map you are converting is fully known, you will get an Array[(A,B)]. However, if you have up-cast your Map somewhere, the run-time type of the resulting Array will depend on the up-cast static type, and not on the run-time type of the Map. This is different behavior from toList, and is due to the JVM's restrictions on how native arrays can be created.
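A small sketch (added here for illustration, not part of the original answer) of what that means in practice; the array's element type follows the static type at the call site:
val m: Map[String, Int] = Map("a" -> 1, "b" -> 2)
val precise = m.toArray            // Array[(String, Int)]

val upcast: Traversable[Any] = m
val erased = upcast.toArray        // Array[Any], even though it holds the same tuples

// toList needs no ClassTag, so only the static element type changes;
// no array of a specific runtime class has to be allocated.
val asList = upcast.toList         // List[Any]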
The Scaladoc is just wrong because it inherits toArray from TraversableOnce, where the type of the collection is A and the return value is B. The Array[A] thing is left over from TraversableOnce where A is whatever TraversableOnce is traversing (in this case, actually (A,B) for a different definition of A and B); and although it fills the (A,B) in properly in the long form, it still uses B as the new return variable instead of a different letter like C.
Kind of confusing! It actually should read
def toArray[C >: (A,B)](...[C]): Array[C]
and the short form should be
toArray: Array[(A,B)]
just like you expect.
A couple of questions arise while I'm reading 7.3.2 Capturing type constraints
from Joshua's Scala in Depth. The example excerpted from the book:
scala> def peek[C, A](col: C)(implicit ev: C <:< Traversable[A]) = (col.head, col)
peek: [C, A](col: C)(implicit ev: <:<[C,Traversable[A]])(A, C)
scala> peek(List(1, 2, 3))
res9: (Int, List[Int]) = (1,List(1, 2, 3))
It seems straightforward that C is found to be List[Int] by the 1st parameter
list. And how <:< enforces type constraint by variance is explained in the book.
But I don't quite see how that helps to find A.
My understanding is, from 1st parameter list, scala finds out C: List[Int],
then it looks for implicit ev: <:<[List[Int], Traversable[A]].
At the moment A remains unknown.
It "pulls" two implicits conforms[List[Int]] and conforms[Traversable[A]] to
match ev. In either case to satisfy variance, List[Int] <: Traversable[A] has to be satisfied, which leads to the finding that A is Int.
Does it work as what I'm describing here? Especially on how/when A is deduced.
As pedrofurla commented, you've got it right—with one little qualification. You say that the compiler "pulls" conforms[Traversable[A]], but there's really no need for any such instance here. To take a simplified example where it's very clear what implicits are in scope:
trait Foo[-From, +To]
implicit object intListFoo extends Foo[List[Int], List[Int]]
Now there's definitely no Foo[Traversable[Int], Traversable[Int]] around, but we can write the following:
scala> implicitly[Foo[List[Int], Traversable[Int]]]
res0: Foo[List[Int],Traversable[Int]] = intListFoo$@8e760f2
More or less exactly the same thing is happening in your example. In that case we would have an instance Traversable[Int] <:< Traversable[Int] around if we needed it, but we don't for that specific implicit search.
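To spell out the variance step (a small sketch of my own, not from the answer above): <:< is declared as <:<[-From, +To], so the single instance conforms[List[Int]] already serves as a List[Int] <:< Traversable[Int], and matching it against Traversable[A] is what fixes A = Int.
// conforms[List[Int]] has type List[Int] <:< List[Int].
// Because <:< is covariant in its second parameter and List[Int] <: Traversable[Int],
// the same instance is also a List[Int] <:< Traversable[Int].
val ev: List[Int] <:< Traversable[Int] = implicitly[List[Int] <:< List[Int]]

// So for peek(List(1, 2, 3)) the compiler resolves ev: List[Int] <:< Traversable[A]
// by picking conforms[List[Int]], and the constraint List[Int] <: Traversable[A]
// then forces A = Int.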