Scala 2.8 breakOut

In Scala 2.8, there is a method defined in scala.collection.package.scala:
def breakOut[From, T, To](implicit b : CanBuildFrom[Nothing, T, To]) =
new CanBuildFrom[From, T, To] {
def apply(from: From) = b.apply() ; def apply() = b.apply()
}
I have been told that this results in:
> import scala.collection.breakOut
> val map : Map[Int,String] = List("London", "Paris").map(x => (x.length, x))(breakOut)
map: Map[Int,String] = Map(6 -> London, 5 -> Paris)
What is going on here? Why is breakOut being called as an argument to my List?

The answer is found in the definition of map:
def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That
Note that it has two parameter lists. The first takes your function and the second takes an implicit. If you do not provide that implicit yourself, Scala will choose the most specific one available.
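The same mechanism can be seen on a smaller scale with sorted, which also receives its helper through an implicit parameter list. This is only an illustrative sketch; the object and value names are made up for the example.

```scala
object ImplicitParamDemo {
  val xs = List(3, 1, 2)

  // Implicit argument omitted: the compiler supplies Ordering[Int].
  val ascending: List[Int] = xs.sorted

  // Implicit argument passed by hand, just like (breakOut) in the question.
  val descending: List[Int] = xs.sorted(Ordering[Int].reverse)
}
```

Passing breakOut in parentheses after the function is the same move: it fills the implicit parameter list explicitly instead of leaving the choice to the compiler.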
About breakOut
So, what's the purpose of breakOut? Consider the example given in the question: you take a list of strings, transform each string into a tuple (Int, String), and then produce a Map out of it. The most obvious way to do that would produce an intermediary List[(Int, String)] collection, and then convert it.
Given that map uses a Builder to produce the resulting collection, wouldn't it be possible to skip the intermediary List and collect the results directly into a Map? Evidently, yes, it is. To do so, however, we need to pass a proper CanBuildFrom to map, and that is exactly what breakOut does.
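For comparison, the "obvious way" with the intermediary list might look like the sketch below. The names are illustrative only; the point is that two collections get built where breakOut builds one.

```scala
object IntermediateListDemo {
  val cities = List("London", "Paris")

  // Step 1: map builds a complete List[(Int, String)] first.
  val pairs: List[(Int, String)] = cities.map(x => (x.length, x))

  // Step 2: toMap copies those pairs into a new Map.
  val byLength: Map[Int, String] = pairs.toMap
}
```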
Let's look, then, at the definition of breakOut:
def breakOut[From, T, To](implicit b : CanBuildFrom[Nothing, T, To]) =
new CanBuildFrom[From, T, To] {
def apply(from: From) = b.apply() ; def apply() = b.apply()
}
Note that breakOut is parameterized, and that it returns an instance of CanBuildFrom. As it happens, the types From, T and To have already been inferred, because we know that map is expecting CanBuildFrom[List[String], (Int, String), Map[Int, String]]. Therefore:
From = List[String]
T = (Int, String)
To = Map[Int, String]
To conclude, let's examine the implicit received by breakOut itself. It is of type CanBuildFrom[Nothing,T,To]. We already know all these types, so we can determine that we need an implicit of type CanBuildFrom[Nothing,(Int,String),Map[Int,String]]. But is there such a definition?
Let's look at CanBuildFrom's definition:
trait CanBuildFrom[-From, -Elem, +To]
extends AnyRef
So CanBuildFrom is contravariant in its first type parameter. Because Nothing is a bottom type (i.e., it is a subtype of everything), a CanBuildFrom[X, T, To] for any X can be used where a CanBuildFrom[Nothing, T, To] is expected.
Since such a builder exists, Scala can use it to produce the desired output.
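The variance argument can be checked in isolation. The sketch below uses a hypothetical Factory trait that keeps only the contravariant From parameter of CanBuildFrom; it is not library code, just a small model of the subtyping rule.

```scala
object VarianceDemo {
  // Hypothetical stand-in: contravariant in From, covariant in To.
  trait Factory[-From, +To] { def make(): To }

  // A factory declared for Map sources only.
  val fromMap: Factory[Map[_, _], String] =
    new Factory[Map[_, _], String] { def make(): String = "built" }

  // Compiles because Nothing <: Map[_, _] and Factory is contravariant in From,
  // so Factory[Map[_, _], String] <: Factory[Nothing, String].
  val fromNothing: Factory[Nothing, String] = fromMap
}
```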
About Builders
A lot of methods from Scala's collections library consist of taking the original collection, processing it somehow (in the case of map, transforming each element), and storing the results in a new collection.
To maximize code reuse, this storing of results is done through a builder (scala.collection.mutable.Builder), which basically supports two operations: appending elements, and returning the resulting collection. The type of this resulting collection will depend on the type of the builder. Thus, a List builder will return a List, a Map builder will return a Map, and so on. The implementation of the map method need not concern itself with the type of the result: the builder takes care of it.
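A small sketch of that idea, using the real Builder operations += and result(). The helper lengthsInto is hypothetical and exists only to show that the loop stays the same while the builder decides the result type.

```scala
import scala.collection.mutable.Builder

object BuilderDemo {
  // Hypothetical helper: the loop is identical for every target collection;
  // only the builder passed in decides what comes out.
  def lengthsInto[To](builder: Builder[(Int, String), To]): To = {
    for (city <- List("London", "Paris")) builder += ((city.length, city))
    builder.result()
  }

  val asList: List[(Int, String)] = lengthsInto(List.newBuilder[(Int, String)])
  val asMap: Map[Int, String] = lengthsInto(Map.newBuilder[Int, String])
}
```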
On the other hand, that means that map needs to receive this builder somehow. The problem faced when designing Scala 2.8 Collections was how to choose the best builder possible. For example, if I were to write Map('a' -> 1).map(_.swap), I'd like to get a Map(1 -> 'a') back. On the other hand, a Map('a' -> 1).map(_._1) can't return a Map (it returns an Iterable).
The magic of producing the best possible Builder from the known types of the expression is performed through this CanBuildFrom implicit.
About CanBuildFrom
To better explain what's going on, I'll give an example where the collection being mapped is a Map instead of a List. I'll go back to List later. For now, consider these two expressions:
Map(1 -> "one", 2 -> "two") map Function.tupled(_ -> _.length)
Map(1 -> "one", 2 -> "two") map (_._2)
The first returns a Map and the second returns an Iterable. The magic of returning a fitting collection is the work of CanBuildFrom. Let's consider the definition of map again to understand it.
The method map is inherited from TraversableLike. It is parameterized on B and That, and makes use of the type parameters A and Repr, which parameterize the class. Let's see both definitions together:
The trait TraversableLike is defined as:
trait TraversableLike[+A, +Repr]
extends HasNewBuilder[A, Repr] with AnyRef
def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That
To understand where A and Repr come from, let's consider the definition of Map itself:
trait Map[A, +B] // scala.collection.immutable.Map
extends Iterable[(A, B)] with scala.collection.Map[A, B] with MapLike[A, B, Map[A, B]]
Because TraversableLike is inherited by all traits which extend Map, A and Repr could be inherited from any of them. The last one gets the preference, though. So, following the definition of the immutable Map and all the traits that connect it to TraversableLike, we have:
trait Map[A, +B] // scala.collection.immutable.Map
extends Iterable[(A, B)] with scala.collection.Map[A, B] with MapLike[A, B, Map[A, B]]
trait MapLike[A, +B, +This <: MapLike[A, B, This] with Map[A, B]] // scala.collection.immutable.MapLike
extends scala.collection.MapLike[A, B, This]
trait MapLike[A, +B, +This <: MapLike[A, B, This] with Map[A, B]] // scala.collection.MapLike
extends PartialFunction[A, B] with IterableLike[(A, B), This] with Subtractable[A, This]
trait IterableLike[+A, +Repr]
extends Equals with TraversableLike[A, Repr]
trait TraversableLike[+A, +Repr]
extends HasNewBuilder[A, Repr] with AnyRef
If we pass the type parameters of Map[Int, String] all the way down the chain, we find that the types passed to TraversableLike, and thus used by map, are:
A = (Int,String)
Repr = Map[Int, String]
Going back to the example, the first map is receiving a function of type ((Int, String)) => (Int, Int) and the second map is receiving a function of type ((Int, String)) => String. I use the double parentheses to emphasize that it is a tuple being received, as that's the type of A, as we saw.
With that information, let's consider the other types.
map Function.tupled(_ -> _.length):
B = (Int, Int)
map (_._2):
B = String
We can see that the type returned by the first map is Map[Int,Int], and the second is Iterable[String]. Looking at map's definition, it is easy to see that these are the values of That. But where do they come from?
If we look inside the companion objects of the classes involved, we see some implicit declarations providing them. On object Map:
implicit def canBuildFrom [A, B] : CanBuildFrom[Map, (A, B), Map[A, B]]
And on object Iterable, whose class is extended by Map:
implicit def canBuildFrom [A] : CanBuildFrom[Iterable, A, Iterable[A]]
These definitions provide factories for parameterized CanBuildFrom.
Scala will choose the most specific implicit available. In the first case, it was the first CanBuildFrom. In the second case, as the first did not match, it chose the second CanBuildFrom.
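The two outcomes can be observed by ascribing the static result types. This sketch writes the first function as a pattern match rather than with Function.tupled, purely as a stylistic assumption; the object and value names are invented for the example.

```scala
object MapResultTypes {
  val source = Map(1 -> "one", 2 -> "two")

  // Each element maps to a pair, so the result can again be a Map.
  val lengths: Map[Int, Int] = source.map { case (k, v) => k -> v.length }

  // Each element maps to a plain String, so the result is only an Iterable.
  val names: Iterable[String] = source.map(_._2)
}
```

Declaring names as Map[Int, String] or similar would not compile, since no Map can be built from bare String elements.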
Back to the Question
Let's see the code for the question, List's and map's definition (again) to see how the types are inferred:
val map : Map[Int,String] = List("London", "Paris").map(x => (x.length, x))(breakOut)
sealed abstract class List[+A]
extends LinearSeq[A] with Product with GenericTraversableTemplate[A, List] with LinearSeqLike[A, List[A]]
trait LinearSeqLike[+A, +Repr <: LinearSeqLike[A, Repr]]
extends SeqLike[A, Repr]
trait SeqLike[+A, +Repr]
extends IterableLike[A, Repr]
trait IterableLike[+A, +Repr]
extends Equals with TraversableLike[A, Repr]
trait TraversableLike[+A, +Repr]
extends HasNewBuilder[A, Repr] with AnyRef
def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That
The type of List("London", "Paris") is List[String], so the types A and Repr defined on TraversableLike are:
A = String
Repr = List[String]
The type for (x => (x.length, x)) is (String) => (Int, String), so the type of B is:
B = (Int, String)
The last unknown type, That, is the type of the result of map, and we already have that as well:
val map : Map[Int,String] =
So,
That = Map[Int, String]
That means breakOut must, necessarily, return a type or subtype of CanBuildFrom[List[String], (Int, String), Map[Int, String]].
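To see the whole chain in one place, here is a self-contained model of the mechanism with the inferred type arguments written out by hand. CanBuild, mapCanBuild and mapList are hypothetical, simplified stand-ins for CanBuildFrom, the implicit in Map's companion, and List.map; the local breakOut mirrors the shape of the real one but is not the library definition.

```scala
import scala.collection.mutable.Builder

object MiniBreakOut {
  // Simplified stand-in for CanBuildFrom (same variance, one method).
  trait CanBuild[-From, -Elem, +To] {
    def apply(): Builder[Elem, To]
  }

  // Plays the role of the implicit found in Map's companion object.
  implicit def mapCanBuild[K, V]: CanBuild[Map[_, _], (K, V), Map[K, V]] =
    new CanBuild[Map[_, _], (K, V), Map[K, V]] {
      def apply(): Builder[(K, V), Map[K, V]] = Map.newBuilder[K, V]
    }

  // Same shape as breakOut: wraps a CanBuild[Nothing, T, To] for any From.
  def breakOut[From, T, To](implicit b: CanBuild[Nothing, T, To]): CanBuild[From, T, To] =
    new CanBuild[From, T, To] {
      def apply(): Builder[T, To] = b.apply()
    }

  // Stand-in for List.map: the builder factory decides the result type.
  def mapList[A, B, That](xs: List[A])(f: A => B)(implicit bf: CanBuild[List[A], B, That]): That = {
    val builder = bf()
    xs.foreach(x => builder += f(x))
    builder.result()
  }

  // From = List[String], T = (Int, String), To = Map[Int, String]
  val result: Map[Int, String] =
    mapList(List("London", "Paris"))(x => (x.length, x))(
      breakOut[List[String], (Int, String), Map[Int, String]])
}
```

In the real call these three type arguments are inferred from the declared type of the val, which is why the type annotation on the left-hand side matters.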

I'd like to build upon Daniel's answer. It was very thorough, but as noted in the comments, it doesn't explain what breakOut does.
Taken from Re: Support for explicit Builders (2009-10-23), here is what I believe breakOut does:
It gives the compiler a suggestion as to which Builder to choose implicitly (essentially it allows the compiler to choose which factory it thinks fits the situation best.)
For example, see the following:
scala> import scala.collection.generic._
import scala.collection.generic._
scala> import scala.collection._
import scala.collection._
scala> import scala.collection.mutable._
import scala.collection.mutable._
scala>
scala> def breakOut[From, T, To](implicit b : CanBuildFrom[Nothing, T, To]) =
| new CanBuildFrom[From, T, To] {
| def apply(from: From) = b.apply() ; def apply() = b.apply()
| }
breakOut: [From, T, To]
| (implicit b: scala.collection.generic.CanBuildFrom[Nothing,T,To])
| java.lang.Object with
| scala.collection.generic.CanBuildFrom[From,T,To]
scala> val l = List(1, 2, 3)
l: List[Int] = List(1, 2, 3)
scala> val imp = l.map(_ + 1)(breakOut)
imp: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 3, 4)
scala> val arr: Array[Int] = l.map(_ + 1)(breakOut)
arr: Array[Int] = Array(2, 3, 4)
scala> val stream: Stream[Int] = l.map(_ + 1)(breakOut)
stream: Stream[Int] = Stream(2, ?)
scala> val seq: Seq[Int] = l.map(_ + 1)(breakOut)
seq: scala.collection.mutable.Seq[Int] = ArrayBuffer(2, 3, 4)
scala> val set: Set[Int] = l.map(_ + 1)(breakOut)
set: scala.collection.mutable.Set[Int] = Set(2, 4, 3)
scala> val hashSet: HashSet[Int] = l.map(_ + 1)(breakOut)
hashSet: scala.collection.mutable.HashSet[Int] = Set(2, 4, 3)
You can see the return type is implicitly chosen by the compiler to best match the expected type. Depending on how you declare the receiving variable, you get different results.
The following would be an equivalent way to specify a builder. Note in this case, the compiler will infer the expected type based on the builder's type:
scala> def buildWith[From, T, To](b : Builder[T, To]) =
| new CanBuildFrom[From, T, To] {
| def apply(from: From) = b ; def apply() = b
| }
buildWith: [From, T, To]
| (b: scala.collection.mutable.Builder[T,To])
| java.lang.Object with
| scala.collection.generic.CanBuildFrom[From,T,To]
scala> val a = l.map(_ + 1)(buildWith(Array.newBuilder[Int]))
a: Array[Int] = Array(2, 3, 4)

Daniel Sobral's answer is great, and should be read together with Architecture of Scala Collections (Chapter 25 of Programming in Scala).
I just wanted to elaborate on why it is called breakOut:
Why is it called breakOut?
Because we want to break out of one type and into another:
Break out of what type into what type? Let's look at the map function on Seq as an example:
Seq.map[B, That](f: (A) => B)(implicit bf: CanBuildFrom[Seq[A], B, That]): That
If we wanted to build a Map directly from mapping over the elements of a sequence such as:
val x: Map[String, Int] = Seq("A", "BB", "CCC").map(s => (s, s.length))
The compiler would complain:
error: type mismatch;
found : Seq[(String, Int)]
required: Map[String,Int]
The reason being that Seq only knows how to build another Seq (i.e. there is an implicit CanBuildFrom[Seq[_], B, Seq[B]] builder factory available, but there is NO builder factory from Seq to Map).
In order to compile, we need to somehow breakOut of the type requirement, and be able to construct a builder that produces a Map for the map function to use.
As Daniel has explained, breakOut has the following signature:
def breakOut[From, T, To](implicit b: CanBuildFrom[Nothing, T, To]): CanBuildFrom[From, T, To] =
// can't just return b because the argument to apply could be cast to From in b
new CanBuildFrom[From, T, To] {
def apply(from: From) = b.apply()
def apply() = b.apply()
}
Nothing is a subclass of all classes, so any builder factory can be substituted in place of implicit b: CanBuildFrom[Nothing, T, To]. If we used the breakOut function to provide the implicit parameter:
val x: Map[String, Int] = Seq("A", "BB", "CCC").map(s => (s, s.length))(collection.breakOut)
It would compile, because breakOut is able to provide the required type of CanBuildFrom[Seq[(String, Int)], (String, Int), Map[String, Int]], while the compiler is able to find an implicit builder factory of type CanBuildFrom[Map[_, _], (A, B), Map[A, B]], in place of CanBuildFrom[Nothing, T, To], for breakOut to use to create the actual builder.
Note that CanBuildFrom[Map[_, _], (A, B), Map[A, B]] is defined in Map, and simply initiates a MapBuilder which uses an underlying Map.
Hope this clears things up.

A simple example to understand what breakOut does:
scala> import collection.breakOut
import collection.breakOut
scala> val set = Set(1, 2, 3, 4)
set: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
scala> set.map(_ % 2)
res0: scala.collection.immutable.Set[Int] = Set(1, 0)
scala> val seq:Seq[Int] = set.map(_ % 2)(breakOut)
seq: Seq[Int] = Vector(1, 0, 1, 0) // map created a Seq[Int] instead of the default Set[Int]

Related

Difference between type constructors and parametrized type bounds in Scala

Take a look at the following code:
case class MyTypeConstructor[T[_]: Seq, A](mySeq: T[A]) {
def map[B](f: A => B): T[B] = mySeq.map(f) // value map is not a member of type parameter T[A]
}
case class MyTypeBounds[T[A] <: Seq[A], A](mySeq: T[A]) {
def map[B](f: A => B): T[B] = mySeq.map(f)
}
Ideally both would do the same thing: define a dummy map that just calls the map method from Seq. However, the first one does not even compile, while the second one works (actually the second one doesn't work either, but I am omitting things for simplicity).
The compilation error I get is that T[A] does not have a member map, which puzzles me because the type constructor T should return a Seq (which does have map).
Can anyone explain to me what is conceptually different between these two implementations?
T[_]: Seq
This does not say "T[_] should return a Seq-like thing". That's what your second example correctly states. This says "T[_] should satisfy an implicit whose name is Seq". But T takes parameters, so it can't really be a part of an implicit. Essentially, it's trying to do
case class MyTypeConstructor[T[_], A](mySeq: T[A])(implicit arg: Seq[T[_]])
But Seq[T[_]] doesn't make sense as an argument to a function, first off because T takes a parameter which is not provided* and second because Seq is not intended to be used as an implicit.
We can see that this is an odd construct because you can remove the map method and still get an error.
// error: type T takes type parameters
case class MyTypeConstructor[T[_]: Seq, A](mySeq: T[A]) {}
*Theoretically, the compiler could treat T[_]: Seq as a declaration that an implicit existential argument is required, but that isn't what it does now and would be of questionable utility, even if it did.
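For contrast, this is how a context bound behaves when it is used with a genuine type class such as Ordering. The two hypothetical methods below are meant to be equivalent; the second spells out roughly what the compiler generates for the first.

```scala
object ContextBoundDemo {
  // Context-bound form: "there must be an implicit Ordering[T]".
  def largest[T: Ordering](xs: List[T]): T = xs.max

  // Roughly what the line above expands to.
  def largestExplicit[T](xs: List[T])(implicit ord: Ordering[T]): T = xs.max(ord)
}
```

Read this way, T[_]: Seq asks for an implicit value of a Seq type indexed by T, which is not what the question intended.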
what is conceptually different between these two implementations?
We can constrain polymorphic type parameters using either the subtyping approach or the type class approach:
scala> case class Subtyping[T[A] <: Seq[A], A](xs: T[A]) {
| def map[B](f: A => B) = xs.map(f)
| }
|
| import scala.collection.BuildFrom
|
| case class TypeClassVanilla[T[x] <: IterableOnce[x], A](xs: T[A]) {
| def map[B](f: A => B)(implicit bf: BuildFrom[T[A], B, T[B]]): T[B] =
| bf.fromSpecific(xs)(xs.iterator.map(f))
| }
|
| import cats.Functor
| import cats.syntax.all._
|
| case class TypeClassCats[T[_]: Functor, A](xs: T[A]) {
| def map[B](f: A => B): T[B] =
| xs.map(f)
| }
class Subtyping
import scala.collection.BuildFrom
class TypeClassVanilla
import cats.Functor
import cats.syntax.all._
class TypeClassCats
scala> val xs = List(1, 2, 3)
val xs: List[Int] = List(1, 2, 3)
scala> Subtyping(xs).map(_ + 1)
val res0: Seq[Int] = List(2, 3, 4)
scala> TypeClassCats(xs).map(_ + 1)
val res1: List[Int] = List(2, 3, 4)
scala> TypeClassVanilla(xs).map(_ + 1)
val res2: List[Int] = List(2, 3, 4)
They are different approaches to achieving the same thing. With the type class approach, we perhaps do not have to worry as much about organising inheritance hierarchies, which, as a system grows in complexity, might lead us to start artificially forcing things into a hierarchy.

Non-unary type constructor bounded by unary type constructor

The title is attempting to describe the following subtyping
implicitly[Map[Int, String] <:< Iterable[(Int, String)]]
Type parameter A is inferred as (Int, String) here
def foo[A](cc: Iterable[A]): A = cc.head
lazy val e: (Int, String) = foo(Map.empty[Int, String])
However, when attempting to achieve a similar effect using type parameter bounds, the best I can do is to specify the arity of the type constructor explicitly, like so
def foo[F[x,y] <: Iterable[(x,y)], A, B](cc: F[A, B]): (A, B) = cc.head
lazy val e: (Int, String) = foo(Map.empty[Int, String])
because the following errors
def foo[F[x] <: Iterable[x], A](cc: F[A]) = cc.head
lazy val e: (Int, String) = foo(Map.empty[Int, String])
// type mismatch;
// [error] found : A
// [error] required: (Int, String)
// [error] lazy val e: (Int, String) = foo(Map.empty[Int, String])
// [error] ^
Hence using Iterable as upper bound it seems we need one signature to handle unary type constructors Seq and Set, and a separate signature to handle 2-arity type constructor Map
def foo[F[x] <: Iterable[x], A](cc: F[A]): A // When F is Seq or Set
def foo[F[x,y] <: Iterable[(x,y)], A, B](cc: F[A, B]): (A, B) // When F is Map
Is there a way to have a single signature using type bounds that works for all three? Putting it differently, how could we write, say, an extension method that works across all collections?
I think the issue here is that F is inferred as Map, whose kind is wrong: Map takes two type parameters, while F takes one. You would have to say: I have some type X that extends F[A], so that when I upcast it, I can use it as F[A], which in turn we want to be a subtype of Iterable[A]. Asked this way, it sounds hard.
Which is why I personally would just stay at:
@ def foo[A](x: Iterable[A]): A = x.head
defined function foo
@ foo(List(1 -> "test"))
res24: (Int, String) = (1, "test")
@ foo(Map(1 -> "test"))
res25: (Int, String) = (1, "test")
"Give me any x that is an instance of Iterable[A] for A".
If I had to do some derivation... I would probably also go this way. I think this limitation is the reason CanBuildFrom works the way it works: matching on part of a type is hard, especially in cases like Map, so the whole type is provided at once as a parameter, to limit the amount of inference needed.
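Applied to the extension-method part of the question, the same idea could look like the sketch below. IterableHeadOps and firstOrElse are hypothetical names; the only assumption is that accepting a plain Iterable[A] is enough for the use case.

```scala
object HeadSyntax {
  // One extension class for Seq, Set and Map alike: each is an Iterable[A],
  // with A = (K, V) in the case of a Map[K, V].
  implicit class IterableHeadOps[A](xs: Iterable[A]) {
    def firstOrElse(default: A): A = xs.headOption.getOrElse(default)
  }
}

object HeadSyntaxUsage {
  import HeadSyntax._

  val fromList: Int = List(1, 2).firstOrElse(0)
  val fromMap: (Int, String) = Map(1 -> "test").firstOrElse((0, ""))
  val fromEmptySet: Int = Set.empty[Int].firstOrElse(7)
}
```

The trade-off is that the method can return elements but cannot return a collection of the caller's own type; that is the part that needs a builder factory.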

How to construct an empty Iterable subclass, generically, and in Scala.js

Normally breakOut would aid in conversion from one collection to another, but it doesn't seem to be able to infer the necessary collection constructor for C:
import scala.collection.breakOut
object Utils {
implicit class IterableExtra[T, C[X] <: Iterable[X]](val list: C[T]) extends AnyVal {
def empty: C[T] = Iterable.empty[T].map(x => x)(breakOut)
}
}
Ideally this would work with minimal reflection, so that it might work in scala.js
Update: I was also trying to use this in a different way, and I forgot to have the implicit at the outermost level:
def testIterableEmpty[B, I[X] <: Iterable[X]](implicit cbf: CanBuildFrom[I[B], B, I[B]]): I[B] = {
def emptyIter: I[B] = cbf().result()
emptyIter
}
scala> val x: List[Int] = testIterableEmpty[Int, List]
x: List[Int] = List()
breakOut is defined like so:
def breakOut[From, T, To](implicit b: CanBuildFrom[Nothing, T, To]): CanBuildFrom[From, T, To]
So it cannot be used to avoid passing a CanBuildFrom into your empty method - it requires one itself. Luckily, it is easy to write - you want to create a C[T] out of C[T], and the element type is T, so:
def empty(implicit cbf: CanBuildFrom[C[T], T, C[T]]): C[T] =
Iterable.empty[T].map(x => x)(breakOut)
Though since you have a CanBuildFrom instance anyway, the implementation using it directly is straightforward too:
def empty(implicit cbf: CanBuildFrom[C[T], T, C[T]]): C[T] =
cbf().result()

Comparing Haskell and Scala Bind/Flatmap Examples

The following bind(>>=) code, in Haskell, does not compile:
ghci> [[1]] >>= Just
<interactive>:38:11:
Couldn't match type ‘Maybe’ with ‘[]’
Expected type: [t] -> [[t]]
Actual type: [t] -> Maybe [t]
In the second argument of ‘(>>=)’, namely ‘Just’
In the expression: [[1]] >>= Just
But, in Scala, it does actually compile and run:
scala> List( List(1) ).flatMap(x => Some(x) )
res1: List[List[Int]] = List(List(1))
Haskell's >>= signature is:
>>= :: Monad m => m a -> (a -> m b) -> m b
So, in [[1]] >>= f, f's type should be: a -> [b].
Why does the Scala code compile?
As @chi explained, Scala's flatMap is more general than Haskell's >>=. The full signature from the Scala docs is:
final def flatMap[B, That](f: (A) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
This implicit isn't relevant for this specific problem, so we could as well use the simpler definition:
final def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): List[B]
There is only one problem: Option is not a subclass of GenTraversableOnce. This is where an implicit conversion comes in. Scala defines an implicit conversion from Option to Iterable, which is a subclass of Traversable, which in turn is a subclass of GenTraversableOnce.
implicit def option2Iterable[A](xo: Option[A]): Iterable[A]
The implicit is defined in the companion object of Option.
A simpler way to see the implicit at work is to assign an Option to an Iterable val:
scala> val i:Iterable[Int] = Some(1)
i: Iterable[Int] = List(1)
Scala uses some defaulting rules, to select List as the implementation of Iterable.
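The conversion can also be called by name, which makes it easier to see what the compiler inserts. The sketch below is illustrative; it relies only on Option.option2Iterable with the signature quoted above and on flatMap accepting an Option-returning function.

```scala
object OptionFlatMapDemo {
  // The compiler inserts Option.option2Iterable here.
  val viaConversion: Iterable[Int] = Some(1)

  // The same conversion, written out by hand.
  val explicitCall: Iterable[Int] = Option.option2Iterable(Some(1))

  // flatMap accepts a function returning Option: None contributes nothing,
  // Some(x) contributes x.
  val evensTimesTen: List[Int] =
    List(1, 2, 3, 4).flatMap(n => if (n % 2 == 0) Some(n * 10) else None)
}
```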
The fact that you can combine different subtypes of TraversableOnce with monad operations comes from the implicit class MonadOps:
implicit class MonadOps[+A](trav: TraversableOnce[A]) {
def map[B](f: A => B): TraversableOnce[B] = trav.toIterator map f
def flatMap[B](f: A => GenTraversableOnce[B]): TraversableOnce[B] = trav.toIterator flatMap f
def withFilter(p: A => Boolean) = trav.toIterator filter p
def filter(p: A => Boolean): TraversableOnce[A] = withFilter(p)
}
This enhances every TraversableOnce with the methods above. The subtypes are free to define more efficient versions on their own, which will shadow the implicit definitions. This is the case for List.
Quoting from the Scala reference for List
final def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): List[B]
So, flatMap is more general than Haskell's (>>=), since it only requires the mapped function f to generate a traversable type, not necessarily a List.

Scala: Option, Some and the ArrowAssoc operator

I am trying to analyze the following piece of Scala code:
import java.nio.file._
import scala.Some
abstract class MyCustomDirectoryIterator[T](path:Path,someNumber:Int, anotherNum:Int) extends Iterator[T] {
def getCustomIterator(myPath:Path):Option[(DirectoryStream[Path],
Iterator[Path])] = try {
//we get the directory stream
val str = Files.newDirectoryStream(myPath)
//then we get the iterator out of the stream
val iter = str.iterator()
Some((str -> iter))
} catch {
case de:DirectoryIteratorException =>
printstacktrace(de.getMessage)
None
}
How do I interpret this piece of code: Some((str -> iter))
Yes, it is returning a value of type:
Option[(DirectoryStream[Path], Iterator[Path])]
The -> operator is, to the best of my understanding, ArrowAssoc from the scala.Predef object.
implicit final class ArrowAssoc[A] extends AnyVal
But I still do not understand what the -> thing is doing to give me a return value of type:
Option[(DirectoryStream[Path], Iterator[Path])]
Can the Scala experts out here throw more light on this? Is there any way to write the "Some(..)" thing in a more readable way? I do understand the role played by Some, though.
The -> operator just creates a tuple:
scala> 1 -> "one"
res0: (Int, String) = (1,one)
which is equivalent to
scala> (1, "one")
res1: (Int, String) = (1,one)
I was just going to add the source code, but Reactormonk has got there first ;-)
The -> method is made available on any object via the implicit ArrowAssoc class. Calling it on an object of type A, passing a parameter of type B, creates a Tuple2[A, B].
The usual case for the -> operator is
Map(1 -> "foo", 2 -> "bar")
which is the same as
Map((1, "foo"), (2, "bar"))
Which works because the signature for Map.apply is
def apply[A, B](elems: Tuple2[A, B]*): Map[A, B]
which means it takes tuples as arguments and constructs a Map from them.
So
(1 -> "foo")
is equivalent to
(1, "foo")
From the compiler sources:
implicit final class ArrowAssoc[A](private val self: A) extends AnyVal {
@inline def -> [B](y: B): Tuple2[A, B] = Tuple2(self, y)
def →[B](y: B): Tuple2[A, B] = ->(y)
}
which tells you directly it's creating a tuple. And that 1 → "foo" works as well.
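As for writing the Some(...) expression more readably, the forms below should all produce the same value, so the choice is a matter of taste. The sketch uses simple Int and String values in place of the stream and iterator from the question.

```scala
object ArrowDemo {
  // Arrow syntax, as in the question (the extra parentheses are optional).
  val viaArrow: Option[(Int, String)] = Some(1 -> "one")

  // Plain tuple literal.
  val viaTuple: Option[(Int, String)] = Some((1, "one"))

  // Explicit Tuple2 constructor, which is what -> calls internally.
  val viaTuple2: Option[(Int, String)] = Some(Tuple2(1, "one"))
}
```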