Why does scala reduce() and reduceLeft() reduce the sequence of the values in the same way? - scala

If I run:
List(1, 4, 3, 9).reduce {(a, b) => println(s"$a + $b"); a + b}
The result is:
1 + 4
5 + 3
8 + 9
However, If I use reduceLeft instead of reduce, it also prints:
1 + 4
5 + 3
8 + 9
I have thought that reduce reduces the sequence of values in this way:
(1+4) + (3+9)
(5) + (12)
(17)
What is the real difference between reduceLeft and reduce?

Take a look at the type of the two functions:
def reduce[B >: A](op: (B, B) => B): B
def reduceLeft[B >: A](op: (B, A) => B): B
Notice, how the second argument to op in reduceLeft is A (same as the element type), while in reduce it is B (same as the return value).
This is an important difference. The operation in reduce must be associative, meaning that you can split the list into several segments, apply operation to each of them separately, and then apply it again to the results to combine.
This is important when you want to do things in parallel: reduce can be applied in parallel to different parts of the list, and then the results can be combined.
reduceLeft on the other hand, can only work sequentially (left to right, as the name implies), operation does not have to be associative, and can expect its second parameter to always be an element of the sequence (and the first one – result of the operation so far);
Consider
val sum = (1 to 100).reduce(_ + _)
This yields the same result as (1 to 100).reduceLeft(_ + _), except that the former can be parallelized.
On the other hand:
val strings = (1 to 100).reduceLeft[Any](_.toString + ";" + _.toString)
Should not be written as reduce (even though, in this contrived example, it can, and will even work as long as you stick with serial processing), because the result depends on the order in which elements are fed to the operation.

The difference is:
reduce is allowed to reduce in any order; for a List it happens to be the same as reduceLeft because it's more efficient (and it's the default implementation for IterableOnceOps subtypes, so most of them will be the same). For another collection it could be equivalent to reduceRight, and for a tree type it may be more as you expected.
different type signatures:
reduce[B >: A](op: (B, B) => B): B
versus
reduceLeft[B >: A](op: (B, A) => B): B
because the first parameter to op in reduceLeft is always an element of the collection it's called on, but for reduce it could be the result of a recursive reduce call.

Related

What's the diff between reduceLeft and reduceRight in Scala?

What's the diff between reduceLeft and reduceRight in Scala?
val list = List(1, 0, 0, 1, 1, 1)
val sum1 = list reduceLeft {_ + _}
val sum2 = list reduceRight {_ + _}
println { sum2 == sum2 }
In my snippet sum1 = sum2 = 4, so the order does not matter here.
When do they produce the same result
As Lionel already pointed out, reduceLeft and reduceRight only produce the same result if the function you are using to combine the elements is associative (this isn't always true, see my note at the bottom). For instance when running reduceLeft and reduceRight on Seq(1,2,3) with the function (a: Int, b: Int) => a - b you get a different result.
scala> Seq(1,2,3)
res0: Seq[Int] = List(1, 2, 3)
scala> res0.reduceLeft(_ - _)
res5: Int = -4
scala> res0.reduceRight(_ - _)
res6: Int = 2
Why this happens can be made clear if we look at how each of the functions is applied over the list.
For reduceRight this is what the calls look like if we were to unwrap them.
(1 - (2 - 3))
(1 - (-1))
2
For reduceLeft the sequence is built up starting from the left,
((1 - 2) - 3)
((-1) - 3)
(-4)
Tail Recursion
Further because reduceLeft is implemented using Tail Recursion, it will not stack overflow when operating on very large collections (possibly even infinite). reduceRight is not tail recursive, so given a collection of large enough size, it will produce a stack overflow.
For instance, on my machine if I run the following I get an Out of Memory error,
scala> (0 to 100000000).reduceRight(_ - _)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:832)
at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:65)
at scala.collection.immutable.Range.apply(Range.scala:61)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:65)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reversed(TraversableOnce.scala:99)
at scala.collection.AbstractIterator.reversed(Iterator.scala:1194)
at scala.collection.TraversableOnce$class.reduceRight(TraversableOnce.scala:197)
at scala.collection.AbstractIterator.reduceRight(Iterator.scala:1194)
at scala.collection.IterableLike$class.reduceRight(IterableLike.scala:85)
at scala.collection.AbstractIterable.reduceRight(Iterable.scala:54)
... 20 elided
But if I compute with reduceLeft I don't get the OOM,
scala> (0 to 100000000).reduceLeft(_ - _)
res16: Int = -987459712
You might get slightly different results on your system, depending your JVM default memory settings.
Prefer left versions
So, because of tail recursion, if you know that reduceLeft and reduceRight will produce the same value, you should prefer the reduceLeft variant. This generally holds true of the other left/right functions, such as foldRight and foldLeft (which are just more general versions of reduceRight and reduceLeft).
When do they really always produce the same result
A small note about reduceLeft and reduceRight and the Associative Property of the function you are using. I said that reduceRight and reduceLeft only produce the same results if the operator is associative. This isn't always true for all collection types. That is somewhat of another topic though, so consult the ScalaDoc, but in short the function you are reducing with needs to be both commutative and associative in order to get the same results for all collection types.
Reduce left doesn't always equal the same result as reduce right. Consider an asymmetric function on your array.
Assuming the same result, performance is one obvious difference
See performance-characteristics
The data structure is build with constant access time to head and tail. Iterating backwards will perform worse for large lists.
The best way to know the difference is read the source code in library/scala/collection/LinearSeqOptimized.scala:
def reduceLeft[B >: A](f: (B, A) => B): B =
......
tail.foldLeft[B](head)(f)
def reduceRight[B >: A](op: (A, B) => B): B =
......
op(head, tail.reduceRight(op))
def foldLeft[B](z: B)(f: (B, A) => B): B = {
var acc = z
var these = this
while (!these.isEmpty) {
acc = f(acc, these.head)
these = these.tail
}
acc
}
The above is some key part of the code, and you can see reduceLeft is based on foldLeft , while reduceRight is implemented via recursion.
I guess reduceLeft has a better performance.

How does aggregate work in scala?

I knew how a normal aggregate works in scala and its use over fold. Tried a lot to know how the below code works, but couldn't. Could someone help me in explaining how it works and gives me a output of (10,4)
val input=List(1,2,3,4)
val result = input.aggregate((0, 0))(
(acc, value) => (acc._1 + value, acc._2 + 1),
(acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
Could someone help me in explaining how it works and gives me a output
of (10,4)
When using aggregate, you provide three parameters:
the initial value from which you accumulate elements from a partition, often it's the neutral element
a function that given a partition, will accumulate the result within it
a function that will combine two partitions
So in your case, the initial value for a partition is the tuple (0, 0).
Then the accumulator function you defined will sum the current element you're traversing with the first element of the tuple and increment the second element of the tuple by one. In fact, it will compute the sum of the elements in a partition and its number of elements.
The combiner function combined two tuples. As you defined it, it will sum the sums and count the number of elements of 2 partitions. It's not used in your case because you traverse the pipeline sequentially. You could call .par on the List so that you get a parallel implementation to see the combiner in action (note that it has to be an associative function).
Thus you get (10, 4) because 1+2+3+4=10 and there was 4 elements in the list (you did 4 additions).
You could add a print statement in the accumulator function (running on a sequential input), to see how it behaves:
Acc: (0,0) - value:1
Acc: (1,1) - value:2
Acc: (3,2) - value:3
Acc: (6,3) - value:4
I knew how a normal aggregate works in scala and its use over fold.
For a sequential input, aggregate is a foldLeft:
def aggregate[B](z: =>B)(seqop: (B, A) => B, combop: (B, B) => B): B = foldLeft(z)(seqop)
For a parallel input, the list is split into chunks so that multiple threads can work separately. The accumulator function is run on each chunk, using the initial value. When two threads need to merge their results, the combine function is used:
def aggregate[S](z: =>S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
tasksupport.executeAndWaitResult(new Aggregate(() => z, seqop, combop, splitter))
}
This is the principle of the fork-join model but it requires that your task can be parallelizable well. It's the case here, because a thread does not need to know the result of another thread to do its job.

What is point of /: function?

Documentation for /: includes
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator
is associative and commutative.
( src)
This just applies if the par version of this function is run, otherwise the result is deterministic (same as foldLeft) ?
Also this function is calling foldLeft under the hood : def /:[B](z: B)(op: (B, A) => B): B = foldLeft(z)(op)
Their function definitions are same (except for function param label, "op" instad of "f") :
def /:[B](z: B)(op: (B, A) ⇒ B): B
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
For these reasons what is point of /: function and when should it be used in favour of foldLeft ?
Is my reasoning incorrect ?
It's just an alternative syntax. Methods ending in : are called on the right hand side.
Instead of
list.foldLeft(0) { op(_, _) }
or
list./:(0) { op(_, _) }
you can
( z /: list ) { op(_, _) }
For example,
scala> val a = List(1,2,3,4)
a: List[Int] = List(1, 2, 3, 4)
scala> ( 0 /: a ) { _ + _ }
res5: Int = 10
Yes, those are aliases originating from dark times when people liked their operators like this:
val x = y |#<#|: z.
The point of the note is to remind that for collections with unspecified iteration order the result of folds might differ. Consider having a Set {1,2,3} that doesn't guarantee the same access order even if left unmodified, and applying an operation that is not e. g. associative (like /). Even if run not after par call, this might result in the following (pseudocode):
{1,2,3} foldLeft / ==> (1 / 2) / 3 ==> 1/6 = 0.1(6)
{3,1,2} foldLeft / ==> (3 / 1) / 2 ==> 3/2 = 1.5
In terms of consistency this is similar to applying non-parallelizable operations to parallel collections, though.

Scala : fold vs foldLeft

I am trying to understand how fold and foldLeft and the respective reduce and reduceLeft work. I used fold and foldLeft as my example
scala> val r = List((ArrayBuffer(1, 2, 3, 4),10))
scala> r.foldLeft(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1)
scala> res28: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(5)
scala> r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1)
<console>:11: error: value _1 is not a member of Serializable with Equals
r.fold(ArrayBuffer(1,2,4,5))((x,y) => x -- y._1)
Why fold didn't work as foldLeft? What is Serializable with Equals? I understand fold and foldLeft has slight different API signature in terms of parameter generic types. Please advise. Thanks.
The method fold (originally added for parallel computation) is less powerful than foldLeft in terms of types it can be applied to. Its signature is:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
This means that the type over which the folding is done has to be a supertype of the collection element type.
def foldLeft[B](z: B)(op: (B, A) => B): B
The reason is that fold can be implemented in parallel, while foldLeft cannot. This is not only because of the *Left part which implies that foldLeft goes from left to right sequentially, but also because the operator op cannot combine results computed in parallel -- it only defines how to combine the aggregation type B with the element type A, but not how to combine two aggregations of type B. The fold method, in turn, does define this, because the aggregation type A1 has to be a supertype of the element type A, that is A1 >: A. This supertype relationship allows in the same time folding over the aggregation and elements, and combining aggregations -- both with a single operator.
But, this supertype relationship between the aggregation and the element type also means that the aggregation type A1 in your example should be the supertype of (ArrayBuffer[Int], Int). Since the zero element of your aggregation is ArrayBuffer(1, 2, 4, 5) of the type ArrayBuffer[Int], the aggregation type is inferred to be the supertype of both of these -- and that's Serializable with Equals, the only least upper bound of a tuple and an array buffer.
In general, if you want to allow parallel folding for arbitrary types (which is done out of order) you have to use the method aggregate which requires defining how two aggregations are combined. In your case:
r.aggregate(ArrayBuffer(1, 2, 4, 5))({ (x, y) => x -- y._1 }, (x, y) => x intersect y)
Btw, try writing your example with reduce/reduceLeft -- because of the supertype relationship between the element type and the aggregation type that both these methods have, you will find that it leads to a similar error as the one you've described.

Example of the Scala aggregate function

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.
Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
After applying aggregate:
Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))
Also, can you give example of parameters z, segop, and combop? I'm unclear on what these parameters do.
Let's see if some ascii art doesn't help. Consider the type signature of aggregate:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
Also, note that A refers to the type of the collection. So, let's say we have 4 elements in this collection, then aggregate might work like this:
z A z A z A z A
\ / \ /seqop\ / \ /
B B B B
\ / combop \ /
B _ _ B
\ combop /
B
Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:
Note the use of par in the below snippet of code. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for sets that can be parallelized.
import scala.collection.GenSeq
val seq = GenSeq("This", "is", "an", "example")
val chars = seq.par.aggregate(0)(_ + _.length, _ + _)
So, first it would compute this:
0 + "This".length // 4
0 + "is".length // 2
0 + "an".length // 2
0 + "example".length // 7
What it does next cannot be predicted (there are more than one way of combining the results), but it might do this (like in the ascii art above):
4 + 2 // 6
2 + 7 // 9
At which point it concludes with
6 + 9 // 15
which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!
Consider, for example, that each of the four computations initial computations are independent of each other, and can be done in parallel. The next two (resulting in 6 and 9) can be started once their computations on which they depend are finished, but these two can also run in parallel.
The 7 computations, parallelized as above, could take as little as the same time 3 serial computations.
Actually, with such a small collection the cost in synchronizing computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations total. Once your collections get larger, however, you start to see some real gains.
Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:
(((0 + "This".length) + "is".length) + "an".length) + "example".length
Each of the inner parenthesis must be computed before the outer one can proceed.
The aggregate function does not do that (except that it is a very general function, and it could be used to do that). You want groupBy. Close to at least. As you start with a Seq[(String, String)], and you group by taking the first item in the tuple (which is (String, String) => String), it would return a Map[String, Seq[(String, String)]). You then have to discard the first parameter in the Seq[String, String)] values.
So
list.groupBy(_._1).mapValues(_.map(_._2))
There you get a Map[String, Seq[(String, String)]. If you want a Seq instead of Map, call toSeq on the result. I don't think you have a guarantee on the order in the resulting Seq though
Aggregate is a more difficult function.
Consider first reduceLeft and reduceRight.
Let as be a non empty sequence as = Seq(a1, ... an) of elements of type A, and f: (A,A) => A be some way to combine two elements of type A into one. I will note it as a binary operator #, a1 # a2 rather than f(a1, a2). as.reduceLeft(#) will compute (((a1 # a2) # a3)... # an). reduceRight will put the parentheses the other way, (a1 # (a2 #... # an)))). If # happens to be associative, one does not care about the parentheses. One could compute it as (a1 #... # ap) # (ap+1 #...#an) (there would be parantheses inside the 2 big parantheses too, but let's not care about that). Then one could do the two parts in parallel, while the nested bracketing in reduceLeft or reduceRight force a fully sequential computation. But parallel computation is only possible when # is known to be associative, and the reduceLeft method cannot know that.
Still, there could be method reduce, whose caller would be responsible for ensuring that the operation is associative. Then reduce would order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.
There is a limitation with the various reduce methods however. The elements of the Seq can only be combined to a result of the same type: # has to be (A,A) => A. But one could have the more general problem of combining them into a B. One starts with a value b of type B, and combine it with every elements of the sequence. The operator # is (B,A) => B, and one computes (((b # a1) # a2) ... # an). foldLeft does that. foldRight does the same thing but starting with an. There, the # operation has no chance to be associative. When one writes b # a1 # a2, it must mean (b # a1) # a2, as (a1 # a2) would be ill-typed. So foldLeft and foldRight have to be sequential.
Suppose however, that each A can be turned into a B, let's write it with !, a! is of type B. Suppose moreover that there is a + operation (B,B) => B, and that # is such that b # a is in fact b + a!. Rather than combining elements with #, one could first transform all of them to B with !, then combine them with +. That would be as.map(!).reduceLeft(+). And if + is associative, then that can be done with reduce, and not be sequential: as.map(!).reduce(+). There could be an hypothetical method as.associativeFold(b, !, +).
Aggregate is very close to that. It may be however, that there is a more efficient way to implement b#a than b+a! For instance, if type B is List[A], and b#a is a::b, then a! will be a::Nil, and b1 + b2 will be b2 ::: b1. a::b is way better than (a::Nil):::b. To benefit from associativity, but still use #, one first splits b + a1! + ... + an!, into (b + a1! + ap!) + (ap+1! + ..+ an!), then go back to using # with (b # a1 # an) + (ap+1! # # an). One still needs the ! on ap+1, because one must start with some b. And the + is still necessary too, appearing between the parantheses. To do that, as.associativeFold(!, +) could be changed to as.optimizedAssociativeFold(b, !, #, +).
Back to +. + is associative, or equivalently, (B, +) is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e they contain a neutral element z (for zero) in B, so that for each b, z + b = b + z = b. In that case, the ! operation that make sense is likely to be be a! = z # a. Moreover, as z is a neutral element b # a1 ..# an = (b + z) # a1 # an which is b + (z + a1 # an). So is is always possible to start the aggregation with z. If b is wanted instead, you do b + result at the end. With all those hypotheses, we can do as.aggregate(z, #, +). That is what aggregate does. # is the seqop argument (applied in a sequence z # a1 # a2 # ap), and + is combop (applied to already partially combined results, as in (z + a1#...#ap) + (z + ap+1#...#an)).
To sum it up, as.aggregate(z)(seqop, combop) computes the same thing as as.foldLeft(z)( seqop) provided that
(B, combop, z) is a monoid
seqop(b,a) = combop(b, seqop(z,a))
aggregate implementation may use the associativity of combop to group the computations as it likes (not swapping elements however, + has not to be commutative, ::: is not). It may run them in parallel.
Finally, solving the initial problem using aggregate is left as an exercise to the reader. A hint: implement using foldLeft, then find z and combo that will satisfy the conditions stated above.
The signature for a collection with elements of type A is:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
z is an object of type B acting as a neutral element. If you want to count something, you can use 0, if you want to build a list, start with an empty list, etc.
segop is analoguous to the function you pass to fold methods. It takes two argument, the first one is the same type as the neutral element you passed and represent the stuff which was already aggregated on previous iteration, the second one is the next element of your collection. The result must also by of type B.
combop: is a function combining two results in one.
In most collections, aggregate is implemented in TraversableOnce as:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B
= foldLeft(z)(seqop)
Thus combop is ignored. However, it makes sense for parallel collections, becauseseqop will first be applied locally in parallel, and then combopis called to finish the aggregation.
So for your example, you can try with a fold first:
val seqOp =
(map:Map[String,Set[String]],tuple: (String,String)) =>
map + ( tuple._1 -> ( map.getOrElse( tuple._1, Set[String]() ) + tuple._2 ) )
list.foldLeft( Map[String,Set[String]]() )( seqOp )
// returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Then you have to find a way of collapsing two multimaps:
val combOp = (map1: Map[String,Set[String]], map2: Map[String,Set[String]]) =>
(map1.keySet ++ map2.keySet).foldLeft( Map[String,Set[String]]() ) {
(result,k) =>
result + ( k -> ( map1.getOrElse(k,Set[String]() ) ++ map2.getOrElse(k,Set[String]() ) ) )
}
Now, you can use aggregate in parallel:
list.par.aggregate( Map[String,Set[String]]() )( seqOp, combOp )
//Returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Applying the method "par" to list, thus using the parallel collection(scala.collection.parallel.immutable.ParSeq) of the list to really take advantage of the multi core processors. Without "par", there won't be any performance gain since the aggregate is not done on the parallel collection.
aggregate is like foldLeft but may executed in parallel.
As missingfactor says, the linear version of aggregate(z)(seqop, combop) is equivalent to foldleft(z)(seqop). This is however impractical in the parallel case, where we would need to combine not only the next element with the previous result (as in a normal fold) but we want to split the iterable into sub-iterables on which we call aggregate and need to combine those again. (In left-to-right order but not associative as we might have combined the last parts before the fist parts of the iterable.) This re-combining in in general non-trivial, and therefore, one needs a method (S, S) => S to accomplish that.
The definition in ParIterableLike is:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
which indeed uses combop.
For reference, Aggregate is defined as:
protected[this] class Aggregate[S](z: S, seqop: (S, T) => S, combop: (S, S) => S, protected[this] val pit: IterableSplitter[T])
extends Accessor[S, Aggregate[S]] {
#volatile var result: S = null.asInstanceOf[S]
def leaf(prevr: Option[S]) = result = pit.foldLeft(z)(seqop)
protected[this] def newSubtask(p: IterableSplitter[T]) = new Aggregate(z, seqop, combop, p)
override def merge(that: Aggregate[S]) = result = combop(result, that.result)
}
The important part is merge where combop is applied with two sub-results.
Here is the blog on how aggregate enable performance on the multi cores processor with bench mark.
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
Here is video on "Scala parallel collections" talk from "Scala Days 2011".
http://days2011.scala-lang.org/node/138/272
The description on the video
Scala Parallel Collections
Aleksandar Prokopec
Parallel programming abstractions become increasingly important as the number of processor cores grows. A high-level programming model enables the programmer to focus more on the program and less on low-level details such as synchronization and load-balancing. Scala parallel collections extend the programming model of the Scala collection framework, providing parallel operations on datasets.
The talk will describe the architecture of the parallel collection framework, explaining their implementation and design decisions. Concrete collection implementations such as parallel hash maps and parallel hash tries will be described. Finally, several example applications will be shown, demonstrating the programming model in practice.
The definition of aggregate in TraversableOnce source is:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
foldLeft(z)(seqop)
which is no different than a simple foldLeft. combop doesn't seem to be used anywhere. I am myself confused as to what the purpose of this method is.
Just to clarify explanations of those before me, in theory the idea is that
aggregate should work like this, (I have changed the names of the parameters to make them clearer):
Seq(1,2,3,4).aggragate(0)(
addToPrev = (prev,curr) => prev + curr,
combineSums = (sumA,sumB) => sumA + sumB)
Should logically translate to
Seq(1,2,3,4)
.grouped(2) // split into groups of 2 members each
.map(prevAndCurrList => prevAndCurrList(0) + prevAndCurrList(1))
.foldLeft(0)(sumA,sumB => sumA + sumB)
Because the aggregation and mapping are separate, the original list could theoretically be split into different groups of different sizes and run in parallel or even on different machine.
In practice scala current implementation does not support this feature by default but you can do this in your own code.