Simplest way to define a type that is a sequence of a specific number of elements in Scala?

Suppose I'm doing something like the following:
val a = complicatedChainOfSteps("c")
val b = complicatedChainOfSteps("d")
I'm interested in writing code like the following (to reduce code and copy/paste errors):
val Seq(a, b) = Seq("c", "d").map(complicatedChainOfSteps(_))
but having the compiler ensure that the number of elements matches, so the following don't compile:
val Seq(a, b) = Seq("c", "d", "e").map(s => s + s)
val Seq(a, b) = Seq("c").map(s => s + s)
I know that using tuples instead to ensure that the number of elements matches works when performing multiple assignment (e.g., val (a, b) = ("c", "d")), but you cannot map over tuples (which makes sense because they have heterogeneous types).
I also know I can just define my own types for sequence of 2 elements and sequence of 3 elements or whatever, but is there a convenient built in way of doing this? If not, what's the simplest way to define a type that is a sequence of a specific number of elements?
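For what it's worth, here is a minimal sketch of the hand-rolled approach mentioned above; the name Seq2 is invented for illustration and is not a standard library type:
case class Seq2[A](first: A, second: A) {
  def map[B](f: A => B): Seq2[B] = Seq2(f(first), f(second))
}

// The element count is checked at compile time when destructuring:
val Seq2(a, b) = Seq2("c", "d").map(s => s + s) // a = "cc", b = "dd"
// Seq2("c", "d", "e") does not compile: there is no 3-argument constructor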

Related

Filtering RDDs based on value of Key

I have two RDDs that wrap the following arrays:
Array((3,Ken), (5,Jonny), (4,Adam), (3,Ben), (6,Rhonda), (5,Johny))
Array((4,Rudy), (7,Micheal), (5,Peter), (5,Shawn), (5,Aaron), (7,Gilbert))
I need to design the code in such a way that if I provide 3 as input, it returns
Array((3,Ken), (3,Ben))
If the input is 6, the output should be
Array((6,Rhonda))
I tried something like this:
val list3 = list1.union(list2)
list3.reduceByKey(_+_).collect
list3.reduceByKey(6).collect
None of these worked, can anyone help me out with a solution for this problem?
Given the following, which you would have to define yourself:
// Provide your SparkContext and inputs here
val sc: SparkContext = ???
val array1: Array[(Int, String)] = ???
val array2: Array[(Int, String)] = ???
val n: Int = ???
val rdd1 = sc.parallelize(array1)
val rdd2 = sc.parallelize(array2)
You can use union and filter to reach your goal:
rdd1.union(rdd2).filter(_._1 == n)
Since filtering by key is something that you would probably want to do on several occasions, it makes sense to encapsulate this functionality in its own function.
It would also be interesting if we could make sure that this function could work on any type of keys, not only Ints.
You can express this in the old RDD API as follows:
def filterByKey[K, V](rdd: RDD[(K, V)], k: K): RDD[(K, V)] =
  rdd.filter(_._1 == k)
You can use it as follows:
val rdd = rdd1.union(rdd2)
val filtered = filterByKey(rdd, n)
Let's look at this method a little bit more in detail.
This method allows you to filter by key an RDD containing generic pairs, where the type of the first item is K and the type of the second is V (for key and value). It also accepts a key of type K that will be used to filter the RDD.
You then use the filter function, which takes a predicate (a function from the element type - here the pair (K, V) - to Boolean) and makes sure that the resulting RDD only contains items that satisfy this predicate.
We could also have written the body of the function as:
rdd.filter(pair => pair._1 == k)
or
rdd.filter { case (key, value) => key == k }
but we took advantage of the _ wildcard to express the fact that we want to act on the first (and only) parameter of this anonymous function.
To use it, you first parallelize your RDDs, call union on them and then invoke the filterByKey function with the number you want to filter by (as shown in the example).
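Putting the pieces together, here is a minimal end-to-end sketch; the local SparkContext configuration is illustrative, not part of the original answer:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative local setup; in a real job the context comes from your environment
val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("filterByKey"))

def filterByKey[K, V](rdd: RDD[(K, V)], k: K): RDD[(K, V)] =
  rdd.filter(_._1 == k)

val rdd1 = sc.parallelize(Array((3, "Ken"), (5, "Jonny"), (4, "Adam"), (3, "Ben"), (6, "Rhonda"), (5, "Johny")))
val rdd2 = sc.parallelize(Array((4, "Rudy"), (7, "Micheal"), (5, "Peter"), (5, "Shawn"), (5, "Aaron"), (7, "Gilbert")))

filterByKey(rdd1.union(rdd2), 3).collect() // Array((3,Ken), (3,Ben))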

Compile time enforcement of length equality for a runtime list

Suppose I have a runtime list of Ints, the length of which I do not know at compile time, e.g.
numbers: Seq[Int].
I also have a function that takes an Int element and returns an A, e.g.
f: Int => A
I have another function that takes an Int element and returns a B, e.g.
g: Int => B
If I map over the list with f and g separately, I end up with 2 lists:
val list1: Seq[A] = numbers.map(f)
val list2: Seq[B] = numbers.map(g)
Is there any way for me to write a function that works with these two lists and ensures at compile time that the number of elements in both is the same?
Since both list1 and list2 are mapped from the same list (numbers), I'm thinking it might be possible using Shapeless' Sized, but I couldn't figure out how to do it without knowing numbers's length at compile time.
A low tech solution to this problem would be to avoid creating two separate lists in the first place,
val list12: Seq[(A, B)] = numbers.map { i => (f(i), g(i)) }
I recommend exploring this avenue before deploying heavy artillery.
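If the two views are still needed separately later on, unzip recovers them, with equal lengths guaranteed by construction (a small follow-up sketch, not from the original answer):
// Both lists come from the same Seq, so their lengths are equal by construction.
val (listA, listB) = list12.unzip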

How can I functionally iterate over a collection combining elements?

I have a sequence of values of type A that I want to transform to a sequence of type B.
Some of the elements with type A can be converted to a B, however some other elements need to be combined with the immediately previous element to produce a B.
I see it as a small state machine with two states: the first one handles the transformation from A to B when just the current A is needed, or saves A and moves to the second state if the next element is needed; the second state combines the saved A with the new A to produce a B and then goes back to state 1.
I'm trying to use scalaz's Iteratees but I fear I'm overcomplicating it, and I'm forced to return a dummy B when the input has reached EOF.
What's the most elegant solution to do it?
What about invoking the sliding() method on your sequence?
You might have to put a dummy element at the head of the sequence so that the first element (the real head) is evaluated/converted correctly.
If you map() over the result from sliding(2) then map will "see" every element with its predecessor.
val dummy: A = ???      // sentinel so that the real head gets a predecessor
val input: Seq[A] = ??? // real data here (no dummy values)
val output: Seq[B] = (dummy +: input).sliding(2).flatMap(a2b).toSeq

def a2b(arg: Seq[A]): Seq[B] = {
  // arg holds 2 elements: the previous element and the current one
  // return a Seq() of zero or more B elements
  ???
}
Taking a stab at it:
Partition your list into two lists. The first is the one you can directly convert and the second is the one that you need to merge.
scala> val l = List("String", 1, 4, "Hello")
l: List[Any] = List(String, 1, 4, Hello)
scala> val (string, int) = l partition { case s:String => true case _ => false}
string: List[Any] = List(String, Hello)
int: List[Any] = List(1, 4)
Replace the logic in the partition block with whatever you need.
After you have the two lists, you can do whatever you need to with the second one using something like this:
scala> string ::: int.collect{case i:Integer => i}.sliding(2).collect{
| case List(a, b) => a+b.toString}.toList
res4: List[Any] = List(String, Hello, 14)
You would replace the addition with whatever your aggregate function is.
Hopefully this is helpful.
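For reference, the two-state machine described in the question can also be written directly as a fold; a minimal sketch, with needsNext, one, and two as hypothetical stand-ins for the conversion logic:
def convert[A, B](as: Seq[A])(
    needsNext: A => Boolean, // true if this A must be combined with the next one
    one: A => B,             // state 1: convert a standalone A
    two: (A, A) => B         // state 2: combine the saved A with the current A
): Seq[B] = {
  val (out, pending) = as.foldLeft((Vector.empty[B], Option.empty[A])) {
    case ((acc, Some(saved)), a) => (acc :+ two(saved, a), None)
    case ((acc, None), a) =>
      if (needsNext(a)) (acc, Some(a)) else (acc :+ one(a), None)
  }
  out // pending is non-empty if the input ended with an unpaired element
}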

How can I idiomatically "remove" a single element from a list in Scala and close the gap?

Lists are immutable in Scala, so I'm trying to figure out how I can "remove" an element - really, create a new collection without that element - and then close the gap created in the list. This sounds to me like it would be a great place to use map, but I don't know how to get started in this instance.
Courses is a list of strings. I need this loop because I actually have several lists that I will need to remove the element at that index from (I'm using multiple lists to store data associated across lists, and I'm doing this by simply ensuring that the indices will always correspond across lists).
for (i <- 0 until courses.length) {
  if (input == courses(i)) {
    // I need a map call on each list here to remove that element
    // this element is not guaranteed to be at the front or the end of the list
  }
}
Let me add some detail to the problem. I have four lists that are associated with each other by index; one list stores the course names, one stores the time the class begins in a simple int format (i.e. 130), one stores either "am" or "pm", and one stores the days of the classes as an int (so "MWF" evals to 1, "TR" evals to 2, etc). I don't know if having multiple lists is the best or the "right" way to solve this problem, but these are all the tools I have (first-year comp sci student who hasn't programmed seriously since I was 16). I'm writing a function to remove the corresponding element from each list, and all I know is that 1) the indices correspond and 2) the user inputs the course name. How can I remove the corresponding element from each list using filterNot? I don't think I know enough about each list to use higher-order functions on them.
This is the use case of filter:
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.filter(_ != 2)
res1: List[Int] = List(1, 3, 4, 5)
You want to use map when you are transforming all the elements of a list.
To answer your question directly, I think you're looking for patch, for instance to remove element with index 2 ("c"):
List("a","b","c","d").patch(2, Nil, 1) // List(a, b, d)
where Nil is what we're replacing it with, and 1 is the number of elements to replace.
But, if you do this:
I have four lists that are associated with each other by index; one
list stores the course names, one stores the time the class begins in
a simple int format (ie 130), one stores either "am" or "pm", and one
stores the days of the classes by int
you're going to have a bad time. I suggest you use a case class:
case class Course(name: String, time: Int, ampm: String, day: Int)
and then store them in a Set[Course]. (Storing time and days as Ints isn't a great idea either - have a look at java.util.Calendar instead.)
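For concreteness, a small sketch of that suggestion, reusing the Course class above with invented sample values:
// Sample values are invented for illustration
val courseSet = Set(
  Course("Compilers", 130, "pm", 1),
  Course("Databases", 900, "am", 2)
)
val remaining = courseSet.filterNot(_.name == "Compilers")
// remaining: Set(Course(Databases,900,am,2))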
First a few sidenotes:
List is not an index-based structure. All index-oriented operations on it take linear time. For index-oriented algorithms Vector is a much better candidate. In fact, if your algorithm requires indexes, it's a sure sign that you're really not exploiting Scala's functional capabilities.
map serves for transforming a collection of items "A" into the same kind of collection with items "B", using a passed-in transformer function from a single "A" to a single "B". It cannot change the number of resulting elements. Probably you've confused map with fold or reduce.
To answer your updated question
Okay, here's a functional solution, which works effectively on lists:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList) =
  (courses, timeList, amOrPmList, dateList)
    .zipped
    .filterNot(_._1 == input)
    .unzip4
But there's a catch. I was actually quite astonished to find out that the functions used in this solution, which are so basic for functional languages, are not present in the standard Scala library. Scala has them for 2- and 3-ary tuples, but not the others.
To solve that you'll need to have the following implicit extensions imported.
implicit class Tuple4Zipped[A, B, C, D](
  val t: (Iterable[A], Iterable[B], Iterable[C], Iterable[D])
) extends AnyVal {
  def zipped =
    t._1.toStream
      .zip(t._2).zip(t._3).zip(t._4)
      .map { case (((a, b), c), d) => (a, b, c, d) }
}
implicit class IterableUnzip4[A, B, C, D](
  val ts: Iterable[(A, B, C, D)]
) extends AnyVal {
  def unzip4 =
    ts.foldRight((List[A](), List[B](), List[C](), List[D]())) {
      (a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
    }
}
This implementation requires Scala 2.10, as it utilizes the new Value Classes feature for pimping the existing types.
I have actually included these in a small extensions library called SExt; after adding it as a dependency to your project, you'll be able to have them by simply adding an import sext._ statement.
Of course, if you want you can just compose these functions directly into the solution:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList) =
  courses.toStream
    .zip(timeList).zip(amOrPmList).zip(dateList)
    .map { case (((a, b), c), d) => (a, b, c, d) }
    .filterNot(_._1 == input)
    .foldRight((List[String](), List[Int](), List[String](), List[Int]())) {
      (a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
    }
Removing and filtering List elements
In Scala you can filter the list to remove elements.
scala> val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
courses: List[java.lang.String] = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
Let's remove a couple of classes:
courses.filterNot(p => p == "Compilers" || p == "Databases")
You can also use remove but it's deprecated in favor of filter or filterNot.
If you want to remove by an index you can associate each element in the list with an ordered index using zipWithIndex. So, courses.zipWithIndex becomes:
List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Programming Languages,1), (Compilers,2), (Networks,3), (Databases,4))
To remove the second element from this you can refer to index in the Tuple with courses.filterNot(_._2 == 1) which gives the list:
res8: List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Compilers,2), (Networks,3), (Databases,4))
Lastly, another tool is to use indexWhere to find the index of an arbitrary element.
courses.indexWhere(_ contains "Languages")
res9: Int = 1
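Combining indexWhere with patch (shown in an earlier answer) then removes the element at the found index; a small sketch using the courses list above:
val i = courses.indexWhere(_ contains "Languages")
val trimmed = if (i >= 0) courses.patch(i, Nil, 1) else courses
// trimmed: List(Artificial Intelligence, Compilers, Networks, Databases)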
Re your update
I'm writing a function to remove the corresponding element from each
lists, and all I know is that 1) the indices correspond and 2) the
user inputs the course name. How can I remove the corresponding
element from each list using filterNot?
Similar to Nikita's update you have to "merge" the elements of each list. So courses, meridiems, days, and times need to be put into a Tuple or class to hold the related elements. Then you can filter on an element of the Tuple or a field of the class.
Combining corresponding elements into a Tuple looks as follows with this sample data:
val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
val meridiems = List("am", "pm", "am", "pm", "am")
val times = List("100", "1200", "0100", "0900", "0800")
val days = List("MWF", "TTH", "MW", "MWF", "MTWTHF")
Combine them with zip:
val zipped = courses zip days zip times zip meridiems
// zipped: List[(((java.lang.String, java.lang.String), java.lang.String), java.lang.String)] = List((((Artificial Intelligence,MWF),100),am), (((Programming Languages,TTH),1200),pm), (((Compilers,MW),0100),am), (((Networks,MWF),0900),pm), (((Databases,MTWTHF),0800),am))
This abomination flattens the nested Tuples to a Tuple. There are better ways.
val flat = zipped.map(x => (x._1._1._1, x._1._1._2, x._1._2, x._2)).toList
A nice list of tuples to work with.
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Networks,MWF,0900,pm), (Databases,MTWTHF,0800,am))
Finally we can filter based on course name using filterNot, e.g. flat.filterNot(_._1 == "Networks"), which gives:
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Databases,MTWTHF,0800,am))
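If the four separate lists are needed again after filtering, each field can be projected back out with map; a small follow-up sketch, not part of the original answer:
val kept = flat.filterNot(_._1 == "Networks")

// Project each tuple field back into its own list; the lengths stay in sync.
val newCourses = kept.map(_._1)
val newDays = kept.map(_._2)
val newTimes = kept.map(_._3)
val newMeridiems = kept.map(_._4)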
The answer I am about to give might be overstepping what you have been taught so far in your course, so if that is the case I apologise.
Firstly, you are right to question whether you should have four lists - fundamentally, it sounds like what you need is an object which represents a course:
/**
 * Represents a course.
 * @param name the human-readable descriptor for the course
 * @param timeOfDay the time of day as an integer equivalent to
 *        12-hour time, i.e. 1130
 * @param meridiem the half of the day that the time corresponds
 *        to: either "am" or "pm"
 * @param days an encoding of the days of the week the class runs
 */
case class Course(name: String, timeOfDay: Int, meridiem: String, days: Int)
with which you may define an individual course
val cs101 =
  Course("CS101 - Introduction to Object-Functional Programming",
    1000, "am", 1)
There are better ways to define this type (better representations of 12-hour time, a clearer way to represent the days of the week, etc), but I won't deviate from your original problem statement.
Given this, you would have a single list of courses:
val courses = List(cs101, cs402, bio101, phil101)
And if you wanted to find and remove all courses that matched a given name, you would write:
val courseToRemove = "PHIL101 - Philosophy of Beard Ownership"
courses.filterNot(course => course.name == courseToRemove)
Equivalently, using the underscore syntactic sugar in Scala for function literals:
courses.filterNot(_.name == courseToRemove)
If there was the risk that more than one course might have the same name (or that you are filtering based on some partial criteria using a regular expression or prefix match) and that you only want to remove the first occurrence, then you could define your own function to do that:
def removeFirst(courses: List[Course], courseToRemove: String): List[Course] =
  courses match {
    case Nil => Nil
    case head :: tail if head.name == courseToRemove => tail
    case head :: tail => head :: removeFirst(tail, courseToRemove)
  }
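A quick usage sketch, reusing the values defined earlier:
// Drops only the first course whose name matches; any later duplicates stay.
val remaining = removeFirst(courses, courseToRemove)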
Use ListBuffer, which is a mutable List similar to a Java list:
val l = scala.collection.mutable.ListBuffer("a", "b", "c")
println(l) // ListBuffer(a, b, c)
l.remove(0)
println(l) // ListBuffer(b, c)

Example of the Scala aggregate function

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.
Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
After applying aggregate:
Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))
Also, can you give examples of the parameters z, seqop, and combop? I'm unclear on what these parameters do.
Let's see if some ascii art doesn't help. Consider the type signature of aggregate:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
Also, note that A refers to the element type of the collection. So, let's say we have 4 elements in this collection; then aggregate might work like this:
z   A   z   A   z   A   z   A
 \ /     \ /seqop\ /     \ /
  B       B       B       B
    \   /  combop   \   /
      B _           _ B
         \  combop  /
              B
Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:
Note the use of par in the snippet of code below. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for collections that can be parallelized.
import scala.collection.GenSeq
val seq = GenSeq("This", "is", "an", "example")
val chars = seq.par.aggregate(0)(_ + _.length, _ + _)
So, first it would compute this:
0 + "This".length // 4
0 + "is".length // 2
0 + "an".length // 2
0 + "example".length // 7
What it does next cannot be predicted (there are more than one way of combining the results), but it might do this (like in the ascii art above):
4 + 2 // 6
2 + 7 // 9
At which point it concludes with
6 + 9 // 15
which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!
Consider, for example, that each of the four initial computations is independent of the others and can be done in parallel. The next two (resulting in 6 and 9) can be started once the computations they depend on are finished, and these two can also run in parallel.
The 7 computations, parallelized as above, could take as little time as 3 serial computations.
Actually, with such a small collection the cost in synchronizing computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations total. Once your collections get larger, however, you start to see some real gains.
Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:
(((0 + "This".length) + "is".length) + "an".length) + "example".length
Each inner parenthesis must be computed before the outer one can proceed.
The aggregate function does not do that (except that it is a very general function, and it could be used to do that). You want groupBy. Close to it, at least. As you start with a Seq[(String, String)] and you group by taking the first item in the tuple (which is (String, String) => String), it returns a Map[String, Seq[(String, String)]]. You then have to discard the first element in each of the Seq[(String, String)] values.
So
list.groupBy(_._1).mapValues(_.map(_._2))
There you get a Map[String, Seq[String]]. If you want a Seq instead of a Map, call toSeq on the result. I don't think you have a guarantee on the order in the resulting Seq, though.
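As a quick check on the question's data (not from the original answer; note that mapValues returns a strict Map in Scala 2.12 and earlier):
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
val grouped = list.groupBy(_._1).mapValues(_.map(_._2)).toSeq
// e.g. Vector((four,List(iv)), (one,List(i, 1)), (two,List(2, ii)))
// the order of the groups is not guaranteed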
Aggregate is a more difficult function.
Consider first reduceLeft and reduceRight.
Let as be a non-empty sequence as = Seq(a1, ..., an) of elements of type A, and f: (A, A) => A be some way to combine two elements of type A into one. I will write it as a binary operator #: a1 # a2 rather than f(a1, a2). as.reduceLeft(#) will compute (((a1 # a2) # a3) ... # an). reduceRight will put the parentheses the other way: (a1 # (a2 # ... # an)). If # happens to be associative, one does not care about the parentheses. One could compute it as (a1 # ... # ap) # (ap+1 # ... # an) (there would be parentheses inside the two big parentheses too, but let's not care about that). Then one could do the two parts in parallel, while the nested bracketing in reduceLeft or reduceRight forces a fully sequential computation. But parallel computation is only possible when # is known to be associative, and the reduceLeft method cannot know that.
Still, there could be method reduce, whose caller would be responsible for ensuring that the operation is associative. Then reduce would order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.
There is a limitation with the various reduce methods however. The elements of the Seq can only be combined to a result of the same type: # has to be (A,A) => A. But one could have the more general problem of combining them into a B. One starts with a value b of type B, and combine it with every elements of the sequence. The operator # is (B,A) => B, and one computes (((b # a1) # a2) ... # an). foldLeft does that. foldRight does the same thing but starting with an. There, the # operation has no chance to be associative. When one writes b # a1 # a2, it must mean (b # a1) # a2, as (a1 # a2) would be ill-typed. So foldLeft and foldRight have to be sequential.
Suppose, however, that each A can be turned into a B; let's write it with !: a! is of type B. Suppose moreover that there is a + operation (B, B) => B, and that # is such that b # a is in fact b + a!. Rather than combining elements with #, one could first transform all of them to B with !, then combine them with +. That would be as.map(!).reduceLeft(+). And if + is associative, then that can be done with reduce, and not be sequential: as.map(!).reduce(+). There could be a hypothetical method as.associativeFold(b, !, +).
Aggregate is very close to that. It may be, however, that there is a more efficient way to implement b # a than b + a!. For instance, if type B is List[A], and b # a is a :: b, then a! will be a :: Nil, and b1 + b2 will be b2 ::: b1. a :: b is way better than (a :: Nil) ::: b. To benefit from associativity, but still use #, one first splits b + a1! + ... + an! into (b + a1! + ... + ap!) + (ap+1! + ... + an!), then goes back to using # with (b # a1 # ... # ap) + (ap+1! # ap+2 # ... # an). One still needs the ! on ap+1, because one must start with some B. And the + is still necessary too, appearing between the two parentheses. To do that, as.associativeFold(b, !, +) could be changed to as.optimizedAssociativeFold(b, !, #, +).
Back to +. + is associative, or equivalently, (B, +) is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e. they contain a neutral element z (for zero) in B, so that for each b, z + b = b + z = b. In that case, the ! operation that makes sense is likely to be a! = z # a. Moreover, as z is a neutral element, b # a1 # ... # an = (b + z) # a1 # ... # an, which is b + (z # a1 # ... # an). So it is always possible to start the aggregation with z. If b is wanted instead, you do b + result at the end. With all those hypotheses, we can do as.aggregate(z)(#, +). That is what aggregate does. # is the seqop argument (applied in a sequence z # a1 # a2 # ... # ap), and + is combop (applied to already partially combined results, as in (z # a1 # ... # ap) + (z # ap+1 # ... # an)).
To sum it up, as.aggregate(z)(seqop, combop) computes the same thing as as.foldLeft(z)(seqop) provided that
(B, combop, z) is a monoid
seqop(b,a) = combop(b, seqop(z,a))
The aggregate implementation may use the associativity of combop to group the computations as it likes (not swapping elements, however; + does not have to be commutative, and ::: is not). It may run them in parallel.
Finally, solving the initial problem using aggregate is left as an exercise to the reader. A hint: implement it using foldLeft, then find z and combop that will satisfy the conditions stated above.
The signature for a collection with elements of type A is:
def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
z is an object of type B acting as a neutral element. If you want to count something, you can use 0; if you want to build a list, start with an empty list, etc.
seqop is analogous to the function you pass to fold methods. It takes two arguments: the first has the same type as the neutral element you passed and represents the stuff already aggregated in previous iterations; the second is the next element of your collection. The result must also be of type B.
combop is a function combining two results into one.
In most collections, aggregate is implemented in TraversableOnce as:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
  foldLeft(z)(seqop)
Thus combop is ignored. However, it makes sense for parallel collections, because seqop will first be applied locally in parallel, and then combop is called to finish the aggregation.
So for your example, you can try with a fold first:
val seqOp =
  (map: Map[String, Set[String]], tuple: (String, String)) =>
    map + (tuple._1 -> (map.getOrElse(tuple._1, Set[String]()) + tuple._2))

list.foldLeft(Map[String, Set[String]]())(seqOp)
// returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Then you have to find a way of collapsing two multimaps:
val combOp = (map1: Map[String, Set[String]], map2: Map[String, Set[String]]) =>
  (map1.keySet ++ map2.keySet).foldLeft(Map[String, Set[String]]()) {
    (result, k) =>
      result + (k -> (map1.getOrElse(k, Set[String]()) ++ map2.getOrElse(k, Set[String]())))
  }
Now, you can use aggregate in parallel:
list.par.aggregate(Map[String, Set[String]]())(seqOp, combOp)
// returns: Map(one -> Set(i, 1), two -> Set(2, ii), four -> Set(iv))
Applying the method "par" to list, thus using the parallel collection(scala.collection.parallel.immutable.ParSeq) of the list to really take advantage of the multi core processors. Without "par", there won't be any performance gain since the aggregate is not done on the parallel collection.
aggregate is like foldLeft but may be executed in parallel.
As missingfactor says, the linear version of aggregate(z)(seqop, combop) is equivalent to foldLeft(z)(seqop). This is however impractical in the parallel case, where we would need to combine not only the next element with the previous result (as in a normal fold), but we want to split the iterable into sub-iterables on which we call aggregate and then need to combine those again. (In left-to-right order, but not associatively, as we might have combined the last parts before the first parts of the iterable.) This re-combining is in general non-trivial, and therefore one needs a method (S, S) => S to accomplish that.
The definition in ParIterableLike is:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
  executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
which indeed uses combop.
For reference, Aggregate is defined as:
protected[this] class Aggregate[S](z: S, seqop: (S, T) => S, combop: (S, S) => S,
                                   protected[this] val pit: IterableSplitter[T])
  extends Accessor[S, Aggregate[S]] {
  @volatile var result: S = null.asInstanceOf[S]
  def leaf(prevr: Option[S]) = result = pit.foldLeft(z)(seqop)
  protected[this] def newSubtask(p: IterableSplitter[T]) = new Aggregate(z, seqop, combop, p)
  override def merge(that: Aggregate[S]) = result = combop(result, that.result)
}
The important part is merge where combop is applied with two sub-results.
Here is a blog post on how aggregate enables performance on multi-core processors, with a benchmark:
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
Here is a video of the "Scala parallel collections" talk from Scala Days 2011:
http://days2011.scala-lang.org/node/138/272
The description of the video:
Scala Parallel Collections
Aleksandar Prokopec
Parallel programming abstractions become increasingly important as the number of processor cores grows. A high-level programming model enables the programmer to focus more on the program and less on low-level details such as synchronization and load-balancing. Scala parallel collections extend the programming model of the Scala collection framework, providing parallel operations on datasets.
The talk will describe the architecture of the parallel collection framework, explaining their implementation and design decisions. Concrete collection implementations such as parallel hash maps and parallel hash tries will be described. Finally, several example applications will be shown, demonstrating the programming model in practice.
The definition of aggregate in TraversableOnce source is:
def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
  foldLeft(z)(seqop)
which is no different than a simple foldLeft. combop doesn't seem to be used anywhere. I am myself confused as to what the purpose of this method is.
Just to clarify the explanations of those before me: in theory the idea is that aggregate should work like this (I have changed the names of the parameters to make them clearer):
Seq(1, 2, 3, 4).aggregate(0)(
  addToPrev = (prev, curr) => prev + curr,
  combineSums = (sumA, sumB) => sumA + sumB)
Should logically translate to
Seq(1, 2, 3, 4)
  .grouped(2) // split into groups of 2 members each
  .map(prevAndCurrList => prevAndCurrList(0) + prevAndCurrList(1))
  .foldLeft(0)((sumA, sumB) => sumA + sumB)
Because the aggregation and the mapping are separate, the original list could theoretically be split into different groups of different sizes and run in parallel, or even on different machines.
In practice, Scala's current implementation does not support this feature by default, but you can do this in your own code.
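To illustrate that last point, a hand-rolled split-then-combine sketch (illustrative only; the group size is arbitrary):
val groups = Seq(1, 2, 3, 4, 5, 6, 7, 8).grouped(2).toSeq
val partials = groups.par.map(_.foldLeft(0)(_ + _)) // seqop within each group, in parallel
val total = partials.reduce(_ + _)                  // combop across the groups
// total: Int = 36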