Parallel Aggregates in Scala using associative operators - scala

I want to aggregate a list of values in Scala. Here are a few considerations:
The aggregate function [1] is associative as well as commutative; examples are plus and multiply.
The function is applied to the list in parallel so as to use all the CPU cores.
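Associativity matters because a parallel reduction groups the elements differently from a plain left-to-right fold. The sketch that follows compares the two groupings of five elements: a left fold and the pairwise grouping a bottom-up reduction produces. The object and method names are illustrative only.

```scala
object AssociativityCheck {
  // ((((1 f 2) f 3) f 4) f 5): a plain left-to-right fold
  def leftToRight(f: (Int, Int) => Int): Int = f(f(f(f(1, 2), 3), 4), 5)

  // (((1 f 2) f (3 f 4)) f 5): the grouping a pairwise, bottom-up reduction uses
  def pairwise(f: (Int, Int) => Int): Int = f(f(f(1, 2), f(3, 4)), 5)
}

// Associative operators agree regardless of grouping:
//   leftToRight(_ + _) == 15, pairwise(_ + _) == 15
//   leftToRight(_ * _) == 120, pairwise(_ * _) == 120
// Subtraction is not associative, so the groupings disagree:
//   leftToRight(_ - _) == -13, pairwise(_ - _) == -5
```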
Here is an implementation:
package com.example.reactive
import scala.concurrent.Future
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
object AggregateParallel {
private def pm[T](l: List[Future[T]])(zero: T)(fn: (T, T) => T): Future[T] = {
val l1 = l.grouped(2)
val l2 = l1.map { sl =>
sl match {
case x :: Nil => x
case x :: y :: Nil =>
for (a <- x; b <- y) yield fn(a, b)
case _ => Future(zero)
}
}.toList
l2 match {
case x :: Nil => x
case x :: xs => pm(l2)(zero)(fn)
case Nil => Future(zero)
}
}
def parallelAggregate[T](l: List[T])(zero: T)(fn: (T, T) => T): T = {
val n = pm(l.map(Future(_)))(zero)(fn)
// Await.result already returns the computed value (or rethrows the failure)
Await.result(n, 1000.millis)
}
def main(args: Array[String]): Unit = {
// multiply empty list: zero value is 1
println(parallelAggregate(List[Int]())(1)((x, y) => x * y))
// multiply a list: zero value is 1
println(parallelAggregate(List(1, 2, 3, 4, 5))(1)((x, y) => x * y))
// sum a list: zero value is 0
println(parallelAggregate(List(1, 2, 3, 4, 5))(0)((x, y) => x + y))
// sum a list: zero value is 0
val bigList1 = List(1, 2, 3, 4, 5).map(BigInt(_))
println(parallelAggregate(bigList1)(0)((x, y) => x + y))
// sum a list of BigInt: zero value is 0
val bigList2 = (1 to 100).map(BigInt(_)).toList
println(parallelAggregate(bigList2)(0)((x, y) => x + y))
// multiply a list of BigInt: zero value is 1
val bigList3 = (1 to 100).map(BigInt(_)).toList
println(parallelAggregate(bigList3)(1)((x, y) => x * y))
}
}
OUTPUT:
1
120
15
15
5050
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
How else can I achieve the same objective or improve this code in Scala?
EDIT1:
I have implemented a bottom-up aggregate. I think it is quite close to the aggregate method in Scala (below); the difference is that I only split into sublists of two elements:
Scala implementation:
def aggregate[S](z: S)(seqop: (S, T) => S, combop: (S, S) => S): S = {
executeAndWaitResult(new Aggregate(z, seqop, combop, splitter))
}
Given this implementation, I assume that the aggregation happens in parallel like this:
List(1,2,3,4,5,6)
-> split parallel -> List(List(1,2), List(3,4), List(5,6) )
-> execute in parallel -> List( 3, 7, 11 )
-> split parallel -> List(List(3,7), List(11) )
-> execute in parallel -> List( 10, 11)
-> Result is 21
Is it correct to assume that Scala's aggregate also performs bottom-up aggregation in parallel?
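The level-by-level scheme in the diagram can be reproduced with a small sequential sketch that records every intermediate list of a pairwise reduction. BottomUpLevels.levels is a hypothetical helper, not part of the library, and it only illustrates the grouping, not how Scala's own aggregate schedules work.

```scala
object BottomUpLevels {
  // Returns every level of a pairwise (bottom-up) reduction:
  // split into pairs, combine each pair, repeat until one value is left.
  def levels[T](xs: List[T])(fn: (T, T) => T): List[List[T]] =
    if (xs.size <= 1) List(xs)
    else xs :: levels(xs.grouped(2).map(_.reduce(fn)).toList)(fn)
}

// BottomUpLevels.levels(List(1, 2, 3, 4, 5, 6))(_ + _)
//   == List(List(1, 2, 3, 4, 5, 6), List(3, 7, 11), List(10, 11), List(21))
```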
[1] http://www.mathsisfun.com/associative-commutative-distributive.html

Scala's parallel collections already have an aggregate method that does just what you're asking for:
http://markusjais.com/scalas-parallel-collections-and-the-aggregate-method/
It works like foldLeft but takes an extra parameter:
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
When called on a parallel collection, aggregate splits the collection into N parts, runs foldLeft on each part in parallel, and uses combop to combine all the partial results.
But when called on a non-parallel collection, aggregate just works like foldLeft and ignores combop.
To have consistent results you need associative and commutative operators because you don't control how the list will be split at all.
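A sequential model of that split/fold/combine behaviour may make the roles of seqop, combop and z concrete. It is only a sketch: aggregateModel is a hypothetical helper, the chunk boundaries are picked by the caller here whereas the real parallel collections choose their own, and it runs sequentially so it needs no extra dependency (on Scala 2.13 and later, .par comes from the separate scala-parallel-collections module). It also shows why z should be a neutral element: it is used once per chunk.

```scala
object AggregateModel {
  // Sequential model of parallel aggregate (assumption: we choose the chunk
  // boundaries; the real library decides its own split points).
  def aggregateModel[A, B](xs: Seq[A], chunkSize: Int)(z: => B)(
      seqop: (B, A) => B, combop: (B, B) => B): B =
    xs.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(z)(seqop)) // every chunk starts again from z
      .reduceOption(combop)                   // merge the partial results
      .getOrElse(z)                           // empty input: just z

  // seqop adds a word's length to an Int; combop adds two partial Ints.
  val totalLength: Int =
    aggregateModel(Seq("a", "bb", "ccc", "dddd"), 2)(0)(_ + _.length, _ + _)

  // A zero that is not neutral is counted once per chunk: (1+1+2) + (1+3+4) = 12,
  // whereas Seq(1, 2, 3, 4).foldLeft(1)(_ + _) gives 11.
  val nonNeutralZero: Int =
    aggregateModel(Seq(1, 2, 3, 4), 2)(1)(_ + _, _ + _)
}
```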
A short example:
List(1, 2, 3, 4, 5).par.aggregate(1)(_ * _, _ * _)
res0: Int = 120
Answer to Edit1 (improved according to comment):
I don't think it's the right approach: for an N-item list you'll create N Futures, which adds a big scheduling overhead. Unless seqop is really expensive, I would avoid creating a Future each time you call it.
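A sketch of that advice, assuming that splitting by the number of available cores is acceptable: instead of one Future per element, create roughly one Future per core, fold each chunk sequentially, and combine the partial results. chunkedAggregate is a hypothetical helper, and both the chunk-size heuristic and the 10-second timeout are arbitrary choices.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ChunkedAggregate {
  // One Future per chunk instead of one per element.
  def chunkedAggregate[T](xs: List[T])(zero: T)(fn: (T, T) => T): T =
    if (xs.isEmpty) zero
    else {
      val cores = Runtime.getRuntime.availableProcessors
      // Ceiling division so that there are at most `cores` chunks.
      val chunkSize = math.max(1, (xs.size + cores - 1) / cores)
      // Each chunk is folded sequentially inside its own Future.
      val partials: List[Future[T]] =
        xs.grouped(chunkSize).map(chunk => Future(chunk.foldLeft(zero)(fn))).toList
      // Future.sequence keeps the chunk order, so the partials are combined in order.
      val combined: Future[T] = Future.sequence(partials).map(_.foldLeft(zero)(fn))
      Await.result(combined, 10.seconds)
    }
}
```

Because chunks are folded left to right and combined in their original order, this only needs fn to be associative with zero as its identity element.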

Related

SortedSet fold type mismatch

I have this code:
def distinct(seq: Seq[Int]): Seq[Int] =
seq.fold(SortedSet[Int]()) ((acc, i) => acc + i)
I want to iterate over seq, delete duplicates (keeping the first occurrence) and keep the order of the numbers. My idea was to use a SortedSet as the accumulator.
But I am getting:
Type mismatch:
Required: String
Found: Any
How can I solve this? (I also don't know how to convert the SortedSet to a Seq at the end, since I want distinct to return a Seq.)
P.S. This should work without using the standard Seq distinct method.
You shouldn't use fold if you are trying to accumulate something of a different type than the container's elements (in your case, SortedSet != Int). Look at the signature of fold:
def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
It takes an accumulator of type A1 and a combiner function (A1, A1) => A1 that combines two A1 elements.
In your case it is better to use foldLeft, which takes an accumulator of a different type than the container's elements:
def foldLeft[B](z: B)(op: (B, A) => B): B
It accumulates a B value, starting from the seed z and using a combiner from (B, A) to B.
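With foldLeft, the SortedSet idea from the question does compile, because the accumulator type no longer has to be a supertype of Int. A minimal sketch (distinctSorted is a hypothetical name) shows the catch: a SortedSet orders its elements by value, so duplicates are removed but the original order is lost, which does not satisfy the keep-the-order requirement.

```scala
import scala.collection.immutable.SortedSet

object SortedSetFold {
  // foldLeft allows the accumulator (SortedSet[Int]) to differ from the element type (Int).
  def distinctSorted(seq: Seq[Int]): Seq[Int] =
    seq.foldLeft(SortedSet.empty[Int])((acc, i) => acc + i).toSeq
}

// SortedSetFold.distinctSorted(Seq(7, 2, 4, 2, 3, 0)) yields 0, 2, 3, 4, 7:
// duplicates are gone, but the elements are now sorted by value.
```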
In your case I would use LinkedHashSet: it keeps the insertion order of the elements and removes duplicates:
import scala.collection.mutable
def distinct(seq: Seq[Int]): Seq[Int] = {
// += adds to the set in place; + on a mutable set would copy it on every step
seq.foldLeft(mutable.LinkedHashSet.empty[Int])(_ += _).toSeq
}
distinct(Seq(7, 2, 4, 2, 3, 0)) // ArrayBuffer(7, 2, 4, 3, 0)
distinct(Seq(0, 0, 0, 0)) // ArrayBuffer(0)
distinct(Seq(1, 5, 2, 7)) // ArrayBuffer(1, 5, 2, 7)
After folding, just use toSeq (the outputs above are from Scala 2.12; newer versions may return a different Seq type with the same elements).
Be careful: the lambda _ += _ is just syntactic sugar for the combiner:
(linkedSet, nextElement) => linkedSet += nextElement
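If a mutable set is undesirable, a purely immutable variant is also possible: carry the output built so far (a Vector) together with a Set of the elements already seen. This is a sketch, not the answerer's code, and OrderedDistinct is an illustrative name.

```scala
object OrderedDistinct {
  // The accumulator is a pair: (output in original order, elements seen so far).
  def distinct(seq: Seq[Int]): Seq[Int] =
    seq.foldLeft((Vector.empty[Int], Set.empty[Int])) {
      case ((out, seen), i) =>
        if (seen(i)) (out, seen)      // duplicate: skip it
        else (out :+ i, seen + i)     // first occurrence: keep it
    }._1
}
```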
I would just call distinct on your Seq. You can see in the source code of SeqLike that distinct simply traverses the Seq and skips elements it has already seen:
def distinct: Repr = {
val b = newBuilder
val seen = mutable.HashSet[A]()
for (x <- this) {
if (!seen(x)) {
b += x
seen += x
}
}
b.result
}

Scala Split Seq of Int from right when cumulative results meet the condition

I have a sequence of positive integers and need to split it from the right at the element where the cumulative sum (counted from the right end) is still less than or equal to a threshold. For example,
val seq = Seq(9,8,7,6,5,4,3,2,1)
The threshold is 10, so the result is
Seq(9,8,7,6,5) and Seq(4,3,2,1)
I tried dropWhile and scanLeft after reverse; however, they are either quadratic, or linear but complicated. Our sequence may be very long, but normally the threshold is small and very few elements on the right side will meet the condition, so I am wondering if there is a better, linear way to do it.
This stops folding as soon as the threshold would be exceeded, although foldRight on a List still reverses the whole list first. Unfortunately, it uses return to break out of the fold.
val seq = Seq(9,8,7,6,5,4,3,2,1)
val threshold = 10
def processList(): (Seq[Int], Int) = {
seq.foldRight((Seq[Int](), 0)) {
case (elem, (acc, total)) =>
if (total + elem <= threshold) {
(elem +: acc, total + elem)
} else {
return (acc, total)
}
}
}
processList()
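If the input can be an IndexedSeq such as Vector (an assumption; the question only says Seq), you can walk backwards by index, stop at the first element that would push the sum over the threshold, and then call splitAt. Only the elements that end up in the right-hand part, plus one, are visited, and no return is needed. SplitFromRight.splitRight is a hypothetical helper.

```scala
object SplitFromRight {
  // Assumes an IndexedSeq, so xs(i) is cheap.
  def splitRight(xs: IndexedSeq[Int], threshold: Int): (IndexedSeq[Int], IndexedSeq[Int]) = {
    // Walks leftwards from index i while the running total stays within the threshold;
    // returns the index of the first element of the right-hand part.
    @annotation.tailrec
    def loop(i: Int, total: Int): Int =
      if (i >= 0 && total + xs(i) <= threshold) loop(i - 1, total + xs(i))
      else i + 1
    xs.splitAt(loop(xs.length - 1, 0))
  }
}

// SplitFromRight.splitRight(Vector(9, 8, 7, 6, 5, 4, 3, 2, 1), 10)
//   == (Vector(9, 8, 7, 6, 5), Vector(4, 3, 2, 1))
```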
Looks like there's not a great way to do this with built-in methods, but you can implement it yourself:
def splitRightCumulative[A, B](xs: Seq[A])(start: B)(f: (B, A) => B, cond: B => Boolean): (Seq[A], Seq[A]) = {
@annotation.tailrec
def _loop(current: B, xs: Seq[A], acc: Seq[A]): (Seq[A], Seq[A]) = {
if (xs.isEmpty) {
(Seq.empty, acc) // every element satisfied the condition; avoids xs.head on an empty Seq
} else {
val next = f(current, xs.head)
if (cond(next)) {
_loop(next, xs.tail, xs.head +: acc)
} else {
(xs.reverse, acc)
}
}
}
_loop(start, xs.reverse, Seq.empty)
}
val xs = List(9, 8, 7, 6, 5, 4, 3, 2, 1)
val (left, right) = splitRightCumulative(xs)(0)(_ + _, _ <= 10)
The second type parameter (B) might not be necessary if you're always accumulating the same type as what's in your collection.

How to elegantly extract range of list based on specific criteria?

I want to extract range of elements from a list, meeting the following requirements:
The first element of the range has to be the element preceding the first element that matches a specific condition.
The last element of the range has to be the element following the last element that matches the condition.
Example: For list (1,1,1,10,2,10,1,1,1) and condition x >= 10 I want to get (1,10,2,10,1)
This is very simple to program imperatively, but I am just wondering if there is some smart Scala-functional way to achieve it. Is it?
Sticking to the Scala standard library, I would solve this using recursion:
def f(_xs: List[Int])(cond: Int => Boolean): List[Int] = {
def inner(xs: List[Int], res: List[Int]): List[Int] = xs match {
case Nil => Nil
case x :: y :: tail if cond(y) && res.isEmpty => inner(tail, res ++ (x :: y :: Nil))
case x :: y :: tail if cond(x) && res.nonEmpty => res ++ (x :: y :: Nil)
case x :: tail if res.nonEmpty => inner(tail, res :+ x)
case x :: tail => inner(tail, res)
}
inner(_xs, Nil)
}
scala> f(List(1,1,1,10,2,10,1,1,1))(_ >= 10)
res3: List[Int] = List(1, 10, 2, 10, 1)
scala> f(List(2,10,2,10))(_ >= 10)
res4: List[Int] = List()
scala> f(List(2,10,2,10,1))(_ >= 10)
res5: List[Int] = List(2, 10, 2, 10, 1)
Maybe there is something I did not think of in this solution, or I misunderstood something, but I think you will get the basic idea.
Good functional algorithm design practice is all about breaking complex problems into simpler ones.
The principle is called Divide and Conquer.
It's easy to extract two simpler subproblems from the original problem:
Get a list of all elements from the first matching element onwards, preceded by the element before it.
Get a list of all elements up to and including the last matching element, followed by the element after it.
The named problems are simple enough for the appropriate functions to be implemented, so no subdivision is required.
Here's the implementation of the first function:
def afterWithPredecessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
case a :: tail if test( a ) => Nil // since there is no predecessor
case a :: b :: tail if test( b ) => a :: b :: tail
case a :: tail => afterWithPredecessor( tail )( test )
}
Since the second problem can be seen as a direct inverse of the first one, it can be easily implemented by reversing the input and output:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= afterWithPredecessor( elements.reverse )( test ).reverse
But here's an optimized version of this:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
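case a :: b :: tail if test( a ) =>
// b may itself match, so keep searching from b; if nothing later
// matches, b is still needed as the successor of a
a :: ( beforeWithSuccessor( b :: tail )( test ) match {
case Nil => b :: Nil
case r => r
} )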
case a :: tail =>
beforeWithSuccessor( tail )( test ) match {
case Nil => Nil
case r => a :: r
}
}
Finally, composing the above functions together to produce the function solving your problem becomes quite trivial:
def range[ A ]( elements : List[ A ] )( test : A => Boolean ) : List[ A ]
= beforeWithSuccessor( afterWithPredecessor( elements )( test ) )( test )
Tests:
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 10 )
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 1 )
res1: List[Int] = List()
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ == 2 )
res2: List[Int] = List(10, 2, 10)
The second test returns an empty list since the outermost elements satisfying the predicate have no predecessors (or successors).
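def range[T](elements: List[T], condition: T => Boolean): List[T] = {
val first = elements.indexWhere(condition)
val last = elements.lastIndexWhere(condition)
// indexWhere returns -1 when nothing matches; slice(-2, 1) would then return the head
if (first < 0) Nil
else elements.slice(first - 1, last + 2)
}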
scala> range[Int](List(1,1,1,10,2,10,1,1,1), _ >= 10)
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range[Int](List(2,10,2,10), _ >= 10)
res1: List[Int] = List(2, 10, 2, 10)
scala> range[Int](List(), _ >= 10)
res2: List[Int] = List()
Zip and map to the rescue
val l = List(1, 1, 1, 10, 2, 1, 1, 1)
def test (i: Int) = i >= 10
((l.head :: l) zip (l.tail :+ l.last)) zip l filter {
case ((a, b), c) => (test (a) || test (b) || test (c) )
} map { case ((a, b), c ) => c }
That should work. I only have my smartphone and am miles from anywhere I could test this, so apologies for any typos or minor syntax errors.
Edit: works now. I hope it's obvious that my solution shuffles the list to the right and to the left to create two new lists. When these are zipped together and zipped again with the original list, the result is a list of tuples, each containing the original element and a tuple of its neighbours. This is then trivial to filter and map back to a simple list.
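To make the shifting concrete, here is a sketch that builds the same two shifted lists for the example input and pairs them with the original. NeighbourTriples and its value names are illustrative only. Each entry has the shape ((left neighbour, right neighbour), element), and at the two ends the element itself stands in for the missing neighbour.

```scala
object NeighbourTriples {
  val l = List(1, 1, 1, 10, 2, 1, 1, 1)
  def test(i: Int): Boolean = i >= 10

  // Shifted right: position i holds l(i - 1) (the head is repeated at position 0).
  val lefts = l.head :: l
  // Shifted left: position i holds l(i + 1) (the last element is repeated at the end).
  val rights = l.tail :+ l.last
  // zip truncates to the shortest list, so there is one triple per original element.
  val triples = (lefts zip rights) zip l

  // Keep an element if it or either neighbour passes the test.
  val kept = triples.collect { case ((a, b), c) if test(a) || test(b) || test(c) => c }
}
```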
Making this into a more general function (and using collect rather than filter -> map)...
def filterWithNeighbours[E](l: List[E])(p: E => Boolean) = l match {
case Nil => Nil
case li if li.size < 3 => if (l exists p) l else Nil
case _ => ((l.head :: l) zip (l.tail :+ l.last)) zip l collect {
case ((a, b), c) if (p (a) || p (b) || p (c) ) => c
}
}
This is less efficient than the recursive solution but makes the test much simpler and clearer. It can be difficult to match the right sequence of patterns in a recursive solution, as the patterns often express the shape of the chosen implementation rather than the original data. With the simple functional solution, each element is clearly and simply being compared to its neighbours.

Calculating differences of subsequent elements of a sequence in scala

I would like to do almost exactly this in Scala. Is there an elegant way?
Specifically, I just want the difference of adjacent elements in a sequence. For example
input = 1,2,6,9
output = 1,4,3
How about this?
scala> List(1, 2, 6, 9).sliding(2).map { case Seq(x, y, _*) => y - x }.toList
res0: List[Int] = List(1, 4, 3)
Here is one that uses recursion and works best on Lists
def differences(l:List[Int]) : List[Int] = l match {
case a :: (rest @ b :: _) => (b - a) :: differences(rest)
case _ => Nil
}
And here is one that should be pretty fast on Vector or Array:
def differences(a:IndexedSeq[Int]) : IndexedSeq[Int] =
a.indices.tail.map(i => a(i) - a(i-1))
Of course there is always this:
def differences(a:Seq[Int]) : Seq[Int] =
a.tail.zip(a).map { case (x,y) => x - y }
Note that only the recursive version handles empty lists without an exception.
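For a non-recursive version that also tolerates empty (and single-element) input, one small change is to pair the sequence with a.drop(1) instead of a.tail, since drop(1) returns an empty collection rather than throwing. A sketch, with SafeDifferences as an illustrative name:

```scala
object SafeDifferences {
  // Pairs each element with its successor; drop(1) is safe on empty input.
  def differences(a: Seq[Int]): Seq[Int] =
    a.zip(a.drop(1)).map { case (x, y) => y - x }
}

// SafeDifferences.differences(Seq(1, 2, 6, 9)) yields 1, 4, 3
```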

Scala - can a lambda parameter match a tuple?

Say I have a list like
val l = List((1, "blue"), (5, "red"), (2, "green"))
and I want to filter one of them out. I can do something like
val m = l.filter(item => {
val (n, s) = item // "unpack" the tuple here
n != 2
})
Is there any way i can "unpack" the tuple as the parameter to the lambda directly, instead of having this intermediate item variable?
Something like the following would be ideal, but Eclipse tells me "wrong number of parameters; expected=1":
val m = l.filter( (n, s) => n != 2 )
Any help would be appreciated. I'm using Scala 2.9.0.1.
This is about the closest you can get:
val m = l.filter { case (n, s) => n != 2 }
It's basically pattern-matching syntax inside an anonymous function (the same { case ... } syntax is also used to write PartialFunction literals). There are also the tupled methods on the Function object and the FunctionN traits, but they are just wrappers around this pattern-matching expression.
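For comparison, the Function.tupled helper looks like this in use; it converts a two-argument function into one that takes a Tuple2. The parameter types are annotated to be safe with type inference, and TupledFilter is an illustrative name.

```scala
object TupledFilter {
  val l = List((1, "blue"), (5, "red"), (2, "green"))
  // Function.tupled turns (Int, String) => Boolean into ((Int, String)) => Boolean.
  val m = l.filter(Function.tupled((n: Int, s: String) => n != 2))
}
```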
Kipton has a good answer, but you can actually make this shorter:
val l = List((1, "blue"), (5, "red"), (2, "green"))
val m = l.filter(_._1 != 2)
There are a bunch of options:
for (x <- l; (n,s) = x if (n != 2)) yield x
l.collect{ case x @ (n,s) if (n != 2) => x }
l.filter{ case (n,s) => n != 2 }
l.unzip.zipped.map((n,s) => n != 2).zip // Complains that zip is deprecated
val m = l.filter( (n, s) => n != 2 )
... is a type mismatch because that lambda defines a Function2[Int,String,Boolean] with two parameters instead of a Function1[(Int,String),Boolean] with one Tuple2[Int,String] as its parameter.
You can convert between them like this (the parameter types must be written out, because they cannot be inferred before .tupled is applied):
val m = l.filter( ((n: Int, s: String) => n != 2).tupled )
I've pondered the same, and came to your question today.
I'm not very fond of the partial function approaches (anything having case) since they imply that there could be more entry points for the logic flow. At least to me, they tend to blur the intention of the code. On the other hand, I really do want to go straight to the tuple fields, like you.
Here's a solution I drafted today. It seems to work, but I haven't tried it in production, yet.
object unTuple {
def apply[A, B, X](f: (A, B) => X): (Tuple2[A, B] => X) = {
(t: Tuple2[A, B]) => f(t._1, t._2)
}
def apply[A, B, C, X](f: (A, B, C) => X): (Tuple3[A, B, C] => X) = {
(t: Tuple3[A, B, C]) => f(t._1, t._2, t._3)
}
//...
}
val list = List( ("a",1), ("b",2) )
val list2 = List( ("a",1,true), ("b",2,false) )
list foreach unTuple( (k: String, v: Int) =>
println(k, v)
)
list2 foreach unTuple( (k: String, v: Int, b: Boolean) =>
println(k, v, b)
)
Output:
(a,1)
(b,2)
(a,1,true)
(b,2,false)
Maybe this turns out to be useful. The unTuple object should naturally live in some utility namespace.
Addendum:
Applied to your case:
val m = l.filter( unTuple( (n:Int,color:String) =>
n != 2
))