Scala Parallel Collections- How to return early? - scala

I have a list of possible input Values
val inputValues = List(1,2,3,4,5)
I have a really long to compute function that gives me a result
def reallyLongFunction( input: Int ) : Option[String] = { ..... }
Using scala parallel collections, I can easily do
inputValues.par.map( reallyLongFunction( _ ) )
To get what all the results are, in parallel. The problem is, I don't really want all the results, I only want the FIRST result. As soon as one of my input is a success, I want my output, and want to move on with my life. This did a lot of extra work.
So how do I get the best of both worlds? I want to
Get the first result that returns something from my long function
Stop all my other threads from useless work.
Edit -
I solved it like a dumb java programmer by having
#volatile var done = false;
Which is set and checked inside my reallyLongFunction. This works, but does not feel very scala. Would like a better way to do this....

(Updated: no, it doesn't work, doesn't do the map)
Would it work to do something like:
inputValues.par.find({ v => reallyLongFunction(v); true })
The implementation uses this:
protected[this] class Find[U >: T](pred: T => Boolean, protected[this] val pit: IterableSplitter[T]) extends Accessor[Option[U], Find[U]] {
#volatile var result: Option[U] = None
def leaf(prev: Option[Option[U]]) = { if (!pit.isAborted) result = pit.find(pred); if (result != None) pit.abort }
protected[this] def newSubtask(p: IterableSplitter[T]) = new Find(pred, p)
override def merge(that: Find[U]) = if (this.result == None) result = that.result
}
which looks pretty similar in spirit to your #volatile except you don't have to look at it ;-)

I took interpreted your question in the same way as huynhjl, but if you just want to search and discardNones, you could do something like this to avoid the need to repeat the computation when a suitable outcome is found:
class Computation[A,B](value: A, function: A => B) {
lazy val result = function(value)
}
def f(x: Int) = { // your function here
Thread.sleep(100 - x)
if (x > 5) Some(x * 10)
else None
}
val list = List.range(1, 20) map (i => new Computation(i, f))
val found = list.par find (_.result.isDefined)
//found is Option[Computation[Int,Option[Int]]]
val result = found map (_.result.get)
//result is Option[Int]
However find for parallel collections seems to do a lot of unnecessary work (see this question), so this might not work well, with current versions of Scala at least.
Volatile flags are used in the parallel collections (take a look at the source for find, exists, and forall), so I think your idea is a good one. It's actually better if you can include the flag in the function itself. It kills referential transparency on your function (i.e. for certain inputs your function now sometimes returns None rather than Some), but since you're discarding the stopped computations, this shouldn't matter.

If you're willing to use a non-core library, I think Futures would be a good match for this task. For instance:
Akka's Futures include Futures.firstCompletedOf
Twitter's Futures include Future.select
...both of which appear to enable the functionality you're looking for.

Related

Conditional chain of futures

I have a sequence of parameters. For each parameter I have to perform DB query, which may or may not return a result. Simply speaking, I need to stop after the first result is non-empty. Of course, I would like to avoid doing unnecessary calls. The caveat is - I need to have this operation(s) contained as a another Future - or any "most reactive" approach.
Speaking of code:
//that what I have
def dbQuery(p:Param): Future[Option[Result]] = {}
//my list of params
val input = Seq(p1,p2,p3)
//that what I need to implements
def getFirstNonEmpty(params:Seq[Param]): Future[Option[Result]]
I know I can possibly just wrap entire function in yet another Future and execute code sequentially (Await? Brrr...), but that not the cleanest solution.
Can I somehow create lazy initialized collection of futures, like
params.map ( p => FutureWhichWontStartUnlessAskedWhichWrapsOtherFuture { dbQuery(p) }).findFirst(!_.isEmpty())
I believe it's possible!
What do you think about something like this?
def getFirstNonEmpty(params: Seq[Param]): Future[Option[Result]] = {
params.foldLeft(Future.successful(Option.empty[Result])) { (accuFtrOpt, param) =>
accuFtrOpt.flatMap {
case None => dbQuery(param)
case result => Future.successful(result)
}
}
}
This might be overkill, but if you are open to using scalaz we can do this using OptionT and foldMap.
With OptionT we sort of combine Future and Option into one structure. We can get the first of two Futures with a non-empty result using OptionT.orElse.
import scalaz._, Scalaz._
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val someF: Future[Option[Int]] = Future.successful(Some(1))
val noneF: Future[Option[Int]] = Future.successful(None)
val first = OptionT(noneF) orElse OptionT(someF)
first.run // Future[Option[Int]] = Success(Some(1))
We could now get the first non-empty Future from a List with reduce from the standard library (this will however run all the Futures) :
List(noneF, noneF, someF).map(OptionT.apply).reduce(_ orElse _).run
But with a List (or other collection) we can't be sure that there is at least one element, so we need to use fold and pass a start value. Scalaz can do this work for us by using a Monoid. The Monoid[OptionT[Future, Int]] we will use will supply the start value and combine the Futures with the orElse used above.
type Param = Int
type Result = Int
type FutureO[x] = OptionT[Future, x]
def query(p: Param): Future[Option[Result]] =
Future.successful{ println(p); if (p > 2) Some(p) else None }
def getFirstNonEmpty(params: List[Param]): Future[Option[Result]] = {
implicit val monoid = PlusEmpty[FutureO].monoid[Result]
params.foldMap(p => OptionT(query(p))).run
}
val result = getFirstNonEmpty(List(1,2,3,4))
// prints 1, 2, 3
result.foreach(println) // Some(3)
This is an old question, but if someone comes looking for an answer, here is my take. I solved it for a use case that required me to loop through a limited number of futures sequentially and stop when the first of them returned a result.
I did not need a library for my use-case, a light-weight combination of recursion and pattern matching was sufficient. Although the question here does not have the same problem as a sequence of futures, looping through a sequence of parameters would be similar.
Here would be the pseudo-code based on recursion.
I have not compiled this, fix the types being matched/returned.
def getFirstNonEmpty(params: Seq[Param]): Future[Option[Result]] = {
if (params.isEmpty) {
Future.successful(None)
} else {
val head = params.head
dbQuery(head) match {
case Some(v) => Future.successful(Some(v))
case None => getFirstNonEmpty(params.tail)
}
}
}

Does scala have a lazy evaluating wrapper?

I want to return a wrapper/holder for a result that I want to compute only once and only if the result is actually used. Something like:
def getAnswer(question: Question): Lazy[Answer] = ???
println(getAnswer(q).value)
This should be pretty easy to implement using lazy val:
class Lazy[T](f: () => T) {
private lazy val _result = Try(f())
def value: T = _result.get
}
But I'm wondering if there's already something like this baked into the standard API.
A quick search pointed at Streams and DelayedLazyVal but neither is quite what I'm looking for.
Streams do memoize the stream elements, but it seems like the first element is computed at construction:
def compute(): Int = { println("computing"); 1 }
val s1 = compute() #:: Stream.empty
// computing is printed here, before doing s1.take(1)
In a similar vein, DelayedLazyVal starts computing upon construction, even requires an execution context:
val dlv = new DelayedLazyVal(() => 1, { println("started") })
// immediately prints out "started"
There's scalaz.Need which I think you'd be able to use for this.

Scala parallelization method

Having a look at the forall method implementation, how could this method be parallel?
def forall(p: A => Boolean): Boolean = {
var result = true
breakable {
for (x <- this)
if (!p(x)) { result = false; break }
}
result
}
As I understand for better parallelization, avoid using var's and prefer val's. How is this now supposed to work if I use forall?
You are looking at a non-parallel version of forall.
A parallel version looks like this (provided by a parallel collection, in this case ParIterableLike):
def forall(pred: T => Boolean): Boolean = {
tasksupport.executeAndWaitResult
(new Forall(pred, splitter assign new DefaultSignalling with VolatileAbort))
}
To get a parallel collection, insert a .par, for example:
List.range(1, 10).par.forall(n => n % 2 == 0)
If you use an IDE like IntelliJ, just "zoom" into that forall, and you'll see the code above. Otherwise here is the source of ParIterableLike (thanks #Kigyo for sharing the URL).

Efficient groupwise aggregation on Scala collections

I often need to do something like
coll.groupBy(f(_)).mapValues(_.foldLeft(x)(g(_,_)))
What is the best way to achieve the same effect, but avoid explicitly constructing the intermediate collections with groupBy?
You could fold the initial collection over a map holding your intermediate results:
def groupFold[A,B,X](as: Iterable[A], f: A => B, init: X, g: (X,A) => X): Map[B,X] =
as.foldLeft(Map[B,X]().withDefaultValue(init)){
case (m,a) => {
val key = f(a)
m.updated(key, g(m(key),a))
}
}
You said collection and I wrote Iterable, but you have to think whether order matters in the fold in your question.
If you want efficient code, you will probably use a mutable map as in Rex' answer.
You can't really do it as a one-liner, so you should be sure you need it before writing something more elaborate like this (written from a somewhat performance-minded view since you asked for "efficient"):
final case class Var[A](var value: A) { }
def multifold[A,B,C](xs: Traversable[A])(f: A => B)(zero: C)(g: (C,A) => C) = {
import scala.collection.JavaConverters._
val m = new java.util.HashMap[B, Var[C]]
xs.foreach{ x =>
val v = {
val fx = f(x)
val op = m.get(fx)
if (op != null) op
else { val nv = Var(zero); m.put(fx, nv); nv }
}
v.value = g(v.value, x)
}
m.asScala.mapValues(_.value)
}
(Depending on your use case you may wish to pack into an immutable map instead in the last step.) Here's an example of it in action:
scala> multifold(List("salmon","herring","haddock"))(_(0))(0)(_ + _.length)
res1: scala.collection.mutable.HashMap[Char,Int] = Map(h -> 14, s -> 6)
Now, you might notice something weird here: I'm using a Java HashMap. This is because Java's HashMaps are 2-3x faster than Scala's. (You can write the equivalent thing with a Scala HashMap, but it doesn't actually make things any faster than your original.) Consequently, this operation is 2-3x faster than what you posted. But unless you're under severe memory pressure, creating the transient collections doesn't actually hurt you much.

Repeating function call until we'll get non-empty Option result in Scala

A very newbie question in Scala - how do I do "repeat function until something is returned meets my criteria" in Scala?
Given that I have a function that I'd like to call until it returns the result, for example, defined like that:
def tryToGetResult: Option[MysteriousResult]
I've come up with this solution, but I really feel that it is ugly:
var res: Option[MysteriousResult] = None
do {
res = tryToGetResult
} while (res.isEmpty)
doSomethingWith(res.get)
or, equivalently ugly:
var res: Option[MysteriousResult] = None
while (res.isEmpty) {
res = tryToGetResult
}
doSomethingWith(res.get)
I really feel like there is a solution without var and without so much hassle around manual checking whether Option is empty or not.
For comparison, Java alternative that I see seems to be much cleaner here:
MysteriousResult tryToGetResult(); // returns null if no result yet
MysteriousResult res;
while ((res = tryToGetResult()) == null);
doSomethingWith(res);
To add insult to injury, if we don't need to doSomethingWith(res) and we just need to return it from this function, Scala vs Java looks like that:
Scala
def getResult: MysteriousResult = {
var res: Option[MysteriousResult] = None
do {
res = tryToGetResult
} while (res.isEmpty)
res.get
}
Java
MysteriousResult getResult() {
while (true) {
MysteriousResult res = tryToGetResult();
if (res != null) return res;
}
}
You can use Stream's continually method to do precisely this:
val res = Stream.continually(tryToGetResult).flatMap(_.toStream).head
Or (possibly more clearly):
val res = Stream.continually(tryToGetResult).dropWhile(!_.isDefined).head
One advantage of this approach over explicit recursion (besides the concision) is that it's much easier to tinker with. Say for example that we decided that we only wanted to try to get the result a thousand times. If a value turns up before then, we want it wrapped in a Some, and if not we want a None. We just add a few characters to our code above:
Stream.continually(tryToGetResult).take(1000).flatMap(_.toStream).headOption
And we have what we want. (Note that the Stream is lazy, so even though the take(1000) is there, if a value turns up after three calls to tryToGetResult, it will only be called three times.)
Performing side effects like this make me die a little inside, but how about this?
scala> import scala.annotation.tailrec
import scala.annotation.tailrec
scala> #tailrec
| def lookupUntilDefined[A](f: => Option[A]): A = f match {
| case Some(a) => a
| case None => lookupUntilDefined(f)
| }
lookupUntilDefined: [A](f: => Option[A])A
Then call it like this
scala> def tryToGetResult(): Option[Int] = Some(10)
tryToGetResult: ()Option[Int]
scala> lookupUntilDefined(tryToGetResult())
res0: Int = 10
You may want to give lookupUntilDefined an additional parameter so it can stop eventually in case f is never defined.