Scala - Wait for all futures to complete within a time period

I have a List[Future[String]] and would like to wait a constant period of time in order to collect the successful computations and rerun the operations for the futures that do not complete within that period.
In pseudocode it would look like this:
val inputData: List[String] = getInputData()
val futures : List[Future[String]] = inputData.map(toLongRunningIOOperation)
val (completedFutures, unfinishedFutures) = Await.ready(futures, 2 seconds)
val rerunOperations : List[Future[String]] = unfinishedFutures.map(rerun)
Such a solution could be useful if you need to make several calls to external services whose usual latency is low (p99 < 60 ms) but which occasionally take more than 5 seconds to respond (because of their current state/load). In such a situation it is better to rerun those requests (e.g. against another instance of the service).

As an example, you can use the Future.firstCompletedOf function to build a timed future:
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Lift a Future into one that never fails: Some on success, None on failure.
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] =
  f.map(Some(_)).recover[Option[T]] { case _ => None }

val inputData: List[String] = List("a", "b", "c", "d")

// Race each slow operation against a future that fails after 2 seconds.
val completedFutures: List[Future[String]] = inputData.map {
  case a @ ("a" | "c") =>
    Future.firstCompletedOf(Seq(
      Future { Thread.sleep(3000); a },                                     // slow operation
      Future { Thread.sleep(2000); throw new RuntimeException("timeout") }  // wins the race
    ))
  case a => Future(a)
}
def foo(s: String): Future[String] = ??? // rerun the operation for s

// Collect the inputs whose futures did not succeed in time and rerun them.
val unfinished: Future[Set[String]] =
  Future.sequence(completedFutures.map(futureToFutureOption)).map(done => inputData.toSet -- done.flatten.toSet)
val rerunOperations: Future[Set[Future[String]]] = unfinished.map(_.map(foo))
In this example rerunOperations has a different type than in your example, but I think it will work for you.
Also keep in mind that if you perform a call to an external system inside the future and that future does not finish in time, this approach will not stop the unfinished future from executing: the original call to the external system will still be in flight while you make another one.
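For reuse, the same race can be wrapped in a small helper. This is only a sketch under the assumptions of the answer above (a sleeping future as the timeout signal and an implicit ExecutionContext in scope); in production code you would normally schedule the timeout with a scheduler instead of parking a thread.

import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration.FiniteDuration
import java.util.concurrent.TimeoutException

// Sketch: race `f` against a future that fails after `timeout`.
// Note: the sleeping future occupies a thread; a real implementation would use a scheduler.
def withTimeout[T](f: Future[T], timeout: FiniteDuration)
                  (implicit ec: ExecutionContext): Future[T] =
  Future.firstCompletedOf(Seq(
    f,
    Future { Thread.sleep(timeout.toMillis); throw new TimeoutException(s"timed out after $timeout") }
  ))

With this helper, each element of the original List[Future[String]] can be wrapped with withTimeout(_, 2.seconds) and then lifted with futureToFutureOption exactly as above.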

Related

How to discard other Futures if the critical Future is finished in Scala?

Let's say I have three remote calls used to construct my page. One of them (X) is critical for the page, and the other two (A, B) are just used to enhance the experience.
Because criticalFutureX is too important to be affected by futureA and futureB, I want the overall latency of all remote calls to be no more than that of X.
That means that as soon as criticalFutureX finishes, I want to discard futureA and futureB.
val criticalFutureX = ...
val futureA = ...
val futureB = ...
// the overall latency of this for-comprehension depends on the longest among X, A and B
for {
  x <- criticalFutureX
  a <- futureA
  b <- futureB
} ...
In the above example, even though they are executed in parallel, the overall latency depends on the longest among X, A and B, which is not what I want.
Latencies:
X: |----------|
A: |---------------|
B: |---|
O: |---------------| (overall latency)
There is firstCompletedOf, but it cannot be used to explicitly say "in case criticalFutureX completes".
Is there something like the following?
val criticalFutureX = ...
val futureA = ...
val futureB = ...
for {
  x <- criticalFutureX
  a <- futureA // discard when criticalFutureX finished
  b <- futureB // discard when criticalFutureX finished
} ...
X: |----------|
A: |-----------... discarded
B: |---|
O: |----------| (overall latency)
You can achieve this with a Promise:
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// Completes with Some(b) when `secondary` finishes first, or with None as soon as `main` succeeds.
def completeOnMain[A, B](main: Future[A], secondary: Future[B]): Future[Option[B]] = {
  val promise = Promise[Option[B]]()
  main.onComplete {
    case Failure(_) => // note: the promise stays pending if `main` fails (see the remark further below)
    case Success(_) => promise.trySuccess(None)
  }
  secondary.onComplete {
    case Failure(exception) => promise.tryFailure(exception)
    case Success(value)     => promise.trySuccess(Option(value))
  }
  promise.future
}
Some testing code
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

private def runFor(first: Int, second: Int) = {
  def run(millis: Int) = Future {
    Thread.sleep(millis)
    millis
  }
  val start = System.currentTimeMillis()
  val combined = for {
    _ <- Future.unit
    f1 = run(first)
    f2 = completeOnMain(f1, run(second))
    r1 <- f1
    r2 <- f2
  } yield (r1, r2)
  val result = Await.result(combined, 10.seconds)
  println(s"It took: ${System.currentTimeMillis() - start}: $result")
}
runFor(3000, 4000)
runFor(3000, 1000)
Produces
It took: 3131: (3000,None)
It took: 3001: (3000,Some(1000))
This kind of task is very hard to achieve efficiently, reliably and safely with Scala standard library Futures. There is no way to interrupt a Future that hasn't completed yet, meaning that even if you choose to ignore its result, it will still keep running and waste memory and CPU time. And even if there was a method to interrupt a running Future, there is no way to ensure that resources that were allocated (network connections, open files etc.) will be properly released.
I would like to point out that the implementation given by Ivan Stanislavciuc has a bug: if the main Future fails, then the promise will never be completed, which is unlikely to be what you want.
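For completeness, here is a minimal sketch (my own variation, not part of either answer) of how that gap can be closed with plain Futures: fail the promise when the main Future fails, so callers are never left waiting forever.

import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// Variation of completeOnMain that also completes the promise when `main` fails.
def completeOnMainSafe[A, B](main: Future[A], secondary: Future[B]): Future[Option[B]] = {
  val promise = Promise[Option[B]]()
  main.onComplete {
    case Failure(exception) => promise.tryFailure(exception) // propagate the failure instead of hanging
    case Success(_)         => promise.trySuccess(None)
  }
  secondary.onComplete(result => promise.tryComplete(result.map(Option(_))))
  promise.future
}

As the rest of this answer explains, this still does not interrupt the secondary computation; it only stops callers from waiting on its result.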
I would therefore strongly suggest looking into modern concurrent effect systems like ZIO or cats-effect. These are not only safer and faster, but also much easier. Here's an implementation with ZIO that doesn't have this bug:
import zio.{Exit, Task}
import Function.tupled
def completeOnMain[A, B](main: Task[A], secondary: Task[B]): Task[(A, Exit[Throwable, B])] =
  (main.forkManaged zip secondary.forkManaged).use {
    tupled(_.join zip _.interrupt)
  }
Exit is a type that describes how the secondary task ended, i.e. whether it successfully returned a B, failed with an error (of type Throwable), or was interrupted.
Note that this function can be given a much more sophisticated signature that tells you a lot more about what's going on, but I wanted to keep it simple here.
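A hypothetical usage sketch (assuming ZIO 1.x, which the forkManaged/use combination above implies, and two made-up tasks fetchCritical and fetchSecondary standing in for the remote calls):

import zio.{Runtime, Task}

def fetchCritical: Task[String]  = Task.succeed("page")    // hypothetical critical call X
def fetchSecondary: Task[String] = Task.succeed("sidebar") // hypothetical secondary call A

val (page, sidebarExit) =
  Runtime.default.unsafeRun(completeOnMain(fetchCritical, fetchSecondary))

Here sidebarExit tells you whether the secondary call finished before the critical one or was interrupted when the critical one completed.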

How to traverse a Set[Future[Option[User]]] and mutate a map

I have a mutable map that contains users:
val userMap = mutable.Map.empty[Int, User] // Int is user.Id
Now I need to load the new users, and add them to the map. I have the following api methods:
def getNewUsers(): Set[Int]
def getUser(userId: Int): Future[Option[User]]
So I first get all the users I need to load:
val newUserIds: Set[Int] = api.getNewUsers
I now need to load each user, but I'm not sure how to do that, since getUser returns a Future[Option[User]].
I tried this:
api.getNewUsers().map( getUser(_) )
But that returns a Set[Future[Option[User]]]
I'm not sure how to use Set[Future[Option[User]]] to update my userMap now.
You'll have to wait for all of the Futures to finish. You can use Future.sequence to transform your Set[Future[_]] into a Future[Set[_]], so you can wait for them all to finish:
val s: Set[scala.concurrent.Future[Some[User]]] = Set(Future(Some(User(1))), Future(Some(User(2))))
val f: Future[Set[Some[User]]] = Future.sequence(s)
f.map(users => users.foreach(u => /* your code here */))
However, using a mutable Map may be dangerous because it opens you up to race conditions. Futures are executed on different threads, and if you alter a mutable object's state from different threads, bad things can happen.
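A minimal sketch of the safer, immutable alternative (my own illustration, assuming User has an id field as the question's comment suggests, and stubbing getUser): build the whole map in one go when the combined future completes.

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

case class User(id: Int) // stand-in for the question's User type

def getUser(userId: Int): Future[Option[User]] = Future(Some(User(userId))) // stub

def loadUsers(newUserIds: Set[Int]): Future[Map[Int, User]] =
  Future.sequence(newUserIds.map(getUser)) // Set[Future[Option[User]]] -> Future[Set[Option[User]]]
    .map(_.flatten.map(u => u.id -> u).toMap) // drop the Nones and key each user by id

The resulting Future[Map[Int, User]] can then be merged into your state in a single place, instead of mutating userMap from several threads.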
You can use Future.sequence:
transforms a TraversableOnce[Future[A]] into a Future[TraversableOnce[A]]. Useful for reducing many Futures into a single Future (from the Future.sequence scaladoc).
You can try:
import scala.util.Success

val result: Future[Set[Option[User]]] =
  Future.sequence(api.getNewUsers().map(getUser))

result.andThen {
  case Success(users) =>
    users.flatten.foreach(u => userMap += u.id -> u)
}

For loop containing Scala Futures modifying a List

Let's say I have a ListBuffer[Int] and I iterate over it with a foreach loop; each iteration modifies the list from inside a Future (removing the current element) and does something special once the list is empty. Example code:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.mutable.ListBuffer
val l = ListBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
l.foreach(n => Future {
  println(s"Processing $n")
  Future {
    l -= n
    println(s"Removed $n")
    if (l.isEmpty) println("List is empty!")
  }
})
This is probably going to end very badly. I have more complex code with a similar structure and the same needs, but I do not know how to restructure it so that I can achieve the same functionality in a more reliable way.
The way you present your problem really does not fit the functional paradigm that Scala is intended for.
What you seem to want is to run a list of asynchronous computations, do something at the end of each one, and do something else when all of them are finished. This is pretty simple if you use continuations, which are easy to express with the map and flatMap methods on Future.
val fa: Future[Int] = Future { 1 }
// will apply the function to the result when it becomes available
val fb: Future[Int] = fa.map(a => a + 1)
// will start the asynchronous computation using the result when it will become available
val fc: Future[Int] = fa.flatMap(a => Future { a + 2 })
Once you have all this, you can easily do something when each of your Futures completes (successfully):
val myFutures: List[Future[Int]] = ???
myFutures.map(futInt => futInt.map(int => int + 2))
Here, I will add 2 to each value I get from the different asynchronous computations in the List.
You can also choose to wait for all the Futures in your list to complete by using Future.sequence:
val myFutureList: Future[List[Int]] = Future.sequence(myFutures)
Once again, you get a Future, which will be resolved once every Future inside the input list has been successfully resolved, and which will fail as soon as one of them fails. You'll then be able to use map or flatMap on this new Future to use all the computed values at once.
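(As an aside, and not part of the original answer: if some of the Futures may fail and you still want the values of the ones that succeeded, a common sketch is to lift each Future into a Try before sequencing, so the combined Future never fails as a whole.)

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

val myFutures: List[Future[Int]] = List(Future(1), Future(throw new RuntimeException("boom")), Future(3))

// A Future that always succeeds; failed computations show up as Failure(...) values.
val allAttempts: Future[List[Try[Int]]] =
  Future.sequence(myFutures.map { f =>
    f.map(v => Success(v): Try[Int]).recover { case e => Failure(e) }
  })

val onlySuccesses: Future[List[Int]] = allAttempts.map(_.collect { case Success(v) => v })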
So here's how I would write the code you proposed:
val l = 1 to 10

val processings: Seq[Future[Unit]] = l.map { n =>
  Future(println(s"processing $n")).map { _ =>
    println(s"finished processing $n")
  }
}

val processingOver: Future[Unit] =
  Future.sequence(processings).map { (lu: Seq[Unit]) =>
    println(s"Finished processing ${lu.size} elements")
  }
Of course, I would recommend using real functions rather than procedures (returning Unit), so that you have values to do something with. I used println to produce the same output as your code (except for the prints, which now have a slightly different meaning, since we are not mutating anything anymore).
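A minimal sketch of that recommendation (my own illustration, with a hypothetical process function standing in for the real work): each step returns a value, and everything that used to rely on mutating the list happens on the collected results instead.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def process(n: Int): Future[Int] = Future(n * 2) // hypothetical real work that returns a value

val results: Future[Seq[Int]] = Future.sequence((1 to 10).map(process))

// The "list is empty" logic becomes ordinary work on the collected values.
val summary: Future[String] =
  results.map(rs => s"Finished processing ${rs.size} elements, sum = ${rs.sum}")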

Scala - Execute arbitrary number of Futures sequentially but dependently [duplicate]

This question already has answers here: Is there sequential Future.find? (3 answers). Closed 8 years ago.
I'm trying to figure out the neatest way to execute a series of Futures in sequence, where one Future's execution depends on the previous. I'm trying to do this for an arbitrary number of futures.
Use case:
I have retrieved a number of Ids from my database.
I now need to retrieve some related data on a web service.
I want to stop once I've found a valid result.
I only care about the result that succeeded.
Executing these all in parallel and then parsing the collection of results returned isn't an option. I have to do one request at a time, and only execute the next request if the previous request returned no results.
The current solution is along these lines: using foldLeft to chain the requests, and only evaluating the next future if the previous future's result meets some condition.
def dblFuture(i: Int): Future[Int] = Future { i * 2 }

val list = List(1, 2, 3, 4, 5)

val future = list.foldLeft(Future(0)) { (previousFuture, next) =>
  for {
    previousResult <- previousFuture
    nextResult     <- if (previousResult <= 4) dblFuture(next) else previousFuture
  } yield nextResult
}
The big downsides of this are: a) I keep processing all the items even once I've got a result I'm happy with, and b) once I've found the result I'm after, I keep evaluating the predicate. In this case it's a simple if, but in reality it could be more complicated.
I feel like I'm missing a far more elegant solution to this.
Looking at your example, it seems as though the previous result has no bearing on subsequent results; what matters is only whether the previous result satisfies some condition that prevents the next result from being computed. If that is the case, here is a recursive solution using filter and recoverWith.
def untilFirstSuccess[A, B](f: A => Future[B])(condition: B => Boolean)(list: List[A]): Future[B] =
  list match {
    case head :: tail =>
      f(head).filter(condition).recoverWith {
        case _: Throwable => untilFirstSuccess(f)(condition)(tail)
      }
    case Nil => Future.failed(new Exception("All failed.."))
  }
filter is only applied once the Future has completed successfully (failing it with a NoSuchElementException if the predicate does not hold), and recoverWith is only called if the Future has failed.
def dblFuture(i: Int): Future[Int] = Future {
println("Executing.. " + i)
i * 2
}
val list = List(1, 2, 3, 4, 5)
scala> untilFirstSuccess(dblFuture)(_ > 6)(list)
Executing.. 1
Executing.. 2
Executing.. 3
Executing.. 4
res1: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise#514f4e98
scala> res1.value
res2: Option[scala.util.Try[Int]] = Some(Success(8))
The neatest way, and "true functional programming", is scalaz-stream ;) However, you'll need to switch from Scala's Future to scalaz.concurrent.Task as the abstraction for a "future result". It's a bit different: Task is pure, whereas Future is a "running computation", but they have a lot in common.
import scalaz.concurrent.Task
import scalaz.stream.Process
def dblTask(i: Int) = Task {
println(s"Executing task $i")
i * 2
}
val list = Seq(1,2,3,4,5)
val p: Process[Task, Int] = Process.emitAll(list)
val result: Task[Option[Int]] =
p.flatMap(i => Process.eval(dblTask(i))).takeWhile(_ < 10).runLast
println(s"result = ${result.run}")
Result:
Executing task 1
Executing task 2
Executing task 3
Executing task 4
Executing task 5
result = Some(8)
If your computation is already a Scala Future, you can transform it into a Task:
import scala.concurrent.{Future => SFuture}
import scalaz.concurrent.Task

implicit class Transformer[+T](fut: => SFuture[T]) {
  def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
    import scala.util.{Failure, Success}
    import scalaz.syntax.either._
    Task.async { register =>
      fut.onComplete {
        case Success(v)  => register(v.right)
        case Failure(ex) => register(ex.left)
      }
    }
  }
}
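A hypothetical usage sketch (assuming an implicit ExecutionContext is in scope and callWebService stands in for the web-service lookup from the question):

import scala.concurrent.ExecutionContext.Implicits.global

def callWebService(id: Int): SFuture[Option[String]] = ??? // hypothetical: look up data for one id

// The Future-based call can now be used wherever a Task is expected, e.g. with Process.eval:
val oneCall: Process[Task, Option[String]] = Process.eval(callWebService(42).toTask)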

ForkJoinPool getting blocked with just two workers

I have some code that is not performance-sensitive, and I was trying to make stack traces easier to follow by using fewer futures. This resulted in code similar to the following:
val fut = Future {
  val r = Future.traverse(ips) { ip =>
    val httpResponse: Future[HttpResponse] = asyncHttpClient.exec(req)
    httpResponse.andThen {
      case x => logger.info(s"received response here: $x")
    }
    httpResponse.map(r => (ip, r))
  }
  r.andThen { case x => logger.info(s"final result: $x") }
  Await.result(r, 10.seconds)
}
fut.andThen { case x => logger.info(s"finished $x") }
logger.info("here nonblocking")
As expected, internal logging in the http client shows that the response returns immediately, but the callbacks executing logger.info(s"received response here: $x") and logger.info(s"final result: $x") do not run until after Await.result(r, 10.seconds) times out. Looking at the log output, which includes thread ids, the callbacks are executed on the same thread (ForkJoinPool-1-worker-3) that is awaiting the result, creating a deadlock. It was my understanding that ExecutionContext.global would create extra threads on demand when it ran out of threads. Is this not the case? Only two threads from the global fork-join pool (1 and 3) produce any output in the logs. Can anyone explain this?
As for fixes, I know perhaps the best way is to separate blocking and nonblocking work into different thread pools, but I was hoping to avoid this extra bookkeeping by using a dynamically sized thread pool. Is there a better solution?
If you want to grow the pool (temporarily) when threads are blocked, use concurrent.blocking. Here, you've used up all the threads doing I/O and then scheduling more work with map and andThen (the result of which you don't use).
More info: your "final result" is expected to execute after the traverse, so that is normal.
Example for blocking, although there must be a SO Q&A for it:
scala> import concurrent._ ; import ExecutionContext.Implicits._
scala> val is = 1 to 100 toList
scala> def db = s"${Thread.currentThread}"
db: String
scala> def f(i: Int) = Future { println(db) ; Thread.sleep(1000L) ; 2 * i }
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-9,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-5,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
Thread[ForkJoinPool-1-worker-15,5,main]
Thread[ForkJoinPool-1-worker-11,5,main]
res0: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise#3a4b0e5d
[etc, N at a time]
versus overly parallel:
scala> def f(i: Int) = Future { blocking { println(db) ; Thread.sleep(1000L) ; 2 * i }}
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
res1: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise#759d81f3
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-25,5,main]
Thread[ForkJoinPool-1-worker-29,5,main]
Thread[ForkJoinPool-1-worker-19,5,main]
scala> Thread[ForkJoinPool-1-worker-23,5,main]
Thread[ForkJoinPool-1-worker-27,5,main]
Thread[ForkJoinPool-1-worker-21,5,main]
Thread[ForkJoinPool-1-worker-31,5,main]
Thread[ForkJoinPool-1-worker-17,5,main]
Thread[ForkJoinPool-1-worker-49,5,main]
Thread[ForkJoinPool-1-worker-45,5,main]
Thread[ForkJoinPool-1-worker-59,5,main]
Thread[ForkJoinPool-1-worker-43,5,main]
Thread[ForkJoinPool-1-worker-57,5,main]
Thread[ForkJoinPool-1-worker-37,5,main]
Thread[ForkJoinPool-1-worker-51,5,main]
Thread[ForkJoinPool-1-worker-35,5,main]
Thread[ForkJoinPool-1-worker-53,5,main]
Thread[ForkJoinPool-1-worker-63,5,main]
Thread[ForkJoinPool-1-worker-47,5,main]
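For completeness, here is a sketch of the other option mentioned in the question: moving blocking waits onto their own fixed-size pool so the global pool stays free for callbacks. The pool size and the awaitOn helper are illustrative only, not part of the original answer.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Dedicated pool for code that blocks; keep the global pool for non-blocking callbacks.
val blockingEc: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

def awaitOn[A](f: Future[A]): Future[A] =
  Future(Await.result(f, 10.seconds))(blockingEc)

Better still, if the surrounding code can simply return the composed Future instead of awaiting it, the blocking disappears altogether and neither blocking nor a separate pool is needed.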