ForkJoinPool getting blocked with just two workers - scala

I have some code that is not performance-sensitive and was trying to make stacks easier to follow by using fewer futures. This resulted in some code similar to the following:
val fut = Future {
val r = Future.traverse(ips) { ip =>
val httpResponse: Future[HttpResponse] = asyncHttpClient.exec(req)
httpResponse.andThen {
case x => logger.info(s"received response here: $x")
}
httpResponse.map(r => (ip, r))
}
r.andThen { case x => logger.info(s"final result: $x") }
Await.result(r, 10 seconds)
}
fut.andThen { x => logger.info(s"finished $x") }
logger.info("here nonblocking")
As expected internal logging in the http client shows that the response returns immediately, but the callbacks executing logger.info(s"received response here: $x") and logger.info(s"final result: $x") do not execute until after Await.result(r, 10 seconds) times out. Looking at the log output, which includes thread ids, the callbacks are being executed in the same thread (ForkJoinPool-1-worker-3) that is awaiting the result, creating a deadlock. It was my understanding that ExecutionContext.global would create extra threads on demand when it ran out of threads. Is this not the case? There appears only to be two threads from the global fork join pool that are producing any output in the logs (1 and 3). Can anyone explain this?
As for fixes, I know perhaps the best way is to separate blocking and nonblocking work into different thread pools, but I was hoping to avoid this extra bookkeeping by using a dynamically sized thread pool. Is there a better solution?

If you want to grow the pool (temporarily) when threads are blocked, use concurrent.blocking. Here, you've used all the threads, doing i/o and then scheduling more work with map and andThen (the result of which you don't use).
More info: your "final result" is expected to execute after the traverse, so that is normal.
Example for blocking, although there must be a SO Q&A for it:
scala> import concurrent._ ; import ExecutionContext.Implicits._
scala> val is = 1 to 100 toList
scala> def db = s"${Thread.currentThread}"
db: String
scala> def f(i: Int) = Future { println(db) ; Thread.sleep(1000L) ; 2 * i }
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-9,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-5,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
Thread[ForkJoinPool-1-worker-15,5,main]
Thread[ForkJoinPool-1-worker-11,5,main]
res0: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise#3a4b0e5d
[etc, N at a time]
versus overly parallel:
scala> def f(i: Int) = Future { blocking { println(db) ; Thread.sleep(1000L) ; 2 * i }}
f: (i: Int)scala.concurrent.Future[Int]
scala> Future.traverse(is)(f _)
Thread[ForkJoinPool-1-worker-13,5,main]
Thread[ForkJoinPool-1-worker-3,5,main]
Thread[ForkJoinPool-1-worker-1,5,main]
res1: scala.concurrent.Future[List[Int]] = scala.concurrent.impl.Promise$DefaultPromise#759d81f3
Thread[ForkJoinPool-1-worker-7,5,main]
Thread[ForkJoinPool-1-worker-25,5,main]
Thread[ForkJoinPool-1-worker-29,5,main]
Thread[ForkJoinPool-1-worker-19,5,main]
scala> Thread[ForkJoinPool-1-worker-23,5,main]
Thread[ForkJoinPool-1-worker-27,5,main]
Thread[ForkJoinPool-1-worker-21,5,main]
Thread[ForkJoinPool-1-worker-31,5,main]
Thread[ForkJoinPool-1-worker-17,5,main]
Thread[ForkJoinPool-1-worker-49,5,main]
Thread[ForkJoinPool-1-worker-45,5,main]
Thread[ForkJoinPool-1-worker-59,5,main]
Thread[ForkJoinPool-1-worker-43,5,main]
Thread[ForkJoinPool-1-worker-57,5,main]
Thread[ForkJoinPool-1-worker-37,5,main]
Thread[ForkJoinPool-1-worker-51,5,main]
Thread[ForkJoinPool-1-worker-35,5,main]
Thread[ForkJoinPool-1-worker-53,5,main]
Thread[ForkJoinPool-1-worker-63,5,main]
Thread[ForkJoinPool-1-worker-47,5,main]

Related

Why is Future considered to be "not referentially transparent"?

So I was reading the "Scala with Cats" book, and there was this sentence which I'm going to quote down here:
Note that Scala’s Futures aren’t a great example of pure functional programming because they aren’t referentially transparent.
And also, an example is provided as follows:
val future1 = {
// Initialize Random with a fixed seed:
val r = new Random(0L)
// nextInt has the side-effect of moving to
// the next random number in the sequence:
val x = Future(r.nextInt)
for {
a <- x
b <- x
} yield (a, b)
}
val future2 = {
val r = new Random(0L)
for {
a <- Future(r.nextInt)
b <- Future(r.nextInt)
} yield (a, b)
}
val result1 = Await.result(future1, 1.second)
// result1: (Int, Int) = (-1155484576, -1155484576)
val result2 = Await.result(future2, 1.second)
// result2: (Int, Int) = (-1155484576, -723955400)
I mean, I think it's because of the fact that r.nextInt is never referentially transparent, right? since identity(r.nextInt) would never be equal to identity(r.nextInt), does this mean that identity is not referentially transparent either? (or Identity monad, to have better comparisons with Future). If the expression being calculated is RT, then the Future would also be RT:
def foo(): Int = 42
val x = Future(foo())
Await.result(x, ...) == Await.result(Future(foo()), ...) // true
So as far as I can reason about the example, almost every function and Monad type should be non-RT. Or is there something special about Future? I also read this question and its answers, yet couldn't find what I was looking for.
You are actually right and you are touching one of the pickiest points of FP; at least in Scala.
Technically speaking, Future on its own is RT. The important thing is that different to IO it can't wrap non-RT things into an RT description. However, you can say the same of many other types like List, or Option; so why folks don't make a fuss about it?
Well, as with many things, the devil is in the details.
Contrary to List or Option, Future is typically used with non-RT things; e.g. an HTTP request or a database query. Thus, the emphasis folks give in showing that Future can't guarantee RT in those situations.
More importantly, there is only one reason to introduce Future on a codebase, concurrency (not to be confused with parallelism); otherwise, it would be the same as Try. Thus, controlling when and how those are executed is usually important.
Which is the reason why cats recommends the use of IO for all use cases of Future
Note: You can find a similar discussion on this cats PR and its linked discussions: https://github.com/typelevel/cats/pull/4182
So... the referential transparency simply means that you should be able to replace the reference with the actual thing (and vice versa) without changing the overall symatics or behaviour. Like mathematics is.
So, lets say you have x = 4 and y = 5, then x + y, 4 + y, x + 5, and 4 + 5 are pretty much the same thing. And can be replaced with each otherwhenever you want.
But... just look at following two things...
val f1 = Future { println("Hi") }
val f2 = f1
val f1 = Future { println("Hi") }
val f2 = Future { println("Hi") }
You can try to run it. The behaviour of these two programs is not going to be the same.
Scala Future are eagerly evaluated... which means that there is no way to actually write Future { println("Hi") } in your code without executing it as a seperate behaviour.
Keep in mind that this is not just linked to having side effects. Yes, the example which I used here with println was a side effect, but that was just to make the behaviour difference obvious to notice.
Even if you use something to suspend the side effect inside the Future, you will endup with two suspended side effects instead of one. And once these suspended side effects are passed to the interpreater, the same action will happen twice.
In following example, even if we suspend the print side-effect by wrapping it up in an IO, the expansive evaluation part of the program can still cause different behavours even if everything in the universe is exactly same for two cases.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
// cpu bound
// takes around 80 miliseconds
// we have only 1 core
def veryExpensiveComputation(input: Int): Int = ???
def impl1(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = f1
val f3 = f1
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100 milli)
}
def impl2(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f3 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100 milli)
}
The first impl will cause only 1 expensive computation, but the second will trigger 3 expensive computations. And thus the program will fail with timeout in the second example.
If properly written with IO or ZIO (without Future), it with fail with timeout in both implementations.

How to discard other Futures if the critical Future is finished in Scala?

Let's say I have three remote calls in order to construct my page. One of them (X) is critical for the page and the other two (A, B) just used to enhance the experience.
Because criticalFutureX is too important to be effected by futureA and futureB, so I want the overall latency of of all remote calls to be Not more than X.
That means, in case of criticalFutureX finishes, I want to discard futureA and futureB.
val criticalFutureX = ...
val futureA = ...
val futureB = ...
// the overall latency of this for-comprehension depends on the longest among X, A and B
for {
x <- criticalFutureX
a <- futureA
b <- futureB
} ...
In the above example, even though they are executed in parallel, the overall latency depends on the longest among X, A and B, which is not what I want.
Latencies:
X: |----------|
A: |---------------|
B: |---|
O: |---------------| (overall latency)
There is firstCompletedOf but it can not be used to explicit say "in case of completed of criticalFutureX".
Is there something like the following?
val criticalFutureX = ...
val futureA = ...
val futureB = ...
for {
x <- criticalFutureX
a <- futureA // discard when criticalFutureX finished
b <- futureB // discard when criticalFutureX finished
} ...
X: |----------|
A: |-----------... discarded
B: |---|
O: |----------| (overall latency)
You can achieve this with a promise
def completeOnMain[A, B](main: Future[A], secondary: Future[B]) = {
val promise = Promise[Option[B]]()
main.onComplete {
case Failure(_) =>
case Success(_) => promise.trySuccess(None)
}
secondary.onComplete {
case Failure(exception) => promise.tryFailure(exception)
case Success(value) => promise.trySuccess(Option(value))
}
promise.future
}
Some testing code
private def runFor(first: Int, second: Int) = {
def run(millis: Int) = Future {
Thread.sleep(millis);
millis
}
val start = System.currentTimeMillis()
val combined = for {
_ <- Future.unit
f1 = run(first)
f2 = completeOnMain(f1, run(second))
r1 <- f1
r2 <- f2
} yield (r1, r2)
val result = Await.result(combined, 10.seconds)
println(s"It took: ${System.currentTimeMillis() - start}: $result")
}
runFor(3000, 4000)
runFor(3000, 1000)
Produces
It took: 3131: (3000,None)
It took: 3001: (3000,Some(1000))
This kind of task is very hard to achieve efficiently, reliably and safely with Scala standard library Futures. There is no way to interrupt a Future that hasn't completed yet, meaning that even if you choose to ignore its result, it will still keep running and waste memory and CPU time. And even if there was a method to interrupt a running Future, there is no way to ensure that resources that were allocated (network connections, open files etc.) will be properly released.
I would like to point out that the implementation given by Ivan Stanislavciuc has a bug: if the main Future fails, then the promise will never be completed, which is unlikely to be what you want.
I would therefore strongly suggest looking into modern concurrent effect systems like ZIO or cats-effect. These are not only safer and faster, but also much easier. Here's an implementation with ZIO that doesn't have this bug:
import zio.{Exit, Task}
import Function.tupled
def completeOnMain[A, B](
main: Task[A], secondary: Task[B]): Task[(A, Exit[Throwable, B])] =
(main.forkManaged zip secondary.forkManaged).use {
tupled(_.join zip _.interrupt)
}
Exit is a type that describes how the secondary task ended, i. e. by successfully returning a B or because of an error (of type Throwable) or due to interruption.
Note that this function can be given a much more sophisticated signature that tells you a lot more about what's going on, but I wanted to keep it simple here.

Functional way of interrupting lazy iteration depedning on timeout and comparisson between previous and next, while, LazyList vs Stream

Background
I have the following scenario. I want to execute the method of a class from an external library, repeatedly, and I want to do so until a certain timeout condition and result condition (compared to the previous result) is met. Furthermore I want to collect the return values, even on the "failed" run (the run with the "failing" result condition that should interrupt further execution).
Thus far I have accomplished this with initializing an empty var result: Result, a var stop: Boolean and using a while loop that runs while the conditions are true and modifying the outer state. I would like to get rid of this and use a functional approach.
Some context. Each run is expected to run from 0 to 60 minutes and the total time of iteration is capped at 60 minutes. Theoretically, there's no bound to how many times it executes in this period but in practice, it's generally 2-60 times.
The problem is, the runs take a long time so I need to stop the execution. My idea is to use some kind of lazy Iterator or Stream coupled with scanLeft and Option.
Code
Boiler plate
This code isn't particularly relevant but used in my approach samples and provide identical but somewhat random pseudo runtime results.
import scala.collection.mutable.ListBuffer
import scala.util.Random
val r = Random
r.setSeed(1)
val sleepingTimes: Seq[Int] = (1 to 601)
.map(x => Math.pow(2, x).toInt * r.nextInt(100))
.toList
.filter(_ > 0)
.sorted
val randomRes = r.shuffle((0 to 600).map(x => r.nextInt(10)).toList)
case class Result(val a: Int, val slept: Int)
class Lib() {
def run(i: Int) = {
println(s"running ${i}")
Thread.sleep(sleepingTimes(i))
Result(randomRes(i), sleepingTimes(i))
}
}
case class Baz(i: Int, result: Result)
val lib = new Lib()
val timeout = 10 * 1000
While approach
val iteratorStart = System.currentTimeMillis()
val iterator = for {
i <- (0 to 600).iterator
if System.currentTimeMillis() < iteratorStart + timeout
f = Baz(i, lib.run(i))
} yield f
val iteratorBuffer = ListBuffer[Baz]()
if (iterator.hasNext) iteratorBuffer.append(iterator.next())
var run = true
while (run && iterator.hasNext) {
val next = iterator.next()
run = iteratorBuffer.last.result.a < next.result.a
iteratorBuffer.append(next)
}
Stream approach (Scala.2.12)
Full example
val streamStart = System.currentTimeMillis()
val stream = for {
i <- (0 to 600).toStream
if System.currentTimeMillis() < streamStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = stream.headOption
val tail = if (stream.nonEmpty) stream.tail else stream
val streamVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
LazyList approach (Scala 2.13)
Full example
val lazyListStart = System.currentTimeMillis()
val lazyList = for {
i <- (0 to 600).to(LazyList)
if System.currentTimeMillis() < lazyListStart + timeout
} yield Baz(i, lib.run(i))
var last: Option[Baz] = None
val head = lazyList.headOption
val tail = if (lazyList.nonEmpty) lazyList.tail else lazyList
val lazyListVersion = (tail
.scanLeft((head, true))((x, y) => {
if (x._1.exists(_.result.a > y.result.a)) (Some(y), false)
else (Some(y), true)
})
.takeWhile {
case (baz, continue) =>
if (!baz.eq(head)) last = baz
continue
}
.map(_._1)
.toList :+ last).flatten
Result
Both approaches appear to yield the correct end result:
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)))
and they interrupt execution as desired.
Edit: The desired outcome is to not execute the next iteration but still return the result of the iteration that caused the interruption. Thus the desired result is
List(Baz(0,Result(4,170)), Baz(1,Result(5,208)), Baz(2,Result(2,256))
and lib.run(i) should only run 3 times.
This is achieved by the while approach, as well as the LazyList approach but not the Stream approach which executes lib.run 4 times (Bad!).
Question
Is there another stateless approach, which is hopefully more elegant?
Edit
I realized my examples were faulty and not returning the "failing" result, which it should, and that they kept executing beyond the stop condition. I rewrote the code and examples but I believe the spirit of the question is the same.
I would use something higher level, like fs2.
(or any other high-level streaming library, like: monix observables, akka streams or zio zstreams)
def runUntilOrTimeout[F[_]: Concurrent: Timer, A](work: F[A], timeout: FiniteDuration)
(stop: (A, A) => Boolean): Stream[F, A] = {
val interrupt =
Stream.sleep_(timeout)
val run =
Stream
.repeatEval(work)
.zipWithPrevious
.takeThrough {
case (Some(p), c) if stop(p, c) => false
case _ => true
} map {
case (_, c) => c
}
run mergeHaltBoth interrupt
}
You can see it working here.

Scala - Wait for all futures completed within time period

I have List[Future[String]] and would like to wait constant period of time in order to collect successfull computation as well as rerun operations for futures that do not complete in specified period of time.
In pseudo code it will look like:
val inputData: List[String] = getInputData()
val futures : List[Future[String]] = inputData.map(toLongRunningIOOperation)
val (completedFutures, unfinishedFutures) = Await.ready(futures, 2 seconds)
val rerunedOperations : List[Future[String]] = unfinisedFutures.map(rerun)
Such solution could be useful if you need to execute several calls to external services whose usual latency is (low p99 < 60ms) but sometimes requests are processed more than 5 seconds (because of current state/load). In such situation it is better to rerun those requests (i.e to another instance of the service).
As an example, use Future.firstCompletedOf function to get timed future
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] = f.map(Some(_)).recover[Option[T]]{case _ => None}
val inputData: List[String] = List("a", "b", "c", "d")
val completedFuture = inputData.map { a =>
a match {
case "a" | "c" => Future.firstCompletedOf(Seq(Future{Thread.sleep(3000); a},
Future.failed{Thread.sleep(2000); new RuntimeException()}))
case _ => Future(a)
}
}
val unfinished = Future.sequence(completedFuture.map(futureToFutureOption)).map(list => inputData.toSet -- list.flatten.toSet)
val rerunedOperations: Future[Set[Future[String]]] = unfinished.map { _.map(foo) }
def foo(s: String): Future[String] = ???
Here in example rerunedOperations have different type as in yours example, but I think this will be fine for you.
Also keep in mind, if you're performing some call to external system inside the future and that future didn't finish in appropriate time, such approach will not prevent unfinished future from execution, I mean that the actual call to external system will be processing while you will try to make another call

Scala - Execute arbitrary number of Futures sequentially but dependently [duplicate]

This question already has answers here:
Is there sequential Future.find?
(3 answers)
Closed 8 years ago.
I'm trying to figure out the neatest way to execute a series of Futures in sequence, where one Future's execution depends on the previous. I'm trying to do this for an arbitrary number of futures.
User case:
I have retrieved a number of Ids from my database.
I now need to retrieve some related data on a web service.
I want to stop once I've found a valid result.
I only care about the result that succeeded.
Executing these all in parallel and then parsing the collection of results returned isn't an option. I have to do one request at a time, and only execute the next request if the previous request returned no results.
The current solution is along these lines. Using foldLeft to execute the requests and then only evaluating the next future if the previous future meets some condition.
def dblFuture(i: Int) = { i * 2 }
val list = List(1,2,3,4,5)
val future = list.foldLeft(Future(0)) {
(previousFuture, next) => {
for {
previousResult <- previousFuture
nextFuture <- { if (previousResult <= 4) dblFuture(next) else previousFuture }
} yield (nextFuture)
}
}
The big downside of this is a) I keep processing all items even once i've got a result i'm happy with and b) once I've found the result I'm after, I keep evaluating the predicate. In this case it's a simple if, but in reality it could be more complicated.
I feel like I'm missing a far more elegant solution to this.
Looking at your example, it seems as though the previous result has no bearing on subsequent results, and instead what only matters is that the previous result satisfies some condition to prevent the next result from being computed. If that is the case, here is a recursive solution using filter and recoverWith.
def untilFirstSuccess[A, B](f: A => Future[B])(condition: B => Boolean)(list: List[A]): Future[B] = {
list match {
case head :: tail => f(head).filter(condition).recoverWith { case _: Throwable => untilFirstSuccess(f)(condition)(tail) }
case Nil => Future.failed(new Exception("All failed.."))
}
}
filter will only be called when the Future has completed, and recoverWith will only be called if the Future has failed.
def dblFuture(i: Int): Future[Int] = Future {
println("Executing.. " + i)
i * 2
}
val list = List(1, 2, 3, 4, 5)
scala> untilFirstSuccess(dblFuture)(_ > 6)(list)
Executing.. 1
Executing.. 2
Executing.. 3
Executing.. 4
res1: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise#514f4e98
scala> res1.value
res2: Option[scala.util.Try[Int]] = Some(Success(8))
Neatest way, and "true functional programming" is scalaz-stream ;) However you'll need to switch to scalaz.concurrent.Task from scala Future for abstraction for "future result". It's a bit different. Task is pure, and Future is "running computation", but they have a lot in common.
import scalaz.concurrent.Task
import scalaz.stream.Process
def dblTask(i: Int) = Task {
println(s"Executing task $i")
i * 2
}
val list = Seq(1,2,3,4,5)
val p: Process[Task, Int] = Process.emitAll(list)
val result: Task[Option[Int]] =
p.flatMap(i => Process.eval(dblTask(i))).takeWhile(_ < 10).runLast
println(s"result = ${result.run}")
Result:
Executing task 1
Executing task 2
Executing task 3
Executing task 4
Executing task 5
result = Some(8)
if your computation is already scala Future, you can transform it to Task
implicit class Transformer[+T](fut: => SFuture[T]) {
def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
import scala.util.{Failure, Success}
import scalaz.syntax.either._
Task.async {
register =>
fut.onComplete {
case Success(v) => register(v.right)
case Failure(ex) => register(ex.left)
}
}
}
}