I have two methods, let's call them load() and init(). Each one starts a computation in its own thread and returns a Future on its own execution context. The two computations are independent.
val loadContext = ExecutionContext.fromExecutor(...)
def load(): Future[Unit] = {
Future
}
val initContext = ExecutionContext.fromExecutor(...)
def init(): Future[Unit] = {
Future { ... }(initContext)
}
I want to call both of these from some third thread -- say it's from main() -- and perform some other computation when both are finished.
def onBothComplete(): Unit = ...
Now:
I don't care which completes first
I don't care what thread the other computation is performed on, except:
I don't want to block either thread waiting for the other;
I don't want to block the third (calling) thread; and
I don't want to have to start a fourth thread just to set the flag.
If I use for-comprehensions, I get something like:
val loading = load()
val initialization = initialize()
for {
loaded <- loading
initialized <- initialization
} yield { onBothComplete() }
and I get Cannot find an implicit ExecutionContext.
I take this to mean Scala wants a fourth thread to wait for the completion of both futures and set the flag, either an explicit new ExecutionContext or ExecutionContext.Implicits.global. So it would appear that for-comprehensions are out.
I thought I might be able to nest callbacks:
initialization.onComplete {
case Success(_) =>
loading.onComplete {
case Success(_) => onBothComplete()
case Failure(t) => log.error("Unable to load", t)
}
case Failure(t) => log.error("Unable to initialize", t)
}
Unfortunately onComplete also takes an implicit ExecutionContext, and I get the same error. (Also this is ugly, and loses the error message from loading if initialization fails.)
Is there any way to compose Scala Futures without blocking and without introducing another ExecutionContext? If not, I might have to just throw them over for Java 8 CompletableFutures or Javaslang Vavr Futures, both of which have the ability to run callbacks on the thread that did the original work.
Updated to clarify that blocking either thread waiting for the other is also not acceptable.
Updated again to be less specific about the post-completion computation.
Why not just reuse one of your own execution contexts? Not sure what your requirements for those are but if you use a single thread executor you could just reuse that one as the execution context for your comprehension and you won't get any new threads created:
implicit val loadContext = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor)
If you really can't reuse them you may consider this as the implicit execution context:
implicit val currentThreadExecutionContext = ExecutionContext.fromExecutor(
(runnable: Runnable) => {
runnable.run()
})
Which will run futures on the current thread. However, the Scala docs explicitly recommends against this as it introduces nondeterminism in which thread runs the Future (but as you stated, you don't care which thread it runs on so this may not matter).
See Synchronous Execution Context for why this isn't advisable.
An example with that context:
val loadContext = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor)
def load(): Future[Unit] = {
Future(println("loading thread " + Thread.currentThread().getName))(loadContext)
}
val initContext = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor)
def init(): Future[Unit] = {
Future(println("init thread " + Thread.currentThread().getName))(initContext)
}
val doneFlag = new AtomicBoolean(false)
val loading = load()
val initialization = init()
implicit val currentThreadExecutionContext = ExecutionContext.fromExecutor(
(runnable: Runnable) => {
runnable.run()
})
for {
loaded <- loading
initialized <- initialization
} yield {
println("yield thread " + Thread.currentThread().getName)
doneFlag.set(true)
}
prints:
loading thread pool-1-thread-1
init thread pool-2-thread-1
yield thread main
Though the yield line may print either pool-1-thread-1 or pool-2-thread-1 depending on the run.
In Scala, a Future represents a piece of work to be executed async (i.e. concurrently to other units of work). An ExecutionContext represent a pool of threads for executing Futures. In other words, ExecutionContext is the team of worker who performs the actual work.
For efficiency and scalability, it's better to have big team(s) (e.g. single ExecutionContext with 10 threads to execute 10 Future's) rather than small teams (e.g. 5 ExecutionContext with 2 threads each to execute 10 Future's).
In your case if you want to limit the number of threads to 2, you can:
def load()(implicit teamOfWorkers: ExecutionContext): Future[Unit] = {
Future { ... } /* will use the teamOfWorkers implicitly */
}
def init()(implicit teamOfWorkers: ExecutionContext): Future[Unit] = {
Future { ... } /* will use the teamOfWorkers implicitly */
}
implicit val bigTeamOfWorkers = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
/* All async works in the following will use
the same bigTeamOfWorkers implicitly and works will be shared by
the 2 workers (i.e. thread) in the team */
for {
loaded <- loading
initialized <- initialization
} yield doneFlag.set(true)
The Cannot find an implicit ExecutionContext error does not mean that Scala wants additional threads. It only means that Scala wants a ExecutionContext to do the work. And additional ExecutionContext does not necessarily implies additional 'thread', e.g. the following ExecutionContext, instead of creating new threads, will execute works in the current thread:
val currThreadExecutor = ExecutionContext.fromExecutor(new Executor {
override def execute(command: Runnable): Unit = command.run()
})
Related
So I have a api in Scala that uses twitter.util.Future.
In my case, I want to create 2 futures, one of which is dependent on the result of the other and return the first future:
def apiFunc(): Future[Response]={
val future1 = getFuture1()
val future2 = future1 map {
resp => processFuture1Resp(resp)
}
future1
}
So in this case, future2 is never consumed and the api returns the result of future1 before future2 is completed.
My question is - will future2 run even though the api has returned?
Further, future1 should not be affected by future2. That is, the processing time for future2 should not be seen by any client that calls apiFunc(). You can think of processFuture1Resp as sending the results of getFuture1() to another service but the client calling apiFunc() doesn't care about that portion and only wants future1 as quickly as possible
My understanding is that futures spawn threads, but I am unsure if that thread will be terminated after the return of the main thread.
I guess a different way to ask this is - Will a twitter.util.future always be executed? Is there a way to fire and forget a twitter.util.future?
If you want to chain two futures where one of them processes the result of the other (and return a future), you can use a for comprehension:
def getFuture1() = {
println("Getting future 1...")
Thread.sleep(1000)
Future.successful("42")
}
def processFuture1Resp(resp: String) = {
println(s"Result of future 1 is: ${resp}")
"END"
}
def apiFunc(): Future[String]={
for {
res1 <- getFuture1()
res2 <- Future(processFuture1Resp(res1))
} yield res2
}
def main(args: Array[String]) {
val finalResultOfFutures: Future[String] = apiFunc()
finalResultOfFutures
Thread.sleep(500)
}
This will print:
Getting future 1...
Result of future 1 is: 42
The value finalResultOfFutures will contain the result of chaining both futures, and you'll be sure that the first future is executed before the second one. If you don't wait for the execution of finalResultOfFutures on the main thread (commenting the last sleep function of the main thread), you will only see:
Getting future 1...
The main thread will finish before the second future has time to print anything.
Another (better) way to wait for the execution of the future would be something like this:
val maxWaitTime: FiniteDuration = Duration(5, TimeUnit.SECONDS)
Await.result(finalResultOfFutures, maxWaitTime)
Await.result will block the main thread and waits a defined duration for the result of the given Future. If it is not ready or completes with a failure, Await.result will throw an exception.
EDITED
Another option is to use Monix Tasks This library allows you wrap actions (such as Futures) and have a greater control on how and when the Futures are executed. Since a Future may start its execution right after its declaration, these functionalities can be quite handy. Example:
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.duration._
def getFuture1() = {
println("Getting future 1...")
Thread.sleep(3000)
println("Future 1 finished")
Future.successful("42")
}
def processFuture1Resp(resp: Task[String]) = {
val resp1 = resp.runSyncUnsafe(2.seconds)
println(s"Future2: Result of future 1 is: ${resp1}")
}
def main(args: Array[String]) {
// Get a Task from future 1. A Task does not start its execution until "run" is called
val future1: Task[String] = Task.fromFuture(getFuture1())
// Here we can create the Future 2 that depends on Future 1 and execute it async. It will finish even though the main method ends.
val future2 = Task(processFuture1Resp(future1)).runAsyncAndForget
// Now the future 1 starts to be calculated. Yo can runAsyncAndForget here or return just the task so that the client
// can execute it later on
future1.runAsyncAndForget
}
This will print:
Getting future 1...
Future 1 finished
Future2: Result of future 1 is: 42
Process finished with exit code 0
I'm trying to use a for comprehension to both run some futures in order and merged results, but also kick off a separate thread after one of those futures completes and not care about the result (basically used to fire some logging info)
I've played around a bit with this with some thread sleeps and it looks like whatever i'm throwing inside the for block will end up blocking the thread.
private def testFunction(): EitherT[Future, Error, Response] =
for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
// Future i want to run on a separate async thread outside the comprehension
_ = runSomeLogging(secondRes)
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
def runSomeLogging(): Future[Either[Error, Response]] =
Thread.sleep(10000)
Future.successful(Right(Response("123")))
}
So this above code will wait the 10 seconds before returning the thirdRes result. My hope was to kick off the runSomeLogging function on a separate thread after the secondRes runs. I thought the usage of = rather than <- would cause that, however it doesn't.
The way I am able to get this to work is below. Basically I run my second future outside of the comprehension and use .onComplete on the previous future to only run my logging if certain conditions were meant from the above comprehension. I only want to run this logging function if the secondRes function is successful in my example here.
private def runSomeLogging(response: SecondRes) =
Thread.sleep(10000)
response.value.onComplete {
case Success(either) =>
either.fold(
_ => { },
response => { logThing() }
)
case _ =>
}
private def testFunction(): EitherT[Future, Error, Response] =
val res = for {
firstRes <- EitherT(client.getFirst())
secondRes <- EitherT(client.getSecond())
thirdRes <- EitherT(client.getThird(secondRes.value))
} yield thirdRes
runSomeLogging(res)
return res
This second example works fine and does what I want, it doesn't block the for comprehension for 10 seconds from returning. However, because there are dependencies of this running for certain pieces of the comprehension, but not all of them, I was hoping there was a way to kick off the job from within the for comprehension itself but let it run on its own thread and not block the comprehension from completing.
You need a function that starts the Future but doesn't return it, so the for-comprehension can move on (since Future's map/flatMap functions won't continue to the next step until the current Future resolves). To accomplish a "start and forget", you need to use a function that returns immediately, or a Future that resolves immediately.
// this function will return immediately
def runSomeLogging(res: SomeResult): Unit = {
// since startLoggingFuture uses Future.apply, calling it will start the Future,
// but we ignore the actual Future by returning Unit instead
startLoggingFuture(res)
}
// this function returns a future that takes 10 seconds to resolve
private def startLoggingFuture(res: SomeResult): Future[Unit] = Future {
// note: please don't actually do Thread.sleep in your Future's thread pool
Thread.sleep(10000)
logger.info(s"Got result $res")
}
Then you could put e.g.
_ = runSomeLogging(res)
or
_ <- Future { runSomeLogging(res) }
in your for-comprehension.
Note, Cats-Effect and Monix have a nice abstraction for "start but ignore result", with io.start.void and task.startAndForget respectively. If you were using IO or Task instead of Future, you could use .start.void or .startAndForget on the logging task.
I have a sequence of scala Futures of same type.
I want, after some limited time, to get a result for the entire sequence while some futures may have succeeded, some may have failed and some haven't completed yet, the non completed futures should be considered failed.
I don't want to use Await each future sequentially.
I did look at this question: Scala waiting for sequence of futures
and try to use the solution from there, namely:
private def lift[T](futures: Seq[Future[T]])(implicit ex: ExecutionContext) =
futures.map(_.map { Success(_) }.recover { case t => Failure(t) })
def waitAll[T](futures: Seq[Future[T]])(implicit ex: ExecutionContext) =
Future.sequence(lift(futures))
futures: Seq[Future[MyObject]] = ...
val segments = Await.result(waitAll(futures), waitTimeoutMillis millis)
but I'm still getting a TimeoutException, I guess because some of the futures haven't completed yet.
and that answer also states,
Now Future.sequence(lifted) will be completed when every future is completed, and will represent successes and failures using Try.
But I want my Future to be completed after the timeout has passed, not when every future in the sequence has completed. What else can I do?
If I used raw Future (rather than some IO monad which has this functionality build-in, or without some Akka utils for exactly that) I would hack together utility like:
// make each separate future timeout
object FutureTimeout {
// separate EC for waiting
private val timeoutEC: ExecutorContext = ...
private def timeout[T](delay: Long): Future[T] = Future {
blocking {
Thread.sleep(delay)
}
throw new Exception("Timeout")
}(timeoutEC)
def apply[T](fut: Future[T], delat: Long)(
implicit ec: ExecutionContext
): Future[T] = Future.firstCompletedOf(Seq(
fut,
timeout(delay)
))
}
and then
Future.sequence(
futures
.map(FutureTimeout(_, delay))
.map(Success(_))
.recover { case e => Failure(e) }
)
Since each future would terminate at most after delay we would be able to collect them into one result right after that.
You have to remember though that no matter how would you trigger a timeout you would have no guarantee that the timeouted Future stops executing. It could run on and on on some thread somewhere, it's just that you wouldn't wait for the result. firstCompletedOf just makes this race more explicit.
Some other utilities (like e.g. Cats Effect IO) allow you to cancel computations (which is used in e.g. races like this one) but you still have to remember that JVM cannot arbitrarily "kill" a running thread, so that cancellation would happen after one stage of computation is completed and before the next one is started (so e.g. between .maps or .flatMaps).
If you aren't afraid of adding external deps there are other (and more reliable, as Thread.sleep is just a temporary ugly hack) ways of timing out a Future, like Akka utils. See also other questions like this.
Here is solution using monix
import monix.eval.Task
import monix.execution.Scheduler
val timeoutScheduler = Scheduler.singleThread("timeout") //it's safe to use single thread here because timeout tasks are very fast
def sequenceDiscardTimeouts[T](tasks: Task[T]*): Task[Seq[T]] = {
Task
.parSequence(
tasks
.map(t =>
t.map(Success.apply) // Map to success so we can collect the value
.timeout(500.millis)
.executeOn(timeoutScheduler) //This is needed to run timesouts in dedicated scheduler that won't be blocked by "blocking"/io work if you have any
.onErrorRecoverWith { ex =>
println("timed-out")
Task.pure(Failure(ex)) //It's assumed that any error is a timeout. It's possible to "catch" just timeout exception here
}
)
)
.map { res =>
res.collect { case Success(r) => r }
}
}
Testing code
implicit val mainScheduler = Scheduler.fixedPool(name = "main", poolSize = 10)
def slowTask(msg: String) = {
Task.sleep(Random.nextLong(1000).millis) //Sleep here to emulate a slow task
.map { _ =>
msg
}
}
val app = sequenceDiscardTimeouts(
slowTask("1"),
slowTask("2"),
slowTask("3"),
slowTask("4"),
slowTask("5"),
slowTask("6")
)
val started: Long = System.currentTimeMillis()
app.runSyncUnsafe().foreach(println)
println(s"Done in ${System.currentTimeMillis() - started} millis")
This will print an output different for each run but it should look like following
timed-out
timed-out
timed-out
3
4
5
Done in 564 millis
Please note the usage of two separate schedulers. This is to ensure that timeouts will fire even if the main scheduler is busy with business logic. You can test it by reducing poolSize for main scheduler.
I have a future(doFour) that is executed and results passed to a flatmap.
Inside the flatmap I execute two more future(doOne and doTwo) functions expecting them to run on parallel but I see they are running sequentially (2.13). Scastie
Why are doOne and doTwo not execute in parallel ?
How can I have them to run in parallel ?
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
object Test {
def doOne(): Future[Unit] = Future {
println("startFirst"); Thread.sleep(3000); println("stopFirst")
}
def doTwo(): Future[Unit] = Future {
println("startSecond"); Thread.sleep(1000); println("stopSecond")
}
def doFour(): Future[Unit] = Future {
println("do 4"); Thread.sleep(1000); println("done 4")
}
def main(args: Array[String]) {
val resOut = doFour().flatMap { a =>
val futureOperations = Seq(doOne(), doTwo())
val res = Future.sequence(futureOperations)
res
}
val stream = Await.result(resOut, Duration.Inf)
}
}
A Future becomes eligible for execution as soon as it is created. So this line creates two Futures that can potentially be executed:
val futureOperations = Seq(doOne(), doTwo())
The call to Future.sequence will create a new Future that waits for each of the futures to complete in turn, but they will both already be available for execution by this point in the code.
val res = Future.sequence(futureOperations)
If you want Futures to start sequentially you need to use map/flatMap:
val res = doOne().map( _ => doTwo())
With this code doTwo will not be called until doOne completes (and not at all if doOne fails)
The reason that this does not appear to happen in your example is that you are calling a blocking operation in your Future which is blocking a thread that would otherwise be used to execute other Futures. So although there are two Futures available for execution, only one is actually being executed at a time.
If you mark the code as blocking it works correctly:
import scala.concurrent.blocking
def doOne(): Future[Unit] = Future {
blocking{println("startFirst"); Thread.sleep(3000); println("stop First")}
}
def doTwo(): Future[Unit] = Future {
blocking{println("startSecond"); Thread.sleep(1000); println("stop Second")}
}
See the comments section for details of why the default behaviour is different on different versions, and why you should never make assumptions about the relative execution order of independent Futures.
Suppose I've got a function like this:
import scala.concurrent._
def plus2Future(fut: Future[Int])
(implicit ec: ExecutionContext): Future[Int] = fut.map(_ + 2)
Now I am using plus2Future to create a new Future:
import import scala.concurrent.ExecutionContext.Implicits.global
val fut1 = Future { 0 }
val fut2 = plus2Future(fut1)
Does function plus2 always run now on the same thread as fut1 ? I guess it does not.
Does using map in plus2Future add an overhead of thread context switching, creating a new Runnable etc. ?
Does using map in plus2Future add an overhead of thread context
switching, creating a new Runnable etc. ?
map, on the default implementation of Future (via DefaultPromise) is:
def map[S](f: T => S)(implicit executor: ExecutionContext): Future[S] = { // transform(f, identity)
val p = Promise[S]()
onComplete { v => p complete (v map f) }
p.future
}
Where onComplete creates a new CallableRunnable and eventually invokes dispatchOrAddCallback:
/** Tries to add the callback, if already completed, it dispatches the callback to be executed.
* Used by `onComplete()` to add callbacks to a promise and by `link()` to transfer callbacks
* to the root promise when linking two promises togehter.
*/
#tailrec
private def dispatchOrAddCallback(runnable: CallbackRunnable[T]): Unit = {
getState match {
case r: Try[_] => runnable.executeWithValue(r.asInstanceOf[Try[T]])
case _: DefaultPromise[_] => compressedRoot().dispatchOrAddCallback(runnable)
case listeners: List[_] => if (updateState(listeners, runnable :: listeners)) () else dispatchOrAddCallback(runnable)
}
}
Which dispatches the call to the underlying execution context. This means that it is up to the implementing ExecutionContext to decide how and where the future is ran, so one cannot give a deterministic answer of "It runs here or there". We can see that there is both an allocation of a Promise and a callback object.
In general, I wouldn't worry about this, unless I've benchmarked the code and found this to be a bottleneck.
No, it cannot (deterministically) use the same thread (unless the EC only has one thread) since an unbounded amount of time can elapse between the two lines of code.