I'm trying to use ScalaTest's asynchronous test suites, but aside from some restrictions around setting timeouts and the like, I don't see what the test suites actually add.
I wonder if anyone versed in asynchronous testing with ScalaTest could quickly explain the differences between the asynchronous test suites and org.scalatest.concurrent. What do the async test suites actually add over org.scalatest.concurrent? Is one approach better than the other?
We compare the following ScalaTest facilities for testing code that returns Futures:
Asynchronous style traits, for example, AsyncFlatSpec
ScalaFutures
Eventually
Asynchronous style traits
class AsyncSpec extends AsyncFlatSpec {
  ...
  Future(3).map { v => assert(v == 3) }
  ...
}
non-blocking
we can assert before Future completes, i.e., return Future[Assertion] instead of Assertion
thread-safe
single-threaded serial execution context
Futures execute and complete one after another, in the order they were started
the same thread that is used to enqueue tasks in the test body is also used to execute them afterwards
Assertions can be mapped over Futures
no need to block inside the test body, i.e., no Await or whenReady needed
eliminates flakiness due to thread starvation
the last expression in the test body must be a Future[Assertion]
does not support multiple independent assertions in the test body (see the sketch after this list)
cannot use blocking constructs inside the test body, as that will hang the test forever, waiting on a task that was enqueued but never started
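As a minimal sketch (assuming ScalaTest 3.0.x, where AsyncFlatSpec lives at org.scalatest.AsyncFlatSpec), several checks can still be folded into the single Future[Assertion] the test must return, for example with a for-comprehension:
import org.scalatest.AsyncFlatSpec
import scala.concurrent.Future

class CombinedAssertionsSpec extends AsyncFlatSpec {
  "an async test" should "fold several checks into one Future[Assertion]" in {
    for {
      a <- Future(3)
      b <- Future(4)
    } yield assert(a == 3 && b == 4) // the yielded Assertion makes the whole block a Future[Assertion]
  }
}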
ScalaFutures
class ScalaFuturesSpec extends FlatSpec with ScalaFutures {
  ...
  whenReady(Future(3)) { v => assert(v == 3) }
  ...
}
blocking
we must wait for the Future to complete before we can return an Assertion
not thread-safe
likely to be used with the global execution context scala.concurrent.ExecutionContext.Implicits.global, which is a multi-threaded pool for parallel execution
supports multiple assertions within the same test body (see the sketch after this list)
the last expression in the test body does not have to be an Assertion
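A minimal sketch (assuming ScalaTest 3.0.x and the global execution context): both whenReady and futureValue block until the Future completes, so several assertions can live in one test body:
import org.scalatest.FlatSpec
import org.scalatest.concurrent.ScalaFutures
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

class MultipleAssertionsSpec extends FlatSpec with ScalaFutures {
  "ScalaFutures" should "allow several assertions per test" in {
    whenReady(Future(3)) { v => assert(v == 3) }
    assert(Future(4).futureValue == 4) // futureValue also blocks until the Future completes
  }
}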
Eventually
class EventuallySpec extends FlatSpec with Eventually {
  ...
  eventually { assert(Future(3).value.contains(Success(3))) }
  ...
}
a more general facility, intended not just for Futures
the semantics are those of retrying a by-name block of code of any type until the assertion inside it is satisfied
when testing Futures, it is likely that the global execution context will be used
intended primarily for integration testing, where we test against real services with unpredictable response times (see the sketch below)
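A minimal sketch (assuming ScalaTest 3.0.x and the global execution context) of tuning how long and how often eventually retries, via an explicit PatienceConfig, against a Future that takes a while to complete:
import org.scalatest.FlatSpec
import org.scalatest.concurrent.Eventually
import org.scalatest.time.{Millis, Seconds, Span}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.util.Success

class EventuallyPatienceSpec extends FlatSpec with Eventually {
  // retry for up to 5 seconds, checking every 100 milliseconds
  implicit override val patienceConfig: PatienceConfig =
    PatienceConfig(timeout = Span(5, Seconds), interval = Span(100, Millis))

  "a slow computation" should "eventually yield its value" in {
    val f = Future { Thread.sleep(500); 3 }
    eventually { assert(f.value.contains(Success(3))) }
  }
}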
Single-threaded serial execution model vs. thread-pooled global execution model
scalatest-async-testing-comparison is an example project demonstrating the difference between the two execution models.
Given the following test body
Given the following test body
val f1 = Future {
  val tmp = mutableSharedState
  Thread.sleep(5000)
  println(s"Start Future1 with mutableSharedState=$tmp in thread=${Thread.currentThread}")
  mutableSharedState = tmp + 1
  println(s"Complete Future1 with mutableSharedState=$mutableSharedState")
}

val f2 = Future {
  val tmp = mutableSharedState
  println(s"Start Future2 with mutableSharedState=$tmp in thread=${Thread.currentThread}")
  mutableSharedState = tmp + 1
  println(s"Complete Future2 with mutableSharedState=$mutableSharedState")
}

for {
  _ <- f1
  _ <- f2
} yield {
  assert(mutableSharedState == 2)
}
let us compare the output of AsyncSpec with that of ScalaFuturesSpec:
testOnly example.AsyncSpec:
Start Future1 with mutableSharedState=0 in thread=Thread[pool-11-thread-3-ScalaTest-running-AsyncSpec,5,main]
Complete Future1 with mutableSharedState=1
Start Future2 with mutableSharedState=1 in thread=Thread[pool-11-thread-3-ScalaTest-running-AsyncSpec,5,main]
Complete Future2 with mutableSharedState=2
testOnly example.ScalaFuturesSpec:
Start Future2 with mutableSharedState=0 in thread=Thread[scala-execution-context-global-119,5,main]
Complete Future2 with mutableSharedState=1
Start Future1 with mutableSharedState=0 in thread=Thread[scala-execution-context-global-120,5,main]
Complete Future1 with mutableSharedState=1
Note how in the serial execution model the same thread is used and the Futures complete in order. In the global execution model, on the other hand, different threads are used, and Future2 completes before Future1, which causes a race condition on the shared mutable state and in turn makes the test fail.
Which one should we use (IMO)?
In unit tests we should use mocked subsystems whose returned Futures complete near-instantly, so there is no need for Eventually in unit tests. Hence the choice is between the async styles and ScalaFutures. The main difference between the two is that the former is non-blocking, unlike the latter. If possible we should never block, so we should prefer async styles like AsyncFlatSpec. Another big difference is the execution model: async styles by default use a custom serial execution model, which provides thread-safety on shared mutable state, unlike the global thread-pool-backed execution model often used with ScalaFutures. In conclusion, my suggestion is to use the async style traits unless we have a good reason not to.
Related
I'm a Spark Scala programmer. I have a Spark job with sub-tasks that must all complete before the whole job is done. I wanted to use Futures to run the sub-tasks in parallel. On completion of the whole job I have to return the whole job's response.
What I have heard about Scala Futures is that once the main thread has finished and exits, the remaining threads are killed and you get an empty response.
I have to use Await.result to collect the results, but all the blogs say you should avoid Await.result and that it's a bad practice.
Is using Await.result the correct way of doing this in my case or not?
def computeParallel(): Future[String] = {
  val f1 = Future { "ss" }
  val f2 = Future { "sss" }
  val f3 = Future { "ssss" }

  for {
    r1 <- f1
    r2 <- f2
    r3 <- f3
  } yield (r1 + r2 + r3)
}

computeParallel().map(result => ???)
To my understanding, we have to use Future in webservice-style applications, where there is one process that is always running and never exits. But in my case, once the logic execution (the Scala program) is complete, it will exit.
Can I use Future for my problem or not?
Using futures in Spark is probably not advisable except in special cases, and simply parallelizing computation isn't one of them (giving a non-blocking wrapper to blocking I/O, e.g. making requests to an outside service, is quite possibly the only special case).
Note that Future doesn't guarantee parallelism (whether and how they're executed in parallel depends on the ExecutionContext in which they're run), just asynchrony. Also, in the event that you're spawning computation-performing futures inside a Spark transformation (i.e. on the executor, not the driver), chances are that there won't be any performance improvement, since Spark tends to do a good job of keeping the cores on the executors busy, all spawning those futures does is contend for cores with Spark.
Broadly, be very careful about combining parallelism abstractions like Spark RDDs/DStreams/Dataframes, actors, and futures: there are a lot of potential minefields where such combinations can violate guarantees and/or conventions in the various components.
It's also worth noting that Spark has requirements around serializability of intermediate values and that futures aren't generally serializable, so a Spark stage can't result in a future; this means that you basically have no choice but to Await on the futures spawned in a stage.
If you still want to spawn futures in a Spark stage (e.g. posting them to a web service), it's probably best to use Future.sequence to collapse the futures into one and then Await on that (note that I have not tested this idea: I'm assuming that there's an implicit CanBuildFrom[Iterator[Future[String]], String, Future[String]] available):
import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

def postString(s: String): Future[Unit] = ???

def postStringRDD(rdd: RDD[String]): RDD[String] = {
  rdd.foreachPartition { strings =>
    // since this only gets used for combining the futures in the Await,
    // it's probably OK to use the implicit global execution context here
    import scala.concurrent.ExecutionContext.Implicits.global
    // collapse the per-element futures into one and block until they have all completed
    Await.result(Future.sequence(strings.map(postString).toList), Duration.Inf)
  }
  rdd // pass through the original RDD
}
We have a Scala Play webapp which does a number of database operations as part of an HTTP request, each of which returns a Future. Usually we bubble up the Futures to an async controller action and let Play handle waiting for them.
But I've also noticed that in a number of places we don't bubble up the Future or even wait for it to complete. I think this is bad because it means the HTTP request won't fail if the future fails, but does it actually even guarantee the future will be executed at all, since nothing is going to wait on its result? Will Play drop un-awaited futures after the HTTP request has been served, or leave them running in the background?
TL;DR
Play will not kill your Futures after sending the HTTP response.
Errors will not be reported if any of your Futures fail.
Long version
Your futures will not be killed when the HTTP response has been sent. You can try that out for yourself like this:
def futuresTest = Action.async { request =>
  println(s"Entered futuresTest at ${LocalDateTime.now}")

  val ignoredFuture = Future {
    var i = 0
    while (i < 10) {
      Thread.sleep(1000)
      println(LocalDateTime.now)
      i += 1
    }
  }

  println(s"Leaving futuresTest at ${LocalDateTime.now}")
  Future.successful(Ok)
}
However, you are right that the request will not fail if any of the futures fail. If this is a problem, then you can compose the futures using a for-comprehension or flatMaps. Here's an example of what you can do (I'm assuming that your Futures only perform side effects, i.e. Future[Unit]).
To let your futures execute in parallel:
val dbFut1 = dbCall1(...)
val dbFut2 = dbCall2(...)
val wsFut1 = wsCall1(...)

val fut = for (
  _ <- dbFut1;
  _ <- dbFut2;
  _ <- wsFut1
) yield ()

fut.map(_ => Ok)
To have them execute in sequence
val fut = for (
  _ <- dbCall1(...);
  _ <- dbCall2(...);
  _ <- wsCall2(...)
) yield ()

fut.map(_ => Ok)
does it actually even guarantee the future will be executed at all,
since nothing is going to wait on the result of it? Will Play drop
un-awaited futures after the HTTP request has been served, or leave
them running in the background?
This question actually runs much deeper than Play. You're generally asking "If I don't synchronously wait on a future, how can I guarantee it will actually complete without being GCed?". To answer that, we need to understand how the GC actually views threads. From the GC's point of view, a thread is what we call a "root". Such a root is the starting point from which the heap traversal finds which objects are still reachable and which ones are eligible for collection. Among the roots are also static fields, for example, which are known to live throughout the lifetime of the application.
So, when you view it like that, and think about what a Future actually does, which is to queue a function that runs on a dedicated thread from the pool of threads available via the underlying ExecutorService (which we refer to as an ExecutionContext in Scala), you see that even though you're not waiting on the completion, the JVM runtime does guarantee that your Future will run to completion. As for the Future object wrapping the function, it holds a reference to that unfinished function body, so the Future itself isn't collected.
When you think about it from that point of view, it's totally logical, since execution of a Future happens asynchronously, and we usually continue processing it in an asynchronous manner using continuations such as map, flatMap, onComplete, etc.
What is, in your view, the best Scala solution for the case where you have a plain chain of synchronous function calls and need to add an asynchronous action in the middle of it, without blocking?
I think going for futures entails refactoring the existing code into callback spaghetti, threading futures all the way through the formerly synchronous chain of function calls, or polling on the promise at some interval; not optimal, but maybe I am missing some trivial or suitable options here.
Refactoring towards (Akka) actors, in turn, seems to entail a whole lot of boilerplate for such a simple feat of engineering.
How would you plug an asynchronous function into an existing synchronous flow, without blocking, and without a lot of boilerplate?
Right now in my code I block with Await.result, which just means a thread is sleeping...
One simple dirty trick:
Let's say you have:
def f1Sync: T1
def f2Sync: T2
...
def fAsynchronous: Future[TAsync]
import scala.concurrent.{ Future, Promise }

object FutureHelper {
  // A value class is much cheaper than an implicit conversion.
  implicit class FutureConverter[T](val t: T) extends AnyVal {
    def future: Future[T] = Promise.successful(t).future
  }
}
Then you can use for/yield:
import FutureHelper._

def completeChain: Future[Whatever] = {
  for {
    r1 <- f1Sync.future
    r2 <- f2Sync.future
    // ... all your syncs
    rn <- fAsynchronous // this should also return a future
    rnn <- f50Sync(rn).future // you can even pass the result of the async to the next function
  } yield rn
}
There is minimal boilerplate in converting your sync calls to immediately resolved futures. Don't be tempted to do that with Future.apply[T](t) or simply Future(t), as that will put the work onto the executor's threads, and you won't be able to convert without an implicit executor in scope anyway.
With promises you pay the price of three or four promises, which is negligible, and you get what you want. for/yield acts as a sequencer: it will wait for every result in turn, so you can even do something with your async result after it has been processed.
It will "wait" for every sync call to complete as well, which by design happens immediately, except for the async call, where normal async processing is automatically employed.
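For reference, the standard library offers the same shortcut directly via Future.successful; a quick illustration of the difference (nothing here is specific to the helper above):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val eager: Future[Int] = Future.successful(42) // already completed, no executor involved
val scheduled: Future[Int] = Future(42)        // the body is scheduled on the implicit executor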
You could also use the async/await library, with the caveat that you wind up with one big Future out of it that you still have to deal with:
http://docs.scala-lang.org/sips/pending/async.html
But it results in code almost identical to the synchronous code; where you were previously blocking, you add an:
await { theAsyncFuture }
and then carry on with synchronous code.
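A minimal sketch of that approach (assuming the scala-async module is on the classpath; the small stand-in functions mirror the placeholders from the previous answer):
import scala.async.Async.{async, await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

def f1Sync: Int = 1                        // stand-in for a synchronous step
def fAsynchronous: Future[Int] = Future(2) // stand-in for the asynchronous step
def f50Sync(n: Int): Int = n + 50          // stand-in for a later synchronous step

def completeChainAsync: Future[Int] = async {
  val r1 = f1Sync               // plain synchronous call
  val rn = await(fAsynchronous) // suspends here without blocking a thread
  f50Sync(r1 + rn)              // carry on synchronously with the async result
}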
I am having an issue with the piece of code below. I want the 'combine' method to be triggered after the groundCoffee, heatedWater and frothedMilk futures have all completed; they should run concurrently. All four methods grind, heatWater, frothMilk and brew are executed concurrently using a Future.
def prepareCappuccino(): Future[Cappuccino] = {
  val groundCoffee = grind("arabica beans")
  val heatedWater = heatWater(Water(20))
  val frothedMilk = frothMilk("milk")

  for {
    ground <- groundCoffee
    water <- heatedWater
    foam <- frothedMilk
    espresso <- brew(ground, water)
  } yield combine(espresso, foam)
}
When I execute the above method the output I am getting is below
start grinding...
heating the water now
milk frothing system engaged!
And the program exits after this. I got this example from a site while I was trying to learn futures. How can the program be made to wait, so that the combine method gets triggered after all the futures return?
The already-posted solution of using Await on a future is appropriate when you want to deliberately block execution on that thread. Two common reasons to do this are for testing, when you want to wait for the outcome before making an assertion, and when otherwise all threads would exit (as is the case with toy examples).
However, in a proper long-lived application, Await is generally to be avoided.
Your question already contains one of the correct ways to do future composition: using a for-comprehension. Bear in mind that for-comprehensions are converted to flatMap, map and withFilter operations, so any futures you create inside the for-comprehension will only be created after the previous ones complete, i.e. serially.
If you want a bunch of futures to run concurrently, then you should create them before entering the for-comprehension, as you have done (see the sketch below).
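A small sketch of that difference (workA and workB are hypothetical placeholder computations):
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

def workA(): Int = 1 // hypothetical placeholder computations
def workB(): Int = 2

// concurrent: both futures are created, and therefore started, up front
val fa = Future { workA() }
val fb = Future { workB() }
val both = for { a <- fa; b <- fb } yield a + b

// serial: the second Future is only created once the first has completed
val oneAfterTheOther = for {
  a <- Future { workA() }
  b <- Future { workB() }
} yield a + b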
You can use the Await here:
val f = Future.sequence(futures.toList)
Await.ready(f, Duration.Inf)
I assume you have all the futures packed in a list; Await.ready does all the waiting work.
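For the prepareCappuccino example specifically, a toy main can simply block on the final future so the JVM does not exit before combine has run (a minimal sketch; the 30-second timeout is arbitrary):
import scala.concurrent.Await
import scala.concurrent.duration._

// block the main thread until the composed future completes (or the timeout expires)
val cappuccino = Await.result(prepareCappuccino(), 30.seconds)
println(s"Prepared: $cappuccino")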
I am trying to understand Scala futures, coming from a Java background: I understand that you can write:
val f = Future { ... }
then I have two questions:
How is this future scheduled? Automatically?
What scheduler will it use? In Java you would use an executor that could be a thread pool etc.
Furthermore, how can I achieve something like a ScheduledFuture, one that executes after a specific time delay? Thanks
The Future { ... } block is syntactic sugar for a call to Future.apply (as I'm sure you know Maciej), passing in the block of code as the first argument.
Looking at the docs for this method, you can see that it takes an implicit ExecutionContext - and it is this context which determines how it will be executed. Thus to answer your second question, the future will be executed by whichever ExecutionContext is in the implicit scope (and of course if this is ambiguous, it's a compile-time error).
In many cases this will be the one from import ExecutionContext.Implicits.global, which can be tweaked by system properties but by default is a work-stealing fork/join pool with one thread per processor core.
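A minimal sketch of supplying your own ExecutionContext instead of the global one (the pool size of 4 is an arbitrary choice for illustration):
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Futures created while this implicit is in scope are scheduled on the fixed pool below
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

val f = Future { 21 * 2 } // runs automatically on the pool above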
The scheduling however is a different matter. For some use-cases you could provide your own ExecutionContext which always applied the same delay before execution. But if you want the delay to be controllable from the call site, then of course you can't use Future.apply as there are no parameters to communicate how this should be scheduled. I would suggest submitting tasks directly to a scheduled executor in this case.
Andrzej's answer already covers most of the ground in your question. Worth mentioning is that Scala's "default" implicit execution context (import scala.concurrent.ExecutionContext.Implicits._) is literally a java.util.concurrent.Executor, and the whole ExecutionContext concept is a very thin wrapper that is closely aligned with Java's executor framework.
For achieving something similar to scheduled futures, as Mauricio points out, you will have to use promises, and any third party scheduling mechanism.
Not having a common mechanism for this built into Scala 2.10 futures is a pity, but nothing fatal.
A promise is a handle for an asynchronous computation. You create one simply by calling val p = Promise[Int]() (no ExecutionContext is needed just to create it). We just promised an integer.
Clients can grab a future that depends on the promise being fulfilled, simply by calling p.future, which is just a Scala future.
Fulfilling a promise is simply a matter of calling p.success(3), at which point the future will complete.
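The whole lifecycle in a few lines (a minimal sketch):
import scala.concurrent.Promise

val p = Promise[Int]()   // we just promised an Int
val f = p.future         // clients can map/flatMap/onComplete on this Future
p.success(3)             // the promise is fulfilled and f completes with 3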
Play 2.x solves scheduling by using promises and a plain old Java 1.4 Timer.
Here is a linkrot-proof link to the source.
Let's also take a look at the source here:
object Promise {

  private val timer = new java.util.Timer()

  def timeout[A](message: => A, duration: Long, unit: TimeUnit = TimeUnit.MILLISECONDS)
                (implicit ec: ExecutionContext): Future[A] = {
    val p = Promise[A]()
    timer.schedule(new java.util.TimerTask {
      def run() {
        p.completeWith(Future(message)(ec))
      }
    }, unit.toMillis(duration))
    p.future
  }
}
This can then be used like so:
val future3 = Promise.timeout(3, 10000) // will complete after 10 seconds
Notice this is much nicer than plugging a Thread.sleep(10000) into your code, which will block your thread and force a context switch.
Also worth noticing in this example is the val p = Promise... at the function's beginning, and the p.future at the end. This is a common pattern when working with promises. Take it to mean that this function makes some promise to the client, and kicks off an asynchronous computation in order to fulfill it.
Take a look here for more information about Scala promises. Notice they use a lowercase future method from the concurrent package object instead of Future.apply. The former simply delegates to the latter. Personally, I prefer the lowercase future.