Is there an idiomatic way to deal with side-effecting scala.concurrent.Future whose result is not really needed in the caller (i.e. neither success nor failure will be necessary to calculate the calling method's return value)? For example,
object XYZ {
def log(s: String): Future[Unit] = ???
}
Is it correct to simply call XYZ.log("xyz") without any callbacks and proceed with the really important tasks? Won't this orphan Future be considered code smell? Any chance for it to be garbage collected before it is executed?
Won't this orphan Future be considered code smell?
Not any more than any other discarded non-Unit value. Note that some people do consider any discarded value to be a code smell, which is why -Ywarn-value-discard and NonUnitStatements wart exist.
Any chance for it to be garbage collected before it is executed?
The Future object itself might be garbage collected, but it will submit a task to a thread pool which won't be (because the pool holds a reference to the task). Except cases like
def log(s: String): Future[Unit] = Future.just(())
of course.
When we have such Futures which are not part of main business logic but executed for their side effect, that is, represent a separate concern such as logging, then consider running them on a separate dedicated execution context. For example, create a separate pool with
val numOfThreads = 2
val threadPoolForSeparateConcerns = Executors.newFixedThreadPool(numOfThreads, (r: Runnable) => new Thread(r, s"thread-pool-for-separate-concerns-thread-${Random.nextInt(numOfThreads)}"))
val separateConcernsEc = ExecutionContext.fromExecutor(threadPoolForSeparateConcerns)
and then make sure to pass it in when running separate concerns
object XYZ {
def log(s: String, ec: ExecutionContext): Future[Unit] = ???
}
XYZ.log("Failed to calculate 42", separateConcernsEc)
By separating main business logic thread pool, from thread pool for side-concerns, we minimise the chances of breaking main business logic through things like resource starvation etc.
Related
I am trying to do a very simple thing and want to understand the right way of doing it. I need to periodically make some Rest API calls to a separate service and then process the results asynchronously. I am using actor system's default scheduler to schedule the Http requests and have created a separate threadpool to handle the Future callbacks. Since there is no dependency between requests and response I thought a separate threadpool for handling future callbacks should be fine.
Is there some problem with this approach?
I read the Scala doc and it says there is some issue here (though i not clear on it).
Generally what is recommended way of handling these scenarios?
implicit val system = ActorSystem("my-actor-system") // define an actor system
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10)) // create a thread pool
// define a thread which periodically does some work using the actor system's scheduler
system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds)(new Runnable {
override def run(): Unit = {
val urls = getUrls() // get list of urls
val futureResults = urls.map(entry => getData[MyData](entry))) // get data foreach url
futureResults onComplete {
case Success(res) => // do something with the result
case Failure(e) => // do something with the error
}
}
}))
def getdata[T](url : String) : Future[Option[Future[T]] = {
implicit val ec1 = system.dispatcher
val responseFuture: Future[HttpResponse] = execute(url)
responseFuture map { result => {
// transform the response and return data in format T
}
}
}
Whether or not having a separate thread pool really depends on the use case. If the service integration is very critical and is designed to take a lot of resources, then a separate thread pool may make sense, otherwise, just use the default one should be fine. Feel free to refer to Levi's question for more in-depth discussions on this part.
Regarding "job scheduling in an actor system", I think Akka streams are a perfect fit here. I give you an example below. Feel free to refer to the blog post https://blog.colinbreck.com/rethinking-streaming-workloads-with-akka-streams-part-i/ regarding how many things can Akka streams simplify for you.
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}
object Timer {
def main(args: Array[String]): Unit = {
implicit val system: ActorSystem = ActorSystem("Timer")
// default thread pool
implicit val ec: ExecutionContext = system.dispatcher
// comment out below if custom thread pool is needed
// also make sure you read https://doc.akka.io/docs/akka/current/dispatchers.html#setting-the-dispatcher-for-an-actor
// to define the custom thread pool
// implicit val ec: ExecutionContext = system.dispatchers.lookup("my-custom-dispatcher")
Source
.tick(5.seconds, 5.seconds, getUrls())
.mapConcat(identity)
.mapAsync(1)(url => fetch(url))
.runWith(Sink.seq)
.onComplete {
case Success(responses) =>
// handle responses
case Failure(ex) =>
// handle exceptions
}
}
def getUrls(): Seq[String] = ???
def fetch(url: String): Future[Response] = ???
case class Response(body: String)
}
In addition to Yik San Chan's answer above (especially regarding using Akka Streams), I'd also point out that what exactly you're doing in the .onComplete block is quite relevant to the choice of which ExecutionContext to use for the onComplete callback.
In general, if what you're doing in the callback will be doing blocking I/O, it's probably best to do it in a threadpool which is large relative to the number of cores (note that each thread on the JVM consumes about 1MB or so of heap, so it's probably not a great idea to use an ExecutionContext that spawns an unbounded number of threads; a fixed pool of about 10x your core count is probably OK).
Otherwise, it's probably OK to use an ExecutionContext with a threadpool roughly equal in size to the number of cores: the default Akka dispatcher is such an ExecutionContext. The only real reason to consider not using the Akka dispatcher, in my experience/opinion, is if the callback is going to occupy the CPU for a long time. The phenomenon known as "thread starvation" can occur in that scenario, with adverse impacts on performance and cluster stability (if using, e.g. Akka Cluster or health-checks). In such a scenario, I'd tend to use a dispatcher with fewer threads than cores and consider configuring the default dispatcher with fewer threads than the default (while the kernel's scheduler can and will manage more threads ready-to-run than cores, there are strong arguments for not letting it do so).
In an onComplete callback (in comparison to the various transformation methods on Future like map/flatMap and friends), since all you can do is side-effect, it's probably more likely than not that you're doing blocking I/O.
I am aware that in Java 8 it is not a good idea to start long-running tasks in stream framework's filter, map, etc. methods, as there is no way to tune the underlying fork-join pool and it can cause latency problems and starvings.
Now, my question would be, is there any problem like that with Scala? I tried to google it, but I guess I just can't put this question in a google-able sentence.
Let's say I have a list of objects and I want to save them into a database using a forEach, would that cause any issues? I guess this would not be an issue in Scala, as functional transformations are fundamental building blocks of the language, but anyway...
If you don't see any kind of I/O operations then using futures can be an overhead.
def add(x: Int, y: Int) = Future { x + y }
Executing purely CPU-bound operations in a Future constructor will make your logic slower to execute, not faster. Mapping and flatmapping over them, can increase add fuel to this problem.
In case you want to initialize a Future with a constant/simple calculation, you can use Future.successful().
But all blocking I/O, including SQL queries makes sence be wrapped inside a Future with blocking
E.g :
Future {
DB.withConnection { implicit connection =>
val query = SQL("select * from bar")
query()
}
}
Should be done as,
import scala.concurrent.blocking
Future {
blocking {
DB.withConnection { implicit connection =>
val query = SQL("select * from bar")
query()
}
}
This blocking notifies the thread pool that this task is blocking. This allows the pool to temporarily spawn new workers as needed. This is done to prevent starvation in blocking applications.
The thread pool(by default the scala.concurrent.ExecutionContext.global pool) knows when the code in a blocking is completed.(Since it's a fork join thread pool)
Therefore it will remove the spare worker threads as they completes, and the pool will shrink back down to its expected size with time(Number of cores by default).
But this scenario can also backfire if there is not enough memory to expand the thread pool.
So for your scenario, you can use
images.foreach(i => {
import scala.concurrent.blocking
Future {
blocking {
DB.withConnection { implicit connection =>
val query = SQL("insert into .........")
query()
}
}
})
If you're doing a lot of blocking I/O, then it's a good practice to create a separate thread-pool/execution context and execute all blocking calls in that pool.
References :
scala-best-practices
demystifying-the-blocking-construct-in-scala-futures
Hope this helps.
A Future is good at representing a single asynchronous task that will / should be completed within some fixed amount of time.
However, there exist another kind of asynchronous task, one where it's not possible / very hard to know exactly when it will finish. For example, the time taken for a particular string processing task might depend on various factors such as the input size.
For these kind of task, detecting failure might be better by checking if the task is able to make progress within a reasonable amount of time instead of by setting a hard timeout value such as in Future.
Are there any libraries providing suitable monadic abstraction of such kind of task in Scala?
You could use a stream of values like this:
sealed trait Update[T]
case class Progress[T](value: Double) extends Update[T]
case class Finished[T](result: T) extends Update[T]
let your task emit Progress values when it is convenient (e.g. every time a chunk of the computation has finished), and emit one Finished value once the complete computation is finished. The consumer could check for progress values to ensure that the task is still making progress. If a consumer is not interested in progress updates, you can just filter them out. I think this is more composable than an actor-based approach.
Depending on how much performance or purity you need, you might want to look at akka streams or scalaz streams. Akka streams has a pure DSL for building flow graphs, but allows mutability in processing stages. Scalaz streams is more functional, but has lower performance last I heard.
You can break your work into chunks. Each chunk is a Future with a timeout - your notion of reasonable progress. Chain those Futures together to get a complete task.
Example 1 - both chunks can run in parallel and don't depend on each other (embarassingly parallel task):
val chunk1 = Future { ... } // chunk 1 starts execution here
val chunk2 = Future { ... } // chunk 2 starts execution here
val result = for {
c1 <- chunk1
c2 <- chunk2
} yield combine(c1, c2)
Example 2 - second chunk depends on the first:
val chunk1 = Future { ... } // chunk 1 starts execution here
val result = for {
c1 <- chunk1
c2 <- Future { c1 => ... } // chunk 2 starts execution here
} yield combine(c1, c2)
There are obviously other constructs to help you when you have many Futures like sequence.
The article "The worst thing in our Scala code: Futures" by Ken Scambler points the need for a separation of concerns:
scala.concurrent.Future is built to work naturally with scala.util.Try, so our code often ends up clumsily using Try everywhere to represent failure, using raw exceptions as failure values even where no exceptions get thrown.
To do anything with a scala.concurrent.Future, you need to lug around an implicit ExecutionContext. This obnoxious dependency needs to be threaded through everywhere they are used.
So if your code does not depend directly on Future, but on simple Monad properties, you can abstract it with a Monad type:
trait Monad[F[_]] {
def flatMap[A,B](fa: F[A], f: A => F[B]): F[B]
def pure[A](a: A): F[A]
}
// read the type parameter as “for all F, constrained by Monad”.
final def load[F[_]: Monad](pageUrl: URL): F[Page]
C#'s Tasks have ConfigureAwait(false) for libraries to prevent synchronization to (for example) the UI-thread which is not always necessary:
http://msdn.microsoft.com/en-us/magazine/hh456402.aspx
In .NET I believe there can only be one SynchonizationContext, so it's clear on which threadpool a Task should execute it's continuation.
For a library, when you can't assume the user is in a webrequest(in .NET HttpContext.Current.Items flows), commandline (normal multithreaded), XAML/Windows Forms (single UI thread), it's almost always better to use ConfigureAwait(false), so the Waiter knows it can just execute the continuation on whatever thread is being used to call the Waiter (this is only bad if you do blocking code in the library which could lead to thread starvation on the threadpool where the initial workload is started, let assume we don't do that).
The point is that from a library perspective you don't want to use a thread from the caller's threadpool to synchronize a continuation, you just want the continuation to run on whatever thread. This saves a context switch and keeps the load of the UI thread for example.
In Scala, for each operation (namely map) on Futures, you need an ExecutionContext (passed implicitly). This makes managing threadpools incredibly easy, which I like a lot more than the way .NET has somewhat strange TaskFactory's (which nobody seems to use, they just use the default TaskFactory).
My question is, does Scala have the same problem as .NET in respect to context switches being sometimes unnecessary, and if so, is there a way, similar to ConfigureAwait, to fix this?
Concrete example I'm finding in Scala where I wonder about this:
def trace[T](message: => String)(block: => Future[T]): Future[T] = {
if (!logger.isTraceEnabled) block
else {
val startedAt = System.currentTimeMillis()
block.map { result =>
val timeTaken = System.currentTimeMillis() - startedAt
logger.trace(s"$message took ${timeTaken}ms")
result
}
}
}
I'm using play and I generally import play's default, implicit ExecutionContext.
The map on block needs to run on an execution context.
If I wrote this piece of Scala in a library and I would add an implicit parameter executionContext:
def trace[T](message: => String)(block: => Future[T])(implicit executionContext: ExecutionContext): Future[T] = {
instead of importing play's default ExecutionContext in the libary.
I am trying to understand Scala futures coming from Java background: I understand you can write:
val f = Future { ... }
then I have two questions:
How is this future scheduled? Automatically?
What scheduler will it use? In Java you would use an executor that could be a thread pool etc.
Furthermore, how can I achieve a scheduledFuture, the one that executes after a specific time delay? Thanks
The Future { ... } block is syntactic sugar for a call to Future.apply (as I'm sure you know Maciej), passing in the block of code as the first argument.
Looking at the docs for this method, you can see that it takes an implicit ExecutionContext - and it is this context which determines how it will be executed. Thus to answer your second question, the future will be executed by whichever ExecutionContext is in the implicit scope (and of course if this is ambiguous, it's a compile-time error).
In many case this will be the one from import ExecutionContext.Implicits.global, which can be tweaked by system properties but by default uses a ThreadPoolExecutor with one thread per processor core.
The scheduling however is a different matter. For some use-cases you could provide your own ExecutionContext which always applied the same delay before execution. But if you want the delay to be controllable from the call site, then of course you can't use Future.apply as there are no parameters to communicate how this should be scheduled. I would suggest submitting tasks directly to a scheduled executor in this case.
Andrzej's answer already covers most of the ground in your question. Worth mention is that Scala's "default" implicit execution context (import scala.concurrent.ExecutionContext.Implicits._) is literally a java.util.concurrent.Executor, and the whole ExecutionContext concept is a very thin wrapper, but is closely aligned with Java's executor framework.
For achieving something similar to scheduled futures, as Mauricio points out, you will have to use promises, and any third party scheduling mechanism.
Not having a common mechanism for this built into Scala 2.10 futures is a pity, but nothing fatal.
A promise is a handle for an asynchronous computation. You create one (assuming ExecutionContext in scope) by calling val p = Promise[Int](). We just promised an integer.
Clients can grab a future that depends on the promise being fulfilled, simply by calling p.future, which is just a Scala future.
Fulfilling a promise is simply a matter of calling p.successful(3), at which point the future will complete.
Play 2.x solves scheduling by using promises and a plain old Java 1.4 Timer.
Here is a linkrot-proof link to the source.
Let's also take a look at the source here:
object Promise {
private val timer = new java.util.Timer()
def timeout[A](message: => A, duration: Long, unit: TimeUnit = TimeUnit.MILLISECONDS)
(implicit ec: ExecutionContext): Future[A] = {
val p = Promise[A]()
timer.schedule(new java.util.TimerTask {
def run() {
p.completeWith(Future(message)(ec))
}
}, unit.toMillis(duration))
p.future
}
}
This can then be used like so:
val future3 = Promise.timeout(3, 10000) // will complete after 10 seconds
Notice this is much nicer than plugging a Thread.sleep(10000) into your code, which will block your thread and force a context switch.
Also worth noticing in this example is the val p = Promise... at the function's beginning, and the p.future at the end. This is a common pattern when working with promises. Take it to mean that this function makes some promise to the client, and kicks off an asynchronous computation in order to fulfill it.
Take a look here for more information about Scala promises. Notice they use a lowercase future method from the concurrent package object instead of Future.apply. The former simply delegates to the latter. Personally, I prefer the lowercase future.