What is the cost of a Scala Future? - scala

What is the cost of a scala Future? Is it bad practice to spin up, say, 1000 of them only to flatMap them away again right away?
In my case, I don't need 1000 futures - I could actually get away with about 10 or so, but it makes my code cleaner to use more futures, and I'm trying to get a sense of tradeoffs between code elegance and abusing resources. Obviously if I had blocking code, they'd be expensive, but if not, how many should I feel free to spin up to save a few lines of code?

You say you create some of them just to deal with a homogeneous list of Future[T]. In that case, if you just want to lift some T to a Future[T], you can do Future.successful(myValue). This causes no asynchronous background operations to be performed. It's a ready value, just wrapped in Future context.
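To make that concrete, here is a minimal sketch, assuming a hypothetical load helper where only some inputs need real asynchronous work. The names LiftDemo and load are made up for illustration. Note that Future.successful only avoids scheduling the creation of the value; later map/flatMap calls on it still run their functions on an execution context.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object LiftDemo {
  // Hypothetical helper: even ids are "already known", odd ids need real work.
  def load(id: Int): Future[String] =
    if (id % 2 == 0) Future.successful(s"cached-$id") // already completed, no task is submitted
    else Future { s"computed-$id" }                   // submitted to the execution context

  def main(args: Array[String]): Unit = {
    // A homogeneous List[Future[String]], combined into one Future[List[String]].
    val all = Future.sequence(List(1, 2, 3).map(load))
    println(Await.result(all, 5.seconds))
  }
}
```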
EDIT: After re-reading your question and comments, I believe this is enough for an answer. Continue reading for extra info.
Regarding flatMapping, be aware that if you create 1000 futures beforehand as 1000 different vals, they will start right away (well, whenever JVM execution context decides that it's a good time to start, but definitely as soon as possible). However, if you create them in-place inside the flatMap, they will be chained (whole point of M-word's flatMap is to chain stuff in sequential series, with each step possibly depending on the result of previous one).
Demo:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
val f1 = Future({ Thread.sleep(2000); println("one") })
val f2 = Future({ Thread.sleep(2000); println("two") })
val result1 = f1.flatMap(f => f2)
Await.result(result1, 5.seconds)
val result2 = Future({ Thread.sleep(2000); println("one") })
.flatMap(f => Future({ Thread.sleep(2000); println("two") }))
Await.result(result2, 5.seconds)
In the case of result1, you will get "one" and "two" printed out together after two seconds (sometimes even "two" first, then "one"). But in the second case, where we created the futures in-place inside the flatMap, "one" is printed after two seconds, and then "two" after another two seconds. This means that if you chain 1000 Futures like this, but the chain breaks at step 42, the remaining 958 futures will never be created or run.
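The "chain breaks" behaviour can be checked with a small sketch. Everything here (ChainDemo, step, the choice of failing at step 3 out of 10) is a made-up illustration, not part of the original demo. A counter records how many steps were actually started.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object ChainDemo {
  // Returns the outcome of the chain and the number of steps that were started.
  def run(): (Try[Int], Int) = {
    val started = new AtomicInteger(0)

    // Hypothetical step: fails at step 3, otherwise passes its number on.
    def step(n: Int): Future[Int] = Future {
      started.incrementAndGet()
      if (n == 3) throw new IllegalStateException(s"step $n failed")
      n
    }

    // Each step is created inside flatMap, so it is only created
    // once the previous step has completed successfully.
    val chained = (1 to 10).foldLeft(Future.successful(0)) { (acc, n) =>
      acc.flatMap(_ => step(n))
    }

    val outcome = Try(Await.result(chained, 5.seconds))
    (outcome, started.get)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Steps 4 to 10 are never created, because their flatMap functions are skipped once the chain has failed.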
By combining these two facts about Futures you can avoid creating unnecessary ones.
I hope I helped at least a little, because regarding your main question - how much memory and other overhead does a Future cost - I don't have the numbers. That really depends on the settings of your JVM and the machine that's running the code. I do think however that even if your system can take anything you throw at it, you shouldn't be doing (a lot of) unnecessary background Future computations. And even though there are such things as sensible timeouts and cancelling via their respective Promises, creating an extra million of Futures you won't need sounds like a bad design IMHO.
(Note that I said "background computations". If you mainly need all these Futures to keep all types "in the future level" so that the whole code is easier to work with (e.g. with for comprehensions), in that case aforementioned Future.successful is your friend since it's not a computation, just an already computed value stored in a Future context)

I might have misunderstood your question. Correct me if I am mistaken.
What is the cost of a scala Future?
Whenever you wrap expression(s) in a future, a Java Runnable is created under the hood. A Runnable is just an interface with a run() method. Your block of code is then wrapped inside the run method of the Runnable. This Runnable is then submitted to the execution context.
In a very general sense, a future is nothing more than a Runnable with a bunch of helper methods. An instance of a future is no different from other objects. You may reference this thread to get a rough idea of the memory consumption of a single Java object.
If you are interested, you can trace the whole chain of action starting from the creation of a future.

Related

Opportunistic, partial and asynchronous pre-processing of a synchronously processed iterator

Let us use Scala.
I'm trying to find the best possible way to do an opportunistic, partial, and asynchronous pre-computation of some of the elements of an iterator that is otherwise processed synchronously.
The below image illustrates the problem.
There is a lead thread (blue) that takes an iterator and a state. The state contains mutable data that must be protected from concurrent access. Moreover, the state must be updated while the iterator is processed from the beginning, sequentially, and in order because the elements of the iterator depend on previous elements. Moreover, the nature of the dependency is not known in advance.
Processing some elements may lead to substantial overhead (2 orders of magnitude) compared to others, meaning that some elements take 1 ms to compute and others take 300 ms. It would lead to significant improvements in running time if I could pre-process the next k elements speculatively. Speculative pre-processing on asynchronous threads is possible (while the blue thread is processing synchronously), but the blue thread must validate whether the result of the pre-computation is still valid at that time. Usually (90% of the time), it should be valid. Thus, launching separate asynchronous threads to speculatively pre-process the remaining portion of the iterator would save many of those 300 ms waits.
I have studied comparisons of asynchronous and functional libraries of Scala to understand better which model of computation, or in other words, which description of computation (which library) could be a better fit for this processing problem. I was thinking about communication patterns and came up with the following ideas:
AKKA
Use an AKKA actor Blue for the blue thread that takes the iterator, and for each step, it sends a Step message to itself. On a Step message, before it starts processing the next (ith) element, it sends a PleasePreprocess(i+k) message with the (i+k)th element to one of the k pre-processor actors in place. Blue would Step to i+1 if and only if PreprocessingKindlyDone(i+1) is received.
AKKA Streams
AFAIK AKKA streams also support the previous two-way backpressure mechanism, therefore, it could be a good candidate to implement what actors do without actually using actors.
Scala Futures
While the blue thread processes elements with processElement(e) in iterator.map(processElement(_)), it would also spawn Futures for preprocessing. However, maintaining these pre-processing Futures and awaiting their states would require a semi-blocking implementation in pure Scala as I see it, so I would not go in this direction to the best of my current knowledge.
Use Monix
I have some knowledge of Monix but could not wrap my head around how this problem could be elegantly solved with Monix. I'm not seeing how the blue thread could wait for the result of i+1 and then continue. For this, I was thinking of using something like a sliding window with foldLeft(blueThreadAsZero){ (blue, preProc1, preProc2, notYetPreProc) => ... }, but could not find a similar construction.
Possibly, there could be libraries I did not mention that could better express computational patterns for this.
I hope I have described my problem adequately. Thank you for the hints/ideas or code snippets!
You need blocking anyhow, if your blue thread happens to go faster than the yellow ones. I don't think you need any fancy libraries for this, "vanilla scala" should do (like it actually does in most cases). Something like this, perhaps ...
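import scala.concurrent.{ExecutionContext, Future}
def doit[T,R](it: Iterator[T], yellow: T => R, blue: R => R)(implicit ec: ExecutionContext): Future[Seq[R]] = it
  .map { elem => Future(yellow(elem)) } // start preprocessing each element
  .foldLeft(Future.successful(List.empty[R])) { (last, next) =>
    last.flatMap { acc => next.map(blue).map(_ :: acc) } // run blue strictly in order
  }.map(_.reverse)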
I didn't test or compile this, so it could need some tweaks, but conceptually, this should work: pass through the iterator and start preprocessing right away, then fold to tuck the "validation" on each completing preprocess sequentially.
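For what it's worth, here is a self-contained usage sketch of that idea. It assumes an implicit ExecutionContext parameter (Future(...) and flatMap need one) and uses made-up yellow and blue functions on Int. PreprocessDemo and run are hypothetical names. Note that this variant starts preprocessing for every element during the fold, not only for the next k.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PreprocessDemo {
  // yellow: independent preprocessing, blue: the step that must happen in order.
  def doit[T, R](it: Iterator[T], yellow: T => R, blue: R => R)(
      implicit ec: ExecutionContext): Future[Seq[R]] =
    it.map(elem => Future(yellow(elem)))            // foldLeft pulls these, starting each future
      .foldLeft(Future.successful(List.empty[R])) { (last, next) =>
        // blue for an element runs only after everything before it has finished
        last.flatMap(acc => next.map(blue).map(_ :: acc))
      }
      .map(_.reverse)

  def run(): Seq[Int] = {
    import ExecutionContext.Implicits.global
    val yellow: Int => Int = n => { Thread.sleep(10); n * n } // hypothetical slow preprocessing
    val blue: Int => Int = _ + 1                              // hypothetical sequential step
    Await.result(doit(Iterator(1, 2, 3, 4), yellow, blue), 5.seconds)
  }

  def main(args: Array[String]): Unit = println(run())
}
```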
I would split the processing into two steps, the pre-processing that could be run in parallel and the dependent one which has to be serial.
Then, you can just create a stream of data from the iterator, do a parallel map applying the preprocess step, and finish with a fold.
Personally I would use fs2, but the same approach can be expressed with any streaming solution like AkkaStreams, Monix Observables or ZIO ZStreams
import fs2.Stream
import cats.effect.IO
val finalState =
  Stream
    .fromIterator[IO](iterator = ???, chunkSize = ???)
    .parEvalMap(maxConcurrent = ???)(elem => IO(preProcess(elem)))
    .compile
    .fold(initialState) {
      case (acc, elem) =>
        computeNewState(acc, elem)
    }
PS: Remember to benchmark to make sure parallelism is actually speeding things up; it may not be worth the hassle.

What is Future and Promise in Scala? [duplicate]

I am trying to get my head around Scala's promise and future constructs.
I've been reading Futures and Promises in Scala Documentation and am a bit confused as I've got a feeling that the concepts of promises and futures are mixed up.
In my understanding a promise is a container that we could populate
value in a later point. And future is some sort of an asynchronous
operation that would complete in a different execution path.
In Scala we can obtain a result using the attached callbacks to future.
Where I'm lost is how promise has a future?
I have read about these concepts in Clojure too, assuming that promise and future have some generic common concept, but it seems like I was wrong.
A promise p completes the future returned by p.future. This future is
specific to the promise p. Depending on the implementation, it may be
the case that p.future eq p.
val p = promise[T]
val f = p.future
You can think of futures and promises as two different sides of a pipe.
On the promise side, data is pushed in, and on the future side, data can be pulled out.
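A minimal sketch of the pipe picture, written against current Scala versions, where a promise is created with Promise[T]() (the lowercase promise[T] helper shown above is from the older API). PipeDemo and its method names are made up for illustration.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object PipeDemo {
  def run(): Int = {
    val p: Promise[Int] = Promise[Int]() // write side of the pipe
    val f: Future[Int]  = p.future       // read side of the pipe

    val doubled = f.map(_ * 2)           // a consumer subscribes before any value exists
    p.success(21)                        // the producer pushes the value in
    Await.result(doubled, 5.seconds)
  }

  // A promise can be completed only once; trySuccess reports whether it won.
  def secondCompletionAccepted(): Boolean = {
    val p = Promise[Int]()
    p.success(1)
    p.trySuccess(2)
  }

  def main(args: Array[String]): Unit = {
    println(run())
    println(secondCompletionAccepted())
  }
}
```

Code that only receives f can read or subscribe, but has no way to complete it.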
And future is some sort of an asynchronous operation that would complete in a different execution path.
Actually, a future is a placeholder object for a value that may become available at some point in time, asynchronously. It is not the asynchronous computation itself.
The fact that there is a future constructor called future that returns such a placeholder object and spawns an asynchronous computation that completes this placeholder object does not mean that the asynchronous computation is called a future. There are also other future constructors/factory methods.
But the point I do not get is how promise has a future?
To divide promises and futures into 2 separate interfaces was a design decision. You could have these two under the same interface Future, but that would then allow clients of futures to complete them instead of the intended completer of the future. This would cause unexpected errors, as there could be any number of contending completers.
E.g. for the asynchronous computation spawned by the future construct, it would no longer be clear whether it has to complete the promise, or if the client will do it.
Futures and promises are intended to constrain the flow of data in the program.
The idea is to have a future client that subscribes to the data to act on it once the data arrives.
The role of the promise client is to provide that data.
Mixing these two roles can lead to programs that are harder to understand or reason about.
You might also ask why the Promise trait does not extend Future. This is another design decision to discourage programmers from blindly passing Promises to clients where they should upcast the Promise to Future (this upcast is prone to be left out, whereas having to explicitly call future on the promise ensures you call it every time). In other words, by returning a promise you are giving the right to complete it to somebody else, and by returning the future you are giving the right to subscribe to it.
EDIT:
If you would like to learn more about futures, Chapter 4 in the Learning Concurrent Programming in Scala book describes them in detail. Disclaimer: I'm the author of the book.
The difference between the two is that futures are usually centered around the computation while promises are centered around data.
It seems your understanding matches this, but let me explain what I mean:
In both scala and clojure futures are (unless returned by some other function/method) created with some computation:
// scala
future { do_something() }
;; clojure
(future (do-something))
In both cases the "return-value" of the future can be read (without blocking) only after the computation has terminated. When this happens is typically outside the control of the programmer, as the computation gets executed in some thread (pool) in the background.
In contrast in both cases promises are an initially empty container, which can later be filled (exactly once):
// scala
val p = promise[Int]
...
p success 10 // or failure Exception()
;; clojure
(def p (promise))
(deliver p 10)
Once this is the case it can be read.
Reading the futures and promises is done through deref in clojure (and realized? can be used to check if deref will block). In scala reading is done through the methods provided by the Future trait. In order to read the result of a promise we thus have to obtain an object implementing Future; this is done by p.future. Now if the trait Future is implemented by a Promise, then p.future can return this and the two are equal. This is purely an implementation choice and does not change the concepts. So you were not wrong!
In any case Futures are mostly dealt with using callbacks.
At this point it might be worthwhile to reconsider the initial characterization of the two concepts:
Futures represent a computation that will produce a result at some point. Let's look at one possible implementation: We run the code in some thread(pool) and once it's done, we arrange to use the return value to fulfill a promise. So reading the result of the future is reading a promise. This is clojure's way of thinking (not necessarily of implementation).
On the other hand a promise represents a value that will be filled at some point. When it gets filled this means that some computation produced a result. So in a way this is like a future completing, so we should consume the value in the same way, using callbacks; This is scala's way of thinking.
Note that under the hood Future is implemented in terms of Promise and this Promise is completed with the body you passed to your Future:
def apply[T](body: =>T): Future[T] = impl.Future(body) //here I have omitted the implicit ExecutionContext
impl.Future is an implementation of Future trait:
def apply[T](body: =>T)(implicit executor: ExecutionContext): scala.concurrent.Future[T] =
{
val runnable = new PromiseCompletingRunnable(body)
executor.prepare.execute(runnable)
runnable.promise.future
}
Where PromiseCompletingRunnable looks like this:
class PromiseCompletingRunnable[T](body: => T) extends Runnable {
val promise = new Promise.DefaultPromise[T]()
override def run() = {
promise complete {
try Success(body) catch { case NonFatal(e) => Failure(e) }
}
} }
So you see, even though they are separate concepts that you can make use of independently, in reality you can't get a Future without using a Promise.

What are some best practices to mix async libraries with sync code in scala

I'm working on Scala code where a 3rd party library returns a Future[Boolean], and I need to consume this future in my own code, which is written in a fully synchronous manner.
Currently, I'm calling Await.result on the 3rd party library operation to ensure it returns just a Boolean. Is there a better way to handle this, given that my Scala code needs a Boolean value for further operations?
As Luis noted in the comments, in general there's no alternative to Awaiting on the Future.
That said, you may have some choice about where to Await.
For instance, if you have code like
val result = Await.result(someFuture, Duration.Inf)
f(result)
It may be more useful to run f in Future land with
Await.result(someFuture.map(f), Duration.Inf)
If f happens to block, then it may be worth either wrapping f in blocking or explicitly using an ExecutionContext which will handle a lot of its threads being blocked (e.g. one that can have more threads than cores) for the map.
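As a rough sketch of the blocking variant: thirdPartyCheck and f below are hypothetical stand-ins for the library call and the synchronous step, and EdgeDemo is a made-up name. The blocking marker from scala.concurrent hints to the execution context that the wrapped code may block; how much that helps depends on the execution context in use.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object EdgeDemo {
  // Hypothetical stand-in for the third-party call.
  def thirdPartyCheck(): Future[Boolean] = Future(true)

  // Hypothetical synchronous step that may block for a while.
  def f(ok: Boolean): String = {
    Thread.sleep(50)
    if (ok) "proceed" else "abort"
  }

  def run(): String = {
    // Keep the work in "Future land" and await once, at the edge.
    val pipeline = thirdPartyCheck().map(ok => blocking(f(ok)))
    Await.result(pipeline, 5.seconds)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

A finite timeout is used here instead of Duration.Inf so that a stuck call surfaces as an exception. That is a choice made for this sketch, not something the approach requires.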
In general, you'll want to move Awaits as close to the outermost edge of your code as you can, even shifting edges if you can.

Traversing lists and streams with a function returning a future

Introduction
Scala's Future (new in 2.10 and now 2.9.3) is an applicative functor, which means that if we have a traversable type F, we can take an F[A] and a function A => Future[B] and turn them into a Future[F[B]].
This operation is available in the standard library as Future.traverse. Scalaz 7 also provides a more general traverse that we can use here if we import the applicative functor instance for Future from the scalaz-contrib library.
These two traverse methods behave differently in the case of streams. The standard library traversal consumes the stream before returning, while Scalaz's returns the future immediately:
import scala.concurrent._
import ExecutionContext.Implicits.global
// Hangs.
val standardRes = Future.traverse(Stream.from(1))(future(_))
// Returns immediately.
val scalazRes = Stream.from(1).traverse(future(_))
There's also another difference, as Leif Warner observes here. The standard library's traverse starts all of the asynchronous operations immediately, while Scalaz's starts the first, waits for it to complete, starts the second, waits for it, and so on.
Different behavior for streams
It's pretty easy to show this second difference by writing a function that will sleep for a few seconds for the first value in the stream:
def howLong(i: Int) = if (i == 1) 10000 else 0
import scalaz._, Scalaz._
import scalaz.contrib.std._
def toFuture(i: Int)(implicit ec: ExecutionContext) = future {
printf("Starting %d!\n", i)
Thread.sleep(howLong(i))
printf("Done %d!\n", i)
i
}
Now Future.traverse(Stream(1, 2))(toFuture) will print the following:
Starting 1!
Starting 2!
Done 2!
Done 1!
And the Scalaz version (Stream(1, 2).traverse(toFuture)):
Starting 1!
Done 1!
Starting 2!
Done 2!
Which probably isn't what we want here.
And for lists?
Strangely enough the two traversals behave the same in this respect on lists—Scalaz's doesn't wait for one future to complete before starting the next.
Another future
Scalaz also includes its own concurrent package with its own implementation of futures. We can use the same kind of setup as above:
import scalaz.concurrent.{ Future => FutureZ, _ }
def toFutureZ(i: Int) = FutureZ {
printf("Starting %d!\n", i)
Thread.sleep(howLong(i))
printf("Done %d!\n", i)
i
}
And then we get the behavior of Scalaz on streams for lists as well as streams:
Starting 1!
Done 1!
Starting 2!
Done 2!
Perhaps less surprisingly, traversing an infinite stream still returns immediately.
Question
At this point we really need a table to summarize, but a list will have to do:
Streams with standard library traversal: consume before returning; don't wait for each future.
Streams with Scalaz traversal: return immediately; do wait for each future to complete.
Scalaz futures with streams: return immediately; do wait for each future to complete.
And:
Lists with standard library traversal: don't wait.
Lists with Scalaz traversal: don't wait.
Scalaz futures with lists: do wait for each future to complete.
Does this make any sense? Is there a "correct" behavior for this operation on lists and streams? Is there some reason that the "most asynchronous" behavior—i.e., don't consume the collection before returning, and don't wait for each future to complete before moving on to the next—isn't represented here?
I cannot answer it all, but I'll try on some parts:
Is there some reason that the "most asynchronous" behavior—i.e., don't
consume the collection before returning, and don't wait for each
future to complete before moving on to the next—isn't represented
here?
If you have dependent calculations and a limited number of threads, you can experience deadlocks. For example, if you have two futures depending on a third one (all three in the list of futures) and only two threads, you can end up in a situation where the first two futures block both threads and the third one never gets executed. (Of course, if your pool size is one, i.e. you execute one calculation after the other, you can get similar situations.)
To solve this, you need one thread per future, without any limitation. This works for small lists of futures, but not for big ones. So if you run everything in parallel, you will get a situation where small examples run fine in all cases and bigger ones deadlock. (Example: developer tests run fine, production deadlocks.)
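The pool-size-one case can be reproduced in a few lines. This is only a sketch with made-up names (StarvationDemo, run) and a deliberately tiny pool. The inner Await uses a timeout so the demo terminates instead of hanging forever.

```scala
import java.util.concurrent.{Executors, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Try}

object StarvationDemo {
  // Returns true if the outer future failed because the inner one never got a thread.
  def run(): Boolean = {
    val pool = Executors.newFixedThreadPool(1) // a single worker thread
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val outer = Future {
        val inner = Future(42)          // queued behind `outer` on the only thread
        Await.result(inner, 500.millis) // blocks that thread, so `inner` cannot start
      }
      Try(Await.result(outer, 5.seconds)) match {
        case Failure(_: TimeoutException) => true
        case _                            => false
      }
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Without the timeout on the inner Await, the outer future would wait forever on this pool.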
Is there a "correct" behavior for this operation on lists and streams?
I think it is impossible with futures. If you know more about the dependencies, or if you know for sure that the calculations will not block, a more concurrent solution might be possible. But executing lists of futures looks to me "broken by design". The best solution seems to be one that already fails with a deadlock on small examples (i.e. execute one Future after the other).
Scalaz futures with lists: do wait for each future to complete.
I think scalaz uses for comprehensions internally for traversal. With for comprehensions, it is not guaranteed that the calculations are independent. So I guess that Scalaz is doing the right thing here with for comprehensions: doing one calculation after the other. In the case of futures, this will always work, given you have unlimited threads in your operating system.
So in other words: You see just an artifact of how for comprehensions (must) work.
I hope this makes some sense.
If I understand the question correctly, I think it really comes down to the semantics of streams vs lists.
Traversing a list does what we'd expect from the docs:
Transforms a TraversableOnce[A] into a Future[TraversableOnce[B]] using the provided function A => Future[B]. This is useful for performing a parallel map. For example, to apply a function to all items of a list in parallel:
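A minimal sketch of such a parallel map over a list, using a hypothetical slowSquare function (TraverseDemo and its members are made-up names):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object TraverseDemo {
  // Hypothetical slow function to apply to every element.
  def slowSquare(n: Int): Int = { Thread.sleep(100); n * n }

  def run(): List[Int] = {
    // One future per element is created up front; results keep the list order.
    val squares: Future[List[Int]] =
      Future.traverse(List(1, 2, 3))(n => Future(slowSquare(n)))
    Await.result(squares, 5.seconds)
  }

  def main(args: Array[String]): Unit = println(run())
}
```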
With streams, it's up to the developer to decide how they want it to work because it depends on more knowledge of the stream than the compiler has (streams can be infinite, but the type system doesn't know about it). If my stream is reading lines from a file, I want to consume it first, since chaining futures line by line wouldn't actually parallelize things. In this case, I would want the parallel approach.
On the other hand, if my stream is an infinite list generating sequential integers and hunting for the first prime greater than some large number, it would be impossible to consume the stream first in one sweep (the chained Future approach would be required, and we'd probably want to run over batches from the stream).
Rather than trying to figure out a canonical way to handle this, I wonder if there are missing types that would help make the different cases more explicit.