I have been trying to understand why Scala Futures are regarded as eager and said to violate referential transparency. I think I understand that part reasonably well. However, I have trouble understanding what the following means with respect to a Future:
(A => Unit) => Unit
I am not sure if this is the right forum, but ELI5 answers are appreciated.
The reason Future is regarded as eager (and as such violates referential transparency) is that it starts evaluating as soon as it is defined. Below are the ELI5 and non-ELI5 explanations for this.
As for (A => Unit) => Unit, it's the signature of a callback-driven asynchronous computation. In a synchronous computation, you evaluate the Future[A] to A, even if that means sitting in place and waiting a long time for the evaluation to finish. But with an asynchronous computation, you don't sit and wait; instead, you pass a function of type A => Unit, and you immediately get the Unit back. Later, when the computation has finished in the background and a value A has been produced, the function A => Unit will be applied to it. So basically you tell the Future "once you obtain A, here's what I want you to do with it", and it responds "OK, will do, here's a Unit for you, leave now and do other stuff".
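Here's a minimal sketch of that shape in plain Scala (the function name `asyncAnswer` and the use of a Promise to observe the result are my own illustration, not part of Future's API):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// A toy asynchronous computation with the shape (A => Unit) => Unit
// (here A = Int): it takes a callback, returns Unit immediately, and
// invokes the callback later, once the value has been produced.
def asyncAnswer(callback: Int => Unit): Unit =
  global.execute { () =>
    Thread.sleep(100) // simulate background work
    callback(42)      // hand the result to the callback
  }

// To observe the result here, we bridge the callback style back to a
// Future via a Promise:
val p = Promise[Int]()
asyncAnswer(i => p.success(i)) // returns Unit right away
println(Await.result(p.future, 1.second)) // prints 42
```

Note how the call to `asyncAnswer` itself gives you nothing back but Unit; all the useful work happens through the callback.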
TBH I wouldn't overthink this signature too much, because that's not what your mental model of working with Future should be. Instead, just become familiar with the notions of mapping and flatMapping. When you have a value wrapped in a Future, you shouldn't try to get that value out of the Future context, because that would be a blocking synchronous operation. But what you can do is map over it and say "alright Future, I don't need this value A right now, I just want to describe a function A => B to you which turns it into another value B, and you make sure to apply it once you have the original A". And if B is wrapped in yet another Future, meaning your function is not A => B but A => Future[B], instead of mapping you should use flatMap. This is how you chain asynchronous operations. Imagine a database query which needs, as a parameter, something returned by the previous query.
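A short sketch of that chaining idea (the "queries" `fetchUser` and `fetchOrderCount` are hypothetical stand-ins, not a real database API):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical "queries" standing in for real database calls:
case class User(id: Int, name: String)
def fetchUser(id: Int): Future[User] = Future(User(id, "Anna"))
def fetchOrderCount(user: User): Future[Int] = Future(user.id * 10)

// flatMap chains the second query after the first completes, feeding
// its result in as a parameter:
val orderCount: Future[Int] =
  fetchUser(1).flatMap(user => fetchOrderCount(user))

// The same chain written as a for comprehension:
val orderCount2: Future[Int] = for {
  user  <- fetchUser(1)
  count <- fetchOrderCount(user)
} yield count

println(Await.result(orderCount, 1.second)) // prints 10
```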
And that's it. Somewhere at the end of the world, e.g. when you're done processing an http request and are ready to send some response payload over the wire, you will finally unwrap that future in a synchronous way (you can't send a payload if you don't know what to put in it).
Now, about referential transparency in Future:
ELI5:
Imagine you have two daughters, Anna and Betty. You tell them that their task is to count to 20 out loud. You also tell them that Betty should start only after Anna is done. The whole process is hence expected to take about 40 seconds.
But if they evaluate their task eagerly (like Future does), they will each start counting right away, as soon as you explain the task to them. The whole process will hence last only about 20 seconds.
In the context of programming, referential transparency says that you should always be able to replace (pseudocode):
// imagine >> as a pipe operator which starts the next function
// only after previous one has terminated
count(20) >> count(20)
with
anna = count(20)
betty = count(20)
anna >> betty
but that's not true in this situation, because of eager evaluation: the girls start counting as soon as their task is explained to them, so in the second case the program lasts only about 20 seconds regardless of the pipe.
non-ELI5:
Let's prepare an execution context for Future and a function to evaluate. It simply sleeps for two seconds and then prints "hi".
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global

def f = {
  Thread.sleep(2000)
  println("hi")
}
Let's now write a for comprehension which will create two Futures one after another:
val done = for {
  f1 <- Future(f)
  f2 <- Future(f)
} yield (f1, f2)
import scala.concurrent.duration._
Await.result(done, 5.seconds)
As expected, after two seconds we'll get the first "hi" (from f1), and after additional two seconds we'll get the second "hi" (from f2).
Now let's do a small modification; we will first define two Future values, and then we'll use those in the for comprehension:
val future1 = Future(f)
val future2 = Future(f)
val done = for {
  f1 <- future1
  f2 <- future2
} yield (f1, f2)
import scala.concurrent.duration._
Await.result(done, 5.seconds)
What happens this time is that after approximately two seconds you get two simultaneous "hi" printouts. This is because both future1 and future2 started getting evaluated as soon as they were defined. By the time they got chained in the for comprehension, they were already running alongside each other on the given execution context.
This is why referential transparency is broken; normally you should be able to replace:
doStuff(foo)
with
val f = foo
doStuff(f)
without having any consequence on the behaviour of the program, but in the case of Future, as you can see above, that's not the case.
Related
for {
i <- 1 to 5
} yield Future(println(i))
Desugared to (roughly):
List(1, 2, 3, 4, 5).map { i => Future(println(i)) }
The above code prints numbers in random order.
Now, if we see the multiple definitions of Monad:
a) Monad is a wrapper over an object
b) Monad is a mechanism for sequencing computations
The question I'm trying to answer is: shouldn't the map operation on the List monad wait for the first element in the list to be printed, and only then move on to the computation of the second element, regardless of Future?
Sorry, it might be simple and I'm complicating it but it gets trickier for me to find simple reasoning. Answers will be much appreciated:)
Compare:
for {
_ <- Future(println(1))
_ <- Future(println(2))
_ <- Future(println(3))
_ <- Future(println(4))
_ <- Future(println(5))
} yield ()
or
Future(println(1)).flatMap { _ =>
Future(println(2))
}.flatMap { _ =>
Future(println(3))
}.flatMap { _ =>
Future(println(4))
}.flatMap { _ =>
Future(println(5))
}
with
List(
Future(println(1)),
Future(println(2)),
Future(println(3)),
Future(println(4)),
Future(println(5))
)
The first two create the next Future only after the former has completed and made its result available. The last one creates all the Futures at once (and in this regard it doesn't differ much from your example with List[Future]).
Future (as opposed to IO from Cats Effect, Monix's Task or ZIO) is eager, so it starts execution the moment you create it. For that reason you have sequential result in the first two examples, and random order (race condition) in the third example.
If you used IO instead of Future it would be more apparent because you wouldn't be able to just have List[IO[Unit]] and execute side effects - you would have to somehow combine the different IOs into one, and the way you would do it would make it obvious whether the effects will be sequential or parallel.
The bottom line is: whether or not Future is a monad depends on how .flatMap behaves (and how it behaves in combination with Future.successful), so your results don't invalidate the claim that Future is a monad. (You can have some doubts if you start checking its behaviour with exceptions, but that's another topic.)
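That claim can be spot-checked. For example, here's a small sketch checking the left-identity monad law, Future.successful(a).flatMap(f) versus f(a), for a pure function f (it only demonstrates the law for this one input, of course, not in general):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Left identity: Future.successful(a).flatMap(f) should behave the
// same as f(a), at least for a side-effect-free f:
val a = 21
val f: Int => Future[Int] = n => Future.successful(n * 2)

val lhs = Await.result(Future.successful(a).flatMap(f), 1.second)
val rhs = Await.result(f(a), 1.second)
assert(lhs == rhs) // both evaluate to 42
```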
The execution of map is indeed sequential, but when you wrap each element in a Future it gets executed asynchronously, i.e. evaluated on another thread. Because of that, it is not possible to know which thread is going to finish earlier; that depends on the thread management of the operating system, among other considerations.
Both of your code snippets are still monadic, loosely speaking. When you call .map() on your object, map picks elements one by one, in order (from index 0 to index 4), and passes each one to the operation block (the body of the map; map is a higher-order function that accepts a function of type f: This => That).
So the monadic operation's responsibility is picking each element up and passing it as a parameter to a function.
In your case the actual function type is:
f: Int => Future[Unit]
For clarity, your function actually looks like this:
def someFunction(i: Int): Future[Unit] = {
  Future {
    println(i)
  }
}
So what the map operation did here is pick an item from your object (in sequence, one by one) and call someFunction(i). And that's all the monad does.
Now to answer why your println are random, it's because of JVM threads.
If you re-define the body of your map like this:
List(1, 2, 3, 4, 5)
  .map { i =>
    println(s"Going to invoke the println in another thread for $i")
    Future(println(i))
  }
you'll see that the first println will be in sequence - always! That proves that .map() picks your elements in order. The second println, however, may or may not be out of sequence. This out-of-order behaviour is caused not by the monadic map operation but by the multithreaded nature of multi-core CPUs.
Say I have the following snippet
def testFailure2(): List[Future[Int]] = {
  val f1 = Future.failed(new Exception("ex1"))
  val f2 = Future.successful(2)
  val f3 = Future.successful(5)
  val f4 = Future.failed(new Exception("ex4"))
  List(f1, f2, f3, f4)
}
The return type is List[Future[Int]]. Normally, I could just call Future.sequence and get a Future[List[Int]]. But in this scenario that won't work, because I have failed Futures. So I want to end up with a Future[List[Int]] that ignores the failed Futures. How do I do that?
A second question on a similar topic: I understand filter, collect, partition, etc. on a List. In this scenario, say I wanted to filter/partition the list into two lists:
- Failed Futures in one
- Successfully done Futures in another.
How do I do that?
One way would be to first convert all Future[Int]s to Future[Option[Int]]s that always succeed (but result in None if the original future fails). Then you can use Future.sequence and flatten the result:
def sequenceIgnoringFailures[A](xs: List[Future[A]])(implicit ec: ExecutionContext): Future[List[A]] = {
  val opts = xs.map(_.map(Some(_)).fallbackTo(Future(None)))
  Future.sequence(opts).map(_.flatten)
}
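For the second question (partitioning into failed and successful Futures), one possible sketch is to lift each Future[A] into a Future[Try[A]] that always succeeds, wait for all of them, and then partition on the Try (the helper name `partitionResults` is my own; the `transform(Success(_))` idiom requires Scala 2.12+):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.{Success, Try}

// Lift each Future[A] into a Future[Try[A]] that always succeeds,
// wait for all of them, then partition on the Try:
def partitionResults[A](xs: List[Future[A]]): Future[(List[Try[A]], List[Try[A]])] =
  Future.sequence(xs.map(_.transform(Success(_))))
    .map(_.partition(_.isSuccess))

val fs = List(
  Future.successful(1),
  Future.failed[Int](new Exception("boom")),
  Future.successful(3)
)
val (succeeded, failed) = Await.result(partitionResults(fs), 1.second)
// succeeded contains Success(1) and Success(3); failed contains one Failure
```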
The other answer is correct: you should use a Future[List[X]] where X is something that differentiates between failure and success. It can be an Option, an Either, a Try, or whatever you want.
It seems like you're bothered by this, and I suppose it's because you were hoping to find something like:
Do all these futures in parallel, ignore the failed ones during the process
And you're given
Do all these futures, wait for everything to finish, and discard based on the result
But actually, there is no special way to express "ignore the failed ones". Something has to acknowledge each future's result, since you're interested in it; otherwise starting it makes no sense in the first place. And this something has to wait for all the futures to finish anyway. And as such, the flag for "you can now ignore me" is indeed the Option being None, the Either being Left, or the Try being Failure. There is not, afaik, a specific flag on futures for "this result is being discarded", and I don't think Scala would need one.
So, fear not, and go for Future[List[X]], because it actually expresses what you want ! :-)
What is the cost of a scala Future? Is it bad practice to spin up, say, 1000 of them only to flatMap them away again right away?
In my case, I don't need 1000 futures - I could actually get away with about 10 or so, but it makes my code cleaner to use more futures, and I'm trying to get a sense of tradeoffs between code elegance and abusing resources. Obviously if I had blocking code, they'd be expensive, but if not, how many should I feel free to spin up to save a few lines of code?
You say you create some of them just to deal with a homogeneous list of Future[T]. In that case, if you just want to lift some T to a Future[T], you can do Future.successful(myValue). This causes no asynchronous background operations to be performed. It's a ready value, just wrapped in Future context.
EDIT: After re-reading your question and comments, I believe this is enough for an answer. Continue reading for extra info.
Regarding flatMapping, be aware that if you create 1000 futures beforehand as 1000 different vals, they will start right away (well, whenever JVM execution context decides that it's a good time to start, but definitely as soon as possible). However, if you create them in-place inside the flatMap, they will be chained (whole point of M-word's flatMap is to chain stuff in sequential series, with each step possibly depending on the result of previous one).
Demo:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
val f1 = Future({ Thread.sleep(2000); println("one") })
val f2 = Future({ Thread.sleep(2000); println("two") })
val result1 = f1.flatMap(f => f2)
Await.result(result1, 5 seconds)
val result2 = Future({ Thread.sleep(2000); println("one") })
.flatMap(f => Future({ Thread.sleep(2000); println("two") }))
Await.result(result2, 5 seconds)
In the case of result1, you will get "one" and "two" printed out together after two seconds (sometimes even "two" before "one"). But in the second case, where we created the futures in-place inside the flatMap, "one" is printed after two seconds, and then "two" after another two seconds. This means that if you chain 1000 Futures like this, but the chain breaks at step 42, the remaining 958 futures will not be computed.
By combining these two facts about Futures you can avoid creating unnecessary ones.
I hope I helped at least a little, because regarding your main question - how much memory and other overhead does a Future cost - I don't have the numbers. That really depends on the settings of your JVM and the machine that's running the code. I do think however that even if your system can take anything you throw at it, you shouldn't be doing (a lot of) unnecessary background Future computations. And even though there are such things as sensible timeouts and cancelling via their respective Promises, creating an extra million of Futures you won't need sounds like a bad design IMHO.
(Note that I said "background computations". If you mainly need all these Futures to keep all types "in the future level" so that the whole code is easier to work with (e.g. with for comprehensions), in that case aforementioned Future.successful is your friend since it's not a computation, just an already computed value stored in a Future context)
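A quick sketch of that last point, for illustration:

```scala
import scala.concurrent.Future
import scala.util.Success

// Future.successful wraps an already-computed value. No task is
// submitted to any execution context (notice no implicit EC is
// needed), so nothing runs in the background:
val ready: Future[Int] = Future.successful(42)
assert(ready.isCompleted)                // completed at creation time
assert(ready.value == Some(Success(42))) // the value is already there
```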
I might have misunderstood your question. Correct me if I am mistaken.
What is the cost of a scala Future?
Whenever you wrap expression(s) in a Future, a Java Runnable is created under the hood. A Runnable is just an interface with a run() method. Your block of code is wrapped inside the run method of that Runnable, and the Runnable is then submitted to the execution context.
In a very general sense, a future is nothing more than a runnable with a bunch of helper methods. An instance of Future is no different from other objects. You may reference this thread to get a rough idea of the memory consumption of a single Java object.
If you are interested, you can trace the whole chain of actions starting from the creation of a Future.
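To make the mechanics concrete, here is a deliberately simplified sketch (not the real implementation) of roughly what Future.apply does: wrap the body in a Runnable, submit it to the execution context, and complete a Promise with the outcome:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

// Simplified sketch of Future.apply: a Runnable submitted to the
// execution context, completing a Promise with the result:
def simpleFuture[A](body: => A)(implicit ec: ExecutionContext): Future[A] = {
  val p = Promise[A]()
  ec.execute(new Runnable {
    def run(): Unit = p.complete(Try(body))
  })
  p.future
}

implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

println(Await.result(simpleFuture(40 + 2), 1.second)) // prints 42
```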
Given the following scalaz.concurrent.Task instance created via Task#delay:
val t =
Task.delay { println(Thread.currentThread); Thread.sleep(5000); 42 }
I wrote a method that will run t asynchronously.
def f = t.runAsync {
case \/-(x) => println(x)
case -\/(e) => println(e.getMessage)
}
Running it shows that f evaluates entirely, i.e. waits 5 seconds, and then evaluates again. In other words, the second f appears to wait until the first f has completed:
scala> {f; f; }
Thread[run-main-0,5,run-main-group-0]
42
Thread[run-main-0,5,run-main-group-0]
42
Then, I re-wrote t using Task#apply:
val u =
Task { println(Thread.currentThread); Thread.sleep(5000); 42 }
Again, I defined a method that executes u with runAsync:
def g = u.runAsync {
case \/-(x) => println(x)
case -\/(e) => println(e.getMessage)
}
Finally, I ran two g's.
scala> {g; g}
Thread[pool-3-thread-2,5,run-main-group-0]
Thread[pool-3-thread-3,5,run-main-group-0]
scala> 42
42
However, in the above result, the g's ran, more or less, at the same time.
I had expected that {f; f; } would run asynchronously, i.e. in the same way as g. But it seems that calling f resulted in blocking.
EDIT
Task's docs note on runAsync:
Any pure, non-asynchronous computation at the head of this Future will be forced in the calling thread.
Since t's body is non-asynchronous, I suppose that the above comment explains why it blocked, i.e. "forced in the calling thread."
When is the right time to use Task#delay versus Task#apply?
You can think of Task.delay as a fancy version of something like () => Try[A]. It suspends evaluation of the computation, but doesn't have anything to say about what thread that evaluation is eventually going to run on, etc. (which means it's just going to run on the current thread).
This is often exactly what you want. Consider a definition like this:
val currentTime: Task[Long] = Task.delay(System.currentTimeMillis)
We can't use now because that would evaluate the time immediately (and only once, on definition). We could use apply, but forcing an asynchronous boundary for this computation is wasteful and unnecessary—we actually want it to run in the current thread, just not right now. This is exactly what delay provides.
In general when you're modeling your computations, if something is always going to be computationally expensive, you might want to consider Task.apply, which means the evaluation will always happen on a thread determined by the current implicit ExecutorService. This may make usage a little cleaner, at the expense of flexibility—you're baking something you know about the runtime characteristics of the evaluation of the computation into its definition.
The nice thing about using delay to define your asynchronous computations is that you can always force an asynchronous boundary by wrapping your Task with Task.fork, which gets you essentially the same thing you'd have if you'd defined the computation with Task.apply. It's not possible to go in the other direction—if you use Task.apply, the implicit strategy is going to determine where the computation is evaluated and that's all there is to it.
I am having an issue with the piece of code below. I want the combine method to be triggered only after the groundCoffee, heatedWater, and frothedMilk Futures all complete. All four methods grind, heatWater, frothMilk, and brew are executed concurrently using Futures.
def prepareCappuccino(): Future[Cappuccino] = {
  val groundCoffee = grind("arabica beans")
  val heatedWater = heatWater(Water(20))
  val frothedMilk = frothMilk("milk")
  for {
    ground <- groundCoffee
    water <- heatedWater
    foam <- frothedMilk
    espresso <- brew(ground, water)
  } yield combine(espresso, foam)
}
When I execute the above method the output I am getting is below
start grinding...
heating the water now
milk frothing system engaged!
And the program exits after this. I got this example from a site while I was trying to learn Futures. How can the program be made to wait, so that the combine method gets triggered after all the Futures return?
The solution already posted, Awaiting the future, is what you want when you deliberately intend to block execution on that thread. Two common reasons to do this are testing, when you want to wait for the outcome before making an assertion, and when otherwise all threads would exit (as is the case when running toy examples).
However in a proper long lived application Await is generally to be avoided.
Your question already contains one of the correct ways to do future composition: a for comprehension. Bear in mind that for-comprehensions are converted to flatMap, map, and withFilter operations, so any Futures you create inside the for-comprehension will only be created after the previous ones complete, i.e. serially.
If you want a bunch of Futures to run concurrently, create them before entering the for-comprehension, as you have done.
You can use the Await here:
val f = Future.sequence(futures.toList)
Await.ready(f, Duration.Inf)
I assume you have all the futures packed in a list. Await.ready does all the waiting work.
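Putting the two pieces together for the cappuccino example, here is a runnable sketch (the `step` function is a toy stand-in for grind, heatWater, etc., whose bodies are not shown in the question):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Toy stand-in for grind/heatWater/frothMilk (an assumption; the real
// bodies are not shown in the question):
def step(name: String): Future[String] = Future { println(name); name }

def prepare(): Future[String] = {
  // Created before the for comprehension, so all three run concurrently:
  val ground  = step("grinding")
  val water   = step("heating")
  val frothed = step("frothing")
  for {
    g <- ground
    w <- water
    f <- frothed
  } yield s"combine($g, $w, $f)"
}

// In a toy program, block once at the very end, so the JVM does not
// exit before the background threads finish:
println(Await.result(prepare(), 5.seconds))
```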