I am reading through the Scala Cookbook (http://shop.oreilly.com/product/0636920026914.do)
There is an example related to Future use that involves for comprehension.
So far my understanding of for comprehensions is that, when used with a collection, they produce another collection of the same type. For example, if each futureX is of type Future[Int], the following should also be of type Future[Int]:
for {
r1 <- future1
r2 <- future2
r3 <- future3
} yield (r1+r2+r3)
Could someone explain to me what exactly happens when <- is used in this code?
I know that if it were a generator it would fetch each element by looping.
First, about for comprehensions. As has been answered on SO many, many times, they are an abstraction over a couple of monadic operations: map, flatMap, withFilter. When you use <-, scalac desugars that line into a monadic flatMap:
r <- monad   becomes   monad.flatMap(r => ... )
It looks like an imperative computation (which is what a monad is all about): you bind a computation's result to r. The yield part is desugared into a map call. The result type depends on the monad's type.
The Future trait has flatMap and map methods, so we can use for comprehensions with it. Your example can be desugared into the following code:
future1.flatMap(r1 => future2.flatMap(r2 => future3.map(r3 => r1 + r2 + r3) ) )
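To make the equivalence concrete, here is a minimal, self-contained sketch (the values are illustrative) showing that the sugared and hand-desugared forms produce the same result:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DesugarDemo extends App {
  val future1 = Future(1)
  val future2 = Future(2)
  val future3 = Future(3)

  // The sugared form...
  val sugared = for {
    r1 <- future1
    r2 <- future2
    r3 <- future3
  } yield r1 + r2 + r3

  // ...and the hand-desugared form: nested flatMaps ending in a map.
  val desugared =
    future1.flatMap(r1 =>
      future2.flatMap(r2 =>
        future3.map(r3 => r1 + r2 + r3)))

  assert(Await.result(sugared, 1.second) == 6)
  assert(Await.result(desugared, 1.second) == 6)
}
```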
Parallelism aside
It goes without saying that if the execution of future2 depends on r1, then you can't escape sequential execution. But if the future computations are independent, you have two choices: you can enforce sequential execution, or allow for parallel execution. (You can't enforce the latter, as the execution context decides how the futures are scheduled.)
val res = for {
r1 <- computationReturningFuture1(...)
r2 <- computationReturningFuture2(...)
r3 <- computationReturningFuture3(...)
} yield (r1+r2+r3)
will always run sequentially. This is easily explained by the desugaring, after which each subsequent computationReturningFutureX call is only invoked inside the preceding flatMap, i.e.
computationReturningFuture1(...).flatMap(r1 =>
computationReturningFuture2(...).flatMap(r2 =>
computationReturningFuture3(...).map(r3 => r1 + r2 + r3) ) )
However, this version is able to run in parallel, with the for comprehension aggregating the results:
val future1 = computationReturningFuture1(...)
val future2 = computationReturningFuture2(...)
val future3 = computationReturningFuture3(...)
val res = for {
r1 <- future1
r2 <- future2
r3 <- future3
} yield (r1+r2+r3)
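A timing sketch (durations and helper names are illustrative) makes the difference observable: the sequential version takes at least the sum of the two delays, while the pre-created futures overlap.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimingDemo extends App {
  // A deliberately slow computation (200 ms) wrapped in a Future.
  def slow(n: Int): Future[Int] = Future { Thread.sleep(200); n }

  def millisOf[A](body: => A): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000
  }

  // Sequential: the second Future is only created inside the first flatMap.
  val seqMillis = millisOf {
    Await.result(for { a <- slow(1); b <- slow(2) } yield a + b, 5.seconds)
  }

  // Parallel: both Futures are created (and start running) up front.
  val parMillis = millisOf {
    val f1 = slow(1)
    val f2 = slow(2)
    Await.result(for { a <- f1; b <- f2 } yield a + b, 5.seconds)
  }

  println(s"sequential: ${seqMillis}ms, parallel: ${parMillis}ms")
  assert(seqMillis >= 400) // the two sleeps cannot overlap
  // parMillis is typically ~200 ms, since the sleeps overlap on a multi-threaded pool
}
```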
To elaborate on the existing answers, here is a simple example demonstrating how the for comprehension works.
The functions are a bit lengthy, but they are worth looking into.
A function that gives us a range of integers:
scala> def createIntegers = Future{
println("INT "+ Thread.currentThread().getName+" Begin.")
val returnValue = List.range(1, 256)
println("INT "+ Thread.currentThread().getName+" End.")
returnValue
}
createIntegers: scala.concurrent.Future[List[Int]]
A function that gives us a range of chars:
scala> def createAsciiChars = Future{
println("CHAR "+ Thread.currentThread().getName+" Begin.")
val returnValue = new ListBuffer[Char]
for (i <- 1 to 256){
returnValue += i.toChar
}
println("CHAR "+ Thread.currentThread().getName+" End.")
returnValue
}
createAsciiChars: scala.concurrent.Future[scala.collection.mutable.ListBuffer[Char]]
Using these function calls within the for comprehension.
scala> val result = for{
i <- createIntegers
s <- createAsciiChars
} yield i.zip(s)
Await.result(result, Duration.Inf)
result: scala.concurrent.Future[List[(Int, Char)]] = Future(<not completed>)
From the output below we can see that the function calls execute sequentially, i.e. the createAsciiChars call is not executed until createIntegers completes its execution:
scala> INT scala-execution-context-global-27 Begin.
INT scala-execution-context-global-27 End.
CHAR scala-execution-context-global-28 Begin.
CHAR scala-execution-context-global-28 End.
Invoking createIntegers and createAsciiChars outside the for comprehension (assigning their results to vals first) makes the execution asynchronous, so the two futures run in parallel.
It allows r1, r2, r3 to run in parallel, if possible. It may not be possible, depending on things like how many threads are available to execute Future computations. Note that it is creating the futures before the for comprehension, not the syntax itself, that makes parallel execution possible: the comprehension just waits for each result in turn and executes the yield when all have completed.
Related
I am using ZIO: https://github.com/zio/zio
in my build.sbt:
"dev.zio" %% "zio" % "1.0.0-RC9"
No matter what I try, my results are recomputed each time I need them:
val t = Task {
println(s"Compute")
12
}
val r = unsafeRun(for {
tt1 <- t
tt2 <- t
} yield {
tt1 + tt2
})
println(r)
For this example, the log looks like:
Compute
Compute
24
I tried with Promise:
val p = for {
p <- Promise.make[Nothing, Int]
_ <- p.succeed {
println(s"Compute - P")
48
}
r <- p.await
} yield {
r
}
val r = unsafeRun(for {
tt1 <- p
tt2 <- p
} yield {
tt1 + tt2
})
And I get the same issue:
Compute - P
Compute - P
96
I tried with
val p = for {
p <- Promise.make[Nothing, Int]
_ <- p.succeed(48)
r <- p.await
} yield {
println(s"Compute - P")
r
}
first, thinking that maybe the pipeline would be executed without the value being recomputed, but that does not work either.
I would like to be able to compute asynchronously my values and be able to reuse them.
I looked at How do I make a Scalaz ZIO lazy? but it does not work for me either.
ZIO has memoize, which should do essentially what you want. I don't have a way to test it just now, but it should work something like:
for {
memoized <- t.memoize
tt1 <- memoized
tt2 <- memoized
} yield tt1 + tt2
Note that unless the second and third lines of your real code contain branching that could result in the Task never being called, or being called only once, this yields the same answer and side effects as the much simpler:
t.map(tt => tt + tt)
Does computing the results have side effects? If it doesn't you can just use a regular old lazy val, perhaps lifted into ZIO.
lazy val results = computeResults()
val resultsIO = ZIO.succeedLazy(results)
If it does have side effects, you can't really cache the results because that wouldn't be referentially transparent, which is the whole point of ZIO.
What you'll probably have to do is flatMap on your compute Task and write the rest of your program which needs the result of that computation inside that call to flatMap, threading the result value as a parameter through your function calls where necessary.
val compute = Task {
println(s"Compute")
12
}
compute.flatMap { result =>
// the rest of your program
}
If I have the following for comprehension, futures will be executed in order: f1, f2, f3:
val f = for {
r1 <- f1
r2 <- f2(r1)
r3 <- f3(r2)
} yield r3
For this one however, all the futures are started at the same time:
val f = for {
r1 <- f1
r2 <- f2
r3 <- f3
} yield ...
How can I enforce the order?(I want this order of execution f1, f2, f3)
It does matter what f1, f2, f3 are: a future will start executing as soon as it is created. In your first case, f2(r1) must be a function returning a future, so that future begins executing when the function is called, which happens when r1 becomes available.
If the second case is the same (f2 is a function), then behavior will be the same as the first case, your futures will be executed sequentially, one after the other.
But if you create the futures outside the for, and just assign them to variables f1, f2, f3, then by the time you get inside the comprehension, they are already running.
Futures are eager constructs; that is, once created, you cannot dictate when they get processed. If the Future already exists when you attempt to use it in a for comprehension, you've already lost the ability to sequence its execution order.
If you want to enforce ordering on a method that accepts Future arguments then you'll need to wrap the evaluation in a thunk:
def foo(f1: => Future[Thing], f2: => Future[Thing]): Future[Other] = for {
  r1 <- f1
  r2 <- f2
} yield something(r1, r2)
If, on the other hand, you want to define the Future within a method body, then instead of val use a def
def foo() = {
  def f1 = Future { /* code here... */ }
  def f2 = Future { /* code here... */ }
  for {
    r1 <- f1
    r2 <- f2
  } yield something(r1, r2)
}
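A runnable sketch (names are illustrative) showing that the def-based version really does sequence the work: because each Future is only created when the comprehension demands it, the second cannot start before the first has finished.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters._

object OrderDemo extends App {
  val log = new ConcurrentLinkedQueue[String]()

  // Each call creates a fresh Future, so nothing runs until it is demanded.
  def step(name: String): Future[Unit] = Future {
    log.add(s"$name start"); Thread.sleep(100); log.add(s"$name end")
  }

  // step("b") is only invoked inside step("a")'s flatMap,
  // so "b" cannot start before "a" has finished.
  Await.result(for { _ <- step("a"); _ <- step("b") } yield (), 5.seconds)

  assert(log.asScala.toList == List("a start", "a end", "b start", "b end"))
}
```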
Executing futures sequentially in a for comprehension is the default behavior, and it is fine when a few tasks are processed without any blocking.
But if you want to preserve the processing order, you have two ways:
Pass the result of the first task to the second, like in your example
Use the andThen operator
import scala.util.Success

val allposts = mutable.Set[String]()
Future {
  session.getRecentPosts
} andThen {
  case Success(posts) => allposts ++= posts
} andThen {
  case _ =>
    clearAll()
    for (post <- allposts) render(post)
}
Given this simple computation i can not clearly see the difference between using applicative style over monadic style. Are there some better examples out there ( in scala ) when to use the one over the other.
println( (3.some |#| none[Int] |#| 4.some )( (a:Int,b:Int,c:Int) => { a + b + c } ) ) // prints None
println( for(
a <- Some(3);
b <- none[Int];
c <- Some(4)
) yield( a + b + c ) ) // prints None
Both computations end up as None, so the end result is the same. The only difference I can see is that, with the applicative syntax, there is no access to the named intermediate values that the for comprehension provides.
Furthermore, having one None value stops the whole computation. I thought applicative meant "not dependent on the result of the previous computation".
The applicative builder syntax will evaluate each term and cannot use the result of a prior computation. However, even if the first result is None, all the other expressions will still be evaluated.
Whereas, with the for comprehension, it will 'fail fast' (it will not evaluate any further expressions after a None, in your case), plus you can access the results of previous computations.
Don't think of these things as simply different styles, they are calling different functions with different behaviours: i.e. flatMap vs apply
Monads represent sequential computations where each next computation depends on previous ones (if previous computation is empty you can't proceed, so you "fail fast"), more generic example of monadic computation:
println( for(
a <- Some(1);
b <- Some(a);
c <- Some(a + b)
) yield( a + b + c ) ) //=> 4
Applicative is just fmap on steroids, where not only the argument but the mapping function itself can be empty. In your case it can be rewritten as:
4.some <*>
{ none[Int] <*>
{ 3.some <*>
{ (_: Int) + (_: Int) + (_: Int) }.curried.some } }
At some step your function becomes Option[Int => Int] = None, but that doesn't stop it from being applied to 4.some; only the result is None, as expected. You still need to evaluate 4.some.
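The same "apply a possibly-absent function to a possibly-absent value" idea can be sketched in plain Scala, without scalaz (the ap helper below is hypothetical, not a library method):

```scala
object ApDemo extends App {
  // A minimal Option applicative: apply an optional function to an optional value.
  def ap[A, B](fa: Option[A])(ff: Option[A => B]): Option[B] =
    for { f <- ff; a <- fa } yield f(a)

  val add3: Int => Int => Int => Int = a => b => c => a + b + c

  // (3.some |#| none[Int] |#| 4.some)(_ + _ + _), spelled out step by step:
  val step1: Option[Int => Int => Int] = ap(Some(3))(Some(add3))
  val step2: Option[Int => Int]        = ap(None: Option[Int])(step1) // function becomes None here
  val step3: Option[Int]               = ap(Some(4))(step2)           // still applied to Some(4)

  assert(step3.isEmpty) // the result is None, but every term was evaluated
}
```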
Suppose I have 4 future computations to do. The first two can be done in parallel, but the third must be done after the first two (even though the values of the first two are not used in the third -- think of each computation as a command that performs some db operation). Finally, there is a 4th computation that must occur after all of the first 3. Additionally, there is a side effect that can be started after the first 3 complete (think of this as kicking off a periodic runnable). In code, this could look like the following:
for {
_ <- async1 // not done in parallel with async2 :( is there
_ <- async2 // any way of achieving this cleanly inside of for?
_ <- async3
_ = sideEffect // do I need "=" here??
_ <- async4
} yield ()
The comments show my doubts about the quality of the code:
What's the cleanest way to do two operations in parallel in a for comprehension?
Is there is a way to achieve this result without so many "_" characters (nor assigning a named reference, at least in the case of sideEffect)
what's the cleanest and most idiomatic way to do this?
You can use zip to combine two futures, including the result of zip itself. You'll end up with tuples holding tuples, but if you use infix notation for Tuple2 it is easy to take them apart. Below I define a synonym ~ for succinctness (this is what the parser combinator library does, except its ~ is a different class that behaves similarly to Tuple2).
As an alternative for _ = for the side effect, you can either move it into the yield, or combine it with the following statement using braces and a semicolon. I would still consider _ = to be more idiomatic, at least so far as having a side effecting statement in the for is idiomatic at all.
val ~ = Tuple2
for {
a ~ b ~ c <- async1 zip
async2 zip
async3
d <- { sideEffect; async4 }
} yield (a, b, c, d)
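A self-contained sketch of the ~ trick in isolation (the futures are illustrative placeholders): the nested tuples produced by chained zip calls are taken apart with infix Tuple2 patterns.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ZipDemo extends App {
  // A stable identifier for the Tuple2 companion: `a ~ b` now matches Tuple2(a, b).
  val ~ = Tuple2

  val async1 = Future(1)
  val async2 = Future(2)
  val async3 = Future(3)

  // async1 zip async2 zip async3 has type Future[((Int, Int), Int)];
  // the pattern a ~ b ~ c unpacks the nesting.
  val res = for {
    a ~ b ~ c <- async1 zip async2 zip async3
  } yield a + b + c

  assert(Await.result(res, 1.second) == 6)
}
```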
for-comprehensions represent monadic operations, and monadic operations are sequenced. There's a superclass of monad, applicative, where computations don't depend on the results of prior computations and thus may be run in parallel.
Scalaz has a |#| operator for combining applicatives, so you can use (future1 |#| future2)(proc(_, _)) to dispatch two futures in parallel and then run "proc" on the result of both of them, as opposed to sequential computation of for {a <- future1; b <- future2(a)} yield b (or just future1 flatMap future2).
There's already a method on stdlib Futures called .zip that combines Futures in parallel, and indeed the scalaz impl uses this: https://github.com/scalaz/scalaz/blob/scalaz-seven/core/src/main/scala/scalaz/std/Future.scala#L36
And .zip and for-comprehensions may be intermixed to have parallel and sequential parts, as appropriate.
So just using the stdlib syntax, your above example could be written as:
for {
_ <- async1 zip async2
_ <- async3
_ = sideEffect
_ <- async4
} yield ()
Alternatively, written w/out a for-comprehension:
async1 zip async2 flatMap (_=> async3) flatMap {_=> sideEffect; async4}
Just as an FYI, it's really simple to get two futures to run in parallel and still process them via a for-comprehension. The suggested solutions of using zip can certainly work, but I find that when I want to handle a couple of futures and do something when they are all done, and I have two or more that are independent of each other, I do something like this:
val f1 = async1
val f2 = async2
//First two futures now running in parallel
for {
r1 <- f1
r2 <- f2
_ <- async3
_ = sideEffect
_ <- async4
} yield {
...
}
Now, the way the for comprehension is structured, it certainly waits on f1 before checking the completion status of f2, but the logic behind these two futures is running at the same time. This is a little simpler than some of the suggestions but still might give you what you need.
Your code already looks well structured, minus computing the futures in parallel.
Use helper functions, ideally writing a code generator to print out helpers for all tuple arities
As far as I know, you need to name the result or assign it to _
Example code with helpers:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
object Example {
def run: Future[Unit] = {
for {
(a, b, c) <- par(
Future.successful(1),
Future.successful(2),
Future.successful(3)
)
constant = 100
(d, e) <- par(
Future.successful(a + 10),
Future.successful(b + c)
)
} yield {
println(constant)
println(d)
println(e)
}
}
def par[A,B](a: Future[A], b: Future[B]): Future[(A, B)] = {
for {
a <- a
b <- b
} yield (a, b)
}
def par[A,B,C](a: Future[A], b: Future[B], c: Future[C]): Future[(A, B, C)] = {
for {
a <- a
b <- b
c <- c
} yield (a, b, c)
}
}
Example.run
Edit:
generated code for 1 to 20 futures: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parhelpers-scala
parPrinter script: https://gist.github.com/nanop/c448db7ac1dfd6545967#file-parprinter-scala
Disclaimer: the code snippet below relates to one of the ongoing Coursera courses.
Let's consider it posted just for learning purposes; it should not be submitted as a solution to one's homework assignment.
As the comment below states, we need to transform a list of Futures into a single Future of a list. More than that, the resulting Future should fail if at least one of the input futures fails.
I met the following implementation and I don't understand it completely.
/** Given a list of futures `fs`, returns the future holding the list of values of all the futures from `fs`.
* The returned future is completed only once all of the futures in `fs` have been completed.
* The values in the list are in the same order as corresponding futures `fs`.
* If any of the futures `fs` fails, the resulting future also fails.
*/
def all[T](fs: List[Future[T]]): Future[List[T]] =
fs.foldRight(Future(Nil:List[T]))((f, fs2) =>
for {
x <- f
xs <- fs2
} yield (x::xs))
In particular, I don't understand the next things in it:
Where does the Future[T] -> T transformation happen? It looks like xs <- fs2 is the only place we touch the initial Futures, and each of the xs should be of type Future[T] (but somehow it becomes just T).
How are failures handled? It looks like the resulting Future object does fail when one of the input Futures fails.
1) Say f is a Future[T], then writing
for {
t <- f
} yield List(t)
will store the result of the Future f in t; therefore t is of type T. The yield turns it into a List[T], and the type of the whole for comprehension ends up being Future[List[T]]. So the for comprehension is where you extract the Ts from your Futures, do something with them, and put them back in a Future (OK, I'm simplifying a little bit here).
It's equivalent to
f.map(t => List(t))
2) If your Future f contains a Failure, then the for-comprehension will just return this failed Future instead of executing the yield.
As a general note, a for comprehension in Scala is just sugar that can be rewritten with map, flatMap, withFilter and foreach.
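Both points can be checked directly. Below is a runnable sketch using the all implementation from the question (the example futures and the "boom" failure are illustrative): values come back in order, and a single failure fails the whole result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object AllDemo extends App {
  // The implementation from the question, unchanged.
  def all[T](fs: List[Future[T]]): Future[List[T]] =
    fs.foldRight(Future(Nil: List[T]))((f, fs2) =>
      for {
        x  <- f   // this is where Future[T] -> T happens (point 1)
        xs <- fs2
      } yield x :: xs)

  // Values come back in the original order.
  val ok = all(List(Future(1), Future(2), Future(3)))
  assert(Await.result(ok, 1.second) == List(1, 2, 3))

  // One failed input future fails the whole result (point 2).
  val failed = all(List(Future(1), Future.failed(new RuntimeException("boom")), Future(3)))
  assert(Try(Await.result(failed, 1.second)).isFailure)
}
```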
I'm an English-speaking right-hander, so normally I foldLeft, but each step of the fold looks like:
Fn flatMap ((x: T) => Fs map (xs => x :: xs))
Your value is x.
The function is applied on success, which explains why a failure stops you cold:
scala> timed(Await.ready(all(List(Future{Thread sleep 5*1000; 1},Future(2),Future{Thread sleep 10*1000; 3})), Duration.Inf))
res0: (Long, scala.concurrent.Awaitable[List[Int]]) = (10002419021,scala.concurrent.impl.Promise$DefaultPromise#2a8025a0)
scala> timed(Await.ready(all(List(Future{Thread sleep 5*1000; 1},Future(???),Future{Thread sleep 10*1000; 3})), Duration.Inf))
res1: (Long, scala.concurrent.Awaitable[List[Int]]) = (5000880298,scala.concurrent.impl.Promise$DefaultPromise#3750d517)
Notice that the failing version short-circuits.
See the ScalaDoc for flatMap for both bits of information.
Edit: I was speaking cautiously because it is Coursera work, but more plainly, this requirement is not met: "The returned future is completed only once all of the futures in fs have been completed."