Let's consider the following excerpt from scala.concurrent.Future.scala:
def zip[U](that: Future[U]): Future[(T, U)] = {
implicit val ec = internalExecutor
flatMap { r1 => that.map(r2 => (r1, r2)) }
}
def zipWith[U, R](that: Future[U])(f: (T, U) => R)(implicit executor: ExecutionContext): Future[R] =
flatMap(r1 => that.map(r2 => f(r1, r2)))(internalExecutor)
Seemingly the two do not differ much, except for the application of the function f in the zipWith case. What interests me is why the internalExecutor (which just runs tasks on the current thread) is declared as an implicit value in zip, and is therefore used by both the underlying map and flatMap calls, but is passed explicitly only to the flatMap call inside zipWith.
As I understand it after some thought, executing f may involve blocking or intensive computation that is outside the Scala library's control, so the user should provide a separate execution context for it, to avoid accidentally blocking the internalExecutor (the current thread). Is this understanding correct?
The application of f is done on the supplied ExecutionContext, while internalExecutor is used only to perform the flattening operation. The rule is basically: when the user supplies the logic, that logic is executed on the ExecutionContext supplied by the user.
You could imagine that zipWith was implemented as this.zip(that).map(f.tupled) or that zip was implemented as zipWith(Tuple2.apply)(internalExecutor).
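For concreteness, here is a minimal sketch (not from the library source) of that rule in use: the combining function given to zipWith is user logic and runs on the user-supplied ExecutionContext, while zip needs no user-supplied context at all.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // the user-supplied context
val fa: Future[Int] = Future(1)
val fb: Future[String] = Future("a")
// No user logic to run, so zip needs no ExecutionContext parameter.
val zipped: Future[(Int, String)] = fa.zip(fb)
// The combining function (i, s) => s * i is user logic and runs on the implicit global context.
val combined: Future[String] = fa.zipWith(fb)((i, s) => s * i)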
Related
I am reading chapter 13.2.1 and came across an example that can handle IO input while getting rid of side effects at the same time:
object IO extends Monad[IO] {
def unit[A](a: => A): IO[A] = new IO[A] { def run = a }
def flatMap[A,B](fa: IO[A])(f: A => IO[B]) = fa flatMap f
def apply[A](a: => A): IO[A] = unit(a)
}
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }
def converter: IO[Unit] = for {
_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")
d <- ReadLine.map(_.toDouble)
_ <- PrintLine(fahrenheitToCelsius(d).toString)
} yield ()
I have a couple of questions regarding this piece of code:
In the unit function, what does def run = a really do?
In the ReadLine function, what does IO { readLine } really do? Will it really execute the readLine function or just return an IO value?
What does _ in the for comprehension mean (_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")) ?
Why does it remove the IO side effects? I can see these functions still interact with input and output.
The definition of your IO is as follows:
trait IO[A] { def run: A } // plus map and flatMap, which make the for comprehension possible
Following that definition, you can see that writing new IO[A] { def run = a } means instantiating an anonymous class from the trait, with a as the body of the run method that executes when you call run. Because a is a by-name parameter, nothing is actually run at creation time.
Any object or class in Scala that follows the contract of an apply method can be called as ClassName(args); the compiler will search for an apply method on the object/class and convert the call to ClassName.apply(args). A more elaborate answer can be found here. Because the IO companion object possesses such a method:
def apply[A](a: => A): IO[A] = unit(a)
The expansion is allowed to happen, and IO { readLine } is actually a call to IO.apply(readLine).
_ has many overloaded uses in Scala. This occurrence means "I don't care about the value returned from PrintLine, discard it". That is because the returned value is of type Unit, which we have no use for.
It is not that the IO datatype removes the side effects; rather, it defers them to a later point in time. We usually say IO runs at the "edges" of the application, in the main method. These interactions with the outside world will still occur, but since we encapsulate them inside IO, we can reason about them as values in our program, which brings a lot of benefit. For example, we can now compose side effects and depend on the success or failure of their execution, we can mock out these IO effects (using other data types such as Const), and we gain many other surprisingly nice properties.
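As a minimal illustration (not from the book) of that deferral, using the PrintLine defined above and the flatMap that the book's IO[A] trait provides:
val program: IO[Unit] =
  PrintLine("hello").flatMap(_ => PrintLine("world")) // nothing is printed yet
// Only at the "edge" of the application, e.g. in main, do the effects actually happen:
program.run // prints "hello" then "world"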
The simplest way to look at the IO monad is as a small piece of program definition.
Thus:
This is the IO definition; the run method defines what the IO monad does. new IO[A] { def run = a } is the Scala way of creating an instance of an anonymous class and defining its run method.
There is a bit of syntactic sugar going on. IO { readLine } is the same as IO.apply { readLine } or IO.apply(readLine), where readLine is passed as a by-name argument of type => String. This calls the unit method from object IO, so it just creates an instance of the IO class; nothing runs yet.
Since IO is a monad, a for comprehension can be used. It requires binding the result of each monadic operation with syntax like result <- someMonad. To ignore the result, _ can be used, so _ <- someMonad reads as "execute the monad but ignore its result".
These methods are all IO definitions; they don't run anything, so there are no side effects. Side effects only appear when IO.run is called.
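For reference, a minimal sketch (not from the book) of what the converter for comprehension above desugars to, assuming fahrenheitToCelsius is defined as in the chapter:
def converterDesugared: IO[Unit] =
  PrintLine("Enter a temperature in degrees Fahrenheit: ").flatMap { _ =>
    ReadLine.map(_.toDouble).flatMap { d =>
      PrintLine(fahrenheitToCelsius(d).toString).map { _ => () }
    }
  }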
Looking for the best way to write a chain of functions that need to run async one after the other. Given these two options:
Option 1
def operation1(): Unit = {...}
def operation2(): Unit = {...}
def foo(): Future[Unit] =
Future {
operation1()
operation2()
} onComplete {case _ => println("done!")}
Option 2
def operation1(): Future[Unit] = {...}
def operation2(): Future[Unit] = {...}
def foo(): Future[Unit] = {
operation1()
.flatMap {case _ => operation2() }
.onComplete {case _ => println("done!")}
}
Are there any advantages/disadvantages of one over the other?
I believe that Option 1 will run the two functions on the same background thread. Is that also the case for Option 2?
Are there any good practices for this?
Another question, given this function:
def foo: Future[A]
if I want to cast the result to unit, is this the best way to do it:
foo map { _ => () }
Thanks!
The potential advantage of Option 1 over Option 2 is that it guarantees operation2 will run right after operation1 (provided operation1 didn't fail with an exception), whereas in Option 2 the thread pool's available threads might be exhausted by the time the flatMap is to be executed.
Yes, Option 1 will run the operations on the same thread for sure. Option 2 will try to run them on two threads, as long as enough threads are available.
flatMap[S](f: (T) ⇒ Future[S])(implicit executor: ExecutionContext): Future[S]
You did have to declare an implicit execution context, or import one: that determines which pool you are using. If you imported the default global executor, then your pool is a fork-join based one with, by default, as many threads as your machine has cores.
The first option is like having one thread run both operations, one after the other, whereas the second option runs the first operation on a thread and then tries to get another thread from your ExecutionContext to run the second operation.
The best practice is to use what you need:
Do you want to make sure that operation2 runs right after operation1 even when no more threads are available in the execution context? If the answer is yes, then use Option 1. Otherwise, you can use Option 2.
In relation to your last question: what you're doing in your proposed snippet is not casting, you are mapping a function that produces a Unit value for any value of type A. The effect is that you get a Future[Unit], which is useful for checking its completion state. That is the best way to get what you seem to want.
Be aware, however, that, just as with flatMap, that "transformation function" will be run on a thread provided by the implicit executor in your context. That is why map takes an implicit executor parameter too:
def map[S](f: (T) ⇒ S)(implicit executor: ExecutionContext): Future[S]
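A minimal sketch (not from the answer) of the mapping-to-Unit approach, with the executor that runs the discarding function made explicit; the helper name toUnit is just for illustration:
import scala.concurrent.{ExecutionContext, Future}
def toUnit[A](fa: Future[A])(implicit ec: ExecutionContext): Future[Unit] =
  fa.map(_ => ()) // the function _ => () is scheduled on ec, not necessarily on the thread that completed fa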
Scala Futures are a fine abstraction for asynchronous computation that can fail. What abstraction should I use when I know that there is no failure possible?
Here is a concrete use case:
case class Service123Error(e: Throwable)
val f1: Future[User] =
service123.call()
val f2: Future[Either[Service123Error, User]] =
f1.map(Right.apply)
.recover { case e => Left(Service123Error(e)) }
In this code, the types do not model the fact that f2 always successfully completes. Given several values similar to f2, I know I can Future.sequence over them (i.e. without the fail fast behavior), but the types are not carrying this information.
I would not stick an Either into a Future if the Either only carries the error information. Simply use the failure side of the Future:
Future(Right(User)) -> Future[User](User)
Future(Left(Service123Error)) -> Future[User](throw Service123Error)
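A minimal sketch (not from the answer) of keeping the error on the failure side; User and call() are placeholders for the question's service, and Service123Error is made a Throwable so it can be used as a failure:
import scala.concurrent.{ExecutionContext, Future}
case class User(name: String) // placeholder
case class Service123Error(e: Throwable) extends Exception(e) // now a Throwable
def call()(implicit ec: ExecutionContext): Future[User] = Future(User("x")) // stand-in for service123.call()
def callWrapped()(implicit ec: ExecutionContext): Future[User] =
  call().recoverWith { case e => Future.failed(Service123Error(e)) } // the error stays on the failure side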
Taking into account that any pattern matching on a Future result matches Success[A] and Failure[A] (onComplete(), andThen()), because they expect a Try[A] as an argument, directly or through a partial function, could there be a case where I would want to state explicitly that a function is of type Future[Try[A]]?
There are various constructs in Scala which carry a failure case in them: Option, Either, Try and Future. (Future's main point is to abstract over asynchronous operations; the error handling is there for convenience.) Scalaz has even more: Validation (and its Disjunction and Maybe are better versions of Either and Option).
They all treat erroneous values a bit differently. Yet Try and Future are very similar: both wrap a Throwable. So IMHO Future[Try[A]] doesn't add much information (about the error). Compare it to having Future[Future[A]] or Try[Try[A]]. OTOH Future[Option[A]] or Future[Either[MyError, A]] make sense to me.
There might be a situation where you have, for example, potentially failing f: A => B and g: B => C, and you'd like to avoid creating too many tasks on the ExecutionContext:
val a: Future[A] = ???
val c: Future[C] = a.map(f).map(g) // two tasks, not good
val c2: Future[Try[C]] = a.map(x => Try { f(x) } map g ) // one task, maybe better
// get will throw, but the exception will be caught by Future's map
// Almost no information is lost compared to `c2`
// You cannot anymore distinguish between possible error in `a`
// and exceptions thrown by `f` or `g`.
val c3: Future[C] = a.map(x => Try { f (x) }.map(g).get)
In this case, I'd rather refactor f and g to have better types, at least f: A => Option[B] and g: B => Option[C], ending up with Future[Option[C]].
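A minimal sketch (not from the answer) of that refactoring, using concrete placeholder types and functions (toIntOption assumes Scala 2.13+):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val a: Future[String] = Future.successful("42")
val f: String => Option[Int] = s => s.toIntOption // "fails" with None on non-numbers
val g: Int => Option[Double] = n => if (n != 0) Some(1.0 / n) else None // "fails" with None on zero
val c: Future[Option[Double]] = a.map(x => f(x).flatMap(g)) // still a single task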
Try[A] represents a synchronous computation that may fail.
Future[A] represents an asynchronous computation that may fail.
Under these definitions, Future[Try[A]] represents the result of a synchronous computation (that may fail) executed inside an asynchronous computation (that may fail).
Does it make sense? Not to me, but I'm open to creative interpretations.
Why and how specifically is a Scala Future not a Monad; and would someone please compare it to something that is a Monad, like an Option?
The reason I'm asking is Daniel Westheide's The Neophyte's Guide to Scala Part 8: Welcome to the Future, where I asked whether or not a Scala Future was a Monad, and the author responded that it wasn't, which threw me off base. I came here to ask for a clarification.
A summary first
Futures can be considered monads if you never construct them with effectful blocks (pure, in-memory computation), or if any effects generated are not considered as part of semantic equivalence (like logging messages). However, this isn't how most people use them in practice. For most people using effectful Futures (which includes most uses of Akka and various web frameworks), they simply aren't monads.
Fortunately, a library called Scalaz provides an abstraction called Task that doesn't have any problems with or without effects.
A monad definition
Let's review briefly what a monad is. A monad must be able to define at least these two functions:
def unit[A](block: => A)
: Future[A]
def bind[A, B](fa: Future[A])(f: A => Future[B])
: Future[B]
And these functions must satisfy three laws:
Left identity: bind(unit(a))(f) ≡ f(a)
Right identity: bind(m) { unit(_) } ≡ m
Associativity: bind(bind(m)(f))(g) ≡ bind(m) { x => bind(f(x))(g) }
These laws must hold for all possible values by definition of a monad. If they don't, then we simply don't have a monad.
There are other ways to define a monad that are more or less the same. This one is popular.
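Since the question asks for a comparison with Option, here is a minimal sketch (not from the answer) of the same two functions for Option, for which the laws hold unconditionally:
def unitOpt[A](a: A): Option[A] = Some(a)
def bindOpt[A, B](m: Option[A])(f: A => Option[B]): Option[B] = m.flatMap(f)
val f: Int => Option[Int] = x => Some(x + 1)
assert(bindOpt(unitOpt(1))(f) == f(1)) // left identity
assert(bindOpt(Option(1))(unitOpt) == Option(1)) // right identity
assert(bindOpt(bindOpt(Option(1))(f))(f) == bindOpt(Option(1))(x => bindOpt(f(x))(f))) // associativity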
Effects lead to non-values
Almost every usage of Future that I've seen uses it for asynchronous effects: input/output with an external system like a web service or a database. When we do this, a Future isn't even a value, and mathematical terms like monads only describe values.
This problem arises because Futures execute immediately upon data construction. This messes up the ability to substitute expressions with their evaluated values (which some people call "referential transparency"). This is one way to understand why Scala's Futures are inadequate for functional programming with effects.
Here's an illustration of the problem. If we have two effects:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits._
def twoEffects =
( Future { println("hello") },
Future { println("hello") } )
we will have two printings of "hello" upon calling twoEffects:
scala> twoEffects
hello
hello
scala> twoEffects
hello
hello
But if Futures were values, we should be able to factor out the common expression:
lazy val anEffect = Future { println("hello") }
def twoEffects = (anEffect, anEffect)
But this doesn't give us the same effect:
scala> twoEffects
hello
scala> twoEffects
The first call to twoEffects runs the effect and caches the result, so the effect isn't run the second time we call twoEffects.
With Futures, we end up having to think about the evaluation policy of the language. For instance, in the example above, the fact I use a lazy value rather than a strict one makes a difference in the operational semantics. This is exactly the kind of twisted reasoning functional programming is designed to avoid -- and it does it by programming with values.
Without substitution, laws break
In the presence of effects, monad laws break. Superficially, the laws appear to hold for simple cases, but the moment we begin to substitute expressions with their evaluated values, we end up with the same problems we illustrated above. We simply can't talk about mathematical concepts like monads when we don't have values in the first place.
To put it bluntly, if you use effects with your Futures, saying they're monads is not even wrong because they aren't even values.
To see how monad laws break, just factor out your effectful Future:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits._
def unit[A]
(block: => A)
: Future[A] =
Future(block)
def bind[A, B]
(fa: Future[A])
(f: A => Future[B])
: Future[B] =
fa flatMap f
lazy val effect = Future { println("hello") }
Again, it will only run one time, but you need it to run twice -- once for the right-hand side of the law, and another for the left. I'll illustrate the problem for the right identity law:
scala> effect // RHS has effect
hello
scala> bind(effect) { unit(_) } // LHS doesn't
The implicit ExecutionContext
Without putting an ExecutionContext in implicit scope, we can't define either unit or bind in our monad. This is because the Scala API for Futures has these signatures:
object Future {
// what we need to define unit
def apply[T]
(body: ⇒ T)
(implicit executor: ExecutionContext)
: Future[T]
}
trait Future[T] {
// what we need to define bind
def flatMap[S]
(f: T ⇒ Future[S])
(implicit executor: ExecutionContext)
: Future[S]
}
As a "convenience" to the user, the standard library encourages users to define an execution context in implicit scope, but I think this is a huge hole in the API that just leads to defects. One scope of the computation may have one execution context defined while another scope can have another context defined.
Perhaps you can ignore the problem if you define an instance of unit and bind that pins both operations to a single context and use this instance consistently. But this is not what people do most of the time. Most of the time, people use Futures with for-yield comprehensions that become map and flatMap calls. To make for-yield comprehensions work, an execution context must be defined at some non-global implicit scope (because for-yield doesn't provide a way to specify additional parameters to the map and flatMap calls).
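For concreteness, a minimal sketch (not from the answer) of what "pinning" unit and bind to a single context might look like; the pool size and names are arbitrary:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
object PinnedFutureMonad {
  private val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))
  def unit[A](block: => A): Future[A] = Future(block)(ec)
  def bind[A, B](fa: Future[A])(f: A => Future[B]): Future[B] = fa.flatMap(f)(ec)
}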
To be clear, Scala lets you use lots of things with for-yield comprehensions that aren't actually monads, so don't believe that you have a monad just because it works with for-yield syntax.
A better way
There's a nice library for Scala called Scalaz that has an abstraction called scalaz.concurrent.Task. This abstraction doesn't run effects upon data construction as the standard library Future does. Furthermore, Task actually is a monad. We compose Task monadically (we can use for-yield comprehensions if we like), and no effects run while we're composing. We have our final program when we have composed a single expression evaluating to Task[Unit]. This ends up being our equivalent of a "main" function, and we can finally run it.
Here's an example illustrating how we can substitute Task expressions with their respective evaluated values:
import scalaz.concurrent.Task
import scalaz.IList
import scalaz.syntax.traverse._
def twoEffects =
IList(
Task delay { println("hello") },
Task delay { println("hello") }).sequence_
We will have two printings of "hello" upon calling twoEffects:
scala> twoEffects.run
hello
hello
And if we factor out the common effect,
lazy val anEffect = Task delay { println("hello") }
def twoEffects =
IList(anEffect, anEffect).sequence_
we get what we'd expect:
scala> twoEffects.run
hello
hello
In fact, it doesn't really matter whether we use a lazy value or a strict value with Task; we get "hello" printed out twice either way.
If you want to functionally program, consider using Task everywhere you may use Futures. If an API forces Futures upon you, you can convert the Future to a Task:
import concurrent.
{ ExecutionContext, Future, Promise }
import util.Try
import scalaz.\/
import scalaz.concurrent.Task
def fromScalaDeferred[A]
(future: => Future[A])
(ec: ExecutionContext)
: Task[A] =
Task
.delay { unsafeFromScala(future)(ec) }
.flatMap(identity)
def unsafeToScala[A]
(task: Task[A])
: Future[A] = {
val p = Promise[A]()
task.runAsync { res =>
res.fold(p failure _, p success _)
}
p.future
}
private def unsafeFromScala[A]
(future: Future[A])
(ec: ExecutionContext)
: Task[A] =
Task.async(
handlerConversion
.andThen { future.onComplete(_)(ec) })
private def handlerConversion[A]
: ((Throwable \/ A) => Unit)
=> Try[A]
=> Unit =
callback =>
{ t: Try[A] => \/ fromTryCatch t.get }
.andThen(callback)
The "unsafe" functions run the Task, exposing any internal effects as side-effects. So try not to call any of these "unsafe" functions until you've composed one giant Task for your entire program.
I believe a Future is a Monad, with the following definitions:
def unit[A](x: A): Future[A] = Future.successful(x)
def bind[A, B](m: Future[A])(fun: A => Future[B]): Future[B] = m.flatMap(fun)
Considering the three laws:
Left identity:
Future.successful(a).flatMap(f) is equivalent to f(a). Check.
Right identity:
m.flatMap(Future.successful _) is equivalent to m (minus some possible performance implications). Check.
Associativity
m.flatMap(f).flatMap(g) is equivalent to m.flatMap(x => f(x).flatMap(g)). Check.
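A minimal sketch (not from the answer) of how these checks might be written, comparing awaited results and assuming a single implicit execution context:
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
val f: Int => Future[Int] = x => Future.successful(x + 1)
val m: Future[Int] = Future.successful(1)
def value[A](fa: Future[A]): A = Await.result(fa, 1.second)
assert(value(Future.successful(1).flatMap(f)) == value(f(1))) // left identity
assert(value(m.flatMap(Future.successful)) == value(m)) // right identity
assert(value(m.flatMap(f).flatMap(f)) == value(m.flatMap(x => f(x).flatMap(f)))) // associativity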
Rebuttal to "Without substitution, laws break"
The meaning of "equivalent" in the monad laws, as I understand it, is that you could replace one side of the expression with the other side in your code without changing the behavior of the program. Assuming you always use the same execution context, I think that is the case. In the example #sukant gave, it would have had the same issue if it had used Option instead of Future. I don't think the fact that the futures are evaluated eagerly is relevant.
As the other commenters have suggested, you are mistaken. Scala's Future type has the monadic properties:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits._
def unit[A](block: => A): Future[A] = Future(block)
def bind[A, B](fut: Future[A])(fun: A => Future[B]): Future[B] = fut.flatMap(fun)
This is why you can use for-comprehension syntax with futures in Scala.
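As a quick illustration (not from the answer) of that last point, the for comprehension below desugars to exactly these flatMap and map calls on Future:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits._
def sum: Future[Int] =
  for {
    a <- Future(1)
    b <- Future(2)
  } yield a + b
// equivalent to: Future(1).flatMap(a => Future(2).map(b => a + b))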