How does the cats-effect IO monad really work? - scala

I'm new to functional programming and Scala, and I was checking out the Cats Effect framework and trying to understand what the IO monad does. So far what I've understood is that writing code in the IO block is just a description of what needs to be done and nothing happens until you explicitly run using the unsafe methods provided, and also a way to make code that performs side-effects referentially transparent by actually not running it.
I tried executing the snippet below just to try to understand what it means:
object Playground extends App {
var out = 10
var state = "paused"
def changeState(newState: String): IO[Unit] = {
state = newState
IO(println("Updated state."))
}
def x(string: String): IO[Unit] = {
out += 1
IO(println(string))
}
val tuple1 = (x("one"), x("two"))
for {
_ <- x("1")
_ <- changeState("playing")
} yield ()
println(out)
println(state)
}
And the output was:
13
paused
I don't understand why the assignment state = newState does not run, but the increment and assign expression out += 1 run. Am I missing something obvious on how this is supposed to work? I could really use some help. I understand that I can get this to run using the unsafe methods.

In your particular example, I think what is going on is that regular imperative Scala coded is unaffected by the IO monad--it runs when it normally would under the rules of Scala.
When you run:
for {
_ <- x("1")
_ <- changeState("playing")
} yield ()
this immediately calls x. That has nothing to do with the IO monad; it's just how for comprehensions are defined. The first step is to evaluate the first statement so you can call flatMap on it.
As you observe, you never "run" the monadic result, so the argument to flatMap, the monadic continuation, is never invoked, resulting in no call to changeState. This is specific to the IO monad, as, e.g., the List monad's flatMap would have immediately invoked the function (unless it were an empty list).

Related

How IO monad makes concurrency easy to do in Scala?

I watch this video and starts at 6min35s, it mentioned this graph:
saying that the IO Monad makes it easy to handle concurrency. I got confused on this: how does it work? How do the two for comprehension enable the concurrency (the computation of d and f)?
No, it doesn't enable the concurrency
for comprehensions only help you omitting several parentheses and indents.
The code you referred, translate to [flat]map is strictly equivalent to:
async.boundedQueue[Stuff](100).flatMap{ a =>
val d = computeB(a).flatMap{
b => computeD(b).map{ result =>
result
}
}
val f = computeC(a).flatMap{ c =>
computeE(c).flatMap{ e =>
computeF(e).map{ result =>
result
}
}
}
d.merge(f).map(g => g)
}
See, it only helps you omitting several parentheses and indents (joke)
The concurrency is hidden in flatMap and map
Once you understand how for is translated to flatMap and map, you can implement your concurrency inside them.
As map takes a function as argument, it doesn't mean that the function is executed during execution of map function, you can defer the function to another thread or run it latter. This is how concurrency implemented.
Take Promise and Future as example:
val future: Future = ```some promise```
val r: Future = for (v <- future) yield doSomething(v)
// or
val r: Future = future.map(v => doSomething(v))
r.wait
The function doSomething is not executed during the execution of Future.map function, instead it is called when the promise commits.
Conclusion
How to implement concurrency using for syntax suger:
for will convert into flatMap and map by scala compiler
Write you flatMap and map, where you will get a callback function from argument
Call the function you got whenever and wherever you like
Further reading
The flow control feature of many languages share a same property, they are like delimited continuation shift/reset, where they capture the following execution upto a scope into an function.
JavaScript:
async function() {
...
val yielded = await new Promise((resolve) => shift(resolve))
// resolve will captured execution of following statements upto end of the function.
...captured
}
Haskell:
do {
...
yielded_monad <- ```some monad``` -- shift function is >>= of the monad
...captured
}
Scala:
for {
...
yielded_monad <- ```some monad``` // shift function is flatMap/map of the monad
...captured
} yield ...
next time you see a language feature which capture following execution into a function, you know you can implement flow control using the feature.
The difference between delimited continuation and call/cc is that call/cc capture the whole following execution of the program, but delimited continuation has a scope.

How the sample Simple IO Type get rid of side effects in "FP in Scala"?

I am reading chapter 13.2.1 and came across the example that can handle IO input and get rid of side effect in the meantime:
object IO extends Monad[IO] {
def unit[A](a: => A): IO[A] = new IO[A] { def run = a }
def flatMap[A,B](fa: IO[A])(f: A => IO[B]) = fa flatMap f
def apply[A](a: => A): IO[A] = unit(a)
}
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }
def converter: IO[Unit] = for {
_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")
d <- ReadLine.map(_.toDouble)
_ <- PrintLine(fahrenheitToCelsius(d).toString)
} yield ()
I have couple of questions regarding this piece of code:
In the unit function, what does def run = a really do?
In the ReadLine function, what does IO { readLine } really do? Will it really execute the println function or just return an IO type?
What does _ in the for comprehension mean (_ <- PrintLine("Enter a temperature in degrees Fahrenheit: ")) ?
Why it removes the IO side effects? I saw these functions still interact with inputs and outputs.
The definition of your IO is as follows:
trait IO { def run: Unit }
Following that definition, you can understand that writing new IO[A] { def run = a } means initialising an anonymous class from your trait, and assigning a to be the method that runs when you call IO.run. Because a is a by name parameter, nothing is actually ran at creation time.
Any object or class in Scala which follows a contract of an apply method, can be called as: ClassName(args), where the compiler will search for an apply method on the object/class and convert it to a ClassName.apply(args) call. A more elaborate answer can be found here. As such, because the IO companion object posses such a method:
def apply[A](a: => A): IO[A] = unit(a)
The expansion is allowed to happen. Thus we actually call IO.apply(readLine) instead.
_ has many overloaded uses in Scala. This occurrence means "I don't care about the value returned from PrintLine, discard it". It is so because the value returned is of type Unit, which we have nothing to do with.
It is not that the IO datatype removes the part of doing IO, it's that it defers it to a later point in time. We usually say IO runs at the "edges" of the application, in the Main method. These interactions with the out side world will still occur, but since we encapsulate them inside IO, we can reason about them as values in our program, which brings a lot of benefit. For example, we can now compose side effects and depend on the success/failure of their execution. We can mock out these IO effects (using other data types such as Const), and many other surprisingly nice properties.
The simplest way to look at IO monad as a small piece of program definition.
Thus:
This is IO definition, run method defines what IO monad does. new IO[A] { def run = a } is scala way of creating an instance of class and defining method run.
There is a bit of syntactical sugar is going on. IO { readLine } is same as IO.apply { readLine } or IO.apply(readLine) where readLine is call-by-name function of type => String. This calls the unit method from object IO and thus this is just creation of instance of IO class that does not run yet.
Since IO is a monad, for comprehension can be used. It requires storing a result of each monad operation in a syntax like result <- someMonad. To ignore the result, _ can be used, thus _ <- someMonad reads as execute the monad but ignore the result.
This methods are all IO definitions, they don't run anything and thus there is no side effect. Side effects only appears when IO.run is called.

Scalaz Task not starting

I'm trying to make an asyncronous call using scalaz Task.
For some strange reason whenever I attempt to call methodA, although I get a scalaz.concurrent.Task returned I never see the 2nd print statement which leads me to believe that the asynchronous call is never returned. Is there something that I'm missing that is required in order for me to start the task?
for {
foo <- println("we are in this for loop!").asAsyncSuccess()
authenticatedUser <- methodA(parameterA)
foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
authenticatedUser
}
def methodA(parameterA: SomeType): Task[User]
First off, Scalaz Task is lazy: it won't start when you create it or get from some method. It's by design and allows Tasks to combine well. It's not intended to start until you fully assemble your entire program into single main Task, so when you're done, you need to manually start an aggregated Task using one of the perform methods, like unsafePerformSync that will wait for the result of aggregate async computation on a current thread:
(for {
foo <- println("we are in this for loop!").asAsyncSuccess()
authenticatedUser <- methodA(parameterA)
foo2 <- println("we are out of this this call!").asAsyncSuccess()
} yield {
authenticatedUser
}).unsafePerformSync
As you can see from your original code, you've never started any Task. However, you've mentioned that first println printed its message to console. This may be related to asAsyncSuccess method: I've not found it in Scalaz, so I assume it's in implicit class somewhere in your code:
implicit class TaskOps[A](val a: A) extends AnyVal {
def asAsyncSuccess(): Task[A] = Task(a)
}
If you write a helper implicit class like this, it won't convert an effectful expression into lazy Task action, despite the fact that Task.apply
is call-by-name. It will only convert its result, which happens to be just Unit, because its own parameter a is not call-by-name. To work as intended, it needs to be like the following:
implicit class TaskOps[A](a: => A) {
def asAsyncSuccess(): Task[A] = Task(a)
}
You may also ask why first println prints something while second isn't. This is related to for-comprehensions desugaring into method calls. Specifically, compiler turns your code into this:
println("we are in this for loop!").asAsyncSuccess().flatMap(foo =>
methodA(parameterA).flatMap(authenticatedUser =>
println("we are out of this this call!").asAsyncSuccess().map(foo2 =>
authenticatedUser)))
If asAsyncSuccess doesn't respect intended Task semantics, first println gets executed immediately. But its result is then wrapped into Task that is, essentially, just a wrapper over start function, and flatMap implementation on it just combines such functions together without running them. So, methodA or second println won't be executed until you call unsafePerformSync or other perform helper method. Note, however, that second println will still be executed earlier than it should according to Task semantics, just not that earlier.

Use only the value from IO monad without precedent IO actions

Doing some home project, I encountered an interested effect, which now , seems obvious to me, but still I do not see a way to get away from it.
That is the gist (I am using ScalaZ, but in haskell there would be probably the same result):
def askAndReadResponse(question: String): IO[String] = {
putStrLn(question) >> readLn
}
def core: IO[String] = {
val answer: IO[String] = askAndReadResponse("enter something")
val cond: IO[Boolean] = answer map {_.length > 2}
IO.ioMonad.ifM(cond, answer, core)
}
When I am trying to get an input from core, the askAndReadResponse evaluates twice - once for evaluating the condition, and then in ifM (so I have the message and readLn one more time then necessary).
What I need - just the validated value (to print it later, for instance)
Is there any elegant way to do this, in particular - to pass further the result of IO, without preceding IO actions, namely avoiding execution of askAndReadResponse twice?
You can sequence the effects using monadic binding with flatMap:
def core: IO[String] = askAndReadResponse("enter something").flatMap {
case response if response.length > 2 => response.point[IO]
case response => core
}
This lets you take the result of one computation (the user entering text after being prompted) and use it in subsequent computations (the calculation about whether to return or loop, and the result if returning).
ifM just isn't going to be useful in your case—it would only work here if your condition and your successful branch were independent computations.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I have need of the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluator to the next to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {
val d = List("1", "2", "3") // some expressions to evaluate.
type ResType = Int
case class State(i : ResType) // trivial state for experiment purposes
val initialState = State(0)
// my stub/dummy "eval" function...obviously the real one will be...real.
def computeResultAndNewState(s : String, st : State) : (ResType, State) = {
val State(i) = st
val res = s.toInt + i
val newStateInt = i + 1
(res, State(newStateInt))
}
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
var state = initialState
val r = d.map {
s =>
{
val (result, newState) = computeResultAndNewState(s, state)
state = newState
result
}
}
println(r)
println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
// This startFold thing, requires explicit type. That alone makes it muddy.
val startFold : (List[ResType], State) = (Nil, initialState)
val (r, state) = d.foldLeft(startFold) {
case ((tail, st), s) => {
val (r, ns) = computeResultAndNewState(s, st)
(tail :+ r, ns) // we want a constant-time append here, not O(N). Or could Cons on front and reverse later
}
}
println(r)
println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
lazy val states = initialState #:: resultStates // there are states
lazy val args = d.toStream // there are arguments
lazy val argPairs = args zip states // put them together
lazy val resPairs : Stream[(ResType, State)] = argPairs.map{ case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
lazy val (results , resultStates) = myUnzip(resPairs)// Note .unzip causes infinite loop. Had to write my own.
lazy val r = results.toList
lazy val finalState = resultStates.last
println(r)
println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next you need is a monadic map function that computes values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if it is, please correct me). But it's possible to create one. To give a full example:
import scalaz._;
object StateExample
extends App
with States /* utility methods */
{
// The context that is threaded through the state.
// In our case, it just maps variables to integer values.
class Context(val map: Map[String,Int]);
// An example that returns the requested variable's value
// and increases it's value in the context.
def eval(expression: String): State[Context,Int] =
state((ctx: Context) => {
val v = ctx.map.get(expression).getOrElse(0);
(new Context(ctx.map + ((expression, v + 1)) ), v);
});
// Specialization of Haskell's mapM to our State monad.
def mapState[S,A,B](f: A => State[S,B])(xs: Seq[A]): State[S,Seq[B]] =
state((initState: S) => {
var s = initState;
// process the sequence, threading the state
// through the computation
val ys = for(x <- xs) yield { val r = f(x)(s); s = r._1; r._2 };
// return the final state and the output result
(s, ys);
});
// Example: Try to evaluate some variables, starting from an empty context.
val expressions = Seq("x", "y", "y", "x", "z", "x");
print( mapState(eval)(expressions) ! new Context(Map[String,Int]()) );
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer, he described how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, both less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use the high-level concepts to nicely describe the solution (perhaps sacrificing speed/memory a little) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem, but doesn't give us a broader answer. If we need to modify it a little, it might happen it won't work any more. Or, if the library starts to be complex, the code can start to be messy and we won't know how to improve it, how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can become complicated and complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined to more complex ones using flatMap method or for comprehensions, and finally we'll construct the desired task. The principle will stay the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
xs match {
case Nil => result
case x :: tail =>
val (res, newState) = computeResultAndNewState(x, priorState)
innerTestTheRecWay(tail, newState, result :+ res)
}
}
innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact scala does not have any loops other than while. fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler to not build a stack, so it is absolutely safe to use. You can add the #annotation.tailrec annotaion, to force the compiler to apply tail recursion elimination. If your method is not tailrec the compiler will then complain.
edit: renamed inner method to avoid ambiguity