flatMap and For-Comprehension with IO Monad - scala

Looking at an IO Monad example from Functional Programming in Scala:
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }
def converter: IO[Unit] = for {
_ <- PrintLine("enter a temperature in degrees fahrenheit")
d <- ReadLine.map(_.toDouble)
_ <- PrintLine((d + 32).toString)
} yield ()
I decided to re-write converter with a flatMap.
def converterFlatMap: IO[Unit] = PrintLine("enter a temperate in degrees F").
flatMap(x => ReadLine.map(_.toDouble)).
flatMap(y => PrintLine((y + 32).toString))
When I replaced the last flatMap with map, I did not see the result of the readLine printed out on the console.
With flatMap:
enter a temperate in degrees
37.0
With map:
enter a temperate in degrees
Why? Also, how is the signature (IO[Unit]) still the same with map or flatMap?
Here's the IO monad from this book.
sealed trait IO[A] { self =>
def run: A
def map[B](f: A => B): IO[B] =
new IO[B] { def run = f(self.run) }
def flatMap[B](f: A => IO[B]): IO[B] =
new IO[B] { def run = f(self.run).run }
}

I think Scala converts IO[IO[Unit]] into the IO[Unit] in the second case. Try to run both variants in scala console, and don't specify type for the def converterFlatMap: IO[Unit], and you'll see the difference.
As for why map doesn't work, it is clearly seen from the definition of IO:
when you map over IO[IO[T]], map inside will call run only on the outer IO, result will be IO[IO[T]], so only first two PrintLine and ReadLine will be executed.
flatMap will also execute inner IO, and result will be IO[T] where T is the type parameter A of the inner IO, so all three of the statements will be executed.
P.S.: I think you incorrectly expanded for-comprehension. according to rules, for loop that you have written should be expanded to:
PrintLine("enter a temperate in degrees F").flatMap { case _ =>
ReadLine.map(_.toDouble).flatMap { case d =>
PrintLine((d + 32).toString).map { case _ => ()}
}
}
Notice that in this version flatMaps/maps are nested.
P.P.S: In fact last for statement should be also flatMap, not map. If we assume that scala had a "return" operator that puts values into the monadic context,
(e.g. return(3) will create IO[Int] that does nothing and it's function run returns 3.), then we can rewrite for (x <- a; y <- b) yield y as a.flatMap(x => b.flatMap( y => return(y))),
but because b.flatMap( y => return(y)) works absolutely the same as b.map(y => y) last statement in the scala for comprehension is expanded into map.

Related

Avoiding boilerplate with OptionT (natural transform?)

I have the following methods:
trait Tr[F[_]]{
def getSet(): F[Set[String]]
def checksum(): F[Long]
def value(): F[String]
def doRun(v: String, c: Long, s: Set[String]): F[Unit]
}
now I want to write the following for comprehension:
import cats._
import cats.data.OptionT
import cats.implicits._
def fcmprhn[F[_]: Monad](F: Tr[F]): OptionT[F, Unit] =
for {
set <- OptionT {
F.getSet() map { s =>
if(s.nonEmpty) Some(s) else None
}
}
checksum <- OptionT.liftF(F.checksum())
v <- OptionT.liftF(F.value())
_ <- OptionT.liftF(F.doRun(v, checksum, set))
//can be lots of OptionT.liftF here
} yield ()
As you can see there is too much of OptionT boilerplate. Is there a way to avoid it?
I think I can make use of F ~> OptionT[F, ?]. Can you suggest something?
One approach could be to nest the "F only" portion of the for-comprehension within a single liftF:
def fcmprhn[F[_]: Monad](F: Tr[F]): OptionT[F, Unit] =
for {
set <- OptionT {
F.getSet() map { s =>
if(s.nonEmpty) Some(s) else None
}
}
_ <- OptionT.liftF {
for {
checksum <- F.checksum()
v <- F.value()
_ <- F.doRun(v, checksum, set)
// rest of F monad for-comprehension
} yield ()
}
} yield ()
You can write it in "mtl-style" instead. mtl-style refers to mtl library in haskell but really it just means that instead of encoding effects as values (i.e. OptionT[F, ?]), we encode them as functions that take an abstract effect F[_] and give that F capabilities using type classes. That means instead of using OptionT[F, Unit] as our return type we can just use F[Unit] as our return type because F has to be able to handle errors.
This makes writing code like yours a small bit easier, but it's effect is amplified as you add monad transformers to the stack. Right now you only have to lift once, but what if you wanted a StateT[OptionT[F, ?], S, Unit] in the future. With mtl-style, all you need to do is add another type class constraint.
Here's what your code'd look like written in mtl-style:
def fcmprhn[F[_]](F: Tr[F])(implicit E: MonadError[F, Unit]): F[Unit] =
for {
set <- OptionT {
F.getSet() flatMap { s =>
if(s.nonEmpty) E.pure(s) else E.raiseError(())
}
}
checksum <- F.checksum()
v <- F.value()
_ <- F.doRun(v, checksum, set)
} yield ()
And now when you run the program you can specify the F[_] to be something like what you had before OptionT[F, ?]:
fcmprhn[OptionT[F, ?]](OptionT.liftF(yourOriginalTr))

functional parallelism and laziness in Scala

Background
I have been reading the book Functional Programming in Scala, and have some questions regarding the content in Chapter 7: Purely functional parallelism.
Here is the code for the answers in the book: Par.scala, but I am confused about certain part of it.
Here is the first part of the code of Par.scala, which stands for Parallelism:
import java.util.concurrent._
object Par {
type Par[A] = ExecutorService => Future[A]
def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a)
private case class UnitFuture[A](get: A) extends Future[A] {
def isDone = true
def get(timeout: Long, units: TimeUnit): A = get
def isCancelled = false
def cancel(evenIfRunning: Boolean): Boolean = false
}
def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
(es: ExecutorService) => {
val af = a(es)
val bf = b(es)
UnitFuture(f(af.get, bf.get))
}
def fork[A](a: => Par[A]): Par[A] =
(es: ExecutorService) => es.submit(new Callable[A] {
def call: A = a(es).get
})
def lazyUnit[A](a: => A): Par[A] =
fork(unit(a))
def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)
def asyncF[A, B](f: A => B): A => Par[B] =
a => lazyUnit(f(a))
def map[A, B](pa: Par[A])(f: A => B): Par[B] =
map2(pa, unit(()))((a, _) => f(a))
}
The simplest possible model for Par[A] might be ExecutorService => Future[A], and run simply returns the Future.
unit promotes a constant value to a parallel computation by returning a UnitFuture, which is a simple implementation of Future that just wraps a constant value.
map2 combines the results of two parallel computations with a binary function.
fork marks a computation for concurrent evaluation. The evaluation won’t actually occur until forced by run. Here is with its simplest and most natural implementation of it. Even though it has its problems, let's first put them aside.
lazyUnit wraps its unevaluated argument in a Par and marks it for concurrent evaluation.
run extracts a value from a Par by actually performing the computation.
asyncF converts any function A => B to one that evaluates its result asynchronously.
Questions
The fork is the function confuses me a lot here, because it takes a lazy argument, which will be evaluated later when it is called. Then my questions are more about when we should use this fork, i.e., when we need lazy-evaluation and when we need to have the value directly.
Here is an exercise from the book:
EXERCISE 7.5
Hard: Write this function, called sequence. No additional primitives are required. Do not call run.
def sequence[A](ps: List[Par[A]]): Par[List[A]]
And here is the answers (offered here).
First
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))
What is the different between above code and the following:
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))
Additionally
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
as match {
case Nil => unit(Nil)
case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = fork {
if (as.isEmpty) unit(Vector())
else if (as.length == 1) map(as.head)(a => Vector(a))
else {
val (l,r) = as.splitAt(as.length/2)
map2(sequenceBalanced(l), sequenceBalanced(r))(_ ++ _)
}
}
In sequenceRight, fork is used when recursive function is directly called. However, in sequenceBalanced, fork is used outside of the whole function body.
Then, what is the differences or above code and the following (where we switched the places of fork):
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] = fork {
as match {
case Nil => unit(Nil)
case h :: t => map2(h, sequenceRight(t))(_ :: _)
}
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] =
if (as.isEmpty) unit(Vector())
else if (as.length == 1) map(as.head)(a => Vector(a))
else {
val (l,r) = as.splitAt(as.length/2)
map2(fork(sequenceBalanced(l)), fork(sequenceBalanced(r)))(_ ++ _)
}
Finally, given the sequence defined above, we have the following function:
def parMap[A,B](ps: List[A])(f: A => B): Par[List[B]] = fork {
val fbs: List[Par[B]] = ps.map(asyncF(f))
sequence(fbs)
}
I would like to know, can I also implement the function in the following way, which is by applying the lazyUnit defined in the beginning? Is this implementation lazyUnit(ps.map(f)) lazy?
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
lazyUnit(ps.map(f))
I did not completely understand your doubt. But I see a major problem with the following solution,
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
lazyUnit(ps.map(f))
To understand the problem lets look at def lazyUnit,
def fork[A](a: => Par[A]): Par[A] =
(es: ExecutorService) => es.submit(new Callable[A] {
def call: A = a(es).get
})
def lazyUnit[A](a: => A): Par[A] =
fork(unit(a))
So... lazyUnit takes an expression of type => A and submits it to ExecutorService to get evaluated. And returns the wrapped result of this parallel computation as Par[A].
In parMap for every element of ps: List[A], we not only have to evaluate the corresponding mapping using the function f: A => B but we have to do these evaluations in parallel.
But our solution lazyUnit(ps.map(f)) will submit the whole { ps.map(f) } evaluation as a single task to our ExecutionService. Which means we are not doing it in parallel.
What we need to do is make sure that for each element a in ps: [A], the function f: A => B is executed as a separate task for our ExecutorService.
Now, as we learned from our implementation is that we can run an expression of type exp: => A by using lazyUnit(exp) to get a result: Par[A].
So, we will do exactly that for every a: A in ps: List[A],
val parMappedTmp = ps.map( a => lazyUnit(f(a) ) )
// or
val parMappedTmp = ps.map( a => asyncF(f)(a) )
// or
val parMappedTmp = ps.map(asyncF(f))
But, Now our parMappedTmp is a List[Par[B]] and whereas we needed a Par[List[B]]
So, you will need a function with the following signature to get what you wanted,
def sequence[A](ps: List[Par[A]]): Par[List[A]]
Once you have it,
val parMapped = sequence(parMappedTmp)

yld V.S yield on SCALA

I am working in a project that use the following code:
case class R(f: Vector[String], s: Vector[String]) {
def apply(name: String): String = f(schema indexOf name)
def apply(names: S): Vector[String] = names map (this apply _)
}
def processCSV(file: String)(yld: R => Unit): Unit = {
val in = new Scanner(file)
val s= in.next('\n').split(",").toVector
while (in.hasNext) {
val f = schema map (n => in.next(if (n == s.last) '\n' else ','))
yld(R(f, s))
}
}
def execOp(op: Operator)(yld: R => Unit): Unit = op match {
case Scan(file, _, _, _) => processCSV(file)(yld)}
Then My question is what is the meaning of yld? That is the same of yield?
Exactly how works, someone can help me to understand how works this yld?
yield is a scala keyword used with for-comprehensions.
yld in that code is VERY different: it is just the name that the author of the code gave one of the parameters to the functions processCSV and execOp. Any other name could have been given to those parameters: fn, callback, cb, etc. Nothing special there. Given the type R => Unit, it is just a function that takes R as input and return Unit (equivalent to void in java). Essentially, a callback where the work happens as side-effects.

Chaining scala Try instances that contain Options

I am trying to find a cleaner way to express code that looks similar to this:
def method1: Try[Option[String]] = ???
def method2: Try[Option[String]] = ???
def method3: Try[Option[String]] = ???
method1 match
{
case f: Failure[Option[String]] => f
case Success(None) =>
method2 match
{
case f:Failure[Option[String]] => f
case Success(None) =>
{
method3
}
case s: Success[Option[String]] => s
}
case s: Success[Option[String]] => s
}
As you can see, this tries each method in sequence and if one fails then execution stops and the base match resolves to that failure. If method1 or method2 succeeds but contains None then the next method in the sequence is tried. If execution gets to method3 its results are always returned regardless of Success or Failure. This works fine in code but I find it difficult to follow whats happening.
I would love to use a for comprehension
for
{
attempt1 <- method1
attempt2 <- method2
attempt3 <- method3
}
yield
{
List(attempt1, attempt2, attempt3).find(_.isDefined)
}
because its beautiful and what its doing is quite clear. However, if all methods succeed then they are all executed every time, regardless of whether an earlier method returns a usable answer. Unfortunately I can't have that.
Any suggestions would be appreciated.
scalaz can be of help here. You'll need scalaz-contrib which adds a monad instance for Try, then you can use OptionT which has nice combinators. Here is an example:
import scalaz.OptionT
import scalaz.contrib.std.utilTry._
import scala.util.Try
def method1: OptionT[Try, String] = OptionT(Try(Some("method1")))
def method2: OptionT[Try, String] = OptionT(Try(Some("method2")))
def method3: OptionT[Try, String] = { println("method 3 is never called") ; OptionT(Try(Some("method3"))) }
def method4: OptionT[Try, String] = OptionT(Try(None))
def method5: OptionT[Try, String] = OptionT(Try(throw new Exception("fail")))
println((method1 orElse method2 orElse method3).run) // Success(Some(method1))
println((method4 orElse method2 orElse method3).run) // Success(Some(method2))
println((method5 orElse method2 orElse method3).run) // Failure(java.lang.Exception: fail)
If you don't mind creating a function for each method, you can do the following:
(Try(None: Option[String]) /: Seq(method1 _, method2 _, method3 _)){ (l,r) =>
l match { case Success(None) => r(); case _ => l }
}
This is not at all idiomatic, but I would like to point out that there's a reasonably short imperative version also with a couple tiny methods:
def okay(tos: Try[Option[String]]) = tos.isFailure || tos.success.isDefined
val ans = {
var m = method1
if (okay(m)) m
else if ({m = method2; okay(m)}) m
method3
}
The foo method should do the same stuff as your code, I don't think it is possible to do it using the for comprehension
type tryOpt = Try[Option[String]]
def foo(m1: tryOpt, m2: tryOpt, m3: tryOpt) = m1 flatMap {
case x: Some[String] => Try(x)
case None => m2 flatMap {
case y: Some[String] => Try(y)
case None => m3
}
}
method1.flatMap(_.map(Success _).getOrElse(method2)).flatMap(_.map(Success _).getOrElse(method3))
How this works:
The first flatMap takes a Try[Option[String]], if it is a Failure it returns the Failure, if it is a Success it returns _.map(Success _).getOrElse(method2) on the option. If the option is Some then it returns the a Success of the Some, if it is None it returns the result of method2, which could be Success[None], Success[Some[String]] or Failure.
The second map works similarly with the result it gets, which could be from method1 or method2.
Since getOrElse takes a by-name paramater method2 and method3 are only called if they need to be.
You could also use fold instead of map and getOrElse, although in my opinion that is less clear.
From this blog:
def riskyCodeInvoked(input: String): Int = ???
def anotherRiskyMethod(firstOutput: Int): String = ???
def yetAnotherRiskyMethod(secondOutput: String): Try[String] = ???
val result: Try[String] = Try(riskyCodeInvoked("Exception Expected in certain cases"))
.map(anotherRiskyMethod(_))
.flatMap(yetAnotherRiskyMethod(_))
result match {
case Success(res) => info("Operation Was successful")
case Failure(ex: ArithmeticException) => error("ArithmeticException occurred", ex)
case Failure(ex) => error("Some Exception occurred", ex)
}
BTW, IMO, Option is no need here?

Scalaz's traverse_ with IO monad

I want to use IO monad.
But this code do not run with large file.
I am getting a StackOverflowError.
I tried the -DXss option, but it throws the same error.
val main = for {
l <- getFileLines(file)(collect[String, List]).map(_.run)
_ <- l.traverse_(putStrLn)
} yield ()
How can I do it?
I wrote Iteratee that is output all element.
def putStrLn[E: Show]: IterV[E, IO[Unit]] = {
import IterV._
def step(i: IO[Unit])(input: Input[E]): IterV[E, IO[Unit]] =
input(el = e => Cont(step(i >|> effects.putStrLn(e.shows))),
empty = Cont(step(i)),
eof = Done(i, EOF[E]))
Cont(step(mzero[IO[Unit]]))
}
val main = for {
i <- getFileLines(file)(putStrLn).map(_.run)
} yield i.unsafePerformIO
This is also the same result.
I think to be caused by IO implementation.
This is because scalac is not optimizing loop inside getReaderLines for tail calls. loop is tail recursive but I think the case anonymous function syntax gets in the way.
Edit: actually it's not even tail recursive (the wrapping in the IO monad) causes at least one more call after the recursive call. When I was doing my testing yesterday, I was using similar code but I had dropped the IO monad and it was then possible to make the Iteratee tail recursive. The text below, assumes no IO monad...
I happened to find that out yesterday while experimenting with iteratees. I think changing the signature of loop to this will help (so for the time being you may have to reimplement getFilesLines and getReaderLines:
#annotations.tailrec
def loop(it: IterV[String, A]): IO[IterV[String, A]] = it match {
// ...
}
We should probably report this to the scalaz folk (and may be open an enhancement ticket for scala).
This shows what happens (code vaguely similar to getReaderLines.loop):
#annotation.tailrec
def f(i: Int): Int = i match {
case 0 => 0
case x => f(x - 1)
}
// f: (i: Int)Int
#annotation.tailrec
def g: Int => Int = {
case 0 => 0
case x => g(x - 1)
}
/* error: could not optimize #tailrec annotated method g:
it contains a recursive call not in tail position
def g: Int => Int = {
^
*/