I'm trying to understand how best to deal with side effects in FP.
I wrote this rudimentary IO implementation:
import scala.io.{Source, StdIn}

trait IO[A] {
  def run: A
}
object IO {
  def unit[A](a: => A): IO[A] = new IO[A] { def run = a }
  def loadFile(fileResourcePath: String) = IO.unit[List[String]] {
    Source.fromResource(fileResourcePath).getLines().toList
  }
  def printMessage(message: String) = IO.unit[Unit] { println(message) }
  // pass the prompt through to readLine; the original ignored `message`
  def readLine(message: String) = IO.unit[String] { StdIn.readLine(message) }
}
I have the following use case:
- load lines from log file
- parse each line to BusinessType object
- process each BusinessType object
- print process result
Case 1:
So Scala code may look like this
val load: String => List[String]
val parse: List[String] => List[BusinessType]
val process: List[BusinessType] => String
val output: String => Unit
Case 2:
I decide to use IO above:
val load: String => IO[List[String]]
val parse: IO[List[String]] => List[BusinessType]
val process: List[BusinessType] => IO[Unit]
val output: IO[Unit] => Unit
In case 1, load is impure because it reads from a file, and output is impure because it writes the result to the console.
To be more functional, I use case 2.
Questions:
- Aren't case 1 and 2 really the same thing?
- In case 2 aren't we just delaying the inevitable?
as the parse function will need to call the io.run
method and cause a side-effect?
- when they say "leave side-effects until the end of the world"
how does this apply to the example above? where is the
end of the world here?
Your IO monad seems to lack all the monad stuff, namely the part where you can flatMap over it to build a bigger IO out of smaller IOs. That way, everything stays "pure" until you call run at the very end.
In case 2 aren't we just delaying the inevitable?
as the parse function will need call the io.run
method and cause a side effect?
No. The parse function should not call io.run. It should return another IO that you can then combine with its input IO.
when they say "leave side-effects until the end of the world"
how does this apply to the example above? where is the
end of the world here?
End of the world would be the last thing your program does. You only run once. The rest of your program "purely" builds one giant IO for that.
Something like
def load(): IO[Seq[String]]
def parse(data: Seq[String]): IO[Parsed] // returns IO, because it has side effects
def pureComputation(data: Parsed): Result // no side effects, no need to use IO
def output(data: Result): IO[Unit]
// combining effects is "pure", so the whole thing
// can be a `val` (or a `def` if it takes some input params)
val program: IO[Unit] = for {
  data <- load() // use <- to flatMap over IO
  parsed <- parse(data)
  result = pureComputation(parsed) // = instead of <-, no I/O here
  _ <- output(result)
} yield ()
// only `run` at the end produces any effects
def main(): Unit = {
  program.run
}
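To make the "end of the world" idea concrete, here is a minimal, self-contained sketch. It assumes a toy IO with map and flatMap added to the question's trait. The `effects` buffer, the "app.log" path, and the canned lines are hypothetical stand-ins for real file and console I/O, used only so the order of effects can be observed.

```scala
import scala.collection.mutable.ListBuffer

object EndOfWorldSketch {
  // Hypothetical effect log standing in for real file/console side effects.
  val effects = ListBuffer.empty[String]

  // The question's IO, extended with map and flatMap so values can be composed.
  trait IO[A] { self =>
    def run: A
    def map[B](f: A => B): IO[B] = new IO[B] { def run: B = f(self.run) }
    def flatMap[B](f: A => IO[B]): IO[B] = new IO[B] { def run: B = f(self.run).run }
  }
  object IO {
    def unit[A](a: => A): IO[A] = new IO[A] { def run: A = a }
  }

  // Effectful steps only describe what to do; nothing happens until `run`.
  def load(path: String): IO[List[String]] =
    IO.unit { effects += s"load $path"; List("a b", "c") }

  // Pure step: no IO needed.
  def parse(lines: List[String]): List[Int] = lines.map(_.split(" ").length)

  def output(total: Int): IO[Unit] =
    IO.unit { effects += s"print $total"; () }

  // Building the program is pure: it only wires descriptions together.
  val program: IO[Unit] = for {
    lines <- load("app.log")
    total = parse(lines).sum
    _ <- output(total)
  } yield ()
}
```

Constructing `EndOfWorldSketch.program` records nothing in `effects`; only calling `program.run` (once, in main) performs the load and the print, in that order.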
I am writing a function in Scala with Cats that does some I/O and that will be called by other parts of the code. I'm also learning Cats, and I want my function to:
Be generic in its effect and use an F[_]
Run on a dedicated thread pool
Introduce async boundaries
I assume that all my functions are generic in F[_] up to the main method, because I'm trying to follow these Cats guidelines.
But I struggle to make these constraints work using ContextShift or ExecutionContext. I have written a full example here, and this is an extract from it:
object ComplexOperation {
// Thread pool for ComplexOperation internal use only
val cs = IO.contextShift(
ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
)
// Complex operation that takes resources and time
def run[F[_]: Sync](input: String): F[String] =
for {
r1 <- Sync[F].delay(cs.shift) *> op1(input)
r2 <- Sync[F].delay(cs.shift) *> op2(r1)
r3 <- Sync[F].delay(cs.shift) *> op3(r2)
} yield r3
def op1[F[_]: Sync](input: String): F[Int] = Sync[F].delay(input.length)
def op2[F[_]: Sync](input: Int): F[Boolean] = Sync[F].delay(input % 2 == 0)
def op3[F[_]: Sync](input: Boolean): F[String] = Sync[F].delay(s"Complex result: $input")
}
This clearly doesn't abstract over effects as ComplexOperation.run needs a ContextShift[IO] to be able to introduce async boundaries. What is the right (or best) way of doing this?
Creating ContextShift[IO] inside ComplexOperation.run makes the function depend on IO which I don't want.
Moving the creation of a ContextShift[IO] to the caller will simply shift the problem: the caller is also generic in F[_], so how does it obtain a ContextShift[IO] to pass to ComplexOperation.run without explicitly depending on IO?
Remember that I don't want to use one global ContextShift[IO] defined at the topmost level but I want each component to decide for itself.
Should my ComplexOperation.run create the ContextShift[IO] or is it the responsibility of the caller?
Am I doing this right at least? Or am I going against standard practices?
So I took the liberty to rewrite your code, hope it helps:
import cats.effect._
object Functions {
def sampleFunction[F[_]: Sync : ContextShift](file: String, blocker: Blocker): F[String] = {
val handler: Resource[F, Int] =
Resource.make(
blocker.blockOn(openFile(file))
) { handler =>
blocker.blockOn(closeFile(handler))
}
handler.use(handler => doWork(handler))
}
private def openFile[F[_]: Sync](file: String): F[Int] = Sync[F].delay {
println(s"Opening file $file with handler 2")
2
}
private def closeFile[F[_]: Sync](handler: Int): F[Unit] = Sync[F].delay {
println(s"Closing file handler $handler")
}
private def doWork[F[_]: Sync](handler: Int): F[String] = Sync[F].delay {
println(s"Calculating the value on file handler $handler")
"The final value"
}
}
object Main extends IOApp {
override def run(args: List[String]): IO[ExitCode] = {
val result = Blocker[IO].use { blocker =>
Functions.sampleFunction[IO](file = "filePath", blocker)
}
for {
data <- result
_ <- IO(println(data))
} yield ExitCode.Success
}
}
You can see it running here.
So, what does this code do?
First, it creates a Resource for the file, since the close has to happen whether the work succeeds, fails, or is cancelled.
It uses Blocker to run the open and close operations on a blocking thread pool (the shifting is done through ContextShift).
Finally, in main, it creates a default Blocker (for IO, in this instance), uses it to call your function, and prints the result.
Feel free to ask any questions.
I am trying to implement the countWords function from the parallelism chapter of the red book. When I pass a thread pool to the function and modify it to print the thread that counts the words, I only see the main thread printed. This suggests to me that the function is not executing in parallel.
What I currently have:
type Par[A] = ExecutorService => Future[A]
def asyncF[A, B](f: A => B): A => Par[B] = a => lazyUnit(f(a))
def lazyUnit[A](a: => A): Par[A] = fork(unit(a))
def unit[A](a: A): Par[A] = (_: ExecutorService) => UnitFuture(a)
def fork[A](a: => Par[A]): Par[A] =
es => es.submit(new Callable[A] {
def call = a(es).get
})
def countWords(l: List[String]): Par[Int] = map(sequence(l.map(asyncF {
println(Thread.currentThread())
s => s.split(" ").length
})))(_.sum)
When I run:
val listPar = List("ab cd", "hg ks", "lh ks", "lh hs")
val es = Executors.newFixedThreadPool(4)
val counts = countWords(listPar)(es)
println(counts.get(100, SECONDS))
I get:
Thread[main,5,main]
8
I would expect to see one thread printed for each element of the list (there are four elements and a thread pool of size 4), but I only see the main thread printed.
Any suggestions?
Thanks
I want to start with one piece of advice on asking questions: you should always provide an MCVE. Your code doesn't compile; for example, I have no idea where UnitFuture comes from, I have no idea which implementation of sequence you're using, etc.
Here is a snippet that works with standard Scala. First, the explanation:
Method countWords takes a list of strings to count, and also two services - one for handling Java Futures on different threads, and one for handling Scala Futures on different threads. Scala one is derived from Java one via ExecutionContext.fromExecutor method.
Why both Java and Scala? Well, I wanted to preserve Java because that's how you initially wrote your code, but I don't know how to sequence a Java Future. So what I did was:
for each substring:
fork a Java Future task
turn it into a Scala Future
sequence the obtained list of Scala Futures
In case you're not familiar with implicits, you will be (if you intend to work with Scala). Here I passed the execution context implicitly because it removes a lot of boilerplate: this way I don't have to pass it explicitly when converting to a Scala Future, when mapping / sequencing, etc.
And now the code itself:
import java.util.concurrent.{Callable, ExecutorService, Executors}
import java.util.concurrent.{Future => JFuture}
import scala.concurrent.{ExecutionContext, Future}
def scalaFromJavaFuture[A](
javaFuture: JFuture[A]
)(implicit ec: ExecutionContext): Future[A] =
Future { javaFuture.get }(ec)
def fork(s: String)(es: ExecutorService): java.util.concurrent.Future[Int] =
es.submit(new Callable[Int] {
def call = {
println(s"Thread: ${Thread.currentThread()}, processing string: $s")
s.split(" ").size
}
})
def countWords(l: List[String])(es: ExecutorService)(implicit ec: ExecutionContext): Future[Int] = {
val listOfFutures = l.map(elem => scalaFromJavaFuture(fork(elem)(es)))
Future.sequence(listOfFutures).map(_.sum)
}
val listPar = List("ab cd", "hg ks", "lh ks", "lh hs")
val es = Executors.newFixedThreadPool(4)
implicit val ec = ExecutionContext.fromExecutor(es)
val counts = countWords(listPar)(es)
counts.onComplete(println)
Example output:
Thread: Thread[pool-1-thread-1,5,main], processing string: ab cd
Thread: Thread[pool-1-thread-3,5,main], processing string: hg ks
Thread: Thread[pool-1-thread-2,5,main], processing string: lh ks
Thread: Thread[pool-1-thread-4,5,main], processing string: lh hs
Success(8)
Note that it's up to execution context to determine the threads. Run it a couple of times and you will see for yourself - you might end up with e.g. only two threads being used:
Thread: Thread[pool-1-thread-1,5,main], processing string: ab cd
Thread: Thread[pool-1-thread-3,5,main], processing string: hg ks
Thread: Thread[pool-1-thread-1,5,main], processing string: lh ks
Thread: Thread[pool-1-thread-1,5,main], processing string: lh hs
Success(8)
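As a side note on the original snippet, one likely reason only the main thread was printed is that the println sits in the block passed to asyncF, before the lambda, so it runs once while the function is being built rather than once per element. The following sketch isolates that behaviour using only plain functions; the object and counter names are hypothetical and exist only to make the evaluation counts visible.

```scala
object BlockVsLambda {
  var blockEvaluations = 0 // how often the surrounding block was evaluated
  var lambdaCalls = 0      // how often the returned function was called

  // Same shape as `asyncF { println(...); s => ... }` in the question:
  // the statement before the lambda belongs to the block, not to the lambda.
  val countWordsIn: String => Int = {
    blockEvaluations += 1
    (s: String) => {
      lambdaCalls += 1
      s.split(" ").length
    }
  }

  def countAll(lines: List[String]): Int = lines.map(countWordsIn).sum
}
```

With the four-element list from the question, the block is evaluated once and the lambda four times. Moving the println inside the lambda body is what makes it run on whichever thread executes each task.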
I have a CSV file from which I read data and populate my database. I am using Scala to do this. Instead of firing the DB inserts in parallel, I want to execute them sequentially (i.e. one after another). I don't want to use Await in a for loop. Is there any approach other than Await?
P.S.: I have read the 1000 entries from the CSV into a list and am looping over the list to create the DB inserts.
Assuming you have some kind of save(entity: T): Future[_] method for your database, you can just fold your futures with flatMap (or for comprehension):
def saveAll(entities: List[T]): Future[Unit] = {
  entities.foldLeft(Future.successful(())) {
    case (f, entity) => for {
      _ <- f
      _ <- save(entity)
    } yield ()
  }
}
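The fold above can be tried out with a runnable sketch like the following. It assumes a hypothetical `save` that just sleeps and records the entity in an in-memory buffer instead of touching a database. The sleep times are chosen so that later entities would finish first if the saves ran in parallel.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{ExecutionContext, Future}

object SequentialSaves {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for the database: records the order of completed saves.
  val saved = ListBuffer.empty[Int]

  // Hypothetical save: smaller entities sleep longer, so a parallel run would
  // complete out of order.
  def save(entity: Int): Future[Unit] = Future {
    Thread.sleep(5L * (5 - entity))
    saved.synchronized { saved += entity }
    ()
  }

  // Each save is only created (and therefore started) after the previous
  // Future in the chain has completed.
  def saveAll(entities: List[Int]): Future[Unit] =
    entities.foldLeft(Future.successful(())) { (previous, entity) =>
      previous.flatMap(_ => save(entity))
    }
}
```

The key point is that `save(entity)` is called inside flatMap, so the next insert does not exist until the previous one is done. No Await is needed inside the loop; the caller gets back one Future for the whole chain.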
Another option is a recursive function. It is less concise than foldLeft, but more readable to some. Just one more option for your consideration (assuming save(entity: T): Future[R]):
def saveAll(entities: List[T]): Future[List[R]] = {
entities.headOption match {
case Some(entity) =>
for {
head <- save(entity)
tail <- saveAll(entities.tail)
} yield {
head :: tail
}
case None =>
Future.successful(Nil)
}
}
Yet another option, if your save method lets you supply your own ExecutionContext, i.e. save(entity: T)(implicit ec: ExecutionContext): Future[R], is to fire the Futures all at once but on a single-threaded execution context. Note that this only serializes work that actually runs on that context:
def saveAll(entities: List[T]): Future[List[R]] = {
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(java.util.concurrent.Executors.newSingleThreadExecutor())
  Future.sequence(entities.map(save(_)))
}
Here's the code from FPIS
object test2 {
//a naive IO monad
sealed trait IO[A] { self =>
def run: A
def map[B](f: A => B): IO[B] = new IO[B] { def run = f(self.run) }
def flatMap[B](f: A => IO[B]): IO[B] = {
println("calling IO.flatMap")
new IO[B] {
def run = {
println("calling run from flatMap result")
f(self.run).run
}
}
}
}
object IO {
def unit[A](a: => A): IO[A] = new IO[A] { def run = a }
def apply[A](a: => A): IO[A] = unit(a) // syntax for IO { .. }
}
//composer in question
def forever[A,B](a: IO[A]): IO[B] = {
lazy val t: IO[B] = a flatMap (_ => t)
t
}
def PrintLine(msg: String) = IO { println(msg) }
def say = forever(PrintLine("Still Going..")).run
}
test2.say will print thousands of "Still Going.." lines before the stack overflows. But I don't know exactly how that happens.
The output looks like this:
scala> test2.say
calling IO.flatMap //only once
calling run from flatMap result
Still Going..
calling run from flatMap result
Still Going..
... //repeating until stack overflows
When the function forever returns, is the lazy val t fully computed (cached)?
Also, the flatMap method seems to be called only once (I added print statements), which seems to contradict the recursive definition of forever. Why?
===========
Another thing I find interesting is that the B type in forever[A, B] could be anything. Scala actually can run with it being opaque.
I manually tried forever[Unit, Double], forever[Unit, String] etc and it all worked. This feels smart.
What the forever method does is, as the name suggests, make the monadic instance a run forever. To be more precise, it gives us an infinite chain of monadic operations.
Its value t is defined recursively as:
t = a flatMap (_ => t)
which expands to
t = a flatMap (_ => a flatMap (_ => t))
which expands to
t = a flatMap (_ => a flatMap (_ => a flatMap (_ => t)))
and so on.
Lazy gives us the ability to define something like this. If we removed the lazy part we would either get a "forward reference" error (in case the recursive value is contained within some method) or it would simply be initialized with a default value and not used recursively (if contained within a class, which makes it a class field with a behind-the-scenes getter and setter).
Demo:
val rec: Int = 1 + rec
println(rec) // prints 1, "rec" in the body is initialized to default value 0
def foo() = {
val rec: Int = 1 + rec // ERROR: forward reference extends over definition of value rec
println(rec)
}
However, this alone is not the reason why the whole stack overflow thing happens. There is another recursive part, and this one is actually responsible for the stack overflow. It is hidden here:
def run = {
println("calling run from flatMap result")
f(self.run).run
}
Method run ends up calling itself: f(self.run) returns t, and the trailing .run on that result is this very method. When we define it like this, nothing is evaluated on the spot because f hasn't been invoked yet; we are just stating what will happen once run is invoked.
But when we create the value t in forever, we are creating an IO monad that flatMaps into itself (the function it provides to flatMap is "evaluate into yourself"). This will trigger the run and therefore the evaluation and invocation of f. We never really leave the flatMap context (hence only one printed statement for the flatMap part) because as soon as we try to flatMap, run starts evaluating the function f which returns the IO on which we call run which invokes the function f which returns the IO on which we call run which invokes the function f which returns the IO on which we call run...
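The explanation above can be checked with a small instrumented sketch. It assumes a cut-down IO with only run and flatMap, and it replaces the println calls with counters. The `Stop` exception and the `tick` action are hypothetical additions that end the loop after 100 iterations, well before the stack would overflow.

```scala
object ForeverTrace {
  var flatMapCalls = 0
  var runCalls = 0
  var ticks = 0

  trait IO[A] { self =>
    def run: A
    def flatMap[B](f: A => IO[B]): IO[B] = {
      flatMapCalls += 1 // counted when the chain is built
      new IO[B] {
        def run: B = {
          runCalls += 1 // counted each time the chain is executed
          f(self.run).run
        }
      }
    }
  }

  def forever[A, B](a: IO[A]): IO[B] = {
    lazy val t: IO[B] = a flatMap (_ => t)
    t
  }

  // Hypothetical escape hatch so the demo terminates.
  final class Stop extends RuntimeException

  val tick: IO[Unit] = new IO[Unit] {
    def run: Unit = {
      ticks += 1
      if (ticks == 100) throw new Stop
    }
  }

  // Returns (number of flatMap calls, number of run calls).
  def demo(): (Int, Int) = {
    try forever[Unit, Nothing](tick).run
    catch { case _: Stop => () }
    (flatMapCalls, runCalls)
  }
}
```

Here flatMap is called once, when t is first forced, while run is entered once per iteration, each call nested inside the previous one. That nesting is what eventually exhausts the stack in the unbounded version.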
I'd like to know when function forever returns, is the lazy val t fully computed (cached)?
Yes
If so then why need the lazy keyword?
It's no use in your case. It can be useful in a situation like:
def repeat(n: Int): Seq[String] = {
  lazy val expensive = "some expensive computation"
  Seq.fill(n)(expensive)
  // when n == 0, the 'expensive' computation will be skipped
  // when n > 1, the 'expensive' computation will only be computed once
}
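The two comments in that snippet can be verified with a variant that counts evaluations. This is only a sketch: the `evaluations` counter is a hypothetical addition, and the string still stands in for a genuinely expensive computation.

```scala
object LazyValDemo {
  var evaluations = 0 // how many times the lazy initializer actually ran

  def repeat(n: Int): Seq[String] = {
    lazy val expensive = {
      evaluations += 1
      "some expensive computation"
    }
    // Seq.fill takes its element by name, so `expensive` is only touched
    // when at least one element is needed.
    Seq.fill(n)(expensive)
  }
}
```

Calling `repeat(0)` leaves the counter at zero, and a later `repeat(3)` raises it by exactly one even though the value is used three times.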
The other thing I don't understand is that the flatMap method seems to
be called only once (I add print statements) which counters the
recursive definition of forever. Why?
Not possible to comment until you provide a Minimal, Complete, and Verifiable example, as @Yuval Itzchakov said.
Updated 19/04/2017
Alright, I need to correct myself :-) In your case the lazy val is required due to the recursive reference back to itself.
To explain your observation, let's try to expand the forever(a).run call:
1. forever(a) expands to
2. { lazy val t = a flatMap(_ => t) }, which expands to
3. { lazy val t = new IO[B] { def run() = { ... t.run } } }
Because t is lazy, the flatMap and the new IO[B] in steps 2 and 3 are invoked only once and then 'cached' for reuse.
On invoking run() on 3, you start a recursion on t.run, hence the result you observed.
Not exactly sure about your requirement, but a non-stack-blowing version of forever can be implemented like:
import scala.annotation.tailrec

def forever[A, B](a: IO[A]): IO[B] = {
  new IO[B] {
    @tailrec
    override final def run: B = {
      a.run
      run
    }
  }
}
new IO[B] {
def run = {
println("calling run from flatMap result")
f(self.run).run
}
}
I now get why the overflow occurs in the run method: the outer run invocation in def run actually points to def run itself.
The call stack looks like this:
f(self.run).run
|-----|--- println
|--- f(self.run).run
|-----|------println
|------f(self.run).run
|------ (repeating)
f(self.run) always points to the same evaluated/cached lazy val t object
because f: _ => t simply returns t that IS the UNIQUE newly created
IO[B] that hosts its run method which we are calling and will immediately recursively call again.
That's why we see the print statements before the stack overflows.
However, it is still not clear to me how lazy val makes this work in this case.
So, I read the article here about parallel comprehension. The author gives the following code example:
// Make 3 parallel async calls
val fooFuture = WS.url("http://foo.com").get()
val barFuture = WS.url("http://bar.com").get()
val bazFuture = WS.url("http://baz.com").get()
for {
foo <- fooFuture
bar <- barFuture
baz <- bazFuture
} yield {
// Build a Result using foo, bar, and baz
Ok(...)
}
All fine so far, but I am in a situation where I don't always know how many WS.get() calls I need to make; I want it to be dynamic. So for instance:
val checks = Seq(callOne(param), callTwo(param))
Where the calls are:
def callOne(param: String): Future[Boolean] = {
// do something and return the Future with a true/false value
Future(true)
}
def callTwo(param: String): Future[Boolean] = {
// do something and return the Future with a true/false value
Future(false)
}
So, my question is: how should I react to the results of my sequence of WS calls (or database queries, for that matter) in a for-yield?
I have given two example calls, but I want the same code to be able to process one to many calls in parallel and gather the results in the for-yield, to ultimately proceed to do other things.
Important: all calls should be carried out in parallel; the quickest ones will complete before the slow ones, regardless of the order in which they are fired.
Future.sequence is likely what you want.
Example usage:
val futures = List(WS.url("http://foo.com").get(), WS.url("http://bar.com").get())
Future.sequence(futures) // => Transforms a Seq[Future[_]] into a Future[Seq[_]]
The future returned from Future.sequence will not be completed until all of the futures in the input sequence are completed.
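Applied to the dynamic list of checks from the question, this might look like the sketch below. The bodies of `callOne` and `callTwo` are hypothetical placeholders for the real WS calls or database queries, and `allPass` is an illustrative name, not part of any library.

```scala
import scala.concurrent.{ExecutionContext, Future}

object DynamicChecks {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical checks standing in for WS calls or database queries.
  def callOne(param: String): Future[Boolean] = Future(param.nonEmpty)
  def callTwo(param: String): Future[Boolean] = Future(param.length > 3)

  // Any number of checks can be listed here.
  val checks: Seq[String => Future[Boolean]] =
    Seq((p: String) => callOne(p), (p: String) => callTwo(p))

  def allPass(param: String, toRun: Seq[String => Future[Boolean]]): Future[Boolean] = {
    // All futures are created here, before the for, so they run concurrently.
    val started: Seq[Future[Boolean]] = toRun.map(check => check(param))
    for {
      results <- Future.sequence(started)
    } yield results.forall(identity)
  }
}
```

Because the futures are started before the for-comprehension, the comprehension only waits for them; it does not serialize them. The results arrive in the same order as the input sequence, regardless of which call finished first.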
Bonus:
If your futures are heterogeneously typed and you need to preserve those types, you can use an HList. I've written the following snippet, which takes an HList of futures and transforms it into a Future containing an HList of resolved values:
import shapeless._
import shapeless.ops.hlist.RightFolder
import scala.concurrent.{ExecutionContext, Future}
object FutureHelpers {
object FutureReducer extends Poly2 {
import scala.concurrent.ExecutionContext.Implicits.global
implicit def f[A, B <: HList] = at[Future[A], Future[B]] { (f, resultFuture) =>
for {
result <- resultFuture
value <- f
} yield value :: result
}
}
// Like Future.sequence, but for HList
// hsequence(Future { 1 } :: Future { "string" } :: HNil)
// => Future { 1 :: "string" :: HNil }
def hsequence[T <: HList](hlist: T)(implicit
executor: ExecutionContext,
folder: RightFolder[T, Future[HNil], FutureReducer.type]) = {
hlist.foldRight(Future.successful[HNil](HNil))(FutureReducer)
}
}