Scala, Using Responder to abstract a possible Asynchronous computation - scala

I have been looking into Scala and Akka for managing an obviously parallelisable algorithm.
I have some knowledge of functional programming but mostly do Java, so my FP might not be the best yet.
The algorithm I am working with is pretty simple. There is a top-level computation:
def computeFull(...): FullObject
This computation calls sub-computations and then sums up their results (to simplify):
def computePartial(...): Int
and computeFull does something like this (again simplifying):
val partials = for { x <- 1 to 10
y <- 1 to 10 } yield computePartial(x, y)
partials.foldLeft(0)(_ + _)
So, it's very close to the Akka example that does the Pi computation. I have many computeFull calls to make and many computePartial calls within each of them. So I can wrap all of this in Akka actors or, to simplify, in Futures, calling each computeFull and each computePartial in separate threads. I can then use the fold, zip and map functions of http://doc.akka.io/docs/akka/snapshot/scala/futures.html to combine the futures.
However, this implies that computeFull and computePartial will have to return Futures wrapping the actual results. They thus become dependent on Akka and assume that things are run in parallel. In fact, I also have to implicitly pass the execution contexts down through my functions.
I think that this is weird and that the algorithm "shouldn't" know the details of how it is parallelised, or if it is.
After reading about Futures in Scala (not the Akka ones) and looking into code continuations, it seems to me that the Responder monad provided by Scala (http://www.scala-lang.org/api/current/scala/Responder.html) is the right way to abstract how the function calls are run.
I have this vague intuition that computeFull and computePartial could return Responders instead of Futures, and that when the monad is executed, it decides how the code embedded within the Responder gets executed (whether it spawns a new actor or is executed on the same thread).
However, I am not really sure how to get to this result. Any suggestions? Do you think I am on the right track?

If you don’t want to be dependent on Akka (but note that Akka-style futures will be moved and included with Scala 2.10) and your computation is a simple fold on a collection you can simply use Scala’s parallel collections:
val partials = for { x <- (1 to 10).par
y <- 1 to 10
} yield computePartial(x, y)
// blocks until everything is computed
partials.foldLeft(0)(_ + _)
Of course, this will block until partials is ready, so it may not be appropriate when you really need futures.
With Scala 2.10 style futures you can make that completely asynchronous without your algorithms ever noticing it:
def computePartial(x: Int, y: Int) = {
Thread.sleep((1000 * math.random).toInt)
println (x + ", " + y)
x * y
}
import scala.concurrent.Future
// an implicit ExecutionContext is needed to run the futures
import scala.concurrent.ExecutionContext.Implicits.global
val partials: IndexedSeq[Future[Int]] = for {
x <- 1 to 10
y <- 1 to 10
} yield Future(computePartial(x, y))
val futureResult: Future[Int] = Future.sequence(partials).map(_.fold(0)(_ + _))
def useResult(result: Int) = println(s"The sum is $result")
// now I want to use the result of my computation
futureResult map { result => // called when ready
useResult(result)
}
// still no blocking
println("This is probably printed before the sum is calculated.")
So, computePartial does not need to know anything about how it is being executed. (It should not have any side-effects though, even though for the purpose of this example, a println side-effect was included.)
A possible computeFull method should manage the algorithm and, as such, know about Futures or parallelism. After all, this is part of the algorithm.
(As for the Responder: Scala’s old futures use it so I don’t know where this is going. – And isn’t an execution context exactly the means of configuration you are looking for?)
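To illustrate the remark about execution contexts, here is a minimal sketch, assuming the computePartial(x, y) = x * y stand-in from the example above. The object name ExecutionContextSketch and the sameThread context are hypothetical names introduced only for this illustration. computeFull only asks for some ExecutionContext; the caller decides whether the work runs on the calling thread or on a pool.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ExecutionContextSketch {
  // Stand-in for the real sub-computation (assumption: a pure function).
  def computePartial(x: Int, y: Int): Int = x * y

  // The algorithm only requires *some* ExecutionContext; it does not
  // know whether the partials run in parallel or on the same thread.
  def computeFull()(implicit ec: ExecutionContext): Future[Int] = {
    val partials = for {
      x <- 1 to 10
      y <- 1 to 10
    } yield Future(computePartial(x, y))
    Future.sequence(partials).map(_.sum)
  }

  // A hypothetical context that runs every task on the calling thread.
  val sameThread: ExecutionContext = new ExecutionContext {
    def execute(runnable: Runnable): Unit = runnable.run()
    def reportFailure(cause: Throwable): Unit = throw cause
  }
}
```

Calling computeFull()(ExecutionContextSketch.sameThread) evaluates everything sequentially on the caller's thread, while computeFull()(ExecutionContext.global) distributes the partials over the global thread pool. The algorithm itself is unchanged in both cases.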

A single actor in Akka does not know whether it runs in parallel or not; that is how Akka is designed. But if you don't want to rely on Akka, you can use parallel collections like this:
(for (i <- (0 until numberOfPartialComputations).par) yield
computePartial(i)
).sum
Here sum is called on a parallel collection and is performed in parallel.

Related

Monads being a mechanism for sequencing computations, is the below list still a monad though they are printed in a random order Scala

for {
i <- 1 to 5
} yield Future(println(i))
Desugared to:
List(1,2,3,4,5).map {i => Future(println(i))}
The above code prints numbers in random order.
Now, if we see the multiple definitions of Monad:
a) Monad is a wrapper over an object
b) Monad is a mechanism for sequencing computations
The question I'm trying to answer is: shouldn't the map operation on the List monad wait for the first element in the list to be printed, and only then go on to the computation of the second element, regardless of Future?
Sorry, it might be simple and I'm complicating it but it gets trickier for me to find simple reasoning. Answers will be much appreciated:)
Compare:
for {
_ <- Future(println(1))
_ <- Future(println(2))
_ <- Future(println(3))
_ <- Future(println(4))
_ <- Future(println(5))
} yield ()
or
Future(println(1)).flatMap { _ =>
Future(println(2))
}.flatMap { _ =>
Future(println(3))
}.flatMap { _ =>
Future(println(4))
}.flatMap { _ =>
Future(println(5))
}
with
List(
Future(println(1)),
Future(println(2)),
Future(println(3)),
Future(println(4)),
Future(println(5))
)
The first two create the next Future only after the former completed and made the result available. The last one creates all Futures at once (and it doesn't differ much in this regard from your example with List[Future]).
Future (as opposed to IO from Cats Effect, Monix's Task or ZIO) is eager, so it starts execution the moment you create it. For that reason you have sequential result in the first two examples, and random order (race condition) in the third example.
If you used IO instead of Future it would be more apparent because you wouldn't be able to just have List[IO[Unit]] and execute side effects - you would have to somehow combine the different IOs into one, and the way you would do it would make it obvious whether the effects will be sequential or parallel.
The bottom line is that whether or not Future is a monad depends on how .flatMap behaves (and how it behaves in combination with Future.successful), so your results don't invalidate the claim that Future is a monad. (You can have some doubts if you start checking its behavior with exceptions, but that is another topic.)
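The contrast between eager creation and flatMap chaining can be sketched with the standard library alone. In this minimal sketch the names DeferredFutures, record, recorded and runInOrder are hypothetical. Each side effect is wrapped in a thunk of type () => Future[Unit], so nothing starts until the thunk is called, and the thunks are then chained with flatMap so that each Future is created only after the previous one has completed.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object DeferredFutures {
  // Thread-safe log of effects, so the observed order can be inspected.
  private val log = ListBuffer.empty[Int]
  def record(i: Int): Unit = log.synchronized { log += i; () }
  def recorded: List[Int] = log.synchronized(log.toList)

  // Thunks: building this list starts no Future at all.
  val steps: List[() => Future[Unit]] =
    (1 to 5).toList.map(i => () => Future(record(i)))

  // flatMap chaining: step n+1 is only created once step n has completed,
  // so the effects are recorded in list order.
  def runInOrder(): Future[Unit] =
    steps.foldLeft(Future.successful(())) { (acc, step) =>
      acc.flatMap(_ => step())
    }
}
```

Replacing the thunks with plain Future(record(i)) values would start all five futures as soon as the list is built, which is the race described above.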
The execution of map is indeed sequential, but when you wrap the body in a Future it is executed asynchronously: it is evaluated on another thread, so it is not possible to know which thread will finish first, because that also depends on the operating system's thread scheduling and other factors.
Both of your code snippets are still monads in loose terms. When you call .map() on your object, map picks the elements one by one, in order (from index 0 to index 4), and passes each of them to an operation block (the body of map; map is a higher-order function that accepts a function of type f: This => That).
So the monad operation's responsibility is picking up each element and passing it as a parameter to a function.
In your case the actual function type is:
f: Int => Future[Unit]
For clarity, your function actually looks like this:
def someFunction(i: Int): Future[Unit] = {
Future {
println(i)
}
}
So, what the map operation did here is that it picked one item at a time from your object (in sequence) and called someFunction(i). And that's all a monad does.
Now to answer why your println calls are random: it's because of JVM threads.
If you redefine the body of your map like this
List(1,2,3,4,5)
.map {i =>
println(s"Going to invoke the println in another thread for $i")
Future(println(i))
}
You'll see that the first println will be in sequence, always! It proves that .map() picks your elements in sequence, while the second println may or may not be out of sequence. This out-of-order behaviour is not caused by the monad operation map but by the multithreaded execution of the Futures on multi-core CPUs.

Cats effect - parallel composition of independent effects

I want to combine multiple IO values that should run independently in parallel.
val io1: IO[Int] = ???
val io2: IO[Int] = ???
As I see it, I have two options:
Use cats-effect's fibers with a fork-join pattern
val parallelSum1: IO[Int] = for {
fiber1 <- io1.start
fiber2 <- io2.start
i1 <- fiber1.join
i2 <- fiber2.join
} yield i1 + i2
Use the Parallel instance for IO with parMapN (or one of its siblings like parTraverse, parSequence, parTupled etc)
val parallelSum2: IO[Int] = (io1, io2).parMapN(_ + _)
Not sure about the pros and cons of each approach, and when should I choose one over the other. This becomes even more tricky when abstracting over the effect type IO (tagless-final style):
def io1[F[_]]: F[Int] = ???
def io2[F[_]]: F[Int] = ???
def parallelSum1[F[_]: Concurrent]: F[Int] = for {
fiber1 <- io1[F].start
fiber2 <- io2[F].start
i1 <- fiber1.join
i2 <- fiber2.join
} yield i1 + i2
def parallelSum2[F[_], G[_]](implicit parallel: Parallel[F, G]): F[Int] =
(io1[F], io2[F]).parMapN(_ + _)
The Parallel typeclass requires 2 type constructors, making it somewhat more cumbersome to use, without context bounds and with an additional vague type parameter G[_]
Your guidance is appreciated :)
Amitay
I want to combine multiple IO values that should run independently in
parallel.
The way I view it, in order to figure out "when do I use which?", we need to return to the old parallel vs concurrent discussion, which basically boils down to (quoting the accepted answer):
Concurrency is when two or more tasks can start, run, and complete in
overlapping time periods. It doesn't necessarily mean they'll ever
both be running at the same instant. For example, multitasking on a
single-core machine.
Parallelism is when tasks literally run at the same time, e.g., on a
multicore processor.
We often give IO-like operations, such as making an over-the-wire call or talking to disk, as examples of concurrency.
Question is, which one do you want when you say you want to execute "in parallel", is it the former or the latter?
If we're referring to the former, then using Concurrent[F] both conveys the intention by the signature and provides the proper execution semantics. If it's the latter, and we, for example, want to process a collection of elements in parallel, then going with Parallel[F, G] would be the better solution.
It is often quite confusing when we think about the semantics of this regarding IO, because it has both instances for Parallel and Concurrent and we mostly use it to opaquely define side effecting operations.
As a side note, the reason Parallel takes two unary type constructors is that M (in Parallel[M[_], F[_]]) is always a Monad instance, and we need a way to prove the Monad has an Applicative[F] instance as well for parallel execution, because when we think of a Monad we always talk about sequential execution semantics.

How to implement a recursive Fibonacci sequence in Scala using FS2?

While trying to become familiar with FS2, I came across a nifty recursive implementation using the Scala collections' Stream, and thought I'd have a go at trying it in FS2:
import fs2.{Pure, Stream}
val fibs: Stream[Pure, Int] = Stream[Pure, Int](0) ++ fibs.fold[Int](1)(_ + _)
println(fibs take 10 toList) // This will hang
What is the reason this hangs in FS2, and what is the best way to get a similar, working solution?
Your issue is that Stream.fold consumes all elements of the stream, producing a single final value from the fold. Note that it only emits one element.
The recursive stream only terminates when 10 elements have been emitted (this is specified by take 10). Since this stream is not productive enough, fold continues to add values without stopping.
The simplest way to fix this is to use a combinator that emits the partial results from the fold; this is scan.
Also, FS2 can infer most of the types in this code, so you don't necessarily need as many type annotations.
The following implementation should work fine:
import fs2.{Pure, Stream}
val fibs: Stream[Pure, Int] = Stream(0) ++ fibs.scan(1)(_ + _)
println(fibs take 10 toList)
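The difference between fold and scan can also be seen without FS2. The following is a minimal sketch using only the standard collections, assuming Scala 2.13+ for LazyList; the object name FoldVsScan is hypothetical. foldLeft has to consume the whole input before it yields its single value, while scanLeft emits every intermediate result, which is what allows the self-referential definition to make progress.

```scala
object FoldVsScan {
  // foldLeft consumes the whole input and yields one final value.
  val total: Int = List(1, 2, 3).foldLeft(0)(_ + _)

  // scanLeft emits the seed and then every intermediate result.
  val running: List[Int] = List(1, 2, 3).scanLeft(0)(_ + _)

  // The same recursive trick with the standard library's LazyList:
  // each element is produced from earlier, already known elements.
  lazy val fibs: LazyList[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
}
```

Here total is 6 and running is List(0, 1, 3, 6); taking the first ten elements of fibs gives the start of the Fibonacci sequence.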

What are advantages of a Twitter Future over a Scala Future?

I know a lot of reasons for Scala Future to be better. Are there any reasons to use Twitter Future instead? Except the fact Finagle uses it.
Disclaimer: I worked at Twitter on the Future implementation. A little bit of context, we started our own implementation before Scala had a "good" implementation of Future.
Here're the features of Twitter's Future:
Some method names are different and Twitter's Future has some new helper methods in the companion.
Just one example: Future.join(f1, f2) can work on heterogeneous Future types.
Future.join(
Future.value(new Object), Future.value(1)
).map {
case (o: Object, i: Int) => println(o, i)
}
o and i keep their types; they're not cast to the least common supertype Any.
A chain of onSuccess is guaranteed to be executed in order:
e.g.:
f.onSuccess { _ =>
println(1) // #1
} onSuccess { _ =>
println(2) // #2
}
#1 is guaranteed to be executed before #2
The Threading model is a little bit different. There's no notion of ExecutionContext, the Thread that set the value in a Promise (Mutable implementation of a Future) is the one executing all the computations in the future graph.
e.g.:
val f1 = new Promise[Int]
f1.map(_ * 2).map(_ + 1)
f1.setValue(2) // <- this thread also executes *2 and +1
There's a notion of interruption/cancellation. With Scala's Futures, information only flows in one direction; with Twitter's Future, you can notify a producer of some information (not necessarily a cancellation). In practice, it's used in Finagle to propagate the cancellation of an RPC. Because Finagle also propagates the cancellation across the network and because Twitter has a huge fan-out of requests, this actually saves lots of work.
class MyMessage extends Exception
val p = new Promise[Int]
p.setInterruptHandler {
case ex: MyMessage => println("Receive MyMessage")
}
val f = p.map(_ + 1).map(_ * 2)
f.raise(new MyMessage) // print "Receive MyMessage"
Until recently, Twitter's Future was the only one to implement efficient tail recursion (i.e. you can have a recursive function that calls itself without blowing up your call stack). It has since been implemented in Scala 2.11+ (I believe).
As far as I can tell the main difference that could go in favor of using Twitter's Future is that it can be cancelled, unlike scala's Future.
Also, there used to be some support for tracing the call chains (as you probably know plain stack traces are close to being useless when using Futures). In other words, you could take a Future and tell what chain of map/flatMap produced it. But the idea has been abandoned if I understand correctly.

Are there better monadic abstraction alternative for representing long running, async task?

A Future is good at representing a single asynchronous task that will / should be completed within some fixed amount of time.
However, there exists another kind of asynchronous task, one where it's not possible, or very hard, to know exactly when it will finish. For example, the time taken for a particular string processing task might depend on various factors such as the input size.
For this kind of task, it might be better to detect failure by checking whether the task is able to make progress within a reasonable amount of time, instead of by setting a hard timeout value as with Future.
Are there any libraries providing suitable monadic abstraction of such kind of task in Scala?
You could use a stream of values like this:
sealed trait Update[T]
case class Progress[T](value: Double) extends Update[T]
case class Finished[T](result: T) extends Update[T]
Let your task emit Progress values when it is convenient (e.g. every time a chunk of the computation has finished), and emit one Finished value once the complete computation is finished. The consumer could check for progress values to ensure that the task is still making progress. If a consumer is not interested in progress updates, it can just filter them out. I think this is more composable than an actor-based approach.
Depending on how much performance or purity you need, you might want to look at akka streams or scalaz streams. Akka streams has a pure DSL for building flow graphs, but allows mutability in processing stages. Scalaz streams is more functional, but has lower performance last I heard.
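Without committing to a particular streaming library, the idea can be sketched with a plain blocking queue standing in for the stream. This is only an illustration under that assumption: the names ProgressSketch, sumInChunks and awaitResult are hypothetical, and a real implementation would more likely use one of the stream libraries mentioned above. The consumer treats "no update within stallMillis" as failure instead of imposing a hard timeout on the whole task.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.annotation.tailrec
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ProgressSketch {
  sealed trait Update[T]
  case class Progress[T](value: Double) extends Update[T]
  case class Finished[T](result: T) extends Update[T]

  // Hypothetical task: sums the chunks, reporting progress after each one.
  def sumInChunks(chunks: Seq[Seq[Int]]): LinkedBlockingQueue[Update[Int]] = {
    val updates = new LinkedBlockingQueue[Update[Int]]()
    Future {
      var total = 0
      chunks.zipWithIndex.foreach { case (chunk, i) =>
        total += chunk.sum
        updates.put(Progress[Int]((i + 1).toDouble / chunks.size))
      }
      updates.put(Finished(total))
    }
    updates
  }

  // Waits for the result; fails if no update arrives within stallMillis.
  @tailrec
  def awaitResult[T](updates: LinkedBlockingQueue[Update[T]],
                     stallMillis: Long): Either[String, T] =
    Option(updates.poll(stallMillis, TimeUnit.MILLISECONDS)) match {
      case None                   => Left(s"no progress within $stallMillis ms")
      case Some(Finished(result)) => Right(result)
      case Some(Progress(_))      => awaitResult(updates, stallMillis)
    }
}
```

The stall limit is reset by every Progress value, so a long-running task is accepted as long as it keeps reporting.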
You can break your work into chunks. Each chunk is a Future with a timeout - your notion of reasonable progress. Chain those Futures together to get a complete task.
Example 1 - both chunks can run in parallel and don't depend on each other (embarrassingly parallel task):
val chunk1 = Future { ... } // chunk 1 starts execution here
val chunk2 = Future { ... } // chunk 2 starts execution here
val result = for {
c1 <- chunk1
c2 <- chunk2
} yield combine(c1, c2)
Example 2 - second chunk depends on the first:
val chunk1 = Future { ... } // chunk 1 starts execution here
val result = for {
c1 <- chunk1
c2 <- Future { ... } // chunk 2 may use c1; it starts execution here, after chunk 1 completes
} yield combine(c1, c2)
There are obviously other constructs to help you when you have many Futures like sequence.
The article "The worst thing in our Scala code: Futures" by Ken Scambler points out the need for a separation of concerns:
scala.concurrent.Future is built to work naturally with scala.util.Try, so our code often ends up clumsily using Try everywhere to represent failure, using raw exceptions as failure values even where no exceptions get thrown.
To do anything with a scala.concurrent.Future, you need to lug around an implicit ExecutionContext. This obnoxious dependency needs to be threaded through everywhere they are used.
So if your code does not depend directly on Future, but on simple Monad properties, you can abstract it with a Monad type:
trait Monad[F[_]] {
def flatMap[A,B](fa: F[A], f: A => F[B]): F[B]
def pure[A](a: A): F[A]
}
// read the type parameter as “for all F, constrained by Monad”.
final def load[F[_]: Monad](pageUrl: URL): F[Page]
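As a minimal sketch of how such an abstraction might be used, the Monad trait above can be given one instance for Future and one for a plain identity type, so the same function runs asynchronously or synchronously depending on the type chosen by the caller. The names MonadSketch, Id, idMonad, futureMonad and sumOfSquares are hypothetical and are not taken from the article.

```scala
import scala.concurrent.{ExecutionContext, Future}

object MonadSketch {
  trait Monad[F[_]] {
    def flatMap[A, B](fa: F[A], f: A => F[B]): F[B]
    def pure[A](a: A): F[A]
  }

  // Identity "effect": values are computed directly on the calling thread.
  type Id[A] = A

  object Monad {
    implicit val idMonad: Monad[Id] = new Monad[Id] {
      def flatMap[A, B](fa: Id[A], f: A => Id[B]): Id[B] = f(fa)
      def pure[A](a: A): Id[A] = a
    }

    // The ExecutionContext is only needed where the Future instance is built.
    implicit def futureMonad(implicit ec: ExecutionContext): Monad[Future] =
      new Monad[Future] {
        def flatMap[A, B](fa: Future[A], f: A => Future[B]): Future[B] = fa.flatMap(f)
        def pure[A](a: A): Future[A] = Future.successful(a)
      }
  }

  // Hypothetical algorithm written against Monad only; it never mentions Future.
  def sumOfSquares[F[_]](a: Int, b: Int)(implicit M: Monad[F]): F[Int] =
    M.flatMap(M.pure(a * a), (x: Int) => M.pure(x + b * b))
}
```

With these definitions, sumOfSquares[MonadSketch.Id](3, 4) returns a plain Int, while sumOfSquares[Future](3, 4) returns a Future[Int], provided an implicit ExecutionContext is in scope.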