Functional parallelism and laziness in Scala

Background
I have been reading the book Functional Programming in Scala and have some questions about the content of Chapter 7: Purely functional parallelism.
Here is the code for the answers in the book: Par.scala, but I am confused about certain parts of it.
Here is the first part of the code of Par.scala, which stands for Parallelism:
import java.util.concurrent._

object Par {
  type Par[A] = ExecutorService => Future[A]

  def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a)

  private case class UnitFuture[A](get: A) extends Future[A] {
    def isDone = true
    def get(timeout: Long, units: TimeUnit): A = get
    def isCancelled = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }

  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    (es: ExecutorService) => {
      val af = a(es)
      val bf = b(es)
      UnitFuture(f(af.get, bf.get))
    }

  def fork[A](a: => Par[A]): Par[A] =
    (es: ExecutorService) => es.submit(new Callable[A] {
      def call: A = a(es).get
    })

  def lazyUnit[A](a: => A): Par[A] =
    fork(unit(a))

  def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)

  def asyncF[A, B](f: A => B): A => Par[B] =
    a => lazyUnit(f(a))

  def map[A, B](pa: Par[A])(f: A => B): Par[B] =
    map2(pa, unit(()))((a, _) => f(a))
}
The simplest possible model for Par[A] might be ExecutorService => Future[A], and run simply returns the Future.
unit promotes a constant value to a parallel computation by returning a UnitFuture, which is a simple implementation of Future that just wraps a constant value.
map2 combines the results of two parallel computations with a binary function.
fork marks a computation for concurrent evaluation. The evaluation won't actually occur until forced by run. Shown above is its simplest and most natural implementation. Even though it has its problems, let's set them aside for now.
lazyUnit wraps its unevaluated argument in a Par and marks it for concurrent evaluation.
run extracts a value from a Par by actually performing the computation.
asyncF converts any function A => B to one that evaluates its result asynchronously.
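To make these descriptions concrete, here is a minimal usage sketch (not from the book; it assumes the Par object above is in scope):

import java.util.concurrent.{ExecutorService, Executors}
import Par._

val es: ExecutorService = Executors.newFixedThreadPool(2)
// each lazyUnit becomes its own task; map2 waits for both and combines them
val sum: Par[Int] = map2(lazyUnit(1 + 2), lazyUnit(3 + 4))(_ + _)
println(run(es)(sum).get) // prints 10
es.shutdown()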
Questions
The fork function confuses me a lot here, because it takes a lazy argument, which will be evaluated later when it is called. My questions are mostly about when we should use fork, i.e., when we need lazy evaluation and when we need the value directly.
Here is an exercise from the book:
EXERCISE 7.5
Hard: Write this function, called sequence. No additional primitives are required. Do not call run.
def sequence[A](ps: List[Par[A]]): Par[List[A]]
And here are the answers (offered here).
First
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))
What is the difference between the code above and the following:
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))
Additionally
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
  }

def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = fork {
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(sequenceBalanced(l), sequenceBalanced(r))(_ ++ _)
  }
}
In sequenceRight, fork is applied where the recursive call is made. In sequenceBalanced, however, fork wraps the whole function body. So what is the difference between the code above and the following (where the placement of fork is switched):
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] = fork {
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, sequenceRight(t))(_ :: _)
  }
}

def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] =
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(fork(sequenceBalanced(l)), fork(sequenceBalanced(r)))(_ ++ _)
  }
Finally, given the sequence defined above, we have the following function:
def parMap[A, B](ps: List[A])(f: A => B): Par[List[B]] = fork {
  val fbs: List[Par[B]] = ps.map(asyncF(f))
  sequence(fbs)
}
I would like to know: can I also implement the function in the following way, by applying the lazyUnit defined at the beginning? Is the implementation lazyUnit(ps.map(f)) lazy?
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))

I did not completely understand your doubt, but I see a major problem with the following solution:
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))
To understand the problem, let's look at lazyUnit:
def fork[A](a: => Par[A]): Par[A] =
  (es: ExecutorService) => es.submit(new Callable[A] {
    def call: A = a(es).get
  })

def lazyUnit[A](a: => A): Par[A] =
  fork(unit(a))
So lazyUnit takes an expression of type => A and, when run, submits it to the ExecutorService to be evaluated, returning the wrapped result of this parallel computation as a Par[A].
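For instance (a sketch; slowSquare is a made-up stand-in for an expensive computation):

def slowSquare(n: Int): Int = { Thread.sleep(1000); n * n }

val p: Par[Int] = lazyUnit(slowSquare(7)) // nothing is computed yet
// the Callable is submitted, and slowSquare runs, only when the Par is run:
// run(es)(p).get  // == 49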
In parMap, for every element of ps: List[A] we not only have to evaluate the corresponding mapping using the function f: A => B, we have to do these evaluations in parallel.
But our solution lazyUnit(ps.map(f)) submits the whole ps.map(f) evaluation as a single task to our ExecutorService, which means we are not doing it in parallel.
What we need to do is make sure that for each element a in ps: List[A], the function f: A => B is executed as a separate task on our ExecutorService.
Now, as we learned from our implementation, we can run an expression exp: => A by using lazyUnit(exp) to get a result: Par[A].
So we will do exactly that for every a: A in ps: List[A]:
val parMappedTmp = ps.map(a => lazyUnit(f(a)))
// or
val parMappedTmp = ps.map(a => asyncF(f)(a))
// or
val parMappedTmp = ps.map(asyncF(f))
But now our parMappedTmp is a List[Par[B]], whereas we need a Par[List[B]].
So you will need a function with the following signature to get what you want:
def sequence[A](ps: List[Par[A]]): Par[List[A]]
Once you have it,
val parMapped = sequence(parMappedTmp)
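Putting this next to the lazyUnit version from the question (a sketch; the two names are made up for the comparison):

// one task: the whole ps.map(f) traversal runs inside a single Callable
def parMapSequential[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))

// one task per element (plus the outer fork): the mappings can run in parallel
def parMapParallel[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  fork(sequence(ps.map(asyncF(f))))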

Related

yld vs. yield in Scala

I am working on a project that uses the following code:
// Scanner, Operator and Scan come from the asker's project. The question's
// undefined `schema` is assumed to refer to the schema field `s`, and the
// undefined type `S` is assumed to be Vector[String].
case class R(f: Vector[String], s: Vector[String]) {
  def apply(name: String): String = f(s indexOf name)
  def apply(names: Vector[String]): Vector[String] = names map (this apply _)
}

def processCSV(file: String)(yld: R => Unit): Unit = {
  val in = new Scanner(file)
  val s = in.next('\n').split(",").toVector
  while (in.hasNext) {
    val f = s map (n => in.next(if (n == s.last) '\n' else ','))
    yld(R(f, s))
  }
}

def execOp(op: Operator)(yld: R => Unit): Unit = op match {
  case Scan(file, _, _, _) => processCSV(file)(yld)
}
My question is: what is the meaning of yld? Is it the same as yield? How exactly does it work? Can someone help me understand this yld?
yield is a Scala keyword used with for-comprehensions.
yld in that code is VERY different: it is just the name the author of the code gave to one of the parameters of the functions processCSV and execOp. Any other name could have been used: fn, callback, cb, etc. Nothing special there. Given the type R => Unit, it is just a function that takes an R as input and returns Unit (equivalent to void in Java). Essentially, a callback where the work happens as side effects.
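For illustration, here is the same pattern with a made-up producer function; the parameter plays exactly the role yld plays in processCSV:

def forEachLine(lines: Seq[String])(yld: String => Unit): Unit =
  lines.foreach(yld)

forEachLine(Seq("a,b", "c,d")) { line =>
  println(line) // the callback is invoked once per value the producer yields
}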

Implicit parameter "chaining" for DSL

I have a (vague) idea: to pass (or chain) some implicit value in this manner, without introducing parameters to the block f:
def block(x: Int)(f: => Unit)(implicit v: Int) = {
  implicit val nv = v + x
  f
}

def fun(implicit v: Int) = println(v)
such that if I write something like:
implicit val ii: Int = 0

block(1) {
  block(2) {
    fun
  }
}
It would print 3. It would work if I could say def block(x: Int)(f: implicit Int => Unit). In other words, I'm looking for a design pattern which will allow me to implement this DSL: access some cumulative value inside nested blocks, but without explicitly passing it as a parameter. Is it possible? (Implicits are not necessary; they are just a hint to emphasize that I don't want to pass the accumulator explicitly.) Of course, the code above will print 0, since f is resolved at the call site, where ii = 0 is the only implicit in scope.
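(As an aside: the syntax wished for above exists in Scala 3, where context functions let a block install a given for its body. A sketch:)

// Scala 3 only: Int ?=> Unit is a context function taking an implicit Int
def block(x: Int)(f: Int ?=> Unit)(using v: Int): Unit =
  f(using v + x)

def fun(using v: Int): Unit = println(v)

given Int = 0
block(1) { block(2) { fun } } // prints 3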
EDIT: One possible usage: composing HTTP routes, in the following manner:
prefix("path") {
prefix("subpath") {
post("action1") { (req, res) => do action }
get("action2") { (req, res) => do action }
}
}
Here post and get would access (how?) the accumulated prefix, say List("path", "subpath") or "/path/subpath/".
Consider using DynamicVariable for this. It's really simple to use, and thread-safe:
import scala.util.DynamicVariable

val acc: DynamicVariable[Int] = new DynamicVariable(0)

def block(x: Int)(f: => Unit) = {
  acc.withValue(acc.value + x)(f)
}

def fun = println(acc.value)
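With these definitions the original example behaves as intended:

block(1) {
  block(2) {
    fun // prints 3: withValue(0 + 1) { withValue(1 + 2) { ... } }
  }
}
println(acc.value) // prints 0: the bindings are scoped to the blocks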
Passing state via implicits is dirty and will lead to unexpected and hard-to-track-down bugs. What you're asking for is a function that composes in such a way that nested calls accumulate over some operation, and anything else uses that value to execute the function.
// a repaired sketch: `op` is the accumulating operation, taken as a combine function
case class StateAccum[S](init: S)(op: (S, S) => S) {
  def flatMap[A <: S](f: S => StateAccum[A]): StateAccum[S] = {
    val out = f(init).init
    StateAccum(op(init, out))(op)
  }
  def apply[A](f: S => A): A = f(init)
}
which could allow you to do exactly what you're after, with a slight change in how you call it.
Now, if you really want the nested control structures, your apply would have to use an implicit value to distinguish the return types, so that it applies the function directly in one case and flatMaps over StateAccum returns in the other. It gets crazy, but it looks like the following:
// inside StateAccum:
def apply[A](f: S => A)(implicit mapper: Mapper[S, A]): mapper.Out = mapper(this, f)

trait Mapper[S, A] {
  type Out
  def apply(s: StateAccum[S], f: S => A): Out
}

object Mapper extends LowPriorityMapper {
  implicit def accum[S, A <: S] = new Mapper[S, StateAccum[A]] {
    type Out = StateAccum[A]
    def apply(s: StateAccum[S], f: S => StateAccum[A]) = s.flatMap(f)
  }
}

trait LowPriorityMapper {
  implicit def accum[S, A] = new Mapper[S, A] {
    type Out = A
    def apply(s: StateAccum[S], f: S => A) = f(s.init)
  }
}

flatMap and For-Comprehension with IO Monad

Looking at an IO Monad example from Functional Programming in Scala:
def ReadLine: IO[String] = IO { readLine }
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }

def converter: IO[Unit] = for {
  _ <- PrintLine("enter a temperature in degrees fahrenheit")
  d <- ReadLine.map(_.toDouble)
  _ <- PrintLine((d + 32).toString)
} yield ()
I decided to re-write converter with a flatMap.
def converterFlatMap: IO[Unit] = PrintLine("enter a temperate in degrees F").
flatMap(x => ReadLine.map(_.toDouble)).
flatMap(y => PrintLine((y + 32).toString))
When I replaced the last flatMap with map, I did not see the result of the readLine printed out on the console.
With flatMap:
enter a temperature in degrees F
37.0
With map:
enter a temperature in degrees F
Why? Also, how is the signature (IO[Unit]) still the same with map or flatMap?
Here's the IO monad from the book:
sealed trait IO[A] { self =>
  def run: A
  def map[B](f: A => B): IO[B] =
    new IO[B] { def run = f(self.run) }
  def flatMap[B](f: A => IO[B]): IO[B] =
    new IO[B] { def run = f(self.run).run }
}
I think Scala coerces the IO[IO[Unit]] into an IO[Unit] in the second case. Try running both variants in the Scala console without specifying the IO[Unit] type on def converterFlatMap, and you'll see the difference.
As for why map doesn't work, it is clear from the definition of IO:
when you end the chain with map, you build an IO[IO[T]], and running it calls run only on the outer IO, so only the first two actions (PrintLine and ReadLine) are executed.
flatMap also runs the inner IO, and the result is an IO[T], where T is the type parameter A of the inner IO, so all three statements are executed.
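Here is a sketch that makes the types explicit, dropping the IO[Unit] ascription so the inferred type shows through:

val nested: IO[IO[Unit]] =
  PrintLine("enter a temperature in degrees F").
    flatMap(_ => ReadLine.map(_.toDouble)).
    map(d => PrintLine((d + 32).toString)) // builds the inner IO but never runs it

nested.run // executes the first two actions, then merely returns the last IO[Unit]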
P.S.: I think you expanded the for-comprehension incorrectly. According to the rules, the for-comprehension you have written should be expanded to:
PrintLine("enter a temperate in degrees F").flatMap { case _ =>
ReadLine.map(_.toDouble).flatMap { case d =>
PrintLine((d + 32).toString).map { case _ => ()}
}
}
Notice that in this version flatMaps/maps are nested.
P.P.S.: In fact, the last statement of the for-comprehension should also be a flatMap, not a map. If we assume that Scala had a return operator that puts values into the monadic context
(e.g. return(3) creates an IO[Int] that does nothing and whose run function returns 3), then we could rewrite for (x <- a; y <- b) yield y as a.flatMap(x => b.flatMap(y => return(y))),
but because b.flatMap(y => return(y)) behaves exactly the same as b.map(y => y), the last statement of a Scala for-comprehension is expanded into a map.
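A sketch of that equivalence against the IO trait above, with unit standing in for the hypothetical return:

def unit[A](a: => A): IO[A] = new IO[A] { def run = a }

// by the monad right-identity law, these two behave identically:
val viaFlatMap: IO[Double] = ReadLine.map(_.toDouble).flatMap(d => unit(d))
val viaMap: IO[Double] = ReadLine.map(_.toDouble).map(d => d)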

How to carry on executing Future sequence despite failure?

The traverse method on the Future companion object stops at the first failure. I want a tolerant/forgiving version of this method which, when an error occurs, carries on with the rest of the sequence.
Currently we have added the following method to our utils:
def traverseFilteringErrors[A, B <: AnyRef]
(seq: Seq[A])
(f: A => Future[B]): Future[Seq[B]] = {
val sentinelValue = null.asInstanceOf[B]
val allResults = Future.traverse(seq) { x =>
f(x) recover { case _ => sentinelValue }
}
val successfulResults = allResults map { result =>
result.filterNot(_ == sentinelValue)
}
successfulResults
}
Is there a better way to do this?
A genuinely useful thing (generally speaking) would be the ability to promote the error of a future into a proper value; in other words, to transform a Future[T] into a Future[Try[T]] (a successful return value becomes a Success[T], while the failure case becomes a Failure[T]). Here is how we might implement it:
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

// Can also be done more concisely (but less efficiently) as:
// f.map(Success(_)).recover { case t: Throwable => Failure(t) }
// NOTE: you might also want to move this into an enrichment class
def mapValue[T](f: Future[T]): Future[Try[T]] = {
  val prom = Promise[Try[T]]()
  f onComplete prom.success
  prom.future
}
Now, if you do the following:
Future.traverse(seq)(f andThen mapValue)
You'll obtain a successful Future[Seq[Try[A]]], whose eventual value contains a Success instance for each successful future and a Failure instance for each failed future.
If needed, you can then use collect on this seq to drop the Failure instances and keep only the successful values.
In other words, you can rewrite your helper method as follows:
def traverseFilteringErrors[A, B](seq: Seq[A])(f: A => Future[B]): Future[Seq[B]] =
  Future.traverse(seq)(f andThen mapValue) map (_ collect { case Success(x) => x })

What would be the good name for this operation?

I see that the Scala standard library is missing a method to get the ranges of objects in a collection that satisfy a predicate:
def <???>(p: A => Boolean): List[List[A]] = {
val buf = collection.mutable.ListBuffer[List[A]]()
var elems = this.dropWhile(e => !p(e))
while (elems.nonEmpty) {
buf += elems.takeWhile(p)
elems = elems.dropWhile(e => !p(e))
}
buf.toList
}
What would be a good name for such a method? And is my implementation good enough?
I'd go for chunkWith or chunkBy.
As for your implementation, I think this cries out for recursion! See if you can fill this out:
import scala.annotation.tailrec

@tailrec
def chunkBy[A](l: List[A], acc: List[List[A]] = Nil)(p: A => Boolean): List[List[A]] = l match {
  case Nil => acc
  case l =>
    val next = l dropWhile !p
    val (chunk, rest) = next span p
    chunkBy(rest, chunk :: acc)(p) // note: chunks accumulate in reverse order
}
Why recursion? It's much easier to understand the algorithm and more likely to be bug-free (given the absence of vars).
The syntax !p for the negation of a predicate is achieved via an implicit conversion
implicit def PredicateW[A](p: A => Boolean) = new {
  def unary_! : A => Boolean = a => !p(a)
}
I generally keep this around as it's astoundingly useful
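For example, with that conversion in scope (compiled with -language:reflectiveCalls to silence the structural-type warning):

val positive: Int => Boolean = _ > 0
List(1, -2, 3) filter !positive // List(-2)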
How about:
def chunkBy[K](f: A => K): Map[K, List[List[A]]] = ...
Similar to groupBy but keeps contiguous chunks as chunks.
Using this, you can do xs.chunkBy(p)(true) to get what you want.
You probably want to call it splitWith because split is the string operation that more-or-less does that, and it's similar to splitAt.
Incidentally, here's a very compact implementation (though it does a lot of unnecessary work, so it's not a good implementation for speed; yours is fine for that):
def splitWith[A](xs: List[A])(p: A => Boolean) = {
  (xs zip xs.scanLeft(1) { (i, x) => if (p(x) == ((i & 1) == 1)) i + 1 else i }.tail).
    filter(_._2 % 2 == 0).groupBy(_._2).toList.sortBy(_._1).map(_._2.map(_._1))
}
Just a little refinement of oxbow's code; this way the signature is lighter:
def chunkBy[A](xs: List[A])(p: A => Boolean): List[List[A]] = {
  @tailrec
  def recurse(todo: List[A], acc: List[List[A]]): List[List[A]] = todo match {
    case Nil => acc
    case _ =>
      val next = todo dropWhile (!p(_))
      if (next.isEmpty) acc // avoid an empty trailing chunk when nothing else matches
      else {
        val (chunk, rest) = next span p
        recurse(rest, acc ::: List(chunk))
      }
  }
  recurse(xs, Nil)
}