I want to use the IO monad.
But this code does not run with a large file.
I am getting a StackOverflowError.
I tried the -DXss option, but it throws the same error.
val main = for {
  l <- getFileLines(file)(collect[String, List]).map(_.run)
  _ <- l.traverse_(putStrLn)
} yield ()
How can I do it?
I also wrote an Iteratee that outputs every element:
def putStrLn[E: Show]: IterV[E, IO[Unit]] = {
  import IterV._
  def step(i: IO[Unit])(input: Input[E]): IterV[E, IO[Unit]] =
    input(el = e => Cont(step(i >|> effects.putStrLn(e.shows))),
          empty = Cont(step(i)),
          eof = Done(i, EOF[E]))
  Cont(step(mzero[IO[Unit]]))
}
val main = for {
  i <- getFileLines(file)(putStrLn).map(_.run)
} yield i.unsafePerformIO
This gives the same result.
I think it is caused by the IO implementation.
This is because scalac is not optimizing loop inside getReaderLines for tail calls. loop is tail recursive, but I think the anonymous-function syntax with case clauses gets in the way.
Edit: actually it's not even tail recursive: the wrapping in the IO monad causes at least one more call after the recursive call. When I was testing yesterday, I was using similar code but had dropped the IO monad, and it was then possible to make the Iteratee tail recursive. The text below assumes no IO monad...
I happened to find that out yesterday while experimenting with iteratees. I think changing the signature of loop to this will help (so for the time being you may have to reimplement getFileLines and getReaderLines):
@annotation.tailrec
def loop(it: IterV[String, A]): IO[IterV[String, A]] = it match {
  // ...
}
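A minimal sketch of what such an IO-free loop can look like, assuming a deliberately simplified, hypothetical iteratee type (Step, Done and Cont below are illustrations, not scalaz's IterV). The driver is a def that takes the current iteratee as an explicit parameter, so the recursive call is the last thing it does and @tailrec accepts it:

```scala
import java.io.{BufferedReader, StringReader}
import scala.annotation.tailrec

// Hypothetical, simplified iteratee: Some(line) is an element, None is end of input.
sealed trait Step[A]
final case class Done[A](result: A) extends Step[A]
final case class Cont[A](k: Option[String] => Step[A]) extends Step[A]

// Tail-recursive enumerator: a method with an explicit parameter, so the
// recursive call is in tail position and scalac compiles it to a loop.
@tailrec
def drive[A](r: BufferedReader, it: Step[A]): Step[A] = it match {
  case Done(_) => it
  case Cont(k) =>
    val line = r.readLine()
    if (line == null) k(None)     // signal EOF once and stop
    else drive(r, k(Some(line)))  // recursive call in tail position
}

// Line counter: each step returns a new Cont immediately, adding no stack depth.
def counter(n: Int): Step[Int] = Cont[Int] {
  case Some(_) => counter(n + 1)
  case None    => Done(n)
}
```

Driving counter(0) over 100000 lines this way returns Done(100000) without growing the stack, because counter hands back a new Cont instead of recursing.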
We should probably report this to the scalaz folk (and maybe open an enhancement ticket for Scala).
This shows what happens (code vaguely similar to getReaderLines.loop):
@annotation.tailrec
def f(i: Int): Int = i match {
  case 0 => 0
  case x => f(x - 1)
}
// f: (i: Int)Int
@annotation.tailrec
def g: Int => Int = {
  case 0 => 0
  case x => g(x - 1)
}
/* error: could not optimize @tailrec annotated method g:
   it contains a recursive call not in tail position
       def g: Int => Int = {
           ^
*/
How to write an early-return piece of code in Scala with no returns/breaks?
For example:
for i in 0..10000000
if expensive_operation(i)
return i
return -1
How about:
input.find(expensiveOperation).getOrElse(-1)
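A small runnable check of this approach; expensiveOperation and its call counter are hypothetical stand-ins for the question's expensive_operation:

```scala
// Hypothetical stand-in for expensive_operation; `calls` records how often it runs.
var calls = 0
def expensiveOperation(i: Int): Boolean = {
  calls += 1
  i * i > 50
}

// find stops at the first element that satisfies the predicate.
val result = (0 to 10000000).find(expensiveOperation).getOrElse(-1)
```

Here result is 8 and calls is 9: the predicate is never evaluated for elements after the first match.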
You can use dropWhile.
Here is an example:
Seq(2,6,8,3,5).dropWhile(_ % 2 == 0).headOption.getOrElse(default = -1) // -> 3
And here you can find more: scala-takewhile-example
With your example:
(0 to 10000000).dropWhile(!expensive_operation(_)).headOption.getOrElse(default = -1)
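Note that dropWhile drops the elements for which the predicate holds, so on Seq(2,6,8,3,5) the first remaining element is 3. A quick sketch comparing it with find on the same data:

```scala
val xs = Seq(2, 6, 8, 3, 5)

// dropWhile discards the leading even numbers; headOption takes the first survivor.
val viaDropWhile = xs.dropWhile(_ % 2 == 0).headOption.getOrElse(-1)

// find expresses the same search directly, with the predicate negated.
val viaFind = xs.find(_ % 2 != 0).getOrElse(-1)
```

Both values are 3.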
Since you asked for an intuition for solving this problem generically, let me start from the basics.
Scala is (among other things) a functional programming language, and as such one concept is very important for us: we write programs by composing expressions rather than statements.
Thus, for us, the concept of a return value simply means the result of evaluating an expression.
(Note that this is related to the concept of referential transparency.)
val a = expr // a is bound to the evaluation of expr,
val b = (a, a) // and the two are interchangeable, thus b === (expr, expr)
How does this relate to your question? In the sense that we do not really have control structures, only complex expressions. For example, an if:
val a = if (expr) exprA else exprB // if is itself an expression, which evaluates to one of the other two expressions.
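A few more expressions of the same kind, as a runnable sketch (the function names are made up for illustration):

```scala
// if/else is an expression: both branches produce a value of the same type.
def sign(n: Int): String =
  if (n < 0) "negative" else if (n == 0) "zero" else "positive"

// match is an expression too; each case evaluates to a value.
def describe(xs: List[Int]): String = xs match {
  case Nil    => "empty"
  case h :: _ => s"starts with $h"
}

// A block evaluates to its last expression, so it can be bound directly.
val area = {
  val w = 3
  val h = 4
  w * h
}
```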
Thus instead of doing something like this:
def foo(a: Int): Int = {
  if (a != 0) {
    val b = a * a
    return b
  }
  return -1
}
We would do something like:
def foo(a: Int): Int =
  if (a != 0)
    a * a
  else
    -1
This works because the whole if expression can itself be the body of foo.
Now, returning to your specific question: how can we return early from a loop?
The answer is that you can't, at least not without mutation. But you can use a higher-level concept: instead of iterating, you can traverse something, and you can do that using recursion.
Thus, let's implement the find proposed by @Thilo ourselves, as a tail-recursive function.
(It is very important that the function is tail recursive, so that the compiler turns it into the equivalent of a while loop; that way we will not blow up the stack.)
def find(start: Int, end: Int, step: Int = 1)(predicate: Int => Boolean): Option[Int] = {
  @annotation.tailrec
  def loop(current: Int): Option[Int] =
    if (current >= end)
      None // Base case (end is exclusive; assumes a positive step).
    else if (predicate(current))
      Some(current) // Early return.
    else
      loop(current + step) // Recursive step.
  loop(current = start)
}
find(0, 10000)(_ == 10)
// res: Option[Int] = Some(10)
Or we can generalize this a little more and implement find for Lists of any element type.
def find[T](list: List[T])(predicate: T => Boolean): Option[T] = {
  @annotation.tailrec
  def loop(remaining: List[T]): Option[T] =
    remaining match {
      case Nil => None
      case t :: _ if (predicate(t)) => Some(t)
      case _ :: tail => loop(remaining = tail)
    }
  loop(remaining = list)
}
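To check that the @tailrec version is really stack-safe, here is the List version again with a usage on a list of a million elements (the size is arbitrary):

```scala
import scala.annotation.tailrec

def find[T](list: List[T])(predicate: T => Boolean): Option[T] = {
  @tailrec
  def loop(remaining: List[T]): Option[T] =
    remaining match {
      case Nil                    => None
      case t :: _ if predicate(t) => Some(t) // early return
      case _ :: tail              => loop(tail)
    }
  loop(list)
}

// Deep enough that a non-tail-recursive loop would likely overflow the default stack.
val big = List.range(0, 1000000)
```

Both a match on the very last element and a search with no match complete normally.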
This is not necessarily the best solution from a practical perspective, but I still wanted to add it for educational purposes:
import scala.annotation.tailrec
def expensiveOperation(i: Int): Boolean = ??? // placeholder for your own predicate
@tailrec
def findFirstBy[T](f: (T) => Boolean)(xs: Seq[T]): Option[T] = {
  xs match {
    case Seq() => None
    case Seq(head, _*) if f(head) => Some(head)
    case Seq(_, tail @ _*) => findFirstBy(f)(tail)
  }
}
val result = findFirstBy(expensiveOperation)(Range(0, 10000000)).getOrElse(-1)
Please prefer collection methods (dropWhile, find, ...) in your production code.
There are a lot of better answers here, but I think a while loop could work just fine in this situation.
So this code:
for i in 0..10000000
if expensive_operation(i)
return i
return -1
could be rewritten as:
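var i = 0
var result = false
// Test 0..10000000 inclusive, stopping at the first match.
while (!result && i <= 10000000) {
  result = expensive_operation(i)
  if (!result) i += 1
}
After the while loop, result tells whether the search succeeded, and if it did, i holds the first matching value.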
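The same loop wrapped in a function, so the mutable state stays local and -1 can be returned when nothing matches; expensiveOperation and firstMatching are hypothetical names:

```scala
// Hypothetical predicate standing in for expensive_operation.
def expensiveOperation(i: Int): Boolean = i * i > 50

// Returns the first i in 0..upTo that satisfies p, or -1 if none does.
def firstMatching(upTo: Int)(p: Int => Boolean): Int = {
  var i = 0
  var found = false
  while (!found && i <= upTo) {
    if (p(i)) found = true
    else i += 1
  }
  if (found) i else -1
}
```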
Background
I have been reading the book Functional Programming in Scala, and have some questions regarding the content in Chapter 7: Purely functional parallelism.
The code for the book's answers is in Par.scala, but I am confused about certain parts of it.
Here is the first part of Par.scala (Par stands for parallelism):
import java.util.concurrent._
object Par {
  type Par[A] = ExecutorService => Future[A]
  def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a)
  private case class UnitFuture[A](get: A) extends Future[A] {
    def isDone = true
    def get(timeout: Long, units: TimeUnit): A = get
    def isCancelled = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }
  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    (es: ExecutorService) => {
      val af = a(es)
      val bf = b(es)
      UnitFuture(f(af.get, bf.get))
    }
  def fork[A](a: => Par[A]): Par[A] =
    (es: ExecutorService) => es.submit(new Callable[A] {
      def call: A = a(es).get
    })
  def lazyUnit[A](a: => A): Par[A] =
    fork(unit(a))
  def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)
  def asyncF[A, B](f: A => B): A => Par[B] =
    a => lazyUnit(f(a))
  def map[A, B](pa: Par[A])(f: A => B): Par[B] =
    map2(pa, unit(()))((a, _) => f(a))
}
The simplest possible model for Par[A] might be ExecutorService => Future[A], and run simply returns the Future.
unit promotes a constant value to a parallel computation by returning a UnitFuture, which is a simple implementation of Future that just wraps a constant value.
map2 combines the results of two parallel computations with a binary function.
fork marks a computation for concurrent evaluation. The evaluation won’t actually occur until forced by run. Here it is given its simplest and most natural implementation; even though that implementation has its problems, let's put them aside for now.
lazyUnit wraps its unevaluated argument in a Par and marks it for concurrent evaluation.
run extracts a value from a Par by actually performing the computation.
asyncF converts any function A => B to one that evaluates its result asynchronously.
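To see these pieces working together, here is a reduced, self-contained restatement of the definitions above, run on a two-thread pool. UnitFuture is written as a plain class with Java-style () methods, and demo is a hypothetical helper:

```scala
import java.util.concurrent._

object Par {
  type Par[A] = ExecutorService => Future[A]

  // A Future that already holds its value.
  private final class UnitFuture[A](value: A) extends Future[A] {
    def get(): A = value
    def get(timeout: Long, units: TimeUnit): A = value
    def isDone(): Boolean = true
    def isCancelled(): Boolean = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }

  def unit[A](a: A): Par[A] = _ => new UnitFuture(a)

  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => {
      val af = a(es) // start both computations before waiting on either
      val bf = b(es)
      new UnitFuture(f(af.get(), bf.get()))
    }

  // Submits the lazily passed computation as a separate task.
  def fork[A](a: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = a(es).get() })

  def lazyUnit[A](a: => A): Par[A] = fork(unit(a))

  def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)
}

// Hypothetical helper: evaluates a small Par program on a two-thread pool and
// also reports which thread evaluated a forked expression.
def demo(): (Int, String) = {
  val es = Executors.newFixedThreadPool(2)
  try {
    val sum = Par.map2(Par.lazyUnit(1 + 1), Par.lazyUnit(2 + 2))(_ + _)
    val where = Par.lazyUnit(Thread.currentThread().getName)
    (Par.run(es)(sum).get(), Par.run(es)(where).get())
  } finally es.shutdown()
}
```

demo() returns 6 together with the name of a pool thread: building sum and where evaluates nothing, and the deferred expressions only run once run hands the ExecutorService to them.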
Questions
fork is the function that confuses me most here, because it takes a lazy argument, which is evaluated later, when the computation is actually run. So my questions are mostly about when we should use fork, i.e., when we need lazy evaluation and when we need the value directly.
Here is an exercise from the book:
EXERCISE 7.5
Hard: Write this function, called sequence. No additional primitives are required. Do not call run.
def sequence[A](ps: List[Par[A]]): Par[List[A]]
And here are the answers (offered here).
First:
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))
What is the difference between the code above and the following?
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
  l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))
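One concrete difference can be checked directly: with foldLeft, the accumulated list ends up in reverse order. A minimal sketch, assuming simplified stand-ins for unit and map2 (CompletableFuture.completedFuture replaces UnitFuture only to keep it short; runWith is a hypothetical helper):

```scala
import java.util.concurrent.{CompletableFuture, ExecutorService, Executors, Future}

// Simplified stand-ins for the book's Par, unit and map2.
type Par[A] = ExecutorService => Future[A]

def unit[A](a: A): Par[A] = _ => CompletableFuture.completedFuture(a)

def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
  es => {
    val af = a(es)
    val bf = b(es)
    CompletableFuture.completedFuture(f(af.get(), bf.get()))
  }

// The two variants from the question, under new names.
def viaFoldRight[A](l: List[Par[A]]): Par[List[A]] =
  l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))

def viaFoldLeft[A](l: List[Par[A]]): Par[List[A]] =
  l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))

// Hypothetical helper: runs a Par on a throwaway single-thread pool.
def runWith[A](p: Par[A]): A = {
  val es = Executors.newSingleThreadExecutor()
  try p(es).get() finally es.shutdown()
}
```

For List(unit(1), unit(2), unit(3)), viaFoldRight yields List(1, 2, 3) while viaFoldLeft yields List(3, 2, 1), because foldLeft visits the last element last and conses it onto the front.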
Additionally:
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
  }
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = fork {
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(sequenceBalanced(l), sequenceBalanced(r))(_ ++ _)
  }
}
In sequenceRight, fork wraps the direct recursive call. In sequenceBalanced, however, fork wraps the whole function body.
Then what is the difference between the code above and the following (where we swapped the placement of fork)?
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] = fork {
  as match {
    case Nil => unit(Nil)
    case h :: t => map2(h, sequenceRight(t))(_ :: _)
  }
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] =
  if (as.isEmpty) unit(Vector())
  else if (as.length == 1) map(as.head)(a => Vector(a))
  else {
    val (l, r) = as.splitAt(as.length / 2)
    map2(fork(sequenceBalanced(l)), fork(sequenceBalanced(r)))(_ ++ _)
  }
Finally, given the sequence defined above, we have the following function:
def parMap[A,B](ps: List[A])(f: A => B): Par[List[B]] = fork {
  val fbs: List[Par[B]] = ps.map(asyncF(f))
  sequence(fbs)
}
I would like to know whether I can also implement the function in the following way, by applying the lazyUnit defined at the beginning. Is this implementation, lazyUnit(ps.map(f)), lazy?
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))
I did not completely understand your question, but I see a major problem with the following solution:
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
  lazyUnit(ps.map(f))
To understand the problem, let's look at def lazyUnit:
def fork[A](a: => Par[A]): Par[A] =
  (es: ExecutorService) => es.submit(new Callable[A] {
    def call: A = a(es).get
  })
def lazyUnit[A](a: => A): Par[A] =
  fork(unit(a))
So lazyUnit takes an expression of type => A and submits it to the ExecutorService to be evaluated, returning the result of this parallel computation wrapped as a Par[A].
In parMap, for every element of ps: List[A], we not only have to evaluate the corresponding mapping using the function f: A => B, but we also have to do these evaluations in parallel.
But the solution lazyUnit(ps.map(f)) submits the whole { ps.map(f) } evaluation as a single task to our ExecutorService, which means the elements are not processed in parallel.
What we need to do is make sure that, for each element a in ps: List[A], the function f: A => B is executed as a separate task on our ExecutorService.
Now, as we learned from the implementation, we can run an expression exp: => A by using lazyUnit(exp) to get a result of type Par[A].
So we will do exactly that for every a: A in ps: List[A]:
val parMappedTmp = ps.map(a => lazyUnit(f(a)))
// or
val parMappedTmp = ps.map(a => asyncF(f)(a))
// or
val parMappedTmp = ps.map(asyncF(f))
But now parMappedTmp is a List[Par[B]], whereas we need a Par[List[B]].
So you will need a function with the following signature to get what you want:
def sequence[A](ps: List[Par[A]]): Par[List[A]]
Once you have it:
val parMapped = sequence(parMappedTmp)
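Putting the pieces together, a self-contained sketch of parMap built this way. sequence here is the foldRight version quoted in the question; runParMap and the pool size of 4 are arbitrary choices for illustration:

```scala
import java.util.concurrent._

object Par {
  type Par[A] = ExecutorService => Future[A]

  private final class UnitFuture[A](value: A) extends Future[A] {
    def get(): A = value
    def get(timeout: Long, units: TimeUnit): A = value
    def isDone(): Boolean = true
    def isCancelled(): Boolean = false
    def cancel(evenIfRunning: Boolean): Boolean = false
  }

  def unit[A](a: A): Par[A] = _ => new UnitFuture(a)

  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => {
      val af = a(es)
      val bf = b(es)
      new UnitFuture(f(af.get(), bf.get()))
    }

  def fork[A](a: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = a(es).get() })

  def lazyUnit[A](a: => A): Par[A] = fork(unit(a))

  def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)

  // Each element becomes its own task.
  def asyncF[A, B](f: A => B): A => Par[B] = a => lazyUnit(f(a))

  // Combines List[Par[A]] into Par[List[A]]; foldRight keeps the order.
  def sequence[A](ps: List[Par[A]]): Par[List[A]] =
    ps.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))

  def parMap[A, B](ps: List[A])(f: A => B): Par[List[B]] = fork {
    val fbs: List[Par[B]] = ps.map(asyncF(f))
    sequence(fbs)
  }
}

// Hypothetical helper: the outer fork in parMap blocks one pool thread while
// it waits for the per-element tasks, so the pool needs at least two threads.
def runParMap(xs: List[Int]): List[Int] = {
  val es = Executors.newFixedThreadPool(4)
  try Par.run(es)(Par.parMap(xs)(_ * 10)).get()
  finally es.shutdown()
}
```

runParMap(List(1, 2, 3)) returns List(10, 20, 30). With a single-thread pool the same call would deadlock, since the outer task would wait on tasks queued behind it.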
Recently, I was playing with the Scalaz Tutorial: Enumeration-based I/O With Iteratees, written by Rúnar.
I have a question about the implementation of enumerating the file.
def enumReader[A](r: BufferedReader,
                  it: IterV[String, A]): IO[IterV[String, A]] = {
  @tailrec
  def loop: IterV[String, A] => IO[IterV[String, A]] = {
    case i @ Done(_, _) => IO { i }
    case i @ Cont(k) => for {
      s <- IO { r.readLine }
      a <- if (s == null) IO { i } else loop(k(El(s)))
    } yield a
  }
  loop(it)
}
My understanding of the code: enumReader gets the signal Done or Cont from the iteratee, and if it is Cont, it calls loop recursively.
However, this loop is not tail recursive: when I add the @tailrec annotation, I get a compilation error.
So I think that if enumReader tries to read a large file, it will throw a StackOverflowError.
Also, I think this is hard to fix because, when we want to turn normal recursion into tail recursion, we usually add an accumulator parameter, but in this case loop is a function of type IterV[String, A] => IO[IterV[String, A]].
Edit:
Further, I think an Iteratee such as counter may throw the same StackOverflowError as well:
def counter[A]: IterV[A,Int] = {
  def step(n: Int): Input[A] => IterV[A,Int] = {
    case El(x) => Cont(step(n + 1))
    case Empty => Cont(step(n))
    case EOF => Done(n, EOF)
  }
  Cont(step(0))
}
Can someone tell me how to refactor this one?
Many thanks in advance
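One known way around the problem in enumReader is trampolining, the technique Functional Programming in Scala uses for its IO type: flatMap only records the next step as data, and a separate @tailrec interpreter runs the steps. A minimal sketch with a hypothetical TIO type (not scalaz's IO), driving a loop shaped like enumReader's:

```scala
import java.io.{BufferedReader, StringReader}
import scala.annotation.tailrec

// Hypothetical trampolined IO: flatMap builds a FlatMap node instead of running anything.
sealed trait TIO[A] {
  def flatMap[B](f: A => TIO[B]): TIO[B] = FlatMap(this, f)
  def map[B](f: A => B): TIO[B] = flatMap(a => Pure(f(a)))
}
final case class Pure[A](a: A) extends TIO[A]
final case class Suspend[A](thunk: () => A) extends TIO[A]
final case class FlatMap[A, B](sub: TIO[A], k: A => TIO[B]) extends TIO[B]

object TIO {
  def apply[A](a: => A): TIO[A] = Suspend(() => a)

  // The interpreter is the only recursive part, and it is tail recursive.
  @tailrec
  def run[A](io: TIO[A]): A = io match {
    case Pure(a)    => a
    case Suspend(t) => t()
    case FlatMap(sub, k) =>
      sub match {
        case Pure(a)           => run(k(a))
        case Suspend(t)        => run(k(t()))
        // Re-associate nested flatMaps to the right so the loop can continue.
        case FlatMap(sub2, k2) => run(sub2.flatMap(x => k2(x).flatMap(k)))
      }
  }
}

// Shaped like enumReader's loop: read a line, then recurse inside flatMap.
def countLines(r: BufferedReader, acc: Int): TIO[Int] =
  TIO(r.readLine()).flatMap { s =>
    if (s == null) Pure(acc) else countLines(r, acc + 1)
  }
```

countLines is not itself tail recursive, and it does not need to be: each call only returns the next FlatMap node, and TIO.run processes them one at a time in constant stack space.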
I'd like a function to be optimized for tail recursion. Without the optimization, the function throws a StackOverflowError.
Sample code:
import scala.util.Try
import scala.annotation.tailrec
object Main {
  val trials = 10
  @tailrec
  val gcd : (Int, Int) => Int = {
    case (a,b) if (a == b) => a
    case (a,b) if (a > b) => gcd (a-b,b)
    case (a,b) if (b > a) => gcd (a, b-a)
  }
  def main(args : Array[String]) : Unit = {
    testTailRec()
  }
  def testTailRec(): Unit = {
    val outputs : List[Boolean] = Range(0, trials).toList.map(_ + 6000) map { x =>
      Try( gcd(x, 1) ).toOption.isDefined
    }
    outputTestResult(outputs)
  }
  def outputTestResult(source : List[Boolean]) = {
    val failed = source.count(_ == false)
    val initial = source.takeWhile(_ == false).length
    println( s"totally $failed failures, $initial of which at the beginning")
  }
}
Running it will produce the following output:
[info] Running Main
[info] totally 2 failures, 2 of which at the beginning
So the first two runs are performed without optimization and are aborted halfway through by a StackOverflowError, and only later invocations produce the desired result.
There is a workaround: warm up the function with fake runs before actually using it. But that seems clumsy and highly inconvenient. Are there any other ways to ensure my recursive function is optimized for tail recursion before it runs for the first time?
Update:
I was told to use a two-step definition:
@tailrec
def gcd_worker(a: Int, b: Int): Int = {
  if (a == b) a
  else if (a > b) gcd_worker(a-b,b)
  else gcd_worker(a, b-a)
}
val gcd : (Int,Int) => Int = gcd_worker(_,_)
I would prefer to keep a clean functional-style definition if possible.
I do not think @tailrec applies to a function defined as a val at all. Change it to a def and it will run without errors.
From what I understand, @tailrec [1] needs to be on a method, not a field. I was able to make this tail recursive in the REPL with the following change:
@tailrec
def gcd(a: Int, b: Int): Int = {
  if (a == b) a
  else if (a > b) gcd(a-b,b)
  else gcd(a, b-a)
}
[1] http://www.scala-lang.org/api/current/index.html#scala.annotation.tailrec
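If a function value is still wanted, a sketch of the def-based version followed by eta-expansion; it assumes positive arguments (gcd(0, n) with n > 0 would never terminate):

```scala
import scala.annotation.tailrec

// @tailrec is checked on methods, so gcd is a def that scalac compiles to a loop.
@tailrec
def gcd(a: Int, b: Int): Int =
  if (a == b) a
  else if (a > b) gcd(a - b, b)
  else gcd(a, b - a)

// Eta-expansion turns the method into a function value when one is needed.
val gcdFn: (Int, Int) => Int = gcd
```

The inputs from the question's test (6000 to 6009 paired with 1) then run through the compiled loop from the first call, with no warm-up.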
Looking at an IO Monad example from Functional Programming in Scala:
def ReadLine: IO[String] = IO { scala.io.StdIn.readLine() } // StdIn.readLine replaces the deprecated Predef.readLine
def PrintLine(msg: String): IO[Unit] = IO { println(msg) }
def converter: IO[Unit] = for {
  _ <- PrintLine("enter a temperature in degrees fahrenheit")
  d <- ReadLine.map(_.toDouble)
  _ <- PrintLine((d + 32).toString)
} yield ()
I decided to rewrite converter with flatMap.
def converterFlatMap: IO[Unit] = PrintLine("enter a temperate in degrees F").
  flatMap(x => ReadLine.map(_.toDouble)).
  flatMap(y => PrintLine((y + 32).toString))
When I replaced the last flatMap with map, I did not see the result of the readLine printed out on the console.
With flatMap:
enter a temperate in degrees
37.0
With map:
enter a temperate in degrees
Why? Also, how is the signature (IO[Unit]) still the same with map or flatMap?
Here's the IO monad from this book.
sealed trait IO[A] { self =>
  def run: A
  def map[B](f: A => B): IO[B] =
    new IO[B] { def run = f(self.run) }
  def flatMap[B](f: A => IO[B]): IO[B] =
    new IO[B] { def run = f(self.run).run }
}
I think Scala converts IO[IO[Unit]] into IO[Unit] in the second case: because the declared result type is IO[Unit], the B of the last map is inferred as Unit, and the IO[Unit] produced by the function is discarded instead of being run. Try both variants in the Scala console without the IO[Unit] annotation on def converterFlatMap, and you'll see the difference.
As for why map doesn't work, it can be seen clearly from the definition of IO:
when you map with a function that returns an IO, the resulting run calls run only on the outer IO (self.run) and merely applies f to its result, so you get an IO[IO[T]] and only the first PrintLine and the ReadLine are executed.
flatMap also runs the inner IO (f(self.run).run), so the result is an IO[T], where T is the type parameter A of the inner IO, and all three statements are executed.
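To make the difference observable without the console, here is the book's IO with an assumed IO.apply constructor, plus hypothetical test doubles: a PrintLine that appends to a log and a ReadLine that always returns "37":

```scala
import scala.collection.mutable.ListBuffer

// The book's IO, as quoted in the question.
sealed trait IO[A] { self =>
  def run: A
  def map[B](f: A => B): IO[B] =
    new IO[B] { def run = f(self.run) }
  def flatMap[B](f: A => IO[B]): IO[B] =
    new IO[B] { def run = f(self.run).run }
}

object IO {
  // Assumed constructor so that IO { ... } works; the quoted trait does not show one.
  def apply[A](a: => A): IO[A] = new IO[A] { def run = a }
}

// Hypothetical test doubles: output goes to a log, input is always "37".
val log = ListBuffer.empty[String]
def PrintLine(msg: String): IO[Unit] = IO { log += msg; () }
def ReadLine: IO[String] = IO { "37" }

val withFlatMap: IO[Unit] =
  PrintLine("prompt").flatMap(_ => ReadLine.map(_.toDouble)).flatMap(d => PrintLine((d + 32).toString))

// Annotated as IO[IO[Unit]] so the nested IO stays visible instead of being discarded.
val withMap: IO[IO[Unit]] =
  PrintLine("prompt").flatMap(_ => ReadLine.map(_.toDouble)).map(d => PrintLine((d + 32).toString))
```

Running withMap only logs "prompt": the inner PrintLine comes back as an unexecuted IO[Unit], and it logs "69.0" only when its own run is called.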
P.S.: I think you expanded the for-comprehension incorrectly. According to the desugaring rules, the for-comprehension you wrote should expand to:
PrintLine("enter a temperate in degrees F").flatMap { case _ =>
  ReadLine.map(_.toDouble).flatMap { case d =>
    PrintLine((d + 32).toString).map { case _ => () }
  }
}
Notice that in this version flatMaps/maps are nested.
P.P.S.: In fact, the last step could also be a flatMap rather than a map. If we assume Scala had a "return" operator that puts values into the monadic context (e.g. return(3) would create an IO[Int] that does nothing and whose run method returns 3), then we could rewrite for (x <- a; y <- b) yield y as a.flatMap(x => b.flatMap(y => return(y))).
But because b.flatMap(y => return(y)) behaves exactly the same as b.map(y => y), the last generator in a Scala for-comprehension is expanded into map.
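The same rules can be checked with Option standing in for IO and Some(_) playing the role of the hypothetical "return" (a sketch, not the IO from the question):

```scala
val a: Option[Int] = Some(1)
val b: Option[Int] = Some(3)

// flatMap followed by "return" behaves exactly like map.
val viaReturn = b.flatMap(y => Some(y))
val viaMap    = b.map(y => y)

// Hence a two-generator for/yield desugars to flatMap followed by map.
val viaFor   = for { x <- a; y <- b } yield y
val viaCalls = a.flatMap(x => b.map(y => y))
```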