I'm reading Functional Programming in Scala and having fun with its examples and exercises. I'm studying the strictness and laziness chapter, which covers Stream.
I can't understand the output produced by the following code excerpt:
sealed trait Stream[+A]{
def foldRight[B](z: => B)(f: (A, => B) => B): B =
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f))
case _ => z
}
def map[B](f: A => B): Stream[B] = foldRight(Stream.empty[B])((h,t) => {println(s"map h:$h"); Stream.cons(f(h), t)})
def filter(f:A=>Boolean):Stream[A] = foldRight(Stream.empty[A])((h,t) => {println(s"filter h:$h"); if(f(h)) Stream.cons(h,t) else t})
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
}
Stream(1,2,3,4,5,6).map(_+10).filter(_%2==0)
When I execute this code, I receive this output:
map h:1
filter h:11
map h:2
filter h:12
My questions are:
Why are the map and filter outputs interleaved?
Could you explain all steps involved from the Stream creation until the last step for obtaining this behavior?
Where are the other elements of the list that also pass the filter, namely 4 and 6?
The key to understanding this behavior, I think, is in the signature of the foldRight.
def foldRight[B](z: => B)(f: (A, => B) => B): B = ...
Note that the 2nd argument, f, is a function that takes two parameters, an A and a by-name (lazy) B. Take away that laziness, f: (A, B) => B, and you not only get the expected method grouping (all the map() steps before all the filter() steps), they also come in reverse order with 6 processed first and 1 processed last, as you'd expect from a foldRight.
How does one little => perform all that magic? It basically says that the 2nd argument to f() is going to be held in reserve until it is required.
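To make the effect of that one `=>` concrete, here is a minimal sketch that runs the same map-then-filter pipeline twice: once with the by-name foldRight from the question and once with a strict variant. foldRightStrict, lazyLog and strictLog are names invented for this illustration, and events are recorded in a buffer rather than printed so the order can be inspected.

```scala
import scala.collection.mutable.ListBuffer

object FoldOrderDemo {
  sealed trait Stream[+A] {
    // Lazy version from the question: f takes its second argument by name.
    def foldRight[B](z: => B)(f: (A, => B) => B): B = this match {
      case Cons(h, t) => f(h(), t().foldRight(z)(f))
      case _          => z
    }
    // Hypothetical strict variant: the rest of the fold is computed before f runs.
    def foldRightStrict[B](z: B)(f: (A, B) => B): B = this match {
      case Cons(h, t) => f(h(), t().foldRightStrict(z)(f))
      case _          => z
    }
  }
  case object Empty extends Stream[Nothing]
  case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]

  object Stream {
    def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
      lazy val head = hd
      lazy val tail = tl
      Cons(() => head, () => tail)
    }
    def empty[A]: Stream[A] = Empty
    def apply[A](as: A*): Stream[A] =
      if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
  }

  // map(_ + 10) followed by filter(_ % 2 == 0), written directly as folds.
  val lazyLog = ListBuffer.empty[String]
  val lazyResult = Stream(1, 2, 3)
    .foldRight(Stream.empty[Int]) { (h, t) =>
      lazyLog += s"map h:$h"
      Stream.cons(h + 10, t)
    }
    .foldRight(Stream.empty[Int]) { (h, t) =>
      lazyLog += s"filter h:$h"
      if (h % 2 == 0) Stream.cons(h, t) else t
    }

  val strictLog = ListBuffer.empty[String]
  val strictResult = Stream(1, 2, 3)
    .foldRightStrict(Stream.empty[Int]) { (h, t) =>
      strictLog += s"map h:$h"
      Stream.cons(h + 10, t)
    }
    .foldRightStrict(Stream.empty[Int]) { (h, t) =>
      strictLog += s"filter h:$h"
      if (h % 2 == 0) Stream.cons(h, t) else t
    }
}
```

With the by-name version the log stops as soon as the first even element is found; with the strict version every map step runs (last element first) before any filter step.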
So, attempting to answer your questions.
Why are the map and filter outputs interleaved?
Because each application of the functions passed to map() and filter() is delayed until the point when the corresponding value is requested.
Could you explain all steps involved from the Stream creation until the last step for obtaining this behavior?
Not really. That would take more time and SO answer space than I'm willing to contribute, but let's take just a few steps into the morass.
We start with a Stream, which looks like a series of Cons cells, each holding an Int and a reference to the next Cons, but that's not completely accurate. Each Cons really holds two functions: when invoked, the 1st produces an Int and the 2nd produces the next Cons.
Call map() and pass it the "+10" function. map() creates a new function: "Given h (a value) and t (a by-name argument, not yet evaluated), create a new Cons. The head function of the new Cons, when invoked, will apply the "+10" function to the current head value. The new tail function will produce the t value when invoked." This new function is passed to foldRight.
foldRight receives the new function, but the evaluation of the function's 2nd argument is delayed until it is needed. h() is called right away to retrieve the current head value; t() is called, and foldRight is called recursively on its result, only when that 2nd argument is actually used.
Call filter() and pass it the "isEven" function. filter() creates a new function: "Given h and t, create a new Cons if h passes the isEven test. If not then return t." Returning t forces it: that's the real t, not a promise to evaluate its value later.
Where are the other elements of the list that also pass the filter, namely 4 and 6?
They are still there waiting to be evaluated. We can force that evaluation by using pattern matching to extract the various Cons one by one.
val c0@Cons(_,_) = Stream(1,2,3,4,5,6).map(_+10).filter(_%2==0)
// **STDOUT**
//map h:1
//filter h:11
//map h:2
//filter h:12
c0.h() //res0: Int = 12
val c1@Cons(_,_) = c0.t()
// **STDOUT**
//map h:3
//filter h:13
//map h:4
//filter h:14
c1.h() //res1: Int = 14
val c2@Cons(_,_) = c1.t()
// **STDOUT**
//map h:5
//filter h:15
//map h:6
//filter h:16
c2.h() //res2: Int = 16
c2.t() //res3: Stream[Int] = Empty
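Another way to force every element at once is to convert the result to a List. The sketch below adds a toList method (it is not part of the question's code) and records the log in a buffer instead of printing; the names ForceAllDemo, log and result are made up for the illustration.

```scala
import scala.collection.mutable.ListBuffer

object ForceAllDemo {
  val log = ListBuffer.empty[String]

  sealed trait Stream[+A] {
    def foldRight[B](z: => B)(f: (A, => B) => B): B = this match {
      case Cons(h, t) => f(h(), t().foldRight(z)(f))
      case _          => z
    }
    def map[B](f: A => B): Stream[B] =
      foldRight(Stream.empty[B]) { (h, t) =>
        log += s"map h:$h"
        Stream.cons(f(h), t)
      }
    def filter(p: A => Boolean): Stream[A] =
      foldRight(Stream.empty[A]) { (h, t) =>
        log += s"filter h:$h"
        if (p(h)) Stream.cons(h, t) else t
      }
    // Added for this sketch: walking the whole stream forces every head and tail.
    def toList: List[A] = this match {
      case Cons(h, t) => h() :: t().toList
      case _          => Nil
    }
  }
  case object Empty extends Stream[Nothing]
  case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]

  object Stream {
    def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
      lazy val head = hd
      lazy val tail = tl
      Cons(() => head, () => tail)
    }
    def empty[A]: Stream[A] = Empty
    def apply[A](as: A*): Stream[A] =
      if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
  }

  val result: List[Int] = Stream(1, 2, 3, 4, 5, 6).map(_ + 10).filter(_ % 2 == 0).toList
}
```

Here result holds 12, 14 and 16, and the log shows map and filter alternating for every element from 1 to 6, matching the step-by-step extraction above.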
I am reading Functional Programming in Scala and am on chapter 5, which covers strictness and laziness. I've done all the problems in the book up to 5.13.
One thing I realized as I was working through the problems was that I didn't really understand all the nuances of foldRight until I started doing the problems in this chapter. My question is about the implementation of foldRight given in this chapter.
Let's begin by looking at the implementation for Streams given in this chapter.
import Stream._
trait Stream[+A] {
def toList: List[A] = this match {
case Empty => Nil
case Cons(h, t) => h() :: t().toList
}
def foldRight[B](z: => B)(f: (A, => B) => B): B = // The arrow `=>` in front of the argument type `B` means that the function `f` takes its second argument by name and may choose not to evaluate it.
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f)) // If `f` doesn't evaluate its second argument, the recursion never occurs.
case _ => z
}
def map[B](f: A => B): Stream[B] =
foldRight(empty[B])((h,t) => cons(f(h), t))
def filter(f: A => Boolean): Stream[A] =
foldRight(empty[A])((h,t) =>
if (f(h)) cons(h, t)
else t)
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty
else cons(as.head, apply(as.tail: _*))
}
In particular, let's zoom in on the definition of foldRight given here:
def foldRight[B](z: => B)(f: (A, => B) => B): B = // The arrow `=>` in front of the argument type `B` means that the function `f` takes its second argument by name and may choose not to evaluate it.
this match {
case Cons(h,t) => f(h(), t().foldRight(z)(f)) // If `f` doesn't evaluate its second argument, the recursion never occurs.
case _ => z
}
From the text in the book, we know that this implementation of foldRight will not evaluate the second argument of the combining function f unless f actually uses it. Therefore, foldRight can terminate early before evaluating all the elements of the Stream. Consider this implementation of takeWhile in terms of foldRight as an example:
def takeWhile(f: A => Boolean): Stream[A] =
foldRight(empty[A])((h,t) =>
if (f(h)) cons(h,t)
else empty)
This call to foldRight in the definition of takeWhile can terminate early before traversing all the elements of the Stream. So we know foldRight can terminate early. However, this is not the same thing as saying foldRight is lazy. My question is if this implementation of foldRight is itself lazy or strict. That is, does it apply f to every relevant element in the Stream?
For example, let us take this example:
val someStream = Stream.apply(0, 1, 2, 3, 5)
someStream.foldRight(Empty:Stream[Int]){ (e, acc) =>
Cons(()=>e, ()=>acc)
}
For a non-lazy collection such as a List, this call to foldRight would simply copy the list. For this call on someStream, however, will it create a copy of all the members of the Stream? Or will it only evaluate the first term?
Recall that the function in foldRight has the following signature: (A, => B) => B. My question really boils down to when acc, the call-by-name second argument of type B, is evaluated. Is it evaluated when f is first called? Or is acc evaluated when ()=>acc is evaluated? Depending on which one it is, we will either evaluate all the members of the Stream or only the first one when we perform the fold.
And just to reiterate, saying that foldRight terminates early and saying that foldRight is lazy are two different things. The discussion in the book makes it clear that the first statement can be true. This foldRight can terminate early without traversing all the elements of the Stream. But is the second statement true? Is the foldRight we have here lazy?
I bring this up because there is a program trace on page 72 that I don't think I fully understand. It is linked to whether foldRight is itself lazy or not.
Stream(1,2,3,4).map(_ + 10).filter(_ % 2 == 0).toList
cons(11, Stream(2,3,4).map(_ + 10)).filter(_ % 2 == 0).toList
Stream(2,3,4).map(_ + 10).filter(_ % 2 == 0).toList
cons(12, Stream(3,4).map(_ + 10)).filter(_ % 2 == 0).toList
12 :: Stream(3,4).map(_ + 10).filter(_ % 2 == 0).toList
12 :: cons(13, Stream(4).map(_ + 10)).filter(_ % 2 == 0).toList
12 :: Stream(4).map(_ + 10).filter(_ % 2 == 0).toList
12 :: cons(14, Stream().map(_ + 10)).filter(_ % 2 == 0).toList
12 :: 14 :: Stream().map(_ + 10).filter(_ % 2 == 0).toList
12 :: 14 :: List()
Notice how in this trace, the first term (the 1) is passed through the map and filter once before any of the subsequent terms are. Since the map and filter are defined in the Stream using foldRight, this has implications for my question.
So, let's summarize my question:
1) Is foldRight in the implementation given lazy or strict? This is a different question from whether it terminates early.
2) For the following example code,
val someStream = Stream.apply(0, 1, 2, 3, 5)
someStream.foldRight(Empty:Stream[Int]){ (e, acc) =>
Cons(()=>e, ()=>acc)
}
When is acc, the call-by-name parameter, evaluated? Is it when f is called and (e, acc) => Cons(()=>e, ()=>acc) is first created? Or is it when ()=>acc is evaluated?
3) Please explain why in the program trace given, the first term is passed through the map and filter first before moving onto the second term.
You are encouraged to use the text of Functional Programming in Scala to make your point.
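Question 2 can also be probed empirically. The following is only a sketch, using the chapter's definitions plus a hypothetical counter (calls) that records how many times the function passed to foldRight has run.

```scala
object AccEvaluationDemo {
  sealed trait Stream[+A] {
    def foldRight[B](z: => B)(f: (A, => B) => B): B = this match {
      case Cons(h, t) => f(h(), t().foldRight(z)(f))
      case _          => z
    }
  }
  case object Empty extends Stream[Nothing]
  case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]

  object Stream {
    def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
      lazy val head = hd
      lazy val tail = tl
      Cons(() => head, () => tail)
    }
    def empty[A]: Stream[A] = Empty
    def apply[A](as: A*): Stream[A] =
      if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
  }

  var calls = 0 // hypothetical counter: how often the folding function has been applied

  val someStream = Stream(0, 1, 2, 3, 5)
  val copied: Stream[Int] = someStream.foldRight(Empty: Stream[Int]) { (e, acc) =>
    calls += 1
    Cons(() => e, () => acc) // acc is only mentioned inside a thunk, so it is not evaluated here
  }
  val callsAfterFold = calls

  // Invoking the tail thunk evaluates acc, which runs exactly one more step of the fold.
  val second: Stream[Int] = copied match {
    case Cons(_, t) => t()
    case Empty      => Empty
  }
  val callsAfterTail = calls
}
```

Under these definitions the fold applies the function once and stops; acc is evaluated when () => acc is invoked, not when the function is first called. Because this example builds a raw Cons rather than going through cons, the tail is not memoized, so invoking the thunk again would repeat that step.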
I'm following a book's example to implement a Stream class using lazy evaluation in Scala.
sealed trait Stream[+A]
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] = {
if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
}
}
Then I used a simple function to test if it's working
def printAndReturn: Int = {
println("called")
1
}
Then I construct Stream like the following:
println(s"apply: ${
Stream(
printAndReturn,
printAndReturn,
printAndReturn,
printAndReturn
)
}")
The output is like this:
called
called
called
called
apply: Cons(fpinscala.datastructures.Stream$$$Lambda$7/1170794006@e580929,fpinscala.datastructures.Stream$$$Lambda$8/1289479439@4c203ea1)
Then I constructed Stream using cons:
println(s"cons: ${
cons(
printAndReturn,
cons(
printAndReturn,
cons(printAndReturn, Empty)
)
)
}")
The output is:
cons: Cons(fpinscala.datastructures.Stream$$$Lambda$7/1170794006@2133c8f8,fpinscala.datastructures.Stream$$$Lambda$8/1289479439@43a25848)
So here are two questions:
When constructing Stream using the apply function, all printAndReturn are evaluated. Is this because the recursive call to apply(as.head, ...) evaluates every head?
If the answer to the first question is true, then how do I change apply to make it not force evaluation?
No. If you put a breakpoint on the println you'll find that the method is actually being called when you first create the Stream. The line Stream(printAndReturn, ... actually calls your method however many times you put it there. Why? Consider the type signatures for cons and apply:
def cons[A](hd: => A, tl: => Stream[A]): Stream[A]
vs:
def apply[A](as: A*): Stream[A]
Note that the definition of cons has its parameters marked as => A. This is a by-name parameter. Declaring a parameter like this makes it lazy, delaying its evaluation until it is actually used. Hence your println is not called when you build the Stream with cons; it would run only once a head is actually forced. Compare this to apply. It does not use by-name parameters, so anything passed in to that method is evaluated before the call.
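The difference can be seen in isolation with a small sketch. strictParam, byNameParam and the evaluations counter are hypothetical names, not part of the book's code.

```scala
object ByNameDemo {
  var evaluations = 0
  def printAndCount: Int = { evaluations += 1; 1 }

  // By-value parameter: the argument is evaluated before the method body runs.
  def strictParam(x: Int): String = "ignored"

  // By-name parameter: the argument is evaluated only if and when the body uses it.
  def byNameParam(x: => Int): String = "ignored"

  strictParam(printAndCount)
  val afterStrict = evaluations // the argument was evaluated once

  byNameParam(printAndCount)
  val afterByName = evaluations // unchanged: the body never touched x
}
```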
Unfortunately there isn't a super easy way as of now. What you really want is something like def apply[A](as: (=>A)*): Stream[A] but unfortunately Scala does not support vararg by name parameters. See this answer for a few ideas on how to get around this. One way is to just wrap your function calls when creating the Stream:
Stream(
() => printAndReturn,
() => printAndReturn,
() => printAndReturn,
() => printAndReturn)
Which will then delay the evaluation.
When you called
Stream(
printAndReturn,
printAndReturn,
printAndReturn,
printAndReturn
)
the apply in the companion object was invoked. Looking at the parameter type of apply, you will notice that it is strict, so the arguments are evaluated before being assigned to as. as is then a Seq[Int] of already-computed values.
For 2, you can define apply as
def apply[A](as: (() => A)*): Stream[A] =
if (as.isEmpty) empty else cons(as.head(), apply(as.tail: _*))
and as was suggested above, you need to pass the arguments as thunks themselves as in
println(s"apply: ${Stream(
() => printAndReturn,
() => printAndReturn,
() => printAndReturn,
() => printAndReturn
)}")
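As a quick check of the thunk-based apply, the sketch below counts calls instead of printing. The counter and the val names are made up for the example; the Stream definitions are the ones from the question with apply swapped for the thunk version.

```scala
object ThunkApplyDemo {
  sealed trait Stream[+A]
  case object Empty extends Stream[Nothing]
  case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]

  object Stream {
    def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
      lazy val head = hd
      lazy val tail = tl
      Cons(() => head, () => tail)
    }
    def empty[A]: Stream[A] = Empty
    // The arguments are thunks, so nothing is run when the Stream is built.
    def apply[A](as: (() => A)*): Stream[A] =
      if (as.isEmpty) empty else cons(as.head(), apply(as.tail: _*))
  }

  var calls = 0
  def printAndReturn: Int = { calls += 1; 1 }

  val s: Stream[Int] = Stream(() => printAndReturn, () => printAndReturn, () => printAndReturn)
  val callsAfterConstruction = calls

  // Forcing the head runs the first thunk exactly once.
  val firstHead: Int = s match {
    case Cons(h, _) => h()
    case Empty      => -1
  }
  val callsAfterHead = calls

  // Forcing it again hits the memoized lazy val inside cons.
  val firstHeadAgain: Int = s match {
    case Cons(h, _) => h()
    case Empty      => -1
  }
  val callsAfterSecondHead = calls
}
```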
Recursion is almost clear to me, for example:
import scala.annotation.tailrec
def product2(ints: List[Int]): Int = {
@tailrec
def productAccumulator(ints: List[Int], accum: Int): Int = {
ints match {
case Nil => accum
case x :: tail => productAccumulator(tail, accum * x)
}
}
productAccumulator(ints, 1)
}
I am not sure about corecursion. According to the Wikipedia article, "corecursion allows programs to produce arbitrarily complex and potentially infinite data structures, such as streams". For example, a construction like this
list.filter(...).map(...)
makes it possible to prepare temporary streams after the filter and map operations.
After filter, the stream will contain only the filtered elements, and then map will transform those elements. Correct?
Do functional combinators such as map and filter use recursion internally?
Does anybody have a good example in Scala comparing recursion and corecursion?
The simplest way to understand the difference is to think that recursion consumes data while corecursion produces data. Your example is recursion since it consumes the list you provide as parameter. Also, foldLeft and foldRight are recursion too, not corecursion. Now an example of corecursion. Consider the following function:
def unfold[S, A](z: S)(f: S => Option[(A, S)]): Stream[A]
Just by looking at its signature you can see this function is intended to produce a (potentially infinite) stream of data. It takes an initial state, z of type S, and a function from S to an optional tuple containing the next value of the stream, of type A, and the next state. If the result of f is empty (None), unfold stops producing elements; otherwise it emits the value and continues with the next state. Here is its implementation:
def unfold[S, A](z: S)(f: S => Option[(A, S)]): Stream[A] = f(z) match {
case Some((a, s)) => a #:: unfold(s)(f)
case None => Stream.empty[A]
}
You can use this function to implement other productive functions. E.g. the following function will produce a stream of, at most, numOfValues elements of type A:
def elements[A](element: A, numOfValues: Int): Stream[A] = unfold(numOfValues) { x =>
if (x > 0) Some((element, x - 1)) else None
}
Usage example in REPL:
scala> elements("hello", 3)
res10: Stream[String] = Stream(hello, ?)
scala> res10.toList
res11: List[String] = List(hello, hello, hello)
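To put the two side by side, here is a small sketch with a recursive consumer (sum) next to a corecursive producer (fibs, built with unfold). It assumes Scala 2.13 or later and uses LazyList in place of the Stream shown above; sum, fibs and the other val names are illustrative only.

```scala
object CorecursionDemo {
  // Recursion: consumes an existing, finite structure and works toward a base case.
  def sum(xs: List[Int]): Int = xs match {
    case Nil       => 0
    case h :: tail => h + sum(tail)
  }

  // Corecursion: produces a structure from a seed, one element per step.
  // LazyList stands in for Stream here; that swap is an assumption of this sketch.
  def unfold[S, A](z: S)(f: S => Option[(A, S)]): LazyList[A] = f(z) match {
    case Some((a, s)) => a #:: unfold(s)(f)
    case None         => LazyList.empty[A]
  }

  // An infinite producer: f never returns None, so only the requested prefix is built.
  val fibs: LazyList[Int] = unfold((0, 1)) { case (a, b) => Some((a, (b, a + b))) }

  val firstFibs: List[Int] = fibs.take(8).toList
  val total: Int = sum(firstFibs)
}
```

The producer has no base case and never terminates on its own; it is the consumer (take, then sum) that decides how much of it is ever computed.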
I have a function in a context, (in a Maybe / Option) and I want to pass it a value and get back the return value, directly out of the context.
Let's take an example in Scala :
scala> Some((x:Int) => x * x)
res0: Some[Int => Int] = Some(<function1>)
Of course, I can do
res0.map(_(5))
to execute the function, but the result is wrapped in the context.
Ok, I could do :
res0.map(_(5)).getOrElse(...)
but I'm copy/pasting this everywhere in my code (I have a lot of functions wrapped in Option, or worse, in Either...).
I need a better form, something like :
res0.applyOrElse(5, ...)
Does this concept of 'applying a function in a context to a value and immediately returning the result out of the context' exist in FP under a specific name (I'm lost in all those Functors, Monads and Applicatives...)?
You can use andThen to move the default from the place where you call the function to the place where you define it:
val foo: String => Option[Int] = s => Some(s.size)
val bar: String => Int = foo.andThen(_.getOrElse(100))
This only works for Function1, but if you want a more generic version, Scalaz provides functor instances for FunctionN:
import scalaz._, Scalaz._
val foo: (String, Int) => Option[Int] = (s, i) => Some(s.size + i)
val bar: (String, Int) => Int = foo.map(_.getOrElse(100))
This also works for Function1—just replace andThen above with map.
More generally, as I mention above, this looks a little like unliftId on Kleisli, which takes a wrapped function A => F[B] and collapses the F using a comonad instance for F. If you wanted something that worked generically for Option, Either[E, ?], etc., you could write something similar that would take an Optional instance for F and a default value.
You could write something like applyOrElse using Option.fold.
fold[B](ifEmpty: ⇒ B)(f: (A) ⇒ B): B
val squared = Some((x:Int) => x * x)
squared.fold {
// or else = ifEmpty
math.pow(5, 2).toInt
}{
// execute function
_(5)
}
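If the fold call itself is what gets copy/pasted, it can be wrapped once in an extension method. This is only a sketch: OptionFunctionOps and applyOrElse are hypothetical names, not something Option provides out of the box.

```scala
object ApplyOrElseDemo {
  // Hypothetical helper: adds applyOrElse to any Option that holds a one-argument function.
  implicit class OptionFunctionOps[A, B](optF: Option[A => B]) {
    def applyOrElse(a: A, default: => B): B = optF.fold(default)(f => f(a))
  }

  val squared: Option[Int => Int] = Some((x: Int) => x * x)
  val missing: Option[Int => Int] = None

  val applied: Int  = squared.applyOrElse(5, 0)  // the function is present, so it is applied
  val fallback: Int = missing.applyOrElse(5, -1) // no function, so the default is returned
}
```

The default is taken by name, so it is only computed when the Option is empty.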
Using Travis Brown's recent answer on another question, I was able to puzzle together the following applyOrElse function. It depends on Shapeless and you need to pass the arguments as an HList, so it might not be exactly what you want.
import shapeless._
import shapeless.ops.function.FnToProduct
def applyOrElse[F, I <: HList, O](
optionFun: Option[F],
input: I,
orElse: => O
)(implicit
ftp: FnToProduct.Aux[F, I => O]
): O = optionFun.fold(orElse)(f => ftp(f)(input))
Which can be used as :
val squared = Some((x:Int) => x * x)
applyOrElse(squared, 2 :: HNil, 10)
// res0: Int = 4
applyOrElse(None, 2 :: HNil, 10)
// res1: Int = 10
val concat = Some((a: String, b: String) => s"$a $b")
applyOrElse(concat, "hello" :: "world" :: HNil, "not" + "executed")
// res2: String = hello world
getOrElse is the most logical way to do it. As for copy/pasting it all over the place, you might not be dividing up your logic in the best way. Generally, you want to defer resolving your Options (or Futures, etc.) until the point where you need the value unwrapped. In this case, it seems more sensible for your function to take an Int and return an Int, and for you to map your Option where you need the result of that function.
This is a followup to this question.
Here's the code I'm trying to understand (it's from http://apocalisp.wordpress.com/2010/10/17/scalaz-tutorial-enumeration-based-io-with-iteratees/):
object io {
sealed trait IO[A] {
def unsafePerformIO: A
}
object IO {
def apply[A](a: => A): IO[A] = new IO[A] {
def unsafePerformIO = a
}
}
implicit val IOMonad = new Monad[IO] {
def pure[A](a: => A): IO[A] = IO(a)
def bind[A,B](a: IO[A], f: A => IO[B]): IO[B] = IO {
implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
}
}
This code is used like this (I'm assuming an import io._ is implied)
def bufferFile(f: File) = IO { new BufferedReader(new FileReader(f)) }
def closeReader(r: Reader) = IO { r.close }
def bracket[A,B,C](init: IO[A], fin: A => IO[B], body: A => IO[C]): IO[C] = for { a <- init
c <- body(a)
_ <- fin(a) } yield c
def enumFile[A](f: File, i: IterV[String, A]): IO[IterV[String, A]] = bracket(bufferFile(f),
closeReader(_:BufferedReader),
enumReader(_:BufferedReader, i))
I'm now trying to understand the implicit val IOMonad definition. Here's how I understand it. This is a scalaz.Monad, so it needs to define the pure and bind abstract methods of the scalaz.Monad trait.
pure takes a value and turns it into a value contained in the "container" type. For example it could take an Int and return a List[Int]. This seems pretty simple.
bind takes a "container" type and a function that maps the type that the container holds to another type. The value that is returned is the same container type, but it's now holding a new type. An example would be taking a List[Int] and mapping it to a List[String] using a function that maps Ints to Strings. Is bind pretty much the same as map?
The implementation of bind is where I'm stuck. Here's the code:
def bind[A,B](a: IO[A], f: A => IO[B]): IO[B] = IO {
implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
This definition takes IO[A] and maps it to IO[B] using a function that takes an A and returns an IO[B]. I guess to do this, it has to use flatMap to "flatten" the result (correct?).
The = IO { ... } is the same as
= new IO[B] {
def unsafePerformIO = implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
I think?
the implicitly method looks for an implicit value (value, right?) that implements Monad[Function0]. Where does this implicit definition come from? I'm guessing this is from the implicit val IOMonad = new Monad[IO] {...} definition, but we're inside that definition right now and things get a little circular and my brain starts to get stuck in an infinite loop :)
Also, the first argument to bind (() => a.unsafePerformIO) seems to be a function that takes no parameters and returns a.unsafePerformIO. How should I read this? bind takes a container type as its first argument, so maybe () => a.unsafePerformIO resolves to a container type?
IO[A] is intended to represent an action returning an A, where the result of the action may depend on the environment (meaning anything: values of variables, the file system, the system time...) and executing the action may also modify the environment. Actually, the Scala type for an action would be Function0. A Function0[A] returns an A when called, and it is certainly allowed to depend on and modify the environment. IO is Function0 under another name, but it is intended to distinguish (tag?) those Function0s which depend on the environment from the other ones, which are actually pure values (if f is a Function0[A] which always returns the same value, without any side effect, there is not much difference between f and its result). To be precise, it is not so much that functions tagged as IO must have side effects; it is that those not so tagged must have none. Note, however, that wrapping impure functions in IO is entirely voluntary: when you get a Function0, you have no guarantee that it is pure. Using IO is certainly not the dominant style in Scala.
pure takes a value and turns it into a value contained in the
"container" type.
Quite right, but "container" may mean quite a lot of things. The one returned by pure must be as light as possible; it must be the one that makes no difference. The point of a list is that it may hold any number of values; the one returned by pure must hold exactly one. The point of IO is that it depends on and affects the environment; the one returned by pure must do no such thing. So it is actually the pure Function0 () => a, wrapped in IO.
bind pretty much the same as map
Not so, bind is the same as flatMap. As you write, map would receive a function from Int to String, but here you have a function from Int to List[String].
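A tiny List example shows the difference in result types. describe is a made-up function from Int to List[String], used only for illustration.

```scala
object MapVsBindDemo {
  val numbers: List[Int] = List(1, 2)

  // A function from Int to List[String], the shape that bind/flatMap expects.
  val describe: Int => List[String] = i => List(s"$i", s"$i!")

  // map keeps the nesting: each element becomes its own inner list.
  val mapped: List[List[String]] = numbers.map(describe)

  // flatMap (bind) applies the same function and flattens the result by one level.
  val bound: List[String] = numbers.flatMap(describe)
}
```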
Now, forget IO for a moment and consider what bind/flatMap would mean for an Action, that is for Function0.
Let's have
val askUserForLineNumber: () => Int = {...}
val readingLineAt: Int => Function0[String] = {i: Int => () => ...}
Now if we must combine, as bind/flatMap does, those items to get an action that returns a String, what it must be is pretty clear: ask the user for the line number, read that line and return it. That would be
val askForLineNumberAndReadIt= () => {
val lineNumber : Int = askUserForLineNumber()
val readingRequiredLine: Function0[String] = readingLineAt(lineNumber)
val lineContent= readingRequiredLine()
lineContent
}
More generically
def bind[A,B](a: Function0[A], f: A => Function0[B]) = () => {
val value = a()
val nextAction = f(value)
val result = nextAction()
result
}
And shorter:
def bind[A,B](a: Function0[A], f: A => Function0[B])
= () => {f(a())()}
So we know what bind must be for Function0, pure is clear too. We can do
object ActionMonad extends Monad[Function0] {
def pure[A](a: => A) = () => a
def bind[A,B](a: () => A, f: A => Function0[B]) = () => f(a())()
}
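The bind for Function0 can be tried out without scalaz. In this sketch the two actions are stand-ins that append to a log instead of talking to a user or a file; lines, log and the fixed line number 2 are invented for the example.

```scala
import scala.collection.mutable.ListBuffer

object ActionBindDemo {
  val log = ListBuffer.empty[String]

  // bind for plain Function0, as derived above.
  def bind[A, B](a: () => A, f: A => (() => B)): () => B = () => f(a())()

  // Hypothetical stand-ins for the two actions.
  val lines = Vector("first", "second", "third")
  val askUserForLineNumber: () => Int = () => { log += "asked"; 2 }
  val readingLineAt: Int => (() => String) = i => () => { log += s"read $i"; lines(i) }

  // Combining the actions runs nothing yet; it only builds a bigger action.
  val askForLineNumberAndReadIt: () => String = bind(askUserForLineNumber, readingLineAt)
  val logBeforeRun = log.toList

  // Calling the combined action performs both steps, in order.
  val result: String = askForLineNumberAndReadIt()
  val logAfterRun = log.toList
}
```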
Now, IO is Function0 in disguise. Instead of just doing a(), we must do a.unsafePerformIO. And to define one, instead of () => body, we write IO {body}
So there could be
object IOMonad extends Monad[IO] {
def pure[A](a: => A) = IO {a}
def bind[A,B](a: IO[A], f: A => IO[B]) = IO {f(a.unsafePerformIO).unsafePerformIO}
}
In my view, that would be good enough. But in fact it repeats the ActionMonad. The point in the code you refer to is to avoid that and reuse what is done for Function0 instead. One goes easily from IO to Function0 (with () => io.unsafePerformIO) as well as from Function0 to IO (with IO { action() }). If you have f: A => IO[B], you can also change that to an A => Function0[B], just by composing with the IO to Function0 transform: (x: A) => () => f(x).unsafePerformIO.
What happens here in the bind of IO is:
() => a.unsafePerformIO: turn a into a Function0
(x:A) => () => f(x).unsafePerformIO: turn f into an A => Function0[B]
implicitly[Monad[Function0]]: get the default monad for Function0, the very same as the ActionMonad above
bind(...): apply the bind of the Function0 monad to the arguments a and f that have just been converted to Function0
The enclosing IO{...}: convert the result back to IO.
(Not sure I like it much)
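For reference, the same round trip can be sketched without scalaz. This is not the tutorial's code: bindF0 and bindIO are illustrative names, and the Function0 bind is passed around as a plain method instead of being looked up with implicitly.

```scala
object IoSketch {
  sealed trait IO[A] { def unsafePerformIO: A }
  object IO {
    def apply[A](a: => A): IO[A] = new IO[A] { def unsafePerformIO: A = a }
  }

  // bind for plain Function0.
  def bindF0[A, B](a: () => A, f: A => (() => B)): () => B = () => f(a())()

  // bind for IO: convert a and f to their Function0 forms, bind those,
  // run the resulting Function0 inside IO { ... } to get back to IO.
  def bindIO[A, B](a: IO[A], f: A => IO[B]): IO[B] = IO {
    bindF0(() => a.unsafePerformIO, (x: A) => () => f(x).unsafePerformIO)()
  }

  // A hypothetical effect log, so the moment of execution is observable.
  var effects = Vector.empty[String]

  val program: IO[Int] =
    bindIO(IO { effects :+= "read"; 20 }, (n: Int) => IO { effects :+= "double"; n * 2 })
  val effectsBeforeRun = effects

  // Nothing happens until the composed action is explicitly run.
  val result: Int = program.unsafePerformIO
  val effectsAfterRun = effects
}
```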