How does a Stream process its elements incrementally? - scala

I am trying to understand how Stream works and have the following Stream implementation:
sealed trait Stream[+A] {
def toList: List[A] = {
@annotation.tailrec
def go(s: Stream[A], acc: List[A]): List[A] = s match {
case Cons(h, t) => go(t(), h() :: acc)
case _ => acc
}
go(this, List()).reverse
}
def foldRight[B](z: => B)(f: (A, => B) => B): B =
this match {
case Cons(h, t) => f(h(), t().foldRight(z)(f))
case _ => z
}
def map[B](f: A => B): Stream[B] =
this.foldRight(Stream.empty[B])((x, y) => Stream.cons(f(x), y))
def filter(f: A => Boolean): Stream[A] =
this.foldRight(Stream.empty[A])((h, t) => if (f(h)) Stream.cons(h, t) else t)
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, t1: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = t1
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
}
and the code that is using Stream:
Stream(1,2,3,4).map((x) => {
println(x)
x + 10
}).filter((x) => {
println(x)
x % 2 == 0
}).toList
as output I've got:
1
11
2
12
3
13
4
14
res4: List[Int] = List(12, 14)
As the output shows, there is no intermediate result; the source elements are passed through one at a time. How is that possible?
I cannot imagine how it works.

Let's take a look at what the methods you used do on Stream:
map and filter are both implemented with foldRight. To make it clearer, let's inline foldRight inside map (the same can be done with filter), using the referential transparency principle:
def map[B](f: A => B) = this match {
case Cons(h, t) => Stream.cons(f(h()), t().map(f))
case _ => Empty
}
Now, where in this code is f evaluated? Never, since Stream.cons parameters are call-by-name, so we only give the description for the new stream, not its values.
Once you are convinced of this fact, you can easily see that the same will apply for filter, so we can move forward to toList.
It will evaluate each element in the Stream, putting the values in a List that will be reversed at the end.
But evaluating an element of the Stream which has been filtered and mapped is precisely reading the description of the values, so the actual functions are evaluated here. Hence the console output in order: first the map function is called then the filter function, for each element, one at a time (since we are now on the lazily mapped and filtered Stream).
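The laziness described above can be checked with a small trace. This is only a sketch: it repeats the Stream from the question inside a hypothetical wrapper object LazyTrace, uses a simpler non-tail-recursive toList, and records calls in a buffer named log (an illustrative name) instead of printing.

```scala
import scala.collection.mutable.ListBuffer

object LazyTrace {
  sealed trait Stream[+A] {
    def foldRight[B](z: => B)(f: (A, => B) => B): B = this match {
      case Cons(h, t) => f(h(), t().foldRight(z)(f))
      case _ => z
    }
    def map[B](f: A => B): Stream[B] =
      foldRight(Stream.empty[B])((x, y) => Stream.cons(f(x), y))
    def filter(p: A => Boolean): Stream[A] =
      foldRight(Stream.empty[A])((h, t) => if (p(h)) Stream.cons(h, t) else t)
    // Simple (not stack-safe) toList, enough for a short demo.
    def toList: List[A] = this match {
      case Cons(h, t) => h() :: t().toList
      case _ => Nil
    }
  }
  case object Empty extends Stream[Nothing]
  case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
  object Stream {
    def cons[A](hd: => A, tl: => Stream[A]): Stream[A] = {
      lazy val head = hd
      lazy val tail = tl
      Cons(() => head, () => tail)
    }
    def empty[A]: Stream[A] = Empty
    def apply[A](as: A*): Stream[A] =
      if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
  }

  // Records every call of the mapping and filtering functions.
  val log = ListBuffer.empty[String]

  // Building the mapped stream does not call the mapping function yet.
  val mapped = Stream(1, 2, 3).map { x => log += s"map $x"; x + 10 }
  val logAfterMap = log.toList

  // Forcing the stream interleaves map and filter, one element at a time.
  val result = mapped.filter { x => log += s"filter $x"; x % 2 == 0 }.toList
  val logAfterToList = log.toList
}
```

Here logAfterMap stays empty, while logAfterToList alternates between "map" and "filter" entries for each element, matching the console output in the question.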

Related

Is it possible to pattern match on a by-name parameter without evaluating it?

I was playing with a lazy Stream structure, as below:
import Stream._
sealed trait Stream[+A] {
..
def toList: List[A] = this match {
case Empty => Nil
case Cons(h, t) => println(s"${h()}::t().toList"); h()::t().toList
}
def foldRight[B](z: B) (f: ( A, => B) => B) : B = this match {
case Empty => println(s"foldRight of Empty return $z"); z
case Cons(h, t) => println(s"f(${h()}, t().foldRight(z)(f))"); f(h(), t().foldRight(z)(f))
}
..
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](h: => A, t: => Stream[A]): Stream[A] = {
lazy val hd = h
lazy val tl = t
Cons[A](() => hd, () => tl)
}
def empty[A]: Stream[A] = Empty
def apply[A](la: A*): Stream[A] = la match {
case list if list.isEmpty => empty[A]
case _ => cons(la.head, apply(la.tail:_*))
}
}
For a takeWhile function implemented via foldRight, I initially wrote:
def takeWhileFoldRight_0(p: A => Boolean) : Stream[A] = {
foldRight(empty[A]) {
case (a, b) if p(a) => println(s"takeWhileFoldRight cons($a, b) with p(a) returns: cons($a, b)"); cons(a, b)
case (a, b) if !p(a) => println(s"takeWhileFoldRight cons($a, b) with !p(a) returns: empty[A]"); empty[A]
}
}
Which when called as:
Stream(4,5,6).takeWhileFoldRight_0(_%2 == 0).toList
results in the following trace:
f(4, t().foldRight(z)(f))
f(5, t().foldRight(z)(f))
f(6, t().foldRight(z)(f))
foldRight of Empty return Empty
takeWhileFoldRight cons(6, b) with p(a) returns: cons(6, b)
takeWhileFoldRight cons(5, b) with !p(a) returns: empty[A]
takeWhileFoldRight cons(4, b) with p(a) returns: cons(4, b)
4::t().toList
res2: List[Int] = List(4)
After a lot of questioning, I figured it might be the unapply method in the pattern match that evaluates eagerly.
So I changed it to
def takeWhileFoldRight(p: A => Boolean) : Stream[A] = {
foldRight(empty[A]) { (a, b) =>
if (p(a)) cons(a, b) else empty[A]
}
}
which when called as
Stream(4,5,6).takeWhileFoldRight(_%2 == 0).toList
results in the following trace:
f(4, t().foldRight(z)(f))
4::t().toList
f(5, t().foldRight(z)(f))
res1: List[Int] = List(4)
Hence my question:
Is there a way to recover the power of pattern matching when working with by-name parameters?
Said differently, can I match on parameters that are by-name without evaluating them eagerly?
Or do I have to fall back to a set of ugly nested "if"s :p in that kind of scenario?
Take a closer look at this fragment:
def toList: List[A] = this match {
case Empty => Nil
case Cons(h, t) => println(s"${h()}::t().toList"); h()::t().toList
}
def foldRight[B](z: B) (f: ( A, => B) => B) : B = this match {
case Empty => println(s"foldRight of Empty return $z"); z
case Cons(h, t) => println(s"f(${h()}, t().foldRight(z)(f))"); f(h(), t().foldRight(z)(f))
}
..
}
Here h and t in Cons aren't evaluated by unapply - after all unapply returns () => X functions without calling them. But you do. Twice for each match - once for printing and once for passing the result on. And you aren't remembering the result, so any future fold, map, etc would evaluate the function anew.
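A minimal sketch of that point, using a stand-in Cons (its tail is a plain List thunk for brevity) and a hypothetical counter named calls: matching on the case class only binds the functions, and each explicit h() call runs the thunk again.

```scala
object ThunkCount {
  // Stand-in for the question's Cons: it just stores two thunks.
  final case class Cons[A](h: () => A, t: () => List[A])

  // Counts how many times the head thunk actually runs.
  var calls = 0
  val cell = Cons[Int](() => { calls += 1; 42 }, () => Nil)

  // Pattern matching binds h and t but does not invoke them.
  val callsAfterMatch = cell match { case Cons(h, t) => calls }

  // Calling h() twice evaluates the thunk twice: nothing is memoized here.
  val sum = cell match { case Cons(h, _) => h() + h() }
  val callsAfterSum = calls
}
```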
Depending on what behavior you want to have you should either:
Calculate the results once, right after matching them:
case Cons(h, t) =>
val hResult = h()
val tResult = t()
println(s"${hResult}::tail.toList")
hResult :: tResult.toList
or
not use case class because it cannot memoize the result and you might need to memoize it:
class Cons[A](fHead: () => A, fTail: () => Stream[A]) extends Stream[A] {
lazy val head: A = fHead()
lazy val tail: Stream[A] = fTail()
// also override: toString, equals, hashCode, ...
}
object Cons {
def apply[A](head: => A, tail: => Stream[A]): Stream[A] =
new Cons(() => head, () => tail)
def unapply[A](stream: Stream[A]): Option[(A, Stream[A])] = stream match {
case cons: Cons[A] => Some((cons.head, cons.tail)) // matches on type, doesn't use unapply
case _ => None
}
}
If you understand what you're doing you could also create a case class with overridden apply and unapply (like above) but that is almost always a signal that you shouldn't use a case class in the first place (because most likely toString, equals, hashCode, etc would have nonsensical implementation).

Scala cats trampoline

The test("ok") is copied from the book "Scala with Cats" by Noel Welsh and Dave Gurnell, p. 254 ("D.4 Safer Folding using Eval"). The code runs fine; it's the trampolined foldRight.
import cats.Eval
test("ok") {
val list = (1 to 100000).toList
def foldRightEval[A, B](as: List[A], acc: Eval[B])(fn: (A, Eval[B]) => Eval[B]): Eval[B] =
as match {
case head :: tail =>
Eval.defer(fn(head, foldRightEval(tail, acc)(fn)))
case Nil =>
acc
}
def foldRight[A, B](as: List[A], acc: B)(fn: (A, B) => B): B =
foldRightEval(as, Eval.now(acc)) { (a, b) =>
b.map(fn(a, _))
}.value
val res = foldRight(list, 0L)(_ + _)
assert(res == 5000050000l)
}
The test("ko") returns the same value as test("ok") for a small list, but for a long list the value is different. Why?
test("ko") {
val list = (1 to 100000).toList
def foldRightSafer[A, B](as: List[A], acc: B)(fn: (A, B) => B): Eval[B] = as match {
case head :: tail =>
Eval.defer(foldRightSafer(tail, acc)(fn)).map(fn(head, _))
case Nil => Eval.now(acc)
}
val res = foldRightSafer(list, 0)((a, b) => a + b).value
assert(res == 5000050000l)
}
This is @OlegPyzhcov's comment, converted into a community wiki answer
You forgot the L in 0L passed as second argument to foldRightSafer.
Because of that, the inferred generic types of the invocation are
foldRightSafer[Int, Int]((list : List[Int]), (0: Int))((_: Int) + (_: Int))
and so your addition overflows Int (Int.MaxValue = 2147483647) and wraps around: the result is 5000050000 modulo 2^32, i.e. 705082704, instead of 5000050000.
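The effect can be reproduced without Eval. The sketch below uses a plain foldLeft over the same range (the object and value names are illustrative); only the type of the zero element differs between the two sums.

```scala
object OverflowDemo {
  // Zero is an Int, so the whole sum is computed in 32-bit arithmetic and wraps around.
  val asInt: Int = (1 to 100000).foldLeft(0)(_ + _)

  // Zero is a Long, so the accumulator is 64-bit and the sum fits.
  val asLong: Long = (1 to 100000).foldLeft(0L)(_ + _)
}
```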

Have to specify the parametric type

I have a Stream trait, that contains following method:
sealed trait Stream[+A] {
def takeWhile2(f: A => Boolean): Stream[A] =
this.foldRight(Stream.empty[A])((x, y) => {
if (f(x)) Stream.cons(x, y) else Stream.empty
})
@annotation.tailrec
final def exists(p: A => Boolean): Boolean = this match {
case Cons(h, t) => p(h()) || t().exists(p)
case _ => false
}
}
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, t1: => Stream[A]): Stream[A] = {
lazy val head = hd
lazy val tail = t1
Cons(() => head, () => tail)
}
def empty[A]: Stream[A] = Empty
def apply[A](as: A*): Stream[A] =
if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
}
Take a look at the body of takeWhile2: it calls the foldRight function.
If I pass Stream.empty instead of Stream.empty[A], I get a compiler error. Why?
That's because foldRight infers its type parameter from its first parameter list (i.e. its zero element).
Since this first element is Stream.empty, the type inferred is Stream[Nothing], and so it expects the second parameter to be a (A, => Stream[Nothing]) => Stream[Nothing], which is clearly not the case.
The same issue is true with any fold operator on collections, Option, ...
Put differently: when f(x) is true the lambda returns Stream.cons(x, y), which is a Stream[A]. But if you pass a bare Stream.empty as the zero element, no type argument is given, so Nothing is inferred and B becomes Stream[Nothing]. The lambda is then expected to return a Stream[Nothing], and the Stream[A] it actually returns doesn't match.
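The same behaviour shows up with the standard library's folds. As a small sketch (the names are illustrative), these are two ways to give the compiler the type it cannot infer from an untyped empty value:

```scala
object FoldZeroTypes {
  val xs = List(1, 2, 3)

  // Option 1: give the zero element its full type.
  val viaTypedZero: List[Int] =
    xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)

  // Option 2: fix the fold's type parameter explicitly, so a bare Nil is fine.
  val viaTypeParam: List[Int] =
    xs.foldLeft[List[Int]](Nil)((acc, x) => x :: acc)

  // By contrast, the Scala 2 compiler rejects
  //   xs.foldLeft(Nil)((acc, x) => x :: acc)
  // because B is fixed from the bare Nil before the function is type-checked.
}
```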

Tail recursive functions for BinaryTree

I am stuck with implementing tail recursive foreach, reduce, map and toList functions for a very simple implementation of binary tree.
sealed trait Tree[+A]
case object EmptyTree extends Tree[Nothing]
case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
object Tree {
def apply[A]: Tree[A] = EmptyTree
def apply[A](value: A): Tree[A] = Node(value, EmptyTree, EmptyTree)
def apply[A](value: A, left: Tree[A], right: Tree[A]): Tree[A] = Node(value, left, right)
def foreach[A](tree: Tree[A], f: (A) => Unit): Unit = {
//@tailrec
def iter[A](tree: Tree[A], f: (A) => Unit): Unit = tree match {
case EmptyTree =>
case Node(v, l, r) =>
iter(l, f)
f(v)
iter(r, f)
}
iter(tree, f)
}
def reduce[A](tree: Tree[A], value: A, f: (A, A) => A): A = {
//@tailrec
def loop(tree: Tree[A], value: A): A = tree match {
case Node(v, l, r) => loop(l, f(loop(r, value), v))
case EmptyTree => value
}
loop(tree, value)
}
def map[A, B](tree: Tree[A], f: A => B): Tree[B] = {
//@tailrec
def iter[A](tree: Tree[A], f: A => B): Tree[B] = tree match {
case Node(v, l, r) => Node(f(v), iter(l, f), iter(r, f))
case EmptyTree => EmptyTree
}
iter(tree, f)
}
def toList[A](t: Tree[A]): List[A] = {
//@tailrec
def iter[A](t: Tree[A]): List[A] = t match {
case Node(v, l, r) => v :: iter(l) ::: iter(r)
case EmptyTree => List.empty
}
iter(t)
}
}
Code for testing:
val tree = Tree(1, Tree(2, Tree(3), Tree(4)), Tree(5, Tree(6), Tree(7)))
Tree.foreach(tree, (x: Int) => println(x))
Tree.reduce(tree, 0, (x: Int, y: Int) => x + y)
Tree.map(tree, (x: Int) => x + 1)
Tree.toList(tree)
I can't use the @tailrec annotation because, as you can see, the recursive calls are not the last calls in the function, and I do not know how to rewrite it because there are several calls in one function, for example
v :: iter(l) ::: iter(r)
I know that I can use an accumulator for the inner recursive functions, but how should I use it when there are several calls?
Thanks in advance.
Updated:
def toListRec[A](tree: Tree[A]): List[A] = {
@tailrec
def iter(result: List[A], todo: List[Tree[A]]): List[A] = todo match {
case x :: tail => x match {
case Node(v, l, r) => iter(v :: result, l :: r :: tail)
case EmptyTree => iter(result, tail)
}
case Nil => result.reverse
}
iter(List.empty, List(tree))
}
Without tail recursion, the call stack is used to keep track of the calling functions. If you want to use tail recursion, you'll have to find a way to keep track of this information elsewhere. In simpler "linear" cases, such as factorial, this information is pretty limited and can often easily be taken care of by using an accumulator.
In your case, the problem is that the recursion isn't linear. After one recursive call, the function doesn't just compute the result, but it makes another recursive call before being able to get to the result.
In order to apply tail recursion in this case, you will have to explicitly keep track of the remaining recursive calls that have to be made. An easy way is to simply keep a "to-do" list. For example:
import scala.annotation.tailrec

def toList[A](t: Tree[A]): List[A] = {
  // todo: subtrees still to visit; acc: values collected so far, in reverse order
  @tailrec def iter(todo: List[Tree[A]], acc: List[A]): List[A] =
    todo match {
      case t :: rest => t match {
        case Node(v, l, r) => iter(l :: r :: rest, v :: acc)
        case EmptyTree => iter(rest, acc)
      }
      case Nil => acc.reverse
    }
  iter(List(t), List.empty)
}
Disclaimer: I know nothing about scala. :)
The solution that mweerden suggests would work; however, there is another way of solving the problem, which I think is much more elegant. Here is code that flattens a tree into a list:
def toList[T](t: Tree[T]): List[T] = {
def tailRecursive(tree: Tree[T], acc: List[T]): List[T] = tree match {
case EmptyTree => acc
case Node(value, left, right) =>
tailRecursive(left, value :: tailRecursive(right, acc))
}
tailRecursive(t, List())
}
The solution assumes that the tree is a binary search tree, in which case the list produced will be in ascending order (if ascending order is not required, the line with the recursive calls can be changed: put the value in front of the first recursive call, or prepend it straight onto the accumulator). Note that only the outer call is in tail position; the inner call on the right subtree still uses the call stack.
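For an in-order traversal where every recursive call is in tail position, one option is to combine the two ideas above: keep an explicit to-do list, but let it hold both subtrees still to expand and values ready to emit. This is a sketch under that assumption; TreeInOrder, toListInOrder and loop are illustrative names, and the Tree definitions are repeated from the question.

```scala
import scala.annotation.tailrec

object TreeInOrder {
  sealed trait Tree[+A]
  case object EmptyTree extends Tree[Nothing]
  final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]

  def toListInOrder[A](tree: Tree[A]): List[A] = {
    // Work items: Left(subtree) still has to be expanded, Right(value) is ready to emit.
    @tailrec
    def loop(todo: List[Either[Tree[A], A]], acc: List[A]): List[A] = todo match {
      case Nil => acc.reverse
      case Right(v) :: rest => loop(rest, v :: acc)
      case Left(EmptyTree) :: rest => loop(rest, acc)
      case Left(Node(v, l, r)) :: rest =>
        // Schedule left subtree, then the value, then the right subtree.
        loop(Left(l) :: Right(v) :: Left(r) :: rest, acc)
    }
    loop(List(Left(tree)), Nil)
  }
}
```

Because the pending work lives on the heap instead of the call stack, the depth of the tree no longer limits the traversal. A foreach or reduce could then be expressed over the resulting list, at the cost of building that list first.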

Scala - map over sequence, stopping immediately when element cannot be processed

I'd like a function that maps a function f over a sequence xs, and if f(x) (where x is an element of xs) produces a Failure then don't process any further elements of xs but immediately return Failure. If f(x) succeeds for all x then return a Success containing a sequence of the results.
So the type signature might be something like
def traverse[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]]
And some test cases:
def doWork(i: Int): Try[Int] = {
i match {
case 1 => Success(10)
case 2 => Failure(new IllegalArgumentException("computer says no"))
case 3 => Success(30)
}
}
traverse(Seq(1,2,3))(doWork)
res0: scala.util.Try[Seq[Int]] = Failure(java.lang.IllegalArgumentException: computer says no)
traverse(Seq(1,3))(doWork)
scala.util.Try[Seq[Int]] = Success(List(10, 30))
What would be the most elegant way to implement this?
Simple implementation:
def traverse[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]] =
xs.foldLeft[Try[Seq[B]]](Success(Vector())) { (attempt, elem) => for {
seq <- attempt
next <- f(elem)
} yield seq :+ next
}
The trouble here is that, although the function will not evaluate f once a Failure has occurred, it still traverses the sequence to the end, which could be undesirable for some complex Stream. So we may use a specialized version:
def traverse1[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]] = {
val ys = xs map f
ys find (_.isFailure) match {
case None => Success(ys map (_.get))
case Some(Failure(ex)) => Failure(ex)
}
}
which uses an intermediate collection, leading to unnecessary memory overhead for a strict collection (where xs map f also applies f to every element before find runs).
Or we could reimplement fold from scratch:
def traverse[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]] = {
def loop(xs: Seq[A], acc: Seq[B]): Try[Seq[B]] = xs match {
case Seq() => Success(acc)
case elem +: tail =>
f(elem) match {
case Failure(ex) => Failure(ex)
case Success(next) => loop(tail, acc :+ next)
}
}
loop(xs, Vector())
}
As we can see, the inner loop keeps iterating only while it encounters Success.
One way, but is it the most elegant?
def traverse[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]] = {
Try(xs.map(f(_).get))
}
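A quick way to check that this short version stops at the first failure is to count the calls to f. The sketch below assumes a strict Seq and uses a hypothetical counter named calls; doWork is a simplified variant of the one in the question.

```scala
import scala.util.{Try, Success, Failure}

object TraverseCheck {
  def traverse[A, B](xs: Seq[A])(f: A => Try[B]): Try[Seq[B]] =
    Try(xs.map(f(_).get)) // .get rethrows the failure, which aborts map; Try catches it

  // Counts how many elements were actually processed.
  var calls = 0
  def doWork(i: Int): Try[Int] = {
    calls += 1
    if (i == 2) Failure(new IllegalArgumentException("computer says no"))
    else Success(i * 10)
  }

  val failed = traverse(Seq(1, 2, 3))(doWork)
  val callsOnFailure = calls // element 3 is never processed

  val ok = traverse(Seq(1, 3))(doWork)
}
```

One design caveat: the outer Try also captures non-fatal exceptions thrown by f itself or by map, not only the Failure values that f returns.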