OutOfMemoryError in a Fibonacci stream in Scala

When I define fib like this (1):
def fib(n: Int) = {
lazy val fibs: Stream[BigInt] = 0 #:: 1 #:: fibs.zip(fibs.tail).map{n => n._1 + n._2}
fibs.drop(n).head
}
I get an error:
scala> fib(1000000)
java.lang.OutOfMemoryError: Java heap space
On the other hand, this works fine (2):
def fib = {
lazy val fibs: Stream[BigInt] = 0 #:: 1 #:: fibs.zip(fibs.tail).map{n => n._1 + n._2}
fibs
}
scala> fib.drop(1000000).head
res17: BigInt = 195328212...
Moreover, if I change the stream definition in the following way, I can call drop(n).head within the function and don't get any error either (3):
def fib(n: Int) = {
lazy val fibs: (BigInt, BigInt) => Stream[BigInt] = (a, b) => a #:: fibs(b, a+b)
fibs(0, 1).drop(n).head
}
scala> fib(1000000)
res18: BigInt = 195328212...
Can you explain relevant differences between (1), (2) and (3)? Why does (2) work, while (1) does not? And why don't we need to move drop(n).head out of the function in (3)?

In the first case, a reference to the beginning of the fibs stream is held while element number n is being calculated, so all values from 0 to 1000000 have to be kept in memory. This is the source of the OutOfMemoryError.
In the second case, no reference to the beginning of the stream is preserved anywhere, so items can be garbage collected (only one item at a time has to be kept in memory).
In the third case, a reference to the beginning of the stream does not exist anywhere explicitly either (it can be garbage collected while the following values are dropped). However, if we change it into:
def fib(n: Int) = {
lazy val fibs: (BigInt, BigInt) => Stream[BigInt] = (a, b) => a #:: fibs(b, a+b)
val beg = fibs(0, 1)
beg.drop(n).head
}
Then the OutOfMemoryError will occur again, because beg holds a reference to the beginning of the stream.
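For instance, here is a minimal sketch in the spirit of (3) that keeps everything inside one function without ever naming the head of the stream (fibSafe and go are made-up names, not from the question):
// Sketch: the head is only an intermediate value, never stored in a val,
// so cells already consumed by drop can be garbage collected as the traversal advances.
def fibSafe(n: Int): BigInt = {
  def go(a: BigInt, b: BigInt): Stream[BigInt] = a #:: go(b, a + b)
  go(0, 1).drop(n).head
}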

Related

Code to compute Stream of primes in Scala

I have slightly modified Daniel Sobral's prime Stream function from this SO post:
def primeStream: Stream[Int] => Stream[Int] =
s => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
I'm using it with:
primeStream(Stream.from(2)).take(100).foreach(println)
and it works well enough, but I'm wondering if I could get rid of that pesky Stream.from(2) with the following:
def primeStream: () => Stream[Int] =
() => Stream.from(2)
def primeStream: Stream[Int] => Stream[Int] =
s => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
to achieve:
primeStream().take(100).foreach(println)
But that doesn't work. What am I missing?
I tried also:
def primeStream: Stream[Int] => Stream[Int] = {
() => Stream.from(2)
s: Stream[Int] => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
}
which doesn't work.
This works:
def primeStream2(s: Stream[Int] = Stream.from(2)): Stream[Int] =
s.head #:: primeStream2(s.tail filter(_ % s.head != 0))
But I wanted to understand what I am missing to make the more symmetric version above, with two parallel definitions of primeStream, work.
The 1st attempt doesn't work because you're trying to define 2 different methods with the same name. Methods can't be differentiated by their return types. Also, other than their names they appear to be totally unrelated so if you were able to invoke one of them the existence of the other would be immaterial.
The 2nd attempt tries to put 2 unrelated, and unnamed, functions in the same code block. It will compile if you wrap the 1st function in parentheses but the result isn't what you're after.
I completely understand your desire to make Stream.from(2) automatic because if you pass anything else, like Stream.from(13), you don't get a Stream of prime integers.
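If the goal is just a single public name with the seed hidden, one minimal sketch (assuming only plain Streams; the helper name sieve is made up) is to give the recursive step its own, differently named helper, so the parameterless method can delegate to it. Note that a parameterless def is then called as primeStream, not primeStream():
// Sketch: one public, parameterless primeStream; the recursion lives in a helper.
def sieve(s: Stream[Int]): Stream[Int] =
  s.head #:: sieve(s.tail filter (_ % s.head != 0))

def primeStream: Stream[Int] = sieve(Stream.from(2))

primeStream.take(100).foreach(println)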
There are a few different ways to get a lazy sequence of prime numbers with only one Stream invocation. This one is a little complicated because it tries to reduce the number of inner iterations when searching for the next prime.
val primeStream: Stream[Int] = 2 #:: Stream.iterate[Int](3)(x =>
Stream.iterate(x+2)(_+2).find(i => primeStream.takeWhile(p => p*p <= i)
.forall(i%_ > 0)).get)
You can also use the new (Scala 2.13) unfold() method to create the Stream.
val primes = Stream.unfold(List(2)) { case hd::tl =>
Option((hd, Range(hd+1, hd*2).find(n => tl.forall(n % _ > 0)).get::hd::tl))
}
Note that Stream has been deprecated since Scala 2.13 and should be replaced with the new LazyList.
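For reference, a rough sketch of what that migration might look like for the simple filtering sieve above, using a default argument in the style of primeStream2 (the name primes is made up):
// Sketch: LazyList (Scala 2.13+) replaces the deprecated Stream; the default argument hides the seed.
def primes(s: LazyList[Int] = LazyList.from(2)): LazyList[Int] =
  s.head #:: primes(s.tail.filter(_ % s.head != 0))

primes().take(100).foreach(println)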

How to create sequence?

I'm trying to come up with a function for an endless Fibonacci sequence of numbers that takes two parameters. The parameters set the first two elements of the sequence.
def fib(i: Int, j: Int): Stream[Int] = {
case 0 | 1 => current
case _ => Fib( current-1 ) + Fib( current -2 )
}
This is very easy to do; however, you have to recurse in the other direction. You do not define the current element in terms of previous elements; instead, the function receives the current values as arguments and calls itself with the arguments for the next value:
def fib(i: Int, j: Int): Stream[Int] = i #:: fib(j, i + j)
println(fib(0, 1).take(10).toList)
In contrast to the typical doubly recursive definition, which makes an exponential number of calls, this does only a linear amount of work, so it is quite efficient. (Streams of course carry more overhead than a simple while loop.)
For efficiency, this kind of thing is usually done with a Stream to avoid recalculating the same values over and over. The straightforward way to create a Stream of Fibonacci numbers is
val fibs: Stream[BigInt] = 0 #:: 1 #:: ( fibs zip fibs.tail map ( n => n._1 + n._2 ) )
But you can make a more efficient version of this kind of Stream by avoiding the zip, like so:
val fibs: Stream[BigInt] = {
def loop( h:BigInt, n:BigInt ): Stream[BigInt] = h #:: loop(n, h+n)
loop(0,1)
}
Notice that these use val; you generally DO NOT want to use def to define a stream!
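To illustrate that point, here is a minimal sketch (the names fibsVal and fibsDef are made up): a val keeps its memoized cells across uses, while a def rebuilds an unmemoized stream on every call, so earlier work is repeated.
// Sketch: fibsVal memoizes evaluated cells; fibsDef returns a fresh stream each call.
val fibsVal: Stream[BigInt] = {
  def loop(h: BigInt, n: BigInt): Stream[BigInt] = h #:: loop(n, h + n)
  loop(0, 1)
}

def fibsDef: Stream[BigInt] = {
  def loop(h: BigInt, n: BigInt): Stream[BigInt] = h #:: loop(n, h + n)
  loop(0, 1)
}

fibsVal.drop(1000).head // forces 1001 cells
fibsVal.drop(1000).head // reuses the cells already forced above
fibsDef.drop(1000).head // forces 1001 cells
fibsDef.drop(1000).head // a brand-new stream: forces them all again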

Scala lazy val explanation

I am taking the Functional Programming in Scala course on Coursera and I am having a hard time understanding this code snippet -
def sqrtStream(x: Double): Stream[Double] = {
def improve(guess: Double): Double = (guess + x / guess) / 2
lazy val guesses: Stream[Double] = 1 #:: (guesses map improve)
guesses
}
This method finds 10 increasingly accurate approximations of the square root of 4 when I do sqrtStream(4).take(10).toList.
Can someone explain the evaluation strategy of guesses here? My doubt is: what value of guesses is substituted when the second value of guesses is picked up?
Let's start with a simplified example:
scala> lazy val a: Int = a + 5
a: Int = <lazy>
scala> a
stack overflow here, because of infinite recursion
So evaluating a recurses forever; a lazy val only works if its right-hand side can reach a stable value without forcing a itself, like here:
scala> def f(f:() => Any) = 0 //takes function with captured a - returns constant 0
f: (f: () => Any)Int
scala> lazy val a: Int = f(() => a) + 5
a: Int = <lazy>
scala> a
res4: Int = 5 // 0 + 5
You may replace def f(f:() => Any) = 0 with def f(f: => Any) = 0, so that the definition of a looks as if a itself is passed to f: lazy val a: Int = f(a) + 5.
Streams use the same mechanism: guesses map improve is passed as a by-name parameter (the lambda referring to the lazy val guesses is stored inside the Stream but not evaluated until the tail is requested), so it is roughly lazy val guesses = #::(1, () => guesses map improve). When you call guesses.head, the tail is not evaluated; guesses.tail lazily returns Stream(improve(1), ?), guesses.tail.tail is Stream(improve(improve(1)), ?), and so on.
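As a sketch of that mechanism (this is essentially what #:: amounts to, since Stream.cons takes its tail by name):
def sqrtStream(x: Double): Stream[Double] = {
  def improve(guess: Double): Double = (guess + x / guess) / 2
  // Stream.cons takes its second argument by name, so `guesses map improve`
  // is not evaluated here; it is only forced when the tail is requested.
  lazy val guesses: Stream[Double] = Stream.cons(1.0, guesses map improve)
  guesses
}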
The value of guesses is not substituted. A stream is like a list, but its elements are evaluated only when they are needed, and they are then stored, so the next time you access them no evaluation is necessary. The reference to the stream itself does not change.
On top of the example Αλεχει wrote, there is a nice explanation in Scala API:
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream
You can easily find out what's going on by modifying the map function, as described in the scaladoc example:
scala> def sqrtStream(x: Double): Stream[Double] = {
| def improve(guess: Double): Double = (guess + x / guess) / 2
| lazy val guesses: Stream[Double] = 1 #:: (guesses map {n =>
| println(n, improve(n))
| improve(n)
| })
| guesses
| }
sqrtStream: (x: Double)Stream[Double]
The output is:
scala> sqrtStream(4).take(10).toList
(1.0,2.5)
(2.5,2.05)
(2.05,2.000609756097561)
(2.000609756097561,2.0000000929222947)
(2.0000000929222947,2.000000000000002)
(2.000000000000002,2.0)
(2.0,2.0)
(2.0,2.0)
(2.0,2.0)
res0: List[Double] = List(1.0, 2.5, 2.05, 2.000609756097561, 2.0000000929222947, 2.000000000000002, 2.0, 2.0, 2.0, 2.0)

Why does this function run out of memory?

It's a function to find the third largest of a collection of integers. I'm calling it like this:
val lineStream = Source.fromFile("10m.txt").getLines.toStream
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream)
The file 10m.txt contains 10 million lines with a random integer on each one. The thirdLargest function below should not be keeping any of the integers after it has tested them, and yet it causes the JVM to run out of memory (after about 90 seconds in my case).
def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
def top3of4(top3: List[Int], fourth: Int) = top3 match {
case List(a, b, c) =>
if (fourth > c) List(b, c, fourth)
else if (fourth > b) List(b, fourth, c)
else if (fourth > a) List(fourth, b, c)
else top3
}
@tailrec
def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
case (List(a, b, c), Nil) => a
case (top3, d #:: rest) => find(top3of4(top3, d), rest)
}
numbers match {
case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
case _ => None
}
}
The OOM error has nothing to do with the way you read the file. It is totally fine and even recommended to use Source.getLines here. The problem is elsewhere.
Many people are confused by the nature of Scala's Stream. In fact, it is not something you would want to use just to iterate over things. It is indeed lazy, but it doesn't discard previous results; they are memoized, so there is no need to recalculate them on the next use (which never happens in your case, but that's where your memory goes). See also this answer.
Consider using foldLeft. Here's a working (but intentionally simplified) example for illustration purposes:
val lines = Source.fromFile("10m.txt").getLines()
print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
(next :: best3).sorted.reverse.take(3)
})
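To get back to the original Option[Int] signature, a sketch along the same lines (this thirdLargest is a re-implementation for illustration, not the asker's function):
import scala.io.Source

// Sketch: fold over a plain Iterator so nothing is memoized; keep only the top three values.
def thirdLargest(numbers: Iterator[Int]): Option[Int] = {
  val top3 = numbers.foldLeft(List.empty[Int]) { (best3, next) =>
    (next :: best3).sorted.reverse.take(3)
  }
  if (top3.size == 3) Some(top3.last) else None // smallest of the top three
}

thirdLargest(Source.fromFile("10m.txt").getLines().map(_.toInt))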

How to implement lazy sequence (iterable) in scala?

I want to implement a lazy iterator that yields the next element in each call, in a 3-level nested loop.
Is there something similar in Scala to this snippet of C#:
foreach (int i in ...)
{
foreach (int j in ...)
{
foreach (int k in ...)
{
yield return do(i,j,k);
}
}
}
Thanks, Dudu
Scala sequence types all have a .view method which produces a lazy equivalent of the collection. You can play around with the following in the REPL (after issuing :silent to stop it from forcing the collection to print command results):
def log[A](a: A) = { println(a); a }
for (i <- 1 to 10) yield log(i)
for (i <- (1 to 10) view) yield log(i)
The first will print the numbers 1 to 10; the second will not print anything until you actually try to access the elements of the result.
There is nothing in Scala directly equivalent to C#'s yield statement, which pauses the execution of a loop. You can achieve similar effects with the delimited continuations that were added in Scala 2.8.
If you join iterators together with ++, you get a single iterator that runs over both. And the reduceLeft method helpfully joins together an entire collection. Thus,
def doIt(i: Int, j: Int, k: Int) = i+j+k
(1 to 2).map(i => {
(1 to 2).map(j => {
(1 to 2).iterator.map(k => doIt(i,j,k))
}).reduceLeft(_ ++ _)
}).reduceLeft(_ ++ _)
will produce the iterator you want. If you want it to be even more lazy than that, you can add .iterator after the first two (1 to 2) also. (Replace each (1 to 2) with your own more interesting collection or range, of course.)
You can use a Sequence Comprehension over Iterators to get what you want:
for {
i <- (1 to 10).iterator
j <- (1 to 10).iterator
k <- (1 to 10).iterator
} yield doFunc(i, j, k)
If you want to create a lazy Iterable (instead of a lazy Iterator) use Views instead:
for {
i <- (1 to 10).view
j <- (1 to 10).view
k <- (1 to 10).view
} yield doFunc(i, j, k)
Depending on how lazy you want to be, you may not need all of the calls to iterator / view.
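For instance, a sketch of a lazier-outside, strict-inside variant (with a trivial stand-in body for doFunc): because the first generator is an iterator, the whole comprehension yields an Iterator and nothing runs until it is consumed; the inner ranges are still walked strictly for each i.
def doFunc(i: Int, j: Int, k: Int): Int = i + j + k

// The outermost generator is an Iterator, so the result is a lazy Iterator[Int].
val results: Iterator[Int] =
  for {
    i <- (1 to 10).iterator
    j <- 1 to 10
    k <- 1 to 10
  } yield doFunc(i, j, k)

results.take(5).foreach(println) // forces only the i = 1 block, not all 1000 combinations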
If your 3 iterators are generally small (i.e., you can fully iterate them without concern for memory or CPU) and the expensive part is computing the result given i, j, and k, you can use Scala's Stream class.
val tuples = for (i <- 1 to 3; j <- 1 to 3; k <- 1 to 3) yield (i, j, k)
val stream = Stream(tuples: _*) map { case (i, j, k) => i + j + k }
stream take 10 foreach println
If your iterators are too large for this approach, you could extend this idea and create a Stream of tuples that calculates the next value lazily by keeping state for each iterator. For example (although hopefully someone has a nicer way of defining the product method):
def product[A, B, C](a: Iterable[A], b: Iterable[B], c: Iterable[C]): Iterator[(A, B, C)] = {
if (a.isEmpty || b.isEmpty || c.isEmpty) Iterator.empty
else new Iterator[(A, B, C)] {
private val aItr = a.iterator
private var bItr = b.iterator
private var cItr = c.iterator
private var aValue: Option[A] = if (aItr.hasNext) Some(aItr.next) else None
private var bValue: Option[B] = if (bItr.hasNext) Some(bItr.next) else None
override def hasNext = cItr.hasNext || bItr.hasNext || aItr.hasNext
override def next = {
if (cItr.hasNext)
(aValue.get, bValue.get, cItr.next)
else {
// c is exhausted: restart it and advance b, or roll over to the next a
cItr = c.iterator
if (bItr.hasNext) {
bValue = Some(bItr.next)
(aValue.get, bValue.get, cItr.next)
} else {
aValue = Some(aItr.next)
bItr = b.iterator
bValue = Some(bItr.next)
(aValue.get, bValue.get, cItr.next)
}
}
}
}
}
val stream = product(1 to 3, 1 to 3, 1 to 3).toStream map { case (i, j, k) => i + j + k }
stream take 10 foreach println
This approach fully supports infinitely sized inputs.
I think the code below is what you're actually looking for... I think the compiler ends up translating it into the equivalent of the map code Rex gave, but it is closer to the syntax of your original example:
scala> def doIt(i:Int, j:Int) = { println(i + ","+j); (i,j); }
doIt: (i: Int, j: Int)(Int, Int)
scala> def x = for( i <- (1 to 5).iterator;
j <- (1 to 5).iterator ) yield doIt(i,j)
x: Iterator[(Int, Int)]
scala> x.foreach(print)
1,1
(1,1)1,2
(1,2)1,3
(1,3)1,4
(1,4)1,5
(1,5)2,1
(2,1)2,2
(2,2)2,3
(2,3)2,4
(2,4)2,5
(2,5)3,1
(3,1)3,2
(3,2)3,3
(3,3)3,4
(3,4)3,5
(3,5)4,1
(4,1)4,2
(4,2)4,3
(4,3)4,4
(4,4)4,5
(4,5)5,1
(5,1)5,2
(5,2)5,3
(5,3)5,4
(5,4)5,5
(5,5)
scala>
You can see from the output that the print in "doIt" isn't called until the next value of x is iterated over, and this style of for generator is a bit simpler to read/write than a bunch of nested maps.
Turn the problem upside down. Pass "do" in as a closure. That's the entire point of using a functional language.
Iterator.zip will do it:
iterator1.zip(iterator2).zip(iterator3).map(tuple => doSomething(tuple))
Just read the first 20 or so related links that are shown on the side (and, indeed, were shown to you when you first wrote the title of your question).