Code to compute Stream of primes in Scala

I have slightly modified Daniel Sobral's prime Stream function from this SO post:
def primeStream: Stream[Int] => Stream[Int] =
  s => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
I'm using it with:
primeStream(Stream.from(2)).take(100).foreach(println)
and it works fine enough, but I'm wondering if I could get rid of that pesky Stream.from(2) with the following:
def primeStream: () => Stream[Int] =
  () => Stream.from(2)
def primeStream: Stream[Int] => Stream[Int] =
  s => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
to achieve:
primeStream().take(100).foreach(println)
But that doesn't work. What am I missing?
I tried also:
def primeStream: Stream[Int] => Stream[Int] = {
  () => Stream.from(2)
  s: Stream[Int] => s.head #:: primeStream(s.tail filter(_ % s.head != 0))
}
which doesn't work.
This works:
def primeStream2(s: Stream[Int] = Stream.from(2)): Stream[Int] =
  s.head #:: primeStream2(s.tail filter(_ % s.head != 0))
But I wanted to understand what I was missing to make the more symmetric syntax above, with two parallel definitions of primeStream, work.

The 1st attempt doesn't work because you're trying to define two different methods with the same name, and methods can't be differentiated by their return types alone. Also, other than their names the two definitions are totally unrelated, so even if you could invoke one of them, the existence of the other would be immaterial.
The 2nd attempt tries to put two unrelated, and unnamed, functions in the same code block. It will compile if you wrap the 1st function in parentheses, but the result isn't what you're after.
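One way to get the call shape you wanted (this is my own sketch, not part of the answer above) is to overload primeStream: methods can share a name as long as their parameter lists differ, so a no-argument variant can simply delegate to the Stream-taking one.
object Primes {
  // The same sieve as before, taking the stream to filter.
  def primeStream(s: Stream[Int]): Stream[Int] =
    s.head #:: primeStream(s.tail filter (_ % s.head != 0))
  // An overload with an empty parameter list that supplies Stream.from(2).
  def primeStream(): Stream[Int] = primeStream(Stream.from(2))
}
Primes.primeStream().take(100).foreach(println)
This is essentially the same idea as the default-argument primeStream2 from the question, just spelled as two overloads instead of one method.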
I completely understand your desire to make Stream.from(2) automatic because if you pass anything else, like Stream.from(13), you don't get a Stream of prime integers.
There are a few different ways to get a lazy sequence of prime numbers with only one Stream invocation. This one is a little complicated because it tries to reduce the number of inner iterations when searching for the next prime.
val primeStream: Stream[Int] = 2 #:: Stream.iterate[Int](3)(x =>
  Stream.iterate(x + 2)(_ + 2)
    .find(i => primeStream.takeWhile(p => p * p <= i).forall(i % _ > 0)).get)
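It is used the same way as before, and since primeStream is a val the elements are memoized, so a second traversal reuses the already computed prefix. For example:
primeStream.take(100).foreach(println)
primeStream(99)  // the 100th prime, 541, now served from the memoized cells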
You can also use the new (Scala 2.13) unfold() method to create the Stream.
val primes = Stream.unfold(List(2)) { case hd :: tl =>
  Option((hd, Range(hd + 1, hd * 2).find(n => tl.forall(n % _ > 0)).get :: hd :: tl))
}
Note that Stream has been deprecated since Scala 2.13 and should be replaced with the new LazyList.
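The iterate-based definition above carries over almost unchanged; here is a sketch of mine (with the val renamed), assuming Scala 2.13+ and a top-level or field definition so the self-reference is legal:
// Same sieve as above, with Stream replaced by LazyList (Scala 2.13+).
val primeLazyList: LazyList[Int] = 2 #:: LazyList.iterate(3)(x =>
  LazyList.iterate(x + 2)(_ + 2)
    .find(i => primeLazyList.takeWhile(p => p * p <= i).forall(i % _ > 0)).get)
primeLazyList.take(100).foreach(println)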


OutOfMemoryError in a Fibonacci stream in Scala

When I define fib like this (1):
def fib(n: Int) = {
  lazy val fibs: Stream[BigInt] = 0 #:: 1 #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }
  fibs.drop(n).head
}
I get an error:
scala> fib(1000000)
java.lang.OutOfMemoryError: Java heap space
On the other hand, this works fine (2):
def fib = {
  lazy val fibs: Stream[BigInt] = 0 #:: 1 #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }
  fibs
}
scala> fib.drop(1000000).head
res17: BigInt = 195328212...
Moreover, if I change the stream definition in the following way, I can call drop(n).head within the function and don't get any error either (3):
def fib(n: Int) = {
  lazy val fibs: (BigInt, BigInt) => Stream[BigInt] = (a, b) => a #:: fibs(b, a + b)
  fibs(0, 1).drop(n).head
}
scala> fib(1000000)
res18: BigInt = 195328212...
Can you explain relevant differences between (1), (2) and (3)? Why does (2) work, while (1) does not? And why don't we need to move drop(n).head out of the function in (3)?
In the first case, a reference to the beginning of the fibs stream exists while element number n is being calculated, so all values from 0 to 1000000 have to be kept in memory. This is the source of the OutOfMemoryError.
In the second case, a reference to the beginning of the stream is not preserved anywhere, so items can be garbage collected (only one item at a time has to be kept in memory).
In the third case, a reference to the beginning of the stream does not exist anywhere explicitly (it can be garbage collected while the next values are dropped). However, if we change it into:
def fib(n: Int) = {
  lazy val fibs: (BigInt, BigInt) => Stream[BigInt] = (a, b) => a #:: fibs(b, a + b)
  val beg = fibs(0, 1)
  beg.drop(n).head
}
then the OutOfMemoryError will occur again.
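Along the same lines (a sketch of mine, not part of the original answer), you can keep the signature of (1) and still avoid retaining the prefix by building the stream through a local def, just like (3) does, so that no name outside the drop(n) traversal holds on to the head:
def fib(n: Int): BigInt = {
  // Same shape as (3): the head is only reachable from the expression being
  // traversed, so the already-visited prefix can be garbage collected.
  def fibs(a: BigInt, b: BigInt): Stream[BigInt] = a #:: fibs(b, a + b)
  fibs(0, 1).drop(n).head
}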

How to create a sequence?

I'm trying to come up with an endless Fibonacci sequence function that takes two parameters. The parameters will set the first 2 elements in the sequence.
def fib(i: Int, j: Int): Stream[Int] = {
  case 0 | 1 => current
  case _ => Fib( current-1 ) + Fib( current -2 )
}
This is very easy to do; however, you have to recurse in the other direction. You do not define the current element based on previous elements; instead, your function receives the current arguments and calls itself with the arguments for the next value:
def fib(i: Int, j: Int): Stream[Int] = i #:: fib(j, i + j)
println(fib(0,1).take(10))
In contrast to the typical recursive definition, whose running time grows exponentially, this is linear, so it is quite efficient. (Streams are of course more complex than a simple while loop.)
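For reference, "the typical recursive definition" mentioned above is the classic two-branch version (my sketch), which recomputes the same subproblems over and over:
// Classic doubly recursive Fibonacci: simple, but does exponential work.
def fibNaive(n: Int): BigInt =
  if (n < 2) n else fibNaive(n - 1) + fibNaive(n - 2)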
For efficiency, this kind of thing is usually done with a Stream to avoid recalculating the same values over and over. The straightforward way to create a Stream of Fibonacci numbers is
val fibs: Stream[BigInt] = 0 #:: 1 #:: ( fibs zip fibs.tail map ( n => n._1 + n._2 ) )
But you can make a more efficient version of this kind of Stream by avoiding the zip, like so:
val fibs: Stream[BigInt] = {
  def loop(h: BigInt, n: BigInt): Stream[BigInt] = h #:: loop(n, h + n)
  loop(0, 1)
}
Notice that these use val; you generally DO NOT want to use def to define a stream!
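A quick way to see the difference (a sketch; the fibsVal and fibsDef names are mine): a val memoizes the cons cells, so a second traversal is essentially free, while a def rebuilds the stream on every call and nothing is shared between calls.
val fibsVal: Stream[BigInt] = {
  def loop(h: BigInt, n: BigInt): Stream[BigInt] = h #:: loop(n, h + n)
  loop(0, 1)
}
def fibsDef: Stream[BigInt] = {
  def loop(h: BigInt, n: BigInt): Stream[BigInt] = h #:: loop(n, h + n)
  loop(0, 1)
}
fibsVal(40)  // computed once and cached in the stream's cons cells
fibsVal(40)  // served from the memoized cells
fibsDef(40)  // a brand-new stream on each call: the first 41 values are recomputed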

Merge two Streams (ordered) to get a final sorted Stream

For example, how do you merge two Streams of sorted Integers? I thought it was very basic, but just found it's not trivial at all. The one below is not tail-recursive and it will stack-overflow when the Streams are large.
def merge(as: Stream[Int], bs: Stream[Int]): Stream[Int] = {
  (as, bs) match {
    case (Stream.Empty, bss) => bss
    case (ass, Stream.Empty) => ass
    case (a #:: ass, b #:: bss) =>
      if (a < b) a #:: merge(ass, bs)
      else b #:: merge(as, bss)
  }
}
We may want to turn it into a tail-recursive one by introducing an accumulator. However, if we prepend to the accumulator, we only get a stream in reversed order; and if we append to the accumulator with concatenation (#:::), it's NOT lazy (it's strict) any more.
What could be the solution here? Thanks
Turning a comment into an answer, there's nothing wrong with your merge.
It's not recursive at all: any one call to merge returns a new Stream without making any further call to merge. a #:: merge(ass, bs) returns a stream whose first element is a and in which merge(ass, bs) will only be called to evaluate the rest of the stream when it is required.
So
val m = merge(Stream.from(1,2), Stream.from(2, 2))
//> m : Stream[Int] = Stream(1, ?)
m.drop(10000000).take(1)
//> res0: scala.collection.immutable.Stream[Int] = Stream(10000001, ?)
works just fine. No stack overflow.

Scala's Stream and StackOverflowError

Consider this code (taken from "Functional programming principles in Scala" course by Martin Odersky):
def sieve(s: Stream[Int]): Stream[Int] = {
  s.head #:: sieve(s.tail.filter(_ % s.head != 0))
}
val primes = sieve(Stream.from(2))
primes.take(1000).toList
It works just fine. Notice that sieve is in fact NOT tail recursive (or is it?), even though Stream's tail is lazy.
But this code:
def sieve(n: Int): Stream[Int] = {
  n #:: sieve(n + 1).filter(_ % n != 0)
}
val primes = sieve(2)
primes.take(1000).toList
throws StackOverflowError.
What is the problem with the second example? I guess filter messes things up, but I can't understand why. It returns a Stream, so it shouldn't make evaluation eager (am I right?)
You can highlight the problem with a bit of tracking code:
var counter1, counter2 = 0
def sieve1(s: Stream[Int]): Stream[Int] = {
  counter1 += 1
  s.head #:: sieve1(s.tail.filter(_ % s.head != 0))
}
def sieve2(n: Int): Stream[Int] = {
  counter2 += 1
  n #:: sieve2(n + 1).filter(_ % n != 0)
}
sieve1(Stream.from(2)).take(100).toList
sieve2(2).take(100).toList
We can run this and check the counters:
scala> counter1
res2: Int = 100
scala> counter2
res3: Int = 540
So in the first case the depth of the call stack is the number of primes, and in the second it's the largest prime itself (well, minus one).
Neither of these is tail recursive.
Using the @tailrec annotation will tell you whether or not a function is tail recursive.
Adding @tailrec to the two functions above gives:
import scala.annotation.tailrec
@tailrec
def sieve(s: Stream[Int]): Stream[Int] = {
  s.head #:: sieve(s.tail.filter(_ % s.head != 0))
}
@tailrec
def sieve(n: Int): Stream[Int] = {
  n #:: sieve(n + 1).filter(_ % n != 0)
}
Loading this shows that both definitions are not tail recursive:
<console>:10: error: could not optimize @tailrec annotated method sieve: it contains a recursive call not in tail position
  s.head #:: sieve(s.tail.filter(_ % s.head != 0))
         ^
<console>:10: error: could not optimize @tailrec annotated method sieve: it contains a recursive call not in tail position
  n #:: sieve(n + 1).filter(_ % n != 0)

Pattern matching and infinite streams

So, I'm working to teach myself Scala, and one of the things I've been playing with is the Stream class. I tried to use a naïve translation of the classic Haskell version of Dijkstra's solution to the Hamming number problem:
object LazyHammingBad {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (x #:: xs, y #:: ys) =>
        if (x < y) x #:: merge(xs, b)
        else if (y < x) y #:: merge(a, ys)
        else x #:: merge(xs, ys)
    }
  val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
Taking this for a spin in the interpreter led quickly to disappointment:
scala> LazyHammingBad.numbers.take(10).toList
java.lang.StackOverflowError
I decided to look to see if other people had solved the problem in Scala using the Haskell approach, and adapted this solution from Rosetta Code:
object LazyHammingGood {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    if (a.head < b.head) a.head #:: merge(a.tail, b)
    else if (b.head < a.head) b.head #:: merge(a, b.tail)
    else a.head #:: merge(a.tail, b.tail)
  val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
This one worked nicely, but I still wonder how I went wrong in LazyHammingBad. Does using #:: to destructure x #:: xs force the evaluation of xs for some reason? Is there any way to use pattern matching safely with infinite streams, or do you just have to use head and tail if you don't want things to blow up?
a match { case x #:: xs => ... } is about the same as val (x, xs) = (a.head, a.tail). So the difference between the bad version and the good one is that in the bad version you're calling a.tail and b.tail right at the start, instead of just using them to build the tail of the resulting stream. Furthermore, when you use them to the right of #:: (not pattern matching, but building the result, as in #:: merge(a, b.tail)), you are not actually calling merge; that will be done only later, when the tail of the returned Stream is accessed. So in the good version, a call to merge does not call tail at all. In the bad version, it calls it right at the start.
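You can watch the extractor forcing the tail with a tiny experiment (my sketch):
def loudTail: Stream[Int] = { println("tail forced"); Stream.empty }
val s = 1 #:: loudTail        // loudTail is not evaluated here: #:: takes the tail by name
s match {
  case x #:: xs => println(s"head = $x")
}
// prints "tail forced" before "head = 1": the #:: extractor called s.tail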
Now if you consider numbers, or even a simplified version, say 1 #:: merge(numbers, anotherStream), when you call tail on that (as take(10) will), merge has to be evaluated. It then calls tail on numbers, which calls merge with numbers as a parameter, which calls tail on numbers, which calls merge, which calls tail...
By contrast, in super lazy Haskell, when you pattern match, it does barely any work. When you do case l of x:xs, it will evaluate l just enough to know whether it is an empty list or a cons.
If it is indeed a cons, x and xs will be available as two thunks, computations that will eventually give access to the content later. The closest equivalent in Scala would be to just test for emptiness.
Note also that in Scala's Stream, while the tail is lazy, the head is not. When you have a (non-empty) Stream, the head has to be known, which means that when you get the tail of the stream, itself a stream, its head, that is the second element of the original stream, has to be computed. This is sometimes problematic, but in your example, you fail before even getting there.
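The strict head is also easy to observe (again my sketch):
def loudly(n: Int): Int = { println(s"evaluating $n"); n }
val t = loudly(1) #:: loudly(2) #:: Stream.empty[Int]
// "evaluating 1" is printed immediately: the head of a Stream is strict.
t.tail  // prints "evaluating 2": forcing the tail evaluates that cell's head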
Note that you can do what you want by defining a better pattern matcher for Stream:
Here's a bit I just pulled together in a Scala Worksheet:
object HammingTest {
  // A convenience object for stream pattern matching
  object #:: {
    class TailWrapper[+A](s: Stream[A]) {
      def unwrap = s.tail
    }
    object TailWrapper {
      implicit def unwrap[A](wrapped: TailWrapper[A]) = wrapped.unwrap
    }
    def unapply[A](s: Stream[A]): Option[(A, TailWrapper[A])] = {
      if (s.isEmpty) None
      else {
        Some(s.head, new TailWrapper(s))
      }
    }
  }
  def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (x #:: xs, y #:: ys) =>
        if (x < y) x #:: merge(xs, b)
        else if (y < x) y #:: merge(a, ys)
        else x #:: merge(xs, ys)
    }                                    //> merge: (a: Stream[BigInt], b: Stream[BigInt])Stream[BigInt]
  lazy val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 }, merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
                                         //> numbers : Stream[BigInt] = <lazy>
  numbers.take(10).toList                //> res0: List[BigInt] = List(1, 2, 3, 4, 5, 6, 8, 9, 10, 12)
}
Now you just need to make sure that Scala finds your object #:: instead of the one in Stream.class whenever it's doing pattern matching. To facilitate that, it might be best to use a different name like #>: or ##:: and then just remember to always use that name when pattern matching.
If you ever need to match the empty stream, use case Stream.Empty. Using case Stream() will attempt to evaluate your entire stream there in the pattern match, which will lead to sadness.