Here is a minimal example. I can define a function that gives me the next integer via
def nextInteger(input: Int): Int = input+1
I can then define a lazy stream of integers as
lazy val integers: Stream[Int] = 1 #:: integers map(x=>nextInteger(x))
To my surprise, the first element of this stream is 2 and not 1:
scala> integers
res21: Stream[Int] = Stream(2, ?)
In this simple example I can achieve my desired result using 0 instead of 1 in the definition of integers, but how can one in general set up a stream such that the initial value isn't lost? In my case I am setting up an iterative algorithm and will want to know the initial value.
EDIT:
Furthermore, I've never understood the design choice which makes the following syntax fail:
scala> (integers take 10 toList) last
res27: Int = 11
scala> integers take 10 toList last
<console>:24: error: not found: value last
integers take 10 toList last
^
I find wrapping things in brackets cumbersome, is there a shorthand I am not aware of?
You're probably thinking that 1 #:: integers map(x=>nextInteger(x)) is parsed as 1 #:: (integers map(x=>nextInteger(x))) while it is actually parsed as (1 #:: integers).map(x=>nextInteger(x)). Adding parens fixes your problem:
val integers: Stream[Int] = 1 #:: (integers map nextInteger)
(Notice that since nextInteger is just a function, you don't need to make a lambda for it, and since Stream is already lazy, making integers lazy is unnecessary)
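To make the mis-parse concrete, here is a quick sketch (the names misparsed and intended are mine):
def nextInteger(input: Int): Int = input + 1

// Parsed as (1 #:: misparsed).map(nextInteger): the map runs over the
// whole stream, so even the head 1 becomes nextInteger(1) = 2.
val misparsed: Stream[Int] = 1 #:: misparsed map nextInteger

// Explicit grouping keeps the head untouched; only the tail is mapped.
val intended: Stream[Int] = 1 #:: (intended map nextInteger)

misparsed.take(3).toList // List(2, 3, 4)
intended.take(3).toList  // List(1, 2, 3)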
As to your edit, check out this excellent answer on the matter. In short: no there is no easy way. The thing is that unless you already know the arity of the functions involved, having something like what you suggest work would be hell for the next person reading your code... For example,
myList foo bar baz
Might be myList.foo.bar.baz as well as myList.foo(bar).baz, and you wouldn't know without checking the definitions of foo, bar, and baz. Scala decides to eliminate this ambiguity - it is always the latter.
Related
Like the title says, what is the intuition behind recursive algorithms with streams like:
val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
and
val fibs: LazyList[Int] = 0 #:: 1 #:: (fibs.zip(fibs.tail).map{ t => t._1 + t._2 })
How do they unfold? What is the base case for such algorithms (if it's Nil, why is that so?), and how do they progress towards, e.g., fibs.take(5)?
EDIT.
I do understand there is no base case for a lazily defined Stream, as several people pointed out below. Rather, my question concerns what the base case is when an infinite stream gets evaluated, as in fibs.take(5) (the answer is Nil, I believe; please correct me if I'm wrong), and what the calculation steps are in evaluating fibs.take(5).
I'd say there are 2 things at play here:
recursive syntax making use of LazyList API
corecursive mathematics behind unfolding
So, let's start with a few words about API and syntax:
#:: takes a lazy value and prepends it to a LazyList definition; here it is fibs, which makes the definition recursive at the code level
LazyList lazily evaluates its arguments and then caches/memoizes them for future use, letting us access already computed values immediately (see the snippet below)
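For example, a tiny demonstration of both points (my own throwaway snippet, Scala 2.13):
val xs: LazyList[Int] = 1 #:: { println("computing tail"); LazyList(2, 3) }
xs.head // 1; prints nothing, the tail is passed by name
xs(1)   // prints "computing tail" once, returns 2
xs(2)   // returns 3; prints nothing, the tail was memoized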
However, the mechanism underneath is actually corecursive.
Let's see what recursion looks like when it comes to data, using List as an example:
List(1,2,3,4)
This can be also written as
1 :: 2 :: 3 :: 4 :: Nil
Which is the same as
( ( ( Nil.::(4) ).::(3) ).::(2) ).::(1)
You can see that we:
take Nil
create the ::(4, Nil) value, which we use to
create the ::(3, ::(4, Nil)) value
and so on
In other words, we have to start with some base case and build the whole thing from the bottom up. Such values by definition have to be finite and cannot be used to express a series of (possibly) infinite computations.
But there exists an alternative which allows you to express such computations: corecursion and codata.
With corecursion you have a tuple:
the last computed value
a function which can take the value and return the next tuple (next value + next function!)
nothing prevents you from using the same function as the second element of each tuple, but it's good to have a choice
For instance, you could define the infinite series LazyList(1, 2, 3, 4, 5, 6, ...) like:
// I use case class since
// type Pair = (Int, Int => Pair)
// would be illegal in Scala
final case class Pair(value: Int, f: Int => Pair)
val f: Int => Pair = n => Pair(n + 1, f)
Pair(1, f)
Then you would take the Pair, get the value out of it (1 initially), and use f to generate new Pairs (Pair(2, f), Pair(3, f), ...).
A structure which uses corecursion to generate its values is called codata (so LazyList can be considered codata).
It's the same story with the Fibonacci sequence; you could define it corecursively with:
(Int, Int) as the value (initialized to (0, 1))
val f: (Int, Int) => Pair = { case (n, m) => Pair((m, n + m), f) } as the function (assuming a Pair variant whose value is an (Int, Int))
finally, you'd have to pick _1 out of every generated (Int, Int) pair (see the sketch below)
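A minimal sketch of that recipe, using LazyList.unfold (available in Scala 2.13) so that the (Int, Int) state and the projection to _1 are explicit:
// The corecursive state is the pair (current, next); each step
// emits the current value and advances the state.
val fibs: LazyList[Int] = LazyList.unfold((0, 1)) { case (n, m) => Some((n, (m, n + m))) }
fibs.take(8).toList // List(0, 1, 1, 2, 3, 5, 8, 13)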
However, LazyList's API gives you some nice tools so that you don't have to do this manually:
it memoizes (caches) computed values, so you can access list(0), list(1), etc.; they aren't forgotten right after use
it gives you methods like .map, .flatMap, .scanLeft and so on, so while internally it might use more complex types for the corecursion, you only see the final result that you need
Obviously, all of that is done lazily, by codata's definition: at each step you can only know the values defined so far, and how to generate the next one out of them.
That leads us to your example:
val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
You can think of it as something that:
starts with a pair (0, f)
where the f takes this 0 argument and combines it with the initial 1 to create the (0, 1) tuple
and then constructs the next fs, which track the previous value and pass it along with the current value to the function given to scanLeft
where all the shenanigans with intermediate values, functions, and memoization are handled internally by the API
So if you asked me, the "base case" of such algorithms is a pair of a value and a function returning the next pair, run over and over again.
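Concretely, here is a rough trace of the first few forced elements (my own back-of-the-envelope notation, not the library's actual internals):
val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
// scanLeft emits its initial value first, then each running sum:
// fibs(0) = 1                   (the initial value)
// fibs(1) = 1 + 0       = 1     (initial + head of (0 #:: fibs))
// fibs(2) = 1 + fibs(0) = 2
// fibs(3) = 2 + fibs(1) = 3
// fibs(4) = 3 + fibs(2) = 5
fibs.take(5).toList // List(1, 1, 2, 3, 5)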
How do they unfold?
They don't. The #:: function takes a by-name argument, which means that it's evaluated lazily.
What is the base case for such algorithms (if it's Nil, why is that so)?
There is no "base case"; these recursive definitions yield infinite streams:
scala> val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
val fibs: LazyList[Int] = LazyList(<not computed>)
scala> fibs.size
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
(Note the "<not computed>" token, which hints at the laziness)
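What does bottom out is take itself: its recursion stops when the count reaches 0 (or the list ends). A sketch of that shape over a hand-rolled lazy cons (a toy ADT of mine, not the real LazyList internals):
sealed trait Lazy[+A]
case object Empty extends Lazy[Nothing]
final case class Cons[+A](head: A, tail: () => Lazy[A]) extends Lazy[A]

def take[A](xs: Lazy[A], n: Int): List[A] = xs match {
  case Cons(h, t) if n > 0 => h :: take(t(), n - 1)
  case _                   => Nil // the "base case": n == 0 or Empty
}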
I am observing a behavior I don't fully understand:
scala> val a = Iterator(1,2,3,4,5)
a: Iterator[Int] = non-empty iterator
scala> val b = a.dropWhile(_ < 3)
b: Iterator[Int] = non-empty iterator
scala> b.next
res9: Int = 3
scala> b.next
res10: Int = 4
scala> a.next
res11: Int = 5
It looks like the (1,2,3) part of iterator a is consumed, and (4,5) is left. Since 3 had to be evaluated, it had to be consumed, but by the definition of dropWhile it has to be included in b. Iterator b is 3 followed by (4,5), where (4,5) is whatever is left of a, the exact same iterator. Is my understanding correct?
Given the above, it looks quite dangerous that the behavior of b is altered by applying operations to a. Basically we have two objects pointing to the same location. Is using dropWhile like this bad style?
From the documentation for Iterator:
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
Basically, once you have called any method on an iterator other than next and hasNext, you should consider it destroyed and dispose of it.
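If you do need both views, one safer sketch is Iterator.duplicate, which hands you two independent iterators (after which the original must not be touched either):
val a = Iterator(1, 2, 3, 4, 5)
val (a1, a2) = a.duplicate // from here on, use only a1 and a2, never a
val b = a1.dropWhile(_ < 3)
b.toList  // List(3, 4, 5)
a2.toList // List(1, 2, 3, 4, 5), unaffected by what happened to a1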
Is using dropWhile like this bad style?
yes :-)
I understand the basic difference between val and lazy val.
But when I ran across this example, I was confused.
The following code is the correct one. It is a recursive lazy val of Stream type.
def recursive(): Stream[Int] = {
  lazy val recurseValue: Stream[Int] = 1 #:: recurseValue.map(_ + 1)
  recurseValue
}
If I change lazy val to val, it reports an error.
def recursive(): Stream[Int] = {
  // error: forward reference failed
  val recurseValue: Stream[Int] = 1 #:: recurseValue.map(func)
  recurseValue
}
My train of thought in the 2nd example, using the substitution model/evaluation strategy, is:
the right-hand side of #:: is call-by-name, so the value shall be of the form:
1 #:: ?,
and if the 2nd element is accessed afterward, it refers to the current recurseValue value, rewriting it to:
1 :: ((1 #:: ?) map func) =
1 :: (func(1) #:: (? map func))
...and so on and so on, such that the compilation should succeed.
I don't see any error when I rewrite it; is there something wrong?
EDIT:
CONCLUSION: I found it works fine if the val is defined as a field. And I also noticed this post about the implementation of val. The conclusion is that val is implemented differently in a method body, as a field, and in the REPL. That's really confusing.
That substitution model works for recursion if you are defining functions, but you can't define a variable in terms of itself unless it is lazy. All of the info needed to compute the right-hand side must be available for the assignment to take place, so a bit of laziness is required in order to recursively define a variable.
You probably don't really want to do this, but just to show that it works for functions:
scala> def r = { def x:Stream[Int] = 1#::( x map (_+1) ); x }
r: Stream[Int]
scala> r take 3 foreach println
1
2
3
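And, as the EDIT above observes, a plain val does compile when defined as a field: fields may be forward-referenced (they are simply null until assigned), and since the tail of #:: is passed by name, the reference is never dereferenced during construction. A minimal sketch (StreamHolder is just an illustrative name):
object StreamHolder {
  // Compiles as a field: the by-name tail of #:: is not forced
  // while the object is being constructed.
  val recurseValue: Stream[Int] = 1 #:: recurseValue.map(_ + 1)
}
StreamHolder.recurseValue.take(3).toList // List(1, 2, 3)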
In a Stack Overflow post about the creation of Fibonacci numbers (What is the fastest way to write Fibonacci function in Scala?) I found the method #::. In the ScalaDocs I found an entry describing the hash-colon-colon method as "An extractor that allows to pattern match streams with #::".
I realized that I can use the fibonacci function like this
def fibonacci: Stream[Long] = {
  def tail(h: Long, n: Long): Stream[Long] = h #:: tail(n, h + n)
  tail(0, 1)
}
fibonacci(10) //res4: Long = 55
How should I understand the ScalaDocs explanation? Can you give an additional example?
Why was it not necessary to define a parameter in the fibonacci function above?
The method #:: is defined for Streams. It is similar to the :: method for Lists. The main difference between a List and a Stream is that the elements of a Stream are lazily evaluated.
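A small demonstration of that laziness (my own snippet; in Scala 2.12's Stream the head is strict but the tail is deferred):
def loud(n: Int): Int = { println(s"evaluating $n"); n }
val xs = loud(1) :: loud(2) :: Nil            // prints "evaluating 1" and "evaluating 2" right away
val s  = loud(1) #:: loud(2) #:: Stream.empty // prints only "evaluating 1"
s(1)                                          // now prints "evaluating 2" and returns 2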
There's some Scala magic happening on the last line. Actually, first you're evaluating the fibonacci expression, and it returns a Stream object. The first and second elements of this stream are 0 and 1, as follows from the third line of your example, and the rest of the Stream is defined via a recursive call. Then you're extracting the element at index 10 from the stream, which evaluates to 55.
In the code below, I show similar access to a List's fourth element:
val list = List(1,2,3,4,5)
println(list(3)) // prints 4
In a nutshell, think about Streams as infinite Lists. You can find more about Streams here http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream
In your example h #:: tail(n, h + n) creates a new stream, where the h is the head of the stream and tail(n, h + n) a stream which will be evaluated lazily.
Another (and maybe easier) example would be to define natural numbers as a stream of BigInt.
def naturalNumbers = {
  def next(n: BigInt): Stream[BigInt] = n #:: next(n + 1)
  next(0)
}
println(naturalNumbers) would result in printing Stream(0, ?), because the head is strict, meaning that it will always be evaluated. The tail would be next(1), which is only evaluated when needed.
In your example, fibonacci(10) is syntactic sugar for fibonacci.apply(10), which is defined in the Stream class and yields the element at the given index in the stream.
You can also do a lot of other things with streams. For example, get the first Fibonacci number that is greater than 100: fibonacci.dropWhile(_ <= 100).head, or just print the first 100 Fibonacci numbers: println(fibonacci.take(100).toList).
The quick answer to #2 is that fibonacci(10) isn't a function call with parameters; it's a call of the parameterless function fibonacci followed by an invocation, with the parameter 10, of whatever is returned.
It would have been easier to understand if written like this:
scala> val s = fibonacci
s: Stream[Long] = Stream(0, ?)
scala> s(10)
res1: Long = 55
Say I have a function, for example the old favourite
def factorial(n:Int) = (BigInt(1) /: (1 to n)) (_*_)
Now I want to find the biggest value of n for which factorial(n) fits in a Long. I could do
(1 to 100) takeWhile (factorial(_) <= Long.MaxValue) last
This works, but the 100 is an arbitrary large number; what I really want on the left hand side is an infinite stream that keeps generating higher numbers until the takeWhile condition is met.
I've come up with
val s = Stream.continually(1).zipWithIndex.map(p => p._1 + p._2)
but is there a better way?
(I'm also aware I could get a solution recursively but that's not what I'm looking for.)
Stream.from(1)
creates a stream starting from 1 and incrementing by 1. It's all in the API docs.
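So the original one-liner becomes (a sketch, reusing the factorial from the question):
def factorial(n: Int) = (BigInt(1) /: (1 to n))(_ * _)
(Stream.from(1) takeWhile (factorial(_) <= Long.MaxValue)).last // 20, since 21! no longer fits in a Long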
A Solution Using Iterators
You can also use an Iterator instead of a Stream. The Stream keeps references to all computed values. So if you plan to visit each value only once, an iterator is a more efficient approach. The downside of the iterator is its mutability, though.
There are some nice convenience methods for creating Iterators defined on its companion object.
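For instance (three of those constructors; each produces a lazy, potentially infinite iterator):
Iterator.from(1)           // 1, 2, 3, ...
Iterator.iterate(1)(_ * 2) // 1, 2, 4, 8, ...
Iterator.continually(42)   // 42, 42, 42, ...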
Edit
Unfortunately there's no short (library-supported) way I know of to achieve something like
Stream.from(1) takeWhile (factorial(_) <= Long.MaxValue) last
The approach I take to advance an Iterator for a certain number of elements is drop(n: Int) or dropWhile:
Iterator.from(1).dropWhile( factorial(_) <= Long.MaxValue).next - 1
The - 1 works for this special purpose but is not a general solution. It should be no problem, though, to implement a last method on an Iterator using pimp my library. The catch is that taking the last element of an infinite Iterator could be problematic, so it should be implemented as a method like lastWith that integrates the takeWhile.
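A sketch of that lastWith as an enrichment (my own name and code; it assumes the predicate holds for the first element and eventually fails, otherwise it never returns):
implicit class IteratorLastWith[A](it: Iterator[A]) {
  def lastWith(p: A => Boolean): A = {
    var last = it.next() // assumes p(last); add a require in real code
    while (it.hasNext) {
      val x = it.next()
      if (!p(x)) return last
      last = x
    }
    last // only reached on a finite iterator
  }
}
Iterator.from(1).lastWith(factorial(_) <= Long.MaxValue) // 20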
An ugly workaround can be done using sliding, which is implemented for Iterator:
scala> Iterator.from(1).sliding(2).dropWhile(_.tail.head < 10).next.head
res12: Int = 9
As #ziggystar pointed out, Streams keep the list of previously computed values in memory, so using an Iterator is a great improvement.
To further improve the answer, I would argue that "infinite streams" are usually computed (or can be computed) based on previously computed values. If this is the case (and in your factorial stream it definitely is), I would suggest using Iterator.iterate instead.
It would look roughly like this:
scala> val it = Iterator.iterate((1,BigInt(1))){case (i,f) => (i+1,f*(i+1))}
it: Iterator[(Int, scala.math.BigInt)] = non-empty iterator
then, you could do something like:
scala> it.find(_._2 >= Long.MaxValue).map(_._1).get - 1
res0: Int = 20
or use #ziggystar's sliding solution...
Another easy example that comes to mind would be Fibonacci numbers:
scala> val it = Iterator.iterate((1,1)){case (a,b) => (b,a+b)}.map(_._1)
it: Iterator[Int] = non-empty iterator
In these cases, you're not computing each new element from scratch but rather doing O(1) work per new element, which improves your running time even more.
The original "factorial" function is not optimal, since factorials are computed from scratch every time. The simplest/immutable implementation using memoization is like this:
val f : Stream[BigInt] = 1 #:: (Stream.from(1) zip f).map { case (x,y) => x * y }
And now, the answer can be computed like this (note that f(n) is n! and the count includes 0!, so the largest n whose factorial fits is the length minus 1):
println( "count: " + (f takeWhile (_<Long.MaxValue)).length )
The following variant does not test the current, but the next integer, in order to find and return the last valid number:
Iterator.from(1).find(i => factorial(i+1) > Long.MaxValue).get
Using .get here is acceptable, since find on an infinite sequence will never return None.