How should I use #:: / hash colon colon in Scala?

How should I use #:: / hash colon colon in Scala? - scala

In a Stackoverflow post about the creation of Fibonacci numbers I found the method #:: (What is the fastest way to write Fibonacci function in Scala?). In ScalaDocs I found this entry (see here, 1) describing the hash colon colon method as An extractor that allows to pattern match streams with #::.
I realized that I can use the fibonacci function like this
def fibonacci: Stream[Long] = {
def tail(h: Long, n: Long): Stream[Long] = h #:: tail(n, h + n)
tail(0, 1)
}
fibonacci(10) //res4: Long = 55
How should I understand the ScalaDocs explanation? Can you give an additional example?
Why it was not necessary to define a parameter in the fibonacci function above?

The method #:: is defined for Streams. It is similar to the :: method for Lists. The main difference between a List and a Stream is that the elements of a Stream are lazy evaluated.
There's some scala magic happens on the last line. Actually, first you're evaluating the fibonacci expression, and it returns a Stream object. The first and the second elements of this stream are 0 and 1, as follows from the third line of your example, and the rest of the Stream is defined via recursive call. And then you're extracting tenth element from the stream, and it evaluates to 55.
In the code below, I show similar access to the fourth List's element
val list = List(1,2,3,4,5)
println(list(3)) // prints 4
In a nutshell, think about Streams as infinite Lists. You can find more about Streams here http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream

In your example h #:: tail(n, h + n) creates a new stream, where the h is the head of the stream and tail(n, h + n) a stream which will be evaluated lazily.
Another (and maybe easier) example would be to define natural numbers as a stream of BigInt.
def naturalNumbers = {
def next(n: BigInt) : Stream[BigInt] = n #:: next(n + 1)
next(0)
}
println(naturalNumbers) would result in printing Stream(0, ?), because the head is strict, meaning that it will be always evaluated. The tail would be next(1), which is only evaluated when needed.
In your example fibonacci(10) is syntactic sugar for fibonacci.apply(10) which is defined in the Stream class and yields the element with the index in the stream.
You can also do a lot of others things with streams. For example get the first fibonacci number that is greater than 100: fibonacci.dropWhile(_ <= 100).head or just print the first 100 fibonacci numbers println(fibonacci.take(100).toList)

The quick answer to #2 is that fibonacci(10) isn't a function call with parameters, it's a function call with no parameters followed by an invocation of whatever is returned with the parameter "10".
It would have been easier to understand if written like this:
scala> val s = fibonacci
s: Stream[Long] = Stream(0, ?)
scala> s(10)
res1: Long = 55

Related

What is the intuition behind recursive algorithms with Streams?

Like the title says what is the intuition behind recursive algos with streams like:
val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
and
val fibs: LazyList[Int] = 0 #:: 1 #:: (fibs.zip(fibs.tail).map{ t => t._1 + t._2 })
How do they unfold? What is the base case for such algos (if it's Nil, why it's so?) and how do they progress towards fibs.take(5) e.g.?
EDIT.
I do understand there is no base case for a lazily defined Stream, as several people pointed out below. Rather, my question concerns what's the base case when infinite stream gets evaluated like in fibs.take(5)(the answer is Nil I believe, please correct me if I'm wrong) and what are the calculation steps in evaluating fibs.take(5)

It's say there are 2 things at play here:
recursive syntax making use of LazyList API
corecursive mathematics behind unfolding
So, let's start with a few words about API and syntax:
#:: takes lazy value and prepends it to LazyList definition, here it is fibs which makes its definition recursive on code level
LazyList lazily evaluates its arguments and then caches/memoizes them for future use letting us access already computed values immediately
However, the mechanism underneath is actually corecursive.
Let's see what is recursion when it comes to data using List as an example:
List(1,2,3,4)
This can be also written as
1 :: 2 :: 3 :: 4 :: Nil
Which is the same as
( ( ( Nil.::(4) ).::(3) ).::(2) ).::(1)
You can see that we:
take Nil
create ::(4, Nil) value which we use to
create ::(3, ::(4, Nil)) value
and so on
In other words, we have to start with some base case and build the whole things from-bottom-up. Such values by definition have to be finite and cannot be used to express series of (possibly) infinite computation.
But there exist an alternative which allows you to express such computations - corecursion and codata.
With corecursion you have a tuple:
the last computed value
a function which can take the value and return the next tuple (next value + next function!)
nothing prevent you from using the same function as second element of the tuple but it's good to have a choice
For instance you could define infinite series of LazyList(1, 2, 3, 4, 5, 6, ...) like:
// I use case class since
// type Pair = (Int, Int => Pair)
// would be illegal in Scala
final case class Pair(value: Int, f: Int => Pair)
val f: Int => Pair = n => Pair(n + 1, f)
Pair(1, f)
Then you would take Pair, get value out of it (1 initially) and use it to generate new Pairs (Pair(2, f), Pair(3, f), ...).
Structure which would use corecursion to generate its values would be called codata (so LazyList can be considered codata).
Same story with Fibonacci sequence, you could define it corecursively with
(Int, Int) as value (initialized to (0, 1)
val f: (Int, Int) => Pair = { case (n, m) => Pair((m, n + m), f } as function
finally, you'd have to pick _1 out of every generated (Int, Int) pair
However, LazyList's API gives you some nice tools so that you don't have to do this manually:
it memoizes (caches) computed values so you can access list(0), list(1), etc, they aren't forgotten right after use
it gives you methods like .map, .flatMap .scanLeft and so on, so while internally it might have more complex types used for corecursion, you are only seeing the final result that you need
Obviously, all of that is done lazily, by codata's definition: at each step you can only know values defined so far, and how to generate next of out it.
That leads us to your example:
val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
You can think of it as something that:
starts with a pair (0, f)
where the f takes this 0 argument, and combines it with 1 to create (0, 1) tuple
and then constructs next fs which trace the previous value, and passes it along current value to the function passed into scanLeft
where all the shenanigans with intermediate values and functions and memoization are handled internally by API
So if you asked me, the "base case" of such algos is a pair of value and function returning pair, run over and over again.

How do they unfold?
They don't. The #:: function takes a by-name argument, which means that it's evaluated lazily.
What is the base case for such algos (if it's Nil, why it's so?).
There is no "base case", these recursive definitions yield infinite streams:
scala> val fibs: LazyList[Int] = (0 #:: fibs).scanLeft(1)(_ + _)
val fibs: LazyList[Int] = LazyList(<not computed>)
scala> fibs.size
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
(Note the "<not computed>" token, which hints at the laziness)

First Element of a Lazy Stream in Scala

Here is a minimal example, I can define a function that gives my the next integer via
def nextInteger(input: Int): Int = input+1
I can then define a lazy stream of integers as
lazy val integers: Stream[Int] = 1 #:: integers map(x=>nextInteger(x))
To my surprise, taking the first element of this stream is 2 and not 1
scala> integers
res21: Stream[Int] = Stream(2, ?)
In this simple example I can achieve my desired result using 0 instead of 1 in the definition of integers, but how can one in general set up a stream such that the initial value isn't lost? In my case I am setting up an iterative algorithm and will want to know the initial value.
EDIT:
Furthermore, I've never understood the design choice which makes the following syntax fail:
scala> (integers take 10 toList) last
res27: Int = 11
scala> integers take 10 toList last
<console>:24: error: not found: value last
integers take 10 toList last
^
I find wrapping things in brackets cumbersome, is there a shorthand I am not aware of?

You're probably thinking that 1 #:: integers map(x=>nextInteger(x)) is parsed as 1 #:: (integers map(x=>nextInteger(x))) while it is actually parsed as (1 #:: integers).map(x=>nextInteger(x)). Adding parens fixes your problem:
val integers: Stream[Int] = 1 #:: (integers map nextInteger)
(Notice that since nextInteger is just a function, you don't need to make a lambda for it, and since Stream is already lazy, making integers lazy is unnecessary)
As to your edit, check out this excellent answer on the matter. In short: no there is no easy way. The thing is that unless you already know the arity of the functions involved, having something like what you suggest work would be hell for the next person reading your code... For example,
myList foo bar baz
Might be be myList.foo.bar.baz as well as myList.foo(bar).baz and you wouldn't know without checking the definitions of foo, bar, and baz. Scala decides to eliminate this ambiguity - it is always the latter.

When does Scala force a stream value?

I am comfortable with streams, but I admit I am puzzled by this behavior:
import collection.immutable.Stream
object StreamForceTest extends App {
println("Computing fibs")
val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
println("Adding " + x._1 + " and " + x._2);
x._1 + x._2
})
println("Taking first 5 elements")
val fibs5 = fibs.take(5)
println("Computing length of that prefix")
println("fibs5.length = " + fibs5.length)
}
with output
Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5
Why should take(5) not force the stream's values to be computed,
while length does do so? Offhand neither one needs to actually
look at the values, but I would have thought that take was more
likely to do it than length. Inspecting the source code on github,
we find these definitions for take (including an illuminating
comment):
override def take(n: Int): Stream[A] = (
// Note that the n == 1 condition appears redundant but is not.
// It prevents "tail" from being referenced (and its head being evaluated)
// when obtaining the last element of the result. Such are the challenges
// of working with a lazy-but-not-really sequence.
if (n <= 0 || isEmpty) Stream.empty
else if (n == 1) cons(head, Stream.empty)
else cons(head, tail take n-1)
)
and length:
override def length: Int = {
var len = 0
var left = this
while (!left.isEmpty) {
len += 1
left = left.tail
}
len
}
The definition of head and tail is obtained from the specific
subclass (Empty and Cons). (Of course Empty is an object, not a
class, and its definitions of head and tail just throw
exceptions.) There are subtleties, but they seem to concern making
sure that the tail of a Cons is evaluated lazily; the head
definition is straight out of lecture 0 on Scala constructors.
Note that length doesn't go near head, but it's the one that
does the forcing.
All this is part of a general puzzlement about how close Scala streams
are to Haskell lists. I thought Haskell treated head and tail
symmetrically (I'm not a serious Haskell hacker), and Scala forced
head evaluation in more circumstances. I'm trying to figure out
exactly what those circumstances are.

Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice the context in which the take method refers to tail:
cons(head, tail take n-1)
Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.
Whereas in length, the statement
left = left.tail
, by assigning left.tail to a var, does force its evaluation.
Scala is "strict by default". In most situations, everything you reference will be evaluated. We only have lazy evaluation in cases where a method/constructor parameter declares an call-by-name argument with =>, and in the culture we don't typically use this unless there's a special reason.

Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.
If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.
On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.

A variable used in its own definition?

An infinite stream:
val ones: Stream[Int] = Stream.cons(1, ones)
How is it possible for a value to be used in its own declaration? It seems this should produce a compiler error, yet it works.

It's not always a recursive definition. This actually works and produces 1:
val a : Int = a + 1
println(a)
variable a is created when you type val a: Int, so you can use it in the definition. Int is initialized to 0 by default. A class will be null.
As #Chris pointed out, Stream accepts => Stream[A] so a bit another rules are applied, but I wanted to explain general case. The idea is still the same, but the variable is passed by-name, so this makes the computation recursive. Given that it is passed by name, it is executed lazily. Stream computes each element one-by-one, so it calls ones each time it needs next element, resulting in the same element being produces once again. This works:
val ones: Stream[Int] = Stream.cons(1, ones)
println((ones take 10).toList) // List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Though you can make infinite stream easier: Stream.continually(1) Update As #SethTisue pointed out in the comments Stream.continually and Stream.cons are two completely different approaches, with very different results, because cons takes A when continually takes =>A, which means that continually recomputes each time the element and stores it in the memory, when cons can avoid storing it n times unless you convert it to the other structure like List. You should use continually only if you need to generate different values. See #SethTisue comment for details and examples.
But notice that you are required to specify the type, the same as with recursive functions.
And you can make the first example recursive:
lazy val b: Int = b + 1
println(b)
This will stackoverflow.

Look at the signature of Stream.cons.apply:
apply[A](hd: A, tl: ⇒ Stream[A]): Cons[A]
The ⇒ on the second parameter indicates that it has call-by-name semantics. Therefore your expression Stream.cons(1, ones) is not strictly evaluated; the argument ones does not need to be computed prior to being passed as an argument for tl.

The reason this does not produce a compiler error is because both Stream.cons and Cons are non-strict and lazily evaluate their second parameter.
ones can be used in it's own definition because the object cons has an apply method defined like this:
/** A stream consisting of a given first element and remaining elements
* #param hd The first element of the result stream
* #param tl The remaining elements of the result stream
*/
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
And Cons is defined like this:
final class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice that it's second parameter tl is passed by name (=> Stream[A]) rather than by value. In other words, the parameter tl is not evaluated until it is used in the function.
One advantage to using this technique is that you can compose complex expressions that may be only partially evaluated.

Infinite streams in Scala

Say I have a function, for example the old favourite
def factorial(n:Int) = (BigInt(1) /: (1 to n)) (_*_)
Now I want to find the biggest value of n for which factorial(n) fits in a Long. I could do
(1 to 100) takeWhile (factorial(_) <= Long.MaxValue) last
This works, but the 100 is an arbitrary large number; what I really want on the left hand side is an infinite stream that keeps generating higher numbers until the takeWhile condition is met.
I've come up with
val s = Stream.continually(1).zipWithIndex.map(p => p._1 + p._2)
but is there a better way?
(I'm also aware I could get a solution recursively but that's not what I'm looking for.)

Stream.from(1)
creates a stream starting from 1 and incrementing by 1. It's all in the API docs.

A Solution Using Iterators
You can also use an Iterator instead of a Stream. The Stream keeps references of all computed values. So if you plan to visit each value only once, an iterator is a more efficient approach. The downside of the iterator is its mutability, though.
There are some nice convenience methods for creating Iterators defined on its companion object.
Edit
Unfortunately there's no short (library supported) way I know of to achieve something like
Stream.from(1) takeWhile (factorial(_) <= Long.MaxValue) last
The approach I take to advance an Iterator for a certain number of elements is drop(n: Int) or dropWhile:
Iterator.from(1).dropWhile( factorial(_) <= Long.MaxValue).next - 1
The - 1 works for this special purpose but is not a general solution. But it should be no problem to implement a last method on an Iterator using pimp my library. The problem is taking the last element of an infinite Iterator could be problematic. So it should be implemented as method like lastWith integrating the takeWhile.
An ugly workaround can be done using sliding, which is implemented for Iterator:
scala> Iterator.from(1).sliding(2).dropWhile(_.tail.head < 10).next.head
res12: Int = 9

as #ziggystar pointed out, Streams keeps the list of previously computed values in memory, so using Iterator is a great improvment.
to further improve the answer, I would argue that "infinite streams", are usually computed (or can be computed) based on pre-computed values. if this is the case (and in your factorial stream it definately is), I would suggest using Iterator.iterate instead.
would look roughly like this:
scala> val it = Iterator.iterate((1,BigInt(1))){case (i,f) => (i+1,f*(i+1))}
it: Iterator[(Int, scala.math.BigInt)] = non-empty iterator
then, you could do something like:
scala> it.find(_._2 >= Long.MaxValue).map(_._1).get - 1
res0: Int = 22
or use #ziggystar sliding solution...
another easy example that comes to mind, would be fibonacci numbers:
scala> val it = Iterator.iterate((1,1)){case (a,b) => (b,a+b)}.map(_._1)
it: Iterator[Int] = non-empty iterator
in these cases, your'e not computing your new element from scratch every time, but rather do an O(1) work for every new element, which would improve your running time even more.

The original "factorial" function is not optimal, since factorials are computed from scratch every time. The simplest/immutable implementation using memoization is like this:
val f : Stream[BigInt] = 1 #:: (Stream.from(1) zip f).map { case (x,y) => x * y }
And now, the answer can be computed like this:
println( "count: " + (f takeWhile (_<Long.MaxValue)).length )

The following variant does not test the current, but the next integer, in order to find and return the last valid number:
Iterator.from(1).find(i => factorial(i+1) > Long.MaxValue).get
Using .get here is acceptable, since find on an infinite sequence will never return None.