Scala: Streams not acting lazy?

I know streams are supposed to be lazily evaluated sequences in Scala, but I think I am suffering from some sort of fundamental misunderstanding because they seem to be more eager than I would have expected.
In this example:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial ++ bad) take 1)
I get a java.lang.ArithmeticException, which seems to be caused by division by zero. I would expect that bad would never get evaluated, since I only asked for one element from the stream. What's wrong?

OK, so after commenting on other answers, I figured I might as well turn my comments into a proper answer.
Streams are indeed lazy, and will only compute their elements on demand (and you can use #:: to construct a stream element by element, much like :: for List). For example, the following will not throw any exception:
(1/2) #:: (1/0) #:: Stream.empty
This is because when applying #::, the tail is passed by name so that it is not evaluated eagerly, but only when needed (see ConsWrapper.#::, cons.apply and class Cons in Stream.scala for more details).
On the other hand, the head is passed by value, which means that it will always be eagerly evaluated, no matter what (as mentioned by Senthil). This means that the following will actually throw an ArithmeticException:
(1/0) #:: Stream.empty
It is a gotcha worth knowing about streams. However, this is not the issue you are facing.
In your case, the arithmetic exception happens before a single Stream is even instantiated. When calling Stream.apply in lazy val bad = Stream(1/0), the argument is eagerly evaluated because it is not declared as a by-name parameter. Stream.apply actually takes a vararg parameter, and those are necessarily passed by value.
And even if it were passed by name, the ArithmeticException would be triggered shortly after, because, as said earlier, the head of a Stream is always eagerly evaluated.
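To see the head/tail difference without relying on an exception, here is a minimal sketch that logs each element as it is computed. The object name and the elem helper are made up for illustration, and it assumes a Scala version where Stream is still available (it is deprecated since 2.13):

```scala
import scala.collection.mutable.ListBuffer

object StreamHeadVsTail {
  // Records which elements have actually been computed.
  val log = ListBuffer.empty[String]

  // Hypothetical helper: logs its name, then returns the value.
  def elem(name: String, value: Int): Int = { log += name; value }

  // Building the stream evaluates the head right away, but not the tail.
  val s: Stream[Int] = elem("first", 1) #:: elem("second", 2) #:: Stream.empty
  val afterBuild: List[String] = log.toList

  // Asking for the second element forces exactly one more cell.
  val second: Int = s(1)
  val afterSecond: List[String] = log.toList
}
```

Here afterBuild should contain only "first", and "second" should show up in afterSecond once s(1) has been requested.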

The fact that Streams are lazy doesn't change the fact that method arguments are evaluated eagerly.
Stream(1/0) expands to Stream.apply(1/0). The semantics of the language require that the arguments are evaluated before the method is called (since the Stream.apply method doesn't use call-by-name arguments), so it attempts to evaluate 1/0 to pass as the argument to the Stream.apply method, which causes your ArithmeticException.
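As a rough illustration of the by-value versus by-name distinction, independent of Stream, consider the sketch below. All names in it (divide, ignoreByValue, ignoreByName) are hypothetical; divide is just a stand-in for 1/0:

```scala
import scala.util.Try

object ArgumentEvaluation {
  // Stand-in for 1/0: throws ArithmeticException when b is 0.
  def divide(a: Int, b: Int): Int = a / b

  // Ordinary (by-value) parameter: the argument is computed before the call.
  def ignoreByValue(x: Int): String = "ignored"

  // By-name parameter: the argument is computed only if the body uses it.
  def ignoreByName(x: => Int): String = "ignored"

  // Fails, even though the method never looks at its argument.
  val byValue: Try[String] = Try(ignoreByValue(divide(1, 0)))

  // Succeeds, because the division is never performed.
  val byName: Try[String] = Try(ignoreByName(divide(1, 0)))
}
```

Stream.apply behaves like ignoreByValue here: the argument is evaluated before the method body ever runs.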
There are a few ways you can get this working though. Since you've already declared bad as a lazy val, the easiest is probably to use the also-lazy #::: stream concatenation operator to avoid forcing evaluation:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial #::: bad) take 1)
// => Stream(1, ?)

A Stream evaluates its head eagerly; only the remaining tail is evaluated lazily. In your example, both streams consist of just a head, hence the error.

Related

How does LazyList.fill(n) actually work in Scala?

I'm trying to understand how LazyList.fill actually works. I implemented retry logic using LazyList.fill(n), but it does not seem to be working as expected.
def retry[T](n: Int)(block: => T): Try[T] = {
val lazyList = LazyList.fill(n)(Try(block))
lazyList find (_.isSuccess) getOrElse lazyList.head
}
Considering the above piece of code, I am trying to execute block with retry logic. If the execution succeeds, return the result from block; otherwise retry until it succeeds, for a maximum of n attempts.
Is it the case that LazyList will evaluate the first element and, if it succeeds, skip the evaluation of the remaining elements in the list?
As I already mentioned in the comment, this is exactly what a LazyList is supposed to do.
The elements in a LazyList are materialized/computed only when there is demand from an actual consumer.
The find method of LazyList respects this laziness. This is clearly stated in the documentation as well: https://www.scala-lang.org/api/2.13.x/scala/collection/immutable/LazyList.html#find(p:A=%3EBoolean):Option[A]
def find(p: (A) => Boolean): Option[A]
// Finds the first element of the lazy list satisfying a predicate, if any.
// Note: may not terminate for infinite-sized collections.
// This method does not evaluate any elements further than the first element matching the predicate.
So, if the first element succeeds, it will stop at the first element itself.
Also, if you are writing a retry method, then you probably want to stop at the first success anyway. Why would you want to continue evaluating the block even after the success?
You might want to better clarify your exact requirements to get a more helpful answer.
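One way to check this behaviour is to count how often the block actually runs. The sketch below reuses the retry method from the question; the calls counter and the block that fails twice are assumptions added for the experiment, and LazyList requires Scala 2.13 or later:

```scala
import scala.util.Try

object RetrySketch {
  // Same logic as in the question.
  def retry[T](n: Int)(block: => T): Try[T] = {
    val attempts = LazyList.fill(n)(Try(block))
    attempts.find(_.isSuccess).getOrElse(attempts.head)
  }

  // Counts how many times the block was executed.
  var calls = 0

  // Hypothetical block: fails on the first two attempts, succeeds on the third.
  val result: Try[Int] = retry(5) {
    calls += 1
    if (calls < 3) throw new RuntimeException(s"attempt $calls failed") else calls
  }

  // Snapshot of the counter after retry has returned.
  val callsMade: Int = calls
}
```

With up to five attempts allowed, callsMade should be 3: find stops forcing elements as soon as it sees the first success, so the fourth and fifth attempts are never computed.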

What is meant by "effectively tail recursive"?

Chapter 7 in FP in Scala deals with creating a purely functional library for handling concurrency. To that end, it defines a type
type Par[A] = ExecutorService => Future[A]
and a set of useful functions such as fork
def fork[A] (a: => Par[A]): Par[A] =
es => es.submit(new Callable[A] {
def call = a(es).get
})
One of the exercises is about the function sequence with the following signature
def sequence[A](ps: List[Par[A]]): Par[List[A]]
The solution using foldRight is straightforward. However the authors included two other versions as answers, one of which states the following
// This implementation forks the recursive step off to a new logical thread,
// making it effectively tail-recursive. However, we are constructing
// a right-nested parallel program, and we can get better performance by
// dividing the list in half, and running both halves in parallel.
// See `sequenceBalanced` below.
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
as match {
case Nil => unit(Nil)
case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
}
I am not quite sure what is meant by "effectively tail recursive". From the definition of fork it is clear that it accepts a by name parameter and so map2(h, fork(sequenceRight(t)))(_ :: _) will not evaluate the second parameter (until the executor service is provided). But that doesn't tell me how and why it is "effectively tail recursive".
Let's take some List(a, b, c). After passing it into sequenceRight it will turn into:
map2(
  a,
  fork(
    map2(
      b,
      fork(
        map2(
          c,
          fork(unit(Nil))
        )(_ :: _)
      )
    )(_ :: _)
  )
)(_ :: _)
This isn't tail-recursive at all, and the compiler cannot treat it as such. However, consider how it is actually executed:
fork makes whatever you pass to it asynchronous, so it returns immediately,
the map2 implementation does not block until the forked computation finishes in order to apply the function passed to map2; instead, it asynchronously transforms the result calculated in fork to prepend the value,
since the recursion is done asynchronously, posting tasks to the ExecutorService and chaining operations onto the Future lets you treat ExecutorService + Future like a trampoline.
As a result what actually happens is:
sequenceRight(List(a, b, c)) calls map2(a, fork(sequenceRight(List(b, c))))(_ :: _)
a will complete whenever it completes, but we can already hold it as a value
fork(sequenceRight(List(b, c))) is scheduled, but we aren't waiting until it completes; we can already pass it around
we can create a Future that combines the results of the two above (and return it) without waiting for either of them to complete!
as a result, this Future is returned immediately! It is still running, but this one call is finished!
the same is true for the recursively created Futures
once c and fork(unit(Nil)) complete, cResult :: Nil is computed
this allows completion of bResult :: cResult :: Nil
and this finally allows computation of the final result
In other words, tail recursion refers to recursive calls being rewritten into a while loop to avoid stack overflow. But stack overflow is only an issue if the recursive calls are made synchronously. If you are returning something asynchronous, then the bookkeeping for unwinding is pushed into the ExecutorService and the Futures' observers, so it lives on the heap. From that point of view they solve the same stack-overflow problem as tail-recursive calls, so "in spirit" they could be considered somewhat similar.
This is certainly not tail-recursion, and I think, they know it – that's why they added "effectively" :).
What they must mean is that this implementation does not create additional frames on stack for recursive invocations, which is true, since those invocations happen asynchronously, as you noted.
Now, whether this situation even deserves to be called "recursion" at all is a good question. I don't think there is a single accepted answer to that. I personally lean toward "yes", because the definition of recursion as "referencing itself" definitely includes this situation, and I don't really know how else to define it so that async invocations are excluded but tail recursion is not.
BTW, I am not much of an expert in Javascript, but I hear that the term "asynchronous recursion" is used fairly widely there.
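For anyone who wants to experiment, here is a self-contained sketch. Only the Par type, fork and sequenceRight come from the question; unit and map2 are my assumptions (a simple map2 that blocks on both futures, which is not the non-blocking variant described above), and ParSketch, buildLarge and runSmall are made-up names. What it shows is narrower than the full claim: because fork takes its argument by name, a call to sequenceRight performs one step and returns, so the recursion never nests on the caller's stack.

```scala
import java.util.concurrent.{Callable, CompletableFuture, ExecutorService, Executors, Future}

object ParSketch {
  type Par[A] = ExecutorService => Future[A]

  // Assumed definition: an already-completed computation.
  def unit[A](a: A): Par[A] = _ => CompletableFuture.completedFuture(a)

  // Assumed definition: a simple map2 that blocks on both futures.
  def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
    es => {
      val af = a(es)
      val bf = b(es)
      CompletableFuture.completedFuture(f(af.get(), bf.get()))
    }

  // As in the question: `a` is by-name, so it is not evaluated until the task runs.
  def fork[A](a: => Par[A]): Par[A] =
    es => es.submit(new Callable[A] { def call(): A = a(es).get() })

  def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
    as match {
      case Nil    => unit(Nil)
      case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
    }

  // Building the description makes a single sequenceRight call, not 100000 nested ones.
  def buildLarge(): Par[List[Int]] = sequenceRight(List.fill(100000)(unit(1)))

  // Running a small list; each recursive step executes on a pool thread.
  def runSmall(): List[Int] = {
    val es = Executors.newCachedThreadPool()
    try sequenceRight(List(unit(1), unit(2), unit(3)))(es).get()
    finally es.shutdown()
  }
}
```

Note that with this blocking map2 every forked step occupies a thread while it waits, so actually running the large list would be a bad idea; the stack depth stays flat, but the cost moves to the thread pool.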

Equivalent of Iterator.continually for an Iterable?

I need to produce a java.lang.Iterable[T] where the supply of T is some long-running operation. In addition, after T is supplied, it is wrapped and further computation is made to prepare for the next iteration.
Initially I thought I could do this with Iterator.continually. However, calling toIterable on the result actually creates a Stream[T] - with the problem being that the head is eagerly evaluated, which I don't want.
How can I either:
Create an Iterable[T] from a supplying function or
Convert an Iterator[T] into an Iterable[T] without using Stream?
In Scala 2.13, you can use LazyList:
LazyList.continually(1)
Unlike Stream, LazyList is also lazy in its head.
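A quick way to convince yourself of this is to count how often the supplier runs. In the sketch below, supply and the produced counter are hypothetical stand-ins for the long-running operation:

```scala
object LazyHead {
  // Counts how many values have been produced so far.
  var produced = 0

  // Hypothetical stand-in for the long-running supplier.
  def supply(): Int = { produced += 1; produced }

  // Creating the LazyList does not call supply().
  val values: LazyList[Int] = LazyList.continually(supply())
  val afterBuild: Int = produced

  // Only asking for the head triggers the first computation.
  val first: Int = values.head
  val afterHead: Int = produced
}
```

afterBuild should still be 0, and afterHead should be 1 once the head has been requested.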
Because java.lang.Iterable is a very simple API, it's trivial to go from a scala.collection.Iterator to it.
case class IterableFromIterator[T](override val iterator:java.util.Iterator[T]) extends java.lang.Iterable[T]
val iterable:java.lang.Iterable[T] = IterableFromIterator(Iterator.continually(...).asJava)
Note this contradicts the expectation that iterable.iterator() produces a fresh Iterator each time; instead, iterable.iterator() can only be called once.
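If a fresh iterator per call matters, a possible variation is to build the iterator inside iterator() from a supplying function. This is only a sketch: iterableFrom and supply are made-up names, and it assumes Scala 2.13 (for scala.jdk.CollectionConverters). All iterators still share the same supplier, so they are fresh wrappers rather than independent replays.

```scala
import scala.jdk.CollectionConverters._

object FreshIterable {
  // `supply` is a hypothetical stand-in for the long-running operation.
  def iterableFrom[T](supply: () => T): java.lang.Iterable[T] =
    new java.lang.Iterable[T] {
      // Each call builds a new lazy iterator; nothing is computed until next().
      override def iterator(): java.util.Iterator[T] =
        Iterator.continually(supply()).asJava
    }
}
```

Since Iterator.continually is lazy, creating the Iterable or calling iterator() does not invoke supply; only next() does.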

Is it safe to remove elements from a collection.mutable.HashSet during iteration?

A mutable Set's retain method is implemented as follows:
def retain(p: A => Boolean): Unit =
for (elem <- this.toList) // SI-7269 toList avoids ConcurrentModificationException
if (!p(elem)) this -= elem
But if I implement my own method that doesn't make a copy for iterating, nothing blows up.
def dumbRetain[A](self: mutable.Set[A], p: A => Boolean): Unit =
for (elem <- self)
if (!p(elem)) self -= elem
dumbRetain(mutable.HashSet(1,2,3,4,5,6), Set(2,4,6))
// everything is ok
I see that SI-7269's test case uses the JavaConversions wrapper around a java Set/Map, and it seems like the issue arises from the underlying java collection.
I know there will never be a java collection passed to my algorithm, so can I use dumbRetain without worrying about the ConcurrentModificationException? Or is this "coincidental behavior" that I shouldn't rely on?
edit to clarify, I would be using dumbRetain as an implementation detail in an algorithm which would be in full control of what it passes to dumbRetain. And this would be run in a single-threaded context.
This seems to rely on the specific implementation of mutable.HashSet, and there is nothing in the API that guarantees that it would work for all other implementations of mutable.Set, even if we exclude all wrappers for the Java collections.
The for-loop
for (elem <- self) {
...
}
is desugared into foreach, which for mutable.HashSet is implemented as follows:
override def foreach[U](f: A => U) {
var i = 0
val len = table.length
while (i < len) {
val curEntry = table(i)
if (curEntry ne null) f(entryToElem(curEntry))
i += 1
}
}
Essentially, it simply loops through the Array of the underlying FlatHashTable and invokes the passed function f on every element. The foreach does not contain any line that could throw anything; in particular, it doesn't check for concurrent [footnote-1] modifications at all.
A ConcurrentModificationException seems to be the less troubling case: at least, your program fails fast, and even returns a detailed stack trace that points to the line in which the problem occurred. It would be actually much worse if it simply deteriorated into undefined behavior without throwing anything. This would be the worst case. However, this worst case shouldn't occur for collections from the standard library: Throw ConcurrentModificationException exception's in scala collections? #188
Quote:
In scala/scala#5295 (merged in to 2.12.x) I made sure that removing the element last returned from an iterator would not cause a problem for the iterator.
So, as long as you clearly state in the documentation that only collections from the standard library are supported, you will most likely not have any problems using it in your own code. But if you expose it in a public interface, this would be an invitation for a bug analogous to the "SI-7269" quoted in your question.
[footnote-1] "concurrent" as in "ConcurrentModificationException", not as in "concurrently executed threads".
EDIT: I've tried to choose less ambiguous formulations. Many thanks to @Dima for the feedback and the numerous suggestions.
Yeah, you can do it, as long as you are sure this is Scala's native HashSet implementation and not a wrapper around a Java collection, and with the understanding that this is not thread-safe and should never be used concurrently (the original HashSet.retain is the same in that respect, as are the other mutators).
Better yet, just use the immutable Set.filter, unless you have real hard evidence (not just intuition) demonstrating that your specific case absolutely requires a mutable container.
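For comparison, here is a small sketch of the two approaches that do not depend on how a particular set implements iteration: taking a snapshot before removing, and filtering an immutable Set. The names snapshotRetain, keep and RetainAlternatives are made up for illustration:

```scala
import scala.collection.mutable

object RetainAlternatives {
  // Snapshot first, then mutate: the loop iterates over the copy, not the set.
  def snapshotRetain[A](self: mutable.Set[A], p: A => Boolean): Unit =
    for (elem <- self.toList) if (!p(elem)) self -= elem

  // Predicate used by both variants.
  val keep: Int => Boolean = _ % 2 == 0

  // Mutable variant: the set is modified in place.
  val mutableSet: mutable.Set[Int] = mutable.HashSet(1, 2, 3, 4, 5, 6)
  snapshotRetain(mutableSet, keep)

  // Immutable variant: filter returns a new set and leaves the original alone.
  val original: Set[Int] = Set(1, 2, 3, 4, 5, 6)
  val evens: Set[Int] = original.filter(keep)
}
```

Both variants end up with the even numbers; the difference is whether the original set is modified in place.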

Why is `Source.fromFile(...).getLines()` empty after I've iterated over it?

It was quite a surprise to me that (line <- lines) is so devastating! It completely exhausts the lines iterator. So running the following snippet will make size = 0:
val lines = Source.fromFile(args(0)).getLines()
var cnt = 0
for (line <- lines) {
cnt = readLines(line, cnt)
}
val size = lines.size
Is it a normal Scala practice to have well-hidden side-effects like this?
Source.getLines() returns an iterator. For every iterator, if you invoke a bulk operation such as foreach above, or map, take, toList, etc., then the iterator is no longer in a usable state.
That is the contract for Iterators and, more generally, classes that inherit TraversableOnce.
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
This is not the case for classes that inherit Traversable -- for those you can invoke the bulk traversal operations as many times as you want.
Source.getLines() returns an Iterator, and walking through an Iterator will mutate it. This is made quite clear in the Scala documentation:
An iterator is mutable: most operations on it change its state. While it is often used to iterate through the elements of a collection, it can also be used without being backed by any collection (see constructors on the companion object).
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
Using for notation is just syntactic sugar for calling map, flatMap and foreach methods on the Iterator, which again have quite clear documentation stating not to use the iterator:
Reuse: After calling this method, one should discard the iterator it was called on, and use only the iterator that was returned. Using the old iterator is undefined, subject to change, and may result in changes to the new iterator as well.
Scala generally aims to be a 'pragmatic' language - mutation and side effects are allowed for performance and inter-operability reasons, although not encouraged. To call it 'well-hidden' is, however, something of a stretch.
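The usual workaround is to materialize the iterator once if you need to traverse the data more than once. A minimal sketch, using a literal Iterator as a stand-in for Source.fromFile(...).getLines() (the object and value names are made up):

```scala
object IteratorReuse {
  // Stand-in for getLines(), which also returns an Iterator[String].
  val lines: Iterator[String] = Iterator("a", "b", "c")

  // The for loop desugars to foreach and consumes the iterator.
  var seen = 0
  for (_ <- lines) seen += 1

  // hasNext is one of the two methods that may still be called afterwards.
  val exhausted: Boolean = !lines.hasNext

  // Materializing into a List allows any number of traversals.
  val stored: List[String] = Iterator("a", "b", "c").toList
  val firstPass: Int = stored.size
  val secondPass: Int = stored.size
}
```

In the original snippet this would mean calling getLines().toList once and using the resulting list both for the loop and for size, at the cost of holding all lines in memory.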