Lazy iterator calls next too many times?

I'm trying to build a lazy iterator that pulls from a blocking queue, and have encountered a weird problem where next() appears to be called more times than expected. Because my queue is blocking, this causes my application to get stuck in certain cases.
Some simplified sample code:
"infinite iterators" should {
  def mkIter = new Iterable[Int] {
    var i = 0
    override def iterator: Iterator[Int] = {
      new Iterator[Int] {
        override def hasNext: Boolean = true
        override def next(): Int = {
          i = i + 1
          i
        }
      }
    }
    override def toString(): String = "lazy"
  }
  "return subsets - not lazy" in {
    val x = mkIter
    x.take(2).toList must equal(List(1, 2))
    x.take(2).toList must equal(List(3, 4))
  }
  "return subsets - lazy" in {
    val x = mkIter
    x.view.take(2).toList must equal(List(1, 2))
    x.view.take(2).toList must equal(List(3, 4))
  }
}
In the example above, the lazy test fails because the second call to take(2) returns List(4, 5).
Given that I see this behaviour with both Scala 2.10 and 2.11, I suspect the error is mine, but I'm not sure what I'm missing.

take invalidates the iterator it is called on. See the code example at the top of http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.Iterator

As explained by @dlwh, Scala's documentation explicitly says that an iterator must not be reused after calling take(Int) on it. That said, one way to implement my core use case is to create a new stream each time I want to get more elements out of the iterator.
Adding to my example in the original question:
"return subsets - streams" in {
  val x = mkIter
  x.toStream.take(2).toList must equal(List(1, 2))
  x.toStream.take(2).toList must equal(List(3, 4))
}
Note that toStream has the side effect of calling next() on the iterator immediately (the head of a Stream is evaluated when the stream is built), so this is only safe if you know you will be taking at least one item off the stream. The advantage streams have over lazy views is that they do not call next() more than the minimum number of times needed.
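If the goal is simply to pull a fixed number of elements from one long-lived iterator (for example one backed by a blocking queue), another option, not taken from the answer above, is to call next() yourself and stop as soon as you have enough. The Iterator documentation treats hasNext and next as safe to keep calling on the same iterator, unlike take. A minimal sketch; `CountingIterator` and `takeExactly` are hypothetical names, and the counter only stands in for the blocking queue:

```scala
import scala.collection.mutable.ListBuffer

object ExactTake {
  // Stand-in for a queue-backed iterator: counts how often next() is called.
  final class CountingIterator extends Iterator[Int] {
    var nextCalls = 0
    private var i = 0
    def hasNext: Boolean = true
    def next(): Int = { nextCalls += 1; i += 1; i }
  }

  // Pulls at most n elements from `it`, never one more than needed.
  // The size check runs before hasNext, so once n elements are collected
  // the iterator is not touched again.
  def takeExactly[A](it: Iterator[A], n: Int): List[A] = {
    val buf = ListBuffer.empty[A]
    while (buf.size < n && it.hasNext) buf += it.next()
    buf.toList
  }
}
```

Calling takeExactly(it, 2) twice on the same CountingIterator yields List(1, 2) and then List(3, 4), with exactly four calls to next(). Checking the count before hasNext matters if hasNext can block as well.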

Related

Is it possible to reference count call location?

This question isn't programming language specific (the more general the better), but I'm working in Scala (not necessarily on the JVM). Is there a means to reference count by call location, not the number of total calls? In particular, it would be great to be able to detect if a given method is called from more than one call location.
I think I can fake it to some extent by doing a reference equality check on an implicit token, but this could easily be abused by having a global-ish token, or even by calling the function multiple times in the same scope:
sealed case class Token()
class MyClass[A] {
  var tokenOpt: Option[Token] = None
  def callMeFromOnePlace(x: A)(implicit tk: Token) = {
    tokenOpt match {
      case Some(priorTk) => if (priorTk ne tk) throw new IllegalStateException("")
      case None => tokenOpt = Some(tk)
    }
    // Do some work ...
  }
}
Then this should work fine:
val myObj = new MyClass[Int]
val myIntList = List(1,2,3)
implicit val token = Token()
myIntList.map(ii => myObj.callMeFromOnePlace(ii))
But unfortunately, so would this:
val myObj = new MyClass[Int]
implicit val token = Token()
myObj.callMeFromOnePlace(1)
myObj.callMeFromOnePlace(1) //oops, want this to fail
When you are talking about call location, it can be represented by a call stack trace. Here is a simple example:
// keep track of calls here (you can use immutable style if you want)
var callCounts = Map.empty[Int, Int]
def f(): Unit = {
  // calculate the call stack trace's hashCode for more efficient storage;
  // .toSeq turns the array into a Seq (a WrappedArray in 2.12) whose .hashCode() is based on its elements
  val hashCode = new RuntimeException().getStackTrace.toSeq.hashCode()
  val callLocation = hashCode
  callCounts += (callLocation -> (callCounts.getOrElse(callLocation, 0) + 1))
}
List(1,2,3).foreach(_ =>
  f()
)
f()
f()
println(callCounts) // e.g. Map(75070239 -> 3, 900408638 -> 1, -1658734417 -> 1); the hash values will vary
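A variation on the same idea, sketched here under the assumption that you are on the JVM (the question allows for other platforms, where this will not work) and that the code is compiled with line numbers (scalac's default): instead of hashing the whole trace, key on the direct caller's frame only. Then the same call site counts once even when it is reached through different outer callers. `CallSiteCounter` is a hypothetical name:

```scala
object CallSiteCounter {
  // Keyed by the caller's frame (class, method, file, line), not the full trace.
  private var counts = Map.empty[StackTraceElement, Int]

  def f(): Unit = {
    // Frame 0 is f itself; frame 1 is its direct caller.
    // Assumes the JVM did not omit that frame from the trace.
    val caller = new Throwable().getStackTrace()(1)
    counts += (caller -> (counts.getOrElse(caller, 0) + 1))
  }

  def snapshot: Map[StackTraceElement, Int] = counts

  def demo(): Unit = {
    List(1, 2, 3).foreach(_ => f()) // one call site, reached three times
    f()                             // a second call site
    f()                             // a third call site
  }
}
```

After demo(), snapshot has three entries with counts 3, 1 and 1. A check such as "throw if snapshot.size > 1" would then detect a method being called from more than one location, at the cost of building a stack trace on every call.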
I am not completely clear on what you want to do, but for your //oops example to fail you just need to check that priorTk is not already set (note that this is not a thread-safe solution).
For completeness, enforcing these kinds of constraints from a type system perspective requires linear types.

Scala: what is the interest in using Iterators?

I have used Iterators after having worked with Regexes in Scala, but I don't really understand the point of them.
I know that it has a state and if I call the next() method on it, it will output a different result every time, but I don't see anything I can do with it and that is not possible with an Iterable.
And they don't seem to work like Akka Streams (for example), since the following example prints all the numbers immediately (without waiting one second between them, as I would expect):
lazy val a = Iterator({Thread.sleep(1000); 1}, {Thread.sleep(1000); 2}, {Thread.sleep(1000); 3})
while(a.hasNext){ println(a.next()) }
So what is the purpose of using Iterators?
Perhaps the most useful property of iterators is that they are lazy.
Consider something like this:
(1 to 10000)
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
This snippet will square 10000 numbers, then generate 10000 strings, and then return the second one.
This on the other hand:
(1 to 10000)
.iterator
.map { x => x * x }
.map { _.toString }
.find { _ == "4" }
... only computes two squares, and generates two strings.
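The difference can be made visible by counting how many times the squaring function runs. A minimal sketch; the counter is added purely for illustration and is not part of the answer's snippets:

```scala
object LazyVsStrict {
  // Strict version: Range.map builds the whole intermediate collection first.
  def strictSquares(): Int = {
    var squared = 0
    (1 to 10000)
      .map { x => squared += 1; x * x }
      .map { _.toString }
      .find { _ == "4" }
    squared
  }

  // Iterator version: elements flow through the pipeline one at a time,
  // and find stops pulling as soon as it sees "4".
  def lazySquares(): Int = {
    var squared = 0
    (1 to 10000)
      .iterator
      .map { x => squared += 1; x * x }
      .map { _.toString }
      .find { _ == "4" }
    squared
  }
}
```

strictSquares() returns 10000, while lazySquares() returns 2.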
Iterators are also often useful when you need to wrap some poorly designed (Java?) objects so that you can handle them in a functional style:
import java.sql.ResultSet
val rs: ResultSet = jdbcQuery.executeQuery()
new Iterator[ResultSet] {
  def next() = rs
  // rs.next() advances the cursor, so hasNext must be called exactly once per element
  def hasNext = rs.next()
}.map { rs =>
  fetchData(rs)
}
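The wrapper above relies on hasNext being called exactly once per element, because rs.next() moves the cursor. A common way to make hasNext safe to call repeatedly is to read one element ahead. A minimal sketch of that pattern, using java.io.BufferedReader as a stand-in for the cursor (not ResultSet itself); `LineIterator` is a hypothetical name:

```scala
import java.io.{BufferedReader, StringReader}

object LineIterator {
  // hasNext can be called any number of times without advancing the reader,
  // because the next line is read ahead and held in `lookahead`.
  def lines(reader: BufferedReader): Iterator[String] = new Iterator[String] {
    // Note: the first line is read eagerly, when the iterator is created.
    private var lookahead: String = reader.readLine() // null once exhausted
    def hasNext: Boolean = lookahead != null
    def next(): String = {
      if (lookahead == null) throw new NoSuchElementException("no more lines")
      val current = lookahead
      lookahead = reader.readLine()
      current
    }
  }
}
```

For example, LineIterator.lines(new BufferedReader(new StringReader("a\nb\nc"))).toList gives List("a", "b", "c"), even if hasNext is called several times beforehand. The trade-off is that one element is always read before it is requested.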
Streams are similar to iterators - they are also lazy, and also useful for wrapping:
Stream.continually(rs).takeWhile { _.next }.map(fetchData)
The main difference, though, is that streams remember the data that gets materialized, so that you can traverse them more than once. This is convenient, but may be costly if the original amount of data is very large, especially if it gets filtered down to a much smaller size:
import scala.io.Source
Source
  .fromFile("huge_file.txt")
  .getLines()
  .filter(_ == "")
  .toList
This uses roughly (ignoring buffering, object overhead, and other implementation-specific details) only the memory needed to hold one line, plus however many empty lines there are in the file.
This on the other hand:
import java.io.{BufferedReader, FileReader}
val reader = new BufferedReader(new FileReader("huge_file.txt"))
Stream
  .continually(reader.readLine())
  .takeWhile(_ != null)
  .filter(_ == "")
  .toList
... will end up with the entire content of the huge_file.txt in memory.
Finally, if I understand the intent of your example correctly, here is how you could do it with iterators:
val iterator = Seq(1,2,3).iterator.map { n => Thread.sleep(1000); n }
iterator.foreach(println)
// Or while(iterator.hasNext) { println(iterator.next()) } as you had it.
There is a good explanation of what an iterator is here: http://www.scala-lang.org/docu/files/collections-api/collections_43.html
An iterator is not a collection, but rather a way to access the elements of a collection one by one. The two basic operations on an iterator it are next and hasNext. A call to it.next() will return the next element of the iterator and advance the state of the iterator. Calling next again on the same iterator will then yield the element one beyond the one returned previously. If there are no more elements to return, a call to next will throw a NoSuchElementException.
First of all you should understand what is wrong with your example:
lazy val a = Iterator({Thread.sleep(1); 1}, {Thread.sleep(1); 2}, {Thread.sleep(2); 3})
while(a.hasNext){ println(a.next()) }
If you look at the apply method of Iterator, you'll see that its parameters are not passed by name, so all the Thread.sleep calls are evaluated when apply is called, before the first element is read (the lazy val only delays building the iterator until a is first used). Also, Thread.sleep takes the time to sleep in milliseconds, so if you want your thread to sleep for one second you should call Thread.sleep(1000).
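The by-value behaviour of Iterator.apply can be observed by logging when each element is computed, and compared with Iterator(...).map, which computes elements only as next() asks for them. A minimal sketch; `compute` is a hypothetical stand-in for the {Thread.sleep(1000); n} blocks:

```scala
import scala.collection.mutable.ListBuffer

object EagerVsLazy {
  private val log = ListBuffer.empty[Int]

  // Stands in for {Thread.sleep(1000); n}: records when each element is computed.
  private def compute(n: Int): Int = { log += n; n }

  // Iterator.apply takes ordinary (by-value) arguments, so all three
  // are computed before the iterator even exists.
  def eagerLog(): List[Int] = {
    log.clear()
    Iterator(compute(1), compute(2), compute(3))
    log.toList
  }

  // Iterator#map is lazy: nothing is computed until next() is called,
  // and each call computes exactly one element.
  def lazyLog(): (List[Int], List[Int]) = {
    log.clear()
    val it = Iterator(1, 2, 3).map(compute)
    val afterCreate = log.toList
    it.next()
    (afterCreate, log.toList)
  }
}
```

eagerLog() returns List(1, 2, 3); lazyLog() returns an empty log after creation and List(1) after the first next().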
The companion object has additional methods that let you do this lazily:
val a = Iterator.iterate(1)(x => {Thread.sleep(1000); x+1})
Iterators are very useful when you need to work with large amounts of data. You can also implement your own:
val it = new Iterator[Int] {
var i = -1
def hasNext = true
def next(): Int = { i += 1; i }
}
"I don't see anything I can do with it and that is not possible with an Iterable"
In fact, most of what collections can do could also be done with an Array, but we don't do that because it's much less convenient.
The same reasoning applies to iterators: if you want to model mutable state, an iterator makes more sense.
For example, Random is implemented in a way that resembles an iterator, because its use case fits an iterator more naturally than an iterable.

Does scala have a lazy evaluating wrapper?

I want to return a wrapper/holder for a result that I want to compute only once and only if the result is actually used. Something like:
def getAnswer(question: Question): Lazy[Answer] = ???
println(getAnswer(q).value)
This should be pretty easy to implement using lazy val:
import scala.util.Try
class Lazy[T](f: () => T) {
  private lazy val _result = Try(f())
  def value: T = _result.get
}
But I'm wondering if there's already something like this baked into the standard API.
A quick search pointed at Streams and DelayedLazyVal but neither is quite what I'm looking for.
Streams do memoize the stream elements, but it seems like the first element is computed at construction:
def compute(): Int = { println("computing"); 1 }
val s1 = compute() #:: Stream.empty
// computing is printed here, before doing s1.take(1)
In a similar vein, DelayedLazyVal starts computing upon construction, and even requires an execution context:
val dlv = new DelayedLazyVal(() => 1, { println("started") })
// immediately prints out "started"
There's scalaz.Need which I think you'd be able to use for this.
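For comparison, here is a self-contained variant of the wrapper from the question that uses a by-name parameter instead of () => T, so call sites can pass a plain block. This is a sketch built on the question's lazy val approach, not the scalaz.Need API; `LazySketch` is a hypothetical name:

```scala
import scala.util.Try

object LazySketch {
  // Evaluated at most once, on the first call to `value`.
  final class Lazy[T](compute: => T) {
    // Wrapping in Try caches a failure too: later calls rethrow the same
    // exception instead of running `compute` again.
    private lazy val result: Try[T] = Try(compute)
    def value: T = result.get
  }

  object Lazy {
    def apply[T](compute: => T): Lazy[T] = new Lazy(compute)
  }
}
```

Usage: LazySketch.Lazy { expensiveAnswer() } builds the holder without computing anything. On the design choice: with a bare lazy val (no Try), an exception leaves the value uninitialized, so the next access would retry the computation; the Try version trades that retry for "compute only once".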

Scala - weird behaviour with Iterator.toList

I am new to Scala and I have a function as follows:
def selectSame(messages: BufferedIterator[Int]) = {
val head = messages.head
messages.takeWhile(_ == head)
}
It selects from a buffered iterator only the elements matching the head. I then use it in this code:
val messageStream = List(1,1,1,2,2,3,3).iterator.buffered
if (!messageStream.isEmpty) {
  // in the real code: var lastTimeStamp = messageStream.head.timestamp
  while (!messageStream.isEmpty) {
    val messages = selectSame(messageStream).toList
    println(messages)
  }
}
On the first iteration I get List(1, 1, 1) as expected, but then I only get List(2), as if I lost an element along the way... I am probably doing something wrong with the iterators/lists, but I am a bit lost here.
Scaladoc of Iterator says about takeWhile:
Reuse: After calling this method, one should discard the iterator it
was called on, and use only the iterator that was returned. Using the
old iterator is undefined, subject to change, and may result in
changes to the new iterator as well.
So that's why. This basically means you cannot directly do what you want with Iterators and takeWhile. IMHO, the easiest approach would be to write your own recursive function to do it.
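The element goes missing because takeWhile has to consume the first non-matching element in order to test it. BufferedIterator.head peeks without consuming, so a small recursive helper can stop in front of that element instead. A minimal sketch of such a function; `Runs`, `takeRun` and `allRuns` are hypothetical names:

```scala
import scala.annotation.tailrec

object Runs {
  // Collects the run of elements equal to the current head. Uses head to peek,
  // so the first non-matching element stays in the iterator.
  def takeRun(it: BufferedIterator[Int]): List[Int] = {
    @tailrec
    def loop(first: Int, acc: List[Int]): List[Int] =
      if (it.hasNext && it.head == first) loop(first, it.next() :: acc)
      else acc.reverse
    if (it.hasNext) loop(it.head, Nil) else Nil
  }

  // Splits the whole iterator into consecutive runs of equal elements.
  def allRuns(it: BufferedIterator[Int]): List[List[Int]] = {
    @tailrec
    def loop(acc: List[List[Int]]): List[List[Int]] =
      if (it.hasNext) loop(takeRun(it) :: acc) else acc.reverse
    loop(Nil)
  }
}
```

Runs.allRuns(List(1,1,1,2,2,3,3).iterator.buffered) gives List(List(1, 1, 1), List(2, 2), List(3, 3)), and the same iterator stays valid between runs because only head, hasNext and next are used on it.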
If you want to stick with Iterators, you could use the duplicate method on the Iterator to get a second copy, on which you'd call dropWhile.
Even better: Use span repeatedly:
def selectSame(messages: BufferedIterator[Int]) = {
  val head = messages.head
  messages.span(_ == head)
}
def iter(msgStream: BufferedIterator[Int]): Unit = if (!msgStream.isEmpty) {
  val (msgs, rest) = selectSame(msgStream)
  println(msgs.toList)
  iter(rest.buffered) // span returns a plain Iterator, so buffer it again
}
val messageStream = List(1,1,1,2,2,3,3)
if (!messageStream.isEmpty) {
  // in the real code: var lastTimeStamp = messageStream.head.timestamp
  iter(messageStream.iterator.buffered)
}

Scala View + Stream combo causing OutOfMemory Error. How do I replace it with a View?

I was looking at solving a very simple problem, Eratosthenes sieve, using idiomatic Scala, for learning purposes.
I've learned that a Stream caches its elements, so it is not very performant for determining the nth element: access is O(n) and the data is memoised, so it is not suitable for this situation.
def primes(nums: Stream[Int]): Stream[Int] =
  Stream.cons(nums.head, primes(nums.tail.filter(x => x % nums.head != 0)))

def ints(n: Int): Stream[Int] =
  Stream.cons(n, ints(n + 1))

def nthPrime(n: Int): Int = {
  val prim = primes(ints(2)).view.take(n).toList
  prim(n - 1)
}
The Integer stream is the problematic one: while the prime number filtering runs, the JVM runs out of memory. What is the correct way to achieve the same functionality without using Streams?
Basically take a view of primes from a view of ints and display the last element, without memoisation?
I have had similar cases where a stream was a good idea, but I did not need to store its values. To consume the stream without storing its values, I created what I called a ThrowAwayIterator:
class ThrowAwayIterator[T](var stream: Stream[T]) extends Iterator[T] {
  def hasNext: Boolean = stream.nonEmpty
  def next(): T = {
    val next = stream.head
    stream = stream.tail
    next
  }
}
Make sure that you do not store a reference to the instance of stream that is passed in.
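A usage sketch combining the answer's ThrowAwayIterator with the sieve from the question, assuming Stream.from(2) in place of ints(2). One change is my own, not the question's: primes binds the head to a local Int, so each filter closure captures that Int rather than the stream `nums`. This sketch only shows that the pieces fit together and give correct results; it does not demonstrate that memory stays bounded for large n. `PrimeSketch` is a hypothetical name:

```scala
object PrimeSketch {
  // The answer's ThrowAwayIterator: walks a Stream, dropping each consumed cell.
  class ThrowAwayIterator[T](var stream: Stream[T]) extends Iterator[T] {
    def hasNext: Boolean = stream.nonEmpty
    def next(): T = {
      val next = stream.head
      stream = stream.tail
      next
    }
  }

  // Same sieve as the question, but the filter closure captures the Int p
  // instead of the stream nums.
  def primes(nums: Stream[Int]): Stream[Int] = {
    val p = nums.head
    Stream.cons(p, primes(nums.tail.filter(_ % p != 0)))
  }

  // The stream is passed straight into the iterator; the caller never
  // binds its start to a name.
  def nthPrime(n: Int): Int =
    new ThrowAwayIterator(primes(Stream.from(2))).drop(n - 1).next()
}
```

PrimeSketch.nthPrime(10) returns 29.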