Scala - weird behaviour with Iterator.toList

I am new to Scala and I have a function as follows:
def selectSame(messages: BufferedIterator[Int]) = {
val head = messages.head
messages.takeWhile(_ == head)
}
It selects from a buffered iterator only the leading elements that match the head. I am subsequently using this code:
val messageStream = List(1,1,1,2,2,3,3).iterator.buffered
if (messageStream.hasNext) {
  while (messageStream.hasNext) {
    val messages = selectSame(messageStream).toList
    println(messages)
  }
}
On the first iteration I get List(1, 1, 1) as expected, but then I only get List(2), as if I had lost one element along the way. I am probably doing something wrong with the iterators/lists, but I am a bit lost here.

Scaladoc of Iterator says about takeWhile:
Reuse: After calling this method, one should discard the iterator it
was called on, and use only the iterator that was returned. Using the
old iterator is undefined, subject to change, and may result in
changes to the new iterator as well.
So that's why. This basically means you cannot directly do what you want with Iterators and takeWhile. IMHO, easiest would be to quickly write your own recursive function to do that.
If you want to stick with Iterators, you could use the duplicate method on the Iterator to generate a copy, on which you'd call dropWhile.
Even better: Use span repeatedly:
def selectSame(messages: BufferedIterator[Int]) = {
  val head = messages.head
  messages.span(_ == head)
}
def iter(msgStream: BufferedIterator[Int]): Unit = if (msgStream.hasNext) {
  val (msgs, rest) = selectSame(msgStream)
  println(msgs.toList) // consume the first half completely before touching rest
  iter(rest.buffered) // span returns plain Iterators, so re-buffer the remainder
}
val messageStream = List(1,1,1,2,2,3,3)
if (!messageStream.isEmpty) {
  iter(messageStream.iterator.buffered)
}
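As an alternative to span, the hand-written recursive function suggested above might look like the following minimal sketch. It touches the BufferedIterator only through head, hasNext and next(), which remain usable after each call, so no element is lost between groups. The name selectRun is hypothetical, and the sketch assumes the iterator is non-empty when the function is called.

```scala
import scala.annotation.tailrec

// Hypothetical helper: collects the leading run of elements equal to the head.
// Assumes the iterator is non-empty when called.
def selectRun(messages: BufferedIterator[Int]): List[Int] = {
  val first = messages.head

  @tailrec
  def loop(acc: List[Int]): List[Int] =
    if (messages.hasNext && messages.head == first) loop(messages.next() :: acc)
    else acc // every collected element equals `first`, so order does not matter

  loop(Nil)
}

val stream = List(1, 1, 1, 2, 2, 3, 3).iterator.buffered
val run1 = selectRun(stream)
val run2 = selectRun(stream)
val run3 = selectRun(stream)
```

Because the peeked element is never consumed until it is known to belong to the current run, the first 2 is still available when the second call starts.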

Related

unable to convert a java.util.List into Scala list

I want the if block to return Right(List[PracticeQuestionTag]), but I am not able to make it do so. The if/else returns an Either.
//I get java.util.List[Result]
val resultList:java.util.List[Result] = transaction.scan(scan);
if(resultList.isEmpty == false){
val listIterator = resultList.listIterator()
val finalList:List[PracticeQuestionTag] = List()
//this returns Unit. How do I make it return List[PracticeQuestionTags]
val answer = while(listIterator.hasNext){
val result = listIterator.next()
val convertedResult:PracticeQuestionTag = rowToModel(result) //rowToModel takes Result and converts it into PracticeQuestionTag
finalList ++ List(convertedResult) //Add to List. I assumed that the while will return List[PracticeQuestionTag] because it is the last statement of the block but the while returns Unit
}
Right(answer) //answer is Unit, The block is returning Right[Nothing,Unit] :(
} else {Left(Error)}
Change the java.util.List list to a Scala List as soon as possible. Then you can handle it in Scala fashion.
import scala.jdk.CollectionConverters._
val resultList = transaction.scan(scan).asScala.toList
Either.cond( resultList.nonEmpty
, resultList.map(rowToModel(_))
, new Error)
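For readers who want to try the conversion in isolation, here is a self-contained sketch of the same idea. It assumes Scala 2.13 or later (where scala.jdk.CollectionConverters is available); Tag and toTags are hypothetical stand-ins for PracticeQuestionTag and the scan-and-convert step, and a String is used as the error type only for illustration.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for PracticeQuestionTag
final case class Tag(name: String)

// Hypothetical stand-in for the scan-and-convert step
def toTags(rows: java.util.List[String]): Either[String, List[Tag]] = {
  // Convert to an immutable Scala List first, then stay in Scala collections
  val scalaRows = rows.asScala.toList
  // Either.cond yields Right(second argument) if the test holds, else Left(third argument)
  Either.cond(scalaRows.nonEmpty, scalaRows.map(Tag(_)), "no rows found")
}

val found = toTags(java.util.Arrays.asList("a", "b"))
val missing = toTags(new java.util.ArrayList[String]())
```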
Your finalList: List[PracticeQuestionTag] = List() is an immutable Scala list, so you cannot change it: there is no way to add to, remove from, or otherwise modify this list in place. The expression finalList ++ List(convertedResult) builds a new list, which your code then discards.
One way to achieve this is the functional approach. Another is to use a mutable list, add to that, and make that list the final value of the if expression.
Also, a while expression always evaluates to Unit; it never has any other value. You can use while to build your answer and then return it separately.
val resultList: java.util.List[Result] = transaction.scan(scan)
if (resultList.isEmpty) {
Left(Error)
}
else {
val listIterator = resultList.listIterator()
val listBuffer: scala.collection.mutable.ListBuffer[PracticeQuestionTag] =
scala.collection.mutable.ListBuffer()
while (listIterator.hasNext) {
val result = listIterator.next()
val convertedResult: PracticeQuestionTag = rowToModel(result)
listBuffer.append(convertedResult)
}
Right(listBuffer.toList)
}

Scala iterators are confusing

I tried very hard to understand why iterators behave like this. After running
result = lines.filter(_.nonEmpty).map(_.toInt)
once, the iterator buffer is overwritten with all elements except the last element.
That is, if I have 5 elements in my input text file, then after running
result = lines.filter(_.nonEmpty).map(_.toInt)
5 times, my iterator becomes empty.
Any help is much appreciated. Thanks in advance.
The doc is very clear that you must discard an iterator after invoking any method except next and hasNext.
http://www.scala-lang.org/api/2.11.8/#scala.collection.Iterator
som-snytt is right here, but didn't explain what exactly was going on.
When you transform an iterator, you need to save the result of the transformation and only use that. In particular, calling filter on an iterator internally buffers it, which calls next on the original iterator and saves it in a head variable. If you call next on the buffered thing, you get 4. If you call next on the original iterator, you get 8: your first element is gone. If you'd instead written:
var result = lines.filter(_.nonEmpty)
result = result.filter(_.nonEmpty)
result = result.filter(_.nonEmpty)
val numbers = result.map(_.toInt)
You could repeat the reassignment line as many times as you want without the iterator becoming empty, because you're always operating on the transformed iterator.
EDIT: to address the buffering comment -- here's the code for Iterator.filter:
def filter(p: A => Boolean): Iterator[A] = new AbstractIterator[A] {
  private var hd: A = _
  private var hdDefined: Boolean = false
  def hasNext: Boolean = hdDefined || {
    do {
      if (!self.hasNext) return false
      hd = self.next()
    } while (!p(hd))
    hdDefined = true
    true
  }
  def next() = if (hasNext) { hdDefined = false; hd } else empty.next()
}
The hd and hdDefined variables perform exactly the same buffering that is used in Iterator.buffered.
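If the values are needed more than once, one option is to transform the iterator a single time and copy the result into a strict collection. The following is a minimal sketch of that approach; the sample strings are made up and stand in for the lines of the input file.

```scala
// Stand-in for the lines of the input file (made-up sample data)
val lines: Iterator[String] = Iterator("4", "", "8", "15")

// Transform once, then copy the result into a strict, reusable List.
// After this line, `lines` itself should not be used again.
val numbers: List[Int] = lines.filter(_.nonEmpty).map(_.toInt).toList

// A List can be traversed any number of times
val total = numbers.sum
val largest = numbers.max
```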

Does scala have a lazy evaluating wrapper?

I want to return a wrapper/holder for a result that I want to compute only once and only if the result is actually used. Something like:
def getAnswer(question: Question): Lazy[Answer] = ???
println(getAnswer(q).value)
This should be pretty easy to implement using lazy val:
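import scala.util.Try
class Lazy[T](f: () => T) {
  // Try caches a failure as well, so a failed computation is not re-run on later accesses
  private lazy val _result = Try(f())
  def value: T = _result.get
}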
But I'm wondering if there's already something like this baked into the standard API.
A quick search pointed at Streams and DelayedLazyVal but neither is quite what I'm looking for.
Streams do memoize the stream elements, but it seems like the first element is computed at construction:
def compute(): Int = { println("computing"); 1 }
val s1 = compute() #:: Stream.empty
// computing is printed here, before doing s1.take(1)
In a similar vein, DelayedLazyVal starts computing upon construction, even requires an execution context:
val dlv = new DelayedLazyVal(() => 1, { println("started") })
// immediately prints out "started"
There's scalaz.Need which I think you'd be able to use for this.
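To check the compute-once behaviour of a hand-rolled wrapper, a small variation on the class from the question can be exercised with a counter. This is only a sketch: LazyCell is a hypothetical name, and it takes a by-name parameter instead of a () => T function, which is an assumption about what reads best at the call site.

```scala
import scala.util.Try

// Hypothetical variation of the Lazy class from the question,
// using a by-name parameter instead of an explicit () => T
final class LazyCell[T](compute: => T) {
  // Evaluated on first access only; Try also caches a thrown exception
  private lazy val result: Try[T] = Try(compute)
  def value: T = result.get
}

var calls = 0
val answer = new LazyCell({ calls += 1; 42 })

val callsBeforeUse = calls // nothing has been computed yet
val firstRead = answer.value
val secondRead = answer.value
val callsAfterUse = calls
```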

Split an iterator by a predicate

I need a method that can split Iterator[Char] into lines (separated by \n and \r)
For that, I wrote a general method that gets an iterator and a predicate and will split the iterator every time the predicate is true.
This is similar to span, but it splits every time the predicate is true, not only the first time.
This is my implementation:
def iterativeSplit[T](iterO: Iterator[T])(breakOn: T => Boolean): Iterator[List[T]] =
new Iterator[List[T]] {
private var iter = iterO
def hasNext = iter.hasNext
def next = {
val (i1,i2) = iter.span(el => !breakOn(el))
val cur = i1.toList
iter = i2.dropWhile(breakOn)
cur
}
}.withFilter(l => l.nonEmpty)
It works well on small inputs, but on large inputs it runs very slowly, and sometimes I get a stack overflow exception.
Here is the code that reproduces the issue:
val iter = ("aaaaaaaaabbbbbbbbbbbccccccccccccc\r\n" * 10000).iterator
iterativeSplit(iter)(c => c == '\r' || c == '\n').length
The stack trace during the run is:
...
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:591)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:591)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:615)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:591)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
...
Looking at the source code (I'm using Scala 2.10.4), line 591 is the hasNext of the second iterator returned by span, and line 615 is the hasNext of the iterator returned by dropWhile.
I guess I'm using those two iterators incorrectly, but I can't see why.
You can simplify your code as follows, which seems to solve the problem:
def iterativeSplit2[T](iter: Iterator[T])(breakOn: T => Boolean): Iterator[List[T]] =
new Iterator[List[T]] {
def hasNext = iter.hasNext
def next = {
val cur = iter.takeWhile(!breakOn(_)).toList
iter.dropWhile(breakOn)
cur
}
}.withFilter(l => l.nonEmpty)
Rather than using span (so you need to replace iter on each call to next), simply use takeWhile and dropWhile on the original iter. Then there's no need for the var.
I think the cause of your original stack overflow is that repeatedly calling span creates a long chain of Iterators, each of whose hasNext methods calls the hasNext of its parent Iterator. If you look at the source code for Iterator, you can see that each span creates new Iterators that forward calls to hasNext to the original iterator (via a BufferedIterator, which increases the call stack even further).
Update: having consulted the documentation, it seems that although my solution above appears to work, it is not recommended. See in particular:
It is of particular importance to note that, unless stated otherwise,
one should never use an iterator after calling a method on it.
[...] Using the old iterator is undefined, subject to change, and may result in changes to the new iterator as well.
which applies to takeWhile and dropWhile (and span), but not next or hasNext.
It's possible to use span as in your original solution, but using streams rather than iterators, and recursion:
def split3[T](s: Stream[T])(breakOn: T => Boolean): Stream[List[T]] = s match {
case Stream.Empty => Stream.empty
case s => {
val (a, b) = s.span(!breakOn(_))
a.toList #:: split3(b.dropWhile(breakOn))(breakOn)
}
}
But the performance is pretty terrible. I'm sure there must be a better way...
Update 2: Here is a very imperative solution that has better performance:
import scala.collection.mutable.ListBuffer
def iterativeSplit4[T](iter: Iterator[T])(breakOn: T => Boolean): Iterator[List[T]] =
  new Iterator[List[T]] {
    val word = new ListBuffer[T]
    def hasNext = iter.hasNext
    def next() = {
      var looking = true
      // Stop at a separator, or when the input ends without a trailing separator
      while (looking && iter.hasNext) {
        val c = iter.next()
        if (breakOn(c)) looking = false
        else word += c
      }
      val w = word.toList
      word.clear()
      w
    }
  }.withFilter(_.nonEmpty)

scala - lazy iterator calls next too many times?

I'm trying to build a lazy iterator that pulls from a blocking queue, and have encountered a weird problem where next() appears to be called more times than expected. Because my queue is blocking, this causes my application to get stuck in certain cases.
Some simplified sample code:
"infinite iterators" should {
def mkIter = new Iterable[Int] {
var i = 0
override def iterator: Iterator[Int] = {
new Iterator[Int] {
override def hasNext: Boolean = true
override def next(): Int = {
i = i + 1
i
}
}
}
override def toString(): String = "lazy"
}
"return subsets - not lazy" in {
val x = mkIter
x.take(2).toList must equal(List(1, 2))
x.take(2).toList must equal(List(3, 4))
}
"return subsets - lazy" in {
val x = mkIter
x.view.take(2).toList must equal(List(1, 2))
x.view.take(2).toList must equal(List(3, 4))
}
}
In the example above, the lazy test fails because the second call to take(2) returns List(4, 5).
Given that I see this behaviour with both Scala 2.10 and 2.11, I suspect the error is mine, but I'm not sure what I'm missing.
take invalidates iterators. See the code example at the top of http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.Iterator
As explained by @dlwh, Scala is explicitly documented to not allow reuse of an iterator after calling take(Int). That said, a way to implement my core use case is to create a new stream each time I want to get more elements out of the iterator.
Adding to my example in the original question:
"return subsets - streams" in {
val x = mkIter
x.toStream.take(2).toList must equal(List(1, 2))
x.toStream.take(2).toList must equal(List(3, 4))
}
Note that toStream has the side effect of calling next() on the iterator, so this is only safe if you know you will be taking at least one item off of the stream. The advantage streams have over lazy views is that they will not call next() more than the minimum number of times needed.
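Another option, sketched below, is to avoid take altogether and pull elements with next() directly, since next and hasNext are the calls that leave an iterator usable. The helper name takeNext is hypothetical. It calls next() exactly n times with no lookahead, which matters for a blocking source; it also assumes at least n elements are available and will otherwise fail (or block) inside next().

```scala
// Hypothetical helper: pulls exactly n elements using only next().
// List.fill evaluates its by-name argument once per element, in order.
def takeNext[A](it: Iterator[A], n: Int): List[A] =
  List.fill(n)(it.next())

// Stand-in for the blocking source: an endless counter starting at 1
val counter = Iterator.from(1)

val firstPair = takeNext(counter, 2)
val secondPair = takeNext(counter, 2)
```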