dropWhile creates two iterators that share the same underlying iterator? - scala

I am observing a behavior I don't fully understand:
scala> val a = Iterator(1,2,3,4,5)
a: Iterator[Int] = non-empty iterator
scala> val b = a.dropWhile(_ < 3)
b: Iterator[Int] = non-empty iterator
scala> b.next
res9: Int = 3
scala> b.next
res10: Int = 4
scala> a.next
res11: Int = 5
It looks like the (1,2,3) part of iterator a is consumed, and (4,5) is left. Since 3 had to be evaluated, it had to be consumed, but by the definition of dropWhile it has to be included in b. So iterator b is 3 followed by (4,5), where (4,5) is whatever is left of a, the very same iterator. Is my understanding correct?
Given the above, it looks quite dangerous that the behavior of b is altered by applying operations on a. Basically we have two objects pointing to the same location. Is using dropWhile like this bad style?

From the documentation for Iterator:
It is of particular importance to note that, unless stated otherwise, one should never use an iterator after calling a method on it. The two most important exceptions are also the sole abstract methods: next and hasNext.
Basically, once you have called any method on an iterator other than next and hasNext, you should consider it destroyed and stop using it.

Is using dropWhile like this bad style?
yes :-)
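A minimal sketch of the safer pattern, assuming the goal is simply "everything from the first element that is at least 3": keep only the iterator returned by dropWhile and never touch the source again. The object and method names (DropWhileDemo, fromThree) are made up for illustration. If the original data is needed as well, starting from a strict collection such as a List avoids the shared state entirely.

```scala
object DropWhileDemo {
  // Consumes `source`; after this call the caller must not use `source` again.
  def fromThree(source: Iterator[Int]): List[Int] = {
    val rest = source.dropWhile(_ < 3) // `source` is now considered dead
    rest.toList
  }

  def main(args: Array[String]): Unit = {
    println(fromThree(Iterator(1, 2, 3, 4, 5))) // List(3, 4, 5)

    // A List is immutable, so dropWhile leaves the original untouched.
    val xs = List(1, 2, 3, 4, 5)
    println(xs.dropWhile(_ < 3)) // List(3, 4, 5)
    println(xs)                  // List(1, 2, 3, 4, 5)
  }
}
```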

Related

First Element of a Lazy Stream in Scala

Here is a minimal example. I can define a function that gives me the next integer via
def nextInteger(input: Int): Int = input+1
I can then define a lazy stream of integers as
lazy val integers: Stream[Int] = 1 #:: integers map(x=>nextInteger(x))
To my surprise, the first element of this stream is 2 and not 1
scala> integers
res21: Stream[Int] = Stream(2, ?)
In this simple example I can achieve my desired result using 0 instead of 1 in the definition of integers, but how can one in general set up a stream such that the initial value isn't lost? In my case I am setting up an iterative algorithm and will want to know the initial value.
EDIT:
Furthermore, I've never understood the design choice which makes the following syntax fail:
scala> (integers take 10 toList) last
res27: Int = 11
scala> integers take 10 toList last
<console>:24: error: not found: value last
integers take 10 toList last
^
I find wrapping things in brackets cumbersome. Is there a shorthand I am not aware of?
You're probably thinking that 1 #:: integers map(x=>nextInteger(x)) is parsed as 1 #:: (integers map(x=>nextInteger(x))) while it is actually parsed as (1 #:: integers).map(x=>nextInteger(x)). Adding parens fixes your problem:
val integers: Stream[Int] = 1 #:: (integers map nextInteger)
(Notice that since nextInteger is just a function, you don't need to make a lambda for it, and since Stream is already lazy, making integers lazy is unnecessary)
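As a rough, self-contained sketch of the fix (assuming Scala 2.x, where Stream is available; it is deprecated in favor of LazyList from 2.13 on), the definition below keeps 1 as the first element. The object name IntegersDemo and the alternative iterated are only illustrative; Stream.iterate is a standard-library way to express the same "start value plus step function" idea.

```scala
object IntegersDemo {
  def nextInteger(input: Int): Int = input + 1

  // Dot notation makes `map` apply to the tail only, so the head stays 1.
  val integers: Stream[Int] = 1 #:: integers.map(nextInteger)

  // Same sequence, built from an initial value and a step function.
  val iterated: Stream[Int] = Stream.iterate(1)(nextInteger)

  def main(args: Array[String]): Unit = {
    println(integers.take(5).toList) // List(1, 2, 3, 4, 5)
    println(iterated.take(5).toList) // List(1, 2, 3, 4, 5)
  }
}
```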
As to your edit, check out this excellent answer on the matter. In short: no there is no easy way. The thing is that unless you already know the arity of the functions involved, having something like what you suggest work would be hell for the next person reading your code... For example,
myList foo bar baz
This might be myList.foo.bar.baz as well as myList.foo(bar).baz, and you wouldn't know without checking the definitions of foo, bar, and baz. Scala eliminates this ambiguity: it is always the latter.
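For the chain from the question, one way to avoid the extra brackets is plain dot notation, which leaves no room for the infix reading. This is only a small sketch; ChainDemo and lastOfFirstTen are hypothetical names, and Stream.from(1) stands in for the integers stream.

```scala
object ChainDemo {
  // Every call is written with a dot, so nothing is parsed as an infix argument.
  def lastOfFirstTen: Int = Stream.from(1).take(10).toList.last

  def main(args: Array[String]): Unit = {
    println(lastOfFirstTen) // 10
  }
}
```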

Why does method "combinations" return Iterator rather than Stream in Scala?

I noticed that method combinations (from here) returns Iterator. It looks reasonable that the method should be "lazy" to avoid generating all combinations in advance. Now I wonder, why it returns Iterator instead of Stream (which is a lazy list in Scala).
So, why does combinations return Iterator rather than Stream ?
With Stream it is more likely that all generated values will be held in memory.
scala> val s = Stream.iterate(0)(_ + 1)
s: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> s.drop(3).head
res1: Int = 3
scala> s
res2: scala.collection.immutable.Stream[Int] = Stream(0, 1, 2, 3, ?)
When you retain a reference to your Stream, all generated elements will remain in memory. With an Iterator this is less likely to happen.
Of course this does not have to be the reason why the Scala library is designed the way it is...
I think because Stream caches all previously returned elements, so you would end up with all of them in memory. Iterator only returns the next one
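The caching can be made visible with a small experiment (a sketch, assuming Scala 2.x Stream; MemoDemo and countEvaluations are made-up names). The step function counts how often it runs: traversing the same Stream a second time does not run it again, because the elements are already stored, whereas an Iterator hands out each element once and keeps nothing.

```scala
object MemoDemo {
  // Returns the number of step-function calls after the first and after the second traversal.
  def countEvaluations(): (Int, Int) = {
    var evaluations = 0
    val s = Stream.iterate(0) { n => evaluations += 1; n + 1 }
    require(s.drop(3).head == 3) // forces the first few elements
    val afterFirst = evaluations
    require(s.drop(3).head == 3) // served from the cached cells
    (afterFirst, evaluations)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = countEvaluations()
    println(first == second) // true: nothing was recomputed

    // The Iterator version yields the same value but retains no earlier elements.
    val it = Iterator.iterate(0)(_ + 1)
    println(it.drop(3).next()) // 3
  }
}
```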
As far as I can see from the CombinationsItr code, each combination is computed lazily when you call the next method.
See https://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_9_1_final/src//library/scala/collection/SeqLike.scala#L198
So when you use next to get the next combination, for example:
scala> val combinations = "azertyuiop".combinations(2)
scala> combinations.next
res9: String = az
the result of next is evaluated lazily.
Iterator is cheaper than Stream. You can always get the latter from the former, if you need.
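Going from the Iterator to a Stream is a single call to toStream, as in the sketch below (Scala 2.x assumed; CombinationsDemo and pairs are illustrative names). Note that the resulting Stream will again retain every combination it has produced for as long as it is referenced.

```scala
object CombinationsDemo {
  // All 2-character combinations of `s`, produced on demand.
  def pairs(s: String): Iterator[String] = s.combinations(2)

  def main(args: Array[String]): Unit = {
    val it = pairs("abc")
    println(it.next()) // ab

    // Wrap the remaining combinations in a Stream; `it` must not be used afterwards.
    val rest: Stream[String] = it.toStream
    println(rest.toList) // List(ac, bc)
  }
}
```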

How to clone an iterator?

Suppose I have an iterator:
val it = List("a","b","c").iterator
I want a copy of it; my code is:
val it2 = it.toList.iterator
It works, but it doesn't seem like a good approach. Is there any other API to do it?
The method you are looking for is duplicate.
scala> val it = List("a","b","c").iterator
it: Iterator[java.lang.String] = non-empty iterator
scala> val (it1,it2) = it.duplicate
it1: Iterator[java.lang.String] = non-empty iterator
it2: Iterator[java.lang.String] = non-empty iterator
scala> it1.length
res11: Int = 3
scala> it2.mkString
res12: String = abc
Warning: as of Scala 2.9.0, at least, this leaves the original iterator empty. You can val ls = it.toList; val it1 = ls.iterator; val it2 = ls.iterator to get two copies. Or use duplicate (which works for non-lists also).
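Put together as a small program, the duplicate approach might look like the sketch below (DuplicateDemo and twoCopies are hypothetical names). The important point is that only the two returned iterators are used afterwards, never the original.

```scala
object DuplicateDemo {
  // Splits one iterator into two independent ones and materializes both for inspection.
  def twoCopies[A](it: Iterator[A]): (List[A], List[A]) = {
    val (it1, it2) = it.duplicate // do not use `it` after this point
    (it1.toList, it2.toList)
  }

  def main(args: Array[String]): Unit = {
    val (first, second) = twoCopies(List("a", "b", "c").iterator)
    println(first)  // List(a, b, c)
    println(second) // List(a, b, c)
  }
}
```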
Rex's answer is by the book, but in fact your original solution is by far the most efficient for scala.collection.immutable.List.
List iterators can be duplicated using that mechanism with essentially no overhead. This can be confirmed by a quick review of the implementation of iterator() in scala.collection.immutable.LinearSeq, esp. the definition of the toList method, which simply returns the _.toList of the backing Seq which, if it's a List (as it is in your case) is the identity.
I wasn't aware of this property of List iterators before investigating your question, and I'm very grateful for the information ... amongst other things it means that many "list pebbling" algorithms can be implemented efficiently over Scala immutable Lists using Iterators as the pebbles.

When is one Set less than another in Scala?

I wanted to compare the cardinality of two sets in Scala. Since stuff sometimes "just works" in Scala, I tried using < between the sets. It seems to go through, but I can't make any sense of the result.
Example:
scala> Set(1,2,3) < Set(1,4)
res20: Boolean = true
What does it return?
Where can I read about this method in the API?
Why isn't it listed anywhere under scala.collection.immutable.Set?
Update: Even the order(??) of the elements in the sets seems to matter:
scala> Set(2,3,1) < Set(1,3)
res24: Boolean = false
scala> Set(1,2,3) < Set(1,3)
res25: Boolean = true
This doesn't work with 2.8. On Scala 2.7, what happens is this:
scala.Predef.iterable2ordered(Set(1, 2, 3): Iterable[Int]) < (Set(1, 3, 2): Iterable[Int])
In other words, there's an implicit conversion defined on scala.Predef, which is "imported" for all Scala code, from an Iterable[A] to an Ordered[Iterable[A]], provided there's an implicit A => Ordered[A] available.
Given that the order of an iterable for sets is undefined, you can't really predict much about it. If you add elements to make the set size bigger than four, for instance, you'll get entirely different results.
If you want to compare the cardinality, just do so directly:
scala> Set(1, 2, 3).size < Set(2, 3, 4, 5).size
res0: Boolean = true
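If the size comparison is needed in several places, one option is to name it as an explicit Ordering rather than rely on any implicit conversion. This is only a sketch; SetSizeDemo and bySize are made-up names, and Ordering.by is the standard-library helper that derives an ordering from a key function.

```scala
object SetSizeDemo {
  // Orders sets purely by their number of elements.
  val bySize: Ordering[Set[Int]] = Ordering.by((s: Set[Int]) => s.size)

  def main(args: Array[String]): Unit = {
    println(bySize.lt(Set(1, 2, 3), Set(1, 4))) // false: 3 elements vs 2
    println(bySize.lt(Set(1, 4), Set(1, 2, 3))) // true
  }
}
```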
My knowledge of Scala is not extensive, but after doing some tests, I get the following:
scala> Set(1,2) <
<console>:5: error: missing arguments for method < in trait Ordered;
follow this method with `_' if you want to treat it as a partially applied function
Set(1,2) <
^
That tells me that < comes from the trait Ordered. More hints:
scala> Set(1,2) < _
res4: (Iterable[Int]) => Boolean = <function>
That is, the Set is evaluated into an Iterable, because maybe there is some implicit conversion from Iterable[A] to Ordered[Iterable[A]], but I'm not sure anymore... Tests are not consistent. For example, these two might suggest a kind of lexicographical compare:
scala> Set(1,2,3) < Set(1,2,4)
res5: Boolean = true
1 is equal, 2 is equal, 3 is less than 4.
scala> Set(1,2,4) < Set(1,2,3)
res6: Boolean = false
But these ones don't:
scala> Set(2,1) < Set(2,4)
res11: Boolean = true
scala> Set(2,1) < Set(2,2)
res12: Boolean = false
I think the correct answer is that found in the Ordered trait proper: There is no implementation for < between sets more than comparing their hashCode:
It is important that the hashCode method for an instance of Ordered[A] be consistent with the compare method. However, it is not possible to provide a sensible default implementation. Therefore, if you need to be able to compute the hash of an instance of Ordered[A] you must provide it yourself either when inheriting or instantiating.

How to do something like this in Scala?

Sorry for the lack of a descriptive title; I couldn't think of anything better. Edit it if you think of one.
Let's say I have two Lists of Objects, and they are always changing. They need to remain as separate lists, but many operations have to be done on both of them. This leads me to doing stuff like:
//assuming A and B are the lists
A.foo(params)
B.foo(params)
In other words, I'm doing the exact same operation on two different lists in many places in my code. I would like a way to reduce them down to one list without explicitly having to construct another list. I know that just combining lists A and B into a list C would solve all my problems, but then we'd just be back to the same operation if I needed to add a new object to the list (because I'd have to add it to C as well as its respective list).
It's in a tight loop and performance is very important. Is there any way to construct an iterator or something that would iterate A and then move on to B, all transparently? I know another solution would be to construct the combined list (C) every time I'd like to perform some kind of function on both of these lists, but that is a huge waste of time (computationally speaking).
Iterator is what you need here. Turning a List into an Iterator and concatenating 2 Iterators are both O(1) operations.
scala> val l1 = List(1, 2, 3)
l1: List[Int] = List(1, 2, 3)
scala> val l2 = List(4, 5, 6)
l2: List[Int] = List(4, 5, 6)
scala> (l1.iterator ++ l2.iterator) foreach (println(_)) // use List.elements for Scala 2.7.*
1
2
3
4
5
6
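The concatenation can be wrapped in a small helper so that call sites read like operations on a single collection. The sketch below is one possible shape (ConcatDemo and both are hypothetical names); it builds a fresh iterator on each call, which is cheap because no elements are copied.

```scala
object ConcatDemo {
  // Iterates over all of `a`, then all of `b`, without building a combined list.
  def both[A](a: List[A], b: List[A]): Iterator[A] = a.iterator ++ b.iterator

  def main(args: Array[String]): Unit = {
    val l1 = List(1, 2, 3)
    val l2 = List(4, 5, 6)
    println(both(l1, l2).sum)               // 21
    println(both(l1, l2).filter(_ % 2 == 0).toList) // List(2, 4, 6)
  }
}
```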
I'm not sure I understand what you mean.
Anyway, this is my solution:
scala> var listA :List[Int] = Nil
listA: List[Int] = List()
scala> var listB :List[Int] = Nil
listB: List[Int] = List()
scala> def dealWith(op : List[Int] => Unit){ op(listA); op(listB) }
dealWith: ((List[Int]) => Unit)Unit
Then, if you want to perform an operation on both listA and listB, you can use it like this:
scala> listA ::= 1
scala> listB ::= 0
scala> dealWith{ _ foreach println }
1
0