What is the difference between the methods iterator and view? - scala

scala> (1 to 10).iterator.map{_ * 2}.toList
res1: List[Int] = List(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
scala> (1 to 10).view.map{_ * 2}.force
res2: Seq[Int] = Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
Other than using next,hasNext, when should I choose iterator over view or view over iterator?

There's a huge difference between iterators and views. Iterators are use once only, compute on demand, while views are use multiple times, recompute each time, but only the elements needed. For instance:
scala> val list = List(1,2,3).map{x => println(x); x * 2}
1
2
3
list: List[Int] = List(2, 4, 6)
scala> list(2)
res14: Int = 6
scala> list(2)
res15: Int = 6
scala> val view = List(1,2,3).view.map{x => println(x); x * 2}
view: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)
scala> view(2)
3
res12: Int = 6
scala> view(2)
3
res13: Int = 6
scala> val iterator = List(1,2,3).iterator.map{x => println(x); x * 2}
iterator: Iterator[Int] = non-empty iterator
scala> iterator.drop(2).next
1
2
3
res16: Int = 6
scala> iterator.drop(2).next
[Iterator.next] (Iterator.scala:29)
(access lastException for the full trace)

view produces a lazy collection/stream. It's main charm is that it won't try and build the whole collection. This could prevent a OutOfMemoryError or improve performance when you only need the first few items in the collection. iterator makes no such guarantee.
One more thing. At least on Range, view returns a SeqView, which is a sub-type of Seq, so you can go back or start again from the beginning and do all that fun sequency stuff.
I guess the difference between an iterator and a view is a matter of in-front and behind. Iterators are expected to release what has been seen. Once next has been called, the previous is, hopefully, let go. Views are the converse. They promise to not acquire what has not been requested. If you have a view of all prime numbers, an infinite set, it has only acquired those primes you've asked for. It you wanted the 100th, 101 shouldn't be taking up memory yet.

This page talks about when to use views.
In summary, views are a powerful tool to reconcile concerns of
efficiency with concerns of modularity. But in order not to be
entangled in aspects of delayed evaluation, you should restrict views
to two scenarios. Either you apply views in purely functional code
where collection transformations do not have side effects. Or you
apply them over mutable collections where all modifications are done
explicitly. What's best avoided is a mixture of views and operations
that create new collections while also having side effects.

Related

Does scala collection's flatten keep order?

I have looked at the documentation of flatten and the example seems to indicate the order of the elements in the result maintains the order of the input. Is there a documentation or source code we can reference to make sure that it is the case? Or, is the documentation "Converts this collection of traversable collections into a collection formed by the elements of these traversable collections" enough to confirm this?
Update: My original question was not clear enough. I wanted to ask about the collections that maintain order internally (like List) and we use the default implicit traversable in flatten(). prayagupd has answered this question.
If you read the flatten function of scala 2.12.x, you can see it sequentially adding given inputs to a new collection.
//a sequential view of the collection
private def sequential: TraversableOnce[A] = this.asInstanceOf[GenTraversableOnce[A]].seq
def flatten[B](implicit asTraversable: A => /*<:<!!!*/ GenTraversableOnce[B]): CC[B] = {
val b = genericBuilder[B]
for (xs <- sequential)
b ++= asTraversable(xs).seq
b.result()
}
you can verify with example as well,
scala> List(List("order1", "order2"), List("order10", "order11")).flatten
res1: List[String] = List(order1, order2, order10, order11)
The order remains same even if you provide your own traversable,
scala> val asTraversable: List[String] => List[String] = list => list.map(elem => s"mutated $elem")
asTraversable: List[String] => List[String] = $$Lambda$1271/1988351538#513bec8c
scala> List(List("order1", "order2"), List("order10", "order11")).flatten(asTraversable)
res2: List[String] = List(mutated order1, mutated order2, mutated order10, mutated order11)
NOTE: above only applies to underlying data-structure that maintain the order.
For example Set does not maintain order
scala> Set(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).seq
res3: scala.collection.immutable.Set[Int] = Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)

How to efficiently delete all elements from ListBuffer in Scala?

I have a ListBuffer with thousand elements. After program has done calculations I want to fill it with new data. Is there a way like in C with free() to empty it? Or is it a good way to assign null to my ListBuffer and garbage collector will do all the work?
The method clear does just that.
scala> val xs = scala.collection.mutable.ListBuffer(1,2,3,4,5)
xs: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2, 3, 4, 5)
scala> xs.clear()
scala> xs
res2: scala.collection.mutable.ListBuffer[Int] = ListBuffer()

Zip two Arrays, always 3 elements of the first array then 2 elements of the second

I've manually built a method that takes 2 arrays and combines them to 1 like this:
a0,a1,a2,b0,b1,a3,a4,a5,b2,b3,a6,...
So I always take 3 elements of the first array, then 2 of the second one.
As I said, I built that function manually.
Now I guess I could make this a one-liner instead with the help of zip. The problem is, that zip alone is not enough as zip builds tuples like (a0, b0).
Of course I can flatMap this, but still not what I want:
val zippedArray: List[Float] = data1.zip(data2).toList.flatMap(t => List(t._1, t._2))
That way I'd get a List(a0, b0, a1, b1,...), still not what I want.
(I'd then use toArray for the list... it's more convenient to work with a List right now)
I thought about using take and drop but they return new data-structures instead of modifying the old one, so not really what I want.
As you can imagine, I'm not really into functional programming (yet). I do use it and I see huge benefits, but some things are so different to what I'm used to.
Consider grouping array a by 3, and array b by 2, namely
val a = Array(1,2,3,4,5,6)
val b = Array(11,22,33,44)
val g = (a.grouped(3) zip b.grouped(2)).toArray
Array((Array(1, 2, 3),Array(11, 22)), (Array(4, 5, 6),Array(33, 44)))
Then
g.flatMap { case (x,y) => x ++ y }
Array(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Very similar answer to #elm but I wanted to show that you can use more lazy approach (iterator) to avoid creating temp structures:
scala> val a = List(1,2,3,4,5,6)
a: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> val b = List(11,22,33,44)
b: List[Int] = List(11, 22, 33, 44)
scala> val groupped = a.sliding(3, 3) zip b.sliding(2, 2)
groupped: Iterator[(List[Int], List[Int])] = non-empty iterator
scala> val result = groupped.flatMap { case (a, b) => a ::: b }
result: Iterator[Int] = non-empty iterator
scala> result.toList
res0: List[Int] = List(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Note that it stays an iterator all the way until we materialize it with toList

Using Stream as a substitute for loop

scala> val s = for(i <- (1 to 10).toStream) yield { println(i); i }
1
s: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> s.take(8).toList
2
3
4
5
6
7
8
res15: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
scala> s.take(8).toList
res16: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
Looks like Scala is storing stream values in memory to optimize access. Which means that Stream can't be used as a substitute for imperative loop as it will allocate memory. Things like (for(i <- (1 to 1000000).toStream, j <- (1 to 1000000).toStream) yield ...).reduceLeft(...) will fail because of memory. Was it wrong idea to use streams this way?
You should use Iterator instead of Stream if you want a loop-like collection construct. Stream is specifically for when you want to store past results. If you don't, then, for instance:
scala> Iterator.from(1).map(x => x*x).take(10).sum
res23: Int = 385
Methods continually, iterate, and tabulate are among the more useful ones in this regard (in the Iterator companion object).

Scala: Can I rely on the order of items in a Set?

This was quite an unplesant surprise:
scala> Set(1, 2, 3, 4, 5)
res18: scala.collection.immutable.Set[Int] = Set(4, 5, 1, 2, 3)
scala> Set(1, 2, 3, 4, 5).toList
res25: List[Int] = List(5, 1, 2, 3, 4)
The example by itself suggest a "no" answer to my question. Then what about ListSet?
scala> import scala.collection.immutable.ListSet
scala> ListSet(1, 2, 3, 4, 5)
res21: scala.collection.immutable.ListSet[Int] = Set(1, 2, 3, 4, 5)
This one seems to work, but should I rely on this behavior?
What other data structure is suitable for an immutable collection of unique items, where the original order must be preserved?
By the way, I do know about distict method in List. The problem is, I want to enforce uniqueness of items (while preserving the order) at interface level, so using distinct would mess up my neat design..
EDIT
ListSet doesn't seem very reliable either:
scala> ListSet(1, 2, 3, 4, 5).toList
res28: List[Int] = List(5, 4, 3, 2, 1)
EDIT2
In my search for a perfect design I tried this:
scala> class MyList[A](list: List[A]) { val values = list.distinct }
scala> implicit def toMyList[A](l: List[A]) = new MyList(l)
scala> implicit def fromMyList[A](l: MyList[A]) = l.values
Which actually works:
scala> val l1: MyList[Int] = List(1, 2, 3)
scala> l1.values
res0: List[Int] = List(1, 2, 3)
scala> val l2: List[Int] = new MyList(List(1, 2, 3))
l2: List[Int] = List(1, 2, 3)
The problem, however, is that I do not want to expose MyList outside the library. Is there any way to have the implicit conversion when overriding? For example:
trait T { def l: MyList[_] }
object O extends T { val l: MyList[_] = List(1, 2, 3) }
scala> O.l mkString(" ") // Let's test the implicit conversion
res7: String = 1 2 3
I'd like to do it like this:
object O extends T { val l = List(1, 2, 3) } // Doesn't work
That depends on the Set you are using. If you do not know which Set implementation you have, then the answer is simply, no you cannot be sure. In practice I usually encounter the following three cases:
I need the items in the Set to be ordered. For this I use classes mixing in the SortedSet trait which when you use only the Standard Scala API is always a TreeSet. It guarantees the elements are ordered according to their compareTo method (see the Ordered trat). You get a (very) small performance penalty for the sorting as the runtime of inserts/retrievals is now logarithmic, not (almost) constant like with the HashSet (assuming a good hash function).
You need to preserve the order in which the items are inserted. Then you use the LinkedHashSet. Practically as fast as the normal HashSet, needs a little more storage space for the additional links between elements.
You do not care about order in the Set. So you use a HashSet. (That is the default when using the Set.apply method like in your first example)
All this applies to Java as well, Java has a TreeSet, LinkedHashSet and HashSet and the corresponding interfaces SortedSet, Comparable and plain Set.
It is my belief that you should never rely on the order in a set. In no language.
Apart from that, have a look at this question which talks about this in depth.
ListSet will always return elements in the reverse order of insertion because it is backed by a List, and the optimal way of adding elements to a List is by prepending them.
Immutable data structures are problematic if you want first in, first out (a queue). You can get O(logn) or amortized O(1). Given the apparent need to build the set and then produce an iterator out of it (ie, you'll first put all elements, then you'll remove all elements), I don't see any way to amortize it.
You can rely that a ListSet will always return elements in last in, first out order (a stack). If that suffices, then go for it.