What is the difference between List.view and LazyList? - scala

I am new to Scala and I just learned that LazyList was created to replace Stream, and at the same time they added the .view methods to all collections.
So, I am wondering why was LazyList added to Scala collections library, when we can do List.view?
I just looked at the Scaladoc, and it seems that the only difference is that LazyList has memoization, while View does not. Am I right or wrong?

Stream elements are realized lazily except for the 1st (head) element. That was seen as a deficiency.
A view of a List is evaluated lazily (and re-evaluated on each traversal) but, as far as I know, the underlying List has to be completely realized first.
def bang: Int = { print("BANG! "); 1 }
LazyList.fill(4)(bang)    // res0: LazyList[Int] = LazyList(<not computed>)
Stream.fill(3)(bang)      // BANG! res1: Stream[Int] = Stream(1, <not computed>)
List.fill(2)(bang).view   // BANG! BANG! res2: SeqView[Int] = SeqView(<not computed>)
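The memoization difference raised in the question can be sketched as follows. This is a minimal illustration assuming Scala 2.13; the object name and the two counters are made up for the example and are not part of any library.

```scala
// Assumes Scala 2.13+. ViewVsLazyList, viewCalls and lazyCalls are illustrative names.
object ViewVsLazyList {
  var viewCalls = 0
  var lazyCalls = 0

  // Neither map runs its function at this point.
  val view = List(1, 2, 3).view.map { x => viewCalls += 1; x * 2 }
  val lazyList = LazyList(1, 2, 3).map { x => lazyCalls += 1; x * 2 }

  // Traverse each one twice.
  val v1 = view.toList
  val v2 = view.toList      // the view re-runs the function for every element
  val l1 = lazyList.toList
  val l2 = lazyList.toList  // the LazyList reuses its memoized elements

  def main(args: Array[String]): Unit =
    println(s"view calls: $viewCalls, LazyList calls: $lazyCalls")
}
```

After both traversals the view has applied its function six times and the LazyList three times, while all four resulting lists are equal.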

In 2.13, you can't force your way back from a view to the original collection type:
scala> case class C(n: Int) { def bump = new C(n+1).tap(i => println(s"bump to $i")) }
defined class C
scala> List(C(42)).map(_.bump)
bump to C(43)
res0: List[C] = List(C(43))
scala> List(C(42)).view.map(_.bump)
res1: scala.collection.SeqView[C] = SeqView(<not computed>)
scala> .force
^
warning: method force in trait View is deprecated (since 2.13.0): Views no longer know about their underlying collection type; .force always returns an IndexedSeq
bump to C(43)
res2: scala.collection.IndexedSeq[C] = Vector(C(43))
scala> LazyList(C(42)).map(_.bump)
res3: scala.collection.immutable.LazyList[C] = LazyList(<not computed>)
scala> .force
bump to C(43)
res4: res3.type = LazyList(C(43))
A function taking a view and optionally returning a strict realization would have to also take a "forcing function" such as _.toList, if the caller needs to choose the result type.
I don't do this sort of thing at my day job, but this behavior surprises me.
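The "forcing function" idea mentioned above might look roughly like this. It is only a sketch assuming Scala 2.13; bumpAll and its parameter names are hypothetical and chosen for the example.

```scala
import scala.collection.View

// Assumes Scala 2.13+. bumpAll is a hypothetical helper, not a library method.
object ForceDemo {
  // The caller supplies `force` to decide which strict collection comes back.
  def bumpAll[C](xs: View[Int])(force: View[Int] => C): C =
    force(xs.map(_ + 1))

  val asList: List[Int] = bumpAll(List(1, 2, 3).view)(_.toList)
  val asVector: Vector[Int] = bumpAll(List(1, 2, 3).view)(_.toVector)

  def main(args: Array[String]): Unit = {
    println(asList)
    println(asVector)
  }
}
```

Because the view no longer knows its source collection type, the result type is chosen entirely by the function the caller passes in.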

The difference is that a LazyList can be generated from a huge or infinite sequence, so you can do something like:
val xs = (1 to 1_000_000_000).to(LazyList)
And that won't run out of memory. After that you can operate on the lazy list with transformers. You won't be able to do the same by creating a List and taking a view of it. Having said that, SeqView has a much richer set of methods than LazyList, which is why you can also take a view of a LazyList:
val xs = (1 to 1_000_000_000).to(LazyList)
val listView = xs.view
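As a small illustration of the infinite case, here is a sketch assuming Scala 2.13; the object and value names are invented for the example. Only the elements that are actually demanded get computed.

```scala
// Assumes Scala 2.13+. InfiniteDemo and its members are illustrative names.
object InfiniteDemo {
  // An infinite sequence 1, 2, 3, ... that is computed on demand.
  val naturals: LazyList[Int] = LazyList.from(1)

  // Transformers stay lazy; take(3).toList forces only as much as needed.
  val firstEvenSquares: List[Int] =
    naturals.filter(_ % 2 == 0).map(n => n * n).take(3).toList

  def main(args: Array[String]): Unit =
    println(firstEvenSquares)
}
```

Note that keeping a reference to the head (naturals here) retains every memoized element, so for very long traversals it may be preferable not to hold on to it.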

Related

Conversion of breakOut - use iterator or view?

Scala 2.13 migration guide contains a note regarding how to port collection.breakOut:
collection.breakOut no longer exists, use .view and .to(Collection) instead.
and a few paragraphs below, in an overview table, there is:
Description: collection.breakOut no longer exists
Old Code: val xs: List[Int] = ys.map(f)(collection.breakOut)
New Code: val xs = ys.iterator.map(f).to(List)
Automatic Migration Rule: Collection213Upgrade
The scala-collection-migration rewrite rule uses .iterator. What is the difference between the two? Is there a reason to prefer one to the other?
When used like that there is no real difference.
A View can be reused while an Iterator must be discarded after it's been used once.
val list = List(1,2,3,4,5)
val view = list.view
val viewPlus1 = view.map(_ + 1).toList
view.foreach(println) // works as expected
val it = list.iterator
val itPlus1 = it.map(_ + 1).toList
it.foreach(println) // undefined behavior
In its simplest form a View[A] is a wrapper around a function () => Iterator[A], so all its methods can create a fresh Iterator[A] and delegate to the appropriate method on that iterator.
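That description can be made concrete with a toy class. This is a simplified sketch, not the real scala.collection.View; SimpleView and its methods are hypothetical names used only for illustration.

```scala
// SimpleView is a hypothetical toy class, not the library's View implementation.
final class SimpleView[A](mkIterator: () => Iterator[A]) {
  // Every call produces a fresh iterator, which is what makes the view reusable.
  def iterator: Iterator[A] = mkIterator()

  // Transformations only wrap the iterator factory; nothing is computed yet.
  def map[B](f: A => B): SimpleView[B] = new SimpleView(() => iterator.map(f))
  def filter(p: A => Boolean): SimpleView[A] = new SimpleView(() => iterator.filter(p))

  // Terminal operations delegate to a fresh iterator.
  def toList: List[A] = iterator.toList
  def foreach(f: A => Unit): Unit = iterator.foreach(f)
}

object SimpleViewDemo {
  val base = new SimpleView(() => List(1, 2, 3, 4, 5).iterator)
  val plus1: List[Int] = base.map(_ + 1).toList
  val again: List[Int] = base.toList // base is still usable after plus1

  def main(args: Array[String]): Unit = {
    println(plus1)
    println(again)
  }
}
```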

Scala ListBuffer add-all behavior is not consistent, some elements are lost

I came across a Scala collection behavior that seems somewhat dubious. Is this expected behavior?
The following is simplified code to reproduce the issue.
import scala.collection.mutable.{ Map => MutableMap }
import scala.collection.mutable.ListBuffer
val relationCache = MutableMap.empty[String, String]
val relationsToFlush = new ListBuffer[String]()
def addRelation(relation: String) = relationCache(relation) = relation
Range(0,170).map("string-#" + _).foreach(addRelation(_))
val relations = relationCache.values.toSeq /* Bad */
// val relations = relationCache.map(_._2).toSeq /* Good */
relationCache.clear
relationsToFlush ++= relations
relationsToFlush.size
The code has two collections: a mutable map (relationCache) and a mutable list (relationsToFlush). relationCache collects elements, and at a later point they should be transferred to relationsToFlush and the cache should be cleared.
However, not all elements are transferred to relationsToFlush. The output is as below:
scala> relationsToFlush ++= relations
res14: relationsToFlush.type = ListBuffer(string-#80, string-#27)
scala> relationsToFlush.size
res15: Int = 2
Whereas if the code is changed to
val relations = relationCache.map(_._2).toSeq /* Good */
then we get the expected result (170 elements).
My guess is that the 'good' code creates a new mutable list with those elements, while the other returns a result backed directly by the map, hence it is lost when clear is called on the map. However, shouldn't the reference count get bumped up when the result is assigned to the relations variable?
Scala Version: 2.11
You've stumbled across one of the vagaries of the Seq trait.
Since Seq is a trait, and not a class, it's not really a collection type distinct from the others, leading some to refer to it as a failed abstraction.
Consider the following REPL session. (Scala 2.12.7)
scala> List(1,2,3).toSeq
res4: scala.collection.immutable.Seq[Int] = List(1, 2, 3)
scala> Stream(1,2,3).toSeq
res5: scala.collection.immutable.Seq[Int] = Stream(1, ?)
scala> Vector(1,2,3).toSeq
res6: scala.collection.immutable.Seq[Int] = Vector(1, 2, 3)
Notice how the underlying collection type is retained. In particular the lazy Stream has not realized all its elements. That's what's happening here:
relationCache.values.toSeq
The toSeq transformation returns a Stream and nothing thereafter is forcing the realization of the rest of the elements.
The problem is combining lazy evaluation with mutable data structures.
The value of relations is lazy and is not computed until it is used. Since it is based on a mutable.Map collection, the results will be based on whatever that mutable.Map has at the time relations is first used.
To complicate things, relations is actually a Stream which means that those values are locked the first time they are read, meaning the subsequent changes to the mutable.Map will not affect that value of relations.
The simple fix is to use toList rather than toSeq, because List is not a lazy collection and will be evaluated immediately.
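The hazard of mixing lazy evaluation with a mutable source can be reproduced in miniature. The sketch below assumes Scala 2.13, where, as far as I know, toSeq on the map's values is already strict, so an explicit view stands in for the lazy Stream of 2.11/2.12. All names are illustrative.

```scala
import scala.collection.mutable

// Assumes Scala 2.13+. A view plays the role of the lazy Stream from the question.
object SnapshotDemo {
  val cache = mutable.Map("a" -> 1, "b" -> 2, "c" -> 3)

  val lazyValues = cache.values.view // lazy: still backed by the mutable map
  val snapshot = cache.values.toList // strict: copied immediately

  cache.clear()

  val lazySize = lazyValues.size   // evaluated after clear, sees an empty map
  val snapshotSize = snapshot.size // unaffected by clear

  def main(args: Array[String]): Unit =
    println(s"lazy: $lazySize, snapshot: $snapshotSize")
}
```

The strict copy keeps all three elements, while the lazy view reflects the cleared map.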

Scala collections: transform content and type of the collection in one pass

I am looking for a way to efficiently transform content and type of the collection.
For example apply map to a Set and get result as List.
Note that I want to build result collection while applying transformation to source collection (i.e. without creating intermediate collection and then transforming it to desired type).
So far I've come up with this (for Set being transformed into List while incrementing every element of the set):
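// Scala 2.12 or earlier: CanBuildFrom was removed in 2.13
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder
val set = Set(1, 2, 3)
// A CanBuildFrom that always hands back a List builder
val cbf = new CanBuildFrom[Set[Int], Int, List[Int]] {
  def apply(from: Set[Int]): Builder[Int, List[Int]] = List.newBuilder[Int]
  def apply(): Builder[Int, List[Int]] = List.newBuilder[Int]
}
val list: List[Int] = set.map(_ + 1)(cbf)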
... but I feel like there should be a shorter and more elegant way to do this (without manually implementing CanBuildFrom every time I need it).
Any ideas how to do this?
This is exactly what scala.collection.breakOut is for—it'll conjure up the CanBuildFrom you need if you tell it the desired new collection type:
import scala.collection.breakOut
val xs: List[Int] = Set(1, 2, 3).map(_ + 1)(breakOut)
This will only traverse the collection once. See this answer for more detail.
I think that this might do what you want, though I am not 100% positive so maybe someone could confirm:
(Set(1, 2, 3).view map (_ + 1)).toList
view creates, in this case, an IterableView, which, when you call map on it, does not actually perform the mapping. When toList is called, the map is performed and a list is built with the results.
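On Scala 2.13, where breakOut and CanBuildFrom are gone, the same single-pass idea can be sketched with a view or an iterator followed by .to(List). The object and value names below are illustrative only.

```scala
// Assumes Scala 2.13+. OnePassDemo and its members are illustrative names.
object OnePassDemo {
  // No intermediate Set is built: map is deferred until .to(List) runs.
  val viaView: List[Int] = Set(1, 2, 3).view.map(_ + 1).to(List)

  // The iterator variant behaves the same way for a single use.
  val viaIterator: List[Int] = Set(1, 2, 3).iterator.map(_ + 1).to(List)

  def main(args: Array[String]): Unit = {
    println(viaView)
    println(viaIterator)
  }
}
```

The element order of the resulting list follows the iteration order of the source Set, so it should not be relied on in general.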

Best practice: "If not immutable create copy"-pattern

I have function which gets a Seq[_] as an argument and returns an immutable class instance with this Seq as a val member. If the Seq is mutable I obviously want to create a defensive copy to guarantee that my return class instance cannot be modified.
What is the best practice for this pattern? First I was surprised that it is not possible to overload the function
def fnc(arg: immutable.Seq[_]) = ...
def fnc(arg: mutable.Seq[_]) = ...
I could also pattern-match:
def fnc(arg: Seq[_]) = arg match {
case s: immutable.Seq[_] => { println("immutable"); s}
case s: mutable.Seq[_] => {println("mutable"); List()++s }
case _: ?
}
But I am not sure about the _ case. Is it guaranteed that arg is immutable.Seq or mutable.Seq? I also don't know if List()++s is the correct way to convert it. I saw many posts on SO, but most of them were for 2.8 or earlier.
Are the Scala-Collections "intelligent" enough that I can just always (without pattern matching) write List()++s and I get the same instance if immutable and a deep copy if mutable?
What is the recommended way to do this?
You will need to pattern match if you want to support both. The code for Seq() ++ does not guarantee (as part of its API) that it won't copy its argument when that argument is immutable:
scala> val v = Vector(1,2,3)
v: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3)
scala> Seq() ++ v
res1: Seq[Int] = List(1, 2, 3)
It may pattern-match itself for some special cases, but you know the cases you want. So:
def fnc[A](arg: Seq[A]): Seq[A] = arg match {
case s: collection.immutable.Seq[_] => arg
case _ => Seq[A]() ++ arg
}
You needn't worry about the _; this just says you don't care exactly what the type argument is (not that you could check anyway), and if you write it this way, you don't: pass through if immutable, otherwise copy.
What is the best practice for this pattern?
If you want to guarantee immutability, the best practice is to make a defensive copy, or require immutable.Seq.
But I am not sure about the _ case. Is it guaranteed that arg is immutable.Seq or mutable.Seq?
Not necessarily, but I believe every standard library collection that inherits from collection.Seq also inherits from one of those two. A custom collection, however, could theoretically inherit from just collection.Seq. See Rex's answer for an improvement on your pattern-matching solution.
Are the Scala-Collections "intelligent" enough that I can just always (without pattern matching) write List()++s and I get the same instance if immutable and a deep copy if mutable?
It appears they are in certain cases but not others, for example:
val immutableSeq = Seq[Int](0, 1, 2)
println((Seq() ++ immutableSeq) eq immutableSeq) // prints true
val mutableSeq = mutable.Seq[Int](0, 1, 2)
println((Seq() ++ mutableSeq) eq mutableSeq) // prints false
Where eq is reference equality. Note that the above also works with List() ++ s, however as Rex pointed out, it does not work for all collections, like Vector.
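The pass-through-or-copy behaviour discussed in these answers can be checked with eq directly. The following is a sketch assuming Scala 2.13; frozen is a hypothetical helper name, and copying into a List is just one possible choice of strict immutable target.

```scala
import scala.collection.{immutable, mutable}

// Assumes Scala 2.13+. frozen is a hypothetical helper, not a library method.
object DefensiveCopyDemo {
  // Pass immutable sequences through untouched; copy everything else.
  def frozen[A](arg: collection.Seq[A]): immutable.Seq[A] = arg match {
    case s: immutable.Seq[A] => s
    case other => other.toList
  }

  val vec = Vector(1, 2, 3)
  val buf = mutable.ArrayBuffer(1, 2, 3)

  val sameInstance: Boolean = frozen(vec) eq vec // reference equality
  val copy: immutable.Seq[Int] = frozen(buf)

  buf += 4 // mutating the source afterwards must not affect the copy

  def main(args: Array[String]): Unit =
    println(s"same instance: $sameInstance, copy: $copy, buf: $buf")
}
```

Returning immutable.Seq[A] rather than Seq[A] also documents the guarantee in the signature.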
You certainly can overload in that way! E.g., this compiles fine:
object MIO
{
import collection.mutable
def f1[A](s: Seq[A]) = 23
def f1[A](s: mutable.Seq[A]) = 42
def f2(s: Seq[_]) = 19
def f2(s: mutable.Seq[_]) = 37
}
In the REPL:
Welcome to Scala version 2.10.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_37).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import rrs.scribble.MIO._; import collection.mutable.Buffer
import rrs.scribble.MIO._
import collection.mutable.Buffer
scala> f1(List(1, 2, 3))
res0: Int = 23
scala> f1(Buffer(1, 2, 3))
res1: Int = 42
scala> f2(List(1, 2, 3))
res2: Int = 19
scala> f2(Buffer(1, 2, 3))
res3: Int = 37

How to use mutable collections in Scala

I think I may be failing to understand how mutable collections work. I would expect mutable collections to be affected by applying map to them or adding new elements, however:
scala> val s: collection.mutable.Seq[Int] = collection.mutable.Seq(1)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
scala> s :+ 2 //appended an element
res32: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2)
scala> s //the original collection is unchanged
res33: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
scala> s.map(_.toString) //mapped a function to it
res34: scala.collection.mutable.Seq[java.lang.String] = ArrayBuffer(1)
scala> s //original is unchanged
res35: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
//maybe mapping a function that changes the type of the collection shouldn't work
//try Int => Int
scala> s.map(_ + 1)
res36: scala.collection.mutable.Seq[Int] = ArrayBuffer(2)
scala> s //original unchanged
res37: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
This behaviour doesn't seem to differ from that of the immutable collections, so when do the two behave differently?
For both immutable and mutable collections, :+ and +: create new collections. If you want mutable collections that automatically grow, use the += and +=: methods defined by collection.mutable.Buffer.
Similarly, map returns a new collection — look for transform to change the collection in place.
The map operation applies the given function to all the elements of a collection and produces a new collection.
The operation you are looking for is called transform. You can think of it as an in-place map, except that the transformation function has to be of type A => A instead of A => B.
scala> import collection.mutable.Buffer
import collection.mutable.Buffer
scala> Buffer(6, 3, 90)
res1: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 3, 90)
scala> res1 transform { 2 * }
res2: res1.type = ArrayBuffer(12, 6, 180)
scala> res1
res3: scala.collection.mutable.Buffer[Int] = ArrayBuffer(12, 6, 180)
The map method never modifies the collection on which you call it. The type system wouldn't allow such an in-place map implementation to exist - unless you changed its type signature, so that on some type Collection[A] you could only map using a function of type A => A.
(Edit: as other answers have pointed out, there is such a method called transform!)
Because map creates a new collection, you can go from a Collection[A] to a Collection[B] using a function A => B, which is much more useful.
As others have noted, the map and :+ methods return a modified copy of the collection. More generally, all methods defined in collections from the scala.collection package will never modify a collection in place even when the dynamic type of the collection is from scala.collection.mutable. To modify a collection in place, look for methods defined in scala.collection.mutable._ but not in scala.collection._.
For example, :+ is defined in scala.collection.Seq, so it will never perform in-place modification even when the dynamic type is a scala.collection.mutable.ArrayBuffer. However, +=, which is defined in scala.collection.mutable.ArrayBuffer and not in scala.collection.Seq, will modify the collection in place.
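The copy versus in-place distinction can be summarised in a short sketch. It assumes Scala 2.13, where mapInPlace is the in-place counterpart of map on ArrayBuffer (transform still exists there but is deprecated); the object and value names are illustrative.

```scala
import scala.collection.mutable.ArrayBuffer

// Assumes Scala 2.13+. InPlaceDemo and its members are illustrative names.
object InPlaceDemo {
  val buf = ArrayBuffer(1)

  val appendedCopy = buf :+ 2  // returns a new collection
  val sizeAfterCopy = buf.size // buf itself is unchanged

  buf += 2                     // appends in place

  val mappedCopy = buf.map(_ * 10) // returns a new collection
  buf.mapInPlace(_ + 1)            // rewrites the elements of buf itself

  val finalContents: List[Int] = buf.toList

  def main(args: Array[String]): Unit =
    println(s"copy: $appendedCopy, mapped: $mappedCopy, buf: $buf")
}
```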