How to use mutable collections in Scala - scala

I think I may be failing to understand how mutable collections work. I would expect mutable collections to be affected by applying map to them or adding new elements, however:
scala> val s: collection.mutable.Seq[Int] = collection.mutable.Seq(1)
s: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
scala> s :+ 2 //appended an element
res32: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2)
scala> s //the original collection is unchanged
res33: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
scala> s.map(_.toString) //mapped a function to it
res34: scala.collection.mutable.Seq[java.lang.String] = ArrayBuffer(1)
scala> s //original is unchanged
res35: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
//maybe mapping a function that changes the type of the collection shouldn't work
//try Int => Int
scala> s.map(_ + 1)
res36: scala.collection.mutable.Seq[Int] = ArrayBuffer(2)
scala> s //original unchanged
res37: scala.collection.mutable.Seq[Int] = ArrayBuffer(1)
This behaviour doesn't seem to differ from that of the immutable collections, so when do they behave differently?

For both immutable and mutable collections, :+ and +: create new collections. If you want mutable collections that automatically grow, use the += and +=: methods defined by collection.mutable.Buffer.
Similarly, map returns a new collection — look for transform to change the collection in place.
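A minimal sketch of the difference, assuming an ArrayBuffer as the concrete Buffer (the value names are made up for illustration; paste it into the REPL or run it as a script):

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1)

// :+ builds a new collection and leaves buf alone.
val copy = buf :+ 2

// += appends in place, +=: prepends in place.
buf += 3
0 +=: buf

println(copy) // ArrayBuffer(1, 2)
println(buf)  // ArrayBuffer(0, 1, 3)
```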

The map operation applies the given function to all the elements of a collection and produces a new collection.
The operation you are looking for is called transform. You can think of it as an in-place map except that the transformation function has to be of type a -> a instead of a -> b.
scala> import collection.mutable.Buffer
import collection.mutable.Buffer
scala> Buffer(6, 3, 90)
res1: scala.collection.mutable.Buffer[Int] = ArrayBuffer(6, 3, 90)
scala> res1 transform { 2 * }
res2: res1.type = ArrayBuffer(12, 6, 180)
scala> res1
res3: scala.collection.mutable.Buffer[Int] = ArrayBuffer(12, 6, 180)

The map method never modifies the collection on which you call it. The type system wouldn't allow such an in-place map implementation to exist - unless you changed its type signature, so that on some type Collection[A] you could only map using a function of type A => A.
(Edit: as other answers have pointed out, there is such a method called transform!)
Because map creates a new collection, you can go from a Collection[A] to a Collection[B] using a function A => B, which is much more useful.
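To make the type argument concrete, here is a rough sketch of what an in-place map has to look like. inPlaceMap is a hypothetical helper written for this illustration, not a library method; note that its function parameter is forced to be A => A because each result is written back into the same buffer.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: overwrites every slot with f applied to its current value.
// The buffer can only hold values of type A, so f must return an A.
def inPlaceMap[A](buf: ArrayBuffer[A])(f: A => A): Unit = {
  var i = 0
  while (i < buf.length) {
    buf(i) = f(buf(i))
    i += 1
  }
}

val numbers = ArrayBuffer(1, 2, 3)
inPlaceMap(numbers)(_ + 1)
println(numbers) // ArrayBuffer(2, 3, 4)

// inPlaceMap(numbers)(_.toString) would not compile: a String is not an Int.
```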

As others have noted, the map and :+ methods return a modified copy of the collection. More generally, all methods defined in collections from the scala.collection package will never modify a collection in place even when the dynamic type of the collection is from scala.collection.mutable. To modify a collection in place, look for methods defined in scala.collection.mutable._ but not in scala.collection._.
For example, :+ is defined in scala.collection.Seq, so it will never perform in-place modification even when the dynamic type is a scala.collection.mutable.ArrayBuffer. However, +=, which is defined in scala.collection.mutable.ArrayBuffer and not in scala.collection.Seq, will modify the collection in place.
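A small sketch of that rule, assuming one ArrayBuffer seen through two static types (the names underlying and asSeq are made up for illustration):

```scala
import scala.collection.mutable.ArrayBuffer

val underlying = ArrayBuffer(1, 2)

// Same object, but statically typed as the general scala.collection.Seq.
val asSeq: scala.collection.Seq[Int] = underlying

// :+ comes from scala.collection.Seq, so it returns a new collection.
val appended = asSeq :+ 3

// += comes from the mutable side, so it changes the buffer itself.
underlying += 4

println(appended) // ArrayBuffer(1, 2, 3)
println(asSeq)    // ArrayBuffer(1, 2, 4), since asSeq and underlying are the same object
```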

Related

Scala ListBuffer add-all behavior is not consistent, some elements are lost

I came across a Scala collection behavior that seems somewhat dubious. Is this expected behavior?
The following is simplified code to reproduce the issue.
import scala.collection.mutable.{ Map => MutableMap }
import scala.collection.mutable.ListBuffer
val relationCache = MutableMap.empty[String, String]
val relationsToFlush = new ListBuffer[String]()
def addRelation(relation: String) = relationCache(relation) = relation
Range(0,170).map("string-#" + _).foreach(addRelation(_))
val relations = relationCache.values.toSeq /* Bad */
// val relations = relationCache.map(_._2).toSeq /* Good */
relationCache.clear
relationsToFlush ++= relations
relationsToFlush.size
There are two collections: a mutable map (relationCache) and a mutable list (relationsToFlush). relationCache accumulates elements, and at a later point they should be transferred to relationsToFlush and the cache cleared.
However, not all elements are transferred to relationsToFlush. The output is as below:
scala> relationsToFlush ++= relations
res14: relationsToFlush.type = ListBuffer(string-#80, string-#27)
scala> relationsToFlush.size
res15: Int = 2
Whereas if the code is changed to
val relations = relationCache.map(_._2).toSeq /* Good */
then we get the expected result (170 elements).
My guess is that the 'good' code creates a new mutable list with those elements while the other returns a result backed directly by the map, hence it's lost when clear is called on the map. However, shouldn't the reference count get bumped up when it is assigned to the relations variable?
Scala Version: 2.11
You've stumbled across one of the vagaries of the Seq trait.
Since Seq is a trait, and not a class, it's not really a collection type distinct from the others, leading some to refer to it as a failed abstraction.
Consider the following REPL session. (Scala 2.12.7)
scala> List(1,2,3).toSeq
res4: scala.collection.immutable.Seq[Int] = List(1, 2, 3)
scala> Stream(1,2,3).toSeq
res5: scala.collection.immutable.Seq[Int] = Stream(1, ?)
scala> Vector(1,2,3).toSeq
res6: scala.collection.immutable.Seq[Int] = Vector(1, 2, 3)
Notice how the underlying collection type is retained. In particular the lazy Stream has not realized all its elements. That's what's happening here:
relationCache.values.toSeq
The toSeq transformation returns a Stream and nothing thereafter is forcing the realization of the rest of the elements.
The problem is combining lazy evaluation with mutable data structures.
The value of relations is lazy and is not computed until it is used. Since it is based on a mutable.Map collection, the results will be based on whatever that mutable.Map has at the time relations is first used.
To complicate things, relations is actually a Stream which means that those values are locked the first time they are read, meaning the subsequent changes to the mutable.Map will not affect that value of relations.
The simple fix is to use toList rather than toSeq, because List is not a lazy collection and will be evaluated immediately.
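A sketch of that fix applied to the code from the question. It should give the same result whether or not toSeq happens to be lazy in your Scala version, because toList copies the values before the map is cleared:

```scala
import scala.collection.mutable

val relationCache = mutable.Map.empty[String, String]
val relationsToFlush = mutable.ListBuffer.empty[String]

(0 until 170).foreach { i => relationCache(s"string-#$i") = s"string-#$i" }

// toList builds a strict snapshot of the values right now.
val relations = relationCache.values.toList

relationCache.clear()
relationsToFlush ++= relations

println(relationsToFlush.size) // 170
```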

What is the difference between List.view and LazyList?

I am new to Scala and I just learned that LazyList was created to replace Stream, and at the same time they added the .view methods to all collections.
So, I am wondering why was LazyList added to Scala collections library, when we can do List.view?
I just looked at the Scaladoc, and it seems that the only difference is that LazyList has memoization, while View does not. Am I right or wrong?
Stream elements are realized lazily except for the 1st (head) element. That was seen as a deficiency.
A List view is evaluated lazily (and re-evaluated on each traversal) but, as far as I know, the underlying List has to be completely realized first.
def bang :Int = {print("BANG! ");1}
LazyList.fill(4)(bang) //res0: LazyList[Int] = LazyList(<not computed>)
Stream.fill(3)(bang) //BANG! res1: Stream[Int] = Stream(1, <not computed>)
List.fill(2)(bang).view //BANG! BANG! res2: SeqView[Int] = SeqView(<not computed>)
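The memoization difference mentioned in the question can be checked with a counter. This is only a sketch for Scala 2.13 or later; evaluations and traced are names made up for the illustration.

```scala
var evaluations = 0
def traced(n: Int): Int = { evaluations += 1; n * 2 }

// LazyList computes each element at most once and remembers it.
val lazyList = LazyList(1, 2, 3).map(traced)
val lazySum1 = lazyList.sum
val lazySum2 = lazyList.sum
val lazyListEvaluations = evaluations

// A view remembers nothing: every traversal runs the function again.
evaluations = 0
val view = List(1, 2, 3).view.map(traced)
val viewSum1 = view.sum
val viewSum2 = view.sum
val viewEvaluations = evaluations

println(lazyListEvaluations) // 3
println(viewEvaluations)     // 6
```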
In 2.13, you can't force your way back from a view to the original collection type:
scala> import scala.util.chaining._
import scala.util.chaining._
scala> case class C(n: Int) { def bump = new C(n+1).tap(i => println(s"bump to $i")) }
defined class C
scala> List(C(42)).map(_.bump)
bump to C(43)
res0: List[C] = List(C(43))
scala> List(C(42)).view.map(_.bump)
res1: scala.collection.SeqView[C] = SeqView(<not computed>)
scala> .force
^
warning: method force in trait View is deprecated (since 2.13.0): Views no longer know about their underlying collection type; .force always returns an IndexedSeq
bump to C(43)
res2: scala.collection.IndexedSeq[C] = Vector(C(43))
scala> LazyList(C(42)).map(_.bump)
res3: scala.collection.immutable.LazyList[C] = LazyList(<not computed>)
scala> .force
bump to C(43)
res4: res3.type = LazyList(C(43))
A function taking a view and optionally returning a strict realization would have to also take a "forcing function" such as _.toList, if the caller needs to choose the result type.
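One possible shape for such a function, as a sketch for Scala 2.13 or later. bumpAll and its force parameter are hypothetical names; the point is only that the caller, not the view, decides the strict result type.

```scala
import scala.collection.View

// Hypothetical helper: transforms lazily, then lets the caller pick how to realize it.
def bumpAll[C](xs: View[Int])(force: View[Int] => C): C =
  force(xs.map(_ + 1))

val asList = bumpAll(List(1, 2, 3).view)(_.toList)
val asVector = bumpAll(List(1, 2, 3).view)(_.toVector)

println(asList)   // List(2, 3, 4)
println(asVector) // Vector(2, 3, 4)
```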
I don't do this sort of thing at my day job, but this behavior surprises me.
The difference is that a LazyList can be generated from a huge or infinite sequence, so you can do something like:
val xs = (1 to 1_000_000_000).to(LazyList)
And that won't run out of memory. After that you can operate on the lazy list with transformers. You won't be able to do the same by creating a List and taking a view of it. Having said that, SeqView has a much richer set of methods compared to LazyList, and that's why you can actually take a view of a LazyList like:
val xs = (1 to 1_000_000_000).to(LazyList)
val listView = xs.view
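For the infinite case, a short sketch (Scala 2.13 or later; evens and firstFive are illustrative names). Only the elements that are actually requested get computed:

```scala
// An endless sequence of even numbers; nothing is computed yet.
val evens = LazyList.from(1).map(_ * 2)

// Taking a finite prefix and realizing it forces only five elements.
val firstFive = evens.take(5).toList

println(firstFive) // List(2, 4, 6, 8, 10)
```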

How flatMap in a Map works in scala?

This is my code
def testMap() = {
  val x = Map(
    1 -> Map(
      2 -> 3,
      3 -> 4
    ),
    5 -> Map(
      6 -> 7,
      7 -> 8
    )
  )
  for {
    (a, v) <- x
    (b, c) <- v
  } yield {
    a
  }
}
The code above gives
List(1, 1, 5, 5)
If I change the yield value of the for comprehension from a to (a, b), the result is
Map(1 -> 3, 5 -> 7)
If I change (a, b) to (a, b, c), the result is
List((1,2,3), (1,3,4), (5,6,7), (5,7,8))
My question is what is the mechanism behind the determination of the result type in this for comprehension?
If you look at the details of the map method in the API documentation (for Scala 2.12 and earlier), you will find that it has a second, implicit parameter of type CanBuildFrom.
An instance of CanBuildFrom defines how a certain collection is built when mapping over some other collection with a certain element type.
In the case where you get a Map as result, you are mapping over a Map and are providing binary tuples. So the compiler searches for a CanBuildFrom-instance, that can handle that.
To find such an instance, the compiler looks in different places, e.g. the current scope, the class a method is invoked on and its companion object.
In this case it will find an implicit field called canBuildFrom in the companion object of Map that is suitable and can be used to build a Map as result. So it tries to infer the result type to Map and as this succeeds uses this instance.
In the case where you provide single values or triples instead, the instance found in the companion of Map does not have the required type, so it continues searching up the inheritance tree. It finds one in the companion object of Iterable. The instance there allows building an Iterable of an arbitrary element type. So the compiler uses that.
So why do you get a List? Because that happens to be the implementation used there, the type system only guarantees you an Iterable.
If you want to get an Iterable instead of a Map you can provide a CanBuildFrom instance explicitly (only if you call map and flatMap directly) or just force the return type. There you will also notice that you won't be able to request a List even though you get one.
This won't work:
val l: List[Int] = Map(1->2).map(x=>3)
This however will:
val l: Iterable[Int] = Map(1->2).map(x=>3)
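A short sketch of both outcomes side by side (the value names are illustrative). As far as I can tell the static types come out the same on newer Scala versions, even though the mechanism there is no longer CanBuildFrom:

```scala
val m = Map(1 -> 2, 3 -> 4)

// Mapping to pairs: the result is again a Map.
val pairs: Map[Int, Int] = m.map { case (k, v) => (k, v + 1) }

// Mapping to single values: only an Iterable is guaranteed.
val keys: Iterable[Int] = m.map { case (k, _) => k }

// If a List is needed, ask for it explicitly.
val keyList: List[Int] = keys.toList

println(pairs)          // Map(1 -> 3, 3 -> 5)
println(keyList.sorted) // List(1, 3)
```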
To add to @dth's answer, if you want a list, you can do:
val l = Map(1->2,3->4).view.map( ... ).toList
Here the map function is applied to a lazy IterableView, which also outputs an IterableView, and the actual construction is triggered by toList.
Note: not using view can also result in dangerous behavior. Example:
val m = Map(2->2,3->3)
val l = m.map{ case (k,v) => (k/2,v) }.toList
// List((1,3))
val l2 = m.view.map{ case (k,v) => (k/2,v) }.toList
// List((1,2), (1,3))
Here, omitting the .view makes map output a Map, which overwrites duplicate keys (and does additional, unnecessary work).

Access element of an Array and return a monad?

If I access an index outside the bounds of an Array, I get an ArrayIndexOutOfBoundsException, eg:
val a = new Array[String](3)
a(4)
java.lang.ArrayIndexOutOfBoundsException: 4
Is there a method to return a monad instead (eg: Option)? And why doesn't the default collections apply method for Array support this?
You can use lift:
a.lift(4) // None
a.lift(2) // Some(null)
An Array[T] can be used as a PartialFunction[Int, T] (through its implicit conversion to a sequence), and lift turns that into a Function[Int, Option[T]] from the index to an option of the element type.
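A slightly fuller sketch of how lift reads in practice, using an array with real values rather than nulls (letters and the fallback "?" are just illustrative choices):

```scala
val letters = Array("a", "b", "c")

val inBounds = letters.lift(1)    // index exists
val outOfBounds = letters.lift(4) // index does not exist, no exception

// Option composes with the usual combinators, e.g. a fallback value.
val withDefault = letters.lift(4).getOrElse("?")

println(inBounds)    // Some(b)
println(outOfBounds) // None
println(withDefault) // ?
```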
You could use scala.util.Try:
scala> val a = new Array[String](3)
a: Array[String] = Array(null, null, null)
scala> import scala.util.Try
import scala.util.Try
scala> val fourth = Try(a(3))
fourth: scala.util.Try[String] = Failure(java.lang.ArrayIndexOutOfBoundsException: 3)
scala> val third = Try(a(2))
third: scala.util.Try[String] = Success(null)
Another good idea is not using Array in the first place, but that's outside the scope of this question.
Where it is relevant to your question, though, is why it behaves this way. Array is intended to function like, and be compatible with, a Java array. Since this is how Java arrays work, this is how Array works.

Scala collections: transform content and type of the collection in one pass

I am looking for a way to efficiently transform content and type of the collection.
For example apply map to a Set and get result as List.
Note that I want to build result collection while applying transformation to source collection (i.e. without creating intermediate collection and then transforming it to desired type).
So far I've come up with this (for Set being transformed into List while incrementing every element of the set):
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder
val set = Set(1, 2, 3)
val cbf = new CanBuildFrom[Set[Int], Int, List[Int]] {
def apply(from: Set[Int]): Builder[Int, List[Int]] = List.newBuilder[Int]
def apply(): Builder[Int, List[Int]] = List.newBuilder[Int]
}
val list: List[Int] = set.map(_ + 1)(cbf)
... but I feel like there should be a shorter and more elegant way to do this (without manually implementing CanBuildFrom every time I need it).
Any ideas how to do this?
This is exactly what scala.collection.breakOut is for—it'll conjure up the CanBuildFrom you need if you tell it the desired new collection type:
import scala.collection.breakOut
val xs: List[Int] = Set(1, 2, 3).map(_ + 1)(breakOut)
This will only traverse the collection once. See this answer for more detail.
I think that this might do what you want, though I am not 100% positive so maybe someone could confirm:
(Set(1, 2, 3).view map (_ + 1)).toList
view creates, in this case, an IterableView, which, when you call map on it, does not actually perform the mapping. When toList is called, the map is performed and a list is built with the results.
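One way to check that no intermediate Set is built is to pick a function that produces duplicates. This is only a sketch, and the division by 2 is chosen purely to force a collision: a strict map on the Set drops the duplicate, while the view passes every result straight through to the List.

```scala
val source = Set(1, 2, 3)

// Strict map builds a Set first, so the duplicate 1 is dropped before toList.
val viaSet = source.map(_ / 2).toList

// The view maps lazily and toList collects every result directly.
val viaView = source.view.map(_ / 2).toList

println(viaSet.sorted)  // List(0, 1)
println(viaView.sorted) // List(0, 1, 1)
```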