Converting mutable collection to immutable - scala

I'm looking for the best way to convert a collection.mutable.Seq[T] to a collection.immutable.Seq[T].

If you want to convert ListBuffer into a List, use .toList. I mention this because that particular conversion is performed in constant time. Note, though, that any further use of the ListBuffer will result in its contents being copied first.
Otherwise, you can do collection.immutable.Seq(xs: _*), assuming xs is mutable, as you are unlikely to get better performance any other way.

As specified:
def convert[T](sq: collection.mutable.Seq[T]): collection.immutable.Seq[T] =
  collection.immutable.Seq[T](sq: _*)
Addition
The native methods are a little tricky to use. They are already defined on scala.collection.Seq and you’ll have to take a close look whether they return a collection.immutable or a collection.mutable. For example .toSeq returns a collection.Seq which makes no guarantees about mutability. .toIndexedSeq however, returns a collection.immutable.IndexedSeq so it seems to be fine to use. I’m not sure though, if this is really the intended behaviour as there is also a collection.mutable.IndexedSeq.
The safest approach would be to convert it manually to the intended collection as shown above. When using a native conversion, I think it is best practice to add a type annotation including (mutable/immutable) to ensure the correct collection is returned.
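As a minimal sketch of the type-annotation advice (the object and value names here are only illustrative), annotating the result with an explicitly immutable type makes the compiler reject any conversion that does not return an immutable collection:

```scala
import scala.collection.{immutable, mutable}

object ConvertDemo {
  val ms: mutable.Seq[Int] = mutable.ArrayBuffer(4, 5, 6)

  // The annotations only compile because toList and toIndexedSeq
  // return immutable collections.
  val asList: immutable.Seq[Int] = ms.toList
  val asIndexed: immutable.IndexedSeq[Int] = ms.toIndexedSeq

  // Mutating the source afterwards does not affect the converted copies.
  ms(0) = 99
}
```

If a conversion method returned a plain collection.Seq instead, the annotated line would fail to compile, which is the point of adding the annotation.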

toList (or toStream if you want it lazy) is the preferred way if you want a LinearSeq, as you can be sure what you get back is immutable (because List and Stream are). At the time of writing there was no toVector method for getting an immutable IndexedSeq (one was added in Scala 2.10), but it seems that toIndexedSeq gives you a Vector (which is immutable) most if not all of the time.
Another way is to use breakOut. This will look at the type you're aiming for in your return type, and if possible oblige you. e.g.
scala> val ms = collection.mutable.Seq(1,2,3)
ms: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3)
scala> val r: List[Int] = ms.map(identity)(collection.breakOut)
r: List[Int] = List(1, 2, 3)
scala> val r: collection.immutable.Seq[Int] = ms.map(identity)(collection.breakOut)
r: scala.collection.immutable.Seq[Int] = Vector(1, 2, 3)
For more info on such black magic, get some strong coffee and see this question.
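Note that breakOut was removed in Scala 2.13. A rough equivalent there, sketched under the assumption that you are on 2.13 or later, is to name the target collection with .to(...); the object and value names below are only illustrative:

```scala
import scala.collection.{immutable, mutable}

object ToDemo {
  val ms: mutable.Seq[Int] = mutable.ArrayBuffer(1, 2, 3)

  // The companion object passed to `to` selects the result collection.
  val asList: List[Int] = ms.to(List)
  val asImmutableSeq: immutable.Seq[Int] = ms.to(immutable.Seq)

  // Mapping over the iterator first avoids building an intermediate collection.
  val asVector: Vector[Int] = ms.iterator.map(_ * 10).to(Vector)
}
```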

If you are also working with Set and Map, you can try the following, using TreeSet as an example.
import scala.collection.immutable.TreeSet
import scala.collection.mutable
val immutableSet = TreeSet("blue", "green", "red", "yellow")
// converting an immutable set to a mutable set
val mutableSet = mutable.Set.empty ++= immutableSet
// converting a mutable set back to an immutable set
val anotherImmutableSet = Set.empty ++ mutableSet
The above example is from the book Programming in Scala.
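The same round trip works for maps. The following is a small sketch with made-up keys and values, using toMap to take an immutable snapshot and ++= to fill a fresh mutable map:

```scala
import scala.collection.mutable

object MapConversion {
  val counts: mutable.Map[String, Int] = mutable.Map("a" -> 1, "b" -> 2)

  // toMap copies the current entries into an immutable Map.
  val frozen: Map[String, Int] = counts.toMap

  // Later changes to the mutable map are not visible in the snapshot.
  counts("c") = 3

  // Going back: start from an empty mutable map and add all entries.
  val thawed: mutable.Map[String, Int] = mutable.Map.empty[String, Int] ++= frozen
}
```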

Related

Scala ListBuffer add-all behavior is not consistent, some elements are lost

I came across a Scala collection behavior that seems somewhat dubious. Is this expected behavior?
The following is simplified code that reproduces the issue.
import scala.collection.mutable.{ Map => MutableMap }
import scala.collection.mutable.ListBuffer
val relationCache = MutableMap.empty[String, String]
val relationsToFlush = new ListBuffer[String]()
def addRelation(relation: String) = relationCache(relation) = relation
Range(0,170).map("string-#" + _).foreach(addRelation(_))
val relations = relationCache.values.toSeq /* Bad */
// val relations = relationCache.map(_._2).toSeq /* Good */
relationCache.clear
relationsToFlush ++= relations
relationsToFlush.size
The code has two collections: a mutable map (relationCache) and a mutable list (relationsToFlush). relationCache collects elements; at a later point they should be transferred to relationsToFlush and the cache should be cleared.
However, not all elements are transferred to relationsToFlush. The output is as follows:
scala> relationsToFlush ++= relCache
res14: relationsToFlush.type = ListBuffer(string-#80, string-#27)
scala> relationsToFlush.size
res15: Int = 2
Whereas if the code is changed to
val relations = relationCache.map(_._2).toSeq /* Good */
then we get the expected result (170 elements).
My guess is that the 'good' code creates a new list with those elements while the other returns them directly from the map, hence they are lost when clear is called on the map. However, shouldn't the reference count get bumped up when the result is assigned to the relations variable?
Scala Version: 2.11
You've stumbled across one of the vagaries of the Seq trait.
Since Seq is a trait, and not a class, it's not really a collection type distinct from the others, leading some to refer to it as a failed abstraction.
Consider the following REPL session. (Scala 2.12.7)
scala> List(1,2,3).toSeq
res4: scala.collection.immutable.Seq[Int] = List(1, 2, 3)
scala> Stream(1,2,3).toSeq
res5: scala.collection.immutable.Seq[Int] = Stream(1, ?)
scala> Vector(1,2,3).toSeq
res6: scala.collection.immutable.Seq[Int] = Vector(1, 2, 3)
Notice how the underlying collection type is retained. In particular the lazy Stream has not realized all its elements. That's what's happening here:
relationCache.values.toSeq
The toSeq transformation returns a Stream and nothing thereafter is forcing the realization of the rest of the elements.
The problem is combining lazy evaluation with mutable data structures.
The value of relations is lazy and is not computed until it is used. Since it is based on a mutable.Map collection, the results will be based on whatever that mutable.Map has at the time relations is first used.
To complicate things, relations is actually a Stream which means that those values are locked the first time they are read, meaning the subsequent changes to the mutable.Map will not affect that value of relations.
The simple fix is to use toList rather than toSeq, because List is not a lazy collection and will be evaluated immediately.
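A sketch of that fix, using shortened names in place of the ones from the question: take a strict snapshot with toList before clearing the map. For contrast, the value called live below keeps reading from the map, so it sees the map's contents at the time it is traversed.

```scala
import scala.collection.mutable

object FlushDemo {
  val cache = mutable.Map.empty[String, String]
  for (i <- 0 until 170) cache("string-#" + i) = "string-#" + i

  // `values` alone is a wrapper that reads from the map when traversed.
  val live: Iterable[String] = cache.values

  // toList copies all 170 values immediately.
  val snapshot: List[String] = cache.values.toList

  cache.clear()

  val buffer = mutable.ListBuffer.empty[String]
  buffer ++= snapshot
}
```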

What is the difference between List.view and LazyList?

I am new to Scala and I just learned that LazyList was created to replace Stream, and at the same time they added the .view methods to all collections.
So, I am wondering why was LazyList added to Scala collections library, when we can do List.view?
I just looked at the Scaladoc, and it seems that the only difference is that LazyList has memoization, while View does not. Am I right or wrong?
Stream elements are realized lazily except for the 1st (head) element. That was seen as a deficiency.
A List view is evaluated lazily (and re-evaluated on every traversal) but, as far as I know, the underlying List has to be completely realized first.
def bang :Int = {print("BANG! ");1}
LazyList.fill(4)(bang) //res0: LazyList[Int] = LazyList(<not computed>)
Stream.fill(3)(bang) //BANG! res1: Stream[Int] = Stream(1, <not computed>)
List.fill(2)(bang).view //BANG! BANG! res2: SeqView[Int] = SeqView(<not computed>)
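The memoization difference raised in the question can be observed by counting how often the mapping function runs. This is a small sketch for Scala 2.13 or later; MemoDemo and counts are illustrative names:

```scala
object MemoDemo {
  // Returns (function calls for a LazyList traversed twice,
  //          function calls for a view traversed twice).
  def counts(): (Int, Int) = {
    var lazyCalls = 0
    val ll = LazyList(1, 2, 3).map { x => lazyCalls += 1; x * 2 }
    ll.toList // computes and caches the three mapped elements
    ll.toList // reuses the cached elements

    var viewCalls = 0
    val v = List(1, 2, 3).view.map { x => viewCalls += 1; x * 2 }
    v.toList // runs the function three times
    v.toList // runs it three more times, nothing is cached

    (lazyCalls, viewCalls)
  }
}
```

Here MemoDemo.counts() gives (3, 6): the LazyList evaluates each element once, while the view recomputes on every traversal.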
In 2.13, you can't force your way back from a view to the original collection type:
scala> import scala.util.chaining._
import scala.util.chaining._
scala> case class C(n: Int) { def bump = new C(n+1).tap(i => println(s"bump to $i")) }
defined class C
scala> List(C(42)).map(_.bump)
bump to C(43)
res0: List[C] = List(C(43))
scala> List(C(42)).view.map(_.bump)
res1: scala.collection.SeqView[C] = SeqView(<not computed>)
scala> .force
^
warning: method force in trait View is deprecated (since 2.13.0): Views no longer know about their underlying collection type; .force always returns an IndexedSeq
bump to C(43)
res2: scala.collection.IndexedSeq[C] = Vector(C(43))
scala> LazyList(C(42)).map(_.bump)
res3: scala.collection.immutable.LazyList[C] = LazyList(<not computed>)
scala> .force
bump to C(43)
res4: res3.type = LazyList(C(43))
A function taking a view and optionally returning a strict realization would have to also take a "forcing function" such as _.toList, if the caller needs to choose the result type.
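One way such a function could look on Scala 2.13 is sketched below. mapThenForce is a hypothetical helper, not a library method; the caller passes the forcing function as the last argument and thereby picks the strict result type:

```scala
import scala.collection.View

object ForceDemo {
  // Hypothetical helper: transforms the view lazily, then lets the
  // caller-supplied `force` decide which strict collection to build.
  def mapThenForce[A, B, C](xs: View[A])(f: A => B)(force: View[B] => C): C =
    force(xs.map(f))

  val asList: List[Int] = mapThenForce(List(1, 2, 3).view)(_ + 1)(_.toList)
  val asVector: Vector[Int] = mapThenForce(List(1, 2, 3).view)(_ + 1)(_.toVector)
}
```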
I don't do this sort of thing at my day job, but this behavior surprises me.
The difference is that LazyList can be generated from huge/infinite sequence, so you can do something like:
val xs = (1 to 1_000_000_000).to(LazyList)
And that won't run out of memory. After that you can operate on the lazy list with transformers. You won't be able to do the same by creating a List and taking a view from it. Having said that, SeqView has a much richer set of methods compared to LazyList, and that's why you can actually take a view of a LazyList like:
val xs = (1 to 1_000_000_000).to(LazyList)
val listView = xs.view
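To illustrate the infinite case, here is a minimal sketch (Scala 2.13 or later, with illustrative names) that transforms an unbounded LazyList and only realizes the elements that are actually requested:

```scala
object LazyDemo {
  // LazyList.from(1) is conceptually infinite; map stays lazy.
  val evens: LazyList[Int] = LazyList.from(1).map(_ * 2)

  // Only the first three elements are ever computed.
  val firstThree: List[Int] = evens.take(3).toList
}
```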

Scala collections: transform content and type of the collection in one pass

I am looking for a way to efficiently transform content and type of the collection.
For example apply map to a Set and get result as List.
Note that I want to build result collection while applying transformation to source collection (i.e. without creating intermediate collection and then transforming it to desired type).
So far I've come up with this (for Set being transformed into List while incrementing every element of the set):
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder
val set = Set(1, 2, 3)
val cbf = new CanBuildFrom[Set[Int], Int, List[Int]] {
  def apply(from: Set[Int]): Builder[Int, List[Int]] = List.newBuilder[Int]
  def apply(): Builder[Int, List[Int]] = List.newBuilder[Int]
}
val list: List[Int] = set.map(_ + 1)(cbf)
... but I feel like there should be a shorter and more elegant way to do this (without manually implementing CanBuildFrom every time I need it).
Any ideas how to do this?
This is exactly what scala.collection.breakOut is for—it'll conjure up the CanBuildFrom you need if you tell it the desired new collection type:
import scala.collection.breakOut
val xs: List[Int] = Set(1, 2, 3).map(_ + 1)(breakOut)
This will only traverse the collection once. See this answer for more detail.
I think that this might do what you want, though I am not 100% positive so maybe someone could confirm:
(Set(1, 2, 3).view map (_ + 1)).toList
view creates, in this case, an IterableView, which, when you call map on it, does not actually perform the mapping. When toList is called, the map is performed and a list is built with the results.
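A closely related variant, sketched here as an alternative rather than as the approach from the answers above, goes through the iterator. It does not depend on breakOut or on the details of views, and the function is applied element by element while the List is being built:

```scala
object OnePass {
  val source: Set[Int] = Set(1, 2, 3)

  // iterator.map is lazy; toList pulls each mapped element once
  // and builds the List directly, with no intermediate Set.
  val result: List[Int] = source.iterator.map(_ + 1).toList
}
```

The order of the resulting list follows the iteration order of the set, which is not something to rely on for larger sets.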

Fastest way to append sequence objects in loop

I have a for loop within which I get a Seq[Seq[(String,Int)]] on every run. I currently iterate through the Seq[Seq[(String,Int)]] to get every Seq[(String,Int)] and then append it to a ListBuffer[(String,Int)].
Here is the code:
import scala.collection.mutable.ListBuffer
var lis: Seq[Seq[(String, Int)]] = Seq.empty // reassigned on every run of the loop
var matches = new ListBuffer[(String, Int)]
someLoop.foreach { k =>
  // someLoop produces a new lis on every run,
  // and its contents need to be added to the matches list
  lis.foreach(j => matches.appendAll(j))
}
Is there a better way to do this without looping through the Seq[Seq[(String,Int)]], say by directly adding all the Seq objects from the Seq to the ListBuffer?
I tried the ++ operator, by adding matches and lis directly. It didn't work either. I use Scala 2.10.2
Try this:
matches.appendAll(lis.flatten)
You can also avoid the mutable ListBuffer altogether. lis.flatten will be a Seq[(String, Int)], so you can shorten your code like this:
val lis = ... //whatever that is Seq[Seq[(String, Int)]]
val flatLis = lis.flatten // Seq[(String, Int)]
Avoid vars and mutable structures like ListBuffer as much as you can.
You don't need to append to an empty ListBuffer, just create it directly:
import scala.collection.breakOut
import scala.collection.mutable.ListBuffer
val matches: ListBuffer[(String, Int)] =
  lis.flatMap(x => x)(breakOut)
breakOut is the magic here. Flattening a Seq[Seq[T]] would usually create a Seq[T] that you'd then have to convert to a ListBuffer. Using breakOut causes the operation to look at the expected output type and build that kind of collection instead. (flatten itself does not accept a CanBuildFrom, so the flattening is written as flatMap here, which does.)
Of course... You were only using ListBuffer for mutability anyway, so a Seq[T] is probably exactly what you really want. In which case, just let the inferencer do its thing:
val matches = lis.flatten
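If the values from all runs of the loop are available as a collection, the whole accumulation can be written without any mutable state. In this sketch, runs is a hypothetical stand-in for what the loop produces, one Seq[Seq[(String, Int)]] per run:

```scala
object FlattenRuns {
  // Hypothetical data: each element is the `lis` from one run of the loop.
  val runs: Seq[Seq[Seq[(String, Int)]]] = Seq(
    Seq(Seq("a" -> 1, "b" -> 2), Seq("c" -> 3)),
    Seq(Seq("d" -> 4))
  )

  // The first flatten joins the runs, the second joins the inner Seqs.
  val matches: Seq[(String, Int)] = runs.flatten.flatten
}
```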

Scala Option object inside another Option object

I have a model with some Option fields, which in turn contain other Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSON documents, and sometimes the data may contain nulls, which is the reason for this model design.
So the question is: what is the best way to get the deepest field?
First.get.second.get.third.get.numberOfSmth.get
The above looks really ugly, and it throws an exception if any of the objects is None. I looked into the Scalaz library, but didn't figure out a better way to do this.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
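With the model from the question, numberOfSmth is itself an Option[Int], so ending the chain with map would produce a nested Option. A minimal sketch of the difference (MapVsFlatMap is only an illustrative wrapper, and Third is redeclared inside it to keep the example self-contained):

```scala
object MapVsFlatMap {
  case class Third(numberOfSmth: Option[Int])

  val third: Option[Third] = Some(Third(Some(3)))

  // map wraps the inner Option again.
  val nested: Option[Option[Int]] = third.map(_.numberOfSmth)

  // flatMap collapses the two layers; flatten does the same afterwards.
  val flat: Option[Int] = third.flatMap(_.numberOfSmth)
  val alsoFlat: Option[Int] = nested.flatten
}
```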
Of course this can get ugly very fast. For this reason scala supports for comprehensions as a syntactic sugar. The above can be rewritten as:
for {
  first <- First
  second <- first.second
  third <- second.third
} yield third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words, x flatMap(y flatMap z) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
  first <- yourFirst
  second <- first.second
  third <- second.third
  number <- third.numberOfSmth
} yield number
Alternatively, you can use nested flatMaps.
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
  first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
  for {
    f <- first
    s <- f.second
    t <- s.third
    n <- t.numberOfSmth
  } yield n
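Putting the pieces together, here is a self-contained sketch with the model from the question and two made-up sample values, showing that a None anywhere in the chain simply yields None:

```scala
object DeepOption {
  case class Third(numberOfSmth: Option[Int])
  case class Second(third: Option[Third], title: Option[String])
  case class First(second: Option[Second], name: Option[String])

  def getN(first: Option[First]): Option[Int] =
    first.flatMap(_.second).flatMap(_.third).flatMap(_.numberOfSmth)

  // Sample data (invented for illustration): one complete chain,
  // one that stops at `second`.
  val full: Option[First] =
    Some(First(Some(Second(Some(Third(Some(7))), None)), None))
  val broken: Option[First] = Some(First(None, Some("x")))
}
```

DeepOption.getN(DeepOption.full) is Some(7), while both DeepOption.getN(DeepOption.broken) and DeepOption.getN(None) are None, with no exception thrown.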
I think it is an overkill for your problem but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As an introduction you might want to check, for instance, this SO answer or this tutorial. Whether it makes sense to use Lenses in your case depends on whether you also have to perform a lot of updates in your nested option structure (note: update not in the mutable sense, but returning a new, modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.
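To give a rough idea of the concept, here is a hand-rolled sketch. The Lens class below is hypothetical and not the API of Scalaz or any other library, and it uses plain nested case classes (Inner and Outer, invented for the example) rather than the Option-heavy model from the question:

```scala
object LensSketch {
  // Hypothetical minimal lens: a getter plus a copying setter.
  case class Lens[S, A](get: S => A, set: (S, A) => S) {
    // Composition: focus further into the value this lens points at.
    def andThen[B](other: Lens[A, B]): Lens[S, B] =
      Lens[S, B](
        s => other.get(get(s)),
        (s, b) => set(s, other.set(get(s), b))
      )
  }

  case class Inner(n: Int)
  case class Outer(inner: Inner, label: String)

  val innerL = Lens[Outer, Inner](_.inner, (o, i) => o.copy(inner = i))
  val nL = Lens[Inner, Int](_.n, (i, n) => i.copy(n = n))

  // One composed lens replaces the nested copy calls.
  val outerN: Lens[Outer, Int] = innerL.andThen(nL)
}
```

With this, LensSketch.outerN.set(Outer(Inner(1), "a"), 5) returns Outer(Inner(5), "a") and leaves the original value untouched, which is the kind of update that otherwise needs o.copy(inner = o.inner.copy(n = 5)).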