Why is there no mutable TreeMap in Scala?

Is it lack of time, some technical problem or is there a reason why it should not exist?

It's just a missing case that will presumably eventually be filled in. There is no reason not to do it, and in certain cases it would be considerably faster than the immutable tree (since a modification requires O(log n) new node objects with an immutable tree and only one with a mutable tree).
Edit: in fact, it was filled in in Scala 2.12: scala.collection.mutable.TreeMap.
(There is a corresponding mutable TreeSet as well.)
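For reference, a minimal sketch of the mutable TreeMap in use, assuming Scala 2.12 or later (MutableTreeMapDemo is just an illustrative name):

```scala
import scala.collection.mutable

object MutableTreeMapDemo {
  // Builds a sorted, mutable map; keys stay in Ordering[String] order.
  def build(): mutable.TreeMap[String, Int] = {
    val m = mutable.TreeMap.empty[String, Int]
    m += ("cc" -> 3)
    m += ("aa" -> 2)
    m("bb") = 5  // insert via update
    m("aa") = 10 // overwrite the existing value in place
    m
  }

  def main(args: Array[String]): Unit = {
    val m = build()
    println(m.keys.toList) // List(aa, bb, cc)
    println(m.head)        // (aa,10)
  }
}
```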

Meanwhile you can use the Java TreeMap, which is exactly what you need.
val m = new java.util.TreeMap[String, Int]()
m.put("aa", 2)
m.put("cc", 3)
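java.util.TreeMap implements java.util.NavigableMap, so besides put and get it offers sorted navigation. A small sketch (JavaTreeMapDemo and build are illustrative names, not part of any library):

```scala
import java.util.{TreeMap => JTreeMap}

object JavaTreeMapDemo {
  def build(): JTreeMap[String, Int] = {
    val m = new JTreeMap[String, Int]()
    m.put("cc", 3)
    m.put("aa", 2)
    m.put("bb", 5)
    m
  }

  def main(args: Array[String]): Unit = {
    val m = build()
    println(m.firstKey)           // aa (smallest key)
    println(m.ceilingKey("ab"))   // bb (smallest key >= "ab")
    println(m.headMap("cc").size) // 2  (keys strictly below "cc")
    m.put("aa", 7)                // replaces the value in place
    println(m.get("aa"))          // 7
  }
}
```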

I assume that the reason is that having a mutable variant doesn't bring a big benefit. There are some cases, mentioned in the other answers, where a mutable map could be a bit more efficient, for example when replacing an already existing value: a mutable variant would save the creation of new nodes, but the complexity would still be O(log n).
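To make the node counts concrete, here is a minimal sketch of path copying in a hypothetical, unbalanced persistent binary search tree (PTree, PNode and PathCopy are illustrative names, not Scala library types). Scala's immutable TreeMap is a balanced red-black tree, so its search paths are O(log n) long and rebalancing can copy a few extra nodes, but the principle is the same:

```scala
sealed trait PTree
case object PLeaf extends PTree
final case class PNode(key: Int, value: String, left: PTree, right: PTree) extends PTree

object PathCopy {
  // Returns the new tree and the number of PNode objects allocated.
  // Every node on the search path is copied; untouched subtrees are shared.
  def updated(t: PTree, k: Int, v: String): (PTree, Int) = t match {
    case PLeaf => (PNode(k, v, PLeaf, PLeaf), 1)
    case n: PNode =>
      if (k < n.key) { val (nl, c) = updated(n.left, k, v); (n.copy(left = nl), c + 1) }
      else if (k > n.key) { val (nr, c) = updated(n.right, k, v); (n.copy(right = nr), c + 1) }
      else (n.copy(value = v), 1) // replacing a value still copies this node
  }

  def fromKeys(keys: List[Int]): PTree =
    keys.foldLeft(PLeaf: PTree)((t, k) => updated(t, k, "v" + k)._1)

  // Balanced sample tree with root 4.
  val sample: PTree = fromKeys(List(4, 2, 6, 1, 3, 5, 7))
}
```

With this sample tree, replacing the value at key 5 allocates three nodes (5, 6 and the root 4), while the root's left subtree is reused as-is; a mutable tree would overwrite the value without allocating anything.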
If you want to keep a shared reference to the map, you can use ImmutableMapAdaptor which wraps any immutable map into a mutable structure.
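If ImmutableMapAdaptor is not available in your Scala version, the same idea is easy to sketch by hand. SharedTreeMap below is a hypothetical class, not a library type: it holds a var pointing at an immutable TreeMap, so any code sharing the wrapper sees updates, while snapshots handed out earlier never change.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical wrapper: a mutable reference to an immutable TreeMap.
final class SharedTreeMap[K: Ordering, V] {
  private var underlying: TreeMap[K, V] = TreeMap.empty[K, V]

  def +=(kv: (K, V)): this.type = { underlying = underlying.updated(kv._1, kv._2); this }
  def -=(k: K): this.type = { underlying = underlying - k; this }
  def get(k: K): Option[V] = underlying.get(k)
  // Stable snapshot: later updates never change a TreeMap returned here.
  def snapshot: TreeMap[K, V] = underlying
}

object SharedTreeMapDemo {
  // Returns the keys of an early snapshot and of the current state.
  def run(): (List[String], List[String]) = {
    val shared = new SharedTreeMap[String, Int]
    shared += ("c" -> 3)
    shared += ("a" -> 1)
    val before = shared.snapshot
    shared -= "c"
    (before.keys.toList, shared.snapshot.keys.toList)
  }
}
```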

You'll also notice that TreeSet doesn't have a mutable equivalent either. That's because they share the common base class RedBlack, and the underlying data structure that keeps a TreeSet's elements or a TreeMap's keys ordered is a red-black tree. I don't know much about this data structure, but it's fairly complex (insertion and removal take O(log n) and involve rebalancing, which is more expensive than in hash-based Maps), so I assume that had something to do with a mutable variant not being included.
Basically, it's probably because the underlying data structure isn't readily mutable, so TreeMap isn't either. So, to answer your question, it's a technical problem. It can definitely be done, though; there's just not much of a use case for it.

There may be performance reasons for a mutable TreeMap, but usually you can use an immutable map in the same way as you would a mutable one. You just have to assign it to a var rather than a val. It would be the same as for HashMap, which has mutable and immutable variants:
val mh = collection.mutable.HashMap[Int, Int]()
var ih = collection.immutable.HashMap[Int, Int]()
mh += (1 -> 2)
ih += (1 -> 2)
mh // scala.collection.mutable.HashMap[Int,Int] = Map(1 -> 2)
ih // scala.collection.immutable.HashMap[Int,Int] = Map(1 -> 2)
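The same var-of-immutable pattern works for the sorted case with scala.collection.immutable.TreeMap. A minimal sketch (VarTreeMapDemo is an illustrative name), using updated, which returns a new TreeMap and leaves the old one intact:

```scala
import scala.collection.immutable.TreeMap

object VarTreeMapDemo {
  // Returns (an earlier version, the latest version) of the same var.
  def run(): (TreeMap[String, Int], TreeMap[String, Int]) = {
    var t = TreeMap.empty[String, Int] // immutable map held in a reassignable var
    t = t.updated("cc", 3)             // each update returns a new TreeMap
    t = t.updated("aa", 2)
    val before = t                     // keep a reference to this version
    t = t.updated("aa", 20)            // "replacing" a value builds another tree
    (before, t)
  }
}
```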

Related

Immutable Data Structures in Scala

We know that Scala supports immutable data structures, i.e. each time you update the list, it creates a new object and reference on the heap.
Example:
val xs: List[Int] = List.apply(22)
val newList = xs :+ 33
So when I append the second element to the list, it creates a new list that contains both 22 and 33. This works exactly like the immutable String in Java.
So each time I append an element to the list, a new object is created. This does not look efficient to me.
Are special data structures, such as persistent data structures, used when dealing with this? Does anyone know about this?
Appending to a list has O(n) complexity and is inefficient. A general approach is to prepend to a list while building it, and finally reverse it.
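A minimal sketch of the prepend-then-reverse idiom (BuildList and squares are hypothetical names):

```scala
object BuildList {
  // Prepending with :: is O(1), so n prepends cost O(n); one final reverse is O(n).
  def squares(n: Int): List[Int] = {
    var acc: List[Int] = Nil
    for (i <- 1 to n) acc = (i * i) :: acc // newest element goes to the front
    acc.reverse                            // restore insertion order
  }
}
```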
Your question about creating a new object still applies to the prepend. Note, however, that since xs is immutable, newList just points to xs for the rest of the data after the prepend.
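The sharing can be checked directly with reference equality (eq). A small sketch using the values from the question; SharingDemo is just an illustrative wrapper:

```scala
object SharingDemo {
  val xs: List[Int] = List(22)
  // Prepending allocates one new cons cell whose tail is the existing list.
  val newList: List[Int] = 33 :: xs
}
```

Here SharingDemo.newList.tail eq SharingDemo.xs is true: the original list is reused, not copied.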
While @manojlds is correct in his analysis, the original post asked about the efficiency of duplicating list nodes whenever you do an operation.
As @manojlds said, constructing lists often requires thinking backwards, i.e., building a list and then reversing it. There are a number of other situations where list building requires "needless" copying.
To that end, there is a mutable data structure available in Scala called ListBuffer which you can use to build up your list and then extract the result as an immutable list:
import scala.collection.mutable.ListBuffer
val xsa = ListBuffer[Int](22)
xsa += 33
val newList = xsa.toList
However, the fact that the list data structure is, in general, immutable means that you have some very useful tools to analyze, decompose and recompose the list. Many built-in operations take advantage of the immutability. By extension, your own programs can also take advantage of this immutability.

Scala - initializing mutable Maps and exposing them as immutable

Is there any "good" code pattern for a Class initializing and populating private mutable Maps, and then exposing them as immutable ones? Or should I just eternally regret my functional misconduct in such cases?
In a certain class, I am initializing some Maps as mutable ones, as the logic for initializing them does not fit very naturally, in this one case, with a purely immutable creation approach. Or I was just too lazy to model it immutably.
Now I end up with ugly Scala code: after all the initialization computation, I copy-convert the mutable Maps into immutable ones (mostly through .toMap). This is already ugly because (1) the code has double the Maps and the double naming feels a bit off, and (2) the conversion lines look more involved than I'd hope for.
Additionally, (3) I dislike that the type definitions of the resulting immutable Maps can only reside at the bottom of the code now, as they can only be declared after the initialization computation (or can they be declared lazy and moved to the top? Still not entirely elegant).
Is there an elegant way to wrap up mutable Map initialization code?
Something like:
scala> class X {
| private val mb = collection.immutable.Map.newBuilder[String, Int]
| def m = mb.result
| mb += ("a" -> 1) // stuff
| }
defined class X
scala> new X().m
res0: scala.collection.immutable.Map[String,Int] = Map(a -> 1)
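Another option, sketched under the assumption that the initialization logic can run inside the constructor, is to confine the mutable map to a block that initializes a val. Only one name is exposed, the conversion happens in one place, and the immutable type is declared at the top. WordCounts and entries are hypothetical names:

```scala
import scala.collection.mutable

// Hypothetical class: `entries` stands in for whatever drives the initialization.
class WordCounts(entries: Seq[(String, Int)]) {
  // Declared at the top with its immutable type; the mutable buffer is local
  // to this block and never escapes, so only the immutable Map is exposed.
  val counts: Map[String, Int] = {
    val buf = mutable.Map.empty[String, Int]
    for ((k, n) <- entries) buf(k) = buf.getOrElse(k, 0) + n
    buf.toMap
  }
}
```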
I think using vars of immutables rather than vals of mutables, evolving the var collections according to my initialization logic, can be the optimal pattern wherever applicable: no duplicate collections, no code to convert from mutable to immutable, clear type definitions at the top of the class.
However, my understanding is that this functional approach trades off against runtime efficiency, as mutable collections can offer better performance for the modifications made while building them.

scala.collection.breakOut vs views

This SO answer describes how scala.collection.breakOut can be used to prevent creating wasteful intermediate collections. For example, here we create an intermediate Seq[(String,String)]:
val m = List("A", "B", "C").map(x => x -> x).toMap
By using breakOut we can prevent the creation of this intermediate Seq:
import scala.collection.breakOut // Scala 2.12 and earlier
val m: Map[String,String] = List("A", "B", "C").map(x => x -> x)(breakOut)
Views solve the same problem and in addition access elements lazily:
val m = (List("A", "B", "C").view map (x => x -> x)).toMap
I am assuming the creation of the View wrappers is fairly cheap, so my question is: Is there any real reason to use breakOut over Views?
You're going to make a trip from England to France.
With a view: you're writing a set of notes in your notebook, and boom, once you've called .force you start carrying them all out: buy a ticket, board the plane, and so on.
With breakOut: you're departing and boom, you're in Paris looking at the Eiffel Tower. You don't remember exactly how you got there, but you did actually make the trip; you just didn't make any memories.
It's a bad analogy, but I hope this gives you a taste of the difference between them.
I don't think views and breakOut are identical.
breakOut returns a CanBuildFrom instance and is used to simplify transformation operations by eliminating intermediary steps, e.g. getting from A to B without the intermediary collection. Using breakOut means letting the expected result type select the appropriate builder, for maximum efficiency of producing new items in a given scenario.
Views deal with a different type of efficiency, the main sales pitch being: "No more new objects". Views store lightweight references to the underlying collection to tackle different usage scenarios: lazy access, etc.
Bottom line:
If you map over a view, you may still get an intermediary collection of references created before the expected result can be produced. You could still get better performance from:
collection.view.map(someFn)(breakOut)
than from:
collection.view.map(someFn)
As of Scala 2.13, this is no longer a concern: breakOut has been removed and views are the recommended replacement.
From the Scala 2.13 collections rework notes:
Views are also the recommended replacement for collection.breakOut.
For example,
val s: Seq[Int] = ...
val set: Set[String] = s.map(_.toString)(collection.breakOut)
can be expressed with the same performance characteristics as:
val s: Seq[Int] = ...
val set = s.view.map(_.toString).to(Set)
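A self-contained version of the two examples in this thread, assuming Scala 2.13 or later (ViewInsteadOfBreakOut is an illustrative name):

```scala
object ViewInsteadOfBreakOut {
  // Map over a view, then build the target collection directly,
  // so no intermediate Seq[String] is materialized.
  def toStringSet(s: Seq[Int]): Set[String] = s.view.map(_.toString).to(Set)

  // The Map example from the question, rewritten the same way.
  def selfMap(xs: List[String]): Map[String, String] = xs.view.map(x => x -> x).toMap
}
```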
What flavian said.
One use case for views is to conserve memory. For example, if you had a million-character-long string original, and needed to use, one by one, all of the million suffixes of that string, you might use a collection of views on the original string:
val v = original.view
val suffixes = v.tails
Then you might loop over the suffixes one by one, using suffix.force to convert each back to a string within the loop, thus only holding one in memory at a time. Of course, you could do the same thing by iterating with your own loop over the indices of the original string, rather than creating any kind of collection of the suffixes.
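A rough Scala 2.13 sketch of both approaches, counting how many suffixes start with a given prefix. SuffixScan and its methods are hypothetical, and slice is used to create each suffix view directly instead of tails:

```scala
object SuffixScan {
  // One lightweight view per suffix: slicing a view does not copy characters.
  def suffixViews(original: String): Iterator[scala.collection.View[Char]] =
    (0 to original.length).iterator.map(i => original.view.slice(i, original.length))

  // Materializes each suffix as a String only inside the loop body, so at most
  // one suffix string is alive at a time.
  def countWithPrefix(original: String, prefix: String): Int =
    suffixViews(original).count(suffix => suffix.mkString.startsWith(prefix))

  // The index-based alternative: no suffix objects at all.
  def countWithPrefixByIndex(original: String, prefix: String): Int =
    (0 to original.length).count(i => original.startsWith(prefix, i))
}
```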
Another use case is when creating the derived objects is expensive, you need them in a collection (say, as values in a map), but you will only access a few of them, and you don't know which ones.
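A minimal sketch of that case, assuming Scala 2.13, where Map#view.mapValues returns a lazy MapView (LazyDerived and expensive are hypothetical names):

```scala
object LazyDerived {
  // Returns the looked-up value and how many times the expensive function ran.
  def lookupWithCount(keys: Seq[Int], wanted: Int): (Option[BigInt], Int) = {
    var calls = 0
    // Hypothetical stand-in for an expensive derivation.
    def expensive(n: Int): BigInt = { calls += 1; BigInt(n).pow(50) }
    // MapView: values are computed on access, not when the view is built.
    val derived = keys.map(k => k -> k).toMap.view.mapValues(expensive)
    val result = derived.get(wanted)
    (result, calls)
  }
}
```

Because a MapView recomputes the value on every access, this pays off when each key is read at most once or twice; for repeated reads, caching the computed values is usually better.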
If you really have a case where picking between them makes sense, prefer breakOut unless there's a good argument for using view (like those above).
Views require more code changes and care than breakOut, in that you need to add force where needed. Depending on context, failure to do so is often only detected at run-time. With breakOut, generally if it compiles, it's right.
In cases where view does not apply, breakOut will be faster, since view generation and forcing are skipped.
If you use a debugger, you can inspect the collection contents, which you can't meaningfully do with a collection of views.

Scala immutable map, when to go mutable?

My present use case is pretty trivial, either mutable or immutable Map will do the trick.
I have a method that takes an immutable Map and then calls a third-party API method that also takes an immutable Map:
def doFoo(foo: String = "default", params: Map[String, Any] = Map()): Unit = {
val newMap =
if(someCondition) params + ("foo" -> foo) else params
api.doSomething(newMap)
}
The Map in question will generally be quite small; at most it might hold an embedded List of case class instances, a few thousand entries max. So, again, I assume there is little impact in going immutable in this case (i.e. effectively having two instances of the Map via the newMap val copy).
Still, it nags me a bit, copying the map just to get a new map with a few k->v entries tacked onto it.
I could go mutable and call params.put("bar", bar), etc. for the entries I want to tack on, and then params.toMap to convert to immutable for the API call; that is an option. But then I have to import and pass around mutable maps, which is a bit of a hassle compared to going with Scala's default immutable Map.
So, what are the general guidelines for when it is justified/good practice to use mutable Map over immutable Maps?
EDIT
So it appears that an add operation on an immutable map takes near-constant time, confirming @dhg's and @Nicolas's assertion that a full copy is not made, which solves the problem for the concrete case presented.
Depending on the immutable Map implementation, adding a few entries may not actually copy the entire original Map. This is one of the advantages to the immutable data structure approach: Scala will try to get away with copying as little as possible.
This kind of behavior is easiest to see with a List. If I have a val a = List(1,2,3), then that list is stored in memory. However, if I prepend an additional element like val b = 0 :: a, I do get a new 4-element List back, but Scala did not copy the original list a. Instead, we just created one new link, called it b, and gave it a pointer to the existing List a.
You can envision strategies like this for other kinds of collections as well. For example, if I add one element to a Map, the collection could simply wrap the existing map, falling back to it when needed, all while providing an API as if it were a single Map.
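A toy illustration of that "wrap and fall back" strategy: Overlay is a hypothetical class that layers one extra entry over an existing immutable Map without copying it. This is not how Scala's built-in immutable maps are implemented (HashMap, for instance, shares unchanged parts of its internal hash trie); it only shows the idea.

```scala
// Hypothetical: one extra entry layered over an existing immutable Map.
final case class Overlay[K, V](base: Map[K, V], key: K, value: V) {
  // The new entry wins; everything else falls back to the untouched base map.
  def get(k: K): Option[V] = if (k == key) Some(value) else base.get(k)
  def size: Int = if (base.contains(key)) base.size else base.size + 1
}
```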
Using a mutable object is not bad in itself, it becomes bad in a functional programming environment, where you try to avoid side-effects by keeping functions pure and objects immutable.
However, if you create a mutable object inside a function and modify this object, the function is still pure if you don't release a reference to this object outside the function. It is acceptable to have code like:
def buildVector( x: Double, y: Double, z: Double ): Vector[Double] = {
val ary = Array.ofDim[Double]( 3 )
ary( 0 ) = x
ary( 1 ) = y
ary( 2 ) = z
ary.toVector
}
Now, I think this approach is useful/recommended in two cases: (1) Performance, if creating and modifying an immutable object is a bottleneck of your whole application; (2) Code readability, because sometimes it's easier to modify a complex object in place (rather than resorting to lenses, zippers, etc.)
In addition to dhg's answer, you can take a look at the performance characteristics of the Scala collections. If an add/remove operation doesn't take linear time, it must do something other than simply copying the entire structure. (Note that the converse is not true: just because an operation takes linear time doesn't mean it's copying the whole structure.)
I like to use collection.Map as the declared parameter type (for inputs or return values) rather than mutable or immutable maps. collection.Map is a read-only interface that works for both kinds of implementation. A consumer method using a map really doesn't need to know about the map's implementation or how it was constructed. (It's really none of its business anyway.)
If you go with the approach of hiding a map's particular construction (be it mutable or immutable) from the consumers who use it, then you're still getting an essentially read-only map downstream. And by using collection.Map as a read-only interface, you completely remove all the ".toMap" inefficiency that you would have with consumers written to use immutable.Map-typed objects. Having to convert a completely constructed map into another one simply to comply with an interface not supported by the first one really is unnecessary overhead when you think about it.
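A minimal sketch of such a consumer (ReadOnlyParam and total are illustrative names):

```scala
object ReadOnlyParam {
  // scala.collection.Map is the common supertype of immutable.Map and mutable.Map,
  // so this method accepts either without a .toMap conversion.
  def total(m: scala.collection.Map[String, Int]): Int = m.values.sum
}
```

One caveat: collection.Map only promises read-only access through that reference. If a caller passes a mutable map, it can still change the contents afterwards, so the consumer cannot assume the map is frozen.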
I suspect in a few years from now we'll look back at the three separate sets of interfaces (mutable maps, immutable maps, and collections maps) and realize that 99% of the time only 2 are really needed (mutable and collections) and that using the (unfortunately) default immutable map interface really adds a lot of unnecessary overhead for the "Scalable Language".

Scala Collections inconsistencies

Why is there a lack of consistency between Sets and Lists in Scala Collections API?
For example, there is an immutable Set, but also a mutable one. If I want to use the latter, I can simply do this:
val set = collection.mutable.Set[A]()
set += new A
However, there is no mutable List per se. If I want to write a similar code snippet using Lists, which data structure should I use? LinkedList sounds like a good candidate, because it is mutable, but it has no += method defined. ListBuffer seems to satisfy the requirements, but it is not a list.
After reading the 2.8 collections docs, I've come to the conclusion that MutableList is probably the best fit.
I still somehow wish there was scala.collection.mutable.List.
The reason for this is that Java has co-opted the functional List type to mean something that it is not (i.e. java.util.List is not a list).
It probably makes no sense for a functional programming language to have a mutable List, as such a type is an oxymoron. Hence ListBuffer or ArrayBuffer. Or just use IndexedSeq, of which there are mutable and immutable implementations.
The sequence/list analogue of Set in Scala's collection libraries is Seq. List is just a particular, immutable implementation of Seq, as is Vector. ArrayBuffer or ListBuffer are typical implementations of mutable.Seq.
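A minimal sketch of the mutable Seq counterpart to the Set snippet in the question (MutableSeqDemo and firstOrEmpty are illustrative names):

```scala
import scala.collection.mutable.ArrayBuffer

object MutableSeqDemo {
  // ArrayBuffer is a mutable Seq: the list-like counterpart of a mutable Set.
  def build(): ArrayBuffer[String] = {
    val buf = ArrayBuffer[String]()
    buf += "a"   // append
    buf += "b"
    buf(0) = "z" // constant-time indexed update
    buf
  }

  // Code typed against scala.collection.Seq accepts mutable and immutable sequences.
  def firstOrEmpty(xs: scala.collection.Seq[String]): String = xs.headOption.getOrElse("")
}
```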
ArraySeq may be what you are looking for, except += is exceptionally slow. You could also use a java.util.ArrayList and import collection.JavaConversions._
It seems Scala lacks a good mutable List-like collection with constant time index (like ArrayList for java).
In any case, note that "List" refers to exactly the type scala.collection.immutable.List. Therefore Seq (or some other more abstract collection type) is the type you should expect in methods, rather than "List", if you want to generalize over immutable/mutable collections.
Even better is requiring an IndexedSeq, which signals that the index operation is performant for that collection. However, I'm not sure ListBuffer falls into that category.
Because Set is just a trait -- it is abstract and requires an implementation. So one can speak of classes which are mutable.Set or immutable.Set.
Meanwhile, List is a class, an implementation of the (abstract) trait immutable.LinearSeq. There can never be any other class which is also a List. You'll find out, however, that there is a mutable.LinearSeq trait.
In Java terms, you are comparing interfaces with classes -- they are distinct.
Don't forget scala.collection.mutable.{LinkedList,DoubleLinkedList}. They are mutable, and they are LinearSeq. Mutation is a little weird: you can modify the head by assigning to the elem reference, and the tail by assigning to the next reference.
For example, this loop changes all negative values to zero.
// Scala 2.12 and earlier: LinkedList was deprecated in 2.11 and removed in 2.13
val lst = collection.mutable.LinkedList(1, -2, 7, -9)
var cur = lst
while (!cur.isEmpty) {
if (cur.elem < 0) cur.elem = 0
cur = cur.next
}
This loop removes every second element from the list.
cur = lst // reuse the var declared above
while (!cur.isEmpty && !cur.next.isEmpty) {
cur.next = cur.next.next
cur = cur.next
}
I am not suggesting that these are any better than the immutable List. I am just pointing out that Scala has mutable lists that look fairly similar to what you have seen in your data structures class.