Why is there a lack of consistency between Sets and Lists in Scala Collections API?
For example, there is immutable Set, but also a mutable one. If I want to use the latter, I can simply do this:
val set = Set[A]()
set += new A
However, there is no mutable List, per se. If I want to write a similar code snippet using Lists, which data structure to use? LinkedList sounds as a good candidate, because it is mutable, but has no += method defined. ListBuffer seems to satisfy the requirements, but it is not a list.
After reading 2.8 Collections docs I come to the conclusion MutableList is probably the best fit.
I still somehow wish there was scala.collection.mutable.List.
The reason for this is that Java has co-opted the functional List type to mean something that it is not (i.e. java.util.List is not a list).
It probably makes no sense for a functional programming language to have a mutable List as such a type is an oxymoron. Hence ListBuffer or ArrayBuffer. Or just use IndexedSeq, of which there are mutable and immutable implementations
The sequence/list analogue of Set in Scala's collection libraries is Seq. List is just a particular, immutable implementation of Seq, as is Vector. ArrayBuffer or ListBuffer are typical implementations of mutable.Seq.
ArraySeq may be what you are looking for, except += is exceptionally slow. You could also use a java.util.ArrayList and import collection.JavaConversions._
It seems Scala lacks a good mutable List-like collection with constant time index (like ArrayList for java).
In any case, note that "List" refers to exactly the type "scala.immutable.List". Therefore Seq (or some other more abstract collection type), is the type you should expect in methods rather than "List" if you want to generalize over immutable/mutable collections.
More ideal is requring an IndexedSeq, which sort of means that the index operation is performant for that collection. However, I'm not sure ListBuffer falls into that category.
Because Set is just a trait -- it is abstract and requires an implementation. So one can speak of classes which are mutable.Set or immutable.Set.
Meanwhile, List is a class, an implementation of the (abstract) trait immutable.LinearSeq. There can never be any other class which is also a List. You'll find out, however, that there is a mutable.LinearSeq trait.
In Java terms, you are comparing interfaces with classes -- they are distinct.
Don't forget scala.collection.mutable.{LinkedList,DoubleLinkedList}. They are mutable, and they are LinearSeq. Mutation is a little weird--you can modify the head by assigning to the elem reference, and the tail by assigning to the next reference.
For example, this loop changes all negative values to zero.
val lst = collection.mutable.LinkedList(1, -2, 7, -9)
var cur = lst
while (cur != Nil) {
if (cur.elem < 0) cur.elem = 0
cur = cur.next
}
This loop removes every second element from the list.
var cur = lst
while (cur != Nil && cur.next != Nil) {
cur.next = cur.next.next
cur = cur.next
}
I am not suggesting that these are any better than the immutable List. I am just pointing out that Scala has mutable lists that look fairly similar to what you have seen in your data structures class.
Related
mutable.MutableList[A] can do the inline prepend with its +=:, but it's trait mutable.LinearSeq can not do this. I think the whole purpose of mutable package is for doing inline update in the collection, so why mutable.LinearSeq can not prepend? and what's diff between ListBuffer and MutableList?
Why mutable.LinearSeq can not prepend?
Prepend (+=:) isn't defined on mutable.LinearSeq, it's defined on the BufferLike trait, which ListBuffer extends. MutableList doesn't implement the Buffer trait, but did choose to provide a prepend method.
A mutable collection is meant for you to be able to mutate it without allocating a new collection, and that is what mutable stands for. Most mutable collection will expose an += method to append values, but it isn't necessary for all to be able to prepend elements in to their underlying collection, that is implementation specific. Depending on the implementation, += can either choose to append an element or prepend it to the underlying collection, depending on how it's storing its values.
What's diff between ListBuffer and MutableList?
One of the main differences between ListBuffer[A] and MutableList[A] is that the former uses lists as the underlying implementation and has O(1) for creating a List[A] from the buffer where the latter uses a LinkedList[A] for the underlying implementation and is the actual basis for the Queue[A] implementation in Scala.
My present use case is pretty trivial, either mutable or immutable Map will do the trick.
Have a method that takes an immutable Map, which then calls a 3rd party API method that takes an immutable Map as well
def doFoo(foo: String = "default", params: Map[String, Any] = Map()) {
val newMap =
if(someCondition) params + ("foo" -> foo) else params
api.doSomething(newMap)
}
The Map in question will generally be quite small, at most there might be an embedded List of case class instances, a few thousand entries max. So, again, assume little impact in going immutable in this case (i.e. having essentially 2 instances of the Map via the newMap val copy).
Still, it nags me a bit, copying the map just to get a new map with a few k->v entries tacked onto it.
I could go mutable and params.put("bar", bar), etc. for the entries I want to tack on, and then params.toMap to convert to immutable for the api call, that is an option. but then I have to import and pass around mutable maps, which is a bit of hassle compared to going with Scala's default immutable Map.
So, what are the general guidelines for when it is justified/good practice to use mutable Map over immutable Maps?
Thanks
EDIT
so, it appears that an add operation on an immutable map takes near constant time, confirming #dhg's and #Nicolas's assertion that a full copy is not made, which solves the problem for the concrete case presented.
Depending on the immutable Map implementation, adding a few entries may not actually copy the entire original Map. This is one of the advantages to the immutable data structure approach: Scala will try to get away with copying as little as possible.
This kind of behavior is easiest to see with a List. If I have a val a = List(1,2,3), then that list is stored in memory. However, if I prepend an additional element like val b = 0 :: a, I do get a new 4-element List back, but Scala did not copy the orignal list a. Instead, we just created one new link, called it b, and gave it a pointer to the existing List a.
You can envision strategies like this for other kinds of collections as well. For example, if I add one element to a Map, the collection could simply wrap the existing map, falling back to it when needed, all while providing an API as if it were a single Map.
Using a mutable object is not bad in itself, it becomes bad in a functional programming environment, where you try to avoid side-effects by keeping functions pure and objects immutable.
However, if you create a mutable object inside a function and modify this object, the function is still pure if you don't release a reference to this object outside the function. It is acceptable to have code like:
def buildVector( x: Double, y: Double, z: Double ): Vector[Double] = {
val ary = Array.ofDim[Double]( 3 )
ary( 0 ) = x
ary( 1 ) = y
ary( 2 ) = z
ary.toVector
}
Now, I think this approach is useful/recommended in two cases: (1) Performance, if creating and modifying an immutable object is a bottleneck of your whole application; (2) Code readability, because sometimes it's easier to modify a complex object in place (rather than resorting to lenses, zippers, etc.)
In addition to dhg's answer, you can take a look to the performance of the scala collections. If an add/remove operation doesn't take a linear time, it must do something else than just simply copying the entire structure. (Note that the converse is not true: it's not beacuase it takes linear time that your copying the whole structure)
I like to use collections.maps as the declared parameter types (input or return values) rather than mutable or immutable maps. The Collections maps are immutable interfaces that work for both types of implementations. A consumer method using a map really doesn't need to know about a map implementation or how it was constructed. (It's really none of its business anyway).
If you go with the approach of hiding a map's particular construction (be it mutable or immutable) from the consumers who use it then you're still getting an essentially immutable map downstream. And by using collection.Map as an immutable interface you completely remove all the ".toMap" inefficiency that you would have with consumers written to use immutable.Map typed objects. Having to convert a completely constructed map into another one simply to comply to an interface not supported by the first one really is absolutely unnecessary overhead when you think about it.
I suspect in a few years from now we'll look back at the three separate sets of interfaces (mutable maps, immutable maps, and collections maps) and realize that 99% of the time only 2 are really needed (mutable and collections) and that using the (unfortunately) default immutable map interface really adds a lot of unnecessary overhead for the "Scalable Language".
Being new to scala and a current java developer, scala was designed to encourage the use of immutability to class design.
How does this translate practically to the design of classes? The only thing that is brought to my mind is case classes. Are case classes strongly encouraged for defining data? Example? How else is immutability encouraged in Scala design of classes?
As a java developer, classes defining data were mutable. The equivalent Scala classes should be defined as case classes?
Well, case classes certainly help, but the biggest contributor is probably the collection library. The default collections are immutable, and the methods are geared toward manipulating collections by producing new ones instead of mutating. Since the immutable collections are persistent, that doesn't require copying the whole collection, which is something one often has to do in Java.
Beyond that, for-comprehensions are monadic comprehensions, which is helpful in doing immutable tasks, there's tail recursion optimization, which is very important in immutable algorithms, and general attention to immutability in many libraries, such as parser combinators and xml.
Finally, note that you have to ask for a var to get some mutability. Parameters are immutable, and val is just as short as var. Contrast this with Java, where parameters are mutable, and you need to add a final keyword to get immutability. Whereas in Scala it is as easy or easier to stay immutable, in Java it is easier to stay mutable.
Addendum
Persistent data structures are data structures that share parts between modified versions of it. This might be a bit difficult to understand, so let's consider Scala's List, which is pretty basic and easy to understand.
A Scala List is composed of two classes, known as cons and Nil. The former is actually written :: in Scala, but I'll refer to it by the traditional name.
Nil is the empty list. It doesn't contain anything. Methods that depend on the list not being empty, such as head and tail throw exceptions, while others work ok.
Naturally, cons must then represent a non-empty list. In fact, cons has exactly two elements: a value, and a list. These elements are known as head and tail.
So a list with three elements is composed of three cons, since each cons will hold only one value, plus a Nil. It must have a Nil because a cons must point to a list. As lists are not circular, then one of the cons must point to something other than a cons.
One example of such list is this:
val list = 1 :: 2 :: 3 :: Nil
Now, the components of a Scala List are immutable. One cannot change neither the value nor the list of a cons. One benefit of immutability is that you never need to copy the collection before passing or after receiving it from some other method: you know that list cannot change.
Now, let's consider what would happen if I modified that list. Let's consider two modifications: removing the first element and prepending a new element.
We can remove one element with the method tail, whose name is not a coincidence at all. So, we write:
val list2 = list.tail
And list2 will point to the same list that list's tail is pointing. Nothing at all was created: we simply reused part of list. So, let's prepend an element to list2 then:
val list3 = 0 :: list2
We created a new cons there. This new cons has a value (a head) equal to 0, and its tail points to list2. Note that both list and list3 point to the same list2. These elements are being shared by both list and list3.
There are many other persistent data structures. The very fact that the data you are manipulating is immutable makes it easy to share components.
One can find more information about this subject on the book by Chris Okasaki, Purely Functional Data Structures, or on his freely available thesis by the same name.
I'm using a library (JXPath) to query a graph of beans in order to extract matching elements. However, JXPath returns groups of matching elements as an instance of java.lang.Iterator and I'd rather like to convert it into an immutable scala list. Is there any simpler way of doing than iterating over the iterator and creating a new immutable list at each iteration step ?
You might want to rethink the need for a List, although it feels very familiar when coming from Java, and List is the default implementation of an immutable Seq, it often isn't the best choice of collection.
The operations that list is optimal for are those already available via an iterator (basically taking consecutive head elements and prepending elements). If an iterator doesn't already give you what you need, then I can pretty much guarantee that a List won't be your best choice - a vector would be more appropriate.
Having got that out the way... The recommended technique to convert between Java and Scala collections (since Scala 2.8.1) is via scala.collection.JavaConverters. This gives you more control than JavaConversions and avoids some possible implicit conflicts.
You won't have a direct implicit conversion this way. Instead, you get asScala and asJava methods pimped onto collections, allowing you to perform the conversions explicitly.
To convert a Java iterator to a Scala iterator:
javaIterator.asScala
To convert a Java iterator to a Scala List (via the scala iterator):
javaIterator.asScala.toList
You may also want to consider converting toSeq instead of toList. In the case of iterators, this'll return a Stream - allowing you to retain the lazy behaviour of iterators within the richer Seq interface.
EDIT:
There's no toVector method, but (as Daniel pointed out) there's a toIndexedSeq method that will return a Vector as the default IndexedSeq subclass (just as List is the default Seq).
javaIterator.asScala.toIndexedSeq
EDIT: You should probably look at Kevin Wright's answer, which provides a better solution available since Scala 2.8.1, with less implicit magic.
You can import the implicit conversions from scala.collection.JavaConversions and then create a new Scala collection seamlessly, e.g. like this:
import collection.JavaConversions._
println(List() ++ javaIterator)
Your Java iterator is converted to a Scala iterator by JavaConversions.asScalaIterator. A Scala iterator with elements of type A implements TraversableOnce[A], which is the argument type needed to concatenate collections with ++.
If you need another collection type, just change List() to whatever you need (e.g., IndexedSeq() or collection.mutable.Seq(), etc.).
What is the difference between Scala's MutableList and ListBuffer classes in scala.collection.mutable? When would you use one vs the other?
My use case is having a linear sequence where I can efficiently remove the first element, prepend, and append. What's the best structure for this?
A little explanation on how they work.
ListBuffer uses internally Nil and :: to build an immutable List and allows constant-time removal of the first and last elements. To do so, it keeps a pointer on the first and last element of the list, and is actually allowed to change the head and tail of the (otherwise immutable) :: class (nice trick allowed by the private[scala] var members of ::). Its toList method returns the normal immutable List in constant time as well, as it can directly return the structure maintained internally. It is also the default builder for immutable Lists (and thus can indeed be reasonably expected to have constant-time append). If you call toList and then again append an element to the buffer, it takes linear time with respect to the current number of elements in the buffer to recreate a new structure, as it must not mutate the exported list any more.
MutableList works internally with LinkedList instead, an (openly, not like ::) mutable linked list implementation which knows of its element and successor (like ::). MutableList also keeps pointers to the first and last element, but toList returns in linear time, as the resulting List is constructed from the LinkedList. Thus, it doesn't need to reinitialize the buffer after a List has been exported.
Given your requirements, I'd say ListBuffer and MutableList are equivalent. If you want to export their internal list at some point, then ask yourself where you want the overhead: when you export the list, and then no overhead if you go on mutating buffer (then go for MutableList), or only if you mutable the buffer again, and none at export time (then go for ListBuffer).
My guess is that in the 2.8 collection overhaul, MutableList predated ListBuffer and the whole Builder system. Actually, MutableList is predominantly useful from within the collection.mutable package: it has a private[mutable] def toLinkedList method which returns in constant time, and can thus efficiently be used as a delegated builder for all structures that maintain a LinkedList internally.
So I'd also recommend ListBuffer, as it may also get attention and optimization in the future than “purely mutable” structures like MutableList and LinkedList.
This gives you an overview of the performance characteristics: http://www.scala-lang.org/docu/files/collections-api/collections.html ; interestingly, MutableList and ListBuffer do not differ there. The documentation of MutableList says it is used internally as base class for Stack and Queue, so maybe ListBuffer is more the official class from the user perspective?
You want a list (why a list?) that is growable and shrinkable, and you want constant append and prepend. Well, Buffer, a trait, has constant append and prepend, with most other operations linear. I'm guessing that ListBuffer, a class that implements Buffer, has constant time removal of the first element.
So, my own recommendation is for ListBuffer.
First, lets go over some of the relevant types in Scala
List - An Immutable collection. A Recursive implementation i.e . i.e An instance of list has two primary elements the head and the tail, where the tail references another List.
List[T]
head: T
tail: List[T] //recursive
LinkedList - A mutable collection defined as a series of linked nodes, where each node contains a value and a pointer to the next node.
Node[T]
value: T
next: Node[T] //sequential
LinkedList[T]
first: Node[T]
List is a functional data structure (immutability) compared to LinkedList which is more standard in imperative languages.
Now, lets look at
ListBuffer - A mutable buffer implementation backed by a List.
MutableList - An implementation based on LinkedList ( Would have been more self explanatory if it had been named LinkedListBuffer instead )
They both offer similar complexity bounds on most operations.
However, if you request a List from a MutableList, then it has to convert the existing linear representation into the recursive representation which takes O(n) which is what #Jean-Philippe Pellet points out. But, if you request a Seq from MutableList the complexity is O(1).
So, IMO the choice narrows down to the specifics of your code and your preference. Though, I suspect there is a lot more List and ListBuffer out there.
Note that ListBuffer is final/sealed, while you can extend MutableList.
Depending on your application, extensibility may be useful.