Why can't Scala's mutable.LinearSeq do +=:?

mutable.MutableList[A] can prepend in place with its +=: method, but its trait mutable.LinearSeq cannot. I thought the whole purpose of the mutable package was in-place updates to collections, so why can't mutable.LinearSeq prepend? And what's the difference between ListBuffer and MutableList?

Why can't mutable.LinearSeq prepend?
Prepend (+=:) isn't defined on mutable.LinearSeq; it's defined on the BufferLike trait, which ListBuffer extends. MutableList doesn't implement the Buffer trait, but it chose to provide a prepend method anyway.
A mutable collection lets you modify it without allocating a new collection; that is what mutable stands for. Most mutable collections expose a += method to add values, but not all of them need to support prepending to their underlying storage; that is implementation-specific. Depending on how a collection stores its values, += may either append or prepend the element.
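As a minimal sketch of the point above (assuming the standard scala.collection.mutable API; the value name buf is only illustrative), in-place prepend becomes available once the collection is typed as a Buffer:

```scala
import scala.collection.mutable

// Typed as the Buffer trait, which is where +=: comes from.
val buf: mutable.Buffer[Int] = mutable.ListBuffer(2, 3)

buf += 4   // in-place append
1 +=: buf  // in-place prepend (right-associative: buf is the receiver)

// buf now holds 1, 2, 3, 4 and no new collection was allocated
println(buf.mkString(", "))
```

Typing the value as mutable.LinearSeq[Int] instead would make the 1 +=: buf line fail to compile, since that trait does not declare the operator.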
What's the difference between ListBuffer and MutableList?
One of the main differences between ListBuffer[A] and MutableList[A] is that the former uses an immutable List as its underlying implementation and can create a List[A] from the buffer in O(1), whereas the latter uses a LinkedList[A] underneath and is the basis for the Queue[A] implementation in Scala.

Related

How do I deal with Scala collections generically?

I have realized that my typical way of passing Scala collections around could use some improvement.
def doSomethingCool(theFoos: List[Foo]) = { /* insert cool stuff here */ }
// if I happen to have a List
doSomethingCool(theFoos)
// but elsewhere I may have a Vector, Set, Option, ...
doSomethingCool(theFoos.toList)
I tend to write my library functions to take a List as the parameter type, but I'm certain that there's something more general I can put there to avoid all the occasional .toList calls I have in the application code. This is especially annoying since my doSomethingCool function typically only needs to call map, flatMap and filter, which are defined on all the collection types.
What are my options for that 'something more general'?
Here are more general traits, each of which extends the previous one:
GenTraversableOnce
GenTraversable
GenIterable
GenSeq
The traits above do not specify whether the collection is sequential or parallel. If your code requires that things be executed sequentially (typically, if your code has side effects of any kind), they are too general for it.
The following traits mandate sequential execution:
TraversableOnce
Traversable
Iterable
Seq
LinearSeq
The first one, TraversableOnce, only allows you to call one method on the collection. After that, the collection has been "used". In exchange, it is general enough to accept iterators as well as collections.
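The "used" behaviour is easiest to see with a plain Iterator, which is the typical single-pass value accepted at this level. A small sketch (the value names are illustrative):

```scala
val it = Iterator(1, 2, 3)

// One traversal: transform and drain the iterator into a List.
val doubled = it.map(_ * 2).toList

// The underlying iterator has been consumed, so nothing is left to traverse.
val exhausted = !it.hasNext
```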
Traversable is a pretty general collection that has most methods. There are some things it cannot do, however, in which case you need to go to Iterable.
All Iterable implement the iterator method, which allows you to get an Iterator for that collection. This gives it the capability for a few methods not present in Traversable.
A Seq[A] implements the function Int => A, which means you can access any element by its index. This is not guaranteed to be efficient, but it is a guarantee that each element has an index, and that you can make assertions about what that index is going to be. Contrast this with Map and Set, where you cannot tell what the index of an element is.
A LinearSeq is a Seq that provides fast head, tail, isEmpty and prepend. This is as close as you can get to a List without actually using a List explicitly.
Alternatively, you could have an IndexedSeq, which has fast indexed access (something List does not provide).
See also this question and this FAQ based on it.
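To make this concrete, here is a rough sketch of the question's doSomethingCool written against Iterable rather than List. The Int element type and the body are placeholders, not the asker's actual code:

```scala
// Hypothetical stand-in for doSomethingCool: only filter and map are needed,
// so Iterable is general enough.
def doSomethingCool(theFoos: Iterable[Int]): Iterable[Int] =
  theFoos.filter(_ > 0).map(_ * 2)

// No .toList calls needed at the call sites.
val fromList   = doSomethingCool(List(1, -2, 3))
val fromVector = doSomethingCool(Vector(1, -2, 3))
val fromSet    = doSomethingCool(Set(1, -2, 3))
```

The trade-off is that the static result type is also just Iterable[Int]; callers who need a specific collection type still have to convert the result.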
The most obvious one is to use Traversable as the most general trait which will have the goodies you want. However, I think you are generally better sticking to:
Seq
IndexedSeq
Set
Map
A Seq will cover List, Vector, etc.; IndexedSeq will cover Vector and the like. I found myself not using Iterable because I often need (or want) to know the size of the thing I have, and before Scala 2.8 Iterable did not provide access to this, so I kept having to turn things into sequences anyway!
Looks like Traversable and Iterable now have size methods so maybe I should go back to using them! Of course you could start "going mad" with GenTraversableOnce but that is not likely to aid in readability.

How to append or prepend on a Scala mutable.Seq

There's something I don't understand about Scala's collection.mutable.Seq. It describes the interface for all mutable sequences, yet I don't see methods to append or prepend elements without creating a new sequence. Am I missing something obvious here?
There are :+ and +: for append and prepend, respectively, but they create new collections — in order to be consistent with the behavior of immutable sequences, I assume. This is fine, but why is there no method like += and +=:, like ArrayBuffer and ListBuffer define, for in-place append and prepend? Does it mean that I cannot refer to a mutable seq that's typed as collection.mutable.Seq if I want to do in-place append?
Again, I must have missed something obvious, but cannot find what…
Mutability for sequences only guarantees that you'll be able to swap out the items for different ones (via the update method), as you can with e.g. primitive arrays. It does not guarantee that you'll be able to make the sequence larger (that's what the Growable trait is for) or smaller (Shrinkable).
Buffer is the abstract trait that contains Growable and Shrinkable, not Seq.
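A short sketch of that distinction, assuming the standard scala.collection.mutable types (value names are illustrative):

```scala
import scala.collection.mutable

// mutable.Seq: elements can be replaced in place, but the interface
// offers no way to change the length.
val seq: mutable.Seq[Int] = mutable.ArrayBuffer(1, 2, 3)
seq(0) = 10   // sugar for seq.update(0, 10)
// seq += 4   // not available: mutable.Seq declares no +=

// mutable.Buffer: additionally growable and shrinkable.
val buf: mutable.Buffer[Int] = mutable.ArrayBuffer(1, 2, 3)
buf += 4      // append in place
0 +=: buf     // prepend in place
buf -= 2      // remove the first occurrence of the element 2
```

So if in-place append matters, typing the reference as mutable.Buffer rather than mutable.Seq is the usual way to keep it available.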

Create an immutable list from a java.util.Iterator

I'm using a library (JXPath) to query a graph of beans in order to extract matching elements. However, JXPath returns groups of matching elements as an instance of java.util.Iterator, and I'd like to convert it into an immutable Scala list. Is there a simpler way of doing this than iterating over the iterator and creating a new immutable list at each iteration step?
You might want to rethink the need for a List. Although it feels very familiar when coming from Java, and List is the default implementation of an immutable Seq, it often isn't the best choice of collection.
The operations that List is optimal for are those already available via an iterator (basically taking consecutive head elements and prepending elements). If an iterator doesn't already give you what you need, then I can pretty much guarantee that a List won't be your best choice; a Vector would be more appropriate.
Having got that out the way... The recommended technique to convert between Java and Scala collections (since Scala 2.8.1) is via scala.collection.JavaConverters. This gives you more control than JavaConversions and avoids some possible implicit conflicts.
You won't have a direct implicit conversion this way. Instead, you get asScala and asJava methods pimped onto collections, allowing you to perform the conversions explicitly.
To convert a Java iterator to a Scala iterator:
javaIterator.asScala
To convert a Java iterator to a Scala List (via the scala iterator):
javaIterator.asScala.toList
You may also want to consider converting toSeq instead of toList. In the case of iterators, this'll return a Stream - allowing you to retain the lazy behaviour of iterators within the richer Seq interface.
EDIT:
There's no toVector method, but (as Daniel pointed out) there's a toIndexedSeq method that will return a Vector as the default IndexedSeq subclass (just as List is the default Seq).
javaIterator.asScala.toIndexedSeq
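Put together, a self-contained sketch might look like the following. The javaIterator here is a stand-in built from java.util.Arrays, not JXPath's actual return value. Note that on Scala 2.13 and later, scala.collection.JavaConverters is deprecated in favour of scala.jdk.CollectionConverters, which offers the same asScala method:

```scala
import scala.collection.JavaConverters._

// Stand-in for the iterator handed back by the Java library.
val javaIterator: java.util.Iterator[String] =
  java.util.Arrays.asList("a", "b", "c").iterator()

// Explicit conversion, then materialise as an immutable List.
val scalaList: List[String] = javaIterator.asScala.toList
```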
EDIT: You should probably look at Kevin Wright's answer, which provides a better solution available since Scala 2.8.1, with less implicit magic.
You can import the implicit conversions from scala.collection.JavaConversions and then create a new Scala collection seamlessly, e.g. like this:
import collection.JavaConversions._
println(List() ++ javaIterator)
Your Java iterator is converted to a Scala iterator by JavaConversions.asScalaIterator. A Scala iterator with elements of type A implements TraversableOnce[A], which is the argument type needed to concatenate collections with ++.
If you need another collection type, just change List() to whatever you need (e.g., IndexedSeq() or collection.mutable.Seq(), etc.).

Difference between MutableList and ListBuffer

What is the difference between Scala's MutableList and ListBuffer classes in scala.collection.mutable? When would you use one vs the other?
My use case is having a linear sequence where I can efficiently remove the first element, prepend, and append. What's the best structure for this?
A little explanation on how they work.
ListBuffer internally uses Nil and :: to build an immutable List and allows constant-time prepend and append, as well as constant-time removal of the first element. To do so, it keeps a pointer to the first and last element of the list, and is actually allowed to change the head and tail of the (otherwise immutable) :: class (a nice trick allowed by the private[scala] var members of ::). Its toList method returns the normal immutable List in constant time as well, as it can directly return the structure maintained internally. It is also the default builder for immutable Lists (and thus can indeed be reasonably expected to have constant-time append). If you call toList and then append another element to the buffer, it takes linear time with respect to the current number of elements in the buffer to recreate a new structure, as it must not mutate the exported list any more.
MutableList works internally with LinkedList instead, an (openly, not like ::) mutable linked list implementation which knows of its element and successor (like ::). MutableList also keeps pointers to the first and last element, but toList returns in linear time, as the resulting List is constructed from the LinkedList. Thus, it doesn't need to reinitialize the buffer after a List has been exported.
Given your requirements, I'd say ListBuffer and MutableList are equivalent. If you want to export their internal list at some point, then ask yourself where you want the overhead: at export time, with no overhead if you go on mutating the buffer (then go for MutableList), or only if you mutate the buffer again, with none at export time (then go for ListBuffer).
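The export guarantee described above can be observed directly with ListBuffer. This sketch only shows the visible behaviour (the exported List is never disturbed), not the internal copying, and the value names are illustrative:

```scala
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2, 3)

// Export the contents as an ordinary immutable List.
val exported: List[Int] = buf.toList

// Mutating the buffer afterwards must leave the exported List untouched.
buf += 4

val afterAppend: List[Int] = buf.toList
```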
My guess is that in the 2.8 collection overhaul, MutableList predated ListBuffer and the whole Builder system. Actually, MutableList is predominantly useful from within the collection.mutable package: it has a private[mutable] def toLinkedList method which returns in constant time, and can thus efficiently be used as a delegated builder for all structures that maintain a LinkedList internally.
So I'd also recommend ListBuffer, as it is more likely to get attention and optimization in the future than “purely mutable” structures like MutableList and LinkedList.
This gives you an overview of the performance characteristics: http://www.scala-lang.org/docu/files/collections-api/collections.html ; interestingly, MutableList and ListBuffer do not differ there. The documentation of MutableList says it is used internally as base class for Stack and Queue, so maybe ListBuffer is more the official class from the user perspective?
You want a list (why a list?) that is growable and shrinkable, and you want constant append and prepend. Well, Buffer, a trait, has constant append and prepend, with most other operations linear. I'm guessing that ListBuffer, a class that implements Buffer, has constant time removal of the first element.
So, my own recommendation is for ListBuffer.
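For the use case in the question (remove the first element, prepend, append), a ListBuffer-based sketch could look like this; the value names are illustrative:

```scala
import scala.collection.mutable.ListBuffer

val queue = ListBuffer(2, 3)

1 +=: queue                  // prepend
queue += 4                   // append
val first = queue.remove(0)  // remove and return the first element
```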
First, let's go over some of the relevant types in Scala:
List - An immutable collection with a recursive implementation, i.e. an instance of List has two primary elements, the head and the tail, where the tail references another List.
List[T]
head: T
tail: List[T] //recursive
LinkedList - A mutable collection defined as a series of linked nodes, where each node contains a value and a pointer to the next node.
Node[T]
value: T
next: Node[T] //sequential
LinkedList[T]
first: Node[T]
List is a functional data structure (immutability) compared to LinkedList which is more standard in imperative languages.
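The two shapes can be sketched in Scala as follows. These are deliberately simplified, hypothetical types (MyList, MyCons, MyNil, Node, MyLinkedList), not the standard library's definitions, and null marks the end of the mutable chain only to keep the sketch short:

```scala
// Immutable and recursive: a list is either empty or a head plus another list.
sealed trait MyList[+T]
case object MyNil extends MyList[Nothing]
final case class MyCons[+T](head: T, tail: MyList[T]) extends MyList[T]

// Mutable and node-based: each node holds a value and a reassignable link.
final class Node[T](var value: T, var next: Node[T])
final class MyLinkedList[T](var first: Node[T])

val recursive: MyList[Int] = MyCons(1, MyCons(2, MyNil))

val linked = new MyLinkedList[Int](new Node[Int](1, new Node[Int](2, null)))
linked.first.value = 10  // in-place update; no new structure is built
```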
Now, let's look at:
ListBuffer - A mutable buffer implementation backed by a List.
MutableList - An implementation based on LinkedList (it would have been more self-explanatory if it had been named LinkedListBuffer instead).
They both offer similar complexity bounds on most operations.
However, if you request a List from a MutableList, it has to convert the existing linear representation into the recursive representation, which takes O(n), as @Jean-Philippe Pellet points out. But if you request a Seq from a MutableList, the complexity is O(1).
So, IMO the choice narrows down to the specifics of your code and your preference. Though, I suspect there is a lot more List and ListBuffer out there.
Note that ListBuffer is final/sealed, while you can extend MutableList.
Depending on your application, extensibility may be useful.

Scala Collections inconsistencies

Why is there a lack of consistency between Sets and Lists in Scala Collections API?
For example, there is immutable Set, but also a mutable one. If I want to use the latter, I can simply do this:
import scala.collection.mutable.Set
val set = Set[A]()
set += new A
However, there is no mutable List, per se. If I want to write a similar code snippet using Lists, which data structure should I use? LinkedList sounds like a good candidate, because it is mutable, but it has no += method defined. ListBuffer seems to satisfy the requirements, but it is not a list.
After reading the 2.8 Collections docs I came to the conclusion that MutableList is probably the best fit.
I still somehow wish there was scala.collection.mutable.List.
The reason for this is that Java has co-opted the functional List type to mean something that it is not (i.e. java.util.List is not a list).
It probably makes no sense for a functional programming language to have a mutable List, as such a type is an oxymoron. Hence ListBuffer or ArrayBuffer. Or just use IndexedSeq, of which there are mutable and immutable implementations.
The sequence/list analogue of Set in Scala's collection libraries is Seq. List is just a particular, immutable implementation of Seq, as is Vector. ArrayBuffer or ListBuffer are typical implementations of mutable.Seq.
ArraySeq may be what you are looking for, except += is exceptionally slow. You could also use a java.util.ArrayList and import collection.JavaConversions._
It seems Scala lacks a good mutable List-like collection with constant time index (like ArrayList for java).
In any case, note that "List" refers to exactly the type "scala.collection.immutable.List". Therefore Seq (or some other more abstract collection type) is the type you should expect in methods, rather than "List", if you want to generalize over immutable/mutable collections.
Better still is requiring an IndexedSeq, which roughly means that the index operation is performant for that collection. However, I'm not sure ListBuffer falls into that category.
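As a small sketch of that advice, a method that accepts scala.collection.Seq works with both immutable and mutable sequences. The function name total and its body are placeholders:

```scala
import scala.collection.mutable

// scala.collection.Seq is a common supertype of immutable and mutable sequences.
def total(xs: scala.collection.Seq[Int]): Int = xs.sum

val immutableTotal = total(List(1, 2, 3))
val mutableTotal   = total(mutable.ListBuffer(1, 2, 3))
```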
Because Set is just a trait -- it is abstract and requires an implementation. So one can speak of classes which are mutable.Set or immutable.Set.
Meanwhile, List is a class, an implementation of the (abstract) trait immutable.LinearSeq. There can never be any other class which is also a List. You'll find out, however, that there is a mutable.LinearSeq trait.
In Java terms, you are comparing interfaces with classes -- they are distinct.
Don't forget scala.collection.mutable.{LinkedList,DoubleLinkedList}. They are mutable, and they are LinearSeq. Mutation is a little weird--you can modify the head by assigning to the elem reference, and the tail by assigning to the next reference.
For example, this loop changes all negative values to zero.
val lst = collection.mutable.LinkedList(1, -2, 7, -9)
var cur = lst
while (cur != Nil) {
  if (cur.elem < 0) cur.elem = 0
  cur = cur.next
}
This loop removes every second element from the list.
var cur = lst
while (cur != Nil && cur.next != Nil) {
  cur.next = cur.next.next
  cur = cur.next
}
I am not suggesting that these are any better than the immutable List. I am just pointing out that Scala has mutable lists that look fairly similar to what you have seen in your data structures class.