Quote from sources:
If this is empty then it does nothing and returns that.
There are some questions where authors ask how to append to a LinkedList, but I didn't find why LinkedList is designed with such behavior.
One more question: does Scala have any List with add/append (which changes this in O(1)) and map operations?
If you expand the documentation for append in the mutable LinkedList API doc, there is something more that at least explains the O(n) performance of append:
def append(that: LinkedList[A]): LinkedList[A]
If this is empty then it does nothing and returns that. Otherwise,
appends that to this. The append requires a full traversal of this.
append takes a second LinkedList (that) and appends it to the current one (this). If the current LinkedList is empty the result of appending a second LinkedList to an empty one is just the second LinkedList.
I may be misunderstanding your question, but I didn't think this could be controversial or require particular design decisions.
As for the performance characteristics of operations on Scala collections, I'm not sure if there's anything newer, but I've always pointed to this doc.
Related
I was following a Scala video tutorial, and the presenter mentioned that prepend :: takes constant time while the time for append :+ increases with the length of the list. He also mentioned that, most of the time, reversing the list, prepending, and re-reversing the list gives better performance than appending.
Question 1
Why does prepend :: take constant time while the time for append :+ increases with the length of the list?
The reason for that is not mentioned in the tutorial, so I tried Google. I didn't find the answer, but I found another surprising thing.
Question 2
ListBuffer takes constant time for both append and prepend. If that is possible, why wasn't it implemented in List?
Obviously there must be a reason behind this! I'd appreciate it if someone could explain.
Answer 1:
List is implemented as a linked list. The reference you hold is to its head.
e.g. if you have a list of 4 elements (1 to 4) it will be:
[1]->[2]->[3]->[4]->//
Prepending means adding a new element at the head and returning the new head:
[5]->[1]->[2]->[3]->[4]->//
The reference to the old head [1] is still valid, and from its point of view there are still 4 elements.
On the other hand, appending means adding an element to the end of the list.
Since List is immutable, we can't just add it to the end; we need to clone the entire List:
[1']->[2']->[3']->[4']->[5]->//
Since cloning means copying the entire list in the same order, we need to iterate over each element and append it.
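A minimal sketch of the difference, using only the standard library. The object name PrependVsAppend and the helper squaresUpTo are made up for illustration; the helper shows the "prepend, then reverse once" idiom mentioned in the question.

```scala
object PrependVsAppend {
  val xs: List[Int] = List(1, 2, 3, 4)

  // Prepending allocates one new cell whose tail is the existing list.
  val prepended: List[Int] = 5 :: xs

  // Appending has to rebuild every cell in front of the new last element.
  val appended: List[Int] = xs :+ 5

  // Hypothetical helper: build the result by prepending, then reverse once at the end.
  def squaresUpTo(n: Int): List[Int] =
    (1 to n).foldLeft(List.empty[Int])((acc, i) => (i * i) :: acc).reverse
}
```

In both cases xs itself is left unchanged; only the amount of new structure differs.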
Answer 2:
ListBuffer is a mutable collection; changing it is visible through every reference to it.
Ad. 1. The list in Scala is defined (simplifying) as a head and a tail. The tail is also a list. Adding an element to the head means creating a new list with a new head and the existing list as the new tail. The existing list is not changed. This is why it is a constant-time operation.
Appending to a list needs rebuilding the existing list, which cannot be done in constant time.
Ad. 2. ListBuffer is a mutable collection. It may be more efficient in some applications, but on the other hand immutable collections are thread-safe and easily scalable.
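To illustrate what "mutable" means here, a small sketch (the object name SharedBuffer is just for the example): both names refer to the same buffer, so in-place appends and prepends show up through either of them.

```scala
import scala.collection.mutable.ListBuffer

object SharedBuffer {
  val buf: ListBuffer[Int] = ListBuffer(1, 2, 3)
  val alias: ListBuffer[Int] = buf // same object, not a copy

  buf += 4   // append in place
  0 +=: buf  // prepend in place
}
```

With an immutable List, the equivalent operations would return new lists and leave every existing reference untouched.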
I have realized that my typical way of passing Scala collections around could use some improvement.
def doSomethingCool(theFoos: List[Foo]) = { /* insert cool stuff here */ }
// if I happen to have a List
doSomethingCool(theFoos)
// but elsewhere I may have a Vector, Set, Option, ...
doSomethingCool(theFoos.toList)
I tend to write my library functions to take a List as the parameter type, but I'm certain that there's something more general I can put there to avoid all the occasional .toList calls I have in the application code. This is especially annoying since my doSomethingCool function typically only needs to call map, flatMap and filter, which are defined on all the collection types.
What are my options for that 'something more general'?
Here are more general traits, each of which extends the previous one:
GenTraversableOnce
GenTraversable
GenIterable
GenSeq
The traits above do not specify whether the collection is sequential or parallel. If your code requires that things be executed sequentially (typically, if your code has side effects of any kind), they are too general for it.
The following traits mandate sequential execution:
TraversableOnce
Traversable
Iterable
Seq
LinearSeq
The first one, TraversableOnce, only allows you to call one method on the collection. After that, the collection has been "used". In exchange, it is general enough to accept iterators as well as collections.
Traversable is a pretty general collection that has most methods. There are some things it cannot do, however, in which case you need to go to Iterable.
All Iterable implement the iterator method, which allows you to get an Iterator for that collection. This gives it the capability for a few methods not present in Traversable.
A Seq[A] implements the function Int => A, which means you can access any element by its index. This is not guaranteed to be efficient, but it is a guarantee that each element has an index, and that you can make assertions about what that index is going to be. Contrast this with Map and Set, where you cannot tell what the index of an element is.
A LinearSeq is a Seq that provides fast head, tail, isEmpty and prepend. This is as close as you can get to a List without actually using a List explicitly.
Alternatively, you could have an IndexedSeq, which has fast indexed access (something List does not provide).
See also this question and this FAQ based on it.
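As a rough sketch of how this plays out for the function in the question: Foo is given a made-up field n here, and second is a hypothetical helper added only to show what Seq adds over Iterable. Iterable and Seq are used because they are available across Scala versions.

```scala
object GeneralParams {
  final case class Foo(n: Int)

  // Iterable is enough when the body only needs map, flatMap and filter.
  def doSomethingCool(theFoos: Iterable[Foo]): Iterable[Int] =
    theFoos.filter(_.n % 2 == 0).map(_.n * 10)

  // Seq additionally guarantees that every element has an index.
  def second(theFoos: Seq[Foo]): Option[Foo] = theFoos.lift(1)
}
```

Callers can then pass a List, a Vector or a Set to doSomethingCool directly, with no .toList at the call site.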
The most obvious one is to use Traversable as the most general trait which will have the goodies you want. However, I think you are generally better sticking to:
Seq
IndexedSeq
Set
Map
A Seq will cover List, Vector, etc.; IndexedSeq will cover Vector and so on. I found myself not using Iterable because I often need (or want) to know the size of the thing I have, and before Scala 2.8 Iterable did not provide access to this, so I kept having to turn things into sequences anyway!
Looks like Traversable and Iterable now have size methods so maybe I should go back to using them! Of course you could start "going mad" with GenTraversableOnce but that is not likely to aid in readability.
I am new to Scala and currently a Java developer. Scala was designed to encourage the use of immutability in class design.
How does this translate practically to the design of classes? The only thing that is brought to my mind is case classes. Are case classes strongly encouraged for defining data? Example? How else is immutability encouraged in Scala design of classes?
As a Java developer, I am used to classes that define data being mutable. Should the equivalent Scala classes be defined as case classes?
Well, case classes certainly help, but the biggest contributor is probably the collection library. The default collections are immutable, and the methods are geared toward manipulating collections by producing new ones instead of mutating. Since the immutable collections are persistent, that doesn't require copying the whole collection, which is something one often has to do in Java.
Beyond that, for-comprehensions are monadic comprehensions, which is helpful in doing immutable tasks, there's tail recursion optimization, which is very important in immutable algorithms, and general attention to immutability in many libraries, such as parser combinators and xml.
Finally, note that you have to ask for a var to get some mutability. Parameters are immutable, and val is just as short as var. Contrast this with Java, where parameters are mutable, and you need to add a final keyword to get immutability. Whereas in Scala it is as easy or easier to stay immutable, in Java it is easier to stay mutable.
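For the case class part of the question, a minimal sketch (Person and its fields are invented for the example): constructor parameters are vals unless you ask for var, and copy produces a modified instance instead of mutating the original.

```scala
object ImmutableData {
  // Constructor parameters of a case class are vals by default.
  final case class Person(name: String, age: Int)

  val alice: Person = Person("Alice", 30)

  // copy builds a new instance; alice itself is untouched.
  val older: Person = alice.copy(age = 31)
}
```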
Addendum
Persistent data structures are data structures that share parts between their modified versions. This might be a bit difficult to understand, so let's consider Scala's List, which is pretty basic and easy to understand.
A Scala List is composed of two classes, known as cons and Nil. The former is actually written :: in Scala, but I'll refer to it by the traditional name.
Nil is the empty list. It doesn't contain anything. Methods that depend on the list not being empty, such as head and tail, throw exceptions, while others work fine.
Naturally, cons must then represent a non-empty list. In fact, cons has exactly two elements: a value, and a list. These elements are known as head and tail.
So a list with three elements is composed of three cons, since each cons will hold only one value, plus a Nil. It must have a Nil because a cons must point to a list. As lists are not circular, then one of the cons must point to something other than a cons.
One example of such list is this:
val list = 1 :: 2 :: 3 :: Nil
Now, the components of a Scala List are immutable. One can change neither the value nor the list of a cons. One benefit of immutability is that you never need to copy the collection before passing it to, or after receiving it from, some other method: you know that the list cannot change.
Now, let's consider what would happen if I modified that list. Let's consider two modifications: removing the first element and prepending a new element.
We can remove one element with the method tail, whose name is not a coincidence at all. So, we write:
val list2 = list.tail
And list2 will point to the same list that list's tail is pointing to. Nothing at all was created: we simply reused part of list. So, let's prepend an element to list2 then:
val list3 = 0 :: list2
We created a new cons there. This new cons has a value (a head) equal to 0, and its tail points to list2. Note that both list and list3 point to the same list2. These elements are being shared by both list and list3.
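The sharing can be checked directly with eq, which compares references rather than contents. This is only a sketch; the object name Sharing and the two Boolean names are made up for the example.

```scala
object Sharing {
  val list: List[Int] = 1 :: 2 :: 3 :: Nil
  val list2: List[Int] = list.tail
  val list3: List[Int] = 0 :: list2

  // tail hands back the existing cells instead of copying them.
  val tailIsReused: Boolean = list2 eq list.tail

  // The new cons points at list2 itself, so list and list3 share those cells.
  val list3SharesList2: Boolean = list3.tail eq list2
}
```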
There are many other persistent data structures. The very fact that the data you are manipulating is immutable makes it easy to share components.
One can find more information about this subject in the book by Chris Okasaki, Purely Functional Data Structures, or in his freely available thesis of the same name.
There's something I don't understand about Scala's collection.mutable.Seq. It describes the interface for all mutable sequences, yet I don't see methods to append or prepend elements without creating a new sequence. Am I missing something obvious here?
There are :+ and +: for append and prepend, respectively, but they create new collections — in order to be consistent with the behavior of immutable sequences, I assume. This is fine, but why is there no method like += and +=:, like ArrayBuffer and ListBuffer define, for in-place append and prepend? Does it mean that I cannot refer to a mutable seq that's typed as collection.mutable.Seq if I want to do in-place append?
Again, I must have missed something obvious, but cannot find what…
Mutability for sequences only guarantees that you'll be able to swap out the items for different ones (via the update method), as you can with e.g. primitive arrays. It does not guarantee that you'll be able to make the sequence larger (that's what the Growable trait is for) or smaller (Shrinkable).
Buffer is the abstract trait that contains Growable and Shrinkable, not Seq.
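In practice this means choosing the parameter type by what the method needs to do. A small sketch, where zeroFirst and surround are hypothetical helper names:

```scala
import scala.collection.mutable

object SeqVsBuffer {
  // mutable.Seq: elements can be replaced in place, but the length stays fixed.
  def zeroFirst(xs: mutable.Seq[Int]): Unit =
    if (xs.nonEmpty) xs(0) = 0

  // mutable.Buffer: can additionally grow (and shrink) in place.
  def surround(xs: mutable.Buffer[Int], edge: Int): Unit = {
    edge +=: xs // in-place prepend
    xs += edge  // in-place append
  }
}
```

Both ArrayBuffer and ListBuffer can be passed to either method, since a Buffer is also a mutable Seq.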
What is the difference between Scala's MutableList and ListBuffer classes in scala.collection.mutable? When would you use one vs the other?
My use case is having a linear sequence where I can efficiently remove the first element, prepend, and append. What's the best structure for this?
A little explanation on how they work.
ListBuffer internally uses Nil and :: to build an immutable List and allows constant-time append and prepend, as well as constant-time removal of the first element. To do so, it keeps a pointer to the first and last element of the list, and is actually allowed to change the head and tail of the (otherwise immutable) :: class (a nice trick allowed by the private[scala] var members of ::). Its toList method returns the normal immutable List in constant time as well, as it can directly return the structure maintained internally. It is also the default builder for immutable Lists (and thus can indeed be reasonably expected to have constant-time append). If you call toList and then append another element to the buffer, it takes linear time with respect to the current number of elements in the buffer to recreate a new structure, as it must not mutate the exported list any more.
MutableList works internally with LinkedList instead, an (openly, not like ::) mutable linked list implementation which knows of its element and successor (like ::). MutableList also keeps pointers to the first and last element, but toList returns in linear time, as the resulting List is constructed from the LinkedList. Thus, it doesn't need to reinitialize the buffer after a List has been exported.
Given your requirements, I'd say ListBuffer and MutableList are equivalent. If you want to export their internal list at some point, then ask yourself where you want the overhead: when you export the list, with no overhead if you go on mutating the buffer (then go for MutableList), or only if you mutate the buffer again, with none at export time (then go for ListBuffer).
My guess is that in the 2.8 collection overhaul, MutableList predated ListBuffer and the whole Builder system. Actually, MutableList is predominantly useful from within the collection.mutable package: it has a private[mutable] def toLinkedList method which returns in constant time, and can thus efficiently be used as a delegated builder for all structures that maintain a LinkedList internally.
So I'd also recommend ListBuffer, as it is more likely to get attention and optimization in the future than "purely mutable" structures like MutableList and LinkedList.
This gives you an overview of the performance characteristics: http://www.scala-lang.org/docu/files/collections-api/collections.html ; interestingly, MutableList and ListBuffer do not differ there. The documentation of MutableList says it is used internally as base class for Stack and Queue, so maybe ListBuffer is more the official class from the user perspective?
You want a list (why a list?) that is growable and shrinkable, and you want constant append and prepend. Well, Buffer, a trait, has constant append and prepend, with most other operations linear. I'm guessing that ListBuffer, a class that implements Buffer, has constant time removal of the first element.
So, my own recommendation is for ListBuffer.
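For the use case in the question (remove the first element, prepend, append), a short ListBuffer sketch; the object name BufferOps is made up for the example.

```scala
import scala.collection.mutable.ListBuffer

object BufferOps {
  val buf: ListBuffer[Int] = ListBuffer(2, 3)

  buf += 4                          // append
  1 +=: buf                         // prepend
  val removed: Int = buf.remove(0)  // remove the first element

  // The exported List is immutable; later changes to buf do not affect it.
  val exported: List[Int] = buf.toList
  buf += 5
}
```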
First, let's go over some of the relevant types in Scala.
List - An immutable collection with a recursive implementation, i.e. an instance of List has two primary elements, the head and the tail, where the tail references another List.
List[T]
head: T
tail: List[T] //recursive
LinkedList - A mutable collection defined as a series of linked nodes, where each node contains a value and a pointer to the next node.
Node[T]
value: T
next: Node[T] //sequential
LinkedList[T]
first: Node[T]
List is a functional data structure (immutability) compared to LinkedList which is more standard in imperative languages.
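The two shapes can be sketched in Scala roughly as follows. This is an illustration only, not the library's actual implementation: RecList, Empty, Cell, Node and NodeList are all hypothetical names, and the last pointer is an assumption added to show how a node-based list can append in constant time.

```scala
object Representations {
  // Immutable, recursive shape: a cell holds a head and a tail that is itself a list.
  sealed trait RecList[+T]
  case object Empty extends RecList[Nothing]
  final case class Cell[+T](head: T, tail: RecList[T]) extends RecList[T]

  // Mutable, node-based shape: each node holds a value and a link to the next node.
  final class Node[T](var value: T, var next: Option[Node[T]] = None)

  final class NodeList[T] {
    private var first: Option[Node[T]] = None
    private var last: Option[Node[T]] = None

    // Constant-time append: relink the last node in place.
    def append(value: T): Unit = {
      val node = new Node[T](value)
      last match {
        case Some(l) => l.next = Some(node)
        case None    => first = Some(node)
      }
      last = Some(node)
    }

    // Converting to the recursive shape walks every node, so it is O(n).
    def toList: List[T] = {
      val out = List.newBuilder[T]
      var cur = first
      while (cur.isDefined) {
        out += cur.get.value
        cur = cur.get.next
      }
      out.result()
    }
  }
}
```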
Now, let's look at
ListBuffer - A mutable buffer implementation backed by a List.
MutableList - An implementation based on LinkedList (it would have been more self-explanatory if it had been named LinkedListBuffer instead).
They both offer similar complexity bounds on most operations.
However, if you request a List from a MutableList, it has to convert the existing linear representation into the recursive representation, which takes O(n), as Jean-Philippe Pellet points out. But if you request a Seq from a MutableList, the complexity is O(1).
So, IMO the choice narrows down to the specifics of your code and your preference. Though, I suspect there is a lot more List and ListBuffer out there.
Note that ListBuffer is final/sealed, while you can extend MutableList.
Depending on your application, extensibility may be useful.