What is the most effective structure for appending elements to a List-like collection in Scala? - scala

I have to append elements to my collection. Which structure is preferable? Appending to a List costs O(n); what about ListBuffer, ArrayBuffer, Set, Map and other structures?

ListBuffer, according to the docs:
It provides constant time prepend and append.
But it is a mutable structure, so be careful when using it - preferably keep it in a very limited scope (e.g. a function or method).
ArrayBuffer according to the documentation:
Prepends and removes are linear in the buffer size.
This structure is built on top of a dynamic array, so an append occasionally has to allocate a larger internal array and copy the existing elements into it. The copy itself is fast on the JVM, but it means appends are amortized constant time rather than strictly constant time. See the System.arraycopy documentation for more details. It is also a mutable structure.
Set and Map are not List-like at all. A Set is an unordered structure (a List IS ordered) that contains ONLY unique elements. A Map[K, V] stores, as the name suggests, a mapping from keys of type K to values of type V.
So in conclusion: if you need to append elements, I'd suggest going with ListBuffer, but since it is a mutable structure, limit the scope of its usage, and whenever you need to pass it somewhere, convert it to a List.
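As a minimal sketch of that advice, the ListBuffer below lives only inside one method and is converted to an immutable List before it is returned. The names AppendSketch and squaresUpTo are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical example object; not part of any library.
object AppendSketch {
  // Builds a List by appending; the mutable buffer never leaves this method.
  def squaresUpTo(n: Int): List[Int] = {
    val buf = ListBuffer.empty[Int]
    for (i <- 1 to n) buf += i * i // append at the back
    buf.toList                     // callers only ever see an immutable List
  }
}
```

Calling AppendSketch.squaresUpTo(4) yields List(1, 4, 9, 16).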

Related

Is there any benefit of working with an Iterator over a List

Is there any benefit to manipulating an Iterator over a List?
I need to know if concatenating two Iterators is better than concatenating two Lists.
In a sense, what is the fundamental difference between working with an Iterator and working with the actual collection?
An Iterator isn't an actual data structure, although it behaves similarly to one. It is just a traversal pointer into some actual data structure. Thus, unlike an actual data structure, an Iterator can't "go back," that is, access old elements. Once you've gone through an Iterator, you're done.
What's cool about Iterator is that you can call map, filter, or other transformations on it, and instead of modifying any existing data structure, it will apply the transformation lazily, the next time you ask for an element.
"Concatenating" two Iterators creates a new Iterator that wraps both of them.
On the other hand, Lists are actual collections and can be re-traversed.
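A small sketch may make the laziness and the single-pass nature concrete. It counts how often the function passed to map has run. IteratorSketch and demo are hypothetical names used only for this illustration.

```scala
// Hypothetical example object; not part of any library.
object IteratorSketch {
  // Returns (calls to the map function before traversal, all elements, hasNext afterwards).
  def demo(): (Int, List[Int], Boolean) = {
    var evaluated = 0
    val left  = Iterator(1, 2, 3).map { x => evaluated += 1; x * 10 } // nothing runs yet
    val right = Iterator(4, 5)
    val both  = left ++ right      // a new Iterator wrapping both, still untraversed
    val before = evaluated         // the map function has not been called so far
    val all = both.toList          // traversal happens here, exactly once
    (before, all, both.hasNext)    // the Iterator is now exhausted
  }
}
```

Here demo() returns (0, List(10, 20, 30, 4, 5), false): no element is transformed until toList pulls it, and afterwards the Iterator cannot be traversed again.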

Since Scala lists are immutable, are they actually traversed at run-time for operations, length, last or xs(n)?

Until now I thought a list had to be traversed to count the length of it or get the last element.
Then I thought "since it is immutable, the length or last element, or any element for that matter, are all constant, so maybe some work could be saved by storing those in pointers on creation of a list".
If I have a list xs and use xs.length, and later on I use xs.length again, will the list be traversed twice?
Yes, the list is traversed with every call to length.
The thing about List is that there is no "manager" container to store all that information. A reference to a list is actually a reference to the first node of that list, and that node only knows about its own data element and the next node in the list. You could come up with a mechanism to cache that information, but it would increase the overhead of List.
Sometimes. It depends on which implementation of List you are talking about. Most lists are defined as recursive data structures, e.g. head :: (tail: List). I think ListBuffer has a constant-time lookup for length.
The docs detail the performance of typical operations.
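Since every call to length walks the list, one simple workaround is to call it once and keep the result in a local val. The following is only a sketch of that idea; LengthSketch and average are hypothetical names.

```scala
// Hypothetical example object; not part of any library.
object LengthSketch {
  // Computes the average while traversing for the length only once.
  def average(xs: List[Int]): Double = {
    val n = xs.length // one O(n) traversal; the result is reused below
    if (n == 0) 0.0 else xs.sum.toDouble / n
  }
}
```

If length or indexed access is needed often, switching to an indexed sequence such as Vector avoids the repeated traversal altogether.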

Scala how to update values in immutable list

I have an immutable list and need a new copy of it with elements replaced at multiple index locations. List.updated is an O(n) operation and can only replace one element at a time. What is an efficient way of doing this? Thanks!
List is not a good fit if you need random element access/update. From the documentation:
This class is optimal for last-in-first-out (LIFO), stack-like access patterns. If you need another access pattern, for example, random access or FIFO, consider using a collection more suited to this than List.
More generally, what you need is an indexed sequence instead of a linear one (such as List). From the documentation of IndexedSeq:
Indexed sequences support constant-time or near constant-time element access and length computation. They are defined in terms of abstract methods apply for indexing and length.
Indexed sequences do not add any new methods to Seq, but promise efficient implementations of random access patterns.
The default concrete implementation of IndexedSeq is Vector, so you may consider using it.
Here's an extract from its documentation (emphasis added):
Vector is a general-purpose, immutable data structure. It provides random access and updates in effectively constant time, as well as very fast append and prepend. Because vectors strike a good balance between fast random selections and fast random functional updates, they are currently the default implementation of immutable indexed sequences
list
.iterator
.zipWithIndex
.map { case (element, index) => newElementFor(index) } // zipWithIndex yields (element, index) pairs
.toList
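The Vector suggestion above could be sketched like this: convert the List to a Vector once, apply each replacement with updated, and convert back. UpdateSketch, updatedAt and the replacements parameter are hypothetical names introduced for the example. An index outside the list throws IndexOutOfBoundsException, just as List.updated does.

```scala
// Hypothetical example object; not part of any library.
object UpdateSketch {
  // Applies every (index -> value) replacement and returns a new List.
  def updatedAt[A](xs: List[A], replacements: Map[Int, A]): List[A] = {
    val vec = xs.toVector // one O(n) copy into an indexed structure
    replacements
      .foldLeft(vec) { case (acc, (i, v)) => acc.updated(i, v) } // each update is effectively constant time
      .toList
  }
}
```

For example, UpdateSketch.updatedAt(List("a", "b", "c", "d"), Map(1 -> "X", 3 -> "Y")) gives List("a", "X", "c", "Y"). If the data can stay in a Vector throughout, both conversions disappear.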

LinkedList vs MutableList in scala

Below are the descriptions of both data structures (from the book Programming in Scala):
Linked lists
Linked lists are mutable sequences that consist of nodes that are linked with next pointers. In most languages null would be picked as the empty linked list. That does not work for Scala collections, because even empty sequences must support all sequence methods. LinkedList.empty.isEmpty, in particular, should return true and not throw a NullPointerException. Empty linked lists are encoded instead in a special way: Their next field points back to the node itself. Like their immutable cousins, linked lists are best operated on sequentially. In addition, linked lists make it easy to insert an element or linked list into another linked list.
Mutable lists
A MutableList consists of a single linked list together with a pointer that refers to the terminal empty node of that list. This makes list append a constant time operation because it avoids having to traverse the list in search for its terminal node. MutableList is currently the standard implementation of mutable.LinearSeq in Scala.
The main difference is the addition of a pointer to the last element in MutableList.
The question is: when would one prefer LinkedList over MutableList? Isn't MutableList strictly equivalent (apart from the new pointer) and even more practical, at the cost of a tiny amount of extra memory (the last element's pointer)?
Since MutableList wraps a LinkedList, most operations involve an extra indirection step. Note that wrapping means it holds internal references to a LinkedList (two, in fact, because of the efficient last-element lookup). So the linked list is a required building block for realising the mutable list.
If you do not need prepend or lookup of the last element, you could thus just go for the LinkedList. Scala offers you a large choice of data structures, so it is best to first make a checklist of all the operations that you require (and their preferred efficiency), then choose the best fit.
Generally, I recommend using immutable structures: they are often as efficient as the mutable ones and don't cause problems with concurrency.

Which scala mutable list to use?

This is a followup question to No Scala mutable list
I want to use a mutable list in Scala. I can choose from
scala.collection.mutable.DoubleLinkedList
scala.collection.mutable.LinkedList
scala.collection.mutable.ListBuffer
scala.collection.mutable.MutableList
Which is nice, but what is the "standard", recommended, idiomatic Scala way? I just want to use a list that I can add things to on the back.
In my case, I am using a HashMap, where the "lists" (I mean it in a general sense) will be on the value side. Then, I am reading something from a file, and for every line, I want to find the right list in the HashMap and append the value to the list.
Depends what you need.
DoubleLinkedList is a linked list which allows you to traverse back-and-forth through the list of nodes. Use its prev and next references to go to the previous or the next node, respectively.
LinkedList is a singly linked list, so there are no prev pointers - if you only ever traverse to the next element of the list, this is what you need.
EDIT: Note that the two above are meant to be used internally as building blocks for more complicated list structures like MutableLists which support efficient append, and mutable.Queues.
The two collections above both have linear-time append operations.
ListBuffer is a buffer class. Although it is backed by a singly linked list data structure, it does not expose the next pointer to the client, so you can only traverse it using iterators and foreach.
Its main use is, however, as a buffer and an immutable list builder - you append elements to it via +=, and when you call result, you very efficiently get back a functional immutable.List. Unlike mutable and immutable lists, both append and prepend operations are constant-time - you can append at the end via += very efficiently.
MutableList is used internally; you usually do not use it unless you plan to implement a custom collection class based on the singly linked list data structure. Mutable queues, for example, inherit from this class. The MutableList class also has an efficient constant-time append operation, because it maintains a reference to the last node in the list.
The documentation's Concrete Mutable Collection Classes page (or the one for 2.12) has an overview of mutable list classes, including explanations on when to use which one.
If you want to append items you shouldn't use a List at all. Lists are good when you want to prepend items. Use ArrayBuffer instead.
I just want to use a list that I can add things to on the back.
Then choose something that implements Growable. I personally suggest one of the Buffer implementations.
I stay away from LinkedList and DoubleLinkedList, as they are present mainly as the underlying implementation of other collections, and had quite a few bugs up to Scala 2.9.x. Starting with Scala 2.10.0, I expect the various bug fixes have brought them up to standard. Still, they lack some methods people expect, such as +=, which you'll find on collections based on them.
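For the use case in the question (a HashMap whose values are lists that grow at the back), a Buffer-based sketch might look like the following. The "key=value" line format, the GroupSketch object and the group method are assumptions made purely for illustration; the question does not say what the file looks like.

```scala
import scala.collection.mutable

// Hypothetical example object; not part of any library.
object GroupSketch {
  // Groups lines of the assumed form "key=value" by key, appending each value at the back.
  def group(lines: Iterator[String]): Map[String, List[String]] = {
    val table = mutable.HashMap.empty[String, mutable.ListBuffer[String]]
    for (line <- lines) {
      line.split("=", 2) match {
        case Array(k, v) =>
          // Find the buffer for this key, creating an empty one on first sight.
          val buf = table.getOrElseUpdate(k, mutable.ListBuffer.empty[String])
          buf += v // append at the back
        case _ => () // assumption: lines without '=' are skipped
      }
    }
    // Hand out immutable Lists once all lines have been read.
    table.map { case (k, buf) => k -> buf.toList }.toMap
  }
}
```

With the input Iterator("a=1", "b=2", "a=3", "junk") this returns Map("a" -> List("1", "3"), "b" -> List("2")). Swapping ListBuffer for ArrayBuffer would work the same way, since both support += for appending.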