In PureScript, how does List differ from Array?
What are the reasons to use one over the other?
The representation for Array is a JavaScript array, whereas List is implemented as a cons (linked) list.
Lists have better performance characteristics when being built up item-by-item, or iterated over by taking an item from front each time - basically List has O(1) cons and uncons, vs O(n) for Array.
Take a look at the documentation for Array and List on Pursuit for more information about the running time of various operations.
Related
Until now I thought a list had to be traversed to count the length of it or get the last element.
Then I thought "since it is immutable, the length or last element, or any element for that sake, are all constant, so maybe some work could be saved by storing those in pointers on creation of a list".
If I have a list xs and use xs.length, and later on I use xs.length again, will the list be traversed twice?
Yes, the list is traversed with every call to length.
The thing about List is that there is no "manager" container to store all that information. A reference to a list is actually a reference to the first node of that list, and it only knows about it's own data element and the next node in the list. You could come up with a mechanism to cache that information but it would increase the overhead of List.
Sometimes. It depends on which implementation of List you are talking about. Most of the List's are defined as recursive data structures, eg head :: (tail:List) I think ListBuffer has a constant time lookup for length
The docs detail the performance of typical operations.
Previously I had been using "being the elements of" feature of loop to iterate over a sequence of an unknown type. I just found out that "being the elements of" is not provided in every implementation of Common Lisp and am wondering if there is any clean way to iterate over a sequence using loop. The best solution I have been able to come with is to coerce the sequence to a list and then iterate over that.
No, LOOP does not provide such a feature directly. If your LOOP implementation is extensible (which the standard says nothing about, too), you might be able to implement such a feature.
LOOP has clauses to iterate over lists - for item in list - and a clause to iterate over a vector - for e across vector - note that strings are also vectors, one-dimensional arrays. But not both together.
Otherwise use MAP or MAP-INTO to iterate over sequences.
The ITERATE macro provides such a feature: for i in-sequence seq.
Below, both descriptions of these data structures: (from Programming in scala book)
Linked lists
Linked lists are mutable sequences that consist of nodes
that are linked with next pointers. In most languages null would be
picked as the empty linked list. That does not work for Scala
collections, because even empty sequences must support all sequence
methods. LinkedList.empty.isEmpty, in par- ticular, should return true
and not throw a NullPointerException. Empty linked lists are encoded
instead in a special way: Their next field points back to the node
itself. Like their immutable cousins, linked lists are best operated
on sequen- tially. In addition, linked lists make it easy to insert an
element or linked list into another linked list.
Mutable lists
A MutableList consists of a single linked list together with a pointer
that refers to the terminal empty node of that list. This makes list
append a con- stant time operation because it avoids having to
traverse the list in search for its terminal node. MutableList is
currently the standard implementation of mutable.LinearSeq in Scala.
Main difference is the addition of the last element's pointer in MutableList type.
Question is: What might be the usage preferring LinkedList rather than MutableList? Isn't MutableList strictly (despite the new pointer) equivalent and even more practical with a tiny addition of used memory (the last element's pointer)?
Since MutableList wraps a LinkedList, most operations involve an extra indirection step. Note that wrapping means, it contains an internal variable to a LinkedList (indeed two, because of the efficient last element lookup). So the linked list is a required building block to realise the mutable list.
If you do not need prepend or look up of the last element, you could thus just go for the LinkedList. Scala offers you a large choice of data structures, so the best is first to make a checklist of all the operations that you require (and their preferred efficiency), then choose the best fit.
Generally, I recommend you to use immutable structures, they are often as efficient as the mutable ones and don't produce problems with concurrency.
This is a followup question to No Scala mutable list
I want to use a mutable list in Scala. I can chose from
scala.collection.mutable.DoubleLinkedList
scala.collection.mutable.LinkedList
scala.collection.mutable.ListBuffer
scala.collection.mutable.MutableList
Which is nice, but what is the "standard", recommended, idiomatic scala way? I just want to use a list that I can add things to on the back.
In my case, I am using a HashMap, where the "lists" (I am meaning it in general sense) will be on value side. Then, I am reading something from a file and for every line, I want to find the right list in the hashmap and append the value to the list.
Depends what you need.
DoubleLinkedList is a linked list which allows you to traverse back-and-forth through the list of nodes. Use its prev and next references to go to the previous or the next node, respectively.
LinkedList is a singly linked list, so there are not prev pointers - if you only traverse to the next element of the list all the time, this is what you need.
EDIT: Note that the two above are meant to be used internally as building blocks for more complicated list structures like MutableLists which support efficient append, and mutable.Queues.
The two collections above both have linear-time append operations.
ListBuffer is a buffer class. Although it is backed by a singly linked list data structure, it does not expose the next pointer to the client, so you can only traverse it using iterators and the foreach.
Its main use is, however, as a buffer and an immutable list builder - you append elements to it via +=, and when you call result, you very efficiently get back a functional immutable.List. Unlike mutable and immutable lists, both append and prepend operations are constant-time - you can append at the end via += very efficiently.
MutableList is used internally, you usually do not use it unless you plan to implement a custom collection class based on the singly linked list data structure. Mutable queues, for example, inherit this class. MutableList class also has an efficient constant-time append operation, because it maintains a reference to the last node in the list.
The documentation's Concrete Mutable Collection Classes page (or the one for 2.12) has an overview of mutable list classes, including explanations on when to use which one.
If you want to append items you shouldn't use a List at all. Lists are good when you want to prepend items. Use ArrayBuffer instead.
I just want to use a list that I can add things to on the back.
Then choose something that implements Growable. I personally suggest one of the Buffer implementations.
I stay away from LinkedList and DoubleLinkedList, as they are present mainly as underlying implementation of other collections, but have quite a few bugs up to Scala 2.9.x. Starting with Scala 2.10.0, I expect the various bug fixes have brought them up to standard. Still, they lack some methods people expect, such as +=, which you'll find on collections based on them.
At first I assumed that every collection class would receive an additional par method which would convert the collection to a fitting parallel data structure (like map returns the best collection for the element type in Scala 2.8).
Now it seems that some collection classes support a par method (e. g. Array) but others have toParSeq, toParIterable methods (e. g. List). This is a bit weird, since Array isn't used or recommended that often.
What is the reason for that? Wouldn't it be better to just have a par available on all collection classes doing the "right thing"?
If I have data which might be processed in parallel, what types should I use? The traits in scala.collection or the type of the implementation directly?
Or should I prefer Arrays now, because they seem to be cheaper to parallelize?
Lists aren't that well suited for parallel processing. The reason is that to get to the end of the list, you have to walk through every single element. Thus, you may as well just treat the list as an iterator, and thus may as well just use something more generic like toParIterable.
Any collection that has a fast index is a good candidate for parallel processing. This includes anything implementing LinearSeqOptimized, plus trees and hash tables. Array has as fast of an index as you can get, so it's a fairly natural choice. You can also use things like ArrayBuffer (which has a par method returning a ParArray).