Scala which data structure is most efficient for my intended operations? - scala

I am running a dynamic programming function where I carry a list of strings throughout the process.
Over time, I append new strings to the end of this list, and occasionally I may remove the last element. Right now I am currently using a mutable ListBuffer, doing += for the appends and .trimEnd(1) for the removals.
Once my dynamic programming procedure is done, I need to efficiently be able to access each element of that list/sequence/etc, and in order (the first item I inserted will be first accessed, whereas last item I inserted will be the last accessed).
I've also tried ArrayBuffers but they both seem too slow. I am trying to speed this process up and I am wondering if I am using a data structure that has O(n) operations when there may be something that has O(1) time operations for what I need.

A simple singly linked list will provide O(1) for the append/discard portions of what you describe. Worst case, if you need to reverse the list at the end before processing, that would be an O(n) operation paid once.
Note that if you go down this road, during the accumulation phase "first" and "last" will be reversed (you will prepend items and drop the first item by getting the tail of the list).

Related

Is there any benefit of working with an Iterator over a List

Is there any benefit of manipulating an Iterator over or List ?
I need to know if concatenating 2 iterators is better that concatenating to List ?
In a sense what the fundamental difference between working with iterator over the actual collection.
An Iterator isn't an actual data structure, although it behaves similar to one. It is just a traversal pointer to some actual data structure. Thus, unlike in an actual data structure, an Iterator can't "go back," that is, access old elements. Once you've gone through an Iterator, you're done.
What's cool about Iterator is that you can give it a map, filter, or other transformation elements, and instead of actually modifying any existing data structure, it will instead apply the transformation the next time you ask for an element.
"Concatenating" two Iterators creates a new Iterator that wraps both of them.
On the other hand, Lists are actual collections and can be re-traversed.

swift array 'contains' function in place optimization

I am wondering if the built-in array structure for the contains function has any optimizations. If it does a linear search of contains every time I run it, it's at best O(n), which turns into O(n^2) because I'll be looping through another set of points to check against, however if it somehow behind the scenes sorts the array the first time 'contains' is run, then every subsequent 'contains' would be O(log(n)).
I have an array that gradually gets larger the more in depth the user gets into the application, and I am using the 'contains' a lot, so I am expecting it to slow down the application the longer the user is using the application.
If the array doesn't have any behind the scenes optimizations, then ought I build my own? (e.g. quicksort, and do an insert(newElement:, at:) every time I add to the array?)
Specifically, I'm using
[CGPoint],
CGPointArrayVariable.contains(newCGPoint) // run 100s - 10000s of times ideally every frame, (but realistically probably every second)
and when I add new CGPoints I'm using
CGPointArrayVariable += newCGPointSet.
So, the question: Am I ok continuing to use the built in .contains function (will it be fast enough?) or should I build my own structure optimizing for the contains, and keeping the array sorted? (maybe an insertion sort would be better to use opposed to a quicksort, if that's the direction recommended)
Doing something like that every frame will be VERY inefficient.
I would suggest re-thinking your design to avoid tracking that amount of information for every frame.
If it's absolutely necessary, building your own Type that uses a Dictionary rather than an Array should be more efficient.
Also, if it works with your use case using a Setmight be the best option.

Since Scala lists are immutable, are they actually traversed at run-time for operations, length, last or xs(n)?

Until now I thought a list had to be traversed to count the length of it or get the last element.
Then I thought "since it is immutable, the length or last element, or any element for that sake, are all constant, so maybe some work could be saved by storing those in pointers on creation of a list".
If I have a list xs and use xs.length, and later on I use xs.length again, will the list be traversed twice?
Yes, the list is traversed with every call to length.
The thing about List is that there is no "manager" container to store all that information. A reference to a list is actually a reference to the first node of that list, and it only knows about it's own data element and the next node in the list. You could come up with a mechanism to cache that information but it would increase the overhead of List.
Sometimes. It depends on which implementation of List you are talking about. Most of the List's are defined as recursive data structures, eg head :: (tail:List) I think ListBuffer has a constant time lookup for length
The docs detail the performance of typical operations.

Scala's toList function appears to be slow

I was under the impression that calling seq.toList() on an immutable Seq would be making a new list which is sharing the structural state from the first list. We're finding that this could be really slow and I'm not sure why. It is just sharing the structural state, correct? I can't see why it'd be making an n-time copy of all the elements when it knows they'll never change.
A List in Scala is a particular data structure: instances of :: each containing a value, followed by Nil at the end of the chain.
If you toList a List, it will take O(1) time. If you toList on anything else then it must be converted into a List, which involves O(n) object allocations (all the :: instances).
So you have to ask whether you really want a scala.collection.immutable.List. That's what toList gives you.
Sharing structural state is possible for particular operations on particular data structures.
With the List data structure in Scala, my understanding is that every element refers to the next, starting from the head through the tail, so a singly linked list.
From a structural state sharing perspective, consider the restrictions placed on this from the internal data structure perspective. Adding an element to the head of a List (X) effectively creates a new list (X') with the new element as the head of X' and the old list (X) as the tail. For this particular operation, internal state can be shared completely.
The same operation above can be applied to create a new List (X'), with the new element as the head of X' and any element from X as the tail, as long as you accept that the tail will be the element you choose from X, plus all additional elements it already has in it's data structure.
When you think about it logically, each data structure has an internal structure that allows some operations to be performed with simple shared internal structure and other operations requiring more invasive and costly computations.
The key from my perspective here is having an understanding of the constraints placed on the operations by the internal data structure itself.
For example, consider the same operations above on a doubly linked list data structure and you will see that there are quite different restrictions.
Personally, I find drawing out an understanding of the internal structure can be helpful in understanding the consequences of particular operations.
In the case of the toList operation on an arbitrary sequence, with no knowledge of the arbitrary sequences internal data structure, one has to therefore assume O(n). List.toList has the obvious performance advantage of already being a list.

Scala List.updated

I am curious about List.updated. What is it's runtime? And how does it compare to just changing one element in an ArrayBuffer? In the background, how does it deal with copying all of the list? Is this an O(n) procedure? If so, is there an immutable data structure that has an updated like method without being so slow?
An example is:
val list = List(1,2,3)
val list2 = list.updated(2, 5) --> # list2 = (1,5,3)
var abuf = ArrayBuffer(1,2,3)
abuf(2) = 5 --> # abuf = (1,5,3)
The time and memory complexity of the updated(index, value) method is linear in terms of index (not in terms of the size of the list). The first index cells are recreated.
Changing an element in an ArrayBuffer has constant time and memory complexity. The backing array is updated in place, no copying occurs.
This updated method is not slow if you update elements near the beginning of the list. For larger sequences, Vector has a different way to share common parts of the list and will probably do less copying.
List.updated is an O(n) operation (linear).
It calls the linear List.splitAt operation to split the list at the index to get two list (before, rest), then builds a new list by appending the elements in before, the updated element and then the elements in rest.tail.
I'm not sure - this would have to be tested, but it seems that if the updated element was at the start of the list, it may be pretty efficient as in theory getting rest and appending rest.tail could be done in constant time.
I suppose performance would be O(n) since list doesn't store index to each element and implemented as links to next el -> el2 -> el3` so only list.head operation are O(1) as fast.
You should use IndexedSeq for that purpose with most common implmentation Vector.
Although it doesn't copy any data so only 1 value are actually updated in memory.
In general all scala immutable collections dosn't copy all data on modification or creation of updated new instance. It is key difference with Java collections.