Efficient way to get a random element in Scala?

What is an efficient way to get a random element from a collection in Scala? There's a related question here, but like one of the comments pointed out, "[that] question does not specify any efficiency needs".

An arbitrary collection cannot be accessed in constant time, so you need a collection with that property, for instance Vector or Array. See Performance Characteristics of collections for others.

util.Random.shuffle(List.range(1,100)) take 3

Use a collection with constant-time size and constant-time indexed access, and pick a random index.

If you need all of the collection's elements in random order, then Random.shuffle is what you need. (It's worth converting the original collection to an Array first, to avoid converting back and forth.)
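If you only need a single element, a minimal sketch of an O(1) pick from a non-empty indexed collection (randomElement is just an illustrative helper name):

import scala.util.Random

// O(1) per pick on an IndexedSeq such as Vector or Array, because apply(i)
// is (effectively) constant time.
def randomElement[A](xs: IndexedSeq[A]): A =
  xs(Random.nextInt(xs.length))

randomElement(Vector("a", "b", "c", "d"))  // e.g. "c"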

Related

What is the best way to return a view to an ArraySeq?

Ideally when working from an IndexedSeqView over an ArraySeq, there should be only one array copy (or at least only one new array allocated) when converting the view back to an ArraySeq after some manipulation.
What is the most performant way to get back to an ArraySeq?
A couple options that come to mind:
ArraySeq.unsafeWrapArray(view.toArray)
seems promising, but does .toArray know that the view is over an array, so it can use fast array copies to populate the new array?
ArraySeq.from(view)
I'm guessing this is O(n)
The standard way would be
view.to(ArraySeq)
Looking at the documentation, the recommended way is calling view.force to evaluate the transformations for the collection.
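For reference, a minimal sketch (Scala 2.13) of the conversions discussed above; a concrete element type is used so the ClassTag needed by the array-backed factories is available:

import scala.collection.immutable.ArraySeq

val original: ArraySeq[Int] = ArraySeq(1, 2, 3, 4, 5)
val view = original.view.map(_ * 10)              // lazy transformation over the ArraySeq

// Copies the view into a fresh array, then wraps that array without a second copy.
val viaWrap: ArraySeq[Int] = ArraySeq.unsafeWrapArray(view.toArray)

// The generic conversion; also O(n), building through the ArraySeq factory.
val viaTo: ArraySeq[Int] = view.to(ArraySeq)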

Best practices for finding objects from a list of ids?

I need to retrieve all documents associated with a list of ids.
I thought of a couple of ways to do it:
Use a filter, then add an or clause, and for each id add an "id equals x" condition to that or.
Use the in operator, like query.field("_id").in(ids)
something else?
Is there a method considered to be a best practice for this kind of case?
Also, which method performs best in large data sets?

scala queue sort method

I am comparing a number of different methods for organizing the nodes at the "frontier" in Dijkstra's single-source shortest path algorithm. One of the implementations that I am playing around with is using q: scala.collection.mutable.Queue.
Essentially, each time I add a node to q, I sort q. This method, as expected, takes significantly longer than using scala.collection.mutable.PriorityQueue and a MinHeap that I implemented. My question is, what kind of sort is Queue using when I call q.sorted? I am specifically interested in the time complexity of the sorted implementation.
I have tried looking at the API (http://www.scala-lang.org/api/2.10.2/index.html#scala.collection.mutable.Queue) and code (https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/mutable/Queue.scala#L1) but haven't been able to track this down.
Thank you in advance for your help.
Queue inherits the sorted method from SeqLike. Looking at the implementation, it copies the elements into a new array, sorts that array via java.util.Arrays.sort, and then builds a new collection of the original type, so each call is O(n log n) plus the cost of the copies.
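That amounts to an O(n log n) sort on every insertion. For comparison, a minimal sketch of a frontier kept in scala.collection.mutable.PriorityQueue, where each insertion and removal is O(log n) (the (node, distance) pairs are just illustrative):

import scala.collection.mutable

// PriorityQueue is a max-heap, so reverse the ordering on the distance
// component to always pop the node with the smallest tentative distance.
val byDistance = Ordering.by[(Int, Int), Int](_._2).reverse
val frontier = mutable.PriorityQueue.empty[(Int, Int)](byDistance)

frontier.enqueue((1, 7))
frontier.enqueue((2, 3))
frontier.enqueue((3, 5))

val (node, dist) = frontier.dequeue()   // (2, 3): O(log n), no full sort needed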

Efficiently accessing array within array

I have a data type called Filter which has an NSMutableArray property holding a bunch of FilterKey objects (a different number of keys for each filter). I have a bunch of these Filter objects stored in an NSMutableArray called Filters.
I have a UITableView for which each row is populated with data from one of the FilterKey objects. My question is, for a given row in the UITableView, how can I use the Filters array to find the right FilterKey (considering I've already put the Filters and Keys in order manually)?
Basically, I know I could just traverse the Filters array and, for each Filter object, traverse all its FilterKeys, but I'm wondering whether there is a better way to do this (i.e. a better data structure, or something that would give me the right object faster)?
Sorry if this is confusing; if you have any questions, please let me know in the comments.
Typically you would use sections and rows for this, where each section is a Filter and each row is a FilterKey.
It sounds like you just want to show the filter keys, and not have section headers for their filters (if I'm reading your post correctly). If you don't actually want headers, that's fine, just return 0 for tableView:heightForHeaderInSection: and nil for tableView:viewForHeaderInSection:.
All of this is really more for convenience than performance. It is unlikely that it will be much faster than running through the filters and adding up the counts of their keys. That kind of operation is extremely fast. But sections/rows maps your data better, so I'd probably use it anyway to keep the code simpler.
You can use an NSMutableDictionary, which is hash-based, resulting in faster and more readable lookups.
If you prefer arrays, there is no need to traverse them manually to search for a specific value; you can use NSPredicate to filter your array.

What is the preferred way of using the parallel collections in Scala?

At first I assumed that every collection class would receive an additional par method which would convert the collection to a fitting parallel data structure (like map returns the best collection for the element type in Scala 2.8).
Now it seems that some collection classes support a par method (e.g. Array) but others have toParSeq, toParIterable methods (e.g. List). This is a bit weird, since Array isn't used or recommended that often.
What is the reason for that? Wouldn't it be better to just have a par available on all collection classes doing the "right thing"?
If I have data which might be processed in parallel, what types should I use? The traits in scala.collection or the type of the implementation directly?
Or should I prefer Arrays now, because they seem to be cheaper to parallelize?
Lists aren't that well suited for parallel processing. The reason is that to get to the end of the list, you have to walk through every single element. Thus you may as well treat the list as an iterator, and use something more generic like toParIterable.
Any collection that has a fast index is a good candidate for parallel processing. This includes anything implementing IndexedSeq, plus trees and hash tables. Array has as fast an index as you can get, so it's a fairly natural choice. You can also use things like ArrayBuffer (which has a par method returning a ParArray).
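A minimal sketch of the difference, assuming a Scala 2.10-2.12 standard library where .par is available on every collection (it copies when the source has no fast index):

// Vector and Array have constant-time indexing, so .par is cheap and the
// parallel collection can split the work into index ranges.
val vec = Vector.tabulate(1000000)(_.toLong)
val parSumVec = vec.par.map(_ * 2).sum

// List has no fast index: .par first copies the elements into a
// parallel-friendly structure, an O(n) cost before any parallel work starts.
val list = List.range(0, 1000000)
val parSumList = list.par.map(_ * 2L).sum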