Where does Array get its toList method - scala

Through searches, I understand the way (or, a way) to convert an Array to a List is like so:
val l = Array(1, 2, 3).toList
But not only can I not find the toList method in Array's API docs, I can't find it in anything that seems to be an ancestor or inherited trait of Array.
Using the newer 2.9 API docs, I see that toList exists in these things:
ImmutableMapAdaptor ImmutableSetAdaptor IntMap List ListBuffer LongMap
MutableList Option ParIterableLike PriorityQueue Stack StackProxy
StreamIterator SynchronizedSet SynchronizedStack TraversableForwarder
TraversableOnce TraversableOnceMethods TraversableProxyLike
But I can't understand how toList gets from one of these to be part of Array. Can anyone explain this?

toList and similar methods not natively found on Java arrays (including our old favourites map, flatMap, filter, etc.) come from scala.collection.mutable.ArrayOps (s.c.m.ArrayOps), which arrays acquire via implicit conversions in scala.Predef. Look for implicit methods whose names end with ArrayOps and you'll see where the magic comes from.
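To make the mechanism a bit more concrete, here is a minimal sketch that imitates it with a user-defined implicit class. MyArrayOps and toListSketch are hypothetical names made up for illustration; they are not part of the standard library, and the real ArrayOps is implemented differently. The sketch assumes Scala 2.10 or later, where implicit classes are available.

```scala
object ArrayEnrichmentSketch {
  // Hypothetical wrapper playing the role that ArrayOps plays for real arrays:
  // the compiler wraps an Array in it when a method is not found on Array itself.
  implicit class MyArrayOps[A](private val xs: Array[A]) {
    // Hypothetical method name, chosen so it cannot clash with the real toList.
    def toListSketch: List[A] = xs.foldRight(List.empty[A])(_ :: _)
  }

  // Standard library call: toList is not declared on Array, a conversion supplies it.
  val viaStandardLibrary: List[Int] = Array(1, 2, 3).toList

  // Same idea, but through the hypothetical wrapper defined above.
  val viaSketch: List[Int] = Array(1, 2, 3).toListSketch

  def main(args: Array[String]): Unit = {
    println(viaStandardLibrary) // List(1, 2, 3)
    println(viaSketch)          // List(1, 2, 3)
  }
}
```

In both calls the array itself gains no new methods; the compiler rewrites the call to go through a wrapper that is in implicit scope.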

Related

How does Array.toList work in Scala?

I'm looking at this snippet:
val here: Array[Int] = rdd.collect()
println(here.toList)
... looking at the source for toList (on TraversableOnce) and Array (Array does not inherit TraversableOnce though), but I can't find the connection that will make Scala consider an Array as a TraversableOnce - if that is even what happens. Is there some implicit at work here? Is there a conversion via ArraySeq or WrappedArray? How does that toList work?
Array does not extend TraversableOnce, but it is implicitly convertible to IndexedSeq, which does!
This means that internally, the array is converted to a WrappedArray, then toList is called on this.
See here for more info:
http://www.scala-lang.org/docu/files/collections-api/collections_38.html
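A small sketch of what "implicitly convertible" means here, assuming nothing beyond the standard library. The object and value names are made up for illustration. A type ascription to a sequence type makes the compiler wrap the array, while the array itself never becomes a Seq.

```scala
object ArrayViewsSketch {
  val arr: Array[Int] = Array(1, 2, 3)

  // The expected type is a sequence, so the compiler inserts a wrapping conversion.
  val wrapped: scala.collection.IndexedSeq[Int] = arr

  // toList called on the wrapper, and toList called directly on the array.
  val viaWrapper: List[Int] = wrapped.toList
  val direct: List[Int] = arr.toList

  def main(args: Array[String]): Unit = {
    println(viaWrapper) // List(1, 2, 3)
    println(direct)     // List(1, 2, 3)
    // The wrapper is a Seq, the underlying array is not.
    println(wrapped.isInstanceOf[scala.collection.Seq[_]])     // true
    println((arr: Any).isInstanceOf[scala.collection.Seq[_]])  // false
  }
}
```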

How to iterate through lazy iterable in scala? from stanford-tmt

Scala newbie here.
I'm using Stanford's Topic Modeling Toolkit,
and it has a lazy iterable of type LazyIterable[(String, Array[Double])].
How should I iterate through all the elements in this iterable, say, to print all the values?
I tried doing this with:
while(it.hasNext){
System.out.println(it.next())
}
This gives an error:
error: value next is not a member of scalanlp.collection.LazyIterable[(String, Array[Double])]
This is the API source -> iterable_name ->
InferCVB0DocumentTopicDistributions in
http://nlp.stanford.edu/software/tmt/tmt-0.4/api/edu/stanford/nlp/tmt/stage/package.html
Based on its source code, I can see that the LazyIterable implements the standard Scala Iterable interface, which means you have access to all the standard higher-order functions that all Scala collections implement - such as map, flatMap, filter, etc.
The one you will be interested in for printing all the values is foreach. So try this (no need for the while-loop):
it.foreach(println)
This seems to be a method invocation problem. Check the source code of LazyIterable and look at line 46:
override def iterator : Iterator[A]
Once you have an instance of LazyIterable, call its iterator method; the resulting Iterator has the hasNext and next methods your while-loop needs.
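Both suggestions can be sketched as follows. Since the toolkit is not assumed to be on the classpath, a plain Iterable stands in for the LazyIterable, and the sample data and the render helper are made up for illustration. Note that printing an Array[Double] directly shows its JVM identity rather than its contents, so the sketch formats it with mkString.

```scala
object IterateSketch {
  // Stand-in for the LazyIterable from the question (assumption: it behaves like any Iterable).
  val it: Iterable[(String, Array[Double])] =
    List("doc1" -> Array(0.25, 0.75), "doc2" -> Array(0.5, 0.5))

  // Hypothetical helper: turns one entry into readable text.
  def render(entry: (String, Array[Double])): String =
    entry._1 + " -> " + entry._2.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    // Option 1: foreach, no explicit loop.
    it.foreach(entry => println(render(entry)))

    // Option 2: ask for an Iterator first, then use hasNext/next.
    val iter = it.iterator
    while (iter.hasNext) {
      println(render(iter.next()))
    }
  }
}
```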

Why aren't toList and friends deprecated?

Prior to version 2.10 of Scala, sequence types had methods like toList and toArray for converting from one type to another. As of Scala 2.10 we have to[_], e.g. to[List], which appears to subsume toList and friends and also gives us the ability to convert to new types like Vector and presumably even to our own collection types. And of course it gives you the ability to convert to a type which you know only as a type parameter, e.g. to[A] -- nice!
But why weren't the old methods deprecated? Are they faster? Are there cases where toList works but to[List] does not? Should we prefer one over the other where both work?
toList is implemented in TraversableOnce as to[List], so there won't be any noticeable performance difference.
However, toArray is (very slightly) more efficient than to[Array] as the former allocates an array of the right size while the latter first creates an array and then sets the size hint (as it does for every target collection type). This should not make a difference in a real application unless you are converting data to arrays in a tight loop.
The old methods could easily be deprecated, and I bet they will in the future, but people are so used to them that deprecating them right away would probably make some people angry.
One issue seems to be that you cannot use to[] in postfix notation:
scala> Array(1,2) toList
res2: List[Int] = List(1, 2)
scala> Array(1,2) to[List]
<console>:1: error: ';' expected but '[' found.
Array(1,2) to[List]
scala> Array(1,2).to[List]
res3: List[Int] = List(1, 2)

Create an immutable list from a java.util.Iterator

I'm using a library (JXPath) to query a graph of beans in order to extract matching elements. However, JXPath returns groups of matching elements as an instance of java.util.Iterator, and I'd rather convert it into an immutable Scala list. Is there any simpler way of doing this than iterating over the iterator and creating a new immutable list at each iteration step?
You might want to rethink the need for a List. Although it feels very familiar when coming from Java, and List is the default implementation of an immutable Seq, it often isn't the best choice of collection.
The operations that list is optimal for are those already available via an iterator (basically taking consecutive head elements and prepending elements). If an iterator doesn't already give you what you need, then I can pretty much guarantee that a List won't be your best choice - a vector would be more appropriate.
Having got that out the way... The recommended technique to convert between Java and Scala collections (since Scala 2.8.1) is via scala.collection.JavaConverters. This gives you more control than JavaConversions and avoids some possible implicit conflicts.
You won't have a direct implicit conversion this way. Instead, you get asScala and asJava methods pimped onto collections, allowing you to perform the conversions explicitly.
To convert a Java iterator to a Scala iterator:
javaIterator.asScala
To convert a Java iterator to a Scala List (via the scala iterator):
javaIterator.asScala.toList
You may also want to consider converting toSeq instead of toList. In the case of iterators, this'll return a Stream - allowing you to retain the lazy behaviour of iterators within the richer Seq interface.
EDIT:
There's no toVector method, but (as Daniel pointed out) there's a toIndexedSeq method that will return a Vector as the default IndexedSeq subclass (just as List is the default Seq).
javaIterator.asScala.toIndexedSeq
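Put together as a self-contained sketch: the java.util.List below is only a stand-in for whatever JXPath returns, and toScalaList is a hypothetical helper name. The import is the one named above; on Scala 2.13 it still compiles but is deprecated in favour of scala.jdk.CollectionConverters.

```scala
import scala.collection.JavaConverters._

object JavaIteratorSketch {
  // Hypothetical helper: explicit conversion, then materialise as an immutable List.
  def toScalaList[A](javaIterator: java.util.Iterator[A]): List[A] =
    javaIterator.asScala.toList

  def main(args: Array[String]): Unit = {
    // Stand-in for the iterator a Java library would hand back.
    val javaList = java.util.Arrays.asList("a", "b", "c")

    println(toScalaList(javaList.iterator()))         // List(a, b, c)
    // A fresh iterator is needed each time, since iterators are consumed once.
    println(javaList.iterator().asScala.toIndexedSeq) // Vector(a, b, c)
  }
}
```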
EDIT: You should probably look at Kevin Wright's answer, which provides a better solution available since Scala 2.8.1, with less implicit magic.
You can import the implicit conversions from scala.collection.JavaConversions and then create a new Scala collection seamlessly, e.g. like this:
import collection.JavaConversions._
println(List() ++ javaIterator)
Your Java iterator is converted to a Scala iterator by JavaConversions.asScalaIterator. A Scala iterator with elements of type A implements TraversableOnce[A], which is the argument type needed to concatenate collections with ++.
If you need another collection type, just change List() to whatever you need (e.g., IndexedSeq() or collection.mutable.Seq(), etc.).

Difference between MutableList and ListBuffer

What is the difference between Scala's MutableList and ListBuffer classes in scala.collection.mutable? When would you use one vs the other?
My use case is having a linear sequence where I can efficiently remove the first element, prepend, and append. What's the best structure for this?
A little explanation on how they work.
ListBuffer uses internally Nil and :: to build an immutable List and allows constant-time removal of the first and last elements. To do so, it keeps a pointer on the first and last element of the list, and is actually allowed to change the head and tail of the (otherwise immutable) :: class (nice trick allowed by the private[scala] var members of ::). Its toList method returns the normal immutable List in constant time as well, as it can directly return the structure maintained internally. It is also the default builder for immutable Lists (and thus can indeed be reasonably expected to have constant-time append). If you call toList and then again append an element to the buffer, it takes linear time with respect to the current number of elements in the buffer to recreate a new structure, as it must not mutate the exported list any more.
MutableList works internally with LinkedList instead, an (openly, not like ::) mutable linked list implementation which knows of its element and successor (like ::). MutableList also keeps pointers to the first and last element, but toList returns in linear time, as the resulting List is constructed from the LinkedList. Thus, it doesn't need to reinitialize the buffer after a List has been exported.
Given your requirements, I'd say ListBuffer and MutableList are equivalent. If you want to export their internal list at some point, then ask yourself where you want the overhead: when you export the list, with no overhead if you go on mutating the buffer (then go for MutableList), or only if you mutate the buffer again, with none at export time (then go for ListBuffer).
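Whatever the cost model, the observable contract is the same: a List that has been exported must not change when the buffer is mutated afterwards. A minimal sketch with ListBuffer (the object and method names are made up for illustration, and no timing is measured here):

```scala
import scala.collection.mutable.ListBuffer

object ExportSketch {
  // Returns the exported list and the buffer contents after a later append.
  def demo(): (List[Int], List[Int]) = {
    val buf = ListBuffer(1, 2, 3)
    val exported = buf.toList // immutable List handed out to callers
    buf += 4                  // mutate the buffer after exporting
    (exported, buf.toList)
  }

  def main(args: Array[String]): Unit = {
    val (exported, current) = demo()
    println(exported) // List(1, 2, 3)
    println(current)  // List(1, 2, 3, 4)
  }
}
```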
My guess is that in the 2.8 collection overhaul, MutableList predated ListBuffer and the whole Builder system. Actually, MutableList is predominantly useful from within the collection.mutable package: it has a private[mutable] def toLinkedList method which returns in constant time, and can thus efficiently be used as a delegated builder for all structures that maintain a LinkedList internally.
So I'd also recommend ListBuffer, as it may get more attention and optimization in the future than “purely mutable” structures like MutableList and LinkedList.
This gives you an overview of the performance characteristics: http://www.scala-lang.org/docu/files/collections-api/collections.html ; interestingly, MutableList and ListBuffer do not differ there. The documentation of MutableList says it is used internally as base class for Stack and Queue, so maybe ListBuffer is more the official class from the user perspective?
You want a list (why a list?) that is growable and shrinkable, and you want constant append and prepend. Well, Buffer, a trait, has constant append and prepend, with most other operations linear. I'm guessing that ListBuffer, a class that implements Buffer, has constant time removal of the first element.
So, my own recommendation is for ListBuffer.
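For reference, the three operations from the question look like this on a ListBuffer. This is only a usage sketch with made-up names; it says nothing about the running time of each call.

```scala
import scala.collection.mutable.ListBuffer

object QueueLikeSketch {
  // Returns the removed first element and the remaining contents.
  def demo(): (Int, List[Int]) = {
    val buf = ListBuffer(2, 3)
    1 +=: buf                 // prepend
    buf += 4                  // append
    val first = buf.remove(0) // remove and return the first element
    (first, buf.toList)
  }

  def main(args: Array[String]): Unit = {
    val (first, rest) = demo()
    println(first) // 1
    println(rest)  // List(2, 3, 4)
  }
}
```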
First, let's go over some of the relevant types in Scala.
List - An immutable collection with a recursive implementation, i.e. an instance of List has two primary elements, the head and the tail, where the tail references another List.
List[T]
head: T
tail: List[T] //recursive
LinkedList - A mutable collection defined as a series of linked nodes, where each node contains a value and a pointer to the next node.
Node[T]
value: T
next: Node[T] //sequential
LinkedList[T]
first: Node[T]
List is a functional data structure (immutability) compared to LinkedList which is more standard in imperative languages.
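The two shapes can be sketched in Scala roughly like this. All names (MyList, Cons, MyNil, Node, MyLinkedList) are hypothetical and only mirror the outlines above; the standard library classes are implemented differently. The toImmutableList method also shows why going from the node-based form to a List means walking every node.

```scala
object ListShapesSketch {
  // Recursive, immutable shape: a cell holds a head and another list as its tail.
  sealed trait MyList[+T]
  case object MyNil extends MyList[Nothing]
  final case class Cons[+T](head: T, tail: MyList[T]) extends MyList[T]

  // Sequential, mutable shape: nodes linked by a reassignable next pointer (null marks the end).
  final class Node[T](val value: T, var next: Node[T])

  final class MyLinkedList[T] {
    private var first: Node[T] = null

    def prepend(v: T): Unit = { first = new Node(v, first) }

    // Builds an immutable List by visiting each node once.
    def toImmutableList: List[T] = {
      val out = List.newBuilder[T]
      var cur = first
      while (cur != null) {
        out += cur.value
        cur = cur.next
      }
      out.result()
    }
  }

  def main(args: Array[String]): Unit = {
    val recursive: MyList[Int] = Cons(1, Cons(2, MyNil))
    println(recursive)

    val linked = new MyLinkedList[Int]
    linked.prepend(2)
    linked.prepend(1)
    println(linked.toImmutableList) // List(1, 2)
  }
}
```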
Now, let's look at
ListBuffer - A mutable buffer implementation backed by a List.
MutableList - An implementation based on LinkedList (it would have been more self-explanatory had it been named LinkedListBuffer instead)
They both offer similar complexity bounds on most operations.
However, if you request a List from a MutableList, it has to convert the existing linear representation into the recursive representation, which takes O(n), as @Jean-Philippe Pellet points out. But if you request a Seq from a MutableList, the complexity is O(1).
So, IMO the choice narrows down to the specifics of your code and your preference. Though, I suspect there is a lot more List and ListBuffer out there.
Note that ListBuffer is final/sealed, while you can extend MutableList.
Depending on your application, extensibility may be useful.