Create an immutable list from a java.lang.Iterator - scala

I'm using a library (JXPath) to query a graph of beans in order to extract matching elements. However, JXPath returns groups of matching elements as an instance of java.lang.Iterator and I'd rather like to convert it into an immutable scala list. Is there any simpler way of doing than iterating over the iterator and creating a new immutable list at each iteration step ?

You might want to rethink the need for a List, although it feels very familiar when coming from Java, and List is the default implementation of an immutable Seq, it often isn't the best choice of collection.
The operations that list is optimal for are those already available via an iterator (basically taking consecutive head elements and prepending elements). If an iterator doesn't already give you what you need, then I can pretty much guarantee that a List won't be your best choice - a vector would be more appropriate.
Having got that out the way... The recommended technique to convert between Java and Scala collections (since Scala 2.8.1) is via scala.collection.JavaConverters. This gives you more control than JavaConversions and avoids some possible implicit conflicts.
You won't have a direct implicit conversion this way. Instead, you get asScala and asJava methods pimped onto collections, allowing you to perform the conversions explicitly.
To convert a Java iterator to a Scala iterator:
javaIterator.asScala
To convert a Java iterator to a Scala List (via the scala iterator):
javaIterator.asScala.toList
You may also want to consider converting toSeq instead of toList. In the case of iterators, this'll return a Stream - allowing you to retain the lazy behaviour of iterators within the richer Seq interface.
EDIT:
There's no toVector method, but (as Daniel pointed out) there's a toIndexedSeq method that will return a Vector as the default IndexedSeq subclass (just as List is the default Seq).
javaIterator.asScala.toIndexedSeq

EDIT: You should probably look at Kevin Wright's answer, which provides a better solution available since Scala 2.8.1, with less implicit magic.
You can import the implicit conversions from scala.collection.JavaConversions and then create a new Scala collection seamlessly, e.g. like this:
import collection.JavaConversions._
println(List() ++ javaIterator)
Your Java iterator is converted to a Scala iterator by JavaConversions.asScalaIterator. A Scala iterator with elements of type A implements TraversableOnce[A], which is the argument type needed to concatenate collections with ++.
If you need another collection type, just change List() to whatever you need (e.g., IndexedSeq() or collection.mutable.Seq(), etc.).

Related

How to find max date from stream in scala using CompareTo?

I am new to Scala and trying to explore how I can use Java functionalities with Scala.
I am having stream of LocalDate which is a Java class and I am trying to find maximum date out of my list.
var processedResult : Stream[LocalDate] =List(javaList)
.toStream
.map { s => {
//some processing
LocalDate.parse(str, formatter)
}
}
I know we can do easily by using .compare() and .compareTo() in Java but I am not sure how do I use the same thing over here.
Also, I have no idea how Ordering works in Scala when it comes to sorting.
Can anyone suggest how can get this done?
First of all, a lot of minor details that I will point out since it seems you are pretty new to the language and I expect those to help you with your learning path.
First, avoid var at all costs, especially when learning.
While mutability has its place and is not always wrong, forcing you to avoid it while learning will help you. Particularly, avoid it when it doesn't provide any value; like in this case.
Second, this List(javaList) doesn't do what you think it does. It creates a single element Scala List whose unique element is a Java List. What you probably want is to transform that Java List into a Scala one, for that you can use the CollectionConverters.
import scala.jdk.CollectionConverters._ // This works if you are in 2.13
// if you are in 2.12 or lower use: import scala.collection.JavaConverters._
val scalaList = javaList.asScala.toList
Third, not sure why you want to use a Scala Stream, a Stream is for infinite or very large collections where you want all the transformations to be made lazily and only produce elements as they are consumed (also, btw, it was deprecated in 2.13 in favour of LazyList).
Maybe, you are confused because in Java you need a "Stream" to apply functional operations like map? If so, note that in Scala all collections provide the same rich API.
Fourth, Ordering is a Typeclass which is a functional pattern for Polymorphism. On its own, this is a very broad question so I won't answer it here, but I hope the two links provide insight.
The TL;DR; is simple, it is just that an Ordering for a type T knows how to order (sort) elements of type T. Thus operations like max will work for any collection of any type if, and only if, the compiler can prove the existence of an Ordering for that type if it can then it will pass such value implicitly to the method call for you; again the implicits topic is very broad and deserves its own question.
Now for your particular question, you can just call max or maxOption in the List or Stream and that is all.
Note that max will throw if the List is empty, whereas maxOption returns an Option which will be empty (None) for an empty input; idiomatic Scala favour the latter over the former.
If you really want to use compareTo then you can provide your own Ordering.
scalaList.maxOption(Ordering.fromLessThan[LocalDate]((d1, d2) => d1.compareTo(d2) < 0))
Ordering[A] is a type class which defines how to compare 2 elements of type A. So to compare LocalDates you need Ordering[LocalDate] instance.
LocalDate extends Comparable in Java and Scala conveniently provides instances for Comparables so when you invoke:
Ordering[java.time.LocalDate]
in REPL you'll see that Scala is able to provide you the instance without you needing to do anything (you could take a look at the list of methods provided by this typeclass).
Since you have and Ordering in implicit scope which types matches the Stream's type (e.g. Stream[LocalDate] needs Ordering[LocalDate]) you can call .max method... and that's it.
val processedResult : Stream[LocalDate] = ...
val newestDate: LocalDate = processedResult.max

why scala mutable.LinearSeq can not do +=:

mutable.MutableList[A] can do the inline prepend with its +=:, but it's trait mutable.LinearSeq can not do this. I think the whole purpose of mutable package is for doing inline update in the collection, so why mutable.LinearSeq can not prepend? and what's diff between ListBuffer and MutableList?
Why mutable.LinearSeq can not prepend?
Prepend (+=:) isn't defined on mutable.LinearSeq, it's defined on the BufferLike trait, which ListBuffer extends. MutableList doesn't implement the Buffer trait, but did choose to provide a prepend method.
A mutable collection is meant for you to be able to mutate it without allocating a new collection, and that is what mutable stands for. Most mutable collection will expose an += method to append values, but it isn't necessary for all to be able to prepend elements in to their underlying collection, that is implementation specific. Depending on the implementation, += can either choose to append an element or prepend it to the underlying collection, depending on how it's storing its values.
What's diff between ListBuffer and MutableList?
One of the main differences between ListBuffer[A] and MutableList[A] is that the former uses lists as the underlying implementation and has O(1) for creating a List[A] from the buffer where the latter uses a LinkedList[A] for the underlying implementation and is the actual basis for the Queue[A] implementation in Scala.

Why are there no methods toVector (like toList, toArray) in standard collection types?

If we are supposed to use Vector as the default Sequence type, why are there no methods toVector (like toList, toArray) in standard collection types?
In prototyping stage, is it okay to conform all collections to Seq type and use toSeq on all collection-returns (cast everything to Seq)?
Normally you should be more concerned with the interface that the collection implements rather than its concrete type, i.e. you should think in terms of Seq, LinearSeq, and IndexedSeq rather than List, Array, and Vector, which are concrete implementations. So arguably there shouldn't be toList and toArray either, but I guess they're there because they're so fundamental.
The toIndexedSeq method in practice provides you with a Vector, unless a collection overrides this in order to provide a more efficient implementation. You can also make a Vector with Vector() ++ c where c is your collection.
Scala 2.10 will come with a method .to[C[_]] so that you can write .to[List], .to[Array], .to[Vector], or any other compatible C.
Scala 2.10 adds not only .to[Vector], but .toVector, as well:
In TraversableOnce trait inherited by collections and iterators:
abstract def toVector: Vector[A]

Difference between MutableList and ListBuffer

What is the difference between Scala's MutableList and ListBuffer classes in scala.collection.mutable? When would you use one vs the other?
My use case is having a linear sequence where I can efficiently remove the first element, prepend, and append. What's the best structure for this?
A little explanation on how they work.
ListBuffer uses internally Nil and :: to build an immutable List and allows constant-time removal of the first and last elements. To do so, it keeps a pointer on the first and last element of the list, and is actually allowed to change the head and tail of the (otherwise immutable) :: class (nice trick allowed by the private[scala] var members of ::). Its toList method returns the normal immutable List in constant time as well, as it can directly return the structure maintained internally. It is also the default builder for immutable Lists (and thus can indeed be reasonably expected to have constant-time append). If you call toList and then again append an element to the buffer, it takes linear time with respect to the current number of elements in the buffer to recreate a new structure, as it must not mutate the exported list any more.
MutableList works internally with LinkedList instead, an (openly, not like ::) mutable linked list implementation which knows of its element and successor (like ::). MutableList also keeps pointers to the first and last element, but toList returns in linear time, as the resulting List is constructed from the LinkedList. Thus, it doesn't need to reinitialize the buffer after a List has been exported.
Given your requirements, I'd say ListBuffer and MutableList are equivalent. If you want to export their internal list at some point, then ask yourself where you want the overhead: when you export the list, and then no overhead if you go on mutating buffer (then go for MutableList), or only if you mutable the buffer again, and none at export time (then go for ListBuffer).
My guess is that in the 2.8 collection overhaul, MutableList predated ListBuffer and the whole Builder system. Actually, MutableList is predominantly useful from within the collection.mutable package: it has a private[mutable] def toLinkedList method which returns in constant time, and can thus efficiently be used as a delegated builder for all structures that maintain a LinkedList internally.
So I'd also recommend ListBuffer, as it may also get attention and optimization in the future than “purely mutable” structures like MutableList and LinkedList.
This gives you an overview of the performance characteristics: http://www.scala-lang.org/docu/files/collections-api/collections.html ; interestingly, MutableList and ListBuffer do not differ there. The documentation of MutableList says it is used internally as base class for Stack and Queue, so maybe ListBuffer is more the official class from the user perspective?
You want a list (why a list?) that is growable and shrinkable, and you want constant append and prepend. Well, Buffer, a trait, has constant append and prepend, with most other operations linear. I'm guessing that ListBuffer, a class that implements Buffer, has constant time removal of the first element.
So, my own recommendation is for ListBuffer.
First, lets go over some of the relevant types in Scala
List - An Immutable collection. A Recursive implementation i.e . i.e An instance of list has two primary elements the head and the tail, where the tail references another List.
List[T]
head: T
tail: List[T] //recursive
LinkedList - A mutable collection defined as a series of linked nodes, where each node contains a value and a pointer to the next node.
Node[T]
value: T
next: Node[T] //sequential
LinkedList[T]
first: Node[T]
List is a functional data structure (immutability) compared to LinkedList which is more standard in imperative languages.
Now, lets look at
ListBuffer - A mutable buffer implementation backed by a List.
MutableList - An implementation based on LinkedList ( Would have been more self explanatory if it had been named LinkedListBuffer instead )
They both offer similar complexity bounds on most operations.
However, if you request a List from a MutableList, then it has to convert the existing linear representation into the recursive representation which takes O(n) which is what #Jean-Philippe Pellet points out. But, if you request a Seq from MutableList the complexity is O(1).
So, IMO the choice narrows down to the specifics of your code and your preference. Though, I suspect there is a lot more List and ListBuffer out there.
Note that ListBuffer is final/sealed, while you can extend MutableList.
Depending on your application, extensibility may be useful.

Where does Array get its toList method

Through searches, I understand the way (or, a way) to convert an Array to a List is like so:
val l = Array(1, 2, 3).toList
But not only can I not find the toList method in Array's API docs, I can't find it in anything that seems to be an ancestor or inherited trait of Array.
Using the newer 2.9 API docs, I see that toList exists in these things:
ImmutableMapAdaptor ImmutableSetAdaptor IntMap List ListBuffer LongMap
MutableList Option ParIterableLike PriorityQueue Stack StackProxy
StreamIterator SynchronizedSet SynchronizedStack TraversableForwarder
TraversableOnce TraversableOnceMethods TraversableProxyLike
But I can't understand how toList gets from one of these to be part of Array. Can anyone explain this?
toList and similar methods not natively found on Java arrays (including our old favourites, map, flatMap, filter etc.) come from s.c.m.ArrayOps, which arrays acquire via implicit conversions in scala.Predef. Look for implicit methods whose names end with ArrayOps and you'll see where the magic comes from.