scala: How to sort Seq with an Ordering object?

From the scala reference:
http://www.scala-lang.org/files/archive/nightly/docs/library/index.html#scala.math.Ordering
case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
// sort by age
object AgeOrdering extends Ordering[Person] {
  def compare(a: Person, b: Person) = a.age compare b.age
}
But if I want to sort a Seq using an Ordering object:
// Want to sort a Seq, or even an IndexedSeq
// (toSeq rather than asInstanceOf: casting an Array to a Seq would fail at runtime)
val ps: collection.Seq[Person] = people.toSeq
// Type not enough arguments
// for method stableSort: (implicit evidence$6: scala.reflect.ClassTag[Person],
// implicit evidence$7: scala.math.Ordering[Person])Array[Person]. Unspecified value
// parameter evidence$7.
Sorting.stableSort(ps)(AgeOrdering)
The compiler chokes. But the stableSort API says it will take a Seq. So why does the above fail?

In Sorting, the quickSort implementation works in place, using a series of index-based moves that are only suitable for an Array (a Seq offers neither in-place updates nor guaranteed constant-time indexing).
quickSort[K](a: Array[K] ..): Unit
The quickSort method performs side-effects on the input and returns Unit.
The stableSort method, however, has two different forms: one that takes a Seq and one that takes an Array.
stableSort[K](a: Seq[K] ..): Array[K]
stableSort[K](a: Array[K] ..): Unit
The form that takes an Array also returns Unit and modifies the input (like quickSort, but ensuring stability), while the form that takes a Seq returns a new Array object (and the input sequence is not modified).
The error occurs because the second parameter group of quickSort is (implicit arg0: math.Ordering[K]), while that of stableSort is (implicit arg0: ClassTag[K], arg1: math.Ordering[K]); that is, stableSort also requires a ClassTag[K] (here, ClassTag[Person]).
Even though the ordering was specified, no ClassTag[K] argument is specified, and Scala needs this to create the Array[Person] that is used internally (and returned). Due to Scala being limited by Java's type erasure, the method doesn't get to know what type K really is without additional help.
The ClassTag is not resolved implicitly here because an implicit parameter list is all-or-nothing: once one of its arguments is passed explicitly, every argument in that list must be. It's fairly easy to pass both in explicitly:
import scala.reflect.classTag
import scala.util.Sorting
val results = Sorting.stableSort(ps)(classTag[Person], AgeOrdering)
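As a minimal sketch of the fix above, assuming Scala 2.12 or 2.13 (Scala 3 expects `using` when context arguments are passed explicitly), the following can be pasted into the REPL or run as a script. It reuses Person and AgeOrdering from the question; the names byStableSort and bySorted are introduced here only for illustration. The second form uses sorted, which needs only an Ordering and therefore no ClassTag.

```scala
import scala.reflect.classTag
import scala.util.Sorting

case class Person(name: String, age: Int)

// Orders people by age only.
object AgeOrdering extends Ordering[Person] {
  def compare(a: Person, b: Person): Int = a.age compare b.age
}

val ps: collection.Seq[Person] =
  List(Person("bob", 30), Person("ann", 32), Person("carl", 19))

// Both arguments of the implicit parameter list are supplied explicitly.
// The Seq overload returns a new Array and leaves ps untouched.
val byStableSort: Array[Person] =
  Sorting.stableSort(ps)(classTag[Person], AgeOrdering)

// Alternative that stays within the collection API: only an Ordering is needed.
val bySorted: collection.Seq[Person] = ps.sorted(AgeOrdering)
```

Another option, not shown here, is to declare the ordering as an implicit value so that neither argument has to be written out.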
Some rationale for, and uses of, TypeTag/ClassTag are discussed in TypeTags and Manifests.
Also see Scala: What is a TypeTag and how do I use it?

Related

What happens when you create a Seq object with Seq(1,2,3)?

What exactly happens when you evaluate the expression: Seq(1,2,3)?
I am new to Scala and I am now a bit confused about the various collection types. Seq is a trait, right? So when you call it like this: Seq(1,2,3), it must be some kind of a companion object? Or not? Is it some kind of a class that extends Seq? And most importantly, what is the type of the returned value? Is it Seq and if yes, why is it not explicitly the extension class instead?
Also in the REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int]. Why is it not an IndexedSeq collection type, like Vector? What is the logic behind all that?

What exactly happens when you evaluate expression: Seq(1,2,3)?
In Scala, foo(bar) is syntactic sugar for foo.apply(bar), unless this also has a method named foo, in which case it is a method call on the implicit this receiver, i.e. just like Java, it is then equivalent to this.foo(bar).
Just like any other OO language, the receiver of a method call alone decides what to do with that call, so in this case, Seq decides what to do.
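To make the apply rewriting concrete, here is a small sketch. Doubler is a hypothetical object defined only for this illustration; the last two lines show the same rewriting applied to the standard library's Seq object.

```scala
// Hypothetical object, defined only to illustrate the apply sugar.
object Doubler {
  def apply(x: Int): Int = x * 2
}

val sugared  = Doubler(21)        // the compiler rewrites this to Doubler.apply(21)
val explicit = Doubler.apply(21)  // the same call, written out

// The same sugar on the Seq object from the standard library.
val viaSugar: Seq[Int] = Seq(1, 2, 3)
val viaApply: Seq[Int] = Seq.apply(1, 2, 3)
```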
Seq is a trait, right?
There are two Seqs in the standard library:
The trait Seq, which is a type.
The object Seq, which is a value.
So when you call it like that Seq(1,2,3) it must be some kind of a companion object? Or not?
Yes, it must be an object, since you can only call methods on objects. You cannot call methods on types, therefore, when you see a method call, it must be an object. Always. So, in this case, Seq cannot possibly be the Seq trait, it must be the Seq object.
Note that "it must be some kind of a companion object" is not true. The only thing you can see from that piece of code is that Seq is an object. You cannot know from that piece of code whether it is a companion object. For that, you would have to look at the source code. In this particular case, it turns out that it is, in fact, a companion object, but you cannot conclude that from the code you showed.
Is it some kind of a class that extends Seq?
No. It cannot possibly be a class, since you can only call methods on objects, and classes are not objects in Scala. (This is not like Ruby or Smalltalk, where classes are also objects and instances of the Class class.) It must be an object.
And most importantly what is the type of the returned value?
The easiest way to find that out is to simply look at the documentation for Seq.apply:
def apply[A](elems: A*): Seq[A]
Creates a collection with the specified elements.
A: the type of the collection's elements
elems: the elements of the created collection
returns a new collection with elements elems
So, as you can see, the return type of Seq.apply is Seq, or more precisely, Seq[A], where A is a type variable denoting the type of the elements of the collection.
Is it Seq and if yes, why is it not explicitly the extension class instead?
Because there is no extension class.
Also, the standard design pattern in Scala is that the apply method of a companion object returns an instance of the companion class or trait. It would be weird and surprising to break this convention.
Also in REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int].
The static type is Seq[Int]. That is all you need to know. That is all you can know.
Now, Seq is a trait, and traits cannot be instantiated, so the runtime type will be some subclass of Seq. But you cannot, and must not, care what specific runtime type it is.
Why is it not an IndexedSeq collection type, like Vector? What is the logic behind all that?
How do you know it is not going to return a Vector the next time you call it? It wouldn't matter one bit, since the static type is Seq and thus you are only allowed to call Seq methods on it, and you are only allowed to rely on the contract of Seq, i.e. Seq's post-conditions, invariants, etc. anyway. Even if you knew it was a Vector that is returned, you wouldn't be able to do anything with this knowledge.
Thus, Seq.apply returns the simplest thing it can possibly return, and that is a List.

Seq is a val defined in the scala package object:
package object scala {
  ...
  val Seq = scala.collection.Seq
  ...
}
It points to the object scala.collection.Seq:
/** $factoryInfo
 *  The current default implementation of a $Coll is a `List`.
 *  @define coll sequence
 *  @define Coll `Seq`
 */
object Seq extends SeqFactory[Seq] {
  /** $genericCanBuildFromInfo */
  implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Seq[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]
  def newBuilder[A]: Builder[A, Seq[A]] = immutable.Seq.newBuilder[A]
}
When you write Seq(1,2,3), the apply() method that is invoked is the one inherited from the abstract class scala.collection.generic.GenericCompanion:
/** A template class for companion objects of "regular" collection classes
 *  represent an unconstrained higher-kinded type. Typically
 *  such classes inherit from trait `GenericTraversableTemplate`.
 *  @tparam CC The type constructor representing the collection class.
 *  @see [[scala.collection.generic.GenericTraversableTemplate]]
 *  @author Martin Odersky
 *  @since 2.8
 *  @define coll collection
 *  @define Coll `CC`
 */
abstract class GenericCompanion[+CC[X] <: GenTraversable[X]] {
  ...
  /** Creates a $coll with the specified elements.
   *  @tparam A the type of the ${coll}'s elements
   *  @param elems the elements of the created $coll
   *  @return a new $coll with elements `elems`
   */
  def apply[A](elems: A*): CC[A] = {
    if (elems.isEmpty) empty[A]
    else {
      val b = newBuilder[A]
      b ++= elems
      b.result()
    }
  }
}
Finally, this method builds an object of type Seq using the builder returned by newBuilder in the code shown above.
And most importantly what is the type of the returned value?
object MainClass {
  def main(args: Array[String]): Unit = {
    val isList = Seq(1,2,3).isInstanceOf[List[Int]]
    println(isList)
  }
}
prints:
true
So the value created at runtime is a scala.collection.immutable.List.
Also in REPL I see that the contents of the evaluated expression is actually a List(1,2,3), but the type is apparently Seq[Int].
As the code above shows, the default implementation of Seq is List.
Why is it not an IndexedSeq collection type, like Vector? What is the logic behind all that?
Because of the immutable design. List is an immutable linked list, which gives it a constant-time prepend operation but an O(n) append and an O(n) cost for accessing the n-th element. Vector offers effectively constant-time access and update by index, as well as efficient prepend and append operations.
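The trade-off described above can be sketched with a few standard-library calls. The value names are introduced here only for illustration, and the snippet shows behaviour rather than measuring performance.

```scala
val list = List(1, 2, 3)
val vec  = Vector(1, 2, 3)

// Prepending to a List allocates one new cell and reuses the old list as its tail.
val prepended  = 0 :: list
val sharesTail = prepended.tail eq list // reference equality: the tail is the original list

// Vector supports indexed access, append and update without a linear walk.
val second   = vec(1)
val appended = vec :+ 4
val updated  = vec.updated(1, 9) // returns a new Vector; vec itself is unchanged
```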
To have a better understanding of how the List is designed in Scala, see https://mauricio.github.io/2013/11/25/learning-scala-by-building-scala-lists.html

Demystifying a function definition

I am new to Scala, and I hope this question is not too basic. I couldn't find the answer to this question on the web (which might be because I don't know the relevant keywords).
I am trying to understand the following definition:
def functionName[T <: AnyRef](name: Symbol)(range: String*)(f: T => String)(implicit tag: ClassTag[T]): DiscreteAttribute[T] = {
  val r = ....
  new anotherFunctionName[T](name.toString, f, Some(r))
}
First, why is it defined as def functionName[...](...)(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
Second, how does range: String* differ from range: String?
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?

First, why is it defined as def functionName[...](...)(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
One good reason to use currying is to support type inference. Consider these two functions:
def pred1[A](x: A, f: A => Boolean): Boolean = f(x)
def pred2[A](x: A)(f: A => Boolean): Boolean = f(x)
Since type information flows from left to right, if you try to call pred1 like this:
pred1(1, x => x > 0)
the type of x => x > 0 cannot be determined yet, and you'll get an error:
<console>:22: error: missing parameter type
pred1(1, x => x > 0)
^
To make it work, you have to specify the argument type of the anonymous function:
pred1(1, (x: Int) => x > 0)
pred2, on the other hand, can be used without specifying the argument type:
pred2(1)(x => x > 0)
or simply:
pred2(1)(_ > 0)
Second, how does range: String* differ from range: String?
It is the syntax for defining repeated parameters, a.k.a. varargs. Ignoring other differences, it can be used only in the last position of a parameter list, and inside the method it is available as a scala.Seq (here scala.Seq[String]). A typical use is the apply method of collection types, which allows syntax like SomeDummyCollection(1, 2, 3). For more, see:
What does `:_*` (colon underscore star) do in Scala?
Scala variadic functions and Seq
Is there a difference in Scala between Seq[T] and T*?
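A short sketch of repeated parameters follows. The method joinAll is hypothetical and exists only for this example; it shows that the arguments arrive as a Seq[String], and that an existing sequence can be passed with the `: _*` ascription mentioned in the first link.

```scala
// Hypothetical method: `parts` is a repeated parameter and must come last.
def joinAll(sep: String, parts: String*): String =
  parts.mkString(sep) // inside the method, parts is a Seq[String]

val direct   = joinAll("-", "a", "b", "c") // any number of arguments
val words    = Seq("x", "y")
val expanded = joinAll("-", words: _*)     // pass an existing Seq as the varargs
val none     = joinAll("-")                // zero arguments are allowed as well
```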
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?
As already stated by Aivean, it shouldn't be a problem here. ClassTags are generated automatically by the compiler and should be available as long as the class exists. In the general case, if an implicit argument cannot be found, you'll get an error:
scala> import scala.concurrent._
import scala.concurrent._
scala> val answer: Future[Int] = Future(42)
<console>:13: error: Cannot find an implicit ExecutionContext. You might pass
an (implicit ec: ExecutionContext) parameter to your method
or import scala.concurrent.ExecutionContext.Implicits.global.
val answer: Future[Int] = Future(42)

Multiple argument lists: this is called "currying", and enables you to call a function with only some of the arguments, yielding a function that takes the rest of the arguments and produces the result type (partial function application). Here is a link to Scala documentation that gives an example of using this. Further, any implicit arguments to a function must be specified together in one argument list, coming after any other argument lists. While defining functions this way is not necessary (apart from any implicit arguments), this style of function definition can sometimes make it clearer how the function is expected to be used, and/or make the syntax for partial application look more natural (f(x) rather than f(x, _)).
Arguments with an asterisk: "varargs". This syntax denotes that rather than a single argument being expected, a variable number of arguments can be passed in, which will be handled as (in this case) a Seq[String]. It is the equivalent of specifying (String... range) in Java.
The implicit ClassTag: this is often needed when the method has to work with the concrete runtime class of a type parameter (T here), for example to create an Array[T]. Since Scala runs on the JVM, which erases generic type arguments at runtime, the ClassTag is the work-around Scala uses to keep information about the type(s) involved available at runtime.
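As an illustration of why such a tag is requested, here is a minimal sketch. The method replicate is hypothetical and not part of the code in the question; it uses the ClassTag to allocate an array whose element class is only known at the call site.

```scala
import scala.reflect.ClassTag

// Hypothetical helper: builds an Array[T] holding n copies of value.
// Without the ClassTag the method could not know which array class to allocate.
def replicate[T](n: Int, value: T)(implicit tag: ClassTag[T]): Array[T] = {
  val arr = tag.newArray(n)
  var i = 0
  while (i < n) {
    arr(i) = value
    i += 1
  }
  arr
}

// The compiler supplies ClassTag[String] and ClassTag[Int] at these call sites.
val strings = replicate(2, "hi")
val ints    = replicate(3, 7)
```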

Check currying: methods may define multiple parameter lists. When a method is called with fewer parameter lists than it declares, this yields a function taking the missing parameter lists as its arguments.
range: String* is the syntax for varargs.
The implicit ClassTag parameter in Scala is the counterpart of a Class<T> clazz parameter in Java. It will always be available if your class is defined in scope. Read more about type tags.
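The point about multiple parameter lists can be sketched as follows. The method scale is hypothetical; supplying only its first parameter list where a function is expected yields a function of the remaining parameter.

```scala
// Hypothetical method with two parameter lists.
def scale(factor: Int)(x: Int): Int = factor * x

// Only the first list is supplied; because a function type is expected,
// the compiler turns the rest of the method into an Int => Int.
val triple: Int => Int = scale(3)

val mapped = List(1, 2, 3).map(triple)
val direct = scale(3)(4) // supplying both lists calls the method directly
```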

Understanding Scala's flatmap type conversions

The docs for List state:
The type of the resulting collection is guided by the static type of list. This might cause unexpected results sometimes. For example:
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words: Seq[String]) = words flatMap (word => word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words: Seq[String]) = words.toSet flatMap (word => word.toSeq)
I'm having a hard time understanding this. StringOps.toSet returns a Set of Char, so the first example ends up returning a Char Seq - fine. That makes sense. What I don't follow is why in the second example Scala constructs a Set instead of a Seq.
What exactly does "the resulting collection is guided by the static type of list" mean here?

This happens because of the canBuildFrom method defined in the Set companion object. As you can see in the ScalaDoc for the CanBuildFrom trait, it has three type parameters, CanBuildFrom[-From, -Elem, +To], where:
From - the type of the underlying collection that requests a builder to be created.
Elem - the element type of the collection to be created.
To - the type of the collection to be created.
Basically, when you call flatMap on the set, it implicitly uses Set.canBuildFrom[Char], which supplies a builder for a Set[Char].
As for the static type: when Scala converts between collection types it uses this CanBuildFrom trait, and the instance is selected at compile time from the static type of your collection.
Updated for the comment
If we add -Xprint:typer to the scala command, we can see how the Scala compiler, after the typer phase, resolves the implicit method Set.canBuildFrom[Char], which is then used in the flatMap call:
def lettersOf(words: Seq[String]): scala.collection.immutable.Set[Char] = words.toSet[String].flatMap[Char, scala.collection.immutable.Set[Char]](((word: String) => scala.this.Predef.augmentString(word).toSeq))(immutable.this.Set.canBuildFrom[Char])
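The two variants from the docs can be tried side by side. This is a sketch with renamed methods (lettersAsSeq and lettersAsSet are names chosen here so both can coexist), and the toSet step is bound to a val first to keep type inference simple. Note that CanBuildFrom exists only up to Scala 2.12; in 2.13 the mechanism is different, but the result types below are the same.

```scala
// flatMap on a Seq: the result is a Seq, so repeated letters are kept.
def lettersAsSeq(words: Seq[String]) = words.flatMap(word => word.toSet)

// flatMap on a Set: the result is a Set, so each letter appears once.
def lettersAsSet(words: Seq[String]) = {
  val distinctWords: Set[String] = words.toSet
  distinctWords.flatMap(word => word.toSeq)
}

// The type ascriptions document the static result types.
val asSeq: Seq[Char] = lettersAsSeq(Seq("aa", "ab"))
val asSet: Set[Char] = lettersAsSet(Seq("aa", "ab"))
```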

Calling map on a parallel collection via a reference to an ancestor type

I tried to make it optional to run a map operation sequentially or in parallel, for example using the following code:
val runParallel = true
val theList = List(1,2,3,4,5)
(if(runParallel) theList.par else theList) map println //Doesn't run in parallel
What I noticed is that the 'map' operation does not run in parallel as I'd expected. Although without the conditional, it would:
theList.par map println //Runs in parallel as visible in the output
The type of the expression (if(runParallel) theList else theList.par), which I expect to be the closest common ancestor of the types of theList and theList.par, is a scary type that I won't paste here, but it's interesting to look at (via the Scala console):
:type (if(true) theList else theList.par)
Why doesn't the map on the parallel collection work in parallel?
UPDATE: This is discussed in SI-4843 but from the JIRA ticket it's not clear why this was happening on Scala 2.9.x.

The explanation of why it happens is a long story: in Scala 2.9.x (I don't know about the other versions), collection methods such as filter or map rely on the CanBuildFrom mechanism. The idea is that you have an implicit parameter which is used to create a builder for the new collection:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
  val b = bf(repr)
  b.sizeHint(this)
  for (x <- this) b += f(x)
  b.result
}
Thanks to this mechanism, the map method is defined only in the TraversableLike trait, and its subclasses do not need to override it. As you can see, the signature of map has several type parameters. Let's look at the trivial ones:
The B which is the type of the elements of the new collection
The A is the type of the elements in the source collection
Let's look at the more complicated ones:
That is the type of the new collection, which can be different from the current type. A classic example is mapping a BitSet with toString:
scala> val a = BitSet(1,3,5)
a: scala.collection.immutable.BitSet = BitSet(1, 3, 5)
scala> a.map {_.toString}
res2: scala.collection.immutable.Set[java.lang.String] = Set(1, 3, 5)
Since it is illegal to create a BitSet[String], your map result will be a Set[String].
Finally Repr is the type of the current collection. When you try to map a collection over a function, the compiler will resolve a suitable CanBuildFrom using the type parameters.
As one would expect, the map method is overridden for parallel collections in ParIterableLike as follows:
def map[S, That](f: T => S)(implicit bf: CanBuildFrom[Repr, S, That]): That = bf ifParallel { pbf =>
  executeAndWaitResult(new Map[S, That](f, pbf, splitter) mapResult { _.result })
} otherwise seq.map(f)(bf2seq(bf))
As you can see, the method has the same signature, but it uses a different approach: it tests whether the provided CanBuildFrom is parallel, and otherwise falls back on the default implementation.
Therefore, Scala parallel collections use special CanBuildFrom (parallel ones) which create parallel builders for the map methods.
However, what happens when you do
(if(runParallel) theList.par else theList) map println //Doesn't run in parallel
is the map method gets executed on the result of
(if(runParallel) theList.par else theList)
whose static type is the first common ancestor of both classes (in this case just a number of traits mixed together). Since it is a common ancestor, its type parameter Repr will be some kind of common ancestor of both collections' representations; let's call it Repr1.
Conclusion
When you call the map method, the compiler has to find a suitable CanBuildFrom[Repr, B, That] for the operation. Since our Repr1 is not that of a parallel collection, there won't be any CanBuildFrom[Repr1, B, That] capable of providing a parallel builder. This is actually correct behaviour with respect to the implementation of Scala collections: if the behaviour were different, every map over a non-parallel collection would run in parallel as well.
The point here is that, given how Scala collections are designed in 2.9.x, there is no alternative. If the compiler does not provide a CanBuildFrom for a parallel collection, the map won't be parallel.

scala implicit or explicit conversion from iterator to iterable

Does Scala provide a built-in class, utility, syntax, or other mechanism for converting (by wrapping) an Iterator with an Iterable?
For example, I have an Iterator[Foo] and I need an Iterable[Foo], so currently I am:
val foo1: Iterator[Foo] = ....
val foo2: Iterable[Foo] = new Iterable[Foo] {
  def elements = foo1
}
This seems ugly and unnecessary. What's a better way?

Iterator has a toIterable method in Scala 2.8.0, but not in 2.7.7 or earlier. It's not implicit, but you could define your own implicit conversion if you need one.

You should be very careful about ever implicitly converting an Iterator into an Iterable (I normally use Iterator.toList - explicitly). The reason for this is that, by passing the result into a method (or function) which expects an Iterable, you lose control of it to the extent that your program might be broken. Here's one example:
def printTwice(itr : Iterable[String]) : Unit = {
  itr.foreach(println(_))
  itr.foreach(println(_))
}
If an Iterator were somehow implicitly convertible into an Iterable, what would the following print?
printTwice(Iterator.single("Hello"))
It will (of course) only print Hello once. Very recently, the trait TraversableOnce has been added to the collections library, which unifies Iterator and Iterable. To my mind, this is arguably a mistake.
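The single-pass problem can be reproduced with a thin wrapper. This is a sketch assuming Scala 2.8 or later, where Iterable's abstract method is iterator; wrap and countPasses are hypothetical helpers written for this example, with countPasses playing the role of printTwice but counting elements instead of printing them.

```scala
// Hypothetical wrapper: every call to iterator hands back the same underlying Iterator.
def wrap(it: Iterator[String]): Iterable[String] = new Iterable[String] {
  def iterator: Iterator[String] = it
}

// Traverses the Iterable twice and reports how many elements each pass saw.
def countPasses(items: Iterable[String]): (Int, Int) = {
  var first = 0
  items.foreach(_ => first += 1)
  var second = 0
  items.foreach(_ => second += 1)
  (first, second)
}

// The wrapped Iterator is exhausted after the first pass.
val viaWrapper = countPasses(wrap(Iterator("a", "b")))

// Materialising the Iterator explicitly gives a collection that can be re-traversed.
val viaList = countPasses(Iterator("a", "b").toList)
```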
My personal preference is to use Iterator explicitly wherever possible and then use List, Set or IndexedSeq directly. I have found that I can rarely write a method which is genuinely agnostic of the type it is passed. One example:
def foo(trades: Iterable[Trade]) {
log.info("Processing %d trades", trades.toList.length) //hmmm, converted to a List
val shorts = trades.filter(_.side.isSellShort)
log.info("Found %d sell-short", shorts.toList.length) //hmmm, converted to a List again
//etc