Understanding Scala's flatmap type conversions - scala

The docs for List state:
The type of the resulting collection is guided by the static type of list. This might cause unexpected results sometimes. For example:
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words: Seq[String]) = words flatMap (word => word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words: Seq[String]) = words.toSet flatMap (word => word.toSeq)
I'm having a hard time understanding this. StringOps.toSet returns a Set of Char, so the first example ends up returning a Char Seq - fine. That makes sense. What I don't follow is why in the second example Scala constructs a Set instead of a Seq.
What exactly does "the resulting collection is guided by the static type of list" mean here?

Because of canBuildFrom method defined in Set class. As you can see in the ScalaDoc's CanBuildFrom trait it has thee type parameters CanBuildFrom[-From, -Elem, +To] where:
From - the type of the underlying collection that requests a builder to be created.
Elem - the element type of the collection to be created.
To - the type of the collection to be created.
Basiclly when you calling your flatMap function on the set it implicitly calls Set.canBuildFrom[Char] which return a Set[Char]
As for the static type. When Scala is tring to convert between collection types it uses this CanBuildFrom trait, which depends on the static type of your collection.
Updated for the comment
If we add -Xprint:typer to the scala command, we can see how Scala compiler after the typer phase resolves implicit method Set.canBuildFrom[Char] which is used to in flatMap method
def lettersOf(words: Seq[String]): scala.collection.immutable.Set[Char] = words.toSet[String].flatMap[Char, scala.collection.immutable.Set[Char]](((word: String) => scala.this.Predef.augmentString(word).toSeq))(immutable.this.Set.canBuildFrom[Char])

Related

Scala: list to set using flatMap

I have class with field of type Set[String]. Also, I have list of objects of this class. I'd like to collect all strings from all sets of these objects into one set. Here is how I can do it already:
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
list.flatMap(_.field).toSet // Set(123, 456, 798)
It works, but I think, I can achieve the same using only flatMap, without toSet invocation. I tried this, but it had given compilation error:
// error: Cannot construct a collection of type Set[String]
// with elements of type String based on a collection of type List[MyClass].
list.flatMap[String, Set[String]](_.field)
If I change type of list to Set (i.e., val list = Set(...)), then such flatMap invocation works.
So, can I use somehow Set.canBuildFrom or any other CanBuildFrom object to invoke flatMap on List object, so that I'll get Set as a result?
The CanBuildFrom instance you want is called breakOut and has to be provided as a second parameter:
import scala.collection.breakOut
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
val s: Set[String] = list.flatMap(_.field)(breakOut)
Note that explicit type annotation on variable s is mandatory - that's how the type is chosen.
Edit:
If you're using Scalaz or cats, you can use foldMap as well:
import scalaz._, Scalaz._
list.foldMap(_.field)
This does essentially what mdms answer proposes, except the Set.empty and ++ parts are already baked in.
The way flatMap work in Scala is that it can only remove one wrapper for the same type of wrappers i.e. List[List[String]] -> flatMap -> List[String]
if you apply flatMap on different wrapper data types then you will always get the final outcome as higher level wrapper data type i.e.List[Set[String]] -> flatMap -> List[String]
if you want to apply the flatMap on different wrapper type i.e. List[Set[String]] -> flatMap -> Set[String] in you have 2 options :-
Explicitly cast the one datatype wrapper to another i.e. list.flatMap(_.field).toSet or
By providing implicit converter ie. implicit def listToSet(list: List[String]): Set[String] = list.toSet and the you can get val set:Set[String] = list.flatMap(_.field)
only then what you are trying to achieve will be accomplished.
Conclusion:- if you apply flatMap on 2 wrapped data type then you will always get the final result as op type which is on top of wrapper data type i.e. List[Set[String]] -> flatMap -> List[String] and if you want to convert or cast to different datatype then either you need to implicitly or explicitly cast it.
You could maybe provide a specific CanBuildFrom, but why not to use a fold instead?
list.foldLeft(Set.empty[String]){case (set, myClass) => set ++ myClass.field}
Still just one pass through the collection, and if you are sure the list is not empty, you could even user reduceLeft instead.

Demystifying a function definition

I am new to Scala, and I hope this question is not too basic. I couldn't find the answer to this question on the web (which might be because I don't know the relevant keywords).
I am trying to understand the following definition:
def functionName[T <: AnyRef](name: Symbol)(range: String*)(f: T => String)(implicit tag: ClassTag[T]): DiscreteAttribute[T] = {
val r = ....
new anotherFunctionName[T](name.toString, f, Some(r))
}
First , why is it defined as def functionName[...](...)(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
Second, how does range: String* from range: String?
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?
First , why is it defined as def functionName...(...)(...)(...)? Can't we define it as def functionName[...](..., ..., ..., ...)?
One good reason to use currying is to support type inference. Consider these two functions:
def pred1[A](x: A, f: A => Boolean): Boolean = f(x)
def pred2[A](x: A)(f: A => Boolean): Boolean = f(x)
Since type information flows from left to right if you try to call pred1 like this:
pred1(1, x => x > 0)
type of the x => x > 0 cannot be determined yet and you'll get an error:
<console>:22: error: missing parameter type
pred1(1, x => x > 0)
^
To make it work you have to specify argument type of the anonymous function:
pred1(1, (x: Int) => x > 0)
pred2 from the other hand can be used without specifying argument type:
pred2(1)(x => x > 0)
or simply:
pred2(1)(_ > 0)
Second, how does range: String* from range: String?
It is a syntax for defining Repeated Parameters a.k.a varargs. Ignoring other differences it can be used only on the last position and is available as a scala.Seq (here scala.Seq[String]). Typical usage is apply method of the collections types which allows for syntax like SomeDummyCollection(1, 2, 3). For more see:
What does `:_*` (colon underscore star) do in Scala?
Scala variadic functions and Seq
Is there a difference in Scala between Seq[T] and T*?
Third, would it be a problem if implicit tag: ClassTag[T] did not exist?
As already stated by Aivean it shouldn't be the case here. ClassTags are automatically generated by the compiler and should be accessible as long as the class exists. In general case if implicit argument cannot be accessed you'll get an error:
scala> import scala.concurrent._
import scala.concurrent._
scala> val answer: Future[Int] = Future(42)
<console>:13: error: Cannot find an implicit ExecutionContext. You might pass
an (implicit ec: ExecutionContext) parameter to your method
or import scala.concurrent.ExecutionContext.Implicits.global.
val answer: Future[Int] = Future(42)
Multiple argument lists: this is called "currying", and enables you to call a function with only some of the arguments, yielding a function that takes the rest of the arguments and produces the result type (partial function application). Here is a link to Scala documentation that gives an example of using this. Further, any implicit arguments to a function must be specified together in one argument list, coming after any other argument lists. While defining functions this way is not necessary (apart from any implicit arguments), this style of function definition can sometimes make it clearer how the function is expected to be used, and/or make the syntax for partial application look more natural (f(x) rather than f(x, _)).
Arguments with an asterisk: "varargs". This syntax denotes that rather than a single argument being expected, a variable number of arguments can be passed in, which will be handled as (in this case) a Seq[String]. It is the equivalent of specifying (String... range) in Java.
the implicit ClassTag: this is often needed to ensure proper typing of the function result, where the type (T here) cannot be determined at compile time. Since Scala runs on the JVM, which does not retain type information beyond compile time, this is a work-around used in Scala to ensure information about the type(s) involved is still available at runtime.
Check currying:Methods may define multiple parameter lists. When a method is called with a fewer number of parameter lists, then this will yield a function taking the missing parameter lists as its arguments.
range:String* is the syntax for varargs
implicit TypeTag parameter in Scala is the alternative for Class<T> clazzparameter in Java. It will be always available if your class is defined in scope. Read more about type tags.

scala: How to sort Seq with an Ordering object?

From the scala reference:
http://www.scala-lang.org/files/archive/nightly/docs/library/index.html#scala.math.Ordering
case class Person(name:String, age:Int)
val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))
// sort by age
object AgeOrdering extends Ordering[Person] {
def compare(a:Person, b:Person) = a.age compare b.age
}
But if I want to sort a Seq using an Ordering object:
// Want to sort a Seq, or event and IndexedSeq
val ps = people.asInstanceOf[collection.Seq[Person]]
// Type not enough arguments
// for method stableSort: (implicit evidence$6: scala.reflect.ClassTag[Person],
// implicit evidence$7: scala.math.Ordering[Person])Array[Person]. Unspecified value
// parameter evidence$7.
Sorting.stableSort(ps)(AgeOrdering)
the compiler chokes. But the stableSort API says it will take an Seq. So why does the above fail?
In Sorting, the quickSort implementation is done in-place, with a bunch of index-based moves which are only suitable to be done on an Array (and a Seq does not define indexing at all).
quickSort[K](a: Array[K] ..): Unit
The quickSort method performs side-effects on the input and returns Unit.
The stableSort however, has two different forms. One that takes an Seq and one that takes an Array.
stableSort[K](a: Seq[K] ..): Array[K]
stableSort[K](a: Array[K] ..): Unit
The form that takes an Array also returns Unit and modifies the input (as quickSort but ensuring stability), while the form takes a Seq returns a new Array object (and the input sequence is not modified).
The error is because the quickSort method has the signature for the second parameter group: (implicit arg0: math.Ordering[K]) while the stableSort method has: (implicit arg0: ClassTag[K], arg1: math.Ordering[K]) - that is, it also requires a ClassTag[K/Person].
Even though the ordering was specified, no ClassTag[K] argument is specified (or implicitly resolved) and Scala needs this to create the Array[Person] that is used internally (and returned). Due to Scala being limited by Java's type erasure, the method doesn't get to know what type K really is without additional help.
I'm not sure why it's not being resolved implicitly (so much of Scala implicits is still "magic" to me!), but it's fairly easy to pass in explicitly:
val results = Sorting.stableSort(ps)(classTag[Person], AgeOrdering)
Some rational and usages of TypeTag/ClassTag is discussed in TypeTags and Manifests.
Also see Scala: What is a TypeTag and how do I use it?

Why can a Map object be created without an apply-method?

C:\Users\John>scala
Welcome to Scala version 2.9.2 (Java HotSpot(TM) Client VM, Java 1.6.0_32).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.collection.mutable.Map
import scala.collection.mutable.Map
scala> Map()
res4: scala.collection.mutable.Map[Nothing,Nothing] = Map()
When using Map() without keyword new the apply method from the corresponding companion object will be called. But the Scala Documentation does not list an apply method for mutable Maps (only an apply method to retrieve a value from the map is provided).
Why is the code above still working ?
It looks like a bug in scaladoc. There is an apply method in object collection.mutable.Map (inherited from GenMapFactory) but it does not appear in the doc for Map. This problem seems to be fixed in the doc for upcomping 2.10.
Note : you must look into the object documentation, not the class one. The method apply in the class of course works with an existing map instance, and retrieve data from it.
There is an apply() method on the companion object of scala.collection.immutable.Map(). It is inherited from scala.collection.MapFactory . That method takes a variable number of pair arguments, and is usually used as
Map("foo"->3, "bar"->4, "barangus"->5)
Calling it with no arguments evidently works as well, but someone brighter than me would have to explain why the type inference engine comes up with scala.collection.mutable.Map[Nothing,Nothing] for it.
As sepp2k already mentioned in his comment the symbol Map refers to the companion object of Map, which gives you access to its single instance. In patter-matching this is often used to identify a message:
scala> case object Foo
defined module Foo
scala> def send[A](a: A) = a match { case Foo => "got a Foo" case Map => "got a Map" }
send: [A](a: A)String
scala> send(Map)
res8: String = got a Map
scala> send(Foo)
res9: String = got a Foo
If you write Map() you will call the apply method of object Map. Because you did not give any values to insert into the Map the compiler can't infer any type, thus it has to use the bottom type - Nothing - which is a subtype of every type. It is the only possible type to infer which will not break the type system although there are variances. Would Nothing not exist the following code would not compile:
scala> Map(1 -> 1) ++ Map()
res10: scala.collection.mutable.Map[Int,Int] = Map(1 -> 1)
If you take a look to the type signature of ++ which is as follows (source)
def ++[B1 >: B](xs: GenTraversableOnce[(A, B1)]): Map[A, B1]
you will notice the lower-bound type parameter B1 >: B. Because Nothing is a subtype of everything (and B in our case) the compiler can find a B1 (which is Int in our case) and successfully infer a type signature for our Map. This lower-bound is needed because B is covariant (source),
trait MapLike[A, +B, ...] ...
which means that we are not allowed to deliver it as a method parameter (because method parameters are in contravariant position). If the method parameters would not be in contravariant position Liskov's substitution principle would no longer kept by the type system. Thus to get the code to compile a new type (here called B1) has to be found.
As Didier Dupont already pointed out there are some bugs in Scaladoc 2.9, which are solved in 2.10. Not only some missed methods are displayed there but also methods added by an implicit conversion can be displayed (Array for example does display a lot of methods in 2.10 which are not displayed in 2.9).

Spurious ambiguous reference error in Scala 2.7.7 compiler/interpreter?

Can anyone explain the compile error below? Interestingly, if I change the return type of the get() method to String, the code compiles just fine. Note that the thenReturn method has two overloads: a unary method and a varargs method that takes at least one argument. It seems to me that if the invocation is ambiguous here, then it would always be ambiguous.
More importantly, is there any way to resolve the ambiguity?
import org.scalatest.mock.MockitoSugar
import org.mockito.Mockito._
trait Thing {
def get(): java.lang.Object
}
new MockitoSugar {
val t = mock[Thing]
when(t.get()).thenReturn("a")
}
error: ambiguous reference to overloaded definition,
both method thenReturn in trait OngoingStubbing of type
java.lang.Object,java.lang.Object*)org.mockito.stubbing.OngoingStubbing[java.lang.Object]
and method thenReturn in trait OngoingStubbing of type
(java.lang.Object)org.mockito.stubbing.OngoingStubbing[java.lang.Object]
match argument types (java.lang.String)
when(t.get()).thenReturn("a")
Well, it is ambiguous. I suppose Java semantics allow for it, and it might merit a ticket asking for Java semantics to be applied in Scala.
The source of the ambiguitity is this: a vararg parameter may receive any number of arguments, including 0. So, when you write thenReturn("a"), do you mean to call the thenReturn which receives a single argument, or do you mean to call the thenReturn that receives one object plus a vararg, passing 0 arguments to the vararg?
Now, what this kind of thing happens, Scala tries to find which method is "more specific". Anyone interested in the details should look up that in Scala's specification, but here is the explanation of what happens in this particular case:
object t {
def f(x: AnyRef) = 1 // A
def f(x: AnyRef, xs: AnyRef*) = 2 // B
}
if you call f("foo"), both A and B
are applicable. Which one is more
specific?
it is possible to call B with parameters of type (AnyRef), so A is
as specific as B.
it is possible to call A with parameters of type (AnyRef,
Seq[AnyRef]) thanks to tuple
conversion, Tuple2[AnyRef,
Seq[AnyRef]] conforms to AnyRef. So
B is as specific as A. Since both are
as specific as the other, the
reference to f is ambiguous.
As to the "tuple conversion" thing, it is one of the most obscure syntactic sugars of Scala. If you make a call f(a, b), where a and b have types A and B, and there is no f accepting (A, B) but there is an f which accepts (Tuple2(A, B)), then the parameters (a, b) will be converted into a tuple.
For example:
scala> def f(t: Tuple2[Int, Int]) = t._1 + t._2
f: (t: (Int, Int))Int
scala> f(1,2)
res0: Int = 3
Now, there is no tuple conversion going on when thenReturn("a") is called. That is not the problem. The problem is that, given that tuple conversion is possible, neither version of thenReturn is more specific, because any parameter passed to one could be passed to the other as well.
In the specific case of Mockito, it's possible to use the alternate API methods designed for use with void methods:
doReturn("a").when(t).get()
Clunky, but it'll have to do, as Martin et al don't seem likely to compromise Scala in order to support Java's varargs.
Well, I figured out how to resolve the ambiguity (seems kind of obvious in retrospect):
when(t.get()).thenReturn("a", Array[Object](): _*)
As Andreas noted, if the ambiguous method requires a null reference rather than an empty array, you can use something like
v.overloadedMethod(arg0, null.asInstanceOf[Array[Object]]: _*)
to resolve the ambiguity.
If you look at the standard library APIs you'll see this issue handled like this:
def meth(t1: Thing): OtherThing = { ... }
def meth(t1: Thing, t2: Thing, ts: Thing*): OtherThing = { ... }
By doing this, no call (with at least one Thing parameter) is ambiguous without extra fluff like Array[Thing](): _*.
I had a similar problem using Oval (oval.sf.net) trying to call it's validate()-method.
Oval defines 2 validate() methods:
public List<ConstraintViolation> validate(final Object validatedObject)
public List<ConstraintViolation> validate(final Object validatedObject, final String... profiles)
Trying this from Scala:
validator.validate(value)
produces the following compiler-error:
both method validate in class Validator of type (x$1: Any,x$2: <repeated...>[java.lang.String])java.util.List[net.sf.oval.ConstraintViolation]
and method validate in class Validator of type (x$1: Any)java.util.List[net.sf.oval.ConstraintViolation]
match argument types (T)
var violations = validator.validate(entity);
Oval needs the varargs-parameter to be null, not an empty-array, so I finally got it to work with this:
validator.validate(value, null.asInstanceOf[Array[String]]: _*)