Scala: list to set using flatMap

I have a class with a field of type Set[String]. I also have a list of objects of this class. I'd like to collect all strings from the sets of all these objects into one set. Here is how I can do it already:
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
list.flatMap(_.field).toSet // Set(123, 456, 798)
It works, but I think I can achieve the same using only flatMap, without the toSet call. I tried this, but it gave a compilation error:
// error: Cannot construct a collection of type Set[String]
// with elements of type String based on a collection of type List[MyClass].
list.flatMap[String, Set[String]](_.field)
If I change the type of list to Set (i.e., val list = Set(...)), then this flatMap invocation works.
So, can I somehow use Set.canBuildFrom, or any other CanBuildFrom object, to invoke flatMap on a List so that I get a Set as the result?

The CanBuildFrom instance you want is produced by breakOut, and it has to be passed explicitly in flatMap's second parameter list:
import scala.collection.breakOut
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
val s: Set[String] = list.flatMap(_.field)(breakOut)
Note that the explicit type annotation on s is mandatory; that's how the target type is chosen.
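scala.collection.breakOut was removed in Scala 2.13, where flatMap no longer takes a CanBuildFrom. A minimal sketch of a one-pass alternative for 2.13 and later, assuming the same MyClass: go through an iterator (or a view) and collect with to(Set), so no intermediate List[String] is built. The object name FlatMapToSetSketch is only for illustration.

```scala
object FlatMapToSetSketch {
  final case class MyClass(field: Set[String])

  val list = List(
    MyClass(Set("123")),
    MyClass(Set("456", "798")),
    MyClass(Set("123", "798"))
  )

  // Iterator.flatMap is lazy, so no intermediate List[String] exists
  // before .to(Set) collects the elements into the final Set.
  val viaIterator: Set[String] = list.iterator.flatMap(_.field).to(Set)

  // A view gives the same single pass over the data.
  val viaView: Set[String] = list.view.flatMap(_.field).toSet
}
```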
Edit:
If you're using Scalaz or cats, you can use foldMap as well:
import scalaz._, Scalaz._
list.foldMap(_.field)
This does essentially what mdm's answer proposes, except that the Set.empty and ++ parts are already baked in (they come from the Monoid instance for Set).
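Roughly, foldMap maps each element and combines the results with a Monoid, whose identity and combine operation are where Set.empty and ++ come from. A sketch with a hypothetical, minimal Monoid trait (not the actual Scalaz or cats definition):

```scala
object FoldMapSketch {
  // Hypothetical minimal Monoid type class, for illustration only.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // For sets, the identity is Set.empty and combining is union (++).
  implicit def setMonoid[T]: Monoid[Set[T]] = new Monoid[Set[T]] {
    def empty: Set[T] = Set.empty[T]
    def combine(x: Set[T], y: Set[T]): Set[T] = x ++ y
  }

  // foldMap: map every element with f, then combine the results, starting from empty.
  def foldMap[A, B](as: List[A])(f: A => B)(implicit m: Monoid[B]): B =
    as.foldLeft(m.empty)((acc, a) => m.combine(acc, f(a)))

  final case class MyClass(field: Set[String])
  val list = List(MyClass(Set("123")), MyClass(Set("456", "798")), MyClass(Set("123", "798")))

  val all: Set[String] = foldMap(list)(_.field)
}
```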

The way flatMap works in Scala is that it removes one level of nesting, and the result keeps the type of the outer wrapper, i.e. List[List[String]] -> flatMap -> List[String].
If you apply flatMap to nested wrappers of different types, you always get the outer (higher-level) wrapper type as the result, i.e. List[Set[String]] -> flatMap -> List[String].
If you want to apply flatMap across different wrapper types, i.e. List[Set[String]] -> flatMap -> Set[String], you have 2 options:
Explicitly convert from one wrapper type to the other, i.e. list.flatMap(_.field).toSet, or
Provide an implicit converter, i.e. implicit def listToSet(list: List[String]): Set[String] = list.toSet, and then you can get val set: Set[String] = list.flatMap(_.field)
Only then will you accomplish what you are trying to achieve.
Conclusion: if you apply flatMap to two nested wrapper types, you always get a result of the outer wrapper type, i.e. List[Set[String]] -> flatMap -> List[String]; if you want a different collection type, you need to convert to it, either implicitly or explicitly.
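A sketch of the implicit-converter option, assuming Scala 2.13 or later, where List.flatMap has no CanBuildFrom parameter and simply returns a List[String]; the expected type Set[String] then makes the compiler apply listToSet. The language import silences the feature warning for implicit conversions.

```scala
import scala.language.implicitConversions

object ImplicitListToSetSketch {
  final case class MyClass(field: Set[String])

  // The converter from the answer: List[String] => Set[String].
  implicit def listToSet(list: List[String]): Set[String] = list.toSet

  val list = List(MyClass(Set("123")), MyClass(Set("456", "798")), MyClass(Set("123", "798")))

  // flatMap builds a List[String]; because the declared type is Set[String],
  // the compiler inserts listToSet(...) around it.
  val set: Set[String] = list.flatMap(_.field)
}
```

Note that this still builds the intermediate List first; the conversion only hides the toSet call.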

You could maybe provide a specific CanBuildFrom, but why not use a fold instead?
list.foldLeft(Set.empty[String]){case (set, myClass) => set ++ myClass.field}
It is still just one pass through the collection, and if you are sure the list is not empty, you could even use reduceLeft instead.
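For the reduceLeft variant, the elements first have to be turned into Sets, because reduceLeft combines values of the collection's own element type. A sketch, assuming the same MyClass: going through an iterator keeps it to one pass, and reduceOption covers the empty case where reduceLeft would throw.

```scala
object ReduceLeftSketch {
  final case class MyClass(field: Set[String])
  val list = List(MyClass(Set("123")), MyClass(Set("456", "798")), MyClass(Set("123", "798")))

  // map to Sets lazily, then union them pairwise from the left.
  val union: Set[String] = list.iterator.map(_.field).reduceLeft(_ ++ _)

  // reduceLeft throws on an empty input; reduceOption returns None instead.
  val emptyUnion: Set[String] =
    List.empty[MyClass].iterator.map(_.field).reduceOption(_ ++ _).getOrElse(Set.empty)
}
```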

Related

Understanding Scala's flatmap type conversions

The docs for List state:
The type of the resulting collection is guided by the static type of list. This might cause unexpected results sometimes. For example:
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words: Seq[String]) = words flatMap (word => word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words: Seq[String]) = words.toSet flatMap (word => word.toSeq)
I'm having a hard time understanding this. StringOps.toSet returns a Set of Char, so the first example ends up returning a Seq of Char. Fine, that makes sense. What I don't follow is why in the second example Scala constructs a Set instead of a Seq.
What exactly does "the resulting collection is guided by the static type of list" mean here?
Because of the canBuildFrom method defined in the Set companion object. As you can see in the ScalaDoc for the CanBuildFrom trait, it has three type parameters, CanBuildFrom[-From, -Elem, +To], where:
From - the type of the underlying collection that requests a builder to be created.
Elem - the element type of the collection to be created.
To - the type of the collection to be created.
Basically, when you call flatMap on the set, the compiler implicitly passes Set.canBuildFrom[Char], which builds a Set[Char].
As for the static type: when Scala builds the resulting collection, it uses this CanBuildFrom trait, and which CanBuildFrom instance is found depends on the static type of your collection.
Updated for the comment
If we add -Xprint:typer to the scala command, we can see how, after the typer phase, the compiler resolves the implicit method Set.canBuildFrom[Char], which is used in the flatMap method:
def lettersOf(words: Seq[String]): scala.collection.immutable.Set[Char] = words.toSet[String].flatMap[Char, scala.collection.immutable.Set[Char]](((word: String) => scala.this.Predef.augmentString(word).toSeq))(immutable.this.Set.canBuildFrom[Char])
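A small sketch of the two lettersOf variants that checks what the static type does to the result. The toSet[String] type argument is written out (as in the typer output above), because Scala 2 can fail to infer toSet's type parameter when a lambda follows; the sample words are made up.

```scala
object StaticTypeSketch {
  val words: Seq[String] = Seq("aa", "ab")

  // Static type Seq[String]: the result is a Seq, so repeated letters are kept.
  def lettersAsSeq(ws: Seq[String]): Seq[Char] = ws.flatMap(word => word.toSet)

  // Static type Set[String] after toSet: the result is a Set, so duplicates collapse.
  def lettersAsSet(ws: Seq[String]): Set[Char] = ws.toSet[String].flatMap(word => word.toSeq)

  val asSeq: Seq[Char] = lettersAsSeq(words)
  val asSet: Set[Char] = lettersAsSet(words)
}
```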

Type alias for immutable collections

What is the best way to resolve the compilation error in the example below? Assume that 'm' must be of type GenMap and I do not have control over the arguments of myFun.
import scala.collection.GenMap
object Test {
def myFun(m: Map[Int, String]) = m
val m: GenMap[Int, String] = Map(1 -> "One", 2 -> "two")
//Build error here on m.seq
// Found scala.collection.Map[Int, String]
// Required scala.collection.immutable.Map[Int, String]
val result = myFun(m.seq)
}
EDIT:
I should have been clearer. In my actual use-case I don't have control over myFun, so I have to pass it a Map. The 'm' also arises from another scala component as a GenMap. I need to convert one to another, but there appears to be a conflict between collection.Map and collection.immutable.Map
m.seq.toMap will solve your problem.
According to the signature in the API docs, toMap returns a scala.collection.immutable.Map, which is the type your error message says is required. The scala.collection.Map returned by the seq method is a more general trait which, besides being a parent of the immutable map, is also a parent of the mutable and concurrent maps.
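A sketch of the conversion, using scala.collection.Map (the type that m.seq returns in the question) as the stand-in source type, so it does not depend on GenMap. myFun and the sample data are the question's; the mutable map on the right-hand side is an assumption, chosen to show that any scala.collection.Map works.

```scala
object MapConversionSketch {
  // Predef's Map is scala.collection.immutable.Map.
  def myFun(m: Map[Int, String]) = m

  // The general collection.Map type, as returned by m.seq in the question.
  val m: scala.collection.Map[Int, String] =
    scala.collection.mutable.Map(1 -> "One", 2 -> "two")

  // toMap copies the pairs into an immutable.Map, which is what myFun requires.
  val result: Map[Int, String] = myFun(m.toMap)
}
```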

Why can a Map object be created without an apply-method?

C:\Users\John>scala
Welcome to Scala version 2.9.2 (Java HotSpot(TM) Client VM, Java 1.6.0_32).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.collection.mutable.Map
import scala.collection.mutable.Map
scala> Map()
res4: scala.collection.mutable.Map[Nothing,Nothing] = Map()
When Map() is used without the keyword new, the apply method of the corresponding companion object is called. But the Scala documentation does not list an apply method for mutable Maps (only the apply method that retrieves a value from the map is listed).
Why is the code above still working ?
It looks like a bug in Scaladoc. There is an apply method in object collection.mutable.Map (inherited from GenMapFactory), but it does not appear in the docs for Map. This problem seems to be fixed in the docs for the upcoming 2.10.
Note: you must look at the object documentation, not the class documentation. The apply method of the class, of course, works on an existing map instance and retrieves data from it.
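To illustrate the object/class distinction, here is a sketch with a hypothetical Registry class standing in for mutable.Map: calling Registry(...) without new goes to the companion object's apply, while registry(key) goes to the instance's apply. With no arguments, the type parameters are inferred as Nothing, as in the REPL output in the question.

```scala
object CompanionApplySketch {
  // Hypothetical class, for illustration only.
  final class Registry[K, V](val entries: List[(K, V)]) {
    // Instance apply: looks up a value in an existing registry, like map(key).
    def apply(key: K): V =
      entries.find(_._1 == key).map(_._2).getOrElse(throw new NoSuchElementException(key.toString))
  }

  object Registry {
    // Companion apply: what `Registry(...)` without `new` expands to.
    def apply[K, V](pairs: (K, V)*): Registry[K, V] = new Registry(pairs.toList)
  }

  val filled = Registry("a" -> 1, "b" -> 2) // same as Registry.apply("a" -> 1, "b" -> 2)
  val empty = Registry()                    // no arguments: Registry[Nothing, Nothing]
}
```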
There is an apply() method on the companion object of scala.collection.immutable.Map. It is inherited from scala.collection.MapFactory. That method takes a variable number of pair arguments, and is usually used as
Map("foo"->3, "bar"->4, "barangus"->5)
Calling it with no arguments evidently works as well, but someone brighter than me would have to explain why the type inference engine comes up with scala.collection.mutable.Map[Nothing,Nothing] for it.
As sepp2k already mentioned in his comment, the symbol Map refers to the companion object of Map, which gives you access to its single instance. In pattern matching this is often used to identify a message:
scala> case object Foo
defined module Foo
scala> def send[A](a: A) = a match { case Foo => "got a Foo" case Map => "got a Map" }
send: [A](a: A)String
scala> send(Map)
res8: String = got a Map
scala> send(Foo)
res9: String = got a Foo
If you write Map(), you call the apply method of object Map. Because you did not give any values to insert into the Map, the compiler has nothing to infer the key and value types from, so it has to use the bottom type, Nothing, which is a subtype of every type. It is the only type it can infer that does not break the type system in the presence of variance. If Nothing did not exist, the following code would not compile:
scala> Map(1 -> 1) ++ Map()
res10: scala.collection.mutable.Map[Int,Int] = Map(1 -> 1)
If you look at the type signature of ++, which is as follows:
def ++[B1 >: B](xs: GenTraversableOnce[(A, B1)]): Map[A, B1]
you will notice the lower-bounded type parameter B1 >: B. Because Nothing is a subtype of everything (including B in our case), the compiler can find a B1 (Int in our case) and successfully infer a type for our Map. This lower bound is needed because B is covariant,
trait MapLike[A, +B, ...] ...
which means that B may not be used as the type of a method parameter, because method parameters are in contravariant position. If it could be used there, the type system would no longer uphold the Liskov substitution principle. So, to get the code to compile, a new type (here called B1) has to be introduced.
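The same pattern in a minimal, hypothetical covariant container (Box is made up for this sketch, not a library type): a method parameter cannot mention the covariant A directly, so ++ introduces B >: A, and an empty Box[Nothing] fits anywhere.

```scala
object VarianceSketch {
  final case class Box[+A](items: List[A]) {
    // def add(a: A): Box[A] would not compile:
    // covariant type A occurs in contravariant position.
    // The lower bound B >: A lets the result type widen instead.
    def ++[B >: A](other: Box[B]): Box[B] = Box(items ::: other.items)
  }

  val empty: Box[Nothing] = Box(Nil)
  val ints: Box[Int] = Box(List(1, 2))

  val combined: Box[Int] = ints ++ empty                 // B = Int; Box[Nothing] <: Box[Int]
  val widened: Box[Any] = ints ++ Box(List("three"))     // B = Any
}
```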
As Didier Dupont already pointed out, there are some bugs in Scaladoc 2.9, which are fixed in 2.10. Not only are some missing methods displayed there, but methods added by an implicit conversion can be displayed as well (Array, for example, displays a lot of methods in 2.10 that are not displayed in 2.9).

What is the best way to create and pass around dictionaries containing multiple types in scala?

By dictionary I mean a lightweight map from names to values that can be used as the return value of a method.
Options that I'm aware of include making case classes, creating anon objects, and making maps from Strings -> Any.
Case classes require mental overhead to create (names), but are strongly typed.
Anon objects don't seem that well documented and it's unclear to me how to use them as arguments since there is no named type.
Maps from String -> Any require casting for retrieval.
Is there anything better?
Ideally these could be built from json and transformed back into it when appropriate.
I don't need static typing (though it would be nice, I can see how it would be impossible) - but I do want to avoid explicit casting.
Here's the fundamental problem with what you want:
def get(key: String): Option[T] = ...
val r = map.get("key")
The type of r will be defined from the return type of get -- so, what should that type be? From where could it be defined? If you make it a type parameter, then it's relatively easy:
import scala.collection.mutable.{Map => MMap}
val map: MMap[String, (Manifest[_], Any)] = MMap.empty
def get[T : Manifest](key: String): Option[T] = map.get(key).filter(_._1 <:< manifest[T]).map(_._2.asInstanceOf[T])
def put[T : Manifest](key: String, obj: T) = map(key) = manifest[T] -> obj
Example:
scala> put("abc", 2)
scala> put("def", true)
scala> get[Boolean]("abc")
res2: Option[Boolean] = None
scala> get[Int]("abc")
res3: Option[Int] = Some(2)
The problem, of course, is that you have to tell the compiler what type you expect to be stored on the map under that key. Unfortunately, there is simply no way around that: the compiler cannot know what type will be stored under that key at compile time.
Any solution you take you'll end up with this same problem: somehow or other, you'll have to tell the compiler what type should be returned.
Now, this shouldn't be a burden in a Scala program. Take that r above... you'll then use that r for something, right? That something you are using it for will have methods appropriate to some type, and since you know what the methods are, then you must also know what the type of r must be.
If this isn't the case, then there's something fundamentally wrong with the code -- or, perhaps, you haven't progressed from wanting the map to knowing what you'll do with it.
So you want to parse JSON and turn it into objects that resemble the JavaScript objects described in the JSON input? If you want static typing, case classes are pretty much your only option, and there are already libraries that handle this, for example lift-json.
Another option is to use Scala 2.9's experimental support for dynamic typing. That will give you elegant syntax at the expense of type safety.
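This sketch targets Scala 2.10 or later, where selectDynamic exists and the feature is enabled with the scala.language.dynamics import. It assumes a hypothetical Record wrapper: accesses to members that do not exist are rewritten to selectDynamic calls, which read nicely but return Any, so nothing is checked at compile time.

```scala
import scala.language.dynamics

object DynamicSketch {
  // Hypothetical wrapper: rec.foo is rewritten to rec.selectDynamic("foo").
  final class Record(fields: Map[String, Any]) extends Dynamic {
    def selectDynamic(name: String): Any = fields(name)
  }

  val rec = new Record(Map("name" -> "Ada", "age" -> 36))

  val nameField: Any = rec.name // compiles to rec.selectDynamic("name")
  val ageField: Any = rec.age   // a typo here would only fail at runtime
}
```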
You can use an approach I've seen in the Casbah library, where you explicitly pass a type parameter to the get method and cast the actual value inside it. Here is a quick example:
case class MultiTypeDictionary(m: Map[String, Any]) {
def getAs[T <: Any](k: String)(implicit mf: Manifest[T]): T =
cast(m.get(k).getOrElse {throw new IllegalArgumentException})(mf)
private def cast[T <: Any : Manifest](a: Any): T =
a.asInstanceOf[T]
}
implicit def map2multiTypeDictionary(m: Map[String, Any]) =
MultiTypeDictionary(m)
val dict: MultiTypeDictionary = Map("1" -> 1, "2" -> 2.0, "3" -> "3")
val a: Int = dict.getAs("1")
val b: Int = dict.getAs("2") // ClassCastException
val c: Int = dict.getAs("4") // IllegalArgumentException
Note that there are no real compile-time checks, so you have to live with the drawbacks of runtime exceptions.
Update: working MultiTypeDictionary class
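If you would rather get None than an exception for a value of the wrong type, a ClassTag can check the runtime class inside getAs. A sketch of that variant (TypedDict is a hypothetical name, not Casbah's API). Type arguments are still erased, so only the top-level class is checked: a List[Int] and a List[String] would look the same.

```scala
import scala.reflect.ClassTag

object SafeGetSketch {
  final case class TypedDict(m: Map[String, Any]) {
    // With a ClassTag in scope, `case t: T` checks the runtime class,
    // so a wrong type or a missing key both give None.
    def getAs[T: ClassTag](k: String): Option[T] = m.get(k).collect { case t: T => t }
  }

  val dict = TypedDict(Map("1" -> 1, "2" -> 2.0, "3" -> "3"))

  val a: Option[Int] = dict.getAs[Int]("1") // present, right type
  val b: Option[Int] = dict.getAs[Int]("2") // present, but a Double
  val c: Option[Int] = dict.getAs[Int]("4") // missing key
}
```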
If you have only a limited number of types which can occur as values, you can use some kind of union type (a.k.a. disjoint type), having e.g. a Map[Foo, Bar | Baz | Buz | Blargh]. If you have only two possibilities, you can use Either[A,B], giving you a Map[Foo, Either[Bar, Baz]]. For three types you might cheat and use Map[Foo, Either[Bar, Either[Baz,Buz]]], but this syntax obviously doesn't scale well. If you have more types you can use things like...
http://cleverlytitled.blogspot.com/2009/03/disjoint-bounded-views-redux.html
http://svn.assembla.com/svn/metascala/src/metascala/OneOfs.scala
http://www.chuusai.com/2011/06/09/scala-union-types-curry-howard/
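A sketch of the two-type case with Either, assuming hypothetical settings keys: pattern matching on Left and Right replaces the casts needed with Map[String, Any], and the compiler knows the only two shapes a value can take.

```scala
object EitherValuesSketch {
  // Values are either an Int (Left) or a String (Right).
  val settings: Map[String, Either[Int, String]] =
    Map("retries" -> Left(3), "host" -> Right("localhost"))

  // No asInstanceOf needed: each case has a statically known type.
  def describe(key: String): String = settings.get(key) match {
    case Some(Left(n))  => s"$key is the number $n"
    case Some(Right(s)) => s"$key is the text $s"
    case None           => s"$key is missing"
  }
}
```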

How do I form the union of scala SortedMaps?

(I'm using Scala nightlies, and see the same behaviour in 2.8.0b1 RC4. I'm a Scala newcomer.)
I have two SortedMaps that I'd like to form the union of. Here's the code I'd like to use:
import scala.collection._
object ViewBoundExample {
class X
def combine[Y](a: SortedMap[X, Y], b: SortedMap[X, Y]): SortedMap[X, Y] = {
a ++ b
}
implicit def orderedX(x: X): Ordered[X] = new Ordered[X] { def compare(that: X) = 0 }
}
The idea here is that the 'implicit' conversion means Xs can be converted to Ordered[X]s, and then it makes sense to combine SortedMaps into another SortedMap rather than just a Map.
When I compile, I get
sieversii:scala-2.8.0.Beta1-RC4 scott$ bin/scalac -version
Scala compiler version 2.8.0.Beta1-RC4 -- Copyright 2002-2010, LAMP/EPFL
sieversii:scala-2.8.0.Beta1-RC4 scott$ bin/scalac ViewBoundExample.scala
ViewBoundExample.scala:8: error: type arguments [ViewBoundExample.X] do not
conform to method ordered's type parameter bounds [A <: scala.math.Ordered[A]]
a ++ b
^
one error found
It seems my problem would go away if that type parameter bound was [A <% scala.math.Ordered[A]], rather than [A <: scala.math.Ordered[A]]. Unfortunately, I can't even work out where the method 'ordered' lives! Can anyone help me track it down?
Failing that, what am I meant to do to produce the union of two SortedMaps? If I remove the return type of combine (or change it to Map) everything works fine, but then I can't rely on the result being sorted!
Currently, what you are using is the scala.collection.SortedMap trait, whose ++ method is inherited from the MapLike trait. Therefore, you see the following behaviour:
scala> import scala.collection.SortedMap
import scala.collection.SortedMap
scala> val a = SortedMap(1->2, 3->4)
a: scala.collection.SortedMap[Int,Int] = Map(1 -> 2, 3 -> 4)
scala> val b = SortedMap(2->3, 4->5)
b: scala.collection.SortedMap[Int,Int] = Map(2 -> 3, 4 -> 5)
scala> a ++ b
res0: scala.collection.Map[Int,Int] = Map(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5)
scala> b ++ a
res1: scala.collection.Map[Int,Int] = Map(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5)
The return type of ++ is Map[Int, Int], because that is the only type it makes sense for the ++ method of a MapLike object to return. It seems that ++ keeps the sorted property of the SortedMap, which I guess is because ++ uses abstract methods to do the concatenation, and those abstract methods are defined so as to keep the order of the map.
To have the union of two sorted maps, I suggest you use scala.collection.immutable.SortedMap.
scala> import scala.collection.immutable.SortedMap
import scala.collection.immutable.SortedMap
scala> val a = SortedMap(1->2, 3->4)
a: scala.collection.immutable.SortedMap[Int,Int] = Map(1 -> 2, 3 -> 4)
scala> val b = SortedMap(2->3, 4->5)
b: scala.collection.immutable.SortedMap[Int,Int] = Map(2 -> 3, 4 -> 5)
scala> a ++ b
res2: scala.collection.immutable.SortedMap[Int,Int] = Map(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5)
scala> b ++ a
res3: scala.collection.immutable.SortedMap[Int,Int] = Map(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5)
This implementation of the SortedMap trait declares a ++ method which returns a SortedMap.
Now a couple of answers to your questions about the type bounds:
Ordered[T] is a trait which, when mixed into a class, specifies that instances of that class can be compared using <, >, <= and >=. You just have to define the abstract method compare(that: T), which returns a negative number when this < that, a positive number when this > that, and 0 when they are equal. All the other methods are then implemented in the trait based on the result of compare.
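A sketch of mixing in Ordered, assuming a hypothetical Version class: only compare is written, and the comparison operators come from the trait. Because Ordered extends java.lang.Comparable, the standard library can also derive an Ordering for sorted.

```scala
object OrderedSketch {
  // Hypothetical version number; compares by major, then by minor.
  final case class Version(major: Int, minor: Int) extends Ordered[Version] {
    def compare(that: Version): Int =
      if (major != that.major) Integer.compare(major, that.major)
      else Integer.compare(minor, that.minor)
  }

  val older = Version(1, 2)
  val newer = Version(1, 10)

  // <, >, <= and >= are implemented by Ordered in terms of compare.
  val isOlder: Boolean = older < newer

  // sorted finds an Ordering[Version] because Version is Comparable.
  val sortedVersions: List[Version] = List(newer, older).sorted
}
```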
T <% U represents a view bound in Scala. It means that type T is either a subtype of U or can be converted to U by an implicit conversion in scope. The code works with <% but not with <:, because X is not a subtype of Ordered[X] but can be implicitly converted to Ordered[X] using the orderedX implicit conversion.
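A view bound T <% Ordered[T] stands for an extra implicit parameter of type T => Ordered[T]. A sketch that writes that parameter out explicitly instead of using the <% syntax, assuming a variant of the question's X with an Int field (so compare has something to compare); maxOf is a hypothetical helper.

```scala
object ViewBoundSketch {
  class X(val n: Int)

  // Conversion from X to Ordered[X], comparing by n.
  // Declared before its first use below so it is initialised in time.
  implicit val orderedX: X => Ordered[X] = (x: X) =>
    new Ordered[X] {
      def compare(that: X): Int = Integer.compare(x.n, that.n)
    }

  // What `def maxOf[T <% Ordered[T]](a: T, b: T): T` stands for:
  // an implicit conversion parameter rather than a subtype bound.
  def maxOf[T](a: T, b: T)(implicit toOrdered: T => Ordered[T]): T =
    if (toOrdered(a) >= b) a else b

  val bigger: X = maxOf(new X(1), new X(2))
}
```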
Edit: regarding your comment. If you use scala.collection.immutable.SortedMap, you are still programming to an interface, not to an implementation, as the immutable SortedMap is defined as a trait. You can view it as a more specialised version of scala.collection.SortedMap that provides additional operations (like a ++ which returns a SortedMap) and the property of being immutable. This is in line with the Scala philosophy of preferring immutability, so I don't see any problem with using the immutable SortedMap. In this case you can guarantee that the result will definitely be sorted, and this can't change, as the collection is immutable.
Still, I find it strange that scala.collection.SortedMap does not provide a ++ method which returns a SortedMap. All the limited testing I have done suggests that concatenating two scala.collection.SortedMaps does produce a map which keeps the sorted property.
Have you picked a tough nut to crack as a beginner to Scala! :-)
Ok, a brief tour; don't expect to fully understand it right now. First, note that the problem happens at the method ++. Searching for its definition, we find it in the trait MapLike, receiving either an Iterator or a Traversable. Since b is a SortedMap, it is the Traversable version that is used.
Note in its extensive type signature that there is a CanBuildFrom being passed. It is being passed implicitly, so you don't normally need to worry about it. However, to understand what is going on, this time you do.
You can locate CanBuildFrom either by clicking on it where it appears in the definition of ++, or by filtering. As Randall mentioned in the comments, there's an unmarked blank field at the upper left of the Scaladoc page. Just click there and type, and it will return matches for whatever you typed.
So, look up the trait CanBuildFrom on ScalaDoc and select it. It has a large number of subclasses, each one responsible for building a specific type of collection. Search for and click on the subclass SortedMapCanBuildFrom. This is the class of the object you need to produce a SortedMap from a Traversable. Note on the instance constructor (the constructor for the class) that it receives an implicit Ordering parameter. Now we are getting closer.
This time, use the filter to search for Ordering. Its companion object (click on the small "o" next to the name) hosts an implicit that will generate Orderings, as companion objects are examined for implicits generating instances of, or conversions for, that class. It is defined inside the trait LowPriorityOrderingImplicits, which object Ordering extends, and looking at it you'll see the method ordered[A <: Ordered[A]], which will produce the Ordering required... or would produce it, if only there wasn't a problem.
One might assume the implicit conversion from X to Ordered[X] would be enough, just as I had before looking more carefully into this. That, however, is a conversion of objects, and ordered expects to receive a type which is a subtype of Ordered[X]. While one can convert an object of type X to an object of type Ordered[X], X, itself, is not a subtype of Ordered[X], so it can't be passed as a parameter to ordered.
On the other hand, you can create an implicit val Ordering[X], instead of the def Ordered[X], and you'll get around the problem. Specifically:
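import scala.collection._
object ViewBoundExample {
class X
def combine[Y](a: SortedMap[X, Y], b: SortedMap[X, Y]): SortedMap[X, Y] = {
a ++ b
}
// An Ordering[X] value instead of a conversion X => Ordered[X]
implicit val orderingX: Ordering[X] = new Ordering[X] { def compare(x: X, y: X) = 0 }
}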
I think most people's initial reaction to Ordered/Ordering must be one of perplexity: why have classes for the same thing? The former extends java.lang.Comparable, whereas the latter extends java.util.Comparator. Alas, the type signature of compare pretty much sums up the main difference:
def compare(that: A): Int // Ordered
def compare(x: T, y: T): Int // Ordering
Using an Ordered[A] requires either that A extend Ordered[A], which means being able to modify A's definition, or that you pass along a method which can convert an A into an Ordered[A]. Scala is perfectly capable of doing the latter easily, but then you have to convert each instance before comparing.
On the other hand, the use of Ordering[A] requires the creation of a single object, such as demonstrated above. When you use it, you just pass two objects of type A to compare -- no objects get created in the process.
So there are some performance gains to be had, but there is a much more important reason for Scala's preference for Ordering over Ordered. Look again at the companion object of Ordering. You'll note that there are several implicits for many of Scala's classes defined in there. You may recall I mentioned earlier that an implicit for class T will be searched for inside the companion object of T, and that's exactly what is going on.
This could be done for Ordered as well. However, and this is the sticking point, that means every method supporting both Ordering and Ordered would fail! That's because Scala would look for an implicit to make it work, and would find two: one for Ordering, one for Ordered. Being unable to decide which one you wanted, Scala gives up with an error message. So a choice had to be made, and Ordering had more going for it.
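A sketch of the Ordering approach with the implicit placed in the key type's companion object, so it is found without any import, combined with immutable.SortedMap as suggested in the first answer. X gets an Int field here so the ordering is not trivial; the names are illustrative.

```scala
import scala.collection.immutable.SortedMap

object OrderingSketch {
  // Hypothetical key type that does not extend Ordered.
  final case class X(n: Int)

  object X {
    // The companion object is part of the implicit scope for X.
    implicit val orderingX: Ordering[X] = Ordering.by((x: X) => x.n)
  }

  // immutable.SortedMap's ++ keeps the result sorted.
  def combine[Y](a: SortedMap[X, Y], b: SortedMap[X, Y]): SortedMap[X, Y] = a ++ b

  val merged: SortedMap[X, String] =
    combine(SortedMap(X(3) -> "c", X(1) -> "a"), SortedMap(X(2) -> "b"))
}
```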
Duh, I forgot to explain why the signature isn't defined as ordered[A <% Ordered[A]], instead of ordered[A <: Ordered[A]]. I suspect doing so would cause the double implicits failure I have mentioned before, but I'll ask the guy who actually did this stuff and had the double implicit problems whether this particular method is problematic.