What's up with immutable.Map.map? - scala

How does immutable.Map.map work? It looks like there is something wrong with the documentation:
def map[B](f: (A) ⇒ B): Map[B]
[use case]
Builds a new collection by applying a function to all elements of this immutable map.
Full Signature
def map[B, That](f: ((A, B)) ⇒ B)(implicit bf: CanBuildFrom[Map[A, B], B, That]): That
Map[B] does not make sense since Map takes two type parameters.
In the full signature, there is a name conflict between B the type argument of map, and B the type parameter of Map.

The understandable confusion stems from the fact that map is not implemented in Map, but in TraversableLike. However, the function documentation in inherited by sub classes.
TraversableLike takes two type parameters TraversableLike[+A, +Repr] and the map function has the signature TraversableLike.map[B](f: (A) ⇒ B): Traversable[B]. In the api docs of Map the documentation for that is inherited and partly adjusted Traversable[B] by Map[B], but B from TraversableLike is not resolved to (A, B) form Map.
(Might be a bug in Scaladoc, that probably won't be fixed, as there could be an erasure problem. But that's just guessing on my side.)
You can check what is actually implemented in Map if you configure visibility of members right above the members documentation.
EDIT:
Now to your core question:
If the documentation would give us, what we could read intuitively
and if we simplify it a bit for readability
and if we than further use a bit more of natural language instead of acronyms, the signature of map for a Map[A, B] could look like:
map[ResultItem, ResultCollection](f: (A,B) => ResultItem)(implicit bf: CanBuildFrom[Map[A, B], ResultItem, ResultCollection]): ResultCollection
So, basically you apply a function to each key-value-pair of the Map that transforms a key-value-pair of type (A,B) into a value of type ResultType.
As you can build almost any kind of collection from a map (or any other collection), this result type does not have to be another tuple. And ResultCollection does not have to be another Map. E.g.:
Map("1" -> 1, "2" -> 2).map((keyValuePair: (String, Int)) => keyValuePair._2)
or short
Map("1" -> 1, "2" -> 2).map(_._2)
has List(1, 2) as result, List[Int] as ResultCollection, Int as ResultItem.
This is possible because of the implicit CanBuildFrom parameter that adds an implicit builder that takes the result of the map function and appends it to its builder-result. In most cases CanBuildFrom is inferred by the compiler. However, there are cases when it will not be able to infer the proper result collection.
In such cases you have to give the compiler more information:
val test2: Vector[Int] = Map("1" -> 1, "2" -> 2).map(_._2)(collection.breakOut)
val test3: Set[Int] = Map("1" -> 1, "2" -> 2).map(_._2)(collection.breakOut)
For more information on breakOut and CanBuildFrom I'd recommend this answer

Related

Scala map explicit type

I'm new in Scala and programming in general.. I have troubles with the Scala map function..
The simple signature of the map function is: def map[B](f: (A) ⇒ B): List[B]
So i guess the B of map[B] is generic and i can explicit set the type of the result?
If i try to run the code:
val donuts1: Seq[Int] = Seq(1,2,3)
val donuts2: List[Int] = {
donuts1.map[Int](_ => 1)
}
i got the error message "expression of type int doesn't conform to expexted type B"
I don't understand the problem here.. Could someone explain the problem?
Thank you!
The map() signature quoted in your question is a simplified/abbreviated version of the full signature.
final def map[B, That](f: (A) ⇒ B)(implicit bf: CanBuildFrom[List[A], B, That]): That
So if you want to specify the type parameters (which is almost never needed) then you have to specify both.
val donuts1: List[Int] = List(1,2,3)
val donuts2: List[Int] = donuts1.map[Int,List[Int]](_ => 1)
//donuts2: List[Int] = List(1, 1, 1)
and i can explicit set the type of the result?
Not really. The type parameter has to agree with what the f function/lambda returns. If you specify the type parameter then you're (usually) just asking the compiler to confirm that the result type is actually what you think it should be.

How does flatmap really work in Scala

I took the scala odersky course and thought that the function that Flatmap takes as arguments , takes an element of Monad and returns a monad of different type.
trait M[T] {
def flatMap[U](f: T => M[U]): M[U]
}
On Monad M[T] , the return type of function is also the same Monad , the type parameter U might be different.
However I have seen examples on internet , where the function returns a completely different Monad. I was under impression that return type of function should the same Monad. Can someone simplify the below to explain how flapmap results in the actual value instead of Option in the list.
Is the List not a Monad in Scala.
val l= List(1,2,3,4,5)
def f(x:int) = if (x>2) Some(x) else None
l.map(x=>f(x))
//Result List[Option[Int]] = List( None , None , Some(3) , Some(4) , Some(5))
l.flatMap(x=>f(x))
//Result: List(3,4,5)
Let's start from the fact that M[T]is not a monad by itself. It's a type constructor. It becomes a monad when it's associated with two operators: bind and return (or unit). There are also monad laws these operators must satisfy, but let's omit them for brevity. In Haskell the type of bind is:
class Monad m where
...
(>>=) :: m a -> (a -> m b) -> m b
where m is a type constructor. Since Scala is OO language bind will look like (first argument is self):
trait M[T] {
def bind[U](f: T => M[U]): M[U]
}
Here M === m, T === a, U === b. bind is often called flatMap. In a pure spherical world in a vacuum that would be a signature of flatMap in OO language. Scala is a very practical language, so the real signature of flatMap for List is:
final def flatMap[B, That](f: (A) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
It's not bind, but will work as a monadic bind if you provide f in the form of (A) => List[B] and also make sure that That is List[B]. On the other hand Scala is not going to watch your back if you provide something different, but will try to find some meaningful conversion (e.g. CanBuildFrom or something else) if it exists.
UPDATE
You can play with scalac flags (-Xlog-implicits, -Xlog-implicit-conversions) to see what's happening:
scala> List(1).flatMap { x => Some(x) }
<console>:1: inferred view from Some[Int] to scala.collection.GenTraversableOnce[?] via scala.this.Option.option2Iterable[Int]: (xo: Option[Int])Iterable[Int]
List(1).flatMap { x => Some(x) }
^
res1: List[Int] = List(1)
Hmm, perhaps confusingly, the signature you gave is not actually correct, since it's really (in simplified form):
def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): Traversable[B]
Since the compiler is open-source, you can actually see what it's doing (with its full signature):
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = bf(repr) // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = builder
for (x <- this) b ++= f(x).seq
b.result
}
So you can see that there is actually no requirement that the return type of f be the same as the return type of flatMap.
The flatmap found in the standard library is a much more general and flexible method than the monadic bind method like the flatMap from Odersky's example.
For example, the full signature of flatmap on List is
def flatMap[B, That](f: (A) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
Instead of requiring the function passed into flatmap to return a List, it is able to return any GenTraversableOnce object, a very generic type.
flatmap then uses the implicit CanBuildFrom mechanism to determine the appropriate type to return.
So when you use flatmap with a function that returns a List, it is a monadic bind operation, but it lets you use other types as well.

Avoiding downcasts in generic classes

Noticing that my code was essentially iterating over a list and updating a value in a Map, I first created a trivial helper method which took a function for the transformation of the map value and return an updated map. As the program evolved, it gained a few other Map-transformation functions, so it was natural to turn it into an implicit value class that adds methods to scala.collection.immutable.Map[A, B]. That version works fine.
However, there's nothing about the methods that require a specific map implementation and they would seem to apply to a scala.collection.Map[A, B] or even a MapLike. So I would like it to be generic in the map type as well as the key and value types. This is where it all goes pear-shaped.
My current iteration looks like this:
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B): MapType[A, B] =
self updated (k, f(self(k)))
}
This code does not compile because self updated (k, f(self(k))) isa scala.collection.Map[A, B], which is not a MapType[A, B]. In other words, the return type of self.updated is as if self's type was the upper type bound rather than the actual declared type.
I can "fix" the code with a downcast:
def updatedWith(k: A, f: B => B): MapType[A, B] =
self.updated(k, f(self(k))).asInstanceOf[MapType[A, B]]
This does not feel satisfactory because downcasting is a code smell and indicates misuse of the type system. In this particular case it would seem that the value will always be of the cast-to type, and that the whole program compiles and runs correctly with this downcast supports this view, but it still smells.
So, is there a better way to write this code to have scalac correctly infer types without using a downcast, or is this a compiler limitation and a downcast is necessary?
[Edited to add the following.]
My code which uses this method is somewhat more complex and messy as I'm still exploring a few ideas, but an example minimum case is the computation of a frequency distribution as a side-effect with code roughly like this:
var counts = Map.empty[Int, Int] withDefaultValue 0
for (item <- items) {
// loads of other gnarly item-processing code
counts = counts updatedWith (count, 1 + _)
}
There are three answers to my question at the time of writing.
One boils down to just letting updatedWith return a scala.collection.Map[A, B] anyway. Essentially, it takes my original version that accepted and returned an immutable.Map[A, B], and makes the type less specific. In other words, it's still insufficiently generic and sets policy on which types the caller uses. I can certainly change the type on the counts declaration, but that is also a code smell to work around a library returning the wrong type, and all it really does is move the downcast into the caller's code. So I don't really like this answer at all.
The other two are variations on CanBuildFrom and builders in that they essentially iterate over the map to produce a modified copy. One inlines a modified updated method, whereas the other calls the original updated and appends it to the builder and thus appears to make an extra temporary copy. Both are good answers which solve the type correctness problem, although the one that avoids an extra copy is the better of the two from a performance standpoint and I prefer it for that reason. The other is however shorter and arguably more clearly shows intent.
In the case of a hypothetical immutable Map that shares large trees in a similar vein to List, this copying would break the sharing and reduce performance and so it would be preferable to use the existing modified without performing copies. However, Scala's immutable maps don't appear to do this and so copying (once) seems to be the pragmatic solution that is unlikely to make any difference in practice.
Yes! Use CanBuildFrom. This is how the Scala collections library infers the closest collection type to the one you want, using CanBuildFrom evidence. So long as you have implicit evidence of CanBuildFrom[From, Elem, To], where From is the type of collection you're starting with, Elem is the type contained within the collection, and To is the end result you want. The CanBuildFrom will supply a Builder to which you can add elements to, and when you're done, you can call Builder#result() to get the completed collection of the appropriate type.
In this case:
From = MapType[A, B]
Elem = (A, B) // The type actually contained in maps
To = MapType[A, B]
Implementation:
import scala.collection.generic.CanBuildFrom
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B)(implicit cbf: CanBuildFrom[MapType[A, B], (A, B), MapType[A, B]]): MapType[A, B] = {
val builder = cbf()
builder ++= self.updated(k, f(self(k)))
builder.result()
}
}
scala> val m = collection.concurrent.TrieMap(1 -> 2, 5 -> 3)
m: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 2, 5 -> 3)
scala> m.updatedWith(1, _ + 10)
res1: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 12, 5 -> 3)
Please note that updated method returns Map class, rather than generic, so I would say you should be fine returning Map as well. But if you really want to return a proper type, you could have a look at implementation of updated in List.updated
I've wrote a small example. I'm not sure it covers all the cases, but it works on my tests. I also used mutable Map, because it was harder for me to test immutable, but I guess it can be easily converted.
implicit class RichMap[A, B, MapType[x, y] <: Map[x, y]](val self: MapType[A, B]) extends AnyVal {
import scala.collection.generic.CanBuildFrom
def updatedWith[R >: B](k: A, f: B => R)(implicit bf: CanBuildFrom[MapType[A, B], (A, R), MapType[A, R]]): MapType[A, R] = {
val b = bf(self)
for ((key, value) <- self) {
if (key != k) {
b += (key -> value)
} else {
b += (key -> f(value))
}
}
b.result()
}
}
import scala.collection.immutable.{TreeMap, HashMap}
val map1 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map1 type is HashMap[Int, String]
val map2 = TreeMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map2 type is TreeMap[Int, String]
val map3 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.asInstanceOf[Any]) // map3 type is HashMap[Int, Any]
Please also note that CanBuildFrom pattern is much more powerfull and this example doesn't use all of it's power. Thanks to CanBuildFrom some operations can change the type of collection completely like BitSet(1, 3, 5, 7) map {_.toString } type is actually SortedSet[String].

Why does the scaladoc say HashMap.toArray returns Array[A] instead of Array[(A,B)]?

I was looking at the definition of toArray for hashmaps :
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.HashMap
It has
toArray: Array[A]
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
I don't quite understand this - the first bit says you get an Array[A], but the second part says you get Array[B]? Neither of these are what I expect - Array[(A,B)]
When I check it myself :
scala> val x = scala.collection.mutable.HashMap[String, Int]()
x: scala.collection.mutable.HashMap[String,Int] = Map()
scala> x.put("8", 7)
res0: Option[Int] = None
scala> x foreach println
(8,7)
scala> x.toArray
res2: Array[(String, Int)] = Array((8,7))
why isn't it like toList?
toList: scala.List[(A, B)]
The scaladoc has all kinds of subtle bugs. The problem here is that you are seeing the "simplified" version of the method signature (meant as a way to convey the essential part of the signature and hide things such as CanBuildFrom in map/flatMap methods, which are really an implementation detail).
The simplification got a little awry here, and does not seem to make much sense.
If you click on the "full signature" link, you'll see that the real signature looks like:
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
In fact this is still wrong, as we certainly cannot have a type B where B >: (A, B). It should be more like :
def toArray[C >: (A, B)](implicit arg0: ClassTag[C]): Array[C]
The problem is that there are actually two Bs: the first one comes from the HashMap class declaration itself (HashMap[A, +B]) while the other one comes from the methods toArray defined in its base class
TraversableOnce (def toArray[B >: A](implicit arg0: ClassTag[B]): Array[B]). It just happens that the scaladoc generator failed to dedup the two instances of B
The API you see in the the Scaladoc of toArray:
def toArray[B >: (A, B)](implicit arg0: ClassTag[B]): Array[B]
Is equivalent to:
def toArray[C >: (A, B)](implicit arg0: ClassTag[C]): Array[C]
The choice of the type variable B is indeed unfortunate (and maybe even a Scaladoc bug, I'm not sure if you are allowed to write that).
It basically means you'll get an array of the most specific supertype of (A,B) for which a ClassTag is available. The ClassTag is required in order to create the Array.
This basically means that if at compile-time, the-run time type of the Map you are converting is fully known, you will get a Array[(A,B)]. However, if you have up-casted your Map somewhere, the run-time type of the resulting Array will depend on the up-casted type, and not on the runtime type. This is different behavior than toList and due to the JVMs restrictions on how native arrays can be created.
The Scaladoc is just wrong because it inherits toArray from TraversableOnce, where the type of the collection is A and the return value is B. The Array[A] thing is left over from TraversableOnce where A is whatever TraversableOnce is traversing (in this case, actually (A,B) for a different definition of A and B); and although it fills the (A,B) in properly in the long form, it still uses B as the new return variable instead of a different letter like C.
Kind of confusing! It actually should read
def toArray[C >: (A,B)](...[C]): Array[C]
and the short form should be
toArray: Array[(A,B)]
just like you expect.

Function syntax puzzler in scalaz

Following watching Nick Partidge's presentation on deriving scalaz, I got to looking at this example, which is just awesome:
import scalaz._
import Scalaz._
def even(x: Int) : Validation[NonEmptyList[String], Int]
= if (x % 2 ==0) x.success else "not even: %d".format(x).wrapNel.fail
println( even(3) <|*|> even(5) ) //prints: Failure(NonEmptyList(not even: 3, not even: 5))
I was trying to understand what the <|*|> method was doing, here is the source code:
def <|*|>[B](b: M[B])(implicit t: Functor[M], a: Apply[M]): M[(A, B)]
= <**>(b, (_: A, _: B))
OK, that is fairly confusing (!) - but it references the <**> method, which is declared thus:
def <**>[B, C](b: M[B], z: (A, B) => C)(implicit t: Functor[M], a: Apply[M]): M[C]
= a(t.fmap(value, z.curried), b)
So I have a few questions:
How come the method appears to take a higher-kinded type of one type parameter (M[B]) but can get passed a Validation (which has two type paremeters)?
The syntax (_: A, _: B) defines the function (A, B) => Pair[A,B] which the 2nd method expects: what is happening to the Tuple2/Pair in the failure case? There's no tuple in sight!
Type Constructors as Type Parameters
M is a type parameter to one of Scalaz's main pimps, MA, that represents the Type Constructor (aka Higher Kinded Type) of the pimped value. This type constructor is used to look up the appropriate instances of Functor and Apply, which are implicit requirements to the method <**>.
trait MA[M[_], A] {
val value: M[A]
def <**>[B, C](b: M[B], z: (A, B) => C)(implicit t: Functor[M], a: Apply[M]): M[C] = ...
}
What is a Type Constructor?
From the Scala Language Reference:
We distinguish between first-order
types and type constructors, which
take type parameters and yield types.
A subset of first-order types called
value types represents sets of
(first-class) values. Value types are
either concrete or abstract. Every
concrete value type can be represented
as a class type, i.e. a type
designator (§3.2.3) that refers to a
class1 (§5.3), or as a compound type
(§3.2.7) representing an intersection
of types, possibly with a refinement
(§3.2.7) that further constrains the
types of itsmembers. Abstract value
types are introduced by type
parameters (§4.4) and abstract type
bindings (§4.3). Parentheses in types
are used for grouping. We assume that
objects and packages also implicitly
define a class (of the same name as
the object or package, but
inaccessible to user programs).
Non-value types capture properties of
identifiers that are not values
(§3.3). For example, a type
constructor (§3.3.3) does not directly
specify the type of values. However,
when a type constructor is applied to
the correct type arguments, it yields
a first-order type, which may be a
value type. Non-value types are
expressed indirectly in Scala. E.g., a
method type is described by writing
down a method signature, which in
itself is not a real type, although it
gives rise to a corresponding function
type (§3.3.1). Type constructors are
another example, as one can write type
Swap[m[_, _], a,b] = m[b, a], but
there is no syntax to write the
corresponding anonymous type function
directly.
List is a type constructor. You can apply the type Int to get a Value Type, List[Int], which can classify a value. Other type constructors take more than one parameter.
The trait scalaz.MA requires that it's first type parameter must be a type constructor that takes a single type to return a value type, with the syntax trait MA[M[_], A] {}. The type parameter definition describes the shape of the type constructor, which is referred to as its Kind. List is said to have the kind '* -> *.
Partial Application of Types
But how can MA wrap a values of type Validation[X, Y]? The type Validation has a kind (* *) -> *, and could only be passed as a type argument to a type parameter declared like M[_, _].
This implicit conversion in object Scalaz converts a value of type Validation[X, Y] to a MA:
object Scalaz {
implicit def ValidationMA[A, E](v: Validation[E, A]): MA[PartialApply1Of2[Validation, E]#Apply, A] = ma[PartialApply1Of2[Validation, E]#Apply, A](v)
}
Which in turn uses a trick with a type alias in PartialApply1Of2 to partially apply the type constructor Validation, fixing the type of the errors, but leaving the type of the success unapplied.
PartialApply1Of2[Validation, E]#Apply would be better written as [X] => Validation[E, X]. I recently proposed to add such a syntax to Scala, it might happen in 2.9.
Think of this as a type level equivalent of this:
def validation[A, B](a: A, b: B) = ...
def partialApply1Of2[A, B C](f: (A, B) => C, a: A): (B => C) = (b: B) => f(a, b)
This lets you combine Validation[String, Int] with a Validation[String, Boolean], because the both share the type constructor [A] Validation[String, A].
Applicative Functors
<**> demands the the type constructor M must have associated instances of Apply and Functor. This constitutes an Applicative Functor, which, like a Monad, is a way to structure a computation through some effect. In this case the effect is that the sub-computations can fail (and when they do, we accumulate the failures).
The container Validation[NonEmptyList[String], A] can wrap a pure value of type A in this 'effect'. The <**> operator takes two effectful values, and a pure function, and combines them with the Applicative Functor instance for that container.
Here's how it works for the Option applicative functor. The 'effect' here is the possibility of failure.
val os: Option[String] = Some("a")
val oi: Option[Int] = Some(2)
val result1 = (os <**> oi) { (s: String, i: Int) => s * i }
assert(result1 == Some("aa"))
val result2 = (os <**> (None: Option[Int])) { (s: String, i: Int) => s * i }
assert(result2 == None)
In both cases, there is a pure function of type (String, Int) => String, being applied to effectful arguments. Notice that the result is wrapped in the same effect (or container, if you like), as the arguments.
You can use the same pattern across a multitude of containers that have an associated Applicative Functor. All Monads are automatically Applicative Functors, but there are even more, like ZipStream.
Option and [A]Validation[X, A] are both Monads, so you could also used Bind (aka flatMap):
val result3 = oi flatMap { i => os map { s => s * i } }
val result4 = for {i <- oi; s <- os} yield s * i
Tupling with `<|**|>`
<|**|> is really similar to <**>, but it provides the pure function for you to simply build a Tuple2 from the results. (_: A, _ B) is a shorthand for (a: A, b: B) => Tuple2(a, b)
And beyond
Here's our bundled examples for Applicative and Validation. I used a slightly different syntax to use the Applicative Functor, (fa ⊛ fb ⊛ fc ⊛ fd) {(a, b, c, d) => .... }
UPDATE: But what happens in the Failure Case?
what is happening to the Tuple2/Pair in the failure case?
If any of the sub-computations fails, the provided function is never run. It only is run if all sub-computations (in this case, the two arguments passed to <**>) are successful. If so, it combines these into a Success. Where is this logic? This defines the Apply instance for [A] Validation[X, A]. We require that the type X must have a Semigroup avaiable, which is the strategy for combining the individual errors, each of type X, into an aggregated error of the same type. If you choose String as your error type, the Semigroup[String] concatenates the strings; if you choose NonEmptyList[String], the error(s) from each step are concatenated into a longer NonEmptyList of errors. This concatenation happens below when two Failures are combined, using the ⊹ operator (which expands with implicits to, for example, Scalaz.IdentityTo(e1).⊹(e2)(Semigroup.NonEmptyListSemigroup(Semigroup.StringSemigroup)).
implicit def ValidationApply[X: Semigroup]: Apply[PartialApply1Of2[Validation, X]#Apply] = new Apply[PartialApply1Of2[Validation, X]#Apply] {
def apply[A, B](f: Validation[X, A => B], a: Validation[X, A]) = (f, a) match {
case (Success(f), Success(a)) => success(f(a))
case (Success(_), Failure(e)) => failure(e)
case (Failure(e), Success(_)) => failure(e)
case (Failure(e1), Failure(e2)) => failure(e1 ⊹ e2)
}
}
Monad or Applicative, how shall I choose?
Still reading? (Yes. Ed)
I've shown that sub-computations based on Option or [A] Validation[E, A] can be combined with either Apply or with Bind. When would you choose one over the other?
When you use Apply, the structure of the computation is fixed. All sub-computations will be executed; the results of one can't influence the the others. Only the 'pure' function has an overview of what happened. Monadic computations, on the other hand, allow the first sub-computation to influence the later ones.
If we used a Monadic validation structure, the first failure would short-circuit the entire validation, as there would be no Success value to feed into the subsequent validation. However, we are happy for the sub-validations to be independent, so we can combine them through the Applicative, and collect all the failures we encounter. The weakness of Applicative Functors has become a strength!