Folding on Type without Monoid Instance - scala

I'm working on this Functional Programming in Scala exercise:
// But what if our list has an element type that doesn't have a Monoid instance?
// Well, we can always map over the list to turn it into a type that does.
As I understand this exercise, it means that, if we have a Monoid of type B, but our input List is of type A, then we need to convert the List[A] to List[B], and then call foldLeft.
def foldMap[A, B](as: List[A], m: Monoid[B])(f: A => B): B = {
val bs = as.map(f)
bs.foldLeft(m.zero)((s, i) => m.op(s, i))
}
Does this understanding and code look right?

First I'd simplify the syntax of the body a bit:
def foldMap[A, B](as: List[A], m: Monoid[B])(f: A => B): B =
as.map(f).foldLeft(m.zero)(m.ops)
Then I'd move the monoid instance into its own implicit parameter list:
def foldMap[A, B](as: List[A])(f: A => B)(implicit m: Monoid[B]): B =
as.map(f).foldLeft(m.zero)(m.ops)
See the original "Type Classes as Objects and Implicits" paper for more detail about how Scala implements type classes using implicit parameter resolution, or this answer by Rex Kerr that I've also linked above.
Next I'd switch the order of the other two parameter lists:
def foldMap[A, B](f: A => B)(as: List[A])(implicit m: Monoid[B]): B =
as.map(f).foldLeft(m.zero)(m.ops)
In general you want to place the parameter lists containing parameters that change the least often first, in order to make partial application more useful. In this case there may only be one possible useful value of A => B for any A and B, but there are lots of values of List[A].
For example, switching the order allows us to write the following (which assumes a monoid instance for Bar):
val fooSum: List[Foo] => Bar = foldMap(fooToBar)
Finally, as a performance optimization (mentioned by stew above), you could avoid creating an intermediate list by moving the application of f into the fold:
def foldMap[A, B](f: A => B)(as: List[A])(implicit m: Monoid[B]): B =
as.foldLeft(m.zero) {
case (acc, a) => m.op(acc, f(a))
}
This is equivalent and more efficient, but to my eye much less clear, so I'd suggest treating it like any optimization—if you need it, use it, but think twice about whether the gains are really worth the loss of clarity.

Related

Variance of Scala List map function

I have a question that's been bugging me.
Lists in Scala are covariant (List[+A])
Let's say we have these classes:
class A
class B extends A
The map function of List[B] takes a function f: B => C
But I can also use a f: A => C
which is a subclass of f: B => C
and it totally makes sense.
What I am currently confused by is that
the map function should accept only functions that are superclasses of the original function (since functions are contravariant on their arguments), which does not apply in the example I've given.
I know there's something wrong with my logic and I would like to enlightened.
Your error lies in the assumption that map(f: A => C) should only accept functions that are superclasses of A => C.
While in reality, map will accept any function that is a subclass of A => C.
In Scala, a function parameter can always be a subclass of the required type.
The covariance of A in List[A] only tells you that, wherever a List[A] is required, you can provide a List[B], as long as B <: A.
Or, in simpler words: List[B] can be treated as if it was a subclass of List[A].
I have compiled a small example to explain these two behaviours:
class A
class B extends A
// this means: B <: A
val listA: List[A] = List()
val listB: List[B] = List()
// Example 1: List[B] <: List[A]
// Note: Here the List[+T] is our parameter! (Covariance!)
def printListA(list: List[A]): Unit = println(list)
printListA(listA)
printListA(listB)
// Example 2: Function[A, _] <: Function[B, _]
// Note: Here a Function1[-T, +R] is our parameter (Contravariance!)
class C
def fooA(a: A): C = ???
def fooB(b: B): C = ???
listB.map(fooB)
listB.map(fooA)
Try it out!
I hope this helps.
As you already suspected, you are mixing up things here.
On the one hand, you have a List[+A], which tells us something about the relationships between List[A] and List[B], given a relationship between A and B. The fact that List is covariant in A just means that List[B] <: List[A] when B <: A, as you know already know.
On the other hand, List exposes a method map to change its "contents". This method does not really care about List[A], but only about As, so the variance of List is irrelevant here. The bit that is confusing you here is that there is indeed some sub-typing to take into consideration: map accepts an argument (a A => C in this case, but it's not really relevant) and, as usual with methods and functions, you can always substitute its argument with anything that is a subtype of it. In your specific case, any AcceptedType will be ok, as long as AcceptedType <: Function1[A,C]. The variance that matters here is Function's, not List's.

How does flatmap really work in Scala

I took the scala odersky course and thought that the function that Flatmap takes as arguments , takes an element of Monad and returns a monad of different type.
trait M[T] {
def flatMap[U](f: T => M[U]): M[U]
}
On Monad M[T] , the return type of function is also the same Monad , the type parameter U might be different.
However I have seen examples on internet , where the function returns a completely different Monad. I was under impression that return type of function should the same Monad. Can someone simplify the below to explain how flapmap results in the actual value instead of Option in the list.
Is the List not a Monad in Scala.
val l= List(1,2,3,4,5)
def f(x:int) = if (x>2) Some(x) else None
l.map(x=>f(x))
//Result List[Option[Int]] = List( None , None , Some(3) , Some(4) , Some(5))
l.flatMap(x=>f(x))
//Result: List(3,4,5)
Let's start from the fact that M[T]is not a monad by itself. It's a type constructor. It becomes a monad when it's associated with two operators: bind and return (or unit). There are also monad laws these operators must satisfy, but let's omit them for brevity. In Haskell the type of bind is:
class Monad m where
...
(>>=) :: m a -> (a -> m b) -> m b
where m is a type constructor. Since Scala is OO language bind will look like (first argument is self):
trait M[T] {
def bind[U](f: T => M[U]): M[U]
}
Here M === m, T === a, U === b. bind is often called flatMap. In a pure spherical world in a vacuum that would be a signature of flatMap in OO language. Scala is a very practical language, so the real signature of flatMap for List is:
final def flatMap[B, That](f: (A) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
It's not bind, but will work as a monadic bind if you provide f in the form of (A) => List[B] and also make sure that That is List[B]. On the other hand Scala is not going to watch your back if you provide something different, but will try to find some meaningful conversion (e.g. CanBuildFrom or something else) if it exists.
UPDATE
You can play with scalac flags (-Xlog-implicits, -Xlog-implicit-conversions) to see what's happening:
scala> List(1).flatMap { x => Some(x) }
<console>:1: inferred view from Some[Int] to scala.collection.GenTraversableOnce[?] via scala.this.Option.option2Iterable[Int]: (xo: Option[Int])Iterable[Int]
List(1).flatMap { x => Some(x) }
^
res1: List[Int] = List(1)
Hmm, perhaps confusingly, the signature you gave is not actually correct, since it's really (in simplified form):
def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): Traversable[B]
Since the compiler is open-source, you can actually see what it's doing (with its full signature):
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = bf(repr) // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = builder
for (x <- this) b ++= f(x).seq
b.result
}
So you can see that there is actually no requirement that the return type of f be the same as the return type of flatMap.
The flatmap found in the standard library is a much more general and flexible method than the monadic bind method like the flatMap from Odersky's example.
For example, the full signature of flatmap on List is
def flatMap[B, That](f: (A) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
Instead of requiring the function passed into flatmap to return a List, it is able to return any GenTraversableOnce object, a very generic type.
flatmap then uses the implicit CanBuildFrom mechanism to determine the appropriate type to return.
So when you use flatmap with a function that returns a List, it is a monadic bind operation, but it lets you use other types as well.

Parameterised Class vs Function

I had a query about what is the difference between parameterising the class vs parameterising the function.
I have provided implementation of a Functor as follows:
trait Functor[F[_],A,B] {
def map(fa: F[A]) (f: A => B) : F[B]
}
and the other where the function is parameterised as follows:
trait Functor[F[_]] {
def map[A,B](fa: F[A]) (f: A => B) : F[B]
}
Where are the cases where we should use one over another?
Another follow up question:
Why do we pass the argument to functor as F[_] and not as F[A] or F[B]. What cases arise when we use either F[A] or F[B]?
Always prefer the second one. With the first one you can implement instances as nonsensical as:
trait WrongFunctor[F[_],A,B] {
def map(fa: F[A])(f: A => B) : F[B]
}
case class notEvenRemotelyAFunctor[A]() extends WrongFunctor[List,A,Int] {
def map(fa: List[A])(f: A => Int) : List[Int] =
if(f(fa.head) < 4) List(3) else List(4)
}
type Id[X] = X
case object ILikeThree extends WrongFunctor[Id, Int, Int] {
def map(fa: Int)(f: Int => Int): Int = if(fa == 3) 3 else f(fa)
}
Even if you do things right, you'd need for a fixed functor implementation one object per different types at which you want to use fmap. But the important point is that the second one at least makes it harder to write that kind of wrong "functors"; less non-functors will slip by:
trait Functor[F[_]] {
def map[A,B](fa: F[A])(f: A => B) : F[B]
}
case object ILikeThreeAgain extends Functor[Id] {
def map[A,B](fa: A)(f: A => B) : B =
??? // how do I write the above here?
}
The keywords here are parametricity and parametric polymorphism. The intuition is that if something is defined generically, you can derive properties it will satisfy just from the types involved. See for example Bartosz Milewski blog - Parametricity: Money for Nothing and Theorems for Free for a good explanation, or the canonical Theorems for free paper.
Follow-up question
Another follow up question: Why do we pass the argument to functor as F[_] and not as F[A] or F[B]. What cases arise when we use either F[A] or F[B]?
Because that's part of what a functor is; it is a "constructor":
for each input type A it gives you as output another type F[A]
and for each function f: A => B, another function fmap(f): F[A] => F[B] satisfying fmap(id[A]) == id[F[A]] and fmap(f andThen g) == fmap(f) andThen fmap(g)
So for 1. you need a way of representing functions on types; and that's what F[_] is.
Note that having a map method like in your signature is in this case equivalent to fmap:
trait Functor[F[_]] {
def map[A,B](fa: F[A])(f: A => B) : F[B]
def fmap[A,B](f: A => B): F[A] => F[B] =
{ fa => map(fa)(f) }
def mapAgain[A,B](fa: F[A])(f: A => B) : F[B] =
fmap(f)(fa)
}
Now for how this links with real category theory:
Instances of your Functor[F[_]] trait above are meant to represent Scala-enriched functors
F: Scala → Scala
Let's unpack this.
There's a (usually implicitly defined) category Scala with objects types, and morphisms functions f: A ⇒ B. This category is cartesian closed, where the internal hom is the type A ⇒ B, and the product (A,B). We can work then with Scala-enriched categories and functors. What's a Scala-enriched category? basically one that you can define using Scala the language: you have
a set of objects (which you'd need to represent as types)
for each A,B a type C[A,B] with identities id[X]: C[X,Y] and composition andThen[X,Y,Z]: (C[X,Y], C[Y,Z]) => C[X,Z] satisfying the category axioms
Enriched functors F: C → D are then
a function from the objects of C to those of D, A -> F[A]
for each pair of objects A,B: C a morphism in Scala i.e. a function fmap: C[A,B] => C[F[A], F[B]] satisfying the functor laws fmap(id[A]) == id[F[A]] and fmap(f andThen g) == fmap(f) andThen fmap(g)
Scala is naturally enriched in itself, with Scala[X,Y] = X => Y, and enriched functors F: Scala → Scala are what instances of your Functor[F[_]] trait are meant to represent.
Of course this needs all sort of qualifications about how Scala breaks this and that, morphism equality etc. But the moral of the story is: your base language L (like Scala in this case) is likely trying to be a cartesian-closed (or at least symmetric monoidal-closed) category, and functors definable through it correspond to L-enriched functors.

Avoiding downcasts in generic classes

Noticing that my code was essentially iterating over a list and updating a value in a Map, I first created a trivial helper method which took a function for the transformation of the map value and return an updated map. As the program evolved, it gained a few other Map-transformation functions, so it was natural to turn it into an implicit value class that adds methods to scala.collection.immutable.Map[A, B]. That version works fine.
However, there's nothing about the methods that require a specific map implementation and they would seem to apply to a scala.collection.Map[A, B] or even a MapLike. So I would like it to be generic in the map type as well as the key and value types. This is where it all goes pear-shaped.
My current iteration looks like this:
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B): MapType[A, B] =
self updated (k, f(self(k)))
}
This code does not compile because self updated (k, f(self(k))) isa scala.collection.Map[A, B], which is not a MapType[A, B]. In other words, the return type of self.updated is as if self's type was the upper type bound rather than the actual declared type.
I can "fix" the code with a downcast:
def updatedWith(k: A, f: B => B): MapType[A, B] =
self.updated(k, f(self(k))).asInstanceOf[MapType[A, B]]
This does not feel satisfactory because downcasting is a code smell and indicates misuse of the type system. In this particular case it would seem that the value will always be of the cast-to type, and that the whole program compiles and runs correctly with this downcast supports this view, but it still smells.
So, is there a better way to write this code to have scalac correctly infer types without using a downcast, or is this a compiler limitation and a downcast is necessary?
[Edited to add the following.]
My code which uses this method is somewhat more complex and messy as I'm still exploring a few ideas, but an example minimum case is the computation of a frequency distribution as a side-effect with code roughly like this:
var counts = Map.empty[Int, Int] withDefaultValue 0
for (item <- items) {
// loads of other gnarly item-processing code
counts = counts updatedWith (count, 1 + _)
}
There are three answers to my question at the time of writing.
One boils down to just letting updatedWith return a scala.collection.Map[A, B] anyway. Essentially, it takes my original version that accepted and returned an immutable.Map[A, B], and makes the type less specific. In other words, it's still insufficiently generic and sets policy on which types the caller uses. I can certainly change the type on the counts declaration, but that is also a code smell to work around a library returning the wrong type, and all it really does is move the downcast into the caller's code. So I don't really like this answer at all.
The other two are variations on CanBuildFrom and builders in that they essentially iterate over the map to produce a modified copy. One inlines a modified updated method, whereas the other calls the original updated and appends it to the builder and thus appears to make an extra temporary copy. Both are good answers which solve the type correctness problem, although the one that avoids an extra copy is the better of the two from a performance standpoint and I prefer it for that reason. The other is however shorter and arguably more clearly shows intent.
In the case of a hypothetical immutable Map that shares large trees in a similar vein to List, this copying would break the sharing and reduce performance and so it would be preferable to use the existing modified without performing copies. However, Scala's immutable maps don't appear to do this and so copying (once) seems to be the pragmatic solution that is unlikely to make any difference in practice.
Yes! Use CanBuildFrom. This is how the Scala collections library infers the closest collection type to the one you want, using CanBuildFrom evidence. So long as you have implicit evidence of CanBuildFrom[From, Elem, To], where From is the type of collection you're starting with, Elem is the type contained within the collection, and To is the end result you want. The CanBuildFrom will supply a Builder to which you can add elements to, and when you're done, you can call Builder#result() to get the completed collection of the appropriate type.
In this case:
From = MapType[A, B]
Elem = (A, B) // The type actually contained in maps
To = MapType[A, B]
Implementation:
import scala.collection.generic.CanBuildFrom
implicit class RichMap[A, B, MapType[A, B] <: collection.Map[A, B]](
val self: MapType[A, B]
) extends AnyVal {
def updatedWith(k: A, f: B => B)(implicit cbf: CanBuildFrom[MapType[A, B], (A, B), MapType[A, B]]): MapType[A, B] = {
val builder = cbf()
builder ++= self.updated(k, f(self(k)))
builder.result()
}
}
scala> val m = collection.concurrent.TrieMap(1 -> 2, 5 -> 3)
m: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 2, 5 -> 3)
scala> m.updatedWith(1, _ + 10)
res1: scala.collection.concurrent.TrieMap[Int,Int] = TrieMap(1 -> 12, 5 -> 3)
Please note that updated method returns Map class, rather than generic, so I would say you should be fine returning Map as well. But if you really want to return a proper type, you could have a look at implementation of updated in List.updated
I've wrote a small example. I'm not sure it covers all the cases, but it works on my tests. I also used mutable Map, because it was harder for me to test immutable, but I guess it can be easily converted.
implicit class RichMap[A, B, MapType[x, y] <: Map[x, y]](val self: MapType[A, B]) extends AnyVal {
import scala.collection.generic.CanBuildFrom
def updatedWith[R >: B](k: A, f: B => R)(implicit bf: CanBuildFrom[MapType[A, B], (A, R), MapType[A, R]]): MapType[A, R] = {
val b = bf(self)
for ((key, value) <- self) {
if (key != k) {
b += (key -> value)
} else {
b += (key -> f(value))
}
}
b.result()
}
}
import scala.collection.immutable.{TreeMap, HashMap}
val map1 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map1 type is HashMap[Int, String]
val map2 = TreeMap(1 -> "s", 2 -> "d").updatedWith(2, _.toUpperCase()) // map2 type is TreeMap[Int, String]
val map3 = HashMap(1 -> "s", 2 -> "d").updatedWith(2, _.asInstanceOf[Any]) // map3 type is HashMap[Int, Any]
Please also note that CanBuildFrom pattern is much more powerfull and this example doesn't use all of it's power. Thanks to CanBuildFrom some operations can change the type of collection completely like BitSet(1, 3, 5, 7) map {_.toString } type is actually SortedSet[String].

Does flatMap functional signature (input -> output) proposes its doing any flattening?

flatMap signature:
/* applies a transformation of the monad "content" by composing
* this monad with an operation resulting in another monad instance
* of the same type
*/
def flatMap(f: A => M[B]): M[B]
is there anyway to understand by its signature (input to output) (except for the name flat) that its flattening the structure? or must I read its implementation to understand that? isn't good coding practice means I can understand by the signature of the function (input to output even without function name) exactly what it does? if so how does flatMap follows this basic programming practice? or does it violates it?
It helps to understand that the type of the thing that flatMap is defined on is also M and would be something like Traversable[A] or a subclass such as a List[A], or another Monad such as Option or Future. Therefore the left hand side of the function definition that flatMap requires (f: A => M[B]) hints to us that this function processes the individual items contained within the Monad, and produces M[B] for each one. The fact that it then returns M[B] rather than M[M[B]] is a hint that some flattening is taking place.
However I disagree that it should be possible to know exactly what a function does by its signature. For example:
Trait Number {
def plus(that: Number): Number
def minus(that: Number): Number
}
Without the function names and possible also some documentation, I don't think it is reasonable to expect that another person can know what plus and minus do.
Does flatMap functional signature (input -> output) proposes its doing any flattening?
Yes, it does.
In fact, join (or flatten), map and unit form a minimal set of primitive functions needed to implement a monad.
flatMap can be implemented in terms of these other functions.
//minimal set
def join[A](mma: M[M[A]]): M[A] = ???
def map[A](ma: M[A], f: A => B): M[B] = ???
def unit[A](a: A): M[A] = ???
def flatMap[A](ma: M[A], f: A => M[B]): M[B] = join(map(ma))
It should now be clear that flatMap flattens the structure of the mapped monad.
For comparison, here's another mimimal set of primitives, where join/flatten is implemented in terms of flatMap.
// minimal set
def unit[A](a: A): M[A] = ???
def flatMap[A](ma: M[A], f: A => M[B]): M[B] = ???
def map[A](ma: M[A], f: A => B): M[B] = flatMap(ma, a => unit(f(a)))
def join[A](mma: M[M[A]]): M[A] = flatMap(mma, ma => ma)