Use monoids to compute a frequency map of words - scala

I am learning the book functional programming in scala. chapter 10, exercise 20, to implement the following method:
def frequencyMap(strings: IndexedSeq[String]): Map[String, Int]
honestly, I don't have a solution, so, I checked the answer from GIT:
def mapMergeMonoid[K, V](V: Monoid[V]): Monoid[Map[K, V]] =
new Monoid[Map[K, V]] {
def zero = Map()
def op(a: Map[K, V], b: Map[K, V]) =
a.map {
case (k, v) => (k, V.op(v, b.get(k) getOrElse V.zero))
}
}
def bag[A](as: IndexedSeq[A]): Map[A, Int] =
foldMapV(as, mapMergeMonoid[A, Int](intAddition))((a: A) => Map(a -> 1))
def frequencyMap(strings: IndexedSeq[String]): Map[String, Int] = bag(strings)
But, when I tried to run a test, I got wrong answer:
frequencyMap(Vector("a rose", "is a", "rose is", "a rose"))
the answer printed:
Map(a rose -> 1)
the expected result:
Map(a -> 3, rose -> 3, is -> 2)
I can't figure out where is wrong with the implementation. could someone explain it for me? thanks.
Edit after the correct answer, and update with a correct implementation
def mapMergeMonoid[K, V](V: Monoid[V]): Monoid[Map[K, V]] =
new Monoid[Map[K, V]] {
def zero = Map()
def op(a: Map[K, V], b: Map[K, V]) = {
// (a.map {
// case (k, v) => (k, V.op(v, b.get(k) getOrElse V.zero))
//
val c = for {
(ka, va) <- a
(kb, vb) <- b
if (ka == kb)
} yield (ka -> V.op(va, vb))
a ++ b ++ c
}
}
def bag[A](as: IndexedSeq[A]): Map[A, Int] =
foldMapV(as, mapMergeMonoid[A, Int](intAddition))((a: A) => Map(a -> 1))
def frequencyMap(strings: IndexedSeq[String]): Map[String, Int] = bag(strings.map(_.split("\\s+")).flatten)

I don't have the book, but following the code from here, I can point out a few things for you to think about( based on some println()s in a REPL session ):
As S.R.I pointed out, you have a list of phrases, not words. The functions on github don't do anything to separate the word groups for you, so at best, your current input Vector could get:
Map(a rose -> 2, is a -> 1, rose is -> 1)
You could create a list of words by doing the following to your Vector:
Vector("a rose", "is a", "rose is", "a rose") map( _.split( " " ).toSeq ) flatten
The mapMergeMonoid function only appears to add the value of k and v together when there keys(k) are the same. Meaning an unsorted list is going to result in a lot of Map(string -> 1).
You could sort the Vector of words by making the following changes to Vector:
(Vector("a rose", "is a", "rose is", "a rose") map( _.split( " " ).toSeq ) flatten) sortWith(_.compareTo(_) < 0)
While foldMapV does split out all the phrases or words from the Vector in a Map[String, Int], it only appears to return the leftmost desired merge for a sorted IndexedSeq. With a sorted Vector of words as I've indicated, I got the result Map(a -> 3) from your implementation of frequencyMap. I would be inclined to use something other than foldMapV though or modify it to make frequencyMap work. The bit that appears to be missing to me is an accumulation of the results of the function it's applying into one large Map. I would also try a more "standard" head/tail recursion than the splitAt() calls(as I'm not convinced the foldMapV was handling is and rose very well).

Related

updated method on ListMap

I'm using ListMap because I need to keep the insertion order in place. After initializing it seems it works. but when I call updated on it the order gets messed up. 1- Why is that? 2- Is there any other MapLike that doesn't have this problem, if not how should I update the map without problem?
scala> import scala.collection.immutable.ListMap
import scala.collection.immutable.ListMap
scala> val a = ListMap(0 -> "A", 1 -> "B", 2 ->"C")
a: scala.collection.immutable.ListMap[Int,String] = Map(0 -> A, 1 -> B, 2 -> C)
scala> a.foreach(println)
(0,A)
(1,B)
(2,C)
scala> val b = a.updated(1, "D")
b: scala.collection.immutable.ListMap[Int,String] = Map(0 -> A, 2 -> C, 1 -> D)
scala> b.foreach(println)
(0,A)
(2,C)
(1,D)
I could not find any existent immutable collection with desired property. But it could be crafted manually.
import scala.collection.immutable.{IntMap, Map, MapLike}
class OrderedMap[K, +V] private[OrderedMap](backing: Map[K, V], val order: IntMap[K], coorder: Map[K, Int], extSize: Int)
extends Map[K, V] with MapLike[K, V, OrderedMap[K, V]] {
def +[B1 >: V](kv: (K, B1)): OrderedMap[K, B1] = {
val (k, v) = kv
if (backing contains k)
new OrderedMap(backing + kv, order, coorder, extSize)
else new OrderedMap(backing + kv, order + (extSize -> k), coorder + (k -> extSize), extSize + 1)
}
def get(key: K): Option[V] = backing.get(key)
def iterator: Iterator[(K, V)] = for (key <- order.valuesIterator) yield (key, backing(key))
def -(key: K): OrderedMap[K, V] = if (backing contains key) {
val index = coorder(key)
new OrderedMap(backing - key, order - index, coorder - key, extSize)
} else this
override def empty: OrderedMap[K, V] = OrderedMap.empty[K, V]
}
object OrderedMap {
def empty[K, V] = new OrderedMap[K, V](Map.empty, IntMap.empty, Map.empty, 0)
def apply[K, V](assocs: (K, V)*): OrderedMap[K, V] = assocs.foldLeft(empty[K, V])(_ + _)
}
Here order is preserved insertion order map (probably with "holes"). coorder special field needed for efficient handling element removal. extSize is basically order.lastkey + 1 but more straightforward
Now you can verify that
val a = OrderedMap(0 -> "A", 1 -> "B", 2 -> "C")
a.foreach(println)
val b = a.updated(1, "D")
b.foreach(println)
prints
(0,A)
(1,B)
(2,C)
and
(0,A)
(1,D)
(2,C)
From the scala doc for updated
"This method allows one to create a new map with an additional mapping
from key to value."
Note it does not say "with a different value of an existing key". So when you updated with 1->D, that's a new/additional mapping. So it appears at the end of the list, preserving insertion order. The old mapping 1->C is no longer present in the map.
So it's not "messed up" and it's not a problem. It's doing what it's documented to do, the mappings are in insertion order.

How to convert Map[A,Future[B]] to Future[Map[A,B]]?

I've been working with the Scala Akka library and have come across a bit of a problem. As the title says, I need to convert Map[A, Future[B]] to Future[Map[A,B]]. I know that one can use Future.sequence for Iterables like Lists, but that doesn't work in this case.
I was wondering: is there a clean way in Scala to make this conversion?
See if this works for you:
val map = Map("a" -> future{1}, "b" -> future{2}, "c" -> future{3})
val fut = Future.sequence(map.map(entry => entry._2.map(i => (entry._1, i)))).map(_.toMap)
The idea is to map the map to an Iterable for a Tuple of the key of the map and the result of the future tied to that key. From there you can sequence that Iterable and then once you have the aggregate Future, map it and convert that Iterable of Tuples to a map via toMap.
Now, an alternative to this approach is to try and do something similar to what the sequence function is doing, with a couple of tweaks. You could write a sequenceMap function like so:
def sequenceMap[A, B](in: Map[B, Future[A]])(implicit executor: ExecutionContext): Future[Map[B, A]] = {
val mb = new MapBuilder[B,A, Map[B,A]](Map())
in.foldLeft(Promise.successful(mb).future) {
(fr, fa) => for (r <- fr; a <- fa._2.asInstanceOf[Future[A]]) yield (r += ((fa._1, a)))
} map (_.result)
}
And then use it in an example like this:
val map = Map("a" -> future{1}, "b" -> future{2}, "c" -> future{3})
val fut = sequenceMap(map)
fut onComplete{
case Success(m) => println(m)
case Failure(ex) => ex.printStackTrace()
}
This might be slightly more efficient than the first example as it creates less intermediate collections and has less hits to the ExecutionContext.
I think the most succinct we can be with core Scala 2.12.x is
val futureMap = Map("a" -> future{1}, "b" -> future{2}, "c" -> future{3})
Future.traverse(futureMap.toList) { case (k, fv) => fv.map(k -> _) } map(_.toMap)
Update: You can actually get the nice .sequence syntax in Scalaz 7 without too much fuss:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{ Future, future }
import scalaz._, Scalaz.{ ToTraverseOps => _, _ }
import scalaz.contrib.std._
val m = Map("a" -> future(1), "b" -> future(2), "c" -> future(3))
And then:
scala> m.sequence.onSuccess { case result => println(result) }
Map(a -> 1, b -> 2, c -> 3)
In principle it shouldn't be necessary to hide ToTraverseOps like this, but for now it does the trick. See the rest of my answer below for more details about the Traverse type class, dependencies, etc.
As copumpkin notes in a comment above, Scalaz contains a Traverse type class with an instance for Map[A, _] that is one of the puzzle pieces here. The other piece is the Applicative instance for Future, which isn't in Scalaz 7 (which is still cross-built against pre-Future 2.9), but is in scalaz-contrib.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scalaz._, Scalaz._
import scalaz.contrib.std._
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] = {
type M[X] = Map[A, X]
(m: M[Future[B]]).sequence
}
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
Traverse[({ type L[X] = Map[A, X] })#L] sequence m
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
TraverseOpsUnapply(m).sequence
In a perfect world you'd be able to write m.sequence, but the TraverseOps machinery that should make this syntax possible isn't currently able to tell how to go from a particular Map instance to the appropriate Traverse instance.
This also works, where the idea is to use the sequence result (of the map's values) to fire a promise that says you can start retrieving values from your map. mapValues gives you a non-strict view of your map, so the value.get.get is only applied when you retrieve the value. That's right, you get to keep your map! Free ad for the puzzlers in that link.
import concurrent._
import concurrent.duration._
import scala.util._
import ExecutionContext.Implicits.global
object Test extends App {
def calc(i: Int) = { Thread sleep i * 1000L ; i }
val m = Map("a" -> future{calc(1)}, "b" -> future{calc(2)}, "c" -> future{calc(3)})
val m2 = m mapValues (_.value.get.get)
val k = Future sequence m.values
val p = Promise[Map[String,Int]]
k onFailure { case t: Throwable => p failure t }
k onSuccess { case _ => p success m2 }
val res = Await.result(p.future, Duration.Inf)
Console println res
}
Here's the REPL where you see it force the m2 map by printing all its values:
scala> val m2 = m mapValues (_.value.get.get)
m2: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
This shows the same thing with futures that are still in the future:
scala> val m2 = m mapValues (_.value.get.get)
java.util.NoSuchElementException: None.get
Just make a new future which waits for all futures in the map values , then builds a map to return.
I would try to avoid using overengineered Scalaz based super-functional solutions (unless your project is already heavily Scalaz based and has tons of "computationally sophisticated" code; no offense on the "overengineered" remark):
// the map you have
val foo: Map[A, Future[B]] = ???
// get a Seq[Future[...]] so that we can run Future.sequence on it
val bar: Seq[Future[(A, B)]] = foo.map { case (k, v) => v.map(k -> _) }
// here you go; convert back `toMap` once it completes
Future.sequence(bar).onComplete { data =>
// do something with data.toMap
}
However, it should be safe to assume that your map values are somehow generated from the map keys, which initially reside in a Seq such as List, and that the part of code that builds the initial Map is under your control as opposed to being sent from elsewhere. So I would personally take an even simpler/cleaner approach instead by not starting out with Map[A, Future[B]] in the first place.
def fetchAgeFromDb(name: String): Future[Int] = ???
// no foo needed anymore
// no Map at all before the future completes
val bar = personNames.map { name => fetchAgeFromDb(name).map(name -> _) }
// just as above
Future.sequence(bar).onComplete { data =>
// do something with data.toMap
}
Is this solution acceptable :
without an execution context this should works ...
def removeMapFuture[A, B](in: Future[Map[A, Future[B]]]) = {
in.flatMap { k =>
Future.sequence(k.map(l =>
l._2.map(l._1 -> _)
)).map {
p => p.toMap
}
}
}

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" scala is without using mutable collections. So, is it possible to do this only with immutable collections? (for example, with using map, foldLeft, etc.)
I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.
Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
#annotation.tailrec def mkMapInner(
data: Iterator[(K,V)],
map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
): Map[K,Vector[V]] = {
if (data.hasNext) {
val (k,v) = data.next
mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
}
else map
}
mkMapInner(data.iterator)
}
Here I've chosen to implement the loop-replacement by declaring a recursive inner method (with #annotation.tailrec to check that the recursion is optimized to a while loop so it won't break the stack)
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.
For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
I perform this function so often that I have an implicit written called groupByKey that does precisely this:
class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.

Is the ordering of members of a map, seeming to be by addition, reliable?

When I create an immutable map with a standard call to Map() or by concatenating the existing maps created that way, in all my tests I get that traversing its members provides them in the order of addition. That's exactly the way I need them to be sorted, but there's not a word in the documentation about the reliability of the ordering of the members of the map.
So I was wondering whether it is safe to expect the standard Map to return its items in the order of addition or I should look for some other implementations and which ones in that case.
I don't think it's safe, the order is not preserved starting from 5 elements (Scala 2.9.1):
scala> Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4, 5 -> 5)
res9: scala.collection.immutable.Map[Int,Int] =
Map(5 -> 5, 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4)
With bigger maps the order is completely "random", try Map((1 to 100) zip (1 to 100): _*).
Try LinkedHashMap for ordered entries and TreeMap to achieve sorted entries.
There is no promise about the order of Map. There is an OrderedMap in scalas collection package. The values in that package are ordered by an implicit Ordering. As quickfix I recommend you to use a list of keys for the ordering of your Map.
var keyOrdering = List[Int]()
var unorderedMap = Map[Int, String]()
unorderedMap += (1 -> "one")
keyOrdering :+= 1
Edit
You could implement your own Ordering and pass it to a SortedMap as well.
Edit #2
A simple example would be the following:
scala> import scala.collection.SortedMap
import scala.collection.SortedMap
scala> implicit object IntOrdering extends Ordering[Int]
| def compare(a: Int, b: Int) = b - a
| }
defined module IntOrdering
scala> var sm = SortedMap[Int, String]()
sm: scala.collection.SortedMap[Int,String] = Map()
scala> sm += (1 -> "one")
scala> sm += (2 -> "two")
scala> println(sm)
Map(2 -> two, 1 -> one)
The implicit Ordering is applied to the keys, so IntOrdering might be applied to a SortedMap[Int, Any].
Edit #3
A self ordering DataType like in my comment might look this way:
case class DataType[T](t: T, index: Int)
object DataType{
private var index = -1
def apply[T](t: T) = { index += 1 ; new DataType[T](t, index)
}
Now we need to change the Ordering:
implicit object DataTypeOrdering extends Ordering[DataType[_]] {
def compare(a: DataType[_], b: DataType[_]) = a.index - b.index
}
I hope this is the way you expected my answer.
After digging I've found out that there exists an immutable ListMap that behaves exactly as I want it, but according to this table its performance is just awfull. So I wrote a custom immutable implementation that should perform effectively on all operations except removal, where it performs linearly. It does require a bit more memory as it's backed by a standard Map and a Queue, which itself utilizes a List twice, but in the current age it's not an issue, right.
import collection.immutable.Queue
object OrderedMap {
def apply[A, B](elems: (A, B)*) =
new OrderedMap(Map(elems: _*), Queue(elems: _*))
}
class OrderedMap[A, B](
map: Map[A, B] = Map[A, B](),
protected val queue: Queue[(A, B)] = Queue()
) extends Map[A, B] {
def get(key: A) =
map.get(key)
def iterator =
queue.iterator
def +[B1 >: B](kv: (A, B1)) =
new OrderedMap(
map + kv,
queue enqueue kv
)
def -(key: A) =
new OrderedMap(
map - key,
queue filter (_._1 != key)
)
override def hashCode() =
queue.hashCode
override def equals(that: Any) =
that match {
case that: OrderedMap[A, B] =>
queue.equals(that.queue)
case _ =>
super.equals(that)
}
}

Scala: how to merge a collection of Maps

I have a List of Map[String, Double], and I'd like to merge their contents into a single Map[String, Double]. How should I do this in an idiomatic way? I imagine that I should be able to do this with a fold. Something like:
val newMap = Map[String, Double]() /: listOfMaps { (accumulator, m) => ... }
Furthermore, I'd like to handle key collisions in a generic way. That is, if I add a key to the map that already exists, I should be able to specify a function that returns a Double (in this case) and takes the existing value for that key, plus the value I'm trying to add. If the key does not yet exist in the map, then just add it and its value unaltered.
In my specific case I'd like to build a single Map[String, Double] such that if the map already contains a key, then the Double will be added to the existing map value.
I'm working with mutable maps in my specific code, but I'm interested in more generic solutions, if possible.
Well, you could do:
mapList reduce (_ ++ _)
except for the special requirement for collision.
Since you do have that special requirement, perhaps the best would be doing something like this (2.8):
def combine(m1: Map, m2: Map): Map = {
val k1 = Set(m1.keysIterator.toList: _*)
val k2 = Set(m2.keysIterator.toList: _*)
val intersection = k1 & k2
val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
r2 ++ r1
}
You can then add this method to the map class through the Pimp My Library pattern, and use it in the original example instead of "++":
class CombiningMap(m1: Map[Symbol, Double]) {
def combine(m2: Map[Symbol, Double]) = {
val k1 = Set(m1.keysIterator.toList: _*)
val k2 = Set(m2.keysIterator.toList: _*)
val intersection = k1 & k2
val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
r2 ++ r1
}
}
// Then use this:
implicit def toCombining(m: Map[Symbol, Double]) = new CombiningMap(m)
// And finish with:
mapList reduce (_ combine _)
While this was written in 2.8, so keysIterator becomes keys for 2.7, filterKeys might need to be written in terms of filter and map, & becomes **, and so on, it shouldn't be too different.
How about this one:
def mergeMap[A, B](ms: List[Map[A, B]])(f: (B, B) => B): Map[A, B] =
(Map[A, B]() /: (for (m <- ms; kv <- m) yield kv)) { (a, kv) =>
a + (if (a.contains(kv._1)) kv._1 -> f(a(kv._1), kv._2) else kv)
}
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
val mm = mergeMap(ms)((v1, v2) => v1 + v2)
println(mm) // prints Map(hello -> 5.5, world -> 2.2, goodbye -> 3.3)
And it works in both 2.7.5 and 2.8.0.
I'm surprised no one's come up with this solution yet:
myListOfMaps.flatten.toMap
Does exactly what you need:
Merges the list to a single Map
Weeds out any duplicate keys
Example:
scala> List(Map('a -> 1), Map('b -> 2), Map('c -> 3), Map('a -> 4, 'b -> 5)).flatten.toMap
res7: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 4, 'b -> 5, 'c -> 3)
flatten turns the list of maps into a flat list of tuples, toMap turns the list of tuples into a map with all the duplicate keys removed
Starting Scala 2.13, another solution which handles duplicate keys and is only based on the standard library consists in merging the Maps as sequences (flatten) before applying the new groupMapReduce operator which (as its name suggests) is an equivalent of a groupBy followed by a mapping and a reduce step of grouped values:
List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
.flatten
.groupMapReduce(_._1)(_._2)(_ + _)
// Map("world" -> 2.2, "goodbye" -> 3.3, "hello" -> 5.5)
This:
flattens (concatenates) the maps as a sequence of tuples (List(("hello", 1.1), ("world", 2.2), ("goodbye", 3.3), ("hello", 4.4))), which keeps all key/values (even duplicate keys)
groups elements based on their first tuple part (_._1) (group part of groupMapReduce)
maps grouped values to their second tuple part (_._2) (map part of groupMapReduce)
reduces mapped grouped values (_+_) by taking their sum (but it can be any reduce: (T, T) => T function) (reduce part of groupMapReduce)
The groupMapReduce step can be seen as a one-pass version equivalent of:
list.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _))
Interesting, noodling around with this a bit, I got the following (on 2.7.5):
General Maps:
def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: Seq[scala.collection.Map[A,B]]): Map[A, B] = {
listOfMaps.foldLeft(Map[A, B]()) { (m, s) =>
Map(
s.projection.map { pair =>
if (m contains pair._1)
(pair._1, collisionFunc(m(pair._1), pair._2))
else
pair
}.force.toList:_*)
}
}
But man, that is hideous with the projection and forcing and toList and whatnot. Separate question: what's a better way to deal with that within the fold?
For mutable Maps, which is what I was dealing with in my code, and with a less general solution, I got this:
def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A, B] = {
listOfMaps.foldLeft(mutable.Map[A,B]()) {
(m, s) =>
for (k <- s.keys) {
if (m contains k)
m(k) = collisionFunc(m(k), s(k))
else
m(k) = s(k)
}
m
}
}
That seems a little bit cleaner, but will only work with mutable Maps as it's written. Interestingly, I first tried the above (before I asked the question) using /: instead of foldLeft, but I was getting type errors. I thought /: and foldLeft were basically equivalent, but the compiler kept complaining that I needed explicit types for (m, s). What's up with that?
I reading this question quickly so I'm not sure if I'm missing something (like it has to work for 2.7.x or no scalaz):
import scalaz._
import Scalaz._
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
You can change the monoid definition for Double and get another way to accumulate the values, here getting the max:
implicit val dbsg: Semigroup[Double] = semigroup((a,b) => math.max(a,b))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 4.4, world -> 2.2)
I wrote a blog post about this , check it out :
http://www.nimrodstech.com/scala-map-merge/
basically using scalaz semi group you can achieve this pretty easily
would look something like :
import scalaz.Scalaz._
listOfMaps reduce(_ |+| _)
a oneliner helper-func, whose usage reads almost as clean as using scalaz:
def mergeMaps[K,V](m1: Map[K,V], m2: Map[K,V])(f: (V,V) => V): Map[K,V] =
(m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(mergeMaps(_,_)(_ + _))
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
for ultimate readability wrap it in an implicit custom type:
class MyMap[K,V](m1: Map[K,V]) {
def merge(m2: Map[K,V])(f: (V,V) => V) =
(m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
}
implicit def toMyMap[K,V](m: Map[K,V]) = new MyMap(m)
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms reduceLeft { _.merge(_)(_ + _) }