Scala: how to merge a collection of Maps


I have a List of Map[String, Double], and I'd like to merge their contents into a single Map[String, Double]. How should I do this in an idiomatic way? I imagine that I should be able to do this with a fold. Something like:
val newMap = Map[String, Double]() /: listOfMaps { (accumulator, m) => ... }
Furthermore, I'd like to handle key collisions in a generic way. That is, if I add a key to the map that already exists, I should be able to specify a function that returns a Double (in this case) and takes the existing value for that key, plus the value I'm trying to add. If the key does not yet exist in the map, then just add it and its value unaltered.
In my specific case I'd like to build a single Map[String, Double] such that if the map already contains a key, then the Double will be added to the existing map value.
I'm working with mutable maps in my specific code, but I'm interested in more generic solutions, if possible.

Well, you could do:
mapList reduce (_ ++ _)
except for the special requirement for collision.
Since you do have that special requirement, perhaps the best would be doing something like this (2.8):
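def combine(m1: Map[String, Double], m2: Map[String, Double]): Map[String, Double] = {
  val k1 = Set(m1.keysIterator.toList: _*)
  val k2 = Set(m2.keysIterator.toList: _*)
  val intersection = k1 & k2
  // Keys present in both maps: add the two values together
  val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
  // Keys present in only one map: keep the value unchanged
  val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
  r2 ++ r1
}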
You can then add this method to the map class through the Pimp My Library pattern, and use it in the original example instead of "++":
class CombiningMap(m1: Map[Symbol, Double]) {
  def combine(m2: Map[Symbol, Double]) = {
    val k1 = Set(m1.keysIterator.toList: _*)
    val k2 = Set(m2.keysIterator.toList: _*)
    val intersection = k1 & k2
    val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
    val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
    r2 ++ r1
  }
}
// Then use this:
implicit def toCombining(m: Map[Symbol, Double]) = new CombiningMap(m)
// And finish with:
mapList reduce (_ combine _)
This was written for 2.8. For 2.7, keysIterator becomes keys, filterKeys might need to be written in terms of filter and map, & becomes **, and so on, but it shouldn't be too different.
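On Scala 2.10 and later, the same enrichment can be written as an implicit class, which removes the need for a separate conversion method. The following is only a minimal sketch, not the answer's original code: the names MapSyntax and CombiningOps are made up for illustration, and the fold-based body is a swapped-in simplification of the set-intersection logic above.

```scala
object MapSyntax {
  // Hypothetical wrapper: adds `combine` to any Map with Double values.
  implicit class CombiningOps[K](m1: Map[K, Double]) {
    // Sum the values of keys present in both maps; copy all other entries unchanged.
    def combine(m2: Map[K, Double]): Map[K, Double] =
      m2.foldLeft(m1) { case (acc, (k, v)) =>
        acc.updated(k, acc.get(k).map(_ + v).getOrElse(v))
      }
  }
}
import MapSyntax._

val mapList = List(Map("a" -> 1.0, "b" -> 2.0), Map("a" -> 3.0, "c" -> 0.5))

// foldLeft with an empty seed also copes with an empty list, unlike reduce.
val merged = mapList.foldLeft(Map.empty[String, Double])(_.combine(_))
```

Folding from an empty Map is a design choice made here so that an empty input list yields an empty result instead of throwing.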

How about this one:
def mergeMap[A, B](ms: List[Map[A, B]])(f: (B, B) => B): Map[A, B] =
  (Map[A, B]() /: (for (m <- ms; kv <- m) yield kv)) { (a, kv) =>
    a + (if (a.contains(kv._1)) kv._1 -> f(a(kv._1), kv._2) else kv)
  }
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
val mm = mergeMap(ms)((v1, v2) => v1 + v2)
println(mm) // prints Map(hello -> 5.5, world -> 2.2, goodbye -> 3.3)
And it works in both 2.7.5 and 2.8.0.

I'm surprised no one's come up with this solution yet:
myListOfMaps.flatten.toMap
It does the following:
Merges the list into a single Map
Weeds out any duplicate keys (the value from the last map containing a key wins; colliding values are not combined)
Example:
scala> List(Map('a -> 1), Map('b -> 2), Map('c -> 3), Map('a -> 4, 'b -> 5)).flatten.toMap
res7: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 4, 'b -> 5, 'c -> 3)
flatten turns the list of maps into a flat list of tuples; toMap turns the list of tuples into a map with all the duplicate keys removed.
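Note that toMap resolves a duplicate key by keeping the value it sees last, so this approach does not sum colliding values the way the question asks. A small sketch with made-up data (Int values are used only to keep the comparison exact):

```scala
val maps = List(Map("hello" -> 1, "world" -> 2), Map("goodbye" -> 3, "hello" -> 4))

// flatten yields List(("hello",1), ("world",2), ("goodbye",3), ("hello",4));
// toMap then inserts the pairs in order, so the later "hello" entry
// replaces the earlier one instead of being added to it.
val lastWins = maps.flatten.toMap
// lastWins("hello") is 4, not 1 + 4
```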

Starting with Scala 2.13, another solution that handles duplicate keys using only the standard library is to merge the Maps as sequences (flatten) and then apply the new groupMapReduce operator, which (as its name suggests) is equivalent to a groupBy followed by a mapping step and a reduce step over the grouped values:
List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
.flatten
.groupMapReduce(_._1)(_._2)(_ + _)
// Map("world" -> 2.2, "goodbye" -> 3.3, "hello" -> 5.5)
This:
flattens (concatenates) the maps as a sequence of tuples (List(("hello", 1.1), ("world", 2.2), ("goodbye", 3.3), ("hello", 4.4))), which keeps all key/values (even duplicate keys)
groups elements based on their first tuple part (_._1) (group part of groupMapReduce)
maps grouped values to their second tuple part (_._2) (map part of groupMapReduce)
reduces mapped grouped values (_+_) by taking their sum (but it can be any reduce: (T, T) => T function) (reduce part of groupMapReduce)
The groupMapReduce step can be seen as a one-pass equivalent of:
list.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _))
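To check the equivalence, the two forms can be run side by side. This is a sketch assuming Scala 2.13 or later; since mapValues directly on a Map is deprecated there, the two-step version below goes through .view and .toMap. The name pairs is an illustrative stand-in for the flattened list.

```scala
val pairs = List("hello" -> 1.1, "world" -> 2.2, "goodbye" -> 3.3, "hello" -> 4.4)

// One pass: group by key, keep the value, and reduce while grouping.
val onePass = pairs.groupMapReduce(_._1)(_._2)(_ + _)

// Two steps: build the intermediate groups first, then reduce each group.
val twoStep = pairs.groupBy(_._1).view.mapValues(_.map(_._2).reduce(_ + _)).toMap
```

Both produce the same Map; the one-pass form simply avoids materializing the intermediate Map of lists.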

Interesting, noodling around with this a bit, I got the following (on 2.7.5):
General Maps:
def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: Seq[scala.collection.Map[A,B]]): Map[A, B] = {
  listOfMaps.foldLeft(Map[A, B]()) { (m, s) =>
    Map(
      s.projection.map { pair =>
        if (m contains pair._1)
          (pair._1, collisionFunc(m(pair._1), pair._2))
        else
          pair
      }.force.toList:_*)
  }
}
But man, that is hideous with the projection and forcing and toList and whatnot. Separate question: what's a better way to deal with that within the fold?
For mutable Maps, which is what I was dealing with in my code, and with a less general solution, I got this:
import scala.collection.mutable

def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A, B] = {
  listOfMaps.foldLeft(mutable.Map[A,B]()) {
    (m, s) =>
      for (k <- s.keys) {
        if (m contains k)
          m(k) = collisionFunc(m(k), s(k))  // key collision: combine old and new values
        else
          m(k) = s(k)                       // new key: copy the value unaltered
      }
      m
  }
}
That seems a little bit cleaner, but will only work with mutable Maps as it's written. Interestingly, I first tried the above (before I asked the question) using /: instead of foldLeft, but I was getting type errors. I thought /: and foldLeft were basically equivalent, but the compiler kept complaining that I needed explicit types for (m, s). What's up with that?

I'm reading this question quickly, so I'm not sure if I'm missing something (like it has to work for 2.7.x, or without scalaz):
import scalaz._
import Scalaz._
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
You can change the monoid definition for Double and get another way to accumulate the values, here getting the max:
implicit val dbsg: Semigroup[Double] = semigroup((a,b) => math.max(a,b))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 4.4, world -> 2.2)

I wrote a blog post about this; check it out:
http://www.nimrodstech.com/scala-map-merge/
Basically, using the scalaz Semigroup you can achieve this pretty easily.
It would look something like:
import scalaz.Scalaz._
listOfMaps reduce(_ |+| _)

A one-liner helper function, whose usage reads almost as cleanly as using scalaz:
def mergeMaps[K,V](m1: Map[K,V], m2: Map[K,V])(f: (V,V) => V): Map[K,V] =
  (m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(mergeMaps(_,_)(_ + _))
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
For ultimate readability, wrap it in an implicit custom type:
class MyMap[K,V](m1: Map[K,V]) {
  def merge(m2: Map[K,V])(f: (V,V) => V) =
    (m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
}
implicit def toMyMap[K,V](m: Map[K,V]) = new MyMap(m)
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms reduceLeft { _.merge(_)(_ + _) }

Related

How to reverse Map

I'm trying to reverse a Map, and the output has only 2 elements:
val occurrences: Map[String, Int] = arr.groupMapReduce(identity)(_ => 1)(_ + _)
Output: HashMap(world -> 2, Hello, -> 1, hello, -> 1, hello -> 2, and -> 1, world, -> 1)
val reversed = for ((k,v) <- occurrences) yield (v, k)
Output: HashMap(1 -> world,, 2 -> hello)
How did I lose the other entries?

Similar to @user's proposal, but trying to be a little bit more efficient.
def invertMap[K, V](map: Map[K, V]): Map[V, List[K]] =
  map
    .view
    .groupMap(_._2)(_._1)
    .view
    .mapValues(_.toList)
    .toMap
The performance difference would probably be negligible so go with the one you find more readable.

As @Luis Miguel Mejía Suárez said, you can't have duplicate keys in a Map, so when you try to make the values the keys, some of the entries are lost.
You can instead do this to obtain a Map[Int, List[String]]:
val occurrences = Map("world" -> 2, "Hello," -> 1, "hello," -> 1, "hello" -> 2, "and" -> 1, "world," -> 1)
val x: Map[Int, List[String]] =
  occurrences.toList
    .groupBy { case (k, v) => v }
    .view.mapValues(v => v.map(_._1))
    .toMap
Output:
Map(1 -> List(Hello,, hello,, and, world,), 2 -> List(world, hello))
P.S. The .view and .toMap stuff is because mapValues on MapOps is deprecated for now. There'll be a proper strict version later, though.
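To see the key collision concretely, here is a reduced sketch with made-up data. It assumes Scala 2.13 or later for groupMap; the variable names are illustrative only.

```scala
val occurrences = Map("world" -> 2, "hello" -> 2, "and" -> 1)

// Swapping keys and values: "world" and "hello" both end up under key 2,
// so only one of them can survive in the resulting Map.
val reversed = occurrences.map { case (k, v) => v -> k }

// Grouping by the old value keeps every original key.
val grouped = occurrences.groupMap(_._2)(_._1)
```

Which of the colliding words survives in reversed depends on the iteration order of the source Map, so it should not be relied upon.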

Scala concatenate maps from a list

Having val mapList: List[Map[String, Int]], I want to do something like:
val map = mapList foldLeft (Map[String, Int]()) ( _ ++ _ )
or
val map = mapList foldLeft (Map[String, Int]())
( (m1: Map[String, Int], m2: Map[String, Int]) => m1 ++ m2 )
Neither option compiles (the first says "missing parameter type for expanded function (x, y) => x ++ y" and the second says "type mismatch; found (Map[String, Int], Map[String, Int]) => Map[String, Int]; required: String").
I want to achieve a classical solution for concatenating a list of immutable maps such as List( Map("apple" -> 5, "pear" -> 7), Map("pear" -> 3, "apricot" -> 0) ) would produce a Map("apple" -> 5, "pear" -> 10, "apricot" -> 0).
Using scala 2.10.5.

You need to add a dot before foldLeft. You can only use spaces instead of dots under specialized conditions, such as for methods with exactly 1 parameter (arity-1 methods):
val map = mapList.foldLeft(Map[String, Int]()) ( _ ++ _ )
You can read more about method invocation best practices here.
You might also be interested in the reduce methods, which are specialized versions of the fold methods, where the return type is the same as the type of the elements of the collection. For example reduceLeft uses the first element of the collection as a seed for the foldLeft. Of course, since this relies on the first element's existence, it will throw an exception if the collection is empty. Since reduceLeft takes only 1 parameter, you can more easily use a space to invoke the method:
mapList.reduceLeft( _ ++ _)
mapList reduceLeft(_ ++ _)
Finally, you should note that all you are doing here is merging the maps. When using ++ to merge the maps, you will just override keys that are already present in the map – you won't be adding the values of duplicate keys. If you wanted to do that, you could follow the answers provided here, and apply them to the foldLeft or reduceLeft. For example:
mapList reduceLeft { (acc, next) =>
  (acc.toList ++ next.toList).groupBy(_._1).toMap.mapValues(_.map(_._2).sum)
}
Or slightly differently:
mapList.map(_.toSeq).reduceLeft(_ ++ _).groupBy(_._1).toMap.mapValues(_.map(_._2).sum)
And, if you're using Scalaz, then most concisely:
mapList reduceLeft { _ |+| _ }
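The difference between overriding and summing can be seen with the data from the question. This is a sketch using only foldLeft and updated from the standard library; the names overridden and summed are illustrative.

```scala
val mapList = List(Map("apple" -> 5, "pear" -> 7), Map("pear" -> 3, "apricot" -> 0))

// ++ keeps the value from the right-hand map when a key occurs in both.
val overridden = mapList.foldLeft(Map.empty[String, Int])(_ ++ _)

// Folding over the entries lets us add to the existing value instead.
val summed = mapList.foldLeft(Map.empty[String, Int]) { (acc, m) =>
  m.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0) + v) }
}
// overridden("pear") is 3, summed("pear") is 10
```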

In Scala, is there an equivalent of Haskell's "fromListWith" for Map?

In Haskell, there is a function called fromListWith which can generate a Map from a function (used to merge values with the same key) and a list:
fromListWith :: Ord k => (a -> a -> a) -> [(k, a)] -> Map k a
The following expression will be evaluated to true:
fromListWith (++) [(5,"a"), (5,"b"), (3,"b"), (3,"a"), (5,"a")] == fromList [(3, "ab"), (5, "aba")]
In Scala, there is a similar function called toMap on List objects, which can also convert a list to a Map, but it doesn't accept a function parameter for dealing with duplicate keys.
Does anyone have ideas about this?

Apart from using scalaz you could also define one yourself:
implicit class ListToMapWith[K, V](list: List[(K, V)]) {
  def toMapWith(op: (V, V) => V) =
    list groupBy (_._1) mapValues (_ map (_._2) reduce op)
}
Here is a usage example:
scala> val testList = List((5,"a"), (5,"b"), (3,"b"), (3,"a"), (5,"a"))
scala> testList toMapWith (_ + _)
res1: scala.collection.immutable.Map[Int,String] = Map(5 -> aba, 3 -> ba)

The stdlib doesn't have such a feature, however, there is a port of Data.Map available in scalaz that does have this function available.
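One detail worth noting: in the Haskell example from the question, key 3 maps to "ab", while the toMapWith example above yields "ba", because Haskell's fromListWith applies the function as f(newValue, oldValue). A fold-based sketch that mirrors that argument order could look like the following; the name fromListWith is borrowed from Haskell here and is not a Scala standard library method.

```scala
// Hypothetical helper mirroring Haskell's fromListWith argument order.
def fromListWith[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
    // On a collision, the incoming value is the first argument, the stored one the second.
    acc.updated(k, acc.get(k).map(old => f(v, old)).getOrElse(v))
  }

val result = fromListWith(List((5, "a"), (5, "b"), (3, "b"), (3, "a"), (5, "a")))(_ + _)
// result: Map(5 -> "aba", 3 -> "ab")
```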

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" scala is without using mutable collections. So, is it possible to do this only with immutable collections? (for example, with using map, foldLeft, etc.)

I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.

Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we get:
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
  @annotation.tailrec def mkMapInner(
    data: Iterator[(K,V)],
    map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
  ): Map[K,Vector[V]] = {
    if (data.hasNext) {
      val (k,v) = data.next()
      mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
    }
    else map
  }
  mkMapInner(data.iterator)
}
Here I've chosen to implement the loop-replacement by declaring a recursive inner method (with @annotation.tailrec to check that the recursion is optimized to a while loop so it won't break the stack).
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.

For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
  case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.

val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
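collection.breakOut was removed in Scala 2.13, so the snippets above only compile on earlier versions. A rough 2.13 equivalent, sketched here under the assumption that a Vector per key is the desired result, combines groupMap with an explicit conversion:

```scala
import scala.collection.mutable.ListBuffer

val ps = ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)

// groupMap groups by the first element and keeps the second;
// the groups are ListBuffers, so convert each one to an immutable Vector.
val grouped: Map[Int, Vector[Int]] =
  ps.groupMap(_._1)(_._2).view.mapValues(_.toVector).toMap
```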

I perform this function so often that I have an implicit written called groupByKey that does precisely this:
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom

class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
  def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
    self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.
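The recomputation can be made visible with a counter. The sketch below assumes Scala 2.13 or later, where the lazy behaviour is requested explicitly with .view.mapValues; the counter and variable names are illustrative.

```scala
var calls = 0 // counts how often the mapping function runs
val grouped = Map("a" -> List(1, 4), "b" -> List(2, 3))

// A lazy view: the function is re-applied on every lookup.
val lazySums = grouped.view.mapValues { vs => calls += 1; vs.sum }
val first = lazySums("a")
val second = lazySums("a")
val callsAfterTwoLookups = calls // one call per lookup

// A strict map: the function runs once per entry, when the Map is built.
val strictSums = grouped.map { case (k, vs) => k -> vs.sum }
```

Looking up strictSums("a") any number of times afterwards does no further work, which is the reason given above for preferring .map when values are read repeatedly.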

Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
m = new HashMap[P, T]
c foreach { t => m add (t.getP, t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)

You can use
c map (t => t.getP -> t) toMap
but be aware that this needs 2 traversals.

You can construct a Map with a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples and then use the : _* trick to convert the result into a variable argument.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)

In addition to @James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m.updated(s, s.length) }

This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is the simplicity of the code versus its efficiency. So, for large collections, this approach may be more suitable than using 2 traversal implementations such as applying map and toMap.

Another solution (might not work for all types)
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
This avoids creating the intermediate list. More info here:
Scala 2.8 breakOut

What you're trying to achieve is a bit undefined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
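The choice matters when keys collide: map followed by toMap keeps the last item for a key, while groupBy followed by head keeps the first. A sketch with a made-up Item class standing in for T:

```scala
// Hypothetical element type; `p` plays the role of the key property.
final case class Item(p: Int, name: String)

val c = List(Item(1, "a"), Item(1, "b"), Item(2, "c"))

// toMap inserts pairs in order, so the later Item(1, "b") replaces Item(1, "a").
val lastWins: Map[Int, Item] = c.map(t => t.p -> t).toMap

// groupBy preserves the original order inside each group, so head is the first match.
val firstWins: Map[Int, Item] = c.groupBy(_.p).map { case (p, ts) => p -> ts.head }
```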

Scala 2.13+
Instead of "breakOut" you could use:
c.map(t => (t.getP, t)).to(Map)
Scroll to "View": https://www.scala-lang.org/blog/2017/02/28/collections-rework.html

This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
  new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]){
  def mapBy[K](keyFunc: V => K) = {
    list.map(a => keyFunc(a) -> a).toMap
  }
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
Note that because of the implicit conversion, the caller code needs to import scala's implicitConversions.

c map (_.getP) zip c
Works well and is very intuitive; call toMap on the result to get a Map.

How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap

For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))

This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
  (m, p) => m(p.id) = p; m
}
The Map has to be mutable, and it has to be returned explicitly, since updating a mutable Map does not return the map.

Use map() on the collection, followed by toMap:
val map = list.map(e => (e, e.length)).toMap
