Scala: Merge maps

How can I merge maps like the ones below?
Map1 = Map(1 -> Class1(1), 2 -> Class1(2))
Map2 = Map(2 -> Class2(1), 3 -> Class2(2))
After merging:
Merged = Map(1 -> List(Class1(1)), 2 -> List(Class1(2), Class2(1)), 3 -> List(Class2(2)))
The values can be a List, Set or any other collection that has a size.

Using the standard lib, you can do it as follows:
// convert maps to seq, to keep duplicate keys and concat
val merged = Map(1 -> 2).toSeq ++ Map(1 -> 4).toSeq
// merged: Seq[(Int, Int)] = ArrayBuffer((1,2), (1,4))
// group by key
val grouped = merged.groupBy(_._1)
// grouped: scala.collection.immutable.Map[Int,Seq[(Int, Int)]] = Map(1 -> ArrayBuffer((1,2), (1,4)))
// remove key from value set and convert to list
val cleaned = grouped.mapValues(_.map(_._2).toList)
// cleaned: scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(2, 4))
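Note that from Scala 2.13 onward, mapValues on a Map is deprecated in favour of going through a view; a minimal sketch of the strict equivalent, reusing the grouped value from above:
// Scala 2.13+: mapValues returns a lazy MapView, so force it back into a strict Map
val cleaned213 = grouped.view.mapValues(_.map(_._2).toList).toMap
// cleaned213: Map[Int, List[Int]] = Map(1 -> List(2, 4))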

This is the simplest implementation I could come up with:
val m1 = Map(1 -> "1", 2 -> "2")
val m2 = Map(2 -> "21", 3 -> "3")
def merge[K, V](m1: Map[K, V], m2: Map[K, V]): Map[K, List[V]] =
  (m1.keySet ++ m2.keySet).map { i => i -> (m1.get(i).toList ::: m2.get(i).toList) }.toMap
merge(m1, m2) // Map(1 -> List(1), 2 -> List(2, 21), 3 -> List(3))

You could use scalaz:
import scalaz._, Scalaz._
val m1 = Map('a -> 1, 'b -> 2)
val m2 = Map('b -> 3, 'c -> 4)
m1.mapValues{List(_)} |+| m2.mapValues{List(_)}
// Map('b -> List(2, 3), 'c -> List(4), 'a -> List(1))
You could use Set(_) instead of List(_) to get Sets as values in Map.
See Semigroup in scalaz cheat sheet (or in learning scalaz) for details about |+| operator.
For Int, |+| works as +; for List, as ++; for Map, it applies |+| to the values of matching keys.
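For illustration, a small sketch of |+| on these three types (with the same scalaz import in scope):
1 |+| 2                                    // 3
List(1, 2) |+| List(3)                     // List(1, 2, 3)
Map('a -> List(1)) |+| Map('a -> List(2))  // Map('a -> List(1, 2))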

One clean way to do it, with cats:
import cats.implicits._
Map(1 -> "Hello").combine(Map(2 -> "Goodbye"))
//Map(2 -> Goodbye, 1 -> Hello)
It's important to note that both maps have to be of the same type (in this case, Map[Int, String]).
Long explanation:
combine isn't really a member of Map. By importing cats.implicits you bring into scope cats' built-in Monoid instances for Map, along with some implicit classes that enable the terse syntax.
The above is equivalent to this:
Monoid[Map[Int, String]].combine(Map(1 -> "Hello"), Map(2 -> "Goodbye"))
Where we're using the Monoid "summoner" function to get the Monoid[Map[Int, String]] instance in scope and using its combine function.
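When the two maps share a key, combine merges the colliding values with the value type's own Monoid; for Map[Int, String] that means string concatenation, e.g.:
Map(1 -> "Hello").combine(Map(1 -> " world", 2 -> "Goodbye"))
// Map(1 -> "Hello world", 2 -> "Goodbye")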

Starting Scala 2.13, another solution only based on the standard library consists in using groupMap which (as its name suggests) is an equivalent of a groupBy followed by mapValues:
// val m1 = Map(1 -> "a", 2 -> "b")
// val m2 = Map(2 -> "c", 3 -> "d")
(m1.toSeq ++ m2).groupMap(_._1)(_._2)
// Map[Int,Seq[String]] = Map(2 -> List("b", "c"), 1 -> List("a"), 3 -> List("d"))
This:
Concatenates the two maps as a sequence of tuples (List((1,"a"), (2,"b"), (2,"c"), (3,"d"))). For conciseness, m2 is implicitly converted to Seq to adapt to the type of m1.toSeq - but you could choose to make it explicit by using m2.toSeq.
groups elements based on their first tuple part (_._1) (group part of groupMap)
maps grouped values to their second tuple part (_._2) (map part of groupMap)

I wrote a blog post about this, check it out:
http://www.nimrodstech.com/scala-map-merge/
Basically, using a scalaz Semigroup you can achieve this pretty easily.
It would look something like:
import scalaz.Scalaz._
Map1 |+| Map2

You can use foldLeft to merge two Maps of the same type:
def merge[A, B](a: Map[A, B], b: Map[A, B])(mergef: (B, Option[B]) => B): Map[A, B] = {
  val (big, small) = if (a.size > b.size) (a, b) else (b, a)
  small.foldLeft(big) { case (z, (k, v)) => z + (k -> mergef(v, z.get(k))) }
}

def mergeIntSum[A](a: Map[A, Int], b: Map[A, Int]): Map[A, Int] =
  merge(a, b)((v1, v2) => v2.map(_ + v1).getOrElse(v1))
Example:
val a = Map("a" -> 1, "b" -> 5, "c" -> 6)
val b = Map("a" -> 4, "z" -> 8)
mergeIntSum(a, b)
res0: Map[String,Int] = Map(a -> 5, b -> 5, c -> 6, z -> 8)
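To get the List-valued result the original question asks for, the same generic merge can be reused after wrapping each value in a single-element List; a sketch (the order of elements inside each list depends on which map ends up being folded into the other):
def mergeToLists[A, B](a: Map[A, B], b: Map[A, B]): Map[A, List[B]] =
  merge(a.map { case (k, v) => k -> List(v) }, b.map { case (k, v) => k -> List(v) }) {
    (v1, v2) => v2.map(_ ::: v1).getOrElse(v1)
  }

mergeToLists(Map("a" -> 1, "b" -> 5), Map("a" -> 4, "z" -> 8))
// contains a -> List(4, 1), b -> List(5), z -> List(8)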

There is a Scala module called scala-collection-contrib, which offers very useful methods like mergeByKey.
First, we need to add an additional dependency to build.sbt:
libraryDependencies += "org.scala-lang.modules" %% "scala-collection-contrib" % "0.1.0"
and then it's possible to do merge like this:
import scala.collection.decorators._
val map1 = Map(1 -> Class1(1), 2 -> Class1(2))
val map2 = Map(2 -> Class2(1), 3 -> Class2(2))
map1.mergeByKeyWith(map2) {
  case (a, b) => a.toList ++ b.toList
}
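Assuming mergeByKeyWith performs a full outer join (handing the supplied function a pair of Options), the result for map1 and map2 above would be:
// Map(1 -> List(Class1(1)), 2 -> List(Class1(2), Class2(1)), 3 -> List(Class2(2)))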

A solution to combine two maps of type Map[A,B] into a result of type Map[A,List[B]] via Cats (a slightly improved version of the one offered by @David Castillo):
//convert each original map to Map[A,List[B]].
//Add an instance of Monoid[List] into the scope to combine lists:
import cats.instances.map._ // for Monoid
import cats.syntax.semigroup._ // for |+|
import cats.instances.list._
val map1 = Map("a" -> 1, "b" -> 2)
.mapValues(List(_))
val map2 = Map("b" -> 3, "d" -> 4)
.mapValues(List(_))
map1 |+| map2
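The combined result, for reference:
// map1 |+| map2: Map(a -> List(1), b -> List(2, 3), d -> List(4))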

If you don't want to mess around with the original maps, you could do something like the following:
val target = map1.clone()
val source = map2.clone()
source.foreach(e => target += e._1 -> e._2)
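Assuming map1 and map2 are mutable maps (clone is only available there), the same effect can be written with ++=; for duplicate keys the later values simply overwrite the earlier ones:
val target2 = map1.clone()
target2 ++= map2  // duplicate keys keep the value from map2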

left.keys map { k => k -> List(left(k),right(k)) } toMap
Is concise and will work, assuming your two maps are left and right. Not sure about efficiency.
But your question is a bit ambiguous, for two reasons. You don't specify:
the subtyping relationship between the values (i.e. Class1, Class2),
what happens if the maps have different keys.
For the first case, consider the following example:
val left = Map("foo" ->1, "bar" ->2)
val right = Map("bar" -> 'a', "foo" -> 'b')
Which results in
res0: Map[String,List[Int]] = Map(foo -> List(1, 98), bar -> List(2, 97))
Notice how the Chars have been converted to Ints because of the Scala type hierarchy. More generally, if in your example Class1 and Class2 are not related, you would get back a List[Any]; this is probably not what you wanted.
You can work around this by dropping the List constructor from my answer; this will return Tuples which preserve the type:
res0: Map[String,(Int, Char)] = Map(foo -> (1,b), bar -> (2,a))
The second problem is what happens when you have maps that don't have the same keys. This will result in a key-not-found exception. Put another way: are you doing a left, right, or inner join of the two maps? You can disambiguate the type of join by switching to right.keys for a right join, or to left.keySet intersect right.keySet for an inner join. The latter works around the missing-key problem, but maybe that's not what you want, i.e. maybe you want a left or right join instead. In that case you can consider using the withDefault method of Map to ensure every key returns a value, e.g. None, but this needs a bit more work (sketched below).
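As a small sketch of the default-value idea, here are two hypothetical maps (not the left/right above) wrapped so that a missing key yields None; keeping tuples preserves each side's type:
val l = Map("a" -> 1, "b" -> 2).map { case (k, v) => k -> Option(v) }.withDefaultValue(None)
val r = Map("b" -> 3, "c" -> 4).map { case (k, v) => k -> Option(v) }.withDefaultValue(None)
(l.keySet ++ r.keySet).map(k => k -> (l(k), r(k))).toMap
// Map(a -> (Some(1),None), b -> (Some(2),Some(3)), c -> (None,Some(4)))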

m2.foldLeft(m1.mapValues{List[CommonType](_)}) { case (acc, (k, v)) =>
  acc.updated(k, acc.getOrElse(k, List.empty) :+ v)
}
As noted by jwvh, the List type should be specified explicitly if Class1 is not an upper type bound for Class2. CommonType is a type that is an upper bound for both Class1 and Class2.
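A self-contained sketch, defining Class1 and Class2 from the question as hypothetical case classes under a common trait:
sealed trait CommonType
case class Class1(i: Int) extends CommonType
case class Class2(i: Int) extends CommonType

val m1 = Map(1 -> Class1(1), 2 -> Class1(2))
val m2 = Map(2 -> Class2(1), 3 -> Class2(2))

m2.foldLeft(m1.mapValues(List[CommonType](_)).toMap) { case (acc, (k, v)) =>
  acc.updated(k, acc.getOrElse(k, List.empty[CommonType]) :+ v)
}
// Map(1 -> List(Class1(1)), 2 -> List(Class1(2), Class2(1)), 3 -> List(Class2(2)))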

This answer does not solve the initial question directly although it solves a common/related scenario which is merging two maps by the common keys.
Based on @Drexin's answer I wrote a generic method to extend the existing Map functionality by providing a join method for Maps:
import scala.collection.immutable

object implicits {
  type A = Any
  implicit class MapExt[K, B <: A, C <: A](val left: immutable.Map[K, B]) {
    def join(right: immutable.Map[K, C]): immutable.Map[K, Seq[A]] = {
      val inter = left.keySet.intersect(right.keySet)
      val leftFiltered = left.filterKeys{inter.contains}
      val rightFiltered = right.filterKeys{inter.contains}
      (leftFiltered.toSeq ++ rightFiltered.toSeq)
        .groupBy(_._1)
        .mapValues(_.map{_._2}.toList)
    }
  }
}
Notes:
Join is based on intersection of the keys, which resembles an "inner join" from the SQL world.
It works with Scala <= 2.12; for Scala 2.13 consider using groupMap as @Xavier Guihot suggested.
Consider replacing type A with your own base type.
Usage:
import implicits._
val m1 = Map("k11" -> "v11", "k12" -> "v12")
val m2 = Map("k11" -> "v21", "k12" -> "v22", "k13" -> "v23")
println (m1 join m2)
// Map(k11 -> List(v11, v21), k12 -> List(v12, v22))

This merges two maps:
def mergeMap[A, B](map1: Map[A, B], map2: Map[A, B], op: (B, B) => B, default: => B): Map[A, B] =
  (map1.keySet ++ map2.keySet)
    .map(x => (x, op(map1.getOrElse(x, default), map2.getOrElse(x, default))))
    .toMap
This merges multiple maps:
def mergeMaps[A, B](maps: Seq[Map[A, B]], op: (B, B) => B, default: => B): Map[A, B] =
  maps.reduce((a, b) => mergeMap(a, b, op, default))
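Example usage, with addition as op and 0 as the default:
mergeMap(Map("a" -> 1, "b" -> 5), Map("a" -> 4, "z" -> 8), (x: Int, y: Int) => x + y, 0)
// Map(a -> 5, b -> 5, z -> 8)

mergeMaps(Seq(Map("a" -> 1), Map("a" -> 2), Map("b" -> 3)), (x: Int, y: Int) => x + y, 0)
// Map(a -> 3, b -> 3)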

Related

Difference between mapValues and transform in Map

For a Scala Map (see the API), what is the difference in semantics and performance between mapValues and transform?
For any given map, for instance
val m = Map( "a" -> 2, "b" -> 3 )
both
m.mapValues(_ * 5)
m.transform( (k,v) => v * 5 )
deliver the same result.
Let's say we have a Map[A,B]. For clarification: I'm always referring to an immutable Map.
mapValues takes a function B => C, where C is the new type for the values.
transform takes a function (A, B) => C, where this C is also the type for the values.
So both will result in a Map[A,C].
However with the transform function you can influence the result of the new values by the value of their keys.
For example:
val m = Map( "a" -> 2, "b" -> 3 )
m.transform((key, value) => key + value) //Map[String, String](a -> a2, b -> b3)
Doing this with mapValues will be quite hard.
The next difference is that transform is strict, whereas mapValues will give you only a view, which will not store the updated elements. It looks like this:
protected class MappedValues[C](f: B => C) extends AbstractMap[A, C] with DefaultMap[A, C] {
  override def foreach[D](g: ((A, C)) => D): Unit = for ((k, v) <- self) g((k, f(v)))
  def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
  override def size = self.size
  override def contains(key: A) = self.contains(key)
  def get(key: A) = self.get(key).map(f)
}
(taken from https://github.com/scala/scala/blob/v2.11.2/src/library/scala/collection/MapLike.scala#L244)
So performance-wise it depends on what is more effective. If f is expensive and you only access a few elements of the resulting map, mapValues might be better, since f is only applied on demand. Otherwise I would stick to map or transform.
transform can also be expressed with map. Assume m: Map[A,B] and f: (A,B) => C, then
m.transform(f) is equivalent to m.map{case (a, b) => (a, f(a, b))}
collection.Map doesn't provide transform: it has a different signature for mutable and immutable Maps.
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val im = Map('a -> 1, 'b -> 2, 'c -> 3)
im: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 1, 'b -> 2, 'c -> 3)
scala> im.mapValues(_ * 7) eq im
res0: Boolean = false
scala> im.transform { case (k,v) => v*7 } eq im
res2: Boolean = false
scala> val mm = collection.mutable.Map('a -> 1, 'b -> 2, 'c -> 3)
mm: scala.collection.mutable.Map[Symbol,Int] = Map('b -> 2, 'a -> 1, 'c -> 3)
scala> mm.mapValues(_ * 7) eq mm
res3: Boolean = false
scala> mm.transform { case (k,v) => v*7 } eq mm
res5: Boolean = true
Mutable transform mutates in place:
scala> mm.transform { case (k,v) => v*7 }
res6: mm.type = Map('b -> 98, 'a -> 49, 'c -> 147)
scala> mm.transform { case (k,v) => v*7 }
res7: mm.type = Map('b -> 686, 'a -> 343, 'c -> 1029)
So mutable transform doesn't change the type of the map:
scala> im mapValues (_ => "hi")
res12: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
scala> mm mapValues (_ => "hi")
res13: scala.collection.Map[Symbol,String] = Map('b -> hi, 'a -> hi, 'c -> hi)
scala> mm.transform { case (k,v) => "hi" }
<console>:9: error: type mismatch;
found : String("hi")
required: Int
mm.transform { case (k,v) => "hi" }
^
scala> im.transform { case (k,v) => "hi" }
res15: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
...as can happen when constructing a new map.
Here are a couple of unmentioned differences:
mapValues creates a Map that is NOT serializable, without any indication that it's just a view (the type is Map[_, _], but just try to send one across the wire).
Since mapValues is just a view, every instance contains the real Map - which could be another result of mapValues. Imagine you have an actor with some state, and every mutation of the state sets the new state to be a mapValues on the previous state...in the end you have deeply nested maps with a copy of each previous state of the actor (and, yes, both of these are from experience).
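A common workaround (a sketch, not the only option) is to materialize a strict map up front, e.g. via map or transform, rather than holding on to the mapValues view:
val state  = Map("a" -> 1, "b" -> 2)
val lazily = state.mapValues(_ * 5)                   // just a view: not serializable, recomputed on access
val strict = state.map { case (k, v) => k -> v * 5 }  // a real Map, computed once
// or: state.transform((_, v) => v * 5)               // also strict, as discussed above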

Zip two HashMaps(or dictionaries)

What would be a functional way to zip two dictionaries in Scala?
map1 = new HashMap("A"->1,"B"->2)
map2 = new HashMap("B"->22,"D"->4) // B is the only common key
zipper(map1,map2) should give something similar to
Seq( ("A",1,0), // no A in second map, so third value is zero
("B",2,22),
("D",0,4)) // no D in first map, so second value is zero
If not functional, any other style is also appreciated
def zipper(map1: Map[String, Int], map2: Map[String, Int]) = {
  for (key <- map1.keys ++ map2.keys)
    yield (key, map1.getOrElse(key, 0), map2.getOrElse(key, 0))
}
scala> val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
map1: scala.collection.immutable.HashMap[String,Int] = Map(A -> 1, B -> 2)
scala> val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4)
map2: scala.collection.immutable.HashMap[String,Int] = Map(B -> 22, D -> 4)
scala> :load Zipper.scala
Loading Zipper.scala...
zipper: (map1: Map[String,Int], map2: Map[String,Int])Iterable[(String, Int, Int)]
scala> zipper(map1, map2)
res1: Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))
Note: using get is probably preferable to getOrElse in this case, so that None marks a missing value instead of the sentinel 0.
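A sketch of the get-based variant, where a missing key shows up as None instead of the sentinel 0:
def zipperOpt(map1: Map[String, Int], map2: Map[String, Int]) =
  for (key <- map1.keys ++ map2.keys)
    yield (key, map1.get(key), map2.get(key))

// zipperOpt(map1, map2): Iterable[(String, Option[Int], Option[Int])]
// e.g. Set((A,Some(1),None), (B,Some(2),Some(22)), (D,None,Some(4)))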
As an alternative to Brian's answer, this can be used to enhance the map class by way of implicit methods:
implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
  def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
    for (key <- map.keys ++ that.keys)
      yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
}
The naming and API are similar to the sequence zipAll.
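Usage mirrors the zipper example above (assuming map1 and map2 from that session):
map1.zipAllByKey(map2, 0, 0)
// Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))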

Use map elements as method arguments

I've been stuck on this for a while now: I have this method
Method(a: (A,B)*): Unit
and I have a map of type Map[A,B]. Is there a way to convert this map so that I can use it directly as an argument?
Something like:
Method(map.convert)
Thank you.
You can build a sequence of pairs from a Map by using .toSeq.
You can also pass a sequence of type Seq[T] as varargs by "casting" it with : _*.
Chain the conversions to achieve what you want:
scala> val m = Map('a' -> 1, 'b' -> 2, 'z' -> 26)
m: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, z -> 26)
scala> def foo[A,B](pairs : (A,B)*) = pairs.foreach(println)
foo: [A, B](pairs: (A, B)*)Unit
scala> foo(m.toSeq : _*)
(a,1)
(b,2)
(z,26)
I'm not sure, but you can try converting the map into an array of tuples and then casting it to the vararg type, like so:
Method(map.toArray: _*)
Just convert the map to a sequence of pairs using ".toSeq" and pass it to the var-args method postfixed with ":_*" (which is Scala syntax to allow passing a sequence as arguments to a var-args method).
Example:
def m(a: (String, Int)*) { for ((k, v) <- a) println(k+"->"+v) }
val x= Map("a" -> 1, "b" -> 2)
m(x.toSeq:_*)
In repl:
scala> def m(a: (String, Int)*) { for ((k, v) <- a) println(k+"->"+v) }
m: (a: (String, Int)*)Unit
scala> val x= Map("a" -> 1, "b" -> 2)
x: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)
scala> m(x.toSeq:_*)
a->1
b->2

Cleaner tuple groupBy

I have a sequence of key-value pairs (String, Int), and I want to group them by key into a sequence of values (i.e. Seq[(String, Int)] => Map[String, Iterable[Int]]).
Obviously, toMap isn't useful here, and groupBy maintains the values as tuples. The best I managed to come up with is:
val seq: Seq[( String, Int )]
// ...
seq.groupBy( _._1 ).mapValues( _.map( _._2 ) )
Is there a cleaner way of doing this?
Here's a pimp that adds a toMultiMap method to traversables. Would it solve your problem?
import collection._
import mutable.Builder
import generic.CanBuildFrom

class TraversableOnceExt[CC, A](coll: CC, asTraversable: CC => TraversableOnce[A]) {

  def toMultiMap[T, U, That](implicit ev: A <:< (T, U), cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] =
    toMultiMapBy(ev)

  def toMultiMapBy[T, U, That](f: A => (T, U))(implicit cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] = {
    val mutMap = mutable.Map.empty[T, mutable.Builder[U, That]]
    for (x <- asTraversable(coll)) {
      val (key, value) = f(x)
      val builder = mutMap.getOrElseUpdate(key, cbf(coll))
      builder += value
    }
    val mapBuilder = immutable.Map.newBuilder[T, That]
    for ((k, v) <- mutMap)
      mapBuilder += ((k, v.result))
    mapBuilder.result
  }
}

implicit def commomExtendTraversable[A, C[A] <: TraversableOnce[A]](coll: C[A]): TraversableOnceExt[C[A], A] =
  new TraversableOnceExt[C[A], A](coll, identity)
Which can be used like this:
val map = List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(map) // Map(1 -> List(a, à), 2 -> List(b))
val byFirstLetter = Set("abc", "aeiou", "cdef").toMultiMapBy(elem => (elem.head, elem))
println(byFirstLetter) // Map(c -> Set(cdef), a -> Set(abc, aeiou))
If you add the following implicit defs, it will also work with collection-like objects such as Strings and Arrays:
implicit def commomExtendStringTraversable(string: String): TraversableOnceExt[String, Char] =
  new TraversableOnceExt[String, Char](string, implicitly)

implicit def commomExtendArrayTraversable[A](array: Array[A]): TraversableOnceExt[Array[A], A] =
  new TraversableOnceExt[Array[A], A](array, implicitly)
Then:
val withArrays = Array(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(withArrays) // Map(1 -> [C@377653ae, 2 -> [C@396fe0f4)
val byLowercaseCode = "Mama".toMultiMapBy(c => (c.toLower.toInt, c))
println(byLowercaseCode) // Map(97 -> aa, 109 -> Mm)
There's no method or data structure in the standard library to do this, and your solution looks about as concise as you'll get. If you use this in more than one place, you might like to factor it out into a utility method
def groupTuples[A, B](seq: Seq[(A, B)]) =
  seq groupBy (_._1) mapValues (_ map (_._2))
which you then obviously just call with groupTuples(seq). This might not be the most efficient possible in terms of CPU clock cycles, but I don't think it's particularly inefficient either.
I did a rough benchmark against Jean-Philippe's solution on a list of 9 tuples and this is marginally faster. Both were about twice as fast as folding the sequence into a map (effectively re-implementing groupBy to give the output you want).
I don't know if you consider it cleaner:
seq.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
Starting Scala 2.13, most collections are provided with the groupMap method which is (as its name suggests) an equivalent (more efficient) of a groupBy followed by mapValues:
List(1 -> 'a', 1 -> 'b', 2 -> 'c').groupMap(_._1)(_._2)
// Map[Int,List[Char]] = Map(2 -> List(c), 1 -> List(a, b))
This:
groups elements based on the first part of tuples (Map(2 -> List((2,c)), 1 -> List((1,a), (1,b))))
maps grouped values (List((1,a), (1,b))) by taking their second tuple part (List(a, b)).

Convert List of tuple to map (and deal with duplicate key ?)

I was thinking about a nice way to convert a List of tuples with duplicate keys [("a","b"),("c","d"),("a","f")] into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map, for-loop over the list and check for duplicate keys. But I am looking for something more Scala-ish and clever here.
By the way, the actual key-value type I use here is (Int, Node), and I want to turn it into a map of (Int -> NodeSeq).
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, b -> 2)
As of 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
A more Scala-ish way is to use a fold, as in the foldLeft variations further below (skipping the separate map step).
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
  def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Starting Scala 2.13, most collections are provided with the groupMap method which is (as its name suggests) an equivalent (more efficient) of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
groups elements based on the first part of tuples (group part of groupMap)
maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
Below you can find a few solutions (groupBy, foldLeft, aggregate, Spark).
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
Fold Left variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
  acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  }
})
Aggregate Variation - Similar to fold Left
list.aggregate[Map[String, List[String]]](Map())(
  (acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))) { v =>
    acc ++ Map(value._1 -> (value._2 :: v))
  },
  (l, r) => l ++ r
)
Spark Variation - For big data sets (conversion to an RDD and back to a plain Map)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext(conf)
// This gives you a rdd of the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
  (value: String) => List(value),
  (acc: List[String], value) => value :: acc,
  (accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)
// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
Here is a more Scala idiomatic way to convert a list of tuples to a map handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
  acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
You can try this:
scala> val b = Array(1, 2, 3)
// b: Array[Int] = Array(1, 2, 3)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)