Check if RDD contains same key and merge them if yes - scala

I have an RDD[(String, Map[String, Int])]:
[("A",Map("acs"->2,"sdv"->2,"sfd"->1)),("B",Map("ass"->2,"fvv"->2,"ffd"->1)),("A",Map("acs"->2,"sdv"->2,"sfd"->1))]
I want to merge the elements that share the same key, like this:
[("A",Map("acs"->4,"sdv"->4,"sfd"->2)),("B",Map("ass"->2,"fvv"->2,"ffd"->1))]
How can I do this in Scala?

If you define mapSum (see merge two maps and sum values):
def mapSum[T](map1: Map[T, Int], map2: Map[T, Int]): Map[T, Int] = map1 ++ map2.map{ case (k,v) => k -> (v + map1.getOrElse(k,0)) }
Then you can groupBy and reduce (similar to your other question):
rdd.groupBy(_._1).map(_._2.reduce((a, b) => (a._1, mapSum(a._2, b._2)))).collect
res11: Array[(String, Map[String, Int])] = Array(
("A", Map("acs" -> 4, "sdv" -> 4, "sfd" -> 2)),
("B", Map("ass" -> 2, "fvv" -> 2, "ffd" -> 1))
)

A more efficient approach is to use reduceByKey, which merges the Maps for each key directly (acc is the accumulator) by summing the values of matching inner keys:
val rdd = sc.parallelize(Seq(
("A", Map("acs"->2, "sdv"->2, "sfd"->1)),
("B", Map("ass"->2, "fvv"->2, "ffd"->1)),
("A", Map("acs"->2, "sdv"->2, "sfd"->1))
))
rdd.reduceByKey( (acc, m) =>
acc ++ m.map{ case (k, v) => (k, acc.getOrElse(k, 0) + v) }
).collect
// res1: Array[(String, scala.collection.immutable.Map[String,Int])] = Array(
// (A,Map(acs -> 4, sdv -> 4, sfd -> 2)),
// (B,Map(ass -> 2, fvv -> 2, ffd -> 1))
// )
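The merge logic itself does not depend on Spark, so it can be checked on a plain Scala collection first. The following is a minimal sketch, not part of either answer: the names MergeByKeySketch, mergeCounts and mergeByKey are hypothetical, and groupBy plus reduce on a local Seq is only a stand-in for what reduceByKey does on an RDD.

```scala
object MergeByKeySketch {
  // Same merge as the reduceByKey lambda: sum the values of matching inner keys.
  def mergeCounts(acc: Map[String, Int], m: Map[String, Int]): Map[String, Int] =
    acc ++ m.map { case (k, v) => (k, acc.getOrElse(k, 0) + v) }

  // Local stand-in for reduceByKey: group by the outer key, then reduce each group.
  def mergeByKey(data: Seq[(String, Map[String, Int])]): Map[String, Map[String, Int]] =
    data.groupBy(_._1).map { case (key, pairs) =>
      key -> pairs.map(_._2).reduce((a, b) => mergeCounts(a, b))
    }
}
```

Calling MergeByKeySketch.mergeByKey on the three sample elements from the question should give one entry for "A" with the summed counts and one unchanged entry for "B".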

Related

The operation about Merge two tuples

I have two tuples like these:
1st tuple:((A,1),(B,3),(D,5)......)
2nd tuple:((A,3),(B,1),(E,6)......)
I need a function that merges these two tuples into this:
((A,1,3),(B,3,1),(D,5,0),(E,0,6)......)
If the first tuple contains a key that is not in the second tuple, set the value to 0, and vice versa. How could I code this function in scala?
Let's say you get the input in this format:
val tuple1: List[(String, Int)] = List(("A",1),("B",3),("D",5),("E",0))
val tuple2: List[(String, Int)] = List(("A",3),("B",1),("D",6))
You can write a merge function as
def merge(tuple1: List[(String, Int)], tuple2: List[(String, Int)]) = {
  val map1 = tuple1.toMap
  val map2 = tuple2.toMap
  // Iterates over map1 only, so keys that appear only in tuple2 are not included.
  map1.map { case (k, v) =>
    (k, v, map2.getOrElse(k, 0))
  }
}
On calling the function
merge(tuple1,tuple2)
You will get the output as
res0: scala.collection.immutable.Iterable[(String, Int, Int)] = List((A,1,3), (B,3,1), (D,5,6), (E,0,0))
Please let me know if that answers your question.
val t1= List(("A",1),("B",3),("D",5),("E",0))
val t2= List(("A",3),("B",1),("D",6),("H",5))
val t3 = t2.filter{ case (k,v) => !t1.exists{case (k1,_) => k1==k} }.map{case (k,_) => (k,0)}
val t4 = t1.filter{ case (k,v) => !t2.exists{case (k1,_) => k1==k} }.map{case (k,_) => (k,0)}
val t5=(t1 ++ t3).sortBy{case (k,v) => k}
val t6=(t2 ++ t4).sortBy{case (k,v) => k}
t5.zip(t6).map{case ((k,v1),(_,v2)) => (k,v1,v2) }
res: List[(String, Int, Int)] = List(("A", 1, 3), ("B", 3, 1), ("D", 5, 6), ("E", 0, 0), ("H", 0, 5))
Here is what's happening:
t3 and t4 find the keys that are missing from t1 and t2 respectively and add them with a zero value.
t5 and t6 sort the unified lists (t1 with t3 and t2 with t4). Lastly, they are zipped together and transformed into the desired output.
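Another way to cover keys that are missing on either side is to look every key up in both maps over the union of the key sets. This is a minimal sketch under the assumption that keys are unique within each list; TupleMergeSketch and mergeWithDefault are hypothetical names, not taken from the answers above.

```scala
object TupleMergeSketch {
  // For every key present in either list, emit (key, valueInFirst, valueInSecond),
  // using 0 when the key is missing on one side.
  def mergeWithDefault(
      xs: List[(String, Int)],
      ys: List[(String, Int)]
  ): List[(String, Int, Int)] = {
    val mx = xs.toMap
    val my = ys.toMap
    (mx.keySet ++ my.keySet).toList.sorted.map { k =>
      (k, mx.getOrElse(k, 0), my.getOrElse(k, 0))
    }
  }
}
```

With the data from the question, TupleMergeSketch.mergeWithDefault(List(("A",1),("B",3),("D",5)), List(("A",3),("B",1),("E",6))) yields List((A,1,3), (B,3,1), (D,5,0), (E,0,6)).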

Scala Map: Combine keys with the same value?

Suppose I have a Map like
val x = Map(1 -> List("a", "b"), 2 -> List("a"),
3 -> List("a", "b"), 4 -> List("a"),
5 -> List("c"))
How would I create from this a new Map where the keys are Lists of keys from x having the same value, e.g., how can I implement
def someFunction(m: Map[Int, List[String]]): Map[List[Int], List[String]] =
// stuff that would turn x into
// Map(List(1, 3) -> List("a", "b"), List(2, 4) -> List("a"), List(5) -> List("c"))
?
You can convert the Map to a List and then use groupBy to aggregate the first element of each tuple:
x.toList.groupBy(_._2).mapValues(_.map(_._1)).map{ case (x, y) => (y, x) }
// res37: scala.collection.immutable.Map[List[Int],List[String]] =
// Map(List(2, 4) -> List(a), List(1, 3) -> List(a, b), List(5) -> List(c))
Or, as @Dylan commented, use _.swap to switch the tuples' elements:
x.toList.groupBy(_._2).mapValues(_.map(_._1)).map(_.swap)
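Since mapValues on a Map is deprecated as of Scala 2.13, the same grouping can also be written with groupBy and map only. The sketch below is one possible variant; GroupKeysSketch and keysByValue are hypothetical names, and sorting the collected keys is an added assumption so that the resulting key lists are deterministic.

```scala
object GroupKeysSketch {
  // Group the entries by their value, then use the sorted list of
  // original keys in each group as the new key.
  def keysByValue(m: Map[Int, List[String]]): Map[List[Int], List[String]] =
    m.groupBy { case (_, value) => value }
      .map { case (value, entries) => entries.keys.toList.sorted -> value }
}
```

Applied to the Map x from the question, this gives Map(List(1, 3) -> List(a, b), List(2, 4) -> List(a), List(5) -> List(c)), in some iteration order.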

reduce variable number of tuples Sequences to Map[Key, List[Value]] in Scala

I have two sequences:
Seq("a" -> 1, "b" -> 2)
Seq("a" -> 3, "b" -> 4)
What I want is a result Map that looks like this:
Map(a -> List(3, 1), b -> List(4, 2))
val s1 = Seq("a" -> 1, "b" -> 2)
val s2 = Seq("a" -> 3, "b" -> 4)
val ss = s1 ++ s2
val toMap = ss.groupBy(x => x._1).map { case (k,v) => (k, v.map(_._2))}
res0: scala.collection.immutable.Map[String,Seq[Int]] = Map(b -> List(2, 4), a -> List(1, 3))
You can then sort the result however you like.
You can try
scala> val seq = Seq("a" -> 1, "b" -> 2) ++ Seq("a" -> 3, "b" -> 4)
seq: Seq[(String, Int)] = List((a,1), (b,2), (a,3), (b,4))
scala> seq groupBy(_._1) mapValues(_ map(_._2))
res9: scala.collection.immutable.Map[String,Seq[Int]] = Map(b -> List(2, 4), a -> List(1, 3))
def reduceToMap[K, V](seqs: Seq[(K, V)]*): Map[K, List[V]] = {
seqs.reduce(_ ++ _).foldLeft(Map.empty[K, List[V]])((memo, next) =>
memo.get(next._1) match {
case None => memo.updated(next._1, next._2 :: Nil)
case Some(xs) => memo.updated(next._1, next._2 :: xs)
}
)
}
scala> reduceToMap(Seq("a" -> 1, "b" -> 2), Seq("a" -> 3, "b" -> 4))
res0: Map[String,List[Int]] = Map(a -> List(3, 1), b -> List(4, 2))
scala> reduceToMap(Seq.empty)
res1: Map[Nothing,List[Nothing]] = Map()
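Note that reduce throws on an empty collection, so calling reduceToMap with no arguments at all would fail. A minimal sketch of a variant that flattens the varargs instead is shown below; GroupValuesSketch and groupValues are hypothetical names. It keeps the behavior of prepending each value, so values appear in reverse order of encounter, as in the output above.

```scala
object GroupValuesSketch {
  // Flatten all input sequences, then fold the pairs into a Map,
  // prepending each value to the list already stored for its key.
  def groupValues[K, V](seqs: Seq[(K, V)]*): Map[K, List[V]] =
    seqs.flatten.foldLeft(Map.empty[K, List[V]]) { case (acc, (k, v)) =>
      acc.updated(k, v :: acc.getOrElse(k, Nil))
    }
}
```

GroupValuesSketch.groupValues(Seq("a" -> 1, "b" -> 2), Seq("a" -> 3, "b" -> 4)) gives Map(a -> List(3, 1), b -> List(4, 2)), and calling it with no sequences returns an empty Map.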

Scala: using foldl to add pairs from list to a map?

I am trying to add pairs from list to a map using foldl. I get the following error:
"missing arguments for method /: in trait TraversableOnce; follow this method with `_' if you want to treat it as a partially applied function"
code:
val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
def lstToMap(lst:List[(String,Int)], map: Map[String, Int] ) = {
(map /: lst) addToMap ( _, _)
}
def addToMap(pair: (String, Int), map: Map[String, Int]): Map[String, Int] = {
map + (pair._1 -> pair._2)
}
What is wrong?
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> (Map.empty[String, Int] /: pairs)(_ + _)
res9: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
But you know, you could just do:
scala> pairs.toMap
res10: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
You need to swap the parameters of addToMap and pass it in parentheses for this to work:
def addToMap( map: Map[String, Int], pair: (String, Int)): Map[String, Int] = {
map + (pair._1 -> pair._2)
}
def lstToMap(lst:List[(String,Int)], map: Map[String, Int] ) = {
(map /: lst)(addToMap)
}
missingfaktor's answer is much more concise, reusable, and Scala-like.
If you already have a collection of Tuple2s, you don't need to implement this yourself, there is already a toMap method that only works if the elements are tuples!
The full signature is:
def toMap[T, U](implicit ev: <:<[A, (T, U)]): Map[T, U]
It works by requiring an implicit A <:< (T, U) which is essentially a function that can take the element type A and cast/convert it to tuples of type (T, U). Another way of saying this is that it requires an implicit witness that A is-a (T, U). Therefore, this is completely type-safe.
Update: which is what @missingfaktor said
This is not a direct answer to the question, which is about folding correctly on the map, but I deem it important to emphasize that
a Map can be treated as a generic Traversable of pairs
and you can easily combine the two!
scala> val pairs = List(("a", 1), ("a", 2), ("c", 3), ("d", 4))
pairs: List[(String, Int)] = List((a,1), (a,2), (c,3), (d,4))
scala> Map.empty[String, Int] ++ pairs
res1: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
scala> pairs.toMap
res2: scala.collection.immutable.Map[String,Int] = Map(a -> 2, c -> 3, d -> 4)
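For readers on Scala 2.13 or later, where the /: operator is deprecated in favor of foldLeft, the same fold can be spelled out explicitly. This is a minimal sketch; FoldToMapSketch, toMapByFold and sumByKey are hypothetical names, and sumByKey is an extra variation (not asked for in the question) that sums the values of duplicate keys instead of letting the last one win.

```scala
object FoldToMapSketch {
  // Equivalent of (Map.empty[String, Int] /: pairs)(_ + _):
  // later pairs overwrite earlier ones with the same key.
  def toMapByFold(pairs: List[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { (acc, pair) => acc + pair }

  // Variation: add up the values of duplicate keys.
  def sumByKey(pairs: List[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }
}
```

For the pairs list above, toMapByFold returns Map(a -> 2, c -> 3, d -> 4), while sumByKey returns Map(a -> 3, c -> 3, d -> 4).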

Reverse / transpose a one-to-many map in Scala

What is the best way to turn a Map[A, Set[B]] into a Map[B, Set[A]]?
For example, how do I turn a
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
into a
Map("a" -> Set(1),
"b" -> Set(1, 2),
"c" -> Set(2, 3),
"d" -> Set(3))
(I'm using immutable collections only here. And my real problem has nothing to do with strings or integers. :)
With help from aioobe and Moritz:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_)(v)))).toMap
It's a bit more readable if you explicitly call contains:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_).contains(v)))).toMap
Best I've come up with so far is
val intToStrs = Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
def mappingFor(key: String) =
intToStrs.keys.filter(intToStrs(_) contains key).toSet
val newKeys = intToStrs.values.flatten
val inverseMap = newKeys.map(newKey => (newKey -> mappingFor(newKey))).toMap
Or another one using folds:
def reverse2[A,B](m:Map[A,Set[B]])=
m.foldLeft(Map[B,Set[A]]()){case (r,(k,s)) =>
s.foldLeft(r){case (r,e)=>
r + (e -> (r.getOrElse(e, Set()) + k))
}
}
Here's a one-statement solution:
originalMap
.map{case (k, v)=>v.map{v2=>(v2,k)}}
.flatten
.groupBy{_._1}
.transform {(k, v)=>v.unzip._2.toSet}
This bit rather neatly (*) produces the tuples needed to construct the reverse map
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
.map{case (k, v)=>v.map{v2=>(v2,k)}}.flatten
produces
List((a,1), (b,1), (b,2), (c,2), (c,3), (d,3))
Converting it directly to a map would overwrite the values for duplicate keys, though.
Adding .groupBy{_._1} gets this:
Map(c -> List((c,2), (c,3)),
a -> List((a,1)),
d -> List((d,3)),
b -> List((b,1), (b,2)))
which is closer. Turning those lists into Sets of the second halves of the pairs with
.transform {(k, v)=>v.unzip._2.toSet}
gives
Map(c -> Set(2, 3), a -> Set(1), d -> Set(3), b -> Set(1, 2))
QED :)
(*) YMMV
A simple, but maybe not super-elegant solution:
def reverse[A,B](m:Map[A,Set[B]])={
var r = Map[B,Set[A]]()
m.keySet foreach { k=>
m(k) foreach { e =>
r = r + (e -> (r.getOrElse(e, Set()) + k))
}
}
r
}
The easiest way I can think of is:
// m is assumed to be the Map[Int, Set[String]] from the question
// unfold values to tuples (v,k)
// for all values v in the Set referenced by key k
def vk = for {
(k,vs) <- m.iterator
v <- vs.iterator
} yield (v -> k)
// fold iterator back into a map
(Map[String,Set[Int]]() /: vk) {
// alternative syntax: vk.foldLeft(Map[String,Set[Int]]()) {
case (m,(k,v)) if m contains k =>
// Map already contains a Set, so just add the value
m updated (k, m(k) + v)
case (m,(k,v)) =>
// key not in the map - wrap value in a Set and return updated map
m updated (k, Set(v))
}
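The unfold-then-regroup idea used in several of these answers can also be written generically with flatMap and groupBy. The following is a minimal sketch rather than a definitive implementation; InvertMapSketch and invert are hypothetical names.

```scala
object InvertMapSketch {
  // Unfold each (key, set) entry into (value, key) pairs, group the pairs
  // by value, and collect the keys of each group into a Set.
  def invert[A, B](m: Map[A, Set[B]]): Map[B, Set[A]] =
    m.toSeq
      .flatMap { case (k, vs) => vs.toSeq.map(v => v -> k) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }
}
```

For the example Map in the question, InvertMapSketch.invert returns Map(a -> Set(1), b -> Set(1, 2), c -> Set(2, 3), d -> Set(3)), in some iteration order.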