Reduce rdd of maps - scala

I have an RDD like this:
Map(A -> Map(A1 -> 1))
Map(A -> Map(A2 -> 2))
Map(A -> Map(A3 -> 3))
Map(B -> Map(B1 -> 4))
Map(B -> Map(B2 -> 5))
Map(B -> Map(B3 -> 6))
Map(C -> Map(C1 -> 7))
Map(C -> Map(C2 -> 8))
Map(C -> Map(C3 -> 9))
I need the same RDD reduced by key, with each key keeping all of the values it had before:
Map(A -> Map(A1 -> 1, A2 -> 2, A3 -> 3))
Map(B -> Map(B1 -> 4, B2 -> 5, B3 -> 6))
Map(C -> Map(C1 -> 7, C2 -> 8, C3 -> 9))
I tried with a reduce:
val prueba = replacements_2.reduce((x,y) => x ++ y)
But only the value of the last element evaluated for each key remains:
(A,Map(A3 -> 3))
(C,Map(C3 -> 9))
(B,Map(B3 -> 6))

I think you should model your data differently; your Map approach seems a bit awkward. Why represent one entry by a Map with one element? A Tuple2 is more suitable for this. Anyway, you need reduceByKey. To use it, you first need to convert your RDD to a key-value RDD:
rdd
.map(m => (m.keys.head,m.values.head)) // create key-value RDD
.reduceByKey((a,b) => a++b) // merge maps
.map{case (k,v) => Map(k -> v)} // create Map again
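To try the idea without a Spark cluster, the same merge can be sketched on plain Scala collections. This is only an illustration, assuming every input map has a single outer key as in the question; `MergeNestedMaps` and `mergeByKey` are hypothetical names. It also shows why a plain `reduce(_ ++ _)` loses values: `++` on a Map replaces the value of a key that already exists.

```scala
object MergeNestedMaps {
  // Hypothetical helper: merges the inner maps of all entries sharing an outer key.
  def mergeByKey[K, IK, V](maps: Seq[Map[K, Map[IK, V]]]): Map[K, Map[IK, V]] =
    maps
      .flatMap(_.toSeq)              // Seq[(K, Map[IK, V])]
      .groupBy(_._1)                 // outer key -> all pairs with that key
      .map { case (k, pairs) => k -> pairs.map(_._2).reduce(_ ++ _) }

  val input: Seq[Map[String, Map[String, Int]]] = Seq(
    Map("A" -> Map("A1" -> 1)),
    Map("A" -> Map("A2" -> 2)),
    Map("B" -> Map("B1" -> 4))
  )

  // ++ on the outer maps overwrites "A": only the last inner map survives.
  val overwritten: Map[String, Map[String, Int]] = input.reduce(_ ++ _)

  // Merging per key keeps every inner entry.
  val merged: Map[String, Map[String, Int]] = mergeByKey(input)
}
```

Here `overwritten` keeps only `A2` under `A`, while `merged` keeps both `A1` and `A2`, which is the same effect reduceByKey has on the key-value RDD.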

Related

Scala grouping of Sequence of <Key, Value(Key)> to Map of <Key, Seq(Value)> [duplicate]

I have
val a = List((1,2), (1,3), (3,4), (3,5), (4,5))
I am using a.groupBy(_._1), which groups by the first element. But it gives me this output:
Map(1 -> List((1,2) , (1,3)) , 3 -> List((3,4), (3,5)), 4 -> List((4,5)))
But, I want answer as
Map(1 -> List(2, 3), 3 -> List(4,5) , 4 -> List(5))
So, how can I do this?
You can do that by following up with mapValues (and a map over each value to extract the second element):
scala> a.groupBy(_._1).mapValues(_.map(_._2))
res2: scala.collection.immutable.Map[Int,List[Int]] = Map(4 -> List(5), 1 -> List(2, 3), 3 -> List(4, 5))
Make life easy with pattern match and Map#withDefaultValue:
scala> a.foldLeft(Map.empty[Int, List[Int]].withDefaultValue(Nil)){
case(r, (x, y)) => r.updated(x, r(x):+y)
}
res0: scala.collection.immutable.Map[Int,List[Int]] =
Map(1 -> List(2, 3), 3 -> List(4, 5), 4 -> List(5))
There are two points:
Map#withDefaultValue will get a map with a given default value, then you don't need to check if the map contains a key.
Wherever Scala expects a function value (x1,x2,..,xn) => y, you can use a pattern-matching anonymous function case (x1,x2,..,xn) => y instead; the compiler translates it to a function automatically. See section 8.5, Pattern Matching Anonymous Functions, of the Scala Language Specification for more information.
Sorry for my poor english.
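The two points above can be checked in isolation. The following is a small sketch; the object and value names (`FoldPoints`, `counts`, and so on) are made up for illustration and are not part of the answer.

```scala
object FoldPoints {
  // Point 1: with withDefaultValue, apply returns the default for a missing key,
  // while get still reports the key as absent.
  val counts: Map[String, Int] = Map("a" -> 1).withDefaultValue(0)
  val missingViaApply: Int = counts("zzz")
  val missingViaGet: Option[Int] = counts.get("zzz")

  // Point 2: a pattern-matching anonymous function can stand in wherever
  // a function value is expected, destructuring the tuple directly.
  val pairs: List[(Int, Int)] = List((1, 2), (3, 4))
  val sumsWithCase: List[Int] = pairs.map { case (x, y) => x + y }
  val sumsWithAccessors: List[Int] = pairs.map(t => t._1 + t._2)
}
```

Both `sumsWithCase` and `sumsWithAccessors` compute the same list; the `case` form just avoids the `_1`/`_2` accessors.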
As of Scala 2.13, you can use groupMap,
so you can write just:
// val list = List((1, 2), (1, 3), (3, 4), (3, 5), (4, 5))
list.groupMap(_._1)(_._2)
// Map(1 -> List(2, 3), 3 -> List(4, 5), 4 -> List(5))
As a variant:
a.foldLeft(Map[Int, List[Int]]()) {case (acc, (a,b)) => acc + (a -> (b::acc.getOrElse(a,List())))}
You can also do it with a foldLeft to have only one iteration.
a.foldLeft(Map.empty[Int, List[Int]])((map, t) =>
if(map.contains(t._1)) map + (t._1 -> (t._2 :: map(t._1)))
else map + (t._1 -> List(t._2)))
scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(3, 2), 3 ->
List(5, 4), 4 -> List(5))
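If the order of the elements in the lists matters, reverse each list once after the fold; reversing inside the fold on every step would scramble lists with more than two elements.
a.foldLeft(Map.empty[Int, List[Int]])((map, t) =>
if(map.contains(t._1)) map + (t._1 -> (t._2 :: map(t._1)))
else map + (t._1 -> List(t._2))).map { case (k, vs) => k -> vs.reverse } // restore insertion order once, at the end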
scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(2, 3), 3 ->
List(4, 5), 4 -> List(5))
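The sample list has at most two values per key, which can hide ordering mistakes. Below is a sketch of an order-preserving fold exercised with three values under one key; `GroupInOrder` and `group` are hypothetical names, and prepending followed by a single final reverse is one possible design, chosen because prepending to a List is constant time.

```scala
object GroupInOrder {
  // Groups the second elements by the first, keeping input order within each group.
  def group[K, V](pairs: List[(K, V)]): Map[K, List[V]] =
    pairs
      .foldLeft(Map.empty[K, List[V]]) { case (acc, (k, v)) =>
        acc + (k -> (v :: acc.getOrElse(k, Nil)))  // prepend: lists are built in reverse
      }
      .map { case (k, vs) => k -> vs.reverse }     // reverse each list exactly once

  val grouped: Map[Int, List[Int]] = group(List((1, 2), (1, 3), (1, 4), (3, 5)))
}
```

With this input, key 1 ends up with its values in the order they appeared in the list.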

find unique elements amongst the values of a map in scala

I have Map[String,Seq[String]].
I want to find the unique elements among all the values in the map. I want to do this in Scala.
Say, I have
Map('a' -> Seq(1,2,3),
'b' -> Seq(2,3),
'c' -> Seq(4)
)
I want the desired result to be
Map('a' -> Seq(1), 'c' -> Seq(4))
Any idea on how to do this?
Thanks!
If you are looking for the unique elements in each list, you can use currentList.diff(rest_of_the_list).
Given
scala> val input = Map('a' -> Seq(1,2,3), 'b' -> Seq(2,3), 'c' -> Seq(4))
input: scala.collection.immutable.Map[Char,Seq[Int]] = Map(a -> List(1, 2, 3), b -> List(2, 3), c -> List(4))
Find the rest of the elements for each key,
scala> val unions = input.map(elem => elem._1 -> input.filter(!_._1.equals(elem._1)).flatMap(_._2).toSet)
unions: scala.collection.immutable.Map[Char,scala.collection.immutable.Set[Int]] = Map(a -> Set(2, 3, 4), b -> Set(1, 2, 3, 4), c -> Set(1, 2, 3))
Then, iterate over the input map and find the unique elements in each list
scala> input.map(x => x._1 -> x._2.diff(unions(x._1).toList))
res18: scala.collection.immutable.Map[Char,Seq[Int]] = Map(a -> List(1), b -> List(), c -> List(4))
If you don't want empty keys (b in above example)
scala> input.map(x => x._1 -> x._2.diff(unions(x._1).toList)).filter(_._2.nonEmpty)
res21: scala.collection.immutable.Map[Char,Seq[Int]] = Map(a -> List(1), c -> List(4))
Find the non-unique elements by flattening all values and keeping the elements that occur more than once. Then remove every non-unique element from each key's list.
val input = Map('a' -> Seq(1,2,3),
'b' -> Seq(2,3),
'c' -> Seq(4))
val nonUnique = input.values.flatten
.groupBy(identity)
.filter(_._2.size > 1)
.keys.toSeq
input.mapValues(x => x.diff(nonUnique)).filter(_._2.nonEmpty) // keep every key that still has at least one unique element
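The same idea can also be written by counting occurrences. This is a sketch only; `UniqueValues` and `uniquePerKey` are hypothetical names, and it assumes "unique" means the value occurs exactly once across all lists in the map.

```scala
object UniqueValues {
  // Keeps, for each key, only the values that occur exactly once across the whole map,
  // and drops keys whose list becomes empty.
  def uniquePerKey[K, V](input: Map[K, Seq[V]]): Map[K, Seq[V]] = {
    val counts: Map[V, Int] =
      input.values.flatten.groupBy(identity).map { case (v, vs) => v -> vs.size }
    input
      .map { case (k, vs) => k -> vs.filter(v => counts(v) == 1) }
      .filter { case (_, vs) => vs.nonEmpty }
  }

  val result: Map[Char, Seq[Int]] =
    uniquePerKey(Map('a' -> Seq(1, 2, 3), 'b' -> Seq(2, 3), 'c' -> Seq(4)))
}
```

Using `filter` on the counts instead of `diff` avoids multiset semantics, so a value repeated inside a single list is also treated as non-unique.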

Scala Map update

I want to update a value in a Map that is itself a value in another Map. When I try to update it, the compiler says 'value update is not a member of Option[scala.collection.immutable.Map[Int,Int]]'.
I tried to convert the value to Map but still, it didn't work for me.
val map = Map("one" -> Map(1 -> 11), "two" -> Map(2 -> 22))
val value = map.get("one")
value(1) = 100 //value update is not a member of Option[scala.collection.Map[Int,Int]]
There are two mistakes you are making.
Calling get on a Map will return an Option, hence you are not able to set the value.
You are using immutable Map when your operation/purpose is to update the value of some key, for which you need to use mutable map.
Let us write some snippets to solve these two problems.
scala> val map = Map("one" -> Map(1 -> 11), "two" -> Map(2 -> 22))
map: scala.collection.immutable.Map[String,scala.collection.immutable.Map[Int,Int]] = Map(one -> Map(1 -> 11), two -> Map(2 -> 22))
scala> val valueOption = map.get("one")
valueOption: Option[scala.collection.immutable.Map[Int,Int]] = Some(Map(1 -> 11))
scala> val value = map("one")
value: scala.collection.immutable.Map[Int,Int] = Map(1 -> 11)
scala> value(1) = 100
<console>:13: error: value update is not a member of scala.collection.immutable.Map[Int,Int]
value(1) = 100
Notice the difference between getting the value with .get, which returns an Option, and with parentheses, which returns the Map itself. The error is now easier to understand: value(1) = 100 is shorthand for value.update(1, 100), and an immutable Map has no update method.
Now if you repeat the same statements after importing mutable Map, you will be able to get what you are trying to achieve.
scala> import scala.collection.mutable.Map
import scala.collection.mutable.Map
scala> val map = Map("one" -> Map(1 -> 11), "two" -> Map(2 -> 22))
map: scala.collection.mutable.Map[String,scala.collection.mutable.Map[Int,Int]] = Map(one -> Map(1 -> 11), two -> Map(2 -> 22))
scala> val value = map("one")
value: scala.collection.mutable.Map[Int,Int] = Map(1 -> 11)
scala> value(1) = 100
scala> map
res2: scala.collection.mutable.Map[String,scala.collection.mutable.Map[Int,Int]] = Map(one -> Map(1 -> 100), two -> Map(2 -> 22))
The first map you created is immutable, so it cannot be changed:
scala> val map = Map("one" -> Map(1 -> 11), "two" -> Map(2 -> 22))
map: scala.collection.immutable.Map[String,scala.collection.immutable.Map[Int,Int]] = Map(one -> Map(1 -> 11), two -> Map(2 -> 22))
Your second command returns an Option of an immutable Map, which cannot be updated either.
scala> val value = map.get("one")
value: Option[scala.collection.immutable.Map[Int,Int]] = Some(Map(1 -> 11))
As chunjef suggested, you should be using a mutable Map:
scala> val map = Map("one" -> scala.collection.mutable.Map(1 -> 11), "two" -> scala.collection.mutable.Map(2 -> 22))
map: scala.collection.immutable.Map[String,scala.collection.mutable.Map[Int,Int]] = Map(one -> Map(1 -> 11), two -> Map(2 -> 22))
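If you would rather keep both maps immutable, an alternative to mutation is to build a new outer map with `updated`. The sketch below is an illustration rather than what the answers above propose; `NestedUpdate` and `setNested` are hypothetical names.

```scala
object NestedUpdate {
  val original: Map[String, Map[Int, Int]] =
    Map("one" -> Map(1 -> 11), "two" -> Map(2 -> 22))

  // Returns a new outer map in which the inner map under `outer` has `inner` set to `value`.
  // A missing outer key is treated as an empty inner map.
  def setNested(m: Map[String, Map[Int, Int]], outer: String, inner: Int, value: Int): Map[String, Map[Int, Int]] =
    m.updated(outer, m.getOrElse(outer, Map.empty[Int, Int]).updated(inner, value))

  val changed: Map[String, Map[Int, Int]] = setNested(original, "one", 1, 100)
}
```

`original` is left untouched; `changed` is a separate map that shares the unchanged `"two"` entry.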

Reverse a map of type [Int, Seq[Int]]

I need to reverse a map
customerIdToAccountIds:Map[Int, Seq[Int]]
such that each account ID is a key to a list of all the customer IDs of the account (many-to-many relationship):
accountIdToCustomerIds:Map[Int, Seq[Int]]
What is a good idiomatic way to accomplish this? Thanks!
Input:
val customerIdToAccountIds:Map[Int, Seq[Int]] = Map(1 -> Seq(5,6,7), 2 -> Seq(5,6,7), 3 -> Seq(5,7,8))
val accountIdToCustomerIds:Map[Int, Seq[Int]] = ???
1 -> Seq(5,6,7)
2 -> Seq(5,6,7)
3 -> Seq(5,7,8)
Output:
5 -> Seq(1,2,3)
6 -> Seq(1,2)
7 -> Seq(1,2,3)
8 -> Seq(3)
val m = Map( 1 -> Seq(5,6,7)
, 2 -> Seq(5,6,7)
, 3 -> Seq(5,7,8) )
// Map inverter: from (k -> List(vs)) to (v -> List(ks))
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
//result: Map(8 -> List(3), 5 -> List(1, 2, 3), 7 -> List(1, 2, 3), 6 -> List(1, 2))
val customerIdToAccountIds = Map(1 -> Seq(5, 6, 7), 2 -> Seq(5, 6, 7), 3 -> Seq(5, 7, 8))
val accountIdToCustomerIds = customerIdToAccountIds.toSeq.flatMap {
case (customerId, accountIds) => accountIds.map { accountId => (customerId, accountId) } // pair each customer with each of its accounts
}.groupBy(_._2).mapValues(_.map(_._1)) // groupBy accountId and extract customerId from tuples
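Both answers follow the same three steps: expand to pairs, group by the new key, then strip the key from the grouped pairs. A generic sketch of that pattern follows; `InvertMap` and `invert` are hypothetical names, and it uses `map` instead of `mapValues` so that the result is a strict Map on both Scala 2.12 and 2.13.

```scala
object InvertMap {
  // Turns key -> values into value -> keys for a many-to-many relationship.
  def invert[K, V](m: Map[K, Seq[V]]): Map[V, Seq[K]] =
    m.toSeq
      .flatMap { case (k, vs) => vs.map(v => (v, k)) }   // one (value, key) pair per link
      .groupBy(_._1)                                     // group the pairs by value
      .map { case (v, pairs) => v -> pairs.map(_._2) }   // keep only the keys

  val customerIdToAccountIds: Map[Int, Seq[Int]] =
    Map(1 -> Seq(5, 6, 7), 2 -> Seq(5, 6, 7), 3 -> Seq(5, 7, 8))

  val accountIdToCustomerIds: Map[Int, Seq[Int]] = invert(customerIdToAccountIds)
}
```

The order of customer IDs inside each list follows the iteration order of the input map, so compare the lists as sets if that order is not guaranteed in your case.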

Scala flatten nested map

I have a nested Map like this one:
Map(1 -> Map(2 -> 3.0, 4 -> 5.0), 6 -> Map(7 -> 8.0))
I would like to 'flatten' it in a way such that the keys of the outer and inner maps are paired, i.e. for the example above:
Seq((1,2),(1,4),(6,7))
What is an elegant way to do this?
val m = Map(1 -> Map(2 -> 3.0, 4 -> 5.0), 6 -> Map(7 -> 8.0))
m.toSeq.flatMap({case (k, v) => v.keys.map((k,_))})
With for-comprehension:
val m = Map(1 -> Map(2 -> 3.0, 4 -> 5.0), 6 -> Map(7 -> 8.0))
scala> for((k1, v1) <- m.toSeq; k2 <- v1.keys) yield (k1, k2)
res4: Seq[(Int, Int)] = ArrayBuffer((1,2), (1,4), (6,7))