Convert RDD[(K,V) to Map[K,List[V]] - scala

How can i convert a RDD of tuple2 (Key,Value) with duplicate Keys into a Map[K,List[V]] ?
Input example:
val list = List((1,a),(1,b),(2,c),(2,d))
val rdd = sparkContext.parallelize(list)
Output expected:
Map((1,List(a,b)),(2,List(c,d)))

Just use groupByKey, then collectAsMap:
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
rdd.groupByKey.collectAsMap
// res1: scala.collection.Map[Int,Iterable[String]] =
// Map(2 -> CompactBuffer(c, d), 1 -> CompactBuffer(a, b))
Alternatively, use map/reduceByKey then collectAsMap:
rdd.map{ case (k, v) => (k, Seq(v)) }.reduceByKey(_ ++ _).
collectAsMap
// res2: scala.collection.Map[Int,Seq[String]] =
// Map(2 -> List(c, d), 1 -> List(a, b))

You can use groupByKey , collectAsMap and map to achieve this like below
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
val map=rdd.groupByKey.collectAsMap.map(x=>(x._1,x._2.toList))
Sample output:
Map(2 -> List(c, d), 1 -> List(a, b))

Related

Scala: inverting a one-to-many relationship [duplicate]

This question already has answers here:
Elegant way to invert a map in Scala
(10 answers)
Closed 3 years ago.
I have:
val intsPerChar: List[(Char, List[Int])] = List(
'A' -> List(1,2,3),
'B' -> List(2,3)
)
I want to get a mapping of ints with the chars that they have a mapping with. ie, I want to get:
val charsPerInt: Map[Int, List[Char]] = Map(
1 -> List('A'),
2 -> List('A', 'B'),
3 -> List('A', 'B')
)
Currently, I am doing the following:
val numbers: List[Int] = l.flatMap(_._2).distinct
numbers.map( n =>
n -> l.filter(_._2.contains(n)).map(_._1)
).toMap
Is there a less explicit way of doing this? ideally some sort of groupBy.
Try
intsPerChar
.flatMap { case (c, ns) => ns.map((_, c)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
// Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))
Might be personal preference as to whether you consider it more or less readable, but the following is another option:
intsPerChar
.flatMap(n => n._2.map(i => i -> n._1)) // List((1,A), (2,A), (3,A), (2,B), (3,B))
.groupBy(_._1) // Map(2 -> List((2,A), (2,B)), 1 -> List((1,A)), 3 -> List((3,A), (3,B)))
.transform { (_, v) => v.unzip._2}
Final output is:
Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))

Using groupBy on a List of Tuples in Scala

I tried to group a list of tuples in Scala.
The input:
val a = List((1,"a"), (2,"b"), (3,"c"), (1,"A"), (2,"B"))
I applied:
a.groupBy(e => e._1)
The output I get is:
Map[Int,List[(Int, String)]] = Map(2 -> List((2,b), (2,B)), 1 -> List((1,a), (1,A)), 3 -> List((3,c)))
This is slightly different with what I expect:
Map[Int,List[(Int, String)]] = Map(2 -> List(b, B), 1 -> List(a, A)), 3 -> List(c))
What can I do to get the expected output?
You probably meant something like this:
a.groupBy(_._1).mapValues(_.map(_._2))
or:
a.groupBy(_._1).mapValues(_.unzip._2)
Result:
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
If you do not want to use mapValues, is this what you are expecting?
a.groupBy(_._1).map(f => (f._1, f._2.map(_._2)))
Result
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))

Convert Seq[Seq[String,String]] to Map[String,Seq[String]]

I need to convert this structure
val seq = Seq(Seq("a","aa"), Seq("b","bb"), Seq("a", "a2"), Seq("b","b2") )
to this Map:
val map2 = Map ( "a" -> Seq("aa","a2"), "b" -> Seq("bb","b2") )
cannot use toMap because it only works with Tuple2 as input. Any ideas how to approach this?
You can first group by the first item of each sub-seq and then map the resulting grouped values to only keep the second element of subsequences:
Seq(Seq("a","aa"), Seq("b","bb"), Seq("a", "a2"), Seq("b","b2") )
.groupBy(_(0)) // Map(b -> List(List(b, bb), List(b, b2)), a -> List(List(a, aa), List(a, a2)))
.mapValues(_.map(_(1))) // Map(b -> List(bb, b2), a -> List(aa, a2))
which returns:
Map(b -> List(bb, b2), a -> List(aa, a2))
Similar: instead of using _(0) and _(1) you could also use .groupBy(_.head).mapValues(_.map(_.last))
The mapValues part can be made a bit more explicit this way:
.mapValues{
case valueLists => // List(List(b, bb), List(b, b2))
valueLists.map{
case List(k, v) => v // List(b, bb) => bb
}
}

Functional way to split a map of lists into a list of maps

I'm a bit stuck on this problem. I feel like I'm "thinking backwards" and it's confusing me a bit.
I have a Map[Long, Seq[String]] which I would like to convert into a Seq[Map[Long, String]]. Going the other direction is rather simple, as we can just group elements together, however, I'm not sure how to split this apart in a functional manner.
So,
val x = Map(1 -> List("a","b","c"), 2 -> List("d", "e"), 3 -> List("f"))
should become
List(Map(1 -> "a", 2 -> "d", 3 -> "f"), Map(1 -> "b", 2 -> "e"), Map(1 -> "c"))
I was thinking along the lines of using x.partition and then recursing on each resulting tuple, but I'm not really sure what I'd partition on :/
I'm writing in scala, but any functional answer is welcome (language agnostic).
In Haskell:
> import qualified Data.Map as M
> import Data.List
> m = M.fromList [(1,["a","b","c"]), (2,["d","e"]), (3,["f"])]
> map M.fromList . transpose . map (\(i,xs) -> map ((,) i) xs) . M.toList $ m
[fromList [(1,"a"),(2,"d"),(3,"f")],fromList [(1,"b"),(2,"e")],fromList [(1,"c")]]
M.toList and M.fromList convert a map to a list of association pairs, and back.
map ((,) i) xs is the same as [(i,x) | x<-xs], adding (i,...) to each element.
transpose exchanges the "rows" and "columns" in a list of lists, similarly to a matrix transposition.
Borrowing a neat transpose method from this SO answer, here's another way to do it:
def transpose[A](xs: List[List[A]]): List[List[A]] = xs.filter(_.nonEmpty) match {
case Nil => Nil
case ys: List[List[A]] => ys.map{ _.head }::transpose(ys.map{ _.tail })
}
transpose[(Int, String)](
x.toList.map{ case (k, v) => v.map( (k, _) ) }
).map{ _.toMap }
// Res1: List[scala.collection.immutable.Map[Int,String]] = List(
// Map(1 -> a, 2 -> d, 3 -> f), Map(1 -> b, 2 -> e), Map(1 -> c)
// )
In Scala:
val result = x.toList
.flatMap { case (k, vs) => vs.zipWithIndex.map { case (v, i) => (i, k, v) } } // flatten and add indices to inner lists
.groupBy(_._1) // group by index
.toList.sortBy(_._1).map(_._2) // can be replaced with .values if order isn't important
.map(_.map { case (_, k, v) => (k, v) }.toMap) // remove indices
Here is my answer in OCaml (using just Standard Library):
module M = Map.Make(struct type t = int let compare = compare end)
let of_bindings b =
List.fold_right (fun (k, v) m -> M.add k v m) b M.empty
let splitmap m =
let split1 (k, v) (b1, b2) =
match v with
| [] -> (b1, b2)
| [x] -> ((k, x) :: b1, b2)
| h :: t -> ((k, h) :: b1, (k, t) :: b2)
in
let rec loop sofar m =
if M.cardinal m = 0 then
List.rev sofar
else
let (b1, b2) =
List.fold_right split1 (M.bindings m) ([], [])
in
let (ms, m') = (of_bindings b1, of_bindings b2) in
loop (ms :: sofar) m'
in
loop [] m
It works for me:
# let m = of_bindings [(1, ["a"; "b"; "c"]); (2, ["d"; "e"]); (3, ["f"])];;
val m : string list M.t = <abstr>
# let ms = splitmap m;;
val ms : string M.t list = [<abstr>; <abstr>; <abstr>]
# List.map M.bindings ms;;
- : (M.key * string) list list =
[[(1, "a"); (2, "d"); (3, "f")]; [(1, "b"); (2, "e")]; [(1, "c")]]

Reverse / transpose a one-to-many map in Scala

What is the best way to turn a Map[A, Set[B]] into a Map[B, Set[A]]?
For example, how do I turn a
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
into a
Map("a" -> Set(1),
"b" -> Set(1, 2),
"c" -> Set(2, 3),
"d" -> Set(3))
(I'm using immutable collections only here. And my real problem has nothing to do with strings or integers. :)
with help from aioobe and Moritz:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_)(v)))).toMap
It's a bit more readable if you explicitly call contains:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_).contains(v)))).toMap
Best I've come up with so far is
val intToStrs = Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
def mappingFor(key: String) =
intToStrs.keys.filter(intToStrs(_) contains key).toSet
val newKeys = intToStrs.values.flatten
val inverseMap = newKeys.map(newKey => (newKey -> mappingFor(newKey))).toMap
Or another one using folds:
def reverse2[A,B](m:Map[A,Set[B]])=
m.foldLeft(Map[B,Set[A]]()){case (r,(k,s)) =>
s.foldLeft(r){case (r,e)=>
r + (e -> (r.getOrElse(e, Set()) + k))
}
}
Here's a one statement solution
orginalMap
.map{case (k, v)=>value.map{v2=>(v2,k)}}
.flatten
.groupBy{_._1}
.transform {(k, v)=>v.unzip._2.toSet}
This bit rather neatly (*) produces the tuples needed to construct the reverse map
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
.map{case (k, v)=>v.map{v2=>(v2,k)}}.flatten
produces
List((a,1), (b,1), (b,2), (c,2), (c,3), (d,3))
Converting it directly to a map overwrites the values corresponding to duplicate keys though
Adding .groupBy{_._1} gets this
Map(c -> List((c,2), (c,3)),
a -> List((a,1)),
d -> List((d,3)),
b -> List((b,1), (b,2)))
which is closer. To turn those lists into Sets of the second half of the pairs.
.transform {(k, v)=>v.unzip._2.toSet}
gives
Map(c -> Set(2, 3), a -> Set(1), d -> Set(3), b -> Set(1, 2))
QED :)
(*) YMMV
A simple, but maybe not super-elegant solution:
def reverse[A,B](m:Map[A,Set[B]])={
var r = Map[B,Set[A]]()
m.keySet foreach { k=>
m(k) foreach { e =>
r = r + (e -> (r.getOrElse(e, Set()) + k))
}
}
r
}
The easiest way I can think of is:
// unfold values to tuples (v,k)
// for all values v in the Set referenced by key k
def vk = for {
(k,vs) <- m.iterator
v <- vs.iterator
} yield (v -> k)
// fold iterator back into a map
(Map[String,Set[Int]]() /: vk) {
// alternative syntax: vk.foldLeft(Map[String,Set[Int]]()) {
case (m,(k,v)) if m contains k =>
// Map already contains a Set, so just add the value
m updated (k, m(k) + v)
case (m,(k,v)) =>
// key not in the map - wrap value in a Set and return updated map
m updated (k, Set(v))
}