Using groupBy on a List of Tuples in Scala

I tried to group a list of tuples in Scala.
The input:
val a = List((1,"a"), (2,"b"), (3,"c"), (1,"A"), (2,"B"))
I applied:
a.groupBy(e => e._1)
The output I get is:
Map[Int,List[(Int, String)]] = Map(2 -> List((2,b), (2,B)), 1 -> List((1,a), (1,A)), 3 -> List((3,c)))
This is slightly different from what I expect:
Map[Int,List[String]] = Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
What can I do to get the expected output?

You probably meant something like this:
a.groupBy(_._1).mapValues(_.map(_._2))
or:
a.groupBy(_._1).mapValues(_.unzip._2)
Result:
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))

If you do not want to use mapValues, is this what you are expecting?
a.groupBy(_._1).map(f => (f._1, f._2.map(_._2)))
Result:
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
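
If you are on Scala 2.13 or later (an assumption, since the question does not state a version), groupMap does the grouping and the projection in one call. A minimal sketch; the object name GroupTuples is made up for illustration:

```scala
object GroupTuples {
  val a = List((1, "a"), (2, "b"), (3, "c"), (1, "A"), (2, "B"))

  // groupMap(key)(value): group by the first tuple component and
  // keep only the second component in each group (Scala 2.13+).
  val grouped: Map[Int, List[String]] = a.groupMap(_._1)(_._2)
}
```

Unlike mapValues, this builds a strict Map, so the values are not recomputed on each access.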

Related

Scala: inverting a one-to-many relationship [duplicate]

I have:
val intsPerChar: List[(Char, List[Int])] = List(
'A' -> List(1,2,3),
'B' -> List(2,3)
)
I want to get a mapping of ints with the chars that they have a mapping with. ie, I want to get:
val charsPerInt: Map[Int, List[Char]] = Map(
1 -> List('A'),
2 -> List('A', 'B'),
3 -> List('A', 'B')
)
Currently, I am doing the following:
val numbers: List[Int] = intsPerChar.flatMap(_._2).distinct
numbers.map( n =>
  n -> intsPerChar.filter(_._2.contains(n)).map(_._1)
).toMap
Is there a less explicit way of doing this? Ideally some sort of groupBy.
Try
intsPerChar
.flatMap { case (c, ns) => ns.map((_, c)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
// Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))
Might be personal preference as to whether you consider it more or less readable, but the following is another option:
intsPerChar
.flatMap(n => n._2.map(i => i -> n._1)) // List((1,A), (2,A), (3,A), (2,B), (3,B))
.groupBy(_._1) // Map(2 -> List((2,A), (2,B)), 1 -> List((1,A)), 3 -> List((3,A), (3,B)))
.transform { (_, v) => v.unzip._2}
Final output is:
Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))
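
If you would rather avoid the intermediate list of pairs, the same inversion can be sketched with two nested folds. This is only an illustration; the names InvertOneToMany and invert are not from the question:

```scala
object InvertOneToMany {
  // For every (key, values) pair, append the key to the list stored
  // under each of its values, building the inverted map step by step.
  def invert[K, V](pairs: List[(K, List[V])]): Map[V, List[K]] =
    pairs.foldLeft(Map.empty[V, List[K]]) { case (acc, (k, vs)) =>
      vs.foldLeft(acc) { (inner, v) =>
        inner.updated(v, inner.getOrElse(v, Nil) :+ k)
      }
    }

  val charsPerInt: Map[Int, List[Char]] =
    invert(List('A' -> List(1, 2, 3), 'B' -> List(2, 3)))
}
```

The append with :+ is linear in the length of each list, which should be fine for small inputs like the one in the question.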

Convert RDD[(K,V)] to Map[K,List[V]]

How can I convert an RDD of Tuple2 (key, value) with duplicate keys into a Map[K,List[V]]?
Input example:
val list = List((1,"a"),(1,"b"),(2,"c"),(2,"d"))
val rdd = sparkContext.parallelize(list)
Output expected:
Map((1,List(a,b)),(2,List(c,d)))
Just use groupByKey, then collectAsMap:
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
rdd.groupByKey.collectAsMap
// res1: scala.collection.Map[Int,Iterable[String]] =
// Map(2 -> CompactBuffer(c, d), 1 -> CompactBuffer(a, b))
Alternatively, use map/reduceByKey then collectAsMap:
rdd.map{ case (k, v) => (k, Seq(v)) }.reduceByKey(_ ++ _).
collectAsMap
// res2: scala.collection.Map[Int,Seq[String]] =
// Map(2 -> List(c, d), 1 -> List(a, b))
You can also combine groupByKey, collectAsMap, and map to achieve this, as shown below:
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
val map=rdd.groupByKey.collectAsMap.map(x=>(x._1,x._2.toList))
Sample output:
Map(2 -> List(c, d), 1 -> List(a, b))

Convert Seq[Seq[String]] to Map[String,Seq[String]]

I need to convert this structure
val seq = Seq(Seq("a","aa"), Seq("b","bb"), Seq("a", "a2"), Seq("b","b2") )
to this Map:
val map2 = Map ( "a" -> Seq("aa","a2"), "b" -> Seq("bb","b2") )
I cannot use toMap because it only works with Tuple2 as input. Any ideas on how to approach this?
You can first group by the first item of each sub-seq and then map the resulting grouped values to only keep the second element of subsequences:
Seq(Seq("a","aa"), Seq("b","bb"), Seq("a", "a2"), Seq("b","b2") )
.groupBy(_(0)) // Map(b -> List(List(b, bb), List(b, b2)), a -> List(List(a, aa), List(a, a2)))
.mapValues(_.map(_(1))) // Map(b -> List(bb, b2), a -> List(aa, a2))
which returns:
Map(b -> List(bb, b2), a -> List(aa, a2))
Similar: instead of using _(0) and _(1) you could also use .groupBy(_.head).mapValues(_.map(_.last))
The mapValues part can be made a bit more explicit this way:
.mapValues{
case valueLists => // List(List(b, bb), List(b, b2))
valueLists.map{
case List(k, v) => v // List(b, bb) => bb
}
}
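
Another option is to turn each two-element sub-sequence into a tuple first and then group as usual. A rough sketch; SeqToMap and pairs are illustrative names, and it assumes you are happy to silently drop any sub-sequence that does not have exactly two elements:

```scala
object SeqToMap {
  val seq = Seq(Seq("a", "aa"), Seq("b", "bb"), Seq("a", "a2"), Seq("b", "b2"))

  // collect keeps only sub-sequences that match the two-element pattern
  // and converts them to (key, value) tuples.
  val pairs: Seq[(String, String)] = seq.collect { case Seq(k, v) => (k, v) }

  // Group by key, then keep only the values in each group.
  val result: Map[String, Seq[String]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
}
```

Using collect instead of _(0) and _(1) avoids an IndexOutOfBoundsException on a short sub-sequence, at the cost of ignoring malformed input rather than reporting it.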

Inverse a nested Map in Scala

I have a Map of type Map[A, Map[B, C]].
How can I inverse it to have a Map of type Map[B, Map[A, C]]?
There are lots of ways you could define this operation. I'll walk through a couple of the ones that I find the clearest. For the first implementation I'll start with a helper method:
def flattenNestedMap[A, B, C](nested: Map[A, Map[B, C]]): Map[(A, B), C] =
for {
(a, innerMap) <- nested
(b, c) <- innerMap
} yield (a, b) -> c
This flattens the nested map to a map from pairs to values. Next we can define another helper operation that gets us almost what we need.
def groupByBs[A, B, C](flattened: Map[(A, B), C]): Map[B, Map[(A, B), C]] =
flattened.groupBy(_._1._2)
Now we just need to remove the redundant B from the keys in the inner map:
def invert[A, B, C](nested: Map[A, Map[B, C]]): Map[B, Map[A, C]] =
groupByBs(flattenNestedMap(nested)).mapValues(
_.map {
case ((a, _), c) => a -> c
}
)
(Note that mapValues is lazy, which means that the result will be recomputed every time you use it. In general this isn't a problem, and there are easy workarounds, but they're not really relevant to the question.)
And we're done:
scala> invert(Map(1 -> Map(2 -> 3), 10 -> Map(2 -> 4)))
res0: Map[Int,Map[Int,Int]] = Map(2 -> Map(1 -> 3, 10 -> 4))
You could also skip the helper methods and just chain the operations in invert. I find breaking them up a little clearer, but that's a matter of style.
Alternatively you could use a couple of folds:
def invert[A, B, C](nested: Map[A, Map[B, C]]): Map[B, Map[A, C]] =
nested.foldLeft(Map.empty[B, Map[A, C]]) {
case (acc, (a, innerMap)) =>
innerMap.foldLeft(acc) {
case (innerAcc, (b, c)) =>
innerAcc.updated(b, innerAcc.getOrElse(b, Map.empty).updated(a, c))
}
}
Which does the same thing:
scala> invert(Map(1 -> Map(2 -> 3), 10 -> Map(2 -> 4)))
res1: Map[Int,Map[Int,Int]] = Map(2 -> Map(1 -> 3, 10 -> 4))
The foldLeft version has more of the shape of the straightforward imperative version—we're (functionally) iterating through the key-value pairs of the outer and inner maps and building up the result. Off the top of my head I'd guess it's also a little more efficient, but I'm not sure about that, and it's unlikely to matter much, so I'd suggest choosing the one you personally find clearer.
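Regarding the note about mapValues being lazy: one possible workaround is to replace it with a plain map over the grouped entries, which builds a strict Map. This also sidesteps the deprecation of Map.mapValues in Scala 2.13. A sketch only; StrictInvert and flattened are illustrative names:

```scala
object StrictInvert {
  def invert[A, B, C](nested: Map[A, Map[B, C]]): Map[B, Map[A, C]] = {
    // Flatten to (b, (a, c)) entries so that grouping by b is direct.
    val flattened: Seq[(B, (A, C))] = for {
      (a, inner) <- nested.toSeq
      (b, c)     <- inner.toSeq
    } yield (b, (a, c))

    // map (rather than mapValues) evaluates each inner map exactly once.
    flattened.groupBy(_._1).map { case (b, entries) =>
      b -> entries.map(_._2).toMap
    }
  }

  val inverted: Map[Int, Map[Int, Int]] =
    invert(Map(1 -> Map(2 -> 3), 10 -> Map(2 -> 4)))
}
```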
You can do it using the map operation on the given Map (note that the result is an Iterable of single-entry maps, not one merged Map):
scala> Map("A" -> Map("B" -> "C"), "X" -> Map("Y" -> "Z"))
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,String]] = Map(A -> Map(B -> C), X -> Map(Y -> Z))
scala> res1.map{ case (key, valueMap) => valueMap.map{ case (vmKey, vmValue) => (vmKey -> Map(key -> vmValue)) } }
res2: scala.collection.immutable.Iterable[scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,String]]] = List(Map(B -> Map(A -> C)), Map(Y -> Map(X -> Z)))

Scala troubles with sorting

I am still learning Scala and have run into a problem that I would like to solve.
What I have at the moment is a Seq of items of type X. Now I want to write a function that returns a map from a number of occurrences to the set of items that appear that many times in the original seq.
Here is a small example of what I want to do:
val exampleSeq = Seq('a', 'b', 'd', 'd', 'c', 'b', 'd')
val exampleSeq2 = Seq('a', 'a', 'a', 'c', 'c', 'b', 'b', 'c')
myMagicalFunction(exampleSeq) // returns Map(1 -> Set(a, c), 2 -> Set(b), 3 -> Set(d))
myMagicalFunction(exampleSeq2) // returns Map(2 -> Set(b), 3 -> Set(a, c))
So far I have been able to create a function that maps each item to the number of times it appears:
def function[X](seq: Seq[X]) = seq.groupBy(item => item).mapValues(_.size)
Return for my exampleSeq from that one is
Map(a -> 1, b -> 2, c -> 1, d -> 3)
Thank you for answers :)
One approach, for
val a = Seq('a', 'b', 'd', 'd', 'c', 'b', 'd')
this
val b = for ( (k,v) <- a.groupBy(identity).mapValues(_.size).toArray )
yield (v,k)
delivers
Array((2,b), (3,d), (1,a), (1,c))
and so
b.groupBy(_._1).mapValues(_.map(_._2).toSet)
res: Map(2 -> Set(b), 1 -> Set(a, c), 3 -> Set(d))
Note seq.groupBy(item => item) is equivalent to seq.groupBy(identity).
You are almost there! Starting from the collection element -> count, you only need one more transformation to get to count -> Col[elem].
Let's say that freqItem = Map(a -> 1, b -> 2, c -> 1, d -> 3); you would do something like:
val freqSet = freqItem.toSeq.map(_.swap).groupBy(_._1).mapValues(_.map(_._2).toSet)
Note that we transform the Map into a Seq before swapping the (k,v) into (v,k) because mapping over a Map preserves key uniqueness, and you'd otherwise lose one of (1 -> a), (1 -> c).
You can write your function as:
def f[T](l: Seq[T]): Map[Int, Set[T]] = {
l.map {
x => (x, l.count(_ == x))
}.distinct.groupBy(_._2).mapValues(_.map(_._1).toSet)
}
val l = List("a","a","a","b","b","b","b","c","c","d","e")
f(l)
res0: Map[Int,Set[String]] = Map(2 -> Set(c), 4 -> Set(b), 1 -> Set(d, e), 3 -> Set(a))
scala> case class A(name:String,age:Int)
defined class A
scala> val l = List(new A("a",1),new A("b",2),new A("a",1),new A("c",1) )
l: List[A] = List(A(a,1), A(b,2), A(a,1), A(c,1))
scala> f[A](l)
res1: Map[Int,Set[A]] = Map(2 -> Set(A(a,1)), 1 -> Set(A(b,2), A(c,1)))
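Note that calling l.count for every element rescans the whole sequence each time. If that matters for your input size, the two steps from the earlier answers (count per item, then invert) can be combined as below. A minimal sketch; FrequencyGroups and byFrequency are made-up names, and it uses map instead of mapValues so that it behaves the same on Scala 2.12 and 2.13:

```scala
object FrequencyGroups {
  def byFrequency[T](items: Seq[T]): Map[Int, Set[T]] = {
    // Step 1: item -> number of occurrences.
    val counts: Map[T, Int] =
      items.groupBy(identity).map { case (item, occ) => item -> occ.size }

    // Step 2: convert to a Seq first so that entries with equal counts
    // are not collapsed, then group by count and keep only the items.
    counts.toSeq.groupBy(_._2).map { case (n, pairs) =>
      n -> pairs.map(_._1).toSet
    }
  }
}
```

For the first example from the question, FrequencyGroups.byFrequency(Seq('a', 'b', 'd', 'd', 'c', 'b', 'd')) gives Map(1 -> Set(a, c), 2 -> Set(b), 3 -> Set(d)), though the printed entry order may differ.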