Scala: inverting a one-to-many relationship [duplicate]

This question already has answers here:
Elegant way to invert a map in Scala
(10 answers)
Closed 3 years ago.
I have:
val intsPerChar: List[(Char, List[Int])] = List(
'A' -> List(1,2,3),
'B' -> List(2,3)
)
I want to map each int to the chars it is paired with, i.e. I want to get:
val charsPerInt: Map[Int, List[Char]] = Map(
1 -> List('A'),
2 -> List('A', 'B'),
3 -> List('A', 'B')
)
Currently, I am doing the following:
val numbers: List[Int] = intsPerChar.flatMap(_._2).distinct
numbers.map( n =>
n -> intsPerChar.filter(_._2.contains(n)).map(_._1)
).toMap
Is there a less explicit way of doing this? Ideally some sort of groupBy.

Try
intsPerChar
.flatMap { case (c, ns) => ns.map((_, c)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
// Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))
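On Scala 2.13 and later, `Map#mapValues` is deprecated and returns a lazy `MapView` rather than a `Map`. A minimal sketch of the same inversion, assuming Scala 2.13+, uses `groupMap`, which groups and transforms the values in one step (the object and method names are only illustrative):

```scala
object InvertWithGroupMap {
  // Assumes Scala 2.13+, where groupMap is defined on collections.
  def invert(intsPerChar: List[(Char, List[Int])]): Map[Int, List[Char]] =
    intsPerChar
      .flatMap { case (c, ns) => ns.map(n => (n, c)) } // one (int, char) pair per mapping
      .groupMap(_._1)(_._2)                             // key by the int, keep only the char
}
```

Called on the question's `intsPerChar`, this should give `Map(1 -> List(A), 2 -> List(A, B), 3 -> List(A, B))`, with the chars in each list kept in their original order.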

Might be personal preference as to whether you consider it more or less readable, but the following is another option:
intsPerChar
.flatMap(n => n._2.map(i => i -> n._1)) // List((1,A), (2,A), (3,A), (2,B), (3,B))
.groupBy(_._1) // Map(2 -> List((2,A), (2,B)), 1 -> List((1,A)), 3 -> List((3,A), (3,B)))
.transform { (_, v) => v.unzip._2}
Final output is:
Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))

Related

Convert RDD[(K,V)] to Map[K,List[V]]

How can I convert an RDD of Tuple2 (Key, Value) with duplicate keys into a Map[K,List[V]]?
Input example:
val list = List((1,"a"),(1,"b"),(2,"c"),(2,"d"))
val rdd = sparkContext.parallelize(list)
Output expected:
Map((1,List(a,b)),(2,List(c,d)))
Just use groupByKey, then collectAsMap:
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
rdd.groupByKey.collectAsMap
// res1: scala.collection.Map[Int,Iterable[String]] =
// Map(2 -> CompactBuffer(c, d), 1 -> CompactBuffer(a, b))
Alternatively, use map/reduceByKey then collectAsMap:
rdd.map{ case (k, v) => (k, Seq(v)) }.reduceByKey(_ ++ _).
collectAsMap
// res2: scala.collection.Map[Int,Seq[String]] =
// Map(2 -> List(c, d), 1 -> List(a, b))
You can use groupByKey, collectAsMap and map to achieve this, like below:
val rdd = sc.parallelize(List((1,"a"),(1,"b"),(2,"c"),(2,"d")))
val map = rdd.groupByKey.collectAsMap.map { case (k, vs) => (k, vs.toList) }
Sample output:
Map(2 -> List(c, d), 1 -> List(a, b))

Using groupBy on a List of Tuples in Scala

I tried to group a list of tuples in Scala.
The input:
val a = List((1,"a"), (2,"b"), (3,"c"), (1,"A"), (2,"B"))
I applied:
a.groupBy(e => e._1)
The output I get is:
Map[Int,List[(Int, String)]] = Map(2 -> List((2,b), (2,B)), 1 -> List((1,a), (1,A)), 3 -> List((3,c)))
This is slightly different from what I expect:
Map[Int,List[String]] = Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
What can I do to get the expected output?
You probably meant something like this:
a.groupBy(_._1).mapValues(_.map(_._2))
or:
a.groupBy(_._1).mapValues(_.unzip._2)
Result:
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
If you do not want to use mapValues, is this what you are expecting?
a.groupBy(_._1).map(f => (f._1, f._2.map(_._2)))
Result
Map(2 -> List(b, B), 1 -> List(a, A), 3 -> List(c))
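A caveat about the first answer: on Scala 2.13+, `mapValues` on a `Map` is deprecated and returns a lazy `MapView`. A sketch of a strict equivalent, assuming Scala 2.13+, goes through `.view` and finishes with `.toMap`:

```scala
object StrictMapValues {
  val a = List((1, "a"), (2, "b"), (3, "c"), (1, "A"), (2, "B"))

  // Scala 2.13+: Map#mapValues is deprecated and lazy (it returns a MapView).
  // Using .view.mapValues and ending with .toMap yields a strict immutable Map.
  val grouped: Map[Int, List[String]] =
    a.groupBy(_._1).view.mapValues(_.map(_._2)).toMap
}
```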

Scala troubles with sorting

I am still learning Scala and have run into a problem I would like to solve.
I have a Seq of items of type X. I want a function that returns a map from each number to the set of items that appear exactly that many times in the original Seq.
Here is a small example of what I want to do:
val exampleSeq: Seq[Char] = Seq('a', 'b', 'd', 'd', 'c', 'b', 'd')
val exampleSeq2: Seq[Char] = Seq('a', 'a', 'a', 'c', 'c', 'b', 'b', 'c')
myMagicalFunction(exampleSeq) // should return Map(1 -> Set(a, c), 2 -> Set(b), 3 -> Set(d))
myMagicalFunction(exampleSeq2) // should return Map(2 -> Set(b), 3 -> Set(a, c))
So far I have been able to write a function that maps each item to the number of times it appears:
def itemCounts[X](seq: Seq[X]) = seq.groupBy(item => item).mapValues(_.size)
For my exampleSeq it returns
Map(a -> 1, b -> 2, c -> 1, d -> 3)
Thank you for the answers :)
One approach, for
val a = Seq('a', 'b', 'd', 'd', 'c', 'b', 'd')
this
val b = for ( (k,v) <- a.groupBy(identity).mapValues(_.size).toArray )
yield (v,k)
delivers
Array((2,b), (3,d), (1,a), (1,c))
and so
b.groupBy(_._1).mapValues(_.map(_._2).toSet)
res: Map(2 -> Set(b), 1 -> Set(a, c), 3 -> Set(d))
Note that seq.groupBy(item => item) is equivalent to seq.groupBy(identity).
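The two steps above (count, then invert) can be sketched as one function, assuming Scala 2.13+ (`groupMapReduce` and `groupMap` do not exist in 2.12); `itemsByCount` is a hypothetical name standing in for `myMagicalFunction`:

```scala
object CountGroups {
  // Assumes Scala 2.13+.
  def itemsByCount[X](seq: Seq[X]): Map[Int, Set[X]] = {
    // Count occurrences in a single pass: item -> number of times it appears.
    val counts: Map[X, Int] = seq.groupMapReduce(identity)(_ => 1)(_ + _)
    counts.toSeq                           // a Seq, so equal counts cannot collide
      .groupMap(_._2)(_._1)                // count -> all items with that count
      .map { case (n, xs) => n -> xs.toSet }
  }
}
```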
You are almost there! Starting from the collection element -> count, you only need one more transformation to get to count -> Col[elem].
Let's say that freqItem = Map(a -> 1, b -> 2, c -> 1, d -> 3); you would do something like:
val freqSet = freqItem.toSeq.map(_.swap).groupBy(_._1).mapValues(_.map(_._2).toSet)
Note that we transform the Map into a Seq before swapping each (k,v) into (v,k): mapping over a Map produces a Map again, which keeps keys unique, so you'd lose one of (1 -> a), (1 -> c) otherwise.
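A small sketch of that collision, using the answer's example `freqItem` with Char items (the object and field names are illustrative):

```scala
object SwapPitfall {
  val freqItem: Map[Char, Int] = Map('a' -> 1, 'b' -> 2, 'c' -> 1, 'd' -> 3)

  // Swapping on the Map itself builds a new Map keyed by count:
  // 'a' and 'c' both have count 1, so only one of them survives.
  val lossy: Map[Int, Char] = freqItem.map(_.swap)

  // Swapping on a Seq keeps every (count, item) pair for the later groupBy.
  val kept: Seq[(Int, Char)] = freqItem.toSeq.map(_.swap)
}
```

`lossy` ends up with three entries while `kept` still holds all four pairs.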
You can write your function as:
def f[T](l: Seq[T]): Map[Int, Set[T]] = {
l.map {
x => (x, l.count(_ == x))
}.distinct.groupBy(_._2).mapValues(_.map(_._1).toSet)
}
val l = List("a","a","a","b","b","b","b","c","c","d","e")
f(l)
res0: Map[Int,Set[String]] = Map(2 -> Set(c), 4 -> Set(b), 1 -> Set(d, e), 3 -> Set(a))
scala> case class A(name:String,age:Int)
defined class A
scala> val l = List(new A("a",1),new A("b",2),new A("a",1),new A("c",1) )
l: List[A] = List(A(a,1), A(b,2), A(a,1), A(c,1))
scala> f[A](l)
res1: Map[Int,Set[A]] = Map(2 -> Set(A(a,1)), 1 -> Set(A(b,2), A(c,1)))

Map a single entry of a Map

I want to achieve something like the following:
(_ : Map[K,Int]).mapKey(k, _ + 1)
The mapKey function applies its second argument (Int => Int) only to the value stored under k. Is there something like this in the standard library? If not, I bet there's something in Scalaz.
Of course I can write this function myself (m.updated(k, f(m(k)))), and it's simple to do so. But I've come across this problem several times, so maybe it's already been done?
For Scalaz I imagine something along the lines of the following:
(m: Map[A,B]).project(k: A).map(f: B => B): Map[A,B]
You could of course add
def changeForKey[A,B](a: A, fun: B => B): Tuple2[A, B] => Tuple2[A, B] = { kv =>
kv match {
case (`a`, b) => (a, fun(b))
case x => x
}
}
val theMap = Map('a -> 1, 'b -> 2)
theMap map changeForKey('a, (_: Int) + 1)
res0: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 2, 'b -> 2)
But this rebuilds the whole map by visiting every entry, so it circumvents any optimisation regarding memory re-use and access.
I also came up with a rather verbose and inefficient Scalaz solution using a zipper for your proposed project method:
theMap.toStream.toZipper.flatMap(_.findZ(_._1 == 'a).flatMap(elem => elem.delete.map(_.insert((elem.focus._1, fun(elem.focus._2)))))).map(_.toStream.toMap)
or
(for {
z <- theMap.toStream.toZipper
elem <- z.findZ(_._1 == 'a)
z2 <- elem.delete
} yield z2.insert((elem.focus._1, fun(elem.focus._2)))).map(_.toStream.toMap)
Probably of little use. I’m just posting for reference.
Here is one way:
scala> val m = Map(2 -> 3, 5 -> 11)
m: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 5 -> 11)
scala> m ++ (2, m.get(2).map(1 +)).sequence
res53: scala.collection.immutable.Map[Int,Int] = Map(2 -> 4, 5 -> 11)
scala> m ++ (9, m.get(9).map(1 +)).sequence
res54: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 5 -> 11)
This works because (A, Option[B]).sequence gives Option[(A, B)]. (In general, sequence turns types inside out, i.e. F[G[A]] => G[F[A]], given F : Traverse and G : Applicative.)
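The same "Option inside out" idea can be sketched without Scalaz by building the `Option[(K, V)]` directly and appending it with `++`, which adds either zero or one entries. This is a plain standard-library sketch, and `adjust` is a hypothetical name:

```scala
object OptionInsideOut {
  // m.get(k).map(...) is Some((k, f(v))) if k is present, None otherwise;
  // ++ then overwrites that one entry or leaves the map untouched.
  def adjust[K, V](m: Map[K, V], k: K)(f: V => V): Map[K, V] =
    m ++ m.get(k).map(v => k -> f(v))
}
```

With `m = Map(2 -> 3, 5 -> 11)`, adjusting key 2 by `1 + _` gives `Map(2 -> 4, 5 -> 11)`, and adjusting key 9 leaves `m` unchanged, matching the REPL session above.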
You can "pimp" Map with an extension method that creates a new map based on the old one:
class MapUtils[A, B](map: Map[A, B]) {
def mapValueAt(a: A)(f: (B) => B) = map.get(a) match {
case Some(b) => map + (a -> f(b))
case None => map
}
}
implicit def toMapUtils[A, B](map: Map[A, B]) = new MapUtils(map)
val m = Map(1 -> 1)
m.mapValueAt(1)(_ + 1)
// Map(1 -> 2)
m.mapValueAt(2)(_ + 1)
// Map(1 -> 1)
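On the standard-library part of the question: since Scala 2.13, immutable `Map` has `updatedWith`, which passes the current value as an `Option` and stores whatever `Option` the function returns (returning `None` removes the key). A minimal sketch, assuming Scala 2.13+, that behaves like `mapValueAt` above:

```scala
object UpdateOneKey {
  // Assumes Scala 2.13+. For a present key, f is applied to its value;
  // for a missing key, None.map(f) stays None, so the map is unchanged.
  def mapValueAt[K, V](m: Map[K, V], k: K)(f: V => V): Map[K, V] =
    m.updatedWith(k)(_.map(f))
}
```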

Reverse / transpose a one-to-many map in Scala

What is the best way to turn a Map[A, Set[B]] into a Map[B, Set[A]]?
For example, how do I turn a
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
into a
Map("a" -> Set(1),
"b" -> Set(1, 2),
"c" -> Set(2, 3),
"d" -> Set(3))
(I'm using immutable collections only here. And my real problem has nothing to do with strings or integers. :)
With help from aioobe and Moritz:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_)(v)))).toMap
It's a bit more readable if you explicitly call contains:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_).contains(v)))).toMap
Best I've come up with so far is
val intToStrs = Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
def mappingFor(key: String) =
intToStrs.keys.filter(intToStrs(_) contains key).toSet
val newKeys = intToStrs.values.flatten
val inverseMap = newKeys.map(newKey => (newKey -> mappingFor(newKey))).toMap
Or another one using folds:
def reverse2[A,B](m:Map[A,Set[B]])=
m.foldLeft(Map[B,Set[A]]()){case (r,(k,s)) =>
s.foldLeft(r){case (r,e)=>
r + (e -> (r.getOrElse(e, Set()) + k))
}
}
Here's a one-statement solution:
originalMap
.map{case (k, v)=>v.map{v2=>(v2,k)}}
.flatten
.groupBy{_._1}
.transform {(k, v)=>v.unzip._2.toSet}
This bit rather neatly (*) produces the tuples needed to construct the reverse map
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
.map{case (k, v)=>v.map{v2=>(v2,k)}}.flatten
produces
List((a,1), (b,1), (b,2), (c,2), (c,3), (d,3))
Converting it directly to a map would overwrite the values for duplicate keys, though.
Adding .groupBy{_._1} gets this
Map(c -> List((c,2), (c,3)),
a -> List((a,1)),
d -> List((d,3)),
b -> List((b,1), (b,2)))
which is closer. To turn those lists into Sets of the second halves of the pairs, add:
.transform {(k, v)=>v.unzip._2.toSet}
gives
Map(c -> Set(2, 3), a -> Set(1), d -> Set(3), b -> Set(1, 2))
QED :)
(*) YMMV
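On Scala 2.13+ the flatten/groupBy/transform steps can be collapsed with `groupMapReduce`, which groups the swapped pairs and merges their values as it goes. A sketch under that version assumption (`Transpose.reverse` is an illustrative name):

```scala
object Transpose {
  // Assumes Scala 2.13+, where groupMapReduce is available.
  def reverse[A, B](m: Map[A, Set[B]]): Map[B, Set[A]] =
    m.toSeq
      .flatMap { case (a, bs) => bs.toSeq.map(b => (b, a)) } // one (b, a) pair per membership
      .groupMapReduce(_._1)(pair => Set(pair._2))(_ ++ _)    // key by b, union the a's
}
```

Applied to the question's `Map(1 -> Set("a", "b"), 2 -> Set("b", "c"), 3 -> Set("c", "d"))`, this should produce the expected `Map("a" -> Set(1), "b" -> Set(1, 2), "c" -> Set(2, 3), "d" -> Set(3))`.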
A simple, but maybe not super-elegant solution:
def reverse[A,B](m:Map[A,Set[B]])={
var r = Map[B,Set[A]]()
m.keySet foreach { k=>
m(k) foreach { e =>
r = r + (e -> (r.getOrElse(e, Set()) + k))
}
}
r
}
The easiest way I can think of is:
def reverse(m: Map[Int, Set[String]]): Map[String, Set[Int]] = {
// unfold values to tuples (v,k)
// for all values v in the Set referenced by key k
def vk = for {
(k,vs) <- m.iterator
v <- vs.iterator
} yield (v -> k)
// fold iterator back into a map
// older alternative syntax (deprecated in 2.13): (Map[String,Set[Int]]() /: vk) {
vk.foldLeft(Map[String,Set[Int]]()) {
case (acc,(k,v)) if acc contains k =>
// Map already contains a Set, so just add the value
acc updated (k, acc(k) + v)
case (acc,(k,v)) =>
// key not in the map - wrap value in a Set and return updated map
acc updated (k, Set(v))
}
}