I am new to Scala. I am trying to flatten the lists and invert the mapping. For example, I have a map like this:
Map("abc"->List(1,2,3),"def"->List(1,5,6))
I want the result to be :
Map(1->List("abc","def"),2->List("abc"),3->List("abc"),5->List("def"),6->List("def"))
What is the best way to achieve this?
scala> val mm = Map("abc"->List(1,2,3),"def"->List(1,5,6))
mm: scala.collection.immutable.Map[String,List[Int]] = Map(abc -> List(1, 2, 3), def -> List(1, 5, 6))
scala> mm.toList.flatMap{ case (s, l) => l.map(ll => (ll, s))}.groupBy(_._1).map{ case (i, l) => (i, l.map(_._2))}
res9: scala.collection.immutable.Map[Int,List[String]] = Map(5 -> List(def), 1 -> List(abc, def), 6 -> List(def), 2 -> List(abc), 3 -> List(abc))
UPDATE:
A slightly different solution I like better:
mm.toList.flatMap{ case (s, l) =>
l.map(li => (li, s))
}.foldLeft(Map.empty[Int, List[String]]){
case (m, (i, s)) => m.updated(i, s :: m.getOrElse(i, List.empty))
}
Here is a simple way to do it:
val data = Map("abc"->List(1,2,3),"def"->List(1,5,6))
val list = data.toList.flatMap(x => {
x._2.map(y => (y, x._1))
}).groupBy(_._1).map(x => (x._1, x._2.map(_._2)))
Output:
(5,List(def))
(1,List(abc, def))
(6,List(def))
(2,List(abc))
(3,List(abc))
Hope this helps!
Here is one more way of doing this:
Map("abc" -> List(1,2,3), "def"-> List(1,5,6)).flatMap {
case (key, values) => values.map(elem => Map(elem -> key))
}.flatten.foldRight(Map.empty[Int, List[String]]) { (elem, acc) =>
val (key, value) = elem
if (acc.contains(key)) {
val newValues = acc(key) ++ List(value)
(acc - key) ++ Map(key -> newValues)
} else {
acc ++ Map(key -> List(value))
}
}
So basically what I do is to go over the initial Map, transform that to a tuple and then do a foldRight and group identical keys into the accumulator.
This is a bit more verbose than the other solutions posted here, but I prefer to avoid underscores in my implementations as much as possible.
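The same idea (turn every entry into (value, key) pairs, then fold the pairs into an accumulator) can probably be written a little shorter while still avoiding underscores. This is only a sketch: InvertByFold and invert are names I made up for illustration, and it uses foldLeft with append so that keys keep their original order.

```scala
object InvertByFold {
  // Hypothetical helper: flattens each (key, values) entry into (value, key)
  // pairs, then folds the pairs into a Map, appending keys that share a value.
  def invert(m: Map[String, List[Int]]): Map[Int, List[String]] = {
    val pairs = for {
      (key, values) <- m.toList
      value <- values
    } yield (value, key)

    pairs.foldLeft(Map.empty[Int, List[String]]) { case (acc, (value, key)) =>
      acc.updated(value, acc.getOrElse(value, List.empty[String]) :+ key)
    }
  }
}
```

Calling InvertByFold.invert on the map from the question should give the requested result, with 1 mapped to List("abc", "def").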
Another way to invert the Map:
val m = Map("abc" -> List(1, 2, 3), "def" -> List(1, 5, 6))
m.map{ case (k, v) => v.map((_, k)) }.flatten.
groupBy(_._1).mapValues( _.map(_._2) )
// res1: scala.collection.immutable.Map[Int,scala.collection.immutable.Iterable[String]] = Map(
// 5 -> List(def), 1 -> List(abc, def), 6 -> List(def), 2 -> List(abc), 3 -> List(abc)
// )
I am trying to reverse a map that has a String as the key and a set of numbers as its value
My goal is to create a list that contains a tuple of a number and a list of strings that had the same number in the value set
I have this so far:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] = {
toFlip.flatMap(_._2).map(x => (x, toFlip.keys.toList)).toList
}
but it assigns every String to every Int:
val map = Map(
"A" -> Set(1,2),
"B" -> Set(2,3)
)
should produce:
List((1, List(A)), (2, List(A, B)), (3, List(B)))
but is producing:
List((1, List(A, B)), (2, List(A, B)), (3, List(A, B)))
This works too, but it does not return exactly the type you asked for, so you may need some conversions to get there:
toFlip.foldLeft(Map.empty[Int, Set[String]]) {
case (acc, (key, numbersSet)) =>
numbersSet.foldLeft(acc) {
(updatingMap, newNumber) =>
updatingMap.updatedWith(newNumber) {
case Some(existingSet) => Some(existingSet + key)
case None => Some(Set(key))
}
}
}
I used Set to avoid duplicate key insertions in the inner List, and used Map for better lookup instead of the outer List.
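If the exact List[(Int, List[String])] shape from the question is needed, a conversion along these lines might do. It is a sketch only: FlipConversions and toSortedList are hypothetical names, and sorting both levels is my own assumption to get a stable, predictable output.

```scala
object FlipConversions {
  // Hypothetical helper: converts the folded Map[Int, Set[String]] into the
  // List[(Int, List[String])] shape, sorting numbers and keys for stable output.
  def toSortedList(flipped: Map[Int, Set[String]]): List[(Int, List[String])] =
    flipped.toList
      .map { case (number, keys) => (number, keys.toList.sorted) }
      .sortBy { case (number, _) => number }
}
```

For the example input, passing the folded map through toSortedList should yield List((1, List(A)), (2, List(A, B)), (3, List(B))).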
You can do something like this:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] =
toFlip
.toList
.flatMap {
case (key, values) =>
values.map(value => value -> key)
}.groupMap(_._1)(_._2)
.view
.mapValues(_.distinct)
.toList
Note: I personally would return a Map instead of a List.
Or, if you have cats in scope:
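import cats.implicits._ // assumed import: provides combineAll and the Monoid instances for Map and Set
def flipMap(toFlip: Map[String, Set[Int]]): Map[Int, Set[String]] =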
toFlip.view.flatMap {
case (key, values) =>
values.map(value => Map(value -> Set(key)))
}.toList.combineAll
// both scala2 & scala3
scala> map.flatten{ case(k, s) => s.map(v => (k, v)) }.groupMapReduce{ case(k, v) => v }{case(k, v) => List(k)}{ _ ++ _ }
val res0: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
// scala3 only
scala> map.flatten((k, s) => s.map(v => (k, v))).groupMapReduce((k, v) => v)((k, v) => List(k))( _ ++ _ )
val res1: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
I've tried to simplify some real code, but not too much.
Given the following input (the implementations of f and g are just examples; the real ones are more complicated):
scala> val m = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)
scala> val f : Int => Option[Int] = i => if (i % 2 == 0) Some(i) else None
f: Int => Option[Int] = <function1>
scala> val g = (a:Int, l:List[Int]) => a :: l
g: (Int, List[Int]) => List[Int] = <function2>
Here is the process :
m.foldLeft(List[Int]()) { case (l, (k, v)) =>
f(v) match {
case Some(w) => g(w, l)
case None => l
}
}
Is it possible to use scalaz to better reveal the intent?
I'm thinking about m.traverseS.
If g really needs to handle the whole List[Int] then I immediately thought of Endo (requires slightly rewriting g):
val f: Int => Iterable[Int] = ???
val g: Int => Endo[List[Int]] = ???
val m = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
Foldable[List].fold(m.values.toList flatMap f map g).apply(List[Int]())
I'm not sure that's much clearer though.
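For comparison, here is a sketch of the same "compose one function per kept value" idea using only the standard library. It is not the scalaz API. ComposeSketch, run, steps and noOp are names I introduced for illustration, and g is assumed to be rewritten in curried form as suggested above.

```scala
object ComposeSketch {
  // f and g as in the question, with g curried so that g(a) is a List[Int] => List[Int]
  val f: Int => Option[Int] = i => if (i % 2 == 0) Some(i) else None
  val g: Int => List[Int] => List[Int] = a => l => a :: l

  def run(m: Map[String, Int]): List[Int] = {
    // one list-transforming function per value that f keeps
    val steps: List[List[Int] => List[Int]] =
      m.values.toList.flatMap(v => f(v).toList).map(g)
    val noOp: List[Int] => List[Int] = l => l
    // compose right to left, so the first step is applied last
    val combined = steps.foldRight(noOp)((step, rest) => step compose rest)
    combined(Nil)
  }
}
```

Note that with this composition order the first kept value ends up at the head of the result, whereas the foldLeft in the question prepends as it goes and so produces the reverse order.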
m.collect{ case(s, i) => (s, f(i))}
.filter{ case (s,i) => i.isDefined }
.toList
.traverseS({s => State({ l: List[Int] => (g(s._2.get, l), ())})})
.run(Nil)
._1
.reverse
In Scala's Map (see API), what is the difference in semantics and performance between mapValues and transform?
For any given map, for instance
val m = Map( "a" -> 2, "b" -> 3 )
both
m.mapValues(_ * 5)
m.transform( (k,v) => v * 5 )
deliver the same result.
Let's say we have a Map[A,B]. For clarification: I'm always referring to an immutable Map.
mapValues takes a function B => C, where C is the new type for the values.
transform takes a function (A, B) => C, where this C is also the type for the values.
So both will result in a Map[A,C].
However with the transform function you can influence the result of the new values by the value of their keys.
For example:
val m = Map( "a" -> 2, "b" -> 3 )
m.transform((key, value) => key + value) //Map[String, String](a -> a2, b -> b3)
Doing this with mapValues will be quite hard.
The next difference is that transform is strict, whereas mapValues will give you only a view, which will not store the updated elements. It looks like this:
protected class MappedValues[C](f: B => C) extends AbstractMap[A, C] with DefaultMap[A, C] {
override def foreach[D](g: ((A, C)) => D): Unit = for ((k, v) <- self) g((k, f(v)))
def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
override def size = self.size
override def contains(key: A) = self.contains(key)
def get(key: A) = self.get(key).map(f)
}
(taken from https://github.com/scala/scala/blob/v2.11.2/src/library/scala/collection/MapLike.scala#L244)
So performance depends on how the result is used. If f is expensive and you only access a few elements of the resulting map, mapValues might be better, since f is only applied on demand (though it is applied again on every access). Otherwise I would stick to map or transform.
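The on-demand behaviour can be made visible by counting how often the function runs. This is a small sketch that assumes Scala 2.13 or later, where the view is requested explicitly with .view.mapValues. LazyVsStrict and demo are made-up names.

```scala
object LazyVsStrict {
  // Returns (view calls before any lookup, transform calls before any lookup,
  //          view calls after two lookups, transform calls after two lookups).
  def demo(): (Int, Int, Int, Int) = {
    val m = Map("a" -> 2, "b" -> 3, "c" -> 4)
    var viewCalls = 0
    var strictCalls = 0

    // nothing is computed here; the function runs on each lookup
    val viewed = m.view.mapValues { v => viewCalls += 1; v * 5 }
    // every entry is computed right away, exactly once
    val strict = m.transform { (_, v) => strictCalls += 1; v * 5 }
    val before = (viewCalls, strictCalls)

    val fromView = viewed("a") + viewed("a")
    val fromStrict = strict("a") + strict("a")
    assert(fromView == fromStrict)

    (before._1, before._2, viewCalls, strictCalls)
  }
}
```

Under those assumptions demo() should return (0, 3, 2, 3): the view does no work up front but recomputes on both lookups, while transform does all three computations once and none afterwards.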
transform can also be expressed with map. Assume m: Map[A,B] and f: (A,B) => C, then
m.transform(f) is equivalent to m.map{case (a, b) => (a, f(a, b))}
collection.Map doesn't provide transform: it has a different signature for mutable and immutable Maps.
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val im = Map('a -> 1, 'b -> 2, 'c -> 3)
im: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 1, 'b -> 2, 'c -> 3)
scala> im.mapValues(_ * 7) eq im
res0: Boolean = false
scala> im.transform { case (k,v) => v*7 } eq im
res2: Boolean = false
scala> val mm = collection.mutable.Map('a -> 1, 'b -> 2, 'c -> 3)
mm: scala.collection.mutable.Map[Symbol,Int] = Map('b -> 2, 'a -> 1, 'c -> 3)
scala> mm.mapValues(_ * 7) eq mm
res3: Boolean = false
scala> mm.transform { case (k,v) => v*7 } eq mm
res5: Boolean = true
Mutable transform mutates in place:
scala> mm.transform { case (k,v) => v*7 }
res6: mm.type = Map('b -> 98, 'a -> 49, 'c -> 147)
scala> mm.transform { case (k,v) => v*7 }
res7: mm.type = Map('b -> 686, 'a -> 343, 'c -> 1029)
So mutable transform doesn't change the type of the map:
scala> im mapValues (_ => "hi")
res12: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
scala> mm mapValues (_ => "hi")
res13: scala.collection.Map[Symbol,String] = Map('b -> hi, 'a -> hi, 'c -> hi)
scala> mm.transform { case (k,v) => "hi" }
<console>:9: error: type mismatch;
found : String("hi")
required: Int
mm.transform { case (k,v) => "hi" }
^
scala> im.transform { case (k,v) => "hi" }
res15: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
...as can happen when constructing a new map.
Here's a couple of unmentioned differences:
mapValues creates a Map that is NOT serializable, without any indication that it's just a view (the type is Map[_, _], but just try to send one across the wire).
Since mapValues is just a view, every instance contains the real Map - which could be another result of mapValues. Imagine you have an actor with some state, and every mutation of the state sets the new state to be a mapValues on the previous state...in the end you have deeply nested maps with a copy of each previous state of the actor (and, yes, both of these are from experience).
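One way to sidestep both problems, sketched here under the assumption of Scala 2.13 or later, is to materialize the view into a plain immutable Map before storing it. ForceView and step are hypothetical names for such a state-update helper.

```scala
object ForceView {
  // Hypothetical state-update step: apply f to every value and materialize the
  // result with toMap, so the new state is a plain Map rather than a view
  // wrapping the previous state.
  def step(state: Map[String, Int])(f: Int => Int): Map[String, Int] =
    state.view.mapValues(f).toMap
}
```

Repeated calls such as ForceView.step(ForceView.step(s)(f))(f) then produce ordinary maps each time instead of views nested inside views.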
I was thinking about a nice way to convert a List of tuples with duplicate keys [("a","b"),("c","d"),("a","f")] into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map, loop over the list and check for duplicate keys. But I am looking for a more Scala-ish and clever solution here.
btw, actual type of key-value I use here is (Int, Node) and I want to turn into a map of (Int -> NodeSeq)
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, b -> 2)
As of 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
A more Scala-ish way is to use fold, in the way shown there (skip the map f step).
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Starting Scala 2.13, most collections are provided with the groupMap method which is (as its name suggests) an equivalent (more efficient) of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
groups elements based on the first part of tuples (group part of groupMap)
maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
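As a sketch of how this combines with the toMultiMap extension shown earlier: on Scala 2.13, mapValues on a Map is deprecated and returns a MapView rather than a Map, so a groupMap-based version may be the simpler choice there. MultiMapSyntax and PairsOps are names I picked for illustration.

```scala
object MultiMapSyntax {
  // Hypothetical extension: groups pairs by their first element and keeps only
  // the second elements, using groupMap (Scala 2.13+).
  implicit class PairsOps[A, B](pairs: List[(A, B)]) {
    def toMultiMap: Map[A, List[B]] = pairs.groupMap(_._1)(_._2)
  }
}
```

After import MultiMapSyntax._, a call like List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap should give Map(a -> List(b, c), d -> List(e)).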
Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
Fold Left variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
}
})
Aggregate Variation - Similar to fold Left
list.aggregate[Map[String, List[String]]](Map())(
(acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 ->
List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
},
(l, r) => l ++ r
)
Spark Variation - For big data sets (Conversion to a RDD and to a Plain Map from RDD)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext (conf)
// This gives you a rdd of the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
(value: String) => List(value),
(acc: List[String], value) => value :: acc,
(accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)
// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
Here is a more Scala idiomatic way to convert a list of tuples to a map handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
You can try this
scala> val b = Array(1, 2, 3)
// b: Array[Int] = Array(1, 2, 3)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)
Quite a few functions on Map take a function on a key-value tuple as the argument. E.g. def foreach(f: ((A, B)) ⇒ Unit): Unit. So I looked for a short way to write an argument to foreach:
> val map = Map(1 -> 2, 3 -> 4)
map: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 3 -> 4)
> map.foreach((k, v) => println(k))
error: wrong number of parameters; expected = 1
map.foreach((k, v) => println(k))
^
> map.foreach({(k, v) => println(k)})
error: wrong number of parameters; expected = 1
map.foreach({(k, v) => println(k)})
^
> map.foreach(case (k, v) => println(k))
error: illegal start of simple expression
map.foreach(case (k, v) => println(k))
^
I can do
> map.foreach(_ match {case (k, v) => println(k)})
1
3
Any better alternatives?
You were very close with map.foreach(case (k, v) => println(k)). To use case in an anonymous function, surround it with curly braces.
map foreach { case (k, v) => println(k) }
In such cases I often use the for syntax.
for ((k,v) <- map) println(k)
According to Chapter 23 of "Programming in Scala", the above for loop is translated into a call to foreach.
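To illustrate, the two forms below should behave the same. This is a rough sketch rather than the exact compiler output (in Scala 2 the tuple pattern may add a filtering step as well). ForDesugar and its method names are made up.

```scala
import scala.collection.mutable.ListBuffer

object ForDesugar {
  val map = Map(1 -> 2, 3 -> 4)

  // keys collected with the for syntax
  def keysViaFor: List[Int] = {
    val seen = ListBuffer.empty[Int]
    for ((k, _) <- map) seen += k
    seen.toList
  }

  // keys collected with an explicit foreach and a pattern-matching function
  def keysViaForeach: List[Int] = {
    val seen = ListBuffer.empty[Int]
    map.foreach { case (k, _) => seen += k }
    seen.toList
  }
}
```

Both methods visit the entries in the same order, so they return the same list of keys.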
One alternative is the tupled method of the Function object:
import Function.tupled;
// map tupled foreach {(k, v) => println(k)}
map foreach tupled {(k, v) => println(k)}
You can also access a tuple as follows:
scala> val map = Map(1 -> 2, 3 -> 4)
map: scala.collection.immutable.Map[Int,Int] = Map((1,2), (3,4))
scala> map foreach (t => println(t._1))
1
3
Welcome to Scala version 2.8.0.Beta1-prerelease (OpenJDK Server VM, Java 1.6.0_0).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val m = Map('a -> 'b, 'c -> 'd)
m: scala.collection.immutable.Map[Symbol,Symbol] = Map('a -> 'b, 'c -> 'd)
scala> m foreach { case(k, v) => println(k) }
'a
'c
I was pretty close with the last attempt, actually:
> map.foreach({case (k, v) => println(k)})
1
3