Correct interpretation of Scala Map's flatMap expression

I am scratching my head trying to understand the logic that produces the result of this flatMap() operation:
val ys = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).flatMap(e => {
| println("e =" + e)
| (e._2)
| })
e =(a,List((1,11), (1,111)))
e =(b,List((2,22), (2,222)))
ys: scala.collection.immutable.Map[Int,Int] = Map(1 -> 111, 2 -> 222)
The println clearly shows that flatMap is taking in one entry out of the input Map. So, e._2 is a List of Pairs. I can't figure out what exactly happens after that!
I am missing a very important and subtle step somewhere. Please enlighten me.

It can be thought of as:
First we map:
val a = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).map(e => e._2)
// List(List((1, 11), (1, 111)), List((2, 22), (2, 222)))
Then we flatten:
val b = a.flatten
// List((1, 11), (1, 111), (2, 22), (2, 222))
Then we convert back to a map:
b.toMap
// Map(1 -> 111, 2 -> 222)
Since a map cannot have 2 values for 1 key, the value is overwritten.
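The three steps above can be checked against the single flatMap call. A minimal sketch (the names `m`, `viaFlatMap`, `viaSteps` and `allPairs` are illustrative, not from the question); converting to a List first keeps every pair, which suggests the pairs are only lost when the result is built as a Map:

```scala
val m = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222))

// flatMap called directly on the Map builds a Map, so duplicate keys collapse.
val viaFlatMap: Map[Int, Int] = m.flatMap(e => e._2)

// The same result, spelled out as map, flatten, toMap.
val viaSteps: Map[Int, Int] = m.toList.map(_._2).flatten.toMap

// Converting to a List first keeps all four pairs.
val allPairs: List[(Int, Int)] = m.toList.flatMap(_._2)
```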
What is really going on is that the flatMap is translated into a loop like this:
for (x <- m0) b ++= f(x)
where:
m0 is our original map
b is a collection builder that has to build a Map, aka, MapBuilder
f is our function being passed into the flatMap (it returns a List[(Int, Int)])
x is an element in our original map
The ++= function takes the list we got from calling f(x), and calls += on every element, to add it to our map. For a Map, += just calls the original + operator for a Map, which updates the value if the key already exists.
Finally we call result on our builder which just returns us our Map.
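That loop can be reproduced by hand with the public builder API. A sketch, assuming `Map.newBuilder` as a stand-in for whatever builder flatMap obtains internally (the real internals differ between Scala versions); `m0`, `f` and `b` follow the names used above, and `flatMapByHand` is a made-up helper:

```scala
val m0 = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222))

// f is the function passed to flatMap: it returns the List[(Int, Int)] of an entry.
val f: ((String, List[(Int, Int)])) => List[(Int, Int)] = e => e._2

def flatMapByHand(): Map[Int, Int] = {
  val b = Map.newBuilder[Int, Int] // assumption: stands in for the internal builder
  for (x <- m0) b ++= f(x)         // a later pair with the same key replaces the earlier one
  b.result()
}
```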

Related

Scala: Why the mutable value inside Map cannot be changed if the Map is created from GroupBy

I want to create a Map with an integer key and a mutable Set as the value. However, when I create the Map with groupBy, the values in the mutable Sets can no longer be changed. Can anyone tell me why this happens?
import scala.collection.mutable
val groupedMap: Map[Int, mutable.Set[Int]] =
  List((1,1),(1,2),(2,3))
    .groupBy(_._1)
    .mapValues(_.map(_._2).to[mutable.Set])
val originalMap: Map[Int, mutable.Set[Int]] =
  Map(1 -> mutable.Set(1, 2), 2 -> mutable.Set(3))
println(groupedMap) // Map(1 -> Set(1, 2), 2 -> Set(3))
println(originalMap) // Map(1 -> Set(1, 2), 2 -> Set(3))
groupedMap(1) += 99
originalMap(1) += 99
println(groupedMap) // Map(1 -> Set(1, 2), 2 -> Set(3)) <- HERE IS THE PROBLEM, THE VALUE 99 CAN NOT BE ADDED TO MY MUTABLE SET!
println(originalMap) // Map(1 -> Set(99, 1, 2), 2 -> Set(3))
.mapValues is lazy: the function you give it is executed every time you access a value. So when you do groupedMap(1) += 99, it runs your conversion, returns a new Set to which 99 is added, and then that Set is discarded.
Then, when you print the map, it runs the conversion again and prints the original contents.
If the above does not seem clear, try running this snippet as an illustration:
val foo = Map("foo" -> "bar")
  .mapValues { _ => println("mapValues"); "baz" }
println(foo("foo") + foo("foo")) // prints "mapValues" twice, then "bazbaz"
This is one of many problems you run into when using mutable data. Don't do it. In 99% of use cases in Scala it is not needed, so it is better to pretend it does not exist at all until you have enough grasp of the language to recognize the remaining 1%.
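If a mutable value really is needed, one way around the laziness is to build the map strictly with map instead of mapValues, so that each set is created exactly once. A sketch under that assumption (`groupStrict` and `addValue` are hypothetical helper names, not part of the question):

```scala
import scala.collection.mutable

// Strict: `map` on a Map evaluates the function once per entry and stores the result.
def groupStrict(pairs: List[(Int, Int)]): Map[Int, mutable.Set[Int]] =
  pairs
    .groupBy(_._1)
    .map { case (k, ps) => k -> mutable.Set(ps.map(_._2): _*) }

// Immutable alternative: "adding" a value means building an updated Map.
def addValue(m: Map[Int, Set[Int]], k: Int, v: Int): Map[Int, Set[Int]] =
  m.updated(k, m.getOrElse(k, Set.empty[Int]) + v)
```

With `groupStrict`, a later `grouped(1) += 99` changes the stored set, because no conversion is re-run on access.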

Invert a Map (String -> List) in Scala

I have a Map[String, List[String]] and I want to invert it. For example, if I have something like
"1" -> List("a","b","c")
"2" -> List("a","j","k")
"3" -> List("a","c")
The result should be
"a" -> List("1","2","3")
"b" -> List("1")
"c" -> List("1","3")
"j" -> List("2")
"k" -> List("2")
I've tried this:
m.map(_.swap)
But it returns a Map[List[String], String]:
List("a","b","c") -> "1"
List("a","j","k") -> "2"
List("a","c") -> "3"
Map inversion is a little more complicated.
val m = Map("1" -> List("a","b","c")
,"2" -> List("a","j","k")
,"3" -> List("a","c"))
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
//res0: Map[String,Iterable[String]] = Map(j -> List(2), a -> List(1, 2, 3), b -> List(1), c -> List(1, 3), k -> List(2))
Flatten the Map into a collection of tuples. groupBy will create a new Map with the old values as the new keys. Then un-tuple the values by removing the key (previously value) elements.
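The same three steps can be wrapped in a function that avoids both the infix flatten and the lazy mapValues. A sketch; `invert` is an illustrative name, and the order of the keys inside each result list depends on the iteration order of the input Map:

```scala
def invert[K, V](m: Map[K, List[V]]): Map[V, List[K]] =
  m.toList
    .flatMap { case (k, vs) => vs.map(v => v -> k) } // (value, key) pairs
    .groupBy(_._1)                                   // old values become the new keys
    .map { case (v, pairs) => v -> pairs.map(_._2) } // keep only the old keys
```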
An alternative that does not rely on strange implicit arguments of flatten, as requested by yishaiz:
val m = Map(
  "1" -> List("a","b","c"),
  "2" -> List("a","j","k"),
  "3" -> List("a","c"),
)
val res = (for ((digit, chars) <- m.toList; c <- chars) yield (c, digit))
  .groupBy(_._1) // group by characters
  .mapValues(_.unzip._2) // drop redundant characters from lists, keeping the digits
res foreach println
gives:
(j,List(2))
(a,List(1, 2, 3))
(b,List(1))
(c,List(1, 3))
(k,List(2))
A simple nested for-comprehension can also invert the map, so that each element of a value list becomes a key in the inverted map, with the original key as its value. Note that the result is a Map[T, T], so an element that appears under several keys keeps only the last key processed:
implicit class MapInverter[T] (map: Map[T, List[T]]) {
  def invert: Map[T, T] = {
    val result = collection.mutable.Map.empty[T, T]
    for ((key, values) <- map) {
      for (v <- values) {
        result += (v -> key)
      }
    }
    result.toMap
  }
}
Usage:
Map(10 -> List(3, 2), 20 -> List(16, 17, 18, 19)).invert

Convert List of Maps to Map of Lists based on Map Key

Let's say I have the following list:
val myList = List(Map(1 -> 1), Map(2 -> 2), Map(2 -> 7))
I want to convert this list to a single Map of Int -> List(Int) such that if we have duplicate keys then both values should be included in the resulting value list:
Map(2 -> List(7, 2), 1 -> List(1))
I came up with this working solution but it seems excessive and clunky:
myList.foldLeft(scala.collection.mutable.Map[Int,List[Int]]()) {(result,element) =>
  for((k,v) <- element) {
    if (result.contains(k)) {
      result(k) = v :: result(k)
    } else {
      result += (k -> List(v))
    }
  }
  result
}
Is there a better or more efficient approach here?
myList
.flatten
.groupBy(_._1)
.mapValues(_.map(_._2))
You can use simpler (but probably less efficient) code:
val myList = List(Map(1 -> 1), Map(2 -> 2), Map(2 -> 7))
val grouped = myList.flatMap(_.toList).groupBy(_._1).mapValues(l => l.map(_._2))
println(grouped)
Map(2 -> List(2, 7), 1 -> List(1))
The idea is to first get List of all tuples from all inner Maps and then group them.
Starting with Scala 2.13, we can use groupMap, which (as its name suggests) is a one-pass equivalent of a groupBy followed by mapValues:
// val maps = List(Map(1 -> 1), Map(2 -> 2), Map(2 -> 7))
maps.flatten.groupMap(_._1)(_._2) // Map(1 -> List(1), 2 -> List(2, 7))
This:
flattens the list of maps into a list of tuples (List((1, 1), (2, 2), (2, 7)))
groups elements based on their first tuple part (_._1) (group part of groupMap)
maps grouped values to their second tuple part (_._2) (map part of groupMap)
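A small check of that equivalence, assuming Scala 2.13 or later (where groupMap exists); `maps`, `onePass` and `twoPass` are illustrative names:

```scala
val maps = List(Map(1 -> 1), Map(2 -> 2), Map(2 -> 7))

// One pass: group by the key and keep only the value of each pair.
val onePass: Map[Int, List[Int]] = maps.flatten.groupMap(_._1)(_._2)

// Two passes: group first, then strip the keys from each group.
val twoPass: Map[Int, List[Int]] =
  maps.flatten.groupBy(_._1).map { case (k, ps) => k -> ps.map(_._2) }
```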

How to un-nest a spark rdd that has the following type ((String, scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Int]]))

It's a nested map with contents like this when I print it to the screen:
(5, Map ( "ABCD" -> Map("3200" -> 3,
"3350.800" -> 4,
"200.300" -> 3)
(1, Map ( "DEF" -> Map("1200" -> 32,
"1320.800" -> 4,
"2100" -> 3)
I need to get something like this
CaseClass(5, ABCD, 3200, 3)
CaseClass(5, ABCD, 3350.800, 4)
CaseClass(5, ABCD, 200.300, 3)
CaseClass(1, DEF, 1200, 32)
CaseClass(1, DEF, 1320.800, 4)
etc etc.
Basically, I need a list of case class instances that I can save to Cassandra.
I have tried flatMapValues, but that un-nests the map by only one level. I also tried flatMap, but either that doesn't work or I'm making a mistake.
Any suggestions?
Fairly straightforward using a for-comprehension and some pattern matching to destructure things:
val in = List((5, Map ( "ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
              (1, Map ("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3))))
case class Thing(a:Int, b:String, c:String, d:Int)
for { (index, m) <- in
      (k,v) <-m
      (innerK, innerV) <- v}
  yield Thing(index, k, innerK, innerV)
//> res0: List[maps.maps2.Thing] = List(Thing(5,ABCD,3200,3),
//                                      Thing(5,ABCD,3350.800,4),
//                                      Thing(5,ABCD,200.300,3),
//                                      Thing(1,DEF,1200,32),
//                                      Thing(1,DEF,1320.800,4),
//                                      Thing(1,DEF,2100,3))
So let's pick apart the for-comprehension
(index, m) <- in
This is the same as
t <- in
(index, m) = t
In the first line t will successively be set to each element of in.
t is therefore a tuple (Int, Map(...))
Pattern matching lets us put that "pattern" for the tuple on the left-hand side of the <-, and the compiler picks apart the tuple, setting index to the Int and m to the Map.
(k,v) <-m
As before this is equivalent to
u <-m
(k, v) = u
This time u takes each element of the Map in turn, and these are again tuples of key and value. So k is set successively to each key and v to the corresponding value.
And v is your inner map so we do the same thing again with the inner map
(innerK, innerV) <- v}
Now we have everything we need to create the case class. yield just says make a collection of whatever is "yielded" each time through the loop.
yield Thing(index, k, innerK, innerV)
Under the hood, this just translates to a set of map/flatMap calls.
The yield is just the value Thing(index, k, innerK, innerV)
We get one of those for each element of v
v.map{x=>val (innerK, innerV) = x;Thing(index, k, innerK, innerV)}
but there's an inner map per element of the outer map
m.flatMap{y=>val (k, v) = y;v.map{x=>val (innerK, innerV) = x;Thing(index, k, innerK, innerV)}}
(flatMap because we would get a List of Lists if we just did a map, and we want to flatten it to a single list of items)
Similarly, we do one of those for every element in the List
in.flatMap{z=>val (index, m) = z;m.flatMap{y=>val (k, v) = y;v.map{x=>val (innerK, innerV) = x;Thing(index, k, innerK, innerV)}}}
Let's do that in _1, _2 style.
in.flatMap{z=>z._2.flatMap{y=>y._2.map{x=>Thing(z._1, y._1, x._1, x._2)}}}
which produces exactly the same result. But isn't it clearer as a for-comprehension?
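The claimed equivalence can be checked directly. A sketch that reuses the answer's `Thing` and `in`, with pattern-matching lambdas in place of the `val (a, b) = x` lines; `fromFor` and `fromCalls` are illustrative names:

```scala
case class Thing(a: Int, b: String, c: String, d: Int)

val in = List(
  (5, Map("ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
  (1, Map("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3)))
)

// The for-comprehension from the answer.
val fromFor: List[Thing] =
  for {
    (index, m)       <- in
    (k, v)           <- m
    (innerK, innerV) <- v
  } yield Thing(index, k, innerK, innerV)

// Written by hand: flatMap for every generator except the last, which becomes a map.
val fromCalls: List[Thing] =
  in.flatMap { case (index, m) =>
    m.flatMap { case (k, v) =>
      v.map { case (innerK, innerV) => Thing(index, k, innerK, innerV) }
    }
  }
```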
You can do it like this if you prefer collection operations:
case class Record(v1: Int, v2: String, v3: Double, v4: Int)
val data = List(
  (5, Map ( "ABC" ->
    Map(
      3200.0 -> 3,
      3350.800 -> 4,
      200.300 -> 3))
  ),
  (1, Map ( "DEF" ->
    Map(
      1200.0 -> 32,
      1320.800 -> 4,
      2100.0 -> 3))
  )
)
// sc is the SparkContext (predefined in spark-shell)
val rdd = sc.parallelize(data)
val result = rdd.flatMap(p => {
  p._2.toList
    .flatMap(q => q._2.toList.map(l => (q._1, l)))
    .map((p._1, _))
}).map(p => Record(p._1, p._2._1, p._2._2._1, p._2._2._2))
println(result.collect.toList)
//List(
// Record(5,ABC,3200.0,3),
// Record(5,ABC,3350.8,4),
// Record(5,ABC,200.3,3),
// Record(1,DEF,1200.0,32),
// Record(1,DEF,1320.8,4),
// Record(1,DEF,2100.0,3)
//)

How to merge a Vector of Maps into a single Map in Scala?

I have a Vector of Maps as shown below. How can I convert them into a single Map?
scala> (1 to 100).takeWhile(_<10).map{x=>val y=x+1;Map(x->y)}
res8: scala.collection.immutable.IndexedSeq[scala.collection.immutable.Map[Int,Int]] = Vector(Map(1 -> 2), Map(2 -> 3), Map(3 -> 4), Map(4 -> 5), Map(5 -> 6), Map(6 -> 7), Map(7 -> 8), Map(8 -> 9), Map(9 -> 10))
If you do not need to turn each element into a map, then the tuples can go straight to a Map like this
(1 to 100).takeWhile(_<10).map{x=>val y=x+1;x->y}.toMap
If you do have to start from a Seq of maps, as shown in the question, then fold may be used to join the maps together
val v = (1 to 100).takeWhile(_<10).map{x=>val y=x+1;Map(x->y)}
v.fold(Map.empty[Int, Int])((a,b) => a ++ b )
Fold works by starting with an initial value, in this case Map.empty and then performing an operation on that value and then keeping the result of that op to be used with the next element of the sequence. It then repeats for every element in the sequence. In the example that I have given, the operation was (a,b) => a ++ b, where a starts as the initial value and then is the result of each iteration and b is the current element being considered from the sequence being folded over.
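To make the accumulation visible, the same merge can be written with foldLeft, and scanLeft exposes each intermediate value of a. A sketch with a shortened input (`v`, `merged` and `steps` are illustrative names):

```scala
val v: Vector[Map[Int, Int]] = Vector(Map(1 -> 2), Map(2 -> 3), Map(3 -> 4))

// foldLeft walks the vector left to right, carrying the merged map along as `a`.
val merged: Map[Int, Int] = v.foldLeft(Map.empty[Int, Int])((a, b) => a ++ b)

// scanLeft keeps every intermediate accumulator, starting with the initial value.
val steps: Vector[Map[Int, Int]] = v.scanLeft(Map.empty[Int, Int])((a, b) => a ++ b)
```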