Scala - Undo a flatMap after transformation

How can I merge a Seq of Maps into a single Map, i.e.
Seq[Map[String, String]] => Map[String, String]
For example:
val someSeq = rdd.map(_._2).flatMap(...) //some transformation to produce the sequence of maps
where someSeq is Seq(student1, student2), and student1 and student2 are Maps:
val student1 = Map("a" -> "1", "b" -> "1")
val student2 = Map("c" -> "1", "d" -> "1")
I need a result like this:
val apps = Map("a" -> "1", "b" -> "1", "c" -> "1", "d" -> "1")
Any ideas?

Unrelated to Spark, but one approach would be to fold over the sequence as follows:
val student1 = Map("a" -> "1", "b" -> "1")
val student2 = Map("c" -> "1", "d" -> "1")
val students = Seq(student1, student2)
students.foldLeft(Map[String, String]())(_ ++ _)
Returns
Map(a -> 1, b -> 1, c -> 1, d -> 1)
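For comparison, here are two other ways to write the same merge. This is only a sketch using standard collection methods; viaFlatten and viaReduce are names made up for the illustration. In all of these variants, when the same key occurs in more than one map, the value from the later map wins.

```scala
val student1 = Map("a" -> "1", "b" -> "1")
val student2 = Map("c" -> "1", "d" -> "1")
val students = Seq(student1, student2)

// Flatten the maps into one sequence of (key, value) pairs, then rebuild a Map.
val viaFlatten: Map[String, String] = students.flatten.toMap

// reduceOption avoids the exception that plain reduce throws on an empty Seq.
val viaReduce: Map[String, String] =
  students.reduceOption(_ ++ _).getOrElse(Map.empty[String, String])
```

The foldLeft version above already handles an empty Seq, which is one reason to prefer it over a plain reduce.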
As for "undoing" a flatMap, I don't believe that is really possible. To see why, consider the notion of undoing a "flatten".
For example:
val x = Seq(1, 2)
val y = Seq(3, 4)
val combined = Seq(x, y)
val flattened = combined.flatten
val b = Seq(1, 2, 3)
val c = Seq(4)
val combined2 = Seq(b, c)
val flattened2 = combined2.flatten
flattened == flattened2
Returns true.
So in this case you can go from unflattened to flattened, but not back: different unflattened inputs produce the same flattened result, so the reverse direction has no unique answer.
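If you do need to get back to the original grouping, one option is to carry the grouping information along before flattening. The following is a rough sketch of that idea, assuming you control the flatMap step; tagged and restored are hypothetical names, and empty inner sequences would still be lost because they contribute no elements.

```scala
val combined = Seq(Seq(1, 2), Seq(3, 4))

// Tag each element with the index of the group it came from while flattening.
val tagged: Seq[(Int, Int)] =
  combined.zipWithIndex.flatMap { case (group, i) => group.map(x => (i, x)) }

// The tags make the grouping recoverable: group by tag, then restore group order.
val restored: Seq[Seq[Int]] =
  tagged.groupBy(_._1).toSeq.sortBy(_._1).map { case (_, pairs) => pairs.map(_._2) }
```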

Related

Invert a Map (String -> List) in Scala

I have a Map[String, List[String]] and I want to invert it. For example, if I have something like
"1" -> List("a","b","c")
"2" -> List("a","j","k")
"3" -> List("a","c")
The result should be
"a" -> List("1","2","3")
"b" -> List("1")
"c" -> List("1","3")
"j" -> List("2")
"k" -> List("2")
I've tried this:
m.map(_.swap)
But it returns a Map[List[String], String]:
List("a","b","c") -> "1"
List("a","j","k") -> "2"
List("a","c") -> "3"
Map inversion is a little more complicated.
val m = Map("1" -> List("a","b","c")
,"2" -> List("a","j","k")
,"3" -> List("a","c"))
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
//res0: Map[String,Iterable[String]] = Map(j -> List(2), a -> List(1, 2, 3), b -> List(1), c -> List(1, 3), k -> List(2))
Flatten the Map into a collection of tuples. groupBy will create a new Map with the old values as the new keys. Then un-tuple the values by removing the key (previously value) elements.
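On Scala 2.13 or later (an assumption; the snippet above targets older versions), the group-then-strip steps can probably be combined with groupMap. A minimal sketch, where pairs and inverted are names chosen for the example:

```scala
val m = Map(
  "1" -> List("a", "b", "c"),
  "2" -> List("a", "j", "k"),
  "3" -> List("a", "c")
)

// Step 1: flatten into (value, key) tuples.
val pairs: List[(String, String)] =
  m.toList.flatMap { case (k, vs) => vs.map(v => (v, k)) }

// Step 2: group by the first element and keep only the second in each group.
val inverted: Map[String, List[String]] = pairs.groupMap(_._1)(_._2)
```

Unlike mapValues, groupMap returns a strict Map, so there is no lazy view to worry about.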
An alternative that does not rely on strange implicit arguments of flatten, as requested by yishaiz:
val m = Map(
"1" -> List("a","b","c"),
"2" -> List("a","j","k"),
"3" -> List("a","c"),
)
val res = (for ((digit, chars) <- m.toList; c <- chars) yield (c, digit))
.groupBy(_._1) // group by characters
.mapValues(_.unzip._2) // drop redundant digits from lists
res foreach println
gives:
(j,List(2))
(a,List(1, 2, 3))
(b,List(1))
(c,List(1, 3))
(k,List(2))
A simple nested for-comprehension can also invert the map, so that each element of a value list becomes a key in the inverted map, with the original key as its value. Note that the result is a Map[T, T], so if an element appears under several keys, only the last one visited is kept.
implicit class MapInverter[T](map: Map[T, List[T]]) {
  def invert: Map[T, T] = {
    val result = collection.mutable.Map.empty[T, T]
    for ((key, values) <- map) {
      for (v <- values) {
        result += (v -> key) // a later key overwrites an earlier one for the same v
      }
    }
    result.toMap
  }
}
Usage:
Map(10 -> List(3, 2), 20 -> List(16, 17, 18, 19)).invert

Scala two map merge

How can I merge maps like below:
val map1 = Map(1 -> "a", 2 -> "b")
val map2 = Map("a" -> "A", "b" -> "B")
After merging:
Merged = Map(1 -> List("a", "A"), 2 -> List("b", "B"))
The values can be a List, a Set, or any other collection that has a size attribute.
I'm not sure I understand exactly what you are looking for, but for the example provided you could do the following, which pairs each value with its lookup as a tuple:
val map1 = Map(1 -> "a", 2 -> "b")
val map2 = Map("a" -> "A", "b" -> "B")
map1.mapValues(value => (value, map2(value)))
However, this requires every value of map1 to be present as a key in map2 (I assumed this holds, based on the example provided); otherwise map2(value) throws a NoSuchElementException.
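If that assumption might not hold, a safer variant is to look the value up with get, which returns an Option. This is only a sketch; the extra entry 3 -> "c" is added here purely to show the missing-key case.

```scala
val map1 = Map(1 -> "a", 2 -> "b", 3 -> "c") // "c" has no entry in map2
val map2 = Map("a" -> "A", "b" -> "B")

// map2.get returns an Option, so a missing key adds nothing instead of throwing.
val merged: Map[Int, List[String]] =
  map1.map { case (k, v) => k -> (v :: map2.get(v).toList) }
```

Here the entry for key 3 simply ends up with a one-element list.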
Given two maps with value1 as key2
scala> val x = Map(1 -> "a", 2 -> "b")
x: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b)
scala> val y = Map("a" -> "A", "b" -> "B")
y: scala.collection.immutable.Map[String,String] = Map(a -> A, b -> B)
Merge as Map(k1 -> List(v1, v2))
scala> val z = x.map { case (k1, v1) => (k1, List(v1, y(v1))) }
z: scala.collection.immutable.Map[Int,List[String]] = Map(1 -> List(a, A), 2 -> List(b, B))
You basically need to get value from first map then lookup the second map, and just create a List out of those (v1, v2).
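Try this (it pairs entries by iteration order, so it only works when both maps happen to iterate in matching order):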
scala> val map1 = Map(1 -> "a", 2 -> "b")
map1: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b)
scala> val map2 = Map("a" -> "A", "b" -> "B")
map2: scala.collection.immutable.Map[String,String] = Map(a -> A, b -> B)
scala> map1.zip(map2).map(x=>x._1._1 -> List(x._2._1,x._2._2))
res44: scala.collection.immutable.Map[Int,List[String]] = Map(1 -> List(a, A), 2 -> List(b, B))

How to update a nested immutable map

I'm trying to find a cleaner way to update nested immutable structures in Scala. I think I'm looking for something similar to assoc-in in Clojure. I'm not sure how much types factor into this.
For example, in Clojure, to update the "city" attribute of a nested map I'd do:
> (def person {:name "john", :dob "1990-01-01", :home-address {:city "norfolk", :state "VA"}})
#'user/person
> (assoc-in person [:home-address :city] "richmond")
{:name "john", :dob "1990-01-01", :home-address {:state "VA", :city "richmond"}}
What are my options in Scala?
val person = Map("name" -> "john", "dob" -> "1990-01-01",
"home-address" -> Map("city" -> "norfolk", "state" -> "VA"))
As indicated in the other answer, you can leverage case classes to get cleaner, typed data objects. But in case what you need is simply to update a map:
val m = Map("A" -> 1, "B" -> 2)
val m2 = m + ("A" -> 3)
The result (in a worksheet):
m: scala.collection.immutable.Map[String,Int] = Map(A -> 1, B -> 2)
m2: scala.collection.immutable.Map[String,Int] = Map(A -> 3, B -> 2)
The + operator on a Map will add the new key-value pair, overwriting if it already exists. Notably, though, because the original value is a val, you have to assign the result to a new val, because you cannot change the original.
Because, in your example, you're rewriting a nested value, doing this manually becomes somewhat more onerous:
val m = Map("A" -> 1, "B" -> Map("X" -> 2, "Y" -> 4))
val m2 = m + ("B" -> Map("X" -> 3))
This yields some loss-of-data (the nested Y value disappears):
m: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 2, Y -> 4))
m2: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 3)) // Note that 'Y' has gone away.
Thus, forcing you to copy the original value and then re-assign it back:
val m = Map("A" -> 1, "B" -> Map("X" -> 2, "Y" -> 4))
val b = m.get("B") match {
  case Some(b: Map[String, Any] @unchecked) => b + ("X" -> 3) // Will update `X` while keeping other key-value pairs
  case _ => Map("X" -> 3) // "B" is missing or is not a Map
}
val m2 = m + ("B" -> b)
This yields the 'expected' result, but is obviously a lot of code:
m: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 2, Y -> 4))
b: scala.collection.immutable.Map[String,Any] = Map(X -> 3, Y -> 4)
m2: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 3, Y -> 4))
In short, when you 'update' any immutable data structure you are really copying all the pieces you want to keep and substituting updated values where appropriate. If the structure is complicated, this can get onerous. Hence the recommendation in the other answer to use a lens library such as Monocle.
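To get something closer to Clojure's assoc-in for untyped nested maps, the copy-and-reassign steps can be wrapped in a small recursive helper. This is a hypothetical sketch, not a standard library function: assocIn is a name made up here, and it assumes every intermediate value is a Map with String keys (anything else on the path is replaced by a fresh map).

```scala
def assocIn(m: Map[String, Any], path: List[String], value: Any): Map[String, Any] =
  path match {
    case Nil        => m                     // empty path: nothing to update
    case key :: Nil => m.updated(key, value) // last step: set the value
    case key :: rest =>
      // Reuse the nested map if there is one, otherwise start from an empty map.
      val child = m.get(key) match {
        case Some(inner: Map[_, _]) => inner.asInstanceOf[Map[String, Any]]
        case _                      => Map.empty[String, Any]
      }
      m.updated(key, assocIn(child, rest, value))
  }

val person: Map[String, Any] = Map(
  "name" -> "john",
  "dob" -> "1990-01-01",
  "home-address" -> Map("city" -> "norfolk", "state" -> "VA")
)

val moved = assocIn(person, List("home-address", "city"), "richmond")
```

The original person map is left untouched; only the maps along the path are copied.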
Scala is a statically typed language, so you may first want to increase the safety of your code by moving away from any-string-to-any-string.
case class Address(city: String, state: String)
case class Person(name: String, dob: java.util.Date, homeAddress: Address)
(Yes, there are better alternatives for java.util.Date).
Then you create an update like this:
val person = Person(name = "john", dob = new java.util.Date(90, 0, 1),
homeAddress = Address(city = "norfolk", state = "VA"))
person.copy(homeAddress = person.homeAddress.copy(city = "richmond"))
To avoid this nested copy, you would use a lens library, like Monocle or Quicklens (there are many others).
import com.softwaremill.quicklens._
person.modify(_.homeAddress.city).setTo("richmond")
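To give a feel for what such a library does under the hood, here is a minimal hand-rolled lens. It is a sketch for illustration only and is not the API of Monocle or Quicklens; the Lens class and the value names are made up, and Person is simplified by dropping the dob field.

```scala
case class Address(city: String, state: String)
case class Person(name: String, homeAddress: Address)

// A lens pairs a getter with a copying setter; andThen composes two lenses.
case class Lens[S, A](get: S => A, set: (S, A) => S) {
  def andThen[B](other: Lens[A, B]): Lens[S, B] =
    Lens[S, B](s => other.get(get(s)), (s, b) => set(s, other.set(get(s), b)))
}

val homeAddress = Lens[Person, Address](_.homeAddress, (p, a) => p.copy(homeAddress = a))
val city = Lens[Address, String](_.city, (a, c) => a.copy(city = c))

// The composed lens hides the nested copy calls.
val personCity = homeAddress.andThen(city)

val person = Person("john", Address("norfolk", "VA"))
val moved = personCity.set(person, "richmond")
```

The real libraries generate these getter/setter pairs for you, which is where most of the convenience comes from.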
The other two answers nicely sum up the importance of correctly modelling your problem so we don't end up having to deal with Map[String, Object] type of collection.
Just adding my two cents here with a brute-force solution that uses Scala's quite powerful function pipelining and higher-order functions. The ugly asInstanceOf cast is needed because the Map values have different types, so Scala infers the Map's type as Map[String,Any].
val person: Map[String,Any] = Map("name" -> "john", "dob" -> "1990-01-01", "home-address" -> Map("city" -> "norfolk", "state" -> "VA"))
val newperson = person.map({case(k,v) => if(k == "home-address") k -> v.asInstanceOf[Map[String,String]].updated("city","richmond") else k -> v})

Custom ordering in TreeMap

Here are examples that I have been playing with:
import collection.immutable.{TreeSet, TreeMap}
val ts = TreeSet(9, 23, 1, 2)
ts
val tm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")
tm
// convert a map to a sorted map
val m = Map("98" -> List(4, 12, 14), "001" -> List(22, 11))
val t = TreeMap(m.toSeq: _*)
t // sorted by key
// sort an unsorted map
m.toSeq.sortWith((x, y) => x._2(0) < y._2(0))
// add an unsorted map into a sorted map
val m1 = Map("07" -> List(3, 5, 1), "05" -> List(12, 5, 3))
val t1: TreeMap[String, List[Int]] = t ++ m1
t1 // "001" is the first key
I can use sortWith on a Map to get a custom ordering, what if I want to use a TreeMap that uses a different ordering than the default?
You can't use a Map's values to define the ordering of a TreeMap; it is always ordered by its keys.
TreeMap[A,B]'s constructor accepts an implicit Ordering[A] parameter, so you could do something like this:
// Will sort according to default Int ordering (ascending by numeric value)
scala> val tm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")
tm: scala.collection.immutable.TreeMap[Int,String] = Map(1 -> a, 2 -> b, 3 -> c)
// A wild implicit appears! (orders descending by numeric value)
scala> implicit val tmOrd = Ordering[Int].on((x:Int) => -x)
tmOrd: scala.math.Ordering[Int] = scala.math.Ordering$$anon$5@1d8e2eea
// Our implicit is implicitly (yeah) used by constructor
scala> val invTm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")
invTm: scala.collection.immutable.TreeMap[Int,String] = Map(3 -> c, 2 -> b, 1 -> a)
Note that it's safer to limit the scope of implicits like this one. If you can, construct a (non-implicit) object and pass it explicitly, or separate the scope of the implicit declaration from other code that could be affected by its presence.
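A sketch of the explicit variant, assuming a simple descending order is what you want: the ordering is an ordinary value passed in the second parameter list, so no other code is affected. The name descending is chosen just for this example.

```scala
import scala.collection.immutable.TreeMap

// An ordinary (non-implicit) value: nothing else in scope picks it up by accident.
val descending: Ordering[Int] = Ordering[Int].reverse

// Pass the ordering explicitly in the second parameter list.
val invTm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")(descending)

// Without the explicit argument, the default ascending Int ordering is still used.
val tm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")
```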
The reason behind this is that TreeMap is built on top of a tree that uses keys' values to maintain structure constraints that allow for efficient data reads/writes based on keys, which is the primary purpose of a Map. Ordering on values in a Map simply makes no sense.
Update: the complexity of the ordering logic doesn't matter. Following up on your comment:
scala> object ComplexOrdering extends Ordering[Int] {
| def compare(a: Int, b: Int) = {
| if(a == 3) -1 else if(a == 2 * b) -1 else if(a == 3 * b) 0 else 1
| }
| }
defined object ComplexOrdering
scala> val tm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")
tm: scala.collection.immutable.TreeMap[Int,String] = Map(1 -> a, 2 -> b, 3 -> c)
scala> val tm = TreeMap(3 -> "c", 1 -> "a", 2 -> "b")(ComplexOrdering)
tm: scala.collection.immutable.TreeMap[Int,String] = Map(3 -> c, 2 -> b, 1 -> a)
TreeMap is defined as a Map-like type with a specified ordering of its keys. That ordering is given by an implicit parameter to the constructor:
new TreeMap()(implicit ordering: Ordering[A]) // For TreeMap[A,B]
so you can set an alternative ordering on the keys at construction by explicitly providing a custom Ordering[A].
The class does not, however, provide any (direct) means of setting an ordering based on the values. What you have with calling .toSeq.sortWith is about the best you can do as far as I know, short of coding your own collection type.
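If you want to keep the value-based order in something that still behaves like a Map, one possibility is to put the sorted pairs into a ListMap, which iterates in insertion order. This is a sketch of that workaround, not a sorted map: entries added later are appended at the end rather than placed in order. The name byFirstValue is made up here.

```scala
import scala.collection.immutable.ListMap

val m = Map("98" -> List(4, 12, 14), "001" -> List(22, 11))

// Sort by the first element of each value list; headOption keeps empty lists safe
// (None sorts before any Some).
val byFirstValue: ListMap[String, List[Int]] =
  ListMap(m.toSeq.sortBy(_._2.headOption): _*)
```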

Scala: What is the idiomatic syntax for transforming nested collections?

For example: I have a list of Maps and I'd like to create a List from the values of the 3rd "column" of the maps...
val l = List(Map(1 -> "test", 2 -> "test", 3 -> "test"), Map(4 -> "test", 5 -> "test", 6 -> "test"))
Well, there's no ordering on maps, so the "third column" part of your question doesn't really make sense. If you mean something like "return a list of the values that have map key 3", then you could do
val l = List(Map(1 -> "test1", 2 -> "test2", 3 -> "test3"), Map(1 -> "test4", 2 -> "test5", 3 -> "test6"))
val thirdArgs= for(map<-l; value <-map.get(3)) yield value
// or equivalently val thirdArgs= l.flatMap(_.get(3))
println(thirdArgs)// prints List(test3, test6)
This relies on the fact that map.get(3) returns an Option[String], and the Scala for-comprehension syntax works with Option.
If you actually meant "third column", then the data structure you want isn't a map, but a tuple.
val l = List(("test1","test2","test3"), ("test4","test5", "test6"))
val thirdArgs= for(tuple<-l) yield tuple._3
// or equivalently val thirdArgs= l.map(_._3)
println(thirdArgs)// prints List(test3, test6)
Kim, we need an almost infinite amount of "hold my hand" simple Scala solutions posted on the web, so junior programmers can google them and get a running start. Here we go:
Maybe this is what you want:
scala> val l = List(Map(1 -> "test1", 2 -> "test2", 3 -> "test3"),
| Map(1 -> "test4", 2 -> "test5", 3 -> "test6"))
l: List[scala.collection.immutable.Map[Int,java.lang.String]] = List(Map(1 -> test1, 2 -> test2, 3 -> test3), Map(1 -> test4, 2 -> test5, 3 -> test6))
Then you can get the third "row" like this:
scala> l.map( numMap => numMap(3))
res1: List[java.lang.String] = List(test3, test6)
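One thing to keep in mind when choosing between the approaches above is what happens when a map lacks key 3. The sketch below uses a shortened list (the second map deliberately has no key 3) to compare the options; present, withDefault and strict are names picked for the example.

```scala
import scala.util.Try

val l = List(Map(1 -> "test1", 3 -> "test3"), Map(1 -> "test4"))

// get returns an Option, so maps without the key are silently skipped.
val present: List[String] = l.flatMap(_.get(3))

// getOrElse keeps one result per map by substituting a default.
val withDefault: List[String] = l.map(_.getOrElse(3, "n/a"))

// Plain apply throws NoSuchElementException on the second map; Try captures that.
val strict: Try[List[String]] = Try(l.map(_(3)))
```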