Scala: What is the idiomatic syntax for transforming nested collections? - scala

For example: I have a list of Maps and I'd like to create a List from the values of the 3rd "column" of the maps...
val l = List(Map(1 -> "test", 2 -> "test", 3 -> "test"), Map(4 -> "test", 5 -> "test", 6 -> "test"))

Well, there's no ordering on maps, so the "third column" part of your question doesn't really make sense. If you mean something like "return a list of the values stored under map key 3", then you could do
val l = List(Map(1 -> "test1", 2 -> "test2", 3 -> "test3"), Map(1 -> "test4", 2 -> "test5", 3 -> "test6"))
val thirdArgs = for (map <- l; value <- map.get(3)) yield value
// or equivalently: val thirdArgs = l.flatMap(_.get(3))
println(thirdArgs) // prints List(test3, test6)
This relies on the fact that map.get(3) returns an Option[String], and the Scala for-comprehension syntax works with Option.
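To make the Option behaviour concrete, here is a minimal sketch using made-up data (the `rows` list is hypothetical, not from the question) in which one map has no entry for key 3. The missing entry is simply skipped:

```scala
// Hypothetical data: the second map has no entry for key 3.
val rows = List(Map(1 -> "a", 3 -> "c"), Map(1 -> "d"))

// Each lookup yields an Option: Some(value) if the key exists, None otherwise.
val lookups: List[Option[String]] = rows.map(_.get(3))

// flatMap keeps the contents of each Some and drops every None.
val present: List[String] = rows.flatMap(_.get(3))
```

Here `lookups` is List(Some(c), None) and `present` is List(c), so the result can be shorter than the input list.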
If you actually meant "third column", then the data structure you want isn't a map, but a tuple.
val l = List(("test1","test2","test3"), ("test4","test5", "test6"))
val thirdArgs = for (tuple <- l) yield tuple._3
// or equivalently: val thirdArgs = l.map(_._3)
println(thirdArgs) // prints List(test3, test6)
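If the positional accessor `_3` feels opaque, a pattern match can name the field instead. This is only a stylistic alternative to the snippet above; the names `rows` and `third` are chosen for illustration:

```scala
val rows = List(("test1", "test2", "test3"), ("test4", "test5", "test6"))

// Destructure each tuple and keep only the third component.
val thirds = rows.map { case (_, _, third) => third }
```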

Kim, we need an almost infinite amount of "hold my hand" simple Scala solutions posted on the web, so junior programmers can google them and get a running start. Here we go:
Maybe this is what you want:
scala> val l = List(Map(1 -> "test1", 2 -> "test2", 3 -> "test3"),
| Map(1 -> "test4", 2 -> "test5", 3 -> "test6"))
l: List[scala.collection.immutable.Map[Int,java.lang.String]] = List(Map(1 -> test1, 2 -> test2, 3 -> test3), Map(1 -> test4, 2 -> test5, 3 -> test6))
Then you can get the third "row" like this:
scala> l.map( numMap => numMap(3))
res1: List[java.lang.String] = List(test3, test6)
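One caveat worth knowing: calling a map like a function, as in numMap(3), throws a NoSuchElementException when the key is absent. A small sketch with hypothetical data (the `maps` list and the "n/a" default are made up for illustration) shows the failure and a `getOrElse` alternative:

```scala
// Hypothetical data: the second map lacks key 3.
val maps = List(Map(1 -> "test1", 3 -> "test3"), Map(1 -> "test4"))

// getOrElse substitutes a default instead of throwing.
val withDefault = maps.map(_.getOrElse(3, "n/a")) // List(test3, n/a)

// Direct application fails on the second map; Try captures the exception.
val attempt = scala.util.Try(maps.map(m => m(3)))
```

If every map is guaranteed to contain the key, the direct form is fine; otherwise `get` or `getOrElse` is the safer choice.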

Related

Concatenate two Scala mutable maps preserving the keys of the first map

The Scala API lets you append one map to another as follows:
import scala.collection.mutable.{Map => MutableMap}
val m1: MutableMap[Int,String] = MutableMap(1 -> "A", 2 -> "B", 3 -> "C")
val m2: MutableMap[Int,String] = MutableMap(2 -> "X", 3 -> "Y", 4 -> "Z")
m1 ++= m2 // outputs: Map(2 -> X, 4 -> Z, 1 -> A, 3 -> Y)
m1 // outputs: Map(2 -> X, 4 -> Z, 1 -> A, 3 -> Y)
The behaviour is to override the repeated pairs with the pairs coming from the right map.
What is a good way to do it in the opposite way? That is, concatenating the pairs of m1 and m2 in m1 where the pairs of m1 are kept if repeated in m2.
m1 ++= (m2 ++ m1) perhaps?
Do you have to mutate m1 (that's rarely the right thing to do in scala anyway)?
You could just create a new map as m2 ++ m1 otherwise ...
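If m1 really must be mutated in place, another option is to add only the keys it does not already have. The helper below is a sketch; `addMissing` is a hypothetical name, not something from the standard library:

```scala
import scala.collection.mutable.{Map => MutableMap}

// Hypothetical helper: copies into `target` only the pairs whose key is not
// already present, so existing pairs of `target` are never overwritten.
def addMissing[K, V](target: MutableMap[K, V], source: scala.collection.Map[K, V]): MutableMap[K, V] = {
  source.foreach { case (k, v) => if (!target.contains(k)) target(k) = v }
  target
}

val m1 = MutableMap(1 -> "A", 2 -> "B", 3 -> "C")
val m2 = MutableMap(2 -> "X", 3 -> "Y", 4 -> "Z")
val merged = addMissing(m1, m2) // same instance as m1, now also containing 4 -> Z
```

After the call, m1 holds 1 -> A, 2 -> B, 3 -> C and 4 -> Z, and m2 is unchanged.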
Store as a list (or similar collection) and group them:
val l1 = List(1 -> "A", 2 -> "B", 3 -> "C")
val l2 = List(2 -> "X", 3 -> "Y", 4 -> "Z")
(l1 ::: l2).groupBy(_._1) // Map[Int, List[(Int, String)]]
//output: Map(2 -> List((2,B), (2,X)), 4 -> List((4,Z)), 1 -> List((1,A)), 3 -> List((3,C), (3,Y)))
You can of course remove the leftover integers from the Map's value lists if you want.
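Stripping the keys out of the grouped lists could look like the sketch below. The names `valuesOnly` and `firstWins` are illustrative; the last step assumes you want the value from the first list to win on duplicates:

```scala
val l1 = List(1 -> "A", 2 -> "B", 3 -> "C")
val l2 = List(2 -> "X", 3 -> "Y", 4 -> "Z")

val grouped = (l1 ::: l2).groupBy(_._1)

// Drop the repeated key from each pair, keeping only the values.
val valuesOnly: Map[Int, List[String]] =
  grouped.map { case (k, pairs) => k -> pairs.map(_._2) }

// l1's pairs come first in the concatenation, so head is l1's value when both lists have the key.
val firstWins: Map[Int, String] =
  valuesOnly.map { case (k, vs) => k -> vs.head }
```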

How to update a nested immutable map

I'm trying to find a cleaner way to update nested immutable structures in Scala. I think I'm looking for something similar to assoc-in in Clojure. I'm not sure how much types factor into this.
For example, in Clojure, to update the "city" attribute of a nested map I'd do:
> (def person {:name "john", :dob "1990-01-01", :home-address {:city "norfolk", :state "VA"}})
#'user/person
> (assoc-in person [:home-address :city] "richmond")
{:name "john", :dob "1990-01-01", :home-address {:state "VA", :city "richmond"}}
What are my options in Scala?
val person = Map("name" -> "john", "dob" -> "1990-01-01",
"home-address" -> Map("city" -> "norfolk", "state" -> "VA"))
As indicated in the other answer, you can leverage case classes to get cleaner, typed data objects. But in case what you need is simply to update a map:
val m = Map("A" -> 1, "B" -> 2)
val m2 = m + ("A" -> 3)
The result (in a worksheet):
m: scala.collection.immutable.Map[String,Int] = Map(A -> 1, B -> 2)
m2: scala.collection.immutable.Map[String,Int] = Map(A -> 3, B -> 2)
The + operator on a Map will add the new key-value pair, overwriting the value if the key already exists. Notably, though, because the map is immutable (and m is a val), + returns a new map, which you have to assign to a new val; the original is left unchanged.
Because, in your example, you're rewriting a nested value, doing this manually becomes somewhat more onerous:
val m = Map("A" -> 1, "B" -> Map("X" -> 2, "Y" -> 4))
val m2 = m + ("B" -> Map("X" -> 3))
This yields some loss-of-data (the nested Y value disappears):
m: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 2, Y -> 4))
m2: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 3)) // Note that 'Y' has gone away.
This forces you to copy the original nested value, update it, and then assign it back:
val m = Map("A" -> 1, "B" -> Map("X" -> 2, "Y" -> 4))
val b = m.get("B") match {
  case Some(b: Map[String, Any] @unchecked) => b + ("X" -> 3) // Updates `X` while keeping the other key-value pairs
  case _ => Map("X" -> 3) // "B" is missing or is not a map: start a fresh nested map
}
val m2 = m + ("B" -> b)
This yields the 'expected' result, but is obviously a lot of code:
m: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 2, Y -> 4))
b: scala.collection.immutable.Map[String,Any] = Map(X -> 3, Y -> 4)
m2: scala.collection.immutable.Map[String,Any] = Map(A -> 1, B -> Map(X -> 3, Y -> 4))
In short, when you 'update' any immutable data structure you are really copying all the pieces you want to keep and substituting updated values where appropriate. If the structure is complicated, this can get onerous, hence the recommendation in the other answer to use a lens library such as Monocle.
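For untyped nested maps, the copy-and-reassign steps above can be wrapped in a small recursive helper that behaves roughly like Clojure's assoc-in. This is only a sketch: `assocIn` is a hypothetical function, not part of the Scala library, and it assumes every intermediate value is a Map with String keys (anything else on the path is replaced by a fresh map).

```scala
// Hypothetical helper, loosely modelled on Clojure's assoc-in.
def assocIn(m: Map[String, Any], path: List[String], value: Any): Map[String, Any] =
  path match {
    case Nil        => m                  // empty path: nothing to update
    case key :: Nil => m + (key -> value) // last segment: set the value
    case key :: rest =>
      // Reuse the nested map if there is one, otherwise start from an empty map.
      val child = m.get(key) match {
        case Some(nested: Map[_, _]) => nested.asInstanceOf[Map[String, Any]]
        case _                       => Map.empty[String, Any]
      }
      m + (key -> assocIn(child, rest, value))
  }

val person: Map[String, Any] = Map(
  "name" -> "john",
  "dob" -> "1990-01-01",
  "home-address" -> Map("city" -> "norfolk", "state" -> "VA"))

val moved = assocIn(person, List("home-address", "city"), "richmond")
```

The original `person` map is untouched; `moved` has "richmond" as the city and still has the "state" entry.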
Scala is a statically typed language, so you may first want to increase the safety of your code by moving away from any-string-to-any-string.
case class Address(city: String, state: String)
case class Person(name: String, dob: java.util.Date, homeAddress: Address)
(Yes, there are better alternatives for java.util.Date).
Then you create an update like this:
val person = Person(name = "john", dob = new java.util.Date(90, 0, 1),
homeAddress = Address(city = "norfolk", state = "VA"))
person.copy(homeAddress = person.homeAddress.copy(city = "richmond"))
To avoid this nested copy, you would use a lens library, like Monocle or Quicklens (there are many others).
import com.softwaremill.quicklens._
person.modify(_.homeAddress.city).setTo("richmond")
The other two answers nicely sum up the importance of correctly modelling your problem so we don't end up having to deal with Map[String, Object] type of collection.
Just adding my two cents here for a brute-force solution using the quite powerful function pipelining and higher-order function features in Scala. The ugly asInstanceOf cast is needed because the Map values are of different types, and hence Scala treats the Map signature as Map[String,Any].
val person: Map[String,Any] = Map("name" -> "john", "dob" -> "1990-01-01", "home-address" -> Map("city" -> "norfolk", "state" -> "VA"))
val newperson = person.map { case (k, v) => if (k == "home-address") k -> v.asInstanceOf[Map[String, String]].updated("city", "Virginia") else k -> v } // both branches yield a key-value pair, so the result is still a Map

Fixed Length SortedMap in Scala

I'm new to Scala, does Scala support a fixed length SortedMap?
What I have in mind is a map that does the following:
Takes a max_size parameter upon creation
Upon an add, checks if there are already max_size elements
If there is, remove the smallest key and its value first (key's gonna be an Int)
Then adds the key and value to the map.
Strictly speaking, I don't need the map to be sorted, but it seems necessary/available if we're removing the smallest key
I wanted to ask before I started rolling my own. Also I will be running this under Samza, which I believe is single threaded and so concurrency won't be a concern.
I'm on scala 2.10
You can do something simple like this based on TreeMap which guarantees order of elements by key:
import scala.collection.immutable.TreeMap
def add[K,V](map: TreeMap[K,V], elem: (K,V), maxSize: Int): TreeMap[K,V] = {
map.takeRight(maxSize - 1) + elem
}
Here is how to use it:
scala> val m = TreeMap(1 -> "one", 2 -> "two", 3 -> "three")
m: scala.collection.immutable.TreeMap[Int,String] =
Map(1 -> one, 2 -> two, 3 -> three)
scala> val m1 = add(m, 0 -> "zero", 4)
m1: scala.collection.immutable.TreeMap[Int,String] =
Map(0 -> zero, 1 -> one, 2 -> two, 3 -> three)
scala> val m2 = add(m1, 4 -> "four", 4)
m2: scala.collection.immutable.TreeMap[Int,String] =
Map(1 -> one, 2 -> two, 3 -> three, 4 -> four)
scala> val m3 = add(m2, 5 -> "five", 4)
m3: scala.collection.immutable.TreeMap[Int,String] =
Map(2 -> two, 3 -> three, 4 -> four, 5 -> five)
scala> val m4 = add(m3, 0 -> "zero", 4)
m4: scala.collection.immutable.TreeMap[Int,String] =
Map(0 -> zero, 3 -> three, 4 -> four, 5 -> five)
You can obviously try to make it more convenient to suit your needs.
Aleksey's answer was very helpful. I made a small fix to it:
import scala.collection.immutable.TreeMap
def add[K,V](map: TreeMap[K,V], elem: (K,V), maxSize: Int): TreeMap[K,V] = {
(map + elem).takeRight(maxSize) // add first, then keep at most maxSize entries with the largest keys
}
val m = TreeMap(1 -> "one", 2 -> "two", 3 -> "three")
val m1 = add(m, 0 -> "zero", 4)
val m2 = add(m1, 4 -> "four", 4)
val m3 = add(m2, 0 -> "zero", 4)
val m4 = add(m3, 1 -> "one", 4)
val m5 = add(m4, 0 -> "zero", 4)
val m6 = add(m5, 1 -> "one", 4)
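For comparison, here is a sketch that follows the steps in the question literally: if the map is full and the key is new, drop the smallest key first, then add. `BoundedMap` is a hypothetical wrapper name, and keys are fixed to Int as the question states.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical wrapper around an immutable TreeMap with a size limit.
final case class BoundedMap[V](maxSize: Int, entries: TreeMap[Int, V] = TreeMap.empty[Int, V]) {
  require(maxSize > 0, "maxSize must be positive")

  def add(key: Int, value: V): BoundedMap[V] = {
    // Evict the smallest key only when a genuinely new key would exceed the limit.
    val trimmed =
      if (!entries.contains(key) && entries.size >= maxSize) entries - entries.firstKey
      else entries
    val next: TreeMap[Int, V] = trimmed.updated(key, value)
    copy(entries = next)
  }
}

val full = BoundedMap[String](2).add(1, "one").add(2, "two").add(3, "three")
```

Here `full.entries` contains 2 -> two and 3 -> three. Note the difference from the add-then-trim version: with this sketch a new key is always stored, even if it is smaller than every existing key.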

Scala - Undo a flatmap after transformation

How can I merge a Seq of Maps to a single Map i.e.
Seq[Map[String, String]] => Map[String, String]
For example:
val someSeq = rdd.map(_._2).flatMap(...) //some transformation to produce the sequence of maps
where someSeq is Seq(student1, student2) and student1 and student2 are Maps :
val student1 = Map("a" -> "1", "b" -> "1")
val student2 = Map("c" -> "1", "d" -> "1")
I need a result like this:
val apps = Map("a" -> "1", "b" -> "1", "c" -> "1", "d" -> "1")
Any ideas?
Unrelated to Spark, but one approach would be to fold over the sequence as follows:
val student1 = Map("a" -> "1", "b" -> "1")
val student2 = Map("c" -> "1", "d" -> "1")
val students = Seq(student1, student2)
students.foldLeft(Map[String, String]())(_ ++ _)
Returns
Map(a -> 1, b -> 1, c -> 1, d -> 1)
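Two details may be worth spelling out with a sketch: `flatten.toMap` is a shorter way to get the same result, and when a key appears in more than one map, the value from the later map wins in both variants. The duplicate key "b" below is made up for illustration:

```scala
// Hypothetical data with a duplicate key "b".
val students = Seq(Map("a" -> "1", "b" -> "1"), Map("c" -> "1", "b" -> "2"))

// Fold the maps together; ++ lets the right-hand map override duplicates.
val viaFold = students.foldLeft(Map.empty[String, String])(_ ++ _)

// Flatten to a sequence of pairs, then build a map; later pairs override earlier ones.
val viaFlatten = students.flatten.toMap
```

Both produce a map with a -> 1, b -> 2 and c -> 1.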
With regard to "undoing" a flatMap, I don't believe this is really possible. To see why, consider the notion of undoing a "flatten".
For example:
val x = Seq(1, 2)
val y = Seq(3, 4)
val combined = Seq(x, y)
val flattened = combined.flatten
val b = Seq(1, 2, 3)
val c = Seq(4)
val combined2 = Seq(b, c)
val flattened2 = combined2.flatten
flattened == flattened2
Returns true.
So basically, in this instance, you can go from unflattened to flattened, but not vice versa, because vice versa would yield multiple answers.
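If you do need to get back to the original grouping, the grouping information has to be carried along explicitly. One possible sketch tags each element with the index of the sequence it came from before flattening (the names `tagged` and `restored` are illustrative; note that empty inner sequences would still be lost):

```scala
val nested = Seq(Seq(1, 2), Seq(3, 4))

// Flatten, but remember which inner sequence each element belonged to.
val tagged: Seq[(Int, Int)] =
  nested.zipWithIndex.flatMap { case (xs, i) => xs.map(x => (i, x)) }

// Regroup by the remembered index and restore the original order of the groups.
val restored: Seq[Seq[Int]] =
  tagged.groupBy(_._1).toSeq.sortBy(_._1).map { case (_, pairs) => pairs.map(_._2) }
```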

Compare two Maps in Scala

Is there any predefined function that I can use to compare two Maps based on their keys and give me the difference? Right now, I iterate over Map1 and, for each key, check whether there is a matching element in Map2, then pattern match to find the difference. Is there a more elegant way to do this?
Consider the difference between the maps converted into sets of tuples,
(m1.toSet diff m2.toSet).toMap
Try:
val diff = (m1.keySet -- m2.keySet) ++ (m2.keySet -- m1.keySet)
diff contains the keys that appear in exactly one of the two maps: those in m1 but not in m2, and those in m2 but not in m1.
This solution looks like the right way:
scala> val x = Map(1 -> "a", 2 -> "b", 3 -> "c")
x: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b, 3 -> c)
scala> val y = Map(1 -> "a", 2 -> "b", 4 -> "d")
y: scala.collection.immutable.Map[Int,String] = Map(1 -> a, 2 -> b, 4 -> d)
scala> val diff : Map[Int, String] = x -- y.keySet
diff: Map[Int,String] = Map(3 -> c)
Found it here https://gist.github.com/frgomes/69068062e7849dfe9d5a53bd3543fb81
I think the -- operator will do what you're looking for: http://www.scala-lang.org/api/current/index.html#scala.collection.Map#--(xs:scala.collection.GenTraversableOnce[A]):Repr
Although this will probably only work given the assumption that Map2 is always a subset of Map1...
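Putting the ideas from these answers together, a fuller comparison might report keys only in the first map, keys only in the second, and keys whose values differ. This is a sketch; `MapDiff` and `diffMaps` are hypothetical names, not library functions:

```scala
// Hypothetical result type: entries unique to each side, plus keys whose values differ.
final case class MapDiff[K, V](onlyLeft: Map[K, V], onlyRight: Map[K, V], changed: Map[K, (V, V)])

def diffMaps[K, V](left: Map[K, V], right: Map[K, V]): MapDiff[K, V] = {
  val onlyLeft = left.filter { case (k, _) => !right.contains(k) }
  val onlyRight = right.filter { case (k, _) => !left.contains(k) }
  // Keys present on both sides with different values, as (leftValue, rightValue).
  val changed = left.collect {
    case (k, lv) if right.get(k).exists(_ != lv) => (k, (lv, right(k)))
  }
  MapDiff(onlyLeft, onlyRight, changed)
}

val x = Map(1 -> "a", 2 -> "b", 3 -> "c")
val y = Map(1 -> "a", 2 -> "B", 4 -> "d")
val d = diffMaps(x, y)
```

For these inputs, `d.onlyLeft` is Map(3 -> c), `d.onlyRight` is Map(4 -> d), and `d.changed` maps key 2 to the pair (b, B). Unlike the `--` approach, this does not assume that one map is a subset of the other.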