Typesafe keys for a map - scala

Given the following code:
val m: Map[String, Int] = .. // fetch from somewhere
val keys: List[String] = m.keys.toList
val keysSubset: List[String] = ... // choose random keys
We can define the following method:
def sumValues(m: Map[String, Int], ks: List[String]): Int =
ks.map(m).sum
And call this as:
sumValues(m, keysSubset)
However, the problem with sumValues is that if ks contains a key that is not present in the map, the code still compiles but throws an exception at runtime. For example:
// assume m = Map("two" -> 2, "three" -> 3)
sumValues(m, "one" :: Nil) // compiles, but throws java.util.NoSuchElementException: key not found: one
What I want instead is a definition of sumValues that guarantees, at compile time, that the ks argument contains only keys that are present in the map. My guess is that the sumValues type signature needs to accept some form of implicit evidence that ks is derived from the map's list of keys.
I'm not limited to a Scala Map, however; any record-like structure would do. The map itself, however, won't be hardcoded: it will be derived from something else or passed in as an argument.
Note: I'm not really after summing the values. I'm trying to find a type signature for sumValues such that calls to it only compile if the ks argument provably comes from the list of keys of the map (or record-like structure).
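One way to get the compile-time guarantee the question asks for is a path-dependent key type. This is not what the answer below does, and the names TypedKeys, KeyedMap, Key and key are hypothetical. A minimal sketch, assuming String keys: the map hands out Key values that only it can create, and sumValues accepts only keys belonging to the particular map instance it is given.

```scala
object TypedKeys {
  // Wraps a map so that keys can only be obtained from the map itself.
  final class KeyedMap[V](underlying: Map[String, V]) {
    // Path-dependent key type: km1.Key and km2.Key are different types,
    // and the constructor is private to KeyedMap, so ordinary code cannot forge a Key.
    final class Key private[KeyedMap] (val name: String)

    // Every key present in the map, wrapped as a Key of this map.
    def keys: List[Key] = underlying.keys.toList.map(name => new Key(name))

    // Runtime check at the boundary: Some(key) only if the map contains the name.
    def key(name: String): Option[Key] =
      if (underlying.contains(name)) Some(new Key(name)) else None

    // Cannot fail: a Key of this map always has an entry in it.
    def apply(k: Key): V = underlying(k.name)
  }

  // Dependent method type: ks must contain keys of this particular km.
  def sumValues(km: KeyedMap[Int])(ks: List[km.Key]): Int =
    ks.map(k => km(k)).sum
}
```

With val km = new TypedKeys.KeyedMap(Map("two" -> 2, "three" -> 3)), the call TypedKeys.sumValues(km)(km.keys) compiles, while passing a plain List[String] or another KeyedMap's keys does not, because other.Key is a different type from km.Key. The price is that arbitrary strings must go through key, which is a runtime check returning an Option.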

Another solution is to look up only the keys in the intersection of m's keys and ks.
For example:
scala> def sumValues(m: Map[String, Int], ks: List[String]): Int = {
| m.keys.toList.filter(ks.contains).map(m).sum // toList: mapping over the key Set would merge equal values
| }
sumValues: (m: Map[String,Int], ks: List[String])Int
scala> val map = Map("hello" -> 5)
map: scala.collection.immutable.Map[String,Int] = Map(hello -> 5)
scala> sumValues(map, List("hello", "world"))
res1: Int = 5
I think this solution is better than providing a default value because it is more generic (you can use it for more than sums). However, it is probably less efficient because of the intersection: ks.contains scans the whole list once for every key in the map.
EDIT: As @jwvh pointed out in a comment below, ks.intersect(m.keys.toSeq).map(m).sum is, in my opinion, more readable than m.keys.filter(ks.contains).map(m).sum.
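A single-pass alternative, not taken from the answer: Map.get returns an Option, so flatMap over ks simply drops keys that are missing from m, with no intersection step. It maps over the List ks rather than over m.keys (which is a Set, so mapping over it merges equal values), and it counts a key each time it appears in ks, like the question's ks.map(m).sum. SafeSum is a hypothetical wrapper name.

```scala
object SafeSum {
  // Sums the values for the keys in ks that exist in m; absent keys are skipped.
  // m.get yields an Option, and flatMap discards the Nones instead of throwing.
  def sumValues(m: Map[String, Int], ks: List[String]): Int =
    ks.flatMap(m.get).sum
}
```

This keeps the "ignore unknown keys" behaviour of the intersection version; it does not give the compile-time guarantee the question is after.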

Related

spark: value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]

I'm new to Spark and Scala, and I've run into a compile error:
Let's say we have an RDD whose elements are maps, like this:
val rawData = someRDD.map{
  //some ops
  Map(
    "A" -> someInt_var1, //Int
    "B" -> someInt_var2, //Int
    "C" -> somelong_var  //Long
  )
}
Then I want to get a histogram of these values. Here is my code:
rawData.map{row => row.get("A")}.histogram(10)
And the compile error says:
value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]
I'm wondering why rawData.map{row => row.get("A")} is an org.apache.spark.rdd.RDD[Option[Any]], and how to transform it into an RDD[Int].
I have tried this:
rawData.map{row => row.get("A")}.map{_.toInt}.histogram(10)
But it fails to compile:
value toInt is not a member of Option[Any]
I'm totally confused and would appreciate some help.
You get an Option because Map.get returns one: it returns None if the key doesn't exist in the Map. The Any part comes from the mixed value types in the Map: you have both Int and Long values, so the compiler infers a common supertype for them. In my case it infers AnyVal rather than Any.
One possible solution is to use getOrElse to get rid of the Option, supplying a default value for when the key doesn't exist. If you are sure the value for "A" is always an Int, you can then cast it from AnyVal to Int with asInstanceOf[Int].
A simplified example as follows:
val rawData = sc.parallelize(Seq(Map("A" -> 1, "B" -> 2, "C" -> 4L)))
rawData.map(_.get("A"))
// res6: org.apache.spark.rdd.RDD[Option[AnyVal]] = MapPartitionsRDD[9] at map at <console>:27
rawData.map(_.getOrElse("A", 0).asInstanceOf[Int]).histogram(10)
// res7: (Array[Double], Array[Long]) = (Array(1.0, 1.0),Array(1))
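If the value for "A" might be missing or might not be an Int, a variation that avoids both the default 0 (which adds zeros to the histogram) and the cast (asInstanceOf[Int] on a boxed Long throws a ClassCastException) is to drop missing keys with flatMap and keep only Ints with collect. The sketch runs on a plain Scala List so it needs no Spark; ExtractInts, rows and intsFor are hypothetical names.

```scala
object ExtractInts {
  // Sample rows shaped like the question's RDD elements (mixed Int and Long values).
  val rows: List[Map[String, AnyVal]] = List(
    Map("A" -> 1, "B" -> 2, "C" -> 4L),
    Map("B" -> 3, "C" -> 5L), // this row has no "A" key
    Map("A" -> 7, "B" -> 1, "C" -> 6L)
  )

  // get yields Option[AnyVal]; flatMap drops rows without the key, and the
  // partial function keeps only values that really are Ints, so no cast is needed.
  def intsFor(key: String, data: List[Map[String, AnyVal]]): List[Int] =
    data.flatMap(_.get(key)).collect { case i: Int => i }
}
```

RDD also has flatMap and a collect overload that takes a PartialFunction, so on Spark the equivalent would presumably be rawData.flatMap(_.get("A")).collect { case i: Int => i }.histogram(10). The zero-argument collect() is a different method that returns an Array to the driver.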

Scala: Default Value for a Map of Tuples

Look at the following Map:
scala> val v = Map("id" -> ("_id", "$oid")).withDefault(identity)
v: scala.collection.immutable.Map[String,java.io.Serializable] = Map(id -> (_id,$oid))
The compiler infers a Map[String,java.io.Serializable], and the value for id can be retrieved like this:
scala> v("id")
res37: java.io.Serializable = (_id,$oid)
Now, if I try to access an element that does not exist like this...
scala> v("idx")
res45: java.io.Serializable = idx
... then, as expected, I get back the key itself. But how do I get back a tuple of the key itself and an empty string, like this?
scala> v("idx")
resXX: java.io.Serializable = (idx,"")
I always need to get back a tuple, regardless of whether or not the element exists.
Thanks.
Instead of .withDefault(identity) you can use
val v = Map("id" -> ("_id", "$oid")).withDefault(x => (x, ""))
withDefault takes as a parameter a function that will create the default value when needed.
This also changes the inferred value type from the useless Serializable to the more useful (String, String).
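A self-contained sketch of the withDefault call above; TupleDefaults and oid are hypothetical names. One detail worth knowing: the default is only used by apply, i.e. v(key), so get and contains still report the key as missing.

```scala
object TupleDefaults {
  private val oid = ("_id", "$oid")

  // withDefault takes a function K => V; returning (key, "") keeps the value
  // type at (String, String) instead of widening it to java.io.Serializable.
  val v: Map[String, (String, String)] =
    Map("id" -> oid).withDefault(x => (x, ""))
}
```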

What's the meaning of Seq[Int] as a key in Map[Seq[Int], FactorNode]?

I have the following line of code in Scala:
private val factorNodes: mutable.Map[Seq[Int], FactorNode] = mutable.Map[Seq[Int], FactorNode]()
So this instantiates a mutable.Map, but I don't understand the key type, Seq[Int].
Is Seq[Int] an array of integers or just a special way of indexing to a position in the map?
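Seq[Int] is a trait (similar to a Java interface) for an ordered sequence of integers; Seq(1, 2, 3) creates a List by default. This means that your map uses sequences of integers as keys. Seqs compare by their elements, so any Seq holding the same integers in the same order finds the same entry. You can do something like the following: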
import scala.collection.mutable

val a: FactorNode = new FactorNode
val b: FactorNode = new FactorNode
val map: mutable.Map[Seq[Int], FactorNode] = mutable.Map(Seq(1,2,3) -> a)
map += (Seq(1,2,5) -> b)
// and to retrieve:
map(Seq(4,5,6)) // throws NoSuchElementException: there is no entry for this key
map(Seq(1,2,5)) // returns b
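A runnable sketch of the same idea, with FactorNode replaced by a hypothetical case class so the example compiles on its own. It shows that lookups match by element values across Seq implementations, that a missing key can be probed with get instead of throwing, and why Array[Int] would be a poor key type: arrays use reference equality.

```scala
import scala.collection.mutable

object SeqKeys {
  // Hypothetical stand-in for the question's FactorNode class.
  final case class FactorNode(label: String)

  // Keys are compared by their elements, so any Seq implementation
  // (List, Vector, ...) with the same elements hits the same entry.
  def build(): mutable.Map[Seq[Int], FactorNode] = {
    val nodes = mutable.Map.empty[Seq[Int], FactorNode]
    nodes += (Seq(1, 2, 3) -> FactorNode("a"))
    nodes += (Vector(1, 2, 5) -> FactorNode("b"))
    nodes
  }
}
```

An array can still be used for a lookup by converting it with toSeq first, which produces a Seq that compares by its elements.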

How to turn a list of objects into a map of two fields in Scala

I'm having a real brain fart here. I'm working with the Play Framework. I have a method which takes a map and turns it into an HTML select element. I had a one-liner to take a list of objects and convert it into a map of two of the object's fields, id and name. However, I'm a Java programmer and my Scala is weak, and I've only gone and forgotten the syntax of how I did it.
I had something like
organizations.all.map {org => /* org.prop1, org.prop2 */ }
Can anyone complete the commented part?
I would suggest:
organizations.all.map { org => (org.id, org.name) }.toMap
e.g.
scala> case class T(val a : Int, val b : String)
defined class T
scala> List(T(1, "A"), T(2, "B"))
res0: List[T] = List(T(1,A), T(2,B))
scala> res0.map(t => (t.a, t.b))
res1: List[(Int, String)] = List((1,A), (2,B))
scala> res0.map(t => (t.a, t.b)).toMap
res2: scala.collection.immutable.Map[Int,String] = Map(1 -> A, 2 -> B)
You could also take an intermediary List out of the equation and go straight to the Map like this:
case class Org(prop1:String, prop2:Int)
val list = List(Org("foo", 1), Org("bar", 2))
val map:Map[String,Int] = list.map(org => (org.prop1, org.prop2))(collection.breakOut)
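Passing collection.breakOut as the CanBuildFrom argument lets map build the Map directly, skipping the intermediate List. Note that breakOut only exists up to Scala 2.12; Scala 2.13 removed CanBuildFrom, and list.view.map(...).toMap or list.iterator.map(...).toMap serves the same purpose there.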
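A sketch that avoids the intermediate List of pairs without relying on breakOut, so it also compiles on Scala 2.13 and Scala 3. Org is the answer's example class; OrgsToMap and toLookup are hypothetical names.

```scala
object OrgsToMap {
  // Same shape as the answer's example class.
  final case class Org(prop1: String, prop2: Int)

  // iterator.map is lazy, so the pairs are fed straight into toMap
  // without first being collected into a List.
  def toLookup(orgs: List[Org]): Map[String, Int] =
    orgs.iterator.map(org => org.prop1 -> org.prop2).toMap
}
```

If two objects share the same key field, the later one wins, as with toMap in general.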

How to set and get keys from scala TreeMap?

Suppose I have
import scala.collection.immutable.TreeMap
val tree = new TreeMap[String, List[String]]
After the declaration above, I want to map the key "k1" to List("foo", "bar"),
and then read back the value for "k1", and also try to read the non-existent key "k2".
What happens if I try to read the non-existent key "k2"?
The best way to "mutate" the immutable map is by referring to it in a variable (var as opposed to val):
var tree = TreeMap.empty[String, List[String]]
tree += ("k1" -> List("foo", "bar")) //a += b is sugar for "a = a + b" here, since the immutable map has no += method
You can read a value directly with the apply method; Scala's syntactic sugar lets you call it with just parentheses:
val l = tree("k1") //equivalent to tree.apply("k1")
However, I rarely access maps like this, because apply throws a NoSuchElementException if the key is not present. Use get instead, which returns an Option[V], where V is the value type:
val l = tree.get("k1") //returns Option[List[String]] = Some(List("foo", "bar"))
val m = tree.get("k2") //returns Option[List[String]] = None
In this case, the value returned for an absent key is None. What can you do with an optional result? You can use the methods map, flatMap, filter, collect and getOrElse. Try to avoid pattern-matching on it, or calling the Option.get method directly!
For example:
val wordLen : List[Int] = tree.get("k1").map(l => l.map(_.length)) getOrElse Nil
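A runnable sketch of the lookups described above, showing what happens for the non-existent key "k2": apply throws, get returns None, and getOrElse supplies a fallback. TreeMapLookup, build, wordLengths and applySafely are hypothetical names.

```scala
import scala.collection.immutable.TreeMap
import scala.util.Try

object TreeMapLookup {
  // Mirrors the answer: a var holding an immutable TreeMap, updated with +=.
  def build(): TreeMap[String, List[String]] = {
    var tree = TreeMap.empty[String, List[String]]
    tree += ("k1" -> List("foo", "bar")) // rebinds: tree = tree + ("k1" -> ...)
    tree
  }

  // get returns an Option; map transforms the value if present,
  // and getOrElse supplies Nil for a missing key.
  def wordLengths(tree: TreeMap[String, List[String]], key: String): List[Int] =
    tree.get(key).map(words => words.map(_.length)).getOrElse(Nil)

  // tree(key) throws NoSuchElementException for a missing key; Try captures it.
  def applySafely(tree: TreeMap[String, List[String]], key: String): Try[List[String]] =
    Try(tree(key))
}
```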
EDIT: one way of building a Map without declaring it as a var, and assuming you are doing this by transforming some separate collection, is to do it via a fold. For example:
//coll is some collection class CC[A]
//f : A => (K, V); TreeMap.empty also needs an implicit Ordering[K]
val m = coll.foldLeft(TreeMap.empty[K, V]) { (tree, c) => tree + f(c) }
This may not be possible for your use case.
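A concrete version of the fold, assuming the input is a List and using a hypothetical helper foldToTree with the same (tree, c) => tree + f(c) step. The context bound K: Ordering supplies the Ordering that TreeMap.empty needs.

```scala
import scala.collection.immutable.TreeMap

object FoldIntoTreeMap {
  // Builds a TreeMap from a collection without a var: each element is turned
  // into a (key, value) pair by f and added to the accumulated tree.
  def foldToTree[A, K: Ordering, V](coll: List[A])(f: A => (K, V)): TreeMap[K, V] =
    coll.foldLeft(TreeMap.empty[K, V]) { (tree, c) => tree + f(c) }
}
```

If f maps two elements to the same key, the later one wins, because + replaces the existing entry.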