Scala Nested HashMaps, how to access Case Class value properties? - scala

New to Scala, continue to struggle with Option related code. I have a HashMap built of Case Class instances that themselves contain hash maps with Case Class instance values. It is not clear to me how to access properties of the retrieved Class instances:
import collection.mutable.HashMap
case class InnerClass(name: String, age: Int)
case class OuterClass(name: String, nestedMap: HashMap[String, InnerClass])
// Load some data...hash maps are mutable
val innerMap = new HashMap[String, InnerClass]()
innerMap += ("aaa" -> InnerClass("xyz", 0))
val outerMap = new HashMap[String, OuterClass]()
outerMap += ("AAA" -> OuterClass("XYZ", innerMap))
// Try to retrieve data
val outerMapTest = outerMap.getOrElse("AAA", None)
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Option[ScalaFiddle.OuterClass]
// Try to retrieve data a different way
val outerMapTest = outerMap.getOrElse("AAA", None)
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Product with Serializable
Please advise on how I would go about getting access to outerMapTest.nestedMap. I'll eventually need to get values and properties out of the nestedMap HashMap as well.

Since you are using .getOrElse("someKey", None) which returns you a type Product (not the actual type as you expect to be OuterClass)
scala> val outerMapTest = outerMap.getOrElse("AAA", None)
outerMapTest: Product with Serializable = OuterClass(XYZ,Map(aaa -> InnerClass(xyz,0)))
so Product either needs to be pattern matched or casted to OuterClass
pattern match example
scala> outerMapTest match { case x : OuterClass => println(x.nestedMap); case _ => println("is not outerclass") }
Map(aaa -> InnerClass(xyz,0))
Casting example which is a terrible idea when outerMapTest is None, (pattern matching is favored over casting)
scala> outerMapTest.asInstanceOf[OuterClass].nestedMap
res30: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
But better way of solving it would simply use .get which very smart and gives you Option[OuterClass],
scala> outerMap.get("AAA").map(outerClass => outerClass.nestedMap)
res27: Option[scala.collection.mutable.HashMap[String,InnerClass]] = Some(Map(aaa -> InnerClass(xyz,0)))
For key that does not exist, gives you None
scala> outerMap.get("I dont exist").map(outerClass => outerClass.nestedMap)
res28: Option[scala.collection.mutable.HashMap[String,InnerClass]] = None

Here are some steps you can take to get deep inside a nested structure like this.
outerMap.lift("AAA") // Option[OuterClass]
.map(_.nestedMap) // Option[HashMap[String,InnerClass]]
.flatMap(_.lift("aaa")) // Option[InnerClass]
.map(_.name) // Option[String]
.getOrElse("no name") // String
Notice that if either of the inner or outer maps doesn't have the specified key ("aaa" or "AAA" respectively) then the whole thing will safely result in the default string ("no name").

A HashMap will return None if a key is not found so it is unnecessary to do getOrElse to return None if the key is not found.
A simple solution to your problem would be to use get only as below
Change your first get as
val outerMapTest = outerMap.get("AAA").get
you can check the output as
println(outerMapTest.name)
println(outerMapTest.nestedMap)
And change the second get as
val nestedMap = outerMapTest.nestedMap.get("aaa").get
You can test the outputs as
println(nestedMap.name)
println(nestedMap.age)
Hope this is helpful

You want
val maybeInner = outerMap.get("AAA").flatMap(_.nestedMap.get("aaa"))
val maybeName = maybeInner.map(_.name)
Which if your feeling adventurous you can get with
val name: String = maybeName.get
But that will throw an error if its not there. If its a None

you can access the nestMap using below expression.
scala> outerMap.get("AAA").map(_.nestedMap).getOrElse(HashMap())
res5: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
if "AAA" didnt exist in the outerMap Map object then the below expression would have returned an empty HashMap as indicated in the .getOrElse method argument (HashMap()).

Related

spark: value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]

I'm new to spark and scala and I've come up with a compile error with scala:
Let's say we have a rdd, which is a map like this:
val rawData = someRDD.map{
//some ops
Map(
"A" -> someInt_var1 //Int
"B" -> someInt_var2 //Int
"C" -> somelong_var //Long
)
}
Then, I want to get histogram info of these vars. So, here is my code:
rawData.map{row => row.get("A")}.histogram(10)
And the compile error says:
value histogram is not a member of org.apache.spark.rdd.RDD[Option[Any]]
I'm wondering why rawData.map{row => row.get("A")} is org.apache.spark.rdd.RDD[Option[Any]] and how to transform it to rdd[Int]?
I have tried like this:
rawData.map{row => row.get("A")}.map{_.toInt}.histogram(10)
But it compiles fail:
value toInt is not a member of Option[Any]
I'm totally confused and seeking for help here.
You get Option because Map.get returns an option; Map.get returns None if the key doesn't exist in the Map; And Option[Any] is also related to the miscellaneous data types of the Map's Value, you have both Int and Long, in my case it returns AnyVal instead of Any;
A possible solution is use getOrElse to get rid of Option by providing a default value when the key doesn't exist, and if you are sure A's value is always a int, you can convert it from AnyVal to Int using asInstanceOf[Int];
A simplified example as follows:
val rawData = sc.parallelize(Seq(Map("A" -> 1, "B" -> 2, "C" -> 4L)))
rawData.map(_.get("A"))
// res6: org.apache.spark.rdd.RDD[Option[AnyVal]] = MapPartitionsRDD[9] at map at <console>:27
rawData.map(_.getOrElse("A", 0).asInstanceOf[Int]).histogram(10)
// res7: (Array[Double], Array[Long]) = (Array(1.0, 1.0),Array(1))

How to set a default map value as a new object in scala?

I'm trying to create a Map of objects like this:
case class Worker(var name:String, var tasks:List[Int]=List())
val map = Map[Int, Worker]().withDefaultValue(Worker(""))
So that whenever a new request id (the key in the map) comes in I can create a new worker object for that id. I do not know how many request ids will come in or the range of id beforehand. But this code gives a very weird result.
scala> map.size
res34: Int = 0
scala> map(1).name = "first worker"
map(1).name: String = first worker
scala> map.size
res35: Int = 0
scala> map(3).name = "request #3"
map(3).name: String = request #3
scala> map.size
res36: Int = 0
scala> map(1).name
res37: String = request #3
It is not the result I expected. The map size is always 0. There is no new Worker objects created. I tried with mutable and immutable Maps, case class and regular class with new Worker and also tried { override def default(key:Int)=Worker("") }. Nothing works as expected. Can someone help me understand the behavior of withDefaultValue or override def default? or scala way to do this? I know I can achieve this the java way if map.containsKey(xx) map.get(xx).name = "abc" else map.put(xx, new Worker). But it is tedious.
Accessing the default value of a Map does not add that value to the Map itself as a new value. You are simply mutating the same default value each time (same reference). It would be quite unexpected for a standard Map implementation to mutate itself.
To implement what you want, you might actually want to override the apply method to add a new element to the Map when one is not found.
import scala.collection.mutable.HashMap
case class Worker(var name: String, var tasks: List[Int] = Nil)
class RequestMap extends HashMap[Int, Worker] { self =>
override def apply(a: Int): Worker = {
super.get(a) match { // Find the element through the parent implementation
case Some(value) => value // If it exists, return it
case None => {
val w = Worker("") // Otherwise make a new one
self += a -> w // Add it to the Map with the desired key
w // And return it
}
}
}
}
This does what you want, but it is just an example of a direction you can take. It is not thread safe, and would need some sort of locking mechanism to make it so.
scala> val map = new RequestMap
map: RequestMap = Map()
scala> map(1)
res3: Worker = Worker(,List())
scala> map(2)
res4: Worker = Worker(,List())
scala> map.size
res5: Int = 2

How do I append to a listbuffer which is a value of a mutable map in Scala?

val mymap= collection.mutable.Map.empty[String,Seq[String]]
mymap("key") = collection.mutable.ListBuffer("a","b")
mymap.get("key") += "c"
The last line to append to the list buffer is giving error. How the append can be done ?
When you run the code in the scala console:
→$scala
scala> val mymap= collection.mutable.Map.empty[String,Seq[String]]
mymap: scala.collection.mutable.Map[String,Seq[String]] = Map()
scala> mymap("key") = collection.mutable.ListBuffer("a","b")
scala> mymap.get("key")
res1: Option[Seq[String]] = Some(ListBuffer(a, b))
You'll see that mymap.get("key") is an optional type. You can't add a string to the optional type.
Additionally, since you typed mymap to Seq[String], Seq[String] does not have a += operator taking in a String.
The following works:
val mymap= collection.mutable.Map.empty[String,collection.mutable.ListBuffer[String]]
mymap("key") = collection.mutable.ListBuffer("a","b")
mymap.get("key").map(_ += "c")
Using the .map function will take advantage of the optional type and prevent noSuchElementException as Łukasz noted.
To deal with your problems one at a time:
Map.get returns an Option[T] and Option does not provide a += or + method.
Even if you use Map.apply (mymap("key")) the return type of apply will be V (in this case Seq) regardless of what the actual concrete type is (Vector, List, Set, etc.). Seq does not provide a += method, and its + method expects another Seq.
Given that, to get what you want you need to declare the type of the Map to be a mutable type:
import collection.mutable.ListBuffer
val mymap= collection.mutable.Map.empty[String,ListBuffer[String]]
mymap("key") = ListBuffer("a","b")
mymap("key") += "c"
will work as you expect it to.
If you really want to have immutable value, then something like this should also work:
val mymap= collection.mutable.Map.empty[String,Seq[String]]
mymap("key") = Vector("a","b")
val oldValue = mymap.get("key").getOrElse(Vector[String]())
mymap("key") = oldValue :+ "c"
I used Vector here, because adding elements to the end of List is unefficient by design.

Scala: Default Value for a Map of Tuples

Look at the following Map:
scala> val v = Map("id" -> ("_id", "$oid")).withDefault(identity)
v: scala.collection.immutable.Map[String,java.io.Serializable] = Map(id -> (_id,$oid))
The compiler generates a Map[String,java.io.Serializable] and the value of id can be retrieved like this:
scala> v("id")
res37: java.io.Serializable = (_id,$oid)
Now, if I try to access an element that does not exist like this...
scala> v("idx")
res45: java.io.Serializable = idx
... then as expected I get back the key itself... but how do I get back a tuple with the key itself and an empty string like this?
scala> v("idx")
resXX: java.io.Serializable = (idx,"")
I always need to get back a tuple, regardless of whether or not the element exists.
Thanks.
Instead of .withDefault(identity) you can use
val v = Map("id" -> ("_id", "$oid")).withDefault(x => (x, ""))
withDefault takes as a parameter a function that will create the default value when needed.
This will also change the return type from useless Serializable to more useful (String, String).

Scala Macros: Checking for a certain annotation

Thanks to the answers to my previous question, I was able to create a function macro such that it returns a Map that maps each field name to its value of a class, e.g.
...
trait Model
case class User (name: String, age: Int, posts: List[String]) extends Model {
val numPosts: Int = posts.length
...
def foo = "bar"
...
}
So this command
val myUser = User("Foo", 25, List("Lorem", "Ipsum"))
myUser.asMap
returns
Map("name" -> "Foo", "age" -> 25, "posts" -> List("Lorem", "Ipsum"), "numPosts" -> 2)
This is where Tuples for the Map are generated (see Travis Brown's answer):
...
val pairs = weakTypeOf[T].declarations.collect {
case m: MethodSymbol if m.isAccessor =>
val name = c.literal(m.name.decoded)
val value = c.Expr(Select(model, m.name))
reify(name.splice -> value.splice).tree
}
...
Now I want to ignore fields that have #transient annotation. How would I check if a method has a #transient annotation?
I'm thinking of modifying the snippet above as
val pairs = weakTypeOf[T].declarations.collect {
case m: MethodSymbol if m.isAccessor && !m.annotations.exists(???) =>
val name = c.literal(m.name.decoded)
val value = c.Expr(Select(model, m.name))
reify(name.splice -> value.splice).tree
}
but I can't find what I need to write in exists part. How would I get #transient as an Annotation so I could pass it there?
Thanks in advance!
The annotation will be on the val itself, not on the accessor. The easiest way to access the val is through the accessed method on MethodSymbol:
def isTransient(m: MethodSymbol) = m.accessed.annotations.exists(
_.tpe =:= typeOf[scala.transient]
)
Now you can just write the following in your collect:
case m: MethodSymbol if m.isAccessor && !isTransient(m) =>
Note that the version of isTransient I've given here has to be defined in your macro, since it needs the imports from c.universe, but you could factor it out by adding a Universe argument if you're doing this kind of thing in several macros.