Efficient way to add a key and value to a Map of Sets in Scala

I have a Scala Map like this:
val myMap: mutable.Map[String, mutable.Set[String]]= mutable.Map[String, mutable.Set[String]]()
I would like to add a key (a String) and a value to it as efficiently as possible. The addition should check whether the key is already in the Map; if it is, the new value is added to the corresponding Set. If the key is not present, the key is added together with a new Set whose first element is the value.
Regards
Yasset

Given a (key, value) pair: if the key is present in the map, the value is added to its set. If the key is not present, the key is added to the map with a new set containing the value.
import scala.collection.mutable

def update(key: String, value: String, map: mutable.Map[String, mutable.Set[String]]): Unit =
  map.get(key) match {
    case Some(set) => set += value                   // key exists: extend its set
    case None      => map(key) = mutable.Set(value)  // new key: start a set
  }
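The same check-then-insert logic can probably be written more compactly with getOrElseUpdate, which is part of mutable.Map. The following is only a minimal sketch; the object name MultiMapSketch and the helper addValue are made up for illustration.

```scala
import scala.collection.mutable

object MultiMapSketch {
  // Same shape as the map in the question
  val myMap: mutable.Map[String, mutable.Set[String]] =
    mutable.Map.empty[String, mutable.Set[String]]

  // Hypothetical helper: fetch the set for `key`, creating and storing
  // an empty one first if the key is absent, then add the value to it.
  def addValue(key: String, value: String): Unit =
    myMap.getOrElseUpdate(key, mutable.Set.empty[String]) += value
}
```

For example, calling MultiMapSketch.addValue("fruit", "apple") and then MultiMapSketch.addValue("fruit", "pear") should leave a single "fruit" entry whose set holds both values.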

What you basically need is a MultiMap.
import collection.mutable.{ HashMap, MultiMap, Set }
val mm = new HashMap[Int, Set[String]] with MultiMap[Int, String]
mm.addBinding(1, "a")
mm.addBinding(2, "b")
mm.addBinding(1, "c")
println(mm)
//prints Map(2 -> Set(b), 1 -> Set(c, a))
Guava also provides several Multimap implementations (array-backed, hash-based, linked-list-based, etc.), and you don't need a mixin to create them. Its API is quite extensive.
With Scala's MultiMap you are restricted to Map[A, Set[B]]. You cannot write new HashMap[Int, List[String]] with MultiMap[Int, String].
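If List values are needed without pulling in Guava, one option is to manage a plain mutable.Map[Int, List[String]] by hand. This is only a rough sketch; ListMultiMapSketch and its addBinding method are hypothetical names that mirror MultiMap's addBinding and are not part of any library.

```scala
import scala.collection.mutable

object ListMultiMapSketch {
  // Hypothetical helper: prepend `value` to the list stored under `key`,
  // starting from an empty list when the key has no entry yet.
  def addBinding(mm: mutable.Map[Int, List[String]], key: Int, value: String): Unit =
    mm(key) = value :: mm.getOrElse(key, Nil)
}
```

Prepending keeps each insert cheap, at the cost of storing values in reverse insertion order.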

Scala create immutable nested map

I have two strings:
val keyMap = "androidApp,key1;iosApp,key2;xyz,key3"
val tentMap = "androidApp,tenant1; iosApp,tenant1; xyz,tenant2"
I want to create a nested immutable map like this:
tenant1 -> (androidApp -> key1, iosApp -> key2),
tenant2 -> (xyz -> key3)
So basically I want to group by tenant and, for each tenant, create a map of the matching keyMap entries.
Here is what I tried, but it uses a mutable map, which I don't want. Is there a way to create this using an immutable map?
case class TenantSetting() {
val requesterKeyMapping = new mutable.HashMap[String, String]()
}
val requesterKeyMapping = keyMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map(keyValuePair => (keyValuePair[0],keyValuePair[1]))
.toMap
}.flatten.toMap
val config = new mutable.HashMap[String, TenantSetting]
tentMap.split(";")
.map { keyValueList => keyValueList.split(',')
.filter(_.size==2)
.map { keyValuePair =>
val requester = keyValuePair[0]
val tenant = keyValuePair[1]
if (!config.contains(tenant)) config.put(tenant, new TenantSetting)
config.get(tenant).get.requesterKeyMapping.put(requester, requesterKeyMapping.get(requester).get)
}
}
The logic to break the strings into a map can be the same for both, since they use the same syntax.
What you had for the first string was not quite right: the filter was applied to each string in the split result rather than to the array itself. That is also why you ended up using [] on keyValuePair, which was of type String and not Array[String] as I think you were expecting. You also needed a trim to cope with the spaces in the second string. You might want to trim the key and value as well, to avoid other whitespace issues.
Additionally, in this case the combination of map and filter can be written more succinctly with collect, as shown here:
How to convert an Array to a Tuple?
Using a pattern with 2 elements ensures that anything with a length other than 2 is filtered out, as you wanted.
The iterator makes the combination of map and collect more efficient, because the collection returned from the first split is traversed only once.
With both strings turned into maps, all that remains is to use groupBy to group the first map by the value the second map holds for the same key. Obviously this only works if every key of the first map is also present in the second map.
def toMap(str: String): Map[String, String] =
str
.split(";")
.iterator
.map(_.trim.split(','))
.collect { case Array(key, value) => (key.trim, value.trim) }
.toMap
val keyMap = toMap("androidApp,key1;iosApp,key2;xyz,key3")
val tentMap = toMap("androidApp,tenant1; iosApp,tenant1; xyz,tenant2")
val finalMap = keyMap.groupBy { case (k, _) => tentMap(k) }
Printing out finalMap gives:
Map(tenant2 -> Map(xyz -> key3), tenant1 -> Map(androidApp -> key1, iosApp -> key2))
Which is what you wanted.
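If some keys might be missing from the tenant map, tentMap(k) would throw a NoSuchElementException. A possible variant, shown here only as a sketch, looks each key up with get and silently skips entries without a tenant. The names GroupByTenantSketch and groupByTenant are made up for illustration.

```scala
object GroupByTenantSketch {
  // Hypothetical helper: group app -> key entries by tenant,
  // dropping any app that has no tenant assigned.
  def groupByTenant(
      keys: Map[String, String],
      tenants: Map[String, String]
  ): Map[String, Map[String, String]] =
    keys.toSeq
      // keep only apps with a known tenant, tagging each entry with it
      .flatMap { case (app, key) => tenants.get(app).map(tenant => (tenant, app -> key)) }
      .groupBy(_._1)
      // strip the tenant tag again and rebuild the inner maps
      .map { case (tenant, entries) => tenant -> entries.map(_._2).toMap }
}
```

Whether skipping unknown keys is the right behavior depends on the data; failing loudly may be preferable if a missing tenant indicates a configuration error.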

How to get the globally declared MapState value in RichCoMapFunction [ Apache Flink ]?

I'm implementing a Flink DataStream job for some real-time data calculation. I receive data stream values from two types of source, and I need to do some transformation based on a key. When I use RichCoMapFunction, the MapState is not globally visible. My program is as follows:
class Transformer extends RichCoMapFunction[(String, Map[String, String]), (String, Map[String, String]), Map[String, String]] {
private var sourceMap1: MapState[String, Map[String, String]] = _
private var sourceMap2: MapState[String, Map[String, String]] = _
override def map1(in1: (String, Map[String, String])): Map[String, String] = {
sourceMap1.put(in1._2("key"), in1._2)
println(sourceMap1.keys()) // Working with updated values
println(sourceMap2.keys()) // Return empty value always
return in1._2
}
override def map2(in2: (String, Map[String, String])): Map[String, String] = {
sourceMap2.put(in2._2("key"), in2._2)
println(sourceMap1.keys()) // Return empty value always
println(sourceMap2.keys()) // Working with updated values
return in2._2
}
override def open(parameters: Configuration): Unit = {
val desc1: MapStateDescriptor[String, Map[String, String]] = new MapStateDescriptor[String, Map[String, String]]("sourceMap1", classOf[String], classOf[Map[String, String]])
sourceMap1 = getRuntimeContext.getMapState(desc1)
val desc2: MapStateDescriptor[String, Map[String, String]] = new MapStateDescriptor[String, Map[String, String]]("sourceMap2", classOf[String], classOf[Map[String, String]])
sourceMap2 = getRuntimeContext.getMapState(desc2)
}
}
I need to access sourceMap2 in the map1 function, since it is declared globally. But when I print the keys of sourceMap2 in map1, it always returns an empty value, whereas printing sourceMap1 in map1 shows all the added keys.
When using keyed state, Flink will store a separate state value for each key value. This means that if you have a stateful mapper m with state s and you process records (x1, y1) and (x2, y2) where x is the key, Flink will store s(x1) = (x1, v1) and s(x2) = (x2, v2) in its state backend.
When processing (x2, y2), then you only have access to s(x2) and it is not possible to access s(x1).
I assume this is why the MapState appears to be empty. The incoming records for map1 and map2 have different keys, so in map1 you access sourceMap2 for a key (not the map key but the keyBy key) for which no key-value pairs have been stored. The same applies to map2, where you access sourceMap1 under a key for which no key-value pairs have been stored yet.
Your Transformer class is being applied to two connected, keyed streams. sourceMap1 and sourceMap2 are keyed state, meaning that you have a separate, nested hash map for every key of the two connected streams. One pair of these maps is in scope each time map1 or map2 is called, i.e., the pair corresponding to the key of the item being mapped.
If instead you want to have global state, shared across all the keys, have a look at the broadcast state pattern.
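The per-key scoping described above can be pictured with a toy model in plain Scala. This is not the Flink API: KeyedStateSketch, scoped, and the explicit currentKey parameter are invented purely to illustrate how each keyBy key sees its own pair of maps.

```scala
import scala.collection.mutable

// Toy model only; none of these names come from Flink.
object KeyedStateSketch {
  private type State = mutable.Map[String, mutable.Map[String, String]]

  // One independent inner map per keyBy key, standing in for keyed MapState
  private val sourceMap1: State = mutable.Map.empty
  private val sourceMap2: State = mutable.Map.empty

  // Select the inner map belonging to the key of the record being processed
  private def scoped(state: State, currentKey: String): mutable.Map[String, String] =
    state.getOrElseUpdate(currentKey, mutable.Map.empty[String, String])

  // Each call returns the keys visible in (sourceMap1, sourceMap2) for currentKey
  def map1(currentKey: String, k: String, v: String): (Set[String], Set[String]) = {
    scoped(sourceMap1, currentKey).put(k, v)
    (scoped(sourceMap1, currentKey).keySet.toSet, scoped(sourceMap2, currentKey).keySet.toSet)
  }

  def map2(currentKey: String, k: String, v: String): (Set[String], Set[String]) = {
    scoped(sourceMap2, currentKey).put(k, v)
    (scoped(sourceMap1, currentKey).keySet.toSet, scoped(sourceMap2, currentKey).keySet.toSet)
  }
}
```

In this model, entries written by map1 become visible in map2 only when both calls run under the same currentKey, which mirrors why the other map looks empty when the two streams are keyed by different values.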

Scala Nested HashMaps, how to access Case Class value properties?

I'm new to Scala and continue to struggle with Option-related code. I have a HashMap of case class instances that themselves contain hash maps with case class instance values. It is not clear to me how to access properties of the retrieved class instances:
import collection.mutable.HashMap
case class InnerClass(name: String, age: Int)
case class OuterClass(name: String, nestedMap: HashMap[String, InnerClass])
// Load some data...hash maps are mutable
val innerMap = new HashMap[String, InnerClass]()
innerMap += ("aaa" -> InnerClass("xyz", 0))
val outerMap = new HashMap[String, OuterClass]()
outerMap += ("AAA" -> OuterClass("XYZ", innerMap))
// Try to retrieve data
val outerMapTest = outerMap.get("AAA")
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Option[ScalaFiddle.OuterClass]
// Try to retrieve data a different way
val outerMapTest = outerMap.getOrElse("AAA", None)
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Product with Serializable
Please advise on how I would go about getting access to outerMapTest.nestedMap. I'll eventually need to get values and properties out of the nestedMap HashMap as well.
You are using .getOrElse("someKey", None), which gives you a value of type Product with Serializable (not OuterClass, as you expected):
scala> val outerMapTest = outerMap.getOrElse("AAA", None)
outerMapTest: Product with Serializable = OuterClass(XYZ,Map(aaa -> InnerClass(xyz,0)))
So the Product either needs to be pattern matched or cast to OuterClass.
Pattern match example:
scala> outerMapTest match { case x : OuterClass => println(x.nestedMap); case _ => println("is not outerclass") }
Map(aaa -> InnerClass(xyz,0))
Casting example, which is a terrible idea when outerMapTest is None (pattern matching is preferred over casting):
scala> outerMapTest.asInstanceOf[OuterClass].nestedMap
res30: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
But a better way of solving it is simply to use .get, which gives you an Option[OuterClass]:
scala> outerMap.get("AAA").map(outerClass => outerClass.nestedMap)
res27: Option[scala.collection.mutable.HashMap[String,InnerClass]] = Some(Map(aaa -> InnerClass(xyz,0)))
For a key that does not exist, it gives you None:
scala> outerMap.get("I dont exist").map(outerClass => outerClass.nestedMap)
res28: Option[scala.collection.mutable.HashMap[String,InnerClass]] = None
Here are some steps you can take to get deep inside a nested structure like this.
outerMap.lift("AAA") // Option[OuterClass]
.map(_.nestedMap) // Option[HashMap[String,InnerClass]]
.flatMap(_.lift("aaa")) // Option[InnerClass]
.map(_.name) // Option[String]
.getOrElse("no name") // String
Notice that if either of the inner or outer maps doesn't have the specified key ("aaa" or "AAA" respectively) then the whole thing will safely result in the default string ("no name").
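The same chain of lookups can also be expressed as a for-comprehension over Option, which some readers find easier to follow. This is a minimal sketch; the case classes are repeated from the question, while NestedLookupSketch and innerName are hypothetical names.

```scala
import scala.collection.mutable.HashMap

object NestedLookupSketch {
  case class InnerClass(name: String, age: Int)
  case class OuterClass(name: String, nestedMap: HashMap[String, InnerClass])

  // Hypothetical helper: returns the inner name, or "no name" if either key is missing
  def innerName(
      outerMap: HashMap[String, OuterClass],
      outerKey: String,
      innerKey: String
  ): String = {
    val maybeName = for {
      outer <- outerMap.get(outerKey)        // Option[OuterClass]
      inner <- outer.nestedMap.get(innerKey) // Option[InnerClass]
    } yield inner.name
    maybeName.getOrElse("no name")
  }
}
```

The for-comprehension desugars to the same flatMap and map calls, so the short-circuiting on a missing key is unchanged.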
HashMap's get already returns None if a key is not found, so there is no need to call getOrElse just to return None for a missing key.
A simple solution to your problem is to use get alone, as below.
Change your first lookup to
val outerMapTest = outerMap.get("AAA").get
you can check the output as
println(outerMapTest.name)
println(outerMapTest.nestedMap)
And change the second lookup to
val nestedMap = outerMapTest.nestedMap.get("aaa").get
You can test the outputs as
println(nestedMap.name)
println(nestedMap.age)
Hope this is helpful
You want
val maybeInner = outerMap.get("AAA").flatMap(_.nestedMap.get("aaa"))
val maybeName = maybeInner.map(_.name)
If you're feeling adventurous, you can then extract the value with
val name: String = maybeName.get
But that will throw an exception if the value is not there, i.e. if it is None.
You can access the nestedMap using the expression below.
scala> outerMap.get("AAA").map(_.nestedMap).getOrElse(HashMap())
res5: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
If "AAA" did not exist in outerMap, the expression above would return an empty HashMap, as given by the .getOrElse argument (HashMap()).

scala get first key from seq of map

In Scala, I know that mySeq is an array of Map objects and that the array has only one element. I want to get the first key of this element. Why doesn't it work? It gives me the error: value keySet is not a member of (Int, String)
code:
val mySeq: Seq[(Int, String)] = ...
val aMap = mySeq(0)
val firstKey = aMap.keySet.head
That's actually a Seq of tuples:
val aTuple = mySeq(0)
val firstKey = aTuple._1
To declare a Seq of maps, you'd use:
val mySeq: Seq[Map[Int, String]] = ...
But note that it doesn't make much sense to get the first key of a map, since maps are usually unordered by design.
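If the sequence really does hold maps, a lookup that tolerates an empty sequence or an empty map could be sketched as follows. FirstKeySketch and firstKey are names invented for this example, and which key counts as "first" still depends on the map implementation.

```scala
object FirstKeySketch {
  // Hypothetical helper: first key of the first map, if both exist
  def firstKey(maps: Seq[Map[Int, String]]): Option[Int] =
    maps.headOption.flatMap(_.keys.headOption)
}
```

Returning an Option avoids the exceptions that mySeq(0) and head would throw on empty input.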

Typesafe keys for a map

Given the following code:
val m: Map[String, Int] = .. // fetch from somewhere
val keys: List[String] = m.keys.toList
val keysSubset: List[String] = ... // choose random keys
We can define the following method:
def sumValues(m: Map[String, Int], ks: List[String]): Int =
ks.map(m).sum
And call this as:
sumValues(m, keysSubset)
However, the problem with sumValues is that if ks happens to contain a key not present in the map, the code will still compile but throw an exception at runtime. For example:
// assume m = Map("two" -> 2, "three" -> 3)
sumValues(m, "one" :: Nil)
What I want instead is a definition for sumValues such that the ks argument should, at compile time, be guaranteed to only contain keys that are present on the map. As such, my guess is that the existing sumValues type signature needs to accept some form of implicit evidence that the ks argument is somehow derived from the list of keys of the map.
I'm not limited to a scala Map however, as any record-like structure would do. The map structure however won't have a hardcoded value, but something derived/passed on as an argument.
Note: I'm not really after summing the values. I'm more interested in figuring out a type signature for sumValues such that calls to it only compile if the ks argument provably comes from the list of keys of the map (or record-like structure).
Another solution could be to map only the intersection (i.e. : between m keys and ks).
For example :
scala> def sumValues(m: Map[String, Int], ks: List[String]): Int = {
| m.keys.filter(ks.contains).map(m).sum
| }
sumValues: (m: Map[String,Int], ks: List[String])Int
scala> val map = Map("hello" -> 5)
map: scala.collection.immutable.Map[String,Int] = Map(hello -> 5)
scala> sumValues(map, List("hello", "world"))
res1: Int = 5
I think this solution is better than providing a default value because it is more generic (i.e. you can use it for more than sums). However, I guess it is less efficient because of the intersection.
EDIT: As @jwvh pointed out in a comment, ks.intersect(m.keys.toSeq).map(m).sum is, in my opinion, more readable than m.keys.filter(ks.contains).map(m).sum.
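A further runtime-only variant, sketched below, looks each requested key up with get and ignores the misses. It does not give the compile-time guarantee the question asks for. One caveat that I believe applies to the m.keys.filter(...).map(m) form: m.keys is backed by a Set, so mapping it to values may collapse equal values before they are summed. Starting from the list sidesteps that. SumValuesSketch is a made-up name.

```scala
object SumValuesSketch {
  // Sum the values of the requested keys that exist in the map.
  // `distinct` makes a key listed twice count only once, matching
  // the intersection-based versions above.
  def sumValues(m: Map[String, Int], ks: List[String]): Int =
    ks.distinct.flatMap(k => m.get(k)).sum
}
```

With m = Map("hello" -> 5), calling SumValuesSketch.sumValues(m, List("hello", "world")) skips the unknown key "world" instead of throwing.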