Getting the key of a key/value pair in Scala/Spark - scala

I created a key-value pair using the arrow operator in Scala/Spark:
val picardsShip = "Picard" -> "Enterprise-D"
I extracted the value by using println(picardsShip._2) and that returns the value Enterprise-D, as expected. However, I want to extract the key in this case to get Picard. When I type picardsShip. in the IntelliJ IDEA editor that I am using, I don't see anything suggesting an obvious attribute or method to get the key. Any ideas? Thank you.

Careful here, you haven't created a key/value pair. You've created a tuple. When you use the REPL, you can see what's happening:
scala> val picardsShip = "Picard" -> "Enterprise-D"
picardsShip: (String, String) = (Picard,Enterprise-D)
to access the first value, use what #Krzysztof Atłasik suggested and use ._1:
scala> picardsShip._1
res1: String = Picard
Yes, you can create a key/value pair using the -> operator, but if you want it to operate on it like you have a key/value pair - you have to be a bit more specific by wrapping it in a map. And then you no longer have 1 key/value pair, but theoretically none to many key value pairs, which means you have basically a kind of Seq which is what Scala is really built around. The Map gives you access to an API that basically says "okay, here is a series of items that can be treated as key-value pairs" and will allow you to operate on it as you would expect key-value pairs to look at:
scala> val shipMap: Map[String, String] = Map(("Picard" -> "Enterprise-D"))
shipMap: Map[String,String] = Map(Picard -> Enterprise-D)
scala> shipMap("Picard")
res2: String = Enterprise-D
scala> shipMap.keys.head
res3: String = Picard
scala> shipMap.foreach( kv => println(kv._1))
Picard

Related

scala add _2 s in a list of Tuple 2

I have the following mutable Hashmap in Scala:
HashMap((b,3), (c,4), (a,8), (a,2))
and need to be converted to the following:
HashMap((b,3), (c,4), (a,10))
I need something like reduceByKey function logic.
I added the code here
def main(args: Array[String]) = {
val m = new mutable.HashMap[String,Tuple2[String,Int]]()
println("Hello, world")
m.+=(("xx",("a",2)))
m.+=(("uu",("b",3)))
m.+=(("zz",("a",8)))
m.+=(("yy",("c",4)))
println(m.values)
}
For pre 2.13 Scala versions you can try using groupBy with map:
m.values
.groupBy(_._1)
.mapValues(_.map(_._2).sum)
It sounds like what you have is not a hashmap but m.values of type Iterable[Tuple2[String, Int]], which is more manageable. In that case, as hinted at in the comments, groupMapReduce does it all in one function. This function groups "matching" elements together, applies a transformation to each element, and then reduces the groups using a binary operation.
m.values.groupMapReduce(_._1)(_._2)(_ + _)
This says "Group the values by the first element of their tuple, then keep the second element (i.e. the number), and then add all of the numbers in each group". This produces a map from the first element of the tuple to the sum.
Map(a -> 10, b -> 3, c -> 4)
Note that this is a Map, not necessarily a HashMap. If you want a HashMap (i.e. for mutability), you'll need to convert it yourself.

Spark RDD - CountByValue - Map type - order by key

From spark RDD - countByValue is returning Map Datatype and want to sort by key ascending/ descending .
val s = flightsObjectRDD.map(_.dep_delay / 60 toInt).countByValue() // RDD type is action and returning Map datatype
s.toSeq.sortBy(_._1)
The above code is working as expected. But countByValue itself have implicit sorting . How can i implement that way?
You exit the Big Data realm and get into Scala itself. And then into all those structures that are immutable, sorted, hashed and mutable, or a combination of these. I think that is the reason for the -1 initially. Nice folks out there, anyway.
Take this example, the countByValue returns a Map to the Driver, so only of interest for small amounts of data. Map is also (key, value) pair but with hashing and immutable. So we need to manipulate it. This is what you can do. First up you can sort the Map on the key in ascending order.
val rdd1 = sc.parallelize(Seq(("HR",5),("RD",4),("ADMIN",5),("SALES",4),("SER",6),("MAN",8),("MAN",8),("HR",5),("HR",6),("HR",5)))
val map = rdd1.countByValue
val res1 = ListMap(map.toSeq.sortBy(_._1):_*) // ascending sort on key part of Map
res1: scala.collection.immutable.ListMap[(String, Int),Long] = Map((ADMIN,5) -> 1, (HR,5) -> 3, (HR,6) -> 1, (MAN,8) -> 2, (RD,4) -> 1, (SALES,4) -> 1, (SER,6) -> 1)
However, you cannot apply reverse or descending logic on the key as it is hashing. Next best thing is as follows:
val res2 = map.toList.sortBy(_._1).reverse
val res22 = map.toSeq.sortBy(_._1).reverse
res2: List[((String, Int), Long)] = List(((SER,6),1), ((SALES,4),1), ((RD,4),1), ((MAN,8),2), ((HR,6),1), ((HR,5),3), ((ADMIN,5),1))
res22: Seq[((String, Int), Long)] = ArrayBuffer(((SER,6),1), ((SALES,4),1), ((RD,4),1), ((MAN,8),2), ((HR,6),1), ((HR,5),3), ((ADMIN,5),1))
But you cannot apply the .toMap against the .reverse here, as it will hash and lose the sort. So, you must make a compromise.

what is equivalent function of Map.compute in scala.collection.mutable.Map

Java has method in java.util.Map called compute which provides a way to update map when the key is present or absent in the map.
Does scala.collection.mutable.Map provides any similar function?
I've checked the documentation Map and HashMap but couldn't find equivalent ones.
you can use update and getOrElse as in
val x= scala.collection.mutable.Map("a"->1,"b"->2)
x.update("c",x.getOrElse("c",1)+41)
x.update("a",x.getOrElse("a",1)+41)
There is getOrElseUpdate defined in mutable.MapLike trait which does exactly what you want:
def getOrElseUpdate(key: K, op: ⇒ V): V
If given key is already in this map, returns associated value.
Otherwise, computes value from given expression op, stores with key in map and returns that value.
The correct answer above can be simplified by configuring default value for the case when the key is absent. Also, read then update value can be done by one operator map("key") += value
val map = collection.mutable.Map("a" -> 1, "b" -> 2).withDefaultValue(1)
map("c") += 41
map("a") += 41
println(map)
returns Map(b -> 2, a -> 42, c -> 42)

Creating a Map from a Set of keys

I have a set of keys, say Set[MyKey] and for each of the keys I want to compute the value through some value function, lets say computeValueOf(key: MyKey). In the end I want to have a Map which maps key -> value
What is the most efficient way to do this without iterating too much?
A collection of Tuple2s can be converted to a Map, where the tuple's first element will be the key and the second element will be the value.
val setOfKeys = Set[MyKey]()
setOfKeys.map(key => (key, computeValueOf(key)).toMap
This is actually a pretty neat application for collection.breakOut, one of my favorite pieces of bizarre Scala voodoo:
type MyKey = Int
def computeValueOf(key: MyKey) = "value" * key
val mySet: Set[MyKey] = Set(1, 2, 3)
val myMap: Map[MyKey, String] =
mySet.map(k => k -> computeValueOf(k))(collection.breakOut)
See this answer for some discussion of what's going on here. Unlike the version with toMap, this won't construct an intermediate Set, saving you some allocations and a traversal. It's also much less readable, though—I only offer it because you mentioned that you wanted to avoid "iterating too much".

Nested Default Maps in Scala

I'm trying to construct nested maps in Scala, where both the outer and inner map use the "withDefaultValue" method. For example, the following :
val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m(1)(2)
res: Int = 3
m(1)(2) = 5
m(1)(2)
res: Int = 5
m(2)(3) = 6
m
res : scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
So the map, when addressed by the appropriate keys, gives me back what I put in. However, the map itself appears empty! Even m.size returns 0 in this example. Can anyone explain what's going on here?
Short answer
It's definitely not a bug.
Long answer
The behavior of withDefaultValue is to store a default value (in your case, a mutable map) inside the Map to be returned in the case that they key does not exist. This is not the same as a value that is inserted into the Map when they key is not found.
Let's look closely at what's happening. It will be easier to understand if we pull the default map out as a separate variable so we can inspect is at will; let's call it default
import collection.mutable.HashMap
val default = HashMap.empty[Int,Int].withDefaultValue(3)
So default is a mutable map (that has its own default value). Now we can create m and give default as the default value.
import collection.mutable.{Map => MMap}
val m = HashMap.empty[Int, MMap[Int,Int]].withDefaultValue(default)
Now whenever m is accessed with a missing key, it will return default. Notice that this is the exact same behavior as you have because withDefaultValue is defined as:
def withDefaultValue (d: B): Map[A, B]
Notice that it's d: B and not d: => B, so it will not create a new map each time the default is accessed; it will return the same exact object, what we've called default.
So let's see what happens:
m(1) // Map()
Since key 1 is not in m, the default, default is returned. default at this time is an empty Map.
m(1)(2) = 5
Since m(1) returns default, this operation stores 5 as the value for key 2 in default. Nothing is written to the Map m because m(1) resolves to default which is a separate Map entirely. We can check this by viewing default:
default // Map(2 -> 5)
But as we said, m is left unchanged
m // Map()
Now, how to achieve what you really wanted? Instead of using withDefaultValue, you want to make use of getOrElseUpdate:
def getOrElseUpdate (key: A, op: ⇒ B): B
Notice how we see op: => B? This means that the argument op will be re-evaluated each time it is needed. This allows us to put a new Map in there and have it be a separate new Map for each invalid key. Let's take a look:
val m2 = HashMap.empty[Int, MMap[Int,Int]]
No default values needed here.
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(3)) // Map()
Key 1 doesn't exist, so we insert a new HashMap, and return that new value. We can check that it was inserted as we expected. Notice that 1 maps to the newly added empty map and that they 3 was not added anywhere because of the behavior explained above.
m2 // Map(1 -> Map())
Likewise, we can update the Map as expected:
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(1))(2) = 6
and check that it was added:
m2 // Map(1 -> Map(2 -> 6))
withDefaultValue is used to return a value when the key was not found. It does not populate the map. So you map stays empty. Somewhat like using getOrElse(a, b) where b is provided by withDefaultValue.
I just had the exact same problem, and was happy to find dhg's answer. Since typing getOrElseUpdate all the time is not very concise, I came up with this little extension of the idea that I want to share:
You can declare a class that uses getOrElseUpdate as default behavior for the () operator:
class DefaultDict[K, V](defaultFunction: (K) => V) extends HashMap[K, V] {
override def default(key: K): V = return defaultFunction(key)
override def apply(key: K): V =
getOrElseUpdate(key, default(key))
}
Now you can do what you want to do like this:
var map = new DefaultDict[Int, DefaultDict[Int, Int]](
key => new DefaultDict(key => 3))
map(1)(2) = 5
Which does now result in map containing 5 (or rather: containing a DefaultDict containing the value 5 for the key 2).
What you're seeing is the effect that you've created a single Map[Int, Int] this is the default value whenever the key isn't in the outer map.
scala> val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m: scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
scala> m(2)(2)
res1: Int = 3
scala> m(1)(2) = 5
scala> m(2)(2)
res2: Int = 5
To get the effect that you're looking for, you'll have to wrap the Map with an implementation that actually inserts the default value when a key isn't found in the Map.
Edit:
I'm not sure what your actual use case is, but you may have an easier time using a pair for the key to a single Map.
scala> val m = HashMap.empty[(Int, Int), Int].withDefaultValue(3)
m: scala.collection.mutable.Map[(Int, Int),Int] = Map()
scala> m((1, 2))
res0: Int = 3
scala> m((1, 2)) = 5
scala> m((1, 2))
res3: Int = 5
scala> m
res4: scala.collection.mutable.Map[(Int, Int),Int] = Map((1,2) -> 5)
I know it's a bit late but I've just seen the post while I was trying to solve the same problem.
Probably the API are different from the 2012 version but you may want to use withDefaultinstead that withDefaultValue.
The difference is that withDefault takes a function as parameter, that is executed every time a missed key is requested ;)