Scala: Map keys with wildcards?

Is it possible to use keys with wildcards for Scala Maps? For example, tuples of the form (x, _)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there was a way to retrieve all (composite_key, value) pairs where only some part of the composite key is defined. Are there other ways to get the same functionality in Scala?

How about using collect?
Map(1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k @ (1, _), v) => k -> v }
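Applied to the map from the question above, the same pattern might look like this (a small sketch reusing the asker's m):
val m = Map(("x","y") -> "foo", ("y","x") -> "foo")
m.collect { case (k @ ("x", _), v) => k -> v }
// Map((x,y) -> foo)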

Using a for comprehension, like this:
for (a @ ((k1, k2), v) <- m if k1 == "x") yield a

In general, you can do something like
m.filter(e => e._1 == "x")
but in your particular example it will still return only one result, because a Map has only one entry per key. If your key itself is composite then it will indeed make more sense:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)

Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in any way? No, they don't, so there is no efficient way to implement wildcards with this map. The other answers provide solutions, but they will iterate over every key/value pair in the Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, but instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is less than all other strings
// "x\u0000" is the next string greater than "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2

Related

Scala - conditional product/join of two arrays with default values using for comprehensions

I have two Sequences, say:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
How do I get a product such that each element of the first array is joined with every element of the second array that startsWith it, and that also yields a default result for elements where nothing in the second array meets the condition?
Effectively to get an Output:
expectedProductArray = Array("B-B25", "B-B80", "B-B50", "L-Default", "T-T70")
I tried doing,
val myProductArray: Array[String] = for {
  f <- first
  s <- second if s.startsWith(f)
} yield s"""$f-$s"""
and I get:
myProductArray = Array("B-B25", "B-B80", "B-B50", "T-T70")
Is there an idiomatic way of adding a default value for elements of the first sequence that have no corresponding value in the second sequence under the given criterion? Appreciate your thoughts.
Here's one approach: turn array second into a Map, then look up elements of array first in that Map with getOrElse:
val first = Array("B", "L", "T")
val second = Array("T70", "B25", "B80", "A50", "M100", "B50")
val m = second.groupBy(_(0).toString)
// m: scala.collection.immutable.Map[String,Array[String]] =
// Map(M -> Array(M100), A -> Array(A50), B -> Array(B25, B80, B50), T -> Array(T70))
first.flatMap(x => m.getOrElse(x, Array("Default")).map(x + "-" + _))
// res1: Array[String] = Array(B-B25, B-B80, B-B50, L-Default, T-T70)
In case you prefer using a for comprehension:
for {
  x <- first
  y <- m.getOrElse(x, Array("Default"))
} yield s"$x-$y"

Finding a map in list/vector of maps in Scala

I have a vector/list of maps (Map[String,Int]). How can I find whether a key-value pair exists in one of these maps using .find?
val res = List(Map("1" -> 1), Map("2" -> 2)).find(t => t.exists(j => j == ("2", 2)))
println(res)
Use find together with exists to check whether the pair exists in one of the maps.
chengpohi's solution is pretty inefficient, and also different from how I understand the question.
Let m: Map[String,Int].
Why chengpohi's solution is inefficient
First, m.exists(j => j == ("2", 2)) looks at every entry of m, while m.get("2").contains(2) performs only a single map lookup.
Note that m.contains("2" -> 2) will not work, as contains is overridden for Map to check for a key, i.e., m.contains("2") works—and is also fast.
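A quick sketch of that difference (my own illustration, not from the original answer):
val m = Map("2" -> 2)
m.contains("2")           // true: contains checks keys only
// m.contains("2" -> 2)   // does not compile: contains expects a key (a String here)
m.get("2").contains(2)    // true: one lookup, then compare the value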
To obtain the same result as chengpohi, but efficiently:
def mapExists[K,V](m: Map[K,V], k: K, v: V): Option[(K,V)] =
  m.get(k).filter(_ == v).map(_ => k -> v)
Note that this method returns its arguments, which is quite redundant.
How I understand the question
Second, I understood the question as checking whether the List contains a Map with a specific pair.
This would translate to
def mapExists[K,V](ms: List[Map[K,V]], k: K, v: V): Boolean =
  ms.exists(_.get(k).contains(v))
It can even be done like this, using just the key we are interested in:
scala> val res = List(Map("A" -> 10), Map("B" -> 20)).find(_.keySet.contains("B"))
res: Option[scala.collection.immutable.Map[String,Int]] = Some(Map(B -> 20))
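If you also need the value from the map that was found, a small follow-up sketch (my addition, not in the original answer):
res.flatMap(_.get("B"))
// Option[Int] = Some(20)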

Summing items within a Tuple

Below is a data structure of a List of tuples, of type List[(String, String, Int)]
val data3 = (List( ("id1" , "a", 1), ("id1" , "a", 1), ("id1" , "a", 1) , ("id2" , "a", 1)) )
//> data3 : List[(String, String, Int)] = List((id1,a,1), (id1,a,1), (id1,a,1),
//| (id2,a,1))
I'm attempting to count the occurrences of the Int value associated with each id, so the above data structure should be converted to List((id1,a,3), (id2,a,1)).
This is what I have come up with, but I'm unsure how to group similar items within a Tuple:
data3.map( { case (id,name,num) => (id , name , num + 1)})
//> res0: List[(String, String, Int)] = List((id1,a,2), (id1,a,2), (id1,a,2), (i
//| d2,a,2))
In practice data3 is a Spark RDD; I'm using a List in this example for local testing purposes, but the same solution should be compatible with an RDD.
Update: based on the following code provided by maasg:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => (id1,id2,values.sum)}
I needed to amend it slightly to get the format I expect, which is of type
RDD[(String, Seq[(String, Int)])]
corresponding to
RDD[(id, Seq[(name, count-of-names)])]:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => ((id1),(id2,values.sum))}
val counted = result.groupByKey
In Spark, you would do something like this: (using Spark Shell to illustrate)
val l = List( ("id1" , "a", 1), ("id1" , "a", 1), ("id1" , "a", 1) , ("id2" , "a", 1))
val rdd = sc.parallelize(l)
val grouped = rdd.groupBy{case (id1,id2,v) => (id1,id2)}
val result = grouped.map{case ((id1,id2),values) => (id1,id2,values.foldLeft(0){case (cumm, tuple) => cumm + tuple._3})}
Another option would be to map the rdd into a PairRDD and use groupByKey:
val byKey = rdd.map({case (id1,id2,v) => (id1,id2)->v})
val byKeyGrouped = byKey.groupByKey
val result = byKeyGrouped.map{case ((id1,id2),values) => (id1,id2,values.sum)}
Option 2 is a slightly better option when handling large sets, as it does not replicate the ids in the accumulated value.
This seems to work when I use scala-ide:
data3
  .groupBy(tupl => (tupl._1, tupl._2))
  .mapValues(v => (v.head._1, v.head._2, v.map(_._3).sum))
  .values.toList
And the result is the same as required by the question:
res0: List[(String, String, Int)] = List((id1,a,3), (id2,a,1))
You should look into List.groupBy.
You can use the id as the key, and then use the length of your values in the map (i.e. all the items sharing the same id) to know the count.
@vptheron has the right idea.
As can be seen in the docs:
def groupBy[K](f: (A) ⇒ K): Map[K, List[A]]
Partitions this list into a map of lists according to some discriminator function.
Note: this method is not re-implemented by views. This means when applied to a view it will always force the view and return a new list.
K the type of keys returned by the discriminator function.
f the discriminator function.
returns
A map from keys to lists such that the following invariant holds:
(xs groupBy f)(k) = xs filter (x => f(x) == k)
That is, every key k is bound to a list of those elements x for which f(x) equals k.
So something like the below function, when used with groupBy will give you a list with keys being the ids.
(Sorry, I don't have access to a Scala compiler, so I can't test.)
def f(tuple: (String, String, Int)): String = tuple._1
Then you will have to iterate through the List for each id in the Map and sum up the number of integer occurrences. That is straightforward, but if you still need help, ask in the comments.
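For reference, a minimal untested sketch of this approach on the question's data3 (grouping by both id and name, since both appear in the expected output):
data3
  .groupBy { case (id, name, _) => (id, name) }
  .map { case ((id, name), xs) => (id, name, xs.map(_._3).sum) }
  .toList
// List((id1,a,3), (id2,a,1)) -- element order may vary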
The following is the most readable, efficient, and scalable:
data.map {
  case (key1, key2, value) => ((key1, key2), value)
}
.reduceByKey(_ + _)
which will give an RDD[((String, String), Int)]. By using reduceByKey the summation will parallelize, i.e. for very large groups it will be distributed and summation will happen on the map side. Think about the case where there are only 10 groups but billions of records; using .sum won't scale, as it will only be able to distribute to 10 cores.
A few more notes about the other answers:
Using head here is unnecessary: .mapValues(v => (v.head._1, v.head._2, v.map(_._3).sum)) can take both parts directly from the key instead: .map { case ((k1, k2), vs) => (k1, k2, vs.map(_._3).sum) }
Using a foldLeft here is really horrible when the above shows .map(_._3).sum will do: val result = grouped.map{case ((id1,id2),values) => (id1,id2,values.foldLeft(0){case (cumm, tuple) => cumm + tuple._3})}

Nested Default Maps in Scala

I'm trying to construct nested maps in Scala, where both the outer and inner map use the "withDefaultValue" method. For example, the following :
import collection.mutable.HashMap
val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue(HashMap.empty[Int,Int].withDefaultValue(3))
m(1)(2)
res: Int = 3
m(1)(2) = 5
m(1)(2)
res: Int = 5
m(2)(3) = 6
m
res : scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
So the map, when addressed by the appropriate keys, gives me back what I put in. However, the map itself appears empty! Even m.size returns 0 in this example. Can anyone explain what's going on here?
Short answer
It's definitely not a bug.
Long answer
The behavior of withDefaultValue is to store a default value (in your case, a mutable map) inside the Map, to be returned in the case that the key does not exist. This is not the same as a value that is inserted into the Map when the key is not found.
Let's look closely at what's happening. It will be easier to understand if we pull the default map out as a separate variable so we can inspect it at will; let's call it default.
import collection.mutable.HashMap
val default = HashMap.empty[Int,Int].withDefaultValue(3)
So default is a mutable map (that has its own default value). Now we can create m and give default as the default value.
import collection.mutable.{Map => MMap}
val m = HashMap.empty[Int, MMap[Int,Int]].withDefaultValue(default)
Now whenever m is accessed with a missing key, it will return default. Notice that this is the exact same behavior as you have because withDefaultValue is defined as:
def withDefaultValue (d: B): Map[A, B]
Notice that it's d: B and not d: => B, so it will not create a new map each time the default is accessed; it will return the same exact object, what we've called default.
So let's see what happens:
m(1) // Map()
Since key 1 is not in m, the default, default, is returned. default at this time is an empty Map.
m(1)(2) = 5
Since m(1) returns default, this operation stores 5 as the value for key 2 in default. Nothing is written to the Map m because m(1) resolves to default which is a separate Map entirely. We can check this by viewing default:
default // Map(2 -> 5)
But as we said, m is left unchanged
m // Map()
Now, how to achieve what you really wanted? Instead of using withDefaultValue, you want to make use of getOrElseUpdate:
def getOrElseUpdate (key: A, op: ⇒ B): B
Notice how we see op: => B? This means that the argument op will be re-evaluated each time it is needed. This allows us to put a new Map in there and have it be a separate new Map for each invalid key. Let's take a look:
val m2 = HashMap.empty[Int, MMap[Int,Int]]
No default values needed here.
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(3)) // Map()
Key 1 doesn't exist, so we insert a new HashMap, and return that new value. We can check that it was inserted as we expected. Notice that 1 maps to the newly added empty map, and that the default value 3 was not added anywhere because of the behavior explained above.
m2 // Map(1 -> Map())
Likewise, we can update the Map as expected:
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(1))(2) = 6
and check that it was added:
m2 // Map(1 -> Map(2 -> 6))
withDefaultValue is used to return a value when the key was not found. It does not populate the map, so your map stays empty. It is somewhat like using getOrElse(a, b) where b is provided by withDefaultValue.
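As a tiny sketch of that analogy (reusing the m and default values defined in the answer above):
m.getOrElse(1, default)   // same result as m(1): default is returned, m stays empty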
I just had the exact same problem, and was happy to find dhg's answer. Since typing getOrElseUpdate all the time is not very concise, I came up with this little extension of the idea that I want to share:
You can declare a class that uses getOrElseUpdate as default behavior for the () operator:
import scala.collection.mutable.HashMap

class DefaultDict[K, V](defaultFunction: (K) => V) extends HashMap[K, V] {
  override def default(key: K): V = defaultFunction(key)
  override def apply(key: K): V =
    getOrElseUpdate(key, default(key))
}
Now you can do what you want to do like this:
var map = new DefaultDict[Int, DefaultDict[Int, Int]](
  key => new DefaultDict(key => 3))
map(1)(2) = 5
This now results in map containing 5 (or rather: containing a DefaultDict containing the value 5 for the key 2).
What you're seeing is the effect of having created a single Map[Int, Int] that serves as the default value whenever the key isn't in the outer map.
scala> val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m: scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
scala> m(2)(2)
res1: Int = 3
scala> m(1)(2) = 5
scala> m(2)(2)
res2: Int = 5
To get the effect that you're looking for, you'll have to wrap the Map with an implementation that actually inserts the default value when a key isn't found in the Map.
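One possible shape for such a wrapper, as a rough sketch (the helper name getInner is my own; getOrElseUpdate does the actual insertion):
import scala.collection.mutable

def getInner(outer: mutable.Map[Int, mutable.Map[Int, Int]], key: Int): mutable.Map[Int, Int] =
  outer.getOrElseUpdate(key, mutable.HashMap.empty[Int, Int].withDefaultValue(3))

getInner(m, 1)(2) = 5   // now really inserts an inner map into m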
Edit:
I'm not sure what your actual use case is, but you may have an easier time using a pair for the key to a single Map.
scala> val m = HashMap.empty[(Int, Int), Int].withDefaultValue(3)
m: scala.collection.mutable.Map[(Int, Int),Int] = Map()
scala> m((1, 2))
res0: Int = 3
scala> m((1, 2)) = 5
scala> m((1, 2))
res3: Int = 5
scala> m
res4: scala.collection.mutable.Map[(Int, Int),Int] = Map((1,2) -> 5)
I know it's a bit late but I've just seen the post while I was trying to solve the same problem.
Probably the API is different from the 2012 version, but you may want to use withDefault instead of withDefaultValue.
The difference is that withDefault takes a function as a parameter, which is executed every time a missing key is requested ;)
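A small sketch of that difference (my own illustration; note that the computed default is still not stored in the map):
import scala.collection.mutable

val m = mutable.HashMap.empty[Int, Int].withDefault(k => k * 2)
m(10)    // 20: the function runs on every miss
m.size   // 0: nothing was inserted, so a write through a nested default would still be lost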

Scala: How do I use fold* with Map?

I have a Map[String, String] and want to concatenate the values to a single string.
I can see how to do this using a List...
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.reduceLeft[String](_+_)
res8: String = testing123
fold* or reduce* seem to be the right approach; I just can't get the syntax right for a Map.
Folds on a map work the same way they would on a list of pairs. You can't use reduce because then the result type would have to be the same as the element type (i.e. a pair), but you want a string. So you use foldLeft with the empty string as the neutral element. You also can't just use _+_ because then you'd try to add a pair to a string. You have to instead use a function that adds the accumulated string, the first value of the pair and the second value of the pair. So you get this:
scala> val m = Map("la" -> "la", "foo" -> "bar")
m: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(la -> la, foo -> bar)
scala> m.foldLeft("")( (acc, kv) => acc + kv._1 + kv._2)
res14: java.lang.String = lalafoobar
Explanation of the first argument to fold:
As you know the function (acc, kv) => acc + kv._1 + kv._2 gets two arguments: the second is the key-value pair currently being processed. The first is the result accumulated so far. However what is the value of acc when the first pair is processed (and no result has been accumulated yet)? When you use reduce the first value of acc will be the first pair in the list (and the first value of kv will be the second pair in the list). However this does not work if you want the type of the result to be different than the element types. So instead of reduce we use fold where we pass the first value of acc as the first argument to foldLeft.
In short: the first argument to foldLeft says what the starting value of acc should be.
As Tom pointed out, you should keep in mind that maps don't necessarily maintain insertion order (Map2 and co. do, but hashmaps do not), so the string may list the elements in a different order than the one in which you inserted them.
The question has been answered already, but I'd like to point out that there are easier ways to produce those strings, if that's all you want. Like this:
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.mkString
res0: String = testing123
scala> val m = Map(1 -> "abc", 2 -> "def", 3 -> "ghi")
m: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,abc), (2,def), (3,ghi))
scala> m.values.mkString
res1: String = abcdefghi