Filter Map by key set - scala

Is there a shortcut to filter a Map keeping only the entries where the key is contained in a given Set?
Here is some example code
scala> val map = Map("1"->1, "2"->2, "3"->3)
map: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> map.filterKeys(Set("1","2").contains)
res0: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2)
I am searching for something shorter than this.

Answering the Question
You can take advantage of the fact that a Set[A] is a predicate; i.e. A => Boolean
map filterKeys set
Here it is at work:
scala> val map = Map("1" -> 1, "2" -> 2, "3" -> 3)
map: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> val set = Set("1", "2")
set: scala.collection.immutable.Set[java.lang.String] = Set(1, 2)
scala> map filterKeys set
res0: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2)
Or if you prefer:
scala> map filterKeys Set("1", "2")
res1: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2)
Predicates
It's actually really useful to have some wrapper around a predicate. Like so:
scala> class PredicateW[A](self: A => Boolean) {
| def and(other: A => Boolean): A => Boolean = a => self(a) && other(a)
| def or(other: A => Boolean): A => Boolean = a => self(a) || other(a)
| def unary_! : A => Boolean = a => !self(a)
| }
defined class PredicateW
And an implicit conversion:
scala> implicit def Predicate_Is_PredicateW[A](p: A => Boolean) = new PredicateW(p)
Predicate_Is_PredicateW: [A](p: A => Boolean)PredicateW[A]
And then you can use it:
scala> map filterKeys (Set("1", "2") and Set("2", "3"))
res2: scala.collection.immutable.Map[java.lang.String,Int] = Map(2 -> 2)
scala> map filterKeys (Set("1", "2") or Set("2", "3"))
res3: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> map filterKeys !Set("2", "3")
res4: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1)
This can be extended to xor, nand etc etc and if you include symbolic unicode can make for amazingly readable code:
val mustReport = trades filter (uncoveredShort ∨ exceedsDollarMax)
val european = {
val Europe = (_ : Market).exchange.country.region == Region.EU
trades filter (_.market ∈: Europe)
}

Sorry, not a direct answer to your question, but if you know which keys you want to remove (instead of which ones you want to keep), you could do this:
map -- Set("3")

A tangential tip, in case you are going to follow the PredicateW idea in #oxbow_lakes' answer:
In functional programming, instead of defining ad hoc functions, we aim for more generalized and composable abstractions. For this particular case, Applicative fits the bill.
Set themselves are functions, and the Applicative instance for [B]Function1[A, B] lets us lift functions to context. In other words, you can lift functions of type (Boolean, Boolean) => Boolean (such as ||, && etc.) to (A => Boolean, A => Boolean) => (A => Boolean). (Here you can find a great explanation on this concept of lifting.)
However the data structure Set itself has an Applicative instance available, which will be favored over [B]Applicative[A => B] instance. To prevent that, we will have to explicitly tell the compiler to treat the given set as a function. We define a following enrichment for that:
scala> implicit def setAsFunction[A](set: Set[A]) = new {
| def f: A => Boolean = set
| }
setAsFunction: [A](set: Set[A])java.lang.Object{def f: A => Boolean}
scala> Set(3, 4, 2).f
res144: Int => Boolean = Set(3, 4, 2)
And now put this Applicative goodness into use.
scala> val map = Map("1" -> 1, "2" -> 2, "3" -> 3)
map: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> map filterKeys ((Set("1", "2").f |#| Set("2", "3").f)(_ && _))
res150: scala.collection.immutable.Map[java.lang.String,Int] = Map(2 -> 2)
scala> map filterKeys ((Set("1", "2").f |#| Set("2", "3").f)(_ || _))
res151: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
scala> map filterKeys (Set("2", "3").f map (!_))
res152: scala.collection.immutable.Map[java.lang.String,Int] = Map(1 -> 1)
Note: All of the above requires Scalaz.

Related

Scala groupBy for a list

I'd like to create a map on which the key is the string and the value is the number of how many times the string appears on the list. I tried the groupBy method, but have been unsuccessful with that.
Required Answer
scala> val l = List("abc","abc","cbe","cab")
l: List[String] = List(abc, abc, cbe, cab)
scala> l.groupBy(identity).mapValues(_.size)
res91: scala.collection.immutable.Map[String,Int] = Map(cab -> 1, abc -> 2, cbe -> 1)
Suppose you have a list as
scala> val list = List("abc", "abc", "bc", "b", "abc")
list: List[String] = List(abc, abc, bc, b, abc)
You can write a function
scala> def generateMap(list: List[String], map:Map[String, Int]) : Map[String, Int] = list match {
| case x :: y => if(map.keySet.contains(x)) generateMap(y, map ++ Map(x -> (map(x)+1))) else generateMap(y, map ++ Map(x -> 1))
| case Nil => map
| }
generateMap: (list: List[String], map: Map[String,Int])Map[String,Int]
Then call the function as
scala> generateMap(list, Map.empty)
res1: Map[String,Int] = Map(abc -> 3, bc -> 1, b -> 1)
This also works:
scala> val l = List("abc","abc","cbe","cab")
val l: List[String] = List(abc, abc, cbe, cab)
scala> l.groupBy(identity).map(x => (x._1, x._2.length))
val res1: Map[String, Int] = HashMap(cbe -> 1, abc -> 2, cab -> 1)

Difference between mapValues and transform in Map

In Scala Map (see API) what is the difference in semantics and performance between mapValues and transform ?
For any given map, for instance
val m = Map( "a" -> 2, "b" -> 3 )
both
m.mapValues(_ * 5)
m.transform( (k,v) => v * 5 )
deliver the same result.
Let's say we have a Map[A,B]. For clarification: I'm always referring to an immutable Map.
mapValues takes a function B => C, where C is the new type for the values.
transform takes a function (A, B) => C, where this C is also the type for the values.
So both will result in a Map[A,C].
However with the transform function you can influence the result of the new values by the value of their keys.
For example:
val m = Map( "a" -> 2, "b" -> 3 )
m.transform((key, value) => key + value) //Map[String, String](a -> a2, b -> b3)
Doing this with mapValues will be quite hard.
The next difference is that transform is strict, whereas mapValues will give you only a view, which will not store the updated elements. It looks like this:
protected class MappedValues[C](f: B => C) extends AbstractMap[A, C] with DefaultMap[A, C] {
override def foreach[D](g: ((A, C)) => D): Unit = for ((k, v) <- self) g((k, f(v)))
def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
override def size = self.size
override def contains(key: A) = self.contains(key)
def get(key: A) = self.get(key).map(f)
}
(taken from https://github.com/scala/scala/blob/v2.11.2/src/library/scala/collection/MapLike.scala#L244)
So performance-wise it depends what is more effective. If f is expensive and you only access a few elements of the resulting map, mapValues might be better, since f is only applied on demand. Otherwise I would stick to map or transform.
transform can also be expressed with map. Assume m: Map[A,B] and f: (A,B) => C, then
m.transform(f) is equivalent to m.map{case (a, b) => (a, f(a, b))}
collection.Map doesn't provide transform: it has a different signature for mutable and immutable Maps.
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val im = Map('a -> 1, 'b -> 2, 'c -> 3)
im: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 1, 'b -> 2, 'c -> 3)
scala> im.mapValues(_ * 7) eq im
res0: Boolean = false
scala> im.transform { case (k,v) => v*7 } eq im
res2: Boolean = false
scala> val mm = collection.mutable.Map('a -> 1, 'b -> 2, 'c -> 3)
mm: scala.collection.mutable.Map[Symbol,Int] = Map('b -> 2, 'a -> 1, 'c -> 3)
scala> mm.mapValues(_ * 7) eq mm
res3: Boolean = false
scala> mm.transform { case (k,v) => v*7 } eq mm
res5: Boolean = true
Mutable transform mutates in place:
scala> mm.transform { case (k,v) => v*7 }
res6: mm.type = Map('b -> 98, 'a -> 49, 'c -> 147)
scala> mm.transform { case (k,v) => v*7 }
res7: mm.type = Map('b -> 686, 'a -> 343, 'c -> 1029)
So mutable transform doesn't change the type of the map:
scala> im mapValues (_ => "hi")
res12: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
scala> mm mapValues (_ => "hi")
res13: scala.collection.Map[Symbol,String] = Map('b -> hi, 'a -> hi, 'c -> hi)
scala> mm.transform { case (k,v) => "hi" }
<console>:9: error: type mismatch;
found : String("hi")
required: Int
mm.transform { case (k,v) => "hi" }
^
scala> im.transform { case (k,v) => "hi" }
res15: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
...as can happen when constructing a new map.
Here's a couple of unmentioned differences:
mapValues creates a Map that is NOT serializable, without any indication that it's just a view (the type is Map[_, _], but just try to send one across the wire).
Since mapValues is just a view, every instance contains the real Map - which could be another result of mapValues. Imagine you have an actor with some state, and every mutation of the state sets the new state to be a mapValues on the previous state...in the end you have deeply nested maps with a copy of each previous state of the actor (and, yes, both of these are from experience).

How to tell if a Map has a default value?

Is there a way to check if a Map has a defined default value? What I would like is some equivalent of myMap.getOrElse(x, y) where if the key x is not in the map,
if myMap has a default value, return that value
else return y
A contrived example of the issue:
scala> def f(m: Map[String, String]) = m.getOrElse("hello", "world")
f: (m: Map[String,String])String
scala> val myMap = Map("a" -> "A").withDefaultValue("Z")
myMap: scala.collection.immutable.Map[String,String] = Map(a -> A)
scala> f(myMap)
res0: String = world
In this case, I want res0 to be "Z" instead of "world", because myMap was defined with that as a default value. But getOrElse doesn't work that way.
I could use m.apply instead of m.getOrElse, but the map is not guaranteed to have a default value, so it could throw an exception (I could catch the exception, but this is nonideal).
scala> def f(m: Map[String, String]) = try {
| m("hello")
| } catch {
| case e: java.util.NoSuchElementException => "world"
| }
f: (m: Map[String,String])String
scala> val myMap = Map("a" -> "A").withDefaultValue("Z")
myMap: scala.collection.immutable.Map[String,String] = Map(a -> A)
scala> f(myMap)
res0: String = Z
scala> val mapWithNoDefault = Map("a" -> "A")
mapWithNoDefault: scala.collection.immutable.Map[String,String] = Map(a -> A)
scala> f(mapWithNoDefault)
res1: String = world
The above yields the expected value but seems messy. I can't pattern match and call apply or getOrElse based on whether or not the map had a default value, because the type is the same (scala.collection.immutable.Map[String,String]) regardless of default-ness.
Is there a way to do this that doesn't involve catching exceptions?
You can check whether the map is an instance of Map.WithDefault:
implicit class EnrichedMap[K, V](m: Map[K, V]) {
def getOrDefaultOrElse(k: K, v: => V) =
if (m.isInstanceOf[Map.WithDefault[K, V]]) m(k) else m.getOrElse(k, v)
}
And then:
scala> val myMap = Map("a" -> "A").withDefaultValue("Z")
myMap: scala.collection.immutable.Map[String,String] = Map(a -> A)
scala> myMap.getOrDefaultOrElse("hello", "world")
res11: String = Z
scala> val myDefaultlessMap = Map("a" -> "A")
myDefaultlessMap: scala.collection.immutable.Map[String,String] = Map(a -> A)
scala> myDefaultlessMap.getOrDefaultOrElse("hello", "world")
res12: String = world
Whether this kind of reflection is any better than using exceptions for non-exceptional control flow is an open question.
You could use Try instead of try/catch, and it would look a little cleaner.
val m = Map(1 -> 2, 3 -> 4)
import scala.util.Try
Try(m(10)).getOrElse(0)
res0: Int = 0
val m = Map(1 -> 2, 3 -> 4).withDefaultValue(100)
Try(m(10)).getOrElse(0)
res1: Int = 100

Convert List of tuple to map (and deal with duplicate key ?)

I was thinking about a nice way to convert a List of tuple with duplicate key [("a","b"),("c","d"),("a","f")] into map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in python), I'd create an empty map and for-loop over the list and check for duplicate key. But I am looking for something more scala-ish and clever solution here.
btw, actual type of key-value I use here is (Int, Node) and I want to turn into a map of (Int -> NodeSeq)
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, c -> 2)
As of 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
More scalish way to use fold, in the way like there (skip map f step).
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Starting Scala 2.13, most collections are provided with the groupMap method which is (as its name suggests) an equivalent (more efficient) of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
groups elements based on the first part of tuples (group part of groupMap)
maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
Fold Left variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
}
})
Aggregate Variation - Similar to fold Left
list.aggregate[Map[String, List[String]]](Map())(
(acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 ->
List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
},
(l, r) => l ++ r
)
Spark Variation - For big data sets (Conversion to a RDD and to a Plain Map from RDD)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = new
SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext (conf)
// This gives you a rdd of the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
(value: String) => List(value),
(acc: List[String], value) => value :: acc,
(accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)
// To convert this RDD back to a Map[(String, List[String])] you can do the following
rdd.collect().toMap
Here is a more Scala idiomatic way to convert a list of tuples to a map handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
You can try this
scala> val b = new Array[Int](3)
// b: Array[Int] = Array(0, 0, 0)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)

What is the most succinct Scala way to reverse a Map?

What is the most succinct Scala way to reverse a Map? The Map may contain non-unique values.
EDIT:
The reversal of Map[A, B] should give Map[B, Set[A]] (or a MultiMap, that would be even better).
If you can lose duplicate keys:
scala> val map = Map(1->"one", 2->"two", -2->"two")
map: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,one), (2,two), (-2,two))
scala> map.map(_ swap)
res0: scala.collection.immutable.Map[java.lang.String,Int] = Map((one,1), (two,-2))
If you don't want access as a multimap, just a map to sets, then:
scala> map.groupBy(_._2).mapValues(_.keys.toSet)
res1: scala.collection.immutable.Map[
java.lang.String,scala.collection.immutable.Set[Int]
] = Map((one,Set(1)), (two,Set(2, -2)))
If you insist on getting a MultiMap, then:
scala> import scala.collection.mutable.{HashMap, Set, MultiMap}
scala> ( (new HashMap[String,Set[Int]] with MultiMap[String,Int]) ++=
| map.groupBy(_._2).mapValues(Set[Int]() ++= _.keys) )
res2: scala.collection.mutable.HashMap[String,scala.collection.mutable.Set[Int]]
with scala.collection.mutable.MultiMap[String,Int] = Map((one,Set(1)), (two,Set(-2, 2)))
scala> val m1 = Map(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four")
m1: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,one), (2,two), (3,three), (4,four))
scala> m1.map(pair => pair._2 -> pair._1)
res0: scala.collection.immutable.Map[java.lang.String,Int] = Map((one,1), (two,2), (three,3), (four,4))
Edit for clarified question:
object RevMap {
def
main(args: Array[String]): Unit = {
val m1 = Map("one" -> 3, "two" -> 3, "three" -> 5, "four" -> 4, "five" -> 5, "six" -> 3)
val rm1 = (Map[Int, Set[String]]() /: m1) { (map: Map[Int, Set[String]], pair: (String, Int)) =>
map + ((pair._2, map.getOrElse(pair._2, Set[String]()) + pair._1)) }
printf("m1=%s%nrm1=%s%n", m1, rm1)
}
}
% scala RevMap
m1=Map(four -> 4, three -> 5, two -> 3, six -> 3, five -> 4, one -> 3)
rm1=Map(4 -> Set(four, five), 5 -> Set(three), 3 -> Set(two, six, one))
I'm not sure this qualifies as succinct.
How about:
implicit class RichMap[A, B](map: Map[A, Seq[B]])
{
import scala.collection.mutable._
def reverse: MultiMap[B, A] =
{
val result = new HashMap[B, Set[A]] with MultiMap[B, A]
map.foreach(kv => kv._2.foreach(result.addBinding(_, kv._1)))
result
}
}
or
implicit class RichMap[A, B](map: Map[A, Seq[B]])
{
import scala.collection.mutable._
def reverse: MultiMap[B, A] =
{
val result = new HashMap[B, Set[A]] with MultiMap[B, A]
map.foreach{case(k,v) => v.foreach(result.addBinding(_, k))}
result
}
}