Suppose I have a list of int. I can invoke a lift function on it and get another function with type T => Option[T].
val f0: Int => Option[Int] = List(1, 2).lift
println(f0.apply(0)) // Some(1)
println(f0.apply(1)) // Some(2)
println(f0.apply(2)) // None
But how does it work? Why can I call lift (from the PartialFunction trait) on a List? Is there some implicit magic?
There is no "implicit magic". List[T] is simply a subtype of PartialFunction[Int, T].
As mentioned, List[T] is a subtype of PartialFunction[Int, T]. However, List is not a direct subclass of PartialFunction.
It is the Seq trait that extends PartialFunction, in the form trait Seq[+A] extends PartialFunction[Int, A].
Seq is the trait inherited by collections like List, which in turn gives them methods like lift.
I think it's a matter of perspective.
I would look at it this way: a Seq is a PartialFunction that goes from Int values to the element type of the sequence, and whose isDefinedAt method returns true for the interval from 0 until length.
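To make that concrete, here is a minimal sketch (the names xs, pf and lifted are only illustrative) showing that a List can be used wherever a PartialFunction[Int, T] is expected, without any conversion:

```scala
val xs = List("x", "y")

// Plain upcast: List[String] is already a PartialFunction[Int, String]
val pf: PartialFunction[Int, String] = xs

val definedAtZero = pf.isDefinedAt(0)   // index 0 exists
val definedAtTwo = pf.isDefinedAt(2)    // index 2 is out of range
val definedAtMinus = pf.isDefinedAt(-1) // negative indices are never defined

// lift comes from PartialFunction and turns "undefined" into None
val lifted: Int => Option[String] = xs.lift
val second = lifted(1)
val missing = lifted(2)
```

So lift is not something added to List; it is inherited from PartialFunction through Seq.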
Look here.
Likewise for maps: all maps extend the trait MapLike (MapOps in Scala 2.13), which extends PartialFunction, so a Map[A,B] is a PartialFunction[A,B].
So think of Maps as PartialFunctions whose isDefinedAt method returns true for all defined keys.
I am copying a sample from my worksheet. I assign a Map to a PartialFunction to illustrate the sameness.
val m = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)
//> m : scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3
//| , d -> 4)
val f2: PartialFunction[String, Int] = m //> f2 : PartialFunction[String,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)
m.isDefinedAt("d") //> res5: Boolean = true
f2.isDefinedAt("e") //> res6: Boolean = false
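One practical consequence, sketched below with illustrative names (hits, safe), is that a Map can be passed directly to methods that take a PartialFunction, such as collect:

```scala
val m = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4)

// collect keeps only the elements for which the partial function is defined,
// so keys missing from the map ("e" here) are silently dropped
val hits: List[Int] = List("a", "e", "c").collect(m)

// lift works on a Map just as it does on a List
val safe: String => Option[Int] = m.lift
val found = safe("d")
val notFound = safe("e")
```

For a Map, m.lift(k) gives the same answer as m.get(k).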
Look here.
In Scala Map (see API), what is the difference in semantics and performance between mapValues and transform?
For any given map, for instance
val m = Map( "a" -> 2, "b" -> 3 )
both
m.mapValues(_ * 5)
m.transform( (k,v) => v * 5 )
deliver the same result.
Let's say we have a Map[A,B]. For clarification: I'm always referring to an immutable Map.
mapValues takes a function B => C, where C is the new type for the values.
transform takes a function (A, B) => C, where this C is also the type for the values.
So both will result in a Map[A,C].
However, with transform the new values can also depend on their keys.
For example:
val m = Map( "a" -> 2, "b" -> 3 )
m.transform((key, value) => key + value) //Map[String, String](a -> a2, b -> b3)
This cannot be done with mapValues alone, because its function never sees the key.
The next difference is that transform is strict, whereas mapValues gives you only a view, which does not store the updated elements and re-applies f on every access. In Scala 2.11 the view is implemented like this:
protected class MappedValues[C](f: B => C) extends AbstractMap[A, C] with DefaultMap[A, C] {
override def foreach[D](g: ((A, C)) => D): Unit = for ((k, v) <- self) g((k, f(v)))
def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
override def size = self.size
override def contains(key: A) = self.contains(key)
def get(key: A) = self.get(key).map(f)
}
(taken from https://github.com/scala/scala/blob/v2.11.2/src/library/scala/collection/MapLike.scala#L244)
So performance-wise it depends on the access pattern. If f is expensive and you only access a few elements of the resulting map, mapValues might be better, since f is only applied on demand. Otherwise I would stick to map or transform.
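The difference is easy to observe by counting how often f runs. This is only a sketch: the counter and the value names are made up for illustration, and on Scala 2.13 the call to mapValues on a strict Map compiles with a deprecation warning (the suggested replacement there is .view.mapValues).

```scala
var calls = 0
val m = Map("a" -> 2, "b" -> 3)

// mapValues: nothing is computed when the view is created
val viewed = m.mapValues { v => calls += 1; v * 5 }
val callsAfterCreation = calls

// every lookup re-applies the function
val v1 = viewed("a")
val v2 = viewed("a")
val callsAfterTwoLookups = calls

// transform: the function runs once per entry, up front
calls = 0
val strict = m.transform { (_, v) => calls += 1; v * 5 }
val callsAfterTransform = calls
val s1 = strict("a")
val s2 = strict("a")
val callsAfterStrictLookups = calls
```

With two entries, transform pays for two calls immediately and none afterwards, while the view pays nothing up front and one call per lookup.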
transform can also be expressed with map. Assume m: Map[A,B] and f: (A,B) => C, then
m.transform(f) is equivalent to m.map{case (a, b) => (a, f(a, b))}
collection.Map doesn't provide transform: it has a different signature for mutable and immutable Maps.
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val im = Map('a -> 1, 'b -> 2, 'c -> 3)
im: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 1, 'b -> 2, 'c -> 3)
scala> im.mapValues(_ * 7) eq im
res0: Boolean = false
scala> im.transform { case (k,v) => v*7 } eq im
res2: Boolean = false
scala> val mm = collection.mutable.Map('a -> 1, 'b -> 2, 'c -> 3)
mm: scala.collection.mutable.Map[Symbol,Int] = Map('b -> 2, 'a -> 1, 'c -> 3)
scala> mm.mapValues(_ * 7) eq mm
res3: Boolean = false
scala> mm.transform { case (k,v) => v*7 } eq mm
res5: Boolean = true
Mutable transform mutates in place:
scala> mm.transform { case (k,v) => v*7 }
res6: mm.type = Map('b -> 98, 'a -> 49, 'c -> 147)
scala> mm.transform { case (k,v) => v*7 }
res7: mm.type = Map('b -> 686, 'a -> 343, 'c -> 1029)
So mutable transform doesn't change the type of the map:
scala> im mapValues (_ => "hi")
res12: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
scala> mm mapValues (_ => "hi")
res13: scala.collection.Map[Symbol,String] = Map('b -> hi, 'a -> hi, 'c -> hi)
scala> mm.transform { case (k,v) => "hi" }
<console>:9: error: type mismatch;
found : String("hi")
required: Int
mm.transform { case (k,v) => "hi" }
^
scala> im.transform { case (k,v) => "hi" }
res15: scala.collection.immutable.Map[Symbol,String] = Map('a -> hi, 'b -> hi, 'c -> hi)
...as can happen when constructing a new map.
Here's a couple of unmentioned differences:
mapValues creates a Map that is NOT serializable, without any indication that it's just a view (the type is Map[_, _], but just try to send one across the wire).
Since mapValues is just a view, every instance contains the real Map - which could be another result of mapValues. Imagine you have an actor with some state, and every mutation of the state sets the new state to be a mapValues on the previous state...in the end you have deeply nested maps with a copy of each previous state of the actor (and, yes, both of these are from experience).
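If you hit either of these, one way out is to force a strict copy at every step. A minimal sketch, where mapValuesStrict is a hypothetical helper name and not a library method:

```scala
// Hypothetical helper: applies f once per entry and builds a real Map,
// so the result holds no reference to the previous map
def mapValuesStrict[K, V, W](m: Map[K, V])(f: V => W): Map[K, W] =
  m.map { case (k, v) => (k, f(v)) }

var calls = 0
val state0 = Map("a" -> 1, "b" -> 2)
val state1 = mapValuesStrict(state0) { v => calls += 1; v + 1 }
val state2 = mapValuesStrict(state1) { v => calls += 1; v * 10 }
val callsAfterBuilding = calls

// lookups no longer re-run the functions or walk a chain of nested views
val first = state2("a")
val callsAfterLookup = calls
```

Each state is an ordinary immutable Map, so earlier states can be garbage collected once nothing else refers to them.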
How can I merge maps like below:
Map1 = Map(1 -> Class1(1), 2 -> Class1(2))
Map2 = Map(2 -> Class2(1), 3 -> Class2(2))
After merging:
Merged = Map( 1 -> List(Class1(1)), 2 -> List(Class1(2), Class2(1)), 3 -> List(Class2(2)))
The values can be a List, Set, or any other collection that has a size attribute.
Using the standard lib, you can do it as follows:
// convert maps to seq, to keep duplicate keys and concat
val merged = Map(1 -> 2).toSeq ++ Map(1 -> 4).toSeq
// merged: Seq[(Int, Int)] = ArrayBuffer((1,2), (1,4))
// group by key
val grouped = merged.groupBy(_._1)
// grouped: scala.collection.immutable.Map[Int,Seq[(Int, Int)]] = Map(1 -> ArrayBuffer((1,2), (1,4)))
// remove key from value set and convert to list
val cleaned = grouped.mapValues(_.map(_._2).toList)
// cleaned: scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(2, 4))
This is the simplest implementation I could come up with:
val m1 = Map(1 -> "1", 2 -> "2")
val m2 = Map(2 -> "21", 3 -> "3")
def merge[K, V](m1:Map[K, V], m2:Map[K, V]):Map[K, List[V]] =
(m1.keySet ++ m2.keySet).map { i => i -> (m1.get(i).toList ::: m2.get(i).toList) }.toMap
merge(m1, m2) // Map(1 -> List(1), 2 -> List(2, 21), 3 -> List(3))
You could use scalaz:
import scalaz._, Scalaz._
val m1 = Map('a -> 1, 'b -> 2)
val m2 = Map('b -> 3, 'c -> 4)
m1.mapValues{List(_)} |+| m2.mapValues{List(_)}
// Map('b -> List(2, 3), 'c -> List(4), 'a -> List(1))
You could use Set(_) instead of List(_) to get Sets as values in Map.
See Semigroup in scalaz cheat sheet (or in learning scalaz) for details about |+| operator.
For Int, |+| works as +; for List, as ++; for Map, it applies |+| to the values that share the same key.
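If you would rather not pull in scalaz, the Map behaviour described above can be imitated with the standard library. This is only a sketch of the idea for List values; combineLists is a made-up name and this is not how scalaz implements |+|:

```scala
// Merge two maps of lists: values under the same key are concatenated,
// keys present in only one map are kept as they are
def combineLists[K, V](a: Map[K, List[V]], b: Map[K, List[V]]): Map[K, List[V]] =
  b.foldLeft(a) { case (acc, (k, vs)) =>
    acc.updated(k, acc.getOrElse(k, Nil) ++ vs)
  }

val m1 = Map("a" -> List(1), "b" -> List(2))
val m2 = Map("b" -> List(3), "c" -> List(4))
val combined = combineLists(m1, m2)
```

Here combined maps "b" to List(2, 3), with the left map's values first.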
One clean way to do it, with cats:
import cats.implicits._
Map(1 -> "Hello").combine(Map(2 -> "Goodbye"))
//Map(2 -> Goodbye, 1 -> Hello)
It's important to note that both maps have to be of the same type (in this case, Map[Int, String]).
Long explanation:
combine isn't really a member of Map. By importing cats.implicits you're bringing into scope cats's Map built-in monoid instances, along with some implicit classes which enable the terse syntax.
The above is equivalent to this:
Monoid[Map[Int, String]].combine(Map(1 -> "Hello"), Map(2 -> "Goodbye"))
Where we're using the Monoid "summoner" function to get the Monoid[Map[Int, String]] instance in scope and using its combine function.
Starting with Scala 2.13, another solution based only on the standard library is to use groupMap, which (as its name suggests) is equivalent to a groupBy followed by mapValues:
// val m1 = Map(1 -> "a", 2 -> "b")
// val m2 = Map(2 -> "c", 3 -> "d")
(m1.toSeq ++ m2).groupMap(_._1)(_._2)
// Map[Int,Seq[String]] = Map(2 -> List("b", "c"), 1 -> List("a"), 3 -> List("d"))
This:
Concatenates the two maps as a sequence of tuples (List((1,"a"), (2,"b"), (2,"c"), (3,"d"))). For conciseness, m2 is appended directly: ++ accepts any collection of tuples, so no conversion is needed, but you could make it explicit by using m2.toSeq.
Groups elements based on their first tuple part (_._1) (group part of groupMap).
Maps grouped values to their second tuple part (_._2) (map part of groupMap).
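For reference, here are the same steps written out by hand so they also compile before 2.13. This is a sketch using the m1 and m2 from the commented lines above; merged is just an illustrative name:

```scala
val m1 = Map(1 -> "a", 2 -> "b")
val m2 = Map(2 -> "c", 3 -> "d")

val merged: Map[Int, List[String]] =
  (m1.toList ++ m2.toList)   // step 1: concatenate as a list of tuples
    .groupBy(_._1)           // step 2: group by the key
    .map { case (k, pairs) => (k, pairs.map(_._2)) } // step 3: keep only the values
```

The last step uses map rather than mapValues so that the result is a strict Map on every Scala version.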
I wrote a blog post about this, check it out:
http://www.nimrodstech.com/scala-map-merge/
Basically, using the scalaz Semigroup you can achieve this pretty easily.
It would look something like:
import scalaz.Scalaz._
Map1 |+| Map2
You can use foldLeft to merge two Maps of the same type
def merge[A, B](a: Map[A, B], b: Map[A, B])(mergef: (B, Option[B]) => B): Map[A, B] = {
val (big, small) = if (a.size > b.size) (a, b) else (b, a)
small.foldLeft(big) { case (z, (k, v)) => z + (k -> mergef(v, z.get(k))) }
}
def mergeIntSum[A](a: Map[A, Int], b: Map[A, Int]): Map[A, Int] =
merge(a, b)((v1, v2) => v2.map(_ + v1).getOrElse(v1))
Example:
val a = Map("a" -> 1, "b" -> 5, "c" -> 6)
val b = Map("a" -> 4, "z" -> 8)
mergeIntSum(a, b)
res0: Map[String,Int] = Map(a -> 5, b -> 5, c -> 6, z -> 8)
There is a Scala module called scala-collection-contrib, which offers very useful methods like mergeByKey.
First, we need to add an additional dependency to build.sbt:
libraryDependencies += "org.scala-lang.modules" %% "scala-collection-contrib" % "0.1.0"
and then it's possible to merge like this:
import scala.collection.decorators._
val map1 = Map(1 -> Class1(1), 2 -> Class1(2))
val map2 = Map(2 -> Class2(1), 3 -> Class2(2))
map1.mergeByKeyWith(map2){
case (a,b) => a.toList ++ b.toList
}
A solution for combining two maps of type Map[A,B] into a Map[A,List[B]] using Cats (a slightly improved version of the one offered by @David Castillo):
//convert each original map to Map[A,List[B]].
//Add an instance of Monoid[List] into the scope to combine lists:
import cats.instances.map._ // for Monoid
import cats.syntax.semigroup._ // for |+|
import cats.instances.list._
val map1 = Map("a" -> 1, "b" -> 2)
.mapValues(List(_))
val map2 = Map("b" -> 3, "d" -> 4)
.mapValues(List(_))
map1 |+| map2
If you don't want to modify the original maps (assuming they are mutable maps, since clone is used), you could do something like the following:
val target = map1.clone()
val source = map2.clone()
source.foreach(e => target += e._1 -> e._2)
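A complete version of that idea might look as follows. It is only a sketch and assumes scala.collection.mutable.Map on both sides; note that a key present in both maps ends up with the value from the second map, so nothing is collected into a list:

```scala
import scala.collection.mutable

val map1 = mutable.Map(1 -> "a", 2 -> "b")
val map2 = mutable.Map(2 -> "c", 3 -> "d")

// Work on a copy so that map1 itself is left untouched
val target = map1.clone()
map2.foreach { case (k, v) => target += (k -> v) }
```

After this, target contains 1 -> a, 2 -> c and 3 -> d, while map1 still has its original two entries.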
left.keys.map { k => k -> List(left(k), right(k)) }.toMap
This is concise and will work, assuming your two maps are left and right. Not sure about efficiency.
But your question is a bit ambiguous, for two reasons. You don't specify:
The subtyping relationship between the values (i.e. class1, class2).
What happens if the maps have different keys.
For the first case, consider the following example:
val left = Map("foo" ->1, "bar" ->2)
val right = Map("bar" -> 'a', "foo" -> 'b')
Which results in
res0: Map[String,List[Int]] = Map(foo -> List(1, 98), bar -> List(2, 97))
Notice how the Chars have been converted to Ints: when inferring the element type of the List, Scala 2 widens Char to Int (numeric widening). More generally, if in your example class1 and class2 are not related, you would get back a List[Any]; this is probably not what you wanted.
You can work around this by dropping the List constructor from my answer; this will return Tuples which preserve the type:
res0: Map[String,(Int, Char)] = Map(foo -> (1,b), bar -> (2,a))
The second problem is what happens when the maps don't have the same keys: left(k) or right(k) then throws a NoSuchElementException (key not found). Put another way, are you doing a left, right, inner, or outer join of the two maps? You can pick the join by iterating over right.keys (right join), left.keySet intersect right.keySet (inner join), or right.keySet ++ left.keySet (full outer join). The inner join avoids the missing key problem, but maybe that's not what you want. For a left, right, or outer join, every lookup on the other map has to handle absent keys; in that case you can consider looking values up with get, or using the withDefault method of Map to ensure every key returns a value, but this needs a bit more work.
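To illustrate the join variants, here is a rough sketch that uses tuples and Option so that no lookup can throw. The sample maps and the names inner, leftJoin and outer are only for illustration:

```scala
val left = Map("foo" -> 1, "bar" -> 2)
val right = Map("bar" -> 'a', "baz" -> 'c')

// Inner join: only keys present in both maps, so apply is safe
val inner: Map[String, (Int, Char)] =
  left.keySet.intersect(right.keySet).map(k => (k, (left(k), right(k)))).toMap

// Left join: every key of left, with an optional match from right
val leftJoin: Map[String, (Int, Option[Char])] =
  left.map { case (k, v) => (k, (v, right.get(k))) }

// Full outer join: every key of either map, both sides optional
val outer: Map[String, (Option[Int], Option[Char])] =
  (left.keySet ++ right.keySet).map(k => (k, (left.get(k), right.get(k)))).toMap
```

A right join is the mirror image of the left join, iterating over right and looking values up in left with get.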
m2.foldLeft(m1.mapValues{List[CommonType](_)}) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, List.empty) :+ v)
}
As noted by jwvh, the List type should be specified explicitly if Class1 is not a supertype of Class2. CommonType is a type that is a common supertype of both Class1 and Class2.
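Here is a self-contained sketch of that answer. The sealed trait Item plays the role of CommonType, and the definitions of Class1 and Class2 are assumptions, since the question does not show them. map is used instead of mapValues so the starting map is strict:

```scala
// Assumed definitions: Item stands in for CommonType
sealed trait Item
case class Class1(n: Int) extends Item
case class Class2(n: Int) extends Item

val m1 = Map(1 -> Class1(1), 2 -> Class1(2))
val m2 = Map(2 -> Class2(1), 3 -> Class2(2))

// Wrap every value of m1 in a List[Item], then append the values of m2
val merged: Map[Int, List[Item]] =
  m2.foldLeft(m1.map { case (k, v) => (k, List[Item](v)) }) {
    case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, List.empty[Item]) :+ v)
  }
```

Without the explicit List[Item], the accumulator would be typed as Map[Int, List[Class1]] and appending a Class2 would not compile.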
This answer does not solve the initial question directly although it solves a common/related scenario which is merging two maps by the common keys.
Based on @Drexin's answer I wrote a generic method to extend the existing Map functionality by providing a join method for Maps:
import scala.collection.immutable
object implicits {
type A = Any
implicit class MapExt[K, B <: A, C <: A](val left: immutable.Map[K, B]) {
def join(right: immutable.Map[K, C]) : immutable.Map[K, Seq[A]] = {
val inter = left.keySet.intersect(right.keySet)
val leftFiltered = left.filterKeys{inter.contains}
val rightFiltered = right.filterKeys{inter.contains}
(leftFiltered.toSeq ++ rightFiltered.toSeq)
.groupBy(_._1)
.mapValues(_.map{_._2}.toList)
}
}
}
Notes:
Join is based on intersection of the keys, which resembles an "inner join" from the SQL world.
It works with Scala <= 2.12; for Scala 2.13 consider using groupMap as @Xavier Guihot suggested.
Consider replacing type A with your own base type.
Usage:
import implicits._
val m1 = Map("k11" -> "v11", "k12" -> "v12")
val m2 = Map("k11" -> "v21", "k12" -> "v22", "k13" -> "v23")
println (m1 join m2)
// Map(k11 -> List(v11, v21), k12 -> List(v12, v22))
This merges two maps:
def mergeMap[A, B](map1: Map[A, B], map2: Map[A, B], op: (B, B) => B, default: => B): Map[A, B] =
  (map1.keySet ++ map2.keySet)
    .map(x => (x, op(map1.getOrElse(x, default), map2.getOrElse(x, default))))
    .toMap
This merges multiple maps:
def mergeMaps[A, B](maps: Seq[Map[A, B]], op: (B, B) => B, default: => B): Map[A, B] =
  maps.reduce((a, b) => mergeMap(a, b, op, default))
What would be a functional way to zip two dictionaries in Scala?
val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4) // B is the only common key
zipper(map1,map2) should give something similar to
Seq( ("A",1,0), // no A in second map, so third value is zero
("B",2,22),
("D",0,4)) // no D in first map, so second value is zero
If not functional, any other style is also appreciated.
def zipper(map1: Map[String, Int], map2: Map[String, Int]) = {
for(key <- map1.keys ++ map2.keys)
yield (key, map1.getOrElse(key, 0), map2.getOrElse(key, 0))
}
scala> val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
map1: scala.collection.immutable.HashMap[String,Int] = Map(A -> 1, B -> 2)
scala> val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4)
map2: scala.collection.immutable.HashMap[String,Int] = Map(B -> 22, D -> 4)
scala> :load Zipper.scala
Loading Zipper.scala...
zipper: (map1: Map[String,Int], map2: Map[String,Int])Iterable[(String, Int, Int)]
scala> zipper(map1, map2)
res1: Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))
Note that using get is probably preferable to getOrElse in this case: None then indicates that a value does not exist, instead of using 0.
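A variant along those lines could look like this. It is a sketch only: zipperOpt is a made-up name, and the keys are sorted just to make the output order predictable:

```scala
// Same idea as zipper, but a missing value shows up as None instead of 0
def zipperOpt(map1: Map[String, Int], map2: Map[String, Int]): List[(String, Option[Int], Option[Int])] =
  (map1.keySet ++ map2.keySet).toList.sorted
    .map(key => (key, map1.get(key), map2.get(key)))

val zipped = zipperOpt(Map("A" -> 1, "B" -> 2), Map("B" -> 22, "D" -> 4))
```

This way a real 0 stored in one of the maps can be told apart from a key that is simply absent.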
As an alternative to Brian's answer, this can be used to enhance the map class by way of implicit methods:
implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
for (key <- map.keys ++ that.keys)
yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
}
The naming and API are similar to the sequence zipAll.
I was thinking about a nice way to convert a List of tuples with duplicate keys, [("a","b"),("c","d"),("a","f")], into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map, loop over the list, and check for duplicate keys. But I am looking for a more Scala-ish and clever solution here.
By the way, the actual key-value type I use here is (Int, Node) and I want to turn it into a map of (Int -> NodeSeq).
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2, "a" -> 3).toMap
// Result: Map(a -> 3, b -> 2)
As of 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
A more Scala-ish way is to use a fold, in the way shown there (skipping the map f step).
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Starting with Scala 2.13, most collections are provided with the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
groups elements based on the first part of tuples (group part of groupMap)
maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
Fold Left variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
}
})
Aggregate Variation - Similar to fold Left (note that aggregate is deprecated as of Scala 2.13 in favor of foldLeft)
list.aggregate[Map[String, List[String]]](Map())(
(acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 ->
List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
},
(l, r) => l ++ r
)
Spark Variation - For big data sets (conversion to an RDD, and from the RDD back to a plain Map)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext (conf)
// This gives you a rdd of the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
(value: String) => List(value),
(acc: List[String], value) => value :: acc,
(accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)
// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
Here is a more idiomatic Scala way to convert a list of tuples to a map while handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
You can try this:
scala> val b = Array(1, 2, 3)
// b: Array[Int] = Array(1, 2, 3)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)