Cleaner tuple groupBy - scala

I have a sequence of key-value pairs (String, Int), and I want to group them by key into a sequence of values (i.e. Seq[(String, Int)] => Map[String, Iterable[Int]]).
Obviously, toMap isn't useful here, and groupBy maintains the values as tuples. The best I managed to come up with is:
val seq: Seq[( String, Int )]
// ...
seq.groupBy( _._1 ).mapValues( _.map( _._2 ) )
Is there a cleaner way of doing this?

Here's an extension in the "pimp my library" style that adds a toMultiMap method to traversables. Would it solve your problem?
import collection._
import mutable.Builder
import generic.CanBuildFrom
class TraversableOnceExt[CC, A](coll: CC, asTraversable: CC => TraversableOnce[A]) {
def toMultiMap[T, U, That](implicit ev: A <:< (T, U), cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] =
toMultiMapBy(ev)
def toMultiMapBy[T, U, That](f: A => (T, U))(implicit cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] = {
val mutMap = mutable.Map.empty[T, mutable.Builder[U, That]]
for (x <- asTraversable(coll)) {
val (key, value) = f(x)
val builder = mutMap.getOrElseUpdate(key, cbf(coll))
builder += value
}
val mapBuilder = immutable.Map.newBuilder[T, That]
for ((k, v) <- mutMap)
mapBuilder += ((k, v.result))
mapBuilder.result
}
}
implicit def commomExtendTraversable[A, C[A] <: TraversableOnce[A]](coll: C[A]): TraversableOnceExt[C[A], A] =
new TraversableOnceExt[C[A], A](coll, identity)
Which can be used like this:
val map = List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(map) // Map(1 -> List(a, à), 2 -> List(b))
val byFirstLetter = Set("abc", "aeiou", "cdef").toMultiMapBy(elem => (elem.head, elem))
println(byFirstLetter) // Map(c -> Set(cdef), a -> Set(abc, aeiou))
If you add the following implicit defs, it will also work with collection-like objects such as Strings and Arrays:
implicit def commomExtendStringTraversable(string: String): TraversableOnceExt[String, Char] =
new TraversableOnceExt[String, Char](string, implicitly)
implicit def commomExtendArrayTraversable[A](array: Array[A]): TraversableOnceExt[Array[A], A] =
new TraversableOnceExt[Array[A], A](array, implicitly)
Then:
val withArrays = Array(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(withArrays) // Map(1 -> [C@377653ae, 2 -> [C@396fe0f4)
val byLowercaseCode = "Mama".toMultiMapBy(c => (c.toLower.toInt, c))
println(byLowercaseCode) // Map(97 -> aa, 109 -> Mm)

There's no method or data structure in the standard library to do this, and your solution looks about as concise as you'll get. If you use this in more than one place, you might like to factor it out into a utility method:
def groupTuples[A, B](seq: Seq[(A, B)]) =
seq groupBy (_._1) mapValues (_ map (_._2))
which you then obviously just call with groupTuples(seq). This might not be the most efficient possible in terms of CPU clock cycles, but I don't think it's particularly inefficient either.
I did a rough benchmark against Jean-Philippe's solution on a list of 9 tuples and this is marginally faster. Both were about twice as fast as folding the sequence into a map (effectively re-implementing groupBy to give the output you want).
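For reference, the "fold the sequence into a map" variant mentioned above could look roughly like the sketch below. The name groupTuplesFold and the choice of Vector for the values are my own assumptions, not something taken from the benchmark.

```scala
// Hypothetical helper: groups pairs by key with a single foldLeft,
// effectively re-implementing groupBy to produce the desired shape directly.
def groupTuplesFold[A, B](seq: Seq[(A, B)]): Map[A, Vector[B]] =
  seq.foldLeft(Map.empty[A, Vector[B]]) { case (acc, (k, v)) =>
    // Append v to the values already collected for k (or start a new Vector).
    acc.updated(k, acc.getOrElse(k, Vector.empty[B]) :+ v)
  }
```

Each step builds a new immutable Map, which is one plausible reason this approach came out slower than groupBy plus mapValues.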

I don't know if you consider it cleaner:
seq.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}

Starting with Scala 2.13, most collections provide the groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:
List(1 -> 'a', 1 -> 'b', 2 -> 'c').groupMap(_._1)(_._2)
// Map[Int,List[Char]] = Map(2 -> List(c), 1 -> List(a, b))
This:
groups elements based on the first part of tuples (Map(2 -> List((2,c)), 1 -> List((1,a), (1,b))))
maps grouped values (List((1,a), (1,b))) by taking their second tuple part (List(a, b)).
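Applied to the Seq[(String, Int)] from the question, this would look something like the following sketch (it assumes Scala 2.13 or later; the sample data is made up):

```scala
// Assumes Scala 2.13+, where groupMap is available.
val pairs: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 2, "a" -> 3)

// Group by the first tuple element and keep only the second element as the value.
val grouped: Map[String, Seq[Int]] = pairs.groupMap(_._1)(_._2)
// grouped("a") holds 1 and 3 in their original order; grouped("b") holds 2
```

Unlike mapValues on Scala 2.12, the result here is a strict Map rather than a lazy view.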


flatmapping a nested Map in scala

Suppose I have someMap: Map[String, Map[String, String]] defined as such:
val someMap =
Map(
("a1" -> Map( ("b1" -> "c1"), ("b2" -> "c2") ) ),
("a2" -> Map( ("b3" -> "c3"), ("b4" -> "c4") ) ),
("a3" -> Map( ("b5" -> "c5"), ("b6" -> "c6") ) )
)
and I would like to flatten it to something that looks like
List(
("a1","b1","c1"),("a1","b2","c2"),
("a2","b3","c3"),("a2","b4","c4"),
("a3","b5","c5"),("a3","b6","c6")
)
What is the most efficient way of doing this? I was thinking about creating some helper function that processes each (a_i -> Map(String,String)) key value pair and return
// Returns one (outerKey, innerKey, value) triple per entry of the inner map
def helper(key: String, values: Map[String, String]): Iterable[(String, String, String)] =
values.map { case (k, v) => (key, k, v) }
then flatmap this function over someMap. But this seems somewhat unnecessary to my novice scala eyes, so I was wondering if there was a more efficient way to parse this Map.
No need to create a helper function; just write a nested lambda:
val result = someMap.flatMap { case (k, v) => v.map { case (k1, v1) => (k, k1, v1) } }
Or
val y = someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2)))
Since you're asking about efficiency, the most efficient yet functional approach I can think of is using foldLeft and foldRight.
You need foldRight since :: constructs the immutable list in reverse.
someMap.foldRight(List.empty[(String, String, String)]) { case ((a, m), acc) =>
m.foldRight(acc) {
case ((b, c), acc) => (a, b, c) :: acc
}
}
Here, assuming Map.iterator.reverse is implemented efficiently, no intermediate collections are created.
Alternatively, you can use foldLeft and then reverse the result:
someMap.foldLeft(List.empty[(String, String, String)]) { case (acc, (a, m)) =>
m.foldLeft(acc) {
case (acc, (b, c)) => (a, b, c) :: acc
}
}.reverse
This way a single intermediate List is created, but you don't rely on the implementation of the reversed iterator (foldLeft uses forward iterator).
Note: one liners, such as someMap.flatMap(x => x._2.map(y => (x._1, y._1, y._2))) are less efficient, as, in addition to the temporary buffer to hold intermediate results of flatMap, they create and discard additional intermediate collections for each inner map.
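If the intermediate collections are the concern, another option is to go through iterators so that only the final List is materialised. This is a sketch under that assumption (the names nested and flatTriples are made up, and I have not benchmarked it against the folds above):

```scala
val nested: Map[String, Map[String, String]] = Map(
  "a1" -> Map("b1" -> "c1", "b2" -> "c2"),
  "a2" -> Map("b3" -> "c3")
)

// Iterators are lazy, so no per-inner-map collection is built;
// elements are only collected once, by the final toList.
val flatTriples: List[(String, String, String)] =
  nested.iterator.flatMap { case (a, inner) =>
    inner.iterator.map { case (b, c) => (a, b, c) }
  }.toList
```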
UPD
Since there seems to be some confusion, I'll clarify what I mean. Here is an implementation of map, flatMap, foldLeft and foldRight from TraversableLike:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = { // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = bf(repr)
b.sizeHint(this)
b
}
val b = builder
for (x <- this) b += f(x)
b.result
}
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
def builder = bf(repr) // extracted to keep method size under 35 bytes, so that it can be JIT-inlined
val b = builder
for (x <- this) b ++= f(x).seq
b.result
}
def foldLeft[B](z: B)(op: (B, A) => B): B = {
var result = z
this foreach (x => result = op(result, x))
result
}
def foldRight[B](z: B)(op: (A, B) => B): B =
reversed.foldLeft(z)((x, y) => op(y, x))
It's clear that map and flatMap create an intermediate buffer using the corresponding builder, while foldLeft and foldRight reuse the same user-supplied accumulator object and only use iterators.

How to unpack Option[(Int, Int)] in Scala

The following is a valid and readable piece of code to unpack returned values.
def func: (Int, Int) = (1, 2)
val (a, b) = func
What about the functions that return Option? For example:
def func2: Option[(Int, Int)] = Some((1, 2))
How can I unpack this in a readable way?
Note that (Int, Int) is sugar for tuple type
Tuple2[Int, Int]
so Option[(Int, Int)] becomes
Option[Tuple2[Int, Int]]
thus correct syntax would be
val Some(Tuple2(a, b)) = func2
or
val Some((a, b)) = func2
or
val Some(a -> b) = func2
However, mind that if func2 returns None, this will throw a MatchError. The reason becomes clear if we examine the expanded version, which is something like
val x: (Int, Int) = func2 match {
case Some((a, b)) => (a, b)
// but what about None case ??
}
val a = x._1
val b = x._2
Note how we did not handle None case. For this reason such extraction is rarely done. Usually we map over the Option and continue working within the context of the Option
func2.map { case (a, b) =>
// work with a and b
}
or we provide some default value if possible
val (a, b) = func2.getOrElse((0, 0))
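If both cases need handling, a plain pattern match or a for-comprehension keeps things readable. A minimal sketch (describe and sumOf are hypothetical helper names, not part of the question):

```scala
// Handle both Some and None explicitly, so no MatchError is possible.
def describe(result: Option[(Int, Int)]): String = result match {
  case Some((a, b)) => s"got $a and $b"
  case None         => "nothing returned"
}

// Stay inside the Option: the tuple is unpacked only when a value is present.
def sumOf(result: Option[(Int, Int)]): Option[Int] =
  for ((a, b) <- result) yield a + b
```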

How does flatMap work on Maps in scala? map can be used as mapValues on Maps but how does the flatMap function work on Map objects?

I am not able to understand the functioning of flatMap function on Map objects.
You use flatMap if you want to flatten your result from map-function.
keep in mind:
flatMap(something)
is identical to
map(something).flatten
I think it is a good question, because a Map cannot be flattened like other collections. First of all, we should look at the signature of this method:
def flatMap[B](f: (A) ⇒ GenTraversableOnce[B]): Map[B]
So, the documentation says that it returns a Map, but that is not the whole truth, because the result can be any collection that the function's output allows, not only a Map. We can see this in the examples provided there:
def getWords(lines: Seq[String]): Seq[String] = lines flatMap (line => line split "\\W+")
// lettersOf will return a Seq[Char] of likely repeated letters, instead of a Set
def lettersOf(words: Seq[String]) = words flatMap (word => word.toSet)
// lettersOf will return a Set[Char], not a Seq
def lettersOf(words: Seq[String]) = words.toSet flatMap (word => word.toSeq)
// xs will be an Iterable[Int]
val xs = Map("a" -> List(11,111), "b" -> List(22,222)).flatMap(_._2)
// ys will be a Map[Int, Int]
val ys = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).flatMap(_._2)
So let's look at the full signature:
def flatMap[B, That](f: ((K, V)) ⇒ GenTraversableOnce[B])(implicit bf: CanBuildFrom[Map[K, V], B, That]): That
Now we see it returns That - something that implicit CanBuildFrom can provide for us.
You can find many explanations of how CanBuildFrom works.
The main idea is that you supply a function from a key -> value pair to a GenTraversableOnce (it can be a Map, a Seq or even an Option), and the results are mapped and then flattened. You can also provide your own CanBuildFrom.
If a key or a value in the Map is itself a list, then the Map can be flatMap-ed.
Example:
val a = Map(1->List(1,2),2->List(2,3))
a.map(_._2) gives List(List(1,2),List(2,3))
you can flatten this using flatMap => a.flatMap(_._2) or a.map(_._2).flatten gives List(1,2,2,3)
src: http://www.scala-lang.org/old/node/12158.html
Not sure about any other way of using flatMap on a Map though.
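To make the different result types concrete, here is a small sketch. It assumes Scala 2.13 or later, where CanBuildFrom is gone and Map instead has an extra flatMap overload that returns a Map when the function yields pairs. The sample data and names are made up.

```scala
// Assumes Scala 2.13+.
val byKey = Map("a" -> List(1, 2), "b" -> List(3))

// The function yields plain Ints, so the result is a generic Iterable[Int].
val allValues: Iterable[Int] = byKey.flatMap(_._2)

// The function yields pairs, so the result is again a Map (here: value -> key).
val valueToKey: Map[Int, String] =
  byKey.flatMap { case (k, vs) => vs.map(v => (v, k)) }

// An Option works too: Some keeps an entry, None drops it.
val multiValued: Map[String, List[Int]] =
  byKey.flatMap { case (k, vs) => if (vs.size > 1) Some((k, vs)) else None }
```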

Scala: Merge map

How can I merge maps like below:
Map1 = Map(1 -> Class1(1), 2 -> Class1(2))
Map2 = Map(2 -> Class2(1), 3 -> Class2(2))
After merging:
Merged = Map(1 -> List(Class1(1)), 2 -> List(Class1(2), Class2(1)), 3 -> List(Class2(2)))
The values can be a List, a Set, or any other collection that has a size attribute.
Using the standard lib, you can do it as follows:
// convert maps to seq, to keep duplicate keys and concat
val merged = Map(1 -> 2).toSeq ++ Map(1 -> 4).toSeq
// merged: Seq[(Int, Int)] = ArrayBuffer((1,2), (1,4))
// group by key
val grouped = merged.groupBy(_._1)
// grouped: scala.collection.immutable.Map[Int,Seq[(Int, Int)]] = Map(1 -> ArrayBuffer((1,2), (1,4)))
// remove key from value set and convert to list
val cleaned = grouped.mapValues(_.map(_._2).toList)
// cleaned: scala.collection.immutable.Map[Int,List[Int]] = Map(1 -> List(2, 4))
This is the simplest implementation I could come up with:
val m1 = Map(1 -> "1", 2 -> "2")
val m2 = Map(2 -> "21", 3 -> "3")
// For every key in either map, collect the values present (0, 1 or 2 of them) into a List
def merge[K, V](m1: Map[K, V], m2: Map[K, V]): Map[K, List[V]] =
(m1.keySet ++ m2.keySet).map { k => k -> (m1.get(k).toList ::: m2.get(k).toList) }.toMap
merge(m1, m2) // Map(1 -> List(1), 2 -> List(2, 21), 3 -> List(3))
You could use scalaz:
import scalaz._, Scalaz._
val m1 = Map('a -> 1, 'b -> 2)
val m2 = Map('b -> 3, 'c -> 4)
m1.mapValues{List(_)} |+| m2.mapValues{List(_)}
// Map('b -> List(2, 3), 'c -> List(4), 'a -> List(1))
You could use Set(_) instead of List(_) to get Sets as values in Map.
See Semigroup in scalaz cheat sheet (or in learning scalaz) for details about |+| operator.
For Int |+| works as +, for List - as ++, for Map it applies |+| to values of same keys.
One clean way to do it, with cats:
import cats.implicits._
Map(1 -> "Hello").combine(Map(2 -> "Goodbye"))
//Map(2 -> Goodbye, 1 -> Hello)
It's important to note that both maps have to be of the same type (in this case, Map[Int, String]).
Long explanation:
combine isn't really a member of Map. By importing cats.implicits you're bringing into scope cats's Map built-in monoid instances, along with some implicit classes which enable the terse syntax.
The above is equivalent to this:
Monoid[Map[Int, String]].combine(Map(1 -> "Hello"), Map(2 -> "Goodbye"))
Where we're using the Monoid "summoner" function to get the Monoid[Map[Int, String]] instance in scope and using its combine function.
Starting with Scala 2.13, another solution based only on the standard library is groupMap, which (as its name suggests) is equivalent to a groupBy followed by mapValues:
// val m1 = Map(1 -> "a", 2 -> "b")
// val m2 = Map(2 -> "c", 3 -> "d")
(m1.toSeq ++ m2).groupMap(_._1)(_._2)
// Map[Int,Seq[String]] = Map(2 -> List("b", "c"), 1 -> List("a"), 3 -> List("d"))
This:
Concatenates the two maps as a sequence of tuples (List((1,"a"), (2,"b"), (2,"c"), (3,"d"))). For conciseness, m2 is implicitly converted to Seq to adapt to the type of m1.toSeq - but you could choose to make it explicit by using m2.toSeq.
groups elements based on their first tuple part (_._1) (group part of groupMap)
maps grouped values to their second tuple part (_._2) (map part of groupMap)
I wrote a blog post about this; check it out:
http://www.nimrodstech.com/scala-map-merge/
Basically, using the scalaz Semigroup you can achieve this pretty easily.
It would look something like:
import scalaz.Scalaz._
Map1 |+| Map2
You can use foldLeft to merge two Maps of the same type
def merge[A, B](a: Map[A, B], b: Map[A, B])(mergef: (B, Option[B]) => B): Map[A, B] = {
val (big, small) = if (a.size > b.size) (a, b) else (b, a)
small.foldLeft(big) { case (z, (k, v)) => z + (k -> mergef(v, z.get(k))) }
}
def mergeIntSum[A](a: Map[A, Int], b: Map[A, Int]): Map[A, Int] =
merge(a, b)((v1, v2) => v2.map(_ + v1).getOrElse(v1))
Example:
val a = Map("a" -> 1, "b" -> 5, "c" -> 6)
val b = Map("a" -> 4, "z" -> 8)
mergeIntSum(a, b)
res0: Map[String,Int] = Map(a -> 5, b -> 5, c -> 6, z -> 8)
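On Scala 2.13 or later the same idea can be written with updatedWith, which avoids the separate get. This is only a sketch; mergeWith is a made-up name, and it takes a plain (V, V) => V combiner rather than the (B, Option[B]) => B shape used above.

```scala
// Assumes Scala 2.13+, where immutable.Map has updatedWith.
def mergeWith[K, V](a: Map[K, V], b: Map[K, V])(f: (V, V) => V): Map[K, V] =
  b.foldLeft(a) { case (acc, (k, v)) =>
    acc.updatedWith(k) {
      case Some(existing) => Some(f(existing, v)) // key in both maps: combine
      case None           => Some(v)              // key only in b: keep as-is
    }
  }
```

Calling mergeWith(a, b)(_ + _) on the two maps from the example should give the same sums as mergeIntSum.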
There is a Scala module called scala-collection-contrib, which offers very useful methods like mergeByKey.
First, we need to add an additional dependency to build.sbt:
libraryDependencies += "org.scala-lang.modules" %% "scala-collection-contrib" % "0.1.0"
and then it's possible to do merge like this:
import scala.collection.decorators._
val map1 = Map(1 -> Class1(1), 2 -> Class1(2))
val map2 = Map(2 -> Class2(1), 3 -> Class2(2))
map1.mergeByKeyWith(map2){
case (a,b) => a.toList ++ b.toList
}
Solution to combine two maps: Map[A,B], the result type: Map[A,List[B]] via the Scala Cats (slightly improved version, offered by @David Castillo)
//convert each original map to Map[A,List[B]].
//Add an instance of Monoid[List] into the scope to combine lists:
import cats.instances.map._ // for Monoid
import cats.syntax.semigroup._ // for |+|
import cats.instances.list._
val map1 = Map("a" -> 1, "b" -> 2)
.mapValues(List(_))
val map2 = Map("b" -> 3, "d" -> 4)
.mapValues(List(_))
map1 |+| map2
If you don't want to mess around with the original maps, you could do something like the following (this assumes mutable maps, since clone() is only available on those):
val target = map1.clone()
val source = map2.clone()
source.foreach(e => target += e._1 -> e._2)
left.keys map { k => k -> List(left(k),right(k)) } toMap
Is concise and will work, assuming your two maps are left and right. Not sure about efficiency.
But your question is a bit ambiguous, for two reasons. You don't specify
The subtyping relationship between the values (i.e. class1,class2),
What happens if the maps have different keys
For the first case, consider the following example:
val left = Map("foo" ->1, "bar" ->2)
val right = Map("bar" -> 'a', "foo" -> 'b')
Which results in
res0: Map[String,List[Int]] = Map(foo -> List(1, 98), bar -> List(2, 97))
Notice how the Chars have been converted to Ints, because of the scala type hierarchy. More generally, if in your example class1 and class2 are not related, you would get back a List[Any]; this is probably not what you wanted.
You can work around this by dropping the List constructor from my answer; this will return Tuples which preserve the type:
res0: Map[String,(Int, Char)] = Map(foo -> (1,b), bar -> (2,a))
The second problem is what happens when the maps don't have the same keys. This will result in a key-not-found exception (NoSuchElementException). Put another way, are you doing a left, right, or inner join of the two maps? You can disambiguate the type of join by switching to right.keys or right.keySet intersect left.keySet for right/inner joins respectively. The latter works around the missing-key problem, but maybe that's not what you want, i.e. maybe you want a left or right join instead. In that case you can consider using the withDefault method of Map to ensure every key returns a value, e.g. None, but this needs a bit more work.
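To illustrate the join variants without withDefault, here is a rough sketch that uses get so that a missing key becomes None instead of an exception. The names innerJoin and outerJoin are my own, and the code assumes Scala 2.13 or later for the Map-returning flatMap.

```scala
// Inner join: keep only keys present in both maps; tuples preserve both value types.
def innerJoin[K, A, B](left: Map[K, A], right: Map[K, B]): Map[K, (A, B)] =
  left.flatMap { case (k, a) => right.get(k).map(b => (k, (a, b))) }

// Full outer join: keep every key; a side that lacks the key shows up as None.
def outerJoin[K, A, B](left: Map[K, A], right: Map[K, B]): Map[K, (Option[A], Option[B])] =
  (left.keySet ++ right.keySet).iterator
    .map(k => (k, (left.get(k), right.get(k))))
    .toMap
```

A left or right join falls out of the same pattern by iterating over only one map's keys.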
m2.foldLeft(m1.mapValues{List[CommonType](_)}) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, List.empty) :+ v)
}
As noted by jwvh, List type should be specified explicitly if Class1 is not upper type bound for Class2. CommonType is a type which is upper bound for both Class1 and Class2.
This answer does not solve the initial question directly although it solves a common/related scenario which is merging two maps by the common keys.
Based on @Drexin's answer I wrote a generic method to extend the existing Map functionality by providing a join method for Maps:
object implicits {
type A = Any
implicit class MapExt[K, B <: A, C <: A](val left: immutable.Map[K, B]) {
def join(right: immutable.Map[K, C]) : immutable.Map[K, Seq[A]] = {
val inter = left.keySet.intersect(right.keySet)
val leftFiltered = left.filterKeys{inter.contains}
val rightFiltered = right.filterKeys{inter.contains}
(leftFiltered.toSeq ++ rightFiltered.toSeq)
.groupBy(_._1)
.mapValues(_.map{_._2}.toList)
}
}
}
Notes:
Join is based on intersection of the keys, which resembles an "inner join" from the SQL world.
It works with Scala <= 2.12; for Scala 2.13, consider using groupMap as @Xavier Guihot suggested.
Consider replacing type A with your own base type.
Usage:
import implicits._
val m1 = Map("k11" -> "v11", "k12" -> "v12")
val m2 = Map("k11" -> "v21", "k12" -> "v22", "k13" -> "v23")
println (m1 join m2)
// Map(k11 -> List(v11, v21), k12 -> List(v12, v22))
This merges two maps:
def mergeMap[A, B](map1: Map[A, B], map2: Map[A, B], op: (B, B) => B, default: => B): Map[A, B] = (map1.keySet ++ map2.keySet).map(x => (x, op(map1.getOrElse(x, default), map2.getOrElse(x, default)))).toMap
This merges multiple maps:
def mergeMaps[A, B](maps: Seq[Map[A, B]], op: (B, B) => B, default: => B): Map[A, B] = maps.reduce((a, b) => mergeMap(a, b, op, default))

Zip two HashMaps(or dictionaries)

What would be a functional way to zip two dictionaries in Scala?
map1 = new HashMap("A"->1,"B"->2)
map2 = new HashMap("B"->22,"D"->4) // B is the only common key
zipper(map1,map2) should give something similar to
Seq( ("A",1,0), // no A in second map, so third value is zero
("B",2,22),
("D",0,4)) // no D in first map, so second value is zero
If not functional, any other style is also appreciated
def zipper(map1: Map[String, Int], map2: Map[String, Int]) = {
for(key <- map1.keys ++ map2.keys)
yield (key, map1.getOrElse(key, 0), map2.getOrElse(key, 0))
}
scala> val map1 = scala.collection.immutable.HashMap("A" -> 1, "B" -> 2)
map1: scala.collection.immutable.HashMap[String,Int] = Map(A -> 1, B -> 2)
scala> val map2 = scala.collection.immutable.HashMap("B" -> 22, "D" -> 4)
map2: scala.collection.immutable.HashMap[String,Int] = Map(B -> 22, D -> 4)
scala> :load Zipper.scala
Loading Zipper.scala...
zipper: (map1: Map[String,Int], map2: Map[String,Int])Iterable[(String, Int, Int)]
scala> zipper(map1, map2)
res1: Iterable[(String, Int, Int)] = Set((A,1,0), (B,2,22), (D,0,4))
Note that using get is probably preferable to getOrElse in this case: None then signals that a value does not exist, instead of a 0 that could also be a real value.
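A version along those lines might look like the sketch below. zipperOpt is a hypothetical name; it uses keySet so that a key present in both maps appears only once, and it sorts the keys purely to make the output order predictable.

```scala
// Missing values are reported as None rather than as a 0 placeholder.
def zipperOpt(map1: Map[String, Int], map2: Map[String, Int]): Seq[(String, Option[Int], Option[Int])] =
  (map1.keySet ++ map2.keySet).toSeq.sorted
    .map(k => (k, map1.get(k), map2.get(k)))
```

For the maps from the question, "A" would come back with None in the third position and "D" with None in the second.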
As an alternative to Brian's answer, this can be used to enhance the map class by way of implicit methods:
implicit class MapUtils[K, +V](map: collection.Map[K, V]) {
def zipAllByKey[B >: V, C >: V](that: collection.Map[K, C], thisElem: B, thatElem: C): Iterable[(K, B, C)] =
for (key <- map.keys ++ that.keys)
yield (key, map.getOrElse(key, thisElem), that.getOrElse(key, thatElem))
}
The naming and API are similar to the sequence zipAll.