Scala best way of turning a Collection into a Map-by-key?

If I have a collection c of type T and there is a property p on T (of type P, say), what is the best way to do a map-by-extracting-key?
val c: Collection[T]
val m: Map[P, T]
One way is the following:
val m = new scala.collection.mutable.HashMap[P, T]
c foreach { t => m += (t.getP -> t) }
But now I need a mutable map. Is there a better way of doing this so that it's in 1 line and I end up with an immutable Map? (Obviously I could turn the above into a simple library utility, as I would in Java, but I suspect that in Scala there is no need)

You can use
c.map(t => t.getP -> t).toMap
but be aware that this needs two traversals.
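A minimal runnable sketch of this approach, assuming Scala 2.13 and a hypothetical case class Item standing in for T (its key field plays the role of getP). It also shows what happens when two elements share a key: toMap keeps the later one.

```scala
// Hypothetical element type: `key` plays the role of getP
case class Item(key: String, value: Int)

val c = List(Item("a", 1), Item("b", 2), Item("a", 3))

// First traversal: build the (key, element) pairs; second traversal: toMap
val m: Map[String, Item] = c.map(t => t.key -> t).toMap

// Duplicate key "a": the later pair overwrites the earlier one,
// so m("a") is Item("a", 3)
```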

You can construct a Map from a variable number of tuples. So use the map method on the collection to convert it into a collection of tuples, and then use the : _* trick to pass the result as a variable argument list.
scala> val list = List("this", "maps", "string", "to", "length") map {s => (s, s.length)}
list: List[(java.lang.String, Int)] = List((this,4), (maps,4), (string,6), (to,2), (length,6))
scala> val list = List("this", "is", "a", "bunch", "of", "strings")
list: List[java.lang.String] = List(this, is, a, bunch, of, strings)
scala> val string2Length = Map(list map {s => (s, s.length)} : _*)
string2Length: scala.collection.immutable.Map[java.lang.String,Int] = Map(strings -> 7, of -> 2, bunch -> 5, a -> 1, is -> 2, this -> 4)
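The same varargs idea as a self-contained snippet, a sketch assuming Scala 2.13 (where : _* still works; Scala 3 also accepts the newer xs* splice syntax):

```scala
val words = List("this", "is", "a", "bunch", "of", "strings")

// Map.apply takes a variable number of (key, value) pairs;
// `: _*` passes the whole list of pairs as those varargs
val string2Length: Map[String, Int] = Map(words.map(s => (s, s.length)): _*)
```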

In addition to @James Iry's solution, it is also possible to accomplish this using a fold. I suspect that this solution is slightly faster than the tuple method (fewer garbage objects are created):
val list = List("this", "maps", "string", "to", "length")
val map = list.foldLeft(Map[String, Int]()) { (m, s) => m + (s -> s.length) }

This can be implemented immutably and with a single traversal by folding through the collection as follows.
val map = c.foldLeft(Map[P, T]()) { (m, t) => m + (t.getP -> t) }
The solution works because adding to an immutable Map returns a new immutable Map with the additional entry and this value serves as the accumulator through the fold operation.
The tradeoff here is simplicity of the code versus efficiency: for large collections, this approach may be more suitable than two-traversal implementations such as applying map and then toMap.
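A generic version of this single-pass fold, as a sketch assuming Scala 2.13; toMapBy is a hypothetical helper name, not a standard-library method.

```scala
// Hypothetical helper: builds an immutable Map in one pass over `c`
def toMapBy[T, P](c: Iterable[T])(key: T => P): Map[P, T] =
  c.foldLeft(Map.empty[P, T]) { (m, t) =>
    // `+` returns a new immutable Map, which becomes the next accumulator
    m + (key(t) -> t)
  }

// Words of equal length share a key, so the later word wins
val byLength = toMapBy(List("this", "maps", "string", "to", "length"))(_.length)
```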

Another solution (might not work for all types; breakOut only exists in Scala 2.8 to 2.12):
import scala.collection.breakOut
val m:Map[P, T] = c.map(t => (t.getP, t))(breakOut)
This avoids creating the intermediate list. More info here:
Scala 2.8 breakOut

What you're trying to achieve is not fully defined.
What if two or more items in c share the same p? Which item will be mapped to that p in the map?
The more accurate way of looking at this is yielding a map between p and all c items that have it:
val m: Map[P, Collection[T]]
This could be easily achieved with groupBy:
val m: Map[P, Collection[T]] = c.groupBy(t => t.p)
If you still want the original map, you can, for instance, map p to the first t that has it:
val m: Map[P, T] = c.groupBy(t => t.p) map { case (p, ts) => p -> ts.head }
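A runnable sketch of the groupBy approach, assuming Scala 2.13 and a hypothetical case class Person whose city field plays the role of p. On 2.13, groupMapReduce can also keep one element per key without materializing the groups first.

```scala
// Hypothetical element type: `city` plays the role of p
case class Person(name: String, city: String)

val people = List(Person("Ann", "Oslo"), Person("Bob", "Rome"), Person("Cid", "Oslo"))

// All elements per key
val byCity: Map[String, List[Person]] = people.groupBy(_.city)

// Keep only the first element for each key
val firstByCity: Map[String, Person] =
  people.groupBy(_.city).map { case (city, ps) => city -> ps.head }

// Scala 2.13: same result in one pass; the reduce keeps the earlier element
val firstByCity2: Map[String, Person] =
  people.groupMapReduce(_.city)(p => p)((first, _) => first)
```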

Scala 2.13+
Instead of breakOut (which no longer exists in 2.13), you could use
c.map(t => (t.getP, t)).to(Map)
See the "View" section of https://www.scala-lang.org/blog/2017/02/28/collections-rework.html
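A sketch of the Scala 2.13 replacement for breakOut, assuming a hypothetical case class Item standing in for T: going through .view lets to(Map) consume the mapped pairs lazily, so no intermediate collection of tuples is built.

```scala
// Hypothetical element type: `key` plays the role of getP
case class Item(key: String, value: Int)

val c = Vector(Item("a", 1), Item("b", 2))

// The view maps lazily; `to(Map)` builds the Map in a single pass
val m: Map[String, Item] = c.view.map(t => t.key -> t).to(Map)
```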

This is probably not the most efficient way to turn a list to map, but it makes the calling code more readable. I used implicit conversions to add a mapBy method to List:
implicit def list2ListWithMapBy[T](list: List[T]): ListWithMapBy[T] = {
  new ListWithMapBy(list)
}
class ListWithMapBy[V](list: List[V]) {
  def mapBy[K](keyFunc: V => K) = {
    list.map(a => keyFunc(a) -> a).toMap
  }
}
Calling code example:
val list = List("A", "AA", "AAA")
list.mapBy(_.length) //Map(1 -> A, 2 -> AA, 3 -> AAA)
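Note that because this defines an implicit conversion, the file that declares list2ListWithMapBy needs import scala.language.implicitConversions (Scala 2.10+); otherwise the compiler emits a feature warning.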
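On Scala 2.10 and later, an implicit class is a more compact way to add the same method, and it does not need the implicitConversions import. A sketch, where MapBySyntax and MapByOps are hypothetical names:

```scala
object MapBySyntax {
  // Adds `mapBy` to any List; the implicit class generates the conversion
  implicit class MapByOps[V](val list: List[V]) {
    def mapBy[K](keyFunc: V => K): Map[K, V] =
      list.map(a => keyFunc(a) -> a).toMap
  }
}

import MapBySyntax._

val byLength = List("A", "AA", "AAA").mapBy(_.length)
```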

(c map (_.getP) zip c).toMap
Works well and is very intuitive: zip produces (key, element) pairs, which toMap turns into the map.

How about using zip and toMap?
myList.zip(myList.map(_.length)).toMap
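Both zip variants build a list of pairs and then call toMap; the only difference is which side ends up as the key. A sketch assuming Scala 2.13, with words a hypothetical input:

```scala
val words = List("this", "is", "a", "list")

// keys zip elements: length -> word (duplicate keys keep the last word)
val byLength: Map[Int, String] = words.map(_.length).zip(words).toMap

// elements zip derived values: word -> length
val lengthOf: Map[String, Int] = words.zip(words.map(_.length)).toMap
```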

For what it's worth, here are two pointless ways of doing it:
scala> case class Foo(bar: Int)
defined class Foo
scala> import scalaz._, Scalaz._
import scalaz._
import Scalaz._
scala> val c = Vector(Foo(9), Foo(11))
c: scala.collection.immutable.Vector[Foo] = Vector(Foo(9), Foo(11))
scala> c.map(((_: Foo).bar) &&& identity).toMap
res30: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))
scala> c.map(((_: Foo).bar) >>= (Pair.apply[Int, Foo] _).curried).toMap
res31: scala.collection.immutable.Map[Int,Foo] = Map(9 -> Foo(9), 11 -> Foo(11))

This works for me:
val personsMap = persons.foldLeft(scala.collection.mutable.Map[Int, PersonDTO]()) {
  (m, p) => m(p.id) = p; m
}
The Map has to be mutable for m(p.id) = p to compile, and m has to be returned explicitly because updating a mutable Map returns Unit rather than the map.

Use map() on the collection followed by toMap:
val map = list.map(e => (e, e.length)).toMap

Related

Scala: Collect values defined in hashmap passed by a list argument

Suppose I have the following variables:
val m = HashMap( ("1", "one"), ("2", "two"), ("3", "three") )
val l = List("1", "2")
I would like to extract the list List("one","two"), which corresponds to the values for each key in the list present in the map.
This is my solution, and it works like a charm. Still, I would like to know whether I'm reinventing the wheel and whether there's an idiomatic way of doing what I intend to do:
class Mapper[T, V](val map: HashMap[T, V]) extends PartialFunction[T, V] {
  override def isDefinedAt(x: T): Boolean = map.contains(x)
  override def apply(x: T): V = map.get(x) match {
    case Some(v) => v
  }
}
val collected = l collect (new Mapper(m))
// collected: List("one", "two")
Yes, you are reinventing the wheel. Your code is equivalent to
l collect m
but with an additional layer of indirection that adds nothing to HashMap, which already implements PartialFunction (just expand the "Linear Supertypes" list in its Scaladoc to see that).
Alternatively, you can also use flatMap as follows:
l flatMap m.get
The implicit CanBuildFrom instances (Scala 2.8 to 2.12) make sure that the result is actually a List.
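A self-contained check of both suggestions, assuming Scala 2.13 (where Option is itself an IterableOnce, so flatMap accepts m.get directly). The key "4" is added to show that missing keys are simply skipped.

```scala
import scala.collection.immutable.HashMap

val m = HashMap("1" -> "one", "2" -> "two", "3" -> "three")
val l = List("1", "2", "4")

// A Map is a PartialFunction[K, V], so collect keeps only keys present in m
val viaCollect: List[String] = l collect m

// m.get returns an Option; flatMap drops the Nones
val viaFlatMap: List[String] = l flatMap m.get
```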
You could do this, which seems a bit simpler:
val res = l.map(m.get(_)) // List(Some("one"), Some("two"))
  .flatMap(_.toList)
Or even this, using a for-comprehension:
val res = for {
  key <- l
  value <- m.get(key)
} yield value
I would suggest something like this:
m.collect { case (k, v) if l.contains(k) => v }
Note that this does not preserve the order of l and does not handle duplicates in l.

Does there exist in Scala, or a library, an equivalent to Clojure's diff as applied to maps?

In Clojure, the diff function can be applied to maps, but that doesn't seem to be the case in Scala. Is anyone aware of something in Scala that makes it easy to obtain what the Clojure diff function returns when it is applied to maps?
Here's the Clojure diff function explained for reference.
http://clojuredocs.org/clojure_core/clojure.data/diff
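This is equivalent to Clojure's diff for flat maps (it needs a Scala version before 2.13, since it relies on CanBuildFrom, which 2.13 removed):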
import collection.generic.CanBuildFrom
def diff[T, Col](x: Col with TraversableOnce[T], y: Col with TraversableOnce[T])
    (implicit cbf: CanBuildFrom[Col, T, Col]): (Col, Col, Col) = {
  val xs = x.toSet
  val ys = y.toSet
  def convert(s: Set[T]) = (cbf(x) ++= s).result
  (convert(xs diff ys), convert(ys diff xs), convert(xs intersect ys))
}
It can operate on any kind of TraversableOnce and will return results with the same type as its parameters:
scala> diff(Map(1 -> 2), Map(1 -> 2))
res35: (scala.collection.immutable.Map[Int,Int], scala.collection.immutable.Map[Int,Int], scala.collection.immutable.Map[Int,Int]) = (Map(),Map(),Map(1 -> 2))
As others have said, there isn't something exactly like that, but you can build it anyway. Here's my attempt, which adds diff to Map through an implicit class. It produces the same result as the Clojure diff example.
object MapsDemo extends App {
  implicit class MapCompanionOps[A, B](val a: Map[A, B]) extends AnyVal {
    def diff(b: Map[A, B]): (Map[A, B], Map[A, B], Map[A, B]) = {
      (a.filter(p => !b.exists(_ == p)), // things-only-in-a
       b.filter(p => !a.exists(_ == p)), // things-only-in-b
       a.flatMap(p => b.find(_ == p)))   // things-in-both
    }
  }
  val uno = Map("same" -> "same", "different" -> "one")
  val dos = Map("same" -> "same", "different" -> "two", "onlyhere" -> "whatever")
  println(uno diff dos) // (Map(different -> one),Map(different -> two, onlyhere -> whatever),Map(same -> same))
  println(Map("a" -> 1).diff(Map("a" -> 1, "b" -> 2))) // (Map(),Map(b -> 2),Map(a -> 1))
}
You can achieve that by converting the maps to lists first. For example:
scala> val a = Map(1->2, 2-> 3).toList
scala> val b = Map(1->2, 3-> 4).toList
scala> val closureDiff = List(a.diff(b), b.diff(a), a.intersect(b))
closureDiff: List[List[(Int, Int)]] = List(List((2,3)), List((3,4)), List((1,2)))
There is no function in the standard library that does exactly what you need. However, an unoptimized version can easily be implemented in this manner:
def diffff[K, V](m1: Map[K, V], m2: Map[K, V]): (Map[K, V], Map[K, V], Map[K, V]) = {
  val (both, left) = m1.partition({ case (k, v) => m2.get(k) == Some(v) })
  val right = m2.filter({ case (k, v) => both.get(k) != Some(v) })
  (both, left, right)
}
Note that it returns (in both, only in m1, only in m2), which is a different order from Clojure's diff.
Also, a map can be converted to a set with a single method (toSet), and then you can use Set's intersect, union and diff operations.
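A sketch of the toSet approach, assuming Scala 2.13; mapDiff is a hypothetical name, and unlike Clojure's diff it does not recurse into nested map values. The result follows Clojure's order: (only in a, only in b, in both).

```scala
def mapDiff[K, V](a: Map[K, V], b: Map[K, V]): (Map[K, V], Map[K, V], Map[K, V]) = {
  // Compare whole (key, value) entries, so a key with different values
  // shows up on both "only" sides
  val aEntries = a.toSet
  val bEntries = b.toSet
  ((aEntries diff bEntries).toMap,
   (bEntries diff aEntries).toMap,
   (aEntries intersect bEntries).toMap)
}

val uno = Map("same" -> "same", "different" -> "one")
val dos = Map("same" -> "same", "different" -> "two", "onlyhere" -> "whatever")
val (onlyUno, onlyDos, both) = mapDiff(uno, dos)
```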

How to convert Map[A,Future[B]] to Future[Map[A,B]]?

I've been working with the Scala Akka library and have come across a bit of a problem. As the title says, I need to convert Map[A, Future[B]] to Future[Map[A,B]]. I know that one can use Future.sequence for Iterables like Lists, but that doesn't work in this case.
I was wondering: is there a clean way in Scala to make this conversion?
See if this works for you:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val map = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
val fut = Future.sequence(map.map(entry => entry._2.map(i => (entry._1, i)))).map(_.toMap)
The idea is to map the Map to an Iterable of futures of (key, result) tuples, where each result comes from the future tied to that key. From there you can sequence that Iterable and, once you have the aggregate Future, map it and convert the Iterable of tuples to a Map via toMap.
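A self-contained version of this approach, assuming Scala 2.13 (where scala.concurrent.future no longer exists, so Future(...) is used instead) and the global execution context; Await is used only so the example can check its result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def sequenceMap[A, B](in: Map[A, Future[B]]): Future[Map[A, B]] =
  // Turn each (key, future) into a future of (key, value), sequence them,
  // then rebuild the Map once all values are available
  Future.sequence(in.map { case (k, fv) => fv.map(k -> _) }).map(_.toMap)

val futures = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
val result: Map[String, Int] = Await.result(sequenceMap(futures), 5.seconds)
```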
Now, an alternative to this approach is to do something similar to what the sequence function does, with a couple of tweaks. You could write a sequenceMap function like so:
import scala.concurrent.{ExecutionContext, Future}
def sequenceMap[A, B](in: Map[B, Future[A]])(implicit executor: ExecutionContext): Future[Map[B, A]] = {
  val mb = Map.newBuilder[B, A]
  in.foldLeft(Future.successful(mb)) {
    (fr, fa) => for (r <- fr; a <- fa._2) yield (r += ((fa._1, a)))
  } map (_.result())
}
And then use it in an example like this:
import scala.util.{Failure, Success}
val map = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
val fut = sequenceMap(map)
fut onComplete {
  case Success(m) => println(m)
  case Failure(ex) => ex.printStackTrace()
}
This might be slightly more efficient than the first example, as it creates fewer intermediate collections and makes fewer calls to the ExecutionContext.
I think the most succinct we can be with core Scala 2.12.x is
val futureMap = Map("a" -> Future(1), "b" -> Future(2), "c" -> Future(3))
Future.traverse(futureMap.toList) { case (k, fv) => fv.map(k -> _) } map(_.toMap)
Update: You can actually get the nice .sequence syntax in Scalaz 7 without too much fuss:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{ Future, future }
import scalaz._, Scalaz.{ ToTraverseOps => _, _ }
import scalaz.contrib.std._
val m = Map("a" -> future(1), "b" -> future(2), "c" -> future(3))
And then:
scala> m.sequence.onSuccess { case result => println(result) }
Map(a -> 1, b -> 2, c -> 3)
In principle it shouldn't be necessary to hide ToTraverseOps like this, but for now it does the trick. See the rest of my answer below for more details about the Traverse type class, dependencies, etc.
As copumpkin notes in a comment above, Scalaz contains a Traverse type class with an instance for Map[A, _] that is one of the puzzle pieces here. The other piece is the Applicative instance for Future, which isn't in Scalaz 7 (which is still cross-built against pre-Future 2.9), but is in scalaz-contrib.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scalaz._, Scalaz._
import scalaz.contrib.std._
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] = {
  type M[X] = Map[A, X]
  (m: M[Future[B]]).sequence
}
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
  Traverse[({ type L[X] = Map[A, X] })#L] sequence m
Or:
def sequence[A, B](m: Map[A, Future[B]]): Future[Map[A, B]] =
  TraverseOpsUnapply(m).sequence
In a perfect world you'd be able to write m.sequence, but the TraverseOps machinery that should make this syntax possible isn't currently able to tell how to go from a particular Map instance to the appropriate Traverse instance.
This also works: the idea is to use the sequence result (of the map's values) to complete a promise that says you can start retrieving values from your map. mapValues gives you a non-strict view of your map, so the value.get.get is only applied when you retrieve a value. That's right, you get to keep your map!
import concurrent._
import concurrent.duration._
import scala.util._
import ExecutionContext.Implicits.global
object Test extends App {
  def calc(i: Int) = { Thread sleep i * 1000L ; i }
  val m = Map("a" -> future{calc(1)}, "b" -> future{calc(2)}, "c" -> future{calc(3)})
  val m2 = m mapValues (_.value.get.get)
  val k = Future sequence m.values
  val p = Promise[Map[String,Int]]
  k onFailure { case t: Throwable => p failure t }
  k onSuccess { case _ => p success m2 }
  val res = Await.result(p.future, Duration.Inf)
  Console println res
}
Here's the REPL where you see it force the m2 map by printing all its values:
scala> val m2 = m mapValues (_.value.get.get)
m2: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
This shows the same thing with futures that are still in the future:
scala> val m2 = m mapValues (_.value.get.get)
java.util.NoSuchElementException: None.get
Just make a new future which waits for all futures in the map values, then builds a map to return.
I would try to avoid over-engineered Scalaz-based super-functional solutions (unless your project is already heavily Scalaz-based and has tons of "computationally sophisticated" code; no offense meant by the "over-engineered" remark):
// the map you have
val foo: Map[A, Future[B]] = ???
// get a Seq[Future[...]] so that we can run Future.sequence on it
val bar: Seq[Future[(A, B)]] = foo.toSeq.map { case (k, v) => v.map(k -> _) }
// here you go; convert back with toMap once it completes
Future.sequence(bar).map(_.toMap).onComplete { result =>
  // result is a Try[Map[A, B]]; do something with it
}
However, it should be safe to assume that your map values are somehow generated from the map keys, which initially reside in a Seq such as List, and that the part of code that builds the initial Map is under your control as opposed to being sent from elsewhere. So I would personally take an even simpler/cleaner approach instead by not starting out with Map[A, Future[B]] in the first place.
def fetchAgeFromDb(name: String): Future[Int] = ???
// no foo needed anymore
// no Map at all before the future completes
val bar = personNames.map { name => fetchAgeFromDb(name).map(name -> _) }
// just as above
Future.sequence(bar).map(_.toMap).onComplete { result =>
  // result is a Try[Map[String, Int]]; do something with it
}
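Is this solution acceptable? Note that an implicit ExecutionContext is still required, because Future.flatMap, map and Future.sequence all take one:
def removeMapFuture[A, B](in: Future[Map[A, Future[B]]])(implicit ec: ExecutionContext): Future[Map[A, B]] = {
  in.flatMap { k =>
    Future.sequence(k.map(l =>
      l._2.map(l._1 -> _)
    )).map {
      p => p.toMap
    }
  }
}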

Filtering a Scala Multimap and outputting as a list of Tuples

I have a map using the multimap trait, like so
val multiMap = new HashMap[Foo, Set[Bar]] with MultiMap[Foo, Bar]
I would like to combine filtering this map on specific values
multiMap.values.filter(bar => barCondition)
with flattening the matching results into a list of tuples of the form
val fooBarPairs: List[(Foo, Bar)]
What would be the idiomatic way of doing this? I was hoping that Scala might provide something like an anamorphism to do this without looping, but as a complete newbie I am not sure what my options are.
Here's an example:
import collection.mutable.{HashMap, MultiMap, Set}
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
mm.addBinding("One", 1).addBinding("One",11).addBinding("Two",22).
addBinding("Two",222)
// mm.type = Map(Two -> Set(22, 222), One -> Set(1, 11))
I think the easiest way to get what you want is to use a for-expression:
for {
  (str, xs) <- mm.toSeq
  x <- xs
  if x > 10
} yield (str, x) // = ArrayBuffer((Two,222), (Two,22), (One,11))
You need the .toSeq, or the output type will be a Map, which means each mapping is overridden by subsequent elements with the same key. Use toList on this output if you need a List specifically.
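A quick way to see why .toSeq matters, as a sketch assuming Scala 2.13 and an immutable Map[String, Set[Int]] in place of the MultiMap: when the first generator runs over a Map and the yield produces pairs, the result is again a Map, so values sharing a key collapse into one entry.

```scala
val mm = Map("One" -> Set(1, 11), "Two" -> Set(22, 222))

// Generator over the Map itself: the result is a Map, so "Two" keeps only one value
val asMap = for { (str, xs) <- mm; x <- xs if x > 10 } yield (str, x)

// Generator over a Seq of entries: every (key, value) pair survives
val asSeq = for { (str, xs) <- mm.toSeq; x <- xs if x > 10 } yield (str, x)
```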
Here is an example of what I think you want to do:
scala> mm
res21: scala.collection.mutable.HashMap[String,scala.collection.mutable.Set[Int]] with scala.collection.mutable.MultiMap[String,Int]
= Map(two -> Set(6, 4, 5), one -> Set(2, 1, 3))
scala> mm.toList.flatMap(pair =>
pair._2.toList.flatMap(bar =>
if (bar%2==0)
Some((pair._1, bar))
else
None))
res22: List[(String, Int)] = List((two,6), (two,4), (one,2))
Here is another, slightly more concise solution:
import collection.mutable.{HashMap, MultiMap, Set}
val mm = new HashMap[String, Set[Int]] with MultiMap[String, Int]
val f = (i: Int) => i > 10
mm.addBinding("One", 1)
.addBinding("One",11)
.addBinding("Two",22)
.addBinding("Two",222)
/* Map(Two -> Set(22, 222), One -> Set(1, 11)) */
mm.map{case (k, vs) => vs.filter(f).map((k, _))}.flatten
/* ArrayBuffer((Two,222), (Two,22), (One,11)) */
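MultiMap is deprecated as of Scala 2.13. A sketch of the same filtering over a plain immutable Map[String, Set[Int]], assuming i > 10 as the condition:

```scala
val mm: Map[String, Set[Int]] = Map("One" -> Set(1, 11), "Two" -> Set(22, 222))

// Flatten each (key, values) entry into (key, value) pairs,
// keeping only the values that satisfy the condition
val fooBarPairs: List[(String, Int)] =
  mm.toList.flatMap { case (k, vs) => vs.filter(_ > 10).map(k -> _) }
```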

Create a Map of Iterables only using immutable collections

I have an iterable val pairs: Iterable[Pair[Key, Value]] that holds some key => value pairs.
Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).
I can do it using mutable Map and/or using mutable ListBuffers.
However, everyone tells me that the "right" Scala avoids mutable collections. So, is it possible to do this only with immutable collections (for example, using map, foldLeft, etc.)?
I have found out a really simple way to do this
pairs.groupBy{_._1}.mapValues{_.map{_._2}}
And that's it.
Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:
loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation
So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we get:
def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
  @annotation.tailrec def mkMapInner(
    data: Iterator[(K,V)],
    map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
  ): Map[K,Vector[V]] = {
    if (data.hasNext) {
      val (k,v) = data.next()
      mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
    }
    else map
  }
  mkMapInner(data.iterator)
}
Here I've chosen to implement the loop replacement by declaring a recursive inner method (with @annotation.tailrec to check that the recursion is optimized into a while loop, so it won't overflow the stack).
Let's test it out:
val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] =
Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))
Now, it turns out that Scala's collection libraries also contain something useful for this:
scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }
with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.
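On Scala 2.13, groupMap combines the two steps: it groups by one function and maps each element with another, producing a strict Map[K, Iterable[V]] rather than the lazy view that mapValues returns. A sketch with the same sample pairs:

```scala
val pairs = Iterable((1, "flounder"), (2, "salmon"), (1, "halibut"))

// Group by the first element, keep only the second element in each group
val grouped: Map[Int, Iterable[String]] = pairs.groupMap(_._1)(_._2)
```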
For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):
pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
  case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}
Although of course in this case the groupBy approach is more convenient.
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)
ps.groupBy(_._1).mapValues(_ map (_._2))
// = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))
This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut (Scala 2.12 and earlier; it was removed in 2.13):
ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
// = Map(1 -> Vector(2, 5), 3 -> Vector(4))
It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....
More info on breakOut here.
As a method:
def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
  xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
I need this so often that I've written an implicit called groupByKey that does precisely this (it relies on CanBuildFrom, so it needs a Scala version before 2.13):
import scala.collection.TraversableLike
import scala.collection.generic.CanBuildFrom
import scala.language.implicitConversions
class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
  def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
    self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)
And you use it like this:
scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))
Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.
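A small sketch that makes the view behaviour visible, assuming Scala 2.13 (where Map.mapValues is deprecated in favour of map.view.mapValues, which returns a MapView); the calls counter exists only for illustration.

```scala
val m = Map(1 -> 2, 3 -> 4)
var calls = 0

// Lazy view: the function runs again on every lookup
val lazyView = m.view.mapValues { v => calls += 1; v * 10 }
(1 to 3).foreach(_ => lazyView(1))
val callsViaView = calls // three lookups, three calls

// Strict map: the function runs once per entry, when the Map is built
calls = 0
val strict: Map[Int, Int] = m.map { case (k, v) => calls += 1; k -> v * 10 }
(1 to 3).foreach(_ => strict(1))
val callsStrict = calls // two entries, two calls; lookups add none
```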