Scala: best way to merge two mutable maps of mutable sets

What is the best way in scala to merge two mutable maps of mutable sets? The operation must be commutative. The things I've tried seem ugly...
import scala.collection.mutable
var d1 = mutable.Map[String, mutable.SortedSet[String]]()
var d2 = mutable.Map[String, mutable.SortedSet[String]]()
// Adding some elements: accumulating animals with the set of sounds they make.
d1.getOrElseUpdate("dog", mutable.SortedSet[String]("woof"))
d2.getOrElseUpdate("cow", mutable.SortedSet[String]("moo"))
d2.getOrElseUpdate("dog", mutable.SortedSet[String]("woof", "bark"))
Some magic (that is commutative!) should produce:
scala.collection.mutable.Map[String,scala.collection.mutable.SortedSet[String]] =
Map(dog -> TreeSet(bark, woof), cow -> TreeSet(moo))
Basically, I want to override the definition for ++ to merge the sets for matching map keys. Notice that
d1 ++ d2 gives the right answer, while d2 ++ d1 does not (so ++ is not commutative here).

For every key that would be in the resulting Map, you have to merge (++) the value Sets from d1 and d2 for that key.
For mutable.Maps and mutable.Sets, when you are updating one of the Maps, the implementation is really straightforward:
for ((key, values) <- d2)
d1.getOrElseUpdate(key, mutable.SortedSet.empty) ++= values
You can actually create an empty mutable.Map, and use that code to update it with d1 and d2 (and other Maps if needed) in any order.
You can wrap this operation in a function like the following:
val d1 = mutable.Map[String, mutable.SortedSet[String]](
"dog" -> mutable.SortedSet("woof"),
"cow" -> mutable.SortedSet("moo"))
val d2 = mutable.Map[String, mutable.SortedSet[String]](
"dog" -> mutable.SortedSet("woof", "bark"))
def updateMap[A, B: Ordering]( // `Ordering` is a requirement for `SortedSet`
    d1: mutable.Map[A, mutable.SortedSet[B]])(
    // `Iterable`s are enough here, and they also allow passing a `Map[A, Set[B]]`
    d2: Iterable[(A, Iterable[B])]
): Unit =
  for ((key, values) <- d2)
    d1.getOrElseUpdate(key, mutable.SortedSet.empty) ++= values
// You can call
// `updateMap(d1)(d2)` or
// `updateMap(d2)(d1)` to achieve the same result (but in different variables)
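As noted above, the same function lets you merge both inputs into a fresh map without mutating either of them, in which case the order genuinely does not matter. A minimal sketch, assuming the updateMap above:
// Merge both source maps into a fresh, empty map; d1 and d2 stay untouched.
val merged = mutable.Map.empty[String, mutable.SortedSet[String]]
updateMap(merged)(d1)
updateMap(merged)(d2) // same contents as applying d2 first and then d1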
For immutable Maps one possible implementation is:
(
  for (key <- d1.keySet ++ d2.keySet)
    yield key -> (d1.getOrElse(key, Set.empty) ++ d2.getOrElse(key, Set.empty))
).toMap
Other, likely more efficient but slightly more complex, implementations are also possible.
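For instance, here is a hedged sketch of one such alternative for immutable Maps: fold one map into the other and union the value sets on colliding keys (mergeSetMaps is an illustrative name, not from the question). Because set union is commutative, the result is the same whichever map you fold into which.
// Fold d2 into d1, unioning the value sets whenever a key already exists.
def mergeSetMaps[A, B](d1: Map[A, Set[B]], d2: Map[A, Set[B]]): Map[A, Set[B]] =
  d2.foldLeft(d1) { case (acc, (key, values)) =>
    acc.updated(key, acc.getOrElse(key, Set.empty[B]) ++ values)
  }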

Related

Scala efficient set inclusion detection

Consider a collection of tuples where the first item is a set, for instance:
val xs = Seq(
((1 to 5).toSet ++ Set(9), "apple"),
((15 to 17).toSet, "pear"),
((21 to 30).toSet, "grape"))
Given a value x: Int, how can I efficiently find the second item of the tuple whose set contains x? (The real use case involves thousands of sets.)
For val x = 22 the result would be Some("grape"), for val x = 19 the result would be None.
Note: values in each set are not necessarily consecutive.
Note: the sets do not overlap (the intersection of any two sets is empty).
It depends on your use case, but since you're concerned with efficiency, I assume you're going to do a lot of lookups.
I also assume you have a single xs and look things up in it many times.
Preprocess xs into a Map[Int, String]:
val xsMap = (xs flatMap { case (s, v) => s.map((_,v))}).toMap[Int, String]
Then it's trivial (and O(1)) to look up elements
xsMap.get(22) //> res0: Option[String] = Some(grape)
xsMap.get(19) //> res1: Option[String] = None
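If the collection is built once and queried many times, the same idea can be packaged as a small helper; a hedged sketch (makeLookup is an illustrative name, not from the question):
// Build the index once; every subsequent query is an O(1) map lookup.
def makeLookup(xs: Seq[(Set[Int], String)]): Int => Option[String] = {
  val index: Map[Int, String] = xs.flatMap { case (s, v) => s.map(_ -> v) }.toMap
  index.get _
}
val lookup = makeLookup(xs)
lookup(22) // Some(grape)
lookup(19) // None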
What about:
xs.find(_._1.contains(22)).map(_._2) // Some(grape)
This does a linear scan per lookup, so it is simpler but slower than the precomputed map when there are many queries.

Converting Function1 to Map in Scala

I am trying to compose two maps and get a composed map in Scala. I am doing the following:
val m1=Map(1->'A',2->'C',3->'Z')
val m2=Map('A' -> "Andy", 'Z'->"Zee",'D'->"David")
val m3=m2.compose(m1)
Now m3 is the composition of m1 and m2 (first m1, then m2), but its type is Function1.
Question:
How can I convert Function1 to Map?
ps.
Here, instead of m2.compose(m1), it is suggested to use val m3 = m1 andThen m2. Either way, the result is of type Function1.
What do you expect for non-existent keys? If you want to remove those entries from the map, use:
val m1: Map[Int, Char] = Map(1 -> 'A', 2 -> 'C', 3 -> 'Z')
val m2: Map[Char, String] = Map('A' -> "Andy", 'Z' -> "Zee", 'D' -> "David")
val m3: Map[Int, String] = m1.filter(x => m2.contains(x._2)).map(x => x._1 -> m2(x._2))
//Map(1 -> Andy, 3 -> Zee)
Skipping .filter(x => m2.contains(x._2)) will throw an exception in those situations (here, for key 2, whose value 'C' has no mapping in m2).
You can also do
val m3: Map[Int, String] = m1.mapValues(m2)
There is a caveat that despite the return type this will recalculate m2(m1(x)) on every access to m3(x), just like m2.compose(m1).
Since I updated the title of the question, the answer no longer directly addresses its subject. I will wait to see if somebody provides a direct answer (that is, how to convert a Function1 to a Map).
def funToMap[A, B](f: A => B): Map[A, B] = Map.empty[A, B].withDefault(f)
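A brief usage sketch with the m1 and m2 from the question (hedged; note the caveat in the comments):
// The resulting Map answers apply() via the composed function, but its key set
// is empty, so get, size, contains and iteration see no entries at all.
val composed: Map[Int, String] = funToMap(m1 andThen m2)
composed(1)      // "Andy"
composed.get(1)  // None -- get does not consult the default
composed.isEmpty // true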
Why composing two Maps gives you a Function1
A Map offers the semantics of a Function1 and then some. When you compose two Map functions, you get a new Function1 because compose is an operation on functions.
So you should not expect to get a Map by composing two Maps -- you are just composing the function aspects of the Maps, not the Maps themselves.
How to get a composed Map
To actually compose two Maps into another Map, you can use mapValues, but be careful. If you do it like this:
val m3 = m1 mapValues m2
you will silently get a view (see this bug) rather than a simple map, so as you do lookups in m3 it will really do lookups in m1 and then in m2. For large Maps and few lookups, that's a win because you didn't process a lot of entries you didn't need to. But if you do lots of lookups, this can cause a serious performance problem.
To really make a new single Map data structure that represents the composition of m1 and m2 you need to do something like
val m3 = (m1 mapValues m2).view.force
That is kind of unfortunate. In a perfect world m1 mapValues m2 would give you the new single composed Map -- and very quickly, by creating a new data structure with shape identical to m1 but with values run through m2 -- and (m1 mapValues m2).view would give you the view (and the type would reflect that).
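To make that re-evaluation visible, here is a small hedged sketch; loggingM2 is a wrapper invented for this example, not part of the question:
// Wrap m2 in a function that logs, so the view's recomputation can be observed.
val loggingM2: Char => String = c => { println("looking up " + c); m2(c) }
val lazyM3 = m1 mapValues loggingM2
lazyM3.get(1) // prints "looking up A" and returns Some(Andy)
lazyM3.get(1) // prints again: the composition is recomputed on every access
// Note: lazyM3.get(2) would throw, since m1(2) == 'C' has no mapping in m2.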

Scala: Map keys with wildcards?

Is it possible to use keys with wildcards for Scala Maps? For example tuples of the form (x,_)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there were a way to retrieve all (composite_key, value) pairs where only some part of the composite key is defined. Are there other ways to get the same functionality in Scala?
How about using collect:
Map(1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k @ (1, _), v) => k -> v }
Or a for-comprehension like this:
for (a @ ((k1, k2), v) <- m if k1 == "x") yield a
In general, you can do something like
m.filter(_._1._1 == "x")
but in your particular example it will return only one result, because only one key in m has "x" as its first component. When several composite keys share that component, it is more useful:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)
Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in any way? No, they don't, so there is no efficient way to implement wildcards with this map. The other answers provide solutions, but they will iterate over every key/value pair in the entire Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, but instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is < than all other strings
// "x\u0000" is the next > string after "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2
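If you prefer a single call, SortedMap also provides range, which takes an inclusive lower bound and an exclusive upper bound, so the same "x\u0000" trick applies; a hedged sketch:
// Equivalent to the from/to pair above, expressed as one range query.
val submap2 = m.range(("x", ""), ("x\u0000", ""))
submap2.values.foreach(println) // prints foo1, foo2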

How to dynamically generate parallel futures with for-yield

I have below code:
val f1 = Future(genA1)
val f2 = Future(genA2)
val f3 = Future(genA3)
val f4 = Future(genA4)
val results: Future[Seq[A]] = for {
a1 <- f1
a2 <- f2
a3 <- f3
a4 <- f4
} yield Seq(a1, a2, a3, a4)
Now I have a requirement to optionally exclude a2; how should I modify the code? (Using map or flatMap is also acceptable.)
Furthermore, say I have M possible futures to be aggregated as above, and N of those M could be optionally excluded based on some flag (business logic); how should I handle that?
thanks in advance!
Leon
In question 1, I understand that you want to exclude one entry (e.g. a2) from the sequence given some logic, and in question 2 you want to suppress N entries out of a total of M and have the future computed over the remaining results. We can generalize both cases to something like this:
// Using a Map as a simple example; 'generators' could instead hold functions that create the required computations.
val generators = Map('a' -> genA1, 'b' -> genA2, 'c' -> genA3, 'd' -> genA4)
...
// shouldAccept(k) => business logic deciding which computations should be executed.
val selectedGenerators = generators.filter { case (k, v) => shouldAccept(k) }
// Create the Futures from the selected computations.
val futures = selectedGenerators.map { case (k, v) => Future(v) }
// Create a single Future holding the results of all selected entries.
val result = Future.sequence(futures)
In general, what I think you are looking for is Future.sequence, which takes a Seq[Future[_]] and produces a Future[Seq[_]], which is basically what you are doing "by hand" with the for-comprehension.
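For the narrower first question (optionally excluding just a2), a hedged sketch along the same lines, assuming the question's genA1..genA4 and an implicit ExecutionContext are in scope; includeA2 is a Boolean flag invented here to stand for the business logic:
// Build the list of Futures conditionally, then sequence it.
val futures: Seq[Future[A]] = Seq(
  Some(Future(genA1)),
  if (includeA2) Some(Future(genA2)) else None,
  Some(Future(genA3)),
  Some(Future(genA4))
).flatten
val results: Future[Seq[A]] = Future.sequence(futures)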

Scala: how to make a Hash(Trie)Map from a Map (via Anorm in Play)

Having read this quote on HashTrieMaps on docs.scala-lang.org:
For instance, to find a given key in a map, one first takes the hash code of the key. Then, the lowest 5 bits of the hash code are used to select the first subtree, followed by the next 5 bits and so on. The selection stops once all elements stored in a node have hash codes that differ from each other in the bits that are selected up to this level.
I figured that would be a great (read: fast!) collection to store my Map[String, Long] in.
In my Play Framework application (using Scala) I have this piece of code, using Anorm, that loads around 18k elements. It takes a few seconds to load (no big deal, but any tips?). I'd like to have it in memory for fast string-to-long lookups.
val data = DB.withConnection { implicit c ⇒
  SQL( "SELECT stringType, longType FROM table ORDER BY stringType;" )
    .as( get[String]( "stringType" ) ~ get[Long]( "longType" )
      map { case ( s ~ l ) ⇒ s -> l } * ).toMap.withDefaultValue( -1L )
}
This code gives data the type scala.collection.immutable.Map$WithDefault. I'd like it to be a HashTrieMap (or a HashMap; as I understand the linked quote, all Scala immutable HashMaps are HashTrieMaps?). Strangely enough, I found no way to convert it to a HashTrieMap. (I'm new to Scala, Play and Anorm.)
// Test for the repl (2.9.1.final). Map[String, Long]:
val data = Map( "Hello" -> 1L, "world" -> 2L ).withDefaultValue ( -1L )
data: scala.collection.immutable.Map[java.lang.String,Long] =
Map(Hello -> 1, world -> 2)
// Google showed me this, but it is still a Map[String, Long].
val hm = scala.collection.immutable.HashMap( data.toArray: _* ).withDefaultValue( -1L )
// This generates an error.
val htm = scala.collection.immutable.HashTrieMap( data.toArray: _* ).withDefaultValue( -1L )
So my question is how to convert the MapWithDefault to HashTrieMap (or HashMap if that shares the implementation of HashTrieMap)?
Any feedback welcome.
As the documentation that you pointed to explains, immutable maps already are implemented under the hood as HashTrieMaps. You can easily verify this in the REPL:
scala> println( Map(1->"one", 2->"two", 3->"three", 4->"four", 5->"five").getClass )
class scala.collection.immutable.HashMap$HashTrieMap
So you have nothing special to do; your code is already using HashMap.HashTrieMap without you even realizing it.
More precisely, the default implementation of immutable.Map is immutable.HashMap, which is further refined (extended) by immutable.HashMap.HashTrieMap.
Note though that small immutable maps are not instances of immutable.HashMap.HashTrieMap, but are implemented as special cases (this is an optimization). There is a certain size threshold beyond which they start being implemented as immutable.HashMap.HashTrieMap.
As an example, entering the following in the REPL:
import scala.collection.immutable.HashMap
val m0 = HashMap[Int,String]()
val m1 = m0 + (1 -> "one")
val m2 = m1 + (2 -> "two")
val m3 = m2 + (3 -> "three")
println(s"m0: ${m0.getClass.getSimpleName}, m1: ${m1.getClass.getSimpleName}, m2: ${m2.getClass.getSimpleName}, m3: ${m3.getClass.getSimpleName}")
will print this:
m0: EmptyHashMap$, m1: HashMap1, m2: HashTrieMap, m3: HashTrieMap
So here the empty map is an instance of EmptyHashMap$. Adding an element to that gives a HashMap1, and adding yet another element finally gives a HashTrieMap.
Finally, the use of withDefaultValue does not change anything, as withDefaultValue just returns an instance of Map.WithDefault which wraps the initial map (and that map is still a HashMap.HashTrieMap).
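A quick REPL check of that last point (hedged; on a pre-2.13 Scala, as in the question) shows the wrapper's class alongside the map it wraps:
// The class names match what the question itself observed.
val plain = Map("a" -> 1L, "b" -> 2L, "c" -> 3L, "d" -> 4L, "e" -> 5L)
println(plain.getClass)                       // class scala.collection.immutable.HashMap$HashTrieMap
println(plain.withDefaultValue(-1L).getClass) // class scala.collection.immutable.Map$WithDefault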