I am trying to compose two maps and get a composed map in Scala. I am doing the following:
val m1=Map(1->'A',2->'C',3->'Z')
val m2=Map('A' -> "Andy", 'Z'->"Zee",'D'->"David")
val m3=m2.compose(m1)
Now m3 is m1;m2, but its type is Function1.
Question:
How can I convert Function1 to Map?
ps.
Here, instead of m2.compose(m1), it is suggested to use val m3 = m1 andThen m2. Anyways, the result is of type Function1.
What do you expect for non-existing keys? If you want to remove these values from the map, use
val m1: Map[Int, Char] = Map(1 -> 'A', 2 -> 'C', 3 -> 'Z')
val m2: Map[Char, String] = Map('A' -> "Andy", 'Z' -> "Zee", 'D' -> "David")
val m3: Map[Int, String] = m1.filter(x => m2.contains(x._2)).map(x => x._1 -> m2(x._2))
//Map(1 -> Andy, 3 -> Zee)
Skipping .filter(x => m2.contains(x._2)) will throw an exception in these situations (here a NoSuchElementException for the missing key 'C').
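The filter and the map can also be folded into a single collect call. The following is only a sketch that reuses m1 and m2 from above; the names composed and unguarded are illustrative and not part of the original answer.

```scala
val m1: Map[Int, Char] = Map(1 -> 'A', 2 -> 'C', 3 -> 'Z')
val m2: Map[Char, String] = Map('A' -> "Andy", 'Z' -> "Zee", 'D' -> "David")

// Keep only the entries of m1 whose value is a key of m2, then look that value up.
val composed: Map[Int, String] =
  m1.collect { case (k, v) if m2.contains(v) => k -> m2(v) }
// composed == Map(1 -> "Andy", 3 -> "Zee")

// Without the guard, the lookup for 'C' fails; Try captures the exception.
val unguarded = scala.util.Try(m1.map { case (k, v) => k -> m2(v) })
```

Here unguarded is a Failure, because map is strict and hits the missing key while building the result.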
You can also do
val m3: Map[Int, String] = m1.mapValues(m2)
There is a caveat that despite the return type this will recalculate m2(m1(x)) on every access to m3(x), just like m2.compose(m1).
Since I updated the title of the question, the answer no longer directly addresses the question subject. I will wait to see if somebody provides a direct answer (that is, how to convert a Function1 to a Map).
def funToMap[A, B](f: A => B): Map[A, B] = Map.empty[A, B].withDefault(f)
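One thing to keep in mind with the withDefault approach is that only apply uses the default; get, contains and iteration still see an empty map. The sketch below illustrates this and also shows a hypothetical helper, tabulate (not from the original answer), that builds a real Map when the set of keys is known and finite.

```scala
def funToMap[A, B](f: A => B): Map[A, B] = Map.empty[A, B].withDefault(f)

val doubled: Map[Int, Int] = funToMap((x: Int) => x * 2)

val viaApply = doubled(21)         // 42: apply falls back to the function
val viaGet   = doubled.get(21)     // None: get ignores the default
val hasKey   = doubled.contains(21) // false
val entries  = doubled.size        // 0: there is nothing to iterate over

// Hypothetical helper: evaluate f once for every key in a known, finite domain.
def tabulate[A, B](keys: Iterable[A])(f: A => B): Map[A, B] =
  keys.map(k => k -> f(k)).toMap

val table = tabulate(List(1, 2, 3))(x => x * 2)
```

Which of the two is appropriate depends on whether the domain of the function can be enumerated at all.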
Why composing two Maps gives you a Function1
A Map offers the semantics of a Function1 and then some. When you compose two Map functions, you get a new Function1 because compose is an operation on functions.
So you should not expect to get a Map by composing two Maps -- you are just composing the function aspects of the Maps, not the Maps themselves.
How to get a composed Map
To actually compose two Maps into another Map, you can use mapValues, but be careful. If you do it like this:
val m3 = m1 mapValues m2
you will silently get a view (see this bug) rather than a simple map, so as you do lookups in m3 it will really do lookups in m1 and then in m2. For large Maps and few lookups, that's a win because you didn't process a lot of entries you didn't need to. But if you do lots of lookups, this can cause a serious performance problem.
To really make a new single Map data structure that represents the composition of m1 and m2 you need to do something like
val m3 = (m1 mapValues m2).view.force
That is kind of unfortunate. In a perfect world m1 mapValues m2 would give you the new single composed Map -- and very quickly, by creating a new data structure with shape identical to m1 but with values run through m2 -- and (m1 mapValues m2).view would give you the view (and the type would reflect that).
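The repeated lookups can be made visible with a counter. This is a small sketch, not taken from the original answer: the names lookups and counted are illustrative, and the maps are cut down to keys that exist in both. On Scala 2.13 mapValues is deprecated and returns a MapView, but it is lazy in the same way.

```scala
val m1 = Map(1 -> 'A', 3 -> 'Z')
val m2 = Map('A' -> "Andy", 'Z' -> "Zee")

var lookups = 0 // counts how often the second map is consulted
val counted: Char => String = c => { lookups += 1; m2(c) }

val lazyM3 = m1.mapValues(counted) // nothing is evaluated at this point
val afterCreation = lookups        // 0

val firstRead  = lazyM3(1)
val secondRead = lazyM3(1)         // same key, the function runs again
val afterTwoReads = lookups        // 2
```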
Related
I have the following mutable Hashmap in Scala:
HashMap((b,3), (c,4), (a,8), (a,2))
and need to be converted to the following:
HashMap((b,3), (c,4), (a,10))
I need something like reduceByKey function logic.
I added the code here
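import scala.collection.mutable

def main(args: Array[String]): Unit = {
  // Each value is a (name, count) pair; the outer keys are arbitrary ids.
  val m = new mutable.HashMap[String, (String, Int)]()
  println("Hello, world")
  m += ("xx" -> ("a", 2))
  m += ("uu" -> ("b", 3))
  m += ("zz" -> ("a", 8))
  m += ("yy" -> ("c", 4))
  println(m.values)
}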
For pre 2.13 Scala versions you can try using groupBy with map:
m.values
.groupBy(_._1)
.mapValues(_.map(_._2).sum)
It sounds like what you have is not a hashmap but m.values of type Iterable[Tuple2[String, Int]], which is more manageable. In that case, as hinted at in the comments, groupMapReduce does it all in one function. This function groups "matching" elements together, applies a transformation to each element, and then reduces the groups using a binary operation.
m.values.groupMapReduce(_._1)(_._2)(_ + _)
This says "Group the values by the first element of their tuple, then keep the second element (i.e. the number), and then add all of the numbers in each group". This produces a map from the first element of the tuple to the sum.
Map(a -> 10, b -> 3, c -> 4)
Note that this is a Map, not necessarily a HashMap. If you want a HashMap (i.e. for mutability), you'll need to convert it yourself.
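If a mutable HashMap is wanted directly, or groupMapReduce is not available, the same logic can be written by hand. The helper below is a hypothetical sketch (the name reduceByKey is borrowed from the question, not from the standard library) that accumulates into a mutable.HashMap.

```scala
import scala.collection.mutable

// Hypothetical helper: combine the values of equal keys with `op`.
def reduceByKey[K, V](pairs: Iterable[(K, V)])(op: (V, V) => V): mutable.HashMap[K, V] = {
  val acc = mutable.HashMap.empty[K, V]
  for ((k, v) <- pairs) {
    // The first occurrence stores v; later ones are combined with the stored value.
    acc(k) = acc.get(k).map(op(_, v)).getOrElse(v)
  }
  acc
}

val pairs = List(("a", 2), ("b", 3), ("a", 8), ("c", 4))
val summed = reduceByKey(pairs)(_ + _)
// summed contains a -> 10, b -> 3, c -> 4
```

With the question's data, reduceByKey(m.values)(_ + _) would be the corresponding call.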
I recently discovered breakOut and love how elegant it is, but noticed that it doesn't maintain order.
eg (from REPL):
scala> val list = List("apples", "bananas", "oranges")
list: List[String] = List(apples, bananas, oranges)
scala> val hs: HashMap[String, Int] = list.map{x => (x -> x.length)}(breakOut)
hs: scala.collection.mutable.HashMap[String,Int] = Map(bananas -> 7, oranges -> 7, apples -> 6)
I like using breakOut since it's really clean and neat but ordering does matter to me. Is there a way to get it to maintain order or do I have to add elements to my hashmap one at a time?
You see this behavior because HashMap is a data structure with undefined iteration order. Even if you see some ordering of the elements in the hash map and it is consistent across runs, you shouldn't depend on it. If you really need the order, consider using LinkedHashMap, which iterates in insertion order.
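A minimal sketch of the LinkedHashMap variant, using the list from the question. It passes the pairs as varargs instead of using breakOut, since breakOut was removed in Scala 2.13; the names ordered and keysInOrder are illustrative.

```scala
import scala.collection.mutable

val list = List("apples", "bananas", "oranges")

// LinkedHashMap iterates in insertion order, so the list order is kept.
val ordered: mutable.LinkedHashMap[String, Int] =
  mutable.LinkedHashMap(list.map(x => x -> x.length): _*)

val keysInOrder = ordered.keys.toList // List(apples, bananas, oranges)
```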
What is the best way in scala to merge two mutable maps of mutable sets? The operation must be commutative. The things I've tried seem ugly...
import scala.collection.mutable
var d1 = mutable.Map[String, mutable.SortedSet[String]]()
var d2 = mutable.Map[String, mutable.SortedSet[String]]()
// Adding some elements: accumulating animals with the set of sounds they make.
d1.getOrElseUpdate("dog", mutable.SortedSet[String]("woof"))
d2.getOrElseUpdate("cow", mutable.SortedSet[String]("moo"))
d2.getOrElseUpdate("dog", mutable.SortedSet[String]("woof", "bark"))
magic (that is commutative!)
scala.collection.mutable.Map[String,scala.collection.mutable.SortedSet[String]] =
Map(dog -> TreeSet(bark, woof), cow -> TreeSet(moo))
Basically, I want to override the definition for ++ to merge the sets for matching map keys. Notice that
d1 ++ d2 gives the right answer, while d2 ++ d1 does not (so ++ is not commutative here).
For every key that would be in the resulting Map, you have to merge (++) the value Sets from d1 and d2 for that key.
For mutable.Maps and mutable.Sets, when you are updating one of the Maps, the implementation is really straightforward:
for ((key, values) <- d2)
  d1.getOrElseUpdate(key, mutable.SortedSet.empty) ++= values
You can actually create an empty mutable.Map, and use that code to update it with d1 and d2 (and other Maps if needed) in any order.
You can wrap this operation in the following function:
val d1 = mutable.Map[String, mutable.SortedSet[String]](
"dog" -> mutable.SortedSet("woof"),
"cow" -> mutable.SortedSet("moo"))
val d2 = mutable.Map[String, mutable.SortedSet[String]](
"dog" -> mutable.SortedSet("woof", "bark"))
def updateMap[A, B : Ordering]( // `Ordering` is a requirement for `SortedSet`
    d1: mutable.Map[A, mutable.SortedSet[B]])(
    // `Iterable`s are enough here, but allow to pass a `Map[A, Set[B]]`
    d2: Iterable[(A, Iterable[B])]
): Unit =
  for ((key, values) <- d2)
    d1.getOrElseUpdate(key, mutable.SortedSet.empty) ++= values
// You can call
// `updateMap(d1)(d2)` or
// `updateMap(d2)(d1)` to achieve the same result (but in different variables)
For immutable Maps one possible implementation is:
(
for (key <- d1.keySet ++ d2.keySet)
yield key -> (d1.getOrElse(key, Set.empty) ++ d2.getOrElse(key, Set.empty))
).toMap
Other implementations, likely more efficient but probably slightly more complex, are also possible.
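Wrapped into a function, the immutable version could look like the sketch below. The name mergeSets and the sample data are illustrative; the element type of the sets is spelled out explicitly so that type inference has nothing to guess.

```scala
// Hypothetical helper: union the value sets of both maps, key by key.
def mergeSets[A, B](a: Map[A, Set[B]], b: Map[A, Set[B]]): Map[A, Set[B]] =
  (a.keySet ++ b.keySet).iterator.map { key =>
    key -> (a.getOrElse(key, Set.empty[B]) ++ b.getOrElse(key, Set.empty[B]))
  }.toMap

val sounds1 = Map("dog" -> Set("woof"))
val sounds2 = Map("cow" -> Set("moo"), "dog" -> Set("woof", "bark"))

val merged   = mergeSets(sounds1, sounds2)
val reversed = mergeSets(sounds2, sounds1)
// Both are Map(dog -> Set(woof, bark), cow -> Set(moo)), so the merge is commutative.
```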
Just now I am surprised to learn that mapValues produces a view. The consequence is shown in the following example:
case class thing(id: Int)
val rand = new java.util.Random
val distribution = Map(thing(0) -> 0.5, thing(1) -> 0.5)
val perturbed = distribution mapValues { _ + 0.1 * rand.nextGaussian }
val sumProbs = perturbed.map{_._2}.sum
val newDistribution = perturbed mapValues { _ / sumProbs }
The idea is that I have a distribution, which is perturbed with some randomness then I renormalize it. The code actually fails in its original intention: since mapValues produces a view, _ + 0.1 * rand.nextGaussian is always re-evaluated whenever perturbed is used.
I am now doing something like distribution map { case (s, p) => (s, p + 0.1 * rand.nextGaussian) }, but that's just a little bit verbose. So the purpose of this question is:
1. Remind people who are unaware of this fact.
2. Look for reasons why they made mapValues output views.
3. Ask whether there is an alternative method that produces a concrete Map.
4. Ask whether any other commonly used collection methods have this trap.
Thanks.
There's a ticket about this, SI-4776 (by YT).
The commit that introduces it has this to say:
Following a suggestion of jrudolph, made filterKeys and mapValues
transform abstract maps, and duplicated functionality for immutable
maps. Moved transform and filterNot from immutable to general maps.
Review by phaller.
I have not been able to find the original suggestion by jrudolph, but I assume it was done to make mapValues more efficient. Given the question, that may come as a surprise, but mapValues is more efficient if you are not likely to iterate over the values more than once.
As a work-around, one can do mapValues(...).view.force to produce a new Map.
The Scaladoc says:
a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements.
So this should be expected, but it scares me a lot; I'll have to review a bunch of code tomorrow. I wasn't expecting behavior like that :-(
Just another workaround:
You can call toSeq to get a copy and, if you need a map again, call toMap on it, but this creates objects unnecessarily and costs performance compared with using map.
One can relatively easily write a mapValues that doesn't create a view. I'll do it tomorrow and post the code here if nobody does it before me ;)
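Such a strict variant might look like the following sketch. The name mapValuesStrict is hypothetical, and the example distribution uses String keys instead of the question's case class to keep it short.

```scala
// Hypothetical helper: like mapValues, but builds a new Map eagerly.
def mapValuesStrict[K, V, W](m: Map[K, V])(f: V => W): Map[K, W] =
  m.map { case (k, v) => k -> f(v) }

val rand = new java.util.Random
val distribution = Map("x" -> 0.5, "y" -> 0.5)
val perturbed = mapValuesStrict(distribution)(_ + 0.1 * rand.nextGaussian())

// Each value was computed exactly once, so repeated reads agree.
val stable = perturbed("x") == perturbed("x")
```

Because map is strict on a Map, the random perturbation is applied once when perturbed is built, which is what the renormalization in the question relies on.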
EDIT:
I found an easy way to 'force' the view, use '.map(identity)' after mapValues (so no need of implementing a specific function):
scala> val xs = Map("a" -> 1, "b" -> 2)
xs: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1, b -> 2)
scala> val ys = xs.mapValues(_ + Random.nextInt).map(identity)
ys: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
scala> ys
res7: scala.collection.immutable.Map[java.lang.String,Int] = Map(a -> 1315230132, b -> 1614948101)
It's a shame the type returned isn't actually a view! Otherwise one would have been able to call 'force' ...
This is better in Scala 2.13: mapValues is deprecated there and now returns a MapView:
API Doc
Having read this quote on HashTrieMaps on docs.scala-lang.org:
For instance, to find a given key in a map, one first takes the hash code of the key. Then, the lowest 5 bits of the hash code are used to select the first subtree, followed by the next 5 bits and so on. The selection stops once all elements stored in a node have hash codes that differ from each other in the bits that are selected up to this level.
I figured that would be a great (read: fast!) collection to store my Map[String, Long] in.
In my Play Framework (using Scala) I have this piece of code using Anorm that loads in around 18k of elements. It takes a few seconds to load (no big deal, but any tips?). I'd like to have it 'in memory' for fast look ups for string to long translation.
val data = DB.withConnection { implicit c ⇒
SQL( "SELECT stringType, longType FROM table ORDER BY stringType;" )
.as( get[String]( "stringType" ) ~ get[Long]( "longType " )
map { case ( s ~ l ) ⇒ s -> l }* ).toMap.withDefaultValue( -1L )
}
This code makes data an instance of scala.collection.immutable.Map$WithDefault. I'd like it to be a HashTrieMap (or a HashMap; as I understand the linked quote, all Scala HashMaps are HashTrieMaps?). Oddly enough, I found no way to convert it to a HashTrieMap. (I'm new to Scala, Play and Anorm.)
// Test for the repl (2.9.1.final). Map[String, Long]:
val data = Map( "Hello" -> 1L, "world" -> 2L ).withDefaultValue ( -1L )
data: scala.collection.immutable.Map[java.lang.String,Long] =
Map(Hello -> 1, world -> 2)
// Google showed me this, but it is still a Map[String, Long].
val hm = scala.collection.immutable.HashMap( data.toArray: _* ).withDefaultValue( -1L )
// This generates an error.
val htm = scala.collection.immutable.HashTrieMap( data.toArray: _* ).withDefaultValue( -1L )
So my question is how to convert the MapWithDefault to HashTrieMap (or HashMap if that shares the implementation of HashTrieMap)?
Any feedback welcome.
As the documentation that you pointed to explains, immutable maps already are implemented under the hood as HashTrieMaps. You can easily verify this in the REPL:
scala> println( Map(1->"one", 2->"two", 3->"three", 4->"four", 5->"five").getClass )
class scala.collection.immutable.HashMap$HashTrieMap
So you have nothing special to do, your code already is using HashMap.HashTrieMap without you even realizing.
More precisely, the default implementation of immutable.Map is immutable.HashMap, which is further refined (extended) by immutable.HashMap.HashTrieMap.
Note though that small immutable maps are not instances of immutable.HashMap.HashTrieMap, but are implemented as special cases (this is an optimization). There is a certain size threshold above which they start being implemented as immutable.HashMap.HashTrieMap.
As an example, entering the following in the REPL:
val m0 = HashMap[Int,String]()
val m1 = m0 + (1 -> "one")
val m2 = m1 + (2 -> "two")
val m3 = m2 + (3 -> "three")
println(s"m0: ${m0.getClass.getSimpleName}, m1: ${m1.getClass.getSimpleName}, m2: ${m2.getClass.getSimpleName}, m3: ${m3.getClass.getSimpleName}")
will print this:
m0: EmptyHashMap$, m1: HashMap1, m2: HashTrieMap, m3: HashTrieMap
So here the empty map is an instance of EmptyHashMap$. Adding an element to that gives a HashMap1, and adding yet another element finally gives a HashTrieMap.
Finally, the use of withDefaultValue does not change anything, as withDefaultValue will just return an instance of Map.WithDefault which wraps the initial map (which will still be a HashMap.HashTrieMap).
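The same kind of check can be done for the default Map factory and for the wrapper. This is only a sketch: the exact class names depend on the Scala version (for example, HashTrieMap no longer exists in 2.13), so the comments describe what to expect rather than quote exact output, and all value names are illustrative.

```scala
val small = Map(1 -> "one")
val large = Map(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four", 5 -> "five")
val wrapped = large.withDefaultValue("?")

val smallClass   = small.getClass.getName   // a specialised small-map class (Map$Map1)
val largeClass   = large.getClass.getName   // a hash-based class, name varies by version
val wrappedClass = wrapped.getClass.getName // the WithDefault wrapper around `large`

val missingViaApply = wrapped(42)     // "?": apply uses the default
val missingViaGet   = wrapped.get(42) // None: get does not
```

The wrapper only changes what apply returns for missing keys; lookups of existing keys still go to the underlying hash-based map.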