I have this initial kind of maps:
m: Map[(String, String, String), Double]
and I would like to merge them in a way to get a final Map with the following type:
mm: Map[(String, String, String), Seq[Double]]
So for example:
val m1 = Map (("a","b","c") -> 2.0, ("a","b","d") -> 3.0)
val m2 = Map (("a","b","c") -> 5.0, ("a","b","k") -> 3.0)
// after the merge
Map (("a","b","c") -> Seq(2.0, 5.0), ("a","b","d") -> Seq(3.0), ("a","b","k") -> Seq(3.0))
How can I get that with Scala?
You can do:
(m1.toSeq ++ m2.toSeq)
.groupBy { case (k, v) => k }
.mapValues(_.map { case (k, v) => v })
If you have already imported scalaz then you can do:
m1.mapValues(_.point[List]) |+| m2.mapValues(_.point[List])
You can convert Maps to Seq and then group the Seq by the key:
(m1.toSeq ++ m2.toSeq).groupBy(_._1).mapValues(_.map(_._2))
// res80: scala.collection.immutable.Map[(String, String, String),Seq[Double]] = Map((a,b,k) -> ArrayBuffer(3.0), (a,b,c) -> ArrayBuffer(2.0, 5.0), (a,b,d) -> ArrayBuffer(3.0))
Related
Trying to reverse Map and the output is only 2 element
val occurrences: Map[String, Int] = arr.groupMapReduce(identity)(_ => 1)(_ + _)
Output: HashMap(world -> 2, Hello, -> 1, hello, -> 1, hello -> 2, and -> 1, world, -> 1)
val reversed = for ((k,v) <- occurrences) yield (v, k)
Output: HashMap(1 -> world,, 2 -> hello)
How did I lost the other patameters?
Similar to #user proposal, but trying to be a little bit more efficient.
def invertMap[K, V](map: Map[K, V]): Map[V, List[K]] =
map
.view
.groupMap(_._2)(_._1)
.view
.mapValues(_.toList)
.toMap
The performance difference would probably be negligible so go with the one you find more readable.
As #Luis Miguel Mejía Suárez said, you can't duplicate keys in a Map, so when you try to make the values the keys, some of the entries are lost.
You can instead do this to obtain a Map[Int, List[String]]
val occurrences = Map("world" -> 2, "Hello," -> 1, "hello," -> 1, "hello" -> 2, "and" -> 1, "world," -> 1)
val x: Map[Int, List[String]] =
occurrences.toList
.groupBy { case (k, v) => v }
.view.mapValues(v => v.map(_._1))
.toMap
Output:
Map(1 -> List(Hello,, hello,, and, world,), 2 -> List(world, hello))
P.S. The .view and .toMap stuff is because mapValues on MapOps is deprecated for now. There'll be a proper strict version later, though.
I was thinking of a way to create a tuple consisting of the String key from the map along with each of the Strings from the Set together as tuple that form the key in a new map. Value for the new map will be initialized to 0.0.
Ex: If I have to following:
Map[ USA, Set[CA, NY, WA]]
I want to create a new map from this which looks like:
Map[(USA,CA) -> 0.0, (USA,NY) -> 0.0, (USA,WA) -> 0.0]
I am able to create a Map[String, String] but I was hoping to get some help in creating the tuple key.
Map("USA" -> Set("CA", "NY", "WA")) flatMap { case (k, set) => set.map((k, _) -> 0.0) }
val myMap = Map("USA" -> Set("CA", "NY", "WA"))
val newMap = myMap.foldLeft(Map[(String, String), Double]()) {
case (acc, (key, values)) => {
acc ++ (for {
value <- values
} yield (key, value) -> 0.0)
}
}
Trying to convert a Map[Long, Set[Long]] to a Map[Long, Long].
I tried this but having compile issues:
m.map(_.swap).map(k => k._1.map((_, k._2)))
Example:
Map(10 -> Set(1,2,3), 11 -> Set(4,5))
Should become:
Map(1 -> 10,
2 -> 10,
3 -> 10,
4 -> 11,
5 -> 11)
flatMap on Map[A,B] will "just work" with collections of tuples:
m.flatMap {case (k,v) => v.map(_ -> k)} // Map[Long,Long]
going from a Map[Long,Set[Long]] to a series of Set[(Long,Long)] that gets flattened to a Map[Long,Long].
Assuming in is your Map[Long, Set[Long]]:
in./:(Map.empty[Long, Long]) { case (acc, (key, values)) => acc ++ values.map(_ -> key) }
To clarify, seem like you have this:
Map(10 -> Set(1,2,3), 11 -> Set(4,5))
And you want to convert this map in another map, but with something like this:
Map(1 -> 10,
2 -> 10,
3 -> 10,
4 -> 11,
5 -> 11)
As you can see if the sets are not disjoint, some keys in the resulted map with be missing:
Having this in consideration, the code will look like this:
val m: Map[Long, Set[Long]] = Map(10l -> Set(1l,2l,3l), 11l -> Set(4l,5l))
m.map(_.swap).map(k => k._1.map((_, k._2)))
val foo: Iterable[(Long, Long)] = m.flatMap { t =>
val (key, value) = t
value.map(_ -> key)
}
val result: Map[Long, Long] = foo.toMap
This will invert your Map m from Map[Long, Set[Long]] to Map[Long, List[Long]].
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
You haven't specified what should happen when different Set values contains some of the same Longs (i.e. Map(8 -> Set(1,2), 9 -> Set(2,3))). If you're sure that won't happen you can use the following adjustment.
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.head._2}
Or even more simply:
m.flatten {case(k, vs) => vs.map((_, k))}.toMap
I have nested maps with a key -> Map(key1 -> Map(), key2 -> Map()) kinda representation, which basically represent the path structure of a particular HTTP request made.
root/twiki/bin/edit/Main/Double_bounce_sender
root/twiki/bin/rdiff/TWiki/NewUserTemplate
I have stored them in a Map of maps which would give me the hierarchy of the path. Using a parser I read the data off server logs and get the corresponding data needed and then index the data in a sorted map.
val mainList: RDD[List[String]] = requesturl flatMap ( r => r.toString split("\\?") map (x => parser(x.split("/").filter(x => !x.contains("=")).toList).valuesIterator.toList))
def parser(list: List[String]): Map[Int, String]= {
val m = list.zipWithIndex.map(_.swap).toMap
val sM = SortedMap(m.toSeq:_*)
sM.+(0 -> "root")
}
After getting the data in the structure required, I loop through the entire collection to structure the data into a path map which would look like
root - twiki - bin - edit - Main - Double_bounce_sender
-rdiff - TWiki - NewUserTemplate
- oops - etc - local - getInterface
type innerMap = mutable.HashMap[String, Any]
def getData(input: RDD[List[String]]): mutable.HashMap[String, innerMap] ={
var mainMap = new mutable.HashMap[String, innerMap]
for(x <- input){
val z: mutable.HashMap[String, innerMap] = storeData(x.toIterator, mainMap ,x(0).toString)
mainMap = mainMap ++ z
}
mainMap
}
def storeData(list: Iterator[String], map: mutable.HashMap[String, innerMap], root: String): mutable.HashMap[String, innerMap]={
list.hasNext match {
case true =>
val v = list.next()
val y = map contains (root) match {
case true =>
println("Adding when exists: "+v)
val childMap = map.get(v).get match {
case _:HashMap[String, Any] => asInstanceOf[mutable.HashMap[String, innerMap]]
case _ => new mutable.HashMap[String, innerMap]
}
val x = map + (v -> storeData(list, childMap, v))
x
case false =>
val x = map + (v -> storeData(list, new mutable.HashMap[String, innerMap], v))
x
}
y.asInstanceOf[mutable.HashMap[String, innerMap]]
case false =>
new mutable.HashMap[String, innerMap]
}
}
The get data method calls each input list and sends it to the storeData method which builds the map.
I'm stuck at two places.
The MainMap(HashMap[String, innerMap]) sent recursively to storeData goes as a new empty map every time.
The second issue is that I'm trying to figure out a way of merging 2 nested Maps that do not have a defined length. Such as merging the maps below.
Map(root -> Map(twiki -> Map(bin -> Map(edit -> Map(Main -> Map(Double -> Map())))))))
Map(root -> Map(twiki -> Map(bin -> Map(rdiff -> Map(TWiki -> Map(NewUser -> Map())))))))
Looking for suggestions on how I could implement this solution and get a final map that contains all the possible paths present in the server log files in one map.
to merge these two maps you can use scalaz and |+| method
# Map("root" ->
Map("twiki" ->
Map("bin" ->
Map("rdiff" ->
Map("TWiki" ->
Map("NewUser" ->
Map.empty[String, String]))))))
res2: Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, String]]]]]]] =
Map("root" ->
Map("twiki" ->
Map("bin" ->
Map("rdiff" ->
Map("TWiki" ->
Map("NewUser" -> Map()))))))
# Map("root" ->
Map("twiki" ->
Map("bin" ->
Map("edit" ->
Map("Main" ->
Map("Double" -> Map.empty[String, String]))))))
res3: Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, String]]]]]]] =
Map("root" ->
Map("twiki" ->
Map("bin" ->
Map("edit" ->
Map("Main" ->
Map("Double" -> Map()))))))
res2 |+| res3
res4: Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, Map[String, String]]]]]]] =
Map("root" ->
Map("twiki" ->
Map("bin" ->
Map(
"edit" ->
Map("Main" ->
Map("Double" -> Map())),
"rdiff" ->
Map("TWiki" ->
Map("NewUser" -> Map()))))))
Maybe something like this?
scala> type Node = Map[String, Any];
defined type alias Node
scala> def merge( me : Node, you : Node ) : Node = {
| val keySet = me.keySet ++ you.keySet;
| def nodeForKey( parent : Node, key : String ) : Node = parent.getOrElse( key, Map.empty ).asInstanceOf[Node]
| keySet.map( key => (key -> merge( nodeForKey( me, key ), nodeForKey( you, key ) ) ) ).toMap
| }
merge: (me: Node, you: Node)Node
scala> val path1 = Map( "root" -> Map("bin" -> Map("sh" -> Map.empty) ) )
path1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,scala.collection.immutable.Map[Nothing,Nothing]]]] = Map(root -> Map(bin -> Map(sh -> Map())))
scala> val path2 = Map( "root" -> Map( "bin" -> Map("csh" -> Map.empty), "usr" -> Map.empty ) )
path2: scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,scala.collection.immutable.Map[_ <: String, scala.collection.immutable.Map[Nothing,Nothing]]]] = Map(root -> Map(bin -> Map(csh -> Map()), usr -> Map()))
scala> merge( path1, path2 )
res8: Node = Map(root -> Map(bin -> Map(sh -> Map(), csh -> Map()), usr -> Map()))
I have a List of Map[String, Double], and I'd like to merge their contents into a single Map[String, Double]. How should I do this in an idiomatic way? I imagine that I should be able to do this with a fold. Something like:
val newMap = Map[String, Double]() /: listOfMaps { (accumulator, m) => ... }
Furthermore, I'd like to handle key collisions in a generic way. That is, if I add a key to the map that already exists, I should be able to specify a function that returns a Double (in this case) and takes the existing value for that key, plus the value I'm trying to add. If the key does not yet exist in the map, then just add it and its value unaltered.
In my specific case I'd like to build a single Map[String, Double] such that if the map already contains a key, then the Double will be added to the existing map value.
I'm working with mutable maps in my specific code, but I'm interested in more generic solutions, if possible.
Well, you could do:
mapList reduce (_ ++ _)
except for the special requirement for collision.
Since you do have that special requirement, perhaps the best would be doing something like this (2.8):
def combine(m1: Map, m2: Map): Map = {
val k1 = Set(m1.keysIterator.toList: _*)
val k2 = Set(m2.keysIterator.toList: _*)
val intersection = k1 & k2
val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
r2 ++ r1
}
You can then add this method to the map class through the Pimp My Library pattern, and use it in the original example instead of "++":
class CombiningMap(m1: Map[Symbol, Double]) {
def combine(m2: Map[Symbol, Double]) = {
val k1 = Set(m1.keysIterator.toList: _*)
val k2 = Set(m2.keysIterator.toList: _*)
val intersection = k1 & k2
val r1 = for(key <- intersection) yield (key -> (m1(key) + m2(key)))
val r2 = m1.filterKeys(!intersection.contains(_)) ++ m2.filterKeys(!intersection.contains(_))
r2 ++ r1
}
}
// Then use this:
implicit def toCombining(m: Map[Symbol, Double]) = new CombiningMap(m)
// And finish with:
mapList reduce (_ combine _)
While this was written in 2.8, so keysIterator becomes keys for 2.7, filterKeys might need to be written in terms of filter and map, & becomes **, and so on, it shouldn't be too different.
How about this one:
def mergeMap[A, B](ms: List[Map[A, B]])(f: (B, B) => B): Map[A, B] =
(Map[A, B]() /: (for (m <- ms; kv <- m) yield kv)) { (a, kv) =>
a + (if (a.contains(kv._1)) kv._1 -> f(a(kv._1), kv._2) else kv)
}
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
val mm = mergeMap(ms)((v1, v2) => v1 + v2)
println(mm) // prints Map(hello -> 5.5, world -> 2.2, goodbye -> 3.3)
And it works in both 2.7.5 and 2.8.0.
I'm surprised no one's come up with this solution yet:
myListOfMaps.flatten.toMap
Does exactly what you need:
Merges the list to a single Map
Weeds out any duplicate keys
Example:
scala> List(Map('a -> 1), Map('b -> 2), Map('c -> 3), Map('a -> 4, 'b -> 5)).flatten.toMap
res7: scala.collection.immutable.Map[Symbol,Int] = Map('a -> 4, 'b -> 5, 'c -> 3)
flatten turns the list of maps into a flat list of tuples, toMap turns the list of tuples into a map with all the duplicate keys removed
Starting Scala 2.13, another solution which handles duplicate keys and is only based on the standard library consists in merging the Maps as sequences (flatten) before applying the new groupMapReduce operator which (as its name suggests) is an equivalent of a groupBy followed by a mapping and a reduce step of grouped values:
List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
.flatten
.groupMapReduce(_._1)(_._2)(_ + _)
// Map("world" -> 2.2, "goodbye" -> 3.3, "hello" -> 5.5)
This:
flattens (concatenates) the maps as a sequence of tuples (List(("hello", 1.1), ("world", 2.2), ("goodbye", 3.3), ("hello", 4.4))), which keeps all key/values (even duplicate keys)
groups elements based on their first tuple part (_._1) (group part of groupMapReduce)
maps grouped values to their second tuple part (_._2) (map part of groupMapReduce)
reduces mapped grouped values (_+_) by taking their sum (but it can be any reduce: (T, T) => T function) (reduce part of groupMapReduce)
The groupMapReduce step can be seen as a one-pass version equivalent of:
list.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _))
Interesting, noodling around with this a bit, I got the following (on 2.7.5):
General Maps:
def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: Seq[scala.collection.Map[A,B]]): Map[A, B] = {
listOfMaps.foldLeft(Map[A, B]()) { (m, s) =>
Map(
s.projection.map { pair =>
if (m contains pair._1)
(pair._1, collisionFunc(m(pair._1), pair._2))
else
pair
}.force.toList:_*)
}
}
But man, that is hideous with the projection and forcing and toList and whatnot. Separate question: what's a better way to deal with that within the fold?
For mutable Maps, which is what I was dealing with in my code, and with a less general solution, I got this:
def mergeMaps[A,B](collisionFunc: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A, B] = {
listOfMaps.foldLeft(mutable.Map[A,B]()) {
(m, s) =>
for (k <- s.keys) {
if (m contains k)
m(k) = collisionFunc(m(k), s(k))
else
m(k) = s(k)
}
m
}
}
That seems a little bit cleaner, but will only work with mutable Maps as it's written. Interestingly, I first tried the above (before I asked the question) using /: instead of foldLeft, but I was getting type errors. I thought /: and foldLeft were basically equivalent, but the compiler kept complaining that I needed explicit types for (m, s). What's up with that?
I reading this question quickly so I'm not sure if I'm missing something (like it has to work for 2.7.x or no scalaz):
import scalaz._
import Scalaz._
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
You can change the monoid definition for Double and get another way to accumulate the values, here getting the max:
implicit val dbsg: Semigroup[Double] = semigroup((a,b) => math.max(a,b))
ms.reduceLeft(_ |+| _)
// returns Map(goodbye -> 3.3, hello -> 4.4, world -> 2.2)
I wrote a blog post about this , check it out :
http://www.nimrodstech.com/scala-map-merge/
basically using scalaz semi group you can achieve this pretty easily
would look something like :
import scalaz.Scalaz._
listOfMaps reduce(_ |+| _)
a oneliner helper-func, whose usage reads almost as clean as using scalaz:
def mergeMaps[K,V](m1: Map[K,V], m2: Map[K,V])(f: (V,V) => V): Map[K,V] =
(m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms.reduceLeft(mergeMaps(_,_)(_ + _))
// returns Map(goodbye -> 3.3, hello -> 5.5, world -> 2.2)
for ultimate readability wrap it in an implicit custom type:
class MyMap[K,V](m1: Map[K,V]) {
def merge(m2: Map[K,V])(f: (V,V) => V) =
(m1 -- m2.keySet) ++ (m2 -- m1.keySet) ++ (for (k <- m1.keySet & m2.keySet) yield { k -> f(m1(k), m2(k)) })
}
implicit def toMyMap[K,V](m: Map[K,V]) = new MyMap(m)
val ms = List(Map("hello" -> 1.1, "world" -> 2.2), Map("goodbye" -> 3.3, "hello" -> 4.4))
ms reduceLeft { _.merge(_)(_ + _) }