Merge sets' elements with HashMap's key in Scala

I hope there is an easy way to solve that
I have two RDDs
g.vertices
(4,Set(5, 3))
(0,Set(1, 4))
(1,Set(2))
(6,Set())
(3,Set(0))
(5,Set(2))
(2,Set(1))
maps
Map(4 -> Set(5, 3))
Map(0 -> Set(1, 4))
Map(1 -> Set(2))
Map(6 -> Set())
Map(3 -> Set(0))
Map(5 -> Set(2))
Map(2 -> Set(1))
How can I do something like this?
(4,Map(5 -> Set(2), 3 -> Set(0)))
(0,Map(1 -> Set(2), 4 -> Set(5, 3)))
(1,Map(2 -> Set(1)))
(6,Map())
(3,Map(0 -> Set(1, 4)))
(5,Map(2 -> Set(1)))
(2,Map(1 -> Set(2)))
I want to combine the maps' keys with the elements of the sets: each element of a vertex's set should be replaced by an entry that maps the element to the set stored under that key in maps.
I thought about
val maps = g.vertices.map { case (id, attr) => HashMap(id -> attr) }
g.mapVertices{case (id, data) => data.map{case vId => maps.
map { case i if i.keySet.contains(vId) => HashMap(vId -> i.values) } }}
but I have an error
org.apache.spark.SparkException: RDD transformations and actions can
only be invoked by the driver, not inside of other transformations;
for example, rdd1.map(x => rdd2.values.count() * x) is invalid because
the values transformation and count action cannot be performed inside
of the rdd1.map transformation. For more information, see SPARK-5063.

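This is a simple use case for join. In the following code, A is the type of the keys in g.vertices, and K and V are the key and value types of maps. Because join is an inner join, a vertex whose set is empty (6 in the example) does not appear in the result.
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// The ClassTag context bounds are needed for the pair-RDD operations (join, aggregateByKey) on generic types
def joinByKeys[A: ClassTag, K: ClassTag, V: ClassTag](sets: RDD[(A, Set[K])], maps: RDD[Map[K, V]]): RDD[(A, Map[K, V])] = {
  val flattenSets = sets.flatMap(p => p._2.map(_ -> p._1)) // create an (element, vertexId) pair for each element of every vertex's set
  val flattenMaps = maps.flatMap(identity) // create an RDD with all (key, value) entries of the Maps
  flattenMaps.join(flattenSets).map { // join them by their key
    case (k, (v, a)) => (a, (k, v)) // reorder to put the vertexId as the key
  }.aggregateByKey(Map.empty[K, V])(_ + _, _ ++ _) // aggregate the entries into one Map per vertex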
}
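To see what each step of the join produces without a Spark cluster, the same flatten / join / aggregate steps can be sketched on plain Scala collections. This is only an illustration on local data: the object name JoinByKeysLocal and the Seq-based signature are assumptions made for this example and are not part of the Spark code above.

```scala
object JoinByKeysLocal {
  // Local-collection sketch of the same steps as the RDD version (hypothetical helper)
  def joinByKeys[A, K, V](sets: Seq[(A, Set[K])], maps: Seq[Map[K, V]]): Map[A, Map[K, V]] = {
    // one (element, vertexId) pair per element of every vertex's set
    val flattenSets: Seq[(K, A)] = sets.flatMap { case (a, ks) => ks.toSeq.map(k => k -> a) }
    // one (key, value) pair per entry of every Map
    val flattenMaps: Seq[(K, V)] = maps.flatMap(_.toSeq)
    // naive inner join on the key, reordered so the vertexId comes first
    val joined: Seq[(A, (K, V))] = for {
      (k, v) <- flattenMaps
      (k2, a) <- flattenSets
      if k == k2
    } yield (a, (k, v))
    // collect the (key, value) entries of each vertex into one Map
    joined.groupBy(_._1).map { case (a, pairs) => a -> pairs.map(_._2).toMap }
  }

  // Data taken from the question
  val vertices: Seq[(Int, Set[Int])] = Seq(
    4 -> Set(5, 3), 0 -> Set(1, 4), 1 -> Set(2), 6 -> Set.empty[Int],
    3 -> Set(0), 5 -> Set(2), 2 -> Set(1))
  val maps: Seq[Map[Int, Set[Int]]] = vertices.map { case (id, attr) => Map(id -> attr) }

  val result: Map[Int, Map[Int, Set[Int]]] = joinByKeys(vertices, maps)
}
```

Here JoinByKeysLocal.result(4) is Map(5 -> Set(2), 3 -> Set(0)), as requested in the question. Vertex 6 has an empty set, so it has nothing to join on and is absent from the result, just as with the inner join on RDDs.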

Related

Scala: inverting a one-to-many relationship [duplicate]

I have:
val intsPerChar: List[(Char, List[Int])] = List(
'A' -> List(1,2,3),
'B' -> List(2,3)
)
I want to get a mapping of ints with the chars that they have a mapping with. ie, I want to get:
val charsPerInt: Map[Int, List[Char]] = Map(
1 -> List('A'),
2 -> List('A', 'B'),
3 -> List('A', 'B')
)
Currently, I am doing the following:
val numbers: List[Int] = intsPerChar.flatMap(_._2).distinct
numbers.map( n =>
  n -> intsPerChar.filter(_._2.contains(n)).map(_._1)
).toMap
Is there a less explicit way of doing this? ideally some sort of groupBy.
Try
intsPerChar
.flatMap { case (c, ns) => ns.map((_, c)) }
.groupBy(_._1)
.mapValues(_.map(_._2))
// Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))
Might be personal preference as to whether you consider it more or less readable, but the following is another option:
intsPerChar
.flatMap(n => n._2.map(i => i -> n._1)) // List((1,A), (2,A), (3,A), (2,B), (3,B))
.groupBy(_._1) // Map(2 -> List((2,A), (2,B)), 1 -> List((1,A)), 3 -> List((3,A), (3,B)))
.transform { (_, v) => v.unzip._2}
Final output is:
Map(2 -> List(A, B), 1 -> List(A), 3 -> List(A, B))

Invert a Map (String -> List) in Scala

I have a Map[String, List[String]] and I want to invert it. For example, if I have something like
"1" -> List("a","b","c")
"2" -> List("a","j","k")
"3" -> List("a","c")
The result should be
"a" -> List("1","2","3")
"b" -> List("1")
"c" -> List("1","3")
"j" -> List("2")
"k" -> List("2")
I've tried this:
m.map(_.swap)
But it returns a Map[List[String], String]:
List("a","b","c") -> "1"
List("a","j","k") -> "2"
List("a","c") -> "3"
Map inversion is a little more complicated.
val m = Map("1" -> List("a","b","c")
,"2" -> List("a","j","k")
,"3" -> List("a","c"))
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
//res0: Map[String,Iterable[String]] = Map(j -> List(2), a -> List(1, 2, 3), b -> List(1), c -> List(1, 3), k -> List(2))
Flatten the Map into a collection of tuples. groupBy will create a new Map with the old values as the new keys. Then un-tuple the values by removing the key (previously value) elements.
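The three steps described above can be made visible by naming each intermediate value. This is a minimal sketch using the same example data; the object name InvertSteps and the value names pairs, grouped and inverted are chosen only for this illustration.

```scala
object InvertSteps {
  val m: Map[String, List[String]] = Map(
    "1" -> List("a", "b", "c"),
    "2" -> List("a", "j", "k"),
    "3" -> List("a", "c"))

  // Step 1: flatten into (value, key) tuples
  val pairs: List[(String, String)] =
    m.toList.flatMap { case (k, vs) => vs.map(v => (v, k)) }

  // Step 2: group the tuples by the old value, which becomes the new key
  val grouped: Map[String, List[(String, String)]] = pairs.groupBy(_._1)

  // Step 3: un-tuple, keeping only the old keys in each group
  val inverted: Map[String, List[String]] =
    grouped.map { case (v, ps) => v -> ps.map(_._2) }
}
```

Using map instead of mapValues in the last step builds a strict Map, at the cost of being slightly more verbose.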
An alternative that does not rely on strange implicit arguments of flatten, as requested by yishaiz:
val m = Map(
"1" -> List("a","b","c"),
"2" -> List("a","j","k"),
"3" -> List("a","c"),
)
val res = (for ((digit, chars) <- m.toList; c <- chars) yield (c, digit))
.groupBy(_._1) // group by characters
.mapValues(_.unzip._2) // drop redundant digits from lists
res foreach println
gives:
(j,List(2))
(a,List(1, 2, 3))
(b,List(1))
(c,List(1, 3))
(k,List(2))
A simple nested for-comprehension can also invert the map, so that each value in a List becomes a key of the inverted map, with its original key as the value. Note that the result type is Map[T, T], so a value that appears under several keys keeps only the last key written.
implicit class MapInverter[T](map: Map[T, List[T]]) {
  def invert: Map[T, T] = {
    val result = collection.mutable.Map.empty[T, T]
    for ((key, values) <- map) {
      for (v <- values) {
        result += (v -> key) // overwrites any earlier key stored for v
      }
    }
    result.toMap
  }
}
Usage:
Map(10 -> List(3, 2), 20 -> List(16, 17, 18, 19)).invert
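The invert method above returns one key per value, which fits the usage example because no value is repeated there. For data such as the question's, where "a" occurs in several lists, a variant that collects every key may be closer to what was asked. The following is a sketch only; invertAll and InvertKeepingAll are hypothetical names introduced for this example.

```scala
object InvertKeepingAll {
  // Hypothetical variant of invert: keeps every original key for each value
  def invertAll[T](map: Map[T, List[T]]): Map[T, List[T]] = {
    // (value, key) pairs for every element of every list
    val pairs: List[(T, T)] =
      for ((key, values) <- map.toList; v <- values) yield v -> key
    // group by value and keep only the keys
    pairs.groupBy(_._1).map { case (v, ps) => v -> ps.map(_._2) }
  }

  val m: Map[String, List[String]] = Map(
    "1" -> List("a", "b", "c"),
    "2" -> List("a", "j", "k"),
    "3" -> List("a", "c"))

  val all: Map[String, List[String]] = invertAll(m)
}
```

With this variant, InvertKeepingAll.all("a") contains "1", "2" and "3", whereas a Map[T, T] result can hold only one of them.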

How to un-nest a spark rdd that has the following type ((String, scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Int]]))

It's a nested map whose contents look like this when I print it to the screen:
(5, Map("ABCD" -> Map("3200" -> 3,
"3350.800" -> 4,
"200.300" -> 3)))
(1, Map("DEF" -> Map("1200" -> 32,
"1320.800" -> 4,
"2100" -> 3)))
I need to get something like this:
CaseClass(5, ABCD, 3200, 3)
CaseClass(5, ABCD, 3350.800, 4)
CaseClass(5, ABCD, 200.300, 3)
CaseClass(1, DEF, 1200, 32)
CaseClass(1, DEF, 1320.800, 4)
etc.
That is, a list of case class instances, so that I can save it to Cassandra.
I have tried flatMapValues, but that un-nests the map by only one level. I also tried flatMap, but that doesn't work either, or I am making a mistake.
Any suggestions?
Fairly straightforward using a for-comprehension and some pattern matching to destructure things:
val in = List((5, Map ( "ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
(1, Map ("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3))))
case class Thing(a:Int, b:String, c:String, d:Int)
for { (index, m) <- in
(k,v) <-m
(innerK, innerV) <- v}
yield Thing(index, k, innerK, innerV)
//> res0: List[maps.maps2.Thing] = List(Thing(5,ABCD,3200,3),
// Thing(5,ABCD,3350.800,4),
// Thing(5,ABCD,200.300,3),
// Thing(1,DEF,1200,32),
// Thing(1,DEF,1320.800,4),
// Thing(1,DEF,2100,3))
So let's pick apart the for-comprehension
(index, m) <- in
This is the same as
t <- in
(index, m) = t
In the first line t will successively be set to each element of in.
t is therefore a tuple (Int, Map(...))
Pattern matching lets us put that "pattern" for the tuple on the left-hand side, and the compiler picks apart the tuple, setting index to the Int and m to the Map.
(k,v) <-m
As before this is equivalent to
u <-m
(k, v) = u
This time u takes each element of the Map in turn, and those elements are again tuples of key and value. So k is set successively to each key and v to the corresponding value.
And v is your inner map so we do the same thing again with the inner map
(innerK, innerV) <- v}
Now we have everything we need to create the case class. yield just says make a collection of whatever is "yielded" each time through the loop.
yield Thing(index, k, innerK, innerV)
Under the hood, this just translates to a set of map/flatMap calls.
The yield is just the value Thing(index, k, innerK, innerV)
We get one of those for each element of v
v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) }
but there's an inner map per element of the outer map
m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } }
(flatMap because we would get a List of Lists if we just did a map, and we want to flatten it to just the list of items)
Similarly, we do one of those for every element in the List
in.flatMap { z => val (index, m) = z; m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } } }
Let's do that in _1, _2 style.
in.flatMap(z => z._2.flatMap { y => y._2.map { x => Thing(z._1, y._1, x._1, x._2) } })
which produces exactly the same result. But isn't it clearer as a for-comprehension?
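The claim that the for-comprehension and the nested flatMap/map calls are equivalent can be checked directly. The sketch below reuses the in list and Thing case class from the answer; the wrapper object ForDesugaring and the names viaFor and viaFlatMap are introduced only for this comparison, and the lambdas use case patterns instead of val destructuring for brevity.

```scala
object ForDesugaring {
  case class Thing(a: Int, b: String, c: String, d: Int)

  val in = List(
    (5, Map("ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
    (1, Map("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3))))

  // The for-comprehension from the answer
  val viaFor: List[Thing] = for {
    (index, m) <- in
    (k, v) <- m
    (innerK, innerV) <- v
  } yield Thing(index, k, innerK, innerV)

  // The same traversal written out: flatMap for every generator except the
  // last one, map for the last one
  val viaFlatMap: List[Thing] = in.flatMap { case (index, m) =>
    m.flatMap { case (k, v) =>
      v.map { case (innerK, innerV) => Thing(index, k, innerK, innerV) }
    }
  }
}
```

Both values hold the same six Thing instances, so ForDesugaring.viaFor == ForDesugaring.viaFlatMap is true.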
You can also do it like this if you prefer collection operations:
case class Record(v1: Int, v2: String, v3: Double, v4: Int)
val data = List(
  (5, Map("ABC" ->
    Map(
      3200.0 -> 3,
      3350.800 -> 4,
      200.300 -> 3))
  ),
  (1, Map("DEF" ->
    Map(
      1200.0 -> 32,
      1320.800 -> 4,
      2100.0 -> 3))
  )
)
val rdd = sc.parallelize(data)
val result = rdd.flatMap(p => {
p._2.toList
.flatMap(q => q._2.toList.map(l => (q._1, l)))
.map((p._1, _))
}).map(p => Record(p._1, p._2._1, p._2._2._1, p._2._2._2))
println(result.collect.toList)
//List(
// Record(5,ABC,3200.0,3),
// Record(5,ABC,3350.8,4),
// Record(5,ABC,200.3,3),
// Record(1,DEF,1200.0,32),
// Record(1,DEF,1320.8,4),
// Record(1,DEF,2100.0,3)
//)

What is a more functional way of creating a Map of List?

I have this working code to create a Map between the characters in a String, and a List containing the indexes.
scala> "Lollipop".zipWithIndex.foldLeft(Map[Char, List[Int]]())((acc, t) => acc + (t._1 -> (acc.getOrElse(t._1, List[Int]()) :+ t._2)))
res122: scala.collection.immutable.Map[Char,List[Int]] = Map(i -> List(4), L -> List(0), l -> List(2, 3), p -> List(5, 7), o -> List(1, 6))
But the use of acc.getOrElse looks imperative.
Is there a more functional way that hides this from the user?
for {
(c, l) <- "Lollipop".zipWithIndex.groupBy{ _._1 }
} yield c -> l.map{ _._2 }
// Map(i -> Vector(4), L -> Vector(0), l -> Vector(2, 3), p -> Vector(5, 7), o -> Vector(1, 6))
After groupBy{ _._1 } you'll get a Map[Char, Seq[(Char, Int)]]. So you have to convert pairs (Char, Int) to Int, using p => p._2 or just _._2.
You could use mapValues like this:
"Lollipop".zipWithIndex.groupBy{ _._1 }.mapValues{ _.map{_._2} }
But mapValues creates a lazy view, so the function is re-applied on every access, which can hurt performance if the same key is looked up many times.
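The laziness of mapValues can be observed by counting how often the mapping function runs. The sketch below is an illustration under that assumption only: the object MapValuesLaziness, the counter calls and the helper indexes are hypothetical names, and on Scala 2.13 the mapValues call compiles with a deprecation warning.

```scala
object MapValuesLaziness {
  // Counts how many times the mapping function is applied
  var calls = 0

  def indexes(pairs: Seq[(Char, Int)]): Seq[Int] = {
    calls += 1
    pairs.map(_._2)
  }

  val grouped = "Lollipop".zipWithIndex.groupBy(_._1)

  // mapValues: nothing is computed up front, each lookup re-applies indexes
  def callsAfterTwoLookups(): Int = {
    calls = 0
    val lazyView = grouped.mapValues(indexes)
    lazyView('p')
    lazyView('p')
    calls
  }

  // map: indexes runs once per key when the Map is built, lookups are free
  def callsWithStrictMap(): Int = {
    calls = 0
    val strict = grouped.map { case (c, ps) => c -> indexes(ps) }
    strict('p')
    strict('p')
    calls
  }
}
```

Looking up 'p' twice through the mapValues view runs indexes twice, while the strict map runs it once for each of the five distinct characters and never again.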
An alternative is to use a default value for your map (code rewritten a little to be more explicit):
val empty = Map.empty[Char, List[Int]].withDefaultValue(List.empty)
"Lollipop".zipWithIndex.foldLeft(empty) {
case (acc, (char, position)) => {
val positions = acc(char) :+ position
acc + (char -> positions)
}
}

Reverse / transpose a one-to-many map in Scala

What is the best way to turn a Map[A, Set[B]] into a Map[B, Set[A]]?
For example, how do I turn a
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
into a
Map("a" -> Set(1),
"b" -> Set(1, 2),
"c" -> Set(2, 3),
"d" -> Set(3))
(I'm using immutable collections only here. And my real problem has nothing to do with strings or integers. :)
with help from aioobe and Moritz:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_)(v)))).toMap
It's a bit more readable if you explicitly call contains:
def reverse[A, B](m: Map[A, Set[B]]) =
m.values.toSet.flatten.map(v => (v, m.keys.filter(m(_).contains(v)))).toMap
Best I've come up with so far is
val intToStrs = Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
def mappingFor(key: String) =
intToStrs.keys.filter(intToStrs(_) contains key).toSet
val newKeys = intToStrs.values.flatten
val inverseMap = newKeys.map(newKey => (newKey -> mappingFor(newKey))).toMap
Or another one using folds:
def reverse2[A,B](m:Map[A,Set[B]])=
m.foldLeft(Map[B,Set[A]]()){case (r,(k,s)) =>
s.foldLeft(r){case (r,e)=>
r + (e -> (r.getOrElse(e, Set()) + k))
}
}
Here's a one statement solution
originalMap
  .map{case (k, v)=>v.map{v2=>(v2,k)}}
  .flatten
  .groupBy{_._1}
  .transform {(k, v)=>v.unzip._2.toSet}
This bit rather neatly (*) produces the tuples needed to construct the reverse map
Map(1 -> Set("a", "b"),
2 -> Set("b", "c"),
3 -> Set("c", "d"))
.map{case (k, v)=>v.map{v2=>(v2,k)}}.flatten
produces
List((a,1), (b,1), (b,2), (c,2), (c,3), (d,3))
Converting it directly to a map overwrites the values corresponding to duplicate keys though
Adding .groupBy{_._1} gets this
Map(c -> List((c,2), (c,3)),
a -> List((a,1)),
d -> List((d,3)),
b -> List((b,1), (b,2)))
which is closer. The last step is to turn those lists into Sets of the second halves of the pairs:
.transform {(k, v)=>v.unzip._2.toSet}
gives
Map(c -> Set(2, 3), a -> Set(1), d -> Set(3), b -> Set(1, 2))
QED :)
(*) YMMV
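The steps walked through above (flatten to pairs, group by the new key, keep the second halves as a Set) can be packaged as a generic function with an explicit result type. This is a sketch that swaps transform for a plain map so the result is a strict Map[B, Set[A]]; the names ReverseOneToMany, reverse and reversed are chosen for this example.

```scala
object ReverseOneToMany {
  def reverse[A, B](m: Map[A, Set[B]]): Map[B, Set[A]] =
    m.toList
      .flatMap { case (k, vs) => vs.toList.map(v => (v, k)) } // (value, key) pairs
      .groupBy(_._1)                                          // group by the new key
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }  // keep only the old keys

  val reversed: Map[String, Set[Int]] = reverse(
    Map(1 -> Set("a", "b"),
        2 -> Set("b", "c"),
        3 -> Set("c", "d")))
}
```

For the example from the question, ReverseOneToMany.reversed equals Map("a" -> Set(1), "b" -> Set(1, 2), "c" -> Set(2, 3), "d" -> Set(3)). Keys whose set is empty contribute no pairs and therefore leave no trace in the reversed map.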
A simple, but maybe not super-elegant solution:
def reverse[A,B](m:Map[A,Set[B]])={
var r = Map[B,Set[A]]()
m.keySet foreach { k=>
m(k) foreach { e =>
r = r + (e -> (r.getOrElse(e, Set()) + k))
}
}
r
}
The easiest way I can think of is:
// unfold values to tuples (v,k)
// for all values v in the Set referenced by key k
def vk = for {
(k,vs) <- m.iterator
v <- vs.iterator
} yield (v -> k)
// fold iterator back into a map
(Map[String,Set[Int]]() /: vk) {
// alternative syntax: vk.foldLeft(Map[String,Set[Int]]()) {
case (m,(k,v)) if m contains k =>
// Map already contains a Set, so just add the value
m updated (k, m(k) + v)
case (m,(k,v)) =>
// key not in the map - wrap value in a Set and return updated map
m updated (k, Set(v))
}