Scala: Cartesian product between a variable number of Arrays

I am a newbie in Scala and would appreciate any direction or help in solving the following problem.
Input
I have a Map[String, Array[Double]] that looks as follows:
Map(foo -> Array(12, 25, 100), bar -> Array(0.1, 0.001))
The Map can contain between 1 and 10 keys (this depends on some parameters in my application).
Processing
I would like to apply a cartesian product between the arrays of all keys and generate a structure that contains all the possible combinations of all values of all arrays.
In the example above, the cartesian product creates 3x2=6 different combinations: (12, 0.1), (12, 0.001), (25, 0.1), (25, 0.001), (100, 0.1) and (100, 0.001).
As another example, in some cases I might have three keys: the first with an Array of 4 values, the second with an Array of 5 values and the third with an Array of 3 values; in this case the product has to generate 4x5x3=60 different combinations.
Desired output
Something like:
Map(config1 -> (foo -> 12, bar -> 0.1), config2 -> (foo -> 12, bar -> 0.001), config3 -> (foo -> 25, bar -> 0.1), config4 -> (foo -> 25, bar -> 0.001), config5 -> (foo -> 100, bar -> 0.1), config6 -> (foo -> 100, bar -> 0.001))

You can use a for-comprehension to create a cartesian product of two lists, arrays, etc.
val start = Map(
  'foo -> Array(12, 25, 100),
  'bar -> Array(0.1, 0.001),
  'baz -> Array(2))

// transform arrays into lists with values paired with their map key
val pairedWithKey = start.map { case (k, v) => v.map(i => k -> i).toList }

val accumulator = pairedWithKey.head.map(x => Vector(x))
val cartesianProd = pairedWithKey.tail.foldLeft(accumulator)( (acc, elem) =>
  for { x <- acc; y <- elem } yield x :+ y
)
cartesianProd foreach println
// Vector(('foo,12), ('bar,0.1), ('baz,2))
// Vector(('foo,12), ('bar,0.001), ('baz,2))
// Vector(('foo,25), ('bar,0.1), ('baz,2))
// Vector(('foo,25), ('bar,0.001), ('baz,2))
// Vector(('foo,100), ('bar,0.1), ('baz,2))
// Vector(('foo,100), ('bar,0.001), ('baz,2))
You might want to add some checks before using head and tail.
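For instance, a minimal sketch that tolerates an empty input map by folding over a single empty combination, avoiding head and tail entirely (the val name is just illustrative):

val cartesianProdSafe = pairedWithKey.foldLeft(List(Vector.empty[(Symbol, Any)])) { (acc, elem) =>
  for { x <- acc; y <- elem } yield x :+ y
}
// an empty start map yields List(Vector()), the empty product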

Since the number of Arrays is dynamic, there is no way you can get tuples as the result.
You can, however, utilize recursion for your purpose:
def process(a: Map[String, Seq[Double]]) = {
  def product(a: List[(String, Seq[Double])]): Seq[List[(String, Double)]] =
    a match {
      case (name, values) :: tail =>
        for {
          result <- product(tail)
          value <- values
        } yield (name, value) :: result
      case Nil => Seq(List())
    }
  product(a.toList)
}
val a = Map("foo" -> List(12.0, 25.0, 100.0), "bar" -> List(0.1, 0.001))
println(process(a))
Which gives a result of:
List(List((foo,12.0), (bar,0.1)), List((foo,25.0), (bar,0.1)), List((foo,100.0), (bar,0.1)), List((foo,12.0), (bar,0.001)), List((foo,25.0), (bar,0.001)), List((foo,100.0), (bar,0.001)))
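If you also want the configN-keyed Map from the question, the combinations can be numbered with zipWithIndex; a minimal sketch (the "config" prefix is just the question's naming):

val configs = process(a).zipWithIndex.map { case (combo, i) =>
  s"config${i + 1}" -> combo.toMap
}.toMap
// Map(config1 -> Map(foo -> 12.0, bar -> 0.1), config2 -> Map(foo -> 25.0, bar -> 0.1), ...)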

Related

Invert a Map (String -> List) in Scala

I have a Map[String, List[String]] and I want to invert it. For example, if I have something like
"1" -> List("a","b","c")
"2" -> List("a","j","k")
"3" -> List("a","c")
The result should be
"a" -> List("1","2","3")
"b" -> List("1")
"c" -> List("1","3")
"j" -> List("2")
"k" -> List("2")
I've tried this:
m.map(_.swap)
But it returns a Map[List[String], String]:
List("a","b","c") -> "1"
List("a","j","k") -> "2"
List("a","c") -> "3"
Map inversion is a little more complicated.
val m = Map("1" -> List("a","b","c")
,"2" -> List("a","j","k")
,"3" -> List("a","c"))
m flatten {case(k, vs) => vs.map((_, k))} groupBy (_._1) mapValues {_.map(_._2)}
//res0: Map[String,Iterable[String]] = Map(j -> List(2), a -> List(1, 2, 3), b -> List(1), c -> List(1, 3), k -> List(2))
Flatten the Map into a collection of (value, key) tuples. groupBy then creates a new Map with the old values as the new keys. Finally, un-tuple the grouped values by stripping out the key (previously value) elements, leaving just the lists of original keys.
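Broken into steps with intermediate results, the same pipeline looks like this (a sketch; it uses map instead of mapValues, which is deprecated in Scala 2.13):

val pairs = m.toList.flatMap { case (k, vs) => vs.map((_, k)) }
// List((a,1), (b,1), (c,1), (a,2), (j,2), (k,2), (a,3), (c,3))
val grouped = pairs.groupBy(_._1)
// Map(a -> List((a,1), (a,2), (a,3)), b -> List((b,1)), ...)
val inverted = grouped.map { case (c, ps) => c -> ps.map(_._2) }
// Map(a -> List(1, 2, 3), b -> List(1), c -> List(1, 3), j -> List(2), k -> List(2))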
An alternative that does not rely on strange implicit arguments of flatten, as requested by yishaiz:
val m = Map(
  "1" -> List("a", "b", "c"),
  "2" -> List("a", "j", "k"),
  "3" -> List("a", "c"))
val res = (for ((digit, chars) <- m.toList; c <- chars) yield (c, digit))
.groupBy(_._1) // group by characters
.mapValues(_.unzip._2) // drop redundant digits from lists
res foreach println
gives:
(j,List(2))
(a,List(1, 2, 3))
(b,List(1))
(c,List(1, 3))
(k,List(2))
A simple nested for-comprehension can be used to invert the map, so that each value in the List of values becomes a key in the inverted map, with its original key as the value:
implicit class MapInverter[T](map: Map[T, List[T]]) {
  def invert: Map[T, T] = {
    val result = collection.mutable.Map.empty[T, T]
    for ((key, values) <- map) {
      for (v <- values) {
        result += (v -> key)
      }
    }
    result.toMap
  }
}
Usage:
Map(10 -> List(3, 2), 20 -> List(16, 17, 18, 19)).invert
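which yields (entry order may vary):

Map(3 -> 10, 2 -> 10, 16 -> 20, 17 -> 20, 18 -> 20, 19 -> 20)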

Merge sets' elements with a HashMap's keys in Scala

I hope there is an easy way to solve this.
I have two RDDs:
g.vertices
(4,Set(5, 3))
(0,Set(1, 4))
(1,Set(2))
(6,Set())
(3,Set(0))
(5,Set(2))
(2,Set(1))
maps
Map(4 -> Set(5, 3))
Map(0 -> Set(1, 4))
Map(1 -> Set(2))
Map(6 -> Set())
Map(3 -> Set(0))
Map(5 -> Set(2))
Map(2 -> Set(1))
How can I do something like this?
(4,Map(5 -> Set(2), 3 -> Set(0)))
(0,Map(1 -> Set(2), 4 -> Set(5, 3)))
(1,Map(2 -> Set(1)))
(6,Map())
(3,Map(0 -> Set(1, 4)))
(5,Map(2 -> Set(1)))
(2,Map(1 -> Set(2)))
I want to combine the maps' keys with the sets' elements, i.e. replace each element of a set with the map entry keyed by that element.
I thought about
val maps = g.vertices.map { case (id, attr) => HashMap(id -> attr) }
g.mapVertices{case (id, data) => data.map{case vId => maps.
map { case i if i.keySet.contains(vId) => HashMap(vId -> i.values) } }}
but I get an error:
org.apache.spark.SparkException: RDD transformations and actions can
only be invoked by the driver, not inside of other transformations;
for example, rdd1.map(x => rdd2.values.count() * x) is invalid because
the values transformation and count action cannot be performed inside
of the rdd1.map transformation. For more information, see SPARK-5063.
This is a simple use case for join. In the following code, A is the type of the keys in g.vertices, and K and V are the key and value types for maps:
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

def joinByKeys[A: ClassTag, K: ClassTag, V: ClassTag](sets: RDD[(A, Set[K])], maps: RDD[Map[K, V]]): RDD[(A, Map[K, V])] = {
  val flattenSets = sets.flatMap(p => p._2.map(_ -> p._1)) // create a pair for each element of a vertex's set
  val flattenMaps = maps.flatMap(identity)                 // create an RDD with all key/value pairs from the maps
  flattenMaps.join(flattenSets).map {                      // join them by their key
    case (k, (v, a)) => (a, (k, v))                        // reorder to put the vertex id first
  }.aggregateByKey(Map.empty[K, V])(_ + _, _ ++ _)         // rebuild one map per vertex
}
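A hypothetical usage sketch, assuming a SparkContext sc and a subset of the sample data from the question (entry order inside the aggregated maps may vary):

val sets = sc.parallelize(Seq((4, Set(5, 3)), (0, Set(1, 4))))
val maps = sc.parallelize(Seq(Map(5 -> Set(2)), Map(3 -> Set(0)), Map(1 -> Set(2)), Map(4 -> Set(5, 3))))
joinByKeys(sets, maps).collect.foreach(println)
// (4,Map(5 -> Set(2), 3 -> Set(0)))
// (0,Map(1 -> Set(2), 4 -> Set(5, 3)))

Note that join is an inner join, so vertices with empty sets (like 6) produce no pairs and will be absent from the result.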

Correct interpretation of Scala Map's flatmap expression

I am scratching my head vigorously trying to understand the logic that produces the value out of a flatMap() operation:
val ys = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).flatMap(e => {
| println("e =" + e)
| (e._2)
| })
e =(a,List((1,11), (1,111)))
e =(b,List((2,22), (2,222)))
ys: scala.collection.immutable.Map[Int,Int] = Map(1 -> 111, 2 -> 222)
The println clearly shows that flatMap is taking in one entry out of the input Map. So, e._2 is a List of Pairs. I can't figure out what exactly happens after that!
I am missing a very important and subtle step somewhere. Please enlighten me.
It can be thought of as:
First we map:
val a = Map("a" -> List(1 -> 11,1 -> 111), "b" -> List(2 -> 22,2 -> 222)).map(e => e._2)
// List(List((1, 11), (1, 111)), List((2, 22), (2, 222)))
Then we flatten:
val b = a.flatten
// List((1, 11), (1, 111), (2, 22), (2, 222))
Then we convert back to a map:
b.toMap
// Map(1 -> 111, 2 -> 222)
Since a map cannot have 2 values for 1 key, the value is overwritten.
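For example:

List(1 -> 11, 1 -> 111).toMap
// Map(1 -> 111): the later pair wins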
Really, what's going on is that the flatMap is converted into a loop like so:
for (x <- m0) b ++= f(x)
where:
m0 is our original map
b is a collection builder that has to build a Map, aka, MapBuilder
f is our function being passed into the flatMap (it returns a List[(Int, Int)])
x is an element in our original map
The ++= function takes the list we got from calling f(x), and calls += on every element, to add it to our map. For a Map, += just calls the original + operator for a Map, which updates the value if the key already exists.
Finally we call result on our builder, which returns our Map.
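This expansion can be mimicked directly; a minimal sketch using Map.newBuilder (here f simply returns the value list, as in the question):

val m0 = Map("a" -> List(1 -> 11, 1 -> 111), "b" -> List(2 -> 22, 2 -> 222))
val b = Map.newBuilder[Int, Int]
for (x <- m0) b ++= x._2 // f(x) = x._2
b.result() // Map(1 -> 111, 2 -> 222)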

How to un-nest a Spark RDD of type (String, scala.collection.immutable.Map[String,scala.collection.immutable.Map[String,Int]])

It's a nested map with contents like this when I print it to the screen:
(5, Map("ABCD" -> Map("3200" -> 3,
                      "3350.800" -> 4,
                      "200.300" -> 3)))
(1, Map("DEF" -> Map("1200" -> 32,
                     "1320.800" -> 4,
                     "2100" -> 3)))
I need to get something like this:
CaseClass(5, ABCD, 3200, 3)
CaseClass(5, ABCD, 3350.800, 4)
CaseClass(5, ABCD, 200.300, 3)
CaseClass(1, DEF, 1200, 32)
CaseClass(1, DEF, 1320.800, 4)
etc.
basically a list of case classes
And map it to a case class object so that I can save it to Cassandra.
I have tried flatMapValues, but that un-nests the map only one level. I also used flatMap; that doesn't work either, or I'm making mistakes.
Any suggestions?
Fairly straightforward using a for-comprehension and some pattern matching to destructure things:
val in = List((5, Map("ABCD" -> Map("3200" -> 3, "3350.800" -> 4, "200.300" -> 3))),
              (1, Map("DEF" -> Map("1200" -> 32, "1320.800" -> 4, "2100" -> 3))))

case class Thing(a: Int, b: String, c: String, d: Int)

for { (index, m) <- in
      (k, v) <- m
      (innerK, innerV) <- v }
yield Thing(index, k, innerK, innerV)
//> res0: List[maps.maps2.Thing] = List(Thing(5,ABCD,3200,3),
// Thing(5,ABCD,3350.800,4),
// Thing(5,ABCD,200.300,3),
// Thing(1,DEF,1200,32),
// Thing(1,DEF,1320.800,4),
// Thing(1,DEF,2100,3))
So let's pick apart the for-comprehension.
(index, m) <- in
This is the same as
t <- in
(index, m) = t
In the first line t will successively be set to each element of in.
t is therefore a tuple (Int, Map(...))
Pattern matching lets us put that "pattern" for the tuple on the left-hand side, and the compiler picks apart the tuple, setting index to the Int and m to the Map.
(k, v) <- m
As before this is equivalent to
u <- m
(k, v) = u
And this time u takes each element of the Map, which again are key/value tuples. So k is set successively to each key and v to the corresponding value.
And v is your inner map, so we do the same thing again with the inner map:
(innerK, innerV) <- v}
Now we have everything we need to create the case class. yield just says make a collection of whatever is "yielded" each time through the loop.
yield Thing(index, k, innerK, innerV)
Under the hood, this just translates to a set of maps/flatmaps
The yield is just the value Thing(index, k, innerK, innerV)
We get one of those for each element of v
v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) }
but there's an inner map per element of the outer map
m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } }
(flatMap because we get a List of Lists if we just did a map and we want to flatten it to just the list of items)
Similarly, we do one of those for every element in the List
in.flatMap { z => val (index, m) = z; m.flatMap { y => val (k, v) = y; v.map { x => val (innerK, innerV) = x; Thing(index, k, innerK, innerV) } } }
Let's do that in _1, _2 style:
in.flatMap(z => z._2.flatMap(y => y._2.map(x => Thing(z._1, y._1, x._1, x._2))))
which produces exactly the same result. But isn't it clearer as a for-comprehension?
You can do it like this if you prefer collection operations:
case class Record(v1: Int, v2: String, v3: Double, v4: Int)

val data = List(
  (5, Map("ABC" -> Map(3200.0 -> 3, 3350.800 -> 4, 200.300 -> 3))),
  (1, Map("DEF" -> Map(1200.0 -> 32, 1320.800 -> 4, 2100.0 -> 3)))
)

val rdd = sc.parallelize(data)
val result = rdd.flatMap(p => {
  p._2.toList
    .flatMap(q => q._2.toList.map(l => (q._1, l)))
    .map((p._1, _))
}).map(p => Record(p._1, p._2._1, p._2._2._1, p._2._2._2))
println(result.collect.toList)
//List(
// Record(5,ABC,3200.0,3),
// Record(5,ABC,3350.8,4),
// Record(5,ABC,200.3,3),
// Record(1,DEF,1200.0,32),
// Record(1,DEF,1320.8,4),
// Record(1,DEF,2100.0,3)
//)

In Scala, apply function to values for some keys in immutable map

Let an immutable map
val m = (0 to 3).map {x => (x,x*10) }.toMap
m: scala.collection.immutable.Map[Int,Int] = Map(0 -> 0, 1 -> 10, 2 -> 20, 3 -> 30)
a collection of keys of interest
val k = Set(0,2)
and a function
def f(i:Int) = i + 1
How to apply f onto the values in the map mapped by the keys of interest so that the resulting map would be
Map(0 -> 1, 1 -> 10, 2 -> 21, 3 -> 30)
m.transform{ (key, value) => if (k(key)) f(value) else value }
That's the first thing that popped into my mind, but I am pretty sure that in Scala you can do it prettier:
m.map { e =>
  if (k.contains(e._1)) e._1 -> f(e._2) else e._1 -> e._2
}
A variation of @regis-jean-gilles' answer, using map and pattern matching:
m.map { case a @ (key, value) => if (k(key)) key -> f(value) else a }
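Either way, applied to the example above, the result is:

Map(0 -> 1, 1 -> 10, 2 -> 21, 3 -> 30)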