Combine values with same keys in Scala - scala

I currently have 2 lists List('a','b','a') and List(45,65,12) with many more elements and elements in 2nd list linked to elements in first list by having a key value relationship. I want combine elements with same keys by adding their corresponding values and create a map which should look like Map('a'-> 57,'b'->65) as 57 = 45 + 12.
I have currently implemented it as
val keys = List('a','b','a')
val values = List(45,65,12)
val finalMap:Map(char:Int) =
scala.collection.mutable.Map().withDefaultValue(0)
0 until keys.length map (w => finalMap(keys(w)) += values(w))
I feel that there should be a better way(functional way) of creating the desired map than how I am doing it. How could I improve my code and do the same thing in more functional way?

val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so
keys.zip(values) = List((a, 45), (b, 65), (a, 12))
Now you want to group together all the pairs with the same first element. This can be done with groupBy:
keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65))))
groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.
The keys of this map are the characters in keys, and the values are a list of associated pair from keys and values. Since the keys are the ones you want in the output map, you only need to transform the values from List[Char, Int] to List[Int].
You can do this by summing the values from the second element of each pair in the list.
You can extract the values from each pair using map e.g.
List((a, 45), (a, 12)).map(_._2) = List(45,12)
Now you can sum these values using sum:
List(45, 12).sum = 57
You can apply this transform to all the values in the map using mapValues to get the result you want.

I was going to +1 Lee's first version, but mapValues is a view, and ell always looks like one to me. Just not to seem petty.
scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) }
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.
I'm going to +1 Lee's typing speed anyway.
Edit: to explain how mapValues is a view:
scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v =>
| println("OK mapping")
| v._2
| }.sum)
OK mapping
OK mapping
OK mapping
res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
scala> res2('a') // recomputes
OK mapping
OK mapping
res4: Int = 57
Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.

You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:
( Map[Char,Int]() /: (keys,values).zipped ) ( (m,kv) =>
m + ( kv._1 -> ( m.getOrElse( kv._1, 0 ) + kv._2 ) )
)
Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.

Related

Merge entries in Map without using loop in scala

Could someone suggest a best way to merge entries in map for below use case in scala (possibly without using loop)?
From
val map1 = Map(((1,"case0")->List(1,2,3)), ((2,"case0")->List(3,4,5)), ((1,"case1")->List(2,4,6)), ((2,"case1")->List(3)))
To
Map(((1,"nocase")->List(2)), ((2,"nocase")->List(3)))
You can do it as follows:
map1.groupBy(_._1._1).map{case (key, elements) => ((key, "nocase"), elements.values.reduce(_ intersect _ ))}
With the group you group the elements by the first element of the key, then with the map, you build the new key, with the "nocase" string as in your example. With elements.value you get all the elements for the given keys and you can reduce them with the intersect you get the expected output
Output:
Map[(Int, String),List[Int]] = Map((2,nocase) -> List(3), (1,nocase) -> List(2))

How to create a map from a RDD[String] using scala?

My file is,
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
Here there are 7 rows & 5 columns(0,1,2,3,4)
I want the output as,
Map(0 -> Set("sunny","overcast","rainy"))
Map(1 -> Set("hot","mild","cool"))
Map(2 -> Set("high","normal"))
Map(3 -> Set("false","true"))
Map(4 -> Set("yes","no"))
The output must be the type of [Map[Int,Set[String]]]
EDIT: Rewritten to present the map-reduce version first, as it's more suited to Spark
Since this is Spark, we're probably interested in parallelism/distribution. So we need to take care to enable that.
Splitting each string into words can be done in partitions. Getting the set of values used in each column is a bit more tricky - the naive approach of initialising a set then adding every value from every row is inherently serial/local, since there's only one set (per column) we're adding the value from each row to.
However, if we have the set for some part of the rows and the set for the rest, the answer is just the union of these sets. This suggests a reduce operation where we merge sets for some subset of the rows, then merge those and so on until we have a single set.
So, the algorithm:
Split each row into an array of strings, then change this into an
array of sets of the single string value for each column - this can
all be done with one map, and distributed.
Now reduce this using an
operation that merges the set for each column in turn. This also can
be distributed
turn the single row that results into a Map
It's no coincidence that we do a map, then a reduce, which should remind you of something :)
Here's a one-liner that produces the single row:
val data = List(
"sunny,hot,high,FALSE,no",
"sunny,hot,high,TRUE,no",
"overcast,hot,high,FALSE,yes",
"rainy,mild,high,FALSE,yes",
"rainy,cool,normal,FALSE,yes",
"rainy,cool,normal,TRUE,no",
"overcast,cool,normal,TRUE,yes")
val row = data.map(_.split("\\W+").map(s=>Set(s)))
.reduce{(a, b) => (a zip b).map{case (l, r) => l ++ r}}
Converting it to a Map as the question asks:
val theMap = row.zipWithIndex.map(_.swap).toMap
Zip the list with the index, since that's what we need as the key of
the map.
The elements of each tuple are unfortunately in the wrong
order for .toMap, so swap them.
Then we have a list of (key, value)
pairs which .toMap will turn into the desired result.
These don't need to change AT ALL to work with Spark. We just need to use a RDD, instead of the List. Let's convert data into an RDD just to demo this:
val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc= new SparkContext(conf)
val rdd = sc.makeRDD(data)
val row = rdd.map(_.split("\\W+").map(s=>Set(s)))
.reduce{(a, b) => (a zip b).map{case (l, r) => l ++ r}}
(This can be converted into a Map as before)
An earlier oneliner works neatly (transpose is exactly what's needed here) but is very difficult to distribute (transpose inherently needs to visit every row)
data.map(_.split("\\W+")).transpose.map(_.toSet)
(Omitting the conversion to Map for clarity)
Split each string into words.
Transpose the result, so we have a list that has a list of the first words, then a list of the second words, etc.
Convert each of those to a set.
Maybe this do the trick:
val a = Array(
"sunny,hot,high,FALSE,no",
"sunny,hot,high,TRUE,no",
"overcast,hot,high,FALSE,yes",
"rainy,mild,high,FALSE,yes",
"rainy,cool,normal,FALSE,yes",
"rainy,cool,normal,TRUE,no",
"overcast,cool,normal,TRUE,yes")
val b = new Array[Map[String, Set[String]]](5)
for (i <- 0 to 4)
b(i) = Map(i.toString -> (Set() ++ (for (s <- a) yield s.split(",")(i))) )
println(b.mkString("\n"))

Scala - Sort a Map based on Tuple Values

I am attempting to sort a Map based on Value Tuples
val data = Map("ip-11-254-25-225:9000" -> (1, 1413669308124L),
"ip-11-232-145-172:9000" -> (0, 1413669141265L),
"ip-11-232-132-31:9000" -> (0, 1413669128111L),
"ip-11-253-67-184:9000" -> (0, 1413669134073L),
"ip-11-232-142-77:9000" -> (0, 1413669139043L))
The criteria of sort should be based on both the values in the tuple
I tried
SortedMap[String, (Long,Long)]() ++ data
but with no success.
Can someone please suggest a better way to sort first on tuple._1 and then on tuple._2
In general you can select which element in the tuple has priority in the sorting; consider this
data.toSeq.sortBy {case (k,(a,b)) => (k,a,b) }
In the case pattern we extract each element of the (nested) tuples. In the expression above we sort by key, then by first element in value tuple and then second. On the other hand, this
data.toSeq.sortBy {case (k,(a,b)) => (k,b) }
will sort by key and last element in value tuple, and this
data.toSeq.sortBy {case (k,(a,b)) => (a,b) }
by the values of the map; this one
data.toSeq.sortBy {case (k,(a,b)) => (b) }
will sort by the last element in the values tuple.
Update As helpfully pointed out by #Paul a Map preserves no ordering, hence the result remains here as a sequence.
Is this what you're looking for?
data.toVector.sortBy(_._2)
This will sort the entries by the values (the tuples), where the order depends on both tuple arguments. The default sort behavior for a tuple is to sort on _1 and then _2:
Vector((2,1), (1,2), (1,3), (1,1)).sorted
// Vector((1,1), (1,2), (1,3), (2,1))
Sorting Tuples
First note that if you sort a collection of tuples you will get the same result as you expect, i.e. first items will be compared first and then second items.
For example (a1, b1) > (a2, b2) if and only if (a1 > a2) || ((a1 == a2) && (b1 > b2)).
So for getting the expected result in terms of sorting tuples you don't need to do anything.
Then it remains to how to sort a map based on values and how to preserve the order after sorting.
Sort map on values
You can use the sortBy method of List and then use an ordered data structure to preserve the ordering, like this:
new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
If you run this in Scala REPL you will get the follwoing result:
scala> new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
res3: scala.collection.immutable.ListMap[String,(Int, Long)] = Map(ip-11-232-132-31:9000 -> (0,1413669128111), ip-11-253-67-184:9000 -> (0,1413669134073), ip-11-232-142-77:9000 -> (0,1413669139043), ip-11-232-145-172:9000 -> (0,1413669141265), ip-11-254-25-225:9000 -> (1,1413669308124))
If you simply want to sort them and traverse the result (i.e. if you don't need a map as result) you don't even need to use ListMap.

In scala, how to get an array of keys and values from map, with the correct order (i-th key is for the i-th value)?

say I have a map in Scala - key: String, value: String.
Is there an easy way to get the arrays of keys and values in corresponding order? E.g. the i-th element of the key array should be the key related to the i-th value of the values array.
What I've tried is iterating through the map and getting them one by one:
valuesMap.foreach{keyVal => keys.append(keyVal.1); values.append(keyVal.2); // the idea, not the actual code
Is there a simplier way?
The question could probably be asked: is there any way to guarantee a specific order of map.keys/map.values?
For example, when generating an SQL query it may be convenient to have arrays of column names and values separately, but with the same order.
You can use toSeq to get a sequence of pairs that has a fixed order, and then you can sort by the key (or any other function of the pairs). For example:
scala> val pairs = Map(3 -> 'c, 1 -> 'a, 4 -> 'd, 2 -> 'b).toSeq.sortBy(_._1)
res0: Seq[(Int, Symbol)] = ArrayBuffer((1,'a), (2,'b), (3,'c), (4,'d))
Now you can use unzip to turn this sequence of pairs into a pair of sequences:
scala> val (keys, vals) = pairs.unzip
keys: Seq[Int] = ArrayBuffer(1, 2, 3, 4)
vals: Seq[Symbol] = ArrayBuffer('a, 'b, 'c, 'd)
The keys and values will line up, and you've only performed the sorting operation once.
If all you want is that key- and value-lists line up, then it's really easy:
val (keys, vals) = yourMap.toSeq.unzip
All the information of the original map is preserved the ordering. You can get it back like so
val originalMap = (keys zip values).toMap
assert(originalMap == yourMap)

How to find the number of (key , value) pairs in a map in scala?

I need to find the number of (key , value) pairs in a Map in my Scala code. I can iterate through the map and get an answer but I wanted to know if there is any direct function for this purpose or not.
you can use .size
scala> val m=Map("a"->1,"b"->2,"c"->3)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
scala> m.size
res3: Int = 3
Use Map#size:
The size of this traversable or iterator.
The size method is from TraversableOnce so, barring infinite sequences or sequences that shouldn't be iterated again, it can be used over a wide range - List, Map, Set, etc.