In Scala, how to get an array of keys and an array of values from a map, in corresponding order (the i-th key is for the i-th value)?

Say I have a map in Scala - key: String, value: String.
Is there an easy way to get the arrays of keys and values in corresponding order? E.g. the i-th element of the key array should be the key related to the i-th value of the values array.
What I've tried is iterating through the map and getting them one by one:
valuesMap.foreach { keyVal => keys.append(keyVal._1); values.append(keyVal._2) } // the idea, not the actual code
Is there a simpler way?
The question could probably be asked: is there any way to guarantee a specific order of map.keys/map.values?
For example, when generating an SQL query it may be convenient to have arrays of column names and values separately, but with the same order.

You can use toSeq to get a sequence of pairs that has a fixed order, and then you can sort by the key (or any other function of the pairs). For example:
scala> val pairs = Map(3 -> 'c, 1 -> 'a, 4 -> 'd, 2 -> 'b).toSeq.sortBy(_._1)
pairs: Seq[(Int, Symbol)] = ArrayBuffer((1,'a), (2,'b), (3,'c), (4,'d))
Now you can use unzip to turn this sequence of pairs into a pair of sequences:
scala> val (keys, vals) = pairs.unzip
keys: Seq[Int] = ArrayBuffer(1, 2, 3, 4)
vals: Seq[Symbol] = ArrayBuffer('a, 'b, 'c, 'd)
The keys and values will line up, and you've only performed the sorting operation once.

If all you want is that key- and value-lists line up, then it's really easy:
val (keys, vals) = yourMap.toSeq.unzip
All the information of the original map is preserved, apart from the ordering. You can get it back like so:
val originalMap = (keys zip vals).toMap
assert(originalMap == yourMap)
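As a usage sketch of the SQL scenario from the question (the table, column names, and values here are hypothetical, and assume the values are already valid SQL literals):
val row = Map("name" -> "'Alice'", "city" -> "'Paris'") // hypothetical column -> literal map
val (cols, vs) = row.toSeq.unzip                        // cols(i) always corresponds to vs(i)
val query = s"INSERT INTO users (${cols.mkString(", ")}) VALUES (${vs.mkString(", ")})"
// e.g. INSERT INTO users (name, city) VALUES ('Alice', 'Paris')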

Related

Spark RDD - CountByValue - Map type - order by key

From a Spark RDD, countByValue returns a Map, and I want to sort it by key, ascending or descending.
val s = flightsObjectRDD.map(f => (f.dep_delay / 60).toInt).countByValue() // countByValue is an action and returns a Map to the driver
s.toSeq.sortBy(_._1)
s.toSeq.sortBy(_._1)
The above code works as expected. But can countByValue itself do the sorting implicitly? How can I implement it that way?
You leave the Big Data realm here and get into Scala itself, and into all those structures that are immutable, sorted, hashed or mutable, or a combination of these. I think that is the reason for the initial -1. Nice folks out there, anyway.
Take this example: countByValue returns a Map to the driver, so it is only of interest for small amounts of data. A Map is also a collection of (key, value) pairs, but hashed and immutable, so we need to manipulate it. This is what you can do. First, you can sort the Map on the key in ascending order.
val rdd1 = sc.parallelize(Seq(("HR",5),("RD",4),("ADMIN",5),("SALES",4),("SER",6),("MAN",8),("MAN",8),("HR",5),("HR",6),("HR",5)))
val map = rdd1.countByValue
import scala.collection.immutable.ListMap
val res1 = ListMap(map.toSeq.sortBy(_._1):_*) // ascending sort on the key part of the Map
res1: scala.collection.immutable.ListMap[(String, Int),Long] = Map((ADMIN,5) -> 1, (HR,5) -> 3, (HR,6) -> 1, (MAN,8) -> 2, (RD,4) -> 1, (SALES,4) -> 1, (SER,6) -> 1)
However, you cannot apply reverse or descending logic to the key and keep a plain Map, as it is hash-based. The next best thing is as follows:
val res2 = map.toList.sortBy(_._1).reverse
val res22 = map.toSeq.sortBy(_._1).reverse
res2: List[((String, Int), Long)] = List(((SER,6),1), ((SALES,4),1), ((RD,4),1), ((MAN,8),2), ((HR,6),1), ((HR,5),3), ((ADMIN,5),1))
res22: Seq[((String, Int), Long)] = ArrayBuffer(((SER,6),1), ((SALES,4),1), ((RD,4),1), ((MAN,8),2), ((HR,6),1), ((HR,5),3), ((ADMIN,5),1))
But you cannot apply .toMap to the .reverse result here, as it will hash the entries and lose the sort order. So you must make a compromise.
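To illustrate that point with the map from the example above: converting the reversed sequence back to a plain Map re-hashes it, while the same ListMap trick used for the ascending case should keep the descending insertion order (a sketch, not from the original answer):
val sortedDesc = map.toSeq.sortBy(_._1).reverse // descending by key; the Seq keeps this order
val rehashed = sortedDesc.toMap                 // plain Map: iteration order is hash-based again
val res3 = ListMap(sortedDesc:_*)               // ListMap preserves the descending insertion order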

Sorting three lists using the ordering from one of them in Scala

Given three distinct lists of the same length, I want to sort all three of them, using the ordering from one of them. For example, for the given three lists:
val a = Seq(2, 1, 3)
val b = Seq("Hi", "there", "world")
val c = Seq(1.0, 2.0, 3.0)
...and assuming that we sort by the ordering from a, I want the result to look like this:
aSorted: Seq[Int] = List(1, 2, 3) // Sorted by its own order
bSorted: Seq[String] = List("there", "Hi", "world") // Reordered the same way as aSorted
cSorted: Seq[Double] = List(2.0, 1.0, 3.0) // Reordered the same way as aSorted
All functions from Sorting appear to work on sequences without any way to specify a swap operation. So do I have to resort to writing my own code to sort? Or should I implement some custom sequence type? If so, how?
You can do this pretty cleanly with zip, sortBy, and unzip.
val (aSorted, pair) = a.zip(b.zip(c)).sortBy(_._1).unzip
val (bSorted, cSorted) = pair.unzip
zip takes two sequences and returns a sequence of pairs (dropping any extra elements if the lengths don't match). This means b.zip(c) is a sequence of (String, Double) elements, and a.zip(b.zip(c)) is a sequence of (Int, (String, Double)).
We can then use sortBy(_._1) to sort this sequence by the elements from the first sequence.
Lastly unzip just undoes zip, turning a sequence of (Int, (String, Double)) into a pair of sequences—one of Int elements and one of (String, Double) elements. Then we just do the same operation again on the second of these two sequences, and you've got the result you want.
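A quick sketch putting this together with the lists from the question:
val a = Seq(2, 1, 3)
val b = Seq("Hi", "there", "world")
val c = Seq(1.0, 2.0, 3.0)
val (aSorted, pair) = a.zip(b.zip(c)).sortBy(_._1).unzip // sort all three by a's ordering
val (bSorted, cSorted) = pair.unzip
// aSorted == Seq(1, 2, 3); bSorted == Seq("there", "Hi", "world"); cSorted == Seq(2.0, 1.0, 3.0)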

Scala - Sort a Map based on Tuple Values

I am attempting to sort a Map based on Value Tuples
val data = Map("ip-11-254-25-225:9000" -> (1, 1413669308124L),
"ip-11-232-145-172:9000" -> (0, 1413669141265L),
"ip-11-232-132-31:9000" -> (0, 1413669128111L),
"ip-11-253-67-184:9000" -> (0, 1413669134073L),
"ip-11-232-142-77:9000" -> (0, 1413669139043L))
The sort criteria should be based on both values in the tuple.
I tried
SortedMap[String, (Long,Long)]() ++ data
but with no success.
Can someone please suggest a better way to sort first on tuple._1 and then on tuple._2?
In general you can select which element in the tuple has priority in the sorting; consider this
data.toSeq.sortBy {case (k,(a,b)) => (k,a,b) }
In the case pattern we extract each element of the (nested) tuples. In the expression above we sort by the key, then by the first element in the value tuple, and then by the second. On the other hand, this
data.toSeq.sortBy {case (k,(a,b)) => (k,b) }
will sort by key and last element in value tuple, and this
data.toSeq.sortBy {case (k,(a,b)) => (a,b) }
by the values of the map; this one
data.toSeq.sortBy {case (k,(a,b)) => (b) }
will sort by the last element in the values tuple.
Update: as helpfully pointed out by @Paul, a Map preserves no ordering, hence the result remains a sequence here.
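For the asker's data, the variant that sorts by the value tuple, case (k,(a,b)) => (a,b), gives roughly this (a sketch of the expected output):
data.toSeq.sortBy { case (k, (a, b)) => (a, b) }
// Seq((ip-11-232-132-31:9000,(0,1413669128111)), (ip-11-253-67-184:9000,(0,1413669134073)),
//     (ip-11-232-142-77:9000,(0,1413669139043)), (ip-11-232-145-172:9000,(0,1413669141265)),
//     (ip-11-254-25-225:9000,(1,1413669308124)))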
Is this what you're looking for?
data.toVector.sortBy(_._2)
This will sort the entries by the values (the tuples), where the order depends on both tuple arguments. The default sort behavior for a tuple is to sort on _1 and then _2:
Vector((2,1), (1,2), (1,3), (1,1)).sorted
// Vector((1,1), (1,2), (1,3), (2,1))
Sorting Tuples
First, note that if you sort a collection of tuples you get the result you would expect: first items are compared first, and then second items.
For example (a1, b1) > (a2, b2) if and only if (a1 > a2) || ((a1 == a2) && (b1 > b2)).
So for getting the expected result in terms of sorting tuples you don't need to do anything.
Then it remains to how to sort a map based on values and how to preserve the order after sorting.
Sort map on values
You can use the sortBy method of List and then use an ordered data structure to preserve the ordering, like this:
new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
If you run this in the Scala REPL you will get the following result:
scala> new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
res3: scala.collection.immutable.ListMap[String,(Int, Long)] = Map(ip-11-232-132-31:9000 -> (0,1413669128111), ip-11-253-67-184:9000 -> (0,1413669134073), ip-11-232-142-77:9000 -> (0,1413669139043), ip-11-232-145-172:9000 -> (0,1413669141265), ip-11-254-25-225:9000 -> (1,1413669308124))
If you simply want to sort them and traverse the result (i.e. if you don't need a map as the result), you don't even need to use ListMap.
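For instance, a minimal sketch of that last case, sorting by the value tuples and simply traversing the result:
data.toList.sortBy(_._2).foreach { case (host, (n, ts)) => println(s"$host -> ($n, $ts)") }
// prints the entries in ascending order of the (Int, Long) value tuples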

How to find the number of (key , value) pairs in a map in scala?

I need to find the number of (key , value) pairs in a Map in my Scala code. I can iterate through the map and get an answer but I wanted to know if there is any direct function for this purpose or not.
You can use .size:
scala> val m=Map("a"->1,"b"->2,"c"->3)
m: scala.collection.immutable.Map[String,Int] = Map(a -> 1, b -> 2, c -> 3)
scala> m.size
res3: Int = 3
Use Map#size:
The size of this traversable or iterator.
The size method is from TraversableOnce so, barring infinite sequences or sequences that shouldn't be iterated again, it can be used over a wide range - List, Map, Set, etc.
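For instance (a trivial sketch):
List(1, 2, 3).size           // 3
Set("a", "b").size           // 2
Map(1 -> "x", 2 -> "y").size // 2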

Combine values with same keys in Scala

I currently have two lists, List('a','b','a') and List(45,65,12), with many more elements; elements in the second list are linked to elements in the first list by a key-value relationship. I want to combine elements with the same keys by adding their corresponding values, to create a map which should look like Map('a' -> 57, 'b' -> 65), since 57 = 45 + 12.
I have currently implemented it as
val keys = List('a','b','a')
val values = List(45,65,12)
val finalMap: scala.collection.mutable.Map[Char, Int] =
  scala.collection.mutable.Map().withDefaultValue(0)
0 until keys.length foreach (w => finalMap(keys(w)) += values(w))
I feel that there should be a better (more functional) way of creating the desired map than how I am doing it. How could I improve my code and do the same thing in a more functional way?
val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so
keys.zip(values) = List((a, 45), (b, 65), (a, 12))
Now you want to group together all the pairs with the same first element. This can be done with groupBy:
keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65))))
groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.
The keys of this map are the characters in keys, and the values are a list of the associated pairs from keys and values. Since the keys are the ones you want in the output map, you only need to transform the values from List[(Char, Int)] to List[Int].
You can do this by summing the values from the second element of each pair in the list.
You can extract the values from each pair using map e.g.
List((a, 45), (a, 12)).map(_._2) = List(45,12)
Now you can sum these values using sum:
List(45, 12).sum = 57
You can apply this transform to all the values in the map using mapValues to get the result you want.
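For the lists in the question, the whole expression works out as follows (a quick check, matching the result shown in the next answer):
keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
// Map(a -> 57, b -> 65) -- the key order within the Map is not guaranteed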
I was going to +1 Lee's first version, but mapValues is a view, and a lowercase l (ell) always looks like a 1 to me. Just so as not to seem petty:
scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) }
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.
I'm going to +1 Lee's typing speed anyway.
Edit: to explain how mapValues is a view:
scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v =>
| println("OK mapping")
| v._2
| }.sum)
OK mapping
OK mapping
OK mapping
res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
scala> res2('a') // recomputes
OK mapping
OK mapping
res4: Int = 57
Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.
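If the view behaviour is not what you want, one way to get a strict Map is to map over the grouped pairs directly instead of going through mapValues (a sketch; behaves the same across Scala versions):
val strict: Map[Char, Int] =
  keys.zip(values).groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2).sum }
// each sum is computed once and stored, so strict('a') does not recompute anything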
You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:
( Map[Char,Int]() /: (keys,values).zipped ) ( (m,kv) =>
m + ( kv._1 -> ( m.getOrElse( kv._1, 0 ) + kv._2 ) )
)
Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.
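For reference, here is the same fold written with foldLeft instead of the /: symbol (a sketch equivalent to the version above, using zip for simplicity at the cost of the intermediate list the answer mentions):
val combined = keys.zip(values).foldLeft(Map[Char, Int]()) { (m, kv) =>
  m + (kv._1 -> (m.getOrElse(kv._1, 0) + kv._2))
}
// combined == Map('a' -> 57, 'b' -> 65)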