Scala - Sort a Map based on Tuple Values - scala

I am attempting to sort a Map by its values, which are tuples:
val data = Map("ip-11-254-25-225:9000" -> (1, 1413669308124L),
"ip-11-232-145-172:9000" -> (0, 1413669141265L),
"ip-11-232-132-31:9000" -> (0, 1413669128111L),
"ip-11-253-67-184:9000" -> (0, 1413669134073L),
"ip-11-232-142-77:9000" -> (0, 1413669139043L))
The sort should take both values in the tuple into account.
I tried
SortedMap[String, (Long,Long)]() ++ data
but with no success.
Can someone please suggest a better way to sort first on tuple._1 and then on tuple._2?

In general you can select which element in the tuple has priority in the sorting; consider this
data.toSeq.sortBy {case (k,(a,b)) => (k,a,b) }
In the case pattern we extract each element of the (nested) tuples. In the expression above we sort by key, then by first element in value tuple and then second. On the other hand, this
data.toSeq.sortBy {case (k,(a,b)) => (k,b) }
will sort by key and last element in value tuple, and this
data.toSeq.sortBy {case (k,(a,b)) => (a,b) }
by the values of the map; this one
data.toSeq.sortBy {case (k,(a,b)) => (b) }
will sort by the last element in the values tuple.
Update: As helpfully pointed out by @Paul, a Map preserves no ordering, hence the result remains here as a sequence.
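As a minimal runnable sketch of the third variant above (sort on the first tuple element, then on the second, ignoring the key), using the data from the question. The object name SortByValueTuple and the value name byValue are only illustrative:

```scala
object SortByValueTuple {
  val data: Map[String, (Int, Long)] = Map(
    "ip-11-254-25-225:9000"  -> (1, 1413669308124L),
    "ip-11-232-145-172:9000" -> (0, 1413669141265L),
    "ip-11-232-132-31:9000"  -> (0, 1413669128111L),
    "ip-11-253-67-184:9000"  -> (0, 1413669134073L),
    "ip-11-232-142-77:9000"  -> (0, 1413669139043L)
  )

  // Sort on the first tuple element, then on the second; the key is ignored.
  val byValue: Seq[(String, (Int, Long))] =
    data.toSeq.sortBy { case (_, (a, b)) => (a, b) }

  def main(args: Array[String]): Unit = byValue.foreach(println)
}
```

The result is a Seq of pairs rather than a Map, so the order survives.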

Is this what you're looking for?
data.toVector.sortBy(_._2)
This will sort the entries by the values (the tuples), where the order depends on both tuple arguments. The default sort behavior for a tuple is to sort on _1 and then _2:
Vector((2,1), (1,2), (1,3), (1,1)).sorted
// Vector((1,1), (1,2), (1,3), (2,1))

Sorting Tuples
First note that if you sort a collection of tuples you will get the same result as you expect, i.e. first items will be compared first and then second items.
For example (a1, b1) > (a2, b2) if and only if (a1 > a2) || ((a1 == a2) && (b1 > b2)).
So for getting the expected result in terms of sorting tuples you don't need to do anything.
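To make the comparison rule concrete, here is a small sketch that checks the hand-written rule against the built-in tuple Ordering on a few sample pairs. The names TupleOrderingDemo, greaterByHand and samples are made up for this illustration:

```scala
object TupleOrderingDemo {
  // The implicit Ordering that sorted / sortBy pick up for (Int, Long) tuples.
  val ord: Ordering[(Int, Long)] = Ordering[(Int, Long)]

  // The rule from the text, written out by hand.
  def greaterByHand(x: (Int, Long), y: (Int, Long)): Boolean =
    (x._1 > y._1) || ((x._1 == y._1) && (x._2 > y._2))

  val samples: Seq[(Int, Long)] = Seq((0, 5L), (1, 1L), (0, 9L), (1, 1L))

  // Compare every pair of samples both ways: built-in ordering vs. hand-written rule.
  val agree: Boolean =
    (for (x <- samples; y <- samples) yield ord.gt(x, y) == greaterByHand(x, y))
      .forall(identity)

  def main(args: Array[String]): Unit = {
    println(agree)
    println(samples.sorted)
  }
}
```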
What remains is how to sort a map by its values and how to preserve the order after sorting.
Sort map on values
You can use the sortBy method of List and then use an ordered data structure to preserve the ordering, like this:
new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
If you run this in the Scala REPL you will get the following result:
scala> new scala.collection.immutable.ListMap() ++ data.toList.sortBy(_._2)
res3: scala.collection.immutable.ListMap[String,(Int, Long)] = Map(ip-11-232-132-31:9000 -> (0,1413669128111), ip-11-253-67-184:9000 -> (0,1413669134073), ip-11-232-142-77:9000 -> (0,1413669139043), ip-11-232-145-172:9000 -> (0,1413669141265), ip-11-254-25-225:9000 -> (1,1413669308124))
If you simply want to sort them and traverse the result (i.e. if you don't need a map as result) you don't even need to use ListMap.
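As an alternative sketch of the same idea, the sorted pairs can be passed straight to the ListMap companion instead of concatenating onto an empty ListMap. The shortened host names and small numbers below are placeholders, not the data from the question:

```scala
import scala.collection.immutable.ListMap

object SortedListMapDemo {
  // Placeholder data with the same shape as in the question.
  val data: Map[String, (Int, Long)] =
    Map("host-a" -> (1, 30L), "host-b" -> (0, 20L), "host-c" -> (0, 10L))

  // Sort the entries by value: first on _1 of the tuple, then on _2.
  val sortedPairs: List[(String, (Int, Long))] = data.toList.sortBy(_._2)

  // ListMap iterates in insertion order, so the sorted order is kept.
  val ordered: ListMap[String, (Int, Long)] = ListMap(sortedPairs: _*)

  def main(args: Array[String]): Unit = ordered.foreach(println)
}
```

If only a traversal is needed, sortedPairs alone is enough and the ListMap step can be dropped.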

Related

Merge entries in Map without using loop in scala

Could someone suggest a best way to merge entries in map for below use case in scala (possibly without using loop)?
From
val map1 = Map(((1,"case0")->List(1,2,3)), ((2,"case0")->List(3,4,5)), ((1,"case1")->List(2,4,6)), ((2,"case1")->List(3)))
To
Map(((1,"nocase")->List(2)), ((2,"nocase")->List(3)))
You can do it as follows:
map1.groupBy(_._1._1).map{case (key, elements) => ((key, "nocase"), elements.values.reduce(_ intersect _ ))}
groupBy groups the entries by the first element of the key; map then builds the new key, using the "nocase" string as in your example. elements.values gives you all the lists for a given key, and reducing them with intersect yields the expected output.
Output:
Map[(Int, String),List[Int]] = Map((2,nocase) -> List(3), (1,nocase) -> List(2))
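For completeness, a self-contained version of the above that can be compiled and run as is. The object name MergeByIntersection and the pattern-matching style in groupBy are just one way to write it:

```scala
object MergeByIntersection {
  val map1: Map[(Int, String), List[Int]] = Map(
    (1, "case0") -> List(1, 2, 3),
    (2, "case0") -> List(3, 4, 5),
    (1, "case1") -> List(2, 4, 6),
    (2, "case1") -> List(3)
  )

  val merged: Map[(Int, String), List[Int]] = map1
    // Group the entries by the first component of the key.
    .groupBy { case ((id, _), _) => id }
    // Per group, keep only the numbers common to all lists.
    .map { case (id, entries) => ((id, "nocase"), entries.values.reduce(_ intersect _)) }

  def main(args: Array[String]): Unit = println(merged)
}
```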

Optimize Sorting Iterable Values after grouping in Spark

I have an RDD[(String, (Int, Int))] and need to get the top 10 values (tuples) for each key after sorting. I tried:
val sortedRDD = rdd.groupByKey.mapValues( x => x.toList.sortWith((x,y) => <<sorting logic>>).take(10))
This throws an OutOfMemoryError because the Iterable[(Int, Int)] is very large for some keys. How should I handle this? Is there a way to do this without using groupByKey()?
You should use aggregateByKey instead of groupByKey to perform the sorting and "trimming" (that keeps only top 10) while grouping instead of grouping into potentially-huge groups and only then mapping the result.
Here's how this could look:
// your sorting logic:
val sortingFunction: ((Int, Int), (Int, Int)) => Boolean = ???
val N = 10
val sortedRDD = rdd.aggregateByKey(List[(Int, Int)]())(
  // first function: seqOp, how to add another item of the group to the result
  {
    case (topSoFar, candidate) if topSoFar.size < N => candidate :: topSoFar
    case (topTen, candidate) => (candidate :: topTen).sortWith(sortingFunction).take(N)
  },
  // second function: combOp, how to combine two partial results created by seqOp
  { case (list1, list2) => (list1 ++ list2).sortWith(sortingFunction).take(N) }
)
Notice that per group, we always create values that are 10 items or less.
NOTE: performance can possibly be improved by performing fewer "sort" operations (we sort the same list again and again whenever we add another item / list). To solve that, you can consider using a "sorted set" with a limited capacity (see Limited SortedSet) as the value, so that each addition efficiently adds or discards the new value without sorting.
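One possible sketch of that idea on plain Scala collections, without Spark: keep the partial result sorted at all times and insert each candidate at its position instead of re-sorting. The names insert and merge are hypothetical, N = 3 keeps the example small, and "larger tuples first" is only an assumed stand-in for the unspecified sorting logic:

```scala
object BoundedTopN {
  val N = 3

  // Assumed stand-in for the unspecified sorting logic: larger tuples first.
  val ord: Ordering[(Int, Int)] = Ordering[(Int, Int)].reverse

  // seqOp candidate: insert one item into an already-sorted list, keeping at most N items.
  def insert(top: List[(Int, Int)], candidate: (Int, Int)): List[(Int, Int)] = {
    val (before, after) = top.span(ord.lteq(_, candidate))
    (before ::: candidate :: after).take(N)
  }

  // combOp candidate: merge two sorted partial results, again keeping at most N items.
  def merge(a: List[(Int, Int)], b: List[(Int, Int)]): List[(Int, Int)] =
    b.foldLeft(a)(insert)

  val values: List[(Int, Int)] = List((1, 2), (5, 1), (3, 9), (5, 7), (0, 4))

  // Simulates what happens within one partition for one key.
  val top: List[(Int, Int)] = values.foldLeft(List.empty[(Int, Int)])(insert)

  def main(args: Array[String]): Unit = println(top)
}
```

Functions of this shape could presumably be passed as the two arguments of aggregateByKey in place of the sortWith-based ones, though this sketch has not been run against Spark. Each insertion is linear in N rather than a full sort.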

GroupBy using two conditions in scala

I have List[(Int,Int)]
For ex.
val a= List((1,2), (3,4), (1,3), (4,2), (5,4), (3,8))
I want to perform operation like this:
Take the first element and group using the conditions below:
If the first element of the tuple matches the first element of one of the remaining tuples, include that tuple
or
If the second element of the tuple matches the second element of one of the remaining tuples, include that tuple
Then skip the tuples that have already been included, and repeat the same process for the remaining tuples.
Possible Answer:
val ans = Map((1,2) -> List((1,2),(1,3),(4,2)), (3,4) -> List((3,4),(5,4),(3,8)))
How can I do this?
This seems to work
a.foldLeft(List[((Int, Int), List[(Int, Int)])]()) { (acc, t) =>
  if (acc.exists(_._2.contains(t)))
    acc
  else
    (t, a.filter(u => u != t && (u._1 == t._1 || u._2 == t._2))) :: acc
}.toMap
//> res0: scala.collection.immutable.Map[(Int, Int),List[(Int, Int)]] =
//    Map((3,4) -> List((5,4), (3,8)),
//        (1,2) -> List((1,3), (4,2)))
Go over the list. If this tuple is already in our accumulator, do nothing. Otherwise, filter the list for all tuples that are not the current tuple and share either the first or second element with the current one.
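Note that the result above leaves the seed tuple out of its own group, while the expected answer in the question includes it. A small variant that prepends the seed tuple, as a sketch (the object name GroupBySharedElement is made up here):

```scala
object GroupBySharedElement {
  val a: List[(Int, Int)] = List((1, 2), (3, 4), (1, 3), (4, 2), (5, 4), (3, 8))

  val groups: Map[(Int, Int), List[(Int, Int)]] =
    a.foldLeft(List.empty[((Int, Int), List[(Int, Int)])]) { (acc, t) =>
      // Skip tuples that already belong to an earlier group.
      if (acc.exists(_._2.contains(t))) acc
      // Otherwise start a new group: the seed tuple plus every other tuple
      // sharing its first or second element.
      else (t, t :: a.filter(u => u != t && (u._1 == t._1 || u._2 == t._2))) :: acc
    }.toMap

  def main(args: Array[String]): Unit = println(groups)
}
```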

In scala, how to get an array of keys and values from map, with the correct order (i-th key is for the i-th value)?

Say I have a map in Scala with key: String, value: String.
Is there an easy way to get the arrays of keys and values in corresponding order? E.g. the i-th element of the key array should be the key related to the i-th value of the values array.
What I've tried is iterating through the map and getting them one by one:
valuesMap.foreach { keyVal => keys.append(keyVal._1); values.append(keyVal._2) } // the idea, not the actual code
Is there a simpler way?
The question could probably be asked: is there any way to guarantee a specific order of map.keys/map.values?
For example, when generating an SQL query it may be convenient to have arrays of column names and values separately, but with the same order.
You can use toSeq to get a sequence of pairs that has a fixed order, and then you can sort by the key (or any other function of the pairs). For example:
scala> val pairs = Map(3 -> 'c, 1 -> 'a, 4 -> 'd, 2 -> 'b).toSeq.sortBy(_._1)
pairs: Seq[(Int, Symbol)] = ArrayBuffer((1,'a), (2,'b), (3,'c), (4,'d))
Now you can use unzip to turn this sequence of pairs into a pair of sequences:
scala> val (keys, vals) = pairs.unzip
keys: Seq[Int] = ArrayBuffer(1, 2, 3, 4)
vals: Seq[Symbol] = ArrayBuffer('a, 'b, 'c, 'd)
The keys and values will line up, and you've only performed the sorting operation once.
If all you want is that key- and value-lists line up, then it's really easy:
val (keys, vals) = yourMap.toSeq.unzip
All the information of the original map is preserved in the ordering. You can get it back like so:
val originalMap = (keys zip vals).toMap
assert(originalMap == yourMap)
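Putting both answers together, a minimal sketch for the SQL-style use case from the question. The column names and values are invented for illustration, and sorting by key is optional:

```scala
object UnzipMapDemo {
  // Hypothetical column -> value pairs.
  val columns: Map[String, String] =
    Map("name" -> "Ada", "city" -> "London", "role" -> "engineer")

  // Fix an order (here: by key), then split into two aligned sequences.
  val (keys, vals) = columns.toSeq.sortBy(_._1).unzip

  // Zipping the two sequences back together restores the original map.
  val rebuilt: Map[String, String] = keys.zip(vals).toMap

  def main(args: Array[String]): Unit = {
    println(keys.mkString(", "))
    println(vals.mkString(", "))
  }
}
```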

Combine values with same keys in Scala

I currently have two lists, List('a','b','a') and List(45,65,12), with many more elements. Each element of the second list is linked to the element at the same position in the first list in a key-value relationship. I want to combine elements with the same key by adding their corresponding values, creating a map that looks like Map('a' -> 57, 'b' -> 65), since 57 = 45 + 12.
I have currently implemented it as
val keys = List('a','b','a')
val values = List(45,65,12)
val finalMap = scala.collection.mutable.Map[Char, Int]().withDefaultValue(0)
(0 until keys.length) foreach (w => finalMap(keys(w)) += values(w))
I feel that there should be a better (more functional) way of creating the desired map than what I am doing. How could I improve my code and do the same thing in a more functional way?
val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so
keys.zip(values) = List((a, 45), (b, 65), (a, 12))
Now you want to group together all the pairs with the same first element. This can be done with groupBy:
keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65))))
groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.
The keys of this map are the characters in keys, and the values are lists of the associated pairs from keys and values. Since the keys are the ones you want in the output map, you only need to transform the values from List[(Char, Int)] to List[Int].
You can do this by summing the values from the second element of each pair in the list.
You can extract the values from each pair using map e.g.
List((a, 45), (a, 12)).map(_._2) = List(45,12)
Now you can sum these values using sum:
List(45, 12).sum = 57
You can apply this transform to all the values in the map using mapValues to get the result you want.
I was going to +1 Lee's first version, but mapValues is a view, and ell always looks like one to me. Just not to seem petty.
scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) }
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.
I'm going to +1 Lee's typing speed anyway.
Edit: to explain how mapValues is a view:
scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v =>
| println("OK mapping")
| v._2
| }.sum)
OK mapping
OK mapping
OK mapping
res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
scala> res2('a') // recomputes
OK mapping
OK mapping
res4: Int = 57
Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.
You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:
( Map[Char,Int]() /: (keys,values).zipped ) ( (m,kv) =>
m + ( kv._1 -> ( m.getOrElse( kv._1, 0 ) + kv._2 ) )
)
Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.
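In newer Scala versions (2.13 and later) both the /: operator and .zipped are deprecated, so here is a sketch of the same fold written with foldLeft. It uses plain zip, which does create the intermediate list of pairs, so it trades a little efficiency for portability. The names SumByKey and totals are only illustrative:

```scala
object SumByKey {
  val keys: List[Char] = List('a', 'b', 'a')
  val values: List[Int] = List(45, 65, 12)

  // Fold over the key-value pairs, adding each value to the running total of its key.
  val totals: Map[Char, Int] =
    keys.zip(values).foldLeft(Map.empty[Char, Int]) { case (m, (k, v)) =>
      m.updated(k, m.getOrElse(k, 0) + v)
    }

  def main(args: Array[String]): Unit = println(totals)
}
```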