Sorting keys in an RDD - scala

I need to sort the keys in an RDD, but there is no natural sorting order (neither ascending nor descending). I wouldn't even know how to write a Comparator to do it. Say I had a map of apples, pears, oranges, and grapes; I want it sorted as oranges, apples, grapes, then pears.
Any ideas on how to do this in Spark/Scala? Thanks!

In Scala, you want the Ordering[T] trait rather than the Comparator interface -- mostly a cosmetic difference, so that the focus is on an attribute of the data rather than on a thing which compares two instances of the data. Implementing the trait requires defining the compare(T, T) method. A very explicit version of the enumerated comparison could be:
object fruitOrdering extends Ordering[String] {
  def compare(lhs: String, rhs: String): Int = (lhs, rhs) match {
    case ("orange", "orange") => 0
    case ("orange", _)        => -1
    case ("apple", "orange")  => 1
    case ("apple", "apple")   => 0
    case ("apple", _)         => -1
    case ("grape", "orange")  => 1
    case ("grape", "apple")   => 1
    case ("grape", "grape")   => 0
    case ("grape", _)         => -1
    case ("pear", "orange")   => 1
    case ("pear", "apple")    => 1
    case ("pear", "grape")    => 1
    case ("pear", "pear")     => 0
    case ("pear", _)          => -1
    case _                    => 0
  }
}
Or, to slightly adapt zero323's answer:
object fruitOrdering2 extends Ordering[String] {
  private val values = Seq("orange", "apple", "grape", "pear")
  // generate the map from indices so we don't have to worry about human error during updates
  private val ordinalMap = values.zipWithIndex.toMap.withDefaultValue(Int.MaxValue)

  def compare(lhs: String, rhs: String): Int = ordinalMap(lhs).compare(ordinalMap(rhs))
}
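As a quick sanity check before wiring it into Spark (a sketch using the object above on a plain collection):
List("pear", "apple", "grape", "orange").sorted(fruitOrdering2)
// List(orange, apple, grape, pear)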
Now that you have an instance of Ordering[String], you need to tell the sortBy method to use this ordering rather than the built-in one. If you look at the signature for RDD#sortBy you'll see the full signature is
def sortBy[K](f: (T) ⇒ K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
That implicit Ordering[K] in the second parameter list is normally looked up by the compiler for pre-defined orderings -- that's how it knows what the natural ordering should be. Any implicit parameter, however, can be given an explicit value instead. Note that if you supply one implicit value then you need to supply all, so in this case we also need to provide the ClassTag[K]. That's always generated by the compiler but can be easily explicitly generated using scala.reflect.classTag.
Specifying all of that, the invocation would look like:
import scala.reflect.classTag
rdd.sortBy { case (key, _) => key }(fruitOrdering, classTag[String])
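Alternatively, because implicit parameters are looked up from the enclosing scope, you can declare the ordering implicit once and let the compiler supply both arguments (a sketch, reusing the fruitOrdering object above; a locally declared implicit takes precedence over the built-in Ordering.String):
implicit val fruitOrd: Ordering[String] = fruitOrdering
rdd.sortBy { case (key, _) => key } // ordering and ClassTag both inferred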
Either way there is still a fair amount of ceremony, isn't there? Luckily we can use implicit classes to take away a lot of the cruft. Here's a snippet that I use fairly commonly:
package com.example.spark

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

package object implicits {

  implicit class RichSortingRDD[A : ClassTag](underlying: RDD[A]) {
    def sorted(implicit ord: Ordering[A]): RDD[A] =
      underlying.sortBy(identity)(ord, implicitly[ClassTag[A]])

    def sortWith(fn: (A, A) => Int): RDD[A] = {
      val ord = new Ordering[A] { def compare(lhs: A, rhs: A): Int = fn(lhs, rhs) }
      sorted(ord)
    }
  }

  implicit class RichSortingPairRDD[K : ClassTag, V](underlying: RDD[(K, V)]) {
    def sortByKey(implicit ord: Ordering[K]): RDD[(K, V)] =
      underlying.sortBy { case (key, _) => key }(ord, implicitly[ClassTag[K]])

    def sortByKeyWith(fn: (K, K) => Int): RDD[(K, V)] = {
      val ord = new Ordering[K] { def compare(lhs: K, rhs: K): Int = fn(lhs, rhs) }
      sortByKey(ord)
    }
  }
}
And in action:
import com.example.spark.implicits._
val rdd = sc.parallelize(Seq(("grape", 0.3), ("apple", 5.0), ("orange", 5.6)))
rdd.sortByKey(fruitOrdering).collect
// Array[(String, Double)] = Array((orange,5.6), (apple,5.0), (grape,0.3))
rdd.sortByKey.collect // Natural ordering by default
// Array[(String, Double)] = Array((apple,5.0), (grape,0.3), (orange,5.6))
rdd.sortWith(_._2 compare _._2).collect // sort by the value instead
// Array[(String, Double)] = Array((grape,0.3), (apple,5.0), (orange,5.6))

If the only way you can describe the order is enumeration then simply enumerate:
val order = Map("orange" -> 0L, "apple" -> 1L, "grape" -> 2L, "pear" -> 3L)
val rdd = sc.parallelize(Seq(("grape", 0.3), ("apple", 5.0), ("orange", 5.6)))
val sorted = rdd.sortBy{case (key, _) => order.getOrElse(key, Long.MaxValue)}
sorted.collect
// Array[(String, Double)] = Array((orange,5.6), (apple,5.0), (grape,0.3))

There is a sortBy method in Spark which allows you to define an arbitrary ordering and whether you want ascending or descending. E.g.
scala> val rdd = sc.parallelize(Seq ( ("a", 1), ("z", 7), ("p", 3), ("a", 13) ))
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[331] at parallelize at <console>:70
scala> rdd.sortBy( _._2, ascending = false) .collect.mkString("\n")
res34: String =
(a,13)
(z,7)
(p,3)
(a,1)
scala> rdd.sortBy( _._1, ascending = false) .collect.mkString("\n")
res35: String =
(z,7)
(p,3)
(a,1)
(a,13)
scala> rdd.sortBy
def sortBy[K](f: T => K, ascending: Boolean, numPartitions: Int)(implicit ord: scala.math.Ordering[K], ctag: scala.reflect.ClassTag[K]): RDD[T]
The last part tells you what the signature of sortBy is. The previous examples order by the second and the first element of the pair, respectively.
Edit: answered too quickly, without checking your question, sorry... Anyway, you would define your ordering like in your example:
def myord(fruit: String) = fruit match {
  case "oranges" => 1
  case "apples"  => 2
  case "grapes"  => 3
  case "pears"   => 4
  case _         => 5
}
val rdd = sc.parallelize(Seq("apples", "oranges" , "pears", "grapes" , "other") )
Then, the result of ordering would be:
scala> rdd.sortBy[Int](myord, ascending = true).collect.mkString("\n")
res1: String =
oranges
apples
grapes
pears
other
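To apply the same idea to the pair RDD from the original question, rank the key inside sortBy (a sketch reusing myord from above; the sample data is made up):
val pairs = sc.parallelize(Seq(("grapes", 0.3), ("apples", 5.0), ("oranges", 5.6)))
pairs.sortBy { case (fruit, _) => myord(fruit) }.collect
// Array((oranges,5.6), (apples,5.0), (grapes,0.3))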

I don't know about Spark, but with pure Scala collections that would be
_.sortBy(_.fruitType)
For example,
val l: List[String] = List("the", "big", "bang")
val sortedByFirstLetter = l.sortBy(_.head)
// List(big, bang, the)
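For the enumerated fruit order from the question, the same sortBy idea works once each key is mapped to a rank (a sketch; the rank map simply mirrors the order given in the question):
val rank = Map("orange" -> 0, "apple" -> 1, "grape" -> 2, "pear" -> 3)
List("pear", "apple", "grape", "orange").sortBy(rank.getOrElse(_, Int.MaxValue))
// List(orange, apple, grape, pear)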

Related

Sorting a collection of collections by indices of inner collection elements in Scala

Let us have a collection of collections as below:
type Row = IndexedSeq[Any]
type RowTable = IndexedSeq[Row]
val table: RowTable = IndexedSeq(
  IndexedSeq(2, "b", ... /* some elements of type Any */),
  IndexedSeq(1, "a", ...),
  IndexedSeq(2, "c", ...))
Each Row in RowTable "has the same schema", meaning that, as in the example, if the first row in the table contains Int, String, ..., then every other row contains elements of the same types in the same order, i.e., Int, String, ....
I would like to sort Rows in a RowTable by given indices of Row's elements and the sorting direction (ascending or descending sort) for that element.
For example, the collection above would be sorted this way for Index 0 ascending and Index 1 descending and the rest of elements are not important in sorting:
1, "a", ...
2, "c", ...
2, "b", ...
Since Row is IndexedSeq[Any], we do not know the type of each element in order to compare it; however, we know that it may be cast to Comparable[Any] and thus has a compareTo() method to compare it with the element at the same index in another row.
The indices, as mentioned above, that will determine the sorting order are not known before we start sorting. How can I code this in Scala?
First of all, it's bad design to compare a pair of Any values.
By default, Scala doesn't provide any way to get an Ordering[Any]. Hence if you want to compare a pair of Any, you should implement Ordering[Any] yourself:
object AnyOrdering extends Ordering[Any] {
  override def compare(xRaw: Any, yRaw: Any): Int = {
    (xRaw, yRaw) match {
      case (x: Int, y: Int)       => Ordering.Int.compare(x, y)
      case (_: Int, _)            => 1
      case (_, _: Int)            => -1
      ...
      case (x: String, y: String) => Ordering.String.compare(x, y)
      case (_: String, _)         => 1
      case (_, _: String)         => -1
      ...
      case (_, _)                 => 0
    }
  }
}
In your example, you want to compare two IndexedSeq[T] recursively. Scala doesn't provide any recursive Ordering and you need to implement it too:
import scala.annotation.tailrec

def recOrdering[T](implicit ordering: Ordering[T]): Ordering[IndexedSeq[T]] = new Ordering[IndexedSeq[T]] {
  override def compare(x: IndexedSeq[T], y: IndexedSeq[T]): Int = compareRec(x, y)

  @tailrec
  private def compareRec(x: IndexedSeq[T], y: IndexedSeq[T]): Int = {
    (x.headOption, y.headOption) match {
      case (Some(xHead), Some(yHead)) =>
        val compare = ordering.compare(xHead, yHead)
        if (compare == 0) compareRec(x.tail, y.tail)
        else compare
      case (Some(_), None) => 1
      case (None, Some(_)) => -1
      case (None, None)    => 0 // both sequences exhausted: equal
    }
  }
}
After that you can finally sort your collection:
table.sorted(recOrdering(AnyOrdering))
(Sorry for the somewhat unidiomatic code; I can probably help adapt it upon request.)
We can use the code below to sort a table:
table.sortWith {
  case (tupleL, tupleR) => isLessThan(tupleL, tupleR)
}
where isLessThan is defined as follows (unidiomatic Scala, I know):
def isLessThan(tupleL: Row, tupleR: Row): Boolean = {
  var i = 0
  while (i < sortInfos.length) {
    val sortInfo = sortInfos(i)
    val result = tupleL(sortInfo.fieldIndex)
      .asInstanceOf[Comparable[Any]]
      .compareTo(tupleR(sortInfo.fieldIndex).asInstanceOf[Comparable[Any]])
    if (result != 0) {
      if (sortInfo.isDescending) return result > 0
      else return result < 0
    }
    i += 1
  }
  true
}
where sortInfos is IndexedSeq[SortInfo] and
case class SortInfo(val fieldIndex: Int, val isDescending: Boolean)
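For the ordering described in the question (index 0 ascending, index 1 descending), the pieces would plug together roughly like this (a sketch; sortInfos must be in scope before isLessThan is called, and sortedTable is just an illustrative name):
val sortInfos = IndexedSeq(
  SortInfo(fieldIndex = 0, isDescending = false),
  SortInfo(fieldIndex = 1, isDescending = true)
)
val sortedTable = table.sortWith(isLessThan)
// rows come out as: (1, "a", ...), (2, "c", ...), (2, "b", ...)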
Here is a working example with Ordering[IndexedSeq[Any]]:
val table: IndexedSeq[IndexedSeq[Any]] = IndexedSeq(
  IndexedSeq(2, "b", "a"),
  IndexedSeq(2, "b"),
  IndexedSeq("c", 2),
  IndexedSeq(1, "c"),
  IndexedSeq("c", "c"),
  //IndexedSeq((), "c"), // this one would blow up at runtime
  IndexedSeq(2, "a")
)
implicit val isaOrdering: Ordering[IndexedSeq[Any]] = { (a, b) =>
  a.zip(b).filter { case (a, b) => a != b }.collectFirst {
    case (a: Int, b: Int)       => a compare b
    case (a: String, b: String) => a compare b
    case (a: String, b: Int)    => 1  // prefer ints over strings
    case (a: Int, b: String)    => -1 // prefer ints over strings
    case _ => throw new RuntimeException(s"cannot compare $a to $b")
  }.getOrElse(a.length compare b.length) // shorter will be first
}
println(table.sorted) //used implicitly
println(table.sorted(isaOrdering))
//Vector(Vector(1, c), Vector(2, a), Vector(2, b), Vector(2, b, a), Vector(c, 2), Vector(c, c))
https://scalafiddle.io/sf/yvLEnYL/4
Or, if you really need to compare different types somehow, this is the best I could find:
implicit val isaOrdering: Ordering[IndexedSeq[Any]] = { (a, b) =>
  a.zip(b).filter { case (a, b) => a != b }.collectFirst {
    case (a: Int, b: Int)       => a compare b
    case (a: String, b: String) => a compare b
    //add your known types here
    // ...
    //below is the rule that handles unknown cases.
    //We don't know the types at all; at best we can check equality.
    //If they are equal then return 0... if not, we throw.
    //This could also be very slow (not tested).
    case (a, b) =>
      //not nice, but it is stable at least
      val ac = a.getClass.getName
      val bc = b.getClass.getName
      ac.compare(bc) match {
        case 0 => if (ac == bc) 0 else throw new RuntimeException(s"cannot compare $a to $b")
        case x => x
      }
  }.getOrElse(a.length compare b.length) // shorter will be first
}
https://scalafiddle.io/sf/yvLEnYL/5
This implementation will fail at runtime when it cannot compare the values.

group by with foldleft scala

I have the following list in input:
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
and I want to write a function in Scala using groupBy and foldLeft (just one function) in order to sum up the third and fourth columns for lines having the same title (the first column here); the wanted output is:
val listOutput1 =
List(
"itemA,CATS,5,5",
"itemB,CATQ,8,11",
"itemC,CARC,5,10"
)
def sumIndex(listIn: List[String]): List[String] = {
  listIn.map(_.split(",")).groupBy(_(0)).map {
    case (title, label) =>
      "%s,%s,%d,%d".format(
        title,
        label.head.apply(1),
        label.map(_(2).toInt).sum,
        label.map(_(3).toInt).sum)
  }.toList
}
Kind regards
The logic in your code looks sound; here it is with a case class implementation that handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
  name: String,
  category: String,
  amount: Int,
  price: Int
)

// safely converts a row of strings into the case class, throws an exception otherwise
def stringsToItem(strings: Array[String]): Item = {
  if (strings.length != 4) {
    throw new Exception(s"Invalid row: ${strings.mkString(",")}; must contain only 4 entries!")
  } else {
    val n = strings.headOption.getOrElse("N/A")
    val cat = strings.lift(1).getOrElse("N/A")
    val amt = strings.lift(2).filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
    val p = strings.lastOption.filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
    Item(n, cat, amt, p)
  }
}
// original code with the case class and method above
listInput1.map(_.split(","))
  .map(stringsToItem)
  .groupBy(_.name)
  .map { case (name, items) =>
    Item(
      name,
      category = items.head.category,
      amount = items.map(_.amount).sum,
      price = items.map(_.price).sum
    )
  }.toList
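This pipeline yields a List[Item] rather than the question's List[String]; if the comma-separated form is needed, one more map restores it (a small sketch; summed is just an illustrative name for the pipeline result above):
val summed: List[Item] = listInput1.map(_.split(","))
  .map(stringsToItem)
  .groupBy(_.name)
  .map { case (name, items) =>
    Item(name, items.head.category, items.map(_.amount).sum, items.map(_.price).sum)
  }.toList
summed.map(i => s"${i.name},${i.category},${i.amount},${i.price}")
// e.g. List(itemA,CATs,5,5, itemB,CATQ,8,11, itemC,CARC,5,10) -- Map order is not guaranteed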
You can solve it with a single foldLeft over the parsed rows, using a Map to aggregate the result:
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
  (acc: Map[String, Int], curr: Array[String]) =>
    val label: String = curr(0)
    val oldValue: Int = acc.getOrElse(label, 0)
    val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
    acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)
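Note that this collapses the two numeric columns into a single total per item. To keep the category and the two sums separate, as the wanted output requires, the accumulator can hold a tuple instead (a hedged sketch along the same lines):
listInput1.map(_.split(",")).foldLeft(Map.empty[String, (String, Int, Int)]) {
  (acc, curr) =>
    val (cat, a, b) = acc.getOrElse(curr(0), (curr(1), 0, 0))
    acc.updated(curr(0), (cat, a + curr(2).toInt, b + curr(3).toInt))
}.map { case (name, (cat, a, b)) => s"$name,$cat,$a,$b" }.toList
// e.g. List(itemA,CATs,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)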
If you have a list as
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]],
                   y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] = {
  val key = y.keySet.head
  if (x.keySet.contains(key)) {
    val oldTuple = x(key)
    x.updated(key, (y(key)._1, oldTuple._2 + y(key)._2, oldTuple._3 + y(key)._3))
  } else {
    x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
  }
}
and you can call them as
foldLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceLeft(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]],
                    x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] = {
  val key = y.keySet.head
  if (x.keySet.contains(key)) {
    val oldTuple = x(key)
    x.updated(key, (y(key)._1, oldTuple._2 + y(key)._2, oldTuple._3 + y(key)._3))
  } else {
    x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
  }
}
and calling the function would give you
foldRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceRight(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.

In Scala, given a list of lists, how can I create one nested HashMap from the elements?

In Scala, given a list of lists, how can I create one nested HashMap from the elements? I would like to create the HashMap as a hierarchical tree such that for an element at index i, the element at index i - 1 is its parent.
Example for lists of known length:
val lst = List(
  List(34, 56, 78),
  List(34, 56, 79),
  List(87, 23, 12),
  List(87, 90, 78),
  List(1, 45, 87)
)
scala> lst.groupBy(l => l(0))
.mapValues(l => l.groupBy(x => x(1)))
.mapValues{ case x => x.mapValues(y => y.map (z => z(2))) }
res2: scala.collection.immutable.Map[Int,scala.collection.immutable.Map[Int,List[Int]]] = Map(34 -> Map(56 -> List(78, 79)), 1 -> Map(45 -> List(87)), 87 -> Map(23 -> List(12), 90 -> List(78)))
This method works when the length of the elements are known but does not work for an arbitrary length N. Is there any solution that can create this nested map for lists of any length where every list has the same length?
Some preliminary tests seem to indicate that this might work.
def nest(lli: List[List[Int]]): Traversable[_] =
  if (lli.head.size == 1)
    lli.flatten.distinct
  else
    lli.groupBy(_.head)
      .mapValues(vs => nest(vs.map(_.tail)))
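A quick check against lst from the question (hedged; Map iteration order may differ):
nest(lst)
// Map(34 -> Map(56 -> List(78, 79)), 1 -> Map(45 -> List(87)), 87 -> Map(23 -> List(12), 90 -> List(78)))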
private def buildPartitionTree(partitionValues: List[List[Any]]): Map[Any, Any] = {
  val valuesAsNestedMaps = partitionValues.map(_.foldRight(Map[Any, Map[Any, _]]()) {
    case (partitionValue, map) => Map(partitionValue.toString -> map)
  }).map(_.asInstanceOf[Map[Any, Any]])

  valuesAsNestedMaps.reduce[Map[Any, Any]] {
    case (map1: Map[Any, Any], map2: Map[Any, Any]) => mergeMaps(map1, map2)
  }
}

private def mergeMaps(map1: Map[Any, Any], map2: Map[Any, Any]) = (map1.keySet ++ map2.keySet).map(key =>
  key -> mergeMapValues(map1.get(key), map2.get(key))
).toMap

private def mergeMapValues(o1: Option[Any], o2: Option[Any]): Any = (o1, o2) match {
  case (Some(v1: Map[Any, Any]), Some(v2: Map[Any, Any])) => mergeMaps(v1, v2)
  case (None, Some(x)) => x
  case (Some(y), None) => y
}
val nestedMap = buildPartitionTree(lst)
Since the size of the sublists is arbitrary, you cannot specify the result type of the desired function. Consider introducing a recursive data structure like this:
trait Tree[A]
case class Node[A](key:A, list:List[Tree[A]]) extends Tree[A]
case class Leaf[A](value:A) extends Tree[A]
Now you can create a function producing the desired result in terms of trees:
def toTree[A](key: A, list: List[List[A]]): Tree[A] =
  if (list.exists(_.isEmpty)) Leaf(key)
  else Node(key, list.groupBy(_.head).map { case (k, v) => toTree(k, v.map(_.tail)) }.toList)
Since you don't have a 'root' value for the key, you can call the toTree function with some fake key:
toTree(-1, lst)
res1: Node(-1,List(Node(34,List(Node(56,List(Leaf(79), Leaf(78))))), Node(1,List(Node(45,List(Leaf(87))))), Node(87,List(Node(23,List(Leaf(12))), Node(90,List(Leaf(78)))))))

subsets manipulation on vectors in spark scala

I have an RDD curRdd of the form
res10: org.apache.spark.rdd.RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] = ShuffledRDD[102]
with curRdd.collect() producing the following result.
Array((Vector((5,2)),1), (Vector((1,1)),2), (Vector((1,1), (5,2)),2))
Here the key is a vector of pairs of ints, and the value is a count.
Now, I want to convert it into another RDD of the same form RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] by percolating down the counts.
That is, (Vector((1,1), (5,2)), 2) will contribute its count of 2 to any key which is a subset of it, so for example (Vector((5,2)), 1) becomes (Vector((5,2)), 3).
For the example above, our new RDD will have
(Vector((5,2)),3), (Vector((1,1)),4), (Vector((1,1), (5,2)),2)
How do I achieve this? Kindly help.
First you can introduce a subsets operation for Seq:
implicit class SubSetsOps[T](val elems: Seq[T]) extends AnyVal {
  def subsets: Vector[Seq[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest =>
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
  }
}
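For instance, a quick look at the expansion order:
Seq(1, 2, 3).subsets
// Vector(List(), List(3), List(2), List(2, 3), List(1), List(1, 3), List(1, 2), List(1, 2, 3))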
The empty subset will always be the first element in the result vector, so you can omit it with .tail.
Now your task is a pretty obvious map-reduce, which is flatMap + reduceByKey in terms of RDDs:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
Update
This implementation could introduce new sets in the result. If you would like to keep only those that were present in the original collection, you can join the result with the original:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
  .join(curRdd map identity[(Seq[(Int, Int)], Int)])
  .map { case (key, (v, _)) => (key, v) }
Note that the map identity step is needed to convert the key type from Vector[_] to Seq[_] in the original RDD. You can instead modify the SubSetsOps definition, substituting all occurrences of Seq[T] with Vector[T], or change the definition as follows (hardcoding scala.collection):
import scala.collection.SeqLike
import scala.collection.generic.CanBuildFrom
implicit class SubSetsOps[T, F[e] <: SeqLike[e, F[e]]](val elems: F[T]) extends AnyVal {
  def subsets(implicit cbf: CanBuildFrom[F[T], T, F[T]]): Vector[F[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest =>
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
  }
}

Combining a filter within a map

I have a list which I am combining into a map this way, by calling the respective value-calculation function. I am using collection.breakOut to avoid creating unnecessary intermediate collections, since what I am doing is a bit combinatorial and every saved iteration helps.
I need to filter out certain tuples from the map, in my case where the value is less than 0. Is it possible to add this to the map itself rather than doing a filter afterwards (thus iterating once again)?
val myMap: Map[Key, Int] = keyList.map(key => key -> computeValue(key))(collection.breakOut)
val myFilteredMap = myMap.filter(_._2 >= 0)
In other words, I wish to obtain the second map in one go, so that the first call to map() already filters out the tuples I don't want. Is this possible in any way?
You can easily do this with a foldLeft:
keyList.foldLeft(Map[Key, Int]()) { (map, key) =>
  val value = computeValue(key)
  if (value >= 0) {
    map + (key -> value)
  } else {
    map
  }
}
It would probably be best to do a flatMap:
import collection.breakOut

type Key = Int
val keyList = List(-1, 0, 1, 2, 3)
def computeValue(i: Int) = i * 2

val myMap: Map[Key, Int] =
  keyList.flatMap { key =>
    val v = computeValue(key)
    if (v >= 0) Some(key -> v)
    else None
  }(breakOut)
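As a side note: collection.breakOut was removed in Scala 2.13. On 2.13+ a comparable single-pass version of this flatMap approach (a sketch, not part of the original answer) can go through an iterator and build the Map directly:
val myMap: Map[Key, Int] =
  keyList.iterator.flatMap { key =>
    val v = computeValue(key)
    if (v >= 0) Some(key -> v) else None
  }.toMap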
You can use collect
val myMap: Map[Key, Int] =
keyList.collect {
case key if computeValue(key) >= 0 => key -> computeValue(key)
}(breakOut)
But that requires computing computeValue(key) twice, which is silly. collect works better when you filter first and then map.
Or make your own method:
import scala.collection.generic.CanBuildFrom
import scala.collection.TraversableLike
implicit class EnrichedWithMapfilter[A, Repr](val self: TraversableLike[A, Repr]) extends AnyVal {
  def maptofilter[B, That](f: A => B)(p: B => Boolean)(implicit bf: CanBuildFrom[Repr, (A, B), That]): That = {
    val b = bf(self.asInstanceOf[Repr])
    b.sizeHint(self)
    for (x <- self) {
      val v = f(x)
      if (p(v))
        b += x -> v // reuse the computed value instead of calling f(x) again
    }
    b.result
  }
}
val myMap: Map[Key, Int] = keyList.maptofilter(computeValue)(_ >= 0)(breakOut)