I'm extending Spark's AccumulableParam[mutable.HashMap[Int,Long], Int] in Scala, for some experiments. Part of this is to define the method def addInPlace(t1: mutable.HashMap[Int,Long], t2: mutable.HashMap[Int,Long]): mutable.HashMap[Int,Long].
What I want to do:
import scala.collection.mutable

def addInPlace(t1: mutable.HashMap[Int,Long], t2: mutable.HashMap[Int,Long]): mutable.HashMap[Int,Long] = {
  t1 ++ t2.map { case (s, c) => (s, c + t1.getOrElse(s, 0L)) }
}
I get the error:
Expression of type mutable.Map[Int, Long] doesn't conform to selected type mutable.HashMap[Int, Long]
In this case the ++ operator returns Map instead of HashMap, even though both operands, t1 and t2.map {...}, are of type HashMap[Int, Long].
So my question is: how do I make ++ return a HashMap instead, or how do I convert the resulting Map to a HashMap?
An ugly way to do it is to use asInstanceOf[mutable.HashMap[Int, Long]] on the result map:
def addInPlace(t1: mutable.HashMap[Int,Long], t2: mutable.HashMap[Int,Long]): mutable.HashMap[Int,Long] = {
  val result = t1 ++ t2.map { case (s, c) => (s, c + t1.getOrElse(s, 0L)) }
  result.asInstanceOf[mutable.HashMap[Int, Long]]
}
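The cast can be avoided altogether here: Spark's accumulator contract allows addInPlace to modify and return its first argument for efficiency. A minimal sketch, assuming in-place mutation of t1 is acceptable for your experiment:

def addInPlace(t1: mutable.HashMap[Int,Long], t2: mutable.HashMap[Int,Long]): mutable.HashMap[Int,Long] = {
  // Fold each entry of t2 into t1; the result is statically a mutable.HashMap.
  t2.foreach { case (s, c) => t1(s) = t1.getOrElse(s, 0L) + c }
  t1
}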
Related
I use SortedMap as follows:
import scala.collection.immutable.SortedMap

class Cls
val m = SortedMap[Long, Cls]()

def f = {
  val v = m.max._1 // error: no implicit Ordering defined for (Long, Cls)
  // do something with it
}
What is the idiomatic way to define Ordering for the Map using Ordering of keys (Long in my case)?
Use Ordering.by to create an Ordering[T] from a function T => S and an Ordering[S].
Ordering.by((t: (Long, Cls)) => t._1)
This will get you an ordering based on the first field of the tuple. An Ordering[Long] is implicitly available, so there is no need to provide it explicitly.
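For example, a minimal sketch of how this fixes the max call from the question, passing the ordering explicitly (note that max still throws on an empty map):

val v = m.max(Ordering.by((t: (Long, Cls)) => t._1))._1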
You could also use the one defined in the scala.math.Ordering companion object by importing scala.math.Ordering._ (this requires an Ordering for both tuple elements). For reference, here is the implementation:
implicit def Tuple2[T1, T2](implicit ord1: Ordering[T1], ord2: Ordering[T2]): Ordering[(T1, T2)] =
  new Ordering[(T1, T2)] {
    def compare(x: (T1, T2), y: (T1, T2)): Int = {
      val compare1 = ord1.compare(x._1, y._1)
      if (compare1 != 0) return compare1
      val compare2 = ord2.compare(x._2, y._2)
      if (compare2 != 0) return compare2
      0
    }
  }
I am new to Scala, and I am implementing a TreeMap with a multidimensional key like this:
class dimSet(val d: Vector[Int]) extends IndexedSeq[Int] {
  def apply(idx: Int) = d(idx)
  def length: Int = d.length
}
…
var vals : TreeMap[dimSet, A] = TreeMap[dimSet, A]()(orddimSet)
I have this method:
def appOp0(t: TreeMap[dimSet, A], t1: TreeMap[dimSet, A], op: (A, A) => A, unop: (A) => A): TreeMap[dimSet, A] = {
  if (t.isEmpty) t1.map((e: Tuple2[dimSet, A]) => (e._1, unop(e._2)))
  else if (t1.isEmpty) t.map((t: Tuple2[dimSet, A]) => (t._1, unop(t._2)))
  else {
    val h = t.head
    val h1 = t1.head
    if ((h._1) == (h1._1)) appOp0(t.tail, t1.tail, op, unop) + ((h._1, op(h._2, h1._2)))
    else if (orddimSet.compare(h._1, h1._1) == 1) appOp0(t, t1.tail, op, unop) + ((h1._1, unop(h1._2)))
    else appOp0(t.tail, t1, op, unop) + ((h._1, unop(h._2)))
  }
}
But the map method on the TreeMaps (second and third lines) returns a Map, not a TreeMap.
I tried a simpler example in the REPL and got this:
scala> val t = TreeMap[dimSet, Double]( (new dimSet(Vector(1,1)), 5.1), (new dimSet(Vector(1,2)), 6.3), (new dimSet(Vector(3,1)), 7.1), (new dimSet(Vector(2,2)), 8.4)) (orddimSet)
scala> val tsq = t.map[(dimSet,Double), TreeMap[dimSet,Double]]((v:Tuple2[dimSet, Double]) => ((v._1, v._2 * v._2)))
<console>:41: error: Cannot construct a collection of type scala.collection.immutable.TreeMap[dimSet,Double] with elements of type (dimSet, Double) based on a collection of type scala.collection.immutable.TreeMap[dimSet,Double].
val tsq = t.map[(dimSet,Double), TreeMap[dimSet,Double]]((v:Tuple2[dimSet, Double]) => ((v._1, v._2 * v._2)))
^
scala> val tsq = t.map((v:Tuple2[dimSet, Double]) => ((v._1, v._2 * v._2)))
tsq: scala.collection.immutable.Map[dimSet,Double] = Map((1, 1) -> 26.009999999999998, (1, 2) -> 39.69, (2, 2) -> 70.56, (3, 1) -> 50.41)
I think CanBuildFrom cannot build my TreeMap the way it can with other TreeMaps, but I couldn't find out why. What can I do to return a TreeMap?
Thanks
The problem probably is that there is no implicit Ordering[dimSet] available when you call map. That call requires a CanBuildFrom, which in turn requires an implicit Ordering for the TreeMap keys: see the docs.
So make orddimSet implicitly available before calling map:
implicit val ev = orddimSet
if (t.isEmpty) t1.map((e:Tuple2[dimSet, A]) => (e._1, unop(e._2)))
Or you can make an Ordering[dimSet] automatically available everywhere by defining an implicit Ordering in dimSet's companion object:
object dimSet {
  implicit val orddimSet: Ordering[dimSet] = ??? // your code here
}
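For instance, a hypothetical implementation, assuming a lexicographic ordering over the key's dimensions is what orddimSet is meant to express:

object dimSet {
  // Compare dimSets element by element, with a shorter prefix ordered first;
  // this relies on the built-in Ordering.Iterable[Int] instance.
  implicit val orddimSet: Ordering[dimSet] =
    Ordering.by((ds: dimSet) => ds.d.toIterable)
}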
I have an RDD curRdd of the form
res10: org.apache.spark.rdd.RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] = ShuffledRDD[102]
with curRdd.collect() producing the following result.
Array((Vector((5,2)),1), (Vector((1,1)),2), (Vector((1,1), (5,2)),2))
Here the key is a vector of pairs of ints and the value is a count.
Now, I want to convert it into another RDD of the same form RDD[(scala.collection.immutable.Vector[(Int, Int)], Int)] by percolating down the counts.
That is, (Vector((1,1), (5,2)),2) will contribute its count of 2 to any key that is a subset of it, so for example (Vector((5,2)),1) becomes (Vector((5,2)),3).
For the example above, our new RDD will have
(Vector((5,2)),3), (Vector((1,1)),4), (Vector((1,1), (5,2)),2)
How do I achieve this? Kindly help.
First you can introduce a subsets operation for Seq:
implicit class SubSetsOps[T](val elems: Seq[T]) extends AnyVal {
  def subsets: Vector[Seq[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}
The empty subset will always be the first element in the result vector, so you can omit it with .tail.
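For example, a quick check of the recursion (element order follows the recursion above; Seq literals print as List):

Seq(1, 2).subsets      // Vector(List(), List(2), List(1), List(1, 2))
Seq(1, 2).subsets.tail // Vector(List(2), List(1), List(1, 2))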
Now your task is a pretty obvious map-reduce, which is flatMap + reduceByKey in terms of RDDs:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
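To trace this on the sample data (an informal walk-through): (Vector((1,1), (5,2)), 2) emits Vector((5,2)) -> 2, Vector((1,1)) -> 2 and Vector((1,1), (5,2)) -> 2, while the other two records emit Vector((5,2)) -> 1 and Vector((1,1)) -> 2. reduceByKey then sums per key, giving (Vector((5,2)),3), (Vector((1,1)),4) and (Vector((1,1), (5,2)),2), as expected.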
Update
This implementation could introduce new keys in the result. If you would like to keep only those that were present in the original collection, you can join the result with the original:
val result = curRdd
  .flatMap { case (keys, count) => keys.subsets.tail.map(_ -> count) }
  .reduceByKey(_ + _)
  .join(curRdd map identity[(Seq[(Int, Int)], Int)])
  .map { case (key, (v, _)) => (key, v) }
Note that the map identity step is needed to convert the key type from Vector[_] to Seq[_] in the original RDD. You can instead modify the SubSetsOps definition, substituting all occurrences of Seq[T] with Vector[T], or generalize it over the collection type in the following way:
import scala.collection.SeqLike
import scala.collection.generic.CanBuildFrom

implicit class SubSetsOps[T, F[e] <: SeqLike[e, F[e]]](val elems: F[T]) extends AnyVal {
  def subsets(implicit cbf: CanBuildFrom[F[T], T, F[T]]): Vector[F[T]] = elems match {
    case Seq() => Vector(elems)
    case elem +: rest => {
      val recur = rest.subsets
      recur ++ recur.map(elem +: _)
    }
  }
}
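With this generalized version the original Vector keys work directly, so the map identity conversion is no longer needed. For instance, a hypothetical check:

Vector((1, 1), (5, 2)).subsets
// Vector(Vector(), Vector((5,2)), Vector((1,1)), Vector((1,1), (5,2)))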
Given this snippet of code in Scala:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = (d1, d2) => d1 ++ d2
That can be shortened to:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = _ ++ _
What the code actually does is alias the ++ operator of Map[VertexId, Factor], so: is there a way to assign that operator to the variable directly? Like in this imaginary example:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = Map.++
And probably, with type inference, it would be enough to write
val mapMerge = Map[VertexId,Factor].++
Thanks
Unfortunately, no, because the "operators" in Scala are instance methods, not functions from a typeclass like in Haskell.
When you write _ ++ _, you are creating a new 2-argument function (lambda) with unnamed parameters. This is equivalent to (a, b) => a ++ b, which is in turn equivalent to (a, b) => a.++(b), but not to (a, b) => SomeClass.++(a, b).
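Spelled out, a small sketch of the equivalence (using Int stand-ins for VertexId and Factor):

val f1: (Map[Int, Int], Map[Int, Int]) => Map[Int, Int] = _ ++ _
val f2 = (a: Map[Int, Int], b: Map[Int, Int]) => a ++ b   // same function
val f3 = (a: Map[Int, Int], b: Map[Int, Int]) => a.++(b)  // same, desugared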
You can emulate typeclasses by using implicit arguments (see the "typeclasses in Scala" presentation).
You can pass "operators" around like functions, though they are not really operators then. And you can have operators that look the same. See this example:
object Main {

  trait Concat[A] { def ++ (x: A, y: A): A }

  implicit object IntConcat extends Concat[Int] {
    override def ++ (x: Int, y: Int): Int = (x.toString + y.toString).toInt
  }

  implicit class ConcatOperators[A: Concat](x: A) {
    def ++ (y: A) = implicitly[Concat[A]].++(x, y)
  }

  def main(args: Array[String]): Unit = {
    val a = 1234
    val b = 765

    val c = a ++ b // Instance method from ConcatOperators; usable with infix notation like other built-in "operators"
    println(c)

    val d = highOrderTest(a, b)(IntConcat.++) // 2-argument method from the typeclass instance
    println(d)

    // both calls to println print "1234765"
  }

  def highOrderTest[A](x: A, y: A)(fun: (A, A) => A) = fun(x, y)
}
Here we define a Concat typeclass, create an implementation for Int, and use an operator-like name for the method in the typeclass.
Because you can implement a typeclass for any type, you can use this trick with any type, but it requires writing quite a bit of supporting code, and sometimes it is not worth the result.
I have a list which I am combining into a map in this way, by calling the respective value-calculation function. I am using collection.breakOut to avoid creating unnecessary intermediate collections, since what I am doing is a bit combinatorial, and every saved iteration helps.
I need to filter out certain tuples from the map, in my case where the value is less than 0. Is it possible to add this filtering to the map step itself rather than doing a filter afterwards (thus iterating once again)?
val myMap: Map[Key, Int] = keyList.map(key => key -> computeValue(key))(collection.breakOut)
val myFilteredMap = myMap.filter(_._2 >= 0)
In other words, I wish to obtain the second map in one go: ideally, in the first call to map() I would filter out the tuples I don't want. Is this possible in any way?
You can easily do this with a foldLeft:
keyList.foldLeft(Map[Key, Int]()) { (map, key) =>
  val value = computeValue(key)
  if (value >= 0) {
    map + (key -> value)
  } else {
    map
  }
}
It would probably be best to do a flatMap:
import collection.breakOut

type Key = Int
val keyList = List(-1, 0, 1, 2, 3)
def computeValue(i: Int) = i * 2

val myMap: Map[Key, Int] =
  keyList.flatMap { key =>
    val v = computeValue(key)
    if (v >= 0) Some(key -> v)
    else None
  }(breakOut)
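With the sample data above, this yields, as a quick check:

myMap: Map[Key,Int] = Map(0 -> 0, 1 -> 2, 2 -> 4, 3 -> 6)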
You can use collect:
val myMap: Map[Key, Int] =
  keyList.collect {
    case key if computeValue(key) >= 0 => key -> computeValue(key)
  }(breakOut)
But that requires computing computeValue(key) twice, which is silly. collect is better when you filter first and then map.
Or make your own method:
import scala.collection.generic.CanBuildFrom
import scala.collection.TraversableLike

implicit class EnrichedWithMapfilter[A, Repr](val self: TraversableLike[A, Repr]) extends AnyVal {
  def maptofilter[B, That](f: A => B)(p: B => Boolean)(implicit bf: CanBuildFrom[Repr, (A, B), That]): That = {
    val b = bf(self.repr)
    b.sizeHint(self)
    for (x <- self) {
      val v = f(x)
      if (p(v))
        b += x -> v // reuse the computed value instead of calling f(x) again
    }
    b.result
  }
}
val myMap: Map[Key, Int] = keyList.maptofilter(computeValue)(_ >= 0)(breakOut)
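With the same sample data this also yields Map(0 -> 0, 1 -> 2, 2 -> 4, 3 -> 6), while computing each value exactly once.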