How to implement cost estimation for deferred computations with `flatMap`? - scala

I have implemented a Calculation class that takes 2 parameters: a calculation input that is a call-by-name param and also a cost. When I try to flatMap calculations, the first part of it gets executed. Is it possible to defer everything in flatMap and still provide the total cost?
class Calculation[+R](input: => R, val cost: Int = 0) {
def value: R = input
def map[A](f: R => A): Calculation[A] =
new Calculation(f(input), cost)
def flatMap[A](f: R => Calculation[A]): Calculation[A] = {
val step = f(input)
new Calculation(step.value, cost + step.cost)
}
}
object Rextester extends App {
val f1 = new Calculation({
println("F1")
"F1"
})
val f2 = f1.flatMap(s => new Calculation({
println("F2")
s + " => F2"
}))
println(f2.cost)
}
Once f2 is declared (that is, as soon as flatMap is called), "F1" is printed. The printed cost is "15", which is correct, but I would like the actual calculation to be fully deferred: f1 should not be executed when I only ask for the cost.

You just need a little more laziness, so that the cost isn't eagerly evaluated in the flatMap:
class Calculation[+R](input: => R, c: => Int = 0) {
def value: R = input
lazy val cost: Int = c
def map[A](f: R => A): Calculation[A] =
new Calculation(f(input), cost)
def flatMap[A](f: R => Calculation[A]): Calculation[A] = {
lazy val step = f(value)
new Calculation(step.value, cost + step.cost)
}
}
Note that this still might not have exactly the semantics you want (e.g. calling f2.value twice in a row will result in both F1 and F2 being printed the first time, and only F2 the second), but it does keep the side effect from occurring when f2 is defined.
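To see when each side effect fires, here is a minimal sketch that reuses the lazy Calculation class from this answer and records evaluations in a buffer instead of printing. The costs 5 and 10 and the names LazyCalculationDemo and log are assumptions made for illustration; they are not part of the question's code.

```scala
import scala.collection.mutable.ListBuffer

// Same shape as the class above: both the input and the cost are passed by name.
class Calculation[+R](input: => R, c: => Int = 0) {
  def value: R = input
  lazy val cost: Int = c
  def map[A](f: R => A): Calculation[A] =
    new Calculation(f(input), cost)
  def flatMap[A](f: R => Calculation[A]): Calculation[A] = {
    lazy val step = f(value) // not evaluated until value or cost is requested
    new Calculation(step.value, cost + step.cost)
  }
}

object LazyCalculationDemo {
  // Hypothetical log that replaces the println calls, so the order can be inspected.
  val log = ListBuffer.empty[String]

  // The costs 5 and 10 are assumed values for this sketch.
  val f1 = new Calculation({ log += "F1"; "F1" }, 5)
  val f2 = f1.flatMap(s => new Calculation({ log += "F2"; s + " => F2" }, 10))

  val logAfterDefinition: List[String] = log.toList // nothing has run yet
  val totalCost: Int = f2.cost                      // forces f1's input, but not F2
  val logAfterCost: List[String] = log.toList
  val result: String = f2.value                     // now F2 runs as well
  val logAfterValue: List[String] = log.toList
}
```

Defining f2 leaves the log empty. Asking for f2.cost still evaluates F1, because the function passed to flatMap needs a value of type R before it can return the Calculation that carries the second cost.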

If I understand your requirement "defer everything in flatMap and still provide the total cost" correctly, then you want to compute an estimate for the total costs before making any computations. I don't see how this is supposed to work with the signature flatMap[A](f: R => Calculation[A]): Calculation[A] - your cost is attached to Calculation[A], and your Calculation[A] depends on a concrete instance of R, so you cannot compute the cost before computing R.
Constant costs for computation steps
Here is a completely different proposal:
sealed trait Step[-A, +B] extends (A => B) { outer =>
def estimatedCosts: Int
def andThen[U >: B, C](next: Step[U, C]): Step[A, C] = new Step[A, C] {
def apply(a: A): C = next(outer(a))
def estimatedCosts = outer.estimatedCosts + next.estimatedCosts
}
def result(implicit u_is_a: Unit <:< A): B = this(u_is_a(()))
}
type Computation[+R] = Step[Unit, R]
The trait Step represents a computation step, for which the costs do not depend on the input. It's essentially just a Function[A, B] with an integer value attached to it. Your Computation[R] then becomes a special case, namely Step[Unit, R].
Here is how you can use it:
val x = new Step[Unit, Int] {
def apply(_u: Unit) = 42
def estimatedCosts = 0
}
val mul = new Step[Int, Int] {
def apply(i: Int) = {
println("<computing> adding is easy")
i + 58
}
def estimatedCosts = 10
}
val sqrt = new Step[Int, Double] {
def apply(i: Int) = {
println("<computing> finding square roots is difficult")
math.sqrt(i)
}
def estimatedCosts = 50
}
val c: Computation[Double] = x andThen mul andThen sqrt
println("Estimated costs: " + c.estimatedCosts)
println("(nothing computed so far)")
println(c.result)
If you run it, you obtain:
Estimated costs: 60
(nothing computed so far)
<computing> adding is easy
<computing> finding square roots is difficult
10.0
What it does is the following:
It starts with value 42, adds 58 to it, and then computes the square root of the sum
Addition is set to cost 10 units, square root costs 50.
It gives you the cost estimate of 60 units, without performing any computations.
Only when you invoke .result does it compute the actual result 10.0
Admittedly, it's not very useful for anything except very coarse order-of-magnitude estimates. It's so coarse that even using Ints barely makes any sense.
Non-constant costs per step
You can make your cost estimates more accurate by keeping track of a size estimate as follows:
trait Step[-A, +B] extends (A => B) {
def outputSizeEstimate(inputSizeEstimate: Int): Int
def costs(inputSizeEstimate: Int): Int
}
trait Computation[+R] { outer =>
def result: R
def resultSize: Int
def estimatedCosts: Int
def map[S](step: Step[R, S]): Computation[S] = new Computation[S] {
def result: S = step(outer.result)
def estimatedCosts: Int = outer.estimatedCosts + step.costs(outer.resultSize)
def resultSize: Int = step.outputSizeEstimate(outer.resultSize)
}
}
val x = new Computation[List[Int]] {
def result = (0 to 10).toList
def resultSize = 10
def estimatedCosts = 10
}
val incrementEach = new Step[List[Int], List[Int]] {
def outputSizeEstimate(inputSize: Int) = inputSize
def apply(xs: List[Int]) = {
println("incrementing...")
xs.map(1.+)
}
def costs(inputSize: Int) = 3 * inputSize
}
val timesSelf = new Step[List[Int], List[(Int, Int)]] {
def outputSizeEstimate(n: Int) = n * n
def apply(xs: List[Int]) = {
println("^2...")
for (x <- xs; y <- xs) yield (x, y)
}
def costs(n: Int) = 5 * n * n
}
val addPairs = new Step[List[(Int, Int)], List[Int]] {
def outputSizeEstimate(n: Int) = n
def apply(xs: List[(Int, Int)]) = {
println("adding...")
xs.map{ case (a, b) => a + b }
}
def costs(n: Int) = 7 * n
}
val y = x map incrementEach map timesSelf map addPairs
println("Estimated costs (manually): " + (10 + 30 + 500 + 700))
println("Estimated costs (automatically): " + y.estimatedCosts)
println("(nothing computed so far)")
println(y.result)
The output looks encouraging:
Estimated costs (manually): 1240
Estimated costs (automatically): 1240
(nothing computed so far)
incrementing...
^2...
adding...
List(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...[omitted]..., 20, 21, 22)
Note that the approach is not restricted to lists and integers: the size estimates can be arbitrarily complicated. For example, they could be dimensions of matrices or tensors. Actually, they don't have to be sizes at all. Those estimates could just as well contain any other kind of "static conservative estimates", like types or logical predicates.
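As a rough illustration of a size estimate that is not a plain Int, here is a sketch in which the "size" is a matrix shape. The names Dim, DimStep, multiplyBy, transpose and estimate are hypothetical, and the cost formulas are simple operation counts chosen for the example.

```scala
object MatrixCostSketch {
  // Hypothetical "size" for this sketch: the shape of a dense matrix.
  final case class Dim(rows: Int, cols: Int)

  // A step is described only by how it transforms the shape and what it costs.
  final case class DimStep(name: String, estimate: Dim => (Dim, Long))

  // Multiplying an (r x k) matrix by a (k x c) matrix: about r * k * c multiplications.
  def multiplyBy(right: Dim): DimStep = DimStep("multiply", { in =>
    require(in.cols == right.rows, s"shape mismatch: $in times $right")
    (Dim(in.rows, right.cols), in.rows.toLong * in.cols * right.cols)
  })

  // Transposing touches every element once.
  val transpose: DimStep =
    DimStep("transpose", in => (Dim(in.cols, in.rows), in.rows.toLong * in.cols))

  // Thread the shape through the steps and sum the costs; no matrix is ever built.
  def estimate(start: Dim, steps: List[DimStep]): (Dim, Long) =
    steps.foldLeft((start, 0L)) { case ((dim, total), step) =>
      val (out, cost) = step.estimate(dim)
      (out, total + cost)
    }
}
```

For a 2 x 3 input, multiplying by a 3 x 4 matrix and then transposing yields an estimated shape of 4 x 2 at a cost of 24 + 8 = 32, without touching any data.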
Non-constant costs, using Writer
Using the Writer monad from Cats, we can express the same idea more succinctly by replacing the two methods outputSizeEstimate and costs on a Step by a single method that takes an Int and returns a Writer[Int, Int]:
Writer's .value corresponds to the size estimate for the output
Writer's .written corresponds to the costs of the step (which might depend on the input size)
Full code:
import cats.data.Writer
import cats.syntax.writer._
import cats.instances.int._
object EstimatingCosts extends App {
type Costs = Int
type Size = Int
trait Step[-A, +B] extends (A => B) {
def sizeWithCosts(inputSizeEstimate: Size): Writer[Costs, Size]
}
object Step {
def apply[A, B]
(sizeCosts: Size => (Size, Costs))
(mapResult: A => B)
: Step[A, B] = new Step[A, B] {
def apply(a: A) = mapResult(a)
def sizeWithCosts(s: Size) = { val (s2, c) = sizeCosts(s); Writer(c, s2) }
}
}
trait Computation[+R] { outer =>
def result: R
def sizeWithCosts: Writer[Costs, Size]
def size: Size = sizeWithCosts.value
def costs: Costs = sizeWithCosts.written
def map[S](step: Step[R, S]): Computation[S] = new Computation[S] {
lazy val result: S = step(outer.result)
lazy val sizeWithCosts = outer.sizeWithCosts.flatMap(step.sizeWithCosts)
}
}
object Computation {
def apply[A](initialSize: Size, initialCosts: Costs)(a: => A) = {
new Computation[A] {
lazy val result = a
lazy val sizeWithCosts = Writer(initialCosts, initialSize)
}
}
}
val x = Computation(10, 10){ (0 to 10).toList }
val incrementEach = Step(n => (n, 3 * n)){ (xs: List[Int]) =>
println("incrementing...")
xs.map(1.+)
}
val timesSelf = Step(n => (n * n, 5 * n * n)) { (xs: List[Int]) =>
println("^2...")
for (x <- xs; y <- xs) yield (x, y)
}
val addPairs = Step(n => (n, 7 * n)) { (xs: List[(Int, Int)]) =>
println("adding...")
xs.map{ case (a, b) => a + b }
}
val y = x map incrementEach map timesSelf map addPairs
println("Estimated costs (manually): " + (10 + 30 + 500 + 700))
println("Estimated costs (automatically): " + y.costs)
println("(nothing computed so far)")
println(y.result)
}
The output stays exactly the same as in the previous section.
PS: I think I came up with a more concise way to summarize this entire answer:
Use the product category of the ordinary ambient Scala category (types and functions) with the monoid of endomorphisms on object Int in the Kleisli category of Writer[Int, ?].
In some hypothetical language, the answer might have been:
Use Sc * End{Kl(Writer[Int, ?])}[Int].
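Stripped of the terminology, the estimation half of that summary is just function composition that sums costs along the way. The following is a sketch without Cats, assuming a plain (Costs, Size) tuple in place of Writer[Int, Int]; the names Estimate, empty and andThen are made up for this example.

```scala
object SizeCostMonoid {
  type Size = Int
  type Costs = Int

  // An endomorphism on Size in the "Writer-like" sense: size in, (costs, size) out.
  type Estimate = Size => (Costs, Size)

  // Neutral element: costs nothing and leaves the size unchanged.
  val empty: Estimate = s => (0, s)

  // Monoid operation: run f, feed its size estimate to g, add up the costs.
  def andThen(f: Estimate, g: Estimate): Estimate = { s =>
    val (c1, s1) = f(s)
    val (c2, s2) = g(s1)
    (c1 + c2, s2)
  }

  // The three steps from the example above, reduced to their estimates.
  val incrementEach: Estimate = n => (3 * n, n)
  val timesSelf: Estimate = n => (5 * n * n, n * n)
  val addPairs: Estimate = n => (7 * n, n)

  val pipeline: Estimate =
    List(incrementEach, timesSelf, addPairs).foldLeft(empty)(andThen)
}
```

Applying pipeline to the initial size 10 gives costs 30 + 500 + 700 = 1230 and a size estimate of 100; adding the initial costs of 10 gives the 1240 printed above.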

First of all, there is no reason to reinvent your own Functor and FlatMap, and I would strongly advise you to use an existing implementation.
If you need deferred computation, then cats.data.Writer[Int, ?] is your friend.
With its support you can record your cost as well as get functor and monad instances.
Let me give you an example. We start with some initial cost:
val w = Writer.put("F1")(0)
w.flatMap(v => Writer.value(v + "F2"))

Related

Monadic approach to estimating PI in scala

I'm trying to understand how to leverage monads in scala to solve simple problems as a way of building up my familiarity. One simple problem is estimating PI using a functional random number generator. I'm including the code below for a simple stream-based approach.
I'm looking for help in translating this to a monadic approach. For example, is there an idiomatic way to convert this code to use the State monad (and other monads) in a stack-safe way?
trait RNG {
def nextInt: (Int, RNG)
def nextDouble: (Double, RNG)
}
case class Point(x: Double, y: Double) {
val isInCircle = (x * x + y * y) < 1.0
}
object RNG {
def nonNegativeInt(rng: RNG): (Int, RNG) = {
val (ni, rng2) = rng.nextInt
if (ni > 0) (ni, rng2)
else if (ni == Int.MinValue) (0, rng2)
else (ni + Int.MaxValue, rng2)
}
def double(rng: RNG): (Double, RNG) = {
val (ni, rng2) = nonNegativeInt(rng)
(ni.toDouble / Int.MaxValue, rng2)
}
case class Simple(seed: Long) extends RNG {
def nextInt: (Int, RNG) = {
val newSeed = (seed * 0x5DEECE66DL + 0xBL) & 0xFFFFFFFFFFFFL
val nextRNG = Simple(newSeed)
val n = (newSeed >>> 16).toInt
(n, nextRNG)
}
def nextDouble: (Double, RNG) = {
val (n, nextRNG) = nextInt
double(nextRNG)
}
}
}
object PI {
import RNG._
def doubleStream(rng: Simple):Stream[Double] = rng.nextDouble match {
case (d:Double, next:Simple) => d #:: doubleStream(next)
}
def estimate(rng: Simple, iter: Int): Double = {
val doubles = doubleStream(rng).take(iter)
val inside = (doubles zip doubles.drop(3))
.map { case (a, b) => Point(a, b) }
.filter(p => p.isInCircle)
.size * 1.0
(inside / iter) * 4.0
}
}
// > PI.estimate(RNG.Simple(10), 100000)
// res1: Double = 3.14944
I suspect I'm looking for something like replicateM from the Applicative monad in cats but I'm not sure how to line up the types or how to do it in a way that doesn't accumulate intermediate results in memory. Or, is there a way to do it with a for comprehension that can iteratively build up Points?
If you want to iterate using a monad in a stack-safe way, there is a tailRecM method in the Monad type class:
// assuming random generated [-1.0,1.0]
def calculatePi[F[_]](iterations: Int)
(random: => F[Double])
(implicit F: Monad[F]): F[Double] = {
case class Iterations(total: Int, inCircle: Int)
def step(data: Iterations): F[Either[Iterations, Double]] = for {
x <- random
y <- random
isInCircle = (x * x + y * y) < 1.0
newTotal = data.total + 1
newInCircle = data.inCircle + (if (isInCircle) 1 else 0)
} yield {
if (newTotal >= iterations) Right(newInCircle.toDouble / newTotal.toDouble * 4.0)
else Left(Iterations(newTotal, newInCircle))
}
// iterates until Right value is returned
F.tailRecM(Iterations(0, 0))(step)
}
calculatePi(10000)(Future { Random.nextDouble }).onComplete(println)
It uses a by-name parameter because you might pass in something like a Future (even though Future is not lawful), which is eager, so you would otherwise end up reusing the same already-evaluated value time and time again. With a by-name parameter you at least have the chance of passing in a recipe for a side-effecting random generator. Of course, if we use Option or List as the monad holding our "random" number, we should also expect funny results.
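The difference between the two parameter styles can be seen in isolation with a small sketch. ByNameDemo, draw, and the two sum functions are hypothetical names; draw stands in for a side-effecting source such as Random.nextDouble and returns a counter so that the behaviour is predictable.

```scala
object ByNameDemo {
  private var draws = 0

  // How many times the side-effecting source has been evaluated so far.
  def drawCount: Int = draws

  // Hypothetical side-effecting source: returns 1.0, 2.0, 3.0, ... on successive calls.
  def draw(): Double = { draws += 1; draws.toDouble }

  // By-value: the argument is evaluated once, before the call, and then reused.
  def sumTwiceByValue(x: Double): Double = x + x

  // By-name: the argument expression is re-evaluated at every use.
  def sumTwiceByName(x: => Double): Double = x + x
}
```

Starting from a fresh ByNameDemo, sumTwiceByValue(draw()) evaluates draw once and adds the same number to itself, while sumTwiceByName(draw()) evaluates it twice and adds two different numbers. An eagerly started Future passed by value behaves like the first case.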
The correct solution is to use something that ensures that this F[A] is lazily evaluated, and that any side effect inside is evaluated each time you need a value from it. For that you basically have to use one of the effect type classes, e.g. Sync from Cats Effect.
def calculatePi[F[_]](iterations: Int)
(random: F[Double])
(implicit F: Sync[F]): F[Double] = {
...
}
calculatePi(10000)(Coeval( Random.nextDouble )).value
calculatePi(10000)(Task( Random.nextDouble )).runAsync
Alternatively, if you don't care about purity that much, you could pass a side-effecting function or object instead of F[Double] for generating random numbers.
// simplified, hardcoded F=Coeval
def calculatePi(iterations: Int)
(random: () => Double): Double = {
case class Iterations(total: Int, inCircle: Int)
def step(data: Iterations) = Coeval {
val x = random()
val y = random()
val isInCircle = (x * x + y * y) < 1.0
val newTotal = data.total + 1
val newInCircle = data.inCircle + (if (isInCircle) 1 else 0)
if (newTotal >= iterations) Right(newInCircle.toDouble / newTotal.toDouble * 4.0)
else Left(Iterations(newTotal, newInCircle))
}
Monad[Coeval].tailRecM(Iterations(0, 0))(step).value
}
Here is another approach that my friend Charles Miller came up with. It's a bit more direct since it uses RNG directly, but it follows the same Monad-based approach provided by @Mateusz Kubuszok above.
The key difference is that it leverages the State monad so we can thread the RNG state through the computation and generate the random numbers using the "pure" random number generator.
import cats._
import cats.data._
import cats.implicits._
object PICharles {
type RNG[A] = State[Long, A]
object RNG {
def nextLong: RNG[Long] =
State.modify[Long](
seed ⇒ (seed * 0x5DEECE66DL + 0xBL) & 0xFFFFFFFFFFFFL
) >> State.get
def nextInt: RNG[Int] = nextLong.map(l ⇒ (l >>> 16).toInt)
def nextNatural: RNG[Int] = nextInt.map { i ⇒
if (i > 0) i
else if (i == Int.MinValue) 0
else i + Int.MaxValue
}
def nextDouble: RNG[Double] = nextNatural.map(_.toDouble / Int.MaxValue)
def runRng[A](seed: Long)(rng: RNG[A]): A = rng.runA(seed).value
def unsafeRunRng[A]: RNG[A] ⇒ A = runRng(System.currentTimeMillis)
}
object PI {
case class Step(count: Int, inCircle: Int)
def calculatePi(iterations: Int): RNG[Double] = {
def step(s: Step): RNG[Either[Step, Double]] =
for {
x ← RNG.nextDouble
y ← RNG.nextDouble
isInCircle = (x * x + y * y) < 1.0
newInCircle = s.inCircle + (if (isInCircle) 1 else 0)
} yield {
if (s.count >= iterations)
Right(s.inCircle.toDouble / s.count.toDouble * 4.0)
else
Left(Step(s.count + 1, newInCircle))
}
Monad[RNG].tailRecM(Step(0, 0))(step(_))
}
def unsafeCalculatePi(iterations: Int) =
RNG.unsafeRunRng(calculatePi(iterations))
}
}
Thanks Charles & Mateusz for your help!

Roughly (or partially) sort a list in Scala

Considering a list of several million objects like:
case class Point(val name:String, val x:Double, val y:Double)
I need, for a given Point target, to pick the 10 other points which are closest to the target.
val target = Point("myPoint", 34, 42)
val points = List(...) // list of several million points
def distance(p1: Point, p2: Point) = ??? // return the distance between two points
val closest10 = points.sortWith((a, b) => {
distance(a, target) < distance(b, target)
}).take(10)
This method works but is very slow. Indeed, the whole list is exhaustively sorted for each target request, whereas past the first 10 closest points, I really don't care about any kind of sorting. I don't even need that the first 10 closest are returned in the correct order.
Ideally, I'd be looking for a "return 10 first and don't pay attention to the rest" kind of method.
The naive solution I can think of goes like this: sort by buckets of 1000, take the first bucket, sort it by buckets of 100, take the first bucket, sort it by buckets of 10, take the first bucket, done.
Question is, I guess this must be a very common problem in CS, so before rolling out my own solution based on this naive approach, I'd like to know of any state-of-the-art way of doing that, or even if some standard method already exists.
TL;DR how to get the first 10 items of an unsorted list, without having to sort the whole list?
Below is a barebone method adapted from this SO answer for picking n smallest integers from a list (which can be enhanced to handle more complex data structure):
def nSmallest(n: Int, list: List[Int]): List[Int] = {
def update(l: List[Int], e: Int): List[Int] =
if (e < l.head) (e :: l.tail).sortWith(_ > _) else l
list.drop(n).foldLeft( list.take(n).sortWith(_ > _) )( update(_, _) )
}
nSmallest( 5, List(3, 2, 8, 2, 9, 1, 5, 5, 9, 1, 7, 3, 4) )
// res1: List[Int] = List(3, 2, 2, 1, 1)
Please note that the output is in reverse (descending) order.
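One way to extend this to arbitrary elements is to keep a bounded max-heap ordered by a key function. The following is a sketch, not the linked answer's code: nSmallestBy, TopN, and the distance helper are names made up for this example, and Point mirrors the case class from the question.

```scala
import scala.collection.mutable

object TopN {
  final case class Point(name: String, x: Double, y: Double)

  // Euclidean distance, assumed here as the metric for the example.
  def distance(a: Point, b: Point): Double = math.hypot(a.x - b.x, a.y - b.y)

  // Returns the n elements with the smallest keys, in ascending key order.
  def nSmallestBy[A, B](n: Int, xs: Iterable[A])(key: A => B)(implicit ord: Ordering[B]): List[A] =
    if (n <= 0) Nil
    else {
      // Max-heap on the key: the head is the worst candidate kept so far.
      val heap = mutable.PriorityQueue.empty[(B, A)](Ordering.by[(B, A), B](_._1))
      xs.foreach { x =>
        val k = key(x)
        if (heap.size < n) heap.enqueue((k, x))
        else if (ord.lt(k, heap.head._1)) {
          heap.dequeue()        // drop the current worst candidate
          heap.enqueue((k, x))  // keep the better one
        }
      }
      // Only the n survivors are sorted, never the whole input.
      heap.toList.sortBy(_._1).map(_._2)
    }
}
```

For the question's use case this would be called as TopN.nSmallestBy(10, points)(p => TopN.distance(p, target)). Each element costs at most one heap update, so the work is roughly N log n for N points and n results, and each distance is computed only once.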
I was looking at this and wondered if a PriorityQueue might be useful.
import scala.collection.mutable.PriorityQueue
case class Point(val name:String, val x:Double, val y:Double)
val target = Point("myPoint", 34, 42)
val points = List(...) //list of points
def distance(p1: Point, p2: Point) = ??? //distance between two points
//load points-priority-queue with first 10 points
val ppq = PriorityQueue(points.take(10):_*){
case (a,b) => distance(a,target) compare distance(b,target) //prioritize points
}
//step through everything after the first 10
points.drop(10).foldLeft(distance(ppq.head,target))((mxDst,nextPnt) =>
if (mxDst > distance(nextPnt,target)) {
ppq.dequeue() //drop current far point
ppq.enqueue(nextPnt) //load replacement point
distance(ppq.head,target) //return new max distance
} else mxDst)
val result: Seq[Point] = ppq.dequeueAll //10 closest points
Here is how it can be done with QuickSelect. I used an in-place QuickSelect. Basically, for every target point we calculate the distance between each point and the target, and use QuickSelect to get the k-th smallest distance (the k-th order statistic). Whether this is faster than sorting depends on factors such as the number of points, the number of nearest points requested, and the number of targets. On my machine, for 3 million randomly generated points, 10 target points, and 10 nearest points per target, it's about 2 times faster than sorting:
Number of points: 3000000
Number of targets: 10
Number of nearest: 10
QuickSelect: 10737 ms.
Sort: 20763 ms.
Results from QuickSelect are valid
Code:
import scala.annotation.tailrec
import scala.concurrent.duration.Deadline
import scala.util.Random
case class Point(val name: String, val x: Double, val y: Double)
class NearestPoints(val points: Seq[Point]) {
private case class PointWithDistance(p: Point, d: Double) extends Ordered[PointWithDistance] {
def compare(that: PointWithDistance): Int = d.compareTo(that.d)
}
def distance(p1: Point, p2: Point): Double = {
Math.sqrt(Math.pow(p2.x - p1.x, 2) + Math.pow(p2.y - p1.y, 2))
}
def get(target: Point, n: Int): Seq[Point] = {
val pd = points.map(p => PointWithDistance(p, distance(p, target))).toArray
(1 to n).map(i => quickselect(i, pd).get.p)
}
// In-place QuickSelect from https://gist.github.com/mooreniemi/9e45d55c0410cad0a9eb6d62a5b9b7ae
def quickselect[T <% Ordered[T]](k: Int, xs: Array[T]): Option[T] = {
def randint(lo: Int, hi: Int): Int =
lo + scala.util.Random.nextInt((hi - lo) + 1)
@inline
def swap[T](xs: Array[T], i: Int, j: Int): Unit = {
val t = xs(i)
xs(i) = xs(j)
xs(j) = t
}
def partition[T <% Ordered[T]](xs: Array[T], l: Int, r: Int): Int = {
var pivotIndex = randint(l, r)
val pivotValue = xs(pivotIndex)
swap(xs, r, pivotIndex)
pivotIndex = l
var i = l
while (i <= r - 1) {
if (xs(i) < pivotValue) {
swap(xs, i, pivotIndex)
pivotIndex = pivotIndex + 1
}
i = i + 1
}
swap(xs, r, pivotIndex)
pivotIndex
}
@tailrec
def quickselect0[T <% Ordered[T]](xs: Array[T], l: Int, r: Int, k: Int): T = {
if (l == r) {
xs(l)
} else {
val pivotIndex = partition(xs, l, r)
k compare pivotIndex match {
case 0 => xs(k)
case -1 => quickselect0(xs, l, pivotIndex - 1, k)
case 1 => quickselect0(xs, pivotIndex + 1, r, k)
}
}
}
xs match {
case _ if xs.isEmpty => None
case _ if k < 1 || k > xs.length => None
case _ => Some(quickselect0(xs, 0, xs.size - 1, k - 1))
}
}
}
object QuickSelectVsSort {
def main(args: Array[String]): Unit = {
val rnd = new Random(42L)
val MAX_N: Int = 3000000
val NUM_OF_NEARESTS: Int = 10
val NUM_OF_TARGETS: Int = 10
println(s"Number of points: $MAX_N")
println(s"Number of targets: $NUM_OF_TARGETS")
println(s"Number of nearest: $NUM_OF_NEARESTS")
// Generate random points
val points = (1 to MAX_N)
.map(x => Point(x.toString, rnd.nextDouble, rnd.nextDouble))
// Generate target points
val targets = (1 to NUM_OF_TARGETS).map(x => Point(s"Target$x", rnd.nextDouble, rnd.nextDouble))
var start = Deadline.now
val np = new NearestPoints(points)
val viaQuickSelect = targets.map { case target =>
val nearest = np.get(target, NUM_OF_NEARESTS)
nearest
}
var end = Deadline.now
println(s"QuickSelect: ${(end - start).toMillis} ms.")
start = Deadline.now
val viaSort = targets.map { case target =>
val closest = points.sortWith((a, b) => {
np.distance(a, target) < np.distance(b, target)
}).take(NUM_OF_NEARESTS)
closest
}
end = Deadline.now
println(s"Sort: ${(end - start).toMillis} ms.")
// Validate
assert(viaQuickSelect.length == viaSort.length)
viaSort.zipWithIndex.foreach { case (p, idx) =>
assert(p == viaQuickSelect(idx))
}
println("Results from QuickSelect are valid")
}
}
For finding the top n elements in a list you can Quicksort it and terminate early. That is, terminate at the point where you know there are n elements that are bigger than the pivot. See my implementation in the Rank class of Apache Jackrabbit (in Java though), which does just that.
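The early-termination idea can be sketched in a few lines of Scala. This is not the Jackrabbit implementation; it is a simple functional variant, written for the n smallest elements to match the question, and the names PartialSelect and smallestN are made up. It partitions around a pivot and only recurses into the side that still matters.

```scala
object PartialSelect {
  // Returns the n smallest elements of xs in no particular order.
  def smallestN[A](n: Int, xs: Vector[A])(implicit ord: Ordering[A]): Vector[A] =
    if (n <= 0 || xs.isEmpty) Vector.empty
    else if (xs.size <= n) xs
    else {
      val pivot = xs(xs.size / 2)
      val smaller = xs.filter(ord.lt(_, pivot))
      val equal = xs.filter(ord.equiv(_, pivot))
      val larger = xs.filter(ord.gt(_, pivot))
      // Everything needed is left of the pivot: the other partitions are never sorted.
      if (n <= smaller.size) smallestN(n, smaller)
      // The pivot's duplicates fill the remaining slots.
      else if (n <= smaller.size + equal.size) smaller ++ equal.take(n - smaller.size)
      // Otherwise keep both and select the rest from the larger side.
      else smaller ++ equal ++ smallestN(n - smaller.size - equal.size, larger)
    }
}
```

Because the pivot always lands in the "equal" partition, each recursive call works on a strictly smaller vector. A fixed middle pivot keeps the sketch deterministic; a random pivot, as in the QuickSelect answer above, is the usual defence against bad inputs.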

Scala Probabilistic Priority Queue - dequeue with probability by priority

I have a priority queue, which holds several tasks, each task with a numeric non-unique priority, as follows:
import scala.collection.mutable
class Task(val name: String, val priority: Int) {
override def toString = s"Task(name=$name, priority=$priority)"
}
val task_a = new Task("a", 5)
val task_b = new Task("b", 1)
val task_c = new Task("c", 5)
val pq: mutable.PriorityQueue[Task] =
new mutable.PriorityQueue()(Ordering.by(_.priority))
pq.enqueue(task_a)
pq.enqueue(task_b)
pq.enqueue(task_c)
I want to get the next task:
pq.dequeue()
But this way, I'll always get back task a, even though there's also task c with the same priority.
How to get one of the items with the highest priority randomly? That is to get either task a or task c, with 50/50 chance.
How to get any of the items randomly, with probability according to priority? That is to get 45% task a, 10% task b, and 45% task c.
This should be a good starting point:
import scala.collection.immutable.SortedMap
abstract class ProbPriorityQueue[V] {
protected type K
protected implicit def ord: Ordering[K]
protected val impl: SortedMap[K, Set[V]]
protected val priority: V => K
def isEmpty: Boolean = impl.isEmpty
def dequeue: Option[(V, ProbPriorityQueue[V])] = {
if (isEmpty) {
None
} else {
// I wish Scala allowed us to collapse these operations...
val k = impl.lastKey
val s = impl(k)
val v = s.head
val s2 = s - v
val impl2 = if (s2.isEmpty)
impl - k
else
impl.updated(k, s2)
Some((v, ProbPriorityQueue.create(impl2, priority)))
}
}
}
object ProbPriorityQueue {
def apply[K: Ordering, V](vs: V*)(priority: V => K): ProbPriorityQueue[V] = {
val impl = vs.foldLeft(SortedMap[K, Set[V]]()) {
case (acc, v) =>
val k = priority(v)
acc get k map { s => acc.updated(k, s + v) } getOrElse (acc + (k -> Set(v)))
}
create(impl, priority)
}
private def create[K0, V](impl0: SortedMap[K0, Set[V]], priority0: V => K0)(implicit ord0: Ordering[K0]): ProbPriorityQueue[V] =
new ProbPriorityQueue[V] {
type K = K0
def ord = ord0
val impl = impl0
val priority = priority0
}
}
I didn't implement the select function, which would produce a value with weighted probability, but that shouldn't be too hard to do. In order to implement that function, you will need an additional mapping function (similar to priority) which has type K => Double, where Double is the probability weight attached to a particular key bucket. This makes everything somewhat messier, so it didn't seem worth bothering about.
Also this seems like a remarkably specific set of requirements. You're either doing a very interesting bit of distributed scheduling, or homework.
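For the weighted selection that was left unimplemented, one common technique is to roll a number in [0, 1), scale it by the total weight, and walk the cumulative sums. The sketch below assumes positive weights and works on a plain Seq rather than on the ProbPriorityQueue above; WeightedPick, pickIndex and pick are hypothetical names.

```scala
import scala.util.Random

object WeightedPick {
  // Picks an index with probability proportional to its weight.
  // `roll` is expected to be in [0, 1); weights are assumed to be positive.
  def pickIndex(weights: Seq[Double], roll: Double): Int = {
    require(weights.nonEmpty, "weights must not be empty")
    val target = roll * weights.sum
    // Running totals, e.g. weights 5, 1, 5 become 5, 6, 11.
    val cumulative = weights.scanLeft(0.0)(_ + _).tail
    val idx = cumulative.indexWhere(target < _)
    // Guard against rounding at the upper edge.
    if (idx >= 0) idx else weights.size - 1
  }

  // Convenience wrapper: derive the weights from the items themselves.
  def pick[A](items: Seq[A], rng: Random)(weight: A => Double): Option[A] =
    if (items.isEmpty) None
    else Some(items(pickIndex(items.map(weight), rng.nextDouble())))
}
```

With the tasks from the question, WeightedPick.pick(Seq(task_a, task_b, task_c), new Random())(_.priority.toDouble) selects a and c with probability 5/11 each (about 45%) and b with probability 1/11 (about 9%). Restricting the input to the tasks with the highest priority and giving them equal weights covers the 50/50 case.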

abstracting over a collection

Recently, I wrote an iterator for a cartesian product of Anys. I started with a List of List, but realized that I could easily switch to the more abstract trait Seq.
I know, you like to see the code. :)
class Cartesian (val ll: Seq[Seq[_]]) extends Iterator [Seq[_]] {
def combicount: Int = (1 /: ll) (_ * _.length)
val last = combicount
var iter = 0
override def hasNext (): Boolean = iter < last
override def next (): Seq[_] = {
val res = combination (ll, iter)
iter += 1
res
}
def combination (xx: Seq [Seq[_]], i: Int): List[_] = xx match {
case Nil => Nil
case x :: xs => x (i % x.length) :: combination (xs, i / x.length)
}
}
And a client of that class:
object Main extends Application {
val illi = new Cartesian (List ("abc".toList, "xy".toList, "AB".toList))
// val ivvi = new Cartesian (Vector (Vector (1, 2, 3), Vector (10, 20)))
val issi = new Cartesian (Seq (Seq (1, 2, 3), Seq (10, 20)))
// val iaai = new Cartesian (Array (Array (1, 2, 3), Array (10, 20)))
(0 to 5).foreach (dummy => println (illi.next ()))
// (0 to 5).foreach (dummy => println (issi.next ()))
}
/*
List(a, x, A)
List(b, x, A)
List(c, x, A)
List(a, y, A)
List(b, y, A)
List(c, y, A)
*/
The code works well for Seq and Lists (which are Seqs), but of course not for Arrays or Vector, which aren't of type Seq, and don't have a cons-method '::'.
But the logic could be used for such collections too.
I could try to write an implicit conversion to and from Seq for Vector, Array, and such, or try to write my own similar implementation, or write a wrapper which transforms the collection to a Seq of Seq, calls 'hasNext' and 'next' for the inner collection, and converts the result to an Array, Vector or whatever. (I tried to implement such workarounds, but I have to admit: it's not that easy. For a real-world problem I would probably rewrite the Iterator independently.)
However, the whole thing gets a bit out of control if I have to deal with Arrays of Lists or Lists of Arrays and other mixed cases.
What would be the most elegant way to write the algorithm in the broadest, possible way?
There are two solutions. The first is to not require the containers to be a subclass of some generic super class, but to be convertible to one (by using implicit function arguments). If the container is already a subclass of the required type, there's a predefined identity conversion which only returns it.
import collection.mutable.Builder
import collection.TraversableLike
import collection.generic.CanBuildFrom
import collection.mutable.SeqLike
class Cartesian[T, ST[T], TT[S]](val ll: TT[ST[T]])(implicit cbf: CanBuildFrom[Nothing, T, ST[T]], seqLike: ST[T] => SeqLike[T, ST[T]], traversableLike: TT[ST[T]] => TraversableLike[ST[T], TT[ST[T]]] ) extends Iterator[ST[T]] {
def combicount (): Int = (1 /: ll) (_ * _.length)
val last = combicount - 1
var iter = 0
override def hasNext (): Boolean = iter < last
override def next (): ST[T] = {
val res = combination (ll, iter, cbf())
iter += 1
res
}
def combination (xx: TT[ST[T]], i: Int, builder: Builder[T, ST[T]]): ST[T] =
if (xx.isEmpty) builder.result
else combination (xx.tail, i / xx.head.length, builder += xx.head (i % xx.head.length) )
}
This sort of works:
scala> new Cartesian[String, Vector, Vector](Vector(Vector("a"), Vector("xy"), Vector("AB")))
res0: Cartesian[String,Vector,Vector] = empty iterator
scala> new Cartesian[String, Array, Array](Array(Array("a"), Array("xy"), Array("AB")))
res1: Cartesian[String,Array,Array] = empty iterator
I needed to explicitly pass the types because of bug https://issues.scala-lang.org/browse/SI-3343
One thing to note is that this is better than using existential types, because calling next on the iterator returns the right type, and not Seq[Any].
There are several drawbacks here:
If the container is not a subclass of the required type, it is converted to one, which costs in performance
The algorithm is not completely generic. We need types to be converted to SeqLike or TraversableLike only to use a subset of functionality these types offer. So making a conversion function can be tricky.
What if some capabilities can be interpreted differently in different contexts? For example, a rectangle has two 'length' properties (width and height)
Now for the alternative solution. We note that we don't actually care about the types of collections, just their capabilities:
TT should have foldLeft, get(i: Int) (to get head/tail)
ST should have length, get(i: Int) and a Builder
So we can encode these:
trait HasGet[T, CC[_]] {
def get(cc: CC[T], i: Int): T
}
object HasGet {
implicit def seqLikeHasGet[T, CC[X] <: SeqLike[X, _]] = new HasGet[T, CC] {
def get(cc: CC[T], i: Int): T = cc(i)
}
implicit def arrayHasGet[T] = new HasGet[T, Array] {
def get(cc: Array[T], i: Int): T = cc(i)
}
}
trait HasLength[CC] {
def length(cc: CC): Int
}
object HasLength {
implicit def seqLikeHasLength[CC <: SeqLike[_, _]] = new HasLength[CC] {
def length(cc: CC) = cc.length
}
implicit def arrayHasLength[T] = new HasLength[Array[T]] {
def length(cc: Array[T]) = cc.length
}
}
trait HasFold[T, CC[_]] {
def foldLeft[A](cc: CC[T], zero: A)(op: (A, T) => A): A
}
object HasFold {
implicit def seqLikeHasFold[T, CC[X] <: SeqLike[X, _]] = new HasFold[T, CC] {
def foldLeft[A](cc: CC[T], zero: A)(op: (A, T) => A): A = cc.foldLeft(zero)(op)
}
implicit def arrayHasFold[T] = new HasFold[T, Array] {
def foldLeft[A](cc: Array[T], zero: A)(op: (A, T) => A): A = {
var i = 0
var result = zero
while (i < cc.length) {
result = op(result, cc(i))
i += 1
}
result
}
}
}
(Strictly speaking, HasFold is not required, since it can be implemented in terms of length and get, but I added it here so the algorithm translates more cleanly.)
Now the algorithm is:
class Cartesian[T, ST[_], TT[Y]](val ll: TT[ST[T]])(implicit cbf: CanBuildFrom[Nothing, T, ST[T]], stHasLength: HasLength[ST[T]], stHasGet: HasGet[T, ST], ttHasFold: HasFold[ST[T], TT], ttHasGet: HasGet[ST[T], TT], ttHasLength: HasLength[TT[ST[T]]]) extends Iterator[ST[T]] {
def combicount (): Int = ttHasFold.foldLeft(ll, 1)((a,l) => a * stHasLength.length(l))
val last = combicount - 1
var iter = 0
override def hasNext (): Boolean = iter < last
override def next (): ST[T] = {
val res = combination (ll, 0, iter, cbf())
iter += 1
res
}
def combination (xx: TT[ST[T]], j: Int, i: Int, builder: Builder[T, ST[T]]): ST[T] =
if (ttHasLength.length(xx) == j) builder.result
else {
val head = ttHasGet.get(xx, j)
val headLength = stHasLength.length(head)
combination (xx, j + 1, i / headLength, builder += stHasGet.get(head, (i % headLength) ))
}
}
And use:
scala> new Cartesian[String, Vector, List](List(Vector("a"), Vector("xy"), Vector("AB")))
res6: Cartesian[String,Vector,List] = empty iterator
scala> new Cartesian[String, Array, Array](Array(Array("a"), Array("xy"), Array("AB")))
res7: Cartesian[String,Array,Array] = empty iterator
Scalaz probably has all of this predefined for you, unfortunately, I don't know it well.
(again I need to pass the types because inference doesn't infer the right kind)
The benefit is that the algorithm is now completely generic and that there is no need for implicit conversions from Array to WrappedArray in order for it to work
Exercise: define for tuples ;-)

How can I extend Scala collections with an argmax method?

I would like to add to all collections where it makes sense, an argMax method.
How to do it? Use implicits?
On Scala 2.8, this works:
val list = List(1, 2, 3)
def f(x: Int) = -x
val argMax = list max (Ordering by f)
As pointed out by mkneissl, this does not return the set of maximum points. Here's an alternate implementation that does, and tries to reduce the number of calls to f. If calls to f don't matter that much, see mkneissl's answer. Also, note that his answer is curried, which provides superior type inference.
def argMax[A, B: Ordering](input: Iterable[A], f: A => B) = {
val fList = input map f
val maxFList = fList.max
input.view zip fList filter (_._2 == maxFList) map (_._1) toSet
}
scala> argMax(-2 to 2, (x: Int) => x * x)
res15: scala.collection.immutable.Set[Int] = Set(-2, 2)
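To check the point about calls to f, here is a small sketch that counts them. It restates the same approach with dot syntax (and without the view) so that it also compiles on newer Scala versions. The counter and the object name CallCountSketch are assumptions added purely for the demonstration.

```scala
object CallCountSketch {
  var calls = 0
  // Hypothetical instrumented function: squares x and counts its invocations.
  val f: Int => Int = x => { calls += 1; x * x }

  def argMax[A, B: Ordering](input: Iterable[A], f: A => B): Set[A] = {
    val fList = input.map(f) // f runs once per element (for a strict collection)
    val maxFList = fList.max // throws on an empty input, like the original
    input.zip(fList).filter(_._2 == maxFList).map(_._1).toSet
  }
}
```

Calling CallCountSketch.argMax(-2 to 2, CallCountSketch.f) returns Set(-2, 2) and leaves calls at 5, one call per element.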
The argmax function (as I understand it from Wikipedia):
def argMax[A,B](c: Traversable[A])(f: A=>B)(implicit o: Ordering[B]): Traversable[A] = {
val max = (c map f).max(o)
c filter { f(_) == max }
}
If you really want, you can pimp it onto the collections:
implicit def enhanceWithArgMax[A](c: Traversable[A]) = new {
def argMax[B](f: A=>B)(implicit o: Ordering[B]): Traversable[A] = ArgMax.argMax(c)(f)(o)
}
and use it like this:
val l = -2 to 2
assert (argMax(l)(x => x*x) == List(-2,2))
assert (l.argMax(x => x*x) == List(-2,2))
(Scala 2.8)
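On Scala 2.10 and later, the same enrichment can be written as an implicit class, which bundles the wrapper and the conversion in one declaration. This is only a sketch: the names ArgMaxSyntax and ArgMaxOps are invented here, it uses Iterable rather than Traversable, and it compares with the Ordering's equiv instead of ==.

```scala
object ArgMaxSyntax {
  // Wrapper and implicit conversion in one declaration (Scala 2.10+).
  implicit class ArgMaxOps[A](private val c: Iterable[A]) {
    def argMax[B](f: A => B)(implicit o: Ordering[B]): Iterable[A] = {
      val max = c.map(f).max(o)          // throws on an empty collection
      c.filter(a => o.equiv(f(a), max))  // note: f is evaluated a second time here
    }
  }
}
```

After import ArgMaxSyntax._, the call (-2 to 2).argMax(x => x * x) yields the elements -2 and 2.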
Yes, the usual way would be to use the 'pimp my library' pattern to decorate your collection. For example (N.B. just as illustration, not meant to be a correct or working example):
trait PimpedList[A] {
val l: List[A]
//example argMax, not meant to be correct
def argMax[T <% Ordered[T]](f:T => T) = {error("your definition here")}
}
implicit def toPimpedList[A](xs: List[A]) = new PimpedList[A] {
val l = xs
}
scala> def f(i:Int):Int = 10
f: (i: Int) Int
scala> val l = List(1,2,3)
l: List[Int] = List(1, 2, 3)
scala> l.argMax(f)
java.lang.RuntimeException: your definition here
at scala.Predef$.error(Predef.scala:60)
at PimpedList$class.argMax(:12)
//etc etc...
Nice and easy? This returns the index of the maximum element:
val l = List(1,0,10,2)
l.zipWithIndex.maxBy(x => x._1)._2
You can add functions to an existing API in Scala by using the Pimp my Library pattern. You do this by defining an implicit conversion function. For example, I have a class Vector3 to represent 3D vectors:
class Vector3 (val x: Float, val y: Float, val z: Float)
Suppose I want to be able to scale a vector by writing something like: 2.5f * v. I can't directly add a * method to class Float of course, but I can supply an implicit conversion function like this:
implicit def scaleVector3WithFloat(f: Float) = new {
def *(v: Vector3) = new Vector3(f * v.x, f * v.y, f * v.z)
}
Note that this returns an object of a structural type (the new { ... } construct) that contains the * method.
I haven't tested it, but I guess you could do something like this:
implicit def argMaxImplicit[A](t: Traversable[A]) = new {
def argMax() = ...
}
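One thing to keep in mind with the new { ... } approach: methods on a structural type are invoked through reflection at runtime. A possible alternative on Scala 2.10 and later is an implicit value class. The sketch below assumes the Vector3 class from above; Vector3Syntax and FloatScale are illustrative names only.

```scala
object Vector3Syntax {
  class Vector3(val x: Float, val y: Float, val z: Float)

  // Value class wrapper: adds `*` for a Vector3 operand to Float
  // without going through a structural type.
  implicit class FloatScale(val f: Float) extends AnyVal {
    def *(v: Vector3): Vector3 = new Vector3(f * v.x, f * v.y, f * v.z)
  }
}
```

With import Vector3Syntax._ in scope, 2.5f * v resolves to this method, because Float itself has no * that accepts a Vector3.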
Here's a way of doing so with the implicit builder pattern. It has the advantage over the previous solutions that it works with any Traversable, and returns a similar Traversable. Sadly, it's pretty imperative. If anyone wants to, it could probably be turned into a fairly ugly fold instead.
object RichTraversable {
implicit def traversable2RichTraversable[A](t: Traversable[A]) = new RichTraversable[A](t)
}
class RichTraversable[A](t: Traversable[A]) {
def argMax[That, C](g: A => C)(implicit bf : scala.collection.generic.CanBuildFrom[Traversable[A], A, That], ord:Ordering[C]): That = {
// best key seen so far; null until the first element has been processed
var maximum:C = null.asInstanceOf[C]
val repr = t.repr
val builder = bf(repr)
for(a<-t){
val test: C = g(a)
if(test == maximum || maximum == null){
builder += a
maximum = test
}else if (ord.gt(test, maximum)){
builder.clear
builder += a
maximum = test
}
}
builder.result
}
}
Set(-2, -1, 0, 1, 2).argMax(x=>x*x) == Set(-2, 2)
List(-2, -1, 0, 1, 2).argMax(x=>x*x) == List(-2, 2)
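For anyone curious about the fold mentioned above, here is one way it might look. This is a simplified sketch, not a drop-in replacement: it always returns a List instead of rebuilding the original collection type through a builder, and the object name ArgMaxFold is made up.

```scala
object ArgMaxFold {
  // Single pass: carry the best key seen so far and the elements that reach it.
  def argMax[A, C](t: Iterable[A])(g: A => C)(implicit ord: Ordering[C]): List[A] = {
    val (_, best) = t.foldLeft((Option.empty[C], List.empty[A])) {
      case ((None, _), a) => (Some(g(a)), List(a))          // first element
      case ((current @ Some(m), acc), a) =>
        val test = g(a)
        if (ord.gt(test, m)) (Some(test), List(a))          // new maximum: start over
        else if (ord.equiv(test, m)) (current, a :: acc)    // tie: keep it as well
        else (current, acc)
    }
    best.reverse // elements were prepended, so restore encounter order
  }
}
```

g is evaluated once per element, and an empty input gives an empty list rather than an exception.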
Here's a variant loosely based on @Daniel's accepted answer that also works for Sets.
def argMax[A, B: Ordering](input: GenIterable[A], f: A => B) : GenSet[A] = argMaxZip(input, f) map (_._1) toSet
def argMaxZip[A, B: Ordering](input: GenIterable[A], f: A => B): GenIterable[(A, B)] = {
if (input.isEmpty) Nil
else {
val fPairs = input map (x => (x, f(x)))
val maxF = fPairs.map(_._2).max
fPairs filter (_._2 == maxF)
}
}
One could also do a variant that produces (B, Iterable[A]), of course.
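A rough sketch of that last variant, returning the maximal value together with the inputs that attain it. It uses plain Iterable for simplicity and wraps the result in an Option to cover the empty case; both choices, and the name argMaxWithValue, are assumptions rather than part of the answer above.

```scala
object ArgMaxWithValue {
  // Returns the maximal value of f together with every input that attains it,
  // or None when the input is empty.
  def argMaxWithValue[A, B: Ordering](input: Iterable[A], f: A => B): Option[(B, Iterable[A])] =
    if (input.isEmpty) None
    else {
      val fPairs = input.map(x => (x, f(x))) // f evaluated once per element
      val maxF = fPairs.map(_._2).max
      Some((maxF, fPairs.filter(_._2 == maxF).map(_._1)))
    }
}
```

For List(-2, -1, 0, 1, 2) and x => x * x this gives the value 4 paired with the elements -2 and 2.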
Based on other answers, you can pretty easily combine the strengths of each (minimal calls to f(), etc.). Here we have an implicit conversion for all Iterables (so they can just call .argmax() transparently), and a stand-alone method if for some reason that is preferred. ScalaTest tests to boot.
class Argmax[A](col: Iterable[A]) {
def argmax[B](f: A => B)(implicit ord: Ordering[B]): Iterable[A] = {
val mapped = col map f
val max = mapped max ord
(mapped zip col) filter (_._1 == max) map (_._2)
}
}
object MathOps {
implicit def addArgmax[A](col: Iterable[A]) = new Argmax(col)
def argmax[A, B](col: Iterable[A])(f: A => B)(implicit ord: Ordering[B]) = {
new Argmax(col) argmax f
}
}
class MathUtilsTests extends FunSuite {
import MathOps._
test("Can argmax with unique") {
assert((-10 to 0).argmax(_ * -1).toSet === Set(-10))
// or alternate calling syntax
assert(argmax(-10 to 0)(_ * -1).toSet === Set(-10))
}
test("Can argmax with multiple") {
assert((-10 to 10).argmax(math.pow(_, 2)).toSet === Set(-10, 10))
}
}