How to define a function as generic across all numbers in scala? - scala

I thought I needed to parameterise my function across all Ordering[_] types. But that doesn't work.
How can I make the following function work for all types that support the required mathematical operations, and how could I have found that out by myself?
/**
* Given a list of positive values and a candidate value, round the candidate value
* to the nearest value in the list of buckets.
*
* #param buckets
* #param candidate
* #return
*/
def bucketise(buckets: Seq[Int], candidate: Int): Int = {
// x <= y
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / 2f
if (candidate < midPoint) x else y
}
}
I tried command clicking on the mathematical operators (/, +) in intellij, but just got a notice Sc synthetic function.

If you want to use just the scala standard library, look at Numeric[T]. In your case, since you want to do a non-integer division, you would have to use the Fractional[T] subclass of Numeric.
Here is how the code would look using scala standard library typeclasses. Note that Fractional extends from Ordered. This is convenient in this case, but it is also not mathematically generic. E.g. you can't define a Fractional[T] for Complex because it is not ordered.
def bucketiseScala[T: Fractional](buckets: Seq[T], candidate: T): T = {
// so we can use integral operators such as + and /
import Fractional.Implicits._
// so we can use ordering operators such as <. We do have a Ordering[T]
// typeclass instance because Fractional extends Ordered
import Ordering.Implicits._
// integral does not provide a simple way to create an integral from an
// integer, so this ugly hack
val two = (implicitly[Fractional[T]].one + implicitly[Fractional[T]].one)
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / two
if (candidate < midPoint) x else y
}
}
However, for serious generic numerical computations I would suggest taking a look at spire. It provides a much more elaborate hierarchy of numerical typeclasses. Spire typeclasses are also specialized and therefore often as fast as working directly with primitives.
Here is how to use example would look using spire:
// imports all operator syntax as well as standard typeclass instances
import spire.implicits._
// we need to provide Order explicitly, since not all fields have an order.
// E.g. you can define a Field[Complex] even though complex numbers do not
// have an order.
def bucketiseSpire[T: Field: Order](buckets: Seq[T], candidate: T): T = {
// spire provides a way to get the typeclass instance using the type
// (standard practice in all libraries that use typeclasses extensively)
// the line below is equivalent to implicitly[Field[T]].fromInt(2)
// it also provides a simple way to convert from an integer
// operators are all enabled using the spire.implicits._ import
val two = Field[T].fromInt(2)
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / two
if (candidate < midPoint) x else y
}
}
Spire even provides automatic conversion from integers to T if there exists a Field[T], so you could even write the example like this (almost identical to the non-generic version). However, I think the example above is easier to understand.
// this is how it would look when using all advanced features of spire
def bucketiseSpireShort[T: Field: Order](buckets: Seq[T], candidate: T): T = {
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / 2
if (candidate < midPoint) x else y
}
}
Update: spire is very powerful and generic, but can also be somewhat confusing to a beginner. Especially when things don't work. Here is an excellent blog post explaining the basic approach and some of the issues.

Related

Scala Implicit Conversion Function Name Clashes

I am working with a simple complex number case class in Scala and would like to create an add function that works between complex numbers, doubles and ints. Below is a simple example of a working solution:
case class Complex(re: Double, im: Double)
implicit def toComplex[A](n: A)(implicit f: A => Double): Complex = Complex(n, 0)
implicit class NumberWithAdd[A](n: A)(implicit f: A => Complex) {
def add(m: Complex) = Complex(n.re + m.re, n.im + m.im)
}
Note I am deliberately not including the add function in the complex case class. Using the above I can do all of this:
scala> val z = Complex(1, 2); val w = Complex(2, 3)
z: Complex = Complex(1.0,2.0)
w: Complex = Complex(2.0,3.0)
scala> z add w
res5: Complex = Complex(3.0,5.0)
scala> z add 1
res6: Complex = Complex(2.0,2.0)
scala> 1 add z
res7: Complex = Complex(2.0,2.0)
I'd like to use '+' instead of 'add, but however this does not work. I get the following error:
Error:(14, 4) value + is not a member of A$A288.this.Complex
z + 1
^
Both z + w and 1 + z still work however.
What I'd like to know is why does changing the function name from 'add' to '+' break this? Is there an alternate route to getting this functionality (without simply putting the add function in the complex case class)? Any help would be appreciated.
Edit - Motivation
I'm playing around with monoids and other algebraic structures. I would like to be able to generalise the '...WithAdd' function to automatically work for any class that has a corresponding monoid:
trait Monoid[A] {
val identity: A
def op(x: A, y: A): A
}
implicit class withOp[A](n: A)(implicit val monoid: Monoid[A]) {
def +(m: A): A = monoid.op(n, m)
}
case class Complex(re: Double, im: Double) {
override def toString: String = re + " + " + im + "i"
}
class ComplexMonoid extends Monoid[Complex] {
val identity = Complex(0, 0)
def op(z: Complex, w: Complex): Complex = {
Complex(z.re + w.re, z.im + w.im)
}
}
implicit val complexMonoid = new ComplexMonoid
Using the above I can now do Complex(1, 2) + Complex(3, 1) giving Complex = 4.0 + 3.0i. This is great for code reuse as I could now add extra functions to the Monoid and withAdd function (such as appling op n times to an element, giving the power function for multiplication) and it would work for any case class that has a corresponding monoid. It is only with complex numbers and trying to incorporate doubles, ints, etc., that I then run into the problem above.
I would use a regular class, not a case class. Then it would be easy to create methods to add or subtract these Complex numbers, like:
class Complex(val real : Double, val imag : Double) {
def +(that: Complex) =
new Complex(this.real + that.real, this.imag + that.imag)
def -(that: Complex) =
new Complex(this.real - that.real, this.imag - that.imag)
override def toString = real + " + " + imag + "i"
}
As the source page shows, it will now support something that looks like operator overloading (it's not, because + and - are functions and not operators).
The problem with implicit class NumberWithAdd and its method + is that the same method also exist in number classes such as Int and Double. The + method of NumberWithAdd basically allows you to start with a number that can be casted to Complex and add a Complex object to that first item. That is, the left hand value can be anything (as long as it can be converted) and the right hand value must be Complex.
That works great for w + z (no need to convert w) and 1 + z (implicit conversion for Int to Complex is available). It fails for z + 1 because + is not available in the class Complex .
Since z + 1 is actually z.+(1), Scala will look for other possible matches for +(i: Int) in classes that Complex can be converted into. It also checks NumberWithAdd, which does have a + function but that one required a Complex as right hand value. (It would match a function that requires an Int as right hand value.) There are other functions named + that do accept Int, but there's no conversion from Complex to what those functions want as left hand values.
The same definition of + does work when it's in the (case) class Complex. In that case, both w + z and z + 1 simply use that definition. The case 1 + z is now a little more complicated. Since Int does not have a function + that accepts a Complex value, Scala will find the one that does (in Complex) and determines whether or not it is possible to convert Int into Complex. That is possible using the implicit functions, the conversion takes place and the function is executed.
When the function + in the class NumberWithAdd is renamed add, there's no confusion with functions in Int because Int does not have a function +. So Scala will try harder to apply the function add and it will do the Int to Complex conversion. It will even do that conversion when you try 1 add 2.
Note: My explanations may not fully describe the actual inner workings.

Scala vector scalar multiplication

I'm implementing some machine learning algorithm in Apache Spark MLlib and I would like to multiply vector with scalar:
Where u_i_j_m is a Double and x_i is a vector
I've tried the following:
import breeze.linalg.{ DenseVector => BDV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, Vectors, Vector}
...
private def runAlgorithm(data: RDD[VectorWithNorm]): = {
...
data.mapPartitions { data_ponts =>
c = Array.fill(clustersNum)(BDV.zeros[Double](dim).asInstanceOf[BV[Double]])
...
data_ponts.foreach { data_point =>
...
u_i_j_m : Double = ....
val temp= data_point.vector * u_i_j_m)
// c(j) = temp
}
}
}
Where VectorWithNorm is defined as following:
class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {
def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))
def this(array: Array[Double]) = this(Vectors.dense(array))
def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
}
But when I build the project I get the following error:
Error: value * is not a member of org.apache.spark.mllib.linalg.Vector
val temp = (data_point.vector * u_i_j_m)
How can I do this multiplication correctly?
Unfortunately the Spark-Scala contributors decided that they will not pick a library for underlying computations i.e. linear algebra, in Scala. Under the hood they use breeze, but scalar * and + on Spark Vector's are private, as well as other useful methods. This is quite different than python where you can use excellent numpy linear algebra library. The argument was that developers are stretched thin, that breeze was suspicious because development stalled (if I remember correctly), there was an alternative (apache.commons.math), so they decided to let the users pick which linalg library you want to use in Scala. But, prompted by some members of the community, there is now a spark-package which provides linear algebra on org.apache.spark.mllib.linalg.Vector - see here.
In your code you are using sparks's Vector trait instead of breeze's DenseVector, that's why there is no * operator defined on your data_point.vector member.

Understanding Random monad in Scala

This a follow-up to my previous question
Travis Brown pointed out that java.util.Random is side-effecting and suggested a random monad Rng library to make the code purely functional. Now I am trying to build a simplified random monad by myself to understand how it works.
Does it make sense ? How would you fix/improve the explanation below ?
Random Generator
First we plagiarize a random generating function from java.util.Random
// do some bit magic to generate a new random "seed" from the given "seed"
// and return both the new "seed" and a random value based on it
def next(seed: Long, bits: Int): (Long, Int) = ...
Note that next returns both the new seed and the value rather than just the value. We need it to pass the new seed to another function invocation.
Random Point
Now let's write a function to generate a random point in a unit square.
Suppose we have a function to generate a random double in range [0, 1]
def randomDouble(seed: Long): (Long, Double) = ... // some bit magic
Now we can write a function to generate a random point.
def randomPoint(seed: Long): (Long, (Double, Double)) = {
val (seed1, x) = randomDouble(seed)
val (seed2, y) = randomDouble(seed1)
(seed2, (x, y))
}
So far, so good and both randomDouble and randomPoint are pure. The only problem is that we compose randomDouble to build randomPoint ad hoc. We don't have a generic tool to compose functions yielding random values.
Monad Random
Now we will define a generic tool to compose functions yielding random values. First, we generalize the type of randomDouble:
type Random[A] = Long => (Long, A) // generate a random value of type A
and then build a wrapper class around it.
class Random[A](run: Long => (Long, A))
We need the wrapper to define methods flatMap (as bind in Haskell) and map used by for-comprehension.
class Random[A](run: Long => (Long, A)) {
def apply(seed: Long) = run(seed)
def flatMap[B](f: A => Random[B]): Random[B] =
new Random({seed: Long => val (seed1, a) = run(seed); f(a)(seed1)})
def map[B](f: A => B): Random[B] =
new Random({seed: Long = val (seed1, a) = run(seed); (seed1, f(a))})
}
Now we add a factory-function to create a trivial Random[A] (which is absolutely deterministic rather than "random", by the way) This is a return function (as return in Haskell).
def certain[A](a: A) = new Random({seed: Long => (seed, a)})
Random[A] is a computation yielding random value of type A. The methods flatMap , map, and function unit serves for composing simple computations to build more complex ones. For example, we will compose two Random[Double] to build Random[(Double, Double)].
Monadic Random Point
Now when we have a monad we are ready to revisit randomPoint and randomDouble. Now we define them differently as functions yielding Random[Double] and Random[(Double, Double)]
def randomDouble(): Random[Double] = new Random({seed: Long => ... })
def randomPoint(): Random[(Double, Double)] =
randomDouble().flatMap(x => randomDouble().flatMap(y => certain(x, y))
This implementation is better than the previous one since it uses a generic tool (flatMap and certain) to compose two calls of Random[Double] and build Random[(Double, Double)].
Now can re-use this tool to build more functions generating random values.
Monte-Carlo calculation of Pi
Now we can use map to test if a random point is in the circle:
def randomCircleTest(): Random[Boolean] =
randomPoint().map {case (x, y) => x * x + y * y <= 1}
We can also define a Monte-Carlo simulation in terms of Random[A]
def monteCarlo(test: Random[Boolean], trials: Int): Random[Double] = ...
and finally the function to calculate PI
def pi(trials: Int): Random[Double] = ....
All those functions are pure. Side-effects occur only when we finally apply the pi function to get the value of pi.
Your approach is quite nice although it is a little bit complicated. I also suggested you to take a look at Chapter 6 of Functional Programming in Scala By Paul Chiusano and Runar Bjarnason.
The chapter is called Purely Functional State and it shows how to create purely functional random generator and define its data type algebra on top of that to have full function composition support.

Scala - Enforcing size of Vector at compile time

Is it possible to enforce the size of a Vector passed in to a method at compile time? I want to model an n-dimensional Euclidean space using a collection of points in the space that looks something like this (this is what I have now):
case class EuclideanPoint(coordinates: Vector[Double]) {
def distanceTo(desination: EuclieanPoint): Double = ???
}
If I have a coordinate that is created via EuclideanPoint(Vector(1, 0, 0)), it is a 3D Euclidean point. Given that, I want to make sure the destination point passed in a call to distanceTo is of the same dimension.
I know I can do this by using Tuple1 to Tuple22, but I want to represent many different geometric spaces and I would be writing 22 classes for each space if I did it with Tuples - is there a better way?
It is possible to do this in a number of ways that all look more or less like what Randall Schulz has described in a comment. The Shapeless library provides a particularly convenient implementation, which lets you get something pretty close to what you want like this:
import shapeless._
case class EuclideanPoint[N <: Nat](
coordinates: Sized[IndexedSeq[Double], N] { type A = Double }
) {
def distanceTo(destination: EuclideanPoint[N]): Double =
math.sqrt(
(this.coordinates zip destination.coordinates).map {
case (a, b) => (a - b) * (a - b)
}.sum
)
}
Now you can write the following:
val orig2d = EuclideanPoint(Sized(0.0, 0.0))
val unit2d = EuclideanPoint(Sized(1.0, 1.0))
val orig3d = EuclideanPoint(Sized(0.0, 0.0, 0.0))
val unit3d = EuclideanPoint(Sized(1.0, 1.0, 1.0))
And:
scala> orig2d distanceTo unit2d
res0: Double = 1.4142135623730951
scala> orig3d distanceTo unit3d
res1: Double = 1.7320508075688772
But not:
scala> orig2d distanceTo unit3d
<console>:15: error: type mismatch;
found : EuclideanPoint[shapeless.Nat._3]
required: EuclideanPoint[shapeless.Nat._2]
orig2d distanceTo unit3d
^
Sized comes with a number of nice features, including a handful of collections operations that carry along static guarantees about length. We can write the following for example:
val somewhere = EuclideanPoint(Sized(0.0) ++ Sized(1.0, 0.0))
And have an ordinary old point in three-dimensional space.
You could do something your self by doing a type level encoding of the Natural Numbers like: http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/. Then just parametrizing your Vector by a Natural. Would not require the extra dependency, but would probably be more complicated then using Shapeless.

How does one write the Pythagoras Theorem in Scala?

The square of the hypotenuse of a right triangle is equal to the sum of the squares on the other two sides.
This is Pythagoras's Theorem. A function to calculate the hypotenuse based on the length "a" and "b" of it's sides would return sqrt(a * a + b * b).
The question is, how would you define such a function in Scala in such a way that it could be used with any type implementing the appropriate methods?
For context, imagine a whole library of math theorems you want to use with Int, Double, Int-Rational, Double-Rational, BigInt or BigInt-Rational types depending on what you are doing, and the speed, precision, accuracy and range requirements.
This only works on Scala 2.8, but it does work:
scala> def pythagoras[T](a: T, b: T, sqrt: T => T)(implicit n: Numeric[T]) = {
| import n.mkNumericOps
| sqrt(a*a + b*b)
| }
pythagoras: [T](a: T,b: T,sqrt: (T) => T)(implicit n: Numeric[T])T
scala> def intSqrt(n: Int) = Math.sqrt(n).toInt
intSqrt: (n: Int)Int
scala> pythagoras(3,4, intSqrt)
res0: Int = 5
More generally speaking, the trait Numeric is effectively a reference on how to solve this type of problem. See also Ordering.
The most obvious way:
type Num = {
def +(a: Num): Num
def *(a: Num): Num
}
def pyth[A <: Num](a: A, b: A)(sqrt: A=>A) = sqrt(a * a + b * b)
// usage
pyth(3, 4)(Math.sqrt)
This is horrible for many reasons. First, we have the problem of the recursive type, Num. This is only allowed if you compile this code with the -Xrecursive option set to some integer value (5 is probably more than sufficient for numbers). Second, the type Num is structural, which means that any usage of the members it defines will be compiled into corresponding reflective invocations. Putting it mildly, this version of pyth is obscenely inefficient, running on the order of several hundred thousand times slower than a conventional implementation. There's no way around the structural type though if you want to define pyth for any type which defines +, * and for which there exists a sqrt function.
Finally, we come to the most fundamental issue: it's over-complicated. Why bother implementing the function in this way? Practically speaking, the only types it will ever need to apply to are real Scala numbers. Thus, it's easiest just to do the following:
def pyth(a: Double, b: Double) = Math.sqrt(a * a + b * b)
All problems solved! This function is usable on values of type Double, Int, Float, even odd ones like Short thanks to the marvels of implicit conversion. While it is true that this function is technically less flexible than our structurally-typed version, it is vastly more efficient and eminently more readable. We may have lost the ability to calculate the Pythagrean theorem for unforeseen types defining + and *, but I don't think you're going to miss that ability.
Some thoughts on Daniel's answer:
I've experimented to generalize Numeric to Real, which would be more appropriate for this function to provide the sqrt function. This would result in:
def pythagoras[T](a: T, b: T)(implicit n: Real[T]) = {
import n.mkNumericOps
(a*a + b*b).sqrt
}
It is tricky, but possible, to use literal numbers in such generic functions.
def pythagoras[T](a: T, b: T)(sqrt: (T => T))(implicit n: Numeric[T]) = {
import n.mkNumericOps
implicit val fromInt = n.fromInt _
//1 * sqrt(a*a + b*b) Not Possible!
sqrt(a*a + b*b) * 1 // Possible
}
Type inference works better if the sqrt is passed in a second parameter list.
Parameters a and b would be passed as Objects, but #specialized could fix this. Unfortuantely there will still be some overhead in the math operations.
You can almost do without the import of mkNumericOps. I got frustratringly close!
There is a method in java.lang.Math:
public static double hypot (double x, double y)
for which the javadocs asserts:
Returns sqrt(x2 +y2) without intermediate overflow or underflow.
looking into src.zip, Math.hypot uses StrictMath, which is a native Method:
public static native double hypot(double x, double y);