Scala - Enforcing size of Vector at compile time - scala

Is it possible to enforce the size of a Vector passed in to a method at compile time? I want to model an n-dimensional Euclidean space using a collection of points in the space that looks something like this (this is what I have now):
case class EuclideanPoint(coordinates: Vector[Double]) {
def distanceTo(destination: EuclideanPoint): Double = ???
}
If I have a coordinate that is created via EuclideanPoint(Vector(1, 0, 0)), it is a 3D Euclidean point. Given that, I want to make sure the destination point passed in a call to distanceTo is of the same dimension.
I know I can do this by using Tuple1 to Tuple22, but I want to represent many different geometric spaces and I would be writing 22 classes for each space if I did it with Tuples - is there a better way?

It is possible to do this in a number of ways that all look more or less like what Randall Schulz has described in a comment. The Shapeless library provides a particularly convenient implementation, which lets you get something pretty close to what you want like this:
import shapeless._
case class EuclideanPoint[N <: Nat](
coordinates: Sized[IndexedSeq[Double], N] { type A = Double }
) {
def distanceTo(destination: EuclideanPoint[N]): Double =
math.sqrt(
(this.coordinates zip destination.coordinates).map {
case (a, b) => (a - b) * (a - b)
}.sum
)
}
Now you can write the following:
val orig2d = EuclideanPoint(Sized(0.0, 0.0))
val unit2d = EuclideanPoint(Sized(1.0, 1.0))
val orig3d = EuclideanPoint(Sized(0.0, 0.0, 0.0))
val unit3d = EuclideanPoint(Sized(1.0, 1.0, 1.0))
And:
scala> orig2d distanceTo unit2d
res0: Double = 1.4142135623730951
scala> orig3d distanceTo unit3d
res1: Double = 1.7320508075688772
But not:
scala> orig2d distanceTo unit3d
<console>:15: error: type mismatch;
found : EuclideanPoint[shapeless.Nat._3]
required: EuclideanPoint[shapeless.Nat._2]
orig2d distanceTo unit3d
^
Sized comes with a number of nice features, including a handful of collections operations that carry along static guarantees about length. We can write the following for example:
val somewhere = EuclideanPoint(Sized(0.0) ++ Sized(1.0, 0.0))
And have an ordinary old point in three-dimensional space.

You could do something yourself with a type-level encoding of the natural numbers, like the one described at http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/, and then parametrize your Vector by a natural. That would not require the extra dependency, but would probably be more complicated than using Shapeless.
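A minimal sketch of that dependency-free approach follows. All names here (Nat, Zero, Succ, Vec, VNil, VCons, Point) are hypothetical and chosen only for illustration; they are not taken from the linked article or from Shapeless.

```scala
// Type-level natural numbers: Zero, Succ[Zero], Succ[Succ[Zero]], ...
sealed trait Nat
sealed trait Zero extends Nat
sealed trait Succ[N <: Nat] extends Nat

// A list of Doubles whose length is tracked by the type parameter N.
sealed trait Vec[N <: Nat] {
  def toList: List[Double]
  // Prepending one element increases the type-level length by one.
  def ::(head: Double): Vec[Succ[N]] = VCons(head, this)
}
case object VNil extends Vec[Zero] {
  def toList: List[Double] = Nil
}
final case class VCons[N <: Nat](head: Double, tail: Vec[N]) extends Vec[Succ[N]] {
  def toList: List[Double] = head :: tail.toList
}

final case class Point[N <: Nat](coordinates: Vec[N]) {
  // The destination must have the same type-level dimension N.
  def distanceTo(destination: Point[N]): Double =
    math.sqrt(
      coordinates.toList.zip(destination.coordinates.toList)
        .map { case (a, b) => (a - b) * (a - b) }
        .sum
    )
}

val orig2d = Point(0.0 :: 0.0 :: VNil)
val unit2d = Point(1.0 :: 1.0 :: VNil)
val unit3d = Point(1.0 :: 1.0 :: 1.0 :: VNil)

val d = orig2d.distanceTo(unit2d)
// orig2d.distanceTo(unit3d) // rejected by the compiler: dimensions differ
```

Compared with Sized, this sketch offers none of the collection operations, which is roughly the extra complexity the answer above warns about.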

Related

Scala: Detect and extract something more specific from a collection of `Any` values

(Motivation: The Saddle library -- the only Scala library I have found that provides a Frame type, which is critical for data science -- leads me to this puzzle. See last section for details.)
The problem
Imagine a collection c of type C[Any]. Suppose that some of the elements of c are of a type T which is strictly more specific than Any.
I would like a way to find all the elements of type T, and to then create an object d of type C[T], rather than C[Any].
Some code demonstrating the problem
scala> val c = List(0,1,"2","3")
<console>:11: warning: a type was inferred to be `Any`;
this may indicate a programming error.
val c = List(0,1,"2","3")
^
c: List[Any] = List(0, 1, 2, 3)
scala> :t c(0)
Any // I wish this were more specific
// Scala can convert some elements to Int.
scala> val c0 = c(0) . asInstanceOf[Int]
c0: Int = 0
// But how would I detect which?
scala> val d = c.slice(0,2)
d: List[Any] = List(0, 1) // I wish this were List[Int]
Motivation: Why the Saddle library leads me to this problem
Saddle lets you manipulate "Frames" (tables). Frames can have columns of various types. Some systems (e.g. Pandas) assign a separate type to each column. Every Frame in Saddle, however, has exactly three type parameters: The type of row labels, the type of column labels, and the type of cells.
Real world data is typically a mix of strings and numbers. The way such tables are represented in Saddle is as a Frame with a cell type of Any. I'd like to downcast (upcast? polymorphism is hard) a column to something more specific than a Series of Any values. I'd also like to be able to test a column, to be sure that the cast is appropriate.
I posted an issue on Saddle's Github site about the puzzle.
You could do something like this
scala> val c = List(0,1,"2","3")
c: List[Any] = List(0, 1, 2, 3)
scala> c.collect { case x: Int => x; case s: String => s.toInt }
res0: List[Int] = List(0, 1, 2, 3)
If you just want the Int types you can simply drop the second case.
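The question also asks how to test whether a cast is appropriate for the whole collection. One possible sketch, using only the standard library, is to collect the elements of the wanted type and compare lengths. The helper name allInts is hypothetical.

```scala
// Returns Some(list of Ints) only if every element is an Int, otherwise None.
def allInts(xs: List[Any]): Option[List[Int]] = {
  val ints = xs.collect { case i: Int => i }
  if (ints.length == xs.length) Some(ints) else None
}

val mixed: List[Any] = List(0, 1, "2", "3")
val onlyInts: List[Any] = List(0, 1)

val checkedMixed = allInts(mixed)       // None: the strings are not Ints
val checkedInts  = allInts(onlyInts)    // Some(List(0, 1)), typed as Option[List[Int]]
```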

Achieving compile safe indexing with Shapeless

Problem: Let val v = List(0.5, 1.2, 0.3) model a realisation of some vector v = (v_1, v_2, v_3). The index j in v_j is implied by the element's position in the list. A lot of boilerplate and bugs have been due to tracking these indices, eg, when creating modified lists from the original. (How to make sure (in compile-time) that collection wasn't reordered? seems related.)
General question: What could be a good way to ensure correct indexing at compile time? (I assume that performance is not essential.)
My plan is to use subclasses of Nat in shapeless to model the indices as (explicit) types. The current solution is
import shapeless._
import Nat._
trait Elem[+A, +N<:Nat]{
val v: A
val ind: N}
case class DecElem[N<:Nat](v: BigDecimal, ind: N) extends Elem[BigDecimal, N]
object Decimals {
type One = DecElem[ _0]:: HNil
type Two = DecElem[ _0]:: DecElem[_1] :: HNil
//...
}
case class Scalar(v: Decimals.One)
case class VecTwo(v: Decimals.Two)
This, however, gets tedious in larger dimensions.
Another question is how to approach the generic case in trait Elem[+A, +N<:Nat]. As a start, I defined case class ElemVector[A, M<:Nat](vs: Sized[List[Elem[A, Nat]], M]), which loses the specific index type in Elem. What might be a strategy to circumvent this difficulty?
(Note: A better design may be to wrap List[A] by attaching explicit indices and deal with wrapper class. This, however, does not essentially change the question.)
UPDATE Here is a trivial illustration of accidental swapping of vector elements.
import shapeless.Nat._
import shapeless.{Sized, nat}
type VectorTwo = Sized[IndexedSeq[Double], nat._2]
val f = (p: VectorTwo, x: Double) => {
val V = p(_0)
val K = p(_1)
V * x/(K + x)
}
val V : Double = 500
val K : Double = 1
val correct: VectorTwo = Sized(V, K)
val wrong: VectorTwo = Sized(K, V) //Compiles!
f(correct, 10) // = 454.54
f(wrong, 10) // = 0.02
I'd like to enlist the compiler to prevent such errors in vectors with many elements, and wonder if there could be an elegant solution with Shapeless.
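For comparison, here is one possible direction that does not use Shapeless at all: give each component its own wrapper type, so that swapping two components becomes a type error. This is only a sketch under that assumption. The names Vmax, Km and Params are hypothetical, and the approach requires one wrapper per component, so it does not by itself solve the many-elements case.

```scala
// Hypothetical wrapper types, one per vector component.
final case class Vmax(value: Double)
final case class Km(value: Double)
final case class Params(vmax: Vmax, km: Km)

// Same formula as f above, but components are selected by name, not position.
val g = (p: Params, x: Double) => p.vmax.value * x / (p.km.value + x)

val correctParams = Params(Vmax(500), Km(1))
// val wrongParams = Params(Km(1), Vmax(500)) // does not compile: type mismatch

val result = g(correctParams, 10)
```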

How to define a function as generic across all numbers in scala?

I thought I needed to parameterise my function across all Ordering[_] types. But that doesn't work.
How can I make the following function work for all types that support the required mathematical operations, and how could I have found that out by myself?
/**
* Given a list of positive values and a candidate value, round the candidate value
* to the nearest value in the list of buckets.
*
* @param buckets
* @param candidate
* @return
*/
def bucketise(buckets: Seq[Int], candidate: Int): Int = {
// x <= y
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / 2f
if (candidate < midPoint) x else y
}
}
I tried command clicking on the mathematical operators (/, +) in intellij, but just got a notice Sc synthetic function.
If you want to use just the scala standard library, look at Numeric[T]. In your case, since you want to do a non-integer division, you would have to use the Fractional[T] subclass of Numeric.
Here is how the code would look using scala standard library typeclasses. Note that Fractional extends Ordering (through Numeric). This is convenient in this case, but it is also not mathematically generic. E.g. you can't define a Fractional[T] for Complex because complex numbers are not ordered.
def bucketiseScala[T: Fractional](buckets: Seq[T], candidate: T): T = {
// so we can use arithmetic operators such as + and /
import Fractional.Implicits._
// so we can use ordering operators such as <. We do have an Ordering[T]
// typeclass instance because Fractional extends Ordering (via Numeric)
import Ordering.Implicits._
// build the constant 2 by adding one to itself
// (implicitly[Fractional[T]].fromInt(2) would also work)
val two = (implicitly[Fractional[T]].one + implicitly[Fractional[T]].one)
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / two
if (candidate < midPoint) x else y
}
}
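As a quick check of the standard-library version, the sketch below is a variant that builds the constant with Numeric's fromInt and is then called with two different number types. The name bucketiseFromInt is hypothetical; the logic is otherwise the same fold as above.

```scala
def bucketiseFromInt[T: Fractional](buckets: Seq[T], candidate: T): T = {
  import Fractional.Implicits._ // +, /
  import Ordering.Implicits._   // <
  // fromInt is defined on Numeric, which Fractional extends
  val two = implicitly[Fractional[T]].fromInt(2)
  buckets.foldLeft(buckets.head) { (x, y) =>
    val midPoint = (x + y) / two
    if (candidate < midPoint) x else y
  }
}

// The midpoint between 5 and 10 is 7.5, so 6 rounds down and 8 rounds up.
val low  = bucketiseFromInt(Seq(1.0, 5.0, 10.0), 6.0)
val high = bucketiseFromInt(Seq(1.0, 5.0, 10.0), 8.0)
val big  = bucketiseFromInt(Seq(BigDecimal(1), BigDecimal(5), BigDecimal(10)), BigDecimal(6))
```

Like the original, this assumes a non-empty, ascending list of buckets, since it calls buckets.head.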
However, for serious generic numerical computations I would suggest taking a look at spire. It provides a much more elaborate hierarchy of numerical typeclasses. Spire typeclasses are also specialized and therefore often as fast as working directly with primitives.
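Here is how the example would look using spire:
// Field and Order are the typeclasses used in the signature below
import spire.algebra.{Field, Order}
// imports all operator syntax as well as standard typeclass instances
import spire.implicits._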
// we need to provide Order explicitly, since not all fields have an order.
// E.g. you can define a Field[Complex] even though complex numbers do not
// have an order.
def bucketiseSpire[T: Field: Order](buckets: Seq[T], candidate: T): T = {
// spire provides a way to get the typeclass instance using the type
// (standard practice in all libraries that use typeclasses extensively)
// the line below is equivalent to implicitly[Field[T]].fromInt(2)
// it also provides a simple way to convert from an integer
// operators are all enabled using the spire.implicits._ import
val two = Field[T].fromInt(2)
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / two
if (candidate < midPoint) x else y
}
}
Spire even provides automatic conversion from integers to T if there exists a Field[T], so you could even write the example like this (almost identical to the non-generic version). However, I think the example above is easier to understand.
// this is how it would look when using all advanced features of spire
def bucketiseSpireShort[T: Field: Order](buckets: Seq[T], candidate: T): T = {
buckets.foldLeft(buckets.head) { (x, y) =>
val midPoint = (x + y) / 2
if (candidate < midPoint) x else y
}
}
Update: spire is very powerful and generic, but can also be somewhat confusing to a beginner. Especially when things don't work. Here is an excellent blog post explaining the basic approach and some of the issues.

Functional programming, Scala map and fold left [closed]

Closed 10 years ago.
What are some good tutorials on fold left?
Original question, restored from deletion to provide context for other answers:
I am trying to implement a method for finding the bounding box of a rectangle, circle, location and group, which all extend Shape. Group is basically an array of Shapes.
abstract class Shape
case class Rectangle(width: Int, height: Int) extends Shape
case class Location(x: Int, y: Int, shape: Shape) extends Shape
case class Circle(radius: Int) extends Shape
case class Group(shape: Shape*) extends Shape
I got the bounding box computed for all three except the Group one. So now for the bounding box method I know I should be using map and fold left for Group, but I just can't find out the exact syntax of creating it.
object BoundingBox {
def boundingBox(s: Shape): Location = s match {
case Circle(c)=>
new Location(-c,-c,s)
case Rectangle(_, _) =>
new Location(0, 0, s)
case Location(x, y, shape) => {
val b = boundingBox(shape)
Location(x + b.x, y + b.y, b.shape)
}
case Group(shapes @ _*) => ( /: shapes) { } // I don't know how to proceed here.
}
}
Group bounding box is basically the smallest bounding box with all the shapes enclosed.
Now that you've edited to ask an almost completely different question, I'll give a different answer. Rather than point to a tutorial on maps and folds, I'll just give one.
In Scala, you first need to know how to create an anonymous function. It goes like so, from most general to more specific:
(var1: Type1, var2: Type2, ..., varN: TypeN) => /* output */
(var1, var2, ..., varN) => /* output, if types can be inferred */
var1 => /* output, if type can be inferred and N=1 */
Here are some examples:
(x: Double, y: Double, z: Double) => Math.sqrt(x*x + y*y + z*z)
val f:(Double,Double)=>Double = (x,y) => x*y + Math.exp(-x*y)
val neg:Double=>Double = x => -x
Now, the map method of lists and such will apply a function (anonymous or otherwise) to every element of the list. That is, if you have
List(a1,a2,...,aN)
f:A => B
then
List(a1,a2,...,aN) map (f)
produces
List( f(a1) , f(a2) , ..., f(aN) )
There are all sorts of reasons why this might be useful. Maybe you have a bunch of strings and you want to know how long each is, or you want to make them all upper case, or you want them backwards. If you have a function that does what you want to one element, map will do it to all elements:
scala> List("How","long","are","we?") map (s => s.length)
res0: List[Int] = List(3, 4, 3, 3)
scala> List("How","capitalized","are","we?") map (s => s.toUpperCase)
res1: List[java.lang.String] = List(HOW, CAPITALIZED, ARE, WE?)
scala> List("How","backwards","are","we?") map (s => s.reverse)
res2: List[scala.runtime.RichString] = List(woH, sdrawkcab, era, ?ew)
So, that's map in general, and in Scala.
But what if we want to collect our results? That's where fold comes in (foldLeft being the version that starts on the left and works right).
Suppose we have a function f:(B,A) => B, that is, it takes a B and an A, and combines them to produce a B. Well, we could start with a B, and then feed our list of A's into it one at a time, and at the end of it all, we'd have some B. That's exactly what fold does. foldLeft does it starting from the left end of the list; foldRight starts from the right. That is,
List(a1,a2,...,aN) foldLeft(b0)(f)
produces
f( f( ... f( f(b0,a1) , a2 ) ... ), aN )
where b0 is, of course, your initial value.
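To make the nesting order concrete, here is a small sketch that folds with a function which just records how it was called, so the result spells out the expansion above. The strings are only an illustration device.

```scala
val xs = List(1, 2, 3)

// foldLeft: the accumulator is the first argument and grows from the left.
val left = xs.foldLeft("b0")((acc, a) => s"f($acc,$a)")
// left is "f(f(f(b0,1),2),3)"

// foldRight: the accumulator is the second argument and grows from the right.
val right = xs.foldRight("b0")((a, acc) => s"f($a,$acc)")
// right is "f(1,f(2,f(3,b0)))"
```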
So, maybe we have a function that takes an int and a string, and returns the int or the length of the string, whichever is greater--if we folded our list using that, it would tell us the longest string (assuming that we start with 0). Or we could add the length to the int, accumulating values as we go.
Let's give it a try.
scala> List("How","long","is","longest?").foldLeft(0)((i,s) => i max s.length)
res3: Int = 8
scala> List("How","long","is","everyone?").foldLeft(0)((i,s) => i + s.length)
res4: Int = 18
Okay, fine, but what if we want to know who is the longest? One way (perhaps not the best, but it illustrates a useful pattern well) is to carry along both the length (an integer) and the leading contender (a string). Let's give that a go:
scala> List("Who","is","longest?").foldLeft((0,""))((i,s) =>
| if (i._1 < s.length) (s.length,s)
| else i
| )
res5: (Int, java.lang.String) = (8,longest?)
Here, i is now a tuple of type (Int,String), and i._1 is the first part of that tuple (an Int).
But in some cases like this, using a fold isn't really what we want. If we want the longer of two strings, the most natural function would be one like max:(String,String)=>String. How do we apply that one?
Well, in this case, there is a default "shortest" case, so we could fold the string-max function starting with "". But a better way is to use reduce. As with fold, there are two versions, one that works from the left, the other which works from the right. It takes no initial value, and requires a function f:(A,A)=>A. That is, it takes two things and returns one of the same type. Here's an example with a string-max function:
scala> List("Who","is","longest?").reduceLeft((s1,s2) =>
| if (s2.length > s1.length) s2
| else s1
| )
res6: java.lang.String = longest?
Now, there are just two more tricks. First, the following two mean the same thing:
list.foldLeft(b0)(f)
(b0 /: list)(f)
Notice how the second is shorter, and it sort of gives you the impression that you're taking b0 and doing something to the list with it (which you are). (:\ is the same as foldRight, but you use it like so: (list :\ b0)(f).)
Second, if you only refer to a variable once, you can use _ instead of the variable name and omit the x => part of the anonymous function declaration. Here are two examples:
scala> List("How","long","are","we?") map (_.length)
res7: List[Int] = List(3, 4, 3, 3)
scala> (0 /: List("How","long","are","we","all?"))(_ + _.length)
res8: Int = 16
At this point, you should be able to create functions and map, fold, and reduce them using Scala. Thus, if you know how your algorithm should work, it should be reasonably straightforward to implement it.
The basic algorithm would go like this:
shapes.tail.foldLeft(boundingBox(shapes.head)) {
case (box, shape) if box contains shape => box
case (box, shape) if shape contains box => shape
case (box, shape) => boxBounding(box, shape)
}
Now you have to write contains and boxBounding, which is a pure algorithms problem more than a language problem.
If the shapes all had the same center, implementing contains would be easier. It would go like this:
abstract class Shape { def contains(s: Shape): Boolean }
case class Rectangle(width: Int, height: Int) extends Shape {
def contains(s: Shape): Boolean = s match {
case Rectangle(w2, h2) => width >= w2 && height >= h2
case Location(x, y, inner) => ??? // not the same center
case Circle(radius) => width >= radius && height >= radius
case Group(shapes @ _*) => shapes.forall(this.contains(_))
}
}
case class Location(x: Int, y: Int, shape: Shape) extends Shape {
def contains(s: Shape): Boolean = ??? // not the same center
}
case class Circle(radius: Int) extends Shape {
def contains(s: Shape): Boolean = s match {
case Rectangle(width, height) => radius >= width && radius >= height
case Location(x, y, inner) => ??? // not the same center
case Circle(r2) => radius >= r2
case Group(shapes @ _*) => shapes.forall(this.contains(_))
}
}
case class Group(shapes: Shape*) extends Shape {
def contains(s: Shape): Boolean = shapes.exists(_ contains s)
}
As for boxBounding, which takes two shapes and combines them, the result will usually be a rectangle, but can be a circle under certain circumstances. Anyway, it is pretty straightforward once you have the algorithm figured out.
A bounding box is usually a rectangle. I don't think a circle located at (-r,-r) is the bounding box of a circle of radius r....
Anyway, suppose you have a bounding box b1 and another b2 and a function combineBoxes that computes the bounding box of b1 and b2.
Then if you have a non-empty set of shapes in your group, you can use reduceLeft to compute the whole bounding box of a list of bounding boxes by combining them two at a time until only one giant box remains. (The same idea can be used to reduce a list of numbers to a sum of numbers by adding them in pairs. And it's called reduceLeft because it works left to right across the list.)
Suppose that blist is a list of bounding boxes of each shape. (Hint: this is where map comes in.) Then
val bigBox = blist reduceLeft( (box1,box2) => combineBoxes(box1,box2) )
You'll need to catch the empty group case separately, however. (Since it has no well-defined bounding box, you don't want to use folds; folds are good for when there is a default empty case that makes sense. Or you have to fold with Option, but then your combining function has to understand how to combine None with Some(box), which is probably not worth it in this case--but very well might be if you were writing production code that needs to elegantly handle various sorts of empty list situations.)
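Putting the map and reduceLeft steps together, a minimal sketch might look like the following. It rests on assumptions that are not in the question: the Box type and its corner representation are hypothetical, a Rectangle is taken to span (0,0) to (width,height), a Circle is taken to be centered on the origin, and an empty Group is reported as None rather than handled with a special case.

```scala
sealed abstract class Shape
case class Rectangle(width: Int, height: Int) extends Shape
case class Location(x: Int, y: Int, shape: Shape) extends Shape
case class Circle(radius: Int) extends Shape
case class Group(shapes: Shape*) extends Shape

// Hypothetical helper type: an axis-aligned box given by two opposite corners.
case class Box(minX: Int, minY: Int, maxX: Int, maxY: Int)

// Smallest box enclosing both arguments.
def combineBoxes(b1: Box, b2: Box): Box =
  Box(b1.minX min b2.minX, b1.minY min b2.minY,
      b1.maxX max b2.maxX, b1.maxY max b2.maxY)

def boundingBox(s: Shape): Option[Box] = s match {
  case Rectangle(w, h) => Some(Box(0, 0, w, h))
  case Circle(r)       => Some(Box(-r, -r, r, r))
  case Location(x, y, inner) =>
    // Shift the inner shape's box by the location offset.
    boundingBox(inner).map(b => Box(b.minX + x, b.minY + y, b.maxX + x, b.maxY + y))
  case Group(shapes @ _*) =>
    // map: one box per member; empty sub-groups contribute nothing.
    val blist = shapes.map(boundingBox).collect { case Some(b) => b }
    // reduce: combine two at a time; None if there were no boxes at all.
    blist.reduceLeftOption((b1, b2) => combineBoxes(b1, b2))
}

val groupBox = boundingBox(Group(Circle(1), Location(2, 3, Rectangle(4, 5))))
val emptyBox = boundingBox(Group())
```

reduceLeftOption is used here instead of reduceLeft so that the empty group does not throw; it is one way to handle the caveat above, not the only one.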

How does one write the Pythagoras Theorem in Scala?

The square of the hypotenuse of a right triangle is equal to the sum of the squares on the other two sides.
This is Pythagoras's Theorem. A function to calculate the hypotenuse based on the lengths "a" and "b" of its sides would return sqrt(a * a + b * b).
The question is, how would you define such a function in Scala in such a way that it could be used with any type implementing the appropriate methods?
For context, imagine a whole library of math theorems you want to use with Int, Double, Int-Rational, Double-Rational, BigInt or BigInt-Rational types depending on what you are doing, and the speed, precision, accuracy and range requirements.
This only works on Scala 2.8, but it does work:
scala> def pythagoras[T](a: T, b: T, sqrt: T => T)(implicit n: Numeric[T]) = {
| import n.mkNumericOps
| sqrt(a*a + b*b)
| }
pythagoras: [T](a: T,b: T,sqrt: (T) => T)(implicit n: Numeric[T])T
scala> def intSqrt(n: Int) = Math.sqrt(n).toInt
intSqrt: (n: Int)Int
scala> pythagoras(3,4, intSqrt)
res0: Int = 5
More generally speaking, the trait Numeric is effectively a reference on how to solve this type of problem. See also Ordering.
The most obvious way:
type Num = {
def +(a: Num): Num
def *(a: Num): Num
}
def pyth[A <: Num](a: A, b: A)(sqrt: A=>A) = sqrt(a * a + b * b)
// usage
pyth(3, 4)(Math.sqrt)
This is horrible for many reasons. First, we have the problem of the recursive type, Num. This is only allowed if you compile this code with the -Xrecursive option set to some integer value (5 is probably more than sufficient for numbers). Second, the type Num is structural, which means that any usage of the members it defines will be compiled into corresponding reflective invocations. Putting it mildly, this version of pyth is obscenely inefficient, running on the order of several hundred thousand times slower than a conventional implementation. There's no way around the structural type though if you want to define pyth for any type which defines +, * and for which there exists a sqrt function.
Finally, we come to the most fundamental issue: it's over-complicated. Why bother implementing the function in this way? Practically speaking, the only types it will ever need to apply to are real Scala numbers. Thus, it's easiest just to do the following:
def pyth(a: Double, b: Double) = Math.sqrt(a * a + b * b)
All problems solved! This function is usable on values of type Double, Int, Float, even odd ones like Short thanks to the marvels of implicit conversion. While it is true that this function is technically less flexible than our structurally-typed version, it is vastly more efficient and eminently more readable. We may have lost the ability to calculate the Pythagorean theorem for unforeseen types defining + and *, but I don't think you're going to miss that ability.
Some thoughts on Daniel's answer:
I've experimented with generalizing Numeric to Real, which would be a more appropriate place to provide the sqrt function. This would result in:
def pythagoras[T](a: T, b: T)(implicit n: Real[T]) = {
import n.mkNumericOps
(a*a + b*b).sqrt
}
It is tricky, but possible, to use literal numbers in such generic functions.
def pythagoras[T](a: T, b: T)(sqrt: (T => T))(implicit n: Numeric[T]) = {
import n.mkNumericOps
implicit val fromInt = n.fromInt _
//1 * sqrt(a*a + b*b) Not Possible!
sqrt(a*a + b*b) * 1 // Possible
}
Type inference works better if the sqrt is passed in a second parameter list.
Parameters a and b would be passed as Objects, but @specialized could fix this. Unfortunately there will still be some overhead in the math operations.
You can almost do without the import of mkNumericOps. I got frustratingly close!
There is a method in java.lang.Math:
public static double hypot (double x, double y)
for which the javadocs asserts:
Returns sqrt(x² + y²) without intermediate overflow or underflow.
Looking into src.zip, Math.hypot delegates to StrictMath, where it is a native method:
public static native double hypot(double x, double y);
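The "without intermediate overflow" part of the javadoc can be seen from Scala with a small sketch: for very large inputs the naive formula overflows to Infinity when squaring, while hypot still returns a finite value. The input values are arbitrary and chosen only so that their squares exceed the Double range.

```scala
val a = 3e200
val b = 4e200

// a * a is about 9e400, which is beyond the Double range, so this is Infinity.
val naive = math.sqrt(a * a + b * b)

// hypot avoids the intermediate overflow and returns roughly 5e200.
val safe = math.hypot(a, b)
```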