Scala multivariate array sum

I am calculating confidence intervals of some ratios using bootstrapping for multiple variables, and each variable has several confidence intervals.
case class Limits(lowerLimit: Double, upperLimit: Double, confidenceInterval: Double)
case class CI(c85: Limits, c90: Limits, c95: Limits, c99: Limits)
For bootstrapping, I am running a loop 100 times.
val arrCIRatio: Array[CI] = new Array[CI](100)
var ci85: Limits = new Limits(0.0, 0.0, 0.0)
var ci90: Limits = new Limits(0.0, 0.0, 0.0)
var ci95: Limits = new Limits(0.0, 0.0, 0.0)
var ci99: Limits = new Limits(0.0, 0.0, 0.0)
val a = Array(1,2,3,4,5,6,7,8,9,10)
val rg = new scala.util.Random(100)
for (iteration <- 0 until 100) {
  val i = rg.nextInt(10)
  // getInterval(value, level) is defined elsewhere (not shown)
  ci85 = getInterval(a(i), 0.85)
  ci90 = getInterval(a(i), 0.90)
  ci95 = getInterval(a(i), 0.95)
  ci99 = getInterval(a(i), 0.99)
  arrCIRatio(iteration) = new CI(ci85, ci90, ci95, ci99)
}
After the loop finishes, I would like to take the average of the upper and lower limits inside the CI array for each of ci85, ci90, ci95 and ci99.
I can use foldLeft to calculate the sums (x being an array of Limits):
x.foldLeft(0.0)((acc, y) => acc + y.lowerLimit)
x.foldLeft(0.0)((acc, y) => acc + y.upperLimit)
or, in a naive way:
var avgci85: Limits = new Limits(0.0, 0.0, 0.0)
var avgci90: Limits = new Limits(0.0, 0.0, 0.0)
var avgci95: Limits = new Limits(0.0, 0.0, 0.0)
var avgci99: Limits = new Limits(0.0, 0.0, 0.0)
for (ci <- arrCIRatio) {
  ci85 = ci.c85
  ci90 = ci.c90
  ci95 = ci.c95
  ci99 = ci.c99
  avgci85 = new Limits(avgci85.lowerLimit + ci85.lowerLimit, avgci85.upperLimit + ci85.upperLimit, 0.85)
  avgci90 = new Limits(avgci90.lowerLimit + ci90.lowerLimit, avgci90.upperLimit + ci90.upperLimit, 0.90)
  avgci95 = new Limits(avgci95.lowerLimit + ci95.lowerLimit, avgci95.upperLimit + ci95.upperLimit, 0.95)
  avgci99 = new Limits(avgci99.lowerLimit + ci99.lowerLimit, avgci99.upperLimit + ci99.upperLimit, 0.99)
}
// avgciXX now hold sums; divide by arrCIRatio.length to get the averages
But I have to do the same process for at least 10 variables, and for all the CIs inside the array.
So in the end, for one variable it will be a three-dimensional array of 10000 x 4 x 3.
I don't know how to sum each field across that array, e.g. summing all the lower limits of ci85 in the array. It would be great if someone could help with it.
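A minimal sketch of summing one field across the array and turning the sums into an average, assuming the Limits and CI case classes from the question. FieldAverages and averageLimits are hypothetical names; the fold carries both bounds in a tuple so each interval is traversed only once.

```scala
case class Limits(lowerLimit: Double, upperLimit: Double, confidenceInterval: Double)
case class CI(c85: Limits, c90: Limits, c95: Limits, c99: Limits)

object FieldAverages {
  // Averages the bounds of one interval across all bootstrap samples.
  // `pick` selects the interval (e.g. _.c85); the fold sums both bounds in a single pass.
  def averageLimits(samples: Array[CI], pick: CI => Limits, level: Double): Limits = {
    require(samples.nonEmpty, "no bootstrap samples to average")
    val (lowerSum, upperSum) = samples.foldLeft((0.0, 0.0)) { case ((lo, hi), ci) =>
      val l = pick(ci)
      (lo + l.lowerLimit, hi + l.upperLimit)
    }
    Limits(lowerSum / samples.length, upperSum / samples.length, level)
  }
}
```

For example, FieldAverages.averageLimits(arrCIRatio, _.c85, 0.85) would give the averaged 85% interval; the same call with _.c90 and so on covers the other levels.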

Maybe just create a method/operator for your CI and Limits that lets you combine two instances? Something like this:
case class Limits(lowerLimit: Double, upperLimit: Double, confidenceInterval: Double) {
// operator |+| would allow us to combine two limits
// bounds are summed; max keeps the level when one side is Limits.Zero (level 0.0)
def |+|(l: Limits): Limits = Limits(l.lowerLimit + lowerLimit, l.upperLimit + upperLimit, math.max(l.confidenceInterval, confidenceInterval))
// I don't know what should be done if someone tries to combine two limits with different non-zero confidenceInterval values.
// Maybe it's a sign that every kind of Limits should be a separate case class extending a common trait
}
object Limits {
val Zero = new Limits(0,0,0) // zero element for convenience
}
case class CI(c85: Limits, c90: Limits, c95: Limits, c99: Limits) {
//same operator for CI, we use |+| from Limits to combine them
def |+|(c: CI): CI = CI(c.c85 |+| c85, c.c90 |+| c90, c.c95 |+| c95, c.c99 |+| c99)
}
object CI {
val Zero = new CI(Limits.Zero, Limits.Zero, Limits.Zero, Limits.Zero)
}
Then you'd be able to fold your CI easily:
arrCIRatio.fold(CI.Zero)(_ |+| _)
What we built is called a monoid. Instead of implementing |+| and Zero inside the case classes, you could implement them using type classes (as described in the article).
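A type-class version of the same idea might look roughly like this. It is a hand-rolled sketch (libraries such as Cats ship their own Monoid), the names Monoid, Averages and average are assumptions, and combining confidence levels with math.max is just one possible answer to the open question above: it works because 0.0, the level of the empty value, is an identity for max when levels are non-negative. Dividing the folded sums by the number of samples then gives the averages the question asks for.

```scala
case class Limits(lowerLimit: Double, upperLimit: Double, confidenceInterval: Double)
case class CI(c85: Limits, c90: Limits, c95: Limits, c99: Limits)

// Minimal hand-rolled Monoid type class.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  // Bounds are summed; max keeps the confidence level because 0.0 (the empty level) is its identity.
  implicit val limitsMonoid: Monoid[Limits] = new Monoid[Limits] {
    val empty: Limits = Limits(0.0, 0.0, 0.0)
    def combine(x: Limits, y: Limits): Limits =
      Limits(x.lowerLimit + y.lowerLimit, x.upperLimit + y.upperLimit,
        math.max(x.confidenceInterval, y.confidenceInterval))
  }

  // CI is combined field by field using the Limits instance.
  implicit def ciMonoid(implicit m: Monoid[Limits]): Monoid[CI] = new Monoid[CI] {
    val empty: CI = CI(m.empty, m.empty, m.empty, m.empty)
    def combine(x: CI, y: CI): CI =
      CI(m.combine(x.c85, y.c85), m.combine(x.c90, y.c90),
        m.combine(x.c95, y.c95), m.combine(x.c99, y.c99))
  }
}

object Averages {
  // Folds any sequence whose element type has a Monoid instance.
  def combineAll[A](as: Seq[A])(implicit m: Monoid[A]): A = as.foldLeft(m.empty)(m.combine)

  private def div(l: Limits, n: Double): Limits =
    l.copy(lowerLimit = l.lowerLimit / n, upperLimit = l.upperLimit / n)

  // Average of every bound in every interval across the bootstrap samples.
  def average(samples: Seq[CI]): CI = {
    val total = combineAll(samples)
    val n = samples.length.toDouble
    CI(div(total.c85, n), div(total.c90, n), div(total.c95, n), div(total.c99, n))
  }
}
```

With the question's array this would be called as Averages.average(arrCIRatio.toSeq).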

Related

Why is vertical traversal faster than horizontal traversal when it should be the opposite?

I am doing the Scala parallel programming course, which challenged us to implement a box blur in different ways.
One way is to chunk the image by rows and another is to chunk it by columns. The image is stored in row-major order:
type RGBA = Int
/** Image is a two-dimensional matrix of pixel values. */
class Img(val width: Int, val height: Int, private val data: Array[RGBA]) {
def this(w: Int, h: Int) = this(w, h, new Array(w * h))
def apply(x: Int, y: Int): RGBA = {
data(y * width + x)
}
def update(x: Int, y: Int, c: RGBA): Unit = data(y * width + x) = c
}
This is the basic blur kernel, which is the same in all implementations:
def boxBlurKernel(src: Img, x: Int, y: Int, radius: Int): RGBA = {
val pixels = for {
j <- (y - radius to y + radius)
i <- (x - radius to x + radius)
if (i >= 0 && i < src.width && j >= 0 && j < src.height)
} yield src(i,j)
val reds = pixels.map(red)
val greens = pixels.map(green)
val blues = pixels.map(blue)
val alphas = pixels.map(alpha)
val redComponent = reds.sum / pixels.size
val greenComponent = greens.sum / pixels.size
val blueComponent = blues.sum / pixels.size
val alphaComponent = alphas.sum / pixels.size
rgba(redComponent,greenComponent,blueComponent,alphaComponent)
}
Now the vertical blur implementation:
def blur(src: Img, dst: Img, from: Int, end: Int, radius: Int): Unit = {
val imageHeight = src.height
val xCoordinates: Seq[Int] = from until end
val yCoordinates: Seq[Int] = 0 until imageHeight
for {
xCoordinate <- xCoordinates
yCoordinate <- yCoordinates
} dst.update(xCoordinate, yCoordinate, boxBlurKernel(src, xCoordinate, yCoordinate, radius))
}
def parBlur(src: Img, dst: Img, numTasks: Int, radius: Int): Unit = {
val imageWidth = src.width
val boundaries = linspace(0, imageWidth, numTasks + 1).map(_.toInt).toScalaVector.sliding(2)
val tasks = boundaries.toList.map { case Seq(from, end) => task {
blur(src, dst, from, end, radius)
}
}
tasks.foreach(_.join())
}
And then the horizontal blur:
def blur(src: Img, dst: Img, from: Int, end: Int, radius: Int): Unit = {
val imageWidth = src.width
val xCoordinates = 0 until imageWidth
val yCoordinates = from until end
for {
yCoordinate <- yCoordinates
xCoordinate <- xCoordinates
} dst.update(xCoordinate, yCoordinate, boxBlurKernel(src, xCoordinate, yCoordinate, radius))
}
def parBlur(src: Img, dst: Img, numTasks: Int, radius: Int): Unit = {
val imageHeight = src.height
val boundaries = linspace(0, imageHeight, numTasks + 1).map(_.toInt).toScalaVector.sliding(2)
boundaries.toList.map {
case Seq(from: Int, end: Int) => task { blur(src, dst, from, end, radius) }
}.foreach(_.join())
}
Since the image is stored in row-major order, I expected the horizontal blur to use the processor cache more efficiently and therefore to be somewhat faster than the vertical blur.
However, I see the opposite result.
Vertical box blur time:
[info] Running (fork) scalashop.VerticalBoxBlurRunner
fork/join blur time: 2281.5884644 ms
Horizontal box blur time:
[info] Running (fork) scalashop.HorizontalBoxBlurRunner
fork/join blur time with number of tasks = 32: 2680.8516574 ms
I'm running these benchmarks with ScalaMeter on a 2.2 GHz Mac (macOS).
The task parallel primitive returns a ForkJoinTask.
Performance will depend on a range of factors, including the size of the image, the size of the kernel, the architecture of the processor, and the efficiency of the JIT compiler.
In this case boxBlurKernel is very inefficient because it uses at least 9 loops when the computation could be done in a single pass. It also clips the coordinates each time rather than clipping the bounds before entering the loop.
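The single-pass idea from the paragraph above (clip the window once, then accumulate all four channels in one loop) could look roughly like this. It is a sketch, not the course's reference solution: the channel helpers assume 8-bit channels packed as 0xRRGGBBAA and merely stand in for the scalashop red/green/blue/alpha/rgba functions, and Img is copied from the question so the block compiles on its own. Unlike the question's filter, the clipped window also includes row 0 and column 0.

```scala
object FastKernel {
  type RGBA = Int

  // Assumed packing 0xRRGGBBAA; stand-ins for the course's helpers.
  def red(c: RGBA): Int = (c >>> 24) & 0xff
  def green(c: RGBA): Int = (c >>> 16) & 0xff
  def blue(c: RGBA): Int = (c >>> 8) & 0xff
  def alpha(c: RGBA): Int = c & 0xff
  def rgba(r: Int, g: Int, b: Int, a: Int): RGBA = (r << 24) | (g << 16) | (b << 8) | a

  class Img(val width: Int, val height: Int, private val data: Array[RGBA]) {
    def this(w: Int, h: Int) = this(w, h, new Array(w * h))
    def apply(x: Int, y: Int): RGBA = data(y * width + x)
    def update(x: Int, y: Int, c: RGBA): Unit = data(y * width + x) = c
  }

  def boxBlurKernel(src: Img, x: Int, y: Int, radius: Int): RGBA = {
    // Clip the window once instead of testing every coordinate.
    val x0 = math.max(x - radius, 0)
    val x1 = math.min(x + radius, src.width - 1)
    val y0 = math.max(y - radius, 0)
    val y1 = math.min(y + radius, src.height - 1)
    var r, g, b, a, count = 0
    var j = y0
    while (j <= y1) {          // rows outer, columns inner: walks memory in row-major order
      var i = x0
      while (i <= x1) {
        val c = src(i, j)
        r += red(c); g += green(c); b += blue(c); a += alpha(c)
        count += 1
        i += 1
      }
      j += 1
    }
    rgba(r / count, g / count, b / count, a / count)
  }
}
```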
It seems unlikely that the difference is due to caching, because such a small portion of the code is reading new data into the cache. The number of loops makes it very hard for the JIT compiler to optimise the code, and the difference may simply be that it generates more efficient code in the second case.
Using multiple tasks complicates things, so it would be better to test with a single task to see which version is actually more efficient. After that, work on the overall efficiency of the algorithm before worrying about the relative efficiency of two different versions.
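To follow the single-task advice, a rough harness along these lines could run any kernel over the whole image on the current thread in both orders and check that they agree. SingleTaskComparison, fill and timeMs are hypothetical names. In a row-major array the rows-outer order writes consecutive indices, while the columns-outer order jumps by width on every step. The System.nanoTime timing is crude; ScalaMeter with warm-up runs is the better tool for real numbers.

```scala
object SingleTaskComparison {
  // Fills a row-major width*height array by calling kernel(x, y) for every pixel,
  // either row by row (y outer) or column by column (x outer).
  def fill(width: Int, height: Int, rowsOuter: Boolean)(kernel: (Int, Int) => Int): Array[Int] = {
    val out = new Array[Int](width * height)
    if (rowsOuter) {
      for (y <- 0 until height; x <- 0 until width) out(y * width + x) = kernel(x, y) // stride 1
    } else {
      for (x <- 0 until width; y <- 0 until height) out(y * width + x) = kernel(x, y) // stride width
    }
    out
  }

  // Crude wall-clock timing of a single evaluation, in milliseconds.
  def timeMs[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }
}
```

For example, timeMs(fill(src.width, src.height, rowsOuter = true)((x, y) => boxBlurKernel(src, x, y, radius))) can be compared with the same call using rowsOuter = false.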

How to program a circle fit in Scala

I want to fit a circle to given 2D points in Scala.
Apache Commons Math has an example for this in Java, which I am trying to translate to Scala (without success, because my knowledge of Java is almost nonexistent).
I took the example code from "http://commons.apache.org/proper/commons-math/userguide/leastsquares.html" (see the end of the page) and tried to translate it into Scala:
import org.apache.commons.math3.linear._
import org.apache.commons.math3.fitting._
import org.apache.commons.math3.fitting.leastsquares._
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer._
import org.apache.commons.math3._
import org.apache.commons.math3.geometry.euclidean.twod.Vector2D
import org.apache.commons.math3.util.Pair
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer.Optimum
def circleFitting: Unit = {
val radius: Double = 70.0
val observedPoints = Array(new Vector2D(30.0D, 68.0D), new Vector2D(50.0D, -6.0D), new Vector2D(110.0D, -20.0D), new Vector2D(35.0D, 15.0D), new Vector2D(45.0D, 97.0D))
// the model function components are the distances to current estimated center,
// they should be as close as possible to the specified radius
val distancesToCurrentCenter = new MultivariateJacobianFunction() {
//def value(point: RealVector): (RealVector, RealMatrix) = {
def value(point: RealVector): Pair[RealVector, RealMatrix] = {
val center = new Vector2D(point.getEntry(0), point.getEntry(1))
val value: RealVector = new ArrayRealVector(observedPoints.length)
val jacobian: RealMatrix = new Array2DRowRealMatrix(observedPoints.length, 2)
for (i <- 0 until observedPoints.length) {
val o = observedPoints(i)
val modelI: Double = Vector2D.distance(o, center)
value.setEntry(i, modelI)
// derivative with respect to p0 = x center
jacobian.setEntry(i, 0, (center.getX() - o.getX()) / modelI)
// derivative with respect to p1 = y center
jacobian.setEntry(i, 1, (center.getY() - o.getY()) / modelI)
}
new Pair(value, jacobian)
}
}
// the target is to have all points at the specified radius from the center
val prescribedDistances = Array.fill[Double](observedPoints.length)(radius)
// least squares problem to solve : modeled radius should be close to target radius
val problem:LeastSquaresProblem = new LeastSquaresBuilder().start(Array(100.0D, 50.0D)).model(distancesToCurrentCenter).target(prescribedDistances).maxEvaluations(1000).maxIterations(1000).build()
val optimum:Optimum = new LevenbergMarquardtOptimizer().optimize(problem) //LeastSquaresOptimizer.Optimum
val fittedCenter: Vector2D = new Vector2D(optimum.getPoint().getEntry(0), optimum.getPoint().getEntry(1))
println("circle fitting was called!")
println("CIRCLEFITTING: fitted center: " + fittedCenter.getX() + " " + fittedCenter.getY())
println("CIRCLEFITTING: RMS: " + optimum.getRMS())
println("CIRCLEFITTING: evaluations: " + optimum.getEvaluations())
println("CIRCLEFITTING: iterations: " + optimum.getIterations())
}
This gives no compile errors, but crashes with:
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.math3.linear.EigenDecomposition.<init>(EigenDecomposition.java:119)
at org.apache.commons.math3.fitting.leastsquares.LeastSquaresFactory.squareRoot(LeastSquaresFactory.java:245)
at org.apache.commons.math3.fitting.leastsquares.LeastSquaresFactory.weightMatrix(LeastSquaresFactory.java:155)
at org.apache.commons.math3.fitting.leastsquares.LeastSquaresFactory.create(LeastSquaresFactory.java:95)
at org.apache.commons.math3.fitting.leastsquares.LeastSquaresBuilder.build(LeastSquaresBuilder.java:59)
at twoDhotScan.FittingFunctions$.circleFitting(FittingFunctions.scala:49)
at twoDhotScan.Main$.delayedEndpoint$twoDhotScan$Main$1(hotScan.scala:14)
at twoDhotScan.Main$delayedInit$body.apply(hotScan.scala:11)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
at twoDhotScan.Main$.main(hotScan.scala:11)
at twoDhotScan.Main.main(hotScan.scala)
I guess the problem is somewhere in the definition of distancesToCurrentCenter. I don't even know whether this MultivariateJacobianFunction is supposed to be a real function, an object or whatever.
After a lot of fiddling with the code, I got it running.
The NullPointerException went away after I updated commons-math3 from version 3.3 to version 3.6.1 in my build.sbt file. I don't know whether I had forgotten a parameter or whether it was a bug. There were also two bugs in the example on the Apache Commons Math website: it used .getX twice where it should have used .getY.
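For reference, the dependency line in build.sbt would look roughly like this, using the standard Maven coordinates for Commons Math 3:

```scala
// build.sbt: pull in Apache Commons Math 3.6.1
libraryDependencies += "org.apache.commons" % "commons-math3" % "3.6.1"
```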
So here is a running example for a circle fit with known radius:
import org.apache.commons.math3.analysis.{ MultivariateVectorFunction, MultivariateMatrixFunction }
import org.apache.commons.math3.fitting.leastsquares.LeastSquaresOptimizer.Optimum
import org.apache.commons.math3.fitting.leastsquares.{ MultivariateJacobianFunction, LeastSquaresProblem, LeastSquaresBuilder, LevenbergMarquardtOptimizer }
import org.apache.commons.math3.geometry.euclidean.twod.Vector2D
import org.apache.commons.math3.linear.{ Array2DRowRealMatrix, RealMatrix, RealVector, ArrayRealVector }
object Main extends App {
val radius: Double = 20.0
val pointsList: List[(Double, Double)] = List(
(18.36921795, 10.71416674),
(0.21196357, -22.46528791),
(-4.153845171, -14.75588526),
(3.784114125, -25.55910336),
(31.32998899, 2.546924253),
(34.61542186, -12.90323269),
(19.30193011, -28.53185596),
(16.05620863, 10.97209111),
(31.67011956, -20.05020878),
(19.91175561, -28.38748712))
/*******************************************************************************
***** Random values on a circle with centerX=15, centerY=-9 and radius 20 *****
*******************************************************************************/
val observedPoints: Array[Vector2D] = (pointsList map { case (x, y) => new Vector2D(x, y) }).toArray
val vectorFunktion: MultivariateVectorFunction = new MultivariateVectorFunction {
def value(variables: Array[Double]): Array[Double] = {
val center = new Vector2D(variables(0), variables(1))
observedPoints map { p: Vector2D => Vector2D.distance(p, center) }
}
}
val matrixFunction = new MultivariateMatrixFunction {
def value(variables: Array[Double]): Array[Array[Double]] = {
val center = new Vector2D(variables(0), variables(1))
(observedPoints map { p: Vector2D => Array((center.getX - p.getX) / Vector2D.distance(p, center), (center.getY - p.getY) / Vector2D.distance(p, center)) })
}
}
// the target is to have all points at the specified radius from the center
val prescribedDistances = Array.fill[Double](observedPoints.length)(radius)
// least squares problem to solve : modeled radius should be close to target radius
val problem = new LeastSquaresBuilder().start(Array(100.0D, 50.0D)).model(vectorFunktion, matrixFunction).target(prescribedDistances).maxEvaluations(25).maxIterations(25).build
val optimum: Optimum = new LevenbergMarquardtOptimizer().optimize(problem)
val fittedCenter: Vector2D = new Vector2D(optimum.getPoint.getEntry(0), optimum.getPoint.getEntry(1))
println("Results of the LeastSquaresBuilder:")
println("CIRCLEFITTING: fitted center: " + fittedCenter.getX + " " + fittedCenter.getY)
println("CIRCLEFITTING: RMS: " + optimum.getRMS)
println("CIRCLEFITTING: evaluations: " + optimum.getEvaluations)
println("CIRCLEFITTING: iterations: " + optimum.getIterations + "\n")
}
Tested on Scala version 2.12.6, compiled with sbt version 1.2.8
Does anybody know how to do this without a fixed radius?
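One simple way to estimate the radius as well, sketched here under the assumption that an algebraic fit is good enough, is the Kåsa fit: write the circle as x² + y² + D·x + E·y + F = 0 and solve a 3x3 linear least-squares problem for D, E and F. This is a different and simpler method than the Hyperfit discussed in the next answer, and it is known to be biased on short arcs and noisy data. KasaFit is a hypothetical name, and the block uses plain Scala rather than Commons Math.

```scala
object KasaFit {
  // Determinant of a 3x3 matrix given as an array of rows.
  private def det3(m: Array[Array[Double]]): Double =
    m(0)(0) * (m(1)(1) * m(2)(2) - m(1)(2) * m(2)(1)) -
      m(0)(1) * (m(1)(0) * m(2)(2) - m(1)(2) * m(2)(0)) +
      m(0)(2) * (m(1)(0) * m(2)(1) - m(1)(1) * m(2)(0))

  // Solves m * v = rhs with Cramer's rule.
  private def solve3(m: Array[Array[Double]], rhs: Array[Double]): Array[Double] = {
    val d = det3(m)
    require(d != 0.0, "degenerate point set (e.g. collinear points)")
    Array.tabulate(3) { col =>
      // copy of m with column `col` replaced by the right-hand side
      val mc = Array.tabulate(3, 3)((r, c) => if (c == col) rhs(r) else m(r)(c))
      det3(mc) / d
    }
  }

  /** Returns (centerX, centerY, radius). */
  def fit(points: Seq[(Double, Double)]): (Double, Double, Double) = {
    require(points.size >= 3, "need at least three points")
    val n = points.size.toDouble
    var sx, sy, sxx, syy, sxy, sxz, syz, sz = 0.0
    for ((x, y) <- points) {
      val z = x * x + y * y
      sx += x; sy += y
      sxx += x * x; syy += y * y; sxy += x * y
      sxz += x * z; syz += y * z; sz += z
    }
    // Normal equations for minimizing sum((z + D*x + E*y + F)^2) over D, E, F.
    val m = Array(
      Array(sxx, sxy, sx),
      Array(sxy, syy, sy),
      Array(sx, sy, n))
    val sol = solve3(m, Array(-sxz, -syz, -sz))
    val cx = -sol(0) / 2
    val cy = -sol(1) / 2
    (cx, cy, math.sqrt(cx * cx + cy * cy - sol(2)))
  }
}
```

With the pointsList from the answer above, KasaFit.fit(pointsList) should land close to the center (15, -9) and radius 20 mentioned in its comment, since those points lie on that circle.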
After some research on circle fitting, I found a wonderful algorithm in the paper "Error analysis for circle fitting algorithms" by A. Al-Sharadqah and N. Chernov (available here: http://people.cas.uab.edu/~mosya/cl/ )
I implemented it in scala:
import org.apache.commons.math3.linear.{ Array2DRowRealMatrix, RealMatrix, RealVector, LUDecomposition, EigenDecomposition }
object circleFitFunction {
def circleFit(dataXY: List[(Double, Double)]) = {
def square(x: Double): Double = x * x
def multiply(pair: (Double, Double)): Double = pair._1 * pair._2
val n: Int = dataXY.length
val (xi, yi) = dataXY.unzip
//val S: Double = math.sqrt(((xi map square) ++ yi map square).sum / n)
val zi: List[Double] = dataXY map { case (x, y) => x * x + y * y }
val x: Double = xi.sum / n
val y: Double = yi.sum / n
val z: Double = ((xi map square) ++ (yi map square)).sum / n
val zz: Double = (zi map square).sum / n
val xx: Double = (xi map square).sum / n
val yy: Double = (yi map square).sum / n
val xy: Double = ((xi zip yi) map multiply).sum / n
val zx: Double = ((zi zip xi) map multiply).sum / n
val zy: Double = ((zi zip yi) map multiply).sum / n
val N: RealMatrix = new Array2DRowRealMatrix(Array(
Array(8 * z, 4 * x, 4 * y, 2),
Array(4 * x, 1, 0, 0),
Array(4 * y, 0, 1, 0),
Array(2.0D, 0, 0, 0)))
val M: RealMatrix = new Array2DRowRealMatrix(Array(
Array(zz, zx, zy, z),
Array(zx, xx, xy, x),
Array(zy, xy, yy, y),
Array(z, x, y, 1.0D)))
val Ninverse = new LUDecomposition(N).getSolver().getInverse()
val eigenValueProblem = new EigenDecomposition(Ninverse.multiply(M))
// Get all eigenvalues
// As we need only the smallest positive eigenvalue, all negative eigenvalues are replaced by Double.MaxValue
val eigenvalues: Array[Double] = eigenValueProblem.getRealEigenvalues() map (lambda => if (lambda < 0) Double.MaxValue else lambda)
// Now get the index of the smallest positive eigenvalue, to get the associated eigenvector
val i: Int = eigenvalues.zipWithIndex.min._2
val eigenvector: RealVector = eigenValueProblem.getEigenvector(i)
val A = eigenvector.getEntry(0)
val B = eigenvector.getEntry(1)
val C = eigenvector.getEntry(2)
val D = eigenvector.getEntry(3)
val centerX: Double = -B / (2 * A)
val centerY: Double = -C / (2 * A)
val Radius: Double = math.sqrt((B * B + C * C - 4 * A * D) / (4 * A * A))
val RMS: Double = math.sqrt((dataXY map { case (x, y) => (Radius - math.sqrt((x - centerX) * (x - centerX) + (y - centerY) * (y - centerY))) } map square).sum / n)
(centerX, centerY, Radius, RMS)
}
}
I kept all the names from the paper (see Chapters 4 and 8 and look for the Hyperfit algorithm), and I tried to limit the matrix operations.
It's still not what I need, because this sort of algorithm (algebraic fit) has known issues with fitting partial circles (arcs) and possibly very large circles.
With my data, it once spat out completely wrong results, and I found out that I had an eigenvalue of -0.1...
The eigenvector for this value produced the right result, but it was discarded because the eigenvalue was negative. So this one is not always stable (like so many other circle fitting algorithms).
But what a nice algorithm!
It looks a bit like dark magic to me.
If you need a lot of speed but not too much precision (and your data covers a full circle that is not too big), this would be my choice.
The next thing I will try is to implement a Levenberg-Marquardt algorithm from the same page I mentioned above.
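A geometric fit with a free radius can be sketched in plain Scala with a small Levenberg-Marquardt loop: the parameters are (a, b, R), each residual is the distance from a point to (a, b) minus R, and each step solves (JᵀJ + λ·diag(JᵀJ))·δ = -Jᵀr. This is a minimal sketch, not the algorithm from the page mentioned above: the starting point (centroid and mean distance), the λ schedule and the stopping thresholds are arbitrary choices, and GeometricCircleFit is a hypothetical name.

```scala
object GeometricCircleFit {
  // Solves the 3x3 system a * x = b by Gaussian elimination with partial pivoting.
  private def solve3(a0: Array[Array[Double]], b0: Array[Double]): Array[Double] = {
    val a = a0.map(_.clone)
    val b = b0.clone
    for (col <- 0 until 3) {
      val piv = (col until 3).maxBy(r => math.abs(a(r)(col)))
      val rowTmp = a(col); a(col) = a(piv); a(piv) = rowTmp
      val bTmp = b(col); b(col) = b(piv); b(piv) = bTmp
      for (r <- col + 1 until 3) {
        val f = a(r)(col) / a(col)(col)
        for (c <- col until 3) a(r)(c) -= f * a(col)(c)
        b(r) -= f * b(col)
      }
    }
    val x = new Array[Double](3)
    for (r <- 2 to 0 by -1)
      x(r) = (b(r) - (r + 1 until 3).map(c => a(r)(c) * x(c)).sum) / a(r)(r)
    x
  }

  // Sum of squared geometric residuals: distance to the center minus the radius.
  private def cost(pts: Seq[(Double, Double)], p: Array[Double]): Double =
    pts.map { case (x, y) =>
      val e = math.hypot(x - p(0), y - p(1)) - p(2)
      e * e
    }.sum

  /** Returns (centerX, centerY, radius). Assumes no point coincides with an intermediate center. */
  def fit(pts: Seq[(Double, Double)], maxIter: Int = 200): (Double, Double, Double) = {
    val cx0 = pts.map(_._1).sum / pts.size
    val cy0 = pts.map(_._2).sum / pts.size
    var p = Array(cx0, cy0, pts.map { case (x, y) => math.hypot(x - cx0, y - cy0) }.sum / pts.size)
    var lambda = 1e-3
    var iter = 0
    var done = false
    while (!done && iter < maxIter) {
      iter += 1
      // Accumulate J^T J and J^T r for residuals r_i = d_i - R.
      val jtj = Array.ofDim[Double](3, 3)
      val jtr = new Array[Double](3)
      for ((x, y) <- pts) {
        val d = math.hypot(x - p(0), y - p(1))
        val res = d - p(2)
        val j = Array((p(0) - x) / d, (p(1) - y) / d, -1.0) // partial derivatives w.r.t. a, b, R
        for (i <- 0 until 3) {
          jtr(i) += j(i) * res
          for (k <- 0 until 3) jtj(i)(k) += j(i) * j(k)
        }
      }
      // Marquardt damping: scale the diagonal of J^T J by (1 + lambda).
      val damped = Array.tabulate(3, 3)((i, k) => if (i == k) jtj(i)(k) * (1 + lambda) else jtj(i)(k))
      val step = solve3(damped, jtr.map(v => -v))
      val candidate = Array.tabulate(3)(i => p(i) + step(i))
      if (cost(pts, candidate) < cost(pts, p)) {
        p = candidate
        lambda /= 10 // good step: behave more like Gauss-Newton
        if (step.map(v => math.abs(v)).max < 1e-12) done = true
      } else {
        lambda *= 10 // rejected step: shrink towards gradient descent
        if (lambda > 1e12) done = true
      }
    }
    (p(0), p(1), p(2))
  }
}
```

With Commons Math, the same residuals could instead be handed to LeastSquaresBuilder with a three-entry start vector and a target of zeros.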

Type mismatch in Scala when using reduce

Can anybody help me understand what's wrong with the code below?
case class Point(x: Double, y: Double)
def centroid(points: IndexedSeq[Point]): Point = {
val x = points.reduce(_.x + _.x)
val y = points.reduce(_.y + _.y)
val len = points.length
Point(x/len, y/len)
}
I get the error when I run it:
Error:(10, 30) type mismatch;
found : Double
required: A$A145.this.Point
val x = points.reduce(_.x + _.x)
^
reduce, in this case, takes a function of type (Point, Point) => Point and returns a Point. The lambda _.x + _.x returns a Double instead, hence the type mismatch.
One way to calculate the centroid:
case class Point(x: Double, y: Double)
def centroid(points: IndexedSeq[Point]): Point = {
val x = points.map(_.x).sum
val y = points.map(_.y).sum
val len = points.length
Point(x/len, y/len)
}
If you want to use reduce, you need to reduce both x and y in a single pass, like this:
def centroid(points: IndexedSeq[Point]): Point = {
val p = points.reduce( (s, p) => Point(s.x + p.x, s.y + p.y) )
val len = points.length
Point(p.x/len, p.y/len)
}
If you want to compute x and y independently, use foldLeft rather than reduce, like this:
def centroid(points: IndexedSeq[Point]): Point = {
val x = points.foldLeft(0.0)(_ + _.x)
val y = points.foldLeft(0.0)(_ + _.y)
val len = points.length
Point(x/len, y/len)
}
This is perhaps clearer, but it processes the points twice, so it may be marginally less efficient.
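If processing the points twice is a concern, a single foldLeft with a tuple accumulator computes both sums in one pass. A minimal sketch, with Centroid as a hypothetical wrapper object; unlike reduce, foldLeft does not throw on an empty sequence, so the sketch rejects empty input explicitly instead of silently returning NaN coordinates.

```scala
case class Point(x: Double, y: Double)

object Centroid {
  // One pass: accumulate both sums in a tuple, then divide.
  def centroid(points: IndexedSeq[Point]): Point = {
    require(points.nonEmpty, "centroid of an empty sequence is undefined")
    val (sx, sy) = points.foldLeft((0.0, 0.0)) { case ((ax, ay), p) => (ax + p.x, ay + p.y) }
    Point(sx / points.length, sy / points.length)
  }
}
```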

Evaluating intermediate function values only once

I need a function that translates and scales a complex number, i.e. if z is complex, the function should return (z - translate) * scale.
The function is to be parametrized by the dimensions of the screen and the scaling factor. Here is what I have:
def affTransform(width: Int, height: Int, scaleFactor: Double)(z: Complex): Complex = {
val scale: Double = 4.0 / width
val translate = Complex(width / 2, height / 2)
(z - translate) * scale
}
With this in place, the following works as expected:
val transform: Complex => Complex = affTransform(W, H, 4)
...
val zz: Complex = transform(z)
The problem is that the calculation:
val scale: Double = 4.0 / width
val translate = Complex(width / 2, height / 2)
is performed every time transform is applied, which is conceptually redundant. This is also the case when affTransform is not curried and transform is defined as a partially applied function.
Is there a way to define affTransform and/or transform so that scale and translate are calculated only once?
Simply remove the last argument list and return a function instead:
case class Complex(re: Double, im: Double) {
def - (that: Complex) = Complex(this.re - that.re, this.im - that.im)
def * (scalar: Double) = Complex(re * scalar, im * scalar)
}
def aff(width: Int, height: Int, scaleFactor: Double): Complex => Complex = {
println("Heavy calculation here...")
val scale: Double = scaleFactor / width
val translate = Complex(width / 2, height / 2)
(z: Complex) => (z - translate) * scale
}
val transform: Complex => Complex = aff(640, 480, 4)
transform(Complex(12, 34))
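A small check that the setup in aff runs once per aff call rather than once per application. setupRuns is a hypothetical counter added only for this demonstration, and the sketch assumes scaleFactor is meant to replace the hard-coded 4.0 (the two agree for aff(640, 480, 4)).

```scala
object AffineDemo {
  case class Complex(re: Double, im: Double) {
    def -(that: Complex): Complex = Complex(re - that.re, im - that.im)
    def *(scalar: Double): Complex = Complex(re * scalar, im * scalar)
  }

  // Hypothetical counter, only here to show how often the setup code runs.
  var setupRuns = 0

  def aff(width: Int, height: Int, scaleFactor: Double): Complex => Complex = {
    setupRuns += 1                          // executed once per call to aff
    val scale: Double = scaleFactor / width
    val translate = Complex(width / 2, height / 2)
    (z: Complex) => (z - translate) * scale // only this closure runs per application
  }
}
```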

Scala: Using Sets for (non-primitive) coordinate values

I'm using integer coordinates for hex grids as follows:
object Cood
{
val up = Cood(0, 2)
val upRight = Cood(1, 1)
val downRight = Cood(1, -1)
val down = Cood(0, - 2)
val downLeft = Cood(-1, -1)
val upLeft = Cood(- 1, 1)
val dirns: List[Cood] = List[Cood](up, upRight, downRight, down, downLeft, upLeft)
}
case class Cood(x: Int, y: Int)
{
def +(operand: Cood): Cood = Cood(x + operand.x, y + operand.y)
def -(operand: Cood): Cood = Cood(x - operand.x, y - operand.y)
def *(operand: Int): Cood = Cood(x * operand, y * operand)
}
Hexes and Sides both have coordinate values. Every Hex has 6 sides, but some sides are shared by 2 Hexes, e.g. Hex(2, 2) and its upper neighbour Hex(2, 6) share Side(2, 4). So I want to apply set operations something like this:
val hexCoods: Set[Cood] = ... some code
val sideCoods: Set[Cood] = hexCoods.flatMap(i => Cood.dirns.map(_ + i).toSet)
But if I do this, Cood will be treated as a reference type and the duplicate coordinates won't be stripped out. Is there any way round this?
Did you try it?
scala> Set.empty + Cood(1,1) + Cood(1,2) + Cood(1,1)
res0: scala.collection.immutable.Set[Cood] = Set(Cood(1,1), Cood(1,2))
As @sschaef pointed out in the comments, case classes have automatically generated equals and hashCode methods, which implement structural equality rather than just comparing identity. This means that you shouldn't get duplicates in your set, and sure enough the set in my test didn't have a duplicate entry.
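A quick check of the hex example from the question, using a trimmed copy of Cood: the two hexes produce 12 side coordinates between them, but the shared Side(2, 4) is stored only once, so the set ends up with 11 elements. SharedSides is a hypothetical name.

```scala
case class Cood(x: Int, y: Int) {
  def +(operand: Cood): Cood = Cood(x + operand.x, y + operand.y)
}

object Cood {
  // Same six side directions as in the question.
  val dirns: List[Cood] = List(Cood(0, 2), Cood(1, 1), Cood(1, -1), Cood(0, -2), Cood(-1, -1), Cood(-1, 1))
}

object SharedSides {
  // Side coordinates of every hex; the Set drops duplicates because case classes
  // compare by value through their generated equals/hashCode.
  def sideCoods(hexCoods: Set[Cood]): Set[Cood] = hexCoods.flatMap(h => Cood.dirns.map(_ + h))
}
```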