How to fill a ListBuffer[List[Any]] (Scala)?

I have a ListBuffer declared like this:
var distances_buffer: ListBuffer[List[Any]] = ListBuffer.empty[List[Any]]
and I am trying to fill it with data like this:
for (current <- 0 to training_list_length - 1) {
  // A and B are examples
  distances_buffer(current) ++= List[Any](A, B)
}
However I get the following error:
java.lang.IndexOutOfBoundsException: 0
What am I missing?
EDIT! More Info:
I have a list (Named: training_list) of points and their class. (x, y, class) :
training_list : List[((Double, Double, String))]
I also have an extra point with an x and a y value given.
My goal is to calculate the Euclidean distance of the extra point from each point inside the training list, and create a result which looks like this:
//example
List(((x: Double, y: Double, class: String), distance: Double))
List(((4.3,3.0,Iris-setosa), 1.2529964086141665), ((4.4,3.0,Iris-setosa), 1.341640786499874), ...)
As you can see, in the list I want to include the coordinates of the point (from the training_list), the class of the point and also the distance.
for (current <- 0 to training_list_length - 1) {
  val dist = eDistance(test_x, test_y, training_x(current), training_y(current))
  distances_buffer += ListBuffer[Any](training_result(current), dist)
}
After the creation of this list I want to sort it based on the distances. Also stuck here!

It seems, from the way you name things, that you come from a Python background.
I would advise you to study a little bit of Scala first, since the two languages are very different.
Not only in style (like camel case vs snake case), but in more fundamental things like:
A strong and static type system. Thus, things like Any are usually a code smell and, in 99.99% of cases, completely unnecessary.
A mix of OOP & FP. You do not necessarily have to become an FP expert, but there are some things that are idiomatic even on the OOP side of Scala, like immutability and common operations (higher-order functions) like map, flatMap, filter & reduce.
Our List is very different from the Python one; accessing an element by index is O(n) (in Python it would be O(1)). A Python list is more like an Array that can be resized.
Also, we do not really have a for loop. We have something called a for comprehension, which is nothing more than syntactic sugar for calls to map, flatMap, filter and, in some cases, foreach (see the sketch below).
The list could go on, but I think my point is clear.
Scala is not only new syntax; it is a different way to program.
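To make the for comprehension point concrete, here is a small sketch of what the compiler does with one (the names are illustrative):
// This for comprehension...
val products = for {
  x <- List(1, 2, 3)
  y <- List(10, 20)
} yield x * y

// ...is rewritten by the compiler into (roughly):
val productsDesugared =
  List(1, 2, 3).flatMap(x => List(10, 20).map(y => x * y))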
Anyways, here is an idiomatic way to solve your problem.
(This is not necessarily the best way; there are many variations that can be done.)
// First, let's create some custom types / classes to represent your data.
final case class Point(x: Double, y: Double)
final case class ClassifiedPoint(point: Point, clazz: String)

// Let's define the Euclidean distance function.
def euclideanDistance(p1: Point, p2: Point): Double =
  math.sqrt(
    math.pow(p1.x - p2.x, 2) +
    math.pow(p1.y - p2.y, 2)
  )
// This is what you need.
// Note that I made it somewhat more generic than it has to be.
// For example, instead of using the Euclidean distance function directly in the body,
// we receive the distance function to use.
// Also, I use a technique called currying to split the arguments into different lists,
// which allows the caller to partially apply them.
def computeDistance(distanceFun: (Point, Point) => Double)
                   (trainingList: List[ClassifiedPoint])
                   (referencePoint: Point): List[(ClassifiedPoint, Double)] =
  trainingList.map { classifiedPoint =>
    val distance = distanceFun(classifiedPoint.point, referencePoint)
    classifiedPoint -> distance
  }
Which you can use like this:
val trainingList = List(
  ClassifiedPoint(Point(x = 4.3d, y = 3.0d), clazz = "Iris-setosa"),
  ClassifiedPoint(Point(x = 4.4d, y = 3.0d), clazz = "Iris-setosa")
)

// Partial application to create a new function.
val computeEuclideanDistance = computeDistance(euclideanDistance) _
computeEuclideanDistance(trainingList)(Point(x = 3.0d, y = 0.0d))
// res: List[(ClassifiedPoint, Double)] =
// List(
// (ClassifiedPoint(Point(4.3, 3.0), "Iris-setosa"), 3.269556544854363),
// (ClassifiedPoint(Point(4.4, 3.0), "Iris-setosa"), 3.3105890714493698)
// )
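Since the result is a plain List of pairs, the sorting part of the question is then a one-liner; a sketch (nearestFirst is just an illustrative name):
// Sort ascending by distance, the second element of each pair:
val nearestFirst =
  computeEuclideanDistance(trainingList)(Point(x = 3.0d, y = 0.0d)).sortBy(_._2)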

As suggested by Luis, Any and var are to be avoided if possible, so here is an example that might nudge you to consider a different approach
case class Point(x: Double, y: Double, `class`: String)
def distance(a: Point, b: Point): Double =
math.hypot(a.x - b.x, a.y - b.y)
val targetPoint = Point(1,2,"Extra-point")
val training_list : List[((Double, Double, String))] = List((4.3,3.0,"Iris-setosa"), (4.4,3.0,"Iris-setosa"))
val points = training_list.map(Point.tupled)
val unsortedPoints: List[(Point, Double)] = points.map(point => (point, distance(point, targetPoint)))
unsortedPoints.sortBy(_._2)
which outputs
res0: List[(Point, Double)] = List((Point(4.3,3.0,Iris-setosa),3.4481879299133333), (Point(4.4,3.0,Iris-setosa),3.5440090293338704))
I have copied the distance calculation from Xavier.

Related

What is the mechanism by which functions with multiple parameter lists can (sometimes) be used with less than the required number of parameters?

Let me introduce this question by way of an example. This was taken from Lecture 2.3 of Martin Odersky's Functional Programming course.
I have a function to find fixed points iteratively like so
import math.abs

object fixed_points {
  println("Welcome to Fixed Points")

  val tolerance = 0.0001

  def isCloseEnough(x: Double, y: Double) =
    abs((x - y) / x) < tolerance

  def fixedPoint(f: Double => Double)(firstGuess: Double) = {
    def iterate(guess: Double): Double = {
      println(guess)
      val next = f(guess)
      if (isCloseEnough(guess, next)) next
      else iterate(next)
    }
    iterate(firstGuess)
  }
I can adapt this function to finding square roots like so
def sqrt(x: Double) =
fixedPoint(y => x/y)(1.0)
However, this does not converge for certain arguments (like 4 for example). So I apply an average damping to it, essentially converting it to Newton-Raphson like so
def sqrt(x: Double) =
fixedPoint(y => (x/y+y)/2)(1.0)
which converges.
Now average damping is general enough to warrant its own function, so I refactor my code like so
def averageDamp(f: Double => Double)(x: Double) = (x+f(x))/2
and
def sqrtDamp(x: Double) =
fixedPoint(averageDamp(y=>x/y))(1.0) (*)
Whoa! What just happened?? I'm using averageDamp with only one parameter (when it was defined with two) and the compiler does not complain!
Now, I understand that I can use partial application like so
def a = averageDamp(x=>2*x)_
a(3) // returns 4.5
No problems there. But when I attempt to use averageDamp with less than the requisite number of parameters (as was done in sqrtDamp) like so
def a = averageDamp(x=>2*x) (**)
I get an error missing arguments for method averageDamp.
Questions:
How is what I have done in (**) different from (*), such that the compiler complains about the former but not the latter?
So it looks like using fewer than the requisite number of parameters is allowed under certain circumstances. What are these circumstances, and what is the name given to this mechanism? (I realize this would come under the topic of 'currying', but I'm after the specific name of this subset of currying, as it were.)
This answer expands on the comment posted by @som-snytt.
The difference between (**) and (*) is that in the latter, fixedPoint provides an expected type for the argument, whereas in the former, the definition of a does not. Essentially, whenever your code provides an explicit type declaration, the compiler is happy to overlook the omission of the trailing underscore. This is a deliberate design decision; see Martin Odersky's explanation.
To illustrate this point, here is a small example.
object A {
  def add(a: Int)(b: Int): Int = a + b
  val x: Int => Int = add(5) // compiles fine
  val y = add(5) // produces the following compiler error
}
/* missing arguments for method add in object A;
   follow this method with `_' if you want to treat it as a partially applied function
       val y = add(5)
               ^
*/

Random as instance of scalaz.Monad

This is a follow-up to my previous question. I wrote a monad (for an exercise) that is actually a function generating random values. However it is not defined as an instance of type class scalaz.Monad.
Now I looked at Rng library and noticed that it defined Rng as scalaz.Monad:
implicit val RngMonad: Monad[Rng] =
  new Monad[Rng] {
    def bind[A, B](a: Rng[A])(f: A => Rng[B]) = a flatMap f
    def point[A](a: => A) = insert(a)
  }
So I wonder how exactly users benefit from that. How can we use the fact that Rng is an instance of the type class scalaz.Monad? Can you give any examples?
Here's a simple example. Suppose I want to pick a random size for a range, and then pick a random index inside that range, and then return both the range and the index. The second computation of a random value clearly depends on the first—I need to know the size of the range in order to pick a value in the range.
This kind of thing is specifically what monadic binding is for—it allows you to write the following:
val rangeAndIndex: Rng[(Range, Int)] = for {
  max <- Rng.positiveint
  index <- Rng.chooseint(0, max)
} yield (0 to max, index)
This wouldn't be possible if we didn't have a Monad instance for Rng.
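For reference, the for comprehension above is nothing but sugar for the Monad operations; a sketch of the desugared form:
// The same computation with explicit flatMap / map calls,
// which is exactly what the Monad instance makes available:
val rangeAndIndexDesugared: Rng[(Range, Int)] =
  Rng.positiveint.flatMap(max =>
    Rng.chooseint(0, max).map(index => (0 to max, index)))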
One of the benefits is that you get a lot of useful methods defined in MonadOps.
For example, Rng.double.iterateUntil(_ < 0.1) will produce only the values that are less than 0.1 (while the values greater than 0.1 will be skipped).
iterateUntil can be used for generation of distribution samples using a rejection method.
E.g. this is the code that creates a beta distribution sample generator:
import com.nicta.rng.Rng
import java.lang.Math
import scalaz.syntax.monad._

object Main extends App {
  def beta(alpha: Double, beta: Double): Rng[Double] = {
    // Purely functional port of Numpy's beta generator: https://github.com/numpy/numpy/blob/31b94e85a99db998bd6156d2b800386973fef3e1/numpy/random/mtrand/distributions.c#L187
    if (alpha <= 1.0 && beta <= 1.0) {
      val rng: Rng[Double] = Rng.double
      val xy: Rng[(Double, Double)] = for {
        u <- rng
        v <- rng
      } yield (Math.pow(u, 1 / alpha), Math.pow(v, 1 / beta))
      xy.iterateUntil { case (x, y) => x + y <= 1.0 }.map { case (x, y) => x / (x + y) }
    } else ???
  }

  val rng: Rng[List[Double]] = beta(0.5, 0.5).fill(10)
  println(rng.run.unsafePerformIO) // Prints 10 samples of the beta distribution
}
Like any interface, declaring an instance of Monad[Rng] does two things: it provides an implementation of the Monad methods under standard names, and it expresses an implicit contract that those method implementations conform to certain laws (in this case, the monad laws).
@Travis gave an example of one thing that's implemented with these interfaces: the Scalaz implementations of map and flatMap. You're right that you could implement these directly; they're "inherited" in Monad (actually it's a little more complex than that).
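To make the "inherited" part concrete, here is a simplified sketch of how map is derived once bind and point are supplied (the trait name is illustrative; the real Scalaz hierarchy is more elaborate):
trait SimpleMonad[F[_]] {
  def point[A](a: => A): F[A]
  def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
  // map comes for free, built from the two abstract methods:
  def map[A, B](fa: F[A])(f: A => B): F[B] = bind(fa)(a => point(f(a)))
}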
For an example of a method that you definitely have to implement some Scalaz interface for, how about sequence? This is a method that turns a List (or more generally a Traversable) of contexts into a single context for a List, e.g.:
val randomlyGeneratedNumbers: List[Rng[Int]] = ...
randomlyGeneratedNumbers.sequence: Rng[List[Int]]
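A usage sketch, assuming the Rng library's Monad instance and Scalaz's Traverse syntax are both in scope (the dice example is mine, not from the library's docs):
import com.nicta.rng.Rng
import scalaz.std.list._
import scalaz.syntax.traverse._

// Three independent die rolls combined into a single random computation:
val dice: List[Rng[Int]] = List.fill(3)(Rng.chooseint(1, 6))
val rolls: Rng[List[Int]] = dice.sequence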
But this actually only uses Applicative[Rng] (which is a superclass), not the full power of Monad. I can't actually think of anything that uses Monad directly (there are a few methods on MonadOps, e.g. untilM, but I've never used any of them in anger), but you might want a Bind for a "wrapper" case where you have an "inner" Monad "inside" your Rng things, in which case MonadTrans is useful:
val a: Rng[Reader[Config, Int]] = ...
def f: Int => Rng[Reader[Config, Float]] = ...
//would be a pain to manually implement something to combine a and f
val b: ReaderT[Rng, Config, Int] = ...
val g: Int => ReaderT[Rng, Config, Float] = ...
b >>= g
To be totally honest though, Applicative is probably good enough for most Monad use cases, at least the simpler ones.
Of course all of these methods are things you could implement yourself, but like any library the whole point of Scalaz is that they're already implemented, and under standard names, making it easier for other people to understand your code.

Migrate from MurmurHash to MurmurHash3

In Scala 2.10, MurmurHash is deprecated for some reason, saying I should use MurmurHash3 now. But the API is different, and there are no useful scaladocs for MurmurHash3 -> fail.
For instance, current code:
trait Foo {
  type Bar
  def id: Int
  def path: Bar

  override def hashCode = {
    import util.MurmurHash._
    var h = startHash(2)
    val c = startMagicA
    val k = startMagicB
    h = extendHash(h, id, c, k)
    h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
    finalizeHash(h)
  }
}
How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whatever MurmurHash3 seems to be offering me.
The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you should fold over all your values, after which you call finalizeHash (its Int argument, conventionally a collection's length, also helps distinguish collections of different lengths). If you replace your last mix with mixLast, it's a little faster and removes redundancy with finalizeHash. If it is awkward to work out which mix is the last one, just use mix.
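As a sketch of that fold-then-finalize pattern (hashInts and its seed parameter are illustrative names, not part of the library):
import scala.util.hashing.MurmurHash3._

// Fold mix over every value, then finalize with the element count:
def hashInts(xs: Array[Int], seed: Int): Int = {
  var h = seed
  var i = 0
  while (i < xs.length) { h = mix(h, xs(i)); i += 1 }
  finalizeHash(h, xs.length)
}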
Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.
So minimally you'd have
override def hashCode = finalizeHash(mixLast(id, path.##), 0)
and "typically" you'd
// Pick any string or number that suits you; put it in a companion object.
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")

// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast(mix(fooSeed, id), path.##), 2)
Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.
Looking at the source code of MurmurHash3 suggests something like this:
override def hashCode = {
  import util.hashing.MurmurHash3._
  val h = symmetricSeed // I'm not sure which seed to use here
  val h1 = mix(h, id)
  val h2 = mixLast(h1, path.##)
  finalizeHash(h2, 2)
}
or, in (almost) one line:
import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path.##), 2)

When imperative style fits better?

From Programming in Scala (second edition), bottom of p. 98:
A balanced attitude for Scala programmers
Prefer vals, immutable objects, and methods without side effects.
Reach for them first. Use vars, mutable objects, and methods with side effects when you have a specific need and justification for them.
It is explained on previous pages why to prefer vals, immutable objects, and methods without side effects so this sentence makes perfect sense.
But the second sentence, "Use vars, mutable objects, and methods with side effects when you have a specific need and justification for them", is not explained so well.
So my question is:
What is a justification or specific need to use vars, mutable objects and methods with side effects?
P.S.: It would be great if someone could provide some examples for each of those (besides an explanation).
In many cases functional programming increases the level of abstraction and hence makes your code more concise and easier/faster to write and understand. But there are situations where the resulting bytecode cannot be as optimized (fast) as for an imperative solution.
Currently (Scala 2.9.1) one good example is summing up ranges:
(1 to 1000000).foldLeft(0)(_ + _)
Versus:
var x = 1
var sum = 0
while (x <= 1000000) {
  sum += x
  x += 1
}
If you profile these you will notice a significant difference in execution speed. So sometimes performance is a really good justification.
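A crude way to check this yourself; this is only a sketch, since a single timed run on a cold JVM is not a rigorous benchmark (repeat the measurement after warm-up):
def time[A](label: String)(body: => A): A = {
  val t0 = System.nanoTime
  val result = body
  println(label + " took " + (System.nanoTime - t0) / 1e6 + " ms")
  result
}

time("foldLeft") { (1 to 1000000).foldLeft(0)(_ + _) }
time("while") {
  var x = 1; var sum = 0
  while (x <= 1000000) { sum += x; x += 1 }
  sum
}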
Ease of Minor Updates
One reason to use mutability is if you're keeping track of some ongoing process. For example, let's suppose I am editing a large document and have a complex set of classes to keep track of the various elements of the text, the editing history, the cursor position, and so on. Now suppose the user clicks on a different part of the text. Do I recreate the document object, copying many fields but not the EditState field, and recreate the EditState with a new ViewBounds and documentCursorPosition? Or do I alter a mutable variable in one spot? As long as thread safety is not an issue, it is much simpler and less error-prone to just update a variable or two than to copy everything. If thread safety is an issue, then protecting from concurrent access may be more work than using the immutable approach and dealing with out-of-date requests.
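A hypothetical sketch of the two styles (all names here are illustrative, not from any real editor):
case class ViewBounds(top: Int, height: Int)
case class EditState(cursor: Int, view: ViewBounds)
case class Document(text: String, edit: EditState)

// Immutable: rebuild the nested structure on every click...
def clickImmutable(doc: Document, pos: Int): Document =
  doc.copy(edit = doc.edit.copy(cursor = pos))

// ...versus mutable: change one field in place.
class MutableDocument(var text: String, var cursor: Int, var view: ViewBounds)
def clickMutable(doc: MutableDocument, pos: Int): Unit = doc.cursor = pos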
Computational efficiency
Another reason to use mutability is for speed. Object creation is cheap, but simple method calls are cheaper, and operations on primitive types are cheaper yet.
Let's suppose, for example, that we have a map and we want to sum the values and the squares of the values.
val xs = List.range(1,10000).map(x => x.toString -> x).toMap
val sum = xs.values.sum
val sumsq = xs.values.map(x => x*x).sum
If you do this every once in a while, it's no big deal. But if you pay attention to what's going on, for every list element you first recreate it (values), then sum it (boxed), then recreate it again (values), then recreate it yet again in squared form with boxing (map), then sum it. This is at least six object creations and five full traversals just to do two adds and one multiply per item. Incredibly inefficient.
You might try to do better by avoiding the multiple traversals and passing through the map only once, using a fold:
val (sum,sumsq) = ((0,0) /: xs){ case ((sum,sumsq),(_,v)) => (sum + v, sumsq + v*v) }
And this is much better, with about 15x better performance on my machine. But you still have three object creations every iteration. If instead you
case class SSq(var sum: Int = 0, var sumsq: Int = 0) {
  def +=(i: Int) { sum += i; sumsq += i * i }
}
val ssq = SSq()
xs.foreach(x => ssq += x._2)
you're about twice as fast again because you cut the boxing down. If you have your data in an array and use a while loop, then you can avoid all object creation and boxing and speed up by another factor of 20.
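For completeness, a sketch of that array-plus-while version of the same sum / sum-of-squares task:
def sumWhile(xs: Array[Int]): (Int, Int) = {
  var i = 0
  var sum = 0
  var sumsq = 0
  while (i < xs.length) {
    sum += xs(i)
    sumsq += xs(i) * xs(i)
    i += 1
  }
  (sum, sumsq)
}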
Now, that said, you could also have chosen a recursive function for your array:
val ar = Array.range(0, 10000)
def suma(xs: Array[Int], start: Int = 0, sum: Int = 0, sumsq: Int = 0): (Int, Int) = {
  if (start >= xs.length) (sum, sumsq)
  else suma(xs, start + 1, sum + xs(start), sumsq + xs(start) * xs(start))
}
and written this way it's just as fast as the mutable SSq. But if we instead do this:
def sumb(xs: Array[Int], start: Int = 0, ssq: (Int, Int) = (0, 0)): (Int, Int) = {
  if (start >= xs.length) ssq
  else sumb(xs, start + 1, (ssq._1 + xs(start), ssq._2 + xs(start) * xs(start)))
}
we're now 10x slower again because we have to create an object on each step.
So the bottom line is that immutability only gets in the way when you cannot conveniently carry your updated values along as independent arguments to a method. Once you go beyond the complexity where that works, mutability can be a big win.
Cumulative Object Creation
If you need to build up a complex object with n fields from potentially faulty data, you can use a builder pattern that looks like so:
abstract class Built {
  def x: Int
  def y: String
  def z: Boolean
}
private class Building extends Built {
  var x: Int = _
  var y: String = _
  var z: Boolean = _
}
def buildFromWhatever: Option[Built] = {
  val b = new Building
  b.x = something
  if (thereIsAProblem) return None
  b.y = somethingElse
  // check
  ...
  Some(b)
}
This only works with mutable data. There are other options, of course:
case class Built(x: Int = 0, y: String = "", z: Boolean = false)
def buildFromWhatever: Option[Built] = {
  val b0 = Built()
  val b1 = b0.copy(x = something)
  if (thereIsAProblem) return None
  ...
  Some(b)
}
which in many ways is even cleaner, except you have to copy your object once for each change that you make, which can be painfully slow. And neither of these is particularly bulletproof; for that you'd probably want
class Built(val x: Int, val y: String, val z: Boolean) {}
case class Building(
  x: Option[Int] = None, y: Option[String] = None, z: Option[Boolean] = None
) {
  def build: Option[Built] = for (x0 <- x; y0 <- y; z0 <- z) yield new Built(x0, y0, z0)
}
def buildFromWhatever: Option[Built] = {
  val b0 = Building()
  val b1 = b0.copy(x = somethingIfNotProblem)
  ...
  bN.build
}
but again, there's lots of overhead.
I've found that the imperative / mutable style is a better fit for dynamic programming algorithms. If you insist on immutability, it's harder for most people to program, and you end up using vast amounts of memory and / or overflowing the stack. One example: Dynamic programming in the functional paradigm
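As an illustration, here is the textbook imperative formulation of a small DP problem, counting the ways to make change with an in-place array (my example, not from the linked question):
def countChange(amount: Int, coins: List[Int]): Long = {
  val ways = Array.ofDim[Long](amount + 1)
  ways(0) = 1L
  for (coin <- coins; a <- coin to amount)
    ways(a) += ways(a - coin) // in-place update of the DP table
  ways(amount)
}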
Some examples:
(Originally a comment) Any program has to do some input and output (otherwise, it's useless). But by definition, input/output is a side effect and can't be done without calling methods with side effects.
One major advantage of Scala is ability to use Java libraries. Many of them rely on mutable objects and methods with side-effects.
Sometimes you need a var due to scoping. See Temperature4 in this blog post for an example.
Concurrent programming. If you use actors, sending and receiving messages are a side effect; if you use threads, synchronizing on locks is a side effect and locks are mutable; event-driven concurrency is all about side effects; futures, concurrent collections, etc. are mutable.

Clojure's 'let' equivalent in Scala

Often I face the following situation: suppose I have these three functions
def firstFn: Int = ...
def secondFn(b: Int): Long = ...
def thirdFn(x: Int, y: Long, z: Long): Long = ...
and I also have a calculate function. My first approach can look like this:
def calculate(a: Long) = thirdFn(firstFn, secondFn(firstFn), secondFn(firstFn) + a)
It looks beautiful and without any curly brackets - just one expression. But it's not optimal, so I end up with this code:
def calculate(a: Long) = {
  val first = firstFn
  val second = secondFn(first)
  thirdFn(first, second, second + a)
}
Now it's several expressions surrounded with curly brackets. At such moments I envy Clojure a little bit. With its let form I can define this function in one expression.
So my goal here is to define the calculate function in one expression. I came up with 2 solutions.
1 - With scalaz I can define it like this (are there better ways to do this with scalaz?):
def calculate(a: Long) =
firstFn |> {first => secondFn(first) |> {second => thirdFn(first, second, second + a)}}
What I don't like about this solution is that it's nested. The more vals I have the deeper this nesting is.
2 - With for comprehension I can achieve something similar:
def calculate(a: Long) =
for (first <- Option(firstFn); second <- Option(secondFn(first))) yield thirdFn(first, second, second + a)
On one hand this solution has a flat structure, just like let in Clojure, but on the other hand I need to wrap the functions' results in Option and receive an Option as the result from calculate (which is good if I'm dealing with nulls, but I don't... and don't want to).
Are there better ways to achieve my goal? What is the idiomatic way of dealing with such situations (maybe I should stay with vals... but the let way of doing it looks so elegant)?
On the other hand, this is connected to referential transparency. All three functions are referentially transparent (in my example firstFn calculates some constant like Pi), so theoretically they can be replaced with the calculation results. I know this, but the compiler does not, so it can't optimize my first attempt. And here is my second question:
Can I somehow (maybe with an annotation) give a hint to the compiler that my function is referentially transparent, so that it can optimize this function for me (put some kind of caching there, for example)?
Edit
Thanks everybody for the great answers! It's just impossible to select one best answer (maybe because they are all so good), so I will accept the answer with the most up-votes; I think that's fair enough.
In the non-recursive case, let is a restructuring of lambda.
def firstFn: Int = 42
def secondFn(b: Int): Long = 42
def thirdFn(x: Int, y: Long, z: Long): Long = x + y + z

def let[A, B](x: A)(f: A => B): B = f(x)

def calculate(a: Long) =
  let(firstFn) { first =>
    let(secondFn(first)) { second =>
      thirdFn(first, second, second + a)
    }
  }
Of course, that's still nested. Can't avoid that. But you said you like the monadic form. So here's the identity monad
case class Identity[A](x: A) {
  def map[B](f: A => B) = Identity(f(x))
  def flatMap[B](f: A => Identity[B]) = f(x)
}
And here's your monadic calculate. Unwrap the result by calling .x
def calculateMonad(a: Long) = for {
  first <- Identity(firstFn)
  second <- Identity(secondFn(first))
} yield thirdFn(first, second, second + a)
But at this point it sure looks like the original val version.
The Identity monad exists in Scalaz with more sophistication
http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/Identity.scala.html
Stick with the original form:
def calculate(a: Long) = {
  val first = firstFn
  val second = secondFn(first)
  thirdFn(first, second, second + a)
}
It's concise and clear, even to Java developers. It's roughly equivalent to let, just without limiting the scope of the names.
Here's an option you may have overlooked.
def calculate(a: Long)(i: Int = firstFn)(j: Long = secondFn(i)) = thirdFn(i,j,j+a)
If you actually want to create a method, this is the way I'd do it.
Alternatively, you could create a method (one might name it let) that avoids nesting:
class Usable[A](a: A) {
  def use[B](f: A => B) = f(a)
  def reuse[B, C](f: A => B)(g: (A, B) => C) = g(a, f(a))
  // Could add more
}
implicit def use_anything[A](a: A) = new Usable(a)
def calculate(a: Long) =
firstFn.reuse(secondFn)((first, second) => thirdFn(first,second,second+a))
But now you might need to name the same things multiple times.
If you feel the first form is cleaner/more elegant/more readable, then why not just stick with it?
First, read this recent commit message to the Scala compiler from none other than Martin Odersky and take it to heart...
Perhaps the real issue here is instantly jumping the gun on claiming it's sub-optimal. The JVM is pretty hot at optimising this sort of thing. At times, it's just plain amazing!
Assuming you have a genuine performance issue in an application that's in genuine need of a speed up, you should start with a profiler report proving that this is a significant bottleneck, on a suitably configured and warmed up JVM.
Then, and only then, should you look at ways to make it faster that may end up sacrificing code clarity.
Why not use pattern matching here:
def calculate(a: Long) = firstFn match { case f => secondFn(f) match { case s => thirdFn(f,s,s + a) } }
How about using currying to record the function return values? (Parameters from preceding parameter groups are available in succeeding groups.)
A bit odd looking, but fairly concise and with no repeated invocations:
def calculate(a: Long)(f: Int = firstFn)(s: Long = secondFn(f)) = thirdFn(f, s, s + a)
println(calculate(1L)()())