Scala Generic Type slow

I need to create a comparison method that works for Int, String, or Char. Using AnyVal doesn't make this possible, as it has no methods for < and > comparison.
However, typing the parameter as Ordered shows significant slowness. Is there a better way to achieve this? The plan is to do a generic binary sort, and I found that generic typing decreases the performance.
def sample1[T <% Ordered[T]](x:T) = { x < (x) }
def sample2(x:Ordered[Int]) = { x < 1 }
def sample3(x:Int) = { x < 1 }
val start1 = System.nanoTime
sample1(5)
println(System.nanoTime - start1)
val start2 = System.nanoTime
sample2(5)
println(System.nanoTime - start2)
val start3 = System.nanoTime
sample3(5)
println(System.nanoTime - start3)
val start4 = System.nanoTime
sample3(5)
println(System.nanoTime - start4)
val start5 = System.nanoTime
sample2(5)
println(System.nanoTime - start5)
val start6 = System.nanoTime
sample1(5)
println(System.nanoTime - start6)
The results show:
Sample1:696122
Sample2:45123
Sample3:13947
Sample3:5332
Sample2:194438
Sample1:497992
Am I handling generics incorrectly? Or should I fall back to the old Java approach of using a Comparator in this case, as in:
import java.util.Comparator

object C extends Comparator[Int] {
  override def compare(a: Int, b: Int): Int = a - b
}

def sample4[T](a: T, b: T, x: Comparator[T]) = x.compare(a, b)

The Scala equivalent of Java Comparator is Ordering. One of the main differences is that, if you don't provide one manually, a suitable Ordering can be injected implicitly by the compiler. By default, this will be done for Byte, Int, Float and other primitives, for any subclass of Ordered or Comparable, and for some other obvious cases.
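For instance, you can summon the compiler-provided instance explicitly (a quick illustration using only the standard library):
val intOrd = implicitly[Ordering[Int]]
intOrd.compare(1, 2) // negative, since 1 < 2
Ordering[String].lt("a", "b") // true; Ordering.apply summons the implicit instance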
Also, Ordering provides all the main comparison operators as extension methods, so you can write the following:
import Ordering.Implicits._
def sample5[T : Ordering](a: T, b: T) = a < b
def run() = sample5(1, 2)
As of Scala 2.12, those extension operations (i.e., a < b) involve wrapping in a temporary Ordering#Ops object, so the code will be slower than with a Comparator. Not by much in most real cases, but still significant if you care about micro-optimisations.
But you can use an alternative syntax to define an implicit Ordering[T] parameter and invoke methods on the Ordering object directly.
Actually even the generated bytecode for the following two methods will be identical (except for the type of the third argument, and potentially the implementation of the respective compare methods):
def withOrdering[T](x: T, y: T)(implicit cmp: Ordering[T]) = {
  cmp.compare(x, y) // also supports other methods, like `cmp.lt(x, y)`
}

def withComparator[T](x: T, y: T, cmp: Comparator[T]) = {
  cmp.compare(x, y)
}
In practice the runtime on my machine is the same, when invoking these methods with Int arguments.
So, if you want to compare types generically in Scala, you should usually use Ordering.
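A quick sanity check of both (note that scala.math.Ordering actually extends java.util.Comparator, so the same instance can feed either method):
withOrdering(1, 2) // Ordering[Int] injected implicitly; negative, since 1 < 2
withComparator(1, 2, Ordering.Int) // an Ordering is also a Comparator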

Don't run micro-benchmarks like this if you want results similar to what you'll get in a production environment.
First of all, you need to warm up the JVM, and after that run your test as an average over many iterations. You also need to prevent possible JVM optimisations due to constant data. E.g.
def sample1[T <% Ordered[T]](x:T) = { x < (x) }
def sample2(x:Ordered[Int]) = { x < 1 }
def sample3(x:Int) = { x < 1 }
import scala.util.Random

val r = new Random()

def measure(f: => Unit): Long = {
  val start1 = System.nanoTime
  f
  System.nanoTime - start1
}
val n = 1000000
(1 to n).map(_ => measure {val k = r.nextInt();sample1(k)})
(1 to n).map(_ => measure {val k = r.nextInt();sample2(k)})
(1 to n).map(_ => measure {val k = r.nextInt();sample3(k)})
val avg1 = (1 to n).map(_ => measure {val k = r.nextInt();sample1(k)}).sum / n
println(avg1)
val avg2 = (1 to n).map(_ => measure {val k = r.nextInt();sample2(k)}).sum / n
println(avg2)
val avg3 = (1 to n).map(_ => measure {val k = r.nextInt();sample3(k)}).sum / n
println(avg3)
I got results which look fairer to me:
134
92
83
This book could shed some light on performance testing.

Related

Intellij worksheet and classes defined in it

I'm following along with the Coursera course on functional programming in Scala and came across a weird behavior of the worksheet REPL.
In the course a worksheet with the following code should give the following results on the right:
object rationals {
  val x = new Rational(1, 2)   > x : Rational = Rational#<hash_code>
  x.numer                      > res0: Int = 1
  x.denom                      > res1: Int = 2
}
class Rational(x: Int, y: Int) {
  def numer = x
  def denom = y
}
What I get is
object rationals {              > defined module rationals
  val x = new Rational(1, 2)
  x.numer
  x.denom
}
class Rational(x: Int, y: Int) { > defined class Rational
  def numer = x
  def denom = y
}
Only after moving the class into the object did I get the same result as in the course.
Is this caused by Intellij, or have there been changes in Scala?
Are there other ways around this?
The IntelliJ IDEA Scala worksheet handles values inside objects differently than the Eclipse/Scala IDE worksheet does.
Values inside objects are not evaluated in a linear, line-by-line fashion; instead the object is treated as a normal Scala object, and you see hardly any information about its members until you use them explicitly.
To actually see your vals and expressions, simply define or evaluate them outside any object/class.
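For example, moving the same members to the top level of the worksheet makes the evaluations visible again (a sketch, reusing the Rational class from the question):
val x = new Rational(1, 2) //> x : Rational = Rational@...
x.numer                    //> res0: Int = 1
x.denom                    //> res1: Int = 2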
This behaviour can be a saviour in some cases. Suppose you have these definitions.
val primes = 2L #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
val isPrime: Long => Boolean =
  n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
Note that isPrime could be a simple def, but we chose to define it as a val for some reason.
Such code works fine in any normal Scala code, but it will fail in the worksheet, because the two val definitions reference each other.
But if you wrap those lines inside some object, like
object Primes {
  val primes = 2L #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
  val isPrime: Long => Boolean =
    n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
}
it will be evaluated with no problem.
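Forcing the wrapped stream from the worksheet then works as expected:
Primes.primes.take(5).toList //> List(2, 3, 5, 7, 11)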

Scala lazy val explanation

I am taking the Functional Programming in Scala course on Coursera and I am having a hard time understanding this code snippet -
def sqrtStream(x: Double): Stream[Double] = {
  def improve(guess: Double): Double = (guess + x / guess) / 2
  lazy val guesses: Stream[Double] = 1 #:: (guesses map improve)
  guesses
}
This method finds 10 approximations of the square root of 4, in increasing order of accuracy, when I do sqrtStream(4).take(10).toList.
Can someone explain the evaluation strategy of guesses here? My doubt is: what value of guesses is substituted when the second element of guesses is computed?
Let's start with a simplified example:
scala> lazy val a: Int = a + 5
a: Int = <lazy>
scala> a
// stack overflow here, because of infinite recursion
So a keeps recalculating until it reaches some stable value, like here:
scala> def f(f:() => Any) = 0 //takes function with captured a - returns constant 0
f: (f: () => Any)Int
scala> lazy val a: Int = f(() => a) + 5
a: Int = <lazy>
scala> a
res4: Int = 5 // 0 + 5
You may replace def f(f: () => Any) = 0 with def f(f: => Any) = 0, so the definition will look as if a is really passed to f: lazy val a: Int = f(a) + 5.
Streams use the same mechanism: guesses map improve is passed as a by-name parameter (and the lambda referencing the lazy guesses is saved inside the Stream, but not evaluated until the tail is requested), so it's like lazy val guesses = #::(1, () => guesses map improve). When you call guesses.head, the tail is not evaluated; guesses.tail lazily returns Stream(improve(1), ?), guesses.tail.tail is Stream(improve(improve(1)), ?), and so on.
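As a mental model, here is a minimal sketch (not the real library code) of a lazy cons cell; the tail is a by-name constructor parameter memoized by a lazy val, which is exactly what lets guesses refer to itself:
// Simplified stand-in for Stream's cons cell.
sealed trait LazyStream[+A]
case object Empty extends LazyStream[Nothing]
final class Cons[+A](val head: A, tl: => LazyStream[A]) extends LazyStream[A] {
  lazy val tail: LazyStream[A] = tl // forced at most once, on first access
}
// So `1 #:: (guesses map improve)` behaves like new Cons(1.0, guesses map improve):
// the tail expression captures `guesses` but is not evaluated until someone asks for .tail.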
The value of guesses is not substituted. A stream is like a list, but its elements are evaluated only when they are needed, and then they are stored, so the next time you access them the evaluation is not necessary. The reference to the stream itself does not change.
On top of the example Αλεχει wrote, there is a nice explanation in Scala API:
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Stream
You can easily find out what's going on by modifying the map function, as described in the scaladoc example:
scala> def sqrtStream(x: Double): Stream[Double] = {
     |   def improve(guess: Double): Double = (guess + x / guess) / 2
     |   lazy val guesses: Stream[Double] = 1 #:: (guesses map { n =>
     |     println(n, improve(n))
     |     improve(n)
     |   })
     |   guesses
     | }
sqrtStream: (x: Double)Stream[Double]
The output is:
scala> sqrtStream(4).take(10).toList
(1.0,2.5)
(2.5,2.05)
(2.05,2.000609756097561)
(2.000609756097561,2.0000000929222947)
(2.0000000929222947,2.000000000000002)
(2.000000000000002,2.0)
(2.0,2.0)
(2.0,2.0)
(2.0,2.0)
res0: List[Double] = List(1.0, 2.5, 2.05, 2.000609756097561, 2.0000000929222947, 2.000000000000002, 2.0, 2.0, 2.0, 2.0)

Refactoring a small Scala function

I have this function to compute the distance between two n-dimensional points using Pythagoras' theorem.
def computeDistance(neighbour: Point) = math.sqrt(coordinates.zip(neighbour.coordinates).map {
  case (c1: Int, c2: Int) => math.pow(c1 - c2, 2)
}.sum)
The Point class (simplified) looks like:
class Point(val coordinates: List[Int])
I'm struggling to refactor the method so it's a little easier to read, can anybody help please?
Here's another way that makes the following three assumptions:
The length of the list is the number of dimensions for the point
Each List is correctly ordered, i.e. List(x, y) or List(x, y, z). We do not know how to handle List(x, z, y)
All lists are of equal length
import math.{pow, sqrt}

def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
)
The obvious disadvantage here is that we don't have any safety around list length. The quick fix for this is to simply have the function return an Option[Double] like so:
def computeDistance(other: Point): Option[Double] = {
  if (other.coordinates.length != coordinates.length) {
    return None
  }
  Some(sqrt(coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)))
}
I'd be curious if there is a type safe way to ensure equal list length.
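One low-tech sketch (my own addition, not from the original question): fix the dimension in the type itself, so points of different lengths cannot even be constructed; libraries such as shapeless generalise this with sized collections.
// Hypothetical fixed-arity variant: equal "length" is guaranteed by the type.
case class Point3(x: Int, y: Int, z: Int) {
  def distTo(p: Point3): Double =
    math.sqrt(math.pow(x - p.x, 2) + math.pow(y - p.y, 2) + math.pow(z - p.z, 2))
}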
EDIT
It was politely pointed out to me that flatMap(x => List(foo(x))) is equivalent to map(foo), which I forgot to refactor when I was originally playing with this. Slightly cleaner version with map instead of flatMap:
def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .map(i => pow(i._2 - i._1, 2))
    .fold(0.0)(_ + _)
)
Most of your problem is that you're trying to do math with really long variable names, which is almost always painful. There's a reason mathematicians use single letters and assign temporary variables.
Try this:
import math._

class Point(val coordinates: List[Int]) {
  def c = coordinates
  def d(p: Point) = {
    val delta = for ((a, b) <- c zip p.c) yield pow(a - b, 2)
    sqrt(delta.sum)
  }
}
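Hypothetical usage:
val p = new Point(List(1, 2, 3))
val q = new Point(List(3, 2, 1))
p d q //> 2.8284271247461903, i.e. sqrt(8)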
Consider type aliases and case classes, like this,
type Coord = List[Int]

case class Point(c: Coord) {
  def distTo(p: Point) = {
    val z = (c zip p.c).par
    val pw = z.aggregate(0.0)((a, v) => a + math.pow(v._1 - v._2, 2), _ + _)
    math.sqrt(pw)
  }
}
so that for any two points, for instance,
val p = Point( (1 to 5).toList )
val q = Point( (2 to 6).toList )
we have that
p distTo q
res: Double = 2.23606797749979
Note method distTo uses aggregate on a parallelised collection of tuples, and combines the partial results by the last argument (summation). For high dimensional points this may prove more efficient than the sequential counterpart.
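To see aggregate in miniature (an illustrative one-liner, not part of the original answer): the first function folds within each partition, the second merges the per-partition results.
List(1, 2, 3, 4).par.aggregate(0.0)((a, v) => a + v * v, _ + _) //> 30.0, the sum of squares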
For simplicity of use, consider also implicit classes, as suggested in a comment above,
implicit class RichPoint(val c: Coord) extends AnyVal {
  def distTo(d: Coord) = Point(c) distTo Point(d)
}
Hence
List(1,2,3,4,5) distTo List(2,3,4,5,6)
res: Double = 2.23606797749979

Scala fast way to parallelize collection

My code is equivalent to this:
def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
  val next = (for { i <- 1.to(1000000) }
    yield prev(Random.nextInt(i))).toVector
  if (acc < 20) iterate(next, acc + 1)
  else next
}
iterate(1.to(1000000).toVector, 1)
For a large number of iterations, it does an operation on a collection, and yields the value. At the end of the iterations, it converts everything to a vector. Finally, it proceeds to the next recursive self-call, but it cannot proceed until it has all the iterations done. The number of the recursive self-calls is very small.
I want to parallelize this, so I tried to use .par on the 1.to(1000000) range. This used 8 processes instead of 1, and the result was only twice as fast! .toParArray was only slightly faster than .par. I was told it could be much faster if I used something different, like maybe a ThreadPool - this makes sense, because all of the time is spent in constructing next, and I assume that concatenating the outputs of different processes onto shared memory will not result in huge slowdowns, even for very large outputs (this is a key assumption and it might be wrong). How can I do it? If you provide code, parallelizing the code I gave will be sufficient.
Note that the code I gave is not my actual code. My actual code is much longer and more complex (Held-Karp algorithm for TSP with constraints, BitSets and more stuff), and the only notable difference is that in my code, prev's type is ParMap instead of Vector.
Edit, extra information: the ParMap has 350k elements on the worst iteration at the biggest sample size I can handle, and otherwise it's typically 5k-200k (that varies on a log scale). If it inherently needs a lot of time to concatenate the results from the processes into one single process (I assume this is what's happening), then there is nothing much I can do, but I rather doubt this is the case.
I implemented a few versions after the original proposed in the question:
rec0 is the original, with a for loop;
rec1 uses par.map instead of the for loop;
rec2 follows rec1 but employs the parallel collection ParArray, for lazy builders (and fast access on bulk traversal operations);
rec3 is a non-idiomatic, non-parallel version with a mutable ArrayBuffer.
Thus
import scala.collection.mutable.ArrayBuffer
import scala.collection.parallel.mutable.ParArray
import scala.util.Random
// Original
def rec0() = {
  def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
    val next = (for { i <- 1.to(1000000) }
      yield prev(Random.nextInt(i))).toVector
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(1.to(1000000).toVector, 1)
}

// par map
def rec1() = {
  def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
    val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toVector
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(1.to(1000000).toVector, 1)
}

// ParArray par map
def rec2() = {
  def iterate(prev: ParArray[Int], acc: Int): ParArray[Int] = {
    val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toParArray
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate((1 to 1000000).toParArray, 1).toVector
}

// Non-idiomatic non-parallel
def rec3() = {
  def iterate(prev: ArrayBuffer[Int], acc: Int): ArrayBuffer[Int] = {
    val next = ArrayBuffer.tabulate(1000000) { i => i + 1 }
    var i = 0
    while (i < 1000000) {
      next(i) = prev(Random.nextInt(i + 1))
      i = i + 1
    }
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(ArrayBuffer.tabulate(1000000) { i => i + 1 }, 1).toVector
}
Then a little testing on averaging elapsed times,
def elapsed[A](f: => A): Double = {
  val start = System.nanoTime()
  f
  val stop = System.nanoTime()
  (stop - start) * 1e-6d
}
val times = 10
val e0 = (1 to times).map { i => elapsed(rec0) }.sum / times
val e1 = (1 to times).map { i => elapsed(rec1) }.sum / times
val e2 = (1 to times).map { i => elapsed(rec2) }.sum / times
val e3 = (1 to times).map { i => elapsed(rec3) }.sum / times
// time in ms.
e0: Double = 2782.341
e1: Double = 2454.828
e2: Double = 3455.976
e3: Double = 1275.876
shows that the non-idiomatic, non-parallel version proves the fastest on average. Perhaps for larger input data, the parallel, idiomatic versions may be beneficial.
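For completeness, the ThreadPool idea from the question might look like the following sketch (hypothetical, not benchmarked here): each worker fills a disjoint slice of a shared output array, so no concatenation of partial results is needed.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Random

def iterateChunked(prev: Vector[Int], chunks: Int = 8): Vector[Int] = {
  val pool = Executors.newFixedThreadPool(chunks)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  val n = 1000000
  val out = new Array[Int](n)
  val step = (n + chunks - 1) / chunks
  val work = (0 until chunks).map { c =>
    Future {
      val rnd = new Random() // one Random per worker avoids contention
      var i = c * step
      val end = math.min(n, (c + 1) * step)
      while (i < end) {
        out(i) = prev(rnd.nextInt(i + 1)) // same indexing as the original
        i += 1
      }
    }
  }
  Await.result(Future.sequence(work), Duration.Inf)
  pool.shutdown()
  out.toVector
}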

'let...in' expression in Scala

In OCaml, the let...in expression allows you to create a named local variable in an expression rather than a statement. (Yes, I know that everything is technically an expression, but Unit return values are fairly useless.) Here's a quick example in OCaml:
let square_the_sum a b = (* function definition *)
let sum = a + b in (* declare a named local called sum *)
sum * sum (* return the value of this expression *)
Here's what I would want the equivalent Scala to look like:
def squareTheSum(a: Int, b: Int): Int =
let sum: Int = a + b in
sum * sum
Is there anything in Scala that I can use to achieve this?
EDIT:
You learn something new every day, and this has been answered before.
object ForwardPipeContainer {
  implicit class ForwardPipe[A](val value: A) extends AnyVal {
    def |>[B](f: A => B): B = f(value)
  }
}
import ForwardPipeContainer._
def squareTheSum(a: Int, b: Int): Int = { a + b } |> { sum => sum * sum }
But I'd say that is not nearly as easy to read, and is not as flexible (it gets awkward with nested lets).
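For example, even two nested bindings already obscure the data flow (a hypothetical illustration):
def nested(a: Int, b: Int): Int =
  { a + b } |> { sum =>
    { sum * 2 } |> { twice => twice + sum } }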
You can nest val and def in a def. There's no special syntax; you don't need a let.
def squareTheSum(a: Int, b: Int): Int = {
  val sum = a + b
  sum * sum
}
I don't see the readability being any different here at all. But if you want to only create the variable within the expression, you can still do that with curly braces like this:
val a = 2 //> a : Int = 2
val b = 3 //> b : Int = 3
val squareSum = { val sum = a + b; sum * sum } //> squareSum : Int = 25
There is no significant difference here between a semicolon and the word "in" (or you could move the expression to the next line, and pretend that "in" is implied if it makes it more OCaml-like :D).
val squareSum = {
  val sum = a + b // in
  sum * sum
}
Another, more technical, take on this: Clojure's 'let' equivalent in Scala. I think the resulting structures are pretty obtuse compared to the multi-statement form.