Why this function call in Scala is not optimized away? - scala

I'm running this program with Scala 2.10.3:
object Test {
def main(args: Array[String]) {
def factorial(x: BigInt): BigInt =
if (x == 0) 1 else x * factorial(x - 1)
val N = 1000
val t = new Array[Long](N)
var r: BigInt = 0
for (i <- 0 until N) {
val t0 = System.nanoTime()
r = r + factorial(300)
t(i) = System.nanoTime()-t0
}
val ts = t.sortWith((x, y) => x < y)
for (i <- 0 to 10)
print(ts(i) + " ")
println("*** " + ts(N/2) + "\n" + r)
}
}
and call to a pure function factorial with constant argument is evaluated during each loop iteration (conclusion based on timing results). Shouldn't the optimizer reuse function call result after the first call?
I'm using Scala IDE for Eclipse. Are there any optimization flags for the compiler, which may produce more efficient code?

Scala is not a purely functional language, so without an effect system it cannot know that factorial is pure (for example, it doesn't "know" anything about the multiplication of big ints).
You need to add your own memoization approach here. Most simply add a val f300 = factorial(300) outside your loop.
Here is a question about memoization.

Related

Getting rid of for loops in Scala

Here is a problem that involves a factorial. For a given number, n, find the answer to the following:
(1 / n!) * (1! + 2! + 3! + ... + n!)
The iterative solution in Scala is very easy – a simple for loop suffices.
object MyClass {
def fsolve(n: Int): Double = {
var a: Double = 1
var cum: Double = 1
for (i <- n to 2 by -1) {
a = a * (1.0/i.toDouble)
cum += a
}
scala.math.floor(cum*1000000) / 1000000
}
def main(args: Array[String]) {
println(fsolve(7)) // answer 1.173214
}
}
I want to get rid of the for loop and use a foldLeft operation. Since the idea is to reduce a list of numbers to a single result, a foldLeft, or a similar instruction ought to do the job. How? I’m struggling to find a good Scala example I can follow. The code below illustrates where I am struggling to make the leap to more idiomatic Scala.
object MyClass {
def fsolve(n: Int) = {
(n to 2 by -1).foldLeft(1.toDouble) (_*_)
// what comes next????
}
def main(args: Array[String]) {
println(fsolve(7))
}
}
Any suggestions or pointers to a solution?
The result is returned from foldLeft, like this:
val cum = (n to 2 by -1).foldLeft(1.toDouble) (_*_)
Only in your case the function needs to be different, as the fold above would multiply all i values together. You will pass both cum and a values for the folding:
def fsolve(n: Int): Double = {
val (cum, _) = (n to 2 by -1).foldLeft(1.0, 1.0) { case ((sum, a),i) =>
val newA = a * (1.0/i.toDouble)
(sum + newA, newA)
}
scala.math.floor(cum*1000000) / 1000000
}
The formula you have provided maps very nicely to the scanLeft function. It works sort of like a combination of foldLeft and map, running the fold operation but storing each generated value in the output list. The following code generates all of the factorials from 1 to n, sums them up, then divides by n!. Note that by performing a single floating point division at the end, instead of at every intermediate step, you reduce the odds of floating point errors.
def fsolve(n: Int): Double =
{
val factorials = (2 to n).scanLeft(1)((cum: Int, value: Int) => value*cum)
scala.math.floor(factorials.reduce(_+_)/factorials.last.toDouble*1000000)/1000000
}
I'll try to implement a solution without filling in the blanks but by proposing a different approach.
def fsolve(n: Int): Double = {
require(n > 0, "n must be positive")
def f(n: Int): Long = (1 to n).fold(1)(_ * _)
(1.0 / f(n)) * ((1 to n).map(f).sum)
}
In the function I make sure to fail for invalid input with require, I define factorial (as f) and then use it by simply writing down the function in the closest possible way to the original expression we wanted to implement:
(1.0 / f(n)) * ((1 to n).map(f).sum)
If you really want to fold explicitly you can rewrite this expression as follows:
(1.0 / f(n)) * ((1 to n).map(f).fold(0L)(_ + _))
Also, please note that since all operations you are executing (sums and multiplications) are commutative, you can use fold instead of foldLeft: using the former doesn't prescribe an order in which operation should run, allowing a specific implementation of the collection to run the computation in parallel.
You can play around with this code here on Scastie.

Intellij worksheet and classes defined in it

I'm following along the Coursera course on functional programming in Scala and came along a weird behavior of the worksheet repl.
In the course a worksheet with the following code should give the following results on the right:
object rationals {
val x = new Rational(1, 2) > x : Rational = Rational#<hash_code>
x.numer > res0: Int = 1
y. denom > res1: Int = 2
}
class Rational(x: Int, y: Int) {
def numer = x
def denom = y
}
What I get is
object rationals { > defined module rationals
val x = new Rational(1, 2)
x.numer
y. denom
}
class Rational(x: Int, y: Int) { > defined class Rational
def numer = x
def denom = y
}
Only after moving the class into the object I got the same result as in the code.
Is this caused by Intellij, or have there been changes in Scala?
Are there other ways around this?
In the IntelliJ IDEA scala worksheet handles values inside the objects differently than Eclipse/Scala IDE.
Values inside objects are not evaluated in linear sequence mode, instead they are treated as normal scala object. You barely see information about it until explicit use.
To actually see your vals and expressions simply define or evaluate them outside any object\class
This behaviour could be a saviour in some cases. Suppose you have that definitions.
val primes = 2l #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
val isPrime: Long => Boolean =
n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
Note that isPrime could be a simple def, but we choose to define it as val for some reason.
Such code is nice and working in any normal scala code, but will fail in the worksheet, because vals definitions are cross-referencing.
But it you wrap such lines inside some object like
object Primes {
val primes = 2l #:: Stream.from(3, 2).map(_.toLong).filter(isPrime)
val isPrime: Long => Boolean =
n => primes.takeWhile(p => p * p <= n).forall(n % _ != 0)
}
It will be evaluated with no problem

Scala fast way to parallelize collection

My code is equivalent to this:
def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
val next = (for { i <- 1.to(1000000) }
yield (prev(Random.nextInt(i))) ).toVector
if (acc < 20) iterate(next, acc + 1)
else next
}
iterate(1.to(1000000).toVector, 1)
For a large number of iterations, it does an operation on a collection, and yields the value. At the end of the iterations, it converts everything to a vector. Finally, it proceeds to the next recursive self-call, but it cannot proceed until it has all the iterations done. The number of the recursive self-calls is very small.
I want to paralellize this, so I tried to use .par on the 1.to(1000000) range. This used 8 processes instead of 1, and the result was only twice faster! .toParArray was only slightly faster than .par. I was told it could be much faster if I used something different, like maybe ThreadPool - this makes sense, because all of the time is spent in constructing next, and I assume that concatenating the outputs of different processes onto shared memory will not result in huge slowdowns, even for very large outputs (this is a key assumption and it might be wrong). How can I do it? If you provide code, paralellizing the code I gave will be sufficient.
Note that the code I gave is not my actual code. My actual code is much more long and complex (Held-Karp algorithm for TSP with constraints, BitSets and more stuff), and the only notable difference is that in my code, prev's type is ParMap, instead of Vector.
Edit, extra information: the ParMap has 350k elements on the worst iteration at the biggest sample size I can handle, and otherwise it's typically 5k-200k (that varies on a log scale). If it inherently needs a lot of time to concatenate the results from the processes into one single process (I assume this is what's happening), then there is nothing much I can do, but I rather doubt this is the case.
Implemented few versions after the original, proposed in the question,
rec0 is the original with a for loop;
rec1 uses par.map instead of for loop;
rec2 follows rec1 yet it employs parallel collection ParArray for lazy builders (and fast access on bulk traversal operations);
rec3 is a non-idiomatic non-parallel version with mutable ArrayBuffer.
Thus
import scala.collection.mutable.ArrayBuffer
import scala.collection.parallel.mutable.ParArray
import scala.util.Random
// Original
def rec0() = {
def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
val next = (for { i <- 1.to(1000000) }
yield (prev(Random.nextInt(i))) ).toVector
if (acc < 20) iterate(next, acc + 1)
else next
}
iterate(1.to(1000000).toVector, 1)
}
// par map
def rec1() = {
def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toVector
if (acc < 20) iterate(next, acc + 1)
else next
}
iterate(1.to(1000000).toVector, 1)
}
// ParArray par map
def rec2() = {
def iterate(prev: ParArray[Int], acc: Int): ParArray[Int] = {
val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toParArray
if (acc < 20) iterate(next, acc + 1)
else next
}
iterate((1 to 1000000).toParArray, 1).toVector
}
// Non-idiomatic non-parallel
def rec3() = {
def iterate(prev: ArrayBuffer[Int], acc: Int): ArrayBuffer[Int] = {
var next = ArrayBuffer.tabulate(1000000){i => i+1}
var i = 0
while (i < 1000000) {
next(i) = prev(Random.nextInt(i+1))
i = i + 1
}
if (acc < 20) iterate(next, acc + 1)
else next
}
iterate(ArrayBuffer.tabulate(1000000){i => i+1}, 1).toVector
}
Then a little testing on averaging elapsed times,
def elapsed[A] (f: => A): Double = {
val start = System.nanoTime()
f
val stop = System.nanoTime()
(stop-start)*1e-6d
}
val times = 10
val e0 = (1 to times).map { i => elapsed(rec0) }.sum / times
val e1 = (1 to times).map { i => elapsed(rec1) }.sum / times
val e2 = (1 to times).map { i => elapsed(rec2) }.sum / times
val e3 = (1 to times).map { i => elapsed(rec3) }.sum / times
// time in ms.
e0: Double = 2782.341
e1: Double = 2454.828
e2: Double = 3455.976
e3: Double = 1275.876
shows that the non-idiomatic non-parallel version proves the fastest in average. Perhaps for larger input data, the parallel, idiomatic versions may be beneficial.

What is the fastest way to write Fibonacci function in Scala?

I've looked over a few implementations of Fibonacci function in Scala starting from a very simple one, to the more complicated ones.
I'm not entirely sure which one is the fastest. I'm leaning towards the impression that the ones that uses memoization is faster, however I wonder why Scala doesn't have a native memoization.
Can anyone enlighten me toward the best and fastest (and cleanest) way to write a fibonacci function?
The fastest versions are the ones that deviate from the usual addition scheme in some way. Very fast is the calculation somehow similar to a fast binary exponentiation based on these formulas:
F(2n-1) = F(n)² + F(n-1)²
F(2n) = (2F(n-1) + F(n))*F(n)
Here is some code using it:
def fib(n:Int):BigInt = {
def fibs(n:Int):(BigInt,BigInt) = if (n == 1) (1,0) else {
val (a,b) = fibs(n/2)
val p = (2*b+a)*a
val q = a*a + b*b
if(n % 2 == 0) (p,q) else (p+q,p)
}
fibs(n)._1
}
Even though this is not very optimized (e.g. the inner loop is not tail recursive), it will beat the usual additive implementations.
for me the simplest defines a recursive inner tail function:
def fib: Stream[Long] = {
def tail(h: Long, n: Long): Stream[Long] = h #:: tail(n, h + n)
tail(0, 1)
}
This doesn't need to build any Tuple objects for the zip and is easy to understand syntactically.
Scala does have memoization in the form of Streams.
val fib: Stream[BigInt] = 0 #:: 1 #:: fib.zip(fib.tail).map(p => p._1 + p._2)
scala> fib take 100 mkString " "
res22: String = 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 ...
Stream is a LinearSeq so you might like to convert it to an IndexedSeq if you're doing a lot of fib(42) type calls.
However I would question what your use-case is for a fibbonaci function. It will overflow Long in less than 100 terms so larger terms aren't much use for anything. The smaller terms you can just stick in a table and look them up if speed is paramount. So the details of the computation probably don't matter much since for the smaller terms they're all quick.
If you really want to know the results for very big terms, then it depends on whether you just want one-off values (use Landei's solution) or, if you're making a sufficient number of calls, you may want to pre-compute the whole lot. The problem here is that, for example, the 100,000th element is over 20,000 digits long. So we're talking gigabytes of BigInt values which will crash your JVM if you try to hold them in memory. You could sacrifice accuracy and make things more manageable. You could have a partial-memoization strategy (say, memoize every 100th term) which makes a suitable memory / speed trade-off. There is no clear anwser for what is the fastest: it depends on your usage and resources.
This could work. it takes O(1) space O(n) time to calculate a number, but has no caching.
object Fibonacci {
def fibonacci(i : Int) : Int = {
def h(last : Int, cur: Int, num : Int) : Int = {
if ( num == 0) cur
else h(cur, last + cur, num - 1)
}
if (i < 0) - 1
else if (i == 0 || i == 1) 1
else h(1,2,i - 2)
}
def main(args: Array[String]){
(0 to 10).foreach( (x : Int) => print(fibonacci(x) + " "))
}
}
The answers using Stream (including the accepted answer) are very short and idiomatic, but they aren't the fastest. Streams memoize their values (which isn't necessary in iterative solutions), and even if you don't keep the reference to the stream, a lot of memory may be allocated and then immediately garbage-collected. A good alternative is to use an Iterator: it doesn't cause memory allocations, is functional in style, short and readable.
def fib(n: Int) = Iterator.iterate(BigInt(0), BigInt(1)) { case (a, b) => (b, a+b) }.
map(_._1).drop(n).next
A little simpler tail Recursive solution that can calculate Fibonacci for large values of n. The Int version is faster but is limited, when n > 46 integer overflow occurs
def tailRecursiveBig(n :Int) : BigInt = {
#tailrec
def aux(n : Int, next :BigInt, acc :BigInt) :BigInt ={
if(n == 0) acc
else aux(n-1, acc + next,next)
}
aux(n,1,0)
}
This has already been answered, but hopefully you will find my experience helpful. I had a lot of trouble getting my mind around scala infinite streams. Then, I watched Paul Agron's presentation where he gave very good suggestions: (1) implement your solution with basic Lists first, then if you are going to generify your solution with parameterized types, create a solution with simple types like Int's first.
using that approach I came up with a real simple (and for me, easy to understand solution):
def fib(h: Int, n: Int) : Stream[Int] = { h #:: fib(n, h + n) }
var x = fib(0,1)
println (s"results: ${(x take 10).toList}")
To get to the above solution I first created, as per Paul's advice, the "for-dummy's" version, based on simple lists:
def fib(h: Int, n: Int) : List[Int] = {
if (h > 100) {
Nil
} else {
h :: fib(n, h + n)
}
}
Notice that I short circuited the list version, because if i didn't it would run forever.. But.. who cares? ;^) since it is just an exploratory bit of code.
The code below is both fast and able to compute with high input indices. On my computer it returns the 10^6:th Fibonacci number in less than two seconds. The algorithm is in a functional style but does not use lists or streams. Rather, it is based on the equality \phi^n = F_{n-1} + F_n*\phi, for \phi the golden ratio. (This is a version of "Binet's formula".) The problem with using this equality is that \phi is irrational (involving the square root of five) so it will diverge due to finite-precision arithmetics if interpreted naively using Float-numbers. However, since \phi^2 = 1 + \phi it is easy to implement exact computations with numbers of the form a + b\phi for a and b integers, and this is what the algorithm below does. (The "power" function has a bit of optimization in it but is really just iteration of the "mult"-multiplication on such numbers.)
type Zphi = (BigInt, BigInt)
val phi = (0, 1): Zphi
val mult: (Zphi, Zphi) => Zphi = {
(z, w) => (z._1*w._1 + z._2*w._2, z._1*w._2 + z._2*w._1 + z._2*w._2)
}
val power: (Zphi, Int) => Zphi = {
case (base, ex) if (ex >= 0) => _power((1, 0), base, ex)
case _ => sys.error("no negative power plz")
}
val _power: (Zphi, Zphi, Int) => Zphi = {
case (t, b, e) if (e == 0) => t
case (t, b, e) if ((e & 1) == 1) => _power(mult(t, b), mult(b, b), e >> 1)
case (t, b, e) => _power(t, mult(b, b), e >> 1)
}
val fib: Int => BigInt = {
case n if (n < 0) => 0
case n => power(phi, n)._2
}
EDIT: An implementation which is more efficient and in a sense also more idiomatic is based on Typelevel's Spire library for numeric computations and abstract algebra. One can then paraphrase the above code in a way much closer to the mathematical argument (We do not need the whole ring-structure but I think it's "morally correct" to include it). Try running the following code:
import spire.implicits._
import spire.algebra._
case class S(fst: BigInt, snd: BigInt) {
override def toString = s"$fst + $snd"++"φ"
}
object S {
implicit object SRing extends Ring[S] {
def zero = S(0, 0): S
def one = S(1, 0): S
def plus(z: S, w: S) = S(z.fst + w.fst, z.snd + w.snd): S
def negate(z: S) = S(-z.fst, -z.snd): S
def times(z: S, w: S) = S(z.fst * w.fst + z.snd * w.snd
, z.fst * w.snd + z.snd * w.fst + z.snd * w.snd)
}
}
object Fibo {
val phi = S(0, 1)
val fib: Int => BigInt = n => (phi pow n).snd
def main(arg: Array[String]) {
println( fib(1000000) )
}
}

Scala performance - Sieve

Right now, I am trying to learn Scala . I've started small, writing some simple algorithms . I've encountered some problems when I wanted to implement the Sieve algorithm from finding all all prime numbers lower than a certain threshold .
My implementation is:
import scala.math
object Sieve {
// Returns all prime numbers until maxNum
def getPrimes(maxNum : Int) = {
def sieve(list: List[Int], stop : Int) : List[Int] = {
list match {
case Nil => Nil
case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
case _ => list
}
}
val stop : Int = math.sqrt(maxNum).toInt
sieve((2 to maxNum).toList, stop)
}
def main(args: Array[String]) = {
val ap = printf("%d ", (_:Int));
// works
getPrimes(1000).foreach(ap(_))
// works
getPrimes(100000).foreach(ap(_))
// out of memory
getPrimes(1000000).foreach(ap(_))
}
}
Unfortunately it fails when I want to computer all the prime numbers smaller than 1000000 (1 million) . I am receiving OutOfMemory .
Do you have any idea on how to optimize the code, or how can I implement this algorithm in a more elegant fashion .
PS: I've done something very similar in Haskell, and there I didn't encountered any issues .
I would go with an infinite Stream. Using a lazy data structure allows to code pretty much like in Haskell. It reads automatically more "declarative" than the code you wrote.
import Stream._
val primes = 2 #:: sieve(3)
def sieve(n: Int) : Stream[Int] =
if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
else n #:: sieve(n + 2)
def getPrimes(maxNum : Int) = primes.takeWhile(_ < maxNum)
Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For real big ranges you should consider the Sieve of Atkin.
The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can't hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.
There are many Scala implementations of Sieve of Eratosthenes, including some in Stack Overflow. For instance:
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))
The following answer is about a 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting to ascending order) and is more of a functional form than the other answer using an array although it uses a mutable BitSet as a sieving array:
object SoE {
def makeSoE_Primes(top: Int): Iterator[Int] = {
val topndx = (top - 3) / 2
val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
def cullp(i: Int) = {
import scala.annotation.tailrec; val p = i + i + 3
#tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
cull((p * p - 3) >>> 1)
}
(0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
}
}
It can be tested by the following code:
object Main extends App {
import SoE._
val top_num = 10000000
val strt = System.nanoTime()
val count = makeSoE_Primes(top_num).size
val end = System.nanoTime()
println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
println(f"Found $count primes up to $top_num" + ".")
println("Using one large mutable1 BitSet and functional code.")
}
With the results from the the above as follows:
Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.
There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.
It looks like List isn't very effecient space wise. You can get an out of memory exception by doing something like this
1 to 2000000 toList
I "cheated" and used a mutable array. Didn't feel dirty at all.
def primesSmallerThan(n: Int): List[Int] = {
val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
val primes = new collection.mutable.ListBuffer[Int]
for (x <- nonprimes.indices if !nonprimes(x)) {
primes += x
for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
nonprimes(y) = true
}
}
primes.toList
}