How to make the following code run for large values of `n` while maintaining the functional style in Scala?

Seq.fill(n)(math.pow(Random.nextFloat,2) + math.pow(Random.nextFloat,2)).filter(_<1).size.toFloat/n*4
Basically, this Scala code counts how many times a random point lands inside the quarter of the unit circle that lies in the first quadrant. For large values of n the code gives a memory-limit-exceeded error because it builds a very large sequence. I could write this the Java way, but is there a functional way to achieve this task?

If you use an Iterator, no intermediate collection has to be created in memory.
Iterator.fill(n)(math.pow(Random.nextFloat,2) + math.pow(Random.nextFloat,2)).filter(_<1).size.toFloat/n*4
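If only the count is needed, the filter/size pair can also be collapsed into count, which likewise never builds an intermediate collection (a small variant, not part of the answer above):
Iterator.fill(n)(math.pow(Random.nextFloat, 2) + math.pow(Random.nextFloat, 2)).count(_ < 1).toFloat / n * 4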

Instead of filling the whole sequence and then filtering it for values less than 1, consider adding only the numbers you want to count (here, the sums greater than or equal to 1) as you go. That saves an unnecessary pass over the collection.
def nums(n: Int): Iterator[Float] = {
  import scala.util.Random
  def helper(items: Iterator[Float], counter: Int): Iterator[Float] = {
    val num = math.pow(Random.nextFloat, 2) + math.pow(Random.nextFloat, 2)
    if (counter > 0) {
      if (num >= 1) helper(items ++ Iterator(num.toFloat), counter - 1) else helper(items, counter - 1)
    } else items
  }
  helper(Iterator[Float](), n)
}
Final answer:
nums(n).size.toFloat / (n * 4)
Scala REPL
scala> def nums(n: Int): Iterator[Float] = {
     |   import scala.util.Random
     |   def helper(items: Iterator[Float], counter: Int): Iterator[Float] = {
     |     val num = math.pow(Random.nextFloat, 2) + math.pow(Random.nextFloat, 2)
     |     if (counter > 0) {
     |       if (num >= 1) helper(items ++ Iterator(num.toFloat), counter - 1) else helper(items, counter - 1)
     |     } else items
     |   }
     |   helper(Iterator[Float](), n)
     | }
nums: (n: Int)Iterator[Float]
scala> nums(10000).size.toFloat/(10000 * 4)
res1: Float = 0.053925
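Another option along the same lines, sketched here as an assumption rather than taken from either answer: keep only a running count of the hits in a tail-recursive loop, so nothing beyond two Floats and a counter is ever held in memory.
import scala.annotation.tailrec
import scala.util.Random

// Hypothetical helper: Monte Carlo estimate of pi without building any collection.
def estimatePi(n: Int): Double = {
  @tailrec
  def loop(remaining: Int, hits: Long): Long =
    if (remaining == 0) hits
    else {
      val x = Random.nextFloat
      val y = Random.nextFloat
      loop(remaining - 1, if (x * x + y * y < 1) hits + 1 else hits)
    }
  loop(n, 0L).toDouble / n * 4
}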

Related

Code efficiency in Scala loops, counting up or counting down?

Clearly, if you need to count up, count up. If you need to count down, count down. However, other things being equal, is one faster than the other?
Here is my Scala code for a well-known puzzle - checking if a number is divisible by 13.
In the first example, I reverse my array and count upwards in the subsequent for-loop. In the second example I leave the array alone and do a decrementing for-loop. On the surface, the second example looks faster. Unfortunately, on the site where I run the code, it always times out.
// works every time
object Thirteen {
  import scala.annotation.tailrec
  @tailrec
  def thirt(n: Long): Long = {
    val getNum = (n: Int) => Array(1, 10, 9, 12, 3, 4)(n % 6)
    val ni = n.toString.split("").reverse.map(_.toInt)
    var s: Long = 0
    for (i <- 0 to ni.length - 1) {
      s += ni(i) * getNum(i)
    }
    if (s == n) s else thirt(s)
  }
}
// times out every time
object Thirteen {
  import scala.annotation.tailrec
  @tailrec
  def thirt(n: Long): Long = {
    val getNum = (n: Int) => Array(1, 10, 9, 12, 3, 4)(n % 6)
    val ni = n.toString.split("").map(_.toInt)
    var s: Long = 0
    for (i <- ni.length - 1 to 0 by -1) {
      s = s + ni(i) * getNum(i)
    }
    if (s == n) s else thirt(s)
  }
}
I ask the following questions:
Is there an obvious rule I am unaware of?
What is an easy way to test two code versions for performance? Reliably measuring performance on the JVM appears difficult.
Does it help to look at the underlying byte code?
Is there a better piece of code solving the same problem? If so, I'd be very grateful to see it.
Whilst I've seen similar questions, I can't find a definitive answer.
Here's how I'd be tempted to tackle it.
val nums: Stream[Int] = 1 #:: 10 #:: 9 #:: 12 #:: 3 #:: 4 #:: nums

def thirt(n: Long): Long = {
  val s: Long = Stream.iterate(n)(_ / 10)
    .takeWhile(_ > 0)
    .zip(nums)
    .foldLeft(0L) { case (sum, (i, num)) => sum + i % 10 * num }
  if (s == n) s else thirt(s)
}
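A quick worked example, traced by hand through the weights 1, 10, 9, 12, 3, 4:
thirt(8529)  // 9*1 + 2*10 + 5*9 + 8*12 = 170, then 170 -> 79, and thirt(79) == 79
As for the measurement sub-question: ad-hoc System.nanoTime timings are only rough indicators on the JVM; for reliable comparisons the usual tool is JMH (in an sbt build, via the sbt-jmh plugin), which handles warm-up and dead-code elimination for you.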

Collatz - maximum number of steps and the corresponding number

I am trying to write a Scala function that takes an upper bound as argument and calculates the steps for the numbers in a range from 1 up to this bound. It has to return the maximum number of steps and the corresponding number that needs that many steps (as a pair: the first element is the number of steps and the second is the corresponding number).
I have already created a function called "collatz" which computes the number of steps. I am very new to Scala and I am a bit stuck because of the limitations. Here's how I thought to start the function:
def max(x: Int): Int = {
  for (i <- (1 to x).toList) yield collatz(i)
The way I think to solve this problem is to:
1. iterate through the range and apply collatz to all elements, putting the results in a new list which stores the number of steps;
2. find the maximum of the new list using List.max;
3. use List.indexOf to find the index.
However, I'm really stuck since I don't know how to do this without using var (and only using val). Thanks!
Something like this:
def collatzMax(n: Long): (Long, Long) = {
  require(n > 0, "Collatz function is not defined for n <= 0")

  def collatz(n: Long, steps: Long): Long = n match {
    case n if (n <= 1)      => steps
    case n if (n % 2 == 0)  => collatz(n / 2, steps + 1)
    case n if (n % 2 == 1)  => collatz(3 * n + 1, steps + 1)
  }

  def loop(n: Long, current: Long, acc: List[(Long, Long)]): List[(Long, Long)] =
    if (current > n) acc
    else {
      loop(n, current + 1, collatz(current, 0) -> current :: acc)
    }

  loop(n, 1, Nil).sortBy(-_._1).head
}
Example:
collatzMax(12)
result: (Long, Long) = (19,9) // 19 steps for collatz(9)
Using for:
def collatzMax(n: Long) =  // assumes your one-argument collatz(i) returning the number of steps
  (for (i <- 1L to n) yield collatz(i) -> i).sortBy(-_._1).head
Or (continuing your idea):
def maximum(x: Long): (Long, Long) = {
  val lst = for (i <- 1L to x) yield collatz(i)
  val maxValue = lst.max
  (maxValue, lst.indexOf(maxValue) + 1)
}
Try:
(1L to x).map(i => collatz(i) -> i).maxBy(_._1)  // (steps, number), without sorting the whole list
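For reference, a self-contained variant of that idea (names such as collatzSteps are illustrative, not taken from the answers above): pair each starting number with its step count and take the maximum by steps, which avoids the full sort.
import scala.annotation.tailrec

// Illustrative helper (name assumed): number of Collatz steps to reach 1.
def collatzSteps(start: Long): Long = {
  @tailrec
  def go(n: Long, steps: Long): Long =
    if (n <= 1) steps
    else if (n % 2 == 0) go(n / 2, steps + 1)
    else go(3 * n + 1, steps + 1)
  go(start, 0)
}

// Pair each number with its step count and take the maximum by steps (no sort needed).
def collatzMaxBy(bound: Long): (Long, Long) =
  (1L to bound).map(i => collatzSteps(i) -> i).maxBy(_._1)

collatzMaxBy(12)  // (19, 9): 9 needs 19 steps, matching the example above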

Scala functional solution for spoj "Prime Generator"

I worked on the Prime Generator problem for almost 3 days.
I want to write a functional Scala solution (which means "no var", "no mutable data"), but every time it exceeds the time limit.
My solution is:
object Main {
  def sqrt(num: Int) = math.sqrt(num).toInt

  def isPrime(num: Int): Boolean = {
    val end = sqrt(num)
    def isPrimeHelper(current: Int): Boolean = {
      if (current > end) true
      else if (num % current == 0) false
      else isPrimeHelper(current + 1)
    }
    isPrimeHelper(2)
  }

  val feedMax = sqrt(1000000000)
  val feedsList = (2 to feedMax).filter(isPrime)
  val feedsSet = feedsList.toSet

  def findPrimes(min: Int, max: Int) = (min to max) filter { num =>
    if (num <= feedMax) feedsSet.contains(num)
    else feedsList.forall(p => num % p != 0 || p * p > num)
  }

  def main(args: Array[String]) {
    val total = readLine().toInt
    for (i <- 1 to total) {
      val Array(from, to) = readLine().split("\\s+")
      val primes = findPrimes(from.toInt, to.toInt)
      primes.foreach(println)
      println()
    }
  }
}
I'm not sure where it can be improved. I also searched a lot, but couldn't find a Scala solution (most are C/C++ ones).
Here is a nice, fully functional Scala solution using the Sieve of Eratosthenes: http://en.literateprograms.org/Sieve_of_Eratosthenes_(Scala)#chunk def:ints
Check out this elegant and efficient one-liner by Daniel Sobral: http://dcsobral.blogspot.se/2010/12/sieve-of-eratosthenes-real-one-scala.html?m=1
lazy val unevenPrimes: Stream[Int] = {
  def nextPrimes(n: Int, sqrt: Int, sqr: Int): Stream[Int] =
    if (n > sqr) nextPrimes(n, sqrt + 1, (sqrt + 1) * (sqrt + 1))
    else if (unevenPrimes.takeWhile(_ <= sqrt).exists(n % _ == 0)) nextPrimes(n + 2, sqrt, sqr)
    else n #:: nextPrimes(n + 2, sqrt, sqr)
  3 #:: 5 #:: nextPrimes(7, 3, 9)
}
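To exercise that stream in the REPL, 2 has to be prepended by hand, since the stream only carries odd primes (this is illustrative only; a memoised Stream is far too slow for the full SPOJ bound of 10^9):
val primes = 2 #:: unevenPrimes
primes.take(10).toList                                // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
primes.dropWhile(_ < 100).takeWhile(_ <= 130).toList  // List(101, 103, 107, 109, 113, 127)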

Scala fast way to parallelize collection

My code is equivalent to this:
def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
  val next = (for { i <- 1.to(1000000) }
    yield prev(Random.nextInt(i))).toVector
  if (acc < 20) iterate(next, acc + 1)
  else next
}
iterate(1.to(1000000).toVector, 1)
For a large number of indices it performs an operation on the collection and yields a value, and at the end of the iterations it converts everything to a Vector. Only then does it proceed to the next recursive self-call, since it needs the complete result first. The number of recursive self-calls is very small.
I want to parallelize this, so I tried to use .par on the 1.to(1000000) range. This used 8 processes instead of 1, and the result was only twice as fast! .toParArray was only slightly faster than .par. I was told it could be much faster if I used something different, like maybe a ThreadPool - this makes sense, because all of the time is spent in constructing next, and I assume that concatenating the outputs of different processes onto shared memory will not result in huge slowdowns, even for very large outputs (this is a key assumption and it might be wrong). How can I do it? If you provide code, parallelizing the code I gave will be sufficient.
Note that the code I gave is not my actual code. My actual code is much longer and more complex (the Held-Karp algorithm for TSP with constraints, BitSets and more stuff), and the only notable difference is that in my code, prev's type is ParMap instead of Vector.
Edit, extra information: the ParMap has 350k elements on the worst iteration at the biggest sample size I can handle, and otherwise it's typically 5k-200k (that varies on a log scale). If it inherently needs a lot of time to concatenate the results from the processes into one single process (I assume this is what's happening), then there is nothing much I can do, but I rather doubt this is the case.
I implemented a few versions after the original one proposed in the question:
rec0 is the original with a for loop;
rec1 uses par.map instead of the for loop;
rec2 follows rec1 but keeps the data in the parallel collection ParArray (lazy builders and fast access on bulk traversal operations);
rec3 is a non-idiomatic, non-parallel version with a mutable ArrayBuffer.
Thus
import scala.collection.mutable.ArrayBuffer
import scala.collection.parallel.mutable.ParArray
import scala.util.Random

// Original
def rec0() = {
  def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
    val next = (for { i <- 1.to(1000000) }
      yield prev(Random.nextInt(i))).toVector
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(1.to(1000000).toVector, 1)
}

// par map
def rec1() = {
  def iterate(prev: Vector[Int], acc: Int): Vector[Int] = {
    val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toVector
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(1.to(1000000).toVector, 1)
}

// ParArray par map
def rec2() = {
  def iterate(prev: ParArray[Int], acc: Int): ParArray[Int] = {
    val next = (1 to 1000000).par.map { i => prev(Random.nextInt(i)) }.toParArray
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate((1 to 1000000).toParArray, 1).toVector
}

// Non-idiomatic non-parallel
def rec3() = {
  def iterate(prev: ArrayBuffer[Int], acc: Int): ArrayBuffer[Int] = {
    var next = ArrayBuffer.tabulate(1000000) { i => i + 1 }
    var i = 0
    while (i < 1000000) {
      next(i) = prev(Random.nextInt(i + 1))
      i = i + 1
    }
    if (acc < 20) iterate(next, acc + 1)
    else next
  }
  iterate(ArrayBuffer.tabulate(1000000) { i => i + 1 }, 1).toVector
}
Then a little testing, averaging the elapsed times:
def elapsed[A](f: => A): Double = {
  val start = System.nanoTime()
  f
  val stop = System.nanoTime()
  (stop - start) * 1e-6d
}

val times = 10
val e0 = (1 to times).map { i => elapsed(rec0) }.sum / times
val e1 = (1 to times).map { i => elapsed(rec1) }.sum / times
val e2 = (1 to times).map { i => elapsed(rec2) }.sum / times
val e3 = (1 to times).map { i => elapsed(rec3) }.sum / times
// time in ms.
e0: Double = 2782.341
e1: Double = 2454.828
e2: Double = 3455.976
e3: Double = 1275.876
which shows that the non-idiomatic, non-parallel version is on average the fastest. Perhaps for larger input data the parallel, idiomatic versions would become beneficial.
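Since the question also mentions thread pools, here is a rough sketch of that idea (the chunk count, the helper name iterateChunked and the use of the global ExecutionContext are my assumptions, not something from the answer above): build each slice of next in its own Future and stitch the slices together once per iteration.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.Random

// rec1 rebuilt with explicit Futures instead of .par (hypothetical sketch)
def iterateChunked(prev: Vector[Int], acc: Int, chunks: Int = 8): Vector[Int] = {
  val size = 1000000
  val step = size / chunks
  val slices = Future.traverse((0 until chunks).toVector) { c =>
    Future {
      val from = c * step + 1
      val upTo = if (c == chunks - 1) size else (c + 1) * step
      (from to upTo).map(i => prev(Random.nextInt(i))).toVector
    }
  }
  val next = Await.result(slices, Duration.Inf).flatten
  if (acc < 20) iterateChunked(next, acc + 1, chunks) else next
}
Whether this beats .par depends mostly on how expensive the per-element work is relative to building and concatenating the intermediate vectors, which is exactly where the timings above say the time goes.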

Scala performance - Sieve

Right now I am trying to learn Scala. I've started small, writing some simple algorithms. I've encountered some problems implementing the Sieve algorithm for finding all prime numbers lower than a certain threshold.
My implementation is:
import scala.math

object Sieve {
  // Returns all prime numbers until maxNum
  def getPrimes(maxNum: Int) = {
    def sieve(list: List[Int], stop: Int): List[Int] = {
      list match {
        case Nil => Nil
        case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
        case _ => list
      }
    }
    val stop: Int = math.sqrt(maxNum).toInt
    sieve((2 to maxNum).toList, stop)
  }

  def main(args: Array[String]) = {
    val ap = printf("%d ", (_: Int));
    // works
    getPrimes(1000).foreach(ap(_))
    // works
    getPrimes(100000).foreach(ap(_))
    // out of memory
    getPrimes(1000000).foreach(ap(_))
  }
}
Unfortunately it fails when I want to compute all the prime numbers smaller than 1000000 (1 million). I am receiving an OutOfMemoryError.
Do you have any idea how to optimize the code, or how I can implement this algorithm in a more elegant fashion?
PS: I've done something very similar in Haskell, and there I didn't encounter any issues.
I would go with an infinite Stream. Using a lazy data structure allows you to code pretty much as in Haskell. It automatically reads as more "declarative" than the code you wrote.
import Stream._

val primes = 2 #:: sieve(3)

def sieve(n: Int): Stream[Int] =
  if (primes.takeWhile(p => p * p <= n).exists(n % _ == 0)) sieve(n + 2)
  else n #:: sieve(n + 2)

def getPrimes(maxNum: Int) = primes.takeWhile(_ < maxNum)
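Usage, assuming the definitions above live together in an object or a REPL session so that primes and sieve can refer to each other:
getPrimes(30).toList       // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
getPrimes(1000000).length  // 78498 primes below one million, with no OutOfMemoryError this time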
Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For really big ranges you should consider the Sieve of Atkin.
The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.
There are many Scala implementations of the Sieve of Eratosthenes, including some on Stack Overflow. For instance:
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))
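Note that |> (a forward-pipe operator) is not in the standard library; it comes from Scalaz or has to be defined by hand. A pipe-free rendering of the same fold, under an illustrative name, could be:
def foldSieve(n: Int): Set[Int] = {
  val r = 2 to n
  // strike out the multiples of every x that is still marked as prime
  r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps)
}

foldSieve(30)  // the primes up to 30, as an (unordered) Set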
The following answer is about 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting into ascending order). It is also more functional in form than the other answer using an array, although it uses a mutable BitSet as a sieving array:
object SoE {
  def makeSoE_Primes(top: Int): Iterator[Int] = {
    import scala.annotation.tailrec
    val topndx = (top - 3) / 2
    val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
    def cullp(i: Int) = {
      val p = i + i + 3
      @tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
      cull((p * p - 3) >>> 1)
    }
    (0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
    Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
  }
}
It can be tested by the following code:
object Main extends App {
  import SoE._
  val top_num = 10000000
  val strt = System.nanoTime()
  val count = makeSoE_Primes(top_num).size
  val end = System.nanoTime()
  println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
  println(f"Found $count primes up to $top_num" + ".")
  println("Using one large mutable BitSet and functional code.")
}
The results from the above are as follows:
Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.
There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.
It looks like List isn't very efficient space-wise. You can get an out-of-memory exception by doing something like this:
1 to 2000000 toList
I "cheated" and used a mutable array. Didn't feel dirty at all.
def primesSmallerThan(n: Int): List[Int] = {
  val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
  val primes = new collection.mutable.ListBuffer[Int]
  for (x <- nonprimes.indices if !nonprimes(x)) {
    primes += x
    for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
      nonprimes(y) = true
    }
  }
  primes.toList
}
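A quick sanity check (the expected output follows directly from the sieve; the (x * x) > 0 guard is presumably there to skip the inner loop once x * x overflows Int and turns negative for very large x):
primesSmallerThan(30)  // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)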