Are nested functions efficient? - scala

In programming languages like Scala or Lua, we can define nested functions such as
function factorial(n)
    function _fac(n, acc)
        if n == 0 then
            return acc
        else
            return _fac(n-1, acc * n)
        end
    end
    return _fac(n, 1)
end
Does this approach cause any inefficiency because the nested function instance is defined, or created, every time we invoke the outer function?

Does this approach cause any inefficiency because the nested function instance is defined, or created, every time we invoke the outer function?
Efficiency is a large and broad topic. I am assuming that by "inefficient" you mean "does calling the method recursively each time have an overhead"?
I can only speak on Scala's behalf, specifically the flavor targeting the JVM, as other flavors may act differently.
We can divide this question into two, depending on what you really meant.
Nested (local scope) methods in Scala are a lexical scoping feature: they give you access to the values of the enclosing method, but once the compiler emits bytecode they are defined at the class level, just like a plain Java method.
For completeness, know that Scala also has function values, which are first-class citizens, meaning you can pass them around to other functions. These do incur an allocation overhead, since they are implemented using classes.
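To make the distinction concrete, here is a minimal sketch of my own (the names are illustrative, not from the question):

class Example {
  // A local def compiles to a private method on Example; calling
  // squareWithDef allocates nothing for sq.
  def squareWithDef(x: Int): Int = {
    def sq(n: Int): Int = n * n
    sq(x)
  }

  // A function value is an instance of scala.Function1. This particular
  // closure captures nothing, so modern scalac/JIT may reuse one instance,
  // but in general (e.g. when it captures locals) it can mean an
  // allocation per call.
  def squareWithVal(x: Int): Int = {
    val sq: Int => Int = n => n * n
    sq(x)
  }
}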
Factorial can be written in a tail-recursive manner, as you did in your example. The Scala compiler will notice that the method is tail recursive and turn it into an iterative loop, avoiding a method invocation per iteration. The JIT compiler may also, where possible, inline the method, avoiding the overhead of an additional stack frame.
For example, consider the following factorial implementation in Scala:
import scala.annotation.tailrec

def factorial(num: Int): Int = {
  @tailrec
  def fact(num: Int, acc: Int): Int = num match {
    case 0 => acc
    case n => fact(n - 1, acc * n)
  }
  fact(num, 1)
}
On the face of it, we have a recursive method. Let's see what the JVM bytecode looks like:
private final int fact$1(int, int);
  Code:
     0: iload_1
     1: istore        4
     3: iload         4
     5: tableswitch   { // 0 to 0
                  0: 24
            default: 28
       }
    24: iload_2
    25: goto          41
    28: iload         4
    30: iconst_1
    31: isub
    32: iload_2
    33: iload         4
    35: imul
    36: istore_2
    37: istore_1
    38: goto          0
    41: ireturn
What we see here is that the recursion was turned into an iterative loop (a tableswitch plus jump instructions).
Regarding method instance creation: if our method was not tail recursive, the JVM runtime would interpret it on each invocation until the C2 JIT compiler deems it hot enough, compiles it to machine code, and reuses that machine code for every call afterwards.
Generally, I would say this shouldn't worry you unless you've noticed the method is on your hot path and profiling the code led you to ask this question.
To conclude, efficiency is a delicate, use-case-specific topic. From the simplified example you've provided, we don't have enough information to tell whether this is the best option for your use case. Again: if this isn't something that showed up in your profiler, don't worry about it.

Let's benchmark it in Lua with/without nested functions.
Variant 1 (inner function object is created on every call)
local function factorial1(n)
    local function _fac1(n, acc)
        if n == 0 then
            return acc
        else
            return _fac1(n-1, acc * n)
        end
    end
    return _fac1(n, 1)
end
Variant 2 (functions are not nested)
local function _fac2(n, acc)
    if n == 0 then
        return acc
    else
        return _fac2(n-1, acc * n)
    end
end

local function factorial2(n)
    return _fac2(n, 1)
end
Benchmarking code (calculate 12! ten million times and display the CPU time used, in seconds):
local N = 1e7

local start_time = os.clock()
for j = 1, N do
    factorial1(12)
end
print("CPU time of factorial1 = ", os.clock() - start_time)

local start_time = os.clock()
for j = 1, N do
    factorial2(12)
end
print("CPU time of factorial2 = ", os.clock() - start_time)
Output for Lua 5.3 (interpreter)
CPU time of factorial1 = 8.237
CPU time of factorial2 = 6.074
Output for LuaJIT (JIT-compiler)
CPU time of factorial1 = 1.493
CPU time of factorial2 = 0.141

The answer depends on the language, of course.
What happens in Scala in particular is that inner functions are compiled as if they were defined outside the scope of the function that contains them.
This way the language only allows you to invoke them from the lexical scope in which they are defined, but it does not actually instantiate the function multiple times.
We can easily verify this by compiling two variants of the same class.
The first one is a fairly faithful port of your Lua code:
class Function1 {
  def factorial(n: Int): Int = {
    def _fac(n: Int, acc: Int): Int =
      if (n == 0)
        acc
      else
        _fac(n-1, acc * n)
    _fac(n, 1)
  }
}
The second one is more or less the same, but the tail recursive function is defined outside of the scope of factorial:
class Function2 {
  def factorial(n: Int): Int = _fac(n, 1)

  private final def _fac(n: Int, acc: Int): Int =
    if (n == 0)
      acc
    else
      _fac(n-1, acc * n)
}
We can now compile these two classes with scalac and then use javap to have a look at the compiler output (note that javap inspects the compiled class files, not the .scala sources):
scalac Function1.scala Function2.scala
javap -p Function1 Function2
which will yield the following output:
Compiled from "Function1.scala"
public class Function1 {
public int factorial(int);
private final int _fac$1(int, int);
public Function1();
}
Compiled from "Function2.scala"
public class Function2 {
public int factorial(int);
private final int _fac(int, int);
public Function2();
}
I added the private final keywords to minimize the difference between the two, but the main thing to notice is that in both cases the definitions appear at the class level, with inner functions automatically made private and final, plus a small name decoration (the $1 suffix) to ensure there is no name clash (e.g. if you define an inner function called loop inside two different methods).
I'm not sure about Lua or other languages, but I would expect at least most compiled languages to adopt a similar approach.

Yes (or it used to), as evidenced by Lua's effort to reuse function values when execution passes through a function definition multiple times.
Lua 5.2 Changes
Equality between function values has changed. Now, a function
definition may not create a new value; it may reuse some previous
value if there is no observable difference to the new function.
Since you have coded (assuming Lua) a function assigned to a global or local declared at a higher scope, you could code the short-circuit yourself (presuming no other code sets it to anything other than nil or false):
function factorial(n)
    _fac = _fac or function (n, acc)
        …
    end
    …
end

I don't know about Lua, but in Scala nested functions are very common, and they are used in recursive functions to ensure tail-call optimization:
import scala.annotation.tailrec

def factorial(i: Int): Int = {
  @tailrec
  def fact(i: Int, accumulator: Int): Int = {
    if (i <= 1)
      accumulator
    else
      fact(i - 1, i * accumulator)
  }
  fact(i, 1)
}
More info about tail calls and recursion here
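As a side note, @tailrec is a compile-time guarantee rather than a request: if the recursive call is not in tail position, compilation fails. A minimal sketch of my own illustrating both cases:

import scala.annotation.tailrec

// Not tail recursive: the recursive call is wrapped in a multiplication.
// Annotating it with @tailrec would fail to compile with
// "could not optimize @tailrec annotated method".
def naiveFact(n: Int): Int =
  if (n <= 1) 1
  else n * naiveFact(n - 1)

// Tail recursive: the recursive call is the last action, so scalac
// accepts the annotation and compiles it down to a loop.
@tailrec
def fact(n: Int, acc: Int = 1): Int =
  if (n <= 1) acc
  else fact(n - 1, acc * n)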

Related

When does Scala force a stream value?

I am comfortable with streams, but I admit I am puzzled by this behavior:
import collection.immutable.Stream

object StreamForceTest extends App {
  println("Computing fibs")
  val fibs: Stream[BigInt] = BigInt(0) #:: BigInt(1) #::
    fibs.zip(fibs.tail).map((x: (BigInt, BigInt)) => {
      println("Adding " + x._1 + " and " + x._2);
      x._1 + x._2
    })
  println("Taking first 5 elements")
  val fibs5 = fibs.take(5)
  println("Computing length of that prefix")
  println("fibs5.length = " + fibs5.length)
}
with output
Computing fibs
Taking first 5 elements
Computing length of that prefix
Adding 0 and 1
Adding 1 and 1
Adding 1 and 2
fibs5.length = 5
Why should take(5) not force the stream's values to be computed,
while length does do so? Offhand neither one needs to actually
look at the values, but I would have thought that take was more
likely to do it than length. Inspecting the source code on github,
we find these definitions for take (including an illuminating
comment):
override def take(n: Int): Stream[A] = (
  // Note that the n == 1 condition appears redundant but is not.
  // It prevents "tail" from being referenced (and its head being evaluated)
  // when obtaining the last element of the result. Such are the challenges
  // of working with a lazy-but-not-really sequence.
  if (n <= 0 || isEmpty) Stream.empty
  else if (n == 1) cons(head, Stream.empty)
  else cons(head, tail take n-1)
)
and length:
override def length: Int = {
  var len = 0
  var left = this
  while (!left.isEmpty) {
    len += 1
    left = left.tail
  }
  len
}
The definition of head and tail is obtained from the specific
subclass (Empty and Cons). (Of course Empty is an object, not a
class, and its definitions of head and tail just throw
exceptions.) There are subtleties, but they seem to concern making
sure that the tail of a Cons is evaluated lazily; the head
definition is straight out of lecture 0 on Scala constructors.
Note that length doesn't go near head, but it's the one that
does the forcing.
All this is part of a general puzzlement about how close Scala streams
are to Haskell lists. I thought Haskell treated head and tail
symmetrically (I'm not a serious Haskell hacker), and Scala forced
head evaluation in more circumstances. I'm trying to figure out
exactly what those circumstances are.
Stream's head is strict and its tail is lazy, as you can see in cons.apply and in the Cons constructor:
def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)
class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A]
Notice the context in which the take method refers to tail:
cons(head, tail take n-1)
Because the expression tail take n-1 is used as the second argument to cons, which is passed by name, it doesn't force evaluation of tail take n-1, thus doesn't force evaluation of tail.
Whereas in length, the statement
left = left.tail
assigns left.tail to a var, which does force its evaluation.
Scala is "strict by default". In most situations, everything you reference will be evaluated. We only have lazy evaluation in cases where a method/constructor parameter declares an call-by-name argument with =>, and in the culture we don't typically use this unless there's a special reason.
Let me offer another answer to this, one that just looks from a high level, i.e. without actually considering the code.
If you want to know how long a Stream is, you must evaluate it all the way to the end. Otherwise, you can only guess at its length. Admittedly, you may not actually care about the values (since you only want to count them) but that's immaterial.
On the other hand, when you "take" a certain number of elements from a stream (or indeed any collection) you are simply saying that you want at most that number of elements. The result is still a stream even though it may have been truncated.
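A small experiment of my own (using the pre-2.13 Stream discussed above) makes both behaviours visible:

// Stream's head is strict, so building the stream prints "eval 1" at once.
val s: Stream[Int] = (1 to 10).toStream.map { i => println(s"eval $i"); i }

val t = s.take(3) // prints nothing further: take never forces a tail
t.length          // prints "eval 2" and "eval 3": length must walk every tail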

Is this definition of a tail recursive fibonacci function tail-recursive?

I've seen the following F# definition of a continuation-passing-style fibonacci function around, which I always assumed to be tail recursive:
let fib k =
    let rec fib' k cont =
        match k with
        | 0 | 1 -> cont 1
        | k -> fib' (k-1) (fun a -> fib' (k-2) (fun b -> cont (a+b)))
    fib' k id
When trying out the equivalent code in Scala, I made use of the existing @tailrec annotation and was caught off guard when the Scala compiler informed me the recursive calls are NOT in tail position:
def fib(k: Int): Int = {
  @tailrec def go(k: Int, cont: Int => Int): Int = {
    if (k == 0 || k == 1) cont(1)
    else go(k-1, { a => go(k-2, { b => cont(a+b) }) })
  }
  go(k, { x => x })
}
I believe my Scala implementation is equivalent to the F# one, so I'm left wondering why the function is not tail recursive?
The second call to go on line 4 is not in tail position, it is wrapped inside an anonymous function. (It is in tail position for that function, but not for go itself.)
For continuation-passing style you need Proper Tail Calls, which Scala unfortunately doesn't have. (In order to provide PTCs on the JVM, you need to manage your own stack and not use the JVM call stack, which breaks interoperability with other languages; however, interoperability is a major design goal of Scala.)
The JVM's support for tail call elimination is limited.
I can't speak for the F# implementation, but in the Scala version you've got nested calls to go, so the inner one is not in tail position. The simplest way to think about it is from the stack's point of view: is there any other information the stack needs to 'remember' when making the recursive call?
In the case of the nested go calls there obviously is, because the inner call has to complete before the computation 'goes back' and completes the outer call.
Fib can be defined recursively like so:

import scala.annotation.tailrec

def fib(k: Int): Int = {
  @tailrec
  def go(k: Int, p: Int, c: Int): Int = {
    if (k == 0) p
    else go(k - 1, c, p + c)
  }
  go(k, 0, 1)
}
Unfortunately, the JVM does not support tail-call optimisation yet (to be fair, it can sometimes optimise some calls). Scala implements tail-recursion optimisation via program transformation (every directly tail-recursive function is equivalent to a loop). This is generally enough for simple recursive functions, but mutual recursion and continuation-passing style require the full optimisation.
This is indeed problematic when using advanced functional patterns like CPS or monadic style. To avoid blowing up the stack you need to use trampolines. They work, but this is neither as convenient nor as efficient as proper tail-call optimisation. Edward Kmett's comments on the subject are a good read.
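For the record, Scala's standard library ships a trampoline in scala.util.control.TailCalls; here is a sketch of the CPS fibonacci rewritten on top of it (my own adaptation, not from the answer above):

import scala.util.control.TailCalls._

// Each recursive step returns a TailRec value instead of growing the JVM
// call stack; result then runs the whole computation in a loop.
def fib(k: Int): TailRec[Int] =
  if (k == 0 || k == 1) done(1)
  else for {
    a <- tailcall(fib(k - 1))
    b <- tailcall(fib(k - 2))
  } yield a + b

fib(40).result // stack-safe, at the cost of heap-allocated continuations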

Scala, Eratosthenes: Is there a straightforward way to replace a stream with an iteration?

I wrote a function that generates primes indefinitely (wikipedia: incremental sieve of Eratosthenes) using streams. It returns a stream, but it also merges streams of prime multiples internally to mark upcoming composites. The definition is concise, functional, elegant and easy to understand, if I do say so myself:
def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, compositeStream: Stream[Int]): Stream[Int] = {
    if (n == compositeStream.head) test(n+1, compositeStream.tail)
    else Stream.cons(n, test(n+1, merge(compositeStream, Stream.from(n*n, n))))
  }
  test(2, Stream.from(4, 2))
}
But, I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" when I try to generate the 1000th prime.
I have an alternative solution that returns an iterator over primes and uses a priority queue of tuples (multiple, prime used to generate multiple) internally to mark upcoming composites. It works well, but it takes about twice as much code, and I basically had to restart from scratch:
import scala.collection.mutable.PriorityQueue

def primes(): Iterator[Int] = {
  // Tuple (composite, prime) is used to generate a prime's multiples
  object CompositeGeneratorOrdering extends Ordering[(Long, Int)] {
    def compare(a: (Long, Int), b: (Long, Int)) = b._1 compare a._1
  }
  var n = 2
  val composites = PriorityQueue(((n*n).toLong, n))(CompositeGeneratorOrdering)
  def advance = {
    while (n == composites.head._1) { // n is composite
      while (n == composites.head._1) { // duplicate composites
        val (multiple, prime) = composites.dequeue
        composites.enqueue((multiple + prime, prime))
      }
      n += 1
    }
    assert(n < composites.head._1)
    val prime = n
    n += 1
    composites.enqueue((prime.toLong * prime.toLong, prime))
    prime
  }
  Iterator.continually(advance)
}
Is there a straightforward way to translate the code with streams to code with iterators? Or is there a simple way to make my first attempt more memory efficient?
It's easier to think in terms of streams; I'd rather start that way, then tweak my code if necessary.
I guess it's a bug in the current Stream implementation.
primes().drop(999).head works fine:
primes().drop(999).head
// Int = 7919
You'll get an OutOfMemoryError with a stored Stream like this:
val prs = primes()
prs.drop(999).head
// Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
The problem here is with the Cons class implementation: it contains not only the computed tail, but also the function to compute that tail, even after the tail has been computed and the function is no longer needed!
In this case the functions are extremely heavy, so you'll get an OutOfMemoryError even with only 1000 functions stored.
We have to drop those functions somehow.
The intuitive fix fails:
val prs = primes().iterator.toStream
prs.drop(999).head
// Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Calling iterator on a Stream gives you a StreamIterator, and StreamIterator#toStream gives you back the initial heavy Stream.
Workaround
So we have to convert it manually:
def toNewStream[T](i: Iterator[T]): Stream[T] =
  if (i.hasNext) Stream.cons(i.next, toNewStream(i))
  else Stream.empty
val prs = toNewStream(primes().iterator)
// Stream[Int] = Stream(2, ?)
prs.drop(999).head
// Int = 7919
In your first code, you should postpone the merging until the square of a prime is seen amongst the candidates. This will drastically reduce the number of streams in use, radically improving your memory usage issues. To get the 1000th prime, 7919, we only need to consider primes not above its square root, 88. That's just 23 primes/streams of their multiples, instead of 999 (22, if we ignore the evens from the outset). For the 10,000th prime, it's the difference between having 9999 streams of multiples and just 66. And for the 100,000th, only 189 are needed.
The trick is to separate the primes being consumed from the primes being produced, via a recursive invocation:
def primes(): Stream[Int] = {
  def merge(a: Stream[Int], b: Stream[Int]): Stream[Int] = {
    def next = a.head min b.head
    Stream.cons(next, merge(if (a.head == next) a.tail else a,
                            if (b.head == next) b.tail else b))
  }
  def test(n: Int, q: Int,
           compositeStream: Stream[Int],
           primesStream: Stream[Int]): Stream[Int] = {
    if (n == q) test(n+2, primesStream.tail.head*primesStream.tail.head,
                     merge(compositeStream,
                           Stream.from(q, 2*primesStream.head).tail),
                     primesStream.tail)
    else if (n == compositeStream.head) test(n+2, q, compositeStream.tail,
                                             primesStream)
    else Stream.cons(n, test(n+2, q, compositeStream, primesStream))
  }
  Stream.cons(2, Stream.cons(3, Stream.cons(5,
    test(7, 25, Stream.from(9, 6), primes().tail.tail))))
}
As an added bonus, there's no need to store the squares of primes as Longs. This will also be much faster and have better algorithmic complexity (time and space) as it avoids doing a lot of superfluous work. Ideone testing shows it runs at about ~ n^1.5..1.6 empirical orders of growth in producing up to n = 80,000 primes.
There's still an algorithmic problem here: the structure that is created here is still a linear left-leaning structure (((mults_of_2 + mults_of_3) + mults_of_5) + ...), with more frequently-producing streams situated deeper inside it (so the numbers have more levels to percolate through, going up). The right-leaning structure should be better, mults_of_2 + (mults_of_3 + (mults_of_5 + ...)). Making it a tree should bring a real improvement in time complexity (pushing it down typically to about ~ n^1.2..1.25). For a related discussion, see this haskellwiki page.
The "real" imperative sieve of Eratosthenes usually runs at around ~ n1.1 (in n primes produced) and an optimal trial division sieve at ~ n1.40..1.45. Your original code runs at about cubic time, or worse. Using imperative mutable array is usually the fastest, working by segments (a.k.a. the segmented sieve of Eratosthenes).
In the context of your second code, this is how it is achieved in Python.
Is there a straightforward way to translate the code with streams to code with iterators? Or is there a simple way to make my first attempt more memory efficient?
@Will Ness has given you an improved answer using Streams and explained why your code is taking so much memory and time (adding streams early, and a left-leaning linear structure), but no one has completely answered the second (or perhaps main) part of your question: can a true incremental Sieve of Eratosthenes be implemented with Iterators?
First, we should properly credit this right-leaning algorithm, of which your first code is a crude (left-leaning) example (since it prematurely adds all prime composite streams to the merge operations). It is due to Richard Bird, as given in the Epilogue of Melissa E. O'Neill's definitive paper on incremental Sieves of Eratosthenes.
Second, no, it isn't really possible to substitute Iterators for Streams in this algorithm, as it depends on moving through a stream without restarting it. Although one can access the head of an iterator (the current position), using the next value (skipping over the head) to generate the rest of the iteration as a stream requires building a completely new iterator, at a terrible cost in memory and time. However, we can use an Iterator to output the results of the sequence of primes, in order to minimize memory use and make it easy to use iterator higher-order functions, as you will see in my code below.
Now, Will Ness has walked you through the principle of postponing the addition of prime composite streams to the calculation until they are needed, which works well when one stores them in a structure such as a Priority Queue or a HashMap, and which was even missed in the O'Neill paper. For the Richard Bird algorithm, however, this is not necessary, because future stream values will not be accessed until needed, so they are not stored if the streams are properly lazily built. In fact, this algorithm doesn't even need the memoization and overhead of a full Stream, as each composite-number culling sequence only moves forward without reference to any past primes; all one needs is a separate source of base primes, which can be supplied by a recursive call of the same algorithm.
For ready reference, let's list the Haskell code of Richard Bird's algorithm:
primes = 2 : ([3..] `minus` composites)
  where
    composites = union [multiples p | p <- primes]
    multiples n = map (n*) [n..]
    (x:xs) `minus` (y:ys)
      | x < y  = x : (xs `minus` (y:ys))
      | x == y = xs `minus` ys
      | x > y  = (x:xs) `minus` ys
    union = foldr merge []
      where
        merge (x:xs) ys = x : merge' xs ys
        merge' (x:xs) (y:ys)
          | x < y  = x : merge' xs (y:ys)
          | x == y = x : merge' xs ys
          | x > y  = y : merge' (x:xs) ys
In the following code I have simplified the 'minus' function (called "minusStrtAt"), as we don't need to build a completely new stream but can incorporate the composite-subtraction operation into the generation of the original (in my case odds-only) sequence. I have also simplified the "union" function (renaming it "mrgMltpls").
The stream operations are implemented as a non-memoizing generic Co-Inductive Stream (CIS) class, where the first field holds the value at the current position of the stream and the second is a thunk (a zero-argument function that returns the next stream value through closed-over arguments to another function).
def primes(): Iterator[Long] = {
  // generic class as a Co-Inductive Stream element
  class CIS[A](val v: A, val cont: () => CIS[A])

  def mltpls(p: Long): CIS[Long] = {
    val px2 = p * 2
    def nxtmltpl(cmpst: Long): CIS[Long] =
      new CIS(cmpst, () => nxtmltpl(cmpst + px2))
    nxtmltpl(p * p)
  }
  def allMltpls(mps: CIS[Long]): CIS[CIS[Long]] =
    new CIS(mltpls(mps.v), () => allMltpls(mps.cont()))
  def merge(a: CIS[Long], b: CIS[Long]): CIS[Long] =
    if (a.v < b.v) new CIS(a.v, () => merge(a.cont(), b))
    else if (a.v > b.v) new CIS(b.v, () => merge(a, b.cont()))
    else new CIS(b.v, () => merge(a.cont(), b.cont()))
  def mrgMltpls(mlps: CIS[CIS[Long]]): CIS[Long] =
    new CIS(mlps.v.v, () => merge(mlps.v.cont(), mrgMltpls(mlps.cont())))
  def minusStrtAt(n: Long, cmpsts: CIS[Long]): CIS[Long] =
    if (n < cmpsts.v) new CIS(n, () => minusStrtAt(n + 2, cmpsts))
    else minusStrtAt(n + 2, cmpsts.cont())
  // the following are recursive: cmpsts uses oddPrms and oddPrms uses a
  // delayed version of cmpsts, in order to avoid a race, as oddPrms will
  // already have a first value when cmpsts is called to generate the second
  def cmpsts(): CIS[Long] = mrgMltpls(allMltpls(oddPrms()))
  def oddPrms(): CIS[Long] = new CIS(3, () => minusStrtAt(5L, cmpsts()))

  Iterator.iterate(new CIS(2L, () => oddPrms())) { (cis: CIS[Long]) => cis.cont() }
          .map { (cis: CIS[Long]) => cis.v }
}
The above code generates the 100,000th prime (1299709) on ideone in about 1.3 seconds with about a 0.36 second overhead and has an empirical computational complexity to 600,000 primes of about 1.43. The memory use is negligible above that used by the program code.
The above code could be implemented using the built-in Scala Streams, but there is a performance and memory use overhead (of a constant factor) that this algorithm does not require. Using Streams would mean that one could use them directly without the extra Iterator generation code, but as this is used only for final output of the sequence, it doesn't cost much.
To implement some basic tree folding as Will Ness has suggested, one only needs to add a "pairs" function and hook it into the "mrgMltpls" function:
def primes(): Iterator[Long] = {
  // generic class as a Co-Inductive Stream element
  class CIS[A](val v: A, val cont: () => CIS[A])

  def mltpls(p: Long): CIS[Long] = {
    val px2 = p * 2
    def nxtmltpl(cmpst: Long): CIS[Long] =
      new CIS(cmpst, () => nxtmltpl(cmpst + px2))
    nxtmltpl(p * p)
  }
  def allMltpls(mps: CIS[Long]): CIS[CIS[Long]] =
    new CIS(mltpls(mps.v), () => allMltpls(mps.cont()))
  def merge(a: CIS[Long], b: CIS[Long]): CIS[Long] =
    if (a.v < b.v) new CIS(a.v, () => merge(a.cont(), b))
    else if (a.v > b.v) new CIS(b.v, () => merge(a, b.cont()))
    else new CIS(b.v, () => merge(a.cont(), b.cont()))
  def pairs(mltplss: CIS[CIS[Long]]): CIS[CIS[Long]] = {
    val tl = mltplss.cont()
    new CIS(merge(mltplss.v, tl.v), () => pairs(tl.cont()))
  }
  def mrgMltpls(mlps: CIS[CIS[Long]]): CIS[Long] =
    new CIS(mlps.v.v, () => merge(mlps.v.cont(), mrgMltpls(pairs(mlps.cont()))))
  def minusStrtAt(n: Long, cmpsts: CIS[Long]): CIS[Long] =
    if (n < cmpsts.v) new CIS(n, () => minusStrtAt(n + 2, cmpsts))
    else minusStrtAt(n + 2, cmpsts.cont())
  // the following are recursive: cmpsts uses oddPrms and oddPrms uses a
  // delayed version of cmpsts, in order to avoid a race, as oddPrms will
  // already have a first value when cmpsts is called to generate the second
  def cmpsts(): CIS[Long] = mrgMltpls(allMltpls(oddPrms()))
  def oddPrms(): CIS[Long] = new CIS(3, () => minusStrtAt(5L, cmpsts()))

  Iterator.iterate(new CIS(2L, () => oddPrms())) { (cis: CIS[Long]) => cis.cont() }
          .map { (cis: CIS[Long]) => cis.v }
}
The above code generates the 100,000th prime (1299709) on ideone in about 0.75 seconds with about a 0.37 second overhead and has an empirical computational complexity to the 1,000,000th prime (15485863) of about 1.09 (5.13 seconds). The memory use is negligible above that used by the program code.
Note that the above codes are completely functional, in that no mutable state is used whatsoever. However, the Bird algorithm (even the tree-folding version) isn't as fast as using a Priority Queue or HashMap for larger ranges, since handling the tree merging has a higher computational complexity than the log n overhead of the Priority Queue or the linear (amortized) performance of a HashMap (although the HashMap carries a large constant-factor overhead for hashing, so that advantage isn't really seen until some truly large ranges are used).
The reason these codes use so little memory is that the CIS streams are formulated with no permanent reference to their start, so the streams are garbage collected as they are used, leaving only the minimal number of base-prime composite-sequence placeholders. As Will Ness has explained, that number is very small: only 546 base-prime composite streams for generating the first million primes up to 15485863. Each placeholder takes only a few tens of bytes (eight for the Long number, eight for the 64-bit function reference, another eight or so for the pointer to the closure arguments, and a few more bytes of function and class overhead), for a total of perhaps 40 bytes per stream placeholder, or not much more than 20 kilobytes for generating the sequence of a million primes.
If you just want an infinite stream of primes, this is the most elegant way in my opinion:
def primes = {
  def sieve(from: Stream[Int]): Stream[Int] =
    from.head #:: sieve(from.tail.filter(_ % from.head != 0))
  sieve(Stream.from(2))
}
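A quick usage check (my own addition):

primes.take(10).toList
// List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

Elegant as it is, note that this is the "unfaithful sieve" from O'Neill's paper: every prime filters the entire remaining stream, so it scales far worse than the merge-based versions above.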

Tail Recursion and Side effect

I am currently learning Scala and I have a question about tail recursion. Here is an example of a tail-recursive factorial in Scala:
import scala.annotation.tailrec

def factorial(n: Int): Int = {
  @tailrec
  def loop(acc: Int, n: Int): Int = {
    if (n == 0) acc
    else loop(n * acc, n - 1)
  }
  loop(1, n)
}
My question is: can updating the parameter acc, as we do in the function loop, be considered a side effect? In FP, we want to prevent or diminish the risk of side effects.
Maybe I got this wrong, but can someone explain this concept to me?
Thanks for your help
You aren't actually changing the value of any parameter here (as they are vals by definition, you couldn't, even if you wanted to).
You are returning a new value, calculated from the arguments passed in (and only those). Which, as @om-nom-nom pointed out in his comment, is the definition of a pure function.
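To make that concrete, here is a small sketch of my own: parameters behave like immutable vals, so nothing is ever updated in place; each recursive call just binds fresh values.

def factorial(n: Int): Int = {
  def loop(acc: Int, n: Int): Int = {
    // acc = acc * n   // would not compile: parameters are vals, not vars
    if (n == 0) acc
    else loop(n * acc, n - 1) // fresh bindings on each call; no mutation
  }
  loop(1, n)
}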

Scala: benchmark methods' speed by time: Total runtime for methods

I have two ways of finding factorial using Scala. I want to find out how much slower the non-tail-recursive way is compared to the tail-recursive way.
// factorial, non-tail-recursive
def fact1(n: Int): Int =
  if (n == 0) 1
  else n * fact1(n - 1)

// factorial, tail-recursive
def fact(n: Int): Int = {
  def loop(acc: Int, n: Int): Int =
    if (n == 0) acc
    else loop(acc * n, n - 1)
  loop(1, n)
}
// a1 = Time.now
fact1(100)
// a2 = Time.now
// a2-a1
// b1 = Time.now
fact(100)
// b2 = Time.now
// b2-b1
I just wrote up Ruby-style pseudocode for Time.now. Basically, how would you write Time.now-like code in Scala?
You can use the java.lang.System class, which provides methods for reading the time:
currentTimeMillis for the current wall-clock time in milliseconds
nanoTime for a high-resolution timer, suited to measuring elapsed time (recommended for benchmarking).
However, writing good micro-benchmarks is very difficult, and it is recommended to rely on a framework for that. Caliper is really good and there is a nice template for Scala projects.
You may use the Scala wrapper method scala.compat.Platform.currentTime which forwards the call to System.currentTimeMillis.
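For a quick-and-dirty measurement in the spirit of the question's Time.now comments, here is a naive sketch of my own (single-shot timings like this are distorted by JIT warm-up, which is why the frameworks above are recommended):

def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()                    // elapsed-time source
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(s"$label took $elapsedMs ms")
  result
}

time("fact1(100)")(fact1(100)) // note: Int overflows for 100!, as in the question
time("fact(100)")(fact(100))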