A grasp of immutable data structures - Scala

I am learning Scala, and as a good student I try to obey all the rules I find.
One rule is: IMMUTABILITY!!!
So I have tried to code everything with immutable data structures and vals, and sometimes this is really hard.
But today I thought to myself: the only important thing is that the object/class should have no mutable state. I am not forced to code all methods in an immutable style, because these methods don't affect each other.
My question: Am I correct, or are there any problems/disadvantages I don't see?
EDIT:
Code example for aishwarya:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val stateSequence = (0 to order).toList.padTo(sequence.length, order)
  val seqPos = sequence.zipWithIndex
  def probOfSymbAtPos(symb: T, pos: Int): Double = {
    val state = states(stateSequence(pos))
    M.log(state(seqPos.map(_._1).slice(0, pos).takeRight(order), symb))
  }
  val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
  probs.sum
}
Explanation: It is a method to calculate the log-likelihood of a homogeneous Markov model of variable order. The apply method of state takes all previous symbols and the coming symbol, and returns the probability of that symbol given the history.
As you can see, the whole method just multiplies some probabilities, which would be much easier using vars.

The rule is not really immutability, but referential transparency. It's perfectly OK to use locally declared mutable variables and arrays, because none of the effects are observable to any other parts of the overall program.
The principle of referential transparency (RT) is this:
An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.
Note that if e creates and mutates some local state, it doesn't violate RT since nobody can observe this happening.
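For example (a minimal sketch): the function below mutates a local var inside a while loop, yet every call with the same argument returns the same value and no mutation is observable from outside, so it is referentially transparent.
// internally imperative, externally pure: the var and the loop are local,
// so no caller can ever observe the mutation
def sumTo(n: Int): Int = {
  var acc = 0
  var i = 1
  while (i <= n) {
    acc += i
    i += 1
  }
  acc
}
// any occurrence of sumTo(10) can be replaced by its result, 55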
That said, I very much doubt that your implementation is any more straightforward with vars.

The case for functional programming is one of being concise in your code and bringing in a more mathematical approach. It can reduce the possibility of bugs and make your code smaller and more readable. As for being easier or not, it does require that you think about your problems differently. But once you get used to thinking in functional patterns, it's likely that functional will become easier than the more imperative style.
It is really hard to be perfectly functional and have zero mutable state, but very beneficial to have minimal mutable state. The thing to remember is that everything needs to be done in balance and not to the extreme. By reducing the amount of mutable state you end up making it harder to write code with unintended consequences. A common pattern is to have a mutable variable whose value is immutable. This way identity (the named variable) and value (an immutable object the variable can be assigned to) are separate.
var acc: List[Int] = Nil
// lots of complex stuff that adds values
acc ::= 1
acc ::= 2
acc ::= 3
// loop over the current list
acc foreach { i => /* do stuff that mutates acc */ acc ::= i * 10 }
println(acc) // List(10, 20, 30, 3, 2, 1) -- ::= prepends, so the newest element comes first
The foreach loops over the value acc held at the time the foreach started. Any mutations to acc do not affect the loop. This is much safer than the typical iterators in Java, where the list can change mid-iteration.
There is also a concurrency concern. Immutable objects are useful because of the JSR-133 memory model specification, which asserts that the initialization of an object's final members will occur before any thread can have visibility of those members, period! If they are not final then they are "mutable" and there is no guarantee of proper initialization.
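A minimal sketch of the distinction:
// `value` is a val, so it compiles to a final field on the JVM: any thread
// that obtains a reference to a Message is guaranteed to see `value` fully
// initialized, with no synchronization required.
class Message(val value: List[Int])

// No such guarantee here: without synchronization or @volatile, another
// thread may observe `value` in a partially initialized state.
class UnsafeMessage(var value: List[Int])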
Actors are the perfect place to put mutable state. Objects that represent data should be immutable. Take the following example.
import scala.actors.Actor // the (now legacy) scala.actors library

object MyActor extends Actor {
  var acc: List[Int] = Nil

  def act() {
    loop {
      react {
        case i: Int => acc ::= i
        case "what is your current value" => reply(acc)
        case _ => // ignore all other messages
      }
    }
  }
}
In this case we can send the value of acc (which is a List) and not worry about synchronization, because List is immutable, i.e. all of the members of the List object are final. Also, because of the immutability, we know that no other actor can change the underlying data structure that was sent, and thus no other actor can change the mutable state of this actor.

Since Apocalisp has already mentioned the stuff I was going to quote him on, I'll discuss the code. You say it is just multiplying stuff, but I don't see that -- it makes reference to at least three important things defined outside: order, states and M.log. I can infer that order is an Int, and that states returns a function that takes a List[T] and a T and returns a Double.
There's also some weird stuff going on...
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
sequence is never used except to define seqPos, so why do that?
  val stateSequence = (0 to order).toList.padTo(sequence.length, order)
  val seqPos = sequence.zipWithIndex
  def probOfSymbAtPos(symb: T, pos: Int): Double = {
    val state = states(stateSequence(pos))
    M.log(state(seqPos.map(_._1).slice(0, pos).takeRight(order), symb))
Actually, you could use sequence here instead of seqPos.map( _._1 ), since all that does is undo the zipWithIndex. Also, slice(0, pos) is just take(pos).
  }
  val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
  probs.sum
}
Now, given the missing methods, it is difficult to assert how this should really be written in functional style. Keeping the mystery methods would yield:
def logLikelihood(seq: Iterator[T]): Double = {
  import scala.collection.immutable.Queue
  case class State(index: Int, order: Int, slice: Queue[T], result: Double)
  seq.foldLeft(State(0, 0, Queue.empty, 0.0)) {
    case (State(index, ord, slice, result), symb) =>
      val state = states(ord)
      val partial = M.log(state(slice.toList, symb))
      val newSlice = slice enqueue symb
      State(index + 1,
            if (ord == order) ord else ord + 1,
            if (newSlice.size > order) newSlice.dequeue._2 else newSlice,
            result + partial)
  }.result
}
Only I suspect the state/M.log stuff could be made part of State as well. I notice other optimizations now that I have written it like this. The sliding window you are using reminds me, of course, of sliding:
seq.sliding(order).zipWithIndex.map {
  case (slice, index) => M.log(states(index + order)(slice.init, slice.last))
}.sum
That will only start at the orderth element, so some adaptation would be in order. Not too difficult, though. So let's rewrite it again:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val slices = (1 until order).map(n => sequence.take(n)).toList ::: sequence.sliding(order).toList
  slices.zipWithIndex.map {
    case (slice, index) => M.log(states(index)(slice.init, slice.last))
  }.sum
}
I wish I could see M.log and states... I bet I could turn that map into a foldLeft and do away with these two methods. And I suspect the method returned by states could take the whole slice instead of two parameters.
Still... not bad, is it?

Related

Scala: dynamic programming recursion using iterators

I'm learning how to do dynamic programming in Scala, and I often find myself in a situation where I want to recursively proceed over an array (or some other iterable) of items. When I do this, I tend to write cumbersome functions like this:
def arraySum(array: Array[Int], index: Int, accumulator: Int): Int = {
  if (index == array.length) {
    accumulator
  } else {
    arraySum(array, index + 1, accumulator + array(index))
  }
}

arraySum(Array(1, 2, 3), 0, 0)
(Ignore for a moment that I could just call sum on the array or do a .reduce(_ + _); I'm trying to learn programming principles.)
But this seems like I'm passing a lot of variables, and what exactly is the point of passing the array to each function call? This seems unclean.
So instead I got the idea to do this with iterators and not worry about passing indexes:
def arraySum(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  try {
    val nextInt = iter.next()
    arraySum(iter)(accumulator + nextInt)
  } catch {
    case nee: NoSuchElementException => accumulator
  }
}

arraySum(Array(1, 2, 3).toIterator)
This seems like a much cleaner solution. However, this falls apart when you need to use dynamic programming to explore some outcome space and you don't need to call the iterator at every function call. E.g.
def explore(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  if (someCase) {
    explore(iter)(accumulator)
  } else if (someOtherCase) {
    val nextInt = iter.next()
    explore(iter)(accumulator + nextInt)
  } else {
    // Some kind of aggregation/selection of explore results
  }
}
My understanding is that the iter iterator here is effectively passed by reference, so when this function calls iter.next(), that changes the instance of iter seen by all the other recursive calls of the function. So to get around that, now I'm cloning the iterator at every call of the explore function. E.g.:
def explore(iter: Iterator[Int])(implicit accumulator: Int = 0): Int = {
  if (someCase) {
    explore(iter)(accumulator)
  } else if (someOtherCase) {
    val iterClone = iter.toList.toIterator
    explore(iterClone)(accumulator + iterClone.next())
  } else {
    // Some kind of aggregation/selection of explore results
  }
}
But this seems pretty stupid, and the stupidity escalates when I have multiple iterators that may or may not need cloning in multiple else if cases. What is the right way to handle situations like this? How can I elegantly solve these kinds of problems?
Suppose that you want to write a back-tracking recursive function that needs some complex data structure as an argument, so that the recursive calls receive a slightly modified version of the data structure. You have several options for how you could do it:
1. Clone the entire data structure, modify it, and pass it to the recursive call. This is very simple, but usually very expensive.
2. Modify the mutable structure in place, pass it to the recursive call, then revert the modification when backtracking. You have to ensure that every possible call of your recursive function always restores the original state of the data structure exactly. This is much more efficient, but hard to implement, because it can be very error-prone.
3. Subdivide the structure into a large immutable part and a small mutable part. For example, you could pass an index (or a pair of indices) that specifies some slice of an array explicitly, along with an array that is never mutated. You could then "clone" and save only the mutable part, and restore it when backtracking. If it works, it is both simple and fast, but it doesn't always work, because substructures can be hard to describe with just a few integer indices.
4. Rely on persistent immutable data structures whenever you can.
I'd like to elaborate on the last point, because this is the preferred way to do it in Scala and in functional programming in general.
Here is your original code, that uses the third strategy:
def arraySum(array: Array[Int], index: Int, accumulator: Int): Int = {
  if (index == array.length) {
    accumulator
  } else {
    arraySum(array, index + 1, accumulator + array(index))
  }
}
If you would use a List instead of an Array, you could rewrite it to this:
@annotation.tailrec
def listSum(list: List[Int], acc: Int): Int = list match {
  case Nil => acc
  case h :: t => listSum(t, acc + h)
}
Here, h :: t is a pattern that deconstructs the list into the head and the tail.
Note that you don't need an explicit index any more, because accessing the tail t of the list is a constant-time operation, so that only the relevant remaining sublist is passed to the recursive call of listSum.
There is no backtracking here, but if the recursive method did backtrack, using lists would bring another advantage: extracting a sublist is almost free (a constant-time operation), yet the sublist is still guaranteed to be immutable. You can just pass it into the recursive call without caring whether the recursive call modifies it, and you don't have to do anything to undo modifications the recursive calls might have made. This is the advantage of persistent immutable data structures: related lists can share most of their structure while still appearing immutable from the outside, so it's impossible to break anything in the parent list just because you have access to its tail. This would not be the case with a view over a mutable array.
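A quick sketch of that sharing:
val parent = List(1, 2, 3, 4)
val sub = parent.tail   // constant time: no copying, just a reference to the tail
val extended = 0 :: sub // also constant time; shares all of sub's cells

// sub and parent's tail are the very same object in memory,
// yet neither list can be used to modify the other:
println(sub eq parent.tail) // true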

Early return from a for loop in Scala

Now I have some Scala code similar to the following:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  for {
    i <- 0 until x
    j <- i + 1 until x
    if fn(i, j)
  } return true
  false
}
But I get the feeling that return true is not so functional (or maybe it is?). Is there a way to rewrite this piece of code in a more elegant way?
In general, what is the more functional (if any) way to write the return-early-from-a-loop kind of code?
There are several methods that can help, such as find, exists, etc. For your case, try this:
def foo2(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (0 until x).exists(i =>
    (i + 1 until x).exists(j => fn(i, j)))
}
Since all you are checking for is existence, you can just compose 2 uses of exists:
(0 until x).exists(i => (i + 1 until x).exists(fn(i, _)))
More generally, if you are concerned with more than just determining whether a certain element exists, you can convert your comprehension to a series of Streams, Iterators, or views and then use exists; it will evaluate lazily, avoiding unnecessary executions of the loop:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (for {
    i <- (0 until x).iterator
    j <- (i + 1 until x).iterator
  } yield (i, j)).exists(fn.tupled)
}
You can also use map and flatMap instead of a for, and toStream or view instead of iterator:
(0 until x).view.flatMap(i => (i + 1 until x).toStream.map(j => i -> j)).exists(fn.tupled)
You can also use view on any collection to get a collection where all the transformers are performed lazily. This is possibly the most idiomatic way to short-circuit a collection traversal. From the docs on views:
Scala collections are by default strict in all their transformers, except for Stream, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on collection views. A view is a special kind of collection that represents some base collection, but implements all transformers lazily.
As far as overhead is concerned, it really depends on the specifics! Different collections have different implementations of view, toStream, and iterator that may vary in the amount of overhead. If fn is very expensive to compute, this overhead is probably worth it, and keeping a consistent, idiomatic, functional style makes your code more maintainable, debuggable, and readable. If you are in a situation that calls for extreme optimization, you may want to fall back to lower-level constructs like return (which isn't without its own overhead!).
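A small demonstration of the short-circuiting (a sketch; the counter exists only to make the laziness visible):
var evaluations = 0
val found = (1 to 1000).view
  .map { i => evaluations += 1; i * i }
  .exists(_ > 50)

println(found)       // true
println(evaluations) // 8 -- exists stopped pulling elements once 8 * 8 = 64 > 50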

Is there a way in Scala to remove the mutable variable(s) or it is fine to keep the mutable variables in the below case?

I understand that Scala embraces immutability fully.
Now I am thinking of a scenario where I have to hold some state (via variables) in a class or the like. I will need to update these variables later; then I can revisit the class later to access the updated variables.
I will try to make it simple with one very straightforward example:
class A {
  var x: Int = _
  def compute: Int = { /* call some other processes or such, using x as input */ }
}

......

def invoker() {
  val a: A = new A
  a.x = 1
  ......
  val res1 = a.compute
  a.x = 5
  ......
  val res2 = a.compute
  ......
}
So you see, I need to keep changing x and getting the results. If you argue that I could simply make x an argument of compute, such as
def compute(x: Int)
......
that's a good idea, but I cannot do it in my case, as I need to completely separate setting the value of x from computing the result. In other words, setting the value of x should not trigger the computation; rather, I need to be able to set the value of x at any time in the program and be able to reuse it for computation at any other time when I need it.
I am using a variable (var x: Int) in this case. Is this legitimate, or is there still some immutable way to handle it?
Any time you store state you will need mutability somewhere.
In your case, you want to store x and compute separately. Inherently, this means state is required, since the result of compute depends on the state of x.
If you really want the class with compute to be immutable, then some other mutable class will need to contain x, and it will need to be passed to the compute method.
rather, I need to be able to set the value of x at any time in the program and be able to reuse it for computation at any other time when I need it.
Then, by definition you want your class to be stateful. You could restructure your problem so that particular class doesn't require state, but whether that's useful and/or worth the hassle is something you'll have to figure out.
Your pattern is used in a ListBuffer for example (with size as your compute function).
So yes, there might be cases where you can use this pattern for good reasons. Example:
import scala.collection.mutable.ListBuffer

val l = List(1, 2, 3)
val lb = new ListBuffer[Int]
l.foreach(n => lb += n * n)
val result = lb.toList
println(result)
On the other hand, a buffer is normally only used to create an immutable instance as soon as possible. If you look at this code, there are two items which hint that it can be improved: the mutable buffer and foreach (because foreach is only called for its side effects).
So another option is
val l = List(1, 2, 3)
val result = l.map(n => n * n)
println(result)
which does the same in fewer lines. I prefer this style, because you are just looking at immutable instances and "functional" functions.
In your abstract example, you could try to separate the mutable state and the function:
class X(var i: Int)

class A {
  def compute(x: X): Int = { ... }
}
possibly even
class X(val i: Int)
This way compute becomes functional: its return value depends only on its parameter.
My personal favorite regarding an "unexpected" immutable class is scala.collection.immutable.Queue. Coming from an "imperative" background, you just don't expect a queue to be immutable.
So if you look at your pattern, it's likely that you can change it to being immutable.
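For instance (a quick sketch), every operation on the immutable Queue returns a new queue and leaves the original untouched:
import scala.collection.immutable.Queue

val q0 = Queue(1, 2, 3)
val q1 = q0.enqueue(4)      // q0 is unchanged
val (head, q2) = q1.dequeue // yields the front element and the remaining queue

println(q0)   // Queue(1, 2, 3)
println(head) // 1
println(q2)   // Queue(2, 3, 4)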
I would create an immutable A class (here it is a case class) and let an object handle the mutability. For each state change we create a new A object and change the reference in the object. This handles concurrency a bit better: if you set x from a different thread, you just have to make the variable @volatile or an AtomicReference.
object A {
  private[this] var a = A(0)
  def setX(x: Int) { if (x != a.x) a = new A(x) }
  def getA: A = a
}

case class A(x: Int) {
  def compute: Int = { /* do your stuff */ }
}
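A usage sketch, mirroring the invoker from the question:
A.setX(1)
val res1 = A.getA.compute // computes against A(1)
A.setX(5)
val res2 = A.getA.compute // computes against a fresh A(5); A(1) was never mutated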
After a few more months of functional programming, here is my rethinking.
Every time a variable is modified/changed/updated/mutated, the imperative way of handling this is to record the change right in that variable. The functional way of thinking is to make the activity (that causes the change) bring the new state to you. In other words, it's cause and effect: the functional way of thinking focuses on the transition between cause and effect.
Given all that, at any given point in the program's execution, what we have achieved so far is an intermediate result. We need somewhere to hold that result no matter how we do it. Such an intermediate result is the state and yes, we need some variable to hold it. That's what I wanted to share as abstract thinking.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I need the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluation to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {

  val d = List("1", "2", "3") // some expressions to evaluate.

  type ResType = Int
  case class State(i: ResType) // trivial state for experiment purposes

  val initialState = State(0)

  // my stub/dummy "eval" function...obviously the real one will be...real.
  def computeResultAndNewState(s: String, st: State): (ResType, State) = {
    val State(i) = st
    val res = s.toInt + i
    val newStateInt = i + 1
    (res, State(newStateInt))
  }
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
  var state = initialState
  val r = d.map { s =>
    val (result, newState) = computeResultAndNewState(s, state)
    state = newState
    result
  }
  println(r)
  println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
  // This startFold thing requires an explicit type. That alone makes it muddy.
  val startFold: (List[ResType], State) = (Nil, initialState)
  val (r, state) = d.foldLeft(startFold) {
    case ((tail, st), s) =>
      val (r, ns) = computeResultAndNewState(s, st)
      (tail :+ r, ns) // we want a constant-time append here, not O(N). Or could cons on front and reverse later.
  }
  println(r)
  println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
  lazy val states = initialState #:: resultStates // there are states
  lazy val args = d.toStream // there are arguments
  lazy val argPairs = args zip states // put them together
  lazy val resPairs: Stream[(ResType, State)] =
    argPairs.map { case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
  lazy val (results, resultStates) = myUnzip(resPairs) // Note: .unzip causes an infinite loop. Had to write my own.
  lazy val r = results.toList
  lazy val finalState = resultStates.last
  println(r)
  println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
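Concretely, the flipped evaluator can be derived from the original one (a sketch, assuming the question's computeResultAndNewState):
// a flipped version with the state first in both the argument list and the
// result tuple, which is the shape mapAccumLeft expects:
val computeResultAndNewStateFlipped: (State, String) => (State, ResType) =
  (st, s) => computeResultAndNewState(s, st).swap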
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
  case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next, what you need is a monadic map function that computes the values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if there is, please correct me). But it's possible to create one. To give a full example:
import scalaz._

object StateExample extends App with States /* utility methods */ {

  // The context that is threaded through the state.
  // In our case, it just maps variables to integer values.
  class Context(val map: Map[String, Int])

  // An example that returns the requested variable's value
  // and increases its value in the context.
  def eval(expression: String): State[Context, Int] =
    state((ctx: Context) => {
      val v = ctx.map.get(expression).getOrElse(0)
      (new Context(ctx.map + ((expression, v + 1))), v)
    })

  // Specialization of Haskell's mapM to our State monad.
  def mapState[S, A, B](f: A => State[S, B])(xs: Seq[A]): State[S, Seq[B]] =
    state((initState: S) => {
      var s = initState
      // process the sequence, threading the state
      // through the computation
      val ys = for (x <- xs) yield { val r = f(x)(s); s = r._1; r._2 }
      // return the final state and the output result
      (s, ys)
    })

  // Example: Try to evaluate some variables, starting from an empty context.
  val expressions = Seq("x", "y", "y", "x", "z", "x")
  print(mapState(eval)(expressions) ! new Context(Map[String, Int]()))
}
This way you can create simple functions that take some arguments and return State, and then combine them into more complex ones using State.map or State.flatMap (or, perhaps better, for comprehensions); finally, you can run the whole computation on a list of expressions with mapM.
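For example (a sketch building on the eval above), two evaluations can be sequenced with a for comprehension, and the state threading stays implicit:
// assuming eval as defined above:
def evalTwice(expression: String): State[Context, (Int, Int)] =
  for {
    a <- eval(expression)
    b <- eval(expression) // sees the context already modified by the first eval
  } yield (a, b)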
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer, he described how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, with less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe that in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use high-level concepts to describe the solution nicely (perhaps sacrificing a little speed/memory) or optimize it to be fast (or perhaps even manage both).
On the other hand, mapAccumL solves this particular problem but doesn't give us a broader answer. If we need to modify it a little, it might no longer work. Or, as the code grows complex, it can become messy and we won't know how to improve it or how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can grow complicated and hard to follow. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined into more complex ones using the flatMap method or for comprehensions, and finally we construct the desired task. The principle stays the same across the whole library, and the final task is also something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
  def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
    xs match {
      case Seq() => result // Seq() and +: match any Seq; Nil and :: would only match List
      case x +: tail =>
        val (res, newState) = computeResultAndNewState(x, priorState)
        innerTestTheRecWay(tail, newState, result :+ res)
    }
  }
  innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact, Scala does not have any loops other than while: fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler not to build up a stack, so it is absolutely safe to use. You can add the @annotation.tailrec annotation to force the compiler to apply tail-recursion elimination; if your method is not tail recursive, the compiler will complain.
edit: renamed inner method to avoid ambiguity

When does the imperative style fit better?

From Programming in Scala (second edition), bottom of p. 98:
A balanced attitude for Scala programmers
Prefer vals, immutable objects, and methods without side effects.
Reach for them first. Use vars, mutable objects, and methods with side effects when you have a specific need and justification for them.
The previous pages explain why to prefer vals, immutable objects, and methods without side effects, so the first sentence makes perfect sense.
But the second sentence, "Use vars, mutable objects, and methods with side effects when you have a specific need and justification for them," is not explained so well.
So my question is:
What is justification or specific need to use vars, mutable objects and methods with side effect?
P.S.: It would be great if someone could provide some examples of each of these (besides an explanation).
In many cases functional programming increases the level of abstraction and hence makes your code more concise and easier/faster to write and understand. But there are situations where the resulting bytecode cannot be as optimized (fast) as for an imperative solution.
Currently (Scala 2.9.1) one good example is summing up ranges:
(1 to 1000000).foldLeft(0)(_ + _)
Versus:
var x = 1
var sum = 0
while (x <= 1000000) {
  sum += x
  x += 1
}
If you profile these you will notice a significant difference in execution speed. So sometimes performance is a really good justification.
Ease of Minor Updates
One reason to use mutability is if you're keeping track of some ongoing process. For example, let's suppose I am editing a large document and have a complex set of classes to keep track of the various elements of the text, the editing history, the cursor position, and so on. Now suppose the user clicks on a different part of the text. Do I recreate the document object, copying many fields but not the EditState field, and recreate the EditState with a new ViewBounds and documentCursorPosition? Or do I alter a mutable variable in one spot? As long as thread safety is not an issue, it is much simpler and less error-prone to just update a variable or two than to copy everything. If thread safety is an issue, then protecting from concurrent access may be more work than using the immutable approach and dealing with out-of-date requests.
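A sketch of that trade-off, with hypothetical classes:
// hypothetical document model, for illustration only
case class EditState(cursor: Int, viewBounds: (Int, Int))
case class Document(text: String, history: List[String], edit: EditState)

// immutable update: rebuild the spine of the object graph on every click
def clickAtImmutable(doc: Document, pos: Int): Document =
  doc.copy(edit = doc.edit.copy(cursor = pos))

// mutable update: touch exactly one field
class MutableDocument(var text: String, var cursor: Int)
def clickAtMutable(doc: MutableDocument, pos: Int): Unit =
  doc.cursor = pos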
Computational efficiency
Another reason to use mutability is for speed. Object creation is cheap, but simple method calls are cheaper, and operations on primitive types are cheaper yet.
Let's suppose, for example, that we have a map and we want to sum the values and the squares of the values.
val xs = List.range(1,10000).map(x => x.toString -> x).toMap
val sum = xs.values.sum
val sumsq = xs.values.map(x => x*x).sum
If you do this every once in a while, it's no big deal. But if you pay attention to what's going on, for every list element you first recreate it (values), then sum it (boxed), then recreate it again (values), then recreate it yet again in squared form with boxing (map), then sum it. This is at least six object creations and five full traversals just to do two adds and one multiply per item. Incredibly inefficient.
You might try to do better by avoiding the multiple recreations and passing through the map only once, using a fold:
val (sum,sumsq) = ((0,0) /: xs){ case ((sum,sumsq),(_,v)) => (sum + v, sumsq + v*v) }
And this is much better, with about 15x better performance on my machine. But you still have three object creations every iteration. If instead you
case class SSq(var sum: Int = 0, var sumsq: Int = 0) {
  def +=(i: Int) { sum += i; sumsq += i * i }
}
val ssq = SSq()
xs.foreach(x => ssq += x._2)
you're about twice as fast again because you cut the boxing down. If you have your data in an array and use a while loop, then you can avoid all object creation and boxing and speed up by another factor of 20.
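That last variant, spelled out (a sketch):
val arr = Array.range(1, 10000)
var i = 0
var sum = 0
var sumsq = 0
while (i < arr.length) {
  val v = arr(i) // primitive Int: no boxing, no tuples, no closures
  sum += v
  sumsq += v * v
  i += 1
}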
Now, that said, you could also have chosen a recursive function for your array:
val ar = Array.range(0,10000)
def suma(xs: Array[Int], start: Int = 0, sum: Int = 0, sumsq: Int = 0): (Int, Int) = {
  if (start >= xs.length) (sum, sumsq)
  else suma(xs, start + 1, sum + xs(start), sumsq + xs(start) * xs(start))
}
and written this way it's just as fast as the mutable SSq. But if we instead do this:
def sumb(xs: Array[Int], start: Int = 0, ssq: (Int, Int) = (0, 0)): (Int, Int) = {
  if (start >= xs.length) ssq
  else sumb(xs, start + 1, (ssq._1 + xs(start), ssq._2 + xs(start) * xs(start)))
}
we're now 10x slower again because we have to create an object on each step.
So the bottom line is that it really only matters that you have immutability when you cannot conveniently carry your updating structure along as independent arguments to a method. Once you go beyond the complexity where that works, mutability can be a big win.
Cumulative Object Creation
If you need to build up a complex object with n fields from potentially faulty data, you can use a builder pattern that looks like so:
abstract class Built {
  def x: Int
  def y: String
  def z: Boolean
}

private class Building extends Built {
  var x: Int = _
  var y: String = _
  var z: Boolean = _
}

def buildFromWhatever: Option[Built] = {
  val b = new Building
  b.x = something
  if (thereIsAProblem) return None
  b.y = somethingElse
  // check
  ...
  Some(b)
}
This only works with mutable data. There are other options, of course:
case class Built(x: Int = 0, y: String = "", z: Boolean = false) // case class, so we get copy

def buildFromWhatever: Option[Built] = {
  val b0 = new Built
  val b1 = b0.copy(x = something)
  if (thereIsAProblem) return None
  ...
  Some(b)
}
which in many ways is even cleaner, except that you have to copy your object once for each change you make, which can be painfully slow. And neither of these is particularly bulletproof; for that you'd probably want
case class Built(x: Int, y: String, z: Boolean)

case class Building(
  x: Option[Int] = None, y: Option[String] = None, z: Option[Boolean] = None
) {
  def build: Option[Built] = for (x0 <- x; y0 <- y; z0 <- z) yield Built(x0, y0, z0)
}

def buildFromWhatever: Option[Built] = {
  val b0 = new Building
  val b1 = b0.copy(x = somethingIfNotProblem)
  ...
  bN.build
}
but again, there's lots of overhead.
I've found that the imperative/mutable style is a better fit for dynamic programming algorithms. If you insist on immutability, it's harder to program for most people, and you end up using vast amounts of memory and/or overflowing the stack. One example: Dynamic programming in the functional paradigm
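A typical instance (a minimal sketch): filling a memo table in place, as in bottom-up Fibonacci, is both natural and frugal compared to threading immutable structures through the recursion.
def fib(n: Int): BigInt =
  if (n < 2) BigInt(n)
  else {
    // bottom-up DP: fill a mutable table in place; O(n) time, no deep recursion
    val table = new Array[BigInt](n + 1)
    table(0) = BigInt(0)
    table(1) = BigInt(1)
    var i = 2
    while (i <= n) {
      table(i) = table(i - 1) + table(i - 2)
      i += 1
    }
    table(n)
  }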
Some examples:
(Originally a comment) Any program has to do some input and output (otherwise, it's useless). But by definition, input/output is a side effect and can't be done without calling methods with side effects.
One major advantage of Scala is the ability to use Java libraries. Many of them rely on mutable objects and methods with side effects.
Sometimes you need a var due to scoping. See Temperature4 in this blog post for an example.
Concurrent programming. If you use actors, sending and receiving messages are a side effect; if you use threads, synchronizing on locks is a side effect and locks are mutable; event-driven concurrency is all about side effects; futures, concurrent collections, etc. are mutable.