Documenting Scala functional chains [closed] - scala

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Scala (and functional programming, in general), advocates a style of programming where you produce functional "chains" of the form
collection.operation1(...).operation2(...)...
where the operations are various combinations of map, filter, etc.
Where the equivalent Java code might require 50 lines, the Scala code can be done in 1 or 2 lines. The functional chain can change an input collection to something completely different.
The disadvantage of the Scala code is that 10 minutes later (never mind 6 months later), I can't figure out what I was thinking, because the notation is so compact, and lacks type information (because of implied types).
How do you document this? Do you put a large block comment before the chain, changing an elegant 1 line solution into a bulky 40 line solution consisting of 39 lines of comment? Do you intersperse your comments like this?
collection.
// Select the items that meet condition X
filter(predicate_function).
// Change these items from A's to B's
map(transformation_function).
// etc.
Something else? No documentation? (Leave them guessing. They'll never "downsize" you then, because no one else can maintain the code. :-))

If you find yourself writing comments at that detail level, you're just repeating what the code says.
For long functional chains, define new functions to replace parts of the chain. Give these meaningful names. Then you might be able to avoid comments. The names of these functions themselves should explain what they do.
The best comments are the ones that explain why the code does something. Well-written code should make the "how" obvious from the code itself.

I don't write that code to begin with (unless it's a script for one-time use or playing around in the REPL).
If I can explain what the code does in one comment and the reads okay, then I keep it as a one liner:
// Find all real-valued square roots and group them in integer bins
ds.filter(_ >= 0).map(math.sqrt).groupBy(_.toInt).map(_._2)
If I can't understand this by reading carefully through the chain of commands, then I should break it up more into functionally distinct units. For example, if I expected someone to not realize that the square root of a negative number is not real-valued, I would say:
// Only non-negative numbers have a real-valued square root
val nonneg = ds.filter(_ >= 0)
// Find square roots and group them in integer bins
nonneg.map(math.sqrt).groupBy(_.toInt).map(_._2)
In particular, if someone doesn't know the Scala collections library well, and doesn't have the patience to spend five to ten minutes understanding one line of code, then either they shouldn't be working on my code (nor on anything else that accomplishes something nontrivial that they don't understand and don't have the patience to understand), or I should know in advance that I'm providing an e.g. language and mathematics tutorial in addition to writing working code, either by writing a paragraph explaining how the following line works, or breaking it out command by command, or including comments at the start of each anonymous function explaining what is going on (as appropriate).
Anyway, if you can't understand what it does, you probably need some intermediate values. They are very helpful for mental-resetting ("I can't see how to get from A to C!...but...okay, I can understand A to B. And I can understand B to C.")

If your chained operations are all monadic transforms: map, flatMap, filter, then it's often much, much clearer to rewrite the logic as a for-comprehension.
coll.filter(predicate).map(transform)
could become
for(elem <- coll if predicate) yield transform(elem)
it's even easier to show off the power of the technique if you have a longer sequence of operations, such as with Kassen's example:
def eligibleCustomers(products: Seq[Product]) = for {
product <- products
customer <- product.customers
paying <- customer if customer.isPremium
eligible <- paying if paying.age < 20
} yield eligible

If you don't want to split it in multiple methods as hammar suggested you can split the line and give the intermediate values names (and optionally types).
def eligibleCustomers: List[Customer] = {
val customers = products.flatMap(_.customers)
val paying = customers.filter(_.isPremium)
val eligible = paying.filter(_.age < 20)
eligible
}

The linelength is a somehow natural indicator, when your chain is getting too long. :)
Of course, it will depend upon how trivial the chain is:
customerdata.filter (_.age < 40).filter (_.city == "Rio").
filter (_.income > 3000).filter (_.joined < 2005)
filter (_.sex == 'f'). ...
I recently had your impression, where an application of 3 files, one of them a bit lengthy, consisting of 4 classes, one of them not trivial, and of about 10 to 20 methods. Each method was about 5 to 10 lines, and each 2 of them could have been easily combined to a lager one, but I had to convince myself, that although measuring the elegance in spared lines of codes isn't completely wrong, sparing lines isn't the goal itself.
But splitting a method into two often makes complexity per line lower, but not the overall complexity, to understand the whole program.
If the problem domain is complex - filter data at different levels, rowwise, columnwise, map it, group it, build averages, build graphs, paginate them ... - the complicated job has to be done somewhere.
The program isn't more easy to understand, you just have to hit page down less often. It is a readjustment, that you have to read a line of code more slowly.

It doesn't bother me that much now I'm used to Scala. If you want to be more explicit with types, you can always, for example, replace things like map(_.foo) with map { a:A => a.foo } to make the code more readable in lengthy/complex operations. Not that I usually find the need to do that.

Related

for vs map in functional programming

I am learning functional programming using scala. In general I notice that for loops are not much used in functional programs instead they use map.
Questions
What are the advantages of using map over for loop in terms of performance, readablity etc ?
What is the intention of bringing in a map function when it can be achieved using loop ?
Program 1: Using For loop
val num = 1 to 1000
val another = 1000 to 2000
for ( i <- num )
{
for ( j <- another)
{
println(i,j)
}
}
Program 2 : Using map
val num = 1 to 1000
val another = 1000 to 2000
val mapper = num.map(x => another.map(y => (x,y))).flatten
mapper.map(x=>println(x))
Both program 1 and program 2 does the same thing.
The answer is quite simple actually.
Whenever you use a loop over a collection it has a semantic purpose. Either you want to iterate the items of the collection and print them. Or you want to transform the type of the elements to another type (map). Or you want to change the cardinality, such as computing the sum of the elements of a collection (fold).
Of course, all that can also be done using for - loops but to the reader of the code, it is more work to figure out which semantic purpose the loop has, compared to a well known named operation such as map, iter, fold, filter, ...
Another aspect is, that for loops lead to the dark side of using mutable state. How would you sum the elements of a collection in a for loop without mutable state? You would not. Instead you would need to write a recursive function. So, for good measure, it is best to drop the habit of thinking in for loops early and enjoy the brave new functional way of doing things.
I'll start by quoting Programming in Scala.
"Every for expression can be expressed in terms of the three higher-order functions map, flatMap and filter. This section describes the translation scheme, which is also used by the Scala compiler."
http://www.artima.com/pins1ed/for-expressions-revisited.html#23.4
So the reason that you noticed for-loops are not used as much is because they technically aren't needed, and any for expressions you do see are just syntactic sugar which the compiler will translate into some equivalent. The rules for translating a for expression into a map/flatMap/filter expression are listed in the link above.
Generally speaking, in functional programming there is no index variable to mutate. This means one typically makes heavy use of function calls (often in the form of recursion) such as list folds in place of a while or for loop.
For a good example of using list folds in place of while/for loops, I recommend "Explain List Folds to Yourself" by Tony Morris.
https://vimeo.com/64673035
If a function is tail-recursive (denoted with #tailrec) then it can be optimized so as to not incur the high use of the stack which is common in recursive functions. In this case the compiler can translate the tail-recursive function to the "while loop equivalent".
To answer the second part of Question 1, there are some cases where one could make an argument that a for expression is clearer (although certainly there are cases where the opposite is true too.) One such example is given in the Coursera.org course "Functional Programming with Scala" by Dr. Martin Odersky:
for {
i <- 1 until n
j <- 1 until i
if isPrime(i + j)
} yield (i, j)
is arguably more clear than
(1 until n).flatMap(i =>
(1 until i).withFilter(j => isPrime(i + j))
.map(j => (i, j)))
For more information check out Dr. Martin Odersky's "Functional Programming with Scala" course on Coursera.org. Lecture 6.5 "Translation of For" in particular discusses this in more detail.
Also, as a quick side note, in your example you use
mapper.map(x => println(x))
It is generally more accepted to use foreach in this case because you have the intent of side-effecting. Also, there is short hand
mapper.foreach(println)
As for Question 2, it is better to use the map function in place of loops (especially when there is mutation in the loop) because map is a function and it can be composed. Also, once one is acquainted and used to using map, it is very easy to reason about.
The two programs that you have provided are not the same, even if the output might suggest that they are. It is true that for comprehensions are de-sugared by the compiler, but the first program you have is actually equivalent to:
val num = 1 to 1000
val another = 1000 to 2000
num.foreach(i => another.foreach(j => println(i,j)))
It should be noted that the resultant type for the above (and your example program) is Unit
In the case of your second program, the resultant type of the program is, as determined by the compiler, Seq[Unit] - which is now a Seq that has the length of the product of the loop members. As a result, you should always use foreach to indicate an effect that results in a Unit result.
Think about what is happening at the machine-language level. Loops are still fundamental. Functional programming abstracts the loop that is implemented in conventional programming.
Essentially, instead of writing a loop as you would in conventional or imparitive programming, the use of chaining or pipelining in functional programming allows the compiler to optimize the code for the user, and map is simply mapping the function to each element as a list or collection is iterated through. Functional programming, is more convenient, and abstracts the mundane implementation of "for" loops etc. There are limitations to this convenience, particularly if you intend to use functional programming to implement parallel processing.
It is arguable depending on the Software Engineer or developer, that the compiler will be more efficient and know ahead of time the situation it is implemented in. IMHO, mid-level Software Engineers who are familiar with functional programming, well versed in conventional programming, and knowledgeable in parallel processing, will implement both conventional and functional.

Is there any advantage to avoiding while loops in Scala?

Reading Scala docs written by the experts one can get the impression that tail recursion is better than a while loop, even when the latter is more concise and clearer. This is one example
object Helpers {
implicit class IntWithTimes(val pip:Int) {
// Recursive
def times(f: => Unit):Unit = {
#tailrec
def loop(counter:Int):Unit = {
if (counter >0) { f; loop(counter-1) }
}
loop(pip)
}
// Explicit loop
def :#(f: => Unit) = {
var lc = pip
while (lc > 0) { f; lc -= 1 }
}
}
}
(To be clear, the expert was not addressing looping at all, but in the example they chose to write a loop in this fashion as if by instinct, which is what the raised the question for me: should I develop a similar instinct..)
The only aspect of the while loop that could be better is the iteration variable should be local to the body of the loop, and the mutation of the variable should be in a fixed place, but Scala chooses not to provide that syntax.
Clarity is subjective, but the question is does the (tail) recursive style offer improved performance?
I'm pretty sure that, due to the limitations of the JVM, not every potentially tail-recursive function will be optimised away by the Scala compiler as so, so the short (and sometimes wrong) answer to your question on performance is no.
The long answer to your more general question (having an advantage) is a little more contrived. Note that, by using while, you are in fact:
creating a new variable that holds a counter.
mutating that variable.
Off-by-one errors and the perils of mutability will ensure that, on the long run, you'll introduce bugs with a while pattern. In fact, your times function could easily be implemented as:
def times(f: => Unit) = (1 to pip) foreach f
Which not only is simpler and smaller, but also avoids any creation of transient variables and mutability. In fact, if the type of the function you are calling would be something to which the results matter, then the while construction would start to be even more difficult to read. Please attempt to implement the following using nothing but whiles:
def replicate(l: List[Int])(times: Int) = l.flatMap(x => List.fill(times)(x))
Then proceed to define a tail-recursive function that does the same.
UPDATE:
I hear you saying: "hey! that's cheating! foreach is neither a while nor a tail-rec call". Oh really? Take a look into Scala's definition of foreach for Lists:
def foreach[B](f: A => B) {
var these = this
while (!these.isEmpty) {
f(these.head)
these = these.tail
}
}
If you want to learn more about recursion in Scala, take a look at this blog post. Once you are into functional programming, go crazy and read Rúnar's blog post. Even more info here and here.
In general, a directly tail recursive function (i.e., one that always calls itself directly and cannot be overridden) will always be optimized into a while loop by the compiler. You can use the #tailrec annotation to verify that the compiler is able to do this for a particular function.
As a general rule, any tail recursive function can be rewritten (usually automatically by the compiler) as a while loop and vice versa.
The purpose of writing functions in a (tail) recursive style is not to maximize performance or even conciseness, but to make the intent of the code as clear as possible, while simultaneously minimizing the chance of introducing bugs (by eliminating mutable variables, which generally make it harder to keep track of what the "inputs" and "outputs" of the function are). A properly written recursive function consists of a series of checks for terminating conditions (using either cascading if-else or a pattern match) with the recursive call(s) (plural only if not tail recursive) made if none of the terminating conditions are met.
The benefit of using recursion is most dramatic when there are several different possible terminating conditions. A series of if conditionals or patterns is generally much easier to comprehend than a single while condition with a whole bunch of (potentially complex and inter-related) boolean expressions &&'d together, especially if the return value needs to be different depending on which terminating condition is met.
Did these experts say that performance was the reason? I'm betting their reasons are more to do with expressive code and functional programming. Could you cite examples of their arguments?
One interesting reason why recursive solutions can be more efficient than more imperative alternatives is that they very often operate on lists and in a way that uses only head and tail operations. These operations are actually faster than random-access operations on more complex collections.
Anther reason that while-based solutions may be less efficient is that they can become very ugly as the complexity of the problem increases...
(I have to say, at this point, that your example is not a good one, since neither of your loops do anything useful. Your recursive loop is particularly atypical since it returns nothing, which implies that you are missing a major point about recursive functions. The functional bit. A recursive function is much more than another way of repeating the same operation n times.)
While loops do not return a value and require side effects to achieve anything. It is a control structure which only works at all for very simple tasks. This is because each iteration of the loop has to examine all of the state to decide what to next. The loops boolean expression may also have to be come very complex if there are multiple potential exit paths (or that complexity has to be distributed throughout the code in the loop, which can be ugly and obfuscatory).
Recursive functions offer the possibility of a much cleaner implementation. A good recursive solution breaks a complex problem down in to simpler parts, then delegates each part on to another function which can deal with it - the trick being that that other function is itself (or possibly a mutually recursive function, though that is rarely seen in Scala - unlike the various Lisp dialects, where it is common - because of the poor tail recursion support). The recursively called function receives in its parameters only the simpler subset of data and only the relevant state; it returns only the solution to the simpler problem. So, in contrast to the while loop,
Each iteration of the function only has to deal with a simple subset of the problem
Each iteration only cares about its inputs, not the overall state
Sucess in each subtask is clearly defined by the return value of the call that handled it.
State from different subtasks cannot become entangled (since it is hidden within each recursive function call).
Multiple exit points, if they exist, are much easier to represent clearly.
Given these advantages, recursion can make it easier to achieve an efficient solution. Especially if you count maintainability as an important factor in long-term efficiency.
I'm going to go find some good examples of code to add. Meanwhile, at this point I always recommend The Little Schemer. I would go on about why but this is the second Scala recursion question on this site in two days, so look at my previous answer instead.

Is recursion in scala very necessary?

In the coursera scala tutorial, most examples are using top-down iterations. Partially, as I can see, iterations are used to avoid for/while loops. I'm from C++ and feel a little confused about this.
Is iteration chosen over for/while loops? Is it practical in production? Any risk of stackoverflow? How about efficiency? How about bottom up Dynamic Programming (especially when they are not tail-recusions)?
Also, should I use less "if" conditions, instead use more "case" and subclasses?
Truly high-quality Scala will use very little iteration and only slightly more recursion. What would be done with looping in lower-level imperative languages is usually best done with higher-order combinators, map and flatmap most especially, but also filter, zip, fold, foreach, reduce, collect, partition, scan, groupBy, and a good few others. Iteration is best done only in performance critical sections, and recursion done only in a some deep edge cases where the higher-order combinators don't quite fit (which usually aren't tail recursive, fwiw). In three years of coding Scala in production systems, I used iteration once, recursion twice, and map about five times per day.
Hmm, several questions in one.
Necessity of Recursion
Recursion is not necessary, but it can sometimes provide a very elegant solution.
If the solution is tail recursive and the compiler supports tail call optimisation, then the solution can even be efficient.
As has been well said already, Scala has many combinator functions which can be used to perform the same tasks more expressively and efficiently.
One classic example is writing a function to return the nth Fibonacci number. Here's a naive recursive implementation:
def fib (n: Long): Long = n match {
case 0 | 1 => n
case _ => fib( n - 2) + fib( n - 1 )
}
Now, this is inefficient (definitely not tail recursive) but it is very obvious how its structure relates to the Fibonacci sequence. We can make it properly tail recursive, though:
def fib (n: Long): Long = {
def fibloop(current: Long, next: => Long, iteration: Long): Long = {
if (n == iteration)
current
else
fibloop(next, current + next, iteration + 1)
}
fibloop(0, 1, 0)
}
That could have been written more tersely, but it is an efficient recursive implementation. That said, it is not as pretty as the first and it's structure is less clearly related to the original problem.
Finally, stolen shamelessly from elsewhere on this site is Luigi Plinge's streams-based implementation:
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Very terse, efficient, elegant and (if you understand streams and lazy evaluation) very expressive. It is also, in fact, recursive; #:: is a recursive function, but one that operates in a lazily-evaluated context. You certainly have to be able to think recursively to come up with this kind of solution.
Iteration compared to For/While loops
I'm assuming you mean the traditional C-Style for, here.
Recursive solutions can often be preferable to while loops because C/C++/Java-style while loops do not return a value and require side effects to achieve anything (this is also true for C-Style for and Java-style foreach). Frankly, I often wish Scala had never implemented while (or had implemented it as syntactic sugar for something like Scheme's named let), because it allows classically-trained Java developers to keep doing things the way they always did. There are situations where a loop with side effects, which is what while gives you, is a more expressive way of achieving something but I had rather Java-fixated devs were forced to reach a little harder for it (e.g. by abusing a for comprehension).
Simply, traditional while and for make clunky imperative coding much too easy. If you don't care about that, why are you using Scala?
Efficiency and risk of Stackoverflow
Tail optimisation eliminates the risk of stackoverflow. Rewriting recursive solutions to be properly tail recursive can make them very ugly (particularly in any language running on the JVM).
Recursive solutions can be more efficient than more imperative solutions, sometimes suprisingly so. One reason is that they often operate on lists, in a way that only involves head and tail access. Head and tail operations on lists are actually faster than random access operations on more structured collections.
Dynamic Programming
A good recursive algorithm typically reduces a complex problem to a small set of simpler problems, picks one to solve and delegates the rest to another function (usually a recursive call to itself). Now, to me this sounds like a great fit for dynamic programming. Certainly, if I am trying a recursive approach to a problem, I often start with a naive solution which I know can't solve every case, see where it fails, add that pattern to the solution and iterate towards success.
The Little Schemer has many examples of this iterative approach to recursive programming, particularly because it re-uses earlier solutions as sub-components for later, more complex ones. I would say it is the epitome of the Dynamic Programming approach. (It is also one of the best-written educational books about software ever produced). I can recommend it, not least because it teaches you Scheme at the same time. If you really don't want to learn Scheme (why? why would you not?), it has been adapted for a few other languages
If versus Match
if expressions, in Scala, return values (which is very useful and why Scala has no need for a ternary operator). There is no reason to avoid simple
if (something)
// do something
else
// do something else
expressions. The principle reason to match instead of a simple if...else is to use the power of case statements to extract information from complex objects. Here is one example.
On the other hand, if...else if...else if...else is a terrible pattern
There's no easy way to see if you covered all the possibilities properly, even with a final else in place.
Unintentionally nested if expressions are hard to spot
It is too easy to link unrelated conditions together (accidentally or through bone-headed design)
Wherever you find you have written else if, look for an alternative. match is a good place to start.
I'm assuming that, since you say "recursion" in your title, you also mean "recursion" in your question, and not "iteration" (which cannot be chosen "over for/while loops", because those are iterative :D).
You might be interested in reading Effective Scala, especially the section on control structures, which should mostly answer your question. In short:
Recursion isn't "better" than iteration. Often it is easier to write a recursive algorithm for a given problem, then it is to write an iterative algorithm (of course there are cases where the opposite applies). When "tail call optimization" can be applied to a problem, the compiler actually converts it to an iterative algorithm, thus making it impossible for a StackOverflow to happen, and without performance impact. You can read about tail call optimization in Effective Scala, too.
The main problem with your question is that it is very broad. There are many many resources available on functional programming, idiomatic scala, dynamic programming and so on, and no answer here on Stack Overflow would be able to cover all those topics. It'd be probably a good idea to just roam the interwebs for a while, and then come back with more concrete questions :)
One of the main benefits of recursion is that it lets you create solutions without mutation. for following example, you have to calculate the sum of all the elements of a List.
One of the many ways to solve this problem is as below. The imperative solution to this problem uses for loop as shown:
scala> var total = 0
scala> for(f <- List(1,2,3)) { total += f }
And recursion solution would look like following:
def total(xs: List[Int]): Int = xs match {
case Nil => 0
case x :: ys => x + total(ys)
}
The difference is that a recursive solution doesn’t use any mutable temporary variables by letting you break the problem into smaller pieces. Because Functional programming is all about writing side effect free programs it's always encourage to use recursion vs loops (that use mutating variables).
Head recursion is a traditional way of doing recursion, where you perform the recursive call first and then take the return value from the recursive function and calculate the result.
Generally when you call a function an entry is added to the call stack of a currently running thread. The downside is that the call stack has a defined size so quickly you may get StackOverflowError exception. This is why Java prefers to iterate rather than recurse. Because Scala runs on the JVM, Scala also suffers from this problem. But starting with Scala 2.8.1, Scala gets away this limitation by doing tail call optimization. you can do tail recursion in Scala.
To recap recursion is preferred way in functional programming to avoid using mutation and secondly tail recursion is supported in Scala so you don't get into StackOverFlow exceptions which you get in Java.
Hope this helps.
As for stack overflow, a lot of the time you can get away with it because of tail call elimination.
The reason scala and other function paradigms avoid for/while loops they are highly dependent on state and time. That makes it much harder to reason about complex "loops" in a formal and precise manor.

Is it possible to have a compiler which optimizes a = func(a)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Say I have an object of type A. Consider this case for any function of the type A -> A (i.e. takes object of type A and returns another object of type A):
foo = func(foo)
Here, the simplest case would be to for the result of func(foo) to be copied into foo.
Is it possible to optimize this so that:
foo gets modified inplace in func
There are no constraints on the language used. What I want to know is what constraints and properties the language must have to enable such an optimization. Are there any existing languages which perform such an optimization?
Example(in pseudo code):
type Matrix = List<List<int>>
Matrix rotate90Deg(Matrix x):
Matrix result(x.columns, x.rows) #Assume it has a constructor which takes as args the num of rows, and num of cols.
for (int i = 0; i < x.rows; i++):
for (int j = 0; j < x.columns; j++):
result[i][j] = x[j][i]
return result
Matrix a = [[1,2,3],[4,5,6],[7,8,9]]
a = rotate90Deg(a)
Here, is it possible to optimize the code so that it doesn't allocate memory for a new matrix(result), and instead just modifies the original matrix passed.
First of all, you have to realize that some operations are inherently not possible to be computed in-place. Matrix-matrix multiplication is an example of this, and rotate90Deg would fall under this category since such an operation is actually a matrix multiplication by the appropriate multiplication matrix.
Now as for your example, you actually coded up a matrix transpose function. Matrix transpose can be done in-place since you are swapping pairs of numbers, but I doubt that any compilers can automatically detect this and optimize it for you. Indeed, there are many, many tricks that one can do to optimize matrix transpose in order to be cache-friendly in order to gain huge performance increases. Nevertheless, with an naive implementation, you will almost certainly end up with something very similar to what Aditya Kumar describes in his answer.
As I have foreshadowed by using the word "naive" earlier, programmers can coax the compiler to inline lots and lots of things in extremely optimized ways through advanced templating and other meta-programming techniques. (At least in C++, and maybe other languages that allow you to overload operator =.) For anyone interested in a case study of how this is done and what is involved, take a look at the Eigen matrix library, and how it handles a simple operation like u = v + w; where the three variables are all matrices of floats. Following is a brief overview of the key points.
A naive implementation would overload operator+ to return a temporary and operator= to copy that temporary to the result. Of course, in C++11 it is pretty easy to avoid the final copy during assignment by way of move constructors, but you will still have unnecessary temporaries if you had something a little more complex with multiple operators on the right hand side like u = 3.15f * u.transposed() + 5.0f; since each operator/method would return a temporary, and that temporary would have to be looped over in order to process the next operator.
Long story short, what Eigen does is rather than perform each operation when the corresponding function call occurs, the calls return a templated functor of sorts which merely describes the operation that needs to take place, and all the actual work ends up happening in operator =, thus enabling the compiler to emit a single, inlined loop for traversing the data only once and doing the operation truly in-place.
Yes it is possible, and this optimization is provided by at least C++11 (inlining).
To explain the optimization a little bit.
e.g.
foo_t foo;
foo = func(foo); // #1
foo_t func(foo_t foo1) {
foo_t new_foo;
// operate on new_foo by using foo1
return new_foo;
}
There are three instances of foo_t being made:
foo is copied and passed as foo1 to func
new_foo is created.
new_foo is assigned to foo by copying the contents of new_foo into foo;
All the three copies can be eliminated provided there are some invariants.
foo (the argument to be passed to function is never used later with the same original value. This is equivalent to saying that foo is 'dead' at line #1. This is established here as foo is reassigned.
the scope of object new_foo in function func has its lifetime that does not extend the life of function func. This is also established here as the way new_foo is created, it will be on stack and the lifetime of objects in stack is the same as the lifetime of the function in which the object was created.
In C++ it can be achieved using inlining the function func. After inlining, the code basically will look like this.
`foo_t foo;`
`foo_t new_foo;`
`// operate on new_foo by using foo`
`foo = new_foo;`
Although, C++ provides inlining as a language feature but almost any optimizing compiler do inlining these days.
Now it depends on what kind of operation you perform on new_foo and foo whether this extra new_foo will be optimized away or not. For some data types it is trivial (the compiler can do a 'copy-propagation' followed by 'dead-code elimination' to remove new_foo completely.

Why do immutable objects enable functional programming?

I'm trying to learn scala and I'm unable to grasp this concept. Why does making an object immutable help prevent side-effects in functions. Can anyone explain like I'm five?
Interesting question, a bit difficult to answer.
Functional programming is very much about using mathematics to reason about programs. To do so, one needs a formalism that describe the programs and how one can make proofs about properties they might have.
There are many models of computation that provide such formalisms, such as lambda calculus and turing machines. And there's a certain degree of equivalency between them (see this question, for a discussion).
In a very real sense, programs with mutability and some other side effects have a direct mapping to functional program. Consider this example:
a = 0
b = 1
a = a + b
Here are two ways of mapping it to functional program. First one, a and b are part of a "state", and each line is a function from a state into a new state:
state1 = (a = 0, b = ?)
state2 = (a = state1.a, b = 1)
state3 = (a = state2.a + state2.b, b = state2.b)
Here's another, where each variable is associated with a time:
(a, t0) = 0
(b, t1) = 1
(a, t2) = (a, t0) + (b, t1)
So, given the above, why not use mutability?
Well, here's the interesting thing about math: the less powerful the formalism is, the easier it is to make proofs with it. Or, to put it in other words, it's too hard to reason about programs that have mutability.
As a consequence, there's very little advance regarding concepts in programming with mutability. The famous Design Patterns were not arrived at through study, nor do they have any mathematical backing. Instead, they are the result of years and years of trial and error, and some of them have since proved to be misguided. Who knows about the other dozens "design patterns" seen everywhere?
Meanwhile, Haskell programmers came up with Functors, Monads, Co-monads, Zippers, Applicatives, Lenses... dozens of concepts with mathematical backing and, most importantly, actual patterns of how code is composed to make up programs. Things you can use to reason about your program, increase reusability and improve correctness. Take a look at the Typeclassopedia for examples.
It's no wonder people not familiar with functional programming get a bit scared with this stuff... by comparison, the rest of the programming world is still working with a few decades-old concepts. The very idea of new concepts is alien.
Unfortunately, all these patterns, all these concepts, only apply with the code they are working with does not contain mutability (or other side effects). If it does, then their properties cease to be valid, and you can't rely on them. You are back to guessing, testing and debugging.
In short, if a function mutates an object then it has side effects. Mutation is a side effect. This is just true by definition.
In truth, in a purely functional language it should not matter if an object is technically mutable or immutable, because the language will never "try" to mutate an object anyway. A pure functional language doesn't give you any way to perform side effects.
Scala is not a pure functional language, though, and it runs in the Java environment in which side effects are very popular. In this environment, using objects that are incapable of mutation encourages you to use a pure functional style because it makes a side-effect oriented style impossible. You are using data types to enforce purity because the language does not do it for you.
Now I will say a bunch of other stuff in the hope that it helps this make sense to you.
Fundamental to the concept of a variable in functional languages is referential transparency.
Referential transparency means that there is no difference between a value, and a reference to that value. In a language where this is true, it makes it much simpler to think about a program works, since you never have to stop and ask, is this a value, or a reference to a value? Anyone who's ever programmed in C recognizes that a great part of the challenge of learning that paradigm is knowing which is which at all times.
In order to have referential transparency, the value that a reference refers to can never change.
(Warning, I'm about to make an analogy.)
Think of it this way: in your cell phone, you have saved some phone numbers of other people's cell phones. You assume that whenever you call that phone number, you will reach the person you intend to talk to. If someone else wants to talk to your friend, you give them the phone number and they reach that same person.
If someone changes their cell phone number, this system breaks down. Suddenly, you need to get their new phone number if you want to reach them. Maybe you call the same number six months later and reach a different person. Calling the same number and reaching a different person is what happens when functions perform side effects: you have what seems to be the same thing, but you try to use it, it turns out it's different now. Even if you expected this, what about all the people you gave that number to, are you going to call them all up and tell them that the old number doesn't reach the same person anymore?
You counted on the phone number corresponding to that person, but it didn't really. The phone number system lacks referential transparency: the number isn't really ALWAYS the same as the person.
Functional languages avoid this problem. You can give out your phone number and people will always be able to reach you, for the rest of your life, and will never reach anybody else at that number.
However, in the Java platform, things can change. What you thought was one thing, might turn into another thing a minute later. If this is the case, how can you stop it?
Scala uses the power of types to prevent this, by making classes that have referential transparency. So, even though the language as a whole isn't referentially transparent, your code will be referentially transparent as long as you use immutable types.
Practically speaking, the advantages of coding with immutable types are:
Your code is simpler to read when the reader doesn't have to look out for surprising side effects.
If you use multiple threads, you don't have to worry about locking because shared objects can never change. When you have side effects, you have to really think through the code and figure out all the places where two threads might try to change the same object at the same time, and protect against the problems that this might cause.
Theoretically, at least, the compiler can optimize some code better if it uses only immutable types. I don't know if Java can do this effectively, though, since it allows side effects. This is a toss-up at best, anyway, because there are some problems that can be solved much more efficiently by using side effects.
I'm running with this 5 year old explanation:
class Account(var myMoney:List[Int] = List(10, 10, 1, 1, 1, 5)) {
def getBalance = println(myMoney.sum + " dollars available")
def myMoneyWithInterest = {
myMoney = myMoney.map(_ * 2)
println(myMoney.sum + " dollars will accru in 1 year")
}
}
Assume we are at an ATM and it is using this code to give us account information.
You do the following:
scala> val myAccount = new Account()
myAccount: Account = Account#7f4a6c40
scala> myAccount.getBalance
28 dollars available
scala> myAccount.myMoneyWithInterest
56 dollars will accru in 1 year
scala> myAccount.getBalance
56 dollars available
We mutated the account balance when we wanted to check our current balance plus a years worth of interest. Now we have an incorrect account balance. Bad news for the bank!
If we were using val instead of var to keep track of myMoney in the class definition, we would not have been able to mutate the dollars and raise our balance.
When defining the class (in the REPL) with val:
error: reassignment to val
myMoney = myMoney.map(_ * 2
Scala is telling us that we wanted an immutable value but are trying to change it!
Thanks to Scala, we can switch to val, re-write our myMoneyWithInterest method and rest assured that our Account class will never alter the balance.
One important property of functional programming is: If I call the same function twice with the same arguments I'll get the same result. This makes reasoning about code much easier in many cases.
Now imagine a function returning the attribute content of some object. If that content can change the function might return different results on different calls with the same argument. => no more functional programming.
First a few definitions:
A side effect is a change in state -- also called a mutation.
An immutable object is an object which does not support mutation, (side effects).
A function which is passed mutable objects (either as parameters or in the global environment) may or may not produce side effects. This is up to the implementation.
However, it is impossible for a function which is passed only immutable objects (either as parameters or in the global environment) to produce side effects. Therefore, exclusive use of immutable objects will preclude the possibility of side effects.
Nate's answer is great, and here is some example.
In functional programming, there is an important feature that when you call a function with same argument, you always get same return value.
This is always true for immutable objects, because you can't modify them after create it:
class MyValue(val value: Int)
def plus(x: MyValue) = x.value + 10
val x = new MyValue(10)
val y = plus(x) // y is 20
val z = plus(x) // z is still 20, plus(x) will always yield 20
But if you have mutable objects, you can't guarantee that plus(x) will always return same value for same instance of MyValue.
class MyValue(var value: Int)
def plus(x: MyValue) = x.value + 10
val x = new MyValue(10)
val y = plus(x) // y is 20
x.value = 30
val z = plus(x) // z is 40, you can't for sure what value will plus(x) return because MyValue.value may be changed at any point.
Why do immutable objects enable functional programming?
They don't.
Take one definition of "function," or "prodecure," "routine" or "method," which I believe applies to many programming languages: "A section of code, typically named, accepting arguments and/or returning a value."
Take one definition of "functional programming:" "Programming using functions." The ability to program with functions is indepedent of whether state is modified.
For instance, Scheme is considered a functional programming language. It features tail calls, higher-order functions and aggregate operations using functions. It also has mutable objects. While mutability destroys some nice mathematical qualities, it does not necessarily prevent "functional programming."
I've read all the answers and they don't satisfy me, because they mostly talk about "immutability", and not about its relation to FP.
The main question is:
Why do immutable objects enable functional programming?
So I've searched a bit more and I have another answer, I believe the easy answer to this question is: "Because Functional Programming is basically defined on the basis of functions that are easy to reason about". Here's the definition of Functional Programming:
The process of building software by composing pure functions.
If a function is not pure -- which means receiving the same input, it's not guaranteed to always produce the same output (e.g., if the function relies on a global object, or date and time, or a random number to compute the output) -- then that function is unpredictable, that's it! Now exactly the same story goes about "immutability" as well, if objects are not immutable, a function with the same object as its input may have different results (aka side effects) each time used, and this will make it hard to reason about the program.
I first tried to put this in a comment, but it got longer than the limit, I'm by no means a pro so please take this answer with a grain of salt.