Scala's for-comprehensions: vital feature or syntactic sugar? - scala

When I first started looking at Scala, I liked the look of for-comprehensions. They seemed to be a bit like the foreach loops I was used to from Java 5, but with functional restrictions and a lot of sweet syntactic niceness.
But as I've absorbed the Scala style, I find that every time I could use a for-comprehension I'm using map, flatMap, filter, reduce and foreach instead. The intention of the code seems clearer to me that way, with fewer potential hidden surprises, and the code is usually shorter too.
As far as I'm aware, for-comprehensions are always compiled down into these methods anyway, so I'm wondering: what are they actually for? Am I missing some functional revelation (it wouldn't be the first time)? Do for-comprehensions do something the other features can't, or would at least be much clumsier at? Do they shine under a particular use case? Is it really just a matter of personal taste?

Please refer to this question. The short answer is that for-comprehensions can be more readable. In particular, if you have many nested generators, the actual scope of what you are doing becomes more clear, and you don't need huge indents.
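For instance (a small sketch with made-up ranges, not taken from the linked question), three generators stay at one indentation level in the comprehension, while the hand-written equivalent nests one lambda per generator:

val xs = 1 to 3
val ys = 1 to 3
val zs = 1 to 3

// One flat block, however many generators you add
val triples = for {
  x <- xs
  y <- ys
  z <- zs
  if x < y && y < z
} yield (x, y, z)

// The hand-desugared shape: one nesting level per generator
val same = xs.flatMap(x =>
  ys.flatMap(y =>
    zs.withFilter(z => x < y && y < z)
      .map(z => (x, y, z))))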

Another great use of for-comprehensions is for internal DSLs. ScalaQL is a great example of this. It can turn this
val underAge = for {
  p <- Person
  c <- Company
  if p.company is c
  if p.age < 14
} yield p
into this
SELECT p.* FROM people p JOIN companies c ON p.company_id = c.id WHERE p.age < 14
and a whole lot more.

for-comprehensions are syntactic sugar, but that doesn't mean they aren't vital. They are usually more concise than their expanded form, which is nice, but perhaps more importantly they help programmers from imperative languages use functional constructs.
When I first started with Scala I used for-comprehensions a lot, because they were familiar. Then I almost stopped completely, because I felt like using the underlying methods was more explicit and therefore clearer. Now I'm back to using for-comprehensions because I think they better express the intent of what I'm doing rather than the means of doing it.

In some cases, for comprehensions can express intent better, so when they do, use them.
Also note that with for comprehensions, you get pattern matching for free. For example, iterating over a Map is much simpler with a for comprehension:
for ((key, value) <- map) println (key + "-->" + value)
than with foreach:
map foreach { case (key, value) => println (key + "-->" + value) }

You are right. The for-comprehension is syntactic sugar. I believe the underlying methods are more succinct and easier to read, once you are used to them.
Compare the following equivalent statements:
1. for (i <- 1 to 100; if (i % 3 == 0)) yield Math.pow(i, 2)
2. (1 to 100).filter(_ % 3 == 0).map(Math.pow(_, 2))
In my opinion, the addition of the semicolon in #1 distracts from the sense that this is a single chained statement. There is also a sense that i is a var (is it 1, or is it 99, or something in between?), which detracts from an otherwise functional style.
Option 2 is more evidently a chain of method calls on objects. Each link in the chain clearly states its responsibility. There are no intermediate variables.
Perhaps for comprehensions are included as a convenience for developers transitioning from Java. Regardless, which is chosen is a matter of style and preference.

Related

for vs map in functional programming

I am learning functional programming using Scala. In general I notice that for loops are not much used in functional programs; instead, map is used.
Questions
What are the advantages of using map over a for loop in terms of performance, readability, etc.?
What is the intention of bringing in a map function when the same thing can be achieved using a loop?
Program 1: Using For loop
val num = 1 to 1000
val another = 1000 to 2000
for (i <- num) {
  for (j <- another) {
    println(i, j)
  }
}
Program 2: Using map
val num = 1 to 1000
val another = 1000 to 2000
val mapper = num.map(x => another.map(y => (x, y))).flatten
mapper.map(x => println(x))
Both program 1 and program 2 do the same thing.
The answer is quite simple actually.
Whenever you loop over a collection, the loop has a semantic purpose. Either you want to iterate over the items of the collection and, say, print them. Or you want to transform the elements to another type (map). Or you want to change the cardinality, such as computing the sum of the elements of a collection (fold).
Of course, all of that can also be done using for loops, but for the reader of the code it is more work to figure out which semantic purpose the loop has, compared to a well-known named operation such as map, foreach, fold, filter, ...
Another aspect is that for loops lead to the dark side of using mutable state. How would you sum the elements of a collection in a for loop without mutable state? You would not. Instead you would need to write a recursive function. So, for good measure, it is best to drop the habit of thinking in for loops early and enjoy the brave new functional way of doing things.
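For instance (a minimal sketch of both styles, using nothing beyond the standard library), summing a list either needs a mutable accumulator inside the loop or a fold that manages the accumulator for you:

val xs = List(1, 2, 3, 4, 5)

// Imperative: the loop only "works" because sum is a mutable var
var sum = 0
for (x <- xs) sum += x

// Functional: foldLeft names the same operation; no mutable state in sight
val total = xs.foldLeft(0)(_ + _)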
I'll start by quoting Programming in Scala.
"Every for expression can be expressed in terms of the three higher-order functions map, flatMap and filter. This section describes the translation scheme, which is also used by the Scala compiler."
http://www.artima.com/pins1ed/for-expressions-revisited.html#23.4
So the reason that you noticed for-loops are not used as much is because they technically aren't needed, and any for expressions you do see are just syntactic sugar which the compiler will translate into some equivalent. The rules for translating a for expression into a map/flatMap/filter expression are listed in the link above.
Generally speaking, in functional programming there is no index variable to mutate. This means one typically makes heavy use of function calls (often in the form of recursion) such as list folds in place of a while or for loop.
For a good example of using list folds in place of while/for loops, I recommend "Explain List Folds to Yourself" by Tony Morris.
https://vimeo.com/64673035
If a function is tail-recursive (which can be verified with the @tailrec annotation) then it can be optimized so as to not incur the high use of the stack which is common in recursive functions. In this case the compiler can translate the tail-recursive function to the "while loop equivalent".
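As a small sketch (a hypothetical factorial, written with an accumulator so the recursive call is in tail position):

import scala.annotation.tailrec

// @tailrec makes the compiler verify the call really is in tail position;
// the emitted bytecode is then equivalent to a while loop, so the stack stays flat
@tailrec
def factorial(n: Long, acc: Long = 1): Long =
  if (n <= 1) acc
  else factorial(n - 1, acc * n)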
To answer the second part of Question 1, there are some cases where one could make an argument that a for expression is clearer (although certainly there are cases where the opposite is true too.) One such example is given in the Coursera.org course "Functional Programming with Scala" by Dr. Martin Odersky:
for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)
is arguably more clear than
(1 until n).flatMap(i =>
  (1 until i).withFilter(j => isPrime(i + j))
             .map(j => (i, j)))
For more information check out Dr. Martin Odersky's "Functional Programming with Scala" course on Coursera.org. Lecture 6.5 "Translation of For" in particular discusses this in more detail.
Also, as a quick side note, in your example you use
mapper.map(x => println(x))
It is generally more accepted to use foreach in this case because you have the intent of side-effecting. Also, there is a shorthand:
mapper.foreach(println)
As for Question 2, it is better to use the map function in place of loops (especially when there is mutation in the loop) because map is a function and it can be composed. Also, once one is acquainted and used to using map, it is very easy to reason about.
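A small sketch of what "composed" means in practice (hypothetical double/addOne functions): mapping twice gives the same result as mapping once with the composed function.

val double = (x: Int) => x * 2
val addOne = (x: Int) => x + 1

val xs = List(1, 2, 3)
val twoPasses = xs.map(double).map(addOne)     // List(3, 5, 7)
val onePass   = xs.map(double andThen addOne)  // List(3, 5, 7) -- same result, one traversal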
The two programs that you have provided are not the same, even if the output might suggest that they are. It is true that for comprehensions are de-sugared by the compiler, but the first program you have is actually equivalent to:
val num = 1 to 1000
val another = 1000 to 2000
num.foreach(i => another.foreach(j => println(i,j)))
It should be noted that the resultant type for the above (and your example program) is Unit
In the case of your second program, the resultant type of the program is, as determined by the compiler, Seq[Unit] - which is now a Seq that has the length of the product of the loop members. As a result, you should always use foreach to indicate an effect that results in a Unit result.
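A quick way to see that difference (a small sketch; the concrete collection type the compiler picks may vary) is to ascribe the result types explicitly:

val num = 1 to 3
val another = 10 to 12

// foreach is run purely for its effect: the whole expression is Unit
val effectOnly: Unit = num.foreach(i => another.foreach(j => println((i, j))))

// map/flatMap build a collection of results; println returns Unit, so we get a Seq[Unit]
val collected: Seq[Unit] = num.flatMap(i => another.map(j => (i, j))).map(p => println(p))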
Think about what is happening at the machine-language level. Loops are still fundamental. Functional programming abstracts the loop that is implemented in conventional programming.
Essentially, instead of writing a loop as you would in conventional or imperative programming, the use of chaining or pipelining in functional programming allows the compiler to optimize the code for the user, and map simply applies the function to each element as the list or collection is iterated through. Functional programming is more convenient and abstracts away the mundane implementation of for loops and the like. There are limitations to this convenience, particularly if you intend to use functional programming for parallel processing.
It is arguable, depending on the software engineer or developer, whether the compiler will be more efficient and aware ahead of time of the situation it is used in. IMHO, mid-level software engineers who are familiar with functional programming, well versed in conventional programming, and knowledgeable about parallel processing will use both conventional and functional styles.

Is recursion in scala very necessary?

In the coursera scala tutorial, most examples are using top-down iterations. Partially, as I can see, iterations are used to avoid for/while loops. I'm from C++ and feel a little confused about this.
Is iteration chosen over for/while loops? Is it practical in production? Any risk of stack overflow? How about efficiency? How about bottom-up dynamic programming (especially when the solutions are not tail-recursive)?
Also, should I use less "if" conditions, instead use more "case" and subclasses?
Truly high-quality Scala will use very little iteration and only slightly more recursion. What would be done with looping in lower-level imperative languages is usually best done with higher-order combinators, map and flatMap most especially, but also filter, zip, fold, foreach, reduce, collect, partition, scan, groupBy, and a good few others. Iteration is best done only in performance-critical sections, and recursion only in some deep edge cases where the higher-order combinators don't quite fit (which usually aren't tail recursive, fwiw). In three years of coding Scala in production systems, I used iteration once, recursion twice, and map about five times per day.
Hmm, several questions in one.
Necessity of Recursion
Recursion is not necessary, but it can sometimes provide a very elegant solution.
If the solution is tail recursive and the compiler supports tail call optimisation, then the solution can even be efficient.
As has been well said already, Scala has many combinator functions which can be used to perform the same tasks more expressively and efficiently.
One classic example is writing a function to return the nth Fibonacci number. Here's a naive recursive implementation:
def fib(n: Long): Long = n match {
  case 0 | 1 => n
  case _ => fib(n - 2) + fib(n - 1)
}
Now, this is inefficient (definitely not tail recursive) but it is very obvious how its structure relates to the Fibonacci sequence. We can make it properly tail recursive, though:
def fib(n: Long): Long = {
  def fibloop(current: Long, next: => Long, iteration: Long): Long = {
    if (n == iteration)
      current
    else
      fibloop(next, current + next, iteration + 1)
  }
  fibloop(0, 1, 0)
}
That could have been written more tersely, but it is an efficient recursive implementation. That said, it is not as pretty as the first and its structure is less clearly related to the original problem.
Finally, stolen shamelessly from elsewhere on this site is Luigi Plinge's streams-based implementation:
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Very terse, efficient, elegant and (if you understand streams and lazy evaluation) very expressive. It is also, in fact, recursive; #:: is a recursive function, but one that operates in a lazily-evaluated context. You certainly have to be able to think recursively to come up with this kind of solution.
Iteration compared to For/While loops
I'm assuming you mean the traditional C-Style for, here.
Recursive solutions can often be preferable to while loops because C/C++/Java-style while loops do not return a value and require side effects to achieve anything (this is also true for C-style for and Java-style foreach). Frankly, I often wish Scala had never implemented while (or had implemented it as syntactic sugar for something like Scheme's named let), because it allows classically-trained Java developers to keep doing things the way they always did. There are situations where a loop with side effects, which is what while gives you, is a more expressive way of achieving something, but I would rather Java-fixated devs were forced to reach a little harder for it (e.g. by abusing a for comprehension).
Simply, traditional while and for make clunky imperative coding much too easy. If you don't care about that, why are you using Scala?
Efficiency and risk of Stackoverflow
Tail call optimisation eliminates the risk of stack overflow. Rewriting recursive solutions to be properly tail recursive can make them very ugly (particularly in any language running on the JVM).
Recursive solutions can be more efficient than more imperative solutions, sometimes surprisingly so. One reason is that they often operate on lists, in a way that only involves head and tail access. Head and tail operations on lists are actually faster than random access operations on more structured collections.
Dynamic Programming
A good recursive algorithm typically reduces a complex problem to a small set of simpler problems, picks one to solve and delegates the rest to another function (usually a recursive call to itself). Now, to me this sounds like a great fit for dynamic programming. Certainly, if I am trying a recursive approach to a problem, I often start with a naive solution which I know can't solve every case, see where it fails, add that pattern to the solution and iterate towards success.
The Little Schemer has many examples of this iterative approach to recursive programming, particularly because it re-uses earlier solutions as sub-components for later, more complex ones. I would say it is the epitome of the Dynamic Programming approach. (It is also one of the best-written educational books about software ever produced.) I can recommend it, not least because it teaches you Scheme at the same time. If you really don't want to learn Scheme (why? why would you not?), it has been adapted for a few other languages.
If versus Match
if expressions, in Scala, return values (which is very useful and why Scala has no need for a ternary operator). There is no reason to avoid simple
if (something)
// do something
else
// do something else
expressions. The principal reason to match instead of a simple if...else is to use the power of case statements to extract information from complex objects. Here is one example.
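As a rough illustration (a hypothetical Shape hierarchy, not that example), a match can test the shape of a value and extract its fields in one step:

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(width: Double, height: Double) extends Shape

def area(s: Shape): Double = s match {
  case Circle(r)  => math.Pi * r * r  // r is extracted straight out of the object
  case Rect(w, h) => w * h            // w and h are bound by the pattern itself
}
// Because Shape is sealed, the compiler will also warn if a case is missing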
On the other hand, if...else if...else if...else is a terrible pattern:
There's no easy way to see if you covered all the possibilities properly, even with a final else in place.
Unintentionally nested if expressions are hard to spot
It is too easy to link unrelated conditions together (accidentally or through bone-headed design)
Wherever you find you have written else if, look for an alternative. match is a good place to start.
I'm assuming that, since you say "recursion" in your title, you also mean "recursion" in your question, and not "iteration" (which cannot be chosen "over for/while loops", because those are iterative :D).
You might be interested in reading Effective Scala, especially the section on control structures, which should mostly answer your question. In short:
Recursion isn't "better" than iteration. Often it is easier to write a recursive algorithm for a given problem than it is to write an iterative algorithm (of course there are cases where the opposite applies). When "tail call optimization" can be applied to a problem, the compiler actually converts it to an iterative algorithm, thus making it impossible for a StackOverflow to happen, and without performance impact. You can read about tail call optimization in Effective Scala, too.
The main problem with your question is that it is very broad. There are many many resources available on functional programming, idiomatic scala, dynamic programming and so on, and no answer here on Stack Overflow would be able to cover all those topics. It'd be probably a good idea to just roam the interwebs for a while, and then come back with more concrete questions :)
One of the main benefits of recursion is that it lets you create solutions without mutation. In the following example, you have to calculate the sum of all the elements of a List.
One of the many ways to solve this problem is shown below. The imperative solution to this problem uses a for loop:
scala> var total = 0
scala> for(f <- List(1,2,3)) { total += f }
And a recursive solution would look like the following:
def total(xs: List[Int]): Int = xs match {
  case Nil => 0
  case x :: ys => x + total(ys)
}
The difference is that a recursive solution doesn't use any mutable temporary variables; it lets you break the problem into smaller pieces. Because functional programming is all about writing side-effect-free programs, it is always encouraged to use recursion over loops (which rely on mutating variables).
Head recursion is a traditional way of doing recursion, where you perform the recursive call first and then take the return value from the recursive function and calculate the result.
Generally, when you call a function, an entry is added to the call stack of the currently running thread. The downside is that the call stack has a fixed size, so you may quickly get a StackOverflowError. This is why Java prefers to iterate rather than recurse. Because Scala runs on the JVM, Scala also suffers from this problem. But starting with Scala 2.8.1, Scala gets around this limitation by doing tail call optimization, so you can do tail recursion in Scala.
To recap: recursion is the preferred way in functional programming to avoid mutation, and tail recursion is supported in Scala, so you don't run into the StackOverflowError you would get in Java.
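For instance (a sketch, not the only way to write it), the head-recursive total above can be rewritten with an accumulator so that the recursive call is the last thing the function does, and @tailrec asks the compiler to verify that:

import scala.annotation.tailrec

def total(xs: List[Int]): Int = {
  @tailrec
  def loop(rest: List[Int], acc: Int): Int = rest match {
    case Nil     => acc               // nothing left: the accumulator is the answer
    case y :: ys => loop(ys, acc + y) // tail call: compiled to a loop, no stack growth
  }
  loop(xs, 0)
}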
Hope this helps.
As for stack overflow, a lot of the time you can get away with it because of tail call elimination.
The reason Scala and other functional paradigms avoid for/while loops is that they are highly dependent on state and time. That makes it much harder to reason about complex "loops" in a formal and precise manner.

The case for point free style in Scala

This may seem really obvious to the FP cognoscenti here, but what is point free style in Scala good for? What would really sell me on the topic is an illustration that shows how point free style is significantly better in some dimension (e.g. performance, elegance, extensibility, maintainability) than code solving the same problem in non-point free style.
Quite simply, it's about being able to avoid specifying a name where none is needed, consider a trivial example:
List("a","b","c") foreach println
In this case, foreach is looking to accept String => Unit, a function that accepts a String and returns Unit (essentially, that there's no usable return and it works purely through side effect)
There's no need to bind a name here to each String instance that's passed to println. Arguably, it just makes the code more verbose to do so:
List("a","b","c") foreach {println(_)}
Or even
List("a","b","c") foreach {s => println(s)}
Personally, when I see code that isn't written in point-free style, I take it as an indicator that the bound name may be used twice, or that it has some significance in documenting the code. Likewise, I see point-free style as a sign that I can reason about the code more simply.
One appeal of point-free style in general is that without a bunch of "points" (values as opposed to functions) floating around, which must be repeated in several places to thread them through the computation, there are fewer opportunities to make a mistake, e.g. when typing a variable's name.
However, the advantages of point-free are quickly counterbalanced in Scala by its meagre ability to infer types, a fact which is exacerbated by point-free code because "points" serve as clues to the type inferencer. In Haskell, with its almost-complete type inferencing, this is usually not an issue.
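A small illustration of that limitation (nothing here beyond the standard library): once a point-free expression is pulled out of a context that fixes its type, Scala needs a type annotation that Haskell would happily infer.

// Inside the call, the element type is known, so the underscore is fine:
List(1, 2, 3).map(_ * 2)        // List(2, 4, 6)

// Standing alone, the same expression no longer compiles:
// val double = _ * 2           // error: missing parameter type for expanded function

// You have to reintroduce the "point" (or at least its type) yourself:
val double = (x: Int) => x * 2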
I see no advantage other than "elegance": it's a little bit shorter, and may be more readable. It allows you to reason about functions as entities, without mentally going a "level deeper" to function application, but of course you need to get used to it first.
I don't know any example where performance improves by using it (maybe it gets worse in cases where you end up with a function when a method would be sufficient).
Scala's point-free syntax is part of the magic Scala operators-which-are-really-functions. Even the most basic operators are functions:
For example:
val x = 1
val y = x + 1
...is the same as...
val x = 1
val y = x.+(1)
...but of course, the point-free style reads more naturally (the plus appears to be an operator).

why use foldLeft instead of procedural version?

So in reading this question it was pointed out that instead of the procedural code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
  var result = exp
  for ((oldS, newS) <- replacements)
    result = result.replace(oldS, newS)
  result
}
You could write the following functional code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
  replacements.foldLeft(exp) {
    case (result, (oldS, newS)) => result.replace(oldS, newS)
  }
}
I would almost certainly write the first version because coders familiar with either procedural or functional styles can easily read and understand it, while only coders familiar with functional style can easily read and understand the second version.
But setting readability aside for the moment, is there something that makes foldLeft a better choice than the procedural version? I might have thought it would be more efficient, but it turns out that the implementation of foldLeft is actually just the procedural code above. So is it just a style choice, or is there a good reason to use one version or the other?
Edit: Just to be clear, I'm not asking about other functions, just foldLeft. I'm perfectly happy with the use of foreach, map, filter, etc. which all map nicely onto for-comprehensions.
Answer: There are really two good answers here (provided by delnan and Dave Griffith) even though I could only accept one:
Use foldLeft because there are additional optimizations, e.g. using a while loop which will be faster than a for loop.
Use fold if it ever gets added to regular collections, because that will make the transition to parallel collections trivial.
It's shorter and clearer - yes, you need to know what a fold is to understand it, but when you're programming in a language that's 50% functional, you should know these basic building blocks anyway. A fold is exactly what the procedural code does (repeatedly applying an operation), but it's given a name and generalized. And while it's only a small wheel you're reinventing, it's still wheel reinvention.
And in case the implementation of foldLeft should ever get some special perk - say, extra optimizations - you get that for free, without updating countless methods.
Other than a distaste for mutable variables (even mutable locals), the basic reason to use fold in this case is clarity, with occasional brevity. Most of the wordiness of the fold version is because you have to use an explicit function definition with a destructuring bind. If each element in the list is used precisely once in the fold operation (a common case), this can be simplified to use the short form. Thus the classic definition of the sum of a collection of numbers
collection.foldLeft(0)(_+_)
is much simpler and shorter than any equivalent imperative construct.
One additional meta-reason to use functional collection operations, although not directly applicable in this case, is to enable a move to using parallel collection operations if needed for performance. Fold can't be parallelized, but often fold operations can be turned into commutative-associative reduce operations, and those can be parallelized. With Scala 2.9, changing something from non-parallel functional to parallel functional utilizing multiple processing cores can sometimes be as easy as dropping a .par onto the collection you want to execute parallel operations on.
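A sketch of what that looks like (with the caveat that from Scala 2.13 onwards the parallel collections live in the separate scala-parallel-collections module rather than the standard library):

val xs = (1 to 1000000).toVector

// Sequential pipeline
val sequentialSum = xs.map(_ * 2L).reduce(_ + _)

// Same pipeline, parallel: .par switches to a parallel collection, and the
// commutative-associative reduce can then be split across cores
val parallelSum = xs.par.map(_ * 2L).reduce(_ + _)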
One word I haven't seen mentioned here yet is declarative:
Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:
A program that describes what computation should be performed and not how to compute it
Any programming language that lacks side effects (or more specifically, is referentially transparent)
A language with a clear correspondence to mathematical logic.
These definitions overlap substantially.
Higher-order functions (HOFs) are a key enabler of declarativity, since we only specify the what (e.g. "using this collection of values, multiply each value by 2, sum the result") and not the how (e.g. initialize an accumulator, iterate with a for loop, extract values from the collection, add to the accumulator...).
Compare the following:
// Sugar-free Scala (Still better than Java<5)
def sumDoubled1(xs: List[Int]) = {
  var sum = 0                   // Initialized correctly?
  for (i <- 0 until xs.size) {  // Fenceposts?
    sum = sum + (xs(i) * 2)     // Correct value being extracted?
                                // Value extraction and +/* smashed together
  }
  sum                           // Correct value returned?
}

// Iteration sugar (similar to Java 5)
def sumDoubled2(xs: List[Int]) = {
  var sum = 0
  for (x <- xs)          // We don't need to worry about fenceposts or
    sum = sum + (x * 2)  // value extraction anymore; that's progress
  sum
}

// Verbose Scala
def sumDoubled3(xs: List[Int]) = xs.map((x: Int) => x * 2).               // the doubling
                                    reduceLeft((x: Int, y: Int) => x + y) // the addition

// Idiomatic Scala
def sumDoubled4(xs: List[Int]) = xs.map(_*2).reduceLeft(_+_)
//                                      ^ the doubling  ^ the addition
Note that our first example, sumDoubled1, is already more declarative than (most would say superior to) C/C++/Java<5 for loops, because we haven't had to micromanage the iteration state and termination logic, but we're still vulnerable to off-by-one errors.
Next, in sumDoubled2, we're basically at the level of Java>=5. There are still a couple things that can go wrong, but we're getting pretty good at reading this code-shape, so errors are quite unlikely. However, don't forget that a pattern that's trivial in a toy example isn't always so readable when scaled up to production code!
With sumDoubled3, desugared for didactic purposes, and sumDoubled4, the idiomatic Scala version, the iteration, initialization, value extraction and choice of return value are all gone.
Sure, it takes time to learn to read the functional versions, but we've drastically foreclosed our options for making mistakes. The "business logic" is clearly marked, and the plumbing is chosen from the same menu that everyone else is reading from.
It is worth pointing out that there is another way of calling foldLeft which takes advantage of:
The ability to use (almost) any Unicode symbol in an identifier
The feature that if a method name ends with a colon :, and is called infix, then the target and parameter are switched
For me this version is much clearer, because I can see that I am folding the expr value into the replacements collection:
def expand(expr: String, replacements: Traversable[(String, String)]): String = {
  (expr /: replacements) { case (r, (o, n)) => r.replace(o, n) }
}

Why should I avoid using local modifiable variables in Scala?

I'm pretty new to Scala, and before that I mostly used Java. Right now I have warnings all over my code saying that I should "Avoid mutable local variables" and I have a simple question - why?
Suppose I have a small problem: determine the max int out of four. My first approach was:
def max4(a: Int, b: Int, c: Int, d: Int): Int = {
  var subMax1 = a
  if (b > a) subMax1 = b
  var subMax2 = c
  if (d > c) subMax2 = d
  if (subMax1 > subMax2) subMax1
  else subMax2
}
After taking into account this warning message I found another solution:
def max4(a: Int, b: Int, c: Int, d: Int): Int = {
  max(max(a, b), max(c, d))
}

def max(a: Int, b: Int): Int = {
  if (a > b) a
  else b
}
It looks prettier, but what is the ideology behind this?
Whenever I approach a problem I think about it like: "OK, we start from this and then we incrementally change things and get the answer." I understand that the problem is that I try to change some initial state to get an answer, and I do not understand why changing things, at least locally, is bad. How do you iterate over a collection then in functional languages like Scala?
Like an example: Suppose we have a list of ints, how to write a function that returns the sublist of ints which are divisible by 6? Can't think of a solution without a local mutable variable.
In your particular case there is another solution:
def max4(a: Int, b: Int, c: Int, d: Int): Int = {
  val submax1 = if (a > b) a else b
  val submax2 = if (c > d) c else d
  if (submax1 > submax2) submax1 else submax2
}
Isn't it easier to follow? Of course I am a bit biased but I tend to think it is, BUT don't follow that rule blindly. If you see that some code might be written more readably and concisely in a mutable style, do it this way -- the great strength of Scala is that you don't need to commit to either the immutable or the mutable approach; you can swing between them (btw the same applies to return keyword usage).
Like an example: Suppose we have a list of ints, how to write a
function that returns the sublist of ints which are divisible by 6?
Can't think of a solution without a local mutable variable.
It is certainly possible to write such a function using recursion, but, again, if a mutable solution looks and works well, why not?
It's not so much related to Scala as to functional programming methodology in general. The idea is the following: if you have constant variables (final in Java), you can use them without any fear that they are going to change. In the same way, you can parallelize your code without worrying about race conditions or thread-unsafe code.
In your example it is not so important; however, imagine the following:
val variable = ...
Future { function1(variable) }
Future { function2(variable) }
Using final variables you can be sure that there will not be any problem. Otherwise, you would have to check the main thread and both function1 and function2.
Of course, it's possible to obtain the same result with mutable variables if you never change them. But using immutable ones you can be sure that this will be the case.
Edit to answer your edit:
Local mutables are not bad; that's the reason you can use them. However, if you try to think of approaches without them, you can arrive at solutions like the one you posted, which is cleaner and can be parallelized very easily.
How do you iterate over a collection then in functional languages like Scala?
You can always iterate over an immutable collection, as long as you do not change anything. For example:
val list = Seq(1, 2, 3)
for (n <- list)
  println(n)
With respect to the second thing that you said: you have to stop thinking in a traditional way. In functional programming the usage of map, filter, reduce, etc. is normal, as well as pattern matching and other concepts that are not typical in OOP. For the example you give:
Like an example: Suppose we have a list of ints, how to write a function that returns the sublist of ints which are divisible by 6?
val list = Seq(1,6,10,12,18,20)
val result = list.filter(_ % 6 == 0)
Firstly you could rewrite your example like this:
def max(first: Int, others: Int*): Int = {
  val curMax = Math.max(first, others(0))
  if (others.size == 1) curMax else max(curMax, others.tail: _*)
}
This uses varargs and tail recursion to find the largest number. Of course there are many other ways of doing the same thing.
To answer your question: it's a good question and one that I thought about myself when I first started to use Scala. Personally I think the whole immutable/functional programming approach is somewhat over-hyped. But for what it's worth here are the main arguments in favour of it:
Immutable code is easier to read (subjective)
Immutable code is more robust - it's certainly true that changing mutable state can lead to bugs. Take this for example:
for (int i = 0; i < 100; i++) {
  for (int j = 0; j < 100; i++) {
    System.out.println("i is " + i + " and j is " + j);
  }
}
This is an oversimplified example, but it's still easy to miss the bug and the compiler won't help you.
Mutable code is generally not thread safe. Even trivial and seemingly atomic operations are not safe. Take i++ for example: this looks like an atomic operation, but it's actually equivalent to:
int i = 0;
int tempI = i + 1;
i = tempI;
Immutable data structures won't allow you to do something like this, so you would need to explicitly think about how to handle it. Of course, as you point out, local variables are generally thread-safe, but there is no guarantee. It's possible to pass a ListBuffer instance variable as a parameter to a method, for example.
However there are downsides to immutable and functional programming styles:
Performance. It is generally slower in both compilation and runtime. The compiler must enforce the immutability and the JVM must allocate more objects than would be required with mutable data structures. This is especially true of collections.
Most Scala examples show something like val numbers = List(1,2,3), but in the real world hard-coded values are rare. We generally build collections dynamically (from a database query, etc.). Whilst Scala lets you build up an immutable collection, it must create a new collection object every time you modify it. If you want to add 1000 elements to a Scala List (immutable), the JVM will need to allocate (and then GC) 1000 objects.
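As a sketch of the two ways to build a collection dynamically (both are idiomatic Scala; the second trades pure immutability for fewer intermediate allocations by keeping the mutation local):

import scala.collection.mutable.ListBuffer

// Immutable style: every prepend allocates a new cons cell, and the final
// reverse walks and re-allocates the whole list once more
val prepended = (1 to 1000).foldLeft(List.empty[Int])((acc, i) => i :: acc).reverse

// Mutable-but-local style: a ListBuffer is filled in place and converted once at the end
val buffered = {
  val buf = ListBuffer.empty[Int]
  for (i <- 1 to 1000) buf += i
  buf.toList
}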
Hard to maintain. Functional code can be very hard to read; it's not uncommon to see code like this:
val data = numbers.foreach(_.map(a => doStuff(a).flatMap(somethingElse)).foldleft("", (a : Int,b: Int) => a + b))
I don't know about you but I find this sort of code really hard to follow!
Hard to debug. Functional code can also be hard to debug. Try putting a breakpoint halfway into my (terrible) example above
My advice would be to use a functional/immutable style where it genuinely makes sense and you and your colleagues feel comfortable doing it. Don't use immutable structures because they're cool or it's "clever". Complex and challenging solutions will get you bonus points at Uni but in the commercial world we want simple solutions to complex problems! :)
Your two main questions:
Why warn against local state changes?
How can you iterate over collections without mutable state?
I'll answer both.
Warnings
The compiler warns against the use of mutable local variables because they are often a cause of error. That doesn't mean this is always the case. However, your sample code is pretty much a classic example of where mutable local state is used entirely unnecessarily, in a way that not only makes it more error prone and less clear but also less efficient.
Your first code example is more inefficient than your second, functional solution. Why potentially make two assignments to submax1 when you only ever need to assign one? You ask which of the two inputs is larger anyway, so why not ask that first and then make one assignment? Why was your first approach to temporarily store partial state only halfway through the process of asking such a simple question?
Your first code example is also inefficient because of unnecessary code duplication. You're repeatedly asking "which is the biggest of two values?" Why write out the code for that 3 times independently? Needlessly repeating code is a known bad habit in OOP every bit as much as FP and for precisely the same reasons. Each time you needlessly repeat code, you open a potential source of error. Adding mutable local state (especially when so unnecessary) only adds to the fragility and to the potential for hard to spot errors, even in short code. You just have to type submax1 instead of submax2 in one place and you may not notice the error for a while.
Your second, FP solution removes the code duplication, dramatically reducing the chance of error, and shows that there was simply no need for mutable local state. It's also, as you yourself say, cleaner and clearer - and better than the alternative solution in om-nom-nom's answer.
(By the way, the idiomatic Scala way to write such a simple function is
def max(a: Int, b: Int) = if (a > b) a else b
whose terser style emphasises its simplicity and makes the code less verbose.)
Your first solution was inefficient and fragile, but it was your first instinct. The warning caused you to find a better solution. The warning proved its value. Scala was designed to be accessible to Java developers and is taken up by many with a long experience of imperative style and little or no knowledge of FP. Their first instinct is almost always the same as yours. You have demonstrated how that warning can help improve code.
There are cases where using mutable local state can be faster but the advice of Scala experts in general (not just the pure FP true believers) is to prefer immutability and to reach for mutability only where there is a clear case for its use. This is so against the instincts of many developers that the warning is useful even if annoying to experienced Scala devs.
It's funny how often some kind of max function comes up in "new to FP/Scala" questions. The questioner is very often tripping up on errors caused by their use of local state, which both demonstrates the often obtuse addiction to mutable state among some devs and leads me on to your other question.
Functional Iteration over Collections
There are three functional ways to iterate over collections in Scala
For Comprehensions
Explicit Recursion
Folds and other Higher Order Functions
For Comprehensions
Your question:
Suppose we have a list of ints, how to write a function that returns the sublist of ints which are divisible by 6? Can't think of a solution without a local mutable variable
Answer: assuming xs is a list (or some other sequence) of integers, then
for (x <- xs; if x % 6 == 0) yield x
will give you a sequence (of the same type as xs) containing only those items which are divisible by 6, if any. No mutable state required. Scala just iterates over the sequence for you and returns anything matching your criteria.
If you haven't yet learned the power of for comprehensions (also known as sequence comprehensions) you really should. It's a very expressive and powerful part of Scala syntax. You can even use them with side effects and mutable state if you want (look at the final example in the tutorial I just linked to). That said, there can be unexpected performance penalties and they are overused by some developers.
Explicit Recursion
In the question I linked to at the end of the first section, I give in my answer a very simple, explicitly recursive solution to returning the largest Int from a list.
def max(xs: List[Int]): Option[Int] = xs match {
  case Nil => None
  case List(x: Int) => Some(x)
  case x :: y :: rest => max((if (x > y) x else y) :: rest)
}
I'm not going to explain how the pattern matching and explicit recursion work (read my other answer or this one). I'm just showing you the technique. Most Scala collections can be iterated over recursively, without any need for mutable state. If you need to keep track of what you've been up to along the way, you pass along an accumulator. (In my example code, I stick the accumulator at the front of the list to keep the code smaller but look at the other answers to those questions for more conventional use of accumulators).
But here is a (naive) explicitly recursive way of finding those integers divisible by 6
def divisibleByN(n: Int, xs: List[Int]): List[Int] = xs match {
  case Nil => Nil
  case x :: rest if x % n == 0 => x :: divisibleByN(n, rest)
  case _ :: rest => divisibleByN(n, rest)
}
I call it naive because it isn't tail recursive and so could blow your stack. A safer version can be written using an accumulator list and an inner helper function but I leave that exercise to you. The result will be less pretty code than the naive version, no matter how you try, but the effort is educational.
Recursion is a very important technique to learn. That said, once you have learned to do it, the next important thing to learn is that you can usually avoid using it explicitly yourself...
Folds and other Higher Order Functions
Did you notice how similar my two explicit recursion examples are? That's because most recursions over a list have the same basic structure. If you write a lot of such functions, you'll repeat that structure many times. Which makes it boilerplate; a waste of your time and a potential source of error.
Now, there are any number of sophisticated ways to explain folds but one simple concept is that they take the boilerplate out of recursion. They take care of the recursion and the management of accumulator values for you. All they ask is that you provide a seed value for the accumulator and the function to apply at each iteration.
For example, here is one way to use fold to extract the highest Int from the list xs
xs.tail.foldRight(xs.head) {(a, b) => if (a > b) a else b}
I know you aren't familiar with folds, so this may seem gibberish to you but surely you recognise the lambda (anonymous function) I'm passing in on the right. What I'm doing there is taking the first item in the list (xs.head) and using it as the seed value for the accumulator. Then I'm telling the rest of the list (xs.tail) to iterate over itself, comparing each item in turn to the accumulator value.
This kind of thing is a common case, so the Collections api designers have provided a shorthand version:
xs.reduce {(a, b) => if (a > b) a else b}
(If you look at the source code, you'll see they have implemented it using a fold).
Anything you might want to do iteratively to a Scala collection can be done using a fold. Often, the api designers will have provided a simpler higher-order function which is implemented, under the hood, using a fold. Want to find those divisible-by-six Ints again?
xs.foldRight(Nil: List[Int]) {(x, acc) => if (x % 6 == 0) x :: acc else acc}
That starts with an empty list as the accumulator, iterates over every item, only adding those divisible by 6 to the accumulator. Again, a simpler fold-based HoF has been provided for you:
xs filter { _ % 6 == 0 }
Folds and related higher-order functions are harder to understand than for comprehensions or explicit recursion, but very powerful and expressive (to anybody else who understands them). They eliminate boilerplate, removing a potential source of error. Because they are implemented by the core language developers, they can be more efficient (and that implementation can change, as the language progresses, without breaking your code). Experienced Scala developers use them in preference to for comprehensions or explicit recursion.
tl;dr
Learn For comprehensions
Learn explicit recursion
Don't use them if a higher-order function will do the job.
It is always nicer to use immutable variables since they make your code easier to read. Writing recursive code can help solve your problem.
def max(x: List[Int]): Int = {
  if (x.isEmpty) {
    0
  }
  else {
    Math.max(x.head, max(x.tail))
  }
}
val a_list = List(a, b, c, d)
val max_value = max(a_list)