Use-cases for Streams in Scala - scala

In Scala there is a Stream class that is very much like an iterator. The topic Difference between Iterator and Stream in Scala? offers some insights into the similarities and differences between the two.
Seeing how to use a stream is pretty simple but I don't have very many common use-cases where I would use a stream instead of other artifacts.
The ideas I have right now:
If you need to make use of an infinite series. But this does not seem like a common use-case to me so it doesn't fit my criteria. (Please correct me if it is common and I just have a blind spot)
If you have a series of data where each element needs to be computed but that you may want to reuse several times. This is weak because I could just load it into a list which is conceptually easier to follow for a large subset of the developer population.
Perhaps there is a large set of data or a computationally expensive series and there is a high probability that the items you need will not require visiting all of the elements. But in this case an Iterator would be a good match unless you need to do several searches, in that case you could use a list as well even if it would be slightly less efficient.
There is a complex series of data that needs to be reused. Again a list could be used here. Although in this case both cases would be equally difficult to use and a Stream would be a better fit since not all elements need to be loaded. But again not that common... or is it?
So have I missed any big uses? Or is it a developer preference for the most part?
Thanks

The main difference between a Stream and an Iterator is that the latter is mutable and "one-shot", so to speak, while the former is not. Iterator has a better memory footprint than Stream, but the fact that it is mutable can be inconvenient.
Take this classic prime number generator, for instance:
def primeStream(s: Stream[Int]): Stream[Int] =
Stream.cons(s.head, primeStream(s.tail filter { _ % s.head != 0 }))
val primes = primeStream(Stream.from(2))
It can be easily be written with an Iterator as well, but an Iterator won't keep the primes computed so far.
So, one important aspect of a Stream is that you can pass it to other functions without having it duplicated first, or having to generate it again and again.
As for expensive computations/infinite lists, these things can be done with Iterator as well. Infinite lists are actually quite useful -- you just don't know it because you didn't have it, so you have seen algorithms that are more complex than strictly necessary just to deal with enforced finite sizes.

In addition to Daniel's answer, keep in mind that Stream is useful for short-circuiting evaluations. For example, suppose I have a huge set of functions that take String and return Option[String], and I want to keep executing them until one of them works:
val stringOps = List(
(s:String) => if (s.length>10) Some(s.length.toString) else None ,
(s:String) => if (s.length==0) Some("empty") else None ,
(s:String) => if (s.indexOf(" ")>=0) Some(s.trim) else None
);
Well, I certainly don't want to execute the entire list, and there isn't any handy method on List that says, "treat these as functions and execute them until one of them returns something other than None". What to do? Perhaps this:
def transform(input: String, ops: List[String=>Option[String]]) = {
ops.toStream.map( _(input) ).find(_ isDefined).getOrElse(None)
}
This takes a list and treats it as a Stream (which doesn't actually evaluate anything), then defines a new Stream that is a result of applying the functions (but that doesn't evaluate anything either yet), then searches for the first one which is defined--and here, magically, it looks back and realizes it has to apply the map, and get the right data from the original list--and then unwraps it from Option[Option[String]] to Option[String] using getOrElse.
Here's an example:
scala> transform("This is a really long string",stringOps)
res0: Option[String] = Some(28)
scala> transform("",stringOps)
res1: Option[String] = Some(empty)
scala> transform(" hi ",stringOps)
res2: Option[String] = Some(hi)
scala> transform("no-match",stringOps)
res3: Option[String] = None
But does it work? If we put a println into our functions so we can tell if they're called, we get
val stringOps = List(
(s:String) => {println("1"); if (s.length>10) Some(s.length.toString) else None },
(s:String) => {println("2"); if (s.length==0) Some("empty") else None },
(s:String) => {println("3"); if (s.indexOf(" ")>=0) Some(s.trim) else None }
);
// (transform is the same)
scala> transform("This is a really long string",stringOps)
1
res0: Option[String] = Some(28)
scala> transform("no-match",stringOps)
1
2
3
res1: Option[String] = None
(This is with Scala 2.8; 2.7's implementation will sometimes overshoot by one, unfortunately. And note that you do accumulate a long list of None as your failures accrue, but presumably this is inexpensive compared to your true computation here.)

I could imagine, that if you poll some device in real time, a Stream is more convenient.
Think of an GPS tracker, which returns the actual position if you ask it. You can't precompute the location where you will be in 5 minutes. You might use it for a few minutes only to actualize a path in OpenStreetMap or you might use it for an expedition over six months in a desert or the rain forest.
Or a digital thermometer or other kinds of sensors which repeatedly return new data, as long as the hardware is alive and turned on - a log file filter could be another example.

Stream is to Iterator as immutable.List is to mutable.List. Favouring immutability prevents a class of bugs, occasionally at the cost of performance.
scalac itself isn't immune to these problems: http://article.gmane.org/gmane.comp.lang.scala.internals/2831
As Daniel points out, favouring laziness over strictness can simplify algorithms and make it easier to compose them.

Related

Scala large list allocation causes heavy garbage collection

EDIT, Summary: So, in the long chain of back and too, I think the "final answer" is a little hard to find. In essence however, Yuval pointed out that the incremental allocation of a large amount of memory forces a heap resize (actually, two by the look of the graph). A heap resize on a normal JVM involves a full GC, the most expensive, timeconsuming, collection possible. So, the reality is that my process isn't collecting lots of garbage per se, rather its doing heap resizes which inherently trigger expensive GC as part of the heap reorganization process. Those of us more familiar with Java than Scala are more likely to have allocated a simple ArrayList, which even if it causes heap resizing, is only a few objects (and likely allocated directly into old-gen if it's a big array) which would be far less work--because it's far fewer objects!--for a full GC anyway. Moral is likely that some other structure would be more appropriate for very large "lists".
I was trying to experiment with some of Scala's data structures (actually, with the parallel stuff, but that's not relevant to the problem I bumped into). I'm trying to create a fairly long list (with the intention of processing it purely sequentially). But try as I might, I'm failing to create a simple list without invoking vast quantities of garbage collection. I'm fairly sure that I'm simply pre-pending the new items to the existing tail, but the GC load suggests that I'm not. I've tried a couple of techniques so far (I'm starting to suspect that I'm misunderstanding something truly fundamental about this structure :( )
Here's the first effort:
val myList = {
#tailrec
def addToList(remaining:Long, acc:List[Double]): List[Double] =
if (remaining > 0) addToList(remaining - 1, 0 :: acc)
else acc
addToList(10000000, Nil)
}
And when I began to doubt I knew how to do recursion after all, I came up with this mutating beast.
val myList = {
var rv: List[Double] = Nil
var count = 10000000
while (count > 0) {
rv = 0.0 :: rv
}
rv
}
They both give the same effect: 8 cores running flat out doing GC (according to jvisualvm) and memory allocation reaching peaks at just over 1GB, which I assume is the real allocated space required for the data, but on the way, it creates a seemingly vast amount of trash on the way.
Am I doing something horribly wrong here? Am I somehow forcing the recreation of the entire list with every new element (I'm trying very hard to only do "prepend" type operations, which I thought should avoid that).
Or maybe, I have half a memory of hearing that Scala List does something odd to help it transform into a mutable list, or a parallel list, or something. Really don't recall what. Is this something to do with that? And if so, what the heck was "that" anyway?
Oh, and here's the image of the GC process. Notice the front-loading on the otherwise triangular rise of the memory that represents the "real" allocated data. That huge hump, and the associated CPU usage are my problem:
EDIT: I should clarify, I'm interested in two things. First, if my creation of the list is intrinsically faulty (i.e. if I'm not in fact only performing prepend operations) then I'd like to understand why, and how I should do this "right". Second, if my construction is sound and the odd behavior is intrinsic in the List, I'd like to understand the List better, so I know what it's doing, and why. I'm not particularly interested (at this point) in alternative ways to build a sequential data structure that sidesteps this problem. I anticipate using List a lot, and would like to know what's happening. (Later, I might well want to investigate other structures in this level of detail, but not right now).
First, if my creation of the list is intrinsically faulty (i.e. if
I'm not in fact only performing prepend operations) then I'd like to
understand why
You are constructing the list properly, there's no problem there.
Second, if my construction is sound and the odd behavior is intrinsic
in the List, I'd like to understand the List better, so I know what
it's doing, and why
List[A] in Scala is based on a linked list implementation, where you have a head of type A, and a tail of type List[A]. List[A] is an abstract class with two implementations, one presenting the empty list called Nil, and one called "Cons", or ::, representing a list which has a head value and a tail, which can be either full or empty:
def ::[B >: A] (x: B): List[B] =
new scala.collection.immutable.::(x, this)
If we look at the implementation for ::, we can see that it is a simple case class with two fields:
final case class ::[B](override val head: B, private[scala] var tl: List[B]) extends List[B] {
override def tail : List[B] = tl
override def isEmpty: Boolean = false
}
A quick look using the memory tab in IntelliJ shows:
That we have ten million Double values, and ten million instances of the :: case class, which in itself has additional overhead for being a case class (the compiler "enhances" these classes with additional structure).
Your JVisualVM instance doesn't show the GC graph being fully utilized, it is rather showing your CPU is overworked from generating the large list of items. During the allocation process, you generate a lot of intermediate lists until you reach your fully generated list, which means data has to be evicted between the different GC levels (Eden, Survivor and Old, assuming you're running the JVM flavor of Scala).
If we want a bit more information, we can use Mission Control to see into what's causing the memory pressure. This is a sample generated from a 30 second profile running:
def main(args: Array[String]): Unit = {
def myList: List[Double] = {
#tailrec
def addToList(remaining:Long, acc:List[Double]): List[Double] =
if (remaining > 0) addToList(remaining - 1, 0 :: acc)
else acc
addToList(10000000, Nil)
}
while (true) {
myList
}
}
We see that we have a call to BoxesRunTime.boxToDouble which happens due to the fact that :: is a generic class and doesn't have a #specialized attribute for double. We go scala.Int -> scala.Double -> java.lang.Double.

Scala: Is there any reason to prefer `filter+map` over `collect`?

Can there be any reason to prefer filter+map:
list.filter (i => aCondition(i)).map(i => fun(i))
over collect? :
list.collect(case i if aCondition(i) => fun(i))
The one with collect (single look) looks faster and cleaner to me. So I would always go for collect.
Most of Scala's collections eagerly apply operations and (unless you're using a macro library that does this for you) will not fuse operations. So filter followed by map will usually create two collections (and even if you use Iterator or somesuch, the intermediate form will be transiently created, albeit only an element at a time), whereas collect will not.
On the other hand, collect uses a partial function to implement the joint test, and partial functions are slower than predicates (A => Boolean) at testing whether something is in the collection.
Additionally, there can be cases where it is simply clearer to read one than the other and you don't care about performance or memory usage differences of a factor of 2 or so. In that case, use whichever is clearer. Generally if you already have the functions named, it's clearer to read
xs.filter(p).map(f)
xs.collect{ case x if p(x) => f(x) }
but if you are supplying the closures inline, collect generally looks cleaner
xs.filter(x < foo(x, x)).map(x => bar(x, x))
xs.collect{ case x if foo(x, x) => bar(x, x) }
even though it's not necessarily shorter, because you only refer to the variable once.
Now, how big is the difference in performance? That varies, but if we consider a a collection like this:
val v = Vector.tabulate(10000)(i => ((i%100).toString, (i%7).toString))
and you want to pick out the second entry based on filtering the first (so the filter and map operations are both really easy), then we get the following table.
Note: one can get lazy views into collections and gather operations there. You don't always get your original type back, but you can always use to get the right collection type. So xs.view.filter(p).map(f).toVector would, because of the view, not create an intermediate. That is tested below also. It has also been suggested that one can xs.flatMap(x => if (p(x)) Some(f(x)) else None) and that this is efficient. That is not so. It's also tested below. And one can avoid the partial function by explicitly creating a builder: val vb = Vector.newBuilder[String]; xs.foreach(x => if (p(x)) vb += f(x)); vb.result, and the results for that are also listed below.
In the table below, three conditions have been tested: filter out nothing, filter out half, filter out everything. The times have been normalized to filter/map (100% = same time as filter/map, lower is better). Error bounds are around +- 3%.
Performance of different filter/map alternatives
====================== Vector ========================
filter/map collect view filt/map flatMap builder
100% 44% 64% 440% 30% filter out none
100% 60% 76% 605% 42% filter out half
100% 112% 103% 1300% 74% filter out all
Thus, filter/map and collect are generally pretty close (with collect winning when you keep a lot), flatMap is far slower under all situations, and creating a builder always wins. (This is true specifically for Vector. Other collections may have somewhat different characteristics, but the trends for most will be similar because the differences in operations are similar.) Views in this test tend to be a win, but they don't always work seamlessly (and they aren't really better than collect except for the empty case).
So, bottom line: prefer filter then map if it aids clarity when speed doesn't matter, or prefer it for speed when you're filtering out almost everything but still want to keep things functional (so don't want to use a builder); and otherwise use collect.
I guess this is rather opinion based, but given the following definitions:
scala> val l = List(1,2,3,4)
l: List[Int] = List(1, 2, 3, 4)
scala> def f(x: Int) = 2*x
f: (x: Int)Int
scala> def p(x: Int) = x%2 == 0
p: (x: Int)Boolean
Which of the two do you find nicer to read:
l.filter(p).map(f)
or
l.collect{ case i if p(i) => f(i) }
(Note that I had to fix your syntax above, as you need the bracket and case to add the if condition).
I personally find the filter+map much nicer to read and understand. It's all a matter of the exact syntax that you use, but given p and f, you don't have to write anonymous functions when using filter or map, while you do need them when using collect.
You also have the possibility to use flatMap:
l.flatMap(i => if(p(i)) Some(f(i)) else None)
Which is likely to be the most efficient among the 3 solutions, but I find it less nice than map and filter.
Overall, it's very difficult to say which one would be faster, as it depends a lot of which optimizations end up being performed by scalac and then the JVM. All 3 should be pretty close, and definitely not a factor in deciding which one to use.
One case where filter/map looks cleaner is when you want to flatten the result of filter
def getList(x: Int) = {
List.range(x, 0, -1)
}
val xs = List(1,2,3,4)
//Using filter and flatMap
xs.filter(_ % 2 == 0).flatMap(getList)
//Using collect and flatten
xs.collect{ case x if x % 2 == 0 => getList(x)}.flatten

scala.collection.breakOut vs views

This SO answer describes how scala.collection.breakOut can be used to prevent creating wasteful intermediate collections. For example, here we create an intermediate Seq[(String,String)]:
val m = List("A", "B", "C").map(x => x -> x).toMap
By using breakOut we can prevent the creation of this intermediate Seq:
val m: Map[String,String] = List("A", "B", "C").map(x => x -> x)(breakOut)
Views solve the same problem and in addition access elements lazily:
val m = (List("A", "B", "C").view map (x => x -> x)).toMap
I am assuming the creation of the View wrappers is fairly cheap, so my question is: Is there any real reason to use breakOut over Views?
You're going to make a trip from England to France.
With view: you're taking a set of notes in your notebook and boom, once you've called .force() you start making all of them: buy a ticket, board on the plane, ....
With breakOut: you're departing and boom, you in the Paris looking at the Eiffel tower. You don't remember how exactly you've arrived there, but you did this trip actually, just didn't make any memories.
Bad analogy, but I hope this give you a taste of what is the difference between them.
I don't think views and breakOut are identical.
A breakOut is a CanBuildFrom implementation used to simplify transformation operations by eliminating intermediary steps. E.g get from A to B without the intermediary collection. A breakOut means letting Scala choose the appropriate builder object for maximum efficiency of producing new items in a given scenario. More details here.
views deal with a different type of efficiency, the main sale pitch being: "No more new objects". Views store light references to objects to tackle different usage scenarios: lazy access etc.
Bottom line:
If you map on a view you may still get an intermediary collection of references created before the expected result can be produced. You could still have superior performance from:
collection.view.map(somefn)(breakOut)
Than from:
collection.view.map(someFn)
As of Scala 2.13, this is no longer a concern. Breakout has been removed and views are the recommended replacement.
Scala 2.13 Collections Rework
Views are also the recommended replacement for collection.breakOut.
For example,
val s: Seq[Int] = ...
val set: Set[String] = s.map(_.toString)(collection.breakOut)
can be expressed with the same performance characteristics as:
val s: Seq[Int] = ...
val set = s.view.map(_.toString).to(Set)
What flavian said.
One use case for views is to conserve memory. For example, if you had a million-character-long string original, and needed to use, one by one, all of the million suffixes of that string, you might use a collection of
val v = original.view
val suffixes = v.tails
views on the original string. Then you might loop over the suffixes one by one, using suffix.force() to convert them back to strings within the loop, thus only holding one in memory at a time. Of course, you could do the same thing by iterating with your own loop over the indices of the original string, rather than creating any kind of collection of the suffixes.
Another use-case is when creation of the derived objects is expensive, you need them in a collection (say, as values in a map), but you only will access a few, and you don't know which ones.
If you really have a case where picking between them makes sense, prefer breakOut unless there's a good argument for using view (like those above).
Views require more code changes and care than breakOut, in that you need to add force() where needed. Depending on context, failure to do so is
often only detected at run-time. With breakOut, generally if it
compiles, it's right.
In cases where view does not apply, breakOut
will be faster, since view generation and forcing is skipped.
If you use a debugger, you can inspect the collection contents, which you
can't meaningfully do with a collection of views.

why use foldLeft instead of procedural version?

So in reading this question it was pointed out that instead of the procedural code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
var result = exp
for ((oldS, newS) <- replacements)
result = result.replace(oldS, newS)
result
}
You could write the following functional code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
replacements.foldLeft(exp){
case (result, (oldS, newS)) => result.replace(oldS, newS)
}
}
I would almost certainly write the first version because coders familiar with either procedural or functional styles can easily read and understand it, while only coders familiar with functional style can easily read and understand the second version.
But setting readability aside for the moment, is there something that makes foldLeft a better choice than the procedural version? I might have thought it would be more efficient, but it turns out that the implementation of foldLeft is actually just the procedural code above. So is it just a style choice, or is there a good reason to use one version or the other?
Edit: Just to be clear, I'm not asking about other functions, just foldLeft. I'm perfectly happy with the use of foreach, map, filter, etc. which all map nicely onto for-comprehensions.
Answer: There are really two good answers here (provided by delnan and Dave Griffith) even though I could only accept one:
Use foldLeft because there are additional optimizations, e.g. using a while loop which will be faster than a for loop.
Use fold if it ever gets added to regular collections, because that will make the transition to parallel collections trivial.
It's shorter and clearer - yes, you need to know what a fold is to understand it, but when you're programming in a language that's 50% functional, you should know these basic building blocks anyway. A fold is exactly what the procedural code does (repeatedly applying an operation), but it's given a name and generalized. And while it's only a small wheel you're reinventing, but it's still a wheel reinvention.
And in case the implementation of foldLeft should ever get some special perk - say, extra optimizations - you get that for free, without updating countless methods.
Other than a distaste for mutable variable (even mutable locals), the basic reason to use fold in this case is clarity, with occasional brevity. Most of the wordiness of the fold version is because you have to use an explicit function definition with a destructuring bind. If each element in the list is used precisely once in the fold operation (a common case), this can be simplified to use the short form. Thus the classic definition of the sum of a collection of numbers
collection.foldLeft(0)(_+_)
is much simpler and shorter than any equivalent imperative construct.
One additional meta-reason to use functional collection operations, although not directly applicable in this case, is to enable a move to using parallel collection operations if needed for performance. Fold can't be parallelized, but often fold operations can be turned into commutative-associative reduce operations, and those can be parallelized. With Scala 2.9, changing something from non-parallel functional to parallel functional utilizing multiple processing cores can sometimes be as easy as dropping a .par onto the collection you want to execute parallel operations on.
One word I haven't seen mentioned here yet is declarative:
Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:
A program that describes what computation should be performed and not how to compute it
Any programming language that lacks side effects (or more specifically, is referentially transparent)
A language with a clear correspondence to mathematical logic.
These definitions overlap substantially.
Higher-order functions (HOFs) are a key enabler of declarativity, since we only specify the what (e.g. "using this collection of values, multiply each value by 2, sum the result") and not the how (e.g. initialize an accumulator, iterate with a for loop, extract values from the collection, add to the accumulator...).
Compare the following:
// Sugar-free Scala (Still better than Java<5)
def sumDoubled1(xs: List[Int]) = {
var sum = 0 // Initialized correctly?
for (i <- 0 until xs.size) { // Fenceposts?
sum = sum + (xs(i) * 2) // Correct value being extracted?
// Value extraction and +/* smashed together
}
sum // Correct value returned?
}
// Iteration sugar (similar to Java 5)
def sumDoubled2(xs: List[Int]) = {
var sum = 0
for (x <- xs) // We don't need to worry about fenceposts or
sum = sum + (x * 2) // value extraction anymore; that's progress
sum
}
// Verbose Scala
def sumDoubled3(xs: List[Int]) = xs.map((x: Int) => x*2). // the doubling
reduceLeft((x: Int, y: Int) => x+y) // the addition
// Idiomatic Scala
def sumDoubled4(xs: List[Int]) = xs.map(_*2).reduceLeft(_+_)
// ^ the doubling ^
// \ the addition
Note that our first example, sumDoubled1, is already more declarative than (most would say superior to) C/C++/Java<5 for loops, because we haven't had to micromanage the iteration state and termination logic, but we're still vulnerable to off-by-one errors.
Next, in sumDoubled2, we're basically at the level of Java>=5. There are still a couple things that can go wrong, but we're getting pretty good at reading this code-shape, so errors are quite unlikely. However, don't forget that a pattern that's trivial in a toy example isn't always so readable when scaled up to production code!
With sumDoubled3, desugared for didactic purposes, and sumDoubled4, the idiomatic Scala version, the iteration, initialization, value extraction and choice of return value are all gone.
Sure, it takes time to learn to read the functional versions, but we've drastically foreclosed our options for making mistakes. The "business logic" is clearly marked, and the plumbing is chosen from the same menu that everyone else is reading from.
It is worth pointing out that there is another way of calling foldLeft which takes advantages of:
The ability to use (almost) any Unicode symbol in an identifier
The feature that if a method name ends with a colon :, and is called infix, then the target and parameter are switched
For me this version is much clearer, because I can see that I am folding the expr value into the replacements collection
def expand(expr: String, replacements: Traversable[(String, String)]): String = {
(expr /: replacements) { case (r, (o, n)) => r.replace(o, n) }
}

Why should I avoid using local modifiable variables in Scala?

I'm pretty new to Scala and most of the time before I've used Java. Right now I have warnings all over my code saying that i should "Avoid mutable local variables" and I have a simple question - why?
Suppose I have small problem - determine max int out of four. My first approach was:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
var subMax1 = a
if (b > a) subMax1 = b
var subMax2 = c
if (d > c) subMax2 = d
if (subMax1 > subMax2) subMax1
else subMax2
}
After taking into account this warning message I found another solution:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
max(max(a, b), max(c, d))
}
def max(a: Int, b: Int): Int = {
if (a > b) a
else b
}
It looks more pretty, but what is ideology behind this?
Whenever I approach a problem I'm thinking about it like: "Ok, we start from this and then we incrementally change things and get the answer". I understand that the problem is that I try to change some initial state to get an answer and do not understand why changing things at least locally is bad? How to iterate over collection then in functional languages like Scala?
Like an example: Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6? Can't think of solution without local mutable variable.
In your particular case there is another solution:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
val submax1 = if (a > b) a else b
val submax2 = if (c > d) c else d
if (submax1 > submax2) submax1 else submax2
}
Isn't it easier to follow? Of course I am a bit biased but I tend to think it is, BUT don't follow that rule blindly. If you see that some code might be written more readably and concisely in mutable style, do it this way -- the great strength of scala is that you don't need to commit to neither immutable nor mutable approaches, you can swing between them (btw same applies to return keyword usage).
Like an example: Suppose we have a list of ints, how to write a
function that returns the sublist of ints which are divisible by 6?
Can't think of solution without local mutable variable.
It is certainly possible to write such function using recursion, but, again, if mutable solution looks and works good, why not?
It's not so related with Scala as with the functional programming methodology in general. The idea is the following: if you have constant variables (final in Java), you can use them without any fear that they are going to change. In the same way, you can parallelize your code without worrying about race conditions or thread-unsafe code.
In your example is not so important, however imagine the following example:
val variable = ...
new Future { function1(variable) }
new Future { function2(variable) }
Using final variables you can be sure that there will not be any problem. Otherwise, you would have to check the main thread and both function1 and function2.
Of course, it's possible to obtain the same result with mutable variables if you do not ever change them. But using inmutable ones you can be sure that this will be the case.
Edit to answer your edit:
Local mutables are not bad, that's the reason you can use them. However, if you try to think approaches without them, you can arrive to solutions as the one you posted, which is cleaner and can be parallelized very easily.
How to iterate over collection then in functional languages like Scala?
You can always iterate over a inmutable collection, while you do not change anything. For example:
val list = Seq(1,2,3)
for (n <- list)
println n
With respect to the second thing that you said: you have to stop thinking in a traditional way. In functional programming the usage of Map, Filter, Reduce, etc. is normal; as well as pattern matching and other concepts that are not typical in OOP. For the example you give:
Like an example: Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6?
val list = Seq(1,6,10,12,18,20)
val result = list.filter(_ % 6 == 0)
Firstly you could rewrite your example like this:
def max(first: Int, others: Int*): Int = {
val curMax = Math.max(first, others(0))
if (others.size == 1) curMax else max(curMax, others.tail : _*)
}
This uses varargs and tail recursion to find the largest number. Of course there are many other ways of doing the same thing.
To answer your queston - It's a good question and one that I thought about myself when I first started to use scala. Personally I think the whole immutable/functional programming approach is somewhat over hyped. But for what it's worth here are the main arguments in favour of it:
Immutable code is easier to read (subjective)
Immutable code is more robust - it's certainly true that changing mutable state can lead to bugs. Take this for example:
for (int i=0; i<100; i++) {
for (int j=0; j<100; i++) {
System.out.println("i is " + i = " and j is " + j);
}
}
This is an over simplified example but it's still easy to miss the bug and the compiler won't help you
Mutable code is generally not thread safe. Even trivial and seemingly atomic operations are not safe. Take for example i++ this looks like an atomic operation but it's actually equivalent to:
int i = 0;
int tempI = i + 0;
i = tempI;
Immutable data structures won't allow you to do something like this so you would need to explicitly think about how to handle it. Of course as you point out local variables are generally threadsafe, but there is no guarantee. It's possible to pass a ListBuffer instance variable as a parameter to a method for example
However there are downsides to immutable and functional programming styles:
Performance. It is generally slower in both compilation and runtime. The compiler must enforce the immutability and the JVM must allocate more objects than would be required with mutable data structures. This is especially true of collections.
Most scala examples show something like val numbers = List(1,2,3) but in the real world hard coded values are rare. We generally build collections dynamically (from a database query etc). Whilst scala can reassign the values in a colection it must still create a new collection object every time you modify it. If you want to add 1000 elements to a scala List (immutable) the JVM will need to allocate (and then GC) 1000 objects
Hard to maintain. Functional code can be very hard to read, it's not uncommon to see code like this:
val data = numbers.foreach(_.map(a => doStuff(a).flatMap(somethingElse)).foldleft("", (a : Int,b: Int) => a + b))
I don't know about you but I find this sort of code really hard to follow!
Hard to debug. Functional code can also be hard to debug. Try putting a breakpoint halfway into my (terrible) example above
My advice would be to use a functional/immutable style where it genuinely makes sense and you and your colleagues feel comfortable doing it. Don't use immutable structures because they're cool or it's "clever". Complex and challenging solutions will get you bonus points at Uni but in the commercial world we want simple solutions to complex problems! :)
Your two main questions:
Why warn against local state changes?
How can you iterate over collections without mutable state?
I'll answer both.
Warnings
The compiler warns against the use of mutable local variables because they are often a cause of error. That doesn't mean this is always the case. However, your sample code is pretty much a classic example of where mutable local state is used entirely unnecessarily, in a way that not only makes it more error prone and less clear but also less efficient.
Your first code example is more inefficient than your second, functional solution. Why potentially make two assignments to submax1 when you only ever need to assign one? You ask which of the two inputs is larger anyway, so why not ask that first and then make one assignment? Why was your first approach to temporarily store partial state only halfway through the process of asking such a simple question?
Your first code example is also inefficient because of unnecessary code duplication. You're repeatedly asking "which is the biggest of two values?" Why write out the code for that 3 times independently? Needlessly repeating code is a known bad habit in OOP every bit as much as FP and for precisely the same reasons. Each time you needlessly repeat code, you open a potential source of error. Adding mutable local state (especially when so unnecessary) only adds to the fragility and to the potential for hard to spot errors, even in short code. You just have to type submax1 instead of submax2 in one place and you may not notice the error for a while.
Your second, FP solution removes the code duplication, dramatically reducing the chance of error, and shows that there was simply no need for mutable local state. It's also, as you yourself say, cleaner and clearer - and better than the alternative solution in om-nom-nom's answer.
(By the way, the idiomatic Scala way to write such a simple function is
def max(a: Int, b: Int) = if (a > b) a else b
which terser style emphasises its simplicity and makes the code less verbose)
Your first solution was inefficient and fragile, but it was your first instinct. The warning caused you to find a better solution. The warning proved its value. Scala was designed to be accessible to Java developers and is taken up by many with a long experience of imperative style and little or no knowledge of FP. Their first instinct is almost always the same as yours. You have demonstrated how that warning can help improve code.
There are cases where using mutable local state can be faster but the advice of Scala experts in general (not just the pure FP true believers) is to prefer immutability and to reach for mutability only where there is a clear case for its use. This is so against the instincts of many developers that the warning is useful even if annoying to experienced Scala devs.
It's funny how often some kind of max function comes up in "new to FP/Scala" questions. The questioner is very often tripping up on errors caused by their use of local state... which link both demonstrates the often obtuse addiction to mutable state among some devs while also leading me on to your other question.
Functional Iteration over Collections
There are three functional ways to iterate over collections in Scala
For Comprehensions
Explicit Recursion
Folds and other Higher Order Functions
For Comprehensions
Your question:
Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6? Can't think of solution without local mutable variable
Answer: assuming xs is a list (or some other sequence) of integers, then
for (x <- xs; if x % 6 == 0) yield x
will give you a sequence (of the same type as xs) containing only those items which are divisible by 6, if any. No mutable state required. Scala just iterates over the sequence for you and returns anything matching your criteria.
If you haven't yet learned the power of for comprehensions (also known as sequence comprehensions) you really should. Its a very expressive and powerful part of Scala syntax. You can even use them with side effects and mutable state if you want (look at the final example on the tutorial I just linked to). That said, there can be unexpected performance penalties and they are overused by some developers.
Explicit Recursion
In the question I linked to at the end of the first section, I give in my answer a very simple, explicitly recursive solution to returning the largest Int from a list.
def max(xs: List[Int]): Option[Int] = xs match {
case Nil => None
case List(x: Int) => Some(x)
case x :: y :: rest => max( (if (x > y) x else y) :: rest )
}
I'm not going to explain how the pattern matching and explicit recursion work (read my other answer or this one). I'm just showing you the technique. Most Scala collections can be iterated over recursively, without any need for mutable state. If you need to keep track of what you've been up to along the way, you pass along an accumulator. (In my example code, I stick the accumulator at the front of the list to keep the code smaller but look at the other answers to those questions for more conventional use of accumulators).
But here is a (naive) explicitly recursive way of finding those integers divisible by 6
def divisibleByN(n: Int, xs: List[Int]): List[Int] = xs match {
case Nil => Nil
case x :: rest if x % n == 0 => x :: divisibleByN(n, rest)
case _ :: rest => divisibleByN(n, rest)
}
I call it naive because it isn't tail recursive and so could blow your stack. A safer version can be written using an accumulator list and an inner helper function but I leave that exercise to you. The result will be less pretty code than the naive version, no matter how you try, but the effort is educational.
Recursion is a very important technique to learn. That said, once you have learned to do it, the next important thing to learn is that you can usually avoid using it explicitly yourself...
Folds and other Higher Order Functions
Did you notice how similar my two explicit recursion examples are? That's because most recursions over a list have the same basic structure. If you write a lot of such functions, you'll repeat that structure many times. Which makes it boilerplate; a waste of your time and a potential source of error.
Now, there are any number of sophisticated ways to explain folds but one simple concept is that they take the boilerplate out of recursion. They take care of the recursion and the management of accumulator values for you. All they ask is that you provide a seed value for the accumulator and the function to apply at each iteration.
For example, here is one way to use fold to extract the highest Int from the list xs
xs.tail.foldRight(xs.head) {(a, b) => if (a > b) a else b}
I know you aren't familiar with folds, so this may seem gibberish to you but surely you recognise the lambda (anonymous function) I'm passing in on the right. What I'm doing there is taking the first item in the list (xs.head) and using it as the seed value for the accumulator. Then I'm telling the rest of the list (xs.tail) to iterate over itself, comparing each item in turn to the accumulator value.
This kind of thing is a common case, so the Collections api designers have provided a shorthand version:
xs.reduce {(a, b) => if (a > b) a else b}
(If you look at the source code, you'll see they have implemented it using a fold).
Anything you might want to do iteratively to a Scala collection can be done using a fold. Often, the api designers will have provided a simpler higher-order function which is implemented, under the hood, using a fold. Want to find those divisible-by-six Ints again?
xs.foldRight(Nil: List[Int]) {(x, acc) => if (x % 6 == 0) x :: acc else acc}
That starts with an empty list as the accumulator, iterates over every item, only adding those divisible by 6 to the accumulator. Again, a simpler fold-based HoF has been provided for you:
xs filter { _ % 6 == 0 }
Folds and related higher-order functions are harder to understand than for comprehensions or explicit recursion, but very powerful and expressive (to anybody else who understands them). They eliminate boilerplate, removing a potential source of error. Because they are implemented by the core language developers, they can be more efficient (and that implementation can change, as the language progresses, without breaking your code). Experienced Scala developers use them in preference to for comprehensions or explicit recursion.
tl;dr
Learn For comprehensions
Learn explicit recursion
Don't use them if a higher-order function will do the job.
It is always nicer to use immutable variables since they make your code easier to read. Writing a recursive code can help solve your problem.
def max(x: List[Int]): Int = {
if (x.isEmpty == true) {
0
}
else {
Math.max(x.head, max(x.tail))
}
}
val a_list = List(a,b,c,d)
max_value = max(a_list)