Avoid testing duplicate values with ScalaTest forAll - scala

I'm playing with property-based testing on ScalaTest and I had the following code:
val myStrings = Gen.oneOf("hi", "hello")
forAll(myStrings) { s: String =>
println(s"String tested: $s")
}
When I run the forAll code, I've noticed that the same value is tried more than once, e.g.
String tested: hi
String tested: hello
String tested: hi
String tested: hello
String tested: hi
String tested: hello
...
I was wondering if there is a way for, given the code above, for each value in oneOf to be tried only once. In other words, to get ScalaTest not to use the same value twice.
Even if I used other generators, such as Gen.alphaStr, I'd like to find a way to avoid testing the same String twice. The reason I'm interested in doing this is because each test runs against a server running in a different process, and hence there's a bit of cost involved, so I'd like to avoid testing the same thing twice.

What you're trying to do is seems to be against scalacheck ideology(see Note1); however it's kind of possible (with high probability) by reducing the number of samples:
scala> forAll(oneOf("a", "b")){i => println(i); true}.check(Test.Parameters.default.withMinSuccessfulTests(2))
a
b
+ OK, passed 2 tests.
Note that you can still get aa/bb sometimes, as scala-check is built on randomness and statistical approach. If you need to always check all combinations - you probably don't need scala-check:
scala> assert(Set("a", "b").forall(_ => true))
Basically Gen allows you to create an infinite collection that represents a distribution of input values. The more values you generate - the better sampling you get. So if you have N possible states, you can't guarantee that they won't repeat in an infinite collection.
The only way to do exactly what you want is to explicitly check for duplicates before calling the service. You can use something like Option(ConcurrentHashMap.putIfAbscent(value, value)).isEmpty for that. Keep in mind it is a risk of OOM so be careful to take care of the amount of generated values and maybe even add an explicit check.
Note1) What scalacheck is needed for reducing number of combinations from maximum (which is more than 100) to some value that still gives you a good check. So scalacheck is useful when a set of possible inputs is really huge. And in that case the probability of repetitions is really small
P.S.
Talking about oneOf (from scaladoc):
def oneOf[T](t0: T, t1: T, tn: T*): Gen[T]
Picks a random value from a list
See also (examples are a bit outdated): How can I reduce the number of test cases ScalaCheck generates?

I would aim to increase the entropy of values. Using random sentences will increase it a lot, although not (theoretically) fixing the issue.
val genWord = Gen.onOf("hi", "hello")
def sentanceOf(words: Int): Gen[String] = {
Gen.listOfN(words, genWord).map(_.mkString(" ")
}

Related

how to use forAll in scalatest to generate only one object of a generator?

Im working with scalatest and scalacheck, alsso working with FeatureSpec.
I have a generator class that generate object for me that looks something like this:
object InvoiceGen {
def myObj = for {
country <- Gen.oneOf(Seq("France", "Germany", "United Kingdom", "Austria"))
type <- Gen.oneOf(Seq("Communication", "Restaurants", "Parking"))
amount <- Gen.choose(100, 4999)
number <- Gen.choose(1, 10000)
valid <- Arbitrary.arbitrary[Boolean]
} yield SomeObject(country, type, "1/1/2014", amount,number.toString, 35, "something", documentTypeValid, valid, "")
Now, I have the testing class which works with FeatureSpec and everything that I need to run the tests.
In this class I have scenarios, and in each scenario I want to generate a different object.
The thing is from what I understand is that to generate object is better to use forAll func, but for all will not sure to bring you an object so you can add minSuccessful(1) to make sure you get at list 1 obj....
I did it like this and it works:
scenario("some scenario") {
forAll(MyGen.myObj, minSuccessful(1)) { someObject =>
Given("A connection to the system")
loginActions shouldBe 'Connected
When("something")
//blabla
Then("something should happened")
//blabla
}
}
but im not sure exactly what it means.
What I want is to generate an invoice each scenario and do some actions on it...
im not sure why i care if the generation work or didnt work...i just want a generated object to work with.
TL;DR: To get one object, and only one, use myObj.sample.get. Unless your generator is doing something fancy that's perfectly safe and won't blow up.
I presume that your intention is to run some kind of integration/acceptance test with some randomly generated domain object—in other words (ab-)use scalacheck as a simple data generator—and you hope that minSuccessful(1) would ensure that the test only runs once.
Be aware that this is not the case!. scalacheck will run your test multiple times if it fails, to try and shrink the input data to a minimal counterexample.
If you'd like to ensure that your test runs only once you must use sample.
However, if running the test multiple times is fine, prefer minSuccessful(1) to "succeed fast" but still profit from minimized counterexamples in case the test fails.
Gen.sample returns an option because generators can fail:
ScalaCheck generators can fail, for instance if you're adding a filter (listingGen.suchThat(...)), and that failure is modeled with the Option type.
But:
[…] if you're sure that your generator never will fail, you can simply call Option.get like you do in your example above. Or you can use Option.getOrElse to replace None with a default value.
Generally if your generator is simple, i.e. does not use generators that could fail and does not use any filters on its own, it's perfectly safe to just call .get on the option returned by .sample. I've been doing that in the past and never had problems with it. If your generators frequently return None from .sample they'd likely make scalacheck fail to successfully generate values as well.
If all that you want is a single object use Gen.sample.get.
minSuccessful has a very different meaning: It's the minimal number of successful tests that scalacheck runs—which by no means implies
that scalacheck takes only a single value out of the generator, or
that the test runs only once.
With minSuccessful(1) scalacheck wants one successful test. It'll take samples out of the generator until the test runs at least once—i.e. if you filter the generated values with whenever in your test body scalacheck will take samples as long as whenever discards them.
If the test passes scalacheck is happy and won't run the test a second time.
However if the test fails scalacheck will try and produce a minimal example to fail the test. It'll shrink the input data and run the test as long as it fails and then provides you with the minimized counter example rather than the actual input that triggered the initial failure.
That's an important property of property testing as it helps you to discover bugs: The original data is frequently too large to lend itself for debugging. Minimizing it helps you discover the piece of input data that actually triggers the failure, i.e. corner cases like empty strings that you didn't think of.
I think the way you want to use Scalacheck (generate only one object and execute the test for it) defeats the purpose of property-based testing. Let me explain a bit in detail:
In classical unit-testing, you would generate your system under test, be it an object or a system of dependent objects, with some fixed data. This could e.g. be strings like "foo" and "bar" or, if you needed a name, you would use something like "John Doe". For integers and other data, you can also randomly choose some values.
The main advantage is that these are "plain" values—you can directly see them in the code and correlate them with the output of a failed test. The big disadvantage is that the tests will only ever run with the values you specified, which in turn means that your code is also only tested with these values.
In contrast, property-based testing allows you to just describe how the data should look like (e.g. "a positive integer", "a string of maximum 20 characters"). The testing framework will then—with the help of generators—generate a number of matching objects and execute the test for all of them. This way, you can be more sure that your code will actually be correct for different inputs, which after all is the purpose of testing: to check if your code does what it should for the possible inputs.
I never really worked with Scalacheck, but a colleague explained it to me that it also tries to cover edge-cases, e.g. putting in a 0 and MAX_INT for a positive integer, or an empty string for the aforementioned string with max. 20 characters.
So, to sum it up: Running a property-based test only once for one generic object is the wrong thing to do. Instead, once you have the generator infrastructure in place, embrace the advantage you then have and let your code be checked a lot more times!

Scala (method / function).hashCode static value?

I' was curious if the hashCode() function returns always the same value for a lambda or something in Scala?
My tests have shown to me some static value that does not change even over builds. Is this intended behavior or may it change in future?
If this was some static behavior it would help me a lot building my library.
EDIT:
Let's take this source code:
object Main {
def main(args: Array[String]): Unit = {
val x = (s: String) => 1
val y = (s: String) => 2
println(x.hashCode())
println(y.hashCode())
}
}
It's output on console is for me always 1792393294 and 226170135.
What I'm currently doing is implementing a parser combinator library which I implemented in several languages. And I need to know when wrapper classes are the same (e.g. the underlying functions are the same) so I can implement something like a call stack, which I need to parse as far as possible on failure but prevent endless recursion on error.
Thanks in advance!
The default implementation for hashCode (at least in the Oracle JVM) is in terms of the (initial) memory address of that particular object. So if your program has constructed exactly the same sized objects in exactly the same order before constructing that object, it will in practice return the same value every time.
But this is not at all reliable; most programs do not do exactly the same things every time they run. As soon as you're doing something like responding to user input, that perfect reproducibility will disappear - e.g. maybe you sometimes add enough entries to a HashMap to trigger an enlargement of the table, and sometimes not. And if you construct the same value later in the program, it will of course have a different address; try doing
val z = (s: String) => 1
and observe that it will have a different hashCode from x. Not to mention that the numbers may well be different across different JVMs, different versions of the same JVM, or even when the same JVM is launched with a different -Xms setting.
Computers are often a lot more deterministic in practice than in theory. But this is not the kind of thing that's specified to happen, and certainly not something to rely on in your programs.

Lazily generate partial sums in Scala

I want to produce a lazy list of partial sums and stop when I have found a "suitable" sum. For example, I want to do something like the following:
val str = Stream.continually {
val i = Random.nextInt
println("generated " + i)
List(i)
}
str
.take(5)
.scanLeft(List[Int]())(_ ++ _)
.find(l => !l.forall(_ > 0))
This produces output like the following:
generated -354822103
generated 1841977627
z: Option[List[Int]] = Some(List(-354822103))
This is nice because I've avoided producing the entire list of lists before finding a suitable list. However, it's suboptimal because I generated one extra random number that I don't need (i.e., the second, positive number in this test run). I know I can hand code a solution to do what I want, but is there a way to use the core scala collection library to achieve this result without writing my own recursion?
The above example is just a toy, but the real application involves heavy-duty network traffic for each "retry" as I build up a map until the map is "complete".
EDIT: Note that even substituting take(1) for find(...) results in the generation of a random number even though the returned value List() does not depend on the number. Does anyone know why the number is being generated in this case? I would think scanLeft does not need to fetch an element of the iterable receiving the call to scanLeft in this case.

Use-cases for Streams in Scala

In Scala there is a Stream class that is very much like an iterator. The topic Difference between Iterator and Stream in Scala? offers some insights into the similarities and differences between the two.
Seeing how to use a stream is pretty simple but I don't have very many common use-cases where I would use a stream instead of other artifacts.
The ideas I have right now:
If you need to make use of an infinite series. But this does not seem like a common use-case to me so it doesn't fit my criteria. (Please correct me if it is common and I just have a blind spot)
If you have a series of data where each element needs to be computed but that you may want to reuse several times. This is weak because I could just load it into a list which is conceptually easier to follow for a large subset of the developer population.
Perhaps there is a large set of data or a computationally expensive series and there is a high probability that the items you need will not require visiting all of the elements. But in this case an Iterator would be a good match unless you need to do several searches, in that case you could use a list as well even if it would be slightly less efficient.
There is a complex series of data that needs to be reused. Again a list could be used here. Although in this case both cases would be equally difficult to use and a Stream would be a better fit since not all elements need to be loaded. But again not that common... or is it?
So have I missed any big uses? Or is it a developer preference for the most part?
Thanks
The main difference between a Stream and an Iterator is that the latter is mutable and "one-shot", so to speak, while the former is not. Iterator has a better memory footprint than Stream, but the fact that it is mutable can be inconvenient.
Take this classic prime number generator, for instance:
def primeStream(s: Stream[Int]): Stream[Int] =
Stream.cons(s.head, primeStream(s.tail filter { _ % s.head != 0 }))
val primes = primeStream(Stream.from(2))
It can be easily be written with an Iterator as well, but an Iterator won't keep the primes computed so far.
So, one important aspect of a Stream is that you can pass it to other functions without having it duplicated first, or having to generate it again and again.
As for expensive computations/infinite lists, these things can be done with Iterator as well. Infinite lists are actually quite useful -- you just don't know it because you didn't have it, so you have seen algorithms that are more complex than strictly necessary just to deal with enforced finite sizes.
In addition to Daniel's answer, keep in mind that Stream is useful for short-circuiting evaluations. For example, suppose I have a huge set of functions that take String and return Option[String], and I want to keep executing them until one of them works:
val stringOps = List(
(s:String) => if (s.length>10) Some(s.length.toString) else None ,
(s:String) => if (s.length==0) Some("empty") else None ,
(s:String) => if (s.indexOf(" ")>=0) Some(s.trim) else None
);
Well, I certainly don't want to execute the entire list, and there isn't any handy method on List that says, "treat these as functions and execute them until one of them returns something other than None". What to do? Perhaps this:
def transform(input: String, ops: List[String=>Option[String]]) = {
ops.toStream.map( _(input) ).find(_ isDefined).getOrElse(None)
}
This takes a list and treats it as a Stream (which doesn't actually evaluate anything), then defines a new Stream that is a result of applying the functions (but that doesn't evaluate anything either yet), then searches for the first one which is defined--and here, magically, it looks back and realizes it has to apply the map, and get the right data from the original list--and then unwraps it from Option[Option[String]] to Option[String] using getOrElse.
Here's an example:
scala> transform("This is a really long string",stringOps)
res0: Option[String] = Some(28)
scala> transform("",stringOps)
res1: Option[String] = Some(empty)
scala> transform(" hi ",stringOps)
res2: Option[String] = Some(hi)
scala> transform("no-match",stringOps)
res3: Option[String] = None
But does it work? If we put a println into our functions so we can tell if they're called, we get
val stringOps = List(
(s:String) => {println("1"); if (s.length>10) Some(s.length.toString) else None },
(s:String) => {println("2"); if (s.length==0) Some("empty") else None },
(s:String) => {println("3"); if (s.indexOf(" ")>=0) Some(s.trim) else None }
);
// (transform is the same)
scala> transform("This is a really long string",stringOps)
1
res0: Option[String] = Some(28)
scala> transform("no-match",stringOps)
1
2
3
res1: Option[String] = None
(This is with Scala 2.8; 2.7's implementation will sometimes overshoot by one, unfortunately. And note that you do accumulate a long list of None as your failures accrue, but presumably this is inexpensive compared to your true computation here.)
I could imagine, that if you poll some device in real time, a Stream is more convenient.
Think of an GPS tracker, which returns the actual position if you ask it. You can't precompute the location where you will be in 5 minutes. You might use it for a few minutes only to actualize a path in OpenStreetMap or you might use it for an expedition over six months in a desert or the rain forest.
Or a digital thermometer or other kinds of sensors which repeatedly return new data, as long as the hardware is alive and turned on - a log file filter could be another example.
Stream is to Iterator as immutable.List is to mutable.List. Favouring immutability prevents a class of bugs, occasionally at the cost of performance.
scalac itself isn't immune to these problems: http://article.gmane.org/gmane.comp.lang.scala.internals/2831
As Daniel points out, favouring laziness over strictness can simplify algorithms and make it easier to compose them.

Why should I avoid using local modifiable variables in Scala?

I'm pretty new to Scala and most of the time before I've used Java. Right now I have warnings all over my code saying that i should "Avoid mutable local variables" and I have a simple question - why?
Suppose I have small problem - determine max int out of four. My first approach was:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
var subMax1 = a
if (b > a) subMax1 = b
var subMax2 = c
if (d > c) subMax2 = d
if (subMax1 > subMax2) subMax1
else subMax2
}
After taking into account this warning message I found another solution:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
max(max(a, b), max(c, d))
}
def max(a: Int, b: Int): Int = {
if (a > b) a
else b
}
It looks more pretty, but what is ideology behind this?
Whenever I approach a problem I'm thinking about it like: "Ok, we start from this and then we incrementally change things and get the answer". I understand that the problem is that I try to change some initial state to get an answer and do not understand why changing things at least locally is bad? How to iterate over collection then in functional languages like Scala?
Like an example: Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6? Can't think of solution without local mutable variable.
In your particular case there is another solution:
def max4(a: Int, b: Int,c: Int, d: Int): Int = {
val submax1 = if (a > b) a else b
val submax2 = if (c > d) c else d
if (submax1 > submax2) submax1 else submax2
}
Isn't it easier to follow? Of course I am a bit biased but I tend to think it is, BUT don't follow that rule blindly. If you see that some code might be written more readably and concisely in mutable style, do it this way -- the great strength of scala is that you don't need to commit to neither immutable nor mutable approaches, you can swing between them (btw same applies to return keyword usage).
Like an example: Suppose we have a list of ints, how to write a
function that returns the sublist of ints which are divisible by 6?
Can't think of solution without local mutable variable.
It is certainly possible to write such function using recursion, but, again, if mutable solution looks and works good, why not?
It's not so related with Scala as with the functional programming methodology in general. The idea is the following: if you have constant variables (final in Java), you can use them without any fear that they are going to change. In the same way, you can parallelize your code without worrying about race conditions or thread-unsafe code.
In your example is not so important, however imagine the following example:
val variable = ...
new Future { function1(variable) }
new Future { function2(variable) }
Using final variables you can be sure that there will not be any problem. Otherwise, you would have to check the main thread and both function1 and function2.
Of course, it's possible to obtain the same result with mutable variables if you do not ever change them. But using inmutable ones you can be sure that this will be the case.
Edit to answer your edit:
Local mutables are not bad, that's the reason you can use them. However, if you try to think approaches without them, you can arrive to solutions as the one you posted, which is cleaner and can be parallelized very easily.
How to iterate over collection then in functional languages like Scala?
You can always iterate over a inmutable collection, while you do not change anything. For example:
val list = Seq(1,2,3)
for (n <- list)
println n
With respect to the second thing that you said: you have to stop thinking in a traditional way. In functional programming the usage of Map, Filter, Reduce, etc. is normal; as well as pattern matching and other concepts that are not typical in OOP. For the example you give:
Like an example: Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6?
val list = Seq(1,6,10,12,18,20)
val result = list.filter(_ % 6 == 0)
Firstly you could rewrite your example like this:
def max(first: Int, others: Int*): Int = {
val curMax = Math.max(first, others(0))
if (others.size == 1) curMax else max(curMax, others.tail : _*)
}
This uses varargs and tail recursion to find the largest number. Of course there are many other ways of doing the same thing.
To answer your queston - It's a good question and one that I thought about myself when I first started to use scala. Personally I think the whole immutable/functional programming approach is somewhat over hyped. But for what it's worth here are the main arguments in favour of it:
Immutable code is easier to read (subjective)
Immutable code is more robust - it's certainly true that changing mutable state can lead to bugs. Take this for example:
for (int i=0; i<100; i++) {
for (int j=0; j<100; i++) {
System.out.println("i is " + i = " and j is " + j);
}
}
This is an over simplified example but it's still easy to miss the bug and the compiler won't help you
Mutable code is generally not thread safe. Even trivial and seemingly atomic operations are not safe. Take for example i++ this looks like an atomic operation but it's actually equivalent to:
int i = 0;
int tempI = i + 0;
i = tempI;
Immutable data structures won't allow you to do something like this so you would need to explicitly think about how to handle it. Of course as you point out local variables are generally threadsafe, but there is no guarantee. It's possible to pass a ListBuffer instance variable as a parameter to a method for example
However there are downsides to immutable and functional programming styles:
Performance. It is generally slower in both compilation and runtime. The compiler must enforce the immutability and the JVM must allocate more objects than would be required with mutable data structures. This is especially true of collections.
Most scala examples show something like val numbers = List(1,2,3) but in the real world hard coded values are rare. We generally build collections dynamically (from a database query etc). Whilst scala can reassign the values in a colection it must still create a new collection object every time you modify it. If you want to add 1000 elements to a scala List (immutable) the JVM will need to allocate (and then GC) 1000 objects
Hard to maintain. Functional code can be very hard to read, it's not uncommon to see code like this:
val data = numbers.foreach(_.map(a => doStuff(a).flatMap(somethingElse)).foldleft("", (a : Int,b: Int) => a + b))
I don't know about you but I find this sort of code really hard to follow!
Hard to debug. Functional code can also be hard to debug. Try putting a breakpoint halfway into my (terrible) example above
My advice would be to use a functional/immutable style where it genuinely makes sense and you and your colleagues feel comfortable doing it. Don't use immutable structures because they're cool or it's "clever". Complex and challenging solutions will get you bonus points at Uni but in the commercial world we want simple solutions to complex problems! :)
Your two main questions:
Why warn against local state changes?
How can you iterate over collections without mutable state?
I'll answer both.
Warnings
The compiler warns against the use of mutable local variables because they are often a cause of error. That doesn't mean this is always the case. However, your sample code is pretty much a classic example of where mutable local state is used entirely unnecessarily, in a way that not only makes it more error prone and less clear but also less efficient.
Your first code example is more inefficient than your second, functional solution. Why potentially make two assignments to submax1 when you only ever need to assign one? You ask which of the two inputs is larger anyway, so why not ask that first and then make one assignment? Why was your first approach to temporarily store partial state only halfway through the process of asking such a simple question?
Your first code example is also inefficient because of unnecessary code duplication. You're repeatedly asking "which is the biggest of two values?" Why write out the code for that 3 times independently? Needlessly repeating code is a known bad habit in OOP every bit as much as FP and for precisely the same reasons. Each time you needlessly repeat code, you open a potential source of error. Adding mutable local state (especially when so unnecessary) only adds to the fragility and to the potential for hard to spot errors, even in short code. You just have to type submax1 instead of submax2 in one place and you may not notice the error for a while.
Your second, FP solution removes the code duplication, dramatically reducing the chance of error, and shows that there was simply no need for mutable local state. It's also, as you yourself say, cleaner and clearer - and better than the alternative solution in om-nom-nom's answer.
(By the way, the idiomatic Scala way to write such a simple function is
def max(a: Int, b: Int) = if (a > b) a else b
which terser style emphasises its simplicity and makes the code less verbose)
Your first solution was inefficient and fragile, but it was your first instinct. The warning caused you to find a better solution. The warning proved its value. Scala was designed to be accessible to Java developers and is taken up by many with a long experience of imperative style and little or no knowledge of FP. Their first instinct is almost always the same as yours. You have demonstrated how that warning can help improve code.
There are cases where using mutable local state can be faster but the advice of Scala experts in general (not just the pure FP true believers) is to prefer immutability and to reach for mutability only where there is a clear case for its use. This is so against the instincts of many developers that the warning is useful even if annoying to experienced Scala devs.
It's funny how often some kind of max function comes up in "new to FP/Scala" questions. The questioner is very often tripping up on errors caused by their use of local state... which link both demonstrates the often obtuse addiction to mutable state among some devs while also leading me on to your other question.
Functional Iteration over Collections
There are three functional ways to iterate over collections in Scala
For Comprehensions
Explicit Recursion
Folds and other Higher Order Functions
For Comprehensions
Your question:
Suppose we have a list of ints, how to write a function that returns sublist of ints which are divisible by 6? Can't think of solution without local mutable variable
Answer: assuming xs is a list (or some other sequence) of integers, then
for (x <- xs; if x % 6 == 0) yield x
will give you a sequence (of the same type as xs) containing only those items which are divisible by 6, if any. No mutable state required. Scala just iterates over the sequence for you and returns anything matching your criteria.
If you haven't yet learned the power of for comprehensions (also known as sequence comprehensions) you really should. Its a very expressive and powerful part of Scala syntax. You can even use them with side effects and mutable state if you want (look at the final example on the tutorial I just linked to). That said, there can be unexpected performance penalties and they are overused by some developers.
Explicit Recursion
In the question I linked to at the end of the first section, I give in my answer a very simple, explicitly recursive solution to returning the largest Int from a list.
def max(xs: List[Int]): Option[Int] = xs match {
case Nil => None
case List(x: Int) => Some(x)
case x :: y :: rest => max( (if (x > y) x else y) :: rest )
}
I'm not going to explain how the pattern matching and explicit recursion work (read my other answer or this one). I'm just showing you the technique. Most Scala collections can be iterated over recursively, without any need for mutable state. If you need to keep track of what you've been up to along the way, you pass along an accumulator. (In my example code, I stick the accumulator at the front of the list to keep the code smaller but look at the other answers to those questions for more conventional use of accumulators).
But here is a (naive) explicitly recursive way of finding those integers divisible by 6
def divisibleByN(n: Int, xs: List[Int]): List[Int] = xs match {
case Nil => Nil
case x :: rest if x % n == 0 => x :: divisibleByN(n, rest)
case _ :: rest => divisibleByN(n, rest)
}
I call it naive because it isn't tail recursive and so could blow your stack. A safer version can be written using an accumulator list and an inner helper function but I leave that exercise to you. The result will be less pretty code than the naive version, no matter how you try, but the effort is educational.
Recursion is a very important technique to learn. That said, once you have learned to do it, the next important thing to learn is that you can usually avoid using it explicitly yourself...
Folds and other Higher Order Functions
Did you notice how similar my two explicit recursion examples are? That's because most recursions over a list have the same basic structure. If you write a lot of such functions, you'll repeat that structure many times. Which makes it boilerplate; a waste of your time and a potential source of error.
Now, there are any number of sophisticated ways to explain folds but one simple concept is that they take the boilerplate out of recursion. They take care of the recursion and the management of accumulator values for you. All they ask is that you provide a seed value for the accumulator and the function to apply at each iteration.
For example, here is one way to use fold to extract the highest Int from the list xs
xs.tail.foldRight(xs.head) {(a, b) => if (a > b) a else b}
I know you aren't familiar with folds, so this may seem gibberish to you but surely you recognise the lambda (anonymous function) I'm passing in on the right. What I'm doing there is taking the first item in the list (xs.head) and using it as the seed value for the accumulator. Then I'm telling the rest of the list (xs.tail) to iterate over itself, comparing each item in turn to the accumulator value.
This kind of thing is a common case, so the Collections api designers have provided a shorthand version:
xs.reduce {(a, b) => if (a > b) a else b}
(If you look at the source code, you'll see they have implemented it using a fold).
Anything you might want to do iteratively to a Scala collection can be done using a fold. Often, the api designers will have provided a simpler higher-order function which is implemented, under the hood, using a fold. Want to find those divisible-by-six Ints again?
xs.foldRight(Nil: List[Int]) {(x, acc) => if (x % 6 == 0) x :: acc else acc}
That starts with an empty list as the accumulator, iterates over every item, only adding those divisible by 6 to the accumulator. Again, a simpler fold-based HoF has been provided for you:
xs filter { _ % 6 == 0 }
Folds and related higher-order functions are harder to understand than for comprehensions or explicit recursion, but very powerful and expressive (to anybody else who understands them). They eliminate boilerplate, removing a potential source of error. Because they are implemented by the core language developers, they can be more efficient (and that implementation can change, as the language progresses, without breaking your code). Experienced Scala developers use them in preference to for comprehensions or explicit recursion.
tl;dr
Learn For comprehensions
Learn explicit recursion
Don't use them if a higher-order function will do the job.
It is always nicer to use immutable variables since they make your code easier to read. Writing a recursive code can help solve your problem.
def max(x: List[Int]): Int = {
if (x.isEmpty == true) {
0
}
else {
Math.max(x.head, max(x.tail))
}
}
val a_list = List(a,b,c,d)
max_value = max(a_list)