Is it Scala style to use a for loop in Scala/Spark? - scala

I have heard that it is good practice in Scala to eliminate for loops and do things "the Scala way". I even found a Scala style checker at http://www.scalastyle.org. Are for loops a no-no in Scala? In a course at https://www.udemy.com/course/apache-spark-with-scala-hands-on-with-big-data/learn/lecture/5363798#overview I found this example, which makes me think that for loops are okay to use as long as they use Scala's format and syntax: a single line, not the traditional multi-line Java for loop. See this example I found in that Udemy course:
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
That for loop prints this result, as expected:
Enterprise
Defiant
Voyager
Deep Space Nine
I was wondering if using for as in the example above is acceptable Scala style, or if it is a no-no and why. Thank you!

There is no problem with this for loop, but you can use the methods of List to do the same work in a more functional way.
e.g. instead of using
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
for (ship <- shipList) {println(ship)}
You can use
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")
shipList.foreach(element => println(element) )
or
shipList.foreach(println)

You can use for loops in Scala; there is no problem with that. The difference is that a for loop without yield only performs side effects and evaluates to Unit, so to produce a result from it you need to update a mutable variable. Scala prefers working with immutable values.
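A small sketch of the difference between a for loop and a for expression (assumes Scala 2.13 or 3, run in the REPL or as a scala-cli script). Without `yield` the result is just Unit; with `yield` the same syntax builds a new collection, and the imperative version needs a `var` accumulator.

```scala
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")

// A for loop without `yield` only runs side effects; its type is Unit
val loopResult: Unit = for (ship <- shipList) println(ship)

// Adding `yield` turns it into an expression that builds a new List
val lengths: List[Int] = for (ship <- shipList) yield ship.length

// The imperative alternative needs a mutable accumulator
var total = 0
for (ship <- shipList) total += ship.length
```

Here `lengths` is `List(10, 7, 7, 15)` and `total` ends up as 39.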
In your example you print messages to the console, which is a "side effect" that breaks referential transparency: the result depends on an IO operation. Alternatively, if the loop mutates a variable in an outer scope that may also be accessed by another thread or concurrent task, there is no guarantee that the value you collect is the one you expect. These concerns mainly arise in concurrent/parallel programming, and that is where Scala and the immutable style help.
A for loop is fine for showing the elements of a collection, but if you want to count the total number of characters, in Scala you would do it with an expression like:
val chars = shipList.foldLeft(0)((a, b) => a + b.length)
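For comparison, a sketch of the same count written a few expression-based ways (standard library only, Scala 2.13 or 3). None of them needs a `var`.

```scala
val shipList = List("Enterprise", "Defiant", "Voyager", "Deep Space Nine")

// foldLeft: start from 0 and add each name's length to the accumulator
val viaFold: Int = shipList.foldLeft(0)((acc, ship) => acc + ship.length)

// map + sum: turn each name into its length, then add them up
val viaMapSum: Int = shipList.map(_.length).sum

// Same as above, but through an iterator so no intermediate List is built
val viaIterator: Int = shipList.iterator.map(_.length).sum
```

All three evaluate to 39; `map(_.length).sum` is often the most readable, while `foldLeft` is the general tool when the accumulation is not a plain sum.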
To sum up, most of the Scala code you will read uses an immutable style of programming. Not always, because Scala supports the other style too, but it is unusual to find code written in a classic Java OOP style that mutates object instances through getters and setters.

Related

How to find max date from stream in scala using CompareTo?

I am new to Scala and trying to explore how I can use Java functionality from Scala.
I have a stream of LocalDate, which is a Java class, and I am trying to find the maximum date in my list.
var processedResult : Stream[LocalDate] =List(javaList)
.toStream
.map { s => {
//some processing
LocalDate.parse(str, formatter)
}
}
I know we can do this easily by using .compare() and .compareTo() in Java, but I am not sure how to do the same thing here.
Also, I have no idea how Ordering works in Scala when it comes to sorting.
Can anyone suggest how I can get this done?
First of all, I will point out a number of minor details, since it seems you are pretty new to the language and I expect them to help you along your learning path.
First, avoid var at all costs, especially when learning.
While mutability has its place and is not always wrong, forcing yourself to avoid it while learning will help you. In particular, avoid it when it doesn't provide any value, like in this case.
Second, this List(javaList) doesn't do what you think it does. It creates a single-element Scala List whose only element is a Java List. What you probably want is to transform that Java List into a Scala one; for that you can use the CollectionConverters.
import scala.jdk.CollectionConverters._ // This works if you are in 2.13
// if you are in 2.12 or lower use: import scala.collection.JavaConverters._
val scalaList = javaList.asScala.toList
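A sketch contrasting the two, assuming Scala 2.13+ and that `javaList` holds ISO-8601 date strings (the question elides the real contents and formatter, so the data and the use of `LocalDate.parse` without a formatter are assumptions):

```scala
import java.time.LocalDate
import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12 use scala.collection.JavaConverters._

// Hypothetical stand-in for the question's javaList
val javaList: java.util.List[String] =
  java.util.Arrays.asList("2021-03-01", "2023-07-15", "2022-11-30")

// Not what you want: a one-element Scala List whose only element is the Java list
val wrapped: List[java.util.List[String]] = List(javaList)

// What you want: convert the elements, then parse each one
val dates: List[LocalDate] = javaList.asScala.toList.map(s => LocalDate.parse(s))
```

The lambda `s => LocalDate.parse(s)` is used instead of `map(LocalDate.parse)` because `parse` is overloaded, which can confuse eta-expansion.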
Third, I'm not sure why you want to use a Scala Stream. A Stream is for infinite or very large collections where you want all transformations to be applied lazily, producing elements only as they are consumed (also, by the way, it was deprecated in 2.13 in favour of LazyList).
Maybe, you are confused because in Java you need a "Stream" to apply functional operations like map? If so, note that in Scala all collections provide the same rich API.
Fourth, Ordering is a typeclass, which is a functional pattern for polymorphism. That is a very broad topic on its own, so I won't cover it fully here.
The TL;DR is simple: an Ordering for a type T knows how to order (sort) elements of type T. Operations like max therefore work for any collection of any type if, and only if, the compiler can prove that an Ordering exists for that type. If it can, it passes that value implicitly to the method call for you; again, implicits are a broad topic that deserves its own question.
Now for your particular question, you can just call max or maxOption in the List or Stream and that is all.
Note that max throws an exception if the List is empty, whereas maxOption (available since Scala 2.13) returns an Option that is empty (None) for an empty input; idiomatic Scala favours the latter.
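A quick sketch of that difference (assumes Scala 2.13+, where `maxOption` exists and the implicit `Ordering[LocalDate]` is derived from `Comparable`; the dates are made-up sample data):

```scala
import java.time.LocalDate
import scala.util.Try

val dates = List(LocalDate.of(2021, 3, 1), LocalDate.of(2023, 7, 15), LocalDate.of(2022, 11, 30))
val noDates = List.empty[LocalDate]

// Both use the implicit Ordering[LocalDate]
val newest: LocalDate = dates.max
val newestOpt: Option[LocalDate] = dates.maxOption
val nothing: Option[LocalDate] = noDates.maxOption // None, no exception

// max on an empty list throws (UnsupportedOperationException), so Try fails
val emptyMaxFails: Boolean = Try(noDates.max).isFailure
```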
If you really want to use compareTo then you can provide your own Ordering.
scalaList.maxOption(Ordering.fromLessThan[LocalDate]((d1, d2) => d1.compareTo(d2) < 0))
Ordering[A] is a type class which defines how to compare 2 elements of type A. So to compare LocalDates you need an Ordering[LocalDate] instance.
LocalDate implements Java's Comparable (through ChronoLocalDate), and Scala conveniently provides Ordering instances for Comparable types, so when you invoke:
Ordering[java.time.LocalDate]
in the REPL you'll see that Scala is able to provide the instance without you needing to do anything (you could take a look at the list of methods this typeclass provides).
Since there is an Ordering in implicit scope whose type matches the Stream's element type (Stream[LocalDate] needs an Ordering[LocalDate]), you can call the .max method... and that's it.
val processedResult : Stream[LocalDate] = ...
val newestDate: LocalDate = processedResult.max
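To look at "the list of methods provided by this typeclass" in practice, here is a sketch that summons the instance and uses a few of its standard methods (`gt`, `max`, `reverse`); it assumes Scala 2.13+ and uses sample dates:

```scala
import java.time.LocalDate

// Summon the implicit instance explicitly
val ord: Ordering[LocalDate] = Ordering[LocalDate]

val a = LocalDate.of(2022, 1, 1)
val b = LocalDate.of(2023, 1, 1)

val bIsLater: Boolean = ord.gt(b, a)
val later: LocalDate = ord.max(a, b)

val dates = List(b, a, LocalDate.of(2021, 6, 30))
// sorted picks up the same implicit Ordering; reverse flips it for newest-first
val oldestFirst: List[LocalDate] = dates.sorted
val newestFirst: List[LocalDate] = dates.sorted(ord.reverse)
```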

Is the reason we can use val defining functions in Scala?

Is the reason a val variable can be used to contain a function definition is because functions are first class citizens where they can be contained in variables?
In Scala damn near everything is an expression. From a practical perspective, that means pretty much every syntactically correct piece of Scala code you can write evaluates to an object that you can do more Scala on: call a method on it, pass it to a function, or store it in a val. A function literal is no exception, which is why it can be bound to a val. Expressions can be thought of in contrast to statements, which are just instructions to the computer to do something; import clauses are an example of statements in Scala. The heavy prevalence of expressions in Scala is a deliberate design choice intended to make the language more flexible and extensible.
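A small sketch of what that looks like (standard library only; `double` and `triple` are illustrative names). The function literal is a value, so it can sit in a val, be passed to `map`, and have methods like `andThen` called on it; a `def` is a method, which Scala converts to a function value when a function type is expected.

```scala
// A function literal evaluates to a Function1 object, so it can be bound to a val
val double: Int => Int = x => x * 2

// Being a value, it can be passed to other functions and has methods of its own
val doubledList: List[Int] = List(1, 2, 3).map(double)
val quadruple: Int => Int = double.andThen(double)

// A def is a method, not a value; with an expected function type it is eta-expanded
def triple(x: Int): Int = x * 3
val tripleFn: Int => Int = triple
```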

Is this bad style to replace map or flatMap with return and get on Try in Scala?

I'm wondering if it is bad style to replace map (or flatMap) with return and get on a Try for readability. Say I have some variable with a Try inside it, and then I need to do something with it.
val myData: Try[String]
I can do:
myData.flatMap{
some long code
}
Or I can do:
if (myData.isFailure) return myData
val myString = myData.get
some long code that use myString
Yes, I would regard this as bad style. One of the biggest gains I have made as a programmer has come from replacing statements with expressions. So I would rewrite your example using pattern matching:
import scala.util.{Failure, Success}

myData match {
  case Success(myString) =>
    ??? // some long code that uses myString, producing a Try
  case Failure(e) => Failure(e) // propagate the original failure
}
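A runnable sketch of both expression-based versions. `parsePort` and `describe` are hypothetical stand-ins for "some long code"; the point is that neither version needs `return` or `get`, and a Failure is passed through unchanged.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical processing step standing in for "some long code"
def parsePort(s: String): Try[Int] = Try(s.trim.toInt)

// Pattern matching: both cases are expressions producing a Try[String]
def describe(myData: Try[String]): Try[String] =
  myData match {
    case Success(myString) => parsePort(myString).map(port => s"port $port")
    case Failure(e)        => Failure(e) // propagate the original failure
  }

// flatMap: the Failure case is short-circuited for you
def describeFlat(myData: Try[String]): Try[String] =
  myData.flatMap(myString => parsePort(myString).map(port => s"port $port"))
```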
I understand that if-guards are very common in languages like Java and C#, and in those languages they even make the code easier to read. Since moving to Scala I find fewer cases where if-guards would improve my code, whereas the move to expressions has greatly improved the quality of all of my code.
Once you make this change you will gradually use less mutable state and fewer side effects. Your methods will gradually become pure functions, and they will probably become shorter. Then you will discover the benefits of total functions. These things improve all of your code, whereas if-guards are local optimisations that will start to clash with modern Scala code and can even make it error-prone to change.
In the case above I would probably consider exposing the Try; it makes the failure case more explicit than returning a default or error value of the same return type.
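Another reason that return is discouraged is that it does not always play nicely inside functions: a return inside a lambda is a non-local return from the enclosing method, implemented by throwing an exception.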
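A sketch of how that can bite (illustrative function names; behaviour as in Scala 2.13, and Scala 3 deprecates non-local returns in favour of `scala.util.boundary`). The `return` inside the `foreach` lambda is compiled to a thrown control exception, so a broad `catch` inside the lambda silently swallows it.

```scala
// `return` inside the lambda returns from firstAbove itself (a non-local return)
def firstAbove(xs: List[Int], limit: Int): Int = {
  xs.foreach { x => if (x > limit) return x }
  -1
}

// The same code with a catch-all inside the lambda: the control exception
// used to implement `return` is caught, so the return never happens
def firstAboveSwallowed(xs: List[Int], limit: Int): Int = {
  xs.foreach { x =>
    try { if (x > limit) return x }
    catch { case _: Throwable => () } // also catches the non-local return
  }
  -1
}
```

`firstAbove(List(1, 5, 9), 3)` gives 5, while `firstAboveSwallowed(List(1, 5, 9), 3)` gives -1. An expression such as `xs.find(_ > limit)` avoids the problem entirely.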
My advice is to embrace the more functional aspects of Scala and see where it takes you. It may take you more time to write than the equivalent in Java at first, but it pays dividends very quickly.

for vs map in functional programming

I am learning functional programming using Scala. In general I notice that for loops are not used much in functional programs; instead they use map.
Questions
What are the advantages of using map over a for loop in terms of performance, readability, etc.?
What is the point of having a map function when the same thing can be achieved with a loop?
Program 1: Using For loop
val num = 1 to 1000
val another = 1000 to 2000
for ( i <- num )
{
for ( j <- another)
{
println(i,j)
}
}
Program 2 : Using map
val num = 1 to 1000
val another = 1000 to 2000
val mapper = num.map(x => another.map(y => (x,y))).flatten
mapper.map(x=>println(x))
Both program 1 and program 2 do the same thing.
The answer is quite simple actually.
Whenever you use a loop over a collection, it has a semantic purpose. Either you want to iterate over the items of the collection and, say, print them. Or you want to transform the elements to another type (map). Or you want to change the cardinality, such as computing the sum of the elements of a collection (fold).
Of course, all of that can also be done with for loops, but for the reader of the code it is more work to figure out which semantic purpose the loop has, compared to a well-known named operation such as map, foreach, fold, filter, ...
Another aspect is, that for loops lead to the dark side of using mutable state. How would you sum the elements of a collection in a for loop without mutable state? You would not. Instead you would need to write a recursive function. So, for good measure, it is best to drop the habit of thinking in for loops early and enjoy the brave new functional way of doing things.
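A side-by-side sketch of summing a list: the loop needs a `var`, while the fold and the recursive function (both standard techniques, nothing library-specific) do not.

```scala
val xs = List(3, 1, 4, 1, 5)

// Imperative: a for loop can only accumulate through a mutable variable
var loopSum = 0
for (x <- xs) loopSum += x

// Functional: foldLeft threads the accumulator for you
val foldSum: Int = xs.foldLeft(0)(_ + _)

// Functional: explicit recursion over the list structure
def recSum(list: List[Int]): Int = list match {
  case Nil          => 0
  case head :: tail => head + recSum(tail)
}
```

All three give 14. Note that `recSum` is not tail-recursive, so it grows the stack with the length of the list.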
I'll start by quoting Programming in Scala.
"Every for expression can be expressed in terms of the three higher-order functions map, flatMap and filter. This section describes the translation scheme, which is also used by the Scala compiler."
http://www.artima.com/pins1ed/for-expressions-revisited.html#23.4
So the reason that you noticed for-loops are not used as much is because they technically aren't needed, and any for expressions you do see are just syntactic sugar which the compiler will translate into some equivalent. The rules for translating a for expression into a map/flatMap/filter expression are listed in the link above.
Generally speaking, in functional programming there is no index variable to mutate. This means one typically makes heavy use of function calls (often in the form of recursion) such as list folds in place of a while or for loop.
For a good example of using list folds in place of while/for loops, I recommend "Explain List Folds to Yourself" by Tony Morris.
https://vimeo.com/64673035
If a function is tail-recursive, it can be optimized so that it does not incur the heavy stack use that is common in recursive functions: the compiler translates it into the "while loop equivalent". Annotating it with @tailrec makes the compiler verify that this optimization is possible and report an error if it is not.
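A minimal sketch of a tail-recursive sum using `scala.annotation.tailrec` (the helper names are illustrative). The recursive call to `loop` is the last thing it does, and the running total is carried in an accumulator.

```scala
import scala.annotation.tailrec

def sumTR(xs: List[Int]): Int = {
  // @tailrec: compilation fails if `loop` is not actually tail-recursive
  @tailrec
  def loop(rest: List[Int], acc: Int): Int = rest match {
    case Nil          => acc
    case head :: tail => loop(tail, acc + head)
  }
  loop(xs, 0)
}

// A long list; the compiled loop uses constant stack space
val big: Int = sumTR(List.fill(1000000)(1))
```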
To answer the second part of Question 1, there are some cases where one could make an argument that a for expression is clearer (although certainly there are cases where the opposite is true too.) One such example is given in the Coursera.org course "Functional Programming with Scala" by Dr. Martin Odersky:
for {
i <- 1 until n
j <- 1 until i
if isPrime(i + j)
} yield (i, j)
is arguably more clear than
(1 until n).flatMap(i =>
(1 until i).withFilter(j => isPrime(i + j))
.map(j => (i, j)))
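The two snippets above assume an `isPrime` function and a value `n`. Below is a self-contained sketch with a simple trial-division `isPrime` (an assumption, not necessarily the course's definition) that checks both forms produce the same pairs:

```scala
// Hypothetical primality test by trial division
def isPrime(n: Int): Boolean = n >= 2 && (2 until n).forall(d => n % d != 0)

val n = 7

val viaFor = for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)

val viaHof = (1 until n).flatMap(i =>
  (1 until i).withFilter(j => isPrime(i + j))
    .map(j => (i, j)))
```

For `n = 7` both yield the pairs (2,1), (3,2), (4,1), (4,3), (5,2), (6,1), (6,5).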
For more information check out Dr. Martin Odersky's "Functional Programming with Scala" course on Coursera.org. Lecture 6.5 "Translation of For" in particular discusses this in more detail.
Also, as a quick side note, in your example you use
mapper.map(x => println(x))
It is generally preferred to use foreach in this case, because your intent is to perform a side effect. There is also a shorthand:
mapper.foreach(println)
As for Question 2, it is better to use the map function in place of loops (especially when there is mutation in the loop) because map is a function and it can be composed. Also, once one is acquainted and used to using map, it is very easy to reason about.
The two programs that you have provided are not the same, even if the output might suggest that they are. It is true that for comprehensions are de-sugared by the compiler, but the first program you have is actually equivalent to:
val num = 1 to 1000
val another = 1000 to 2000
num.foreach(i => another.foreach(j => println(i,j)))
It should be noted that the resulting type of the above (and of your example program) is Unit.
In the case of your second program, the resulting type, as determined by the compiler, is IndexedSeq[Unit]: a Seq whose length is the product of the sizes of the two ranges. So you should always use foreach to indicate an effect that results in Unit.
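A scaled-down sketch (small ranges instead of 1 to 1000 and 1000 to 2000) that makes the two result types visible:

```scala
val num = 1 to 3
val another = 10 to 11

// foreach: runs the effect and returns Unit
val viaForeach: Unit = num.foreach(i => another.foreach(j => println((i, j))))

// map: prints the same pairs, but also builds a collection of Unit values
val mapper = num.map(x => another.map(y => (x, y))).flatten
val viaMap: IndexedSeq[Unit] = mapper.map(x => println(x))
```

`viaMap` has 3 × 2 = 6 elements, all `()`, which is pure overhead when only the printing matters.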
Think about what is happening at the machine-language level. Loops are still fundamental. Functional programming abstracts the loop that is implemented in conventional programming.
Essentially, instead of writing a loop as you would in conventional or imperative programming, chaining or pipelining operations in functional programming allows the compiler to optimize the code for you, and map simply applies the function to each element as a list or collection is iterated through. Functional programming is more convenient and abstracts away the mundane implementation of for loops and the like. There are limitations to this convenience, particularly if you intend to use functional programming to implement parallel processing.
Whether the compiler will be more efficient, and know ahead of time the situation the code runs in, is arguable and depends on the software engineer or developer. IMHO, mid-level software engineers who are familiar with functional programming, well versed in conventional programming, and knowledgeable in parallel processing will use both conventional and functional styles.

The case for point free style in Scala

This may seem really obvious to the FP cognoscenti here, but what is point free style in Scala good for? What would really sell me on the topic is an illustration that shows how point free style is significantly better in some dimension (e.g. performance, elegance, extensibility, maintainability) than code solving the same problem in non-point free style.
Quite simply, it's about being able to avoid specifying a name where none is needed, consider a trivial example:
List("a","b","c") foreach println
In this case, foreach expects a String => Unit: a function that accepts a String and returns Unit (essentially, there is no usable return value and it works purely through side effects).
There's no need to bind a name here to each String instance that's passed to println. Arguably, it just makes the code more verbose to do so:
List("a","b","c") foreach {println(_)}
Or even
List("a","b","c") foreach {s => println(s)}
Personally, when I see code that isn't written in point-free style, I take it as an indicator that the bound name may be used twice, or that it has some significance in documenting the code. Likewise, I see point-free style as a sign that I can reason about the code more simply.
One appeal of point-free style in general is that without a bunch of "points" (values as opposed to functions) floating around, which must be repeated in several places to thread them through the computation, there are fewer opportunities to make a mistake, e.g. when typing a variable's name.
However, the advantages of point-free are quickly counterbalanced in Scala by its meagre ability to infer types, a fact which is exacerbated by point-free code because "points" serve as clues to the type inferencer. In Haskell, with its almost-complete type inferencing, this is usually not an issue.
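A sketch of point-free composition next to its pointed equivalent, plus the type-inference caveat (illustrative names, standard library only):

```scala
val inc: Int => Int = _ + 1
val double: Int => Int = _ * 2

// Pointed: the argument x is named and threaded through explicitly
val incThenDoublePointed: Int => Int = x => double(inc(x))

// Point-free: built purely by composing functions, no argument named
val incThenDouble: Int => Int = inc andThen double

// Without an expected type Scala cannot infer the placeholder's type,
// so `val bad = _ + 1` does not compile; the type must be written out
val incAnnotated = (_: Int) + 1
```

Both composed functions map 3 to 8. The explicit `Int => Int` annotations are exactly the "clues" that the points would otherwise give the type inferencer.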
I see no advantage other than "elegance": it's a little bit shorter and may be more readable. It allows you to reason about functions as entities without mentally going a "level deeper" to function application, but of course you need to get used to it first.
I don't know any example where performance improves by using it (maybe it gets worse in cases where you end up with a function when a method would be sufficient).
Scala's point-free syntax is part of the magic of Scala operators-which-are-really-methods. Even the most basic operators are methods:
For example:
val x = 1
val y = x + 1
...is the same as...
val x = 1
val y = x.+(1)
...but of course, the point-free style reads more naturally (the plus appears to be an operator).
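The same holds for user-defined types. A sketch with a hypothetical `Vec` class: `+` is just a method name, and infix notation is ordinary method-call syntax.

```scala
// Hypothetical 2D vector type; `+` is an ordinary method
final case class Vec(x: Int, y: Int) {
  def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
}

val a = Vec(1, 2)
val b = Vec(3, 4)

val infix = a + b   // operator (infix) notation
val dotted = a.+(b) // the same call in method notation
```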