if statement inside for loop scala - scala

I'm new to Scala and I'm trying to learn it. I'm stuck on a very simple thing which I cannot figure out. Inside a for loop, I simply want to check whether the current item is present in a map; if so, use that value, otherwise compute it with a function.
It should be pretty easy, but coming from Python I cannot see where the mistake is.
val predictions = for (review <- testValuesAsList;
if (storage.contains("review")){prediction = mymap("review")}
else {prediction = computeItemAvgRat(review._2, trainValuesAsList)}
) yield (review._3, prediction)
return predictions

There are no for loops in Scala, only for comprehensions; they are just syntactic sugar for map, flatMap and withFilter calls, so an if/else statement cannot appear where you put it. That is what is making your code fail.
IMHO, the for syntax is rarely better than just calling the combinators explicitly.
val predictions = testValuesAsList.map {
  case (_, b, c) => // I don't know what would be proper names for the variables.
    val prediction = myMap.getOrElse(
      key = "review", // note: a literal key, kept from your code; you probably want a field of the tuple here
      default = computeItemAvgRat(b, trainValuesAsList)
    )
    c -> prediction
}
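For reference, here is a small, self-contained sketch of how a guard in a for comprehension desugars, which is why you cannot write imperative if/else statements inside one:
val xs = List(1, 2, 3, 4)
val viaFor = for (x <- xs if x % 2 == 0) yield x * 10
val viaCombinators = xs.withFilter(x => x % 2 == 0).map(x => x * 10)
// viaFor == viaCombinators == List(20, 40)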


What is the advantage of using Option.map over Option.isEmpty and Option.get?

I am new to Scala, coming from a Java background, and currently confused about best practice concerning Option[T].
I feel like using Option.map is just more functional and beautiful, but that is not a good argument for convincing other people. Sometimes, an isEmpty check feels more straightforward and thus more readable. Are there any objective advantages, or is it just personal preference?
Example:
Variation 1:
someOption.map { value =>
  //some lines of code
} orElse(foo)
Variation 2:
if (someOption.isEmpty) {
  foo
} else {
  val value = someOption.get
  //some lines of code
}
I intentionally excluded the options to use fold or pattern matching. I am simply not pleased by the idea of treating Option as a collection right now, and using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO. But no matter why I dislike these options, I want to keep the scope of this question to be the above two variations as named in the title.
Is there any objective advantages, or is it just personal preference?
I think there's a thin line between objective advantages and personal preference. You cannot make one believe there is an absolute truth to either one.
The biggest advantage one gains from using the monadic nature of Scala constructs is composition. The ability to chain operations together without having to "worry" about the internal value is powerful, not only with Option[T], but also when working with Future[T], Try[T] and Either[A, B], and when going back and forth between them (see also Monad Transformers).
Let's try and see how using predefined methods on Option[T] can help with control flow. For example, consider a case where you have an Option[Int] which you want to multiply only if it's greater than a value, otherwise return -1. In the imperative approach, we get:
val option: Option[Int] = generateOptionValue
var res: Int = if (option.isDefined) {
  val value = option.get
  if (value > 40) value * 2 else -1
} else -1
Using collection-style methods on Option, an equivalent would look like:
val result: Int = option
  .filter(_ > 40)
  .map(_ * 2)
  .getOrElse(-1)
Let's now consider a case for composition. Say we have an operation which might throw an exception. Additionally, this operation may or may not yield a value. If it returns a value, we want to query a database with that value; otherwise, return an empty string.
A look at the imperative approach with a try-catch block:
import scala.util.control.NonFatal

var result: String = "" // note: "= _" initialization only works for class fields, not local vars
try {
  val maybeResult = dangerousMethod()
  if (maybeResult.isDefined) {
    result = queryDatabase(maybeResult.get)
  } else result = ""
} catch {
  case NonFatal(e) => result = ""
}
Now let's consider using scala.util.Try along with an Option[String] and composing both together:
val result: String = Try(dangerousMethod())
  .toOption
  .flatten
  .map(queryDatabase)
  .getOrElse("")
I think this eventually boils down to which one can help you create clear control flow of your operations. Getting used to working with Option[T].map rather than Option[T].get will make your code safer.
To wrap up, I don't believe there's a single truth. I do believe that composition can lead to beautiful, readable, side-effect-deferring, safe code, and I'm all for it. I think the best way to show other people what you mean is by giving them examples as we just saw, and letting them feel for themselves the power they can leverage with these tools.
using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO
If you do just want an isEmpty check, isEmpty/isDefined is perfectly fine. But in your case you also want to get the value, and using pattern matching for that is not abuse; it's precisely the basic use case. Using get makes it very easy to make errors like forgetting to check isDefined or making the wrong check:
if (someOption.isEmpty) {
  val value = someOption.get // the check is inverted: this throws NoSuchElementException
  //some lines of code
} else {
  //some other lines
}
Hopefully testing would catch it, but there's no reason to settle for "hopefully".
Combinators (map and friends) are better than get for the same reason pattern matching is: they don't allow you to make this kind of mistake. Choosing between pattern matching and combinators is a different question. Generally combinators are preferred because they are more composable (as Yuval's answer explains). If you want to do something covered by a single combinator, I'd generally choose them; if you need a combination like map ... getOrElse, or a fold with multi-line branches, it depends on the specific case.
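For reference, the two spellings mentioned above are equivalent; a quick sketch with a hypothetical Option[Int]:
val opt: Option[Int] = Some(2)
val viaMap = opt.map(_ * 2).getOrElse(-1) // 4, and -1 for None
val viaFold = opt.fold(-1)(_ * 2) // same result with a single combinator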
The two may seem interchangeable in the case of Option, but consider the case of Future: you will not be able to interact with the future's value once you leave the Future monad.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Promise
import scala.util.{Success, Try}
// create a promise which we will complete after some time
val promise = Promise[String]()
// Now let's consider the future contained in this promise
val future = promise.future
val byGet = if (!future.value.isEmpty) {
  val valTry = future.value.get
  valTry match {
    case Success(v) => v + " :: Added"
    case _ => "DEFAULT :: Added"
  }
} else "DEFAULT :: Added"
val byMap = future.map(s => s + " :: Added")
// now we complete the promise
promise.complete(Try("PROMISE"))
// Now let's print both values
println(byGet)
// DEFAULT :: Added
println(byMap)
// Success(PROMISE :: Added)

Scala practices: lists and case classes

I've just started using Scala/Spark. Coming from a Java background, I'm still trying to wrap my head around immutability and the other best practices of Scala.
This is a very small segment of code from a larger program:
intersections is an RDD[(Key, (String, String))]
obs is a (Key, (String, String))
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
  listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!
This might work locally, but only because you are calling take, which brings the results back to the driver. It will not work once distributed, as listOfDatas would be passed to each worker as a copy. The better way of doing this, IMO, is:
val processedData = intersections map { case (key, (item1, item2)) =>
  ParseInformation(key, item1, item2)
}
processedData foreach println
A note for a dev new to functional programming: if all you are trying to do is transform the data in an iterable (List), forget foreach. Use map instead, which runs your transformation on each item and spits out a new iterable of the results.
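For instance, a minimal illustration with plain Scala collections:
val xs = List(1, 2, 3)
val doubled = xs.map(_ * 2) // List(2, 4, 6); xs itself is untouched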
What's the type of intersections? It looks like you can replace foreach with map. Note that on an RDD, take returns a local Array, hence the toList at the end:
val listOfDatas: List[Data] =
  (intersections take NumOutputs map (obs => {
    ParseInformation(obs._1.key, obs._2._1, obs._2._2)
  })).toList

Spark's RDD.map() will not execute unless the item inside RDD is visited

I'm not quite sure how Scala and Spark work; maybe I wrote the code the wrong way.
The function I want to achieve is, for a given Seq[(String, Int)], to assign a random item from v._2.path to _._2.
To do that, I implemented a method and call this method on the next line:
def getVerticesWithFeatureSeq(graph: Graph[WikiVertex, WikiEdge.Value]): RDD[(VertexId, WikiVertex)] = {
  graph.vertices.map(v => {
    //For each token in the sequence, assign an article to it based on its path (root to this node)
    println(v._1 + " before " + v._2.featureSequence)
    v._2.featureSequence = v._2.featureSequence.map(f => (f._1, v._2.path.apply(new scala.util.Random().nextInt(v._2.path.size))))
    println(v._1 + " after " + v._2.featureSequence)
    (v._1, v._2)
  })
}
val dt = getVerticesWithFeatureSeq(wikiGraph)
When I execute it, I expected the println calls to print something, but they didn't.
If I add another line of code
dt.foreach(println)
the println inside map prints correctly.
Is there some laziness to Spark's code execution? As in: if nothing accesses a value, is the computation deferred or even skipped?
Is graph.vertices an RDD? That would explain your issue, since Spark transformations are lazy and nothing runs until an action is executed, foreach in your case:
val dt = getVerticesWithFeatureSeq(wikiGraph) //no result is computed yet, map transformation is 'recorded'
dt.foreach(println) //foreach action requires a result, this triggers the computation
RDDs remember the transformations applied to them, and these are only computed when an action requires a result to be returned to the driver program.
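For instance, a minimal sketch (assuming an existing SparkContext named sc):
val nums = sc.parallelize(1 to 3)
val mapped = nums.map { n => println(s"mapping $n"); n * 2 } // transformation: nothing prints yet
mapped.collect() // action: the computation runs now, and the printlns fire on the executors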
You can check http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations for further details and a list of available transformations and actions.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I have need of the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluator to the next to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {

  val d = List("1", "2", "3") // some expressions to evaluate.
  type ResType = Int
  case class State(i: ResType) // trivial state for experiment purposes
  val initialState = State(0)

  // my stub/dummy "eval" function...obviously the real one will be...real.
  def computeResultAndNewState(s: String, st: State): (ResType, State) = {
    val State(i) = st
    val res = s.toInt + i
    val newStateInt = i + 1
    (res, State(newStateInt))
  }
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
  var state = initialState
  val r = d.map { s =>
    val (result, newState) = computeResultAndNewState(s, state)
    state = newState
    result
  }
  println(r)
  println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
  // This startFold thing requires an explicit type. That alone makes it muddy.
  val startFold: (List[ResType], State) = (Nil, initialState)
  val (r, state) = d.foldLeft(startFold) {
    case ((tail, st), s) =>
      val (r, ns) = computeResultAndNewState(s, st)
      (tail :+ r, ns) // we want a constant-time append here, not O(N). Or could cons on the front and reverse later.
  }
  println(r)
  println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
  lazy val states = initialState #:: resultStates // there are states
  lazy val args = d.toStream // there are arguments
  lazy val argPairs = args zip states // put them together
  lazy val resPairs: Stream[(ResType, State)] = argPairs.map { case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
  lazy val (results, resultStates) = myUnzip(resPairs) // Note: .unzip causes an infinite loop. Had to write my own.
  lazy val r = results.toList
  lazy val finalState = resultStates.last
  println(r)
  println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
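Concretely, the call above assumes the evaluator has been reshaped to the (state, element) => (state, result) signature; a hypothetical adapter over the question's computeResultAndNewState would look like this:
def step(st: State, s: String): (State, ResType) = {
  val (res, newState) = computeResultAndNewState(s, st)
  (newState, res)
}
// and then: d.mapAccumLeft(initialState, step)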
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
  case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the Scalaz library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view, State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
The next thing you need is a monadic map function that computes the values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if there is, please correct me). But it's possible to create one. To give a full example:
import scalaz._

object StateExample
  extends App
  with States /* utility methods */ {

  // The context that is threaded through the state.
  // In our case, it just maps variables to integer values.
  class Context(val map: Map[String, Int])

  // An example that returns the requested variable's value
  // and increases its value in the context.
  def eval(expression: String): State[Context, Int] =
    state((ctx: Context) => {
      val v = ctx.map.get(expression).getOrElse(0)
      (new Context(ctx.map + ((expression, v + 1))), v)
    })

  // Specialization of Haskell's mapM to our State monad.
  def mapState[S, A, B](f: A => State[S, B])(xs: Seq[A]): State[S, Seq[B]] =
    state((initState: S) => {
      var s = initState
      // process the sequence, threading the state
      // through the computation
      val ys = for (x <- xs) yield { val r = f(x)(s); s = r._1; r._2 }
      // return the final state and the output result
      (s, ys)
    })

  // Example: try to evaluate some variables, starting from an empty context.
  val expressions = Seq("x", "y", "y", "x", "z", "x")
  print(mapState(eval)(expressions) ! new Context(Map[String, Int]()))
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
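For instance, a small sketch building on the eval above (assuming Scalaz's State provides map/flatMap, as used here), combining two stateful computations with a for comprehension:
def evalTwice(expr: String): State[Context, (Int, Int)] =
  for {
    a <- eval(expr) // reads the variable and bumps it in the context
    b <- eval(expr) // sees the context already updated by the first step
  } yield (a, b)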
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: see Travis Brown's answer; he describes how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, with less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe that in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use high-level concepts to describe the solution nicely (perhaps sacrificing a little speed/memory) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem but doesn't give us a broader answer. If we need to modify it a little, it may no longer fit. Or, if the library grows complicated, the code can become messy and we won't know how to improve it or how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can grow complicated. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined into more complex ones using the flatMap method or for comprehensions, and finally we'll construct the desired task. The principle stays the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
  def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
    xs match {
      case Nil => result
      case x :: tail =>
        val (res, newState) = computeResultAndNewState(x, priorState)
        innerTestTheRecWay(tail, newState, result :+ res)
    }
  }
  innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is, most of the time, easier to read, write and understand than loops. In fact, Scala does not have any loops other than while; fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler so that it does not grow the stack, so it is absolutely safe to use. You can add the @annotation.tailrec annotation to make the compiler enforce tail recursion elimination; if the method is not tail recursive, the compiler will complain.
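For illustration, a minimal self-contained example of the annotation; the compiler would reject this method if the recursive call were not in tail position:
@annotation.tailrec
def sum(xs: List[Int], acc: Int = 0): Int = xs match {
  case Nil => acc
  case h :: t => sum(t, acc + h) // tail call: the current stack frame is reused
}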
edit: renamed inner method to avoid ambiguity

What are good examples of: "operation of a program should map input values to output values rather than change data in place"

I came across this sentence about Scala, explaining its functional behavior:
operation of a program should map input values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: please explain or give an example for the sentence above in its context; please don't make it more complicated and add to the confusion.
The most obvious pattern this is referring to is the difference between how you would write collection code in Java compared with Scala. If you were writing Scala in the idiom of Java, you would work with collections by mutating data in place. Idiomatic Scala code doing the same thing favours mapping input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
  Trade t = it.next();
  if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection contains only the relevant trades. But I have achieved this using mutation: a careless programmer could easily have missed that trades was an input parameter or an instance variable, or that it is used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle to refactor for the same reason: a programmer wishing to refactor a piece of code must be very careful not to let mutated collections escape the scope in which they are intended to be used and, vice versa, not to accidentally use an un-mutated collection where the mutated one was needed.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
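A self-contained sketch of the same point (Trade and DeutscheBank here are hypothetical stand-ins for the domain types above):
case class Trade(counterparty: String, stock: String, quantity: BigDecimal)
val DeutscheBank = "DB"
val trades = List(Trade(DeutscheBank, "AAPL", 1), Trade("UBS", "MSFT", 2))
val db = trades filter (_.counterparty == DeutscheBank) // a brand-new collection
// trades still holds both elements; no caller ever sees a half-filtered state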
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using Scala, the correct thing to do is to map the input collection and then convert it to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO;
for (Trade t : trades) {
  sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INPUT TO OUTPUT
Or the various other forms:
trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
  counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross oversimplification, but fuller examples are things like using a foldLeft to build up a result rather than doing the hard work yourself: foldLeft example
What it means is that if you write pure functions like this, you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
So, for example, the function:
def times2(x:Int) = x*2
is pure, while
def add5ToList(xs: MutableList[Int]) { // assumes import scala.collection.mutable.MutableList
  xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the program, and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list
def add5ToList(xs: List[Int]) = {
  5 :: xs
}
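A quick usage sketch of the pure version:
val xs = List(1, 2)
val ys = add5ToList(xs) // List(5, 1, 2)
// xs is still List(1, 2): any other user of the list is unaffected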
There are plenty of examples with collections, which are easy to come by but might give the wrong impression. The concept works at all levels of the language (though not at the VM level). One example is case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
  def this(initialName: String) = this(initialName, 0)
  private var name = initialName
  private var age = initialAge
  def getName = name
  def getAge = age
  def setName(newName: String) { name = newName }
  def setAge(newAge: Int) { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
  def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied in the construction of the immutable collections themselves: a List never changes. Instead, new List objects are created when necessary. The use of persistent data structures reduces the copying that would otherwise happen with a mutable data structure.
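For instance, prepending to an immutable List builds a new list that shares the old one's cells instead of copying them (a quick sketch):
val xs = List(2, 3)
val ys = 1 :: xs // List(1, 2, 3); xs is untouched
println(ys.tail eq xs) // true: structural sharing, no copying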