scala for/yield on instance of class

I am scratching my head on an example I've seen in breeze's documentation about distributions.
After creating a Rand instance, they show that you can do the following:
import breeze.stats.distributions._
val pois = new Poisson(3.0);
val doublePoi: Rand[Double] = for(x <- pois) yield x.toDouble
Now, this is very cool: I get a Rand object that gives me Double instead of Int when I call the samples method. Another example might be:
val abc = ('a' to 'z').map(_.toString).toArray
val letterDist: Rand[String] = for(x <- pois) yield {
  val i = if (x > 26) x % 26 else x
  abc(i)
}
val lettersSamp = letterDist.samples.take(20)
println(lettersSamp)
The question is: what is going on here? Rand[T] is not a collection, and all the for/yield examples I've seen so far work on collections. The Scala docs don't mention much; the only thing I found is "translating for-comprehensions" in here.
What is the underlying rule here? And how else can this be used (doesn't have to be a breeze-related answer)?

Scala has rules for translating for and for-yield expressions to the equivalent flatMap and map calls, optionally also applying filters using withFilter and such. The actual specification for how you translate each for comprehension expression into the equivalent method calls can be found in this section of the Scala Specification.
If we take your example and compile it, we'll see the underlying transformation happen to the for-yield expression. This is done using the scalac -Xprint:typer command to print out the typed trees:
val letterDist: breeze.stats.distributions.Rand[String] =
  pois.map[String](((x: Int) => {
    val i: Int = if (x.>(26))
      x.%(26)
    else
      x;
    abc.apply(i)
  }));
Here you can see that for-yield turns into a single map passing in an Int and applying the if-else inside the expression. This works because Rand[T] has a map method defined:
def map[E](f: T => E): Rand[E] = MappedRand(outer, f)
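To see the rule in isolation, here is a minimal sketch of a custom type that supports for/yield purely by defining map and flatMap; the Gen type and its names are hypothetical and have nothing to do with breeze:
import scala.util.Random

// Hypothetical "generator" type; any class with map/flatMap works in for/yield.
case class Gen[A](sample: () => A) {
  def map[B](f: A => B): Gen[B] = Gen(() => f(sample()))
  def flatMap[B](f: A => Gen[B]): Gen[B] = Gen(() => f(sample()).sample())
}

val ints: Gen[Int] = Gen(() => Random.nextInt(10))

// Compiles because Gen defines map; desugars to ints.map(x => x.toDouble).
val doubles: Gen[Double] = for (x <- ints) yield x.toDouble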

For comprehensions are just syntactic sugar for flatMap, map and withFilter. The main requirement for use in a for comprehension is that those methods are implemented, so for comprehensions are by no means limited to collections. E.g. some common non-collections used in for comprehensions are Option, Try and Future.
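For instance, Option works in a for comprehension for exactly the same reason; this desugars to a flatMap followed by a map:
val maybeA: Option[Int] = Some(1)
val maybeB: Option[Int] = Some(2)

// Desugars to: maybeA.flatMap(a => maybeB.map(b => a + b))
val sum: Option[Int] = for {
  a <- maybeA
  b <- maybeB
} yield a + b // Some(3); None if either operand is None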
In your case, Poisson seems to inherit from a trait called Rand
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/distributions/Rand.scala
This trait has map, flatMap, and withFilter defined.
Tip: If you use an IDE like IntelliJ, you can press alt+enter on your for comprehension and choose "convert to desugared expression", and you will see how it expands.

Related

Spark Cassandra Connector: for comprehension error (type mismatch)

Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include it to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)

object SparkCassandraConnectorJoins {
  def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeId:Int) {
    val notifHist:RDD[NotifHistSingle] = for {
      orgNodeId:Int <- 1 to 2 // comment out this line and it works
      notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
    ...etc...
  }
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
Error:(88, 21) type mismatch;
 found   : scala.collection.immutable.IndexedSeq[Nothing]
 required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
    orgNodeId:Int <- 1 to 2
                  ^
Later
#slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also carry state from the second generator down to fill in the NotifHistSingle constructor, so I don't see how to get the equivalent map/flatMap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeIds:Set[Int]) {
  def notifHistForOrg(orgNodeId:Int): RDD[NotifHistSingle] = {
    for {
      notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
      eventHistId <- notifHist.eventhistids
    } yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
  }

  val emptyTable:RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
  val notifHistForAllOrgs:RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar; what's really going on underneath is a series of chained flatMap calls, with a single map at the end that handles the yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into filters, and if you don't yield anything, foreach is used instead. For more information, see here.
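A quick illustration of those three rules on a plain list (nothing Spark-specific):
val xs = List(1, 2, 3)

for (x <- xs if x % 2 == 1) yield x * 2 // xs.withFilter(x => x % 2 == 1).map(x => x * 2)
for (x <- xs) println(x)                // xs.foreach(println) -- no yield, so foreach
for (x <- xs; y <- xs) yield (x, y)     // xs.flatMap(x => xs.map(y => (x, y)))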
So, to explain your case - this:
val notifHist:RDD[NotifHistSingle] = for {
  orgNodeId:Int <- 1 to 2 // comment out this line and it works
  notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
  eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
is actually translated by the compiler to this:
val notifHist:RDD[NotifHistSingle] = (1 to 2)
  .flatMap(orgNodeId => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
    .flatMap(notifHist => notifHist.eventhistids
      .map(eventHistId => NotifHistSingle(...))))
You are getting the error when you include the 1 to 2 line because it makes your for comprehension operate on a sequence (a Vector, to be more precise). So when invoking flatMap(), the compiler expects you to follow up with a function that transforms each element of your vector into a GenTraversableOnce. If you take a closer look at the signature of flatMap (most IDEs will display it just by hovering over the call), you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the vector 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line, you remove this restriction.
Bottom line - if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence consisting of elements which are not sequences and cannot be turned into sequences.
You can always map instead of flatMap, since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group for each query). You can also play around with the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra, so I don't know).
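For the record, a sketch of that map route (untested; it assumes the notifHistForOrg helper from the workaround above is in scope): map produces one RDD per org id, and SparkContext.union flattens them afterwards.
// One RDD per org id; no flattening happens at this point.
val perOrg: Seq[RDD[NotifHistSingle]] = (1 to 2).map(oid => notifHistForOrg(oid))

// Union the per-id results into a single RDD.
val all: RDD[NotifHistSingle] = sc.union(perOrg)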

Explain how looping behaves in Scala compared to Java when accessing values

In the following code :
val expression : String = "1+1";
for (expressionChar <- expression) {
  println(expressionChar)
}
the output is "1+1"
How is each character being accessed here? What is going on behind the scenes in Scala? In Java I would need to use the .charAt method, but not in Scala. Why?
In Scala, a for comprehension:
for (p <- e) dosomething
would be translated to
e.foreach { p => dosomething }
You can look into Scala Reference 6.19 for more details.
However, String in Scala is just a Java String, which does not have a foreach method.
But there is an implicit conversion defined in Predef which converts String to StringOps, which does have a foreach method.
So, finally, the code for(x <- "abcdef") println(x) is translated to:
Predef.augmentString("abcdef").foreach(x => println(x))
You can look into the Scala reference or Scala Views for more information.
scala> for (c <- "1+1") println(c)
1
+
1
Is the same as
scala> "1+1" foreach (c => println(c))
1
+
1
Scala's for-comprehension is not a loop but a composition of higher-order functions that operate on the input. The rules for how the for-comprehension is translated by the compiler can be read here.
Furthermore, in Scala, Strings are collections. Internally they are just plain old Java Strings, for which there exist implicit conversions that give them the full power of the Scala collection library. But to users they can be seen as ordinary collections.
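You can see the conversion at work in the REPL; any collection method becomes available on a plain string:
scala> "1+1".filter(_.isDigit)
res0: String = 11

scala> "abc".map(_.toUpper)
res1: String = ABC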
In your example foreach iterates over each element of the string and executes a closure on them (which is the body of the for-comprehension or the argument in the foreach call).
Even more information about for-comprehensions can be found in section 7 of the StackOverflow Scala Tutorial.

Scala Option object inside another Option object

I have a model with some Option fields, which themselves contain further Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSONs, and sometimes the data may contain nulls; that was the reason for this model design.
So the question is: what is the best way to get the deepest field?
First.get.second.get.third.get.numberOfSmth.get
The above method looks really ugly, and it may throw an exception if one of the objects is None. I was looking into the Scalaz lib but didn't figure out a better way to do this.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
Of course this can get ugly very fast. For this reason scala supports for comprehensions as a syntactic sugar. The above can be rewritten as:
for {
  first  <- First
  second <- first.second
  third  <- second.third
} yield third.numberOfSmth
This is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using Scala), and it generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words, x flatMap (y flatMap z) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
  first  <- yourFirst
  second <- first.second
  third  <- second.third
  number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
  first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
  for {
    f <- first
    s <- f.second
    t <- s.third
    n <- t.numberOfSmth
  } yield n
I think it is an overkill for your problem but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As introduction you might want to check for instance this SO answer or this tutorial. The question whether it makes sense to use Lenses in your case is whether you also have to perform a lot of updates in you nested option structure (note: update not in the mutable sense, but returning a new modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I have need of the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluator to the next to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {

  val d = List("1", "2", "3") // some expressions to evaluate.

  type ResType = Int
  case class State(i : ResType) // trivial state for experiment purposes
  val initialState = State(0)

  // my stub/dummy "eval" function...obviously the real one will be...real.
  def computeResultAndNewState(s : String, st : State) : (ResType, State) = {
    val State(i) = st
    val res = s.toInt + i
    val newStateInt = i + 1
    (res, State(newStateInt))
  }
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
  var state = initialState
  val r = d.map { s =>
    val (result, newState) = computeResultAndNewState(s, state)
    state = newState
    result
  }
  println(r)
  println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
  // This startFold thing requires an explicit type. That alone makes it muddy.
  val startFold : (List[ResType], State) = (Nil, initialState)
  val (r, state) = d.foldLeft(startFold) {
    case ((tail, st), s) =>
      val (r, ns) = computeResultAndNewState(s, st)
      (tail :+ r, ns) // we want a constant-time append here, not O(N). Or could cons on front and reverse later
  }
  println(r)
  println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
  lazy val states = initialState #:: resultStates // there are states
  lazy val args = d.toStream // there are arguments
  lazy val argPairs = args zip states // put them together
  lazy val resPairs : Stream[(ResType, State)] = argPairs.map { case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
  lazy val (results, resultStates) = myUnzip(resPairs) // Note: .unzip causes an infinite loop. Had to write my own.
  lazy val r = results.toList
  lazy val finalState = resultStates.last
  println(r)
  println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
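If you'd rather not bring in Scalaz for this, the combinator is also easy to hand-roll; here is a minimal sketch (not the Scalaz implementation):
// Threads an accumulator left to right while collecting the mapped results.
def mapAccumLeft[S, A, B](xs: List[A])(s0: S)(f: (S, A) => (S, B)): (S, List[B]) = {
  val (s, acc) = xs.foldLeft((s0, List.empty[B])) { case ((s, acc), a) =>
    val (s2, b) = f(s, a)
    (s2, b :: acc) // prepend for O(1); reverse once at the end
  }
  (s, acc.reverse)
}

// Adapting the question's evaluator to the (state, element) argument order:
val (finalState, results) = mapAccumLeft(d)(initialState) { (st, s) =>
  val (r, ns) = computeResultAndNewState(s, st)
  (ns, r)
}
// finalState == State(3), results == List(1, 3, 5)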
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
  case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next, what you need is a monadic map function that computes the values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if there is, please correct me). But it's possible to create one. To give a full example:
import scalaz._;

object StateExample
  extends App
  with States /* utility methods */
{
  // The context that is threaded through the state.
  // In our case, it just maps variables to integer values.
  class Context(val map: Map[String,Int]);

  // An example that returns the requested variable's value
  // and increases its value in the context.
  def eval(expression: String): State[Context,Int] =
    state((ctx: Context) => {
      val v = ctx.map.get(expression).getOrElse(0);
      (new Context(ctx.map + ((expression, v + 1))), v);
    });

  // Specialization of Haskell's mapM to our State monad.
  def mapState[S,A,B](f: A => State[S,B])(xs: Seq[A]): State[S,Seq[B]] =
    state((initState: S) => {
      var s = initState;
      // process the sequence, threading the state
      // through the computation
      val ys = for(x <- xs) yield { val r = f(x)(s); s = r._1; r._2 };
      // return the final state and the output result
      (s, ys);
    });

  // Example: Try to evaluate some variables, starting from an empty context.
  val expressions = Seq("x", "y", "y", "x", "z", "x");
  print( mapState(eval)(expressions) ! new Context(Map[String,Int]()) );
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
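For instance, two eval calls compose with a for comprehension, the context being threaded automatically (this sketch uses the same Scalaz 6-style API as the example above):
// Both lookups hit "x"; the second sees the context updated by the first.
val twice: State[Context, (Int, Int)] =
  for {
    a <- eval("x")
    b <- eval("x")
  } yield (a, b)

// Starting from an empty context this should print (0,1):
// the first lookup returns 0 and bumps "x", the second returns 1.
println( twice ! new Context(Map[String,Int]()) );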
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer; he describes how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, with both less memory and less CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe that in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use high-level concepts to describe the solution nicely (perhaps sacrificing a little speed/memory) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem but doesn't give us a broader answer. If we need to modify it a little, it might turn out that it no longer works. Or, if the library grows complex, the code can become messy, and we won't know how to improve it or how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can become complicated and complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined to more complex ones using flatMap method or for comprehensions, and finally we'll construct the desired task. The principle will stay the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
  def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
    xs match {
      case Seq() => result
      case x +: tail => // +: matches any Seq; :: would only match List and fail at runtime on e.g. a Vector
        val (res, newState) = computeResultAndNewState(x, priorState)
        innerTestTheRecWay(tail, newState, result :+ res)
    }
  }
  innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact, Scala does not have any loops other than while; fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler not to build up a stack, so it is absolutely safe to use. You can add the @annotation.tailrec annotation to force the compiler to apply tail recursion elimination; if your method is not tail recursive, the compiler will complain.
edit: renamed inner method to avoid ambiguity

Scala's for-comprehension `if` statements

Is it possible in Scala to specialize on the conditions inside an if within a for comprehension? I'm thinking along the lines of:
val collection: SomeGenericCollection[Int] = ...

trait CollectionFilter
case object Even extends CollectionFilter
case object Odd extends CollectionFilter

val evenColl = for { i <- collection if(Even) } yield i
//evenColl would be a SomeGenericEvenCollection instance

val oddColl = for { i <- collection if(Odd) } yield i
//oddColl would be a SomeGenericOddCollection instance
The gist is that by yielding i, I get a new collection of a potentially different type (hence me referring to it as "specialization")- as opposed to just a filtered-down version of the same GenericCollection type.
The reason I ask is that I saw something I couldn't figure out (an example can be found on line 33 of this ScalaQuery example). What it does is create a query for a database (i.e. SELECT ... FROM ... WHERE ...), where I would have expected it to iterate over the results of said query.
So, I think you are asking if it is possible for the if statement in a for-comprehension to change the result type. The answer is "yes, but...".
First, understand how for-comprehensions are expanded. There are questions here on Stack Overflow discussing it, and there are parameters you can pass to the compiler so it will show you what's going on.
Anyway, this code:
val evenColl = for { i <- collection if(Even) } yield i
Is translated as:
val evenColl = collection.withFilter(i => Even).map(i => i)
So, if the withFilter method changes the collection type, it will do what you want -- in this simple case. In more complex cases, that alone won't work:
for {
  x <- xs
  y <- ys
  if cond
} yield (x, y)
is translated as
xs.flatMap(x => ys.withFilter(y => cond).map(y => (x, y)))
In which case flatMap decides what type will be returned. If it takes its cue from the result of the inner expression, then this can work.
Now, on Scala collections, withFilter doesn't change the type of the collection. You could, however, write your own classes that do; see the sketch below.
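Here is a minimal, entirely hypothetical sketch of such a class. The trick is that withFilter need not take a Boolean predicate at all: the compiler only desugars the if into a withFilter call and then type-checks it, so the method can accept an Int => CollectionFilter and return a differently-typed wrapper (roughly how query DSLs like ScalaQuery capture conditions):
trait CollectionFilter
case object Even extends CollectionFilter
case object Odd  extends CollectionFilter

sealed trait SpecializedColl { def map[B](g: Int => B): Seq[B] }
class EvenColl(xs: Seq[Int]) extends SpecializedColl { def map[B](g: Int => B) = xs.map(g) }
class OddColl(xs: Seq[Int])  extends SpecializedColl { def map[B](g: Int => B) = xs.map(g) }

class MyColl(xs: Seq[Int]) {
  // Not a predicate: the function returns a filter *description*.
  def withFilter(f: Int => CollectionFilter): SpecializedColl =
    f(0) match { // probe once to see which filter was requested (sketch-level hack)
      case Even => new EvenColl(xs.filter(_ % 2 == 0))
      case Odd  => new OddColl(xs.filter(_ % 2 != 0))
    }
}

// Desugars to: coll.withFilter(i => Even).map(i => i)
val coll  = new MyColl(1 to 10)
val evens = for (i <- coll if Even) yield i // Seq(2, 4, 6, 8, 10), built via EvenColl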
Yes you can - please refer to this tutorial for an easy example. The ScalaQuery example you cited is also iterating over the collection; it is then using that data to build the query.