When are scala's for-comprehensions lazy? - scala

In Python, I can do something like this:
lazy = ((i,j) for i in range(0,10000) for j in range(0,10000))
sum((1 for i in lazy))
It will take a while, but the memory use is constant.
The same construct in scala:
(for(i<-0 to 10000; j<-i+1 to 10000) yield (i,j)).count((a:(Int,Int)) => true)
After a while, I get a java.lang.OutOfMemoryError, even though it should be evaluated lazily.

Nothing's inherently lazy about Scala's for-comprehension; it's syntactic sugar* which won't change the fact that the combination of your two ranges will be eager.
If you work with lazy views of your ranges, the result of the comprehension will be lazy too:
scala> for(i<-(0 to 10000).view; j<-(i+1 to 10000).view) yield (i,j)
res0: scala.collection.SeqView[(Int, Int),Seq[_]] = SeqViewN(...)
scala> res0.count((a: (Int, Int)) => true)
res1: Int = 50005000
The laziness here is nothing to do with the for-comprehension, but because when flatMap or map (see below) are called on some type of container, you get back a result in the same type of container. So, the for-comprehension will just preserve the laziness (or lack of) of whatever you put in.
*for something like:
(0 to 10000).flatMap(i => (i+1 to 10000).map(j => (i, j)))

Laziness comes not from the for-comprehension, but from the collection itself. You should look into the strictness characteristics of the collection.
But, for the lazy :-), here's a summary: Iterator and Stream are non-strict, as are selected methods of the view of any collection. So, if you want laziness, be sure to .iterator, .view or .toStream your collection first.

Related

Monads - Purpose of Flatten

So I've been doing some reading about Monads in scala, and all the syntax relating to their flatMap functions and for comprehensions. Intuitively, I understand why Monads need to use the map part of the flatMap function, as usually when I use a map function its on a container of zero, one or many elements and returns the same container but with the passed in function applied to all the elements of the container. A Monad similarly is a container of zero, one or many elements.
However, what is the purpose of the flatten part of the flatMap? I cannot understand the intuition behind it. To me it seems like extra boilerplate that requires all the functions passed to flatMap to create a container / monad around their return value only to have that container instantly destroyed by the flatten part of the flatMap. I cannot think of a single example where flatMap is used which can't be simplified by being replaced simply by map. For example:
var monad = Option(5)
var monad2 = None
def flatAdder(i:Int) = Option(i + 1)
def adder(i:Int) = i + 1
// What I see in all the examples
monad.flatMap(flatAdder)
// Option[Int] = Some(6)
monad.flatMap(flatAdder).flatMap(flatAdder)
// Option[Int] = Some(7)
monad2.flatMap(flatAdder)
// Option[Int] = None
monad2.flatMap(flatAdder).flatMap(flatAdder)
// Option[Int] = None
// Isn't this a lot easier?
monad.map(adder)
// Option[Int] = Some(6)
monad.map(adder).map(adder)
// Option[Int] = Some(7)
monad2.map(adder)
// Option[Int] = None
monad2.map(adder).map(adder)
// Option[Int] = None
To me, using map by itself seems far more intuitive and simple than flatMap and the flatten part does not seem to add any sort of value. However, in Scala a large emphasis is placed on flatMap rather than map, to the point it even gets its own syntax for for comprehensions, so clearly I must be missing something. My question is: in what cases is the flatten part of flatMap actually useful? and what other advantages does flatMap have over map?
What if each element being processed could result in zero or more elements?
List(4, 0, 15).flatMap(n => primeFactors(n))
//List(2, 2, 3, 5)
What if you have a name, that might not be spelled correctly, and you want their office assignment, if they have one?
def getID(name:String): Option[EmployeeID] = ???
def getOffice(id:EmployeeID): Option[Office] = ???
val office :Option[Office] =
getID(nameAttempt).flatMap(id => getOffice(id))
You use flatMap() when you have a monad and you need to feed its contents to a monad producer. You want the result Monad[X] not Monad[Monad[X]].
The idea behind flatMap is to be able to take a value of M[A] and a function f having a type of A => M[B] and produce an M[B], if we didn't have a way to "flatten" the value then we'd always end up having an M[M[B]] which is what we don't want most of the time.
(Yes, there are some cases that you may want an M[M[_]] like when you want to create multiple instances of M[_])

Scala Seq vs List performance

So I am a bit perplexed. I have a piece of code in Scala (the exact code is not really all that important). I had all my methods written to take Seq[T]. These methods are mostly tail recursive and use the Seq[T] as an accumulator which is fed initially like Seq(). Interestingly enough, when I swap all the signature with the concrete List() implementation, I am observing three fold improvement in performance.
Isn't it the case that Seq's default implementation is in fact an immutable List ? So if that is the case, what is really going on ?
Calling Seq(1,2,3) and calling List(1,2,3) will both result in a 1 :: 2 :: 3 :: Nil. The Seq.apply method is just a very generic method that looks like this:
def apply[A](elems: A*): CC[A] = {
if (elems.isEmpty) empty[A]
else {
val b = newBuilder[A]
b ++= elems
b.result()
}
}
newBuilder is the thing that sort of matters here. That method delegates to scala.collection.immutable.Seq.newBuilder:
def newBuilder[A]: Builder[A, Seq[A]] = new mutable.ListBuffer
So the Builder for a Seq is a mutable.ListBuffer. A Seq gets constructed by appending the elements to the empty ListBuffer and then calling result on it, which is implemented like this:
def result: List[A] = toList
/** Converts this buffer to a list. Takes constant time. The buffer is
* copied lazily, the first time it is mutated.
*/
override def toList: List[A] = {
exported = !isEmpty
start
}
List also has a ListBuffer as Builder. It goes through a slightly different but similar building process. It is not going to make a big difference anyway, since I assume that most of your algorithm will consist of prepending things to a Seq, not calling Seq.apply(...) the whole time. Even if you did it shouldn't make much difference.
It's really not possible to say what is causing the behavior you're seeing without seeing the code that has that behavior.

sequence List of disjunction with tuples

I use sequenceU to turn inside type out while working with disjunctions in scalaz.
for e.g.
val res = List[\/[Errs,MyType]]
doing
res.sequenceU will give \/[Errs,List[MyType]]
Now if I have a val res2 = List[(\/[Errs,MyType], DefModel)] - List containing tuples of disjunctions; what's the right way to convert
res2 to \/[Errs,List[ (Mype,DefModel)]
As noted in the comments, the most straightforward way to write this is probably just with a traverse and map:
def sequence(xs: List[(\/[Errs, MyType], DefModel)]): \/[Errs, List[(MyType, DefModel)]] =
xs.traverseU { case (m, d) => m.map((_, d)) }
It's worth noting, though, that tuples are themselves traversable, so the following is equivalent:
def sequence(xs: List[(\/[Errs, MyType], DefModel)]): \/[Errs, List[(MyType, DefModel)]] =
xs.traverseU(_.swap.sequenceU.map(_.swap))
Note that this would be even simpler if the disjunction were on the right side of the tuple. If you're willing to make that change, you can also more conveniently take advantage of the fact that Traverse instances compose:
def sequence(xs: List[(DefModel, \/[Errs, MyType])]): \/[Errs, List[(DefModel, MyType)]] =
Traverse[List].compose[(DefModel, ?)].sequenceU(xs)
I'm using kind-projector here but you could also write out the type lambda.

Scala Option object inside another Option object

I have a model, which has some Option fields, which contain another Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSON's and sometimes this data may contain null's, that was the reason of such model design.
So the question is: what is the best way to get a deepest field?
First.get.second.get.third.get.numberOfSmth.get
Above method looks really ugly and it may cause exception if one of the objects will be None. I was looking in to Scalaz lib, but didn't figure out a better way to do that.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None, otherwise it will be Some(count) where count is the value returned by numberOfSmth.
Of course this can get ugly very fast. For this reason scala supports for comprehensions as a syntactic sugar. The above can be rewritten as:
for {
first <- First
second <- first .second
third <- second.third
} third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words x flatMap(y flatMap z))) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
There is no need for scalaz:
for {
first <- yourFirst
second <- f.second
third <- second.third
number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
for {
f <- first
s <- f.second
t <- s.third
n <- t.numberOfSmth
} yield n
I think it is an overkill for your problem but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism to access nested data types by simple composition. As introduction you might want to check for instance this SO answer or this tutorial. The question whether it makes sense to use Lenses in your case is whether you also have to perform a lot of updates in you nested option structure (note: update not in the mutable sense, but returning a new modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I have need of the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluator to the next to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {
val d = List("1", "2", "3") // some expressions to evaluate.
type ResType = Int
case class State(i : ResType) // trivial state for experiment purposes
val initialState = State(0)
// my stub/dummy "eval" function...obviously the real one will be...real.
def computeResultAndNewState(s : String, st : State) : (ResType, State) = {
val State(i) = st
val res = s.toInt + i
val newStateInt = i + 1
(res, State(newStateInt))
}
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
var state = initialState
val r = d.map {
s =>
{
val (result, newState) = computeResultAndNewState(s, state)
state = newState
result
}
}
println(r)
println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
// This startFold thing, requires explicit type. That alone makes it muddy.
val startFold : (List[ResType], State) = (Nil, initialState)
val (r, state) = d.foldLeft(startFold) {
case ((tail, st), s) => {
val (r, ns) = computeResultAndNewState(s, st)
(tail :+ r, ns) // we want a constant-time append here, not O(N). Or could Cons on front and reverse later
}
}
println(r)
println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
lazy val states = initialState #:: resultStates // there are states
lazy val args = d.toStream // there are arguments
lazy val argPairs = args zip states // put them together
lazy val resPairs : Stream[(ResType, State)] = argPairs.map{ case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
lazy val (results , resultStates) = myUnzip(resPairs)// Note .unzip causes infinite loop. Had to write my own.
lazy val r = results.toList
lazy val finalState = resultStates.last
println(r)
println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next you need is a monadic map function that computes values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if it is, please correct me). But it's possible to create one. To give a full example:
import scalaz._;
object StateExample
extends App
with States /* utility methods */
{
// The context that is threaded through the state.
// In our case, it just maps variables to integer values.
class Context(val map: Map[String,Int]);
// An example that returns the requested variable's value
// and increases it's value in the context.
def eval(expression: String): State[Context,Int] =
state((ctx: Context) => {
val v = ctx.map.get(expression).getOrElse(0);
(new Context(ctx.map + ((expression, v + 1)) ), v);
});
// Specialization of Haskell's mapM to our State monad.
def mapState[S,A,B](f: A => State[S,B])(xs: Seq[A]): State[S,Seq[B]] =
state((initState: S) => {
var s = initState;
// process the sequence, threading the state
// through the computation
val ys = for(x <- xs) yield { val r = f(x)(s); s = r._1; r._2 };
// return the final state and the output result
(s, ys);
});
// Example: Try to evaluate some variables, starting from an empty context.
val expressions = Seq("x", "y", "y", "x", "z", "x");
print( mapState(eval)(expressions) ! new Context(Map[String,Int]()) );
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer, he described how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, both less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use the high-level concepts to nicely describe the solution (perhaps sacrificing speed/memory a little) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem, but doesn't give us a broader answer. If we need to modify it a little, it might happen it won't work any more. Or, if the library starts to be complex, the code can start to be messy and we won't know how to improve it, how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can become complicated and complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined to more complex ones using flatMap method or for comprehensions, and finally we'll construct the desired task. The principle will stay the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
xs match {
case Nil => result
case x :: tail =>
val (res, newState) = computeResultAndNewState(x, priorState)
innerTestTheRecWay(tail, newState, result :+ res)
}
}
innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact scala does not have any loops other than while. fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler to not build a stack, so it is absolutely safe to use. You can add the #annotation.tailrec annotaion, to force the compiler to apply tail recursion elimination. If your method is not tailrec the compiler will then complain.
edit: renamed inner method to avoid ambiguity