Scala: turning an Iterator[Option[T]] into an Iterator[T]

I have an Iterator[Option[T]] and I want to get an Iterator[T] containing the values of those Options that are defined. There must be a better way than this:
it filter { _ isDefined} map { _ get }
I would have thought this was possible in a single construct. Does anybody have any ideas?

In the case where it is an Iterable
val it:Iterable[Option[T]] = ...
it.flatMap( x => x ) //returns an Iterable[T]
In the case where it is an Iterator (Option.elements was renamed to iterator in Scala 2.8):
val it:Iterator[Option[T]] = ...
it.flatMap( x => x.iterator ) //returns an Iterator[T]
it.flatMap( _.iterator ) //equivalent

In newer versions this is now possible:
val it: Iterator[Option[T]] = ...
val flatIt = it.flatten

This works for me (Scala 2.8):
it.collect {case Some(s) => s}
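To compare the answers above side by side, here is a small self-contained sketch, assuming Scala 2.13 (where Option is an IterableOnce, so it can be passed straight to Iterator.flatMap). The object name FlattenDemo and the sample data are made up for illustration:

```scala
object FlattenDemo {
  // A def rather than a val: an Iterator can only be traversed once,
  // so each variant below needs a fresh one.
  def source: Iterator[Option[Int]] = Iterator(Some(1), None, Some(3), None)

  // The question's two-step version.
  val viaFilterGet: List[Int] = source.filter(_.isDefined).map(_.get).toList
  // Iterator.flatten drops the Nones and unwraps the Somes.
  val viaFlatten: List[Int] = source.flatten.toList
  // collect with a partial function that only matches Some.
  val viaCollect: List[Int] = source.collect { case Some(x) => x }.toList
  // flatMap with the identity function, treating each Option as a 0- or 1-element collection.
  val viaFlatMap: List[Int] = source.flatMap(x => x).toList
}
```

All four produce the same elements; flatten and collect avoid the unsafe get call.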

To me, this is a classic use case for the monadic for syntax.
for {
opt <- iterable
t <- opt
} yield t
It's just sugar for the flatMap solution described above: it desugars to iterable.flatMap(opt => opt.map(t => t)), which does the same thing. However, syntax matters, and I think one of the best times to use Scala's monadic for syntax is when you're working with Option, especially in conjunction with collections.
I think this formulation is considerably more readable, especially for those not very familiar with functional programming. I often try both the monadic and functional expressions of a loop and see which seems more straightforward. I think flatMap is a hard name for most people to grok (and actually, calling it >>= makes more intuitive sense to me).
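The desugaring can be checked directly. This sketch assumes Scala 2.13 and uses an illustrative List[Option[Int]] in place of iterable; it shows the for-comprehension, its hand-expanded form, and the plain flatMap agreeing:

```scala
object ForDesugarDemo {
  val iterable: List[Option[Int]] = List(Some(1), None, Some(3))

  // The for-comprehension from the answer.
  val viaFor: List[Int] = for {
    opt <- iterable
    t   <- opt
  } yield t

  // Roughly what the compiler turns it into: a flatMap whose body maps over the Option.
  val desugared: List[Int] = iterable.flatMap(opt => opt.map(t => t))

  // The shorter flatMap from the earlier answer.
  val viaFlatMap: List[Int] = iterable.flatMap(x => x)
}
```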

Related

List implementation of foldLeft in Scala

Scala foldLeft implementation is:
def foldLeft[B](z: B)(op: (B, A) => B): B = {
var result = z
this foreach (x => result = op(result, x))
result
}
Why don't the Scala developers use something like tail recursion instead? For example (this is just an illustration):
def foldLeft[T](start: T, myList: List[T])(f:(T, T) => T): T = {
def foldRec(accum: T, list: List[T]): T = {
list match {
case Nil => accum
case head :: tail => foldRec(f(accum, head), tail)
}
}
foldRec(start, myList)
}
Could it be implemented this way? Why or why not?
"Why not replace this simple three-line piece of code with this less simple seven-line piece of code that does the same thing?"
Um. That's why.
(If you are asking about performance, then one would need benchmarks of both solutions and an indication that the non-closure version was significantly faster.)
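According to this answer, Scala does support tail-call optimization, but it looks like it wasn't there from the beginning, and it still does not work in every case (scalac only eliminates direct self-recursive calls in methods that cannot be overridden, such as local functions and final or private methods), so that specific implementation might be a leftover.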
That said, Scala is multi-paradigm and I don't think it strives for purity in terms of its functional programming, so I wouldn't be surprised if they went for the most practical or convenient approach.
Besides being simpler, the imperative solution is also far more general. As you may have noticed, foldLeft is implemented in TraversableOnce and depends only on the foreach method. Thus, by extending Traversable and implementing foreach, which is probably the simplest method to implement on any collection, you get all these wonderful methods.
The declarative implementation, on the other hand, relies on the structure of List and is very specific, as it depends on Nil and ::.
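To illustrate the generality argument, here is a sketch with a made-up trait MyTraversable (not the real TraversableOnce) whose only abstract method is foreach, next to the List-only tail-recursive fold from the question. UpTo and foldLeftList are likewise hypothetical names:

```scala
import scala.annotation.tailrec

object FoldDemo {
  // Hypothetical minimal collection trait: implementors provide only foreach.
  trait MyTraversable[A] {
    def foreach(f: A => Unit): Unit

    // Imperative foldLeft in the style of the standard library: it needs nothing but foreach.
    def foldLeft[B](z: B)(op: (B, A) => B): B = {
      var result = z
      foreach(x => result = op(result, x))
      result
    }
  }

  // A toy collection (the numbers 0 until n) that gets foldLeft just by implementing foreach.
  final class UpTo(n: Int) extends MyTraversable[Int] {
    def foreach(f: Int => Unit): Unit = {
      var i = 0
      while (i < n) { f(i); i += 1 }
    }
  }

  // The tail-recursive alternative: it only works on List, because it matches on Nil and ::.
  @tailrec
  def foldLeftList[A, B](xs: List[A], z: B)(op: (B, A) => B): B = xs match {
    case Nil          => z
    case head :: tail => foldLeftList(tail, op(z, head))(op)
  }
}
```

UpTo gets foldLeft without any extra code, whereas foldLeftList cannot be reused for anything that is not a List.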

Combining nested lists in Scala - flattened Cartesian product

I have an interesting problem which is proving difficult for someone new to Scala.
I need to combine 2 lists:
listA : List[List[Int]]
listB : List[Int]
In the following way:
val listA = List(List(1,1), List(2,2))
val listB = List(3,4)
val listC = ???
// listC: List[List[Int]] = List(List(1,1,3),List(1,1,4),List(2,2,3),List(2,2,4))
In Java, I would use a couple of nested loops:
for(List<Integer> list : listA) {
for(Integer i: listB) {
List<Integer> subList = new ArrayList<Integer>(list);
subList.add(i);
listC.add(subList);
}
}
I'm guessing this is a one liner in Scala, but so far it's eluding me.
You want to perform a flattened Cartesian product. For-comprehensions are the easiest way to do this and may look similar to your Java solution:
val listC = for (list <- listA; i <- listB) yield list :+ i
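A runnable version of the for-comprehension, together with the flatMap/map calls it desugars into (the object name is assumed for illustration):

```scala
object CartesianDemo {
  val listA = List(List(1, 1), List(2, 2))
  val listB = List(3, 4)

  // The for-comprehension from the answer: every sublist of listA paired with every element of listB.
  val listC: List[List[Int]] = for (list <- listA; i <- listB) yield list :+ i

  // What it desugars into: flatMap over the outer generator, map over the inner one.
  val listCDesugared: List[List[Int]] = listA.flatMap(list => listB.map(i => list :+ i))
}
```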
scand1sk's answer is almost certainly the approach you should be using here, but as a side note, there's another way of thinking about this problem. What you're doing is in effect lifting the append operation into the applicative functor for lists. This means that using Scalaz you can write the following:
import scalaz._, Scalaz._
val listC = (listA |@| listB)(_ :+ _)
We can think of (_ :+ _) as a function that takes a list of things, and a single thing of the same type, and returns a new list:
(_ :+ _): ((List[Thing], Thing) => List[Thing])
Scalaz provides an applicative functor instance for lists, so we can in effect create a new function that adds an extra list layer to each of the types above. The weird (x |@| y)(_ :+ _) syntax says: create just such a function and apply it to x and y. And the result is what you're looking for.
And as in the for-comprehension case, if you don't care about order, you can make the operation more efficient by using :: and flipping the order of the arguments.
For more information, see my similar answer here about the cartesian product in Haskell, this introduction to applicative functors in Scala, or this blog post about making the syntax for this kind of thing a little less ugly in Scala. And of course feel free to ignore all of the above if you don't care.
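Without Scalaz, the same lifting idea can be sketched with a hypothetical map2 helper (it is not a Scalaz or standard library method) that applies a binary function to every combination of elements from two lists. The second call shows the :: variant mentioned in the answer, which is cheaper but puts the new element first:

```scala
object Map2Demo {
  // Hypothetical helper: lifts a binary function into lists by pairing every x with every y,
  // which mirrors what the list applicative's (xs |@| ys)(f) computes.
  def map2[A, B, C](xs: List[A], ys: List[B])(f: (A, B) => C): List[C] =
    xs.flatMap(x => ys.map(y => f(x, y)))

  val listA = List(List(1, 1), List(2, 2))
  val listB = List(3, 4)

  // Appending with :+ copies each sublist (O(n) per result)...
  val appended: List[List[Int]] = map2(listA, listB)(_ :+ _)
  // ...while prepending with :: (arguments flipped) is O(1), at the cost of element order.
  val prepended: List[List[Int]] = map2(listA, listB)((list, i) => i :: list)
}
```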

Scala Option object inside another Option object

I have a model with some Option fields, which in turn contain other Option fields. For example:
case class First(second: Option[Second], name: Option[String])
case class Second(third: Option[Third], title: Option[String])
case class Third(numberOfSmth: Option[Int])
I'm receiving this data from external JSON, and sometimes it contains nulls; that is the reason for this model design.
So the question is: what is the best way to get the deepest field?
First.get.second.get.third.get.numberOfSmth.get
The above looks really ugly, and it throws an exception if any of the objects is None. I looked into the Scalaz library but didn't find a better way to do that.
Any ideas?
The solution is to use Option.map and Option.flatMap:
First.flatMap(_.second.flatMap(_.third.map(_.numberOfSmth)))
Or the equivalent (see the update at the end of this answer):
First flatMap(_.second) flatMap(_.third) map(_.numberOfSmth)
This returns an Option[Int] (provided that numberOfSmth returns an Int). If any of the options in the call chain is None, the result will be None; otherwise it will be Some(count), where count is the value returned by numberOfSmth. Since numberOfSmth is itself an Option[Int] in the model above, use flatMap instead of map for the last step; with map you would get an Option[Option[Int]].
Of course this can get ugly very fast. For this reason, Scala supports for-comprehensions as syntactic sugar. The above can be rewritten as:
for {
first <- First
second <- first.second
third <- second.third
} yield third.numberOfSmth
Which is arguably nicer (especially if you are not yet used to seeing map/flatMap everywhere, as will certainly be the case after a while using scala), and generates the exact same code under the hood.
For more background, you may check this other question: What is Scala's yield?
UPDATE:
Thanks to Ben James for pointing out that flatMap is associative. In other words, x flatMap (a => y(a) flatMap z) is the same as x flatMap y flatMap z. While the latter is usually not shorter, it has the advantage of avoiding any nesting, which is easier to follow.
Here is some illustration in the REPL (the 4 styles are equivalent, with the first two using flatMap nesting, the other two using flat chains of flatMap):
scala> val l = Some(1,Some(2,Some(3,"aze")))
l: Some[(Int, Some[(Int, Some[(Int, String)])])] = Some((1,Some((2,Some((3,aze))))))
scala> l.flatMap(_._2.flatMap(_._2.map(_._2)))
res22: Option[String] = Some(aze)
scala> l flatMap(_._2 flatMap(_._2 map(_._2)))
res23: Option[String] = Some(aze)
scala> l flatMap(_._2) flatMap(_._2) map(_._2)
res24: Option[String] = Some(aze)
scala> l.flatMap(_._2).flatMap(_._2).map(_._2)
res25: Option[String] = Some(aze)
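The associativity law from the update can also be checked on plain Options. The two steps half and positive below are hypothetical functions chosen only so that both Some and None show up:

```scala
object AssociativityDemo {
  // Two Option-returning steps, invented for illustration.
  val half: Int => Option[Int] = n => if (n % 2 == 0) Some(n / 2) else None
  val positive: Int => Option[Int] = n => if (n > 0) Some(n) else None

  // Flat chain: (m flatMap f) flatMap g
  def chained(m: Option[Int]): Option[Int] = m.flatMap(half).flatMap(positive)

  // Nested form: m flatMap (x => f(x) flatMap g)
  def nested(m: Option[Int]): Option[Int] = m.flatMap(x => half(x).flatMap(positive))
}
```

For every input, including None and the inputs where one of the steps fails, both forms give the same result.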
There is no need for scalaz:
for {
first <- yourFirst
second <- first.second
third <- second.third
number <- third.numberOfSmth
} yield number
Alternatively you can use nested flatMaps
This can be done by chaining calls to flatMap:
def getN(first: Option[First]): Option[Int] =
first flatMap (_.second) flatMap (_.third) flatMap (_.numberOfSmth)
You can also do this with a for-comprehension, but it's more verbose as it forces you to name each intermediate value:
def getN(first: Option[First]): Option[Int] =
for {
f <- first
s <- f.second
t <- s.third
n <- t.numberOfSmth
} yield n
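Putting the model from the question together with the two getN versions gives a self-contained sketch (Scala 2.13; the object name and sample values are invented) showing that a missing level anywhere in the chain yields None:

```scala
object NestedOptionDemo {
  case class First(second: Option[Second], name: Option[String])
  case class Second(third: Option[Third], title: Option[String])
  case class Third(numberOfSmth: Option[Int])

  // Chained flatMap: stops at the first None.
  def getN(first: Option[First]): Option[Int] =
    first.flatMap(_.second).flatMap(_.third).flatMap(_.numberOfSmth)

  // The same logic as a for-comprehension.
  def getNFor(first: Option[First]): Option[Int] =
    for {
      f <- first
      s <- f.second
      t <- s.third
      n <- t.numberOfSmth
    } yield n

  // Sample data: one complete path, one with the third level missing.
  val full: Option[First] = Some(First(Some(Second(Some(Third(Some(7))), Some("t"))), Some("n")))
  val missingThird: Option[First] = Some(First(Some(Second(None, None)), None))
}
```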
I think it is overkill for your problem, but just as a general reference:
This nested access problem is addressed by a concept called Lenses. They provide a nice mechanism for accessing nested data types by simple composition. As an introduction, you might want to check, for instance, this SO answer or this tutorial. Whether Lenses make sense in your case depends on whether you also have to perform a lot of updates in your nested Option structure (note: update not in the mutable sense, but returning a new, modified but immutable instance). Without Lenses this leads to lengthy nested case class copy code. If you do not have to update at all, I would stick to om-nom-nom's suggestion.
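To make the "lengthy nested case class copy code" concrete, here is what an update of numberOfSmth looks like by hand, without lenses. setN and sample are hypothetical names used only for this illustration:

```scala
object NestedCopyDemo {
  case class First(second: Option[Second], name: Option[String])
  case class Second(third: Option[Third], title: Option[String])
  case class Third(numberOfSmth: Option[Int])

  // Returns a new First with numberOfSmth set to Some(n). Every level has to be rebuilt
  // with copy; if second or third is missing, the result equals the input.
  def setN(first: First, n: Int): First =
    first.copy(second = first.second.map { s =>
      s.copy(third = s.third.map { t =>
        t.copy(numberOfSmth = Some(n))
      })
    })

  val sample = First(Some(Second(Some(Third(Some(1))), None)), None)
}
```

A lens library packages exactly this rebuild-every-level pattern behind composable getters and setters.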

What is the proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I need the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluation to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {
val d = List("1", "2", "3") // some expressions to evaluate.
type ResType = Int
case class State(i : ResType) // trivial state for experiment purposes
val initialState = State(0)
// my stub/dummy "eval" function...obviously the real one will be...real.
def computeResultAndNewState(s : String, st : State) : (ResType, State) = {
val State(i) = st
val res = s.toInt + i
val newStateInt = i + 1
(res, State(newStateInt))
}
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
var state = initialState
val r = d.map {
s =>
{
val (result, newState) = computeResultAndNewState(s, state)
state = newState
result
}
}
println(r)
println(state)
}
I have what I consider an unacceptable solution using foldLeft, which follows what I call the "bag it as you fold" idiom:
def testTheFoldWay() {
// This startFold thing, requires explicit type. That alone makes it muddy.
val startFold : (List[ResType], State) = (Nil, initialState)
val (r, state) = d.foldLeft(startFold) {
case ((tail, st), s) => {
val (r, ns) = computeResultAndNewState(s, st)
(tail :+ r, ns) // we want a constant-time append here, not O(N). Or could Cons on front and reverse later
}
}
println(r)
println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
lazy val states = initialState #:: resultStates // there are states
lazy val args = d.toStream // there are arguments
lazy val argPairs = args zip states // put them together
lazy val resPairs : Stream[(ResType, State)] = argPairs.map{ case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
lazy val (results , resultStates) = myUnzip(resPairs)// Note .unzip causes infinite loop. Had to write my own.
lazy val r = results.toList
lazy val finalState = resultStates.last
println(r)
println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
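If you would rather not depend on Scalaz, a mapAccumLeft-style helper can be sketched on top of foldLeft. This mapAccumLeft is a hypothetical stand-alone function, not the Scalaz method, although it follows the same state-first convention; compute is the question's evaluator with its arguments and result tuple flipped:

```scala
object MapAccumDemo {
  type ResType = Int
  case class State(i: ResType)
  val initialState = State(0)
  val d = List("1", "2", "3")

  // Hypothetical helper in the spirit of Haskell's mapAccumL: threads an accumulator
  // left to right and collects one output per input.
  def mapAccumLeft[S, A, B](xs: List[A], init: S)(f: (S, A) => (S, B)): (S, List[B]) = {
    val (finalS, revOut) = xs.foldLeft((init, List.empty[B])) { case ((s, acc), a) =>
      val (s2, b) = f(s, a)
      (s2, b :: acc) // prepend in O(1)...
    }
    (finalS, revOut.reverse) // ...and reverse once at the end
  }

  // The question's stub evaluator, with the state first in both the arguments and the result.
  def compute(st: State, s: String): (State, ResType) = (State(st.i + 1), s.toInt + st.i)
}
```

Calling MapAccumDemo.mapAccumLeft(MapAccumDemo.d, MapAccumDemo.initialState)(MapAccumDemo.compute) should give the same pair as the Scalaz REPL session above.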
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the Scalaz library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view, State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[S, B].)
Next, what you need is a monadic map function that computes the values of your expressions, threading the state through the computations. In Haskell, there is a library function, mapM, that does just that when specialized to the state monad.
In Scala, there is no such library function (if there is, please correct me). But it's possible to create one. To give a full example:
import scalaz._;
object StateExample
extends App
with States /* utility methods */
{
// The context that is threaded through the state.
// In our case, it just maps variables to integer values.
class Context(val map: Map[String,Int]);
// An example that returns the requested variable's value
// and increases its value in the context.
def eval(expression: String): State[Context,Int] =
state((ctx: Context) => {
val v = ctx.map.get(expression).getOrElse(0);
(new Context(ctx.map + ((expression, v + 1)) ), v);
});
// Specialization of Haskell's mapM to our State monad.
def mapState[S,A,B](f: A => State[S,B])(xs: Seq[A]): State[S,Seq[B]] =
state((initState: S) => {
var s = initState;
// process the sequence, threading the state
// through the computation
val ys = for(x <- xs) yield { val r = f(x)(s); s = r._1; r._2 };
// return the final state and the output result
(s, ys);
});
// Example: Try to evaluate some variables, starting from an empty context.
val expressions = Seq("x", "y", "y", "x", "z", "x");
print( mapState(eval)(expressions) ! new Context(Map[String,Int]()) );
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
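For readers without Scalaz 6 at hand, the same idea can be sketched with a tiny hand-rolled state monad. St, pure and traverse below are hypothetical names, not Scalaz's State API; traverse plays the role of mapM, and Int stands in for the question's State and ResType:

```scala
object MiniState {
  // Hypothetical minimal state monad: a function from an input state to (output state, value).
  final case class St[S, A](run: S => (S, A)) {
    def map[B](f: A => B): St[S, B] = St[S, B] { s =>
      val (s1, a) = run(s)
      (s1, f(a))
    }
    def flatMap[B](f: A => St[S, B]): St[S, B] = St[S, B] { s =>
      val (s1, a) = run(s)
      f(a).run(s1) // the next computation starts from the updated state
    }
  }

  def pure[S, A](a: A): St[S, A] = St[S, A](s => (s, a))

  // mapM specialised to St: runs f on each element left to right, threading the state.
  def traverse[S, A, B](xs: List[A])(f: A => St[S, B]): St[S, List[B]] =
    xs.foldRight(pure[S, List[B]](Nil)) { (x, rest) =>
      for { b <- f(x); bs <- rest } yield b :: bs
    }

  // The question's stub evaluator in this style: the state is a counter.
  def compute(s: String): St[Int, Int] = St[Int, Int](i => (i + 1, s.toInt + i))

  // Small computations combine into bigger ones with a for-comprehension.
  val combined: St[Int, Int] = for { a <- compute("1"); b <- compute("10") } yield a + b
}
```

Running MiniState.traverse(List("1", "2", "3"))(MiniState.compute).run(0) threads the counter through all three evaluations. This sketch is not stack-safe for very long lists, since each element adds a level of nested flatMap calls.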
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer, he described how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, with less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use the high-level concepts to nicely describe the solution (perhaps sacrificing speed/memory a little) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem, but doesn't give us a broader answer. If we need to modify it a little, it might no longer work. Or, if the library becomes complex, the code can start to get messy and we won't know how to improve it or how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can become quite complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined into more complex ones using the flatMap method or for-comprehensions, and finally we'll construct the desired task. The principle stays the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: List[String]) = {
@annotation.tailrec
def innerTestTheRecWay(xs: List[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
xs match {
case Nil => result
case x :: tail =>
val (res, newState) = computeResultAndNewState(x, priorState)
innerTestTheRecWay(tail, newState, result :+ res)
}
}
innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is often easier to read, write and understand than loops. In fact, apart from while (and do-while in Scala 2), Scala has no built-in loops: for is just sugar for foreach/map/flatMap, and fold, map and flatMap are ordinary library methods.
This method is tail recursive and will be optimized by the compiler so that it does not grow the stack, so it is safe to use even on long inputs. You can add the @annotation.tailrec annotation to make the compiler check this: if the method is not tail recursive, compilation then fails with an error.
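Here is a self-contained variant of the recursive answer that also returns the final state, which the question asks for. The stub evaluator is copied from the question; RecDemo, evalAll and loop are names chosen for this sketch:

```scala
import scala.annotation.tailrec

object RecDemo {
  type ResType = Int
  case class State(i: ResType)
  val initialState = State(0)

  // The question's stub evaluator.
  def computeResultAndNewState(s: String, st: State): (ResType, State) =
    (s.toInt + st.i, State(st.i + 1))

  // Tail-recursive loop returning both the collected results and the final state.
  def evalAll(xs: List[String]): (List[ResType], State) = {
    @tailrec
    def loop(rest: List[String], st: State, acc: Vector[ResType]): (List[ResType], State) =
      rest match {
        case Nil => (acc.toList, st)
        case x :: tail =>
          val (res, next) = computeResultAndNewState(x, st)
          loop(tail, next, acc :+ res) // Vector append is effectively constant time
      }
    loop(xs, initialState, Vector.empty)
  }
}
```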
edit: renamed inner method to avoid ambiguity

Generalizing a collection method

If I want to generalize the following method to all collection types that support all the necessary operations (foldLeft, flatMap, map, and :+) then how do I do it? Currently it only works with lists.
Code:
def join[A](lists: List[List[A]]): List[List[A]] = {
lists.foldLeft(List(List[A]())) { case (acc, cur) =>
for {
a <- acc
c <- cur
} yield a :+ c
}
}
If you want this only for collections that support :+, the easiest way is just to define it in terms of Seq instead of List.
You can make it a lot more generic, all the way down to Traversable, by using builders. I'd be happy to explain that when I have a bit more time on my hands, but it tends to get complicated at that level.
Scalaz applicative functors is probably the way to go, but I'll let someone with more Scalaz experience than me handle that particular answer.