StringTokenizer to Scala Iterator

StringTokenizer to Scala Iterator - scala

I am trying to convert java.util.StringTokenizer to Scala's Iterator and the following approach fails:
def toIterator(st: StringTokenizer): Iterator[String] =
Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens()))
But this works:
def toIterator(st: StringTokenizer): Iterator[String] =
Iterator.fill(st.countTokens())(st.nextToken())
You can see this in Scala console:
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.continually(st.nextToken()).takeWhile(_ => st.hasMoreTokens())).toList
res1: List[String] = List(a, c)
scala> Iterator("a b", "c d").map(new java.util.StringTokenizer(_)).flatMap(st => Iterator.fill(st.countTokens())(st.nextToken())).toList
res2: List[String] = List(a, b, c, d)
As you can see res1 is incorrect and res2 is correct. What am I doing wrong? The first version should work and is preferred since it is 2x faster than second approach since it does not scan the string twice

takeWhile is not intended to be used statefully. It should take a pure function that determines, solely based on the input, whether or not to continue.
Specifically, the iterator must produce the value before the takeWhile predicate gets called. Even though your function ignores the takeWhile argument, it still gets evaluated. So nextToken gets called and then we check for more tokens.
To be perfectly precise, in your "a b" case,
First, we call nextToken, which is what Iterator.continually does. There's a next token, so it returns "a".
Now, to determine if we should include the next token, we call your predicate with "a" as argument. Your predicate ignores "a" and calls hasMoreTokens. Our tokenizer has more tokens (namely, "b"), so it returns true. Continue.
Now we call nextToken again. This returns "b".
We need to determine if we should include this in our result, so our takeWhile predicate runs with "b" as argument. Our takeWhile predicate ignores its argument and calls hasMoreTokens. We have no more tokens anymore, so this returns false. We should not include this element.
takeWhile has returned false, so we stop at the last element for which it returned true. Our resulting list is List("a").
Since we're abusing a pure functional technique like takeWhile to be stateful, we get unintuitive results.
As much as it looks snazzy and clever to have a one-line solution, what you have is a stateful, imperative object that you want to adapt to the Iterator interface. Hiding that statefulness in a bunch of pure function calls is not a good idea, so we should just write our own subclass of Iterator and do it properly.
import java.util.StringTokenizer
final class StringTokenizerIterator(
private val tokenizer: StringTokenizer
) extends Iterator[String] {
def hasNext: Boolean = tokenizer.hasMoreTokens
def next(): String = tokenizer.nextToken()
}
object Example {
def toIterator(st: StringTokenizer): Iterator[String] =
new StringTokenizerIterator(st)
def main(args: Array[String]) = {
println(Iterator("a b", "c d")
.map(new java.util.StringTokenizer(_))
.flatMap(toIterator(_))
.toList)
}
}
We're doing the same work you were doing calling the appropriate StringTokenizer functions, but we're doing it in a full class that encapsulates the state, rather than pretending the stateful part is not there. It's longer code, yes, but it should be longer. We don't want the messy parts of it to go unnoticed.

Related

Scala Seq vs List performance

So I am a bit perplexed. I have a piece of code in Scala (the exact code is not really all that important). I had all my methods written to take Seq[T]. These methods are mostly tail recursive and use the Seq[T] as an accumulator which is fed initially like Seq(). Interestingly enough, when I swap all the signature with the concrete List() implementation, I am observing three fold improvement in performance.
Isn't it the case that Seq's default implementation is in fact an immutable List ? So if that is the case, what is really going on ?

Calling Seq(1,2,3) and calling List(1,2,3) will both result in a 1 :: 2 :: 3 :: Nil. The Seq.apply method is just a very generic method that looks like this:
def apply[A](elems: A*): CC[A] = {
if (elems.isEmpty) empty[A]
else {
val b = newBuilder[A]
b ++= elems
b.result()
}
}
newBuilder is the thing that sort of matters here. That method delegates to scala.collection.immutable.Seq.newBuilder:
def newBuilder[A]: Builder[A, Seq[A]] = new mutable.ListBuffer
So the Builder for a Seq is a mutable.ListBuffer. A Seq gets constructed by appending the elements to the empty ListBuffer and then calling result on it, which is implemented like this:
def result: List[A] = toList
/** Converts this buffer to a list. Takes constant time. The buffer is
* copied lazily, the first time it is mutated.
*/
override def toList: List[A] = {
exported = !isEmpty
start
}
List also has a ListBuffer as Builder. It goes through a slightly different but similar building process. It is not going to make a big difference anyway, since I assume that most of your algorithm will consist of prepending things to a Seq, not calling Seq.apply(...) the whole time. Even if you did it shouldn't make much difference.
It's really not possible to say what is causing the behavior you're seeing without seeing the code that has that behavior.

Scala: "map" vs "foreach" - is there any reason to use "foreach" in practice?

In Scala collections, if one wants to iterate over a collection (without returning results, i.e. doing a side effect on every element of collection), it can be done either with
final def foreach(f: (A) ⇒ Unit): Unit
or
final def map[B](f: (A) ⇒ B): SomeCollectionClass[B]
With the exception of possible lazy mapping(*), from an end-user perspective, I see zero differences in these invocations:
myCollection.foreach { element =>
doStuffWithElement(element);
}
myCollection.map { element =>
doStuffWithElement(element);
}
given that I can just ignore what map outputs. I can't think of any specific reason why two different methods should exist & be used, when map seems to include all the functionality of foreach, and, in fact, I would be pretty much impressed if an intelligent compiler & VM won't optimize out that collection object creation given that it's not assigned to anything, or read, or used anywhere.
So, the question is - am I right - and there are no reasons to call foreach anywhere in one's code?
Notes:
(*) The lazy mapping concept, as throughly illustrated in this question, might change things a bit and justify usage of foreach, but as far as I can see, one specifically needs to stumble upon a LazyMap, normal
(**) If one's not using a collection, but writing one, then one would quickly stumble upon the fact that for comprehension syntax syntax is in fact a syntax sugar that generates "foreach" call, i.e. these two lines generate fully equivalent code:
for (element <- myCollection) { doStuffWithElement(element); }
myCollection.foreach { element => doStuffWithElement(element); }
So if one cares about other people using that collection class with for syntax, one might still want to implement foreach method.

I can think of a couple motivations:
When the foreach is the last line of a method that is of type Unit your compiler will not give an warning but will with map (and you need -Ywarn-value-discard on). Sometimes you get warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses using map but wouldn't with foreach.
General readability - a reader can know that your mutating some state without returning something at a glance, but greater cognitive resources would be required to understand the same operation if map was used
Further to 1. you also can have type checking when passing named functions around, then into map and foreach
Using foreach won't build a new list, so will be more efficient (thanks #Vishnu)

scala> (1 to 5).iterator map println
res0: Iterator[Unit] = non-empty iterator
scala> (1 to 5).iterator foreach println
1
2
3
4
5

I'd be impressed if the builder machinery could be optimized away.
scala> :pa
// Entering paste mode (ctrl-D to finish)
implicit val cbf = new collection.generic.CanBuildFrom[List[Int],Int,List[Int]] {
def apply() = new collection.mutable.Builder[Int, List[Int]] {
val b = new collection.mutable.ListBuffer[Int]
override def +=(i: Int) = { println(s"Adding $i") ; b +=(i) ; this }
override def clear() = () ; override def result() = b.result() }
def apply(from: List[Int]) = apply() }
// Exiting paste mode, now interpreting.
cbf: scala.collection.generic.CanBuildFrom[List[Int],Int,List[Int]] = $anon$2#e3cee7b
scala> List(1,2,3) map (_ + 1)
Adding 2
Adding 3
Adding 4
res1: List[Int] = List(2, 3, 4)
scala> List(1,2,3) foreach (_ + 1)

Ending a for-comprehension loop when a check on one of the items returns false

I am a bit new to Scala, so apologies if this is something a bit trivial.
I have a list of items which I want to iterate through. I to execute a check on each of the items and if just one of them fails I want the whole function to return false. So you can see this as an AND condition. I want it to be evaluated lazily, i.e. the moment I encounter the first false return false.
I am used to the for - yield syntax which filters items generated through some generator (list of items, sequence etc.). In my case however I just want to break out and return false without executing the rest of the loop. In normal Java one would just do a return false; within the loop.
In an inefficient way (i.e. not stopping when I encounter the first false item), I could do it:
(for {
item <- items
if !satisfiesCondition(item)
} yield item).isEmpty
Which is essentially saying that if no items make it through the filter all of them satisfy the condition. But this seems a bit convoluted and inefficient (consider you have 1 million items and the first one already did not satisfy the condition).
What is the best and most elegant way to do this in Scala?

Stopping early at the first false for a condition is done using forall in Scala. (A related question)
Your solution rewritten:
items.forall(satisfiesCondition)
To demonstrate short-circuiting:
List(1,2,3,4,5,6).forall { x => println(x); x < 3 }
1
2
3
res1: Boolean = false
The opposite of forall is exists which stops as soon as a condition is met:
List(1,2,3,4,5,6).exists{ x => println(x); x > 3 }
1
2
3
4
res2: Boolean = true

Scala's for comprehensions are not general iterations. That means they cannot produce every possible result that one can produce out of an iteration, as, for example, the very thing you want to do.
There are three things that a Scala for comprehension can do, when you are returning a value (that is, using yield). In the most basic case, it can do this:
Given an object of type M[A], and a function A => B (that is, which returns an object of type B when given an object of type A), return an object of type M[B];
For example, given a sequence of characters, Seq[Char], get UTF-16 integer for that character:
val codes = for (char <- "A String") yield char.toInt
The expression char.toInt converts a Char into an Int, so the String -- which is implicitly converted into a Seq[Char] in Scala --, becomes a Seq[Int] (actually, an IndexedSeq[Int], through some Scala collection magic).
The second thing it can do is this:
Given objects of type M[A], M[B], M[C], etc, and a function of A, B, C, etc into D, return an object of type M[D];
You can think of this as a generalization of the previous transformation, though not everything that could support the previous transformation can necessarily support this transformation. For example, we could produce coordinates for all coordinates of a battleship game like this:
val coords = for {
column <- 'A' to 'L'
row <- 1 to 10
} yield s"$column$row"
In this case, we have objects of the types Seq[Char] and Seq[Int], and a function (Char, Int) => String, so we get back a Seq[String].
The third, and final, thing a for comprehension can do is this:
Given an object of type M[A], such that the type M[T] has a zero value for any type T, a function A => B, and a condition A => Boolean, return either the zero or an object of type M[B], depending on the condition;
This one is harder to understand, though it may look simple at first. Let's look at something that looks simple first, say, finding all vowels in a sequence of characters:
def vowels(s: String) = for {
letter <- s
if Set('a', 'e', 'i', 'o', 'u') contains letter.toLower
} yield letter.toLower
val aStringVowels = vowels("A String")
It looks simple: we have a condition, we have a function Char => Char, and we get a result, and there doesn't seem to be any need for a "zero" of any kind. In this case, the zero would be the empty sequence, but it hardly seems worth mentioning it.
To explain it better, I'll switch from Seq to Option. An Option[A] has two sub-types: Some[A] and None. The zero, evidently, is the None. It is used when you need to represent the possible absence of a value, or the value itself.
Now, let's say we have a web server where users who are logged in and are administrators get extra javascript on their web pages for administration tasks (like wordpress does). First, we need to get the user, if there's a user logged in, let's say this is done by this method:
def getUser(req: HttpRequest): Option[User]
If the user is not logged in, we get None, otherwise we get Some(user), where user is the data structure with information about the user that made the request. We can then model that operation like this:
def adminJs(req; HttpRequest): Option[String] = for {
user <- getUser(req)
if user.isAdmin
} yield adminScriptForUser(user)
Here it is easier to see the point of the zero. When the condition is false, adminScriptForUser(user) cannot be executed, so the for comprehension needs something to return instead, and that something is the "zero": None.
In technical terms, Scala's for comprehensions provides syntactic sugars for operations on monads, with an extra operation for monads with zero (see list comprehensions in the same article).
What you actually want to accomplish is called a catamorphism, usually represented as a fold method, which can be thought of as a function of M[A] => B. You can write it with fold, foldLeft or foldRight in a sequence, but none of them would actually short-circuit the iteration.
Short-circuiting arises naturally out of non-strict evaluation, which is the default in Haskell, in which most of these papers are written. Scala, as most other languages, is by default strict.
There are three solutions to your problem:
Use the special methods forall or exists, which target your precise use case, though they don't solve the generic problem;
Use a non-strict collection; there's Scala's Stream, but it has problems that prevents its effective use. The Scalaz library can help you there;
Use an early return, which is how Scala library solves this problem in the general case (in specific cases, it uses better optimizations).
As an example of the third option, you could write this:
def hasEven(xs: List[Int]): Boolean = {
for (x <- xs) if (x % 2 == 0) return true
false
}
Note as well that this is called a "for loop", not a "for comprehension", because it doesn't return a value (well, it returns Unit), since it doesn't have the yield keyword.
You can read more about real generic iteration in the article The Essence of The Iterator Pattern, which is a Scala experiment with the concepts described in the paper by the same name.

forall is definitely the best choice for the specific scenario but for illustration here's good old recursion:
#tailrec def hasEven(xs: List[Int]): Boolean = xs match {
case head :: tail if head % 2 == 0 => true
case Nil => false
case _ => hasEven(xs.tail)
}
I tend to use recursion a lot for loops w/short circuit use cases that don't involve collections.

UPDATE:
DO NOT USE THE CODE IN MY ANSWER BELOW!
Shortly after I posted the answer below (after misinterpreting the original poster's question), I have discovered a way superior generic answer (to the listing of requirements below) here: https://stackoverflow.com/a/60177908/501113
It appears you have several requirements:
Iterate through a (possibly large) list of items doing some (possibly expensive) work
The work done to an item could return an error
At the first item that returns an error, short circuit the iteration, throw away the work already done, and return the item's error
A for comprehension isn't designed for this (as is detailed in the other answers).
And I was unable to find another Scala collections pre-built iterator that provided the requirements above.
While the code below is based on a contrived example (transforming a String of digits into a BigInt), it is the general pattern I prefer to use; i.e. process a collection and transform it into something else.
def getDigits(shouldOnlyBeDigits: String): Either[IllegalArgumentException, BigInt] = {
#scala.annotation.tailrec
def recursive(
charactersRemaining: String = shouldOnlyBeDigits
, accumulator: List[Int] = Nil
): Either[IllegalArgumentException, List[Int]] =
if (charactersRemaining.isEmpty)
Right(accumulator) //All work completed without error
else {
val item = charactersRemaining.head
val isSuccess =
item.isDigit //Work the item
if (isSuccess)
//This item's work completed without error, so keep iterating
recursive(charactersRemaining.tail, (item - 48) :: accumulator)
else {
//This item hit an error, so short circuit
Left(new IllegalArgumentException(s"item [$item] is not a digit"))
}
}
recursive().map(digits => BigInt(digits.reverse.mkString))
}
When it is called as getDigits("1234") in a REPL (or Scala Worksheet), it returns:
val res0: Either[IllegalArgumentException,BigInt] = Right(1234)
And when called as getDigits("12A34") in a REPL (or Scala Worksheet), it returns:
val res1: Either[IllegalArgumentException,BigInt] = Left(java.lang.IllegalArgumentException: item [A] is not digit)
You can play with this in Scastie here:
https://scastie.scala-lang.org/7ddVynRITIOqUflQybfXUA

Stream as constructor arg sometimes fully evaluated during early class initialization

Streams can be used as class constructor arguments:
scala> ( 0 to 10).toStream.map(i =>{println("bla" + i); -i})
bla0
res0: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> class B(val a:Seq[Int]){println(a.tail.head)}
defined class B
scala> new B(res0)
bla1
-1
res1: B = B#fdb84e
So, the Stream does not get fully evaluated although handed in as a Seq argument, and although being partly evaluated. Works as expected.
I have a class like this:
class HazelSimpleResultSet[T] (col: Seq[T], comparator:Comparator[T]) extends HGRandomAccessResult[T] with CountMe
{
val foo: Int = -1 // col of type Stream[T] already fully evaluated here.
def count = col.size
....
}
where HGRandomAccessResult and CountMe are simple interfaces.
I most cases I want to use Streams as col constructor arguments, to avoid costly operations. In the debugger I can follow that it works in some cases, since the value displayed for col remains Stream(xy, ?) and "tlVal = null", even after initialization of HazelSimpleResultSet.
Furthermore, for testing, I include println in the blocks that construct the Streams like this:
keyvalues.foldLeft(Stream.empty[KeyType]){ case (a, b) => ({ println("evaluating "+ b); unpack[KeyType](b)}) #:: a}
in order to follow in the Console when exactly the Stream is evaluted.
So, in some cases it works, but in some cases the Stream gets full evaluted during the very first moments of initialization of HazelSimpleResultSet. I cannot see no relevant difference in the Stream handed in, i'm just sure they are unevaluted Streams till that moment.
"Stepping into" with the debugger, I can see that it gets evaluted in the line of the class definition itself, before even reaching the class body, i.e. before initialization of any field.
EDIT:
I can define the class in a (suboptimal) way such that no field at all is referencing to the Stream, and still I get that behaviour.
The CountMe interface defines a "count" method, which calls col.size which would then evaluate all the Stream. I tried to define count in terms of a lazy val size, but that didn't make a difference.
I'm a bit at a loss why it doesn't work in some cases. Anybody has any hints about hidden caveats of Streams?
EDIT:
An important note: The Stream object wraps some serious state that it needs to evaluate, i.e. a reference to a NoSQL database (hazelcast).
Question: what are the caveats here? Is there something in particular I must take care of when my Stream carries stateful references necessary for evaluation?

If you create Stream like this:
Stream ({println("eval 1"); 1}, {println("eval 2"); 2})
then you are actually calling Stream.apply which is implemented like this:
/** A stream consisting of given elements */
override def apply[A](xs: A*): Stream[A] = xs.toStream
which means that what actually happens is:
All elements are evaluated!
A Seq containing these elements is created.
A Stream is created out of this Seq
So as you can see, if you create your Stream this way, all its elements are evaluated eagerly. This is not how you create lazily-evaluated Stream. What you want to do is probably use #:: and #::: operators that evaluate their operands lazily. Look up the docs for their usage.

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I have need of the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluator to the next to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {
val d = List("1", "2", "3") // some expressions to evaluate.
type ResType = Int
case class State(i : ResType) // trivial state for experiment purposes
val initialState = State(0)
// my stub/dummy "eval" function...obviously the real one will be...real.
def computeResultAndNewState(s : String, st : State) : (ResType, State) = {
val State(i) = st
val res = s.toInt + i
val newStateInt = i + 1
(res, State(newStateInt))
}
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
var state = initialState
val r = d.map {
s =>
{
val (result, newState) = computeResultAndNewState(s, state)
state = newState
result
}
}
println(r)
println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
// This startFold thing, requires explicit type. That alone makes it muddy.
val startFold : (List[ResType], State) = (Nil, initialState)
val (r, state) = d.foldLeft(startFold) {
case ((tail, st), s) => {
val (r, ns) = computeResultAndNewState(s, st)
(tail :+ r, ns) // we want a constant-time append here, not O(N). Or could Cons on front and reverse later
}
}
println(r)
println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
lazy val states = initialState #:: resultStates // there are states
lazy val args = d.toStream // there are arguments
lazy val argPairs = args zip states // put them together
lazy val resPairs : Stream[(ResType, State)] = argPairs.map{ case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
lazy val (results , resultStates) = myUnzip(resPairs)// Note .unzip causes infinite loop. Had to write my own.
lazy val r = results.toList
lazy val finalState = resultStates.last
println(r)
println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)

With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.

What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next you need is a monadic map function that computes values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if it is, please correct me). But it's possible to create one. To give a full example:
import scalaz._;
object StateExample
extends App
with States /* utility methods */
{
// The context that is threaded through the state.
// In our case, it just maps variables to integer values.
class Context(val map: Map[String,Int]);
// An example that returns the requested variable's value
// and increases it's value in the context.
def eval(expression: String): State[Context,Int] =
state((ctx: Context) => {
val v = ctx.map.get(expression).getOrElse(0);
(new Context(ctx.map + ((expression, v + 1)) ), v);
});
// Specialization of Haskell's mapM to our State monad.
def mapState[S,A,B](f: A => State[S,B])(xs: Seq[A]): State[S,Seq[B]] =
state((initState: S) => {
var s = initState;
// process the sequence, threading the state
// through the computation
val ys = for(x <- xs) yield { val r = f(x)(s); s = r._1; r._2 };
// return the final state and the output result
(s, ys);
});
// Example: Try to evaluate some variables, starting from an empty context.
val expressions = Seq("x", "y", "y", "x", "z", "x");
print( mapState(eval)(expressions) ! new Context(Map[String,Int]()) );
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer, he described how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, both less memory and CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use the high-level concepts to nicely describe the solution (perhaps sacrificing speed/memory a little) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem, but doesn't give us a broader answer. If we need to modify it a little, it might happen it won't work any more. Or, if the library starts to be complex, the code can start to be messy and we won't know how to improve it, how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can become complicated and complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined to more complex ones using flatMap method or for comprehensions, and finally we'll construct the desired task. The principle will stay the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying using the state monad is the best solution, and certainly it's not the fastest one. I just believe it is most didactic for a functional programmer - it describes the problem in a clean, abstract way.

You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
xs match {
case Nil => result
case x :: tail =>
val (res, newState) = computeResultAndNewState(x, priorState)
innerTestTheRecWay(tail, newState, result :+ res)
}
}
innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact scala does not have any loops other than while. fold, map, flatMap, for (which is just sugar for flatMap/map), etc. are all recursive.
This method is tail recursive and will be optimized by the compiler to not build a stack, so it is absolutely safe to use. You can add the #annotation.tailrec annotaion, to force the compiler to apply tail recursion elimination. If your method is not tailrec the compiler will then complain.
edit: renamed inner method to avoid ambiguity

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

StringTokenizer to Scala Iterator - scala

Related

Scala Seq vs List performance

Scala: "map" vs "foreach" - is there any reason to use "foreach" in practice?

Ending a for-comprehension loop when a check on one of the items returns false

Stream as constructor arg sometimes fully evaluated during early class initialization

what is proper monad or sequence comprehension to both map and carry state across?

Categories

Resources