Abort early in a fold - scala

What's the best way to terminate a fold early? As a simplified example, imagine I want to sum up the numbers in an Iterable, but if I encounter something I'm not expecting (say an odd number) I might want to terminate. This is a first approximation:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => None
}
}
However, this solution is pretty ugly (as in, if I did a .foreach and a return -- it'd be much cleaner and clearer) and worst of all, it traverses the entire iterable even if it encounters a non-even number.
So what would be the best way to write a fold like this, that terminates early? Should I just go and write this recursively, or is there a more accepted way?

My first choice would usually be to use recursion. It is only moderately less compact, is potentially faster (certainly no slower), and in early termination can make the logic more clear. In this case you need nested defs which is a little awkward:
def sumEvenNumbers(nums: Iterable[Int]) = {
def sumEven(it: Iterator[Int], n: Int): Option[Int] = {
if (it.hasNext) {
val x = it.next
if ((x % 2) == 0) sumEven(it, n+x) else None
}
else Some(n)
}
sumEven(nums.iterator, 0)
}
My second choice would be to use return, as it keeps everything else intact and you only need to wrap the fold in a def so you have something to return from--in this case, you already have a method, so:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
Some(nums.foldLeft(0){ (n,x) =>
if ((x % 2) != 0) return None
n+x
})
}
which in this particular case is a lot more compact than recursion (though we got especially unlucky with recursion since we had to do an iterable/iterator transformation). The jumpy control flow is something to avoid when all else is equal, but here it's not. No harm in using it in cases where it's valuable.
If I was doing this often and wanted it within the middle of a method somewhere (so I couldn't just use return), I would probably use exception-handling to generate non-local control flow. That is, after all, what it is good at, and error handling is not the only time it's useful. The only trick is to avoid generating a stack trace (which is really slow), and that's easy because the trait NoStackTrace and its child trait ControlThrowable already do that for you. Scala already uses this internally (in fact, that's how it implements the return from inside the fold!). Let's make our own (can't be nested, though one could fix that):
import scala.util.control.ControlThrowable
case class Returned[A](value: A) extends ControlThrowable {}
def shortcut[A](a: => A) = try { a } catch { case Returned(v) => v }
def sumEvenNumbers(nums: Iterable[Int]) = shortcut{
Option(nums.foldLeft(0){ (n,x) =>
if ((x % 2) != 0) throw Returned(None)
n+x
})
}
Here of course using return is better, but note that you could put shortcut anywhere, not just wrapping an entire method.
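The standard library already packages this pattern as scala.util.control.Breaks: break() throws a ControlThrowable and breakable catches it. A minimal sketch, assuming a mutable accumulator is acceptable here (the names sum and sawOdd are illustrative, not from the answer):

```scala
import scala.util.control.Breaks.{break, breakable}

// Sums the elements, giving up at the first odd number.
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
  var sum = 0
  var sawOdd = false
  breakable {
    for (x <- nums) {
      if (x % 2 != 0) { sawOdd = true; break() } // jumps to the end of breakable
      sum += x
    }
  }
  if (sawOdd) None else Some(sum)
}
```

Like shortcut, breakable can wrap any block in the middle of a method.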
Next in line for me would be to re-implement fold (either myself or to find a library that does it) so that it could signal early termination. The two natural ways of doing this are to not propagate the value but an Option containing the value, where None signifies termination; or to use a second indicator function that signals completion. The Scalaz lazy fold shown by Kim Stebel already covers the first case, so I'll show the second (with a mutable implementation):
def foldOrFail[A,B](it: Iterable[A])(zero: B)(fail: A => Boolean)(f: (B,A) => B): Option[B] = {
val ii = it.iterator
var b = zero
while (ii.hasNext) {
val x = ii.next
if (fail(x)) return None
b = f(b,x)
}
Some(b)
}
def sumEvenNumbers(nums: Iterable[Int]) = foldOrFail(nums)(0)(_ % 2 != 0)(_ + _)
(Whether you implement the termination by recursion, return, laziness, etc. is up to you.)
I think that covers the main reasonable variants; there are some other options also, but I'm not sure why one would use them in this case. (Iterator itself would work well if it had a findOrPrevious, but it doesn't, and the extra work it takes to do that by hand makes it a silly option to use here.)
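For comparison, the first variant mentioned above (the step function returns an Option, and None means "stop") can be sketched without any library. This is only an illustration; foldOpt is a hypothetical name and a tail-recursive loop over an Iterator is one of several possible implementations:

```scala
import scala.annotation.tailrec

// Left fold whose step function may return None to abort the whole fold.
def foldOpt[A, B](xs: Iterable[A])(zero: B)(f: (B, A) => Option[B]): Option[B] = {
  @tailrec
  def loop(it: Iterator[A], acc: B): Option[B] =
    if (!it.hasNext) Some(acc)
    else f(acc, it.next()) match {
      case Some(next) => loop(it, next) // keep folding
      case None       => None           // stop; remaining elements are never visited
    }
  loop(xs.iterator, zero)
}

def sumEvenNumbers(nums: Iterable[Int]): Option[Int] =
  foldOpt(nums)(0)((s, n) => if (n % 2 == 0) Some(s + n) else None)
```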

The scenario you describe (exit upon some unwanted condition) seems like a good use case for the takeWhile method. It is essentially filter, but should end upon encountering an element that doesn't meet the condition.
For example:
val list = List(2,4,6,8,6,4,2,5,3,2)
list.takeWhile(_ % 2 == 0) //result is List(2,4,6,8,6,4,2)
This will work just fine for Iterators/Iterables too. The solution I suggest for your "sum of even numbers, but break on odd" is:
list.iterator.takeWhile(_ % 2 == 0).foldLeft(...)
And just to prove that it's not wasting your time once it hits an odd number...
scala> val list = List(2,4,5,6,8)
list: List[Int] = List(2, 4, 5, 6, 8)
scala> def condition(i: Int) = {
| println("processing " + i)
| i % 2 == 0
| }
condition: (i: Int)Boolean
scala> list.iterator.takeWhile(condition _).sum
processing 2
processing 4
processing 5
res4: Int = 6
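Note that the sum above is 6 whether or not an odd number was hit, so takeWhile alone cannot produce the Option[Int] the question asks for. One way to keep the early exit and still report the failure is to record it in a flag. A minimal sketch (the name sawOdd is illustrative):

```scala
// Sums the leading even numbers and reports None if an odd number stopped the traversal.
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
  var sawOdd = false
  val sum = nums.iterator.takeWhile { n =>
    if (n % 2 != 0) sawOdd = true // remember why we stopped
    !sawOdd
  }.sum
  if (sawOdd) None else Some(sum)
}
```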

You can do what you want in a functional style using the lazy version of foldRight in scalaz. For a more in depth explanation, see this blog post. While this solution uses a Stream, you can convert an Iterable into a Stream efficiently with iterable.toStream.
import scalaz._
import Scalaz._
val str = Stream(2,1,2,2,2,2,2,2,2)
var i = 0 //only here for testing
val r = str.foldr(Some(0):Option[Int])((n,s) => {
println(i)
i+=1
if (n % 2 == 0) s.map(n+) else None
})
This only prints
0
1
which clearly shows that the anonymous function is only called twice (i.e. until it encounters the odd number). That is due to the definition of foldr, whose signature (in case of Stream) is def foldr[B](b: B)(f: (Int, => B) => B)(implicit r: scalaz.Foldable[Stream]): B. Note that the anonymous function takes a by-name parameter as its second argument, so it need not be evaluated.
Btw, you can still write this with the OP's pattern matching solution, but I find if/else and map more elegant.
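The same by-name trick can be sketched without Scalaz. The following is a hand-rolled illustration, not Scalaz's implementation; foldRightLazy is a hypothetical name, and it works on a List rather than a Stream to keep the example small:

```scala
// A right fold whose combining function receives the folded tail by name,
// so the tail is only computed if the function actually uses it.
// Not stack-safe: recursion depth grows with the number of elements consumed.
def foldRightLazy[A, B](xs: List[A])(z: => B)(f: (A, => B) => B): B = xs match {
  case head :: tail => f(head, foldRightLazy(tail)(z)(f))
  case Nil          => z
}

var calls = 0 // only here to count invocations
val r = foldRightLazy(List(2, 1, 2, 2, 2))(Option(0)) { (n, s) =>
  calls += 1
  if (n % 2 == 0) s.map(n + _) else None // s is never forced on an odd n
}
```

Here the function runs for 2 and then for 1, where it returns None without touching the rest of the list.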

Well, Scala does allow non local returns. There are differing opinions on whether or not this is a good style.
scala> def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
| nums.foldLeft (Some(0): Option[Int]) {
| case (None, _) => return None
| case (Some(s), n) if n % 2 == 0 => Some(s + n)
| case (Some(_), _) => None
| }
| }
sumEvenNumbers: (nums: Iterable[Int])Option[Int]
scala> sumEvenNumbers(2 to 10)
res8: Option[Int] = None
scala> sumEvenNumbers(2 to 10 by 2)
res9: Option[Int] = Some(30)
EDIT:
In this particular case, as @Arjan suggested, you can also do:
def sumEvenNumbers(nums: Iterable[Int]): Option[Int] = {
nums.foldLeft (Some(0): Option[Int]) {
case (Some(s), n) if n % 2 == 0 => Some(s + n)
case _ => return None
}
}

You can use foldM from the cats library (as suggested by @Didac), but I suggest using Either instead of Option if you want to get the actual sum out.
bifoldMap is used to extract the result from Either.
import cats.implicits._
def sumEven(nums: Stream[Int]): Either[Int, Int] = {
nums.foldM(0) {
case (acc, n) if n % 2 == 0 => Either.right(acc + n)
case (acc, n) => {
println(s"Stopping on number: $n")
Either.left(acc)
}
}
}
examples:
println("Result: " + sumEven(Stream(2, 2, 3, 11)).bifoldMap(identity, identity))
> Stopping on number: 3
> Result: 4
println("Result: " + sumEven(Stream(2, 7, 2, 3)).bifoldMap(identity, identity))
> Stopping on number: 7
> Result: 2

Cats has a method called foldM which does short-circuiting (for Vector, List, Stream, ...).
It works as follows:
def sumEvenNumbers(nums: Stream[Int]): Option[Long] = {
import cats.implicits._
nums.foldM(0L) {
case (acc, c) if c % 2 == 0 => Some(acc + c)
case _ => None
}
}
If it finds a not even element it returns None without computing the rest, otherwise it returns the sum of the even entries.
If you want to keep the count accumulated until an odd entry is found, you should use an Either[Long, Long].

@Rex Kerr, your answer helped me, but I needed to tweak it to use Either:
def foldOrFail[A,B,C,D](map: B => Either[D, C])(merge: (A, C) => A)(initial: A)(it: Iterable[B]): Either[D, A] = {
val ii= it.iterator
var b= initial
while (ii.hasNext) {
val x= ii.next
map(x) match {
case Left(error) => return Left(error)
case Right(d) => b= merge(b, d)
}
}
Right(b)
}
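A possible call site, applied to the question's example, might look as follows. foldOrFail is repeated from above so the snippet runs by itself; the error message and the explicit type arguments are illustrative choices (the type arguments are spelled out because A is only fixed by the third parameter list):

```scala
def foldOrFail[A, B, C, D](map: B => Either[D, C])(merge: (A, C) => A)(initial: A)(it: Iterable[B]): Either[D, A] = {
  val ii = it.iterator
  var acc = initial
  while (ii.hasNext) {
    map(ii.next()) match {
      case Left(error) => return Left(error) // first failure aborts the loop
      case Right(c)    => acc = merge(acc, c)
    }
  }
  Right(acc)
}

// Left carries a description of the offending element, Right the sum.
def sumEvenNumbers(nums: Iterable[Int]): Either[String, Int] =
  foldOrFail[Int, Int, Int, String](n => if (n % 2 == 0) Right(n) else Left(s"odd number: $n"))(_ + _)(0)(nums)
```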

You could try using a temporary var and using takeWhile. Here is a version.
var continue = true
// sample stream of 2's and then a stream of 3's.
val evenSum = (Stream.fill(10)(2) ++ Stream.fill(10)(3)).takeWhile(_ => continue)
.foldLeft(Option[Int](0)){
case (result,i) if i%2 != 0 =>
continue = false;
// return whatever is appropriate either the accumulated sum or None.
result
case (optionSum,i) => optionSum.map( _ + i)
}
The evenSum should be Some(20) in this case.

You can throw a well-chosen exception upon encountering your termination criterion, handling it in the calling code.

A more beautiful solution would be using span:
val (l, r) = numbers.span(_ % 2 == 0)
if(r.isEmpty) Some(l.sum)
else None
... but it traverses the list two times if all the numbers are even

Just for "academic" reasons (:
var headers = Source.fromFile(file).getLines().next().split(",")
var closeHeaderIdx = headers.takeWhile { s => !"Close".equals(s) }.foldLeft(0)((i, S) => i+1)
It takes twice as long as it should, but it is a nice one-liner.
If "Close" is not found, it will return
headers.size
Another (better) is this one:
var headers = Source.fromFile(file).getLines().next().split(",").toList
var closeHeaderIdx = headers.indexOf("Close")

Related

How do you stop building an Option[Collection] upon reaching the first None?

When building up a collection inside an Option, each attempt to make the next member of the collection might fail, making the collection as a whole a failure, too. Upon the first failure to make a member, I'd like to give up immediately and return None for the whole collection. What is an idiomatic way to do this in Scala?
Here's one approach I've come up with:
def findPartByName(name: String): Option[Part] = . . .
def allParts(names: Seq[String]): Option[Seq[Part]] =
names.foldLeft(Some(Seq.empty): Option[Seq[Part]]) {
(result, name) => result match {
case Some(parts) =>
findPartByName(name) flatMap { part => Some(parts :+ part) }
case None => None
}
}
In other words, if any call to findPartByName returns None, allParts returns None. Otherwise, allParts returns a Some containing a collection of Parts, all of which are guaranteed to be valid. An empty collection is OK.
The above has the advantage that it stops calling findPartByName after the first failure. But the foldLeft still iterates once for each name, regardless.
Here's a version that bails out as soon as findPartByName returns a None:
def allParts2(names: Seq[String]): Option[Seq[Part]] = Some(
for (name <- names) yield findPartByName(name) match {
case Some(part) => part
case None => return None
}
)
I currently find the second version more readable, but (a) what seems most readable is likely to change as I get more experience with Scala, (b) I get the impression that early return is frowned upon in Scala, and (c) neither one seems to make what's going on especially obvious to me.
The combination of "all-or-nothing" and "give up on the first failure" seems like such a basic programming concept, I figure there must be a common Scala or functional idiom to express it.
The return in your code is actually a couple levels deep in anonymous functions. As a result, it must be implemented by throwing an exception which is caught in the outer function. This isn't efficient or pretty, hence the frowning.
It is easiest and most efficient to write this with a while loop and an Iterator.
def allParts3(names: Seq[String]): Option[Seq[Part]] = {
val iterator = names.iterator
var accum = List.empty[Part]
while (iterator.hasNext) {
findPartByName(iterator.next) match {
case Some(part) => accum +:= part
case None => return None
}
}
Some(accum.reverse)
}
Because we don't know what kind of Seq names is, we must create an iterator to loop over it efficiently rather than using tail or indexes. The while loop can be replaced with a tail-recursive inner function, but with the iterator a while loop is clearer.
Scala collections have some options to use laziness to achieve that.
You can use view and takeWhile:
def allPartsWithView(names: Seq[String]): Option[Seq[Part]] = {
val successes = names.view.map(findPartByName)
.takeWhile(!_.isEmpty)
.map(_.get)
.force
if (!names.isDefinedAt(successes.size)) Some(successes)
else None
}
Using isDefinedAt avoids potentially traversing a long input names in the case of an early failure.
You could also use toStream and span to achieve the same thing:
def allPartsWithStream(names: Seq[String]): Option[Seq[Part]] = {
val (good, bad) = names.toStream.map(findPartByName)
.span(!_.isEmpty)
if (bad.isEmpty) Some(good.map(_.get).toList)
else None
}
I've found trying to mix view and span causes findPartByName to be evaluated twice per item in case of success.
The whole idea of returning an error condition if any error occurs does, however, sound more like a job ("the" job?) for throwing and catching exceptions. I suppose it depends on the context in your program.
Combining the other answers, i.e., a mutable flag with the map and takeWhile we love.
Given an infinite stream:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
Take until a predicate fails:
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); Some(x) case x => println(s"Nope $x"); failed = true; None } takeWhile (_.nonEmpty) map (_.get)
Yup 1
res0: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res1: List[Int] = List(1, 2, 3, 4)
or more simply:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); x case x => println(s"Nope $x"); failed = true; -1 } takeWhile (_ => !failed)
Yup 1
res3: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res4: List[Int] = List(1, 2, 3, 4)
I think your allParts2 function has a problem as one of the two branches of your match statement will perform a side effect. The return statement is the not-idiomatic bit, behaving as if you are doing an imperative jump.
The first function looks better, but if you are concerned with the sub-optimal iteration that foldLeft could produce you should probably go for a recursive solution as the following:
import scala.annotation.tailrec
def allParts(names: Seq[String]): Option[Seq[Part]] = {
  @tailrec
  def allPartsRec(names: Seq[String], acc: Seq[Part]): Option[Seq[Part]] = names match {
    case Seq(x, xs @ _*) => findPartByName(x) match {
      case Some(part) => allPartsRec(xs, acc :+ part) // append to keep the input order
      case None => None // first failure ends the recursion
    }
    case _ => Some(acc)
  }
  allPartsRec(names, Seq.empty)
}
I didn't compile/run it but the idea should be there and I believe it is more idiomatic than using the return trick!
I keep thinking that this has to be a one- or two-liner. I came up with one:
def allParts4(names: Seq[String]): Option[Seq[Part]] = Some(
names.map(findPartByName(_) getOrElse { return None })
)
Advantage:
The intent is extremely clear. There's no clutter and there's no exotic or nonstandard Scala.
Disadvantages:
The early return violates referential transparency, as Aldo Stracquadanio pointed out. You can't put the body of allParts4 into its calling code without changing its meaning.
Possibly inefficient due to the internal throwing and catching of an exception, as wingedsubmariner pointed out.
Sure enough, I put this into some real code, and within ten minutes, I'd enclosed the expression inside something else, and predictably got surprising behavior. So now I understand a little better why early return is frowned upon.
This is such a common operation, so important in code that makes heavy use of Option, and Scala is normally so good at combining things, I can't believe there isn't a pretty natural idiom to do it correctly.
Aren't monads good for specifying how to combine actions? Is there a GiveUpAtTheFirstSignOfResistance monad?
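In a sense Option itself already behaves that way: flatMap on a None skips everything after it. A minimal sketch of an all-or-nothing traversal built on that, assuming a List input; traverseOpt is a hypothetical helper, and Part and findPartByName below are stand-ins for the ones elided in the question. Libraries such as Cats expose an operation of the same shape under the name traverse.

```scala
// Hypothetical stand-ins for the question's Part and findPartByName.
case class Part(name: String)
def findPartByName(name: String): Option[Part] =
  if (name.nonEmpty) Some(Part(name)) else None

// All-or-nothing traversal: once f returns None, the for-comprehension
// short-circuits and no further lookups happen.
// Not stack-safe for very long lists.
def traverseOpt[A, B](xs: List[A])(f: A => Option[B]): Option[List[B]] = xs match {
  case Nil => Some(Nil)
  case head :: tail =>
    for {
      b  <- f(head)
      bs <- traverseOpt(tail)(f)
    } yield b :: bs
}

def allParts(names: Seq[String]): Option[Seq[Part]] =
  traverseOpt(names.toList)(findPartByName)
```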

Accumulate result until some condition is met in functional way

I have some expensive computation in a loop, and I need to find the max value produced by the calculations. However, if a value equals, say, LIMIT, I'd like to stop the calculation and return my accumulator.
It may easily be done by recursion:
val list: List[Int] = ???
val UpperBound = ???
def findMax(ls: List[Int], max: Int): Int = ls match {
case h :: rest =>
val v = expensiveComputation(h)
if (v == UpperBound) v
else findMax(rest, math.max(max, v))
case _ => max
}
findMax(list, 0)
My question: does this pattern have a name, and is it reflected in the Scala collections library?
Update: Do something up to N times or until condition is met in Scala - There is an interesting idea (using laziness and find or exists at the end) but it is not directly applicable to my particular case or requires mutable var to track accumulator.
I think your recursive function is quite nice, so honestly I wouldn't change that, but here's a way to use the collections library:
list.foldLeft(0) {
case (max, next) =>
if(max == UpperBound)
max
else
math.max(expensiveComputation(next), max)
}
It will iterate over the whole list, but after it has hit the upper bound it won't perform the expensive computation.
Update
Based on your comment I tried adapting foldLeft a bit, based on LinearSeqOptimized's foldLeft implementation.
def foldLeftWithExit[A, B](list: Seq[A])(z: B)(exit: B => Boolean)(f: (B, A) => B): B = {
var acc = z
var remaining = list
while (!remaining.isEmpty && !exit(acc)) {
acc = f(acc, remaining.head)
remaining = remaining.tail
}
acc
}
Calling it:
foldLeftWithExit(list)(0)(UpperBound==){
case (max, next) => math.max(expensiveComputation(next), max)
}
You could potentially use implicits to omit the first parameter of list.
Hope this helps.
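Since tail may be costly on sequences that are not linked lists, the same loop can also be driven by an Iterator, which works for any Iterable. A minimal sketch; foldLeftUntil is a hypothetical name, and expensiveComputation below is a stand-in that counts its calls:

```scala
// Left fold that stops as soon as the accumulator satisfies `exit`.
def foldLeftUntil[A, B](xs: Iterable[A])(z: B)(exit: B => Boolean)(f: (B, A) => B): B = {
  val it = xs.iterator
  var acc = z
  while (it.hasNext && !exit(acc)) acc = f(acc, it.next())
  acc
}

var calls = 0 // only here to show how often the computation runs
def expensiveComputation(n: Int): Int = { calls += 1; n * 10 }
val UpperBound = 30

val result = foldLeftUntil(List(1, 3, 2, 5))(0)(_ == UpperBound) {
  (max, next) => math.max(expensiveComputation(next), max)
}
```

With this input the bound is reached at the second element, so the last two elements are never computed.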

Idiomatic "do until" collection updating

Scenario:
val col: IndexedSeq[Array[Char]] = for (i <- 1 to n) yield {
val x = for (j <- 1 to m) yield 'x'
x.toArray
}
This is a fairly simple char matrix. toArray used to allow updating.
var west = last.x - 1
while (west >= 0 && arr(last.y)(west) == '.') {
arr(last.y)(west) = ch;
west -= 1;
}
This is updating all . to ch until a non-dot char is found.
Generically, update until stop condition is met, unknown number of steps.
What is the idiomatic equivalent of it?
Conclusion
It's doable, but the trade-off isn't worth it: a lot of performance is lost to expressive syntax when the collection allows updating.
Your wish for a "cleaner, more idiomatic" solution is of course a little fuzzy, because it leaves a lot of room for subjectivity. In general, I'd consider a tail-recursive updating routine more idiomatic, but it might not be "cleaner" if you're more familiar with a non-functional programming style. I came up with this:
@tailrec
def update(arr:List[Char], replace:Char, replacement:Char, result:List[Char] = Nil):List[Char] = arr match {
case `replace` :: tail =>
update(tail, replace, replacement, replacement :: result)
case _ => result.reverse ::: arr
}
This takes one of the inner sequences (assuming a List for easier pattern matching, since Arrays are trivially convertible to lists), and replaces the replace char with the replacement recursively.
You can then use map to update the outer sequence, like so:
col.map { x => update(x, '.', ch) }
Another more reusable alternative is writing your own mapUntil, or using one which is implemented in a supplemental library (Scalaz probably has something like it). The one I came up with looks like this:
def mapUntil[T](input:List[T])(f:(T => Option[T])) = {
@tailrec
def inner(xs:List[T], result:List[T]):List[T] = xs match {
case Nil => result.reverse
case head :: tail => f(head) match {
case None => (head :: result).reverse ::: tail
case Some(x) => inner(tail, x :: result)
}
}
inner(input, Nil)
}
It does the same as a regular map invocation, except that it stops as soon as the passed function returns None, e.g.
mapUntil(List(1,2,3,4)) {
case x if x >= 3 => None
case x => Some(x-1)
}
Will result in
List[Int] = List(0, 1, 3, 4)
If you want to look at Scalaz, this answer might be a good place to start.
x3ro's answer is the right answer, esp. if you care about performance or are going to be using this operation in multiple places. I would like to add simple solution using only what you find in the collections API:
col.map { a =>
val (l, r) = a.span(_ == '.')
l.map {
case '.' => ch
case x => x
} ++ r
}
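If the in-place update of the question should be kept, the while loop can also be expressed as a lazy takeWhile over the indices. A minimal sketch, assuming a single row is passed in; fillWest is a hypothetical name, and a call corresponding to the question's loop would be fillWest(arr(last.y), last.x - 1, ch):

```scala
// Walks from index `from` down to 0 and overwrites '.' cells with `ch`,
// stopping at the first cell that is not a dot. The iterator is lazy, so
// each cell is tested immediately before it is updated.
def fillWest(row: Array[Char], from: Int, ch: Char): Unit =
  (from to 0 by -1).iterator
    .takeWhile(i => row(i) == '.')
    .foreach(i => row(i) = ch)
```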

Creating a repeating true/false List in scala

I want to generate a Seq/List of true/false values which I can zip with some input in order to do the equivalent of checking whether a for loop index is odd/even.
Is there a better way than
input.zip((1 to n).map(_ % 2 == 0))
or
input.zip(List.tabulate(n)(_ % 2 != 0))
I would have thought something like (true, false).repeat(n/2) is more obvious
Using @DaveGriffith's idea:
input.zip(Stream.iterate(false)(!_))
Or, if you use this pattern in several places:
def falseTrueStream = Stream.iterate(false)(!_)
input.zip(falseTrueStream)
This has the distinct advantage of not needing to specify the size of the false-true list.
Edit:
Of course, def falseTrueStream creates the stream of true/false objects every time you use it, and as @DanielCSobral mentioned, making it a val will cause the objects to be held in memory (until the program ends if the val is on an object).
If you're slightly evil and want to prematurely optimize it, you can build the Stream objects yourself.
object TrueFalseStream extends Stream[Boolean] {
val tailDefined = true
override val isEmpty = false
override val head = true
override val tail = FalseTrueStream
}
object FalseTrueStream extends Stream[Boolean] {
val tailDefined = true
override val isEmpty = false
override val head = false
override val tail = TrueFalseStream
}
If you want a list of alternating true/false of size n:
List.iterate(false, n)(!_)
So then you could do:
val input = List("a", "b", "c", "d")
input.zip(List.iterate(false, input.length)(!_))
//List[(java.lang.String, Boolean)] = List((a,false), (b,true), (c,false), (d,true))
There's a very useful function in Haskell - cycle - which is useful for such purposes:
haskell> zip [1..7] $ cycle [True, False]
[(1,True),(2,False),(3,True),(4,False),(5,True),(6,False),(7,True)]
For some reason, Scala standard library doesn't have it. You can define it on your own, and then use it.
scala> def cycle[A](s: Stream[A]): Stream[A] = Stream.continually(s).flatten
cycle: [A](s: Stream[A])Stream[A]
scala> (1 to 7) zip cycle(Stream(true, false))
res13: scala.collection.immutable.IndexedSeq[(Int, Boolean)] = Vector((1,true), (2,false), (3,true), (4,false), (5,true), (6,false), (7,true))
You want
input.indices.map(_%2==0)
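Two closely related variants that avoid building a separate Boolean list of known length, sketched with a sample input (the names byIndex and byIterator are illustrative):

```scala
val input = List("a", "b", "c", "d")

// Pair each element with "is its index even?" directly.
val byIndex = input.zipWithIndex.map { case (x, i) => (x, i % 2 == 0) }

// Or zip against an endless alternating iterator; zip stops at the shorter side.
val byIterator = input.iterator.zip(Iterator.iterate(true)(!_)).toList
```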
I couldn't come up with anything simpler (and this is far from simple):
(for(_ <- 1 to n/2) yield List(true, false)).flatten
and:
(1 to n/2).foldLeft(List[Boolean]()) {(cur,_) => List(true, false) ++ cur}
Watch for odd n!
However based on your requirements it looks like you might want to have something lazy:
def oddEven(init: Boolean): Stream[Boolean] = Stream.cons(init, oddEven(!init))
...and it never ends (try: oddEven(true) foreach println). Now you can take as much as you want:
oddEven(true).take(10).toList
...in order to do the equivalent of checking whether a for loop index is odd/even.
I'm ignoring your specific request, and addressing your main concern in a different way.
You can make your own control function, like so:
def for2[A,B](xs: List[A])(f: A => Unit, g: A => Unit): Unit = xs match {
case (y :: ys) => {
f(y)
for2(ys)(g, f)
}
case _ => ()
}
Testing
> for2(List(0,1,2,3,4,5))((x) => println("E: " + x), (x) => println("O: " + x))
E: 0
O: 1
E: 2
O: 3
E: 4
O: 5

Functional code for looping with early exit

How can I refactor this code in functional style (scala idiomatic)
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return obj
}
null.asInstanceOf[T]
}
This is almost exactly what the find method does, except that it returns an Option. So if you want this exact behavior, you can add a call to Option.orNull, like this:
objects.find(expensiveFunc(_) != null).orNull
First, don't use null in Scala (except when interacting with Java code); use Option instead. Second, replace loops with recursion. Third, have a look at the rich API of Scala functions; the method you are looking for already exists, as pointed out by sepp2k.
For learning purposes, your example could be rewritten as:
def findFirst[T](objects: List[T]):Option[T] = objects match {
case first :: rest if expensiveFunc( first ) != null => Some( first )
case _ :: rest => findFirst( rest )
case Nil => None
}
How about a fold?
Our somehow pseudo-expensive function:
scala> def divByFive (n: Int) : Option[Int] = {
| println ("processing " + n)
| if (n % 5 == 0) Some (n) else None }
divByFive: (n: Int)Option[Int]
Folding on an Option:
scala> ((None: Option[Int]) /: (1 to 11)) ((a, b) =>
| if (a != None) a else divByFive (b))
processing 1
processing 2
processing 3
processing 4
processing 5
res69: Option[Int] = Some(5)
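The fold above still visits every element of the range, even though it stops calling divByFive after the first hit. An Iterator makes the traversal itself stop early. A minimal sketch, with a counter standing in for the println:

```scala
var calls = 0 // counts how many elements were actually processed
def divByFive(n: Int): Option[Int] = {
  calls += 1
  if (n % 5 == 0) Some(n) else None
}

// Iterator.map is lazy and collectFirst stops at the first match.
val first = (1 to 11).iterator.map(divByFive).collectFirst { case Some(n) => n }
```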