Why are the Scala libraries implemented with mutable state? - scala

Why are some methods in Scala's standard libraries implemented with mutable state?
For instance, the find method as part of scala.Iterator class is implemented as
def find(p: A => Boolean): Option[A] = {
  var res: Option[A] = None
  while (res.isEmpty && hasNext) {
    val e = next()
    if (p(e)) res = Some(e)
  }
  res
}
Which could have been implemented as a @tailrec'd method, perhaps something like
def findNew(p: A => Boolean): Option[A] = {
  @tailrec
  def findRec(e: A): Option[A] = {
    if (p(e)) Some(e)
    else {
      if (hasNext) findRec(next())
      else None
    }
  }
  if (hasNext) findRec(next())
  else None
}
Now I suppose one argument could be that mutable state and a while loop are more efficient, which is understandably very important in library code, but is that really the case compared with a @tailrec'd method?

There is no harm in having mutable state as long as it is not shared.
In your example there is no way the mutable var could be accessed from outside, so it cannot change as the result of a side effect elsewhere.
It's always good to enforce immutability as much as possible, but when performance matters there is nothing wrong with some mutability as long as it is constrained in a safe way.
NOTE: Iterator is a data structure that is not free of side effects, and this can lead to some weird behavior, but that is another story and in no way the reason for designing a method this way. You'll find methods like this in immutable data structures too.

In this case the @tailrec version quite possibly has the same performance as the while loop, since the compiler rewrites a self-recursive tail call into a jump. I would say that here the while loop solution is also the more concise one.
But iterators are a mutable abstraction anyway, so the gain from a tail-recursive method that avoids a var local to that short snippet is questionable.
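To make the comparison concrete, here is a minimal sketch that puts both styles side by side as standalone functions. The names FindSketch, findLoop and findRec are made up for illustration and are not part of the standard library; the iterator is passed in explicitly instead of being the receiver.

```scala
import scala.annotation.tailrec

object FindSketch {
  // Loop-based version, mirroring the library code quoted in the question.
  def findLoop[A](it: Iterator[A])(p: A => Boolean): Option[A] = {
    var res: Option[A] = None
    while (res.isEmpty && it.hasNext) {
      val e = it.next()
      if (p(e)) res = Some(e)
    }
    res
  }

  // Tail-recursive version. @tailrec makes the compiler reject the method
  // if the self-call cannot be compiled into a loop.
  @tailrec
  def findRec[A](it: Iterator[A])(p: A => Boolean): Option[A] =
    if (!it.hasNext) None
    else {
      val e = it.next()
      if (p(e)) Some(e) else findRec(it)(p)
    }
}
```

Both versions consume the iterator up to and including the first match, so the mutation of the underlying iterator is the same either way; only the local var disappears.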

Scala is not designed for functional purity but for broadly useful capability. Part of this includes trying to have the most efficient implementations of basic library routines (certainly not universally true, but it often is).
As such, if you have two possible interfaces:
trait Iterator[A] { def next: A }
trait FunctionalIterator[A] { def next: (A, FunctionalIterator[A]) }
and the second one is awkward and slower, it's quite sensible to choose the first.
When a functionally pure implementation is superior for the bulk of use cases, you'll typically find the functionally pure one.
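As a rough illustration of why the second interface is more awkward, here is a sketch of a pure iterator over a List. It is an assumption-laden variant: next returns an Option so that exhaustion can be signalled, which the one-line trait above does not address, and the names FunctionalIteratorSketch, fromList and find are hypothetical.

```scala
import scala.annotation.tailrec

object FunctionalIteratorSketch {
  // Pure interface: each step returns the element plus the remaining iterator,
  // or None when there is nothing left (the Option is an added assumption).
  trait FunctionalIterator[A] {
    def next: Option[(A, FunctionalIterator[A])]
  }

  // Builds a fresh iterator object for every remaining suffix of the list.
  def fromList[A](xs: List[A]): FunctionalIterator[A] =
    new FunctionalIterator[A] {
      def next: Option[(A, FunctionalIterator[A])] = xs match {
        case head :: tail => Some((head, fromList(tail)))
        case Nil          => None
      }
    }

  // find has to thread the "rest" value through every step.
  @tailrec
  def find[A](it: FunctionalIterator[A])(p: A => Boolean): Option[A] =
    it.next match {
      case Some((a, rest)) => if (p(a)) Some(a) else find(rest)(p)
      case None            => None
    }
}
```

Every step allocates a tuple, an Option and a new iterator object, which gives a feel for the overhead the answer alludes to.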
And when it comes to simply using a while loop vs. recursion, either one is easy enough to maintain so it's really up to the preferences of the coder. Note that find would have to be marked final in the tailrec case, so while preserves more flexibility:
trait Foo {
  def next: Int
  def foo: Int = {
    var a = next
    while (a < 0) a = next
    a
  }
}
defined trait Foo
trait Bar {
  def next: Int
  @tailrec def bar: Int = {
    val a = next
    if (a < 0) bar else a
  }
}
<console>:10: error: could not optimize @tailrec annotated method bar:
it is neither private nor final so can be overridden
  @tailrec def bar: Int = {
               ^
There are ways to get around this (nested methods, final, redirecting to a private method, etc.), but they tend to add boilerplate to the point where the while loop is syntactically more compact.
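For reference, the three workarounds mentioned above could look roughly like this. It is only a sketch: the trait names FinalBar, NestedBar and PrivateBar and the helper names loop and barImpl are invented for the example.

```scala
import scala.annotation.tailrec

object TailrecWorkarounds {
  // 1. Mark the method final so that it cannot be overridden.
  trait FinalBar {
    def next: Int
    @tailrec final def bar: Int = {
      val a = next
      if (a < 0) bar else a
    }
  }

  // 2. Keep the public method overridable and nest the loop inside it.
  trait NestedBar {
    def next: Int
    def bar: Int = {
      @tailrec def loop(): Int = {
        val a = next
        if (a < 0) loop() else a
      }
      loop()
    }
  }

  // 3. Redirect the public method to a private one.
  trait PrivateBar {
    def next: Int
    def bar: Int = barImpl
    @tailrec private def barImpl: Int = {
      val a = next
      if (a < 0) barImpl else a
    }
  }
}
```

Each variant needs an extra modifier or an extra definition compared with the while version, which is the boilerplate the answer refers to.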

Related

Scala: List.forall using while loop

I see following implementation in List.scala from Scala library:
override final def forall(p: A => Boolean): Boolean = {
  var these: List[A] = this
  while (!these.isEmpty) {
    if (!p(these.head)) return false
    these = these.tail
  }
  true
}
This method can be implemented recursively to get rid of the var and the while loop.
From reading the books, blogs, and articles available online, I am under the impression that we are supposed to follow the recursive approach as much as we can in Scala.
The mutability of var these is not visible outside the forall method and likely helps with performance
override final def forall(p: A => Boolean): Boolean = {
  var these: List[A] = this
  ...
  true
} // var is out-of-scope at this point
so technically forall is still pure from the perspective of callers. A tail-recursive approach would probably perform similarly to var+while, though.
The recursive approach is considered more declarative and readable, which is why it is generally recommended. On the other hand, the functional approach implies immutability, which can degrade performance, so using var and while inside the Scala library is understandable. I assume the library authors did not expect users to read this code much, since it is meant to stay under the hood.
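For comparison, a tail-recursive counterpart might look like the sketch below. It is written as a standalone function on an explicit list rather than as a List method, and the names ForallSketch and forallRec are hypothetical.

```scala
import scala.annotation.tailrec

object ForallSketch {
  // Stops at the first element that fails the predicate, like the loop version.
  @tailrec
  def forallRec[A](xs: List[A])(p: A => Boolean): Boolean = xs match {
    case Nil          => true
    case head :: tail => if (!p(head)) false else forallRec(tail)(p)
  }
}
```

Because of @tailrec the recursion is compiled to a loop, so long lists do not overflow the stack.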

Why is this method causing a StackOverflowError? [duplicate]

Why won't the Scala compiler apply tail call optimization unless a method is final?
For example, this:
class C {
  @tailrec def fact(n: Int, result: Int): Int =
    if(n == 0)
      result
    else
      fact(n - 1, n * result)
}
results in
error: could not optimize @tailrec annotated method: it is neither private nor final so can be overridden
What exactly would go wrong if the compiler applied TCO in a case such as this?
Consider the following interaction with the REPL. First we define a class with a factorial method:
scala> class C {
         def fact(n: Int, result: Int): Int =
           if(n == 0) result
           else fact(n - 1, n * result)
       }
defined class C
scala> (new C).fact(5, 1)
res11: Int = 120
Now let's override it in a subclass to double the superclass's answer:
scala> class C2 extends C {
         override def fact(n: Int, result: Int): Int = 2 * super.fact(n, result)
       }
defined class C2
scala> (new C).fact(5, 1)
res12: Int = 120
scala> (new C2).fact(5, 1)
What result do you expect for this last call? You might be expecting 240. But no:
scala> (new C2).fact(5, 1)
res13: Int = 7680
That's because when the superclass's method makes a recursive call, the recursive call goes through the subclass.
If overriding worked such that 240 was the right answer, then it would be safe for tail-call optimization to be performed in the superclass here. But that isn't how Scala (or Java) works.
Unless a method is marked final, it might not be calling itself when it makes a recursive call.
And that's why @tailrec doesn't work unless a method is final (or private).
UPDATE: I recommend reading the other two answers (John's and Rex's) as well.
Recursive calls might be to a subclass instead of to a superclass; final will prevent that. But why might you want that behavior? The factorial example doesn't provide any clues. But this does:
class Pretty {
  def recursivePrinter(a: Any): String = { a match {
    case xs: List[_] => xs.map(recursivePrinter).mkString("L[",",","]")
    case xs: Array[_] => xs.map(recursivePrinter).mkString("A[",",","]")
    case _ => a.toString
  }}
}
class Prettier extends Pretty {
  override def recursivePrinter(a: Any): String = { a match {
    case s: Set[_] => s.map(recursivePrinter).mkString("{",",","}")
    case _ => super.recursivePrinter(a)
  }}
}
scala> (new Prettier).recursivePrinter(Set(Set(0,1),1))
res8: String = {{0,1},1}
If the Pretty call was tail-recursive, we'd print out {Set(0, 1),1} instead since the extension wouldn't apply.
Since this sort of recursion is plausibly useful, and would be destroyed if tail calls on non-final methods were allowed, the compiler inserts a real call instead.
Let foo::fact(n, res) denote your routine. Let baz::fact(n, res) denote someone else's override of your routine.
The compiler is telling you that the semantics allow baz::fact() to be a wrapper that may call up to foo::fact() if it wants to. In that scenario, when foo::fact() recurs it must invoke baz::fact() rather than foo::fact(), and while foo::fact() is tail-recursive, baz::fact() may not be. So instead of looping on the tail-recursive call, foo::fact() must return to baz::fact() so that it can unwind itself.
What exactly would go wrong if the compiler applied TCO in a case such as this?
Nothing would go wrong. Any language with proper tail call elimination will do this (SML, OCaml, F#, Haskell etc.). The only reason Scala does not is that the JVM does not support general tail calls, and Scala's usual hack of replacing self-recursive calls in tail position with goto does not work in this case. Scala on the CLR could do this as F# does.
The popular and accepted answer to this question is actually misleading, because the question itself is confusing. The OP does not make the distinction between tailrec and TCO, and the answer does not address this.
The key point is that the requirements for tailrec are more strict than the requirements for TCO.
The tailrec annotation requires that tail calls are made to the same function, whereas TCO can be used on tail calls to any function.
The compiler could use TCO on fact because there is a call in the tail position. Specifically, it could turn the call to fact into a jump to fact by adjusting the stack appropriately. It does not matter that this version of fact is not the same as the function making the call.
So the accepted answer correctly explains why a non-final function cannot be tailrec: you cannot guarantee that the tail calls are to the same function and not to an overridden version of it. But it incorrectly implies that it is not safe to use TCO on this method, when in fact this would be perfectly safe and a good optimisation.
[ Note that, as explained by Jon Harrop, you cannot implement TCO on the JVM, but that is a restriction of the compiler, not the language, and is unrelated to tailrec ]
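To illustrate the gap between @tailrec and general tail calls, here is a small sketch of two methods that call each other in tail position. @tailrec cannot handle this shape, because neither method calls itself. The standard library's scala.util.control.TailCalls provides a trampoline for it; this is a library-level workaround, not compiler TCO, and the object name MutualRecursionSketch is made up for the example.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object MutualRecursionSketch {
  // Each call is wrapped in tailcall, so it is returned as a value
  // instead of growing the JVM stack.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))
}
```

Calling MutualRecursionSketch.isEven(100000).result runs the chain in a loop on the heap, so the depth of the recursion does not matter.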
And for reference, here is how you can avoid the problem without making the method final:
class C {
  def fact(n: Int): Int = {
    @tailrec
    def loop(n: Int, result: Int): Int =
      if (n == 0) {
        result
      } else {
        loop(n - 1, n * result)
      }
    loop(n, 1)
  }
}
This works because loop is a local method nested inside fact rather than a member of the class, so it cannot be overridden. This version also has the advantage of removing the spurious result parameter from fact.
This is the pattern I use for all recursive algorithms.

Simple example of extending a Scala collection

I'm looking for a very simple example of subclassing a Scala collection. I'm not so much interested in full explanations of how and why it all works; plenty of those are available here and elsewhere on the Internet. I'd like to know the simple way to do it.
The class below might be as simple an example as possible. The idea is, make a subclass of Set[Int] which has one additional method:
class SlightlyCustomizedSet extends Set[Int] {
  def findOdd: Option[Int] = find(_ % 2 == 1)
}
Obviously this is wrong. One problem is that there's no constructor to put things into the Set. A CanBuildFrom object must be built, preferably by calling some already-existing library code that knows how to build it. I've seen examples that implement several additional methods in the companion object, but they're showing how it all works or how to do something more complicated. I'd like to see how to leverage what's already in the libraries to knock this out in a couple lines of code. What's the smallest, simplest way to implement this?
If you just want to add a single method to a class, then subclassing may not be the way to go. Scala's collections library is somewhat complicated, and leaf classes aren't always amenable to subclassing (one might start by subclassing HashSet, but this would start you on a journey down a deep rabbit hole).
Perhaps a simpler way to achieve your goal would be something like:
implicit class SetPimper(val s: Set[Int]) extends AnyVal {
  def findOdd: Option[Int] = s.find(_ % 2 == 1)
}
This doesn't actually subclass Set, but creates an implicit conversion that allows you to do things like:
Set(1,2,3).findOdd // Some(1)
Down the Rabbit Hole
If you've come from a Java background, it might be surprising that it's so difficult to extend standard collections - after all the Java standard library's peppered with j.u.ArrayList subclasses, for pretty much anything that can contain other things. However, Scala has one key difference: its first-choice collections are all immutable.
This means that they don't have add methods that modify them in-place. Instead, they have + methods that construct a new instance, with all the original items, plus the new item. If they'd implemented this naïvely, it'd be very inefficient, so they use various class-specific tricks to allow the new instances to share data with the original one. The + method may even return an object of a different type to the original - some of the collections classes use a different representation for small or empty collections.
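A small sketch of that behaviour follows. The exact runtime classes are implementation details that differ between Scala versions, so the example only compares them instead of naming them; SetRepresentationSketch is a hypothetical name.

```scala
object SetRepresentationSketch {
  val small: Set[Int] = Set(1)

  // Repeated + never modifies `small`; every step returns a new set.
  val large: Set[Int] = (2 to 10).foldLeft(small)(_ + _)

  // The static type is Set[Int] in both cases, but the runtime classes may
  // differ because small sets use a specialised representation.
  val sameRuntimeClass: Boolean = small.getClass == large.getClass
}
```

In current standard library implementations sameRuntimeClass is expected to be false, while small still contains only 1.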
However, this also means that if you want to subclass one of the immutable collections, then you need to understand the guts of the class you're subclassing, to ensure that your instances of your subclass are constructed in the same way as the base class.
By the way, none of this applies to you if you want to subclass the mutable collections. They're seen as second-class citizens in the Scala world, but they do have add methods, and rarely need to construct new instances. The following code:
class ListOfUsers(users: Int*) extends scala.collection.mutable.HashSet[Int] {
  this ++= users
  def findOdd: Option[Int] = find(_ % 2 == 1)
}
Will probably do more-or-less what you expect in most cases (map and friends might not do quite what you expect, because of the CanBuildFrom stuff that I'll get to in a minute, but bear with me).
The Nuclear Option
If inheritance fails us, we always have a nuclear option to fall back on: composition. We can create our own Set subclass that delegates its responsibilities to a delegate, as such:
// Targets the Scala 2.10-2.12 collections API
// (SetLike and CanBuildFrom were removed in Scala 2.13).
import scala.collection.SetLike
import scala.collection.mutable.Builder
import scala.collection.generic.CanBuildFrom
class UserSet(delegate: Set[Int]) extends Set[Int] with SetLike[Int, UserSet] {
  override def contains(key: Int) = delegate.contains(key)
  override def iterator = delegate.iterator
  override def +(elem: Int) = new UserSet(delegate + elem)
  override def -(elem: Int) = new UserSet(delegate - elem)
  override def empty = new UserSet(Set.empty)
  override def newBuilder = UserSet.newBuilder
  override def foreach[U](f: Int => U) = delegate.foreach(f) // Optional
  override def size = delegate.size // Optional
}
object UserSet {
  def apply(users: Int*) = (newBuilder ++= users).result()
  def newBuilder = new Builder[Int, UserSet] {
    // The builder reference itself never changes, so a val is enough.
    private val delegateBuilder = Set.newBuilder[Int]
    override def +=(elem: Int) = {
      delegateBuilder += elem
      this
    }
    override def clear() = delegateBuilder.clear()
    override def result() = new UserSet(delegateBuilder.result())
  }
  implicit object UserSetCanBuildFrom extends CanBuildFrom[UserSet, Int, UserSet] {
    override def apply() = newBuilder
    override def apply(from: UserSet) = newBuilder
  }
}
This is arguably both too complicated and too simple at the same time. It's far more lines of code than we meant to write, and yet, it's still pretty naïve.
It'll work without the companion object, but without CanBuildFrom, map will return a plain Set, which may not be what you expect. We've also overridden the optional methods that the documentation for Set recommends we implement.
If we were being thorough, we'd have created a CanBuildFrom, and implemented empty for our mutable class, as this ensures that the handful of methods that create new instances will work as we expect.
But that sounds like a lot of work...
If that sounds like too much work, consider something like the following:
case class UserSet(users: Set[Int])
Sure, you have to type a few more letters to get at the set of users, but I think it separates concerns better than subclassing.
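A slightly fuller sketch of that wrapper, with the extra method from the question attached, might look like this. The enclosing object UserSetSketch and the + helper are additions made for illustration.

```scala
object UserSetSketch {
  // Composition: wrap the Set instead of extending any collection trait.
  final case class UserSet(users: Set[Int]) {
    // `% 2 != 0` also matches negative odd numbers, unlike `% 2 == 1`.
    def findOdd: Option[Int] = users.find(_ % 2 != 0)

    // Returns a new wrapper so the result keeps the UserSet type.
    def +(user: Int): UserSet = UserSet(users + user)
  }
}
```

Anything not wrapped explicitly is still reachable through the users field, at the cost of results typed as a plain Set.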

Could Scala's “if … else” have been implemented as a library function?

I'm wondering if if … else could have been implemented in Predef with special compiler treatment, in a similar way to what's being done with classOf[A]: the definition is in Predef, the implementation is filled in by the compiler.
Granted, many people would find it reassuring to know that an if is always an if, and an else is always an else, no matter the context. However, defining else as a method on the result type of if would remove it from the list of keywords, and allow library designers to define their own else methods. (I know I can use any keyword as an identifier with backticks, but something like `else` just looks awful in code.) Such methods could be useful in situations such as this one, discussed on the mailing list, where people are forced to use otherwise when defining methods that actually should be named else. (Also discussed on SO here and here.)
So:
Would such an approach be possible, even in theory, or does it break some fundamental principle in Scala?
What would the downsides be?
Maybe I don't understand your question, but you can already implement if ... else ... as a library function. Consider this:
class If[A](condition: =>Boolean)(ifBlock: =>A) {
  def els(elseBlock: =>A):A = condition match {
    case true => ifBlock
    case false => elseBlock
  }
}
new If(2==3)(
  println("equal")
) els (
  println("not equal")
)
Of course this doesn't do exactly what if ... else ... does, but with some polishing I think it would. I once implemented a very simple interpreter for a language that had pattern matching built in with if ... else ... being implemented in much the same way I did here.
The short answer is "yes"; branching logic on some predicate can be implemented as a library function.
It's worth pointing out that, as Viktor Klang and others have noted, if/else is essentially folding a boolean. Folding is something we do frequently - sometimes it's clear and explicit, and sometimes not.
// Either#fold is explicit
scala> Left[String, Double]("fail") fold(identity, _ + 1 toString)
res0: java.lang.String = fail
scala> Right[String, Double](4) fold(identity, _ + 1 toString)
res1: java.lang.String = 5.0
Before Scala 2.10, folding an Option could not be done explicitly, but we do it all the time.
// Option had no fold before Scala 2.10 - won't compile there!
Some(5) fold(1+, 0)
// .. but the following is equivalent and valid
scala> Some(5) map(1+) getOrElse(0)
res3: Int = 6
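For readers on Scala 2.10 or later: Option does have a fold method there, with the signature fold(ifEmpty)(f), so the same computation can be written directly. A minimal sketch, where OptionFoldSketch and incrementOrZero are names invented for the example:

```scala
object OptionFoldSketch {
  // The first argument list is the value for None,
  // the second is the function applied to the content of Some.
  def incrementOrZero(o: Option[Int]): Int = o.fold(0)(_ + 1)

  // The older spelling, which gives the same result.
  def incrementOrZeroOld(o: Option[Int]): Int = o.map(_ + 1).getOrElse(0)
}
```

Note that the empty case comes first, which is the reverse of the argument order in the if/else analogy.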
Branching logic on a boolean is also a fold, and you can pimp Boolean accordingly. Note the use of by-name parameters to achieve lazy evaluation. Without this feature, such an implementation wouldn't be possible.
// pimped Boolean - evaluates t when true, f when false
class FoldableBoolean(b: Boolean) {
  def fold[A](t: => A, f: => A) =
    if(b) t else f
}
implicit def b2fb(b: Boolean) = new FoldableBoolean(b)
Now we can fold Booleans:
scala> true fold("true!", "false")
res24: java.lang.String = true!
scala> false fold("true!", "false")
res25: java.lang.String = false
Not just if-else, but any language feature can be overridden in a branch of the language known as "Scala Virtualized"
https://github.com/TiarkRompf/scala-virtualized
This forms the basis of the Delite project at Stanford PPL, and is also at the heart of the research being funded by Scala's EU grant. So you can reasonably expect it to be part of the core language at some point in the future.
Any object-oriented language (or any language with runtime polymorphism) can implement conditionals as a library feature, since method dispatch already is a more general form of conditional anyway. Smalltalk, for example, has absolutely no conditionals whatsoever except for method dispatch.
There is no need for any kind of compiler magic, except maybe for syntactic convenience.
In Scala, it would look maybe a little bit like this:
trait MyBooleanLike {
  // No bound on T, so the signature matches the implementations below
  // and accepts Unit-typed blocks such as println(...).
  def iff[T](thenn: => T): T
  def iffElse[T](thenn: => T)(els: => T): T
  def &&(other: => MyBoolean): MyBoolean
  def ||(other: => MyBoolean): MyBoolean
  def nott: MyBoolean
}
trait MyTruthiness extends MyBooleanLike {
  def iff[T](thenn: => T) = thenn
  def iffElse[T](thenn: => T)(els: => T) = thenn
  def &&(other: => MyBoolean) = other
  def ||(other: => MyBoolean) = MyTrue
  def nott = MyFalse
}
trait MyFalsiness extends MyBooleanLike {
  def iff[T](thenn: => T): T = null.asInstanceOf[T]
  def iffElse[T](thenn: => T)(els: => T) = els
  def &&(other: => MyBoolean) = MyFalse
  def ||(other: => MyBoolean) = other
  def nott = MyTrue
}
abstract class MyBoolean extends MyBooleanLike
class MyTrueClass extends MyBoolean with MyTruthiness {}
class MyFalseClass extends MyBoolean with MyFalsiness {}
object MyTrue extends MyTrueClass {}
object MyFalse extends MyFalseClass {}
Just add a little implicit conversion:
object MyBoolExtension {
  implicit def boolean2MyBoolean(b: => Boolean) =
    if (b) { MyTrue } else { MyFalse }
}
import MyBoolExtension._
And now we can use it:
object Main extends App {
  (2 < 3) iff { println("2 is less than 3") }
}
[Note: my type-fu is rather weak. I had to cheat a little bit to get this to compile within a reasonable timeframe. Someone with a better understanding of Scala's type system may want to fix it up. Also, now that I look at it, 8 classes, traits and objects, two of them abstract, seems a little over-engineered ;-) ]
Of course, the same is true for pattern matching as well. Any language with pattern matching doesn't need other kinds of conditionals, since pattern matching is more general anyway.
[BTW: This is basically a port of this Ruby code I wrote a couple of years ago for fun.]
