Scala's foldLeft implementation is:
def foldLeft[B](z: B)(op: (B, A) => B): B = {
  var result = z
  this foreach (x => result = op(result, x))
  result
}
Why don't the Scala developers use something like tail recursion, or some other technique, like this (it's just an example):
def foldLeft[T](start: T, myList: List[T])(f: (T, T) => T): T = {
  def foldRec(accum: T, list: List[T]): T = {
    list match {
      case Nil => accum
      case head :: tail => foldRec(f(accum, head), tail)
    }
  }
  foldRec(start, myList)
}
Could it be written that way? Why, or why not?
"Why not replace this simple three-line piece of code with this less simple seven-line piece of code that does the same thing?"
Um. That's why.
(If you are asking about performance, then one would need benchmarks of both solutions and an indication that the non-closure version was significantly faster.)
According to this answer, Scala does support tail-recursion optimization, but it looks like it wasn't there from the beginning, and it might still not work in every case, so that specific implementation might be a leftover.
That said, Scala is multi-paradigm and I don't think it strives for purity in terms of its functional programming, so I wouldn't be surprised if they went for the most practical or convenient approach.
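As an aside, for self-recursive methods the compiler can be asked to verify the optimization mentioned above. A minimal illustration (this length helper is my own, not from the standard library):

import scala.annotation.tailrec

// @tailrec makes the build fail if the compiler cannot
// rewrite the self-recursive call into a loop
@tailrec
def length[A](xs: List[A], acc: Int = 0): Int = xs match {
  case Nil => acc
  case _ :: tail => length(tail, acc + 1)
}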
Besides being simpler, the imperative solution is also far more general. As you may have noticed, foldLeft is implemented in TraversableOnce and depends only on the foreach method. Thus, by extending Traversable and implementing foreach, which is probably the simplest method to implement on any collection, you get all these wonderful methods.
The declarative implementation, on the other hand, recurses over the concrete structure of List and is very specific, as it depends on Nil and ::.
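To illustrate that generality, here is a minimal sketch (assuming the pre-2.13 collections library, where Traversable still exists): a toy collection that implements only foreach, yet inherits foldLeft and friends for free.

// a toy three-element collection; foreach is the only method we define
class Three[A](a: A, b: A, c: A) extends Traversable[A] {
  def foreach[U](f: A => U): Unit = { f(a); f(b); f(c) }
}

new Three(1, 2, 3).foldLeft(0)(_ + _) // 6, via the foreach-based foldLeft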
Related
Here is a piece of code from the documentation for fs2. The function go is recursive. The question is: how do we know whether it is stack-safe, and how can we reason about whether any given function is stack-safe?
import fs2._
// import fs2._

def tk[F[_], O](n: Long): Pipe[F, O, O] = {
  def go(s: Stream[F, O], n: Long): Pull[F, O, Unit] = {
    s.pull.uncons.flatMap {
      case Some((hd, tl)) =>
        hd.size match {
          case m if m <= n => Pull.output(hd) >> go(tl, n - m)
          case m => Pull.output(hd.take(n.toInt)) >> Pull.done
        }
      case None => Pull.done
    }
  }
  in => go(in, n).stream
}
// tk: [F[_], O](n: Long)fs2.Pipe[F,O,O]

Stream(1,2,3,4).through(tk(2)).toList
// res33: List[Int] = List(1, 2)
Would it also be stack-safe if we called go from another method?
def tk[F[_], O](n: Long): Pipe[F, O, O] = {
  def go(s: Stream[F, O], n: Long): Pull[F, O, Unit] = {
    s.pull.uncons.flatMap {
      case Some((hd, tl)) =>
        hd.size match {
          case m if m <= n => otherMethod(...)
          case m => Pull.output(hd.take(n.toInt)) >> Pull.done
        }
      case None => Pull.done
    }
  }

  def otherMethod(...) = {
    Pull.output(hd) >> go(tl, n - m)
  }

  in => go(in, n).stream
}
My previous answer here gives some background information that might be useful. The basic idea is that some effect types have flatMap implementations that support stack-safe recursion directly—you can nest flatMap calls either explicitly or through recursion as deeply as you want and you won't overflow the stack.
For some effect types it's not possible for flatMap to be stack-safe, because of the semantics of the effect. In other cases it may be possible to write a stack-safe flatMap, but the implementers might have decided not to because of performance or other considerations.
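As a concrete illustration of the first kind, cats' Eval has a trampolined flatMap, so recursion through it does not consume stack. A minimal sketch, assuming cats is on the classpath:

import cats.Eval

// a million nested flatMaps, evaluated without growing the JVM stack
def countDown(n: Long): Eval[Long] =
  if (n <= 0) Eval.now(0L)
  else Eval.defer(countDown(n - 1)).flatMap(m => Eval.now(m + 1))

countDown(1000000L).value // 1000000, no StackOverflowError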
Unfortunately there's no standard (or even conventional) way to know whether the flatMap for a given type is stack-safe. Cats does include a tailRecM operation that should provide stack-safe monadic recursion for any lawful monadic effect type, and sometimes looking at a tailRecM implementation that's known to be lawful can provide some hints about whether a flatMap is stack-safe. In the case of Pull it looks like this:
def tailRecM[A, B](a: A)(f: A => Pull[F, O, Either[A, B]]) =
  f(a).flatMap {
    case Left(a)  => tailRecM(a)(f)
    case Right(b) => Pull.pure(b)
  }
This tailRecM is just recursing through flatMap, and we know that Pull's Monad instance is lawful, which is pretty good evidence that Pull's flatMap is stack-safe. The one complicating factor here is that the instance for Pull has an ApplicativeError constraint on F that Pull's flatMap doesn't, but in this case that doesn't change anything.
So the tk implementation here is stack-safe because flatMap on Pull is stack-safe, and we know that from looking at its tailRecM implementation. (If we dug a little deeper we could figure out that flatMap is stack-safe because Pull is essentially a wrapper for FreeC, which is trampolined.)
It probably wouldn't be terribly hard to rewrite tk in terms of tailRecM, although we'd have to add the otherwise unnecessary ApplicativeError constraint. I'm guessing the authors of the documentation chose not to do that for clarity, and because they knew Pull's flatMap is fine.
Update: here's a fairly mechanical tailRecM translation:
import cats.ApplicativeError
import fs2._
def tk[F[_], O](n: Long)(implicit F: ApplicativeError[F, Throwable]): Pipe[F, O, O] =
  in => Pull.syncInstance[F, O].tailRecM((in, n)) {
    case (s, n) => s.pull.uncons.flatMap {
      case Some((hd, tl)) =>
        hd.size match {
          case m if m <= n => Pull.output(hd).as(Left((tl, n - m)))
          case m => Pull.output(hd.take(n.toInt)).as(Right(()))
        }
      case None => Pull.pure(Right(()))
    }
  }.stream
Note that there's no explicit recursion.
The answer to your second question depends on what the other method looks like, but in the case of your specific example, >> will just result in more flatMap layers, so it should be fine.
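To see why, recall that >> is essentially shorthand for a result-discarding flatMap, so factoring one arm of the match into another method doesn't change the shape of the computation. A sketch (using Pull.output1 for brevity):

val a: Pull[Pure, Int, Unit] = Pull.output1(1)
val b: Pull[Pure, Int, Unit] = Pull.output1(2)

val viaShorthand = a >> b               // sugar for...
val viaFlatMap   = a.flatMap(_ => b)    // ...this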
To address your question more generally, this whole topic is a confusing mess in Scala. You shouldn't have to dig into implementations like we did above just to know whether a type supports stack-safe monadic recursion or not. Better conventions around documentation would be a help here, but unfortunately we're not doing a very good job of that. You could always use tailRecM to be "safe" (which is what you'll want to do when the F[_] is generic, anyway), but even then you're trusting that the Monad implementation is lawful.
To sum up: it's a bad situation all around, and in sensitive situations you should definitely write your own tests to verify that implementations like this are stack-safe.
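Such a test can be as simple as driving a large number of iterations through the instance and checking that it doesn't blow the stack. A rough sketch (the stackSafetyProbe name and the iteration count are mine, not from any library):

import cats.Monad
import cats.implicits._

// a non-stack-safe Monad instance will throw StackOverflowError here
def stackSafetyProbe[F[_]: Monad]: F[Int] =
  Monad[F].tailRecM(0) { i =>
    if (i < 100000) Monad[F].pure((i + 1).asLeft[Int])
    else Monad[F].pure(i.asRight[Int])
  }

stackSafetyProbe[cats.Eval].value // 100000 for a stack-safe instance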
While checking Intel's BigDL repo, I stumbled upon this method:
private def recursiveListFiles(f: java.io.File, r: Regex): Array[File] = {
  val these = f.listFiles()
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}
I noticed that it was not tail recursive and decided to write a tail recursive version:
private def recursiveListFiles(f: File, r: Regex): Array[File] = {
  @scala.annotation.tailrec
  def recursiveListFiles0(f: Array[File], r: Regex, a: Array[File]): Array[File] = {
    f match {
      case Array() => a
      case htail => {
        val these = htail.head.listFiles()
        val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
        recursiveListFiles0(these.filter(_.isDirectory) ++ htail.tail, r, a ++ good)
      }
    }
  }
  recursiveListFiles0(Array[File](f), r, Array.empty[File])
}
What made this difficult, compared to what I am used to, is that a File can be expanded into an Array[File], which adds another level of depth.
What is the theory behind recursion on datatypes that have the following member?
def listTs[T]: T => Traversable[T]
Short answer
If you generalize the idea and think of it as a monad (a polymorphic thing working for arbitrary type parameters), then you won't be able to write a tail-recursive implementation.
Trampolines try to solve this very problem by providing a way to evaluate a recursive computation without overflowing the stack. The general idea is to create a stream of pairs of (result, computation): at each step you return the result computed so far and a function that creates the next result (a thunk).
From Rich Dougherty’s blog:
A trampoline is a loop that repeatedly runs functions. Each function, called a thunk, returns the next function for the loop to run. The trampoline never runs more than one thunk at a time, so if you break up your program into small enough thunks and bounce each one off the trampoline, then you can be sure the stack won't grow too big.
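A minimal hand-rolled trampoline looks something like this (a sketch in the spirit of scala.util.control.TailCalls, not a production implementation):

sealed trait Trampoline[A]
case class Done[A](value: A) extends Trampoline[A]
case class More[A](thunk: () => Trampoline[A]) extends Trampoline[A]

// the "loop that repeatedly runs functions": constant stack depth
@scala.annotation.tailrec
def run[A](t: Trampoline[A]): A = t match {
  case Done(v) => v
  case More(k) => run(k())
}

// mutual recursion that would overflow the stack if called directly
def even(n: Int): Trampoline[Boolean] =
  if (n == 0) Done(true) else More(() => odd(n - 1))
def odd(n: Int): Trampoline[Boolean] =
  if (n == 0) Done(false) else More(() => even(n - 1))

run(even(1000000)) // true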
More + References
In the categorical sense, the theory behind such data types is closely related to Cofree Monads and fold and unfold functions, and in general to Fixed point types.
See this fantastic talk: Fun and Games with Fix Cofree and Doobie by Rob Norris which discusses a use case very similar to your question.
This article about Free monads and Trampolines is also related to your first question: Stackless Scala With Free Monads.
See also this part of the Matryoshka docs. Matryoshka is a Scala library implementing monads around the concept of FixedPoint types.
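For a flavor of the fold/unfold duality mentioned above, here is a minimal (deliberately naive, not stack-safe) unfold sketch: it grows a structure from a seed, which is exactly the shape of "a File expands into an Array[File]".

// unfold: the dual of fold; f says how to grow one more layer from the seed
def unfold[A, B](seed: B)(f: B => Option[(A, B)]): List[A] = f(seed) match {
  case None => Nil
  case Some((a, next)) => a :: unfold(next)(f)
}

unfold(5)(n => if (n == 0) None else Some((n, n - 1))) // List(5, 4, 3, 2, 1)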
In languages like SML, Erlang, and a bunch of others, we may define functions like this:
fun reverse [] = []
  | reverse (x :: xs) = reverse xs @ [x];
I know we can write analog in Scala like this (and I know, there are many flaws in the code below):
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
But I wonder, if we could write former code in Scala, perhaps with desugaring to the latter.
Are there any fundamental limitations to such syntax being implemented in the future (I mean really fundamental, e.g. the way type inference works in Scala, or something else, except the parser obviously)?
UPD
Here is a snippet of how it could look:
type T
def reverse(Nil: List[T]) = Nil
def reverse(x :: xs: List[T]): List[T] = reverse(xs) ++ List(x)
It really depends on what you mean by fundamental.
If you are really asking "is there a technical showstopper that would prevent this feature from being implemented", then I would say the answer is no. You are talking about desugaring, and you are on the right track here. All there is to do is basically stitch the several separate cases into one single function, and this can be done as a mere preprocessing step (it only requires syntactic knowledge, no semantic knowledge). But for this to even make sense, I would define a few rules:
The function signature is mandatory (in Haskell, for example, it would be optional, regardless of whether you define the function at once or in several parts). We could try to get by without the signature and attempt to infer it from the different parts, but the lack of type information would quickly come back to bite us. A simpler argument is that if we are going to infer an implicit signature, we might as well do it for all methods. But the truth is that there are very good reasons to have explicit signatures in Scala, and I can't imagine changing that.
All the parts must be defined within the same scope. To start with, they must be declared in the same file because each source file is compiled separately, and thus a simple preprocessor would not be enough to implement the feature. Second, we still end up with a single method in the end, so it's only natural to have all the parts in the same scope.
Overloading is not possible for such methods (otherwise we would need to repeat the signature for each part just so the preprocessor knows which part belongs to which overload)
Parts are added (stitched) to the generated match in the order they are declared
So here is how it could look:
def reverse[T](lst: List[T]): List[T] // Exactly like an abstract def (provides the signature)
// .... some unrelated code here...
def reverse(Nil) = Nil
// .... another bit of unrelated code here...
def reverse(x :: xs) = reverse(xs) ++ List(x)
Which could be trivially transformed into:
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
// .... some unrelated code here...
// .... another bit of unrelated code here...
It is easy to see that the above transformation is very mechanical and can be done by simply manipulating a source AST (the AST produced by the slightly modified grammar that accepts this new construct) and transforming it into the target AST (the AST produced by the standard Scala grammar).
Then we can compile the result as usual.
So there you go, with a few simple rules we are able to implement a preprocessor that does all the work to implement this new feature.
If by fundamental you are asking "is there anything that would make this feature out of place", then it can be argued that it does not feel very Scala-like. But more to the point, it does not bring much to the table. The Scala authors actually tend toward making the language simpler (as in fewer built-in features, trying to move some built-in features into libraries), and adding new syntax that is not really more readable goes against that goal of simplification.
In SML, your code snippet is literally just syntactic sugar (a "derived form" in the terminology of the language spec) for
val rec reverse = fn x =>
  case x of [] => []
           | x::xs => reverse xs @ [x]
which is very close to the Scala code you show. So, no, there is no "fundamental" reason that Scala couldn't provide the same kind of syntax. The main problem is Scala's need for more type annotations, which makes this shorthand syntax far less attractive in general, and probably not worth the while.
Note also that the specific syntax you suggest would not fly well, because there is no way to distinguish one case-by-case function definition from two overloaded functions syntactically. You probably would need some alternative syntax, similar to SML using "|".
I don't know SML or Erlang, but I know Haskell. It is a language without method overloading. Method overloading combined with such pattern matching could lead to ambiguities. Imagine the following code:
def f(x: String) = "String " + x
def f(x: List[_]) = "List " + x
What should it mean? It could mean method overloading, i.e. the method is chosen at compile time. It could also mean pattern matching, i.e. there would be just one f(x: AnyRef) method that does the matching.
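For comparison, the pattern-matching reading would collapse both clauses into something like this single method (a sketch of the hypothetical desugaring):

def f(x: AnyRef): String = x match {
  case s: String  => "String " + s
  case l: List[_] => "List " + l
}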
Scala also has named parameters, which would probably also be broken by this.
I don't think that Scala can offer, in general, a simpler syntax than what you have shown. A simpler syntax may, IMHO, work in some special cases only.
There are at least two problems:
[ and ] are reserved characters because they are used for type arguments. The compiler allows spaces around them, so that would not be an option.
The other problem is that = returns Unit, so the expression after the | would not return any result.
The closest I could come up with is this (note that it is very specialized towards your example):
// Define a class to hold the values left and right of the | sign
class |[T, S](val left: T, val right: PartialFunction[T, T])

// Create a class that contains the | operator
class OrAssoc[T](left: T) {
  def |(right: PartialFunction[T, T]): T | T = new |(left, right)
}

// Add the | to any potential target
implicit def anyToOrAssoc[S](left: S): OrAssoc[S] = new OrAssoc(left)

object fun {
  // Use the magic of the update method
  def update[T, S](choice: T | S): T => T = { arg =>
    if (choice.right.isDefinedAt(arg)) choice.right(arg)
    else choice.left
  }
}

// Use the above construction to define a new method
val reverse: List[Int] => List[Int] =
  fun() = List.empty[Int] | {
    case x :: xs => reverse(xs) ++ List(x)
  }

// Call the method
reverse(List(3, 2, 1))
If I want to generalize the following method to all collection types that support all the necessary operations (foldLeft, flatMap, map, and :+) then how do I do it? Currently it only works with lists.
Code:
def join[A](lists: List[List[A]]): List[List[A]] = {
  lists.foldLeft(List(List[A]())) { case (acc, cur) =>
    for {
      a <- acc
      c <- cur
    } yield a :+ c
  }
}
If you want this only for collections that support :+, the easiest way is just to define it in terms of Seq instead of List.
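That change is purely mechanical; here is a sketch of the same algorithm typed against Seq:

def join[A](seqs: Seq[Seq[A]]): Seq[Seq[A]] = {
  seqs.foldLeft(Seq(Seq.empty[A])) { case (acc, cur) =>
    for {
      a <- acc
      c <- cur
    } yield a :+ c
  }
}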
You can make it a lot more generic, all the way down to Traversable, by using builders. I'd be happy to explain that when I have a bit more time on my hands, but it tends to get complicated at that level.
Scalaz applicative functors is probably the way to go, but I'll let someone with more Scalaz experience than me handle that particular answer.
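For what it's worth, a sketch of that direction (assuming Scalaz is on the classpath): sequencing a List[List[A]] under the List Applicative computes the same cartesian product as the foldLeft version.

import scalaz._, Scalaz._

// sequence flips List[List[A]] "inside out" via the List Applicative
def join[A](lists: List[List[A]]): List[List[A]] = lists.sequence

join(List(List(1, 2), List(3, 4)))
// List(List(1, 3), List(1, 4), List(2, 3), List(2, 4))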
I have an Iterator[Option[T]] and I want to get an Iterator[T] for those Options where T isDefined. There must be a better way than this:
it filter { _.isDefined } map { _.get }
I would have thought that it was possible in one construct... Anybody any ideas?
In the case where it is an Iterable:

val it: Iterable[Option[T]] = ...
it.flatMap(x => x)  // returns an Iterable[T]

In the case where it is an Iterator:

val it: Iterator[Option[T]] = ...
it.flatMap(x => x.elements)  // returns an Iterator[T]
it.flatMap(_.elements)       // equivalent
In newer versions this is now possible:
val it: Iterator[Option[T]] = ...
val flatIt = it.flatten
This works for me (Scala 2.8):
it.collect { case Some(s) => s }
To me, this is a classic use case for the monadic for-comprehension:
for {
  opt <- iterable
  t <- opt
} yield t
It's just sugar for the flatMap solution described above, and it produces identical bytecode. However, syntax matters, and I think one of the best times to use Scala's monadic for syntax is when you're working with Option, especially in conjunction with collections.
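Concretely, the compiler rewrites the comprehension above into:

iterable.flatMap(opt => opt.map(t => t))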
I think this formulation is considerably more readable, especially for those not very familiar with functional programming. I often try both the monadic and functional expressions of a loop and see which seems more straightforward. I think flatMap is a hard name for most people to grok (and actually, calling it >>= makes more intuitive sense to me).