Why can't Scala's @tailrec be used on Option.flatMap? - scala

In Scala, the following two functions serve exactly the same purpose:
@tailrec
final def fn(str: String): Option[String] = {
  Option(str).filter(_.nonEmpty).flatMap { v =>
    fn(v.drop(1))
  }
}

@tailrec
final def fn2(str: String): Option[String] = {
  Option(str).filter(_.nonEmpty) match {
    case None => None
    case Some(v) => fn2(v.drop(1))
  }
}
However, @tailrec only works in the second case; in the first case it generates the following error:
Error: could not optimize @tailrec annotated method fn: it contains a recursive call not in tail position
Option(str).filter(_.nonEmpty).flatMap { v =>
Why is this error given? And why do these two snippets generate different JVM bytecode?

For fn to be tail-recursive, the recursive call must be the last action in the function. If you pass a closure that calls fn to another function such as flatMap, then that other function is free to perform further actions after calling fn, and therefore the compiler cannot be sure that the call is in tail position.
In some cases the compiler could detect that calling fn is the last action performed by the other function, but not in the general case. And this would rely on a specific implementation of that other function, so the @tailrec annotation might become invalid if that other function were changed, which is an undesirable dependency.
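To make this concrete, here is a sketch of a hypothetical container (the name Boxed and the logging are invented for illustration) whose flatMap does work after its callback returns. Nothing stops a real implementation from doing the same, so a recursive call made inside the callback can never be a tail call of the enclosing method:

final case class Boxed[A](value: A) {
  def flatMap[B](f: A => Boxed[B]): Boxed[B] = {
    val out = f(value)                // the callback (and any fn inside it) runs here...
    println(s"flatMap produced $out") // ...and flatMap still has work to do afterwards
    out
  }
}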

Specifically for the last question:
And why do these two snippets generate different JVM bytecode?
Because on the JVM there's no guarantee that the JAR containing the Option class at runtime is the same one that was seen at compile time. This is good, because otherwise even minor versions of libraries (including the standard Java and Scala libraries) would be incompatible, and you'd need all dependencies to use the same minor version of their common dependencies.
If that class doesn't have a suitable flatMap method you'll get an AbstractMethodError, but otherwise the semantics of Scala require that its flatMap method be called. So the compiler has to emit bytecode that actually calls the method.
Kotlin works around this by using inline functions, and Scala 3 will support them too, but I don't know whether it will use them for such cases.

Consider the following:
List('a', 'b').flatMap(List(_,'g')) //res0: List[Char] = List(a, g, b, g)
It seems pretty obvious that flatMap() is doing some internal post-processing in order to achieve that result. How else would List('a','g') get combined with List('b','g')?
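As a sketch (not the real library code), List's flatMap has to concatenate the sub-results after each callback returns, which is exactly the kind of post-processing that rules out tail-call elimination:

// The ++ runs after f(head) returns, so the call to f is not in tail position.
def listFlatMap[A, B](xs: List[A])(f: A => List[B]): List[B] =
  xs match {
    case Nil          => Nil
    case head :: tail => f(head) ++ listFlatMap(tail)(f)
  }

// listFlatMap(List('a', 'b'))(c => List(c, 'g')) == List(a, g, b, g)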

Related

How to mock tail recursive functions?

I want to test my code which has a few tail recursive functions.
I couldn't mock the tail-recursive functions because they need to be declared either final or private.
Most mocking frameworks don't support mocking such methods, and the ones that do don't work as expected.
Is this possible at all?
Can someone share their ideas for mocking tail-recursive functions?
I tried mocking using Mockito version 3.0.0. My test suite is extended with the MockitoSugar trait.
Though the Mockito documentation suggests that final methods can be mocked, it results in failure for me.
I also tried ScalaMock, but I faced different problems and it didn't work out.
One way to fix this is to wrap the recursive code in an outer function. For example:
def factorial(n: Int): Int = {
  @annotation.tailrec
  def loop(i: Int, res: Int): Int =
    if (i <= 1) {
      res
    } else {
      loop(i - 1, i * res)
    }
  // Handle the special case outside the loop so that factorial(0) == 1.
  if (n <= 0) 1 else loop(n - 1, n)
}
Using this pattern the factorial method does not need to be final or private so it can be overridden for testing.
Another advantage of this pattern is that the accumulator value res does not have to be exposed in the main function interface. This pattern also allows special cases to be handled outside the main recursive code, making the inner code simpler and potentially faster.
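For example (a sketch, with the hypothetical class name MathOps wrapping the method), a test can stub the non-final outer method with a plain anonymous subclass and no mocking framework at all:

class MathOps {
  def factorial(n: Int): Int = {
    @annotation.tailrec
    def loop(i: Int, res: Int): Int =
      if (i <= 1) res else loop(i - 1, i * res)
    if (n <= 0) 1 else loop(n - 1, n)
  }
}

// In a test: override the method with a canned value.
val stubbed = new MathOps {
  override def factorial(n: Int): Int = 42
}
assert(stubbed.factorial(10) == 42)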

Could not optimize methods

Here is a minimal snippet that raises the compilation error 'Recursive call not in tail position'. However, I'm using @inline and the recursive call is in tail position. The reason I'm using @inline is that the code of the original recursive call was duplicated in two places.
import scala.annotation._

object Test {
  @tailrec private def test(i: Int): Int = {
    @inline def reccall(i: Int): Int = test(i - 1)
    i match {
      case 0 => 0
      case i => reccall(i)
    }
  }
}
I've looked at the answers to "Recursive call not in tail position" and "@tailrec why does this method not compile with 'contains a recursive call not in tail position'?", but they do not apply to my case. I'm using Scala 2.12.
It appears that the way @inline is implemented, the parameters are still passed via the stack. The jump is eliminated by inserting the code inline, but the stack is still used for the arguments. This makes it impossible for the call to be in tail position, because the stack needs to be cleaned up after the call completes.
Besides, annotating a function with @inline does not guarantee that the optimizer will inline it, just that it will "try especially hard".
Well, the mechanism by which tail recursion is realized on the JVM is explained as follows:
Scala, in the case of tail recursion, can eliminate the creation of a new stack frame and just re-use the current stack frame. The stack never gets any deeper, no matter how many times the recursive call is made.
So in your case the compiler cannot reuse the current stack frame belonging to the test method, since it MUST create a new stack frame for the reccall method anyway.
The recursive call is implicit in this case, made from another method, so I believe you cannot really have tail recursion implemented for such a case.
You may just remove the reccall method altogether and write case i => test(i - 1); then the compiler will not complain.
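Concretely, applying that suggestion gives a version the compiler accepts, because the self-call is now the last action:

import scala.annotation.tailrec

object Test {
  @tailrec private def test(i: Int): Int =
    i match {
      case 0 => 0
      case i => test(i - 1) // direct recursive call in tail position
    }
}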
NOTE: I also believe @inline is not essential in this example and has nothing to do with the problem, since if I remove it the compiler still complains for the same reason.
The issue here is that @inline is strictly advisory: it doesn't guarantee that the compiler will inline the function. Since @tailrec only works if it is absolutely guaranteed that the tail calls can be eliminated, using @tailrec has to assume no inlining.

Why do we need flatMap (in general)?

I have been looking into FP languages (off and on) for some time and have played with Scala, Haskell, F#, and some others. I like what I see and understand some of the fundamental concepts of FP (with absolutely no background in Category Theory - so don't talk Math, please).
So, given a type M[A] we have map which takes a function A=>B and returns a M[B]. But we also have flatMap which takes a function A=>M[B] and returns a M[B]. We also have flatten which takes a M[M[A]] and returns a M[A].
In addition, many of the sources I have read describe flatMap as map followed by flatten.
So, given that flatMap seems to be equivalent to flatten compose map, what is its purpose? Please don't say it is to support 'for comprehensions' as this question really isn't Scala-specific. And I am less concerned with the syntactic sugar than I am in the concept behind it. The same question arises with Haskell's bind operator (>>=). I believe they both are related to some Category Theory concept but I don't speak that language.
I have watched Brian Beckman's great video Don't Fear the Monad more than once and I think I see that flatMap is the monadic composition operator but I have never really seen it used the way he describes this operator. Does it perform this function? If so, how do I map that concept to flatMap?
BTW, I had a long writeup on this question with lots of listings showing experiments I ran trying to get to the bottom of the meaning of flatMap and then ran into this question which answered some of my questions. Sometimes I hate Scala implicits. They can really muddy the waters. :)
flatMap, known as "bind" in some other languages, is, as you said yourself, for function composition.
Imagine for a moment that you have some functions like these:
def foo(x: Int): Option[Int] = Some(x + 2)
def bar(x: Int): Option[Int] = Some(x * 3)
The functions work great, calling foo(3) returns Some(5), and calling bar(3) returns Some(9), and we're all happy.
But now you've run into the situation that requires you to do the operation more than once.
foo(3).map(x => foo(x)) // or just foo(3).map(foo) for short
Job done, right?
Except not really. The output of the expression above is Some(Some(7)), not Some(7), and if you now want to chain another map on the end you can't because foo and bar take an Int, and not an Option[Int].
Enter flatMap
foo(3).flatMap(foo)
Will return Some(7), and
foo(3).flatMap(foo).flatMap(bar)
Returns Some(21).
This is great! Using flatMap lets you chain functions of the shape A => M[B] to oblivion (in the previous example A and B are Int, and M is Option).
More technically speaking; flatMap and bind have the signature M[A] => (A => M[B]) => M[B], meaning they take a "wrapped" value, such as Some(3), Right('foo), or List(1,2,3) and shove it through a function that would normally take an unwrapped value, such as the aforementioned foo and bar. It does this by first "unwrapping" the value, and then passing it through the function.
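A minimal sketch of that unwrap-then-apply behavior for Option (not the actual library source):

// flatMap "unwraps" the value and hands it to f, which re-wraps its own result.
def optFlatMap[A, B](opt: Option[A])(f: A => Option[B]): Option[B] =
  opt match {
    case Some(a) => f(a)  // unwrap and apply; f returns the new wrapper
    case None    => None  // nothing to unwrap, so short-circuit
  }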
I've seen the box analogy used for this: flatMap takes the value out of its box, runs it through the function, and the function puts its result into a new box.
This unwrapping and re-wrapping behavior means that if I were to introduce a third function that doesn't return an Option[Int] and tried to flatMap it to the sequence, it wouldn't work because flatMap expects you to return a monad (in this case an Option)
def baz(x: Int): String = x + " is a number"
foo(3).flatMap(foo).flatMap(bar).flatMap(baz) // <<< ERROR
To get around this, if your function doesn't return a monad, you'd just have to use the regular map function
foo(3).flatMap(foo).flatMap(bar).map(baz)
Which would then return Some("21 is a number")
It's the same reason you provide more than one way to do anything: it's a common enough operation that you may want to wrap it.
You could ask the opposite question: why have map and flatten when you already have flatMap and a way to store a single element inside your collection? That is,
x map f
x filter p
can be replaced by
x flatMap ( xi => x.take(0) :+ f(xi) )
x flatMap ( xi => if (p(xi)) x.take(0) :+ xi else x.take(0) )
so why bother with map and filter?
In fact, there are various minimal sets of operations you need to reconstruct many of the others (flatMap is a good choice because of its flexibility).
Pragmatically, it's better to have the tool you need. Same reason why there are non-adjustable wrenches.
The simplest reason is to compose an output set where each entry in the input set may produce more than one output (or zero!).
For example, consider a program which outputs addresses for people to generate mailers. Most people have one address. Some have two or more. Some people, unfortunately, have none. flatMap is a generalized algorithm to take a list of these people and return all of the addresses, regardless of how many come from each person.
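As a sketch (the Person type here is invented for illustration):

// Each person contributes zero or more addresses to the mailing list.
case class Person(name: String, addresses: List[String])

val people = List(
  Person("Ann", List("12 Oak St", "34 Elm St")), // two addresses
  Person("Bob", List("56 Pine St")),             // one address
  Person("Cat", Nil)                             // none
)

val mailingList: List[String] = people.flatMap(_.addresses)
// List(12 Oak St, 34 Elm St, 56 Pine St)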
The zero-output case is particularly useful for monads, which often (always?) return exactly zero or one result (think Maybe: it returns zero results if the computation fails, or one if it succeeds). In that case you want to perform an operation on "all of the results", which may just so happen to be one or many.
The "flatMap", or "bind", method, provides an invaluable way to chain together methods that provide their output wrapped in a Monadic construct (like List, Option, or Future). For example, suppose you have two methods that produce a Future of a result (eg. they make long-running calls to databases or web service calls or the like, and should be used asynchronously):
def fn1(input1: A): Future[B] // (for some types A and B)
def fn2(input2: B): Future[C] // (for some types B and C)
How to combine these? With flatMap, we can do this as simply as:
def fn3(input3: A): Future[C] = fn1(input3).flatMap(b => fn2(b))
In this sense, we have "composed" a function fn3 out of fn1 and fn2 using flatMap, which has the same general structure (and so can be composed in turn with further similar functions).
The map method would give us a not-so-convenient, and not readily chainable, Future[Future[C]]. Certainly we can then use flatten to reduce this, but the flatMap method does it in one call, and can be chained as far as we wish.
This way of working is so useful, in fact, that Scala provides the for-comprehension as essentially a shortcut for it (Haskell, too, provides a shorthand way of writing a chain of bind operations; I'm not a Haskell expert, though, and don't recall the details), hence the talk you will have come across about for-comprehensions being "de-sugared" into a chain of flatMap calls (along with possible filter calls and a final map call for the yield).
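For instance, continuing with the fn1 and fn2 signatures above (a sketch, assuming a value a: A in scope plus the implicit ExecutionContext that Future's combinators require), the for-comprehension and the explicit chain are equivalent:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// The compiler rewrites the for-comprehension...
val viaFor: Future[C] =
  for {
    b <- fn1(a)
    c <- fn2(b)
  } yield c

// ...into (roughly) this chain of flatMap/map calls:
val viaFlatMap: Future[C] = fn1(a).flatMap(b => fn2(b).map(c => c))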
Well, one could argue you don't need .flatten either. Why not just do something like
@tailrec
def flatten[T](in: Seq[Seq[T]], out: Seq[T] = Nil): Seq[T] = in match {
  case Nil          => out
  case head :: tail => flatten(tail, out ++ head)
}
Same can be said about map:
@tailrec
def map[A, B](in: Seq[A], out: Seq[B] = Nil)(f: A => B): Seq[B] = in match {
  case Nil          => out
  case head :: tail => map(tail, out :+ f(head))(f)
}
So, why are .flatten and .map provided by the library? Same reason .flatMap is: convenience.
There is also .collect, which is really just
list.filter(f.isDefinedAt _).map(f)
.reduce is actually nothing more than list.tail.foldLeft(list.head)(f), and
.headOption is
list match {
  case Nil => None
  case head :: _ => Some(head)
}
Etc ...

What is a Manifest in Scala and when do you need it?

Since Scala 2.7.2 there has been something called a Manifest, which is a workaround for Java's type erasure. But how exactly does a Manifest work, and why / when do you need to use it?
The blog post Manifests: Reified Types by Jorge Ortiz explains some of it, but it doesn't explain how to use one together with context bounds.
Also, what is a ClassManifest, and what's the difference from Manifest?
I have some code (part of a larger program that I can't easily include here) that produces some warnings with regard to type erasure; I suspect I can solve these by using manifests, but I'm not sure exactly how.
The compiler knows more information about types than the JVM runtime can easily represent. A Manifest is a way for the compiler to send an inter-dimensional message to the code at runtime about the type information that was lost.
It isn't clear whether a Manifest would help with the warnings you are seeing without knowing more detail.
One common use of Manifests is to have your code behave differently based on the static type of a collection. For example, what if you wanted to treat a List[String] differently from other types of a List:
def foo[T](x: List[T])(implicit m: Manifest[T]) = {
  if (m <:< manifest[String])
    println("Hey, this list is full of strings")
  else
    println("Non-stringy list")
}

foo(List("one", "two")) // Hey, this list is full of strings
foo(List(1, 2))         // Non-stringy list
foo(List("one", 2))     // Non-stringy list
A reflection-based solution to this would probably involve inspecting each element of the list.
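A sketch of what such a reflection-based check might look like (fooReflective is an invented name; note that it inspects runtime values, element by element, rather than the static type, so it cannot tell anything about an empty list):

def fooReflective(x: List[Any]) =
  if (x.nonEmpty && x.forall(_.isInstanceOf[String]))
    println("Hey, this list is full of strings")
  else
    println("Non-stringy list")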
A context bound seems most suited to using type classes in Scala, and is well explained here by Debasish Ghosh:
http://debasishg.blogspot.com/2010/06/scala-implicits-type-classes-here-i.html
Context bounds can also just make method signatures more readable. For example, the above function could be re-written using a context bound like so:
def foo[T: Manifest](x: List[T]) = {
  if (manifest[T] <:< manifest[String])
    println("Hey, this list is full of strings")
  else
    println("Non-stringy list")
}
A Manifest was intended to reify generic types that are erased on the JVM (which does not support reified generics). However, Manifests had some serious issues: they were too simplistic, and were unable to fully support Scala's type system. They were thus deprecated in Scala 2.10 and replaced with TypeTags (which are essentially what the Scala compiler itself uses to represent types, and which therefore fully support Scala types). For more details on the difference, see:
Scala: What is a TypeTag and how do I use it?
How do the new Scala TypeTags improve the (deprecated) Manifests?
In other words
when do you need it?
Before 2013-01-04, when Scala 2.10 was released.
Not a complete answer, but regarding the difference between Manifest and ClassManifest, you can find an example in the Scala 2.8 Arrays paper:
The only remaining question is how to implement generic array creation. Unlike Java, Scala allows an instance creation new Array[T] where T is a type parameter. How can this be implemented, given the fact that there does not exist a uniform array representation in Java?
The only way to do this is to require additional runtime information which describes the type T. Scala 2.8 has a new mechanism for this, which is called a Manifest. An object of type Manifest[T] provides complete information about the type T.
Manifest values are typically passed in implicit parameters; and the compiler knows how to construct them for statically known types T.
There exists also a weaker form named ClassManifest which can be constructed from knowing just the top-level class of a type, without necessarily knowing all its argument types.
It is this type of runtime information that’s required for array creation.
Example:
One needs to provide this information by passing a ClassManifest[T] into the method as an implicit parameter:
def tabulate[T](len: Int, f: Int => T)(implicit m: ClassManifest[T]) = {
  val xs = new Array[T](len)
  for (i <- 0 until len) xs(i) = f(i)
  xs
}
As a shorthand form, a context bound can be used on the type parameter T instead (see this SO question for an illustration), giving:
def tabulate[T: ClassManifest](len: Int, f: Int => T) = {
  val xs = new Array[T](len)
  for (i <- 0 until len) xs(i) = f(i)
  xs
}
When calling tabulate on a type such as Int, or String, or List[T], the Scala compiler can create a class manifest to pass as implicit argument to tabulate.
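For example (the compiler supplies the ClassManifest at each call site):

val squares = tabulate(5, (i: Int) => i * i)       // Array(0, 1, 4, 9, 16)
val labels  = tabulate(3, (i: Int) => "item " + i) // Array(item 0, item 1, item 2)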
Let's also check out manifest in the Scala sources (Manifest.scala), where we see:
Manifest.scala:
def manifest[T](implicit m: Manifest[T]) = m
So with regards to following example code:
def foo[A](somelist: List[A])(implicit m: Manifest[A]): String = {
  if (m <:< manifest[String]) {
    "its a string"
  } else {
    "its not a string"
  }
}
we can see that the manifest function looks up an implicit m: Manifest[T] for the type parameter you provide; in our example code that was manifest[String]. So when you write something like:
if (m <:< manifest[String]) {
you are checking whether the type represented by the implicit m in your function is a subtype of the type represented by manifest[String]. Since manifest is a method that summons a Manifest[T], it searches for a specific implicit Manifest[String] and finds one if such an implicit is in scope.
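A quick usage sketch of the foo defined above:

foo(List("one", "two")) // "its a string"
foo(List(1, 2))         // "its not a string"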

Recursive overloading semantics in the Scala REPL - JVM languages

Using Scala's command line REPL:
def foo(x: Int): Unit = {}
def foo(x: String): Unit = {println(foo(2))}
gives
error: type mismatch;
found: Int(2)
required: String
It seems that you can't define overloaded recursive methods in the REPL. I thought this was a bug in the Scala REPL and filed it, but it was almost instantly closed with "wontfix: I don't see any way this could be supported given the semantics of the interpreter, because these two methods must be compiled together." He recommended putting the methods in an enclosing object.
Is there a JVM language implementation or Scala expert who could explain why? I can see it would be a problem if the methods called each other for instance, but in this case?
Or if this is too large a question and you think I need more prerequisite knowledge, does someone have any good links to books or sites about language implementations, especially on the JVM? (I know about John Rose's blog, and the book Programming Language Pragmatics... but that's about it. :)
The issue is due to the fact that the interpreter most often has to replace existing elements with a given name, rather than overload them. For example, I will often be experimenting with something and create a method called test:
def test(x: Int) = x + x
A little later on, let's say that I'm running a different experiment and I create another method named test, unrelated to the first:
def test(ls: List[Int]) = (0 /: ls) { _ + _ }
This isn't an entirely unrealistic scenario. In fact, it's precisely how most people use the interpreter, often without even realizing it. If the interpreter arbitrarily decided to keep both versions of test in scope, that could lead to confusing semantic differences when using test. For example, we might make a call to test, accidentally passing an Int rather than a List[Int] (not the most unlikely accident in the world):
test(1 :: Nil) // => 1
test(2) // => 4 (expecting 2)
Over time, the root scope of the interpreter would get incredibly cluttered with various versions of methods, fields, etc. I tend to leave my interpreter open for days at a time, but if overloading like this were allowed, we would be forced to "flush" the interpreter every so often as things got to be too confusing.
It's not a limitation of the JVM or the Scala compiler, it's a deliberate design decision. As mentioned in the bug, you can still overload if you're within something other than the root scope. Enclosing your test methods within a class seems like the best solution to me.
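A sketch of that workaround (the object name Overloads is invented for illustration):

// Compiled together inside one object, the two methods are genuine overloads.
object Overloads {
  def foo(x: Int): Unit = {}
  def foo(x: String): Unit = { println(foo(2)) }
}

Overloads.foo(5)     // resolves to the Int overload
Overloads.foo("abc") // resolves to the String overload and prints ()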
% scala28
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def foo(x: Int): Unit = () ; def foo(x: String): Unit = { println(foo(2)) }
foo: (x: String)Unit <and> (x: Int)Unit
foo: (x: String)Unit <and> (x: Int)Unit
scala> foo(5)
scala> foo("abc")
()
The REPL will accept the overloads if you copy both lines and paste them at the same time.
As shown by extempore's answer, it is possible to overload. Daniel's comment about a design decision is correct but, I think, incomplete and a bit misleading. Overloads are not outlawed (since they are possible), but they are not easily achieved.
The design decisions that lead to this are:
All previous definitions must be available.
Only newly entered code is compiled, instead of recompiling everything ever entered every time.
It must be possible to redefine definitions (as Daniel mentioned).
It must be possible to define members such as vals and defs, not only classes and objects.
The problem is... how to achieve all these goals? How do we process your example?
def foo(x: Int): Unit = {}
def foo(x: String): Unit = {println(foo(2))}
Starting with the 4th item: a val or def can only be defined inside a class, trait, object, or package object. So the REPL puts the definitions inside objects, like this (not an actual representation!):
package $line1 { // input line
  object $read { // what was read
    object $iw { // definitions
      def foo(x: Int): Unit = {}
    }
    // val res1 would be here somewhere if this was an expression
  }
}
Now, due to how the JVM works, once you have defined an object you can't extend it with new members. You could, of course, recompile everything, but we discarded that option. So the new definition needs to go somewhere else:
package $line2 { // next input line
  object $read { // what was read
    object $iw { // definitions
      def foo(x: String): Unit = { println(foo(2)) }
    }
  }
}
And this explains why your examples are not overloads: they are defined in two different places. If you put them on the same line, they'd all be defined together, which would make them overloads, as shown in extempore's example.
As for the other design decisions: each new package imports the definitions and "res" values from the previous packages, and later imports can shadow earlier ones, which is what makes it possible to "redefine" stuff.
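A rough sketch of that mechanism (again, not the actual generated code), for the case where a name is redefined rather than overloaded:

// Line 1: val x = 1        -> compiled into $line1
// Line 2: val x = "hello"  -> compiled into $line2, shadowing $line1's x
package $line3 {
  object $read {
    import $line1.$read.$iw._ // brings x: Int into scope
    import $line2.$read.$iw._ // brings x: String into scope, shadowing the Int one
    object $iw {
      val res0 = x.length // a use of x on line 3 sees only the latest definition
    }
  }
}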