Is there a universal method to create a tail recursive function in Scala?

While checking Intel's BigDL repo, I stumbled upon this method:
private def recursiveListFiles(f: java.io.File, r: Regex): Array[File] = {
  val these = f.listFiles()
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}
I noticed that it was not tail recursive and decided to write a tail recursive version:
private def recursiveListFiles(f: File, r: Regex): Array[File] = {
  @scala.annotation.tailrec
  def recursiveListFiles0(f: Array[File], r: Regex, a: Array[File]): Array[File] = {
    f match {
      case Array() => a
      case htail =>
        val these = htail.head.listFiles()
        val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
        recursiveListFiles0(these.filter(_.isDirectory) ++ htail.tail, r, a ++ good)
    }
  }
  recursiveListFiles0(Array[File](f), r, Array.empty[File])
}
What made this difficult compared to what I am used to is the concept that a File can be transformed into an Array[File] which adds another level of depth.
What is the theory behind recursion on datatypes that have the following member?
def listTs[T]: T => Traversable[T]

Short answer
If you generalize the idea and think of it as a monad (a polymorphic thing working for arbitrary type parameters), then you won't be able to write a tail-recursive implementation.
Trampolines try to solve this very problem by providing a way to evaluate a recursive computation without overflowing the stack. The general idea is to create a stream of pairs of (result, computation). So at each step you'll have to return the computed result up to that point and a function to create the next result (aka thunk).
From Rich Dougherty’s blog:
A trampoline is a loop that repeatedly runs functions. Each function,
called a thunk, returns the next function for the loop to run. The
trampoline never runs more than one thunk at a time, so if you break
up your program into small enough thunks and bounce each one off the
trampoline, then you can be sure the stack won't grow too big.
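In Scala, the standard library ships a small trampoline in scala.util.control.TailCalls. As a minimal sketch, here is a deliberately non-tail-recursive Fibonacci bounced off that trampoline:

import scala.util.control.TailCalls._

// Each recursive call is suspended in a thunk by tailcall; calling
// result evaluates the chain iteratively instead of growing the stack.
def fib(n: Int): TailRec[Int] =
  if (n < 2) done(n)
  else for {
    a <- tailcall(fib(n - 1))
    b <- tailcall(fib(n - 2))
  } yield a + b

fib(20).result // 6765, computed without deep recursion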
More + References
In the categorical sense, the theory behind such data types is closely related to cofree comonads, fold and unfold functions, and in general to fixed-point types.
See this fantastic talk: Fun and Games with Fix Cofree and Doobie by Rob Norris which discusses a use case very similar to your question.
This article about Free monads and Trampolines is also related to your first question: Stackless Scala With Free Monads.
See also this part of the Matryoshka docs. Matryoshka is a Scala library implementing recursion schemes around the concept of fixed-point types.

Related

Looking for some guidance on how to code a writer for a given "AST" (DynamoDB)

As a personal project, I am writing yet another Scala library for DynamoDb. It covers many interesting aspects, such as reading from and writing to an AST (just like JSON), handling HTTP requests, streaming data…
In order to communicate with DynamoDb, one needs to be able to read from and write to the DynamoDb format (the “AST”). I extracted this reading/writing to and from the AST into a minimalist library: dynamo-ast. It contains two main type classes: DynamoReads[_] and DynamoWrites[_] (deeply inspired by Play Json).
I successfully coded the reading part of the library, ending up with very simple code such as:
trait DynamoRead[A] { self =>
  def read(dynamoType: DynamoType): DynamoReadResult[A]
}

case class TinyImage(url: String, alt: String)

val dynamoReads: DynamoReads[TinyImage] = {
  for {
    url <- read[String].at("url")
    alt <- read[String].at("alt")
  } yield TinyImage(url, alt)
}
dynamoReads.reads(dynamoAst) // yields DynamoReadResult[TinyImage]
At that point, I thought I had written the most complicated part of the library and that the DynamoWrite[_] part would be a piece of cake. I am, however, stuck on the DynamoWrite part. I was a fool.
My goal is to provide a very similar “user experience” with DynamoWrite[_] and keep it as simple as possible, such as:
val dynamoWrites: DynamoWrites[TinyImage] = {
  for {
    url <- write[String].at("url")
    alt <- write[String].at("alt")
  } yield (url, alt) map (TinyImage.unapply _) // I am not sure what to yield here nor how to code it
}
dynamoWrites.write(TinyImage("http://fake.url", "The alt desc")) // yields DynamoWriteResult[DynamoType]
Since this library is deeply inspired by the Play Json library (because I like its simplicity), I had a look at its sources several times. I kind of dislike the way the writer part is coded because, to me, it adds a lot of overhead (basically, each time a field is written, a new JsObject is created with one field, and the resulting JsObject for a complete class is the merge of all the single-field JsObjects).
From my understanding, the DynamoReads part can be written with only one trait (DynamoRead[_]). The DynamoWrites part, however, requires at least two, such as:
trait DynamoWrites[A] {
  def write(a: A): DynamoWriteResult[DynamoType]
}

trait DynamoWritesPath[A] {
  def write(path: String, a: A): DynamoWriteResult[(String, DynamoType)]
}
The DynamoWrites[_] is to write plain String, Int… and the DynamoWritesPath[_] is to write a tuple of (String, WhateverTypeHere) (to simulate a “field”).
So writing write[String].at("url") would yield a DynamoWritesPath[String]. Now I have several issues:
I have no clue how to write flatMap for my DynamoWritesPath[_]
what should a for comprehension yield in order to obtain a DynamoWrites[TinyImage]
Here is what I wrote so far (totally fuzzy and not compiling at all; looking for some help on this). Not committed at the moment (gist): https://gist.github.com/louis-forite/cad97cc0a47847b2e4177192d9dbc3ae
To sum up, I am looking for some guidance on how to write the DynamoWrites[_] part. My goal is to give the client the most straightforward way to code a DynamoWrites[_] for a given type. My non-goal is to write the perfect library; I want to keep it a zero-dependency library.
Link to the library: https://github.com/louis-forite/dynamo-ast
A Reads is a covariant functor. That means it has map. It can also be seen as a Monad which means it has flatMap (although a monad is overkill unless you need the previous field in order to know how to process the next):
trait Reads[A] {
  def map[B](f: A => B): Reads[B]
  def flatMap[B](f: A => Reads[B]): Reads[B] // not necessary, but available
}
The reason for this is that to transform a Reads[Int] into a Reads[String], you need to first read the Int, then apply the Int => String function.
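For illustration, here is a toy sketch of that direction of data flow (reading from a raw String; the ToyReads name and helpers are made up for the example, not part of any library):

// Toy Reads: produce the A first, then apply f to the result.
trait ToyReads[A] { self =>
  def read(s: String): Option[A]
  def map[B](f: A => B): ToyReads[B] = new ToyReads[B] {
    def read(s: String): Option[B] = self.read(s).map(f)
  }
}

val readInt: ToyReads[Int] = new ToyReads[Int] {
  def read(s: String): Option[Int] = scala.util.Try(s.toInt).toOption
}
val readIntAsString: ToyReads[String] = readInt.map(_.toString) // read the Int, then transform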
But a Writes is a contravariant functor. It has contramap where the direction of the types is reversed:
trait Writes[A] {
  def contramap[B](f: B => A): Writes[B]
}
The type on the function is reversed because to transform a Writes[Int] to a Writes[String] you must receive the String from the caller, apply the transformation String => Int and then write the Int.
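Again as a toy sketch, mirroring the one above (writing to a raw String; names made up), contramap pre-composes the conversion before writing:

// Toy Writes: apply f to the caller's value first, then write the result.
trait ToyWrites[A] { self =>
  def write(a: A): String
  def contramap[B](f: B => A): ToyWrites[B] = new ToyWrites[B] {
    def write(b: B): String = self.write(f(b))
  }
}

val writeInt: ToyWrites[Int] = new ToyWrites[Int] {
  def write(a: Int): String = a.toString
}
val writeStringLength: ToyWrites[String] = writeInt.contramap(_.length) // String => Int, then write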
I don't think it makes sense to provide for-comprehension syntax (flatMap) for the Writes API.
// here it is clear that you're extracting a string value
url <- read[String].at("url")
// but what does this mean for the write method?
url <- write[String].at("url")
// what is `url`?
That's probably why play doesn't provide one either, and why they focus on their combinator syntax (using the and function, their version of applicative functor builder?).
For reference: http://blog.tmorris.net/posts/functors-and-things-using-scala/index.html
You can achieve a more consistent API by using something like the and method in play json:
(write[String]("url") and write[String]("alt"))(unlift(TinyImage.unapply))
(read[String]("url") and read[String]("alt"))(TinyImage.apply)

// unfortunately, the type ascription is necessary in this case
(write[String]("url") and write[String]("alt")) { (x: TinyImage) =>
  (x.url, x.alt)
}

// transforming
val instantDynamoType: DynamoFormat[Instant] =
  format[String].xmap(Instant.parse _)((_: Instant).toString)
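For what it's worth, here is a minimal sketch of how such an and combinator for writes can be implemented, building on the ToyWrites above (the comma-joined String output is a stand-in, not the real DynamoDB encoding):

// Combine two field writers: split the B into its parts, write each, merge.
class WritesBuilder[A1, A2](w1: ToyWrites[A1], w2: ToyWrites[A2]) {
  def apply[B](split: B => (A1, A2)): ToyWrites[B] = new ToyWrites[B] {
    def write(b: B): String = {
      val (a1, a2) = split(b)
      w1.write(a1) + "," + w2.write(a2)
    }
  }
}

val writeString: ToyWrites[String] = new ToyWrites[String] {
  def write(a: String): String = a
}
val writeTinyImage: ToyWrites[TinyImage] = {
  val both = new WritesBuilder(writeString, writeString)
  both((x: TinyImage) => (x.url, x.alt))
}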
You can still use for-comprehension for the reads, although it's a bit over-powered (sort of implies that fields must be processed in-sequence, while that's not technically necessary).

List implementation of foldLeft in Scala

Scala's foldLeft implementation is:
def foldLeft[B](z: B)(op: (B, A) => B): B = {
  var result = z
  this foreach (x => result = op(result, x))
  result
}
Why don't the Scala developers use something like tail recursion, or something else like this (it's just an example):
def foldLeft[T](start: T, myList: List[T])(f: (T, T) => T): T = {
  def foldRec(accum: T, list: List[T]): T = {
    list match {
      case Nil => accum
      case head :: tail => foldRec(f(accum, head), tail)
    }
  }
  foldRec(start, myList)
}
Could it be done this way? Why, or why not?
"Why not replace this simple three-line piece of code with this less simple seven-line piece of code that does the same thing?"
Um. That's why.
(If you are asking about performance, then one would need benchmarks of both solutions and an indication that the non-closure version was significantly faster.)
According to this answer, Scala does support tail-recursion optimization, but it looks like it wasn't there from the beginning, and it might still not work in every case, so that specific implementation might be a leftover.
That said, Scala is multi-paradigm and I don't think it strives for purity in terms of its functional programming, so I wouldn't be surprised if they went for the most practical or convenient approach.
Besides being simpler, the imperative solution is also far more general. As you may have noticed, foldLeft is implemented in TraversableOnce and depends only on the foreach method. Thus, by extending Traversable and implementing foreach, which is probably the simplest method to implement on any collection, you get all these wonderful methods, as sketched below.
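As a minimal illustration of that generality (using the pre-2.13 collections library, where foldLeft lives in TraversableOnce), a collection only has to provide foreach to inherit foldLeft and many other methods:

// A toy collection: foreach is the only method implemented by hand.
class Range3 extends Traversable[Int] {
  def foreach[U](f: Int => U): Unit = { f(1); f(2); f(3) }
}

new Range3().foldLeft(0)(_ + _) // 6, inherited via foreach
new Range3().mkString(",")      // "1,2,3", also inherited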
The declarative implementation, on the other hand, depends on the concrete structure of List and is very specific: it relies on Nil and ::.

Is there any fundamental limitations that stops Scala from implementing pattern matching over functions?

In languages like SML, Erlang, and a bunch of others, we may define functions like this:
fun reverse [] = []
  | reverse (x :: xs) = reverse xs @ [x];
I know we can write an analog in Scala like this (and I know there are many flaws in the code below):
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
But I wonder if we could write the former code in Scala, perhaps with desugaring to the latter.
Are there any fundamental limitations preventing such syntax from being implemented in the future (I mean really fundamental, e.g. the way type inference works in Scala, or something else, the parser excepted, obviously)?
UPD
Here is a snippet of how it could look:
type T
def reverse(Nil: List[T]) = Nil
def reverse(x :: xs: List[T]): List[T] = reverse(xs) ++ List(x)
It really depends on what you mean by fundamental.
If you are really asking "is there a technical showstopper that would prevent this feature from being implemented", then I would say the answer is no. You are talking about desugaring, and you are on the right track here. All there is to do is basically stitch the separate cases into one single function, and this can be done as a mere preprocessing step (it requires only syntactic knowledge, no semantic knowledge). But for this to even make sense, I would define a few rules:
The function signature is mandatory (in Haskell, for example, the signature is optional, whether you define the function at once or in several parts). We could try to arrange to live without the signature and attempt to extract it from the different parts, but the lack of type information would quickly come back to bite us. A simpler argument is that if we are to infer an implicit signature, we might as well do it for all methods. But the truth is that there are very good reasons to have explicit signatures in Scala, and I can't imagine that changing.
All the parts must be defined within the same scope. To start with, they must be declared in the same file, because each source file is compiled separately, so a simple preprocessor would otherwise not be enough to implement the feature. Second, we end up with a single method in the end, so it's only natural to have all the parts in the same scope.
Overloading is not possible for such methods (otherwise we would need to repeat the signature for each part, just so the preprocessor knows which parts belong to which overload).
Parts are added (stitched) to the generated match in the order they are declared.
So here is how it could look:
def reverse[T](lst: List[T]): List[T] // Exactly like an abstract def (provides the signature)
// .... some unrelated code here...
def reverse(Nil) = Nil
// .... another bit of unrelated code here...
def reverse(x :: xs ) = reverse(xs) ++ List(x)
Which could be trivially transformed into:
def reverse[T](lst: List[T]): List[T] = lst match {
  case Nil => Nil
  case x :: xs => reverse(xs) ++ List(x)
}
// .... some unrelated code here...
// .... another bit of unrelated code here...
It is easy to see that the above transformation is purely mechanical and can be done by just manipulating the source AST (the AST produced by a slightly modified grammar that accepts this new construct), transforming it into the target AST (the AST produced by the standard Scala grammar).
Then we can compile the result as usual.
So there you go, with a few simple rules we are able to implement a preprocessor that does all the work to implement this new feature.
If by fundamental you are asking "is there anything that would make this feature out of place", then it can be argued that this does not feel very Scala-like. But more to the point, it does not bring that much to the table. The Scala authors actually tend toward making the language simpler (as in fewer built-in features, trying to move some built-in features into libraries), and adding new syntax that is not really more readable goes against that goal of simplification.
In SML, your code snippet is literally just syntactic sugar (a "derived form" in the terminology of the language spec) for
val rec reverse = fn x =>
  case x of
    [] => []
  | x :: xs => reverse xs @ [x]
which is very close to the Scala code you show. So, no, there is no "fundamental" reason Scala couldn't provide the same kind of syntax. The main obstacle is Scala's need for more type annotations, which makes this shorthand syntax far less attractive in general, and probably not worth the effort.
Note also that the specific syntax you suggest would not fly, because there is no way to syntactically distinguish one case-by-case function definition from two overloaded functions. You would probably need some alternative syntax, similar to SML's use of "|".
I don't know SML or Erlang, but I know Haskell. Haskell is a language without method overloading. Method overloading combined with such pattern matching could lead to ambiguities. Imagine the following code:
def f(x: String) = "String "+x
def f(x: List[_]) = "List "+x
What should it mean? It could mean method overloading, i.e. the method is determined at compile time. It could also mean pattern matching, i.e. there would be just one f(x: AnyRef) method that would do the matching.
Scala also has named parameters, which would probably also be broken.
I don't think Scala is able to offer a simpler syntax than the one you have shown in general. A simpler syntax may, IMHO, work in some special cases only.
There are at least two problems:
[ and ] are reserved characters because they are used for type arguments. The compiler allows spaces around them, so that would not be an option.
The other problem is that = returns Unit. So the expression after the | would not return any result
The closest I could come up with is this (note that it is very specialized towards your example):
// Define a class to hold the values left and right of the | sign
class |[T, S](val left: T, val right: PartialFunction[T, T])

// Create a class that contains the | operator
class OrAssoc[T](left: T) {
  def |(right: PartialFunction[T, T]): T | T = new |(left, right)
}

// Add the | to any potential target
implicit def anyToOrAssoc[S](left: S): OrAssoc[S] = new OrAssoc(left)

object fun {
  // Use the magic of the update method
  def update[T, S](choice: T | S): T => T = { arg =>
    if (choice.right.isDefinedAt(arg)) choice.right(arg)
    else choice.left
  }
}

// Use the above construction to define a new method
val reverse: List[Int] => List[Int] =
  fun() = List.empty[Int] | {
    case x :: xs => reverse(xs) ++ List(x)
  }

// Call the method
reverse(List(3, 2, 1))

what is proper monad or sequence comprehension to both map and carry state across?

I'm writing a programming language interpreter.
I need the right code idiom to both evaluate a sequence of expressions to get a sequence of their values, and propagate state from one evaluation to the next as the evaluations take place. I'd like a functional programming idiom for this.
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
What I have is this code which I'm using to try to figure this out. Bear with a few lines of test rig first:
// test rig
class MonadLearning extends JUnit3Suite {

  val d = List("1", "2", "3") // some expressions to evaluate.
  type ResType = Int
  case class State(i: ResType) // trivial state for experiment purposes
  val initialState = State(0)

  // my stub/dummy "eval" function...obviously the real one will be...real.
  def computeResultAndNewState(s: String, st: State): (ResType, State) = {
    val State(i) = st
    val res = s.toInt + i
    val newStateInt = i + 1
    (res, State(newStateInt))
  }
My current solution. Uses a var which is updated as the body of the map is evaluated:
def testTheVarWay() {
  var state = initialState
  val r = d.map { s =>
    val (result, newState) = computeResultAndNewState(s, state)
    state = newState
    result
  }
  println(r)
  println(state)
}
I have what I consider unacceptable solutions using foldLeft which does what I call "bag it as you fold" idiom:
def testTheFoldWay() {
  // This startFold thing requires an explicit type. That alone makes it muddy.
  val startFold: (List[ResType], State) = (Nil, initialState)
  val (r, state) = d.foldLeft(startFold) {
    case ((tail, st), s) =>
      val (r, ns) = computeResultAndNewState(s, st)
      (tail :+ r, ns) // we want a constant-time append here, not O(N). Or could cons on front and reverse later
  }
  println(r)
  println(state)
}
I also have a couple of recursive variations (which are obvious, but also not clear or well motivated), one using streams which is almost tolerable:
def testTheStreamsWay() {
  lazy val states = initialState #:: resultStates // there are states
  lazy val args = d.toStream // there are arguments
  lazy val argPairs = args zip states // put them together
  lazy val resPairs: Stream[(ResType, State)] =
    argPairs.map { case (d1, s1) => computeResultAndNewState(d1, s1) } // map across them
  lazy val (results, resultStates) = myUnzip(resPairs) // Note: .unzip causes infinite loop. Had to write my own.
  lazy val r = results.toList
  lazy val finalState = resultStates.last
  println(r)
  println(finalState)
}
But, I can't figure out anything as compact or clear as the original 'var' solution above, which I'm willing to live with, but I think somebody who eats/drinks/sleeps monad idioms is going to just say ... use this... (Hopefully!)
With the map-with-accumulator combinator (the easy way)
The higher-order function you want is mapAccumL. It's in Haskell's standard library, but for Scala you'll have to use something like Scalaz.
First the imports (note that I'm using Scalaz 7 here; for previous versions you'd import Scalaz._):
import scalaz._, syntax.std.list._
And then it's a one-liner:
scala> d.mapAccumLeft(initialState, computeResultAndNewState)
res1: (State, List[ResType]) = (State(3),List(1, 3, 5))
Note that I've had to reverse the order of your evaluator's arguments and the return value tuple to match the signatures expected by mapAccumLeft (state first in both cases).
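Concretely, the adapted evaluator might look like this (a sketch: Scalaz 7's mapAccumLeft expects the state first in both the argument function and its result tuple):

// State-first adapter around the original evaluator.
val step: (State, String) => (State, ResType) = (st, s) => {
  val (res, ns) = computeResultAndNewState(s, st)
  (ns, res)
}

d.mapAccumLeft(initialState, step)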
With the state monad (the slightly less easy way)
As Petr Pudlák points out in another answer, you can also use the state monad to solve this problem. Scalaz actually provides a number of facilities that make working with the state monad much easier than the version in his answer suggests, and they won't fit in a comment, so I'm adding them here.
First of all, Scalaz does provide a mapM—it's just called traverse (which is a little more general, as Petr Pudlák notes in his comment). So assuming we've got the following (I'm using Scalaz 7 again here):
import scalaz._, Scalaz._
type ResType = Int
case class Container(i: ResType)
val initial = Container(0)
val d = List("1", "2", "3")
def compute(s: String): State[Container, ResType] = State {
  case Container(i) => (Container(i + 1), s.toInt + i)
}
We can write this:
d.traverse[({type L[X] = State[Container, X]})#L, ResType](compute).run(initial)
If you don't like the ugly type lambda, you can get rid of it like this:
type ContainerState[X] = State[Container, X]
d.traverse[ContainerState, ResType](compute).run(initial)
But it gets even better! Scalaz 7 gives you a version of traverse that's specialized for the state monad:
scala> d.traverseS(compute).run(initial)
res2: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
And as if that wasn't enough, there's even a version with the run built in:
scala> d.runTraverseS(initial)(compute)
res3: (Container, List[ResType]) = (Container(3),List(1, 3, 5))
Still not as nice as the mapAccumLeft version, in my opinion, but pretty clean.
What you're describing is a computation within the state monad. I believe that the answer to your question
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
is that it's a monadic map using the state monad.
Values of the state monad are computations that read some internal state, possibly modify it, and return some value. It is often used in Haskell (see here or here).
For Scala, there is a trait in the ScalaZ library called State that models it (see also the source). There are utility methods in States for creating instances of State. Note that from the monadic point of view State is just a monadic value. This may seem confusing at first, because it's described by a function depending on a state. (A monadic function would be something of type A => State[B].)
Next, you need a monadic map function that computes the values of your expressions, threading the state through the computations. In Haskell, there is a library method mapM that does just that, when specialized to the state monad.
In Scala, there is no such library function (if there is, please correct me). But it's possible to create one. To give a full example:
import scalaz._

object StateExample
  extends App
  with States /* utility methods */
{
  // The context that is threaded through the state.
  // In our case, it just maps variables to integer values.
  class Context(val map: Map[String, Int])

  // An example that returns the requested variable's value
  // and increases its value in the context.
  def eval(expression: String): State[Context, Int] =
    state((ctx: Context) => {
      val v = ctx.map.get(expression).getOrElse(0)
      (new Context(ctx.map + ((expression, v + 1))), v)
    })

  // Specialization of Haskell's mapM to our State monad.
  def mapState[S, A, B](f: A => State[S, B])(xs: Seq[A]): State[S, Seq[B]] =
    state((initState: S) => {
      var s = initState
      // process the sequence, threading the state
      // through the computation
      val ys = for (x <- xs) yield { val r = f(x)(s); s = r._1; r._2 }
      // return the final state and the output result
      (s, ys)
    })

  // Example: Try to evaluate some variables, starting from an empty context.
  val expressions = Seq("x", "y", "y", "x", "z", "x")
  print(mapState(eval)(expressions) ! new Context(Map[String, Int]()))
}
This way you can create simple functions that take some arguments and return State and then combine them into more complex ones by using State.map or State.flatMap (or perhaps better using for comprehensions), and then you can run the whole computation on a list of expressions by mapM.
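For instance, here is a small sketch of that composition with a for comprehension, reusing the eval and Context definitions from the example above (and the same ! runner):

// Evaluate two variables in sequence; the Context is threaded automatically.
val two: State[Context, (Int, Int)] =
  for {
    a <- eval("x")
    b <- eval("y")
  } yield (a, b)

print(two ! new Context(Map[String, Int]()))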
See also http://blog.tmorris.net/posts/the-state-monad-for-scala-users/
Edit: See Travis Brown's answer; he describes how to use the state monad in Scala much more nicely.
He also asks:
But why, when there's a standard combinator that does exactly what you need in this case?
(I ask this as someone who's been slapped for using the state monad when mapAccumL would do.)
It's because the original question asked:
It's not a fold because the results come out like a map. It's not a map because of the state prop across.
and I believe the proper answer is it is a monadic map using the state monad.
Using mapAccumL is surely faster, with both less memory and less CPU overhead. But the state monad captures the concept of what is going on, the essence of the problem. I believe that in many (if not most) cases this is more important. Once we realize the essence of the problem, we can either use the high-level concepts to nicely describe the solution (perhaps sacrificing speed/memory a little) or optimize it to be fast (or perhaps even manage to do both).
On the other hand, mapAccumL solves this particular problem but doesn't give us a broader answer. If we need to modify it a little, it might happen that it won't work any more. Or, if the library grows complex, the code can become messy and we won't know how to improve it, how to make the original idea clear again.
For example, in the case of evaluating stateful expressions, the library can grow large and complex. But if we use the state monad, we can build the library around small functions, each taking some arguments and returning something like State[Context,Result]. These atomic computations can be combined into more complex ones using the flatMap method or for comprehensions, and finally we'll construct the desired task. The principle will stay the same across the whole library, and the final task will also be something that returns State[Context,Result].
To conclude: I'm not saying that using the state monad is the best solution, and it's certainly not the fastest one. I just believe it is the most didactic for a functional programmer: it describes the problem in a clean, abstract way.
You could do this recursively:
def testTheRecWay(xs: Seq[String]) = {
  def innerTestTheRecWay(xs: Seq[String], priorState: State = initialState, result: Vector[ResType] = Vector()): Seq[ResType] = {
    xs match {
      case Nil => result
      case x :: tail =>
        val (res, newState) = computeResultAndNewState(x, priorState)
        innerTestTheRecWay(tail, newState, result :+ res)
    }
  }
  innerTestTheRecWay(xs)
}
Recursion is a common practice in functional programming and is most of the time easier to read, write and understand than loops. In fact, Scala does not have any loop constructs other than while and do-while; fold, map, flatMap, for (which is just sugar for flatMap/map), etc. can all be expressed recursively.
This method is tail recursive and will be optimized by the compiler so that it does not grow the stack, so it is absolutely safe to use. You can add the @annotation.tailrec annotation to force the compiler to apply tail-call elimination; if your method is not tail recursive, the compiler will complain.
edit: renamed inner method to avoid ambiguity

costly computation occurring in both isDefined and apply of a PartialFunction

It is quite possible that, to know whether a function is defined at some point, a significant part of the work of computing its value has to be done. In a PartialFunction, when implementing isDefined and apply, both methods will have to do that work. What to do when this common job is costly?
There is the possibility of caching its result, hoping that apply will be called after isDefined. Definitely ugly.
I often wish that PartialFunction[A,B] were Function[A, Option[B]], which is clearly isomorphic. Or maybe there could be another method in PartialFunction, say applyOption(a: A): Option[B]. With some mixins, implementors would have a choice of implementing either isDefined and apply, or applyOption. Or all of them, to be on the safe side performance-wise. Clients that test isDefined just before calling apply would be encouraged to use applyOption instead.
However, this is not so. Some major methods in the library, among them collect in collections, require a PartialFunction. Is there a clean (or not so clean) way to avoid paying for computations repeated between isDefined and apply?
Also, is the applyOption(a: A): Option[B] method reasonable? Does it sound feasible to add it in a future version? Would it be worth it?
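A sketch of what that proposal could look like (the trait name and the derived methods here are hypothetical, just mirroring the question):

// Hypothetical mixin: implementors provide applyOption only;
// isDefinedAt and apply are derived from it.
trait OptionalFunction[-A, +B] extends PartialFunction[A, B] {
  def applyOption(a: A): Option[B]
  def isDefinedAt(a: A): Boolean = applyOption(a).isDefined
  def apply(a: A): B = applyOption(a).get
}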
Why is caching such a problem? In most cases, you have a local computation, so as long as you write a wrapper for the caching, you needn't worry about it. I have the following code in my utility library:
class DroppedFunction[-A,+B](f: A => Option[B]) extends PartialFunction[A,B] {
  private[this] var tested = false
  private[this] var arg: A = _
  private[this] var ans: Option[B] = None
  private[this] def cache(a: A) {
    if (!tested || a != arg) {
      tested = true
      arg = a
      ans = f(a)
    }
  }
  def isDefinedAt(a: A) = {
    cache(a)
    ans.isDefined
  }
  def apply(a: A) = {
    cache(a)
    ans.get
  }
}

class DroppableFunction[A,B](f: A => Option[B]) {
  def drop = new DroppedFunction(f)
}
implicit def function_is_droppable[A,B](f: A => Option[B]) = new DroppableFunction(f)
and then if I have an expensive computation, I write a method A => Option[B] and do something like (f _).drop to use it in collect or whatnot. (If you wanted to do it inline, you could create a method that takes an A => Option[B] and returns a partial function.)
(The transformation in the opposite direction, from PartialFunction to A => Option[B], is called lifting, hence the "drop"; "unlift" is, I think, a more widely used term for the operation shown here.)
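For example, a hypothetical usage (assuming the implicit conversion above is in scope; the parsing function is made up for the example):

// The costly shared work happens once per distinct argument, inside the cache.
def evenInt(s: String): Option[Int] =
  scala.util.Try(s.toInt).toOption.filter(_ % 2 == 0)

List("1", "2", "x", "4").collect((evenInt _).drop) // List(2, 4)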
Have a look at this thread, Rethinking PartialFunction. You're not the only one wondering about this.
This is an interesting question, and I'll give my 2 cents.
First of all, resist the urge for premature optimization. Make sure the partial function really is the problem. I was amazed at how fast they are in some cases.
Now, assuming there is a problem, where could it come from?
A large number of case clauses
Complex pattern matching
Some complex computation in the if guards
One option I'd try is to find ways to fail fast. Break the pattern matching into layers, then chain partial functions. This way you can fail the match early. Also, extract repeated sub-matching. For example:
Let's assume OddEvenList is an extractor that breaks a list into an odd list and an even list:
var pf1: PartialFunction[List[Int], R] = {
  case OddEvenList(1 :: ors, 2 :: ers) => R1
  case OddEvenList(3 :: ors, 4 :: ers) => R2
}
Break it into two parts: one that matches the split, and one that tries to match the result (to avoid repeated computation). However, this may require some re-engineering:
var pf2: PartialFunction[(List[Int], List[Int]), R] = {
  case (1 :: ors, 2 :: ers) => R1
  case (3 :: ors, 4 :: ers) => R2
}
var pf1: PartialFunction[List[Int], R] = {
  case OddEvenList(ors, ers) if pf2.isDefinedAt((ors, ers)) => pf2((ors, ers))
}
I have used this when progressively reading XML files that had a rather inconsistent format.
Another option is to compose partial functions using andThen, although a quick test here seemed to indicate that only the first one is actually tested.
There is absolutely nothing wrong with a caching mechanism inside the partial function, if:
the function always returns the same output when passed the same argument
it has no side effects
it is completely hidden from the rest of the world
Such a cached function is not distinguishable from a plain old pure partial function...
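As a stdlib-only sketch of the same idea (the one-slot cache and the helper name are my own; Function.unlift itself is in the standard library):

// Build a PartialFunction from an A => Option[B] with a one-argument cache,
// so isDefinedAt followed by apply computes f only once per argument.
def cached[A, B](f: A => Option[B]): PartialFunction[A, B] = {
  var last: Option[(A, Option[B])] = None
  Function.unlift { (a: A) =>
    last match {
      case Some((arg, res)) if arg == a => res
      case _ =>
        val res = f(a)
        last = Some((a, res))
        res
    }
  }
}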