Random as instance of scalaz.Monad - scala

This is a follow-up to my previous question. I wrote a monad (for an exercise) that is actually a function generating random values. However it is not defined as an instance of type class scalaz.Monad.
Now I looked at the Rng library and noticed that it defines Rng as an instance of scalaz.Monad:
implicit val RngMonad: Monad[Rng] =
new Monad[Rng] {
def bind[A, B](a: Rng[A])(f: A => Rng[B]) = a flatMap f
def point[A](a: => A) = insert(a)
}
So I wonder how exactly users benefit from that. How can we use the fact that Rng is an instance of the type class scalaz.Monad? Can you give any examples?

Here's a simple example. Suppose I want to pick a random size for a range, and then pick a random index inside that range, and then return both the range and the index. The second computation of a random value clearly depends on the first—I need to know the size of the range in order to pick a value in the range.
This kind of thing is specifically what monadic binding is for—it allows you to write the following:
val rangeAndIndex: Rng[(Range, Int)] = for {
max <- Rng.positiveint
index <- Rng.chooseint(0, max)
} yield (0 to max, index)
This wouldn't be possible if we didn't have a Monad instance for Rng.
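To make the dependency between the two steps concrete without pulling in the Rng library, here is a minimal sketch. Gen and below are stand-ins made up for illustration (they are not the Rng API), and the seed-stepping constants are simply borrowed from java.util.Random. The point is only that the second generator is built from the value produced by the first, which is exactly what flatMap (bind) allows.

```scala
object DependentGenSketch {
  // Hypothetical stand-in for Rng: a function from a seed to a value plus the next seed.
  final case class Gen[A](run: Long => (A, Long)) {
    def map[B](f: A => B): Gen[B] =
      Gen { s => val (a, s1) = run(s); (f(a), s1) }
    def flatMap[B](f: A => Gen[B]): Gen[B] =
      Gen { s => val (a, s1) = run(s); f(a).run(s1) }
  }

  // Assumed helper: one linear congruential step, yielding a value in 0 until bound.
  def below(bound: Int): Gen[Int] = Gen { s =>
    val next = (s * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1)
    (((next >>> 17) % bound).toInt, next)
  }

  val rangeAndIndex: Gen[(Range, Int)] = for {
    max   <- below(100).map(_ + 1) // a size between 1 and 100
    index <- below(max + 1)        // this generator depends on the value of max
  } yield (0 to max, index)
}
```

Running rangeAndIndex.run(seed) with the same seed always gives the same pair, and the index always falls inside the range.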

One of the benefits is that you get a lot of useful methods defined in MonadOps.
For example, Rng.double.iterateUntil(_ < 0.1) will produce only the values that are less than 0.1 (while the values greater than 0.1 will be skipped).
iterateUntil can be used for generation of distribution samples using a rejection method.
E.g. this is the code that creates a beta distribution sample generator:
import com.nicta.rng.Rng
import java.lang.Math
import scalaz.syntax.monad._
object Main extends App {
def beta(alpha: Double, beta: Double): Rng[Double] = {
// Purely functional port of Numpy's beta generator: https://github.com/numpy/numpy/blob/31b94e85a99db998bd6156d2b800386973fef3e1/numpy/random/mtrand/distributions.c#L187
if (alpha <= 1.0 && beta <= 1.0) {
val rng: Rng[Double] = Rng.double
val xy: Rng[(Double, Double)] = for {
u <- rng
v <- rng
} yield (Math.pow(u, 1 / alpha), Math.pow(v, 1 / beta))
xy.iterateUntil { case (x, y) => x + y <= 1.0 }.map { case (x, y) => x / (x + y) }
} else ???
}
val rng: Rng[List[Double]] = beta(0.5, 0.5).fill(10)
println(rng.run.unsafePerformIO) // Prints 10 samples of the beta distribution
}
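The reason a method like iterateUntil comes for free is that it can be written once in terms of bind and point. The sketch below uses a hand-rolled Monad trait and a toy Gen type (both are assumptions for illustration, not Scalaz's or Rng's definitions), and the recursion is not stack-safe, so treat it as a demonstration of the idea only.

```scala
import scala.language.higherKinds

object IterateUntilSketch {
  // Hypothetical minimal Monad, shaped like the instance shown in the question.
  trait Monad[F[_]] {
    def point[A](a: => A): F[A]
    def bind[A, B](fa: F[A])(f: A => F[B]): F[B]

    // Derived once for every instance: rerun fa until the predicate holds.
    def iterateUntil[A](fa: F[A])(p: A => Boolean): F[A] =
      bind(fa)(a => if (p(a)) point(a) else iterateUntil(fa)(p))
  }

  // Toy generator: takes an Int state and returns a value plus the next state.
  type Gen[A] = Int => (A, Int)

  val genMonad: Monad[Gen] = new Monad[Gen] {
    def point[A](a: => A): Gen[A] = s => (a, s)
    def bind[A, B](fa: Gen[A])(f: A => Gen[B]): Gen[B] = s => {
      val (a, s1) = fa(s)
      f(a)(s1)
    }
  }

  // A deterministic "generator" that just counts upwards.
  val counter: Gen[Int] = s => (s, s + 1)

  // Rejects values until one is a multiple of 7.
  val nextMultipleOf7: Gen[Int] = genMonad.iterateUntil(counter)(_ % 7 == 0)
}
```

Starting from state 1, nextMultipleOf7 skips 1 through 6 and returns 7 together with the next state 8.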

Like any interface, declaring an instance of Monad[Rng] does two things: it provides an implementation of the Monad methods under standard names, and it expresses an implicit contract that those method implementations conform to certain laws (in this case, the monad laws).
@Travis gave an example of one thing that's implemented with these interfaces, the Scalaz implementation of map and flatMap. You're right that you could implement these directly; they're "inherited" in Monad (actually a little more complex than that).
For an example of a method that you definitely have to implement some Scalaz interface for, how about sequence? This is a method that turns a List (or more generally a Traversable) of contexts into a single context for a List, e.g.:
val randomlyGeneratedNumbers: List[Rng[Int]] = ...
randomlyGeneratedNumbers.sequence: Rng[List[Int]]
But this actually only uses Applicative[Rng] (which is a superclass), not the full power of Monad. I can't actually think of anything that uses Monad directly (there are a few methods on MonadOps, e.g. untilM, but I've never used any of them in anger), but you might want a Bind for a "wrapper" case where you have an "inner" Monad "inside" your Rng things, in which case MonadTrans is useful:
val a: Rng[Reader[Config, Int]] = ...
def f: Int => Rng[Reader[Config, Float]] = ...
//would be a pain to manually implement something to combine a and f
val b: ReaderT[Rng, Config, Int] = ...
val g: Int => ReaderT[Rng, Config, Float] = ...
b >>= g
To be totally honest though, Applicative is probably good enough for most Monad use cases, at least the simpler ones.
Of course all of these methods are things you could implement yourself, but like any library the whole point of Scalaz is that they're already implemented, and under standard names, making it easier for other people to understand your code.
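To illustrate the point that sequence needs only Applicative, here is a rough sketch with a hand-rolled type class. The Applicative trait and the Option instance below are assumptions written for this example; Scalaz's real definitions are richer.

```scala
import scala.language.higherKinds

object SequenceSketch {
  // Hypothetical minimal Applicative: no bind/flatMap anywhere.
  trait Applicative[F[_]] {
    def point[A](a: => A): F[A]
    def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
  }

  // Turns a list of contexts into one context holding a list.
  def sequence[F[_], A](xs: List[F[A]])(implicit F: Applicative[F]): F[List[A]] =
    xs.foldRight(F.point(List.empty[A]))((fa, acc) => F.map2(fa, acc)(_ :: _))

  implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def point[A](a: => A): Option[A] = Some(a)
    def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
      (fa, fb) match {
        case (Some(a), Some(b)) => Some(f(a, b))
        case _                  => None
      }
  }

  val allPresent: Option[List[Int]] = sequence(List(Option(1), Option(2), Option(3)))
  val oneMissing: Option[List[Int]] = sequence(List(Option(1), Option.empty[Int]))
}
```

Any type with such an instance gets sequence without further work, which is the practical payoff of declaring the instance under a standard name.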


Fibonacci memoization in Scala with Memo.mutableHashMapMemo

I am trying to implement the Fibonacci function in Scala with memoization.
One example, given in "Is there a generic way to memoize in Scala?", uses a case statement:
import scalaz.Memo
lazy val fib: Int => BigInt = Memo.mutableHashMapMemo {
case 0 => 0
case 1 => 1
case n => fib(n-2) + fib(n-1)
}
It seems the variable n is implicitly defined as the first argument, but I get a compilation error if I replace n with _.
Also, what advantage does the lazy keyword have here? The function seems to work equally well with and without it.
However, I wanted to generalize this to a more generic function definition with appropriate typing:
import scalaz.Memo
def fibonachi(n: Int) : Int = Memo.mutableHashMapMemo[Int, Int] {
var value : Int = 0
if( n <= 1 ) { value = n; }
else { value = fibonachi(n-1) + fibonachi(n-2) }
return value
}
but I get the following compilation error
cmd10.sc:4: type mismatch;
found : Int => Int
required: Int
def fibonachi(n: Int) : Int = Memo.mutableHashMapMemo[Int, Int] {
^
Compilation Failed
So I am trying to understand the generic way of adding a memoization annotation to a Scala def function.
One way to achieve a Fibonacci sequence is via a recursive Stream.
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
An interesting aspect of streams is that, if something holds on to the head of the stream, the calculation results are auto-memoized. So, in this case, because the identifier fib is a val and not a def, the value of fib(n) is calculated only once and simply retrieved thereafter.
However, indexing a Stream is still a linear operation. If you want to memoize that away you could create a simple memo-wrapper.
def memo[A,R](f: A=>R): A=>R =
new collection.mutable.WeakHashMap[A,R] {
override def apply(a: A) = getOrElseUpdate(a,f(a))
}
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
val mfib = memo(fib)
mfib(99) //res0: BigInt = 218922995834555169026
The more general question I am trying to ask is how to take a pre-existing def function and add a mutable/immutable memoization annotation/wrapper to it inline.
Unfortunately, there is no way to do this in Scala unless you are willing to use a macro annotation, which feels like overkill to me, or to use some very ugly design.
The contradicting requirements are "def" and "inline". The fundamental reason is that whatever you do inline with the def can't create any new place to store the memoized values (unless you use a macro that can rewrite code, introducing new vals/vars). You may try to work around this using some global cache, but this IMHO falls under the "ugly design" branch.
Scalaz Memo is designed to create a val of type Function[K,V], which in Scala is usually written as just K => V, instead of a def. This way the produced val can also contain the storage for the cached values. On the other hand, the syntactic difference between using a def method and a K => V function is minimal, so this works pretty well. Since the Scala compiler knows how to convert a def method into a function, you can wrap a def with Memo, but you can't get a def out of it. If for some reason you need a def anyway, you'll need another wrapper def.
import scalaz.Memo
object Fib {
def fib(n: Int): BigInt = n match {
case 0 => BigInt(0)
case 1 => BigInt(1)
case _ => fib(n - 2) + fib(n - 1)
}
// "fib _" converts a method into a function. Sometimes "_" might be omitted
// and compiler can imply it but sometimes the compiler needs this explicit hint
lazy val fib_mem_val: Int => BigInt = Memo.mutableHashMapMemo(fib _)
def fib_mem_def(n: Int): BigInt = fib_mem_val(n)
}
println(Fib.fib(5))
println(Fib.fib_mem_val(5))
println(Fib.fib_mem_def(5))
Note how there is no difference in the syntax of calling fib, fib_mem_val and fib_mem_def, although fib_mem_val is a value. You may also try this example online.
Note: beware that some Scalaz Memo implementations are not thread-safe.
As for the lazy part, the benefit is typical for any lazy val: the actual value with the underlying storage will not be created until the first usage. If the method will be used anyway, I see no benefit in declaring it as lazy.
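One consequence of "the val owns the storage" is worth spelling out: wrapping an existing recursive def caches only the outermost call, because the recursion inside the def still calls the def itself. The sketch below uses a hand-written memoize helper (an assumption for illustration, not Scalaz's Memo) and call counters to make the difference visible.

```scala
import scala.collection.mutable

object MemoSketch {
  // Hypothetical helper: the returned function value owns its own cache.
  def memoize[K, V](f: K => V): K => V = {
    val cache = mutable.HashMap.empty[K, V]
    (k: K) => cache.get(k) match {
      case Some(v) => v
      case None =>
        val v = f(k)
        cache.put(k, v)
        v
    }
  }

  var plainCalls = 0
  def fib(n: Int): BigInt = {
    plainCalls += 1
    if (n < 2) BigInt(n) else fib(n - 1) + fib(n - 2)
  }

  // Wrapping the def: only top-level arguments are cached, the recursion is not.
  val wrapped: Int => BigInt = memoize[Int, BigInt](n => fib(n))

  var memoCalls = 0
  // Recursing through the memoized value: every intermediate result is cached.
  lazy val fibMemo: Int => BigInt = memoize[Int, BigInt] { n =>
    memoCalls += 1
    if (n < 2) BigInt(n) else fibMemo(n - 1) + fibMemo(n - 2)
  }
}
```

Calling wrapped(20) a second time does no new work, but the first call still runs the full exponential recursion. fibMemo(20) evaluates its body once per distinct argument from 0 to 20.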

What is the mechanism by which functions with multiple parameter lists can (sometimes) be used with less than the required number of parameters?

Let me introduce this question by way of an example. This was taken from Lecture 2.3 of Martin Odersky's Functional Programming course.
I have a function to find fixed points iteratively like so
import math.abs
object fixed_points {
println("Welcome to Fixed Points")
val tolerance = 0.0001
def isCloseEnough(x: Double, y: Double) =
abs((x-y)/x) < tolerance
def fixedPoint(f: Double => Double)(firstGuess: Double) = {
def iterate(guess: Double): Double = {
println(guess)
val next = f(guess)
if (isCloseEnough(guess, next)) next
else iterate(next)
}
iterate(firstGuess)
}
I can adapt this function to finding square roots like so
def sqrt(x: Double) =
fixedPoint(y => x/y)(1.0)
However, this does not converge for certain arguments (like 4 for example). So I apply an average damping to it, essentially converting it to Newton-Raphson like so
def sqrt(x: Double) =
fixedPoint(y => (x/y+y)/2)(1.0)
which converges.
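To see why the undamped version fails for 4 while the damped one converges, it helps to list the first few guesses. This is a small standalone sketch (the object and value names are made up for illustration) that iterates both update rules from the same starting guess of 1.0.

```scala
object DampingSketch {
  // First n guesses produced by repeatedly applying f to the starting value.
  def guesses(f: Double => Double, start: Double, n: Int): List[Double] =
    List.iterate(start, n)(f)

  // y => x / y with x = 4 just bounces between 1.0 and 4.0 forever.
  val undamped: List[Double] = guesses(y => 4 / y, 1.0, 5)

  // Averaging each guess with f(guess) pulls the sequence towards 2.0.
  val damped: List[Double] = guesses(y => (y + 4 / y) / 2, 1.0, 6)
}
```

The undamped list alternates 1.0, 4.0, 1.0, 4.0, 1.0, whereas the damped list starts 1.0, 2.5, 2.05 and settles very close to 2.0 within a handful of steps.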
Now average damping is general enough to warrant its own function, so I refactor my code like so
def averageDamp(f: Double => Double)(x: Double) = (x+f(x))/2
and
def sqrtDamp(x: Double) =
fixedPoint(averageDamp(y=>x/y))(1.0) (*)
Whoa! What just happened?? I'm using averageDamp with only one parameter (when it was defined with two) and the compiler does not complain!
Now, I understand that I can use partial application like so
def a = averageDamp(x=>2*x)_
a(3) // returns 4.5
No problems there. But when I attempt to use averageDamp with less than the requisite number of parameters (as was done in sqrtDamp) like so
def a = averageDamp(x=>2*x) (**)
I get an error missing arguments for method averageDamp.
Questions:
How is what I have done in (**) different from (*) that the compiler complains in the former but not the latter?
So it looks like using less than the requisite parameters is allowed under certain circumstances. What are these circumstances and what is the name given to this mechanism? (I realize this would come under the topic of 'currying', but I'm after the specific name of this subset of currying, as it were)
This answer expands on the comment posted by @som-snytt.
The difference between (*) and (**) is that in the former, fixedPoint provides an expected type (its first parameter is declared as Double => Double), whereas in the latter, def a has no declared type. Essentially, whenever the context supplies an explicit function type, the compiler is happy to overlook the omission of the trailing underscore. This is a deliberate design decision; see Martin Odersky's explanation.
To illustrate this point, here is a small example.
object A {
def add(a: Int)(b:Int): Int = a + b
val x: Int => Int = add(5) // compiles fine
val y = add(5) // produces the following compiler error
}
/* missing arguments for method add in object A;
follow this method with `_' if you want to treat it as a partially applied function
val y = add(5)
^
*/

Monadic fold with State monad in constant space (heap and stack)?

Is it possible to perform a fold in the State monad in constant stack and heap space? Or is a different functional technique a better fit to my problem?
The next sections describe the problem and a motivating use case. I'm using Scala, but solutions in Haskell are welcome too.
Fold in the State Monad Fills the Heap
Assume Scalaz 7. Consider a monadic fold in the State monad. To avoid stack overflows, we'll trampoline the fold.
import scalaz._
import Scalaz._
import scalaz.std.iterable._
import Free.Trampoline
type TrampolinedState[S, B] = StateT[Trampoline, S, B] // monad type constructor
type S = Int // state is an integer
type M[B] = TrampolinedState[S, B] // our trampolined state monad
type R = Int // or some other monoid
val col: Iterable[R] = largeIterableofRs() // defined elsewhere
val (count, sum): (S, R) = col.foldLeftM[M, R](Monoid[R].zero){
(acc: R, x: R) => StateT[Trampoline, S, R] {
s: S => Trampoline.done {
(s + 1, Monoid[R].append(acc, x))
}
}
} run 0 run
// In Scalaz 7, foldLeftM is implemented in terms of foldRight, which in turn
// is a reversed.foldLeft. This pulls the whole collection into memory and kills
// the heap. Ignore this heap overflow. We could reimplement foldLeftM to avoid
// this overflow or use a foldRightM instead.
// Our real issue is the heap used by the unexecuted State mobits.
For a large collection col, this will fill the heap.
I believe that during the fold, a closure (a State mobit) is created for each value in the collection (the x: R parameter), filling the heap. None of those can be evaluated until run 0 is executed, providing the initial state.
Can this O(n) heap usage be avoided?
More specifically, can the initial state be provided before the fold so that the State monad can execute during each bind, rather than nesting closures for later evaluation?
Or can the fold be constructed such that it is executed lazily after the State monad is run? In this way, the next x: R closure would not be created until after the previous ones have been evaluated and made suitable for garbage collection.
Or is there a better functional paradigm for this sort of work?
Example Application
But perhaps I'm using the wrong tool for the job. The evolution of an example use case follows. Am I wandering down the wrong path here?
Consider reservoir sampling, i.e., picking in one pass a uniform random k items from a collection too large to fit in memory. In Scala, such a function might be
def sample[A](col: TraversableOnce[A])(k: Int): Vector[A]
and if pimped into the TraversableOnce type could be used like this
val tenRandomInts = (Int.MinValue to Int.MaxValue) sample 10
The work done by sample is essentially a fold:
def sample[A](col: Traversable[A])(k: Int): Vector[A] = {
col.foldLeft(Vector[A]())(update[A](k)) // build update once so any state it holds survives across elements
}
However, update is stateful; it depends on n, the number of items already seen. (It also depends on an RNG, but for simplicity I assume that is global and stateful. The techniques used to handle n would extend trivially.) So how to handle this state?
The impure solution is simple and runs with constant stack and heap.
/* Impure version of update function */
def update[A](k: Int) = new Function2[Vector[A], A, Vector[A]] {
var n = 0
def apply(sample: Vector[A], x: A): Vector[A] = {
n += 1
algorithmR(k, n, sample, x)
}
}
def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = {
if (sample.size < k) {
sample :+ x // must keep first k elements
} else {
val r = rand.nextInt(n) + 1 // for simplicity, rand is global/stateful
if (r <= k)
sample.updated(r - 1, x) // sample is 0-index
else
sample
}
}
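For anyone who wants to run the impure approach in isolation, here is a self-contained sketch. It differs from the snippet above in ways that are my own choices for illustration: the RNG is passed in and seeded (so results are reproducible), the counter n is a local var inside sample, and the input is an Iterator so nothing forces the whole collection into memory.

```scala
import scala.util.Random

object ReservoirSketch {
  // Algorithm R step: keep the first k items, then replace a slot with probability k / n.
  def algorithmR[A](rand: Random)(k: Int, n: Int, sample: Vector[A], x: A): Vector[A] =
    if (sample.size < k) sample :+ x
    else {
      val r = rand.nextInt(n) + 1 // uniform in 1..n
      if (r <= k) sample.updated(r - 1, x) else sample
    }

  // One pass over the iterator; only the current sample and the counter are retained.
  def sample[A](col: Iterator[A], k: Int, seed: Long): Vector[A] = {
    val rand = new Random(seed)
    var n = 0 // number of items seen so far
    col.foldLeft(Vector.empty[A]) { (acc, x) =>
      n += 1
      algorithmR(rand)(k, n, acc, x)
    }
  }
}
```

For example, ReservoirSketch.sample(Iterator.range(0, 100000), 10, 42L) returns ten distinct values drawn from the range.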
But what about a purely functional solution? update must take n as an additional parameter and return the new value along with the updated sample. We could include n in the implicit state, the fold accumulator, e.g.,
(col.foldLeft ((0, Vector())) (update(k)(_: (Int, Vector[A]), _: A)))._2
But that obscures the intent; we only really intend to accumulate the sample vector. This problem seems ready made for the State monad and a monadic left fold. Let's try again.
We'll use Scalaz 7, with these imports
import scalaz._
import Scalaz._
import scalaz.std.iterable._
and operate over an Iterable[A], since Scalaz doesn't support monadic folding of a Traversable.
sample is now defined
// sample using State monad
def sample[A](col: Iterable[A])(k: Int): Vector[A] = {
type M[B] = State[Int, B]
// foldLeftM is implemented using foldRight, which must reverse `col`, blowing
// the heap for large `col`. Ignore this issue for now.
// foldLeftM could be implemented differently or we could switch to
// foldRightM, implemented using foldLeft.
col.foldLeftM[M, Vector[A]](Vector())(update(k)(_: Vector[A], _: A)) eval 0
}
where update is
// update using State monad
def update(k: Int) = {
(acc: Vector[A], x: A) => State[Int, Vector[A]] {
n => (n + 1, algorithmR(k, n + 1, acc, x)) // algR same as impure solution
}
}
Unfortunately, this blows the stack on a large collection.
So let's trampoline it. sample is now
// sample using trampolined State monad
def sample[A](col: Iterable[A])(k: Int): Vector[A] = {
import Free.Trampoline
type TrampolinedState[S, B] = StateT[Trampoline, S, B]
type M[B] = TrampolinedState[Int, B]
// Same caveat about foldLeftM using foldRight and blowing the heap
// applies here. Ignore for now. This solution blows the heap anyway;
// let's fix that issue first.
col.foldLeftM[M, Vector[A]](Vector())(update(k)(_: Vector[A], _: A)) eval 0 run
}
where update is
// update using trampolined State monad
def update(k: Int) = {
(acc: Vector[A], x: A) => StateT[Trampoline, Int, Vector[A]] {
n => Trampoline.done { (n + 1, algorithmR(k, n + 1, acc, x)) }
}
}
This fixes the stack overflow, but still blows the heap for very large collections (or very small heaps). One anonymous function per value in the collection is created during the fold (I believe to close over each x: A parameter), consuming the heap before the trampoline is even run. (FWIW, the State version has this issue too; the stack overflow just surfaces first with smaller collections.)
Our real issue is the heap used by the unexecuted State mobits.
No, it is not. The real issue is that the collection doesn't fit in memory and that foldLeftM and foldRightM force the entire collection. A side effect of the impure solution is that you are freeing memory as you go. In the "purely functional" solution, you're not doing that anywhere.
Your use of Iterable ignores a crucial detail: what kind of collection col actually is, how its elements are created and how they are expected to be discarded. And so, necessarily, does foldLeftM on Iterable. It is likely too strict, and you are forcing the entire collection into memory. For example, if it is a Stream, then as long as you are holding on to col all the elements forced so far will be in memory. If it's some other kind of lazy Iterable that doesn't memoize its elements, then the fold is still too strict.
I tried your first example with an EphemeralStream and did not see any significant heap pressure, even though it will clearly have the same "unexecuted State mobits". The difference is that an EphemeralStream's elements are weakly referenced and its foldRight doesn't force the entire stream.
I suspect that if you used Foldable.foldr, then you would not see the problematic behaviour since it folds with a function that is lazy in its second argument. When you call the fold, you want it to return a suspension that looks something like this immediately:
Suspend(() => head |+| tail.foldRightM(...))
When the trampoline resumes the first suspension and runs up to the next suspension, all of the allocations between suspensions will become available to be freed by the garbage collector.
Try the following:
def foldM[M[_]:Monad,A,B](a: A, bs: Iterable[B])(f: (A, B) => M[A]): M[A] =
if (bs.isEmpty) Monad[M].point(a)
else Monad[M].bind(f(a, bs.head))(fax => foldM(fax, bs.tail)(f))
val MS = StateT.stateTMonadState[Int, Trampoline]
import MS._
foldM[M,R,Int](Monoid[R].zero, col) {
(x, r) => modify(_ + 1) map (_ => Monoid[R].append(x, r))
} run 0 run
This will run in constant heap for a trampolined monad M, but will overflow the stack for a non-trampolined monad.
But the real problem is that Iterable is not a good abstraction for data that are too large to fit in memory. Sure, you can write an imperative side-effecty program where you explicitly discard elements after each iteration or use a lazy right fold. That works well until you want to compose that program with another one. And I'm assuming that the whole reason you're investigating doing this in a State monad to begin with is to gain compositionality.
So what can you do? Here are some options:
Make use of Reducer, Monoid, and composition thereof, then run in an imperative explicitly-freeing loop (or a trampolined lazy right fold) as the last step, after which composition is not possible or expected.
Use Iteratee composition and monadic Enumerators to feed them.
Write compositional stream transducers with Scalaz-Stream.
The last of these options is the one that I would use and recommend in the general case.
Using State, or any similar monad, isn't a good approach to the problem.
Using State is condemned to blow the stack/heap on large collections.
Consider a value x: State[A,B] constructed from a large collection (for example, by folding over it). Then x can be evaluated on different values of the initial state A, yielding different results. So x needs to retain all the information contained in the collection. And in a pure setting, x can't forget some of that information to avoid blowing the stack/heap, so anything that is computed remains in memory until the whole monadic value is freed, which happens only after the result is evaluated. So the memory consumption of x is proportional to the size of the collection.
I believe a fitting approach to this problem is to use functional iteratees/pipes/conduits. This concept (referred to under these three names) was invented to process large collections of data with constant memory consumption, and to describe such processes using simple combinators.
I tried to use Scalaz's Iteratees, but it seems this part isn't mature yet; it suffers from stack overflows just as State does (or perhaps I'm not using it right; the code is available here, if anybody is interested).
However, it was simple using my (still a bit experimental) scala-conduit library (disclaimer: I'm the author):
import conduit._
import conduit.Pipe._
object Run extends App {
// Define a sampling function as a sink: It consumes
// data of type `A` and produces a vector of samples.
def sampleI[A](k: Int): Sink[A, Vector[A]] =
sampleI[A](k, 0, Vector())
// Create a sampling sink with a given state. It requests
// a value from the upstream conduit. If there is one,
// update the state and continue (the first argument to `requestF`).
// If not, return the current sample (the second argument).
// The `Finalizer` part isn't important for our problem.
private def sampleI[A](k: Int, n: Int, sample: Vector[A]):
Sink[A, Vector[A]] =
requestF((x: A) => sampleI(k, n + 1, algorithmR(k, n + 1, sample, x)),
(_: Any) => sample)(Finalizer.empty)
// The sampling algorithm copied from the question.
val rand = new scala.util.Random()
def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = {
if (sample.size < k) {
sample :+ x // must keep first k elements
} else {
val r = rand.nextInt(n) + 1 // for simplicity, rand is global/stateful
if (r <= k)
sample.updated(r - 1, x) // sample is 0-index
else
sample
}
}
// Construct an iterable of all `short` values, pipe it into our sampling
// function, and run the combined pipe.
{
print(runPipe(Util.fromIterable(Short.MinValue to Short.MaxValue) >->
sampleI(10)))
}
}
Update: It'd be possible to solve the problem using State, but we need to implement a custom fold specifically for State that knows how to do it in constant space:
import scala.collection._
import scala.language.higherKinds
import scalaz._
import Scalaz._
import scalaz.std.iterable._
object Run extends App {
// Folds in a state monad over a foldable
def stateFold[F[_],E,S,A](xs: F[E],
f: (A, E) => State[S,A],
z: A)(implicit F: Foldable[F]): State[S,A] =
State[S,A]((s: S) => F.foldLeft[E,(S,A)](xs, (s, z))((p, x) => f(p._2, x)(p._1)))
// Sample a lazy collection view
def sampleS[F[_],A](k: Int, xs: F[A])(implicit F: Foldable[F]):
State[Int,Vector[A]] =
stateFold[F,A,Int,Vector[A]](xs, update(k), Vector())
// update using State monad
def update[A](k: Int) = {
(acc: Vector[A], x: A) => State[Int, Vector[A]] {
n => (n + 1, algorithmR(k, n + 1, acc, x)) // algR same as impure solution
}
}
def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = ...
{
print(sampleS(10, (Short.MinValue to Short.MaxValue)).eval(0))
}
}
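The idea behind stateFold can also be shown without Scalaz. In the sketch below, State is a hand-rolled case class (an assumption for illustration, not scalaz.State), and the fold threads the state through an ordinary strict foldLeft, so each intermediate State value is created, run, and discarded before the next element is visited.

```scala
object StateFoldSketch {
  // Hypothetical minimal State: a function from a state to (new state, result).
  final case class State[S, A](run: S => (S, A))

  // Builds one State whose run performs a plain left fold, carrying (state, accumulator).
  def stateFold[E, S, A](xs: Iterable[E], z: A)(f: (A, E) => State[S, A]): State[S, A] =
    State { s0 =>
      xs.foldLeft((s0, z)) { case ((s, acc), x) => f(acc, x).run(s) }
    }

  // Count the elements in the state while summing them in the accumulator.
  val countAndSum: State[Int, Long] =
    stateFold[Int, Int, Long](0 until 1000000, 0L) { (acc, x) =>
      State[Int, Long](n => (n + 1, acc + x))
    }
}
```

Because nothing is nested for later evaluation, running countAndSum on a million elements keeps only the current pair alive at any time. The price is that the fold is specific to State rather than generic over any monad.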

What's the difference between multiple parameters lists and multiple parameters per list in Scala?

In Scala one can write (curried?) functions like this
def curriedFunc(arg1: Int) (arg2: String) = { ... }
What is the difference between the above curriedFunc function definition with two parameters lists and functions with multiple parameters in a single parameter list:
def curriedFunc(arg1: Int, arg2: String) = { ... }
From a mathematical point of view this is (curriedFunc(x))(y) and curriedFunc(x,y) but I can write def sum(x) (y) = x + y and the same will be def sum2(x, y) = x + y
I know only one difference - this is partially applied functions. But both ways are equivalent for me.
Are there any other differences?
Strictly speaking, this is not a curried function, but a method with multiple argument lists, although admittedly it looks like a function.
As you said, the multiple arguments lists allow the method to be used in the place of a partially applied function. (Sorry for the generally silly examples I use)
object NonCurr {
def tabulate[A](n: Int, fun: Int => A) = IndexedSeq.tabulate(n)(fun)
}
NonCurr.tabulate[Double](10, _) // not possible
val x = IndexedSeq.tabulate[Double](10) _ // possible. x is Function1 now
x(math.exp(_)) // complete the application
Another benefit is that you can use curly braces instead of parentheses, which looks nice if the second argument list consists of a single function or thunk. E.g.
NonCurr.tabulate(10, { i => val j = util.Random.nextInt(i + 1); i - i % 2 })
versus
IndexedSeq.tabulate(10) { i =>
val j = util.Random.nextInt(i + 1)
i - i % 2
}
Or for the thunk:
IndexedSeq.fill(10) {
println("debug: operating the random number generator")
util.Random.nextInt(99)
}
Another advantage is, you can refer to arguments of a previous argument list for defining default argument values (although you could also say it's a disadvantage that you cannot do that in single list :)
// again I'm not very creative with the example, so forgive me
def doSomething(f: java.io.File)(modDate: Long = f.lastModified) = ???
Finally, there are three other applications in answers to the related post "Why does Scala provide both multiple parameters lists and multiple parameters per list?". I will just copy them here, but the credit goes to Knut Arne Vedaa, Kevin Wright, and extempore.
First: you can have multiple var args:
def foo(as: Int*)(bs: Int*)(cs: Int*) = as.sum * bs.sum * cs.sum
...which would not be possible in a single argument list.
Second, it aids the type inference:
def foo[T](a: T, b: T)(op: (T,T) => T) = op(a, b)
foo(1, 2){_ + _} // compiler can infer the type of the op function
def foo2[T](a: T, b: T, op: (T,T) => T) = op(a, b)
foo2(1, 2, _ + _) // compiler too stupid, unfortunately
And last, this is the only way you can have implicit and non implicit args, as implicit is a modifier for a whole argument list:
def gaga [A](x: A)(implicit mf: Manifest[A]) = ??? // ok
def gaga2[A](x: A, implicit mf: Manifest[A]) = ??? // not possible
There's another difference that was not covered by 0__'s excellent answer: default parameters. A parameter from one parameter list can be used when computing the default in another parameter list, but not in the same one.
For example:
def f(x: Int, y: Int = x * 2) = x + y // not valid
def g(x: Int)(y: Int = x * 2) = x + y // valid
That's the whole point: the curried and uncurried forms are equivalent! As others have pointed out, one or the other form can be syntactically more convenient to work with depending on the situation, and that is the only reason to prefer one over the other.
It's important to understand that even if Scala didn't have special syntax for declaring curried functions, you could still construct them; this is just a mathematical inevitability once you have the ability to create functions which return functions.
To demonstrate this, imagine that the def foo(a)(b)(c) = {...} syntax didn't exist. Then you could still achieve the exact same thing like so: def foo(a) = (b) => (c) => {...}.
Like many features in Scala, this is just a syntactic convenience for doing something that would be possible anyway, but with slightly more verbosity.
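As a concrete version of that claim, the sketch below defines the same three-argument addition in several shapes and shows they can be called the same way. The names are made up for illustration; curried and Function.uncurried are standard library methods.

```scala
object CurrySketch {
  // Method with multiple parameter lists (the special syntax).
  def addLists(a: Int)(b: Int)(c: Int): Int = a + b + c

  // The same thing built by hand from functions returning functions.
  def addNested(a: Int): Int => Int => Int = b => c => a + b + c

  // A single-list function value, and its curried counterpart.
  val addUncurried: (Int, Int, Int) => Int = _ + _ + _
  val addCurried: Int => Int => Int => Int = addUncurried.curried

  // Converting back again gives an equivalent three-argument function.
  val roundTrip: (Int, Int, Int) => Int = Function.uncurried(addCurried)
}
```

All of addLists(1)(2)(3), addNested(1)(2)(3), addCurried(1)(2)(3) and roundTrip(1, 2, 3) evaluate to 6.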
The two forms are isomorphic. The main difference is that curried functions are easier to apply partially, while non-curried functions have slightly nicer syntax, at least in Scala.

Clojure's 'let' equivalent in Scala

Often I face following situation: suppose I have these three functions
def firstFn: Int = ...
def secondFn(b: Int): Long = ...
def thirdFn(x: Int, y: Long, z: Long): Long = ...
and I also have calculate function. My first approach can look like this:
def calculate(a: Long) = thirdFn(firstFn, secondFn(firstFn), secondFn(firstFn) + a)
It looks beautiful and without any curly brackets - just one expression. But it's not optimal, so I end up with this code:
def calculate(a: Long) = {
val first = firstFn
val second = secondFn(first)
thirdFn(first, second, second + a)
}
Now it's several expressions surrounded with curly brackets. At such moments I envy Clojure a little bit. With the let function I can define this function in one expression.
So my goal here is to define the calculate function as one expression. I came up with two solutions.
1 - With scalaz I can define it like this (are there better ways to do this with scalaz?):
def calculate(a: Long) =
firstFn |> {first => secondFn(first) |> {second => thirdFn(first, second, second + a)}}
What I don't like about this solution is that it's nested. The more vals I have the deeper this nesting is.
2 - With for comprehension I can achieve something similar:
def calculate(a: Long) =
for (first <- Option(firstFn); second <- Option(secondFn(first))) yield thirdFn(first, second, second + a)
On the one hand, this solution has a flat structure, just like let in Clojure, but on the other hand I need to wrap the functions' results in Option and receive an Option as the result of calculate (that's good if I'm dealing with nulls, but I'm not... and don't want to).
Are there better ways to achieve my goal? What is the idiomatic way of dealing with such situations (maybe I should stay with vals... but the let way of doing it looks so elegant)?
This is also connected to referential transparency. All three functions are referentially transparent (in my example firstFn calculates some constant like Pi), so theoretically they can be replaced with their results. I know this, but the compiler does not, so it can't optimize my first attempt. And here is my second question:
Can I somehow (maybe with an annotation) give the compiler a hint that my function is referentially transparent, so that it can optimize this function for me (put some kind of caching there, for example)?
Edit
Thanks everybody for the great answers! It's just impossible to select one best answer (maybe because they are all so good), so I will accept the answer with the most up-votes; I think that's fair enough.
In the non-recursive case, let is a restructuring of lambda.
def firstFn : Int = 42
def secondFn(b : Int) : Long = 42
def thirdFn(x : Int, y : Long, z : Long) : Long = x + y + z
def let[A, B](x : A)(f : A => B) : B = f(x)
def calculate(a: Long) = let(firstFn){first => let(secondFn(first)){second => thirdFn(first, second, second + a)}}
Of course, that's still nested. Can't avoid that. But you said you like the monadic form. So here's the identity monad
case class Identity[A](x : A) {
def map[B](f : A => B) = Identity(f(x))
def flatMap[B](f : A => Identity[B]) = f(x)
}
And here's your monadic calculate. Unwrap the result by calling .x
def calculateMonad(a : Long) = for {
first <- Identity(firstFn)
second <- Identity(secondFn(first))
} yield thirdFn(first, second, second + a)
But at this point it sure looks like the original val version.
The Identity monad exists in Scalaz with more sophistication
http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/Identity.scala.html
Stick with the original form:
def calculate(a: Long) = {
val first = firstFn
val second = secondFn(first)
thirdFn(first, second, second + a)
}
It's concise and clear, even to Java developers. It's roughly equivalent to let, just without limiting the scope of the names.
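If limiting the scope of the names matters, note that a block in Scala is itself an expression, so the vals can be confined to an inner block. The function bodies below are placeholders chosen for illustration (the question leaves them unspecified).

```scala
object LetSketch {
  // Placeholder implementations, assumed only so the example runs.
  def firstFn: Int = 42
  def secondFn(b: Int): Long = b * 2L
  def thirdFn(x: Int, y: Long, z: Long): Long = x + y + z

  def calculate(a: Long): Long = {
    // The inner block is a single expression; first and second exist only inside it.
    val total = {
      val first = firstFn
      val second = secondFn(first)
      thirdFn(first, second, second + a)
    }
    // first and second are not visible here, much like names bound by let.
    total
  }
}
```

With these placeholder bodies, calculate(1L) evaluates firstFn and secondFn once each and returns 211.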
Here's an option you may have overlooked.
def calculate(a: Long)(i: Int = firstFn)(j: Long = secondFn(i)) = thirdFn(i,j,j+a)
If you actually want to create a method, this is the way I'd do it.
Alternatively, you could create a method (one might name it let) that avoids nesting:
class Usable[A](a: A) {
def use[B](f: A=>B) = f(a)
def reuse[B,C](f: A=>B)(g: (A,B)=>C) = g(a,f(a))
// Could add more
}
implicit def use_anything[A](a: A) = new Usable(a)
def calculate(a: Long) =
firstFn.reuse(secondFn)((first, second) => thirdFn(first,second,second+a))
But now you might need to name the same things multiple times.
If you feel the first form is cleaner/more elegant/more readable, then why not just stick with it?
First, read this recent commit message to the Scala compiler from none other than Martin Odersky and take it to heart...
Perhaps the real issue here is instantly jumping the gun on claiming it's sub-optimal. The JVM is pretty hot at optimising this sort of thing. At times, it's just plain amazing!
Assuming you have a genuine performance issue in an application that's in genuine need of a speed up, you should start with a profiler report proving that this is a significant bottleneck, on a suitably configured and warmed up JVM.
Then, and only then, should you look at ways to make it faster that may end up sacrificing code clarity.
Why not use pattern matching here:
def calculate(a: Long) = firstFn match { case f => secondFn(f) match { case s => thirdFn(f,s,s + a) } }
How about using currying to record the function return values? (Parameters from preceding parameter groups are available in succeeding groups.)
A bit odd looking but fairly concise and no repeated invocations:
def calculate(a: Long)(f: Int = firstFn)(s: Long = secondFn(f)) = thirdFn(f, s, s + a)
println(calculate(1L)()())