Monadic fold with State monad in constant space (heap and stack)?

Is it possible to perform a fold in the State monad in constant stack and heap space? Or is a different functional technique a better fit to my problem?
The next sections describe the problem and a motivating use case. I'm using Scala, but solutions in Haskell are welcome too.
Fold in the State Monad Fills the Heap
Assume Scalaz 7. Consider a monadic fold in the State monad. To avoid stack overflows, we'll trampoline the fold.
import scalaz._
import Scalaz._
import scalaz.std.iterable._
import Free.Trampoline
type TrampolinedState[S, B] = StateT[Trampoline, S, B] // monad type constructor
type S = Int // state is an integer
type M[B] = TrampolinedState[S, B] // our trampolined state monad
type R = Int // or some other monoid
val col: Iterable[R] = largeIterableOfRs() // defined elsewhere
val (count, sum): (S, R) = col.foldLeftM[M, R](Monoid[R].zero) {
  (acc: R, x: R) => StateT[Trampoline, S, R] {
    s: S => Trampoline.done {
      (s + 1, Monoid[R].append(acc, x))
    }
  }
} run 0 run
// In Scalaz 7, foldLeftM is implemented in terms of foldRight, which in turn
// is a reversed.foldLeft. This pulls the whole collection into memory and kills
// the heap. Ignore this heap overflow. We could reimplement foldLeftM to avoid
// this overflow or use a foldRightM instead.
// Our real issue is the heap used by the unexecuted State mobits.
For a large collection col, this will fill the heap.
I believe that during the fold, a closure (a State mobit) is created for each value in the collection (the x: R parameter), filling the heap. None of those can be evaluated until run 0 is executed, providing the initial state.
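To make the nesting concrete, here is a minimal State sketch (an illustration only, not Scalaz's implementation) of why each bind allocates a closure that cannot run until the initial state is supplied:
final case class MiniState[S, A](run: S => (S, A)) {
  def flatMap[B](f: A => MiniState[S, B]): MiniState[S, B] =
    MiniState { s =>       // a fresh closure per bind; nothing executes yet
      val (s1, a) = run(s) // the previous step runs only here...
      f(a).run(s1)         // ...once an initial state is finally supplied
    }
  def map[B](f: A => B): MiniState[S, B] =
    flatMap(a => MiniState(s => (s, f(a))))
}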
Can this O(n) heap usage be avoided?
More specifically, can the initial state be provided before the fold so that the State monad can execute during each bind, rather than nesting closures for later evaluation?
Or can the fold be constructed such that it is executed lazily after the State monad is run? In this way, the next x: R closure would not be created until after the previous ones have been evaluated and made suitable for garbage collection.
Or is there a better functional paradigm for this sort of work?
Example Application
But perhaps I'm using the wrong tool for the job. The evolution of an example use case follows. Am I wandering down the wrong path here?
Consider reservoir sampling, i.e., picking, in one pass, k uniformly random items from a collection too large to fit in memory. In Scala, such a function might be
def sample[A](col: TraversableOnce[A])(k: Int): Vector[A]
and if pimped into the TraversableOnce type could be used like this
val tenRandomInts = (Int.MinValue to Int.MaxValue) sample 10
The work done by sample is essentially a fold:
def sample[A](col: Traversable[A])(k: Int): Vector[A] = {
  col.foldLeft(Vector.empty[A]){ update(k)(_: Vector[A], _: A) }
}
However, update is stateful; it depends on n, the number of items already seen. (It also depends on an RNG, but for simplicity I assume that is global and stateful. The techniques used to handle n would extend trivially.) So how to handle this state?
The impure solution is simple and runs with constant stack and heap.
/* Impure version of update function */
def update[A](k: Int) = new Function2[Vector[A], A, Vector[A]] {
  var n = 0
  def apply(sample: Vector[A], x: A): Vector[A] = {
    n += 1
    algorithmR(k, n, sample, x)
  }
}
def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = {
  if (sample.size < k) {
    sample :+ x // must keep first k elements
  } else {
    val r = rand.nextInt(n) + 1 // for simplicity, rand is global/stateful
    if (r <= k)
      sample.updated(r - 1, x) // sample is 0-indexed
    else
      sample
  }
}
But what about a purely functional solution? update must take n as an additional parameter and return the new value along with the updated sample. We could include n in the implicit state, the fold accumulator, e.g.,
(col.foldLeft((0, Vector.empty[A]))(update(k)(_: (Int, Vector[A]), _: A)))._2
But that obscures the intent; we only really intend to accumulate the sample vector. This problem seems ready-made for the State monad and a monadic left fold. Let's try again.
We'll use Scalaz 7, with these imports
import scalaz._
import Scalaz._
import scalaz.std.iterable._
and operate over an Iterable[A], since Scalaz doesn't support monadic folding of a Traversable.
sample is now defined
// sample using State monad
def sample[A](col: Iterable[A])(k: Int): Vector[A] = {
  type M[B] = State[Int, B]
  // foldLeftM is implemented using foldRight, which must reverse `col`, blowing
  // the heap for large `col`. Ignore this issue for now.
  // foldLeftM could be implemented differently or we could switch to
  // foldRightM, implemented using foldLeft.
  col.foldLeftM[M, Vector[A]](Vector.empty[A])(update(k)(_: Vector[A], _: A)) eval 0
}
where update is
// update using State monad
def update[A](k: Int) = {
  (acc: Vector[A], x: A) => State[Int, Vector[A]] {
    n => (n + 1, algorithmR(k, n + 1, acc, x)) // algorithmR same as impure version
  }
}
Unfortunately, this blows the stack on a large collection.
So let's trampoline it. sample is now
// sample using trampolined State monad
def sample[A](col: Iterable[A])(k: Int): Vector[A] = {
  import Free.Trampoline
  type TrampolinedState[S, B] = StateT[Trampoline, S, B]
  type M[B] = TrampolinedState[Int, B]
  // Same caveat about foldLeftM using foldRight and blowing the heap
  // applies here. Ignore for now. This solution blows the heap anyway;
  // let's fix that issue first.
  col.foldLeftM[M, Vector[A]](Vector.empty[A])(update(k)(_: Vector[A], _: A)) eval 0 run
}
where update is
// update using trampolined State monad
def update[A](k: Int) = {
  (acc: Vector[A], x: A) => StateT[Trampoline, Int, Vector[A]] {
    n => Trampoline.done { (n + 1, algorithmR(k, n + 1, acc, x)) }
  }
}
This fixes the stack overflow, but still blows the heap for very large collections (or very small heaps). One anonymous function per value in the collection is created during the fold (I believe to close over each x: A parameter), consuming the heap before the trampoline is even run. (FWIW, the State version has this issue too; the stack overflow just surfaces first with smaller collections.)

"Our real issue is the heap used by the unexecuted State mobits."
No, it is not. The real issue is that the collection doesn't fit in memory and that foldLeftM and foldRightM force the entire collection. A side effect of the impure solution is that you are freeing memory as you go. In the "purely functional" solution, you're not doing that anywhere.
Your use of Iterable ignores a crucial detail: what kind of collection col actually is, how its elements are created and how they are expected to be discarded. And so, necessarily, does foldLeftM on Iterable. It is likely too strict, and you are forcing the entire collection into memory. For example, if it is a Stream, then as long as you are holding on to col all the elements forced so far will be in memory. If it's some other kind of lazy Iterable that doesn't memoize its elements, then the fold is still too strict.
I tried your first example with an EphemeralStream and did not see any significant heap pressure, even though it will clearly have the same "unexecuted State mobits". The difference is that an EphemeralStream's elements are weakly referenced and its foldRight doesn't force the entire stream.
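For reference, here is a sketch of how the collection could be built as an EphemeralStream instead (the cons and apply signatures are assumed from Scalaz 7's EphemeralStream companion):
import scalaz.EphemeralStream

// cons is by-name in both arguments, so the tail is constructed on demand and
// already-consumed elements are held only by weak references.
def rangeES(from: Int, until: Int): EphemeralStream[Int] =
  if (from >= until) EphemeralStream[Int]()
  else EphemeralStream.cons(from, rangeES(from + 1, until))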
I suspect that if you used Foldable.foldr, then you would not see the problematic behaviour since it folds with a function that is lazy in its second argument. When you call the fold, you want it to return a suspension that looks something like this immediately:
Suspend(() => head |+| tail.foldRightM(...))
When the trampoline resumes the first suspension and runs up to the next suspension, all of the allocations between suspensions will become available to be freed by the garbage collector.
Try the following:
def foldM[M[_]: Monad, A, B](a: A, bs: Iterable[B])(f: (A, B) => M[A]): M[A] =
  if (bs.isEmpty) Monad[M].point(a)
  else Monad[M].bind(f(a, bs.head))(fax => foldM(fax, bs.tail)(f))

val MS = StateT.stateTMonadState[Int, Trampoline]
import MS._

foldM[M, R, Int](Monoid[R].zero, col) {
  (x, r) => modify(_ + 1) map (_ => Monoid[R].append(x, r))
} run 0 run
This will run in constant heap for a trampolined monad M, but will overflow the stack for a non-trampolined monad.
But the real problem is that Iterable is not a good abstraction for data that are too large to fit in memory. Sure, you can write an imperative side-effecty program where you explicitly discard elements after each iteration or use a lazy right fold. That works well until you want to compose that program with another one. And I'm assuming that the whole reason you're investigating doing this in a State monad to begin with is to gain compositionality.
So what can you do? Here are some options:
Make use of Reducer, Monoid, and composition thereof (sketched after this list), then run in an imperative explicitly-freeing loop (or a trampolined lazy right fold) as the last step, after which composition is not possible or expected.
Use Iteratee composition and monadic Enumerators to feed them.
Write compositional stream transducers with Scalaz-Stream.
The last of these options is the one that I would use and recommend in the general case.
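For illustration, here is a hedged sketch of the first option (using plain Monoid composition; Reducer would generalize it): count and sum compose as a product monoid, and one imperative, explicitly-freeing loop runs as the final, non-compositional step.
import scalaz._
import Scalaz._

// Final, non-compositional step: a strict loop that keeps no reference to
// already-consumed elements, so they can be collected as we go.
def runFold[A, B](it: Iterator[A])(z: B)(f: (B, A) => B): B = {
  var acc = z
  while (it.hasNext) acc = f(acc, it.next())
  acc
}

// (count, sum) in a single pass: the (Int, Int) monoid is the product of two
// Int monoids, so the two folds compose into one.
val (count, sum) =
  runFold(Iterator.range(1, 1000001))((0, 0)) { case (acc, x) => acc |+| ((1, x)) }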

Using State, or any similar monad, isn't a good approach to the problem.
Using State is condemned to blow the stack/heap on large collections.
Consider a value x: State[A, B] constructed from a large collection (for example, by folding over it). Then x can be evaluated on different values of the initial state A, yielding different results. So x needs to retain all the information contained in the collection. And in a pure setting, x can't selectively forget information to avoid blowing the stack/heap, so anything that is computed remains in memory until the whole monadic value is freed, which happens only after the result is evaluated. So the memory consumption of x is proportional to the size of the collection.
I believe a fitting approach to this problem is to use functional iteratees/pipes/conduits. This concept (referred to under these three names) was invented to process large collections of data with constant memory consumption, and to describe such processes using simple combinators.
I tried to use Scalaz' Iteratees, but it seems this part isn't mature yet, it suffers from stack overflows just as State does (or perhaps I'm not using it right; the code is available here, if anybody is interested).
However, it was simple using my (still a bit experimental) scala-conduit library (disclaimer: I'm the author):
import conduit._
import conduit.Pipe._

object Run extends App {
  // Define a sampling function as a sink: It consumes
  // data of type `A` and produces a vector of samples.
  def sampleI[A](k: Int): Sink[A, Vector[A]] =
    sampleI[A](k, 0, Vector())

  // Create a sampling sink with a given state. It requests
  // a value from the upstream conduit. If there is one,
  // update the state and continue (the first argument to `requestF`).
  // If not, return the current sample (the second argument).
  // The `Finalizer` part isn't important for our problem.
  private def sampleI[A](k: Int, n: Int, sample: Vector[A]): Sink[A, Vector[A]] =
    requestF((x: A) => sampleI(k, n + 1, algorithmR(k, n + 1, sample, x)),
             (_: Any) => sample)(Finalizer.empty)

  // The sampling algorithm copied from the question.
  val rand = new scala.util.Random()
  def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = {
    if (sample.size < k) {
      sample :+ x // must keep first k elements
    } else {
      val r = rand.nextInt(n) + 1 // for simplicity, rand is global/stateful
      if (r <= k)
        sample.updated(r - 1, x) // sample is 0-indexed
      else
        sample
    }
  }

  // Construct an iterable of all `Short` values, pipe it into our sampling
  // function, and run the combined pipe.
  {
    print(runPipe(Util.fromIterable(Short.MinValue to Short.MaxValue) >->
      sampleI(10)))
  }
}
Update: It'd be possible to solve the problem using State, but we need to implement a custom fold specifically for State that knows how to do it in constant space:
import scala.collection._
import scala.language.higherKinds
import scalaz._
import Scalaz._
import scalaz.std.iterable._

object Run extends App {
  // Folds in a state monad over a foldable
  def stateFold[F[_], E, S, A](xs: F[E],
                               f: (A, E) => State[S, A],
                               z: A)(implicit F: Foldable[F]): State[S, A] =
    State[S, A]((s: S) => F.foldLeft[E, (S, A)](xs, (s, z))((p, x) => f(p._2, x)(p._1)))

  // Sample a lazy collection view
  def sampleS[F[_], A](k: Int, xs: F[A])(implicit F: Foldable[F]): State[Int, Vector[A]] =
    stateFold[F, A, Int, Vector[A]](xs, update(k), Vector())

  // update using State monad
  def update[A](k: Int) = {
    (acc: Vector[A], x: A) => State[Int, Vector[A]] {
      n => (n + 1, algorithmR(k, n + 1, acc, x)) // algorithmR same as impure version
    }
  }

  def algorithmR[A](k: Int, n: Int, sample: Vector[A], x: A): Vector[A] = ... // as above

  {
    print(sampleS(10, (Short.MinValue to Short.MaxValue)).eval(0))
  }
}

Related

How to reason about stack safety in Scala Cats / fs2?

Here is a piece of code from the documentation for fs2. The function go is recursive. The question is: how do we know if it is stack-safe, and how do we reason about whether any function is stack-safe?
import fs2._
// import fs2._
def tk[F[_],O](n: Long): Pipe[F,O,O] = {
  def go(s: Stream[F,O], n: Long): Pull[F,O,Unit] = {
    s.pull.uncons.flatMap {
      case Some((hd,tl)) =>
        hd.size match {
          case m if m <= n => Pull.output(hd) >> go(tl, n - m)
          case m => Pull.output(hd.take(n.toInt)) >> Pull.done
        }
      case None => Pull.done
    }
  }
  in => go(in,n).stream
}
// tk: [F[_], O](n: Long)fs2.Pipe[F,O,O]
Stream(1,2,3,4).through(tk(2)).toList
// res33: List[Int] = List(1, 2)
Would it also be stack-safe if we called go from another method?
def tk[F[_],O](n: Long): Pipe[F,O,O] = {
  def go(s: Stream[F,O], n: Long): Pull[F,O,Unit] = {
    s.pull.uncons.flatMap {
      case Some((hd,tl)) =>
        hd.size match {
          case m if m <= n => otherMethod(...)
          case m => Pull.output(hd.take(n.toInt)) >> Pull.done
        }
      case None => Pull.done
    }
  }
  def otherMethod(...) = {
    Pull.output(hd) >> go(tl, n - m)
  }
  in => go(in,n).stream
}
My previous answer here gives some background information that might be useful. The basic idea is that some effect types have flatMap implementations that support stack-safe recursion directly—you can nest flatMap calls either explicitly or through recursion as deeply as you want and you won't overflow the stack.
For some effect types it's not possible for flatMap to be stack-safe, because of the semantics of the effect. In other cases it may be possible to write a stack-safe flatMap, but the implementers might have decided not to because of performance or other considerations.
Unfortunately there's no standard (or even conventional) way to know whether the flatMap for a given type is stack-safe. Cats does include a tailRecM operation that should provide stack-safe monadic recursion for any lawful monadic effect type, and sometimes looking at a tailRecM implementation that's known to be lawful can provide some hints about whether a flatMap is stack-safe. In the case of Pull it looks like this:
def tailRecM[A, B](a: A)(f: A => Pull[F, O, Either[A, B]]) =
  f(a).flatMap {
    case Left(a) => tailRecM(a)(f)
    case Right(b) => Pull.pure(b)
  }
This tailRecM is just recursing through flatMap, and we know that Pull's Monad instance is lawful, which is pretty good evidence that Pull's flatMap is stack-safe. The one complicating factor here is that the instance for Pull has an ApplicativeError constraint on F that Pull's flatMap doesn't, but in this case that doesn't change anything.
So the tk implementation here is stack-safe because flatMap on Pull is stack-safe, and we know that from looking at its tailRecM implementation. (If we dug a little deeper we could figure out that flatMap is stack-safe because Pull is essentially a wrapper for FreeC, which is trampolined.)
It probably wouldn't be terribly hard to rewrite tk in terms of tailRecM, although we'd have to add the otherwise unnecessary ApplicativeError constraint. I'm guessing the authors of the documentation chose not to do that for clarity, and because they knew Pull's flatMap is fine.
Update: here's a fairly mechanical tailRecM translation:
import cats.ApplicativeError
import fs2._

def tk[F[_], O](n: Long)(implicit F: ApplicativeError[F, Throwable]): Pipe[F, O, O] =
  in => Pull.syncInstance[F, O].tailRecM((in, n)) {
    case (s, n) => s.pull.uncons.flatMap {
      case Some((hd, tl)) =>
        hd.size match {
          case m if m <= n => Pull.output(hd).as(Left((tl, n - m)))
          case m => Pull.output(hd.take(n.toInt)).as(Right(()))
        }
      case None => Pull.pure(Right(()))
    }
  }.stream
Note that there's no explicit recursion.
The answer to your second question depends on what the other method looks like, but in the case of your specific example, >> will just result in more flatMap layers, so it should be fine.
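For instance, here is a hedged sketch of the second variant with the elided pieces filled in (emitAndContinue and its parameter list are my guesses, not from the fs2 docs): factoring the step out into another method only changes where the flatMap layers are built, not how many there are.
import fs2._

def tk[F[_], O](n: Long): Pipe[F, O, O] = {
  def go(s: Stream[F, O], n: Long): Pull[F, O, Unit] =
    s.pull.uncons.flatMap {
      case Some((hd, tl)) =>
        hd.size match {
          case m if m <= n => emitAndContinue(hd, tl, n, m) // hypothetical helper
          case m           => Pull.output(hd.take(n.toInt)) >> Pull.done
        }
      case None => Pull.done
    }

  // Still just another flatMap layer via >>, so stack safety is unaffected.
  def emitAndContinue(hd: Chunk[O], tl: Stream[F, O], n: Long, m: Int): Pull[F, O, Unit] =
    Pull.output(hd) >> go(tl, n - m)

  in => go(in, n).stream
}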
To address your question more generally, this whole topic is a confusing mess in Scala. You shouldn't have to dig into implementations like we did above just to know whether a type supports stack-safe monadic recursion or not. Better conventions around documentation would be a help here, but unfortunately we're not doing a very good job of that. You could always use tailRecM to be "safe" (which is what you'll want to do when the F[_] is generic, anyway), but even then you're trusting that the Monad implementation is lawful.
To sum up: it's a bad situation all around, and in sensitive situations you should definitely write your own tests to verify that implementations like this are stack-safe.
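As a hedged sketch of such a test (assuming fs2 1.x), one can recurse through Pull's flatMap deeply and check that nothing overflows:
import fs2._

// One recursive step per emitted element; >> takes its argument by name, so
// construction is cheap and evaluation relies on Pull's trampolining.
def countdown(n: Long): Pull[Pure, Long, Unit] =
  if (n <= 0) Pull.done
  else Pull.output1(n) >> countdown(n - 1)

// Forcing the stream evaluates a million nested flatMaps; a StackOverflowError
// here would mean flatMap is not stack-safe.
assert(countdown(1000000L).stream.toList.size == 1000000)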

Fibonacci memoization in Scala with Memo.mutableHashMapMemo

I am trying to implement the Fibonacci function in Scala with memoization.
One example given here uses a case statement:
Is there a generic way to memoize in Scala?
import scalaz.Memo
lazy val fib: Int => BigInt = Memo.mutableHashMapMemo {
  case 0 => 0
  case 1 => 1
  case n => fib(n - 2) + fib(n - 1)
}
It seems the variable n is implicitly defined as the first argument, but I get a compilation error if I replace n with _.
Also, what advantage does the lazy keyword have here? The function seems to work equally well with and without it.
However I wanted to generalize this to a more generic function definition with appropriate typing
import scalaz.Memo
def fibonachi(n: Int): Int = Memo.mutableHashMapMemo[Int, Int] {
  var value: Int = 0
  if (n <= 1) { value = n }
  else { value = fibonachi(n - 1) + fibonachi(n - 2) }
  return value
}
but I get the following compilation error
cmd10.sc:4: type mismatch;
 found   : Int => Int
 required: Int
def fibonachi(n: Int) : Int = Memo.mutableHashMapMemo[Int, Int] {
                              ^
Compilation Failed
So I am trying to understand the generic way of adding a memoization annotation/wrapper to a Scala def function.
One way to achieve a Fibonacci sequence is via a recursive Stream.
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
An interesting aspect of streams is that, if something holds on to the head of the stream, the calculation results are auto-memoized. So, in this case, because the identifier fib is a val and not a def, the value of fib(n) is calculated only once and simply retrieved thereafter.
However, indexing a Stream is still a linear operation. If you want to memoize that away you could create a simple memo-wrapper.
def memo[A, R](f: A => R): A => R =
  new collection.mutable.WeakHashMap[A, R] {
    override def apply(a: A) = getOrElseUpdate(a, f(a))
  }
val fib: Stream[BigInt] = 0 #:: fib.scan(1:BigInt)(_+_)
val mfib = memo(fib)
mfib(99) //res0: BigInt = 218922995834555169026
The more general question I am trying to ask is how to take a pre-existing def function and add a mutable/immutable memoization annotation/wrapper to it inline.
Unfortunately there is no way to do this in Scala unless you are willing to use a macro annotation (which feels like overkill to me) or some very ugly design.
The contradicting requirements are "def" and "inline". The fundamental reason for this is that whatever you do inline with the def can't create any new place to store the memoized values (unless you use a macro that can rewrite code, introducing new vals/vars). You may try to work around this using some global cache, but that IMHO falls under the "ugly design" branch.
ScalaZ's Memo is designed to create a val of type Function[K, V], which is often written in Scala as just K => V instead of a def. This way the produced val can also contain the storage for the cached values. On the other hand, the syntactic difference between using a def method and a K => V function is minimal, so this works pretty well. Since the Scala compiler knows how to convert a def method into a function, you can wrap a def with Memo, but you can't get a def out of it. If for some reason you need a def anyway, you'll need another wrapper def.
import scalaz.Memo

object Fib {
  def fib(n: Int): BigInt = n match {
    case 0 => BigInt(0)
    case 1 => BigInt(1)
    case _ => fib(n - 2) + fib(n - 1)
  }

  // "fib _" converts a method into a function. Sometimes "_" may be omitted
  // and the compiler can infer it, but sometimes it needs this explicit hint.
  lazy val fib_mem_val: Int => BigInt = Memo.mutableHashMapMemo(fib _)

  def fib_mem_def(n: Int): BigInt = fib_mem_val(n)
}

println(Fib.fib(5))
println(Fib.fib_mem_val(5))
println(Fib.fib_mem_def(5))
Note how there is no difference in syntax of calling fib, fib_mem_val and fib_mem_def although fib_mem_val is a value. You may also try this example online
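One caveat worth noting (my observation, easily checked): fib_mem_val above caches only the outermost call, because the recursive calls inside fib go to the plain method rather than through the memo. To memoize the recursion itself, the recursive calls must go through the memoized val, as in the question's first example:
import scalaz.Memo

// Recursive calls route through the memoized function, so every intermediate
// result is cached and the computation becomes linear.
lazy val fibMemo: Int => BigInt = Memo.mutableHashMapMemo {
  case 0 => BigInt(0)
  case 1 => BigInt(1)
  case n => fibMemo(n - 2) + fibMemo(n - 1)
}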
Note: beware that some ScalaZ Memo implementations are not thread-safe.
As for the lazy part, the benefit is the one typical of any lazy val: the actual value, with its underlying storage, will not be created until first use. If the method is going to be used anyway, I see no benefit in declaring it lazy.

Early return from a for loop in Scala

Now I have some Scala code similar to the following:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  for {
    i <- 0 until x
    j <- i + 1 until x
    if fn(i, j)
  } return true
  false
}
But I get the feeling that return true is not so functional (or maybe it is?). Is there a way to rewrite this piece of code in a more elegant way?
In general, what is the more functional (if any) way to write the return-early-from-a-loop kind of code?
There are several methods that can help, such as find, exists, etc. For your case, try this:
def foo2(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (0 until x).exists(i =>
    (i + 1 until x).exists(j => fn(i, j)))
}
Since all you are checking for is existence, you can just compose 2 uses of exists:
(0 until x).exists(i => (i + 1 until x).exists(fn(i, _)))
More generally, if you are concerned with more than just determining whether a certain element exists, you can convert your comprehension to a series of Streams, Iterators, or views and then use exists; it will evaluate lazily, avoiding unnecessary executions of the loop:
def foo(x: Int, fn: (Int, Int) => Boolean): Boolean = {
  (for {
    i <- (0 until x).iterator
    j <- (i + 1 until x).iterator
  } yield (i, j)).exists(fn.tupled)
}
You can also use map and flatMap instead of a for, and toStream or view instead of iterator:
(0 until x).view.flatMap(i => (i + 1 until x).toStream.map(j => i -> j)).exists(fn.tupled)
You can also use view on any collection to get a collection where all the transformers are performed lazily. This is possibly the most idiomatic way to short-circuit a collection traversal. From the docs on views:
Scala collections are by default strict in all their transformers, except for Stream, which implements all its transformer methods lazily. However, there is a systematic way to turn every collection into a lazy one and vice versa, which is based on collection views. A view is a special kind of collection that represents some base collection, but implements all transformers lazily.
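A small sketch of what that laziness buys (my example): with a view, the mapped function runs only until exists finds a match.
var evaluated = 0
val found = (1 to 1000000).view
  .map { i => evaluated += 1; i * 2 } // lazy: applied element by element
  .exists(_ > 10)
// found == true after only 6 evaluations (2, 4, 6, 8, 10, 12), not 1000000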
As far as overhead is concerned, it really depends on the specifics! Different collections have different implementations of view, toStream, and iterator that may vary in the amount of overhead. If fn is very expensive to compute, this overhead is probably worth it, and keeping a consistent, idiomatic, functional style makes your code more maintainable, debuggable, and readable. If you are in a situation that calls for extreme optimization, you may want to fall back to lower-level constructs like return (which isn't without its own overhead!).

Random as instance of scalaz.Monad

This is a follow-up to my previous question. I wrote a monad (for an exercise) that is actually a function generating random values. However it is not defined as an instance of type class scalaz.Monad.
Now I looked at Rng library and noticed that it defined Rng as scalaz.Monad:
implicit val RngMonad: Monad[Rng] =
  new Monad[Rng] {
    def bind[A, B](a: Rng[A])(f: A => Rng[B]) = a flatMap f
    def point[A](a: => A) = insert(a)
  }
So I wonder how exactly users benefit from that. How can we use the fact that Rng is an instance of the type class scalaz.Monad? Can you give any examples?
Here's a simple example. Suppose I want to pick a random size for a range, and then pick a random index inside that range, and then return both the range and the index. The second computation of a random value clearly depends on the first—I need to know the size of the range in order to pick a value in the range.
This kind of thing is specifically what monadic binding is for—it allows you to write the following:
val rangeAndIndex: Rng[(Range, Int)] = for {
  max <- Rng.positiveint
  index <- Rng.chooseint(0, max)
} yield (0 to max, index)
This wouldn't be possible if we didn't have a Monad instance for Rng.
One of the benefits is that you get a lot of useful methods defined in MonadOps.
For example, Rng.double.iterateUntil(_ < 0.1) will produce only the values that are less than 0.1 (while the values greater than 0.1 will be skipped).
iterateUntil can be used for generation of distribution samples using a rejection method.
E.g. this is the code that creates a beta distribution sample generator:
import com.nicta.rng.Rng
import java.lang.Math
import scalaz.syntax.monad._

object Main extends App {
  def beta(alpha: Double, beta: Double): Rng[Double] = {
    // Purely functional port of Numpy's beta generator:
    // https://github.com/numpy/numpy/blob/31b94e85a99db998bd6156d2b800386973fef3e1/numpy/random/mtrand/distributions.c#L187
    if (alpha <= 1.0 && beta <= 1.0) {
      val rng: Rng[Double] = Rng.double
      val xy: Rng[(Double, Double)] = for {
        u <- rng
        v <- rng
      } yield (Math.pow(u, 1 / alpha), Math.pow(v, 1 / beta))
      xy.iterateUntil { case (x, y) => x + y <= 1.0 }.map { case (x, y) => x / (x + y) }
    } else ???
  }

  val rng: Rng[List[Double]] = beta(0.5, 0.5).fill(10)
  println(rng.run.unsafePerformIO) // Prints 10 samples of the beta distribution
}
Like any interface, declaring an instance of Monad[Rng] does two things: it provides an implementation of the Monad methods under standard names, and it expresses an implicit contract that those method implementations conform to certain laws (in this case, the monad laws).
@Travis gave an example of one thing that's implemented with these interfaces: the Scalaz implementation of map and flatMap. You're right that you could implement these directly; they're "inherited" in Monad (actually a little more complex than that).
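For instance, here is roughly how map comes for free once bind and point exist (a sketch of the derivation, not Scalaz's exact code):
import scalaz._

// Any lawful Monad can derive map from bind and point.
def derivedMap[F[_], A, B](fa: F[A])(f: A => B)(implicit M: Monad[F]): F[B] =
  M.bind(fa)(a => M.point(f(a)))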
For an example of a method that you definitely have to implement some Scalaz interface for, how about sequence? This is a method that turns a List (or more generally a Traversable) of contexts into a single context for a List, e.g.:
val randomlyGeneratedNumbers: List[Rng[Int]] = ...
randomlyGeneratedNumbers.sequence: Rng[List[Int]]
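Since the Rng values above are elided, here is the same shape with a monad we can run directly (Option), purely as a sketch of what sequence does:
import scalaz._
import Scalaz._

val present: List[Option[Int]] = List(Some(1), Some(2), Some(3))
val missing: List[Option[Int]] = List(Some(1), None, Some(3))

present.sequence // Some(List(1, 2, 3)): every context succeeds
missing.sequence // None: one failure collapses the whole result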
But this actually only uses Applicative[Rng] (which is a superclass), not the full power of Monad. I can't actually think of anything that uses Monad directly (there are a few methods on MonadOps, e.g. untilM, but I've never used any of them in anger), but you might want a Bind for a "wrapper" case where you have an "inner" Monad "inside" your Rng things, in which case MonadTrans is useful:
val a: Rng[Reader[Config, Int]] = ...
def f: Int => Rng[Reader[Config, Float]] = ...
// would be a pain to manually implement something to combine a and f

val b: ReaderT[Rng, Config, Int] = ...
val g: Int => ReaderT[Rng, Config, Float] = ...
b >>= g
To be totally honest though, Applicative is probably good enough for most Monad use cases, at least the simpler ones.
Of course all of these methods are things you could implement yourself, but like any library the whole point of Scalaz is that they're already implemented, and under standard names, making it easier for other people to understand your code.

A grasp of immutable datastructures

I am learning Scala, and as a good student I try to obey all the rules I've found.
One rule is: IMMUTABILITY!!!
So I have tried to code everything with immutable data structures and vals, and sometimes this is really hard.
But today I thought to myself: the only important thing is that the object/class should have no mutable state. I am not forced to code all methods in an immutable style, because these methods don't affect each other.
My Question: Am I correct, or are there any problems/disadvantages I don't see?
EDIT:
Code example for aishwarya:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val stateSequence = (0 to order).toList.padTo(sequence.length, order)
  val seqPos = sequence.zipWithIndex
  def probOfSymbAtPos(symb: T, pos: Int): Double = {
    val state = states(stateSequence(pos))
    M.log(state(seqPos.map(_._1).slice(0, pos).takeRight(order), symb))
  }
  val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
  probs.sum
}
Explanation: It is a method to calculate the log-likelihood of a homogeneous Markov model of variable order. The apply method of state takes all previous symbols and the coming symbol and returns the probability of doing so.
As you may see: the whole method is just multiplying some probabilities which would be much easier using vars.
The rule is not really immutability, but referential transparency. It's perfectly OK to use locally declared mutable variables and arrays, because none of the effects are observable to any other parts of the overall program.
The principle of referential transparency (RT) is this:
An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.
Note that if e creates and mutates some local state, it doesn't violate RT since nobody can observe this happening.
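For example (my illustration), this function mutates an array internally yet is referentially transparent, because no mutation escapes:
// Locally mutable, externally pure: callers can never observe the mutation.
def sumOfSquares(n: Int): Long = {
  val buf = new Array[Long](n)
  var i = 0
  while (i < n) {
    buf(i) = (i + 1).toLong * (i + 1)
    i += 1
  }
  buf.sum
}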
That said, I very much doubt that your implementation is any more straightforward with vars.
The case for functional programming is one of being concise in your code and bringing in a more mathematical approach. It can reduce the possibility of bugs and make your code smaller and more readable. As for being easier or not, it does require that you think about your problems differently. But once you get used to thinking with functional patterns it's likely that functional will become easier than the more imperative style.
It is really hard to be perfectly functional and have zero mutable state, but very beneficial to have minimal mutable state. The thing to remember is that everything needs to be done in balance and not to the extreme. By reducing the amount of mutable state you end up making it harder to write code with unintended consequences. A common pattern is to have a mutable variable whose value is immutable. This way identity (the named variable) and value (an immutable object the variable can be assigned) are separate.
var acc: List[Int] = Nil

// lots of complex stuff that adds values
acc ::= 1
acc ::= 2
acc ::= 3

// loop over the current list
acc foreach { i => /* do stuff that mutates acc */ acc ::= i * 10 }

println(acc) // List(10, 20, 30, 3, 2, 1)
The foreach is looping over the value of acc at the time we started the foreach. Any mutations to acc do not affect the loop. This is much safer than the typical iterators in java where the list can change mid iteration.
There is also a concurrency concern. Immutable objects are useful because of the JSR-133 memory model specification, which asserts that the initialization of an object's final members will occur before any thread can have visibility of those members, period! If they are not final then they are "mutable" and there is no guarantee of proper initialization.
Actors are the perfect place to put mutable state. Objects that represent data should be immutable. Take the following example.
object MyActor extends Actor {
  var acc: List[Int] = Nil

  def act() {
    loop {
      react {
        case i: Int => acc ::= i
        case "what is your current value" => reply(acc)
        case _ => // ignore all other messages
      }
    }
  }
}
In this case we can send the value of acc ( which is a List ) and not worry about synchronization because List is immutable aka all of the members of the List object are final. Also because of the immutability we know that no other actor can change the underlying data structure that was sent and thus no other actor can change the mutable state of this actor.
Since Apocalisp has already mentioned the stuff I was going to quote him on, I'll discuss the code. You say it is just multiplying stuff, but I don't see that -- it makes reference to at least three important methods defined outside: order, states and M.log. I can infer that order is an Int, and that states returns a function that takes a List[T] and a T and returns a Double.
There's also some weird stuff going on...
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
sequence is never used except to define seqPos, so why do that?
  val stateSequence = (0 to order).toList.padTo(sequence.length, order)
  val seqPos = sequence.zipWithIndex
  def probOfSymbAtPos(symb: T, pos: Int): Double = {
    val state = states(stateSequence(pos))
    M.log(state(seqPos.map(_._1).slice(0, pos).takeRight(order), symb))
Actually, you could use sequence here instead of seqPos.map( _._1 ), since all that does is undo the zipWithIndex. Also, slice(0, pos) is just take(pos).
  }
  val probs = seqPos.map(i => probOfSymbAtPos(i._1, i._2))
  probs.sum
}
Now, given the missing methods, it is difficult to assert how this should really be written in functional style. Keeping the mystery methods would yield:
def logLikelihood(seq: Iterator[T]): Double = {
  import scala.collection.immutable.Queue
  case class State(index: Int, order: Int, slice: Queue[T], result: Double)
  seq.foldLeft(State(0, 0, Queue.empty[T], 0.0)) {
    case (State(index, ord, slice, result), symb) =>
      val state = states(ord) // the running order, capped at `order` below
      val partial = M.log(state(slice, symb))
      val newSlice = slice enqueue symb
      State(index + 1,
            if (ord == order) ord else ord + 1,
            if (newSlice.size > order) newSlice.dequeue._2 else newSlice,
            result + partial)
  }.result
}
Only I suspect the state/M.log stuff could be made part of State as well. I notice other optimizations now that I have written it like this. The sliding window you are using reminds me, of course, of sliding:
seq.sliding(order).zipWithIndex.map {
  case (slice, index) => M.log(states(index + order)(slice.init, slice.last))
}.sum
That will only start at the orderth element, so some adaptation would be in order. Not too difficult, though. So let's rewrite it again:
def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val slices = (1 until order).map(sequence.take).toList ::: sequence.sliding(order).toList
  slices.zipWithIndex.map {
    case (slice, index) => M.log(states(index)(slice.init, slice.last))
  }.sum
}
I wish I could see M.log and states... I bet I could turn that map into a foldLeft and do away with these two methods. And I suspect the method returned by states could take the whole slice instead of two parameters.
Still... not bad, is it?