This is a followup to this question.
Here's the code I'm trying to understand (it's from http://apocalisp.wordpress.com/2010/10/17/scalaz-tutorial-enumeration-based-io-with-iteratees/):
object io {
sealed trait IO[A] {
def unsafePerformIO: A
}
object IO {
def apply[A](a: => A): IO[A] = new IO[A] {
def unsafePerformIO = a
}
}
implicit val IOMonad = new Monad[IO] {
def pure[A](a: => A): IO[A] = IO(a)
def bind[A,B](a: IO[A], f: A => IO[B]): IO[B] = IO {
implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
}
}
This code is used like this (I'm assuming an import io._ is implied)
def bufferFile(f: File) = IO { new BufferedReader(new FileReader(f)) }
def closeReader(r: Reader) = IO { r.close }
def bracket[A,B,C](init: IO[A], fin: A => IO[B], body: A => IO[C]): IO[C] = for { a <- init
c <- body(a)
_ <- fin(a) } yield c
def enumFile[A](f: File, i: IterV[String, A]): IO[IterV[String, A]] = bracket(bufferFile(f),
closeReader(_:BufferedReader),
enumReader(_:BufferedReader, i))
I'm now trying to understand the implicit val IOMonad definition. Here's how I understand it. This is a scalaz.Monad, so it needs to define pure and bind abstract values of the scalaz.Monad trait.
pure takes a value and turns it into a value contained in the "container" type. For example it could take an Int and return a List[Int]. This seems pretty simple.
bind takes a "container" type and a function that maps the type that the container holds to another type. The value that is returned is the same container type, but it's now holding a new type. An example would be taking a List[Int] and mapping it to a List[String] using a function that maps Ints to Strings. Is bind pretty much the same as map?
The implementation of bind is where I'm stuck. Here's the code:
def bind[A,B](a: IO[A], f: A => IO[B]): IO[B] = IO {
implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
This definition takes IO[A] and maps it to IO[B] using a function that takes an A and returns an IO[B]. I guess to do this, it has to use flatMap to "flatten" the result (correct?).
The = IO { ... } is the same as
= new IO[A] {
def unsafePerformIO = implicitly[Monad[Function0]].bind(() => a.unsafePerformIO,
(x:A) => () => f(x).unsafePerformIO)()
}
}
I think?
the implicitly method looks for an implicit value (value, right?) that implements Monad[Function0]. Where does this implicit definition come from? I'm guessing this is from the implicit val IOMonad = new Monad[IO] {...} definition, but we're inside that definition right now and things get a little circular and my brain starts to get stuck in an infinite loop :)
Also, the first argument to bind (() => a.unsafePerformIO) seems to be a function that takes no parameters and returns a.unsafePerformIO. How should I read this? bind takes a container type as its first argument, so maybe () => a.unsafePerformIO resolves to a container type?
IO[A] is intended to represent an Action returning an A, where the result of the Action may depend on the environment (meaning anything, values of variables, file system, system time...) and the execution of the action may also modify the environment. Actually, scala type for an Action would be Function0. Function0[A] returns an A when called and it is certainly allowed to depend on and modify the environment. IO is Function0 under another name, but it is intended to distinguish (tag?) those Function0 which depends on the environment from the other ones, which are actually pure value (if you say f is a function[A] which always returns the same value, without any side effect, there is no much difference between f and its result). To be precise, it is not so much that function tagged as IO must have side effect. It is that those not so tagged must have none. Note however than wrapping impure functions in IO is entirely voluntary, there is no way you will have a guarantee when you get a Function0 that it is pure. Using IO is certainly not the dominant style in scala.
pure takes a value and turns it into a value contained in the
"container" type.
Quite right, but "container" may mean quite a lot of things. And the one returned by pure must be as light as possible, it must be the one that makes no difference. The point of list is that they may have any number of values. The one returned by pure must have one. The point of IO is that it depends on and affect the environment. The one returned by pure must do no such thing. So it is actually the pure Function0 () => a, wrapped in IO.
bind pretty much the same as map
Not so, bind is the same as flatMap. As you write, map would receive a function from Int to String, but here you have the function from Int to List[String]
Now, forget IO for a moment and consider what bind/flatMap would mean for an Action, that is for Function0.
Let's have
val askUserForLineNumber: () => Int = {...}
val readingLineAt: Int => Function0[String] = {i: Int => () => ...}
Now if we must combine, as bind/flatMap does, those items to get an action that returns a String, what it must be is pretty clear: ask the reader for the line number, read that line and returns it. That would be
val askForLineNumberAndReadIt= () => {
val lineNumber : Int = askUserForLineNumber()
val readingRequiredLine: Function0[String] = readingLineAt(line)
val lineContent= readingRequiredLine()
lineContent
}
More generically
def bind[A,B](a: Function0[A], f: A => Function0[B]) = () => {
val value = a()
val nextAction = f(value)
val result = nextAction()
result
}
And shorter:
def bind[A,B](a: Function0[A], f: A => Function0[B])
= () => {f(a())()}
So we know what bind must be for Function0, pure is clear too. We can do
object ActionMonad extends Monad[Function0] {
def pure[A](a: => A) = () => a
def bind[A,B](a: () => A, f: A => Function0[B]) = () => f(a())()
}
Now, IO is Function0 in disguise. Instead of just doing a(), we must do a.unsafePerformIO. And to define one, instead of () => body, we write IO {body}
So there could be
object IOMonad extends Monad[IO] {
def pure[A](a: => A) = IO {a}
def bind[A,B](a: IO[A], f: A => IO[B]) = IO {f(a.unsafePerformIO).unsafePerformIO}
}
In my view, that would be good enough. But in fact it repeats the ActionMonad. The point in the code you refer to is to avoid that and reuse what is done for Function0 instead. One goes easily from IO to Function0 (with () => io.unsafePerformIo) as well as from Function0 to IO (with IO { action() }). If you have f: A => IO[B], you can also change that to f: A => Function0[B], just by composing with the IO to Function0 transform, so (x: A) => f(x).unsafePerformIO.
What happens here in the bind of IO is:
() => a.unsafePerformIO: turn a into a Function0
(x:A) => () => f(x).unsafePerformIO): turn f into an A => Function0[B]
implicitly[Monad[Function0]]: get the default monad for Function0, the very same as the ActionMonad above
bind(...): apply the bind of the Function0 monad to the arguments a and f that have just been converted to Function0
The enclosing IO{...}: convert the result back to IO.
(Not sure I like it much)
Related
Background
I have been reading the book Functional Programming in Scala, and have some questions regarding the content in Chapter 7: Purely functional parallelism.
Here is the code for the answers in the book: Par.scala, but I am confused about certain part of it.
Here is the first part of the code of Par.scala, which stands for Parallelism:
import java.util.concurrent._
object Par {
type Par[A] = ExecutorService => Future[A]
def unit[A](a: A): Par[A] = (es: ExecutorService) => UnitFuture(a)
private case class UnitFuture[A](get: A) extends Future[A] {
def isDone = true
def get(timeout: Long, units: TimeUnit): A = get
def isCancelled = false
def cancel(evenIfRunning: Boolean): Boolean = false
}
def map2[A, B, C](a: Par[A], b: Par[B])(f: (A, B) => C): Par[C] =
(es: ExecutorService) => {
val af = a(es)
val bf = b(es)
UnitFuture(f(af.get, bf.get))
}
def fork[A](a: => Par[A]): Par[A] =
(es: ExecutorService) => es.submit(new Callable[A] {
def call: A = a(es).get
})
def lazyUnit[A](a: => A): Par[A] =
fork(unit(a))
def run[A](es: ExecutorService)(a: Par[A]): Future[A] = a(es)
def asyncF[A, B](f: A => B): A => Par[B] =
a => lazyUnit(f(a))
def map[A, B](pa: Par[A])(f: A => B): Par[B] =
map2(pa, unit(()))((a, _) => f(a))
}
The simplest possible model for Par[A] might be ExecutorService => Future[A], and run simply returns the Future.
unit promotes a constant value to a parallel computation by returning a UnitFuture, which is a simple implementation of Future that just wraps a constant value.
map2 combines the results of two parallel computations with a binary function.
fork marks a computation for concurrent evaluation. The evaluation won’t actually occur until forced by run. Here is with its simplest and most natural implementation of it. Even though it has its problems, let's first put them aside.
lazyUnit wraps its unevaluated argument in a Par and marks it for concurrent evaluation.
run extracts a value from a Par by actually performing the computation.
asyncF converts any function A => B to one that evaluates its result asynchronously.
Questions
The fork is the function confuses me a lot here, because it takes a lazy argument, which will be evaluated later when it is called. Then my questions are more about when we should use this fork, i.e., when we need lazy-evaluation and when we need to have the value directly.
Here is an exercise from the book:
EXERCISE 7.5
Hard: Write this function, called sequence. No additional primitives are required. Do not call run.
def sequence[A](ps: List[Par[A]]): Par[List[A]]
And here is the answers (offered here).
First
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
l.foldRight[Par[List[A]]](unit(List()))((h, t) => map2(h, t)(_ :: _))
What is the different between above code and the following:
def sequence_simple[A](l: List[Par[A]]): Par[List[A]] =
l.foldLeft[Par[List[A]]](unit(List()))((t, h) => map2(h, t)(_ :: _))
Additionally
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] =
as match {
case Nil => unit(Nil)
case h :: t => map2(h, fork(sequenceRight(t)))(_ :: _)
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] = fork {
if (as.isEmpty) unit(Vector())
else if (as.length == 1) map(as.head)(a => Vector(a))
else {
val (l,r) = as.splitAt(as.length/2)
map2(sequenceBalanced(l), sequenceBalanced(r))(_ ++ _)
}
}
In sequenceRight, fork is used when recursive function is directly called. However, in sequenceBalanced, fork is used outside of the whole function body.
Then, what is the differences or above code and the following (where we switched the places of fork):
def sequenceRight[A](as: List[Par[A]]): Par[List[A]] = fork {
as match {
case Nil => unit(Nil)
case h :: t => map2(h, sequenceRight(t))(_ :: _)
}
}
def sequenceBalanced[A](as: IndexedSeq[Par[A]]): Par[IndexedSeq[A]] =
if (as.isEmpty) unit(Vector())
else if (as.length == 1) map(as.head)(a => Vector(a))
else {
val (l,r) = as.splitAt(as.length/2)
map2(fork(sequenceBalanced(l)), fork(sequenceBalanced(r)))(_ ++ _)
}
Finally, given the sequence defined above, we have the following function:
def parMap[A,B](ps: List[A])(f: A => B): Par[List[B]] = fork {
val fbs: List[Par[B]] = ps.map(asyncF(f))
sequence(fbs)
}
I would like to know, can I also implement the function in the following way, which is by applying the lazyUnit defined in the beginning? Is this implementation lazyUnit(ps.map(f)) lazy?
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
lazyUnit(ps.map(f))
I did not completely understand your doubt. But I see a major problem with the following solution,
def parMapByLazyUnit[A, B](ps: List[A])(f: A => B): Par[List[B]] =
lazyUnit(ps.map(f))
To understand the problem lets look at def lazyUnit,
def fork[A](a: => Par[A]): Par[A] =
(es: ExecutorService) => es.submit(new Callable[A] {
def call: A = a(es).get
})
def lazyUnit[A](a: => A): Par[A] =
fork(unit(a))
So... lazyUnit takes an expression of type => A and submits it to ExecutorService to get evaluated. And returns the wrapped result of this parallel computation as Par[A].
In parMap for every element of ps: List[A], we not only have to evaluate the corresponding mapping using the function f: A => B but we have to do these evaluations in parallel.
But our solution lazyUnit(ps.map(f)) will submit the whole { ps.map(f) } evaluation as a single task to our ExecutionService. Which means we are not doing it in parallel.
What we need to do is make sure that for each element a in ps: [A], the function f: A => B is executed as a separate task for our ExecutorService.
Now, as we learned from our implementation is that we can run an expression of type exp: => A by using lazyUnit(exp) to get a result: Par[A].
So, we will do exactly that for every a: A in ps: List[A],
val parMappedTmp = ps.map( a => lazyUnit(f(a) ) )
// or
val parMappedTmp = ps.map( a => asyncF(f)(a) )
// or
val parMappedTmp = ps.map(asyncF(f))
But, Now our parMappedTmp is a List[Par[B]] and whereas we needed a Par[List[B]]
So, you will need a function with the following signature to get what you wanted,
def sequence[A](ps: List[Par[A]]): Par[List[A]]
Once you have it,
val parMapped = sequence(parMappedTmp)
I'm trying to understand particular use of underscore in Scala. And following piece of code I cannot understand
class Test[T, S] {
def f1(f: T => S): Unit = f2(_ map f)
def f2(f: Try[T] => Try[S]): Unit = {}
}
How is the _ treated in this case? How is the T=>S becomes Try[T]=>Try[S]?
It seems you are reading it wrongly. Look at the type of f2(Try[T] => Try[S]):Unit.
Then looking into f1 we have f: T => S.
The _ in value position desugars to f2(g => g map f).
Let's see what we know so far:
f2(Try[T] => Try[S]):Unit
f: T => S
f2(g => g map f)
Give 1. and 3. we can infer that the type of g has to be Try[T]. map over Try[T] takes T => Something, in case f which is T => S, in which case Something is S.
It may seem a bit hard to read now, but once you learn to distinguish between type and value position readin this type of code becomes trivial.
Another thing to notice def f2(f: Try[T] => Try[S]): Unit = {} is quite uninteresting and may be a bit detrimental in solving your particular question.
I'd try to solve this like that: first forget the class you created. Now implement this (replace the ??? with a useful implementation):
object P1 {
def fmap[A, B](A => B): Try[A] => Try[B] = ???
}
For bonus points use the _ as the first char in your implementation.
I am trying to write a partially applied function. I thought the below would work but it doesn't. Grateful for any help.
scala> def doSth(f: => Unit) { f }
doSth: (f: => Unit)Unit
scala> def sth() = { println ("Hi there") }
sth: ()Unit
scala> doSth(sth)
Hi there
scala> val b = sth _
b: () => Unit = <function0>
scala> doSth(b)
<console>:11: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses
doSth(b)
^
Thanks!
The difference is subtle. sth is a method, so you can call it without the parentheses, which is what is happening here:
doSth(sth)
But b is just a function () => Unit, which you must use the parentheses to invoke.
doSth(b())
Otherwise you would not be able to assign b to another identifier.
val c: () => Unit = b
If we automatically invoked b here, c would be Unit instead of () => Unit. The ambiguity must be removed.
Let me also clarify that doSth is not a method that accepts only functions. f: => Unit means it accepts anything that evaluates to Unit, which includes methods that return Unit when they are invoked. doSth(sth) is not passing the message sth to doSth, it's invoking sth without parentheses and passing the result.
I think the difference is that b has type () => Unit, and doSth expects an expression that returns Unit (minor difference). If you want to store sth as a variable without the side effects, you could do:
lazy val b = sth
doSth(b)
The other option would be to make it so that doSth takes in () => Unit:
def doSth(f: () => Unit) { f() }
def sth() = { println ("Hi there") }
doSth(sth)
val b = sth _
doSth(b)
I am going through the book "Functional Programming in Scala" and have run across an example that I don't fully understand.
In the chapter on strictness/laziness the authors describe the construction of Streams and have code like this:
sealed trait Stream[+A]
case object Empty extends Stream[Nothing]
case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
object Stream {
def cons[A](hd: => A, tl: => Stream[A]) : Stream[A] = {
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail)
}
...
}
The question I have is in the smart constructor (cons) where it calls the constructor for the Cons case class. The specific syntax being used to pass the head and tail vals doesn't make sense to me. Why not just call the constructor like this:
Cons(head, tail)
As I understand the syntax used it is forcing the creation of two Function0 objects that simply return the head and tail vals. How is that different from just passing head and tail (without the () => prefix) since the Cons case class is already defined to take these parameters by-name anyway? Isn't this redundant? Or have I missed something?
The difference is in => A not being equal to () => A.
The former is pass by name, and the latter is a function that takes no parameters and returns an A.
You can test this out in the Scala REPL.
scala> def test(x: => Int): () => Int = x
<console>:9: error: type mismatch;
found : Int
required: () => Int
def test(x: => Int): () => Int = x
^
Simply referencing x in my sample causes the parameter to be invoked. In your sample, it's constructing a method which defers invocation of x.
First, you are assuming that => A and () => A are the same. However, they are not. For example, the => A can only be used in the context of passing parameters by-name - it is impossible to declare a val of type => A. As case class parameters are always vals (unless explicitly declared vars), it is clear why case class Cons[+A](h: => A, t: => Stream[A]) would not work.
Second, just wrapping a by-name parameter into a function with an empty parameter list is not the same as what the code above accomplishes: using lazy vals, it is ensured that both hd and tl are evaluated at most once. If the code read
Cons(() => hd, () => tl)
the original hd would be evaluated every time the h method (field) of a Cons object is invoked. Using a lazy val, hd is evaluated only the first time the h method of this Cons object is invoked, and the same value is returned in every subsequent invocation.
Demonstrating the difference in a stripped-down fashion in the REPL:
> def foo = { println("evaluating foo"); "foo" }
> val direct : () => String = () => foo
> direct()
evaluating foo
res6: String = foo
> direct()
evaluating foo
res7: String = foo
> val lzy : () => String = { lazy val v = foo; () => v }
> lzy()
evaluating foo
res8: String = foo
> lzy()
res9: String = foo
Note how the "evaluating foo" output in the second invocation of lzy() is gone, as opposed to the second invocation of direct().
Note that the parameters of the method cons are by-name parameters (hd and tl). That means that if you call cons, the arguments will not be evaluated before you call cons; they will be evaluated later, at the moment you use them inside cons.
Note that the Cons constructor takes two functions of type Unit => A, but not as by-name parameters. So these will be evaluated before you call the constructor.
If you do Cons(head, tail) then head and tail will be evaluated, which means hd and tl will be evaluated.
But the whole point here was to avoid calling hd and tl until necessary (when someone accesses h or t in the Cons object). So, you pass two anonymous functions to the Cons constructor; these functions will not be called until someone accesses h or t.
In def cons[A](hd: => A, tl: => Stream[A]) : Stream[A]
the type of hd is A, tl is Stream[A]
whereas in case class Cons[+A](h: () => A, t: () => Stream[A]) extends Stream[A]
h is of type Function0[A] and t of type Function0[Stream[A]]
given the type of hd is A, the smart constructor invokes the case class as
lazy val head = hd
lazy val tail = tl
Cons(() => head, () => tail) //it creates a function closure so that head is accessible within Cons for lazy evaluation
I'd like to be able to pass in callback functions as parameters to a method. Right now, I can pass in a function of signature () => Unit, as in
def doSomething(fn:() => Unit) {
//... do something
fn()
}
which is fine, I suppose, but I'd like to be able to pass in any function with any parameters and any return type.
Is there a syntax to do that?
Thanks
To be able to execute a function passed as a callback you have to be able to call it with the arguments it requires. f: A => R must be called as f(someA), and g: (A,B) => R must be called as f(someA, someB).
Implementing a callback you want a function that accepts a context and returns something (you don't care what the something is). For example foreach in scala.Option is defined as:
def foreach[U](f: (A) => U): Unit
The function is called with a value of type A but the result is discarded so we don't care what type it has. Alternatively if you don't care what was completed the callback function could be implemented as:
def onComplete[U](f: () => U): Unit
It would then allow being called with any function that takes no arguments:
val f: () => Int = ...
val g: () => String = ...
onComplete(f) // In result is discarded
onComplete(g) // String result is discarded
If you really wanted a function that accepted any possible function there are some tricks you could use (but you really shouldn't). For example you could define a view bound with implicit conversions:
// Type F is any type that can be converted to () => U
def onComplete[F <% () => U, U](f: F): Unit
// Define conversions for each Function type to Function0
implicit def f1tof0[A,U](f: A => U): () => U = ...
implicit def f2tof0[A,B,U](f: (A,B) => U): () => U = ...
implicit def f3tof0[A,B,C,U](f: (A,B,C) => U): () => U = ...
.
.
etc.
Now onComplete accepts any function:
val f: Int => String = ...
val g: (List[Int], String) => String = ...
onComplete(f)
onComplete(g)
Defining the types for the above conversions is relatively simple, but there is no rational way to implement them so is pretty much entirely useless.