I have read a blog post about the Reader monad.
The post is truly great and explains the topic in detail, but I did not get why I should use the Reader monad in that case.
The post says: Suppose there is a function query: String => Connection => ResultSet
def query(sql: String) = (conn: Connection) => conn.createStatement.executeQuery(sql)
We can run a few queries as follows:
def doSomeQueries(conn: Connection) = {
val rs1 = query("SELECT COUNT(*) FROM Foo")(conn)
val rs2 = query("SELECT COUNT(*) FROM Bar")(conn)
rs1.getInt(1) + rs2.getInt(1)
}
So far so good, but the post suggests using the Reader monad instead:
class Reader[E, A](val run: E => A) {
  def map[B](f: A => B): Reader[E, B] =
    new Reader(e => f(run(e)))
  def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
    new Reader(e => f(run(e)).run(e))
}
def query(sql: String): Reader[Connection, ResultSet] =
  new Reader(conn => conn.createStatement.executeQuery(sql))
def doSomeQueries(conn: Connection) = for {
rs1 <- query("SELECT COUNT(*) FROM Foo")
rs2 <- query("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)
Ok, I get that I don't need to thread the connection through the calls explicitly. So what?
Why is the solution with the Reader monad better than the previous one?
UPDATE: Fixed the typo in def query: = should be =>
The most important reason is that the reader monad allows you to build up complex computations compositionally. Consider the following line from your non-reader example:
val rs1 = query("SELECT COUNT(*) FROM Foo")(conn)
The fact that we're passing in conn manually means that this line doesn't really make sense on its own—it can only be understood and reasoned about in the context of the doSomeQueries method that gives us conn.
Often this is just fine—there's obviously nothing wrong about defining and using local variables (at least in the val sense). Sometimes, though, it's more convenient (or desirable for other reasons) to build up computations out of stand-alone, composable pieces, and the reader monad can help with this.
Consider query("SELECT COUNT(*) FROM Foo") in your second example. Assuming we know what query is, this is an entirely self-contained expression—there are no variables like conn that need to be bound by some enclosing scope. This means you can reuse and refactor more confidently, and that you don't have quite so much stuff to hold in your head when you're reasoning about it.
Again, this isn't ever necessary—it's largely a matter of style. If you decide to give it a try (and I'd suggest that you do), you'll probably pretty quickly develop preferences and intuitions about where it makes your code more intelligible and where it doesn't.
One other advantage is that you can compose different kinds of "effects" using ReaderT (or by adding Reader into some other stack). That set of issues probably deserves its own question and answer, though.
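To give a rough flavour anyway, here is a minimal sketch of my own (not from the original answer) using cats' ReaderT; queryT is a hypothetical variant of query, and wrapping executeQuery in Try is an assumption about how you'd want failures handled:
import java.sql.{Connection, ResultSet}
import scala.util.Try
import cats.data.ReaderT
import cats.implicits._

// Reader stacked on Try: "given a Connection, produce a ResultSet or fail"
def queryT(sql: String): ReaderT[Try, Connection, ResultSet] =
  ReaderT((conn: Connection) => Try(conn.createStatement.executeQuery(sql)))

def doSomeQueriesT: ReaderT[Try, Connection, Int] = for {
  rs1 <- queryT("SELECT COUNT(*) FROM Foo")
  rs2 <- queryT("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)

// doSomeQueriesT.run(conn) yields a Try[Int]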
One last note: you probably want your doSomeQueries to look like this:
def doSomeQueries: Reader[Connection, Int] = for {
rs1 <- query("SELECT COUNT(*) FROM Foo")
rs2 <- query("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)
Or, if this really is the end of the line:
def doSomeQueries(conn: Connection) = (
for {
rs1 <- query("SELECT COUNT(*) FROM Foo")
rs2 <- query("SELECT COUNT(*) FROM Bar")
} yield rs1.getInt(1) + rs2.getInt(1)
).run(conn)
In your current version you're not actually using conn.
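(As an aside, the for-comprehension above is just sugar for the map and flatMap defined on Reader, roughly:)
def doSomeQueries: Reader[Connection, Int] =
  query("SELECT COUNT(*) FROM Foo").flatMap { rs1 =>
    query("SELECT COUNT(*) FROM Bar").map { rs2 =>
      rs1.getInt(1) + rs2.getInt(1)
    }
  }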
For the general benefits of using the Reader monad I recommend Travis Brown's excellent answer - its strength lies in compositionality and the other extras provided by monads (e.g. ReaderT et al). You get the most benefit out of it if you write your other code in monadic style too.
You've also asked specifically what's so desirable in not having to pass the connection around explicitly. I'll try to answer this part of your question here.
First, having a few words less to type / read is already an improvement. The more complex the whole codebase is, the more I appreciate that. When I read a long method (not written by me, of course ;) ) I find it easier when its logic isn't interwoven with dumb argument passing.
Second, the Reader monad gives you a guarantee that the connection is the same object all the way down. Most often you want exactly that. In your first example it's very easy to call
query("SELECT COUNT(*) FROM Bar")(anotherConnectionWhereverItCameFrom)
regardless of whether it's done on purpose or by mistake. When I read a long method and see the Reader monad used, I know that only one connection will be used. No nasty surprises caused by some "tactical solution" in the 219th line of the method.
Note that those benefits can also be achieved without the Reader monad, even if it does a good job in that area. You could for example just write:
class Query(val connection: Connection) {
def apply(sql:String) = connection.createStatement.executeQuery(sql)
}
def doSomeQueries(query: Query) = {
val rs1 = query("SELECT COUNT(*) FROM Foo")
val rs2 = query("SELECT COUNT(*) FROM Bar")
rs1.getInt(1) + rs2.getInt(1)
}
doSomeQueries(new Query(connection))
It would have neither composability nor the other nice features of monads, but it would achieve the Reader monad's goal of not passing the argument (the connection) explicitly.
I want to flatMap a Try[Option[A]] using some function that uses the value inside the Option to create another Try, and I want the solution to be simple and idiomatic. I have illustrated the problem with an example. The goal is to create an Option[Group] with members and meetings, wrapped in a single Try that can contain errors from any of the three functions.
def getGroup(id: Long): Try[Option[Group]]
def getMembersForGroup(groupId: Long): Try[Seq[Member]]
def getMeetingsForGroup(groupId: Long): Try[Seq[Meeting]]
I find it difficult to flatMap from the Try returned by getGroup to the Try from the member- and meeting-functions because there's an Option "in the way". This is what I have come up with so far:
getGroup(id).flatMap(
  groupOpt => groupOpt.map(
    group => addStuff(group).map(group => Some(group))
  ).getOrElse(Success(None))
)
def addStuff(g: Group): Try[Group] =
for {
members <- getMembersForGroup(g.id)
meetings <- getMeetingsForGroup(g.id)
} yield g.copy(members = members, meetings = meetings)
What I don't like about my solution is that I have to wrap the group returned by addStuff in an Option to perform the getOrElse. At this point the type is Option[Try[Option[Group]]] which I think makes the solution difficult to understand at first glance.
Is there a simpler solution to this problem?
Cats has an OptionT type that might simplify this (see its documentation and source).
Your example would be:
def getGroupWithStuff(id: Long): OptionT[Try, Group] = {
for {
g <- OptionT(getGroup(id))
members <- OptionT.liftF(getMembersForGroup(g.id))
meetings <- OptionT.liftF(getMeetingsForGroup(g.id))
} yield g.copy(members = members, meetings = meetings)
}
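If you need the plain Try[Option[Group]] back at the edge of your program, you can unwrap the transformer with .value (someId below is just a placeholder):
val result: Try[Option[Group]] = getGroupWithStuff(someId).value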
You could use .fold instead of .map.getOrElse ... That makes it a little bit nicer:
getGroup(id)
  .flatMap {
    _.fold(Try(Option.empty[Group])) {
      addStuff(_).map(Option.apply)
    }
  }
or write the two cases explicitly - that may look a little clearer in this case, because you can avoid having to spell out the ugly looking type signature:
getGroup(id).flatMap {
case None => Success(None)
case Some(group) => addStuff(group).map(Option.apply)
}
You probably could simplify your getGroup call to:
getGroup(id).map(
groupOpt => groupOpt.flatMap(
group => addStuff(group).toOption
)
)
However, that would come at the cost of ignoring potential failure info from the addStuff call. If that is not acceptable, then it is unlikely you can simplify your code much further.
Try this. You get to keep your for comprehension syntax as well as Failure information from any of the three calls (whichever fails first).
def getFullGroup(id: Long): Try[Option[Group]] =
  getGroup(id).flatMap[Option[Group]] {
    _.map[Try[Group]] { group =>
      for {
        meetings <- getMeetingsForGroup(id)
        members <- getMembersForGroup(id)
      } yield group.copy(meetings = meetings, members = members)
    }.fold[Try[Option[Group]]](Success(None))(_.map(Some(_)))
  }
Note the type acrobatics at the end:
fold[Try[Option[Group]]](Success(None))(_.map(Some(_)))
It's hard to get right without type annotations and an IDE. In this particular case that's not too bad, but imagine meetings and members depended on another nested Try/Option which in turn depended on the original. Or imagine you wanted to do a comprehension on individual Meetings and Groups rather than using the entire list.
You can try using an OptionT monad transformer from cats or scalaz to stack Try[Option[Group]] into a non-nested OptionT[Try, Group]. If you use a monad transformer, it can look like this:
def getFullGroup(id: Long): OptionT[Try, Group] =
  OptionT(getGroup(id)).flatMapF { group =>
    for {
      meetings <- getMeetingsForGroup(id)
      members <- getMembersForGroup(id)
    } yield group.copy(meetings = meetings, members = members)
  }
For this particular case, there's not really much gain. But do look into it if you have a lot of this kind of code.
By the way, the boilerplate at the end of the first example that flips the Try and Option is called a sequence. When it follows a map, the whole thing is called traverse. It's a pattern that comes up often and is abstracted away by functional programming libraries. Instead of using OptionT, you can do something like:
def getFullGroup(id: Long): Try[Option[Group]] =
  getGroup(id).flatMap[Option[Group]] {
    _.traverse { group =>
      for {
        meetings <- getMeetingsForGroup(id)
        members <- getMembersForGroup(id)
      } yield group.copy(meetings = meetings, members = members)
    }
  }
(Generally, if you're mapping f then flipping monads, you want to traverse with f.)
I am trying to figure out the Scala way to implement something I would do all the time in Java.
In Java I would have snarf_image (below) return null if it meets the if condition, otherwise return the bArray.
What is the Scala way to do this? This code doesn't even compile and I can't figure out the right way to do it - I'm sure my thinking is off.
def snarf_image ( sUrl : String ) : Array[Byte] = {
val bis = new BufferedInputStream(new URL(sUrl.replace(" ", "%20")).openStream())
val bArray = Stream.continually(bis.read).takeWhile(-1 !=).map(_.toByte).toArray
val img = ImageProcessing.ArrayToImage(bArray)
if ( img.getHeight < 100 || img.getWidth < 100 ) {
Empty
} else {
bArray
}
}
For the record I am using Lift (hence using Empty), but I'm pretty sure this is more of a Scala question.
You can sometimes use Option for cases where you want to return null (in Java).
I haven't compiled it, but it should work.
def snarf_image ( sUrl : String ) : Option[Array[Byte]] = {
val bis = new BufferedInputStream(new URL(sUrl.replace(" ", "%20")).openStream())
val bArray = Stream.continually(bis.read).takeWhile(-1 !=).map(_.toByte).toArray
val img = ImageProcessing.ArrayToImage(bArray)
if ( img.getHeight < 100 || img.getWidth < 100 ) {
None
} else {
Some(bArray)
}
}
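A caller would then handle both cases explicitly, for example (the URL is only a placeholder):
snarf_image("http://example.com/picture.jpg") match {
  case Some(bytes) => println(s"read ${bytes.length} bytes")
  case None        => println("image smaller than 100x100, skipping")
}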
TL;DR: there are many options to use: Option, Box, Either, Try and even Future (and more can be found in different libraries); most likely you will be fine with Option.
First of all you can use empty collections instead of null -- it is common advice not only for Scala, but for many other languages including Java and C#:
def snarf_image ( sUrl : String ) : Array[Byte] = {
// ...
if ( img.getHeight < 100 || img.getWidth < 100 ) {
Array.empty[Byte]
} else {
bArray
}
}
Of course this might not save you from surprises, as you may accidentally forget to check the collection for emptiness, but it is more likely that you will get a huge surprise with null.
See also Is it better to return null or empty collection?
Next, comes Option.
Option
Use option when you either have something (Some) or nothing (None), and you don't care why (there is only one reason or it does not matter).
Option is not a Scala invention (I've seen it in ML languages; in Haskell it is known as Maybe; it even comes to the standard library in Java 8, and is available in earlier Java versions as part of Guava).
It is used extensively in the standard library and you will likely see it in many third-party libraries. The canonical example is retrieval from a Map when there is no such key -- Option emphasizes the fact that the Map might not contain that key, so you have to either deal with the possibility of a missing key or propagate it further.
val xys = Map(1 -> "foo", 2 -> "bar")
xys.get(3)
// Option[String] = None
xys.get(1)
// Option[String] = Some(foo)
xys.get(2).toUpperCase
// cannot operate directly on Option -- has to unwrap it first:
// error: value toUpperCase is not a member of Option[String]
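One way to handle it, continuing the example above (a small sketch of my own):
xys.get(2).map(_.toUpperCase)
// Option[String] = Some(BAR)
xys.get(3).map(_.toUpperCase)
// Option[String] = None
xys.get(3).map(_.toUpperCase).getOrElse("<missing>")
// String = <missing>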
Either
Use Either when you want to know why it failed, when you have two possibilities (not only success/failure), or when you use an older version of Scala (prior to 2.10).
It has much the same roots as Option and is used not only in Scala but in some other languages as well (e.g. the already mentioned Haskell). Besides the doubtful distinction that Either is not a monad in Scala, the main difference is that you don't have an empty element: you have either a Right or a Left. It is way less commonly used than Option because it is not instantly clear what should be Right and what should be Left, but the common convention is to use Right for the successful value and Left for the faulty path (usually a Throwable explaining what went wrong). Either has mostly been ousted by Try, which originally came from Twitter (one of the biggest and earliest Scala adopters out there).
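A minimal sketch of my own (parsePort is a hypothetical name, not from the answer):
def parsePort(s: String): Either[String, Int] =
  try Right(s.toInt)
  catch { case _: NumberFormatException => Left(s"'$s' is not a valid port") }

parsePort("8080")
// Either[String, Int] = Right(8080)
parsePort("oops")
// Either[String, Int] = Left('oops' is not a valid port)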
Try
Use Try for a successful value / failure reason in Scala 2.10+ (or 2.9.3, since it was backported).
It is a little bit more convenient since the cases are named Success and Failure.
The example from official doc (which is nice by the way):
import scala.util.{Try, Success, Failure}

def divide: Try[Int] = {
  val dividend = Try(Console.readLine("Enter an Int that you'd like to divide:\n").toInt)
  val divisor = Try(Console.readLine("Enter an Int that you'd like to divide by:\n").toInt)
  val problem = dividend.flatMap(x => divisor.map(y => x/y))
  problem match {
    case Success(v) =>
      println("Result of " + dividend.get + "/"+ divisor.get +" is: " + v)
      Success(v)
    case Failure(e) =>
      println("You must've divided by zero or entered something that's not an Int. Try again!")
      println("Info from the exception: " + e.getMessage)
      divide
  }
}
Future
Use Future when your option is split over the space-time continuum: right now it is empty, but some time later it will be fulfilled (but not vice versa -- a future is a process of fulfillment).
Future is new in Scala, just like Try, and just like Try it made its way over from Twitter's util library. It is used in concurrent code where you want to send some work to the background and either await the result some time later or invoke a callback on completion. Try serves as the definite result: the future either completed successfully or ended abnormally (caught an exception).
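A minimal sketch (mine, not from the original answer; expensiveComputation stands in for whatever work you push to the background):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Success, Failure}

// the value is not there yet, but will be (or the future fails with the exception)
val answer: Future[Int] = Future { expensiveComputation() }

answer.onComplete {
  case Success(n) => println("got " + n)
  case Failure(e) => println("failed: " + e.getMessage)
}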
Note that there are many implementations of Future (Scalaz, Unfiltered, Twitter, Akka) -- they will likely be unified with scala.concurrent.Future.
For further reading, there is a brilliant overview of Scala Futures/Promises.
Third-party
Box should be used in the Lift ecosystem and is basically an Option on steroids which has Empty/Full paths.
Or and Every. Or mimics Either and Box; Every is used to collect many errors.
Validation, used within the Scalaz library.
Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
)() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to only apply rowParser on rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query, if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser on that.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
  def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
    case None => Success(Stream.empty[A])
    case Some(Success(a)) => {
      val s: Stream[A] = a #:: rows.tail.flatMap(r => p(r) match {
        case Success(r) => Some(r)
        case _ => None
      })
      Success(s)
    }
    case Some(Error(msg)) => Error(msg)
  }
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
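With that in place, the question's query can presumably be written just like the list version, swapping in the new parser:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
  SQL("select * from photo where album = {album}").on(
    'album -> album.id
  ).as(stream(rowParser))
}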
You're better off making smaller (paged) queries using limit and offset.
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly at the JDBC level, you could wrap a Stream around limited and offset-ed execute statements and just make multiple calls to the database behind the scenes, but this sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, JRuby, etc.), and then get it approved for the JDBC 5 or 6 roadmap. This idea will probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I ran into a similar situation but hit a stack overflow error when the built-in Anorm function to convert to Streams attempted to parse the result set.
In order to get around this I elected to abandon the Anorm ResultSetParser paradigm and fall back to the java.sql.ResultSet object.
I wanted to use Anorm's internal classes for parsing result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package, and have deprecated several other methods that would have been more straightforward to use.
I used a combination of Promises and Futures to work around the ManagedResource that Anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._

def SqlStream[T](sql: SqlQuery)(parse: ResultSet => T)(implicit ec: ExecutionContext): Future[Stream[T]] = {
  val conn = db.getConnection()
  val mr = sql.preparedStatement(conn, false)
  val p = Promise[Unit]()       // completed once the stream has been fully consumed
  val p2 = Promise[ResultSet]() // completed as soon as the result set is available
  Future {
    mr.map({ stmt =>
      p2.success(stmt.executeQuery)
      // hold the managed statement open until the stream has been exhausted
      Await.ready(p.future, duration.Duration.Inf)
    }).acquireAndGet(identity).andThen { case _ => conn.close() }
  }
  def _stream(rs: ResultSet): Stream[T] = {
    if (rs.next()) parse(rs) #:: _stream(rs)
    else {
      p.success(())
      Stream.empty
    }
  }
  p2.future.map { rs =>
    rs.beforeFirst()
    _stream(rs)
  }
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec:ExecutionContext):Future[Stream[String]] = {
SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach, however, this got around my problem and did not require inclusion of any other libraries.
I found a similar question but it has what seems to be a simpler case, where the expensive operation is always the same. In my case, I want to collect a set of results of some expensive API calls that I'd like to execute in parallel.
Say I have:
def apiRequest1(q: Query): Option[Result]
def apiRequest2(q: Query): Option[Result]
where q is the same value.
I'd like a List[Result] or similar (obviously List[Option[Result]] is fine) and I'd like the two expensive operations to happen in parallel.
Naturally a simple List constructor doesn't execute in parallel:
List(apiRequest1(q), apiRequest2(q))
Can the parallel collections help? Or should I be looking to futures and the like instead? The only approach I can think of using parallel collections seems hacky:
List(q, q).par.zipWithIndex.flatMap((q) =>
if (q._2 % 2 == 0) apiRequest1(q._1) else apiRequest2(q._1)
)
Actually, all things being equal, maybe that isn't so bad...
Why don’t you write
List(apiRequest1 _, apiRequest2 _).par.map(_(q))
Quick and dirty solution:
scala> def apiRequest1(q: Query): Option[Result] = { Thread.sleep(1000); Some(new Result) }
apiRequest1: (q: Query)Option[Result]
scala> def apiRequest2(q: Query): Option[Result] = { Thread.sleep(3000); Some(new Result) }
apiRequest2: (q: Query)Option[Result]
scala> val f = List(() => apiRequest1(q), () => apiRequest2(q)).par.map(_())
f: scala.collection.parallel.immutable.ParSeq[Option[Result]] = ParVector(Some(Result@1f24908), Some(Result@198c0b5))
I'm not sure it would actually work in parallel if you have only two or a small number of calls; there is a threshold for parallelization, and it would probably run sequentially with so small a collection, on the grounds that it is not worth the parallelization overhead (the library can't know that, as it depends on the operation you want to run, but it is reasonable to have a threshold on collection operations).
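For completeness, the futures-based route the question mentions would look roughly like this (my own sketch, reusing q, apiRequest1 and apiRequest2 from above; the timeout is arbitrary):
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// kick off both requests concurrently, then collect the results into one Future
val resultsFuture: Future[List[Option[Result]]] =
  Future.sequence(List(Future(apiRequest1(q)), Future(apiRequest2(q))))

val results: List[Option[Result]] = Await.result(resultsFuture, 10.seconds)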
I'm asking a slightly different question than this one. Suppose I have a code snippet:
def foo(i : Int) : List[String] = {
val s = i.toString + "!" //using val
s :: Nil
}
This is functionally equivalent to the following:
def foo(i : Int) : List[String] = {
def s = i.toString + "!" //using def
s :: Nil
}
Why would I choose one over the other? Obviously I would assume the second has slight disadvantages in:
creating more bytecode (the inner def is lifted to a method in the class)
a runtime performance overhead of invoking a method over accessing a value
non-strict evaluation means I could easily access s twice (i.e. unnecessarily redo a calculation)
The only advantage I can think of is:
non-strict evaluation of s means it is only called if it is used (but then I could just use a lazy val)
What are peoples' thoughts here? Is there a significant dis-benefit to me making all inner vals defs?
1)
One answer I didn't see mentioned is that the stack frame for the method you're describing could actually be smaller. Each val you declare will occupy a slot on the JVM stack; a value obtained from a def, however, is consumed in the first expression you use it in, so it doesn't need a slot of its own. Even if the def references something from the environment, the compiler will pass that reference along.
HotSpot should optimize both of these things, or so some people claim. See:
http://www.ibm.com/developerworks/library/j-jtp12214/
Since the inner method gets compiled into a regular private method behind the scenes and is usually very small, the JIT compiler might choose to inline it and then optimize it. This could save time allocating smaller stack frames (?), or, by having fewer elements on the stack, make local variable access quicker.
But, take this with a (big) grain of salt - I haven't actually made extensive benchmarks to backup this claim.
2)
In addition, to expand on Kevin's valid reply, the stability that a val provides also means that you can use it with path-dependent types - something you can't do with a def, since the compiler doesn't check its purity.
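A quick illustration of that point (my own sketch):
class Outer { class Inner }

val o = new Outer
val i: o.Inner = new o.Inner     // fine: o is a stable identifier, so o.Inner is a valid type

def d = new Outer
// val j: d.Inner = new d.Inner  // does not compile: "stable identifier required"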
3)
For another reason you might want to use a def, see a related question asked not so long ago:
Functional processing of Scala streams without OutOfMemory errors
Essentially, using defs to produce Streams ensures that there do not exist additional references to these objects, which is important for the GC. Since Streams are lazy anyway, the overhead of creating them is probably negligible even if you have multiple defs.
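A tiny sketch of the idea (mine, not from the linked question):
// a def hands out the Stream without keeping a reference to its head,
// so already-traversed elements can be garbage collected
def naturals: Stream[Int] = Stream.from(1)
naturals.take(5).toList
// List(1, 2, 3, 4, 5)

// a val would pin the memoized head for as long as it is in scope,
// which can exhaust memory when traversing a very long stream:
// val naturals: Stream[Int] = Stream.from(1)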
The val is strict, it's given a value as soon as you define the thing.
Internally, the compiler will mark it as STABLE, equivalent to final in Java. This should allow the JVM to make all sorts of optimisations - I just don't know what they are :)
I can see an advantage in the fact that you are less bound to a location when using a def than when using a val.
This is not a technical advantage but allows for better structuring in some cases.
So, stupid example (please edit this answer, if you’ve got a better one), this is not possible with val:
def foo(i : Int) : List[String] = {
def ret = s :: Nil
def s = i.toString + "!"
ret
}
There may be cases where this is important or just convenient.
(So, basically, you can achieve the same with a lazy val, but if it is only called at most once, a def will probably be faster than a lazy val.)
For a local declaration like this (with no arguments, evaluated precisely once and with no code evaluated between the point of declaration and the point of evaluation) there is no semantic difference. I wouldn't be surprised if the "val" version compiled to simpler and more efficient code than the "def" version, but you would have to examine the bytecode and possibly profile to be sure.
In your example I would use a val. I think the val/def choice is more meaningful when declaring class members:
class A { def a0 = "a"; def a1 = "a" }
class B extends A {
var c = 0
override def a0 = { c += 1; "a" + c }
override val a1 = "b"
}
In the base class, using def allows the subclass to override with a def that does not necessarily return a constant, or to override with a val. So that gives more flexibility than a val.
Edit: one more use case for def over val is when an abstract class has a "val" whose value should be provided by a subclass.
abstract class C { def f: SomeObject }
new C { val f = new SomeObject(...) }