Converting thunk to sequence upon iteration - scala

I have a server API that returns a list of things, and does so in chunks of, let's say, 25 items at a time. With every response, we get a list of items, and a "token" that we can use for the following server call to return the next 25, and so on.
Please note that we're using a client library that has been written in stodgy old mutable Java, and doesn't lend itself nicely to all of Scala's functional compositional patterns.
I'm looking for a way to return a lazily evaluated sequence of all server items, by doing a server call with the latest token whenever the local list of items has been exhausted. What I have so far is:
def fetchFromServer(uglyStateObject: StateObject): Seq[Thing] = {
val results = server.call(uglyStateObject)
uglyStateObject.update(results.token())
results.asScala.toList ++ (if results.moreAvailable() then
fetchFromServer(uglyStateObject)
else
List())
}
However, this function does eager evaluation. What I'm looking for is to have ++ concatenate a "strict sequence" and a "lazy sequence", where a thunk will be used to retrieve the next set of items from the server. In effect, I want something like this:
results.asScala.toList ++ Seq.lazy(() => fetchFromServer(uglyStateObject))
Except I don't know what to use in place of Seq.lazy.
Things I've seen so far:
SeqView, but I've seen comments that it shouldn't be used because it re-evaluates all the time?
Streams, but they seem like the abstraction is supposed to generate elements at a time, whereas I want to generate a bunch of elements at a time.
What should I use?

I also suggest you to take a look at scalaz-strem. Here is small example how it may look like
import scalaz.stream._
import scalaz.concurrent.Task
// Returns updated state + fetched data
def fetchFromServer(uglyStateObject: StateObject): (StateObject, Seq[Thing]) = ???
// Initial state
val init: StateObject = new StateObject
val p: Process[Task, Thing] = Process.repeatEval[Task, Seq[Thing]] {
var state = init
Task(fetchFromServer(state)) map {
case (s, seq) =>
state = s
seq
}
} flatMap Process.emitAll

As a matter of fact, in the meantime I already found a slightly different answer that I find more readable (indeed using Streams):
def fetchFromServer(uglyStateObject: StateObject): Stream[Thing] = {
val results = server.call(uglyStateObject)
uglyStateObject.update(results.token())
results.asScala.toStream #::: (if results.moreAvailable() then
fetchFromServer(uglyStateObject)
else
Stream.empty)
}
Thanks everyone for

Related

How can I avoid duplicating work on a collectFirst call?

I have something like this:
val yyy = aListOfObjects.collectFirst{
case x if(x.calc("input").isDefined) => x.calc("input").get
}
I have some list of things. Each thing has a method, calc, that does work to bring back an optional result, if the calculation was successful or not.
I want to go thru the list of things and bring back the calculated value of the first one that is successful, or None if none of them work.
The snip above does this, but... you'll see I call x.calc() twice. Assuming the call to calc is non-trivial, how can I avoid the double call? (I also don't want to pre-call calc() on all things since I only care about the first one that works.)
You can use the fact that Streams are lazy in scala:
aListOfObjects
.toStream
.flatMap { _.calc("input") }
.headOption
Or define your own extractor:
object Calc {
def unapply(x: TypeOfX) = x.calc("input")
}
Now, you can write: listOfObject.collectFirst { case Calc(x) => x }

What is the advantage of using Option.map over Option.isEmpty and Option.get?

I am a new to Scala coming from Java background, currently confused about the best practice considering Option[T].
I feel like using Option.map is just more functional and beautiful, but this is not a good argument to convince other people. Sometimes, isEmpty check feels more straight forward thus more readable. Is there any objective advantages, or is it just personal preference?
Example:
Variation 1:
someOption.map{ value =>
{
//some lines of code
}
} orElse(foo)
Variation 2:
if(someOption.isEmpty){
foo
} else{
val value = someOption.get
//some lines of code
}
I intentionally excluded the options to use fold or pattern matching. I am simply not pleased by the idea of treating Option as a collection right now, and using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO. But no matter why I dislike these options, I want to keep the scope of this question to be the above two variations as named in the title.
Is there any objective advantages, or is it just personal preference?
I think there's a thin line between objective advantages and personal preference. You cannot make one believe there is an absolute truth to either one.
The biggest advantage one gains from using the monadic nature of Scala constructs is composition. The ability to chain operations together without having to "worry" about the internal value is powerful, not only with Option[T], but also working with Future[T], Try[T], Either[A, B] and going back and forth between them (also see Monad Transformers).
Let's try and see how using predefined methods on Option[T] can help with control flow. For example, consider a case where you have an Option[Int] which you want to multiply only if it's greater than a value, otherwise return -1. In the imperative approach, we get:
val option: Option[Int] = generateOptionValue
var res: Int = if (option.isDefined) {
val value = option.get
if (value > 40) value * 2 else -1
} else -1
Using collections style method on Option, an equivalent would look like:
val result: Int = option
.filter(_ > 40)
.map(_ * 2)
.getOrElse(-1)
Let's now consider a case for composition. Let's say we have an operation which might throw an exception. Additionaly, this operation may or may not yield a value. If it returns a value, we want to query a database with that value, otherwise, return an empty string.
A look at the imperative approach with a try-catch block:
var result: String = _
try {
val maybeResult = dangerousMethod()
if (maybeResult.isDefined) {
result = queryDatabase(maybeResult.get)
} else result = ""
}
catch {
case NonFatal(e) => result = ""
}
Now let's consider using scala.util.Try along with an Option[String] and composing both together:
val result: String = Try(dangerousMethod())
.toOption
.flatten
.map(queryDatabase)
.getOrElse("")
I think this eventually boils down to which one can help you create clear control flow of your operations. Getting used to working with Option[T].map rather than Option[T].get will make your code safer.
To wrap up, I don't believe there's a single truth. I do believe that composition can lead to beautiful, readable, side effect deferring safe code and I'm all for it. I think the best way to show other people what you feel is by giving them examples as we just saw, and letting them feel for themselves the power they can leverage with these sets of tools.
using pattern matching for a simple isEmpty check is an abuse of pattern matching IMHO
If you do just want an isEmpty check, isEmpty/isDefined is perfectly fine. But in your case you also want to get the value. And using pattern matching for this is not abuse; it's precisely the basic use-case. Using get allows to very easily make errors like forgetting to check isDefined or making the wrong check:
if(someOption.isEmpty){
val value = someOption.get
//some lines of code
} else{
//some other lines
}
Hopefully testing would catch it, but there's no reason to settle for "hopefully".
Combinators (map and friends) are better than get for the same reason pattern matching is: they don't allow you to make this kind of mistake. Choosing between pattern matching and combinators is a different question. Generally combinators are preferred because they are more composable (as Yuval's answer explains). If you want to do something covered by a single combinator, I'd generally choose them; if you need a combination like map ... getOrElse, or a fold with multi-line branches, it depends on the specific case.
It seems similar to you in case of Option but just consider the case of Future. You will not be able to interact with the future's value after going out of Future monad.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Promise
import scala.util.{Success, Try}
// create a promise which we will complete after sometime
val promise = Promise[String]();
// Now lets consider the future contained in this promise
val future = promise.future;
val byGet = if (!future.value.isEmpty) {
val valTry = future.value.get
valTry match {
case Success(v) => v + " :: Added"
case _ => "DEFAULT :: Added"
}
} else "DEFAULT :: Added"
val byMap = future.map(s => s + " :: Added")
// promise was completed now
promise.complete(Try("PROMISE"))
//Now lets print both values
println(byGet)
// DEFAULT :: Added
println(byMap)
// Success(PROMISE :: Added)

How to pass data from closure without repeating yourself

I'm using Play 2 with Anorm to manage database access. A common pattern I find myself doing is this:
val (futureChecklists, jobsLookup) =
DB.withConnection { implicit connection =>
val futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
val jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
(futureChecklists, jobsLookup)
}
Which seems kinda weird, because I have to repeat myself. It also gets a bit unruly if I have several variables I'll need in the outer scope, but I don't want to keep the connection open.
Is there an easy way to pass this information back without having to resort to using vars?
What I would like is something like:
val futureChecklists
val jobsLookup
DB.withConnection { implicit connection =>
futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
}
That way I don't have the same tuple at the beginning and end.
I am afraid there is no easy way not to duplicate the tuple declaration, but var is definitely not the way to go around it.
You're mentioning that it becomes weird and difficult with multiple variables at time which returned as a tuple. This indeed can become really tricky and error prone, especially then you end up having large N-tuples with the same parameter types. In that scenario I would consider having a dedicated contained i.e. a case class where you can reference variables by name and not by position in the tuple. The side benefit is that you can assign the whole container to a variable and reference it in the natural way.
Last but not least you don't mention much about your particular use case, but maybe it is worth considering having the 2 queries results obtained in the separate withConnection block. If you are using any collection pooling mechanism, then there is hardly any benefit having it in the same with connection block and with the separate blocks you might even get a flexibility to pararelize the DB queries using separate connections.
There are three ways that i came up with:
Return tuple immediately
val (users, posts) =
DB.withConnection { connection => (
connection.getUsers,
connection.getPosts
)}
I think this is OK for simple code and small numbers of vals. For more complex code and more vals this can be error prone. Someone can accidentally change order of elements in tuple on just one side of assignment, and assign data to wrong vals (which will be reported by compiler only if it also cause type mismatch).
Use anonymous class
val dbResult =
DB.withConnection { connection =>
new {
val users = connection.getUsers
val posts = connection.getPosts
}
}
If you like to have users and posts variables instead of dbResult.users and dbResult.posts you can:
import dbResult._
This solution is a little exotic, but it works just fine and is quite clean.
Use case class
First define case class for your return value:
case class DBResult(users: List[User], posts: List[Post])
and then use it:
val DBResult(users: List[User], posts: List[Post]) =
DB.withConnection { connection =>
DBResult(
users = connection.getUsers,
posts = connection.getPosts
)
}
This is best if you intend to reuse this case class multiple times.

How to get truly atomic update for TrieMap.getOrElseUpdate

As I understand, TrieMap.getOrElseUpdate is still not truly atomic, and this fixes only returned result (it could return different instances for different callers before this fix), so the updater function still might be called several times, as documentation (for 2.11.7) says:
Note: This method will invoke op at most once. However, op may be invoked without the result being added to the map if a concurrent process is also trying to add a value corresponding to the same key k.
*I've checked that manually for 2.11.7, still "at least once"
How to guarantee one-time call (if I use TrieMap for factories)?
I think this solution should work for my requirements:
trait LazyComp { val get: Int }
val map = new TrieMap[String, LazyComp]()
val count = new AtomicInteger() //just for test, you don't need it
def getSingleton(key: String) = {
val v = new LazyComp {
lazy val get = {
//compute something
count.incrementAndGet() //just for test, you don't need it
}
}
map.putIfAbsent(key, v).getOrElse(v).get
}
I believe, lazy val actually uses synchronized inside. And also the code inside get should be safe from exceptions
However, performance could be improved in future: SIP-20
Test:
scala> (0 to 10000000).par.map(_ => getSingleton("zzz")).last
res8: Int = 1
P.S. Java has computeIfAbscent method on ConcurrentHashMap which I could use as well.

Is there an easy way to get a Stream as output of a RowParser?

Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
)() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to only apply rowParser on rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query, if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser on that.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
case None => Success(Stream.empty[A])
case Some(Success(a)) => {
val s: Stream[A] = a #:: rows.tail.flatMap(r => p(r) match {
case Success(r) => Some(r)
case _ => None
})
Success(s)
}
case Some(Error(msg)) => Error(msg)
}
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
You're better off making smaller (paged) queries using limit and offset.
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly on a JDBC-level, wrapping Stream around limited and offset-ed execute statements and just making multiple calls to the database behind the scenes, but this sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, jRuby, etc), then get it on the approved for the JDBC 5 or 6 roadmap. This idea will probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I ran into a similar situation but ran into a Call Stack Overflow exception when the built-in anorm function to convert to Streams attempted to parse the result set.
In order to get around this I elected to abandon the anorm ResultSetParser paradigm, and fall back to the java.sql.ResultSet object.
I wanted to use anorm's internal classes for the parsing result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package, and have deprecated several other methods that would have been more straight-forward to use.
I used a combination of Promises and Futures to work around the ManagedResource that anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._
def SqlStream[T](sql:SqlQuery)(parse:ResultSet => T)(implicit ec:ExecutionContext):Future[Stream[T]] = {
val conn = db.getConnection()
val mr = sql.preparedStatement(conn, false)
val p = Promise[Unit]()
val p2 = Promise[ResultSet]()
Future {
mr.map({ stmt =>
p2.success(stmt.executeQuery)
Await.ready(p.future, duration.Duration.Inf)
}).acquireAndGet(identity).andThen { case _ => conn.close() }
}
def _stream(rs:ResultSet):Stream[T] = {
if (rs.next()) parse(rs) #:: _stream(rs)
else {
p.success(())
Stream.empty
}
}
p2.future.map { rs =>
rs.beforeFirst()
_stream(rs)
}
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec:ExecutionContext):Future[Stream[String]] = {
SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach, however, this got around my problem and did not require inclusion of any other libraries.