Scala work with multiple futures - scala

I have two source of data which returns List[Int]
// first work with postgresql database
object Db {
def get: Future[List[Int]] = // impl
}
// second it's remote service
object Rs {
def get: Future[List[Int]] = // impl
}
And then i want to return two lists. But i do not know how to dealing with exceptions:
Db possible throw ConnectionUnavailable
Remote service - Bad Request, or InternalServer error
Both - TimeoutException
But, when i have only results from db, i want to return it. If i have results from db and from remote service i want to return sum of two lists.
How to work with this case?

val total: Future[Int] =
Db.get.flatMap { dbResults =>
Rs.get.map { remoteResults =>
dbResults.sum + remoteResults.sum
}
}
or equivalently
val total: Future[Int] = for {
dbResults <- Db.get
remoteResults <- Rs.get
} yield dbResults.sum + remoteResults.sum
I explicitly annotated the result type for the sake of clarity but it's not necessary.
total is a Future[Int] holding either a successful or a failed computation. If you need to handle errors you can attach a onFailure handler on it. For instance
total.onFailure {
case e: TimeoutException => // ...
case e: ConnectionError => // ...
}
(the names of the exceptions are made up)

You need to combine flatMap and recover:
for {
db <- Db.get
rs <- Rs.get.recover {
case e =>
logger.error("Error requesting external service", e)
List.fill(db.length)(0)
}
} yield (db, rs).zipped.map(_+_).sum
You can tweak the transformations if you like (I assumed that you meant element-wise sum of lists), but the basic idea stays the same - if you want to "ignore" the failure of some future, you need to call recover on it.
If you want, you can extract the recovering function from the for comprehension, but recover still has to be called inside of it:
def handler(n: Int): PartialFunction[Throwable, List[Int]] = {
case e =>
logger.error("Error requesting external service", e)
List.fill(n)(0)
}
for {
db <- Db.get
rs <- Rs.get.recover(handler(db.length))
} yield (db, rs).zipped.map(_+_).sum

Related

Why a Thread.sleep or closing the connection is required after waiting for a remove call to complete?

I'm again seeking you to share your wisdom with me, the scala padawan!
I'm playing with reactive mongo in scala and while I was writting a test using scalatest, I faced the following issue.
First the code:
"delete" when {
"passing an existent id" should {
"succeed" in {
val testRecord = TestRecord(someString)
Await.result(persistenceService.persist(testRecord), Duration.Inf)
Await.result(persistenceService.delete(testRecord.id), Duration.Inf)
Thread.sleep(1000) // Why do I need that to make the test succeeds?
val thrownException = intercept[RecordNotFoundException] {
Await.result(persistenceService.read(testRecord.id), Duration.Inf)
}
thrownException.getMessage should include(testRecord._id.toString)
}
}
}
And the read and delete methods with the code initializing connection to db (part of the constructor):
class MongoPersistenceService[R](url: String, port: String, databaseName: String, collectionName: String) {
val driver = MongoDriver()
val parsedUri: Try[MongoConnection.ParsedURI] = MongoConnection.parseURI("%s:%s".format(url, port))
val connection: Try[MongoConnection] = parsedUri.map(driver.connection)
val mongoConnection = Future.fromTry(connection)
def db: Future[DefaultDB] = mongoConnection.flatMap(_.database(databaseName))
def collection: Future[BSONCollection] = db.map(_.collection(collectionName))
def read(id: BSONObjectID): Future[R] = {
val query = BSONDocument("_id" -> id)
val readResult: Future[R] = for {
coll <- collection
record <- coll.find(query).requireOne[R]
} yield record
readResult.recover {
case NoSuchResultException => throw RecordNotFoundException(id)
}
}
def delete(id: BSONObjectID): Future[Unit] = {
val query = BSONDocument("_id" -> id)
// first read then call remove. Read will throw if not present
read(id).flatMap { (_) => collection.map(coll => coll.remove(query)) }
}
}
So to make my test pass, I had to had a Thread.sleep right after waiting for the delete to complete. Knowing this is evil usually punished by many whiplash, I want learn and find the proper fix here.
While trying other stuff, I found instead of waiting, entirely closing the connection to the db was also doing the trick...
What am I misunderstanding here? Should a connection to the db be opened and close for each call to it? And not do many actions like adding, removing, updating records with one connection?
Note that everything works fine when I remove the read call in my delete function. Also by closing the connection, I mean call close on the MongoDriver from my test and also stop and start again embed Mongo which I'm using in background.
Thanks for helping guys.
Warning: this is a blind guess, I've no experience with MongoDB on Scala.
You may have forgotten to flatMap
Take a look at this bit:
collection.map(coll => coll.remove(query))
Since collection is Future[BSONCollection] per your code and remove returns Future[WriteResult] per doc, so actual type of this expression is Future[Future[WriteResult]].
Now, you have annotated your function as returning Future[Unit]. Scala often makes Unit as a return value by throwing away possibly meaningful values, which it does in your case:
read(id).flatMap { (_) =>
collection.map(coll => {
coll.remove(query) // we didn't wait for removal
() // before returning unit
})
}
So your code should probably be
read(id).flatMap(_ => collection.flatMap(_.remove(query).map(_ => ())))
Or a for-comprehension:
for {
_ <- read(id)
coll <- collection
_ <- coll.remove(query)
} yield ()
You can make Scala warn you about discarded values by adding a compiler flag (assuming SBT):
scalacOptions += "-Ywarn-value-discard"

Scala - Batched Stream from Futures

I have instances of a case class Thing, and I have a bunch of queries to run that return a collection of Things like so:
def queries: Seq[Future[Seq[Thing]]]
I need to collect all Things from all futures (like above) and group them into equally sized collections of 10,000 so they can be serialized to files of 10,000 Things.
def serializeThings(Seq[Thing]): Future[Unit]
I want it to be implemented in such a way that I don't wait for all queries to run before serializing. As soon as there are 10,000 Things returned after the futures of the first queries complete, I want to start serializing.
If I do something like:
Future.sequence(queries)
It will collect the results of all the queries, but my understanding is that operations like map won't be invoked until all queries complete and all the Things must fit into memory at once.
What's the best way to implement a batched stream pipeline using Scala collections and concurrent libraries?
I think that I managed to make something. The solution is based on my previous answer. It collects results from Future[List[Thing]] results until it reaches a treshold of BatchSize. Then it calls serializeThings future, when it finishes, the loop continues with the rest.
object BatchFutures extends App {
case class Thing(id: Int)
def getFuture(id: Int): Future[List[Thing]] = {
Future.successful {
List.fill(3)(Thing(id))
}
}
def serializeThings(things: Seq[Thing]): Future[Unit] = Future.successful {
//Thread.sleep(2000)
println("processing: " + things)
}
val ids = (1 to 4).toList
val BatchSize = 5
val future = ids.foldLeft(Future.successful[List[Thing]](Nil)) {
case (acc, id) =>
acc flatMap { processed =>
getFuture(id) flatMap { res =>
val all = processed ++ res
val (batch, rest) = all.splitAt(5)
if (batch.length == BatchSize) { // if futures filled the batch with needed amount
serializeThings(batch) map { _ =>
rest // process the rest
}
} else {
Future.successful(all) //if we need more Things for a batch
}
}
}
}.flatMap { rest =>
serializeThings(rest)
}
Await.result(future, Duration.Inf)
}
The result prints:
processing: List(Thing(1), Thing(1), Thing(1), Thing(2), Thing(2))
processing: List(Thing(2), Thing(3), Thing(3), Thing(3), Thing(4))
processing: List(Thing(4), Thing(4))
When the number of Things isn't divisible by BatchSize we have to call serializeThings once more(last flatMap). I hope it helps! :)
Before you do Future.sequence do what you want to do with individual future and then use Future.sequence.
//this can be used for serializing
def doSomething(): Unit = ???
//do something with the failed future
def doSomethingElse(): Unit = ???
def doSomething(list: List[_]) = ???
val list: List[Future[_]] = List.fill(10000)(Future(doSomething()))
val newList =
list.par.map { f =>
f.map { result =>
doSomething()
}.recover { case throwable =>
doSomethingElse()
}
}
Future.sequence(newList).map ( list => doSomething(list)) //wait till all are complete
instead of newList generation you could use Future.traverse
Future.traverse(list)(f => f.map( x => doSomething()).recover {case th => doSomethingElse() }).map ( completeListOfValues => doSomething(completeListOfValues))

Scala Synchronising Asynchronous calls with Future

I have a method that does a couple of database look up and performs some logic.
The MyType object that I return from the method is as follows:
case class MyResultType(typeId: Long, type1: Seq[Type1], type2: Seq[Type2])
The method definition is like this:
def myMethod(typeId: Long, timeInterval: Interval) = async {
// 1. check if I can find an entity in the database for typeId
val myTypeOption = await(db.run(findMyTypeById(typeId))) // I'm getting the headOption on this result
if (myTypeOption.isDefined) {
val anotherDbLookUp = await(doSomeDBStuff) // Line A
// the interval gets split and assume that I get a List of thse intervals
val intervalList = splitInterval(interval)
// for each of the interval in the intervalList, I do database look up
val results: Seq[(Future[Seq[Type1], Future[Seq[Type2])] = for {
interval <- intervalList
} yield {
(getType1Entries(interval), getType2Entries(interval))
}
// best way to work with the results so that I can return MyResultType
}
else {
None
}
}
Now the getType1Entries(interval) and getType2Entries(interval) each returns a Future of Seq(Type1) and Seq(Type2) entries!
My problem now is to get the Seq(Type1) and Seq(Type2) out of the Future and stuff that into the MyResultType case class?
You could refer to this question you asked
Scala transforming a Seq with Future
so you get the
val results2: Future[Seq([Iterable[Type1], [Iterable[Type2])] = ???
and then call await on it and you have no Futures at all, you can do what you want.
I hope I understood the question correctly.
Oh and by the way you should map myTypeOption instead of checking if it's defined and returning None if it's not
if (myTypeOption.isDefined) {
Some(x)
} else {
None
}
can be simply replaced with
myTypeOption.map { _ => // ignoring what actually was inside option
x // return whatever you want, without wrapping it in Some
}
If I understood your question correctly, then this should do the trick.
def myMethod(typeId: Long, timeInterval: Interval): Option[Seq[MyResultType]] = async {
// 1. check if I can find an entity in the database for typeId
val myTypeOption = await(db.run(findMyTypeById(typeId))) // I'm getting the headOption on this result
if (myTypeOption.isDefined) {
// the interval gets split and assume that I get a List of thse intervals
val intervalList = splitInterval(interval)
// for each of the interval in the intervalList, I do database look up
val results: Seq[(Future[Seq[Type1]], Future[Seq[Type2]])] = for {
interval <- intervalList
} yield {
(getType1Entries(interval), getType2Entries(interval))
}
// best way to work with the results so that I can return MyResultType
Some(
await(
Future.sequence(
results.map{
case (l, r) =>
l.zip(r).map{
case (vl, vr) => MyResultType(typeId, vl, vr)
}
})))
}
else {
None
}
}
There are two parts to your problem, 1) how to deal with two dependent futures, and 2) how to extract the resulting values.
When dealing with dependent futures, I normally compose them together:
val future1 = Future { 10 }
val future2 = Future { 20 }
// results in a new future with type (Int, Int)
val combined = for {
a <- future1
b <- future2
} yield (a, b)
// then you can use foreach/map, Await, or onComplete to do
// something when your results are ready..
combined.foreach { ((a, b)) =>
// do something with the result here
}
To extract the results I generally use Await if I need to make a synchronous response, use _.onComplete() if I need to deal with potential failure, and use _.foreach()/_.map() for most other circumstances.

Scala waiting for sequence of futures

I was hoping code like follows would wait for both futures, but it does not.
object Fiddle {
val f1 = Future {
throw new Throwable("baaa") // emulating a future that bumped into an exception
}
val f2 = Future {
Thread.sleep(3000L) // emulating a future that takes a bit longer to complete
2
}
val lf = List(f1, f2) // in the general case, this would be a dynamically sized list
val seq = Future.sequence(lf)
seq.onComplete {
_ => lf.foreach(f => println(f.isCompleted))
}
}
val a = FuturesSequence
I assumed seq.onComplete would wait for them all to complete before completing itself, but not so; it results in:
true
false
.sequence was a bit hard to follow in the source of scala.concurrent.Future, I wonder how I would implement a parallel that waits for all original futures of a (dynamically sized) sequence, or what might be the problem here.
Edit: A related question: https://worldbuilding.stackexchange.com/questions/12348/how-do-you-prove-youre-from-the-future :)
One common approach to waiting for all results (failed or not) is to "lift" failures into a new representation inside the future, so that all futures complete with some result (although they may complete with a result that represents failure). One natural way to get that is lifting to a Try.
Twitter's implementation of futures provides a liftToTry method that makes this trivial, but you can do something similar with the standard library's implementation:
import scala.util.{ Failure, Success, Try }
val lifted: List[Future[Try[Int]]] = List(f1, f2).map(
_.map(Success(_)).recover { case t => Failure(t) }
)
Now Future.sequence(lifted) will be completed when every future is completed, and will represent successes and failures using Try.
And so, a generic solution for waiting on all original futures of a sequence of futures may look as follows, assuming an execution context is of course implicitly available.
import scala.util.{ Failure, Success, Try }
private def lift[T](futures: Seq[Future[T]]) =
futures.map(_.map { Success(_) }.recover { case t => Failure(t) })
def waitAll[T](futures: Seq[Future[T]]) =
Future.sequence(lift(futures)) // having neutralized exception completions through the lifting, .sequence can now be used
waitAll(SeqOfFutures).map {
// do whatever with the completed futures
}
A Future produced by Future.sequence completes when either:
all the futures have completed successfully, or
one of the futures has failed
The second point is what's happening in your case, and it makes sense to complete as soon as one of the wrapped Future has failed, because the wrapping Future can only hold a single Throwable in the failure case. There's no point in waiting for the other futures because the result will be the same failure.
This is an example that supports the previous answer. There is an easy way to do this using just the standard Scala APIs.
In the example, I am creating 3 futures. These will complete at 5, 7, and 9 seconds respectively. The call to Await.result will block until all futures have resolved. Once all 3 futures have completed, a will be set to List(5,7,9) and execution will continue.
Additionally, if an exception is thrown in any of the futures, Await.result will immediately unblock and throw the exception. Uncomment the Exception(...) line to see this in action.
try {
val a = Await.result(Future.sequence(Seq(
Future({
blocking {
Thread.sleep(5000)
}
System.err.println("A")
5
}),
Future({
blocking {
Thread.sleep(7000)
}
System.err.println("B")
7
//throw new Exception("Ha!")
}),
Future({
blocking {
Thread.sleep(9000)
}
System.err.println("C")
9
}))),
Duration("100 sec"))
System.err.println(a)
} catch {
case e: Exception ⇒
e.printStackTrace()
}
Even though it is quite old question But this is how I got it running in recent time.
object Fiddle {
val f1 = Future {
throw new Throwable("baaa") // emulating a future that bumped into an exception
}
val f2 = Future {
Thread.sleep(3000L) // emulating a future that takes a bit longer to complete
2
}
val lf = List(f1, f2) // in the general case, this would be a dynamically sized list
val seq = Future.sequence(lf)
import scala.concurrent.duration._
Await.result(seq, Duration.Inf)
}
This won't get completed and will wait till all the future gets completed. You can change the waiting time as per your use case. I have kept it to infinite and that was required in my case.
We can enrich Seq[Future[T]] with its own onComplete method through an implicit class:
def lift[T](f: Future[T])(implicit ec: ExecutionContext): Future[Try[T]] =
f map { Success(_) } recover { case e => Failure(e) }
def lift[T](fs: Seq[Future[T]])(implicit ec: ExecutionContext): Seq[Future[Try[T]]] =
fs map { lift(_) }
implicit class RichSeqFuture[+T](val fs: Seq[Future[T]]) extends AnyVal {
def onComplete[U](f: Seq[Try[T]] => U)(implicit ec: ExecutionContext) = {
Future.sequence(lift(fs)) onComplete {
case Success(s) => f(s)
case Failure(e) => throw e // will never happen, because of the Try lifting
}
}
}
Then, in your particular MWE, you can do:
val f1 = Future {
throw new Throwable("baaa") // emulating a future that bumped into an exception
}
val f2 = Future {
Thread.sleep(3000L) // emulating a future that takes a bit longer to complete
2
}
val lf = List(f1, f2)
lf onComplete { _ map {
case Success(v) => ???
case Failure(e) => ???
}}
This solution has the advantage of allowing you to call an onComplete on a sequence of futures as you would on a single future.
Create the Future with a Try to avoid extra hoops.
implicit val ec = ExecutionContext.global
val f1 = Future {
Try {
throw new Throwable("kaboom")
}
}
val f2 = Future {
Try {
Thread.sleep(1000L)
2
}
}
Await.result(
Future.sequence(Seq(f1, f2)), Duration("2 sec")
) foreach {
case Success(res) => println(s"Success. $res")
case Failure(e) => println(s"Failure. ${e.getMessage}")
}

Why does a Scala for-comprehension have to start with a generator?

According to the Scala Language Specification (§6.19), "An enumerator sequence always starts with a generator". Why?
I sometimes find this restriction to be a hindrance when using for-comprehensions with monads, because it means you can't do things like this:
def getFooValue(): Future[Int] = {
for {
manager = Manager.getManager() // could throw an exception
foo <- manager.makeFoo() // method call returns a Future
value = foo.getValue()
} yield value
}
Indeed, scalac rejects this with the error message '<-' expected but '=' found.
If this was valid syntax in Scala, one advantage would be that any exception thrown by Manager.getManager() would be caught by the Future monad used within the for-comprehension, and would cause it to yield a failed Future, which is what I want. The workaround of moving the call to Manager.getManager() outside the for-comprehension doesn't have this advantage:
def getFooValue(): Future[Int] = {
val manager = Manager.getManager()
for {
foo <- manager.makeFoo()
value = foo.getValue()
} yield value
}
In this case, an exception thrown by foo.getValue() will yield a failed Future (which is what I want), but an exception thrown by Manager.getManager() will be thrown back to the caller of getFooValue() (which is not what I want). Other possible ways of handling the exception are more verbose.
I find this restriction especially puzzling because in Haskell's otherwise similar do notation, there is no requirement that a do block should begin with a statement containing <-. Can anyone explain this difference between Scala and Haskell?
Here's a complete working example showing how exceptions are caught by the Future monad in for-comprehensions:
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Try, Success, Failure}
class Foo(val value: Int) {
def getValue(crash: Boolean): Int = {
if (crash) {
throw new Exception("failed to get value")
} else {
value
}
}
}
class Manager {
def makeFoo(crash: Boolean): Future[Foo] = {
if (crash) {
throw new Exception("failed to make Foo")
} else {
Future(new Foo(10))
}
}
}
object Manager {
def getManager(crash: Boolean): Manager = {
if (crash) {
throw new Exception("failed to get manager")
} else {
new Manager()
}
}
}
object Main extends App {
def getFooValue(crashGetManager: Boolean,
crashMakeFoo: Boolean,
crashGetValue: Boolean): Future[Int] = {
for {
manager <- Future(Manager.getManager(crashGetManager))
foo <- manager.makeFoo(crashMakeFoo)
value = foo.getValue(crashGetValue)
} yield value
}
def waitForValue(future: Future[Int]): Unit = {
val result = Try(Await.result(future, Duration("10 seconds")))
result match {
case Success(value) => println(s"Got value: $value")
case Failure(e) => println(s"Got error: $e")
}
}
val future1 = getFooValue(false, false, false)
waitForValue(future1)
val future2 = getFooValue(true, false, false)
waitForValue(future2)
val future3 = getFooValue(false, true, false)
waitForValue(future3)
val future4 = getFooValue(false, false, true)
waitForValue(future4)
}
Here's the output:
Got value: 10
Got error: java.lang.Exception: failed to get manager
Got error: java.lang.Exception: failed to make Foo
Got error: java.lang.Exception: failed to get value
This is a trivial example, but I'm working on a project in which we have a lot of non-trivial code that depends on this behaviour. As far as I understand, this is one of the main advantages of using Future (or Try) as a monad. What I find strange is that I have to write
manager <- Future(Manager.getManager(crashGetManager))
instead of
manager = Manager.getManager(crashGetManager)
(Edited to reflect #RexKerr's point that the monad is doing the work of catching the exceptions.)
for comprehensions do not catch exceptions. Try does, and it has the appropriate methods to participate in for-comprehensions, so you can
for {
manager <- Try { Manager.getManager() }
...
}
But then it's expecting Try all the way down unless you manually or implicitly have a way to switch container types (e.g. something that converts Try to a List).
So I'm not sure your premises are right. Any assignment you made in a for-comprehension can just be made early.
(Also, there is no point doing an assignment inside a for comprehension just to yield that exact value. Just do the computation in the yield block.)
(Also, just to illustrate that multiple types can play a role in for comprehensions so there's not a super-obvious correct answer for how to wrap an early assignment in terms of later types:
// List and Option, via implicit conversion
for {i <- List(1,2,3); j <- Option(i).filter(_ <2)} yield j
// Custom compatible types with map/flatMap
// Use :paste in the REPL to define A and B together
class A[X] { def flatMap[Y](f: X => B[Y]): A[Y] = new A[Y] }
class B[X](x: X) { def map[Y](f: X => Y): B[Y] = new B(f(x)) }
for{ i <- (new A[Int]); j <- (new B(i)) } yield j.toString
Even if you take the first type you still have the problem of whether there is a unique "bind" (way to wrap) and whether to doubly-wrap things that are already the correct type. There could be rules for all these things, but for-comprehensions are already hard enough to learn, no?)
Haskell translates the equivalent of for { manager = Manager.getManager(); ... } to the equivalent of lazy val manager = Manager.getManager(); for { ... }. This seems to work:
scala> lazy val x: Int = throw new Exception("")
x: Int = <lazy>
scala> for { y <- Future(x + 1) } yield y
res8: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise#fedb05d
scala> Try(Await.result(res1, Duration("10 seconds")))
res9: scala.util.Try[Int] = Failure(java.lang.Exception: )
I think the reason this can't be done is because for-loops are syntactic sugar for flatMap and map methods (except if you are using a condition in the for-loop, in that case it's desugared with the method withFilter). When you are storing in a immutable variable, you can't use these methods. That's the reason you would be ok using Try as pointed out by Rex Kerr. In that case, you should be able to use map and flatMap methods.