How to cancel a future action if another future failed? - scala

I have 2 futures (2 actions on db tables) and, before saving the modifications, I want to check that both futures have finished successfully.
Right now, I start the second future inside the first (as a dependency), but I know this is not the best option. I know I can use a for-comprehension to execute both futures in parallel, but even if one fails, the other will still be executed (not tested yet).
firstFuture.dropColumn(tableName) match {
case Success(_) => secondFuture.deleteEntity(entity)
case Failure(e) => throw new Exception(e.getMessage)
}
// the first future alters a table, drops a column
// the second future deletes a row from another table
In this case, if the first future executes successfully, the second can still fail. I want to revert the update made by the first future. I heard about SQL transactions, which seem to be something like that, but how?
val futuresResult = for {
first <- firstFuture.dropColumn(tableName)
second <- secondFuture.deleteEntity(entity)
} yield (first, second)
A for-comprehension is much better in my case because I don't have dependencies between these two futures and they can be executed in parallel, but this does not solve my problem: the result can be (success, success) or (failed, success), for example.

Regarding Future running sequentially vs in parallel:
This is a bit tricky because Scala Future is designed to be eager. There are some other constructs across various Scala libraries that handle synchronous and asynchronous effects, such as cats IO, Monix Task, ZIO etc. which are designed in a lazy way, and they don't have this behaviour.
The thing with Future being eager is that it will start the computation as soon as it can. Here "start" means schedule it on an ExecutionContext that is either selected explicitly or present implicitly. While it's technically possible that the execution is stalled a bit in case the scheduler decides to do so, it will most likely be started almost instantaneously.
So if you have a value of type Future, it's going to start running then and there. If you have a lazy value of type Future, or a function / method that returns a value of type Future, then it's not.
But even if all you have are simple values (no lazy vals or defs), if the Future definition is done inside the for-comprehension, then it means it's part of a monadic flatMap chain (if you don't understand this, ignore it for now) and it will be run in sequence, not in parallel. Why? This is not specific to Futures; every for-comprehension has the semantics of being a sequential chain in which you can pass the result of the previous step to the next step. So it's only logical that you can't run something in step n + 1 if it depends on something from step n.
Here's some code to demonstrate this.
val program = for {
_ <- Future { Thread.sleep(5000); println("f1") }
_ <- Future { Thread.sleep(5000); println("f2") }
} yield ()
Await.result(program, Duration.Inf)
This program will wait five seconds, then print "f1", then wait another five seconds, and then print "f2".
Now let's take a look at this:
val f1 = Future { Thread.sleep(5000); println("f1") }
val f2 = Future { Thread.sleep(5000); println("f2") }
val program = for {
_ <- f1
_ <- f2
} yield ()
Await.result(program, Duration.Inf)
The program, however, will print "f1" and "f2" simultaneously after five seconds.
Note that the sequence semantics are not really violated in the second case. f2 still has the opportunity to use the result of f1. But f2 is not using the result of f1; it's a standalone value that can be computed immediately (defined with a val). So if we change val f2 to a function, e.g. def f2(number: Int), then the execution changes:
val f1 = Future { Thread.sleep(5000); println("f1"); 42 }
def f2(number: Int) = Future { Thread.sleep(5000); println(number) }
val program = for {
number <- f1
_ <- f2(number)
} yield ()
As you would expect, this will print "f1" after five seconds, and only then will the other Future start, so it will print "42" after another five seconds.
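To connect this back to the question: in a sequential chain, a failed step also prevents the later steps from being started at all. A minimal sketch using only the standard library, where dropColumn and deleteEntity are hypothetical stand-ins for the two database actions (they are not the original methods):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical stand-ins for the two database actions.
val secondStarted = new AtomicBoolean(false)
def dropColumn(): Future[Unit] = Future.failed(new RuntimeException("drop failed"))
def deleteEntity(): Future[Unit] = Future { secondStarted.set(true) }

// deleteEntity() is only called once dropColumn() has succeeded,
// so in this run it is never started.
val chained = for {
  _ <- dropColumn()
  _ <- deleteEntity()
} yield ()

val outcome = Try(Await.result(chained, 5.seconds))
println(s"failed = ${outcome.isFailure}, second started = ${secondStarted.get}")
```

Note that this only avoids starting later steps. It does not undo a step that has already succeeded, which is why the question is really about transactions.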
Regarding transactions:
As @cbley mentioned in the comment, this sounds like you want database transactions. For example, in SQL databases this has a very specific meaning and it ensures the ACID properties.
If that's what you need, you need to solve it on the database layer. Future is too generic for that; it's just an effect type that models sync and async computations. When you see a Future value, just by looking at the type, you can't tell if it's the result of a database call or, say, some HTTP call.
For example, doobie describes every database query as a ConnectionIO type. You can have multiple queries lined up in a for-comprehension, just how you would have with Future:
val program = for {
a <- database.getA()
_ <- database.write("foo")
b <- database.getB()
} yield {
// use a and b
}
But unlike our earlier examples, here getA() and getB() don't return a value of type Future[A], but ConnectionIO[A]. What's cool about that is that doobie completely takes care of the fact that you probably want these queries to be run in a single transaction, so if getB() fails, "foo" will not be committed to the database.
So what you would do in that case is obtain the full description of your set of queries, wrap it into a single value program of type ConnectionIO, and once you want to actually run the transaction, you would do something like program.transact(myTransactor), where myTransactor is an instance of Transactor, a doobie construct that knows how to connect to your physical database.
And once you transact, your ConnectionIO[A] is turned into a value of the transactor's effect type, e.g. IO[A], which you can convert to a Future[A] if you need one. If the transaction failed, you'll have a failed effect, and nothing will be committed to your database.
If your database operations are independent of each other and can be run in parallel, doobie will also allow you to do that. Committing transactions via doobie, both in sequence and in parallel, is quite nicely explained in the docs.
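For comparison, the same idea can be sketched without doobie, using plain JDBC. This is a rough sketch and not the doobie API: inTransaction is a hypothetical helper, and the SQL in the usage comment is a placeholder. Keep in mind that some databases commit DDL statements (such as dropping a column) implicitly, so whether the first action can be rolled back at all depends on the database.

```scala
import java.sql.Connection
import scala.util.control.NonFatal

// Hypothetical helper: run `body` as a single transaction on `conn`.
def inTransaction[A](conn: Connection)(body: Connection => A): A = {
  conn.setAutoCommit(false)   // group the following statements together
  try {
    val result = body(conn)
    conn.commit()             // every statement succeeded
    result
  } catch {
    case NonFatal(e) =>
      conn.rollback()         // undo everything done since the transaction began
      throw e
  } finally {
    conn.setAutoCommit(true)  // restore the default mode
  }
}

// Usage sketch (table and column names are placeholders):
// inTransaction(conn) { c =>
//   val st = c.createStatement()
//   st.executeUpdate("ALTER TABLE some_table DROP COLUMN some_column")
//   st.executeUpdate("DELETE FROM other_table WHERE id = 1")
// }
```

If the call should not block the caller, the whole inTransaction call can be wrapped in a Future, so the Future fails exactly when the transaction was rolled back.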

Related

Why is it not recommended to retrieve value from Scala's Future?

I started working on Scala very recently and came across its feature called Future. I had posted a question for help with my code and got some help from it.
In that conversation, I was told that it is not recommended to retrieve the value from a Future.
I understand that it is a parallel process when executed but if the value of a Future is not recommended to be retrieved, how/when do I access the result of it ? If the purpose of Future is to run a thread/process independent of main thread, why is it that it is not recommended to access it ? Will the Future automatically assign its output to its caller ? If so, how would we know when to access it ?
I wrote the below code to return a Future with a Map[String, String].
def getBounds(incLogIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
incLogIdMap.keys.foreach(table => if(!incLogIdMap(table).contains("INVALID")) {
val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
.option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
boundsMap += (table -> maxTms)
}
)
boundsMap
}
If I have to use the value which is returned from the method getBounds, can I access it in the below way ?
val tmsobj = new MinMaxVals(spark, commonParams)
tmsobj.getBounds(incLogIds) onComplete ({
case Success(Map) => val boundsMap = tmsobj.getBounds(incLogIds)
case Failure(value) => println("Future failed..")
})
Could anyone care to clear my doubts ?
As the others have pointed out, waiting to retrieve a value from a Future defeats the whole point of launching the Future in the first place.
But onComplete() doesn't cause the rest of your code to wait, it just attaches extra instructions to be carried out as part of the Future thread while the rest of your code goes on its merry way.
So what's wrong with your proposed code to access the result of getBounds()? Let's walk through it.
tmsobj.getBounds(incLogIds) onComplete { //launch Future, when it completes ...
case Success(m) => //if Success then store the result Map in local variable "m"
val boundsMap = tmsobj.getBounds(incLogIds) //launch a new and different Future
//boundsMap is a local variable, it disappears after this code block
case Failure(value) => //if Failure then store error in local variable "value"
println("Future failed..") //send some info to STDOUT
}//end of code block
You'll note that I changed Success(Map) to Success(m) because Map is a type (it's a companion object) and can't be used to match the result of your Future.
In conclusion: onComplete() doesn't cause your code to wait on the Future, which is good, but it is somewhat limited because it returns Unit, i.e. it has no return value with which it can communicate the result of the Future.
TLDR; Futures are not meant to manage shared state but they are good for composing asynchronous pieces of code. You can use map, flatMap and many other operations to combine Futures.
The computation that the Future represents will be executed using the given ExecutionContext (usually given implicitly), which will usually be on a thread-pool, so you are right to assume that the Future computation happens in parallel. Because of this concurrency, it is generally not advised to mutate state that is shared from inside the body of the Future, for example:
var i: Int = 0
val f: Future[Unit] = Future {
// Some computation
i = 42
}
Because you then run the risk of also accessing/modifying i in another thread (maybe the "main" one). In this kind of concurrent access situation, Futures would probably not be the right concurrency model, and you could imagine using monitors or message-passing instead.
Another possibility that is tempting but also discouraged is to block the main thread until the result becomes available:
val f: Future[Int] = Future { 42 }
val i: Int = Await.result(f, Duration.Inf)
The reason this is bad is that you will completely block the main thread, annulling the benefits of having concurrent execution in the first place. If you do this too much, you might also run into trouble because of a large number of threads that are blocked and hogging resources.
How do you then know when to access the result? You don't and it's actually the reason why you should try to compose Futures as much as possible, and only subscribe to their onComplete method at the very edge of your application. It's typical for most of your methods to take and return Futures, and only subscribe to them in very specific places.
It is not recommended to wait for a Future using Await.result because this blocks the execution of the current thread until some unknown point in the future, possibly forever.
It is perfectly OK to process the value of a Future by passing a processing function to a call such as map on the Future. This will call your function when the future is complete. The result of map is another Future, which can, in turn, be processed using map, onComplete or other methods.
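A minimal sketch of that style, using only the standard library. fetchBounds is a hypothetical stand-in for a method like getBounds, and the Await at the end is there only so that this demo script does not exit before the Future completes.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a method such as getBounds.
def fetchBounds(): Future[Map[String, String]] =
  Future { Map("orders" -> "10,20") }

// Transform the eventual value without blocking; the result is a new Future.
val summary: Future[String] =
  fetchBounds()
    .map(bounds => bounds.map { case (table, range) => s"$table: $range" }.mkString("; "))
    .recover { case e => s"failed: ${e.getMessage}" }

// Side effect at the edge of the program, still non-blocking.
summary.foreach(println)

// Only for this demo: keep the script alive until the Future completes.
val rendered = Await.result(summary, 5.seconds)
```

The value never has to be pulled out in the middle of the program: each step receives it as a function argument and hands back another Future.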

How to specify the exact execution order of asynchronous calls in Scala unit tests?

I have written many different unit tests for futures in Scala.
All asynchronous calls use an execution context.
To make sure that the asynchronous calls are always executed in the same order, I need to delay some tasks which is rather difficult and slows the tests down.
The executor might still (depending on its implementation) complete some tasks before others.
What is the best way to test concurrent code with a specific execution order? For example, I have the following test case:
"firstSucc" should "complete the final future with the first one" in {
val executor = getExecutor
val util = getUtil
val f0 = util.async(executor, () => 10)
f0.sync
val f1 = util.async(executor, () => { delay(); 11 })
val f = f0.firstSucc(f1)
f.get should be(10)
}
where delay is def delay() = Thread.sleep(4000) and sync synchronizes the future (calls Await.ready(future, Duration.Inf)).
That's how I want to make sure that f0 is already completed and f1 completes AFTER f0. It is not enough that f0 is completed since firstSucc could be shuffling the futures. Therefore, f1 should be delayed until after the check of f.get.
Another idea is to create futures from promises and complete them at a certain point in time:
"firstSucc" should "complete the final future with the first one" in {
val executor = getExecutor
val util = getUtil
val f0 = util.async(executor, () => 10)
val p = getPromise
val f1 = p.future
val f = f0.firstSucc(f1)
f.get should be(10)
p.trySuccess(11)
}
Is there any easier/better approach to define the execution order? Maybe another execution service where one can configure the order of submitted tasks?
For this specific case it might be enough to delay the second future until after the result has been checked but in some cases ALL futures have to be completed but in a certain order.
The complete code can be found here: https://github.com/tdauth/scala-futures-promises
The test case is part of this class: https://github.com/tdauth/scala-futures-promises/blob/master/src/test/scala/tdauth/futuresandpromises/AbstractFutureTest.scala
This question might be related since Scala can use Java Executor Services: Controlling Task execution order with ExecutorService
For most simple cases, I'd say a single threaded executor should be enough - if you start your futures one-by-one, they'll be executed serially, and complete in the same order.
But it looks like your problem is actually more complex than what you are describing: you are not only looking for a way to ensure one future completes later than the other, but in general, to make a sequence of arbitrary events happen in a particular order. For example, the snippet in your question verifies that the second future starts after the first one completes (I have no idea what the delay is for in that case, btw).
You can use eventually to wait for a particular event to occur before continuing:
val f = Future(doSomething)
eventually {
someFlag shouldBe true
}
val f1 = Future(doSomethingElse)
eventually {
f.isCompleted shouldBe true
}
someFlag = false
eventually {
someFlag shouldBe true
}
f1.futureValue shouldBe false
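The promise-based idea from the question can also be sketched with the standard library alone. This is an illustration, not the test suite from the linked repository: Future.firstCompletedOf stands in for the firstSucc method under test, and both futures are driven by promises so the test decides the completion order without any sleeps.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Promises let the test decide exactly when each future completes.
val p0 = Promise[Int]()
val p1 = Promise[Int]()

// Stand-in for the combinator under test.
val first: Future[Int] = Future.firstCompletedOf(Seq(p0.future, p1.future))

p0.success(10)                              // complete the first future
val winner = Await.result(first, 5.seconds) // result is fixed before the second completes
p1.success(11)                              // only now complete the second future

println(winner)
```

Because nothing completes until the test says so, the ordering does not depend on how the executor schedules its tasks.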

Order of execution of Future - Making sequential inserts in a db non-blocking

A simple scenario here. I am using akka streams to read from kafka and write into an external source, in my case: cassandra.
Akka streams(reactive-kafka) library equips me with backpressure and other nifty things to make this possible.
With Kafka as the Source and Cassandra as the Sink, I receive a bunch of events through Kafka, for example Cassandra queries, which are supposed to be executed sequentially (e.g. an INSERT, an UPDATE and a DELETE that must run in that order).
I cannot use mapAsync and execute all the statements, because Future is eager and there is a chance that the DELETE or UPDATE gets executed before the INSERT.
I am forced to use Cassandra's blocking execute as opposed to executeAsync, which is non-blocking.
There may be no way to make a completely async solution to this issue, but is there a more elegant way to do this?
For ex: Make the Future lazy and sequential and offload it to a different execution context of sorts.
mapAsync gives a parallelism option as well.
Can Monix Task be of help here?
This is a general design question about what approaches one can take.
UPDATE:
Flow[In].mapAsync(3)(input => {
input match {
case INSERT => //do insert - returns future
case UPDATE => //do update - returns future
case DELETE => //delete - returns future
}
})
The scenario is a little more complex. There could be thousands of insert, update and delete coming in order for specific key(s)(in kafka)
I would ideally want to execute the 3 futures of a single key in sequence. I believe Monix's Task can help?
If you process things with parallelism of 1, they will get executed in strict sequence, which will solve your problem.
But that's not interesting. If you want, you can run operations for different keys in parallel - if processing for different keys is independent, which, I assume from your description, is possible. To do this, you have to buffer the incoming values and then regroup them. Let's see some code:
import monix.reactive.Observable
import scala.concurrent.duration._
import monix.eval.Task
// Your domain logic - I'll use these stubs
trait Event
trait Acknowledgement // whatever your DB functions return, if you need it
def toKey(e: Event): String = ???
def processOne(event: Event): Task[Acknowledgement] = Task.deferFuture {
event match {
case _ => ??? // insert/update/delete
}
}
// Monix Task.traverse is strictly sequential, which is what you need
def processMany(evs: Seq[Event]): Task[Seq[Acknowledgement]] =
Task.traverse(evs)(processOne)
def processEventStreamInParallel(source: Observable[Event]): Observable[Acknowledgement] =
source
// Process a bunch of events, but don't wait too long for whole 100. Fine-tune for your data source
.bufferTimedAndCounted(2.seconds, 100)
.concatMap { batch =>
Observable
.fromIterable(batch.groupBy(toKey).values) // Standard collection methods FTW
.mapAsync(3)(processMany) // processing up to 3 different keys in parallel - tho 3 is not necessary, probably depends on your DB throughput
.flatMap(Observable.fromIterable) // flattening it back
}
The concatMap operator here will ensure that your chunks are processed sequentially as well. So even if one buffer has key1 -> insert, key1 -> update and the other has key1 -> delete, that causes no problems. In Monix, this is the same as flatMap, but in other Rx libraries flatMap might be an alias for mergeMap which has no ordering guarantee.
This can be done with Futures too, tho there's no standard "sequential traverse", so you have to roll your own, something like:
def processMany(evs: Seq[Event]): Future[Seq[Acknowledgement]] =
evs.foldLeft(Future.successful(Vector.empty[Acknowledgement])){ (acksF, ev) =>
for {
acks <- acksF
next <- processOne(ev)
} yield acks :+ next
}
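Putting the pieces together with plain Futures: the sketch below runs events for the same key strictly in order while different keys proceed in parallel. Event, processOne and processAll here are hypothetical stubs that only record what they would do; a real processOne would issue the Cassandra call.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._

// Hypothetical event type; `op` would be INSERT / UPDATE / DELETE.
final case class Event(key: String, op: String)

// Records the order in which operations actually ran.
val log = new ConcurrentLinkedQueue[Event]()
def processOne(ev: Event): Future[Unit] = Future { log.add(ev); () }

// Sequential traverse: each step starts only after the previous one finished.
def processMany(evs: Seq[Event]): Future[Unit] =
  evs.foldLeft(Future.successful(())) { (done, ev) => done.flatMap(_ => processOne(ev)) }

// Different keys run in parallel; groupBy keeps the per-key input order.
def processAll(evs: Seq[Event]): Future[Unit] =
  Future.traverse(evs.groupBy(_.key).values.toList)(processMany).map(_ => ())

val input = Seq(
  Event("k1", "INSERT"), Event("k2", "INSERT"),
  Event("k1", "UPDATE"), Event("k2", "DELETE"), Event("k1", "DELETE")
)
Await.result(processAll(input), 5.seconds)

// Per-key order as actually executed.
val k1Order = log.asScala.toList.filter(_.key == "k1").map(_.op)
println(k1Order)
```

The interleaving between k1 and k2 is not fixed, but within each key the recorded order matches the input order.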
You can use akka-streams subflows, to group by key, then merge substreams if you want to do something with what you get from your database operations:
def databaseOp(input: In): Future[Out] = input match {
case INSERT => ...
case UPDATE => ...
case DELETE => ...
}
val databaseFlow: Flow[In, Out, NotUsed] =
Flow[In].groupBy(Int.MaxValue, _.key).mapAsync(1)(databaseOp).mergeSubstreams
Note that order from input source won't be kept in output as it is done in mapAsync, but all operations on the same key will still be in order.
You are looking for Future.flatMap:
def doSomething: Future[Unit]
def doSomethingElse: Future[Unit]
val result = doSomething.flatMap { _ => doSomethingElse }
This executes the first function, and then, when its Future is satisfied, starts the second one. The result is a new Future that completes when the result of the second execution is satisfied.
The result of the first future is passed into the function you give to .flatMap, so the second function can depend on the result of the first one. For example:
def getUserID: Future[Int]
def getUser(id: Int): Future[User]
val userName: Future[String] = getUserID.flatMap(getUser).map(_.name)
You can also write this as a for-comprehension:
for {
id <- getUserID
user <- getUser(id)
} yield user.name

Compose stateless futures with side-effecting futures in Scala

While composing futures with a for-yield structure, some with side effects, some without, I introduced a race condition because a future depending on a side effect did not take the result of that side effecting future as an argument.
In short:
future b reads a value that is changed by a side effect from future a, but future a does not explicitly depend on the result of future b and could therefore happen before b finishes reading.
To solve the problem, my colleague introduced a dummy function taking as an argument the result of b and simply throwing it away. This was done to make the dependency explicit.
The actual code is here:
val futureConnection:Future[(Either[String, (Connection)],Boolean)] =
for {
scannerUser <- scanner.orFail("Scanning user not found")
scannedUser <- futureScannedUser.orFail("Scanned user not found")
existsInAnyDirection <- connections.existsInAnyDirection(scannerUser, scannedUser)
connection <- connections.createConnection(scannerUser, scannedUser, form.magicWord, existsInAnyDirection)
} yield {
(connection, existsInAnyDirection)
}
In this case, future b is
connections.existsInAnyDirection(scannerUser, scannedUser)
and future a with the dummy parameter is
connections.createConnection(scannerUser, scannedUser, form.magicWord, existsInAnyDirection)
Notice that the parameter existsInAnyDirection is never used inside createConnection. This effectively creates the dependency graph that createConnection cannot be initiated before existsInAnyDirection is completed.
Now for the question:
Is there a more sane way to make the dependency explicit?
Bonus Info
My own digging tells me, that the Scala Futures simply don't handle side effects very well. The methods on the Future trait that deal with side effects return Unit, whereas there could very well be results to read from a side effecting operation, i.e. error codes, generated ID's, any other meta info, really.
Future handles one kind of side effect: a postponed computation, like A => Future[B].
You tried to mix a few different side effects but composed only one of them, Future[_].
Try choosing a second container, which could be Product or State depending on your side effect, and think in terms of composing side effects (you may need monad transformers). After that, your code could look like this (simplest cases):
for {
scannerUser <- scanner.orFail("Scanning ...")
(scannedUser, magicWord) <- futureScannedUser.orFail("Scanned ...")
connection <- connections.createConnection(scannerUser, scannedUser, magicWord)
} yield {
(connection, existsInAnyDirection)
}
// OR
for {
(scannerUser, state) <- scanner.orFail("Scanning ...")
(scannedUser, nextState) <- futureScannedUser(state).orFail("Scanned ...")
connection <- connections.createConnection(scannerUser, scannedUser, nextState)
} yield {
(connection, existsInAnyDirection)
}

Use only the value from IO monad without precedent IO actions

While doing a home project, I encountered an interesting effect which now seems obvious to me, but I still do not see a way around it.
Here is the gist (I am using ScalaZ, but in Haskell the result would probably be the same):
def askAndReadResponse(question: String): IO[String] = {
putStrLn(question) >> readLn
}
def core: IO[String] = {
val answer: IO[String] = askAndReadResponse("enter something")
val cond: IO[Boolean] = answer map {_.length > 2}
IO.ioMonad.ifM(cond, answer, core)
}
When I try to get an input from core, askAndReadResponse is evaluated twice: once for evaluating the condition, and then again in ifM (so I get the message and readLn one more time than necessary).
What I need - just the validated value (to print it later, for instance)
Is there any elegant way to do this, in particular - to pass further the result of IO, without preceding IO actions, namely avoiding execution of askAndReadResponse twice?
You can sequence the effects using monadic binding with flatMap:
def core: IO[String] = askAndReadResponse("enter something").flatMap {
case response if response.length > 2 => response.point[IO]
case response => core
}
This lets you take the result of one computation (the user entering text after being prompted) and use it in subsequent computations (the calculation about whether to return or loop, and the result if returning).
ifM just isn't going to be useful in your case: it would only work here if your condition and your successful branch were independent computations.
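The difference can be reproduced with a toy effect type. The IO class below is a deliberately tiny stand-in written for this illustration, not ScalaZ's IO, and ask is a hypothetical action that counts how often it runs instead of reading from the console.

```scala
// Toy stand-in for an IO type: a description of an effect that runs on demand.
final case class IO[A](unsafeRun: () => A) {
  def map[B](f: A => B): IO[B] = IO(() => f(unsafeRun()))
  def flatMap[B](f: A => IO[B]): IO[B] = IO(() => f(unsafeRun()).unsafeRun())
}

var prompts = 0
// Hypothetical action: "asks" and returns a fixed answer, counting each run.
val ask: IO[String] = IO { () => prompts += 1; "hello" }

// ifM-style: the condition and the chosen branch each run their own effect.
def ifM[A](cond: IO[Boolean], ifTrue: IO[A], ifFalse: IO[A]): IO[A] =
  cond.flatMap(c => if (c) ifTrue else ifFalse)

val twice = ifM(ask.map(_.length > 2), ask, ask)
twice.unsafeRun()
val promptsWithIfM = prompts      // the effect ran for the condition and for the branch

prompts = 0
// flatMap: run `ask` once and reuse its result in the decision.
val once = ask.flatMap(s => if (s.length > 2) IO(() => s) else ask)
once.unsafeRun()
val promptsWithFlatMap = prompts  // the effect ran a single time
```

An IO value is a description of an action rather than its result, so mentioning it twice means running it twice. Binding it once with flatMap is what lets the result be reused.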