I started working on Scala very recently and came across its feature called Future. I had posted a question for help with my code and some help from it.
In that conversation, I was told that it is not recommended to retrieve the value from a Future.
I understand that it is a parallel process when executed but if the value of a Future is not recommended to be retrieved, how/when do I access the result of it ? If the purpose of Future is to run a thread/process independent of main thread, why is it that it is not recommended to access it ? Will the Future automatically assign its output to its caller ? If so, how would we know when to access it ?
I wrote the below code to return a Future with a Map[String, String].
def getBounds(incLogIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
incLogIdMap.keys.foreach(table => if(!incLogIdMap(table).contains("INVALID")) {
val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
.option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
boundsMap += (table -> maxTms)
}
)
boundsMap
}
If I have to use the value which is returned from the method getBounds, can I access it in the below way ?
val tmsobj = new MinMaxVals(spark, commonParams)
tmsobj.getBounds(incLogIds) onComplete ({
case Success(Map) => val boundsMap = tmsobj.getBounds(incLogIds)
case Failure(value) => println("Future failed..")
})
Could anyone care to clear my doubts ?
As the others have pointed out, waiting to retrieve a value from a Future defeats the whole point of launching the Future in the first place.
But onComplete() doesn't cause the rest of your code to wait, it just attaches extra instructions to be carried out as part of the Future thread while the rest of your code goes on its merry way.
So what's wrong with your proposed code to access the result of getBounds()? Let's walk through it.
tmsobj.getBounds(incLogIds) onComplete { //launch Future, when it completes ...
case Success(m) => //if Success then store the result Map in local variable "m"
val boundsMap = tmsobj.getBounds(incLogIds) //launch a new and different Future
//boundsMap is a local variable, it disappears after this code block
case Failure(value) => //if Failure then store error in local variable "value"
println("Future failed..") //send some info to STDOUT
}//end of code block
You'll note that I changed Success(Map) to Success(m) because Map is a type (it's a companion object) and can't be used to match the result of your Future.
In conclusion: onComplete() doesn't cause your code to wait on the Future, which is good, but it is somewhat limited because it returns Unit, i.e. it has no return value with which it can communicate the result of the Future.
TLDR; Futures are not meant to manage shared state but they are good for composing asynchronous pieces of code. You can use map, flatMap and many other operations to combine Futures.
The computation that the Future represents will be executed using the given ExecutionContext (usually given implicitly), which will usually be on a thread-pool, so you are right to assume that the Future computation happens in parallel. Because of this concurrency, it is generally not advised to mutate state that is shared from inside the body of the Future, for example:
var i: Int = 0
val f: Future[Unit] = Future {
// Some computation
i = 42
}
Because you then run the risk of also accessing/modifying i in another thread (maybe the "main" one). In this kind of concurrent access situation, Futures would probably not be the right concurrency model, and you could imagine using monitors or message-passing instead.
Another possibility that is tempting but also discouraged is to block the main thread until the result becomes available:
val f: Future[Init] = Future { 42 }
val i: Int = Await.result(f)
The reason this is bad is that you will completely block the main thread, annealing the benefits of having concurrent execution in the first place. If you do this too much, you might also run in trouble because of a large number of threads that are blocked and hogging resources.
How do you then know when to access the result? You don't and it's actually the reason why you should try to compose Futures as much as possible, and only subscribe to their onComplete method at the very edge of your application. It's typical for most of your methods to take and return Futures, and only subscribe to them in very specific places.
It is not recommended to wait for a Future using Await.result because this blocks the execution of the current thread until some unknown point in the future, possibly forever.
It is perfectly OK to process the value of a Future by passing a processing function to a call such as map on the Future. This will call your function when the future is complete. The result of map is another Future, which can, in turn, be processed using map, onComplete or other methods.
Related
I have 2 futures (2 actions on db tables) and I want that before saving modifications to check if both futures have finished successfully.
Right now, I start second future inside the first (as a dependency), but I know is not the best option. I know I can use a for-comprehension to execute both futures in parallel, but even if one fail, the other will be executed (not tested yet)
firstFuture.dropColumn(tableName) match {
case Success(_) => secondFuture.deleteEntity(entity)
case Failure(e) => throw new Exception(e.getMessage)
}
// the first future alters a table, drops a column
// the second future deletes a row from another table
in this case, if first future is executed successfully, the second can fail. I want to revert the update of first future. I heard about SQL transactions, seems to be something like that, but how?
val futuresResult = for {
first <- firstFuture.dropColumn(tableName)
second <- secondFuture.deleteEntity(entity)
} yield (first, second)
A for-comprehension is much better in my case because I don't have dependencies between these two futures and can be executed in parallel, but this not solve my problem, the result can be (success, success) or (failed, success) for example.
Regarding Future running sequentially vs in parallel:
This is a bit tricky because Scala Future is designed to be eager. There are some other constructs across various Scala libraries that handle synchronous and asynchronous effects, such as cats IO, Monix Task, ZIO etc. which are designed in a lazy way, and they don't have this behaviour.
The thing with Future being eager is that it will start the computation as soon as it is can. Here "start" means schedule it on an ExecutionContext that is either selected explicitly or present implicitly. While it's technically possible that the execution is stalled a bit in case the scheduler decides to do so, it will most likely be started almost instantaneously.
So if you have a value of type Future, it's going to start running then and there. If you have a lazy value of type Future, or a function / method that returns a value of type Future, then it's not.
But even if all you have are simple values (no lazy vals or defs), if the Future definition is done inside the for-comprehension, then it means it's part of a monadic flatMap chain (if you don't understand this, ignore it for now) and it will be run in sequence, not in parallel. Why? This is not specific to Futures; every for-comprehension has the semantics of being a sequential chain in which you can pass the result of the previous step to the next step. So it's only logical that you can't run something in step n + 1 if it depends on something from step n.
Here's some code to demonstrate this.
val program = for {
_ <- Future { Thread.sleep(5000); println("f1") }
_ <- Future { Thread.sleep(5000); println("f2") }
} yield ()
Await.result(program, Duration.Inf)
This program will wait five seconds, then print "f1", then wait another five seconds, and then print "f2".
Now let's take a look at this:
val f1 = Future { Thread.sleep(5000); println("f1") }
val f2 = Future { Thread.sleep(5000); println("f2") }
val program = for {
_ <- f1
_ <- f2
} yield ()
Await.result(program, Duration.Inf)
The program, however, will print "f1" and "f2" simultaneously after five seconds.
Note that the sequence semantics are not really violated in the second case. f2 still has the opportunity to use the result of f1. But f2 is not using the result of f1; it's a standalone value that can be computed immediately (defined with a val). So if we change val f2 to a function, e.g. def f2(number: Int), then the execution changes:
val f1 = Future { Thread.sleep(5000); println("f1"); 42 }
def f2(number: Int) = Future { Thread.sleep(5000); println(number) }
val program = for {
number <- f1
_ <- f2(number)
} yield ()
As you would expect, this will print "f1" after five seconds, and only then will the other Future start, so it will print "42" after another five seconds.
Regarding transactions:
As #cbley mentioned in the comment, this sounds like you want database transactions. For example, in SQL databases this has a very specific meaning and it ensures the ACID properties.
If that's what you need, you need to solve it on the database layer. Future is too generic for that; it's just an effect type that models sync and async computations. When you see a Future value, just by looking at the type, you can't tell if it's the result of a database call or, say, some HTTP call.
For example, doobie describes every database query as a ConnectionIO type. You can have multiple queries lined up in a for-comprehension, just how you would have with Future:
val program = for {
a <- database.getA()
_ <- database.write("foo")
b <- database.getB()
} yield {
// use a and b
}
But unlike our earlier examples, here getA() and getB() don't return a value of type Future[A], but ConnectionIO[A]. What's cool about that is that doobie completely takes care of the fact that you probably want these queries to be run in a single transaction, so if getB() fails, "foo" will not be committed to the database.
So what you would do in that case is obtain the full description of your set of queries, wrap it into a single value program of type ConnectionIO, and once you want to actually run the transaction, you would do something like program.transact(myTransactor), where myTransactor is an instance of Transactor, a doobie construct that knows how to connect to your physical database.
And once you transact, your ConnectionIO[A] is turned into a Future[A]. If the transaction failed, you'll have a failed Future, and nothing will be really committed to your database.
If your database operations are independent of each other and can be run in parallel, doobie will also allow you to do that. Committing transactions via doobie, both in sequence and in parallel, is quite nicely explained in the docs.
Say I do the following:
def foo: Future[Int] = ...
var cache: Option[Int] = None
def getValue: Future[Int] = synchronized {
cache match {
case Some(value) => Future(value)
case None =>
foo.map { value =>
cache = Some(value)
value
}
}
}
Is there a risk of deadlock with the above code? Or can I assume that the synchronzied block applies even within the future map block?
For a deadlock to exist, at least two different lock operations are to be called (in a possibly out of order sequence).
From what you show here (but we do not see what the foo implementation is), this is not the case. Only one lock exist and it is reentrant (if you try to enter twice on the same syncrhronized block from the same thread, you won't lock yourself out).
Therefore, no deadlock is possible from the code you've shown.
Still, I question this design. Maybe it is a simplification of your actual code, but from what I understand, you have
A function that can generate a int
You want to call this function only once and cache its result
I'd simplify your implementation greatly if that's the case :
def expensiveComputation: Int = ???
val foo = Future { expensiveComputation() }
def getValue: Future[Int] = foo
You'd have a single call to expensiveComputation (per instance of your enclosing object), and a synchronized cache on its return value, because Future is in and of itself a concurrency-safe construct.
Note that Future itself functions as a cache (see GPI's answer). However, GPI's answer isn't quite equivalent to your code: your code will only cache a successful value and will retry, while if the initial call to expensiveComputation in GPI's answer fails, getValue will always fail.
This however, gives us retry until successful:
def foo: Future[Int] = ???
private def retryFoo(): Future[Int] = foo.recoverWith{ case _ => retryFoo() }
lazy val getValue: Future[Int] = retryFoo()
In general, anything related to Futures which is asynchronous will not respect the synchronized block, unless you happen to Await on the asynchronous part within the synchronized block (which kind of defeats the point). In your case, it's absolutely possible for the following sequence (among many others) to occur:
Initial state: cache = None
Thread A calls getValue, obtains lock
Thread A pattern matches to None, calls foo to get a Future[Int] (fA0), schedules a callback to run in some thread B on fA0's successful completion (fA1)
Thread A releases lock
Thread A returns fA1
Thread C calls getValue, obtains lock
Thread C patter matches to None, calls foo to get a Future[Int] (fC0), schedules a callback to run in some thread D on fC0's successful completion (fC1)
fA0 completes successfully with value 42
Thread B runs callback on fA0, sets cache = Some(42), completes successfully with value 42
Thread C releases lock
Thread C returns fC1
fC1 completes successfull with value 7
Thread D runs callback on fC0, sets cache = Some(7), completes successfully with value 7
The code above can't deadlock, but there's no guarantee that foo will successfully complete exactly once (it could successfully complete arbitrarily many times), nor is there any guarantee as to which particular value of foo will be returned by a given call to getValue.
EDIT to add: You could also replace
cache = Some(value)
value
with
cache.synchronized { cache = cache.orElse(Some(value)) }
cache.get
Which would prevent cache from being assigned to multiple times (i.e. it would always contain the value returned by the first map callback to execute on a future returned by foo). It probably still wouldn't deadlock (I find that if I have to reason about a deadlock, my time is probably better spent reasoning about a better abstraction), but is this elaborate/verbose machinery better than just using a retry-on-failure Future as a cache?
No, but synchronized isn't actually doing much here. getValue returns almost immediately with a Future (which may or may not be completed yet), so the lock on getValue is extremely short-lived. It does not wait for foo.map to evaluate before releasing the lock, because that is executed only after foo is completed, which will almost certainly happen after getValue returns.
I was reading this article http://danielwestheide.com/blog/2013/01/16/the-neophytes-guide-to-scala-part-9-promises-and-futures-in-practice.html and I was looking at this code:
object Government {
def redeemCampaignPledge(): Future[TaxCut] = {
val p = Promise[TaxCut]()
Future {
println("Starting the new legislative period.")
Thread.sleep(2000)
p.success(TaxCut(20))
println("We reduced the taxes! You must reelect us!!!!1111")
}
p.future
}
}
I've seen this type of code a few times and I'm confused. So we have this Promise:
val p = Promise[TaxCut]()
And this Future:
Future {
println("Starting the new legislative period.")
Thread.sleep(2000)
p.success(TaxCut(20))
println("We reduced the taxes! You must reelect us!!!!1111")
}
I don't see any assignment between them so I don't understand: How are they connected?
I don't see any assignment between them so I don't understand: How are
they connected?
A Promise is a one way of creating a Future.
When you use Future { } and import scala.concurrent.ExecutionContext.Implicits.global, you're queuing a function on one of Scala's threadpool threads. But, that isn't the only way to generate a Future. A Future need not necessarily be scheduled on a different thread.
What this example does is:
Creates a Promise[TaxCut] which will be completed sometime in the near future.
Queues a function to be ran inside a threadpool thread via the Future apply. This function also completes the Promise via the Promise.success method
Returns the future generated by the promise via Promise.future. When this future returns, it may not be completed yet, depending on how fast the execution of the function queued to the Future really runs (the OP was trying to convey this via the Thread.sleep method, delaying the completion of the future).
Im using Play and have an action in which I want to do two things:-
firstly check my cache for a value
secondly, call a web service with the value
Since WS API returns a Future, I'm using Action.async.
My Redis cache module also returns a Future.
Assume I'm using another ExecutionContext appropriately for the potentially long running tasks.
Q. Can someone confirm if I'm on the right track by doing the following. I know I have not catered for the Exceptional cases in the below - just keeping it simple for brevity.
def token = Action.async { implicit request =>
// 1. Get Future for read on cache
val cacheFuture = scala.concurrent.Future {
cache.get[String](id)
}
// 2. Map inside cache Future to call web service
cacheFuture.map { result =>
WS.url(url).withQueryString("id" -> result).get().map { response =>
// process response
Ok(responseData)
}
}
}
My concern is that this may not be the most efficient way of doing things because I assume different threads may handle the task of completing each of the Futures.
Any recommendations for a better approach are greatly appreciated.
That's not specific to Play. I suggest you have a look at documentations explaining how Futures work.
val x: Future[FutureOp2ResType] = futureOp1(???).flatMap { res1 => futureOp2(res1, ???) }
Or with for-comprehension
val x: Future[TypeOfRes] = for {
res1 <- futureOp1(???)
res2 <- futureOp2(res1, ???)
// ...
} yield res
As for how the Futures are executed (using threads), it depends on which ExecutionContext you use (e.g. the global one, the Play one, ...).
WS.get returning a Future, it should not be called within cacheFuture.map, or it will returns a Future[Future[...]].
Given that we must avoid...
1) Modifying state
2) Blocking
...what is a correct end-to-end usage for a Future?
The general practice in using Futures seems to be transforming them into other Futures by using map, flatMap etc. but it's no good creating Futures forever.
Will there always be a call to onComplete somewhere, with methods writing the result of the Future to somewhere external to the application (e.g. web socket; the console; a message broker) or is there a non-blocking way of accessing the result?
All of the information on Futures in the Scaladocs - http://docs.scala-lang.org/overviews/core/futures.html seem to end up writing to the console. onComplete doesn't return anything, so presumably we have to end up doing some "fire-and-forget" IO.
e.g. a call to println
f onComplete {
case Success(number) => println(number)
case Failure(err) => println("An error has occured: " + err.getMessage)
}
But what about in more complex cases where we want to do more with the result of the Future?
As an example, in the Play framework Action.async can return a Future[Result] and the framework handles the rest. Will it eventually have to expect never to get a result from the Future?
We know the user needs to be returned a Result, so how can a framework do this using only a Unit method?
Is there a non-blocking way to retrieve the value of a future and use it elsewhere within the application, or is a call to Await inevitable?
Best practice is to use callbacks such as onComplete, onSuccess, onFailure for side effecting operations, e.g. logging, monitoring, I/O.
If you need the continue with the result of of your Future computation as opposed to do a side-effecting operation, you should use map to get access to the result of your computation and compose over it.
Future returns a unit, yes. That's because it's an asynchronous trigger. You need to register a callback in order to gather the result.
From your referenced scaladoc (with my comments):
// first assign the future with expected return type to a variable.
val f: Future[List[String]] = Future {
session.getRecentPosts
}
// immediately register the callbacks
f onFailure {
case t => println("An error has occurred: " + t.getMessage)
}
f onSuccess {
case posts => for (post <- posts) println(post)
}
Or instead of println-ing you could do something with the result:
f onSuccess {
case posts: List[String] => someFunction(posts)
}
Try this out:
import scala.concurrent.duration._
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
val f: Future[Int] = Future { 43 }
val result: Int = Await.result(f, 0 nanos)
So what is going on here?
You're defining a computation to be executed on a different thread.
So you Future { 43 } returns immediately.
Then you can wait for it and gather the result (via Await.result) or define computation on it without waiting for it to be completed (via map etc...)
Actually, the kind of Future you are talking about are used for side-effects. The result returned by a Future depends its type :
val f = Future[Int] { 42 }
For example, I could send the result of Future[Int] to another Future :
val f2 = f.flatMap(integer => Future{ println(integer) }) // may print 42
As you know, a future is a process that happens concurrently. So you can get its result in the future (that is, using methods such as onComplete) OR by explicitly blocking the current thread until it gets a value :
import scala.concurrent.Await
import akka.util.Timeout
import scala.concurrent.duration._
implicit val timeout = Timeout(5 seconds)
val integer = Await.result(Future { 42 }, timeout.duration)
Usually when you start dealing with asynchronous processes, you have to think in terms of reactions which may never occur. Using chained Futures is like declaring a possible chain of events which could be broken at any moment. Therefore, waiting for a Future's value is definitely not a good practice as you may never get it :
val integer = Await.result(Future { throw new RuntimeException() }, timeout.duration) // will throw an uncaught exception
Try to think more in terms of events, than in procedures.