Say I do the following:
def foo: Future[Int] = ...
var cache: Option[Int] = None
def getValue: Future[Int] = synchronized {
cache match {
case Some(value) => Future(value)
case None =>
foo.map { value =>
cache = Some(value)
value
}
}
}
Is there a risk of deadlock with the above code? Or can I assume that the synchronized block applies even within the future's map block?
For a deadlock to exist, at least two different locks must be acquired, possibly in a different order by different threads.
From what you show here (although we do not see the implementation of foo), this is not the case. Only one lock exists and it is reentrant: if the same thread tries to enter a synchronized block on the same object twice, it won't lock itself out.
Therefore, no deadlock is possible from the code you've shown.
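Reentrancy is easy to check directly. The sketch below uses illustrative names (ReentrantDemo, outer and inner are not from the question). A method that already holds the object's monitor calls a second synchronized method on the same object, and the call completes instead of blocking.

```scala
object ReentrantDemo {
  private var counter = 0

  // outer() takes the monitor of ReentrantDemo...
  def outer(): Int = synchronized {
    counter += 1
    inner() // ...and inner() takes the same monitor again from the same thread
  }

  def inner(): Int = synchronized {
    counter += 1
    counter
  }
}
```

A non-reentrant lock would deadlock on the nested call. JVM monitors are reentrant, so ReentrantDemo.outer() simply returns 2.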
Still, I question this design. Maybe it is a simplification of your actual code, but from what I understand, you have
A function that can generate an Int
You want to call this function only once and cache its result
If that's the case, I'd simplify your implementation greatly:
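import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def expensiveComputation(): Int = ???
val foo = Future { expensiveComputation() } // started once, when foo is initialized
def getValue: Future[Int] = foo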
You'd have a single call to expensiveComputation (per instance of your enclosing object) and a thread-safe cache of its return value, because a Future is in and of itself a concurrency-safe construct.
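Below is a runnable sketch of the "Future as a cache" idea. The counter and the value 42 returned by expensiveComputation are assumptions added for illustration. However many times getValue is called, the computation runs once, because every caller receives the same strict val.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureCache {
  // Counts how many times the (hypothetical) expensive computation runs
  val calls = new AtomicInteger(0)

  def expensiveComputation(): Int = {
    calls.incrementAndGet()
    42
  }

  // A strict val: the Future is created once, when FutureCache is initialized
  val foo: Future[Int] = Future { expensiveComputation() }

  // Every caller gets the same Future, so the computation is never repeated
  def getValue: Future[Int] = foo
}
```

The Await calls in the check below are only there to observe the result in a test. Production code would compose on getValue instead.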
Note that Future itself functions as a cache (see GPI's answer). However, GPI's answer isn't quite equivalent to your code: your code will only cache a successful value and will retry, while if the initial call to expensiveComputation in GPI's answer fails, getValue will always fail.
This, however, retries until foo succeeds:
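import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def foo: Future[Int] = ???
// keeps calling foo until one attempt succeeds (never gives up if foo always fails)
private def retryFoo(): Future[Int] = foo.recoverWith { case _ => retryFoo() }
lazy val getValue: Future[Int] = retryFoo()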
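To see the retry behaviour end to end, here is a minimal sketch. It assumes a hypothetical foo that fails on its first two attempts and then succeeds with 42. The attempt counter exists only to make the behaviour observable.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RetryCache {
  val attempts = new AtomicInteger(0)

  // Hypothetical foo: fails on attempts 1 and 2, succeeds with 42 on attempt 3
  def foo: Future[Int] = Future {
    val n = attempts.incrementAndGet()
    if (n < 3) throw new RuntimeException(s"attempt $n failed")
    42
  }

  // On failure, start a fresh attempt; note there is no upper bound on retries
  private def retryFoo(): Future[Int] = foo.recoverWith { case _ => retryFoo() }

  // lazy val: the first access starts the retry chain, later accesses reuse its result
  lazy val getValue: Future[Int] = retryFoo()
}
```

Only successful values end up cached, which matches the question's original intent. In real code you would probably want a retry limit or a backoff.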
In general, the asynchronous parts of a Future (its body and any callbacks, such as the function passed to map) run outside the synchronized block, unless you happen to Await on them within the synchronized block (which rather defeats the point). In your case, it's entirely possible for the following sequence (among many others) to occur:
Initial state: cache = None
Thread A calls getValue, obtains lock
Thread A pattern matches to None, calls foo to get a Future[Int] (fA0), schedules a callback to run in some thread B on fA0's successful completion (fA1)
Thread A releases lock
Thread A returns fA1
Thread C calls getValue, obtains lock
Thread C pattern matches to None, calls foo to get a Future[Int] (fC0), schedules a callback to run in some thread D on fC0's successful completion (fC1)
fA0 completes successfully with value 42
Thread B runs callback on fA0, sets cache = Some(42), completes successfully with value 42
Thread C releases lock
Thread C returns fC1
fC0 completes successfully with value 7
Thread D runs callback on fC0, sets cache = Some(7), completes successfully with value 7
The code above can't deadlock, but there's no guarantee that foo will successfully complete exactly once (it could successfully complete arbitrarily many times), nor is there any guarantee as to which particular value of foo will be returned by a given call to getValue.
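The race can be reproduced deterministically. In this sketch, foo is a hypothetical stand-in backed by Promises that the caller completes by hand. That makes it possible to call getValue twice before either Future finishes. getValue keeps the question's shape. Two things are my additions: @volatile on cache, for visibility, and Future.successful in place of Future(value).

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SynchronizedCacheRace {
  val fooCalls = new AtomicInteger(0)
  // The promises behind each Future returned by foo, so we control completion
  val pending = ListBuffer.empty[Promise[Int]]

  // Hypothetical foo: returns a Future that stays incomplete until we complete it
  def foo: Future[Int] = {
    fooCalls.incrementAndGet()
    val p = Promise[Int]()
    pending.synchronized { pending += p }
    p.future
  }

  @volatile var cache: Option[Int] = None

  def getValue: Future[Int] = synchronized {
    cache match {
      case Some(value) => Future.successful(value)
      case None =>
        foo.map { value =>
          cache = Some(value) // runs later, on another thread, after the lock is released
          value
        }
    }
  }
}
```

Both calls see cache == None, so foo runs twice. Whichever callback runs last decides what ends up in cache (7 in the check below).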
EDIT to add: You could also replace
cache = Some(value)
value
with
synchronized { // lock the enclosing object: cache is reassigned, and None is a JVM-wide singleton
cache = cache.orElse(Some(value))
cache.get
}
This would prevent cache from being assigned more than once (i.e. it would always contain the value returned by the first map callback to execute on a future returned by foo). It probably still wouldn't deadlock (I find that if I have to reason about a deadlock, my time is probably better spent finding a better abstraction), but is this elaborate/verbose machinery better than just using a retry-on-failure Future as a cache?
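Here is a sketch of the "first writer wins" variant. It again uses a hypothetical Promise-backed foo, so that completion order is controlled. The callback locks the enclosing object rather than cache itself, because cache is reassigned and so is not a stable lock.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FirstWriterWins {
  // Promises behind the Futures returned by the hypothetical foo
  val pending = ListBuffer.empty[Promise[Int]]

  def foo: Future[Int] = {
    val p = Promise[Int]()
    pending.synchronized { pending += p }
    p.future
  }

  private var cache: Option[Int] = None
  def cached: Option[Int] = synchronized(cache)

  def getValue: Future[Int] = synchronized {
    cache match {
      case Some(value) => Future.successful(value)
      case None =>
        foo.map { value =>
          // Only the first callback to get here stores its value; later ones reuse it
          synchronized {
            cache = cache.orElse(Some(value))
            cache.get
          }
        }
    }
  }
}
```

foo may still run several times. The difference is that every caller now sees the first stored value (42 below), not the last.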
No, but synchronized isn't actually doing much here. getValue returns almost immediately with a Future (which may or may not be completed yet), so the lock on getValue is extremely short-lived. It does not wait for foo.map to evaluate before releasing the lock, because that is executed only after foo is completed, which will almost certainly happen after getValue returns.
I have 2 futures (2 actions on db tables), and before saving modifications I want to check that both futures have finished successfully.
Right now, I start the second future inside the first (as a dependency), but I know it is not the best option. I know I can use a for-comprehension to execute both futures in parallel, but even if one fails, the other will be executed (not tested yet).
firstFuture.dropColumn(tableName) match {
case Success(_) => secondFuture.deleteEntity(entity)
case Failure(e) => throw new Exception(e.getMessage)
}
// the first future alters a table, drops a column
// the second future deletes a row from another table
In this case, if the first future completes successfully, the second can still fail, and then I want to revert the update made by the first one. I heard about SQL transactions, which seem to be something like that, but how do I use them?
val futuresResult = for {
first <- firstFuture.dropColumn(tableName)
second <- secondFuture.deleteEntity(entity)
} yield (first, second)
A for-comprehension is much better in my case, because there are no dependencies between these two futures and they can be executed in parallel, but it does not solve my problem: the result can be (success, success) or (failed, success), for example.
Regarding Future running sequentially vs in parallel:
This is a bit tricky because Scala Future is designed to be eager. There are some other constructs across various Scala libraries that handle synchronous and asynchronous effects, such as cats IO, Monix Task, ZIO etc. which are designed in a lazy way, and they don't have this behaviour.
The thing with Future being eager is that it will start the computation as soon as it can. Here "start" means schedule it on an ExecutionContext that is either selected explicitly or present implicitly. While it's technically possible that the execution is stalled a bit in case the scheduler decides to do so, it will most likely be started almost instantaneously.
So if you have a value of type Future, it's going to start running then and there. If you have a lazy value of type Future, or a function / method that returns a value of type Future, then it's not.
But even if all you have are simple values (no lazy vals or defs), if the Future definition is done inside the for-comprehension, then it means it's part of a monadic flatMap chain (if you don't understand this, ignore it for now) and it will be run in sequence, not in parallel. Why? This is not specific to Futures; every for-comprehension has the semantics of being a sequential chain in which you can pass the result of the previous step to the next step. So it's only logical that you can't run something in step n + 1 if it depends on something from step n.
Here's some code to demonstrate this.
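import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val program = for {
  _ <- Future { Thread.sleep(5000); println("f1") }
  _ <- Future { Thread.sleep(5000); println("f2") } // only created once the first Future has completed
} yield ()
Await.result(program, Duration.Inf)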
This program will wait five seconds, then print "f1", then wait another five seconds, and then print "f2".
Now let's take a look at this:
val f1 = Future { Thread.sleep(5000); println("f1") }
val f2 = Future { Thread.sleep(5000); println("f2") }
val program = for {
_ <- f1
_ <- f2
} yield ()
Await.result(program, Duration.Inf)
This program, however, will print "f1" and "f2" simultaneously after five seconds.
Note that the sequence semantics are not really violated in the second case. f2 still has the opportunity to use the result of f1. But f2 is not using the result of f1; it's a standalone value that can be computed immediately (defined with a val). So if we change val f2 to a function, e.g. def f2(number: Int), then the execution changes:
val f1 = Future { Thread.sleep(5000); println("f1"); 42 }
def f2(number: Int) = Future { Thread.sleep(5000); println(number) }
val program = for {
number <- f1
_ <- f2(number)
} yield ()
As you would expect, this will print "f1" after five seconds, and only then will the other Future start, so it will print "42" after another five seconds.
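The same eager-val pattern applied to the question's two database actions shows why parallelism alone is not enough. The sketch below uses dropColumn and deleteEntity as hypothetical in-memory stand-ins for the question's methods, with the second one always failing. Because both Futures are created as vals before the for-comprehension, they run concurrently, and the combined Future fails. The first action's effect is not undone, though.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object ParallelButNotAtomic {
  // Records whether the hypothetical "drop column" side effect happened
  val columnDropped = new AtomicBoolean(false)

  // Stand-ins for the two database actions in the question
  def dropColumn(table: String): Future[Unit] = Future { columnDropped.set(true) }

  def deleteEntity(id: Int): Future[Unit] =
    Future { throw new IllegalStateException(s"cannot delete entity $id") }

  def run(): Future[(Unit, Unit)] = {
    // Both Futures are created (and therefore started) here, before the for-comprehension
    val first = dropColumn("some_table")
    val second = deleteEntity(1)
    for {
      a <- first
      b <- second
    } yield (a, b)
  }
}
```

You do learn that something failed, but rolling back the successful action is a job for the database layer, which is where transactions come in.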
Regarding transactions:
As @cbley mentioned in the comment, this sounds like you want database transactions. In SQL databases, for example, this has a very specific meaning and it ensures the ACID properties.
If that's what you need, you need to solve it on the database layer. Future is too generic for that; it's just an effect type that models sync and async computations. When you see a Future value, just by looking at the type, you can't tell if it's the result of a database call or, say, some HTTP call.
For example, doobie describes every database query as a ConnectionIO type. You can have multiple queries lined up in a for-comprehension, just as you would with Future:
val program = for {
a <- database.getA()
_ <- database.write("foo")
b <- database.getB()
} yield {
// use a and b
}
But unlike our earlier examples, here getA() and getB() don't return a value of type Future[A], but ConnectionIO[A]. What's cool about that is that doobie completely takes care of the fact that you probably want these queries to be run in a single transaction, so if getB() fails, "foo" will not be committed to the database.
So what you would do in that case is obtain the full description of your set of queries, wrap it into a single value program of type ConnectionIO, and once you want to actually run the transaction, you would do something like program.transact(myTransactor), where myTransactor is an instance of Transactor, a doobie construct that knows how to connect to your physical database.
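Once you transact, your ConnectionIO[A] is turned into an F[A], where F is the effect type your Transactor was built for (for example cats-effect IO), which you can convert to a Future if you need one. If the transaction failed, the resulting effect fails, and nothing is actually committed to your database.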
If your database operations are independent of each other and can be run in parallel, doobie will also allow you to do that. Committing transactions via doobie, both in sequence and in parallel, is quite nicely explained in the docs.
I have the following code that sets an atomic variable (both java.util.concurrent.atomic and monix.execution.atomic behave the same):
class Foo {
val s = AtomicAny(null: String)
def foo() = {
println("called")
/* Side Effects */
"foo"
}
def get(): String = {
s.compareAndSet(null, foo())
s.get
}
}
val f = new Foo
f.get //Foo.s set from null to foo, print called
f.get //Foo.s not updated, but still print called
The second time it calls compareAndSet, it does not update the value, but foo is still called. This is causing a problem because foo has side effects (in my real code, it creates an Akka actor, and I get an error because it tries to create duplicate actors).
How can I make sure the second parameter is not evaluated unless it is actually used? (Preferably not using synchronized)
I need to pass an implicit parameter to foo, so a lazy val would not work. E.g.
lazy val s = get() //Error cannot provide implicit parameter
def foo()(implicit context: Context) = {
println("called")
/* Side Effects */
"foo"
}
def get()(implicit context: Context): String = {
s.compareAndSet(null, foo())
s.get
}
Updated answer
The quick answer is to put this code inside an actor and then you don't have to worry about synchronisation.
If you are using Akka Actors you should never need to do your own thread synchronisation using low-level primitives. The whole point of the actor model is to limit the interaction between threads to just passing asynchronous messages. This provides all the thread synchronisation that you need and guarantees that an actor processes a single message at a time in a single-threaded manner.
You should definitely not have a function that is accessed simultaneously by multiple threads that creates a singleton actor. Just create the actor when you have the information you need and pass the ActorRef to any other actors that need it using dependency injection or a message. Or create the actor at the start and initialise it when the first message arrives (using context.become to manage the actor state).
Original answer
The simplest solution is just to use a lazy val to hold your instance of foo:
class Foo {
lazy val foo = {
println("called")
/* Side Effects */
"foo"
}
}
This will create foo the first time it is used and after that will just return the same value.
If this is not possible for some reason, use an AtomicInteger initialised to 0 and then call incrementAndGet. If this returns 1 then it is the first pass through this code and you can call foo.
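Here is a minimal sketch of the AtomicInteger approach using java.util.concurrent.atomic. The fooCalls counter stands in for the real side effect. The @volatile value field and the null caveat in the comments are my additions, not part of the answer.

```scala
import java.util.concurrent.atomic.AtomicInteger

class Foo {
  // Counts how many times foo's side effect actually runs
  val fooCalls = new AtomicInteger(0)
  private val passes = new AtomicInteger(0)
  @volatile private var value: String = null

  def foo(): String = {
    fooCalls.incrementAndGet() // stands in for the real side effect
    "foo"
  }

  def get(): String = {
    // Only the caller that moves passes from 0 to 1 runs foo()
    if (passes.incrementAndGet() == 1) value = foo()
    // Caveat: a concurrent caller can reach this line before value is set and see null
    value
  }
}
```

This guarantees foo runs at most once. It does not make other callers wait for the first one to finish, which is one reason the actor-based approach above is preferable.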
Explanation:
Atomic operations such as compareAndSet require support from the CPU instruction set, and modern processors have single atomic instructions for such operations. In some cases (e.g. the cache line is held exclusively by this processor) the operation can be very fast. In other cases (e.g. the cache line is also in the cache of another processor) the operation can be significantly slower and can impact other threads.
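The result is that the CPU must be holding the new value before the atomic instruction is executed, so the value must be computed before it is known whether it is needed or not. At the language level, this is simply how method calls work in Scala: compareAndSet takes its new value as an ordinary by-value parameter, so foo() is evaluated before compareAndSet is even entered, whether or not the swap then succeeds.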
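The eager evaluation is easy to observe with java.util.concurrent.atomic.AtomicReference, which has the same compareAndSet shape as the question's code. EagerArgument and its counter are illustrative names.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object EagerArgument {
  val fooCalls = new AtomicInteger(0)
  val s = new AtomicReference[String](null)

  def foo(): String = {
    fooCalls.incrementAndGet() // the side effect we would like to avoid repeating
    "foo"
  }

  def get(): String = {
    // foo() is evaluated before compareAndSet runs, whether or not the swap succeeds
    s.compareAndSet(null, foo())
    s.get
  }
}
```

After two calls to get(), the stored value is "foo" but fooCalls is 2. The second swap was rejected, yet its argument had already been computed.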
I started working on Scala very recently and came across its feature called Future. I had posted a question asking for help with my code and got some help with it.
In that conversation, I was told that it is not recommended to retrieve the value from a Future.
I understand that a Future runs in parallel when executed, but if retrieving its value is not recommended, how/when do I access its result? If the purpose of a Future is to run a thread/process independent of the main thread, why is it not recommended to access it? Will the Future automatically assign its output to its caller? If so, how would we know when to access it?
I wrote the below code to return a Future with a Map[String, String].
def getBounds(incLogIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
incLogIdMap.keys.foreach(table => if(!incLogIdMap(table).contains("INVALID")) {
val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
.option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
boundsMap += (table -> maxTms)
}
)
boundsMap
}
If I have to use the value returned from the method getBounds, can I access it in the way below?
val tmsobj = new MinMaxVals(spark, commonParams)
tmsobj.getBounds(incLogIds) onComplete ({
case Success(Map) => val boundsMap = tmsobj.getBounds(incLogIds)
case Failure(value) => println("Future failed..")
})
Could anyone clear up my doubts?
As the others have pointed out, waiting to retrieve a value from a Future defeats the whole point of launching the Future in the first place.
But onComplete() doesn't cause the rest of your code to wait. It just registers a callback that runs on the ExecutionContext once the Future completes, while the rest of your code goes on its merry way.
So what's wrong with your proposed code to access the result of getBounds()? Let's walk through it.
tmsobj.getBounds(incLogIds) onComplete { //launch Future, when it completes ...
case Success(m) => //if Success then store the result Map in local variable "m"
val boundsMap = tmsobj.getBounds(incLogIds) //launch a new and different Future
//boundsMap is a local variable, it disappears after this code block
case Failure(value) => //if Failure then store error in local variable "value"
println("Future failed..") //send some info to STDOUT
}//end of code block
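You'll note that I changed Success(Map) to Success(m). In a pattern, an identifier starting with an uppercase letter, such as Map, is not a new variable. It refers to an existing value (here the Map companion object), so the pattern would compare the result against it instead of binding the result of your Future.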
In conclusion: onComplete() doesn't cause your code to wait on the Future, which is good, but it is somewhat limited because it returns Unit, i.e. it has no return value with which it can communicate the result of the Future.
TL;DR: Futures are not meant to manage shared state, but they are good for composing asynchronous pieces of code. You can use map, flatMap and many other operations to combine Futures.
The computation that the Future represents will be executed using the given ExecutionContext (usually given implicitly), which will usually be on a thread pool, so you are right to assume that the Future computation happens in parallel. Because of this concurrency, it is generally not advisable to mutate shared state from inside the body of the Future, for example:
var i: Int = 0
val f: Future[Unit] = Future {
// Some computation
i = 42
}
Because you then run the risk of also accessing/modifying i in another thread (maybe the "main" one). In this kind of concurrent access situation, Futures would probably not be the right concurrency model, and you could imagine using monitors or message-passing instead.
Another possibility that is tempting but also discouraged is to block the main thread until the result becomes available:
val f: Future[Int] = Future { 42 }
val i: Int = Await.result(f, Duration.Inf) // blocks the calling thread until f completes
The reason this is bad is that you completely block the main thread, negating the benefits of having concurrent execution in the first place. If you do this too much, you might also run into trouble because of a large number of blocked threads hogging resources.
How do you then know when to access the result? You don't, and that is precisely why you should try to compose Futures as much as possible and only subscribe to their onComplete method at the very edge of your application. It's typical for most of your methods to take and return Futures, and to subscribe to them only in very specific places.
It is not recommended to wait for a Future using Await.result because this blocks the execution of the current thread until some unknown point in the future, possibly forever.
It is perfectly OK to process the value of a Future by passing a processing function to a call such as map on the Future. This will call your function when the future is complete. The result of map is another Future, which can, in turn, be processed using map, onComplete or other methods.
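Here is a small sketch of that style, applied to something shaped like the question's getBounds. The getBounds below is a hypothetical in-memory stand-in (no Spark or JDBC), and the table names and "1,9" bounds are made up. The value is transformed with map, and onComplete is used once, at the edge.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object ComposeBounds {
  // Hypothetical stand-in for getBounds: table name -> "min,max"
  def getBounds(tables: Seq[String]): Future[Map[String, String]] =
    Future { tables.map(t => t -> "1,9").toMap }

  // map transforms the eventual result without blocking; the result is another Future
  def boundsFor(table: String): Future[Option[String]] =
    getBounds(Seq("a", "b")).map(_.get(table))

  // At the edge of the program, subscribe once instead of blocking
  def printAtEdge(table: String): Unit =
    boundsFor(table).onComplete {
      case Success(bounds) => println(s"$table -> $bounds")
      case Failure(e)      => println(s"Future failed: ${e.getMessage}")
    }
}
```

Callers of boundsFor can keep chaining map or flatMap. Only the outermost layer needs onComplete, or Await in a test.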
My question is probably vague (could not think of how to describe it well) but hopefully this example will make things more clear:
class IntTestFake extends FunSpec with ScalaFutures {
describe("This"){
it("Fails for some reason"){
var a = "Chicken"
val b = "Steak"
def timeout() = Future{
while(a != b){}
}
Future{
Thread.sleep(3000)
a = b
}
whenReady(timeout(), Timeout(20 seconds), Interval(50 milliseconds))(result => result)
}
it("Passes...why?!?"){
var a = "Chicken"
val b = "Steak"
def timeout() = Future{
while(a != b){
println("this works...")
}
}
Future{
Thread.sleep(3000)
a = b
}
whenReady(timeout(), Timeout(20 seconds), Interval(50 milliseconds))(result => result)
}
}
}
In the first test (Fails for some reason) the while loop has an empty body. In the second test (Passes...why?!?) the while loop body has a println statement in it. My original thought was that garbage collection was doing something funky, but with that whenReady statement I am expecting something to be returned, so I would expect GC to leave it alone until then. Apologies if this has already been asked; I could not find an example.
The problem is that the code is reading a var from two threads without warning the compiler that it is going to do this, and this leads to unpredictable behaviour. The compiler does not know that the value of a is going to change under its feet, so it is perfectly allowed to cache that value in a register or some other bit of memory. If it does, that while loop is going to spin forever.
It happens that your first test fails and the second succeeds, but this is a result of the particular compiler and scheduler that you are using, and could be different on a different system.
The solution is to avoid using a shared variable and use a proper synchronisation mechanism. In this case, a Promise would probably do the trick.
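Here is a minimal sketch of the Promise-based alternative, using only scala.concurrent. PromiseSignal and waitForSteak are illustrative names, and the 100 ms delay stands in for the question's Thread.sleep(3000). The writer completes a Promise instead of assigning a shared var, and the reader waits on the Promise's Future instead of spinning.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PromiseSignal {
  def waitForSteak(): Future[String] = {
    val p = Promise[String]()
    // Writer: after a delay, complete the promise (safely publishes the value)
    Future {
      Thread.sleep(100)
      p.success("Steak")
    }
    // Reader: no busy-waiting; this Future completes when the promise does
    p.future
  }
}
```

In the test, the returned Future could be handed straight to whenReady.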
a needs to be @volatile. Without it, writes from other threads are not guaranteed to be visible to the current thread until it hits a "memory barrier" (a special point in the code where all caches are flushed; this is meant in a conceptual sense, as pointed out in the comments, and does not necessarily map directly onto how the hardware of a particular CPU handles it). This is why the second case works: there are plenty of memory barriers inside a println call.
So, changing var a ... to @volatile var a ... will make it work. But, seriously, don't use vars, at least not until you have learned enough Scala to recognize the cases where you genuinely need them.
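Here is a sketch of the @volatile fix outside ScalaTest. Spin and its methods are illustrative names. The writer runs on a plain Thread rather than a second Future, so that it cannot be starved by the spinning Future on a machine with very few cores.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

class Spin {
  // @volatile guarantees that a write by one thread is visible to later reads by others
  @volatile var a: String = "Chicken"
  val b: String = "Steak"

  def waitUntilEqual(): Future[Unit] = Future {
    while (a != b) {} // still a wasteful busy-wait, but it will now observe the write
  }

  def setLater(): Unit =
    new Thread(() => { Thread.sleep(100); a = b }).start()
}
```

The busy-wait still occupies a pool thread for as long as it spins, which is why the Promise-based approach is the better fix.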
I don't get the actual (semantic) difference between the two "expressions".
It is said that "loop" goes with "react" and "while(true)" with "receive", because "react" does not return and "loop" is a function which calls the body over and over again (at least this is what I deduce from the sources; I am not really familiar with the "andThen" used there). "Receive" blocks one thread from the pool; "react" does not. However, for "react" a thread is looked up to which the function can be attached.
So the question is: why can't I use "loop" with "receive"? It also seems to behave differently (and better!) than the "while(true)" variant, at least this is what I observe in a profiler.
Even more strange is that calling a ping-pong with "-Dactors.maxPoolSize=1 -Dactors.corePoolSize=1" with "while(true)" and "receive" blocks immediately (that's what I would expect) - however, with "loop" and "receive", it works without problems - in one Thread - how's this?
Thanks!
The critical difference between while and loop is that while restricts the loop iterations to occur in the same thread. The loop construct (as described by Daniel) enables the actor sub-system to invoke the reactions on any thread it chooses.
Hence using receive within while (true) ties an actor to a single thread. Using loop and react allows you to run many actors on a single thread.
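scala.actors has since been removed from the standard library, so loop and react cannot be reproduced directly here. As a stand-in, here is a sketch of the underlying scheduling idea using a plain java.util.concurrent single-thread executor (loosely analogous to -Dactors.maxPoolSize=1). Each worker does one step and then re-submits its next step as a new task, instead of holding the thread in while(true). That lets two workers interleave on one thread. The gate task only makes the interleaving deterministic.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors, TimeUnit}
import scala.concurrent.ExecutionContext

object LoopOnOneThread {
  // A single worker thread shared by all "actors"
  private val pool = Executors.newSingleThreadExecutor()
  private val ec = ExecutionContext.fromExecutor(pool)

  val log = new ConcurrentLinkedQueue[String]()

  // "loop" style: run one step, then schedule the next step as a fresh task
  def loopStyle(name: String, steps: Int, done: CountDownLatch): Unit =
    if (steps > 0) ec.execute(() => {
      log.add(name)
      done.countDown()
      loopStyle(name, steps - 1, done)
    })

  def runTwoWorkers(): List[String] = {
    val gate = new CountDownLatch(1)
    val done = new CountDownLatch(6)
    // Hold the single thread until both workers have queued their first step
    ec.execute(() => gate.await())
    loopStyle("A", 3, done)
    loopStyle("B", 3, done)
    gate.countDown()
    done.await(5, TimeUnit.SECONDS)
    pool.shutdown()
    log.toArray(Array.empty[String]).toList
  }
}
```

A while(true) worker would never give the thread back, so the second worker would never run. With per-step rescheduling the log alternates A, B, A, B, A, B.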
The method loop is defined in the object Actor:
private[actors] trait Body[a] {
def andThen[b](other: => b): Unit
}
implicit def mkBody[a](body: => a) = new Body[a] {
def andThen[b](other: => b): Unit = self.seq(body, other)
}
/**
* Causes <code>self</code> to repeatedly execute
* <code>body</code>.
*
* @param body the code block to be executed
*/
def loop(body: => Unit): Unit = body andThen loop(body)
This is confusing, but what happens is that the block that comes after loop (the thing between { and }) is passed to the method seq as first argument, and a new loop with that block is passed as the second argument.
As for the method seq, in the trait Actor, we find:
private def seq[a, b](first: => a, next: => b): Unit = {
val s = Actor.self
val killNext = s.kill
s.kill = () => {
s.kill = killNext
// to avoid stack overflow:
// instead of directly executing `next`,
// schedule as continuation
scheduleActor({ case _ => next }, 1)
throw new SuspendActorException
}
first
throw new KillActorException
}
So, the new loop is scheduled for the next action after a kill, then the block gets executed, and then an exception of type KillActorException is thrown, which will cause the loop to be executed again.
So, a while loop performs much faster than a loop, as it throws no exceptions, does no scheduling, etc. On the other hand, with loop the scheduler gets the opportunity to schedule something else between two executions of the body.