Scala async/await and parallelization

I'm learning about the uses of async/await in Scala. I have read this in https://github.com/scala/async
Theoretically this code is asynchronous (non-blocking), but it's not parallelized:
def slowCalcFuture: Future[Int] = ...
def combined: Future[Int] = async {
await(slowCalcFuture) + await(slowCalcFuture)
}
val x: Int = Await.result(combined, 10.seconds)
whereas this other one is parallelized:
def combined: Future[Int] = async {
val future1 = slowCalcFuture
val future2 = slowCalcFuture
await(future1) + await(future2)
}
The only difference between them is the use of intermediate variables.
How can this affect the parallelization?

Since it's similar to async & await in C#, maybe I can provide some insight. In C#, the general rule is that a Task that can be awaited should be returned 'hot', i.e. already running. I assume it's the same in Scala, where the Future returned from a function does not have to be started explicitly but is already 'running' once the function has been called. If that's not the case, the following is pure (and probably untrue) speculation.
Let's analyze the first case:
async {
await(slowCalcFuture) + await(slowCalcFuture)
}
We get to that block and hit the first await:
async {
await(slowCalcFuture) + await(slowCalcFuture)
^^^^^
}
Ok, so we're asynchronously waiting for that calculation to finish. When it's finished, we 'move on' with analyzing the block:
async {
await(slowCalcFuture) + await(slowCalcFuture)
                        ^^^^^
}
This is the second await, so we're asynchronously waiting for the second calculation to finish. After that's done, we can calculate the final result by adding the two integers.
As you can see, we're moving step by step through the awaits, awaiting the Futures one by one as they come.
Let's take a look at the second example:
async {
val future1 = slowCalcFuture
val future2 = slowCalcFuture
await(future1) + await(future2)
}
OK, so here's what (probably) happens:
async {
val future1 = slowCalcFuture // >> first future is started, but not awaited
val future2 = slowCalcFuture // >> second future is started, but not awaited
await(future1) + await(future2)
^^^^^
}
Then we're awaiting the first Future, but both of the futures are currently running. When the first one returns, the second might have already completed (so we will have the result available at once) or we might have to wait for a little bit longer.
Now it's clear that the second example runs the two calculations in parallel and then waits for both of them to finish. When both are ready, it returns. The first example runs the calculations in a non-blocking way, but sequentially.

The answer by Patryk is correct, if a little difficult to follow. The main thing to understand about async/await is that it's just another way of writing Future's flatMap. There's no concurrency magic behind the scenes. All the calls inside an async block are sequential, including await, which doesn't actually block the executing thread but rather wraps the rest of the async block in a closure and passes it as a callback to run on completion of the Future we're waiting on. So in the first piece of code the second calculation doesn't start until the first await has completed, since no one started it prior to that.
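The flatMap view can be made concrete without the scala-async macro. The sketch below is only an illustration under stated assumptions, not the code the macro actually generates: slowCalcFuture is a stand-in that sleeps for 300 ms, and sequential, parallel and timed are hypothetical helper names. It writes both variants from the question as explicit flatMap/map chains and times them.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialVsParallel {
  // Stand-in for the question's slow calculation (assumption: 300 ms of work).
  def slowCalcFuture: Future[Int] = Future {
    blocking(Thread.sleep(300))
    21
  }

  // Roughly what `await(slowCalcFuture) + await(slowCalcFuture)` amounts to:
  // the second future is only created inside the first one's callback.
  def sequential: Future[Int] =
    slowCalcFuture.flatMap(a => slowCalcFuture.map(b => a + b))

  // Both futures are created (and therefore started) before either is awaited.
  def parallel: Future[Int] = {
    val future1 = slowCalcFuture
    val future2 = slowCalcFuture
    future1.flatMap(a => future2.map(b => a + b))
  }

  // Blocks only in order to measure; returns the result and elapsed milliseconds.
  def timed[A](f: => Future[A]): (A, Long) = {
    val start = System.nanoTime()
    val result = Await.result(f, 10.seconds)
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    println(s"sequential: ${timed(sequential)}")
    println(s"parallel:   ${timed(parallel)}")
  }
}
```

When the execution context can run both futures at once, the sequential variant should take roughly twice as long as the parallel one.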

In the first case you start a slow future and wait for it in a single call, so the second slow future is only invoked after the first one is complete.
In the second case, when val future1 = slowCalcFuture is called, it effectively takes a thread from the thread pool, passes the body of slowCalcFuture to that thread and says "execute it, please". This takes only as long as is needed to get a thread from the pool and hand it the function, which can be considered instant. So, because val future1 = slowCalcFuture translates into "get a thread and hand over the function", it completes in no time and the next line, val future2 = slowCalcFuture, is executed without any delay. Future 2 is therefore scheduled for execution without any delay too.
The fundamental difference between val future1 = slowCalcFuture and await(slowCalcFuture) is the same as between asking somebody to make you coffee and waiting for your coffee to be ready. Asking takes 2 seconds, the time needed to say "could you make me coffee, please?", but waiting for the coffee to be ready takes 4 minutes.
Possible modification of this task could be waiting for 1st available answer. For example, you want to connect to any server in a cluster. You issue requests to connect to every server you know, and the first one which responds, will be your server. You could do this with:
Future.firstCompletedOf(Array(slowCalcFuture, slowCalcFuture))
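As a small, hedged illustration of that idea: in the sketch below, server is a hypothetical stand-in for "connect to one server in the cluster" that merely sleeps for a given time and returns its name. The names and delays are made up; only Future.firstCompletedOf comes from the standard library.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FirstCompleted {
  // Hypothetical "connect to a server" call: answers after delayMs milliseconds.
  def server(name: String, delayMs: Long): Future[String] = Future {
    blocking(Thread.sleep(delayMs))
    name
  }

  // Both requests are issued immediately; the returned future completes
  // with whichever of them finishes first.
  def fastest: Future[String] =
    Future.firstCompletedOf(Seq(server("slow", 500), server("fast", 50)))

  def main(args: Array[String]): Unit =
    println(Await.result(fastest, 5.seconds))
}
```

Note that the slower future is not cancelled; it keeps running and its result is simply ignored.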

Related

Monix Task.sleep and single thread execution

I am trying to comprehend Task scheduling principles in Monix.
The following code (source: https://slides.com/avasil/fp-concurrency-scalamatsuri2019#/4/3) produces only '1's, as expected.
val s1: Scheduler = Scheduler(
ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor()),
ExecutionModel.SynchronousExecution)
def repeat(id: Int): Task[Unit] =
Task(println(s"$id ${Thread.currentThread().getName}")) >> repeat(id)
val prog: Task[(Unit, Unit)] = (repeat(1), repeat(2)).parTupled
prog.runToFuture(s1)
// Output:
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// ...
When we add Task.sleep to the repeat method
def repeat(id: Int): Task[Unit] =
Task(println(s"$id ${Thread.currentThread().getName}")) >>
Task.sleep(1.millis) >> repeat(id)
the output changes to
// Output
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// ...
Both tasks are now executed concurrently on a single thread! Nice :)
Some cooperative yielding has kicked in. What happened here exactly? Thanks :)
EDIT: same happens with Task.shift instead of Task.sleep.

I'm not sure if that's the answer you're looking for, but here it goes:
Although the naming suggests otherwise, Task.sleep cannot be compared to more conventional methods like Thread.sleep.
Task.sleep does not actually occupy a thread, but instead simply instructs the scheduler to run a callback after the specified time has elapsed.
Here's a little code snippet from monix/TaskSleep.scala for comparison:
[...]
implicit val s = ctx.scheduler
val c = TaskConnectionRef()
ctx.connection.push(c.cancel)
c := ctx.scheduler.scheduleOnce(
timespan.length,
timespan.unit,
new SleepRunnable(ctx, cb)
)
[...]
private final class SleepRunnable(ctx: Context, cb: Callback[Throwable, Unit]) extends Runnable {
def run(): Unit = {
ctx.connection.pop()
// We had an async boundary, as we must reset the frame
ctx.frameRef.reset()
cb.onSuccess(())
}
}
[...]
During the period before the callback (here: cb) is executed, your single-threaded scheduler (here: ctx.scheduler) can simply use its thread for whatever computation is queued next.
This also explains why this approach is preferable: we don't block threads during the sleep intervals, so fewer computation cycles are wasted.
Hope this helps.

To expand on Markus's answer.
As a mental model (for illustration purposes), you can imagine the thread pool as a stack. Since you only have a single-threaded executor, it will try to run repeat1 first and then repeat2.
Internally, everything is just a giant flatMap. The run loop will schedule all the tasks based on the execution model.
What happens is that sleep schedules a runnable on the thread pool. It pushes the runnable (repeat1) to the top of the stack, giving repeat2 the chance to run. The same thing then happens with repeat2.
Note that, by default, Monix's execution model inserts an async boundary every 1024 flatMaps.
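The scheduling behaviour described in these answers can be imitated with nothing but the JDK. The following is a sketch under loud assumptions: it does not use Monix and is not how Monix is implemented. It only shows the same principle, namely that a "sleep" which schedules a callback and returns leaves the single thread free for the other loop. CooperativeSleep, interleave and runnable are hypothetical names.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

object CooperativeSleep {
  // Runs two self-rescheduling loops on ONE thread and returns the ids in
  // the order in which they were logged.
  def interleave(iterations: Int): List[Int] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val log = ListBuffer.empty[Int] // only touched by the scheduler thread
    val done = new CountDownLatch(2)

    def runnable(body: => Unit): Runnable = new Runnable {
      def run(): Unit = body
    }

    def repeat(id: Int, remaining: Int): Unit =
      if (remaining == 0) done.countDown()
      else {
        log += id
        // "Sleep": hand the rest of the loop to the scheduler and return,
        // which frees the thread for whatever is queued next.
        scheduler.schedule(runnable(repeat(id, remaining - 1)), 1L, TimeUnit.MILLISECONDS)
      }

    scheduler.execute(runnable(repeat(1, iterations)))
    scheduler.execute(runnable(repeat(2, iterations)))
    done.await(5, TimeUnit.SECONDS) // wait until both loops have finished
    scheduler.shutdown()
    log.toList
  }

  def main(args: Array[String]): Unit =
    println(interleave(5).mkString(" "))
}
```

If the delayed schedule call were replaced by a direct recursive call, the first loop would never give up the thread, which corresponds to the output consisting only of 1s in the question.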

Why is it not recommended to retrieve value from Scala's Future?

I started working with Scala very recently and came across its feature called Future. I had posted a question asking for help with my code and got some help from it.
In that conversation, I was told that it is not recommended to retrieve the value from a Future.
I understand that it runs in parallel when executed, but if retrieving the value of a Future is not recommended, how and when do I access its result? If the purpose of a Future is to run a thread/process independently of the main thread, why is it not recommended to access it? Will the Future automatically assign its output to its caller? If so, how would we know when to access it?
I wrote the below code to return a Future with a Map[String, String].
def getBounds(incLogIdMap:scala.collection.mutable.Map[String, String]): Future[scala.collection.mutable.Map[String, String]] = Future {
var boundsMap = scala.collection.mutable.Map[String, String]()
incLogIdMap.keys.foreach(table => if(!incLogIdMap(table).contains("INVALID")) {
val minMax = s"select max(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) maxTms, min(cast(to_char(update_tms,'yyyyddmmhhmmss') as bigint)) minTms from queue.${table} where key_ids in (${incLogIdMap(table)})"
val boundsDF = spark.read.format("jdbc").option("url", commonParams.getGpConUrl()).option("dbtable", s"(${minMax}) as ctids")
.option("user", commonParams.getGpUserName()).option("password", commonParams.getGpPwd()).load()
val maxTms = boundsDF.select("minTms").head.getLong(0).toString + "," + boundsDF.select("maxTms").head.getLong(0).toString
boundsMap += (table -> maxTms)
}
)
boundsMap
}
If I have to use the value which is returned from the method getBounds, can I access it in the below way ?
val tmsobj = new MinMaxVals(spark, commonParams)
tmsobj.getBounds(incLogIds) onComplete ({
case Success(Map) => val boundsMap = tmsobj.getBounds(incLogIds)
case Failure(value) => println("Future failed..")
})
Could anyone care to clear my doubts ?

As the others have pointed out, waiting to retrieve a value from a Future defeats the whole point of launching the Future in the first place.
But onComplete() doesn't cause the rest of your code to wait, it just attaches extra instructions to be carried out as part of the Future thread while the rest of your code goes on its merry way.
So what's wrong with your proposed code to access the result of getBounds()? Let's walk through it.
tmsobj.getBounds(incLogIds) onComplete { //launch Future, when it completes ...
case Success(m) => //if Success then store the result Map in local variable "m"
val boundsMap = tmsobj.getBounds(incLogIds) //launch a new and different Future
//boundsMap is a local variable, it disappears after this code block
case Failure(value) => //if Failure then store error in local variable "value"
println("Future failed..") //send some info to STDOUT
}//end of code block
You'll note that I changed Success(Map) to Success(m). In a pattern, the capitalized Map refers to the Map companion object rather than binding a variable, so Success(Map) would only match a result equal to that object and never your actual result map.
In conclusion: onComplete() doesn't cause your code to wait on the Future, which is good, but it is somewhat limited because it returns Unit, i.e. it has no return value with which it can communicate the result of the Future.
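Because onComplete returns Unit, a common alternative is to transform the Future with map and recover and hand the resulting Future on to the caller. A minimal sketch, in which getBounds is replaced by a hypothetical in-memory stand-in (no Spark or JDBC involved) and summarize is an invented helper name:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ComposeInsteadOfOnComplete {
  // Hypothetical stand-in for the question's getBounds: drops INVALID entries.
  def getBounds(ids: Map[String, String]): Future[Map[String, String]] = Future {
    ids.filter { case (_, keys) => !keys.contains("INVALID") }
  }

  // map transforms the eventual result; recover turns a failure into a value.
  // Nothing blocks here, and the caller receives a Future it can compose further.
  def summarize(ids: Map[String, String]): Future[String] =
    getBounds(ids)
      .map(bounds => s"${bounds.size} table(s) with valid bounds")
      .recover { case e: Exception => s"Future failed: ${e.getMessage}" }

  def main(args: Array[String]): Unit = {
    val summary = summarize(Map("orders" -> "1,2", "users" -> "INVALID"))
    // Blocking only at the very edge of this demo program.
    println(Await.result(summary, 5.seconds))
  }
}
```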

TL;DR: Futures are not meant to manage shared state, but they are good for composing asynchronous pieces of code. You can use map, flatMap and many other operations to combine Futures.
The computation that the Future represents will be executed using the given ExecutionContext (usually given implicitly), which will usually be on a thread-pool, so you are right to assume that the Future computation happens in parallel. Because of this concurrency, it is generally not advised to mutate state that is shared from inside the body of the Future, for example:
var i: Int = 0
val f: Future[Unit] = Future {
// Some computation
i = 42
}
Because you then run the risk of also accessing/modifying i in another thread (maybe the "main" one). In this kind of concurrent access situation, Futures would probably not be the right concurrency model, and you could imagine using monitors or message-passing instead.
Another possibility that is tempting but also discouraged is to block the main thread until the result becomes available:
val f: Future[Int] = Future { 42 }
val i: Int = Await.result(f, Duration.Inf) // Duration comes from scala.concurrent.duration
The reason this is bad is that you will completely block the main thread, negating the benefits of having concurrent execution in the first place. If you do this too much, you might also run into trouble because of a large number of threads that are blocked and hogging resources.
How do you then know when to access the result? You don't and it's actually the reason why you should try to compose Futures as much as possible, and only subscribe to their onComplete method at the very edge of your application. It's typical for most of your methods to take and return Futures, and only subscribe to them in very specific places.

It is not recommended to wait for a Future using Await.result because this blocks the execution of the current thread until some unknown point in the future, possibly forever.
It is perfectly OK to process the value of a Future by passing a processing function to a call such as map on the Future. This will call your function when the future is complete. The result of map is another Future, which can, in turn, be processed using map, onComplete or other methods.

How to specify the exact execution order of asynchronous calls in Scala unit tests?

I have written many different unit tests for futures in Scala.
All asynchronous calls use an execution context.
To make sure that the asynchronous calls are always executed in the same order, I need to delay some tasks which is rather difficult and slows the tests down.
The executor might still (depending on its implementation) complete some tasks before others.
What is the best way to test concurrent code with a specific execution order? For example, I have the following test case:
"firstSucc" should "complete the final future with the first one" in {
val executor = getExecutor
val util = getUtil
val f0 = util.async(executor, () => 10)
f0.sync
val f1 = util.async(executor, () => { delay(); 11 })
val f = f0.firstSucc(f1)
f.get should be(10)
}
where delay is def delay() = Thread.sleep(4000) and sync synchronizes the future (calls Await.ready(future, Duration.Inf)).
That's how I want to make sure that f0 is already completed and f1 completes AFTER f0. It is not enough that f0 is completed since firstSucc could be shuffling the futures. Therefore, f1 should be delayed until after the check of f.get.
Another idea is to create futures from promises and complete them at a certain point in time:
"firstSucc" should "complete the final future with the first one" in {
val executor = getExecutor
val util = getUtil
val f0 = util.async(executor, () => 10)
val p = getPromise
val f1 = p.future
val f = f0.firstSucc(f1)
f.get should be(10)
p.trySuccess(11)
}
Is there any easier/better approach to define the execution order? Maybe another execution service where one can configure the order of submitted tasks?
For this specific case it might be enough to delay the second future until after the result has been checked but in some cases ALL futures have to be completed but in a certain order.
The complete code can be found here: https://github.com/tdauth/scala-futures-promises
The test case is part of this class: https://github.com/tdauth/scala-futures-promises/blob/master/src/test/scala/tdauth/futuresandpromises/AbstractFutureTest.scala
This question might be related since Scala can use Java Executor Services: Controlling Task execution order with ExecutorService

For most simple cases, I'd say a single threaded executor should be enough - if you start your futures one-by-one, they'll be executed serially, and complete in the same order.
But it looks like your problem is actually more complex than what you are describing: you are not only looking for a way to ensure that one future completes later than the other but, more generally, to make a sequence of arbitrary events happen in a particular order. For example, the snippet in your question verifies that the second future starts after the first one completes (I have no idea what the delay is for in that case, by the way).
You can use ScalaTest's eventually to wait for a particular event to occur before continuing:
val f = Future(doSomething)
eventually {
someFlag shouldBe true
}
val f1 = Future(doSomethingElse)
eventually {
f.isCompleted shouldBe true
}
someFlag = false
eventually {
someFlag shouldBe true
}
f1.futureValue shouldBe false
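Another way to pin down the order without delays is the one already hinted at in the question: drive every future from a Promise and complete the promises explicitly, in the order the test needs. A minimal sketch using only the standard library; Future.firstCompletedOf stands in for the question's firstSucc (an assumption, since the two are not the same method), and PromiseOrdering and firstOf are hypothetical names.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PromiseOrdering {
  // Returns the combined result and whether the late completion was accepted.
  def firstOf(): (Int, Boolean) = {
    val p0 = Promise[Int]()
    val p1 = Promise[Int]()
    val first = Future.firstCompletedOf(Seq(p0.future, p1.future))

    p0.success(10)                            // step 1: complete f0
    val result = Await.result(first, 5.seconds)
    val lateAccepted = p1.trySuccess(11)      // step 2: complete f1 afterwards
    (result, lateAccepted)
  }

  def main(args: Array[String]): Unit =
    println(firstOf())
}
```

Since no future completes until the test completes its promise, the order is decided by the test code itself rather than by the executor.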

Future declaration seems independent from promise

I was reading this article http://danielwestheide.com/blog/2013/01/16/the-neophytes-guide-to-scala-part-9-promises-and-futures-in-practice.html and I was looking at this code:
object Government {
def redeemCampaignPledge(): Future[TaxCut] = {
val p = Promise[TaxCut]()
Future {
println("Starting the new legislative period.")
Thread.sleep(2000)
p.success(TaxCut(20))
println("We reduced the taxes! You must reelect us!!!!1111")
}
p.future
}
}
I've seen this type of code a few times and I'm confused. So we have this Promise:
val p = Promise[TaxCut]()
And this Future:
Future {
println("Starting the new legislative period.")
Thread.sleep(2000)
p.success(TaxCut(20))
println("We reduced the taxes! You must reelect us!!!!1111")
}
I don't see any assignment between them so I don't understand: How are they connected?

"I don't see any assignment between them so I don't understand: How are they connected?"
A Promise is one way of creating a Future.
When you use Future { } and import scala.concurrent.ExecutionContext.Implicits.global, you're queuing a function on one of Scala's threadpool threads. But, that isn't the only way to generate a Future. A Future need not necessarily be scheduled on a different thread.
What this example does is:
1. Creates a Promise[TaxCut], which will be completed sometime in the near future.
2. Queues a function to be run on a thread-pool thread via Future.apply. This function also completes the Promise via the Promise.success method.
3. Returns the future generated by the promise via Promise.future. When this future is returned, it may not be completed yet, depending on how fast the function queued via Future.apply actually runs (the OP was trying to convey this via Thread.sleep, which delays the completion of the future).
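The link between the two can also be observed on a single thread, without any Future { } block at all. The sketch below (PromiseLink and demo are hypothetical names) shows that p.future is the read side of the promise and that it becomes completed at the moment p.success is called:

```scala
import scala.concurrent.{Future, Promise}
import scala.util.Try

object PromiseLink {
  // Returns the future's state before and after the promise is completed.
  def demo(): (Boolean, Boolean, Option[Try[Int]]) = {
    val p = Promise[Int]()
    val f: Future[Int] = p.future        // read side of the same promise
    val completedBefore = f.isCompleted  // nothing has been written yet
    p.success(20)                        // write side: completes f
    (completedBefore, f.isCompleted, f.value)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

In the article's code the only difference is that p.success is called from inside a function queued on another thread, so the caller receives p.future before it is completed.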

Is there any point in blocking for a future?

Suppose I have an application serving many requests. One of the requests takes a while to complete. I have the following code:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.Await
import scala.concurrent.Future
def longReq(data: String): String = {
val respFuture = Future {
// some code that computes resp but takes a long time
// can my application process other requests during this time?
val resp: String = ??? // time-consuming step
resp
}
Await.result(respFuture, 2.minutes)
}
If I don't use futures at all, the application will be blocked until resp is computed and no other requests can be served in parallel during that time. However, if I use futures and then block for resp using Await, will the application be able to serve other requests in parallel while resp is being computed?

In your particular example, assuming that longReq is called serially by a request loop, the answer is No, it cannot process anything else. For that longReq would have to return a future instead:
def longReq(data: String): Future[String] = {
Future {
// some code that computes resp but takes a long time
// can my application process other requests during this time?
val resp: String = ??? // time-consuming step
resp
}
}
Of course that just pushes the reason you likely used Await.result further down the line.
The purpose of using Future is to avoid blocking, but it is a turtles-all-the-way-down buy-in. If you want to use a Future, the final recipient has to be able to deal with getting the result in an asynchronous way, i.e. your request loop must have a way to capture the caller so that, when the future is finally completed, the caller can be told about the result.
Let's assume your request loop receives a request object that has a response callback; then you would call longReq like this (assuming the use of the longReq that returns a Future):
def asyncCall(request: Request): Unit = {
longReq(request.data).map( result => request.response(result) )
}
The most common scenario where you would use the flow is HTTP or other servers where the synchronous Request => Response cycle has an async equivalent of Request => Future[Response], which pretty much any modern server framework offers (Play, Finatra, Scalatra, etc.)
When to use Await.result
The one scenario where it might be reasonable to use Await.result is if you have a bunch of Futures and are willing to block until they all complete (assuming the use of the longReq that returns a Future):
val futures = allData.map(longReq) // List[Future[String]]
val combined = Future.sequence(futures) // Future[List[String]]
val responses = Await.result(combined, 10.seconds) // List[String]
Of course, combined being a Future, it would still be better to map over it and handle the result asynchronously.
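A sketch of that non-blocking variant, under the assumption that longReq is a trivial stand-in (it just upper-cases its input) and with handleAll as a hypothetical name:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequenceDemo {
  // Stand-in for the answer's longReq (assumption: trivial work).
  def longReq(data: String): Future[String] = Future(data.toUpperCase)

  // Combines all responses without blocking; the caller gets a Future back.
  def handleAll(allData: List[String]): Future[String] =
    Future
      .sequence(allData.map(longReq)) // Future[List[String]], input order preserved
      .map(_.mkString(","))

  def main(args: Array[String]): Unit =
    // Blocking only here, at the edge of the demo program.
    println(Await.result(handleAll(List("a", "b", "c")), 10.seconds))
}
```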
