I am trying to comprehend Task scheduling principles in Monix.
The following code (source: https://slides.com/avasil/fp-concurrency-scalamatsuri2019#/4/3) produces only '1's, as expected.
val s1: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor()),
  ExecutionModel.SynchronousExecution)

def repeat(id: Int): Task[Unit] =
  Task(println(s"$id ${Thread.currentThread().getName}")) >> repeat(id)

val prog: Task[(Unit, Unit)] = (repeat(1), repeat(2)).parTupled
prog.runToFuture(s1)
// Output:
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// 1 pool-1-thread-1
// ...
When we add Task.sleep to the repeat method
def repeat(id: Int): Task[Unit] =
  Task(println(s"$id ${Thread.currentThread().getName}")) >>
    Task.sleep(1.millis) >> repeat(id)
the output changes to
// Output
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// 1 pool-1-thread-1
// 2 pool-1-thread-1
// ...
Both tasks are now executed concurrently on a single thread! Nice :)
Some cooperative yielding has kicked in. What happened here exactly? Thanks :)
EDIT: same happens with Task.shift instead of Task.sleep.
I'm not sure if that's the answer you're looking for, but here it goes:
Although the naming suggests otherwise, Task.sleep cannot be compared to more conventional methods like Thread.sleep.
Task.sleep does not occupy a thread while it waits; it simply instructs the scheduler to run a callback after the given time has elapsed.
Here's a little code snippet from monix/TaskSleep.scala for comparison:
[...]
implicit val s = ctx.scheduler
val c = TaskConnectionRef()
ctx.connection.push(c.cancel)
c := ctx.scheduler.scheduleOnce(
  timespan.length,
  timespan.unit,
  new SleepRunnable(ctx, cb)
)
[...]
private final class SleepRunnable(ctx: Context, cb: Callback[Throwable, Unit]) extends Runnable {
  def run(): Unit = {
    ctx.connection.pop()
    // We had an async boundary, as we must reset the frame
    ctx.frameRef.reset()
    cb.onSuccess(())
  }
}
[...]
During the period before the callback (here: cb) is executed, your single-threaded scheduler (here: ctx.scheduler) can simply use its thread for whatever computation is queued next.
This also explains why this approach is preferable, as we don't block threads during the sleep intervals - wasting less computation cycles.
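The idea of "schedule a callback instead of blocking" can be sketched without Monix, using only a single-threaded ScheduledExecutorService from the JDK. This is an illustration under assumptions, not Monix's implementation: the names SleepAsCallback, demo and repeat are made up for the example, and the loop is bounded by a hypothetical rounds parameter so that it terminates.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

object SleepAsCallback {
  // Returns (id, threadName) pairs in the order the steps were recorded.
  def demo(rounds: Int): List[(Int, String)] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val trace = ListBuffer.empty[(Int, String)] // only written on the scheduler thread
    val done = new CountDownLatch(2)

    // Record one step, then ask the scheduler to call us back in 1 ms.
    // The thread is released in between; nothing blocks.
    def repeat(id: Int, remaining: Int): Unit =
      if (remaining == 0) done.countDown()
      else {
        trace += ((id, Thread.currentThread().getName))
        scheduler.schedule(
          new Runnable { def run(): Unit = repeat(id, remaining - 1) },
          1L, TimeUnit.MILLISECONDS)
      }

    scheduler.execute(new Runnable { def run(): Unit = repeat(1, rounds) })
    scheduler.execute(new Runnable { def run(): Unit = repeat(2, rounds) })
    done.await()
    scheduler.shutdown()
    trace.toList
  }
}
```

Calling SleepAsCallback.demo(3) records three steps for each id, all on the same thread. The two ids typically alternate, although the exact interleaving depends on timing.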
Hope this helps.
To expand on Markus's answer.
As a mental model (for illustration purposes only), you can imagine the thread pool as a queue of runnables. Since you only have a single executor thread, it will try to run repeat1 first and then repeat2.
Internally, everything is just a giant FlatMap. The run loop will schedule all the tasks based on the execution model.
What happens is that sleep hands a runnable (the rest of repeat1) to the scheduler, which puts it back on the queue once the delay has elapsed. In the meantime the thread is free, which gives repeat2 a chance to run. The same thing then happens with repeat2.
Note that, by default, Monix's execution model inserts an async boundary every 1024 flatMaps.
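To make the effect of such a batch size concrete, here is a rough sketch that does not use Monix at all. Two loops share a single-threaded executor; each runs at most batchSize steps synchronously and then re-submits its remainder to the executor, which plays the role of the async boundary. BatchedLoop, demo, batchSize and steps are hypothetical names, and batchSize is assumed to be at least 1.

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import scala.collection.mutable.ListBuffer

object BatchedLoop {
  def demo(batchSize: Int, steps: Int): List[Int] = {
    val executor = Executors.newSingleThreadExecutor()
    val trace = ListBuffer.empty[Int] // only written on the executor thread
    val done = new CountDownLatch(2)

    // Run up to `batchSize` steps synchronously, then yield by re-submitting the rest.
    def loop(id: Int, remaining: Int): Unit = {
      val now = math.min(batchSize, remaining)
      (1 to now).foreach(_ => trace += id)
      val left = remaining - now
      if (left == 0) done.countDown()
      else executor.execute(new Runnable { def run(): Unit = loop(id, left) })
    }

    // Submit both loops from inside the executor so neither starts before both are queued.
    executor.execute(new Runnable {
      def run(): Unit = {
        executor.execute(new Runnable { def run(): Unit = loop(1, steps) })
        executor.execute(new Runnable { def run(): Unit = loop(2, steps) })
      }
    })
    done.await()
    executor.shutdown()
    trace.toList
  }
}
```

BatchedLoop.demo(3, 6) yields List(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2), while BatchedLoop.demo(6, 6) lets the first loop finish before the second one runs at all, which mirrors the "only 1s" output when no boundary is ever inserted.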
Say I do the following:
def foo: Future[Int] = ...
var cache: Option[Int] = None
def getValue: Future[Int] = synchronized {
  cache match {
    case Some(value) => Future(value)
    case None =>
      foo.map { value =>
        cache = Some(value)
        value
      }
  }
}
Is there a risk of deadlock with the above code? Or can I assume that the synchronized block applies even within the future map block?
For a deadlock to exist, at least two different lock operations are to be called (in a possibly out of order sequence).
From what you show here (but we do not see what the foo implementation is), this is not the case. Only one lock exists and it is reentrant (if you try to enter the same synchronized block twice from the same thread, you won't lock yourself out).
Therefore, no deadlock is possible from the code you've shown.
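As a small illustration of the reentrancy point, the following sketch enters the same monitor twice from one thread. ReentrantDemo and nested are names invented for the example.

```scala
object ReentrantDemo {
  private val lock = new Object

  // The inner block runs while the outer lock is still held by the same thread.
  def nested(): Boolean = lock.synchronized {
    lock.synchronized { Thread.holdsLock(lock) }
  }
}
```

ReentrantDemo.nested() returns true instead of hanging, because JVM monitors are reentrant for the thread that owns them.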
Still, I question this design. Maybe it is a simplification of your actual code, but from what I understand, you have
A function that can generate an Int
You want to call this function only once and cache its result
I'd simplify your implementation greatly if that's the case :
def expensiveComputation: Int = ???
// Requires an implicit ExecutionContext in scope; the body runs once, when foo is initialized.
val foo = Future { expensiveComputation }
def getValue: Future[Int] = foo
You'd have a single call to expensiveComputation (per instance of your enclosing object), and a synchronized cache on its return value, because Future is in and of itself a concurrency-safe construct.
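A quick way to convince yourself that the Future acts as a cache is to count how often the body runs. In this sketch, CachedValue, calls and demo are hypothetical names, the computation is a stand-in that returns 42, and the global execution context is an arbitrary choice.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

class CachedValue {
  val calls = new AtomicInteger(0)

  // Stand-in for the expensive computation; counts its own invocations.
  def expensiveComputation: Int = { calls.incrementAndGet(); 42 }

  // Evaluated once per instance; every caller shares the same Future.
  val foo: Future[Int] = Future { expensiveComputation }
  def getValue: Future[Int] = foo

  // Reads the value three times and reports how often the body actually ran.
  def demo(): (List[Int], Int) = {
    val results = List.fill(3)(Await.result(getValue, 5.seconds))
    (results, calls.get)
  }
}
```

new CachedValue().demo() returns (List(42, 42, 42), 1): three reads, one computation.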
Note that Future itself functions as a cache (see GPI's answer). However, GPI's answer isn't quite equivalent to your code: your code will only cache a successful value and will retry, while if the initial call to expensiveComputation in GPI's answer fails, getValue will always fail.
This however, gives us retry until successful:
def foo: Future[Int] = ???
private def retryFoo(): Future[Int] = foo.recoverWith{ case _ => retryFoo() }
lazy val getValue: Future[Int] = retryFoo()
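To see the retry behaviour in isolation, here is a self-contained sketch in which foo is replaced by a hypothetical stand-in that fails a fixed number of times before succeeding. RetryingCache, failuresBeforeSuccess and attempts are names made up for the example.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

class RetryingCache(failuresBeforeSuccess: Int) {
  val attempts = new AtomicInteger(0)

  // Hypothetical stand-in for foo: fails a fixed number of times, then succeeds.
  def foo: Future[Int] = Future {
    if (attempts.incrementAndGet() <= failuresBeforeSuccess)
      throw new RuntimeException("transient failure")
    else 42
  }

  // Each failure triggers a fresh call to foo; the first success ends the chain.
  private def retryFoo(): Future[Int] = foo.recoverWith { case _ => retryFoo() }

  // Initialized at most once; later reads reuse the same (eventually successful) Future.
  lazy val getValue: Future[Int] = retryFoo()
}
```

With new RetryingCache(2), the first read of getValue completes with 42 after three attempts, and reading it again does not trigger any further call to foo. Note that this sketch retries forever and without delay if foo never succeeds.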
In general, anything related to Futures which is asynchronous will not respect the synchronized block, unless you happen to Await on the asynchronous part within the synchronized block (which kind of defeats the point). In your case, it's absolutely possible for the following sequence (among many others) to occur:
Initial state: cache = None
Thread A calls getValue, obtains lock
Thread A pattern matches to None, calls foo to get a Future[Int] (fA0), schedules a callback to run in some thread B on fA0's successful completion (fA1)
Thread A releases lock
Thread A returns fA1
Thread C calls getValue, obtains lock
Thread C pattern matches to None, calls foo to get a Future[Int] (fC0), schedules a callback to run in some thread D on fC0's successful completion (fC1)
fA0 completes successfully with value 42
Thread B runs callback on fA0, sets cache = Some(42), completes successfully with value 42
Thread C releases lock
Thread C returns fC1
fC0 completes successfully with value 7
Thread D runs callback on fC0, sets cache = Some(7), completes successfully with value 7
The code above can't deadlock, but there's no guarantee that foo will successfully complete exactly once (it could successfully complete arbitrarily many times), nor is there any guarantee as to which particular value of foo will be returned by a given call to getValue.
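The sequence above can be reproduced deterministically with a Promise acting as a gate, so that foo stays pending while getValue is called twice. This is a sketch under assumptions: RacyCache, RacyCacheDemo, gate and fooCalls are invented names, foo is a stand-in that only counts its calls, and @volatile was added so the test thread sees the write made by the callback.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

class RacyCache(gate: Future[Int])(implicit ec: ExecutionContext) {
  val fooCalls = new AtomicInteger(0)
  @volatile private var cache: Option[Int] = None

  // Stand-in for foo: counts invocations and returns a future we control.
  private def foo: Future[Int] = { fooCalls.incrementAndGet(); gate }

  def getValue: Future[Int] = synchronized {
    cache match {
      case Some(value) => Future.successful(value)
      case None        => foo.map { value => cache = Some(value); value }
    }
  }
}

object RacyCacheDemo {
  def demo(): (Int, Int, Int) = {
    import ExecutionContext.Implicits.global
    val gate = Promise[Int]()
    val c = new RacyCache(gate.future)
    val a = c.getValue // cache is empty, foo is called
    val b = c.getValue // cache is still empty, foo is called again
    val callsBefore = c.fooCalls.get
    gate.success(42)
    (callsBefore, Await.result(a, 5.seconds), Await.result(b, 5.seconds))
  }
}
```

RacyCacheDemo.demo() returns (2, 42, 42): foo ran twice even though every call to getValue went through the synchronized block.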
EDIT to add: You could also replace
cache = Some(value)
value
with
cache.synchronized { cache = cache.orElse(Some(value)) }
cache.get
Which would prevent cache from being assigned to multiple times (i.e. it would always contain the value returned by the first map callback to execute on a future returned by foo). It probably still wouldn't deadlock (I find that if I have to reason about a deadlock, my time is probably better spent reasoning about a better abstraction), but is this elaborate/verbose machinery better than just using a retry-on-failure Future as a cache?
No, but synchronized isn't actually doing much here. getValue returns almost immediately with a Future (which may or may not be completed yet), so the lock on getValue is extremely short-lived. It does not wait for foo.map to evaluate before releasing the lock, because that is executed only after foo is completed, which will almost certainly happen after getValue returns.
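One way to check this claim is to ask the JVM whether the lock is held at each point. In the sketch below (LockScope, check and gate are hypothetical names), the callback is registered inside the synchronized block but only runs after the block has been left.

```scala
import scala.concurrent.{Await, ExecutionContext, Promise}
import scala.concurrent.duration._

object LockScope {
  private val lock = new Object

  // Returns (lock held inside the block, lock held inside the map callback).
  def check(): (Boolean, Boolean) = {
    val gate = Promise[Unit]()
    val (inside, callback) = lock.synchronized {
      val held = Thread.holdsLock(lock)
      // Registered while holding the lock, executed later on a pool thread.
      val cb = gate.future.map(_ => Thread.holdsLock(lock))(ExecutionContext.global)
      (held, cb)
    }
    gate.success(()) // the lock has already been released at this point
    (inside, Await.result(callback, 5.seconds))
  }
}
```

LockScope.check() returns (true, false): the body passed to map does not run under the lock.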
I have written many different unit tests for futures in Scala.
All asynchronous calls use an execution context.
To make sure that the asynchronous calls are always executed in the same order, I need to delay some tasks which is rather difficult and slows the tests down.
The executor might still (depending on its implementation) complete some tasks before others.
What is the best way to test concurrent code with a specific execution order? For example, I have the following test case:
"firstSucc" should "complete the final future with the first one" in {
  val executor = getExecutor
  val util = getUtil
  val f0 = util.async(executor, () => 10)
  f0.sync
  val f1 = util.async(executor, () => { delay(); 11 })
  val f = f0.firstSucc(f1)
  f.get should be(10)
}
where delay is def delay() = Thread.sleep(4000) and sync synchronizes the future (calls Await.ready(future, Duration.Inf)).
That's how I want to make sure that f0 is already completed and f1 completes AFTER f0. It is not enough that f0 is completed since firstSucc could be shuffling the futures. Therefore, f1 should be delayed until after the check of f.get.
Another idea is to create futures from promises and complete them at a certain point in time:
"firstSucc" should "complete the final future with the first one" in {
  val executor = getExecutor
  val util = getUtil
  val f0 = util.async(executor, () => 10)
  val p = getPromise
  val f1 = p.future
  val f = f0.firstSucc(f1)
  f.get should be(10)
  p.trySuccess(11)
}
Is there any easier/better approach to define the execution order? Maybe another execution service where one can configure the order of submitted tasks?
For this specific case it might be enough to delay the second future until after the result has been checked but in some cases ALL futures have to be completed but in a certain order.
The complete code can be found here: https://github.com/tdauth/scala-futures-promises
The test case is part of this class: https://github.com/tdauth/scala-futures-promises/blob/master/src/test/scala/tdauth/futuresandpromises/AbstractFutureTest.scala
This question might be related since Scala can use Java Executor Services: Controlling Task execution order with ExecutorService
For most simple cases, I'd say a single threaded executor should be enough - if you start your futures one-by-one, they'll be executed serially, and complete in the same order.
But it looks like your problem is actually more complex than what you are describing: you are not only looking for a way to ensure one future completes later than the other, but in general, to make a sequence of arbitrary events happen in a particular order. For example, the snippet in your question verifies that the second future starts after the first one completes (I have no idea what the delay is for in that case, by the way).
You can use eventually to wait for a particular event to occur before continuing:
val f = Future(doSomething)
eventually {
  someFlag shouldBe true
}
val f1 = Future(doSomethingElse)
eventually {
  f.isCompleted shouldBe true
}
someFlag = false
eventually {
  someFlag shouldBe true
}
f1.futureValue shouldBe false
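When the order of completion is the only thing that matters, promises give full control without any delays. The sketch below uses Future.firstCompletedOf from the standard library as a stand-in for the question's firstSucc, whose exact semantics are not shown here; OrderedCompletion and demo are invented names.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OrderedCompletion {
  def demo(): Int = {
    val p0 = Promise[Int]()
    val p1 = Promise[Int]()
    val first = Future.firstCompletedOf(Seq(p0.future, p1.future))

    p0.success(10)                             // the test decides who completes first
    val winner = Await.result(first, 5.seconds)
    p1.success(11)                             // completed only after the result was read
    winner
  }
}
```

OrderedCompletion.demo() returns 10 on every run, because the second promise cannot complete before the result has been observed.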
I create 10000 actors and send a message to each, but it seems that the Akka system can't complete all the work.
When I check the thread states, they are all TIMED_WAITING.
My code:
class Reciver extends Actor {
  val log = Logging(context.system, this)
  var num = 0

  def receive = {
    case a: Int =>
      log.info(s"[${self.path}] receive $a, num is $num")
      Thread.sleep(2000)
      log.info(s"[${self.path}] processing $a, num is $num")
      num = a
  }
}

object ActorSyncOrAsync extends App {
  val system = ActorSystem("mysys")
  for (i <- 0 to 10000) {
    val actor = system.actorOf(Props[Reciver])
    actor ! i
  }
  println("main thread send request complete")
}
You should remove Thread.sleep or (if you're using default thread-pool) surround it with:
scala.concurrent.blocking {
  Thread.sleep(2000)
}
scala.concurrent.blocking marks the computation as managed blocking: it tells the pool that the computation is not using CPU resources but is only waiting for a result or a timeout. Be careful with this, however. This advice only applies if you are using Thread.sleep for debugging or to emulate some activity; no Thread.sleep (even wrapped in blocking) should appear in production code.
Explanation:
When a fixed-size pool is used (including fork-join, which does not steal work from threads blocked in Thread.sleep), only POOL_SIZE threads (by default, the number of cores in your system) are available for computation. Everything else is queued.
So, let's say 4 cores, 2 seconds per task, 10000 tasks - it's gonna take 2*10000/4 = 5000 seconds.
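The same arithmetic can be checked on a small scale with a plain fixed thread pool from the JDK. This sketch does not involve Akka; BlockedPool, estimateSeconds and measureMillis are hypothetical names, and the estimate simply assumes tasks are processed in waves of poolSize.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object BlockedPool {
  // Rough estimate: tasks are processed in waves of `poolSize`.
  def estimateSeconds(tasks: Int, secondsPerTask: Double, poolSize: Int): Double =
    math.ceil(tasks.toDouble / poolSize) * secondsPerTask

  // Measures how long `tasks` sleeping jobs take on a fixed pool, in milliseconds.
  def measureMillis(tasks: Int, sleepMillis: Long, poolSize: Int): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val start = System.nanoTime()
    (1 to tasks).foreach { _ =>
      pool.execute(new Runnable { def run(): Unit = Thread.sleep(sleepMillis) })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000L
  }
}
```

BlockedPool.estimateSeconds(10000, 2.0, 4) gives 5000.0, matching the figure above. On a small scale, BlockedPool.measureMillis(8, 50, 4) takes roughly 100 ms or more, since eight 50 ms sleeps cannot fit into four threads in less than two rounds.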
The general advice is to not block (including Thread.sleep) inside your actors: Blocking needs careful management. If you need to delay some action it's better to use Scheduler (as @Lukasz mentioned): http://doc.akka.io/docs/akka/2.4.4/scala/scheduler.html
I'm learning about the uses of async/await in Scala. I have read this in https://github.com/scala/async
Theoretically this code is asynchronous (non-blocking), but it's not parallelized:
def slowCalcFuture: Future[Int] = ...
def combined: Future[Int] = async {
  await(slowCalcFuture) + await(slowCalcFuture)
}
val x: Int = Await.result(combined, 10.seconds)
whereas this other one is parallelized:
def combined: Future[Int] = async {
  val future1 = slowCalcFuture
  val future2 = slowCalcFuture
  await(future1) + await(future2)
}
The only difference between them is the use of intermediate variables.
How can this affect the parallelization?
Since it's similar to async & await in C#, maybe I can provide some insight. In C#, it's a general rule that Task that can be awaited should be returned 'hot', i.e. already running. I assume it's the same in Scala, where the Future returned from the function does not have to be explicitly started, but is just 'running' after being called. If it's not the case, then the following is pure (and probably not true) speculation.
Let's analyze the first case:
async {
  await(slowCalcFuture) + await(slowCalcFuture)
}
We get to that block and hit the first await:
async {
  await(slowCalcFuture) + await(slowCalcFuture)
  ^^^^^
}
Ok, so we're asynchronously waiting for that calculation to finish. When it's finished, we 'move on' with analyzing the block:
async {
  await(slowCalcFuture) + await(slowCalcFuture)
                          ^^^^^
}
Second await, so we're asynchronously waiting for second calculation to finish. After that's done, we can calculate the final result by adding two integers.
As you can see, we're moving step-by-step through awaits, awaiting Futures as they come one by one.
Let's take a look at the second example:
async {
  val future1 = slowCalcFuture
  val future2 = slowCalcFuture
  await(future1) + await(future2)
}
OK, so here's what (probably) happens:
async {
  val future1 = slowCalcFuture // >> first future is started, but not awaited
  val future2 = slowCalcFuture // >> second future is started, but not awaited
  await(future1) + await(future2)
  ^^^^^
}
Then we're awaiting the first Future, but both of the futures are currently running. When the first one returns, the second might have already completed (so we will have the result available at once) or we might have to wait for a little bit longer.
Now it's clear that second example runs two calculations in parallel, then waits for both of them to finish. When both are ready, it returns. First example runs the calculations in a non-blocking way, but sequentially.
The answer by Patryk is correct, if a little difficult to follow. The main thing to understand about async/await is that it's just another way of doing Future's flatMap. There's no concurrency magic behind the scenes. All the calls inside an async block are sequential, including await, which doesn't actually block the executing thread but rather wraps the rest of the async block in a closure and passes it as a callback on completion of the Future we're waiting on. So in the first piece of code the second calculation doesn't start until the first await has completed, since no one started it prior to that.
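Since async/await boils down to flatMap, the difference can be shown with plain Futures from the standard library, without the scala-async dependency. The sketch below is a hand-written approximation of what the two async blocks do, not the macro's actual expansion. AwaitOrder and the started counter are hypothetical; the counter records how many calculations had been kicked off by the time the first result arrived.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AwaitOrder {
  private def slowCalcFuture(started: AtomicInteger): Future[Int] = {
    started.incrementAndGet() // count the call that creates (and thereby starts) the future
    Future { Thread.sleep(50); 21 }
  }

  // Like await(slowCalcFuture) + await(slowCalcFuture)
  def sequential(): (Int, Int) = {
    val started = new AtomicInteger(0)
    val combined = slowCalcFuture(started).flatMap { a =>
      val startedSoFar = started.get // the second calculation does not exist yet
      slowCalcFuture(started).map(b => (a + b, startedSoFar))
    }
    Await.result(combined, 5.seconds)
  }

  // Like binding future1 and future2 to vals before awaiting them
  def parallel(): (Int, Int) = {
    val started = new AtomicInteger(0)
    val future1 = slowCalcFuture(started)
    val future2 = slowCalcFuture(started)
    val combined = future1.flatMap { a =>
      val startedSoFar = started.get // both calculations were started up front
      future2.map(b => (a + b, startedSoFar))
    }
    Await.result(combined, 5.seconds)
  }
}
```

AwaitOrder.sequential() returns (42, 1) and AwaitOrder.parallel() returns (42, 2): same sum, but in the second variant both calculations were already in flight when the first result came back.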
In the first case you submit a slow future for execution and wait for it in a single expression. So the invocation of the second slow future is performed only after the first one is complete.
In the second case, when val future1 = slowCalcFuture is evaluated, it effectively takes a thread from the thread pool, passes a pointer to the "slowCalcFuture" function to that thread and says "execute it please". This takes only as much time as is necessary to get a thread instance from the thread pool and pass it a pointer to a function, which can be considered instant. So, because val future1 = slowCalcFuture is translated into "get thread and pass pointer" operations, it completes in no time and the next line, val future2 = slowCalcFuture, is executed without any delay. Future 2 is scheduled for execution without any delay too.
Fundamental difference between val future1 = slowCalcFuture and await(slowCalcFuture) is the same as between asking somebody to make you coffee and waiting for your coffee to be ready. Asking takes 2 seconds: which is needed to say phrase: "could you make me coffee please?". But waiting for coffee to be ready will take 4 minutes.
Possible modification of this task could be waiting for 1st available answer. For example, you want to connect to any server in a cluster. You issue requests to connect to every server you know, and the first one which responds, will be your server. You could do this with:
Future.firstCompletedOf(Array(slowCalcFuture, slowCalcFuture))
I don't get the actual (semantic) difference between the two "expressions".
It is said that "loop" fits "react" and "while(true)" fits "receive", because "react" does not return and "loop" is a function which calls the body over and over again (at least this is what I deduce from the sources - I am not really familiar with the "andThen" that is used). "Receive" blocks one Thread from the pool, "react" does not. However, for "react" a Thread is looked up which the function can be attached to.
So the question is: why can't I use "loop" with "receive"? It also seems to behave different (and better!) than the "while(true)" variant, at least this is what I observe in a profiler.
Even more strange is that calling a ping-pong with "-Dactors.maxPoolSize=1 -Dactors.corePoolSize=1" with "while(true)" and "receive" blocks immediately (that's what I would expect) - however, with "loop" and "receive", it works without problems - in one Thread - how's this?
Thanks!
The critical difference between while and loop is that while restricts the loop iterations to occur in the same thread. The loop construct (as described by Daniel) enables the actor sub-system to invoke the reactions on any thread it chooses.
Hence using a combination of receive within while (true) ties an actor to a single thread. Using loop and react allows you to support many actors on a single thread.
The method loop is defined in the object Actor:
private[actors] trait Body[a] {
  def andThen[b](other: => b): Unit
}
implicit def mkBody[a](body: => a) = new Body[a] {
  def andThen[b](other: => b): Unit = self.seq(body, other)
}
/**
 * Causes <code>self</code> to repeatedly execute
 * <code>body</code>.
 *
 * @param body the code block to be executed
 */
def loop(body: => Unit): Unit = body andThen loop(body)
This is confusing, but what happens is that the block that comes after loop (the thing between { and }) is passed to the method seq as first argument, and a new loop with that block is passed as the second argument.
As for the method seq, in the trait Actor, we find:
private def seq[a, b](first: => a, next: => b): Unit = {
  val s = Actor.self
  val killNext = s.kill
  s.kill = () => {
    s.kill = killNext
    // to avoid stack overflow:
    // instead of directly executing `next`,
    // schedule as continuation
    scheduleActor({ case _ => next }, 1)
    throw new SuspendActorException
  }
  first
  throw new KillActorException
}
So, the new loop is scheduled for the next action after a kill, then the block gets executed, and then an exception of type KillActorException is thrown, which will cause the loop to be executed again.
So, a while loop performs much faster than a loop, as it throws no exceptions, does no scheduling, etc. On the other hand, the scheduler gets the opportunity to schedule something else between two executions of a loop.
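The trade-off can be illustrated outside of the old actors library with a much simpler, hypothetical loop combinator. Instead of exceptions and scheduleActor, this sketch re-submits each iteration to a single-threaded JDK executor, which is enough to show two loops sharing one thread. LoopVsWhile, demo and loop are invented names, and iterations is assumed to be at least 1.

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import scala.collection.mutable.ListBuffer

object LoopVsWhile {
  def demo(iterations: Int): List[String] = {
    val executor = Executors.newSingleThreadExecutor()
    val trace = ListBuffer.empty[String] // only written on the executor thread
    val done = new CountDownLatch(2)

    // Hypothetical `loop`: run the body once, then schedule the next
    // iteration as a continuation instead of looping in place.
    def loop(remaining: Int)(body: => Unit): Unit =
      executor.execute(new Runnable {
        def run(): Unit = {
          body
          if (remaining > 1) loop(remaining - 1)(body) else done.countDown()
        }
      })

    // Start both loops from inside the executor so both are queued before either runs.
    executor.execute(new Runnable {
      def run(): Unit = {
        loop(iterations)(trace += "ping")
        loop(iterations)(trace += "pong")
      }
    })
    done.await()
    executor.shutdown()
    trace.toList
  }
}
```

LoopVsWhile.demo(3) returns List("ping", "pong", "ping", "pong", "ping", "pong"): between two iterations of one loop, the executor gets to run the other. A while (true) in the first body would never give the thread back.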