blocking keyword in Scala

What's the difference between Future(blocking(blockingCall())) and blocking(Future(blockingCall()))? Both of these are defined in scala.concurrent._
I've looked at the Scala docs and some other Stack Overflow answers but remain unclear on what the difference is.

blocking acts as a hint to the ExecutionContext that it contains blocking code, so that it may spawn a new thread to prevent deadlocks. This presumes the ExecutionContext can do that, but not all are made to.
Let's look at each one-by-one.
Future(blocking(blockingCall()))
This requires an implicit ExecutionContext to execute the Future. If the ExecutionContext being used is a BlockContext (like scala.concurrent.ExecutionContext.Implicits.global is), it may be able to spawn a new thread in its thread pool to handle the blocking call, if it needs to. If it isn't, then nothing special happens.
blocking(Future(blockingCall()))
This tells us that Future(blockingCall()) may be a blocking call, so it is treated the same as above. Except here, Future.apply is non-blocking, so using blocking effectively does nothing but add a little overhead. It doesn't matter what ExecutionContext we're calling it from here, as it isn't blocking anyway. However, the blocking call within the Future will block a thread in the ExecutionContext it's running on, without the hint that it's blocking. So, there is no reason to ever do this.
I've explained blocking more in depth in this answer.
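To make the placement difference concrete, here is a minimal sketch that needs only the standard library. It installs a counting BlockContext on the calling thread and checks whether blocking ever consults it. CountingBlockContext and hintsSeenByCaller are hypothetical names made up for this illustration, and the single-thread pool merely stands in for whatever ExecutionContext you use.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, BlockContext, CanAwait, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object BlockingPlacementDemo {
  // Hypothetical helper: counts how often `blocking` consults this context.
  final class CountingBlockContext extends BlockContext {
    val hints = new AtomicInteger(0)
    override def blockOn[T](thunk: => T)(implicit permission: CanAwait): T = {
      hints.incrementAndGet()
      thunk
    }
  }

  // Returns how many blocking hints were delivered on the CALLING thread.
  def hintsSeenByCaller(wrapOutside: Boolean): Int = {
    val pool = Executors.newFixedThreadPool(1)
    val ec = ExecutionContext.fromExecutorService(pool)
    val ctx = new CountingBlockContext
    try {
      // The counting context is installed only while the Future is created.
      val f: Future[Unit] = BlockContext.withBlockContext(ctx) {
        if (wrapOutside) blocking(Future(Thread.sleep(10))(ec)) // hint lands on the caller
        else Future(blocking(Thread.sleep(10)))(ec)             // hint lands on the pool thread
      }
      Await.ready(f, 5.seconds)
      ctx.hints.get()
    } finally pool.shutdown()
  }
}
```

With wrapOutside = true the hint is delivered on the thread that merely creates the Future, and the pool thread doing the sleeping never sees it. With wrapOutside = false the caller's context is not consulted at all, because blocking runs inside the Future body on the pool thread.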
REPL Examples:
import java.util.concurrent.Executors
import scala.concurrent._
val ec = scala.concurrent.ExecutionContext.Implicits.global
val executorService = Executors.newFixedThreadPool(4)
val ec2 = ExecutionContext.fromExecutorService(executorService)
def blockingCall(i: Int): Unit = { Thread.sleep(1000); println("blocking call.. " + i) }
// Spawns enough new threads in `ec` to handle the 100 blocking calls
(0 to 100) foreach { i => Future(blocking(blockingCall(i)))(ec) }
// Does not spawn new threads, and `ec2` reaches thread starvation
// execution will be staggered as threads are freed
(0 to 100) foreach { i => Future(blocking(blockingCall(i)))(ec2) }
// `blocking` does nothing because the `Future` is executed in a different context,
// and `ec2` reaches thread starvation
(0 to 100) foreach { i => blocking(Future(blockingCall(i))(ec2)) }
// `blocking` still does nothing, but `ec` does not know to spawn new threads (even though it could)
// so we reach thread starvation again
(0 to 100) foreach { i => blocking(Future(blockingCall(i))(ec)) }

Related

Is synchronous HTTP request wrapped in a Future considered CPU or IO bound?

Consider the following two snippets where first wraps scalaj-http requests with Future, whilst second uses async-http-client
Sync client wrapped with Future using global EC
object SyncClientWithFuture {
def main(args: Array[String]): Unit = {
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration.Inf
import scala.concurrent.ExecutionContext.Implicits.global
import scalaj.http.Http
val delay = "3000"
val slowApi = s"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = Future(Http(slowApi).asString).flatMap { _ =>
Future.sequence(List(
Future(Http(slowApi).asString),
Future(Http(slowApi).asString),
Future(Http(slowApi).asString)
))
}
time { Await.result(nestedF, Inf) }
}
}
Async client using global EC
object AsyncClient {
def main(args: Array[String]): Unit = {
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration.Inf
import scala.concurrent.ExecutionContext.Implicits.global
import sttp.client._
import sttp.client.asynchttpclient.future.AsyncHttpClientFutureBackend
implicit val sttpBackend = AsyncHttpClientFutureBackend()
val delay = "3000"
val slowApi = uri"http://slowwly.robertomurray.co.uk/delay/${delay}/url/https://www.google.co.uk"
val nestedF = basicRequest.get(slowApi).send().flatMap { _ =>
Future.sequence(List(
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send(),
basicRequest.get(slowApi).send()
))
}
time { Await.result(nestedF, Inf) }
}
}
The snippets are using
Slowwly to simulate slow API
scalaj-http
async-http-client sttp backend
time
The former takes 12 seconds whilst the latter takes 6 seconds. The former seems to behave as if it were CPU bound, but I do not see how that can be the case, since Future#sequence should execute the HTTP requests in parallel. Why does a synchronous client wrapped in a Future behave differently from a proper async client? Doesn't the async client do the same kind of thing, wrapping calls in Futures under the hood?
Future#sequence should execute the HTTP requests in parallel?
First of all, Future#sequence doesn't execute anything. It just produces a future that completes when all parameters complete.
Evaluation (execution) of a constructed future starts immediately if there is a free thread in the EC. Otherwise, the future is submitted to a queue and waits for a thread.
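A small sketch of that eagerness, assuming the global EC has a free thread: the future below is never passed to Future.sequence and never awaited, yet its body still runs. EagerFutureDemo and startsWithoutSequence are hypothetical names used only for this illustration.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object EagerFutureDemo {
  def startsWithoutSequence(): Boolean = {
    val started = new CountDownLatch(1)
    // Creating the Future is what schedules it; nothing else has to "run" it.
    val pending = Future { started.countDown() }
    // True if the body ran within the timeout, even though `pending` was never used.
    started.await(5, TimeUnit.SECONDS)
  }
}
```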
I am sure that in the first case you have single-threaded execution of futures.
println(scala.concurrent.ExecutionContext.Implicits.global) -> parallelism = 6
I don't know why it is like this; it might be that the other 5 threads are always busy for some reason. You can experiment with an explicitly created new EC with 5-10 threads.
The difference in the async case is that you don't create the future yourself: it is provided by the library, which internally doesn't block the thread. It starts the async process, "subscribes" for a result, and returns a future that completes when the result arrives.
Actually, the async lib could have another EC internally, but I doubt it.
Btw, Futures are not supposed to contain slow, I/O-bound, or blocking evaluations unless they are wrapped in blocking. Otherwise, you can potentially block the main thread pool (EC) and your app will freeze completely.
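The "subscribe and return a future" idea can be sketched with a Promise and a timer. This is not how async-http-client is actually implemented; it is only a stand-in under the assumption that a callback fires later on some other thread. NonBlockingDelayDemo and delayed are hypothetical names.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object NonBlockingDelayDemo {
  // Returns immediately; the promise is completed later by the timer callback,
  // so no EC thread sits in Thread.sleep while the "response" is pending.
  def delayed[T](timer: ScheduledExecutorService, millis: Long)(value: T): Future[T] = {
    val promise = Promise[T]()
    timer.schedule(new Runnable {
      def run(): Unit = promise.success(value)
    }, millis, TimeUnit.MILLISECONDS)
    promise.future
  }

  def demo(): List[Int] = {
    val timer = Executors.newSingleThreadScheduledExecutor()
    try {
      // Three pending "requests" share one timer thread and occupy no EC threads while waiting.
      val all = Future.sequence(
        List(delayed(timer, 50)(1), delayed(timer, 50)(2), delayed(timer, 50)(3)))
      Await.result(all, 5.seconds)
    } finally timer.shutdown()
  }
}
```

A synchronous client wrapped in Future holds one EC thread per in-flight request for the whole wait, which is why the pool size limits how many requests overlap.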

Why can't Actors complete all work although I created 10000 of them?

I create 10000 actors and send a message to each, but it seems that the akka system can't complete all the work.
When I check the thread state, they are all in TIMED_WAITING.
My code:
class Reciver extends Actor {
val log = Logging(context.system, this)
var num = 0
def receive = {
case a: Int => log.info(s"[${self.path}] receive $a, num is $num")
Thread.sleep(2000)
log.info(s"[${self.path}] processing $a, num is $num")
num = a
}
}
object ActorSyncOrAsync extends App {
val system = ActorSystem("mysys")
for (i <- 0 to 10000) {
val actor = system.actorOf(Props[Reciver])
actor ! i
}
println("main thread send request complete")
}
You should remove Thread.sleep or (if you're using default thread-pool) surround it with:
scala.concurrent.blocking {
Thread.sleep(2000)
}
scala.concurrent.blocking marks the computation as managed blocking, which tells the pool that the computation is not using CPU resources but just waiting for some result or timeout. You should be careful with this, however. Basically, this advice works if you're using Thread.sleep for debugging purposes or just to emulate some activity; no Thread.sleep (even surrounded by blocking) should take place in production code.
Explanation:
When a fixed pool is used (including fork-join, as it doesn't steal work from threads blocked by Thread.sleep), only POOL_SIZE threads (by default, the number of cores in your system) are used for computation. Everything else is queued.
So, let's say 4 cores, 2 seconds per task, 10000 tasks - it's gonna take 2*10000/4 = 5000 seconds.
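The same estimate as a tiny function, assuming every task blocks its thread for the full duration and the pool never grows. QueueEstimate and estimatedSeconds are hypothetical names for this back-of-the-envelope sketch.

```scala
object QueueEstimate {
  // Tasks run in "rounds" of poolSize at a time; each round lasts secondsPerTask.
  def estimatedSeconds(tasks: Int, secondsPerTask: Int, poolSize: Int): Long = {
    val rounds = (tasks.toLong + poolSize - 1) / poolSize // ceiling division
    rounds * secondsPerTask
  }
}
```

For the numbers above, QueueEstimate.estimatedSeconds(10000, 2, 4) gives 5000.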
The general advice is to not block (including Thread.sleep) inside your actors: Blocking needs careful management. If you need to delay some action it's better to use Scheduler (as @Lukasz mentioned): http://doc.akka.io/docs/akka/2.4.4/scala/scheduler.html

Does Akka's fork-join-executor make use of scala.concurrent.blocking?

I know that scala.concurrent.blocking is a hint for ExecutionContext that a piece of code performs some long operation / blocks on some IO. ExecutionContext can, but does not have to, make use of this "hint".
As described here:
scala.concurrent.blocking - what does it actually do?
http://www.cakesolutions.net/teamblogs/demystifying-the-blocking-construct-in-scala-futures
scala.concurrent.ExecutionContext.Implicits.global is a ForkJoinPool, which spawns a new thread for code wrapped in scala.concurrent.blocking.
What about Akka's fork-join-executor? Does it also make use of scala.concurrent.blocking in any way?
Yes, it does!
It's easy enough to figure this out by searching for BlockContext. For Akka, that leads us here, and if you follow the threads in the code and documentation, you can confirm it.
This is also easy enough to test on our own. We can make a new ActorSystem without any configured dispatchers (so it will use the default fork-join-executor), then use its dispatcher as our ExecutionContext.
import akka.actor._
import scala.concurrent._
implicit val ec = ActorSystem("test").dispatcher
First, without blocking, you will notice that several futures start right away, but not all of them. Another batch starts as the first batch completes. Clearly, the default pool reaches starvation:
(0 to 100) foreach { n =>
Future {
println("starting Future: " + n)
Thread.sleep(3000)
println("ending Future: " + n)
}
}
Then, with blocking all of the futures should execute almost immediately, as opposed to the previous example:
(0 to 100) foreach { n =>
Future {
println("starting Future: " + n)
blocking(Thread.sleep(3000))
println("ending Future: " + n)
}
}

How to run futures on the current actor's dispatcher in Akka

Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures extends Actor {
implicit def safeDispatcher = context.dispatcherOnMessageThread
var successCount = 0
var failureCount = 0
override def receive: Receive = {
case MakeExternalRequest(req) =>
val response: Future[Response] = someClient.makeRequest(req)
response.onComplete {
case Success(_) => successCount += 1
case Failure(_) => failureCount += 1
}
response pipeTo sender()
}
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
override def execute(runnable: Runnable): Unit = {
self ! runnable
}
override def reportFailure(cause: Throwable): Unit = {
throw new Error("Unhandled throwable", cause)
}
}
override def receive: Receive = {
case runnable: Runnable => runnable.run()
... other cases here
}
}
Would that work? Why doesn't Akka offer that - is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors - it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor - if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems).
You could use traditional multithreading techniques - mutexes and the like - to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely - some kind of... Actor?
TL;DR: the Future and the Actor either 1) may not run concurrently, in which case you may deadlock; 2) may run concurrently, in which case you will corrupt data; or 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class which would create a thread pool with exactly one thread for each instance of the given class, and use context.dispatcher as execution context for your Future.
To do this you have to put something like this in your application.conf:
akka {
...
}
my-pinned-dispatcher {
executor = "thread-pool-executor"
type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
Props(
classOf[MyActorWithSafeFutures]
).withDispatcher("my-pinned-dispatcher"),
"myActorWithSafeFutures"
)
That said, what you are trying to achieve completely defeats the purpose of the actor model. The actor state should be encapsulated, and actor state changes should be driven by incoming messages.
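The one-thread idea can be illustrated without Akka. The sketch below is only a standard-library analogy, not a PinnedDispatcher: the "requests" complete on the global pool, while every callback that touches the counter runs on a single dedicated thread. SerialCallbacksDemo and countSuccesses are hypothetical names.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object SerialCallbacksDemo {
  def countSuccesses(requests: Int): Int = {
    val pool = Executors.newSingleThreadExecutor()
    // All callbacks below pick up this single-threaded context implicitly.
    implicit val serial: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)
    var successCount = 0 // plain var: only ever touched on the `serial` thread
    try {
      val updates = (1 to requests).map { i =>
        // The "request" runs on the global pool; the state update runs on `serial`.
        Future(i)(ExecutionContext.global).map(_ => successCount += 1)
      }
      Await.result(Future.sequence(updates), 10.seconds)
      // Read the counter on the same thread that mutates it.
      Await.result(Future(successCount), 10.seconds)
    } finally pool.shutdown()
  }
}
```

This removes the low-level data race on the counter, but it still lets callbacks interleave with other work on that thread in an order you do not control.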
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
class MyActorWithSafeFutures extends Actor {
var successCount = Agent(0)
var failureCount = Agent(0)
def doSomethingWithPossiblyStaleCounts() = {
val (s, f) = (successCount.get(), failureCount.get())
statisticsCollector ! Ratio(f / (s + f))
}
def doSomethingWithCurrentCounts() = {
val (successF, failureF) = (successCount.future(), failureCount.future())
val ratio : Future[Ratio] = for {
s <- successF
f <- failureF
} yield Ratio(f / (s + f))
ratio pipeTo statisticsCollector
}
override def receive: Receive = {
case MakeExternalRequest(req) =>
val response: Future[Response] = someClient.makeRequest(req)
response.onComplete {
case Success(_) => successCount.send(_ + 1)
case Failure(_) => failureCount.send(_ + 1)
}
response pipeTo sender()
}
}
The catch is that if you want to operate on the counts that would result if you were using @volatile, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with having values which are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.

Sleeping actors?

What's the best way to have an actor sleep? I have actors set up as agents which want to maintain different parts of a database (including getting data from external sources). For a number of reasons (including not overloading the database or communications and general load issues), I want the actors to sleep between each operation. I'm looking at something like 10 actor objects.
The actors will run pretty much infinitely, as there will always be new data coming in, or sitting in a table waiting to be propagated to other parts of the database etc. The idea is for the database to be as complete as possible at any point in time.
I could do this with an infinite loop, and a sleep at the end of each loop, but according to http://www.scala-lang.org/node/242 actors use a thread pool which is expanded whenever all threads are blocked. So I imagine a Thread.sleep in each actor would be a bad idea, as it would waste threads unnecessarily.
I could perhaps have a central actor with its own loop that sends messages to subscribers on a clock (like async event clock observers)?
Has anyone done anything similar or have any suggestions? Sorry for extra (perhaps superfluous) information.
Cheers
Joe
There was a good point about Erlang in the first answer, but it seems to have disappeared. You can do the same Erlang-like trick with Scala actors easily. E.g. let's create a scheduler that does not use threads:
import actors.{Actor,TIMEOUT}
def scheduler(time: Long)(f: => Unit) = {
def fixedRateLoop {
Actor.reactWithin(time) {
case TIMEOUT => f; fixedRateLoop
case 'stop =>
}
}
Actor.actor(fixedRateLoop)
}
And let's test it (I did it right in Scala REPL) using a test client actor:
case class Ping(t: Long)
import Actor._
val test = actor { loop {
receiveWithin(3000) {
case Ping(t) => println(t/1000)
case TIMEOUT => println("TIMEOUT")
case 'stop => exit
}
} }
Run the scheduler:
import compat.Platform.currentTime
val sched = scheduler(2000) { test ! Ping(currentTime) }
and you will see something like this
scala> 1249383399
1249383401
1249383403
1249383405
1249383407
which means our scheduler sends a message every 2 seconds as expected. Let's stop the scheduler:
sched ! 'stop
the test client will begin to report timeouts:
scala> TIMEOUT
TIMEOUT
TIMEOUT
stop it as well:
test ! 'stop
There's no need to explicitly cause an actor to sleep: using loop and react for each actor means that the underlying thread pool will have waiting threads whilst there are no messages for the actors to process.
In the case that you want to schedule events for your actors to process, this is pretty easy using a single-threaded scheduler from the java.util.concurrent utilities:
object Scheduler {
import java.util.concurrent.Executors
import scala.compat.Platform
import java.util.concurrent.TimeUnit
private lazy val sched = Executors.newSingleThreadScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time , TimeUnit.MILLISECONDS);
}
}
You could extend this to take periodic tasks and it might be used thus:
val execTime = //...
Scheduler.schedule( { Actor.actor { target ! message }; () }, execTime)
Your target actor will then simply need to implement an appropriate react loop to process the given message. There is no need for you to have any actor sleep.
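One possible shape for the periodic extension mentioned above, using only java.util.concurrent. The tick body here just counts down a latch; in the actor setting it would be the place to send the message to the target. PeriodicSchedulerDemo and its method names are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ScheduledExecutorService, ScheduledFuture, TimeUnit}

object PeriodicSchedulerDemo {
  // Runs `f` every `periodMillis` on the scheduler's thread; the returned handle can cancel it.
  def scheduleAtFixedRate(sched: ScheduledExecutorService, periodMillis: Long)(f: => Unit): ScheduledFuture[_] =
    sched.scheduleAtFixedRate(
      new Runnable { def run(): Unit = f },
      periodMillis, periodMillis, TimeUnit.MILLISECONDS)

  // Waits for three ticks, then cancels the task and stops the scheduler.
  def demo(): Boolean = {
    val sched = Executors.newSingleThreadScheduledExecutor()
    val ticks = new CountDownLatch(3)
    val handle = scheduleAtFixedRate(sched, 20) { ticks.countDown() }
    try ticks.await(5, TimeUnit.SECONDS)
    finally {
      handle.cancel(false)
      sched.shutdown()
    }
  }
}
```

Note that scheduleAtFixedRate stops rescheduling a task whose run throws, so a real tick body would probably want to catch its own exceptions.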
ActorPing (Apache License) from lift-util has schedule and scheduleAtFixedRate. Source: ActorPing.scala
From scaladoc:
The ActorPing object schedules an actor to be ping-ed with a given message at specific intervals. The schedule methods return a ScheduledFuture object which can be cancelled if necessary
Unfortunately, there are two errors in oxbow_lakes's answer.
One is a simple declaration mistake (long time vs time: Long), but the second is somewhat more subtle.
oxbow_lakes declares run as
def run = actors.Scheduler.execute(f)
This, however, leads to messages disappearing from time to time. That is, they are scheduled but never get sent. Declaring run as
def run = f
fixed it for me. It's done the exact way in the ActorPing of lift-util.
The whole scheduler code becomes:
object Scheduler {
import java.util.concurrent.{Executors, TimeUnit}
import scala.compat.Platform
private lazy val sched = Executors.newSingleThreadScheduledExecutor();
def schedule(f: => Unit, time: Long) {
sched.schedule(new Runnable {
def run = f
}, time - Platform.currentTime, TimeUnit.MILLISECONDS);
}
}
I tried to edit oxbow_lakes's post, but could not save it (broken?), nor do I have rights to comment yet. Hence a new post.