How does the execution context from
import scala.concurrent.ExecutionContext.Implicits.global
differ from Play's execution contexts:
import play.core.Execution.Implicits.{internalContext, defaultContext}
They are very different.
In Play 2.3.x and prior, play.core.Execution.Implicits.internalContext is a ForkJoinPool with fixed constraints on size, used internally by Play. You should never use it for your application code. From the docs:
Play Internal Thread Pool - This is used internally by Play. No application code should ever be executed by a thread in this thread pool, and no blocking should ever be done in this thread pool. Its size can be configured by setting internal-threadpool-size in application.conf, and it defaults to the number of available processors.
Instead, you would use play.api.libs.concurrent.Execution.Implicits.defaultContext, which uses an ActorSystem.
In 2.4.x, they both use the same ActorSystem. This means that Akka will distribute work among its own pool of threads, but in a way that is invisible to you (other than configuration). Several Akka actors can share the same thread.
scala.concurrent.ExecutionContext.Implicits.global is an ExecutionContext defined in the Scala standard library. It is a special ForkJoinPool that uses the blocking method to handle potentially blocking code by spawning new threads in the pool. You really shouldn't use this in a Play application, as Play will have no control over it. It also has the potential to spawn a lot of threads and use a ton of memory if you're not careful.
I've written more about scala.concurrent.ExecutionContext.Implicits.global in this answer.
They are the same and point to the default dispatcher of the underlying actor system in your Play, Akka, or combined application.
## Play's Default Context
play.api.libs.concurrent.Execution.Implicits.defaultContext
## Play's Internal Context
play.core.Execution.Implicits.internalContext
## Guice-Injected ExecutionContext
import javax.inject.Inject
import play.api.Configuration
import scala.concurrent.ExecutionContext

class ClassA @Inject()(config: Configuration)
                      (implicit ec: ExecutionContext) {
  ...
}
But this is different:
scala.concurrent.ExecutionContext.Implicits.global
DB drivers may also come with their own execution context, e.g. if you use Slick. Anyway,
Best Practices:
- Don't use scala.concurrent.ExecutionContext.Implicits.global when you are using the Play or Akka framework; during high load you may end up with more threads than is optimal, and performance may decrease.
- Don't be afraid! Use the default dispatcher as much as you want, everywhere, unless you perform some blocking task, for example listening on a network connection or explicitly reading from the DB, which makes the current thread wait for the result.
- Start with the default executor, and if you find Play/Akka not responding well during high load, switch to a new thread pool for time-consuming computation tasks.
- Computational tasks that take a long time are not usually considered blocking (for example, traversing an autocompletion tree in memory). But you may treat them as blocking when you want your control structures to remain functioning while a time-consuming computation runs.
- The bad thing that can happen when you treat computational tasks as non-blocking is that the Play and Akka message dispatchers pause when all threads are busy computing under heavy load. The pro of a separate dispatcher is that the queue processor doesn't starve; the con is that you may allocate more threads than is optimal, and overall performance decreases.
- The difference matters for high-load servers; don't worry about it for small projects, just use the default.
- Use scala.concurrent.ExecutionContext.Implicits.global when you have no other executor running in your app; in that case it is safe.
- When you create Futures, use the default pool; this is the safest option unless you are sure a Future is blocking, in which case use a separate pool or a blocking {} block if possible.
- Create a separate thread pool when:
  - you Await a Future
  - you call Thread.sleep
  - you read from a stream/socket or make an HTTP call
  - you manually query the DB with a blocking driver (Slick is usually safe)
  - you schedule a task to run in 10 seconds
  - you schedule a task to run every second
- For map/recover operations on a Future, use the default executor; this is usually safe.
- Exception handling is safe with the default dispatcher.
- Always prefer Akka dispatchers in Play or Akka applications; Akka has a nice way to define a new dispatcher in application.conf, as sketched below.
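A minimal sketch of such a dedicated dispatcher for blocking work; the dispatcher name my-blocking-dispatcher, the pool size, and the ReportService class are illustrative assumptions, not fixed conventions:

```scala
import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

// assumed entry in application.conf:
// my-blocking-dispatcher {
//   type = Dispatcher
//   executor = "thread-pool-executor"
//   thread-pool-executor { fixed-pool-size = 32 }
// }
class ReportService(system: ActorSystem) {
  // look up the dedicated pool instead of using the default dispatcher
  private implicit val blockingEc: ExecutionContext =
    system.dispatchers.lookup("my-blocking-dispatcher")

  def slowQuery(): Future[Int] = Future {
    Thread.sleep(1000) // stands in for a blocking JDBC call
    42
  }
}
```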
PRELUDE: This question is from 6 years ago, and many things have changed since then. I know this is not an answer to the original question, but I was confused for more than a day by the very issue the question describes, so I decided to share my research results with the community.
The latest update regarding ExecutionContext, which applies perfectly to Play 2.8.15, is as follows. The Play 2.6 migration guide states:
The play.api.libs.concurrent.Execution class has been deprecated, as it was using global mutable state under the hood to pull the “current” application’s ExecutionContext.
If you want to specify the implicit behavior that you had previously, then you should pass in the execution context implicitly in the constructor.
So you cannot use play.api.libs.concurrent.Execution.Implicits.defaultContext anymore. The no-configuration, out-of-the-box practice is to provide an implicit value of type scala.concurrent.ExecutionContext for the controller, something like:
import javax.inject._
import akka.actor.ActorSystem
import play.api.mvc._
import scala.concurrent.ExecutionContext

@Singleton
class AsyncController @Inject()(cc: ControllerComponents, actorSystem: ActorSystem)(implicit exec: ExecutionContext) extends AbstractController(cc)
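For illustration, a minimal async action inside such a controller; the injected exec is picked up implicitly by Future.apply and map (the action body is a hypothetical sketch, not from the Play docs):

```scala
import scala.concurrent.Future

def message = Action.async {
  Future {
    "computed off the request thread" // runs on the injected ExecutionContext
  }.map(result => Ok(result))
}
```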
This means that none of the above answers hold anymore, and the question itself is no longer relevant, since play.core.Execution.Implicits.defaultContext is not available anymore.
Related
I am asking myself the question: "When should you use scala.concurrent.blocking?"
If I understood correctly, blocking {} only makes sense when used in conjunction with a ForkJoinPool. In addition, docs.scala-lang.org highlights that blocking shouldn't be used for long-running executions:
Last but not least, you must remember that the ForkJoinPool is not designed for long-lasting blocking operations.
I assume a long-running execution is a database call or some kind of external IO. In this case a separate thread pool should be used, e.g. a CachedThreadPool. Most IO-related frameworks, like sttp, doobie, and cats, can make use of a provided IO thread pool.
So I am asking myself: which use case still exists for the blocking statement? Is it only useful when working with locking and waiting operations, like semaphores?
Consider the problem of thread pool starvation. Say you have a fixed size thread pool of 10 available threads, something like so:
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

implicit val myFixedThreadPool =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
If for some reason all 10 threads are tied up, and a new request comes in which requires an 11th thread to do its work, then this 11th request will hang until one of the threads becomes available.
The Future { blocking { ... } } construct can be interpreted as saying: please do not consume a thread from myFixedThreadPool, but instead spin up a new thread outside myFixedThreadPool.
One practical use case for this is if your application can conceptually be considered to be in two parts: one part which, say, in 90% of cases is talking to proper async APIs, and another part which in a few special cases has to talk to a very slow external API that takes many seconds to respond and that we have no control over. Using the fixed thread pool for the truly async part is relatively safe from thread pool starvation; however, also using the same fixed thread pool for the second part presents the danger that suddenly 10 requests are made to the slow external API, which now causes the other 90% of requests to hang waiting for those slow requests to finish. Wrapping those slow requests in blocking would help minimise the chances of the other 90% of requests hanging.
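In code, on the global pool the wrap looks like the sketch below. One caveat worth checking for your setup: the blocking hint only has an effect if the underlying pool cooperates (the global ForkJoinPool does; a plain Executors-based fixed pool silently ignores it):

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Future, blocking}

// the global ForkJoinPool can grow a temporary thread for the marked
// section, so the slow call does not starve the other tasks
val slow: Future[String] = Future {
  blocking {
    Thread.sleep(10000) // stands in for the very slow external API
    "response"
  }
}
```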
Another way of achieving this kind of "swimlaning" of truly async requests away from blocking requests is by offloading the blocking requests to a separate dedicated thread pool used just for the blocking calls, something like so:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

implicit val myDefaultPool =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
val myPoolForBlockingRequests =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(20))

Future {
  callAsyncApi
} // consumes a thread from myDefaultPool

...

Future {
  callBlockingApi
}(myPoolForBlockingRequests) // consumes a thread from myPoolForBlockingRequests
I am asking myself the question: "When should you use scala.concurrent.blocking?"
Well, since that is mostly useful for Future, and Future should never be used for serious business logic, then: never.
Now, "jokes" aside, when using Futures then you should always use blocking when wrapping blocking operations, AND receive a custom ExecutionContext; instead of hardcoding the global one. Note, this should always be the case, even for non-blocking operations, but IME most folks using Future don't do this... but that is another discussion.
Then, callers of those blocking operations may decide whether to use their compute EC or a blocking one.
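A sketch of that advice; readFileContents is a made-up example of a blocking operation:

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}

// the caller supplies the ExecutionContext; nothing is hardcoded, so the
// call site decides between its compute EC and a blocking one
def readFileContents(path: String)(implicit ec: ExecutionContext): Future[String] =
  Future {
    blocking { // mark the blocking file IO
      val source = scala.io.Source.fromFile(path)
      try source.mkString
      finally source.close()
    }
  }
```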
When the docs mention "long-lasting" they don't mean anything specific, mostly because it is too hard to be specific about that; it is context/application specific. What you need to understand is that blocking by default (note that a particular EC may do whatever it wants) will just create a new thread, and if you create a lot of threads and they take too long to be released, you will saturate your memory and kill the program with an OOM error.
For those situations, the recommendation is to control the back pressure of your app to avoid creating too many threads. One way to do that is to create a fixed thread pool sized for the maximum number of blocking operations you will support and just enqueue all other pending tasks; such an EC should just ignore blocking calls. You may also have an unbounded number of threads and manage the back pressure manually in other parts of your code, e.g. with an explicit Queue; this was common advice before: https://gist.github.com/djspiewak/46b543800958cf61af6efa8e072bfd5c
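A sketch of the bounded approach; the pool size 16 is an arbitrary assumption:

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// at most 16 blocking operations run at once; further tasks queue up
// inside the executor instead of spawning new threads (a plain fixed
// pool ignores the blocking {} hint, which is exactly what we want here)
val boundedBlockingEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))
```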
However, having blocked threads always hurts the performance of your app, even if the compute EC is not blocked. The latest talks by Daniel explain this in detail: "The case for effect systems" & "Threads at scale".
So the ecosystem is pushing the state of the art hard to avoid that at all costs, but it is not a simple task. Still, runtimes like the ones provided by cats-effect or ZIO are optimized to handle blocking tasks as well as they can as of today, and they will probably keep improving over the coming years.
I'm new to akka-actor and confused about a few things:
1. When I create an ActorSystem and use actorOf(Props(classOf[AX], ...)) to create an actor in the main method, how many instances of my actor AX are there?
2. If the answer to Q1 is just one, does this mean that whatever data structure I create in the AX actor class's definition will only be touched by one thread, and I should not worry about concurrency problems?
3. What if one of my actor's actions (one case in the receive method) is a time-consuming task that takes quite a long time to finish? Will my single actor instance be unresponsive until it finishes that task?
4. If the answer to Q3 is yes, what am I supposed to do to prevent my actor from becoming unresponsive? Should I start another thread that sends a message back to the actor once the task finishes? Is there a best practice I should follow?
Yes, the actor system will only create one actor instance for each time you call the actorOf method. However, when using a Router it is possible to create one router which spreads the load over any number of actors. So in that case it is possible to construct multiple instances, but "normally" using actorOf just creates one instance.
Yes, within an actor you do not have to worry about concurrency, because Akka guarantees that an actor only processes one message at a time. You must take care not to somehow mutate the state of the actor from code outside the actor. So whenever you expose actor state, always do it using an immutable class; case classes are excellent for this. Also beware of modifying actor state when completing a Future from inside the actor: since the Future runs on its own thread, you could have a concurrency issue if the Future completes while the actor is processing the next message at the same time.
The actor executes on one thread at a time, but this might be a different thread each time the actor executes.
Akka is a highly concurrent and distributed framework; everything is asynchronous and non-blocking, and you must do the same within your application. Scala and Akka provide several solutions for this. Whenever you have a time-consuming task within an actor, you can delegate it to another actor created just for this purpose, use Futures, or use Scala's async/await/blocking. When using blocking you give the compiler/runtime a hint that a blocking action is being performed, and the runtime may start additional threads to prevent thread starvation. The Scala concurrent programming book is an excellent guide for learning this; also look at the scala.concurrent package ScalaDocs and the Neophyte's Guide to Scala.
If the actor really has to wait for the time consuming task to complete, then yes, your actor can only respond when that's finished. But this is a very 'request-response' way of thinking. Try to get away from this. The actor could also respond immediately indicating the task has started and send an additional message once the task has been completed.
With time-consuming tasks, always be sure to use a different thread pool so the ActorSystem will not be blocked because all of its available threads are used up by those tasks. For Futures you can provide a separate ExecutionContext (do not use the ActorSystem's dispatch context for this!), and via Akka's configuration you can also configure certain actors to run on a different thread pool.
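A sketch of the delegate-and-message-back pattern from points 3 and 4; the dispatcher name blocking-dispatcher and the string protocol are assumptions:

```scala
import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.{ExecutionContext, Future}

class Worker extends Actor {
  // assumes a "blocking-dispatcher" entry exists in application.conf
  private val blockingEc: ExecutionContext =
    context.system.dispatchers.lookup("blocking-dispatcher")
  import context.dispatcher // used by pipeTo to deliver the result

  def receive: Receive = {
    case "start" =>
      // run the slow task off the actor's thread and mail the result back
      Future(slowComputation())(blockingEc).pipeTo(self)
    case result: Int =>
      println(s"task finished with $result")
    case "ping" =>
      sender() ! "pong" // still responsive while the task runs
  }

  private def slowComputation(): Int = { Thread.sleep(5000); 42 }
}
```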
See 3.
Success!
1. One instance (if you declare a router in your Props, then possibly more than one).
2. Yes. This is one of the advantages of actors.
3. Yes. An actor will process messages sequentially.
4. You can use a scala.concurrent.Future (do not touch actor state from within the Future) or delegate the work to a child actor (the main actor can manage the state and respond to messages). Future vs. child actor depends on the use case.
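A minimal sketch of the child-actor option; Parent, Child, and the string protocol are made up for illustration:

```scala
import akka.actor.{Actor, ActorRef, Props}

// the parent keeps its state and stays responsive; each job is handed
// to a short-lived child that replies directly to the original sender
class Parent extends Actor {
  def receive: Receive = {
    case job: String =>
      val replyTo = sender() // capture before the Props closure
      context.actorOf(Props(new Child(replyTo))) ! job
  }
}

class Child(replyTo: ActorRef) extends Actor {
  def receive: Receive = {
    case job: String =>
      replyTo ! s"done: $job" // the long-running work would happen here
      context.stop(self)
  }
}
```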
I'm not sure how threading works inside of Play. From what I understand, Netty uses a single thread, but I'm not sure how this translates to the way controller actions are called.
class SomeController extends Controller {
  val processor = new PegDownProcessor() // single shared instance

  def index = Action { request =>
    val result = processor.doSomething()
    Ok("hello")
  }
}
The pegdown library says instantiating the PegDownProcessor can take hundreds of milliseconds, and suggests using a single reference in an application.
Note that the first time you create a PegDownProcessor it can take up to a few hundred milliseconds to prepare the underlying parboiled parser instance. However, once the first processor has been built all further instantiations will be fast. Also, you can reuse an existing PegDownProcessor instance as often as you want, as long as you prevent concurrent accesses, since neither the PegDownProcessor nor the underlying parser is thread-safe.
https://github.com/sirthias/pegdown
It also says that it isn't thread-safe.
Is the above usage designed correctly, where I use a single instance as a val inside a controller and use it inside a controller action?
Please explain whether it is correct, i.e. thread-safe, and why or why not.
Play actions can be called from multiple threads, so sharing a single non-thread-safe PegDownProcessor this way is not safe.
A quick solution that popped into my head:
You could create a pool of processors. The pool would be thread-safe and would contain a given number of processors (you could size it dynamically or based on the CPU/RAM you have). When a request comes in, the pool puts it in a FIFO queue (using a thread-safe queue implementation, of course). Each processor operates on its own thread; when one finishes a job, it checks the queue for a new one. The pool's enqueue method returns a Future which is resolved when the task is processed. Play supports async results for controller methods, so this would play nicely with Play as well.
A similar solution is to use Akka and its actor pool (router) feature, which basically implements the above approach in a more generic way. Since actors are single-threaded, each actor would hold a single reference to a processor and simply do the same as you would on a single thread. Akka allows for advanced options, such as defining the routing strategy, and also fits nicely into the Play stack. Akka has almost no overhead itself, and you can create thousands of actors without any performance issues.
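A sketch of that actor-pool idea using a round-robin router; MarkdownActor, the pool size, and the service wrapper are illustrative choices:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.routing.RoundRobinPool
import akka.util.Timeout
import org.pegdown.PegDownProcessor
import scala.concurrent.Future
import scala.concurrent.duration._

// each actor owns one (non-thread-safe) PegDownProcessor; the router
// guarantees a processor is only ever touched by its own actor
class MarkdownActor extends Actor {
  private val processor = new PegDownProcessor()
  def receive: Receive = {
    case source: String => sender() ! processor.markdownToHtml(source)
  }
}

class MarkdownService(system: ActorSystem) {
  private implicit val timeout: Timeout = Timeout(5.seconds)
  // four actors, hence at most four concurrent parses
  private val router =
    system.actorOf(RoundRobinPool(4).props(Props[MarkdownActor]))

  def render(source: String): Future[String] =
    (router ? source).mapTo[String]
}
```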
I am working with Play Framework (Scala) version 2.3. From the docs:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
This has me a bit confused about how to tune my webapp. Specifically, since my app has a good number of blocking calls (a mix of JDBC calls and calls to 3rd party services using blocking SDKs), what is the strategy for configuring the execution context and determining the number of threads to provide? Do I need a separate execution context? Why can't I simply configure the default pool to have a sufficient number of threads (and if I do this, why would I still need to wrap the calls in a Future)?
I know this ultimately depends on the specifics of my app, but I'm looking for some guidance on the strategy and approach. The Play docs preach the use of non-blocking operations everywhere, but in reality a typical web app hitting a SQL database has many blocking calls, and I got the impression from reading the docs that this type of app will perform far from optimally with the default configuration.
[...] what is the strategy for configuring the execution context and determining the number of threads to provide
Well, that's the tricky part which depends on your individual requirements.
First of all, you should probably choose a basic profile from the docs (pure asynchronous, highly synchronous, or many specific thread pools).
The second step is to fine-tune your setup by profiling and benchmarking your application.
Do I need a separate execution context?
Not necessarily. But it makes sense to use separate execution contexts if you want to trigger all your blocking IO calls at once rather than sequentially (so database call B does not have to wait until database call A is finished), as sketched below.
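A sketch of the difference; queryA and queryB stand in for hypothetical blocking calls:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

implicit val dbEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

def queryA(): Int = { Thread.sleep(500); 1 } // hypothetical blocking call
def queryB(): Int = { Thread.sleep(500); 2 } // hypothetical blocking call

val callA = Future(queryA()) // starts immediately on dbEc
val callB = Future(queryB()) // runs concurrently, not after A
val both: Future[(Int, Int)] = for { a <- callA; b <- callB } yield (a, b)
```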
Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
You can, check the docs:
play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-min = 300
          parallelism-max = 300
        }
      }
    }
  }
}
With this approach, you are basically turning Play into a one-thread-per-request model. This is not the idea behind Play, but if you're doing a lot of blocking IO calls, it's the simplest approach. In this case, you don't need to wrap your database calls in a Future.
To put it in a nutshell, you basically have three ways to go:
1. Only use (IO-)technologies whose API calls are non-blocking and asynchronous. This allows you to use a small thread pool / default execution context, which suits the nature of Play.
2. Turn Play into a one-thread-per-request framework by drastically increasing the default execution context. No Futures needed; just call your blocking database as always.
3. Create specific execution contexts for your blocking IO calls and gain fine-grained control over what you are doing, as in the sketch below.
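A sketch of the third option in Play 2.3 terms; the dispatcher name contexts.blocking-io and queryDatabase are assumptions:

```scala
import play.api.Play.current
import play.api.libs.concurrent.Akka
import scala.concurrent.{ExecutionContext, Future}

object Contexts {
  // looks up a dispatcher assumed to be defined under
  // "contexts.blocking-io" in application.conf
  implicit val blockingIo: ExecutionContext =
    Akka.system.dispatchers.lookup("contexts.blocking-io")
}

def queryDatabase(): Seq[String] = ??? // hypothetical blocking JDBC call

def users(): Future[Seq[String]] =
  Future(queryDatabase())(Contexts.blockingIo) // runs on the dedicated pool
```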
Firstly, before diving in and refactoring your app, you should determine whether this is actually a problem for you. Run some benchmarks (Gatling is superb) and do a few profiles with something like JProfiler. If you can live with the current performance, then happy days.
The ideal is to use a reactive driver which would return you a Future that then gets passed all the way back to your controller. Unfortunately, async support is still an open ticket for Slick. Interacting with REST APIs can be made reactive using the PlayWS library, but if you have to go via a library that your 3rd party provides, then you're stuck.
So, assuming that none of these are feasible and that you do need to improve performance, the question is what benefit would Play's suggestion have? I think what they're getting at here is that it's useful to partition your threads into those that block and those that can make use of asynchronous techniques.
If, for instance, only some proportion of your requests are long and blocking then with a single thread pool you risk all threads being used for the blocking operations. Your controller would then not be able to handle any new requests, irrespective of whether that request needs to call a blocking service. If you can allocate enough threads that this never happens then no problem.
If, on the other hand, you are hitting your limit for threads then by using two pools you can keep your fast, non-blocking requests snappy. You would have one pool servicing requests in your controller and calling into services which return futures. Some of these futures would actually be performing work using a separate pool of threads, but only for the blocking operations. If there is any portion of your app which could be made reactive, then your controller could take advantage of this while isolating the controller from the blocking operations.
Akka 2.x requires many operations to reference an ActorSystem. So, to create an instance of an actor MyActor you might say:
val system = ActorSystem()
val myActor = system.actorOf(Props[MyActor])
Because of the frequent need for an ActorSystem, many code examples omit its creation and assume that the reader knows where the system variable came from.
If your code produces actors in different places, you could duplicate this code, possibly creating additional ActorSystem instances, or you could try to share the same ActorSystem instance by referring to some global or by passing the ActorSystem around.
The Akka documentation provides a general overview of systems of actors under the heading 'Actor Systems', and there is documentation of the ActorSystem class. But neither of these helps a great deal in explaining why a user of Akka can't just rely on Akka to manage this under the hood.
Question(s)
1. What are the implications of sharing the same ActorSystem object versus creating a new one each time?
2. What are the best practices here? Passing an ActorSystem around all the time seems surprisingly heavy-handed.
3. Some examples give the ActorSystem a name, ActorSystem("MySystem"); others just call ActorSystem(). What difference does this make, and what happens if you use the same name twice?
4. Does akka-testkit require that you share a common ActorSystem with the one you pass to the TestKit constructor?
Creating an ActorSystem is very expensive, so you want to avoid creating a new one each time you need it. Also, your actors should run in the same ActorSystem unless there is a good reason for them not to. The name of the ActorSystem is also part of the path to the actors that run in it; e.g. if you create an actor in a system named MySystem, it will have a path like akka://MySystem/user/$a. If you are in an actor context, you always have a reference to the ActorSystem: in an Actor you can call context.system. I don't know what akka-testkit expects, but you could take a look at the Akka tests.
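For example, inside an actor (a trivial sketch):

```scala
import akka.actor.Actor

class WhoAmI extends Actor {
  def receive: Receive = {
    case "system?" =>
      // no need to pass the ActorSystem around: it's on the context
      sender() ! context.system.name
  }
}
```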
So to sum it up, you should always use the same system, unless there is a good reason not to do so.
Here are some materials which might be helpful for understanding why the documentation always suggests using one ActorSystem per logical application:
The heaviest part of an ActorSystem is the dispatcher, and each ActorSystem has at least one. The dispatcher is the engine that runs the actors, and to run them it needs threads (usually obtained from a thread pool). The default dispatcher uses a fork-join thread pool with at least 8 threads.
There are shared facilities, like the guardian actors, the event stream, the scheduler, etc. Some of them are in user space, some are internal. All of them need to be created and started.
One ActorSystem with one thread pool configured to the number of cores should give the best results in most cases.
Where the documentation mentions a "logical application", I prefer to think in terms of blocking vs. non-blocking applications. Since the dispatcher configuration belongs to an ActorSystem, one ActorSystem corresponds to one configuration; if the whole application follows the same logic, one ActorSystem should be enough.
Here is a discussion; if you have time, you can read it. It covers a lot: ActorSystem, local or remote, etc.