Event based design - Futures, Promises vs Akka Persistence - scala

I have multiple use cases which require predefined events to be fired based on certain user actions.
For example, when a NewUser is created in the application, it has to call CreateUserInWorkflowSystem and FireEmailToTheUser asynchronously. There are many other business cases of this nature where the events are predefined per use case. I can use Promises/Futures to model these events as below:
if 'NewUser' then
call `CreateUserInWorkflowSystem` (which will be Future based API)
call `FireEmailToTheUser` (which will be Future based API)
if 'FileImport' then
call `API3` (which will be Future based call)
call `API4` (which will be Future based call)
All those Future calls will have to log failures somewhere so that failed calls can be retried. Note that the NewUser call won't wait for those Futures (the events, per se) to complete.
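A minimal sketch of that Future-based variant, using only the Scala standard library. The stub APIs, the `FailedCall` record and the in-memory `failures` queue are assumptions for illustration; a real system would write failures to a durable store so they survive a restart.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Failure

object EventDispatch {
  // Hypothetical record of a failed call, kept so it can be retried later.
  final case class FailedCall(event: String, action: String, error: Throwable)

  // Stand-in for a durable store (a database table or job queue in practice).
  val failures = new ConcurrentLinkedQueue[FailedCall]()

  // Stub Future-based APIs; the names follow the question, the bodies are made up.
  def createUserInWorkflowSystem(user: String)(implicit ec: ExecutionContext): Future[Unit] =
    Future(())

  def fireEmailToTheUser(user: String)(implicit ec: ExecutionContext): Future[Unit] =
    Future.failed(new RuntimeException(s"mail server unavailable for $user"))

  // Records a failure as a side effect; the returned Future keeps the original result.
  def logged[A](event: String, action: String)(call: Future[A])(implicit ec: ExecutionContext): Future[A] =
    call.andThen { case Failure(e) => failures.add(FailedCall(event, action, e)) }

  // Fires both events. The caller is free to ignore the returned Futures.
  def onNewUser(user: String)(implicit ec: ExecutionContext): Seq[Future[Unit]] = Seq(
    logged("NewUser", "CreateUserInWorkflowSystem")(createUserInWorkflowSystem(user)),
    logged("NewUser", "FireEmailToTheUser")(fireEmailToTheUser(user))
  )
}
```

The NewUser code path would call `EventDispatch.onNewUser(user)` and return immediately; a separate job could later drain `failures` and retry.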
That was using the plain Futures/Promises APIs. However, I am thinking Akka Persistence would be an appropriate fit here, and blocking calls can still be wrapped in Futures. With Akka Persistence, handling failures will be easy as it provides that out of the box. I understand Akka Persistence is still in an experimental stage, but that doesn't seem to be a big concern, as Typesafe generally keeps new frameworks in an experimental state before promoting them in a future release (the same was true of Macros). Given these requirements, do you think Futures/Promises or Akka Persistence is a better fit here?

This is an opinion-based question, which is not the best type to ask on SO. Anyway, I'll try to answer.
It really depends on what you are more comfortable with and what your requirements are. If you need to scale the system beyond a single JVM later, use Akka. If you want to keep it simpler, use Futures.
If you use Futures, you can store all state and the actions to execute in a job queue or database. That is quite reasonable.
If you use Akka Persistence then it will obviously help you with persistence. Akka also makes supervision, recovery and retries easier. If your CreateUserInWorkflowSystem action fails, the failure is propagated to the supervising actor, which will probably restart the failed actor and make it retry up to N times. If the supervising actor fails, then its supervisor will do the right thing, or eventually the whole app will crash, which is good. With Futures you would have to implement this mechanism yourself and make sure that the application can crash when needed.
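To give an idea of what "implement this mechanism yourself" could look like on the Futures side, here is a minimal retry helper. It is only a sketch: the `Retry` object and its `retry` method are hypothetical names, there is no delay or backoff between attempts, and nothing is persisted.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object Retry {
  // Runs `op` and, if the resulting Future fails, runs it again
  // until it succeeds or `attempts` tries have been used up.
  // `op` is by-name, so every attempt starts a fresh Future.
  def retry[A](attempts: Int)(op: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    op.recoverWith {
      case NonFatal(_) if attempts > 1 => retry(attempts - 1)(op)
    }
}
```

A call such as `Retry.retry(3)(createUserInWorkflowSystem(user))` would try at most three times and then fail with the last error.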
If you have completely independent actions then Futures and Actors sound about the same. If you have to chain actions and compose them, then using Futures will be a somewhat more natural thing to do: for comprehensions, etc. In Akka you would have to wait for a message and, based on the type of the message, perform the next action.
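A small sketch of chaining with a for comprehension. All three service methods are made-up stubs; the point is only that each step starts when the previous one has completed, and a failure in any step short-circuits the rest.

```scala
import scala.concurrent.{ExecutionContext, Future}

object Chaining {
  // Hypothetical services, stubbed so the example is self-contained.
  def createUser(name: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(name.length) // pretend the user id is the name length

  def createWorkflowEntry(userId: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"wf-$userId")

  def sendEmail(userId: Int, workflowId: String)(implicit ec: ExecutionContext): Future[String] =
    Future(s"mail sent to user $userId for $workflowId")

  // Each generator runs only after the previous Future has succeeded.
  def register(name: String)(implicit ec: ExecutionContext): Future[String] =
    for {
      id   <- createUser(name)
      wf   <- createWorkflowEntry(id)
      mail <- sendEmail(id, wf)
    } yield mail
}
```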
Try to mock a simple implementation using both and compare what you like/dislike given your particular application requirements. Overall, both choices are good, but I'm slightly leaning towards actors in this case.

Related

When to create an Akka Actor

I have a REST service which serves only one POST request. I want to use an actor to process the request. However, I don't know if I should create one actor and route all requests through it, or create a new actor every time I get a request. What are the pros and cons of these choices?
Also, how is execution parallel when I create one actor and use it to process all my requests? It certainly looks like sequential execution. I would like to understand this as well.
If you use one Actor, requests are queued in the actor's mailbox and processed one by one by that actor. This is sequential and not recommended.
That's why it is said:
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap, you can create one actor per request without any problem.
Do DB interactions and other heavy computation in a Future, and direct the result of the Future to the request-handling actor using the pipeTo pattern.
Use actors only to divide and distribute work, and use Futures to do compute-intensive work.
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray and Akka HTTP do), then you can complete the request from this new actor. This way your request-handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what #pamu said. Actors are cheap. But be mindful that if you ever use a singleton Actor, you should not make it stateful; it will cause trouble.
And if you are going to use Futures to do intensive work (which you should), make sure you give them a specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not good.
Alternatively, for each API you have, create a dedicated dispatcher to control the number of actors that will work on that kind of endpoint.
For example, if you have "/get/transactions", specify a dispatcher that spawns only a fixed number of threads for this API.
The advantage of this is that you can control the number of threads and resources your app uses. When it comes to dealing with heavy traffic, this is a good practice.
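As a rough illustration of the dedicated-pool idea without any Akka configuration, the sketch below backs one endpoint's heavy work with its own fixed-size thread pool. The object name, the pool size of 4 and the stubbed query are assumptions. In Akka itself you would more likely define a dispatcher in the configuration and look it up with `system.dispatchers.lookup`.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object TransactionsApi {
  // Assumed limit: at most 4 threads ever do the heavy work for this endpoint.
  private val pool = Executors.newFixedThreadPool(4)
  val transactionsEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // The Future is started explicitly on the dedicated context,
  // so it never occupies a thread of the global pool.
  def loadTransactions(userId: Int): Future[Seq[String]] =
    Future {
      // a blocking database call would go here
      Seq(s"tx-for-$userId")
    }(transactionsEc)

  // The pool uses non-daemon threads, so shut it down when the app stops.
  def shutdown(): Unit = pool.shutdown()
}
```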

Should Akka Actors do real processing tasks?

I'm writing an application that reads relatively large text files, validates and transforms the data (every line in a text file is its own item; there are around 100M items per file) and creates some kind of output. A multithreaded Java application already exists (using a BlockingQueue between the Reading/Processing/Persisting tasks), but I want to implement a Scala application that does the same thing.
Akka seems to be a very popular choice for building concurrent applications. Unfortunately, due to the asynchronous nature of actors, I still don't understand what a single actor can or can't do, e.g. if I can use actors as traditional workers that do some sort of calculation.
Several pieces of documentation say that actors should never block, and I understand why. But the examples given for blocking code only ever mention things such as blocking file/network IO: things that make the actor wait for a short period of time, which is of course a bad thing.
But what if the actor is "blocking" because it actually does something useful instead of waiting? In my case, the processing and transformation of a single line/item of text takes 80ms, which is quite a long time (pure processing, no IO involved). Can this work be done by an actor directly, or should I use a Future instead (but then, if I have to use Futures anyway, why use Akka in the first place)?
The Akka docs and examples show that work can be done directly by actors. But it seems that the authors only do very simplistic work (such as calling filter on a String or incrementing a counter, and that's it). I don't know if they do this to keep the docs simple and concise or because you really should not do more than that within an actor.
How would you design an Akka-based application for my use case (reading a text file, processing every line, which takes quite some time, and eventually persisting the result)? Or is this a kind of problem that is not suited to Akka?
It all depends on the type of an actor.
I use this rule of thumb: if you don't need to talk to this actor and this actor does not have any other responsibilities, then it's ok to block in it doing actual work. You can treat it as a Future and this is what I would call a "worker".
If you block in an actor that is not a leaf node (worker), i.e. a work distributor, then the whole system will slow down.
There are a few patterns that involve work pulling/pushing or an actor-per-request model. Either of those could be a fit for your application. You can have a manager that creates an actor for each piece of work; when the work is finished, the actor sends the result back to the manager and dies. You can also keep an actor alive and have it ask for more work. You can also combine actors and Futures.
Sometimes you want to be able to talk to a worker if your processing is more complex and involves multiple stages. In that case a worker can delegate work to yet another actor or to a Future.
To sum up: don't block in manager/work distribution actors. It's ok to block in workers if that does not slow your system down.
Disclaimer: by blocking I mean doing actual work, not just busy waiting, which is never ok.
Doing computations that take 100ms is fine in an actor. However, you need to make sure to properly deal with backpressure. One way would be to use the work-pulling pattern, where your CPU bound actors request new work whenever they are ready instead of receiving new work items in a message.
That said, your problem description sounds like a processing pipeline that might benefit from a higher-level abstraction such as Akka Streams. Basically, produce a stream of file names to be processed and then use transformations such as map to get the desired result. I have something like this in production that sounds pretty similar to your problem description, and it works very well provided the data used by the individual processing chunks is not too large.
Of course, a stream will also be materialized to a number of actors. But the high level interface will be more type-safe and easier to reason about.

How to tune Play Framework application with proper threadpools?

I am working with Play Framework (Scala) version 2.3. From the docs:
You can’t magically turn synchronous IO into asynchronous by wrapping it in a Future. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a Future, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency.
This has me a bit confused on how to tune my webapp. Specifically, since my app has a good amount of blocking calls: a mix of JDBC calls, and calls to 3rd party services using blocking SDKs, what is the strategy for configuring the execution context and determining the number of threads to provide? Do I need a separate execution context? Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
I know this ultimately will depend on the specifics of my app, but I'm looking for some guidance on the strategy and approach. The play docs preach the use of non-blocking operations everywhere but in reality the typical web-app hitting a sql database has many blocking calls, and I got the impression from reading the docs that this type of app will perform far from optimally with the default configurations.
[...] what is the strategy for configuring the execution context and determining the number of threads to provide
Well, that's the tricky part which depends on your individual requirements.
First of all, you probably should choose a basic profile from the docs (pure asynchronous, highly synchronous or many specific thread pools)
The second step is to fine-tune your setup by profiling and benchmarking your application
Do I need a separate execution context?
Not necessarily. But it makes sense to use separate execution contexts if you want to trigger all your blocking IO-calls at once and not in a sequential way (so database call B does not have to wait until database call A is finished).
Why can't I simply configure the default pool to have a sufficient amount of threads (and if I do this, why would I still need to wrap the calls in a Future?)?
You can, check the docs:
play {
  akka {
    akka.loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = WARNING
    actor {
      default-dispatcher = {
        fork-join-executor {
          parallelism-min = 300
          parallelism-max = 300
        }
      }
    }
  }
}
With this approach, you basically are turning Play into a one-thread-per-request-model. This is not the idea behind Play, but if you're doing a lot of blocking IO calls, it's the simplest approach. In this case, you don't need to wrap your database calls in a Future.
To put it in a nutshell, you basically have three ways to go:
1. Only use (IO-)technologies whose API calls are non-blocking and asynchronous. This allows you to use a small thread pool / default execution context, which suits the nature of Play.
2. Turn Play into a one-thread-per-request framework by drastically increasing the default execution context. No futures needed, just call your blocking database as always.
3. Create specific execution contexts for your blocking IO calls and gain fine-grained control of what you are doing.
Firstly, before diving in and refactoring your app, you should determine whether this is actually a problem for you. Run some benchmarks (Gatling is superb) and do a few profiles with something like JProfiler. If you can live with the current performance then happy days.
The ideal is to use a reactive driver which returns a Future that then gets passed all the way back to your controller. Unfortunately, async is still an open ticket for Slick. Interacting with REST APIs can be made reactive using the PlayWS library, but if you have to go via a library that your 3rd party provides then you're stuck.
So, assuming that none of these are feasible and that you do need to improve performance, the question is what benefit would Play's suggestion have? I think what they're getting at here is that it's useful to partition your threads into those that block and those that can make use of asynchronous techniques.
If, for instance, only some proportion of your requests are long and blocking then with a single thread pool you risk all threads being used for the blocking operations. Your controller would then not be able to handle any new requests, irrespective of whether that request needs to call a blocking service. If you can allocate enough threads that this never happens then no problem.
If, on the other hand, you are hitting your limit for threads then by using two pools you can keep your fast, non-blocking requests snappy. You would have one pool servicing requests in your controller and calling into services which return futures. Some of these futures would actually be performing work using a separate pool of threads, but only for the blocking operations. If there is any portion of your app which could be made reactive, then your controller could take advantage of this while isolating the controller from the blocking operations.
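The two-pool idea can be sketched with plain Scala Futures, independent of Play. Everything here is an assumption for illustration: the pool size of 2, the names, and the latch that stands in for a slow JDBC call. Even when every thread of the blocking pool is stuck, work submitted to the other pool still completes.

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import scala.concurrent.{ExecutionContext, Future}

object TwoPools {
  // Small dedicated pool for blocking calls (size chosen arbitrarily).
  private val blockingPool = Executors.newFixedThreadPool(2)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(blockingPool)

  // Simulated blocking call: the thread is parked until the latch is released.
  def slowQuery(release: CountDownLatch): Future[String] =
    Future { release.await(); "rows" }(blockingEc)

  // Non-blocking work runs on whatever context the caller supplies,
  // e.g. the default pool that serves the controllers.
  def fastRequest()(implicit ec: ExecutionContext): Future[String] =
    Future("pong")

  def shutdown(): Unit = blockingPool.shutdown()
}
```

If `slowQuery` is called more often than the pool has threads, the extra calls queue up inside the blocking pool, while `fastRequest` is unaffected because it never touches that pool.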

Reactive services and scalability without Akka

I've read the Reactive Manifesto a couple of times and tried to wrap my head around all this reactive, async, non-blocking stuff. It's clear how to build scalable systems on top of Actors, but will I get the same effect, in terms of scalability and asynchronous execution, if I actively use Scala Futures all over my code, so that every method either accepts or returns a Future? Would such a service be scalable and responsive? Let's say that in this question I'm not much interested in the event-driven and resilient parts of the service.
This answer reflects my experience using Akka in Scala and its Actor and Future types. I don't consider myself an expert, yet, but I've done a few systems using these libraries and feel I am beginning to develop a sense for how to use them.
The choice of using an Actor vs. a Future is about the nature of the concurrency you require. Futures compose monadically and in DAG-structured graphs, making certain computational structures very elegantly composed. But they have severe limitations. Basically, the computation that is performed concurrently within a Future must be self-contained (or at most reference only immutable external state) or you have not solved the problem of inter-thread interference with all the attendant risks such as deadlock, race conditions, unpredictable behavior or data structure corruption.
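As a small illustration of the "self-contained" requirement, the sketch below (names are made up) has each Future read only its own immutable input. The partial results are combined after all Futures have completed, rather than by having the Futures update a shared mutable counter.

```scala
import scala.concurrent.{ExecutionContext, Future}

object WordCount {
  // Self-contained computation: depends only on its immutable argument.
  def countWords(line: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(line.split("\\s+").count(_.nonEmpty))

  // Runs one Future per line and sums the results once all are done.
  // No state is shared between the concurrent computations.
  def totalWords(lines: Seq[String])(implicit ec: ExecutionContext): Future[Int] =
    Future.traverse(lines)(line => countWords(line)).map(_.sum)
}
```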
When your computation involves long-lived, mutable state, an Actor encapsulates that securely, preventing corruption and race conditions. On the other hand, actors are not composable, but they do provide a lot of flexibility in how you construct networks of interacting computations. This is true only provided you don't limit actor responses to always and only sending them (the response to a request) back to the actor from which the request was received. If you only send responses to the requesting actor, you're limited to a tree-structured pattern of inter-actor interaction.
In any real, non-trivial system, you're very likely to use both formalisms, Future and Actor.

Using Scala Akka framework for blocking CLI calls

I'm relatively new to Akka & Scala, but I would like to use Akka as a generic framework to pull together information from various web tools and CLI commands.
I understand the general principle that in an Actor model it is highly desirable not to have the actors block. And in the case of the HTTP requests, there are async HTTP clients (such as Spray), which means that I can handle the requests asynchronously within the Actor framework.
However, I'm unsure what is the best approach when combining actors with existing blocking API calls such as the scala ProcessBuilder/ProcessIO libraries. In terms of issuing these CLI commands I expect a relatively small amount of concurrency, e.g. perhaps executing a max of 10 concurrent CLI invocations on a 12 core machine.
Is it better to have a single actor managing these CLI commands, farming the actual work off to Futures that are created as needed? Or would it be cleaner just to maintain a set of separate actors backed by a PinnedDispatcher? Or something else?
From the Akka documentation ( http://doc.akka.io/docs/akka/snapshot/general/actor-systems.html#Blocking_Needs_Careful_Management ):
"
Blocking Needs Careful Management
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottle-necks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router [Java, Scala]), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware."
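For the CLI case in the question, the option of running blocking calls in a Future backed by a bounded thread pool might look roughly like the sketch below. It uses `scala.sys.process` from the standard library; the `CliRunner` name is hypothetical, and the pool size of 10 is taken from the question's estimate of at most 10 concurrent invocations.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
import scala.sys.process._

object CliRunner {
  // At most 10 commands run at the same time; further calls wait in the pool's queue.
  private val pool = Executors.newFixedThreadPool(10)
  private val cliEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  // `!!` blocks until the process exits and returns its standard output.
  // A non-zero exit code makes it throw, which fails the Future.
  def run(command: Seq[String]): Future[String] =
    Future(command.!!.trim)(cliEc)

  def shutdown(): Unit = pool.shutdown()
}
```

An actor could then call something like `CliRunner.run(Seq("echo", "hello"))` and have the result sent back to itself as a message, so the actor's own thread never blocks on the process.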