Design of server component in Scala - scala

Suppose I need to a Scala component to process incoming requests concurrently and return the processing results. Suppose also that the request processing consists of a few steps. Some of the steps are resource and time consuming, some of them are either I/O or CPU-bound, etc.
Assuming that requests come from inside the JVM I would design the component as follows:
actor "Facade" is the entry point: it receives requests and sends the results to the clients.
actor "Dispatcher" execute processing steps asynchronously with Futures, which wrap the steps
steps send their results back to the "Dispatcher" actor. It's implemented with Future callbacks.
When the request processing finish "Dispatcher" sends the results to "Facade".
Does it make sense? Are there any good examples of such a component in Scala?

It makes sense given the details you showed.
A good framework for such case would be Akka. See the examples available in the source code, and the many open sourced projects in GitHub.

Related

Is it possible to combine REST and messaging for microservices?

We have the first version of an application based on a microservice architecture. We used REST for external and internal communication.
Now we want to switch to AP from CP (CAP theorem)* and use a message bus for communication between microservices.
There is a lot of information about how to create an event bus based on Kafka, RabbitMQ, etc.
But I can't find any best practices for a combination of REST and messaging.
For example, you create a car service and you need to add different car components. It would make more sense, for this purpose, to use REST with POST requests. On the other hand, a service for booking a car would be a good task for an event-based approach.
Do you have a similar approach when you have a different dictionary and business logic capabilities? How do you combine them? Just support both approaches separately? Or unify them in one approach?
* for the first version, we agreed to choose consistency and partition tolerance. But now availability becomes more important for us.
Bottom line up front: You're looking for Command Query Responsibility Segregation; which defines an architectural pattern for breaking up responsibilities from querying for data to asking for a process to be run. The short answer is you do not want to mix the two in either a query or a process in a blocking fashion. The rest of this answer will go into detail as to why, and the three different ways you can do what you're trying to do.
This answer is a short form of the experience I have with Microservices. My bona fides: I've created Microservices topologies from scratch (and nearly zero knowledge) and as they say hit every branch on the way down.
One of the benefits of starting from zero-knowledge is that the first topology I created used a mixture of intra-service synchronous and blocking (HTTP) communication (to retrieve data needed for an operation from the service that held it), and message queues + asynchronous events to run operations (for Commands).
I'll define both terms:
Commands: Telling a service to do something. For instance, "Run ETL Batch job". You expect there to be an output from this; but it is necessarily a process that you're not going to be able to reliably wait on. A command has side-effects. Something will change because of this action (If nothing happens and nothing changes, then you haven't done anything).
Query: Asking a service for data that it holds. This data may have been there because of a Command given, but asking for data should not have side effects. No Command operations should need to be run because of a Query received.
Anyway, back to the topology.
Level 1: Mixed HTTP and Events
For this first topology, we mixed Synchronous Queries with Asynchronous Events being emitted. This was... problematic.
Message Buses are by their nature observable. One setting in RabbitMQ, or an Event Source, and you can observe all events in the system. This has some good side-effects, in that when something happens in the process you can typically figure out what events led to that state (if you follow an event-driven paradigm + state machines).
HTTP Calls are not observable without inspecting network traffic or logging those requests (which itself has problems, so we're going to start with "not feasible" in normal operations). Therefore if you mix a message based process and HTTP calls, you're going to have holes where you can't tell what's going on. You'll have spots where due to a network error your HTTP call didn't return data, and your services didn't continue the process because of that. You'll also need to hook up Retry/Circuit Breaker patterns for your HTTP calls to ensure they at least try a few times, but then you have to differentiate between "Not up because it's down", and "Not up because it's momentarily busy".
In short, mixing the two methods for a Command Driven process is not very resilient.
Level 2: Events define RPC/Internal Request/Response for data; Queries are External
In step two of this maturity model, you separate out Commands and Queries. Commands should use an event driven system, and queries should happen through HTTP. If you need the results of a query for a Command, then you issue a message and use a Request/Response pattern over your message bus.
This has benefits and problems too.
Benefits-wise your entire Command is now observable, even as it hops through multiple services. You can also replay processes in the system by rerunning events, which can be useful in tracking down problems.
Problems-wise now some of your events look a lot like queries; and you're now recreating the beautiful HTTP and REST semantics available in HTTP for messages; and that's not terribly fun or useful. As an example, a 404 tells you there's no data in REST. For a message based event, you have to recreate those semantics (There's a good Youtube conference talk on the subject I can't find but a team tried to do just that with great pain).
However, your events are now asynchronous and non-blocking, and every service can be refactored to a state-machine that will respond to a given event. Some caveats are those events should contain all the data needed for the operation (which leads to messages growing over the course of a process).
Your queries can still use HTTP for external communication; but for internal command/processes, you'd use the message bus.
I don't recommend this approach either (though it's a step up from the first approach). I don't recommend it because of the impurity your events start to take on, and in a microservices system having contracts be the same throughout the system is important.
Level 3: Producers of Data emit data as events. Consumers Record data for their use.
The third step in the maturity model (and we were on our way to that paradigm when I departed from the project) is for services that produce data to issue events when that data is produced. That data is then jotted down by services listening for those events, and those services will use that (could be?) stale data to conduct their operations. External customers still use HTTP; but internally you emit events when new data is produced, and each service that cares about that data will store it to use when it needs to. This is the crux of Michael Bryzek's talk Designing Microservices Architecture the Right way. Michael Bryzek is the CTO of Flow.io, a white-label e-commerce company.
If you want a deeper answer along with other issues at play, I'll point you to my blog post on the subject.

When to create an Akka Actor

I have a REST service which services only one POST request. I want to use an actor to process the request. However I don't know if I should create one actor and derive all the requests using this actor or should I create an actor every time I get a request. What are the pros and cons of these choices.
Also, how is it parallel execution when I create one actor and use that actor to process all my requests. It certainly looks like sequential execution. I would want to understand this as well.
If you use one Actor requests are queued inside the actor mail box and are processed one by one by the actor. This is sequential and not recommended.
Thats why it is said
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap you can create one actor for every request without any problem.
Do db interactions and other heavy computation using a future and direct results of the future to request handling actor using pipeTo pattern.
Use actors only to divide and distribute work and use Futures to do compute intensive work.
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray, Akka-HTTP does), then you can complete the request from this new actor. This way your request handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what #pamu said. Actors are cheap. But be mindful that if ever you are gonna use a singleton Actor, do not make it stateful it will cause trouble.
And if you are gonna use Futures to do intensive work (which you should do). Make sure you give them specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not good.
Or in each api you have, create a certain dispatcher to control the # of Actors that will work on that kind of endpoint / api.
For example you have "/get/transactions"
specify a dispatcher that would only spawn this # of thread. For this api.
The advantage of this is you can control the # of threads and resources your app uses. When it comes to dealing with heavy traffic. This is a good practice.

Should Akka Actors do real processing tasks?

I'm writing an application that reads relatively large text files, validates and transforms the data (every line in a text file is an own item, there are around 100M items/file) and creates some kind of output. There already exists a multihreaded Java application (using BlockingQueue between Reading/Processing/Persisting Tasks), but I want to implement a Scala application that does the same thing.
Akka seems to be a very popular choice for building concurrent applications. Unfortunately, due to the asynchronous nature of actors, I still don't understand what a single actor can or can't do, e.g. if I can use actors as traditional workers that do some sort of calculation.
Several documentations say that Actors should never block and I understand why. But the given examples for blocking code always only mention such things as blocking file/network IO.. things that make the actor waiting for a short period of time which is of course a bad thing.
But what if the actor is "blocking" because it actually does something useful instead of waiting? In my case, the processing and transformation of a single line/item of text takes 80ms which is quite a long time (pure processing, no IO involved). Can this work be done by an actor directly or should I use a Future instead (but then, If I have to use Futures anyway, why use Akka in the first place..)?.
The Akka docs and examples show that work can be done directly by actors. But it seems that the authors only do very simplistic work (such as calling filter on a String or incrementing a counter and that's it). I don't know if they do this to keep the docs simple and concise or because you really should not do more that within an actor.
How would you design an Akka-based application for my use case (reading text file, processing every line which takes quite some time, eventually persisting the result)? Or is this some kind of problem that does not suit to Akka?
It all depends on the type of an actor.
I use this rule of thumb: if you don't need to talk to this actor and this actor does not have any other responsibilities, then it's ok to block in it doing actual work. You can treat it as a Future and this is what I would call a "worker".
If you block in an actor that is not a leaf node (worker), i.e. work distributor then the whole system will slow down.
There are a few patterns that involve work pulling/pushing or actor per request model. Either of those could be a fit for your application. You can have a manager that creates an actor for each piece of work and when the work is finished actor sends result back to manager and dies. You can also keep an actor alive and ask for more work from that actor. You can also combine actors and Futures.
Sometimes you want to be able to talk to a worker if your processing is more complex and involves multiple stages. In that case a worker can delegate work yet to another actor or to a future.
To sum-up don't block in manager/work distribution actors. It's ok to block in workers if that does not slow your system down.
disclaimer: by blocking I mean doing actual work, not just busy waiting which is never ok.
Doing computations that take 100ms is fine in an actor. However, you need to make sure to properly deal with backpressure. One way would be to use the work-pulling pattern, where your CPU bound actors request new work whenever they are ready instead of receiving new work items in a message.
That said, your problem description sounds like a processing pipeline that might benefit from using a higher level abstraction such as akka streams. Basically, produce a stream of file names to be processed and then use transformations such as map to get the desired result. I have something like this in production that sounds pretty similar to your problem description, and it works very well provided the data used by the individual processing chunks is not too large.
Of course, a stream will also be materialized to a number of actors. But the high level interface will be more type-safe and easier to reason about.

Event based design - Futures, Promises vs Akka Persistence

I have multiple use cases which require predefined events to be fired based on a certain user actions.
e.g. let's say when NewUser is created in the application, it'll have to call CreateUserInWorkflowSystem and FireEmailToTheUser asynchronously. There are many other business cases of this nature where events will be predefined based on a usecase. I can use Promises/Futures to model these events as below
if 'NewUser' then
call `CreateUserInWorkflowSystem` (which will be Future based API)
call `FireEmailToTheUser` (which will be Future based API)
if 'FileImport' then
call `API3` (which will be Future based call)
call `API4` (which will be Future based call)
All those Future calls will have to log failures somewhere so failed calls can be retried etc. Note NewUser call won't be waiting for those Futures (events per say) to complete.
That was using plain Futures/Promises APIs. However I am thinking Akka Persistence will be an appropriate fit here and blocking calls can still run into Futures. With Akka persistence, handling failure will be easy as it provides it out of box etc. I understand Akka persistence is still in experimental stage but that doesn't seem to be a big concern as typesafe generally keeps these new frameworks into experimental state before promoting into future release etc. (same was true with Macros). Given these requirements do you think Futures/Promises or Akka persistence is a better fit here?
This is an opinion based question - not the best type to ask on SO. Anyway, trying to answer.
It really depends what you are more comfortable with and what your requirements are. Do you need to scale the system later beyond a single JVM - use Akka. Do you want to keep it more simple - use Futures.
If you use Futures you can store all state and actions to execute in a job queue/db. It's quite reasonable.
If you use Akka Persistence then obviously it will help you with persistence. Akka will help to perform supervison, recovery and retries easier. If your CreateUserInWorkflowSystem action fails result is propagated to supervising actor which probably restarts the failed actor and makes it retry for N times. If your supervising actor fails then his supervisor will do the right thing, or eventually the whole app will crash which is good. With Futures you would have to implement this mechanism yourself and make sure that application can crash when needed.
If you have completely independent actions then Futures and Actors sound about the same. If you have to chain actions and compose them, then using Futures will be a somewhat more natural thing to do: for comprehensions, etc. In Akka you would have to wait for a message and based on a type of a message perform next action.
Try to mock a simple implementation using both and compare what you like/dislike given your particular application requirements. Overall, both choices are good, but I'm slightly leaning towards actors in this case.

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottle neck?
Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literarily millions of actors on an pretty ordinary JVM heap. Also, they will only consume cpu/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-base.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) or 2) have both their drawbacks. So then, let's use options 3) Routing (Akka 2.0+)
Router is an element which act as a load balancer, routing the requests to other Actors which will perform the task needed.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
//This will create 5 instances of the actor ExampleActor
//managed and supervised by a RoundRobinRouter
ActorRef roundRobinRouter = getContext().actorOf(
Props.create(ExampleActor.class).withRouter(new RoundRobinRouter(5)),"router");
This procedure is well explained in this blog.
It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).