Akka - How many instances of an actor should you create? - scala

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottle neck?
Do something else?
Thanks for any feedback.

Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.

If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literarily millions of actors on an pretty ordinary JVM heap. Also, they will only consume cpu/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-base.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.

Options 1) or 2) have both their drawbacks. So then, let's use options 3) Routing (Akka 2.0+)
Router is an element which act as a load balancer, routing the requests to other Actors which will perform the task needed.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
//This will create 5 instances of the actor ExampleActor
//managed and supervised by a RoundRobinRouter
ActorRef roundRobinRouter = getContext().actorOf(
Props.create(ExampleActor.class).withRouter(new RoundRobinRouter(5)),"router");
This procedure is well explained in this blog.

It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.

For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).

Related

When to create an Akka Actor

I have a REST service which services only one POST request. I want to use an actor to process the request. However I don't know if I should create one actor and derive all the requests using this actor or should I create an actor every time I get a request. What are the pros and cons of these choices.
Also, how is it parallel execution when I create one actor and use that actor to process all my requests. It certainly looks like sequential execution. I would want to understand this as well.
If you use one Actor requests are queued inside the actor mail box and are processed one by one by the actor. This is sequential and not recommended.
Thats why it is said
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap you can create one actor for every request without any problem.
Do db interactions and other heavy computation using a future and direct results of the future to request handling actor using pipeTo pattern.
Use actors only to divide and distribute work and use Futures to do compute intensive work.
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray, Akka-HTTP does), then you can complete the request from this new actor. This way your request handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what #pamu said. Actors are cheap. But be mindful that if ever you are gonna use a singleton Actor, do not make it stateful it will cause trouble.
And if you are gonna use Futures to do intensive work (which you should do). Make sure you give them specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not good.
Or in each api you have, create a certain dispatcher to control the # of Actors that will work on that kind of endpoint / api.
For example you have "/get/transactions"
specify a dispatcher that would only spawn this # of thread. For this api.
The advantage of this is you can control the # of threads and resources your app uses. When it comes to dealing with heavy traffic. This is a good practice.

Efficient processing of custom data by actors

I am an Akka newbie trying things out for a particular problem. I am trying to write code for an actor system which would efficiently process custom data coming from multiple clients in the form of events. By custom data, I mean, the content and structure of the data would vary between events from the same client (e.g., we might have instrumented to drop 5 events containing 5 different piece of information for the same client), and between events from different clients (e.g., we might be capturing completely different set of information from one client vs. another). I am wondering what would be a good way to use actor-based processing for this type of scenarios.
This are the alternatives what I have thought so far:
(A) I will write an actor which would load client-specific processor class through reflection, based on the client whose event is being processed. The client-specific processor class would contain logic corresponding to all the type of events that would be received for that client. I will initiate 'n' instances of this actor.
context.actorOf(Props[CustomEventProcessor].withRouter(RoundRobinPool(nrOfInstances = 100)), name = "CustomProcessor")
(B) I will write actors for each client, each containing logic corresponding to all the type of events that would be received for that client. I will initiate 'n' instances of each of these actors.
context.actorOf(Props[CleintXEventProcessor].withRouter(RoundRobinPool(nrOfInstances = 50)), name = "ClientXCustomProcessor")
context.actorOf(Props[CleintYEventProcessor].withRouter(RoundRobinPool(nrOfInstances = 50)), name = "ClientYCustomProcessor")
At this point, I have a few questions:
Would [A] be slower compared to [B] becuase [A] is using reflection? I am assuming that once an actor instance has finished processing a particular event, it dies, so the next actor instance processing an event from the same client would have to start with loading the processor class again. Is this assumption correct?
Given a specific event flow pattern, would a system based on [B] have a heavier runtime memory footprint compared to [A] becuase now each actor for each client can have multiple instances of them in memory?
Any other way to approach this problem?
Thanks for any pointers.
Well,
It could be a bit slower, but I think not really noticeable. And no, you don't have to kill actors between events.
No, because single actor takes like 400 bytes in memory, so you can create a single actor for each event, not only one actor per client.
Yes, via Reactive Streams which I think is a bit clearer solution than actors, but Akka Streams are still experimental, and it may be a bit harder to learn than actors. But you'll have backpressure for free if its needed.

Starting Actors on-demand by identifier in Akka

I'm currently implementing a system that that receives inbound messages from an external monitoring system. I'm translating these messages into more concise 'events', and I'm using these to alter the state of 'Managed System' objects. Akka Actors seemed like a good use case for encapsulating mutable state in concurrent applications.
The managed systems are identified by a name (99% of the time this is a hostname). Whenever a proper event is received, the system routes the message to the correct actor based on the name property. At first I used to use actorSelection and the complete paths of said actors, but that was very ugly, and I saw several people advise against relying on the fully qualified name of an actor to deliver message.
So I've set up a simple EventBus, which is great as I can now simply do:
eventBus.subscribe(subscriber1, "/managedSystem01")
eventBus.subscribe(subscriber2, "/managedSystem02")
eventBus.publish(MonitoringEvent("/managedSystem01", MonitoringMessage("managedSystem01", "N", "CPU_LOAD_HIGH", True)))
eventBus.publish(MonitoringEvent("/managedSystem02", MonitoringMessage("managedSystem02", "Y", "DISK_USAGE_HIGH", True)))
Of course, I now have the issue that, should I receive and event that concerns a managed system for which I've not spawned an actor yet (this is entirely possibly, it is impossible for me to get an absolute list of managed systems unfortunately), the message will be routed to the dead-letter mailbox.
Ideally I don't want this to happen. When it is unable to address a specific actor, I want to spawn a new one dynamically.
I suppose that, theoretically, I could subscribe to DeadLetter messages but:
That sounds a little 'hacky', since those message are essentially reserved for the system
Is it even possible to recover the original message (in my case, the MonitoringMessage) that is sent to the DeadLetter mailbox?
Alternatively is there a way to check if there are ZERO subscribers to a certain "topic"?
What you describe ("send to Actor by some identifier, if it does not exist buffer until it gets created and then deliver to that newly on-demand created Actor") is implemented in Akka as Cluster Sharding.
While it is designed primarily for sharding load (work) across a cluster, you could use it locally as well, since your requirement is essentially a scaled down (to one node) version of problem that it solves. It takes care of starting new Actors if they don't exist for a given identifier etc, so you'd simply subscribe the shard-region to the events and it'll take care of creating the actors for you.

Is it mandatory to have a master actor in Akka?

I am trying to learn a bit about akka actors (in scala) and I came up with this question that I couldn't find an answer to.
Do you necessarily need to create a master actor and from there create the workers with a workerRouter?
Or can you just skip this step and go directly to create workers with a workerRouter from your Main object?
Let me know if you need any code, but I'm basically following the HelloWorld for akka.
Strictly speaking: yes. In terms of bussiness logic: no.
Akka's actors, are by design, hierarchical . That means, any actors you create will always have a "parent"/"master", if not one defined yourself, then the /user guardian actor .
However, note that this hierarchical relationship, from Akka's system point of view, concerns the actor lifecycle and child supervision. It does not care about how you wire the actors up with your messages and/or any custom lifecycle handling.
So, from the point of view of your application, you can have your worker actors run as peers with some some sort of consensus scheme. They will of course have system parent (/user if you won't define one yourself), but as long as you don't care about supervision - and if you're just starting to learn Akka you might not - everything will work fine.
Finally, note that there can be many schemes for working in a "worker pool" setup. For example, this article on work pulling might give you some inspiration on the possible solutions to such problems.

Actors (scala/akka): is it implied that the receive method will be accessed in a threadsafe manner?

I assume that the messages will be received and processed in a threadsafe manner. However, I have been reading (some) akka/scala docs but I didn't encounter the keyword 'threadsafe' yet.
It is probably because the actor model assumes that each actor instance processes its own mailbox sequentially. That means it should never happen, that two or more concurrent threads execute single actor instance's code. Technically you could create a method in an actor's class (because it is still an object) and call it from multiple threads concurrently, but this would be a major departure from the actor's usage rules and you would do it "at your own risk", because then you would lose all thread-safety guarantees of that model.
This is also one of the reasons, why Akka introduced a concept of ActorRef - a handle, that lets you communicate with the actor through message passing, but not by calling its methods directly.
I think we have it pretty well documented: http://doc.akka.io/docs/akka/2.3.9/general/jmm.html
Actors are 'Treadsafe'. The Actor System (AKKA), provides each actor with its own 'light-weight thread'. Meaning that this is not a tread, but the AKKA system will give the impression that an Actor is always running in it's own thread to the developer. This means that any operations performed as a result of acting on a message are, for all purposes, thread safe.
However, you should not undermine AKKA by using mutable messages or public state. If you develop you actors to be stand alone units of functionality, then they will be threadsafe.
See also:
http://doc.akka.io/docs/akka/2.3.12/general/actors.html#State
and
http://doc.akka.io/docs/akka/2.3.12/general/jmm.html for a more indepth study of the AKKA memory model and how it manages 'tread' issues.