Reliable Actors - share a collection between all instances? - azure-service-fabric

Currently, I spawn a bunch of actors to do some long running task. I put these into a List, then call Task.WaitAll() to collect the results.
However, I want to collect the results as each Actor completes its task (rather than wait for every actor to finish and aggregate it at the end).
I was thinking about using reliable collections, but how do I share a reliable collection between all actors?
Thanks!

Actors are separate entities by design so you can't simply 'share' a collection between several instances. A couple of workarounds that cross my mind -
Spawn a dedicated actor with a fixed id and make other actors to communicate with this one when they finish their job
Create a separate stateful service that your actors will be accessing to store the required data
Have a separate resource, like EventHub, to listen for the events that actors will generate upon tasks completion

Related

Use akka actors to traverse directory tree

I'm new to the actor model and was trying to write a simple example. I want to traverse a directory tree using Scala and Akka. The program should find all files and perform an arbitrary (but fast) operation on each file.
I wanted to check how can I model recursion using actors?
How do I gracefully stop the actor system when the traversal will be finished?
How can I control the number of actors to protect against out of memory?
Is there a way to keep the mailboxes of the actors from growing too big?
What will be different if the file operation will take long time to execute?
Any help would be appreciated!
Actors are workers. They take work in and give results back, or they supervise other workers. In general, you want your actors to have a single responsibility.
In theory, you could have an actor that processes a directory's contents, working on each file, or spawning an actor for each directory encountered. This would be bad, as long file-processing time would stall the system.
There are several methods for stopping the actor system gracefully. The Akka documentation mentions several of them.
You could have an actor supervisor that queues up requests for actors, spawns actors if below an actor threshold count, and decrementing the count when actors finish up. This is the job of a supervisor actor. The supervisor actor could sit to one side while it monitors, or it could also dispatch work. Akka has actor models the implement both of these approaches.
Yes, there are several ways to control the size of a mailbox. Read the documentation.
The file operation can block other processing if you do it the wrong way, such as a naive, recursive traversal.
The first thing to note is there are two types of work: traversing the file hierarchy and processing an individual file. As your first implementation try, create two actors, actor A and actor B. Actor A will traverse the file system, and send messages to actor B with the path to files to process. When actor A is done, it sends an "all done" indicator to actor B and terminates. When actor B processes the "all done" indicator, it terminates. This is a basic implementation that you can use to learn how to use the actors.
Everything else is a variation on this. Next variation might be creating two actor B's with a shared mailbox. Shutdown is a little more involved but still straightforward. The next variation is to create a dispatcher actor which farms out work to one or more actor B's. The next variation uses multiple actor A's to traverse the file system, with a supervisor to control how many actors get created.
If you follow this development plan, you will have learned a lot about how to use Akka, and can answer all of your questions.

When to create an Akka Actor

I have a REST service which services only one POST request. I want to use an actor to process the request. However I don't know if I should create one actor and derive all the requests using this actor or should I create an actor every time I get a request. What are the pros and cons of these choices.
Also, how is it parallel execution when I create one actor and use that actor to process all my requests. It certainly looks like sequential execution. I would want to understand this as well.
If you use one Actor requests are queued inside the actor mail box and are processed one by one by the actor. This is sequential and not recommended.
Thats why it is said
One actor is no actor.
Create a manager Actor which manages other actors. As actors are quite cheap you can create one actor for every request without any problem.
Do db interactions and other heavy computation using a future and direct results of the future to request handling actor using pipeTo pattern.
Use actors only to divide and distribute work and use Futures to do compute intensive work.
I would create an actor per request and use the "tell" pattern to delegate the work to the newly created actor. If the REST framework you use supports completing the request from another actor (Spray, Akka-HTTP does), then you can complete the request from this new actor. This way your request handling actor is free to handle the next request.
I find this a wonderful resource that explains the pros & cons of ask & tell and per-request-actors. It can be helpful to you.
I agree with what #pamu said. Actors are cheap. But be mindful that if ever you are gonna use a singleton Actor, do not make it stateful it will cause trouble.
And if you are gonna use Futures to do intensive work (which you should do). Make sure you give them specific ExecutionContext / Dispatcher. Using the global dispatcher or ExecutionContext is not good.
Or in each api you have, create a certain dispatcher to control the # of Actors that will work on that kind of endpoint / api.
For example you have "/get/transactions"
specify a dispatcher that would only spawn this # of thread. For this api.
The advantage of this is you can control the # of threads and resources your app uses. When it comes to dealing with heavy traffic. This is a good practice.

How much processing should be done in an Akka actor ?

I've to do some processing (run some jobs) on a regular interval (10min, 30min, 60min ...). I've different actors for each. I'm sending messages to self inside an actor to trigger the processing. The processing done inside an actor could take anywhere from a 10ms to 30s or more in some cases. My question from an Actor design perspective is how "heavy" can the processing inside an Actor receive be ? Or it doesn't really matter ?
It's fine to have actors that take arbitrarily long to process a message. But you wouldn't want to do that in an actor that is responsible for the timely processing of further messages.
A common pattern is to have a manager actor that receives messages and farms out work to worker actors. It's no problem if the worker actors take a long time, because the manager actor can simply create another worker actor if it needs one.
Think about what needs to be responsive and what will take a long time, and make sure a given actor isn't doing both.
But also, if you have something to do that is going to take 30 seconds, you might want to break it up into multiple actors somehow so that multiple cores can work on it in parallel. Another common pattern is for an actor given a task to look at how big it is and if it is over some threshold, spawn multiple actors and give each part of the job. Each of those actors then does the same thing, deciding whether to do the job itself or split up the work among some child actors. In this way trees of actors are formed on the fly based on the size of the job, and when the job is done the actors disappear.

Stateless Akka actors

I am just starting with Akka and I am trying to split some messy functions in more manageable pieces, each of which would be carried out by a different actor.
My task seems exactly what is suitable for the actor model. I have a tree of objects, which are saved in a database. Each node has some attributes; lets' concentrate on just one and call it wealth. The wealth of the children depends on the wealth of the parent. When one computes the wealth on a node, this should trigger a similar computation on the children. I want to collect all the updated instances of nodes and save them at the same time in the DB.
This seems easy: an actor just performs the computation and then starts another actor for each child of the current node. When all children send back a message with the results of the computation, the actor collects them, adds the current node and sends a message to the actor above him.
The problem is that I do not know a way to be sure that one has received a message from all the children. A trivial way would be to use a counter and increase it whenever a message from a child arrives.
But what happens if two independent parts of the system require such a computation to be performed and the same actor is reused? The actor will spawn twice as many children and the counting will not be reliable anymore. What I would need is to make sure that the same actor is not reused from the external system, but new actors are generated each time the computation is triggered.
Is this even possible? If not, is there some automatic mechanism in Akka to make sure that every child has performed its task?
Am I just using the actor model in a situation that is not suitable here? Essentially I am doing nothing more than could be done with functions - the actors themselves are stateless, but they allow me to simplify and parallelize the computation.
The way that you describe things, I think what you really want is a recursive function that returns a Future[Tree[NodeWealth]], the function would spawn a Future every time it is called and it would call itself for each child in the hierarchy, at the end of the function it would compose the Futures from those calls into a single Future[Result]. When the recursive function terminates and returns the Future[Tree[NodeWealth]] you can then compose it with another Future which updates your DB. Take a look at the Akka documentation here on Futures. And pay particular attention to the section "Composing Futures".
The composition of futures should allow you to avoid state and implement this easily.
Another option would be to use actors and futures, and use the ask pattern on child actors, compose the resulting futures into a single future which you return to the sender (parent actor). This is essentially the same thing, just intertwined with actors. I would probably only go this route if you are already representing your nodes as actors for some other reason.

Akka - How many instances of an actor should you create?

I'm new to the Akka framework and I'm building an HTTP server application on top of Netty + Akka.
My idea so far is to create an actor for each type of request. E.g. I would have an actor for a POST to /my-resource and another actor for a GET to /my-resource.
Where I'm confused is how I should go about actor creation? Should I:
Create a new actor for every request (by this I mean for every request should I do a TypedActor.newInstance() of the appropriate actor)? How expensive is it to create a new actor?
Create one instance of each actor on server start up and use that actor instance for every request? I've read that an actor can only process one message at a time, so couldn't this be a bottle neck?
Do something else?
Thanks for any feedback.
Well, you create an Actor for each instance of mutable state that you want to manage.
In your case, that might be just one actor if my-resource is a single object and you want to treat each request serially - that easily ensures that you only return consistent states between modifications.
If (more likely) you manage multiple resources, one actor per resource instance is usually ideal unless you run into many thousands of resources. While you can also run per-request actors, you'll end up with a strange design if you don't think about the state those requests are accessing - e.g. if you just create one Actor per POST request, you'll find yourself worrying how to keep them from concurrently modifying the same resource, which is a clear indication that you've defined your actors wrongly.
I usually have fairly trivial request/reply actors whose main purpose it is to abstract the communication with external systems. Their communication with the "instance" actors is then normally limited to one request/response pair to perform the actual action.
If you are using Akka, you can create an actor per request. Akka is extremely slim on resources and you can create literarily millions of actors on an pretty ordinary JVM heap. Also, they will only consume cpu/stack/threads when they actually do something.
A year ago I made a comparison between the resource consumption of the thread-based and event-based standard actors. And Akka is even better than the event-base.
One of the big points of Akka in my opinion is that it allows you to design your system as "one actor per usage" where earlier actor systems often forced you to do "use only actors for shared services" due to resource overhead.
I would recommend that you go for option 1.
Options 1) or 2) have both their drawbacks. So then, let's use options 3) Routing (Akka 2.0+)
Router is an element which act as a load balancer, routing the requests to other Actors which will perform the task needed.
Akka provides different Router implementations with different logic to route a message (for example SmallestMailboxPool or RoundRobinPool).
Every Router may have several children and its task is to supervise their Mailbox to further decide where to route the received message.
//This will create 5 instances of the actor ExampleActor
//managed and supervised by a RoundRobinRouter
ActorRef roundRobinRouter = getContext().actorOf(
Props.create(ExampleActor.class).withRouter(new RoundRobinRouter(5)),"router");
This procedure is well explained in this blog.
It's quite a reasonable option, but whether it's suitable depends on specifics of your request handling.
Yes, of course it could.
For many cases the best thing to do would be to just have one actor responding to every request (or perhaps one actor per type of request), but the only thing this actor does is to forward the task to another actor (or spawn a Future) which will actually do the job.
For scaling up the serial requests handling, add a master actor (Supervisor) which in turn will delegate to the worker actors (Children) (round-robin fashion).